History
Linked lists were developed in 1955–1956 by Allen Newell, Cliff Shaw and Herbert A. Simon at RAND Corporation and Carnegie Mellon University as the primary data structure for their Information Processing Language.
Basic concepts and nomenclature
Each record of a linked list is often called an 'element' or 'node'. The field of each node that contains the address of the next node is usually called the 'next link' or 'next pointer'. The remaining fields are known as the 'data', 'information', 'value', 'cargo', or 'payload' fields. The 'head' of a list is its first node. The 'tail' of a list may refer either to the rest of the list after the head, or to the last node in the list.
Singly linked list
Singly linked lists contain nodes which have a 'value' field as well as a 'next' field, which points to the next node in the line of nodes. Operations that can be performed on singly linked lists include insertion, deletion and traversal; a typical example is adding a new node with a given value to the end of the list.
Doubly linked list
In a 'doubly linked list', each node contains, besides the next-node link, a second link field pointing to the 'previous' node in the sequence. The two links may be called 'forward(s)' and 'backwards', or 'next' and 'prev' ('previous'). A technique known as XOR-linking allows a doubly linked list to be implemented using a single link field in each node. However, this technique requires the ability to do bit operations on addresses, and therefore may not be available in some high-level languages. Many modern operating systems use doubly linked lists to maintain references to active processes, threads, and other dynamic objects. A common strategy for rootkits to evade detection is to unlink themselves from these lists.
Multiply linked list
In a 'multiply linked list', each node contains two or more link fields, each field being used to connect the same set of data arranged in a different order (e.g., by name, by department, by date of birth, etc.). While a doubly linked list can be seen as a special case of a multiply linked list, the fact that its two orders are opposite to each other leads to simpler and more efficient algorithms, so doubly linked lists are usually treated as a separate case.
Circular linked list
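In the last node of a linked list, the link field often contains a null reference, a special value used to indicate the lack of further nodes. In a circular linked list, the link field of the last node instead points back to the first node of the list.
Sentinel nodes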
In some implementations an extra 'sentinel' or 'dummy' node may be added before the first data record or after the last one. This convention simplifies and accelerates some list-handling algorithms, by ensuring that all links can be safely dereferenced and that every list (even one that contains no data elements) always has a "first" and "last" node.
Empty lists
An empty list is a list that contains no data records. This is usually the same as saying that it has zero nodes. If sentinel nodes are being used, the list is usually said to be empty when it has only sentinel nodes.
Hash linking
The link fields need not be physically part of the nodes. If the data records are stored in an array and referenced by their indices, the link field may be stored in a separate array with the same indices as the data records.
List handles
Since a reference to the first node gives access to the whole list, that reference is often called the 'address', 'pointer', or 'handle' of the list. Algorithms that manipulate linked lists usually get such handles to the input lists and return the handles to the resulting lists. In fact, in the context of such algorithms, the word "list" often means "list handle". In some situations, however, it may be convenient to refer to a list by a handle that consists of two links, pointing to its first and last nodes.
Combining alternatives
The alternatives listed above may be arbitrarily combined in almost every way, so one may have circular doubly linked lists without sentinels, circular singly linked lists with sentinels, etc.
Tradeoffs
As with most choices in computer programming and design, no method is well suited to all circumstances. A linked list data structure might work well in one case, but cause problems in another. This is a list of some of the common tradeoffs involving linked list structures.
Linked lists vs. dynamic arrays
A ''dynamic array'' is a data structure that allocates all elements contiguously in memory and keeps a count of the current number of elements. If the space reserved for the dynamic array is exceeded, it is reallocated and (possibly) copied, which is an expensive operation.
Linked lists have several advantages over dynamic arrays. Insertion or deletion of an element at a specific point of a list is a constant-time operation, provided that a pointer to the node before the insertion point (or before the node to be removed) is already available; without such a reference it is O(n). By contrast, insertion in a dynamic array at random locations requires moving half of the elements on average, and all of the elements in the worst case. While one can "delete" an element from an array in constant time by somehow marking its slot as "vacant", this causes fragmentation that impedes the performance of iteration.
Moreover, arbitrarily many elements may be inserted into a linked list, limited only by the total memory available, whereas a dynamic array will eventually fill up its underlying array and have to reallocate. Reallocation is expensive and may not even be possible if memory is fragmented, although its cost can be averaged over insertions, so that an insertion is still amortized O(1). This helps with appending elements at the array's end, but inserting into (or removing from) middle positions still carries a high cost, because data must be moved to maintain contiguity. An array from which many elements are removed may also have to be resized in order to avoid wasting too much space.
On the other hand, dynamic arrays (as well as fixed-size arrays) allow constant-time random access, whereas linked lists allow only sequential access to elements.
Singly linked linear lists vs. other lists
While doubly linked and circular lists have advantages over singly linked linear lists, linear lists offer some advantages that make them preferable in some situations.
A singly linked linear list is a recursive data structure, because it contains a pointer to a ''smaller'' object of the same type. For that reason, many operations on singly linked linear lists (such as merging two lists, or enumerating the elements in reverse order) often have very simple recursive algorithms, much simpler than any solution using iterative commands. While those recursive solutions can be adapted for doubly linked and circularly linked lists, the procedures generally need extra arguments and more complicated base cases.
Linear singly linked lists also allow tail-sharing, the use of a common final portion of a sub-list as the terminal portion of two different lists. In particular, if a new node is added at the beginning of a list, the former list remains available as the tail of the new one, a simple example of a persistent data structure. Again, this is not true with the other variants: a node may never belong to two different circular or doubly linked lists.
In particular, end-sentinel nodes can be shared among singly linked non-circular lists. The same end-sentinel node may be used for ''every'' such list. In Lisp, for example, every proper list ends with a link to a special node, denoted by nil or ().
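Tail-sharing can be illustrated with a small Python sketch. The `Node` class and the `to_list` helper are names chosen here for illustration only, and nodes are assumed to be immutable once created; this is one possible way to model the idea, not a reference implementation.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Node:
    """Immutable singly linked node (hypothetical name, for illustration)."""
    value: Any
    next: Optional["Node"] = None


def to_list(node: Optional[Node]) -> list:
    """Collect the values reachable from `node` into a Python list."""
    out = []
    while node is not None:
        out.append(node.value)
        node = node.next
    return out


# A base list 2 -> 3, and two longer lists that both reuse it as their tail.
shared = Node(2, Node(3))
a = Node(1, shared)   # 1 -> 2 -> 3
b = Node(0, shared)   # 0 -> 2 -> 3
```

Adding a node at the front of `shared` creates a new list without copying or modifying the old one, so `shared`, `a` and `b` all remain valid at the same time.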
The advantages of the fancy variants often lie only in the complexity of the algorithms, not in their efficiency. A circular list, in particular, can usually be emulated by a linear list together with two variables that point to the first and last nodes, at no extra cost.
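As a rough illustration of that emulation, the sketch below keeps a linear singly linked list together with `first` and `last` references. Under these assumptions it offers the same constant-time "add at the back, remove from the front" behaviour that a circular list addressed by its last node would provide. The class and method names are illustrative.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class LinearQueue:
    """Linear singly linked list plus references to its first and last nodes."""

    def __init__(self):
        self.first = None
        self.last = None

    def append(self, value):
        """Add a value at the back in O(1)."""
        node = Node(value)
        if self.last is None:      # the list was empty
            self.first = node
        else:
            self.last.next = node
        self.last = node

    def pop_front(self):
        """Remove and return the front value in O(1)."""
        if self.first is None:
            raise IndexError("pop from empty list")
        node = self.first
        self.first = node.next
        if self.first is None:     # the list became empty
            self.last = None
        return node.value
```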
Doubly linked vs. singly linked
Doubly linked lists require more space per node (unless one uses XOR-linking), and their elementary operations are more expensive; but they are often easier to manipulate because they allow fast and easy sequential access to the list in both directions. In a doubly linked list, one can insert or delete a node in a constant number of operations given only that node's address. To do the same in a singly linked list, one must have the ''address of the pointer'' to that node, which is either the handle for the whole list (in the case of the first node) or the link field in the ''previous'' node. Some algorithms require access in both directions. On the other hand, doubly linked lists do not allow tail-sharing and cannot be used as persistent data structures.
Circularly linked vs. linearly linked
A circularly linked list may be a natural option to represent arrays that are naturally circular, e.g. the corners of a polygon, a pool of buffers that are used and released in FIFO ("first in, first out") order, or a set of processes that should be time-shared in round-robin order. In these applications, a pointer to any node serves as a handle to the whole list.
With a circular list, a pointer to the last node also gives easy access to the first node, by following one link. Thus, in applications that require access to both ends of the list (e.g., in the implementation of a queue), a circular structure allows one to handle the structure by a single pointer, instead of two.
A circular list can be split into two circular lists, in constant time, by giving the addresses of the last node of each piece. The operation consists of swapping the contents of the link fields of those two nodes. Applying the same operation to any two nodes in two distinct lists joins the two lists into one. This property greatly simplifies some algorithms and data structures, such as the quad-edge and face-edge.
The simplest representation for an empty ''circular'' list (when such a thing makes sense) is a null pointer, indicating that the list has no nodes. With this choice, many algorithms have to test for this special case and handle it separately. By contrast, the use of null to denote an empty ''linear'' list is more natural and often creates fewer special cases.
For some applications, it can be useful to use singly linked lists that can vary between being circular and being linear, or even circular with a linear initial segment. Algorithms for searching or otherwise operating on these have to take precautions to avoid accidentally entering an endless loop. One well-known method is to have a second pointer walk the list at half or double the speed; if both pointers meet at the same node, a cycle has been found.
Using sentinel nodes
Sentinel nodes may simplify certain list operations by ensuring that the next or previous nodes exist for every element, and that even empty lists have at least one node. One may also use a sentinel node at the end of the list, with an appropriate data field, to eliminate some end-of-list tests. For example, when scanning the list looking for a node with a given value ''x'', setting the sentinel's data field to ''x'' makes it unnecessary to test for end-of-list inside the loop. Another example is merging two sorted lists: if their sentinels have data fields set to +∞, the choice of the next output node does not need special handling for empty lists.
However, sentinel nodes use up extra space (especially in applications that use many short lists), and they may complicate other operations (such as the creation of a new empty list).
If a circular list is used merely to simulate a linear list, one may avoid some of this complexity by adding a single sentinel node to every list, between the last and the first data nodes. With this convention, an empty list consists of the sentinel node alone, pointing to itself via the next-node link. The list handle should then be a pointer to the last data node, before the sentinel, if the list is not empty; or to the sentinel itself, if the list is empty.
The same trick can be used to simplify the handling of a doubly linked linear list, by turning it into a circular doubly linked list with a single sentinel node. However, in this case, the handle should be a single pointer to the dummy node itself.
Linked list operations
When manipulating linked lists in-place, care must be taken to not use values that have been invalidated in previous assignments. This makes algorithms for inserting or deleting linked list nodes somewhat subtle. This section gives pseudocode for adding or removing nodes from singly, doubly, and circularly linked lists in-place. Throughout, ''null'' is used to refer to an end-of-list marker or sentinel, which may be implemented in a number of ways.
Linearly linked lists
Singly linked lists
The node data structure has two fields. There is also a variable, ''firstNode'', which always points to the first node in the list, or is ''null'' for an empty list.
record ''Node''
    data    ''// the data being stored in the node''
    ''Node'' next    ''// a reference to the next node, null for the last node''
record ''List''
    ''Node'' firstNode    ''// points to the first node of the list; null for an empty list''
Traversal of a singly linked list is simple, beginning at the first node and following each ''next'' link until reaching the end:
node := list.firstNode
while node not null
    ''(do something with node.data)''
    node := node.next
A node can be inserted after an existing node in constant time: the new node's ''next'' link is set to the existing node's successor, and then the existing node's ''next'' link is set to the new node. Inserting a node before an existing one cannot be done directly; instead, one must keep track of the previous node and insert a node after it.
removeBeginning(), which removes the first node of the list, sets list.firstNode to null when removing the last node in the list.
Since it is not possible to iterate backwards, efficient insertBefore or removeBefore operations are not possible. Inserting into a list before a specific node requires traversing the list, which has a worst-case running time of O(n).
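The basic singly linked operations described above might be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: `None` plays the role of ''null'', and the names `SinglyLinkedList`, `insert_after`, `remove_after`, `insert_beginning` and `remove_beginning` are chosen here to mirror the operations named in the text.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


class SinglyLinkedList:
    def __init__(self):
        self.first_node = None   # None plays the role of null

    def insert_beginning(self, new_node):
        new_node.next = self.first_node
        self.first_node = new_node

    def remove_beginning(self):
        node = self.first_node
        if node is not None:
            # Becomes None when the last remaining node is removed.
            self.first_node = node.next
        return node

    def __iter__(self):
        node = self.first_node
        while node is not None:
            yield node.data
            node = node.next


def insert_after(node, new_node):
    # Order matters: read node.next before overwriting it.
    new_node.next = node.next
    node.next = new_node


def remove_after(node):
    obsolete = node.next
    if obsolete is not None:
        node.next = obsolete.next
    return obsolete
```

Note that `insert_after` assigns `new_node.next` first; swapping the two assignments would lose the rest of the list, which is the kind of subtlety in-place algorithms have to guard against.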
Appending one linked list to another can be inefficient unless a reference to the tail is kept as part of the List structure, because one must traverse the entire first list in order to find the tail, and then append the second list to it. Thus, if two linearly linked lists are each of length ''n'', list appending has an asymptotic time complexity of O(''n''). In the Lisp family of languages, list appending is provided by the append procedure.
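The cost of appending can be seen in a short Python sketch. It assumes a destructive append that relinks the last node of the first list; the helper names `append_lists`, `from_values` and `to_values` are hypothetical and exist only for this example.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def append_lists(first, second):
    """Destructively append list `second` to list `first`; return the head.

    The loop walks the whole first list to find its tail, so the running
    time is proportional to the length of `first`.
    """
    if first is None:
        return second
    tail = first
    while tail.next is not None:
        tail = tail.next
    tail.next = second
    return first


def from_values(values):
    """Build a linked list from a Python list (helper for the example)."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def to_values(head):
    """Collect the data of a linked list into a Python list."""
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out
```

If the list structure also stored a reference to its last node, the `while` loop could be skipped and the append would take constant time.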
Many of the special cases of linked list operations can be eliminated by including a dummy element at the front of the list. This ensures that there are no special cases for the beginning of the list and renders both insertBeginning() and removeBeginning() unnecessary, i.e., every element or node is next to another node (even the first node is next to the dummy node). In this case, the first useful data in the list will be found at list.firstNode.next.
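A minimal Python sketch of the dummy-node convention is given below, under the assumption that the dummy node carries no data and is never removed. The class name `ListWithDummy` and the helper `find_predecessor` are illustrative, not part of any standard API.

```python
class Node:
    def __init__(self, data=None, next=None):
        self.data = data
        self.next = next


class ListWithDummy:
    def __init__(self):
        self.first_node = Node()   # dummy node; holds no data

    def find_predecessor(self, value):
        """Return the node whose successor holds `value`, or None."""
        prev = self.first_node
        while prev.next is not None:
            if prev.next.data == value:
                return prev
            prev = prev.next
        return None

    def values(self):
        out = []
        node = self.first_node.next   # first useful data follows the dummy
        while node is not None:
            out.append(node.data)
            node = node.next
        return out


def insert_after(node, value):
    node.next = Node(value, node.next)


def remove_after(node):
    if node.next is not None:
        node.next = node.next.next
```

Because every data node has a predecessor, inserting or removing at the front is just `insert_after` or `remove_after` applied to the dummy node, with no separate code path.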
Circularly linked list
In a circularly linked list, all nodes are linked in a continuous circle, without using ''null''. For lists with a front and a back (such as a queue), one stores a reference to the last node in the list. The ''next'' node after the last node is the first node. Elements can be added to the back of the list and removed from the front in constant time.
Circularly linked lists can be either singly or doubly linked. Both types of circularly linked lists benefit from the ability to traverse the full list beginning at any given node. This often allows us to avoid storing ''firstNode'' and ''lastNode'', although if the list may be empty, there needs to be a special representation for the empty list, such as a ''lastNode'' variable which points to some node in the list or is ''null'' if it is empty; such a ''lastNode'' variable is used here. This representation significantly simplifies adding and removing nodes with a non-empty list, but empty lists are then a special case.
Algorithms
Assuming that ''someNode'' is some node in a non-empty circular singly linked list, this code iterates through that list starting with ''someNode'':
function iterate(someNode)
    if someNode ≠ null
        node := someNode
        do
            do something with node.value
            node := node.next
        while node ≠ someNode
Notice that the test "while node ≠ someNode" must be at the end of the loop. If the test were moved to the beginning of the loop, the procedure would fail whenever the list had only one node.
This function inserts a node "newNode" into a circular linked list after a given node "node". If "node" is null, it assumes that the list is empty.
function insertAfter(''Node'' node, ''Node'' newNode)
    if node = null    // assume list is empty
        newNode.next := newNode
    else
        newNode.next := node.next
        node.next := newNode
    update ''lastNode'' variable if necessary
Suppose that "L" is a variable pointing to the last node of a circular linked list (or null if the list is empty). To append "newNode" to the ''end'' of the list, one may do
insertAfter(L, newNode)
L := newNode
To insert "newNode" at the ''beginning'' of the list, one may do
insertAfter(L, newNode)
if L = null
    L := newNode
This function inserts a value "newVal" before a given node "node" in O(1) time. It creates a new node between "node" and the next node, puts the value of "node" into that new node, and puts "newVal" in "node". Thus, a singly linked circularly linked list with only a ''firstNode'' variable can insert at both the front and the back in O(1) time.
function insertBefore(''Node'' node, newVal)
    if node = null    // assume list is empty
        newNode := new Node(data:=newVal)
        newNode.next := newNode
    else
        newNode := new Node(data:=node.data, next:=node.next)
        node.data := newVal
        node.next := newNode
    update ''firstNode'' variable if necessary
This function removes a non-null node from a list of size greater than 1 in O(1) time. It copies data from the next node into the node, and then sets the node's ''next'' pointer to skip over the next node.
function remove(''Node'' node)
    if node ≠ null and size of list > 1
        removedData := node.data
        node.data := node.next.data
        node.next := node.next.next
        return removedData
Linked lists using arrays of nodes
Languages that do not support any type of reference can still create links by replacing pointers with array indices. The approach is to keep an array of records, where each record has an integer field giving the index of the next node in the array. ListHead would be set to 2, the location of the first entry in the list. Notice that entries 3 and 5 through 7 are not part of the list. These cells are available for any additions to the list. By creating a ListFree integer variable, a free list could be created to keep track of which cells are available. If all entries are in use, the size of the array would have to be increased or some elements would have to be deleted before new entries could be stored in the list.
The following code would traverse the list and display names and account balance:
i := ListHead
while i ≥ 0    ''// loop through the list''
    print i, Records[i].name, Records[i].balance    ''// print entry''
    i := Records[i].next
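The array-of-nodes representation, including a free list, might be sketched in Python as follows. The class `ArrayList`, its method names and the sample names and balances in the usage note are all hypothetical; the sketch assumes parallel Python lists for the fields and uses -1 as the end-of-chain marker, matching the `i ≥ 0` test in the traversal above.

```python
class ArrayList:
    """Linked list stored in parallel arrays and linked by index.

    -1 marks the end of a chain. All names are illustrative.
    """

    def __init__(self, capacity):
        self.name = [None] * capacity
        self.balance = [0.0] * capacity
        # Initially every cell is on the free list: 0 -> 1 -> ... -> -1.
        self.next = [i + 1 for i in range(capacity)]
        if capacity > 0:
            self.next[-1] = -1
        self.list_head = -1                        # the list starts empty
        self.list_free = 0 if capacity > 0 else -1

    def push_front(self, name, balance):
        """Take a cell from the free list and link it in at the head."""
        i = self.list_free
        if i < 0:
            raise MemoryError("no free cells left")
        self.list_free = self.next[i]
        self.name[i], self.balance[i] = name, balance
        self.next[i] = self.list_head
        self.list_head = i
        return i

    def pop_front(self):
        """Unlink the head cell and return it to the free list."""
        i = self.list_head
        if i < 0:
            raise IndexError("list is empty")
        self.list_head = self.next[i]
        self.next[i] = self.list_free
        self.list_free = i
        return self.name[i], self.balance[i]

    def entries(self):
        """Traverse the list, returning (index, name, balance) tuples."""
        out = []
        i = self.list_head
        while i >= 0:
            out.append((i, self.name[i], self.balance[i]))
            i = self.next[i]
        return out
```

For example, after `lst = ArrayList(3)`, calling `lst.push_front("Ann", 10.0)` and then `lst.push_front("Bob", 20.0)` stores the two records in cells 0 and 1, and a later `lst.pop_front()` puts cell 1 back on the free list so that the next insertion reuses it.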
When faced with a choice, the advantages of this approach include:
* The linked list is relocatable, meaning it can be moved about in memory at will, and it can also be quickly and directly serialized for storage on disk or transfer over a network.
* Especially for a small list, array indexes can occupy significantly less space than a full pointer on many architectures.
* Locality of reference can be improved by keeping the nodes together in memory and by periodically rearranging them, although this can also be done in a general store.
* Naïve dynamic memory allocators can produce an excessive amount of overhead storage for each node allocated; almost no allocation overhead is incurred per node in this approach.
* Seizing an entry from a pre-allocated array is faster than using dynamic memory allocation for each node, since dynamic memory allocation typically requires a search for a free memory block of the desired size.
This approach has one main disadvantage, however: it creates and manages a private memory space for its nodes. This leads to the following issues:
* It increases complexity of the implementation.
* Growing a large array when it is full may be difficult or impossible, whereas finding space for a new linked list node in a large, general memory pool may be easier.
* Adding elements to a dynamic array will occasionally (when it is full) unexpectedly take linear (O(n)) instead of constant time (although it is still an amortized constant).
* Using a general memory pool leaves more memory for other data if the list is smaller than expected or if many nodes are freed.
For these reasons, this approach is mainly used for languages that do not support dynamic memory allocation. These disadvantages are also mitigated if the maximum size of the list is known at the time the array is created.
Language support
Many programming languages, such as Lisp and Scheme, have singly linked lists built in.
Internal and external storage
When constructing a linked list, one is faced with the choice of whether to store the data of the list directly in the linked list nodes, called ''internal storage'', or merely to store a reference to the data, called ''external storage''. Internal storage has the advantage of making access to the data more efficient, requiring less storage overall, having better locality of reference, and simplifying memory management for the list (its data is allocated and deallocated at the same time as the list nodes).
External storage, on the other hand, has the advantage of being more generic, in that the same data structure and machine code can be used for a linked list no matter what the size of the data is. It also makes it easy to place the same data in multiple linked lists. Although with internal storage the same data can be placed in multiple lists by including multiple ''next'' references in the node data structure, it would then be necessary to create separate routines to add or delete cells based on each field. It is possible to create additional linked lists of elements that use internal storage by using external storage, and having the cells of the additional linked lists store references to the nodes of the linked list containing the data.
In general, if a set of data structures needs to be included in multiple linked lists, external storage is the best approach. If a set of data structures needs to be included in only one linked list, then internal storage is slightly better, unless a generic linked list package using external storage is available. Likewise, if different sets of data that can be stored in the same data structure are to be included in a single linked list, then internal storage would be fine.
Another approach that can be used with some languages involves having different data structures that all share the same initial fields, including the ''next'' (and ''prev'', for a doubly linked list) references in the same location. After defining separate structures for each type of data, a generic structure can be defined that contains the minimum amount of data shared by all the other structures and contained at the top (beginning) of the structures. Then generic routines can be created that use the minimal structure to perform linked list type operations, while separate routines handle the specific data. This approach is often used in message parsing routines, where several types of messages are received, but all start with the same set of fields, usually including a field for message type. The generic routines are used to add new messages to a queue when they are received, and to remove them from the queue in order to process them. The message type field is then used to call the correct routine to process the specific type of message.
Example of internal and external storage
To create a linked list of families and their members, using internal storage, the structure might look like the following:
record ''member''    ''// member of a family''
    ''member'' next
    (data fields describing the member)
record ''family''    ''// the family itself''
    ''family'' next
    (data fields describing the family)
    ''member'' members    ''// head of list of members of this family''
To print a complete list of families and their members using internal storage, write:
aFamily := Families    ''// start at head of families list''
while aFamily ≠ null    ''// loop through list of families''
    print information about family
    aMember := aFamily.members    ''// get head of list of this family's members''
    while aMember ≠ null    ''// loop through list of members''
        print information about member
        aMember := aMember.next
    aFamily := aFamily.next
Using external storage, the following structures can be created:
record ''node''    ''// generic link structure''
    ''node'' next
    pointer data    ''// generic pointer for data at node''
record ''member''    ''// structure for family member''
    (data fields describing the member)
record ''family''    ''// structure for family''
    (data fields describing the family)
    ''node'' members    ''// head of list of members of this family''
To print a complete list of families and their members using external storage, write:
famNode := Families    ''// start at head of families list''
while famNode ≠ null    ''// loop through list of families''
    aFamily := (family) famNode.data    ''// extract family from node''
    print information about family
    memNode := aFamily.members    ''// get list of family members''
    while memNode ≠ null    ''// loop through list of members''
        aMember := (member) memNode.data    ''// extract member from node''
        print information about member
        memNode := memNode.next
    famNode := famNode.next
Notice that when using external storage, an extra step is needed to extract the record from the node and cast it into the proper data type. This is because both the list of families and the list of members within the family are stored in two linked lists using the same data structure (''node''), and this language does not have parametric types.
As long as the number of families that a member can belong to is known at compile time, internal storage works fine. If, however, a member needed to be included in an arbitrary number of families, with the specific number known only at run time, external storage would be necessary.
Speeding up search
Finding a specific element in a linked list, even if it is sorted, normally requires O(''n'') time (linear search). This is one of the primary disadvantages of linked lists over other data structures. In addition to the variants discussed above, below are two simple ways to improve search time.
In an unordered list, one simple heuristic for decreasing average search time is the ''move-to-front heuristic'', which simply moves an element to the beginning of the list once it is found. This scheme, handy for creating simple caches, ensures that the most recently used items are also the quickest to find again.
Another common approach is to "index" a linked list using a more efficient external data structure, for example a hash table whose entries are references to the linked list nodes.
Random-access lists
A random-access list is a list with support for fast random access to read or modify any element in the list. One possible implementation is a skew binary random-access list using the skew binary number system, which involves a list of trees with special properties; this allows worst-case constant time head/cons operations, and worst-case logarithmic time random access to an element by index. Random-access lists can be implemented as persistent data structures.
Random-access lists can be viewed as immutable linked lists in that they likewise support the same O(1) head and tail operations.
A simple extension to random-access lists is the min-list, which provides an additional operation that yields the minimum element in the entire list in constant time (without mutation complexities).
Related data structures
Both stacks and queues are often implemented using linked lists, and simply restrict the type of operations which are supported.
The skip list is a linked list augmented with layers of pointers for quickly jumping over large numbers of elements, and then descending to the next layer. This process continues down to the bottom layer, which is the actual list.