In computer science , an ASSOCIATIVE ARRAY, MAP, SYMBOL TABLE, or DICTIONARY is an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears at most once in the collection. Operations associated with this data type allow: * the addition of a pair to the collection * the removal of a pair from the collection * the modification of an existing pair * the lookup of a value associated with a particular key The DICTIONARY PROBLEM is a classic computer science problem: the task of designing a data structure that maintains a set of data during 'search', 'delete', and 'insert' operations. The two major solutions to the dictionary problem are a hash table or a search tree . In some cases it is also possible to solve the problem using directly addressed arrays , binary search trees , or other more specialized structures. Many programming languages include associative arrays as primitive data types , and they are available in software libraries for many others. Contentaddressable memory is a form of direct hardwarelevel support for associative arrays. Associative arrays have many applications including such fundamental programming patterns as memoization and the decorator pattern . CONTENTS * 1 Operations * 2 Example * 3 Implementation * 3.1
Hash table
* 3.2 Tree implementations * 3.2.1 Selfbalancing binary search trees * 3.2.2 Other trees * 3.3 Comparison * 4 Language support * 5 Permanent storage * 6 See also * 7 References * 8 External links OPERATIONS In an associative array, the association between a key and a value is often known as a "binding", and the same word "binding" may also be used to refer to the process of creating a new association. The operations that are usually defined for an associative array are: * ADD or INSERT: add a new ( k e y , v a l u e ) {displaystyle (key,value)} pair to the collection, binding the new key to its new value. The arguments to this operation are the key and the value. * REASSIGN: replace the value in one of the ( k e y , v a l u e ) {displaystyle (key,value)} pairs that are already in the collection, binding an old key to a new value. As with an insertion, the arguments to this operation are the key and the value. * REMOVE or DELETE: remove a ( k e y , v a l u e ) {displaystyle (key,value)} pair from the collection, unbinding a given key from its value. The argument to this operation is the key. * LOOKUP: find the value (if any) that is bound to a given key. The argument to this operation is the key, and the value is returned from the operation. If no value is found, some associative array implementations raise an exception . Often then instead of add or reassign there is a single SET operation that adds a new ( k e y , v a l u e ) {displaystyle (key,value)} pair if one does not already exist, and otherwise reassigns it. In addition, associative arrays may also include other operations such as determining the number of bindings or constructing an iterator to loop over all the bindings. Usually, for such an operation, the order in which the bindings are returned may be arbitrary. A multimap generalizes an associative array by allowing multiple values to be associated with a single key. A bidirectional map is a related abstract data type in which the bindings operate in both directions: each value must be associated with a unique key, and a second lookup operation takes a value as argument and looks up the key associated with that value. EXAMPLE Suppose that the set of loans made by a library is represented in a
data structure. Each book in a library may be checked out only by a
single library patron at a time. However, a single patron may be able
to check out multiple books. Therefore, the information about which
books are checked out to which patrons may be represented by an
associative array, in which the books are the keys and the patrons are
the values. Using notation from Python or
JSON
{ "Pride and Prejudice": "Alice", "Wuthering Heights": "Alice", "Great Expectations": "John" } A lookup operation on the key "Great Expectations" would return "John". If John returns his book, that would cause a deletion operation, and if Pat checks out a book, that would cause an insertion operation, leading to a different state: { "Pride and Prejudice": "Alice", "The Brothers Karamazov": "Pat", "Wuthering Heights": "Alice" } IMPLEMENTATION For dictionaries with very small numbers of bindings, it may make sense to implement the dictionary using an association list , a linked list of bindings. With this implementation, the time to perform the basic dictionary operations is linear in the total number of bindings; however, it is easy to implement and the constant factors in its running time are small. Another very simple implementation technique, usable when the keys are restricted to a narrow range of integers, is direct addressing into an array: the value for a given key k is stored at the array cell A, or if there is no binding for k then the cell stores a special sentinel value that indicates the absence of a binding. As well as being simple, this technique is fast: each dictionary operation takes constant time. However, the space requirement for this structure is the size of the entire keyspace, making it impractical unless the keyspace is small. The two major approaches to implementing dictionaries are a hash table or a search tree . HASH TABLE IMPLEMENTATIONS The most frequently used general purpose implementation of an associative array is with a hash table : an array combined with a hash function that separates each key into a separate "bucket" of the array. The basic idea behind a hash table is that accessing an element of an array via its index is a simple, constanttime operation. Therefore, the average overhead of an operation for a hash table is only the computation of the key's hash, combined with accessing the corresponding bucket within the array. As such, hash tables usually perform in O(1) time, and outperform alternatives in most situations. Hash tables need to be able to handle collisions : when the hash function maps two different keys to the same bucket of the array. The two most widespread approaches to this problem are separate chaining and open addressing . In separate chaining, the array does not store the value itself but stores a pointer to another container, usually an association list , that stores all of the values matching the hash. On the other hand, in open addressing, if a hash collision is found, then the table seeks an empty spot in an array to store the value in a deterministic manner, usually by looking at the next immediate position in the array. This graph compares the average number of cache misses required to look up elements in tables with separate chaining and open addressing. Open addressing has a lower cache miss ratio than separate chaining when the table is mostly empty. However, as the table becomes filled with more elements, open addressing's performance degrades exponentially. Additionally, separate chaining uses less memory in most cases, unless the entries are very small (less than four times the size of a pointer). TREE IMPLEMENTATIONS Main article:
Search tree
Selfbalancing Binary Search Trees Another common approach is to implement an associative array with a selfbalancing binary search tree , such as an AVL tree or a redblack tree . Compared to hash tables, these structures have both advantages and weaknesses. The worstcase performance of selfbalancing binary search trees is significantly better than that of a hash table, with a time complexity in big O notation of O(log n). This is in contrast to hash tables, whose worstcase performance involves all elements sharing a single bucket, resulting in O(n) time complexity. In addition, and like all binary search trees, selfbalancing binary search trees keep their elements in order. Thus, traversing its elements follows a leasttogreatest pattern, whereas traversing a hash table can result in elements being in seemingly random order. However, hash tables have a much better averagecase time complexity than selfbalancing binary search trees of O(1), and their worstcase performance is highly unlikely when a good hash function is used. It is worth noting that a selfbalancing binary search tree can be used to implement the buckets for a hash table that uses separate chaining. This allows for averagecase constant lookup, but assures a worstcase performance of O(log n). However, this introduces extra complexity into the implementation, and may cause even worse performance for smaller hash tables, where the time spent inserting into and balancing the tree is greater than the time needed to perform a linear search on all of the elements of a linked list or similar data structure. Other Trees Associative arrays may also be stored in unbalanced binary search trees or in data structures specialized to a particular type of keys such as radix trees , tries , Judy arrays , or van Emde Boas trees , but these implementation methods are less efficient than hash tables as well as placing greater restrictions on the types of data that they can handle. The advantages of these alternative structures come from their ability to handle operations beyond the basic ones of an associative array, such as finding the binding whose key is the closest to a queried key, when the query is not itself present in the set of bindings. COMPARISON UNDERLYING DATA STRUCTURE LOOKUP INSERTION DELETION ORDERED AVERAGE WORST CASE AVERAGE WORST CASE AVERAGE WORST CASE HASH TABLE O(1) O(n) O(1) O(n) O(1) O(n) No SELFBALANCING BINARY SEARCH TREE O(log n) O(log n) O(log n) O(log n) O(log n) O(log n) Yes UNBALANCED BINARY SEARCH TREE O(log n) O(n) O(log n) O(n) O(log n) O(n) Yes Sequential container of keyvalue pairs (e.g. association list ) O(n) O(n) O(1) O(1) O(n) O(n) No LANGUAGE SUPPORT Main article: Comparison of programming languages (mapping) Associative arrays can be implemented in any programming language as a package and many language systems provide them as part of their standard library. In some languages, they are not only built into the standard system, but have special syntax, often using arraylike subscripting. Builtin syntactic support for associative arrays was introduced by
SNOBOL 4, under the name "table".
MUMPS made multidimensional
associative arrays, optionally persistent, its key data structure.
SETL supported them as one possible implementation of sets and maps.
Most modern scripting languages, starting with
AWK and including Rexx
,
Perl
In
Smalltalk
PERMANENT STORAGE Main article: Keyvalue store Most programs using associative arrays will at some point need to store that data in a more permanent form, like in a computer file . A common solution to this problem is a generalized concept known as archiving or serialization , which produces a text or binary representation of the original objects that can be written directly to a file. This is most commonly implemented in the underlying object model, like .Net or Cocoa, which include standard functions that convert the internal data into text form. The program can create a complete text representation of any group of objects by calling these methods, which are almost always already implemented in the base associative array class. For programs that use very large data sets, this sort of individual file storage is not appropriate, and a database management system (DB) is required. Some DB systems natively store associative arrays by serializing the data and then storing that serialized data and the key. Individual arrays can then be loaded or saved from the database using the key to refer to them. These keyvalue stores have been used for many years and have a history as long as that as the more common relational database (RDBs), but a lack of standardization, among other reasons, limited their use to certain niche roles. RDBs were used for these roles in most cases, although saving objects to a RDB can be complicated, a problem known as objectrelational impedance mismatch . After c. 2010, the need for high performance databases suitable for cloud computing and more closely matching the internal structure of the programs using them led to a renaissance in the keyvalue store market. These systems can store and retrieve associative arrays in a native fashion, which can greatly improve performance in common webrelated workflows. SEE ALSO * Computer programming portal *
Keyvalue database
*
Tuple
REFERENCES * ^ A B C D E F Goodrich, Michael T. ; Tamassia, Roberto (2006),
"9.1 The Map Abstract Data Type", Data Structures & Algorithms in Java
(4th ed.), Wiley, pp. 368–371
* ^ A B C D E Mehlhorn, Kurt ; Sanders, Peter (2008), "4 Hash
Tables and Associative Arrays", Algorithms and Data Structures: The
Basic Toolbox (PDF), Springer, pp. 81–98
* ^ Anderson, Arne (1989). "Optimal Bounds on the Dictionary
Problem". Proc. Symposium on Optimal Algorithms. Springer Verlag:
106–114.
* ^ A B C D Cormen, Thomas H. ; Leiserson, Charles E. ; Rivest,
Ronald L. ; Stein, Clifford (2001), "11 Hash Tables", Introduction to
Algorithms (2nd ed.),
MIT Press
* ^ "Archives and Serializations Programming Guide", Apple Inc., 2012 EXTERNAL LINKS
