computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...

, an associative array, map, symbol table, or dictionary is an

abstract data type In computer science, an abstract data type (ADT) is a mathematical model for data types. An abstract data type is defined by its behavior (semantics) from the point of view of a ''user'', of the data, specifically in terms of possible values, pos ...

that stores a

collection Collection or Collections may refer to: * Cash collection, the function of an accounts receivable department * Collection (church), money donated by the congregation during a church service * Collection agency, agency to collect cash * Collectio ...

of (key, value) pairs, such that each possible key appears at most once in the collection. In mathematical terms an associative array is a

function Function or functionality may refer to: Computing * Function key, a type of key on computer keyboards * Function model, a structured representation of processes in a system * Function object or functor or functionoid, a concept of object-oriente ...

with ''finite''

domain Domain may refer to: Mathematics *Domain of a function, the set of input values for which the (total) function is defined **Domain of definition of a partial function **Natural domain of a partial function **Domain of holomorphy of a function * Do ...

. It supports 'lookup', 'remove', and 'insert' operations. The dictionary problem is the classic problem of designing efficient

data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...

s that implement associative arrays. The two major solutions to the dictionary problem are

hash table In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', als ...

s and

search tree In computer science, a search tree is a tree data structure used for locating specific keys from within a set. In order for a tree to function as a search tree, the key for each node must be greater than any keys in subtrees on the left, and less ...

s..Dietzfelbinger, M., Karlin, A., Mehlhorn, K., Meyer auf der Heide, F., Rohnert, H., and Tarjan, R. E. 1994
"Dynamic Perfect Hashing: Upper and Lower Bounds"
. SIAM J. Comput. 23, 4 (Aug. 1994), 738-761. http://portal.acm.org/citation.cfm?id=182370 In some cases it is also possible to solve the problem using directly addressed

array An array is a systematic arrangement of similar objects, usually in rows and columns. Things called an array include: {{TOC right Music * In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...

binary search tree In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and ...

s, or other more specialized structures. Many programming languages include associative arrays as

primitive data type In computer science, primitive data types are a set of basic data types from which all other data types are constructed. Specifically it often refers to the limited set of data representations in use by a particular processor, which all compiled pr ...

s, and they are available in

software libraries In computer science, a library is a collection of non-volatile resources used by computer programs, often for software development. These may include configuration data, documentation, help data, message templates, pre-written code and subro ...

for many others.

Content-addressable memory Content-addressable memory (CAM) is a special type of computer memory used in certain very-high-speed searching applications. It is also known as associative memory or associative storage and compares input search data against a table of stored d ...

is a form of direct hardware-level support for associative arrays. Associative arrays have many applications including such fundamental

programming pattern In software engineering, a software design pattern is a general, reusable solution to a commonly occurring problem within a given context in software design. It is not a finished design that can be transformed directly into source or machine cod ...

s as

memoization In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again. Memoization ...

and the

decorator pattern In object-oriented programming, the decorator pattern is a design pattern that allows behavior to be added to an individual object, dynamically, without affecting the behavior of other objects from the same class. The decorator pattern is often ...

., pp. 597–599. The name does not come from the

associative property In mathematics, the associative property is a property of some binary operations, which means that rearranging the parentheses in an expression will not change the result. In propositional logic, associativity is a valid rule of replacement ...

known in mathematics. Rather, it arises from the fact that values are associated with keys. It is not to be confused with associative processors.

Operations

In an associative array, the association between a key and a value is often known as a "mapping", and the same word mapping may also be used to refer to the process of creating a new association. The operations that are usually defined for an associative array are: * Insert or put: add a new

(key, value)

pair to the collection, mapping the key to its new value. Any existing mapping is overwritten. The arguments to this operation are the key and the value. * Remove or delete: remove a

(key, value)

pair from the collection, unmapping a given key from its value. The argument to this operation is the key. * Lookup, find, or get: find the value (if any) that is bound to a given key. The argument to this operation is the key, and the value is returned from the operation. If no value is found, some lookup functions raise an exception, while others return a default value (zero, null, specific value passed to the constructor, ...). In addition, associative arrays may also include other operations such as determining the number of mappings or constructing an

iterator In computer programming, an iterator is an object that enables a programmer to traverse a container, particularly lists. Various types of iterators are often provided via a container's interface. Though the interface and semantics of a given iterat ...

to loop over all the mappings. Usually, for such an operation, the order in which the mappings are returned may be implementation-defined. A

multimap In computer science, a multimap (sometimes also multihash, multidict or multidictionary) is a generalization of a map or associative array abstract data type in which more than one value may be associated with and returned for a given key. Both m ...

generalizes an associative array by allowing multiple values to be associated with a single key. A

bidirectional map In computer science, a bidirectional map is an associative data structure in which the (key, value) pairs form a one-to-one correspondence. Thus the binary relation is functional in each direction: each value can also be mapped to a unique key. A ...

is a related abstract data type in which the mappings operate in both directions: each value must be associated with a unique key, and a second lookup operation takes a value as an argument and looks up the key associated with that value.

Properties

The operations of the associative array should satisfy various properties: *

lookup(k, insert(j, v, D)) = if k 
 j then v else lookup(k, D)

* lookup(k, new()) = fail, where fail is an exception or default value *

remove(k, insert(j, v, D)) = if k 
 j then remove(k, D) else insert(j, v, remove(k, D))

* remove(k, new()) = new() where k and j are keys, v is a value, D is an associative array, and new() creates a new, empty associative array.

Example

Suppose that the set of loans made by a library is represented in a data structure. Each book in a library may be checked out only by a single library patron at a time. However, a single patron may be able to check out multiple books. Therefore, the information about which books are checked out to which patrons may be represented by an associative array, in which the books are the keys and the patrons are the values. Using notation from

Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...

JSON JSON (JavaScript Object Notation, pronounced ; also ) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other ser ...

, the data structure would be: A lookup operation on the key "Great Expectations" would return "John". If John returns his book, that would cause a deletion operation, and if Pat checks out a book, that would cause an insertion operation, leading to a different state:

Implementation

For dictionaries with very small numbers of mappings, it may make sense to implement the dictionary using an

association list In computer programming and particularly in Lisp, an association list, often referred to as an alist, is a linked list in which each list element (or node) comprises a key and a value. The association list is said to ''associate'' the value with ...

, a

linked list In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whic ...

of mappings. With this implementation, the time to perform the basic dictionary operations is linear in the total number of mappings; however, it is easy to implement and the constant factors in its running time are small. Another very simple implementation technique, usable when the keys are restricted to a narrow range, is direct addressing into an array: the value for a given key ''k'' is stored at the array cell ''A'' 'k'' or if there is no mapping for ''k'' then the cell stores a special

sentinel value In computer programming, a sentinel value (also referred to as a flag value, trip value, rogue value, signal value, or dummy data) is a special value in the context of an algorithm which uses its presence as a condition of termination, typically in ...

that indicates the absence of a mapping. As well as being simple, this technique is fast: each dictionary operation takes constant time. However, the space requirement for this structure is the size of the entire keyspace, making it impractical unless the keyspace is small. The two major approaches to implementing dictionaries are a

or a

Hash table implementations

The most frequently used general purpose implementation of an associative array is with a

: an

combined with a

hash function A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually u ...

that separates each key into a separate "bucket" of the array. The basic idea behind a hash table is that accessing an element of an array via its index is a simple, constant-time operation. Therefore, the average overhead of an operation for a hash table is only the computation of the key's hash, combined with accessing the corresponding bucket within the array. As such, hash tables usually perform in O(1) time, and outperform alternatives in most situations. Hash tables need to be able to handle

collisions In physics, a collision is any event in which two or more bodies exert forces on each other in a relatively short time. Although the most common use of the word ''collision'' refers to incidents in which two or more objects collide with great f ...

: when the hash function maps two different keys to the same bucket of the array. The two most widespread approaches to this problem are

separate chaining In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', also ...

and

open addressing Open addressing, or closed hashing, is a method of collision resolution in hash tables. With this method a hash collision is resolved by probing, or searching through alternative locations in the array (the ''probe sequence'') until either the t ...

.. In separate chaining, the array does not store the value itself but stores a pointer to another container, usually an

, that stores all of the values matching the hash. On the other hand, in open addressing, if a hash collision is found, then the table seeks an empty spot in an array to store the value in a deterministic manner, usually by looking at the next immediate position in the array. Open addressing has a lower

cache miss In computing, a cache ( ) is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewher ...

ratio than separate chaining when the table is mostly empty. However, as the table becomes filled with more elements, open addressing's performance degrades exponentially. Additionally, separate chaining uses less memory in most cases, unless the entries are very small (less than four times the size of a pointer).

Tree implementations

Self-balancing binary search trees

Another common approach is to implement an associative array with a

self-balancing binary search tree In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.Donal ...

, such as an

AVL tree In computer science, an AVL tree (named after inventors Adelson-Velsky and Landis) is a self-balancing binary search tree. It was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any nod ...

or a red–black tree. Compared to hash tables, these structures have both advantages and weaknesses. The worst-case performance of self-balancing binary search trees is significantly better than that of a hash table, with a time complexity in

big O notation Big ''O'' notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. Big O is a member of a family of notations invented by Paul Bachmann, Edmund Lan ...

of O(log ''n''). This is in contrast to hash tables, whose worst-case performance involves all elements sharing a single bucket, resulting in O(''n'') time complexity. In addition, and like all binary search trees, self-balancing binary search trees keep their elements in order. Thus, traversing its elements follows a least-to-greatest pattern, whereas traversing a hash table can result in elements being in seemingly random order. Because they are in-order, tree-based maps can also satisfy range queries (find all values between two bounds) where a hashmap can only find exact values. However, hash tables have a much better average-case time complexity than self-balancing binary search trees of O(1), and their worst-case performance is highly unlikely when a good

is used. It is worth noting that a self-balancing binary search tree can be used to implement the buckets for a hash table that uses separate chaining. This allows for average-case constant lookup, but assures a worst-case performance of O(log ''n''). However, this introduces extra complexity into the implementation, and may cause even worse performance for smaller hash tables, where the time spent inserting into and balancing the tree is greater than the time needed to perform a

linear search In computer science, a linear search or sequential search is a method for finding an element within a list. It sequentially checks each element of the list until a match is found or the whole list has been searched. A linear search runs in at ...

on all of the elements of a linked list or similar data structure.

Other trees

Associative arrays may also be stored in unbalanced

s or in data structures specialized to a particular type of keys such as

radix tree In computer science, a radix tree (also radix trie or compact prefix tree or compressed trie) is a data structure that represents a space-optimized trie (prefix tree) in which each node that is the only child is merged with its parent. The resul ...

trie In computer science, a trie, also called digital tree or prefix tree, is a type of ''k''-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes def ...

Judy array Judy is a short form of the name Judith. Judy may refer to: Places * Judy, Kentucky, village in Montgomery County, United States * Judy Woods, woodlands in Bradford, West Yorkshire, England, United Kingdom Animals * Judy (dog) (1936–1950) ...

s, or

van Emde Boas tree A van Emde Boas tree (), also known as a vEB tree or van Emde Boas priority queue, is a tree data structure which implements an associative array with -bit integer keys. It was invented by a team led by Dutch computer scientist Peter van Emde Boa ...

s, though the ability of these implementation methods within comparison to hash tables varies; for instance, Judy trees remain indicated to perform with a smaller quantity of efficiency than hash tables, while carefully selected hash tables generally perform with increased efficiency in comparison to adaptive radix trees, with potentially greater restrictions on the types of data that they can handle. The advantages of these alternative structures come from their ability to handle operations beyond the basic ones of an associative array, such as finding the mapping whose key is the closest to a queried key, when the query is not itself present in the set of mappings.

Comparison

Ordered dictionary

The basic definition of the dictionary does not mandate an order. To guarantee a fixed order of enumeration, ordered versions of the associative array are often used. There are two senses of an ordered dictionary: * The order of enumeration is always deterministic for a given set of keys by sorting. This is the case for tree-based implementations, one representative being the container of C++. * The order of enumeration is key-independent and is instead based on the order of insertion. This is the case for the "ordered dictionary" in

.NET Framework The .NET Framework (pronounced as "''dot net"'') is a proprietary software framework developed by Microsoft that runs primarily on Microsoft Windows. It was the predominant implementation of the Common Language Infrastructure (CLI) until bein ...

, the LikedHashMap of

Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...

and

. The latter sense of ordered dictionaries are more commonly encountered. They can be implemented using an

, by overlaying a

doubly linked list In computer science, a doubly linked list is a linked data structure that consists of a set of sequentially linked record (computer science), records called node (computer science), nodes. Each node contains three field (computer science), fields: ...

on top of a normal dictionary, or by moving the actual data out of the sparse (unordered) array and into a dense insertion-ordered one.

Language support

Associative arrays can be implemented in any programming language as a package and many language systems provide them as part of their standard library. In some languages, they are not only built into the standard system, but have special syntax, often using array-like subscripting. Built-in syntactic support for associative arrays was introduced in 1969 by

SNOBOL4 SNOBOL ("StriNg Oriented and symBOlic Language") is a series of programming languages developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph E. Griswold and Ivan P. Polonsky, culminating in SNOBOL4. It was one of ...

, under the name "table". TMG offered tables with string keys and integer values.

MUMPS MUMPS ("Massachusetts General Hospital Utility Multi-Programming System"), or M, is an imperative, high-level programming language with an integrated transaction processing key–value database. It was originally developed at Massachusetts Gener ...

made multi-dimensional associative arrays, optionally persistent, its key data structure.

SETL SETL (SET Language) is a very high-level programming language based on the mathematical theory of sets. It was originally developed by (Jack) Jacob T. Schwartz at the New York University (NYU) Courant Institute of Mathematical Sciences in the ...

supported them as one possible implementation of sets and maps. Most modern scripting languages, starting with

AWK AWK (''awk'') is a domain-specific language designed for text processing and typically used as a data extraction and reporting tool. Like sed and grep, it is a filter, and is a standard feature of most Unix-like operating systems. The AWK lang ...

and including

Rexx Rexx (Restructured Extended Executor) is a programming language that can be interpreted or compiled. It was developed at IBM by Mike Cowlishaw. It is a structured, high-level programming language designed for ease of learning and reading. ...

Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offici ...

PHP PHP is a general-purpose scripting language geared toward web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by The PHP Group ...

Tcl TCL or Tcl or TCLs may refer to: Business * TCL Technology, a Chinese consumer electronics and appliance company **TCL Electronics, a subsidiary of TCL Technology * Texas Collegiate League, a collegiate baseball league * Trade Centre Limited ...

JavaScript JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of Website, websites use JavaScript on the Client (computing), client side ...

Maple ''Acer'' () is a genus of trees and shrubs commonly known as maples. The genus is placed in the family Sapindaceae.Stevens, P. F. (2001 onwards). Angiosperm Phylogeny Website. Version 9, June 2008 nd more or less continuously updated since http ...

Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sa ...

Wolfram Language The Wolfram Language ( ) is a general multi-paradigm programming language developed by Wolfram Research. It emphasizes symbolic computation, functional programming, and rule-based programming and can employ arbitrary structures and data. It is ...

, Go, and

Lua Lua or LUA may refer to: Science and technology * Lua (programming language) * Latvia University of Agriculture * Last universal ancestor, in evolution Ethnicity and language * Lua people, of Laos * Lawa people, of Thailand sometimes referred t ...

, support associative arrays as a primary container type. In many more languages, they are available as library functions without special syntax. In

Smalltalk Smalltalk is an object-oriented, dynamically typed reflective programming language. It was designed and created in part for educational use, specifically for constructionist learning, at the Learning Research Group (LRG) of Xerox PARC by Alan Ka ...

Objective-C Objective-C is a general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was selected by NeXT for its NeXTS ...

, .NET,

REALbasic The Xojo programming environment and programming language is developed and commercially marketed by Xojo, Inc. of Austin, Texas for software development targeting macOS, Microsoft Windows, Linux, iOS, the Web and Raspberry Pi. Xojo uses a propri ...

Swift Swift or SWIFT most commonly refers to: * SWIFT, an international organization facilitating transactions between banks ** SWIFT code * Swift (programming language) * Swift (bird), a family of birds It may also refer to: Organizations * SWIFT, ...

, VBA and

Delphi Delphi (; ), in legend previously called Pytho (Πυθώ), in ancient times was a sacred precinct that served as the seat of Pythia, the major oracle who was consulted about important decisions throughout the ancient classical world. The oracle ...

they are called ''dictionaries''; in

and

Seed7 Seed7 is an extensible general-purpose programming language designed by Thomas Mertes. It is syntactically similar to Pascal and Ada. Along with many other features, it provides an extension mechanism. Daniel Zingaro"Modern Extensible Languages" ...

they are called ''hashes''; in

C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...

, Go,

Clojure Clojure (, like ''closure'') is a dynamic and functional dialect of the Lisp programming language on the Java platform. Like other Lisp dialects, Clojure treats code as data and has a Lisp macro system. The current development process is comm ...

, Scala,

OCaml OCaml ( , formerly Objective Caml) is a general-purpose programming language, general-purpose, multi-paradigm programming language which extends the Caml dialect of ML (programming language), ML with object-oriented programming, object-oriented ...

Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...

they are called ''maps'' (see map (C++), unordered_map (C++), and ); in

Common Lisp Common Lisp (CL) is a dialect of the Lisp programming language, published in ANSI standard document ''ANSI INCITS 226-1994 (S20018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived fro ...

and

Windows PowerShell PowerShell is a task automation and configuration management program from Microsoft, consisting of a command-line shell and the associated scripting language. Initially a Windows component only, known as Windows PowerShell, it was made open-sourc ...

, they are called ''hash tables'' (since both typically use this implementation); in

and Lua, they are called ''tables''. In

, all arrays can be associative, except that the keys are limited to integers and strings. In JavaScript (see also

), all objects behave as associative arrays with string-valued keys, while the Map and WeakMap types take arbitrary objects as keys. In Lua, they are used as the primitive building block for all data structures. In

Visual FoxPro Visual FoxPro is a Microsoft data-centric procedural programming language with object-oriented programming (OOP) features. It was derived from FoxPro (originally known as FoxBASE) which was developed by Fox Software beginning in 1984. Fox Techn ...

, they are called ''Collections''. The D language also has support for associative arrays.

Permanent storage

Many programs using associative arrays will at some point need to store that data in a more permanent form, like in a

computer file A computer file is a computer resource for recording data in a computer storage device, primarily identified by its file name. Just as words can be written to paper, so can data be written to a computer file. Files can be shared with and transfe ...

. A common solution to this problem is a generalized concept known as ''archiving'' or ''

serialization In computing, serialization (or serialisation) is the process of translating a data structure or object state into a format that can be stored (e.g. files in secondary storage devices, data buffers in primary storage devices) or transmitted (e ...

'', which produces a text or binary representation of the original objects that can be written directly to a file. This is most commonly implemented in the underlying object model, like .Net or Cocoa, which include standard functions that convert the internal data into text form. The program can create a complete text representation of any group of objects by calling these methods, which are almost always already implemented in the base associative array class."Archives and Serializations Programming Guide"
Apple Inc., 2012 For programs that use very large data sets, this sort of individual file storage is not appropriate, and a

database management system In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases span ...

(DB) is required. Some DB systems natively store associative arrays by serializing the data and then storing that serialized data and the key. Individual arrays can then be loaded or saved from the database using the key to refer to them. These key–value stores have been used for many years and have a history as long as that as the more common

relational database A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relatio ...

(RDBs), but a lack of standardization, among other reasons, limited their use to certain niche roles. RDBs were used for these roles in most cases, although saving objects to a RDB can be complicated, a problem known as object-relational impedance mismatch. After , the need for high performance databases suitable for

cloud computing Cloud computing is the on-demand availability of computer system resources, especially data storage ( cloud storage) and computing power, without direct active management by the user. Large clouds often have functions distributed over mul ...

and more closely matching the internal structure of the programs using them led to a renaissance in the key–value store market. These systems can store and retrieve associative arrays in a native fashion, which can greatly improve performance in common web-related workflows.

References

External links

NIST's Dictionary of Algorithms and Data Structures: Associative Array
{{Data types Abstract data types * Composite data types Data types