
In
computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, a hash table is a
data structure
In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
that implements an
associative array
In computer science, an associative array, key-value store, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In math ...
, also called a dictionary or simply map; an associative array is an
abstract data type
In computer science, an abstract data type (ADT) is a mathematical model for data types, defined by its behavior (semantics) from the point of view of a '' user'' of the data, specifically in terms of possible values, possible operations on data ...
that maps
keys
Key, Keys, The Key or The Keys may refer to:
Common uses
* Key (cryptography), a piece of information needed to encode or decode a message
* Key (instrument), a component of a musical instrument
* Key (lock), a device used to operate a lock
* ...
to
values
In ethics and social sciences, value denotes the degree of importance of some thing or action, with the aim of determining which actions are best to do or what way is best to live ( normative ethics), or to describe the significance of different a ...
.
A hash table uses a
hash function
A hash function is any Function (mathematics), function that can be used to map data (computing), data of arbitrary size to fixed-size values, though there are some hash functions that support variable-length output. The values returned by a ...
to compute an ''index'', also called a ''hash code'', into an array of ''buckets'' or ''slots'', from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored. A map implemented by a hash table is called a hash map.
Most hash table designs employ an
imperfect hash function.
Hash collisions, where the hash function generates the same index for more than one key, therefore typically must be accommodated in some way.
In a well-dimensioned hash table, the average time complexity for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of
key–value pairs, at
amortized
In computer science, amortized analysis is a method for analyzing a given algorithm's complexity, or how much of a resource, especially time or memory, it takes to execute. The motivation for amortized analysis is that looking at the worst-case ...
constant average cost per operation.
Hashing is an example of a
space-time tradeoff. If
memory
Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembe ...
is infinite, the entire key can be used directly as an index to locate its value with a single memory access. On the other hand, if infinite time is available, values can be stored without regard for their keys, and a
binary search
In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...
or
linear search
In computer science, linear search or sequential search is a method for finding an element within a list. It sequentially checks each element of the list until a match is found or the whole list has been searched.
A linear search runs in linea ...
can be used to retrieve the element.
In many situations, hash tables turn out to be on average more efficient than
search tree
In computer science, a search tree is a tree data structure used for locating specific keys from within a set. In order for a tree to function as a search tree, the key for each node must be greater than any keys in subtrees on the left, and les ...
s or any other
table
Table may refer to:
* Table (database), how the table data arrangement is used within the databases
* Table (furniture), a piece of furniture with a flat surface and one or more legs
* Table (information), a data arrangement with rows and column ...
lookup structure. For this reason, they are widely used in many kinds of computer
software
Software consists of computer programs that instruct the Execution (computing), execution of a computer. Software also includes design documents and specifications.
The history of software is closely tied to the development of digital comput ...
, particularly for
associative array
In computer science, an associative array, key-value store, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In math ...
s,
database index
A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data withou ...
ing,
caches, and
sets.
History
The idea of hashing arose independently in different places. In January 1953,
Hans Peter Luhn
Hans Peter Luhn (July 1, 1896 – August 19, 1964) was a German-American researcher in the field of computer science and Library & Information Science for IBM, and creator of the Luhn algorithm, KWIC (Key Words In Context) indexing, and s ...
wrote an internal
IBM
International Business Machines Corporation (using the trademark IBM), nicknamed Big Blue, is an American Multinational corporation, multinational technology company headquartered in Armonk, New York, and present in over 175 countries. It is ...
memorandum that used hashing with chaining. The first example of
open addressing
Open addressing, or closed hashing, is a method of Hash table#Collision resolution, collision resolution in hash tables. With this method a hash collision is resolved by probing, or searching through alternative locations in the array (the ''prob ...
was proposed by A. D. Linh, building on Luhn's memorandum.
Around the same time,
Gene Amdahl
Gene Myron Amdahl (November 16, 1922 – November 10, 2015) was an American computer architect and high-tech entrepreneur, chiefly known for his work on mainframe computers at IBM and later his own companies, especially Amdahl Corporation. ...
,
Elaine M. McGraw,
Nathaniel Rochester
Nathaniel Rochester (February 21, 1752 – May 17, 1831) was an American Revolutionary War soldier and land speculator, most noted for founding the settlement which would become Rochester, New York.
Early life
Nathaniel Rochester was the ...
, and
Arthur Samuel Arthur Samuel may refer to:
* Arthur Samuel (computer scientist)
Arthur Lee Samuel (December 5, 1901 – July 29, 1990) was an American pioneer in the field of computer gaming and artificial intelligence. He popularized the term "machine learni ...
of
IBM Research
IBM Research is the research and development division for IBM, an American Multinational corporation, multinational information technology company. IBM Research is headquartered at the Thomas J. Watson Research Center in Yorktown Heights, New York ...
implemented hashing for the
IBM 701
The IBM 701 Electronic Data Processing Machine, known as the Defense Calculator while in development, was IBM’s first commercial scientific computer and its first series production mainframe computer, which was announced to the public on May 2 ...
assembler. Open addressing with linear probing is credited to Amdahl, although
Andrey Ershov
Andrey Petrovich Yershov (; 19 April 1931, Moscow – 8 December 1988, Moscow) was a Soviet computer scientist, notable as a pioneer in systems programming and programming language research.
Donald Knuth considers him to have independently co ...
independently had the same idea.
The term "open addressing" was coined by
W. Wesley Peterson in his article which discusses the problem of search in large files.
The first
published
Publishing is the activities of making information, literature, music, software, and other content, physical or digital, available to the public for sale or free of charge. Traditionally, the term publishing refers to the creation and distribu ...
work on hashing with chaining is credited to
Arnold Dumey, who discussed the idea of using remainder modulo a prime as a hash function. The word "hashing" was first published in an article by Robert Morris. A
theoretical analysis of linear probing was submitted originally by Konheim and Weiss.
Overview
An
associative array
In computer science, an associative array, key-value store, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In math ...
stores a
set
Set, The Set, SET or SETS may refer to:
Science, technology, and mathematics Mathematics
*Set (mathematics), a collection of elements
*Category of sets, the category whose objects and morphisms are sets and total functions, respectively
Electro ...
of (key, value) pairs and allows insertion, deletion, and lookup (search), with the constraint of
unique key
In relational database management systems, a unique key is a candidate key. All the candidate keys of a relation can uniquely identify the records of the relation, but only one of them is used as the primary key of the relation. The remaining candi ...
s. In the hash table implementation of associative arrays, an array
of length
is partially filled with
elements, where
. A key
is hashed using a hash function
to compute an index location