Hopscotch Hashing
   HOME

TheInfoList



OR:

Hopscotch hashing is a scheme in
computer programming Computer programming is the process of performing a particular computation (or more generally, accomplishing a specific computing result), usually by designing and building an executable computer program. Programming involves tasks such as anal ...
for resolving hash collisions of values of
hash function A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually ...
s in a
table Table may refer to: * Table (furniture), a piece of furniture with a flat surface and one or more legs * Table (landform), a flat area of land * Table (information), a data arrangement with rows and columns * Table (database), how the table data ...
using
open addressing Open addressing, or closed hashing, is a method of collision resolution in hash tables. With this method a hash collision is resolved by probing, or searching through alternative locations in the array (the ''probe sequence'') until either the t ...
. It is also well suited for implementing a
concurrent hash table A concurrent hash table (concurrent hash map) is an implementation of hash tables allowing ''concurrent access'' by ''multiple'' threads using a hash function. Concurrent hash tables represent a key concurrent data structure for use in concur ...
. Hopscotch hashing was introduced by
Maurice Herlihy Maurice Peter Herlihy (born 4 January 1954) is a computer scientist active in the field of multiprocessor synchronization. Herlihy has contributed to areas including theoretical foundations of wait-free synchronization, linearizable data structu ...
, Nir Shavit and Moran Tzafrir in 2008. The name is derived from the sequence of hops that characterize the table's insertion algorithm. The algorithm uses a single array of ''n'' buckets. For each bucket, its ''neighborhood'' is a small collection of ''H'' consecutive buckets (i.e. ones with indices close to the original hashed bucket). The desired property of the neighborhood is that the cost of finding an item in the buckets of the neighborhood is close to the cost of finding it in the bucket itself (for example, by having buckets in the neighborhood fall within the same
cache line A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, whi ...
). The size of the neighborhood must be sufficient to accommodate a logarithmic number of items in the worst case (i.e. it must accommodate log(''n'') items), but only a constant number on average. If some bucket's neighborhood is filled, the table is resized. In hopscotch hashing, as in
cuckoo hashing Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table, with worst-case constant lookup time. The name derives from the behavior of some species of cuckoo, where the cuckoo chick ...
, and unlike in
linear probing Linear probing is a scheme in computer programming for resolving collisions in hash tables, data structures for maintaining a collection of key–value pairs and looking up the value associated with a given key. It was invented in 1954 by Gene ...
, a given item will always be inserted-into and found-in the neighborhood of its hashed bucket. In other words, it will always be found either in its original hashed array entry, or in one of the next ''H''−1 neighboring entries. ''H'' could, for example, be 32, a common machine word size. The neighborhood is thus a "virtual" bucket that has fixed size and overlaps with the following ''H''−1 buckets. To speed the search, each bucket (array entry) includes a "hop-information" word, an ''H''-bit bitmap that indicates which of the next ''H''−1 entries contain items that hashed to the current entry's virtual bucket. In this way, an item can be found quickly by looking at the word to see which entries belong to the bucket, and then scanning through the constant number of entries (most modern processors support special bit manipulation operations that make the lookup in the "hop-information" bitmap very fast). Here is how to add item ''x'' which was hashed to bucket ''i'': # If the hop-information word for bucket ''i'' shows there are already ''H'' items in this bucket, the table is full; expand the hash table and try again. # Starting at entry ''i'', use a linear probe to find an empty entry at index ''j''. (If no empty slot exists, the table is full.) # While (''j''−''i'') mod ''n'' ≥ ''H'', move the empty slot toward ''i'' as follows: ## Search the ''H''−1 slots preceding ''j'' for an item ''y'' whose hash value ''k'' is within ''H''−1 of ''j'', i.e. (''j''−''k'') mod ''n'' < ''H''. (This can be done using the hop-information words.) ## If no such item ''y'' exists within the range, the table is full. ## Move ''y'' to ''j'', creating a new empty slot closer to ''i''. ## Set ''j'' to the empty slot vacated by ''y'' and repeat. # Store ''x'' in slot ''j'' and return. The idea is that hopscotch hashing "moves the empty slot towards the desired bucket". This distinguishes it from
linear probing Linear probing is a scheme in computer programming for resolving collisions in hash tables, data structures for maintaining a collection of key–value pairs and looking up the value associated with a given key. It was invented in 1954 by Gene ...
which leaves the empty slot where it was found, possibly far away from the original bucket, or from
cuckoo hashing Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table, with worst-case constant lookup time. The name derives from the behavior of some species of cuckoo, where the cuckoo chick ...
which, in order to create a free bucket, moves an item out of one of the desired buckets in the target arrays, and only then tries to find the displaced item a new place. To remove an item from the table, one simply removes it from the table entry. If the neighborhood buckets are cache aligned, then one could apply a reorganization operation in which items are moved into the now vacant location in order to improve alignment. One advantage of hopscotch hashing is that it provides good performance at very high table load factors, even ones exceeding 0.9. Part of this efficiency is due to using a linear probe only to find an empty slot during insertion, not for every lookup as in the original
linear probing Linear probing is a scheme in computer programming for resolving collisions in hash tables, data structures for maintaining a collection of key–value pairs and looking up the value associated with a given key. It was invented in 1954 by Gene ...
hash table algorithm. Another advantage is that one can use any hash function, in particular simple ones that are close to universal.


See also

*
Cuckoo hashing Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table, with worst-case constant lookup time. The name derives from the behavior of some species of cuckoo, where the cuckoo chick ...
* Hash collision *
Hash function A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually ...
*
Linear probing Linear probing is a scheme in computer programming for resolving collisions in hash tables, data structures for maintaining a collection of key–value pairs and looking up the value associated with a given key. It was invented in 1954 by Gene ...
*
Open addressing Open addressing, or closed hashing, is a method of collision resolution in hash tables. With this method a hash collision is resolved by probing, or searching through alternative locations in the array (the ''probe sequence'') until either the t ...
* Perfect hashing * Quadratic probing


References


External links


libhhash - a C hopscotch hashing implementationhopscotch-map - a C++ implementation of a hash map using hopscotch hashing
* {{cite web , title=Hopscotch hashing , first=Emmanuel , last=Goossaert , date=11 August 2013 , access-date=16 October 2019 , url=http://codecapsule.com/2013/08/11/hopscotch-hashing/ A detailed description and example implementation. Search algorithms Hashing