Primary Clustering
   HOME

TheInfoList



OR:

In
computer programming Computer programming is the process of performing a particular computation (or more generally, accomplishing a specific computing result), usually by designing and building an executable computer program. Programming involves tasks such as ana ...
, primary clustering is one of two major failure modes of
open addressing Open addressing, or closed hashing, is a method of collision resolution in hash tables. With this method a hash collision is resolved by probing, or searching through alternative locations in the array (the ''probe sequence'') until either the t ...
based
hash table In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', als ...
s, especially those using
linear probing Linear probing is a scheme in computer programming for resolving collisions in hash tables, data structures for maintaining a collection of key–value pairs and looking up the value associated with a given key. It was invented in 1954 by Gene ...
. It occurs after a
hash collision In computer science, a hash collision or hash clash is when two pieces of data in a hash table share the same hash value. The hash value in this case is derived from a hash function which takes a data input and returns a fixed length of bits. Al ...
causes two of the records in the hash table to hash to the same position, and causes one of the records to be moved to the next location in its probe sequence. Once this happens, the cluster formed by this pair of records is more likely to grow by the addition of even more colliding records, regardless of whether the new records hash to the same location as the first two. This phenomenon causes searches for keys within the cluster to be longer.. For instance, in
linear probing Linear probing is a scheme in computer programming for resolving collisions in hash tables, data structures for maintaining a collection of key–value pairs and looking up the value associated with a given key. It was invented in 1954 by Gene ...
, a record involved in a collision is always moved to the next available hash table cell subsequent to the position given by its hash function, creating a contiguous cluster of occupied hash table cells. Whenever another record is hashed to anywhere within the cluster, it grows in size by one cell. Because of this phenomenon, it is likely that a linear-probing hash table with a constant load factor (that is, with the size of the table proportional to the number of items it stores) will have some clusters of logarithmic length, and will take logarithmic time to search for the keys within that cluster.. A related phenomenon, secondary clustering, occurs more generally with open addressing modes including linear probing and
quadratic probing Quadratic probing is an open addressing scheme in computer programming for resolving hash collisions in hash tables. Quadratic probing operates by taking the original hash index and adding successive values of an arbitrary quadratic polynomial unti ...
in which the probe sequence is independent of the key, as well as in hash chaining. In this phenomenon, a low-quality
hash function A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually u ...
may cause many keys to hash to the same location, after which they all follow the same probe sequence or are placed in the same hash chain as each other, causing them to have slow access times. Both types of clustering may be reduced by using a higher-quality hash function, or by using a hashing method such as
double hashing Double hashing is a computer programming technique used in conjunction with open addressing in hash tables to resolve hash collisions, by using a secondary hash of the key as an offset when a collision occurs. Double hashing with open addressing is ...
that is less susceptible to clustering.


References

Hashing {{Comp-sci-stub