Open addressing, or closed hashing, is a method of
collision resolution in hash tables. With this method a hash collision is resolved by probing, or searching through alternative locations in the array (the ''probe sequence'') until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table.
Well-known probe sequences include:
;
Linear probing
Linear probing is a scheme in computer programming for resolving hash collision, collisions in hash tables, data structures for maintaining a collection of Attribute–value pair, key–value pairs and looking up the value associated with a giv ...
: in which the interval between probes is fixed — often set to 1.
;
Quadratic probing
Quadratic probing is an open addressing scheme in computer programming for resolving hash collisions in hash tables. Quadratic probing operates by taking the original hash index and adding successive values of an arbitrary quadratic polynomial un ...
: in which the interval between probes increases linearly (hence, the indices are described by a quadratic function).
;
Double hashing
Double hashing is a computer programming technique used in conjunction with open addressing in hash tables to resolve hash collisions, by using a secondary hash of the key as an offset when a collision occurs. Double hashing with open addressing is ...
: in which the interval between probes is fixed for each record but is computed by another hash function.
The main trade offs between these methods are that linear probing has the best
cache performance but is most sensitive to
clustering, while double hashing has poor cache performance but exhibits virtually no clustering; quadratic probing falls in between in both areas. Double hashing can also require more computation than other forms of probing.
Some open addressing methods, such as
Hopscotch hashing
Hopscotch hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a hash table, table using open addressing. It is also well suited for implementing a concurrent hash table. Hopscotch hashing was in ...
,
Robin Hood hashing,
last-come-first-served hashing and
cuckoo hashing move existing keys around in the array to make room for the new key. This gives better maximum search times than the methods based on probing.
A critical influence on performance of an open addressing hash table is the ''load factor''; that is, the proportion of the slots in the array that are used. As the load factor increases towards 100%, the number of probes that may be required to find or insert a given key rises dramatically. Once the table becomes full, probing algorithms may even fail to terminate. Even with good hash functions, load factors are normally limited to 80%. A poor hash function can exhibit poor performance even at very low load factors by generating significant clustering, especially with the simplest linear addressing method. Generally typical load factors with most open addressing methods are 50%, while
separate chaining typically can use up to 100%.
Example pseudocode
The following
pseudocode
In computer science, pseudocode is a description of the steps in an algorithm using a mix of conventions of programming languages (like assignment operator, conditional operator, loop) with informal, usually self-explanatory, notation of actio ...
is an implementation of an open addressing hash table with linear probing and single-slot stepping, a common approach that is effective if the hash function is good. Each of the lookup, set and remove functions use a common internal function find_slot to locate the array slot that either does or should contain a given key.
record pair
var pair slot
slot
..., slot
um_slots - 1
function find_slot(key)
i := hash(key) modulo num_slots
''// search until we either find the key, or find an empty slot.''
while (slot
is occupied) and (slot
key ≠ key)
i := (i + 1) modulo num_slots
return i
function lookup(key)
i := find_slot(key)
if slot
is occupied ''// key is in table''
return slot
value
else ''// key is not in table''
return not found
function set(key, value)
i := find_slot(key)
if slot
is occupied ''// we found our key''
slot
value := value
return
if the table is almost full
rebuild the table larger ''(note 1)''
i := find_slot(key)
mark slot
as occupied
slot
key := key
slot
value := value
; note 1 : Rebuilding the table requires allocating a larger array and recursively using the set operation to insert all the elements of the old array into the new larger array. It is common to increase the array size
exponentially, for example by doubling the old array size.
function remove(key)
i := find_slot(key)
if slot
is unoccupied
return ''// key is not in the table''
mark slot
as unoccupied
j := i
loop ''(note 2)''
j := (j + 1) modulo num_slots
if slot
is unoccupied
exit loop
k := hash(slot
key) modulo num_slots
''// determine if k lies cyclically in (i,j]''
''// i ≤ j: , i..k..j , ''
''// i > j: , .k..j i...., or , ....j i..k., ''
if i ≤ j
if (i < k) and (k ≤ j)
continue loop
else
if (k ≤ j) or (i < k)
continue loop
mark slot
as occupied
slot
key := slot
key
slot
value := slot
value
mark slot
as unoccupied
i := j
; note 2 : For all records in a cluster, there must be no vacant slots between their natural hash position and their current position (else lookups will terminate before finding the record). At this point in the pseudocode, is a vacant slot that might be invalidating this property for subsequent records in the cluster. is such a subsequent record. is the raw hash where the record at would naturally land in the hash table if there were no collisions. This test is asking if the record at is invalidly positioned with respect to the required properties of a cluster now that is vacant.
Another technique for removal is simply to mark the slot as deleted. However this eventually requires rebuilding the table simply to remove deleted records. The methods above provide ''O''(1) updating and removal of existing records, with occasional rebuilding if the high-water mark of the table size grows.
The ''O''(1) remove method above is only possible in linearly probed hash tables with single-slot stepping. In the case where many records are to be deleted in one operation, marking the slots for deletion and later rebuilding may be more efficient.
Performance
Assuming an ideal hash function (one that uniformly distributes all elements of the universe), and a random choice of elements from the universe, the performance of the linear probing method is:
Here ''n'' is the number of elements in the table, ''m'' the table size,
the load factor,
the number of probes in an unsucessful search and
the number of probes in a successful search.
[
{{cite book
, last1 = Baeza-Yates
, first1 = Ricardo
, last2 = Poblete
, first2 = Patricio V.
, editor = Atallah
, title = Algorithms and Theory of Computation Handbook
, chapter = Chapter 2: Searching
, publisher = CRC Press
, year = 1999
, pages = 2-7
, isbn = 0849326494
] Note that the expectation deteriorates to infinity when the load factor approaches 1.
See also
*
Lazy deletion – a method of deleting from a hash table using open addressing.
References
Hashing