Linear hashing (LH) is a dynamic data structure which implements a hash table and grows or shrinks one bucket at a time. It was invented by Witold Litwin in 1980. It has been analyzed by Baeza-Yates and Soza-Pollman.
It is the first in a number of schemes known as dynamic hashing, such as Larson's Linear Hashing with Partial Expansions, Linear Hashing with Priority Splitting, Linear Hashing with Partial Expansions and Priority Splitting, or Recursive Linear Hashing.
The file structure of a dynamic hashing data structure adapts itself to changes in the size of the file, so expensive periodic file reorganization is avoided. A Linear Hashing file expands by splitting a predetermined bucket into two and contracts by merging two predetermined buckets into one. The trigger for a reconstruction depends on the flavor of the scheme; it could be an overflow at a bucket or the load factor (the number of records divided by the number of buckets) moving outside of a predetermined range.
In Linear Hashing there are two types of buckets, those that are to be split and those already split. While extendible hashing splits only overflowing buckets, spiral hashing (a.k.a. spiral storage) distributes records unevenly over the buckets such that buckets with high costs of insertion, deletion, or retrieval are earliest in line for a split.
Linear Hashing has also been made into a scalable distributed data structure, LH*. In LH*, each bucket resides at a different server. LH* itself has been expanded to provide data availability in the presence of failed buckets. Key based operations (inserts, deletes, updates, reads) in LH and LH* take maximum constant time independent of the number of buckets and hence of records.
Algorithm details
Records in LH or LH* consist of a key and a content, the latter comprising all the other attributes of the record. They are stored in buckets. For example, in Ellis' implementation, a bucket is a linked list of records. The file allows the key based CRUD operations create or insert, read, update, and delete, as well as a scan operation that scans all records, for example to do a database select operation on a non-key attribute. Records are stored in buckets whose numbering starts with 0.
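The storage layout above can be sketched as follows. This is a minimal illustration, not the authors' code; the `Record` class and the use of a Python list in place of Ellis' linked list are assumptions for the example.

```python
# Hypothetical sketch of LH storage: each record pairs a key with its content
# (all other attributes); a bucket holds a sequence of records. Ellis'
# implementation uses a linked list per bucket; a Python list plays that
# role here. Bucket numbering starts at 0.

class Record:
    def __init__(self, key, content):
        self.key = key          # the hash key
        self.content = content  # all other attributes of the record

buckets = [[] for _ in range(4)]   # four empty buckets: 0, 1, 2, 3
buckets[0].append(Record(42, {"name": "example"}))
```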
Hash functions
In order to access a record with key c, a family of hash functions, called collectively a dynamic hash function, is applied to the key c. At any time, at most two hash functions h_i and h_{i+1} are used. A typical example uses the division modulo operation. If the original number of buckets is N, then the family of hash functions is h_i(c) = c mod (N × 2^i).
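The division-modulo family can be written as a one-line function. This is a sketch of the example in the text, assuming N = 4 initial buckets.

```python
# The dynamic hash function family from the text: h_i(c) = c mod (N * 2**i).

def h(i, c, N=4):
    """i-th member of the family for an initial bucket count N."""
    return c % (N * 2 ** i)

# At any time only two consecutive members, h_i and h_{i+1}, are in use.
# h_{i+1} refines h_i: it sends c either to the same bucket as h_i or to
# a bucket exactly N * 2**i positions further on.
```

For example, with N = 4 the key 13 maps to bucket 13 mod 4 = 1 under h_0, and to bucket 13 mod 8 = 5 under h_1, which is 1 + 4, the refined alternative of bucket 1.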
File expansion
As the file grows through insertions, it expands gracefully through the splitting
of one bucket into two buckets. The sequence of buckets to split is predetermined.
This is the fundamental difference to schemes like Fagin's extendible hashing.
[
]
For the two new buckets, the hash function h_i is replaced with h_{i+1}. The number of the bucket to be split is part of the file state and is called the split pointer s.
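One split step can be sketched as follows. This is a hypothetical illustration, assuming N = 4 initial buckets, level i = 0, split pointer s = 0, and the modulo family h_i(c) = c mod (N × 2^i); buckets hold bare keys for brevity.

```python
# Sketch of one LH split: bucket s is rehashed with h_{i+1}, so each of its
# keys either stays in bucket s or moves to the new bucket s + N * 2**i.

N = 4
level = 0          # i in the text
split_pointer = 0  # s in the text
buckets = [[8, 4, 12], [5], [6], [7]]  # keys 8, 4, 12 all hash to 0 mod 4

def h(i, c):
    return c % (N * 2 ** i)

def split():
    global split_pointer, level
    s = split_pointer
    old = buckets[s]
    buckets[s] = []
    buckets.append([])                   # new bucket s + N * 2**level
    for c in old:                        # redistribute with h_{i+1}
        buckets[h(level + 1, c)].append(c)
    split_pointer += 1
    if split_pointer == N * 2 ** level:  # every bucket of this level split
        split_pointer = 0
        level += 1

split()
# 8 mod 8 = 0, 4 mod 8 = 4, 12 mod 8 = 4: key 8 stays in bucket 0,
# keys 4 and 12 move to the new bucket 4.
```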
Split control
A split can be performed whenever a bucket overflows. This is an uncontrolled split.
Alternatively, the file can monitor the load factor and performs a split whenever
the load factor exceeds a threshold. This was controlled splitting.[
]
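The controlled-splitting trigger amounts to a comparison against a threshold. A minimal sketch, assuming a hypothetical threshold of 0.75 (the text does not fix a value):

```python
# Controlled splitting: split while the load factor (records per bucket)
# exceeds a chosen threshold. The 0.75 default is an assumed example value.

def load_factor(num_records, num_buckets):
    return num_records / num_buckets

def needs_split(num_records, num_buckets, threshold=0.75):
    return load_factor(num_records, num_buckets) > threshold
```

After each insertion the file would call `needs_split` and, if it returns true, split the bucket at the split pointer; an uncontrolled scheme instead splits immediately on any bucket overflow.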
Addressing
Addressing is based on the file state, consisting of the split pointer
and the level . If the level is , then the hash functions
used are and .
The LH algorithm for hashing key is[
if ]
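The addressing rule above can be sketched directly, again assuming the modulo family h_i(c) = c mod (N × 2^i) with N = 4 initial buckets:

```python
# LH addressing: try h_i first; if the result lies before the split pointer,
# that bucket has already been split, so its records were redistributed by
# h_{i+1} and the finer function must be used instead.

def address(c, s, i, N=4):
    a = c % (N * 2 ** i)            # a = h_i(c)
    if a < s:                       # bucket a already split
        a = c % (N * 2 ** (i + 1))  # a = h_{i+1}(c)
    return a
```

With N = 4, level i = 0, and split pointer s = 1 (bucket 0 already split), key 8 gives h_0(8) = 0 < 1, so h_1(8) = 0 is used; key 4 likewise falls through to h_1(4) = 4; key 5 gives h_0(5) = 1, which is not below s, so bucket 1 is used directly.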