A hash array mapped trie
(HAMT, ) is an implementation of an
associative array
In computer science, an associative array, key-value store, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In math ...
that combines the characteristics of a
hash table
In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...
and an
array mapped trie.
It is a refined version of the more general notion of a
hash tree.
Operation
A HAMT is an array mapped trie where the keys are first hashed to ensure an even distribution of keys and a constant key length.
In a typical implementation of HAMT's array mapped trie, each node contains a table with some fixed number N of slots with each slot containing either a nil pointer or a pointer to another node. N is commonly 32. As allocating space for N pointers for each node would be expensive, each node instead contains a bitmap which is N bits long where each bit indicates the presence of a non-nil pointer. This is followed by an array of pointers equal in length to the number of ones in the bitmap (its
Hamming weight
The Hamming weight of a string (computer science), string is the number of symbols that are different from the zero-symbol of the alphabet used. It is thus equivalent to the Hamming distance from the all-zero string of the same length. For the mo ...
).
Advantages of HAMTs
The hash array mapped trie achieves almost hash table-like speed while using memory much more economically. Also, a hash table may have to be periodically resized, an expensive operation, whereas HAMTs grow dynamically. Generally, HAMT performance is improved by a larger root table with some multiple of N slots; some HAMT variants allow the root to grow lazily
with negligible impact on performance.
Implementation details
Implementation of a HAMT involves the use of the
population count function, which counts the number of ones in the binary representation of a number. This operation is available in
many instruction set architectures, but it is
available in only some high-level languages. Although population count can be implemented in software in
O(1) time using a
series of shift and add instructions, doing so may perform the operation an order of magnitude slower.
Implementations
The programming languages
Clojure
Clojure (, like ''closure'') is a dynamic programming language, dynamic and functional programming, functional dialect (computing), dialect of the programming language Lisp (programming language), Lisp on the Java (software platform), Java platfo ...
,
Scala, and
Frege use a
persistent variant of hash array mapped tries for their native hash map type. The
Haskell
Haskell () is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell pioneered several programming language ...
library "unordered-containers" uses the same to implement persistent map and set data structures. Another Haskell library "stm-containers" adapts the algorithm for use in the context of
software transactional memory. A
Javascript
JavaScript (), often abbreviated as JS, is a programming language and core technology of the World Wide Web, alongside HTML and CSS. Ninety-nine percent of websites use JavaScript on the client side for webpage behavior.
Web browsers have ...
HAMT library based on the Clojure implementation is also available. The
Rubinius implementation of
Ruby
Ruby is a pinkish-red-to-blood-red-colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapph ...
includes a HAMT, mostly written in Ruby but with 3 primitives. Large maps in
Erlang use a
persistent HAMT representation internally since release 18.0. The Pony programming language uses a HAMT for the hash map in its persistent collections package.
The im and im-rc crates, which provide persistent collection types for the Rust programming language, use a HAMT for their persistent hash tables and hash sets.
The concurrent lock-free version of the hash trie called
Ctrie is a mutable thread-safe implementation which ensures progress. The data-structure has been proven to be correct
[Prokopec, A. et al. (2011]
Cache-Aware Lock-Free Concurrent Hash Tries
Technical Report, 2011. - Ctrie operations have been shown to have the
atomicity,
linearizability and
lock-freedom properties.
See also
*
Judy array
In computer science, a Judy array is an early-2000s Hewlett-Packard hand-optimized implementation of a 256-ary radix tree that uses many situational node types to reduce latency from CPU cache-line fills.Alan Silverstein,Judy IV Shop Manual, 2002 ...
*
Radix tree
References
{{DEFAULTSORT:Hash Array Mapped Trie
Associative arrays
Hashing