In
computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, a Judy array is an early-2000s
Hewlett-Packard
The Hewlett-Packard Company, commonly shortened to Hewlett-Packard ( ) or HP, was an American multinational information technology company. It was founded by Bill Hewlett and David Packard in 1939 in a one-car garage in Palo Alto, California ...
hand-optimized implementation of a 256-ary
radix tree
In computer science, a radix tree (also radix trie or compact prefix tree or compressed trie) is a data structure that represents a space-optimized trie (prefix tree) in which each node that is the only child is merged with its parent. The resu ...
that uses many situational node types to reduce latency from CPU
cache-line fills.
[Robert Gobeille and Douglas Baskins' patent](_blank)
/ref>[Alan Silverstein,]
Judy IV Shop Manual
, 2002 As a compressed radix tree, a Judy array can store potentially sparse integer- or string-indexed data with comparatively low memory usage and low read latency, without relying on hashing or tree balancing, and without sacrificing in-order traversal. Per-operation latency scales as —as expected of a tree—and the leading constant factor is small enough that Judy arrays are suitable even to the peta-element range. When applicable, they can be faster than implementations of AVL tree
In computer science, an AVL tree (named after inventors Adelson-Velsky and Landis) is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by m ...
s, B-tree
In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing fo ...
s, hash table
In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...
s, or skip list
In computer science, a skip list (or skiplist) is a Randomized algorithm, probabilistic data structure that allows O(\log n) Average-case complexity, average complexity for search as well as O(\log n) average complexity for insertion within an or ...
s from the same time period.
History
The Judy array was invented by Douglas Baskins over the years leading up to 2002 and named after his sister.
Node types
Broadly, tree nodes in Judy arrays fall into one of three categories, though the implementation uses situational variations within each category:
* A linear node is a short, fixed-capacity, array-based association list
In computer programming and particularly in Lisp, an association list, often referred to as an alist, is a linked list in which each list element (or node) comprises a key and a value. The association list is said to ''associate'' the value wit ...
meant to fit in one cache line. That is, such a node has an array of key bytes and a parallel array of values or pointers. Lookup is by linear search
In computer science, linear search or sequential search is a method for finding an element within a list. It sequentially checks each element of the list until a match is found or the whole list has been searched.
A linear search runs in linea ...
over the key array and then random access to the corresponding index in the value/pointer array.
* A bitmap node is a size-256 bitvector
A bit array (also known as bitmask, bit map, bit set, bit string, or bit vector) is an array data structure that compactly stores bits. It can be used to implement a simple set data structure. A bit array is effective at exploiting bit-level p ...
tracking which values/children are present and then a sorted list of corresponding values or pointers. Lookup is by population count of the bits up to the target index and then random access to the corresponding entry in the value/pointer array. The bitmap fits within a typical CPU cache line, and random access only loads one cache line from the sorted list, so for reading these nodes require at most two cache-line fills.
* An uncompressed node is a conventional trie
In computer science, a trie (, ), also known as a digital tree or prefix tree, is a specialized search tree data structure used to store and retrieve strings from a dictionary or set. Unlike a binary search tree, nodes in a trie do not store t ...
node as an array of values/pointers. Lookup is by random access using the key byte as an index, which at the CPU level requires visiting one cache line.
Linear nodes are used for low branching, bitmap nodes for intermediate branching, and uncompressed nodes for high branching.
Advantages and disadvantages
Due to cache
Cache, caching, or caché may refer to:
Science and technology
* Cache (computing), a technique used in computer storage for easier data access
* Cache (biology) or hoarding, a food storing behavior of animals
* Cache (archaeology), artifacts p ...
optimizations, Judy arrays are fast, especially for very large datasets. On certain tasks involving data that are sequential or nearly sequential, Judy arrays can even outperform hash tables, since, unlike hash tables, the internal tree structure of Judy arrays maintains the ordering of the keys.
On the other hand, Judy arrays are not suitable for all key types, rely heavily on compile-time case-splitting (which increases both the compiled code size and the work involved in retuning for a new architecture), make some concessions to older architectures that may not be relevant to modern machines, and do not exploit SIMD
Single instruction, multiple data (SIMD) is a type of parallel computer, parallel processing in Flynn's taxonomy. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneousl ...
. They are optimized for read performance over write performance.
See also
* Radix tree
In computer science, a radix tree (also radix trie or compact prefix tree or compressed trie) is a data structure that represents a space-optimized trie (prefix tree) in which each node that is the only child is merged with its parent. The resu ...
* Bitwise trie with bitmap
* Hash array mapped trie
A hash array mapped trie (HAMT, ) is an implementation of an associative array that combines the characteristics of a hash table and an array mapped trie.
It is a refined version of the more general notion of a hash tree.
Operation
A HAMT is ...
References
{{reflist
External links
Main Judy arrays site
A complete technical description of Judy arrays
An independent performance comparison of Judy to Hash Tables
A compact implementation of Judy arrays in 1250 lines of C code
Associative arrays