HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, a Judy array is an early-2000s
Hewlett-Packard The Hewlett-Packard Company, commonly shortened to Hewlett-Packard ( ) or HP, was an American multinational information technology company. It was founded by Bill Hewlett and David Packard in 1939 in a one-car garage in Palo Alto, California ...
hand-optimized implementation of a 256-ary
radix tree In computer science, a radix tree (also radix trie or compact prefix tree or compressed trie) is a data structure that represents a space-optimized trie (prefix tree) in which each node that is the only child is merged with its parent. The resu ...
that uses many situational node types to reduce latency from CPU cache-line fills.Robert Gobeille and Douglas Baskins' patent
/ref>Alan Silverstein,
Judy IV Shop Manual
, 2002
As a compressed radix tree, a Judy array can store potentially sparse integer- or string-indexed data with comparatively low memory usage and low read latency, without relying on hashing or tree balancing, and without sacrificing in-order traversal. Per-operation latency scales as O(\log n)—as expected of a tree—and the leading constant factor is small enough that Judy arrays are suitable even to the peta-element range. When applicable, they can be faster than implementations of
AVL tree In computer science, an AVL tree (named after inventors Adelson-Velsky and Landis) is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by m ...
s,
B-tree In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing fo ...
s,
hash table In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...
s, or
skip list In computer science, a skip list (or skiplist) is a Randomized algorithm, probabilistic data structure that allows O(\log n) Average-case complexity, average complexity for search as well as O(\log n) average complexity for insertion within an or ...
s from the same time period.


History

The Judy array was invented by Douglas Baskins over the years leading up to 2002 and named after his sister.


Node types

Broadly, tree nodes in Judy arrays fall into one of three categories, though the implementation uses situational variations within each category: * A linear node is a short, fixed-capacity, array-based
association list In computer programming and particularly in Lisp, an association list, often referred to as an alist, is a linked list in which each list element (or node) comprises a key and a value. The association list is said to ''associate'' the value wit ...
meant to fit in one cache line. That is, such a node has an array of key bytes and a parallel array of values or pointers. Lookup is by
linear search In computer science, linear search or sequential search is a method for finding an element within a list. It sequentially checks each element of the list until a match is found or the whole list has been searched. A linear search runs in linea ...
over the key array and then random access to the corresponding index in the value/pointer array. * A bitmap node is a size-256
bitvector A bit array (also known as bitmask, bit map, bit set, bit string, or bit vector) is an array data structure that compactly stores bits. It can be used to implement a simple set data structure. A bit array is effective at exploiting bit-level p ...
tracking which values/children are present and then a sorted list of corresponding values or pointers. Lookup is by population count of the bits up to the target index and then random access to the corresponding entry in the value/pointer array. The bitmap fits within a typical CPU cache line, and random access only loads one cache line from the sorted list, so for reading these nodes require at most two cache-line fills. * An uncompressed node is a conventional
trie In computer science, a trie (, ), also known as a digital tree or prefix tree, is a specialized search tree data structure used to store and retrieve strings from a dictionary or set. Unlike a binary search tree, nodes in a trie do not store t ...
node as an array of values/pointers. Lookup is by random access using the key byte as an index, which at the CPU level requires visiting one cache line. Linear nodes are used for low branching, bitmap nodes for intermediate branching, and uncompressed nodes for high branching.


Advantages and disadvantages

Due to
cache Cache, caching, or caché may refer to: Science and technology * Cache (computing), a technique used in computer storage for easier data access * Cache (biology) or hoarding, a food storing behavior of animals * Cache (archaeology), artifacts p ...
optimizations, Judy arrays are fast, especially for very large datasets. On certain tasks involving data that are sequential or nearly sequential, Judy arrays can even outperform hash tables, since, unlike hash tables, the internal tree structure of Judy arrays maintains the ordering of the keys. On the other hand, Judy arrays are not suitable for all key types, rely heavily on compile-time case-splitting (which increases both the compiled code size and the work involved in retuning for a new architecture), make some concessions to older architectures that may not be relevant to modern machines, and do not exploit
SIMD Single instruction, multiple data (SIMD) is a type of parallel computer, parallel processing in Flynn's taxonomy. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneousl ...
. They are optimized for read performance over write performance.


See also

*
Radix tree In computer science, a radix tree (also radix trie or compact prefix tree or compressed trie) is a data structure that represents a space-optimized trie (prefix tree) in which each node that is the only child is merged with its parent. The resu ...
* Bitwise trie with bitmap *
Hash array mapped trie A hash array mapped trie (HAMT, ) is an implementation of an associative array that combines the characteristics of a hash table and an array mapped trie. It is a refined version of the more general notion of a hash tree. Operation A HAMT is ...


References

{{reflist


External links


Main Judy arrays siteA complete technical description of Judy arraysAn independent performance comparison of Judy to Hash TablesA compact implementation of Judy arrays in 1250 lines of C code
Associative arrays