computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...

, an x-fast trie is a

data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...

for storing

integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign (−1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...

s from a bounded domain. It supports exact and predecessor or successor queries in time ''O''(log log ''M''), using ''O''(''n'' log ''M'') space, where ''n'' is the number of stored values and ''M'' is the maximum value in the domain. The structure was proposed by

Dan Willard Dan Edward Willard is an American computer scientist and logician, and is a professor of computer science at the University at Albany. Education and career Willard did his undergraduate studies in mathematics at Stony Brook University, graduati ...

in 1982, along with the more complicated

y-fast trie In computer science, a y-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time ''O''(log log ''M''), using ''O''(''n'') space, where ''n'' is the number ...

, as a way to improve the space usage of

van Emde Boas tree A van Emde Boas tree (), also known as a vEB tree or van Emde Boas priority queue, is a tree data structure which implements an associative array with -bit integer keys. It was invented by a team led by Dutch computer scientist Peter van Emde Boa ...

s, while retaining the ''O''(log log ''M'') query time.

Structure

An x-fast trie is a bitwise trie: a

binary tree In computer science, a binary tree is a k-ary k = 2 tree data structure in which each node has at most two children, which are referred to as the ' and the '. A recursive definition using just set theory notions is that a (non-empty) binary t ...

where each subtree stores values whose binary representations start with a common prefix. Each internal node is labeled with the common prefix of the values in its subtree and typically, the left child adds a 0 to the end of the prefix, while the right child adds a 1. The binary representation of an integer between 0 and ''M'' − 1 uses ⌈log₂ ''M''⌉ bits, so the height of the trie is ''O''(log ''M''). All values in the x-fast trie are stored at the leaves. Internal nodes are stored only if they have leaves in their subtree. If an internal node would have no left child, it stores a pointer to the smallest leaf in its right subtree instead, called a ''descendant'' pointer. Likewise, if it would have no right child, it stores a pointer to the largest leaf in its left subtree. Each leaf stores a pointer to its predecessor and successor, thereby forming a

doubly linked list In computer science, a doubly linked list is a linked data structure that consists of a set of sequentially linked record (computer science), records called node (computer science), nodes. Each node contains three field (computer science), fields: ...

. Finally, there is a

hash table In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', als ...

for each level that contains all the nodes on that level. Together, these hash tables form the level-search structure (LSS). To guarantee the worst-case query times, these hash tables should use

dynamic perfect hashing In computer science, dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure.Fredman, M. L., Komlós, J., and Szemerédi, E. 1984. Storing a Sparse Table with 0(1) Worst Case Access Time. J. ACM 31 ...

cuckoo hashing Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table, with worst-case constant lookup time. The name derives from the behavior of some species of cuckoo, where the cuckoo chick ...

. The total space usage is ''O''(''n'' log ''M''), since each element has a root-to-leaf path of length ''O''(log ''M'').

Operations

s, x-fast tries support the operations of an ''ordered

associative array In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In mathematical terms an as ...

''. This includes the usual associative array operations, along with two more ''order'' operations, ''Successor'' and ''Predecessor'': *''Find''(''k''): find the value associated with the given key *''Successor''(''k''): find the key/value pair with the smallest key larger than or equal to the given key *''Predecessor''(''k''): find the key/value pair with the largest key less than or equal to the given key *''Insert''(''k'', ''v''): insert the given key/value pair *''Delete''(''k''): remove the key/value pair with the given key

Find

Finding the value associated with a key ''k'' that is in the data structure can be done in constant time by looking up ''k'' in ''LSS'' which is a hash table on all the leaves.

Successor and Predecessor

To find the successor or predecessor of a key ''k'', we first find ''A''_''k'', the lowest ancestor of ''k''. This is the node in the trie that has the longest common prefix with ''k''. To find ''A''_''k'', we perform a

binary search In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...

on the levels. We start at level ''h''/2, where ''h'' is the height of the trie. On each level, we query the corresponding hash table in the level-search structure with the prefix of ''k'' of the right length. If a node with that prefix does not exist, we know that ''A''_''k'' must be at a higher level and we restrict our search to those. If a node with that prefix does exist, ''A''_''k'' can not be at a higher level, so we restrict our search to the current and lower levels. Once we find the lowest ancestor of ''k'', we know that it has leaves in one of its subtrees (otherwise it wouldn't be in the trie) and ''k'' should be in the other subtree. Therefore the descendant pointer points to the successor or the predecessor of ''k''. Depending on which one we are looking for, we might have to take one step in the linked list to the next or previous leaf. Since the trie has height ''O''(log ''M''), the binary search for the lowest ancestor takes ''O''(log log ''M'') time. After that, the successor or predecessor can be found in constant time, so the total query time is ''O''(log log ''M'').

Insert

To insert a key-value pair (''k'', ''v''), we first find the predecessor and successor of ''k''. Then we create a new leaf for ''k'', insert it in the linked list of leaves between the successor and predecessor, and give it a pointer to ''v''. Next, we walk from the root to the new leaf, creating the necessary nodes on the way down, inserting them into the respective hash tables and updating descendant pointers where necessary. Since we have to walk down the entire height of the trie, this process takes ''O''(log ''M'') time.

Delete

To delete a key ''k'', we find its leaf using the hash table on the leaves. We remove it from the linked list, but remember which were the successor and predecessor. Then we walk from the leaf to the root of the trie, removing all nodes whose subtree only contained ''k'' and updating the descendant pointers where necessary. Descendant pointers that used to point to ''k'' will now point to either the successor or predecessor of ''k'', depending on which subtree is missing. Like insertion, this takes ''O''(log ''M'') time, as we have to walk through every level of the trie.

Discussion

Willard introduced x-fast tries largely as an introduction to

s, which provide the same query time, while using only ''O''(''n'') space and allowing insertions and deletions in ''O''(log log ''M'') time. A compression technique similar to

patricia trie In computer science, a radix tree (also radix trie or compact prefix tree or compressed trie) is a data structure that represents a space-optimized trie (prefix tree) in which each node that is the only child is merged with its parent. The resul ...

s can be used to significantly reduce the space usage of x-fast tries in practice. By using an

exponential search In computer science, an exponential search (also called doubling search or galloping search or Struzik search) is an algorithm, created by Jon Bentley and Andrew Chi-Chih Yao in 1976, for searching sorted, unbounded/infinite lists. There are num ...

before the binary search over the levels and by querying not only the current prefix ''x'', but also its successor ''x'' + 1, x-fast tries can answer predecessor and successor queries in time ''O''(log log ''Δ''), where ''Δ'' is the difference between the query value and its predecessor or successor.

References

External links

Open Data Structure - Chapter 13 - Data Structures for Integers
{{CS-Trees Trees (data structures)