A van Emde Boas tree (), also known as a vEB tree or van Emde Boas priority queue, is a

tree data structure In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be con ...

which implements an

associative array In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In mathematical terms an a ...

with -bit integer keys. It was invented by a team led by

Dutch Dutch commonly refers to: * Something of, from, or related to the Netherlands * Dutch people () * Dutch language () Dutch may also refer to: Places * Dutch, West Virginia, a community in the United States * Pennsylvania Dutch Country People ...

computer scientist

Peter van Emde Boas Peter van Emde Boas (born 3 April 1945, Amsterdam) is a Dutch computer scientist and professor at the University of Amsterdam. He gained his doctorate in 1974 under Adriaan van Wijngaarden. The Van Emde Boas tree A van Emde Boas tree (), also ...

in 1975. It performs all operations in time, or equivalently in time, where is the largest element that can be stored in the tree. The parameter is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data-structures is often measured. The vEB tree has poor space efficiency. For example, for storing 32-bit integers (i.e., when ), it requires bits of storage. However, similar data structures with equally good time efficiency and space exist, where is the number of stored elements.

Supported operations

A vEB supports the operations of an ''ordered

'', which includes the usual associative array operations along with two more ''order'' operations, ''FindNext'' and ''FindPrevious'': *''Insert'': insert a key/value pair with an -bit key *''Delete'': remove the key/value pair with a given key *''Lookup'': find the value associated with a given key *''FindNext'': find the key/value pair with the smallest key which is greater than a given *''FindPrevious'': find the key/value pair with the largest key which is smaller than a given A vEB tree also supports the operations ''Minimum'' and ''Maximum'', which return the minimum and maximum element stored in the tree respectively. These both run in time, since the minimum and maximum element are stored as attributes in each tree.

Function

For the sake of simplicity, let for some integer ''k''. Define . A vEB tree over the universe has a root node that stores an array of length . is a pointer to a vEB tree that is responsible for the values . Additionally, ''T'' stores two values and as well as an auxiliary vEB tree . Data is stored in a vEB tree as follows: The smallest value currently in the tree is stored in and largest value is stored in . Note that is not stored anywhere else in the vEB tree, while is. If ''T'' is empty then we use the convention that and . Any other value ''x'' is stored in the subtree where . The auxiliary tree keeps track of which children are non-empty, so contains the value ''j'' if and only if is non-empty.

FindNext

The operation that searches for the successor of an element ''x'' in a vEB tree proceeds as follows: If then the search is complete, and the answer is . If then the next element does not exist, return M. Otherwise, let . If then the value being searched for is contained in so the search proceeds recursively in . Otherwise, we search for the successor of the value ''i'' in . This gives us the index ''j'' of the first subtree that contains an element larger than ''x''. The algorithm then returns . The element found on the children level needs to be composed with the high bits to form a complete next element. function FindNext(T, x) if x < T.min then return T.min if x ≥ T.max then ''// no next element'' return M i = floor(x/) lo = x mod if lo < T.children max then return ( i) + FindNext(T.children lo) j = FindNext(T.aux, i) return ( j) + T.children min end Note that, in any case, the algorithm performs work and then possibly recurses on a subtree over a universe of size (an bit universe). This gives a recurrence for the running time of , which resolves to .

Insert

The call that inserts a value into a vEB tree operates as follows: # If ''T'' is empty then we set and we are done. # Otherwise, if then we insert into the subtree responsible for and then set . If was previously empty, then we also insert into # Otherwise, if then we insert into the subtree responsible for and then set . If was previously empty, then we also insert into # Otherwise, so we insert into the subtree responsible for . If was previously empty, then we also insert into . In code: function Insert(T, x) if T.min > T.max then ''// T is empty'' T.min = T.max = x; return if x < T.min then swap(x, T.min) if x > T.max then T.max = x i = floor(x / ) lo = x mod Insert(T.children lo) if T.children min

T.children max then Insert(T.aux, i) end The key to the efficiency of this procedure is that inserting an element into an empty vEB tree takes time. So, even though the algorithm sometimes makes two recursive calls, this only occurs when the first recursive call was into an empty subtree. This gives the same running time recurrence of as before.

Delete

Deletion from vEB trees is the trickiest of the operations. The call that deletes a value ''x'' from a vEB tree T operates as follows: # If then ''x'' is the only element stored in the tree and we set and to indicate that the tree is empty. # Otherwise, if then we need to find the second-smallest value ''y'' in the vEB tree, delete it from its current location, and set . The second-smallest value ''y'' is , so it can be found in time. We delete ''y'' from the subtree that contains it. # If and then we delete x from the subtree that contains ''x''. # If then we will need to find the second-largest value ''y'' in the vEB tree and set . We start by deleting x as in previous case. Then value ''y'' is either or , so it can be found in time. # In any of the above cases, if we delete the last element ''x'' or ''y'' from any subtree then we also delete ''i'' from . In code: function Delete(T, x) if T.min

T.max

x then T.min = M T.max = −1 return if x

T.min then hi = T.aux.min * j = T.aux.min T.min = x = hi + T.children min i = floor(x / ) lo = x mod Delete(T.children lo) if T.children is empty then Delete(T.aux, i) if x

T.max then if T.aux is empty then T.max = T.min else hi = T.aux.max * j = T.aux.max T.max = hi + T.children max end Again, the efficiency of this procedure hinges on the fact that deleting from a vEB tree that contains only one element takes only constant time. In particular, the second Delete call only executes if ''x'' was the only element in prior to the deletion.

Discussion

The assumption that is an integer is unnecessary. The operations

x\sqrt

and

x\bmod\sqrt

can be replaced by taking only higher-order and the lower-order bits of , respectively. On any existing machine, this is more efficient than division or remainder computations. In practical implementations, especially on machines with ''shift-by-k'' and ''find first zero'' instructions, performance can further be improved by switching to a

bit array A bit array (also known as bitmask, bit map, bit set, bit string, or bit vector) is an array data structure that compactly stores bits. It can be used to implement a simple set data structure. A bit array is effective at exploiting bit-level par ...

once equal to the

word size In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized datum handled as a unit by the instruction set or the hardware of the processor. The number of bits or digits in a word (the ''word s ...

(or a small multiple thereof) is reached. Since all operations on a single word are constant time, this does not affect the asymptotic performance, but it does avoid the majority of the pointer storage and several pointer dereferences, achieving a significant practical savings in time and space with this trick. An obvious optimization of vEB trees is to discard empty subtrees. This makes vEB trees quite compact when they contain many elements, because no subtrees are created until something needs to be added to them. Initially, each element added creates about new trees containing about pointers all together. As the tree grows, more and more subtrees are reused, especially the larger ones. In a full tree of elements, only space is used. Moreover, unlike a binary search tree, most of this space is being used to store data: even for billions of elements, the pointers in a full vEB tree number in the thousands. The implementation described above uses pointers and occupies a total space of , proportional to the size of the key universe. This can be seen as follows. The recurrence is

S(M) = O( \sqrt) + (\sqrt+1) \cdot S(\sqrt)

. Resolving that would lead to

S(M) \in (1 + \sqrt)^  + \log \log M \cdot O( \sqrt )

. One can, fortunately, also show that by induction.

Similar structures

The space usage of vEB trees is an enormous overhead unless a large fraction of the universe of keys is being stored. This is one reason why vEB trees are not popular in practice. This limitation can be addressed by changing the array used to store children to another data structure. One possibility is to use only a fixed number of bits per level, which results in a

trie In computer science, a trie, also called digital tree or prefix tree, is a type of ''k''-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes de ...

. Alternatively, each array may be replaced by a

hash table In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', ...

, reducing the space to (where is the number of elements stored in the data structure) at the expense of making the data structure randomized. Other data structures including x-fast tries and y-fast tries have comparable update and query times to vEB trees and also use randomized hash tables to reduce the space to and respectively.

Implementations

There is a verified implementation in

Isabelle (proof assistant) The Isabelle automated theorem prover is a higher-order logic (HOL) theorem prover, written in Standard ML and Scala. As an LCF-style theorem prover, it is based on a small logical core (kernel) to increase the trustworthiness of proofs withou ...

. Both functional correctness and time bounds are proved. Efficient imperative

Standard ML Standard ML (SML) is a general-purpose, modular, functional programming language with compile-time type checking and type inference. It is popular among compiler writers and programming language researchers, as well as in the development of th ...

code can be generated.