Geometry Of Binary Search Trees
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includi ...
, one approach to the dynamic optimality problem on
online algorithm In computer science, an online algorithm is one that can process its input piece-by-piece in a serial fashion, i.e., in the order that the input is fed to the algorithm, without having the entire input available from the start. In contrast, an o ...
s for
binary search tree In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and ...
s involves reformulating the problem geometrically, in terms of augmenting a set of points in the plane with as few additional points as possible in order to avoid rectangles with only two points on their boundary.


Access sequences and competitive ratio

As typically formulated, the online binary search tree problem involves search trees defined over a fixed key set \. An ''access sequence'' is a sequence x_1, x_2, ... where each access x_i belongs to the key set. Any particular algorithm for maintaining binary search trees (such as the splay tree algorithm or Iacono's working set structure) has a ''cost'' for each access sequence that models the amount of time it would take to use the structure to search for each of the keys in the access sequence in turn. The cost of a search is modeled by assuming that the search tree algorithm has a single pointer into a binary search tree, which at the start of each search points to the root of the tree. The algorithm may then perform any sequence of the following operations: * Move the pointer to its left child. * Move the pointer to its right child. * Move the pointer to its parent. * Perform a single
tree rotation In discrete mathematics, tree rotation is an operation on a binary tree that changes the structure without interfering with the order of the elements. A tree rotation moves one node up in the tree and one node down. It is used to change the shape ...
on the pointer and its parent. The search is required, at some point within this sequence of operations to move the pointer to a node containing the key, and the cost of the search is the number of operations that are performed in the sequence. The total cost cost''A''(''X'') for algorithm ''A'' on access sequence ''X'' is the sum of the costs of the searches for each successive key in the sequence. As is standard in competitive analysis, the competitive ratio of an algorithm ''A'' is defined to be the maximum, over all access sequences, of the ratio of the cost for ''A'' to the best cost that any algorithm could achieve: :\rho_A = \sup_X \frac. The dynamic optimality conjecture states that splay trees have constant competitive ratio, but this remains unproven. The geometric view of binary search trees provides a different way of understanding the problem that has led to the development of alternative algorithms that could also (conjecturally) have a constant competitive ratio.


Translation to a geometric point set

In the geometric view of the online binary search tree problem, an ''access sequence'' x_1, . . ., x_m (sequence of searches performed on a binary search tree (BST) with a key set ) is mapped to the set of points , where the X-axis represents the key space and the Y-axis represents time; to which a set of touched nodes is added. By touched nodes we mean the following. Consider a BST access algorithm with a single pointer to a node in the tree. At the beginning of an access to a given key x_i, this pointer is initialized to the root of the tree. Whenever the pointer moves to or is initialized to a node, we say that the node is touched. We represent a BST algorithm for a given input sequence by drawing a point for each item that gets touched. For example, assume the following BST on 4 nodes is given: The key set is . Let 3, 1, 4, 2 be the access sequence. * In the first access, only the node 3 is touched. * In the second access, the nodes 3 and 1 are touched. * In the third access - 3 and 4 are touched. * In the fourth access, touch 3, then 1, and after that 2. The touches are represented geometrically: If an item ''x'' is touched in the operations for the ''i''th access, then a point (''x'',''i'') is plotted.


Arborally satisfied point sets

A point set is said to be arborally satisfied if the following property holds: for any pair of points that do not lie on the same horizontal or vertical line, there exists a third point which lies in the rectangle spanned by the first two points (either inside or on the boundary).


Theorem

A point set containing the points (x_i, i) is arborally satisfied if and only if it corresponds to a valid BST for the input sequence x_1, x_2, . . ., x_m.


Proof

First, prove that the point set for any valid BST algorithm is arborally satisfied. Consider points (x, i) and (y, j), where is touched at time and is touched at time . Assume by symmetry that x < y and i < j. It needs to be shown that there exists a third point in the rectangle with corners as (x, i) and (y, j). Also let \mathrm_t(a, b) denote the
lowest common ancestor In graph theory and computer science, the lowest common ancestor (LCA) (also called least common ancestor) of two nodes and in a tree or directed acyclic graph (DAG) is the lowest (i.e. deepest) node that has both and as descendants, where ...
of nodes and right before time . There are a few cases: * If \mathrm_i(x, y) \ne x, then use the point (\mathrm_i(x, y), i), since \mathrm_i(x, y) must have been touched if was. * If \mathrm_j(x, y) \ne y, then the point (\mathrm_j(x, y), j) can be used. * If neither of the above two cases hold, then must be an ancestor of right before time and be an ancestor of right before time . Then at some time (i \le k < j), must have been rotated above , so the point (y, k) can be used. Next, show the other direction: given an arborally satisfied point set, a valid BST corresponding to that point set can be constructed. Organize our BST into a treap which is organized in heap-order by next-touch-time. Note that next-touch-time has ties and is thus not uniquely defined, but this isn’t a problem as long as there is a way to break ties. When time reached, the nodes touched form a connected subtree at the top, by the heap ordering property. Now, assign new next-touch-times for this subtree, and rearrange it into a new local treap. If a pair of nodes, and , straddle the boundary between the touched and untouched part of the treap, then if is to be touched sooner than then (x, now) \to (y, next - touch(y)) is an unsatisfied rectangle because the leftmost such point would be the right child of , not .


Corollary

Finding the best BST execution for the input sequence x_1, x_2, . . ., x_m is equivalent to finding the minimum cardinality superset of points (that contains the input in geometric representation) that is arborally satisfied. The more general problem of finding the minimum cardinality arborally satisfied superset of a general set of input points (not limited to one input point per coordinate), is known to be
NP-complete In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by trying ...
.


Greedy algorithm

The following
greedy algorithm A greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage. In many problems, a greedy strategy does not produce an optimal solution, but a greedy heuristic can yield locally ...
constructs arborally satisfiable sets: * Sweep the point set with a horizontal line by increasing coordinate. * At time , place the minimal number of points at y = i to make the point set up to y \ge i arborally satisfied. This minimal set of points is uniquely defined: for any unsatisfied rectangle formed with (x_i, i) in one corner, add the other corner at y = i. The algorithm has been conjectured to be optimal within an additive term.


Other results

The geometry of binary search trees has been used to provide an algorithm which is dynamically optimal if any binary search tree algorithm is dynamically optimal.


See also

*
Binary search algorithm In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the ...
*
Tango tree A tango tree is a type of binary search tree proposed by Erik D. Demaine, Dion Harmon, John Iacono, and Mihai Pătrașcu in 2004. It is named after Buenos Aires, of which the tango is emblematic. It is an online binary search tree that achieve ...
s * Splay trees *
Self-balancing binary search tree In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.Donal ...
*
Optimal binary search tree In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses ...
*
Interleave lower bound In the theory of optimal binary search trees, the interleave lower bound is a lower bound on the number of operations required by a Binary Search Tree (BST) to execute a given sequence of accesses. Several variants of this lower bound have been pro ...


References

{{Reflist Binary trees Geometry