Join-based Tree Algorithms
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, join-based tree algorithms are a class of algorithms for
self-balancing binary search tree In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.Donal ...
s. This framework aims at designing highly-parallelized algorithms for various balanced binary search trees. The algorithmic framework is based on a single operation ''join''. Under this framework, the ''join'' operation captures all balancing criteria of different balancing schemes, and all other functions ''join'' have generic implementation across different balancing schemes. The ''join-based algorithms'' can be applied to at least four balancing schemes:
AVL tree In computer science, an AVL tree (named after inventors Adelson-Velsky and Landis) is a self-balancing binary search tree. It was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any nod ...
s, red–black trees,
weight-balanced tree In computer science, weight-balanced binary trees (WBTs) are a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences. These trees were introduced by Nievergelt and Reingold in the ...
s and
treap In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of in ...
s. The ''join''(L,k,R) operation takes as input two binary balanced trees L and R of the same balancing scheme, and a key k, and outputs a new balanced binary tree t whose
in-order traversal In computer science, tree traversal (also known as tree search and walking the tree) is a form of graph traversal and refers to the process of visiting (e.g. retrieving, updating, or deleting) each node in a tree data structure, exactly once. S ...
is the in-order traversal of L, then k then the in-order traversal of R. In particular, if the trees are
search tree In computer science, a search tree is a tree data structure used for locating specific keys from within a set. In order for a tree to function as a search tree, the key for each node must be greater than any keys in subtrees on the left, and less ...
s, which means that the in-order of the trees maintain a
total ordering In mathematics, a total or linear order is a partial order in which any two elements are comparable. That is, a total order is a binary relation \leq on some set X, which satisfies the following for all a, b and c in X: # a \leq a ( reflexive) ...
on keys, it must satisfy the condition that all keys in L are smaller than k and all keys in R are greater than k.


History

The ''join'' operation was first defined by Tarjan on red–black trees, which runs in worst-case logarithmic time. Later Sleator and Tarjan described a ''join'' algorithm for splay trees which runs in amortized logarithmic time. Later Adams . extended ''join'' to
weight-balanced tree In computer science, weight-balanced binary trees (WBTs) are a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences. These trees were introduced by Nievergelt and Reingold in the ...
s and used it for fast set–set functions including
union Union commonly refers to: * Trade union, an organization of workers * Union (set theory), in mathematics, a fundamental operation on sets Union may also refer to: Arts and entertainment Music * Union (band), an American rock group ** ''Un ...
, intersection and
set difference In set theory, the complement of a set , often denoted by (or ), is the set of elements not in . When all sets in the universe, i.e. all sets under consideration, are considered to be members of a given set , the absolute complement of is the ...
. In 1998, Blelloch and Reid-Miller extended ''join'' on
treap In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of in ...
s, and proved the bound of the set functions to be O(m\log (1+\tfrac)) for two trees of size m and n(\ge m), which is optimal in the comparison model. They also brought up parallelism in Adams' algorithm by using a divide-and-conquer scheme. In 2016, Blelloch et al. formally proposed the join-based algorithms, and formalized the ''join'' algorithm for four different balancing schemes:
AVL tree In computer science, an AVL tree (named after inventors Adelson-Velsky and Landis) is a self-balancing binary search tree. It was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any nod ...
s, red–black trees,
weight-balanced tree In computer science, weight-balanced binary trees (WBTs) are a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences. These trees were introduced by Nievergelt and Reingold in the ...
s and
treap In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of in ...
s. In the same work they proved that Adams' algorithms on union, intersection and difference are work-optimal on all the four balancing schemes.


Join algorithms

The function ''join''(t_1,k,t_2) considers rebalancing the tree, and thus depends on the input balancing scheme. If the two trees are balanced, ''join'' simply creates a new node with left subtree , root and right subtree . Suppose that is heavier (this "heavier" depends on the balancing scheme) than (the other case is symmetric). ''Join'' follows the right spine of until a node which is balanced with . At this point a new node with left child , root and right child is created to replace c. The new node may invalidate the balancing invariant. This can be fixed with rotations. The following is the ''join'' algorithms on different balancing schemes. The ''join'' algorithm for AVL trees: function joinRightAVL(TL, k, TR) (l, k', c) := expose(TL) if h(c) ≤ h(TR) + 1 T' := Node(c, k, TR) if h(T') ≤ h(l) + 1 return Node(l, k', T') else return rotateLeft(Node(l, k', rotateRight(T'))) else T' := joinRightAVL(c, k, TR) T :''='' Node(l, k', T') if h(T') ≤ h(l) + 1 return T else return rotateLeft(T) function joinLeftAVL(TL, k, TR) /* symmetric to joinRightAVL */ function join(TL, k, TR) if h(TL) > h(TR) + 1 return joinRightAVL(TL, k, TR) else if h(TR) > h(TL) + 1 return joinLeftAVL(TL, k, TR) else return Node(TL, k, TR) Where: * h(v) is the height of node v. * \text(v) extracts the left child l, key k, and right child r of node v into a tuple (l,k,r) . * \text(l,k,r) creates a node with left child l, key k, and right child r. The ''join'' algorithm for red–black trees: function joinRightRB(TL, k, TR) if r(TL) = ⌊r(TR)/2⌋ × 2 return Node(TL, ⟨k, red⟩, TR) else (L', ⟨k', c'⟩, R') := expose(TL) T' := Node(L', ⟨k', c'⟩, joinRightRB(R', k, TR)) if c' = black and T'.right.color = T'.right.right.color = red T'.right.right.color := black return rotateLeft(T') else return T' function joinLeftRB(TL, k, TR) /* symmetric to joinRightRB */ function join(TL, k, TR) if ⌊r(TL)/2⌋ > ⌊r(TR)/2⌋ × 2 T' := joinRightRB(TL, k, TR) if (T'.color = red) and (T'.right.color = red) T'.color := black return T' else if ⌊r(TR)/2⌋ > ⌊r(TL)/2⌋ × 2 /* symmetric */ else if TL.color = black and TR = black return Node(TL, ⟨k, red⟩, TR) else return Node(TL, ⟨k, black⟩, TR) Where: * of a node v means twice the black height of a black node, and the twice the black height of a red node. * \text(v) extracts the left child l, key k, color c, and right child r of node v into a tuple (l, \langle k, c \rangle, r) . * creates a node with left child l, key k, color c, and right child r. The ''join'' algorithm for weight-balanced trees: function joinRightWB(TL, k, TR) (l, k', c) := expose(TL) if w(TL) =α w(TR) return Node(TL, k, TR) else T' := joinRightWB(c, k, TR) (l1, k1, r1) := expose(T') if w(l) =α w(T') return Node(l, k', T') else if w(l) =α w(l1) and w(l)+w(l1) =α w(r1) return rotateLeft(Node(l, k', T')) else return rotateLeft(Node(l, k', rotateRight(T')) function joinLeftWB(TL, k, TR) /* symmetric to joinRightWB */ function join(TL, k, TR) if w(TL) >α w(TR) return joinRightWB(TL, k, TR) else if w(TR) >α w(TL) return joinLeftWB(TL, k, TR) else return Node(TL, k, TR) Where: * w(v) is the weight of node ''v.'' * w_1 =_\alpha w_2 means weights w_1 and w_2 are α-weight-balanced. * w_1 >_\alpha w_2 means weight w_1 is heavier than weight w_2 with respect to the α-weight-balance. * \text(v) extracts the left child l, key k, and right child r of node v into a tuple (l,k,r) . * \text(l,k,r) creates a node with left child l, key k and right child r.


Join-based algorithms

In the following, \text(v) extracts the left child l, key k, and right child r of node v into a tuple (l,k,r) . \text(l,k,r) creates a node with left child l, key k and right child r. "s_1 , , s_2" means that two statements s_1 and s_2 can run in parallel.


Split

To split a tree into two trees, those smaller than key ''x'', and those larger than key ''x'', we first draw a path from the root by inserting ''x'' into the tree. After this insertion, all values less than ''x'' will be found on the left of the path, and all values greater than ''x'' will be found on the right. By applying ''Join'', all the subtrees on the left side are merged bottom-up using keys on the path as intermediate nodes from bottom to top to form the left tree, and the right part is asymmetric. For some applications, ''Split'' also returns a boolean value denoting if ''x'' appears in the tree. The cost of ''Split'' is O(\log n), order of the height of the tree. The split algorithm is as follows: function split(T, k) if (T = nil) return (nil, false, nil) else (L, m, R) := expose(T) if k < m (L', b, R') := split(L, k) return (L', b, join(R', m, R)) else if k > m (L', b, R') := split(R, k) return (join(L, m, L'), b, R')) else return (L, true, R)


Join2

This function is defined similarly as ''join'' but without the middle key. It first splits out the last key k of the left tree, and then join the rest part of the left tree with the right tree with k. The algorithm is as follows: function splitLast(T) (L, k, R) := expose(T) if R = nil return (L, k) else (T', k') := splitLast(R) return (join(L, k, T'), k') function join2(L, R) if L = nil return R else (L', k) := splitLast(L) return join(L', k, R) The cost is O(\log n) for a tree of size n.


Insert and delete

The insertion and deletion algorithms, when making use of ''join'' can be independent of balancing schemes. For an insertion, the algorithm compares the key to be inserted with the key in the root, inserts it to the left/right subtree if the key is smaller/greater than the key in the root, and joins the two subtrees back with the root. A deletion compares the key to be deleted with the key in the root. If they are equal, return join2 on the two subtrees. Otherwise, delete the key from the corresponding subtree, and join the two subtrees back with the root. The algorithms are as follows: function insert(T, k) if T = nil return Node(nil, k, nil) else (L, k', R) := expose(T) if k < k' return join(insert(L,k), k', R) else if k > k' return join(L, k', insert(R, k)) else return T function delete(T, k) if T = nil return nil else (L, k', R) := expose(T) if k < k' return join(delete(L, k), k', R) else if k > k' return join(L, k', delete(R, k)) else return join2(L, R) Both insertion and deletion requires O(\log n) time if , T, =n.


Set–set functions

Several set operations have been defined on weight-balanced trees:
union Union commonly refers to: * Trade union, an organization of workers * Union (set theory), in mathematics, a fundamental operation on sets Union may also refer to: Arts and entertainment Music * Union (band), an American rock group ** ''Un ...
, intersection and
set difference In set theory, the complement of a set , often denoted by (or ), is the set of elements not in . When all sets in the universe, i.e. all sets under consideration, are considered to be members of a given set , the absolute complement of is the ...
. The union of two weight-balanced trees and representing sets and , is a tree that represents . The following recursive function computes this union: function union(t1, t2) if t1 = nil return t2 else if t2 = nil return t1 else (l1, k1, r1) := expose(t1) (t<, b, t>) := split(t2, k1) l' := union(l1, t<) , , r' := union(r1, t>) return join(l', k1, r') Similarly, the algorithms of intersection and set-difference are as follows: function intersection(t1, t2) if t1 = nil or t2 = nil return nil else (l1, k1, r1) := expose(t1) (t<, b, t>) = split(t2, k1) l' := intersection(l1, t<) , , r' := intersection(r1, t>) if b return join(l', k1, r') else return join2(l', r') function difference(t1, t2) if t1 = nil return nil else if t2 = nil return t1 else (l1, k1, r1) := expose(t1) (t<, b, t>) := split(t2, k1) l' = difference(l1, t<) , , r' = difference(r1, t>) if b return join2(l', r') else return join(l', k1, r') The complexity of each of union, intersection and difference is O\left(m \log \left(\tfrac+1\right)\right) for two weight-balanced trees of sizes m and n(\ge m). This complexity is optimal in terms of the number of comparisons. More importantly, since the recursive calls to union, intersection or difference are independent of each other, they can be executed in parallel with a parallel depth O(\log m\log n). When m=1, the join-based implementation applies the same computation as in a single-element insertion or deletion if the root of the larger tree is used to split the smaller tree.


Build

The algorithm for building a tree can make use of the union algorithm, and use the divide-and-conquer scheme: function build(A[], n) if n = 0 return nil else if n = 1 return Node(nil, A[0], nil) else l' := build(A, n/2) , , r' := (A+n/2, n-n/2) return union(L, R) This algorithm costs O(n\log n) work and has O(\log^3 n) depth. A more-efficient algorithm makes use of a parallel sorting algorithm. function buildSorted(A[], n) if n = 0 return nil else if n = 1 return Node(nil, A[0], nil) else l' := build(A, n/2) , , r' := (A+n/2+1, n-n/2-1) return join(l', A /2 r') function build(A[], n) A' := sort(A, n) return buildSorted(A, n) This algorithm costs O(n\log n) work and has O(\log n) depth assuming the sorting algorithm has O(n\log n) work and O(\log n) depth.


Filter

This function selects all entries in a tree satisfying a predicate p, and return a tree containing all selected entries. It recursively filters the two subtrees, and join them with the root if the root satisfies p, otherwise ''join2'' the two subtrees. function filter(T, p) if T = nil return nil else (l, k, r) := expose(T) l' := filter(l, p) , , r' := filter(r, p) if p(k) return join(l', k, r') else return join2(l', R) This algorithm costs work O(n) and depth O(\log^2 n) on a tree of size n, assuming p has constant cost.


Used in libraries

The join-based algorithms are applied to support interface for sets,
maps A map is a symbolic depiction emphasizing relationships between elements of some space, such as objects, regions, or themes. Many maps are static, fixed to paper or some other durable medium, while others are dynamic or interactive. Although ...
, and
augmented map In computer science, the augmented map is an abstract data type (ADT) based on ordered maps, which associates each ordered map an augmented value. For an ordered map m with key type K, comparison function <_K on K and value type ...
s in libraries such as
Hackage Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lang ...
,
SML/NJ Standard ML of New Jersey (SML/NJ; Standard Meta-Language of New Jersey) is a free and open-source compiler and programming environment for the Standard ML programming language. Aside from its runtime system, which is written in C, SML/NJ is wr ...
, and PAM.


Notes


References


External links


PAM
the parallel augmented map library
Hackage
Containers in Hackage {{data structures Binary trees