Scapegoat tree

In computer science, a scapegoat tree is a self-balancing binary search tree, invented by Arne Andersson in 1989 and again by Igal Galperin and Ronald L. Rivest in 1993. It provides worst-case O(\log n) lookup time (with n as the number of entries) and O(\log n) amortized insertion and deletion time.

Unlike most other self-balancing binary search trees which also provide worst-case O(\log n) lookup time, scapegoat trees have no additional per-node memory overhead compared to a regular binary search tree: besides key and value, a node stores only two pointers to the child nodes. This makes scapegoat trees easier to implement and, due to data structure alignment, can reduce node overhead by up to one-third.

Instead of the small incremental rebalancing operations used by most balanced tree algorithms, scapegoat trees rarely but expensively choose a "scapegoat" and completely rebuild the subtree rooted at the scapegoat into a complete binary tree. Thus, scapegoat trees have O(n) worst-case update performance.


Theory

A binary search tree is said to be weight-balanced if half the nodes are on the left of the root, and half on the right. An α-weight-balanced node is defined as meeting a relaxed weight balance criterion:

 size(left) ≤ α*size(node)
 size(right) ≤ α*size(node)

Where size can be defined recursively as:

 function size(node) is
     if node = nil then
         return 0
     else
         return size(node->left) + size(node->right) + 1
     end if
 end function

Even a degenerate tree (linked list) satisfies this condition if α = 1, whereas an α = 0.5 would only match almost complete binary trees.

A binary search tree that is α-weight-balanced must also be α-height-balanced, that is

 height(tree) ≤ floor(log_{1/α}(size(tree)))

By contraposition, a tree that is not α-height-balanced is not α-weight-balanced.

Scapegoat trees are not guaranteed to keep α-weight-balance at all times, but are always loosely α-height-balanced in that

 height(scapegoat tree) ≤ floor(log_{1/α}(size(tree))) + 1

Violations of this height balance condition can be detected at insertion time, and imply that a violation of the weight balance condition must exist. This makes scapegoat trees similar to red–black trees in that they both have restrictions on their height. They differ greatly though in their implementations of determining where the rotations (or in the case of scapegoat trees, rebalances) take place. Whereas red–black trees store additional 'color' information in each node to determine the location, scapegoat trees find a scapegoat which isn't α-weight-balanced to perform the rebalance operation on. This is loosely similar to AVL trees, in that the actual rotations depend on 'balances' of nodes, but the means of determining the balance differs greatly. Since AVL trees check the balance value on every insertion/deletion, it is typically stored in each node; scapegoat trees are able to calculate it only as needed, which is only when a scapegoat needs to be found.

Unlike most other self-balancing search trees, scapegoat trees are entirely flexible as to their balancing. They support any α such that 0.5 < α < 1. A high α value results in fewer balances, making insertion quicker but lookups and deletions slower, and vice versa for a low α. Therefore, in practical applications, α can be chosen depending on how frequently these actions should be performed.
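As an illustration, the following is a minimal Python sketch of the α-weight-balance check and the height bound above. The names Node, size, is_alpha_weight_balanced and height_limit are assumptions chosen for this sketch, not taken from any particular implementation.

 import math

 class Node:
     def __init__(self, key, left=None, right=None):
         self.key = key
         self.left = left
         self.right = right

 def size(node):
     # Number of nodes in the subtree rooted at node; an empty subtree has size 0.
     if node is None:
         return 0
     return size(node.left) + size(node.right) + 1

 def is_alpha_weight_balanced(node, alpha):
     # A node is alpha-weight-balanced if neither child subtree exceeds
     # alpha times the size of the subtree rooted at the node.
     s = size(node)
     return size(node.left) <= alpha * s and size(node.right) <= alpha * s

 def height_limit(n, alpha):
     # alpha-height-balance bound: height(tree) <= floor(log_{1/alpha}(n)).
     # A scapegoat tree is kept loosely balanced, i.e. within this bound plus one.
     return math.floor(math.log(n, 1 / alpha))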


Operations


Lookup

Lookup is not modified from a standard binary search tree, and has a worst-case time of O(\log n). This is in contrast to splay trees which have a worst-case time of O(n). The reduced node memory overhead compared to other self-balancing binary search trees can further improve locality of reference and caching.
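Because lookup is unchanged, it can be written exactly as for any binary search tree; a minimal sketch, reusing the hypothetical Node class from the Theory section:

 def lookup(node, key):
     # Standard BST search; scapegoat trees leave it unchanged.
     while node is not None:
         if key < node.key:
             node = node.left
         elif key > node.key:
             node = node.right
         else:
             return node
     return None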


Insertion

Insertion is implemented with the same basic ideas as an unbalanced binary search tree, however with a few significant changes. When finding the insertion point, the depth of the new node must also be recorded. This is implemented via a simple counter that gets incremented during each iteration of the lookup, effectively counting the number of edges between the root and the inserted node. If this node violates the α-height-balance property (defined above), a rebalance is required.

To rebalance, an entire subtree rooted at a scapegoat undergoes a balancing operation. The scapegoat is defined as being an ancestor of the inserted node which isn't α-weight-balanced. There will always be at least one such ancestor. Rebalancing any of them will restore the α-height-balanced property.

One way of finding a scapegoat is to climb from the new node back up to the root and select the first node that isn't α-weight-balanced. Climbing back up to the root requires O(\log n) storage space, usually allocated on the stack, or parent pointers. This can actually be avoided by pointing each child at its parent as you go down, and repairing on the walk back up.

To determine whether a potential node is a viable scapegoat, we need to check its α-weight-balanced property. To do this we can go back to the definition:

 size(left) ≤ α*size(node)
 size(right) ≤ α*size(node)

However, a large optimisation can be made by realising that we already know two of the three sizes, leaving only the third to be calculated. Consider the following example to demonstrate this. Assuming that we're climbing back up to the root:

 size(parent) = size(node) + size(sibling) + 1

But as:

 size(inserted node) = 1

the case is trivialized down to:

 size(x+1) = size(x) + size(sibling) + 1

where x = this node, x + 1 = parent and size(sibling) is the only function call actually required.

Once the scapegoat is found, the subtree rooted at the scapegoat is completely rebuilt to be perfectly balanced. This can be done in O(n) time by traversing the nodes of the subtree to find their values in sorted order and recursively choosing the median as the root of the subtree.

As rebalance operations take O(n) time (dependent on the number of nodes of the subtree), insertion has a worst-case performance of O(n) time. However, because these worst-case scenarios are spread out, insertion takes O(\log n) amortized time.
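The following is a minimal Python sketch of this insertion procedure, continuing the hypothetical Node class and size function from the Theory section. The tree fields (root, node_count, max_node_count) and the helper names (flatten, build_balanced, rebuild) are assumptions made for this sketch, not part of the original papers.

 import math

 def flatten(node, out):
     # In-order traversal: collect the subtree's nodes in sorted key order.
     if node is not None:
         flatten(node.left, out)
         out.append(node)
         flatten(node.right, out)

 def build_balanced(nodes, lo, hi):
     # Recursively choose the median of nodes[lo:hi] as the subtree root.
     if lo >= hi:
         return None
     mid = (lo + hi) // 2
     root = nodes[mid]
     root.left = build_balanced(nodes, lo, mid)
     root.right = build_balanced(nodes, mid + 1, hi)
     return root

 def rebuild(node):
     # O(n) rebuild of the subtree rooted at node into a perfectly balanced tree.
     nodes = []
     flatten(node, nodes)
     return build_balanced(nodes, 0, len(nodes))

 def insert(tree, key, alpha):
     tree.node_count += 1
     tree.max_node_count = max(tree.max_node_count, tree.node_count)
     new = Node(key)
     if tree.root is None:
         tree.root = new
         return

     # Walk down as in an unbalanced BST, remembering the path (and hence the depth).
     path = []
     node = tree.root
     while node is not None:
         path.append(node)
         node = node.left if key < node.key else node.right
     parent = path[-1]
     if key < parent.key:
         parent.left = new
     else:
         parent.right = new
     depth = len(path)  # number of edges between the root and the new node

     # Height-balance violation: climb back up to find a scapegoat.
     if depth > math.floor(math.log(tree.node_count, 1 / alpha)):
         child, child_size = new, 1
         scapegoat_index = 0                # rebuilding at the root always restores balance
         for i in range(len(path) - 1, -1, -1):
             anc = path[i]
             sibling = anc.left if anc.right is child else anc.right
             sib_size = size(sibling)       # the only size() call needed per step
             anc_size = child_size + sib_size + 1
             if max(child_size, sib_size) > alpha * anc_size:
                 scapegoat_index = i        # first ancestor that is not alpha-weight-balanced
                 break
             child, child_size = anc, anc_size

         # Rebuild the subtree rooted at the scapegoat and reattach the result.
         scapegoat = path[scapegoat_index]
         rebuilt = rebuild(scapegoat)
         if scapegoat_index == 0:
             tree.root = rebuilt
         else:
             grandparent = path[scapegoat_index - 1]
             if grandparent.left is scapegoat:
                 grandparent.left = rebuilt
             else:
                 grandparent.right = rebuilt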


Sketch of proof for cost of insertion

Define the imbalance of a node ''v'' to be the absolute value of the difference in size between its left subtree and its right subtree, minus 1, or 0, whichever is greater. In other words:

 I(v) = \max(|\operatorname{size}(\operatorname{left}(v)) - \operatorname{size}(\operatorname{right}(v))| - 1, 0)

Immediately after rebuilding a subtree rooted at ''v'', I(''v'') = 0.

Lemma: Immediately before rebuilding the subtree rooted at ''v'', I(v) \in \Omega(|v|) (\Omega is Big Omega notation).

Proof of lemma: Let v_0 be the root of a subtree immediately after rebuilding, so h(v_0) = \log(|v_0| + 1). If there are \Omega(|v_0|) degenerate insertions (that is, where each inserted node increases the height by 1), then I(v) \in \Omega(|v_0|), h(v) = h(v_0) + \Omega(|v_0|) and \log(|v|) \le \log(|v_0| + 1) + 1.

Since I(v) \in \Omega(|v|) before rebuilding, there were \Omega(|v|) insertions into the subtree rooted at v that did not result in rebuilding. Each of these insertions can be performed in O(\log n) time. The final insertion that causes rebuilding costs O(|v|). Using aggregate analysis it becomes clear that the amortized cost of an insertion is O(\log n):

 \frac{\Omega(|v|) \cdot O(\log n) + O(|v|)}{\Omega(|v|)} = O(\log n)


Deletion

Scapegoat trees are unusual in that deletion is easier than insertion. To enable deletion, scapegoat trees need to store an additional value with the tree data structure. This property, which we will call MaxNodeCount, simply represents the highest achieved NodeCount. It is set to NodeCount whenever the entire tree is rebalanced, and after insertion is set to max(MaxNodeCount, NodeCount).

To perform a deletion, we simply remove the node as we would in a simple binary search tree, but if NodeCount ≤ α*MaxNodeCount then we rebalance the entire tree about the root, remembering to set MaxNodeCount to NodeCount. This gives deletion a worst-case performance of O(n) time, whereas the amortized time is O(\log n).
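Under the same assumptions as the insertion sketch above (in particular the rebuild helper and the node_count/max_node_count fields), deletion might be sketched as follows; bst_delete is ordinary, unbalanced binary-search-tree deletion.

 def delete(tree, key, alpha):
     tree.root = bst_delete(tree.root, key)
     tree.node_count -= 1
     if tree.node_count <= alpha * tree.max_node_count:
         # Rebalance the entire tree about the root and reset MaxNodeCount.
         tree.root = rebuild(tree.root)
         tree.max_node_count = tree.node_count

 def bst_delete(node, key):
     # Ordinary BST deletion; returns the (possibly new) subtree root.
     if node is None:
         return None
     if key < node.key:
         node.left = bst_delete(node.left, key)
     elif key > node.key:
         node.right = bst_delete(node.right, key)
     else:
         if node.left is None:
             return node.right
         if node.right is None:
             return node.left
         # Two children: copy the in-order successor's key, then delete the successor.
         succ = node.right
         while succ.left is not None:
             succ = succ.left
         node.key = succ.key
         node.right = bst_delete(node.right, succ.key)
     return node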


Sketch of proof for cost of deletion

Suppose the scapegoat tree has n elements and has just been rebuilt (in other words, it is a complete binary tree). At most n/2 - 1 deletions can be performed before the tree must be rebuilt. Each of these deletions takes O(\log n) time (the amount of time to search for the element and flag it as deleted). The (n/2)-th deletion causes the tree to be rebuilt and takes O(\log n) + O(n) (or just O(n)) time. Using aggregate analysis it becomes clear that the amortized cost of a deletion is O(\log n):

 \frac{\tfrac{n}{2} \cdot O(\log n) + O(n)}{\tfrac{n}{2}} = O(\log n)


Etymology

The name Scapegoat tree "[...] is based on the common wisdom that, when something goes wrong, the first thing people tend to do is find someone to blame (the scapegoat)." In the Bible, a scapegoat is an animal that is ritually burdened with the sins of others, and then driven away.


See also

* Splay tree
* Trees
* Tree rotation
* AVL tree
* B-tree
* T-tree

