HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, a self-balancing binary search tree (BST) is any
node In general, a node is a localized swelling (a " knot") or a point of intersection (a vertex). Node may refer to: In mathematics * Vertex (graph theory), a vertex in a mathematical graph * Vertex (geometry), a point where two or more curves, line ...
-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sc ...
. ''
The Art of Computer Programming ''The Art of Computer Programming'' (''TAOCP'') is a comprehensive monograph written by the computer scientist Donald Knuth presenting programming algorithms and their analysis. Volumes 1–5 are intended to represent the central core of com ...
'', Volume 3: ''Sorting and Searching'', Second Edition. Addison-Wesley, 1998. . Section 6.2.3: Balanced Trees, pp.458–481.
These operations when designed for a self-balancing binary search tree, contain precautionary measures against boundlessly increasing tree height, so that these
abstract data structures In computer science, an abstract data type (ADT) is a mathematical model for data types. An abstract data type is defined by its behavior ( semantics) from the point of view of a '' user'', of the data, specifically in terms of possible values ...
receive the attribute "self-balancing". For height-balanced binary trees, the height is defined to be logarithmic \mathcal O(\log n) in the number n of items. This is the case for many binary search trees, such as AVL trees and
red–black tree In computer science, a red–black tree is a kind of self-balancing binary search tree. Each node stores an extra bit representing "color" ("red" or "black"), used to ensure that the tree remains balanced during insertions and deletions. When th ...
s. Splay trees and treaps are self-balancing but not height-balanced, as their height is not guaranteed to be logarithmic in the number of items. Self-balancing binary search trees provide efficient implementations for mutable ordered
lists A ''list'' is any set of items in a row. List or lists may also refer to: People * List (surname) Organizations * List College, an undergraduate division of the Jewish Theological Seminary of America * SC Germania List, German rugby union ...
, and can be used for other abstract data structures such as associative arrays,
priority queue In computer science, a priority queue is an abstract data-type similar to a regular queue or stack data structure in which each element additionally has a ''priority'' associated with it. In a priority queue, an element with high priority is se ...
s and sets.


Overview

Most operations on a binary search tree (BST) take time directly proportional to the height of the tree, so it is desirable to keep the height small. A binary tree with height ''h'' can contain at most 20+21+···+2''h'' = 2''h''+1−1 nodes. It follows that for any tree with ''n'' nodes and height ''h'': :n\le 2^-1 And that implies: :h\ge\lceil\log_2(n+1)-1\rceil\ge \lfloor\log_2 n\rfloor. In other words, the minimum height of a binary tree with ''n'' nodes is rounded down; that is, \lfloor \log_2 n\rfloor. However, the simplest algorithms for BST item insertion may yield a tree with height ''n'' in rather common situations. For example, when the items are inserted in sorted key order, the tree degenerates into a
linked list In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes which ...
with ''n'' nodes. The difference in performance between the two situations may be enormous: for example, when ''n'' = 1,000,000, the minimum height is \lfloor \log_2(1,000,000) \rfloor = 19 . If the data items are known ahead of time, the height can be kept small, in the average sense, by adding values in a random order, resulting in a
random binary search tree In computer science and probability theory, a random binary tree is a binary tree selected at random from some probability distribution on binary trees. Two different distributions are commonly used: binary trees formed by inserting nodes one at ...
. However, there are many situations (such as online algorithms) where this randomization is not viable. Self-balancing binary trees solve this problem by performing transformations on the tree (such as tree rotations) at key insertion times, in order to keep the height proportional to Although a certain overhead is involved, it is not bigger than the always necessary lookup cost and may be justified by ensuring fast execution of all operations. While it is possible to maintain a BST with minimum height with expected O(\log n) time operations (lookup/insertion/removal), the additional space requirements required to maintain such a structure tend to outweigh the decrease in search time. For comparison, an AVL tree is guaranteed to be within a factor of 1.44 of the optimal height while requiring only two additional bits of storage in a naive implementation. Therefore, most self-balancing BST algorithms keep the height within a constant factor of this lower bound. In the
asymptotic In analytic geometry, an asymptote () of a curve is a line such that the distance between the curve and the line approaches zero as one or both of the ''x'' or ''y'' coordinates tends to infinity. In projective geometry and related context ...
(" Big-O") sense, a self-balancing BST structure containing ''n'' items allows the lookup, insertion, and removal of an item in O(log ''n'') worst-case time, and ordered enumeration of all items in O(''n'') time. For some implementations these are per-operation time bounds, while for others they are amortized bounds over a sequence of operations. These times are asymptotically optimal among all data structures that manipulate the key only through comparisons.


Implementations

Data structures implementing this type of tree include: * 2–3 tree *
AA tree An AA tree in computer science is a form of balanced tree used for storing and retrieving ordered data efficiently. AA trees are named after Arne Andersson, the one who theorized them. AA trees are a variation of the red–black tree, a form of ...
* AVL tree * B-tree *
Red–black tree In computer science, a red–black tree is a kind of self-balancing binary search tree. Each node stores an extra bit representing "color" ("red" or "black"), used to ensure that the tree remains balanced during insertions and deletions. When th ...
* Scapegoat tree * Splay tree * Tango tree * Treap *
Weight-balanced tree In computer science, weight-balanced binary trees (WBTs) are a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences. These trees were introduced by Nievergelt and Reingold in t ...


Applications

Self-balancing binary search trees can be used in a natural way to construct and maintain ordered lists, such as
priority queue In computer science, a priority queue is an abstract data-type similar to a regular queue or stack data structure in which each element additionally has a ''priority'' associated with it. In a priority queue, an element with high priority is se ...
s. They can also be used for associative arrays; key-value pairs are simply inserted with an ordering based on the key alone. In this capacity, self-balancing BSTs have a number of advantages and disadvantages over their main competitor, hash tables. One advantage of self-balancing BSTs is that they allow fast (indeed, asymptotically optimal) enumeration of the items ''in key order'', which hash tables do not provide. One disadvantage is that their lookup algorithms get more complicated when there may be multiple items with the same key. Self-balancing BSTs have better worst-case lookup performance than hash tables (O(log n) compared to O(n)), but have worse average-case performance (O(log n) compared to O(1)). Self-balancing BSTs can be used to implement any algorithm that requires mutable ordered lists, to achieve optimal worst-case asymptotic performance. For example, if binary tree sort is implemented with a self-balancing BST, we have a very simple-to-describe yet asymptotically optimal O(''n'' log ''n'') sorting algorithm. Similarly, many algorithms in
computational geometry Computational geometry is a branch of computer science devoted to the study of algorithms which can be stated in terms of geometry. Some purely geometrical problems arise out of the study of computational geometric algorithms, and such problems ar ...
exploit variations on self-balancing BSTs to solve problems such as the line segment intersection problem and the point location problem efficiently. (For average-case performance, however, self-balancing BSTs may be less efficient than other solutions. Binary tree sort, in particular, is likely to be slower than merge sort, quicksort, or
heapsort In computer science, heapsort is a comparison-based sorting algorithm. Heapsort can be thought of as an improved selection sort: like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks ...
, because of the tree-balancing overhead as well as cache access patterns.) Self-balancing BSTs are flexible data structures, in that it's easy to extend them to efficiently record additional information or perform new operations. For example, one can record the number of nodes in each subtree having a certain property, allowing one to count the number of nodes in a certain key range with that property in O(log ''n'') time. These extensions can be used, for example, to optimize database queries or other list-processing algorithms.


See also

* Search data structure * Day–Stout–Warren algorithm *
Fusion tree In computer science, a fusion tree is a type of tree data structure that implements an associative array on -bit integers on a finite universe, where each of the input integer has size less than 2w and is non-negative. When operating on a collec ...
* Skip list *
Sorting Sorting refers to ordering data in an increasing or decreasing manner according to some linear relationship among the data items. # ordering: arranging items in a sequence ordered by some criterion; # categorizing: grouping items with similar pro ...


References


External links


Dictionary of Algorithms and Data Structures: Height-balanced binary search tree

GNU libavl
a LGPL-licensed library of binary tree implementations in C, with documentation {{DEFAULTSORT:Self-Balancing Binary Search Tree Binary trees Trees (data structures)