In computer science, a red–black tree is a kind of self-balancing binary search tree. Each node stores an extra bit representing "color" ("red" or "black"), used to ensure that the tree remains balanced during insertions and deletions. When the tree is modified, the new tree is rearranged and "repainted" to restore the coloring properties that constrain how unbalanced the tree can become in the worst case. The properties are designed such that this rearranging and recoloring can be performed efficiently. The re-balancing is not perfect, but guarantees searching in O(\log n) time, where n is the number of entries. The insert and delete operations, along with the tree rearrangement and recoloring, are also performed in O(\log n) time. Tracking the color of each node requires only one bit of information per node because there are only two colors. The tree does not contain any other data specific to it being a red–black tree, so its memory footprint is almost identical to that of a classic (uncolored) binary search tree. In many cases, the additional bit of information can be stored at no additional memory cost.
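
To illustrate how little bookkeeping the color adds, the following is a minimal sketch of a possible node layout; the type and field names and the use of a one-bit bitfield are assumptions for illustration, not a prescribed layout.

// Hypothetical node layout: the color occupies a single bit.
// On many platforms that bit fits into existing padding (or into an
// otherwise unused low bit of an aligned pointer), so it costs no
// extra memory compared to an uncolored binary search tree node.
enum node_color { BLACK = 0, RED = 1 };

struct Node {
    Node*    parent;
    Node*    child[2];   // child[0] = left, child[1] = right
    int      key;
    unsigned color : 1;  // one bit: BLACK or RED
};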


History

In 1972, Rudolf Bayer invented a data structure that was a special order-4 case of a B-tree. These trees maintained all paths from root to leaf with the same number of nodes, creating perfectly balanced trees. However, they were not ''binary'' search trees. Bayer called them a "symmetric binary B-tree" in his paper and later they became popular as 2–3–4 trees or just 2–4 trees. In a 1978 paper, "A Dichromatic Framework for Balanced Trees", Leonidas J. Guibas and Robert Sedgewick derived the red–black tree from the symmetric binary B-tree. The color "red" was chosen because it was the best-looking color produced by the color laser printer available to the authors while working at Xerox PARC. Another response from Guibas states that it was because of the red and black pens available to them to draw the trees.

In 1993, Arne Andersson introduced the idea of a right-leaning tree to simplify insert and delete operations. In 1999, Chris Okasaki showed how to make the insert operation purely functional. Its balance function needed to take care of only 4 unbalanced cases and one default balanced case. The original algorithm used 8 unbalanced cases, but that was later reduced to 6 unbalanced cases. Sedgewick showed that the insert operation can be implemented in just 46 lines of Java code. In 2008, Sedgewick proposed the left-leaning red–black tree, leveraging Andersson's idea that simplified the insert and delete operations. Sedgewick originally allowed nodes whose two children are red, making his trees more like 2–3–4 trees, but later the restriction that no node may have two red children was added, making the new trees more like 2–3 trees. Sedgewick implemented the insert algorithm in just 33 lines, significantly shortening his original 46 lines of code.


Terminology

A red–black tree is a special type of binary search tree, used in computer science to organize pieces of comparable data, such as text fragments or numbers (as e.g. the numbers in figures 1 and 2). The nodes carrying keys and/or data are frequently called "internal nodes", but in order to make this very specific they are also called non-NIL nodes in this article.

The leaf nodes of red–black trees (NIL in figure 1) do not contain keys or data. These "leaves" need not be explicit individuals in computer memory: a NULL pointer can, as in all binary tree data structures, encode the fact that there is no child node at this position in the (parent) node. Nevertheless, by their position in the tree, these objects stand in relations to other nodes that are relevant to the RB structure: such an object may have a parent, a sibling (i.e., the other child of the parent), an uncle, even a nephew node, and may be the child of, but never the parent of, another node. It is not really necessary to attribute a "color" to these end-of-path objects, because the condition "is NIL or BLACK" is implied by the condition "is NIL" (see also this remark).

Figure 2 shows the conceptually same red–black tree without these NIL leaves. In order to arrive at the same notion of a path, one has to notice that e.g. 3 paths run through the node 1, namely a path through 1left plus 2 additional paths through 1right, namely the paths through 6left and 6right. This way, these ends of the paths are also docking points for new nodes to be inserted, fully equivalent to the NIL leaves of figure 1.

On the other hand, in order to save a marginal amount of execution time, these (possibly many) NIL leaves may be implemented as pointers to one unique (and black) sentinel node instead of pointers of value NULL. As a conclusion, the fact that a child does not exist (is not a true node, does not contain data) can in all occurrences be specified by the very same NULL pointer or by the very same pointer to a sentinel node. Throughout this article, either choice is called a NIL node and has the constant value NIL.

The black depth of a node is defined as the number of black nodes from the root to that node (i.e. the number of black ancestors). The black height of a red–black tree is the number of black nodes in any path from the root to the leaves, which, by requirement 4, is constant (alternatively, it could be defined as the black depth of any leaf node). The black height of a node is the black height of the subtree rooted by it. In this article, the black height of a NIL node shall be set to 0, because its subtree is empty as suggested by figure 2, and its tree height is also 0.
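
To make the black-height definition concrete, here is a small sketch that computes the black height of a subtree, assuming the hypothetical node layout sketched in the introduction (NIL represented by NULL); the function name is illustrative. It returns -1 if the subtree contains a black-violation.

// Black height of the subtree rooted at n: the number of black nodes on
// any path from n down to a NIL leaf, counting n itself if it is black.
// Returns -1 if the two child subtrees disagree (requirement 4 violated).
int black_height(const Node* n) {
    if (n == NULL)                  // NIL leaf: empty subtree, black height 0
        return 0;
    int l = black_height(n->child[0]);
    int r = black_height(n->child[1]);
    if (l < 0 || r < 0 || l != r)   // requirement 4 must hold in both subtrees
        return -1;
    return l + (n->color == BLACK ? 1 : 0);
}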


Properties

In addition to the requirements imposed on a binary search tree, the following must be satisfied by a red–black tree:

1. Every node is either red or black.
2. All NIL nodes (figure 1) are considered black.
3. A red node does not have a red child.
4. Every path from a given node to any of its descendant NIL nodes goes through the same number of black nodes.

Some authors, e.g. Cormen et al., claim "the root is black" as a fifth requirement; but not Mehlhorn & Sanders or Sedgewick & Wayne. Since the root can always be changed from red to black, this rule has little effect on analysis. This article also omits it, because it slightly disturbs the recursive algorithms and proofs. As an example, every perfect binary tree that consists only of black nodes is a red–black tree.

The read-only operations, such as search or tree traversal, do not affect any of the requirements. On the other hand, the modifying operations insert and delete easily maintain requirements 1 and 2, but with respect to the other requirements some extra effort has to be taken, in order to avoid the introduction of a violation of requirement 3, called a red-violation, or of requirement 4, called a black-violation.

The requirements enforce a critical property of red–black trees: ''the path from the root to the farthest leaf is no more than twice as long as the path from the root to the nearest leaf''. The result is that the tree is height-balanced. Since operations such as inserting, deleting, and finding values require worst-case time proportional to the height h of the tree, this upper bound on the height allows red–black trees to be efficient in the worst case, namely logarithmic in the number n of entries, i.e. h \in O(\log n), which is not the case for ordinary binary search trees. For a mathematical proof see section Proof of bounds.

Red–black trees, like all binary search trees, allow quite efficient sequential access (e.g. in-order traversal, that is: in the order Left–Root–Right) of their elements. But they also support asymptotically optimal direct access via a traversal from root to leaf, resulting in O(\log n) search time.
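
As a reading aid, the following is a small sketch that checks requirements 3 and 4 on a subtree (requirements 1 and 2 are implicit in the representation). It reuses the hypothetical black_height helper and node layout sketched above and is meant for illustration or debugging, not as part of the algorithms presented later.

// Requirement 3: a red node does not have a red child.
static bool no_red_violation(const Node* n) {
    if (n == NULL) return true;
    if (n->color == RED &&
        ((n->child[0] != NULL && n->child[0]->color == RED) ||
         (n->child[1] != NULL && n->child[1]->color == RED)))
        return false;                       // red-violation found
    return no_red_violation(n->child[0]) && no_red_violation(n->child[1]);
}

// A subtree is in RB shape if it has no red-violation and black_height
// (see the sketch above) reports no black-violation.
bool is_red_black(const Node* root) {
    return no_red_violation(root) && black_height(root) >= 0;
}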


Analogy to B-trees of order 4

A red–black tree is similar in structure to a B-tree of order 4, where each node can contain between 1 and 3 values and (accordingly) between 2 and 4 child pointers. In such a B-tree, each node will contain only one value matching the value in a black node of the red–black tree, with an optional value before and/or after it in the same node, both matching an equivalent red node of the red–black tree.

One way to see this equivalence is to "move up" the red nodes in a graphical representation of the red–black tree, so that they align horizontally with their parent black node, creating together a horizontal cluster. In the B-tree, or in the modified graphical representation of the red–black tree, all leaf nodes are at the same depth. The red–black tree is then structurally equivalent to a B-tree of order 4, with a minimum fill factor of 33% of values per cluster and a maximum capacity of 3 values.

This B-tree type is still more general than a red–black tree though, as it allows ambiguity in a red–black tree conversion: multiple red–black trees can be produced from an equivalent B-tree of order 4 (see figure 3). If a B-tree cluster contains only 1 value, it is the minimum, black, and has two child pointers. If a cluster contains 3 values, then the central value will be black and each value stored on its sides will be red. If the cluster contains two values, however, either one can become the black node in the red–black tree (and the other one will be red). So the order-4 B-tree does not maintain which of the values contained in each cluster is the root black node for the whole cluster and the parent of the other values in the same cluster.

Despite this, the operations on red–black trees are more economical in time because you don't have to maintain the vector of values, which may be costly if values are stored directly in each node rather than being stored by reference. B-tree nodes, however, are more economical in space because you don't need to store the color attribute for each node; instead, you have to know which slot in the cluster vector is used. If values are stored by reference, e.g. objects, null references can be used and so the cluster can be represented by a vector containing 3 slots for value pointers plus 4 slots for child references in the tree. In that case, the B-tree can be more compact in memory, improving data locality.

The same analogy can be made with B-trees of larger orders that can be structurally equivalent to a colored binary tree: you just need more colors. Suppose that you add blue: a blue–red–black tree, defined like red–black trees but with the additional constraint that no two successive nodes in the hierarchy will be blue and all blue nodes will be children of a red node, becomes equivalent to a B-tree whose clusters will have at most 7 values in the following colors: blue, red, blue, black, blue, red, blue (for each cluster, there will be at most 1 black node, 2 red nodes, and 4 blue nodes).

For moderate volumes of values, insertions and deletions in a colored binary tree are faster compared to B-trees because colored trees don't attempt to maximize the fill factor of each horizontal cluster of nodes (only the minimum fill factor is guaranteed in colored binary trees, limiting the number of splits or junctions of clusters). B-trees will be faster for performing rotations (because rotations will frequently occur within the same cluster rather than with multiple separate nodes in a colored binary tree). For storing large volumes, however, B-trees will be much faster as they will be more compact by grouping several children in the same cluster where they can be accessed locally. All optimizations possible in B-trees to increase the average fill factors of clusters are possible in the equivalent multicolored binary tree. Notably, maximizing the average fill factor in a structurally equivalent B-tree is the same as reducing the total height of the multicolored tree, by increasing the number of non-black nodes. The worst case occurs when all nodes in a colored binary tree are black; the best case occurs when only a third of them are black (and the other two thirds are red nodes).


Applications and related data structures

Red–black trees offer worst-case guarantees for insertion time, deletion time, and search time. Not only does this make them valuable in time-sensitive applications such as real-time applications, but it makes them valuable building blocks in other data structures that provide worst-case guarantees; for example, many data structures used in computational geometry can be based on red–black trees, and the Completely Fair Scheduler used in current Linux kernels and the epoll system call implementation use red–black trees.

The AVL tree is another structure supporting O(\log n) search, insertion, and removal. AVL trees can be colored red–black, and are thus a subset of red–black trees. Their worst-case height is 0.720 times the worst-case height of red–black trees, so AVL trees are more rigidly balanced. The performance measurements of Ben Pfaff with realistic test cases in 79 runs find AVL-to-RB ratios between 0.677 and 1.077, with a median of 0.947 and a geometric mean of 0.910. WAVL trees have a performance in between those two.

Red–black trees are also particularly valuable in functional programming, where they are one of the most common persistent data structures, used to construct associative arrays and sets that can retain previous versions after mutations. The persistent version of red–black trees requires O(\log n) space for each insertion or deletion, in addition to time.

For every 2–4 tree, there are corresponding red–black trees with data elements in the same order. The insertion and deletion operations on 2–4 trees are also equivalent to color-flipping and rotations in red–black trees. This makes 2–4 trees an important tool for understanding the logic behind red–black trees, and this is why many introductory algorithm texts introduce 2–4 trees just before red–black trees, even though 2–4 trees are not often used in practice.

In 2008, Sedgewick introduced a simpler version of the red–black tree called the left-leaning red–black tree by eliminating a previously unspecified degree of freedom in the implementation. The LLRB maintains an additional invariant that all red links must lean left except during inserts and deletes. Red–black trees can be made isometric to either 2–3 trees or 2–4 trees for any sequence of operations. The 2–4 tree isometry was described in 1978 by Sedgewick. With 2–4 trees, the isometry is resolved by a "color flip," corresponding to a split, in which the red color of two children nodes leaves the children and moves to the parent node.

The original description of the tango tree, a type of tree optimised for fast searches, specifically uses red–black trees as part of its data structure.

As of Java 8, the HashMap has been modified such that instead of using a LinkedList to store different elements with colliding hashcodes, a red–black tree is used. This results in improving the time complexity of searching for such an element from O(m) to O(\log m), where m is the number of elements with colliding hashcodes.


Operations

The read-only operations, such as search or tree traversal, on a red–black tree require no modification from those used for binary search trees, because every red–black tree is a special case of a simple binary search tree. However, the immediate result of an insertion or removal may violate the properties of a red–black tree, the restoration of which is called ''rebalancing'' so that red–black trees become self-balancing. It requires in the ''worst case'' a small number, O(\log n) in Big O notation, where n is the number of objects in the tree, on average or ''amortized'' O(1), a constant number, of color changes (which are very quick in practice); and no more than three tree rotations (two for insertion).

If the example implementation below is not suitable, other implementations with explanations may be found in Ben Pfaff's annotated C library GNU libavl (v2.0.3 as of June 2019).

The details of the insert and removal operations will be demonstrated with example C++ code. The example code uses the type definitions and macros below, as well as the helper function for rotations:

// Basic type definitions:
enum color_t { BLACK, RED };

struct RBnode {     // node of red–black tree
  RBnode* parent;   // == NIL if root of the tree
  RBnode* child[2]; // == NIL if child is empty
  enum color_t color;
  int key;
};

#define NIL   NULL // null pointer or pointer to sentinel node
#define LEFT  0
#define RIGHT 1
#define left  child[LEFT]
#define right child[RIGHT]

struct RBtree { // red–black tree
  RBnode* root; // == NIL if tree is empty
};

// Get the child direction (∈ { LEFT, RIGHT })
//   of the non-root non-NIL RBnode* N:
#define childDir(N) ( N == (N->parent)->right ? RIGHT : LEFT )

RBnode* RotateDirRoot(
    RBtree* T,   // red–black tree
    RBnode* P,   // root of subtree (may be the root of T)
    int dir);    // ∈ { LEFT, RIGHT }

#define RotateDir(N,dir) RotateDirRoot(T,N,dir)
#define RotateLeft(N)    RotateDirRoot(T,N,LEFT)
#define RotateRight(N)   RotateDirRoot(T,N,RIGHT)
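
The body of RotateDirRoot is not reproduced above. The following is a hedged sketch of a possible implementation, consistent with the type definitions and macros just given; it is a standard single rotation and is not claimed to be the exact original sample code.

RBnode* RotateDirRoot(
    RBtree* T,   // red–black tree
    RBnode* P,   // root of subtree (may be the root of T)
    int dir) {   // ∈ { LEFT, RIGHT }
  RBnode* G = P->parent;
  RBnode* S = P->child[1-dir]; // the child on the opposite side moves up
  // S is required to be a true (non-NIL) node here.
  RBnode* C = S->child[dir];
  P->child[1-dir] = C; if (C != NIL) C->parent = P;
  S->child[  dir] = P; P->parent = S;
  S->parent = G;
  if (G != NULL)
    G->child[ P == G->right ? RIGHT : LEFT ] = S;
  else
    T->root = S;
  return S; // new root of the subtree
}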


Notes to the sample code and diagrams of insertion and removal

The proposal breaks down both insertion and removal (not mentioning some very simple cases) into six constellations of nodes, edges and colors, which are called cases. The code repairing (rebalancing) a case sometimes uses code of a subsequent case. The proposal contains, for both insertion and removal, exactly one case that advances one black level closer to the root and loops; the other five cases rebalance the tree on their own. The more complicated cases are pictured in a diagram.

* The variable N denotes the current node, which is labeled N in the diagrams.
* The diagrams distinguish red nodes, (non-NIL) black nodes (of black height ≥ 1), and nodes whose color may be either red or black but is the same throughout the same diagram. NIL nodes are not represented in the diagrams.
* A diagram contains three columns and two to four actions. The left column shows the first iteration, the right column the higher iterations, and the middle column shows the segmentation of a case into its different actions. (The left columns contain far fewer nodes than the right ones, especially for removal. This indicates that some efficiency can be gained by pulling the first iteration out of the rebalancing loops of insertion and deletion, because many of the named nodes are NIL nodes in the first iteration and definitively non-NIL later. See also this remark.)
  1. The action "entry" shows the constellation of nodes with their colors which defines a case and mostly violates some of the requirements. A blue border rings the current node N and the other nodes are labeled according to their relation to N.
  2. If a rotation is considered useful, this is pictured in the next action, which is labeled "rotation".
  3. If some recoloring is considered useful, this is pictured in the next action, which is labeled "color".
  4. If there is still some need to repair, the cases make use of code of other cases, after a reassignment of the current node N, which then again carries a blue ring and relative to which other nodes may have to be reassigned as well. This action is labeled "reassign".
  For both insert and delete there is (exactly) one case which iterates one black level closer to the root; then the reassigned constellation satisfies the respective loop invariant.
* A possibly numbered triangle with a black circle atop represents a red–black subtree (connected to its parent according to requirement 3) with a black height equal to the iteration level minus one, i.e. zero in the first iteration. Its root may be red or black. A possibly numbered triangle without such a circle represents a red–black subtree with a black height one less, i.e. its parent has black height zero in the second iteration.

Remark: For simplicity, the sample code uses the disjunction

    U == NIL || U->color == BLACK // considered black

and the conjunction

    U != NIL && U->color == RED   // not considered black

Thereby, it has to be kept in mind that neither statement evaluates its second operand if U == NIL; in both cases U->color is then not touched (see short-circuit evaluation). (The comment "considered black" is in accordance with requirement 2.) The related if-statements have to occur far less frequently if the proposal is realised.


Insertion

Insertion begins by placing the new (non-NIL) node, say N, at the position in the binary search tree of a NIL node whose in-order predecessor's key compares less than the new node's key, which in turn compares less than the key of its in-order successor. (Frequently, this positioning is the result of a search within the tree immediately preceding the insert operation and consists of a node P together with a direction dir such that P->child[dir] == NIL.) The newly inserted node is temporarily colored red so that all paths contain the same number of black nodes as before. But if its parent, say P, is also red then this action introduces a red-violation.

void RBinsert1(
    RBtree* T,         // -> red–black tree
    struct RBnode* N,  // -> node to be inserted
    struct RBnode* P,  // -> parent node of N ( may be NULL )
    byte dir)          // side ( LEFT or RIGHT ) of P where to insert N
  // ... (rebalancing loop; a sketch is given below) ...
// end of RBinsert1

Because the algorithm transforms the input without using an auxiliary data structure and uses only a small amount of extra storage space for auxiliary variables, it is in-place.
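
Since the body of RBinsert1 is not reproduced above, the following is a minimal sketch of the rebalancing loop that such an insert routine runs after linking the red node N below P. It uses the type definitions, the childDir macro and the rotation helpers from the Operations section; the function name and the exact case ordering are illustrative, and it is not claimed to be the original sample code.

// Rebalance after inserting the red node N (already linked below its
// parent, or below NULL if N is the new root).
void RBinsertFixup(RBtree* T, RBnode* N) {
  RBnode* P;                          // parent of N
  while ((P = N->parent) != NULL && P->color == RED) {
    RBnode* G = P->parent;            // grandparent of N
    if (G == NULL) {                  // P is a red root: recolor it black and stop
      P->color = BLACK;
      return;
    }
    int dir = childDir(P);            // side of G on which P lies
    RBnode* U = G->child[1-dir];      // uncle of N
    if (U != NIL && U->color == RED) {
      // P and U red: push the black of G one level down and iterate at G
      P->color = BLACK;
      U->color = BLACK;
      G->color = RED;
      N = G;                          // one black level closer to the root
    } else {
      if (N == P->child[1-dir]) {     // N is an inner grandchild of G
        RotateDir(P, dir);            // make N an outer grandchild
        N = P;
        P = G->child[dir];
      }
      RotateDirRoot(T, G, 1-dir);     // outer case: rotate at the grandparent
      P->color = BLACK;
      G->color = RED;
      return;                         // the tree is in RB shape again
    }
  }
}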


Removal: simple cases

The label N denotes the current node, which at entry is the node to be deleted.

If N is the root and does not have a non-NIL child, it is replaced by a NIL node, after which the tree is empty and in RB-shape.

If N has two non-NIL children, an additional navigation to either the maximum element in its left subtree (which is the in-order predecessor) or the minimum element in its right subtree (which is the in-order successor) finds a node with no other node in between. This "replacement node", say R, has, as the maximal or minimal element of a subtree, at most one non-NIL child. In order to keep the software completely independent of the node structure as defined by the user, all red–black tree data related to N and R, i.e. the color of and the pointers to and from the two nodes, are exchanged. (The modified red–black tree is the same as before with the exception of the reversed order between N and R, an issue which is immediately resolved by the removal of N.) Now N has at most one non-NIL child.

If N has exactly one non-NIL child, it must be a red child, because if it were a black one then requirement 4 would force a second black non-NIL child.

If N is a red node, it cannot have a non-NIL child, because such a child would have to be black by requirement 3. Furthermore, it cannot have exactly one black child as argued just above. As a consequence, the red node N is without any child and can simply be removed. If N is a black node, it may have a red child or no non-NIL child at all. If N has a red child, it is simply replaced with this child after painting the latter black.
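
The case analysis above can be summarized in code. The following is a hedged sketch of the dispatch a delete routine might perform before the complex case of the next subsection; SwapNodeData and ReplaceChild are hypothetical helpers introduced only for this sketch, while RBdelete2 refers to the routine for removing a black non-root leaf discussed below.

// Hedged sketch of the simple removal cases; helper names are illustrative.
void RBdelete(RBtree* T, RBnode* N) {
  if (N->left != NIL && N->right != NIL) {
    // Two non-NIL children: exchange all RB data (color, pointers) with
    // the in-order successor R, then continue deleting at R's old position.
    RBnode* R = N->right;
    while (R->left != NIL) R = R->left;
    SwapNodeData(T, N, R);              // hypothetical helper
  }
  // Now N has at most one non-NIL child.
  if (N->left != NIL || N->right != NIL) {
    // Exactly one child: it is red (requirement 4); replace N by it, painted black.
    RBnode* C = (N->left != NIL) ? N->left : N->right;
    ReplaceChild(T, N, C);              // hypothetical helper
    C->color = BLACK;
  } else if (N->parent == NULL) {
    T->root = NIL;                      // N was a childless root: the tree becomes empty
  } else if (N->color == RED) {
    ReplaceChild(T, N, NIL);            // red leaf: simply unlink it
  } else {
    RBdelete2(T, N);                    // black non-root leaf: the complex case below
  }
  // Deallocating N is left to the caller's ownership model.
}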


Removal of a black non-root leaf

The complex case is when N is not the root, is colored black and has only NIL children (⇔ no proper child). In the first iteration, N is replaced by NIL.

void RBdelete2(
    RBtree* T,          // -> red–black tree
    struct RBnode* N)   // -> node to be deleted
  // ... (rebalancing loop; a sketch is given below) ...
// end of RBdelete2

Because the algorithm transforms the input without using an auxiliary data structure and uses only a small amount of extra storage space for auxiliary variables, it is in-place.
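
The body of RBdelete2 is not reproduced above. The following is a hedged sketch of what such a rebalancing loop can look like, following the usual case analysis for the removal of a black non-root leaf (exactly one case iterates toward the root, the others terminate). It uses the type definitions, the childDir macro and the rotation helper from the Operations section and is not claimed to be the original sample code.

void RBdelete2(RBtree* T, RBnode* N) {  // hedged sketch, not the original code
  RBnode* P = N->parent;     // parent of N; non-NULL since N is not the root
  int dir = childDir(N);     // side of P on which N hangs
  P->child[dir] = NIL;       // replace N by NIL (first iteration)
  // Every path through P->child[dir] now has one black node too few.
  while (true) {
    RBnode* S = P->child[1-dir];  // sibling of the deficient subtree (a true node)
    RBnode* D = S->child[1-dir];  // distant nephew
    RBnode* C = S->child[dir];    // close nephew
    if (S->color == RED) {        // red sibling: rotate so the deficient side gets a black sibling
      RotateDirRoot(T, P, dir);
      P->color = RED;
      S->color = BLACK;
      S = C;                      // new sibling, black and non-NIL
      D = S->child[1-dir];
      C = S->child[dir];
    }
    if (D != NIL && D->color == RED) {     // distant nephew red
      RotateDirRoot(T, P, dir);
      S->color = P->color;
      P->color = BLACK;
      D->color = BLACK;
      return;
    }
    if (C != NIL && C->color == RED) {     // close nephew red, distant nephew black
      RotateDirRoot(T, S, 1-dir);          // make the distant nephew red ...
      S->color = RED;
      C->color = BLACK;
      D = S;  S = C;
      RotateDirRoot(T, P, dir);            // ... and proceed as in the previous case
      S->color = P->color;
      P->color = BLACK;
      D->color = BLACK;
      return;
    }
    if (P->color == RED) {       // sibling and nephews black, parent red
      S->color = RED;
      P->color = BLACK;
      return;
    }
    // Parent, sibling and nephews all black:
    S->color = RED;              // the sibling's paths lose one black node, ...
    N = P;                       // ... so the whole subtree of P is now deficient
    P = N->parent;
    if (P == NULL)               // reached the root: the tree lost one black level everywhere
      return;
    dir = childDir(N);           // iterate one black level closer to the root
  }
}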


Proof of bounds

For h\in\N there is a red–black tree of height h with

    m_h = 2^{\lceil h/2\rceil} + 2^{\lfloor h/2\rfloor} - 2

nodes, and there is no red–black tree of this tree height with fewer nodes; therefore it is minimal. Its black height is \lceil h/2\rceil (with black root) or, for odd h (then with a red root), also (h-1)/2.

Proof: For a red–black tree of a certain height to have a minimal number of nodes, it has to have exactly one longest path with a maximal number of red nodes, in order to achieve a maximal tree height with a minimal black height. Besides this path all other nodes have to be black. If a node is taken off this tree it either loses height or some RB property.

The RB tree of height h=1 with red root is minimal. This is in agreement with

    m_1 = 2^{\lceil 1/2\rceil} + 2^{\lfloor 1/2\rfloor} - 2 = 2^1 + 2^0 - 2 = 1.

A minimal RB tree (RB_h in figure 4) of height h>1 has a root whose two child subtrees are of different height. The higher child subtree is also a minimal RB tree, containing a longest path that defines its height h-1; it has m_{h-1} nodes and the black height \lfloor(h-1)/2\rfloor =: s. The other subtree is a perfect binary tree of (black) height s having 2^s - 1 = 2^{\lfloor(h-1)/2\rfloor} - 1 black nodes and no red node. Then the number of nodes is by induction

    m_h = m_{h-1} + 1 + \bigl(2^{\lfloor(h-1)/2\rfloor} - 1\bigr) = m_{h-1} + 2^{\lfloor(h-1)/2\rfloor},

which resolves to the closed form above. The graph of the function m_h is convex and piecewise linear with breakpoints at (h = 2k,\; m_{2k} = 2 \cdot 2^k - 2) where k \in \N. The function has been tabulated as m_h = A027383(h-1) for h \geq 1.

Solving the function for h: The inequality 9 > 8 = 2^3 leads to 3 > 2^{3/2}, hence 3 \cdot 2^{-1/2} > 2, which for odd h leads to

    m_h = 3 \cdot 2^{(h-1)/2} - 2 = \bigl(3 \cdot 2^{-1/2}\bigr) \cdot 2^{h/2} - 2 > 2 \cdot 2^{h/2} - 2.

So in the even as well as the odd case m_h \geq 2 \cdot 2^{h/2} - 2, and for a tree with n \geq m_h nodes h lies in the interval

    \log_2(n+1) \;\leq\; h \;\leq\; 2 \log_2\!\bigl(\tfrac{n}{2}+1\bigr) = 2\log_2(n+2)-2,

with n being the number of nodes (the lower bound holds for every binary tree with n nodes).

Conclusion: A red–black tree with n nodes (keys) has tree height h \in O(\log n).
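
As a quick numerical check of this formula (an illustrative example, not part of the original proof), the first few minimal node counts are

    m_1 = 2^1+2^0-2 = 1,\quad m_2 = 2^1+2^1-2 = 2,\quad m_3 = 2^2+2^1-2 = 4,\quad m_4 = 2^2+2^2-2 = 6,\quad m_5 = 2^3+2^2-2 = 10,

matching the sequence A027383 quoted above. For instance, a red–black tree of height 5 needs at least 10 nodes; conversely, a red–black tree with only 9 nodes has height at most 4.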


Set operations and bulk operations

In addition to the single-element insert, delete and lookup operations, several set operations have been defined on red–black trees: union, intersection and set difference. Then fast ''bulk'' operations on insertions or deletions can be implemented based on these set functions. These set operations rely on two helper operations, ''Split'' and ''Join''. With the new operations, the implementation of red–black trees can be more efficient and highly parallelizable. In order to achieve its time complexities, this implementation requires that the root is allowed to be either red or black, and that every node stores its own black height.

* ''Join'': The function ''Join'' operates on two red–black trees t1 and t2 and a key k, where t1 < k < t2, i.e. all keys in t1 are less than k, and all keys in t2 are greater than k. It returns a tree containing all elements in t1 and t2, as well as k.
  If the two trees have the same black height, ''Join'' simply creates a new node with left subtree t1, root k and right subtree t2. If both t1 and t2 have a black root, k is set to be red. Otherwise k is set black.
  If the black heights are unequal, suppose that t1 has larger black height than t2 (the other case is symmetric). ''Join'' follows the right spine of t1 until a black node c which is balanced with t2. At this point a new node with left child c, root k (set to be red) and right child t2 is created to replace c. The new node may invalidate the red–black invariant because at most three red nodes can appear in a row. This can be fixed with a double rotation. If the double-red issue propagates to the root, the root is then set to be black, restoring the properties. The cost of this function is the difference of the black heights between the two input trees.
* ''Split'': To split a red–black tree into two smaller trees, those smaller than key k and those larger than key k, first draw a path from the root by inserting k into the red–black tree. After this insertion, all values less than k will be found on the left of the path, and all values greater than k will be found on the right. By applying ''Join'', all the subtrees on the left side are merged bottom-up using keys on the path as intermediate nodes from bottom to top to form the left tree, and the right part is symmetric.
  For some applications, ''Split'' also returns a boolean value denoting if k appears in the tree. The cost of ''Split'' is O(\log n), the order of the height of the tree. This algorithm actually has nothing to do with any special properties of a red–black tree, and may be used on any tree with a ''join'' operation, such as an AVL tree.

The join algorithm is as follows:

function joinRightRB(TL, k, TR):
    if (TL.color=black) and (TL.blackHeight=TR.blackHeight):
        return Node(TL,⟨k,red⟩,TR)
    T'=Node(TL.left,⟨TL.key,TL.color⟩,joinRightRB(TL.right,k,TR))
    if (TL.color=black) and (T'.right.color=T'.right.right.color=red):
        T'.right.right.color=black;
        return rotateLeft(T')
    return T'

function joinLeftRB(TL, k, TR):
    /* symmetric to joinRightRB */

function join(TL, k, TR):
    if TL.blackHeight>TR.blackHeight:
        T'=joinRightRB(TL,k,TR)
        if (T'.color=red) and (T'.right.color=red):
            T'.color=black
        return T'
    if TR.blackHeight>TL.blackHeight:
        /* symmetric */
    if (TL.color=black) and (TR.color=black):
        return Node(TL,⟨k,red⟩,TR)
    return Node(TL,⟨k,black⟩,TR)

The split algorithm is as follows:

function split(T, k):
    if (T = nil) return (nil, false, nil)
    if (k = T.key) return (T.left, true, T.right)
    if (k < T.key):
        (L',b,R') = split(T.left, k)
        return (L',b,join(R',T.key,T.right))
    (L',b,R') = split(T.right, k)
    return (join(T.left,T.key,L'),b,R')

The union of two red–black trees t1 and t2, representing sets A and B, is a red–black tree t that represents A ∪ B.
The following recursive function computes this union:

function union(t1, t2):
    if t1 = nil return t2
    if t2 = nil return t1
    (L1,b,R1)=split(t1,t2.key)
    proc1=start: TL=union(L1,t2.left)
    proc2=start: TR=union(R1,t2.right)
    wait all proc1,proc2
    return join(TL, t2.key, TR)

Here, ''split'' is presumed to return two trees: one holding the keys less than its input key, one holding the greater keys. (The algorithm is non-destructive, but an in-place destructive version exists as well.)

The algorithm for intersection or difference is similar, but requires the ''Join2'' helper routine, which is the same as ''Join'' but without the middle key. Based on the new functions for union, intersection or difference, either one key or multiple keys can be inserted to or deleted from the red–black tree. Since ''Split'' calls ''Join'' but does not deal with the balancing criteria of red–black trees directly, such an implementation is usually called the "join-based" implementation.

The complexity of each of union, intersection and difference is O\left(m \log\left(\frac{n}{m}+1\right)\right) for two red–black trees of sizes m and n(\ge m). This complexity is optimal in terms of the number of comparisons. More importantly, since the recursive calls to union, intersection or difference are independent of each other, they can be executed in parallel with a parallel depth O(\log m \log n). When m=1, the join-based implementation has the same computational directed acyclic graph (DAG) as single-element insertion and deletion if the root of the larger tree is used to split the smaller tree.


Parallel algorithms

Parallel algorithms for constructing red–black trees from sorted lists of items can run in constant time or O(\log \log n) time, depending on the computer model, if the number of processors available is asymptotically proportional to the number n of items where n\to\infty. Fast search, insertion, and deletion parallel algorithms are also known. The join-based algorithms for red–black trees are parallel for bulk operations, including union, intersection, construction, filter, map-reduce, and so on.


Parallel bulk operations

Basic operations like insertion, removal or update can be parallelised by defining operations that process bulks of multiple elements. It is also possible to process bulks with several basic operations; for example, bulks may contain elements to insert and also elements to remove from the tree. The algorithms for bulk operations aren't just applicable to the red–black tree, but can be adapted to other sorted sequence data structures as well, like the 2–3 tree, 2–3–4 tree and (a,b)-tree. In the following, different algorithms for bulk insert will be explained, but the same algorithms can also be applied to removal and update. Bulk insert is an operation that inserts each element of a sequence I into a tree T.


Join-based

This approach can be applied to every sorted sequence data structure that supports efficient join- and split-operations. The general idea is to split I and T into multiple parts and perform the insertions on these parts in parallel.

1. First the bulk I of elements to insert has to be sorted.
2. After that, the algorithm splits I into k \in \mathbb{N}^+ parts \langle I_1, \cdots, I_k \rangle of about equal sizes.
3. Next the tree T has to be split into k parts \langle T_1, \cdots, T_k \rangle in a way so that for every j \in \mathbb{N}^+, \, 1 \leq j < k the following constraints hold: \text{max}(I_j) < \text{min}(T_{j+1}) and \text{max}(T_j) < \text{min}(I_{j+1}).
4. Now the algorithm inserts each element of I_j into T_j sequentially. This step has to be performed for every j, which can be done by up to k processors in parallel.
5. Finally, the resulting trees will be joined to form the final result of the entire operation.

Note that in Step 3 the constraints for splitting I assure that in Step 5 the trees can be joined again and the resulting sequence is sorted.

(Figures: initial tree; split I and T; insert into the split T; join T.)

The pseudo code shows a simple divide-and-conquer implementation of the join-based algorithm for bulk-insert. Both recursive calls can be executed in parallel. The join operation used here differs from the version explained in this article; instead join2 is used, which misses the second parameter k.

bulkInsert(T, I, k):
    I.sort()
    bulkInsertRec(T, I, k)

bulkInsertRec(T, I, k):
    if k = 1:
        forall e in I: T.insert(e)
    else
        m := ⌊size(I) / 2⌋
        (T1, _, T2) := split(T, I[m])
        bulkInsertRec(T1, I[0 .. m], ⌈k / 2⌉)
        || bulkInsertRec(T2, I[m + 1 .. size(I) - 1], ⌊k / 2⌋)
        T ← join2(T1, T2)


Execution time

Sorting I is not considered in this analysis. This can be improved by using parallel algorithms for splitting and joining. In this case the execution time is \in O\left(\log |T| + \frac{|I|}{k} \log |T|\right).


Work


Pipelining

Another method of parallelizing bulk operations is to use a pipelining approach. This can be done by breaking the task of processing a basic operation up into a sequence of subtasks. For multiple basic operations the subtasks can be processed in parallel by assigning each subtask to a separate processor.

1. First the bulk I of elements to insert has to be sorted.
2. For each element in I the algorithm locates the according insertion position in T. This can be done in parallel for each element \in I since T won't be mutated in this process.
3. Now I has to be divided into subsequences S according to the insertion position of each element. For example, s_n is the subsequence of I that contains the elements whose insertion position would be to the left of node n.
4. The middle element m_n of every subsequence s_n will be inserted into T as a new node n'. This can be done in parallel for each m_n since by definition the insertion position of each m_n is unique. If s_n contains elements to the left or to the right of m_n, those will be contained in a new set of subsequences S, associated with the left and right positions below n'.
5. Now T possibly contains up to two consecutive red nodes at the end of the paths from the root to the leaves, which needs to be repaired. Note that, while repairing, the insertion positions of elements \in S have to be updated if the corresponding nodes are affected by rotations. If two nodes have different nearest black ancestors, they can be repaired in parallel. Since at most four nodes can have the same nearest black ancestor, the nodes at the lowest level can be repaired in a constant number of parallel steps. This step will be applied successively to the black levels above until T is fully repaired.
6. The steps 3 to 5 will be repeated on the new subsequences until S is empty. At this point every element \in I has been inserted. Each application of these steps is called a ''stage''.

Since the length of the subsequences in S is \in O(|I|) and in every stage the subsequences are being cut in half, the number of stages is \in O(\log |I|). Since all stages move up the black levels of the tree, they can be parallelised in a pipeline. Once a stage has finished processing one black level, the next stage is able to move up and continue at that level.

(Figures: initial tree; find insert positions; stage 1 inserts elements; stage 1 begins to repair nodes; stage 2 inserts elements; stage 2 begins to repair nodes; stage 3 inserts elements; stage 3 begins to repair nodes; stage 3 continues to repair nodes.)


Execution time

Sorting I is not considered in this analysis. Also, |I| is assumed to be smaller than |T|, otherwise it would be more efficient to construct the resulting tree from scratch.


Work


Popular culture

A red–black tree was referenced correctly in an episode of ''Missing'', as noted by Robert Sedgewick in one of his lectures.


See also

* List of data structures
* Tree data structure
* Tree rotation
* AA tree, a variation of the red–black tree
* AVL tree
* B-tree (2–3 tree, 2–3–4 tree, B+ tree, B*-tree, UB-tree)
* Scapegoat tree
* Splay tree
* T-tree
* WAVL tree


References and notes


Further reading


* Mathworld: Red–Black Tree
* Red–Black Tree notes by Roger Whitney


External links

* Ben Pfaff: ''An Introduction to Binary Search Trees and Balanced Trees.'' Free Software Foundation, Boston 2004, ftp.gnu.org (PDF gzip; 1662 kB)
* A complete and working implementation in C
* OCW MIT Lecture on Red-black Trees by Erik Demaine
* Visualization of random and pre-sorted data insertions, in elementary binary search trees, and left-leaning red–black trees
* An intrusive red–black tree written in C++
* Red–black BSTs in 3.3 Balanced Search Trees
* Red–black BST Demo