In computer science, a
Contents 1 Overview 1.1 Variants 1.2 Etymology 2
2.1 Time to search a sorted file
2.2 An index speeds the search
2.3 Insertions and deletions
2.4 Advantages of
3 Technical description 3.1 Terminology 3.2 Definition 4 Best case and worst case heights 5 Algorithms 5.1 Search 5.2 Insertion 5.3 Deletion 5.3.1 Deletion from a leaf node 5.3.2 Deletion from an internal node 5.3.3 Rebalancing after deletion 5.4 Sequential access 5.5 Initial construction 6 In filesystems 7 Variations 7.1 Access concurrency 8 See also 9 Notes 10 References 10.1 Original papers 11 External links Overview[edit] A
In B-trees, internal (non-leaf) nodes can have a variable number of
child nodes within some pre-defined range. When data is inserted or
removed from a node, its number of child nodes changes. In order to
maintain the pre-defined range, internal nodes may be joined or split.
Because a range of child nodes is permitted, B-trees do not need
re-balancing as frequently as other self-balancing search trees, but
may waste some space, since nodes are not entirely full. The lower and
upper bounds on the number of child nodes are typically fixed for a
particular implementation. For example, in a 2-3
d displaystyle d and 2 d displaystyle 2d , where d displaystyle d is the minimum number of keys, and d + 1 displaystyle d+1 is the minimum degree or branching factor of the tree. In practice, the keys take up the most space in a node. The factor of 2 will guarantee that nodes can be split or combined. If an internal node has 2 d displaystyle 2d keys, then adding a key to that node can be accomplished by splitting the hypothetical 2 d + 1 displaystyle 2d+1 key node into two d displaystyle d key nodes and moving the key that would have been in the middle to the parent node. Each split node has the required minimum number of keys. Similarly, if an internal node and its neighbor each have d displaystyle d keys, then a key may be deleted from the internal node by combining it with its neighbor. Deleting the key would make the internal node have d − 1 displaystyle d-1 keys; joining the neighbor would add d displaystyle d keys plus one more key brought down from the neighbor's parent. The result is an entirely full node of 2 d displaystyle 2d keys.
The number of branches (or child nodes) from a node will be one more
than the number of keys stored in the node. In a 2-3 B-tree, the
internal nodes will store either one key (with two child nodes) or two
keys (with three child nodes). A
( d + 1 ) displaystyle (d+1) — ( 2 d + 1 ) displaystyle (2d+1) or simply with the highest branching order, ( 2 d + 1 ) displaystyle (2d+1) .
A
In the B+ tree, copies of the keys are stored in the internal nodes; the keys and records are stored in leaves; in addition, a leaf node may include a pointer to the next leaf node to speed sequential access.[1] The B* tree balances more neighboring internal nodes to keep the internal nodes more densely packed.[1] This variant ensures non-root nodes are at least 2/3 full instead of 1/2 (Knuth 1998, p. 488). As the most costly part of operation of inserting the node in B-tree is splitting the node, B*-trees are created to postpone splitting operation as long as they can.[2] To maintain this, instead of immediately splitting up a node when it gets full, its keys are shared with a node next to it. This spill operation is less costly to do than split, because it requires only shifting the keys between existing nodes, not allocating memory for a new one.[2] For inserting, first it is checked whether the node has some free space in it, and if so, the new key is just inserted in the node. However, if the node is full (it has m − 1 keys, where m is the order of the tree as maximum number of pointers to subtrees from one node), it needs to be checked whether the right sibling exists and has some free space. If the right sibling has j < m − 1 keys, then keys are redistributed between the two sibling nodes as evenly as possible. For this purpose, m keys from the current node plus the new key inserted, one key from the parent node and j keys from the sibling node are seen as an ordered array of m + j + 1 keys. The array becomes split by half, so that ⌊(m + j + 1)/2⌋ lowest keys stay in the current node, the next (middle) key is inserted in the parent and the rest go to the right sibling.[2] (The newly inserted key might end up in any of the three places.) Situation when right sibling is full, and left isn't is analogous.[2] When both the sibling nodes are full, then the two nodes (current node and a sibling) are split into three and one more key is shifted up the tree, to the parent node.[2] If the parent is full, then spill/split operation propagates towards the root node.[2] Deleting nodes is somewhat more complex than inserting however. B-trees can be turned into order statistic trees to allow rapid searches for the Nth record in key order, or counting the number of records between any two records, and various other related operations.[3] Etymology[edit]
The origin of "B-tree" has never been explained by the authors. As we shall see, "balanced," "broad," or "bushy" might apply. Others suggest that the "B" stands for Boeing. Because of his contributions, however, it seems appropriate to think of B-trees as "Bayer"-trees. (Comer 1979, p. 123 footnote 1)
Bayer and I were in a lunchtime where we get to think [of] a name. And
... B is, you know ... We were working for
keeps keys in sorted order for sequential traversing uses a hierarchical index to minimize the number of disk reads uses partially full blocks to speed insertions and deletions keeps the index balanced with a recursive algorithm In addition, a
Every node has at most m children. Every non-leaf node (except root) has at least ⌈m/2⌉ children. The root has at least two children if it is not a leaf node. A non-leaf node with k children contains k−1 keys. All leaves appear in the same level Each internal node’s keys act as separation values which divide its subtrees. For example, if an internal node has 3 child nodes (or subtrees) then it must have 2 keys: a1 and a2. All values in the leftmost subtree will be less than a1, all values in the middle subtree will be between a1 and a2, and all values in the rightmost subtree will be greater than a2. Internal nodes
Internal nodes are all nodes except for leaf nodes and the root node.
They are usually represented as an ordered set of elements and child
pointers. Every internal node contains a maximum of U children and a
minimum of L children. Thus, the number of elements is always 1 less
than the number of child pointers (the number of elements is between
L−1 and U−1). U must be either 2L or 2L−1; therefore each
internal node is at least half full. The relationship between U and L
implies that two half-full nodes can be joined to make a legal node,
and one full node can be split into two legal nodes (if there’s room
to push one element up into the parent). These properties make it
possible to delete and insert new values into a
The root node The root node’s number of children has the same upper limit as internal nodes, but has no lower limit. For example, when there are fewer than L−1 elements in the entire tree, the root will be the only node in the tree with no children at all. Leaf nodes Leaf nodes have the same restriction on the number of elements, but have no children, and no child pointers. A
⌈ log m ( n + 1 ) ⌉ − 1 displaystyle lceil log _ m (n+1)rceil -1 Let d displaystyle d be the minimum number of children an internal (non-root) node can have. For an ordinary B-tree, d = ⌈ m / 2 ⌉ . displaystyle d=leftlceil m/2rightrceil . Comer (1979, p. 127) and Cormen et al. (2001, pp. 383–384)
give the worst case height of a
h ≤ ⌊ log d ( n + 1 2 ) ⌋ . displaystyle hleq leftlfloor log _ d left( frac n+1 2 right)rightrfloor . Algorithms[edit] This article may be confusing or unclear to readers. In particular, the discussion below uses "element", "value", "key", "separator", and "separation value" to mean essentially the same thing. The terms are not clearly defined. There are some subtle issues at the root and leaves. Please help us clarify the article. There might be a discussion about this on the talk page. (February 2012) (Learn how and when to remove this template message) Search[edit]
Searching is similar to searching a binary search tree. Starting at
the root, the tree is recursively traversed from top to bottom. At
each level, the search reduces its field of view to the child pointer
(subtree) whose range includes the search value. A subtree's range is
defined by the values, or keys, contained in its parent node. These
limiting values are also known as separation values.
A B Tree insertion example with each iteration. The nodes of this B tree have at most 3 children (Knuth order 3). All insertions start at a leaf node. To insert a new element, search the tree to find the leaf node where the new element should be added. Insert the new element into that node with the following steps: If the node contains fewer than the maximum allowed number of elements, then there is room for the new element. Insert the new element in the node, keeping the node's elements ordered. Otherwise the node is full, evenly split it into two nodes so: A single median is chosen from among the leaf's elements and the new element. Values less than the median are put in the new left node and values greater than the median are put in the new right node, with the median acting as a separation value. The separation value is inserted in the node's parent, which may cause it to be split, and so on. If the node has no parent (i.e., the node was the root), create a new root above this node (increasing the height of the tree). If the splitting goes all the way up to the root, it creates a new root with a single separator value and two children, which is why the lower bound on the size of internal nodes does not apply to the root. The maximum number of elements per node is U−1. When a node is split, one element moves to the parent, but one element is added. So, it must be possible to divide the maximum number U−1 of elements into two legal nodes. If this number is odd, then U=2L and one of the new nodes contains (U−2)/2 = L−1 elements, and hence is a legal node, and the other contains one more element, and hence it is legal too. If U−1 is even, then U=2L−1, so there are 2L−2 elements in the node. Half of this number is L−1, which is the minimum number of elements allowed per node. An improved algorithm[citation needed] supports a single pass down the tree from the root to the node where the insertion will take place, splitting any full nodes encountered on the way. This prevents the need to recall the parent nodes into memory, which may be expensive if the nodes are on secondary storage. However, to use this improved algorithm, we must be able to send one element to the parent and split the remaining U−2 elements into two legal nodes, without adding a new element. This requires U = 2L rather than U = 2L−1, which accounts for why some textbooks impose this requirement in defining B-trees. Deletion[edit] There are two popular strategies for deletion from a B-tree. Locate and delete the item, then restructure the tree to retain its invariants, OR Do a single pass down the tree, but before entering (visiting) a node, restructure the tree so that once the key to be deleted is encountered, it can be deleted without triggering the need for any further restructuring The algorithm below uses the former strategy. There are two special cases to consider when deleting an element: The element in an internal node is a separator for its child nodes Deleting an element may put its node under the minimum number of elements and children The procedures for these cases are in order below. Deletion from a leaf node[edit] Search for the value to delete. If the value is in a leaf node, simply delete it from the node. If underflow happens, rebalance the tree as described in section "Rebalancing after deletion" below. Deletion from an internal node[edit] Each element in an internal node acts as a separation value for two subtrees, therefore we need to find a replacement for separation. Note that the largest element in the left subtree is still less than the separator. Likewise, the smallest element in the right subtree is still greater than the separator. Both of those elements are in leaf nodes, and either one can be the new separator for the two subtrees. Algorithmically described below: Choose a new separator (either the largest element in the left subtree or the smallest element in the right subtree), remove it from the leaf node it is in, and replace the element to be deleted with the new separator. The previous step deleted an element (the new separator) from a leaf node. If that leaf node is now deficient (has fewer than the required number of nodes), then rebalance the tree starting from the leaf node. Rebalancing after deletion[edit] Rebalancing starts from a leaf and proceeds toward the root until the tree is balanced. If deleting an element from a node has brought it under the minimum size, then some elements must be redistributed to bring all nodes up to the minimum. Usually, the redistribution involves moving an element from a sibling node that has more than the minimum number of nodes. That redistribution operation is called a rotation. If no sibling can spare an element, then the deficient node must be merged with a sibling. The merge causes the parent to lose a separator element, so the parent may become deficient and need rebalancing. The merging and rebalancing may continue all the way to the root. Since the minimum element count doesn't apply to the root, making the root be the only deficient node is not a problem. The algorithm to rebalance the tree is as follows:[citation needed] If the deficient node's right sibling exists and has more than the minimum number of elements, then rotate left Copy the separator from the parent to the end of the deficient node (the separator moves down; the deficient node now has the minimum number of elements) Replace the separator in the parent with the first element of the right sibling (right sibling loses one node but still has at least the minimum number of elements) The tree is now balanced Otherwise, if the deficient node's left sibling exists and has more than the minimum number of elements, then rotate right Copy the separator from the parent to the start of the deficient node (the separator moves down; deficient node now has the minimum number of elements) Replace the separator in the parent with the last element of the left sibling (left sibling loses one node but still has at least the minimum number of elements) The tree is now balanced Otherwise, if both immediate siblings have only the minimum number of elements, then merge with a sibling sandwiching their separator taken off from their parent Copy the separator to the end of the left node (the left node may be the deficient node or it may be the sibling with the minimum number of elements) Move all elements from the right node to the left node (the left node now has the maximum number of elements, and the right node – empty) Remove the separator from the parent along with its empty right child (the parent loses an element) If the parent is the root and now has no elements, then free it and make the merged node the new root (tree becomes shallower) Otherwise, if the parent has fewer than the required number of elements, then rebalance the parent Note: The rebalancing operations are different for B+ trees (e.g., rotation is different because parent has copy of the key) and B*-tree (e.g., three siblings are merged into two siblings). Sequential access[edit] While freshly loaded databases tend to have good sequential behavior, this behavior becomes increasingly difficult to maintain as a database grows, resulting in more random I/O and performance challenges.[9] Initial construction[edit] This section does not cite any sources. Please help improve this section by adding citations to reliable sources. Unsourced material may be challenged and removed. (April 2018) (Learn how and when to remove this template message) A common special case is adding a large amount of pre-sorted data into
an initially empty B-tree. While it is quite possible to simply
perform a series of successive inserts, inserting sorted data results
in a tree comprised almost entirely of half-full nodes. Instead, a
special "bulk loading" algorithm can be used to produce a more
efficient tree with a higher branching factor.
When the input is sorted, all insertions are at the rightmost edge of
the tree, and in particular any time a node is split, we are
guaranteed that the no more insertions will take place in the left
half. When bulk loading, we take advantage of this, and instead of
splitting overfull nodes evenly, split them as unevenly as possible:
leave the left node completely full and create a right node with zero
keys and one child (in violation of the usual
i displaystyle i address into a disk block (or perhaps to a cylinder-head-sector) address. Some operating systems require the user to allocate the maximum size of the file when the file is created. The file can then be allocated as contiguous disk blocks. In that case, to convert the file block address i displaystyle i into a disk block address, the operating system simply adds the file block address i displaystyle i to the address of the first disk block constituting the file. The
scheme is simple, but the file cannot exceed its created size.
Other operating systems allow a file to grow. The resulting disk
blocks may not be contiguous, so mapping logical blocks to physical
blocks is more involved.
MS-DOS, for example, used a simple
i displaystyle i , the operating system (or disk utility) must sequentially follow the
file's linked list in the FAT. Worse, to find a free disk block, it
must sequentially scan the FAT. For MS-DOS, that was not a huge
penalty because the disks and files were small and the FAT had few
entries and relatively short file chains. In the
B+tree R-tree Red–black tree 2–3 tree 2–3–4 tree Notes[edit] ^ For FAT, what is called a "disk block" here is what the FAT documentation calls a "cluster", which is fixed-size group of one or more contiguous whole physical disk sectors. For the purposes of this discussion, a cluster has no significant difference from a physical sector. ^ Two of these were reserved for special purposes, so only 4078 could actually represent disk blocks (clusters). References[edit] ^ a b c Comer, Douglas (June 1979). "The Ubiquitous B-Tree". Computing
Surveys. 11 (2): 123–137. doi:10.1145/356770.356776.
^ a b c d e f Tomašević, Milo (2008). Algorithms and Data
Structures. Belgrade, Serbia: Akademska misao. pp. 274–275.
ISBN 978-86-7466-328-8.
^ Counted B-Trees, retrieved 2010-01-25
^ Knuth's video lectures from Stanford
^ Video of the talk at CPM 2013 (24th Annual Symposium on
Combinatorial Pattern Matching, Bad Herrenalb, Germany, June 17–19,
2013), retrieved 2014-01-17; see question asked by Martin
Farach-Colton
^ Seagate Technology LLC, Product Manual: Barracuda ES.2 Serial ATA,
Rev. F., publication 100468393, 2008 [1], page 6
^ Bayer & McCreight (1972) avoided the issue by saying an index
element is a (physically adjacent) pair of (x, a) where x is the
key, and a is some associated information. The associated information
might be a pointer to a record or records in a random access, but what
it was didn't really matter. Bayer & McCreight (1972) states, "For
this paper the associated information is of no further interest."
^ If n is zero, then no root node is needed, so the height of an empty
tree is not well defined.
^ "Cache Oblivious B-trees". State University of New York (SUNY) at
Stony Brook. Retrieved 2011-01-17.
^ Mark Russinovich. "Inside Win2K NTFS, Part 1". Microsoft Developer
Network. Archived from the original on 13 April 2008. Retrieved
2008-04-18.
^ Matthew Dillon (2008-06-21). "The
General Bayer, R.; McCreight, E. (1972), "Organization and Maintenance of
Large Ordered Indexes" (PDF), Acta Informatica, 1 (3): 173–189,
doi:10.1007/bf00288683
Comer, Douglas (June 1979), "The Ubiquitous B-Tree", Computing
Surveys, 11 (2): 123–137, doi:10.1145/356770.356776,
ISSN 0360-0300 .
Cormen, Thomas; Leiserson, Charles; Rivest, Ronald; Stein, Clifford
(2001),
Original papers[edit] Bayer, Rudolf; McCreight, E. (July 1970), Organization and Maintenance
of Large Ordered Indices, Mathematical and Information Sciences Report
No. 20,
External links[edit]
v t e Tree data structures Search trees (dynamic sets/associative arrays) 2–3 2–3–4 AA (a,b) AVL B B+ B* Bx (Optimal) Binary search Dancing HTree Interval Order statistic (Left-leaning) Red-black Scapegoat Splay T Treap UB Weight-balanced Heaps Binary Binomial Brodal Fibonacci Leftist Pairing Skew Van Emde Boas Weak Tries Ctrie
Spatial data partitioning trees Ball BK BSP Cartesian Hilbert R k-d (implicit k-d) M Metric MVP Octree Priority R Quad R R+ R* Segment VP X Other trees Cover Exponential Fenwick Finger Fractal tree index Fusion Hash calendar iDistance K-ary Left-child right-sibling Link/cut Log-structured merge Merkle PQ Range SPQR Top v t e Data structures Types Collection Container Abstract Associative array Multimap List Stack Queue Double-ended queue Priority queue Double-ended priority queue Set Multiset Disjoint-set Arrays Bit array Circular buffer Dynamic array Hash table Hashed array tree Sparse matrix Linked Association list Linked list Skip list Unrolled linked list XOR linked list Trees B-tree
AA tree AVL tree Red–black tree Self-balancing tree Splay tree Heap Binary heap Binomial heap Fibonacci heap R-tree R* tree R+ tree Hilbert R-tree Trie Hash tree Graphs Binary decision diagram Directed acyclic graph Directed acyclic word graph List |