T-tree
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
a T-tree is a type of
binary tree In computer science, a binary tree is a k-ary k = 2 tree data structure in which each node has at most two children, which are referred to as the ' and the '. A recursive definition using just set theory notions is that a (non-empty) binary t ...
data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, ...
that is used by main-memory databases, such as Datablitz,
eXtremeDB eXtremeDB is a high performance, low-latency, ACID-compliant embedded database management system using an in-memory database system (IMDS) architecture and designed to be linked into C/ C++ based programs. It works on Windows, Linux, and other ...
,
MySQL Cluster MySQL Cluster is a technology providing shared-nothing clustering and auto-sharding for the MySQL database management system. It is designed to provide high availability and high throughput with low latency, while allowing for near linear scal ...
, Oracle TimesTen and MobileLite. A T-tree is a balanced index tree data structure optimized for cases where both the index and the actual data are fully kept in memory, just as a
B-tree In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for ...
is an index structure optimized for storage on block oriented secondary storage devices like hard disks. T-trees seek to gain the performance benefits of in-memory tree structures such as AVL trees while avoiding the large storage space overhead which is common to them. T-trees do not keep copies of the indexed data fields within the index tree nodes themselves. Instead, they take advantage of the fact that the actual data is always in main memory together with the index so that they just contain pointers to the actual data fields. The 'T' in T-tree refers to the shape of the node data structures in the original paper which first described this type of index.


Node structures

A T-tree node usually consists of pointers to the parent node, the left and right child node, an ordered array of data pointers and some extra control data. Nodes with two
subtree In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be con ...
s are called ''internal nodes'', nodes without
subtree In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be con ...
s are called ''leaf nodes'' and nodes with only one
subtree In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be con ...
are named ''half-leaf'' nodes. A node is called the ''bounding node'' for a value if the value is between the node's current minimum and maximum value, inclusively. For each internal node, leaf or half leaf nodes exist that contain the predecessor of its smallest data value (called the ''greatest lower bound'') and one that contains the successor of its largest data value (called the ''least upper bound''). Leaf and half-leaf nodes can contain any number of data elements from one to the maximum size of the data array. Internal nodes keep their occupancy between predefined minimum and maximum numbers of elements


Algorithms


Search

* Search starts at the root node * If the current node is the bounding node for the search value then search its data array. Search fails if the value is not found in the data array. * If the search value is less than the minimum value of the current node then continue search in its left subtree. Search fails if there is no left subtree. * If the search value is greater than the maximum value of the current node then continue search in its right subtree. Search fails if there is no right subtree.


Insertion

* Search for a bounding node for the new value. If such a node exists then: ** check whether there is still space in its data array, if so then insert the new value and finish ** if no space is available then remove the minimum value from the node's data array and insert the new value. Now proceed to the node holding the greatest lower bound for the node that the new value was inserted to. If the removed minimum value still fits in there then add it as the new maximum value of the node, else create a new right subnode for this node. * If no bounding node was found then insert the value into the last node searched if it still fits into it. In this case the new value will either become the new minimum or maximum value. If the value doesn't fit anymore then create a new left or right subtree. If a new node was added then the tree might need to be rebalanced, as described below.


Deletion

* Search for bounding node of the value to be deleted. If no bounding node is found then finish. * If the bounding node does not contain the value then finish. * delete the value from the node's data array Now we have to distinguish by node type: * Internal node: If the node's data array now has less than the minimum number of elements then move the greatest lower bound value of this node to its data value. Proceed with one of the following two steps for the half leaf or leaf node the value was removed from. * Leaf node: If this was the only element in the data array then delete the node. Rebalance the tree if needed. * Half leaf node: If the node's data array can be merged with its leaf's data array without overflow then do so and remove the leaf node. Rebalance the tree if needed.


Rotation and balancing

A T-tree is implemented on top of an underlying
self-balancing binary search tree In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.Donal ...
. Specifically, Lehman and Carey's article describes a T-tree balanced like an
AVL tree In computer science, an AVL tree (named after inventors Adelson-Velsky and Landis) is a self-balancing binary search tree. It was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any nod ...
: it becomes out of balance when a node's child trees differ in height by at least two levels. This can happen after an insertion or deletion of a node. After an insertion or deletion, the tree is scanned from the leaf to the root. If an imbalance is found, one
tree rotation In discrete mathematics, tree rotation is an operation on a binary tree that changes the structure without interfering with the order of the elements. A tree rotation moves one node up in the tree and one node down. It is used to change the shape ...
or pair of rotations is performed, which is guaranteed to balance the whole tree. When the rotation results in an internal node having fewer than the minimum number of items, items from the node's new child(ren) are moved into the internal node.


Performance and Storage

Although T-trees were once widely used for main-memory databases because of performance benefits, recent trends for very large main-memory databases put more emphasis on provisioning cost. With modern NOSQL database systems often storing trillions of records, the memory cost to store even a single index that includes actual values can exceed tens or even hundreds of terabytes.


See also

*
Tree (graph theory) In graph theory, a tree is an undirected graph in which any two vertices are connected by ''exactly one'' path, or equivalently a connected acyclic undirected graph. A forest is an undirected graph in which any two vertices are connected by ...
*
Tree (set theory) In set theory, a tree is a partially ordered set (''T'', <) such that for each ''t'' ∈ ''T'', the set is well-ordered by the relation <. Frequently tre ...
*
Tree structure A tree structure, tree diagram, or tree model is a way of representing the hierarchical nature of a structure in a graphical form. It is named a "tree structure" because the classic representation resembles a tree, although the chart is genera ...
* Exponential tree


Other trees

*
B-tree In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for ...
( 2-3 tree, 2-3-4 tree,
B+ tree A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children. A B+ tree can be viewed as a ...
,
B*-tree In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for ...
,
UB-tree The UB-tree as proposed by Rudolf Bayer and Volker Markl is a balanced tree for storing and efficiently retrieving multidimensional data. It is basically a B+ tree A B+ tree is an m-ary tree with a variable but often large number of childre ...
) * Dancing tree *
Fusion tree In computer science, a fusion tree is a type of tree data structure that implements an associative array on -bit integers on a finite universe, where each of the input integer has size less than 2w and is non-negative. When operating on a collec ...
*
k-d tree In computer science, a ''k''-d tree (short for ''k-dimensional tree'') is a space-partitioning data structure for organizing points in a ''k''-dimensional space. ''k''-d trees are a useful data structure for several applications, such as sea ...
*
Octree An octree is a tree data structure in which each internal node has exactly eight children. Octrees are most often used to partition a three-dimensional space by recursively subdividing it into eight octants. Octrees are the three-dimensional ana ...
* Quadtree *
R-tree R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984 and has found sig ...
*
Radix tree In computer science, a radix tree (also radix trie or compact prefix tree or compressed trie) is a data structure that represents a space-optimized trie (prefix tree) in which each node that is the only child is merged with its parent. The resul ...
* Top tree


References


External links


Oracle TimesTen FAQ entry on index mentioning T-Trees

Oracle Whitepaper: Oracle TimesTen Products and Technologies

DataBlitz presentation mentioning T-Trees

An Open-source T*-tree Library
{{DEFAULTSORT:T-Tree Binary trees Search trees