In
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...
, the B
x tree is a query that is used to update efficient
B+ tree
A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children.
A B+ tree can be viewed as a B ...
-based index structures for moving objects.
Index structure
The base structure of the B
x-tree is a B+ tree in which the internal
node
In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex).
Node may refer to:
In mathematics
* Vertex (graph theory), a vertex in a mathematical graph
* Vertex (geometry), a point where two or more curves, line ...
s serve as a directory, each containing a
pointer
Pointer may refer to:
Places
* Pointer, Kentucky
* Pointers, New Jersey
* Pointers Airport, Wasco County, Oregon, United States
* The Pointers, a pair of rocks off Antarctica
People with the name
* Pointer (surname), a surname (including a list ...
to its right sibling. In the earlier version of the B
x-tree,
[Christian S. Jensen, Dan Lin, and Beng Chin Ooi]
Query and Update Efficient B+tree based Indexing of Moving Objects
In Proceedings of 30th International Conference on Very Large Data Bases (VLDB), pages 768-779, 2004. the
leaf node
In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be con ...
s contained the moving-
object
Object may refer to:
General meanings
* Object (philosophy), a thing, being, or concept
** Object (abstract), an object which does not exist at any particular time or place
** Physical object, an identifiable collection of matter
* Goal, an ai ...
locations being indexed and corresponding index time. In the optimized version,
[Dan Lin]
Indexing and Querying Moving Objects Databases
PhD thesis, National University of Singapore, 2006. each leaf node entry contains the id, velocity, single-dimensional mapping value and the latest update time of the object. The fanout is increased by not storing the locations of moving objects, as these can be derived from the
mapping values.
Utilizing the B+ tree for moving objects
As for many other moving objects indexes, a two-dimensional moving object is
modeled as a linear function as O = ((x, y), (vx, vy), t ), where (x, y) and (vx, vy) are location and
velocity
Velocity is the directional speed of an object in motion as an indication of its rate of change in position as observed from a particular frame of reference and as measured by a particular standard of time (e.g. northbound). Velocity i ...
of the object at a given time instance ''t'', i.e., the time of last update. The B+ tree is a structure for indexing single-dimensional data. In order to adopt the B+ tree as a moving object index, the B
x-tree uses a
linearization
In mathematics, linearization is finding the linear approximation to a function at a given point. The linear approximation of a function is the first order Taylor expansion around the point of interest. In the study of dynamical systems, linea ...
technique which helps to integrate objects' location at time ''t'' into single dimensional value. Specifically, objects are first partitioned according to their update time. For objects within the same partition, the B
x-tree stores their locations at a given time which are estimated by
linear interpolation
In mathematics, linear interpolation is a method of curve fitting using linear polynomials to construct new data points within the range of a discrete set of known data points.
Linear interpolation between two known points
If the two known po ...
. By doing so, the B
x-tree keeps a consistent view of all objects within the same partition without storing the update time of an objects.
Secondly, the space is partitioned by a grid and the location of an object is linearized within the partitions according to a space-filling curve, e.g., the
Peano
Giuseppe Peano (; ; 27 August 1858 – 20 April 1932) was an Italian mathematician and glottologist. The author of over 200 books and papers, he was a founder of mathematical logic and set theory, to which he contributed much notation. The sta ...
or
Hilbert curve
The Hilbert curve (also known as the Hilbert space-filling curve) is a continuous fractal space-filling curve first described by the German mathematician David Hilbert in 1891, as a variant of the space-filling Peano curves discovered by Giusepp ...
s.
Finally, with the combination of the partition number (time information) and the linear order (location information), an object is indexed in B
x-tree with a one-dimensional index key B
xvalue:
:
Here index-partition is an index partition determined by the update time and xrep is the space-filling curve value of the object position at the indexed time,
denotes the binary value of x, and “+” means concatenation.
Given an object O ((7, 2), (-0.1,0.05), 10), tmu = 120, the B
xvalue for O can be computed as follows.
# O is indexed in partition 0 as mentioned. Therefore, indexpartition = (00)
2.
# O’s position at the label timestamp of partition 0 is (1,5).
# Using Z-curve with order = 3, the Z-value of O, i.e., xrep is (010011)
2.
# Concatenating indexpartition and xrep, B
xvalue (00010011)
2=19.
# Example O ((0,6), (0.2, -0.3 ),10) and tmu=120 then O's position at the label timestamp of partition: ???
Insertion, update and deletion
Given a new object, its index key is computed and then the object is inserted into the B
x-tree as in the B+ tree. An update consists of a deletion followed by an insertion. An auxiliary structure is employed to keep the latest key of each index so that an object can be deleted by searching for the key. The indexing key is computed before affecting the tree. In this way, the B
x-tree directly inherits the good properties of the B+ tree, and achieves efficient update performance.
Queries
Range query
A range query retrieves all objects whose location falls within the rectangular range
at time
not prior to the current time.
The B
x-tree uses query-window enlargement technique to answer queries. Since the B
x-tree stores an object's location as of sometime after its update time, the enlargement involves two cases: a location must either be brought back to an earlier time or forward to a later time. The main idea is to enlarge the query window so that it encloses all objects whose positions are not within query window at its label timestamp but will enter the query window at the query timestamp.
After the enlargement, the partitions of the B
x-tree need to be traversed to find objects falling in the enlarged query window. In each partition, the use of a space-filling curve means that a range query in the native, two-dimensional space becomes a set of range queries in the transformed, one-dimensional space.
To avoid excessively large query region after expansion in skewed datasets, an optimization of the query algorithm exists, which improves the query efficiency by avoiding unnecessary query enlargement.
K nearest neighbor query
K nearest neighbor query is computed by iteratively performing range queries with an incrementally enlarged search region until k answers are obtained. Another possibility is to employ similar querying ideas in
The iDistance Technique
In pattern recognition, the iDistance is an indexing and query processing technique for k-nearest neighbor queries on point data in multi-dimensional metric spaces. The kNN query is one of the hardest problems on multi-dimensional data, especia ...
.
Other queries
The range query and K Nearest Neighbor query algorithms can be easily extended to support interval queries, continuous queries, etc.
Adapting relational database engines to accommodate moving objects
Since the B
x-tree is an index built on top of a B+ tree index, all operations in the B
x-tree, including the insertion, deletion and search, are the same as those in the B+ tree. There is no need to change the implementations of these operations. The only difference is to implement the procedure of deriving the indexing key as a stored procedure in an existing
DBMS
In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases span ...
. Therefore, the B
x-tree can be easily integrated into existing DBMS without touching the
kernel
Kernel may refer to:
Computing
* Kernel (operating system), the central component of most operating systems
* Kernel (image processing), a matrix used for image convolution
* Compute kernel, in GPGPU programming
* Kernel method, in machine lea ...
.
SpADE is moving object management system built on top of a popular relational database system
MySQL
MySQL () is an open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A relational database ...
, which uses the B
x-tree for indexing the objects. In the implementation, moving object data is transformed and stored directly on MySQL, and queries are transformed into standard SQL statements which are efficiently processed in the relational engine. Most importantly, all these are achieved neatly and independently without infiltrating into the MySQL core.
Performance tuning
Potential problem with data skew
The B
x tree uses a grid for space partitioning while mapping two-dimensional location into one-dimensional key. This may introduce performance degradation to both query and update operations while dealing with skewed data. If grid cell is oversize, many objects are contained in a cell. Since objects in a cell are indistinguishable to the index, there will be some overflow nodes in the underlying B+ tree. The existing of overflow pages not only destroys the balancing of the tree but also increases the update cost. As for the queries, for the given query region, large cell incurs more false positives and increases the processing time. On the other hand, if the space is partitioned with finer grid, i.e. smaller cells, each cell contains few objects. There is hardly overflow pages so that the update cost is minimized. Fewer false positives are retrieved in a query. However, more cells are needed to be searched. The increase in the number of cells searched also increases the workload of a query.
Index tuning
The ST
2B-tree
[Su Chen, Beng Chin Ooi, Kan-Lee. Tan, and Mario A. Nacismento]
ST2B-tree: A Self-Tunable Spatio-Temporal B+-tree for Moving Objects
. In Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD), page 29-42, 2008. introduces a self-tuning framework for tuning the performance of the B
x-tree while dealing with data skew in space and data change with time. In order to deal with data skew in space, the ST
2B-tree splits the entire space into regions of different object density using a set of reference points. Each region uses an individual grid whose cell size is determined by the object density inside of it.
The B
x-tree have multiple partitions regarding different time intervals. As time elapsed, each partition grows and shrinks alternately. The ST
2B-tree utilizes this feature to tune the index online in order to adjust the space partitioning to make itself accommodate to the data changes with time. In particular, as a partition shrinks to empty and starts growing, it chooses a new set of reference points and new grid for each reference point according to the latest data density. The tuning is based on the latest statistics collected during a given period of time, so that the way of space partitioning is supposed to fit the latest data distribution best. By this means, the ST
2B-tree is expected to minimize the effect caused by data skew in space and data changes with time.
See also
*
B+ tree
A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children.
A B+ tree can be viewed as a B ...
*
Hilbert curve
The Hilbert curve (also known as the Hilbert space-filling curve) is a continuous fractal space-filling curve first described by the German mathematician David Hilbert in 1891, as a variant of the space-filling Peano curves discovered by Giusepp ...
*
Z-order (curve)
In mathematical analysis and computer science, functions which are Z-order, Lebesgue curve, Morton space-filling curve, Morton order or Morton code map multidimensional data to one dimension while preserving locality of the data points. It i ...
References
{{DEFAULTSORT:Bx-Tree Moving Object Index
B-tree