In
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, M-trees are
tree data structure
In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be conn ...
s that are similar to
R-tree
R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984 and has found sign ...
s and
B-tree
In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for n ...
s. It is constructed using a
metric
Metric or metrical may refer to:
* Metric system, an internationally adopted decimal system of measurement
* An adjective indicating relation to measurement in general, or a noun describing a specific type of measurement
Mathematics
In mathema ...
and relies on the
triangle inequality
In mathematics, the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side.
This statement permits the inclusion of degenerate triangles, but ...
for efficient range and
k-nearest neighbor
In statistics, the ''k''-nearest neighbors algorithm (''k''-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and reg ...
(k-NN) queries.
While M-trees can perform well in many conditions, the tree can also have large overlap and there is no clear strategy on how to best avoid overlap. In addition, it can only be used for
distance function
In mathematics, a metric space is a set together with a notion of ''distance'' between its elements, usually called points. The distance is measured by a function called a metric or distance function. Metric spaces are the most general settin ...
s that satisfy the triangle inequality, while many advanced dissimilarity functions used in
information retrieval
Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other co ...
do not satisfy this.
Overview
As in any Tree-based data structure, the M-Tree is composed of Nodes and Leaves. In each node there is a data object that identifies it uniquely and a pointer to a sub-tree where its children reside. Every leaf has several data objects. For each node there is a radius
that defines a Ball in the desired metric space. Thus, every node
and leaf
residing in a particular node
is at most distance
from
, and every node
and leaf
with node parent
keep the distance from it.
M-Tree construction
Components
An M-Tree has these components and sub-components:
# Non-leaf nodes
## A set of routing objects N
''RO''.
## Pointer to Node's parent object O
''p''.
# Leaf nodes
## A set of objects N
''O''.
## Pointer to Node's parent object O
''p''.
# Routing Object
## (Feature value of) routing object O
''r''.
## Covering radius r(O
''r'').
## Pointer to covering tree T(O
''r'').
## Distance of O
''r'' from its parent object d(O
''r'',P(O
''r''))
# Object
## (Feature value of the) object O
''j''.
## Object identifier oid(O
''j'').
## Distance of O
''j'' from its parent object d(O
''j'',P(O
''j''))
Insert
The main idea is first to find a leaf node where the new object belongs. If is not full then just attach it to . If is full then invoke a method to split . The algorithm is as follows:
Input: Node of M-Tree ,
Output: A new instance of containing all entries in original
's routing objects or objects
if is not a leaf then
}
else
}
/* Upgrade the new radii of the entry */
}
/* Continue inserting in the next level */
else
Split
If the split method arrives to the root of the tree, then it choose two routing objects from , and creates two new nodes containing all the objects in original , and store them into the new root. If split methods arrives to a node that is not the root of the tree, the method choose two new routing objects from , re-arrange every routing object in in two new nodes
and
, and store these new nodes in the parent node
of original . The split must be repeated if
has not enough capacity to store
. The algorithm is as follow:
Input: Node of M-Tree ,
Output: A new instance of containing a new partition.
/* The new routing objects are now all those in the node plus the new routing object */
let be entries of
if is not the root then
/* This node will contain part of the objects of the node to be split */
Create a new node
/* Promote two routing objects from the node to be split, to be new routing objects */
Create new objects
Promote()
/* Choose which objects from the node being split will act as new routing objects */
Partition()
/* Store entries in each new routing object */
if is the current root then
else
M-Tree Queries
Range Query
A range query is where a minimum similarity/maximum distance value is specified.
For a given query object and a maximum search distance
, the range query range(Q, r(Q)) selects all the indexed objects such that .
Algorithm RangeSearch starts from the root node and recursively traverses all the paths which cannot be excluded from leading to qualifying objects.
Input: Node of M-Tree MT, : query object, : search radius
Output: all the DB objects
*
is the identifier of the object which resides on a separate data file.
*
is a sub-tree – the covering tree of
k-NN queries
K Nearest Neighbor (k-NN) query takes the cardinality of the input set as an input parameter. For a given query object Q ∈ D and an
integer k ≥ 1, the k-NN query NN(Q, k) selects the k indexed objects which have the shortest distance from Q, according to the distance function d.
See also
*
Segment tree
In computer science, a segment tree, also known as a statistic tree, is a tree data structure used for storing information about intervals, or segments. It allows querying which of the stored segments contain a given point. It is, in principle ...
*
Interval tree In computer science, an interval tree is a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to f ...
- A degenerate R-Tree for 1 dimension (usually time).
*
Bounding volume hierarchy
A bounding volume hierarchy (BVH) is a tree structure on a set of geometric objects. All geometric objects, that form the leaf nodes of the tree, are wrapped in bounding volumes. These nodes are then grouped as small sets and enclosed within la ...
*
Spatial index
A spatial database is a general-purpose database (usually a relational database) that has been enhanced to include spatial data that represents objects defined in a geometric space, along with tools for querying and analyzing such data. Most spa ...
*
GiST
In computing, GiST or Generalized Search Tree, is a data structure and API that can be used to build a variety of disk-based search trees. GiST is a generalization of the B+ tree, providing a concurrent and recoverable height-balanced search tr ...
References
{{DEFAULTSORT:M-Tree
Trees (data structures)
Database index techniques
Geometric data structures