MVP Tree
   HOME

TheInfoList



OR:

A vantage-point tree (or VP tree) is a
metric tree A metric tree is any tree data structure specialized to index data in metric spaces. Metric trees exploit properties of metric spaces such as the triangle inequality to make accesses to the data more efficient. Examples include the M-tree, vp-t ...
that segregates data in a
metric space In mathematics, a metric space is a set together with a notion of ''distance'' between its elements, usually called points. The distance is measured by a function called a metric or distance function. Metric spaces are the most general settin ...
by choosing a position in the space (the "vantage point") and partitioning the data points into two parts: those points that are nearer to the vantage point than a threshold, and those points that are not. By recursively applying this procedure to partition the data into smaller and smaller sets, a
tree data structure In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be conn ...
is created where neighbors in the tree are likely to be neighbors in the space. One generalization is called a multi-vantage-point tree (or MVP tree): a
data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
for indexing objects from large
metric space In mathematics, a metric space is a set together with a notion of ''distance'' between its elements, usually called points. The distance is measured by a function called a metric or distance function. Metric spaces are the most general settin ...
s for
similarity search Similarity search is the most general term used for a range of mechanisms which share the principle of searching (typically, very large) spaces of objects where the only available comparator is the similarity between any pair of objects. This is ...
queries. It uses more than one point to partition each level.


History

Peter Yianilos claimed that the vantage-point tree was discovered independently by him (Peter Yianilos) and by
Jeffrey Uhlmann Jeffrey K. Uhlmann is an American research scientist who is probably best known for his mathematical generalizations of the Kalman filter. Most of his publications and patents have been in the field of data fusion. He is also known for being a cult ...
. Yet, Uhlmann published this method before Yianilos in 1991. Uhlmann called the data structure a
metric tree A metric tree is any tree data structure specialized to index data in metric spaces. Metric trees exploit properties of metric spaces such as the triangle inequality to make accesses to the data more efficient. Examples include the M-tree, vp-t ...
, the name VP-tree was proposed by Yianilos. Vantage-point trees have been generalized to non-metric spaces using Bregman divergences by Nielsen et al. This iterative partitioning process is similar to that of a -d tree, but uses circular (or spherical, hyperspherical, etc.) rather than rectilinear partitions. In two-dimensional Euclidean space, this can be visualized as a series of circles segregating the data. The vantage-point tree is particularly useful in dividing data in a non-standard metric space into a metric tree.


Understanding a vantage-point tree

The way a vantage-point tree stores data can be represented by a circle. First, understand that each
node In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex). Node may refer to: In mathematics *Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two or more curves, lines, ...
of this
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are ...
contains an input point and a radius. All the left children of a given
node In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex). Node may refer to: In mathematics *Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two or more curves, lines, ...
are the points inside the circle and all the right children of a given
node In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex). Node may refer to: In mathematics *Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two or more curves, lines, ...
are outside of the circle. The
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are ...
itself does not need to know any other information about what is being stored. All it needs is the distance function that satisfies the properties of the
metric space In mathematics, a metric space is a set together with a notion of ''distance'' between its elements, usually called points. The distance is measured by a function called a metric or distance function. Metric spaces are the most general settin ...
.


Searching through a vantage-point tree

A vantage-point tree can be used to find the nearest neighbor of a point . The search algorithm is recursive. At any given step we are working with a node of the tree that has a vantage point and a threshold distance . The point of interest will be some distance from the vantage point . If that distance is less than then use the algorithm recursively to search the subtree of the node that contains the points closer to the vantage point than the threshold ; otherwise recurse to the subtree of the node that contains the points that are farther than the vantage point than the threshold . If the recursive use of the algorithm finds a neighboring point with distance to that is less than then it cannot help to search the other subtree of this node; the discovered node is returned. Otherwise, the other subtree also needs to be searched recursively. A similar approach works for finding the nearest neighbors of a point . In the recursion, the other subtree is searched for nearest neighbors of the point whenever only of the nearest neighbors found so far have distance that is less than .


Advantages of a vantage-point tree

# Instead of inferring multidimensional points for domain before the index being built, we build the index directly based on the distance. Doing this, avoids pre-processing steps. # Updating a vantage-point tree is relatively easy compared to the FastMap approach. For FastMap, after inserting or deleting data, there will come a time when FastMap will have to rescan itself. That takes up too much time and it is unclear to know when the rescanning will start. # Distance based methods are flexible. It is “able to index objects that are represented as feature vectors of a fixed number of dimensions."


Complexity

The time cost to build a vantage-point tree is approximately . For each element, the tree is descended by levels to find its placement. However there is a constant factor where is the number of vantage points per tree node. The time cost to search a vantage-point tree to find a single nearest neighbor is . There are levels, each involving distance calculations, where is the number of vantage points (elements) at that position in the tree. The time cost to search a vantage-point tree for a range, which may be the most important attribute, can vary greatly depending on the specifics of the algorithm used and parameters. Brin's paper gives the result of experiments with several vantage point algorithms with various parameters to investigate the cost, measured in number of distance calculations. The space cost for a vantage-point tree is approximately . Each element is stored, and each tree element in each non-leaf node requires a pointer to its descendant nodes. (See Brin for details on one implementation choice. The parameter for number of elements at each node plays a factor.) Note that some metric space tools require a matrix of pair-wise distance values, costing , but that is not required with vantage-point trees.


References


External links


Understanding VP Trees
{{CS-Trees Trees (data structures) Database index techniques Binary trees