Euclidean minimum spanning tree
   HOME

TheInfoList



OR:

A Euclidean minimum spanning tree of a
finite set In mathematics, particularly set theory, a finite set is a set that has a finite number of elements. Informally, a finite set is a set which one could in principle count and finish counting. For example, :\ is a finite set with five elements. Th ...
of points in the
Euclidean plane In mathematics, the Euclidean plane is a Euclidean space of dimension two. That is, a geometric setting in which two real quantities are required to determine the position of each point ( element of the plane), which includes affine notions of ...
or higher-dimensional
Euclidean space Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, that is, in Euclid's Elements, Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics ther ...
connects the points by a system of
line segment In geometry, a line segment is a part of a straight line that is bounded by two distinct end points, and contains every point on the line that is between its endpoints. The length of a line segment is given by the Euclidean distance between ...
s with the points as endpoints, minimizing the total length of the segments. In it, any two points can reach each other along a path through the line segments. It can be found as the
minimum spanning tree A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. T ...
of a
complete graph In the mathematical field of graph theory, a complete graph is a simple undirected graph in which every pair of distinct vertices is connected by a unique edge. A complete digraph is a directed graph in which every pair of distinct vertices is c ...
with the points as vertices and the
Euclidean distance In mathematics, the Euclidean distance between two points in Euclidean space is the length of a line segment between the two points. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem, therefor ...
s between points as edge weights. The edges of the minimum spanning tree meet at angles of at least 60°, at most six to a vertex. In higher dimensions, the number of edges per vertex is bounded by the
kissing number In geometry, the kissing number of a mathematical space is defined as the greatest number of non-overlapping unit spheres that can be arranged in that space such that they each touch a common unit sphere. For a given sphere packing (arrangement of ...
of tangent
unit sphere In mathematics, a unit sphere is simply a sphere of radius one around a given center. More generally, it is the set of points of distance 1 from a fixed central point, where different norms can be used as general notions of "distance". A unit b ...
s. The total length of the edges, for points in a
unit square In mathematics, a unit square is a square whose sides have length . Often, ''the'' unit square refers specifically to the square in the Cartesian plane with corners at the four points ), , , and . Cartesian coordinates In a Cartesian coordinate ...
, is at most proportional to the square root of the number of points. Each edge lies in an empty region of the plane, and these regions can be used to prove that the Euclidean minimum spanning tree is a subgraph of other geometric graphs including the
relative neighborhood graph In computational geometry, the relative neighborhood graph (RNG) is an undirected graph defined on a set of points in the Euclidean plane by connecting two points p and q by an edge whenever there does not exist a third point r that is closer to bo ...
and
Delaunay triangulation In mathematics and computational geometry, a Delaunay triangulation (also known as a Delone triangulation) for a given set P of discrete points in a general position is a triangulation DT(P) such that no point in P is inside the circumcircle o ...
. By constructing the Delaunay triangulation and then applying a graph minimum spanning tree algorithm, the minimum spanning tree of n given planar points may be found in time O(n\log n), as expressed in
big O notation Big ''O'' notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. Big O is a member of a family of notations invented by Paul Bachmann, Edmund Lan ...
. This is optimal in some models of computation, although faster
randomized algorithms A randomized algorithm is an algorithm that employs a degree of randomness as part of its logic or procedure. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performan ...
exist for points with integer coordinates. For points in higher dimensions, finding an optimal algorithm remains an
open problem In science and mathematics, an open problem or an open question is a known problem which can be accurately stated, and which is assumed to have an objective and verifiable solution, but which has not yet been solved (i.e., no solution for it is know ...
.


Definition and related problems

A Euclidean minimum spanning tree, for a set of n points in the
Euclidean plane In mathematics, the Euclidean plane is a Euclidean space of dimension two. That is, a geometric setting in which two real quantities are required to determine the position of each point ( element of the plane), which includes affine notions of ...
or
Euclidean space Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, that is, in Euclid's Elements, Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics ther ...
, is a system of
line segment In geometry, a line segment is a part of a straight line that is bounded by two distinct end points, and contains every point on the line that is between its endpoints. The length of a line segment is given by the Euclidean distance between ...
s, having only the given points as their endpoints, whose union includes all of the points in a
connected set In topology and related branches of mathematics, a connected space is a topological space that cannot be represented as the union of two or more disjoint non-empty open subsets. Connectedness is one of the principal topological properties that ...
, and which has the minimum possible total length of any such system. Such a network cannot contain a
polygonal In geometry, a polygon () is a plane figure that is described by a finite number of straight line segments connected to form a closed ''polygonal chain'' (or ''polygonal circuit''). The bounded plane region, the bounding circuit, or the two tog ...
ring of segments; if one existed, the network could be shortened by removing an edge of the polygon. Therefore, the minimum-length network forms a
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are ...
. This observation leads to the equivalent definition that a Euclidean minimum spanning tree is a tree of line segments between pairs of the given points, of minimum total length. The same tree may also be described as a
minimum spanning tree A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. T ...
of a weighted
complete graph In the mathematical field of graph theory, a complete graph is a simple undirected graph in which every pair of distinct vertices is connected by a unique edge. A complete digraph is a directed graph in which every pair of distinct vertices is c ...
, having the given points as its vertices and the distances between points as edge weights. The same points may have more than one minimum spanning tree. For instance, for the vertices of a
regular polygon In Euclidean geometry, a regular polygon is a polygon that is Equiangular polygon, direct equiangular (all angles are equal in measure) and Equilateral polygon, equilateral (all sides have the same length). Regular polygons may be either convex p ...
, removing any edge of the polygon produces a minimum spanning tree. Publications on the Euclidean minimum spanning tree commonly abbreviate it as "EMST". They may also be called "geometric minimum spanning trees", but that term may be used more generally for geometric spaces with non-Euclidean distances, such as spaces. When the context of Euclidean point sets is clear, they may be called simply "minimum spanning trees". Several other standard geometric networks are closely related to the Euclidean minimum spanning tree: *The
Steiner tree problem In combinatorial mathematics, the Steiner tree problem, or minimum Steiner tree problem, named after Jakob Steiner, is an umbrella term for a class of problems in combinatorial optimization. While Steiner tree problems may be formulated in a nu ...
again seeks a system of line segments connecting all given points, but without requiring the segments to start and end only at given points. In this problem, additional points may be added as segment endpoints, allowing the Steiner tree to be shorter than the minimum spanning tree. *In the Euclidean traveling salesperson path problem, the connecting line segments must start and end at the given points, like the spanning tree and unlike the Steiner tree; additionally, each point can touch at most two line segments, so the result forms a
polygonal chain In geometry, a polygonal chain is a connected series of line segments. More formally, a polygonal chain is a curve specified by a sequence of points (A_1, A_2, \dots, A_n) called its vertices. The curve itself consists of the line segments co ...
. Because of this restriction, the optimal path may be longer than the Euclidean minimum spanning tree, but is at most twice as long. *
Geometric spanner A geometric spanner or a -spanner graph or a -spanner was initially introduced as a weighted graph over a set of points as its vertices for which there is a -path between any pair of vertices for a fixed parameter . A -path is defined as a path ...
s are low-weight networks that, like the minimum spanning tree, connect all of the points. Unlike the minimum spanning tree, all of these connecting paths are required to be short, having length proportional to the distance between the points they connect. To achieve this property, these networks generally have cycles and so are not trees.


Properties


Angles and vertex degrees

Whenever two edges of a Euclidean minimum spanning tree meet at a vertex, they must form an angle of 60° or more, with equality only when they form two sides of an
equilateral triangle In geometry, an equilateral triangle is a triangle in which all three sides have the same length. In the familiar Euclidean geometry, an equilateral triangle is also equiangular; that is, all three internal angles are also congruent to each othe ...
. This is because, for two edges forming any sharper angle, one of the two edges could be replaced by the third, shorter edge of the triangle they form, forming a tree with smaller total length. In comparison, the Steiner tree problem has a stronger angle bound: an optimal Steiner tree has all angles at least 120°. The same 60° angle bound also occurs in the
kissing number In geometry, the kissing number of a mathematical space is defined as the greatest number of non-overlapping unit spheres that can be arranged in that space such that they each touch a common unit sphere. For a given sphere packing (arrangement of ...
problem, of finding the maximum number of
unit sphere In mathematics, a unit sphere is simply a sphere of radius one around a given center. More generally, it is the set of points of distance 1 from a fixed central point, where different norms can be used as general notions of "distance". A unit b ...
s in Euclidean space that can be tangent to a central unit sphere without any two spheres intersecting (beyond a point of tangency). The center points of these spheres have a minimum spanning tree in the form of a
star A star is an astronomical object comprising a luminous spheroid of plasma (physics), plasma held together by its gravity. The List of nearest stars and brown dwarfs, nearest star to Earth is the Sun. Many other stars are visible to the naked ...
, with the central point adjacent to all other points. Conversely, for any vertex v of any minimum spanning tree, one can construct non-overlapping unit spheres centered at v and at points two units along each of its edges, with a tangency for each neighbor of v. Therefore, in n-dimensional space the maximum possible
degree Degree may refer to: As a unit of measurement * Degree (angle), a unit of angle measurement ** Degree of geographical latitude ** Degree of geographical longitude * Degree symbol (°), a notation used in science, engineering, and mathematics ...
of a vertex (the number of spanning tree edges connected to it) equals the kissing number of spheres in n dimensions. Planar minimum spanning trees have degree at most six, and when a tree has degree six there is always another minimum spanning tree with maximum degree five. Three-dimensional minimum spanning trees have degree at most twelve. The only higher dimensions in which the exact value of the kissing number is known are four, eight, and 24 dimensions. For points generated at random from a given continuous distribution, the minimum spanning tree is
almost surely In probability theory, an event is said to happen almost surely (sometimes abbreviated as a.s.) if it happens with probability 1 (or Lebesgue measure 1). In other words, the set of possible exceptions may be non-empty, but it has probability 0. ...
unique. The numbers of vertices of any given degree converge, for large number of vertices, to a constant times that number of vertices. The values of these constants depend on the degree and the distribution. However, even for simple cases—such as the number of leaves for points uniformly distributed in a unit square—their precise values are not known.


Empty regions

For any edge uv of any Euclidean minimum spanning tree, the
lens A lens is a transmissive optical device which focuses or disperses a light beam by means of refraction. A simple lens consists of a single piece of transparent material, while a compound lens consists of several simple lenses (''elements''), ...
(or
vesica piscis The vesica piscis is a type of lens, a mathematical shape formed by the intersection of two disks with the same radius, intersecting in such a way that the center of each disk lies on the perimeter of the other. In Latin, "vesica piscis" literal ...
) formed by intersecting the two circles with uv as their radii cannot have any other given vertex w in its interior. Put another way, if any tree has an edge uv whose lens contains a third point w, then it is not of minimum length. For, by the geometry of the two circles, w would be closer to both u and v than they are to each other. If edge uv were removed from the tree, w would remain connected to one of u and v, but not the other. Replacing the removed edge uv by uw or vw (whichever of these two edges reconnects w to the vertex from which it was disconnected) would produce a shorter tree. For any edge uv of any Euclidean minimum spanning tree, the
rhombus In plane Euclidean geometry, a rhombus (plural rhombi or rhombuses) is a quadrilateral whose four sides all have the same length. Another name is equilateral quadrilateral, since equilateral means that all of its sides are equal in length. The ...
with angles of 60° and 120°, having uv as its long diagonal, is disjoint from the rhombi formed analogously by all other edges. Two edges sharing an endpoint cannot have overlapping rhombi, because that would imply a edge angle sharper than 60°, and two disjoint edges cannot have overlapping rhombi; if they did, the longer of the two edges could be replaced by a shorter edge among the same four vertices.


Supergraphs

Certain geometric graphs have definitions involving empty regions in point sets, from which it follows that they contain every edge that can be part of a Euclidean minimum spanning tree. These include: *The
relative neighborhood graph In computational geometry, the relative neighborhood graph (RNG) is an undirected graph defined on a set of points in the Euclidean plane by connecting two points p and q by an edge whenever there does not exist a third point r that is closer to bo ...
, which has an edge between any pair of points whenever the lens they define is empty. *The
Gabriel graph In mathematics and computational geometry, the Gabriel graph of a set S of points in the Euclidean plane expresses one notion of proximity or nearness of those points. Formally, it is the graph G with vertex set S in which any two distinct po ...
, which has an edge between any pair of points whenever the circle having the pair as a diameter is empty. *The
Delaunay triangulation In mathematics and computational geometry, a Delaunay triangulation (also known as a Delone triangulation) for a given set P of discrete points in a general position is a triangulation DT(P) such that no point in P is inside the circumcircle o ...
, which has an edge between any pair of points whenever there exists an empty circle having the pair as a chord. *The Urquhart graph, formed from the Delaunay triangulation by removing the longest edge of each triangle. For each remaining edge, the vertices of the Delaunay triangles that use that edge cannot lie within the empty lune of the relative neighborhood graph. Because the empty-region criteria for these graphs are progressively weaker, these graphs form an ordered sequence of subgraphs. That is, using "⊆" to denote the subset relationship among their edges, these graphs have the relations: Another graph guaranteed to contain the minimum spanning tree is the
Yao graph In computational geometry, the Yao graph, named after Andrew Yao, is a kind of geometric spanner, a weighted undirected graph connecting a set of geometric points with the property that, for every pair of points in the graph, their shortest pa ...
, determined for points in the plane by dividing the plane around each point into six 60° wedges and connecting each point to the nearest neighbor in each wedge. The resulting graph contains the relative neighborhood graph, because two vertices with an empty lens must be the nearest neighbors to each other in their wedges. As with many of the other geometric graphs above, this definition can be generalized to higher dimensions, and (unlike the Delaunay triangulation) its generalizations always include a linear number of edges.


Total length

For n points in the
unit square In mathematics, a unit square is a square whose sides have length . Often, ''the'' unit square refers specifically to the square in the Cartesian plane with corners at the four points ), , , and . Cartesian coordinates In a Cartesian coordinate ...
(or any other fixed shape), the total length of the minimum spanning tree edges is O(\sqrt n). Some sets of points, such as points evenly spaced in a \sqrt n \times \sqrt n grid, attain this bound. For points in a unit hypercube in d-dimensional space, the corresponding bound is O(n^). The same bound applies to the expected total length of the minimum spanning tree for n points chosen uniformly and independently from a unit square or unit hypercube. Returning to the unit square, the sum of squared edge lengths of the minimum spanning tree is O(1). This bound follows from the observation that the edges have disjoint rhombi, with area proportional to the edge lengths squared. The O(\sqrt n) bound on total length follows by application of the
Cauchy–Schwarz inequality The Cauchy–Schwarz inequality (also called Cauchy–Bunyakovsky–Schwarz inequality) is considered one of the most important and widely used inequalities in mathematics. The inequality for sums was published by . The corresponding inequality fo ...
. Another interpretation of these results is that the average edge length for any set of points in a unit square is O(1/\sqrt n), at most proportional to the spacing of points in a regular grid; and that for ''random'' points in a unit square the average length is proportional to 1/\sqrt n. However, in the random case, with high probability the longest edge has length approximately \sqrt, longer than the average by a non-constant factor. With high probability, the longest edge forms a leaf of the spanning tree, and connects a point far from all the other points to its nearest neighbor. For large numbers of points, the distribution of the longest edge length around its expected value converges to a
Laplace distribution In probability theory and statistics, the Laplace distribution is a continuous probability distribution named after Pierre-Simon Laplace. It is also sometimes called the double exponential distribution, because it can be thought of as two exponen ...
. Any
geometric spanner A geometric spanner or a -spanner graph or a -spanner was initially introduced as a weighted graph over a set of points as its vertices for which there is a -path between any pair of vertices for a fixed parameter . A -path is defined as a path ...
, a subgraph of a complete geometric graph whose shortest paths approximate the Euclidean distance, must have total edge length at least as large as the minimum spanning tree, and one of the standard quality measures for a geometric spanner is the ratio between its total length and of the minimum spanning tree for the same points. Several methods for constructing spanners, such as the
greedy geometric spanner In computational geometry, a greedy geometric spanner is an undirected graph whose distances approximate the Euclidean distances among a finite set of points in a Euclidean space. The vertices of the graph represent these points. The edges of the ...
, achieve a constant bound for this ratio. It has been conjectured that the Steiner ratio, the largest possible ratio between the total length of a minimum spanning tree and Steiner tree for the same set of points in the plane, is 2/\sqrt\approx 1.1547, the ratio for three points in an
equilateral triangle In geometry, an equilateral triangle is a triangle in which all three sides have the same length. In the familiar Euclidean geometry, an equilateral triangle is also equiangular; that is, all three internal angles are also congruent to each othe ...
.


Subdivision

If every edge of a Euclidean minimum spanning tree is subdivided, by adding a new point at its midpoint, then the resulting tree is still a minimum spanning tree of the augmented point set. Repeating this subdivision process allows a Euclidean minimum spanning tree to be subdivided arbitrarily finely. However, subdividing only some of the edges, or subdividing the edges at points other than the midpoint, may produce a point set for which the subdivided tree is not the minimum spanning tree.


Computational complexity

For points in any dimension, the minimum spanning tree can be constructed in time O(n^2) by constructing a
complete graph In the mathematical field of graph theory, a complete graph is a simple undirected graph in which every pair of distinct vertices is connected by a unique edge. A complete digraph is a directed graph in which every pair of distinct vertices is c ...
with an edge between every pair of points, weighted by
Euclidean distance In mathematics, the Euclidean distance between two points in Euclidean space is the length of a line segment between the two points. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem, therefor ...
, and then applying a graph minimum spanning tree algorithm such as the Prim–Dijkstra–Jarník algorithm or
Borůvka's algorithm Borůvka's algorithm is a greedy algorithm for finding a minimum spanning tree in a graph, or a minimum spanning forest in the case of a graph that is not connected. It was first published in 1926 by Otakar Borůvka as a method of constructing an ...
on it. These algorithms can be made to take time O(n^2) on complete graphs, unlike another common choice,
Kruskal's algorithm Kruskal's algorithm finds a minimum spanning forest of an undirected edge-weighted graph. If the graph is connected, it finds a minimum spanning tree. (A minimum spanning tree of a connected graph is a subset of the edges that forms a tree that i ...
, which is slower because it involves sorting all distances. For points in low-dimensional spaces, the problem may be solved more quickly, as detailed below. Computing Euclidean distances involves a
square root In mathematics, a square root of a number is a number such that ; in other words, a number whose ''square'' (the result of multiplying the number by itself, or  ⋅ ) is . For example, 4 and −4 are square roots of 16, because . E ...
calculation. In any comparison of edge weights, comparing the squares of the Euclidean distances, instead of the distances themselves, yields the same ordering, and so does not change the rest of the tree's computation. This shortcut speeds up calculation and allows a minimum spanning tree for points with integer coordinates to be constructed using only integer arithmetic.


Two dimensions

A faster approach to finding the minimum spanning tree of planar points uses the property that it is a subgraph of the Delaunay triangulation: # Compute the Delaunay triangulation, which can be done in O(n\log n) time. Because the Delaunay triangulation is a
planar graph In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints. In other words, it can be drawn in such a way that no edges cross ...
, it has at most 3n-6 edges. # Label each edge with its (squared) length. # Run a graph minimum spanning tree algorithm. Since there are O(n) edges, this requires O(n\log n) time using any of the standard minimum spanning tree algorithms. The result is an algorithm taking O(n\log n) time, optimal in certain models of computation (see
below Below may refer to: *Earth *Ground (disambiguation) *Soil *Floor *Bottom (disambiguation) Bottom may refer to: Anatomy and sex * Bottom (BDSM), the partner in a BDSM who takes the passive, receiving, or obedient role, to that of the top or ...
). If the input coordinates are integers and can be used as array indices, faster algorithms are possible: the Delaunay triangulation can be constructed by a
randomized algorithm A randomized algorithm is an algorithm that employs a degree of randomness as part of its logic or procedure. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performan ...
in O(n\log\log n) expected time. Additionally, since the Delaunay triangulation is a
planar graph In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints. In other words, it can be drawn in such a way that no edges cross ...
, its minimum spanning tree can be found in
linear time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
by a variant of Borůvka's algorithm that removes all but the cheapest edge between each pair of components after each stage of the algorithm. Therefore, the total expected time for this algorithm is O(n\log\log n). In the other direction, the Delaunay triangulation can be constructed from the minimum spanning tree in the near-linear time bound O(n\log^* n), where \log^* denotes the
iterated logarithm In computer science, the iterated logarithm of n, written  n (usually read "log star"), is the number of times the logarithm function must be iteratively applied before the result is less than or equal to 1. The simplest formal definition i ...
.


Higher dimensions

The problem can also be generalized to n points in the d-dimensional space \R^d. In higher dimensions, the connectivity determined by the Delaunay triangulation (which, likewise, partitions the
convex hull In geometry, the convex hull or convex envelope or convex closure of a shape is the smallest convex set that contains it. The convex hull may be defined either as the intersection of all convex sets containing a given subset of a Euclidean space ...
into d-dimensional
simplices In geometry, a simplex (plural: simplexes or simplices) is a generalization of the notion of a triangle or tetrahedron to arbitrary dimensions. The simplex is so-named because it represents the simplest possible polytope in any given dimension. ...
) contains the minimum spanning tree; however, the triangulation might contain the complete graph. Therefore, finding the Euclidean minimum spanning tree as a spanning tree of the complete graph or as a spanning tree of the Delaunay triangulation both take O(dn^2) time. For three dimensions the minimum spanning tree can be found in time O\bigl((n\log n)^\bigr), and in any greater dimension, in time O\left(n^\right) for any \varepsilon>0—faster than the quadratic time bound for the complete graph and Delaunay triangulation algorithms. The optimal time complexity for higher-dimensional minimum spanning trees remains unknown, but is closely related to the complexity of computing ''bichromatic closest pairs''. In the bichromatic closest pair problem, the input is a set of points, given two different colors (say, red and blue). The output is a pair of a red point and a blue point with the minimum possible distance. This pair always forms one of the edges in the minimum spanning tree. Therefore, the bichromatic closest pair problem can be solved in the amount of time that it takes to construct a minimum spanning tree and scan its edges for the shortest red–blue edge. Conversely, for any red–blue coloring of any subset of a given set of points, the bichromatic closest pair produces one edge of the minimum spanning tree of the subset. By carefully choosing a sequence of colorings of subsets, and finding the bichromatic closest pair of each subproblem, the minimum spanning tree may be found in time proportional to the optimal time for finding bichromatic closest pairs for the same number of points, whatever that optimal time turns out to be. For uniformly random point sets in any bounded dimension, the
Yao graph In computational geometry, the Yao graph, named after Andrew Yao, is a kind of geometric spanner, a weighted undirected graph connecting a set of geometric points with the property that, for every pair of points in the graph, their shortest pa ...
or Delaunay triangulation have linear expected numbers of edges, are guaranteed to contain the minimum spanning tree, and can be constructed in linear expected time. From these graphs, the minimum spanning tree itself may be constructed in linear time, by using a randomized linear time algorithm for graph minimum spanning trees. However, the poor performance of these methods on inputs coming from clustered data has led
algorithm engineering Algorithm engineering focuses on the design, analysis, implementation, optimization, profiling and experimental evaluation of computer algorithms, bridging the gap between algorithm theory and practical applications of algorithms in software enginee ...
researchers to develop methods with a somewhat slower O(n\log n) time bound, for random inputs or inputs whose distances and clustering resemble those of random data, while exhibiting better performance on real-world data. A
well-separated pair decomposition In computational geometry, a well-separated pair decomposition (WSPD) of a set of points S \subset \mathbb^d, is a sequence of pairs of sets (A_i, B_i), such that each pair is well-separated, and for each two distinct points p, q \in S, there exists ...
is a family of pairs of subsets of the given points, so that every pair of points belong to one of these pairs of subsets, and so that all pairs of points coming from the same pair of subsets have approximately the same length. It is possible to find a well-separated pair decomposition with a linear number of subsets, and a representative pair of points for each subset, in time O(n\log n). The minimum spanning tree of the graph formed by these representative pairs is then an approximation to the minimum spanning tree. Using these ideas, a (1+\varepsilon)-approximation to the minimum spanning tree may be found in O(n\log n) time, for constant \varepsilon. More precisely, by choosing each representative pair to approximate the closest pair in its equivalence class, and carefully varying the quality of this approximation for different pairs, the dependence on \varepsilon in the time bound can be given as O(n \log n + (\varepsilon^ \log ^2 \tfrac)n), for any fixed dimension.


Dynamic and kinetic

The Euclidean minimum spanning tree has been generalized in many different ways to systems of moving or changing points: *If a set of points undergoes a sequence of dynamic insertions or deletions of points, each of these updates induces a bounded amount of change to the minimum spanning tree of the points. When the update sequence is known in advance, for points in the plane, the change after each insertion or deletion can be found in time O(\log^2 n) per insertion or deletion. When the updates must be handled in an
online In computer technology and telecommunications, online indicates a state of connectivity and offline indicates a disconnected state. In modern terminology, this usually refers to an Internet connection, but (especially when expressed "on line" or ...
manner, a slower (but still poly-logarithmic) O(\log^ n) time bound is known. For higher-dimensional versions of the problem the time per update is slower, but still sublinear. *For n points moving linearly with constant speed, or with more general algebraic motions, the minimum spanning tree will change by a sequence of swaps, in which one edge is removed and another replaces it at a point in time where both have equal length. For linear motions, the number of changes is at most slightly larger than n^. For more general algebraic motions, there is a near-cubic upper bound on the number of swaps, based on the theory of
Davenport–Schinzel sequence In combinatorics, a Davenport–Schinzel sequence is a sequence of symbols in which the number of times any two symbols may appear in alternation is limited. The maximum possible length of a Davenport–Schinzel sequence is bounded by the number of ...
s. *The ''minimum moving spanning tree problem'' again concerns points moving linearly with constant speed, over an interval of time, and seeks a single tree that minimizes the maximum sum of weights occurring at any instant during this interval. It is NP-hard to compute exactly, but can be approximated to within a factor of two in polynomial time. *The kinetic Euclidean minimum spanning tree problem asks for a
kinetic data structure A kinetic data structure is a data structure used to track an attribute of a geometric system that is moving continuously. For example, a kinetic convex hull data structure maintains the convex hull of a group of n moving points. The developme ...
that can maintain the minimum spanning tree as its points undergo both continuous motions and insertions and deletions. Several papers have studied such structures, and a kinetic structure for algebraically moving points with near-cubic total time, nearly matching the bound on the number of swaps, is known.


Lower bound

An asymptotic lower bound of \Omega(n\log n) of the Euclidean minimum spanning tree problem can be established in restricted models of computation. These include the
algebraic decision tree In computational complexity the decision tree model is the model of computation in which an algorithm is considered to be basically a decision tree, i.e., a sequence of ''queries'' or ''tests'' that are done adaptively, so the outcome of the previ ...
and algebraic computation tree models, in which the algorithm has access to the input points only through certain restricted primitives that perform simple algebraic computations on their coordinates. In these models, the
closest pair of points problem The closest pair of points problem or closest pair problem is a problem of computational geometry: given n points in metric space, find a pair of points with the smallest distance between them. The closest pair problem for points in the Euclidean ...
requires \Omega(n\log n) time, but the closest pair is necessarily an edge of the minimum spanning tree, so the minimum spanning tree also requires this much time. Therefore, algorithms for constructing the planar minimum spanning tree in time O(n\log n) within this model, for instance by using the Delaunay triangulation, are optimal. However, these lower bounds do not apply to models of computation with integer point coordinates, in which
bitwise operation In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral (considered as a bit string) at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic operati ...
s and table indexing operations on those coordinates are permitted. In these models, faster algorithms are possible, as described above.


Applications

An obvious application of Euclidean minimum spanning trees is to find the cheapest network of wires or pipes to connect a set of places, assuming the links cost a fixed amount per unit length. The first publications on minimum spanning trees more generally concerned a geographic version of the problem, involving the design of an
electrical grid An electrical grid is an interconnected network for electricity delivery from producers to consumers. Electrical grids vary in size and can cover whole countries or continents. It consists of:Kaplan, S. M. (2009). Smart Grid. Electrical Power ...
for southern Moravia, and an application to minimizing wire lengths in circuits was described in 1957 by Loberman and Weinberger. Minimum spanning trees are closely related to
single-linkage clustering In statistics, single-linkage clustering is one of several methods of hierarchical clustering. It is based on grouping clusters in bottom-up fashion (agglomerative clustering), at each step combining two clusters that contain the closest pair of el ...
, one of several methods for
hierarchical clustering In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into ...
. The edges of a minimum spanning tree, sorted by their length, give the order in which to merge clusters into larger clusters in this clustering method. Once these edges have been found, by any algorithm, they may be used to construct the single-linkage clustering in time O(n\log n). Although the long thin cluster shapes produced by single-linkage clustering can be a bad fit for certain types of data, such as mixtures of Gaussian distributions, it can be a good choice in applications where the clusters themselves are expected to have long thin shapes, such as in modeling the
dark matter halo According to modern models of physical cosmology, a dark matter halo is a basic unit of cosmological structure. It is a hypothetical region that has decoupled from cosmic expansion and contains gravitationally bound matter. A single dark matte ...
s of
galaxies A galaxy is a system of stars, stellar remnants, interstellar gas, dust, dark matter, bound together by gravity. The word is derived from the Greek ' (), literally 'milky', a reference to the Milky Way galaxy that contains the Solar System. ...
. In
geographic information science Geographic information science or geographical information science (GIScience or GISc) is the scientific discipline that studies geographic information, including how it represents phenomena in the real world, how it represents the way humans unders ...
, several researcher groups have used minimum spanning trees of the centroids of buildings to identify meaningful clusters of buildings, for instance by removing edges identified in some other way as inconsistent. Minimum spanning trees have also been used to infer the shape of curves in the plane, given points sampled along the curve. For a smooth curve, sampled more finely than its
local feature size Local feature size refers to several related concepts in computer graphics and computational geometry for measuring the size of a geometric object near a particular point. *Given a smooth manifold M, the local feature size at any point x \in M i ...
, the minimum spanning tree will form a path connecting consecutive points along the curve. More generally, similar methods can recognize curves drawn in a dotted or dashed style rather than as a single connected set. Applications of this curve-finding technique include
particle physics Particle physics or high energy physics is the study of fundamental particles and forces that constitute matter and radiation. The fundamental particles in the universe are classified in the Standard Model as fermions (matter particles) an ...
, in automatically identifying the tracks left by particles in a
bubble chamber A bubble chamber is a vessel filled with a superheated transparent liquid (most often liquid hydrogen) used to detect electrically charged particles moving through it. It was invented in 1952 by Donald A. Glaser, for which he was awarded the 196 ...
. More sophisticated versions of this idea can find curves from a cloud of noisy sample points that roughly follows the curve outline, by using the topology of the spanning tree to guide a
moving least squares Moving least squares is a method of reconstructing continuous functions from a set of unorganized point samples via the calculation of a weighted least squares measure biased towards the region around the point at which the reconstructed value is re ...
method. Another application of minimum spanning trees is a
constant-factor approximation algorithm In computational complexity theory, the class APX (an abbreviation of "approximable") is the set of NP optimization problems that allow polynomial-time approximation algorithms with approximation ratio bounded by a constant (or constant-factor a ...
for the
Euclidean traveling salesman problem The travelling salesman problem (also called the travelling salesperson problem or TSP) asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each cit ...
, the problem of finding the shortest
polygonalization In computational geometry, a polygonalization of a finite set of points in the Euclidean plane is a simple polygon with the given points as its vertices. A polygonalization may also be called a polygonization, simple polygonalization, Hamiltoni ...
of a point set. Walking around the boundary of the minimum spanning tree can approximate the optimal traveling salesman tour within a factor of two of the optimal length. However, more accurate
polynomial-time approximation scheme In computer science (particularly algorithmics), a polynomial-time approximation scheme (PTAS) is a type of approximation algorithm for optimization problems (most often, NP-hard optimization problems). A PTAS is an algorithm which takes an insta ...
s are known for this problem. In
wireless ad hoc network A wireless ad hoc network (WANET) or mobile ad hoc network (MANET) is a decentralized type of wireless network. The network is ad hoc because it does not rely on a pre-existing infrastructure, such as routers in wired networks or access points ...
s,
broadcasting Broadcasting is the distribution (business), distribution of sound, audio or video content to a dispersed audience via any electronic medium (communication), mass communications medium, but typically one using the electromagnetic spectrum (radio ...
messages along paths in a minimum spanning tree can be an accurate approximation to the minimum-energy broadcast routing, which is, again, hard to compute exactly.


Realization

The ''realization problem'' for Euclidean minimum spanning trees takes an abstract
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are ...
as input and seeks a geometric location for each vertex of the tree (in a space of some fixed dimension), such that the given tree equals the minimum spanning tree of those points. Not every abstract tree has such a realization; for instance, the tree must obey the
kissing number In geometry, the kissing number of a mathematical space is defined as the greatest number of non-overlapping unit spheres that can be arranged in that space such that they each touch a common unit sphere. For a given sphere packing (arrangement of ...
bound on the degree of each vertex. Additional restrictions exist; for instance, it is not possible for a planar minimum spanning tree to have a degree-six vertex adjacent to a vertex of degree five or six. Determining whether a two-dimensional realization exists is
NP-hard In computational complexity theory, NP-hardness ( non-deterministic polynomial-time hardness) is the defining property of a class of problems that are informally "at least as hard as the hardest problems in NP". A simple example of an NP-hard pr ...
. However, the proof of hardness depends on the fact that degree-six vertices in a tree have a very restricted set of realizations: the neighbors of such a vertex must be placed on the vertices of a
regular hexagon In geometry, a hexagon (from Greek , , meaning "six", and , , meaning "corner, angle") is a six-sided polygon. The total of the internal angles of any simple (non-self-intersecting) hexagon is 720°. Regular hexagon A '' regular hexagon'' has ...
centered at that vertex. Indeed, for trees of maximum degree five, a planar realization always exists. Similarly, for trees of maximum degree ten, a three-dimensional realization always exists. For these realizations, some trees may require edges of exponential length and bounding boxes of exponential area relative to the length of their shortest edge. Trees of maximum degree four have smaller planar realizations, with polynomially bounded edge lengths and bounding boxes.


See also

*
Rectilinear minimum spanning tree In graph theory, the rectilinear minimum spanning tree (RMST) of a set of ''n'' points in the plane (or more generally, in ℝ''d'') is a minimum spanning tree of that set, where the weight of the edge between each pair of points is the rectiline ...
, a minimum spanning tree with distances measured using
taxicab geometry A taxicab geometry or a Manhattan geometry is a geometry whose usual distance function or metric of Euclidean geometry is replaced by a new metric in which the distance between two points is the sum of the absolute differences of their Cartesian co ...


References


External links


EMST tutorial
mlpack mlpack is a machine learning software library for C++, built on top of the Armadillo library and thensmallennumerical optimization library. mlpack has an emphasis on scalability, speed, and ease-of-use. Its aim is to make machine learning possib ...
documentation {{DEFAULTSORT:Euclidean Minimum Spanning Tree Spanning tree Geometric graphs