In mathematics and computer science, an unrooted binary tree is an
unrooted tree
A phylogenetic tree (also phylogeny or evolutionary tree Felsenstein J. (2004). ''Inferring Phylogenies'' Sinauer Associates: Sunderland, MA.) is a branching diagram or a tree (graph theory), tree showing the evolutionary relationships among va ...
in which each
vertex
Vertex, vertices or vertexes may refer to:
Science and technology Mathematics and computer science
*Vertex (geometry), a point where two or more curves, lines, or edges meet
*Vertex (computer graphics), a data structure that describes the position ...
has either one or three neighbors.
Definitions
A
free tree
In graph theory, a tree is an undirected graph in which any two vertices are connected by ''exactly one'' path, or equivalently a connected acyclic undirected graph. A forest is an undirected graph in which any two vertices are connected by '' ...
or unrooted tree is a
connected
Connected may refer to:
Film and television
* ''Connected'' (2008 film), a Hong Kong remake of the American movie ''Cellular''
* '' Connected: An Autoblogography About Love, Death & Technology'', a 2011 documentary film
* ''Connected'' (2015 TV ...
undirected graph
In discrete mathematics, and more specifically in graph theory, a graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related". The objects correspond to mathematical abstractions called '' v ...
with no
cycles. The vertices with one neighbor are the ''leaves'' of the tree, and the remaining vertices are the ''internal nodes'' of the tree. The
degree
Degree may refer to:
As a unit of measurement
* Degree (angle), a unit of angle measurement
** Degree of geographical latitude
** Degree of geographical longitude
* Degree symbol (°), a notation used in science, engineering, and mathematics
...
of a vertex is its number of neighbors; in a tree with more than one node, the leaves are the vertices of degree one. An unrooted binary tree is a free tree in which all internal nodes have degree exactly three.
In some applications it may make sense to distinguish subtypes of unrooted binary trees: a
planar embedding
In graph theory, a planar graph is a graph (discrete mathematics), graph that can be graph embedding, embedded in the plane (geometry), plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints. I ...
of the tree may be fixed by specifying a cyclic ordering for the edges at each vertex, making it into a
plane tree
''Platanus'' is a genus consisting of a small number of tree species native to the Northern Hemisphere. They are the sole living members of the family Platanaceae.
All mature members of ''Platanus'' are tall, reaching in height. All except f ...
. In computer science, binary trees are often rooted and ordered when they are used as
data structure
In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
s, but in the applications of unrooted binary trees in
hierarchical clustering
In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into ...
and
evolutionary tree
A phylogenetic tree (also phylogeny or evolutionary tree Felsenstein J. (2004). ''Inferring Phylogenies'' Sinauer Associates: Sunderland, MA.) is a branching diagram or a tree showing the evolutionary relationships among various biological spec ...
reconstruction, unordered trees are more common.
Additionally, one may distinguish between trees in which all vertices have distinct labels, trees in which the leaves only are labeled, and trees in which the nodes are not labeled. In an unrooted binary tree with ''n'' leaves, there will be ''n'' − 2 internal nodes, so the labels may be taken from the set of integers from 1 to 2''n'' − 1 when all nodes are to be labeled, or from the set of integers from 1 to ''n'' when only the leaves are to be labeled.
[.]
Related structures
Rooted binary trees
An unrooted binary tree ''T'' may be transformed into a full rooted
binary tree
In computer science, a binary tree is a k-ary k = 2 tree data structure in which each node has at most two children, which are referred to as the ' and the '. A recursive definition using just set theory notions is that a (non-empty) binary t ...
(that is, a rooted tree in which each non-leaf node has exactly two children) by choosing a ''root edge'' ''e'' of ''T'', placing a new root node in the middle of ''e'', and directing every edge of the resulting subdivided tree away from the root node. Conversely, any full rooted binary tree may be transformed into an unrooted binary tree by removing the root node, replacing the path between its two children by a single undirected edge, and suppressing the orientation of the remaining edges in the graph. For this reason, there are exactly 2''n'' −3 times as many full rooted binary trees with ''n'' leaves as there are unrooted binary trees with ''n'' leaves.
Hierarchical clustering
A
hierarchical clustering
In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into ...
of a collection of objects may be formalized as a
maximal family of sets
In set theory and related branches of mathematics, a collection F of subsets of a given set S is called a family of subsets of S, or a family of sets over S. More generally, a collection of any sets whatsoever is called a family of sets, set fam ...
of the objects in which no two sets cross. That is, for every two sets ''S'' and ''T'' in the family, either ''S'' and ''T'' are disjoint or one is a subset of the other, and no more sets can be added to the family while preserving this property. If ''T'' is an unrooted binary tree, it defines a hierarchical clustering of its leaves: for each edge (''u'',''v'') in ''T'' there is a cluster consisting of the leaves that are closer to ''u'' than to ''v'', and these sets together with the empty set and the set of all leaves form a maximal non-crossing family. Conversely, from any maximal non-crossing family of sets over a set of ''n'' elements, one can form a unique unrooted binary tree that has a node for each triple (''A'',''B'',''C'') of disjoint sets in the family that together cover all of the elements.
Evolutionary trees
According to simple forms of the
theory of evolution
Evolution is change in the heritable characteristics of biological populations over successive generations. These characteristics are the expressions of genes, which are passed on from parent to offspring during reproduction. Variation t ...
, the history of life can be summarized as a
phylogenetic tree
A phylogenetic tree (also phylogeny or evolutionary tree Felsenstein J. (2004). ''Inferring Phylogenies'' Sinauer Associates: Sunderland, MA.) is a branching diagram or a tree showing the evolutionary relationships among various biological spec ...
in which each node describes a species, the leaves represent the species that exist today, and the edges represent ancestor-descendant relationships between species. This tree has a natural orientation from ancestors to descendants, and a root at the
common ancestor
Common descent is a concept in evolutionary biology applicable when one species is the ancestor of two or more species later in time. All living beings are in fact descendants of a unique ancestor commonly referred to as the last universal comm ...
of the species, so it is a rooted tree. However, some methods of reconstructing binary trees can reconstruct only the nodes and the edges of this tree, but not their orientations.
For instance,
cladistic methods such as
maximum parsimony
In phylogenetics, maximum parsimony is an optimality criterion under which the phylogenetic tree that minimizes the total number of character-state changes (or miminizes the cost of differentially weighted character-state changes) is preferred. ...
use as data a set of binary attributes describing features of the species. These methods seek a tree with the given species as leaves in which the internal nodes are also labeled with features, and attempt to minimize the number of times some feature is present at only one of the two endpoints of an edge in the tree. Ideally, each feature should only have one edge for which this is the case. Changing the root of a tree does not change this number of edge differences, so methods based on parsimony are not capable of determining the location of the tree root and will produce an unrooted tree, often an unrooted binary tree.
Unrooted binary trees also are produced by methods for inferring evolutionary trees based on quartet data specifying, for each four leaf species, the unrooted binary tree describing the evolution of those four species, and by methods that use
quartet distance The quartet distance is a way of measuring the distance between two phylogenetic trees. It is defined as the number of subsets of four leaves that are not related by the same topology in both trees.
Computing the quartet distance
The most straigh ...
to measure the distance between trees.
Branch-decomposition
Unrooted binary trees are also used to define
branch-decomposition
In graph theory, a branch-decomposition of an undirected graph ''G'' is a hierarchical clustering of the edges of ''G'', represented by an unrooted binary tree ''T'' with the edges of ''G'' as its leaves. Removing any edge from ''T'' partitions t ...
s of graphs, by forming an unrooted binary tree whose leaves represent the edges of the given graph. That is, a branch-decomposition may be viewed as a hierarchical clustering of the edges of the graph. Branch-decompositions and an associated numerical quantity, branch-width, are closely related to
treewidth
In graph theory, the treewidth of an undirected graph is an integer number which specifies, informally, how far the graph is from being a tree. The smallest treewidth is 1; the graphs with treewidth 1 are exactly the trees and the forests. The gra ...
and form the basis for efficient
dynamic programming
Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics.
I ...
algorithms on graphs.
[.]
Enumeration
Because of their applications in hierarchical clustering, the most natural
graph enumeration
In combinatorics, an area of mathematics, graph enumeration describes a class of combinatorial enumeration problems in which one must count undirected or directed graphs of certain types, typically as a function of the number of vertices of the gra ...
problem on unrooted binary trees is to count the number of trees with ''n'' labeled leaves and unlabeled internal nodes. An unrooted binary tree on ''n'' labeled leaves can be formed by connecting the ''n''th leaf to a new node in the middle of any of the edges of an unrooted binary tree on ''n'' − 1 labeled leaves. There are 2''n'' − 5 edges at which the ''n''th node can be attached; therefore, the number of trees on ''n'' leaves is larger than the number of trees on ''n'' − 1 leaves by a factor of 2''n'' − 5. Thus, the number of trees on ''n'' labeled leaves is the
double factorial
In mathematics, the double factorial or semifactorial of a number , denoted by , is the product of all the integers from 1 up to that have the same parity (odd or even) as . That is,
:n!! = \prod_^ (n-2k) = n (n-2) (n-4) \cdots.
For even , the ...
:
The numbers of trees on 2, 3, 4, 5, ... labeled leaves are
:1, 1, 3, 15, 105, 945, 10395, 135135, 2027025, 34459425, ... .
Fundamental Equalities
The leaf-to-leaf path-length on a fixed Unrooted Binary Tree (UBT) T encodes the number of edges belonging to the unique path in T connecting a given leaf to another leaf. For example, by referring to the UBT shown in the image on the right, the path-length
between the leaves 1 and 2 is equal to 2 whereas the path-length
between the leaves 1 and 3 is equal to 3. The path-length sequence from a given leaf on a fixed UBT T encodes the lengths of the paths from the given leaf to all the remaining ones. For example, by referring to the UBT shown in the image on the right, the path-length sequence from the leaf 1 is
. The set of path-length sequences associated to the leaves of T is usually referred to as the ''path-length sequence collection'' of T
.
Daniele Catanzaro,
Raffaele Pesenti and
Laurence Wolsey
Laurence Alexander Wolsey is an English mathematician working in the field of integer programming. He is a former president and research director of the Center for Operations Research and Econometrics (CORE) at Université catholique de Louvain in ...
showed
that the path-length sequence collection encoding a given UBT with n leaves must satisfy specific equalities, namely
*
for all