A phylogenetic tree (also phylogeny or evolutionary tree
[Felsenstein J. (2004). ''Inferring Phylogenies'' Sinauer Associates: Sunderland, MA.]) is a branching
diagram
or a
tree showing the
evolutionary relationships among various biological
species or other entities based upon similarities and differences in their physical or genetic characteristics. All life on Earth is part of a single phylogenetic tree, indicating
common ancestry.
In a ''rooted'' phylogenetic tree, each node with descendants represents the inferred
most recent common ancestor of those descendants, and the edge lengths in some trees may be interpreted as time estimates. Each node is called a taxonomic unit. Internal nodes are generally called hypothetical taxonomic units, as they cannot be directly observed. Trees are useful in fields of biology such as
bioinformatics
,
systematics
, and
phylogenetics. ''Unrooted'' trees illustrate only the relatedness of the
leaf nodes and do not require the ancestral root to be known or inferred.
History
The idea of a "
tree of life" arose from ancient notions of a ladder-like progression from lower into higher forms of
life (such as in the
Great Chain of Being). Early representations of "branching" phylogenetic trees include a "paleontological chart" showing the geological relationships among plants and animals in the book ''Elementary Geology'', by
Edward Hitchcock
(first edition: 1840).
Charles Darwin featured a diagrammatic
evolutionary "tree" in his 1859 book ''
On the Origin of Species''. Over a century later,
evolutionary biologists still use
tree diagrams to depict
evolution because such diagrams effectively convey the concept that
speciation
occurs through the
adaptive and semirandom splitting of lineages.
The term ''phylogenetic'', or ''phylogeny'', derives from the two
Ancient Greek words φῦλον (phûlon), meaning "race, lineage", and γένεσις (génesis), meaning "origin, source".
Properties
Rooted tree
A rooted phylogenetic tree (see two graphics at top) is a
directed tree with a unique node — the root — corresponding to the (usually
imputed) most recent common ancestor of all the entities at the
leaves
of the tree. The root node does not have a parent node, but it is an ancestor of every other node in the tree. In a bifurcating tree the root is therefore a node of
degree 2, while other internal nodes have a minimum degree of 3 (where "degree" here refers to the total number of incoming and outgoing edges).
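The degree rule can be checked on a small example. The following is a minimal sketch, assuming a rooted tree stored as a parent-to-children mapping; the node names (`root`, `n1`, `n2`) and the helper `node_degrees` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical example tree ((A,B),(C,D)), stored as parent -> children.
children: Dict[str, List[str]] = {
    "root": ["n1", "n2"],
    "n1": ["A", "B"],
    "n2": ["C", "D"],
}

def node_degrees(children: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Count incoming plus outgoing edges for every node reachable from root."""
    degree = {}
    stack = [root]
    while stack:
        node = stack.pop()
        kids = children.get(node, [])          # leaves have no entry
        incoming = 0 if node == root else 1    # every non-root node has one parent
        degree[node] = len(kids) + incoming
        stack.extend(kids)
    return degree

degrees = node_degrees(children, "root")
print(degrees)
```

In this bifurcating example the root has degree 2, the two internal nodes have degree 3, and each leaf has degree 1.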
The most common method for rooting trees is the use of an uncontroversial
outgroup
—close enough to allow inference from trait data or molecular sequencing, but far enough to be a clear outgroup. Another method is midpoint rooting, or a tree can also be rooted by using a non-stationary
substitution model
.
Unrooted tree
Unrooted trees illustrate the relatedness of the leaf nodes without making assumptions about ancestry. They do not require the ancestral root to be known or inferred. Unrooted trees can always be generated from rooted ones by simply omitting the root. By contrast, inferring the root of an unrooted tree requires some means of identifying ancestry. This is normally done by including an outgroup in the input data so that the root is necessarily between the outgroup and the rest of the taxa in the tree, or by introducing additional assumptions about the relative rates of evolution on each branch, such as an application of the
molecular clock hypothesis.
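Omitting the root can be sketched as a small graph operation. The example below is illustrative only: it assumes a rooted tree stored as a parent-to-children mapping with a degree-2 root, and the function name `unroot` is hypothetical. It drops edge directions, removes the root, and joins the root's two children directly.

```python
from collections import defaultdict
from typing import Dict, List, Set

def unroot(children: Dict[str, List[str]], root: str) -> Dict[str, Set[str]]:
    """Return an undirected adjacency map with a degree-2 root suppressed."""
    adj = defaultdict(set)
    for parent, kids in children.items():
        for kid in kids:
            adj[parent].add(kid)   # forget the direction of each edge
            adj[kid].add(parent)
    if len(adj[root]) == 2:
        a, b = adj.pop(root)       # the two children of the root
        adj[a].discard(root)
        adj[b].discard(root)
        adj[a].add(b)              # replace the two root edges by a single edge
        adj[b].add(a)
    return {node: set(nbrs) for node, nbrs in adj.items()}

# Hypothetical example tree ((A,B),(C,D)).
children = {"root": ["n1", "n2"], "n1": ["A", "B"], "n2": ["C", "D"]}
unrooted = unroot(children, "root")
print(unrooted)
```

The result no longer records which end of the `n1`-`n2` edge is ancestral, which is why recovering the root requires extra information such as an outgroup.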
Bifurcating versus multifurcating
Both rooted and unrooted trees can be either
bifurcating or multifurcating. A rooted bifurcating tree has exactly two descendants arising from each
interior node (that is, it forms a
binary tree
), and an unrooted bifurcating tree takes the form of an
unrooted binary tree
, a
free tree
with exactly three neighbors at each internal node. In contrast, a rooted multifurcating tree may have more than two children at some nodes and an unrooted multifurcating tree may have more than three neighbors at some nodes.
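The distinction for rooted trees can be expressed as a short check. This is a sketch under the assumption that a rooted tree is written as nested Python tuples with strings as leaves; the function name `is_bifurcating` is hypothetical.

```python
def is_bifurcating(tree) -> bool:
    """True if every internal node of a rooted nested-tuple tree has exactly two children."""
    if isinstance(tree, str):      # a leaf has no children to check
        return True
    return len(tree) == 2 and all(is_bifurcating(child) for child in tree)

binary_tree = (("A", "B"), ("C", "D"))   # every internal node has two children
polytomy = (("A", "B", "C"), "D")        # one internal node has three children

print(is_bifurcating(binary_tree))
print(is_bifurcating(polytomy))
```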
Labeled versus unlabeled
Both rooted and unrooted trees can be either labeled or unlabeled. A labeled tree has specific values assigned to its leaves, while an unlabeled tree, sometimes called a tree shape, defines a topology only. Some sequence-based trees built from a small genomic locus, such as Phylotree, feature internal nodes labeled with inferred ancestral haplotypes.
Enumerating trees
The number of possible trees for a given number of leaf nodes depends on the specific type of tree, but there are always more labeled than unlabeled trees, more multifurcating than bifurcating trees, and more rooted than unrooted trees. The last distinction is the most biologically relevant; it arises because there are many places on an unrooted tree to put the root. For bifurcating labeled trees, the total number of rooted trees is:
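: <math>(2n-3)!! = \frac{(2n-3)!}{2^{n-2}(n-2)!}</math>
for <math>n \ge 2</math>, where <math>n</math> represents the number of leaf nodes.
For bifurcating labeled trees, the total number of unrooted trees is:
: <math>(2n-5)!! = \frac{(2n-5)!}{2^{n-3}(n-3)!}</math>
for <math>n \ge 3</math>.
Among labeled bifurcating trees, the number of unrooted trees with <math>n</math> leaves is equal to the number of rooted trees with <math>n-1</math> leaves.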
[Felsenstein J. (2004). ''Inferring Phylogenies'' Sinauer Associates: Sunderland, MA.]
The number of rooted trees grows quickly as a function of the number of tips. For 10 tips, there are more than <math>34 \times 10^6</math> possible bifurcating trees, and the number of multifurcating trees rises faster, with ca. 7 times as many of the latter as of the former.
{| style="text-align: left; margin-left: auto; margin-right: auto; border: none;"
|+ Counting trees.
! Labeled leaves !! Binary unrooted trees !! Binary rooted trees !! Multifurcating rooted trees !! All possible rooted trees
|-
| 1 || 1 || 1 || 0 || 1
|-
| 2 || 1 || 1 || 0 || 1
|-
| 3 || 1 || 3 || 1 || 4
|-
| 4 || 3 || 15 || 11 || 26
|-
| 5 || 15 || 105 || 131 || 236
|-
| 6 || 105 || 945 || 1,807 || 2,752
|-
| 7 || 945 || 10,395 || 28,813 || 39,208
|-
| 8 || 10,395 || 135,135 || 524,897 || 660,032
|-
| 9 || 135,135 || 2,027,025 || 10,791,887 || 12,818,912
|-
| 10 || 2,027,025 || 34,459,425 || 247,678,399 || 282,137,824
|}
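The counts in the table can be reproduced with a few lines of code. The sketch below makes two assumptions that go beyond the text: the bifurcating counts are taken as odd double factorials, and the count of all rooted trees uses a recurrence over the number of tips and interior nodes (commonly attributed to Felsenstein). All function names are hypothetical.

```python
from functools import lru_cache

def double_factorial(k: int) -> int:
    """Product k * (k - 2) * (k - 4) * ... down to 1; equals 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def rooted_binary(n: int) -> int:
    """Number of rooted bifurcating labeled trees with n leaves: (2n - 3)!!"""
    return double_factorial(2 * n - 3)

def unrooted_binary(n: int) -> int:
    """Number of unrooted bifurcating labeled trees with n leaves: (2n - 5)!!"""
    return double_factorial(2 * n - 5)

@lru_cache(maxsize=None)
def rooted_with_interior(n: int, m: int) -> int:
    """Rooted labeled trees with n leaves and m interior nodes (assumed recurrence)."""
    if n == 1:
        return 1 if m == 0 else 0
    if m < 1 or m > n - 1:
        return 0
    return ((n + m - 2) * rooted_with_interior(n - 1, m - 1)
            + m * rooted_with_interior(n - 1, m))

def all_rooted(n: int) -> int:
    """All rooted labeled trees with n leaves, bifurcating or multifurcating."""
    if n == 1:
        return 1
    return sum(rooted_with_interior(n, m) for m in range(1, n))

for n in range(1, 11):
    multifurcating = all_rooted(n) - rooted_binary(n)
    print(n, unrooted_binary(n), rooted_binary(n), multifurcating, all_rooted(n))
```

Each printed row lists the number of leaves followed by the four counts in the same column order as the table.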
Special tree types
Dendrogram
A
dendrogram
is a general name for a tree, whether phylogenetic or not, and hence also for the diagrammatic representation of a phylogenetic tree.
Cladogram
A
cladogram only represents a branching pattern; i.e., its branch lengths do not represent time or relative amount of character change, and its internal nodes do not represent ancestors.
Phylogram
A phylogram is a phylogenetic tree that has branch lengths proportional to the amount of character change.
A chronogram is a phylogenetic tree that explicitly represents time through its branch lengths.
Dahlgrenogram
A
Dahlgrenogram is a diagram representing a cross section of a phylogenetic tree.
Phylogenetic network
A
phylogenetic network is not strictly speaking a tree, but rather a more general
graph, or a
directed acyclic graph in the case of rooted networks. They are used to overcome some of the
limitations inherent to trees.
Spindle diagram
A spindle diagram, or bubble diagram, is often called a romerogram, after its popularisation by the American palaeontologist
Alfred Romer.
It represents taxonomic diversity (horizontal width) against
geological time
(vertical axis) in order to reflect the variation of abundance of various taxa through time.
However, a spindle diagram is not an evolutionary tree:
the taxonomic spindles obscure the actual relationships of the parent taxon to the daughter taxon
and have the disadvantage of involving the
paraphyly
of the parental group.
This type of diagram is no longer used in the form originally proposed.
Coral of life
Darwin
also mentioned that the ''coral'' may be a more suitable metaphor than the ''tree''. Indeed,
phylogenetic corals are useful for portraying past and present life, and they have some advantages over trees (anastomoses allowed, etc.).
Construction
Phylogenetic trees composed with a nontrivial number of input sequences are constructed using
computational phylogenetics methods. Distance-matrix methods such as
neighbor-joining or
UPGMA, which calculate
genetic distance from
multiple sequence alignments, are simplest to implement, but do not invoke an evolutionary model. Many sequence alignment methods such as
ClustalW
also create trees by using the simpler algorithms (i.e. those based on distance) of tree construction.
Maximum parsimony is another simple method of estimating phylogenetic trees, but it relies on an implicit model of evolution (i.e. parsimony). More advanced methods use the
optimality criterion of
maximum likelihood, often within a
Bayesian framework, and apply an explicit model of evolution to phylogenetic tree estimation.
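As an illustration of a distance-matrix method, the following is a minimal UPGMA sketch, not a production implementation. It assumes the pairwise distances are already available (here a hypothetical four-taxon matrix, supplied as a dictionary keyed by `frozenset` pairs) and returns the tree as a Newick string with branch lengths.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa by UPGMA; dist maps frozenset({a, b}) -> pairwise distance."""
    # Each cluster id maps to (newick text, number of leaves, node height).
    clusters = {name: (name, 1, 0.0) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        (text_a, size_a, height_a) = clusters.pop(a)
        (text_b, size_b, height_b) = clusters.pop(b)
        height = d[frozenset((a, b))] / 2          # new node sits halfway between them
        merged = (a, b)                            # tuple used as the new cluster id
        # Distance to every other cluster is the size-weighted average.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        text = f"({text_a}:{height - height_a:g},{text_b}:{height - height_b:g})"
        clusters[merged] = (text, size_a + size_b, height)
    return next(iter(clusters.values()))[0] + ";"

# Hypothetical distances for four taxa.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 2, frozenset(("C", "D")): 4,
    frozenset(("A", "C")): 8, frozenset(("A", "D")): 8,
    frozenset(("B", "C")): 8, frozenset(("B", "D")): 8,
}
print(upgma(names, dist))  # ((A:1,B:1):3,(C:2,D:2):2);
```

UPGMA implicitly assumes a constant rate of change along all lineages, which is one reason model-based methods are often preferred when that assumption is doubtful.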
Identifying the optimal tree using many of these techniques is
NP-hard
,
so
heuristic search and
optimization methods are used in combination with tree-scoring functions to identify a reasonably good tree that fits the data.
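One example of a tree-scoring function is the parsimony score of a single character. The sketch below uses the Fitch small-parsimony procedure as an assumed, illustrative choice (the text does not name a specific scoring algorithm); it takes a rooted bifurcating tree written as nested tuples and a hypothetical mapping from leaf names to character states.

```python
def fitch(tree, states):
    """Return (candidate state set, minimum number of changes) for one character."""
    if isinstance(tree, str):                      # leaf: its observed state, no changes
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:                                     # children agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1   # disagreement costs one change

# Hypothetical states of one alignment column.
states = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(fitch(tree_1, states)[1])  # 1
print(fitch(tree_2, states)[1])  # 2
```

A heuristic search would propose many candidate topologies and keep those with the lowest total score across all characters.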
Tree-building methods can be assessed on the basis of several criteria:
* efficiency (how long does it take to compute the answer, how much memory does it need?)
* power (does it make good use of the data, or is information being wasted?)
* consistency (will it converge on the same answer repeatedly, if each time given different data for the same model problem?)
* robustness (does it cope well with violations of the assumptions of the underlying model?)
* falsifiability (does it alert us when it is not good to use, i.e. when assumptions are violated?)
Tree-building techniques have also gained the attention of mathematicians. Trees can also be built using
T-theory.
File formats
Trees can be encoded in a number of different formats, all of which must represent the nested structure of a tree. They may or may not encode branch lengths and other features. Standardized formats are critical for distributing and sharing trees without relying on graphics output that is hard to import into existing software. Commonly used formats are
*
Nexus file format
*
Newick format
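To illustrate how a text format encodes nested structure, here is a minimal sketch of a reader for a simplified subset of Newick. It assumes a single tree terminated by `;`, with no whitespace, quoted labels, or comments; the function names `parse_newick` and `leaf_names` are hypothetical.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested (name, length, children) tuples."""
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                                   # consume "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1                               # consume ","
                children.append(parse_node())
            pos += 1                                   # consume ")"
        start = pos
        while text[pos] not in ",():;":                # optional node label
            pos += 1
        name = text[start:pos]
        length = None
        if text[pos] == ":":                           # optional branch length
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return parse_node()

def leaf_names(node):
    """List leaf labels from left to right."""
    name, _, kids = node
    if not kids:
        return [name]
    return [leaf for kid in kids for leaf in leaf_names(kid)]

tree = parse_newick("((A:1,B:1):3,(C:2,D:2):2);")
print(leaf_names(tree))  # ['A', 'B', 'C', 'D']
```

The parentheses carry the topology, and the optional `:length` suffixes carry branch lengths, so the same string can represent either a cladogram or a phylogram.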
Limitations of phylogenetic analysis
Although phylogenetic trees produced on the basis of sequenced
genes or
genomic
data in different species can provide evolutionary insight, these analyses have important limitations. Most importantly, the trees that they generate are not necessarily correct – they do not necessarily accurately represent the evolutionary history of the included taxa. As with any scientific result, they are subject to
falsification by further study (e.g., gathering of additional data, analyzing the existing data with improved methods). The data on which they are based may be
noisy;
the analysis can be confounded by
genetic recombination
,
horizontal gene transfer,
hybridisation between species that were not nearest neighbors on the tree before hybridisation takes place,
convergent evolution, and
conserved sequences.
Also, there are problems in basing an analysis on a single type of character, such as a single
gene or
protein or only on morphological analysis, because trees constructed from another, unrelated data source often differ from the first, and therefore great care is needed in inferring phylogenetic relationships among species. This is most true of genetic material that is subject to lateral gene transfer and
recombination, where different
haplotype
blocks can have different histories. In these types of analysis, the output tree of a phylogenetic analysis of a single gene is an estimate of the gene's phylogeny (i.e. a gene tree) and not the phylogeny of the
taxa (i.e. species tree) from which these characters were sampled, though ideally, both should be very close. For this reason, serious phylogenetic studies generally use a combination of genes that come from different genomic sources (e.g., from mitochondrial or plastid vs. nuclear genomes),
or genes that would be expected to evolve under different selective regimes, so that
homoplasy (false
homology) would be unlikely to result from natural selection.
When extinct species are included as
terminal nodes in an analysis (rather than, for example, to constrain internal nodes), they are considered not to represent direct ancestors of any extant species. Extinct species do not typically contain high-quality
DNA.
The range of useful DNA materials has expanded with advances in extraction and sequencing technologies. Development of technologies able to infer sequences from smaller fragments, or from spatial patterns of DNA degradation products, would further expand the range of DNA considered useful.
Phylogenetic trees can also be inferred from a range of other data types, including morphology, the presence or absence of particular types of genes, insertion and deletion events – and any other observation thought to contain an evolutionary signal.
Phylogenetic networks are used when bifurcating trees are not suitable, due to these complications which suggest a more reticulate evolutionary history of the organisms sampled.
See also
*
Clade
*
Cladistics
*
Computational phylogenetics
*
Evolutionary biology
*
Evolutionary taxonomy
*
Generalized tree alignment
*
List of phylogenetics software
*
List of phylogenetic tree visualization software
*
PANDIT
, a biological database covering protein domains
*
Phylogenetic comparative methods
*
Taxonomic rank
References
Further reading
* Schuh, R. T. and A. V. Z. Brower. 2009. ''Biological Systematics: principles and applications (2nd edn.)''
*
Manuel Lima, ''The Book of Trees: Visualizing Branches of Knowledge'', 2014, Princeton Architectural Press, New York.
*
MEGA
, a free software to draw phylogenetic trees.
* Gontier, N. 2011. "Depicting the Tree of Life: the Philosophical and Historical Roots of Evolutionary Tree Diagrams." Evolution, Education, Outreach 4: 515–538.
External links
Images
* Human Y-Chromosome 2002 Phylogenetic Tree
* iTOL: Interactive Tree Of Life
* Phylogenetic Tree of Artificial Organisms Evolved on Computers
* Miyamoto and Goodman's Phylogram of Eutherian Mammals
General
*An overview of different methods of tree visualization is available at
* OneZoom: Tree of Life – all living species as intuitive and zoomable fractal explorer (responsive design)
* Discover Life
* An interactive tree based on the U.S. National Science Foundation's Assembling the Tree of Life Project
* [http://tolweb.org/tree Tree of Life Web Project]
* Phylogenetic inferring on the T-REX server
* [https://www.ncbi.nlm.nih.gov/Taxonomy/ NCBI's Taxonomy Database]
* ETE: A Python Environment for Tree Exploration. This is a programming library to analyze, manipulate and visualize phylogenetic trees
* A daily-updated tree of (sequenced) life