A phylogenetic tree (also phylogeny or evolutionary tree; Felsenstein J. (2004), ''Inferring Phylogenies'', Sinauer Associates: Sunderland, MA) is a branching diagram or a tree showing the evolutionary relationships among various biological species or other entities based upon similarities and differences in their physical or genetic characteristics. All life on Earth is part of a single phylogenetic tree, indicating common ancestry.

In a ''rooted'' phylogenetic tree, each node with descendants represents the inferred most recent common ancestor of those descendants, and the edge lengths in some trees may be interpreted as time estimates. Each node is called a taxonomic unit. Internal nodes are generally called hypothetical taxonomic units, as they cannot be directly observed. Trees are useful in fields of biology such as bioinformatics, systematics, and phylogenetics. ''Unrooted'' trees illustrate only the relatedness of the leaf nodes and do not require the ancestral root to be known or inferred.

History

The idea of a "tree of life" arose from ancient notions of a ladder-like progression from lower into higher forms of life (such as in the Great Chain of Being). Early representations of "branching" phylogenetic trees include a "paleontological chart" showing the geological relationships among plants and animals in the book ''Elementary Geology'', by Edward Hitchcock (first edition: 1840).

Charles Darwin featured a diagrammatic evolutionary "tree" in his 1859 book ''On the Origin of Species''. Over a century later, evolutionary biologists still use tree diagrams to depict evolution because such diagrams effectively convey the concept that speciation occurs through the adaptive and semirandom splitting of lineages.

The term ''phylogenetic'', or ''phylogeny'', derives from the two ancient Greek words φῦλον (phûlon), meaning "race, lineage", and γένεσις (génesis), meaning "origin, source".

Properties

Rooted tree

A rooted phylogenetic tree (see two graphics at top) is a directed tree with a unique node, the root, corresponding to the (usually imputed) most recent common ancestor of all the entities at the leaves of the tree. The root node does not have a parent node, but is an ancestor of every other node in the tree. In a bifurcating tree the root is therefore a node of degree 2, while other internal nodes have a minimum degree of 3 (where "degree" here refers to the total number of incoming and outgoing edges). The most common method for rooting trees is the use of an uncontroversial outgroup: close enough to allow inference from trait data or molecular sequencing, but far enough to be a clear outgroup. Another method is midpoint rooting; a tree can also be rooted by using a non-stationary substitution model.

Unrooted tree

Unrooted trees illustrate the relatedness of the leaf nodes without making assumptions about ancestry. They do not require the ancestral root to be known or inferred. Unrooted trees can always be generated from rooted ones by simply omitting the root. By contrast, inferring the root of an unrooted tree requires some means of identifying ancestry. This is normally done by including an outgroup in the input data so that the root is necessarily between the outgroup and the rest of the taxa in the tree, or by introducing additional assumptions about the relative rates of evolution on each branch, such as an application of the molecular clock hypothesis.
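The outgroup approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the adjacency-dictionary representation of the unrooted tree, the function name `root_with_outgroup`, and the taxon and node names are all hypothetical choices made for the example, not part of any particular phylogenetics package.

```python
def root_with_outgroup(adjacency, outgroup):
    """Root an unrooted tree on the edge that joins `outgroup` to the rest.

    `adjacency` maps every node name to a list of its neighbours.
    The result is a rooted tree written as nested tuples of leaf names.
    """
    # A leaf has exactly one neighbour; unpacking fails loudly otherwise.
    (neighbour,) = adjacency[outgroup]

    def subtree(node, parent):
        # Walk away from the parent; nodes with nothing beyond them are leaves.
        children = [n for n in adjacency[node] if n != parent]
        if not children:
            return node
        return tuple(subtree(child, node) for child in children)

    # The new root sits on the outgroup edge, so it has exactly two children.
    return (outgroup, subtree(neighbour, outgroup))


# Hypothetical unrooted tree: leaves A, B, C, O and internal nodes u, v.
unrooted = {
    "A": ["u"], "B": ["u"], "C": ["v"], "O": ["v"],
    "u": ["A", "B", "v"],
    "v": ["u", "C", "O"],
}
print(root_with_outgroup(unrooted, "O"))  # ('O', (('A', 'B'), 'C'))
```

The root introduced this way has two children, the outgroup on one side and all remaining taxa on the other, which matches the description of the root lying between the outgroup and the rest of the tree.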

Bifurcating versus multifurcating

Both rooted and unrooted trees can be either bifurcating or multifurcating. A rooted bifurcating tree has exactly two descendants arising from each interior node (that is, it forms a binary tree), and an unrooted bifurcating tree takes the form of an unrooted binary tree, a free tree with exactly three neighbors at each internal node. In contrast, a rooted multifurcating tree may have more than two children at some nodes and an unrooted multifurcating tree may have more than three neighbors at some nodes.
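The two definitions can be checked mechanically. The sketch below assumes two illustrative representations that are not prescribed by the text: a rooted tree as nested tuples of leaf names, and an unrooted tree as a dictionary mapping each node to its neighbours. The function names are likewise hypothetical.

```python
def is_bifurcating_rooted(tree) -> bool:
    """True if every interior node of a nested-tuple tree has exactly two children."""
    if isinstance(tree, str):  # a leaf
        return True
    return len(tree) == 2 and all(is_bifurcating_rooted(child) for child in tree)


def is_bifurcating_unrooted(adjacency) -> bool:
    """True if every node is a leaf (one neighbour) or has exactly three neighbours."""
    return all(len(neighbours) in (1, 3) for neighbours in adjacency.values())


binary_rooted = (("A", "B"), ("C", "D"))
polytomy_rooted = (("A", "B", "C"), "D")  # one node with three children

binary_unrooted = {
    "A": ["u"], "B": ["u"], "C": ["v"], "D": ["v"],
    "u": ["A", "B", "v"], "v": ["u", "C", "D"],
}
star_unrooted = {
    "A": ["c"], "B": ["c"], "C": ["c"], "D": ["c"],
    "c": ["A", "B", "C", "D"],  # four neighbours at the centre
}

print(is_bifurcating_rooted(binary_rooted))      # True
print(is_bifurcating_rooted(polytomy_rooted))    # False
print(is_bifurcating_unrooted(binary_unrooted))  # True
print(is_bifurcating_unrooted(star_unrooted))    # False
```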

Labeled versus unlabeled

Both rooted and unrooted trees can be either labeled or unlabeled. A labeled tree has specific values assigned to its leaves, while an unlabeled tree, sometimes called a tree shape, defines a topology only. Some sequence-based trees built from a small genomic locus, such as Phylotree, feature internal nodes labeled with inferred ancestral haplotypes.

Enumerating trees

Number of trees as a function of the number of leaves

The number of possible trees for a given number of leaf nodes depends on the specific type of tree, but for all but the smallest trees there are more labeled than unlabeled trees, more multifurcating than bifurcating trees, and more rooted than unrooted trees. The last distinction is the most biologically relevant; it arises because there are many places on an unrooted tree to put the root. For bifurcating labeled trees, the total number of rooted trees is:

: <math>(2n-3)!! = \frac{(2n-3)!}{2^{n-2}(n-2)!}</math> for <math>n \ge 2</math>,

where <math>n</math> represents the number of leaf nodes. For bifurcating labeled trees, the total number of unrooted trees is:

: <math>(2n-5)!! = \frac{(2n-5)!}{2^{n-3}(n-3)!}</math> for <math>n \ge 3</math>.

Among labeled bifurcating trees, the number of unrooted trees with <math>n</math> leaves is equal to the number of rooted trees with <math>n-1</math> leaves (Felsenstein J. (2004), ''Inferring Phylogenies'', Sinauer Associates: Sunderland, MA). The number of rooted trees grows quickly as a function of the number of tips. For 10 tips, there are more than <math>34 \times 10^6</math> possible bifurcating trees, and the number of multifurcating trees rises faster, with ca. 7 times as many of the latter as of the former.

{| style="text-align: left; margin-left: auto; margin-right: auto; border: none;"
|+ Counting trees.
! Labeled leaves !! Binary unrooted trees !! Binary rooted trees !! Multifurcating rooted trees !! All possible rooted trees
|-
| 1 || 1 || 1 || 0 || 1
|-
| 2 || 1 || 1 || 0 || 1
|-
| 3 || 1 || 3 || 1 || 4
|-
| 4 || 3 || 15 || 11 || 26
|-
| 5 || 15 || 105 || 131 || 236
|-
| 6 || 105 || 945 || 1,807 || 2,752
|-
| 7 || 945 || 10,395 || 28,813 || 39,208
|-
| 8 || 10,395 || 135,135 || 524,897 || 660,032
|-
| 9 || 135,135 || 2,027,025 || 10,791,887 || 12,818,912
|-
| 10 || 2,027,025 || 34,459,425 || 247,678,399 || 282,137,824
|}
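The bifurcating counts can be reproduced directly from the double-factorial expressions. The following Python sketch is only a worked check of those two formulas; the function names are illustrative, and the convention that the double factorial of a non-positive number equals 1 is an assumption adopted here so that the one-leaf and two-leaf cases give a count of 1.

```python
from math import factorial


def double_factorial(k: int) -> int:
    """k * (k - 2) * (k - 4) * ... down to 1; taken to be 1 for k <= 0."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_bifurcating(n: int) -> int:
    """Number of rooted, labeled, bifurcating trees on n leaves: (2n - 3)!!"""
    return double_factorial(2 * n - 3)


def unrooted_bifurcating(n: int) -> int:
    """Number of unrooted, labeled, bifurcating trees on n leaves: (2n - 5)!!"""
    return double_factorial(2 * n - 5)


def rooted_closed_form(n: int) -> int:
    """The same rooted count via (2n - 3)! / (2^(n-2) * (n - 2)!), valid for n >= 2."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in range(3, 11):
    # An unrooted tree on n leaves corresponds to a rooted tree on n - 1 leaves.
    assert unrooted_bifurcating(n) == rooted_bifurcating(n - 1)
    assert rooted_bifurcating(n) == rooted_closed_form(n)

print(rooted_bifurcating(10))    # 34459425
print(unrooted_bifurcating(10))  # 2027025
```

The multifurcating counts follow a different recurrence and are not covered by this sketch.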

Special tree types

Dendrogram

A dendrogram is a general name for a tree, whether phylogenetic or not, and hence also for the diagrammatic representation of a phylogenetic tree.

Cladogram

A cladogram only represents a branching pattern; i.e., its branch lengths do not represent time or relative amount of character change, and its internal nodes do not represent ancestors.

Phylogenetic chart of Lepidoptera chronogram

Phylogram

A phylogram is a phylogenetic tree that has branch lengths proportional to the amount of character change. A chronogram is a phylogenetic tree that explicitly represents time through its branch lengths.

Dahlgrenogram

A Dahlgrenogram is a diagram representing a cross section of a phylogenetic tree.

Phylogenetic network

A phylogenetic network is not strictly speaking a tree, but rather a more general graph, or a directed acyclic graph in the case of rooted networks. They are used to overcome some of the limitations inherent to trees.

Spindle diagram

A spindle diagram, or bubble diagram, is often called a romerogram, after its popularisation by the American palaeontologist Alfred Romer. It represents taxonomic diversity (horizontal width) against geological time (vertical axis) in order to reflect the variation of abundance of various taxa through time. However, a spindle diagram is not an evolutionary tree: the taxonomic spindles obscure the actual relationships of the parent taxon to the daughter taxon and have the disadvantage of involving the paraphyly of the parental group. This type of diagram is no longer used in the form originally proposed.

Coral of life

Darwin also mentioned that the ''coral'' may be a more suitable metaphor than the ''tree''. Indeed, phylogenetic corals are useful for portraying past and present life, and they have some advantages over trees (anastomoses allowed, etc.).

Construction

Phylogenetic trees composed with a nontrivial number of input sequences are constructed using computational phylogenetics methods. Distance-matrix methods such as neighbor-joining or UPGMA, which calculate genetic distance from multiple sequence alignments, are simplest to implement, but do not invoke an evolutionary model. Many sequence alignment methods such as ClustalW also create trees by using the simpler algorithms (i.e. those based on distance) of tree construction. Maximum parsimony is another simple method of estimating phylogenetic trees, but relies on an implicit model of evolution (i.e. parsimony). More advanced methods use the optimality criterion of maximum likelihood, often within a Bayesian framework, and apply an explicit model of evolution to phylogenetic tree estimation. Identifying the optimal tree using many of these techniques is NP-hard, so heuristic search and optimization methods are used in combination with tree-scoring functions to identify a reasonably good tree that fits the data.

Tree-building methods can be assessed on the basis of several criteria:
* efficiency (how long does it take to compute the answer, how much memory does it need?)
* power (does it make good use of the data, or is information being wasted?)
* consistency (will it converge on the same answer repeatedly, if each time given different data for the same model problem?)
* robustness (does it cope well with violations of the assumptions of the underlying model?)
* falsifiability (does it alert us when it is not good to use, i.e. when assumptions are violated?)

Tree-building techniques have also gained the attention of mathematicians. Trees can also be built using T-theory.
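As an illustration of the distance-matrix family mentioned above, the following Python sketch builds a tree with UPGMA from a small alignment. It is a minimal sketch under stated assumptions, not a reference implementation: distances are simple proportions of differing sites (p-distances), branch lengths are not computed, and the function names and the toy sequences are invented for the example.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(alignment):
    """Cluster aligned sequences with UPGMA; return the topology as nested tuples."""
    names = list(alignment)
    # Each cluster id maps to (subtree, number of leaves in the subtree).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {
        frozenset((i, j)): p_distance(alignment[names[i]], alignment[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Join the two clusters that are currently closest.
        i, j = min(combinations(clusters, 2), key=lambda pair: dist[frozenset(pair)])
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted mean, i.e. the average over all leaf pairs.
        for k in clusters:
            dist[frozenset((next_id, k))] = (
                size_i * dist[frozenset((i, k))] + size_j * dist[frozenset((j, k))]
            ) / (size_i + size_j)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


# Toy alignment (hypothetical sequences, four sites each).
toy = {"A": "AAAA", "B": "AAAT", "C": "TTTT", "D": "TTTA"}
print(upgma(toy))  # (('A', 'B'), ('C', 'D'))
```

UPGMA always returns a rooted bifurcating tree and implicitly assumes roughly equal rates of change along all lineages; neighbor-joining relaxes that assumption and yields an unrooted tree.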

File formats

Trees can be encoded in a number of different formats, all of which must represent the nested structure of a tree. They may or may not encode branch lengths and other features. Standardized formats are critical for distributing and sharing trees without relying on graphics output that is hard to import into existing software. Commonly used formats are
* Nexus file format
* Newick format
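Newick notation writes the nested structure of a tree with parentheses and commas, terminated by a semicolon. The Python sketch below shows the idea for topology only. It is a deliberately reduced illustration: branch lengths, internal node labels, quoted names and comments, all of which the full format allows, are not handled, and the nested-tuple representation and function names are assumptions made for the example.

```python
def to_newick(tree) -> str:
    """Serialize nested tuples of leaf names into a topology-only Newick string."""
    def render(node):
        if isinstance(node, str):  # a leaf
            return node
        return "(" + ",".join(render(child) for child in node) + ")"
    return render(tree) + ";"


def parse_newick(text: str):
    """Parse a topology-only Newick string back into nested tuples."""
    pos = 0

    def peek() -> str:
        # Current character, or "" once the input is exhausted.
        return text[pos] if pos < len(text) else ""

    def parse_node():
        nonlocal pos
        if peek() == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while peek() == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",();":
            pos += 1
        if pos == start:
            raise ValueError(f"expected a leaf name at position {start}")
        return text[start:pos]

    tree = parse_node()
    if text[pos:] != ";":
        raise ValueError("a Newick string must end with a single ';'")
    return tree


example = (("A", "B"), ("C", "D"))
print(to_newick(example))               # ((A,B),(C,D));
print(parse_newick("((A,B),(C,D));"))   # (('A', 'B'), ('C', 'D'))
```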

Limitations of phylogenetic analysis

Although phylogenetic trees produced on the basis of sequenced genes or genomic data in different species can provide evolutionary insight, these analyses have important limitations. Most importantly, the trees that they generate are not necessarily correct: they do not necessarily accurately represent the evolutionary history of the included taxa. As with any scientific result, they are subject to falsification by further study (e.g., gathering of additional data, analyzing the existing data with improved methods). The data on which they are based may be noisy; the analysis can be confounded by genetic recombination, horizontal gene transfer, hybridisation between species that were not nearest neighbors on the tree before hybridisation takes place, convergent evolution, and conserved sequences.

Also, there are problems in basing an analysis on a single type of character, such as a single protein or only on morphological analysis, because trees constructed from another, unrelated data source often differ from the first, and therefore great care is needed in inferring phylogenetic relationships among species. This is most true of genetic material that is subject to lateral gene transfer and recombination, where different haplotype blocks can have different histories. In these types of analysis, the output tree of a phylogenetic analysis of a single gene is an estimate of the gene's phylogeny (i.e. a gene tree) and not the phylogeny of the taxa (i.e. species tree) from which these characters were sampled, though ideally, both should be very close. For this reason, serious phylogenetic studies generally use a combination of genes that come from different genomic sources (e.g., from mitochondrial or plastid vs. nuclear genomes), or genes that would be expected to evolve under different selective regimes, so that homoplasy (false homology) would be unlikely to result from natural selection.

When extinct species are included as terminal nodes in an analysis (rather than, for example, to constrain internal nodes), they are considered not to represent direct ancestors of any extant species. Extinct species do not typically contain high-quality DNA. The range of useful DNA materials has expanded with advances in extraction and sequencing technologies. Development of technologies able to infer sequences from smaller fragments, or from spatial patterns of DNA degradation products, would further expand the range of DNA considered useful.

Phylogenetic trees can also be inferred from a range of other data types, including morphology, the presence or absence of particular types of genes, insertion and deletion events, and any other observation thought to contain an evolutionary signal.

Phylogenetic networks are used when bifurcating trees are not suitable, due to these complications which suggest a more reticulate evolutionary history of the organisms sampled.

External links

Images

* Human Y-Chromosome 2002 Phylogenetic Tree
* iTOL: Interactive Tree Of Life
* Phylogenetic Tree of Artificial Organisms Evolved on Computers
* Miyamoto and Goodman's Phylogram of Eutherian Mammals

General

* An overview of different methods of tree visualization
* OneZoom: Tree of Life – all living species as intuitive and zoomable fractal explorer (responsive design)
* Discover Life: an interactive tree based on the U.S. National Science Foundation's Assembling the Tree of Life Project
* Tree of Life Web Project (http://tolweb.org/tree)
* Phylogenetic inferring on the T-REX server
* NCBI's Taxonomy Database (https://www.ncbi.nlm.nih.gov/Taxonomy/)
* ETE: A Python Environment for Tree Exploration, a programming library to analyze, manipulate and visualize phylogenetic trees
* A daily-updated tree of (sequenced) life