
A cladogram (from Greek ''clados'' "branch" and ''gramma'' "character") is a diagram used in cladistics to show relations among organisms. A cladogram is not, however, an evolutionary tree because it does not show how ancestors are related to descendants, nor does it show how much they have changed, so many differing evolutionary trees can be consistent with the same cladogram. A cladogram uses lines that branch off in different directions ending at a clade, a group of organisms with a last common ancestor. There are many shapes of cladograms but they all have lines that branch off from other lines. The lines can be traced back to where they branch off. These branching points represent a hypothetical ancestor (not an actual entity) which can be inferred to exhibit the traits shared among the terminal taxa above it. This hypothetical ancestor might then provide clues about the order of evolution of various features, adaptation, and other evolutionary narratives about ancestors. Although traditionally such cladograms were generated largely on the basis of morphological characters, DNA and RNA sequencing data and computational phylogenetics are now very commonly used in the generation of cladograms, either on their own or in combination with morphology.

## Molecular versus morphological data

The characteristics used to create a cladogram can be roughly categorized as either morphological (synapsid skull, warm-blooded, notochord, unicellular, etc.) or molecular (DNA, RNA, or other genetic information). Prior to the advent of DNA sequencing, cladistic analysis primarily used morphological data. Behavioral data (for animals) may also be used. As DNA sequencing has become cheaper and easier, molecular systematics has become an increasingly popular way to infer phylogenetic hypotheses. Using a parsimony criterion is only one of several methods to infer a phylogeny from molecular data. Approaches such as maximum likelihood, which incorporate explicit models of sequence evolution, are non-Hennigian ways to evaluate sequence data. Another powerful method of reconstructing phylogenies is the use of genomic retrotransposon markers, which are thought to be less prone to the problem of reversion that plagues sequence data. They are also generally assumed to have a low incidence of homoplasies because it was once thought that their integration into the genome was entirely random; this seems at least sometimes not to be the case, however.

## Plesiomorphies and synapomorphies

Researchers must decide which character states are "ancestral" (''plesiomorphies'') and which are derived (''synapomorphies''), because only synapomorphic character states provide evidence of grouping. This determination is usually done by comparison to the character states of one or more ''outgroups''. States shared between the outgroup and some members of the in-group are symplesiomorphies; states that are present only in a subset of the in-group are synapomorphies. Note that character states unique to a single terminal (autapomorphies) do not provide evidence of grouping. The choice of an outgroup is a crucial step in cladistic analysis because different outgroups can produce trees with profoundly different topologies.
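The outgroup comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than a full polarity analysis: the function name `classify_character`, the taxon list, and the two scored characters are assumptions chosen for the example, and the rule is deliberately simplified to one character at a time.

```python
def classify_character(states, outgroup, ingroup):
    """Label each in-group state of one character by outgroup comparison.

    states   : dict mapping taxon name -> character state
    outgroup : list of outgroup taxon names
    ingroup  : list of in-group taxon names
    """
    outgroup_states = {states[t] for t in outgroup}
    labels = {}
    for state in {states[t] for t in ingroup}:
        holders = [t for t in ingroup if states[t] == state]
        if state in outgroup_states:
            labels[state] = "symplesiomorphy"  # also present in the outgroup
        elif len(holders) == 1:
            labels[state] = "autapomorphy"     # unique to a single terminal
        else:
            labels[state] = "synapomorphy"     # derived and shared in the in-group
    return labels


# Illustrative data (hypothetical scoring for this example only).
ingroup = ["lamprey", "trout", "frog", "mouse"]
outgroup = ["lancelet"]
jaws = {"lancelet": "absent", "lamprey": "absent",
        "trout": "present", "frog": "present", "mouse": "present"}
hair = {"lancelet": "absent", "lamprey": "absent",
        "trout": "absent", "frog": "absent", "mouse": "present"}

jaws_labels = classify_character(jaws, outgroup, ingroup)
hair_labels = classify_character(hair, outgroup, ingroup)
print(jaws_labels)
print(hair_labels)
```

Only the state labelled as a synapomorphy (here, the presence of jaws) would count as evidence of grouping; swapping in a different outgroup can change the labels, which is why the outgroup choice matters.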

## Homoplasies

A homoplasy is a character state that is shared by two or more taxa due to some cause ''other'' than common ancestry. The two main types of homoplasy are convergence (evolution of the "same" character in at least two distinct lineages) and reversion (the return to an ancestral character state). Characters that are obviously homoplastic, such as white fur in different lineages of Arctic mammals, should not be included as a character in a phylogenetic analysis as they do not contribute anything to our understanding of relationships. However, homoplasy is often not evident from inspection of the character itself (as in DNA sequence, for example), and is then detected by its incongruence (unparsimonious distribution) on a most-parsimonious cladogram. Note that characters that are homoplastic may still contain phylogenetic signal.

A well-known example of homoplasy due to convergent evolution would be the character, "presence of wings". Although the wings of birds, bats, and insects serve the same function, each evolved independently, as can be seen by their anatomy. If a bird, bat, and a winged insect were scored for the character, "presence of wings", a homoplasy would be introduced into the dataset, and this could potentially confound the analysis, possibly resulting in a false hypothesis of relationships. Of course, the only reason a homoplasy is recognizable in the first place is because there are other characters that imply a pattern of relationships that reveal its homoplastic distribution.
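To illustrate how an unparsimonious distribution reveals homoplasy, the sketch below counts the changes that the "presence of wings" character requires on a fixed tree, using Fitch's small-parsimony procedure. The tree topology, the taxon sample, and the function name `fitch_steps` are assumptions made for the example; the tree stands in for a cladogram that other characters are taken to support.

```python
def fitch_steps(tree, states):
    """Return (state_set, steps) for one character on a rooted binary tree.

    tree   : a taxon name (str) or a 2-tuple of subtrees
    states : dict mapping taxon name -> character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    common = left_set & right_set
    if common:
        # The two subtrees can agree on a state: no extra change is needed.
        return common, left_steps + right_steps
    # No shared state: one change must be placed on a branch below this node.
    return left_set | right_set, left_steps + right_steps + 1


# Hypothetical cladogram assumed to be supported by other characters.
tree = ((("bird", "crocodile"), ("bat", "mouse")), ("dragonfly", "silverfish"))
wings = {"bird": 1, "crocodile": 0, "bat": 1, "mouse": 0,
         "dragonfly": 1, "silverfish": 0}

_, steps = fitch_steps(tree, wings)
minimum_steps = len(set(wings.values())) - 1  # a binary character needs at least 1 change
extra_steps = steps - minimum_steps
print(steps, extra_steps)  # 3 2
```

A binary character that fits a tree perfectly needs a single change. Here wings need three, and the two extra steps are the signature of independent origins on this tree.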

## What is not a cladogram

A cladogram is the diagrammatic result of an analysis, which groups taxa on the basis of synapomorphies alone. There are many other phylogenetic algorithms that treat data somewhat differently, and result in phylogenetic trees that look like cladograms but are not cladograms. For example, phenetic algorithms, such as UPGMA and Neighbor-Joining, group by overall similarity, and treat both synapomorphies and symplesiomorphies as evidence of grouping. The resulting diagrams are phenograms, not cladograms. Similarly, the results of model-based methods (Maximum Likelihood or Bayesian approaches) that take into account both branching order and "branch length" count both synapomorphies and autapomorphies as evidence for or against grouping. The diagrams resulting from those sorts of analysis are not cladograms, either.

There are several algorithms available to identify the "best" cladogram. Most algorithms use a metric to measure how consistent a candidate cladogram is with the data. Most cladogram algorithms use the mathematical techniques of optimization and minimization. In general, cladogram generation algorithms must be implemented as computer programs, although some algorithms can be performed manually when the data sets are modest (for example, just a few species and a couple of characteristics). Some algorithms are useful only when the characteristic data are molecular (DNA, RNA); other algorithms are useful only when the characteristic data are morphological. Other algorithms can be used when the characteristic data include both molecular and morphological data. Algorithms for cladograms or other types of phylogenetic trees include least squares, neighbor-joining, parsimony, maximum likelihood, and Bayesian inference.

Biologists sometimes use the term parsimony for a specific kind of cladogram generation algorithm and sometimes as an umbrella term for all phylogenetic algorithms. Algorithms that perform optimization tasks (such as building cladograms) can be sensitive to the order in which the input data (the list of species and their characteristics) is presented. Inputting the data in various orders can cause the same algorithm to produce different "best" cladograms. In these situations, the user should input the data in various orders and compare the results. Using different algorithms on a single data set can sometimes yield different "best" cladograms, because each algorithm may have a unique definition of what is "best". Because of the astronomical number of possible cladograms, algorithms cannot guarantee that the solution is the overall best solution. A nonoptimal cladogram will be selected if the program settles on a local minimum rather than the desired global minimum. To help solve this problem, many cladogram algorithms use a simulated annealing approach to increase the likelihood that the selected cladogram is the optimal one.

The basal position is the direction of the base (or root) of a rooted phylogenetic tree or cladogram. A basal clade is the earliest clade (of a given taxonomic rank) to branch within a larger clade.
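The "astronomical number of possible cladograms" mentioned above can be made concrete. As a rough illustration, assuming rooted and fully bifurcating trees on labelled taxa, the number of distinct topologies for n taxa is the double factorial (2n − 3)!!. The function name below is an assumption made for this sketch.

```python
def count_rooted_binary_trees(n_taxa):
    """Number of distinct rooted, fully bifurcating trees on n labelled taxa.

    Uses the double factorial (2n - 3)!! = 1 * 3 * 5 * ... * (2n - 3).
    """
    count = 1
    for k in range(3, 2 * n_taxa - 2, 2):  # odd factors 3, 5, ..., 2n - 3
        count *= k
    return count


for n in (4, 10, 20, 50):
    print(n, count_rooted_binary_trees(n))
```

Four taxa give 15 possible trees and ten taxa already give 34,459,425, which is why exhaustive search quickly becomes impractical and heuristic searches can stall in a local minimum.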

# Statistics

## Incongruence length difference test (or partition homogeneity test)

The incongruence length difference test (ILD) is a measurement of how the combination of different datasets (e.g. morphological and molecular, plastid and nuclear genes) contributes to a longer tree. It is measured by first calculating the total tree length of each partition and summing them. Replicates are then made by randomly reassembling the original partitions into new partitions, and the tree lengths of each replicate are summed in the same way. A p value of 0.01 is obtained for 100 replicates if 99 replicates have longer combined tree lengths.
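The p value arithmetic in the last sentence can be sketched as follows. This is a minimal illustration that takes the summed tree lengths as given inputs; it does not perform the tree searches themselves. The function name `ild_p_value` and the numeric lengths are hypothetical, and the p value is computed, following the description above, as the fraction of replicates whose summed length is not longer than that of the original partitions.

```python
def ild_p_value(observed_sum, replicate_sums):
    """Fraction of random-partition replicates that are NOT longer than the original.

    observed_sum   : summed tree length of the original partitions
    replicate_sums : summed tree lengths, one per randomised replicate
    """
    not_longer = sum(1 for length in replicate_sums if length <= observed_sum)
    return not_longer / len(replicate_sums)


# Hypothetical lengths: 99 of 100 replicates are longer than the original sum.
observed_sum = 250
replicate_sums = [251] * 99 + [250]

p_value = ild_p_value(observed_sum, replicate_sums)
print(p_value)  # 0.01
```

A small p value indicates that randomly mixed partitions almost always need longer trees than the original ones, which is read as evidence that the original partitions are incongruent with each other.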

## Measuring homoplasy

Some measures attempt to quantify the amount of homoplasy in a dataset with reference to a tree, though it is not necessarily clear precisely what property these measures aim to quantify.

### Consistency index

The consistency index (CI) measures the consistency of a tree to a set of data: a measure of the minimum amount of homoplasy implied by the tree. It is calculated by counting the minimum number of changes in a dataset and dividing it by the actual number of changes needed for the cladogram. A consistency index can also be calculated for an individual character ''i'', denoted $c_i$. Besides reflecting the amount of homoplasy, the metric also reflects the number of taxa in the dataset, (to a lesser extent) the number of characters in a dataset, the degree to which each character carries phylogenetic information, and the fashion in which additive characters are coded, rendering it unfit for purpose. For binary characters with an even state distribution, $c_i$ occupies a range from 1 to $1/\left(n.taxa/2\right)$; its minimum value is larger when states are not evenly spread. In general, for a binary or non-binary character with $n.states$, $c_i$ occupies a range from 1 to $\left(n.states-1\right)/\left(n.taxa-\lceil n.taxa/n.states\rceil\right)$.
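A short sketch of these calculations, with hypothetical function names and made-up step counts, shows both the index itself and the lower bound given by the general formula. The step counts are assumed to have been obtained already from a parsimony analysis.

```python
from math import ceil


def character_ci(min_changes, observed_changes):
    """Consistency index of one character: minimum changes / changes on the tree."""
    return min_changes / observed_changes


def ensemble_ci(min_changes_per_char, observed_changes_per_char):
    """Consistency index of a whole dataset: summed minimum / summed observed changes."""
    return sum(min_changes_per_char) / sum(observed_changes_per_char)


def ci_lower_bound(n_taxa, n_states):
    """Smallest attainable per-character CI: (n.states - 1) / (n.taxa - ceil(n.taxa / n.states))."""
    return (n_states - 1) / (n_taxa - ceil(n_taxa / n_states))


# A binary character (minimum 1 change) that changes 3 times on the tree.
print(character_ci(1, 3))
# Three hypothetical characters with minimum changes 1, 1, 2 and observed changes 1, 3, 4.
print(ensemble_ci([1, 1, 2], [1, 3, 4]))  # 0.5
# Lower bound for a binary character scored in 10 taxa.
print(ci_lower_bound(10, 2))              # 0.2
```

The lower bound depends on the number of taxa, which illustrates why the index reflects dataset size as well as homoplasy.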

### Retention index

The retention index (RI) was proposed as an improvement of the CI "for certain applications". This metric also purports to measure the amount of homoplasy, but also measures how well synapomorphies explain the tree. It is calculated by taking the (maximum number of changes on a tree minus the number of changes on the tree), and dividing by the (maximum number of changes on the tree minus the minimum number of changes in the dataset). The rescaled consistency index (RC) is obtained by multiplying the CI by the RI; in effect this stretches the range of the CI such that its minimum theoretically attainable value is rescaled to 0, with its maximum remaining at 1. The homoplasy index (HI) is simply 1 − CI.
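The relationships between RI, RC, and HI can be written out directly. The sketch below uses hypothetical function names and made-up step counts; the minimum, observed, and maximum numbers of changes are assumed to be known already.

```python
def retention_index(min_changes, observed_changes, max_changes):
    """RI = (maximum - observed) / (maximum - minimum)."""
    if max_changes == min_changes:
        raise ValueError("RI is undefined when maximum and minimum changes are equal")
    return (max_changes - observed_changes) / (max_changes - min_changes)


def rescaled_consistency_index(min_changes, observed_changes, max_changes):
    """RC = CI * RI."""
    ci = min_changes / observed_changes
    return ci * retention_index(min_changes, observed_changes, max_changes)


def homoplasy_index(min_changes, observed_changes):
    """HI = 1 - CI."""
    return 1 - min_changes / observed_changes


# Hypothetical counts: minimum 1 change, 2 observed on the tree, at most 3 possible.
ri = retention_index(1, 2, 3)
rc = rescaled_consistency_index(1, 2, 3)
hi = homoplasy_index(1, 2)
print(ri, rc, hi)  # 0.5 0.25 0.5
```

When the observed number of changes equals the maximum, RI is 0 and so is RC, whatever the value of CI; this is the rescaling to 0 described above.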

### Homoplasy Excess Ratio

This measures the amount of homoplasy observed on a tree relative to the maximum amount of homoplasy that could theoretically be present – 1 − (observed homoplasy excess) / (maximum homoplasy excess). A value of 1 indicates no homoplasy; 0 represents as much homoplasy as there would be in a fully random dataset, and negative values indicate more homoplasy still (and tend only to occur in contrived examples). The HER is presented as the best measure of homoplasy currently available.
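The three cases described above (1, 0, and negative values) follow directly from the formula, as this small sketch shows. The function name and the excess values are hypothetical, and the two excess quantities are taken as given rather than computed from data.

```python
def homoplasy_excess_ratio(observed_excess, maximum_excess):
    """HER = 1 - (observed homoplasy excess) / (maximum homoplasy excess)."""
    return 1 - observed_excess / maximum_excess


print(homoplasy_excess_ratio(0, 40))   # 1.0   no homoplasy
print(homoplasy_excess_ratio(40, 40))  # 0.0   as much as in a fully random dataset
print(homoplasy_excess_ratio(50, 40))  # -0.25 more homoplasy than random
```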

* Phylogenetics
* Dendrogram
* Basal (phylogenetics)