Ancestral Sequence Reconstruction

Ancestral sequence reconstruction (ASR), also known as ancestral gene/sequence reconstruction/resurrection, is a technique used in the study of molecular evolution. The method uses related sequences to reconstruct an "ancestral" gene from a multiple sequence alignment. It can be used to 'resurrect' ancestral proteins and was suggested in 1963 by Linus Pauling and Emile Zuckerkandl. In the case of enzymes, this approach has been called paleoenzymology (British: palaeoenzymology). Some early efforts were made in the 1980s and 1990s, led by the laboratory of Steven A. Benner, showing the potential of this technique. Thanks to improved algorithms and better sequencing and synthesis techniques, the method was developed further in the early 2000s to allow the resurrection of a greater variety of genes, including much more ancient ones. Over the last decade, ancestral protein resurrection has developed as a strategy to reveal the mechanisms and dynamics of protein evolution.


Principles

Conventional evolutionary and biochemical approaches to studying proteins rely on the so-called ''horizontal'' comparison of related protein homologues from different branch ends of the tree of life. ASR instead probes the statistically inferred ancestral proteins at the nodes of the tree, in a ''vertical'' manner (see diagram). This approach gives access to protein properties that may have arisen transiently over evolutionary time and has recently been used to infer the potential selection pressures that resulted in present-day sequences. ASR has been used to probe the causative mutation behind a protein's neofunctionalization after duplication, by first using functional assays to determine that the mutation was located between, illustratively, ancestors '5' and '4' on the diagram. In the field of protein biophysics, ASR has also been used to study the development of a protein's thermodynamic and kinetic landscapes over evolutionary time, as well as protein folding pathways, in combination with modern analytical techniques such as HX/MS. These sorts of insights are typically inferred from several ancestors reconstructed along a phylogeny: referring to the previous analogy, by studying nodes ''higher and higher'' (further and further back in evolutionary time) within the tree of life.

Most ASR studies are conducted ''in vitro'' and have revealed ancestral protein properties that seem to be evolutionarily desirable traits, such as increased thermostability, catalytic activity and catalytic promiscuity. These data have been attributed both to artifacts of the ASR algorithms and to genuine features of ancient Earth's environment; ASR research must therefore often be complemented with extensive controls (usually alternate ASR experiments) to mitigate algorithmic error. Not all studied ASR proteins exhibit this so-called 'ancestral superiority'. The nascent field of 'evolutionary biochemistry' has been bolstered by the recent increase in ASR studies that use the ancestors to probe organismal fitness within certain cellular contexts, effectively testing ancestral proteins ''in vivo''. Very few ''in vivo'' ASR studies have been conducted because of inherent limitations: the lack of suitably ancient genomes to fit these ancestors into, the small repertoire of well-categorised laboratory model systems, and the inability to mimic ancient cellular environments. Despite these obstacles, preliminary insights from a 2015 paper revealed that the 'ancestral superiority' observed ''in vitro'' for a given protein was not recapitulated ''in vivo''.

ASR presents one of a few mechanisms to study the biochemistry of the Precambrian era of life (>541 Ma) and is hence often used in 'paleogenetics'; indeed, Zuckerkandl and Pauling originally intended ASR to be the starting point of a field they termed 'Paleobiochemistry'.


Methodology

Several related homologues of the protein of interest are selected and aligned in a multiple sequence alignment (MSA), and a 'phylogenetic tree' is constructed with statistically inferred sequences at the nodes of the branches. These sequences are the so-called 'ancestors'; the process of synthesising the corresponding DNA, transforming it into a cell and producing a protein is the so-called 'reconstruction'. Ancestral sequences are typically calculated by maximum likelihood, although Bayesian methods are also implemented. Because the ancestors are inferred from a phylogeny, the topology and composition of the phylogeny play a major role in the output ASR sequences. Given that there is much debate over how to construct phylogenies (for example, whether thermophilic bacteria are basal or derived in bacterial evolution), many ASR papers construct several phylogenies with differing topologies and hence differing ASR sequences. These sequences are then compared, and often several (~10) are expressed and studied per phylogenetic node.

ASR does not claim to recreate the actual sequence of the ancient protein/DNA, but rather a sequence that is likely to be similar to the one that was indeed at the node. This is not considered a shortcoming of ASR, as it fits the 'neutral network' model of protein evolution, whereby at evolutionary junctions (nodes) a population of genotypically different but phenotypically similar protein sequences existed in the then-extant organismal population. Hence, it is possible that ASR generates one of the sequences of a node's neutral network; while it may not represent the genotype of the last common ancestor of the modern-day sequences, it likely does represent the phenotype. This is supported by the modern-day observation that many mutations outside a protein's catalytic or functional sites cause only minor changes in biophysical properties. Hence, ASR allows one to probe the biophysical properties of past proteins and is indicative of ancient genetics.

Maximum likelihood (ML) methods work by generating a sequence in which the residue at each position is the one predicted to be most likely to occupy that position under the method of inference used; typically this is a scoring matrix (similar to those used in BLAST searches or MSAs) calculated from extant sequences. Alternative methods include maximum parsimony (MP), which constructs a sequence based on a model of sequence evolution: usually the idea that the minimum number of nucleotide sequence changes represents the most efficient route for evolution to take and is, by Occam's razor, the most likely. MP is often considered the least reliable method for reconstruction, as it arguably oversimplifies evolution to a degree that is not applicable on the billion-year scale. Another approach, the so-called Bayesian methods, takes residue uncertainty into account; this form of ASR is sometimes used to complement ML methods but typically produces more ambiguous sequences. In ASR, the term 'ambiguity' refers to residue positions where no clear substitution can be predicted; in these cases, several ASR sequences encompassing most of the ambiguities are often produced and compared to one another.

ML ASR often needs complementary experiments to indicate that the derived sequences are more than just consensuses of the input sequences. This is particularly necessary when 'ancestral superiority' is observed. For the trend of increasing thermostability, one explanation is that ML ASR creates a consensus sequence of several different, parallel mechanisms that evolved to confer minor protein thermostability throughout the phylogeny, leading to an additive effect and hence 'superior' ancestral thermostability. The expression of consensus sequences and parallel ASR via non-ML methods are often required to rule out this explanation in each experiment. Another concern raised about ML methods is that the scoring matrices are derived from modern sequences, and the amino acid frequencies seen today may not be the same as in Precambrian biology, resulting in skewed sequence inference. Several studies have attempted to construct ancient scoring matrices via various methodologies and have compared the resulting sequences and the biophysical properties of their proteins. While these modified matrices result in somewhat different ASR sequences, the observed biophysical properties did not seem to vary beyond experimental error.
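
The per-position inference and the notion of 'ambiguity' described above can be illustrated with a minimal sketch. It is not the procedure of any particular ASR package: it assumes a single nucleotide site, a fixed tree with known branch lengths, a uniform prior at the root, and the Jukes-Cantor substitution model as a deliberately simple stand-in for the empirical matrices used in practice. The tree encoding, the function names and the 0.8 ambiguity threshold are all hypothetical choices made for illustration.

```python
import math

STATES = "ACGT"

def jc_prob(i, j, t):
    """Jukes-Cantor probability of state i becoming j over branch length t
    (t in expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def conditional(node):
    """Felsenstein pruning: for each possible state at `node`, the likelihood
    of the observed leaves below it.
    A leaf is a one-letter string; an internal node is the tuple
    (left_child, left_branch_length, right_child, right_branch_length)."""
    if isinstance(node, str):
        return [1.0 if s == node else 0.0 for s in STATES]
    left, t_left, right, t_right = node
    l_vec, r_vec = conditional(left), conditional(right)
    out = []
    for s in STATES:
        l = sum(jc_prob(s, x, t_left) * l_vec[k] for k, x in enumerate(STATES))
        r = sum(jc_prob(s, x, t_right) * r_vec[k] for k, x in enumerate(STATES))
        out.append(l * r)
    return out

def root_posterior(tree):
    """Posterior probability of each state at the root (uniform prior)."""
    like = conditional(tree)
    total = sum(like)
    return {s: v / total for s, v in zip(STATES, like)}

def call_site(tree, threshold=0.8):
    """Return (most probable state, its posterior, is_ambiguous)."""
    post = root_posterior(tree)
    best = max(post, key=post.get)
    return best, post[best], post[best] < threshold

# Hypothetical site: two closely related leaves carry A, an outgroup carries G.
site = (("A", 0.1, "A", 0.1), 0.1, "G", 0.1)
print(call_site(site))
```

With these branch lengths the root sits closer to the G leaf than to the two A leaves, so neither state dominates and the site is flagged as ambiguous. In a real study, such sites are where alternative ancestral sequences would be generated and compared.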
Because of the 'holistic' nature of ASR and the intense complexity that arises when one considers all the possible sources of experimental error, the experimental community considers the ultimate measure of ASR reliability to be the comparison of several alternate ASR reconstructions of the same node and the identification of similar biophysical properties. While this method does not offer a robust statistical or mathematical measure of reliability, it builds on the fundamental idea used in ASR that individual amino acid substitutions do not cause significant changes in a protein's biophysical properties, a tenet that must hold true in order to overcome the effect of inference ambiguity.

Candidates used for ASR are often selected based on the particular property of interest being studied, e.g. thermostability. By selecting sequences from either end of a property's range (e.g., psychrophilic proteins and thermophilic proteins) but ''within'' a protein family, ASR can be used to probe the specific sequence changes that conferred the observed biophysical effect, such as stabilising interactions. Consider, in the diagram, that sequence 'A' encoded a protein that was optimally functional at neutral pH and 'D' one that was optimal in acidic conditions: sequence changes between '5' and '2' may illustrate the precise biophysical explanation for this difference. As ASR experiments can extract ancestors that are likely billions of years old, there are often tens if not hundreds of sequence changes between the ancestors themselves and between ancestors and extant sequences; because of this, such sequence-function evolutionary studies can take a lot of work and rational direction. ASR can be biased by multiple sources of error, such as a biased phylogenetic tree (e.g., due to recombination) or an unrealistic substitution model.
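
Before alternate reconstructions of one node are expressed and compared experimentally, it can help to see how much they differ at the sequence level and how they relate to a plain consensus control. The following is a rough sketch under stated assumptions: the sequences are invented five-residue placeholders of equal length, and the helper names are hypothetical rather than taken from any ASR tool.

```python
from collections import Counter
from itertools import combinations

def percent_identity(a, b):
    """Percentage of aligned positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def variable_positions(seqs):
    """0-based positions at which the alternative reconstructions disagree."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]

def consensus(seqs):
    """Most frequent residue per column: the kind of consensus sequence
    used as a control against 'ancestral superiority' artifacts."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))

# Hypothetical alternative reconstructions of the same node.
alternatives = ["MKVLA", "MKILA", "MKVLG"]

for a, b in combinations(alternatives, 2):
    print(a, b, percent_identity(a, b))
print("variable positions:", variable_positions(alternatives))
print("consensus:", consensus(alternatives))
```

Sequence identity alone says nothing about biophysical similarity; the point of the sketch is only to locate the positions whose uncertainty the alternate reconstructions are meant to cover.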


Resurrected proteins

There are many examples of ancestral proteins that have been computationally reconstructed, expressed in living cell lines, and, in many cases, purified and biochemically studied. The Thornton lab notably resurrected several ancestral hormone receptors (from about 500 Ma) and collaborated with the Stevens lab to resurrect ancient V-ATPase subunits from yeast (800 Ma). The Marqusee lab has recently published several studies concerning the evolutionary biophysical history of ''E. coli'' Ribonuclease H1. Other examples include ancestral visual pigments in vertebrates; enzymes in yeast that break down sugars (800 Ma); enzymes in bacteria that provide resistance to antibiotics (2–3 Ga); the ribonucleases involved in ruminant digestion; and the alcohol dehydrogenases (Adhs) involved in yeast fermentation (~85 Ma).

The 'age' of a reconstructed sequence is determined using a molecular clock model, and often several are employed. This dating technique is often calibrated using geological time-points (such as ancient ocean constituents or banded iron formations, BIFs), and while these clocks offer the only method of inferring a very ancient protein's age, they have sweeping error margins and are difficult to defend against contrary data. ASR 'age' should therefore really be used only as an indicative feature, and it is often bypassed altogether in favour of the number of substitutions between the ancestral and the modern sequences (the foundation on which the clock is calculated). That being said, the use of a clock allows one to compare the observed biophysical data of an ASR protein to the geological or ecological environment at the time. For example, ASR studies on bacterial EF-Tus (proteins involved in translation that are likely rarely subject to HGT and typically exhibit Tm values ~2 °C greater than Tenv) indicate a hotter Precambrian Earth, which fits very closely with geological data on ancient ocean temperatures based on oxygen-18 isotopic levels. ASR studies of yeast Adhs reveal that subfunctionalized Adhs for ethanol metabolism (not just waste excretion) emerged at a time similar to the dawn of fleshy fruit in the Cretaceous Period, and that before this emergence Adh served to excrete ethanol as a byproduct of excess pyruvate. The use of a clock also perhaps indicates that the origin of life occurred before the earliest molecular fossils indicate (>4.1 Ga), but given the debatable reliability of molecular clocks, such observations should be treated with caution.
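
The relationship between substitution counts and clock-based 'age' mentioned above can be made concrete with a small sketch. It assumes a strict molecular clock with a single constant rate, a Poisson correction for multiple hits, and a direct ancestor-to-descendant lineage; the sequences and the rate value are invented for illustration, and real studies use calibrated, often relaxed, clock models.

```python
import math

def p_distance(ancestor, modern):
    """Fraction of aligned sites that differ between two equal-length sequences."""
    if len(ancestor) != len(modern):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != m for a, m in zip(ancestor, modern))
    return differences / len(ancestor)

def poisson_distance(p):
    """Poisson-corrected distance (expected substitutions per site)."""
    return -math.log(1.0 - p)

def node_age_my(distance, rate_per_site_per_my):
    """Age of the ancestor in millions of years under a strict clock:
    substitutions per site along one lineage divided by the rate."""
    return distance / rate_per_site_per_my

# Hypothetical ancestor, modern descendant and substitution rate.
ancestor = "MKVLAMKVLA"
modern = "MKILAMKVLG"
assumed_rate = 1e-3  # substitutions per site per million years (assumption)

p = p_distance(ancestor, modern)
d = poisson_distance(p)
print(p, d, node_age_my(d, assumed_rate))
```

Because the estimated age scales inversely with the assumed rate, any error in the calibration propagates directly into the date, which is one reason the substitution count itself is often reported instead.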


Thioredoxin

One example is the reconstruction of thioredoxin enzymes from organisms up to 4 billion years old. Whereas the chemical activity of these reconstructed enzymes was remarkably similar to that of modern enzymes, their physical properties showed significantly elevated thermal and acidic stability. These results were interpreted as suggesting that ancient life may have evolved in oceans that were much hotter and more acidic than today's.


Significance

These experiments address various important questions in evolutionary biology: does evolution proceed in small steps or in large leaps; is evolution reversible; how does complexity evolve? It has been shown that slight mutations in the amino acid sequence of hormone receptors cause an important change in their preferences for hormones. These changes represent huge steps in the evolution of the endocrine system. Thus very small changes at the molecular level may have enormous consequences. The Thornton lab has also been able to show that evolution is irreversible by studying the glucocorticoid receptor. This receptor was changed into a cortisol receptor by seven mutations, but reversing these mutations did not give the original receptor back, indicating that epistasis plays a major role in protein evolution. This observation, in combination with several observed examples of parallel evolution, supports the neutral network model mentioned above. Other, earlier neutral mutations acted as a ratchet and made the changes to the receptor irreversible. These different experiments on receptors show that, during their evolution, proteins become greatly differentiated, and this explains how complexity may evolve. A closer look at the different ancestral hormone receptors and the various hormones shows that differences at the level of interaction between single amino acid residues and chemical groups of the hormones arise by very small but specific changes. Knowledge about these changes may, for example, lead to the synthesis of hormonal equivalents capable of mimicking or inhibiting the action of a hormone, which might open possibilities for new therapies.

Given that ASR has revealed a tendency towards ancient thermostability and enzymatic promiscuity, ASR is a valuable tool for protein engineers, who often desire these traits (producing effects sometimes greater than current, rationally led tools). ASR also promises to 'resurrect' phenotypically similar 'ancient organisms', which in turn would allow evolutionary biochemists to probe the story of life. Proponents of ASR such as Benner state that through these and other experiments, the end of the current century will see a level of understanding in biology analogous to the one that arose in classical chemistry in the last century.

