
Ancient proteins are the ancestors of modern proteins that survive as molecular fossils. Certain structural features of functional importance, particularly those relating to metabolism and reproduction, are often conserved through geologic time. Early proteins consisted of simple amino acids, with more complicated amino acids being formed at a later stage through biosynthesis. Such late-arising amino acids included histidine, phenylalanine, cysteine, methionine, tryptophan, and tyrosine. Ancient enzymatic proteins performed basic metabolic functions and required the presence of specific co-factors. The characteristics and ages of these proteins can be traced through comparisons of multiple genomes, the distribution of specific architectures, amino acid sequences, and the signatures of specific products caused by particular enzymatic activities. Alpha and beta proteins (α/β) are considered the oldest class of proteins.

Mass spectrometry is one analytical method used to determine the mass and chemical makeup of peptides. Ancestral sequence reconstruction takes place through the collection and alignment of homologous amino acid sequences. These sequences must be sufficiently diverse to contain phylogenetic signals that resolve evolutionary relationships and allow for further deduction of the targeted ancient phenotype. From there, a phylogenetic tree can be constructed to illustrate the genetic resemblance between various amino acid sequences and common ancestors. The ancestral sequence is then inferred and reconstructed through maximum likelihood at the phylogenetic node(s). Next, the encoding genes are synthesized, expressed, purified, and incorporated into the genome of an extant host organism. Functionality and product properties are observed and experimentally characterized. Using a greater degree of variance in representative monomeric proteins will increase the overall precision of the results.
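
The inference step of ancestral sequence reconstruction can be illustrated with a minimal Python sketch. It assumes a fixed, already-built tree, an ungapped alignment, and an equal-rates (Poisson) substitution model over the 20 standard amino acids; real studies typically rely on empirical substitution matrices and dedicated phylogenetics software. The tree layout, branch lengths, species names, and sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def transition_prob(same, t):
    """Poisson (equal-rates) model: probability of ending in the same or a
    different amino acid after a branch of length t (substitutions per site)."""
    decay = math.exp(-20.0 / 19.0 * t)
    return 0.05 + 0.95 * decay if same else 0.05 - 0.05 * decay


def conditional_likelihoods(node, site, alignment):
    """Felsenstein pruning for one alignment column.

    A leaf is a sequence name; an internal node is the assumed tuple
    (left_child, left_branch_length, right_child, right_branch_length).
    Returns {state: P(observed tips below node | node has that state)}.
    """
    if isinstance(node, str):
        observed = alignment[node][site]
        return {a: 1.0 if a == observed else 0.0 for a in AMINO_ACIDS}
    left, t_left, right, t_right = node
    like_left = conditional_likelihoods(left, site, alignment)
    like_right = conditional_likelihoods(right, site, alignment)
    result = {}
    for a in AMINO_ACIDS:
        from_left = sum(transition_prob(a == b, t_left) * like_left[b]
                        for b in AMINO_ACIDS)
        from_right = sum(transition_prob(a == b, t_right) * like_right[b]
                         for b in AMINO_ACIDS)
        result[a] = from_left * from_right
    return result


def reconstruct_root(tree, alignment):
    """Most likely root state per column, assuming equal state frequencies."""
    length = len(next(iter(alignment.values())))
    ancestral = []
    for site in range(length):
        likes = conditional_likelihoods(tree, site, alignment)
        ancestral.append(max(AMINO_ACIDS, key=lambda a: likes[a]))
    return "".join(ancestral)


# Hypothetical four-taxon example.
toy_alignment = {"sp1": "MKV", "sp2": "MKV", "sp3": "MKL", "sp4": "MRV"}
toy_tree = (("sp1", 0.1, "sp2", 0.1), 0.2, ("sp3", 0.1, "sp4", 0.1), 0.2)
print(reconstruct_root(toy_tree, toy_alignment))
```

The sketch reconstructs only the root node and reports a single best state per site. In practice the posterior probability of each state is usually kept as well, so that ambiguous sites can be tested experimentally with alternative residues.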


History

In 1955, Philip Abelson published a short paper that laid out what has become, through several cycles of technical advances, the field of palaeoproteomics or ancient protein research. He was the first to propose that amino acids, and therefore proteins, were present in fossil bone millions of years old, which gave clues about the evolution of very early life forms on our planet. Just over a decade later, Hare and Abelson (1968) conducted another pioneering analysis on shells and found that amino acids degrade, or change their internal configuration from L to D, progressively over time, and that this could thus be used as a dating tool, in what is called amino acid dating or amino acid racemization. This dating approach was later shown to be a very capable tool for dating periods extending further back than the limit of radiocarbon dating at ca. 50,000 years.
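
The dating principle can be made concrete with a short Python sketch. It assumes the commonly used reversible first-order model of racemization, in which ln[(1 + D/L)/(1 - D/L)] grows linearly with time at slope 2k for an amino acid whose D and L forms are equally stable at equilibrium. The rate constant k is strongly temperature dependent and has to be calibrated for each material and site; the numbers below are hypothetical.

```python
import math


def racemization_age(dl_ratio, rate_constant, initial_dl_ratio=0.0):
    """Estimate elapsed time from a measured D/L ratio.

    Assumes reversible first-order kinetics with equal forward and reverse
    rate constants, so that
        ln((1 + D/L) / (1 - D/L)) - [same term at t = 0] = 2 * k * t.
    The result is in the time unit implied by rate_constant (e.g. 1/years).
    """
    if not 0.0 <= initial_dl_ratio <= dl_ratio < 1.0:
        raise ValueError("expected 0 <= initial_dl_ratio <= dl_ratio < 1")
    if rate_constant <= 0.0:
        raise ValueError("rate_constant must be positive")

    def linearized(ratio):
        return math.log((1.0 + ratio) / (1.0 - ratio))

    return (linearized(dl_ratio) - linearized(initial_dl_ratio)) / (2.0 * rate_constant)


# Hypothetical sample: D/L = 0.5 and a calibrated k of 1e-5 per year.
print(round(racemization_age(0.5, 1.0e-5)))
```

Because k changes steeply with temperature, an age obtained this way is only as good as the calibration and the assumed thermal history of the sample.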


Structure and evolution

Ecological and geological events that changed the conditions of Earth's global environment affected the evolution of protein structure. The Great Oxidation Event, triggered by the development of phototrophic organisms like cyanobacteria, resulted in a world-wide increase in oxygen. This pressured various groups of anaerobic prokaryotes, changing the microbial diversity and global metabolome, as well as altering enzyme substrates and kinetics.

Certain areas of proteins are more prone to undergo evolutionary change at a rapid rate, while others are unusually tolerant. Essential genes (sequences of genetic material responsible for protein architecture, structure, catalytic metal co-factor binding centers, or interaction) will experience little change compared to the rest of the genetic material. Portions of this material will be confronted with genetic mutations that affect amino acid sequences. These mutations laid the ground for other mutations and interactions that had major consequences for protein structure and function, resulting in proteins with similar sequences serving entirely different purposes.

Joseph Thornton, an evolutionary biologist, researched steroid hormones and their binding receptors to map their evolutionary relationship. He inserted DNA molecules encoding reconstructed amino acid sequences from ancient proteins into cells in vitro to make them synthesize ancestral proteins. The team discovered that the reconstructed ancestral proteins were capable of reconfiguration in response to multiple hormones. Additional studies conducted by other research teams indicate the evolutionary development of greater protein specificity over time. Ancestral organisms required proteins, mainly enzymes, capable of catalyzing a broad range of biochemical reactions to survive with a limited proteome. Subfunctionalization and gene duplication in multifunctional and promiscuous proteins led to the development of simpler molecules with the ability to perform more specific tasks. Not all studies concur, however. Some results suggest evolutionary trends through less-specific intermediates, molecules bearing two high-specificity states, or decreased specificity altogether.

A second apparent evolutionary trend is the global transition away from thermostability for mesophilic protein lineages. The temperature at which various ancient proteins melt was correlated with the optimum growth temperature of extinct or extant organisms. The higher temperatures of the Precambrian affected optimum growth temperatures. Higher thermostability in proteinaceous structures facilitated their survival under more critical conditions. Heterogeneous environments, neutral drift, random adaptations, mutations, and evolution are some of the factors that influenced this non-linear transition and caused fluctuation in thermostability. This led to the development of alternative mechanisms of surviving fluctuating environmental conditions. Certain ancestral proteins followed alternative evolutionary routes to obtain the same functional outcomes. Organisms that evolved along different pathways developed proteins that performed similar functions. In some cases, changing a single amino acid was enough to provide an entirely new function. Other ancestral sequences became over-stabilized and were incapable of conformational changes in response to shifting environmental stimuli.


Palaeoproteomics


Overview

Paleoproteomics is a relatively young and rapidly growing field of molecular science in which proteomics-based sequencing technology is used to resolve species identification and evolutionary relationships of extinct taxa. While complementary to paleogenomics in application, the study of ancient proteins has the potential to reveal older, more complete phylogenies due to the relative stability of amino acids in proteins as compared to the nucleic acids of DNA. Ancient protein studies can further reveal types and sources of recovered tissues, as well as the developmental stages of fossilized specimens. Paleoproteomics can also be extended to archaeological materials such as textiles, animal skins, food remains, and pottery.

Palaeoproteomics is a neologism used to describe the application of mass spectrometry (MS)-based approaches to the study of ancient proteomes. As with palaeogenomics (the study of ancient DNA, aDNA), it intersects evolutionary biology, archaeology and anthropology, with applications ranging from the phylogenetic reconstruction of extinct species to the investigation of past human diets and ancient diseases. The field was pioneered when Peggy Ostrom used MALDI-TOF with post-source decay to sequence osteocalcin in 50,000-year-old bison bone. With the advent of soft ionization and the use of coupled liquid chromatography (LC) and tandem MS systems (i.e. MALDI-TOF-MS and LC-MS/MS), the throughput of information about ancient proteins has grown significantly. One reason for this is that proteins have been shown to be robust molecules in archaeological and other Quaternary samples. In situations where ancient DNA has long since degraded to fragments too short to be useful, protein sequencing has helped to answer phylogenetic questions such as the placing of ''Toxodon'' sp. Analyses of more complex protein mixtures are just emerging. Analysis of binding adhesives usually thought to derive from birch resin (''Betula'' L.) has shown them to actually be animal glue.
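
Mass spectrometers report mass-to-charge ratios, so matching a spectrum to a candidate sequence starts from the peptide's theoretical mass. The following is a minimal Python sketch, assuming unmodified peptides built from the 20 standard residues and using standard monoisotopic residue masses. The function names and the example peptide are illustrative only; the peptide is arbitrary and not a published marker.

```python
# Monoisotopic masses (Da) of amino acid residues, i.e. amino acid minus water.
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056   # one water is added back for the free termini
PROTON_MASS = 1.007276  # mass of each charge-carrying proton


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONOISOTOPIC_RESIDUE_MASS[residue] for residue in sequence) + WATER_MASS


def mass_to_charge(sequence, charge=1):
    """Expected m/z of the peptide carrying `charge` extra protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON_MASS) / charge


print(f"{peptide_mass('GAP'):.4f}")
print(f"{mass_to_charge('GAP', charge=1):.4f}")
```

Ancient peptides often carry chemical modifications, such as deamidation of asparagine and glutamine or hydroxylation of proline in collagen, which shift the observed mass. A realistic search therefore allows for such mass offsets instead of assuming unmodified residues.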


Background

Philip Abelson first reported ancient amino acid residues in fossilized materials in 1955, proposing that the peptide bonds of proteins might persist for millions of years. These initial discoveries were limited by available methodologies, and so protein sequencing remained an elusive idea for more than four decades. In 2000, mass spectrometry (MS) revealed the presence of osteocalcin in ancient bone samples and ignited a renewed interest in the potential of proteins as a tool for molecular paleontology. The development of higher-resolution instruments further increased the efficiency and depth of ancient protein recovery. In 2012, the first extended fossil bone proteome from a Pleistocene mammoth femur was confidently retrieved and identified, strengthening the future of paleoproteomics research.


Palaeoproteomes


Collagen Type I

The analysis of ancient bone proteomes has primarily focused on the identification of collagen type I (COL1), the dominant protein found in mineralized tissues. Collagen is highly conserved across species and comprises about 90% of organic bone compounds. Fibrillar collagens, the category to which COL1 belongs, are thought to have evolved from a common metazoan ancestor, thus contributing to their abundance and importance in the fossil record. Collagen has also been found to survive much longer than non-collagenous proteins in fossilized specimens, and the protein remains intact beyond the degradation of ancient DNA (aDNA). Its tightly coiled triple-helical structure (consisting of two genetically identical alpha-1 chains and a third genetically distinct alpha-2 chain) and hydrophobic composition also make this protein an excellent candidate for survival, even in temperate and humid climates that support the rapid breakdown of organic molecules. The taxonomic resolution of collagen has been thoroughly investigated, and it is known that amino acid substitutions can be resolved to the genus level in most medium and large mammals. Species-level identification is also possible, even in small mammal remains from hot climates. It is for these reasons that COL1 remains a key protein in paleoproteomics and phylogenetic investigations.


Non-collagenous proteins

The remaining 10% of organic bone molecules are non-collagenous proteins (NCPs). The most abundant NCP, osteocalcin, is a bone and dentin protein involved in bone assembly and is often used as a marker for the bone formation process. Preserved osteocalcin was first detected via mass spectrometry (MALDI-MS) in a 10,000-year-old bison bone and a 53,000-year-old walrus bone, revealing potential for phylogenetic reconstruction beyond the temporal limits of aDNA. More advanced proteomic techniques have enabled the investigation of additional NCPs present in the bone extracellular matrix. Though type I collagen is the longest-lived protein identified in fossilized bone specimens, the identification and sequencing of NCPs may allow for greater taxonomic resolution than collagen-based methods.


Other proteins

Proteomic analysis has also been applied to other fossilized and ancient materials. The examination of damaged artifacts through the sequencing of their keratin peptides has allowed researchers to discriminate between horn and hoof remains of important species used at archeological sites. The keratins of textiles and animal skins worn by Ötzi, the Iceman, were also identified using peptide mass fingerprinting (PMF) of the ancient samples and of reference species. Immune response proteins have revealed the presence of infections and diseases in multiple studies of mummified human remains. Additionally, the identification of egg proteins, caseins, whey globulins, and other proteinaceous materials used as binders in the paint of historical artworks has allowed for a better understanding of proper conservation methods.


Sequencing methods

Analysis of a fossil sample begins with demineralization of the bone/tooth mineral matrix. Trypsin is commonly used to digest the protein residues into peptides, which can then be purified and analyzed.
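
The digestion step can also be simulated in silico, which is one way to generate the peptides expected from a reference protein. The following is a minimal sketch, assuming the commonly cited trypsin rule (cleavage on the C-terminal side of lysine or arginine, except when the next residue is proline). The function name, the `missed_cleavages` parameter, and the toy sequence are illustrative assumptions, not taken from any particular study or software package.

```python
def trypsin_digest(sequence, missed_cleavages=0):
    """Return the peptides expected from an idealised tryptic digest.

    Assumed rule: cleave after K or R unless the next residue is P.
    """
    if not sequence:
        return []

    # Collect cleavage positions, including both ends of the protein.
    sites = [0]
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for start in range(len(sites) - 1):
        # Allow up to `missed_cleavages` skipped sites per peptide.
        last = min(start + 1 + missed_cleavages, len(sites) - 1)
        for end in range(start + 1, last + 1):
            peptides.append(sequence[sites[start]:sites[end]])
    return peptides


# Hypothetical toy sequence, not a real collagen fragment.
print(trypsin_digest("AGKPLRGAK"))
print(trypsin_digest("AGKPLRGAK", missed_cleavages=1))
```

Real digests of ancient material are rarely this clean: degraded proteins produce many non-tryptic fragments, so allowing missed cleavages is only a first approximation.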


Peptide mass fingerprinting

Peptide mass fingerprinting (PMF) is an analytical technique that can be applied to the digested protein mixture. The masses of the unknown peptides are measured with a mass spectrometer that pairs an ionization source, such as MALDI (matrix-assisted laser desorption/ionization) or ESI (electrospray ionization), with a mass analyzer. The measured masses are then compared to the masses of peptides that are predicted to derive from known proteins.
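
The comparison step can be illustrated with a short sketch. It computes the monoisotopic mass of each predicted peptide from standard residue masses and counts how many observed peaks fall within a tolerance of a predicted peak. The reference peptides, taxon names, scoring rule, and the 50 ppm tolerance are all assumptions chosen for illustration; they do not reproduce any published marker set or search engine.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def fingerprint_scores(observed, reference, tol_ppm=50.0):
    """Count, per reference entry, the observed masses matching a predicted peptide."""
    scores = {}
    for name, peptides in reference.items():
        predicted = [mh_plus(p) for p in peptides]
        scores[name] = sum(
            any(abs(obs - pred) / pred * 1e6 <= tol_ppm for pred in predicted)
            for obs in observed
        )
    return scores


# Hypothetical reference peptides for two made-up taxa.
reference = {
    "taxon_a": ["GAK", "AGPR"],
    "taxon_b": ["GSK", "AGPR"],
}
# Simulated "observed" peaks: the taxon_a peptides with a small measurement error.
observed = [mh_plus("GAK") + 0.002, mh_plus("AGPR") - 0.002]
print(fingerprint_scores(observed, reference))
```

In this toy case the single alanine-to-serine difference shifts one peak by about 16 Da, which is enough to separate the two reference entries. Peptides shared by both entries carry no discriminating information.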


Liquid chromatography-tandem mass spectrometry

The inclusion of liquid chromatography (LC) solvents can enhance peptide electrospray ionization. When combined with mass spectrometry, this allows a much greater number of peptide ions to be analyzed with improved fragment spectra. Liquid chromatography-mass spectrometry (LC-MS) can then be performed for peptide mass fingerprinting; however, liquid chromatography-tandem mass spectrometry (LC-MS/MS) is most often used for ancient proteome analysis because of the complexity of these samples.


De novo sequencing

At present, protein databases remain limited to particular taxa, so novel protein sequences that differ greatly from those available will not be identified by protein search engines. Manual interpretation through de novo sequencing remains a viable solution until these databases become more robust, and this technique allows the identification of amino acid substitutions not previously reported. A hybrid solution of error-tolerant search algorithms that use protein sequence databases while allowing amino acid substitutions may similarly enable the identification of novel single amino acid polymorphisms (SAPs).
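
The core idea behind de novo sequencing is that the mass gap between consecutive fragment ions of the same series equals the mass of one amino acid residue. The following is a minimal sketch under strong simplifying assumptions: a complete, noise-free series of singly charged b-ions, no modifications, and a fixed tolerance of 0.01 Da. The function names are hypothetical, and the approach is far simpler than what real de novo tools do.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276


def b_ion_ladder(peptide):
    """Singly charged b-ion m/z values b1..bn for an idealised spectrum."""
    ladder, total = [], PROTON
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        ladder.append(total)
    return ladder


def read_ladder(peaks, tol=0.01):
    """Translate gaps between consecutive peaks into residue calls."""
    peaks = sorted(peaks)
    calls = []
    for low, high in zip(peaks, peaks[1:]):
        gap = high - low
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - gap) <= tol]
        if not hits:
            calls.append("?")          # gap matches no single residue
        elif len(hits) == 1:
            calls.append(hits[0])
        else:
            calls.append("[" + "/".join(hits) + "]")  # isobaric residues
    return "".join(calls)


# The first residue is not recovered because the ladder starts at b1.
print(read_ladder(b_ion_ladder("GASPVL")))
```

Leucine and isoleucine have identical masses, so the sketch reports them as an ambiguous call. An amino acid substitution relative to a database sequence would appear as a gap matching a different residue, which is why this kind of reading can reveal variants that a strict database search misses.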


Challenges


Dinosaur collagen

A 2007 paleontology study reported the alleged discovery of endogenous collagen peptides in 68-million-year-old ''Tyrannosaurus rex'' fossils. The claim implied protein survival far beyond what experimentally derived decay rates predict, leading to controversy in the emerging field. In 2009 the same team reported similar collagen peptide sequence matches from 80-million-year-old hadrosaur fossils belonging to ''Brachylophosaurus canadensis''. Subsequent studies reanalyzed the original ''T. rex'' sequence data and inferred that the sample was predominantly laboratory contaminants, soil bacteria, and bird-like hemoglobin and collagen; the former protein is typically only seen in relatively recent samples. Another exceptionally preserved hadrosaur from the Hell Creek Formation (USA) yielded none of the previous findings despite extensive testing; only protein breakdown products were detected. Further experimentation demonstrated that contamination from other specimens present in the ''T. rex'' lab cannot be ruled out. Every peptide that was considered unique to both dinosaurs in the 2009 study could be matched to modern ostrich with much greater confidence than could be placed on their own, unique identifications.

Several methods have been described to support the authenticity of paleoproteomic results, including immunological assays and amino acid composition and racemization data, but these approaches have limitations and are known to yield false-positive reactions in fossils. Great care must be taken to rule out contamination by determining whether sequences differ from those of all extant taxa present in the laboratory environment. Deamidation has also been proposed as an effective method for distinguishing between endogenous and contaminating NCPs, where extraction protocols permit this evaluation. This kind of research shares many of its goals with projects such as those studying the impact of differential therapeutic treatments on the hippocampus proteome of depressed mice, but it relies on very different sets of instruments.


Future perspectives

Paleoproteomics is still a young field, with most complex proteomes only being discovered in the last decade. Current proteomic methods are limited because they are not a true form of sequencing; they rely on probability-based matching against expected results. While several MS methods are being employed to increase the robustness of retrievable data, these techniques also increase sensitivity to contamination.


Other associated fields

Zooarcheology uses mass spectrometry and protein analyses to determine the evolutionary relationships between different animal species, based on differences in the masses of proteins such as collagen. Techniques such as shotgun proteomics allow researchers to identify proteomes and the exact sequences of amino acids within different kinds of proteins. These sequences can be compared to those of other organisms in different clades to determine their evolutionary relationships within the phylogenetic tree. Proteins are also better preserved in fossils than DNA, allowing researchers to recover proteins from the enamel of 1.8-million-year-old animal teeth and from mineral crystals of 3.8-million-year-old eggshells.


Applications and products

Combined genome and protein sequencing research has allowed scientists to further piece together narratives of archaic environmental conditions and past evolutionary relationships. Research into the thermostability of protein structures permits predictions of past global temperatures. Ancestral sequence reconstruction further reveals the origins of human ethanol metabolism and the evolution of various species. One example is the identification and differentiation of Denisovan hominids from modern ''Homo sapiens sapiens'' through amino acid variants in collagen obtained from the former's teeth. The study of ancient proteins has not only helped to determine the evolutionary history of viral proteins but also facilitated the development of new drugs.


Benefits and limitations

Understanding protein function and evolution provides new methods of engineering and controlling evolutionary pathways to produce useful templates and byproducts, specifically proteins with high thermostability and broad substrate specificity. Multiple limitations and possible sources of error must be taken into consideration, and possible solutions or alternatives put into place. The statistical construction of ancient proteins is unverifiable, and the reconstructed proteins will not have amino acid sequences identical to those of the ancestral proteins. Reconstruction can also be affected by multiple factors, including mutations; turnover rates (prokaryotic species are more prone to genetic change than their eukaryotic counterparts, making it harder to determine their proteomic past); amino acid distribution; and the limited number of fully sequenced genomes and amino acid sequences of extant species. Ancestral protein reconstruction also assumes that certain homologous phenotypes actually existed within ancient proteinaceous populations, when in fact the data recovered is only an estimated consensus of the total pre-existing diversity. Inadequate taxonomic sampling can lead to inaccurate phylogenetic trees due to long branch attraction. Proteins can also degrade over time into small fragments and have modern proteins incorporated into them, making identification difficult or inaccurate. Finally, fossilized remnants contain only minuscule amounts of protein that can be used for further study and identification, and these provide less information about evolutionary patterns than genome sequences do.

A further concern with the ancestral sequence reconstruction (ASR) method is an underlying bias in thermostability caused by the use of maximum likelihood in obtaining data. This makes ancient proteins appear more stable than they actually were. Using alternative reconstruction methods, for instance the Bayesian method that incorporates and averages over the level of uncertainty, could provide a comparable reference regarding ancestral stability. However, this method produces poor reconstructions and may not accurately reflect actual conditions.


See also

* Epistasis

