Phylogenetic profiling is a bioinformatics technique in which the joint presence or joint absence of two traits across large numbers of species is used to infer a meaningful biological connection, such as involvement of two different proteins in the same biological pathway. Along with examination of conserved synteny, conserved operon structure, or "Rosetta Stone" domain fusions, comparing phylogenetic profiles is a designated "post-homology" technique, in that the computation essential to this method begins after it is determined which proteins are homologous to which. A number of these techniques were developed by David Eisenberg and colleagues; phylogenetic profile comparison was introduced in 1999 by Pellegrini et al.


Method

Over 2000 species of bacteria, archaea, and eukaryotes are now represented by complete DNA genome sequences. Typically, each gene in a genome encodes a protein that can be assigned to a particular protein family on the basis of homology. For a given protein family, its presence or absence in each genome (in the original, binary, formulation) is represented by either 1 (present) or 0 (absent). Consequently, the phylogenetic distribution of the protein family can be represented by a long binary number with a digit for each genome; such binary representations are easily compared with each other to search for correlated phylogenetic distributions. The large number of complete genomes makes these profiles rich in information. The advantage of using only complete genomes is that the 0 values, representing the absence of a trait, tend to be reliable.
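
The binary encoding described above can be illustrated with a minimal Python sketch. The genome names, family names, and presence data are hypothetical toy values, and the helper functions are illustrative only; they are not taken from any published tool. Each family's profile is packed into one integer, so two profiles can be compared with a single XOR.

```python
# Hypothetical toy data: the set of complete genomes in which each
# protein family has at least one homolog.
GENOMES = ["genome_A", "genome_B", "genome_C", "genome_D", "genome_E"]

FAMILY_TO_GENOMES = {
    "fam1": {"genome_A", "genome_C", "genome_D"},
    "fam2": {"genome_A", "genome_C", "genome_D"},
    "fam3": {"genome_B", "genome_E"},
    "fam4": {"genome_A", "genome_B", "genome_C"},
}


def build_profile(present_in, genomes):
    """Return the binary profile as a tuple of 0/1, one digit per genome."""
    return tuple(1 if genome in present_in else 0 for genome in genomes)


def profile_to_int(profile):
    """Pack a 0/1 profile into one integer (first genome = most significant bit)."""
    value = 0
    for bit in profile:
        value = (value << 1) | bit
    return value


def hamming_distance(packed_a, packed_b):
    """Count the genomes in which two packed profiles disagree."""
    return bin(packed_a ^ packed_b).count("1")


profiles = {
    family: build_profile(present_in, GENOMES)
    for family, present_in in FAMILY_TO_GENOMES.items()
}
packed = {family: profile_to_int(profile) for family, profile in profiles.items()}

# Families with identical profiles have distance 0 and are candidates
# for a functional link; fam1 and fam2 are such a pair in this toy data.
for other in ("fam2", "fam3", "fam4"):
    print("fam1 vs", other, hamming_distance(packed["fam1"], packed[other]))
```

In this toy data, fam1 and fam2 have identical profiles, while fam1 and fam3 disagree in every genome. Hamming distance is only one possible comparison; it is used here because it maps directly onto the "long binary number" representation.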


Theory

Closely related species should be expected to have very similar sets of genes. However, changes accumulate between more distantly related species by processes that include horizontal gene transfer and gene loss. Individual proteins have specific molecular functions, such as carrying out a single enzymatic reaction or serving as one subunit of a larger protein complex. A biological process such as photosynthesis, methanogenesis, or histidine biosynthesis may require the concerted action of many proteins. If some protein critical to a process is lost, other proteins dedicated to that process would become useless; natural selection makes it unlikely these useless proteins will be retained over evolutionary time. Therefore, should two different protein families consistently tend to be either present or absent together, a likely hypothesis is that the two proteins cooperate in some biological process.
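
One way to quantify whether two families are "consistently" present together is a hypergeometric tail probability: given how many genomes contain each family, how likely is it that they would share at least the observed number of genomes by chance? The sketch below is an assumption-laden illustration, not the scoring scheme of any particular publication or tool. The function name and the example profiles are hypothetical, and the test treats genomes as independent samples, which closely related species are not.

```python
from math import comb


def cooccurrence_p_value(profile_a, profile_b):
    """Hypergeometric probability of at least the observed number of shared genomes.

    Both profiles are equal-length sequences of 0/1 values, one per genome.
    Assumes genomes are independent draws, which ignores shared ancestry.
    """
    n_genomes = len(profile_a)
    if len(profile_b) != n_genomes:
        raise ValueError("profiles must cover the same genomes")

    n_a = sum(profile_a)  # genomes containing family A
    n_b = sum(profile_b)  # genomes containing family B
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)

    # Sum the hypergeometric probabilities for `shared` or more co-occurrences.
    tail = sum(
        comb(n_a, k) * comb(n_genomes - n_a, n_b - k)
        for k in range(shared, min(n_a, n_b) + 1)
    )
    return tail / comb(n_genomes, n_b)


# Hypothetical profiles over eight genomes.
family_x = (1, 0, 1, 1, 0, 0, 1, 0)
family_y = (1, 0, 1, 1, 0, 0, 1, 0)  # identical distribution to family_x
family_z = (1, 1, 0, 0, 1, 0, 0, 1)  # shares only one genome with family_x

print(cooccurrence_p_value(family_x, family_y))
print(cooccurrence_p_value(family_x, family_z))
```

A small value for a pair of families suggests that their co-occurrence is unlikely to be coincidental under the independence assumption. With only eight genomes the smallest achievable value here is 1/70, which illustrates why profiles built from many genomes carry more information.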


Advances and challenges

Phylogenetic profiling has led to numerous discoveries in biology, including previously unknown enzymes in metabolic pathways, transcription factors that bind to conserved regulatory sites, and explanations for the roles of certain mutations in human disease. Improving the method is an active area of research because it faces several limitations. First, co-occurrence of two protein families often reflects recent common ancestry of two species rather than a conserved functional relationship; disambiguating these two sources of correlation may require improved statistical methods. Second, proteins grouped as homologs may differ in function, or proteins conserved in function may fail to register as homologs; improved methods for tailoring the size of each protein family to reflect functional conservation should lead to better results.


Tools

Tools include PLEX (Protein Link Explorer), which is now defunct, and the JGI IMG (Integrated Microbial Genomes) Phylogenetic Profiler (for both single genes and gene cassettes).

