Comparative genomics is a field of biological research in which the genomic features of different organisms are compared. The genomic features may include the DNA sequence, genes, gene order, regulatory sequences, and other genomic structural landmarks. In this branch of genomics, whole genomes or large parts of genomes resulting from genome projects are compared to study basic biological similarities and differences as well as evolutionary relationships between organisms. The major principle of comparative genomics is that common features of two organisms will often be encoded within the DNA that is evolutionarily conserved between them. Therefore, comparative genomic approaches start with making some form of alignment of genome sequences, looking for orthologous sequences (sequences that share a common ancestry) in the aligned genomes, and checking to what extent those sequences are conserved. Based on these, genome and molecular evolution are inferred, and this may in turn be put in the context of, for example, phenotypic evolution or population genetics.

Comparative genomics virtually started as soon as the whole genomes of two organisms became available (that is, the genomes of the bacteria ''Haemophilus influenzae'' and ''Mycoplasma genitalium'') in 1995, and it is now a standard component of the analysis of every new genome sequence. With the explosion in the number of genome projects due to advances in DNA sequencing technologies, particularly the next-generation sequencing methods of the late 2000s, this field has become more sophisticated, making it possible to deal with many genomes in a single study. Comparative genomics has revealed high levels of similarity between closely related organisms, such as humans and chimpanzees, and, more surprisingly, similarity between seemingly distantly related organisms, such as humans and the yeast ''Saccharomyces cerevisiae''. It has also shown the extreme diversity of gene composition in different evolutionary lineages.


History

''See also'': History of genomics

Comparative genomics has a root in the comparison of virus genomes in the early 1980s. For example, small RNA viruses infecting animals (picornaviruses) and those infecting plants (cowpea mosaic virus) were compared and turned out to share significant sequence similarity and, in part, the order of their genes. In 1986, the first comparative genomic study at a larger scale was published, comparing the genomes of varicella-zoster virus and Epstein-Barr virus, which contained more than 100 genes each.

The first complete genome sequence of a cellular organism, that of ''Haemophilus influenzae'' Rd, was published in 1995. The second genome sequencing paper, on the small parasitic bacterium ''Mycoplasma genitalium'', was published in the same year. Starting from this paper, reports on new genomes inevitably became comparative-genomic studies.

''Microbial genomes.'' The first high-resolution whole genome comparison system of microbial genomes of 10-15 kbp was developed in 1998 by Art Delcher, Simon Kasif and Steven Salzberg and applied, with their collaborators at the Institute for Genomic Research (TIGR), to the comparison of entire, highly related microbial organisms. The system is called MUMMER and was described in a publication in Nucleic Acids Research in 1999. The system helps researchers to identify large rearrangements, single base mutations, reversals, tandem repeat expansions and other polymorphisms. In bacteria, MUMMER enables the identification of polymorphisms that are responsible for virulence, pathogenicity, and antibiotic resistance. The system was also applied to the Minimal Organism Project at TIGR and subsequently to many other comparative genomics projects.

''Eukaryote genomes.'' ''Saccharomyces cerevisiae'', the baker's yeast, was the first eukaryote to have its complete genome sequence published, in 1996. After the publication of the roundworm ''Caenorhabditis elegans'' genome in 1998 and together with the fruit fly ''Drosophila melanogaster'' genome in 2000, Gerald M. Rubin and his team published a paper titled "Comparative Genomics of the Eukaryotes", in which they compared the genomes of the eukaryotes ''D. melanogaster'', ''C. elegans'', and ''S. cerevisiae'', as well as the prokaryote ''H. influenzae''. At the same time, Bonnie Berger, Eric Lander, and their team published a paper on whole-genome comparison of human and mouse.

With the publication of the large genomes of vertebrates in the 2000s, including human, the Japanese pufferfish ''Takifugu rubripes'', and mouse, precomputed results of large genome comparisons have been released for downloading or for visualization in a genome browser. Instead of undertaking their own analyses, most biologists can access these large cross-species comparisons and avoid the impracticality caused by the size of the genomes.

Next-generation sequencing methods, which were first introduced in 2007, have produced an enormous amount of genomic data and have allowed researchers to generate multiple (prokaryotic) draft genome sequences at once. These methods can also quickly uncover single-nucleotide polymorphisms, insertions and deletions by mapping unassembled reads against a well-annotated reference genome, and thus provide a list of possible gene differences that may be the basis for any functional variation among strains.


Evolutionary principles

Evolution is a unifying theme of biology, and evolutionary theory is the theoretical foundation of comparative genomics; in turn, the results of comparative genomics have enriched and developed the theory of evolution. When two or more genome sequences are compared, one can deduce the evolutionary relationships of the sequences and represent them in a phylogenetic tree. Based on a variety of biological genome data and the study of vertical and horizontal evolutionary processes, one can understand vital parts of gene structure and regulatory function.

Similarity of related genomes is the basis of comparative genomics. If two creatures have a recent common ancestor, the differences between the two species' genomes have evolved from the ancestor's genome. The closer the relationship between two organisms, the higher the similarity between their genomes. If the relationship is close, their genomes will display a linear behaviour (synteny), that is, some or all of the genetic sequences are conserved. Thus, the genome sequences can be used to identify gene function, by analyzing their homology (sequence similarity) to genes of known function.

Orthologous sequences are related sequences in different species: a gene exists in an ancestral species, that species diverges into two species, and the copies of the gene in the new species are orthologous to each other and to the sequence in the ancestral species. Paralogous sequences are separated by gene duplication: if a particular gene in the genome is copied, the two copies are paralogous to each other. A pair of orthologous sequences is called an orthologous pair (orthologs); a pair of paralogous sequences is called a collateral pair (paralogs). Orthologous pairs usually have the same or similar function, which is not necessarily the case for collateral pairs, whose sequences tend to evolve different functions.

Comparative genomics exploits both similarities and differences in the proteins, RNA, and regulatory regions of different organisms to infer how selection has acted upon these elements. Those elements that are responsible for similarities between different species should be conserved through time (stabilizing selection), while those elements responsible for differences among species should be divergent (positive selection). Finally, those elements that are unimportant to the evolutionary success of the organism will be unconserved (selection is neutral).

One of the important goals of the field is the identification of the mechanisms of eukaryotic genome evolution. This is, however, often complicated by the multiplicity of events that have taken place throughout the history of individual lineages, leaving only distorted and superimposed traces in the genome of each living organism. For this reason, comparative genomics studies of small model organisms (for example the model ''Caenorhabditis elegans'' and the closely related ''Caenorhabditis briggsae'') are of great importance in advancing our understanding of general mechanisms of evolution.
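The distinction between orthologs and paralogs drawn in this section is often approximated in practice with a "reciprocal best hit" heuristic: two genes from different species are treated as candidate orthologs when each is the other's highest-scoring match. The article does not name this heuristic, so the following is only a minimal sketch of one common approach; the function name, the gene labels, and the similarity scores are hypothetical and chosen for illustration.

```python
def reciprocal_best_hits(scores):
    """Return candidate ortholog pairs from pairwise similarity scores.

    scores maps (gene_in_species_a, gene_in_species_b) -> similarity.
    A pair is reported only if each gene is the other's best match.
    """
    best_for_a = {}  # gene in A -> its best-scoring partner in B
    best_for_b = {}  # gene in B -> its best-scoring partner in A
    for (a, b), s in scores.items():
        if a not in best_for_a or s > scores[(a, best_for_a[a])]:
            best_for_a[a] = b
        if b not in best_for_b or s > scores[(best_for_b[b], b)]:
            best_for_b[b] = a
    return sorted((a, b) for a, b in best_for_a.items() if best_for_b[b] == a)


# Hypothetical scores: A3 matches B2, but B2's best partner is A2,
# so A3 is left unpaired (it may be a paralog or a diverged copy).
example_scores = {
    ("A1", "B1"): 95,
    ("A1", "B2"): 60,
    ("A2", "B2"): 88,
    ("A3", "B2"): 70,
}
pairs = reciprocal_best_hits(example_scores)
```

Because gene duplication after speciation can leave several equally valid partners, a heuristic of this kind can miss true orthologs; it is a starting point rather than a definition.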


Methods

Computational approaches are necessary for genome comparisons, given the large amount of data encoded in genomes. Many tools are now publicly available, ranging from whole genome comparisons to gene expression analysis. These include approaches from systems and control, information theory, string analysis and data mining. Computational approaches will remain critical for research and teaching, especially when information science and genome biology are taught in conjunction.

Comparative genomics starts with basic comparisons of genome size and gene density. For instance, genome size is important for coding capacity and possibly for regulatory reasons. High gene density facilitates genome annotation and the analysis of environmental selection. By contrast, low gene density hampers the mapping of genetic disease, as in the human genome.
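The basic comparison of genome size and gene density described above can be sketched as follows. This is an illustration under stated assumptions, not a standard tool: the function name is hypothetical, gene coordinates are assumed to be half-open `(start, end)` intervals, and the numbers in the usage lines are made up rather than taken from any real genome.

```python
def genome_summary(genome_size_bp, genes):
    """Summarise gene density for one genome.

    genes is a list of (start, end) half-open coordinates.
    Overlapping genes are merged so shared bases are counted once.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(genes):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return {
        "genes_per_mb": len(genes) / (genome_size_bp / 1_000_000),
        "genic_fraction": covered / genome_size_bp,
    }


# Made-up example: three genes, two of them overlapping, on a 1 Mb genome.
summary = genome_summary(1_000_000, [(0, 1000), (500, 1500), (10_000, 12_000)])
```

Computing the same two numbers for each genome in a study gives a quick first comparison before any alignment is attempted.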


Sequence alignment

Alignments are used to capture information about similar sequences, such as ancestry, common evolutionary descent, or common structure and function. Alignments can be done for both genetic and protein sequences. Alignments consist of local or global pairwise alignments, and multiple sequence alignments. One way to find global alignments is to use a dynamic programming algorithm known as the Needleman-Wunsch algorithm. This algorithm can be modified and used to find local alignments.
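The global alignment approach mentioned above can be sketched in a few lines. This is a minimal illustration of the Needleman-Wunsch recurrence with a simple linear gap penalty; the scoring values (`match=1`, `mismatch=-1`, `gap=-1`) and the function name are assumptions made for the example, and real tools use substitution matrices and more elaborate gap models.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


result = needleman_wunsch("ACGT", "AGT")
```

The local variant alluded to in the text (usually attributed to Smith and Waterman) follows from two changes: cell scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell instead of the corner.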


Phylogenetic reconstruction

Another computational method for comparative genomics is phylogenetic reconstruction. It is used to describe evolutionary relationships in terms of common ancestors. The relationships are usually represented in a tree called a phylogenetic tree. Similarly, coalescent theory is a retrospective model that traces the alleles of a gene in a population back to a single ancestral copy shared by members of the population. This ancestral copy is also known as the most recent common ancestor. Analysis based on coalescent theory tries to predict the amount of time between the introduction of a mutation and a particular allele or gene distribution in a population. This time period is equal to how long ago the most recent common ancestor existed. The inheritance relationships are visualized in a form similar to a phylogenetic tree. Coalescence (or the gene genealogy) can be visualized using dendrograms.
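As a rough illustration of the time estimate described above, the sketch below computes the expected time back to the most recent common ancestor under the simplest textbook coalescent. The assumptions are not from the article and should be read as such: a diploid population of constant effective size N, no recombination, no selection, and no population structure. The function and parameter names are hypothetical.

```python
def expected_tmrca(n_copies, diploid_size):
    """Expected generations back to the MRCA of n_copies sampled gene copies.

    Assumes the standard neutral coalescent for a diploid population of
    constant effective size diploid_size (an assumption of this sketch).
    """
    if n_copies < 2:
        raise ValueError("need at least two sampled copies")
    total = 0.0
    for k in range(n_copies, 1, -1):
        # While k lineages remain, the expected wait until any two of them
        # coalesce is 4N / (k * (k - 1)) generations.
        total += 4 * diploid_size / (k * (k - 1))
    return total


two_copies = expected_tmrca(2, 1000)
ten_copies = expected_tmrca(10, 1000)
```

Under these assumptions the sum simplifies to 4N(1 - 1/n), so adding more sampled copies increases the expected time only slightly: most of the waiting happens while the last two lineages remain.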


Genome maps

An additional method in comparative genomics is genetic mapping. In genetic mapping, visualizing synteny is one way to see the preserved order of genes on chromosomes. It is usually used for chromosomes of related species that descend from a common ancestor. This and other methods can shed light on evolutionary history. A recent study used comparative genomics to reconstruct 16 ancestral karyotypes across the mammalian phylogeny. The computational reconstruction showed how chromosomes rearranged themselves during mammal evolution. It gave insight into the conservation of select regions often associated with the control of developmental processes. In addition, it helped to provide an understanding of chromosome evolution and of genetic diseases associated with DNA rearrangements.
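The idea of preserved gene order can be made concrete with a small sketch that finds runs of genes whose orthologs sit next to each other in a second genome. It is a simplified illustration, not the method of the study mentioned above: the function name, the gene labels, and the assumption of a precomputed one-to-one ortholog table are all hypothetical, and genes without an ortholog are simply skipped.

```python
def synteny_blocks(order_a, order_b, orthologs, min_genes=2):
    """Find runs of genes from genome A whose orthologs are consecutive in B.

    order_a, order_b: gene names in chromosomal order.
    orthologs: dict mapping a gene in A to its ortholog in B.
    Returns a list of (genes_from_a, orientation) tuples.
    """
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    # Genes of A that have a placed ortholog, paired with its position in B.
    anchors = [(g, pos_b[orthologs[g]]) for g in order_a
               if orthologs.get(g) in pos_b]
    blocks = []
    current = []      # genes of A in the block being extended
    last_pos = None   # position in B of the previous anchor
    step = 0          # +1 same orientation, -1 inverted, 0 not yet known
    for gene, pos in anchors:
        delta = pos - last_pos if current else None
        if current and delta in (1, -1) and step in (0, delta):
            current.append(gene)
            step = delta
        else:
            if len(current) >= min_genes:
                blocks.append((current, "forward" if step == 1 else "inverted"))
            current, step = [gene], 0
        last_pos = pos
    if len(current) >= min_genes:
        blocks.append((current, "forward" if step == 1 else "inverted"))
    return blocks


# Toy genomes: the last three genes are inverted in genome B.
genome_a = ["a1", "a2", "a3", "a4", "a5", "a6"]
genome_b = ["b1", "b2", "b3", "b6", "b5", "b4"]
table = {f"a{i}": f"b{i}" for i in range(1, 7)}
blocks = synteny_blocks(genome_a, genome_b, table)
```

Breaks between blocks mark candidate rearrangement points, such as the inversion in the toy example; plotting the anchor positions of one genome against the other gives the familiar dot-plot view of synteny.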


Tools

Computational tools for analyzing sequences and complete genomes are developing quickly due to the availability of large amounts of genomic data. At the same time, comparative analysis tools are being developed and improved. One of the main challenges in these analyses is visualizing the comparative results.

Visualization of sequence conservation is a difficult task in comparative sequence analysis, because it is highly inefficient to examine the alignment of long genomic regions manually. Internet-based genome browsers provide many useful tools for investigating genomic sequences because they integrate all sequence-based biological information on genomic regions. For extracting large amounts of relevant biological data, they are easy to use and save time.

* UCSC Browser: This site contains the reference sequence and working draft assemblies for a large collection of genomes.
* Ensembl: The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.
* MapView: The Map Viewer provides a wide variety of genome mapping and sequencing data.
* VISTA: a comprehensive suite of programs and databases for comparative analysis of genomic sequences. It was built to visualize the results of comparative analysis based on DNA alignments. The presentation of comparative data generated by VISTA suits both small-scale and large-scale data.
* BlueJay Genome Browser: a stand-alone visualization tool for the multi-scale viewing of annotated genomes and other genomic elements.

An advantage of using online tools is that these websites are developed and updated constantly. New settings and content become available online and can be used to improve efficiency.


Selected applications


Agriculture

Agriculture is a field that reaps the benefits of comparative genomics. Identifying the loci of advantageous genes is a key step in breeding crops that are optimized for greater yield, cost-efficiency, quality, and disease resistance. For example, one genome-wide association study conducted on 517 rice landraces revealed 80 loci associated with several categories of agronomic performance, such as grain weight, amylose content, and drought tolerance. Many of the loci were previously uncharacterized. Not only is this methodology powerful, it is also quick. Previous methods of identifying loci associated with agronomic performance required several generations of carefully monitored breeding of parent strains, a time-consuming effort that is unnecessary for comparative genomic studies.


Medicine


Vaccine development

The medical field also benefits from the study of comparative genomics. In an approach known as reverse vaccinology, researchers can discover candidate antigens for vaccine development by analyzing the genome of a pathogen or a family of pathogens. Applying a comparative genomics approach by analyzing the genomes of several related pathogens can lead to the development of vaccines that are multiprotective. A team of researchers employed such an approach to create a universal vaccine for Group B Streptococcus, a group of bacteria responsible for severe neonatal infection. Comparative genomics can also be used to generate specificity for vaccines against pathogens that are closely related to commensal microorganisms. For example, researchers used comparative genomic analysis of commensal and pathogenic strains of ''E. coli'' to identify pathogen-specific genes as a basis for finding antigens that result in immune response against pathogenic strains but not commensal ones. In May 2019, using the Global Genome Set, a team in the UK and Australia sequenced thousands of globally collected isolates of Group A Streptococcus, providing potential targets for developing a vaccine against the pathogen, also known as ''S. pyogenes''.
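The idea of pathogen-specific genes can be illustrated with a minimal Python sketch. It assumes each strain has already been reduced to a set of gene-family labels (real studies derive these from ortholog clustering, which is not shown), and it keeps only the families present in every pathogenic strain and absent from every commensal strain. The function name, strain names, and gene labels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def pathogen_specific_genes(
    pathogenic: Dict[str, Set[str]],
    commensal: Dict[str, Set[str]],
) -> Set[str]:
    """Gene families found in every pathogenic strain and in no commensal strain."""
    if not pathogenic:
        return set()
    # Core genome of the pathogenic group: intersection across all strains.
    core = set.intersection(*pathogenic.values())
    # Everything observed in at least one commensal strain.
    commensal_pool = set().union(*commensal.values())
    return core - commensal_pool


# Toy data: strain names and gene labels are made up for illustration.
pathogenic_strains = {
    "path_A": {"g1", "g2", "g3", "g5"},
    "path_B": {"g1", "g3", "g4", "g5"},
}
commensal_strains = {
    "comm_A": {"g1", "g2"},
    "comm_B": {"g1", "g4"},
}

candidates = pathogen_specific_genes(pathogenic_strains, commensal_strains)
print(sorted(candidates))
```

Requiring presence in all pathogenic strains is a deliberately strict choice. A real analysis would likely tolerate missing genes in some strains and would further filter candidates, for example for surface exposure, before treating them as antigens.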


Mouse models in immunology

T cells (also known as T lymphocytes) are immune cells that develop from stem cells in the bone marrow. They help defend the body against infection and may aid in the fight against cancer. Because of their morphological, physiological, and genetic resemblance to humans, mice and rats have long been the preferred animal models for biomedical research. Comparative medicine research is built on the ability to use information from one species to understand the same processes in another. Comparing human and mouse T cells, and their effects on the immune system, with comparative genomics can yield new insights into molecular pathways.

To better understand T cell receptors (TCRs) and their genes, Glusman studied the sequences of the human and mouse TCR loci. TCR genes are well characterized and serve as a significant resource for functional genomics and for understanding how genes and intergenic regions of the genome contribute to biological processes. TCRs are the means by which the cellular immune system recognizes pathogens. One of the reasons for sequencing the human and mouse TCR loci was to align the orthologous gene family sequences and discover conserved regions using comparative genomics. These, it was thought, would reflect two sorts of biological information: (1) exons and (2) regulatory sequences. In fact, the majority of V, D, J, and C exons could be identified by this method. The variable regions are encoded by multiple unique DNA elements that are rearranged and joined during T cell differentiation: variable (V), diversity (D), and joining (J) elements for the β and δ polypeptides, and V and J elements for the α and γ polypeptides (Figure 1).

However, several short conserved noncoding blocks were also found. Both human and mouse motifs are largely clustered within a 200 bp region (Figure 2). The known 3′ enhancers in the TCR loci were identified, and a conserved region of 100 bp in a mouse J intron was subsequently shown to have a regulatory function. Comparisons of the genomic sequences within each locus (the physical location of a specific gene on a chromosome) and across species allow for research on other mechanisms and other regulatory signals. Some of these comparisons suggest new hypotheses about the evolution of TCRs, to be tested (and refined) by comparison with the TCR gene complement of other vertebrate species. Comparative genomic investigation of humans and mice should likewise allow the discovery and annotation of many other genes, as well as the identification of regulatory sequences in other species.
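As a rough illustration of how conserved regions can be located once two orthologous sequences have been aligned, the following Python sketch slides a fixed-size window along a pairwise alignment and reports intervals whose fraction of identical columns meets a threshold. It is not the method used in the TCR study. The function name, window size, threshold, and toy sequences are assumptions made for the example, and the alignment itself is taken as given.

```python
from typing import List, Tuple


def conserved_windows(
    seq_a: str, seq_b: str, window: int = 10, min_identity: float = 0.9
) -> List[Tuple[int, int]]:
    """Return merged half-open (start, end) alignment intervals whose
    windowed identity is at least min_identity."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # 1 where both columns hold the same non-gap base, else 0.
    matches = [
        int(a == b and a != "-") for a, b in zip(seq_a.upper(), seq_b.upper())
    ]
    blocks: List[Tuple[int, int]] = []
    for start in range(len(matches) - window + 1):
        identity = sum(matches[start:start + window]) / window
        if identity >= min_identity:
            end = start + window
            if blocks and start <= blocks[-1][1]:
                # Overlaps or touches the previous block: extend it.
                blocks[-1] = (blocks[-1][0], end)
            else:
                blocks.append((start, end))
    return blocks


# Toy pre-aligned sequences (hypothetical, not real TCR data).
species_1 = "ACGTACGGTTAACCGGTTAA"
species_2 = "ACGTACGACCTTAAGGTTAA"
print(conserved_windows(species_1, species_2, window=5, min_identity=1.0))
```

Conserved intervals that overlap annotated exons would be expected. Those that fall in noncoding sequence are the candidates for regulatory elements described above.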


Research

Comparative genomics also opens up new avenues in other areas of research. As DNA sequencing technology has become more accessible, the number of sequenced genomes has grown. With the increasing reservoir of available genomic data, the potency of comparative genomic inference has grown as well. A notable case of this increased potency is found in recent primate research. Comparative genomic methods have allowed researchers to gather information about genetic variation, differential gene expression, and evolutionary dynamics in primates that were indiscernible using previous data and methods.


Great Ape Genome Project

The Great Ape Genome Project used comparative genomic methods to investigate genetic variation across the six great ape species, finding healthy levels of variation in their gene pool despite shrinking population size. Another study showed that patterns of DNA methylation, a known mechanism for regulating gene expression, differ in the prefrontal cortex of humans versus chimpanzees, and implicated this difference in the evolutionary divergence of the two species.


See also

* Data mining
* Molecular evolution
* Comparative anatomy
* Homology
* Sequence mining
* Alignment-free sequence analysis


External links


* Genomes OnLine Database (GOLD)
* Genome News Network
* JCVI Comprehensive Microbial Resource
* Pathema: A Clade Specific Bioinformatics Resource Center
* CBS Genome Atlas Database
* The UCSC Genome Browser
* The U.S. National Human Genome Research Institute
* Ensembl: The Ensembl Genome Browser
* Genolevures, comparative genomics of the Hemiascomycetous yeasts
* Phylogenetically Inferred Groups (PhIGs), a recently developed method that incorporates phylogenetic signals in building gene clusters for use in comparative genomics
* Metazome, a resource for the phylogenomic exploration and analysis of Metazoan gene families
* IMG: The Integrated Microbial Genomes system, for comparative genome analysis by the DOE-JGI
* Dcode.org: Dcode.org Comparative Genomics Center
* SUPERFAMILY: Protein annotations for all completely sequenced organisms
* Comparative Genomics
* Blastology and Open Source: Needs and Deeds
* Alignment-free comparative Genomics tool