OrthoDB

OrthoDB presents a catalog of orthologous protein-coding genes across vertebrates, arthropods, fungi, plants, and bacteria. Orthology refers to the last common ancestor of the species under consideration, so OrthoDB explicitly delineates orthologs at each major radiation along the species phylogeny. The database presents available protein descriptors together with Gene Ontology and InterPro attributes. These provide general descriptive annotations of the orthologous groups and support comprehensive querying of the orthology database. OrthoDB also provides computed evolutionary traits of orthologs, such as gene duplicability and loss profiles, divergence rates, sibling groups, and gene intron-exon architectures.

In comparative genomics, the importance of scale cannot be overstated. Gene orthology delineation requires specific expertise and considerable computational resources, so individual non-specialist research groups cannot carry it out at scale on their own. OrthoDB takes on this task with comprehensive sets of species and several distinctive features, such as extensive functional and evolutionary annotations of orthologous groups and links to many other leading databases that capture information about gene function. Because no genome is a useful data source without extensive comparative analyses against other genomes, OrthoDB provides a critically important resource for comparative genomics. It serves the entire community of researchers, from those interested in grand evolutionary questions to those focused on the specific biological functions of individual genes.


Methodology

Orthology is defined relative to the last common ancestor of the species being considered, which gives orthologous classifications their hierarchical nature. OrthoDB addresses this explicitly by applying its orthology delineation procedure at each major radiation point of the phylogeny under consideration. The implementation employs a Best-Reciprocal-Hit (BRH) clustering algorithm based on all-against-all Smith–Waterman protein sequence comparisons. Gene set pre-processing selects the longest protein-coding transcript of each alternatively spliced gene, and the longest among very similar gene copies. The procedure triangulates BRHs to build the clusters progressively, and requires an overall minimum sequence alignment overlap to avoid "domain walking", where proteins that share only different individual domains would be chained into one cluster. These core clusters are then expanded to include all more closely related within-species in-paralogs, as well as the previously identified very similar gene copies.
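The clustering steps described above can be sketched in Python. This is a toy illustration under stated assumptions, not OrthoDB's implementation: `smith_waterman` uses an identity scoring scheme with a linear gap penalty rather than a protein substitution matrix, the 50% `min_overlap` cutoff is hypothetical, growing a BRH triangle by adding any gene that is a BRH of at least two members is a simplified stand-in for "triangulation", and `add_inparalogs` uses an assumed criterion (a same-species gene joins if it scores higher against a member than that member scores against any other-species member). All function names, gene IDs, and sequences are hypothetical.

```python
from itertools import combinations

def longest_isoforms(transcripts):
    """Keep the longest protein of each gene -> {gene_id: (species, sequence)}."""
    genes = {}
    for gene, species, seq in transcripts:
        if gene not in genes or len(seq) > len(genes[gene][1]):
            genes[gene] = (species, seq)
    return genes

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local score, residues of a covered, residues of b covered)."""
    # Toy identity scoring with a linear gap (assumption; not BLOSUM62/affine gaps).
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    # start[i][j]: cell where the local alignment ending at (i, j) began
    start = [[(i, j) for j in range(cols)] for i in range(rows)]
    best = (0, 0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score, origin = max((H[i - 1][j - 1] + s, start[i - 1][j - 1]),
                                (H[i - 1][j] + gap, start[i - 1][j]),
                                (H[i][j - 1] + gap, start[i][j - 1]),
                                key=lambda t: t[0])
            if score > 0:  # otherwise H stays 0 and an alignment may restart here
                H[i][j], start[i][j] = score, origin
                if score > best[0]:
                    best = (score, i - origin[0], j - origin[1])
    return best

def pairwise_scores(genes, min_overlap=0.5):
    """Cross-species scores, kept only if coverage >= min_overlap on both proteins."""
    scores = {}
    for g, h in combinations(genes, 2):
        (sp_g, seq_g), (sp_h, seq_h) = genes[g], genes[h]
        if sp_g == sp_h:
            continue
        score, len_g, len_h = smith_waterman(seq_g, seq_h)
        if min(len_g / len(seq_g), len_h / len(seq_h)) >= min_overlap:
            scores[g, h] = scores[h, g] = score
    return scores

def reciprocal_best_hits(genes, scores):
    """Pairs {g, h} where h is g's best hit in h's species and vice versa."""
    best = {}  # (gene, other species) -> (score, best-hit gene)
    for (g, h), s in scores.items():
        key = (g, genes[h][0])
        if s > best.get(key, (0, None))[0]:
            best[key] = (s, h)
    return {frozenset((g, h)) for (g, _sp), (_s, h) in best.items()
            if best.get((h, genes[g][0]), (0, None))[1] == g}

def triangulate(genes, brh):
    """Seed clusters with BRH triangles; add genes that are BRHs of >= 2 members."""
    clusters, clustered = [], set()
    for trio in combinations(sorted(genes), 3):
        if clustered & set(trio) or not all(
                frozenset(p) in brh for p in combinations(trio, 2)):
            continue
        cluster, grew = set(trio), True
        while grew:
            grew = False
            for g in sorted(genes):
                if g not in cluster and g not in clustered and sum(
                        frozenset((g, m)) in brh for m in cluster) >= 2:
                    cluster.add(g)
                    grew = True
        clusters.append(cluster)
        clustered |= cluster
    return clusters

def add_inparalogs(genes, cluster, scores):
    """Add same-species genes closer to a member than it is to other-species members."""
    expanded = set(cluster)
    for m in cluster:
        sp_m, seq_m = genes[m]
        threshold = max((scores.get((m, o), 0) for o in cluster
                         if genes[o][0] != sp_m), default=0)
        for x, (sp_x, seq_x) in genes.items():
            if (x not in cluster and sp_x == sp_m
                    and smith_waterman(seq_x, seq_m)[0] > threshold):
                expanded.add(x)
    return expanded

transcripts = [("H1", "human", "MKTAYIAKQR"), ("H1", "human", "MKTAYIA"),
               ("H1b", "human", "MKTAYIAKQK"), ("M1", "mouse", "MKTGYIAKNR"),
               ("F1", "fly", "MRTGYIAKNR")]
genes = longest_isoforms(transcripts)
scores = pairwise_scores(genes)
brh = reciprocal_best_hits(genes, scores)
clusters = [add_inparalogs(genes, c, scores) for c in triangulate(genes, brh)]
print([sorted(c) for c in clusters])  # [['F1', 'H1', 'H1b', 'M1']]
```

With these toy sequences the shorter H1 isoform is discarded during pre-processing, and H1, M1 and F1 form a BRH triangle. The recent human duplicate H1b is not a BRH, because mouse M1 scores higher against H1, so it enters the cluster only afterwards as an in-paralog.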


Data content

The database contains some 600 eukaryotic species and more than 3600 bacteria sourced from Ensembl, UniProt, NCBI, FlyBase, and several other databases. The ever-increasing sampling of sequenced genomes gives a clearer account of most gene genealogies, which in turn supports better-informed hypotheses of gene function in newly sequenced genomes. Examples of studies that have employed data from OrthoDB include comparative analyses of gene repertoire evolution, comparisons of fruit fly and mosquito developmental genes, analyses of bloodmeal- or infection-induced changes in gene expression in mosquitoes, analysis of the evolution of mammalian milk production, and studies of mosquito gene and genome evolution. Other studies citing OrthoDB can be found at PubMed and Google Scholar.


Performance

OrthoDB has performed consistently well in benchmarking assessments alongside other orthology delineation procedures, in which results were compared with reference trees for three well-conserved protein families and with a larger set of curated protein families.


BUSCO

Benchmarking sets of Universal Single-Copy Orthologs (BUSCOs) are orthologous groups selected from OrthoDB for the root-level classifications of arthropods, vertebrates, metazoans, fungi, and other major clades. Groups are required to contain single-copy orthologs in at least 90% of the species (in the others the gene may be lost or duplicated), and the missing species cannot all be from the same clade. Species with frequent losses or duplications are removed from the selection unless they hold a key position in the phylogeny. BUSCOs are therefore expected to be found as single-copy orthologs in any newly sequenced genome from the appropriate phylogenetic clade, and can be used to assess the relative completeness of newly sequenced genomes. The BUSCO assessment tool and datasets are widely used in genomics projects, with most journal editors now requiring such quality assessments before accepting new genome publications.
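The BUSCO selection criteria can be expressed as a small filter. This is a sketch under assumptions: an orthologous group is summarized as a per-species copy count, "missing" means a copy count of zero, and the clade rule is read as applying only when two or more species are missing. The clade labels and function names are hypothetical. `completeness` shows one simple way to summarize a new assembly as percentages of complete (single-copy plus duplicated) and missing BUSCOs, in the spirit of BUSCO's C/S/D/M summary. The real tool also reports fragmented genes, which this sketch omits.

```python
def is_busco_candidate(copies, clade_of, min_single_copy=0.9):
    """copies: {species: gene copies in one orthologous group};
    clade_of: {species: clade label} (hypothetical labels)."""
    single = sum(1 for n in copies.values() if n == 1)
    if single / len(copies) < min_single_copy:
        return False  # too many species with losses or duplications
    missing = [sp for sp, n in copies.items() if n == 0]
    # Assumed reading: reject if >= 2 species are missing and all share one clade,
    # which would hint at a clade-specific loss rather than scattered gaps.
    return len(missing) < 2 or len({clade_of[sp] for sp in missing}) > 1


def completeness(found):
    """found: {busco_id: copies detected in a new assembly} -> percentages."""
    total = len(found)
    single = sum(1 for n in found.values() if n == 1)
    duplicated = sum(1 for n in found.values() if n > 1)
    counts = {"C": single + duplicated, "S": single, "D": duplicated,
              "M": total - single - duplicated}
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


# Hypothetical example: 20 species split into two clades of 10.
species = [f"sp{i}" for i in range(20)]
clade_of = {sp: "Diptera" if i < 10 else "Hymenoptera" for i, sp in enumerate(species)}
group_a = {**{sp: 1 for sp in species}, "sp0": 0, "sp1": 0}   # both losses in Diptera
group_b = {**{sp: 1 for sp in species}, "sp0": 0, "sp15": 0}  # losses in two clades
print(is_busco_candidate(group_a, clade_of))  # False
print(is_busco_candidate(group_b, clade_of))  # True
print(completeness({"b1": 1, "b2": 1, "b3": 2, "b4": 0}))
# {'C': 75.0, 'S': 50.0, 'D': 25.0, 'M': 25.0}
```

Both example groups meet the 90% single-copy threshold exactly (18 of 20 species). Only the clade rule separates them.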



See also

* Homology (biology)
* Phylogeny
* List of biological databases


External links

* Official website: http://www.orthodb.org