The Generic Model Organism Database (GMOD) project provides biological research communities with a toolkit of open-source software components for visualizing, annotating, managing, and storing biological data. The GMOD project is funded by the United States National Institutes of Health, National Science Foundation, and the USDA Agricultural Research Service.
History
The GMOD project was started in the early 2000s as a collaboration between several model organism databases (MODs) that shared a need to create similar software tools for processing data from sequencing projects. MODs, or organism-specific databases, describe genome and other information about important experimental organisms in the life sciences and capture the large volumes of data and information being generated by modern biology. Rather than each group designing its own software, four major MODs (FlyBase, Saccharomyces Genome Database, Mouse Genome Database, and WormBase) worked together to create applications that provide functionality needed by all MODs, such as software to help manage the data within the MOD and to help users access and query the data.
The GMOD project works to keep software components interoperable. To this end, many of the tools use a common input/output file format or run off a Chado schema database.
Chado database schema
The Chado schema aims to cover many of the classes of data frequently used by modern biologists, from genetic data to phylogenetic trees to publications to organisms to microarray data to IDs to RNA/protein expression. Chado makes extensive use of controlled vocabularies to type all entities in the database; for example, genes, transcripts, exons, transposable elements, etc., are stored in a feature table, with the type provided by the Sequence Ontology. When a new type is added to the Sequence Ontology, the feature table requires no modification, only an update of the data in the database. The same is largely true of analysis data that can be stored in Chado as well.
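The typing approach described above can be illustrated with a minimal sketch. It assumes a heavily simplified stand-in for Chado's cv, cvterm, and feature tables, built in an in-memory SQLite database rather than the PostgreSQL database Chado normally runs on. The column subset, the helper functions, the vocabulary name "sequence", and the feature identifiers are illustrative assumptions, not the full Chado definition.

```python
import sqlite3

# Simplified stand-ins for Chado's cv, cvterm, and feature tables.
# Real Chado tables carry more columns and constraints; only the parts
# needed to show typing by controlled vocabulary are kept here.
SCHEMA = """
CREATE TABLE cv (
    cv_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE
);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id     INTEGER NOT NULL REFERENCES cv(cv_id),
    name      TEXT NOT NULL,
    UNIQUE (cv_id, name)
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
"""


def add_term(conn, cv_name, term_name):
    """Register a vocabulary term: a new row, not a new table or column."""
    conn.execute("INSERT OR IGNORE INTO cv (name) VALUES (?)", (cv_name,))
    cv_id = conn.execute(
        "SELECT cv_id FROM cv WHERE name = ?", (cv_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO cvterm (cv_id, name) VALUES (?, ?)",
        (cv_id, term_name),
    )
    return conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE cv_id = ? AND name = ?",
        (cv_id, term_name),
    ).fetchone()[0]


def add_feature(conn, uniquename, type_name, cv_name="sequence"):
    """Store a feature whose type is a term in the given vocabulary."""
    type_id = add_term(conn, cv_name, type_name)
    conn.execute(
        "INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
        (uniquename, type_id),
    )


def features_by_type(conn):
    """Return (type name, feature name) pairs in insertion order."""
    rows = conn.execute(
        "SELECT t.name, f.uniquename FROM feature f "
        "JOIN cvterm t ON t.cvterm_id = f.type_id ORDER BY f.feature_id"
    )
    return [tuple(row) for row in rows]


def table_columns(conn, table):
    """List the column names of a table."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical feature identifiers, typed by vocabulary terms.
add_feature(conn, "gene0001", "gene")
add_feature(conn, "exon0001", "exon")

columns_before = table_columns(conn, "feature")
# A type not seen before only adds a cvterm row; the feature table is unchanged.
add_feature(conn, "te0001", "transposable_element")
columns_after = table_columns(conn, "feature")
```

In this sketch, storing a feature of a previously unseen type changes only the data in cvterm, so `columns_before` and `columns_after` are identical. A design with one table per feature type would instead need a schema change for every new type.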
The existing core modules of Chado are:
*sequence - for sequences/features
*cv - for controlled-vocabs/ontologies
*general - currently just dbxrefs
*organism - taxonomic data
*pub - publication and references
*companalysis - augments sequence module with computational analysis data
*map - non-sequence maps
*genetic - genetic and phenotypic data
*expression - gene expression
*natural diversity - population data
Software
The full list of GMOD software components is found on the GMOD Components page. These components include:
Participating databases
The following organism databases are contributing to and/or adopting GMOD components for model organism databases.
Related projects
*BioPerl, BioJava, Biopython, BioRuby, etc.
*Ensembl
*Gene Ontology
*DAS
*Genomics Unified Schema
*Manatee: Manual Annotation Tool
*Biocurator.org
*Open Biomedical Ontologies
*Sequence Ontology Project
See also
*Biological database
*Genome project
*Genomics
*Genome
External links
GMOD website