Bioinformatics Databanks

Biological databases are libraries of information from the biological sciences, collected from scientific experiments, published literature, high-throughput experimental technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), and the clinical effects of mutations, as well as similarities between biological sequences and structures.

Biological databases can be classified by the kind of data they collect (see below). Broadly, there are molecular databases (for sequences, molecules, etc.), functional databases (for physiology, enzyme activities, phenotypes, ecology, etc.), taxonomic databases (for species and other taxonomic ranks), databases of images and other media, and specimen databases (for museum collections, etc.).

Databases are important tools that help scientists analyze and explain a host of biological phenomena, from the structure of biomolecules and their interactions, to the whole metabolism of organisms, to the evolution of species. This knowledge supports the fight against diseases, the development of medications, the prediction of certain genetic diseases, and the discovery of basic relationships among species in the history of life.


Technical basis and theoretical concepts

Relational database concepts from computer science and information retrieval concepts from digital libraries are important for understanding biological databases. Biological database design, development, and long-term management are a core area of the discipline of bioinformatics. Data contents include gene sequences, textual descriptions, attributes and ontology classifications, citations, and tabular data. These are often described as semi-structured data, and can be represented as tables, key-delimited records, and XML structures.
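To make the three representations concrete, the following minimal Python sketch takes one record written as key-delimited text and renders it as a table row and as an XML element. The record layout, the field names (`ID`, `ORGANISM`, `SEQUENCE`), and the values are invented for illustration; they do not follow the flat-file or XML schema of any particular database.

```python
import xml.etree.ElementTree as ET

# Hypothetical record in a key-delimited style.
# The keys and values are invented for illustration only.
RECORD_TEXT = """\
ID: X00001
ORGANISM: Escherichia coli
SEQUENCE: ATGGCTAAAGGT
"""


def parse_key_delimited(text):
    """Parse 'KEY: value' lines into a dict, skipping blank lines."""
    record = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip()
    return record


def to_table_row(record, columns):
    """Return the record as a tuple ordered by the given column names."""
    return tuple(record.get(col, "") for col in columns)


def to_xml(record, root_tag="entry"):
    """Serialise the record as an XML element with lower-cased child tags."""
    root = ET.Element(root_tag)
    for key, value in record.items():
        child = ET.SubElement(root, key.lower())
        child.text = value
    return ET.tostring(root, encoding="unicode")


record = parse_key_delimited(RECORD_TEXT)
row = to_table_row(record, ["ID", "ORGANISM", "SEQUENCE"])
xml_text = to_xml(record)

print(row)
print(xml_text)
```

The same information survives all three forms. What changes is how strictly the structure is enforced: a table fixes the columns in advance, while key-delimited and XML records can carry optional or repeated fields, which is one reason such data is called semi-structured.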


Access

Most biological databases are available through web sites that organise the data so that users can browse it online. In addition, the underlying data is usually available for download in a variety of formats. These formats include text, sequence data, protein structures, and links, each of which is available from particular sources, for example:

* Text formats are provided by PubMed and OMIM.
* Sequence data is provided by GenBank (DNA) and UniProt (protein).
* Protein structures are provided by PDB, SCOP, and CATH.


Problems and challenges

Biological knowledge is distributed among countless databases. This sometimes makes it difficult to ensure the consistency of information, e.g. when different names are used for the same species or when data formats differ. As a consequence, interoperability is a constant challenge for information exchange. For instance, if a DNA sequence database stores a DNA sequence along with the name of a species, a name change for that species may break the links to other databases that use a different name. Integrative bioinformatics is one field attempting to tackle this problem by providing unified access. One solution is for biological databases to cross-reference other databases by accession number to link their related knowledge together (e.g. so that the accession number stays the same even if a species name changes).

Redundancy is another problem, as many databases must store the same information, e.g. protein structure databases also contain the sequences of the proteins they cover and their bibliographic information.
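The difference between linking by species name and linking by accession number can be shown with a small Python sketch. Everything in it is hypothetical: the two toy "databases" are plain dictionaries, and the accession `ACC0001`, the species names, and the reference identifier are invented for illustration.

```python
# Toy sequence database: records keyed by a stable accession number.
# All identifiers and names below are invented for illustration.
sequence_db = {
    "ACC0001": {"species": "Examplea vetus", "sequence": "ATGC"},
}

# Two ways a second database might point at the same record.
links_by_name = {"Examplea vetus": ["REF-1"]}   # fragile: keyed by species name
links_by_accession = {"ACC0001": ["REF-1"]}     # stable: keyed by accession


def lookup_by_name(db, links, accession):
    """Follow the link via the species name currently stored in the record."""
    name = db[accession]["species"]
    return links.get(name, [])


def lookup_by_accession(links, accession):
    """Follow the link via the accession number, ignoring the species name."""
    return links.get(accession, [])


# Before the rename, both strategies find the reference.
before_name = lookup_by_name(sequence_db, links_by_name, "ACC0001")
before_acc = lookup_by_accession(links_by_accession, "ACC0001")

# The species is renamed in the sequence database only.
sequence_db["ACC0001"]["species"] = "Examplea nova"

# After the rename, the name-based link is broken; the accession link is not.
after_name = lookup_by_name(sequence_db, links_by_name, "ACC0001")
after_acc = lookup_by_accession(links_by_accession, "ACC0001")

print(before_name, before_acc)
print(after_name, after_acc)
```

The accession acts as an identifier that carries no biological meaning, so revising the taxonomy does not invalidate it. The name-based link fails silently, which is the kind of inconsistency described above.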


Model-organism databases

Species-specific databases are available for some species, mainly those that are often used in research (''model organisms''). For example, EcoCyc is an ''E. coli'' database. Other popular model organism databases include Mouse Genome Informatics for the laboratory mouse, ''Mus musculus''; the Rat Genome Database for ''Rattus''; ZFIN for ''Danio rerio'' (zebrafish); PomBase for the fission yeast ''Schizosaccharomyces pombe''; FlyBase for ''Drosophila''; WormBase for the nematodes ''Caenorhabditis elegans'' and ''Caenorhabditis briggsae''; and Xenbase for the frogs ''Xenopus tropicalis'' and ''Xenopus laevis''.


Biodiversity and species databases

Numerous databases attempt to document the diversity of life on Earth. A prominent example is the Catalogue of Life, first created in 2001 by Species 2000 and the Integrated Taxonomic Information System. The Catalogue of Life is a collaborative project that aims to document the taxonomic categorization of all currently accepted species in the world. It provides a consolidated and consistent database for researchers and policymakers to reference. The Catalogue of Life curates up-to-date datasets from other sources such as the Conifer Database, the International Committee on Taxonomy of Viruses' ICTV MSL (for viruses), and LepIndex (for butterflies and moths). In total, the Catalogue of Life drew from 165 databases as of May 2022. Its operational costs are paid for by the Global Biodiversity Information Facility, the Illinois Natural History Survey, the Naturalis Biodiversity Center, and the Smithsonian Institution.

Some biological databases also document the geographical distribution of different species. Shuang Dai et al. created a new multi-source database to document the spatial distribution of 1,371 bird species in China, as existing databases had been severely lacking in spatial distribution data for many species. Sources for this new database included books, literature, GPS tracking, and online webpage data. The database displays taxonomy, distribution, species information, and data sources for each species. After the bird spatial distribution database was completed, 61% of known species in China were found to be distributed in regions beyond where they were previously known.


Medical databases

Medical databases are a special case of biomedical data resource and can range from bibliographies, such as PubMed, to image databases for the development of AI-based diagnostic software. For instance, one such image database was developed to aid the development of wound monitoring algorithms. Over 188 multi-modal image sets were curated from 79 patient visits, consisting of photographs, thermal images, and 3D mesh depth maps. Wound outlines were drawn manually and added to the photo datasets. The database was made publicly available as a program called WoundsDB, downloadable from the Chronic Wound Database website.


''Nucleic Acids Research'' Database Issue

An important resource for finding biological databases is a special yearly issue of the journal ''Nucleic Acids Research'' (NAR). The Database Issue of NAR is freely available, and categorizes many of the public biological databases. A companion database to the issue, called the Online Molecular Biology Database Collection, lists 1,380 online databases. Other collections of databases exist, such as MetaBase and the Bioinformatics Links Collection.


See also

* Biobank
* Biological data
* Chemical database
* Death Domain database
* European Bioinformatics Institute
* Gene Disease Database
* Integrative bioinformatics
* List of biological databases
* Model organism databases
* NCBI
* PubMed (a database of biomedical literature)


External links


* Interactive list of biological databases, classified by categories, from Nucleic Acids Research, 2010
* DBD: Database of Biological Databases
* Biosharing (a database of biological databases)
* Chronic Wounds Database (WoundsDB)
* Catalogue of Life