Biological database

Biological databases are libraries of biological science information, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), and clinical effects of mutations, as well as similarities of biological sequences and structures.

Biological databases can be classified by the kind of data they collect (see below). Broadly, there are molecular databases (for sequences, molecules, etc.), functional databases (for physiology, enzyme activities, phenotypes, ecology, etc.), taxonomic databases (for species and other taxonomic ranks), databases of images and other media, and specimen databases (for museum collections, etc.).

Databases are important tools that help scientists analyze and explain a host of biological phenomena, from the structure of biomolecules and their interactions, to the whole metabolism of organisms, to the evolution of species. This knowledge helps in the fight against diseases, assists in the development of medications, helps predict certain genetic diseases, and aids in discovering basic relationships among species in the history of life.


Technical basis and theoretical concepts

Relational database concepts from computer science and information retrieval concepts from digital libraries are important for understanding biological databases. Biological database design, development, and long-term management form a core area of the discipline of bioinformatics. Data contents include gene sequences, textual descriptions, attributes and ontology classifications, citations, and tabular data. These are often described as semi-structured data, and can be represented as tables, key-delimited records, and XML structures.
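
The three representations named above can be illustrated with a minimal Python sketch. The record, its field names, the accession `DEMO0001`, and the `//` record terminator are all hypothetical choices made for this illustration; no particular database's schema or file format is implied.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical record; field names and values are illustrative only.
record = {
    "accession": "DEMO0001",
    "organism": "Escherichia coli",
    "sequence": "ATGGCGTAA",
}

def to_table(rec):
    """Render the record as a tab-separated header row plus one data row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(rec.keys())
    writer.writerow(rec.values())
    return buf.getvalue()

def to_key_delimited(rec):
    """Render the record as 'KEY: value' lines closed by an assumed '//' terminator."""
    lines = [f"{key.upper()}: {value}" for key, value in rec.items()]
    return "\n".join(lines + ["//"])

def to_xml(rec):
    """Render the record as an <entry> element with one child per field."""
    root = ET.Element("entry")
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(to_table(record))
    print(to_key_delimited(record))
    print(to_xml(record))
```

The same content survives all three renderings; what changes is how much structure (column order, key labels, nested elements) the format itself carries.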


Access

Most biological databases are available through websites that organise the data so that users can browse it online. In addition, the underlying data is usually available for download in a variety of formats. Biological data comes in many formats, including text, sequence data, protein structures, and links. Each of these can be obtained from particular sources, for example:
* Text formats are provided by PubMed and OMIM.
* Sequence data is provided by GenBank (for DNA) and UniProt (for proteins).
* Protein structures are provided by PDB, SCOP, and CATH.
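
Downloaded sequence data is typically plain text that must be parsed before use. The following is a minimal sketch, assuming the data arrives in a FASTA-style layout (a `>` header line followed by one or more sequence lines); the text above does not name a specific format, and the records shown are hypothetical.

```python
def parse_fasta(text):
    """Return a dict mapping each record's identifier to its full sequence.

    The identifier is taken to be the first whitespace-delimited word of
    the '>' header line; sequence lines are concatenated in order.
    """
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}

# Hypothetical two-record download.
example = """>DEMO0001 hypothetical record
ATGGCG
TAA
>DEMO0002 another hypothetical record
ATGAAATAG
"""

sequences = parse_fasta(example)

if __name__ == "__main__":
    for name, seq in sequences.items():
        print(name, len(seq), seq)
```

For production work an established parsing library is usually a better choice than a hand-written parser; this sketch only shows the shape of the task.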


Problems and challenges

Biological knowledge is distributed among countless databases. This sometimes makes it difficult to ensure the consistency of information, e.g. when different databases use different names for the same species or different data formats. As a consequence, interoperability is a constant challenge for information exchange. For instance, if a DNA sequence database stores a DNA sequence alongside the name of a species, a name change of that species may break the links to other databases, which may use a different name. Integrative bioinformatics is one field attempting to tackle this problem by providing unified access. One solution is for biological databases to cross-reference other databases by accession number to link their related knowledge together, so that the link stays the same even if a species name changes. Redundancy is another problem, as many databases must store the same information, e.g. protein structure databases also contain the sequences of the proteins they cover and their bibliographic information.
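
The difference between linking by species name and linking by accession number can be shown with a small Python sketch. Both databases, the accession `DEMO0001`, the species names, and the structure identifier are hypothetical and exist only for this illustration.

```python
# Hypothetical sequence database keyed by accession number.
sequence_db = {
    "DEMO0001": {"species": "Oldgenus examplus", "sequence": "ATGGCGTAA"},
}

# Hypothetical structure database, indexed two ways for comparison.
structure_by_name = {"Oldgenus examplus": {"structure_id": "S1"}}
structure_by_accession = {"DEMO0001": {"structure_id": "S1"}}

def link_by_name(accession):
    """Follow the cross-reference using the species name stored with the sequence."""
    species = sequence_db[accession]["species"]
    return structure_by_name.get(species)

def link_by_accession(accession):
    """Follow the cross-reference using the stable accession number."""
    return structure_by_accession.get(accession)

# Both links resolve before the rename.
before = (link_by_name("DEMO0001"), link_by_accession("DEMO0001"))

# The sequence database adopts a revised species name; the structure
# database is not updated.
sequence_db["DEMO0001"]["species"] = "Newgenus examplus"

# The name-based link now fails, while the accession-based link still resolves.
after = (link_by_name("DEMO0001"), link_by_accession("DEMO0001"))

if __name__ == "__main__":
    print("before rename:", before)
    print("after rename: ", after)
```

After the rename, the name-based lookup returns `None` because the two databases no longer agree on the name, whereas the accession-based lookup is unaffected.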


Model-organism databases

Species-specific databases are available for some species, mainly those that are often used in research (''model organisms''). For example, EcoCyc is an ''E. coli'' database. Other popular model organism databases include Mouse Genome Informatics for the laboratory mouse, ''Mus musculus''; the Rat Genome Database for ''Rattus''; ZFIN for ''Danio rerio'' (zebrafish); PomBase for the fission yeast ''Schizosaccharomyces pombe''; FlyBase for ''Drosophila''; WormBase for the nematodes ''Caenorhabditis elegans'' and ''Caenorhabditis briggsae''; and Xenbase for the frogs ''Xenopus tropicalis'' and ''Xenopus laevis''.


Biodiversity and species databases

Numerous databases attempt to document the diversity of life on Earth. A prominent example is the Catalogue of Life, first created in 2001 by Species 2000 and the Integrated Taxonomic Information System. The Catalogue of Life is a collaborative project that aims to document the taxonomic categorization of all currently accepted species in the world. It provides a consolidated and consistent database for researchers and policymakers to reference, and curates up-to-date datasets from other sources such as the Conifer Database, ICTV MSL (from the International Committee on Taxonomy of Viruses, for viruses), and LepIndex (for butterflies and moths). In total, the Catalogue of Life drew from 165 databases as of May 2022. Its operational costs are paid for by the Global Biodiversity Information Facility, the Illinois Natural History Survey, the Naturalis Biodiversity Center, and the Smithsonian Institution.

Some biological databases also document the geographical distribution of different species. Shuang Dai et al. created a new multi-source database to document the spatial distribution of 1,371 bird species in China, as existing databases had been severely lacking in spatial distribution data for many species. Sources for this new database included books, literature, GPS tracking, and online webpage data. The new database displays taxonomy, distribution, species information, and data sources for each species. After the database was completed, 61% of known species in China were found to be distributed in regions beyond where they were previously known.


Medical databases

Medical databases are a special case of biomedical data resource and can range from bibliographies, such as PubMed, to image databases for the development of AI-based diagnostic software. For instance, one such image database was developed with the goal of aiding the development of wound monitoring algorithms. Over 188 multi-modal image sets were curated from 79 patient visits, consisting of photographs, thermal images, and 3D mesh depth maps. Wound outlines were manually drawn and added to the photo datasets. The database was made publicly available in the form of a program called WoundsDB, downloadable from the Chronic Wound Database website.


''Nucleic Acids Research'' Database Issue

An important resource for finding biological databases is a special yearly issue of the journal ''Nucleic Acids Research'' (NAR). The Database Issue of NAR is freely available, and categorizes many of the public biological databases. A companion database to the issue, called the Online Molecular Biology Database Collection, lists 1,380 online databases. Other collections of databases exist, such as MetaBase and the Bioinformatics Links Collection.


See also

* Biobank
* Biological data
* Chemical database
* Death Domain database
* European Bioinformatics Institute
* Gene Disease Database
* Integrative bioinformatics
* List of biological databases
* Model organism databases
* NCBI
* PubMed (a database of biomedical literature)


External links


* Interactive list of biological databases classified by categories, from ''Nucleic Acids Research'', 2010
* DBD: Database of Biological Databases
* Biosharing (a database of biological databases)
* Chronic Wounds Database
* WoundsDB
* Catalogue of Life