List Of Biological Databases

Biological databases are stores of biological information. The journal ''Nucleic Acids Research'' regularly publishes special issues on biological databases and has a list of such databases. The 2018 issue has a list of about 180 such databases and updates to previously described databases. Omics Discovery Index can be used to browse and search several biological databases.


Meta databases

Meta databases are databases of databases that collect data about data to generate new data. They are capable of merging information from different sources and making it available in a new and more convenient form, or with an emphasis on a particular disease or organism. A metadatabase is a database model for metadata management, global query of independent databases, and distributed data processing. The word metadatabase is a later addition to the dictionary; originally, metadata was the only common term, referring simply to "data about data" such as tags, keywords, and markup headers.
* ConsensusPathDB: a molecular functional interaction database, integrating information from 12 others
* Entrez (National Center for Biotechnology Information)
* Neuroscience Information Framework (University of California, San Diego): integrates hundreds of neuroscience relevant resources; many are listed below


Model organism databases

Model organism databases provide in-depth biological data for intensively studied organisms.
* PomBase: the knowledgebase for the fission yeast ''Schizosaccharomyces pombe''
* ''Subti''Wiki: integrated database for the model bacterium ''Bacillus subtilis''


Nucleic acid databases


DNA databases

The primary databases make up the International Nucleotide Sequence Database (INSD). They include:
* DNA Data Bank of Japan (National Institute of Genetics)
* EMBL (European Bioinformatics Institute)
* GenBank (National Center for Biotechnology Information)

DDBJ (Japan), GenBank (USA) and European Nucleotide Archive (Europe) are repositories for nucleotide sequence data from all organisms. All three accept nucleotide sequence submissions, and then exchange new and updated data on a daily basis to achieve optimal synchronisation between them. These three databases are primary databases, as they house original sequence data. They collaborate with Sequence Read Archive (SRA), which archives raw reads from high-throughput sequencing instruments.

Secondary databases are:
* 23andMe's database
* HapMap
* OMIM (Online Mendelian Inheritance in Man): inherited diseases
* RefSeq
* 1000 Genomes Project: launched in January 2008. The genomes of more than a thousand anonymous participants from a number of different ethnic groups were analyzed and made publicly available.
* EggNOG Database: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. It provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation.

Other databases
* Nucleosome positioning region database
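The daily exchange among the three primary archives described above can be pictured with a minimal sketch. It is not how the archives actually exchange data: the record layout (accession mapped to a version number and a sequence), the `synchronise` function, and every accession value are assumptions made up for illustration. The idea shown is only that each archive ends up holding the newest version of every record.

```python
# Hypothetical sketch: record layout, function name, and accessions are invented.

def synchronise(*archives):
    """Return one updated copy per archive, each holding the newest version of every record."""
    merged = {}
    for archive in archives:
        for accession, (version, sequence) in archive.items():
            # Keep whichever copy of the record has the highest version number.
            if accession not in merged or version > merged[accession][0]:
                merged[accession] = (version, sequence)
    # Every archive receives an identical copy of the merged view.
    return [dict(merged) for _ in archives]


# Placeholder holdings: accession -> (version, sequence).
ddbj = {"ACC0001": (1, "ATGC")}
ena = {"ACC0001": (2, "ATGCA"), "ACC0002": (1, "GGCC")}
genbank = {"ACC0003": (1, "TTAA")}

ddbj, ena, genbank = synchronise(ddbj, ena, genbank)
```

After the call, all three dictionaries hold the same three records, with the updated copy of `ACC0001` replacing the older one.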


Gene expression databases (mostly microarray data)


Genome databases

These databases collect genome sequences, annotate and analyze them, and provide public access. Some add curation of experimental literature to improve computed annotations. These databases may hold many species genomes, or a single model organism genome.


Phenotype databases

* PHI-base: pathogen-host interaction database. It links gene information to phenotypic information from microbial pathogens on their hosts. Information is manually curated from peer reviewed literature.
* Rat Genome Database (RGD): genomic and phenotype data for ''Rattus norvegicus''
* PomBase database: manually curated phenotypic data for the yeast ''Schizosaccharomyces pombe''


RNA databases

* miRBase: the microRNA database
* PolymiRTS: a database of DNA variations in putative microRNA target sites
* PolyQ: database of polyglutamine repeats in disease and non-disease associated proteins
* Rfam: a database of RNA families


Amino acid / protein databases

Several publicly available data repositories and resources have been developed to support and manage protein related information, biological knowledge discovery and data-driven hypothesis generation. The databases in the table below are selected from the databases listed in the Nucleic Acids Research (NAR) databases issues and database collection and the databases cross-referenced in the UniProtKB. Most of these databases are cross-referenced with UniProt/UniProtKB so that identifiers can be mapped to each other.
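The value of a shared cross-reference hub can be shown with a small sketch. It assumes a toy cross-reference table and does not use any real UniProt interface: the database names, the identifiers, and the helper functions `build_index` and `map_identifier` are all hypothetical. The point is that two databases which never reference each other directly can still be linked through the hub entry they both cite.

```python
from collections import defaultdict

# Hypothetical cross-reference rows: (hub_id, database, external_id).
# All names and identifiers are invented placeholders.
CROSS_REFERENCES = [
    ("HUB0001", "StructureDB", "S100"),
    ("HUB0001", "PathwayDB", "PW-7"),
    ("HUB0002", "StructureDB", "S200"),
]


def build_index(rows):
    """Build lookups from external identifiers to hub entries and back."""
    to_hub = {}
    from_hub = defaultdict(dict)
    for hub_id, database, external_id in rows:
        to_hub[(database, external_id)] = hub_id
        # Simplification: one identifier per database for each hub entry.
        from_hub[hub_id][database] = external_id
    return to_hub, from_hub


def map_identifier(source_db, source_id, target_db, to_hub, from_hub):
    """Translate an identifier between two databases via the shared hub entry."""
    hub_id = to_hub.get((source_db, source_id))
    if hub_id is None:
        return None  # the source identifier is not cross-referenced at all
    return from_hub[hub_id].get(target_db)  # None if the target has no entry


to_hub, from_hub = build_index(CROSS_REFERENCES)
mapped = map_identifier("StructureDB", "S100", "PathwayDB", to_hub, from_hub)
```

Real cross-references are often one-to-many, so a fuller version would store a list of identifiers per database rather than a single value.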


Protein sequence databases


Protein structure databases

* Protein Data Bank (PDB), comprising:
** Protein Data Bank in Europe (PDBe)
** Protein Data Bank Japan (PDBj)
** Research Collaboratory for Structural Bioinformatics (RCSB)
* Structural Classification of Proteins (SCOP)
* CATH: Protein Structure Classification database

For more protein structure databases, see also Protein structure database.


Protein model databases

* ModBase: database of comparative protein structure models (Sali Lab, UCSF)
* Similarity Matrix of Proteins (SIMAP): database of protein similarities computed using FASTA
* Swiss-model: server and repository for protein structure models
* AAindex: database of amino acid indices, amino acid mutation matrices, and pair-wise contact potentials


Protein-protein and other molecular interactions

* BioGRID: general repository for interaction datasets (Samuel Lunenfeld Research Institute)
* RNA-binding protein database
* Database of Interacting Proteins (Univ. of California)
* IntAct: open-source database for molecular interactions (EMBL-EBI)


Protein expression databases

* Human Protein Atlas: aims at mapping all the human proteins in cells, tissues and organs


Signal transduction pathway databases

* NCI-Nature Pathway Interaction Database
* Netpath: curated resource of signal transduction pathways in humans
* Reactome: navigable map of human biological pathways, ranging from metabolic processes to hormonal signalling (Ontario Institute for Cancer Research, European Bioinformatics Institute, NYU Langone Medical Center, Cold Spring Harbor Laboratory)
* WikiPathways


Metabolic pathway and protein function databases


Taxonomic databases

Numerous databases collect information about species and other taxonomic categories. The Catalogue of Life is a special case as it is a meta-database of about 150 specialized "global species databases" (GSDs) that have collected the names and other information on (almost) all described and thus "known" species.
* BacDive: bacterial metadatabase that provides strain-linked information about bacterial and archaeal biodiversity, including taxonomy information
* Catalogue of Life: a meta-database of all species on earth
* EzTaxon-e: database for the identification of prokaryotes based on 16S ribosomal RNA gene sequences
* NCBI Taxonomy: a taxonomic database operated by NCBI and concentrating on all taxa for which DNA sequences are available (those sequences are stored by GenBank, another database operated by NCBI).


Image databases

Images play a critical role in biomedicine, ranging from images of anthropological specimens to zoology. However, there are relatively few databases dedicated to image collection, although some projects such as iNaturalist collect photos as a main part of their data. A special case of "images" are 3-dimensional images such as protein structures or 3D-reconstructions of anatomical structures. Image databases include, among others:
* Allen Brain Atlas
* Digital Brain Bank
* Electron Microscopy Public Image Archive (EMPIAR)
* Image Data Resource
* Morphobank
* Morphosource


Additional databases


Exosomal databases

* ExoCarta
* Extracellular RNA Atlas: a repository of small RNA-seq and qPCR-derived exRNA profiles from human and mouse biofluids


Mathematical model databases

* Biomodels Database: published mathematical models describing biological processes


Radiologic databases

* The Cancer Imaging Archive (TCIA)
* Neuroimaging Informatics Tools and Resources Clearinghouse


Databases on antimicrobial resistance rates and antibiotic consumption

* CIPARS
* EARS-Net
* ESAC-Net


Databases on antimicrobial resistance mechanisms


Wiki-style databases

* Gene Wiki
* WikiProfessional


Specialized databases


External links


* Nucleic Acids Research Molecular Biology Database Collection: over 1,600 databases
* Nucleic Acids Research (NAR) Database Summary Paper Category List