The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). It is approved and funded by the government of the United States. The NCBI is located in Bethesda, Maryland, and was founded in 1988 through legislation sponsored by US Congressman Claude Pepper.
The NCBI houses a series of databases relevant to biotechnology and biomedicine and is an important resource for bioinformatics tools and services. Major databases include GenBank for DNA sequences and PubMed, a bibliographic database for biomedical literature. Other databases include the NCBI Epigenomics database. All these databases are available online through the Entrez search engine. NCBI was directed by David Lipman, one of the original authors of the BLAST sequence alignment program and a widely respected figure in bioinformatics.
GenBank
NCBI has been responsible for making the GenBank DNA sequence database available since 1992. GenBank coordinates with individual laboratories and other sequence databases, such as those of the European Molecular Biology Laboratory (EMBL) and the DNA Data Bank of Japan (DDBJ).
Since 1992, NCBI has grown to provide other databases in addition to GenBank. NCBI provides Gene, Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (3D protein structures), dbSNP (a database of single-nucleotide polymorphisms), the Reference Sequence Collection, a map of the human genome, and a taxonomy browser, and coordinates with the National Cancer Institute to provide the Cancer Genome Anatomy Project. The NCBI assigns a unique identifier (taxonomy ID number) to each species of organism; for example, Homo sapiens has taxonomy ID 9606.
The NCBI has software tools that are available through internet browsers or by FTP. For example, BLAST is a sequence similarity searching program. BLAST can do sequence comparisons against the GenBank DNA database in less than 15 seconds.
NCBI Bookshelf
The NCBI Bookshelf is a collection of freely accessible, downloadable, online versions of selected biomedical books. The Bookshelf covers a wide range of topics including molecular biology, biochemistry, cell biology, genetics, microbiology, disease states from a molecular and cellular point of view, research methods, and virology. Some of the books are online versions of previously published books, while others, such as ''Coffee Break'', are written and edited by NCBI staff. The Bookshelf complements the Entrez PubMed repository of peer-reviewed publication abstracts: its contents provide established perspectives on evolving areas of study and a context in which many disparate individual pieces of reported research can be organized.
Basic Local Alignment Search Tool (BLAST)
BLAST is an algorithm used for calculating sequence similarity between biological sequences, such as nucleotide sequences of DNA and amino acid sequences of proteins. It is a powerful tool for finding sequences similar to the query sequence within the same organism or in different organisms. It searches for the query sequence in the NCBI databases on NCBI servers and returns the results to the user's browser in the chosen format. Input sequences are mostly in FASTA or GenBank format, while output can be delivered in a variety of formats such as HTML, XML, and plain text. HTML is the default output format on NCBI's web page. NCBI BLAST results are presented as a graphical overview of all the hits found, a table of sequence identifiers for the hits with their scoring data, and the alignments between the sequence of interest and each hit, with the corresponding BLAST scores.
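Since FASTA is the most common input format for BLAST, the following minimal Python sketch shows how FASTA text can be read into records. It is an illustration only and not part of any NCBI tool: the function name `parse_fasta` and the example identifiers and sequences are hypothetical, and the sketch assumes the usual convention that a line beginning with ">" is a header and the lines after it hold the sequence.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A ">" line starts a new record; the rest of the line is its header.
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence data may be wrapped over several lines; join them.
            records[header] += line
    return records


# Hypothetical input: identifiers and sequences are invented for illustration.
EXAMPLE = """>query_1 hypothetical DNA fragment
ATGGCGTACG
TTAGC
>query_2 hypothetical peptide
MKTAYIAK
"""

records = parse_fasta(EXAMPLE.splitlines())
for name, seq in records.items():
    print(name, len(seq))
```

Keeping the full header line as the key is a simplification; real pipelines usually split the identifier from the free-text description.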
Entrez
The Entrez Global Query Cross-Database Search System is used at NCBI for all the major databases, such as Nucleotide and Protein Sequences, Protein Structures, PubMed, Taxonomy, Complete Genomes, OMIM, and several others. Entrez is both an indexing and a retrieval system that holds data from various sources for biomedical research. NCBI distributed the first version of Entrez in 1991, composed of nucleotide sequences from PDB and GenBank; protein sequences from SWISS-PROT, translated GenBank, PIR, PRF, and PDB; and associated abstracts and citations from PubMed. Entrez is specially designed to integrate data from several different sources, databases, and formats into a uniform information model and retrieval system that can efficiently retrieve the relevant references, sequences, and structures.
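Entrez can also be queried programmatically. The sketch below assumes NCBI's public E-utilities web interface, which this article does not otherwise describe, and only builds a search URL without sending any request. The helper name `build_esearch_url` is hypothetical; the `db`, `term`, `retmax`, and `retmode` parameters are taken from the E-utilities interface as commonly documented and should be checked against the current NCBI documentation before use.

```python
from urllib.parse import urlencode

# Base address of the E-utilities service (an assumption; verify against NCBI docs).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for one Entrez database; no network call is made."""
    params = {
        "db": db,            # Entrez database name, e.g. "pubmed" or "protein"
        "term": term,        # free-text query
        "retmax": retmax,    # maximum number of identifiers to return
        "retmode": "json",   # ask for a JSON response
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# The same query text can be pointed at different Entrez databases.
url = build_esearch_url("pubmed", "sequence alignment")
print(url)
print(build_esearch_url("protein", "hemoglobin", retmax=5))
```

Separating URL construction from the request itself makes the query easy to inspect and test; fetching the URL with any HTTP client would be the next step.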
Gene
Gene has been implemented at NCBI to characterize and organize the information about genes. It serves as a major node in the nexus of genomic map, expression, sequence, protein function, structure, and homology data. A unique GeneID is assigned to each gene record and can be followed through revision cycles. Gene records for known or predicted genes are established here and are demarcated by map positions or nucleotide sequences. Gene has several advantages over its predecessor, LocusLink, including better integration with other databases in NCBI, broader taxonomic scope, and enhanced options for query and retrieval provided by the Entrez system.
Protein
The Protein database maintains text records for individual protein sequences, derived from many different resources such as the NCBI Reference Sequence (RefSeq) project, GenBank, PDB, and UniProtKB/Swiss-Prot. Protein records are available in different formats, including FASTA and XML, and are linked to other NCBI resources. Protein provides users with relevant data such as genes, DNA/RNA sequences, biological pathways, expression and variation data, and literature. It also provides pre-determined sets of similar and identical proteins for each sequence, as computed by BLAST. The Structure database of NCBI contains 3D coordinate sets for experimentally determined structures in PDB that are imported by NCBI.
The Conserved Domain Database (CDD) contains sequence profiles that characterize highly conserved domains within protein sequences. It also has records from external resources such as SMART and Pfam.
Another protein database, the Protein Clusters database, contains sets of protein sequences that are clustered according to the maximum alignments between the individual sequences, as calculated by BLAST.
PubChem database
The PubChem database of NCBI is a public resource for molecules and their activities against biological assays. PubChem is searchable and accessible through the Entrez information retrieval system.[Wang Y. & Bryant S. H. (2014). The NCBI Handbook, 2nd edition, NCBI PubChem BioAssay Database]
See also
* DNA Data Bank of Japan (DDBJ)
* European Bioinformatics Institute (EBI)