Biomolecular Interaction Network Database

The Biomolecular Object Network Databank (BOND) is a bioinformatics databank containing information on small molecule structures and interactions. The databank integrates a number of existing databases to provide a comprehensive overview of the information currently available for a given molecule.


Background

The Blueprint Initiative started as a research program in the lab of Dr. Christopher Hogue at the Samuel Lunenfeld Research Institute at Mount Sinai Hospital in Toronto. On December 14, 2005, Unleashed Informatics Limited acquired the commercial rights to The Blueprint Initiative intellectual property. This included rights to the protein interaction database BIND, the small molecule interaction database SMID, as well as the data warehouse SeqHound. Unleashed Informatics is a data management service provider and is overseeing the management and curation of The Blueprint Initiative under the guidance of Dr. Hogue.


Construction

BOND integrates the original Blueprint Initiative databases as well as other databases, such as GenBank, combined with many of the tools required to analyze these data. Annotation links for sequences, including taxon identifiers, redundant sequences, Gene Ontology descriptions, Online Mendelian Inheritance in Man identifiers, conserved domains, database cross-references, LocusLink identifiers and complete genomes, are also available. BOND facilitates cross-database queries and is an open access resource which integrates interaction and sequence data (BOND at Unleashed Informatics).


Small Molecule Interaction Database (SMID)

The Small Molecule Interaction Database is a database containing protein domain-small molecule interactions. It uses a domain-based approach to identify domain families, found in the Conserved Domain Database (CDD), which interact with a query small molecule. The CDD from NCBI amalgamates data from several different sources: Protein Families (Pfam), Simple Modular Architecture Research Tool (SMART), Clusters of Orthologous Genes (COGs), and NCBI's own curated sequences. The data in SMID is derived from the Protein Data Bank (PDB), a database of known protein crystal structures. SMID can be queried by entering a protein GI, domain identifier, PDB ID or SMID ID. The results of a search provide small molecule, protein, and domain information for each interaction identified in the database. Interactions with non-biological contacts are screened out by default.

SMID-BLAST is a tool developed to annotate known small-molecule binding sites as well as to predict binding sites in proteins whose crystal structures have not yet been determined. The prediction is based on extrapolation of known interactions, found in the PDB, to interactions between an uncrystallized protein and a small molecule of interest. SMID-BLAST was validated against a test set of known small molecule interactions from the PDB. It was shown to be an accurate predictor of protein-small molecule interactions: 60% of predicted interactions identically matched the PDB annotated binding site, and of these, 73% had greater than 80% of the binding residues of the protein correctly identified. Hogue, C ''et al.'' estimated that 45% of predictions that were not observed in the PDB data do in fact represent true positives.
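As a rough worked example of the validation figures above, the two percentages can be chained: 60% of predictions matched the annotated site, and 73% of those recovered more than 80% of the binding residues. The sketch below only does this arithmetic; the function name is illustrative and is not part of SMID-BLAST.

```python
def fraction_high_quality(matched: float, well_resolved_given_match: float) -> float:
    """Share of all predictions that match the PDB-annotated site AND
    recover more than 80% of its binding residues."""
    return matched * well_resolved_given_match


# Figures quoted in the text: 60% matched, 73% of those well resolved.
share = fraction_high_quality(0.60, 0.73)
print(f"{share:.1%}")  # roughly 44% of all predictions
```

In other words, a little under half of all SMID-BLAST predictions in the test set both hit the annotated site and identified most of its residues.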


Biomolecular Interaction Network Database (BIND)


Introduction

The idea of a database to document all known molecular interactions was originally put forth by Tony Pawson in the 1990s and was later developed by scientists at the University of Toronto in collaboration with the University of British Columbia. The development of the Biomolecular Interaction Network Database (BIND) has been supported by grants from the Canadian Institutes of Health Research (CIHR), Genome Canada (BIND at genomecanada.ca), the Canada Foundation for Innovation and the Ontario Research and Development Fund. BIND was originally designed to be a constantly growing repository for information regarding biomolecular interactions, molecular complexes and pathways. As proteomics is a rapidly advancing field, there is a need to have information from scientific journals readily available to researchers. BIND facilitates the understanding of molecular interactions and pathways involved in cellular processes and will eventually give scientists a better understanding of developmental processes and disease pathogenesis.

The major goals of the BIND project are: to create a public proteomics resource that is available to all; to create a platform to enable data mining from other sources (PreBIND); and to create a platform capable of presenting visualizations of complex molecular interactions. From the beginning, BIND has been open access, and its software can be freely distributed and modified. Currently, BIND includes a data specification, a database and associated data mining and visualization tools. Eventually, it is hoped that BIND will be a collection of all the interactions occurring in each of the major model organisms.


Database structure

BIND contains information on three types of data: interactions, molecular complexes and pathways.
# Interactions are the basic component of BIND and describe how two or more objects (A and B) interact with each other. The objects can be a variety of things: DNA, RNA, genes, proteins, ligands, or photons. The interaction entry contains the most information about a molecule; it provides information on its name and synonyms, where it is found (e.g. where in the cell, in what species, when it is active), and its sequence or where its sequence can be found. The interaction entry also outlines the experimental conditions required to observe binding in vitro and the chemical dynamics (including thermodynamics and kinetics).
# The second type of BIND entry is the molecular complex. Molecular complexes are defined as aggregates of molecules that are stable and have a function when bound to each other. The molecular complex entry links data from two or more interaction records and may also contain information on the role of the complex in various interactions.
# The third component of BIND is the pathway record section. A pathway consists of a network of interactions that are involved in the regulation of cellular processes. This section may also contain information on phenotypes and diseases related to the pathway.
The minimum amount of information needed to create an entry in BIND is a PubMed publication reference and an entry in another database (e.g. GenBank). Each entry within the database provides references/authors for the data. As BIND is a constantly growing database, all components of BIND track updates and changes (Bader, GD, ''et al.'' BIND- The Biomolecular Interaction Network Database. ''Nucleic Acids Research'' 29: 242-245 (2001)).

BIND is based on a data specification written in Abstract Syntax Notation One (ASN.1). ASN.1 is also used by NCBI when storing data for their Entrez system, so BIND uses the same standards as NCBI for data representation. ASN.1 is preferred because it can be easily translated into other data specification languages (e.g. XML), can easily handle complex data and can be applied to all biological interactions, not just proteins. Bader and Hogue (2000) have prepared a detailed manuscript on the ASN.1 data specification used by BIND (Bader, GD, Hogue, CWV. BIND- a data specification for storing and describing biomolecular interactions, molecular complexes and pathways. ''Bioinformatics'' 16(5): 465-477 (2000)).


Data submission and curation

User submission to the database is encouraged. To contribute to the database, one must submit contact information, a PubMed identifier and the two molecules that interact. The person who submits a record is its owner. All records are validated before being made public, and BIND is curated for quality assurance. BIND curation has two tracks: high-throughput (HTP) and low-throughput (LTP). HTP records are from papers which have reported more than 40 interaction results from one experimental methodology. HTP curators typically have a bioinformatics background. They are responsible for the collection and storage of experimental data, and they also create scripts to update BIND based on new publications. LTP records are curated by individuals with either an MSc or PhD and laboratory experience in interaction research. LTP curators are given further training through the Canadian Bioinformatics Workshops. Information on small molecule chemistry is curated separately by chemists to ensure the curator is knowledgeable about the subject. The priority for BIND curation is to focus on LTP to collect information as it is published. Although HTP studies provide more information at once, there are more LTP studies being reported, and similar numbers of interactions are being reported by both tracks. In 2004, BIND collected data from 110 journals (Alfarano, C, ''et al.'' The Biomolecular Interaction Network Database and related tools 2005 update. ''Nucleic Acids Research'' 33: D418-D424 (2005)).
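The HTP/LTP split described above amounts to a simple threshold rule. The following sketch shows one way it could be expressed; the function name and the input format (a mapping from experimental methodology to number of reported interaction results) are assumptions for illustration, not BIND's actual curation tooling.

```python
from typing import Dict

# From the text: "more than 40 interaction results from one experimental methodology".
HTP_THRESHOLD = 40


def curation_track(results_per_method: Dict[str, int]) -> str:
    """Return "HTP" if any single methodology reports more than 40 results, else "LTP"."""
    if any(count > HTP_THRESHOLD for count in results_per_method.values()):
        return "HTP"
    return "LTP"


# Hypothetical papers: method names and counts are made up for the example.
print(curation_track({"two-hybrid": 1200}))                 # HTP
print(curation_track({"co-immunoprecipitation": 3}))        # LTP
print(curation_track({"method A": 30, "method B": 30}))     # LTP: no single method exceeds 40
```

Note that the rule is applied per methodology, so a paper reporting 60 results spread evenly over two methods would still be handled on the LTP track under this reading.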


Database growth

BIND has grown significantly since its conception; in fact, the database saw a 10-fold increase in entries between 2003 and 2004. By September 2004, there were over 100,000 interaction records (including 58,266 protein-protein, 4,225 genetic, 874 protein-small molecule, 25,857 protein-DNA, and 19,348 biopolymer interactions). The database also contains sequence information for 31,972 proteins, 4,560 DNA samples and 759 RNA samples. These entries have been collected from 11,649 publications; therefore, the database represents an important amalgamation of data. The organisms with entries in the database include ''Saccharomyces cerevisiae'', ''Drosophila melanogaster'', ''Homo sapiens'', ''Mus musculus'', ''Caenorhabditis elegans'', ''Helicobacter pylori'', ''Bos taurus'', HIV-1, ''Gallus gallus'' and ''Arabidopsis thaliana'', as well as others. In total, 901 taxa were included by September 2004, and BIND has been split up into BIND-Metazoa, BIND-Fungi, and BIND-Taxroot.

Not only is the information contained within the database continually updated, but the software itself has also gone through several revisions. Version 1.0 of BIND was released in 1999 and, based on user feedback, was modified to include additional detail on the experimental conditions required for binding and a hierarchical description of the cellular location of the interaction. Version 2.0 was released in 2001 and included the capability to link to information available in other databases. Version 3.0 (2002) expanded the database from physical/biochemical interactions to also include genetic interactions (Bader, GD, ''et al.'' BIND: the Biomolecular Interaction Network Database. ''Nucleic Acids Research'' 31: 248-250 (2003)). Version 3.5 (2004) included a refined user interface that aimed to simplify information retrieval. In 2006, BIND was incorporated into the Biomolecular Object Network Databank (BOND), where it continues to be updated and improved.
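The September 2004 interaction counts quoted earlier in this section can be tallied to check the "over 100,000" figure. This is only a sanity check on the numbers given in the text; the dictionary layout is an illustrative choice.

```python
# Interaction counts by type, as quoted in the text for September 2004.
interaction_counts = {
    "protein-protein": 58_266,
    "genetic": 4_225,
    "protein-small molecule": 874,
    "protein-DNA": 25_857,
    "biopolymer": 19_348,
}

total = sum(interaction_counts.values())
print(total)  # 108570, consistent with "over 100,000"

# Share of each interaction type, largest first.
for kind, count in sorted(interaction_counts.items(), key=lambda kv: -kv[1]):
    print(f"{kind}: {count / total:.1%}")
```

Protein-protein interactions make up a little over half of the records in this breakdown.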


Special features

BIND was the first database of its kind to contain information on biomolecular interactions, reactions and pathways in one schema. It is also the first to base its ontology on chemistry, which allows 3D representation of molecular interactions. The underlying chemistry allows molecular interactions to be described down to the atomic level of resolution.

PreBIND is an associated data mining system that locates biomolecular interaction information in the scientific literature. The name or accession number of a protein can be entered, and PreBIND will scan the literature and return a list of potentially interacting proteins. BIND BLAST is also available to find interactions with proteins that are similar to the one specified in the query.

BIND offers several features that many other proteomics databases do not include. Its authors have created an extension to traditional IUPAC nomenclature to help describe post-translational modifications that occur to amino acids. These modifications include acetylation, formylation, methylation, palmitoylation, etc. The extension of the traditional IUPAC codes allows these modified amino acids to be represented in sequence form as well.

BIND also utilizes a unique visualization tool known as OntoGlyphs. The OntoGlyphs were developed based on Gene Ontology (GO) and provide a link back to the original GO information. A number of GO terms have been grouped into categories, each one representing a specific function, binding specificity, or localization in the cell. There are 83 OntoGlyph characters in total. There are 34 functional OntoGlyphs, which contain information about the role of the molecule (e.g. cell physiology, ion transport, signaling). There are 25 binding OntoGlyphs, which describe what the molecule binds (e.g. ligands, DNA, ions). The other 24 OntoGlyphs provide information about the location of the molecule within a cell (e.g. nucleus, cytoskeleton). The OntoGlyphs can be selected and manipulated to include or exclude certain characteristics from search results. The visual nature of the OntoGlyphs also facilitates pattern recognition when looking at search results.

ProteoGlyphs are graphical representations of the structural and binding properties of proteins at the level of conserved domains. The protein is diagrammed as a straight horizontal line, and glyphs are inserted to represent conserved domains. Each glyph is displayed to represent the relative position and length of its alignment in the protein sequence.
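The idea of using OntoGlyphs to include or exclude characteristics from search results can be sketched as a set-based filter. The record layout, identifiers and function name below are hypothetical and do not reflect BIND's real interface; the glyph labels are taken from the examples given in the text.

```python
from typing import AbstractSet, Dict, List

# Counts stated in the text: 34 functional + 25 binding + 24 location glyphs.
GLYPH_CATEGORY_COUNTS = {"function": 34, "binding": 25, "location": 24}


def filter_by_glyphs(
    records: List[Dict],
    include: AbstractSet[str] = frozenset(),
    exclude: AbstractSet[str] = frozenset(),
) -> List[Dict]:
    """Keep records carrying every glyph in `include` and none of those in `exclude`."""
    return [
        record
        for record in records
        if include <= record["glyphs"] and not (exclude & record["glyphs"])
    ]


# Made-up search results, each tagged with a set of glyph labels.
records = [
    {"id": "rec-1", "glyphs": {"signaling", "DNA", "nucleus"}},
    {"id": "rec-2", "glyphs": {"ion transport", "ions", "cytoskeleton"}},
    {"id": "rec-3", "glyphs": {"signaling", "ligands", "cytoskeleton"}},
]

hits = filter_by_glyphs(records, include={"signaling"}, exclude={"nucleus"})
print([r["id"] for r in hits])  # ['rec-3']
```

Treating each record's glyphs as a set keeps both the include and exclude checks to a single subset or intersection test.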


Accessing the database

The database user interface is web-based and can be queried using text or accession numbers/identifiers. Since its integration with the other components of BOND, sequences have been added to interactions, molecular complexes and pathways in the results. Records include information on: BIND ID, description of the interaction/complex/pathway, publications, update records, organism, OntoGlyphs, ProteoGlyphs, and links to other databases where additional information can be found. BIND records include various viewing formats (e.g. HTML, ASN.1, XML, FASTA), various formats for exporting results (e.g. ASN.1, XML, GI list, PDF), and visualizations (e.g. Cytoscape). The exact viewing and exporting options vary depending on what type of data has been retrieved.


User statistics

The number of Unleashed registrants has increased 10-fold since the integration of BIND. As of December 2006, registration fell just short of 10,000. Subscribers to the commercial versions of BOND fall into six general categories: agriculture and food, biotechnology, pharmaceuticals, informatics, materials, and other. The biotechnology sector is the largest of these groups, holding 28% of subscriptions. Pharmaceuticals and informatics follow with 22% and 18% respectively. The United States holds the bulk of these subscriptions, at 69%. Other countries with access to the commercial versions of BOND include Canada, the United Kingdom, Japan, China, Korea, Germany, France, India and Australia. All of these countries fall below 6% in user share.

