Information Hyperlinked over Proteins (or iHOP) is an online
text mining service that provides a gene-guided network to access
PubMed abstracts. The service was established by Robert Hoffmann and
Alfonso Valencia in 2004.
The concept underlying iHOP is that by using genes and proteins as hyperlinks between sentences and abstracts, the information in PubMed can be converted into one navigable resource. Navigating across interrelated sentences within this network, rather than relying on conventional keyword searches, allows for stepwise and controlled acquisition of information. Moreover, this literature network can be superimposed upon experimental interaction data to facilitate the simultaneous analysis of novel and existing knowledge. As of September 2014, the network presented in iHOP contained 28.4 million sentences and 110,000 genes from over 2,700 organisms, including the model organisms ''Homo sapiens'', ''Mus musculus'', ''Drosophila melanogaster'', ''Caenorhabditis elegans'', ''Danio rerio'', ''Arabidopsis thaliana'', ''Saccharomyces cerevisiae'' and ''Escherichia coli''.
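The idea of genes acting as hyperlinks between sentences can be illustrated with a minimal sketch. It is not iHOP's implementation: the sentence identifiers, the placeholder gene symbols (`GENE_A` and so on) and the function names are all assumptions made for illustration, and the gene mentions are tagged by hand rather than recognised automatically.

```python
from collections import defaultdict

# Hypothetical toy data: (sentence id, text, gene symbols mentioned).
# Gene mentions are tagged by hand; a real system would need
# automatic gene-name recognition.
SENTENCES = [
    ("s1", "GENE_A is activated by GENE_B.", {"GENE_A", "GENE_B"}),
    ("s2", "GENE_B binds GENE_C.", {"GENE_B", "GENE_C"}),
    ("s3", "GENE_D represses GENE_A.", {"GENE_D", "GENE_A"}),
]


def build_gene_index(sentences):
    """Map each gene symbol to the ids of the sentences that mention it."""
    index = defaultdict(list)
    for sentence_id, _text, genes in sentences:
        for gene in sorted(genes):
            index[gene].append(sentence_id)
    return dict(index)


def linked_genes(sentences, gene):
    """Return the genes reachable from `gene` through a shared sentence."""
    neighbours = set()
    for _sentence_id, _text, genes in sentences:
        if gene in genes:
            neighbours |= genes - {gene}
    return sorted(neighbours)


GENE_INDEX = build_gene_index(SENTENCES)
```

In this sketch, looking up a gene in `GENE_INDEX` plays the role of following a hyperlink to its sentences, and `linked_genes` lists the genes a reader could step to next.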
The iHOP system has shown that by navigating from gene to gene, distant medical and biological concepts may be connected by only a small number of genes; the shortest path between two genes has been shown to involve on average four intermediary genes.
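The notion of a shortest path between two genes can be sketched with a breadth-first search over a gene graph. This is a generic illustration rather than the method used to obtain the reported figure; the graph, its placeholder gene names and the function names are hypothetical.

```python
from collections import deque

# Hypothetical gene graph: an edge means the two genes co-occur
# in at least one sentence.
GRAPH = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}


def shortest_gene_path(graph, start, goal):
    """Breadth-first search; returns the list of genes on a shortest path,
    or None if `goal` cannot be reached from `start`."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        gene = queue.popleft()
        for neighbour in graph.get(gene, ()):
            if neighbour in previous:
                continue  # already visited
            previous[neighbour] = gene
            if neighbour == goal:
                # Walk the predecessor chain back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


def intermediaries(path):
    """Number of genes strictly between the two endpoints of a path."""
    return max(len(path) - 2, 0)
```

In this toy graph the path from `A` to `F` passes through four intermediary genes (`B`, `C`, `D` and `E`).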
The iHOP system architecture consists of two separate parts: the 'iHOP factory' and the web application. The iHOP factory manages the PubMed source data (text and gene data) and organises it within a PostgreSQL relational database. The iHOP factory also produces the relevant XML output for display by the web application.
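The split between a data-organising factory and a display layer that consumes XML can be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for PostgreSQL so that the example is self-contained, and the table layout, XML element names and function names are hypothetical and not taken from iHOP.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_store(rows):
    """Load (pmid, gene, body) rows into a relational table.
    SQLite is used here as a stand-in for PostgreSQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sentence (pmid TEXT, gene TEXT, body TEXT)")
    conn.executemany("INSERT INTO sentence VALUES (?, ?, ?)", rows)
    return conn


def gene_to_xml(conn, gene):
    """Serialise all stored sentences for one gene as an XML string
    that a separate web application could render."""
    root = ET.Element("gene", {"symbol": gene})
    cursor = conn.execute(
        "SELECT pmid, body FROM sentence WHERE gene = ? ORDER BY pmid",
        (gene,),
    )
    for pmid, body in cursor:
        node = ET.SubElement(root, "sentence", {"pmid": pmid})
        node.text = body
    return ET.tostring(root, encoding="unicode")


# Hypothetical rows; the identifiers are placeholders, not real PubMed ids.
ROWS = [
    ("0001", "GENE_A", "GENE_A is activated by GENE_B."),
    ("0002", "GENE_B", "GENE_B binds GENE_C."),
    ("0003", "GENE_A", "GENE_D represses GENE_A."),
]
```

Calling `gene_to_xml(build_store(ROWS), "GENE_A")` yields a `<gene>` element with one `<sentence>` child per stored row for that gene.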
iHOP is free to use and is licensed under a Creative Commons BY-ND license.
External links
iHOP server