The continuing development of the standard has been supported since
2010 by the not-for-profit InChI Trust, of which
The identifiers describe chemical substances in terms of layers of information: the atoms and their bond connectivity, tautomeric information, isotope information, stereochemistry, and electronic charge information. Not all layers have to be provided; for instance, the tautomer layer can be omitted if that type of information is not relevant to the particular application.
InChIs differ from the widely used CAS registry numbers in three respects:
* they are freely usable and non-proprietary;
* they can be computed from structural information and do not have to be assigned by some organization;
* most of the information in an InChI is human readable (with practice).
InChIs can thus be seen as akin to a general and extremely formalized
The InChI algorithm converts input structural information into a unique InChI identifier in a three-step process: normalization (to remove redundant information), canonicalization (to generate a unique number label for each atom), and serialization (to give a string of characters).
The InChIKey, sometimes referred to as a hashed InChI, is a fixed-length (27-character) condensed digital representation of the InChI that is not human-understandable. The InChIKey specification was released in September 2007 to facilitate web searches for chemical compounds, since these were problematic with the full-length InChI. Unlike the InChI, the InChIKey is not unique: although collisions can be calculated to be very rare, they do happen.
In January 2009 the final 1.02 version of the InChI software was released. This provided a means to generate the so-called standard InChI, which does not allow user-selectable options for handling the stereochemistry and tautomeric layers of the InChI string. The standard InChIKey is then the hashed version of the standard InChI string. The standard InChI simplifies comparison of InChI strings and keys that are generated by different groups and subsequently accessed via diverse sources such as databases and web resources.
FORMAT AND LAYERS
InChI format
* Internet media type: chemical/x-inchi
* Type of format: chemical file format
Every InChI starts with the string "InChI=" followed by the version number, currently 1. For standard InChIs, this is followed by the letter S; the standard InChI is a fully standardized InChI flavor maintaining the same level of attention to structure details and the same conventions for drawing perception. The remaining information is structured as a sequence of layers and sub-layers, with each layer providing one specific type of information. The layers and sub-layers are separated by the delimiter "/" and start with a characteristic prefix letter (except for the chemical formula sub-layer of the main layer). The six layers with important sublayers are:
* Main layer
  * Chemical formula (no prefix). This is the only sublayer that must occur in every InChI.
  * Atom connections (prefix: "c"). The atoms in the chemical formula (except for hydrogens) are numbered in sequence; this sublayer describes which atoms are connected by bonds to which other ones.
* Charge layer
  * Proton sublayer (prefix: "p" for "protons")
  * Charge sublayer (prefix: "q")
* Stereochemical layer
  * Double bonds and cumulenes (prefix: "b")
  * Tetrahedral stereochemistry of atoms and allenes (prefixes: "t", "m")
  * Type of stereochemistry information (prefix: "s")
* Isotopic layer (prefixes: "i", "h", as well as "b", "t", "m", "s" for isotopic stereochemistry)
* Fixed-H layer (prefix: "f"); contains some or all of the above types of layers except atom connections; may end with "o" sublayer; never included in standard InChI
* Reconnected layer (prefix: "r"); contains the whole InChI of a structure with reconnected metal atoms; never included in standard InChI
The delimiter-prefix format has the advantage that a user can easily use a wildcard search to find identifiers that match only in certain layers.
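As a rough illustration of how the delimiter-prefix format lends itself to layer-wise matching, the following Python sketch splits an InChI string on "/" and compares two identifiers on selected layers only. The function names (`parse_inchi`, `layer_value`, `same_in_layers`) and the label "formula" for the unprefixed sublayer are hypothetical and not part of any InChI software. The sketch assumes the first segment after the version is always the chemical formula and does not interpret the nested fixed-H or reconnected layers.

```python
from fnmatch import fnmatchcase


def parse_inchi(inchi):
    """Split an InChI into version, standard flag, and (prefix, value) layers."""
    if not inchi.startswith("InChI="):
        raise ValueError("an InChI must start with 'InChI='")
    header, *segments = inchi[len("InChI="):].split("/")
    layers = []
    for index, segment in enumerate(segments):
        if not segment:
            continue  # tolerate empty segments rather than fail
        if index == 0:
            # Assumption: the first segment is the unprefixed chemical formula.
            layers.append(("formula", segment))
        else:
            # Every other segment starts with its one-letter prefix.
            layers.append((segment[0], segment[1:]))
    return {
        "version": header.rstrip("S"),
        "standard": header.endswith("S"),
        "layers": layers,  # a list, because prefixes such as "t" can repeat
    }


def layer_value(parsed, prefix):
    """Return the first layer with the given prefix, or None if absent."""
    for found_prefix, value in parsed["layers"]:
        if found_prefix == prefix:
            return value
    return None


def same_in_layers(inchi_a, inchi_b, prefixes):
    """True if both InChIs agree on every listed layer prefix."""
    a, b = parse_inchi(inchi_a), parse_inchi(inchi_b)
    return all(layer_value(a, p) == layer_value(b, p) for p in prefixes)


ASCORBIC = "InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1"
ASCORBIC_STD = "InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1"

print(same_in_layers(ASCORBIC, ASCORBIC_STD, ["formula", "c"]))  # True
print(same_in_layers(ASCORBIC, ASCORBIC_STD, ["h"]))             # False

# A shell-style wildcard that fixes only the formula and connectivity.
print(fnmatchcase("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
                  "InChI=1*/C2H6O/c1-2-3/*"))                    # True
```

The layers are kept as an ordered list rather than a dictionary because some prefixes ("b", "t", "m", "s") can appear again inside the isotopic layer; `layer_value` returns only the first occurrence.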
EXAMPLES
Ethanol (CH3CH2OH):
* InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3
* InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3 (standard InChI)
L-ascorbic acid:
* InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1
* InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1 (standard InChI)
The condensed, 27-character InChIKey is a hashed version of the full
InChI (using the
InChIKeys consist of 14 characters resulting from a hash of the connectivity information of the InChI, followed by a hyphen, followed by 8 characters resulting from a hash of the remaining layers of the InChI, followed by a single character indicating the kind of InChIKey, followed by a single character indicating the version of InChI used, followed by another hyphen and a single character indicating protonation.
Example: The standard InChI for morphine is InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1 and the standard InChIKey for morphine is BQJCRHHNABKAKU-KBQPJGBKSA-N.
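The block structure described above can be checked mechanically. The following Python sketch splits an InChIKey into its five parts with a regular expression and applies it to the morphine key. The names `InChIKeyParts`, `split_inchikey`, and `same_connectivity` are hypothetical, and the sketch assumes every character is an uppercase letter, as in the morphine example. It validates only the shape of the key and does not compute or verify any hash.

```python
import re
from typing import NamedTuple

# 14 letters, hyphen, 8 letters + kind + version, hyphen, protonation.
# Assumption: all characters are uppercase A-Z, as in the morphine key.
_KEY_PATTERN = re.compile(r"([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])")


class InChIKeyParts(NamedTuple):
    connectivity_hash: str      # hash of the connectivity information
    remaining_hash: str         # hash of the remaining layers
    kind: str                   # kind of InChIKey
    version: str                # version of InChI used
    protonation: str            # protonation indicator


def split_inchikey(key):
    """Split a 27-character InChIKey into its parts, or raise ValueError."""
    match = _KEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return InChIKeyParts(*match.groups())


def same_connectivity(key_a, key_b):
    """True if two keys share the first (connectivity) block."""
    return (split_inchikey(key_a).connectivity_hash
            == split_inchikey(key_b).connectivity_hash)


MORPHINE_KEY = "BQJCRHHNABKAKU-KBQPJGBKSA-N"
parts = split_inchikey(MORPHINE_KEY)
print(len(MORPHINE_KEY))        # 27
print(parts.connectivity_hash)  # BQJCRHHNABKAKU
print(parts.remaining_hash)     # KBQPJGBK
print(parts.kind, parts.version, parts.protonation)  # S A N
```

Comparing only the first block, as `same_connectivity` does, is one way to search for keys that share connectivity while differing in the remaining layers; because the key is a hash, such a match is suggestive rather than conclusive.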
As the InChI cannot be reconstructed from the InChIKey, an InChIKey always needs to be linked to the original InChI to get back to the original structure. InChI resolvers act as a lookup service to make these links, and prototype services are available from the National Cancer Institute, the UniChem service at the European Bioinformatics Institute, and PubChem. ChemSpider had a resolver until July 2015, when it was decommissioned.
The format was originally called IChI (
Scientific direction of the InChI standard is carried out by the
The InChI has been adopted by many larger and smaller databases,