History
The original SMILES specification was initiated by David Weininger at the USEPA Mid-Continent Ecology Division Laboratory in ...
Terminology
The term SMILES refers to a line notation for encoding molecular structures, and specific instances should strictly be called SMILES strings. However, the term SMILES is also commonly used to refer to both a single SMILES string and a number of SMILES strings; the exact meaning is usually apparent from the context. The terms "canonical" and "isomeric" can lead to some confusion when applied to SMILES. The terms describe different attributes of SMILES strings and are not mutually exclusive. Typically, a number of equally valid SMILES strings can be written for a molecule. For example, CCO, OCC and C(O)C all specify the structure of ethanol.
Graph-based definition
In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a ...
SMILES definition as strings of a context-free language
From the viewpoint of formal language theory, SMILES is a word. A SMILES is parsable with a context-free parser. This representation has been used in the prediction of biochemical properties (including toxicity and ...)
Description
Atoms
Atoms are represented by the standard abbreviation of the chemical elements, in square brackets, such as [Au] for gold. Brackets may be omitted in the common case of atoms which:
# are in the "organic subset" of B, C, N, O, P, S, F, Cl, Br, or I, and
# have no formal charge, and
# have the number of hydrogens attached implied by the SMILES valence model (typically their normal valence, but for N and P it is 3 or 5, and for S it is 2, 4 or 6), and
# are the normal isotopes, and
# are not chiral centers.
All other elements must be enclosed in brackets, and have charges and hydrogens shown explicitly. For instance, the SMILES for water may be written as either O or [OH2]. Hydrogen may also be written as a separate atom; water may also be written as [H]O[H].
When brackets are used, the symbol H is added if the atom in brackets is bonded to one or more hydrogen, followed by the number of hydrogen atoms if greater than 1, then by the sign + for a positive charge or by - for a negative charge. For example, [NH4+] for ammonium (NH4+). If there is more than one charge, it is normally written as a digit; however, it is also possible to repeat the sign as many times as the ion has charges: one may write either [Ti+4] or [Ti++++] for titanium(IV) Ti4+. Thus, the hydroxide anion (OH−) is represented by [OH-], the hydronium cation (H3O+) is [OH3+] and the cobalt(III) cation (Co3+) is either [Co+3] or [Co+++].
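The bracket-atom rules above can be sketched as a small parser. This is only an illustration: the names `parse_bracket_atom` and `_BRACKET_ATOM` are hypothetical, and the sketch assumes a bracket atom that carries nothing but an element symbol, an optional hydrogen count and an optional charge (no isotope, chirality or aromatic symbols).

```python
import re

# Hypothetical helper, not taken from any SMILES toolkit.
# Assumption: only symbol, hydrogen count and charge are present.
_BRACKET_ATOM = re.compile(
    r"\[(?P<symbol>[A-Z][a-z]?)"        # element symbol, e.g. N, Ti, Co
    r"(?P<hcount>H\d*)?"                # H, H2, H3, ...
    r"(?P<charge>[+-]\d+|\++|-+)?\]"    # +4, -2, ++++, ---
)


def parse_bracket_atom(text):
    """Return (symbol, hydrogen_count, charge) for a simple bracket atom."""
    match = _BRACKET_ATOM.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported bracket atom: {text!r}")

    hydrogens = match.group("hcount")
    if hydrogens is None:
        hcount = 0
    elif hydrogens == "H":
        hcount = 1                      # a bare H means exactly one hydrogen
    else:
        hcount = int(hydrogens[1:])

    sign = match.group("charge")
    if sign is None:
        charge = 0
    elif sign[1:].isdigit():
        charge = int(sign)              # digit form, e.g. "+4"
    else:
        charge = len(sign) if sign[0] == "+" else -len(sign)  # repeated sign
    return match.group("symbol"), hcount, charge
```

Under these assumptions, `parse_bracket_atom("[Ti+4]")` and `parse_bracket_atom("[Ti++++]")` yield the same tuple, which mirrors the statement that both spellings denote titanium(IV).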
Bonds
A bond is represented using one of the symbols . - = # $ : / \.
Bonds between aliphatic atoms are assumed to be single unless specified otherwise and are implied by adjacency in the SMILES string. Although single bonds may be written as -, this is usually omitted. For example, the SMILES for ethanol may be written as C-C-O, CC-O or C-CO, but is usually written CCO.
Double, triple, and quadruple bonds are represented by the symbols =, #, and $ respectively, as illustrated by the SMILES O=C=O (carbon dioxide), C#N (hydrogen cyanide, HCN) and [Ga+]$[As-] (gallium arsenide).
An additional type of bond is a "non-bond", indicated with ., to indicate that two parts are not bonded together. For example, aqueous sodium chloride may be written as [Na+].[Cl-] to show the dissociation.
An aromatic "one and a half" bond may be indicated with :; see below.
Single bonds adjacent to double bonds may be represented using / or \ to indicate stereochemical configuration; see below.
Rings
Ring structures are written by breaking each ring at an arbitrary point (although some choices will lead to a more legible SMILES than others) to make an acyclic structure and adding numerical ring closure labels to show connectivity between non-adjacent atoms.
For example, cyclohexane and dioxane may be written as C1CCCCC1 and O1CCOCC1 respectively. For a second ring, the label will be 2. For example, decalin (decahydronaphthalene) may be written as C1CCCC2C1CCCC2.
SMILES does not require that ring numbers be used in any particular order, and permits ring number zero, although this is rarely used. Also, it is permitted to reuse ring numbers after the first ring has closed, although this usually makes formulae harder to read. For example, bicyclohexyl is usually written as C1CCCCC1C2CCCCC2, but it may also be written as C0CCCCC0C0CCCCC0.
Multiple digits after a single atom indicate multiple ring-closing bonds. For example, an alternative SMILES notation for decalin is C1CCCC2CCCCC12, where the final carbon participates in both ring-closing bonds 1 and 2. If two-digit ring numbers are required, the label is preceded by %, so C%12 is a single ring-closing bond of ring 12.
Either or both of the digits may be preceded by a bond type to indicate the type of the ring-closing bond. For example, cyclopropene is usually written C1=CC1, but if the double bond is chosen as the ring-closing bond, it may be written as C=1CC1, C1CC=1, or C=1CC=1. (The first form is preferred.) C=1CC-1 is illegal, as it explicitly specifies conflicting types for the ring-closing bond.
Ring-closing bonds may not be used to denote multiple bonds. For example, C1C1 is not a valid alternative to C=C for ethylene. However, they may be used with non-bonds; C1.C2.C12 is a peculiar but legal alternative way to write propane, more commonly written CCC.
Choosing a ring-break point adjacent to attached groups can lead to a simpler SMILES form by avoiding branches. For example, cyclohexane-1,2-diol is most simply written as OC1CCCCC1O; choosing a different ring-break location produces a branched structure that requires parentheses to write.
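The ring-closure rules in this section can be illustrated with a minimal parser that turns a SMILES string into a list of atoms and bonds. It is a sketch under stated assumptions, not a full SMILES implementation: `parse_smiles` and `TOKEN` are hypothetical names, bracket atoms are kept as opaque tokens, and aromaticity, implicit hydrogens and stereochemistry are ignored.

```python
import re

# Assumed token set for this sketch: bracket atoms (opaque), organic-subset
# atoms, aromatic atoms, ring labels (%nn or one digit), bonds, dot, parens.
TOKEN = re.compile(
    r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]|%\d\d|\d|[-=#$:/\\.()]"
)


def parse_smiles(smiles):
    """Return (atoms, bonds); each bond is (i, j, symbol), symbol may be ''."""
    atoms, bonds = [], []
    prev = None           # atom that the next atom or ring label attaches to
    pending = ""          # bond symbol seen since the last atom or ring label
    branch_stack = []     # atoms to return to when a ')' is reached
    open_rings = {}       # ring label -> (atom index, bond symbol)
    pos = 0
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"unsupported character at position {pos}")
        tok, pos = match.group(), match.end()
        if tok == "(":
            branch_stack.append(prev)
        elif tok == ")":
            prev = branch_stack.pop()
        elif tok[0] in "-=#$:/\\.":
            pending = tok
        elif tok[0] == "%" or tok.isdigit():
            label = int(tok.lstrip("%"))
            if label in open_rings:
                # Second occurrence: close the ring and free the label.
                other, symbol = open_rings.pop(label)
                if symbol and pending and symbol != pending:
                    raise ValueError("conflicting ring-closing bond types")
                if other == prev or any(
                    {a, b} == {other, prev} for a, b, _ in bonds
                ):
                    raise ValueError("ring closure duplicates an existing bond")
                bonds.append((other, prev, symbol or pending))
            else:
                open_rings[label] = (prev, pending)
            pending = ""
        else:
            atoms.append(tok)
            index = len(atoms) - 1
            if prev is not None and pending != ".":
                bonds.append((prev, index, pending))
            prev, pending = index, ""
    if open_rings:
        raise ValueError("unclosed ring label")
    return atoms, bonds
```

With this sketch, C1CCCCC1C2CCCCC2 and C0CCCCC0C0CCCCC0 produce the same bond list because a label is released as soon as its ring closes, C1.C2.C12 yields the two bonds of propane, and C1C1 and C=1CC-1 are rejected.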
Aromaticity
Aromatic rings such as benzene may be written in one of three forms:
# In Kekulé form with alternating single and double bonds, e.g. C1=CC=CC=C1,
# Using the aromatic bond symbol :, e.g. C1:C:C:C:C:C1, or
# Most commonly, by writing the constituent B, C, N, O, P and S atoms in lower-case forms b, c, n, o, p and s, respectively.
In the latter case, bonds between two aromatic atoms are assumed (if not explicitly shown) to be aromatic bonds. Thus, benzene, pyridine and furan can be represented respectively by the SMILES c1ccccc1, n1ccccc1 and o1cccc1.
Aromatic nitrogen bonded to hydrogen, as found in pyrrole, must be represented as [nH]; thus imidazole is written in SMILES notation as n1c[nH]cc1.
When aromatic atoms are singly bonded to each other, such as in biphenyl, a single bond must be shown explicitly: c1ccccc1-c2ccccc2. This is one of the few cases where the single bond symbol - is required. (In fact, most SMILES software can correctly infer that the bond between the two rings cannot be aromatic and so will accept the nonstandard form c1ccccc1c2ccccc2.)
The Daylight and OpenEye algorithms for generating canonical SMILES differ in their treatment of aromaticity.
Branching
Branches are described with parentheses, as in CCC(=O)O for propionic acid and FC(F)F for fluoroform. The first atom within the parentheses, and the first atom after the parenthesized group, are both bonded to the same branch point atom. The bond symbol must appear inside the parentheses; placing it outside (e.g. CCC=(O)O) is invalid.
Substituted rings can be written with the branching point in the ring, as illustrated by the SMILES COc(c1)cccc1C#N and COc(cc1)ccc1C#N, which encode the 3- and 4-cyanoanisole isomers. Writing SMILES for substituted rings in this way can make them more human-readable.
Branches may be written in any order. For example, bromochlorodifluoromethane may be written as FC(Br)(Cl)F, BrC(F)(F)Cl, C(F)(Cl)(F)Br, or the like. Generally, a SMILES form is easiest to read if the simpler branch comes first, with the final, unparenthesized portion being the most complex. The only caveats to such rearrangements are:
* If ring numbers are reused, they are paired according to their order of appearance in the SMILES string. Some adjustments may be required to preserve the correct pairing.
* If stereochemistry is specified, adjustments must be made; see below.
The one form of branch which does ''not'' require parentheses is a ring-closing bond. Choosing ring-closing bonds appropriately can reduce the number of parentheses required. For example, toluene is normally written as Cc1ccccc1 or c1ccccc1C, avoiding the parentheses required if written as c1cc(C)ccc1 or c1cc(ccc1)C.
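The statement that branches may be written in any order can be checked with a small stack-based sketch. It assumes acyclic SMILES built only from unbracketed organic-subset atoms, ignores bond orders, and uses the hypothetical name `neighbor_signature`. Comparing signatures is a quick sanity check for small molecules, not a general graph-isomorphism test.

```python
import re
from collections import Counter

# Assumption: acyclic input, organic-subset atoms only, no bracket atoms.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#$()]")


def neighbor_signature(smiles):
    """Multiset of (atom symbol, sorted neighbor symbols) for every atom."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES for this sketch: {smiles!r}")

    atoms, neighbors = [], []
    prev = None       # atom the next atom bonds to
    stack = []        # branch points to return to on ')'
    for tok in tokens:
        if tok == "(":
            stack.append(prev)
        elif tok == ")":
            prev = stack.pop()
        elif tok in "-=#$":
            continue  # bond orders are ignored in this sketch
        else:
            atoms.append(tok)
            neighbors.append([])
            index = len(atoms) - 1
            if prev is not None:
                neighbors[prev].append(tok)
                neighbors[index].append(atoms[prev])
            prev = index
    return Counter(
        (symbol, tuple(sorted(nbrs))) for symbol, nbrs in zip(atoms, neighbors)
    )
```

Under these assumptions the three spellings of bromochlorodifluoromethane given above produce identical signatures, as do CCC(=O)O and OC(=O)CC.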
Stereochemistry
SMILES permits, but does not require, specification of stereoisomers.
Configuration around double bonds is specified using the characters / and \ to show directional single bonds adjacent to a double bond. For example, F/C=C/F is one representation of ''trans''-1,2-difluoroethylene, in which the fluorine atoms are on opposite sides of the double bond (as shown in the figure), whereas F/C=C\F is one possible representation of ''cis''-1,2-difluoroethylene, in which the fluorines are on the same side of the double bond.
Bond direction symbols always come in groups of at least two, of which the first is arbitrary. That is, F\C=C\F is the same as F/C=C/F. When alternating single-double bonds are present, the groups are larger than two, with the middle directional symbols being adjacent to two double bonds. For example, the common form of (2,4)-hexadiene is written C/C=C/C=C/C.
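The rule that only the relative direction of the two symbols matters can be sketched for the narrowest possible case: a single C=C bond with one unbracketed substituent atom on each side. The function name `double_bond_geometry` is hypothetical, and the sketch deliberately rejects anything outside that pattern.

```python
import re

# Assumption: input has exactly the shape X/C=C/Y, where X and Y are single
# organic-subset atoms and each slash may be '/' or '\'.
_PATTERN = re.compile(
    r"(Cl|Br|[BCNOPSFI])([/\\])C=C([/\\])(Cl|Br|[BCNOPSFI])"
)


def double_bond_geometry(smiles):
    """Return 'trans' if both direction symbols match, otherwise 'cis'."""
    match = _PATTERN.fullmatch(smiles)
    if match is None:
        raise ValueError(f"pattern not handled by this sketch: {smiles!r}")
    first, second = match.group(2), match.group(3)
    return "trans" if first == second else "cis"
```

Because the result depends only on whether the two symbols agree, F/C=C/F and F\C=C\F are classified identically, which matches the observation that the first symbol of a group is arbitrary.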
As a more complex example, beta-carotene has a very long backbone of alternating single and double bonds, which may be written CC1CCC/C(C)=C1/C=C/C(C)=C/C=C/C(C)=C/C=C/C=C(C)/C=C/C=C(C)/C=C/C2=C(C)/CCCC2(C)C.
Configuration at tetrahedral carbon is specified by @ or @@. Consider the four bonds in the order in which they appear, left to right, in the SMILES form. Looking toward the central carbon from the perspective of the first bond, the other three are either clockwise or counter-clockwise. These cases are indicated with @@ and @, respectively (because the @ symbol itself is a counter-clockwise spiral).
For example, consider the amino acid alanine. One of its SMILES forms is NC(C)C(=O)O, more fully written as N[CH](C)C(=O)O. L-Alanine, the more common enantiomer, is written as N[C@@H](C)C(=O)O. Looking from the nitrogen–carbon bond, the hydrogen (H), methyl (C), and carboxylate (C(=O)O) groups appear clockwise. D-Alanine can be written as N[C@H](C)C(=O)O.
While the order in which branches are specified in SMILES is normally unimportant, in this case it matters; swapping any two groups requires reversing the chirality indicator. If the branches are reversed so alanine is written as NC(C(=O)O)C, then the configuration also reverses; L-alanine is written as N[C@H](C(=O)O)C. Other ways of writing it include C[C@H](N)C(=O)O, OC(=O)[C@@H](N)C and OC(=O)[C@H](C)N.
Normally, the first of the four bonds appears to the left of the carbon atom, but if the SMILES is written beginning with the chiral carbon, such as C(C)(N)C(=O)O, then all four are to the right, but the first to appear (the H bond in this case) is used as the reference to order the following three: L-alanine may also be written [C@@H](C)(N)C(=O)O.
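The rule that swapping any two groups reverses the chirality indicator amounts to tracking the parity of the permutation between two neighbor orderings. The sketch below makes that explicit; `permutation_parity` and `retag` are hypothetical names, the neighbor labels are plain strings chosen for illustration, and the implicit hydrogen of a bracket atom is assumed to take its place immediately after the preceding atom (or first, when the chiral atom starts the string).

```python
def permutation_parity(reference, reordered):
    """Return 0 if reordered is an even permutation of reference, else 1."""
    position = {label: i for i, label in enumerate(reference)}
    perm = [position[label] for label in reordered]
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return inversions % 2


def retag(tag, reference, reordered):
    """Chirality tag ('@' or '@@') after rewriting the neighbors in a new order."""
    if sorted(reference) != sorted(reordered):
        raise ValueError("both orderings must list the same neighbors")
    flipped = {"@": "@@", "@@": "@"}
    # An odd number of pairwise swaps reverses the indicator.
    return flipped[tag] if permutation_parity(reference, reordered) else tag


# L-alanine written as N[C@@H](C)C(=O)O: neighbors in order of appearance.
reference = ["N", "H", "CH3", "COOH"]
print(retag("@@", reference, ["N", "H", "COOH", "CH3"]))   # N[C@H](C(=O)O)C
print(retag("@@", reference, ["H", "CH3", "N", "COOH"]))   # [C@@H](C)(N)C(=O)O
```

Applied to the orderings used in the alanine examples above, the sketch gives @ for N[C@H](C(=O)O)C, C[C@H](N)C(=O)O and OC(=O)[C@H](C)N, and @@ for OC(=O)[C@@H](N)C and [C@@H](C)(N)C(=O)O.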
The SMILES specification includes elaborations on the @ symbol to indicate stereochemistry around more complex chiral centers, such as trigonal bipyramidal molecular geometry.
Isotopes
Isotopes are specified with a number equal to the integer isotopic mass preceding the atomic symbol. Benzene in which one atom is carbon-14 is written as [14c]1ccccc1 and deuterochloroform is [2H]C(Cl)(Cl)Cl.
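As a small illustration of the isotope prefix, the following sketch extracts the mass number and atomic symbol from every bracket atom that carries one. The name `isotope_labels` is hypothetical, and the pattern only looks at the start of each bracket atom, so hydrogen counts, charges and chirality marks are ignored.

```python
import re

# Assumption: an isotope label is a run of digits directly after '['.
_ISOTOPE_ATOM = re.compile(r"\[(\d+)([A-Za-z][a-z]?)")


def isotope_labels(smiles):
    """Return [(mass_number, symbol), ...] for isotope-labelled bracket atoms."""
    return [(int(mass), symbol) for mass, symbol in _ISOTOPE_ATOM.findall(smiles)]
```

For the two examples above this gives a carbon (aromatic, lower-case c) of mass 14 and a hydrogen of mass 2, while an unlabelled string such as c1ccccc1 gives an empty list.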
Examples
To illustrate a molecule with more than 9 rings, consider cephalostatin-1, a steroidic 13-ringed pyrazine with the empirical formula C54H74N2O10 isolated from the Indian Ocean hemichordate ''Cephalodiscus gilchristi'':
Starting with the left-most methyl group in the figure:
:CC(C)(O1)C[C@@H](O)[C@@]1(O2)[C@@H](C)[C@@H]3CC=C4[C@]3(C2)C(=O)C[C@H]5[C@H]4CC[C@@H](C6)[C@]5(C)Cc(n7)c6nc(C[C@@]89(C))c7C[C@@H]8CC[C@@H]%10[C@@H]9C[C@@H](O)[C@@]%11(C)C%10=C[C@H](O%12)[C@]%11(O)[C@H](C)[C@]%12(O%13)[C@H](O)C[C@@]%13(C)CO
Note that % appears in front of the index of ring closure labels above 9; see above.
Other examples of SMILES
The SMILES notation is described extensively in the SMILES theory manual provided by Daylight Chemical Information Systems and a number of illustrative examples are presented. Daylight's depict utility provides users with the means to check their own examples of SMILES and is a valuable educational tool.
Extensions
SMARTS is a line notation for specification of substructural patterns in molecules. While it uses many of the same symbols as SMILES, it also allows specification of wildcard atoms and bonds, which can be used to define substructural queries for chemical database searching. One common misconception is that SMARTS-based substructural searching involves matching of SMILES and SMARTS strings. In fact, both SMILES and SMARTS strings are first converted to internal graph representations which are searched for subgraph isomorphism.
SMIRKS, a superset of "reaction SMILES" and a subset of "reaction SMARTS", is a line notation for specifying reaction transforms. The general syntax for the reaction extensions is REACTANT>AGENT>PRODUCT (without spaces), where any of the fields can either be left blank or filled with multiple molecules delimited with a dot (.), and other descriptions dependent on the base language. Atoms can additionally be identified with a number (e.g. [C:1]) for mapping.
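The REACTANT>AGENT>PRODUCT layout can be sketched as a two-level split. The function name `split_reaction` and the example reactions are illustrative assumptions, not taken from any specification; the sketch also assumes that no ring-closure label spans a dot, so each dot-separated part is a complete molecule.

```python
def split_reaction(reaction):
    """Split 'REACTANT>AGENT>PRODUCT' into three lists of SMILES strings."""
    fields = reaction.split(">")
    if len(fields) != 3:
        raise ValueError("expected exactly two '>' separators")
    # A blank field becomes an empty list; dots separate molecules in a field.
    return tuple([part for part in field.split(".") if part] for field in fields)


# Illustrative input: acid-catalysed esterification of acetic acid with ethanol.
reactants, agents, products = split_reaction("CC(=O)O.OCC>[H+]>CC(=O)OCC.O")
print(reactants, agents, products)
```

A blank agent field, as in C=C.[H][H]>>CC, simply yields an empty list in the middle position.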
SMILES corresponds to discrete molecular structures. However, many materials are macromolecules, which are too large (and often stochastic) for SMILES to be generated conveniently. BigSMILES is an extension of SMILES that aims to provide an efficient representation system for macromolecules.
Conversion
SMILES can be converted back to two-dimensional representations using structure diagram generation (SDG) algorithms. This conversion is not always unambiguous. Conversion to three-dimensional representation is achieved by energy-minimization approaches. There are many downloadable and web-based conversion utilities.
See also
* SMILES arbitrary target specification (SMARTS), an extension of SMILES for specification of substructural queries
* SYBYL Line Notation, another line notation
* International Chemical Identifier (InChI), the IUPAC's alternative to SMILES
* Molecular Query Language, a query language allowing also numerical properties, e.g. physicochemical values or distances
* Chemistry Development Kit, 2D layout and conversion software
* OpenBabel, JOELib, OELib (conversion)