
This page describes mining for molecules. Since molecules may be represented by molecular graphs, this is strongly related to graph mining and structured data mining. The main problem is how to represent molecules while discriminating between the data instances. One way to do this is to use chemical similarity metrics, which have a long tradition in the field of cheminformatics. Typical approaches to calculating chemical similarities use chemical fingerprints, but this loses the underlying information about the molecule topology. Mining the molecular graphs directly avoids this problem. So does the inverse QSAR problem, which is preferable for vectorial mappings.
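As a minimal sketch of fingerprint-based similarity, the following Python snippet computes the Tanimoto coefficient between two fingerprints stored as sets of "on" bit positions. The function name `tanimoto` and the bit positions are assumptions made up for illustration; they do not come from any real fingerprinting scheme.

```python
from __future__ import annotations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two fingerprint bit sets."""
    if not fp_a and not fp_b:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 1.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints: each integer is the index of a set bit.
fp_ethanol = {3, 17, 42, 101}
fp_propanol = {3, 17, 42, 77, 101}

similarity = tanimoto(fp_ethanol, fp_propanol)
print(f"Tanimoto similarity: {similarity:.2f}")
```

Because only the set bits are compared, two molecules with different connectivity can receive the same fingerprint, which is the loss of topology information mentioned above.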


Coding(Molecule_i, Molecule_{j \neq i})


Kernel methods

* Marginalized graph kernel: H. Kashima, K. Tsuda, A. Inokuchi, ''Marginalized Kernels Between Labeled Graphs'', The 20th International Conference on Machine Learning (ICML 2003), 2003.
* Optimal assignment kernel: H. Fröhlich, J. K. Wegner, A. Zell, ''Optimal Assignment Kernels For Attributed Molecular Graphs'', The 22nd International Conference on Machine Learning (ICML 2005), Omnipress, Madison, WI, USA, 2005, 225-232; H. Fröhlich, J. K. Wegner, A. Zell, ''Assignment Kernels For Chemical Compounds'', International Joint Conference on Neural Networks 2005 (IJCNN'05), 2005, 913-918.
* Pharmacophore kernel
* C++ (and R) implementation combining
** the marginalized graph kernel between labeled graphs
** extensions of the marginalized kernel
** Tanimoto kernels
** graph kernels based on tree patterns
** kernels based on pharmacophores for 3D structure of molecules
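To give a feel for how a graph kernel scores a pair of molecular graphs, here is a small sketch of a truncated walk-count kernel. It is not the marginalized kernel of Kashima et al., which weights walks by random-walk probabilities. It simply counts pairs of walks, one in each graph, that spell the same sequence of atom labels, with longer walks down-weighted by a decay factor. The helper `make_graph`, the parameters `max_len` and `decay`, and the decision to ignore bond labels are all assumptions of this example.

```python
def make_graph(atom_labels, bonds):
    """Build (labels, adjacency) from a string of element symbols and a bond list."""
    labels = dict(enumerate(atom_labels))
    adj = {i: [] for i in labels}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return labels, adj


def walk_kernel(g1, g2, max_len=3, decay=0.5):
    """Weighted count of label-matching walk pairs with up to max_len steps."""
    labels1, adj1 = g1
    labels2, adj2 = g2
    # counts[(u, v)]: number of matching walk pairs of the current length
    # that end at atom u in g1 and atom v in g2.
    counts = {
        (u, v): 1.0
        for u in labels1
        for v in labels2
        if labels1[u] == labels2[v]
    }
    total = sum(counts.values())
    weight = 1.0
    for _ in range(max_len):
        weight *= decay
        extended = {}
        for (u, v), c in counts.items():
            for u2 in adj1[u]:
                for v2 in adj2[v]:
                    if labels1[u2] == labels2[v2]:
                        extended[(u2, v2)] = extended.get((u2, v2), 0.0) + c
        counts = extended
        total += weight * sum(counts.values())
    return total


# Heavy-atom skeletons only; hydrogens are left out for brevity.
ethanol = make_graph("CCO", [(0, 1), (1, 2)])
propanol = make_graph("CCCO", [(0, 1), (1, 2), (2, 3)])

print(walk_kernel(ethanol, propanol))
```

The value can be normalized, for example by dividing by the square root of the product of the two self-similarities, if a score between 0 and 1 is wanted.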


Maximum Common Graph methods

* MCS-HSCS (Highest Scoring Common Substructure (HSCS) ranking strategy for single MCS)
* Small Molecule Subgraph Detector (SMSD): a Java-based software library for calculating the Maximum Common Subgraph (MCS) between small molecules. The MCS can be used to derive a similarity or distance between two molecules. It is also used for screening drug-like compounds by retrieving molecules that share a common subgraph (substructure).
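The idea of turning an MCS into a similarity score can be sketched with a brute-force search that is only practical for very small molecules. This is not the algorithm used by SMSD or MCS-HSCS. It finds the largest common induced subgraph (not required to be connected) by backtracking over atom mappings, then applies a Tanimoto-style ratio. The function names and the similarity formula are assumptions of this example.

```python
def mcs_size(labels1, bonds1, labels2, bonds2):
    """Number of atoms in a maximum common induced subgraph (brute force)."""
    atoms1 = list(labels1)
    e1 = {frozenset(b) for b in bonds1}
    e2 = {frozenset(b) for b in bonds2}
    best = 0

    def extend(i, mapping):
        nonlocal best
        best = max(best, len(mapping))
        # Stop when no atoms remain or the remaining atoms cannot beat best.
        if i == len(atoms1) or len(mapping) + len(atoms1) - i <= best:
            return
        u = atoms1[i]
        used = set(mapping.values())
        for v in labels2:
            if v in used or labels1[u] != labels2[v]:
                continue
            # Bonds and non-bonds to already mapped atoms must agree.
            if all(
                (frozenset((u, p)) in e1) == (frozenset((v, q)) in e2)
                for p, q in mapping.items()
            ):
                mapping[u] = v
                extend(i + 1, mapping)
                del mapping[u]
        extend(i + 1, mapping)  # alternative: leave atom u unmatched

    extend(0, {})
    return best


def mcs_similarity(labels1, bonds1, labels2, bonds2):
    """Tanimoto-style similarity: |MCS| / (|A| + |B| - |MCS|)."""
    common = mcs_size(labels1, bonds1, labels2, bonds2)
    return common / (len(labels1) + len(labels2) - common)


# Heavy-atom skeletons of ethanol and 1-propanol.
ethanol = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
propanol = ({0: "C", 1: "C", 2: "C", 3: "O"}, [(0, 1), (1, 2), (2, 3)])

print(mcs_similarity(*ethanol, *propanol))
```

The MCS problem is NP-hard in general, so production tools rely on more elaborate search strategies and chemistry-aware matching rules.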


Coding(Molecule_i)


Molecular query methods

* Warmr: L. Dehaspe, H. Toivonen, R. D. King, ''Finding frequent substructures in chemical compounds'', 4th International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1998, 30-36.
* AGM: A. Inokuchi, T. Washio, T. Okada, H. Motoda, ''Applying the Apriori-based Graph Mining Method to Mutagenesis Data Analysis'', ''Journal of Computer Aided Chemistry'', 2001, 2, 87-92; A. Inokuchi, T. Washio, K. Nishimura, H. Motoda, ''A Fast Algorithm for Mining Frequent Connected Subgraphs'', IBM Research, Tokyo Research Laboratory, 2002.
* PolyFARM: A. Clare, R. D. King, ''Data mining the yeast genome in a lazy functional language'', Practical Aspects of Declarative Languages (PADL2003), 2003.
* FSG
* MolFea
* MoFa/MoSS: T. Meinl, C. Borgelt, M. R. Berthold, ''Discriminative Closed Fragment Mining and Perfect Extensions in MoFa'', Proceedings of the Second Starting AI Researchers Symposium (STAIRS 2004), 2004; T. Meinl, C. Borgelt, M. R. Berthold, M. Philippsen, ''Mining Fragments with Fuzzy Chains in Molecular Databases'', Second International Workshop on Mining Graphs, Trees and Sequences (MGTS2004), 2004.
* Gaston: S. Nijssen, J. N. Kok, ''Frequent Graph Mining and its Application to Molecular Databases'', Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC2004), 2004.
* LAZAR: C. Helma, ''Predictive Toxicology'', CRC Press, 2005.
* ParMol (contains MoFa, FFSM, gSpan, and Gaston): M. Wörlein, ''Extension and parallelization of a graph-mining-algorithm'', Friedrich-Alexander-Universität, 2006.
* optimized gSpan: K. Jahn, S. Kramer, ''Optimizing gSpan for Molecular Datasets'', Proceedings of the Third International Workshop on Mining Graphs, Trees and Sequences (MGTS-2005), 2005; X. Yan, J. Han, ''gSpan: Graph-Based Substructure Pattern Mining'', Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), IEEE Computer Society, 2002, 721-724.
* SMIREP
* DMax
* SAm/AIm/RHC
* AFGen
* gRed: A. Gago Alonso, J. E. Medina Pagola, J. A. Carrasco-Ochoa, J. F. Martínez-Trinidad, ''Mining Frequent Connected Subgraphs Reducing the Number of Candidates'', Proc. of ECML PKDD, 2008, pp. 365–376.
* G-Hash: Xiaohong Wang, Jun Huan, Aaron Smalter, Gerald Lushington, ''Application of Kernel Functions for Accurate Similarity Search in Large Chemical Databases'', ''BMC Bioinformatics'', Vol. 11 (Suppl 3):S8, 2010.
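Many of the tools listed above search for substructures that occur in at least a minimum number of molecules. The following sketch illustrates that support-counting idea in a heavily simplified form: it only considers linear paths of atom labels, ignores bond types, and enumerates fragments exhaustively instead of using the canonical codes and candidate pruning that miners such as gSpan or MoFa rely on. The helper names, `min_support`, and `max_atoms` are assumptions of this example.

```python
from collections import Counter


def chain(atom_labels):
    """Build (labels, adjacency) for an unbranched chain of atoms."""
    labels = dict(enumerate(atom_labels))
    adj = {i: [j for j in (i - 1, i + 1) if j in labels] for i in labels}
    return labels, adj


def path_fragments(labels, adj, max_atoms):
    """Canonical label sequences of all simple paths with up to max_atoms atoms."""
    found = set()

    def walk(path):
        seq = tuple(labels[a] for a in path)
        # A path and its reverse are the same fragment; keep the smaller tuple.
        found.add(min(seq, seq[::-1]))
        if len(path) == max_atoms:
            return
        for nb in adj[path[-1]]:
            if nb not in path:
                walk(path + [nb])

    for atom in labels:
        walk([atom])
    return found


def frequent_fragments(molecules, min_support, max_atoms=3):
    """Fragments contained in at least min_support molecules, with their support."""
    support = Counter()
    for labels, adj in molecules:
        # Each molecule contributes at most once per fragment.
        support.update(path_fragments(labels, adj, max_atoms))
    return {frag: n for frag, n in support.items() if n >= min_support}


# Heavy-atom skeletons: ethanol, 1-propanol, ethane.
molecules = [chain("CCO"), chain("CCCO"), chain("CC")]

frequent = frequent_fragments(molecules, min_support=2)
for fragment, count in sorted(frequent.items()):
    print("-".join(fragment), count)
```

Support is anti-monotone: a fragment can never be more frequent than any of its sub-fragments. Level-wise and pattern-growth miners exploit this property to discard candidates early.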


Methods based on special architectures of neural networks

* BPZ
* ChemNet
* CCS
* MolNet
* Graph machines


See also

* Molecular Query Language
* Chemical graph theory
* QSAR
* ADME
* Partition coefficient



Further reading

* Schölkopf, B., K. Tsuda and J. P. Vert: ''Kernel Methods in Computational Biology'', MIT Press, Cambridge, MA, 2004.
* R.O. Duda, P.E. Hart, D.G. Stork, ''Pattern Classification'', John Wiley & Sons, 2001.
* Gusfield, D., ''Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology'', Cambridge University Press, 1997.
* R. Todeschini, V. Consonni, ''Handbook of Molecular Descriptors'', Wiley-VCH, 2000. ISBN 3-527-29913-0


External links


* Small Molecule Subgraph Detector (SMSD): a Java-based software library for calculating the Maximum Common Subgraph (MCS) between small molecules.
* 5th International Workshop on Mining and Learning with Graphs, 2007
* ParMol and master thesis documentation: Java, open source, distributed mining, benchmark algorithm library
* TU München, Kramer group
* Molecule mining (advanced chemical expert systems)
* DMax Chemistry Assistant: commercial software
* AFGen: software for generating fragment-based descriptors