HOME

TheInfoList



OR:

In
structure mining Structure mining or structured data mining is the process of finding and extracting useful information from semi-structured data sets. Graph mining, sequential pattern mining and molecule mining are special cases of structured data mining. Descrip ...
, a graph kernel is a
kernel function In operator theory, a branch of mathematics, a positive-definite kernel is a generalization of a positive-definite function or a positive-definite matrix. It was first introduced by James Mercer in the early 20th century, in the context of solving ...
that computes an
inner product In mathematics, an inner product space (or, rarely, a Hausdorff space, Hausdorff pre-Hilbert space) is a real vector space or a complex vector space with an operation (mathematics), operation called an inner product. The inner product of two ve ...
on
graphs Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties *Graph (topology), a topological space resembling a graph in the sense of discre ...
. Graph kernels can be intuitively understood as functions measuring the similarity of pairs of graphs. They allow kernelized learning algorithms such as
support vector machine In machine learning, support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratorie ...
s to work directly on graphs, without having to do
feature extraction In machine learning, pattern recognition, and image processing, feature extraction starts from an initial set of measured data and builds derived values (features) intended to be informative and non-redundant, facilitating the subsequent learning a ...
to transform them to fixed-length, real-valued
feature vector In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon. Choosing informative, discriminating and independent features is a crucial element of effective algorithms in pattern r ...
s. They find applications in
bioinformatics Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combi ...
, in
chemoinformatics Cheminformatics (also known as chemoinformatics) refers to use of physical chemistry theory with computer and information science techniques—so called "''in silico''" techniques—in application to a range of descriptive and prescriptive proble ...
(as a type of
molecule kernel This page describes mining for molecules. Since molecules may be represented by molecular graphs this is strongly related to graph mining and structured data mining. The main problem is how to represent molecules while discriminating the data in ...
s), and in
social network analysis Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory. It characterizes networked structures in terms of ''nodes'' (individual actors, people, or things within the network) a ...
. Concepts of graph kernels have been around since the 1999, when D. Haussler introduced convolutional kernels on discrete structures. The term graph kernels was more officially coined in 2002 by R. I. Kondor and J. Lafferty as kernels ''on'' graphs, i.e. similarity functions between the nodes of a single graph, with the
World Wide Web The World Wide Web (WWW), commonly known as the Web, is an information system enabling documents and other web resources to be accessed over the Internet. Documents and downloadable media are made available to the network through web se ...
hyperlink In computing, a hyperlink, or simply a link, is a digital reference to data that the user can follow or be guided by clicking or tapping. A hyperlink points to a whole document or to a specific element within a document. Hypertext is text wit ...
graph as a suggested application. In 2003, Gaertner ''et al.'' and Kashima ''et al.'' defined kernels ''between'' graphs. In 2010, Vishwanathan ''et al.'' gave their unified framework. In 2018, Ghosh et al. described the history of graph kernels and their evolution over two decades.


Applications

The marginalized graph kernel has been shown to allow accurate predictions of the atomization energy of small organic molecules.


Example Kernels

An example of a kernel between graphs is the random walk kernel, which conceptually performs
random walk In mathematics, a random walk is a random process that describes a path that consists of a succession of random steps on some mathematical space. An elementary example of a random walk is the random walk on the integer number line \mathbb Z ...
s on two graphs simultaneously, then counts the number of
path A path is a route for physical travel – see Trail. Path or PATH may also refer to: Physical paths of different types * Bicycle path * Bridle path, used by people on horseback * Course (navigation), the intended path of a vehicle * Desire p ...
s that were produced by ''both'' walks. This is equivalent to doing random walks on the
direct product In mathematics, one can often define a direct product of objects already known, giving a new one. This generalizes the Cartesian product of the underlying sets, together with a suitably defined structure on the product set. More abstractly, one ta ...
of the pair of graphs, and from this, a kernel can be derived that can be efficiently computed. Another examples is the Weisfeiler-Lehman graph kernelShervashidze, Nino, et al. "Weisfeiler-lehman graph kernels." Journal of Machine Learning Research 12.9 (2011). which computes multiple rounds of the Weisfeiler-Lehman algorithm and then computes the similarity of two graphs as the inner product of the histogram vectors of both graphs. In those histogram vectors the kernel collects the number of times a color occurs in the graph in every iteration. Note that the Weisfeiler-Lehman kernel in theory has an infinite dimension as the number of possible colors assigned by the Weisfeiler-Lehman algorithm is infinite. By restricting to the colors that occur in both graphs, the computation is still feasible.


See also

*
Tree kernel In machine learning, tree kernels are the application of the more general concept of positive-definite kernel to tree structures. They find applications in natural language processing, where they can be used for Machine learning, machine-learned par ...
, as special case of non-cyclic graphs *
Molecule mining This page describes mining for molecules. Since molecules may be represented by molecular graphs this is strongly related to graph mining and structured data mining. The main problem is how to represent molecules while discriminating the data inst ...
, as special case of small multi-label graphs


References

Graph algorithms Kernel methods for machine learning {{comp-sci-stub