HOME

TheInfoList



OR:

In
theoretical computer science computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, lambda calculus, and type theory. It is difficult to circumscribe the ...
, the subgraph isomorphism problem is a computational task in which two graphs ''G'' and ''H'' are given as input, and one must determine whether ''G'' contains a subgraph that is
isomorphic In mathematics, an isomorphism is a structure-preserving mapping between two structures of the same type that can be reversed by an inverse mapping. Two mathematical structures are isomorphic if an isomorphism exists between them. The word i ...
to ''H''. Subgraph isomorphism is a generalization of both the
maximum clique problem In computer science, the clique problem is the computational problem of finding cliques (subsets of vertices, all adjacent to each other, also called complete subgraphs) in a graph. It has several different formulations depending on which cli ...
and the problem of testing whether a graph contains a
Hamiltonian cycle In the mathematical field of graph theory, a Hamiltonian path (or traceable path) is a path in an undirected or directed graph that visits each vertex exactly once. A Hamiltonian cycle (or Hamiltonian circuit) is a cycle that visits each vertex ...
, and is therefore
NP-complete In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by trying ...
. However certain other cases of subgraph isomorphism may be solved in polynomial time. Sometimes the name subgraph matching is also used for the same problem. This name puts emphasis on finding such a subgraph as opposed to the bare decision problem.


Decision problem and computational complexity

To prove subgraph isomorphism is NP-complete, it must be formulated as a
decision problem In computability theory and computational complexity theory, a decision problem is a computational problem that can be posed as a yes–no question of the input values. An example of a decision problem is deciding by means of an algorithm whe ...
. The input to the decision problem is a pair of graphs ''G'' and ''H''. The answer to the problem is positive if ''H'' is isomorphic to a subgraph of ''G'', and negative otherwise. Formal question: Let G=(V,E), H=(V^\prime,E^\prime) be graphs. Is there a subgraph G_0=(V_0,E_0) \mid V_0\subseteq V, E_0\subseteq E\cap(V_0\times V_0) such that G_0\cong H? I. e., does there exist a
bijection In mathematics, a bijection, also known as a bijective function, one-to-one correspondence, or invertible function, is a function between the elements of two sets, where each element of one set is paired with exactly one element of the other ...
f\colon V_0\rightarrow V^\prime such that \ \in E_0 \iff \ \in E^\prime? The proof of subgraph isomorphism being NP-complete is simple and based on reduction of the
clique problem In computer science, the clique problem is the computational problem of finding cliques (subsets of vertices, all adjacent to each other, also called complete subgraphs) in a graph. It has several different formulations depending on which cli ...
, an NP-complete decision problem in which the input is a single graph ''G'' and a number ''k'', and the question is whether ''G'' contains a complete subgraph with ''k'' vertices. To translate this to a subgraph isomorphism problem, simply let ''H'' be the complete graph ''K''''k''; then the answer to the subgraph isomorphism problem for ''G'' and ''H'' is equal to the answer to the clique problem for ''G'' and ''k''. Since the clique problem is NP-complete, this
polynomial-time many-one reduction In computational complexity theory, a polynomial-time reduction is a method for solving one problem using another. One shows that if a hypothetical subroutine solving the second problem exists, then the first problem can be solved by transforming ...
shows that subgraph isomorphism is also NP-complete. An alternative reduction from the
Hamiltonian cycle In the mathematical field of graph theory, a Hamiltonian path (or traceable path) is a path in an undirected or directed graph that visits each vertex exactly once. A Hamiltonian cycle (or Hamiltonian circuit) is a cycle that visits each vertex ...
problem translates a graph ''G'' which is to be tested for Hamiltonicity into the pair of graphs ''G'' and ''H'', where ''H'' is a cycle having the same number of vertices as ''G''. Because the Hamiltonian cycle problem is NP-complete even for
planar graphs In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints. In other words, it can be drawn in such a way that no edges cro ...
, this shows that subgraph isomorphism remains NP-complete even in the planar case. Subgraph isomorphism is a generalization of the
graph isomorphism problem The graph isomorphism problem is the computational problem of determining whether two finite graphs are isomorphic. The problem is not known to be solvable in polynomial time nor to be NP-complete, and therefore may be in the computational compl ...
, which asks whether ''G'' is isomorphic to ''H'': the answer to the graph isomorphism problem is true if and only if ''G'' and ''H'' both have the same numbers of vertices and edges and the subgraph isomorphism problem for ''G'' and ''H'' is true. However the complexity-theoretic status of graph isomorphism remains an open question. In the context of the
Aanderaa–Karp–Rosenberg conjecture In theoretical computer science, the Aanderaa–Karp–Rosenberg conjecture (also known as the Aanderaa–Rosenberg conjecture or the evasiveness conjecture) is a group of related conjectures about the number of questions of the form "Is there an ...
on the
query complexity In computational complexity the decision tree model is the model of computation in which an algorithm is considered to be basically a decision tree, i.e., a sequence of ''queries'' or ''tests'' that are done adaptively, so the outcome of the previ ...
of monotone graph properties, showed that any subgraph isomorphism problem has query complexity Ω(''n''3/2); that is, solving the subgraph isomorphism requires an algorithm to check the presence or absence in the input of Ω(''n''3/2) different edges in the graph.


Algorithms

describes a recursive backtracking procedure for solving the subgraph isomorphism problem. Although its running time is, in general, exponential, it takes polynomial time for any fixed choice of ''H'' (with a polynomial that depends on the choice of ''H''). When ''G'' is a
planar graph In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints. In other words, it can be drawn in such a way that no edges cro ...
(or more generally a graph of
bounded expansion In graph theory, a family of graphs is said to have bounded expansion if all of its shallow minors are sparse graphs. Many natural families of sparse graphs have bounded expansion. A closely related but stronger property, polynomial expansion, i ...
) and ''H'' is fixed, the running time of subgraph isomorphism can be reduced to
linear time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
.; is a substantial update to the 1976 subgraph isomorphism algorithm paper. proposed in 2004 another algorithm based on Ullmann's, VF2, which improves the refinement process using different heuristics and uses significantly less memory. proposed a better algorithm, which improves the initial order of the vertices using some heuristics. The current state of the art solver for moderately-sized, hard instances is the Glasgow Subgraph Solver (). This solver adopts a constraint programming approach, using bit-parallel data structures and specialized propagation algorithms for performance. It supports most common variations of the problem and is capable of counting or enumerating solutions as well as deciding whether one exists. For large graphs, state-of-the art algorithms include CFL-Match and Turboiso, and extensions thereupon such as DAF by .


Applications

As subgraph isomorphism has been applied in the area of
cheminformatics Cheminformatics (also known as chemoinformatics) refers to use of physical chemistry theory with computer and information science techniques—so called "''in silico''" techniques—in application to a range of descriptive and prescriptive problem ...
to find similarities between chemical compounds from their structural formula; often in this area the term substructure search is used. A query structure is often defined graphically using a
structure editor A structure editor, also structured editor or projectional editor, is any document editor that is cognizant of the document's underlying structure. Structure editors can be used to edit hierarchical or marked up text, computer programs, diagrams, c ...
program;
SMILES The simplified molecular-input line-entry system (SMILES) is a specification in the form of a line notation for describing the structure of chemical species using short ASCII strings. SMILES strings can be imported by most molecule editors ...
based database systems typically define queries using SMARTS, a
SMILES The simplified molecular-input line-entry system (SMILES) is a specification in the form of a line notation for describing the structure of chemical species using short ASCII strings. SMILES strings can be imported by most molecule editors ...
extension. The closely related problem of counting the number of isomorphic copies of a graph ''H'' in a larger graph ''G'' has been applied to pattern discovery in databases, the
bioinformatics Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combi ...
of protein-protein interaction networks, and in
exponential random graph Exponential may refer to any of several mathematical topics related to exponentiation, including: *Exponential function, also: **Matrix exponential, the matrix analogue to the above *Exponential decay, decrease at a rate proportional to value *Expo ...
methods for mathematically modeling
social network A social network is a social structure made up of a set of social actors (such as individuals or organizations), sets of dyadic ties, and other social interactions between actors. The social network perspective provides a set of methods fo ...
s. describe an application of subgraph isomorphism in the
computer-aided design Computer-aided design (CAD) is the use of computers (or ) to aid in the creation, modification, analysis, or optimization of a design. This software is used to increase the productivity of the designer, improve the quality of design, improve co ...
of
electronic circuits An electronic circuit is composed of individual electronic components, such as resistors, transistors, capacitors, inductors and diodes, connected by conductive wires or traces through which electric current can flow. It is a type of electrical ...
. Subgraph matching is also a substep in
graph rewriting In computer science, graph transformation, or graph rewriting, concerns the technique of creating a new graph out of an original graph algorithmically. It has numerous applications, ranging from software engineering (software construction and also ...
(the most runtime-intensive), and thus offered by graph rewrite tools. The problem is also of interest in
artificial intelligence Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by animals and humans. Example tasks in which this is done include speech ...
, where it is considered part of an array of
pattern matching In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be ...
in graphs problems; an extension of subgraph isomorphism known as
graph mining Structure mining or structured data mining is the process of finding and extracting useful information from semi-structured data sets. Graph mining, sequential pattern mining and molecule mining are special cases of structured data mining. Descrip ...
is also of interest in that area.http://www.aaai.org/Papers/Symposia/Fall/2006/FS-06-02/FS06-02-007.pdf; expanded version at https://e-reports-ext.llnl.gov/pdf/332302.pdf


See also

*
Frequent subtree mining In computer science, frequent subtree mining is the problem of finding all patterns in a given database whose support (a metric related to its number of occurrences in other subtrees) is over a given threshold. It is a more general form of the maxi ...
*
Induced subgraph isomorphism problem In complexity theory and graph theory, induced subgraph isomorphism is an NP-complete decision problem that involves finding a given graph as an induced subgraph of a larger graph. Problem statement Formally, the problem takes as input two gra ...
*
Maximum common edge subgraph problem Given two graphs G and G', the maximum common edge subgraph problem is the problem of finding a graph H with as many edges as possible which is isomorphic to both a subgraph of G and a subgraph of G'. The maximum common edge subgraph problem on g ...
*
Maximum common subgraph isomorphism problem In graph theory and theoretical computer science computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, lambda calculus, and t ...


Notes


References

*. *. *. A1.4: GT48, pg.202. *. * *. *. *. *. *. *. *. *. * * * * * {{DEFAULTSORT:Subgraph Isomorphism Problem NP-complete problems Graph algorithms Computational problems in graph theory