
In
theoretical computer science
Theoretical computer science is a subfield of computer science and mathematics that focuses on the Abstraction, abstract and mathematical foundations of computation.
It is difficult to circumscribe the theoretical areas precisely. The Associati ...
, the subgraph isomorphism problem is a computational task in which two
graphs and
are given as input, and one must determine whether
contains a
subgraph that is
isomorphic
In mathematics, an isomorphism is a structure-preserving mapping or morphism between two structures of the same type that can be reversed by an inverse mapping. Two mathematical structures are isomorphic if an isomorphism exists between the ...
to
.
Subgraph isomorphism is a generalization of both the
maximum clique problem and the problem of testing whether a graph contains a
Hamiltonian cycle
In the mathematics, mathematical field of graph theory, a Hamiltonian path (or traceable path) is a path (graph theory), path in an undirected or directed graph that visits each vertex (graph theory), vertex exactly once. A Hamiltonian cycle (or ...
, and is therefore
NP-complete
In computational complexity theory, NP-complete problems are the hardest of the problems to which ''solutions'' can be verified ''quickly''.
Somewhat more precisely, a problem is NP-complete when:
# It is a decision problem, meaning that for any ...
. However certain other cases of subgraph isomorphism may be solved in polynomial time.
Sometimes the name subgraph matching is also used for the same problem. This name puts emphasis on finding such a subgraph as opposed to the bare decision problem.
Decision problem and computational complexity
To prove subgraph isomorphism is NP-complete, it must be formulated as a
decision problem
In computability theory and computational complexity theory, a decision problem is a computational problem that can be posed as a yes–no question on a set of input values. An example of a decision problem is deciding whether a given natura ...
. The input to the decision problem is a pair of graphs
and ''H''. The answer to the problem is positive if ''H'' is isomorphic to a subgraph of ''G'', and negative otherwise.
Formal question:
Let
,
be graphs. Is there a subgraph
such that
? I. e., does there exist a
bijection
In mathematics, a bijection, bijective function, or one-to-one correspondence is a function between two sets such that each element of the second set (the codomain) is the image of exactly one element of the first set (the domain). Equival ...
such that
?
The proof of subgraph isomorphism being NP-complete is simple and based on reduction of the
clique problem, an NP-complete decision problem in which the input is a single graph ''G'' and a number ''k'', and the question is whether ''G'' contains a
complete subgraph with ''k'' vertices. To translate this to a subgraph isomorphism problem, simply let ''H'' be the complete graph ''K''
''k''; then the answer to the subgraph isomorphism problem for ''G'' and ''H'' is equal to the answer to the clique problem for ''G'' and ''k''. Since the clique problem is NP-complete, this
polynomial-time many-one reduction shows that subgraph isomorphism is also NP-complete.
An alternative reduction from the
Hamiltonian cycle
In the mathematics, mathematical field of graph theory, a Hamiltonian path (or traceable path) is a path (graph theory), path in an undirected or directed graph that visits each vertex (graph theory), vertex exactly once. A Hamiltonian cycle (or ...
problem translates a graph ''G'' which is to be tested for Hamiltonicity into the pair of graphs ''G'' and ''H'', where ''H'' is a cycle having the same number of vertices as ''G''. Because the Hamiltonian cycle problem is NP-complete even for
planar graphs
In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints. In other words, it can be drawn in such a way that no edges cro ...
, this shows that subgraph isomorphism remains NP-complete even in the planar case.
Subgraph isomorphism is a generalization of the
graph isomorphism problem
The graph isomorphism problem is the computational problem of determining whether two finite graphs are isomorphic.
The problem is not known to be solvable in polynomial time nor to be NP-complete, and therefore may be in the computational c ...
, which asks whether ''G'' is isomorphic to ''H'': the answer to the graph isomorphism problem is true if and only if ''G'' and ''H'' both have the same numbers of vertices and edges and the subgraph isomorphism problem for ''G'' and ''H'' is true. However the complexity-theoretic status of graph isomorphism remains an open question.
In the context of the
Aanderaa–Karp–Rosenberg conjecture on the
query complexity of monotone graph properties, showed that any subgraph isomorphism problem has query complexity Ω(''n''
3/2); that is, solving the subgraph isomorphism requires an algorithm to check the presence or absence in the input of Ω(''n''
3/2) different edges in the graph.
Algorithms
describes a recursive
backtracking
Backtracking is a class of algorithms for finding solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as it de ...
procedure for solving the subgraph isomorphism problem. Although its running time is, in general, exponential, it takes polynomial time for any fixed choice of ''H'' (with a polynomial that depends on the choice of ''H''). When ''G'' is a
planar graph
In graph theory, a planar graph is a graph (discrete mathematics), graph that can be graph embedding, embedded in the plane (geometry), plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints. ...
(or more generally a graph of
bounded expansion) and ''H'' is fixed, the running time of subgraph isomorphism can be reduced to
linear time
In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations ...
.
[; ]
is a substantial update to the 1976 subgraph isomorphism algorithm paper.
proposed in 2004 another algorithm based on Ullmann's, VF2, which improves the refinement process using different heuristics and uses significantly less memory.
proposed a better algorithm, which improves the initial order of the vertices using some heuristics.
The current state of the art solver for moderately-sized, hard instances is the Glasgow Subgraph Solver (). This solver adopts a
constraint programming
Constraint programming (CP) is a paradigm for solving combinatorial problems that draws on a wide range of techniques from artificial intelligence, computer science, and operations research. In constraint programming, users declaratively state t ...
approach, using bit-parallel data structures and specialized propagation algorithms for performance. It supports most common variations of the problem and is capable of counting or enumerating solutions as well as deciding whether one exists.
For large graphs, state-of-the art algorithms include CFL-Match and Turboiso, and extensions thereupon such as DAF by .
Applications
As subgraph isomorphism has been applied in the area of
cheminformatics
Cheminformatics (also known as chemoinformatics) refers to the use of physical chemistry theory with computer and information science techniques—so called "'' in silico''" techniques—in application to a range of descriptive and prescriptive ...
to find similarities between chemical compounds from their
structural formula
The structural formula of a chemical compound is a graphic representation of the molecular structure (determined by structural chemistry methods), showing how the atoms are connected to one another. The chemical bonding within the molecule is al ...
; often in this area the term
substructure search is used. A query structure is often defined graphically using a
structure editor program;
SMILES based database systems typically define queries using
SMARTS, a
SMILES extension.
The closely related problem of counting the number of isomorphic copies of a graph ''H'' in a larger graph ''G'' has been applied to pattern discovery in databases, the
bioinformatics
Bioinformatics () is an interdisciplinary field of science that develops methods and Bioinformatics software, software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, ...
of protein-protein interaction networks, and in
exponential random graph methods for mathematically modeling
social network
A social network is a social structure consisting of a set of social actors (such as individuals or organizations), networks of Dyad (sociology), dyadic ties, and other Social relation, social interactions between actors. The social network per ...
s.
describe an application of subgraph isomorphism in the
computer-aided design
Computer-aided design (CAD) is the use of computers (or ) to aid in the creation, modification, analysis, or optimization of a design. This software is used to increase the productivity of the designer, improve the quality of design, improve c ...
of
electronic circuits
An electronic circuit is composed of individual electronic components, such as resistors, transistors, capacitors, inductors and diodes, connected by conductive wires or traces through which electric current can flow. It is a type of electric ...
. Subgraph matching is also a substep in
graph rewriting (the most runtime-intensive), and thus offered by
graph rewrite tools.
The problem is also of interest in
artificial intelligence
Artificial intelligence (AI) is the capability of computer, computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of re ...
, where it is considered part of an array of
pattern matching
In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually must be exact: "either it will or will not be a ...
in graphs problems; an extension of subgraph isomorphism known as
graph mining is also of interest in that area.
[http://www.aaai.org/Papers/Symposia/Fall/2006/FS-06-02/FS06-02-007.pdf; expanded version at https://e-reports-ext.llnl.gov/pdf/332302.pdf]
See also
*
Frequent subtree mining
*
Induced subgraph isomorphism problem
*
Maximum common edge subgraph problem
*
Maximum common subgraph isomorphism problem
Notes
References
*.
*.
*. A1.4: GT48, pg.202.
*.
*
*.
*.
*.
*.
*.
*.
*.
*.
*
*
*
*
*
{{DEFAULTSORT:Subgraph Isomorphism Problem
NP-complete problems
Graph algorithms
Computational problems in graph theory