In
graph theory
In mathematics, graph theory is the study of ''graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of '' vertices'' (also called ''nodes'' or ''points'') which are conn ...
, a branch of mathematics, graph canonization is the problem of finding a
canonical form
In mathematics and computer science, a canonical, normal, or standard form of a mathematical object is a standard way of presenting that object as a mathematical expression. Often, it is one which provides the simplest representation of an ...
of a given graph ''G''. A canonical form is a
labeled graph
In the mathematical discipline of graph theory, a graph labelling is the assignment of labels, traditionally represented by integers, to edges and/or vertices of a graph.
Formally, given a graph , a vertex labelling is a function of to a set ...
Canon(''G'') that is
isomorphic to ''G'', such that every graph that is isomorphic to ''G'' has the same canonical form as ''G''. Thus, from a solution to the graph canonization problem, one could also solve the problem of
graph isomorphism
In graph theory, an isomorphism of graphs ''G'' and ''H'' is a bijection between the vertex sets of ''G'' and ''H''
: f \colon V(G) \to V(H)
such that any two vertices ''u'' and ''v'' of ''G'' are adjacent in ''G'' if and only if f(u) and f(v) ar ...
: to test whether two graphs ''G'' and ''H'' are isomorphic, compute their canonical forms Canon(''G'') and Canon(''H''), and test whether these two canonical forms are identical.
The canonical form of a graph is an example of a
complete graph invariant: every two isomorphic graphs have the same canonical form, and every two non-isomorphic graphs have different canonical forms.
[.][.] Conversely, every complete invariant of graphs may be used to construct a canonical form. The vertex set of an ''n''-vertex graph may be identified with the
integer
An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign ( −1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the languag ...
s from 1 to ''n'', and using such an identification a canonical form of a graph may also be described as a
permutation of its vertices. Canonical forms of a graph are also called canonical labelings, and graph canonization is also sometimes known as graph canonicalization.
Computational complexity
Clearly, the graph canonization problem is at least as
computationally hard as the
graph isomorphism problem
The graph isomorphism problem is the computational problem of determining whether two finite graphs are isomorphic.
The problem is not known to be solvable in polynomial time nor to be NP-complete, and therefore may be in the computational compl ...
. In fact, graph isomorphism is even
AC0-
reducible to graph canonization. However it is still an open question whether the two problems are
polynomial time equivalent.
[
While the existence of (deterministic) polynomial algorithms for graph isomorphism is still an open problem in ]computational complexity theory
In theoretical computer science and mathematics, computational complexity theory focuses on classifying computational problems according to their resource usage, and relating these classes to each other. A computational problem is a task solved ...
, in 1977 László Babai
László "Laci" Babai (born July 20, 1950, in Budapest) a fellow of the American Academy of Arts and Sciences, and won the Knuth Prize.
Babai was an invited speaker at the International Congresses of Mathematicians in Kyoto (1990), Zürich (1994, ...
reported that with probability at least 1 − exp(−O(''n'')), a simple vertex classification algorithm produces a canonical labeling of a graph chosen uniformly at random from the set of all ''n''-vertex graphs after only two refinement steps. Small modifications and an added depth-first search step produce canonical labeling of such uniformly-chosen random graphs in linear expected time. This result sheds some light on the fact why many reported graph isomorphism algorithms behave well in practice. This was an important breakthrough in probabilistic complexity theory which became widely known in its manuscript form and which was still cited as an "unpublished manuscript" long after it was reported at a symposium.
A commonly known canonical form is the lexicographically smallest graph within the isomorphism class
In mathematics, an isomorphism class is a collection of mathematical objects isomorphic to each other.
Isomorphism classes are often defined as the exact identity of the elements of the set is considered irrelevant, and the properties of the stru ...
, which is the graph of the class with lexicographically smallest adjacency matrix
In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.
In the special case of a finite simp ...
considered as a linear string.
However, the computation of the lexicographically smallest graph is NP-hard.
For trees, a concise polynomial canonization algorithm requiring O(n) space is presented by . Begin by labeling each vertex with the string 01. Iteratively for each non-leaf x remove the leading 0 and trailing 1 from x's label; then sort x's label along with the labels of all adjacent leaves in lexicographic order. Concatenate these sorted labels, add back a leading 0 and trailing 1, make this the new label of x, and delete the adjacent leaves. If there are two vertices remaining, concatenate their labels in lexicographic order.
Applications
Graph canonization is the essence of many graph isomorphism algorithms. One of the leading tools is Nauty.
A common application of graph canonization is in graphical data mining, in particular in chemical database
A chemical database is a database specifically designed to store chemical information. This information is about chemical and crystal structures, spectra, reactions and syntheses, and thermophysical data.
Types of chemical databases
Bioactivi ...
applications.
A number of identifiers for chemical substance
A chemical substance is a form of matter having constant chemical composition and characteristic properties. Some references add that chemical substance cannot be separated into its constituent elements by physical separation methods, i.e., w ...
s, such as SMILES
The simplified molecular-input line-entry system (SMILES) is a specification in the form of a line notation for describing the structure of chemical species using short ASCII strings. SMILES strings can be imported by most molecule editors f ...
and InChI use canonicalization steps in their computation, which is essentially the canonicalization of the graph which represents the molecule.
These identifiers are designed to provide a standard (and sometimes human-readable) way to encode molecular information and to facilitate the search for such information in databases and on the web.
References
{{reflist
Graph theory