HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, the clique problem is the computational problem of finding
clique A clique ( AusE, CanE, or ), in the social sciences, is a group of individuals who interact with one another and share similar interests. Interacting with cliques is part of normative social development regardless of gender, ethnicity, or popular ...
s (subsets of vertices, all adjacent to each other, also called
complete Complete may refer to: Logic * Completeness (logic) * Completeness of a theory, the property of a theory that every formula in the theory's language or its negation is provable Mathematics * The completeness of the real numbers, which implies t ...
subgraphs) in a
graph Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties *Graph (topology), a topological space resembling a graph in the sense of discre ...
. It has several different formulations depending on which cliques, and what information about the cliques, should be found. Common formulations of the clique problem include finding a
maximum clique In the mathematical area of graph theory, a clique ( or ) is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent. That is, a clique of a graph G is an induced subgraph of G that is comple ...
(a clique with the largest possible number of vertices), finding a maximum weight clique in a weighted graph, listing all
maximal clique In the mathematical area of graph theory, a clique ( or ) is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent. That is, a clique of a graph G is an induced subgraph of G that is compl ...
s (cliques that cannot be enlarged), and solving the
decision problem In computability theory and computational complexity theory, a decision problem is a computational problem that can be posed as a yes–no question of the input values. An example of a decision problem is deciding by means of an algorithm wheth ...
of testing whether a graph contains a clique larger than a given size. The clique problem arises in the following real-world setting. Consider a
social network A social network is a social structure made up of a set of social actors (such as individuals or organizations), sets of dyadic ties, and other social interactions between actors. The social network perspective provides a set of methods for ...
, where the graph's vertices represent people, and the graph's edges represent mutual acquaintance. Then a clique represents a subset of people who all know each other, and algorithms for finding cliques can be used to discover these groups of mutual friends. Along with its applications in social networks, the clique problem also has many applications in
bioinformatics Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combi ...
, and
computational chemistry Computational chemistry is a branch of chemistry that uses computer simulation to assist in solving chemical problems. It uses methods of theoretical chemistry, incorporated into computer programs, to calculate the structures and properties of m ...
. Most versions of the clique problem are hard. The clique decision problem is
NP-complete In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by tryi ...
(one of
Karp's 21 NP-complete problems In computational complexity theory, Karp's 21 NP-complete problems are a set of computational problems which are NP-complete. In his 1972 paper, "Reducibility Among Combinatorial Problems", Richard Karp used Stephen Cook's 1971 theorem that the b ...
). The problem of finding the maximum clique is both fixed-parameter intractable and
hard to approximate In computer science, hardness of approximation is a field that studies the Computational complexity theory, algorithmic complexity of finding near-optimal solutions to optimization problems. Scope Hardness of approximation complements the study of ...
. And, listing all maximal cliques may require
exponential time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by t ...
as there exist graphs with exponentially many maximal cliques. Therefore, much of the theory about the clique problem is devoted to identifying special types of graph that admit more efficient algorithms, or to establishing the computational difficulty of the general problem in various models of computation. To find a maximum clique, one can systematically inspect all subsets, but this sort of brute-force search is too time-consuming to be practical for networks comprising more than a few dozen vertices. Although no
polynomial time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
algorithm is known for this problem, more efficient
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
s than the brute-force search are known. For instance, the
Bron–Kerbosch algorithm In computer science, the Bron–Kerbosch algorithm is an enumeration algorithm for finding all maximal cliques in an undirected graph. That is, it lists all subsets of vertices with the two properties that each pair of vertices in one of the listed ...
can be used to list all maximal cliques in worst-case optimal time, and it is also possible to list them in polynomial time per clique.


History and applications

The study of complete subgraphs in mathematics predates the "clique" terminology. For instance, complete subgraphs make an early appearance in the mathematical literature in the graph-theoretic reformulation of
Ramsey theory Ramsey theory, named after the British mathematician and philosopher Frank P. Ramsey, is a branch of mathematics that focuses on the appearance of order in a substructure given a structure of a known size. Problems in Ramsey theory typically ask ...
by . But the term "clique" and the problem of algorithmically listing cliques both come from the social sciences, where complete subgraphs are used to model social cliques, groups of people who all know each other. used graphs to model social networks, and adapted the social science terminology to graph theory. They were the first to call complete subgraphs "cliques". The first algorithm for solving the clique problem is that of ,; . who were motivated by the sociological application. Social science researchers have also defined various other types of cliques and maximal cliques in social network, "cohesive subgroups" of people or actors in the network all of whom share one of several different kinds of connectivity relation. Many of these generalized notions of cliques can also be found by constructing an undirected graph whose edges represent related pairs of actors from the social network, and then applying an algorithm for the clique problem to this graph. Since the work of Harary and Ross, many others have devised algorithms for various versions of the clique problem. In the 1970s, researchers began studying these algorithms from the point of view of
worst-case analysis The worst-case analysis regulation40 CFR 1502.22 was promulgated in 1979 by the US Council on Environmental Quality (CEQ). The regulation is one of many implementing the National Environmental Policy Act of 1969 and it sets out the formal procedur ...
. See, for instance, , an early work on the worst-case complexity of the maximum clique problem. Also in the 1970s, beginning with the work of and , researchers began using the theory of
NP-complete In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by tryi ...
ness and related intractability results to provide a mathematical explanation for the perceived difficulty of the clique problem. In the 1990s, a breakthrough series of papers beginning with and reported in ''
The New York Times ''The New York Times'' (''the Times'', ''NYT'', or the Gray Lady) is a daily newspaper based in New York City with a worldwide readership reported in 2020 to comprise a declining 840,000 paid print subscribers, and a growing 6 million paid ...
'', showed that (assuming
P ≠ NP The P versus NP problem is a major unsolved problem in theoretical computer science. In informal terms, it asks whether every problem whose solution can be quickly verified can also be quickly solved. The informal term ''quickly'', used above ...
) it is not even possible to
approximate An approximation is anything that is intentionally similar but not exactly equal to something else. Etymology and usage The word ''approximation'' is derived from Latin ''approximatus'', from ''proximus'' meaning ''very near'' and the prefix ' ...
the problem accurately and efficiently. Clique-finding algorithms have been used in
chemistry Chemistry is the science, scientific study of the properties and behavior of matter. It is a natural science that covers the Chemical element, elements that make up matter to the chemical compound, compounds made of atoms, molecules and ions ...
, to find chemicals that match a target structure and to model molecular docking and the binding sites of chemical reactions. They can also be used to find similar structures within different molecules. In these applications, one forms a graph in which each vertex represents a matched pair of atoms, one from each of two molecules. Two vertices are connected by an edge if the matches that they represent are compatible with each other. Being compatible may mean, for instance, that the distances between the atoms within the two molecules are approximately equal, to within some given tolerance. A clique in this graph represents a set of matched pairs of atoms in which all the matches are compatible with each other. A special case of this method is the use of the
modular product of graphs In graph theory, the modular product of graphs ''G'' and ''H'' is a graph formed by combining ''G'' and ''H'' that has applications to subgraph isomorphism. It is one of several different kinds of graph products that have been studied, generally u ...
to reduce the problem of finding the
maximum common induced subgraph In graph theory and theoretical computer science, a maximum common induced subgraph of two graphs ''G'' and ''H'' is a graph that is an induced subgraph of both ''G'' and ''H'', and that has as many vertices as possible. Finding this graph is NP-ha ...
of two graphs to the problem of finding a maximum clique in their product. In
automatic test pattern generation ATPG (acronym for both Automatic Test Pattern Generation and Automatic Test Pattern Generator) is an electronic design automation method/technology used to find an input (or test) sequence that, when applied to a digital circuit, enables automatic t ...
, finding cliques can help to bound the size of a test set. In
bioinformatics Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combi ...
, clique-finding algorithms have been used to infer
evolutionary tree A phylogenetic tree (also phylogeny or evolutionary tree Felsenstein J. (2004). ''Inferring Phylogenies'' Sinauer Associates: Sunderland, MA.) is a branching diagram or a tree showing the evolutionary relationships among various biological spec ...
s, predict protein structures, and find closely interacting clusters of proteins. Listing the cliques in a
dependency graph In mathematics, computer science and digital electronics, a dependency graph is a directed graph representing dependencies of several objects towards each other. It is possible to derive an evaluation order or the absence of an evaluation order t ...
is an important step in the analysis of certain random processes. In mathematics,
Keller's conjecture In geometry, Keller's conjecture is the conjecture that in any tiling of -dimensional Euclidean space by identical hypercubes, there are two hypercubes that share an entire -dimensional face with each other. For instance, in any tiling of the pl ...
on face-to-face tiling of
hypercube In geometry, a hypercube is an ''n''-dimensional analogue of a square () and a cube (). It is a closed, compact, convex figure whose 1- skeleton consists of groups of opposite parallel line segments aligned in each of the space's dimensions, ...
s was disproved by , who used a clique-finding algorithm on an associated graph to find a counterexample.


Definitions

An
undirected graph In discrete mathematics, and more specifically in graph theory, a graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related". The objects correspond to mathematical abstractions called '' v ...
is formed by a
finite set In mathematics, particularly set theory, a finite set is a set that has a finite number of elements. Informally, a finite set is a set which one could in principle count and finish counting. For example, :\ is a finite set with five elements. Th ...
of vertices and a set of
unordered pair In mathematics, an unordered pair or pair set is a set of the form , i.e. a set having two elements ''a'' and ''b'' with no particular relation between them, where = . In contrast, an ordered pair (''a'', ''b'') has ''a'' as its first ...
s of vertices, which are called edges. By convention, in algorithm analysis, the number of vertices in the graph is denoted by and the number of edges is denoted by . A
clique A clique ( AusE, CanE, or ), in the social sciences, is a group of individuals who interact with one another and share similar interests. Interacting with cliques is part of normative social development regardless of gender, ethnicity, or popular ...
in a graph is a
complete Complete may refer to: Logic * Completeness (logic) * Completeness of a theory, the property of a theory that every formula in the theory's language or its negation is provable Mathematics * The completeness of the real numbers, which implies t ...
subgraph of . That is, it is a subset of the vertices such that every two vertices in are the two endpoints of an edge in . A
maximal clique In the mathematical area of graph theory, a clique ( or ) is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent. That is, a clique of a graph G is an induced subgraph of G that is compl ...
is a clique to which no more vertices can be added. For each vertex that is not part of a maximal clique, there must be another vertex that is in the clique and non-adjacent to , preventing from being added to the clique. A
maximum clique In the mathematical area of graph theory, a clique ( or ) is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent. That is, a clique of a graph G is an induced subgraph of G that is comple ...
is a clique that includes the largest possible number of vertices. The clique number is the number of vertices in a maximum clique of . Several closely related clique-finding problems have been studied.; . *In the maximum clique problem, the input is an undirected graph, and the output is a maximum clique in the graph. If there are multiple maximum cliques, one of them may be chosen arbitrarily. *In the weighted maximum clique problem, the input is an undirected graph with weights on its vertices (or, less frequently, edges) and the output is a clique with maximum total weight. The maximum clique problem is the special case in which all weights are equal. As well as the problem of optimizing the sum of weights, other more complicated bicriterion optimization problems have also been studied. *In the maximal clique listing problem, the input is an undirected graph, and the output is a list of all its maximal cliques. The maximum clique problem may be solved using as a subroutine an algorithm for the maximal clique listing problem, because the maximum clique must be included among all the maximal cliques. *In the -clique problem, the input is an undirected graph and a number . The output is a clique with vertices, if one exists, or a special value indicating that there is no -clique otherwise. In some variations of this problem, the output should list all cliques of size . *In the clique decision problem, the input is an undirected graph and a number , and the output is a
Boolean value In mathematics and mathematical logic, Boolean algebra is a branch of algebra. It differs from elementary algebra in two ways. First, the values of the variables are the truth values ''true'' and ''false'', usually denoted 1 and 0, whereas in ...
: true if the graph contains a -clique, and false otherwise. The first four of these problems are all important in practical applications. The clique decision problem is not of practical importance; it is formulated in this way in order to apply the theory of
NP-completeness In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by trying ...
to clique-finding problems. The clique problem and the
independent set problem In graph theory, an independent set, stable set, coclique or anticlique is a set of vertices in a graph, no two of which are adjacent. That is, it is a set S of vertices such that for every two vertices in S, there is no edge connecting the two ...
are complementary: a clique in is an independent set in the
complement graph In the mathematical field of graph theory, the complement or inverse of a graph is a graph on the same vertices such that two distinct vertices of are adjacent if and only if they are not adjacent in . That is, to generate the complement of a ...
of and vice versa. Therefore, many computational results may be applied equally well to either problem, and some research papers do not clearly distinguish between the two problems. However, the two problems have different properties when applied to restricted families of graphs. For instance, the clique problem may be solved in polynomial time for planar graphs; . while the independent set problem remains NP-hard on planar graphs.


Algorithms


Finding a single maximal clique

A maximal clique, sometimes called inclusion-maximal, is a clique that is not included in a larger clique. Therefore, every clique is contained in a maximal clique. Maximal cliques can be very small. A graph may contain a non-maximal clique with many vertices and a separate clique of size 2 which is maximal. While a maximum (i.e., largest) clique is necessarily maximal, the converse does not hold. There are some types of graphs in which every maximal clique is maximum; these are the complements of the well-covered graphs, in which every maximal independent set is maximum. However, other graphs have maximal cliques that are not maximum. A single maximal clique can be found by a straightforward
greedy algorithm A greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage. In many problems, a greedy strategy does not produce an optimal solution, but a greedy heuristic can yield locally ...
. Starting with an arbitrary clique (for instance, any single vertex or even the empty set), grow the current clique one vertex at a time by looping through the graph's remaining vertices. For each vertex that this loop examines, add to the clique if it is adjacent to every vertex that is already in the clique, and discard otherwise. This algorithm runs in linear time. Because of the ease of finding maximal cliques, and their potential small size, more attention has been given to the much harder algorithmic problem of finding a maximum or otherwise large clique. However, some research in
parallel algorithm In computer science, a parallel algorithm, as opposed to a traditional serial algorithm, is an algorithm which can do multiple operations in a given time. It has been a tradition of computer science to describe serial algorithms in abstract machin ...
s has studied the problem of finding a maximal clique. In particular, the problem of finding the lexicographically first maximal clique (the one found by the algorithm above) has been shown to be
complete Complete may refer to: Logic * Completeness (logic) * Completeness of a theory, the property of a theory that every formula in the theory's language or its negation is provable Mathematics * The completeness of the real numbers, which implies t ...
for the class of polynomial-time functions. This result implies that the problem is unlikely to be solvable within the parallel complexity class NC.


Cliques of fixed size

One can test whether a graph contains a -vertex clique, and find any such clique that it contains, using a brute force algorithm. This algorithm examines each subgraph with vertices and checks to see whether it forms a clique. It takes time , as expressed using big O notation. This is because there are subgraphs to check, each of which has edges whose presence in needs to be checked. Thus, the problem may be solved in
polynomial time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
whenever is a fixed constant. However, when does not have a fixed value, but instead may vary as part of the input to the problem, the time is exponential. The simplest nontrivial case of the clique-finding problem is finding a triangle in a graph, or equivalently determining whether the graph is triangle-free. In a graph with edges, there may be at most triangles (using big theta notation to indicate that this bound is tight). The worst case for this formula occurs when is itself a clique. Therefore, algorithms for listing all triangles must take at least time in the worst case (using
big omega notation Big ''O'' notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. Big O is a member of a family of notations invented by Paul Bachmann, Edmund Lan ...
), and algorithms are known that match this time bound. For instance, describe an algorithm that sorts the vertices in order from highest degree to lowest and then iterates through each vertex in the sorted list, looking for triangles that include and do not include any previous vertex in the list. To do so the algorithm marks all neighbors of , searches through all edges incident to a neighbor of outputting a triangle for every edge that has two marked endpoints, and then removes the marks and deletes from the graph. As the authors show, the time for this algorithm is proportional to the
arboricity The arboricity of an undirected graph is the minimum number of forests into which its edges can be partitioned. Equivalently it is the minimum number of spanning forests needed to cover all the edges of the graph. The Nash-Williams theorem provi ...
of the graph (denoted ) multiplied by the number of edges, which is . Since the arboricity is at most , this algorithm runs in time . More generally, all -vertex cliques can be listed by a similar algorithm that takes time proportional to the number of edges multiplied by the arboricity to the power . For graphs of constant arboricity, such as planar graphs (or in general graphs from any non-trivial minor-closed graph family), this algorithm takes time, which is optimal since it is linear in the size of the input.. If one desires only a single triangle, or an assurance that the graph is triangle-free, faster algorithms are possible. As observe, the graph contains a triangle if and only if its
adjacency matrix In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph. In the special case of a finite simp ...
and the square of the adjacency matrix contain nonzero entries in the same cell. Therefore,
fast matrix multiplication In theoretical computer science, the computational complexity of matrix multiplication dictates how quickly the operation of matrix multiplication can be performed. Matrix multiplication algorithms are a central subroutine in theoretical and num ...
techniques can be applied to find triangles in time . used fast matrix multiplication to improve the algorithm for finding triangles to . These algorithms based on fast matrix multiplication have also been extended to problems of finding -cliques for larger values of .


Listing all maximal cliques

By a result of , every -vertex graph has at most maximal cliques. They can be listed by the
Bron–Kerbosch algorithm In computer science, the Bron–Kerbosch algorithm is an enumeration algorithm for finding all maximal cliques in an undirected graph. That is, it lists all subsets of vertices with the two properties that each pair of vertices in one of the listed ...
, a recursive
backtracking Backtracking is a class of algorithms for finding solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as it d ...
procedure of . The main recursive subroutine of this procedure has three arguments: a partially constructed (non-maximal) clique, a set of candidate vertices that could be added to the clique, and another set of vertices that should not be added (because doing so would lead to a clique that has already been found). The algorithm tries adding the candidate vertices one by one to the partial clique, making a recursive call for each one. After trying each of these vertices, it moves it to the set of vertices that should not be added again. Variants of this algorithm can be shown to have worst-case running time , matching the number of cliques that might need to be listed. Therefore, this provides a worst-case-optimal solution to the problem of listing all maximal cliques. Further, the Bron–Kerbosch algorithm has been widely reported as being faster in practice than its alternatives. However, when the number of cliques is significantly smaller than its worst case, other algorithms might be preferable. As showed, it is also possible to list all maximal cliques in a graph in an amount of time that is polynomial per generated clique. An algorithm such as theirs in which the running time depends on the output size is known as an
output-sensitive algorithm In computer science, an output-sensitive algorithm is an algorithm whose running time depends on the size of the output, instead of, or in addition to, the size of the input. For certain problems where the output size varies widely, for example fro ...
. Their algorithm is based on the following two observations, relating the maximal cliques of the given graph to the maximal cliques of a graph formed by removing an arbitrary vertex from : *For every maximal clique of , either continues to form a maximal clique in , or forms a maximal clique in . Therefore, has at least as many maximal cliques as does. *Each maximal clique in that does not contain is a maximal clique in , and each maximal clique in that does contain can be formed from a maximal clique in by adding and removing the non-neighbors of from . Using these observations they can generate all maximal cliques in by a recursive algorithm that chooses a vertex arbitrarily and then, for each maximal clique in , outputs both and the clique formed by adding to and removing the non-neighbors of . However, some cliques of may be generated in this way from more than one parent clique of , so they eliminate duplicates by outputting a clique in only when its parent in is lexicographically maximum among all possible parent cliques. On the basis of this principle, they show that all maximal cliques in may be generated in time per clique, where is the number of edges in and is the number of vertices. improve this to per clique, where is the arboricity of the given graph. provide an alternative output-sensitive algorithm based on fast matrix multiplication. show that it is even possible to list all maximal cliques in
lexicographic order In mathematics, the lexicographic or lexicographical order (also known as lexical order, or dictionary order) is a generalization of the alphabetical order of the dictionaries to sequences of ordered symbols or, more generally, of elements of a ...
with
polynomial delay In the analysis of algorithms, an enumeration algorithm (i.e., an algorithm for listing a large or infinite collection of structures) is said to have polynomial delay if the time between the output of any one structure and the next is bounded by a ...
per clique. However, the choice of ordering is important for the efficiency of this algorithm: for the reverse of this order, there is no polynomial-delay algorithm unless
P = NP The P versus NP problem is a major unsolved problem in theoretical computer science. In informal terms, it asks whether every problem whose solution can be quickly verified can also be quickly solved. The informal term ''quickly'', used abov ...
. On the basis of this result, it is possible to list all maximal cliques in polynomial time, for families of graphs in which the number of cliques is polynomially bounded. These families include
chordal graph In the mathematical area of graph theory, a chordal graph is one in which all cycles of four or more vertices have a ''chord'', which is an edge that is not part of the cycle but connects two vertices of the cycle. Equivalently, every induced c ...
s,
complete graph In the mathematical field of graph theory, a complete graph is a simple undirected graph in which every pair of distinct vertices is connected by a unique edge. A complete digraph is a directed graph in which every pair of distinct vertices is ...
s,
triangle-free graph In the mathematical area of graph theory, a triangle-free graph is an undirected graph in which no three vertices form a triangle of edges. Triangle-free graphs may be equivalently defined as graphs with clique number ≤ 2, graphs with g ...
s,
interval graph In graph theory, an interval graph is an undirected graph formed from a set of intervals on the real line, with a vertex for each interval and an edge between vertices whose intervals intersect. It is the intersection graph of the intervals. In ...
s, graphs of bounded
boxicity In graph theory, boxicity is a graph invariant, introduced by Fred S. Roberts in 1969. The boxicity of a graph is the minimum dimension in which a given graph can be represented as an intersection graph of axis-parallel boxes. That is, there mu ...
, and planar graphs. In particular, the planar graphs have cliques, of at most constant size, that can be listed in linear time. The same is true for any family of graphs that is both sparse (having a number of edges at most a constant times the number of vertices) and closed under the operation of taking subgraphs.


Finding maximum cliques in arbitrary graphs

It is possible to find the maximum clique, or the clique number, of an arbitrary ''n''-vertex graph in time by using one of the algorithms described above to list all maximal cliques in the graph and returning the largest one. However, for this variant of the clique problem better worst-case time bounds are possible. The algorithm of solves this problem in time . It is a recursive backtracking scheme similar to that of the
Bron–Kerbosch algorithm In computer science, the Bron–Kerbosch algorithm is an enumeration algorithm for finding all maximal cliques in an undirected graph. That is, it lists all subsets of vertices with the two properties that each pair of vertices in one of the listed ...
, but is able to eliminate some recursive calls when it can be shown that the cliques found within the call will be suboptimal. improved the time to , and improved it to time, at the expense of greater space usage. Robson's algorithm combines a similar backtracking scheme (with a more complicated case analysis) and a
dynamic programming Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics. ...
technique in which the optimal solution is precomputed for all small connected subgraphs of the
complement graph In the mathematical field of graph theory, the complement or inverse of a graph is a graph on the same vertices such that two distinct vertices of are adjacent if and only if they are not adjacent in . That is, to generate the complement of a ...
. These partial solutions are used to shortcut the backtracking recursion. The fastest algorithm known today is a refined version of this method by which runs in time . There has also been extensive research on
heuristic algorithm In mathematical optimization and computer science, heuristic (from Greek εὑρίσκω "I find, discover") is a technique designed for solving a problem more quickly when classic methods are too slow for finding an approximate solution, or whe ...
s for solving maximum clique problems without worst-case runtime guarantees, based on methods including
branch and bound Branch and bound (BB, B&B, or BnB) is an algorithm design paradigm for discrete and combinatorial optimization problems, as well as mathematical optimization. A branch-and-bound algorithm consists of a systematic enumeration of candidate solut ...
, local search,
greedy algorithm A greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage. In many problems, a greedy strategy does not produce an optimal solution, but a greedy heuristic can yield locally ...
s, and
constraint programming Constraint programming (CP) is a paradigm for solving combinatorial problems that draws on a wide range of techniques from artificial intelligence, computer science, and operations research. In constraint programming, users declaratively state t ...
. Non-standard computing methodologies that have been suggested for finding cliques include
DNA computing DNA computing is an emerging branch of unconventional computing which uses DNA, biochemistry, and molecular biology hardware, instead of the traditional electronic computing. Research and development in this area concerns theory, experiments, a ...
and
adiabatic quantum computation Adiabatic quantum computation (AQC) is a form of quantum computing which relies on the adiabatic theorem to do calculations and is closely related to quantum annealing. Description First, a (potentially complicated) Hamiltonian is found whose g ...
. The maximum clique problem was the subject of an implementation challenge sponsored by
DIMACS The Center for Discrete Mathematics and Theoretical Computer Science (DIMACS) is a collaboration between Rutgers University, Princeton University, and the research firms AT&T, Bell Labs, Applied Communication Sciences, and NEC. It was founded in ...
in 1992–1993, and a collection of graphs used as benchmarks for the challenge, which is publicly available.


Special classes of graphs

Planar graphs, and other families of sparse graphs, have been discussed above: they have linearly many maximal cliques, of bounded size, that can be listed in linear time. In particular, for planar graphs, any clique can have at most four vertices, by
Kuratowski's theorem In graph theory, Kuratowski's theorem is a mathematical forbidden graph characterization of planar graphs, named after Kazimierz Kuratowski. It states that a finite graph is planar if and only if it does not contain a subgraph that is a subdi ...
.
Perfect graph In graph theory, a perfect graph is a graph in which the chromatic number of every induced subgraph equals the order of the largest clique of that subgraph (clique number). Equivalently stated in symbolic terms an arbitrary graph G=(V,E) is perfe ...
s are defined by the properties that their clique number equals their
chromatic number In graph theory, graph coloring is a special case of graph labeling; it is an assignment of labels traditionally called "colors" to elements of a graph subject to certain constraints. In its simplest form, it is a way of coloring the vertices o ...
, and that this equality holds also in each of their
induced subgraph In the mathematical field of graph theory, an induced subgraph of a graph is another graph, formed from a subset of the vertices of the graph and ''all'' of the edges (from the original graph) connecting pairs of vertices in that subset. Defini ...
s. For perfect graphs, it is possible to find a maximum clique in polynomial time, using an algorithm based on
semidefinite programming Semidefinite programming (SDP) is a subfield of convex optimization concerned with the optimization of a linear objective function (a user-specified function that the user wants to minimize or maximize) over the intersection of the cone of positive ...
. However, this method is complex and non-combinatorial, and specialized clique-finding algorithms have been developed for many subclasses of perfect graphs. In the
complement graph In the mathematical field of graph theory, the complement or inverse of a graph is a graph on the same vertices such that two distinct vertices of are adjacent if and only if they are not adjacent in . That is, to generate the complement of a ...
s of bipartite graphs, Kőnig's theorem allows the maximum clique problem to be solved using techniques for matching. In another class of perfect graphs, the
permutation graph In the mathematical field of graph theory, a permutation graph is a graph whose vertices represent the elements of a permutation, and whose edges represent pairs of elements that are reversed by the permutation. Permutation graphs may also be ...
s, a maximum clique is a longest decreasing subsequence of the permutation defining the graph and can be found using known algorithms for the longest decreasing subsequence problem. Conversely, every instance of the longest decreasing subsequence problem can be described equivalently as a problem of finding a maximum clique in a permutation graph. provide an alternative quadratic-time algorithm for maximum cliques in
comparability graph In graph theory, a comparability graph is an undirected graph that connects pairs of elements that are comparable to each other in a partial order. Comparability graphs have also been called transitively orientable graphs, partially orderable grap ...
s, a broader class of perfect graphs that includes the permutation graphs as a special case. In
chordal graph In the mathematical area of graph theory, a chordal graph is one in which all cycles of four or more vertices have a ''chord'', which is an edge that is not part of the cycle but connects two vertices of the cycle. Equivalently, every induced c ...
s, the maximal cliques can be found by listing the vertices in an elimination ordering, and checking the clique
neighborhoods A neighbourhood (British English, Irish English, Australian English and Canadian English) or neighborhood (American English; American and British English spelling differences, see spelling differences) is a geographically localised community ...
of each vertex in this ordering. In some cases, these algorithms can be extended to other, non-perfect, classes of graphs as well. For instance, in a circle graph, the neighborhood of each vertex is a permutation graph, so a maximum clique in a circle graph can be found by applying the permutation graph algorithm to each neighborhood. Similarly, in a
unit disk graph In geometric graph theory, a unit disk graph is the intersection graph of a family of unit disks in the Euclidean plane. That is, it is a graph with one vertex for each disk in the family, and with an edge between two vertices whenever the corr ...
(with a known geometric representation), there is a polynomial time algorithm for maximum cliques based on applying the algorithm for complements of bipartite graphs to shared neighborhoods of pairs of vertices. The algorithmic problem of finding a maximum clique in a
random graph In mathematics, random graph is the general term to refer to probability distributions over graphs. Random graphs may be described simply by a probability distribution, or by a random process which generates them. The theory of random graphs ...
drawn from the
Erdős–Rényi model In the mathematical field of graph theory, the Erdős–Rényi model is either of two closely related models for generating random graphs or the evolution of a random network. They are named after Hungarian mathematicians Paul Erdős and Alfr ...
(in which each edge appears with probability , independently from the other edges) was suggested by . Because the maximum clique in a random graph has logarithmic size with high probability, it can be found by a brute force search in expected time . This is a quasi-polynomial time bound. Although the clique number of such graphs is usually very close to , simple
greedy algorithm A greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage. In many problems, a greedy strategy does not produce an optimal solution, but a greedy heuristic can yield locally ...
s as well as more sophisticated randomized approximation techniques only find cliques with size , half as big. The number of maximal cliques in such graphs is with high probability exponential in , which prevents methods that list all maximal cliques from running in polynomial time. Because of the difficulty of this problem, several authors have investigated the
planted clique In computational complexity theory, a planted clique or hidden clique in an undirected graph is a clique formed from another graph by selecting a subset of vertices and adding edges between each pair of vertices in the subset. The planted clique pr ...
problem, the clique problem on random graphs that have been augmented by adding large cliques. While
spectral methods Spectral methods are a class of techniques used in applied mathematics and scientific computing to numerically solve certain differential equations. The idea is to write the solution of the differential equation as a sum of certain " basis functio ...
and
semidefinite programming Semidefinite programming (SDP) is a subfield of convex optimization concerned with the optimization of a linear objective function (a user-specified function that the user wants to minimize or maximize) over the intersection of the cone of positive ...
can detect hidden cliques of size , no polynomial-time algorithms are currently known to detect those of size (expressed using
little-o notation Big ''O'' notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. Big O is a member of a family of notations invented by Paul Bachmann, Edmund Lan ...
).


Approximation algorithms

Several authors have considered
approximation algorithm In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems (in particular NP-hard problems) with provable guarantees on the distance of the returned solu ...
s that attempt to find a clique or independent set that, although not maximum, has size as close to the maximum as can be found in polynomial time. Although much of this work has focused on independent sets in sparse graphs, a case that does not make sense for the complementary clique problem, there has also been work on approximation algorithms that do not use such sparsity assumptions. describes a polynomial time algorithm that finds a clique of size in any graph that has clique number for any constant . By using this algorithm when the clique number of a given input graph is between and , switching to a different algorithm of for graphs with higher clique numbers, and choosing a two-vertex clique if both algorithms fail to find anything, Feige provides an approximation algorithm that finds a clique with a number of vertices within a factor of of the maximum. Although the
approximation ratio An approximation is anything that is intentionally similar but not exactly equal to something else. Etymology and usage The word ''approximation'' is derived from Latin ''approximatus'', from ''proximus'' meaning ''very near'' and the prefix ' ...
of this algorithm is weak, it is the best known to date. The results on
hardness of approximation In computer science, hardness of approximation is a field that studies the algorithmic complexity of finding near-optimal solutions to optimization problems. Scope Hardness of approximation complements the study of approximation algorithms by prov ...
described below suggest that there can be no approximation algorithm with an approximation ratio significantly less than linear.


Lower bounds


NP-completeness

The clique decision problem is
NP-complete In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by tryi ...
. It was one of Richard Karp's original 21 problems shown NP-complete in his 1972 paper "Reducibility Among Combinatorial Problems". This problem was also mentioned in
Stephen Cook Stephen Arthur Cook (born December 14, 1939) is an American-Canadian computer scientist and mathematician who has made significant contributions to the fields of complexity theory and proof complexity. He is a university professor at the Unive ...
's paper introducing the theory of NP-complete problems. Because of the hardness of the decision problem, the problem of finding a maximum clique is also NP-hard. If one could solve it, one could also solve the decision problem, by comparing the size of the maximum clique to the size parameter given as input in the decision problem. Karp's NP-completeness proof is a
many-one reduction In computability theory and computational complexity theory, a many-one reduction (also called mapping reduction) is a reduction which converts instances of one decision problem L_1 into instances of a second decision problem L_2 where the instan ...
from the
Boolean satisfiability problem In logic and computer science, the Boolean satisfiability problem (sometimes called propositional satisfiability problem and abbreviated SATISFIABILITY, SAT or B-SAT) is the problem of determining if there exists an interpretation that satisfie ...
. It describes how to translate Boolean formulas in
conjunctive normal form In Boolean logic, a formula is in conjunctive normal form (CNF) or clausal normal form if it is a conjunction of one or more clauses, where a clause is a disjunction of literals; otherwise put, it is a product of sums or an AND of ORs. As a can ...
(CNF) into equivalent instances of the maximum clique problem. Satisfiability, in turn, was proved NP-complete in the
Cook–Levin theorem In computational complexity theory, the Cook–Levin theorem, also known as Cook's theorem, states that the Boolean satisfiability problem is NP-complete. That is, it is in NP, and any problem in NP can be reduced in polynomial time by a determi ...
. From a given CNF formula, Karp forms a graph that has a vertex for every pair , where is a variable or its negation and is a clause in the formula that contains . Two of these vertices are connected by an edge if they represent compatible variable assignments for different clauses. That is, there is an edge from to whenever and and are not each other's negations. If denotes the number of clauses in the CNF formula, then the -vertex cliques in this graph represent consistent ways of assigning truth values to some of its variables in order to satisfy the formula. Therefore, the formula is satisfiable if and only if a -vertex clique exists. Some NP-complete problems (such as the travelling salesman problem in planar graphs) may be solved in time that is exponential in a sublinear function of the input size parameter , significantly faster than a brute-force search. However, it is unlikely that such a subexponential time bound is possible for the clique problem in arbitrary graphs, as it would imply similarly subexponential bounds for many other standard NP-complete problems.


Circuit complexity

The computational difficulty of the clique problem has led it to be used to prove several lower bounds in circuit complexity. The existence of a clique of a given size is a Hereditary property, monotone graph property, meaning that, if a clique exists in a given graph, it will exist in any Glossary of graph theory terms#supergraph, supergraph. Because this property is monotone, there must exist a monotone circuit, using only and gates and or gates, to solve the clique decision problem for a given fixed clique size. However, the size of these circuits can be proven to be a super-polynomial function of the number of vertices and the clique size, exponential in the cube root of the number of vertices. Even if a small number of NOT gates are allowed, the complexity remains superpolynomial. Additionally, the depth of a monotone circuit for the clique problem using gates of bounded fan-in must be at least a polynomial in the clique size.


Decision tree complexity

The (deterministic) decision tree complexity of determining a graph property is the number of questions of the form "Is there an edge between vertex and vertex ?" that have to be answered in the worst case to determine whether a graph has a particular property. That is, it is the minimum height of a boolean Decision tree model, decision tree for the problem. There are possible questions to be asked. Therefore, any graph property can be determined with at most questions. It is also possible to define random and quantum decision tree complexity of a property, the expected number of questions (for a worst case input) that a randomized or quantum algorithm needs to have answered in order to correctly determine whether the given graph has the property. Because the property of containing a clique is monotone, it is covered by the Aanderaa–Karp–Rosenberg conjecture, which states that the deterministic decision tree complexity of determining any non-trivial monotone graph property is exactly . For arbitrary monotone graph properties, this conjecture remains unproven. However, for deterministic decision trees, and for any in the range , the property of containing a -clique was shown to have decision tree complexity exactly by . Deterministic decision trees also require exponential size to detect cliques, or large polynomial size to detect cliques of bounded size. The Aanderaa–Karp–Rosenberg conjecture also states that the randomized decision tree complexity of non-trivial monotone functions is . The conjecture again remains unproven, but has been resolved for the property of containing a clique for . This property is known to have randomized decision tree complexity . For quantum decision trees, the best known lower bound is , but no matching algorithm is known for the case of .


Fixed-parameter intractability

Parameterized complexity is the computational complexity theory, complexity-theoretic study of problems that are naturally equipped with a small integer parameter and for which the problem becomes more difficult as increases, such as finding -cliques in graphs. A problem is said to be fixed-parameter tractable if there is an algorithm for solving it on inputs of size , and a function , such that the algorithm runs in time . That is, it is fixed-parameter tractable if it can be solved in polynomial time for any fixed value of and moreover if the exponent of the polynomial does not depend on . For finding -vertex cliques, the brute force search algorithm has running time . Because the exponent of depends on , this algorithm is not fixed-parameter tractable. Although it can be improved by fast matrix multiplication the running time still has an exponent that is linear in Thus, although the running time of known algorithms for the clique problem is polynomial for any fixed , these algorithms do not suffice for fixed-parameter tractability. defined a hierarchy of parametrized problems, the W hierarchy, that they conjectured did not have fixed-parameter tractable algorithms. They proved that independent set (or, equivalently, clique) is hard for the first level of this hierarchy, W(1), W[1]. Thus, according to their conjecture, clique has no fixed-parameter tractable algorithm. Moreover, this result provides the basis for proofs of W[1]-hardness of many other problems, and thus serves as an analogue of the
Cook–Levin theorem In computational complexity theory, the Cook–Levin theorem, also known as Cook's theorem, states that the Boolean satisfiability problem is NP-complete. That is, it is in NP, and any problem in NP can be reduced in polynomial time by a determi ...
for parameterized complexity. showed that finding -vertex cliques cannot be done in time unless the exponential time hypothesis fails. Again, this provides evidence that no fixed-parameter tractable algorithm is possible. Although the problems of listing maximal cliques or finding maximum cliques are unlikely to be fixed-parameter tractable with the parameter , they may be fixed-parameter tractable for other parameters of instance complexity. For instance, both problems are known to be fixed-parameter tractable when parametrized by the degeneracy (graph theory), degeneracy of the input graph..


Hardness of approximation

Weak results hinting that the clique problem might be hard to approximate have been known for a long time. observed that, because the clique number takes on small integer values and is NP-hard to compute, it cannot have a Polynomial-time approximation scheme, fully polynomial-time approximation scheme. If too accurate an approximation were available, rounding its value to an integer would give the exact clique number. However, little more was known until the early 1990s, when several authors began to make connections between the approximation of maximum cliques and probabilistically checkable proofs. They used these connections to prove
hardness of approximation In computer science, hardness of approximation is a field that studies the algorithmic complexity of finding near-optimal solutions to optimization problems. Scope Hardness of approximation complements the study of approximation algorithms by prov ...
results for the maximum clique problem. After many improvements to these results it is now known that, for every real number , there can be no polynomial time algorithm that approximates the maximum clique to within a factor better than , unless P versus NP problem, P = NP. showed inapproximability for this ratio using a stronger complexity theoretic assumption, the inequality of NP (complexity), NP and ZPP (complexity), ZPP. described more precisely the inapproximability ratio, but with an even stronger assumption. derandomization, derandomized the construction weakening its assumption to P ≠ NP. The rough idea of these inapproximability results is to form a graph that represents a probabilistically checkable proof system for an NP-complete problem such as the Boolean satisfiability problem. In a probabilistically checkable proof system, a proof is represented as a sequence of bits. An instance of the satisfiability problem should have a valid proof if and only if it is satisfiable. The proof is checked by an algorithm that, after a polynomial-time computation on the input to the satisfiability problem, chooses to examine a small number of randomly chosen positions of the proof string. Depending on what values are found at that sample of bits, the checker will either accept or reject the proof, without looking at the rest of the bits. False negatives are not allowed: a valid proof must always be accepted. However, an invalid proof may sometimes mistakenly be accepted. For every invalid proof, the probability that the checker will accept it must be low. To transform a probabilistically checkable proof system of this type into a clique problem, one forms a graph with a vertex for each possible accepting run of the proof checker. That is, a vertex is defined by one of the possible random choices of sets of positions to examine, and by bit values for those positions that would cause the checker to accept the proof. It can be represented by a partial word with a 0 or 1 at each examined position and a wildcard character at each remaining position. Two vertices are adjacent, in this graph, if the corresponding two accepting runs see the same bit values at every position they both examine. Each (valid or invalid) proof string corresponds to a clique, the set of accepting runs that see that proof string, and all maximal cliques arise in this way. One of these cliques is large if and only if it corresponds to a proof string that many proof checkers accept. If the original satisfiability instance is satisfiable, it will have a valid proof string, one that is accepted by all runs of the checker, and this string will correspond to a large clique in the graph. However, if the original instance is not satisfiable, then all proof strings are invalid, each proof string has only a small number of checker runs that mistakenly accept it, and all cliques are small. Therefore, if one could distinguish in polynomial time between graphs that have large cliques and graphs in which all cliques are small, or if one could accurately approximate the clique problem, then applying this approximation to the graphs generated from satisfiability instances would allow satisfiable instances to be distinguished from unsatisfiable instances. However, this is not possible unless P = NP.This reduction is originally due to and used in all subsequent inapproximability proofs; the proofs differ in the strengths and details of the probabilistically checkable proof systems that they rely on.


Notes


References


Surveys and textbooks

*. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *.


Popular press

*.


Research articles

*. *. *. *. *. *. Originally presented at the 1992 Symposium on Foundations of Computer Science, . *. Originally presented at the 1992 Symposium on Foundations of Computer Science, . *. *. *. *. *. *. *. *. * *. *. *. * *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. * *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *
Source code
*. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. {{refend NP-complete problems Computational problems in graph theory