computational complexity theory In theoretical computer science and mathematics, computational complexity theory focuses on classifying computational problems according to their resource usage, and relating these classes to each other. A computational problem is a task solved by ...

, a planted clique or hidden clique in an

undirected graph In discrete mathematics, and more specifically in graph theory, a graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related". The objects correspond to mathematical abstractions called '' v ...

is a

clique A clique ( AusE, CanE, or ), in the social sciences, is a group of individuals who interact with one another and share similar interests. Interacting with cliques is part of normative social development regardless of gender, ethnicity, or popular ...

formed from another graph by selecting a subset of vertices and adding edges between each pair of vertices in the subset. The planted clique problem is the algorithmic problem of distinguishing

random graph In mathematics, random graph is the general term to refer to probability distributions over graphs. Random graphs may be described simply by a probability distribution, or by a random process which generates them. The theory of random graphs li ...

s from graphs that have a planted clique. This is a variation of the

clique problem In computer science, the clique problem is the computational problem of finding cliques (subsets of vertices, all adjacent to each other, also called complete subgraphs) in a graph. It has several different formulations depending on which cliq ...

; it may be solved in

quasi-polynomial time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...

but is conjectured not to be solvable in

polynomial time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...

for intermediate values of the clique size. The conjecture that no polynomial time solution exists is called the planted clique conjecture; it has been used as a

computational hardness assumption In computational complexity theory, a computational hardness assumption is the hypothesis that a particular problem cannot be solved efficiently (where ''efficiently'' typically means "in polynomial time"). It is not known how to prove (uncondition ...

Definition

A clique in a graph is a subset of vertices, all of which are adjacent to each other. A planted clique is a clique created from another graph by adding edges between all pairs of a selected subset of vertices. The planted clique problem can be formalized as a

decision problem In computability theory and computational complexity theory, a decision problem is a computational problem that can be posed as a yes–no question of the input values. An example of a decision problem is deciding by means of an algorithm whethe ...

over a

random distribution In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon ...

on graphs, parameterized by two numbers, (the number of vertices), and (the size of the clique). These parameters may be used to generate a graph, by the following random process:. #Create an Erdős–Rényi random graph on vertices by choosing independently for each pair of vertices whether to include an edge connecting that pair, with probability 1/2 for each pair. #Decide whether or not to add a clique to the graph, with probability 1/2; if not, return the graph formed in step 1. #Choose randomly a subset of of the vertices and add an edge (if one is not already present) between each pair of the selected vertices. The problem is then to determine algorithmically whether one of the graphs resulting from this process contains a clique of at least vertices.

Upper and lower bounds

There exists a function

f(n) \sim 2 \log_2 n

such that asymptotically almost surely, the size of the largest clique in an -vertex random graph is either

f(n)

f(n)+1

, and there exists some constant

c

such that the expected number of cliques of size

\geq f(n) -c

converges to infinity. Consequently, one should expect that the planting a clique of size

\sim 2 \log_2 n

cannot be detected with high probability. By the

central limit theorem In probability theory, the central limit theorem (CLT) establishes that, in many situations, when independent random variables are summed up, their properly normalized sum tends toward a normal distribution even if the original variables themselv ...

, the vertex degrees of the random graph would be distributed close to a standard normal distribution with mean

\frac n2

and standard deviation

\frac \sqrt 2

. Consequently, when

k

is on the order of

\sqrt n

it would create a detectable change in the shape of the distribution. Namely, if you plot out the vertex degree distribution, it would look like a deformed bell curve. Therefore, the most interesting range of values for the parameter is between these two values, :

2\log_2 n \ll k \ll \sqrt n.

Algorithms

Large cliques

For sufficiently large values of the parameter , the planted clique problem can be solved (with high probability) in polynomial time. observes that, when

k=\Omega(\sqrt)

then almost surely all vertices of the planted clique have higher degree than all vertices outside the clique, making the clique very easy to find. He describes a modification to the random process for generating planted clique instances, that makes the vertex degrees more uniform even for large values of , but shows that despite this modification the planted clique may still be found quickly. prove for

k>10\sqrt n

a planted clique can be found with high probability by the following method: #Compute the

eigenvector In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted b ...

of the

adjacency matrix In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph. In the special case of a finite simp ...

corresponding to its second highest

eigenvalue In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted b ...

. #Select the vertices whose coordinates in this eigenvector have the largest

absolute value In mathematics, the absolute value or modulus of a real number x, is the non-negative value without regard to its sign. Namely, , x, =x if is a positive number, and , x, =-x if x is negative (in which case negating x makes -x positive), an ...

s. #Return the set of vertices that are adjacent to at least 3/4 of the selected vertices. They show how to modify this technique so that it continues to work whenever is at least proportional to some multiple of the square root of the number of vertices. Large planted cliques can also be found using

semidefinite programming Semidefinite programming (SDP) is a subfield of convex optimization concerned with the optimization of a linear objective function (a user-specified function that the user wants to minimize or maximize) over the intersection of the cone of positive ...

. A combinatorial technique based on randomly sampling vertices can achieve the same bound on and runs in

linear time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...

Quasipolynomial time

It is also possible to solve the planted clique problem, regardless of the choice of , in

.. Because the largest clique in a random graph typically has size near , a planted clique of size (if it exists) can be found with high probability by the following method: #Loop through all sets of

\min(k,3\log_2 n)

vertices. #For each choice of , test whether is a clique. If it is, and

, S,  = k

, return . Otherwise, find the set of vertices that are adjacent to all vertices in . If

, T, \ge k

, return . The running time of this algorithm is quasipolynomial, because there are quasipolynomially many choices of to loop over. This method is guaranteed to try a set that is a subset of the planted clique; with high probability, the set will consist only of other members of the planted clique.

As a hardness assumption

The planted clique conjecture is the conjecture that there is no polynomial time algorithm that takes as input graphs produced by the planted clique process and distinguishes the ones with planted cliques from the ones that don't have planted cliques with probability significantly better than random chance. used the assumption that finding planted cliques is hard as a

to prove that, if so, it is also hard to approximate the best

Nash equilibrium In game theory, the Nash equilibrium, named after the mathematician John Nash, is the most common way to define the solution of a non-cooperative game involving two or more players. In a Nash equilibrium, each player is assumed to know the equili ...

in a two-player game. The planted clique conjecture has also been used as a hardness assumption to show the difficulty of

property testing In computer science, a property testing algorithm for a decision problem is an algorithm whose query complexity to its input is much smaller than the instance size of the problem. Typically property testing algorithms are used to distinguish if s ...

-independence of random distributions, finding clusters in social networks, and

machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...

References

{{Computational hardness assumptions Computational problems in graph theory Computational hardness assumptions