Szemerédi Regularity Lemma

Szemerédi's regularity lemma is one of the most powerful tools in extremal graph theory, particularly in the study of large dense graphs. It states that the vertices of every large enough graph can be partitioned into a bounded number of parts so that the edges between different parts behave almost randomly. According to the lemma, no matter how large a graph is, we can approximate it with the edge densities between a bounded number of parts. Between any two parts, the distribution of edges will be pseudorandom as per the edge density. These approximations provide essentially correct values for various properties of the graph, such as the number of embedded copies of a given subgraph or the number of edge deletions required to remove all copies of some subgraph.


Statement

To state Szemerédi's regularity lemma formally, we must formalize what it means for the edge distribution between parts to behave 'almost randomly'. By 'almost random', we are referring to a notion called ε-regularity. To understand what this means, we first state some definitions. In what follows, G is a graph with vertex set V.
Definition 1. Let X, Y be disjoint subsets of V. The edge density of the pair (X, Y) is defined as:
:d(X,Y) := \frac{|E(X,Y)|}{|X||Y|}
where E(X,Y) denotes the set of edges having one end vertex in X and one in Y.
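As a concrete illustration of Definition 1, here is a minimal sketch in Python. The representation of a graph as a set of `frozenset` edges and the helper name `edge_density` are our own choices for illustration, not part of the lemma:

```python
def edge_density(edges, X, Y):
    """Edge density d(X, Y) = |E(X, Y)| / (|X| |Y|) for disjoint vertex sets.

    `edges` is a set of frozensets {u, v}; this graph representation is an
    illustrative assumption, not prescribed by the lemma.
    """
    if not X or not Y:
        return 0.0
    crossing = sum(1 for x in X for y in Y if frozenset((x, y)) in edges)
    return crossing / (len(X) * len(Y))

# Complete bipartite graph between {0, 1} and {2, 3}: every pair is an edge.
edges = {frozenset((x, y)) for x in (0, 1) for y in (2, 3)}
print(edge_density(edges, {0, 1}, {2, 3}))  # → 1.0
# Remove one edge: 3 of the 4 possible edges remain.
print(edge_density(edges - {frozenset((0, 2))}, {0, 1}, {2, 3}))  # → 0.75
```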
We call a pair of parts ε-regular if, whenever you take a large subset of each part, their edge density is not too far off the edge density of the pair of parts. Formally,
Definition 2. For \varepsilon > 0, a pair of vertex sets X and Y is called \varepsilon-regular if, for all subsets A \subseteq X, B \subseteq Y satisfying |A| \geq \varepsilon |X|, |B| \geq \varepsilon |Y|, we have
:\left| d(X,Y) - d(A,B) \right| \le \varepsilon.
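Definition 2 can be checked directly by brute force on tiny examples; the sketch below (our own helper names, and a deliberately naive exponential-time search) enumerates all sufficiently large subset pairs:

```python
from itertools import combinations

def edge_density(edges, X, Y):
    """d(X, Y) = |E(X, Y)| / (|X| |Y|); edges is a set of frozensets."""
    return sum(1 for x in X for y in Y if frozenset((x, y)) in edges) / (len(X) * len(Y))

def is_eps_regular(edges, X, Y, eps):
    """Brute-force check of Definition 2: every A ⊆ X, B ⊆ Y with
    |A| >= eps|X| and |B| >= eps|Y| must satisfy |d(X,Y) - d(A,B)| <= eps.
    Exponential in |X| + |Y|, so only usable on tiny illustrative graphs."""
    d = edge_density(edges, X, Y)
    def large_subsets(S, minimum):
        S = list(S)
        for r in range(1, len(S) + 1):
            if r >= minimum:
                for c in combinations(S, r):
                    yield set(c)
    return all(
        abs(d - edge_density(edges, A, B)) <= eps
        for A in large_subsets(X, eps * len(X))
        for B in large_subsets(Y, eps * len(Y))
    )

X, Y = {0, 1}, {2, 3}
# A complete bipartite pair has density 1 between any two subsets,
# so it is eps-regular for every eps > 0.
complete = {frozenset((x, y)) for x in X for y in Y}
print(is_eps_regular(complete, X, Y, 0.5))  # → True
# A single edge concentrates all density in one corner: not 0.5-regular.
print(is_eps_regular({frozenset((0, 2))}, X, Y, 0.5))  # → False
```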
The natural way to define an \varepsilon-regular partition would be one where each pair of parts is \varepsilon-regular. However, some graphs, such as the half graphs, require many pairs of parts (but a small fraction of all pairs) to be irregular. So we shall define an \varepsilon-regular partition to be one where most pairs of parts are \varepsilon-regular.
Definition 3. A partition of V into k sets \mathcal{P} = \{V_1, \ldots, V_k\} is called an \varepsilon-regular partition if
:\sum_{(V_i, V_j) \text{ not } \varepsilon\text{-regular}} |V_i| |V_j| \leq \varepsilon |V(G)|^2.
Now we can state the lemma:
Szemerédi's Regularity Lemma. For every \varepsilon > 0 and positive integer m there exists an integer M such that if G is a graph with at least M vertices, there exists an integer k in the range m \le k \le M and an \varepsilon-regular partition of the vertex set of G into k sets.
The bound M for the number of parts in the partition of the graph given by the proofs of Szemerédi's regularity lemma is very large, given by an \varepsilon^{-5}-level iterated exponential of m. At one time it was hoped that the true bound was much smaller, which would have had several useful applications. However, Gowers (1997) found examples of graphs for which M does indeed grow very fast and is at least as large as an \varepsilon^{-1/16}-level iterated exponential of m. In particular the best bound has level exactly 4 in the Grzegorczyk hierarchy, and so is not an elementary recursive function.
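To get a feel for how fast an iterated exponential grows, a short computation helps. The `tower` function below is the standard iterated exponential, written out for illustration:

```python
def tower(height, base=2):
    """Iterated exponential ("tower") function:
    tower(0) = 1 and tower(h) = base ** tower(h - 1)."""
    value = 1
    for _ in range(height):
        value = base ** value
    return value

# Tower growth is why the regularity lemma's bound on the number of parts
# is astronomically large even for modest heights.
print([tower(h) for h in range(5)])  # → [1, 2, 4, 16, 65536]
print(len(str(tower(5))))            # decimal digits of 2**65536 → 19729
```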


Proof

We shall find an ε-regular partition for a given graph following an algorithm. A rough outline:
# Start with an arbitrary partition (could just be 1 part).
# While the partition isn't ε-regular:
#* Find the subsets which witness ε-irregularity for each irregular pair.
#* Simultaneously refine the partition using all the witnessing subsets.
We apply a technique called the energy increment argument to show that this process terminates after a bounded number of steps. Basically, we define a monovariant which must increase by a certain amount in each step, but it is bounded above and thus cannot increase indefinitely. This monovariant is called 'energy' as it is an L^2 quantity.
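The refinement step in the outline can be sketched in code. This is a minimal illustration with hypothetical witness sets (the helper name `refine_by_sets` and the example data are our own): each part is split into the "atoms" determined by which witness subsets contain each vertex.

```python
def refine_by_sets(partition, witnesses):
    """One refinement step of the outline above: split every part according
    to which witness sets each vertex belongs to (the "Venn atoms").
    A part cut by m witness sets breaks into at most 2**m nonempty pieces."""
    refined = []
    for part in partition:
        atoms = {}
        for v in part:
            # The signature records which witnesses contain v.
            signature = frozenset(i for i, w in enumerate(witnesses) if v in w)
            atoms.setdefault(signature, set()).add(v)
        refined.extend(atoms.values())
    return refined

# Two parts refined simultaneously by two (hypothetical) witness sets.
parts = [{0, 1, 2, 3}, {4, 5, 6, 7}]
witnesses = [{0, 1, 4}, {1, 5, 6}]
print(sorted(sorted(p) for p in refine_by_sets(parts, witnesses)))
# → [[0], [1], [2, 3], [4], [5, 6], [7]]
```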
Definition 4. Let U, W be subsets of V. Set |V| = n. The energy of the pair (U, W) is defined as:
:q(U,W) := \frac{|U||W|}{n^2} d(U,W)^2
For partitions \mathcal{P}_U = \{U_1, \ldots, U_k\} of U and \mathcal{P}_W = \{W_1, \ldots, W_l\} of W, we define the energy to be the sum of the energies between each pair of parts:
:q(\mathcal{P}_U, \mathcal{P}_W) := \sum_{i=1}^k \sum_{j=1}^l q(U_i, W_j)
Finally, for a partition \mathcal{P} = \{V_1, \ldots, V_k\} of V, define the energy of \mathcal{P} to be q(\mathcal{P}, \mathcal{P}). Specifically,
:q(\mathcal{P}) = \sum_{i=1}^k \sum_{j=1}^k q(V_i, V_j) = \sum_{i=1}^k \sum_{j=1}^k \frac{|V_i||V_j|}{n^2} d(V_i, V_j)^2
Observe that energy is between 0 and 1 because edge density is bounded above by 1:
:q(\mathcal{P}) = \sum_{i=1}^k \sum_{j=1}^k \frac{|V_i||V_j|}{n^2} d(V_i, V_j)^2 \leq \sum_{i=1}^k \sum_{j=1}^k \frac{|V_i||V_j|}{n^2} = 1
Now, we start by proving that energy does not decrease upon refinement.
Lemma 1. (Energy is nondecreasing under partitioning) For any partitions \mathcal{P}_U and \mathcal{P}_W of vertex sets U and W, q(\mathcal{P}_U, \mathcal{P}_W) \geq q(U, W).
Proof: Let \mathcal{P}_U = \{U_1, \ldots, U_k\} and \mathcal{P}_W = \{W_1, \ldots, W_l\}. Choose vertices x uniformly from U and y uniformly from W, with x in part U_i and y in part W_j. Then define the random variable Z = d(U_i, W_j). Let us look at properties of Z. The expectation is
:\mathbb{E}[Z] = \sum_{i=1}^k \sum_{j=1}^l \frac{|U_i|}{|U|} \frac{|W_j|}{|W|} d(U_i, W_j) = \frac{e(U,W)}{|U||W|} = d(U,W)
The second moment is
:\mathbb{E}[Z^2] = \sum_{i=1}^k \sum_{j=1}^l \frac{|U_i|}{|U|} \frac{|W_j|}{|W|} d(U_i, W_j)^2 = \frac{n^2}{|U||W|} q(\mathcal{P}_U, \mathcal{P}_W)
By convexity, \mathbb{E}[Z^2] \geq \mathbb{E}[Z]^2. Rearranging, we get that q(\mathcal{P}_U, \mathcal{P}_W) \geq q(U, W) for all U, W. \square
If each part of a partition \mathcal{P} is further partitioned, the new partition is called a refinement of \mathcal{P}. Now, if \mathcal{P} = \{V_1, \ldots, V_k\}, applying Lemma 1 to each pair (V_i, V_j) proves that for every refinement \mathcal{Q} of \mathcal{P}, q(\mathcal{Q}) \geq q(\mathcal{P}). Thus the refinement step in the algorithm does not lose any energy.
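The energy of Definition 4, and the fact that refining never decreases it, can be checked numerically on a small example. This is an illustrative sketch (graph representation and helper names are our own):

```python
def edge_density(edges, X, Y):
    """d(X, Y) over ordered vertex pairs; edges is a set of frozensets."""
    return sum(1 for x in X for y in Y if frozenset((x, y)) in edges) / (len(X) * len(Y))

def energy(edges, partition, n):
    """q(P) = sum over all ordered pairs (V_i, V_j) of
    (|V_i||V_j| / n^2) * d(V_i, V_j)^2, as in Definition 4."""
    return sum(
        len(Vi) * len(Vj) / n ** 2 * edge_density(edges, Vi, Vj) ** 2
        for Vi in partition
        for Vj in partition
    )

# K_{2,2} between {0, 1} and {2, 3}; n = 4.
edges = {frozenset((x, y)) for x in (0, 1) for y in (2, 3)}
coarse = [{0, 1, 2, 3}]     # trivial partition
fine = [{0, 1}, {2, 3}]     # refinement along the bipartition
print(energy(edges, coarse, 4))  # → 0.25
print(energy(edges, fine, 4))    # → 0.5  (refinement does not lose energy)
```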
Lemma 2. (Energy boost lemma) If (U, W) is not \varepsilon-regular, as witnessed by U_1 \subset U, W_1 \subset W, then
:q\left(\{U_1, U \setminus U_1\}, \{W_1, W \setminus W_1\}\right) > q(U,W) + \varepsilon^4 \frac{|U||W|}{n^2}
Proof: Define Z as above. Then,
:\operatorname{Var}(Z) = \mathbb{E}[Z^2] - \mathbb{E}[Z]^2 = \frac{n^2}{|U||W|}\left( q\left(\{U_1, U \setminus U_1\}, \{W_1, W \setminus W_1\}\right) - q(U,W) \right)
But observe that |Z - \mathbb{E}[Z]| = |d(U_1, W_1) - d(U,W)| with probability \frac{|U_1|}{|U|} \frac{|W_1|}{|W|} (corresponding to x \in U_1 and y \in W_1), so
:\operatorname{Var}(Z) = \mathbb{E}\left[(Z - \mathbb{E}[Z])^2\right] \geq \frac{|U_1|}{|U|} \frac{|W_1|}{|W|} (d(U_1, W_1) - d(U,W))^2 > \varepsilon \cdot \varepsilon \cdot \varepsilon^2 = \varepsilon^4
Combining this with the first identity gives the claimed energy boost. \square
Now we can prove the energy increment argument, which shows that energy increases substantially in each iteration of the algorithm.
Lemma 3 (Energy increment lemma) If a partition \mathcal{P} = \{V_1, \ldots, V_k\} of V(G) is not \varepsilon-regular, then there exists a refinement \mathcal{Q} of \mathcal{P} where every V_i is partitioned into at most 2^k parts such that
:q(\mathcal{Q}) \geq q(\mathcal{P}) + \varepsilon^5.
Proof: For all (i, j) such that (V_i, V_j) is not \varepsilon-regular, find A^{i,j} \subset V_i and A^{j,i} \subset V_j that witness irregularity (do this simultaneously for all irregular pairs). Let \mathcal{Q} be a common refinement of \mathcal{P} by the sets A^{i,j}. Each V_i is partitioned into at most 2^k parts, as desired. Then,
:q(\mathcal{Q}) = \sum_{(i,j)} q(\mathcal{Q}_i, \mathcal{Q}_j) = \sum_{(V_i, V_j) \text{ } \varepsilon\text{-regular}} q(\mathcal{Q}_i, \mathcal{Q}_j) + \sum_{(V_i, V_j) \text{ not } \varepsilon\text{-regular}} q(\mathcal{Q}_i, \mathcal{Q}_j)
where \mathcal{Q}_i is the partition of V_i given by \mathcal{Q}. By Lemma 1, the above quantity is at least
:\sum_{(V_i, V_j) \text{ } \varepsilon\text{-regular}} q(V_i, V_j) + \sum_{(V_i, V_j) \text{ not } \varepsilon\text{-regular}} q\left(\{A^{i,j}, V_i \setminus A^{i,j}\}, \{A^{j,i}, V_j \setminus A^{j,i}\}\right)
Since V_i is cut by A^{i,j} when creating \mathcal{Q}, \mathcal{Q}_i is a refinement of \{A^{i,j}, V_i \setminus A^{i,j}\}. By Lemma 2, the above sum is at least
:\sum_{(V_i, V_j) \text{ } \varepsilon\text{-regular}} q(V_i, V_j) + \sum_{(V_i, V_j) \text{ not } \varepsilon\text{-regular}} \left( q(V_i, V_j) + \varepsilon^4 \frac{|V_i||V_j|}{n^2} \right)
But the second sum's increment is at least \varepsilon^5 since \mathcal{P} is not \varepsilon-regular, so we deduce the desired inequality. \square
Now, starting from any partition, we can keep applying Lemma 3 as long as the resulting partition isn't \varepsilon-regular. But in each step energy increases by at least \varepsilon^5, and it is bounded above by 1. Then this process can be repeated at most \varepsilon^{-5} times before it terminates, and we must have an \varepsilon-regular partition.


Applications


Graph counting lemma

If we have enough information about the regularity of a graph, we can count the number of copies of a specific subgraph within the graph up to small error.
Graph Counting Lemma. Let H be a graph with V(H) = \{1, \ldots, k\}, and let \varepsilon > 0. Let G be an n-vertex graph with vertex sets V_1, \dots, V_k \subseteq V(G) such that (V_i, V_j) is \varepsilon-regular whenever \{i, j\} \in E(H). Then, the number of labeled copies of H in G is within e(H) \varepsilon |V_1| \cdots |V_k| of
:\left( \prod_{\{i,j\} \in E(H)} d(V_i, V_j) \right) \left( \prod_{i=1}^k |V_i| \right).
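A degenerate but instructive case of the counting lemma can be verified directly: in a complete tripartite graph all three pairwise densities are 1 and every pair is \varepsilon-regular for every \varepsilon > 0, so the main term |V_1||V_2||V_3| counts the labeled triangles exactly. This sketch uses our own representation and helper names:

```python
from itertools import product

def count_labeled_triangles(edges, V1, V2, V3):
    """Count labeled copies of K_3 with the i-th vertex in part V_i."""
    return sum(
        1
        for x, y, z in product(V1, V2, V3)
        if frozenset((x, y)) in edges
        and frozenset((y, z)) in edges
        and frozenset((x, z)) in edges
    )

# Complete tripartite graph on parts of sizes 2, 3, 2.
V1, V2, V3 = {0, 1}, {2, 3, 4}, {5, 6}
edges = {
    frozenset(p)
    for A, B in [(V1, V2), (V2, V3), (V1, V3)]
    for p in product(A, B)
}
# Counting lemma main term: d^3 * |V1||V2||V3| = 1 * 2 * 3 * 2 = 12, exact here.
print(count_labeled_triangles(edges, V1, V2, V3))  # → 12
```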
This can be combined with Szemerédi's regularity lemma to prove the graph removal lemma. The graph removal lemma can be used to prove Roth's theorem on arithmetic progressions, and a generalization of it, the hypergraph removal lemma, can be used to prove Szemerédi's theorem. The graph removal lemma generalizes to induced subgraphs, by considering edge edits instead of only edge deletions. This was proved by Alon, Fischer, Krivelevich, and Szegedy in 2000. However, this required a stronger variation of the regularity lemma. Szemerédi's regularity lemma does not provide meaningful results in sparse graphs. Since sparse graphs have subconstant edge densities, \varepsilon-regularity is trivially satisfied. Even though the result seems purely theoretical, some attempts have been made to use the regularity method as a compression technique for large graphs.


Frieze-Kannan regularity

A different notion of regularity was introduced by Frieze and Kannan, known as the weak regularity lemma. This lemma defines a notion of regularity weaker than that of Szemerédi which nevertheless has much better bounds and can be used in efficient algorithms. Given a graph G = (V, E), a partition of its vertices \mathcal{P} = \{V_1, \ldots, V_k\} is said to be ''Frieze–Kannan'' \epsilon-regular if for any pair of sets S, T \subseteq V:
: \left| e(S,T) - \sum_{i,j=1}^k d(V_i, V_j) |S \cap V_i| |T \cap V_j| \right| \leq \epsilon |V|^2
The weak regularity lemma for graphs states that every graph has a weak \epsilon-regular partition into at most 4^{1/\epsilon^2} parts. This notion can be extended to graphons by defining a stepping operator. Given a graphon W and a partition \mathcal{P} of [0, 1], we can define W_{\mathcal{P}} as a step-graphon with steps given by \mathcal{P} and values given by averaging W over each step. A partition \mathcal{P} is weak \epsilon-regular if:
: \| W - W_{\mathcal{P}} \|_{\square} \leq \epsilon
The weak regularity lemma for graphons states that every graphon has a weak \epsilon-regular partition into at most 4^{1/\epsilon^2} parts. As with Szemerédi's regularity lemma, the weak regularity lemma also induces a counting lemma.
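The Frieze–Kannan condition can be evaluated for a single test pair (S, T) with a few lines of code. This sketch (representation and helper names are ours) compares the true edge count between S and T against the prediction made from the partition's density profile:

```python
from itertools import product

def edge_density(edges, X, Y):
    """d(X, Y) over ordered vertex pairs; edges is a set of frozensets."""
    return sum(1 for x in X for y in Y if frozenset((x, y)) in edges) / (len(X) * len(Y))

def fk_discrepancy(edges, partition, S, T):
    """|e(S, T) - sum_{i,j} d(V_i, V_j) |S ∩ V_i| |T ∩ V_j|| for one test
    pair (S, T): how far the partition's densities are from predicting the
    actual edge count between S and T."""
    actual = sum(1 for s in S for t in T if frozenset((s, t)) in edges)
    predicted = sum(
        edge_density(edges, Vi, Vj) * len(S & Vi) * len(T & Vj)
        for Vi in partition
        for Vj in partition
    )
    return abs(actual - predicted)

# For K_{2,2}, the bipartition's densities (1 across, 0 within) predict
# every cut exactly, so the discrepancy vanishes for any S, T.
edges = {frozenset(p) for p in product({0, 1}, {2, 3})}
partition = [{0, 1}, {2, 3}]
print(fk_discrepancy(edges, partition, {0, 2}, {1, 3}))  # → 0.0
```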


Algorithmic Applications

One of the initial motivations for the development of the weak regularity lemma was the search for an efficient algorithm for estimating the maximum cut in a dense graph. It has been shown that approximating the max-cut problem beyond 16/17 is NP-hard; however, an algorithmic version of the weak regularity lemma gives an efficient algorithm for approximating the max-cut of a dense graph within an \epsilon n^2 additive error. These ideas have been further developed into efficient sampling algorithms for estimating max-cut in dense graphs. The smaller bounds of the weak regularity lemma allow for efficient algorithms to find an \epsilon-regular partition. Graph regularity has further been used in various areas of theoretical computer science, such as matrix multiplication and communication complexity.


Strong regularity lemma

The strong regularity lemma is a stronger variation of the regularity lemma proven by Alon, Fischer, Krivelevich, and Szegedy in 2000. Intuitively, it provides information between non-regular pairs and can be applied to prove the induced graph removal lemma.


Statement

For any infinite sequence of constants \epsilon_0 \ge \epsilon_1 \ge \cdots > 0, there exists an integer M such that for any graph G, we can obtain two (equitable) partitions \mathcal{P} and \mathcal{Q} such that the following properties are satisfied:
* \mathcal{Q} refines \mathcal{P}, that is, every part of \mathcal{P} is the union of some collection of parts in \mathcal{Q}.
* \mathcal{P} is \epsilon_0-regular and \mathcal{Q} is \epsilon_{|\mathcal{P}|}-regular.
* q(\mathcal{Q}) \le q(\mathcal{P}) + \epsilon_0
* |\mathcal{Q}| \le M


Proof

We apply the regularity lemma repeatedly to prove the stronger version. A rough outline:
* Start with \mathcal{P}_0, an \epsilon_0-regular partition.
* Repeatedly find a refinement \mathcal{Q} that is \epsilon_{|\mathcal{P}|}-regular. If the energy increment of \mathcal{Q} is \le \epsilon_0, we simply return (\mathcal{P}, \mathcal{Q}). Otherwise, we replace \mathcal{P} with \mathcal{Q} and continue.
We start with \mathcal{P}_0, an \epsilon_0-regular partition of G with at most M(\epsilon_0) parts. Here M(t) corresponds to the bound on the number of parts in the regularity lemma when \epsilon = t. Now for i = 0, 1, \ldots, we set \mathcal{P}_{i+1} to be an \epsilon_{|\mathcal{P}_i|}-regular refinement of \mathcal{P}_i with at most M(\epsilon_{|\mathcal{P}_i|}) |\mathcal{P}_i| parts. By the energy increment argument, q(\mathcal{P}_{i+1}) \ge q(\mathcal{P}_i). Since the energy is bounded in [0, 1], there must be some i \le 1/\epsilon_0 + 1 such that q(\mathcal{P}_{i+1}) - q(\mathcal{P}_i) < \epsilon_0. We return (\mathcal{P}_i, \mathcal{P}_{i+1}) as (\mathcal{P}, \mathcal{Q}). By our choice of \mathcal{P}_{i+1}, the regularity and refinement conditions hold. The energy condition holds trivially. Now we argue for the number of parts. We use induction to show that for all i there exists M_i such that |\mathcal{P}_i| \le M_i. By setting M_0 = M(\epsilon_0), we have |\mathcal{P}_0| \le M_0. Note that when |\mathcal{P}_i| \le M_i, we have |\mathcal{P}_{i+1}| \le M(\epsilon_{|\mathcal{P}_i|}) |\mathcal{P}_i| \le M(\epsilon_{M_i}) M_i, so we can set M_{i+1} = M(\epsilon_{M_i}) M_i and the statement is true for i + 1. By setting M = \max_i M_i, we have |\mathcal{P}|, |\mathcal{Q}| \le M.


Remarks on equitable partitions

A partition is equitable if the sizes of any two sets differ by at most 1. By equitizing in each round of iteration, the proof of the regularity lemma can be adapted to prove the equitable version of the regularity lemma. And by replacing the regularity lemma with its equitable version, the proof above proves the equitable version of the strong regularity lemma, where \mathcal{P} and \mathcal{Q} are equitable partitions.


A useful Corollary


Statement

For any infinite sequence of constants \epsilon_0 \ge \epsilon_1 \ge \cdots > 0, there exists \delta > 0 such that for any graph G there exist a partition \mathcal{P} = \{V_1, \ldots, V_k\} and subsets W_i \subset V_i for each i, where the following properties are satisfied:
* |W_i| > \delta n
* (W_i, W_j) is \epsilon_{|\mathcal{P}|}-regular for each pair 1 \le i \le j \le k
* |d(W_i, W_j) - d(V_i, V_j)| \le \epsilon_0 for all but \epsilon_0 |\mathcal{P}|^2 pairs 1 \le i \le j \le k


Motivation

The corollary explores the small energy increment more deeply. It gives us a partition together with large subsets from each part, which are pairwise regular. In addition, the density between the corresponding subset pairs differs "not much" from the density between the corresponding parts.


Proof of corollary

We'll only prove the weaker result where the second condition only requires (W_i, W_j) to be \epsilon_{|\mathcal{P}|}-regular for 1 \le i < j \le k. The full version can be proved by picking more subsets from each part that are mostly pairwise regular and combining them together. Let r = \epsilon_0^3/20. We apply the strong regularity lemma to find an equitable \mathcal{P} that is an r-regular partition and an equitable \mathcal{Q} that is an r/|\mathcal{P}|^4-regular refinement of \mathcal{P}, such that q(\mathcal{Q}) - q(\mathcal{P}) \le r and |\mathcal{Q}| \le M. Now assume that \mathcal{P} = \{V_1, \ldots, V_k\}; we randomly pick a vertex v_i from each V_i and let W_i be the part of \mathcal{Q} that contains v_i. We argue that the subsets W_i satisfy all the conditions with probability > 0. By setting \delta = \frac{1}{2M}, the first condition is trivially true since \mathcal{Q} is an equitable partition. Since only a small fraction of vertex pairs live between irregular pairs of parts in \mathcal{Q}, the probability that a given pair (W_i, W_j) is irregular is small, and by the union bound, the probability that at least one pair (W_i, W_j) is irregular is \le 1/3. Note that
:r \ge q(\mathcal{Q}) - q(\mathcal{P}) = \sum_{i \le j} \frac{|V_i||V_j|}{n^2} \mathbb{E}\left[|d(W_i, W_j) - d(V_i, V_j)|^2\right] \ge \frac{1}{4|\mathcal{P}|^2} \mathbb{E}\left[\sum_{i \le j} |d(W_i, W_j) - d(V_i, V_j)|^2\right]
So by Markov's inequality, \mathbb{P}\left(\sum_{i \le j} |d(W_i, W_j) - d(V_i, V_j)|^2 \ge 8|\mathcal{P}|^2 r\right) \le 1/2, so with probability \ge 1/2, at most \epsilon_0 |\mathcal{P}|^2 pairs can have |d(W_i, W_j) - d(V_i, V_j)| \ge \epsilon_0. By the union bound, the probability that all conditions hold is \ge 1 - 1/2 - 1/3 > 0. \square


History and Extensions

Szemerédi first introduced a weaker version of this lemma, restricted to bipartite graphs, in order to prove Szemerédi's theorem, and in 1978 he proved the full lemma. Extensions of the regularity method to hypergraphs were obtained by Rödl and his collaborators and by Gowers. János Komlós, Gábor Sárközy and Endre Szemerédi later (in 1997) proved in the blow-up lemma that the regular pairs in the Szemerédi regularity lemma behave like complete bipartite graphs under the correct conditions. The lemma allowed for deeper exploration into the nature of embeddings of large sparse graphs into dense graphs. The first constructive version was provided by Alon, Duke, Lefmann, Rödl and Yuster. Subsequently, Frieze and Kannan gave a different version and extended it to hypergraphs. They later produced a different construction that uses singular values of matrices. One can find more efficient non-deterministic algorithms, as formally detailed in Terence Tao's blog and implicitly mentioned in various papers. An inequality of Terence Tao extends the Szemerédi regularity lemma, by revisiting it from the perspective of probability theory and information theory instead of graph theory. Terence Tao has also provided a proof of the lemma based on spectral theory, using the adjacency matrices of graphs. It is not possible to prove a variant of the regularity lemma in which all pairs of partition sets are regular. Some graphs, such as the half graphs, require many pairs of parts (but a small fraction of all pairs) to be irregular. It is a common variant in the definition of an \varepsilon-regular partition to require that the vertex sets all have the same size, while collecting the leftover vertices in an "error" set V_0 whose size is at most an \varepsilon-fraction of the size of the vertex set of G. A stronger variation of the regularity lemma was proven by Alon, Fischer, Krivelevich, and Szegedy while proving the induced graph removal lemma. This works with a sequence of \varepsilon instead of just one, and shows that there exists a partition with an extremely regular refinement, where the refinement doesn't have too large an energy increment. Szemerédi's regularity lemma can be interpreted as saying that the space of all graphs is totally bounded (and hence precompact) in a suitable metric (the cut distance). Limits in this metric can be represented by graphons; another version of the regularity lemma simply states that the space of graphons is compact.




External links

* Edmonds, Chelsea; Koutsoukou-Argyraki, Angeliki; Paulson, Lawrence C., Szemerédi's regularity lemma (Formal proof development in Isabelle/HOL, Archive of Formal Proofs)