Sauer–Shelah Lemma
   HOME

TheInfoList



OR:

In combinatorial mathematics and extremal set theory, the Sauer–Shelah lemma states that every
family of sets In set theory and related branches of mathematics, a family (or collection) can mean, depending upon the context, any of the following: set, indexed set, multiset, or class. A collection F of subsets of a given set S is called a family of su ...
with small
VC dimension VC may refer to: Military decorations * Victoria Cross, a military decoration awarded by the United Kingdom and other Commonwealth nations ** Victoria Cross for Australia ** Victoria Cross (Canada) ** Victoria Cross for New Zealand * Victorious ...
consists of a small number of sets. It is named after Norbert Sauer and
Saharon Shelah Saharon Shelah (; , ; born July 3, 1945) is an Israeli mathematician. He is a professor of mathematics at the Hebrew University of Jerusalem and Rutgers University in New Jersey. Biography Shelah was born in Jerusalem on July 3, 1945. He is th ...
, who published it independently of each other in 1972. The same result was also published slightly earlier and again independently, by Vladimir Vapnik and Alexey Chervonenkis, after whom the VC dimension is named. In his paper containing the lemma, Shelah gives credit also to
Micha Perles Micha Asher Perles () is an Israeli mathematician working in geometry, a professor emeritus at the Hebrew University. He earned his Ph.D. in 1964 from the Hebrew University, under the supervision of Branko Grünbaum. His contributions include: *Th ...
, and for this reason the lemma has also been called the Perles–Sauer–Shelah lemma and the Sauer–Shelah–Perles lemma. Buzaglo et al. call this lemma "one of the most fundamental results on VC-dimension", and it has applications in many areas. Sauer's motivation was in the
combinatorics Combinatorics is an area of mathematics primarily concerned with counting, both as a means and as an end to obtaining results, and certain properties of finite structures. It is closely related to many other areas of mathematics and has many ...
of set systems, while Shelah's was in
model theory In mathematical logic, model theory is the study of the relationship between theory (mathematical logic), formal theories (a collection of Sentence (mathematical logic), sentences in a formal language expressing statements about a Structure (mat ...
and that of Vapnik and Chervonenkis was in
statistics Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...
. It has also been applied in
discrete geometry Discrete geometry and combinatorial geometry are branches of geometry that study combinatorial properties and constructive methods of discrete geometric objects. Most questions in discrete geometry involve finite or discrete sets of basic geom ...
and
graph theory In mathematics and computer science, graph theory is the study of ''graph (discrete mathematics), graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of ''Vertex (graph ...
.


Definitions and statement

If \textstyle \mathcal=\ is a family of sets and T is a set, then T is said to be shattered by \mathcal if every subset of T (including the
empty set In mathematics, the empty set or void set is the unique Set (mathematics), set having no Element (mathematics), elements; its size or cardinality (count of elements in a set) is 0, zero. Some axiomatic set theories ensure that the empty set exi ...
and T itself) can be obtained as the intersection T\cap S_i of T with some set S_i in the family. The
VC dimension VC may refer to: Military decorations * Victoria Cross, a military decoration awarded by the United Kingdom and other Commonwealth nations ** Victoria Cross for Australia ** Victoria Cross (Canada) ** Victoria Cross for New Zealand * Victorious ...
of \mathcal is the largest
cardinality The thumb is the first digit of the hand, next to the index finger. When a person is standing in the medical anatomical position (where the palm is facing to the front), the thumb is the outermost digit. The Medical Latin English noun for thum ...
of a set shattered by \mathcal. In terms of these definitions, the Sauer–Shelah lemma states that if the VC dimension of \mathcal is k, and the union of \mathcal has n elements, then \mathcal can consist of at most \sum_^ =O(n^k) sets, as expressed using
big O notation Big ''O'' notation is a mathematical notation that describes the asymptotic analysis, limiting behavior of a function (mathematics), function when the Argument of a function, argument tends towards a particular value or infinity. Big O is a memb ...
. Equivalently, if the number of sets in the family, , \mathcal, , obeys the inequality , \mathcal, > \sum_^ , then \mathcal shatters a set of size k. The bound of the lemma is tight: Let the family \mathcal be composed of all subsets of \ with size less than k. Then the number of sets in \mathcal is exactly \sum_^ but it does not shatter any set of size k.


The number of shattered sets

A strengthening of the Sauer–Shelah lemma, due to , states that every finite set family \mathcal shatters at least , \mathcal, sets. This immediately implies the Sauer–Shelah lemma, because only \sum_^ of the subsets of an n-item universe have cardinality less than k. Thus, when , \mathcal, >\sum_^ , there are not enough small sets to be shattered, so one of the shattered sets must have cardinality at least k. For a restricted type of shattered set, called an order-shattered set, the number of shattered sets always equals the cardinality of the set family.


Proof

Pajor's variant of the Sauer–Shelah lemma may be proved by
mathematical induction Mathematical induction is a method for mathematical proof, proving that a statement P(n) is true for every natural number n, that is, that the infinitely many cases P(0), P(1), P(2), P(3), \dots  all hold. This is done by first proving a ...
; the proof has variously been credited to
Noga Alon Noga Alon (; born 1956) is an Israeli mathematician and a professor of mathematics at Princeton University noted for his contributions to combinatorics and theoretical computer science, having authored hundreds of papers. Education and career ...
or to
Ron Aharoni Ron Aharoni (; born 1952) is an Israeli mathematician, working in finite and infinite combinatorics. Aharoni is a professor at the Technion – Israel Institute of Technology, where he received his Ph.D. in mathematics in 1979. With Nash-William ...
and Ron Holzman. ;Base: Every family of only one set shatters the empty set. ;Step: Assume the lemma is true for all families of size less than , \mathcal, and let \mathcal be a family of two or more sets. Let x be an element that belongs to some but not all of the sets in \mathcal. Split \mathcal into two subfamilies, of the sets that contain x and the sets that do not contain x. By the induction assumption, these two subfamilies shatter two collections of sets whose sizes add to at least , \mathcal, . None of these shattered sets contain x, since a set that contains x cannot be shattered by a family in which all sets contain x or all sets do not contain x. Some of the shattered sets may be shattered by both subfamilies. When a set S is shattered by only one of the two subfamilies, it contributes one unit both to the number of shattered sets of the subfamily and to the number of shattered sets of \mathcal. When a set S is shattered by both subfamilies, both S and S\cup\ are shattered by \mathcal, so S contributes two units to the number of shattered sets of the subfamilies and of \mathcal. Therefore, the number of shattered sets of \mathcal is at least equal to the number shattered by the two subfamilies of \mathcal, which is at least , \mathcal, . A different proof of the Sauer–Shelah lemma in its original form, by
Péter Frankl Péter Frankl (born 26 March 1953 in Kaposvár, Somogy County, Hungary) is a mathematician, street performer, columnist and educator, active in Japan. Frankl studied mathematics at Eötvös Loránd University in Budapest and submitted his PhD ...
and
János Pach János Pach (born May 3, 1954) is a mathematician and computer scientist working in the fields of combinatorics and discrete and computational geometry. Biography Pach was born and grew up in Hungary. He comes from a noted academic family: his f ...
, is based on
linear algebra Linear algebra is the branch of mathematics concerning linear equations such as :a_1x_1+\cdots +a_nx_n=b, linear maps such as :(x_1, \ldots, x_n) \mapsto a_1x_1+\cdots +a_nx_n, and their representations in vector spaces and through matrix (mathemat ...
and the
inclusion–exclusion principle In combinatorics, the inclusion–exclusion principle is a counting technique which generalizes the familiar method of obtaining the number of elements in the union (set theory), union of two finite sets; symbolically expressed as : , A \cup B, ...
. This proof extends to other settings such as families of vector spaces and, more generally, geometric lattices.


Applications

The original application of the lemma, by Vapnik and Chervonenkis, was in showing that every probability distribution can be approximated (with respect to a family of events of a given VC dimension) by a finite set of sample points whose
cardinality The thumb is the first digit of the hand, next to the index finger. When a person is standing in the medical anatomical position (where the palm is facing to the front), the thumb is the outermost digit. The Medical Latin English noun for thum ...
depends only on the VC dimension of the family of events. In this context, there are two important notions of approximation, both parameterized by a number \varepsilon: a set S of samples, and a probability distribution on S, is said to be an \varepsilon-approximation of the original distribution if the probability of each event with respect to S differs from its original probability by at most \varepsilon. A set S of (unweighted) samples is said to be an \varepsilon-net if every event with probability at least \varepsilon includes at least one point of S. An \varepsilon-approximation must also be an \varepsilon-net but not necessarily vice versa. Vapnik and Chervonenkis used the lemma to show that set systems of VC dimension d always have \varepsilon-approximations of cardinality O(\tfrac\log\tfrac). Later authors including and similarly showed that there always exist \varepsilon-nets of cardinality O(\tfrac\log\tfrac), and more precisely of cardinality at most \tfrac\ln\tfrac+\tfrac\ln\ln\tfrac+\tfrac. The main idea of the proof of the existence of small \varepsilon-nets is to choose a random sample x of cardinality O(\tfrac\log\tfrac) and a second independent random sample y of cardinality O(\tfrac\log^2\tfrac), and to bound the probability that x is missed by some large event E by the probability that x is missed and simultaneously the intersection of y with E is larger than its median value. For any particular E, the probability that x is missed while y is larger than its median is very small, and the Sauer–Shelah lemma (applied to x\cup y) shows that only a small number of distinct events E need to be considered, so by the
union bound In probability theory, Boole's inequality, also known as the union bound, says that for any finite or countable set of events, the probability that at least one of the events happens is no greater than the sum of the probabilities of the indiv ...
, with nonzero probability, x is an \varepsilon-net. In turn, \varepsilon-nets and \varepsilon-approximations, and the likelihood that a random sample of large enough cardinality has these properties, have important applications in
machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
, in the area of
probably approximately correct learning In computational learning theory, probably approximately correct (PAC) learning is a framework for mathematical analysis of machine learning. It was proposed in 1984 by Leslie Valiant.L. Valiant. A theory of the learnable.' Communications of the ...
. In computational geometry, they have been applied to
range searching In computer science, the range searching problem consists of processing a set ''S'' of objects, in order to determine which objects from ''S'' intersect with a query object, called the ''range''. For example, if ''S'' is a set of points correspond ...
,
derandomization A randomized algorithm is an algorithm that employs a degree of randomness as part of its logic or procedure. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performan ...
, and
approximation algorithm In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems (in particular NP-hard problems) with provable guarantees on the distance of the returned sol ...
s. use generalizations of the Sauer–Shelah lemma to prove results in
graph theory In mathematics and computer science, graph theory is the study of ''graph (discrete mathematics), graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of ''Vertex (graph ...
such as that the number of
strong orientation In graph theory, a strong orientation of an undirected graph is an assignment of a direction to each edge (an Orientation (graph theory), orientation) that makes it into a strongly connected graph. Strong orientations have been applied to the des ...
s of a given graph is sandwiched between its numbers of
connected Connected may refer to: Film and television * ''Connected'' (2008 film), a Hong Kong remake of the American movie ''Cellular'' * '' Connected: An Autoblogography About Love, Death & Technology'', a 2011 documentary film * ''Connected'' (2015 TV ...
and 2-edge-connected subgraphs.


See also

*
Growth function The growth function, also called the shatter coefficient or the shattering number, measures the richness of a set family or class of functions. It is especially used in the context of statistical learning theory, where it is used to study propertie ...


References

{{DEFAULTSORT:Sauer-Shelah lemma Families of sets Lemmas Extremal combinatorics