HOME

TheInfoList



OR:

The concept of shattered sets plays an important role in
Vapnik–Chervonenkis theory Vapnik–Chervonenkis theory (also known as VC theory) was developed during 1960–1990 by Vladimir Vapnik and Alexey Chervonenkis. The theory is a form of computational learning theory, which attempts to explain the learning process from a statis ...
, also known as VC-theory. Shattering and VC-theory are used in the study of
empirical process In probability theory, an empirical process is a stochastic process that describes the proportion of objects in a system in a given state. For a process in a discrete state space a population continuous time Markov chain or Markov population model ...
es as well as in statistical
computational learning theory In computer science, computational learning theory (or just learning theory) is a subfield of artificial intelligence devoted to studying the design and analysis of machine learning algorithms. Overview Theoretical results in machine learning m ...
.


Definition

Suppose ''A'' is a
set Set, The Set, SET or SETS may refer to: Science, technology, and mathematics Mathematics *Set (mathematics), a collection of elements *Category of sets, the category whose objects and morphisms are sets and total functions, respectively Electro ...
and ''C'' is a
class Class or The Class may refer to: Common uses not otherwise categorized * Class (biology), a taxonomic rank * Class (knowledge representation), a collection of individuals or objects * Class (philosophy), an analytical concept used differentl ...
of sets. The class ''C'' shatters the set ''A'' if for each subset ''a'' of ''A'', there is some element ''c'' of ''C'' such that : a = c \cap A. Equivalently, ''C'' shatters ''A'' when their
intersection In mathematics, the intersection of two or more objects is another object consisting of everything that is contained in all of the objects simultaneously. For example, in Euclidean geometry, when two lines in a plane are not parallel, their i ...
is equal to ''As
power set In mathematics, the power set (or powerset) of a set is the set of all subsets of , including the empty set and itself. In axiomatic set theory (as developed, for example, in the ZFC axioms), the existence of the power set of any set is po ...
: ''P''(''A'') = . We employ the letter ''C'' to refer to a "class" or "collection" of sets, as in a Vapnik–Chervonenkis class (VC-class). The set ''A'' is often assumed to be
finite Finite is the opposite of infinite. It may refer to: * Finite number (disambiguation) * Finite set, a set whose cardinality (number of elements) is some natural number * Finite verb, a verb form that has a subject, usually being inflected or marked ...
because, in empirical processes, we are interested in the shattering of finite sets of data points.


Example

We will show that the class of all discs in the
plane Plane(s) most often refers to: * Aero- or airplane, a powered, fixed-wing aircraft * Plane (geometry), a flat, 2-dimensional surface Plane or planes may also refer to: Biology * Plane (tree) or ''Platanus'', wetland native plant * Planes (gen ...
(two-dimensional space) does not shatter every set of four points on the
unit circle In mathematics, a unit circle is a circle of unit radius—that is, a radius of 1. Frequently, especially in trigonometry, the unit circle is the circle of radius 1 centered at the origin (0, 0) in the Cartesian coordinate system in the Eucl ...
, yet the class of all
convex set In geometry, a subset of a Euclidean space, or more generally an affine space over the reals, is convex if, given any two points in the subset, the subset contains the whole line segment that joins them. Equivalently, a convex set or a convex r ...
s in the plane does shatter every finite set of points on the
unit circle In mathematics, a unit circle is a circle of unit radius—that is, a radius of 1. Frequently, especially in trigonometry, the unit circle is the circle of radius 1 centered at the origin (0, 0) in the Cartesian coordinate system in the Eucl ...
. Let ''A'' be a set of four points on the unit circle and let ''C'' be the class of all discs. To test where ''C'' shatters ''A'', we attempt to draw a disc around every subset of points in ''A''. First, we draw a disc around the subsets of each isolated point. Next, we try to draw a disc around every subset of point pairs. This turns out to be doable for adjacent points, but impossible for points on opposite sides of the circle. As visualized below: Image:shattering02.png, Each individual point can be isolated with a disc (showing all four). Image:shattering03.png, Each subset of adjacent points can be isolated with a disc (showing one of four). Image:shattering04.png, A subset of points on opposite sides of the unit circle can ''not'' be isolated with a disc. Because there is some subset which can ''not'' be isolated by any disc in ''C'', we conclude then that ''A'' is not shattered by ''C''. And, with a bit of thought, we can prove that no set of four points is shattered by this ''C''. However, if we redefine ''C'' to be the class of all ''elliptical discs'', we find that we can still isolate all the subsets from above, as well as the points that were formerly problematic. Thus, this specific set of 4 points is shattered by the class of elliptical discs. Visualized below: Image:shattering05.png, Opposite points of ''A'' are now separable by some ellipse (showing one of two) Image:shattering06.png, Each subset of three points in ''A'' is also separable by some ellipse (showing one of four) With a bit of thought, we could generalize that any set of finite points on a unit circle could be shattered by the class of all
convex set In geometry, a subset of a Euclidean space, or more generally an affine space over the reals, is convex if, given any two points in the subset, the subset contains the whole line segment that joins them. Equivalently, a convex set or a convex r ...
s (visualize connecting the dots).


Shatter coefficient

To quantify the richness of a collection ''C'' of sets, we use the concept of ''shattering coefficients'' (also known as the ''growth function''). For a collection ''C'' of sets s \subset \Omega, \Omega being any space, often a
sample space In probability theory, the sample space (also called sample description space, possibility space, or outcome space) of an experiment or random trial is the set of all possible outcomes or results of that experiment. A sample space is usually den ...
, we define the ''n''th ''shattering coefficient'' of ''C'' as : S_C(n) = \max_ \operatorname \ where \operatorname denotes the
cardinality In mathematics, the cardinality of a set is a measure of the number of elements of the set. For example, the set A = \ contains 3 elements, and therefore A has a cardinality of 3. Beginning in the late 19th century, this concept was generalized ...
of the set and x_1,x_2,\dots,x_n \in \Omega is any set of ''n'' points,. S_C(n) is the largest number of subsets of any set ''A'' of ''n'' points that can be formed by intersecting ''A'' with the sets in collection ''C''. Here are some facts about S_C(n): # S_C(n)\leq 2^n for all ''n'' because \\subseteq P(A) for any A\subseteq \Omega. # If S_C(n)=2^n, that means there is a set of cardinality ''n'', which can be shattered by ''C''. # If S_C(N)<2^N for some N>1 then S_C(n)<2^n for all n\geq N. The third property means that if ''C'' cannot shatter any set of cardinality ''N'' then it can not shatter sets of larger cardinalities.


Vapnik–Chervonenkis class

The
VC dimension VC may refer to: Military decorations * Victoria Cross, a military decoration awarded by the United Kingdom and also by certain Commonwealth nations ** Victoria Cross for Australia ** Victoria Cross (Canada) ** Victoria Cross for New Zealand * Vic ...
of a class ''C'' is defined as :VC(C)=\underset\\, or, alternatively, as :VC_0(C)=\underset\.\, Note that VC(C)=VC_0(C)+1. If for any ''n'' there is a set of cardinality ''n'' which can be shattered by ''C'', then S_C(n)=2^n for all ''n'' and the VC dimension of this class ''C'' is infinite. A class with finite VC dimension is called a ''Vapnik–Chervonenkis class'' or ''VC class''. A class ''C'' is uniformly Glivenko–Cantelli if and only if it is a VC class.


See also

*
Sauer–Shelah lemma In combinatorial mathematics and extremal set theory, the Sauer–Shelah lemma states that every family of sets with small VC dimension consists of a small number of sets. It is named after Norbert Sauer and Saharon Shelah, who published it indep ...
, relating the cardinality of a
family of sets In set theory and related branches of mathematics, a collection F of subsets of a given set S is called a family of subsets of S, or a family of sets over S. More generally, a collection of any sets whatsoever is called a family of sets, set fam ...
to the size of its largest shattered set


References

*. * *{{citation, authorlink=J. Michael Steele, last=Steele, first=J. M., year=1978, title=Empirical discrepancies and subadditive processes, journal=
Annals of Probability The ''Annals of Probability'' is a leading peer-reviewed probability journal published by the Institute of Mathematical Statistics, which is the main international society for researchers in the areas probability and statistics. The journal was sta ...
, volume=6, pages=118–227, issue=1, jstor=2242865, doi=10.1214/aop/1176995615, doi-access=free.


External links


Origin of "Shattered sets" terminology
by J. Steele Empirical process Computational learning theory