Growth Function
   HOME

TheInfoList



OR:

The growth function, also called the shatter coefficient or the shattering number, measures the richness of a
set family In set theory and related branches of mathematics, a collection F of subsets of a given set S is called a family of subsets of S, or a family of sets over S. More generally, a collection of any sets whatsoever is called a family of sets, set f ...
. It is especially used in the context of
statistical learning theory Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. Statistical learning theory deals with the statistical inference problem of finding a predictive function based on dat ...
, where it measures the complexity of a hypothesis class. The term 'growth function' was coined by Vapnik and Chervonenkis in their 1968 paper, where they also proved many of its properties. It is a basic concept in
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
., especially Section 3.2


Definitions


Set-family definition

Let H be a
set family In set theory and related branches of mathematics, a collection F of subsets of a given set S is called a family of subsets of S, or a family of sets over S. More generally, a collection of any sets whatsoever is called a family of sets, set f ...
(a set of sets) and C a set. Their ''intersection'' is defined as the following set-family: : H\cap C := \ The ''intersection-size'' (also called the ''index'') of H with respect to C is , H\cap C, . If a set C_m has m elements then the index is at most 2^m. If the index is exactly 2''m'' then the set C is said to be shattered by H, because H\cap C contains all the subsets of C, i.e.: : , H\cap C, =2^, The growth function measures the size of H\cap C as a function of , C, . Formally: : \operatorname(H,m) := \max_ , H\cap C,


Hypothesis-class definition

Equivalently, let H be a hypothesis-class (a set of binary functions) and C a set with m elements. The ''restriction'' of H to C is the set of binary functions on C that can be derived from H: : H_ := \ The growth function measures the size of H_C as a function of , C, : : \operatorname(H,m) := \max_ , H_C,


Examples

1. The domain is the real line \mathbb. The set-family H contains all the
half-line In geometry, a line is an infinitely long object with no width, depth, or curvature. Thus, lines are one-dimensional objects, though they may exist in two, three, or higher dimension spaces. The word ''line'' may also refer to a line segmen ...
s (rays) from a given number to positive infinity, i.e., all sets of the form \ for some x_0\in \mathbb. For any set C of m real numbers, the intersection H\cap C contains m+1 sets: the empty set, the set containing the largest element of C, the set containing the two largest elements of C, and so on. Therefore: \operatorname(H,m)=m+1. The same is true whether H contains open half-lines, closed half-lines, or both. 2. The domain is the segment ,1/math>. The set-family H contains all the open sets. For any finite set C of m real numbers, the intersection H\cap C contains all possible subsets of C. There are 2^m such subsets, so \operatorname(H,m)=2^m. 3. The domain is the Euclidean space \mathbb^n. The set-family H contains all the half-spaces of the form: x\cdot \phi \geq 1, where \phi is a fixed vector. Then \operatorname(H,m)=\operatorname(n,m), where Comp is the number of components in a partitioning of an n-dimensional space by m hyperplanes. 4. The domain is the real line \mathbb. The set-family H contains all the real intervals, i.e., all sets of the form \ for some x_0,x_1\in \mathbb. For any set C of m real numbers, the intersection H\cap C contains all runs of between 0 and m consecutive elements of C. The number of such runs is +1, so \operatorname(H,m)=+1.


Polynomial or exponential

The main property that makes the growth function interesting is that it can be either polynomial or exponential - nothing in-between. The following is a property of the intersection-size: * If, for some set C_m of size m, and for some number n\leq m, , H\cap C_m, \geq \operatorname(n,m) - * then, there exists a subset C_n\subseteq C_m of size n such that , H\cap C_n, = 2^n. This implies the following property of the Growth function. For every family H there are two cases: * The ''exponential case'': \operatorname(H,m) = 2^m identically. * The ''polynomial case'': \operatorname(H,m) is majorized by \operatorname(n,m) \leq m^n+1, where n is the smallest integer for which \operatorname(H,n) < 2^n.


Other properties


Trivial upper bound

For any finite H: :\operatorname(H,m) \leq , H, since for every C, the number of elements in H\cap C is at most , H, . Therefore, the growth function is mainly interesting when H is infinite.


Exponential upper bound

For any nonempty H: :\operatorname(H,m) \leq 2^m I.e, the growth function has an exponential upper-bound. We say that a set-family H shatters a set C if their intersection contains all possible subsets of C, i.e. H\cap C = 2^C. If H shatters C of size m, then \operatorname(H,C)=2^, which is the upper bound.


Cartesian intersection

Define the Cartesian intersection of two set-families as: ::H_1\bigotimes H_2 := \. Then: ::\operatorname(H_1\bigotimes H_2, m) \leq \operatorname(H_1, m)\cdot \operatorname(H_2, m)


Union

For every two set-families: ::\operatorname(H_1\cup H_2, m) \leq \operatorname(H_1, m) + \operatorname(H_2, m)


VC dimension

The
VC dimension VC may refer to: Military decorations * Victoria Cross, a military decoration awarded by the United Kingdom and also by certain Commonwealth nations ** Victoria Cross for Australia ** Victoria Cross (Canada) ** Victoria Cross for New Zealand * Vic ...
of H is defined according to these two cases: * In the ''polynomial case'', \operatorname(H) = n-1 = the largest integer d for which \operatorname(H,d) = 2^d. * In the ''exponential case'' \operatorname(H) = \infty. So \operatorname(H)\geq d if-and-only-if \operatorname(H,d)=2^d. The growth function can be regarded as a refinement of the concept of VC dimension. The VC dimension only tells us whether \operatorname(H,d) is equal to or smaller than 2^d, while the growth function tells us exactly how \operatorname(H,m) changes as a function of m. Another connection between the growth function and the VC dimension is given by the
Sauer–Shelah lemma In combinatorial mathematics and extremal set theory, the Sauer–Shelah lemma states that every family of sets with small VC dimension consists of a small number of sets. It is named after Norbert Sauer and Saharon Shelah, who published it indep ...
: :If \operatorname(H)=d, then: :for all m: \operatorname(H,m)\leq \sum_^ In particular, :for all m>d+1: \operatorname(H,m)\leq (em/d)^d = O(m^d) :so when the VC dimension is finite, the growth function grows polynomially with m. This upper bound is tight, i.e., for all m>d there exists H with VC dimension d such that: :\operatorname(H,m)= \sum_^


Entropy

While the growth-function is related to the ''maximum'' intersection-size, the
entropy Entropy is a scientific concept, as well as a measurable physical property, that is most commonly associated with a state of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynam ...
is related to the ''average'' intersection size: ::\operatorname(H, m) = E_\big H\cap C_m, )\big/math> The intersection-size has the following property. For every set-family H: :: , H\cap (C_1 \cup C_2), \leq , H\cap C_1, \cdot , H \cap C_2, Hence: :: \operatorname(H, m_1+m_2) \leq \operatorname(H, m_1) + \operatorname(H, m_2) Moreover, the sequence \operatorname(H, m)/m converges to a constant c\in ,1/math> when m\to\infty. Moreover, the random-variable \log_2 is concentrated near c.


Applications in probability theory

Let \Omega be a set on which a
probability measure In mathematics, a probability measure is a real-valued function defined on a set of events in a probability space that satisfies measure properties such as ''countable additivity''. The difference between a probability measure and the more gener ...
\Pr is defined. Let H be family of subsets of \Omega (= a family of events). Suppose we choose a set C_m that contains m elements of \Omega, where each element is chosen at random according to the probability measure P, independently of the others (i.e., with replacements). For each event h\in H, we compare the following two quantities: * Its relative frequency in C_m, i.e., , h\cap C_m, /m; * Its probability \Pr /math>. We are interested in the difference, D(h,C_m) := \big, , h\cap C_m, /m - \Pr big, . This difference satisfies the following upper bound: :: \Pr\left forall h\in H: D(h,C_m) \leq \sqrt \right~~~~ > ~~~~ 1 - \delta which is equivalent to: :: \Pr\big forall h\in H: D(h,C_m) \leq \varepsilon\big~~~~ > ~~~~ 1 - 4\cdot \operatorname(H, 2m)\cdot \exp(-\varepsilon^2\cdot m/8) In words: the probability that for ''all'' events in H, the relative-frequency is near the probability, is lower-bounded by an expression that depends on the growth-function of H. A corollary of this is that, if the growth function is polynomial in m (i.e., there exists some n such that \operatorname(H,m)\leq m^n+1), then the above probability approaches 1 as m\to\infty. I.e, the family H enjoys
uniform convergence in probability Uniform convergence in probability is a form of convergence in probability in statistical asymptotic theory and probability theory. It means that, under certain conditions, the ''empirical frequencies'' of all events in a certain event-family co ...
.


References

{{reflist Measures of complexity Statistical classification Computational learning theory Families of sets