The growth function, also called the shatter coefficient or the shattering number, measures the richness of a set family. It is especially used in the context of

statistical learning theory Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. Statistical learning theory deals with the statistical inference problem of finding a predictive function based on d ...

, where it measures the complexity of a hypothesis class. The term 'growth function' was coined by Vapnik and Chervonenkis in their 1968 paper, where they also proved many of its properties. It is a basic concept in

machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...

., especially Section 3.2

Definitions

Set-family definition

Let

H

be a set family (a set of sets) and

C

a set. Their ''intersection'' is defined as the following set-family: :

H\cap C := \

The ''intersection-size'' (also called the ''index'') of

H

with respect to

C

, H\cap C,

. If a set

C_m

has

m

elements then the index is at most

2^m

. If the index is exactly 2^''m'' then the set

C

is said to be shattered by

H

, because

H\cap C

contains all the subsets of

C

, i.e.: :

, H\cap C, =2^,

The growth function measures the size of

H\cap C

as a function of

, C,

. Formally: :

\operatorname(H,m) := \max_ , H\cap C,

Hypothesis-class definition

Equivalently, let

H

be a hypothesis-class (a set of binary functions) and

C

a set with

m

elements. The ''restriction'' of

H

C

is the set of binary functions on

C

that can be derived from

H

: :

H_ := \

The growth function measures the size of

H_C

as a function of

, C,

: :

\operatorname(H,m) := \max_ , H_C,

Examples

1. The domain is the real line

\mathbb

. The set-family

H

contains all the

half-line In geometry, a line is an infinitely long object with no width, depth, or curvature. Thus, lines are one-dimensional objects, though they may exist in two, three, or higher dimension spaces. The word ''line'' may also refer to a line segment ...

s (rays) from a given number to positive infinity, i.e., all sets of the form

\

for some

x_0\in \mathbb

. For any set

C

m

real numbers, the intersection

H\cap C

contains

m+1

sets: the empty set, the set containing the largest element of

C

, the set containing the two largest elements of

C

, and so on. Therefore:

\operatorname(H,m)=m+1

. The same is true whether

H

contains open half-lines, closed half-lines, or both. 2. The domain is the segment

,1 /math>. 
The set-family H contains all the open sets. 
For any finite set C of m real numbers, the intersection H\cap C contains all possible subsets of C . There are 2^m such subsets, so \operatorname(H,m)=2^m . 3. The domain is the Euclidean space \mathbb^n . 
The set-family H contains all the half-space s of the form: x\cdot \phi \geq 1, where \phi is a fixed vector.
Then \operatorname(H,m)=\operatorname(n,m),
where Comp is the number of components in a partitioning of an n-dimensional space by m hyperplanes . 4. The domain is the real line \mathbb . The set-family H contains all the real intervals, i.e., all sets of the form \ for some x_0,x_1\in \mathbb .  For any set C of m real numbers, the intersection H\cap C contains all runs of between 0 and m consecutive elements of C . The number of such runs is +1, so \operatorname(H,m)=+1 .

Polynomial or exponential

The main property that makes the growth function interesting is that it can be either polynomial or exponential - nothing in-between. The following is a property of the intersection-size: * If, for some set

C_m

of size

m

, and for some number

n\leq m

, H\cap C_m,  \geq \operatorname(n,m)

- * then, there exists a subset

C_n\subseteq C_m

of size

n

such that

, H\cap C_n,  = 2^n

. This implies the following property of the Growth function. For every family

H

there are two cases: * The ''exponential case'':

\operatorname(H,m) = 2^m

identically. * The ''polynomial case'':

\operatorname(H,m)

is majorized by

\operatorname(n,m) \leq m^n+1

, where

n

is the smallest integer for which

\operatorname(H,n) < 2^n

Other properties

Trivial upper bound

For any finite

H

: :

\operatorname(H,m) \leq , H,

since for every

C

, the number of elements in

H\cap C

is at most

, H,

. Therefore, the growth function is mainly interesting when

H

is infinite.

Exponential upper bound

For any nonempty

H

: :

\operatorname(H,m) \leq 2^m

I.e, the growth function has an exponential upper-bound. We say that a set-family

H

shatters a set

C

if their intersection contains all possible subsets of

C

, i.e.

H\cap C = 2^C

. If

H

shatters

C

of size

m

, then

\operatorname(H,C)=2^

, which is the upper bound.

Cartesian intersection

Define the Cartesian intersection of two set-families as: ::

H_1\bigotimes H_2 := \

. Then: ::

\operatorname(H_1\bigotimes H_2, m) \leq \operatorname(H_1, m)\cdot \operatorname(H_2, m)

Union

For every two set-families: ::

\operatorname(H_1\cup H_2, m) \leq \operatorname(H_1, m) + \operatorname(H_2, m)

VC dimension

The

VC dimension VC may refer to: Military decorations * Victoria Cross, a military decoration awarded by the United Kingdom and also by certain Commonwealth nations ** Victoria Cross for Australia ** Victoria Cross (Canada) ** Victoria Cross for New Zealand * ...

H

is defined according to these two cases: * In the ''polynomial case'',

\operatorname(H) = n-1

= the largest integer

d

for which

\operatorname(H,d) = 2^d

. * In the ''exponential case''

\operatorname(H) = \infty

. So

\operatorname(H)\geq d

if-and-only-if

\operatorname(H,d)=2^d

. The growth function can be regarded as a refinement of the concept of VC dimension. The VC dimension only tells us whether

\operatorname(H,d)

is equal to or smaller than

2^d

, while the growth function tells us exactly how

\operatorname(H,m)

changes as a function of

m

. Another connection between the growth function and the VC dimension is given by the Sauer–Shelah lemma: :If

\operatorname(H)=d

, then: :for all

m

\operatorname(H,m)\leq \sum_^

In particular, :for all

m>d+1

\operatorname(H,m)\leq (em/d)^d = O(m^d)

:so when the VC dimension is finite, the growth function grows polynomially with

m

. This upper bound is tight, i.e., for all

m>d

there exists

H

with VC dimension

d

such that: :

\operatorname(H,m)= \sum_^

Entropy

While the growth-function is related to the ''maximum'' intersection-size, the

entropy Entropy is a scientific concept, as well as a measurable physical property, that is most commonly associated with a state of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodyna ...

is related to the ''average'' intersection size: ::

\operatorname(H, m) = E_\big H\cap C_m, )\big /math>
The intersection-size has the following property. For every set-family H :
::, H\cap (C_1 \cup C_2),  
\leq 
, H\cap C_1, \cdot , H \cap C_2, Hence:
:: \operatorname(H, m_1+m_2)
\leq
\operatorname(H, m_1) + \operatorname(H, m_2) Moreover, the sequence \operatorname(H, m)/m converges to a constant c\in,1 /math> when m\to\infty .

Moreover, the random-variable \log_2 is concentrated near c .

Applications in probability theory

Let

\Omega

be a set on which a

probability measure In mathematics, a probability measure is a real-valued function defined on a set of events in a probability space that satisfies measure properties such as ''countable additivity''. The difference between a probability measure and the more g ...

\Pr

is defined. Let

H

be family of subsets of

\Omega

(= a family of events). Suppose we choose a set

C_m

that contains

m

elements of

\Omega

, where each element is chosen at random according to the probability measure

P

, independently of the others (i.e., with replacements). For each event

h\in H

, we compare the following two quantities: * Its relative frequency in

C_m

, i.e.,

, h\cap C_m, /m

; * Its probability

\Pr /math>.
We are interested in the difference, D(h,C_m) := \big, , h\cap C_m, /m - \Pr big, . This difference satisfies the following upper bound:
:: \Pr\left forall h\in H: D(h,C_m) \leq 
\sqrt
\right ~~~~
>
~~~~
1 - \delta which is equivalent to: :: \Pr\big forall h\in H: D(h,C_m) \leq \varepsilon\big ~~~~
>
~~~~
1 - 4\cdot \operatorname(H, 2m)\cdot \exp(-\varepsilon^2\cdot m/8) In words: the probability that for ''all'' events in H, the relative-frequency is near the probability, is lower-bounded by an expression that depends on the growth-function of H .

A corollary of this is that, if the growth function is polynomial in m (i.e., there exists some n such that \operatorname(H,m)\leq m^n+1), then the above probability approaches 1 as m\to\infty . I.e, the family H enjoys

uniform convergence in probability Uniform convergence in probability is a form of convergence in probability in statistical asymptotic theory and probability theory. It means that, under certain conditions, the ''empirical frequencies'' of all events in a certain event-family con ...

References

{{reflist Measures of complexity Statistical classification Computational learning theory Families of sets