Growth function

The '''growth function''', also called the '''shatter coefficient''' or the '''shattering number''', measures the richness of a set family or class of functions. It is especially used in the context of statistical learning theory, where it is used to study properties of statistical learning methods. The term 'growth function' was coined by Vapnik and Chervonenkis in their 1968 paper, where they also proved many of its properties.<ref name="vc">{{Cite Vapnik Chervonenkis}}</ref> It is a basic concept in machine learning.<ref name=book12>{{Cite Mehryar Afshin Ameet 2012}}, especially Section 3.2</ref> <ref name=book14>{{Cite Shai Shai 2014}}</ref>

== Definitions == === Set-family definition === Let <math>H</math> be a set family (a set of sets) and <math>C</math> a set. Their ''intersection'' is defined as the following set-family: :<math mode=display> H\cap C := \{h\cap C\mid h\in H\} </math>

The ''intersection-size'' (also called the ''index'') of <math>H</math> with respect to <math>C</math> is <math>|H\cap C|</math>. If a set <math>C_m</math> has <math>m</math> elements then the index is at most <math>2^m</math>. If the index is exactly 2<sup>''m''</sup> then the set <math>C</math> is said to be shattered by <math>H</math>, because <math mode=display> H\cap C</math> contains all the subsets of <math mode>C</math>, i.e.: :<math mode=display> |H\cap C|=2^{|C|}, </math>

The growth function measures the size of <math>H\cap C</math> as a function of <math>|C|</math>. Formally: :<math mode=display> \operatorname{Growth}(H,m) := \max_{C: |C|=m} |H\cap C| </math>

=== Hypothesis-class definition === Equivalently, let <math>H</math> be a hypothesis-class (a set of binary functions) and <math>C</math> a set with <math>m</math> elements. The ''restriction'' of <math>H</math> to <math>C</math> is the set of binary functions on <math>C</math> that can be derived from <math>H</math>:<ref name=book14/>{{rp|45}} :<math mode=display> H_{C} := \{(h(x_1),\ldots,h(x_m))\mid h\in H, x_i\in C\} </math> The growth function measures the size of <math>H_C</math> as a function of <math>|C|</math>:<ref name=book14/>{{rp|49}} :<math mode="display"> \operatorname{Growth}(H,m) := \max_{C: |C|=m} |H_C| </math>

== Examples == '''1.''' The domain is the real line <math>\mathbb{R}</math>. The set-family <math>H</math> contains all the half-lines (rays) from a given number to positive infinity, i.e., all sets of the form <math>\{x > x_0 \mid x\in \mathbb{R}\}</math> for some <math>x_0\in \mathbb{R}</math>. For any set <math>C</math> of <math>m</math> real numbers, the intersection <math>H\cap C</math> contains <math>m+1</math> sets: the empty set, the set containing the largest element of <math>C</math>, the set containing the two largest elements of <math>C</math>, and so on. Therefore: <math>\operatorname{Growth}(H,m)=m+1</math>.<ref name=vc/>{{rp|Ex.1}} The same is true whether <math>H</math> contains open half-lines, closed half-lines, or both.

'''2.''' The domain is the segment <math>[0,1]</math>. The set-family <math>H</math> contains all the open sets. For any finite set <math>C</math> of <math>m</math> real numbers, the intersection <math>H\cap C</math> contains all possible subsets of <math>C</math>. There are <math>2^m</math> such subsets, so <math>\operatorname{Growth}(H,m)=2^m</math>. <ref name=vc/>{{rp|Ex.2}}

'''3.''' The domain is the Euclidean space <math>\mathbb{R}^n</math>. The set-family <math>H</math> contains all the half-spaces of the form: <math>x\cdot \phi \geq 1</math>, where <math>\phi</math> is a fixed vector. Then <math>\operatorname{Growth}(H,m)=\operatorname{Comp}(n,m)</math>, where Comp is the number of components in a partitioning of an n-dimensional space by m hyperplanes.<ref name=vc/>{{rp|Ex.3}}

'''4.''' The domain is the real line <math>\mathbb{R}</math>. The set-family <math>H</math> contains all the real intervals, i.e., all sets of the form <math>\{x \in [x_0,x_1] | x\in \mathbb{R}\}</math> for some <math>x_0,x_1\in \mathbb{R}</math>. For any set <math>C</math> of <math>m</math> real numbers, the intersection <math>H\cap C</math> contains all runs of between 0 and <math>m</math> consecutive elements of <math>C</math>. The number of such runs is <math>{m+1 \choose 2}+1</math>, so <math>\operatorname{Growth}(H,m)={m+1 \choose 2}+1</math>.

== Polynomial or exponential == The main property that makes the growth function interesting is that it can be either polynomial or exponential - nothing in-between.

The following is a property of the intersection-size:<ref name=vc/>{{rp|Lem.1}} * If, for some set <math>C_m</math> of size <math>m</math>, and for some number <math>n\leq m</math>, <math>|H\cap C_m| \geq \operatorname{Comp}(n,m)</math> - * then, there exists a subset <math>C_n\subseteq C_m</math> of size <math>n</math> such that <math>|H\cap C_n| = 2^n</math>.

This implies the following property of the Growth function.<ref name=vc/>{{rp|Th.1}} For every family <math>H</math> there are two cases: * The ''exponential case'': <math>\operatorname{Growth}(H,m) = 2^m</math> identically. * The ''polynomial case'': <math>\operatorname{Growth}(H,m)</math> is majorized by <math>\operatorname{Comp}(n,m) \leq m^n+1</math>, where <math>n</math> is the smallest integer for which <math>\operatorname{Growth}(H,n) < 2^n</math>.

== Other properties == === Trivial upper bound === For any finite <math>H</math>: :<math>\operatorname{Growth}(H,m) \leq |H|</math> since for every <math>C</math>, the number of elements in <math>H\cap C</math> is at most <math>|H|</math>. Therefore, the growth function is mainly interesting when <math>H</math> is infinite.

=== Exponential upper bound === For any nonempty <math>H</math>: :<math>\operatorname{Growth}(H,m) \leq 2^m</math> I.e, the growth function has an exponential upper-bound.

We say that a set-family <math>H</math> '''shatters''' a set <math>C</math> if their intersection contains all possible subsets of <math>C</math>, i.e. <math>H\cap C = 2^C</math>. If <math>H</math> shatters <math>C</math> of size <math>m</math>, then <math>\operatorname{Growth}(H,C)=2^{m}</math>, which is the upper bound.

=== Cartesian intersection === Define the Cartesian intersection of two set-families as: ::<math>H_1\bigotimes H_2 := \{h_1\cap h_2 \mid h_1\in H_1, h_2\in H_2\}</math>. Then:<ref name=book12/>{{rp|57}} ::<math>\operatorname{Growth}(H_1\bigotimes H_2, m) \leq \operatorname{Growth}(H_1, m)\cdot \operatorname{Growth}(H_2, m)</math>

=== Union === For every two set-families:<ref name=book12/>{{rp|58}} ::<math>\operatorname{Growth}(H_1\cup H_2, m) \leq \operatorname{Growth}(H_1, m) + \operatorname{Growth}(H_2, m)</math>

=== VC dimension === The '''VC dimension''' of <math>H</math> is defined according to these two cases: * In the ''polynomial case'', <math>\operatorname{VCDim}(H) = n-1</math> = the largest integer <math>d</math> for which <math>\operatorname{Growth}(H,d) = 2^d</math>. * In the ''exponential case'' <math>\operatorname{VCDim}(H) = \infty</math>.

So <math>\operatorname{VCDim}(H)\geq d</math> if-and-only-if <math>\operatorname{Growth}(H,d)=2^d</math>.

The growth function can be regarded as a refinement of the concept of VC dimension. The VC dimension only tells us whether <math>\operatorname{Growth}(H,d)</math> is equal to or smaller than <math>2^d</math>, while the growth function tells us exactly how <math>\operatorname{Growth}(H,m)</math> changes as a function of <math>m</math>.

Another connection between the growth function and the VC dimension is given by the Sauer–Shelah lemma:<ref name=book14/>{{rp|49}} :If <math>\operatorname{VCDim}(H)=d</math>, then: :for all <math>m</math>: <math>\operatorname{Growth}(H,m)\leq \sum_{i=0}^{d} {m\choose i}</math> In particular, :for all <math>m>d+1</math>: <math>\operatorname{Growth}(H,m)\leq (em/d)^d = O(m^d)</math> :so when the VC dimension is finite, the growth function grows polynomially with <math>m</math>. This upper bound is tight, i.e., for all <math>m>d</math> there exists <math>H</math> with VC dimension <math>d</math> such that:<ref name=book12/>{{rp|56}} :<math>\operatorname{Growth}(H,m)= \sum_{i=0}^{d} {m\choose i}</math>

=== Entropy === While the growth-function is related to the ''maximum'' intersection-size, the '''entropy''' is related to the ''average'' intersection size:<ref name=vc/>{{rp|272–273}} ::<math>\operatorname{Entropy}(H, m) = E_{|C_m|=m}\big[\log_2(|H\cap C_m|)\big]</math> The intersection-size has the following property. For every set-family <math>H</math>: ::<math> |H\cap (C_1 \cup C_2)| \leq |H\cap C_1|\cdot |H \cap C_2| </math> Hence: ::<math> \operatorname{Entropy}(H, m_1+m_2) \leq \operatorname{Entropy}(H, m_1) + \operatorname{Entropy}(H, m_2) </math> Moreover, the sequence <math>\operatorname{Entropy}(H, m)/m</math> converges to a constant <math>c\in [0,1]</math> when <math>m\to\infty</math>.

Moreover, the random-variable <math>\log_2{|H\cap C_m|/m}</math> is concentrated near <math>c</math>.

== Applications in probability theory == Let <math>\Omega</math> be a set on which a probability measure <math>\Pr</math> is defined. Let <math>H</math> be family of subsets of <math>\Omega</math> (= a family of events).

Suppose we choose a set <math>C_m</math> that contains <math>m</math> elements of <math>\Omega</math>, where each element is chosen at random according to the probability measure <math>P</math>, independently of the others (i.e., with replacements). For each event <math>h\in H</math>, we compare the following two quantities: * Its relative frequency in <math>C_m</math>, i.e., <math>|h\cap C_m|/m</math>; * Its probability <math>\Pr[h]</math>. We are interested in the difference, <math>D(h,C_m) := \big||h\cap C_m|/m - \Pr[h]\big|</math>. This difference satisfies the following upper bound: ::<math mode=display> \Pr\left[\forall h\in H: D(h,C_m) \leq \sqrt{8(\ln\operatorname{Growth}(H, 2m) + \ln(4/\delta)) \over m} \right] ~~~~ > ~~~~ 1 - \delta </math> which is equivalent to:<ref name=vc/>{{rp|Th.2}} ::<math mode=display> \Pr\big[\forall h\in H: D(h,C_m) \leq \varepsilon\big] ~~~~ > ~~~~ 1 - 4\cdot \operatorname{Growth}(H, 2m)\cdot \exp(-\varepsilon^2\cdot m/8) </math> In words: the probability that for ''all'' events in <math>H</math>, the relative-frequency is near the probability, is lower-bounded by an expression that depends on the growth-function of <math>H</math>.

A corollary of this is that, if the growth function is polynomial in <math>m</math> (i.e., there exists some <math>n</math> such that <math>\operatorname{Growth}(H,m)\leq m^n+1</math>), then the above probability approaches 1 as <math>m\to\infty</math>. I.e, the family <math>H</math> enjoys uniform convergence in probability.

== References == {{reflist}}

Category:Measures of complexity Category:Statistical classification Category:Computational learning theory Category:Families of sets