In computational learning theory (machine learning and theory of computation), Rademacher complexity, named after Hans Rademacher, measures richness of a class of real-valued functions with respect to a probability distribution.
Definitions
Rademacher complexity of a set
Given a set <math>A \subseteq \mathbb{R}^m</math>, the Rademacher complexity of ''A'' is defined as follows:[Chapter 26 in ]

:<math>\operatorname{Rad}(A) := \frac{1}{m} \mathbb{E}_\sigma\left[\sup_{a \in A} \sum_{i=1}^m \sigma_i a_i\right]</math>

where <math>\sigma_1, \sigma_2, \dots, \sigma_m</math> are independent random variables drawn from the Rademacher distribution, i.e. <math>\Pr(\sigma_i = +1) = \Pr(\sigma_i = -1) = 1/2</math> for <math>i = 1, 2, \dots, m</math>, and <math>a = (a_1, \ldots, a_m) \in A</math>. Some authors take the absolute value of the sum before taking the supremum, but if <math>A</math> is symmetric this makes no difference.
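The definition above can be approximated numerically for a finite set ''A''. The following sketch (function name and parameters are illustrative, not from the source) estimates the expectation over <math>\sigma</math> by Monte-Carlo sampling:

```python
import numpy as np

def rademacher_complexity(A, n_trials=100_000, seed=0):
    """Monte-Carlo estimate of Rad(A) for a finite set A of vectors in R^m.

    A: array-like of shape (k, m) -- each row is one vector a in A.
    """
    A = np.asarray(A, dtype=float)
    _, m = A.shape
    rng = np.random.default_rng(seed)
    # sigma: i.i.d. Rademacher signs (+1/-1 with probability 1/2 each)
    sigma = rng.choice([-1.0, 1.0], size=(n_trials, m))
    # For each sign vector, take the supremum over A of sum_i sigma_i * a_i
    sups = (sigma @ A.T).max(axis=1)
    # (1/m) * E_sigma[ sup ... ]
    return sups.mean() / m

# A singleton set: the signs average out, so the estimate is close to 0.
print(rademacher_complexity([[3.0, -1.0]]))
```

Because the estimator only averages over sampled sign vectors, the result carries Monte-Carlo noise of order <math>1/\sqrt{n_\text{trials}}</math>.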
Rademacher complexity of a function class
Let <math>S = (z_1, z_2, \dots, z_m) \in Z^m</math> be a sample of points and consider a function class <math>F</math> of real-valued functions over <math>Z</math>. Then, the empirical Rademacher complexity of <math>F</math> given <math>S</math> is defined as:

:<math>\operatorname{Rad}_S(F) = \frac{1}{m} \mathbb{E}_\sigma\left[\sup_{f \in F} \sum_{i=1}^m \sigma_i f(z_i)\right]</math>

This can also be written using the previous definition:

:<math>\operatorname{Rad}_S(F) = \operatorname{Rad}(F \circ S)</math>

where <math>F \circ S</math> denotes function composition, i.e.:

:<math>F \circ S := \{ (f(z_1), \ldots, f(z_m)) : f \in F \}</math>

Let <math>P</math> be a probability distribution over <math>Z</math>. The Rademacher complexity of the function class <math>F</math> with respect to <math>P</math> for sample size <math>m</math> is:

:<math>\operatorname{Rad}_{P,m}(F) := \mathbb{E}_{S \sim P^m}\left[\operatorname{Rad}_S(F)\right]</math>

where the above expectation is taken over an independent and identically distributed (i.i.d.) sample <math>S = (z_1, z_2, \dots, z_m)</math> generated according to <math>P</math>.
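For a finite function class, the identity <math>\operatorname{Rad}_S(F) = \operatorname{Rad}(F \circ S)</math> suggests a direct estimator: evaluate every function on every sample point and proceed as for a set of vectors. A minimal sketch, with an illustrative (hypothetical) class of two threshold functions:

```python
import numpy as np

def empirical_rademacher(functions, sample, n_trials=100_000, seed=0):
    """Monte-Carlo estimate of Rad_S(F) for a finite class F of functions.

    functions: list of callables f: z -> real
    sample: list of points z_1, ..., z_m
    """
    m = len(sample)
    # F o S: evaluate every f on every sample point -> shape (|F|, m)
    values = np.array([[f(z) for z in sample] for f in functions], dtype=float)
    rng = np.random.default_rng(seed)
    sigma = rng.choice([-1.0, 1.0], size=(n_trials, m))
    # sup over F of sum_i sigma_i * f(z_i), averaged over sign draws
    sups = (sigma @ values.T).max(axis=1)
    return sups.mean() / m

# Two indicator functions (an assumed toy class) on a three-point sample.
F = [lambda z: float(z > 0.0), lambda z: float(z > 0.5)]
S = [-0.3, 0.2, 0.7]
print(empirical_rademacher(F, S))
```

Averaging the same estimate over many samples <math>S \sim P^m</math> would in turn approximate <math>\operatorname{Rad}_{P,m}(F)</math>.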
Intuition
The Rademacher complexity is typically applied on a function class of models that are used for classification, with the goal of measuring their ability to classify points drawn from a probability space under arbitrary labellings. When the function class is rich enough, it contains functions that can appropriately adapt to each arrangement of labels, simulated by the random draw of <math>\sigma</math> under the expectation, so that the quantity in the sum is maximised.
Examples
1. <math>A</math> contains a single vector, e.g., <math>A = \{(a, b)\} \subset \mathbb{R}^2</math>. Then:
::<math>\operatorname{Rad}(A) = \frac{1}{2}\,\mathbb{E}[\sigma_1 a + \sigma_2 b] = 0</math>
The same is true for every singleton hypothesis class.
2. <math>A</math> contains two vectors, e.g., <math>A = \{(1, 1), (1, 2)\} \subset \mathbb{R}^2</math>. Then:
::<math>\operatorname{Rad}(A) = \frac{1}{2}\,\mathbb{E}\left[\max(\sigma_1 + \sigma_2,\ \sigma_1 + 2\sigma_2)\right] = \frac{1}{2}\cdot\frac{3 + 0 + 1 - 2}{4} = \frac{1}{4}</math>
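Since the Rademacher distribution is discrete, the expectation for a small set can be computed exactly by enumerating all <math>2^m</math> sign vectors. A short sketch, using <math>A = \{(1, 1), (1, 2)\}</math> as a concrete two-vector instance:

```python
from itertools import product

# Concrete two-vector instance chosen for illustration.
A = [(1, 1), (1, 2)]
m = 2

# The expectation over sigma is a finite average over 2^m equally
# likely sign vectors, so no sampling is needed.
total = 0.0
for sigma in product([-1, 1], repeat=m):
    total += max(sum(s * a_i for s, a_i in zip(sigma, a)) for a in A)
exact = total / (2 ** m) / m
print(exact)  # 0.25
```

Exhaustive enumeration is only feasible for small ''m''; beyond that, Monte-Carlo estimation of the expectation is the practical route.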
Using the Rademacher complexity
The Rademacher complexity can be used to derive data-dependent upper-bounds on the learnability of function classes. Intuitively, a function class with smaller Rademacher complexity is easier to learn.
Bounding the representativeness
In machine learning, it is desired to have a training set that represents the true distribution of some sample data <math>S</math>. This can be quantified using the notion of representativeness. Denote by <math>P</math> the probability distribution from which the samples are drawn. Denote by <math>H</math> the set of hypotheses (potential classifiers) and denote by <math>F</math> the corresponding set of error functions, i.e., for every hypothesis <math>h \in H</math>, there is a function <math>f_h \in F</math> that maps each training sample (features, label) to the error of the classifier <math>h</math> (note in this case hypothesis and classifier are used interchangeably). For example, in the case that <math>h</math> represents a binary classifier, the error function is a 0–1 loss function, i.e. the error function <math>f_h</math> returns 0 if <math>h</math> correctly classifies a sample and 1 otherwise. We omit the index and write <math>f</math> instead of <math>f_h</math> when the underlying hypothesis is irrelevant. Define:

:<math>\operatorname{Rep}_P(F, S) := \sup_{f \in F} \left( \mathbb{E}_{z \sim P}[f(z)] - \frac{1}{m}\sum_{i=1}^m f(z_i) \right)</math>

i.e., the largest gap, over the class, between the expected error under <math>P</math> and the average error on the sample <math>S = (z_1, \dots, z_m)</math>.
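The representativeness gap can be inspected numerically when the true losses are available in closed form. A sketch under an assumed setup: error functions <math>f_t(z) = \mathbf{1}[z \le t]</math> and <math>P = \mathrm{Uniform}(0, 1)</math>, so that the true loss is simply <math>\mathbb{E}[f_t] = t</math> (all names and the setup are illustrative):

```python
import numpy as np

# Assumed setup: P = Uniform(0, 1); finite class of error functions
# f_t(z) = 1[z <= t], whose true expectation under P is exactly t.
rng = np.random.default_rng(0)
S = rng.uniform(0.0, 1.0, size=200)          # i.i.d. sample from P
thresholds = np.linspace(0.0, 1.0, 101)      # finite class F, indexed by t

true_loss = thresholds                        # E_{z~P}[f_t(z)] = t
empirical_loss = np.array([(S <= t).mean() for t in thresholds])

# Rep_P(F, S) = sup_f ( true loss - empirical loss )
rep = (true_loss - empirical_loss).max()
print(rep)
```

A small value of `rep` indicates the sample is representative for this class; by standard uniform-convergence arguments the gap shrinks as the sample grows.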