Distributional learning theory, or the learning of probability distributions, is a framework in computational learning theory. It was proposed by Michael Kearns, Yishay Mansour, Dana Ron, Ronitt Rubinfeld, Robert Schapire and Linda Sellie in 1994 [M. Kearns, Y. Mansour, D. Ron, R. Rubinfeld, R. Schapire, L. Sellie, ''On the Learnability of Discrete Distributions''. ACM Symposium on Theory of Computing, 1994] and was inspired by the PAC framework introduced by Leslie Valiant [L. Valiant, ''A theory of the learnable''. Communications of the ACM, 1984].
In this framework the input is a number of samples drawn from a distribution that belongs to a specific class of distributions. The goal is to find an efficient algorithm that, based on these samples, determines with high probability the distribution from which the samples have been drawn. Because of its generality, this framework has been used in a wide variety of fields, such as machine learning, approximation algorithms, applied probability and statistics.
This article explains the basic definitions, tools and results in this framework from the theory of computation point of view.
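As an illustration of the setup described above, the following Python sketch draws samples from a hidden distribution over two-bit strings and outputs the empirical frequencies as its hypothesis. The distribution `hidden`, the function names `draw_samples` and `learn_empirical`, and the sample size are assumptions chosen for this example. The empirical estimator is only the simplest possible learner and is not the algorithm of Kearns et al.

```python
import random
from collections import Counter


def draw_samples(probs, m, rng):
    """Draw m independent samples from a distribution given as {outcome: probability}."""
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=m)


def learn_empirical(samples):
    """Return the empirical distribution: the fraction of samples equal to each outcome."""
    counts = Counter(samples)
    m = len(samples)
    return {outcome: count / m for outcome, count in counts.items()}


# Hypothetical hidden distribution over 2-bit strings (an assumption for illustration).
hidden = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = draw_samples(hidden, 10_000, rng)
hypothesis = learn_empirical(samples)

# Compare the learned probabilities with the hidden ones.
for outcome in sorted(hidden):
    print(outcome, hidden[outcome], round(hypothesis.get(outcome, 0.0), 3))
```

With more samples the printed estimates concentrate around the hidden probabilities, which is the sense in which the learner identifies the distribution "with high probability".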
Definitions
Let $X$ be the support of the distributions of interest. As in the original work of Kearns et al., if $X$ is finite it can be assumed without loss of generality that $X = \{0,1\}^n$, where $n$ is the number of bits needed to represent any $y \in X$. We focus on probability distributions over $X$.
There are two possible representations of a probability distribution $D$ over $X$.
* probability distribution function (or evaluator): an evaluator $E_D$ for $D$ takes as input any $y \in X$ and outputs a real number