Computational learning theory
In computer science, computational learning theory (or just learning theory) is a subfield of artificial intelligence devoted to studying the design and analysis of machine learning algorithms.


Overview

Theoretical results in machine learning mainly deal with a type of inductive learning called supervised learning. In supervised learning, an algorithm is given samples that are labeled in some useful way. For example, the samples might be descriptions of mushrooms, and the labels could be whether or not the mushrooms are edible. The algorithm takes these previously labeled samples and uses them to induce a classifier. This classifier is a function that assigns labels to samples, including samples that have not been seen previously by the algorithm. The goal of the supervised learning algorithm is to optimize some measure of performance, such as minimizing the number of mistakes made on new samples.

In addition to performance bounds, computational learning theory studies the time complexity and feasibility of learning. In computational learning theory, a computation is considered feasible if it can be done in polynomial time. There are two kinds of time-complexity results:
* Positive results – showing that a certain class of functions is learnable in polynomial time (a toy example of such a learner is sketched below).
* Negative results – showing that certain classes cannot be learned in polynomial time.

Negative results often rely on commonly believed, but as yet unproven, assumptions, such as:
* Computational complexity – P ≠ NP (the P versus NP problem);
* Cryptographic – one-way functions exist.
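To make the kind of positive result described above concrete, here is a minimal sketch of the classic elimination algorithm for learning monotone conjunctions, a standard example of a class learnable in polynomial time. The code is illustrative (Python, with a made-up four-variable sample), not an algorithm given in this article.

```python
def learn_conjunction(examples, n):
    """Learn a monotone conjunction over n Boolean variables.

    examples: list of (x, label) pairs, where x is a tuple of n bits
    and label is True for positive examples, False for negative ones.
    Standard elimination algorithm: start with the conjunction of all
    n variables and drop every variable that is 0 in some positive
    example. Runs in O(n * len(examples)) time, i.e. polynomial.
    """
    hypothesis = set(range(n))      # indices of variables kept in the conjunction
    for x, label in examples:
        if label:                   # positive example: keep only variables set to 1
            hypothesis &= {i for i in range(n) if x[i] == 1}
    return hypothesis

def classify(hypothesis, x):
    """Apply the learned conjunction to a new, possibly unseen sample."""
    return all(x[i] == 1 for i in hypothesis)

# Hypothetical target concept: x0 AND x2.
sample = [
    ((1, 0, 1, 1), True),
    ((1, 1, 1, 0), True),
    ((0, 1, 1, 1), False),
    ((1, 1, 0, 1), False),
]
h = learn_conjunction(sample, 4)
print(sorted(h))                    # [0, 2] for this sample
print(classify(h, (1, 0, 1, 0)))    # True: the unseen sample satisfies x0 AND x2
```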
There are several different approaches to computational learning theory, based on making different assumptions about the inference principles used to generalise from limited data. These include different definitions of probability (see frequentist probability and Bayesian probability) and different assumptions on the generation of samples. The different approaches include:
* Exact learning, proposed by Dana Angluin;
* Probably approximately correct (PAC) learning, proposed by Leslie Valiant (a worked sample-complexity bound for this model appears after this list);
* VC theory, proposed by Vladimir Vapnik and Alexey Chervonenkis;
* Bayesian inference;
* Algorithmic learning theory, from the work of E. Mark Gold;
* Online machine learning, from the work of Nick Littlestone.
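As a concrete instance of the PAC model, the following sketch evaluates the textbook sample-complexity bound for a consistent learner over a finite hypothesis class H: with m ≥ (1/ε)(ln |H| + ln (1/δ)) labeled examples, any hypothesis consistent with the sample has true error at most ε with probability at least 1 − δ. The bound is a standard result rather than one stated in this article, and the numbers below are illustrative.

```python
import math

def pac_sample_size(hypothesis_count, epsilon, delta):
    """Samples sufficient for a consistent learner over a finite class:
    m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)

# Example: monotone conjunctions over n Boolean variables form a finite
# class with |H| = 2**n hypotheses.
n = 20
m = pac_sample_size(2**n, epsilon=0.1, delta=0.05)
print(m)  # 169 samples suffice for error <= 0.1 with probability >= 0.95
```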
While its primary goal is to understand learning abstractly, computational learning theory has led to the development of practical algorithms. For example, PAC theory inspired boosting, VC theory led to support vector machines, and Bayesian inference led to belief networks.
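To illustrate the boosting connection just mentioned, here is a minimal AdaBoost sketch using one-dimensional decision stumps as weak learners. It is a toy illustration under assumed data, not a reference implementation.

```python
import math

# Toy 1-D dataset with labels in {-1, +1} (illustrative values).
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [+1, +1, -1, -1, -1, +1]

def stump_predict(threshold, polarity, x):
    """Weak learner: predict +polarity for x >= threshold, else -polarity."""
    return polarity if x >= threshold else -polarity

def best_stump(X, y, w):
    """Exhaustively pick the stump minimizing weighted training error."""
    best = None
    for threshold in X:
        for polarity in (+1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(threshold, polarity, xi) != yi)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n              # start with uniform example weights
    ensemble = []                  # (alpha, threshold, polarity) per round
    for _ in range(rounds):
        err, threshold, polarity = best_stump(X, y, w)
        err = max(err, 1e-10)      # guard against division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(threshold, polarity, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Final strong classifier: sign of the weighted vote of the stumps."""
    score = sum(a * stump_predict(t, p, x) for a, t, p in ensemble)
    return +1 if score >= 0 else -1

model = adaboost(X, y)
print([predict(model, xi) for xi in X])  # ensemble predictions on the training points
```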


See also

* Grammar induction
* Information theory
* Stability (learning theory)
* Error tolerance (PAC learning)


References


Surveys

* Angluin, D. 1992. Computational learning theory: Survey and selected bibliography. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing (May 1992), pages 351–369. http://portal.acm.org/citation.cfm?id=129712.129746
* D. Haussler. Probably approximately correct learning. In AAAI-90 Proceedings of the Eighth National Conference on Artificial Intelligence, Boston, MA, pages 1101–1108. American Association for Artificial Intelligence, 1990. http://citeseer.ist.psu.edu/haussler90probably.html


VC dimension

* V. Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 16(2):264–280, 1971.


Feature selection

* A. Dhagat and L. Hellerstein. PAC learning with irrelevant attributes. In Proceedings of the IEEE Symposium on Foundations of Computer Science, 1994. http://citeseer.ist.psu.edu/dhagat94pac.html


Inductive inference

* Gold, E. Mark. Language identification in the limit. Information and Control, 10(5):447–474, 1967.


Optimal O notation learning

* Oded Goldreich, Dana Ron. On universal learning algorithms. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.2224


Negative results

* M. Kearns and Leslie Valiant. 1989. Cryptographic limitations on learning Boolean formulae and finite automata. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing, pages 433–444, New York. ACM. http://citeseer.ist.psu.edu/kearns89cryptographic.html


Boosting (machine learning)

* Robert E. Schapire. The strength of weak learnability. Machine Learning, 5(2):197–227, 1990. http://citeseer.ist.psu.edu/schapire90strength.html


Occam learning

* Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M. K. Occam's razor. Information Processing Letters, 24:377–380, 1987.
* Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M. K. Learnability and the Vapnik–Chervonenkis dimension. Journal of the ACM, 36(4):929–965, 1989.


Probably approximately correct learning

* L. Valiant. A Theory of the Learnable. Communications of the ACM, 27(11):1134–1142, 1984.


Error tolerance

* Michael Kearns and Ming Li. Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4):807–837, August 1993. http://citeseer.ist.psu.edu/kearns93learning.html
* Kearns, M. (1993). Efficient noise-tolerant learning from statistical queries. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, pages 392–401. http://citeseer.ist.psu.edu/kearns93efficient.html


Equivalence

* D. Haussler, M. Kearns, N. Littlestone and M. Warmuth. Equivalence of models for polynomial learnability. Proc. 1st ACM Workshop on Computational Learning Theory, (1988) 42–55.
* A description of some of these publications is given at important publications in machine learning.


Distribution learning theory

* M. Kearns, Y. Mansour, D. Ron, R. Rubinfeld, R. E. Schapire, L. Sellie. On the learnability of discrete distributions. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing, 1994.


External links


* Basics of Bayesian inference