In computer science, computational learning theory (or just learning theory) is a subfield of artificial intelligence devoted to studying the design and analysis of machine learning algorithms.
Overview
Theoretical results in machine learning mainly deal with a type of inductive learning called supervised learning. In supervised learning, an algorithm is given samples that are labeled in some useful way. For example, the samples might be descriptions of mushrooms, and the labels could be whether or not the mushrooms are edible. The algorithm takes these previously labeled samples and uses them to induce a classifier. This classifier is a function that assigns labels to samples, including samples that have not been seen previously by the algorithm. The goal of the supervised learning algorithm is to optimize some measure of performance, such as minimizing the number of mistakes made on new samples.
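To make the idea concrete, here is a minimal sketch in Python of inducing a classifier from labeled samples. The mushroom data, the feature encoding, and the 1-nearest-neighbour rule are all invented for illustration; they stand in for whatever learner the theory is analysing.

```python
# Minimal supervised-learning sketch: induce a classifier from labeled samples.
# Each sample is a feature tuple describing a hypothetical mushroom; the label
# says whether it is edible. All data here is made up for illustration.

def hamming_distance(a, b):
    """Number of positions in which two feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def induce_classifier(samples, labels):
    """Return a classifier (a function from sample to label) induced from
    previously labeled samples, here via the 1-nearest-neighbour rule."""
    def classify(x):
        nearest = min(range(len(samples)),
                      key=lambda i: hamming_distance(x, samples[i]))
        return labels[nearest]
    return classify

# Labeled training samples: (cap_color, odor, gill_size) -> edible?
train_x = [("brown", "none", "broad"), ("red", "foul", "narrow"),
           ("white", "almond", "broad"), ("red", "pungent", "narrow")]
train_y = [True, False, True, False]

classifier = induce_classifier(train_x, train_y)
# The induced classifier also labels samples it has never seen before:
print(classifier(("brown", "almond", "broad")))  # True: resembles edible samples
print(classifier(("red", "foul", "broad")))      # False: resembles poisonous ones
```

A learning-theoretic analysis would then bound how often such an induced classifier errs on new samples drawn from the same distribution as the training data.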
In addition to performance bounds, computational learning theory studies the time complexity and feasibility of learning. In computational learning theory, a computation is considered feasible if it can be done in polynomial time. There are two kinds of time complexity results:
* Positive results – showing that a certain class of functions is learnable in polynomial time.
* Negative results – showing that certain classes cannot be learned in polynomial time.
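As a sketch of how feasibility is usually formalised (this follows the common PAC-style convention; the exact list of parameters varies between models), a learning algorithm A is called efficient when its running time is polynomially bounded in all the relevant quantities:

```latex
% Efficiency of a learner A, in the usual PAC-style parameterisation:
% n = dimension of the examples, size(c) = representation size of the
% target concept, epsilon = accuracy parameter, delta = confidence parameter.
\mathrm{time}(A) \;\le\; \mathrm{poly}\!\left(n,\ \mathrm{size}(c),\ \tfrac{1}{\varepsilon},\ \tfrac{1}{\delta}\right)
```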
Negative results often rely on commonly believed, but as yet unproven, assumptions such as:
* Computational complexity – P ≠ NP (the P versus NP problem);
* Cryptography – one-way functions exist.
There are several different approaches to computational learning theory based on making different assumptions about the inference principles used to generalise from limited data. This includes different definitions of probability (see frequentist probability, Bayesian probability) and different assumptions on the generation of samples. The different approaches include:
* Exact learning, proposed by Dana Angluin;
* Probably approximately correct learning (PAC learning), proposed by Leslie Valiant (see the sample-complexity sketch after this list);
* VC theory, proposed by Vladimir Vapnik and Alexey Chervonenkis;
* Bayesian inference;
* Algorithmic learning theory, from the work of E. Mark Gold;
* Online machine learning, from the work of Nick Littlestone.
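To give one concrete flavour of these frameworks: in the PAC model, for a finite hypothesis class H in the realizable case (some hypothesis in H labels all examples correctly), a standard sample-complexity bound states that any learner outputting a hypothesis consistent with the training data is probably approximately correct:

```latex
% If the number of i.i.d. labeled examples m satisfies the bound below,
% then with probability at least 1 - delta, any hypothesis from H that is
% consistent with all m examples has error at most epsilon.
m \;\ge\; \frac{1}{\varepsilon}\left(\ln\lvert\mathcal{H}\rvert + \ln\frac{1}{\delta}\right)
```

The required sample size thus grows only logarithmically in the size of the hypothesis class; for infinite classes, the VC dimension plays the analogous role.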
While its primary goal is to understand learning abstractly, computational learning theory has led to the development of practical algorithms. For example, PAC theory inspired boosting, VC theory led to support vector machines, and Bayesian inference led to belief networks.
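As an illustration of the boosting idea that grew out of PAC theory, here is a self-contained AdaBoost-style sketch in Python (a simplified variant using decision stumps as weak learners; not any particular library's API):

```python
import math

# AdaBoost sketch: combine weak threshold classifiers ("stumps") into a
# strong one by reweighting examples so later stumps focus on past mistakes.

def best_stump(xs, ys, w):
    """Return (weighted error, threshold, sign) of the best decision stump."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            pred = [sign if x >= thr else -sign for x in xs]
            err = sum(wi for wi, p, y in zip(w, pred, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def adaboost(xs, ys, rounds=10):
    """Return a strong classifier: a weighted vote over weak stumps."""
    n = len(xs)
    w = [1.0 / n] * n                        # uniform initial example weights
    ensemble = []                            # (alpha, threshold, sign) triples
    for _ in range(rounds):
        err, thr, sign = best_stump(xs, ys, w)
        err = max(err, 1e-10)                # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, sign))
        # Upweight misclassified examples, downweight correct ones, renormalise.
        w = [wi * math.exp(-alpha * y * (sign if x >= thr else -sign))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    def strong(x):
        score = sum(a * (s if x >= t else -s) for a, t, s in ensemble)
        return 1 if score >= 0 else -1
    return strong

# Toy data with labels in {-1, +1}; no single stump classifies it perfectly,
# but a weighted vote of a few stumps does.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [+1, +1, -1, -1, +1, +1]
h = adaboost(xs, ys)
print([h(x) for x in xs])  # matches ys after enough boosting rounds
```

Schapire's result that weak learnability implies strong learnability (cited below) is what guarantees that constructions of this kind can work.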
See also
* Grammar induction
* Information theory
* Stability (learning theory)
* Error tolerance (PAC learning)
References
Surveys
* Angluin, D. 1992. Computational learning theory: Survey and selected bibliography. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing (May 1992), pages 351–369. http://portal.acm.org/citation.cfm?id=129712.129746
* D. Haussler. Probably approximately correct learning. In AAAI-90 Proceedings of the Eighth National Conference on Artificial Intelligence, Boston, MA, pages 1101–1108. American Association for Artificial Intelligence, 1990. http://citeseer.ist.psu.edu/haussler90probably.html
VC dimension
* V. Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 16(2):264–280, 1971.
Feature selection
* A. Dhagat and L. Hellerstein. PAC learning with irrelevant attributes. In Proceedings of the IEEE Symposium on Foundations of Computer Science, 1994. http://citeseer.ist.psu.edu/dhagat94pac.html
Inductive inference
* Gold, E. Mark. Language identification in the limit. Information and Control, 10(5):447–474, 1967.
Optimal O notation learning
* Oded Goldreich, Dana Ron. On universal learning algorithms. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.2224
Negative results
* M. Kearns and Leslie Valiant. 1989. Cryptographic limitations on learning boolean formulae and finite automata. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing, pages 433–444, New York. ACM. http://citeseer.ist.psu.edu/kearns89cryptographic.html
Boosting (machine learning)
* Robert E. Schapire. The strength of weak learnability. Machine Learning, 5(2):197–227, 1990 http://citeseer.ist.psu.edu/schapire90strength.html
Occam learning
* Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M. K. Occam's razor. Information Processing Letters, 24:377–380, 1987.
* Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M. K. Learnability and the Vapnik–Chervonenkis dimension. Journal of the ACM, 36(4):929–965, 1989.
Probably approximately correct learning
* L. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
Error tolerance
* Michael Kearns and Ming Li. Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4):807–837, August 1993. http://citeseer.ist.psu.edu/kearns93learning.html
* Kearns, M. (1993). Efficient noise-tolerant learning from statistical queries. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, pages 392–401. http://citeseer.ist.psu.edu/kearns93efficient.html
Equivalence
* D. Haussler, M. Kearns, N. Littlestone and M. Warmuth. Equivalence of models for polynomial learnability. Proc. 1st ACM Workshop on Computational Learning Theory, (1988) 42–55.
A description of some of these publications is given at important publications in machine learning.
Distribution learning theory
External links
* Basics of Bayesian inference