Meta learning is a subfield of machine learning where automatic learning algorithms are applied to metadata about machine learning experiments. As of 2017, the term had not found a standard interpretation; however, the main goal is to use such metadata to understand how automatic learning can become flexible in solving learning problems, and hence to improve the performance of existing learning algorithms or to learn (induce) the learning algorithm itself, hence the alternative term learning to learn.

Flexibility is important because each learning algorithm is based on a set of assumptions about the data, its inductive bias. This means that it will only learn well if the bias matches the learning problem. A learning algorithm may perform very well in one domain, but not in the next. This poses strong restrictions on the use of machine learning or data mining techniques, since the relationship between the learning problem (often some kind of database) and the effectiveness of different learning algorithms is not yet understood.

By using different kinds of metadata, such as properties of the learning problem, algorithm properties (such as performance measures), or patterns previously derived from the data, it is possible to learn, select, alter or combine different learning algorithms to effectively solve a given learning problem. Critiques of meta learning approaches bear a strong resemblance to the critique of metaheuristics, a possibly related problem. A good analogy to meta-learning, and the inspiration for Jürgen Schmidhuber's early work (1987) and Yoshua Bengio et al.'s work (1991), considers that genetic evolution learns the learning procedure encoded in genes and executed in each individual's brain. In an open-ended hierarchical meta learning system using genetic programming, better evolutionary methods can be learned by meta evolution, which itself can be improved by meta meta evolution, etc.

See also Ensemble learning.

Definition

A proposed definition for a meta learning system combines three requirements:
* The system must include a learning subsystem.
* Experience is gained by exploiting meta knowledge extracted
** in a previous learning episode on a single dataset, or
** from different domains.
* Learning bias must be chosen dynamically.

''Bias'' refers to the assumptions that influence the choice of explanatory hypotheses and not the notion of bias represented in the bias-variance dilemma. Meta learning is concerned with two aspects of learning bias.
* Declarative bias specifies the representation of the space of hypotheses, and affects the size of the search space (e.g., represent hypotheses using linear functions only).
* Procedural bias imposes constraints on the ordering of the inductive hypotheses (e.g., preferring smaller hypotheses).

Common approaches

There are three common approaches:
Lilian Weng (2018). Meta-Learning: Learning to Learn Fast. OpenAI Blog. November 2018. Retrieved 27 October 2019
* 1) using (cyclic) networks with external or internal memory (model-based)
* 2) learning effective distance metrics (metric-based)
* 3) explicitly optimizing model parameters for fast learning (optimization-based).

Model-Based

Model-based meta-learning models update their parameters rapidly with a few training steps, which can be achieved by their internal architecture or controlled by another meta-learner model.

Memory-Augmented Neural Networks

A Memory-Augmented Neural Network, or MANN for short, is claimed to be able to encode new information quickly and thus to adapt to new tasks after only a few examples.
Adam Santoro, Sergey Bartunov, Daan Wierstra, Timothy Lillicrap. Meta-Learning with Memory-Augmented Neural Networks. Google DeepMind. Retrieved 29 October 2019

Meta Networks

Meta Networks (MetaNet) learns meta-level knowledge across tasks and shifts its inductive biases via fast parameterization for rapid generalization.
Tsendsuren Munkhdalai, Hong Yu (2017). Meta Networks. arXiv:1703.00837 [cs.LG]

Metric-Based

The core idea in metric-based meta-learning is similar to nearest-neighbor algorithms, in which the weight given to each labelled example is generated by a kernel function. The aim is to learn a metric or distance function over objects. The notion of a good metric is problem-dependent: it should represent the relationship between inputs in the task space and facilitate problem solving.
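
To make the nearest-neighbor analogy concrete, the following minimal Python sketch predicts class probabilities for a query as a kernel-weighted vote over a small labelled support set. The Gaussian kernel, the `bandwidth` parameter, the function names and the toy points are all assumptions chosen for illustration. In a metric-based meta-learner the distance would typically be computed between learned embeddings rather than raw inputs.

```python
import math

def kernel_weights(query, support_x, bandwidth=1.0):
    """Normalised Gaussian-kernel weights of a query against each support point."""
    # Log-kernel values: negative squared Euclidean distance, scaled by the bandwidth.
    scores = [
        -sum((q - s) ** 2 for q, s in zip(query, x)) / (2.0 * bandwidth ** 2)
        for x in support_x
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(query, support_x, support_y, n_classes, bandwidth=1.0):
    """Class probabilities as the kernel-weighted sum of the support labels."""
    probs = [0.0] * n_classes
    for w, label in zip(kernel_weights(query, support_x, bandwidth), support_y):
        probs[label] += w
    return probs

# Toy support set (made-up points): two examples for each of two classes.
support_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
support_y = [0, 0, 1, 1]
probs = predict_proba((0.2, 0.1), support_x, support_y, n_classes=2)
```

Because the weights are normalised, `probs` sums to one, and a query close to the class-0 points receives almost all of its weight from them.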

Convolutional Siamese Neural Network

A Siamese neural network is composed of two twin networks whose outputs are jointly trained. A function on top of them learns the relationship between pairs of input data samples. The two networks are identical, sharing the same weights and network parameters.
Gregory Koch, Richard Zemel, Ruslan Salakhutdinov (2015). Siamese Neural Networks for One-shot Image Recognition. Department of Computer Science, University of Toronto. Toronto, Ontario, Canada.
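
The weight sharing between the twins can be sketched in a few lines of Python. This is an illustrative toy and not the architecture of the cited paper: the single `tanh` layer stands in for a convolutional network, the head (a sigmoid over a weighted L1 distance with a `bias` term) is one plausible choice, and the parameters are hand-picked rather than trained.

```python
import math

def embed(x, weights):
    """Twin-network body: one linear layer followed by tanh (stand-in for a conv net)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def siamese_score(x1, x2, weights, alpha, bias):
    """Similarity in (0, 1) for a pair of inputs.

    Both inputs pass through `embed` with the SAME `weights`, which is what
    makes the two branches twins. The head is a sigmoid over a weighted L1
    distance between the two embeddings.
    """
    h1 = embed(x1, weights)
    h2 = embed(x2, weights)
    distance = sum(a * abs(u - v) for a, u, v in zip(alpha, h1, h2))
    return 1.0 / (1.0 + math.exp(-(bias - distance)))

# Hand-picked, untrained parameters purely for illustration.
weights = [[0.5, -0.2], [0.1, 0.8]]
alpha = [1.0, 1.0]
bias = 2.0

same = siamese_score((1.0, 2.0), (1.0, 2.0), weights, alpha, bias)
different = siamese_score((1.0, 2.0), (-3.0, 0.5), weights, alpha, bias)
```

Since both branches use identical parameters, the score is symmetric in its two inputs, and an identical pair scores higher than a dissimilar one.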

Matching Networks

Matching Networks learn a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types.
Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., & Wierstra, D. (2016). Matching networks for one shot learning. Google DeepMind. Retrieved 3 November 2019

Relation Network

The Relation Network (RN) is trained end-to-end from scratch. During meta-learning, it learns a deep distance metric to compare a small number of images within episodes, each of which is designed to simulate the few-shot setting.
Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P. H. S., & Hospedales, T. M. (2018). Learning to compare: relation network for few-shot learning

Prototypical Networks

Prototypical Networks learn a metric space in which classification can be performed by computing distances to prototype representations of each class. Compared to recent approaches for few-shot learning, they reflect a simpler inductive bias that is beneficial in this limited-data regime, and achieve satisfactory results.
Snell, J., Swersky, K., & Zemel, R. S. (2017). Prototypical networks for few-shot learning.
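
The classification rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the learned embedding network is omitted (inputs are treated as if already embedded), squared Euclidean distance is used, and the data and function names are made up for the example.

```python
def class_prototypes(support_x, support_y):
    """Mean embedding per class (the class prototype)."""
    sums, counts = {}, {}
    for x, y in zip(support_x, support_y):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def classify(query, prototypes):
    """Assign the query to the class whose prototype is nearest."""
    return min(prototypes, key=lambda y: squared_distance(query, prototypes[y]))

# Toy support set (made-up, already "embedded" points): two examples per class.
support_x = [(1.0, 1.0), (1.0, 3.0), (6.0, 6.0), (8.0, 6.0)]
support_y = ["cat", "cat", "dog", "dog"]
prototypes = class_prototypes(support_x, support_y)
label = classify((2.0, 2.5), prototypes)
```

Adding a new class only requires computing one more prototype from its few examples; no parameters are re-fitted at classification time.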

Optimization-Based

Optimization-based meta-learning algorithms aim to adjust the optimization algorithm so that the model can learn well from only a few examples.

LSTM Meta-Learner

An LSTM-based meta-learner learns the exact optimization algorithm used to train another learner neural network classifier in the few-shot regime. The parametrization allows it to learn appropriate parameter updates specifically for the scenario where a set amount of updates will be made, while also learning a general initialization of the learner (classifier) network that allows for quick convergence of training.
Sachin Ravi and Hugo Larochelle (2017). "Optimization as a model for few-shot learning". ICLR 2017. Retrieved 3 November 2019
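
The parametrization can be illustrated by noting that a gradient descent step has the same form as an LSTM cell-state update, with the learner's parameters as the cell state and the negative gradient as the candidate value. The Python sketch below shows only this correspondence. In the meta-learner the gate values would be produced by a trained network; here they are passed in by hand, and all names and numbers are assumptions made for illustration.

```python
def cell_state_update(theta, grad, forget_gate, input_gate):
    """LSTM-cell-style parameter update, applied element-wise.

    theta plays the role of the cell state and the negative gradient the role
    of the candidate value: theta_new = f * theta + i * (-grad).
    """
    return [f * t - i * g for t, g, f, i in zip(theta, grad, forget_gate, input_gate)]

def sgd_update(theta, grad, lr):
    """Plain gradient descent step, for comparison."""
    return [t - lr * g for t, g in zip(theta, grad)]

theta = [0.5, -1.0, 2.0]
grad = [0.2, -0.4, 1.0]
lr = 0.1

# With forget gate 1 and input gate equal to the learning rate,
# the cell-state update reduces to plain gradient descent.
as_lstm = cell_state_update(theta, grad, [1.0] * 3, [lr] * 3)
as_sgd = sgd_update(theta, grad, lr)

# Other gate values act like per-parameter step sizes and weight decay.
shrunk = cell_state_update(theta, grad, [0.9, 1.0, 1.0], [0.1, 0.0, 0.5])
```

Letting a network choose the gates at every step is what allows the update rule itself to be learned rather than fixed in advance.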

Model-Agnostic Meta-Learning (MAML)

MAML, short for Model-Agnostic Meta-Learning, is a fairly general optimization algorithm, compatible with any model that learns through gradient descent.
Chelsea Finn, Pieter Abbeel, Sergey Levine (2017). "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks". arXiv:1703.03400 [cs.LG]
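
A toy version of the two nested loops is sketched below in plain Python. It is an illustration under strong assumptions, not the reference implementation: the model has a single parameter (`y = w * x`), each task is a hypothetical line with a different slope, one inner gradient step is used, and the derivative through that inner step is written out by hand instead of relying on automatic differentiation.

```python
def loss(w, xs, ys):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def loss_grad(w, xs, ys):
    """Derivative of the loss with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def loss_hess(xs):
    """Second derivative of the loss with respect to w (constant for this model)."""
    return sum(2.0 * x * x for x in xs) / len(xs)

def adapt(w, xs, ys, inner_lr):
    """Inner loop: one gradient step on a task's support data."""
    return w - inner_lr * loss_grad(w, xs, ys)

def maml_step(w, tasks, inner_lr, outer_lr):
    """Outer loop: one meta-update of the shared initialisation w."""
    meta_grad = 0.0
    for (sx, sy), (qx, qy) in tasks:
        w_task = adapt(w, sx, sy, inner_lr)
        # Chain rule through the inner step: d w_task / d w = 1 - inner_lr * hessian.
        dw_task_dw = 1.0 - inner_lr * loss_hess(sx)
        # Query-set gradient at the adapted parameter, pulled back to w.
        meta_grad += loss_grad(w_task, qx, qy) * dw_task_dw
    return w - outer_lr * meta_grad / len(tasks)

def make_task(slope):
    """Hypothetical task y = slope * x with separate support and query inputs."""
    support_x, query_x = [1.0, 2.0], [1.5, 2.5]
    return ((support_x, [slope * x for x in support_x]),
            (query_x, [slope * x for x in query_x]))

tasks = [make_task(1.0), make_task(3.0)]
w = 0.0
for _ in range(200):
    w = maml_step(w, tasks, inner_lr=0.05, outer_lr=0.05)

# Adapt the meta-learned initialisation to an unseen task with one gradient step.
new_support, new_query = make_task(2.5)
w_new = adapt(w, *new_support, inner_lr=0.05)
```

In this symmetric toy setting the meta-learned `w` settles midway between the two training slopes, which is the starting point from which a single inner step reduces the query loss equally well on both tasks.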

Reptile

Reptile is a remarkably simple meta-learning optimization algorithm. Like MAML, it relies on meta-optimization through gradient descent and is model-agnostic.
Alex Nichol, Joshua Achiam, and John Schulman (2018). "On First-Order Meta-Learning Algorithms". arXiv:1803.02999 [cs.LG]
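
The simplicity of Reptile shows in a short Python sketch: sample a task, run a few ordinary gradient steps on it, then move the shared initialisation a fraction of the way towards the adapted parameters. The one-parameter model, the two hypothetical tasks, the hyperparameter values and the function names are assumptions made for illustration.

```python
import random

def loss_grad(w, xs, ys):
    """Derivative of the mean squared error of y = w * x with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def sgd(w, xs, ys, lr, steps):
    """Inner loop: a few plain gradient descent steps on one task."""
    for _ in range(steps):
        w = w - lr * loss_grad(w, xs, ys)
    return w

def reptile(w, tasks, inner_lr, meta_lr, inner_steps, iterations, seed=0):
    """Outer loop: interpolate the initialisation towards task-adapted parameters."""
    rng = random.Random(seed)
    for _ in range(iterations):
        xs, ys = rng.choice(tasks)
        w_task = sgd(w, xs, ys, inner_lr, inner_steps)
        # First-order update: no derivative is taken through the inner loop.
        w = w + meta_lr * (w_task - w)
    return w

# Two hypothetical tasks: lines with slopes 1 and 3 over the same inputs.
xs = [1.0, 2.0]
tasks = [(xs, [1.0 * x for x in xs]), (xs, [3.0 * x for x in xs])]
w = reptile(0.0, tasks, inner_lr=0.05, meta_lr=0.1, inner_steps=3, iterations=500)
```

The resulting initialisation ends up between the two task solutions, so a few gradient steps suffice to reach either of them.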

Examples

Some approaches which have been viewed as instances of meta learning:
* Recurrent neural networks (RNNs) are universal computers. In 1993, Jürgen Schmidhuber showed how "self-referential" RNNs can in principle learn by backpropagation to run their own weight change algorithm, which may be quite different from backpropagation. In 2001, Sepp Hochreiter, A.S. Younger and P.R. Conwell built a successful supervised meta learner based on Long short-term memory RNNs. It learned through backpropagation a learning algorithm for quadratic functions that is much faster than backpropagation. Researchers at DeepMind (Marcin Andrychowicz et al.) extended this approach to optimization in 2017.
* In the 1990s, Meta Reinforcement Learning or Meta RL was achieved in Schmidhuber's research group through self-modifying policies written in a universal programming language that contains special instructions for changing the policy itself. There is a single lifelong trial. The goal of the RL agent is to maximize reward. It learns to accelerate reward intake by continually improving its own learning algorithm, which is part of the "self-referential" policy.
* An extreme type of Meta Reinforcement Learning is embodied by the Gödel machine, a theoretical construct which can inspect and modify any part of its own software, which also contains a general theorem prover. It can achieve recursive self-improvement in a provably optimal way.
* ''Model-Agnostic Meta-Learning'' (MAML) was introduced in 2017 by Chelsea Finn et al. Given a sequence of tasks, the parameters of a given model are trained such that a few iterations of gradient descent with little training data from a new task will lead to good generalization performance on that task. MAML "trains the model to be easy to fine-tune." MAML was successfully applied to few-shot image classification benchmarks and to policy-gradient-based reinforcement learning.
* ''Discovering meta-knowledge'' works by inducing knowledge (e.g. rules) that expresses how each learning method will perform on different learning problems. The metadata is formed by characteristics of the data (general, statistical, information-theoretic, ...) in the learning problem, and characteristics of the learning algorithm (type, parameter settings, performance measures, ...). Another learning algorithm then learns how the data characteristics relate to the algorithm characteristics. Given a new learning problem, the data characteristics are measured, and the performance of different learning algorithms is predicted. Hence, one can predict the algorithms best suited for the new problem.
* ''Stacked generalisation'' works by combining multiple (different) learning algorithms. The metadata is formed by the predictions of those different algorithms. Another learning algorithm learns from this metadata to predict which combinations of algorithms give generally good results. Given a new learning problem, the predictions of the selected set of algorithms are combined (e.g. by (weighted) voting) to provide the final prediction. Since each algorithm is deemed to work on a subset of problems, a combination is hoped to be more flexible and able to make good predictions.
* ''Boosting'' is related to stacked generalisation, but uses the same algorithm multiple times, where the examples in the training data get different weights over each run. This yields different predictions, each focused on correctly predicting a subset of the data, and combining those predictions leads to better (but more expensive) results.
* ''Dynamic bias selection'' works by altering the inductive bias of a learning algorithm to match the given problem. This is done by altering key aspects of the learning algorithm, such as the hypothesis representation, heuristic formulae, or parameters. Many different approaches exist.
* ''Inductive transfer'' studies how the learning process can be improved over time. Metadata consists of knowledge about previous learning episodes and is used to efficiently develop an effective hypothesis for a new task. A related approach is called learning to learn, in which the goal is to use acquired knowledge from one domain to help learning in other domains.
* Other approaches using metadata to improve automatic learning are learning classifier systems, case-based reasoning and constraint satisfaction.
* Some initial theoretical work has begun on using ''Applied Behavioral Analysis'' as a foundation for agent-mediated meta-learning about the performance of human learners, and on adjusting the instructional course of an artificial agent.
* AutoML, such as Google Brain's "AI building AI" project, which according to Google briefly exceeded existing ImageNet benchmarks in 2017.
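
The ''Discovering meta-knowledge'' item in the list above can be sketched as a small algorithm-selection routine: past experiments are stored as pairs of data characteristics and the algorithm that performed best, and a new problem is matched to the most similar past dataset. Everything in this Python sketch is a placeholder chosen for illustration, including the three meta-features, their values, the algorithm names, and the use of a one-nearest-neighbour rule as the meta-level learner.

```python
import math

# Hypothetical past experiments: (meta-features of a dataset, best algorithm found).
# Meta-features: (log10 of number of examples, number of features, class entropy).
# All values and algorithm names are invented placeholders, not measured results.
past_experiments = [
    ((2.0, 10.0, 0.9), "algorithm_A"),
    ((2.3, 12.0, 1.0), "algorithm_A"),
    ((5.0, 300.0, 0.4), "algorithm_B"),
    ((4.7, 250.0, 0.5), "algorithm_B"),
]

def column_stats(rows):
    """Per-column mean and standard deviation, used to put meta-features on one scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A zero standard deviation is replaced by 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(row, means, stds):
    """Standardise one meta-feature vector."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def recommend(meta_features, experiments):
    """Return the algorithm that did best on the most similar past dataset."""
    means, stds = column_stats([features for features, _ in experiments])
    target = scale(meta_features, means, stds)

    def distance(entry):
        scaled = scale(entry[0], means, stds)
        return sum((a - b) ** 2 for a, b in zip(scaled, target))

    return min(experiments, key=distance)[1]

# Measure the characteristics of a new problem, then look up a recommendation.
choice = recommend((4.9, 280.0, 0.45), past_experiments)
```

Any learner could replace the nearest-neighbour rule at the meta level; it is used here only because it keeps the relation between data characteristics and algorithm performance easy to inspect.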

External links

* Metalearning article in Scholarpedia
* Vilalta R. and Drissi Y. (2002). A perspective view and survey of meta-learning, Artificial Intelligence Review, 18(2), 77–95.
* Giraud-Carrier, C., & Keller, J. (2002). Dealing with the data flood, J. Meij (ed), chapter Meta-Learning. STT/Beweton, The Hague.
* Brazdil P., Giraud-Carrier C., Soares C., Vilalta R. (2009). Metalearning: applications to data mining, chapter Metalearning: Concepts and Systems, Springer
* Video courses about Meta-Learning with step-by-step explanation of MAML, Prototypical Networks, and Relation Networks