Meta-learning is a subfield of machine learning where automatic learning algorithms are applied to metadata about machine learning experiments. As of 2017, the term had not found a standard interpretation; however, the main goal is to use such metadata to understand how automatic learning can become flexible in solving learning problems, hence to improve the performance of existing learning algorithms or to learn (induce) the learning algorithm itself, hence the alternative term learning to learn.
Flexibility is important because each learning algorithm is based on a set of assumptions about the data, its inductive bias. This means that it will only learn well if the bias matches the learning problem. A learning algorithm may perform very well in one domain, but not in the next. This poses strong restrictions on the use of machine learning or data mining techniques, since the relationship between the learning problem (often some kind of database) and the effectiveness of different learning algorithms is not yet understood.
By using different kinds of metadata, like properties of the learning problem, algorithm properties (like performance measures), or patterns previously derived from the data, it is possible to learn, select, alter or combine different learning algorithms to effectively solve a given learning problem. Critiques of meta-learning approaches bear a strong resemblance to the critique of metaheuristics, a possibly related problem. A good analogy to meta-learning, and the inspiration for Jürgen Schmidhuber's early work (1987) and Yoshua Bengio et al.'s work (1991), is that genetic evolution learns the learning procedure encoded in genes and executed in each individual's brain. In an open-ended hierarchical meta-learning system using genetic programming, better evolutionary methods can be learned by meta-evolution, which itself can be improved by meta-meta-evolution, and so on.
Definition
A proposed definition for a meta-learning system combines three requirements:
* The system must include a learning subsystem.
* Experience is gained by exploiting meta-knowledge extracted
** in a previous learning episode on a single dataset, or
** from different domains.
* Learning bias must be chosen dynamically.
''Bias'' refers to the assumptions that influence the choice of explanatory hypotheses and not the notion of bias represented in the bias-variance dilemma. Meta-learning is concerned with two aspects of learning bias.
* Declarative bias specifies the representation of the space of hypotheses, and affects the size of the search space (e.g., represent hypotheses using linear functions only).
* Procedural bias imposes constraints on the ordering of the inductive hypotheses (e.g., preferring smaller hypotheses).
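The two kinds of bias can be illustrated with a toy curve-fitting learner. In this minimal sketch (not taken from any particular meta-learning system), the declarative bias is a hypothetical `max_degree` that restricts the hypothesis space to polynomials of at most that degree, and the procedural bias is the search order: lower-degree (smaller) hypotheses are tried first, and the first one whose training error falls below an assumed tolerance `tol` is kept.

```python
import numpy as np

def fit_with_biases(x, y, max_degree=3, tol=1e-6):
    """Return (degree, coefficients) of the first acceptable polynomial hypothesis.

    Declarative bias: only polynomials of degree <= max_degree are representable.
    Procedural bias: degrees are searched in increasing order, so the smallest
    adequate hypothesis is preferred. max_degree and tol are illustrative choices.
    """
    for degree in range(max_degree + 1):            # procedural: small hypotheses first
        coeffs = np.polyfit(x, y, degree)           # best fit inside this sub-space
        mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
        if mse <= tol:                              # accept the first adequate one
            return degree, coeffs
    # Nothing in the declared space fits well enough: return the largest hypothesis tried.
    return max_degree, coeffs

x = np.linspace(-1.0, 1.0, 20)
y = 2.0 * x - 0.5                                   # toy data from a linear function
degree, coeffs = fit_with_biases(x, y)
print(degree)  # 1
```

Changing `max_degree` alters what the learner can express at all, while changing the loop order alters which of the expressible hypotheses it prefers; a meta-learner that chooses bias dynamically would adjust one or both.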
Common approaches
There are three common approaches:
# using (cyclic) networks with external or internal memory (model-based)
# learning effective distance metrics (metric-based)
# explicitly optimizing model parameters for fast learning (optimization-based).
Model-Based
Model-based meta-learning models update their parameters rapidly with a few training steps, which can be achieved by their internal architecture or controlled by another meta-learner model.
Memory-Augmented Neural Networks
A Memory-Augmented Neural Network, or MANN for short, is claimed to be able to encode new information quickly and thus to adapt to new tasks after only a few examples.
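A common ingredient of such external memories (for example in Neural Turing Machine-style designs) is content-based addressing: a controller emits a key, each memory slot is scored by cosine similarity to it, and the read vector is a softmax-weighted sum of the slots. The NumPy sketch below shows only that read step; the controller, the write mechanism, the memory contents and the `sharpness` parameter are assumptions for illustration.

```python
import numpy as np

def content_read(memory, key, sharpness=1.0):
    """Read from memory by cosine similarity to `key` (content-based addressing).

    memory: (N, M) array of N slots; key: (M,) query from a (hypothetical) controller.
    Returns the read vector (M,) and the attention weights over slots (N,).
    """
    eps = 1e-8  # guards against division by zero for all-zero slots
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps)
    logits = sharpness * sims
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ memory, weights

# Hypothetical memory holding three stored vectors.
memory = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
read, w = content_read(memory, key=np.array([0.9, 0.1, 0.0]), sharpness=10.0)
print(w.argmax())  # 0: the slot most similar to the key receives the largest weight
```

Because new information can be stored by writing a vector into a slot rather than by many gradient updates, a read of this kind can retrieve it immediately, which is the intuition behind fast adaptation.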
Meta Networks
Meta Networks (MetaNet) learns meta-level knowledge across tasks and shifts its inductive biases via fast parameterization for rapid generalization.
Metric-Based
The core idea in metric-based meta-learning is similar to nearest-neighbor algorithms, in which the weights are generated by a kernel function. It aims to learn a metric or distance function over objects. The notion of a good metric is problem-dependent. It should represent the relationship between inputs in the task space and facilitate problem solving.
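The nearest-neighbor analogy can be sketched as follows: each labelled support example votes for its class with a weight given by a Gaussian kernel of its distance to the query. In metric-based meta-learning the distance would be computed on learned embeddings; here the raw inputs, the bandwidth `sigma` and the toy support set are assumptions for illustration.

```python
import numpy as np

def kernel_predict(support_x, support_y, query, n_classes, sigma=1.0):
    """Kernel-weighted nearest-neighbor prediction.

    support_x: (K, D) labelled examples; support_y: (K,) integer labels.
    Returns a (n_classes,) probability vector for `query` (D,).
    """
    sq_dists = np.sum((support_x - query) ** 2, axis=1)
    weights = np.exp(-sq_dists / (2.0 * sigma ** 2))   # Gaussian kernel
    weights /= weights.sum()                           # normalize weights to sum to 1
    one_hot = np.eye(n_classes)[support_y]             # (K, n_classes)
    return weights @ one_hot                           # weighted vote per class

support_x = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0]])
support_y = np.array([0, 0, 1])
probs = kernel_predict(support_x, support_y, np.array([0.1, 0.0]), n_classes=2)
print(probs.argmax())  # 0
```

Learning the metric amounts to replacing the raw inputs with the output of a trained embedding function, so that this same weighting step separates classes well on new tasks.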
Convolutional Siamese Neural Network
A Siamese neural network is composed of two twin networks whose outputs are jointly trained. A function on top of them learns the relationship between pairs of input data samples. The two networks are identical, sharing the same weights and network parameters.
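Weight sharing means there is really only one encoder, applied to both inputs. In the NumPy sketch below, a single random (untrained) weight matrix `W` stands in for the convolutional layers, and a head turns the component-wise absolute difference of the two embeddings into a score in (0, 1). A weighted L1 distance followed by a sigmoid is one common choice for convolutional Siamese networks used in one-shot learning; the one-layer `tanh` encoder and all parameter values here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))       # shared encoder weights (stand-in for conv layers)
alpha = rng.normal(size=8)        # head weights over the |e1 - e2| components
b = 0.0                           # head bias

def embed(x):
    """The single encoder used by both twins: the same W for every input."""
    return np.tanh(x @ W)

def similarity(x1, x2):
    """Score in (0, 1); after training, values near 1 would indicate the same class."""
    d = np.abs(embed(x1) - embed(x2))               # component-wise L1 distance
    return 1.0 / (1.0 + np.exp(-(d @ alpha + b)))   # sigmoid head

x1, x2 = rng.normal(size=4), rng.normal(size=4)
print(np.isclose(similarity(x1, x2), similarity(x2, x1)))  # True: symmetric by construction
```

Training would adjust `W`, `alpha` and `b` on labelled same/different pairs; at test time a query is compared with one example per new class and assigned to the highest-scoring one.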
Matching Networks
Matching Networks learn a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types.
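Concretely, the predicted label is an attention-weighted combination of the support labels, ŷ = Σ_i a(x̂, x_i) y_i, where a is a softmax over cosine similarities between the embedded query and the embedded support examples. In the NumPy sketch below the embeddings are given directly (an assumption; Matching Networks learn the embedding functions), and the 2-way, 2-shot episode is made up for illustration.

```python
import numpy as np

def matching_predict(support_emb, support_labels, query_emb, n_classes):
    """Attention over the support set: softmax of cosine similarities."""
    s = support_emb / np.linalg.norm(support_emb, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    cos = s @ q                                         # (K,) cosine similarities
    attn = np.exp(cos - cos.max())
    attn /= attn.sum()                                  # a(x_hat, x_i)
    return attn @ np.eye(n_classes)[support_labels]     # sum_i a_i * one_hot(y_i)

# Hypothetical 2-way, 2-shot episode with precomputed embeddings.
support_emb = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]])
support_labels = np.array([0, 0, 1, 1])
probs = matching_predict(support_emb, support_labels, np.array([0.2, 1.0]), n_classes=2)
print(probs.argmax())  # 1
```

Adapting to new classes only requires passing a new support set; no parameters are updated, which is why no fine-tuning is needed.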
Relation Network
The Relation Network (RN) is trained end-to-end from scratch. During meta-learning, it learns to learn a deep distance metric to compare a small number of images within episodes, each of which is designed to simulate the few-shot setting.
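Instead of a fixed distance such as Euclidean or cosine, the comparison is itself a network: a relation module takes the concatenation of a query embedding and a class embedding and outputs a relation score in [0, 1]. The sketch below shows only the forward pass of such a module, using a hypothetical two-layer perceptron with random, untrained weights and assumed sizes `D` and `H`; in the Relation Network the embedding and relation modules are trained jointly, with a mean-squared-error loss pushing scores toward 1 for matching pairs and 0 otherwise.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 8, 16                                        # embedding and hidden sizes (assumed)
W1, b1 = rng.normal(size=(2 * D, H)) * 0.1, np.zeros(H)
W2, b2 = rng.normal(size=H) * 0.1, 0.0

def relation_score(query_emb, class_emb):
    """Learned comparison g([query; class]) -> score in (0, 1)."""
    z = np.concatenate([query_emb, class_emb])      # pair the two embeddings
    h = np.maximum(0.0, z @ W1 + b1)                # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))     # sigmoid output

def classify(query_emb, class_embs):
    """Predict the class whose embedding has the highest relation score."""
    scores = np.array([relation_score(query_emb, c) for c in class_embs])
    return int(scores.argmax()), scores

query = rng.normal(size=D)
class_embs = rng.normal(size=(5, D))                # one embedding per class, 5-way episode
label, scores = classify(query, class_embs)
```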
Prototypical Networks
Prototypical Networks learn a metric space in which classification can be performed by computing distances to prototype representations of each class. Compared to recent approaches for few-shot learning, they reflect a simpler inductive bias that is beneficial in this limited-data regime, and achieve satisfactory results.
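Each prototype is the mean of the embedded support examples of its class, and a query receives a softmax over negative squared Euclidean distances to the prototypes. In this NumPy sketch, identity embeddings stand in for the learned embedding network (an assumption made to keep the example short), and the support set is a toy example.

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    """Class prototype = mean embedding of that class's support examples."""
    return np.stack([support_emb[support_labels == k].mean(axis=0)
                     for k in range(n_classes)])

def proto_predict(protos, query_emb):
    """Softmax over negative squared Euclidean distances to each prototype."""
    logits = -np.sum((protos - query_emb) ** 2, axis=1)
    p = np.exp(logits - logits.max())
    return p / p.sum()

support_emb = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]])
support_labels = np.array([0, 0, 1, 1])
protos = prototypes(support_emb, support_labels, n_classes=2)  # [0, 1] and [5, 4]
probs = proto_predict(protos, np.array([1.0, 1.0]))
print(probs.argmax())  # 0
```

Summarizing each class by a single point is the simpler inductive bias mentioned above: with very few examples per class, a mean is a stable estimate.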
Optimization-Based
Optimization-based meta-learning algorithms aim to adjust the optimization algorithm so that the model can learn well from only a few examples.
LSTM Meta-Learner
An LSTM-based meta-learner learns the exact optimization algorithm used to train another learner, a neural network classifier, in the few-shot regime. The parametrization allows it to learn appropriate parameter updates specifically for the scenario where a set number of updates will be made, while also learning a general initialization of the learner (classifier) network that allows for quick convergence of training.
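The reason an LSTM suits this role can be seen by comparing a gradient-descent step on the learner's parameters θ with the LSTM cell-state update. The notation here is assumed: learning rate α_t, loss ℒ_t, forget gate f_t, input gate i_t and candidate cell state c̃_t.

```latex
\theta_t = \theta_{t-1} - \alpha_t \, \nabla_{\theta_{t-1}} \mathcal{L}_t
\qquad \longleftrightarrow \qquad
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
```

The two updates coincide when c_{t-1} = θ_{t-1}, f_t = 1, i_t = α_t and c̃_t = -∇_{θ_{t-1}} ℒ_t. Instead of fixing these values, the meta-learner computes the gates from quantities such as the current loss and gradient, so i_t acts as a learned, adaptive learning rate and f_t can shrink the parameters. The initial cell state c_0 plays the role of the learned initialization of the classifier network.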
Temporal Discreteness
Model-Agnostic Meta-Learning (MAML) is a fairly general optimization algorithm, compatible with any model that learns through gradient descent.
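The sketch below shows MAML's two loops on the simplest possible model, a one-parameter linear regressor ŷ = w·x. For this model the inner gradient step and its derivative, the second-order term MAML differentiates through, can be written by hand. The task family (random slopes), the step sizes `alpha` and `beta`, and the data sizes are illustrative assumptions; practical implementations obtain these derivatives by automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.1, 0.1        # inner (adaptation) and outer (meta) step sizes

def sample_task(n=10):
    """Hypothetical task family: y = a * x with a random slope a."""
    a = rng.uniform(0.5, 1.5)
    xs, xq = rng.normal(size=n), rng.normal(size=n)   # support and query inputs
    return xs, a * xs, xq, a * xq

def grad(w, x, y):
    """Derivative with respect to w of the mean squared error of y_hat = w * x."""
    return 2.0 * np.mean(x * (w * x - y))

def adapt(w, xs, ys):
    """Inner loop: one gradient step on the task's support set."""
    return w - alpha * grad(w, xs, ys)

def meta_loss(w, task):
    """Query-set loss after adapting from initialization w."""
    xs, ys, xq, yq = task
    return np.mean((adapt(w, xs, ys) * xq - yq) ** 2)

def meta_grad(w, task):
    """Exact MAML gradient of meta_loss, including the second-order term d w'/d w."""
    xs, ys, xq, yq = task
    dadapt_dw = 1.0 - alpha * 2.0 * np.mean(xs ** 2)  # derivative of the inner step
    return grad(adapt(w, xs, ys), xq, yq) * dadapt_dw

w = 0.0                        # meta-learned initialization
for step in range(200):
    batch = [sample_task() for _ in range(10)]
    w -= beta * np.mean([meta_grad(w, t) for t in batch])
print(w)  # should settle near 1.0, the mean slope of this task family
```

Nothing in the outer loop depends on the model being linear, only on being able to differentiate through the inner gradient step. Dropping `dadapt_dw` gives the cheaper first-order approximation.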
Reptile
Reptile is a remarkably simple meta-learning optimization algorithm, given that both of its components rely on meta-optimization through gradient descent and both are model-agnostic.
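Reptile's outer update moves the initialization φ toward the parameters W obtained after several ordinary gradient steps on a sampled task, φ ← φ + ε(W − φ), with no second derivatives involved. A self-contained NumPy sketch on a hypothetical family of one-dimensional linear-regression tasks follows; the task family, `inner_lr`, `inner_steps` and `epsilon` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
inner_lr, inner_steps, epsilon = 0.05, 5, 0.1

def sample_task(n=20):
    """Hypothetical task: fit y = a * x + b with a random slope and offset."""
    a, b = rng.uniform(0.5, 1.5), rng.uniform(-1.0, 1.0)
    x = rng.uniform(-2.0, 2.0, size=n)
    return x, a * x + b

def sgd(params, x, y):
    """Inner loop: plain gradient descent on the mean squared error of w * x + c."""
    w, c = params
    for _ in range(inner_steps):
        err = w * x + c - y
        w, c = w - inner_lr * 2 * np.mean(err * x), c - inner_lr * 2 * np.mean(err)
    return np.array([w, c])

phi = np.zeros(2)                              # meta-learned initialization (w, c)
for _ in range(1000):
    x, y = sample_task()
    W = sgd(phi, x, y)                         # task-adapted parameters
    phi += epsilon * (W - phi)                 # Reptile update: move phi toward W
print(phi)  # expected to drift toward the centre of the task family, roughly (1, 0)
```

The inner loop treats the model as a black box trained by gradient descent, which is what makes the method model-agnostic.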
Examples
Some approaches which have been viewed as instances of meta-learning:
* Recurrent neural networks (RNNs) are universal computers. In 1993, Jürgen Schmidhuber showed how "self-referential" RNNs can in principle learn by backpropagation to run their own weight change algorithm, which may be quite different from backpropagation. In 2001, Sepp Hochreiter, A.S. Younger and P.R. Conwell built a successful supervised meta-learner based on long short-term memory (LSTM) RNNs. It learned through backpropagation a learning algorithm for quadratic functions that is much faster than backpropagation. Researchers at DeepMind (Marcin Andrychowicz et al.) extended this approach to optimization in 2017.
* In the 1990s, meta reinforcement learning (meta RL) was achieved in Schmidhuber's research group through self-modifying policies written in a universal programming language that contains special instructions for changing the policy itself. There is a single lifelong trial. The goal of the RL agent is to maximize reward. It learns to accelerate reward intake by continually improving its own learning algorithm, which is part of the "self-referential" policy.
* An extreme type of meta reinforcement learning is embodied by the Gödel machine, a theoretical construct that can inspect and modify any part of its own software, which also contains a general theorem prover. It can achieve recursive self-improvement in a provably optimal way.
* ''Model-Agnostic Meta-Learning'' (MAML) was introduced in 2017 by Chelsea Finn et al. Given a sequence of tasks, the parameters of a given model are trained such that a few iterations of gradient descent with little training data from a new task will lead to good generalization performance on that task. MAML "trains the model to be easy to fine-tune." MAML was successfully applied to few-shot image classification benchmarks and to policy-gradient-based reinforcement learning.
* ''Variational Bayes-Adaptive Deep RL'' (VariBAD) was introduced in 2019. While MAML is optimization-based, VariBAD is a model-based method for meta reinforcement learning, and leverages a variational autoencoder to capture the task information in an internal memory, thus conditioning its decision making on the task.
* When addressing a set of tasks, most meta-learning approaches optimize the average score across all tasks. Hence, certain tasks may be sacrificed in favor of the average score, which is often unacceptable in real-world applications. By contrast, ''Robust Meta Reinforcement Learning'' (RoML) focuses on improving low-score tasks, increasing robustness to the selection of tasks. RoML works as a meta-algorithm, as it can be applied on top of other meta-learning algorithms (such as MAML and VariBAD) to increase their robustness. It is applicable to both supervised meta-learning and meta reinforcement learning.
* ''Discovering meta-knowledge'' works by inducing knowledge (e.g. rules) that expresses how each learning method will perform on different learning problems. The metadata is formed by characteristics of the data (general, statistical, information-theoretic, ...) in the learning problem, and characteristics of the learning algorithm (type, parameter settings, performance measures, ...). Another learning algorithm then learns how the data characteristics relate to the algorithm characteristics. Given a new learning problem, the data characteristics are measured, and the performance of different learning algorithms is predicted. Hence, one can predict the algorithms best suited for the new problem.
* ''Stacked generalisation'' works by combining multiple (different) learning algorithms. The metadata is formed by the predictions of those different algorithms. Another learning algorithm learns from this metadata to predict which combinations of algorithms give generally good results. Given a new learning problem, the predictions of the selected set of algorithms are combined (e.g. by (weighted) voting) to provide the final prediction. Since each algorithm is deemed to work on a subset of problems, a combination is hoped to be more flexible and able to make good predictions.
* ''Boosting'' is related to stacked generalisation, but uses the same algorithm multiple times, where the examples in the training data get different weights over each run. This yields different predictions, each focused on correctly predicting a subset of the data, and combining those predictions leads to better (but more expensive) results.
* ''Dynamic bias selection'' works by altering the inductive bias of a learning algorithm to match the given problem. This is done by altering key aspects of the learning algorithm, such as the hypothesis representation, heuristic formulae, or parameters. Many different approaches exist.
* ''Inductive transfer'' studies how the learning process can be improved over time. Metadata consists of knowledge about previous learning episodes and is used to efficiently develop an effective hypothesis for a new task. A related approach is called learning to learn, in which the goal is to use acquired knowledge from one domain to help learning in other domains.
* Other approaches using metadata to improve automatic learning are learning classifier systems, case-based reasoning and constraint satisfaction.
* Some initial theoretical work has explored using ''Applied Behavioral Analysis'' as a foundation for agent-mediated meta-learning about the performance of human learners, and for adjusting the instructional course of an artificial agent.
* AutoML, such as Google Brain's "AI building AI" project, which according to Google briefly exceeded existing ImageNet benchmarks in 2017.
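The ''discovering meta-knowledge'' approach from the list above can be sketched as algorithm recommendation in meta-feature space. Each past learning problem is described by a few data characteristics, here the number of examples and features (general) and the class entropy (information-theoretic), and paired with the algorithm that performed best on it. For a new problem, the algorithm of the most similar past problem is recommended. This is a simplification in which the meta-learner predicts the best algorithm directly rather than each algorithm's performance; the meta-dataset, the algorithm names and the choice of a 1-nearest-neighbor meta-learner are hypothetical.

```python
import math
from collections import Counter

def meta_features(X, y):
    """Data characteristics: log10 #examples, log10 #features, class entropy in bits."""
    n, d = len(X), len(X[0])
    counts = Counter(y)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return (math.log10(n), math.log10(d), entropy)

# Hypothetical meta-dataset: meta-features of past problems -> best algorithm observed.
meta_db = [
    ((2.0, 0.5, 1.00), "naive_bayes"),
    ((4.0, 1.5, 0.90), "gradient_boosting"),
    ((3.0, 3.0, 0.50), "linear_svm"),
]

def recommend(features, db):
    """1-nearest-neighbor meta-learner over meta-features (Euclidean distance)."""
    return min(db, key=lambda entry: math.dist(features, entry[0]))[1]

# A new problem: 100 examples, 3 features, balanced binary labels.
X_new = [[0.0, 0.0, 0.0]] * 100
y_new = [0, 1] * 50
f = meta_features(X_new, y_new)   # (2.0, ~0.477, 1.0)
print(recommend(f, meta_db))  # naive_bayes
```

In practice the meta-features would be normalized so that no single characteristic dominates the distance, and the meta-dataset would come from recorded experiments rather than hand-written entries.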
External links
* Metalearning article in Scholarpedia
* Video courses about Meta-Learning with step-by-step explanation of MAML, Prototypical Networks and Relation Networks