
An autoencoder is a type of
artificial neural network
In machine learning, a neural network (also artificial neural network or neural net, abbreviated ANN or NN) is a computational model inspired by the structure and functions of biological neural networks.
A neural network consists of connected ...
used to learn
efficient codings of unlabeled data (
unsupervised learning
Unsupervised learning is a framework in machine learning where, in contrast to supervised learning, algorithms learn patterns exclusively from unlabeled data. Other frameworks in the spectrum of supervisions include weak- or semi-supervision, wh ...
). An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. The autoencoder learns an efficient representation (encoding) for a set of data, typically for
dimensionality reduction
Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally ...
, to generate lower-dimensional embeddings for subsequent use by other
machine learning
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
algorithms.
Variants exist which aim to make the learned representations assume useful properties.
Examples are regularized autoencoders (''sparse'', ''denoising'' and ''contractive'' autoencoders), which are effective in learning representations for subsequent
classification
Classification is the activity of assigning objects to some pre-existing classes or categories. This is distinct from the task of establishing the classes themselves (for example through cluster analysis). Examples include diagnostic tests, identif ...
tasks,
and
''variational'' autoencoders, which can be used as
generative model
In statistical classification, two main approaches are called the generative approach and the discriminative approach. These compute classifiers by different approaches, differing in the degree of statistical modelling. Terminology is inconsiste ...
s.
Autoencoders are applied to many problems, including
facial recognition,
feature detection,
anomaly detection
In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty detection) is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of ...
, and
learning the meaning of words. In terms of
data synthesis, autoencoders can also be used to randomly generate new data that is similar to the input (training) data.
Mathematical principles
Definition
An autoencoder is defined by the following components:
Two sets: the space of decoded messages ; the space of encoded messages . Typically and are Euclidean space
Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, in Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are ''Euclidean spaces ...
s, that is, with
Two parametrized families of functions: the encoder family , parametrized by ; the decoder family , parametrized by .
For any
, we usually write
, and refer to it as the code, the
latent variable
In statistics, latent variables (from Latin: present participle of ) are variables that can only be inferred indirectly through a mathematical model from other observable variables that can be directly observed or measured. Such '' latent va ...
, latent representation, latent vector, etc. Conversely, for any
, we usually write
, and refer to it as the (decoded) message.
Usually, both the encoder and the decoder are defined as
multilayer perceptron
In deep learning, a multilayer perceptron (MLP) is a name for a modern feedforward neural network consisting of fully connected neurons with nonlinear activation functions, organized in layers, notable for being able to distinguish data that is ...
s (MLPs). For example, a one-layer-MLP encoder
is:
:
where
is an element-wise
activation function
The activation function of a node in an artificial neural network is a function that calculates the output of the node based on its individual inputs and their weights. Nontrivial problems can be solved using only a few nodes if the activation f ...
,
is a "weight" matrix, and
is a "bias" vector.
Training an autoencoder
An autoencoder, by itself, is simply a tuple of two functions. To judge its ''quality'', we need a ''task''. A task is defined by a reference probability distribution
over
, and a "reconstruction quality" function