An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning). The encoding is validated and refined by attempting to regenerate the input from the encoding. The autoencoder learns a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore insignificant data ("noise").
Variants exist, aiming to force the learned representations to assume useful properties. Examples are regularized autoencoders (''Sparse'', ''Denoising'' and ''Contractive''), which are effective in learning representations for subsequent classification tasks, and ''Variational'' autoencoders, with applications as generative models.
Autoencoders are applied to many problems, including facial recognition, feature detection, anomaly detection, and acquiring the meaning of words. Autoencoders are also generative models which can randomly generate new data that is similar to the input data (training data).
Mathematical principles
Definition
An autoencoder is defined by the following components:
Two sets: the space of decoded messages $\mathcal{X}$; the space of encoded messages $\mathcal{Z}$. Almost always, both $\mathcal{X}$ and $\mathcal{Z}$ are Euclidean spaces, that is, $\mathcal{X} = \mathbb{R}^m$ and $\mathcal{Z} = \mathbb{R}^n$ for some $m, n$.
Two parametrized families of functions: the encoder family $E_\phi : \mathcal{X} \to \mathcal{Z}$, parametrized by $\phi$; the decoder family $D_\theta : \mathcal{Z} \to \mathcal{X}$, parametrized by $\theta$.
For any $x \in \mathcal{X}$, we usually write $z = E_\phi(x)$, and refer to it as the code, the latent variable, latent representation, latent vector, etc. Conversely, for any $z \in \mathcal{Z}$, we usually write $x' = D_\theta(z)$, and refer to it as the (decoded) message.
Usually, both the encoder and the decoder are defined as multilayer perceptrons (MLPs). For example, a one-layer-MLP encoder $E_\phi$ is:

: $E_\phi(x) = \sigma(Wx + b)$

where $\sigma$ is an element-wise activation function such as a sigmoid function or a rectified linear unit, $W$ is a matrix called "weight", and $b$ is a vector called "bias".
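As a concrete illustration, the following is a minimal NumPy sketch of such a one-layer encoder together with a matching one-layer decoder. The layer sizes, the choice of sigmoid as $\sigma$, the affine decoder, and all variable names are assumptions made for the example, not part of the definition above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    # Element-wise activation function (one possible choice of sigma)
    return 1.0 / (1.0 + np.exp(-x))

# Assumed sizes for illustration: messages in R^8, codes in R^3
m, n = 8, 3

# Encoder parameters phi = (W, b): "weight" matrix and "bias" vector
W = rng.normal(size=(n, m))
b = np.zeros(n)

# Decoder parameters theta = (W_dec, b_dec); here the decoder is kept
# affine, though an output activation could also be applied
W_dec = rng.normal(size=(m, n))
b_dec = np.zeros(m)

def encode(x):
    # One-layer MLP encoder: E_phi(x) = sigma(W x + b)
    return sigmoid(W @ x + b)

def decode(z):
    # One-layer decoder: D_theta(z) = W_dec z + b_dec
    return W_dec @ z + b_dec

x = rng.normal(size=m)   # a message in X = R^m
z = encode(x)            # its code (latent vector) in Z = R^n
x_prime = decode(z)      # the decoded message in X
```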
Training an autoencoder
An autoencoder, by itself, is simply a tuple of two functions. To judge its ''quality'', we need a ''task''. A task is defined by a reference probability distribution $\mu_{\mathrm{ref}}$ over $\mathcal{X}$, and a "reconstruction quality" function $d : \mathcal{X} \times \mathcal{X} \to [0, \infty]$, such that $d(x, x')$ measures how much $x'$ differs from $x$.