A flow-based generative model is a generative model used in machine learning that explicitly models a probability distribution by leveraging a normalizing flow, a statistical method that uses the change-of-variables law of probabilities to transform a simple distribution into a complex one.

The direct modeling of likelihood provides many advantages. For example, the negative log-likelihood can be directly computed and minimized as the loss function. Additionally, novel samples can be generated by sampling from the initial distribution and applying the flow transformation. In contrast, many alternative generative modeling methods, such as the variational autoencoder (VAE) and the generative adversarial network (GAN), do not explicitly represent the likelihood function.
Method

Let $z_0$ be a (possibly multivariate) random variable with distribution $p_0(z_0)$.

For $i = 1, \ldots, K$, let $z_i = f_i(z_{i-1})$ be a sequence of random variables transformed from $z_0$. The functions $f_1, \ldots, f_K$ should be invertible, i.e. the inverse functions $f_i^{-1}$ exist. The final output $z_K$ models the target distribution.
The log likelihood of $z_K$ is (see the derivation below):

: $\log p_K(z_K) = \log p_0(z_0) - \sum_{i=1}^{K} \log \left| \det \frac{df_i(z_{i-1})}{dz_{i-1}} \right|$
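A direct transcription of this formula is sketched below (an illustration only: it assumes each transform exposes its inverse and the log-determinant of its Jacobian, and all names are hypothetical):

```python
import math

def flow_log_likelihood(x, flows, log_p0):
    """Evaluate log p_K(x) by inverting the chain f_K ∘ ... ∘ f_1.

    `flows` is a list of (inverse, log_det_jacobian) pairs, where
    log_det_jacobian(z) = log |det df_i(z)/dz| at the layer's *input* z.
    """
    z, total = x, 0.0
    for inverse, log_det_jacobian in reversed(flows):
        z = inverse(z)                # z_{i-1} = f_i^{-1}(z_i)
        total += log_det_jacobian(z)  # log |det df_i(z_{i-1})/dz_{i-1}|
    return log_p0(z) - total          # log p_0(z_0) minus the log-det terms

# Example: a single flow f_1(z) = exp(z) turns a standard normal
# base distribution into a log-normal distribution.
flows = [(math.log, lambda z: z)]  # d exp(z)/dz = exp(z), so its log is z
log_p0 = lambda z: -0.5 * math.log(2 * math.pi) - 0.5 * z * z
print(flow_log_likelihood(2.0, flows, log_p0))  # log-normal log-density at 2
```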
To efficiently compute the log likelihood, the functions $f_1, \ldots, f_K$ should be easily invertible, and the determinants of their Jacobians should be simple to compute. In practice, the functions $f_1, \ldots, f_K$ are modeled using deep neural networks, and are trained to minimize the negative log-likelihood of data samples from the target distribution. These architectures are usually designed such that only the forward pass of the neural network is required in both the inverse and the Jacobian determinant calculations. Examples of such architectures include NICE, RealNVP, and Glow.
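As an illustration of this design, the following is a minimal sketch of one affine coupling layer in the style of RealNVP (assuming PyTorch; the scale-and-translation network is reduced to a small multilayer perceptron, and all names are illustrative rather than taken from any particular library):

```python
import torch
from torch import nn

class AffineCoupling(nn.Module):
    """One affine coupling layer in the style of RealNVP (illustrative)."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.d = dim // 2
        # A small network producing log-scale s and translation t
        # from the first half of the input.
        self.net = nn.Sequential(
            nn.Linear(self.d, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.d)),
        )

    def forward(self, z):
        # The first half passes through unchanged; the second half is
        # transformed elementwise, so the Jacobian is triangular.
        z1, z2 = z[:, :self.d], z[:, self.d:]
        s, t = self.net(z1).chunk(2, dim=1)
        x2 = z2 * torch.exp(s) + t
        log_det = s.sum(dim=1)  # log |det| = sum of the log-scales
        return torch.cat([z1, x2], dim=1), log_det

    def inverse(self, x):
        # Inversion reuses the same forward pass of self.net.
        x1, x2 = x[:, :self.d], x[:, self.d:]
        s, t = self.net(x1).chunk(2, dim=1)
        z2 = (x2 - t) * torch.exp(-s)
        return torch.cat([x1, z2], dim=1), -s.sum(dim=1)
```

When such layers are stacked, the log-determinant terms are simply summed, matching the formula above; alternating which half is left unchanged ensures that every coordinate is eventually transformed.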
Derivation of log likelihood
Consider $z_0$ and $z_1 = f_1(z_0)$. Note that $z_0 = f_1^{-1}(z_1)$.

By the change of variable formula, the distribution of $z_1$ is:

: $p_1(z_1) = p_0(z_0) \left| \det \frac{df_1^{-1}(z_1)}{dz_1} \right|$
where $\det \frac{df_1^{-1}(z_1)}{dz_1}$ is the determinant of the Jacobian matrix of $f_1^{-1}$.
By the inverse function theorem:

: $p_1(z_1) = p_0(z_0) \left| \det \left( \frac{df_1(z_0)}{dz_0} \right)^{-1} \right|$
By the identity $\det(A^{-1}) = \det(A)^{-1}$ (where $A$ is an invertible matrix), we have:

: $p_1(z_1) = p_0(z_0) \left| \det \frac{df_1(z_0)}{dz_0} \right|^{-1}$
The log likelihood is thus:

: $\log p_1(z_1) = \log p_0(z_0) - \log \left| \det \frac{df_1(z_0)}{dz_0} \right|$
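For example, if $f_1$ is the scalar affine map $f_1(z_0) = a z_0 + b$ with $a \neq 0$, then $\frac{df_1(z_0)}{dz_0} = a$ and

: $\log p_1(z_1) = \log p_0\!\left( \frac{z_1 - b}{a} \right) - \log |a|$

so a standard normal base distribution $p_0$ is pushed forward to a normal distribution with mean $b$ and standard deviation $|a|$.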
In general, the above applies to any $z_i$ and $z_{i-1}$. Since $\log p_i(z_i)$ equals $\log p_{i-1}(z_{i-1})$ minus a non-recursive term, we can infer by induction that:

: $\log p_K(z_K) = \log p_0(z_0) - \sum_{i=1}^{K} \log \left| \det \frac{df_i(z_{i-1})}{dz_{i-1}} \right|$
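This recursion can be checked numerically. The following minimal sketch (an illustration, not part of the original text) composes two scalar affine maps, accumulates the log-likelihood with the formula above, and compares the result against the closed-form density of the transformed distribution; it assumes NumPy and a standard normal base distribution:

```python
import numpy as np

def log_normal_pdf(x, mean=0.0, std=1.0):
    """Log-density of a univariate normal distribution."""
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2

# Two invertible scalar maps f_i(z) = a_i * z + b_i, with df_i/dz = a_i.
flows = [(2.0, 1.0), (0.5, -3.0)]  # pairs (a_i, b_i)

z = 0.7                    # a fixed sample z_0 from the base distribution
log_p = log_normal_pdf(z)  # log p_0(z_0) for a standard normal base

for a, b in flows:
    z = a * z + b            # z_i = f_i(z_{i-1})
    log_p -= np.log(abs(a))  # subtract log |det df_i/dz|

# Closed form: this composition maps N(0, 1) to N(-2.5, 1).
assert np.isclose(log_p, log_normal_pdf(z, mean=-2.5, std=1.0))
```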
Training method
As is generally done when training a deep learning model, the goal with normalizing flows is to minimize the Kullback–Leibler divergence between the model's likelihood and the target distribution to be estimated. Denoting $p_{\theta}$ the model's likelihood and $p^{*}$ the target distribution to learn, the (forward) KL-divergence is:

: $D_{\text{KL}}[p^{*}(x) \parallel p_{\theta}(x)] = -\mathbb{E}_{p^{*}(x)}[\log p_{\theta}(x)] + \mathbb{E}_{p^{*}(x)}[\log p^{*}(x)]$
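The second term is the entropy of the target distribution and does not depend on the model parameters $\theta$, so minimizing the forward KL-divergence amounts to maximizing $\mathbb{E}_{p^{*}(x)}[\log p_{\theta}(x)]$, which in practice is approximated by the mean log-likelihood over training samples. A minimal training-loop sketch under these assumptions (PyTorch; the flow is reduced to a single learnable elementwise affine map, the data are synthetic, and all names are illustrative):

```python
import torch
from torch import nn

class AffineFlow(nn.Module):
    """A single learnable invertible map f(z) = z * exp(s) + t, elementwise."""
    def __init__(self, dim):
        super().__init__()
        self.s = nn.Parameter(torch.zeros(dim))  # log-scale
        self.t = nn.Parameter(torch.zeros(dim))  # translation

    def inverse(self, x):
        z0 = (x - self.t) * torch.exp(-self.s)
        # log |det d f^{-1}/dx| = -sum(s), identical for every sample
        log_det = -self.s.sum().expand(x.shape[0])
        return z0, log_det

base = torch.distributions.Normal(torch.zeros(2), torch.ones(2))
model = AffineFlow(dim=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-1)

# Toy target p*: a normal distribution with mean 1 and standard deviation 2.
data = torch.randn(512, 2) * 2.0 + 1.0

for step in range(200):
    optimizer.zero_grad()
    z0, log_det = model.inverse(data)
    log_px = base.log_prob(z0).sum(dim=1) + log_det  # log p_theta(x)
    loss = -log_px.mean()  # empirical estimate of -E_{p*}[log p_theta(x)]
    loss.backward()
    optimizer.step()

print(model.s.exp(), model.t)  # should approach scale 2 and shift 1
```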