
A restricted Boltzmann machine (RBM) (also called a restricted Sherrington–Kirkpatrick model with external field or restricted stochastic Ising–Lenz–Little model) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs.
RBMs were initially proposed under the name Harmonium by Paul Smolensky in 1986, and rose to prominence after Geoffrey Hinton and collaborators used fast learning algorithms for them in the mid-2000s. RBMs have found applications in dimensionality reduction, classification, collaborative filtering, feature learning, topic modelling,[Ruslan Salakhutdinov and Geoffrey Hinton (2010). Replicated softmax: an undirected topic model. ''Neural Information Processing Systems'' 23.] immunology, and even many-body quantum mechanics. They can be trained in either supervised or unsupervised ways, depending on the task.
As their name implies, RBMs are a variant of Boltzmann machines, with the restriction that their neurons must form a bipartite graph:
*a pair of nodes from each of the two groups of units (commonly referred to as the "visible" and "hidden" units respectively) may have a symmetric connection between them; and
*there are no connections between nodes within a group.
By contrast, "unrestricted" Boltzmann machines may have connections between hidden units. This restriction allows for more efficient training algorithms than are available for the general class of Boltzmann machines, in particular the gradient-based contrastive divergence algorithm.[Miguel Á. Carreira-Perpiñán and Geoffrey Hinton (2005). On contrastive divergence learning. ''Artificial Intelligence and Statistics''.]
Restricted Boltzmann machines can also be used in deep learning networks. In particular, deep belief networks can be formed by "stacking" RBMs and optionally fine-tuning the resulting deep network with gradient descent and backpropagation.
Structure
The standard type of RBM has binary-valued (Boolean) hidden and visible units, and consists of a matrix of weights <math>W</math> of size <math>m\times n</math>. Each weight element <math>w_{i,j}</math> of the matrix is associated with the connection between the visible (input) unit <math>v_i</math> and the hidden unit <math>h_j</math>. In addition, there are bias weights (offsets) <math>a_i</math> for <math>v_i</math> and <math>b_j</math> for <math>h_j</math>. Given the weights and biases, the ''energy'' of a configuration (pair of Boolean vectors) <math>(v,h)</math> is defined as
:<math>E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i \sum_j v_i w_{i,j} h_j,</math>
or, in matrix notation,
:<math>E(v,h) = -a^{\mathsf{T}} v - b^{\mathsf{T}} h - v^{\mathsf{T}} W h.</math>
This energy function is analogous to that of a Hopfield network.
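For concreteness, a minimal NumPy sketch of this energy function (illustrative only; the array names simply mirror the notation above, and the example parameters are arbitrary):
<syntaxhighlight lang="python">
import numpy as np

def energy(v, h, W, a, b):
    """Energy E(v, h) = -a^T v - b^T h - v^T W h of one binary configuration."""
    return -a @ v - b @ h - v @ W @ h

# Toy example: m = 3 visible and n = 2 hidden units with arbitrary parameters.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 2))   # weight matrix, shape (m, n)
a = np.zeros(3)                          # visible biases
b = np.zeros(2)                          # hidden biases
v = np.array([1.0, 0.0, 1.0])            # a visible configuration
h = np.array([0.0, 1.0])                 # a hidden configuration
print(energy(v, h, W, a, b))
</syntaxhighlight>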
As with general Boltzmann machines, the joint probability distribution for the visible and hidden vectors is defined in terms of the energy function as follows,[Geoffrey Hinton (2010). ''A Practical Guide to Training Restricted Boltzmann Machines''. UTML TR 2010–003, University of Toronto.]
:<math>P(v,h) = \frac{1}{Z} e^{-E(v,h)},</math>
where <math>Z</math> is a partition function defined as the sum of <math>e^{-E(v,h)}</math> over all possible configurations, which can be interpreted as a normalizing constant to ensure that the probabilities sum to 1. The marginal probability of a visible vector is the sum of <math>P(v,h)</math> over all possible hidden layer configurations,
:<math>P(v) = \frac{1}{Z} \sum_{\{h\}} e^{-E(v,h)},</math>
and vice versa. Since the underlying graph structure of the RBM is
bipartite (meaning there are no intra-layer connections), the hidden unit activations are
mutually independent given the visible unit activations. Conversely, the visible unit activations are mutually independent given the hidden unit activations.
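To make the role of the partition function concrete, the following brute-force sketch enumerates every configuration of a tiny RBM to compute <math>Z</math> and the marginal <math>P(v)</math>. This is only feasible for a handful of units, since the number of configurations grows exponentially, which is exactly why approximate training methods such as contrastive divergence are used. The helper name <code>marginal_visible</code> is an illustrative choice, not standard terminology.
<syntaxhighlight lang="python">
import itertools
import numpy as np

def energy(v, h, W, a, b):
    return -a @ v - b @ h - v @ W @ h

def marginal_visible(v, W, a, b):
    """Brute-force P(v): sum exp(-E(v, h)) over all hidden configurations,
    normalized by the partition function Z (sum over all (v, h) pairs)."""
    m, n = W.shape
    hiddens = [np.array(h, dtype=float) for h in itertools.product([0, 1], repeat=n)]
    visibles = [np.array(x, dtype=float) for x in itertools.product([0, 1], repeat=m)]
    numerator = sum(np.exp(-energy(v, h, W, a, b)) for h in hiddens)
    Z = sum(np.exp(-energy(x, h, W, a, b)) for x in visibles for h in hiddens)
    return numerator / Z

rng = np.random.default_rng(0)
W, a, b = rng.normal(scale=0.1, size=(3, 2)), np.zeros(3), np.zeros(2)
print(marginal_visible(np.array([1.0, 0.0, 1.0]), W, a, b))
</syntaxhighlight>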
That is, for ''m'' visible units and ''n'' hidden units, the conditional probability of a configuration of the visible units <math>v</math>, given a configuration of the hidden units <math>h</math>, is
:<math>P(v|h) = \prod_{i=1}^m P(v_i|h).</math>
Conversely, the conditional probability of <math>h</math> given <math>v</math> is
:<math>P(h|v) = \prod_{j=1}^n P(h_j|v).</math>
The individual activation probabilities are given by
:<math>P(h_j = 1|v) = \sigma\left(b_j + \sum_{i=1}^m w_{i,j} v_i\right)</math> and <math>P(v_i = 1|h) = \sigma\left(a_i + \sum_{j=1}^n w_{i,j} h_j\right),</math>
where <math>\sigma</math> denotes the logistic sigmoid.
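Because these conditionals factorize, each layer can be sampled in one vectorized step. A minimal sketch of one Gibbs sampling step using the activation probabilities above (toy parameters, illustrative function names):
<syntaxhighlight lang="python">
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b, rng):
    """P(h_j = 1 | v) = sigma(b_j + sum_i w_ij v_i); independent across j."""
    p_h = sigmoid(b + v @ W)
    return p_h, (rng.random(p_h.shape) < p_h).astype(float)

def sample_visible(h, W, a, rng):
    """P(v_i = 1 | h) = sigma(a_i + sum_j w_ij h_j); independent across i."""
    p_v = sigmoid(a + W @ h)
    return p_v, (rng.random(p_v.shape) < p_v).astype(float)

rng = np.random.default_rng(0)
W, a, b = rng.normal(scale=0.1, size=(3, 2)), np.zeros(3), np.zeros(2)
v = np.array([1.0, 0.0, 1.0])
p_h, h = sample_hidden(v, W, b, rng)       # sample the hidden layer given v
p_v, v_new = sample_visible(h, W, a, rng)  # then resample the visible layer given h
</syntaxhighlight>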
The visible units of a restricted Boltzmann machine can be multinomial, although the hidden units are Bernoulli. In this case, the logistic function for visible units is replaced by the softmax function
:<math>P(v_i^k = 1|h) = \frac{\exp\left(a_i^k + \sum_j W_{ij}^k h_j\right)}{\sum_{k'=1}^K \exp\left(a_i^{k'} + \sum_j W_{ij}^{k'} h_j\right)},</math>
where ''K'' is the number of discrete values that the visible values have. They are applied in topic modeling and recommender systems.
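As a sketch of the multinomial case, the conditional distribution of a single visible unit over its ''K'' discrete values can be computed with a softmax; the parameter layout below (one weight row and one bias per value ''k'') is an illustrative assumption about how the parameters are stored, not a prescription:
<syntaxhighlight lang="python">
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def visible_softmax_probs(h, W_i, a_i):
    """P(v_i = k | h) for one multinomial visible unit with K possible values.
    W_i has shape (K, n): one weight vector per value k; a_i has shape (K,)."""
    return softmax(a_i + W_i @ h)

rng = np.random.default_rng(0)
K, n = 4, 2                                 # 4 discrete values, 2 hidden units
W_i = rng.normal(scale=0.1, size=(K, n))
a_i = np.zeros(K)
h = np.array([1.0, 0.0])
print(visible_softmax_probs(h, W_i, a_i))   # probabilities over the K values, summing to 1
</syntaxhighlight>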
Relation to other models
Restricted Boltzmann machines are a special case of Boltzmann machines and Markov random fields.[Asja Fischer and Christian Igel. Training Restricted Boltzmann Machines: An Introduction. ''Pattern Recognition'' 47, pp. 25–39, 2014.] The graphical model of RBMs corresponds to that of factor analysis.
Training algorithm
Restricted Boltzmann machines are trained to maximize the product of probabilities assigned to some training set <math>V</math> (a matrix, each row of which is treated as a visible vector <math>v</math>),
:<math>\arg\max_W \prod_{v \in V} P(v),</math>
or equivalently, to maximize the expected log probability of a training sample <math>v</math> selected randomly from <math>V</math>:
:<math>\arg\max_W \mathbb{E}\left[\log P(v)\right].</math>
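In practice this objective is optimized approximately, most commonly with the contrastive divergence algorithm cited above. A minimal sketch of one CD-1 mini-batch update, assuming binary units; the function name, learning rate, and batch size are illustrative assumptions rather than values from this article:
<syntaxhighlight lang="python">
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(V, W, a, b, rng, lr=0.1):
    """One contrastive-divergence (CD-1) step on a mini-batch V (shape: batch x m).
    The log-likelihood gradient is approximated by the difference between
    data-driven statistics and statistics of a one-step reconstruction."""
    # Positive phase: hidden probabilities and samples given the data.
    p_h = sigmoid(b + V @ W)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # Negative phase: reconstruct the visible layer, then recompute hidden probabilities.
    p_v = sigmoid(a + h @ W.T)
    p_h_recon = sigmoid(b + p_v @ W)
    # Parameter updates from the difference between the two phases.
    batch = V.shape[0]
    W += lr * (V.T @ p_h - p_v.T @ p_h_recon) / batch
    a += lr * (V - p_v).mean(axis=0)
    b += lr * (p_h - p_h_recon).mean(axis=0)
    return W, a, b

rng = np.random.default_rng(0)
W, a, b = rng.normal(scale=0.01, size=(3, 2)), np.zeros(3), np.zeros(2)
V = rng.integers(0, 2, size=(8, 3)).astype(float)   # toy mini-batch of visible vectors
W, a, b = cd1_update(V, W, a, b, rng)
</syntaxhighlight>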