Energy-based model

An energy-based model (EBM) is a form of generative model (GM) imported directly from statistical physics to learning. GMs learn an underlying data distribution by analyzing a sample dataset. Once trained, a GM can produce other datasets that also match the data distribution. EBMs provide a unified framework for many probabilistic and non-probabilistic approaches to such learning, particularly for training graphical and other structured models. An EBM learns the characteristics of a target dataset and generates a similar but larger dataset. EBMs detect the latent variables of a dataset and generate new datasets with a similar distribution. Target applications include natural language processing, robotics and computer vision.


History

Early work on EBMs proposed models that represented energy as a composition of latent and observable variables. EBMs surfaced in 2003.


Approach

EBMs capture dependencies by associating an unnormalized probability scalar (''energy'') with each configuration of the observed and latent variables. Inference consists of finding (values of) the latent variables that minimize the energy given (values of) the observed variables. Correspondingly, the model learns an energy function that assigns low energies to correct values of the latent variables and higher energies to incorrect values.
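In symbols (a generic formulation, not tied to any one reference): with observed variables x, latent variables z, and a learned energy function E_\theta, inference selects

```latex
\hat{z} \;=\; \operatorname*{arg\,min}_{z}\, E_\theta(x, z)
```

while learning shapes E_\theta so that this minimum falls on the correct configuration.
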
Traditional EBMs rely on stochastic gradient descent (SGD) optimization methods that are typically hard to apply to high-dimensional datasets. In 2019, OpenAI publicized a variant that instead uses Langevin dynamics (LD). LD is an iterative optimization algorithm that introduces noise into the estimator as part of learning an objective function. It can be used for Bayesian learning scenarios by producing samples from a posterior distribution.

EBMs do not require that energies be normalized as probabilities; the energies do not need to sum to 1.
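Concretely, under the conventional Boltzmann–Gibbs construction (the standard way an energy is turned into a probability; the notation here is illustrative), the density and its normalizing constant are

```latex
p_\theta(x) \;=\; \frac{\exp\!\big(-E_\theta(x)\big)}{Z(\theta)},
\qquad
Z(\theta) \;=\; \int \exp\!\big(-E_\theta(x)\big)\, dx
```

and the integral Z(\theta) is generally intractable for high-dimensional x.
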
Since there is no need to estimate the normalization constant, as probabilistic models must, certain forms of inference and learning with EBMs are more tractable and flexible. Samples are generated implicitly via a Markov chain Monte Carlo (MCMC) approach. A replay buffer of past images is used with LD to initialize the optimization module.
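The following is a minimal sketch of that sampling loop, assuming a PyTorch module energy_net that maps a batch of images to scalar energies; the function names, step size, noise scale, and buffer details are illustrative rather than taken from any particular implementation.

```python
import torch

def langevin_sample(energy_net, x_init, n_steps=60, step_size=10.0, noise_scale=0.005):
    """Draw approximate samples from p(x) ∝ exp(-E(x)) by noisy gradient descent on E."""
    x = x_init.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        energy = energy_net(x).sum()               # total energy of the batch (scalar)
        grad, = torch.autograd.grad(energy, x)     # dE/dx: gradient w.r.t. the images, not the weights
        # Langevin update: step down the energy surface and inject Gaussian noise.
        x = x - step_size * grad + noise_scale * torch.randn_like(x)
        x = x.clamp(0.0, 1.0).detach().requires_grad_(True)  # keep pixels in [0, 1]
    return x.detach()

def init_from_buffer(buffer, batch_size, shape, p_fresh=0.05):
    """Replay buffer: start most chains from past samples, a small fraction from fresh noise."""
    fresh = torch.rand(batch_size, *shape)         # uniform noise in [0, 1]
    if not buffer:
        return fresh
    idx = torch.randint(len(buffer), (batch_size,))
    stored = torch.stack([buffer[int(i)] for i in idx])
    restart = (torch.rand(batch_size) < p_fresh).view(-1, *([1] * len(shape)))
    return torch.where(restart, fresh, stored)
```

In a full training loop, images produced this way serve as the "negative" samples whose energy is pushed up, while the energy of real training images is pushed down; finished samples are written back into the buffer for later chains to reuse.
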


Characteristics

EBMs demonstrate useful properties:

* Simplicity and stability – The EBM is the only object that needs to be designed and trained. Separate networks need not be trained to ensure balance.
* Adaptive computation time – An EBM can generate sharp, diverse samples or (more quickly) coarse, less diverse samples. Given infinite time, this procedure produces true samples.
* Flexibility – In variational autoencoders (VAEs) and flow-based models, the generator learns a map from a continuous space to a (possibly) discontinuous space containing different data modes. EBMs can learn to assign low energies to disjoint regions (multiple modes).
* Adaptive generation – EBM generators are implicitly defined by the probability distribution, and automatically adapt as the distribution changes (without retraining), allowing EBMs to address domains where generator training is impractical, as well as minimizing mode collapse and avoiding spurious modes from out-of-distribution samples.
* Compositionality – Individual models are unnormalized probability distributions, allowing models to be combined through product of experts or other hierarchical techniques (see the identity after this list).
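As an illustration of the compositionality point: because each expert contributes an unnormalized energy, a product of experts corresponds to simply summing the energies (a standard identity, not tied to any particular architecture):

```latex
p(x) \;\propto\; \prod_{i} \exp\!\big(-E_i(x)\big) \;=\; \exp\!\Big(-\sum_{i} E_i(x)\Big)
```

Sampling from the combined model can then reuse the same MCMC procedure described above, applied to the summed energy.
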


Experimental results

On image datasets such as CIFAR-10 and ImageNet 32x32, an EBM generated high-quality images relatively quickly. It supported combining features learned from one type of image to generate other types of images. It was able to generalize to out-of-distribution datasets, outperforming flow-based and autoregressive models. The EBM was relatively resistant to adversarial perturbations, behaving better than classification models explicitly trained against such perturbations.


Alternatives

EBMs compete with techniques such as variational autoencoders (VAEs) and generative adversarial networks (GANs).



External links

* Salakhutdinov, Ruslan; Hinton, Geoffrey (2009). "Deep Boltzmann Machines". Artificial Intelligence and Statistics: 448–455. http://proceedings.mlr.press/v5/salakhutdinov09a.html