In statistical classification, two main approaches are called the generative approach and the discriminative approach. These compute classifiers by different approaches, differing in the degree of statistical modelling. Terminology is inconsistent, but three major types can be distinguished:
# A generative model is a statistical model of the joint probability distribution <math>P(X, Y)</math> on a given observable variable ''X'' and target variable ''Y'';
["Generative classifiers learn a model of the joint probability, <math>p(x, y)</math>, of the inputs ''x'' and the label ''y'', and make their predictions by using Bayes rules to calculate <math>p(y \mid x)</math>, and then picking the most likely label ''y''."]
# A discriminative model is a model of the conditional probability <math>P(Y \mid X = x)</math> of the target ''Y'', given an observation ''x''; and
# Classifiers computed without using a probability model are also referred to loosely as "discriminative".
The distinction between these last two classes is not consistently made. One treatment refers to these three classes as ''generative learning'', ''conditional learning'', and ''discriminative learning'', while another distinguishes only two classes, calling them generative classifiers (joint distribution) and discriminative classifiers (conditional distribution or no distribution), without distinguishing between the latter two classes. Analogously, a classifier based on a generative model is a generative classifier, while a classifier based on a discriminative model is a discriminative classifier, though this term also refers to classifiers that are not based on a model.
Standard examples of each, all of which are linear classifiers, are:
* generative classifiers:
** naive Bayes classifier and
** linear discriminant analysis
* discriminative model:
** logistic regression
In application to classification, one wishes to go from an observation ''x'' to a label ''y'' (or probability distribution on labels). One can compute this directly, without using a probability distribution (''distribution-free classifier''); one can estimate the probability of a label given an observation, <math>P(Y \mid X = x)</math> (''discriminative model''), and base classification on that; or one can estimate the joint distribution <math>P(X, Y)</math> (''generative model''), from that compute the conditional probability <math>P(Y \mid X = x)</math>, and then base classification on that. These are increasingly indirect, but increasingly probabilistic, allowing more domain knowledge and probability theory to be applied. In practice different approaches are used, depending on the particular problem, and hybrids can combine strengths of multiple approaches.
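The three routes can be compared side by side on a toy problem. The following is a minimal sketch rather than a reference implementation: the one-dimensional data set, the choice of a nearest-neighbour rule as the distribution-free classifier, the Gaussian form assumed for <math>P(X \mid Y)</math>, and all function names are assumptions made for illustration.

```python
import math

# Hypothetical 1-D training set of (x, y) pairs; the values are illustrative only.
DATA = [(-3.0, 0), (-2.5, 0), (-2.0, 0), (-1.5, 0),
        (1.5, 1), (2.0, 1), (2.5, 1), (3.0, 1)]


def nearest_neighbour_label(data, x):
    """Distribution-free route: copy the label of the closest training point."""
    return min(data, key=lambda pair: abs(pair[0] - x))[1]


def fit_logistic(data, steps=2000, lr=0.1):
    """Discriminative route: fit P(Y=1 | X=x) = sigmoid(w*x + b) by gradient descent."""
    w = b = 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b


def logistic_posterior(params, x):
    """Evaluate the fitted P(Y=1 | X=x)."""
    w, b = params
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def fit_generative(data):
    """Generative route: estimate P(Y) and a Gaussian P(X | Y=y) for each label."""
    model = {}
    for label in sorted({y for _, y in data}):
        xs = [x for x, y in data if y == label]
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        model[label] = (len(xs) / len(data), mean, var)
    return model


def generative_posterior(model, x):
    """Form the joint p(x, y) = p(x | y) P(y), then normalise to get P(Y | X=x)."""
    joint = {}
    for label, (prior, mean, var) in model.items():
        density = math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
        joint[label] = prior * density
    total = sum(joint.values())
    return {label: value / total for label, value in joint.items()}


logistic_params = fit_logistic(DATA)
generative_model = fit_generative(DATA)
for x in (-2.0, 2.0):
    print(x,
          nearest_neighbour_label(DATA, x),
          round(logistic_posterior(logistic_params, x), 3),
          round(generative_posterior(generative_model, x)[1], 3))
```

Only the generative route stores enough information (a prior and a density per label) to draw new observations; the other two keep just what is needed to assign a label.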
Definition
An alternative division defines these symmetrically as:
* a generative model is a model of the conditional probability of the observable ''X'', given a target ''y'', symbolically, <math>P(X \mid Y = y)</math>
["We can use Bayes rule as the basis for designing learning algorithms (function approximators), as follows: Given that we wish to learn some target function <math>f\colon X \to Y</math>, or equivalently, <math>P(Y \mid X)</math>, we use the training data to learn estimates of <math>P(X \mid Y)</math> and <math>P(Y)</math>. New ''X'' examples can then be classified using these estimated probability distributions, plus Bayes rule. This type of classifier is called a ''generative'' classifier, because we can view the distribution <math>P(X \mid Y)</math> as describing how to generate random instances ''X'' conditioned on the target attribute ''Y''."]
* a discriminative model is a model of the conditional probability of the target ''Y'', given an observation ''x'', symbolically, <math>P(Y \mid X = x)</math>
["Logistic Regression is a function approximation algorithm that uses training data to directly estimate <math>P(Y \mid X)</math>, in contrast to Naive Bayes. In this sense, Logistic Regression is often referred to as a ''discriminative'' classifier because we can view the distribution <math>P(Y \mid X)</math> as directly discriminating the value of the target value ''Y'' for any given instance ''X''."]
Regardless of precise definition, the terminology is apt because a generative model can be used to "generate" random instances (outcomes), either of an observation and target <math>(x, y)</math>, or of an observation ''x'' given a target value ''y'', while a discriminative model or discriminative classifier (without a model) can be used to "discriminate" the value of the target variable ''Y'', given an observation ''x''.
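The "generate" half of this terminology can be made concrete with a short sampling sketch. Everything specific here is hypothetical and chosen only for illustration: the two labels, the prior <math>P(Y)</math>, the Gaussian class-conditional distributions, and the function names. A pair is drawn from the joint distribution by first drawing a label and then drawing an observation given that label.

```python
import random

# Hypothetical generative model: P(Y) plus a Gaussian P(X | Y=y) for each label.
PRIOR = {"spam": 0.3, "ham": 0.7}
CLASS_CONDITIONAL = {"spam": (8.0, 1.0), "ham": (2.0, 1.5)}  # (mean, std dev) of X


def sample_x_given_y(label, rng):
    """Draw an observation x from P(X | Y=label)."""
    mean, std = CLASS_CONDITIONAL[label]
    return rng.gauss(mean, std)


def sample_joint(rng):
    """Draw a pair (x, y) from P(X, Y): first y ~ P(Y), then x ~ P(X | Y=y)."""
    labels = list(PRIOR)
    weights = [PRIOR[name] for name in labels]
    label = rng.choices(labels, weights=weights)[0]
    return sample_x_given_y(label, rng), label


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_joint(rng) for _ in range(1000)]
spam_fraction = sum(1 for _, y in samples if y == "spam") / len(samples)
print(samples[:3], spam_fraction)
```

A discriminative model of <math>P(Y \mid X = x)</math> alone could not be sampled this way, since it says nothing about how observations are distributed.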
The difference between "discriminate" (distinguish) and "classify" is subtle, and these are not consistently distinguished. (The term "discriminative classifier" becomes a pleonasm when "discrimination" is equivalent to "classification".)
The term "generative model" is also used to describe models that generate instances of output variables in a way that has no clear relationship to probability distributions over potential samples of input variables.
Generative adversarial networks are examples of this class of generative models, and are judged primarily by the similarity of particular outputs to potential inputs. Such models are not classifiers.
Relationships between models
In application to classification, the observable ''X'' is frequently a continuous variable, the target ''Y'' is generally a discrete variable consisting of a finite set of labels, and the conditional probability <math>P(Y \mid X)</math> can also be interpreted as a (non-deterministic) target function <math>f\colon X \to Y</math>, considering ''X'' as inputs and ''Y'' as outputs.
Given a finite set of labels, the two definitions of "generative model" are closely related. A model of the conditional distribution <math>P(X \mid Y = y)</math> is a model of the distribution of each label, and a model of the joint distribution is equivalent to a model of the distribution of label values <math>P(Y)</math>, together with the distribution of observations given a label, <math>P(X \mid Y)</math>; symbolically, <math>P(X, Y) = P(X \mid Y) P(Y)</math>.
Thus, while a model of the joint probability distribution is more informative than a model of the distribution of observations for each label (which omits the labels' relative frequencies), it is a relatively small step, hence these are not always distinguished.
Given a model of the joint distribution, <math>P(X, Y)</math>, the distribution of the individual variables can be computed as the marginal distributions <math>P(X) = \sum_y P(X, Y = y)</math> and <math>P(Y) = \int_x P(Y, X = x)\,dx</math> (considering ''X'' as continuous, hence integrating over it, and ''Y'' as discrete, hence summing over it), and either conditional distribution can be computed from the definition of conditional probability: <math>P(X \mid Y) = P(X, Y)/P(Y)</math> and <math>P(Y \mid X) = P(X, Y)/P(X)</math>.
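The marginalisation and conditioning steps above can be sketched numerically. This is a minimal illustration under stated assumptions, not a definitive implementation: the two-label Gaussian model, its parameters, the integration range, and the function names are all hypothetical, and the integral over ''x'' is approximated with the trapezoidal rule.

```python
import math

# Hypothetical joint model: discrete label Y in {0, 1}, continuous observation X.
PRIOR = {0: 0.6, 1: 0.4}
GAUSSIAN = {0: (0.0, 1.0), 1: (3.0, 0.5)}  # (mean, std dev) of X given each label


def joint(x, y):
    """Joint density p(x, y) = p(x | y) P(y)."""
    mean, std = GAUSSIAN[y]
    density = math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))
    return PRIOR[y] * density


def marginal_x(x):
    """P(X = x): sum the joint over the discrete label."""
    return sum(joint(x, y) for y in PRIOR)


def marginal_y(y, lo=-10.0, hi=10.0, steps=4000):
    """P(Y = y): integrate the joint over x with the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.5 * (joint(lo, y) + joint(hi, y))
    total += sum(joint(lo + i * h, y) for i in range(1, steps))
    return total * h


def conditional_y_given_x(y, x):
    """P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)."""
    return joint(x, y) / marginal_x(x)


def conditional_x_given_y(x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)."""
    return joint(x, y) / marginal_y(y)


print(marginal_y(0), marginal_y(1))
print(conditional_y_given_x(0, 1.0), conditional_y_given_x(1, 1.0))
```

Integrating the joint over ''x'' should recover the prior that was used to build it, which gives a simple consistency check on the sketch.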
Given a model of one conditional probability, and estimated probability distributions for the variables ''X'' and ''Y'', denoted <math>P(X)</math> and <math>P(Y)</math>, one can estimate the opposite conditional probability using Bayes' rule:
:<math>P(X \mid Y) P(Y) = P(Y \mid X) P(X).</math>
For example, given a generative model for <math>P(X \mid Y)</math>, one can estimate:
:<math>P(Y \mid X) = P(X \mid Y) P(Y) / P(X),</math>
and given a discriminative model for <math>P(Y \mid X)</math>, one can estimate:
:<math>P(X \mid Y) = P(Y \mid X) P(X) / P(Y).</math>
Note that Bayes' rule (computing one conditional probability in terms of the other) and the definition of conditional probability (computing conditional probability in terms of the joint distribution) are frequently conflated as well.
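Bayes' rule as a conversion between the two conditionals can be checked on a small discrete table. The sketch below assumes a hypothetical discriminative model <math>P(Y \mid X)</math> and input distribution <math>P(X)</math> over two inputs and two labels; the numbers and the helper name <code>invert</code> are illustrative only. Exact fractions are used so that applying the rule twice returns the starting tables without rounding error.

```python
from fractions import Fraction as F

# Hypothetical discriminative model P(Y | X) and input distribution P(X).
P_Y_GIVEN_X = {
    "x1": {"a": F(9, 10), "b": F(1, 10)},
    "x2": {"a": F(1, 5), "b": F(4, 5)},
}
P_X = {"x1": F(1, 2), "x2": F(1, 2)}


def invert(conditional, prior):
    """Apply Bayes' rule: from P(B | A) and P(A), return (P(A | B), P(B))."""
    evidence = {}
    for a, row in conditional.items():
        for b, p in row.items():
            # P(B = b) = sum over a of P(B = b | A = a) P(A = a)
            evidence[b] = evidence.get(b, 0) + p * prior[a]
    inverse = {
        b: {a: conditional[a][b] * prior[a] / evidence[b] for a in conditional}
        for b in evidence
    }
    return inverse, evidence


# From the discriminative direction to the generative one ...
p_x_given_y, p_y = invert(P_Y_GIVEN_X, P_X)
# ... and back again, which should recover the starting tables.
round_trip, p_x_again = invert(p_x_given_y, p_y)

print(p_y)
print(p_x_given_y)
```

The same function serves both directions because Bayes' rule is symmetric in the two variables; only the roles of the inputs change.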
Contrast with discriminative classifiers
A generative algorithm models how the data was generated in order to categorize a signal. It asks the question: based on my generation assumptions, which category is most likely to generate this signal? A discriminative algorithm does not care about how the data was generated; it simply categorizes a given signal. So, discriminative algorithms try to learn <math>p(y \mid x)</math> directly from the data and then try to classify data. On the other hand, generative algorithms try to learn <math>p(x, y)</math>, which can be transformed into <math>p(y \mid x)</math> later to classify the data. One of the advantages of generative algorithms is that you can use <math>p(x, y)</math> to generate new data similar to existing data. On the other hand, it has been proved that some discriminative algorithms give better performance than some generative algorithms in classification tasks.
Despite the fact that discriminative models do not need to model the distribution of the observed variables, they cannot generally express complex relationships between the observed and target variables. But in general, they do not necessarily perform better than generative models at classification and regression tasks. The two classes are seen as complementary or as different views of the same procedure.
Deep generative models
With the rise of deep learning, a new family of methods, called deep generative models (DGMs), is formed through the combination of generative models and deep neural networks. An increase in the scale of the neural networks is typically accompanied by an increase in the scale of the training data, both of which are required for good performance.
Popular DGMs include variational autoencoders (VAEs), generative adversarial networks (GANs), and auto-regressive models. Recently, there has been a trend to build very large deep generative models. For example, GPT-3, and its precursor GPT-2, are auto-regressive neural language models that contain billions of parameters; BigGAN and VQ-VAE, which are used for image generation, can have hundreds of millions of parameters; and Jukebox is a very large generative model for musical audio that contains billions of parameters.
Types
Generative models
Types of generative models are:
* Gaussian mixture model (and other types of mixture model)
* Hidden Markov model
* Probabilistic context-free grammar
* Bayesian network (e.g. Naive Bayes, Autoregressive model)
* Averaged one-dependence estimators
* Latent Dirichlet allocation
* Boltzmann machine (e.g. Restricted Boltzmann machine, Deep belief network)
* Variational autoencoder
* Generative adversarial network
* Flow-based generative model
* Energy based model
* Diffusion model
If the observed data are truly sampled from the generative model, then fitting the parameters of the generative model to maximize the data likelihood is a common method. However, since most statistical models are only approximations to the ''true'' distribution, if the model's application is to infer about a subset of variables conditional on known values of others, then it can be argued that the approximation makes more assumptions than are necessary to solve the problem at hand. In such cases, it can be more accurate to model the conditional density functions directly using a discriminative model (see below), although application-specific details will ultimately dictate which approach is most suitable in any particular case.
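Fitting by maximum likelihood can be illustrated with the simplest case, a single Gaussian, where the maximizing parameters have a closed form. The observations and function names below are hypothetical, and the choice of a Gaussian is an assumption made only to keep the sketch short; richer generative models usually need iterative optimisation instead.

```python
import math


def fit_gaussian_mle(xs):
    """Closed-form maximum-likelihood estimates of a Gaussian's mean and variance."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # the MLE divides by n, not n - 1
    return mean, var


def log_likelihood(xs, mean, var):
    """Log-likelihood of the data under a Gaussian with the given parameters."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        for x in xs
    )


# Hypothetical observations, for illustration only.
observations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mle_mean, mle_var = fit_gaussian_mle(observations)

best = log_likelihood(observations, mle_mean, mle_var)
# Any other parameter setting should score a lower log-likelihood.
for mean, var in [(mle_mean + 0.5, mle_var), (mle_mean, mle_var * 0.75),
                  (mle_mean, mle_var * 1.25)]:
    print(mean, var, log_likelihood(observations, mean, var) < best)
```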
Discriminative models
* k-nearest neighbors algorithm
* Logistic regression
* Support Vector Machines
* Decision Tree Learning
* Random Forest
* Maximum-entropy Markov models
* Conditional random fields
Examples
Simple example
Suppose the input data is ''x'', its label is ''y'', and there are 4 data points <math>(x, y)</math>.
For such data, estimating the joint probability distribution <math>p(x, y)</math> from the empirical measure assigns each pair <math>(x, y)</math> the fraction of the data points equal to that pair, while <math>p(y \mid x)</math> assigns each label ''y'' the fraction of the data points with input ''x'' that carry that label.
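The two empirical estimates can be worked through on a concrete data set. The four data points below are hypothetical, chosen only for illustration, as are the variable names. The joint estimate divides each pair's count by the total number of points, whereas the conditional estimate divides it by the number of points sharing the same input.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical data set of four (x, y) pairs, chosen only for illustration.
points = [(1, 0), (1, 0), (2, 0), (2, 1)]

pair_counts = Counter(points)
x_counts = Counter(x for x, _ in points)

# Empirical joint p(x, y): count of the pair over the total number of points.
joint = {pair: Fraction(count, len(points)) for pair, count in pair_counts.items()}

# Empirical conditional p(y | x): count of the pair over the count of that x.
conditional = {
    (x, y): Fraction(count, x_counts[x]) for (x, y), count in pair_counts.items()
}

for pair in sorted(pair_counts):
    print(pair, joint[pair], conditional[pair])
```

Pairs that never occur in the data are simply absent from both tables, which corresponds to an estimated probability of zero.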
Text generation
One example uses a table of frequencies of English word pairs to generate a sentence beginning with "representing and speedily is an good"; this is not proper English, but the output will increasingly approximate it as the table is moved from word pairs to word triplets etc.
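The word-pair approach can be sketched in a few lines. The corpus, the function names, and the fixed random seed below are all assumptions made for illustration; a real table would be built from a large body of English text. Storing each follower once per occurrence means that a uniform draw from the list samples the next word in proportion to the pair frequencies.

```python
import random
from collections import defaultdict

# Tiny hypothetical corpus; a real frequency table would use far more text.
CORPUS = "the cat sat on the mat and the dog sat on the rug"


def build_pair_table(text):
    """Map each word to the list of words that follow it (repeats keep frequencies)."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table


def generate(table, start, length, rng):
    """Generate up to `length` words, each drawn given only the previous word."""
    words = [start]
    while len(words) < length:
        followers = table.get(words[-1])
        if not followers:  # the last word never appears mid-corpus, so stop
            break
        words.append(rng.choice(followers))
    return " ".join(words)


pair_table = build_pair_table(CORPUS)
sentence = generate(pair_table, "the", 8, random.Random(0))
print(sentence)
```

Moving from word pairs to word triplets would amount to keying the table on the previous two words instead of one.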
See also
* Discriminative model
* Graphical model