Discriminative models, also referred to as conditional models, are a class of models used for
classification or regression. They learn decision boundaries from observed data, such as pass/fail, win/lose, alive/dead or healthy/sick.
Typical discriminative models include logistic regression (LR), conditional random fields (CRFs) (specified over an undirected graph), decision trees, and many others. Typical generative model approaches include naive Bayes classifiers, Gaussian mixture models, variational autoencoders, generative adversarial networks and others.
Definition
Unlike generative modelling, which studies the joint probability $P(x, y)$, discriminative modeling studies the conditional probability $P(y \mid x)$, or maps the given unobserved variable (target) $x$ to a class label $y$ dependent on the observed variables (training samples). For example, in object recognition, $x$ is likely to be a vector of raw pixels (or features extracted from the raw pixels of the image). Within a probabilistic framework, this is done by modeling the conditional probability distribution $P(y \mid x)$, which can be used for predicting $y$ from $x$. Note that there is still a distinction between the conditional model and the discriminative model, though more often they are simply categorised as discriminative models.
Pure discriminative model vs. conditional model
A ''conditional model'' models the conditional probability distribution, whereas a traditional discriminative model aims to optimize the mapping of the input onto the most similar trained samples.
Typical discriminative modelling approaches
The following approaches assume that we are given the training data-set $D = \{(x_i, y_i) \mid i \le N\}$, where $y_i$ is the corresponding output for the input $x_i$.
Linear classifier
We intend to use the function $f(x)$ to simulate the behavior observed in the training data-set by the linear classifier method. Using the joint feature vector $\phi(x, y)$, the decision function is defined as:
$$f(x; w) = \arg\max_y w^T \phi(x, y)$$
According to Memisevic's interpretation, $w^T \phi(x, y)$, which is also written $c(x, y; w)$, computes a score which measures the compatibility of the input $x$ with the potential output $y$. Then the $\arg\max$ determines the class with the highest score.
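As a rough illustration of the decision rule described above, the following sketch scores each candidate class with a weight vector applied to a joint feature vector and returns the highest-scoring class. The block layout of the joint feature vector (the input copied into the slot for the candidate class), the helper names `joint_feature`, `score` and `predict`, and the weight values are all assumptions made for this example; they are not taken from the source.

```python
# Hypothetical sketch of a multiclass linear classifier:
# pick the class y whose score w^T phi(x, y) is largest.

def joint_feature(x, y, num_classes):
    """Assumed joint feature vector: x copied into the block for class y, zeros elsewhere."""
    phi = [0.0] * (len(x) * num_classes)
    phi[y * len(x):(y + 1) * len(x)] = x
    return phi

def score(w, x, y, num_classes):
    """Compatibility score of input x with candidate output y (a dot product)."""
    return sum(wi * pi for wi, pi in zip(w, joint_feature(x, y, num_classes)))

def predict(w, x, num_classes):
    """Return the class with the highest score (the arg max over y)."""
    return max(range(num_classes), key=lambda y: score(w, x, y, num_classes))

# Made-up weights for 2 input features and 3 classes.
w = [1.0, 0.0,     # block for class 0
     0.0, 1.0,     # block for class 1
     -1.0, -1.0]   # block for class 2

print(predict(w, [2.0, 0.5], 3))    # scores are 2.0, 0.5, -2.5
print(predict(w, [-1.0, -2.0], 3))  # scores are -1.0, -2.0, 3.0
```

With these made-up weights the first input is assigned to class 0 and the second to class 2, since those classes have the largest scores.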
Logistic regression (LR)
Since the 0-1 loss function is commonly used in decision theory, the conditional probability distribution $P(y \mid x; w)$, where $w$ is a parameter vector fitted to the training data, can be written as follows for the logistic regression model:
$$P(y \mid x; w) = \frac{1}{Z(x; w)} \exp\left(w^T \phi(x, y)\right)$$
with
$$Z(x; w) = \sum_{y} \exp\left(w^T \phi(x, y)\right)$$
The equation above represents logistic regression. Notice that a major distinction between models is their way of introducing posterior probability. Here the posterior probability is inferred from the parametric model. The parameter can then be estimated by maximizing the log-likelihood:
$$L(w) = \sum_{i} \log P(y_i \mid x_i; w)$$
Equivalently, one can minimize the log-loss below:
$$l^{\log}(x_i, y_i, c(x_i; w)) = -\log P(y_i \mid x_i; w) = \log Z(x_i; w) - w^T \phi(x_i, y_i)$$
Since the log-loss is differentiable, a gradient-based method can be used to optimize the model. A global optimum is guaranteed because the objective function is convex. The gradient of the log-likelihood is:
$$\frac{\partial L(w)}{\partial w} = \sum_{i} \left( \phi(x_i, y_i) - E_{P(y \mid x_i; w)}\left[\phi(x_i, y)\right] \right)$$
where $E_{P(y \mid x_i; w)}$ denotes the expectation under $P(y \mid x_i; w)$.
The above method is computationally efficient when the number of classes is relatively small.
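The training procedure described above can be sketched in a few lines. The example below assumes the common softmax form, in which the probability of class y given x is proportional to the exponential of a linear score, and it fits the weights by plain gradient ascent on the log-likelihood. The block-structured joint feature vector, the function names, the toy data, the step size and the iteration count are all illustrative assumptions rather than details from the source.

```python
import math

def joint_feature(x, y, num_classes):
    """Assumed joint feature vector: x copied into the block for class y."""
    phi = [0.0] * (len(x) * num_classes)
    phi[y * len(x):(y + 1) * len(x)] = x
    return phi

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def conditional_probs(w, x, num_classes):
    """Return [P(y | x; w) for each y]: exponentiated scores divided by their sum Z."""
    scores = [dot(w, joint_feature(x, y, num_classes)) for y in range(num_classes)]
    m = max(scores)  # subtracting the max leaves the ratios unchanged but avoids overflow
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def log_likelihood(w, data, num_classes):
    """Sum over samples of log P(y_i | x_i; w)."""
    return sum(math.log(conditional_probs(w, x, num_classes)[y]) for x, y in data)

def gradient(w, data, num_classes):
    """Observed features minus expected features under the model, summed over samples."""
    grad = [0.0] * len(w)
    for x, y_true in data:
        probs = conditional_probs(w, x, num_classes)
        observed = joint_feature(x, y_true, num_classes)
        for k in range(len(w)):
            grad[k] += observed[k]
        for y in range(num_classes):
            phi = joint_feature(x, y, num_classes)
            for k in range(len(w)):
                grad[k] -= probs[y] * phi[k]
    return grad

def fit(data, num_classes, steps=200, lr=0.1):
    """Gradient ascent on the log-likelihood, starting from all-zero weights."""
    w = [0.0] * (len(data[0][0]) * num_classes)
    for _ in range(steps):
        g = gradient(w, data, num_classes)
        w = [wi + lr * gi for wi, gi in zip(w, g)]
    return w

# Made-up toy data: each input is [bias, value]; the label is 1 when value > 0.
data = [([1.0, -2.0], 0), ([1.0, -1.0], 0), ([1.0, 1.0], 1), ([1.0, 2.0], 1)]
w = fit(data, num_classes=2)
print(conditional_probs(w, [1.0, 1.5], 2))
```

Because the toy data is linearly separable, the unregularized likelihood has no finite maximizer and the weights keep growing with more steps; a practical implementation would typically add a regularization term, which is omitted here to stay close to the equations in the text.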
Contrast with generative model
Contrast in approaches
Suppose we are given $m$ class labels (classification) and $n$ feature variables, $Y = \{y_1, y_2, \ldots, y_m\}$ and $X = \{x_1, x_2, \ldots, x_n\}$, as the training samples.
A generative model takes the joint probability $P(x, y)$, where $x$ is the input and $y$ is the label, and predicts the most probable known label $\tilde{y} \in Y$ for the unknown variable $\tilde{x}$ using Bayes' theorem.
Discriminative models, as opposed to generative models, do not allow one to generate samples from the joint distribution of observed and target variables. However, for tasks such as classification and regression that do not require the joint distribution, discriminative models can yield superior performance (in part because they have fewer variables to compute).
On the other hand, generative models are typically more flexible than discriminative models in expressing dependencies in complex learning tasks. In addition, most discriminative models are inherently supervised and cannot easily support unsupervised learning. Application-specific details ultimately dictate the suitability of selecting a discriminative versus generative model.
Discriminative models and generative models also differ in how they introduce the posterior probability.
To keep the expected loss as low as possible, the misclassification of results should be minimized. In the discriminative model, the posterior probability $P(y \mid x)$ is inferred from a parametric model, where the parameters come from the training data. Point estimates of the parameters are obtained by maximizing the likelihood or by computing a distribution over the parameters. On the other hand, since generative models focus on the joint probability, the class posterior probability $P(y \mid x)$ is obtained through Bayes' theorem, which is
$$P(y \mid x) = \frac{p(x \mid y)\, p(y)}{\sum_{i} p(x \mid i)\, p(i)} = \frac{p(x \mid y)\, p(y)}{p(x)}$$
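To make the generative route to the posterior concrete, the following sketch applies Bayes' theorem to a small discrete example: the class prior and the class-conditional likelihood are multiplied, then normalized by their sum over all classes. The function name `posterior`, the healthy/sick labels, the single "fever" observation and every probability value are invented for illustration only.

```python
def posterior(priors, likelihoods, x):
    """Class posterior: likelihood times prior for each class, divided by the evidence p(x)."""
    joint = {y: likelihoods[y][x] * priors[y] for y in priors}
    evidence = sum(joint.values())  # p(x), the sum over all classes
    return {y: joint[y] / evidence for y in joint}

# Made-up numbers: class priors p(y) and class-conditional likelihoods p(x | y).
priors = {"healthy": 0.9, "sick": 0.1}
likelihoods = {
    "healthy": {"fever": 0.05, "no_fever": 0.95},
    "sick": {"fever": 0.6, "no_fever": 0.4},
}

post = posterior(priors, likelihoods, "fever")
print(post)                     # "sick" gets 0.06 / 0.105, roughly 0.571
print(max(post, key=post.get))  # the most probable label under the posterior
```

A discriminative model would instead parameterize the posterior directly and never estimate the prior or the class-conditional likelihoods separately.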
Advantages and disadvantages in application
In repeated experiments that apply logistic regression and naive Bayes to binary classification tasks, discriminative learning results in lower asymptotic errors, while generative learning approaches its higher asymptotic error faster.
However, in Ulusoy and Bishop's joint work, ''Comparison of Generative and Discriminative Techniques for Object Detection and Classification'', they state that the above statement is true only when the model is the appropriate one for the data (i.e. the data distribution is correctly modeled by the generative model).
Advantages
Significant advantages of using discriminative modeling are:
* Higher accuracy, which mostly leads to better learning results.
* Allows simplification of the input and provides a direct approach to $P(y \mid x)$
* Saves calculation resource
* Generates lower asymptotic errors
Compared with the advantages of using generative modeling:
* Takes all data into consideration, which could result in slower processing as a disadvantage
* Requires fewer training samples
* A flexible framework that could easily cooperate with other needs of the application
Disadvantages
* Training usually requires multiple numerical optimization techniques
* Similarly, by definition, the discriminative model needs a combination of multiple subtasks to solve a complex real-world problem
Optimizations in applications
Since both ways of modeling have advantages and disadvantages, combining the two approaches can work well in practice. For example, in the article ''A Joint Discriminative Generative Model for Deformable Model Construction and Classification'', Marras and his coauthors apply a combination of the two approaches to face classification and obtain a higher accuracy than the traditional approach.
Similarly, Kelm also proposed combining the two approaches for pixel classification in his article ''Combining Generative and Discriminative Methods for Pixel Classification with Multi-Conditional Learning''.
When extracting discriminative features prior to clustering, principal component analysis (PCA), though commonly used, is not necessarily a discriminative approach. In contrast, linear discriminant analysis (LDA) is a discriminative one.
LDA provides an efficient way of addressing the disadvantage listed above. The discriminative model needs a combination of multiple subtasks before classification, and LDA offers an appropriate solution to this problem by reducing dimensionality.
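As a hedged illustration of how LDA reduces dimension using class labels, the sketch below computes the two-class Fisher discriminant direction for 2-D points and projects each point onto it, giving a single feature per sample. It assumes the standard Fisher formulation (inverse of the within-class scatter matrix applied to the difference of class means); the function names and the toy points are made up for this example and do not come from the source.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, mu):
    """2x2 scatter matrix: sum over points of (p - mu)(p - mu)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for r in range(2):
            for c in range(2):
                s[r][c] += d[r] * d[c]
    return s

def fisher_direction(class0, class1):
    """Direction w = Sw^{-1} (mu1 - mu0), with Sw the within-class scatter."""
    mu0, mu1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, mu0), scatter(class1, mu1)
    sw = [[s0[r][c] + s1[r][c] for c in range(2)] for r in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]  # assumed non-zero here
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(w, p):
    """Reduce a 2-D point to one dimension along w."""
    return w[0] * p[0] + w[1] * p[1]

# Made-up labelled points for two classes.
class0 = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]]
class1 = [[6.0, 5.0], [7.0, 8.0], [8.0, 7.0], [7.0, 6.0]]

w = fisher_direction(class0, class1)
print([round(project(w, p), 3) for p in class0])
print([round(project(w, p), 3) for p in class1])
```

Unlike PCA, which would pick the direction of largest overall variance without looking at labels, this direction is chosen using the class labels, so the projected values of the two toy classes do not overlap.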
Types
Examples of discriminative models include:
* Logistic regression, a type of generalized linear regression used for predicting binary or categorical outputs (also known as maximum entropy classifiers)
* Boosting (meta-algorithm)
* Conditional random fields
* Linear regression
* Random forests
See also
* Generative model