Binary classification is the task of classifying the elements of a set into two groups (each called a ''class'') on the basis of a classification rule. Typical binary classification problems include:
* Medical testing to determine whether a patient has a certain disease;
* Quality control in industry, deciding whether a specification has been met;
* In information retrieval, deciding whether a page should be in the result set of a search.
Binary classification is dichotomization applied to a practical situation. In many practical binary classification problems, the two groups are not symmetric, and rather than overall accuracy, the relative proportion of different types of errors is of interest. For example, in medical testing, detecting a disease when it is not present (a ''false positive'') is considered differently from not detecting a disease when it is present (a ''false negative'').
Statistical binary classification
Statistical classification is a problem studied in machine learning. It is a type of supervised learning, a method of machine learning where the categories are predefined, and is used to categorize new probabilistic observations into said categories. When there are only two categories the problem is known as statistical binary classification.
Some of the methods commonly used for binary classification are:
* Decision trees
* Random forests
* Bayesian networks
* Support vector machines
* Neural networks
* Logistic regression
* Probit model
* Genetic programming
* Multi expression programming
* Linear genetic programming
Each classifier is best in only a select domain, based upon the number of observations, the dimensionality of the feature vector, the noise in the data and many other factors. For example, random forests perform better than SVM classifiers for 3D point clouds.
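As a minimal illustration of one such method, the sketch below fits a logistic regression classifier by stochastic gradient descent on a toy two-feature dataset. The data, learning rate, and epoch count are invented for the example; a real application would use an established library rather than this hand-rolled loop.

```python
import math

def train_logistic(X, y, lr=0.5, epochs=1000):
    """Fit logistic regression weights w and bias b by per-sample gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of class 1
            err = p - yi                      # gradient of the cross-entropy loss
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Assign class 1 when the predicted probability is at least 0.5."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0

# Toy, linearly separable data: class 1 roughly when x0 + x1 is large
X = [[0.1, 0.2], [0.4, 0.3], [0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.7, 0.9]]
y = [0, 0, 1, 1, 0, 1]
w, b = train_logistic(X, y)
preds = [predict(w, b, xi) for xi in X]
```

Because the toy data are linearly separable, the trained model classifies all six training points correctly.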
Evaluation of binary classifiers
There are many metrics that can be used to measure the performance of a classifier or predictor; different fields have different preferences for specific metrics due to different goals. In medicine, sensitivity and specificity are often used, while in information retrieval precision and recall are preferred. An important distinction is between metrics that are independent of how often each category occurs in the population (the ''prevalence''), and metrics that depend on the prevalence – both types are useful, but they have very different properties.
Given a classification of a specific data set, there are four basic combinations of actual data category and assigned category: true positives TP (correct positive assignments), true negatives TN (correct negative assignments), false positives FP (incorrect positive assignments), and false negatives FN (incorrect negative assignments). These can be arranged into a 2×2 contingency table, with columns corresponding to actual value – condition positive or condition negative – and rows corresponding to classification value – test outcome positive or test outcome negative.
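Given paired lists of actual and predicted labels, the four cell counts of this table can be tallied directly. A minimal sketch (the label lists are invented for illustration):

```python
def confusion_counts(actual, predicted):
    """Tally the four cells of the 2x2 contingency table."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn

actual    = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
tp, tn, fp, fn = confusion_counts(actual, predicted)
# The four counts always sum to the number of instances.
```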
The eight basic ratios
There are eight basic ratios that one can compute from this table, which come in four complementary pairs (each pair summing to 1). These are obtained by dividing each of the four numbers by the sum of its row or column, yielding eight numbers, which can be referred to generically in the form "true positive row ratio" or "false negative column ratio".
There are thus two pairs of column ratios and two pairs of row ratios, and one can summarize these with four numbers by choosing one ratio from each pair – the other four numbers are the complements.
The column (condition) ratios are:
* true positive rate (TPR) = TP/(TP+FN), aka sensitivity or recall. These are the proportion of the ''population with the condition'' for which the test is correct.
** with complement the false negative rate (FNR) = FN/(TP+FN)
* true negative rate (TNR) = TN/(TN+FP), aka specificity (SPC)
** with complement the false positive rate (FPR) = FP/(TN+FP)
These ratios are independent of the prevalence.
The row (test-outcome) ratios are:
* positive predictive value (PPV, aka precision) = TP/(TP+FP). These are the proportion of the ''population with a given test result'' for which the test is correct.
** with complement the false discovery rate (FDR) = FP/(TP+FP)
* negative predictive value (NPV) = TN/(TN+FN)
** with complement the false omission rate (FOR) = FN/(TN+FN)
These ratios depend on the prevalence.
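The eight ratios can be computed directly from the four cell counts. A short sketch, using cell counts invented for illustration:

```python
TP, FN, TN, FP = 90, 10, 80, 20  # example counts, invented for illustration

# Column (condition) ratios
tpr = TP / (TP + FN)   # sensitivity / recall
fnr = FN / (TP + FN)   # false negative rate
tnr = TN / (TN + FP)   # specificity
fpr = FP / (TN + FP)   # false positive rate

# Row (test-outcome) ratios
ppv = TP / (TP + FP)   # precision
fdr = FP / (TP + FP)   # false discovery rate
npv = TN / (TN + FN)
fomr = FN / (TN + FN)  # false omission rate ("for" is a Python keyword)

# Each complementary pair sums to 1
assert abs(tpr + fnr - 1) < 1e-9 and abs(tnr + fpr - 1) < 1e-9
assert abs(ppv + fdr - 1) < 1e-9 and abs(npv + fomr - 1) < 1e-9
```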
In diagnostic testing, the main ratios used are the true column ratios – true positive rate and true negative rate – where they are known as sensitivity and specificity. In information retrieval, the main ratios are the true positive ratios (row and column) – positive predictive value and true positive rate – where they are known as precision and recall.
One can take ratios of a complementary pair of ratios, yielding four likelihood ratios (two from the column ratios, two from the row ratios). This is primarily done for the column (condition) ratios, yielding likelihood ratios in diagnostic testing. Taking the ratio of one such pair of likelihood ratios yields a final ratio, the diagnostic odds ratio (DOR). This can also be defined directly as (TP×TN)/(FP×FN) = (TP/FN)/(FP/TN); this has a useful interpretation – as an odds ratio – and is prevalence-independent.
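The two diagnostic likelihood ratios and the DOR follow directly from the condition ratios; the sketch below checks that the ratio of likelihood ratios equals the direct definition (cell counts invented for illustration):

```python
TP, FN, TN, FP = 90, 10, 80, 20  # example counts, invented for illustration

tpr, fnr = TP / (TP + FN), FN / (TP + FN)   # sensitivity and its complement
tnr, fpr = TN / (TN + FP), FP / (TN + FP)   # specificity and its complement

lr_plus  = tpr / fpr    # positive likelihood ratio
lr_minus = fnr / tnr    # negative likelihood ratio
dor = lr_plus / lr_minus

# Equivalent direct definition: (TP x TN) / (FP x FN)
assert abs(dor - (TP * TN) / (FP * FN)) < 1e-9
```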
There are a number of other metrics, most simply the accuracy or Fraction Correct (FC), which measures the fraction of all instances that are correctly categorized; the complement is the Fraction Incorrect (FiC). The F-score combines precision and recall into one number via a choice of weighting, most simply equal weighting, as the balanced F-score (F1 score). Some metrics come from regression coefficients: the markedness and the informedness, and their geometric mean, the Matthews correlation coefficient. Other metrics include Youden's J statistic, the uncertainty coefficient, the phi coefficient, and Cohen's kappa.
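A few of these summary metrics, computed from the same four cell counts (counts invented for illustration):

```python
import math

TP, FN, TN, FP = 90, 10, 80, 20  # example counts, invented for illustration

accuracy = (TP + TN) / (TP + TN + FP + FN)   # Fraction Correct

precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)   # balanced F-score

# Matthews correlation coefficient, directly from the contingency table
mcc = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
```

For a perfect classifier accuracy and F1 are 1 and MCC is +1; MCC is 0 for a prediction no better than chance.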
Converting continuous values to binary
Tests whose results are of continuous values, such as most blood values, can artificially be made binary by defining a cutoff value, with test results being designated as positive or negative depending on whether the resultant value is higher or lower than the cutoff.
However, such conversion causes a loss of information, as the resultant binary classification does not tell ''how much'' above or below the cutoff a value is. As a result, when converting a continuous value that is close to the cutoff to a binary one, the resultant positive or negative predictive value is generally higher than the predictive value given directly from the continuous value. In such cases, the designation of the test as being either positive or negative gives the appearance of an inappropriately high certainty, while the value is in fact in an interval of uncertainty. For example, with the urine concentration of hCG as a continuous value, a urine pregnancy test that measured 52 mIU/ml of hCG may show as "positive" with 50 mIU/ml as cutoff, but is in fact in an interval of uncertainty, which may be apparent only by knowing the original continuous value. On the other hand, a test result very far from the cutoff generally has a resultant positive or negative predictive value that is lower than the predictive value given from the continuous value. For example, a urine hCG value of 200,000 mIU/ml confers a very high probability of pregnancy, but conversion to a binary value results in it showing just as "positive" as the value of 52 mIU/ml.
See also
* Examples of Bayesian inference
* Classification rule
* Confusion matrix
* Detection theory
* Kernel methods
* Multiclass classification
* Multi-label classification
* One-class classification
* Prosecutor's fallacy
* Receiver operating characteristic
* Thresholding (image processing)
* Uncertainty coefficient, aka proficiency
* Qualitative property
* Precision and recall