Binary classifier

Binary classification is the task of classifying the elements of a set into two groups (each called a ''class'') on the basis of a classification rule. Typical binary classification problems include:
* Medical testing, to determine whether a patient has a certain disease or not;
* Quality control in industry, deciding whether a specification has been met;
* In information retrieval, deciding whether a page should be in the result set of a search or not.

Binary classification is dichotomization applied to a practical situation. In many practical binary classification problems, the two groups are not symmetric, and rather than overall accuracy, the relative proportion of different types of errors is of interest. For example, in medical testing, detecting a disease when it is not present (a ''false positive'') is considered differently from not detecting a disease when it is present (a ''false negative'').
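To illustrate why overall accuracy can mislead when the two groups are not symmetric, the following is a minimal sketch with made-up numbers: a hypothetical population of 1,000 patients, 10 of whom have the disease, and a trivial rule that labels everyone as negative. The data and variable names are assumptions chosen for illustration only.

```python
# Hypothetical population: 10 patients with the disease (1), 990 without (0).
actual = [1] * 10 + [0] * 990

# A useless rule that never reports the disease.
predicted = [0] * len(actual)

pairs = list(zip(actual, predicted))
accuracy = sum(1 for a, p in pairs if a == p) / len(pairs)
false_negatives = sum(1 for a, p in pairs if a == 1 and p == 0)
false_positives = sum(1 for a, p in pairs if a == 0 and p == 1)

print(f"accuracy        = {accuracy:.2f}")
print(f"false negatives = {false_negatives}")
print(f"false positives = {false_positives}")
```

The rule is correct for 99% of patients yet misses every diseased one, which is why the two error types are usually examined separately.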


Statistical binary classification

Statistical classification is a problem studied in machine learning. It is a type of supervised learning, a method of machine learning where the categories are predefined, and is used to categorize new probabilistic observations into said categories. When there are only two categories the problem is known as statistical binary classification.

Some of the methods commonly used for binary classification are:
* Decision trees
* Random forests
* Bayesian networks
* Support vector machines
* Neural networks
* Logistic regression
* Probit model
* Genetic programming
* Multi expression programming
* Linear genetic programming

Each classifier performs best only in a particular domain, depending on the number of observations, the dimensionality of the feature vector, the noise in the data and many other factors. For example, random forests perform better than SVM classifiers for 3D point clouds.
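As a concrete instance of one listed method, the sketch below fits a logistic regression by plain batch gradient descent using only the standard library. The toy one-feature dataset, the learning rate, the number of epochs and the function names `fit_logistic` and `predict` are all assumptions made for illustration; a real project would normally rely on an established library.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, lr=0.1, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Assign class 1 when the estimated probability reaches the threshold."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return 1 if p >= threshold else 0

# Hypothetical training data: one centered feature, two predefined categories.
train_x = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
train_y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(train_x, train_y)
predictions = [predict(w, b, x) for x in train_x]
print(predictions)
```

The learned model is then applied to new observations with `predict`, which is the "categorize new observations into predefined categories" step described above.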


Evaluation of binary classifiers

There are many metrics that can be used to measure the performance of a classifier or predictor; different fields have different preferences for specific metrics due to different goals. In medicine sensitivity and specificity are often used, while in information retrieval precision and recall are preferred. An important distinction is between metrics that are independent of how often each category occurs in the population (the ''prevalence''), and metrics that depend on the prevalence; both types are useful, but they have very different properties.

Given a classification of a specific data set, there are four basic combinations of actual data category and assigned category: true positives TP (correct positive assignments), true negatives TN (correct negative assignments), false positives FP (incorrect positive assignments), and false negatives FN (incorrect negative assignments). These can be arranged into a 2×2 contingency table, with columns corresponding to actual value (condition positive or condition negative) and rows corresponding to classification value (test outcome positive or test outcome negative).
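The four counts can be tallied directly from paired lists of actual and assigned categories. The sketch below is one way to do so; the function name `confusion_counts` and the sample labels are hypothetical, and the printed layout follows the convention described above (columns for the actual condition, rows for the test outcome).

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for two equally long label sequences."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1   # correct positive assignment
            else:
                fp += 1   # incorrect positive assignment
        else:
            if a == positive:
                fn += 1   # incorrect negative assignment
            else:
                tn += 1   # correct negative assignment
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Hypothetical labels: 1 = condition present / test positive.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)

# 2x2 contingency table: columns = actual condition, rows = test outcome.
print(f"{'':15}{'cond. positive':>16}{'cond. negative':>16}")
print(f"{'test positive':15}{counts['TP']:>16}{counts['FP']:>16}")
print(f"{'test negative':15}{counts['FN']:>16}{counts['TN']:>16}")
```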


The eight basic ratios

There are eight basic ratios that one can compute from this table, which come in four complementary pairs (each pair summing to 1). These are obtained by dividing each of the four numbers by the sum of its row or column, yielding eight numbers, which can be referred to generically in the form "true positive row ratio" or "false negative column ratio". There are thus two pairs of column ratios and two pairs of row ratios, and one can summarize these with four numbers by choosing one ratio from each pair; the other four numbers are the complements.

The column ratios are:
* true positive rate (TPR) = TP/(TP+FN), aka sensitivity or recall. These are the proportion of the ''population with the condition'' for which the test is correct.
** with complement the false negative rate (FNR) = FN/(TP+FN)
* true negative rate (TNR) = TN/(TN+FP), aka specificity (SPC),
** with complement the false positive rate (FPR) = FP/(TN+FP)
The column ratios are independent of prevalence.

The row ratios are:
* positive predictive value (PPV, aka precision) = TP/(TP+FP). These are the proportion of the ''population with a given test result'' for which the test is correct.
** with complement the false discovery rate (FDR) = FP/(TP+FP)
* negative predictive value (NPV) = TN/(TN+FN)
** with complement the false omission rate (FOR) = FN/(TN+FN)
The row ratios depend on prevalence.

In diagnostic testing, the main ratios used are the true column ratios (true positive rate and true negative rate), where they are known as sensitivity and specificity. In information retrieval, the main ratios are the true positive ratios (row and column), namely positive predictive value and true positive rate, where they are known as precision and recall.

One can take ratios of a complementary pair of ratios, yielding four likelihood ratios (two column ratio of ratios, two row ratio of ratios). This is primarily done for the column (condition) ratios, yielding likelihood ratios in diagnostic testing. Taking the ratio of one of these groups of ratios yields a final ratio, the diagnostic odds ratio (DOR). This can also be defined directly as (TP×TN)/(FP×FN) = (TP/FN)/(FP/TN); this has a useful interpretation as an odds ratio, and is prevalence-independent.

There are a number of other metrics, most simply the accuracy or Fraction Correct (FC), which measures the fraction of all instances that are correctly categorized; the complement is the Fraction Incorrect (FiC). The F-score combines precision and recall into one number via a choice of weighting, most simply equal weighting, as the balanced F-score (F1 score). Some metrics come from regression coefficients: the markedness and the informedness, and their geometric mean, the Matthews correlation coefficient. Other metrics include Youden's J statistic, the uncertainty coefficient, the phi coefficient, and Cohen's kappa.
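The ratios in this section can be computed from the four counts as in the sketch below. The function names and the example counts are hypothetical, the code assumes every denominator is nonzero, and the likelihood ratios use the usual diagnostic-testing definitions (LR+ = TPR/FPR, LR- = FNR/TNR). The second set of counts multiplies the condition-negative column by ten to mimic a lower prevalence.

```python
import math

def basic_ratios(tp, tn, fp, fn):
    """Eight basic ratios; assumes every row and column sum is nonzero."""
    return {
        # Column ratios: divide by the actual-condition totals.
        "TPR": tp / (tp + fn), "FNR": fn / (tp + fn),
        "TNR": tn / (tn + fp), "FPR": fp / (tn + fp),
        # Row ratios: divide by the test-outcome totals.
        "PPV": tp / (tp + fp), "FDR": fp / (tp + fp),
        "NPV": tn / (tn + fn), "FOR": fn / (tn + fn),
    }

def derived_metrics(tp, tn, fp, fn):
    """Likelihood ratios, DOR and a few single-number summaries."""
    r = basic_ratios(tp, tn, fp, fn)
    return {
        "LR+": r["TPR"] / r["FPR"],
        "LR-": r["FNR"] / r["TNR"],
        "DOR": (tp * tn) / (fp * fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "F1": 2 * r["PPV"] * r["TPR"] / (r["PPV"] + r["TPR"]),
        "informedness": r["TPR"] + r["TNR"] - 1,
        "markedness": r["PPV"] + r["NPV"] - 1,
        "MCC": (tp * tn - fp * fn)
               / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

base = dict(tp=80, tn=90, fp=10, fn=20)      # hypothetical counts
scaled = dict(tp=80, tn=900, fp=100, fn=20)  # same test, lower prevalence

for name, counts in (("base", base), ("scaled", scaled)):
    r, m = basic_ratios(**counts), derived_metrics(**counts)
    print(name, {k: round(v, 3) for k, v in {**r, **m}.items()})
```

Comparing the two printed rows shows the column ratios and the DOR staying the same while the row ratios (for example PPV) change with prevalence.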


Converting continuous values to binary

Tests whose results are of continuous values, such as most blood values, can artificially be made binary by defining a cutoff value, with test results being designated as positive or negative depending on whether the resultant value is higher or lower than the cutoff.

However, such conversion causes a loss of information, as the resultant binary classification does not tell ''how much'' above or below the cutoff a value is. As a result, when converting a continuous value that is close to the cutoff to a binary one, the resultant positive or negative predictive value is generally higher than the predictive value given directly from the continuous value. In such cases, the designation of the test as either positive or negative gives the appearance of an inappropriately high certainty, while the value is in fact in an interval of uncertainty. For example, with the urine concentration of hCG as a continuous value, a urine pregnancy test that measured 52 mIU/ml of hCG may show as "positive" with 50 mIU/ml as cutoff, but is in fact in an interval of uncertainty, which may be apparent only by knowing the original continuous value. On the other hand, a test result very far from the cutoff generally has a resultant positive or negative predictive value that is lower than the predictive value given from the continuous value. For example, a urine hCG value of 200,000 mIU/ml confers a very high probability of pregnancy, but conversion to binary values means that it shows just as "positive" as the one of 52 mIU/ml.
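The loss of information can be seen in a few lines of code. This is a minimal sketch using the hCG figures from the example above; the function name `dichotomize` is hypothetical, and treating a value exactly equal to the cutoff as positive is an assumption, since the text does not say how ties are handled.

```python
def dichotomize(value, cutoff):
    """Return 'positive' at or above the cutoff, otherwise 'negative'.

    Counting a value equal to the cutoff as positive is an assumption.
    """
    return "positive" if value >= cutoff else "negative"

CUTOFF_MIU_PER_ML = 50  # cutoff from the pregnancy-test example

results = {}
for hcg in (52, 200_000):
    results[hcg] = dichotomize(hcg, CUTOFF_MIU_PER_ML)
    margin = hcg - CUTOFF_MIU_PER_ML
    print(f"hCG {hcg:>7} mIU/ml -> {results[hcg]} ({margin} above cutoff)")
```

Both measurements map to the same label, so the margin above the cutoff, which is what distinguishes a borderline result from a clear one, is no longer visible in the binary output.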


See also

* Examples of Bayesian inference
* Classification rule
* Confusion matrix
* Detection theory
* Kernel methods
* Multiclass classification
* Multi-label classification
* One-class classification
* Prosecutor's fallacy
* Receiver operating characteristic
* Thresholding (image processing)
* Uncertainty coefficient, aka proficiency
* Qualitative property
* Precision and recall (equivalent classification schema)

