Binary classification is the task of classifying the elements of a set into one of two groups (each called a ''class''). Typical binary classification problems include:

* Medical testing to determine if a patient has a certain disease or not;
* Quality control in industry, deciding whether a specification has been met;
* In information retrieval, deciding whether a page should be in the result set of a search or not;
* In administration, deciding whether someone should be issued with a driving licence or not;
* In cognition, deciding whether an object is food or not food.

When measuring the accuracy of a binary classifier, the simplest way is to count the errors. In practice, however, one of the two classes is often more important than the other, so the counts of both types of error are of interest. For example, in medical testing, detecting a disease when it is not present (a ''false positive'') is considered differently from not detecting a disease when it is present (a ''false negative'').

Four outcomes

Given a classification of a specific data set, there are four basic combinations of actual data category and assigned category: true positives TP (correct positive assignments), true negatives TN (correct negative assignments), false positives FP (incorrect positive assignments), and false negatives FN (incorrect negative assignments). These can be arranged into a 2×2 contingency table, with rows corresponding to actual value – condition positive or condition negative – and columns corresponding to classification value – test outcome positive or test outcome negative.
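As a minimal sketch of how the four outcomes can be tallied, the following Python snippet counts TP, FN, FP and TN from two parallel lists of booleans and prints them in the row/column layout described above. The function name `confusion_counts` and the sample data are illustrative assumptions, not part of any particular library.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Tally the four outcomes from parallel lists of booleans.

    `actual[i]` is True when the condition is really present;
    `predicted[i]` is True when the classifier assigned "positive".
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tally = Counter(zip(actual, predicted))
    return {
        "TP": tally[(True, True)],    # condition present, test positive
        "FN": tally[(True, False)],   # condition present, test negative
        "FP": tally[(False, True)],   # condition absent, test positive
        "TN": tally[(False, False)],  # condition absent, test negative
    }

# Hypothetical example data (8 cases).
actual    = [True, True,  True, False, False, False, False, True]
predicted = [True, False, True, False, True,  False, False, True]

counts = confusion_counts(actual, predicted)

# Rows: actual condition; columns: test outcome.
print("                     test positive  test negative")
print(f"condition positive   {counts['TP']:13d}  {counts['FN']:13d}")
print(f"condition negative   {counts['FP']:13d}  {counts['TN']:13d}")
```

For this sample the tallies are TP = 3, FN = 1, FP = 1 and TN = 3.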

Evaluation

From tallies of the four basic outcomes, there are many approaches that can be used to measure the accuracy of a classifier or predictor. Different fields have different preferences.

The eight basic ratios

A common approach to evaluation is to begin by computing two ratios of a standard pattern. There are eight basic ratios of this form that one can compute from the contingency table, which come in four complementary pairs (each pair summing to 1). These are obtained by dividing each of the four numbers by the sum of its row or column, yielding eight numbers, which can be referred to generically in the form "true positive row ratio" or "false negative column ratio". There are thus two pairs of column ratios and two pairs of row ratios, and one can summarize these with four numbers by choosing one ratio from each pair; the other four numbers are the complements.

The row ratios are:

* true positive rate (TPR) = TP/(TP+FN), also known as sensitivity or recall. This is the proportion of the ''population with the condition'' for which the test is correct.
** with complement the false negative rate (FNR) = FN/(TP+FN)
* true negative rate (TNR) = TN/(TN+FP), also known as specificity (SPC)
** with complement the false positive rate (FPR) = FP/(TN+FP)

The row ratios are independent of prevalence.

The column ratios are:

* positive predictive value (PPV, also known as precision) = TP/(TP+FP). This is the proportion of the ''population with a given test result'' for which the test is correct.
** with complement the false discovery rate (FDR) = FP/(TP+FP)
* negative predictive value (NPV) = TN/(TN+FN)
** with complement the false omission rate (FOR) = FN/(TN+FN)

The column ratios depend on prevalence.

In diagnostic testing, the main ratios used are the true row ratios – true positive rate and true negative rate – where they are known as sensitivity and specificity. In information retrieval, the main ratios are the true positive ratios (row and column) – positive predictive value and true positive rate – where they are known as precision and recall.

Cullerne Bown has suggested a flow chart for determining which pair of indicators should be used when. Otherwise, there is no general rule for deciding. There is also no general agreement on how the pair of indicators should be used to decide on concrete questions, such as when to prefer one classifier over another.

One can take ratios of a complementary pair of ratios, yielding four likelihood ratios (two column ratios of ratios, two row ratios of ratios). This is primarily done for the row (condition) ratios, yielding likelihood ratios in diagnostic testing. Taking the ratio of one of these groups of ratios yields a final ratio, the diagnostic odds ratio (DOR). This can also be defined directly as (TP×TN)/(FP×FN) = (TP/FN)/(FP/TN); it has a useful interpretation as an odds ratio and is prevalence-independent.
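The ratios above can be illustrated with a short Python sketch. The helper names (`basic_ratios`, `diagnostic_odds_ratio`) and the counts are hypothetical, and the code assumes every denominator is non-zero. The likelihood ratios are taken in their usual diagnostic-testing form, LR+ = TPR/FPR and LR− = FNR/TNR. The second call multiplies the condition-negative counts by ten to mimic a lower prevalence.

```python
def basic_ratios(tp, fn, fp, tn):
    """Return the eight basic ratios; assumes no row or column sum is zero."""
    return {
        # Ratios over the condition-positive and condition-negative totals
        "TPR": tp / (tp + fn), "FNR": fn / (tp + fn),
        "TNR": tn / (tn + fp), "FPR": fp / (tn + fp),
        # Ratios over the test-positive and test-negative totals
        "PPV": tp / (tp + fp), "FDR": fp / (tp + fp),
        "NPV": tn / (tn + fn), "FOR": fn / (tn + fn),
    }

def diagnostic_odds_ratio(tp, fn, fp, tn):
    """DOR defined directly as (TP*TN)/(FP*FN); assumes FP and FN are non-zero."""
    return (tp * tn) / (fp * fn)

# Hypothetical counts: 50 cases with the condition, 50 without.
r = basic_ratios(tp=40, fn=10, fp=5, tn=45)
lr_pos = r["TPR"] / r["FPR"]   # positive likelihood ratio
lr_neg = r["FNR"] / r["TNR"]   # negative likelihood ratio
dor = diagnostic_odds_ratio(40, 10, 5, 45)

# Same test applied where the condition is rarer:
# ten times as many condition-negative cases.
r_rare = basic_ratios(tp=40, fn=10, fp=50, tn=450)
dor_rare = diagnostic_odds_ratio(40, 10, 50, 450)

for name in ("TPR", "TNR", "PPV", "NPV"):
    print(f"{name}: {r[name]:.3f} -> {r_rare[name]:.3f}")
print(f"LR+ = {lr_pos:.3f}, LR- = {lr_neg:.3f}, LR+/LR- = {lr_pos / lr_neg:.3f}")
print(f"DOR = {dor:.3f} (rarer condition: {dor_rare:.3f})")
```

In this example TPR, TNR and the DOR are unchanged when the condition becomes rarer, while PPV and NPV shift, matching the statements about prevalence above.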

Other metrics

There are a number of other metrics, most simply the accuracy or Fraction Correct (FC), which measures the fraction of all instances that are correctly categorized; the complement is the Fraction Incorrect (FiC). The F-score combines precision and recall into one number via a choice of weighting, most simply equal weighting, as the balanced F-score (F1 score). Some metrics come from regression coefficients: the markedness and the informedness, and their geometric mean, the Matthews correlation coefficient. Other metrics include Youden's J statistic, the uncertainty coefficient, the phi coefficient, and Cohen's kappa.
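A few of these single-number metrics can be sketched in Python as follows. The function name `summary_metrics` and the counts are hypothetical, and the code assumes all denominators are non-zero. Informedness is computed as TPR + TNR − 1 and markedness as PPV + NPV − 1. The Matthews correlation coefficient is computed from its direct formula, so that its relation to the geometric mean of the two can be checked rather than assumed.

```python
import math

def summary_metrics(tp, fn, fp, tn):
    """Compute several single-number metrics from the four outcome counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    tnr = tn / (tn + fp)
    npv = tn / (tn + fn)
    return {
        "accuracy": (tp + tn) / total,              # Fraction Correct
        "fraction_incorrect": (fp + fn) / total,    # complement of accuracy
        # Balanced F-score: harmonic mean of precision and recall
        "f1": 2 * precision * recall / (precision + recall),
        "informedness": recall + tnr - 1,
        "markedness": precision + npv - 1,
        # Matthews correlation coefficient, direct formula
        "mcc": (tp * tn - fp * fn)
               / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

# Hypothetical counts.
m = summary_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in m.items():
    print(f"{name}: {value:.4f}")

# When informedness and markedness are both positive, the MCC equals
# their geometric mean.
geometric_mean = math.sqrt(m["informedness"] * m["markedness"])
print(f"sqrt(informedness * markedness): {geometric_mean:.4f}")
```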

Statistical binary classification

Statistical classification is a problem studied in machine learning in which the classification is performed on the basis of a classification rule. It is a type of supervised learning, a method of machine learning where the categories are predefined, and is used to categorize new probabilistic observations into said categories. When there are only two categories the problem is known as statistical binary classification.

Some of the methods commonly used for binary classification are:

* Decision trees
* Random forests
* Bayesian networks
* Support vector machines
* Neural networks
* Logistic regression
* Probit model
* Genetic programming
* Multi expression programming
* Linear genetic programming

No single classifier is best everywhere: each performs best only in particular domains, depending on the number of observations, the dimensionality of the feature vector, the noise in the data and many other factors. For example, random forests have been reported to perform better than SVM classifiers for 3D point clouds.
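To make the idea of learning a classification rule from labelled examples concrete, here is a minimal sketch of one of the listed methods, logistic regression, on a single feature. It uses plain batch gradient descent in pure Python. The function names, the toy data, the learning rate and the epoch count are all illustrative assumptions, and a real application would normally rely on an established library.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y   # predicted probability minus label
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(x, w, b, threshold=0.5):
    """Assign class 1 when the estimated probability reaches the threshold."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Hypothetical training set: small values labelled 0, large values labelled 1.
xs = [0.0, 1.0, 2.0, 3.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}, decision boundary near x = {-b / w:.2f}")
print("prediction for x = 2.0:", predict(2.0, w, b))
print("prediction for x = 6.0:", predict(6.0, w, b))
```

The learned rule is a single cutoff on the feature. The other methods in the list differ mainly in the family of rules they can represent and in how they are fitted.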

Converting continuous values to binary

Binary classification may be a form of dichotomization in which a continuous function is transformed into a binary variable. Tests whose results are of continuous values, such as most blood values, can artificially be made binary by defining a cutoff value, with test results being designated as positive or negative depending on whether the resultant value is higher or lower than the cutoff.

However, such conversion causes a loss of information, as the resultant binary classification does not tell ''how much'' above or below the cutoff a value is. As a result, when converting a continuous value that is close to the cutoff to a binary one, the resultant positive or negative predictive value is generally higher than the predictive value given directly from the continuous value. In such cases, the designation of the test as either positive or negative gives the appearance of an inappropriately high certainty, while the value is in fact in an interval of uncertainty. For example, with the urine concentration of hCG as a continuous value, a urine pregnancy test that measured 52 mIU/ml of hCG may show as "positive" with 50 mIU/ml as cutoff, but is in fact in an interval of uncertainty, which may be apparent only by knowing the original continuous value.

On the other hand, a test result very far from the cutoff generally has a resultant positive or negative predictive value that is lower than the predictive value given from the continuous value. For example, a urine hCG value of 200,000 mIU/ml confers a very high probability of pregnancy, but after conversion to binary values it shows as just as "positive" as the value of 52 mIU/ml.
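The loss of information can be seen in a small Python sketch that applies the 50 mIU/ml cutoff from the example to the two hCG readings. The function name `dichotomize` is hypothetical, and treating a value exactly equal to the cutoff as positive is an assumption, since the text only speaks of values higher or lower than the cutoff.

```python
CUTOFF_MIU_PER_ML = 50  # cutoff used in the example above

def dichotomize(value, cutoff=CUTOFF_MIU_PER_ML):
    """Turn a continuous reading into a binary label.

    Assumption: a reading exactly at the cutoff is reported as positive.
    """
    return "positive" if value >= cutoff else "negative"

readings = [52, 200_000]  # urine hCG in mIU/ml, from the example
labels = [dichotomize(v) for v in readings]

for value, label in zip(readings, labels):
    # The ratio to the cutoff is exactly the information the label discards.
    print(f"{value:>7} mIU/ml -> {label} ({value / CUTOFF_MIU_PER_ML:.2f}x the cutoff)")
```

Both readings receive the same label, although one is barely above the cutoff and the other is 4,000 times the cutoff.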

Bibliography

* Nello Cristianini and John Shawe-Taylor. ''An Introduction to Support Vector Machines and other kernel-based learning methods''. Cambridge University Press, 2000.
* John Shawe-Taylor and Nello Cristianini. ''Kernel Methods for Pattern Analysis''. Cambridge University Press, 2004.
* Bernhard Schölkopf and A. J. Smola. ''Learning with Kernels''. MIT Press, Cambridge, Massachusetts, 2002.
