P4-metric

P4 metric enables performance evaluation of the binary classifier. It is calculated from precision, recall, specificity and NPV (negative predictive value). P4 is designed in a similar way to the F1 metric, while addressing the criticisms leveled against F1; it may be perceived as its extension. Like the other known metrics, P4 is a function of: TP (true positives), TN (true negatives), FP (false positives) and FN (false negatives).


Justification

The key concept of P4 is to leverage the four key conditional probabilities:
:<math>P(+ \mid C+)</math> - the probability that the sample is positive, provided the classifier result was positive.
:<math>P(C+ \mid +)</math> - the probability that the classifier result will be positive, provided the sample is positive.
:<math>P(C- \mid -)</math> - the probability that the classifier result will be negative, provided the sample is negative.
:<math>P(- \mid C-)</math> - the probability that the sample is negative, provided the classifier result was negative.
The main assumption behind this metric is that a properly designed binary classifier should give results for which all the probabilities mentioned above are close to 1. P4 is designed so that <math>\mathrm{P}_4 = 1</math> requires all these probabilities to equal 1. It likewise goes to zero when any of these probabilities goes to zero.
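These four probabilities coincide with precision (PPV), recall (TPR), specificity (TNR) and NPV, respectively. As a minimal sketch (in Python; the function name is illustrative, and all denominators are assumed nonzero), they can be estimated from confusion-matrix counts:

<syntaxhighlight lang="python">
def conditional_probabilities(tp: int, tn: int, fp: int, fn: int):
    """Estimate the four key conditional probabilities from confusion-matrix counts."""
    precision   = tp / (tp + fp)  # P(+ | C+): sample positive given classifier positive
    recall      = tp / (tp + fn)  # P(C+ | +): classifier positive given sample positive
    specificity = tn / (tn + fp)  # P(C- | -): classifier negative given sample negative
    npv         = tn / (tn + fn)  # P(- | C-): sample negative given classifier negative
    return precision, recall, specificity, npv
</syntaxhighlight>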


Definition

P4 is defined as a harmonic mean of the four key conditional probabilities:
:<math>\mathrm{P}_4 = \frac{4}{\frac{1}{P(+ \mid C+)} + \frac{1}{P(C+ \mid +)} + \frac{1}{P(C- \mid -)} + \frac{1}{P(- \mid C-)}} = \frac{4}{\frac{1}{\text{precision}} + \frac{1}{\text{recall}} + \frac{1}{\text{specificity}} + \frac{1}{\text{NPV}}}</math>
In terms of TP, TN, FP, FN it can be calculated as follows:
:<math>\mathrm{P}_4 = \frac{4 \cdot TP \cdot TN}{4 \cdot TP \cdot TN + (TP + TN) \cdot (FP + FN)}</math>
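The closed form follows by substituting the count-based estimates of the four probabilities into the harmonic mean. A self-contained sketch checking this equivalence (helper names are hypothetical; counts assumed nonzero):

<syntaxhighlight lang="python">
def p4_harmonic(tp, tn, fp, fn):
    # Harmonic mean of precision, recall, specificity and NPV.
    probs = (tp / (tp + fp),  # precision
             tp / (tp + fn),  # recall
             tn / (tn + fp),  # specificity
             tn / (tn + fn))  # NPV
    return 4.0 / sum(1.0 / p for p in probs)

def p4_closed_form(tp, tn, fp, fn):
    # Equivalent expression directly in terms of the confusion matrix.
    return 4.0 * tp * tn / (4.0 * tp * tn + (tp + tn) * (fp + fn))

# Both formulations agree, e.g.:
assert abs(p4_harmonic(95, 80, 20, 5) - p4_closed_form(95, 80, 20, 5)) < 1e-12
</syntaxhighlight>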


Evaluation of binary classifier performance

Evaluating the performance of a binary classifier is a multidisciplinary concept. It spans from the evaluation of medical and psychiatric tests to machine learning classifiers from a variety of fields. Thus, many metrics are in use under several names, some of them having been defined independently.


Properties of the P4 metric

* Symmetry - in contrast to the F1 metric, P4 is symmetric: it does not change its value when the dataset labeling is swapped, i.e. when positives are renamed negatives and negatives are renamed positives (see the sketch after this list).
* Range: <math>\mathrm{P}_4 \in [0, 1]</math>
* Achieving <math>\mathrm{P}_4 \approx 1</math> requires all four key conditional probabilities to be close to 1.
* For <math>\mathrm{P}_4 \approx 0</math> it is sufficient that one of the four key conditional probabilities is close to 0.
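The symmetry property can be checked numerically: swapping the labels exchanges TP with TN and FP with FN. A minimal illustrative sketch (the helper definitions are assumptions, not from the source):

<syntaxhighlight lang="python">
def f1(tp, fp, fn):
    # F1: harmonic mean of precision and recall; note it ignores TN.
    return 2.0 * tp / (2.0 * tp + fp + fn)

def p4(tp, tn, fp, fn):
    return 4.0 * tp * tn / (4.0 * tp * tn + (tp + tn) * (fp + fn))

tp, tn, fp, fn = 95, 80, 20, 5
# Swapping labels maps (TP, TN, FP, FN) -> (TN, TP, FN, FP).
assert p4(tp, tn, fp, fn) == p4(tn, tp, fn, fp)  # P4 is unchanged
assert f1(tp, fp, fn) != f1(tn, fn, fp)          # F1 generally changes
</syntaxhighlight>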


Examples, comparing with other metrics

Dependency table for selected metrics ("true" means the metric depends on the given probability, "false" - it does not):

{| class="wikitable"
! Probability !! P4 !! F1 !! Informedness !! Markedness
|-
| <math>P(+ \mid C+)</math> (precision) || true || true || false || true
|-
| <math>P(C+ \mid +)</math> (recall) || true || true || true || false
|-
| <math>P(C- \mid -)</math> (specificity) || true || false || true || false
|-
| <math>P(- \mid C-)</math> (NPV) || true || false || false || true
|}

Metrics that do not depend on a given probability are prone to misrepresentation when it approaches 0.


Example 1: Rare disease detection test

Let us consider a medical test aimed at detecting a rare disease. The population size is 100,000, while 0.05% of the population is infected. Test performance: 95% of all positive individuals are classified correctly (TPR = 0.95) and 95% of all negative individuals are classified correctly (TNR = 0.95). In such a case, due to the high population imbalance, in spite of the high test accuracy (0.95), the probability that an individual who has been classified as positive is in fact positive is very low:
:<math>P(+ \mid C+) = 0.0095</math>
And now we can observe how this low probability is reflected in some of the metrics:
* <math>\mathrm{P}_4 = 0.0370</math>
* <math>\mathrm{F}_1 = 0.0188</math>
* <math>\mathrm{BM} = \mathbf{0.90}</math> (''Informedness'' / ''Youden index'')
* <math>\mathrm{MK} = 0.0095</math> (''Markedness'')
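These figures can be reproduced from the stated prevalence, TPR and TNR; the following self-contained sketch is illustrative (note that BM = TPR + TNR - 1 stays high because Informedness does not depend on precision):

<syntaxhighlight lang="python">
n, prevalence, tpr, tnr = 100_000, 0.0005, 0.95, 0.95

pos = n * prevalence                 # 50 infected individuals
neg = n - pos                        # 99,950 healthy individuals
tp, fn = pos * tpr, pos * (1 - tpr)  # 47.5 / 2.5 (expected counts)
tn, fp = neg * tnr, neg * (1 - tnr)  # 94,952.5 / 4,997.5

precision = tp / (tp + fp)           # ~0.0094, i.e. P(+ | C+)
f1 = 2 * tp / (2 * tp + fp + fn)                          # ~0.019
p4 = 4 * tp * tn / (4 * tp * tn + (tp + tn) * (fp + fn))  # ~0.037
bm = tpr + tnr - 1                                        # 0.90
</syntaxhighlight>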


Example 2: Image recognition - cats vs dogs

We are training a neural-network-based image classifier. We consider only two types of images: those containing dogs (labeled as 0) and those containing cats (labeled as 1). Thus, our goal is to distinguish between cats and dogs. The classifier overpredicts in favor of cats ("positive" samples): 99.99% of cats are classified correctly, but only 1% of dogs are classified correctly. The image dataset consists of 100,000 images, 90% of which are pictures of cats and 10% of which are pictures of dogs. In such a situation, the probability that a picture containing a dog will be classified correctly is very low:
:<math>P(C- \mid -) = 0.01</math>
Not all the metrics notice this low probability:
* <math>\mathrm{P}_4 = 0.0388</math>
* <math>\mathrm{F}_1 = \mathbf{0.9478}</math>
* <math>\mathrm{BM} = 0.0099</math> (''Informedness'' / ''Youden index'')
* <math>\mathrm{MK} = \mathbf{0.8183}</math> (''Markedness'')
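As before, the values follow from the stated class balance and per-class accuracies; a minimal sketch under those assumptions:

<syntaxhighlight lang="python">
tp = 90_000 * 0.9999                 # 89,991 cats classified correctly
fn = 90_000 - tp                     # 9 cats misclassified as dogs
tn = 10_000 * 0.01                   # 100 dogs classified correctly
fp = 10_000 - tn                     # 9,900 dogs misclassified as cats

f1 = 2 * tp / (2 * tp + fp + fn)                          # ~0.948, misleadingly high
p4 = 4 * tp * tn / (4 * tp * tn + (tp + tn) * (fp + fn))  # ~0.039, exposes the failure
</syntaxhighlight>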


See also

* F-score
* Informedness
* Markedness
* Matthews correlation coefficient
* Precision and recall
* Sensitivity and specificity
* NPV (negative predictive value)
* Confusion matrix

