In statistical classification, the Bayes error rate is the lowest possible error rate for any classifier of a random outcome (into, for example, one of two categories) and is analogous to the irreducible error (Tumer, K. (1996), "Estimating the Bayes error rate through classifier combining", in ''Proceedings of the 13th International Conference on Pattern Recognition'', Volume 2, 695–699). Several approaches to estimating the Bayes error rate exist. One seeks analytical bounds, which depend inherently on distribution parameters and are hence difficult to estimate. Another focuses on class densities, while yet another combines and compares various classifiers. The Bayes error rate finds important use in the study of patterns and machine learning techniques.

Error determination

In machine learning and pattern classification, the labels of a set of random observations can be divided into two or more classes. Each observation is called an ''instance'' and the class it belongs to is its ''label''. The Bayes error rate of the data distribution is the probability that an instance is misclassified by a classifier that knows the true class probabilities given the predictors. For a multiclass classifier, the expected prediction error may be calculated as follows:

EPE = E_x\left[\sum_{k=1}^K L(C_k, \hat{C}(x)) \, P(C_k \mid x)\right]

where ''x'' is the instance, ''E[]'' the expectation value, ''C_k'' is a class into which an instance is classified, \hat{C}(x) is the class the classifier predicts for ''x'', ''P(C_k | x)'' is the conditional probability of label ''k'' for instance ''x'', and ''L()'' is the 0–1 loss function:

L(x,y) = 1 - \delta_{x,y} = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \neq y \end{cases},

where \delta_{x,y} is the Kronecker delta. When the learner knows the conditional probability, one solution is:

\hat{C}_B(x) = \arg\max_{k \in \{1, \dots, K\}} P(C_k \mid X = x)

This solution is known as the Bayes classifier. The corresponding expected prediction error is called the Bayes error rate:

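BE = E_x\left[\sum_{k=1}^K L(C_k, \hat{C}_B(x)) \, P(C_k \mid x)\right] = E_x\left[\sum_{k=1,\, C_k \neq \hat{C}_B(x)}^K P(C_k \mid x)\right] = E_x\left[1 - P(\hat{C}_B(x) \mid x)\right]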

where the sum can be omitted in the last step by considering the complementary event. By its definition, the Bayes classifier maximizes P(\hat{C}_B(x) \mid x) and therefore minimizes the Bayes error ''BE''. The Bayes error is non-zero if the classification labels are not deterministic, i.e., if there is a non-zero probability of a given instance belonging to more than one class. In a regression context with squared error, the Bayes error is equal to the noise variance.
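The definitions in this section can be checked numerically on a small discrete problem. The following is a minimal sketch, not a reference implementation: the three-instance, two-class distribution (`p_x`, `posterior`) and the function names are made up for illustration. It evaluates the expected prediction error in its sum form for the Bayes classifier and compares it with the shortcut form that uses only the largest posterior probability.

```python
import numpy as np

# Hypothetical toy distribution (an assumption for illustration only):
# p_x[i] is P(X = x_i), posterior[i, k] is P(C_k | X = x_i).
p_x = np.array([0.5, 0.3, 0.2])
posterior = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
    [0.5, 0.5],
])

def bayes_classifier(posterior):
    """For each instance, pick the class with the largest posterior."""
    return np.argmax(posterior, axis=1)

def expected_prediction_error(p_x, posterior, predicted):
    """EPE = E_x[ sum_k L(C_k, C_hat(x)) P(C_k | x) ] under 0-1 loss."""
    n_classes = posterior.shape[1]
    # loss[i, k] is 1 when class k differs from the prediction for x_i.
    loss = (np.arange(n_classes)[None, :] != np.asarray(predicted)[:, None])
    return float(np.sum(p_x * np.sum(loss * posterior, axis=1)))

def bayes_error(p_x, posterior):
    """BE = E_x[ 1 - max_k P(C_k | x) ] for a discrete instance space."""
    return float(np.sum(p_x * (1.0 - posterior.max(axis=1))))

labels = bayes_classifier(posterior)
epe_of_bayes = expected_prediction_error(p_x, posterior, labels)
be = bayes_error(p_x, posterior)
print(labels, epe_of_bayes, be)
```

Both quantities agree for this distribution (0.5·0.1 + 0.3·0.4 + 0.2·0.5 = 0.27). The third instance has tied posteriors, so either label gives the same error there; `np.argmax` simply returns the first maximum.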

Proof of Minimality

A proof that the Bayes error rate is indeed the minimum possible, and that the Bayes classifier is therefore optimal, can be found in the article on the Bayes classifier.
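Minimality can also be illustrated (though not proved) by brute force on a small example. The sketch below assumes a made-up distribution over four instances and three classes, enumerates every deterministic classifier, and checks that none has a lower error rate than the one choosing the largest posterior. All names and numbers are hypothetical.

```python
import itertools
import numpy as np

# Hypothetical distribution (an assumption for illustration only):
# p_x[i] is P(X = x_i), posterior[i, k] is P(C_k | X = x_i).
p_x = np.array([0.4, 0.3, 0.2, 0.1])
posterior = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.60, 0.30],
    [0.30, 0.30, 0.40],
    [0.50, 0.25, 0.25],
])
n_instances, n_classes = posterior.shape

def error_rate(predicted):
    """E_x[ 1 - P(C_hat(x) | x) ] for a tuple of predicted class indices."""
    hit = posterior[np.arange(n_instances), list(predicted)]
    return float(np.sum(p_x * (1.0 - hit)))

# The Bayes classifier picks the class with the largest posterior.
bayes_labels = tuple(int(k) for k in np.argmax(posterior, axis=1))
bayes_err = error_rate(bayes_labels)

# Enumerate all n_classes ** n_instances deterministic classifiers.
all_errors = {
    labels: error_rate(labels)
    for labels in itertools.product(range(n_classes), repeat=n_instances)
}
best_err = min(all_errors.values())
print(bayes_labels, bayes_err, best_err)
```

Only deterministic classifiers are enumerated here. A randomized classifier's error is a weighted average of deterministic ones, so it cannot fall below the same minimum.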
