
In statistical classification, the Bayes classifier minimizes the probability of misclassification.


Definition

Suppose a pair (X,Y) takes values in \mathbb{R}^d \times \{1,2,\dots,K\}, where Y is the class label of X. Assume that the conditional distribution of ''X'', given that the label ''Y'' takes the value ''r'', is given by

:(X\mid Y=r) \sim P_r \qquad \text{for } r=1,2,\dots,K,

where "\sim" means "is distributed as", and where P_r denotes a probability distribution.

A classifier is a rule that assigns to an observation ''X''=''x'' a guess or estimate of what the unobserved label ''Y''=''r'' actually was. In theoretical terms, a classifier is a measurable function C: \mathbb{R}^d \to \{1,2,\dots,K\}, with the interpretation that ''C'' classifies the point ''x'' to the class ''C''(''x''). The probability of misclassification, or risk, of a classifier ''C'' is defined as

:\mathcal{R}(C) = \operatorname{P}\{C(X) \neq Y\}.

The Bayes classifier is

:C^\text{Bayes}(x) = \underset{r \in \{1,2,\dots,K\}}{\operatorname{argmax}} \operatorname{P}(Y=r \mid X=x).

In practice, as in most of statistics, the difficulties and subtleties are associated with modeling the probability distributions effectively, in this case \operatorname{P}(Y=r \mid X=x). The Bayes classifier is a useful benchmark in statistical classification.

The ''excess risk'' of a general classifier C (possibly depending on some training data) is defined as \mathcal{R}(C) - \mathcal{R}(C^\text{Bayes}). Thus this non-negative quantity is important for assessing the performance of different classification techniques. A classifier is said to be consistent if its excess risk converges to zero as the size of the training data set tends to infinity.

Considering the components x_i of x to be mutually independent, we get the naive Bayes classifier, where

:C^\text{Bayes}(x) = \underset{r \in \{1,2,\dots,K\}}{\operatorname{argmax}} \operatorname{P}(Y=r)\prod_{i=1}^{d}P_r(x_i).
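For illustration, the following is a minimal sketch (not part of the article) of how the decision rule above could be evaluated when the class priors \operatorname{P}(Y=r) and class-conditional densities P_r are assumed known; the one-dimensional Gaussian densities, the priors, and all names such as bayes_classify are assumptions made only for the example, together with the naive Bayes variant that factorises P_r over the coordinates of x.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

# Illustrative, assumed-known model: class priors P(Y=r) and one-dimensional
# Gaussian class-conditional densities P_r.  These choices are hypothetical.
priors = {1: 0.6, 2: 0.4}
densities = {1: norm(loc=0.0, scale=1.0), 2: norm(loc=2.0, scale=1.0)}

def bayes_classify(x):
    """Return argmax_r P(Y=r | X=x); by Bayes' theorem this is the r
    maximising the joint weight P(Y=r) * P_r(x)."""
    scores = {r: priors[r] * densities[r].pdf(x) for r in priors}
    return max(scores, key=scores.get)

def naive_bayes_classify(x_vector, per_coordinate_densities):
    """Naive Bayes variant: the coordinates of a d-dimensional observation
    are treated as independent, so P_r(x) is replaced by the product of
    per-coordinate densities prod_i P_r(x_i)."""
    scores = {
        r: priors[r] * np.prod([d.pdf(xi) for d, xi in
                                zip(per_coordinate_densities[r], x_vector)])
        for r in priors
    }
    return max(scores, key=scores.get)

# Points near 0 are assigned to class 1, points near 2 to class 2.
print(bayes_classify(0.3))   # -> 1
print(bayes_classify(1.8))   # -> 2

per_coord = {1: [norm(0, 1), norm(0, 1)], 2: [norm(2, 1), norm(2, 1)]}
print(naive_bayes_classify([0.1, -0.2], per_coord))  # -> 1
</syntaxhighlight>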


Proof of Optimality

Proof that the Bayes classifier is optimal and that the Bayes error rate is minimal proceeds as follows.

Define the variables: risk R(h), Bayes risk R^*, and the set of possible classes to which a point can be assigned, Y \in \{0,1\}. Let the posterior probability of a point belonging to class 1 be \eta(x)=\Pr(Y=1\mid X=x). Define the classifier h^* as

:h^*(x)=\begin{cases}1 & \text{if } \eta(x)\geqslant 0.5,\\ 0 & \text{if } \eta(x)<0.5.\end{cases}

Then we have the following results:

(a) R(h^*)=R^*, i.e. h^* is a Bayes classifier;

(b) for any classifier h, the ''excess risk'' satisfies R(h)-R^* = 2\,\mathbb{E}_X\!\left[\,|\eta(X)-0.5|\cdot \mathbb{1}_{\{h(X)\ne h^*(X)\}}\right];

(c) R^* = \mathbb{E}_X\!\left[\min(\eta(X),\,1-\eta(X))\right].

Proof of (a): For any classifier h, we have

:R(h) = \mathbb{E}_{XY}\!\left[\mathbb{1}_{\{h(X)\ne Y\}}\right] = \mathbb{E}_X\,\mathbb{E}_{Y\mid X}\!\left[\mathbb{1}_{\{h(X)\ne Y\}}\right] (due to Fubini's theorem)

::= \mathbb{E}_X\!\left[\eta(X)\,\mathbb{1}_{\{h(X)=0\}} + (1-\eta(X))\,\mathbb{1}_{\{h(X)=1\}}\right].

Notice that R(h) is minimised by taking, for every x \in X,

:h(x)=\begin{cases}1 & \text{if } \eta(x)\geqslant 1-\eta(x),\\ 0 & \text{otherwise.}\end{cases}

Therefore the minimum possible risk is the Bayes risk, R^* = R(h^*).

Proof of (b):

:\begin{align} R(h)-R^* &= R(h)-R(h^*)\\ &= \mathbb{E}_X\!\left[\eta(X)\,\mathbb{1}_{\{h(X)=0\}}+(1-\eta(X))\,\mathbb{1}_{\{h(X)=1\}}-\eta(X)\,\mathbb{1}_{\{h^*(X)=0\}}-(1-\eta(X))\,\mathbb{1}_{\{h^*(X)=1\}}\right]\\ &= \mathbb{E}_X\!\left[\,|2\eta(X)-1|\,\mathbb{1}_{\{h(X)\ne h^*(X)\}}\right]\\ &= 2\,\mathbb{E}_X\!\left[\,|\eta(X)-0.5|\,\mathbb{1}_{\{h(X)\ne h^*(X)\}}\right]. \end{align}

Proof of (c):

:\begin{align} R(h^*) &= \mathbb{E}_X\!\left[\eta(X)\,\mathbb{1}_{\{h^*(X)=0\}}+(1-\eta(X))\,\mathbb{1}_{\{h^*(X)=1\}}\right]\\ &= \mathbb{E}_X\!\left[\min(\eta(X),\,1-\eta(X))\right]. \end{align}

The general case, in which the Bayes classifier minimises classification error when each element can belong to any of ''n'' categories, proceeds by the tower property of conditional expectation as follows:

:\begin{align} \mathbb{E}\!\left[\mathbb{1}_{\{Y\ne \hat{y}(X)\}}\right] &= \mathbb{E}_X\,\mathbb{E}_{Y\mid X}\!\left[\mathbb{1}_{\{Y\ne \hat{y}(X)\}}\,\middle|\, X=x\right]\\ &= \mathbb{E}_X\!\left[\Pr(Y=1\mid X=x)\,\mathbb{1}_{\{\hat{y}\ne 1\}}+\Pr(Y=2\mid X=x)\,\mathbb{1}_{\{\hat{y}\ne 2\}}+\dots+\Pr(Y=n\mid X=x)\,\mathbb{1}_{\{\hat{y}\ne n\}}\right]. \end{align}

This is minimised by simultaneously minimising every term of the expectation, i.e. by using the classifier

:h(x)=k, \qquad k=\underset{j\in\{1,\dots,n\}}{\arg\max}\ \Pr(Y=j\mid X=x),

for each observation ''x''.
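Results (b) and (c) can also be checked numerically. The sketch below (illustrative only; the logistic form of \eta and all variable names are assumptions made for the example) simulates a two-class model with known posterior \eta(x), compares the Monte Carlo risk of h^* with \mathbb{E}_X[\min(\eta(X),\,1-\eta(X))], and verifies the excess-risk identity for a deliberately suboptimal threshold classifier.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def eta(x):
    """Assumed posterior Pr(Y=1 | X=x), here a logistic function of x."""
    return 1.0 / (1.0 + np.exp(-2.0 * x))

# Simulate (X, Y): X ~ N(0, 1), then Y | X=x ~ Bernoulli(eta(x)).
n = 1_000_000
x = rng.standard_normal(n)
y = (rng.random(n) < eta(x)).astype(int)

# Bayes classifier h*(x) = 1 iff eta(x) >= 0.5, and a suboptimal competitor h.
h_star = (eta(x) >= 0.5).astype(int)
h = (eta(x) >= 0.7).astype(int)

# (c): the Monte Carlo estimate of R(h*) should match E_X[min(eta(X), 1 - eta(X))].
print(np.mean(h_star != y), np.mean(np.minimum(eta(x), 1 - eta(x))))

# (b): R(h) - R(h*) should match 2 * E_X[|eta(X) - 0.5| * 1{h(X) != h*(X)}].
print(np.mean(h != y) - np.mean(h_star != y),
      2 * np.mean(np.abs(eta(x) - 0.5) * (h != h_star)))
</syntaxhighlight>

Both printed pairs agree up to Monte Carlo error under these assumptions.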


See also

* Naive Bayes classifier


References

{{Reflist}}

Categories: Bayesian statistics, Statistical classification