The Kolmogorov axioms are the foundations of

probability theory

introduced by Russian mathematician

Andrey Kolmogorov

in 1933. These axioms remain central to the field and have direct applications in mathematics, the physical sciences, and real-world probability problems. An alternative approach to formalising probability, favoured by some Bayesians, is given by

Cox's theorem.

Axioms

The assumptions needed to set up the axioms can be summarised as follows. Let

(\Omega, F, P)

be a

measure space

with

P(E)

being the

probability

of some

event

E, and

P(\Omega) = 1

. Then

(\Omega, F, P)

is a

probability space

, with sample space

\Omega

, event space

F

and

probability measure

P.

First axiom

The probability of an event is a non-negative real number:

P(E)\in\mathbb{R}, P(E)\geq 0 \qquad \forall E \in F

where

F

is the event space. It follows that

P(E)

is always finite, in contrast with more general measure theory. Theories which assign

negative probability

relax the first axiom.

Second axiom

This is the assumption of

unit measure

: that the probability that at least one of the

elementary events in the entire sample space will occur is 1:

P(\Omega) = 1.

Third axiom

This is the assumption of σ-additivity: any

countable

sequence of

disjoint sets

(synonymous with ''

mutually exclusive

'' events)

E_1, E_2, \ldots

satisfies

P\left(\bigcup_{i=1}^\infty E_i\right) = \sum_{i=1}^\infty P(E_i).

Some authors consider merely

finitely additive

probability spaces, in which case one just needs an

algebra of sets

, rather than a

σ-algebra.

Quasiprobability distributions in general relax the third axiom.
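As an illustration, the three axioms can be checked mechanically on a small finite sample space. The sketch below is only a toy check under stated assumptions: the event space is taken to be the full power set of a finite Ω, and the helper names `power_set` and `satisfies_axioms`, the fair-die example and the deliberately non-additive function `squared` are all hypothetical. On a finite space, additivity over every pair of disjoint events is enough to imply the third axiom, because any countable sequence of pairwise disjoint events contains only finitely many non-empty sets.

```python
from fractions import Fraction
from itertools import chain, combinations


def power_set(omega):
    """Return every subset of a finite sample space as a frozenset."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def satisfies_axioms(omega, prob):
    """Check the three axioms for a set function on the power set of a finite omega."""
    events = power_set(omega)
    # First axiom: every event has a non-negative probability.
    non_negative = all(prob(e) >= 0 for e in events)
    # Second axiom: the whole sample space has probability 1.
    unit_measure = prob(frozenset(omega)) == 1
    # Third axiom (finite case): additivity over every pair of disjoint events.
    additive = all(
        prob(a | b) == prob(a) + prob(b)
        for a in events
        for b in events
        if not (a & b)
    )
    return non_negative and unit_measure and additive


# Hypothetical example: a fair six-sided die with the uniform measure.
die = frozenset(range(1, 7))
uniform = lambda event: Fraction(len(event), len(die))
# A set function that is non-negative and has unit total mass but is not additive.
squared = lambda event: Fraction(len(event), len(die)) ** 2

print(satisfies_axioms(die, uniform))  # True
print(satisfies_axioms(die, squared))  # False
```

Exact rational arithmetic (`Fraction`) is used so that the equality tests are not affected by floating-point rounding.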

Consequences

From the

Kolmogorov

axioms, one can deduce other useful rules for studying probabilities. The proofs of these rules illustrate the power of the third axiom and its interaction with the other two axioms. Four immediate corollaries and their proofs are shown below:

Monotonicity

\quad\text{if}\quad A\subseteq B\quad\text{then}\quad P(A)\leq P(B).

If A is a subset of, or equal to, B, then the probability of A is less than or equal to the probability of B.

''Proof of monotonicity''

In order to verify the monotonicity property, we set

E_1=A

and

E_2=B\setminus A

, where

A\subseteq B

and

E_i=\varnothing

for

i\geq 3

. From the properties of the

empty set

(

\varnothing

), it is easy to see that the sets

E_i

are pairwise disjoint and

E_1\cup E_2\cup\cdots=B

. Hence, we obtain from the third axiom that

P(A)+P(B\setminus A)+\sum_{i=3}^\infty P(E_i)=P(B).

Since, by the first axiom, the left-hand side of this equation is a series of non-negative numbers, and since it converges to

P(B)

which is finite, we obtain both

P(A)\leq P(B)

and

P(\varnothing)=0.
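The decomposition used in this proof can be traced numerically. The following sketch assumes a fair six-sided die with the uniform measure; the sets `A` and `B` and the helper `prob` are hypothetical choices made only for illustration.

```python
from fractions import Fraction

# Hypothetical setup: a fair die, uniform measure on subsets of {1, ..., 6}.
OMEGA = frozenset(range(1, 7))


def prob(event):
    """Uniform probability of an event contained in OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))


A = frozenset({2, 4})
B = frozenset({2, 4, 6})  # A is a subset of B

# B splits into the disjoint pieces A and B \ A, so their probabilities add up.
lhs = prob(A) + prob(B - A)
print(lhs == prob(B))       # True
# Since P(B \ A) >= 0 by the first axiom, P(A) cannot exceed P(B).
print(prob(A) <= prob(B))   # True
```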

The probability of the empty set

P(\varnothing)=0.

In many cases,

\varnothing

is not the only event with probability 0.

''Proof of probability of the empty set''

Define

E_i := \varnothing

for

i \in \N

, then these are disjoint, and

\bigcup_{i=1}^\infty E_i = \varnothing = E_1

, hence by the third axiom

\sum_{i=1}^\infty P(E_i) = P(E_1)

; subtracting

P(E_1)

(which is finite by the first axiom) yields

\sum_{i=2}^\infty P(E_i) = 0

. From this together with the first axiom follows

0 \leq P(E_2) \leq \sum_{i=2}^\infty P(E_i) = 0

, thus

P(E_2) = P(\varnothing) = 0.

The complement rule

P\left(A^{c}\right) = P(\Omega\setminus A) = 1 - P(A)

''Proof of the complement rule''

Given

A

and

A^c

are mutually exclusive and that

A \cup A^c = \Omega

P(A \cup A^c)=P(A)+P(A^c)

... ''(by axiom 3)'' and,

P(A \cup A^c)=P(\Omega)=1

... ''(by axiom 2)''

\Rightarrow P(A)+P(A^c)=1

\therefore    P(A^c)=1-P(A)
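A small numeric instance of the complement rule, as a sketch only: the loaded die below, its weights and the event `A` are invented for illustration, and `prob` simply sums the weights of the outcomes in an event.

```python
from fractions import Fraction

# Hypothetical loaded die: outcome -> probability of that elementary event.
WEIGHTS = {
    1: Fraction(1, 2),
    2: Fraction(1, 10),
    3: Fraction(1, 10),
    4: Fraction(1, 10),
    5: Fraction(1, 10),
    6: Fraction(1, 10),
}
OMEGA = frozenset(WEIGHTS)


def prob(event):
    """Probability of an event as the sum of its outcomes' weights."""
    return sum((WEIGHTS[o] for o in event), Fraction(0))


A = frozenset({1, 2})
A_c = OMEGA - A  # complement of A within the sample space

print(prob(A))                    # 3/5
print(prob(A_c))                  # 2/5
print(prob(A_c) == 1 - prob(A))   # True
```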

The numeric bound

It immediately follows from the monotonicity property that

0\leq P(E)\leq 1\qquad \forall E\in F.

''Proof of the numeric bound''

Given the complement rule

P(E^c)=1-P(E)

and ''axiom 1''

P(E^c)\geq0

1-P(E) \geq 0

\Rightarrow 1 \geq P(E)

\therefore 0\leq P(E)\leq 1

Further consequences

Another important property is:

P(A \cup B) = P(A) + P(B) - P(A \cap B).

This is called the addition law of probability, or the sum rule. That is, the probability that ''A'' ''or'' ''B'' occurs is the sum of the probability of ''A'' and the probability of ''B'', minus the probability that both ''A'' ''and'' ''B'' occur. The proof is as follows. Firstly,

P(A\cup B) = P(A) + P(B\setminus A)

... ''(by axiom 3)''. So,

P(A \cup B) = P(A) + P(B\setminus  (A \cap B))

(by

B \setminus A = B\setminus  (A \cap B)

). Also,

P(B) = P(B\setminus (A \cap B)) + P(A \cap B)

and eliminating

P(B\setminus (A \cap B))

from both equations gives us the desired result. An extension of the addition law to any number of sets is the

inclusion–exclusion principle

. Setting ''B'' to the complement ''A^c'' of ''A'' in the addition law gives

P\left(A^{c}\right) = P(\Omega\setminus A) = 1 - P(A).

That is, the probability that any event will ''not'' happen (or the event's

complement

) is 1 minus the probability that it will.
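The addition law and its extension to several sets can be verified on a finite example. This is a minimal sketch assuming a fair die with the uniform measure; the events `A`, `B`, `C` and the helper `union_by_inclusion_exclusion` are hypothetical names introduced here.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical setup: a fair die, uniform measure on subsets of {1, ..., 6}.
OMEGA = frozenset(range(1, 7))


def prob(event):
    """Uniform probability of an event contained in OMEGA."""
    return Fraction(len(event), len(OMEGA))


def union_by_inclusion_exclusion(events):
    """Probability of a union as an alternating sum over intersections."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1
        for combo in combinations(events, k):
            total += sign * prob(frozenset.intersection(*combo))
    return total


A = frozenset({1, 2, 3})
B = frozenset({3, 4})
C = frozenset({1, 4, 5})

# Addition law for two events.
print(prob(A | B) == prob(A) + prob(B) - prob(A & B))               # True
# Inclusion-exclusion for three events agrees with the direct value.
print(union_by_inclusion_exclusion([A, B, C]) == prob(A | B | C))   # True
```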

Simple example: coin toss

Consider a single coin-toss, and assume that the coin will either land heads (H) or tails (T) (but not both). No assumption is made as to whether the coin is fair. We may define:

\Omega = \{H, T\}

F = \{\varnothing, \{H\}, \{T\}, \{H,T\}\}

Kolmogorov's axioms imply that:

P(\varnothing) = 0

The probability of ''neither'' heads ''nor'' tails is 0.

P(\{H,T\}^c) = 0

Equivalently, the probability of ''either'' heads ''or'' tails is 1.

P(\{H\}) + P(\{T\}) = 1

The sum of the probability of heads and the probability of tails is 1.
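The coin-toss space is small enough to write out in full. The sketch below is one possible encoding, not a canonical one: the function name `coin_measure`, the use of the strings "H" and "T" for outcomes, and the bias 3/10 are all assumptions made for illustration. It shows that the whole measure is determined by the single number P({H}).

```python
from fractions import Fraction


def coin_measure(p_heads):
    """Build P on F = {∅, {H}, {T}, {H,T}} from the single parameter P({H})."""
    p_heads = Fraction(p_heads)
    if not 0 <= p_heads <= 1:
        raise ValueError("P({H}) must lie in [0, 1]")
    return {
        frozenset(): Fraction(0),             # neither heads nor tails
        frozenset({"H"}): p_heads,            # heads
        frozenset({"T"}): 1 - p_heads,        # tails, fixed by the complement rule
        frozenset({"H", "T"}): Fraction(1),   # either heads or tails
    }


P = coin_measure(Fraction(3, 10))  # a biased coin, chosen arbitrarily

print(P[frozenset({"H"})] + P[frozenset({"T"})])  # 1
print(P[frozenset()])                             # 0
```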