In probability theory, the central limit theorem (CLT) states that, in many situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution. This article gives two illustrations of this theorem. Both involve the sum of independent and identically distributed random variables and show how the probability distribution of the sum approaches the normal distribution as the number of terms in the sum increases.

The first illustration involves a continuous probability distribution, for which the random variables have a probability density function. The second illustration, for which most of the computation can be done by hand, involves a discrete probability distribution, which is characterized by a probability mass function.

Illustration of the continuous case

The density of the sum of two independent real-valued random variables equals the convolution of the density functions of the original variables. Thus, the density of the sum of ''m''+''n'' terms of a sequence of independent identically distributed variables equals the convolution of the densities of the sums of ''m'' terms and of ''n'' terms. In particular, the density of the sum of ''n''+1 terms equals the convolution of the density of the sum of ''n'' terms with the original density (the "sum" of 1 term).

A probability density function is shown in the first figure below. Then the densities of the sums of two, three, and four independent identically distributed variables, each having the original density, are shown in the following figures. If the original density is a piecewise polynomial, as it is in the example, then so are the sum densities, of increasingly higher degree. Although the original density is far from normal, the density of the sum of just a few variables with that density is much smoother and has some of the qualitative features of the normal density.

The convolutions were computed via the discrete Fourier transform. A list of values ''y'' = ''f''(''x''₀ + ''k'' Δ''x'') was constructed, where ''f'' is the original density function, Δ''x'' is approximately equal to 0.002, and ''k'' runs from 0 through 1000. The discrete Fourier transform ''Y'' of ''y'' was computed. Then the convolution of ''f'' with itself is proportional to the inverse discrete Fourier transform of the pointwise product of ''Y'' with itself.
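The Fourier-transform procedure just described can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the article does not give a formula for its density, so a uniform density on [0, 1) stands in for it, and the names `self_convolve_fft`, `f` and `g` are hypothetical. The zero-padding step is also an assumption of this sketch; it keeps the circular convolution produced by the discrete Fourier transform from wrapping around.

```python
import numpy as np

def self_convolve_fft(y, dx):
    """Approximate the density of X1 + X2 from samples y of the density of X."""
    size = 2 * len(y) - 1               # zero-pad so circular convolution equals linear convolution
    Y = np.fft.rfft(y, size)            # discrete Fourier transform of the padded samples
    conv = np.fft.irfft(Y * Y, size)    # inverse transform of the pointwise product of Y with itself
    return conv * dx                    # the factor dx turns the discrete sum into a Riemann sum

dx = 0.002                              # grid spacing, as in the text
k = np.arange(1001)                     # k = 0, 1, ..., 1000
f = (k < 500).astype(float)             # stand-in density (an assumption): uniform on [0, 1)
g = self_convolve_fft(f, dx)            # samples of the density of the sum on the grid 0, dx, 2*dx, ...

print(g.sum() * dx)                     # total probability of the sum density
print(np.argmax(g) * dx)                # location of the peak of the sum density
```

For this stand-in density the result is the familiar triangular density of the sum of two uniform variables, and it agrees with the direct computation `np.convolve(f, f) * dx`.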

Original probability density function

We start with a probability density function. This function, although discontinuous, is far from the most pathological example that could be created. It is a piecewise polynomial, with pieces of degrees 0 and 1. The mean of this distribution is 0 and its standard deviation is 1.

Probability density function of the sum of two terms

Next we compute the density of the sum of two independent variables, each having the above density. The density of the sum is the convolution of the above density with itself. The sum of two variables has mean 0. The density shown in the figure at right has been rescaled by √2, so that its standard deviation is 1. This density is already smoother than the original. There are obvious lumps, which correspond to the intervals on which the original density was defined.

Probability density function of the sum of three terms

We then compute the density of the sum of three independent variables, each having the above density. The density of the sum is the convolution of the first density with the second. The sum of three variables has mean 0. The density shown in the figure at right has been rescaled by √3, so that its standard deviation is 1. This density is even smoother than the preceding one. The lumps can hardly be detected in this figure.

Probability density function of the sum of four terms

Finally, we compute the density of the sum of four independent variables, each having the above density. The density of the sum is the convolution of the first density with the third (or the second density with itself). The sum of four variables has mean 0. The density shown in the figure at right has been rescaled by √4 = 2, so that its standard deviation is 1. This density appears qualitatively very similar to a normal density. No lumps can be distinguished by the eye.
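The repeated convolution and rescaling used for the sums of two, three, and four terms can be checked numerically. The sketch below is not the computation behind the figures: it assumes a stand-in density (uniform on roughly [−√3, √3], which has mean 0 and standard deviation close to 1), uses direct convolution rather than the Fourier transform, and the helper names `rescaled_sum_density` and `grid_std` are hypothetical. If the sum ''S'' of ''n'' terms has density ''g'', then ''S''/√''n'' has density √''n'' ''g''(√''n'' ''z''), which is what the rescaling step implements.

```python
import numpy as np

dx = 0.002
half = int(round(np.sqrt(3.0) / dx))
x = np.arange(-half, half + 1) * dx     # symmetric grid, so the mean is 0
f = np.ones_like(x)                     # stand-in density (an assumption): uniform on the grid
f /= f.sum() * dx                       # normalise so the Riemann sum of f equals 1

def rescaled_sum_density(f, x, dx, n):
    """Density of (X1 + ... + Xn) / sqrt(n) on a grid, for i.i.d. terms with density f."""
    g = f.copy()
    for _ in range(n - 1):
        g = np.convolve(g, f) * dx      # density of the sum with one more term
    grid = n * x[0] + np.arange(g.size) * dx   # support of the n-term sum
    scale = np.sqrt(n)                  # standard deviation of the sum when each term has sd 1
    return grid / scale, scale * g      # change of variables z = s / sqrt(n)

def grid_std(z, h):
    """Standard deviation of a density h sampled on the uniform grid z."""
    dz = z[1] - z[0]
    mean = np.sum(z * h) * dz
    return np.sqrt(np.sum((z - mean) ** 2 * h) * dz)

for n in (2, 3, 4):
    z, h = rescaled_sum_density(f, x, dx, n)
    print(n, round(grid_std(z, h), 3))  # each value should be close to 1
```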

Illustration of the discrete case

This section illustrates the central limit theorem via an example for which the computation can be done quickly by hand on paper, unlike the more computing-intensive example of the previous section.

Original probability mass function

Suppose the probability distribution of a discrete random variable ''X'' puts equal weights on 1, 2, and 3:

X=\left\{\begin{matrix} 1 & \mbox{with}\ \mbox{probability}\ 1/3, \\
2 & \mbox{with}\ \mbox{probability}\ 1/3, \\
3 & \mbox{with}\ \mbox{probability}\ 1/3.
\end{matrix}\right.

The probability mass function of the random variable ''X'' may be depicted by the following bar graph:

Clearly this looks nothing like the bell-shaped curve of the normal distribution. Contrast the above with the depictions below.

Probability mass function of the sum of two terms

Now consider the sum of two independent copies of ''X'':

\left\{\begin{matrix}
1+1 & = & 2 \\
1+2 & = & 3 \\
1+3 & = & 4 \\
2+1 & = & 3 \\
2+2 & = & 4 \\
2+3 & = & 5 \\
3+1 & = & 4 \\
3+2 & = & 5 \\
3+3 & = & 6
\end{matrix}\right\}
=\left\{\begin{matrix}
2 & \mbox{with}\ \mbox{probability}\ 1/9 \\
3 & \mbox{with}\ \mbox{probability}\ 2/9 \\
4 & \mbox{with}\ \mbox{probability}\ 3/9 \\
5 & \mbox{with}\ \mbox{probability}\ 2/9 \\
6 & \mbox{with}\ \mbox{probability}\ 1/9
\end{matrix}\right\}

The probability mass function of this sum may be depicted thus:

This still does not look very much like the bell-shaped curve, but, like the bell-shaped curve and unlike the probability mass function of ''X'' itself, it is higher in the middle than in the two tails.
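The enumeration of the nine equally likely pairs can be reproduced mechanically. The following is a minimal sketch using exact fractions; the function name `pmf_of_sum` is hypothetical and not part of the article.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def pmf_of_sum(values, n):
    """Exact PMF of the sum of n independent draws, each uniform on `values`."""
    # Enumerate every equally likely outcome and count how often each sum occurs.
    counts = Counter(sum(combo) for combo in product(values, repeat=n))
    total = len(values) ** n
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}

pmf2 = pmf_of_sum((1, 2, 3), 2)
for s, p in pmf2.items():
    print(s, p)
```

`Fraction` reduces to lowest terms, so the probability 3/9 of the value 4 is printed as 1/3. Calling `pmf_of_sum((1, 2, 3), 3)` enumerates the 27 outcomes of a three-term sum in the same way.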

Probability mass function of the sum of three terms

Now consider the sum of ''three'' independent copies of this random variable:

\left\{\begin{matrix}
1+1+1 & = & 3 \\
1+1+2 & = & 4 \\
1+1+3 & = & 5 \\
1+2+1 & = & 4 \\
1+2+2 & = & 5 \\
1+2+3 & = & 6 \\
1+3+1 & = & 5 \\
1+3+2 & = & 6 \\
1+3+3 & = & 7 \\
2+1+1 & = & 4 \\
2+1+2 & = & 5 \\
2+1+3 & = & 6 \\
2+2+1 & = & 5 \\
2+2+2 & = & 6 \\
2+2+3 & = & 7 \\
2+3+1 & = & 6 \\
2+3+2 & = & 7 \\
2+3+3 & = & 8 \\
3+1+1 & = & 5 \\
3+1+2 & = & 6 \\
3+1+3 & = & 7 \\
3+2+1 & = & 6 \\
3+2+2 & = & 7 \\
3+2+3 & = & 8 \\
3+3+1 & = & 7 \\
3+3+2 & = & 8 \\
3+3+3 & = & 9 
\end{matrix}\right\}
=\left\{\begin{matrix}
3 & \mbox{with}\ \mbox{probability}\ 1/27 \\
4 & \mbox{with}\ \mbox{probability}\ 3/27 \\
5 & \mbox{with}\ \mbox{probability}\ 6/27 \\
6 & \mbox{with}\ \mbox{probability}\ 7/27 \\
7 & \mbox{with}\ \mbox{probability}\ 6/27 \\
8 & \mbox{with}\ \mbox{probability}\ 3/27 \\
9 & \mbox{with}\ \mbox{probability}\ 1/27
\end{matrix}\right\}

The probability mass function of this sum may be depicted thus:

Not only is this bigger at the center than it is at the tails, but as one moves toward the center from either tail, the slope first increases and then decreases, just as with the bell-shaped curve.

The degree of its resemblance to the bell-shaped curve can be quantified as follows. Consider

Pr(''X''₁ + ''X''₂ + ''X''₃ ≤ 7) = 1/27 + 3/27 + 6/27 + 7/27 + 6/27 = 23/27 = 0.85185... .

How close is this to what a normal approximation would give? It can readily be seen that the expected value of ''Y'' = ''X''₁ + ''X''₂ + ''X''₃ is 6 and the standard deviation of ''Y'' is the square root of 2. Since ''Y'' ≤ 7 (weak inequality) if and only if ''Y'' < 8 (strict inequality), we use a continuity correction and seek

\mbox{Pr}(Y\leq 7.5)
=\mbox{Pr}\left({Y-6 \over \sqrt{2}}\leq{7.5-6 \over \sqrt{2}}\right)
=\mbox{Pr}(Z\leq 1.0606602\dots) = 0.85558\dots

where ''Z'' has a standard normal distribution. The difference between 0.85185... and 0.85558... seems remarkably small when it is considered that the number of independent random variables that were added was only three.
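The comparison between the exact probability and the normal approximation with continuity correction can be verified with the Python standard library alone. This is a sketch for checking the arithmetic; the helper `normal_cdf` is a hypothetical name, built on the identity Φ(''z'') = (1 + erf(''z''/√2))/2.

```python
from fractions import Fraction
from itertools import product
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Exact probability: count the outcomes of three draws from {1, 2, 3} whose sum is at most 7.
outcomes = [sum(c) for c in product((1, 2, 3), repeat=3)]
exact = Fraction(sum(1 for s in outcomes if s <= 7), len(outcomes))

mean = 3 * 2                            # each term has mean 2
sd = sqrt(3 * Fraction(2, 3))           # each term has variance 2/3, so the sum has variance 2
approx = normal_cdf((7.5 - mean) / sd)  # 7.5 rather than 7 is the continuity correction

print(exact, float(exact))
print(approx)
```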

Probability mass function of the sum of 1,000 terms

The following image shows the result of a simulation based on the example presented in this page. The extraction from the uniform distribution is repeated 1,000 times, and the results are summed. Since the simulation is based on the Monte Carlo method, the process is repeated 10,000 times. The results show that the distribution of the sum of 1,000 uniform extractions resembles the bell-shaped curve very well.
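A simulation of this kind can be sketched as follows. The details are assumptions rather than a record of how the image was produced: the "uniform distribution" is taken to be the discrete uniform distribution on {1, 2, 3} from the example above, the random seed is arbitrary, and NumPy's random generator is used for the draws.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed, for reproducibility only
n_terms, n_trials = 1000, 10000         # 1,000 extractions per sum, 10,000 repetitions

# Each row holds one experiment: 1,000 draws from {1, 2, 3} (upper bound 4 is exclusive).
draws = rng.integers(1, 4, size=(n_trials, n_terms), dtype=np.int8)
sums = draws.sum(axis=1, dtype=np.int64)

# Standardise using the exact mean (2 per term) and variance (2/3 per term).
z = (sums - 2 * n_terms) / np.sqrt(n_terms * 2.0 / 3.0)

print(z.mean(), z.std())                # should be close to 0 and 1
print(np.mean(np.abs(z) <= 1.0))        # roughly 0.68 for a bell-shaped curve
```

A histogram of `sums` (or of `z`) gives the bell-shaped picture described above.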

External links

* Uniform summation at Mathworld
* Animated examples of the CLT
* General Dynamic SOCR CLT Activity
* [http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_GeneralCentralLimitTheorem The SOCR CLT activity provides hands-on demonstration of the theory and applications of this limit theorem].
* A music video demonstrating the central limit theorem with a Galton board by Carl McTague