This article is supplemental for “Convergence of random variables” and provides proofs for selected results. Several results will be established using the portmanteau lemma: A sequence {''X_n''} converges in distribution to ''X'' if and only if any of the following conditions are met:

A. E[''f''(''X_n'')] → E[''f''(''X'')] for all bounded, continuous functions ''f'';
B. E[''f''(''X_n'')] → E[''f''(''X'')] for all bounded, Lipschitz functions ''f'';
C. limsup Pr(''X_n'' ∈ ''C'') ≤ Pr(''X'' ∈ ''C'') for all closed sets ''C''.

Convergence almost surely implies convergence in probability

X_n\ \overset{\mathrm{as}}{\rightarrow}\ X \quad\Rightarrow\quad X_n\ \overset{p}{\rightarrow}\ X

Proof: If {''X_n''} converges to ''X'' almost surely, it means that the set of points {ω : lim ''X_n''(ω) ≠ ''X''(ω)} has measure zero; denote this set ''O''. Now fix ε > 0 and consider the sequence of sets

A_n = \bigcup_{m \geq n} \left\{ \left| X_m - X \right| > \varepsilon \right\}

This sequence of sets is decreasing: ''A_n'' ⊇ ''A''_(''n''+1) ⊇ ..., and it decreases towards the set

A_{\infty} = \bigcap_{n \geq 1} A_n.

For this decreasing sequence of events, their probabilities are also a decreasing sequence, and it decreases towards Pr(''A''_∞); we shall now show that this number is equal to zero. Any point ω in the complement of ''O'' is such that lim ''X_n''(ω) = ''X''(ω), which implies that |''X_n''(ω) − ''X''(ω)| < ε for all ''n'' greater than a certain number ''N''. Therefore, for all ''n'' ≥ ''N'' the point ω will not belong to the set ''A_n'', and consequently it will not belong to ''A''_∞. This means that ''A''_∞ is disjoint from the complement of ''O'', or equivalently, ''A''_∞ is a subset of ''O'', and therefore Pr(''A''_∞) = 0. Finally, by continuity from above,

\operatorname{Pr}\left(|X_n-X|>\varepsilon\right) \leq \operatorname{Pr}(A_n) \ \underset{n\to\infty}{\rightarrow}\ 0,

which by definition means that ''X_n'' converges in probability to ''X''.

Convergence in probability does not imply almost sure convergence in the discrete case

If ''X_n'' are independent random variables assuming value one with probability 1/''n'' and zero otherwise, then ''X_n'' converges to zero in probability but not almost surely. This can be verified using the Borel–Cantelli lemma: since the events {''X_n'' = 1} are independent and their probabilities 1/''n'' have a divergent sum, ''X_n'' = 1 occurs infinitely often with probability one.
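The failure of almost sure convergence can also be seen from an exact calculation. The sketch below is an illustration only (the function name `prob_all_zero` is not from the source). By independence, Pr(''X_m'' = 0 for all ''n'' ≤ ''m'' ≤ ''N'') is the product of the factors 1 − 1/''m'', which telescopes to (''n'' − 1)/''N'' and so tends to zero as ''N'' → ∞ for every fixed ''n''. The sequence therefore cannot stay at zero from any index onwards.

```python
from fractions import Fraction


def prob_all_zero(n: int, N: int) -> Fraction:
    """Exact Pr(X_m = 0 for every n <= m <= N), assuming the X_m are
    independent with Pr(X_m = 1) = 1/m (illustrative helper)."""
    p = Fraction(1)
    for m in range(n, N + 1):
        p *= 1 - Fraction(1, m)  # Pr(X_m = 0) = (m - 1)/m
    return p


# For fixed n = 5 the probability of "no more ones up to N" shrinks like 4/N.
for N in (10, 100, 1000):
    print(N, prob_all_zero(5, N))
```

Exact rational arithmetic is used so that the telescoping identity can be checked without rounding error.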

Convergence in probability implies convergence in distribution

X_n\ \xrightarrow{p}\ X \quad\Rightarrow\quad X_n\ \xrightarrow{d}\ X,

Proof for the case of scalar random variables

Lemma. Let ''X'', ''Y'' be random variables, let ''a'' be a real number and ε > 0. Then

\operatorname{Pr}(Y \leq a) \leq \operatorname{Pr}(X\leq a+\varepsilon) + \operatorname{Pr}(|Y - X| > \varepsilon).

Proof of lemma:

\begin{align}
\operatorname{Pr}(Y\leq a) &= \operatorname{Pr}(Y\leq a,\ X\leq a+\varepsilon) + \operatorname{Pr}(Y\leq a,\ X>a+\varepsilon) \\
      &\leq \operatorname{Pr}(X\leq a+\varepsilon) + \operatorname{Pr}(Y-X\leq a-X,\ a-X<-\varepsilon) \\
      &\leq \operatorname{Pr}(X\leq a+\varepsilon) + \operatorname{Pr}(Y-X<-\varepsilon) \\
      &\leq \operatorname{Pr}(X\leq a+\varepsilon) + \operatorname{Pr}(Y-X<-\varepsilon) + \operatorname{Pr}(Y-X>\varepsilon)\\
      &= \operatorname{Pr}(X\leq a+\varepsilon) + \operatorname{Pr}(|Y-X|>\varepsilon)
\end{align}

Shorter proof of the lemma: We have

\{Y\leq a\} \subset \{X\leq a+\varepsilon\} \cup \{|Y-X|>\varepsilon\}

for if ''Y'' ≤ ''a'' and |''Y'' − ''X''| ≤ ε, then ''X'' ≤ ''a'' + ε. Hence by the union bound,

\operatorname{Pr}(Y\leq a) \leq \operatorname{Pr}(X \leq a + \varepsilon) + \operatorname{Pr}(|Y-X|>\varepsilon).
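As a sanity check, the lemma can be verified exactly on a small discrete example. The sketch below assumes a hypothetical joint law, with (''X'', ''Y'') uniform on a 7 × 7 integer grid; this law and the helper names `pr` and `lemma_holds` are illustrative choices, not part of the source. It evaluates both sides of the inequality for several values of ''a'' and ε.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint law (an assumption for illustration):
# (X, Y) is uniform on the integer grid {-3, ..., 3} x {-3, ..., 3}.
support = list(product(range(-3, 4), repeat=2))
weight = Fraction(1, len(support))


def pr(event) -> Fraction:
    """Exact probability of an event given as a predicate on (x, y)."""
    return sum((weight for x, y in support if event(x, y)), Fraction(0))


def lemma_holds(a, eps) -> bool:
    """Check Pr(Y <= a) <= Pr(X <= a + eps) + Pr(|Y - X| > eps)."""
    lhs = pr(lambda x, y: y <= a)
    rhs = pr(lambda x, y: x <= a + eps) + pr(lambda x, y: abs(y - x) > eps)
    return lhs <= rhs


checks = [lemma_holds(a, eps)
          for a in range(-4, 5)
          for eps in (Fraction(1, 2), 1, 2, 5)]
print(all(checks))
```

Since the lemma holds for any pair of random variables, every check succeeds. The grid is only a convenient finite case on which both sides can be computed without rounding.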

Proof of the theorem: Recall that in order to prove convergence in distribution, one must show that the sequence of cumulative distribution functions of the ''X_n'' converges to ''F_X'' at every point where ''F_X'' is continuous. Let ''a'' be such a point. For every ε > 0, applying the preceding lemma twice (the second time with the roles of ''X_n'' and ''X'' exchanged and ''a'' replaced by ''a'' − ε), we have:

\begin{align}
\operatorname{Pr}(X_n\leq a) &\leq \operatorname{Pr}(X\leq a+\varepsilon) + \operatorname{Pr}(|X_n-X|>\varepsilon) \\
\operatorname{Pr}(X\leq a-\varepsilon) &\leq \operatorname{Pr}(X_n\leq a) + \operatorname{Pr}(|X_n-X|>\varepsilon)
\end{align}

So, we have

\operatorname{Pr}(X\leq a-\varepsilon) - \operatorname{Pr}\left(\left|X_n-X\right|>\varepsilon\right) \leq \operatorname{Pr}(X_n\leq a) \leq \operatorname{Pr}(X\leq a+\varepsilon) + \operatorname{Pr}\left(\left|X_n-X\right|>\varepsilon\right).

Taking the limit as ''n'' → ∞, we obtain:

F_X(a-\varepsilon) \leq \lim_{n\to\infty} \operatorname{Pr}(X_n\leq a) \leq F_X(a+\varepsilon),

where ''F_X''(''a'') = Pr(''X'' ≤ ''a'') is the cumulative distribution function of ''X''. This function is continuous at ''a'' by assumption, and therefore both ''F_X''(''a''−ε) and ''F_X''(''a''+ε) converge to ''F_X''(''a'') as ε → 0⁺. Taking this limit, we obtain

\lim_{n\to\infty} \operatorname{Pr}(X_n \leq a) = \operatorname{Pr}(X \leq a),

which means that {''X_n''} converges to ''X'' in distribution.
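The two-sided bound used in this proof can be checked exactly on a toy example. The sketch below rests on assumptions made purely for illustration: ''X'' is uniform on {0, ..., 9}, ''Z'' is independent of ''X'' and uniform on {−1, 0, 1}, and ''X_n'' = ''X'' + ''Z''/''n'', so that ''X_n'' converges to ''X'' in probability. The helper names `pr` and `sandwich` are likewise hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed toy model: X uniform on {0,...,9}, Z uniform on {-1,0,1},
# X and Z independent, and X_n = X + Z/n.
outcomes = list(product(range(10), (-1, 0, 1)))
w = Fraction(1, len(outcomes))


def pr(event) -> Fraction:
    """Exact probability of an event given as a predicate on (x, z)."""
    return sum((w for x, z in outcomes if event(x, z)), Fraction(0))


def sandwich(n: int, a, eps):
    """Return (lower, middle, upper) where
    lower  = Pr(X <= a - eps) - Pr(|X_n - X| > eps),
    middle = Pr(X_n <= a),
    upper  = Pr(X <= a + eps) + Pr(|X_n - X| > eps)."""
    tail = pr(lambda x, z: abs(Fraction(z, n)) > eps)
    lower = pr(lambda x, z: x <= a - eps) - tail
    middle = pr(lambda x, z: x + Fraction(z, n) <= a)
    upper = pr(lambda x, z: x <= a + eps) + tail
    return lower, middle, upper


# a = 9/2 is a continuity point of F_X in this model.
for n in (1, 2, 8):
    print(n, sandwich(n, Fraction(9, 2), Fraction(1, 4)))
```

In this model the tail term vanishes once 1/''n'' ≤ ε, after which the outer bounds depend only on ''F_X''(''a'' − ε) and ''F_X''(''a'' + ε), mirroring the limiting step in the proof.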

Proof for the generic case

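The implication also holds when ''X_n'' is a random vector. This follows from the property proved later on this page (convergence in probability to a sequence converging in distribution), applied with the constant sequence ''X'' in the role of ''X_n'' and the sequence ''X_n'' in the role of ''Y_n''.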

Convergence in distribution to a constant implies convergence in probability

X_n\ \xrightarrow{d}\ c \quad\Rightarrow\quad X_n\ \xrightarrow{p}\ c,

provided ''c'' is a constant. Proof: Fix ε > 0. Let ''B''_ε(''c'') be the open ball of radius ε around point ''c'', and ''B''_ε(''c'')^''c'' its complement. Then

\operatorname{Pr}\left(|X_n-c|\geq\varepsilon\right) = \operatorname{Pr}\left(X_n\in B_\varepsilon(c)^c\right).

The set ''B''_ε(''c'')^''c'' is closed, so by the portmanteau lemma (part C), if ''X_n'' converges in distribution to ''c'', then the limsup of the latter probability must be less than or equal to Pr(''c'' ∈ ''B''_ε(''c'')^''c''), which is equal to zero because ''c'' lies in the ball ''B''_ε(''c''). Therefore,

\begin{align}
\lim_{n\to\infty}\operatorname{Pr}\left(\left|X_n-c\right|\geq\varepsilon\right) &\leq \limsup_{n\to\infty}\operatorname{Pr}\left(\left|X_n-c\right|\geq\varepsilon\right) \\
&= \limsup_{n\to\infty}\operatorname{Pr}\left(X_n\in B_\varepsilon(c)^c\right) \\
&\leq \operatorname{Pr}\left(c\in B_\varepsilon(c)^c\right) = 0
\end{align}

which by definition means that ''X_n'' converges to ''c'' in probability.

Convergence in probability to a sequence converging in distribution implies convergence to the same distribution

|Y_n-X_n|\ \xrightarrow{p}\ 0,\ \ X_n\ \xrightarrow{d}\ X\ \quad\Rightarrow\quad Y_n\ \xrightarrow{d}\ X

Proof: We will prove this theorem using the portmanteau lemma, part B. As required in that lemma, consider any bounded function ''f'' (i.e. |''f''(''x'')| ≤ ''M'') which is also Lipschitz:

\exists K >0, \forall x,y: \quad |f(x)-f(y)| \leq K|x-y|.

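Take some ε > 0 and majorize the expression |E[''f''(''Y_n'')] − E[''f''(''X_n'')]| as

\begin{align}
\left|\operatorname{E}\left[f(Y_n)\right] - \operatorname{E}\left[f(X_n)\right]\right| &\leq \operatorname{E}\left[\left|f(Y_n) - f(X_n)\right|\right] \\
&= \operatorname{E}\left[\left|f(Y_n) - f(X_n)\right| \mathbf{1}_{\{|Y_n-X_n|<\varepsilon\}}\right] + \operatorname{E}\left[\left|f(Y_n) - f(X_n)\right| \mathbf{1}_{\{|Y_n-X_n|\geq\varepsilon\}}\right] \\
&\leq \operatorname{E}\left[K\left|Y_n - X_n\right| \mathbf{1}_{\{|Y_n-X_n|<\varepsilon\}}\right] + \operatorname{E}\left[2M \mathbf{1}_{\{|Y_n-X_n|\geq\varepsilon\}}\right] \\
&\leq K \varepsilon \operatorname{Pr}\left(\left|Y_n-X_n\right|<\varepsilon\right) + 2M \operatorname{Pr}\left(\left|Y_n-X_n\right|\geq\varepsilon\right) \\
&\leq K \varepsilon + 2M \operatorname{Pr}\left(\left|Y_n-X_n\right|\geq\varepsilon\right)
\end{align}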

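(here 1 denotes the indicator function; the expectation of the indicator function is equal to the probability of the corresponding event). Therefore,

\begin{align}
\left|\operatorname{E}\left[f(Y_n)\right] - \operatorname{E}\left[f(X)\right]\right| &\leq \left|\operatorname{E}\left[f(Y_n)\right] - \operatorname{E}\left[f(X_n)\right]\right| + \left|\operatorname{E}\left[f(X_n)\right] - \operatorname{E}\left[f(X)\right]\right| \\
&\leq K\varepsilon + 2M \operatorname{Pr}\left(|Y_n-X_n|\geq\varepsilon\right) + \left|\operatorname{E}\left[f(X_n)\right] - \operatorname{E}\left[f(X)\right]\right|.
\end{align}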

If we take the limit in this expression as ''n'' → ∞, the second term will go to zero since |''Y_n'' − ''X_n''| converges to zero in probability; and the third term will also converge to zero, by the portmanteau lemma and the fact that ''X_n'' converges to ''X'' in distribution. Thus

\limsup_{n\to\infty} \left|\operatorname{E}\left[f(Y_n)\right] - \operatorname{E}\left[f(X)\right]\right| \leq K\varepsilon.

Since ε was arbitrary, we conclude that the limit must in fact be equal to zero, and therefore E[''f''(''Y_n'')] → E[''f''(''X'')], which again by the portmanteau lemma implies that {''Y_n''} converges to ''X'' in distribution. QED.

Convergence of one sequence in distribution and another to a constant implies joint convergence in distribution

X_n\ \xrightarrow{d}\ X,\ \ Y_n\ \xrightarrow{p}\ c\ \quad\Rightarrow\quad (X_n,Y_n)\ \xrightarrow{d}\ (X,c)

provided ''c'' is a constant. Proof: We will prove this statement using the portmanteau lemma, part A. First we want to show that (''X_n'', ''c'') converges in distribution to (''X'', ''c''). By the portmanteau lemma this will be true if we can show that E[''f''(''X_n'', ''c'')] → E[''f''(''X'', ''c'')] for any bounded continuous function ''f''(''x'', ''y''). So let ''f'' be an arbitrary bounded continuous function, and consider the function of a single variable ''g''(''x'') := ''f''(''x'', ''c''). This function is also bounded and continuous, and therefore by the portmanteau lemma for the sequence {''X_n''} converging in distribution to ''X'', we have E[''g''(''X_n'')] → E[''g''(''X'')]. The latter statement is equivalent to “E[''f''(''X_n'', ''c'')] → E[''f''(''X'', ''c'')]”, and therefore (''X_n'', ''c'') converges in distribution to (''X'', ''c'').

Secondly, consider |(''X_n'', ''Y_n'') − (''X_n'', ''c'')| = |''Y_n'' − ''c''|. This expression converges in probability to zero because ''Y_n'' converges in probability to ''c''. Thus we have demonstrated two facts:

\begin{align}
    \left| (X_n, Y_n) - (X_n,c) \right|\ &\xrightarrow{p}\ 0, \\
    (X_n,c)\ &\xrightarrow{d}\ (X,c).
\end{align}

By the property proved earlier, these two facts imply that (''X_n'', ''Y_n'') converges in distribution to (''X'', ''c'').

Convergence of two sequences in probability implies joint convergence in probability

X_n\ \xrightarrow{p}\ X,\ \ Y_n\ \xrightarrow{p}\ Y\ \quad\Rightarrow\quad (X_n,Y_n)\ \xrightarrow{p}\ (X,Y)

Proof:

\begin{align}
\operatorname{Pr}\left(\left|(X_n,Y_n)-(X,Y)\right|\geq\varepsilon\right) &\leq \operatorname{Pr}\left(|X_n-X| + |Y_n-Y|\geq\varepsilon\right) \\
&\leq \operatorname{Pr}\left(|X_n-X|\geq\varepsilon/2\right) + \operatorname{Pr}\left(|Y_n-Y|\geq\varepsilon/2\right)
\end{align}

where the last step follows by the pigeonhole principle and the sub-additivity of the probability measure. Each of the probabilities on the right-hand side converges to zero as ''n'' → ∞ by definition of the convergence of {''X_n''} and {''Y_n''} in probability to ''X'' and ''Y'' respectively. Taking the limit we conclude that the left-hand side also converges to zero, and therefore the sequence {(''X_n'', ''Y_n'')} converges in probability to (''X'', ''Y'').
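The chain of inequalities in this proof can be illustrated with exact arithmetic. The sketch below assumes a hypothetical model in which ''X_n'' − ''X'' = ''A''/''n'' and ''Y_n'' − ''Y'' = ''B''/''n'', with (''A'', ''B'') uniform on a 7 × 7 integer grid and |·| the Euclidean norm. The model and the helper names `pr` and `bounds` are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import product

# Assumed model: X_n - X = A/n and Y_n - Y = B/n, where (A, B) is
# uniform on the integer grid {-3, ..., 3} x {-3, ..., 3}.
grid = list(product(range(-3, 4), repeat=2))
w = Fraction(1, len(grid))


def pr(event) -> Fraction:
    """Exact probability of an event given as a predicate on (a, b)."""
    return sum((w for a, b in grid if event(a, b)), Fraction(0))


def bounds(n: int, eps):
    """Return the three probabilities appearing in the proof, in order:
    Pr(|(X_n,Y_n)-(X,Y)| >= eps), Pr(|X_n-X| + |Y_n-Y| >= eps),
    Pr(|X_n-X| >= eps/2) + Pr(|Y_n-Y| >= eps/2)."""
    t = n * eps  # threshold after multiplying through by n
    joint = pr(lambda a, b: a * a + b * b >= t * t)  # squared Euclidean norm
    summed = pr(lambda a, b: abs(a) + abs(b) >= t)
    split = (pr(lambda a, b: 2 * abs(a) >= t)
             + pr(lambda a, b: 2 * abs(b) >= t))
    return joint, summed, split


for n in (1, 5, 10, 13):
    print(n, bounds(n, Fraction(1, 2)))
```

For each ''n'' the three values are non-decreasing from left to right, and in this bounded model all of them are exactly zero once ''n''ε exceeds the largest possible error.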
