The Cauchy distribution, named after Augustin-Louis Cauchy, is a continuous probability distribution. It is also known, especially among physicists, as the Lorentz distribution (after Hendrik Lorentz), Cauchy–Lorentz distribution, Lorentz(ian) function, or Breit–Wigner distribution. The Cauchy distribution f(x; x_0,\gamma) is the distribution of the x-intercept of a ray issuing from (x_0,\gamma) with a uniformly distributed angle. It is also the distribution of the ratio of two independent normally distributed random variables with mean zero.

The Cauchy distribution is often used in statistics as the canonical example of a "pathological" distribution since both its expected value and its variance are undefined (but see below). The Cauchy distribution does not have finite moments of order greater than or equal to one; only fractional absolute moments exist. The Cauchy distribution has no moment generating function.

In mathematics, it is closely related to the Poisson kernel, which is the fundamental solution for the Laplace equation in the upper half-plane. It is one of the few stable distributions with a probability density function that can be expressed analytically, the others being the normal distribution and the Lévy distribution.


Definitions

Here are the most important constructions.


Rotational symmetry

If one stands in front of a line and kicks a ball at a uniformly distributed random angle towards the line, then the distribution of the point where the ball hits the line is a Cauchy distribution. For example, consider a point at (x_0, \gamma) in the x-y plane, and select a line passing through the point, with its direction (angle with the x-axis) chosen uniformly (between −180° and 0°) at random. The intersection of the line with the x-axis follows a Cauchy distribution with location x_0 and scale \gamma.

This definition gives a simple way to sample from the standard Cauchy distribution. Let u be a sample from a uniform distribution on [0,1]; then we can generate a sample x from the standard Cauchy distribution using

x = \tan\left(\pi\left(u-\tfrac{1}{2}\right)\right)

When U and V are two independent normally distributed random variables with expected value 0 and variance 1, then the ratio U/V has the standard Cauchy distribution. More generally, if (U, V) has a rotationally symmetric distribution on the plane, then the ratio U/V has the standard Cauchy distribution.
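The inverse-transform sampling recipe above can be sketched in a few lines of Python (a minimal illustration using only the standard library; the helper name `sample_standard_cauchy` is ours):

```python
import math
import random
import statistics

def sample_standard_cauchy(rng):
    """Map u ~ Uniform(0,1) through tan(pi*(u - 1/2)) to get a standard Cauchy draw."""
    u = rng.random()
    return math.tan(math.pi * (u - 0.5))

rng = random.Random(0)
samples = [sample_standard_cauchy(rng) for _ in range(100_000)]

# The sample median should sit near 0, and the sample quartiles near -1 and +1,
# since the standard Cauchy quantile function gives Q(1/4) = -1 and Q(3/4) = +1.
med = statistics.median(samples)
q = statistics.quantiles(samples, n=4)
```

Note that checking the sample mean instead would be pointless here: as discussed below, it does not converge.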


Probability density function (PDF)

The Cauchy distribution is the probability distribution with the following probability density function (PDF)

f(x; x_0,\gamma) = \frac{1}{\pi\gamma \left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]} = \frac{1}{\pi} \left[\frac{\gamma}{(x - x_0)^2 + \gamma^2}\right],

where x_0 is the location parameter, specifying the location of the peak of the distribution, and \gamma is the scale parameter which specifies the half-width at half-maximum (HWHM); alternatively 2\gamma is the full width at half maximum (FWHM). \gamma is also equal to half the interquartile range and is sometimes called the probable error. This function is also known as a Lorentzian function, and is an example of a nascent delta function, and therefore approaches a Dirac delta function in the limit as \gamma \to 0. Augustin-Louis Cauchy exploited such a density function in 1827 with an infinitesimal scale parameter, defining this Dirac delta function.


Properties of PDF

The maximum value or amplitude of the Cauchy PDF is \frac{1}{\pi\gamma}, located at x=x_0.

It is sometimes convenient to express the PDF in terms of the complex parameter \psi = x_0 + i\gamma:

f(x;\psi)=\frac{1}{\pi}\,\mathrm{Im}\left(\frac{1}{x-\psi}\right)=\frac{1}{\pi}\,\mathrm{Re}\left(\frac{-i}{x-\psi}\right)

The special case when x_0 = 0 and \gamma = 1 is called the standard Cauchy distribution with the probability density function

f(x; 0,1) = \frac{1}{\pi (1 + x^2)}.

In physics, a three-parameter Lorentzian function is often used:

f(x; x_0,\gamma,I) = \frac{I}{1 + \left(\frac{x-x_0}{\gamma}\right)^2} = I \left[\frac{\gamma^2}{(x - x_0)^2 + \gamma^2}\right],

where I is the height of the peak. The three-parameter Lorentzian function indicated is not, in general, a probability density function, since it does not integrate to 1, except in the special case where I = \frac{1}{\pi\gamma}.
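These PDF properties are easy to confirm numerically (a sketch with stdlib Python only; the helper name `cauchy_pdf` and the parameter choices are ours):

```python
import math

def cauchy_pdf(x, x0=0.0, gamma=1.0):
    """Cauchy density: (1/pi) * gamma / ((x - x0)^2 + gamma^2)."""
    return gamma / (math.pi * ((x - x0) ** 2 + gamma ** 2))

x0, gamma = 2.0, 0.5

# Peak amplitude is 1/(pi*gamma), attained at x = x0.
peak = cauchy_pdf(x0, x0, gamma)

# Half of the maximum is reached at x0 +/- gamma, so the FWHM is 2*gamma.
half = cauchy_pdf(x0 + gamma, x0, gamma)

# Midpoint Riemann sum over [x0-200, x0+200]: the density integrates to ~1.
# (The truncated tails contribute roughly 2*gamma/(pi*200), i.e. ~0.0016.)
step = 0.001
total = sum(cauchy_pdf(x0 + (k + 0.5) * step, x0, gamma)
            for k in range(-200_000, 200_000)) * step
```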


Cumulative distribution function (CDF)

The Cauchy distribution is the probability distribution with the following cumulative distribution function (CDF):

F(x; x_0,\gamma)=\frac{1}{\pi} \arctan\left(\frac{x-x_0}{\gamma}\right)+\frac{1}{2}

and the quantile function (inverse CDF) of the Cauchy distribution is

Q(p; x_0,\gamma) = x_0 + \gamma\,\tan\left[\pi\left(p-\tfrac{1}{2}\right)\right].

It follows that the first and third quartiles are (x_0 - \gamma, x_0 + \gamma), and hence the interquartile range is 2\gamma. For the standard distribution, the cumulative distribution function simplifies to the arctangent function \arctan(x):

F(x; 0,1)=\frac{1}{\pi} \arctan\left(x\right)+\frac{1}{2}
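The CDF/quantile pair and the quartile claim can be checked directly (a small sketch; function names and the parameter values are ours):

```python
import math

def cauchy_cdf(x, x0=0.0, gamma=1.0):
    """F(x) = arctan((x - x0)/gamma)/pi + 1/2."""
    return math.atan((x - x0) / gamma) / math.pi + 0.5

def cauchy_quantile(p, x0=0.0, gamma=1.0):
    """Q(p) = x0 + gamma * tan(pi*(p - 1/2)), the inverse of the CDF."""
    return x0 + gamma * math.tan(math.pi * (p - 0.5))

x0, gamma = 1.5, 2.0

# The quantile function inverts the CDF.
roundtrip = cauchy_cdf(cauchy_quantile(0.3, x0, gamma), x0, gamma)

# First and third quartiles are x0 - gamma and x0 + gamma, so IQR = 2*gamma.
q1 = cauchy_quantile(0.25, x0, gamma)
q3 = cauchy_quantile(0.75, x0, gamma)
```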


Other constructions

The standard Cauchy distribution is the Student's ''t''-distribution with one degree of freedom, and so it may be constructed by any method that constructs the Student's ''t''-distribution. If \Sigma is a p\times p positive-semidefinite covariance matrix with strictly positive diagonal entries, then for independent and identically distributed X,Y\sim N(0,\Sigma) and any random p-vector w independent of X and Y such that w_1+\cdots+w_p=1 and w_i\geq 0, i=1,\ldots,p, (defining a categorical distribution) it holds that

\sum_{j=1}^p w_j\frac{X_j}{Y_j}\sim\mathrm{Cauchy}(0,1).


Properties

The Cauchy distribution is an example of a distribution which has no mean, variance or higher moments defined. Its mode and median are well defined and are both equal to x_0. The Cauchy distribution is an infinitely divisible probability distribution. It is also a strictly stable distribution. Like all stable distributions, the location-scale family to which the Cauchy distribution belongs is closed under linear transformations with real coefficients. In addition, the family of Cauchy-distributed random variables is closed under linear fractional transformations with real coefficients. In this connection, see also McCullagh's parametrization of the Cauchy distributions.


Sum of Cauchy-distributed random variables

If X_1, X_2, \ldots, X_n are an IID sample from the standard Cauchy distribution, then their sample mean \bar X = \frac{1}{n} \sum_i X_i is also standard Cauchy distributed. In particular, the average does not converge to the mean, and so the standard Cauchy distribution does not follow the law of large numbers. This can be proved by repeated integration with the PDF, or more conveniently, by using the characteristic function of the standard Cauchy distribution (see below):

\varphi_X(t) = \operatorname{E}\left[e^{iXt} \right] = e^{-|t|}.

With this, we have \varphi_{\sum_i X_i}(t) = e^{-n|t|}, and so \bar X has a standard Cauchy distribution.

More generally, if X_1, X_2, \ldots, X_n are independent and Cauchy distributed with location parameters x_1, \ldots, x_n and scales \gamma_1, \ldots, \gamma_n, and a_1, \ldots, a_n are real numbers, then \sum_i a_i X_i is Cauchy distributed with location \sum_i a_i x_i and scale \sum_i |a_i| \gamma_i. We see that there is no law of large numbers for any weighted sum of independent Cauchy distributions.

This shows that the condition of finite variance in the central limit theorem cannot be dropped. It is also an example of a more generalized version of the central limit theorem that is characteristic of all stable distributions, of which the Cauchy distribution is a special case.
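The stability of the sample mean can be seen in a quick simulation (a seeded sketch, not from the source; the helper name `iqr_of_sample_means` is ours): averaging 10 standard Cauchy draws leaves the interquartile range at 2, whereas averaging 10 standard normal draws would shrink the IQR by a factor of \sqrt{10}.

```python
import math
import random
import statistics

rng = random.Random(42)

def iqr_of_sample_means(n_terms, n_reps=20_000):
    """Quartiles of the sample mean of n_terms standard Cauchy draws."""
    means = []
    for _ in range(n_reps):
        total = sum(math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n_terms))
        means.append(total / n_terms)
    return statistics.quantiles(means, n=4)

q_1 = iqr_of_sample_means(1)    # no averaging at all
q_10 = iqr_of_sample_means(10)  # mean of 10 draws

# Both IQRs should be close to 2: averaging does not concentrate the distribution.
iqr_1 = q_1[2] - q_1[0]
iqr_10 = q_10[2] - q_10[0]
```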


Central limit theorem

If X_1, X_2, \ldots are an IID sample with PDF \rho such that \lim_{c \to \infty} \frac{1}{c} \int_{-c}^c x^2 \rho(x) \, dx = \frac{2\gamma}{\pi} is finite, but nonzero, then \frac{1}{n} \sum_{i=1}^n X_i converges in distribution to a Cauchy distribution with scale \gamma.


Characteristic function

Let X denote a Cauchy distributed random variable. The characteristic function of the Cauchy distribution is given by

\varphi_X(t) = \operatorname{E}\left[e^{iXt} \right] =\int_{-\infty}^\infty f(x;x_0,\gamma)e^{ixt}\,dx = e^{ix_0 t - \gamma |t|},

which is just the Fourier transform of the probability density. The original probability density may be expressed in terms of the characteristic function, essentially by using the inverse Fourier transform:

f(x; x_0,\gamma) = \frac{1}{2\pi}\int_{-\infty}^\infty \varphi_X(t;x_0,\gamma)e^{-ixt} \, dt

The ''n''th moment of a distribution is the ''n''th derivative of the characteristic function evaluated at t=0. Observe that the characteristic function is not differentiable at the origin: this corresponds to the fact that the Cauchy distribution does not have well-defined moments higher than the zeroth moment.
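The Fourier-transform identity can be verified numerically for the standard case x_0 = 0, \gamma = 1, t = 1 (a sketch; the truncation radius and step size are our choices, picked so the quadrature error stays well below the test tolerance):

```python
import math

# Numerically check Re(phi_X(t)) = e^{-gamma*|t|} for x0 = 0 via a midpoint
# Riemann sum of  integral f(x) * cos(t*x) dx  over [-R, R].
x0, gamma, t = 0.0, 1.0, 1.0
expected = math.exp(-gamma * abs(t))  # e^{-1} ~= 0.3679

R = 2_000.0
step = 0.01
n = int(2 * R / step)
total = 0.0
for k in range(n):
    x = -R + (k + 0.5) * step
    pdf = gamma / (math.pi * ((x - x0) ** 2 + gamma ** 2))
    total += pdf * math.cos(t * x)  # real part of e^{itx}; the imaginary part is 0 by symmetry
total *= step
```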


Kullback–Leibler divergence

The Kullback–Leibler divergence between two Cauchy distributions has the following symmetric closed-form formula:

\mathrm{D}_{\mathrm{KL}}\left(p_{x_{0,1},\gamma_1} : p_{x_{0,2},\gamma_2}\right) = \log \frac{\left(\gamma_1+\gamma_2\right)^2 + \left(x_{0,1}-x_{0,2}\right)^2}{4\gamma_1\gamma_2}.

Any f-divergence between two Cauchy distributions is symmetric and can be expressed as a function of the chi-squared divergence. Closed-form expressions for the total variation, Jensen–Shannon divergence, Hellinger distance, etc. are available.
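Both the symmetry and the closed form can be sanity-checked against a direct numerical integration of \int p \log(p/q)\,dx (a sketch; the quadrature parameters are our choices, and the example parameters (0,1) vs (1,2) are arbitrary):

```python
import math

def kl_cauchy_closed_form(x0, g0, x1, g1):
    """Closed-form KL divergence between Cauchy(x0, g0) and Cauchy(x1, g1)."""
    return math.log(((g0 + g1) ** 2 + (x0 - x1) ** 2) / (4 * g0 * g1))

def pdf(x, x0, g):
    return g / (math.pi * ((x - x0) ** 2 + g ** 2))

def kl_numeric(x0, g0, x1, g1, R=5_000.0, n=500_000):
    """Midpoint Riemann sum of p*log(p/q) over [-R, R]; the integrand decays like 1/x^2."""
    step = 2 * R / n
    total = 0.0
    for k in range(n):
        x = -R + (k + 0.5) * step
        p = pdf(x, x0, g0)
        q = pdf(x, x1, g1)
        total += p * math.log(p / q)
    return total * step

closed = kl_cauchy_closed_form(0.0, 1.0, 1.0, 2.0)   # log(10/8)
numeric = kl_numeric(0.0, 1.0, 1.0, 2.0)
```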


Entropy

The entropy of the Cauchy distribution is given by:

\begin{align} H(\gamma) & =-\int_{-\infty}^\infty f(x;x_0,\gamma) \log(f(x;x_0,\gamma)) \, dx \\ & =\log(4\pi\gamma) \end{align}

The derivative of the quantile function, the quantile density function, for the Cauchy distribution is:

Q'(p; \gamma) = \gamma\,\pi\,\sec^2\left[\pi\left(p - \tfrac{1}{2}\right)\right]

The differential entropy of a distribution can be defined in terms of its quantile density, specifically:

H(\gamma) = \int_0^1 \log\,(Q'(p; \gamma))\,\mathrm{d}p = \log(4\pi\gamma)

The Cauchy distribution is the maximum entropy probability distribution for a random variate X for which

\operatorname{E}\left[\log\left(1 + \left(\frac{X - x_0}{\gamma}\right)^2\right)\right] = \log 4
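The quantile-density route to the entropy is short enough to verify numerically (a sketch; the scale \gamma = 0.7 and the grid size are our choices, and the log-singularities of the integrand at p = 0 and p = 1 are integrable, so a midpoint rule suffices):

```python
import math

gamma = 0.7

# H(gamma) = integral_0^1 log(Q'(p)) dp  with  Q'(p) = gamma*pi*sec^2(pi*(p - 1/2)).
n = 200_000
total = 0.0
for k in range(n):
    p = (k + 0.5) / n                                   # midpoint rule on (0, 1)
    qd = gamma * math.pi / math.cos(math.pi * (p - 0.5)) ** 2
    total += math.log(qd)
entropy_numeric = total / n

entropy_closed = math.log(4 * math.pi * gamma)
```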


Moments

The Cauchy distribution is usually used as an illustrative counterexample in elementary probability courses, as a distribution with no well-defined (or "indefinite") moments.


Sample moments

If we take an IID sample X_1, X_2, \ldots from the standard Cauchy distribution, then the sequence of their sample means is S_n = \frac{1}{n} \sum_{i=1}^n X_i, which also has the standard Cauchy distribution. Consequently, no matter how many terms we take, the sample average does not converge. Similarly, the sample variance V_n = \frac{1}{n} \sum_{i=1}^n \left(X_i - S_n\right)^2 also does not converge. A typical trajectory of S_1, S_2, \ldots looks like long periods of slow convergence to zero, punctuated by large jumps away from zero, but never getting too far away. A typical trajectory of V_1, V_2, \ldots looks similar, but the jumps accumulate faster than the decay, diverging to infinity. These two kinds of trajectories are plotted in the figure. Sample moments of order lower than 1 converge to zero, while sample moments of order higher than 2 diverge to infinity even faster than the sample variance.


Mean

If a probability distribution has a density function f(x), then the mean, if it exists, is given by

\operatorname{E}[X] = \int_{-\infty}^\infty x f(x)\,dx.

We may evaluate this two-sided improper integral by computing the sum of two one-sided improper integrals,

\int_{-\infty}^a x f(x)\,dx + \int_a^\infty x f(x)\,dx,

for an arbitrary real number a. For the integral to exist (even as an infinite value), at least one of the terms in this sum should be finite, or both should be infinite and have the same sign. But in the case of the Cauchy distribution, both the terms in this sum are infinite and have opposite sign. Hence the two-sided integral, and thus the mean, is undefined. When the mean of a probability distribution is undefined, no one can compute a reliable average over the experimental data points, regardless of the sample's size.

Note that the Cauchy principal value of the mean of the Cauchy distribution is \lim_{a\to\infty}\int_{-a}^a x f(x)\,dx, which is zero. On the other hand, the related integral \lim_{a\to\infty}\int_{-2a}^a x f(x)\,dx is ''not'' zero, as can be seen by computing the integral. This again shows that the mean cannot exist.

Various results in probability theory about expected values, such as the strong law of large numbers, fail to hold for the Cauchy distribution.
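The claim that the integral with asymmetric limits is nonzero can be made concrete for the standard Cauchy density (a short calculus check, not taken from the source):

```latex
\int_{-2a}^{a} \frac{x}{\pi\,(1+x^2)}\,dx
  = \frac{1}{2\pi}\Bigl[\ln\bigl(1+x^2\bigr)\Bigr]_{-2a}^{a}
  = \frac{1}{2\pi}\,\ln\frac{1+a^2}{1+4a^2}
  \;\longrightarrow\; \frac{1}{2\pi}\,\ln\frac{1}{4}
  = -\frac{\ln 2}{\pi} \quad (a \to \infty),
```

whereas the symmetric limits \int_{-a}^{a} give exactly 0 for every a, by oddness of the integrand. Different ways of taking the limit thus give different answers, which is precisely why the mean is undefined.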


Smaller moments

The absolute moments for p\in(-1,1) are defined. For X\sim\mathrm{Cauchy}(0,\gamma) we have

\operatorname{E}\left[|X|^p\right] = \gamma^p \sec\left(\frac{\pi p}{2}\right).

Even-powered raw moments do exist and have a value of infinity, for example, the raw second moment:

\begin{align} \operatorname{E}[X^2] & \propto \int_{-\infty}^\infty \frac{x^2}{1+x^2}\,dx = \int_{-\infty}^\infty 1 - \frac{1}{1+x^2}\,dx \\ & = \int_{-\infty}^\infty dx - \int_{-\infty}^\infty \frac{1}{1+x^2}\,dx = \int_{-\infty}^\infty dx - \pi = \infty. \end{align}

By re-arranging the formula, one can see that the second moment is essentially the infinite integral of a constant (here 1). Higher even-powered raw moments will also evaluate to infinity. Odd-powered raw moments, however, are undefined, which is distinctly different from existing with the value of infinity. The odd-powered raw moments are undefined because their values are essentially equivalent to \infty - \infty since the two halves of the integral both diverge and have opposite signs. The first raw moment is the mean, which, being odd, does not exist. (See also the discussion above about this.) This in turn means that all of the central moments and standardized moments are undefined since they are all based on the mean. The variance, which is the second central moment, is likewise non-existent (despite the fact that the raw second moment exists with the value infinity). The results for higher moments follow from Hölder's inequality, which implies that higher moments (or halves of moments) diverge if lower ones do.
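The fractional absolute moment formula can be checked numerically for the standard Cauchy with p = 1/2 (a sketch; substituting x = \tan\theta turns \operatorname{E}|X|^p = \frac{2}{\pi}\int_0^\infty \frac{x^p}{1+x^2}dx into \frac{2}{\pi}\int_0^{\pi/2}\tan^p\theta\,d\theta, whose endpoint singularity is integrable, so a midpoint rule works; the grid size is our choice):

```python
import math

# Check E|X|^{1/2} = sec(pi/4) = sqrt(2) for the standard Cauchy distribution.
p = 0.5
n = 400_000
h = (math.pi / 2) / n
total = 0.0
for k in range(n):
    theta = (k + 0.5) * h          # midpoints of (0, pi/2)
    total += math.tan(theta) ** p
moment_numeric = (2 / math.pi) * total * h

moment_closed = 1 / math.cos(math.pi * p / 2)  # sec(pi*p/2)
```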


Moments of truncated distributions

Consider the truncated distribution defined by restricting the standard Cauchy distribution to the interval [-10^{100}, 10^{100}]. Such a truncated distribution has all moments (and the central limit theorem applies for i.i.d. observations from it); yet for almost all practical purposes it behaves like a Cauchy distribution.


Transformation properties

*If X \sim \operatorname{Cauchy}(x_0,\gamma) then kX + \ell \sim \operatorname{Cauchy}(x_0 k+\ell, \gamma |k|).
*If X \sim \operatorname{Cauchy}(x_0, \gamma_0) and Y \sim \operatorname{Cauchy}(x_1,\gamma_1) are independent, then X+Y \sim \operatorname{Cauchy}(x_0+x_1,\gamma_0+\gamma_1) and X-Y \sim \operatorname{Cauchy}(x_0-x_1, \gamma_0+\gamma_1).
*If X \sim \operatorname{Cauchy}(0,\gamma) then \tfrac{1}{X} \sim \operatorname{Cauchy}(0, \tfrac{1}{\gamma}).
*McCullagh's parametrization of the Cauchy distributions (McCullagh, P., "Conditional inference and Cauchy models", ''Biometrika'', volume 79 (1992), pages 247–259; PDF available from McCullagh's homepage): expressing a Cauchy distribution in terms of one complex parameter \psi = x_0+i\gamma, define X \sim \operatorname{Cauchy}(\psi) to mean X \sim \operatorname{Cauchy}(x_0, |\gamma|). If X \sim \operatorname{Cauchy}(\psi) then \frac{aX+b}{cX+d} \sim \operatorname{Cauchy}\left(\frac{a\psi+b}{c\psi+d}\right), where a, b, c and d are real numbers.
*Using the same convention as above, if X \sim \operatorname{Cauchy}(\psi) then \frac{X-i}{X+i} \sim \operatorname{CCauchy}\left(\frac{\psi-i}{\psi+i}\right), where \operatorname{CCauchy} is the circular Cauchy distribution.


Statistical inference


Estimation of parameters

Because the parameters of the Cauchy distribution do not correspond to a mean and variance, attempting to estimate them by using a sample mean and a sample variance will not succeed. For example, if an i.i.d. sample of size ''n'' is taken from a Cauchy distribution, one may calculate the sample mean as: \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i Although the sample values x_i will be concentrated about the central value x_0, the sample mean will become increasingly variable as more observations are taken, because of the increased probability of encountering sample points with a large absolute value. In fact, the distribution of the sample mean is equal to the distribution of the observations themselves; i.e., the sample mean of a large sample is no better (or worse) an estimator of x_0 than any single observation from the sample. Similarly, calculating the sample variance results in values that grow larger as more observations are taken. Therefore, more robust means of estimating the central value x_0 and the scaling parameter \gamma are needed. One simple method is to take the median value of the sample as an estimator of x_0 and half the sample interquartile range as an estimator of \gamma. Other, more precise and robust methods have been developed. For example, the truncated mean of the middle 24% of the sample order statistics produces an estimate for x_0 that is more efficient than using either the sample median or the full sample mean. However, because of the fat tails of the Cauchy distribution, the efficiency of the estimator decreases if more than 24% of the sample is used.
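As a quick illustration of why the median and half-IQR work where the sample mean fails, the following sketch (a hypothetical example using only the Python standard library; the "true" parameters x_0 = 5 and \gamma = 2 are arbitrary choices for the demo) draws a Cauchy sample by inverse-CDF sampling and compares the robust estimates against the truth:

```python
import math
import random
import statistics

random.seed(0)
x0_true, gamma_true = 5.0, 2.0  # arbitrary "true" parameters for this demo

# Inverse-CDF sampling: if U ~ Uniform(0,1), then
# x0 + gamma * tan(pi*(U - 1/2)) ~ Cauchy(x0, gamma)
sample = [x0_true + gamma_true * math.tan(math.pi * (random.random() - 0.5))
          for _ in range(10_000)]

# Robust estimators: sample median for x0, half the interquartile range for gamma
x0_hat = statistics.median(sample)
q1, _, q3 = statistics.quantiles(sample, n=4)
gamma_hat = (q3 - q1) / 2

# The sample mean, by contrast, is as variable as a single observation,
# since it has the same Cauchy(x0, gamma) distribution as one draw
unstable_mean = statistics.fmean(sample)
```

The median and half-IQR land close to 5 and 2, while `unstable_mean` can sit arbitrarily far from 5 no matter how large the sample grows.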
Maximum likelihood can also be used to estimate the parameters x_0 and \gamma. However, this tends to be complicated by the fact that it requires finding the roots of a high-degree polynomial, and there can be multiple roots that represent local maxima. Also, while the maximum likelihood estimator is asymptotically efficient, it is relatively inefficient for small samples. The log-likelihood function for the Cauchy distribution for sample size n is: \hat\ell(x_1,\dotsc,x_n \mid x_0,\gamma) = -n \log(\gamma\pi) - \sum_{i=1}^n \log\left(1 + \left(\frac{x_i - x_0}{\gamma}\right)^2\right) Maximizing the log-likelihood function with respect to x_0 and \gamma by taking the first derivative produces the following system of equations: \frac{d\ell}{dx_0} = \sum_{i=1}^n \frac{2(x_i - x_0)}{\gamma^2 + (x_i - x_0)^2} = 0 \quad \frac{d\ell}{d\gamma} = \sum_{i=1}^n \frac{2(x_i - x_0)^2}{\gamma\left(\gamma^2 + (x_i - x_0)^2\right)} - \frac{n}{\gamma} = 0 Note that \sum_{i=1}^n \frac{(x_i - x_0)^2}{\gamma^2 + (x_i - x_0)^2} is a monotone function in \gamma and that the solution \gamma must satisfy \min_i |x_i - x_0| \le \gamma \le \max_i |x_i - x_0|. Solving just for x_0 requires solving a polynomial of degree 2n-1, and solving just for \gamma requires solving a polynomial of degree 2n. Therefore, whether solving for one parameter or for both simultaneously, a numerical solution on a computer is typically required. The benefit of maximum likelihood estimation is asymptotic efficiency; estimating x_0 using the sample median is only about 81% as asymptotically efficient as estimating x_0 by maximum likelihood. The truncated sample mean using the middle 24% order statistics is about 88% as asymptotically efficient an estimator of x_0 as the maximum likelihood estimate. When Newton's method is used to find the solution for the maximum likelihood estimate, the middle 24% order statistics can be used as an initial solution for x_0. The scale can be estimated using the median of absolute values: for location-0 Cauchy variables X \sim \mathrm{Cauchy}(0,\gamma), the median of |X| equals the scale parameter, \operatorname{median}(|X|) = \gamma.
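The monotonicity noted above gives a practical route to the scale estimate: at the maximum, the \gamma equation rearranges to \sum_i (x_i - x_0)^2 / (\gamma^2 + (x_i - x_0)^2) = n/2, whose left side is strictly decreasing in \gamma, so bisection on the bracket [\min_i|x_i - x_0|, \max_i|x_i - x_0|] finds the root. A minimal sketch (hypothetical helper name `solve_gamma`, Python standard library only), with the sample median standing in for x_0:

```python
import math
import random
import statistics

def solve_gamma(xs, x0, tol=1e-12):
    """Bisection for the MLE scale: find g such that
    sum((x - x0)^2 / (g^2 + (x - x0)^2)) = n/2.
    The left-hand sum decreases strictly in g, and for generic continuous
    data the root lies between min|x - x0| and max|x - x0|."""
    d2 = [(x - x0) ** 2 for x in xs if x != x0]
    target = len(xs) / 2
    lo, hi = math.sqrt(min(d2)), math.sqrt(max(d2))
    while hi - lo > tol * (1 + hi):
        g = (lo + hi) / 2
        s = sum(v / (g * g + v) for v in d2)
        if s > target:
            lo = g  # sum too large means g is still too small
        else:
            hi = g
    return (lo + hi) / 2

random.seed(1)
xs = [3.0 + 1.5 * math.tan(math.pi * (random.random() - 0.5))
      for _ in range(10_000)]           # Cauchy(3, 1.5) sample for the demo
x0_hat = statistics.median(xs)          # simple location estimate
gamma_hat = solve_gamma(xs, x0_hat)     # should land near 1.5
```

Replacing the median by the middle-24% truncated mean, or iterating on x_0 with Newton's method as the text describes, would sharpen the location estimate further.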


Related distributions


General

*\operatorname{Cauchy}(0,1) \sim t(\mathrm{df}=1), the Student's ''t'' distribution with one degree of freedom
*\operatorname{Cauchy}(\mu,\sigma) \sim t_{\mathrm{df}=1}(\mu,\sigma), the non-standardized Student's ''t'' distribution
*If X, Y \sim \mathrm{N}(0,1) are independent, then \tfrac{X}{Y} \sim \operatorname{Cauchy}(0,1)
*If X \sim \mathrm{U}(0,1), then \tan\left(\pi\left(X - \tfrac{1}{2}\right)\right) \sim \operatorname{Cauchy}(0,1)
*If X \sim \operatorname{Log-Cauchy}(0,1), then \ln(X) \sim \operatorname{Cauchy}(0,1)
*If X \sim \operatorname{Cauchy}(x_0,\gamma), then \tfrac{1}{X} \sim \operatorname{Cauchy}\left(\tfrac{x_0}{x_0^2+\gamma^2}, \tfrac{\gamma}{x_0^2+\gamma^2}\right)
*The Cauchy distribution is a limiting case of a Pearson distribution of type 4.
*The Cauchy distribution is a special case of a Pearson distribution of type 7.
*The Cauchy distribution is a stable distribution: if X \sim \mathrm{Stable}(1, 0, \gamma, \mu), then X \sim \operatorname{Cauchy}(\mu, \gamma).
*The Cauchy distribution is a singular limit of a hyperbolic distribution.
*The wrapped Cauchy distribution, taking values on a circle, is derived from the Cauchy distribution by wrapping it around the circle.
*If X \sim \mathrm{N}(0,1) and Z \sim \operatorname{Inverse-Gamma}(1/2, s^2/2), then Y = \mu + X\sqrt{Z} \sim \operatorname{Cauchy}(\mu, s). For half-Cauchy distributions, the relation holds by setting X \sim \mathrm{N}(0,1) I\{X \ge 0\}.
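Two of the relations above are easy to check numerically: the ratio of independent standard normals and the tangent transform of a uniform variable should both behave like a standard Cauchy, with median 0 and quartiles at -1 and +1. A sketch using only the Python standard library:

```python
import math
import random
import statistics

random.seed(2)
n = 20_000

# Relation: the ratio of two independent standard normals is Cauchy(0, 1)
ratio = [random.gauss(0, 1) / random.gauss(0, 1) for _ in range(n)]

# Relation: tan(pi*(U - 1/2)) for U ~ Uniform(0, 1) is Cauchy(0, 1)
tangent = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

# A standard Cauchy has median 0 and quartiles -1 and +1
summaries = {name: statistics.quantiles(s, n=4)
             for name, s in (("ratio", ratio), ("tangent", tangent))}
```

Comparing quartiles rather than means is deliberate: the sample mean of either list never settles down, for the reasons discussed in the estimation section.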


Lévy measure

The Cauchy distribution is the stable distribution of index 1. The Lévy–Khintchine representation of such a stable distribution of parameter \gamma is given, for X \sim \operatorname{Stable}(\gamma, 0, 0), by: \operatorname{E}\left(e^{i\xi X}\right) = \exp\left(\int_{\mathbb{R}} \left(e^{i\xi y} - 1\right) \Pi_\gamma(dy)\right) where \Pi_\gamma(dy) = \left(c_{1,\gamma} \frac{1}{y^{1+\gamma}} 1_{\{y>0\}} + c_{2,\gamma} \frac{1}{|y|^{1+\gamma}} 1_{\{y<0\}}\right) dy and c_{1,\gamma}, c_{2,\gamma} can be expressed explicitly. In the case \gamma = 1 of the Cauchy distribution, one has c_{1,\gamma} = c_{2,\gamma}. This last representation is a consequence of the formula \pi|\xi| = \operatorname{pv}\int_{\mathbb{R}\smallsetminus\{0\}} \left(1 - e^{i\xi y}\right) \frac{dy}{y^2}
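The last identity can be verified directly: the imaginary part of 1 - e^{i\xi y} is odd in y, so the principal value keeps only the even cosine part, and a change of variables reduces it to the standard integral \int_{-\infty}^{\infty} (1-\cos u)/u^2 \, du = \pi:

```latex
\operatorname{pv}\int_{\mathbb{R}\smallsetminus\{0\}} \left(1 - e^{i\xi y}\right)\frac{dy}{y^2}
  = \int_{-\infty}^{\infty} \frac{1-\cos(\xi y)}{y^2}\,dy
  \;\overset{u=|\xi| y}{=}\; |\xi| \int_{-\infty}^{\infty} \frac{1-\cos u}{u^2}\,du
  = \pi\,|\xi|.
```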


Multivariate Cauchy distribution

A random vector X=(X_1, \ldots, X_k)^T is said to have the multivariate Cauchy distribution if every linear combination of its components Y=a_1X_1+ \cdots + a_kX_k has a Cauchy distribution. That is, for any constant vector a \in \mathbb{R}^k, the random variable Y=a^TX should have a univariate Cauchy distribution. The characteristic function of a multivariate Cauchy distribution is given by: \varphi_X(t) = e^{i x_0(t) - \gamma(t)}, where x_0(t) and \gamma(t) are real functions with x_0(t) a homogeneous function of degree one and \gamma(t) a positive homogeneous function of degree one. More formally: \begin{align} x_0(at) &= a x_0(t), \\ \gamma(at) &= |a| \gamma(t), \end{align} for all t. An example of a bivariate Cauchy distribution can be given by: f(x, y; x_0, y_0, \gamma) = \frac{\gamma}{2\pi} \, \frac{1}{\left((x-x_0)^2 + (y-y_0)^2 + \gamma^2\right)^{3/2}}. Note that in this example, even though the covariance between x and y is 0, x and y are not statistically independent. This formula can also be written for a complex variable; the probability density function of a complex Cauchy distribution is then: f(z; z_0, \gamma) = \frac{\gamma}{2\pi} \, \frac{1}{\left(|z-z_0|^2 + \gamma^2\right)^{3/2}}. Just as the standard Cauchy distribution is the Student ''t''-distribution with one degree of freedom, the multidimensional Cauchy density is the multivariate Student distribution with one degree of freedom. The density of a k-dimensional Student distribution with one degree of freedom is: f(\mathbf{x}; \boldsymbol{\mu}, \mathbf{\Sigma}, k) = \frac{\Gamma\left(\frac{1+k}{2}\right)}{\Gamma\left(\frac{1}{2}\right) \pi^{k/2} \left|\mathbf{\Sigma}\right|^{1/2} \left(1 + (\mathbf{x}-\boldsymbol{\mu})^T \mathbf{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)^{(1+k)/2}}. The properties of the multidimensional Cauchy distribution are then special cases of the multivariate Student distribution.


Occurrence and applications


In general

*In spectroscopy, the Cauchy distribution describes the shape of spectral lines subject to homogeneous broadening, in which all atoms interact in the same way with the frequency range contained in the line shape. Many mechanisms cause homogeneous broadening, most notably collision broadening. Lifetime or natural broadening also gives rise to a line shape described by the Cauchy distribution.
*Applications of the Cauchy distribution or its transformation can be found in fields working with exponential growth. A 1958 paper by White derived the test statistic for estimators of \hat\beta in the equation x_{t+1} = \beta x_t + \varepsilon_{t+1}, \beta > 1, where the maximum likelihood estimator is found using ordinary least squares, and showed that the sampling distribution of the statistic is the Cauchy distribution.
*The Cauchy distribution is often the distribution of observations for objects that are spinning. The classic reference for this is Gull's lighthouse problem; as in the section above, it also arises as the Breit–Wigner distribution in particle physics.
*In hydrology the Cauchy distribution is applied to extreme events such as annual maximum one-day rainfalls and river discharges. The blue picture illustrates an example of fitting the Cauchy distribution to ranked monthly maximum one-day rainfalls, showing also the 90% confidence belt based on the binomial distribution. The rainfall data are represented by plotting positions as part of the cumulative frequency analysis.
*The expression for the imaginary part of complex electrical permittivity, according to the Lorentz model, is a Cauchy distribution.
*As an additional distribution to model fat tails in computational finance, Cauchy distributions can be used to model VaR (value at risk), producing a much larger probability of extreme risk than the Gaussian distribution.
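The spectroscopic line shape mentioned above is the Cauchy density reinterpreted: a Lorentzian profile centered at x_0 whose half-width at half-maximum is \gamma. A minimal sketch (the function name `lorentzian` is illustrative):

```python
import math

def lorentzian(x, x0, gamma):
    """Cauchy/Lorentzian profile: peak at x0, half-width at half-maximum gamma."""
    return gamma / (math.pi * ((x - x0) ** 2 + gamma ** 2))

gamma = 0.5
peak = lorentzian(0.0, 0.0, gamma)    # maximum height, equal to 1/(pi*gamma)
half = lorentzian(gamma, 0.0, gamma)  # evaluated one half-width from the center
ratio = half / peak                   # drops to exactly half the peak height
```

This is why \gamma is quoted as the HWHM of a spectral line: at x = x_0 + \gamma, the denominator doubles, so the profile falls to half its peak value.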


Relativistic Breit–Wigner distribution

In nuclear and particle physics, the energy profile of a resonance is described by the relativistic Breit–Wigner distribution, while the Cauchy distribution is the (non-relativistic) Breit–Wigner distribution.


History

A function with the form of the density function of the Cauchy distribution was studied geometrically by Fermat in 1659, and later was known as the witch of Agnesi, after Maria Gaetana Agnesi included it as an example in her 1748 calculus textbook. Despite its name, the first explicit analysis of the properties of the Cauchy distribution was published by the French mathematician Poisson in 1824, with Cauchy only becoming associated with it during an academic controversy in 1853. ("Cauchy and the Witch of Agnesi", in S. M. Stigler, ''Statistics on the Table'', Harvard, 1999, Chapter 18.) Poisson noted that if the mean of observations following such a distribution were taken, the standard deviation did not converge to any finite number. As such, Laplace's use of the central limit theorem with such a distribution was inappropriate, as it assumed a finite mean and variance. Despite this, Poisson did not regard the issue as important, in contrast to Bienaymé, who was to engage Cauchy in a long dispute over the matter.


See also

* Lévy flight and Lévy process
* Laplace distribution, the "Fourier transform" of the Cauchy distribution
* Cauchy process
* Stable process
* Slash distribution


References


External links

* Earliest Uses: The entry on Cauchy distribution has some historical information.
* Ratios of Normal Variables by George Marsaglia
For X \sim \operatorname{Cauchy}(0,\gamma), the absolute moments of order p \in (-1,1) exist and satisfy \operatorname{E}|X|^p = \gamma^p \sec(\pi p/2).
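The identity \operatorname{E}|X|^p = \gamma^p \sec(\pi p/2) can be spot-checked by Monte Carlo for the standard case \gamma = 1 (a sketch with the Python standard library; p = 0.4 is an arbitrary choice inside (-1, 1)):

```python
import math
import random

random.seed(4)
p, n = 0.4, 100_000

# Standard Cauchy draws via the tangent transform of a uniform variable
mc = sum(abs(math.tan(math.pi * (random.random() - 0.5))) ** p
         for _ in range(n)) / n

exact = 1 / math.cos(math.pi * p / 2)   # sec(pi*p/2), roughly 1.236
```

Values of |p| closer to 1 make the Monte Carlo average converge more slowly, since the variance of |X|^p blows up as the absolute first moment ceases to exist.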


Higher moments

The Cauchy distribution does not have finite moments of any order. Some of the higher raw moments do exist and have a value of infinity, for example, the raw second moment: \begin{align} \operatorname{E}[X^2] &\propto \int_{-\infty}^\infty \frac{x^2}{1+x^2}\,dx = \int_{-\infty}^\infty \left(1 - \frac{1}{1+x^2}\right) dx \\ &= \int_{-\infty}^\infty dx - \int_{-\infty}^\infty \frac{1}{1+x^2}\,dx = \int_{-\infty}^\infty dx - \pi = \infty. \end{align} By re-arranging the formula, one can see that the second moment is essentially the infinite integral of a constant (here 1). Higher even-powered raw moments will also evaluate to infinity. Odd-powered raw moments, however, are undefined, which is distinctly different from existing with the value of infinity. The odd-powered raw moments are undefined because their values are essentially equivalent to \infty - \infty, since the two halves of the integral both diverge and have opposite signs. The first raw moment is the mean, which, being odd, does not exist. (See also the discussion above about this.) This in turn means that all of the central moments and standardized moments are undefined, since they are all based on the mean. The variance, which is the second central moment, is likewise non-existent (despite the fact that the raw second moment exists with the value infinity). The results for higher moments follow from Hölder's inequality, which implies that higher moments (or halves of moments) diverge if lower ones do.


Moments of truncated distributions

Consider the truncated distribution defined by restricting the standard Cauchy distribution to a bounded interval. Such a truncated distribution has all moments (and the central limit theorem applies for i.i.d. observations from it); yet for almost all practical purposes it behaves like a Cauchy distribution.
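To see the contrast, the following sketch (Python standard library only; the truncation point a = 10 is an arbitrary choice) samples a standard Cauchy restricted to [-a, a] by inverse-CDF sampling and compares the Monte Carlo variance with the closed form (a - \arctan a)/\arctan a, which follows from integrating x^2 against the renormalized density:

```python
import math
import random
import statistics

random.seed(5)
a = 10.0                 # arbitrary truncation point
c = math.atan(a)         # the support [-a, a] corresponds to angles (-c, c)

# Inverse-CDF sampling: tan of a uniform angle in (-c, c) has the standard
# Cauchy density renormalized to [-a, a]
sample = [math.tan(random.uniform(-c, c)) for _ in range(50_000)]

# Unlike the full Cauchy, this distribution has finite moments:
# mean 0 by symmetry, variance (a - atan(a)) / atan(a)
var_exact = (a - math.atan(a)) / math.atan(a)
var_mc = statistics.pvariance(sample)
```

Both the sample mean and sample variance now converge, exactly as the central limit theorem requires, even though individual draws are indistinguishable from Cauchy draws except in the far tails.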


Transformation properties

*If X \sim \operatorname{Cauchy}(x_0,\gamma) then kX + \ell \sim \operatorname{Cauchy}(k x_0 + \ell, |k|\gamma)
*If X \sim \operatorname{Cauchy}(x_0, \gamma_0) and Y \sim \operatorname{Cauchy}(x_1, \gamma_1) are independent, then X+Y \sim \operatorname{Cauchy}(x_0+x_1, \gamma_0+\gamma_1) and X-Y \sim \operatorname{Cauchy}(x_0-x_1, \gamma_0+\gamma_1)
*If X \sim \operatorname{Cauchy}(0,\gamma) then \tfrac{1}{X} \sim \operatorname{Cauchy}(0, \tfrac{1}{\gamma})
*McCullagh's parametrization of the Cauchy distributions: McCullagh, P., "Conditional inference and Cauchy models", ''Biometrika'', volume 79 (1992), pages 247–259. PDF from McCullagh's homepage. Expressing a Cauchy distribution in terms of one complex parameter \psi = x_0 + i\gamma, define X \sim \operatorname{Cauchy}(\psi) to mean X \sim \operatorname{Cauchy}(x_0, |\gamma|). If X \sim \operatorname{Cauchy}(\psi) then: \frac{aX+b}{cX+d} \sim \operatorname{Cauchy}\left(\frac{a\psi+b}{c\psi+d}\right) where a, b, c and d are real numbers.
*Using the same convention as above, if X \sim \operatorname{Cauchy}(\psi) then: \frac{X-i}{X+i} \sim \operatorname{CCauchy}\left(\frac{\psi-i}{\psi+i}\right) where \operatorname{CCauchy} is the circular Cauchy distribution.
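The additive closure property lends itself to a quick numerical check: the sum of independent Cauchy(1, 2) and Cauchy(-3, 0.5) draws should behave like Cauchy(-2, 2.5), so the median should sit near -2 with quartiles near -4.5 and 0.5 (a sketch using only the Python standard library):

```python
import math
import random
import statistics

random.seed(6)

def cauchy(x0, gamma):
    """One Cauchy(x0, gamma) draw via the tangent transform of a uniform."""
    return x0 + gamma * math.tan(math.pi * (random.random() - 0.5))

# X + Y for X ~ Cauchy(1, 2) and Y ~ Cauchy(-3, 0.5): locations and scales add
s = [cauchy(1.0, 2.0) + cauchy(-3.0, 0.5) for _ in range(20_000)]
q1, q2, q3 = statistics.quantiles(s, n=4)   # quartiles of Cauchy(x0, g) are x0 -/+ g
```

Quartiles are used instead of moments because, as discussed above, neither summand nor the sum has a mean or variance.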


Statistical inference


Estimation of parameters

Because the parameters of the Cauchy distribution do not correspond to a mean and variance, attempting to estimate the parameters of the Cauchy distribution by using a sample mean and a sample variance will not succeed. For example, if an i.i.d. sample of size ''n'' is taken from a Cauchy distribution, one may calculate the sample mean as: \bar=\frac 1 n \sum_^n x_i Although the sample values x_i will be concentrated about the central value x_0, the sample mean will become increasingly variable as more observations are taken, because of the increased probability of encountering sample points with a large absolute value. In fact, the distribution of the sample mean will be equal to the distribution of the observations themselves; i.e., the sample mean of a large sample is no better (or worse) an estimator of x_0 than any single observation from the sample. Similarly, calculating the sample variance will result in values that grow larger as more observations are taken. Therefore, more robust means of estimating the central value x_0 and the scaling parameter \gamma are needed. One simple method is to take the median value of the sample as an estimator of x_0 and half the sample
interquartile range In descriptive statistics, the interquartile range (IQR) is a measure of statistical dispersion, which is the spread of the data. The IQR may also be called the midspread, middle 50%, fourth spread, or H‑spread. It is defined as the differen ...
as an estimator of \gamma. Other, more precise and robust methods have been developed. For example, the
truncated mean A truncated mean or trimmed mean is a statistical measure of central tendency, much like the mean and median. It involves the calculation of the mean after discarding given parts of a probability distribution or sample at the high and low end, a ...
of the middle 24% of the sample
order statistics In statistics, the ''k''th order statistic of a statistical sample is equal to its ''k''th-smallest value. Together with rank statistics, order statistics are among the most fundamental tools in non-parametric statistics and inference. Important ...
produces an estimate for x_0 that is more efficient than using either the sample median or the full sample mean. However, because of the fat tails of the Cauchy distribution, the efficiency of the estimator decreases if more than 24% of the sample is used.
Maximum likelihood In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed stati ...
can also be used to estimate the parameters x_0 and \gamma. However, this tends to be complicated by the fact that this requires finding the roots of a high degree polynomial, and there can be multiple roots that represent local maxima. Also, while the maximum likelihood estimator is asymptotically efficient, it is relatively inefficient for small samples. The log-likelihood function for the Cauchy distribution for sample size n is: \hat\ell(x_1,\dotsc,x_n \mid \!x_0,\gamma ) = - n \log (\gamma \pi) - \sum_^n \log \left(1 + \left(\frac\right)^2\right) Maximizing the log likelihood function with respect to x_0 and \gamma by taking the first derivative produces the following system of equations: \frac = \sum_^n \frac =0 \frac = \sum_^n \frac - \frac = 0 Note that \sum_^n \frac is a monotone function in \gamma and that the solution \gamma must satisfy \min , x_i-x_0, \le \gamma\le \max , x_i-x_0, . Solving just for x_0 requires solving a polynomial of degree 2n-1, and solving just for \,\!\gamma requires solving a polynomial of degree 2n. Therefore, whether solving for one parameter or for both parameters simultaneously, a numerical solution on a computer is typically required. The benefit of maximum likelihood estimation is asymptotic efficiency; estimating x_0 using the sample median is only about 81% as asymptotically efficient as estimating x_0 by maximum likelihood. The truncated sample mean using the middle 24% order statistics is about 88% as asymptotically efficient an estimator of x_0 as the maximum likelihood estimate. When
Newton's method In numerical analysis, the Newton–Raphson method, also known simply as Newton's method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a ...
is used to find the solution for the maximum likelihood estimate, the middle 24% order statistics can be used as an initial solution for x_0. The shape can be estimated using the median of absolute values, since for location 0 Cauchy variables X\sim\mathrm(0,\gamma), the \operatorname(, X, ) = \gamma the shape parameter.


Related distributions


General

*\operatorname(0,1) \sim \textrm(\mathrm=1)\, Student's ''t'' distribution *\operatorname(\mu,\sigma) \sim \textrm_(\mu,\sigma)\, non-standardized Student's ''t'' distribution *If X, Y \sim \textrm(0,1)\, X, Y independent, then \tfrac X Y\sim \textrm(0,1)\, *If X \sim \textrm(0,1)\, then \tan \left( \pi \left(X-\tfrac\right) \right) \sim \textrm(0,1)\, *If X \sim \operatorname(0, 1) then \ln(X) \sim \textrm(0, 1) *If X \sim \operatorname(x_0,\gamma) then \tfrac1X \sim \operatorname\left(\tfrac,\tfrac\right) *The Cauchy distribution is a limiting case of a
Pearson distribution The Pearson distribution is a family of continuous probability distributions. It was first published by Karl Pearson in 1895 and subsequently extended by him in 1901 and 1916 in a series of articles on biostatistics. History The Pearson syste ...
of type 4 *The Cauchy distribution is a special case of a
Pearson distribution The Pearson distribution is a family of continuous probability distributions. It was first published by Karl Pearson in 1895 and subsequently extended by him in 1901 and 1916 in a series of articles on biostatistics. History The Pearson syste ...
of type 7. *The Cauchy distribution is a
stable distribution In probability theory, a distribution is said to be stable if a linear combination of two independent random variables with this distribution has the same distribution, up to location and scale parameters. A random variable is said to be st ...
: if X \sim \textrm(1, 0, \gamma, \mu), then X \sim \operatorname(\mu, \gamma). *The Cauchy distribution is a singular limit of a
hyperbolic distribution The hyperbolic distribution is a continuous probability distribution characterized by the logarithm of the probability density function being a hyperbola. Thus the distribution decreases exponentially, which is more slowly than the normal distrib ...
*The wrapped Cauchy distribution, taking values on a circle, is derived from the Cauchy distribution by wrapping it around the circle. *If X \sim \textrm(0,1), Z \sim \operatorname(1/2, s^2/2), then Y = \mu + X \sqrt Z \sim \operatorname(\mu,s). For half-Cauchy distributions, the relation holds by setting X \sim \textrm(0,1) I\.


Lévy measure

The Cauchy distribution is the
stable distribution In probability theory, a distribution is said to be stable if a linear combination of two independent random variables with this distribution has the same distribution, up to location and scale parameters. A random variable is said to be st ...
of index 1. The Lévy–Khintchine representation of such a stable distribution of parameter \gamma is given, for X \sim \operatorname(\gamma, 0, 0)\, by: \operatorname\left( e^ \right) = \exp\left( \int_ (e^ - 1) \Pi_\gamma(dy) \right) where \Pi_\gamma(dy) = \left( c_ \frac 1_ + c_ \frac 1_ \right) \, dy and c_, c_ can be expressed explicitly. In the case \gamma = 1 of the Cauchy distribution, one has c_ = c_ . This last representation is a consequence of the formula \pi , x, = \operatorname\int_ (1 - e^) \, \frac


Multivariate Cauchy distribution

A
random vector In probability, and statistics, a multivariate random variable or random vector is a list or vector of mathematical variables each of whose value is unknown, either because the value has not yet occurred or because there is imperfect knowledge ...
X=(X_1, \ldots, X_k)^T is said to have the multivariate Cauchy distribution if every linear combination of its components Y=a_1X_1+ \cdots + a_kX_k has a Cauchy distribution. That is, for any constant vector a\in \mathbb R^k, the random variable Y=a^TX should have a univariate Cauchy distribution. The characteristic function of a multivariate Cauchy distribution is given by: \varphi_X(t) = e^, \! where x_0(t) and \gamma(t) are real functions with x_0(t) a
homogeneous function In mathematics, a homogeneous function is a function of several variables such that the following holds: If each of the function's arguments is multiplied by the same scalar (mathematics), scalar, then the function's value is multiplied by some p ...
of degree one and \gamma(t) a positive homogeneous function of degree one. More formally: \begin x_0(at) &= a x_0(t), \\ \gamma (at) &= , a, \gamma (t), \end for all t. An example of a bivariate Cauchy distribution can be given by: f(x, y; x_0,y_0,\gamma) = \frac \, \frac . Note that in this example, even though the covariance between x and y is 0, x and y are not
statistically independent Independence is a fundamental notion in probability theory, as in statistics and the theory of stochastic processes. Two event (probability theory), events are independent, statistically independent, or stochastically independent if, informally s ...
. We also can write this formula for complex variable. Then the probability density function of complex Cauchy is : f(z; z_0,\gamma) = \frac \,\frac . Like how the standard Cauchy distribution is the Student t-distribution with one degree of freedom, the multidimensional Cauchy density is the multivariate Student distribution with one degree of freedom. The density of a k dimension Student distribution with one degree of freedom is: f(\mathbf; \boldsymbol,\mathbf, k)= \frac . The properties of multidimensional Cauchy distribution are then special cases of the multivariate Student distribution.


Occurrence and applications


In general

*In spectroscopy, the Cauchy distribution describes the shape of spectral lines subject to homogeneous broadening, in which all atoms interact in the same way with the frequency range contained in the line shape. Many mechanisms cause homogeneous broadening, most notably collision broadening. Lifetime or natural broadening also gives rise to a line shape described by the Cauchy distribution.
*Applications of the Cauchy distribution or its transformation can be found in fields working with exponential growth. A 1958 paper by White derived the test statistic for estimators of \hat{\beta} in the equation x_{t+1} = \beta x_t + \varepsilon_{t+1}, \beta > 1, where the maximum likelihood estimator is found using ordinary least squares, and showed that the sampling distribution of the statistic is the Cauchy distribution.
*The Cauchy distribution is often the distribution of observations for objects that are spinning. The classic reference for this is Gull's lighthouse problem; it also arises, as in the section below, as the Breit–Wigner distribution in particle physics.
*In hydrology the Cauchy distribution is applied to extreme events such as annual maximum one-day rainfalls and river discharges. The blue picture illustrates an example of fitting the Cauchy distribution to ranked monthly maximum one-day rainfalls, showing also the 90% confidence belt based on the binomial distribution. The rainfall data are represented by plotting positions as part of the cumulative frequency analysis.
*The expression for the imaginary part of the complex electrical permittivity, according to the Lorentz model, is a Cauchy distribution.
*As an additional distribution to model fat tails in computational finance, Cauchy distributions can be used to model VaR (value at risk), producing a much larger probability of extreme risk than the Gaussian distribution.
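The ray construction behind Gull's lighthouse problem also gives a one-line sampler: a ray issuing from (x_0, \gamma) at a uniformly distributed angle crosses the x-axis at x_0 + \gamma\tan\theta. A minimal Python sketch (function names are illustrative) draws samples this way and checks the median and quartiles, which for a Cauchy(x_0, \gamma) distribution sit at x_0 and x_0 \pm \gamma:

```python
import math
import random

def cauchy_sample(x0=0.0, gamma=1.0, rng=random):
    """One draw via the ray construction: a ray from (x0, gamma) at a
    uniform angle theta crosses the x-axis at x0 + gamma * tan(theta)."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    return x0 + gamma * math.tan(theta)

rng = random.Random(42)
xs = sorted(cauchy_sample(x0=2.0, gamma=0.5, rng=rng) for _ in range(200_000))

n = len(xs)
median = xs[n // 2]
q1, q3 = xs[n // 4], xs[3 * n // 4]

# For Cauchy(x0, gamma): median = x0, quartiles = x0 -/+ gamma.
print(round(median, 2), round(q1, 2), round(q3, 2))
assert abs(median - 2.0) < 0.05
assert abs(q1 - 1.5) < 0.05 and abs(q3 - 2.5) < 0.05
```

Order statistics such as the median and quartiles are used here precisely because, as the History section notes, the sample mean of Cauchy draws does not converge.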


Relativistic Breit–Wigner distribution

In nuclear and particle physics, the energy profile of a resonance is described by the relativistic Breit–Wigner distribution, while the Cauchy distribution is the (non-relativistic) Breit–Wigner distribution.


History

A function with the form of the density function of the Cauchy distribution was studied geometrically by Fermat in 1659, and later was known as the witch of Agnesi, after Maria Gaetana Agnesi included it as an example in her 1748 calculus textbook. Despite its name, the first explicit analysis of the properties of the Cauchy distribution was published by the French mathematician Poisson in 1824, with Cauchy only becoming associated with it during an academic controversy in 1853 ("Cauchy and the Witch of Agnesi", in ''Statistics on the Table'', S. M. Stigler, Harvard, 1999, Chapter 18). Poisson noted that if the mean of observations following such a distribution were taken, the standard deviation did not converge to any finite number. As such, Laplace's use of the central limit theorem with such a distribution was inappropriate, as it assumed a finite mean and variance. Despite this, Poisson did not regard the issue as important, in contrast to Bienaymé, who was to engage Cauchy in a long dispute over the matter.
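Poisson's observation is easy to reproduce by simulation: the mean of n independent standard Cauchy draws is itself standard Cauchy, so averaging never concentrates the estimate. The sketch below (illustrative plain Python, sampling via the tangent construction) shows that the fraction of batch means exceeding 1 in absolute value stays near P(|X| > 1) = 1/2 regardless of the batch size n:

```python
import math
import random

rng = random.Random(7)

def std_cauchy():
    """One standard Cauchy draw: tangent of a uniform angle."""
    return math.tan(rng.uniform(-math.pi / 2, math.pi / 2))

# For each batch size n, compute 2000 batch means. Because the mean of n
# i.i.d. standard Cauchy variables is again standard Cauchy, the fraction
# of means with |m| > 1 stays near 1/2 no matter how large n grows --
# unlike a finite-variance law, where it would shrink toward 0.
for n in (10, 1000):
    means = [sum(std_cauchy() for _ in range(n)) / n for _ in range(2000)]
    frac = sum(abs(m) > 1 for m in means) / len(means)
    print(n, round(frac, 2))  # stays near 0.5 for both batch sizes
    assert abs(frac - 0.5) < 0.1
```

This is exactly why Laplace's central-limit argument, which presumes a finite mean and variance, fails for this distribution.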


See also

* Lévy flight and Lévy process
* Laplace distribution, the Fourier transform of the Cauchy distribution
* Cauchy process
* Stable process
* Slash distribution


References


External links

* Earliest Uses: The entry on Cauchy distribution has some historical information.
* Ratios of Normal Variables, by George Marsaglia
{{DEFAULTSORT:Cauchy Distribution Augustin-Louis Cauchy Continuous distributions Probability distributions with non-finite variance Power laws Stable distributions Location-scale family probability distributions