In
directional statistics Directional statistics (also circular statistics or spherical statistics) is the subdiscipline of statistics that deals with directions (unit vectors in Euclidean space, R''n''), axes (lines through the origin in R''n'') or rotations in R''n''. M ...
, the von Mises–Fisher distribution (named after
Richard von Mises
Richard Edler von Mises (; 19 April 1883 – 14 July 1953) was an Austrian scientist and mathematician who worked on solid mechanics, fluid mechanics, aerodynamics, aeronautics, statistics and probability theory. He held the position of Gordo ...
and
Ronald Fisher
Sir Ronald Aylmer Fisher (17 February 1890 – 29 July 1962) was a British polymath who was active as a mathematician, statistician, biologist, geneticist, and academic. For his work in statistics, he has been described as "a genius who a ...
), is a
probability distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon i ...
on the
-
sphere
A sphere () is a Geometry, geometrical object that is a solid geometry, three-dimensional analogue to a two-dimensional circle. A sphere is the Locus (mathematics), set of points that are all at the same distance from a given point in three ...
in
. If
the distribution reduces to the
von Mises distribution
In probability theory and directional statistics, the von Mises distribution (also known as the circular normal distribution or Tikhonov distribution) is a continuous probability distribution on the circle. It is a close approximation to the wr ...
on the
circle
A circle is a shape consisting of all points in a plane that are at a given distance from a given point, the centre. Equivalently, it is the curve traced out by a point that moves in a plane so that its distance from a given point is const ...
.
Definition
The
probability density
In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can ...
function of the von Mises–Fisher distribution for the random ''p''-dimensional unit vector
is given by:
:
where
and
the normalization constant
is equal to
:
where
denotes the modified
Bessel function
Bessel functions, first defined by the mathematician Daniel Bernoulli and then generalized by Friedrich Bessel, are canonical solutions of Bessel's differential equation
x^2 \frac + x \frac + \left(x^2 - \alpha^2 \right)y = 0
for an arbitrary ...
of the first kind at order
. If
, the normalization constant reduces to
:
The parameters
and
are called the ''mean direction'' and ''
concentration parameter
In probability theory and statistics, a concentration parameter is a special kind of numerical parameter of a parametric family of probability distributions. Concentration parameters occur in two kinds of distribution: In the Von Mises–Fisher ...
'', respectively. The greater the value of
, the higher the concentration of the distribution around the mean direction
. The distribution is
unimodal
In mathematics, unimodality means possessing a unique mode. More generally, unimodality means there is only a single highest value, somehow defined, of some mathematical object.
Unimodal probability distribution
In statistics, a unimodal pr ...
for
, and is uniform on the sphere for
.
The von Mises–Fisher distribution for
is also called the Fisher distribution.
It was first used to model the interaction of
electric dipole
The electric dipole moment is a measure of the separation of positive and negative electrical charges within a system, that is, a measure of the system's overall polarity. The SI unit for electric dipole moment is the coulomb-meter (C⋅m). The d ...
s in an
electric field
An electric field (sometimes E-field) is the physical field that surrounds electrically charged particles and exerts force on all other charged particles in the field, either attracting or repelling them. It also refers to the physical field fo ...
.
Other applications are found in
geology
Geology () is a branch of natural science concerned with Earth and other astronomical objects, the features or rocks of which it is composed, and the processes by which they change over time. Modern geology significantly overlaps all other Ear ...
,
bioinformatics
Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combi ...
, and
text mining
Text mining, also referred to as ''text data mining'', similar to text analytics, is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extract ...
.
Note on the normalization constant
In the textbook by
Mardia and Jupp,
the normalization constant given for the Von Mises Fisher probability density is apparently different from the one given here:
. In that book, the normalization constant is specified as:
:
This is resolved by noting that Mardia and Jupp give the density "with respect to the uniform distribution", while the density here is specified in the usual way, with respect to
Lebesgue measure
In measure theory, a branch of mathematics, the Lebesgue measure, named after French mathematician Henri Lebesgue, is the standard way of assigning a measure to subsets of ''n''-dimensional Euclidean space. For ''n'' = 1, 2, or 3, it coincides wit ...
. The density (w.r.t. Lebesgue measure) of the uniform distribution is the reciprocal of the
surface area of the (p-1)-sphere, so that the uniform density function is given by the constant:
:
It then follows that:
:
While the value for
was derived above via the surface area, the same result may be obtained by setting
in the above formula for
. This can be done by noting that the
series expansion for divided by
has but one non-zero term at
. (To evaluate that term, one needs to use the
definition
A definition is a statement of the meaning of a term (a word, phrase, or other set of symbols). Definitions can be classified into two large categories: intensional definitions (which try to give the sense of a term), and extensional definitio ...
.)
Relation to normal distribution
Starting from a
normal distribution
In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
:
f(x) = \frac e^
The parameter \mu ...
with
isotropic
Isotropy is uniformity in all orientations; it is derived . Precise definitions depend on the subject area. Exceptions, or inequalities, are frequently indicated by the prefix ' or ', hence ''anisotropy''. ''Anisotropy'' is also used to describe ...
covariance
In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the les ...
and mean
of length
, whose density function is:
:
the Von Mises–Fisher distribution is obtained by conditioning on
. By expanding
:
and using the fact that the first two right-hand-side terms are fixed, the Von Mises-Fisher density,
is recovered by recomputing the normalization constant by integrating
over the unit sphere. If
, we get the uniform distribution, with density
.
More succinctly, the
restriction
Restriction, restrict or restrictor may refer to:
Science and technology
* restrict, a keyword in the C programming language used in pointer declarations
* Restriction enzyme, a type of enzyme that cleaves genetic material
Mathematics and logi ...
of any isotropic multivariate normal density to the unit hypersphere, gives a Von Mises-Fisher density, up to normalization.
This construction can be generalized by starting with a normal distribution with a general covariance matrix, in which case conditioning on
gives the
Fisher-Bingham distribution.
Estimation of parameters
Mean direction
A series of ''N''
independent
Independent or Independents may refer to:
Arts, entertainment, and media Artist groups
* Independents (artist group), a group of modernist painters based in the New Hope, Pennsylvania, area of the United States during the early 1930s
* Independ ...
unit vector
In mathematics, a unit vector in a normed vector space is a vector (often a spatial vector) of length 1. A unit vector is often denoted by a lowercase letter with a circumflex, or "hat", as in \hat (pronounced "v-hat").
The term ''direction vecto ...
s
are drawn from a von Mises–Fisher distribution.
The
maximum likelihood
In statistics, maximum likelihood estimation (MLE) is a method of estimation theory, estimating the Statistical parameter, parameters of an assumed probability distribution, given some observed data. This is achieved by Mathematical optimization, ...
estimates of the mean direction
is simply the normalized
arithmetic mean
In mathematics and statistics, the arithmetic mean ( ) or arithmetic average, or just the ''mean'' or the ''average'' (when the context is clear), is the sum of a collection of numbers divided by the count of numbers in the collection. The colle ...
, a
sufficient statistic
In statistics, a statistic is ''sufficient'' with respect to a statistical model and its associated unknown parameter if "no other statistic that can be calculated from the same sample provides any additional information as to the value of the pa ...
:
:
Concentration parameter
Use the modified
Bessel function of the first kind
Bessel functions, first defined by the mathematician Daniel Bernoulli and then generalized by Friedrich Bessel, are canonical solutions of Bessel's differential equation
x^2 \frac + x \frac + \left(x^2 - \alpha^2 \right)y = 0
for an arbitrary ...
to define
:
Then:
:
Thus
is the solution to
:
A simple approximation to
is (Sra, 2011)
:
A more accurate inversion can be obtained by iterating the
Newton method
In numerical analysis, Newton's method, also known as the Newton–Raphson method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a real-va ...
a few times
:
:
Standard error
For ''N'' ≥ 25, the estimated spherical
standard error
The standard error (SE) of a statistic (usually an estimate of a parameter) is the standard deviation of its sampling distribution or an estimate of that standard deviation. If the statistic is the sample mean, it is called the standard error ...
of the sample mean direction can be computed as:
:
where
:
It is then possible to approximate a
a spherical
confidence interval
In frequentist statistics, a confidence interval (CI) is a range of estimates for an unknown parameter. A confidence interval is computed at a designated ''confidence level''; the 95% confidence level is most common, but other levels, such as 9 ...
(a ''confidence cone'') about
with semi-vertical angle:
:
where
For example, for a 95% confidence cone,
and thus
Expected value
The expected value of the Von Mises–Fisher distribution is not on the unit hypersphere, but instead has a length of less than one. This length is given by
as defined above. For a Von Mises–Fisher distribution with mean direction
and concentration
, the expected value is:
:
.
For
, the expected value is at the origin. For finite
, the length of the expected value, is strictly between zero and one and is a monotonic rising function of
.
The empirical mean (
arithmetic average
In mathematics and statistics, the arithmetic mean ( ) or arithmetic average, or just the ''mean'' or the ''average'' (when the context is clear), is the sum of a collection of numbers divided by the count of numbers in the collection. The colle ...
) of a collection of points on the unit hypersphere behaves in a similar manner, being close to the origin for widely spread data and close to the sphere for concentrated data. Indeed, for the Von Mises–Fisher distribution, the expected value of the maximum-likelihood estimate based on a collection of points is equal to the empirical mean of those points.
Entropy and KL divergence
The expected value can be used to compute
differential entropy
Differential entropy (also referred to as continuous entropy) is a concept in information theory that began as an attempt by Claude Shannon to extend the idea of (Shannon) entropy, a measure of average surprisal of a random variable, to continu ...
and
KL divergence
KL, kL, kl, or kl. may refer to:
Businesses and organizations
* KLM, a Dutch airline (IATA airline designator KL)
* Koninklijke Landmacht, the Royal Netherlands Army
* Kvenna Listin ("Women's List"), a political party in Iceland
* KL FM, a Mala ...
.
The differential entropy of
is:
:
.
Notice that the entropy is a function of
only.
The KL divergence between
and
is:
:
Transformation
Von Mises-Fisher (VMF) distributions are closed under orthogonal linear transforms. Let
be a
-by-
orthogonal matrix
In linear algebra, an orthogonal matrix, or orthonormal matrix, is a real square matrix whose columns and rows are orthonormal vectors.
One way to express this is
Q^\mathrm Q = Q Q^\mathrm = I,
where is the transpose of and is the identity ma ...
. Let
and apply the invertible linear transform:
. The inverse transform is
, because the inverse of an orthogonal matrix is its
transpose
In linear algebra, the transpose of a matrix is an operator which flips a matrix over its diagonal;
that is, it switches the row and column indices of the matrix by producing another matrix, often denoted by (among other notations).
The tr ...
:
. The
Jacobian of the transform is
, for which the absolute value of its
determinant
In mathematics, the determinant is a scalar value that is a function of the entries of a square matrix. It characterizes some properties of the matrix and the linear map represented by the matrix. In particular, the determinant is nonzero if and ...
is 1, also because of the orthogonality. Using these facts and the form of the VMF density, it follows that:
:
One may verify that since
and
are unit vectors, then by the orthogonality, so are
and
.
Pseudo-random number generation
To generate a Von Mises–Fisher distributed pseudo-random spherical 3-D unit vector
on the
sphere
A sphere () is a Geometry, geometrical object that is a solid geometry, three-dimensional analogue to a two-dimensional circle. A sphere is the Locus (mathematics), set of points that are all at the same distance from a given point in three ...
for a given
and
, define