In
directional statistics
Directional statistics (also circular statistics or spherical statistics) is the subdiscipline of statistics that deals with directions (unit vectors in Euclidean space, R''n''), axes ( lines through the origin in R''n'') or rotations in R''n''. ...
, the von Mises–Fisher distribution (named after
Richard von Mises
Richard Martin Edler von Mises (; 19 April 1883 – 14 July 1953) was an Austrian scientist and mathematician who worked on solid mechanics, fluid mechanics, aerodynamics, aeronautics, statistics and probability theory. He held the position of ...
and
Ronald Fisher
Sir Ronald Aylmer Fisher (17 February 1890 – 29 July 1962) was a British polymath who was active as a mathematician, statistician, biologist, geneticist, and academic. For his work in statistics, he has been described as "a genius who a ...
), is a
probability distribution
In probability theory and statistics, a probability distribution is a Function (mathematics), function that gives the probabilities of occurrence of possible events for an Experiment (probability theory), experiment. It is a mathematical descri ...
on the
-
sphere
A sphere (from Ancient Greek, Greek , ) is a surface (mathematics), surface analogous to the circle, a curve. In solid geometry, a sphere is the Locus (mathematics), set of points that are all at the same distance from a given point in three ...
in
. If
the distribution reduces to the
von Mises distribution
In probability theory and directional statistics, the Richard von Mises, von Mises distribution (also known as the circular normal distribution or the Andrey Nikolayevich Tikhonov, Tikhonov distribution) is a continuous probability distribution ...
on the
circle
A circle is a shape consisting of all point (geometry), points in a plane (mathematics), plane that are at a given distance from a given point, the Centre (geometry), centre. The distance between any point of the circle and the centre is cal ...
.
Definition
The
probability density function of the von Mises–Fisher distribution for the random ''p''-dimensional unit vector
is given by:
:
where
and
the normalization constant
is equal to
:
where
denotes the modified
Bessel function
Bessel functions, named after Friedrich Bessel who was the first to systematically study them in 1824, are canonical solutions of Bessel's differential equation
x^2 \frac + x \frac + \left(x^2 - \alpha^2 \right)y = 0
for an arbitrary complex ...
of the first kind at order
. If
, the normalization constant reduces to
:
The parameters
and
are called the ''mean direction'' and ''
concentration parameter'', respectively. The greater the value of
, the higher the concentration of the distribution around the mean direction
. The distribution is
unimodal
In mathematics, unimodality means possessing a unique mode. More generally, unimodality means there is only a single highest value, somehow defined, of some mathematical object.
Unimodal probability distribution
In statistics, a unimodal p ...
for
, and is uniform on the sphere for
.
The von Mises–Fisher distribution for
is also called the Fisher distribution.
It was first used to model the interaction of
electric dipole
The electric dipole moment is a measure of the separation of positive and negative electrical charges within a system: that is, a measure of the system's overall polarity. The SI unit for electric dipole moment is the coulomb-metre (C⋅m). The ...
s in an
electric field
An electric field (sometimes called E-field) is a field (physics), physical field that surrounds electrically charged particles such as electrons. In classical electromagnetism, the electric field of a single charge (or group of charges) descri ...
.
Other applications are found in
geology
Geology (). is a branch of natural science concerned with the Earth and other astronomical objects, the rocks of which they are composed, and the processes by which they change over time. Modern geology significantly overlaps all other Earth ...
,
bioinformatics
Bioinformatics () is an interdisciplinary field of science that develops methods and Bioinformatics software, software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, ...
, and
text mining
Text mining, text data mining (TDM) or text analytics is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from differe ...
.
Support
The
support of the Von Mises–Fisher distribution is the ''hypersphere'', or more specifically, the
-sphere, denoted as
:
This is a
-dimensional
manifold
In mathematics, a manifold is a topological space that locally resembles Euclidean space near each point. More precisely, an n-dimensional manifold, or ''n-manifold'' for short, is a topological space with the property that each point has a N ...
embedded in
-dimensional Euclidean space,
.
Note on the normalization constant
In the textbook, ''Directional Statistics''
by
Mardia and Jupp, the normalization constant given for the Von Mises Fisher (VMF) probability density is apparently different from the one given here:
. In that book, for
the normalization constant is specified as:
:
where
is the
gamma function
In mathematics, the gamma function (represented by Γ, capital Greek alphabet, Greek letter gamma) is the most common extension of the factorial function to complex numbers. Derived by Daniel Bernoulli, the gamma function \Gamma(z) is defined ...
. This is resolved by noting that Mardia and Jupp give the density "with respect to the uniform distribution", while the density here is specified with respect to
scaled Hausdorff measure,
, which gives
the surface area of the whole ''(p-1)''-sphere as:
:
the reciprocal of which gives the (constant) density of the uniform distribution,
as:
:
It then follows that:
:
While the value for
was derived above via the surface area, the same result may be obtained by setting
in the above formula for
. This can be done by noting that the
series expansion for divided by
has but one non-zero term at
. (To evaluate that term, one needs to use the
definition
A definition is a statement of the meaning of a term (a word, phrase, or other set of symbols). Definitions can be classified into two large categories: intensional definitions (which try to give the sense of a term), and extensional definitio ...
.)
For further understanding of density functions on the hypersphere, see: .
Relation to normal distribution
Starting from a
multivariate normal distribution
In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional ( univariate) normal distribution to higher dimensions. One d ...
with
isotropic
In physics and geometry, isotropy () is uniformity in all orientations. Precise definitions depend on the subject area. Exceptions, or inequalities, are frequently indicated by the prefix ' or ', hence '' anisotropy''. ''Anisotropy'' is also ...
covariance
In probability theory and statistics, covariance is a measure of the joint variability of two random variables.
The sign of the covariance, therefore, shows the tendency in the linear relationship between the variables. If greater values of one ...
and mean
of length
, whose density function is:
:
the Von Mises–Fisher distribution is obtained by conditioning on
. By expanding
:
the Von Mises-Fisher density,
is recovered by recomputing the normalization constant by integrating
over the unit sphere. If
, we get the uniform distribution, with (contstant) density
, where
is arbitrary.
More succinctly, the
restriction of any isotropic multivariate normal density to the unit hypersphere gives a Von Mises-Fisher density, up to normalization.
See also:
* This construction can be generalized by starting with a normal distribution with a general covariance matrix, in which case restricting to
gives the
Fisher-Bingham distribution.
* Restriction is not to be confused with
projection
Projection or projections may refer to:
Physics
* Projection (physics), the action/process of light, heat, or sound reflecting from a surface to another in a different direction
* The display of images by a projector
Optics, graphics, and carto ...
. If
, which we project onto the unitsphere:
, we get the
projected normal distribution. (Informally, restriction can be thought of as
rejection sampling
In numerical analysis and computational statistics, rejection sampling is a basic technique used to generate observations from a distribution. It is also commonly called the acceptance-rejection method or "accept-reject algorithm" and is a type o ...
with an infinite sampling budget, where we keep only those
that land on the unitsphere, while with projection we use all samples.)
Estimation of parameters
Mean direction
A series of ''N''
independent
Independent or Independents may refer to:
Arts, entertainment, and media Artist groups
* Independents (artist group), a group of modernist painters based in Pennsylvania, United States
* Independentes (English: Independents), a Portuguese artist ...
unit vector
In mathematics, a unit vector in a normed vector space is a Vector (mathematics and physics), vector (often a vector (geometry), spatial vector) of Norm (mathematics), length 1. A unit vector is often denoted by a lowercase letter with a circumfle ...
s
are drawn from a von Mises–Fisher distribution.
The
maximum likelihood
In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed stati ...
estimates of the mean direction
is simply the normalized
arithmetic mean
In mathematics and statistics, the arithmetic mean ( ), arithmetic average, or just the ''mean'' or ''average'' is the sum of a collection of numbers divided by the count of numbers in the collection. The collection is often a set of results fr ...
, a
sufficient statistic
In statistics, sufficiency is a property of a statistic computed on a sample dataset in relation to a parametric model of the dataset. A sufficient statistic contains all of the information that the dataset provides about the model parameters. It ...
:
:
Concentration parameter
Use the modified
Bessel function of the first kind to define
:
Then:
:
Thus
is the solution to
:
A simple approximation to
is (Sra, 2011)
:
A more accurate inversion can be obtained by iterating the
Newton method a few times
:
:
Standard error
For ''N'' ≥ 25, the estimated spherical
standard error
The standard error (SE) of a statistic (usually an estimator of a parameter, like the average or mean) is the standard deviation of its sampling distribution or an estimate of that standard deviation. In other words, it is the standard deviati ...
of the sample mean direction can be computed as:
:
where
:
It is then possible to approximate a
a spherical
confidence interval (a ''confidence cone'') about
with semi-vertical angle:
:
where
For example, for a 95% confidence cone,
and thus
Expected value
The expected value of the Von Mises–Fisher distribution is not on the unit hypersphere, but instead has a length of less than one. This length is given by
as defined above. For a Von Mises–Fisher distribution with mean direction
and concentration
, the expected value is:
:
.
For
, the expected value is at the origin. For finite
, the length of the expected value is strictly between zero and one and is a monotonic rising function of
.
The empirical mean (
arithmetic average) of a collection of points on the unit hypersphere behaves in a similar manner, being close to the origin for widely spread data and close to the sphere for concentrated data. Indeed, for the Von Mises–Fisher distribution, the expected value of the maximum-likelihood estimate based on a collection of points is equal to the empirical mean of those points.
Entropy and KL divergence
The expected value can be used to compute
differential entropy
Differential entropy (also referred to as continuous entropy) is a concept in information theory that began as an attempt by Claude Shannon to extend the idea of (Shannon) entropy (a measure of average surprisal) of a random variable, to continu ...
and
KL divergence
KL, kL, kl, or kl. may refer to:
Businesses and organizations
* KLM, a Dutch airline (IATA airline designator KL)
* Koninklijke Landmacht, the Royal Netherlands Army
* Kvenna Listin ("Women's List"), a political party in Iceland
* KL FM, a Ma ...
.
The differential entropy of
is:
:
where the angle brackets denote expectation. Notice that the entropy is a function of
only.
The KL divergence between
and
is:
:
Transformation
Von Mises-Fisher (VMF) distributions are closed under orthogonal linear transforms. Let
be a
-by-
orthogonal matrix
In linear algebra, an orthogonal matrix, or orthonormal matrix, is a real square matrix whose columns and rows are orthonormal vectors.
One way to express this is
Q^\mathrm Q = Q Q^\mathrm = I,
where is the transpose of and is the identi ...
. Let
and apply the invertible linear transform:
. The inverse transform is
, because the inverse of an orthogonal matrix is its
transpose
In linear algebra, the transpose of a Matrix (mathematics), matrix is an operator which flips a matrix over its diagonal;
that is, it switches the row and column indices of the matrix by producing another matrix, often denoted by (among other ...
:
. The
Jacobian of the transform is
, for which the absolute value of its
determinant
In mathematics, the determinant is a Scalar (mathematics), scalar-valued function (mathematics), function of the entries of a square matrix. The determinant of a matrix is commonly denoted , , or . Its value characterizes some properties of the ...
is 1, also because of the orthogonality. Using these facts and the form of the VMF density, it follows that:
:
One may verify that since
and
are unit vectors, then by the orthogonality, so are
and
.
Pseudo-random number generation
General case
An algorithm for drawing pseudo-random samples from the Von Mises Fisher (VMF) distribution was given by Ulrich
and later corrected by Wood.
An implementation in
R is given by Hornik and Grün;
and a fast
Python implementation is described by Pinzón and Jung.
To simulate from a VMF distribution on the
-dimensional
unitsphere,
, with mean direction
, these algorithms use the following radial-tangential decomposition for a point
:
:
where
lives in the tangential
-dimensional unit-subsphere that is centered at and perpendicular to
; while