In directional statistics, the von Mises–Fisher distribution (named after Richard von Mises and Ronald Fisher) is a probability distribution on the (p-1)-sphere in \mathbb{R}^p. If p = 2, the distribution reduces to the von Mises distribution on the circle.
Definition
The probability density function of the von Mises–Fisher distribution for the random ''p''-dimensional unit vector \mathbf{x} is given by:
:f_p(\mathbf{x}; \boldsymbol{\mu}, \kappa) = C_p(\kappa) \exp\left(\kappa \boldsymbol{\mu}^\mathsf{T} \mathbf{x}\right),
where \kappa \ge 0, \left\|\boldsymbol{\mu}\right\| = 1 and the normalization constant C_p(\kappa) is equal to
:C_p(\kappa) = \frac{\kappa^{p/2-1}}{(2\pi)^{p/2} I_{p/2-1}(\kappa)},
where I_v denotes the modified Bessel function of the first kind at order v. If p = 3, the normalization constant reduces to
:C_3(\kappa) = \frac{\kappa}{4\pi \sinh\kappa} = \frac{\kappa}{2\pi\left(e^{\kappa} - e^{-\kappa}\right)}.
The parameters \boldsymbol{\mu} and \kappa are called the ''mean direction'' and ''concentration parameter'', respectively. The greater the value of \kappa, the higher the concentration of the distribution around the mean direction \boldsymbol{\mu}. The distribution is unimodal for \kappa > 0, and is uniform on the sphere for \kappa = 0.
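The density can be evaluated numerically straight from the definition. The following is a minimal sketch, not a reference implementation: the helper names `bessel_i`, `vmf_norm_const` and `vmf_pdf` are hypothetical, and the Bessel function is computed from its truncated power series, which is only adequate for moderate \kappa > 0 and p \ge 2.

```python
import math

def bessel_i(v, x, terms=200):
    """Modified Bessel function of the first kind I_v(x) for x > 0.

    Illustrative truncated power series; adequate for moderate x only.
    """
    total = 0.0
    for m in range(terms):
        # log of the m-th series term (x/2)^(2m+v) / (m! * Gamma(m+v+1))
        log_term = ((2 * m + v) * math.log(x / 2.0)
                    - math.lgamma(m + 1) - math.lgamma(m + v + 1))
        total += math.exp(log_term)
    return total

def vmf_norm_const(p, kappa):
    """C_p(kappa) = kappa^(p/2-1) / ((2 pi)^(p/2) I_{p/2-1}(kappa)), kappa > 0."""
    v = p / 2.0 - 1.0
    return kappa ** v / ((2.0 * math.pi) ** (p / 2.0) * bessel_i(v, kappa))

def vmf_pdf(x, mu, kappa):
    """Density at unit vector x for mean direction mu (unit vector), kappa > 0."""
    dot = sum(a * b for a, b in zip(mu, x))
    return vmf_norm_const(len(x), kappa) * math.exp(kappa * dot)

# Usage: the density is largest at the mean direction and smallest opposite to it.
mu = [0.0, 0.0, 1.0]
at_mode = vmf_pdf([0.0, 0.0, 1.0], mu, 2.0)
at_antipode = vmf_pdf([0.0, 0.0, -1.0], mu, 2.0)
```

For p = 3 the value of `vmf_norm_const(3, kappa)` should agree with the closed form \kappa / (4\pi \sinh\kappa), which gives a quick way to check the series.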
The von Mises–Fisher distribution for p = 3 is also called the Fisher distribution. It was first used to model the interaction of electric dipoles in an electric field. Other applications are found in geology, bioinformatics, and text mining.
Note on the normalization constant
In the textbook by Mardia and Jupp, the normalization constant given for the von Mises–Fisher probability density is apparently different from the one given here, C_p(\kappa). In that book, the normalization constant is specified as:
:C^*_p(\kappa) = \frac{\left(\kappa/2\right)^{p/2-1}}{\Gamma(p/2)\, I_{p/2-1}(\kappa)}.
This is resolved by noting that Mardia and Jupp give the density "with respect to the uniform distribution", while the density here is specified in the usual way, with respect to Lebesgue measure. The density (w.r.t. Lebesgue measure) of the uniform distribution is the reciprocal of the surface area of the (p-1)-sphere, so that the uniform density function is given by the constant:
:C_p(0) = \frac{\Gamma(p/2)}{2\pi^{p/2}}.
It then follows that:
:C^*_p(\kappa) = \frac{C_p(\kappa)}{C_p(0)}.
While the value for C_p(0) was derived above via the surface area, the same result may be obtained by setting \kappa = 0 in the above formula for C_p(\kappa). This can be done by noting that the series expansion for I_{p/2-1}(\kappa) divided by \kappa^{p/2-1} has but one non-zero term at \kappa = 0. (To evaluate that term, one needs to use the definition 0^0 = 1.)
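The limit \kappa \to 0 can be written out explicitly. The following worked equation is a sketch that uses the standard power series of the modified Bessel function; the shorthand v = p/2 - 1 is introduced here only for readability.

```latex
\frac{I_v(\kappa)}{\kappa^{v}}
  = \sum_{m=0}^{\infty} \frac{\kappa^{2m}}{2^{2m+v}\, m!\, \Gamma(m+v+1)}
  \;\xrightarrow{\;\kappa \to 0\;}\;
  \frac{1}{2^{v}\,\Gamma(v+1)},
  \qquad v = \tfrac{p}{2} - 1,
\]
\[
C_p(0) = \frac{2^{p/2-1}\,\Gamma(p/2)}{(2\pi)^{p/2}}
       = \frac{\Gamma(p/2)}{2\,\pi^{p/2}} .
```

Only the m = 0 term survives at \kappa = 0, and the result agrees with the reciprocal of the surface area of the (p-1)-sphere.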
Relation to normal distribution
Starting from a normal distribution with isotropic covariance \kappa^{-1}\mathbf{I} and mean \boldsymbol{\mu} of length r > 0, whose density function is:
:G_p(\mathbf{x}; \boldsymbol{\mu}, \kappa) = \left(\sqrt{\frac{\kappa}{2\pi}}\right)^p \exp\left(-\kappa \frac{(\mathbf{x}-\boldsymbol{\mu})^\mathsf{T}(\mathbf{x}-\boldsymbol{\mu})}{2}\right),
the von Mises–Fisher distribution is obtained by conditioning on \left\|\mathbf{x}\right\| = 1. By expanding
:(\mathbf{x}-\boldsymbol{\mu})^\mathsf{T}(\mathbf{x}-\boldsymbol{\mu}) = \mathbf{x}^\mathsf{T}\mathbf{x} + \boldsymbol{\mu}^\mathsf{T}\boldsymbol{\mu} - 2\boldsymbol{\mu}^\mathsf{T}\mathbf{x},
and using the fact that the first two right-hand-side terms are fixed, the von Mises–Fisher density f_p(\mathbf{x}; r^{-1}\boldsymbol{\mu}, r\kappa) is recovered by recomputing the normalization constant by integrating \mathbf{x} over the unit sphere. If r = 0, we get the uniform distribution, with density f_p(\mathbf{x}; \mathbf{0}, 0).
More succinctly, the restriction of any isotropic multivariate normal density to the unit hypersphere gives a von Mises–Fisher density, up to normalization.
This construction can be generalized by starting with a normal distribution with a general covariance matrix, in which case conditioning on \left\|\mathbf{x}\right\| = 1 gives the Fisher–Bingham distribution.
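The "up to normalization" statement can be checked numerically for p = 3: on the unit sphere, the log-density of an isotropic normal with mean of length r and precision \kappa should differ from the log-density of a von Mises–Fisher distribution with mean direction \boldsymbol{\mu}/r and concentration r\kappa by a constant. The sketch below assumes that reading; all function names and the example numbers are hypothetical.

```python
import math
import random

def log_gauss(x, mean, kappa):
    """Log-density of an isotropic normal with covariance (1/kappa) * I."""
    p = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return 0.5 * p * math.log(kappa / (2.0 * math.pi)) - 0.5 * kappa * sq

def log_vmf3(x, mu, kappa):
    """Log-density of the p = 3 von Mises-Fisher distribution (closed-form C_3)."""
    dot = sum(a * b for a, b in zip(mu, x))
    return math.log(kappa / (4.0 * math.pi * math.sinh(kappa))) + kappa * dot

def random_unit_vector(rng):
    """Uniform point on the unit sphere via a normalized Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

rng = random.Random(0)
mean = [0.3, -1.2, 0.5]                 # Gaussian mean (example values)
kappa = 2.0                             # Gaussian precision (example value)
r = math.sqrt(sum(c * c for c in mean))
mu_unit = [c / r for c in mean]

# The difference of log-densities should be the same at every point of the sphere.
diffs = []
for _ in range(5):
    x = random_unit_vector(rng)
    diffs.append(log_gauss(x, mean, kappa) - log_vmf3(x, mu_unit, r * kappa))
spread = max(diffs) - min(diffs)
```

A value of `spread` near zero (up to rounding) indicates that the two densities are proportional on the sphere.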
Estimation of parameters
Mean direction
A series of ''N'' independent unit vectors \mathbf{x}_i are drawn from a von Mises–Fisher distribution. The maximum likelihood estimate of the mean direction \boldsymbol{\mu} is simply the normalized arithmetic mean, a sufficient statistic:
:\hat{\boldsymbol{\mu}} = \bar{\mathbf{x}} / \bar{R}, \quad \text{where } \bar{\mathbf{x}} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i \text{ and } \bar{R} = \left\|\bar{\mathbf{x}}\right\|.
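The estimate of the mean direction amounts to averaging the sample vectors and rescaling the result to unit length. A minimal sketch, with the hypothetical helper name `mean_direction` and samples given as plain lists of coordinates:

```python
import math

def mean_direction(samples):
    """Return (mu_hat, r_bar): normalized sample mean and its length."""
    n = len(samples)
    p = len(samples[0])
    # Arithmetic mean of the unit vectors (generally inside the unit ball).
    x_bar = [sum(x[i] for x in samples) / n for i in range(p)]
    r_bar = math.sqrt(sum(c * c for c in x_bar))
    if r_bar == 0.0:
        raise ValueError("mean vector is zero; the mean direction is undefined")
    mu_hat = [c / r_bar for c in x_bar]
    return mu_hat, r_bar

# Usage with two orthogonal unit vectors (example data).
mu_hat, r_bar = mean_direction([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

The mean resultant length `r_bar` is returned as well because the concentration estimate depends on it.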
Concentration parameter
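Use the modified Bessel function of the first kind to define
:A_p(\kappa) = \frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)}.
Then:
:\kappa = A_p^{-1}(\bar{R}),
where \bar{R} is the length of the sample mean vector. Thus \kappa is the solution to
:A_p(\kappa) = \frac{\left\|\sum_{i=1}^{N} \mathbf{x}_i\right\|}{N} = \bar{R}.
A simple approximation to \kappa is (Sra, 2011)
:\hat{\kappa} = \frac{\bar{R}\left(p - \bar{R}^2\right)}{1 - \bar{R}^2}.
A more accurate inversion can be obtained by iterating the Newton method a few times:
:\hat{\kappa}_1 = \hat{\kappa} - \frac{A_p(\hat{\kappa}) - \bar{R}}{1 - A_p(\hat{\kappa})^2 - \frac{p-1}{\hat{\kappa}} A_p(\hat{\kappa})},
:\hat{\kappa}_2 = \hat{\kappa}_1 - \frac{A_p(\hat{\kappa}_1) - \bar{R}}{1 - A_p(\hat{\kappa}_1)^2 - \frac{p-1}{\hat{\kappa}_1} A_p(\hat{\kappa}_1)}.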
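The approximation followed by Newton refinement can be sketched as follows. This is an illustration under stated assumptions rather than a production routine: `bessel_i`, `a_p` and `estimate_kappa` are hypothetical names, the Bessel ratio is computed from truncated power series (moderate \kappa only), and the mean resultant length is assumed to satisfy 0 < \bar{R} < 1.

```python
import math

def bessel_i(v, x, terms=200):
    """I_v(x) for x > 0 from its truncated power series (illustrative)."""
    total = 0.0
    for m in range(terms):
        log_term = ((2 * m + v) * math.log(x / 2.0)
                    - math.lgamma(m + 1) - math.lgamma(m + v + 1))
        total += math.exp(log_term)
    return total

def a_p(p, kappa):
    """Bessel ratio A_p(kappa) = I_{p/2}(kappa) / I_{p/2-1}(kappa)."""
    return bessel_i(p / 2.0, kappa) / bessel_i(p / 2.0 - 1.0, kappa)

def estimate_kappa(r_bar, p, newton_steps=5):
    """Closed-form starting value followed by a few Newton steps on A_p(kappa) = r_bar."""
    kappa = r_bar * (p - r_bar ** 2) / (1.0 - r_bar ** 2)
    for _ in range(newton_steps):
        a = a_p(p, kappa)
        # Derivative of A_p: 1 - A_p^2 - (p - 1) / kappa * A_p
        kappa -= (a - r_bar) / (1.0 - a * a - (p - 1.0) / kappa * a)
    return kappa

# Usage: recover kappa = 5 for p = 3 from the exact mean resultant length,
# using the p = 3 identity A_3(kappa) = coth(kappa) - 1/kappa.
r_bar_exact = 1.0 / math.tanh(5.0) - 1.0 / 5.0
kappa_hat = estimate_kappa(r_bar_exact, 3)
```

The number of Newton steps is a free choice here; the text only says "a few", and for the example above the iteration settles within a handful of steps.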
Standard error
For ''N'' ≥ 25, the estimated spherical standard error of the sample mean direction can be computed as:
:\hat{\sigma} = \left(\frac{d}{N \bar{R}^2}\right)^{1/2},
where
:d = 1 - \frac{1}{N} \sum_{i=1}^{N} \left(\hat{\boldsymbol{\mu}}^\mathsf{T} \mathbf{x}_i\right)^2.
It is then possible to approximate a 100(1-\alpha)\% spherical confidence interval (a ''confidence cone'') about \hat{\boldsymbol{\mu}} with semi-vertical angle:
:q = \arcsin\left(e_\alpha^{1/2} \hat{\sigma}\right), where e_\alpha = -\ln(\alpha).
For example, for a 95% confidence cone, \alpha = 0.05, e_\alpha = -\ln(0.05) = 2.996, and thus q = \arcsin(1.731 \hat{\sigma}).
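The confidence-cone calculation can be sketched in a few lines. The function name `confidence_cone` and the synthetic data are hypothetical; the code assumes the standard error is computed from the mean resultant length and the squared projections of the samples onto the estimated mean direction, and that the argument of the arcsine does not exceed one.

```python
import math

def confidence_cone(samples, alpha=0.05):
    """Return (mu_hat, sigma_hat, q) with q the semi-vertical angle in radians."""
    n = len(samples)
    p = len(samples[0])
    x_bar = [sum(x[i] for x in samples) / n for i in range(p)]
    r_bar = math.sqrt(sum(c * c for c in x_bar))
    mu_hat = [c / r_bar for c in x_bar]
    # d = 1 - mean squared projection onto the estimated mean direction
    proj_sq = sum(sum(m * c for m, c in zip(mu_hat, x)) ** 2 for x in samples)
    d = max(0.0, 1.0 - proj_sq / n)   # guard against tiny negative rounding
    sigma_hat = math.sqrt(d / (n * r_bar ** 2))
    e_alpha = -math.log(alpha)
    q = math.asin(math.sqrt(e_alpha) * sigma_hat)
    return mu_hat, sigma_hat, q

# Usage: 30 synthetic unit vectors spread symmetrically about the x-axis.
t = 0.1
samples = [[math.cos(t), math.sin(t), 0.0]] * 15 + [[math.cos(t), -math.sin(t), 0.0]] * 15
mu_hat, sigma_hat, q = confidence_cone(samples, alpha=0.05)
```

For this symmetric example the estimated mean direction is the x-axis and the cone angle reduces to \arcsin\left(\sqrt{-\ln 0.05}\, \tan(t) / \sqrt{30}\right).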
Expected value
The expected value of the von Mises–Fisher distribution is not on the unit hypersphere, but instead has a length of less than one. This length is given by A_p(\kappa) as defined above. For a von Mises–Fisher distribution with mean direction \boldsymbol{\mu} and concentration \kappa > 0, the expected value is:
:A_p(\kappa)\boldsymbol{\mu}.
For \kappa = 0, the expected value is at the origin. For finite \kappa > 0, the length of the expected value is strictly between zero and one and is a monotonically increasing function of \kappa.
The empirical mean (arithmetic average) of a collection of points on the unit hypersphere behaves in a similar manner, being close to the origin for widely spread data and close to the sphere for concentrated data. Indeed, the expected value of the von Mises–Fisher distribution fitted by maximum likelihood to a collection of points is equal to the empirical mean of those points.
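The last statement follows in one line from the maximum-likelihood estimates. In the worked equation below, \hat{\boldsymbol{\mu}} and \hat{\kappa} denote those estimates, \bar{\mathbf{x}} the empirical mean and \bar{R} its length; the hats and bars are notation assumed here for illustration.

```latex
A_p(\hat{\kappa})\,\hat{\boldsymbol{\mu}}
  = \bar{R}\,\frac{\bar{\mathbf{x}}}{\bar{R}}
  = \bar{\mathbf{x}},
\qquad\text{since } A_p(\hat{\kappa}) = \bar{R}
\text{ and } \hat{\boldsymbol{\mu}} = \bar{\mathbf{x}} / \bar{R}.
```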
Entropy and KL divergence
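The expected value can be used to compute differential entropy and KL divergence. Because \log f_p is an affine function of \mathbf{x}, its expectation is obtained by evaluating it at the expected value.
The differential entropy of \operatorname{VMF}(\boldsymbol{\mu}, \kappa) is:
:-\operatorname{E}_{\mathbf{x}}\left[\log f_p(\mathbf{x}; \boldsymbol{\mu}, \kappa)\right] = -\log f_p(A_p(\kappa)\boldsymbol{\mu}; \boldsymbol{\mu}, \kappa) = -\log C_p(\kappa) - \kappa A_p(\kappa).
Notice that the entropy is a function of \kappa only.
The KL divergence between \operatorname{VMF}(\boldsymbol{\mu}_0, \kappa_0) and \operatorname{VMF}(\boldsymbol{\mu}_1, \kappa_1) is:
:\operatorname{E}_{\mathbf{x}}\left[\log \frac{f_p(\mathbf{x}; \boldsymbol{\mu}_0, \kappa_0)}{f_p(\mathbf{x}; \boldsymbol{\mu}_1, \kappa_1)}\right] = \log \frac{f_p(A_p(\kappa_0)\boldsymbol{\mu}_0; \boldsymbol{\mu}_0, \kappa_0)}{f_p(A_p(\kappa_0)\boldsymbol{\mu}_0; \boldsymbol{\mu}_1, \kappa_1)}.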
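For p = 3 both quantities have simple closed forms, because the normalization constant is \kappa / (4\pi \sinh\kappa) and the Bessel ratio is \coth\kappa - 1/\kappa. The sketch below assumes those p = 3 identities and \kappa > 0; the function names are hypothetical.

```python
import math

def log_c3(kappa):
    """log C_3(kappa) with C_3(kappa) = kappa / (4 pi sinh(kappa))."""
    return math.log(kappa / (4.0 * math.pi * math.sinh(kappa)))

def a3(kappa):
    """Length of the expected value for p = 3: coth(kappa) - 1/kappa."""
    return 1.0 / math.tanh(kappa) - 1.0 / kappa

def vmf3_entropy(kappa):
    """Differential entropy: -log C_3(kappa) - kappa * A_3(kappa)."""
    return -log_c3(kappa) - kappa * a3(kappa)

def vmf3_kl(mu0, kappa0, mu1, kappa1):
    """KL divergence from VMF(mu0, kappa0) to VMF(mu1, kappa1) for p = 3."""
    dot = sum(a * b for a, b in zip(mu0, mu1))
    a0 = a3(kappa0)
    # Log-density ratio evaluated at the expected value A_3(kappa0) * mu0.
    return (log_c3(kappa0) - log_c3(kappa1)
            + kappa0 * a0 - kappa1 * a0 * dot)

# Usage with example parameters.
h = vmf3_entropy(2.0)
kl = vmf3_kl([0.0, 0.0, 1.0], 2.0, [1.0, 0.0, 0.0], 3.0)
```

As \kappa approaches zero the entropy approaches \log(4\pi), the entropy of the uniform distribution on the unit sphere, which is a convenient sanity check.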
Transformation
Von Mises–Fisher (VMF) distributions are closed under orthogonal linear transforms. Let \mathbf{U} be a p-by-p orthogonal matrix. Let \mathbf{x} \sim \operatorname{VMF}(\boldsymbol{\mu}, \kappa) and apply the invertible linear transform: \mathbf{y} = \mathbf{U}\mathbf{x}. The inverse transform is \mathbf{x} = \mathbf{U}^\mathsf{T}\mathbf{y}, because the inverse of an orthogonal matrix is its transpose: \mathbf{U}^{-1} = \mathbf{U}^\mathsf{T}. The Jacobian of the transform is \mathbf{U}, for which the absolute value of its determinant is 1, also because of the orthogonality. Using these facts and the form of the VMF density, it follows that:
:\mathbf{y} \sim \operatorname{VMF}(\mathbf{U}\boldsymbol{\mu}, \kappa).
One may verify that since \boldsymbol{\mu} and \mathbf{x} are unit vectors, then by the orthogonality, so are \mathbf{U}\boldsymbol{\mu} and \mathbf{y}.
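The closure property can be spelled out as a short change-of-variables computation. The worked equation below is a sketch; the symbols \mathbf{U}, \mathbf{y} and f_{\mathbf{y}} (the density of the transformed vector) are notation assumed here.

```latex
f_{\mathbf{y}}(\mathbf{y})
  = f_p\!\left(\mathbf{U}^\mathsf{T}\mathbf{y};\, \boldsymbol{\mu}, \kappa\right)
    \left|\det \mathbf{U}^\mathsf{T}\right|
  = C_p(\kappa)\, \exp\!\left(\kappa\, \boldsymbol{\mu}^\mathsf{T} \mathbf{U}^\mathsf{T} \mathbf{y}\right)
  = C_p(\kappa)\, \exp\!\left(\kappa\, (\mathbf{U}\boldsymbol{\mu})^\mathsf{T} \mathbf{y}\right)
  = f_p\!\left(\mathbf{y};\, \mathbf{U}\boldsymbol{\mu}, \kappa\right).
```

The concentration is unchanged and only the mean direction is rotated or reflected.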
Pseudo-random number generation
To generate a von Mises–Fisher distributed pseudo-random 3-D unit vector \mathbf{X}_s on the sphere S^2 for a given \boldsymbol{\mu} and \kappa, write the vector in spherical coordinates as (polar angle, azimuthal angle, radius). For a mean direction along the polar axis, \boldsymbol{\mu} = [0, (\cdot), 1] (polar angle 0, arbitrary azimuth, radius 1), the pseudo-random vector is then given by
:\mathbf{X}_s = [\arccos W, V, 1],
where V is sampled from the continuous uniform distribution U(a, b) with lower bound a and upper bound b,
:V \sim U(0, 2\pi),
and
:W = 1 + \frac{1}{\kappa}\left(\ln\xi + \ln\left(1 - \frac{\xi - 1}{\xi} e^{-2\kappa}\right)\right),
where \xi is sampled from the standard continuous uniform distribution U(0, 1),
:\xi \sim U(0, 1).
Here, W should be set to its limiting value W = -1 when \xi = 0, and \mathbf{X}_s should be rotated to match any other desired \boldsymbol{\mu}.
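The sampling recipe for p = 3 can be sketched as below. Several choices here are assumptions made for illustration: the two logarithms are merged into \ln\left(\xi + (1 - \xi)e^{-2\kappa}\right), which is algebraically the same and stays finite at \xi = 0; the result is produced in Cartesian rather than spherical coordinates; and the final rotation is replaced by building an orthonormal tangent basis at \boldsymbol{\mu}, which yields the same distribution because the azimuth is uniform. All function names are hypothetical.

```python
import math
import random

def sample_w(kappa, rng):
    """Cosine of the angle to the mean direction, by inverse-CDF sampling."""
    xi = rng.random()
    # Combined form of ln(xi) + ln(1 - (xi - 1)/xi * exp(-2 kappa)).
    return 1.0 + math.log(xi + (1.0 - xi) * math.exp(-2.0 * kappa)) / kappa

def tangent_basis(mu):
    """Two unit vectors orthogonal to mu (a unit vector) and to each other."""
    i = min(range(3), key=lambda k: abs(mu[k]))   # axis least aligned with mu
    e = [0.0, 0.0, 0.0]
    e[i] = 1.0
    t1 = [e[k] - mu[i] * mu[k] for k in range(3)]  # Gram-Schmidt step
    n = math.sqrt(sum(c * c for c in t1))
    t1 = [c / n for c in t1]
    t2 = [mu[1] * t1[2] - mu[2] * t1[1],           # cross product mu x t1
          mu[2] * t1[0] - mu[0] * t1[2],
          mu[0] * t1[1] - mu[1] * t1[0]]
    return t1, t2

def sample_vmf3(mu, kappa, rng):
    """One pseudo-random unit vector in Cartesian coordinates (kappa > 0)."""
    w = sample_w(kappa, rng)
    v = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(max(0.0, 1.0 - w * w))           # sine of the polar angle
    t1, t2 = tangent_basis(mu)
    return [s * math.cos(v) * t1[k] + s * math.sin(v) * t2[k] + w * mu[k]
            for k in range(3)]

# Usage: draw samples about an example mean direction with a fixed seed.
rng = random.Random(1)
mu = [1.0 / math.sqrt(3.0)] * 3
draws = [sample_vmf3(mu, 4.0, rng) for _ in range(20000)]
mean_cos = sum(sum(m * c for m, c in zip(mu, x)) for x in draws) / len(draws)
```

The average of the cosines in `mean_cos` should be close to \coth\kappa - 1/\kappa, the length of the expected value for p = 3. For very large \kappa the term e^{-2\kappa} underflows, so a production sampler would need extra care there.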
Generalizations
The matrix von Mises–Fisher distribution (also known as matrix Langevin distribution) has the density
:f_{n,p}(\mathbf{X}; \mathbf{F}) \propto \exp\left(\operatorname{tr}\left(\mathbf{F}^\mathsf{T}\mathbf{X}\right)\right)
supported on the Stiefel manifold of n \times p orthonormal p-frames \mathbf{X}, where \mathbf{F} is an arbitrary n \times p real matrix.
Distribution of polar angle
For p = 3, the angle θ between \mathbf{x} and \boldsymbol{\mu} satisfies \cos\theta = \boldsymbol{\mu}^\mathsf{T} \mathbf{x}. It has the distribution
:p(\theta) = \int d^2x \, f(\mathbf{x}; \boldsymbol{\mu}, \kappa)\, \delta\left(\theta - \arccos\left(\boldsymbol{\mu}^\mathsf{T} \mathbf{x}\right)\right),
which can be easily evaluated as
:p(\theta) = 2\pi C_3(\kappa)\, \sin\theta\, e^{\kappa \cos\theta}.
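A quick numerical check confirms that this density integrates to one over θ in [0, π]. The sketch below uses a plain midpoint rule and the closed form C_3(\kappa) = \kappa / (4\pi \sinh\kappa); the function names and the example \kappa are hypothetical.

```python
import math

def polar_angle_pdf(theta, kappa):
    """p(theta) = 2 pi C_3(kappa) sin(theta) exp(kappa cos(theta)), kappa > 0."""
    c3 = kappa / (4.0 * math.pi * math.sinh(kappa))
    return 2.0 * math.pi * c3 * math.sin(theta) * math.exp(kappa * math.cos(theta))

def midpoint_integral(f, a, b, n=10000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Usage: total probability for an example concentration.
total = midpoint_integral(lambda t: polar_angle_pdf(t, 3.0), 0.0, math.pi)
```

The factor \sin\theta makes the density vanish at θ = 0, so the most likely angle is strictly positive even though the density on the sphere peaks at the mean direction.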
See also
* Kent distribution, a related distribution on the two-dimensional unit sphere
* von Mises distribution, the von Mises–Fisher distribution where ''p'' = 2, on the one-dimensional unit circle
* Bivariate von Mises distribution
* Directional statistics
References
Further reading
* Dhillon, I., Sra, S. (2003) "Modeling Data using Directional Distributions". Tech. rep., University of Texas, Austin.
* Banerjee, A., Dhillon, I. S., Ghosh, J., & Sra, S. (2005). "Clustering on the unit hypersphere using von Mises-Fisher distributions". Journal of Machine Learning Research, 6(Sep), 1345-1382.