Von Mises–Fisher distribution

In directional statistics, the von Mises–Fisher distribution (named after Richard von Mises and Ronald Fisher) is a probability distribution on the (p-1)-sphere in \mathbb{R}^p. If p=2, the distribution reduces to the von Mises distribution on the circle.


Definition

The probability density function of the von Mises–Fisher distribution for the random ''p''-dimensional unit vector \mathbf{x} is given by:
:f_p(\mathbf{x}; \boldsymbol{\mu}, \kappa) = C_p(\kappa) \exp\left(\kappa \boldsymbol{\mu}^\mathsf{T} \mathbf{x}\right),
where \kappa \ge 0, \left\Vert \boldsymbol{\mu} \right\Vert = 1, and the normalization constant C_p(\kappa) is equal to
:C_p(\kappa) = \frac{\kappa^{p/2-1}}{(2\pi)^{p/2} I_{p/2-1}(\kappa)},
where I_v denotes the modified Bessel function of the first kind at order v. If p = 3, the normalization constant reduces to
:C_3(\kappa) = \frac{\kappa}{4\pi\sinh\kappa} = \frac{\kappa}{2\pi\left(e^{\kappa} - e^{-\kappa}\right)}.

The parameters \boldsymbol{\mu} and \kappa are called the ''mean direction'' and ''concentration parameter'', respectively. The greater the value of \kappa, the higher the concentration of the distribution around the mean direction \boldsymbol{\mu}. The distribution is unimodal for \kappa > 0, and is uniform on the sphere for \kappa = 0.

The von Mises–Fisher distribution for p=3 is also called the Fisher distribution. It was first used to model the interaction of electric dipoles in an electric field. Other applications are found in geology, bioinformatics, and text mining.
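The density above can be evaluated numerically in log space, which avoids overflow of the Bessel function for large \kappa. The following is a minimal sketch, assuming NumPy and SciPy are available; the function names `vmf_log_norm_const` and `vmf_log_pdf` are illustrative and not part of any library, and the code assumes \kappa > 0.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function of the first kind


def vmf_log_norm_const(p, kappa):
    """Return log C_p(kappa) for kappa > 0 (hypothetical helper)."""
    v = p / 2.0 - 1.0
    # ive(v, k) = I_v(k) * exp(-k) for k > 0, so log I_v(k) = log(ive(v, k)) + k
    log_bessel = np.log(ive(v, kappa)) + kappa
    return v * np.log(kappa) - (p / 2.0) * np.log(2.0 * np.pi) - log_bessel


def vmf_log_pdf(x, mu, kappa):
    """Log density at unit vector(s) x; the last axis of x has length p."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    p = mu.shape[-1]
    return vmf_log_norm_const(p, kappa) + kappa * (x @ mu)
```

For p = 3 the value of `np.exp(vmf_log_norm_const(3, kappa))` should agree with the closed form \kappa / (4\pi\sinh\kappa).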


Note on the normalization constant

In the textbook by Mardia and Jupp, the normalization constant given for the von Mises–Fisher probability density appears to differ from the one given here, C_p(\kappa). In that book, the normalization constant is specified as:
:C^*_p(\kappa) = \frac{\left(\frac{\kappa}{2}\right)^{p/2-1}}{\Gamma(p/2)\, I_{p/2-1}(\kappa)}.
This is resolved by noting that Mardia and Jupp give the density "with respect to the uniform distribution", while the density here is specified in the usual way, with respect to Lebesgue measure. The density (w.r.t. Lebesgue measure) of the uniform distribution is the reciprocal of the surface area of the (p-1)-sphere, so that the uniform density function is given by the constant:
:C_p(0) = \frac{\Gamma(p/2)}{2\pi^{p/2}}.
It then follows that:
:C^*_p(\kappa) = \frac{C_p(\kappa)}{C_p(0)}.
While the value for C_p(0) was derived above via the surface area, the same result may be obtained by setting \kappa=0 in the above formula for C_p(\kappa). This can be done by noting that the series expansion for I_{p/2-1}(\kappa) divided by \kappa^{p/2-1} has only one non-zero term at \kappa=0. (To evaluate that term, one needs to use the definition 0^0=1.)
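The relation between the two constants can be checked term by term. The following worked step is a sketch that uses only the formulas stated in this section, together with the standard small-argument behaviour I_v(\kappa)/\kappa^v \to 1/(2^v\,\Gamma(v+1)) as \kappa \to 0.

```latex
\begin{aligned}
\frac{C_p(\kappa)}{C_p(0)}
  &= \frac{\kappa^{p/2-1}}{(2\pi)^{p/2}\, I_{p/2-1}(\kappa)} \cdot \frac{2\pi^{p/2}}{\Gamma(p/2)}
   = \frac{\kappa^{p/2-1}}{2^{p/2-1}\,\Gamma(p/2)\, I_{p/2-1}(\kappa)}
   = \frac{(\kappa/2)^{p/2-1}}{\Gamma(p/2)\, I_{p/2-1}(\kappa)}
   = C^*_p(\kappa), \\[4pt]
\lim_{\kappa \to 0} C_p(\kappa)
  &= \frac{2^{p/2-1}\,\Gamma(p/2)}{(2\pi)^{p/2}}
   = \frac{\Gamma(p/2)}{2\pi^{p/2}}
   = C_p(0).
\end{aligned}
```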


Relation to normal distribution

Starting from a normal distribution with isotropic covariance \kappa^{-1}\mathbf{I} and mean \boldsymbol{\mu} of length r>0, whose density function is:
:G_p(\mathbf{x}; \boldsymbol{\mu}, \kappa) = \left(\sqrt{\frac{\kappa}{2\pi}}\right)^p \exp\left( -\kappa \frac{(\mathbf{x}-\boldsymbol{\mu})'(\mathbf{x}-\boldsymbol{\mu})}{2} \right),
the von Mises–Fisher distribution is obtained by conditioning on \left\|\mathbf{x}\right\| = 1. By expanding
:(\mathbf{x}-\boldsymbol{\mu})'(\mathbf{x}-\boldsymbol{\mu}) = \mathbf{x}'\mathbf{x} + \boldsymbol{\mu}'\boldsymbol{\mu} - 2\boldsymbol{\mu}'\mathbf{x},
and using the fact that the first two right-hand-side terms are fixed, the von Mises–Fisher density f_p(\mathbf{x}; r^{-1}\boldsymbol{\mu}, r\kappa) is recovered by recomputing the normalization constant by integrating \mathbf{x} over the unit sphere. If r=0, we get the uniform distribution, with density f_p(\mathbf{x}; \boldsymbol{0}, 0).

More succinctly, the restriction of any isotropic multivariate normal density to the unit hypersphere gives a von Mises–Fisher density, up to normalization.

This construction can be generalized by starting with a normal distribution with a general covariance matrix, in which case conditioning on \left\|\mathbf{x}\right\| = 1 gives the Fisher–Bingham distribution.
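The conditioning step for the isotropic case can be written out explicitly. The following sketch restricts the isotropic normal density to the unit sphere, where \mathbf{x}'\mathbf{x} = 1 and \boldsymbol{\mu}'\boldsymbol{\mu} = r^2, and collects everything that does not depend on \mathbf{x} into the proportionality constant.

```latex
\begin{aligned}
G_p(\mathbf{x}; \boldsymbol{\mu}, \kappa)\Big|_{\|\mathbf{x}\|=1}
  &= \left(\frac{\kappa}{2\pi}\right)^{p/2}
     \exp\!\left(-\frac{\kappa\,(1 + r^2)}{2}\right)
     \exp\!\left(\kappa\, \boldsymbol{\mu}'\mathbf{x}\right) \\
  &\propto \exp\!\left( (r\kappa)\, \left(r^{-1}\boldsymbol{\mu}\right)'\mathbf{x} \right),
\end{aligned}
```

which is the kernel of a von Mises–Fisher density with unit mean direction r^{-1}\boldsymbol{\mu} and concentration r\kappa.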


Estimation of parameters


Mean direction

Suppose a series of ''N'' independent unit vectors x_i is drawn from a von Mises–Fisher distribution. The maximum likelihood estimate of the mean direction \mu is simply the normalized arithmetic mean, a sufficient statistic:
:\mu = \bar{x}/\bar{R}, \text{ where } \bar{x} = \frac{1}{N}\sum_i^N x_i, \text{ and } \bar{R} = \|\bar{x}\|.
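The estimate of the mean direction amounts to averaging the samples and renormalizing. A minimal sketch, assuming NumPy and samples stored as the rows of an array; the function name `vmf_mean_direction` is illustrative only.

```python
import numpy as np


def vmf_mean_direction(x):
    """Return (mu_hat, r_bar) for samples x of shape (N, p) with unit-norm rows."""
    x = np.asarray(x, dtype=float)
    x_bar = x.mean(axis=0)           # arithmetic mean of the samples
    r_bar = np.linalg.norm(x_bar)    # mean resultant length, between 0 and 1
    if r_bar == 0.0:
        raise ValueError("mean resultant length is zero; the mean direction is undefined")
    return x_bar / r_bar, r_bar
```

The second return value, the mean resultant length, is the quantity needed to estimate the concentration parameter.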


Concentration parameter

Use the modified Bessel function of the first kind to define
:A_p(\kappa) = \frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)}.
Then:
:\kappa = A_p^{-1}(\bar{R}).
Thus \kappa is the solution to
:A_p(\kappa) = \frac{\left\| \sum_i^N x_i \right\|}{N} = \bar{R}.
A simple approximation to \kappa is (Sra, 2011)
:\hat{\kappa} = \frac{\bar{R}(p - \bar{R}^2)}{1 - \bar{R}^2}.
A more accurate inversion can be obtained by iterating Newton's method a few times:
:\hat{\kappa}_1 = \hat{\kappa} - \frac{A_p(\hat{\kappa}) - \bar{R}}{1 - A_p(\hat{\kappa})^2 - \frac{p-1}{\hat{\kappa}} A_p(\hat{\kappa})},
:\hat{\kappa}_2 = \hat{\kappa}_1 - \frac{A_p(\hat{\kappa}_1) - \bar{R}}{1 - A_p(\hat{\kappa}_1)^2 - \frac{p-1}{\hat{\kappa}_1} A_p(\hat{\kappa}_1)}.
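The approximation followed by Newton refinement can be sketched as below, assuming SciPy for the Bessel functions and a mean resultant length strictly between 0 and 1. The names `bessel_ratio` and `estimate_kappa` are hypothetical, and the default of two Newton steps simply mirrors the two updates written above.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function of the first kind


def bessel_ratio(p, kappa):
    """A_p(kappa) = I_{p/2}(kappa) / I_{p/2-1}(kappa); the exponential scaling cancels."""
    return ive(p / 2.0, kappa) / ive(p / 2.0 - 1.0, kappa)


def estimate_kappa(r_bar, p, newton_steps=2):
    """Estimate kappa from the mean resultant length r_bar, with 0 < r_bar < 1."""
    # Closed-form starting point (the simple approximation attributed to Sra, 2011)
    kappa = r_bar * (p - r_bar**2) / (1.0 - r_bar**2)
    for _ in range(newton_steps):
        a = bessel_ratio(p, kappa)
        # Newton update for the root of A_p(kappa) - r_bar;
        # the denominator is the derivative of A_p at the current kappa
        kappa -= (a - r_bar) / (1.0 - a**2 - (p - 1.0) / kappa * a)
    return kappa
```

For p = 3 the ratio has the closed form A_3(\kappa) = \coth\kappa - 1/\kappa, which gives a convenient way to check the inversion.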


Standard error

For ''N'' ≥ 25, the estimated spherical standard error of the sample mean direction can be computed as:
:\hat{\sigma} = \left(\frac{d}{N\bar{R}^2}\right)^{1/2},
where
:d = 1 - \frac{1}{N} \sum_i^N \left(\mu^T x_i\right)^2.
It is then possible to approximate a 100(1-\alpha)\% spherical confidence interval (a ''confidence cone'') about \mu with semi-vertical angle:
:q = \arcsin\left(e_\alpha^{1/2}\hat{\sigma}\right),
where
:e_\alpha = -\ln(\alpha).
For example, for a 95% confidence cone, \alpha = 0.05, e_\alpha = -\ln(0.05) = 2.996, and thus q = \arcsin(1.731\hat{\sigma}).
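The confidence cone can be computed directly from the samples. A minimal sketch, assuming NumPy and samples stored as unit-norm rows; the function name `confidence_cone` is illustrative, and the code does not itself enforce the ''N'' ≥ 25 condition.

```python
import numpy as np


def confidence_cone(x, alpha=0.05):
    """Return (mu_hat, sigma_hat, q) for samples x of shape (N, p) with unit-norm rows."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    x_bar = x.mean(axis=0)
    r_bar = np.linalg.norm(x_bar)
    mu = x_bar / r_bar
    d = 1.0 - np.mean((x @ mu) ** 2)
    sigma_hat = np.sqrt(d / (n * r_bar**2))
    e_alpha = -np.log(alpha)
    # arcsin returns nan when its argument exceeds 1, i.e. when the data are
    # too dispersed for the approximation to yield a cone
    q = np.arcsin(np.sqrt(e_alpha) * sigma_hat)
    return mu, sigma_hat, q
```

The angle `q` is returned in radians; `np.degrees(q)` converts it to degrees.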


Expected value

The expected value of the von Mises–Fisher distribution is not on the unit hypersphere, but instead has a length of less than one. This length is given by A_p(\kappa) as defined above. For a von Mises–Fisher distribution with mean direction \boldsymbol{\mu} and concentration \kappa>0, the expected value is:
:A_p(\kappa)\boldsymbol{\mu}.
For \kappa=0, the expected value is at the origin. For finite \kappa>0, the length of the expected value is strictly between zero and one and is a monotonically increasing function of \kappa.

The empirical mean (arithmetic average) of a collection of points on the unit hypersphere behaves in a similar manner, being close to the origin for widely spread data and close to the sphere for concentrated data. Indeed, the expected value of the von Mises–Fisher distribution fitted by maximum likelihood to a collection of points is equal to the empirical mean of those points.


Entropy and KL divergence

The expected value can be used to compute differential entropy and KL divergence.

The differential entropy of f_p(\mathbf{x}; \boldsymbol{\mu}, \kappa) is:
:-\log f_p(A_p(\kappa)\boldsymbol{\mu}; \boldsymbol{\mu}, \kappa) = -\log C_p(\kappa) - \kappa A_p(\kappa).
Notice that the entropy is a function of \kappa only.

The KL divergence between f_p(\mathbf{x}; \boldsymbol{\mu}_0, \kappa_0) and f_p(\mathbf{x}; \boldsymbol{\mu}_1, \kappa_1) is:
:\log \frac{f_p(A_p(\kappa_0)\boldsymbol{\mu}_0; \boldsymbol{\mu}_0, \kappa_0)}{f_p(A_p(\kappa_0)\boldsymbol{\mu}_0; \boldsymbol{\mu}_1, \kappa_1)}.
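Because the log density is linear in \mathbf{x}, both quantities reduce to evaluating log densities at the expected value. The sketch below assumes SciPy, \kappa > 0 for every distribution involved, and unit-norm mean directions; the names `vmf_entropy` and `vmf_kl` and the private helpers are illustrative only.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function of the first kind


def _log_c(p, kappa):
    """log C_p(kappa) for kappa > 0."""
    v = p / 2.0 - 1.0
    return v * np.log(kappa) - (p / 2.0) * np.log(2.0 * np.pi) - (np.log(ive(v, kappa)) + kappa)


def _a(p, kappa):
    """A_p(kappa), the length of the expected value."""
    return ive(p / 2.0, kappa) / ive(p / 2.0 - 1.0, kappa)


def vmf_entropy(p, kappa):
    """Differential entropy: -log C_p(kappa) - kappa * A_p(kappa)."""
    return -_log_c(p, kappa) - kappa * _a(p, kappa)


def vmf_kl(mu0, kappa0, mu1, kappa1):
    """KL divergence from VMF(mu0, kappa0) to VMF(mu1, kappa1)."""
    mu0 = np.asarray(mu0, dtype=float)
    mu1 = np.asarray(mu1, dtype=float)
    p = mu0.shape[0]
    mean0 = _a(p, kappa0) * mu0  # expected value under the first distribution
    log_f0 = _log_c(p, kappa0) + kappa0 * (mu0 @ mean0)
    log_f1 = _log_c(p, kappa1) + kappa1 * (mu1 @ mean0)
    return log_f0 - log_f1
```

As a sanity check, the divergence of a distribution from itself is zero, and for small \kappa with p = 3 the entropy approaches \log(4\pi), the entropy of the uniform distribution on the sphere.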


Transformation

Von Mises–Fisher (VMF) distributions are closed under orthogonal linear transforms. Let \mathbf{U} be a p-by-p orthogonal matrix. Let \mathbf{x} \sim \text{VMF}(\boldsymbol{\mu}, \kappa) and apply the invertible linear transform \mathbf{y} = \mathbf{U}\mathbf{x}. The inverse transform is \mathbf{x} = \mathbf{U}'\mathbf{y}, because the inverse of an orthogonal matrix is its transpose: \mathbf{U}^{-1} = \mathbf{U}'. The Jacobian of the transform is \mathbf{U}, for which the absolute value of its determinant is 1, also because of the orthogonality. Using these facts and the form of the VMF density, it follows that:
:\mathbf{y} \sim \text{VMF}(\mathbf{U}\boldsymbol{\mu}, \kappa).
One may verify that since \boldsymbol{\mu} and \mathbf{x} are unit vectors, then by the orthogonality, so are \mathbf{U}\boldsymbol{\mu} and \mathbf{y}.
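The change-of-variables argument can be written in one line. The following sketch writes f_{\mathbf{y}} for the density of the transformed vector (a symbol introduced here only for illustration) and uses the unit Jacobian determinant.

```latex
f_{\mathbf{y}}(\mathbf{y})
  = f_p(\mathbf{U}'\mathbf{y}; \boldsymbol{\mu}, \kappa)\,\left|\det \mathbf{U}'\right|
  = C_p(\kappa)\exp\!\left(\kappa\, \boldsymbol{\mu}'\mathbf{U}'\mathbf{y}\right)
  = C_p(\kappa)\exp\!\left(\kappa\, (\mathbf{U}\boldsymbol{\mu})'\mathbf{y}\right)
  = f_p(\mathbf{y}; \mathbf{U}\boldsymbol{\mu}, \kappa).
```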


Pseudo-random number generation

To generate a von Mises–Fisher distributed pseudo-random 3-D unit vector \mathbf{X}_s on the S^2 sphere for a given \boldsymbol{\mu} and \kappa, define
:\mathbf{X}_s = [\phi, \theta, r],
where \phi is the polar angle, \theta the equatorial angle, and r=1 the distance to the center of the sphere. For \boldsymbol{\mu} = [0, (\cdot), 1] (the pole \phi = 0, where the equatorial angle is arbitrary), the pseudo-random vector is then given by
:\mathbf{X}_s = [\arccos W, V, 1],
where V is sampled from the continuous uniform distribution U(a,b) with lower bound a and upper bound b,
:V \sim U(0, 2\pi),
and
:W = 1 + \frac{1}{\kappa}\left(\ln\xi + \ln\left(1 - \frac{\xi-1}{\xi} e^{-2\kappa}\right)\right),
where \xi is sampled from the standard continuous uniform distribution U(0,1),
:\xi \sim U(0, 1).
Here, W should be set to its limiting value W = -1 when \xi = 0, and \mathbf{X}_s should be rotated to match any other desired \boldsymbol{\mu}.
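The sampling recipe above can be sketched in Cartesian coordinates as follows, assuming NumPy and \kappa > 0. The function name `sample_vmf_3d` is hypothetical. Two choices here are assumptions of this sketch rather than part of the recipe: the two logarithms are merged into \ln(\xi + (1-\xi)e^{-2\kappa}), which is algebraically the same expression, and the pole is mapped onto \boldsymbol{\mu} with a Householder reflection, which is an orthogonal transform and therefore preserves the distribution family.

```python
import numpy as np


def sample_vmf_3d(mu, kappa, size, rng=None):
    """Draw `size` unit vectors in R^3 from VMF(mu, kappa), kappa > 0."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    mu = mu / np.linalg.norm(mu)

    xi = rng.uniform(0.0, 1.0, size)
    v = rng.uniform(0.0, 2.0 * np.pi, size)
    # Merged form of the W formula; it equals -1 at xi = 0 unless
    # exp(-2 * kappa) underflows to zero for very large kappa
    w = 1.0 + np.log(xi + (1.0 - xi) * np.exp(-2.0 * kappa)) / kappa

    # Cartesian coordinates for polar angle arccos(w) and equatorial angle v
    s = np.sqrt(np.clip(1.0 - w**2, 0.0, None))
    x = np.stack([s * np.cos(v), s * np.sin(v), w], axis=-1)

    # Householder reflection that maps the pole (0, 0, 1) onto mu
    u = np.array([0.0, 0.0, 1.0]) - mu
    norm_sq = u @ u
    if norm_sq > 1e-24:
        x = x - 2.0 * np.outer(x @ u, u) / norm_sq
    return x
```

Passing a seeded generator, for example `np.random.default_rng(0)`, makes the draws reproducible. The sample mean should then have length close to A_3(\kappa) = \coth\kappa - 1/\kappa and point along \boldsymbol{\mu}.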


Generalizations

The matrix von Mises–Fisher distribution (also known as the matrix Langevin distribution) has the density
:f_{n,p}(\mathbf{X}; \mathbf{F}) \propto \exp\left(\operatorname{tr}(\mathbf{F}^\mathsf{T}\mathbf{X})\right)
supported on the Stiefel manifold of n \times p orthonormal p-frames \mathbf{X}, where \mathbf{F} is an arbitrary n \times p real matrix.


Distribution of polar angle

For p = 3, the angle θ between \mathbf{x} and \boldsymbol{\mu} satisfies \cos\theta = \boldsymbol{\mu}^\mathsf{T}\mathbf{x}. It has the distribution
:p(\theta) = \int d^2x\, f(x; \boldsymbol{\mu}, \kappa)\, \delta\left(\theta - \arccos(\boldsymbol{\mu}^\mathsf{T}\mathbf{x})\right),
which can be easily evaluated as
:p(\theta) = 2\pi C_3(\kappa)\, \sin\theta\, e^{\kappa\cos\theta}.
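The evaluation can be sketched by using spherical coordinates with the polar axis along the mean direction, writing \varphi for the azimuthal angle (a symbol introduced here only for illustration), so that d^2x = \sin\theta\, d\theta\, d\varphi. The second line checks the normalization using the p = 3 constant C_3(\kappa) = \kappa / (4\pi\sinh\kappa).

```latex
\begin{aligned}
p(\theta) &= \int_0^{2\pi} C_3(\kappa)\, e^{\kappa\cos\theta}\, \sin\theta \; d\varphi
           = 2\pi\, C_3(\kappa)\, \sin\theta\, e^{\kappa\cos\theta}, \\[4pt]
\int_0^{\pi} p(\theta)\, d\theta
          &= 2\pi\, C_3(\kappa)\, \frac{e^{\kappa} - e^{-\kappa}}{\kappa}
           = \frac{\kappa}{2\sinh\kappa} \cdot \frac{2\sinh\kappa}{\kappa}
           = 1.
\end{aligned}
```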


See also

* Kent distribution, a related distribution on the two-dimensional unit sphere
* von Mises distribution, the von Mises–Fisher distribution where ''p'' = 2, on the one-dimensional unit circle
* Bivariate von Mises distribution
* Directional statistics


Further reading

* Dhillon, I., Sra, S. (2003). "Modeling Data using Directional Distributions". Tech. rep., University of Texas, Austin.
* Banerjee, A., Dhillon, I. S., Ghosh, J., & Sra, S. (2005). "Clustering on the unit hypersphere using von Mises-Fisher distributions". Journal of Machine Learning Research, 6(Sep), 1345-1382.