In directional statistics, the von Mises–Fisher distribution (named after Richard von Mises and Ronald Fisher) is a probability distribution on the (p-1)-sphere in \mathbb{R}^{p}. If p=2 the distribution reduces to the von Mises distribution on the circle.


Definition

The probability density function of the von Mises–Fisher distribution for the random ''p''-dimensional unit vector \mathbf{x} is given by:
:f_{p}(\mathbf{x}; \boldsymbol\mu, \kappa) = C_{p}(\kappa) \exp\left(\kappa\,\boldsymbol\mu^\mathsf{T}\mathbf{x}\right),
where \kappa \ge 0, \left\Vert \boldsymbol\mu \right\Vert = 1 and the normalization constant C_{p}(\kappa) is equal to
: C_{p}(\kappa)=\frac{\kappa^{p/2-1}}{(2\pi)^{p/2}\, I_{p/2-1}(\kappa)},
where I_{v} denotes the modified Bessel function of the first kind at order v. If p = 3, the normalization constant reduces to
: C_{3}(\kappa) = \frac{\kappa}{4\pi\sinh\kappa} = \frac{\kappa}{2\pi\left(e^{\kappa}-e^{-\kappa}\right)}.
The parameters \boldsymbol\mu and \kappa are called the ''mean direction'' and ''concentration parameter'', respectively. The greater the value of \kappa, the higher the concentration of the distribution around the mean direction \boldsymbol\mu. The distribution is unimodal for \kappa > 0, and is uniform on the sphere for \kappa = 0.

The von Mises–Fisher distribution for p=3 is also called the Fisher distribution. It was first used to model the interaction of electric dipoles in an electric field. Other applications are found in geology, bioinformatics, and text mining.
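
The density can be evaluated directly from the formulas above. A minimal Python sketch (assuming NumPy and SciPy are available; the function name is illustrative, not from a library):

```python
# Sketch of the vMF log-density using C_p(kappa) as defined above.
import numpy as np
from scipy.special import ive  # exponentially scaled Bessel function: ive(v, x) = iv(v, x) * exp(-abs(x))

def vmf_log_density(x, mu, kappa):
    """log f_p(x; mu, kappa) for unit vectors x and mu, and kappa > 0."""
    p = len(mu)
    # log C_p(kappa), using ive for numerical stability: log I_v(kappa) = log ive(v, kappa) + kappa
    log_C = (p / 2 - 1) * np.log(kappa) - (p / 2) * np.log(2 * np.pi) \
            - (np.log(ive(p / 2 - 1, kappa)) + kappa)
    return log_C + kappa * np.dot(mu, x)

# Example on S^2 (p = 3): a point 0.1 radians away from the mean direction.
mu = np.array([0.0, 0.0, 1.0])
x = np.array([0.0, np.sin(0.1), np.cos(0.1)])
print(np.exp(vmf_log_density(x, mu, kappa=10.0)))
```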


Support

The support of the von Mises–Fisher distribution is the ''hypersphere'', or more specifically, the (p-1)-sphere, denoted as
: \mathbb S^{p-1} = \left\{\mathbf{x}\in\mathbb{R}^{p} : \left\|\mathbf{x}\right\| = 1\right\}.
This is a (p-1)-dimensional manifold embedded in p-dimensional Euclidean space, \mathbb{R}^{p}.


Note on the normalization constant

In the textbook ''Directional Statistics'' by Mardia and Jupp, the normalization constant given for the von Mises–Fisher (VMF) probability density is apparently different from the one given here, C_{p}(\kappa). In that book, the normalization constant for \text{VMF}(\boldsymbol\mu,\kappa) is specified as:
: C^*_{p}(\kappa)=\frac{\left(\frac{\kappa}{2}\right)^{p/2-1}}{\Gamma\left(\frac{p}{2}\right)\, I_{p/2-1}(\kappa)}
where \Gamma is the gamma function. This is resolved by noting that Mardia and Jupp give the density "with respect to the uniform distribution", while the density here is specified with respect to scaled Hausdorff (surface) measure, \bar H^{p-1}, which gives the surface area of the whole ''(p-1)''-sphere as:
: \bar H^{p-1}(\mathbb S^{p-1}) = \frac{2\pi^{p/2}}{\Gamma\left(\frac{p}{2}\right)},
the reciprocal of which gives the (constant) density of the uniform distribution, \text{VMF}(\boldsymbol\mu,\kappa=0), as:
:C_{p}(0)=\frac{\Gamma\left(\frac{p}{2}\right)}{2\pi^{p/2}}.
It then follows that:
: C^*_{p}(\kappa) = \frac{C_{p}(\kappa)}{C_{p}(0)}.
While the value for C_{p}(0) was derived above via the surface area, the same result may be obtained by setting \kappa=0 in the above formula for C_{p}(\kappa). This can be done by noting that the series expansion for I_{p/2-1}(\kappa) divided by \kappa^{p/2-1} has but one non-zero term at \kappa=0. (To evaluate that term, one needs to use the definition 0^0=1.)
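
The identity C^*_{p}(\kappa) = C_{p}(\kappa)/C_{p}(0) can be checked numerically. A small sketch (assuming NumPy and SciPy; helper names are illustrative):

```python
# Check that Mardia & Jupp's constant equals C_p(kappa) divided by the uniform density C_p(0).
import numpy as np
from scipy.special import iv, gamma

def C_p(p, kappa):       # normalization w.r.t. surface (Hausdorff) measure
    if kappa == 0:
        return gamma(p / 2) / (2 * np.pi ** (p / 2))
    return kappa ** (p / 2 - 1) / ((2 * np.pi) ** (p / 2) * iv(p / 2 - 1, kappa))

def C_star_p(p, kappa):  # normalization w.r.t. the uniform distribution
    return (kappa / 2) ** (p / 2 - 1) / (gamma(p / 2) * iv(p / 2 - 1, kappa))

p, kappa = 5, 3.7
print(C_star_p(p, kappa), C_p(p, kappa) / C_p(p, 0))  # the two values should agree
```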


Relation to normal distribution

Starting from a multivariate normal distribution with isotropic covariance \kappa^{-1}\mathbf{I} and mean \boldsymbol\mu of length r>0, whose density function is:
:\mathcal N_p(\mathbf{x}; \boldsymbol\mu, \kappa^{-1}\mathbf{I}) = \left(\sqrt{\frac{\kappa}{2\pi}}\right)^{p} \exp\left( -\kappa \frac{(\mathbf{x}-\boldsymbol\mu)^\mathsf{T}(\mathbf{x}-\boldsymbol\mu)}{2} \right),
the von Mises–Fisher distribution is obtained by conditioning on \left\|\mathbf{x}\right\|=1. By expanding
:(\mathbf{x}-\boldsymbol\mu)^\mathsf{T}(\mathbf{x}-\boldsymbol\mu) = \mathbf{x}^\mathsf{T}\mathbf{x} + \boldsymbol\mu^\mathsf{T}\boldsymbol\mu - 2\boldsymbol\mu^\mathsf{T}\mathbf{x}=1+r^2-2\boldsymbol\mu^\mathsf{T}\mathbf{x},
the von Mises–Fisher density, f_{p}(\mathbf{x}; r^{-1}\boldsymbol\mu, r\kappa)\propto e^{\kappa\boldsymbol\mu^\mathsf{T}\mathbf{x}}, is recovered by recomputing the normalization constant by integrating \mathbf{x} over the unit sphere. If \boldsymbol\mu=\boldsymbol0, we get the uniform distribution, with (constant) density f_{p}(\mathbf{x}; \tilde{\boldsymbol\mu}, 0), where \tilde{\boldsymbol\mu}\in\mathbb S^{p-1} is arbitrary. More succinctly, the restriction of any isotropic multivariate normal density to the unit hypersphere gives a von Mises–Fisher density, up to normalization.

See also:
* This construction can be generalized by starting with a normal distribution with a general covariance matrix, in which case restricting to \left\|\mathbf{x}\right\|=1 gives the Fisher–Bingham distribution.
* Restriction is not to be confused with projection. If \mathbf z\sim\mathcal N_p(\boldsymbol\mu,\boldsymbol\Sigma) and we project onto the unit sphere, \mathbf x=\lVert\mathbf z\rVert^{-1}\mathbf z, we get the projected normal distribution. (Informally, restriction can be thought of as rejection sampling with an infinite sampling budget, where we keep only those \mathbf z that land on the unit sphere, while with projection we use all samples.)
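
The restriction property can be illustrated numerically: on unit vectors, the ratio of the isotropic normal density to the vMF kernel e^{\kappa\boldsymbol\mu^\mathsf{T}\mathbf x} is constant. A quick sketch (assuming NumPy; the setup is illustrative):

```python
# Restricting an isotropic normal density to the unit sphere is proportional
# to exp(kappa * mu^T x), i.e. a vMF kernel with mean direction mu/r and concentration r*kappa.
import numpy as np

rng = np.random.default_rng(0)
p, kappa, r = 4, 2.5, 1.7
mu = np.zeros(p); mu[0] = r      # normal mean of length r along the first axis

def normal_density(x):
    return (kappa / (2 * np.pi)) ** (p / 2) * np.exp(-kappa * np.sum((x - mu) ** 2) / 2)

for _ in range(3):               # the ratio should be the same for every unit vector
    x = rng.normal(size=p); x /= np.linalg.norm(x)
    print(normal_density(x) / np.exp(kappa * mu @ x))
```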


Estimation of parameters


Mean direction

A series of ''N'' independent unit vectors x_i are drawn from a von Mises–Fisher distribution. The maximum likelihood estimate of the mean direction \mu is simply the normalized arithmetic mean, a sufficient statistic:
:\mu = \bar{x}/\bar{R}, \text{ where } \bar{x} = \frac{1}{N}\sum_i^N x_i \text{ and } \bar{R} = \|\bar{x}\|.
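
A minimal sketch of this estimate (assuming NumPy; the function name is illustrative):

```python
# Maximum-likelihood mean direction and mean resultant length R_bar.
import numpy as np

def mean_direction(X):
    """X: array of shape (N, p) whose rows are unit vectors."""
    x_bar = X.mean(axis=0)            # arithmetic mean
    R_bar = np.linalg.norm(x_bar)     # mean resultant length
    return x_bar / R_bar, R_bar
```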


Concentration parameter

Use the modified Bessel function of the first kind to define
: A_{p}(\kappa) = \frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)} .
Then:
:\kappa = A_p^{-1}(\bar{R}) .
Thus \kappa is the solution to
:A_p(\kappa) = \frac{\left\|\sum_i^N x_i\right\|}{N} = \bar{R} .
A simple approximation to \kappa is (Sra, 2011)
:\hat{\kappa} = \frac{\bar{R}\left(p-\bar{R}^{2}\right)}{1-\bar{R}^{2}} .
A more accurate inversion can be obtained by iterating the Newton method a few times:
:\hat{\kappa}_1 = \hat{\kappa} - \frac{A_p(\hat{\kappa})-\bar{R}}{1-A_p(\hat{\kappa})^{2}-\frac{p-1}{\hat{\kappa}}A_p(\hat{\kappa})} ,
:\hat{\kappa}_2 = \hat{\kappa}_1 - \frac{A_p(\hat{\kappa}_1)-\bar{R}}{1-A_p(\hat{\kappa}_1)^{2}-\frac{p-1}{\hat{\kappa}_1}A_p(\hat{\kappa}_1)} .
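
A sketch of this estimator (assuming NumPy and SciPy; the approximation and Newton steps follow the expressions above):

```python
# Estimate kappa from the mean resultant length R_bar.
import numpy as np
from scipy.special import ive

def A_p(p, kappa):
    # I_{p/2}(kappa) / I_{p/2-1}(kappa); the exponential scaling of ive cancels in the ratio.
    return ive(p / 2, kappa) / ive(p / 2 - 1, kappa)

def estimate_kappa(R_bar, p, newton_steps=2):
    kappa = R_bar * (p - R_bar ** 2) / (1 - R_bar ** 2)     # simple approximation
    for _ in range(newton_steps):                           # Newton refinement
        Ap = A_p(p, kappa)
        kappa -= (Ap - R_bar) / (1 - Ap ** 2 - (p - 1) / kappa * Ap)
    return kappa

print(estimate_kappa(R_bar=0.8, p=3))
```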


Standard error

For ''N'' ≥ 25, the estimated spherical standard error of the sample mean direction can be computed as:
:\hat\sigma = \left(\frac{d}{N\bar{R}^{2}}\right)^{1/2}
where
:d = 1 - \frac{1}{N} \sum_i^N \left(\mu^\mathsf{T}x_i\right)^2 .
It is then possible to approximate a 100(1-\alpha)\% spherical confidence interval (a ''confidence cone'') about \mu with semi-vertical angle:
:q = \arcsin\left(e_\alpha^{1/2}\hat\sigma\right), where e_\alpha = -\ln(\alpha).
For example, for a 95% confidence cone, \alpha = 0.05, e_\alpha = -\ln(0.05) = 2.996, and thus q = \arcsin(1.731\hat\sigma).
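
A sketch of the confidence-cone computation (assuming NumPy; N ≥ 25 as stated above, names illustrative):

```python
# Spherical standard error and confidence-cone semi-vertical angle (radians).
import numpy as np

def confidence_cone(X, alpha=0.05):
    """X: (N, p) array of unit-vector samples drawn from a vMF distribution."""
    N = X.shape[0]
    x_bar = X.mean(axis=0)
    R_bar = np.linalg.norm(x_bar)
    mu_hat = x_bar / R_bar
    d = 1 - np.mean((X @ mu_hat) ** 2)
    sigma_hat = np.sqrt(d / (N * R_bar ** 2))
    q = np.arcsin(np.sqrt(-np.log(alpha)) * sigma_hat)
    return mu_hat, q
```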


Expected value

The expected value of the von Mises–Fisher distribution is not on the unit hypersphere, but instead has a length of less than one. This length is given by A_p(\kappa) as defined above. For a von Mises–Fisher distribution with mean direction \boldsymbol\mu and concentration \kappa>0, the expected value is:
:A_p(\kappa)\boldsymbol\mu.
For \kappa=0, the expected value is at the origin. For finite \kappa>0, the length of the expected value is strictly between zero and one and is a monotonically increasing function of \kappa.

The empirical mean (arithmetic average) of a collection of points on the unit hypersphere behaves in a similar manner, being close to the origin for widely spread data and close to the sphere for concentrated data. Indeed, for the von Mises–Fisher distribution, the expected value of the maximum-likelihood estimate based on a collection of points is equal to the empirical mean of those points.


Entropy and KL divergence

The expected value can be used to compute differential entropy and KL divergence.

The differential entropy of \text{VMF}(\boldsymbol\mu, \kappa) is:
: \bigl\langle -\log f_{p}(\mathbf{x}; \boldsymbol\mu, \kappa)\bigr\rangle_{\text{VMF}(\boldsymbol\mu, \kappa)} =-\log f_{p}(A_p(\kappa)\boldsymbol\mu; \boldsymbol\mu, \kappa) = -\log C_p(\kappa) -\kappa A_p(\kappa)
where the angle brackets denote expectation. Notice that the entropy is a function of \kappa only.

The KL divergence between \text{VMF}(\boldsymbol\mu, \kappa_0) and \text{VMF}(\boldsymbol\mu, \kappa_1) is:
: \Bigl\langle \log \frac{f_{p}(\mathbf{x}; \boldsymbol\mu, \kappa_0)}{f_{p}(\mathbf{x}; \boldsymbol\mu, \kappa_1)} \Bigr\rangle_{\text{VMF}(\boldsymbol\mu, \kappa_0)} =\log \frac{C_p(\kappa_0)}{C_p(\kappa_1)} + (\kappa_0-\kappa_1)A_p(\kappa_0)
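
Both quantities are simple functions of C_p and A_p. A sketch (assuming NumPy and SciPy; helper names are illustrative):

```python
# Differential entropy of VMF(mu, kappa) and KL divergence between two vMF
# distributions sharing the same mean direction.
import numpy as np
from scipy.special import ive

def log_C(p, kappa):
    return (p / 2 - 1) * np.log(kappa) - (p / 2) * np.log(2 * np.pi) \
           - (np.log(ive(p / 2 - 1, kappa)) + kappa)

def A_p(p, kappa):
    return ive(p / 2, kappa) / ive(p / 2 - 1, kappa)

def vmf_entropy(p, kappa):
    return -log_C(p, kappa) - kappa * A_p(p, kappa)

def vmf_kl(p, kappa0, kappa1):
    return log_C(p, kappa0) - log_C(p, kappa1) + (kappa0 - kappa1) * A_p(p, kappa0)

print(vmf_entropy(3, 5.0), vmf_kl(3, 5.0, 1.0))
```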


Transformation

Von Mises–Fisher (VMF) distributions are closed under orthogonal linear transforms. Let \mathbf{U} be a p-by-p orthogonal matrix. Let \mathbf{x}\sim\text{VMF}(\boldsymbol\mu,\kappa) and apply the invertible linear transform: \mathbf{y}=\mathbf{U}\mathbf{x}. The inverse transform is \mathbf{x}=\mathbf{U}'\mathbf{y}, because the inverse of an orthogonal matrix is its transpose: \mathbf{U}^{-1}=\mathbf{U}'. The Jacobian of the transform is \mathbf{U}, for which the absolute value of its determinant is 1, also because of the orthogonality. Using these facts and the form of the VMF density, it follows that:
:\mathbf{y}\sim\text{VMF}(\mathbf{U}\boldsymbol\mu,\kappa).
One may verify that since \boldsymbol\mu and \mathbf{x} are unit vectors, then by the orthogonality, so are \mathbf{U}\boldsymbol\mu and \mathbf{y}.


Pseudo-random number generation


General case

An algorithm for drawing pseudo-random samples from the von Mises–Fisher (VMF) distribution was given by Ulrich and later corrected by Wood. An implementation in R is given by Hornik and Grün, and a fast Python implementation is described by Pinzón and Jung.

To simulate from a VMF distribution on the (p-1)-dimensional unit sphere, S^{p-1}, with mean direction \boldsymbol\mu\in S^{p-1}, these algorithms use the following radial-tangential decomposition for a point \mathbf{x}\in S^{p-1}\subset\mathbb{R}^p:
: \mathbf{x} = t\boldsymbol\mu+\sqrt{1-t^2}\,\mathbf{v}
where \mathbf{v}\in\mathbb{R}^p lives in the tangential (p-2)-dimensional unit subsphere that is centered at and perpendicular to \boldsymbol\mu, while t\in[-1,1]. To draw a sample \mathbf{x} from a VMF with parameters \boldsymbol\mu and \kappa, \mathbf{v} must be drawn from the uniform distribution on the tangential subsphere, and the radial component, t, must be drawn independently from the distribution with density:
: f_\text{radial}(t;\kappa,p)=\frac{\left(\frac{\kappa}{2}\right)^{\nu}}{I_\nu(\kappa)\,\Gamma\left(\nu+\frac12\right)\Gamma\left(\frac12\right)}\, e^{\kappa t}\left(1-t^2\right)^{\nu-\frac12}
where \nu=\frac{p}2-1. The normalization constant for this density may be verified by using:
: I_\nu(\kappa) = \frac{\left(\frac{\kappa}{2}\right)^{\nu}}{\Gamma\left(\nu+\frac12\right)\Gamma\left(\frac12\right)} \int_{-1}^{1}e^{\kappa t}\left(1-t^2\right)^{\nu-\frac12}\,dt
as given in Appendix 1 (A.3) in ''Directional Statistics''. Drawing the t samples from this density by using a rejection sampling algorithm is explained in the above references. To draw the uniform \mathbf{v} samples perpendicular to \boldsymbol\mu, see the algorithms in the above references; alternatively, a Householder transform can be used.
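
A minimal sketch of such a sampler (assuming NumPy; this follows the Ulrich–Wood construction described above, but the function and envelope-constant names are illustrative rather than taken from the cited implementations):

```python
# Rejection sampler for the radial component t, plus a uniform tangential component.
import numpy as np

def sample_vmf(mu, kappa, n, rng=np.random.default_rng()):
    mu = np.asarray(mu, dtype=float)
    p = mu.size
    # Envelope constants for the radial rejection sampler.
    b = (p - 1) / (2 * kappa + np.sqrt(4 * kappa ** 2 + (p - 1) ** 2))
    x0 = (1 - b) / (1 + b)
    c = kappa * x0 + (p - 1) * np.log(1 - x0 ** 2)
    samples = np.empty((n, p))
    for i in range(n):
        while True:   # draw the radial component t by rejection
            z = rng.beta((p - 1) / 2, (p - 1) / 2)
            t = (1 - (1 + b) * z) / (1 - (1 - b) * z)
            if kappa * t + (p - 1) * np.log(1 - x0 * t) - c >= np.log(rng.uniform()):
                break
        # Uniform tangential direction: Gaussian draw projected orthogonal to mu.
        v = rng.normal(size=p)
        v -= (v @ mu) * mu
        v /= np.linalg.norm(v)
        samples[i] = t * mu + np.sqrt(1 - t ** 2) * v
    return samples

print(sample_vmf(mu=[0.0, 0.0, 1.0], kappa=10.0, n=5))
```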


3-D sphere

To generate a von Mises–Fisher distributed pseudo-random spherical 3-D unit vector \mathbf X_\text{s} on the S^{2} sphere for a given \mu and \kappa, define
: \mathbf X_\text{s} = [r, \theta, \phi]
where \theta is the polar angle, \phi the azimuthal angle, and r=1 the distance to the center of the sphere. For \boldsymbol\mu = [0, 0, 1], the pseudo-random triplet is then given by
: \mathbf X_\text{s} = [1, \arccos W, V]
where V is sampled from the continuous uniform distribution U(a,b) with lower bound a and upper bound b,
: V \sim U(0, 2\pi),
and
: W = \cos\theta = 1 + \frac{1}{\kappa}\ln\left(\xi + (1-\xi)e^{-2\kappa}\right)
where \xi is sampled from the standard continuous uniform distribution,
: \xi \sim U(0, 1).
Here W takes values in [-1, 1], with W = -1 at \xi = 0 and W = 1 at \xi = 1. The resulting \mathbf X_\text{s} can then be rotated to match any other desired mean direction \mu.
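
A sketch of this special case (assuming NumPy; written for \boldsymbol\mu = [0, 0, 1], with the result to be rotated afterwards for any other mean direction):

```python
# Inverse-CDF sampler for the 3-D von Mises-Fisher distribution.
import numpy as np

def sample_vmf_3d(kappa, n, rng=np.random.default_rng()):
    xi = rng.uniform(size=n)
    W = 1 + np.log(xi + (1 - xi) * np.exp(-2 * kappa)) / kappa   # W = cos(theta)
    V = rng.uniform(0, 2 * np.pi, size=n)                        # azimuthal angle
    sin_theta = np.sqrt(1 - W ** 2)
    # Cartesian coordinates for mu = [0, 0, 1].
    return np.column_stack([sin_theta * np.cos(V), sin_theta * np.sin(V), W])

print(sample_vmf_3d(kappa=20.0, n=3))
```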


Distribution of polar angle

For p = 3, the angle θ between \mathbf{x} and \boldsymbol\mu satisfies \cos\theta=\boldsymbol\mu^\mathsf{T}\mathbf{x}. It has the distribution
:p(\theta)=\int d^2x\, f(\mathbf{x}; \boldsymbol\mu, \kappa)\, \delta\left(\theta-\arccos(\boldsymbol\mu^\mathsf{T}\mathbf{x})\right),
which can be easily evaluated as
:p(\theta)=2\pi C_3(\kappa)\,\sin\theta\, e^{\kappa\cos\theta}.
For the general case, p\ge2, the distribution of the cosine of this angle,
: \cos\theta = t = \boldsymbol\mu^\mathsf{T}\mathbf{x},
is given by f_\text{radial}(t;\kappa,p), as explained above.


The uniform hypersphere distribution

When \kappa=0, the von Mises–Fisher distribution, \text{VMF}(\boldsymbol\mu,\kappa), simplifies to the uniform distribution on \mathbb S^{p-1}\subset\mathbb{R}^p. The density is constant with value C_p(0). Pseudo-random samples can be generated by generating samples in \mathbb{R}^p from the standard multivariate normal distribution, followed by normalization to unit norm.
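
A one-line sketch of this sampler (assuming NumPy):

```python
# Uniform samples on S^{p-1}: normalize standard multivariate normal draws.
import numpy as np

def sample_uniform_sphere(p, n, rng=np.random.default_rng()):
    z = rng.normal(size=(n, p))
    return z / np.linalg.norm(z, axis=1, keepdims=True)

print(sample_uniform_sphere(p=4, n=2))
```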


Component marginal of uniform distribution

For 1\le i\le p, let x_i be any component of \mathbf{x}\in \mathbb S^{p-1}. The marginal distribution for x_i has the density:
: f_i(x_i;p) = f_\text{radial}(x_i;\kappa=0,p)=\frac{\left(1-x_i^2\right)^{\frac{p-3}{2}}}{B\left(\frac12,\frac{p-1}2\right)}
where B(\alpha,\beta) is the beta function. This distribution may be better understood by highlighting its relation to the beta distribution:
: \begin{align} x_i^2&\sim\text{Beta}\bigl(\tfrac12,\tfrac{p-1}2\bigr) &&\text{and}& \frac{1+x_i}2&\sim\text{Beta}\bigl(\tfrac{p-1}2,\tfrac{p-1}2\bigr) \end{align}
where the Legendre duplication formula is useful to understand the relationships between the normalization constants of the various densities above. Note that the components of \mathbf{x}\in \mathbb S^{p-1} are ''not'' independent, so that the uniform density is not the product of the marginal densities; \mathbf{x} cannot be assembled by independent sampling of the components.
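
The beta relations above can be checked by Monte Carlo. A sketch (assuming NumPy and SciPy):

```python
# Kolmogorov-Smirnov checks of the marginal-component beta relations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p, n = 6, 200_000
z = rng.normal(size=(n, p))
x = z / np.linalg.norm(z, axis=1, keepdims=True)   # uniform on S^{p-1}
x0 = x[:, 0]                                       # any single component

print(stats.kstest(x0 ** 2, stats.beta(0.5, (p - 1) / 2).cdf))
print(stats.kstest((1 + x0) / 2, stats.beta((p - 1) / 2, (p - 1) / 2).cdf))
```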


Distribution of dot-products

In machine learning, especially in image classification, to-be-classified inputs (e.g. images) are often compared using cosine similarity, which is the dot product between intermediate representations in the form of unit vectors (termed ''embeddings''). The dimensionality is typically high, with p at least several hundreds. The deep neural networks that extract embeddings for classification should learn to spread the classes as far apart as possible and ideally this should give classes that are uniformly distributed on \mathbb S^{p-1}. For a better statistical understanding of ''across-class cosine similarity'', the distribution of dot-products between unit vectors independently sampled from the uniform distribution may be helpful.

Let \mathbf{x},\mathbf{y}\in \mathbb S^{p-1} be unit vectors in \mathbb{R}^p, independently sampled from the uniform distribution. Define:
: \begin{align} t&=\mathbf{x}'\mathbf{y}\in[-1,1], & r&=\frac{1+t}2\in[0,1], & s&=\text{logit}(r) =\log\frac{r}{1-r} \in\mathbb{R} \end{align}
where t is the dot-product and r,s are transformed versions of it. Then the distribution for t is the same as the ''marginal component distribution'' given above; the distribution for r is symmetric beta, and the distribution for s is symmetric logistic-beta:
: \begin{align} r&\sim \text{Beta}\bigl(\tfrac{p-1}2,\tfrac{p-1}2\bigr), & s&\sim B_\sigma\bigl(\tfrac{p-1}2,\tfrac{p-1}2\bigr) \end{align}
The means and variances are:
: \begin{align} E[t] &=0, & E[r] &=\tfrac12, & E[s] &=0, \end{align}
and
: \begin{align} \text{var}[t] &=\frac1p, & \text{var}[r] &=\frac1{4p}, & \text{var}[s] &=2\psi'\bigl(\tfrac{p-1}2\bigr)\approx\frac4{p-2} \end{align}
where \psi'=\psi^{(1)} is the first polygamma function. The variances decrease, the distributions of all three variables become more Gaussian, and the final approximation gets better as the dimensionality, p, is increased.
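
These moments are easy to verify by simulation. A sketch (assuming NumPy and SciPy):

```python
# Monte Carlo check of the dot-product statistics for uniform unit vectors.
import numpy as np
from scipy.special import polygamma

rng = np.random.default_rng(1)
p, n = 128, 100_000
x = rng.normal(size=(n, p)); x /= np.linalg.norm(x, axis=1, keepdims=True)
y = rng.normal(size=(n, p)); y /= np.linalg.norm(y, axis=1, keepdims=True)

t = np.sum(x * y, axis=1)      # dot products
r = (1 + t) / 2
s = np.log(r / (1 - r))        # logit(r)

print(t.var(), 1 / p)                           # var[t] should be close to 1/p
print(r.var(), 1 / (4 * p))                     # var[r] should be close to 1/(4p)
print(s.var(), 2 * polygamma(1, (p - 1) / 2))   # var[s] = 2*psi'((p-1)/2)
```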


Generalizations


Matrix von Mises–Fisher

The matrix von Mises–Fisher distribution (also known as matrix Langevin distribution) has the density
:f_{n,p}(\mathbf{X}; \mathbf{F}) \propto \exp(\operatorname{tr}(\mathbf{F}^\mathsf{T}\mathbf{X}))
supported on the Stiefel manifold of n \times p orthonormal p-frames \mathbf{X}, where \mathbf{F} is an arbitrary n \times p real matrix.


Saw distributions

Ulrich, in designing an algorithm for sampling from the VMF distribution, makes use of a family of distributions named after and explored by John G. Saw. A Saw distribution is a distribution on the (p-1)-sphere, S^{p-1}, with modal vector \boldsymbol\mu\in S^{p-1} and concentration \kappa\ge0, and of which the density function has the form:
: f_\text{Saw}(\mathbf x;\boldsymbol\mu,\kappa) = \frac{g(\kappa\,\mathbf x'\boldsymbol\mu)}{K_p(\kappa)}
where g is a non-negative, increasing function and K_p(\kappa) is the normalization constant. The above-mentioned ''radial-tangential decomposition'' generalizes to the Saw family, and the radial component, t=\mathbf x'\boldsymbol\mu, has the density:
: f_\text{radial}(t;\kappa)=\frac{2\pi^{\frac{p-1}2}}{\Gamma\left(\frac{p-1}2\right)}\,\frac{g(\kappa t)\left(1-t^2\right)^{\frac{p-3}2}}{K_p(\kappa)},
where the left-hand factor is the surface area of S^{p-2}. By setting g(\kappa\,\mathbf x'\boldsymbol\mu)=e^{\kappa\mathbf x'\boldsymbol\mu}, one recovers the VMF distribution.


Weighted Rademacher Distribution

The definition of the von Mises–Fisher distribution can be extended to include also the case where p=1, so that the support is the 0-dimensional hypersphere, which when embedded into 1-dimensional Euclidean space is the discrete set, \{-1,1\}. The mean direction is \mu\in\{-1,1\} and the concentration is \kappa\ge0. The probability mass function, for x\in\{-1,1\}, is:
: f_1(x\mid\mu,\kappa) = \frac{e^{\kappa\mu x}}{e^{\kappa}+e^{-\kappa}} = \sigma(2\kappa\mu x)
where \sigma(z)=1/(1+e^{-z}) is the logistic sigmoid. The expected value is \mu\tanh(\kappa). In the uniform case, at \kappa=0, this distribution degenerates to the Rademacher distribution.
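
A tiny sketch of this p=1 case (assuming NumPy; names illustrative):

```python
# Weighted Rademacher variable with pmf sigma(2*kappa*mu*x) on {-1, +1}.
import numpy as np

def weighted_rademacher(mu, kappa, n, rng=np.random.default_rng()):
    p_plus = 1 / (1 + np.exp(-2 * kappa * mu))    # P(x = +1) = sigma(2*kappa*mu)
    return np.where(rng.uniform(size=n) < p_plus, 1, -1)

x = weighted_rademacher(mu=1, kappa=0.5, n=100_000)
print(x.mean(), np.tanh(0.5))                     # empirical mean vs mu*tanh(kappa)
```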


See also

* Kent distribution, a related distribution on the two-dimensional unit sphere
* von Mises distribution, the von Mises–Fisher distribution where ''p'' = 2, on the one-dimensional unit circle
* Bivariate von Mises distribution
* Directional statistics


References


Further reading

* Dhillon, I., Sra, S. (2003). "Modeling Data using Directional Distributions". Tech. rep., University of Texas, Austin.
* Banerjee, A., Dhillon, I. S., Ghosh, J., & Sra, S. (2005). "Clustering on the unit hypersphere using von Mises-Fisher distributions". Journal of Machine Learning Research, 6(Sep), 1345-1382.