directional statistics Directional statistics (also circular statistics or spherical statistics) is the subdiscipline of statistics that deals with directions (unit vectors in Euclidean space, R''n''), axes ( lines through the origin in R''n'') or rotations in R''n''. ...

, the von Mises–Fisher distribution (named after

Richard von Mises Richard Martin Edler von Mises (; 19 April 1883 – 14 July 1953) was an Austrian scientist and mathematician who worked on solid mechanics, fluid mechanics, aerodynamics, aeronautics, statistics and probability theory. He held the position of ...

and

Ronald Fisher Sir Ronald Aylmer Fisher (17 February 1890 – 29 July 1962) was a British polymath who was active as a mathematician, statistician, biologist, geneticist, and academic. For his work in statistics, he has been described as "a genius who a ...

), is a

probability distribution In probability theory and statistics, a probability distribution is a Function (mathematics), function that gives the probabilities of occurrence of possible events for an Experiment (probability theory), experiment. It is a mathematical descri ...

on the

(p-1)

sphere A sphere (from Ancient Greek, Greek , ) is a surface (mathematics), surface analogous to the circle, a curve. In solid geometry, a sphere is the Locus (mathematics), set of points that are all at the same distance from a given point in three ...

\mathbb^

. If

p=2

the distribution reduces to the

von Mises distribution In probability theory and directional statistics, the Richard von Mises, von Mises distribution (also known as the circular normal distribution or the Andrey Nikolayevich Tikhonov, Tikhonov distribution) is a continuous probability distribution ...

on the

circle A circle is a shape consisting of all point (geometry), points in a plane (mathematics), plane that are at a given distance from a given point, the Centre (geometry), centre. The distance between any point of the circle and the centre is cal ...

Definition

The probability density function of the von Mises–Fisher distribution for the random ''p''-dimensional unit vector

\mathbf

is given by: :

f_(\mathbf; \boldsymbol, \kappa) = C_(\kappa) \exp \left(  \right),

where

\kappa \ge 0, \left \Vert \boldsymbol \right \Vert = 1

and the normalization constant

C_(\kappa)

is equal to :

C_(\kappa)=\frac  ,

where

I_

denotes the modified

Bessel function Bessel functions, named after Friedrich Bessel who was the first to systematically study them in 1824, are canonical solutions of Bessel's differential equation x^2 \frac + x \frac + \left(x^2 - \alpha^2 \right)y = 0 for an arbitrary complex ...

of the first kind at order

v

. If

p = 3

, the normalization constant reduces to :

C_(\kappa) = \frac   = \frac  .

The parameters

\boldsymbol

and

\kappa

are called the ''mean direction'' and '' concentration parameter'', respectively. The greater the value of

\kappa

, the higher the concentration of the distribution around the mean direction

\boldsymbol

. The distribution is

unimodal In mathematics, unimodality means possessing a unique mode. More generally, unimodality means there is only a single highest value, somehow defined, of some mathematical object. Unimodal probability distribution In statistics, a unimodal p ...

for

\kappa > 0

, and is uniform on the sphere for

\kappa = 0

. The von Mises–Fisher distribution for

p=3

is also called the Fisher distribution. It was first used to model the interaction of

electric dipole The electric dipole moment is a measure of the separation of positive and negative electrical charges within a system: that is, a measure of the system's overall polarity. The SI unit for electric dipole moment is the coulomb-metre (C⋅m). The ...

s in an

electric field An electric field (sometimes called E-field) is a field (physics), physical field that surrounds electrically charged particles such as electrons. In classical electromagnetism, the electric field of a single charge (or group of charges) descri ...

. Other applications are found in

geology Geology (). is a branch of natural science concerned with the Earth and other astronomical objects, the rocks of which they are composed, and the processes by which they change over time. Modern geology significantly overlaps all other Earth ...

bioinformatics Bioinformatics () is an interdisciplinary field of science that develops methods and Bioinformatics software, software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, ...

, and

text mining Text mining, text data mining (TDM) or text analytics is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from differe ...

Support

The support of the Von Mises–Fisher distribution is the ''hypersphere'', or more specifically, the

(p-1)

-sphere, denoted as :

\mathbb S^ = \left\

This is a

(p-1)

-dimensional

manifold In mathematics, a manifold is a topological space that locally resembles Euclidean space near each point. More precisely, an n-dimensional manifold, or ''n-manifold'' for short, is a topological space with the property that each point has a N ...

embedded in

p

-dimensional Euclidean space,

\mathbb^p

Note on the normalization constant

In the textbook, ''Directional Statistics'' by Mardia and Jupp, the normalization constant given for the Von Mises Fisher (VMF) probability density is apparently different from the one given here:

C_(\kappa)

. In that book, for

\text(\boldsymbol\mu,\kappa)

the normalization constant is specified as: :

C^*_(\kappa)=\frac

where

\Gamma

is the

gamma function In mathematics, the gamma function (represented by Γ, capital Greek alphabet, Greek letter gamma) is the most common extension of the factorial function to complex numbers. Derived by Daniel Bernoulli, the gamma function \Gamma(z) is defined ...

. This is resolved by noting that Mardia and Jupp give the density "with respect to the uniform distribution", while the density here is specified with respect to scaled Hausdorff measure,

\bar H^

, which gives the surface area of the whole ''(p-1)''-sphere as: :

H^_\lambda(\mathbb S^) = \frac,

the reciprocal of which gives the (constant) density of the uniform distribution,

\text(\boldsymbol\mu,\kappa=0),

as: :

C_(0)=\frac

It then follows that: :

C^*_(\kappa) = \frac

While the value for

C_(0)

was derived above via the surface area, the same result may be obtained by setting

\kappa=0

in the above formula for

C_(\kappa)

. This can be done by noting that the series expansion for

I_(\kappa)

divided by

\kappa^

has but one non-zero term at

\kappa=0

. (To evaluate that term, one needs to use the

definition A definition is a statement of the meaning of a term (a word, phrase, or other set of symbols). Definitions can be classified into two large categories: intensional definitions (which try to give the sense of a term), and extensional definitio ...

0^0=1

.) For further understanding of density functions on the hypersphere, see: .

Relation to normal distribution

Starting from a

multivariate normal distribution In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional ( univariate) normal distribution to higher dimensions. One d ...

with

isotropic In physics and geometry, isotropy () is uniformity in all orientations. Precise definitions depend on the subject area. Exceptions, or inequalities, are frequently indicated by the prefix ' or ', hence '' anisotropy''. ''Anisotropy'' is also ...

covariance In probability theory and statistics, covariance is a measure of the joint variability of two random variables. The sign of the covariance, therefore, shows the tendency in the linear relationship between the variables. If greater values of one ...

\kappa^\mathbf

and mean

\boldsymbol

of length

r>0

, whose density function is: :

\mathcal N_(\mathbf; \boldsymbol, \kappa)
= \left(\sqrt\right)^p \exp\left( -\kappa \frac \right),

the Von Mises–Fisher distribution is obtained by conditioning on

\left\, \mathbf\right\, =1

. By expanding :

(\mathbf-\boldsymbol)^\mathsf(\mathbf-\boldsymbol) = \mathbf^\mathsf\mathbf + \boldsymbol^\mathsf\boldsymbol - 2\boldsymbol^\mathsf \mathbf=1+r^2-2\boldsymbol\mu^\mathsf\mathbf x,

the Von Mises-Fisher density,

f_(\mathbf; r^\boldsymbol, r\kappa)\propto e^

is recovered by recomputing the normalization constant by integrating

\mathbf

over the unit sphere. If

\boldsymbol\mu=\boldsymbol0

, we get the uniform distribution, with (contstant) density

f_(\mathbf; \tilde\boldsymbol\mu, 0)

, where

\tilde\boldsymbol\mu\in\mathbb S^

is arbitrary. More succinctly, the restriction of any isotropic multivariate normal density to the unit hypersphere gives a Von Mises-Fisher density, up to normalization. See also: * This construction can be generalized by starting with a normal distribution with a general covariance matrix, in which case restricting to

\left\, \mathbf\right\, =1

gives the Fisher-Bingham distribution. * Restriction is not to be confused with

projection Projection or projections may refer to: Physics * Projection (physics), the action/process of light, heat, or sound reflecting from a surface to another in a different direction * The display of images by a projector Optics, graphics, and carto ...

. If

\mathbf z\sim\mathcal N_p(\boldsymbol\mu,\boldsymbol\Sigma)

, which we project onto the unitsphere:

\mathbf x=\lVert\mathbf z\rVert^\mathbf z

, we get the projected normal distribution. (Informally, restriction can be thought of as

rejection sampling In numerical analysis and computational statistics, rejection sampling is a basic technique used to generate observations from a distribution. It is also commonly called the acceptance-rejection method or "accept-reject algorithm" and is a type o ...

with an infinite sampling budget, where we keep only those

\mathbf z

that land on the unitsphere, while with projection we use all samples.)

Estimation of parameters

Mean direction

A series of ''N''

independent Independent or Independents may refer to: Arts, entertainment, and media Artist groups * Independents (artist group), a group of modernist painters based in Pennsylvania, United States * Independentes (English: Independents), a Portuguese artist ...

unit vector In mathematics, a unit vector in a normed vector space is a Vector (mathematics and physics), vector (often a vector (geometry), spatial vector) of Norm (mathematics), length 1. A unit vector is often denoted by a lowercase letter with a circumfle ...

x_i

are drawn from a von Mises–Fisher distribution. The

maximum likelihood In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed stati ...

estimates of the mean direction

\mu

is simply the normalized

arithmetic mean In mathematics and statistics, the arithmetic mean ( ), arithmetic average, or just the ''mean'' or ''average'' is the sum of a collection of numbers divided by the count of numbers in the collection. The collection is often a set of results fr ...

, a

sufficient statistic In statistics, sufficiency is a property of a statistic computed on a sample dataset in relation to a parametric model of the dataset. A sufficient statistic contains all of the information that the dataset provides about the model parameters. It ...

: :

\mu = \bar/\bar,
\text
\bar = \frac\sum_i^N x_i,
\text
\bar = \, \bar\, ,

Concentration parameter

Use the modified Bessel function of the first kind to define :

A_(\kappa) = \frac   .

Then: :

\kappa = A_p^(\bar) .

Thus

\kappa

is the solution to :

A_p(\kappa) = \frac = \bar .

A simple approximation to

\kappa

is (Sra, 2011) :

\hat = \frac ,

A more accurate inversion can be obtained by iterating the Newton method a few times :

\hat_1 = \hat - \frac ,

\hat_2 = \hat_1 - \frac .

Standard error

For ''N'' ≥ 25, the estimated spherical

standard error The standard error (SE) of a statistic (usually an estimator of a parameter, like the average or mean) is the standard deviation of its sampling distribution or an estimate of that standard deviation. In other words, it is the standard deviati ...

of the sample mean direction can be computed as: :

\hat = \left(\frac\right)^

where :

d = 1 - \frac \sum_i^N \left(\mu^Tx_i\right)^2

It is then possible to approximate a

100(1-\alpha)\%

a spherical confidence interval (a ''confidence cone'') about

\mu

with semi-vertical angle: :

q = \arcsin\left(e_\alpha^\hat\right),

where

e_\alpha = -\ln(\alpha).

For example, for a 95% confidence cone,

\alpha = 0.05, e_\alpha = -\ln(0.05) = 2.996,

and thus

q = \arcsin(1.731\hat).

Expected value

The expected value of the Von Mises–Fisher distribution is not on the unit hypersphere, but instead has a length of less than one. This length is given by

A_p(\kappa)

as defined above. For a Von Mises–Fisher distribution with mean direction

\boldsymbol

and concentration

\kappa>0

, the expected value is: :

A_p(\kappa)\boldsymbol

. For

\kappa=0

, the expected value is at the origin. For finite

\kappa>0

, the length of the expected value is strictly between zero and one and is a monotonic rising function of

\kappa

. The empirical mean ( arithmetic average) of a collection of points on the unit hypersphere behaves in a similar manner, being close to the origin for widely spread data and close to the sphere for concentrated data. Indeed, for the Von Mises–Fisher distribution, the expected value of the maximum-likelihood estimate based on a collection of points is equal to the empirical mean of those points.

Entropy and KL divergence

The expected value can be used to compute

differential entropy Differential entropy (also referred to as continuous entropy) is a concept in information theory that began as an attempt by Claude Shannon to extend the idea of (Shannon) entropy (a measure of average surprisal) of a random variable, to continu ...

and

KL divergence KL, kL, kl, or kl. may refer to: Businesses and organizations * KLM, a Dutch airline (IATA airline designator KL) * Koninklijke Landmacht, the Royal Netherlands Army * Kvenna Listin ("Women's List"), a political party in Iceland * KL FM, a Ma ...

. The differential entropy of

\text(\boldsymbol, \kappa)

is: :

\bigl\langle -\log f_(\mathbf; \boldsymbol, \kappa)\bigr\rangle_
=-\log f_(A_p(\kappa)\boldsymbol; \boldsymbol, \kappa)
= -\log C_p(\kappa) -\kappa A_p(\kappa)

where the angle brackets denote expectation. Notice that the entropy is a function of

\kappa

only. The KL divergence between

\text(\boldsymbol, \kappa_0)

and

\text(\boldsymbol, \kappa_1)

is: :

\Bigl\langle \log \frac

\Bigr\rangle_
=\log \frac

Transformation

Von Mises-Fisher (VMF) distributions are closed under orthogonal linear transforms. Let

\mathbf

be a

p

-by-

p

orthogonal matrix In linear algebra, an orthogonal matrix, or orthonormal matrix, is a real square matrix whose columns and rows are orthonormal vectors. One way to express this is Q^\mathrm Q = Q Q^\mathrm = I, where is the transpose of and is the identi ...

. Let

\mathbf\sim\text(\boldsymbol\mu,\kappa)

and apply the invertible linear transform:

\mathbf=\mathbf

. The inverse transform is

\mathbf=\mathbf

, because the inverse of an orthogonal matrix is its

transpose In linear algebra, the transpose of a Matrix (mathematics), matrix is an operator which flips a matrix over its diagonal; that is, it switches the row and column indices of the matrix by producing another matrix, often denoted by (among other ...

\mathbf^=\mathbf'

. The Jacobian of the transform is

\mathbf

, for which the absolute value of its

determinant In mathematics, the determinant is a Scalar (mathematics), scalar-valued function (mathematics), function of the entries of a square matrix. The determinant of a matrix is commonly denoted , , or . Its value characterizes some properties of the ...

is 1, also because of the orthogonality. Using these facts and the form of the VMF density, it follows that: :

\mathbf\sim\text(\mathbf\boldsymbol,\kappa).

One may verify that since

\boldsymbol

and

\mathbf

are unit vectors, then by the orthogonality, so are

\mathbf\boldsymbol

and

\mathbf

Pseudo-random number generation

General case

An algorithm for drawing pseudo-random samples from the Von Mises Fisher (VMF) distribution was given by Ulrich and later corrected by Wood. An implementation in R is given by Hornik and Grün; and a fast Python implementation is described by Pinzón and Jung. To simulate from a VMF distribution on the

(p-1)

-dimensional unitsphere,

S^

, with mean direction

\boldsymbol\in S^

, these algorithms use the following radial-tangential decomposition for a point

\mathbf\in S^\subset\mathbb^p

: :

\mathbf = t\boldsymbol+\sqrt\mathbf

where

\mathbf\in\mathbb^p

lives in the tangential

(p-2)

-dimensional unit-subsphere that is centered at and perpendicular to

\boldsymbol

; while

t\in 1,1 /math>. To draw a sample \mathbf from a VMF with parameters \boldsymbol and \kappa, \mathbf must be drawn from the uniform distribution on the tangential subsphere; and the radial component, t, must be drawn independently from the distribution with density:
: f_\text(t;\kappa,p)=\frac

e^(1-t^2)^where \nu=\frac2-1 . The normalization constant for this density may be verified by using:
: I_\nu(\kappa) = \frac
\int_^e^(1-t^2)^\,dt as given in Appendix 1 (A.3) in ''Directional Statistics''. Drawing the t samples from this density by using a

algorithm is explained in the above references. To draw the uniform

\mathbf

samples perpendicular to

\boldsymbol

, see the algorithm in, or otherwise a Householder transform can be used as explained in Algorithm 1 in.

3-D sphere

To generate a Von Mises–Fisher distributed pseudo-random spherical 3-D unit vector

\mathbf X_

on the

S^

for a given

\mu

and

\kappa

, define

\mathbf X_ =, \theta, \phi /math>

where \theta is the polar angle, \phi the azimuthal angle, and r=1 the distance to the center of the sphere

for \mathbf \mu =,(.),1 /math> the pseudo-random triplet is then given by \mathbf X_ =, \arccos W, V /math>

where V is sampled from the

continuous uniform distribution In probability theory and statistics, the continuous uniform distributions or rectangular distributions are a family of symmetric probability distributions. Such a distribution describes an experiment where there is an arbitrary outcome that li ...

U(a,b)

with lower bound

a

and upper bound

b

V \sim U(0, 2\pi)

and

W = \cos \theta = 1+ \frac   (\ln\xi+\ln(1- \frac  e^))

where

\xi

is sampled from the standard continuous uniform distribution

U(0,1)

\xi \sim U(0, 1)

here,

W

should be set to

W = 1

when

\mathbf \xi=0

and

\mathbf X_

rotated to match any other desired

\mu

Distribution of polar angle

For

p = 3

, the angle θ between

\mathbf

and

\boldsymbol

satisfies

\cos\theta=\boldsymbol^\mathsf \mathbf

. It has the distribution :

p(\theta)=\int d^2x f(x; \boldsymbol, \kappa)\, \delta\left(\theta-\text(\boldsymbol^\mathsf \mathbf)\right)

, which can be easily evaluated as :

p(\theta)=2\pi C_3(\kappa)\,\sin\theta\, e^

. For the general case,

p\ge2

, the distribution for the cosine of this angle: :

\cos\theta = t = \boldsymbol^\mathsf \mathbf

is given by

f_\text(t;\kappa,p)

, as explained

above Above may refer to: *Above (artist) Tavar Zawacki (b. 1981, California) is a Polish, Portuguese - American abstract artist and internationally recognized visual artist based in Berlin, Germany. From 1996 to 2016, he created work under the ...

The uniform hypersphere distribution

Component marginal of uniform distribution

For

1\le i\le p

, let

x_i

be any component of

\mathbf\in \mathbb S^

. The marginal distribution for

x_i

has the density: :

f_i(x_i;p) = f_\text(x_i;\kappa=0,p)=\frac

where

B(\alpha,\beta)

is the

beta function In mathematics, the beta function, also called the Euler integral of the first kind, is a special function that is closely related to the gamma function and to binomial coefficients. It is defined by the integral : \Beta(z_1,z_2) = \int_0^1 t^ ...

. This distribution may be better understood by highlighting its relation to the

beta distribution In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval

, 1 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...

or (0, 1) in terms of two positive Statistical parameter, parameters, denoted by ''alpha'' (''α'') an ...

: :

\begin
x_i^2&\sim\text\bigl(\frac12,\frac2\bigr) &&\text& 
\frac&\sim\text\bigl(\frac2,\frac2\bigr)  
\end

where the Legendre duplication formula is useful to understand the relationships between the normalization constants of the various densities above. Note that the components of

\mathbf\in \mathbb S^

are ''not'' independent, so that the uniform density is not the product of the marginal densities; and

\mathbf

cannot be assembled by independent sampling of the components.

Distribution of dot-products

machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...

, especially in

image classification Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g. in the form o ...

, to-be-classified inputs (e.g. images) are often compared using

cosine similarity In data analysis, cosine similarity is a measure of similarity between two non-zero vectors defined in an inner product space. Cosine similarity is the cosine of the angle between the vectors; that is, it is the dot product of the vectors divided ...

, which is the

dot product In mathematics, the dot product or scalar productThe term ''scalar product'' means literally "product with a Scalar (mathematics), scalar as a result". It is also used for other symmetric bilinear forms, for example in a pseudo-Euclidean space. N ...

between intermediate representations in the form of unitvectors (termed ''embeddings''). The dimensionality is typically high, with

p

at least several hundreds. The

deep neural network Deep learning is a subset of machine learning that focuses on utilizing multilayered neural network (machine learning), neural networks to perform tasks such as Statistical classification, classification, Regression analysis, regression, and re ...

s that extract embeddings for classification should learn to spread the classes as far apart as possible and ideally this should give classes that are uniformly distributed on

\mathbb S^

. For a better statistical understanding of ''across-class cosine similarity'', the distribution of dot-products between unitvectors independently sampled from the uniform distribution may be helpful. Let

\mathbf,\mathbf\in \mathbb S^

be unitvectors in

\mathbb^p

, independently sampled from the uniform distribution. Define: :

& s&=\text(r) =\log\frac \in\mathbb \end

where

t

is the dot-product and

r,s

are transformed versions of it. Then the distribution for

t

is the same as the ''marginal component distribution'' given

; the distribution for

r

is symmetric beta and the distribution for

s

is symmetric logistic-beta: :

\begin
r&\sim \text\bigl(\frac2,\frac2\bigr), &
s&\sim B_\sigma\bigl(\frac2,\frac2\bigr)
\end

The means and variances are: :

=0, \end

and :

=2\psi'\bigl(\frac2\bigr)\approx\frac4 \end

where

\psi'=\psi^

is the first

polygamma function In mathematics, the polygamma function of order is a meromorphic function on the complex numbers \mathbb defined as the th derivative of the logarithm of the gamma function: :\psi^(z) := \frac \psi(z) = \frac \ln\Gamma(z). Thus :\psi^(z) ...

. The variances decrease, the distributions of all three variables become more Gaussian, and the final

approximation An approximation is anything that is intentionally similar but not exactly equal to something else. Etymology and usage The word ''approximation'' is derived from Latin ''approximatus'', from ''proximus'' meaning ''very near'' and the prefix ...

gets better as the dimensionality,

p

, is increased.

Generalizations

Matrix Von Mises-Fisher

The matrix von Mises-Fisher distribution (also known as matrix Langevin distribution) has the density :

f_(\mathbf; \mathbf) \propto \exp(\operatorname(\mathbf^\mathsf\mathbf))

supported on the

Stiefel manifold In mathematics, the Stiefel manifold V_k(\R^n) is the set of all orthonormal ''k''-frames in \R^n. That is, it is the set of ordered orthonormal ''k''-tuples of vectors in \R^n. It is named after Swiss mathematician Eduard Stiefel. Likewise one ...

n \times p

orthonormal In linear algebra, two vectors in an inner product space are orthonormal if they are orthogonal unit vectors. A unit vector means that the vector has a length of 1, which is also known as normalized. Orthogonal means that the vectors are all perpe ...

p-frames

\mathbf

, where

\mathbf

is an arbitrary

n \times p

real matrix.

Saw distributions

Ulrich, in designing an algorithm for sampling from the VMF distribution, makes use of a family of distributions named after and explored by John G. Saw. A Saw distribution is a distribution on the

(p-1)

-sphere,

S^

, with modal vector

\boldsymbol\in S^

and concentration

\kappa\ge0

, and of which the density function has the form: :

f_\text(\mathbf;\boldsymbol\mu,\kappa)
= \frac

where

g

is a non-negative, increasing function; and where

K_P(\kappa)

is the normalization constant. The above-mentioned ''radial-tangential decomposition'' generalizes to the Saw family and the radial component,

t=\mathbf x'\boldsymbol\mu

has the density: :

f_\text(t;\kappa)=\frac\frac.

where

B

is the beta function. Also notice that the left-hand factor of the radial density is the surface area of

S^

. By setting

g(\kappa\mathbf x'\boldsymbol\mu)=e^

, one recovers the VMF distribution.

Weighted Rademacher Distribution

The definition of the Von Mises–Fisher distribution can be extended to include also the case where

p=1

, so that the support is the 0-dimensional hypersphere, which when embedded into 1-dimensional Euclidean space is the discrete set,

\

. The mean direction is

\mu\in\

and the concentration is

\kappa\ge0

. The probability mass function, for

x\in\

is: :

f_1(x\mid\mu,\kappa) = \frac
= \sigma(2\kappa\mu x)

where

\sigma(z)=1/(1+e^)

is the logistic sigmoid. The expected value is

\mu\,\text(\kappa)

. In the uniform case, at

\kappa=0

, this distribution degenerates to the Rademacher distribution.

Definition

Support

Note on the normalization constant

Relation to normal distribution

Estimation of parameters

Mean direction

Concentration parameter

Standard error

Expected value

Entropy and KL divergence

Transformation

Pseudo-random number generation

General case

3-D sphere

Distribution of polar angle

The uniform hypersphere distribution

Component marginal of uniform distribution

Distribution of dot-products

Generalizations

Matrix Von Mises-Fisher

Saw distributions

Weighted Rademacher Distribution

See also

References

Further reading