In directional statistics, the projected normal distribution (also known as offset normal distribution, angular normal distribution or angular Gaussian distribution) is a probability distribution over directions that describes the radial projection of a random variable with an $n$-variate normal distribution onto the unit $(n-1)$-sphere.

Definition and properties

Given a random variable $\boldsymbol X \in \mathbb R^n$ that follows a multivariate normal distribution $\mathcal N_n(\boldsymbol\mu,\, \boldsymbol\Sigma)$, the projected normal distribution $\mathcal{PN}_n(\boldsymbol\mu, \boldsymbol\Sigma)$ represents the distribution of the random variable

$$\boldsymbol Y = \frac{\boldsymbol X}{\lVert \boldsymbol X \rVert}$$

obtained by projecting $\boldsymbol X$ onto the unit sphere. In the general case, the projected normal distribution can be asymmetric and multimodal. If $\boldsymbol \mu$ is parallel to an eigenvector of $\boldsymbol \Sigma$, the distribution is symmetric. The first version of such a distribution was introduced in Pukkila and Rao (1988).
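The definition translates directly into a sampler: draw $\boldsymbol X\sim\mathcal N_n(\boldsymbol\mu,\boldsymbol\Sigma)$ and divide it by its norm. The following is a minimal sketch that uses only the Python standard library. A hand-written Cholesky factorization stands in for a linear-algebra library, and the helper names (`cholesky`, `sample_projected_normal`) as well as the example parameters are hypothetical.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular L with L L' = sigma (sigma symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def sample_projected_normal(mu, sigma, size, rng):
    """Draw X ~ N_n(mu, sigma) and return the radial projections Y = X / ||X||."""
    n = len(mu)
    L = cholesky(sigma)
    samples = []
    for _ in range(size):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # x = mu + L z has mean mu and covariance L L' = sigma
        x = [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        norm = math.sqrt(sum(xi * xi for xi in x))
        samples.append([xi / norm for xi in x])
    return samples


rng = random.Random(0)
mu = [2.0, 0.0, 0.0]
sigma = [[1.0, 0.3, 0.0], [0.3, 1.0, 0.0], [0.0, 0.0, 0.5]]
ys = sample_projected_normal(mu, sigma, 5000, rng)
print(sum(y[0] for y in ys) / len(ys))  # mean first coordinate; the samples lean toward mu / ||mu||
```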

Support

The support of this distribution is the unit $(n-1)$-sphere. It can be given either in terms of $(n-1)$-dimensional angular spherical coordinates:

$$\boldsymbol\Theta=[0,\pi]^{n-2}\times[0,2\pi)\subset\mathbb R^{n-1}$$

or in terms of $n$-dimensional Cartesian coordinates:

$$\mathbb S^{n-1}=\{\boldsymbol z\in\mathbb R^n:\lVert\boldsymbol z\rVert=1\}\subset\mathbb R^n$$

The two are linked via the embedding function $e:\boldsymbol\Theta\to\mathbb R^n$, whose range is $e(\boldsymbol\Theta)=\mathbb S^{n-1}$. This function is defined by the formula for hyperspherical coordinates of the n-sphere at $r=1$.
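The embedding $e$ can be written out explicitly. The minimal sketch below assumes the usual hyperspherical convention $x_1=\cos\theta_1$, $x_k=\sin\theta_1\cdots\sin\theta_{k-1}\cos\theta_k$ for $1<k<n$, and $x_n=\sin\theta_1\cdots\sin\theta_{n-1}$. The function name `embed` is hypothetical. The $n=3$ formula in the spherical-distribution section below orders its angles differently.

```python
import math


def embed(theta):
    """e(theta): map n-1 hyperspherical angles to a point of the unit (n-1)-sphere in R^n."""
    coords = []
    sin_prod = 1.0  # running product sin(theta_1) * ... * sin(theta_{k-1})
    for angle in theta:
        coords.append(sin_prod * math.cos(angle))
        sin_prod *= math.sin(angle)
    coords.append(sin_prod)  # x_n = sin(theta_1) * ... * sin(theta_{n-1})
    return coords


# n = 2 reproduces (cos theta_1, sin theta_1); n = 4 takes three angles.
print(embed([0.3]))
print(embed([0.7, 1.2, 2.5]))
```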

Density function

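The density of the projected normal distribution $\mathcal{PN}_n(\boldsymbol\mu, \boldsymbol\Sigma)$ can be constructed from the density of its generator, the $n$-variate normal distribution $\mathcal N_n(\boldsymbol\mu, \boldsymbol\Sigma)$, by re-parametrising to $n$-dimensional spherical coordinates and then integrating over the radial coordinate. In ''full'' spherical coordinates with radial component $r \in [0,\infty)$ and angles $\boldsymbol\theta=(\theta_1,\dots,\theta_{n-1})\in\boldsymbol\Theta$, a point $\boldsymbol x\in\mathbb R^n$ can be written as $\boldsymbol x = r\boldsymbol v$, with $\boldsymbol v = e(\boldsymbol\theta)\in\mathbb S^{n-1}$. The joint density becomes

$$p(r,\boldsymbol\theta\mid\boldsymbol \mu, \boldsymbol \Sigma) = r^{n-1}\mathcal N_n(r\boldsymbol v\mid\boldsymbol\mu,\boldsymbol\Sigma) = \frac{r^{n-1}}{\sqrt{|\boldsymbol\Sigma|}\,(2\pi)^{n/2}}\, e^{-\frac12 (r\boldsymbol v-\boldsymbol\mu)'\boldsymbol\Sigma^{-1}(r\boldsymbol v-\boldsymbol\mu)}$$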

where the factor $r^{n-1}$ is due to the change of variables $\boldsymbol x=r\boldsymbol v$. The density of $\mathcal{PN}_n(\boldsymbol\mu, \boldsymbol\Sigma)$ can then be obtained via marginalization over $r$ as

$$p(\boldsymbol \theta \mid \boldsymbol \mu, \boldsymbol \Sigma) = \int_0^\infty p(r, \boldsymbol \theta \mid \boldsymbol \mu, \boldsymbol \Sigma)\, dr.$$

The same density had been previously obtained in Pukkila and Rao (1988, Eq. (2.4)) using a different notation.
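When no closed form is available, the marginalization over $r$ can be carried out numerically. The sketch below is an illustration under stated assumptions, not a reference implementation. It evaluates the joint density $r^{n-1}\mathcal N_n(r\boldsymbol v\mid\boldsymbol\mu,\boldsymbol\Sigma)$ with a hand-written Cholesky factor and applies Simpson's rule on a truncated range $[0, r_{\max}]$. The parameters `r_max` and `steps` are hypothetical and must be large enough for the given $\boldsymbol\mu$ and $\boldsymbol\Sigma$.

```python
import math


def cholesky(sigma):
    """Lower-triangular L with L L' = sigma."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def normal_pdf(x, mu, L):
    """N_n(x | mu, Sigma), given the Cholesky factor L of Sigma."""
    n = len(x)
    w = []  # forward substitution: L w = x - mu, so ||w||^2 = (x-mu)' Sigma^{-1} (x-mu)
    for i in range(n):
        w.append((x[i] - mu[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i])
    sqrt_det = math.prod(L[i][i] for i in range(n))  # sqrt(|Sigma|)
    return math.exp(-0.5 * sum(wi * wi for wi in w)) / (sqrt_det * (2.0 * math.pi) ** (n / 2))


def projected_normal_density(v, mu, sigma, r_max=30.0, steps=6000):
    """p(theta | mu, Sigma) = int_0^inf r^(n-1) N_n(r v | mu, Sigma) dr, by Simpson's rule (steps even)."""
    n = len(v)
    L = cholesky(sigma)
    h = r_max / steps
    total = 0.0
    for k in range(steps + 1):
        r = k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * r ** (n - 1) * normal_pdf([r * vi for vi in v], mu, L)
    return total * h / 3.0


sigma = [[2.0, 0.3, 0.0], [0.3, 1.0, 0.1], [0.0, 0.1, 0.5]]
print(projected_normal_density([0.6, 0.8, 0.0], [1.0, 0.5, -0.2], sigma))
```

With $\boldsymbol\mu=\mathbf 0$ and $\boldsymbol\Sigma=\mathbf I_3$, the result should match the uniform density $1/(4\pi)$ on the sphere.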

Note on density definition

This subsection clarifies the various forms of probability density used in this article, so that they are not misunderstood. Take for example a random variate $u\in(0,1]$ with uniform density $p_U(u)=1$. If $\ell=-\log u$, then $\ell$ has density $p_L(\ell)=e^{-\ell}$. This works if both densities are defined with respect to Lebesgue measure on the real line. By default convention:
* Density functions are Lebesgue-densities, defined with respect to Lebesgue measure, applied in the space where the argument of the density function lives, so that:
* The Lebesgue-densities involved in a change of variables are related by a factor dependent on the derivative(s) of the transformation ($|d\ell/du|=e^{\ell}$ in this example, and $r^{n-1}$ for the above change of variables $\boldsymbol x=r\boldsymbol v$).

Neither of these conventions applies to the $\mathcal{PN}$ densities in this article:
* For $n\ge3$ the density $p(\boldsymbol\theta\mid\boldsymbol\mu,\boldsymbol\Sigma)$ is ''not'' defined w.r.t. Lebesgue measure in $\mathbb R^{n-1}$, where $\boldsymbol\theta$ lives, because that measure does not agree with the standard notion of hyperspherical area. Instead, the density is defined w.r.t. a measure that is pulled back (via the embedding function) to angular coordinate space from Lebesgue measure in the $(n-1)$-dimensional tangent space of the hypersphere. This is explained below.
* With the embedding $\boldsymbol v=e(\boldsymbol\theta)$, a density $\tilde p(\boldsymbol v\mid\boldsymbol\mu,\boldsymbol\Sigma)$ cannot be defined w.r.t. Lebesgue measure, because $\mathbb S^{n-1}\subset\mathbb R^n$ has Lebesgue measure zero. Instead, $\tilde p$ is defined w.r.t. scaled Hausdorff measure.

The pullback and Hausdorff measures agree, so that:

$$p(\boldsymbol\theta\mid\boldsymbol\mu,\boldsymbol\Sigma)=\tilde p(\boldsymbol v\mid\boldsymbol\mu,\boldsymbol\Sigma)$$

where there is no change-of-variables factor, because the densities use ''different'' measures.

A density defined w.r.t. a measure can be understood as follows. Here a measure is a function that maps subsets in sample space to a non-negative real-valued 'volume'. Consider a measurable subset $U\subseteq\boldsymbol\Theta$ with embedded image $V=e(U)\subseteq\mathbb S^{n-1}$, and let $\boldsymbol v=e(\boldsymbol\theta)\sim\mathcal{PN}_n(\boldsymbol\mu,\boldsymbol\Sigma)$. Then the probability of finding the sample in the subset is:

$$P(\boldsymbol\theta\in U)=\int_U p \,d\pi = P(\boldsymbol v\in V) = \int_V \tilde p \,d h$$

where $\pi$ and $h$ are the pullback and Hausdorff measures, respectively. The integrals are Lebesgue integrals, which can be rewritten as Riemann integrals as follows:

$$\int_U p\,d\pi = \int_0^\infty \pi\left(\{\boldsymbol\theta\in U : p(\boldsymbol\theta\mid\boldsymbol\mu,\boldsymbol\Sigma)>t\}\right)dt \qquad (1)$$
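Both the $u\mapsto\ell=-\log u$ example at the start of this subsection and the layer-cake form (1), with Lebesgue measure in place of $\pi$, can be checked numerically. The sketch below makes two simplifying assumptions. First, the factor $|du/d\ell|$ is approximated by central differences. Second, the superlevel set $\{\ell\ge0: e^{-\ell}>t\}=[0,-\log t)$ for $0<t<1$ (empty for $t\ge1$) is worked out by hand. The function names are hypothetical.

```python
import math


def p_U(u):
    """Uniform density on (0, 1], w.r.t. Lebesgue measure."""
    return 1.0 if 0.0 < u <= 1.0 else 0.0


def p_L(ell, h=1e-6):
    """Density of ell = -log(u) by change of variables: p_U(u(ell)) * |du/dell|, with u(ell) = exp(-ell)."""
    du_dell = (math.exp(-(ell + h)) - math.exp(-(ell - h))) / (2.0 * h)  # central difference
    return p_U(math.exp(-ell)) * abs(du_dell)


def layer_cake_mass(m=100_000):
    """int_0^inf Leb({ell >= 0 : p_L(ell) > t}) dt, using the superlevel-set length -log(t) for 0 < t < 1."""
    h = 1.0 / m
    return h * sum(-math.log((k + 0.5) * h) for k in range(m))  # midpoint rule on (0, 1)


for ell in (0.5, 1.0, 3.0):
    print(ell, p_L(ell), math.exp(-ell))
print(layer_cake_mass())  # total probability, close to 1
```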

Pullback measure

The tangent space at $\boldsymbol v\in\mathbb S^{n-1}$ is the $(n-1)$-dimensional linear subspace perpendicular to $\boldsymbol v$, where Lebesgue measure ''can'' be used. At very small scale, the tangent space is indistinguishable from the sphere (e.g. Earth looks locally flat), so Lebesgue measure in tangent space agrees with area on the hypersphere. The tangent-space Lebesgue measure is pulled back via the embedding function to define the measure in coordinate space, as follows. For a measurable subset $U\subseteq\boldsymbol\Theta$ in coordinate space, the pullback measure, written as a Riemann integral, is:

$$\pi(U) = \int_U \sqrt{\left|\det(\mathbf E_{\boldsymbol\theta}'\mathbf E_{\boldsymbol\theta})\right|}\,d\theta_1\,\cdots\,d\theta_{n-1} \qquad (2)$$

where the Jacobian of the embedding function $e(\boldsymbol\theta)$ is the $n$-by-$(n-1)$ matrix $\mathbf E_{\boldsymbol\theta}$. Its columns span the $(n-1)$-dimensional tangent space where the Lebesgue measure is applied. It can be shown that:

$$\sqrt{\left|\det(\mathbf E_{\boldsymbol\theta}'\mathbf E_{\boldsymbol\theta})\right|}=\prod_{i=1}^{n-2} \sin^{n-1-i}(\theta_i).$$

Plugging the pullback measure (2) into equation (1) and exchanging the order of integration gives:

$$P(\boldsymbol\theta\in U) = \int_U p\,d\pi = \int_U p(\boldsymbol\theta\mid\boldsymbol\mu,\boldsymbol\Sigma)\,\sqrt{\left|\det(\mathbf E_{\boldsymbol\theta}'\mathbf E_{\boldsymbol\theta})\right|}\,d\theta_1\,\cdots\,d\theta_{n-1}$$

where the first integral is Lebesgue and the second Riemann. Finally, the square-root factor has a direct geometric interpretation:
* For $n=2$, when integrating over the unit circle w.r.t. $\theta_1$, with embedding $e(\theta_1)=(\cos\theta_1, \sin\theta_1)$, the Jacobian is $\mathbf E_{\boldsymbol\theta}= (-\sin\theta_1,\,\cos\theta_1)'$, so that $\sqrt{\left|\det(\mathbf E_{\boldsymbol\theta}'\mathbf E_{\boldsymbol\theta})\right|}=1$. The angular differential $d\theta_1$ directly gives the subtended arc length on the circle.
* For $n=3$, when integrating over the unit sphere w.r.t. $\theta_1,\theta_2$, we get $\sqrt{\left|\det(\mathbf E_{\boldsymbol\theta}'\mathbf E_{\boldsymbol\theta})\right|}=\sin\theta_1$. This is the radius of the circle of latitude at $\theta_1$ (compare the equator with a polar circle). The area of the surface patch subtended by the two angular differentials is $\sin\theta_1\,d\theta_1\,d\theta_2$.
* More generally, for $n\ge2$, let $\mathbf T$ be a square or tall matrix and let $/\mathbf T\!/$ denote the parallelotope spanned by its columns (which represent the edges meeting at a common vertex). The parallelotope volume is $\sqrt{\left|\det(\mathbf T'\mathbf T)\right|}$, the square root of the absolute value of the Gram determinant. For square $\mathbf T$, the volume simplifies to $\left|\det(\mathbf T)\right|$. Now let $\mathbf R=\operatorname{diag}(d\theta_1,\dots,d\theta_{n-1})$, so that $/\mathbf R/\subset\boldsymbol\Theta$ is a rectangle with infinitesimally small volume $\left|\det(\mathbf R)\right|=\prod_{i=1}^{n-1}d\theta_i$. Since the smooth embedding function is linear at small scale, the embedded image is the parallelotope $e(/\mathbf R/)=/\mathbf E_{\boldsymbol\theta}\mathbf R/$. Its volume, which is the area of the subtended hyperspherical surface patch, is:
$$\sqrt{\left|\det(\mathbf R\mathbf E_{\boldsymbol\theta}'\mathbf E_{\boldsymbol\theta}\mathbf R)\right|} = \sqrt{\left|\det(\mathbf E_{\boldsymbol\theta}'\mathbf E_{\boldsymbol\theta})\right|}\,d\theta_1\,\cdots\,d\theta_{n-1}.$$
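The closed-form square-root factor can be checked against its definition as a Gram determinant. The sketch below makes three assumptions: it uses the hyperspherical embedding convention referred to in the Support section, it approximates $\mathbf E_{\boldsymbol\theta}$ by central differences, and it evaluates $\sqrt{|\det(\mathbf E_{\boldsymbol\theta}'\mathbf E_{\boldsymbol\theta})|}$ with a small Gaussian-elimination determinant. All helper names are hypothetical.

```python
import math


def embed(theta):
    """Hyperspherical embedding e(theta) onto the unit sphere in R^(len(theta)+1)."""
    coords, sin_prod = [], 1.0
    for angle in theta:
        coords.append(sin_prod * math.cos(angle))
        sin_prod *= math.sin(angle)
    return coords + [sin_prod]


def jacobian_columns(theta, h=1e-6):
    """Columns of the n-by-(n-1) Jacobian E_theta, by central differences."""
    cols = []
    for i in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[i] += h
        minus[i] -= h
        cols.append([(a - b) / (2.0 * h) for a, b in zip(embed(plus), embed(minus))])
    return cols


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, result = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result


def pullback_factor(theta):
    """sqrt(|det(E' E)|): square root of the Gram determinant of the Jacobian columns."""
    cols = jacobian_columns(theta)
    gram = [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]
    return math.sqrt(abs(det(gram)))


def closed_form_factor(theta):
    """prod_{i=1}^{n-2} sin(theta_i)^(n-1-i), where n - 1 = len(theta)."""
    n = len(theta) + 1
    return math.prod(math.sin(theta[i - 1]) ** (n - 1 - i) for i in range(1, n - 1))


for theta in ([0.9], [0.4, 1.0], [0.7, 1.2, 2.5]):
    print(len(theta) + 1, pullback_factor(theta), closed_form_factor(theta))
```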

Circular distribution

For $n=2$, parametrise the position on the unit circle in polar coordinates as $\boldsymbol v = (\cos\theta, \sin\theta)$. The density function can then be written in terms of the parameters $\boldsymbol\mu$ and $\boldsymbol\Sigma$ of the initial normal distribution as

$$p(\theta \mid \boldsymbol\mu, \boldsymbol\Sigma) = \frac{e^{-\frac12\boldsymbol\mu'\boldsymbol\Sigma^{-1}\boldsymbol\mu}}{2\pi\sqrt{|\boldsymbol\Sigma|}\,\boldsymbol v'\boldsymbol\Sigma^{-1}\boldsymbol v} \left( 1 + T(\theta) \frac{\Phi(T(\theta))}{\phi(T(\theta))} \right) I_{[0,2\pi)}(\theta)$$

where $\phi$ and $\Phi$ are the density and cumulative distribution function of a standard normal distribution,

$$T(\theta) = \frac{\boldsymbol v'\boldsymbol\Sigma^{-1}\boldsymbol\mu}{\sqrt{\boldsymbol v'\boldsymbol\Sigma^{-1}\boldsymbol v}},$$

and $I$ is the indicator function.

In the circular case, suppose the mean vector $\boldsymbol \mu$ is parallel to the eigenvector associated with the largest eigenvalue of the covariance. Then the distribution is symmetric, with a mode at $\theta = \alpha$ and either a mode or an antimode at $\theta = \alpha + \pi$, where $\alpha$ is the polar angle of $\boldsymbol \mu = (r \cos\alpha, r \sin\alpha)$. If the mean is instead parallel to the eigenvector associated with the smallest eigenvalue, the distribution is also symmetric, but it has either a mode or an antimode at $\theta = \alpha$ and an antimode at $\theta = \alpha + \pi$.
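The circular density can be transcribed directly. In this minimal sketch, the standard-normal $\phi$ and $\Phi$ are written with `math.erfc`, the covariance is assumed to be a symmetric positive-definite 2-by-2 list of lists, and the function names are hypothetical. For very large $T(\theta)$ the ratio $\Phi/\phi$ overflows, so a more robust version would evaluate it more carefully.

```python
import math


def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(t):
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def circular_pn_density(theta, mu, sigma):
    """Closed-form projected normal density on the unit circle (n = 2)."""
    if not 0.0 <= theta < 2.0 * math.pi:
        return 0.0  # indicator I_[0, 2pi)(theta)
    (a, b), (_, d) = sigma  # symmetric covariance [[a, b], [b, d]]
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]

    def quad(x, y):  # x' Sigma^{-1} y
        return sum(x[i] * inv[i][j] * y[j] for i in range(2) for j in range(2))

    v = (math.cos(theta), math.sin(theta))
    A = quad(v, v)
    T = quad(v, mu) / math.sqrt(A)
    prefactor = math.exp(-0.5 * quad(mu, mu)) / (2.0 * math.pi * math.sqrt(det) * A)
    return prefactor * (1.0 + T * std_normal_cdf(T) / std_normal_pdf(T))


def total_mass(mu, sigma, m=4000):
    """Midpoint-rule integral of the density over [0, 2pi)."""
    h = 2.0 * math.pi / m
    return h * sum(circular_pn_density((k + 0.5) * h, mu, sigma) for k in range(m))


print(total_mass([1.0, 0.5], [[2.0, 0.3], [0.3, 1.0]]))  # close to 1
```

As a usage example, take $\boldsymbol\Sigma=\operatorname{diag}(3,1)$ and $\boldsymbol\mu=(1.5,0)$, so the mean lies along the eigenvector of the largest eigenvalue with $\alpha=0$. The density is then symmetric about $\theta=0$ and peaks there.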

Spherical distribution

For $n=3$, parametrise the position on the unit sphere in spherical coordinates as

$$\boldsymbol v = (\cos\theta_1 \sin\theta_2, \sin\theta_1 \sin\theta_2, \cos\theta_2)$$

where $\boldsymbol \theta = (\theta_1, \theta_2)$ are the azimuth $\theta_1 \in [0, 2\pi)$ and the inclination $\theta_2 \in [0, \pi]$, respectively. The density function becomes

$$p(\boldsymbol \theta \mid \boldsymbol\mu, \boldsymbol\Sigma) = \frac{e^{-\frac12\boldsymbol\mu'\boldsymbol\Sigma^{-1}\boldsymbol\mu}}{\sqrt{|\boldsymbol\Sigma|}\left(2\pi\,\boldsymbol v'\boldsymbol\Sigma^{-1}\boldsymbol v\right)^{3/2}} \left(\frac{\Phi(T(\boldsymbol \theta))}{\phi(T(\boldsymbol \theta))} + T(\boldsymbol \theta) \left( 1 + T(\boldsymbol \theta) \frac{\Phi(T(\boldsymbol \theta))}{\phi(T(\boldsymbol \theta))} \right) \right) I_{[0,2\pi)}(\theta_1)\, I_{[0,\pi]}(\theta_2)$$

where $\phi$, $\Phi$, $T$, and $I$ have the same meaning as in the circular case.
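The $n=3$ formula can be transcribed in the same way. The sketch below uses only the standard library, a hand-coded 3-by-3 cofactor inverse, and hypothetical names. It integrates the density against the area element $\sin\theta_2\,d\theta_1\,d\theta_2$ of this azimuth/inclination parametrisation, as a sanity check that the density is normalized w.r.t. area on the sphere. As in the circular case, $\Phi/\phi$ overflows for very large $T$.

```python
import math


def inv_det_3x3(m):
    """Inverse and determinant of a 3x3 matrix via cofactors."""
    (a, b, c), (d, e, f), (g, h, i) = m
    cof = [[e * i - f * h, -(d * i - f * g), d * h - e * g],
           [-(b * i - c * h), a * i - c * g, -(a * h - b * g)],
           [b * f - c * e, -(a * f - c * d), a * e - b * d]]
    det = a * cof[0][0] + b * cof[0][1] + c * cof[0][2]
    inv = [[cof[col][row] / det for col in range(3)] for row in range(3)]  # adjugate / det
    return inv, det


def spherical_pn_density(theta1, theta2, mu, sigma):
    """Closed-form projected normal density on the unit sphere (n = 3), w.r.t. surface area."""
    if not (0.0 <= theta1 < 2.0 * math.pi and 0.0 <= theta2 <= math.pi):
        return 0.0
    inv, det = inv_det_3x3(sigma)

    def quad(x, y):  # x' Sigma^{-1} y
        return sum(x[j] * inv[j][k] * y[k] for j in range(3) for k in range(3))

    v = (math.cos(theta1) * math.sin(theta2), math.sin(theta1) * math.sin(theta2), math.cos(theta2))
    A = quad(v, v)
    T = quad(v, mu) / math.sqrt(A)
    ratio = 0.5 * math.erfc(-T / math.sqrt(2.0)) / (math.exp(-0.5 * T * T) / math.sqrt(2.0 * math.pi))
    prefactor = math.exp(-0.5 * quad(mu, mu)) / (math.sqrt(det) * (2.0 * math.pi * A) ** 1.5)
    return prefactor * (ratio + T * (1.0 + T * ratio))


def total_mass(mu, sigma, m1=200, m2=200):
    """Midpoint rule for the integral of p * sin(theta2) over [0, 2pi) x [0, pi]."""
    h1, h2 = 2.0 * math.pi / m1, math.pi / m2
    total = 0.0
    for j in range(m2):
        t2 = (j + 0.5) * h2
        row = sum(spherical_pn_density((k + 0.5) * h1, t2, mu, sigma) for k in range(m1))
        total += row * math.sin(t2)
    return total * h1 * h2


sigma = [[2.0, 0.3, 0.0], [0.3, 1.0, 0.1], [0.0, 0.1, 0.5]]
print(total_mass([1.0, -0.5, 0.4], sigma))  # close to 1
```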

Angular central Gaussian distribution

In the special case $\boldsymbol\mu=\mathbf 0$ with $n\ge2$, the projected normal distribution is known as the angular central Gaussian (ACG). In this case the density function can be obtained in closed form as a function of Cartesian coordinates. Let $\mathbf x\sim\mathcal N_n(\mathbf 0, \boldsymbol\Sigma)$ and project radially, $\mathbf v = \lVert\mathbf x\rVert^{-1}\mathbf x$, so that $\mathbf v\in\mathbb S^{n-1}=\{\mathbf z\in\mathbb R^n:\lVert\mathbf z\rVert=1\}$ (the unit hypersphere). We write $\mathbf v\sim\operatorname{ACG}(\boldsymbol\Sigma)$. As explained above, at $\boldsymbol v=e(\boldsymbol\theta)$ this has density:

$$\tilde p_{\text{ACG}}(\mathbf v\mid\boldsymbol\Sigma) = p(\boldsymbol\theta\mid\mathbf 0,\boldsymbol\Sigma) = \int_0^\infty r^{n-1}\mathcal N_n(r\mathbf v\mid\mathbf 0, \boldsymbol\Sigma)\,dr = \frac{\Gamma(\frac n2)}{2\pi^{n/2}}\left|\boldsymbol\Sigma\right|^{-1/2}(\mathbf v'\boldsymbol\Sigma^{-1}\mathbf v)^{-n/2}$$

where the integral can be solved by the change of variables $t=\tfrac12 r^2\,\mathbf v'\boldsymbol\Sigma^{-1}\mathbf v$ and then using the standard definition of the gamma function. Notice that:
* For any $k>0$ there is the parameter indeterminacy
$$\tilde p_{\text{ACG}}(\mathbf v\mid k\boldsymbol\Sigma) = \tilde p_{\text{ACG}}(\mathbf v\mid\boldsymbol\Sigma).$$
* If $\boldsymbol\Sigma=k\mathbf I_n$, the result is the uniform hypersphere distribution $\operatorname{Uniform}(\mathbb S^{n-1})$, whose constant density equals the reciprocal of the surface area of $\mathbb S^{n-1}$:
$$\tilde p_{\text{ACG}}(\mathbf v\mid k\mathbf I_n)=p_{\text{uni}}=\frac{\Gamma(\frac n2)}{2\pi^{n/2}}$$
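The closed-form ACG density is short enough to sketch in a few lines. The sketch assumes the Python standard library only, uses a hand-written Cholesky factor to obtain both $|\boldsymbol\Sigma|^{1/2}$ and $\mathbf v'\boldsymbol\Sigma^{-1}\mathbf v$, and uses hypothetical function names. The usage lines illustrate the scale indeterminacy and the uniform special case.

```python
import math


def cholesky(sigma):
    """Lower-triangular L with L L' = sigma."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def acg_density(v, sigma):
    """Gamma(n/2) / (2 pi^(n/2)) * |Sigma|^(-1/2) * (v' Sigma^{-1} v)^(-n/2), for a unit vector v."""
    n = len(v)
    L = cholesky(sigma)
    w = []  # forward substitution: L w = v, hence v' Sigma^{-1} v = ||w||^2
    for i in range(n):
        w.append((v[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i])
    quad = sum(wi * wi for wi in w)
    sqrt_det = math.prod(L[i][i] for i in range(n))  # |Sigma|^(1/2)
    return math.gamma(n / 2) / (2.0 * math.pi ** (n / 2)) / sqrt_det * quad ** (-n / 2)


sigma = [[2.0, 0.3, 0.0], [0.3, 1.0, 0.1], [0.0, 0.1, 0.5]]
v = [0.6, 0.0, 0.8]
# Scale indeterminacy: Sigma and 5 * Sigma give the same value (up to rounding).
print(acg_density(v, sigma), acg_density(v, [[5 * x for x in row] for row in sigma]))
# Uniform case: Sigma = 3 I gives 1 / (4 pi) on the 2-sphere.
print(acg_density(v, [[3.0 if i == j else 0.0 for j in range(3)] for i in range(3)]), 1 / (4 * math.pi))
```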

ACG via transformation of normal or uniform variates

Let $\mathbf T$ be any $n$-by-$n$ invertible matrix such that $\mathbf T\mathbf T'=\boldsymbol\Sigma$. Let $\mathbf u\sim\operatorname{ACG}(\mathbf I_n)$ (uniform) and, independently, $s\sim\chi(n)$ (chi distribution), so that

$$\mathbf x=s\mathbf T\mathbf u\sim\mathcal N_n(\mathbf 0, \boldsymbol\Sigma)$$

(multivariate normal). Now consider:

$$\mathbf v = \frac{\mathbf x}{\lVert\mathbf x\rVert} = \frac{\mathbf T\mathbf u}{\lVert\mathbf T\mathbf u\rVert}\sim\operatorname{ACG}(\boldsymbol\Sigma)$$

This shows that the ACG distribution ''also'' results from applying the normalized linear transform to uniform variates:

$$f_{\mathbf T}(\mathbf u)=\frac{\mathbf T\mathbf u}{\lVert\mathbf T\mathbf u\rVert}$$

The two ways to obtain $\mathbf v\sim\operatorname{ACG}(\boldsymbol\Sigma)$ differ as follows:
* If we start with $\mathbf x\in\mathbb R^n$, sampled from a multivariate normal, we can project radially onto $\mathbb S^{n-1}$ to obtain ACG variates. To derive the ACG density, we first do the change of variables $\mathbf x\mapsto(r,\mathbf v)$, which is still an $n$-dimensional representation. This transformation induces the differential volume change factor $r^{n-1}$, which is proportional to volume in the $(n-1)$-dimensional subspace perpendicular to $\mathbf x$. Then, to finally obtain the ACG density on the $(n-1)$-dimensional unit sphere, we need to marginalize over $r$.
* If we start with $\mathbf u\in\mathbb S^{n-1}$, sampled from the uniform distribution, we do not need to marginalize, because we are already in $n-1$ dimensions. Instead, to obtain ACG variates (and the associated density), we can directly do the change of variables $\mathbf v=f_{\mathbf T}(\mathbf u)$, for which further details are given in the next subsection.

Caveat: when $\boldsymbol\mu$ is nonzero, $s\mathbf T\mathbf u+\boldsymbol\mu\sim\mathcal N_n(\boldsymbol\mu,\boldsymbol\Sigma)$ still holds, but a similar duality does ''not''. In distribution,

$$\frac{\mathbf T\mathbf u+\boldsymbol\mu}{\lVert\mathbf T\mathbf u+\boldsymbol\mu\rVert} \ne \frac{s\mathbf T\mathbf u+\boldsymbol\mu}{\lVert s\mathbf T\mathbf u+\boldsymbol\mu\rVert}\sim\mathcal{PN}_n(\boldsymbol\mu,\boldsymbol\Sigma)$$

Although we can radially project affine-transformed normal variates to get $\mathcal{PN}_n$ variates, this does not work for uniform variates.
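A Monte Carlo sketch in two dimensions illustrates both constructions and the caveat, using the standard library only. Uniform variates on the circle are generated by normalizing standard normal pairs, $\mathbf T=\operatorname{diag}(2,1)$ is an arbitrary choice, and the helper names are hypothetical. For $\boldsymbol\Sigma=\operatorname{diag}(a,b)$ one has $E[v_1^2]=\sqrt a/(\sqrt a+\sqrt b)$, so both ACG samplers should give about $2/3$ here. With $\boldsymbol\mu=(1,0)$ and $\boldsymbol\Sigma=\mathbf I_2$, projecting $\mathbf u+\boldsymbol\mu$ gives $E[\cos\theta]=2/\pi$. This is visibly larger than the value for the projected normal.

```python
import math
import random

rng = random.Random(1)
N = 20000


def normalize(x):
    norm = math.hypot(*x)
    return [xi / norm for xi in x]


def uniform_on_circle():
    """u ~ Uniform(S^1), obtained by normalizing a standard normal pair."""
    return normalize([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])


def matvec(T, x):
    return [T[0][0] * x[0] + T[0][1] * x[1], T[1][0] * x[0] + T[1][1] * x[1]]


T = [[2.0, 0.0], [0.0, 1.0]]  # T T' = Sigma = diag(4, 1)

# ACG(Sigma) two ways: radial projection of x = T z ~ N(0, Sigma), and f_T applied to uniform u.
from_normal = [normalize(matvec(T, [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])) for _ in range(N)]
from_uniform = [normalize(matvec(T, uniform_on_circle())) for _ in range(N)]
m_normal = sum(v[0] ** 2 for v in from_normal) / N
m_uniform = sum(v[0] ** 2 for v in from_uniform) / N

# Caveat with mu = (1, 0), Sigma = I: projecting z + mu (z ~ N(0, I), i.e. s u + mu) gives PN,
# whereas projecting u + mu with uniform u gives a different distribution.
mu = [1.0, 0.0]
pn = [normalize([rng.gauss(0.0, 1.0) + mu[0], rng.gauss(0.0, 1.0) + mu[1]]) for _ in range(N)]
shifted = [normalize([u + m for u, m in zip(uniform_on_circle(), mu)]) for _ in range(N)]
mean_cos_pn = sum(v[0] for v in pn) / N
mean_cos_shifted = sum(v[0] for v in shifted) / N

print(m_normal, m_uniform)  # both near 2/3
print(mean_cos_pn, mean_cos_shifted)  # the second is near 2/pi; the first is noticeably smaller
```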

Wider application of the normalized linear transform

The normalized linear transform $\mathbf v=f_{\mathbf T}(\mathbf u)$ is a bijection from the unit sphere to itself, with inverse $\mathbf u=f_{\mathbf T^{-1}}(\mathbf v)$. This transform is of independent interest, because it can serve as a probabilistic flow on the hypersphere (similar to a normalizing flow). Such a flow can generalize other, non-uniform distributions on hyperspheres as well, for example the von Mises-Fisher distribution. Because the ACG density is available in closed form, the differential volume change induced by this transform can also be recovered in closed form.

For the change of variables $\mathbf v=f_{\mathbf T}(\mathbf u)$ on the manifold $\mathbb S^{n-1}$, the uniform and ACG densities are related as:

$$\tilde p_{\text{ACG}}(\mathbf v\mid\boldsymbol\Sigma) = \frac{p_{\text{uni}}}{R(\mathbf v,\boldsymbol\Sigma)}$$

where the (constant) uniform density is $p_{\text{uni}}=\frac{\Gamma(\frac n2)}{2\pi^{n/2}}$, and $R(\mathbf v,\boldsymbol\Sigma)$ is the differential volume change factor from the input to the output of the transformation. Specifically, it is the absolute value of the determinant of an $(n-1)$-by-$(n-1)$ matrix:

$$R(\mathbf v,\boldsymbol\Sigma) = \left|\det\left(\mathbf Q_{\mathbf v}'\mathbf J_{\mathbf u}\mathbf Q_{\mathbf u}\right)\right|$$

where $\mathbf J_{\mathbf u}$ is the $n$-by-$n$ Jacobian matrix of the ''transformation in Euclidean space'', $f_{\mathbf T}:\mathbb R^n\to\mathbb R^n$, evaluated at $\mathbf u$. In Euclidean space, the transformation and its Jacobian are non-invertible. When the domain and co-domain are restricted to $\mathbb S^{n-1}$, however, $f_{\mathbf T}:\mathbb S^{n-1}\to\mathbb S^{n-1}$ is a bijection. The induced differential volume ratio $R(\mathbf v,\boldsymbol\Sigma)$ is then obtained by projecting $\mathbf J_{\mathbf u}$ onto the $(n-1)$-dimensional tangent spaces at the transformation input and output. Here $\mathbf Q_{\mathbf u}$ and $\mathbf Q_{\mathbf v}$ are $n$-by-$(n-1)$ matrices whose orthonormal columns span the tangent spaces at $\mathbf u$ and $\mathbf v$, respectively. The above determinant formula is relatively easy to evaluate numerically on a software platform equipped with linear algebra and automatic differentiation, but a simple closed form is hard to derive from it directly. However, since we already have $\tilde p_{\text{ACG}}$, we can recover:

$$R(\mathbf v, \boldsymbol\Sigma) = \left|\boldsymbol\Sigma\right|^{1/2}(\mathbf v'\boldsymbol\Sigma^{-1}\mathbf v)^{n/2} = \frac{\left|\det\mathbf T\right|}{\lVert\mathbf T\mathbf u\rVert^{n}}$$

where, in the final right-hand side, it is understood that $\boldsymbol\Sigma=\mathbf T\mathbf T'$ and $\mathbf u=f_{\mathbf T^{-1}}(\mathbf v)$. The normalized linear transform can now be used, for example, to give a closed-form density for a more flexible distribution on the hypersphere that generalizes the von Mises-Fisher. Let $\mathbf x\sim\text{VMF}(\boldsymbol\mu,\kappa)$ and $\mathbf v = f_{\mathbf T}(\mathbf x)$. The resulting density is:

$$p(\mathbf v\mid\boldsymbol\mu,\kappa,\mathbf T) = \frac{\tilde p_{\text{VMF}}\big(f_{\mathbf T^{-1}}(\mathbf v)\mid\boldsymbol\mu,\kappa\big)}{R(\mathbf v,\mathbf T\mathbf T')}$$
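The volume-change factor and the flow density can be checked in the simplest case, $n=2$. There each tangent space is one-dimensional, so $R$ is just the rate at which the output angle changes with the input angle. The following sketch makes several assumptions. It uses a finite-difference angle derivative in place of the projected Jacobian $\mathbf Q_{\mathbf v}'\mathbf J_{\mathbf u}\mathbf Q_{\mathbf u}$. It takes the von Mises distribution (the VMF on the circle) as the base, with a hypothetical mean direction `m` and concentration `kappa`. It computes $I_0$ by summing its power series. All function names and parameter values are hypothetical.

```python
import math


def f_T(T, u):
    """Normalized linear transform f_T(u) = T u / ||T u|| in two dimensions."""
    x = (T[0][0] * u[0] + T[0][1] * u[1], T[1][0] * u[0] + T[1][1] * u[1])
    r = math.hypot(*x)
    return (x[0] / r, x[1] / r)


def inverse_2x2(T):
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    return [[T[1][1] / det, -T[0][1] / det], [-T[1][0] / det, T[0][0] / det]]


def R_numeric(T, phi, h=1e-6):
    """|dpsi/dphi|: change of output angle per change of input angle at u = (cos phi, sin phi)."""
    a = f_T(T, (math.cos(phi + h), math.sin(phi + h)))
    b = f_T(T, (math.cos(phi - h), math.sin(phi - h)))
    dpsi = math.atan2(a[0] * b[1] - a[1] * b[0], a[0] * b[0] + a[1] * b[1])  # signed angle from a to b
    return abs(dpsi) / (2.0 * h)


def R_closed_form(T, u):
    """R = |det T| / ||T u||^n with n = 2."""
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    x = (T[0][0] * u[0] + T[0][1] * u[1], T[1][0] * u[0] + T[1][1] * u[1])
    return abs(det) / (x[0] ** 2 + x[1] ** 2)


def bessel_i0(x, terms=60):
    """Modified Bessel function I_0, via its power series."""
    return sum((x / 2.0) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))


def flow_density(v, m, kappa, T):
    """p(v | m, kappa, T) = p_vM(f_{T^{-1}}(v) | m, kappa) / R(v, T T')."""
    u = f_T(inverse_2x2(T), v)
    p_vm = math.exp(kappa * (m[0] * u[0] + m[1] * u[1])) / (2.0 * math.pi * bessel_i0(kappa))
    return p_vm / R_closed_form(T, u)


T = [[1.5, 0.7], [0.2, 0.8]]
for phi in (0.0, 1.0, 2.5):
    print(R_numeric(T, phi), R_closed_form(T, (math.cos(phi), math.sin(phi))))

M = 4000
angles = [(k + 0.5) * 2.0 * math.pi / M for k in range(M)]
mass = sum(flow_density((math.cos(a), math.sin(a)), (0.6, 0.8), 2.0, T) for a in angles) * 2.0 * math.pi / M
print(mass)  # close to 1
```

Setting `kappa = 0` makes the base distribution uniform. In that case `flow_density` reduces to $p_{\text{uni}}/R$, which is the ACG density.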

