Vecchia approximation is a Gaussian process approximation technique originally developed by Aldo Vecchia, a statistician at the United States Geological Survey. It is one of the earliest attempts to use Gaussian processes in high-dimensional settings. It has since been extensively generalized, giving rise to many contemporary approximations.


Intuition

A joint probability distribution for events A, B, and C, denoted P(A,B,C), can be expressed as

: P(A,B,C) = P(A)\, P(B \mid A)\, P(C \mid A, B).

Vecchia's approximation takes the form, for example,

: P(A,B,C) \approx P(A)\, P(B \mid A)\, P(C \mid A)

and is accurate when events B and C are close to conditionally independent given knowledge of A. Of course, one could have alternatively chosen the approximation

: P(A,B,C) \approx P(A)\, P(B \mid A)\, P(C \mid B),

so using the approximation requires some knowledge of which events are close to conditionally independent given the others. Moreover, one could have chosen a different ordering, for example

: P(A,B,C) \approx P(C)\, P(A \mid C)\, P(B \mid A).

Fortunately, in many cases there are good heuristics for making decisions about how to construct the approximation.

More technically, general versions of the approximation lead to a sparse Cholesky factor of the precision matrix. Using the standard Cholesky factorization produces entries which can be interpreted as conditional correlations, with zeros indicating conditional independence (since the model is Gaussian). These independence relations can alternatively be expressed using graphical models, and there exist theorems linking graph structure and vertex ordering with zeros in the Cholesky factor. In particular, it is known that independencies encoded in a moral graph lead to Cholesky factors of the precision matrix that have no fill-in.
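The factorizations above can be checked numerically for a small Gaussian example. The covariance matrix and evaluation point below are illustrative assumptions chosen so that B and C are nearly conditionally independent given A; the helper function is hypothetical, not part of any Vecchia software.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Illustrative covariance for (A, B, C): B and C are each strongly
# correlated with A but nearly conditionally independent given A.
Sigma = np.array([[1.0, 0.9, 0.9],
                  [0.9, 1.0, 0.82],
                  [0.9, 0.82, 1.0]])
z = np.array([0.3, 0.1, -0.2])  # an arbitrary evaluation point (A, B, C)

def cond_logpdf(i, given, Sigma, z):
    """log f(z_i | z_given) for a zero-mean Gaussian with covariance Sigma."""
    g = list(given)
    S_gg = Sigma[np.ix_(g, g)]
    S_ig = Sigma[np.ix_([i], g)]
    mean = (S_ig @ np.linalg.solve(S_gg, z[g])).item()
    var = (Sigma[i, i] - S_ig @ np.linalg.solve(S_gg, S_ig.T)).item()
    return norm.logpdf(z[i], loc=mean, scale=np.sqrt(var))

exact = multivariate_normal(mean=np.zeros(3), cov=Sigma).logpdf(z)
# Exact chain rule: log P(A) + log P(B|A) + log P(C|A,B)
chain = norm.logpdf(z[0]) + cond_logpdf(1, [0], Sigma, z) + cond_logpdf(2, [0, 1], Sigma, z)
# Vecchia-style approximation: drop B from the last conditioning set
vecchia = norm.logpdf(z[0]) + cond_logpdf(1, [0], Sigma, z) + cond_logpdf(2, [0], Sigma, z)
```

The chain-rule value matches the exact log-density, while the approximation incurs only a small error because the partial correlation of B and C given A is small.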


Formal description


The problem

Let x be a Gaussian process indexed by \mathcal{S} with mean function \mu and covariance function K. Assume that S = \{s_1, \dots, s_n\} \subset \mathcal{S} is a finite subset of \mathcal{S} and \mathbf{x} = (x_1, \dots, x_n) is a vector of values of x evaluated at S, i.e. x_i = x(s_i) for i = 1, \dots, n. Assume further that one observes \mathbf{y} = (y_1, \dots, y_n) where y_i = x_i + \varepsilon_i with \varepsilon_i \overset{iid}{\sim} \mathcal{N}(0, \sigma^2). In this context the two most common inference tasks are evaluating the likelihood

: \mathcal{L}(\mathbf{y}) = \int f(\mathbf{y}, \mathbf{x}) \, d\mathbf{x},

or making predictions of values of x at s^* \in \mathcal{S} with s^* \not\in S, i.e. calculating

: f(x(s^*) \mid y_1, \dots, y_n).
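This setup can be sketched numerically. The squared-exponential covariance, length-scale, and noise level below are illustrative assumptions, not choices made by the method itself; for a Gaussian process both inference tasks have closed forms, which the scalable approximations then target.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n, sigma2 = 50, 0.1                # number of sites and noise variance (assumed)
s = rng.uniform(0.0, 1.0, (n, 2))  # observation sites in the unit square

def K(a, b):
    """Squared-exponential covariance with length-scale 0.2 (illustrative)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * 0.2 ** 2))

# Marginally y = x + noise ~ N(0, K + sigma^2 I), so the integral over x
# defining the likelihood is available in closed form.
C = K(s, s) + sigma2 * np.eye(n)
y = rng.multivariate_normal(np.zeros(n), C)
loglik = multivariate_normal(mean=np.zeros(n), cov=C).logpdf(y)

# Prediction at a new site s*: conditional (kriging) mean of x(s*) given y
s_star = np.array([[0.5, 0.5]])
k_star = K(s_star, s)                          # cross-covariance, shape (1, n)
pred_mean = (k_star @ np.linalg.solve(C, y)).item()
```

The dense solve and determinant cost \mathcal{O}(n^3), which is exactly what Vecchia-type approximations are designed to avoid.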


Original formulation

The original Vecchia method starts with the observation that the joint density of the observations \mathbf{y} = (y_1, \dots, y_n) can be written as a product of conditional distributions

: f(\mathbf{y}) = f(y_1) \prod_{i=2}^{n} f(y_i \mid y_{i-1}, \dots, y_1).

The Vecchia approximation assumes instead that for some k \ll n

: \hat{f}(\mathbf{y}) = f(y_1) \prod_{i=2}^{n} f(y_i \mid y_{i-1}, \dots, y_{\max(i-k,1)}).

Vecchia also suggested that the above approximation be applied to observations reordered lexicographically using their spatial coordinates. While this simple method has many weaknesses, it reduces the computational complexity to \mathcal{O}(nk^3). Many of its deficiencies were addressed by subsequent generalizations.
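The original scheme can be sketched as follows. The one-dimensional exponential covariance, nugget, and value of k are illustrative assumptions; sorting the sites plays the role of Vecchia's lexicographic ordering.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(1)
n, k, sigma2 = 200, 10, 0.05             # illustrative sizes and noise level
s = np.sort(rng.uniform(0.0, 10.0, n))   # 1-D sites; sorting = coordinate ordering
C = np.exp(-np.abs(s[:, None] - s[None, :])) + sigma2 * np.eye(n)
y = rng.multivariate_normal(np.zeros(n), C)

def vecchia_loglik(y, C, k):
    """Sum of log f(y_i | y_{i-1}, ..., y_{max(i-k,1)}); each term is a small
    Gaussian conditional, giving O(n k^3) total cost."""
    total = norm.logpdf(y[0], scale=np.sqrt(C[0, 0]))
    for i in range(1, len(y)):
        g = list(range(max(0, i - k), i))      # up to k previous indices
        S_gg = C[np.ix_(g, g)]
        S_ig = C[np.ix_([i], g)]
        w = np.linalg.solve(S_gg, S_ig.T)      # conditioning (kriging) weights
        mean = (y[g] @ w).item()
        var = (C[i, i] - S_ig @ w).item()
        total += norm.logpdf(y[i], loc=mean, scale=np.sqrt(var))
    return total

exact = multivariate_normal(mean=np.zeros(n), cov=C).logpdf(y)
approx = vecchia_loglik(y, C, k)         # close to exact for modest k
full = vecchia_loglik(y, C, n - 1)       # k = n - 1 recovers the exact chain rule
```

With k = n - 1 the product is the exact chain rule, so the approximation error is controlled entirely by how much dependence the truncated conditioning sets discard.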


General formulation

While conceptually simple, the assumption of the Vecchia approximation often proves to be fairly restrictive and inaccurate. This inspired important generalizations and improvements to the basic version over the years: the inclusion of latent variables, more sophisticated conditioning, and better orderings. Different special cases of the general Vecchia approximation can be described in terms of how these three elements are selected.


Latent variables

To describe extensions of the Vecchia method in its most general form, define z_i = (x_i, y_i) and notice that for \mathbf{z} = (z_1, \dots, z_n) it holds that, as in the previous section,

: f(\mathbf{z}) = f(x_1, y_1) \left( \prod_{i=2}^{n} f(x_i \mid z_{i-1}, \dots, z_1) \right) \left( \prod_{i=2}^{n} f(y_i \mid x_i) \right)

because, given x_i, all other variables are independent of y_i.
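This identity can be verified numerically for a small example. The exponential covariance and noise level are illustrative assumptions; variables are interleaved as z = (x_1, y_1, x_2, y_2, x_3, y_3), and the helper function is hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

n, sigma2 = 3, 0.25
sites = np.array([0.0, 1.0, 2.5])
K = np.exp(-np.abs(sites[:, None] - sites[None, :]))  # exponential covariance

# Joint covariance of z = (x1, y1, x2, y2, x3, y3):
# Cov(x_i, x_j) = Cov(x_i, y_j) = K_ij and Cov(y_i, y_j) = K_ij + sigma2*[i=j]
Sigma = np.zeros((2 * n, 2 * n))
for i in range(n):
    for j in range(n):
        Sigma[2 * i, 2 * j] = K[i, j]
        Sigma[2 * i, 2 * j + 1] = K[i, j]
        Sigma[2 * i + 1, 2 * j] = K[i, j]
        Sigma[2 * i + 1, 2 * j + 1] = K[i, j] + (sigma2 if i == j else 0.0)

rng = np.random.default_rng(4)
z = rng.multivariate_normal(np.zeros(2 * n), Sigma)

def cond_logpdf(i, given, Sigma, z):
    """log f(z_i | z_given) for a zero-mean Gaussian with covariance Sigma."""
    g = list(given)
    S_gg = Sigma[np.ix_(g, g)]
    S_ig = Sigma[np.ix_([i], g)]
    mean = (S_ig @ np.linalg.solve(S_gg, z[g])).item()
    var = (Sigma[i, i] - S_ig @ np.linalg.solve(S_gg, S_ig.T)).item()
    return norm.logpdf(z[i], loc=mean, scale=np.sqrt(var))

exact = multivariate_normal(mean=np.zeros(2 * n), cov=Sigma).logpdf(z)
# f(x1, y1) * prod f(x_i | z_{<i}) * prod f(y_i | x_i): conditioning y_i on
# x_i alone is exact because the noise eps_i is independent of everything else.
factor = multivariate_normal(mean=np.zeros(2), cov=Sigma[:2, :2]).logpdf(z[:2])
for i in range(1, n):
    factor += cond_logpdf(2 * i, range(2 * i), Sigma, z)   # f(x_i | z_{<i})
    factor += cond_logpdf(2 * i + 1, [2 * i], Sigma, z)    # f(y_i | x_i)
```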


Ordering

It has been widely noted that the original lexicographic ordering based on coordinates produces poor results when \mathcal{S} is two-dimensional. More recently, other orderings have been proposed, some of which ensure that points are ordered in a quasi-random fashion. These orderings are highly scalable and have been shown to also drastically improve accuracy.
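One such quasi-random scheme is a greedy max-min ordering, in which each successive point is the one farthest from all previously chosen points. The sketch below is illustrative and not any particular package's implementation, which typically uses faster approximate variants.

```python
import numpy as np

def maxmin_order(s, first=0):
    """Greedy max-min ordering: early points spread quasi-randomly over the
    domain, while later points progressively fill in the gaps."""
    n = len(s)
    order = [first]
    dist = np.linalg.norm(s - s[first], axis=1)   # distance to the chosen set
    for _ in range(n - 1):
        nxt = int(np.argmax(dist))                # farthest from the chosen set
        order.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(s - s[nxt], axis=1))
    return order

rng = np.random.default_rng(2)
s = rng.uniform(0.0, 1.0, (100, 2))
order = maxmin_order(s)  # a permutation of 0..99
```

Because early points in such an ordering are spread across the whole domain, small conditioning sets of previously ordered points carry information about large-scale structure, which is what improves accuracy over coordinate orderings.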


Conditioning

Similar to the basic version described above, for a given ordering a general Vecchia approximation can be defined as

: \hat{f}(\mathbf{z}) = f(x_1, y_1) \left( \prod_{i=2}^{n} f(x_i \mid z_{q(i)}) \right) \left( \prod_{i=2}^{n} f(y_i \mid x_i) \right),

where q(i) \subset \{1, \dots, i-1\}. Since y_j \perp x_i \mid x_j for j \neq i, it follows that f(x_i \mid z_{q(i)}) = f(x_i \mid x_{q(i)}, y_{q(i)}) = f(x_i \mid x_{q(i)}), suggesting that the terms f(x_i \mid z_{q(i)}) be replaced with f(x_i \mid x_{q(i)}). It turns out, however, that sometimes conditioning on some of the observations y_i increases the sparsity of the Cholesky factor of the precision matrix of (\mathbf{x}, \mathbf{y}). Therefore, one might instead consider sets q_y(i) and q_x(i) such that q(i) = q_y(i) \cup q_x(i) and express \hat{f} as

: \hat{f}(\mathbf{z}) = f(x_1, y_1) \left( \prod_{i=2}^{n} f(x_i \mid x_{q_x(i)}, y_{q_y(i)}) \right) \left( \prod_{i=2}^{n} f(y_i \mid x_i) \right).

Multiple methods of choosing q_y(i) and q_x(i) have been proposed, most notably the nearest-neighbour Gaussian process (NNGP), meshed Gaussian process and multi-resolution approximation (MRA) approaches using q(i) = q_x(i), standard Vecchia using q(i) = q_y(i), and sparse general Vecchia where both q_y(i) and q_x(i) are non-empty.
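As a sketch of one common choice, the conditioning sets can be taken as the k nearest previously ordered points, in the spirit of the NNGP-style choice q(i) = q_x(i). The helper below is illustrative and hypothetical, not an API of any of the packages named here.

```python
import numpy as np

def nn_conditioning_sets(s, k):
    """q(i): indices of the (up to) k nearest points among s_1, ..., s_{i-1}.
    Conditioning on nearest neighbours rather than simply the k most recent
    points usually captures far more of the relevant spatial dependence."""
    q = [[]]                                       # first point conditions on nothing
    for i in range(1, len(s)):
        d = np.linalg.norm(s[:i] - s[i], axis=1)   # distances to predecessors
        q.append([int(j) for j in np.argsort(d)[:k]])
    return q

rng = np.random.default_rng(3)
s = rng.uniform(0.0, 1.0, (30, 2))
q = nn_conditioning_sets(s, k=5)
```

Each q(i) only ever references earlier indices, which is what makes the resulting factorization a valid (normalized) joint density and yields a sparse Cholesky factor of the implied precision matrix.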


Software

Several packages have been developed which implement some variants of the Vecchia approximation.
* ''GPvecchia'' is an R package available through CRAN which implements most versions of the Vecchia approximation.
* ''GpGp'' is an R package available through CRAN which implements a scalable ordering method for spatial problems which greatly improves accuracy.
* '' '' is an R package available through CRAN which implements the latent Vecchia approximation.
* ''pyMRA'' is a Python package available through PyPI implementing the multi-resolution approximation, a special case of the general Vecchia method used in dynamic state-space models.
* ''meshed'' is an R package available through CRAN which implements Bayesian spatial or spatiotemporal multivariate regression models based on a latent meshed Gaussian process (MGP) using Vecchia approximations on partitioned domains.

