Probabilistic numerics is a
activefield of study at the intersection of
applied mathematics
Applied mathematics is the application of mathematics, mathematical methods by different fields such as physics, engineering, medicine, biology, finance, business, computer science, and Industrial sector, industry. Thus, applied mathematics is a ...
,
statistics
Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...
, and
machine learning
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
centering on the concept of uncertainty in
computation
A computation is any type of arithmetic or non-arithmetic calculation that is well-defined. Common examples of computation are mathematical equation solving and the execution of computer algorithms.
Mechanical or electronic devices (or, hist ...
. In probabilistic numerics, tasks in
numerical analysis
Numerical analysis is the study of algorithms that use numerical approximation (as opposed to symbolic computation, symbolic manipulations) for the problems of mathematical analysis (as distinguished from discrete mathematics). It is the study of ...
such as finding numerical solutions for
integration,
linear algebra
Linear algebra is the branch of mathematics concerning linear equations such as
:a_1x_1+\cdots +a_nx_n=b,
linear maps such as
:(x_1, \ldots, x_n) \mapsto a_1x_1+\cdots +a_nx_n,
and their representations in vector spaces and through matrix (mathemat ...
,
optimization
Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criteria, from some set of available alternatives. It is generally divided into two subfiel ...
and
simulation and differential equations are seen as problems of statistical, probabilistic, or
Bayesian inference
Bayesian inference ( or ) is a method of statistical inference in which Bayes' theorem is used to calculate a probability of a hypothesis, given prior evidence, and update it as more information becomes available. Fundamentally, Bayesian infer ...
.
Introduction
A numerical method is an algorithm that ''approximates'' the solution to a mathematical problem (examples below include the solution to a
linear system of equations, the value of an
integral
In mathematics, an integral is the continuous analog of a Summation, sum, which is used to calculate area, areas, volume, volumes, and their generalizations. Integration, the process of computing an integral, is one of the two fundamental oper ...
, the solution of a
differential equation, the
minimum
In mathematical analysis, the maximum and minimum of a function are, respectively, the greatest and least value taken by the function. Known generically as extremum, they may be defined either within a given range (the ''local'' or ''relative ...
of a multivariate function). In a ''probabilistic'' numerical algorithm, this process of approximation is thought of as a problem of ''estimation'', ''inference'' or ''learning'' and realised in the framework of
probabilistic inference (often, but not always,
Bayesian inference
Bayesian inference ( or ) is a method of statistical inference in which Bayes' theorem is used to calculate a probability of a hypothesis, given prior evidence, and update it as more information becomes available. Fundamentally, Bayesian infer ...
).
Formally, this means casting the setup of the computational problem in terms of a
prior distribution
A prior probability distribution of an uncertain quantity, simply called the prior, is its assumed probability distribution before some evidence is taken into account. For example, the prior could be the probability distribution representing the ...
, formulating the relationship between numbers computed by the computer (e.g. matrix-vector multiplications in linear algebra, gradients in optimization, values of the integrand or the vector field defining a differential equation) and the quantity in question (the solution of the linear problem, the minimum, the integral, the solution curve) in a
likelihood function
A likelihood function (often simply called the likelihood) measures how well a statistical model explains observed data by calculating the probability of seeing that data under different parameter values of the model. It is constructed from the ...
, and returning a
posterior distribution
The posterior probability is a type of conditional probability that results from updating the prior probability with information summarized by the likelihood via an application of Bayes' rule. From an epistemological perspective, the posterior ...
as the output. In most cases, numerical algorithms also take internal adaptive decisions about which numbers to compute, which form an
active learning
Active learning is "a method of learning in which students are actively or experientially involved in the learning process and where there are different levels of active learning, depending on student involvement." states that "students particip ...
problem.
Many of the most popular classic numerical algorithms can be re-interpreted in the probabilistic framework. This includes the method of
conjugate gradients,
Nordsieck methods,
Gaussian quadrature
In numerical analysis, an -point Gaussian quadrature rule, named after Carl Friedrich Gauss, is a quadrature rule constructed to yield an exact result for polynomials of degree or less by a suitable choice of the nodes and weights for .
Th ...
rules,
and
quasi-Newton methods.
In all these cases, the classic method is based on a regularized
least-squares estimate that can be associated with the posterior mean arising from a
Gaussian prior and likelihood. In such cases, the variance of the Gaussian posterior is then associated with a
worst-case estimate for the squared error.
Probabilistic numerical methods promise several conceptual advantages over classic, point-estimate based approximation techniques:
* They return ''structured'' error estimates (in particular, the ability to return joint posterior samples, i.e. multiple realistic hypotheses for the true unknown solution of the problem)
* Hierarchical Bayesian inference can be used to set and control internal hyperparameters in such methods in a generic fashion, rather than having to re-invent novel methods for each parameter
* Since they use and allow for an explicit likelihood describing the relationship between computed numbers and target quantity, probabilistic numerical methods can use the results of even highly imprecise, biased and stochastic computations.
Conversely, probabilistic numerical methods can also provide a likelihood in computations often considered "
likelihood-free" elsewhere
* Because all probabilistic numerical methods use essentially the same data type – probability measures – to quantify uncertainty over ''both inputs and outputs'' they can be chained together to propagate uncertainty across large-scale, composite computations
* Sources from multiple sources of information (e.g. algebraic, mechanistic knowledge about the form of a differential equation, and observations of the trajectory of the system collected in the physical world) can be combined naturally and ''inside the inner loop'' of the algorithm, removing otherwise necessary nested loops in computation, e.g. in
inverse problem
An inverse problem in science is the process of calculating from a set of observations the causal factors that produced them: for example, calculating an image in X-ray computed tomography, sound source reconstruction, source reconstruction in ac ...
s.
These advantages are essentially the equivalent of similar functional advantages that Bayesian methods enjoy over point-estimates in machine learning, applied or transferred to the computational domain.
Numerical tasks
Integration

Probabilistic numerical methods have been developed for the problem of
numerical integration
In analysis, numerical integration comprises a broad family of algorithms for calculating the numerical value of a definite integral.
The term numerical quadrature (often abbreviated to quadrature) is more or less a synonym for "numerical integr ...
, with the most popular method called ''
Bayesian quadrature''.
In numerical integration, function evaluations
at a number of points
are used to estimate the integral
of a function
against some measure
. Bayesian quadrature consists of specifying a prior distribution over
and conditioning this prior on
to obtain a posterior distribution over
, then computing the implied posterior distribution on
. The most common choice of prior is a
Gaussian process
In probability theory and statistics, a Gaussian process is a stochastic process (a collection of random variables indexed by time or space), such that every finite collection of those random variables has a multivariate normal distribution. The di ...
as this allows us to obtain a closed-form posterior distribution on the integral which is a univariate Gaussian distribution. Bayesian quadrature is particularly useful when the function
is expensive to evaluate and the dimension of the data is small to moderate.
Optimization

Probabilistic numerics have also been studied for
mathematical optimization
Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criteria, from some set of available alternatives. It is generally divided into two subfiel ...
, which consist of finding the minimum or maximum of some objective function
given (possibly noisy or indirect) evaluations of that function at a set of points.
Perhaps the most notable effort in this direction is
Bayesian optimization,
a general approach to optimization grounded in Bayesian inference. Bayesian optimization algorithms operate by maintaining a probabilistic belief about
throughout the optimization procedure; this often takes the form of a
Gaussian process
In probability theory and statistics, a Gaussian process is a stochastic process (a collection of random variables indexed by time or space), such that every finite collection of those random variables has a multivariate normal distribution. The di ...
prior conditioned on observations. This belief then guides the algorithm in obtaining observations that are likely to advance the optimization process. Bayesian optimization policies are usually realized by transforming the objective function posterior into an inexpensive, differentiable ''acquisition function'' that is maximized to select each successive observation location. One prominent approach is to model optimization via
Bayesian sequential experimental design, seeking to obtain a sequence of observations yielding the most optimization progress as evaluated by an appropriate
utility function
In economics, utility is a measure of a certain person's satisfaction from a certain state of the world. Over time, the term has been used with at least two meanings.
* In a Normative economics, normative context, utility refers to a goal or ob ...
. A welcome side effect from this approach is that uncertainty in the objective function, as measured by the underlying probabilistic belief, can guide an optimization policy in addressing the classic
exploration vs. exploitation tradeoff.
Local optimization
Probabilistic numerical methods have been developed in the context of
stochastic optimization
Stochastic optimization (SO) are optimization methods that generate and use random variables. For stochastic optimization problems, the objective functions or constraints are random. Stochastic optimization also include methods with random iter ...
for
deep learning
Deep learning is a subset of machine learning that focuses on utilizing multilayered neural networks to perform tasks such as classification, regression, and representation learning. The field takes inspiration from biological neuroscience a ...
, in particular to address main issues such as
learning rate tuning and
line searches,
batch-size selection,
early stopping,
pruning,
and first- and second-order search directions.
In this setting, the optimization objective is often an
empirical risk of the form
defined by a dataset
, and a loss
that quantifies how well a predictive model
parameterized by
performs on predicting the target
from its corresponding input
.
Epistemic uncertainty arises when the dataset size
is large and cannot be processed at once meaning that local quantities (given some
) such as the loss function
itself or its gradient
cannot be computed in reasonable time.
Hence, generally mini-batching is used to construct estimators of these quantities on a random subset of the data. Probabilistic numerical methods model this uncertainty explicitly and allow for automated decisions and parameter tuning.
Linear algebra
Probabilistic numerical methods for linear algebra
[
]
have primarily focused on solving
systems of linear equations
In mathematics, a system of linear equations (or linear system) is a collection of two or more linear equations involving the same variables.
For example,
: \begin
3x+2y-z=1\\
2x-2y+4z=-2\\
-x+\fracy-z=0
\end
is a system of three equations in ...
of the form
and the computation of
determinants
In mathematics, the determinant is a scalar-valued function of the entries of a square matrix. The determinant of a matrix is commonly denoted , , or . Its value characterizes some properties of the matrix and the linear map represented, on a ...
.

A large class of methods are iterative in nature and collect information about the linear system to be solved via repeated matrix-vector multiplication
with the system matrix
with different vectors
.
Such methods can be roughly split into a solution-
and a matrix-based perspective,
depending on whether belief is expressed over the solution
of the linear system or the (pseudo-)inverse of the matrix
.
The belief update uses that the inferred object is linked to matrix multiplications
or
via
and
.
Methods typically assume a Gaussian distribution, due to its closedness under linear observations of the problem. While conceptually different, these two views are computationally equivalent and inherently connected via the right-hand-side through
.
Probabilistic numerical linear algebra routines have been successfully applied to scale
Gaussian processes to large datasets.
In particular, they enable ''exact'' propagation of the approximation error to a combined Gaussian process posterior, which quantifies the uncertainty arising from both the ''finite number of data'' observed and the ''finite amount of computation'' expended.
Ordinary differential equations

Probabilistic numerical methods for
ordinary differential equation
In mathematics, an ordinary differential equation (ODE) is a differential equation (DE) dependent on only a single independent variable (mathematics), variable. As with any other DE, its unknown(s) consists of one (or more) Function (mathematic ...
s
, have been developed for initial and boundary value problems. Many different probabilistic numerical methods designed for ordinary differential equations have been proposed, and these can broadly be grouped into the two following categories:
* Randomisation-based methods are defined through random perturbations of standard deterministic numerical methods for ordinary differential equations. For example, this has been achieved by adding Gaussian perturbations on the solution of one-step integrators
or by perturbing randomly their time-step.
This defines a probability measure on the solution of the differential equation that can be sampled.
* Gaussian process regression methods are based on posing the problem of solving the differential equation at hand as a Gaussian process regression problem, interpreting evaluations of the right-hand side as data on the derivative.
These techniques resemble to Bayesian cubature, but employ different and often non-linear observation models.
In its infancy, this class of methods was based on naive
Gaussian process
In probability theory and statistics, a Gaussian process is a stochastic process (a collection of random variables indexed by time or space), such that every finite collection of those random variables has a multivariate normal distribution. The di ...
regression. This was later improved (in terms of efficient computation) in favor of GaussMarkov priors
modeled by the
stochastic differential equation
A stochastic differential equation (SDE) is a differential equation in which one or more of the terms is a stochastic process, resulting in a solution which is also a stochastic process. SDEs have many applications throughout pure mathematics an ...
, where
is a
-dimensional vector modeling the first
derivatives of
, and where
is a
-dimensional
Brownian motion
Brownian motion is the random motion of particles suspended in a medium (a liquid or a gas). The traditional mathematical formulation of Brownian motion is that of the Wiener process, which is often called Brownian motion, even in mathematical ...
. Inference can thus be implemented efficiently with
Kalman filter
In statistics and control theory, Kalman filtering (also known as linear quadratic estimation) is an algorithm that uses a series of measurements observed over time, including statistical noise and other inaccuracies, to produce estimates of unk ...
ing based methods.
The boundary between these two categories is not sharp, indeed a Gaussian process regression approach based on randomised data was developed as well.
These methods have been applied to problems in computational Riemannian geometry,
inverse problems, latent force models, and to differential equations with a geometric structure such as symplecticity.
Partial differential equations
A number of probabilistic numerical methods have also been proposed for
partial differential equations
In mathematics, a partial differential equation (PDE) is an equation which involves a multivariable function and one or more of its partial derivatives.
The function is often thought of as an "unknown" that solves the equation, similar to how ...
. As with ordinary differential equations, the approaches can broadly be divided into those based on randomisation, generally of some underlying finite-element mesh
and those based on Gaussian process regression.

Probabilistic numerical PDE solvers based on Gaussian process regression recover classical methods on linear PDEs for certain priors, in particular
methods of mean weighted residuals, which include
Galerkin methods,
finite element methods, as well as
spectral methods
Spectral methods are a class of techniques used in applied mathematics and scientific computing to numerically solve certain differential equations. The idea is to write the solution of the differential equation as a sum of certain "basis functio ...
.
History and related fields
The interplay between
numerical analysis
Numerical analysis is the study of algorithms that use numerical approximation (as opposed to symbolic computation, symbolic manipulations) for the problems of mathematical analysis (as distinguished from discrete mathematics). It is the study of ...
and
probability
Probability is a branch of mathematics and statistics concerning events and numerical descriptions of how likely they are to occur. The probability of an event is a number between 0 and 1; the larger the probability, the more likely an e ...
is touched upon by a number of other areas of mathematics, including
average-case analysis of numerical methods,
information-based complexity,
game theory
Game theory is the study of mathematical models of strategic interactions. It has applications in many fields of social science, and is used extensively in economics, logic, systems science and computer science. Initially, game theory addressed ...
, and statistical
decision theory
Decision theory or the theory of rational choice is a branch of probability theory, probability, economics, and analytic philosophy that uses expected utility and probabilities, probability to model how individuals would behave Rationality, ratio ...
. Precursors to what is now being called "probabilistic numerics" can be found as early as the late 19th and early 20th century.
The origins of probabilistic numerics can be traced to a discussion of probabilistic approaches to polynomial interpolation by
Henri Poincaré
Jules Henri Poincaré (, ; ; 29 April 185417 July 1912) was a French mathematician, Theoretical physics, theoretical physicist, engineer, and philosophy of science, philosopher of science. He is often described as a polymath, and in mathemati ...
in his ''Calcul des Probabilités''.
In modern terminology, Poincaré considered a
Gaussian prior distribution on a function
, expressed as a formal
power series
In mathematics, a power series (in one variable) is an infinite series of the form
\sum_^\infty a_n \left(x - c\right)^n = a_0 + a_1 (x - c) + a_2 (x - c)^2 + \dots
where ''a_n'' represents the coefficient of the ''n''th term and ''c'' is a co ...
with random coefficients, and asked for "probable values" of
given this prior and
observations
for
.
A later seminal contribution to the interplay of numerical analysis and probability was provided by Albert Suldin in the context of univariate
quadrature.
The statistical problem considered by Suldin was the approximation of the definite integral
of a function
, under a
Brownian motion
Brownian motion is the random motion of particles suspended in a medium (a liquid or a gas). The traditional mathematical formulation of Brownian motion is that of the Wiener process, which is often called Brownian motion, even in mathematical ...
prior on
, given access to pointwise evaluation of
at nodes