A computer experiment or simulation experiment is an experiment used to study a computer simulation, also referred to as an in silico system. This area includes computational physics, computational chemistry, computational biology and other similar disciplines.

Background

Computer simulations are constructed to emulate a physical system. Because these are meant to replicate some aspect of a system in detail, they often do not yield an analytic solution. Therefore, methods such as discrete event simulation or finite element solvers are used. A computer model is used to make inferences about the system it replicates. For example, climate models are often used because experimentation on an Earth-sized object is impossible.

Objectives

Computer experiments have been employed with many purposes in mind. Some of those include:
* Uncertainty quantification: Characterize the uncertainty present in a computer simulation arising from unknowns during the computer simulation's construction.
* Inverse problems: Discover the underlying properties of the system from the physical data.
* Bias correction: Use physical data to correct for bias in the simulation.
* Data assimilation: Combine multiple simulations and physical data sources into a complete predictive model.
* Systems design: Find inputs that result in optimal system performance measures.

Computer simulation modeling

Modeling of computer experiments typically uses a Bayesian framework. Bayesian statistics is an interpretation of the field of statistics where all evidence about the true state of the world is explicitly expressed in the form of probabilities. In the realm of computer experiments, the Bayesian interpretation implies that we must form a prior distribution that represents our prior belief about the structure of the computer model. The use of this philosophy for computer experiments started in the 1980s and is nicely summarized by Sacks et al. (1989).

While the Bayesian approach is widely used, frequentist approaches have also been discussed recently.

The basic idea of this framework is to model the computer simulation as an unknown function of a set of inputs. The computer simulation is implemented as a piece of computer code that can be evaluated to produce a collection of outputs. Examples of inputs to these simulations are coefficients in the underlying model, initial conditions and forcing functions. It is natural to see the simulation as a deterministic function that maps these ''inputs'' into a collection of ''outputs''. On the basis of seeing our simulator this way, it is common to refer to the collection of inputs as $x$, the computer simulation itself as $f$, and the resulting output as $f(x)$. Both $x$ and $f(x)$ are vector quantities, and they can be very large collections of values, often indexed by space, or by time, or by both space and time. Although $f(\cdot)$ is known in principle, in practice this is not the case. Many simulators comprise tens of thousands of lines of high-level computer code, which is not accessible to intuition. For some simulations, such as climate models, evaluation of the output for a single set of inputs can require millions of computer hours.

Gaussian process prior

The typical model for a computer code output is a Gaussian process. For notational simplicity, assume $f(x)$ is a scalar. Owing to the Bayesian framework, we fix our belief that the function $f$ follows a Gaussian process,

$$f \sim \operatorname{GP}(m(\cdot), C(\cdot,\cdot)),$$

where $m$ is the mean function and $C$ is the covariance function. Popular mean functions are low order polynomials, and a popular covariance function is the Matérn covariance, which includes both the exponential ($\nu = 1/2$) and Gaussian covariances (as $\nu \rightarrow \infty$).
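As a rough illustration of the covariance functions named above, the following Python sketch evaluates the Matérn covariance for a few smoothness values using their simple closed forms, then draws sample paths from a zero-mean Gaussian process prior. The function name `matern`, the length scale, the input grid and the small jitter term are assumptions made for this example; they are not taken from any particular package.

```python
import numpy as np

def matern(r, nu, length_scale=1.0, variance=1.0):
    """Matern covariance at distance r, for nu in {0.5, 1.5, 2.5, inf}.

    Only the smoothness values with simple closed forms are covered;
    a general nu would require a modified Bessel function.
    """
    s = np.abs(np.asarray(r, dtype=float)) / length_scale
    if nu == 0.5:                       # exponential covariance
        corr = np.exp(-s)
    elif nu == 1.5:
        a = np.sqrt(3.0) * s
        corr = (1.0 + a) * np.exp(-a)
    elif nu == 2.5:
        a = np.sqrt(5.0) * s
        corr = (1.0 + a + a**2 / 3.0) * np.exp(-a)
    elif nu == np.inf:                  # Gaussian (squared exponential) limit
        corr = np.exp(-0.5 * s**2)
    else:
        raise ValueError("closed form only implemented for nu in {0.5, 1.5, 2.5, inf}")
    return variance * corr

# Covariance matrix on a one-dimensional grid of inputs (illustrative choice).
x = np.linspace(0.0, 1.0, 50)
dists = np.abs(x[:, None] - x[None, :])
K = matern(dists, nu=2.5, length_scale=0.2)

# Draw three sample paths from the zero-mean prior f ~ GP(0, C).
# The jitter on the diagonal is an assumption added for numerical stability.
rng = np.random.default_rng(0)
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(x)))
draws = L @ rng.standard_normal((len(x), 3))
```

Smaller values of `nu` give rougher sample paths, while the Gaussian limit gives very smooth ones, which is one reason the smoothness parameter is usually chosen to reflect how regular the simulator output is believed to be.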

Design of computer experiments

The design of computer experiments has considerable differences from design of experiments for parametric models. Since a Gaussian process prior has an infinite dimensional representation, the concepts of A and D criteria (see Optimal design), which focus on reducing the error in the parameters, cannot be used. Replications would also be wasteful in cases when the computer simulation has no error. Criteria that are used to determine a good experimental design include integrated mean squared prediction error and distance-based criteria. Popular strategies for design include Latin hypercube sampling and low discrepancy sequences.
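A minimal sketch of Latin hypercube sampling combined with a simple distance-based (maximin) selection is given below. It assumes inputs scaled to the unit hypercube; the helper names `latin_hypercube` and `min_pairwise_distance`, the design size and the number of candidate designs are hypothetical choices for illustration only.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """Return n points in [0, 1)^d with exactly one point per stratum in each dimension."""
    # For each dimension, assign the n strata to the n points in random order,
    # then place each point uniformly at random inside its stratum.
    strata = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (strata + rng.random((n, d))) / n

def min_pairwise_distance(X):
    """Smallest Euclidean distance between any two design points."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    return dist[np.triu_indices(len(X), k=1)].min()

rng = np.random.default_rng(1)

# Distance-based criterion: among several random Latin hypercubes,
# keep the one whose closest pair of points is farthest apart.
candidates = [latin_hypercube(10, 2, rng) for _ in range(50)]
design = max(candidates, key=min_pairwise_distance)
```

Each row of `design` is one input configuration at which the simulator would be run. No configuration is repeated, in line with the remark that replications are wasteful for a deterministic code.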

Problems with massive sample sizes

Unlike physical experiments, it is common for computer experiments to have thousands of different input combinations. Because the standard inference requires matrix inversion of a square matrix of the size of the number of samples ($n$), the cost grows as $\mathcal{O}(n^3)$. Matrix inversion of large, dense matrices can also cause numerical inaccuracies. Currently, this problem is solved by greedy decision tree techniques, allowing effective computations for unlimited dimensionality and sample size (patent WO2013055257A1), or avoided by using approximation methods.
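To make the source of the cubic cost concrete, the sketch below carries out the standard Gaussian process prediction for a toy deterministic simulator. It assumes a zero mean function, a Matérn covariance with $\nu = 5/2$ and unit prior variance; the names `matern52`, `simulator` and `predict` are hypothetical. The Cholesky factorization of the $n \times n$ covariance matrix is the step whose cost grows as $\mathcal{O}(n^3)$.

```python
import numpy as np

def matern52(A, B, length_scale=0.3):
    """Matern covariance with nu = 5/2 between the rows of A and the rows of B."""
    dist = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))
    a = np.sqrt(5.0) * dist / length_scale
    return (1.0 + a + a**2 / 3.0) * np.exp(-a)

def simulator(X):
    """Hypothetical stand-in for an expensive deterministic computer code."""
    return np.sin(2.0 * np.pi * X[:, 0]) + X[:, 0] ** 2

# Run the simulator at n design points.
n = 12
X_train = np.linspace(0.0, 1.0, n)[:, None]
y_train = simulator(X_train)

# Factorize the n x n covariance matrix once: this is the O(n^3) step.
K = matern52(X_train, X_train)
L = np.linalg.cholesky(K)

def predict(X_new):
    """Posterior mean and variance of f at X_new under a zero-mean GP prior."""
    K_star = matern52(X_train, X_new)                       # shape (n, m)
    # Solve K alpha = y_train through the factor L. np.linalg.solve is a
    # general solver; a dedicated triangular solver would be cheaper.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_star.T @ alpha
    V = np.linalg.solve(L, K_star)
    var = 1.0 - (V**2).sum(axis=0)                          # prior variance is 1
    return mean, var

mean_train, var_train = predict(X_train)
X_test = np.array([[0.37], [0.81]])
mean_test, var_test = predict(X_test)
```

Because the simulator is deterministic, the posterior mean reproduces the training outputs and the posterior variance vanishes at the design points, up to rounding error. With thousands of design points the factorization dominates the run time, which is what motivates the approximation methods mentioned above.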