Parareal is a

parallel algorithm In computer science, a parallel algorithm, as opposed to a traditional serial algorithm, is an algorithm which can do multiple operations in a given time. It has been a tradition of computer science to describe serial algorithms in abstract machin ...

from

numerical analysis Numerical analysis is the study of algorithms that use numerical approximation (as opposed to symbolic computation, symbolic manipulations) for the problems of mathematical analysis (as distinguished from discrete mathematics). It is the study of ...

and used for the solution of

initial value problem In multivariable calculus, an initial value problem (IVP) is an ordinary differential equation together with an initial condition which specifies the value of the unknown function at a given point in the domain. Modeling a system in physics or oth ...

s. It was introduced in 2001 by

Lions The lion (''Panthera leo'') is a large cat of the genus ''Panthera'' native to Africa and India. It has a muscular, broad-chested body; short, rounded head; round ears; and a hairy tuft at the end of its tail. It is sexually dimorphic; adult ...

, Maday and Turinici. Since then, it has become one of the most widely studied parallel-in-time integration methods.

Parallel-in-time integration methods

In contrast to e.g. Runge-Kutta or multi-step methods, some of the computations in Parareal can be performed in parallel and Parareal is therefore one example of a parallel-in-time integration method. While historically most efforts to parallelize the

numerical solution Numerical analysis is the study of algorithms that use numerical approximation (as opposed to symbolic manipulations) for the problems of mathematical analysis (as distinguished from discrete mathematics). It is the study of numerical methods th ...

partial differential equation In mathematics, a partial differential equation (PDE) is an equation which imposes relations between the various partial derivatives of a Multivariable calculus, multivariable function. The function is often thought of as an "unknown" to be sol ...

s focussed on the spatial discretization, in view of the challenges from

exascale computing Exascale computing refers to computing systems capable of calculating at least "1018 IEEE 754 Double Precision (64-bit) operations (multiplications and/or additions) per second (exaFLOPS)"; it is a measure of supercomputer performance. Exascale ...

, parallel methods for

temporal discretization Temporal discretization is a mathematical technique applied to transient problems that occur in the fields of applied physics and engineering. Transient problems are often solved by conducting simulations using computer-aided engineering (CAE) p ...

have been identified as a possible way to increase concurrency in

numerical software Numerical analysis is the study of algorithms that use numerical approximation (as opposed to symbolic manipulations) for the problems of mathematical analysis (as distinguished from discrete mathematics). It is the study of numerical methods th ...

. Because Parareal computes the numerical solution for multiple time steps in parallel, it is categorized as a ''parallel across the steps'' method. This is in contrast to approaches using ''parallelism across the method'' like parallel Runge-Kutta or extrapolation methods, where independent stages can be computed in parallel or ''parallel across the system'' methods like waveform relaxation.

History

Parareal can be derived as both a

multigrid method In numerical analysis, a multigrid method (MG method) is an algorithm for solving differential equations using a hierarchy of discretizations. They are an example of a class of techniques called multiresolution methods, very useful in problems exhi ...

in time method or as multiple shooting along the time axis. Both ideas, multigrid in time as well as adopting multiple shooting for time integration, go back to the 1980s and 1990s. Parareal is a widely studied method and has been used and modified for a range of different applications. Ideas to parallelize the solution of initial value problems go back even further: the first paper proposing a parallel-in-time integration method appeared in 1964.

Algorithm

The Problem

The goal is to solve an initial value problem of the form

\quad \text \quad u(t_0) = u^0.

The right hand side

f

is assumed to be a smooth (possibly nonlinear) function. It can also correspond to the spatial discretization of a partial differential equation in a

method of lines The method of lines (MOL, NMOL, NUMOL) is a technique for solving partial differential equations (PDEs) in which all but one dimension is discretized. By reducing a PDE to a single continuous dimension, the method of lines allows solutions to be ...

approach. We wish to solve this problem on a temporal mesh of

N+1

equally spaced points

(t_0,t_1,\ldots,t_N)

, where

t_ = t_j + \Delta T

and

\Delta T = (T-t_0)/N

. Carrying out this discretisation we obtain a partitioned time interval consisting of time slices

_j,t_

for

j = 0,\ldots,N-1

. The objective is to calculate numerical approximations

U_j

to the exact solution

u(t_j)

using a serial time-stepping method (e.g. Runge-Kutta) that has high numerical accuracy (and therefore high computational cost). We refer to this method as the fine solver

\mathcal

, which propagates an initial value

U_j

at time

t_j

to a terminal value

U_

at time

t_

. The goal is to calculate the solution (with high numerical accuracy) using

\mathcal

such that we obtain

U_ = \mathcal(t_j,t_,U_j), \quad \text \quad U_0 = u^0.

The problem with this (and the reason for attempting to solve in parallel in the first place) solution is that it is computationally infeasible to calculate in real-time.

How it works

Instead of using a single processor to solve the initial value problem (as is done with classical time-stepping methods), Parareal makes use of

N

processors. The aim to is to use

N

processors to solve

N

smaller initial value problems (one on each time slice) in parallel. For example, in a MPI based code,

N

would be the number of processes, while in an

OpenMP OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating syste ...

based code,

N

would be equal to the number of threads. Parareal makes use of a second time-stepping method to solve this initial value problem in parallel, referred to as the coarse solver

\mathcal

. The coarse solver works the same way as the fine solver, propagating an initial value over a time interval of length

\Delta T

, however it does so at much lower numerical accuracy than

\mathcal

(and therefore at much lower computational cost). Having a coarse solver that is much less computationally expensive than the fine solver is the key to achieving parallel speed-up with Parareal. Henceforth, we will denote the Parareal solution at time

t_j

and iteration

k

U^k_j

Zeroth Iteration

Firstly, run the coarse solver serially over the entire time interval

_0,T

to calculate an approximate initial guess to the solution:

U^0_ = \mathcal(t_j,t_,U^0_j), \quad j=0,\ldots,N-1.

Subsequent Iterations

Next, run the fine solver on each of the time slices, in parallel, from the most up-to-date solution values:

\mathcal(t_j, t_,U^_j), \quad j=0, \ldots, N-1.

Now update the parareal solution values sequentially using the predictor-corrector:

U_^ = \mathcal(t_j, t_,U^_j) + \mathcal(t_j, t_,U^_j) - \mathcal(t_j, t_,U^_j), \quad j=0, \ldots, N-1.

At this stage, one can use a stopping criterion to determine whether the solution values are no longer changing each iteration. For example, one may test this by checking if

,  U^_j - U^_j ,  < \varepsilon \quad \forall \ j \leq N,

and some tolerance

\varepsilon > 0

. If this critertion is not satisfied, subsequent iterations can then be run by applying the fine solver in parallel and then the predictor-corrector. Once the criterion is satisfied, however, the algorithm is said to have converged in

k \leq N

iterations. Note that other stopping criterion do exist and have been successfully tested in Parareal.

Remarks

Parareal should reproduce the solution that is obtained by the serial application of the fine solver and will converge in a maximum of

N

iterations. For Parareal to provide speedup, however, it has to converge in a number of iterations significantly smaller than the number of time slices, i.e.

k \ll N

. In the Parareal iteration, the computationally expensive evaluation of

\mathcal(t_j, t_,U^_j)

can be performed in parallel on

N

processing units. By contrast, the dependency of

U^_

\mathcal(t_j, t_,U^_j)

means that the coarse correction has to be computed in serial order. Typically, some form of Runge-Kutta method is chosen for both coarse and fine integrator, where

\mathcal

might be of lower order and use a larger time step than

\mathcal

. If the initial value problem stems from the discretization of a PDE,

\mathcal

can also use a coarser spatial discretization, but this can negatively impact convergence unless high order interpolation is used.

Speedup

Under some assumptions, a simple theoretical model for the

speedup In computer architecture, speedup is a number that measures the relative performance of two systems processing the same problem. More technically, it is the improvement in speed of execution of a task executed on two similar architectures with d ...

of Parareal can be derived. Although in applications these assumptions can be too restrictive, the model still is useful to illustrate the trade offs that are involved in obtaining speedup with Parareal. First, assume that every time slice

_j, t_/math> consists of exactly N_f steps of the fine integrator and of N_c steps of the coarse integrator.
This includes in particular the assumption that all time slices are of identical length and that both coarse and fine integrator use a constant step size over the full simulation.
Second, denote by \tau_f and \tau_c the computing time required for a single step of the fine and coarse methods, respectively, and assume that both are constant.
This is typically not exactly true when an

implicit Implicit may refer to: Mathematics * Implicit function * Implicit function theorem * Implicit curve * Implicit surface * Implicit differential equation Other uses * Implicit assumption, in logic * Implicit-association test, in social psychology ...

method is used, because then runtimes vary depending on the number of iterations required by the

iterative solver In computational mathematics, an iterative method is a mathematical procedure that uses an initial value to generate a sequence of improving approximate solutions for a class of problems, in which the ''n''-th approximation is derived from the pre ...

. Under these two assumptions, the runtime for the fine method integrating over

P

time slices can be modelled as

c_ = N N_ \tau_f.

The runtime of Parareal using

P

processing units and performing

k

iterations is

c_ = (k+1) N N_ \tau_c + k N_ \tau_f.

Speedup of Parareal then is

S_ = \frac = \frac \leq \min\left\.

These two bounds illustrate the trade off that has to be made in choosing the coarse method: on the one hand, it has to be cheap and/or use a much larger time step to make the first bound as large as possible, on the other hand the number of iterations

k

has to be kept low to keep the second bound large. In particular, Parareal's parallel efficiency is bounded by

E_ = \frac \leq \frac,

that is by the inverse of the number of required iterations.

Instability for imaginary eigenvalues

The vanilla version of Parareal has issues for problems with imaginary

eigenvalues In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted b ...

. It typically only converges toward the very last iterations, that is as

k

approaches

N

, and the speedup

S_p

is always going to be smaller than one. So either the number of iterations is small and Parareal is unstable or, if

k

is large enough to make Parareal stable, no speedup is possible. This also means that Parareal is typically unstable for

hyperbolic Hyperbolic is an adjective describing something that resembles or pertains to a hyperbola (a curve), to hyperbole (an overstatement or exaggeration), or to hyperbolic geometry. The following phenomena are described as ''hyperbolic'' because they ...

equations. Even though the formal analysis by Gander and Vandewalle covers only linear problems with constant coefficients, the problem also arises when Parareal is applied to the nonlinear

Navier–Stokes equations In physics, the Navier–Stokes equations ( ) are partial differential equations which describe the motion of viscous fluid substances, named after French engineer and physicist Claude-Louis Navier and Anglo-Irish physicist and mathematician Geo ...

when the

viscosity The viscosity of a fluid is a measure of its resistance to deformation at a given rate. For liquids, it corresponds to the informal concept of "thickness": for example, syrup has a higher viscosity than water. Viscosity quantifies the inte ...

coefficient becomes too small and the

Reynolds number In fluid mechanics, the Reynolds number () is a dimensionless quantity that helps predict fluid flow patterns in different situations by measuring the ratio between inertial and viscous forces. At low Reynolds numbers, flows tend to be domi ...

too large. Different approaches exist to stabilise Parareal, one being Krylov-subspace enhanced Parareal.

Variants

There are multiple algorithms that are directly based or at least inspired by the original Parareal algorithm.

Krylov-subspace enhanced Parareal

Early on it was recognised that for linear problems information generated by the fine method

\mathcal_

can be used to improve the accuracy of the coarse method

\mathcal_

. Originally, the idea was formulated for the parallel implicit time-integrator PITA, a method closely related to Parareal but with small differences in how the correction is done. In every iteration

k

the result

\mathcal_(U^k_j)

is computed for values

u^k_j \in \mathbb^d

for

j=0, \ldots, N-1

. Based on this information, the subspace

S_k := \left\

is defined and updated after every Parareal iteration. Denote as

P_k

the

orthogonal projection In linear algebra and functional analysis, a projection is a linear transformation P from a vector space to itself (an endomorphism) such that P\circ P=P. That is, whenever P is applied twice to any vector, it gives the same result as if it wer ...

from

\mathbb^d

S_k

. Then, replace the coarse method with the improved integrator

\mathcal_(u) = \mathcal_(P_k u) + \mathcal_((I-P_k)u)

. As the number of iterations increases, the space

S_k

will grow and the modified propagator

\mathcal_

will become more accurate. This will lead to faster convergence. This version of Parareal can also stably integrate linear hyperbolic partial differential equations. An extension to nonlinear problems based on the reduced basis method exists as well.

Hybrid Parareal spectral deferred corrections

A method with improved parallel efficiency based on a combination of Parareal with spectral deferred corrections (SDC) has been proposed by M. Minion. It limits the choice for coarse and fine integrator to SDC, sacrificing flexibility for improved parallel efficiency. Instead of the limit of

1/k

, the bound on parallel efficiency in the hybrid method becomes

E_p \leq \frac

with

k_s

being the number of iterations of the serial SDC base method and

k_p

the typically greater number of iterations of the parallel hybrid method. The Parareal-SDC hybrid has been further improved by addition of a ''full approximation scheme'' as used in nonlinear

multigrid In numerical analysis, a multigrid method (MG method) is an algorithm for solving differential equations using a hierarchy of discretizations. They are an example of a class of techniques called multiresolution methods, very useful in problems exhi ...

. This led to the development of the ''parallel full approximation scheme in space and time'' (PFASST). Performance of PFASST has been studied for PEPC, a Barnes-Hut tree code based particle solver developed at Juelich Supercomputing Centre. Simulations using all 262,144 cores on the IBM

BlueGene Blue Gene is an IBM project aimed at designing supercomputers that can reach operating speeds in the petaFLOPS (PFLOPS) range, with low power consumption. The project created three generations of supercomputers, Blue Gene/L, Blue Gene/P, ...

/P system JUGENE showed that PFASST could produce additional speedup beyond saturation of the spatial tree parallelisation.

Multigrid reduction in time (MGRIT)

The multigrid reduction in time method (MGRIT) generalises the interpretation of Parareal as a multigrid-in-time algorithms to multiple levels using different smoothers. It is a more general approach but for a specific choice of parameters it is equivalent to Parareal. Th
XBraid
library implementing MGRIT is being developed by

Lawrence Livermore National Laboratory Lawrence Livermore National Laboratory (LLNL) is a federal research facility in Livermore, California, United States. The lab was originally established as the University of California Radiation Laboratory, Livermore Branch in 1952 in response ...

ParaExp

ParaExp uses exponential integrators within Parareal. While limited to linear problems, it can produce almost optimal parallel speedup.

References

{{Reflist

External links

parallel-in-time.org
Numerical analysis Numerical differential equations Parallel computing Computational science