In numerical analysis, catastrophic cancellation is the phenomenon that subtracting good approximations to two nearby numbers may yield a very bad approximation to the difference of the original numbers. For example, if there are two studs, one L_1 = 254.5\,\text{cm} long and the other L_2 = 253.5\,\text{cm} long, and they are measured with a ruler that is good only to the centimeter, then the approximations could come out to be \tilde L_1 = 255\,\text{cm} and \tilde L_2 = 253\,\text{cm}. These may be good approximations, in relative error, to the true lengths: the approximations are in error by less than 2% of the true lengths, |L_1 - \tilde L_1|/|L_1| < 2\%. However, if the ''approximate'' lengths are subtracted, the difference will be \tilde L_1 - \tilde L_2 = 255\,\text{cm} - 253\,\text{cm} = 2\,\text{cm}, even though the true difference between the lengths is L_1 - L_2 = 254.5\,\text{cm} - 253.5\,\text{cm} = 1\,\text{cm}. The difference of the approximations, 2\,\text{cm}, is in error by 100% of the magnitude of the difference of the true values, 1\,\text{cm}.

Catastrophic cancellation may happen even if the difference is computed exactly, as in the example above—it is not a property of any particular kind of arithmetic like floating-point arithmetic; rather, it is inherent to subtraction, when the ''inputs'' are approximations themselves. Indeed, in floating-point arithmetic, when the inputs are close enough, the floating-point difference is computed exactly, by the Sterbenz lemma—there is no rounding error introduced by the floating-point subtraction operation.
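
The stud example can be restated as a short C program; this is only a transcription of the arithmetic above, with the measured values hard-coded to the rounded readings quoted in the text:

    #include <stdio.h>

    int main(void) {
        /* True lengths of the two studs, in centimeters. */
        double L1 = 254.5, L2 = 253.5;

        /* Measurements with a ruler good only to the centimeter could come out
           this way, as in the text: each is within 2% of its true length. */
        double L1_measured = 255.0;
        double L2_measured = 253.0;

        /* The subtraction of the measurements is exact, yet the result is off
           by 100% of the true difference. */
        printf("true difference:     %g cm\n", L1 - L2);                   /* 1 cm */
        printf("measured difference: %g cm\n", L1_measured - L2_measured); /* 2 cm */
        return 0;
    }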


Formal analysis

Formally, catastrophic cancellation happens because subtraction is ill-conditioned at nearby inputs: even if approximations \tilde x = x (1 + \delta_x) and \tilde y = y (1 + \delta_y) have small relative errors |\delta_x| = |x - \tilde x|/|x| and |\delta_y| = |y - \tilde y|/|y| from true values x and y, respectively, the relative error of the approximate ''difference'' \tilde x - \tilde y from the true difference x - y is inversely proportional to the true difference:

\begin{align}
\tilde x - \tilde y
  &= x (1 + \delta_x) - y (1 + \delta_y)
   = x - y + x \delta_x - y \delta_y \\
  &= x - y + (x - y) \frac{x \delta_x - y \delta_y}{x - y} \\
  &= (x - y) \biggl(1 + \frac{x \delta_x - y \delta_y}{x - y}\biggr).
\end{align}

Thus, the relative error of the exact difference \tilde x - \tilde y of the approximations from the difference x - y of the true numbers is

\left|\frac{x \delta_x - y \delta_y}{x - y}\right|,

which can be arbitrarily large if the true inputs x and y are close.
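
As a concrete illustration (with numbers chosen here purely for the sake of example, not taken from anything above), suppose x = 1 and y = 0.999, and suppose the approximations carry relative errors of magnitude 10^{-6} with opposite signs, \delta_x = 10^{-6} and \delta_y = -10^{-6}. Then the relative error of the difference is \left|\frac{x \delta_x - y \delta_y}{x - y}\right| = \frac{1\cdot 10^{-6} + 0.999\cdot 10^{-6}}{0.001} \approx 2\times 10^{-3}: relative errors of one part in a million in the inputs become a relative error of roughly two parts in a thousand in the difference, an amplification by a factor of about (|x| + |y|)/|x - y| \approx 2000.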


In numerical algorithms

Subtracting nearby numbers in floating-point arithmetic does not always cause catastrophic cancellation, or even any error—by the Sterbenz lemma, if the numbers are close enough the floating-point difference is exact. But cancellation may ''amplify'' errors in the inputs that arose from rounding in other floating-point arithmetic.
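
A minimal C sketch of the first point, with values chosen here only for illustration: two binary64 numbers within a factor of two of each other subtract with no rounding error at all.

    #include <stdio.h>

    int main(void) {
        /* Two nearby binary64 values within a factor of two of each other,
           so by the Sterbenz lemma the subtraction below is exact. */
        double a = 1.0000000000000002;   /* rounds to 1 + 2^-52 */
        double b = 1.0;
        double d = a - b;                /* exactly 2^-52: no rounding error here */
        printf("%.17g\n", d);            /* ~ 2.220446049250313e-16 */
        return 0;
    }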


Example: Difference of squares

Given numbers x and y, the naive attempt to compute the mathematical function x^2 - y^2 by the floating-point arithmetic \operatorname{fl}(\operatorname{fl}(x^2) - \operatorname{fl}(y^2)) is subject to catastrophic cancellation when x and y are close in magnitude, because the subtraction can expose the rounding errors in the squaring. The alternative factoring (x + y) (x - y), evaluated by the floating-point arithmetic \operatorname{fl}(\operatorname{fl}(x + y) \cdot \operatorname{fl}(x - y)), avoids catastrophic cancellation because it avoids introducing rounding error leading into the subtraction. For example, if x = 1 + 2^{-29} \approx 1.0000000018626451 and y = 1 + 2^{-30} \approx 1.0000000009313226, then the true value of the difference x^2 - y^2 is 2^{-29} \cdot (1 + 2^{-30} + 2^{-31}) \approx 1.8626451518330422 \times 10^{-9}. In IEEE 754 binary64 arithmetic, evaluating the alternative factoring (x + y) (x - y) gives the correct result exactly (with no rounding), but evaluating the naive expression x^2 - y^2 gives the floating-point number 2^{-29} = 1.8626451\underline{492309570} \times 10^{-9}, of which less than half the digits are correct and the other (underlined) digits reflect the missing terms 2^{-59} + 2^{-60} = 2^{-29}\cdot(2^{-30} + 2^{-31}), lost due to rounding when calculating the intermediate squared values.
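
A minimal C sketch of the comparison, using the same x and y as above (hexadecimal floating-point literals are used to write 2^{-29} and 2^{-30} exactly):

    #include <stdio.h>

    int main(void) {
        double x = 1.0 + 0x1p-29;   /* x = 1 + 2^-29 */
        double y = 1.0 + 0x1p-30;   /* y = 1 + 2^-30 */

        /* Naive evaluation: each squaring rounds away a tiny term, and the
           subtraction then exposes that rounding error. */
        double naive = x*x - y*y;            /* 2^-29 ~ 1.862645149230957e-9 */

        /* Factored evaluation: no rounding error feeds into the subtraction. */
        double factored = (x + y) * (x - y); /* ~ 1.8626451518330422e-9, exact */

        printf("naive:    %.17g\n", naive);
        printf("factored: %.17g\n", factored);
        return 0;
    }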


Example: Complex arcsine

When computing the complex arcsine function, one may be tempted to use the logarithmic formula directly:

\arcsin(z) = i \log \bigl(\sqrt{1 - z^2} - i z\bigr).

However, suppose z = i y for y \ll 0. Then \sqrt{1 - z^2} = \sqrt{1 + y^2} \approx -y and i z = -y; call the difference between them \varepsilon—a very small difference, nearly zero. If \sqrt{1 - z^2} is evaluated in floating-point arithmetic giving \operatorname{fl}\bigl(\sqrt{\operatorname{fl}(1 - z^2)}\bigr) = \sqrt{1 - z^2}\,(1 + \delta) with any error \delta \ne 0, where \operatorname{fl}(\cdots) denotes floating-point rounding, then computing the difference \sqrt{1 - z^2}\,(1 + \delta) - i z of two nearby numbers, both very close to -y, may amplify the error \delta in one input by a factor of 1/\varepsilon—a very large factor because \varepsilon was nearly zero. For instance, if z = -1234567 i, the true value of \arcsin(z) is approximately -14.71937803983977 i, but using the naive logarithmic formula in IEEE 754 binary64 arithmetic may give a result of the form -14.719\ldots i in which only the first five of the sixteen digits are correct and the rest are garbage. In the case of z = i y for y < 0, using the identity \arcsin(z) = -\arcsin(-z) avoids cancellation, because now \sqrt{1 - (-z)^2} = \sqrt{1 + y^2} \approx -y while i(-z) = -i z = y, so the subtraction \sqrt{1 - (-z)^2} - i(-z) \approx -y - y = -2y effectively adds two quantities of the same sign and does not cancel.
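
The contrast can be sketched in C with the standard complex-number facilities; the function name arcsin_naive is ours, and the library function casin is included only for comparison:

    #include <complex.h>
    #include <stdio.h>

    /* Naive logarithmic formula: arcsin(z) = i log(sqrt(1 - z^2) - i z). */
    static double complex arcsin_naive(double complex z) {
        return I * clog(csqrt(1.0 - z*z) - I*z);
    }

    int main(void) {
        double complex z = -1234567.0 * I;

        /* Naive formula: cancellation in sqrt(1 - z^2) - i z loses digits. */
        double complex a = arcsin_naive(z);

        /* Rearranged for z = i y with y < 0: arcsin(z) = -arcsin(-z), which
           turns the cancelling subtraction into a benign addition. */
        double complex b = -arcsin_naive(-z);

        /* Library arcsine, for comparison. */
        double complex c = casin(z);

        printf("naive:      %.17g + %.17g i\n", creal(a), cimag(a));
        printf("rearranged: %.17g + %.17g i\n", creal(b), cimag(b));
        printf("casin:      %.17g + %.17g i\n", creal(c), cimag(c));
        return 0;
    }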


Example: Radix conversion

Numerical constants in software programs are often written in decimal, such as in the C fragment double x = 1.000000000000001; to declare and initialize an IEEE 754 binary64 variable named x. However, 1.000000000000001 is not a binary64 floating-point number; the nearest one, which x will be initialized to in this fragment, is 1.0000000000000011102230246251565404236316680908203125 = 1 + 5\cdot 2^{-52}. Although the radix conversion from decimal floating-point to binary floating-point only incurs a small relative error, catastrophic cancellation may amplify it into a much larger one:

    double x = 1.000000000000001;  // rounded to 1 + 5*2^-52
    double y = 1.000000000000002;  // rounded to 1 + 9*2^-52
    double z = y - x;              // difference is exactly 4*2^-52

The difference 1.000000000000002 - 1.000000000000001 is 0.000000000000001 = 1.0\times 10^{-15}. The relative errors of x from 1.000000000000001 and of y from 1.000000000000002 are both below 10^{-15} = 0.0000000000001\%, and the floating-point subtraction y - x is computed exactly by the Sterbenz lemma. But even though the inputs are good approximations, and even though the subtraction is computed exactly, the difference of the ''approximations'' \tilde y - \tilde x = (1 + 9\cdot 2^{-52}) - (1 + 5\cdot 2^{-52}) = 4\cdot 2^{-52} \approx 8.88\times 10^{-16} has a relative error of over 11\% from the difference 1.0\times 10^{-15} of the original values as written in decimal: catastrophic cancellation amplified a tiny error in radix conversion into a large error in the output.
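
Wrapping the fragment in a complete program makes the amplification visible; here the printed relative error is measured against the intended decimal difference 1.0\times 10^{-15}:

    #include <stdio.h>

    int main(void) {
        double x = 1.000000000000001;  /* rounded to 1 + 5*2^-52 */
        double y = 1.000000000000002;  /* rounded to 1 + 9*2^-52 */
        double z = y - x;              /* exactly 4*2^-52 ~ 8.88e-16 */

        double intended = 1.0e-15;     /* difference of the decimal values as written */
        printf("z              = %.17g\n", z);
        printf("relative error = %.2f%%\n", 100.0 * (intended - z) / intended);
        return 0;
    }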


Benign cancellation

Cancellation is sometimes useful and desirable in numerical algorithms. For example, the 2Sum and Fast2Sum algorithms both rely on such cancellation after a rounding error in order to exactly compute what the error was in a floating-point addition operation as a floating-point number itself. The function \log(1 + x), if evaluated naively at points 0 < x \ll 1, will lose most of the digits of x in rounding \operatorname{fl}(1 + x). However, the function \log(1 + x) itself is well-conditioned at inputs near 0. Rewriting it as

\log(1 + x) = x \frac{\log(1 + x)}{(1 + x) - 1}

exploits cancellation in \hat x := \operatorname{fl}(1 + x) - 1 to avoid the error from \log(1 + x) evaluated directly. This works because the cancellation in the numerator \log(\operatorname{fl}(1 + x)) = \hat x + O(\hat x^2) and the cancellation in the denominator \hat x = \operatorname{fl}(1 + x) - 1 counteract each other; the function \mu(\xi) = \log(1 + \xi)/\xi is well-enough conditioned near zero that \mu(\hat x) gives a good approximation to \mu(x), and thus x\cdot \mu(\hat x) gives a good approximation to x\cdot \mu(x) = \log(1 + x).
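
A minimal C sketch of this rewriting; the function name log1p_via_cancellation is ours, and the standard library function log1p, shown for comparison, computes the same function accurately:

    #include <math.h>
    #include <stdio.h>

    /* log(1 + x) computed as x * log(w)/(w - 1) with w = fl(1 + x), so the
       rounding error in 1 + x cancels between numerator and denominator. */
    static double log1p_via_cancellation(double x) {
        double w = 1.0 + x;        /* rounded: most digits of x may be lost here */
        if (w == 1.0)
            return x;              /* 1 + x rounded to 1: log(1 + x) ~ x */
        return x * (log(w) / (w - 1.0));
    }

    int main(void) {
        double x = 1e-10;
        printf("naive log(1 + x): %.17g\n", log(1.0 + x));
        printf("rewritten:        %.17g\n", log1p_via_cancellation(x));
        printf("library log1p(x): %.17g\n", log1p(x));
        return 0;
    }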

