A roundoff error, also called rounding error, is the difference between the result produced by a given
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
using exact arithmetic and the result produced by the same algorithm using finite-precision, rounded arithmetic.
Rounding errors are due to inexactness in the representation of real numbers and the arithmetic operations done with them. This is a form of
quantization error
Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set (often a continuous set) to output values in a (countable) smaller set, often with a finite number of elements. Rounding and ...
. When using approximation
equation
In mathematics, an equation is a formula that expresses the equality of two expressions, by connecting them with the equals sign . The word ''equation'' and its cognates in other languages may have subtly different meanings; for example, in ...
s or algorithms, especially when using finitely many digits to represent real numbers (which in theory have infinitely many digits), one of the goals of
numerical analysis
Numerical analysis is the study of algorithms that use numerical approximation (as opposed to symbolic computation, symbolic manipulations) for the problems of mathematical analysis (as distinguished from discrete mathematics). It is the study of ...
is to
estimate
Estimation (or estimating) is the process of finding an estimate or approximation, which is a value that is usable for some purpose even if input data may be incomplete, uncertain, or unstable. The value is nonetheless usable because it is der ...
computation errors. Computation errors, also called
numerical error In software engineering and mathematics, numerical error is the error in the numerical computations.
Types
It can be the combined effect of two kinds of error in a calculation.
* the first is caused by the finite precision of computations involv ...
s, include both
truncation error
In numerical analysis and scientific computing, truncation error is an error caused by approximating a mathematical process.
Examples Infinite series
A summation series for e^x is given by an infinite series such as
e^x=1+ x+ \frac + \frac ...
s and roundoff errors.
When a sequence of calculations with an input involving any roundoff error are made, errors may accumulate, sometimes dominating the calculation. In
ill-conditioned
In numerical analysis, the condition number of a function measures how much the output value of the function can change for a small change in the input argument. This is used to measure how sensitive a function is to changes or errors in the input ...
problems, significant error may accumulate.
In short, there are two major facets of roundoff errors involved in numerical calculations:
# The ability of computers to represent both magnitude and precision of numbers is inherently limited.
# Certain numerical manipulations are highly sensitive to roundoff errors. This can result from both mathematical considerations as well as from the way in which computers perform arithmetic operations.
Representation error
The error introduced by attempting to represent a number using a finite string of digits is a form of roundoff error called representation error.
Here are some examples of representation error in decimal representations:
Increasing the number of digits allowed in a representation reduces the magnitude of possible roundoff errors, but any representation limited to finitely many digits will still cause some degree of roundoff error for
uncountably many real numbers. Additional digits used for intermediary steps of a calculation are known as
guard digits.
Rounding multiple times can cause error to accumulate.
For example, if 9.945309 is rounded to two decimal places (9.95), then rounded again to one decimal place (10.0), the total error is 0.054691. Rounding 9.945309 to one decimal place (9.9) in a single step introduces less error (0.045309). This can occur, for example, when software performs arithmetic in
x86 80-bit floating-point and then rounds the result to
IEEE 754 binary64 floating-point.
Floating-point number system
Compared with the
fixed-point number system, the
floating-point number system is more efficient in representing real numbers so it is widely used in modern computers. While the real numbers
are infinite and continuous, a floating-point number system
is finite and discrete. Thus, representation error, which leads to roundoff error, occurs under the floating-point number system.
Notation of floating-point number system
A floating-point number system
is characterized by
integers:
*
: base or radix
*
: precision
*
: exponent range, where
is the lower bound and
is the upper bound
Any
has the following form:
where
is an integer such that
for
, and
is an integer such that
.
Normalized floating-number system
* A floating-point number system is normalized if the leading digit
is always nonzero unless the number is zero.
Since the mantissa is
, the mantissa of a nonzero number in a normalized system satisfies
. Thus, the normalized form of a nonzero
IEEE
The Institute of Electrical and Electronics Engineers (IEEE) is a 501(c)(3) professional association for electronic engineering and electrical engineering (and associated disciplines) with its corporate office in New York City and its operation ...
floating-point number is
where
. In binary, the leading digit is always
so it is not written out and is called the implicit bit. This gives an extra bit of precision so that the roundoff error caused by representation error is reduced.
* Since floating-point number system
is finite and discrete, it cannot represent all real numbers which means infinite real numbers can only be approximated by some finite numbers through
rounding rules. The floating-point approximation of a given real number
by
can be denoted.
** The total number of normalized floating-point numbers is
where
***
counts choice of sign, being positive or negative
***
counts choice of the leading digit
***
counts remaining mantissa
***
counts choice of exponents
***
counts the case when the number is
.
IEEE standard
In the
IEEE
The Institute of Electrical and Electronics Engineers (IEEE) is a 501(c)(3) professional association for electronic engineering and electrical engineering (and associated disciplines) with its corporate office in New York City and its operation ...
standard the base is binary, i.e.
, and normalization is used. The IEEE standard stores the sign, exponent, and mantissa in separate fields of a floating point word, each of which has a fixed width (number of bits). The two most commonly used levels of precision for floating-point numbers are single precision and double precision.
Machine epsilon
Machine epsilon
Machine epsilon or machine precision is an upper bound on the relative approximation error due to rounding in floating point arithmetic. This value characterizes computer arithmetic in the field of numerical analysis, and by extension in the subjec ...
can be used to measure the level of roundoff error in the floating-point number system. Here are two different definitions.
* The machine epsilon, denoted
, is the maximum possible
absolute relative error in representing a nonzero real number
in a floating-point number system.
* The machine epsilon, denoted
, is the smallest number
such that
. Thus,
whenever
.
Roundoff error under different rounding rules
There are two common rounding rules, round-by-chop and round-to-nearest. The IEEE standard uses round-to-nearest.
* Round-by-chop: The base-
expansion of
is truncated after the
-th digit.
** This rounding rule is biased because it always moves the result toward zero.
* Round-to-nearest:
is set to the nearest floating-point number to
. When there is a tie, the floating-point number whose last stored digit is even (also, the last digit, in binary form, is equal to 0) is used.
** For IEEE standard where the base
is
, this means when there is a tie it is rounded so that the last digit is equal to
.
** This rounding rule is more accurate but more computationally expensive.
** Rounding so that the last stored digit is even when there is a tie ensures that it is not rounded up or down systematically. This is to try to avoid the possibility of an unwanted slow drift in long calculations due simply to a biased rounding.
* The following example illustrates the level of roundoff error under the two rounding rules.
The rounding rule, round-to-nearest, leads to less roundoff error in general.
Calculating roundoff error in IEEE standard
Suppose the usage of round-to-nearest and IEEE double precision.
* Example: the decimal number
can be rearranged into
Since the
-rd bit to the right of the binary point is a 1 and is followed by other nonzero bits, the round-to-nearest rule requires rounding up, that is, add 1 bit to the
-nd bit. Thus, the normalized floating-point representation in IEEE standard of 9.4 is
* Now the roundoff error can be calculated when representing
with
.
This representation is derived by discarding the infinite tail
from the right tail and then added
in the rounding step.
:Then
.
:Thus, the roundoff error is
.
Measuring roundoff error by using machine epsilon
The machine epsilon
can be used to measure the level of roundoff error when using the two rounding rules above. Below are the formulas and corresponding proof.
The first definition of machine epsilon is used here.
Theorem
# Round-by-chop:
# Round-to-nearest:
Proof
Let
where