Double-precision floating-point format (sometimes called FP64 or float64) is a
floating-point
In computing, floating-point arithmetic (FP) is arithmetic on subsets of real numbers formed by a ''significand'' (a Sign (mathematics), signed sequence of a fixed number of digits in some Radix, base) multiplied by an integer power of that ba ...
number format, usually occupying 64
bits in computer memory; it represents a wide range of numeric values by using a floating
radix point
alt=Four types of separating decimals: a) 1,234.56. b) 1.234,56. c) 1'234,56. d) ١٬٢٣٤٫٥٦., Both a full_stop.html" ;"title="comma and a full stop">comma and a full stop (or period) are generally accepted decimal separators for interna ...
.
Double precision may be chosen when the range or precision of
single precision
Single-precision floating-point format (sometimes called FP32 or float32) is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.
A floa ...
would be insufficient.
In the
IEEE 754
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard for floating-point arithmetic originally established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). The standard #Design rationale, add ...
standard, the 64-bit base-2 format is officially referred to as binary64; it was called double in
IEEE 754-1985
IEEE 754-1985 is a historic industry standard for representing floating-point numbers in computers, officially adopted in 1985 and superseded in 2008 by IEEE 754-2008, and then again in 2019 by minor revision IEEE 754-2019. During its 23 years, ...
. IEEE 754 specifies additional floating-point formats, including 32-bit base-2 ''single precision'' and, more recently, base-10 representations (
decimal floating point
Decimal floating-point (DFP) arithmetic refers to both a representation and operations on Decimal data type, decimal floating-point numbers. Working directly with decimal (base-10) fractions can avoid the rounding errors that otherwise typically ...
).
One of the first
programming language
A programming language is a system of notation for writing computer programs.
Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
s to provide floating-point data types was
Fortran. Before the widespread adoption of IEEE 754-1985, the representation and properties of floating-point data types depended on the
computer manufacturer
Current notable computer hardware manufacturers:
Cases
List of computer case manufacturers:
* Aigo
* Antec
* AOpen
* ASRock
* Asus
* be quiet!
* CaseLabs (defunct)
* Chassis Plans
* Cooler Master
* Corsair
* Deepcool
* DFI
* ECS ...
and computer model, and upon decisions made by programming-language implementers. E.g.,
GW-BASIC
GW-BASIC is a dialect of the BASIC programming language developed by Microsoft from IBM BASICA. Functionally identical to BASICA, its BASIC interpreter is a fully self-contained executable and does not need the Cassette BASIC ROM found in the ori ...
's double-precision data type was the
64-bit MBF floating-point format.
IEEE 754 double-precision binary floating-point format: binary64
Double-precision binary floating-point is a commonly used format on PCs, due to its wider range over single-precision floating point, in spite of its performance and bandwidth cost. It is commonly known simply as ''double''. The IEEE 754 standard specifies a binary64 as having:
*
Sign bit: 1 bit
*
Exponent
In mathematics, exponentiation, denoted , is an operation involving two numbers: the ''base'', , and the ''exponent'' or ''power'', . When is a positive integer, exponentiation corresponds to repeated multiplication of the base: that is, i ...
: 11 bits
*
Significand
The significand (also coefficient, sometimes argument, or more ambiguously mantissa, fraction, or characteristic) is the first (left) part of a number in scientific notation or related concepts in floating-point representation, consisting of its s ...
precision: 53 bits (52 explicitly stored)
The sign bit determines the sign of the number (including when this number is zero, which is
signed).
The exponent field is an 11-bit unsigned integer from 0 to 2047, in
biased form: an exponent value of 1023 represents the actual zero. Exponents range from −1022 to +1023 because exponents of −1023 (all 0s) and +1024 (all 1s) are reserved for special numbers.
The 53-bit significand precision gives from 15 to 17
significant decimal digits precision (2
−53 ≈ 1.11 × 10
−16). If a decimal string with at most 15 significant digits is converted to the IEEE 754 double-precision format, giving a normal number, and then converted back to a decimal string with the same number of digits, the final result should match the original string. If an IEEE 754 double-precision number is converted to a decimal string with at least 17 significant digits, and then converted back to double-precision representation, the final result must match the original number.
The format is written with the
significand
The significand (also coefficient, sometimes argument, or more ambiguously mantissa, fraction, or characteristic) is the first (left) part of a number in scientific notation or related concepts in floating-point representation, consisting of its s ...
having an implicit integer bit of value 1 (except for special data, see the exponent encoding below). With the 52 bits of the fraction (F) significand appearing in the memory format, the total precision is therefore 53 bits (approximately 16 decimal digits, 53 log
10(2) ≈ 15.955). The bits are laid out as follows:
The real value assumed by a given 64-bit double-precision datum with a given
biased exponent
and a 52-bit fraction is
:
or
:
Between 2
52=4,503,599,627,370,496 and 2
53=9,007,199,254,740,992 the representable numbers are exactly the integers. For the next range, from 2
53 to 2
54, everything is multiplied by 2, so the representable numbers are the even ones, etc. Conversely, for the previous range from 2
51 to 2
52, the spacing is 0.5, etc.
The spacing as a fraction of the numbers in the range from 2
''n'' to 2
''n''+1 is 2
''n''−52.
The maximum relative rounding error when rounding a number to the nearest representable one (the
machine epsilon
Machine epsilon or machine precision is an upper bound on the relative approximation error due to rounding in floating point number systems. This value characterizes computer arithmetic in the field of numerical analysis, and by extension in the ...
) is therefore 2
−53.
The 11 bit width of the exponent allows the representation of numbers between 10
−308 and 10
308, with full 15–17 decimal digits precision. By compromising precision, the subnormal representation allows even smaller values up to about 5 × 10
−324.
Exponent encoding
The double-precision binary floating-point exponent is encoded using an
offset-binary
Offset binary, also referred to as excess-K, excess-''N'', excess-e, excess code or biased representation, is a method for signed number representation where a signed number n is represented by the bit pattern corresponding to the unsigned numbe ...
representation, with the zero offset being 1023; also known as exponent bias in the IEEE 754 standard. Examples of such representations would be:
The exponents
00016
and
7ff16
have a special meaning:
*
000000000002
=
00016
is used to represent a
signed zero
Signed zero is zero with an associated sign. In ordinary arithmetic, the number 0 does not have a sign, so that −0, +0 and 0 are equivalent. However, in computing, some number representations allow for the existence of two zeros, often denoted by ...
(if ''F'' = 0) and
subnormal number
In computer science, subnormal numbers are the subset of denormalized numbers (sometimes called denormals) that fill the arithmetic underflow, underflow gap around zero in floating-point arithmetic. Any non-zero number with magnitude smaller than ...
s (if ''F'' ≠ 0); and
*
111111111112
=
7ff16
is used to represent
∞
The infinity symbol () is a List of mathematical symbols, mathematical symbol representing the concept of infinity. This symbol is also called a ''lemniscate'', after the lemniscate curves of a similar shape studied in algebraic geometry, or " ...
(if ''F'' = 0) and
NaNs (if ''F'' ≠ 0),
where ''F'' is the fractional part of the
significand
The significand (also coefficient, sometimes argument, or more ambiguously mantissa, fraction, or characteristic) is the first (left) part of a number in scientific notation or related concepts in floating-point representation, consisting of its s ...
. All bit patterns are valid encoding.
Except for the above exceptions, the entire double-precision number is described by:
:
In the case of
subnormal number
In computer science, subnormal numbers are the subset of denormalized numbers (sometimes called denormals) that fill the arithmetic underflow, underflow gap around zero in floating-point arithmetic. Any non-zero number with magnitude smaller than ...
s (''e'' = 0) the double-precision number is described by:
:
Endianness
Double-precision examples
0 01111111111 0000000000000000000000000000000000000000000000000000
2
≙ 3FF0 0000 0000 0000
16
≙ +2
0 × 1
= 1
0 01111111111 0000000000000000000000000000000000000000000000000001
2
≙ 3FF0 0000 0000 0001
16
≙ +2
0 × (1 + 2
−52)
≈ 1.0000000000000002220 (the smallest number greater than 1)
0 01111111111 0000000000000000000000000000000000000000000000000010
2
≙ 3FF0 0000 0000 0002
16
≙ +2
0 × (1 + 2
−51)
≈ 1.0000000000000004441 (the second smallest number greater than 1)
0 10000000000 0000000000000000000000000000000000000000000000000000
2
≙ 4000 0000 0000 0000
16
≙ +2
1 × 1
= 2
1 10000000000 0000000000000000000000000000000000000000000000000000
2
≙ C000 0000 0000 0000
16
≙ −2
1 × 1
= −2
0 10000000000 1000000000000000000000000000000000000000000000000000
2
≙ 4008 0000 0000 0000
16
≙ +2
1 × 1.1
2
= 11
2
= 3
0 10000000001 0000000000000000000000000000000000000000000000000000
2
≙ 4010 0000 0000 0000
16
≙ +2
2 × 1
= 100
2
= 4
0 10000000001 0100000000000000000000000000000000000000000000000000
2
≙ 4014 0000 0000 0000
16
≙ +2
2 × 1.01
2
= 101
2
= 5
0 10000000001 1000000000000000000000000000000000000000000000000000
2
≙ 4018 0000 0000 0000
16
≙ +2
2 × 1.1
2
= 110
2
= 6
0 10000000011 0111000000000000000000000000000000000000000000000000
2
≙ 4037 0000 0000 0000
16
≙ +2
4 × 1.0111
2
= 10111
2
= 23
0 01111111000 1000000000000000000000000000000000000000000000000000
2
≙ 3F88 0000 0000 0000
16
≙ +2
−7 × 1.1
2
= 0.00000011
2
= 0.01171875 (3/256)
0 00000000000 0000000000000000000000000000000000000000000000000001
2
≙ 0000 0000 0000 0001
16
≙ +2
−1022 × 2
−52
= 2
−1074
≈ 4.9406564584124654 × 10
−324 (smallest positive subnormal number)
0 00000000000 1111111111111111111111111111111111111111111111111111
2
≙ 000F FFFF FFFF FFFF
16
≙ +2
−1022 × (1 − 2
−52)
≈ 2.2250738585072009 × 10
−308 (largest subnormal number)
0 00000000001 0000000000000000000000000000000000000000000000000000
2
≙ 0010 0000 0000 0000
16
≙ +2
−1022 × 1
≈ 2.2250738585072014 × 10
−308 (smallest positive normal number)
0 11111111110 1111111111111111111111111111111111111111111111111111
2
≙ 7FEF FFFF FFFF FFFF
16
≙ +2
1023 × (2 − 2
−52)
≈ 1.7976931348623157 × 10
308 (largest normal number)
0 00000000000 0000000000000000000000000000000000000000000000000000
2
≙ 0000 0000 0000 0000
16
≙ +0 (positive zero)
1 00000000000 0000000000000000000000000000000000000000000000000000
2
≙ 8000 0000 0000 0000
16
≙ −0 (negative zero)
0 11111111111 0000000000000000000000000000000000000000000000000000
2
≙ 7FF0 0000 0000 0000
16
≙ +∞ (positive infinity)
1 11111111111 0000000000000000000000000000000000000000000000000000
2
≙ FFF0 0000 0000 0000
16
≙ −∞ (negative infinity)
0 11111111111 0000000000000000000000000000000000000000000000000001
2
≙ 7FF0 0000 0000 0001
16
≙ NaN (sNaN on most processors, such as x86 and ARM)
0 11111111111 1000000000000000000000000000000000000000000000000001
2
≙ 7FF8 0000 0000 0001
16
≙ NaN (qNaN on most processors, such as x86 and ARM)
0 11111111111 1111111111111111111111111111111111111111111111111111
2
≙ 7FFF FFFF FFFF FFFF
16
≙ NaN (an alternative encoding of NaN)
0 01111111101 0101010101010101010101010101010101010101010101010101
2
≙ 3FD5 5555 5555 5555
16
≙ +2
−2 × (1 + 2
−2 + 2
−4 + ... + 2
−52)
≈ 0.33333333333333331483 (closest approximation to
1/
3)
0 10000000000 1001001000011111101101010100010001000010110100011000
2
≙ 4009 21FB 5444 2D18
16
≈ 3.141592653589793116 (closest approximation to π)
Encodings of qNaN and sNaN are not completely specified in
IEEE 754
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard for floating-point arithmetic originally established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). The standard #Design rationale, add ...
and depend on the processor. Most processors, such as the
x86
x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
family and the
ARM family processors, use the most significant bit of the significand field to indicate a quiet NaN; this is what is recommended by IEEE 754. The
PA-RISC
Precision Architecture reduced instruction set computer, RISC (PA-RISC) or Hewlett Packard Precision Architecture (HP/PA or simply HPPA), is a computer, general purpose computer instruction set architecture (ISA) developed by Hewlett-Packard f ...
processors use the bit to indicate a signaling NaN.
By default,
1/
3 rounds down, instead of up like
single precision
Single-precision floating-point format (sometimes called FP32 or float32) is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.
A floa ...
, because of the odd number of bits in the significand.
In more detail:
Given the hexadecimal representation 3FD5 5555 5555 5555
16,
Sign = 0
Exponent = 3FD
16 = 1021
Exponent Bias = 1023 (constant value; see above)
Fraction = 5 5555 5555 5555
16
Value = 2
(Exponent − Exponent Bias) × 1.Fraction – Note that Fraction must not be converted to decimal here
= 2
−2 × (15 5555 5555 5555
16 × 2
−52)
= 2
−54 × 15 5555 5555 5555
16
= 0.333333333333333314829616256247390992939472198486328125
≈ 1/3
Execution speed with double-precision arithmetic
Using double-precision floating-point variables is usually slower than working with their single precision counterparts. One area of computing where this is a particular issue is parallel code running on GPUs. For example, when using
Nvidia
Nvidia Corporation ( ) is an American multinational corporation and technology company headquartered in Santa Clara, California, and incorporated in Delaware. Founded in 1993 by Jensen Huang (president and CEO), Chris Malachowsky, and Curti ...
's
CUDA
In computing, CUDA (Compute Unified Device Architecture) is a proprietary parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for accelerated gene ...
platform, calculations with double precision can take, depending on hardware, from 2 to 32 times as long to complete compared to those done using
single precision
Single-precision floating-point format (sometimes called FP32 or float32) is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.
A floa ...
.
Additionally, many mathematical functions (e.g., sin, cos, atan2, log, exp and sqrt) need more computations to give accurate double-precision results, and are therefore slower.
Precision limitations on integer values
* Integers from −2
53 to 2
53 (−9,007,199,254,740,992 to 9,007,199,254,740,992) can be exactly represented.
* Integers between 2
53 and 2
54 = 18,014,398,509,481,984 round to a multiple of 2 (even number).
* Integers between 2
54 and 2
55 = 36,028,797,018,963,968 round to a multiple of 4.
* Integers between 2
''n'' and 2
''n''+1 round to a multiple of 2
''n''−52.
Implementations
Doubles are implemented in many programming languages in different ways such as the following. On processors with only dynamic precision, such as
x86
x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
without
SSE2
SSE2 (Streaming SIMD Extensions 2) is one of the Intel SIMD (Single Instruction, Multiple Data) processor supplementary instruction sets introduced by Intel with the initial version of the Pentium 4 in 2000. SSE2 instructions allow the use of ...
(or when SSE2 is not used, for compatibility purpose) and with extended precision used by default, software may have difficulties to fulfill some requirements.
C and C++
C and C++ offer a wide variety of
arithmetic types. Double precision is not required by the standards (except by the optional annex F of
C99, covering IEEE 754 arithmetic), but on most systems, the
double
type corresponds to double precision. However, on 32-bit x86 with extended precision by default, some compilers may not conform to the C standard or the arithmetic may suffer from
double rounding.
Fortran
Fortran provides several integer and real types, and the 64-bit type
real64
, accessible via Fortran's intrinsic module
iso_fortran_env
, corresponds to double precision.
Common Lisp
Common Lisp
Common Lisp (CL) is a dialect of the Lisp programming language, published in American National Standards Institute (ANSI) standard document ''ANSI INCITS 226-1994 (S2018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperli ...
provides the types SHORT-FLOAT, SINGLE-FLOAT, DOUBLE-FLOAT and LONG-FLOAT. Most implementations provide SINGLE-FLOATs and DOUBLE-FLOATs with the other types appropriate synonyms. Common Lisp provides exceptions for catching floating-point underflows and overflows, and the inexact floating-point exception, as per IEEE 754. No infinities and NaNs are described in the ANSI standard, however, several implementations do provide these as extensions.
Java
On
Java
Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
before version 1.2, every implementation had to be IEEE 754 compliant. Version 1.2 allowed implementations to bring extra precision in intermediate computations for platforms like
x87. Thus a modifier
strictfp was introduced to enforce strict IEEE 754 computations. Strict floating point has been restored in Java 17.
JavaScript
As specified by the
ECMAScript
ECMAScript (; ES) is a standard for scripting languages, including JavaScript, JScript, and ActionScript. It is best known as a JavaScript standard intended to ensure the interoperability of web pages across different web browsers. It is stan ...
standard, all arithmetic in
JavaScript
JavaScript (), often abbreviated as JS, is a programming language and core technology of the World Wide Web, alongside HTML and CSS. Ninety-nine percent of websites use JavaScript on the client side for webpage behavior.
Web browsers have ...
shall be done using double-precision floating-point arithmetic.
JSON
The
JSON
JSON (JavaScript Object Notation, pronounced or ) is an open standard file format and electronic data interchange, data interchange format that uses Human-readable medium and data, human-readable text to store and transmit data objects consi ...
data encoding format supports numeric values, and the grammar to which numeric expressions must conform has no limits on the precision or range of the numbers so encoded. However, RFC 8259 advises that, since IEEE 754 binary64 numbers are widely implemented, good interoperability can be achieved by implementations processing JSON if they expect no more precision or range than binary64 offers.
Rust and Zig
Rust
Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH) ...
and
Zig have the
f64
data type.
See also
Notes and references
{{data types
Binary arithmetic
Computer arithmetic
Floating point types