FP64
   HOME

TheInfoList



OR:

Double-precision floating-point format (sometimes called FP64 or float64) is a floating-point number format, usually occupying 64 bits in computer memory; it represents a wide
dynamic range Dynamic range (abbreviated DR, DNR, or DYR) is the ratio between the largest and smallest values that a certain quantity can assume. It is often used in the context of Signal (electrical engineering), signals, like sound and light. It is measured ...
of numeric values by using a floating
radix point A decimal separator is a symbol used to separate the integer part from the fractional part of a number written in decimal form (e.g., "." in 12.45). Different countries officially designate different symbols for use as the separator. The choi ...
. Floating point is used to represent fractional values, or when a wider range is needed than is provided by fixed point (of the same bit width), even if at the cost of precision. Double precision may be chosen when the range or precision of
single precision Single-precision floating-point format (sometimes called FP32 or float32) is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. A floatin ...
would be insufficient. In the
IEEE 754-2008 The Institute of Electrical and Electronics Engineers (IEEE) is a 501(c)(3) professional association for electronic engineering and electrical engineering (and associated disciplines) with its corporate office in New York City and its operation ...
standard Standard may refer to: Symbols * Colours, standards and guidons, kinds of military signs * Standard (emblem), a type of a large symbol or emblem used for identification Norms, conventions or requirements * Standard (metrology), an object th ...
, the 64-bit base-2 format is officially referred to as binary64; it was called double in
IEEE 754-1985 IEEE 754-1985 was an industry standard for representing floating-point numbers in computers, officially adopted in 1985 and superseded in 2008 by IEEE 754-2008, and then again in 2019 by minor revision IEEE 754-2019. During its 23 years, it was the ...
. IEEE 754 specifies additional floating-point formats, including 32-bit base-2 ''single precision'' and, more recently, base-10 representations. One of the first
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
s to provide single- and double-precision floating-point data types was Fortran. Before the widespread adoption of IEEE 754-1985, the representation and properties of floating-point data types depended on the computer manufacturer and computer model, and upon decisions made by programming-language implementers. E.g.,
GW-BASIC GW-BASIC is a dialect of the BASIC programming language developed by Microsoft from IBM BASICA. Functionally identical to BASICA, its BASIC interpreter is a fully self-contained executable and does not need the Cassette BASIC ROM found in the ...
's double-precision data type was the 64-bit MBF floating-point format.


IEEE 754 double-precision binary floating-point format: binary64

Double-precision binary floating-point is a commonly used format on PCs, due to its wider range over single-precision floating point, in spite of its performance and bandwidth cost. It is commonly known simply as ''double''. The IEEE 754 standard specifies a binary64 as having: *
Sign bit In computer science, the sign bit is a bit in a signed number representation that indicates the sign of a number. Although only signed numeric data types have a sign bit, it is invariably located in the most significant bit position, so the term ...
: 1 bit * Exponent: 11 bits * Significand
precision Precision, precise or precisely may refer to: Science, and technology, and mathematics Mathematics and computing (general) * Accuracy and precision, measurement deviation from true value and its scatter * Significant figures, the number of digit ...
: 53 bits (52 explicitly stored) The sign bit determines the sign of the number (including when this number is zero, which is signed). The exponent field is an 11-bit unsigned integer from 0 to 2047, in biased form: an exponent value of 1023 represents the actual zero. Exponents range from −1022 to +1023 because exponents of −1023 (all 0s) and +1024 (all 1s) are reserved for special numbers. The 53-bit significand precision gives from 15 to 17 significant decimal digits precision (2−53 ≈ 1.11 × 10−16). If a decimal string with at most 15 significant digits is converted to the IEEE 754 double-precision format, giving a normal number, and then converted back to a decimal string with the same number of digits, the final result should match the original string. If an IEEE 754 double-precision number is converted to a decimal string with at least 17 significant digits, and then converted back to double-precision representation, the final result must match the original number. The format is written with the significand having an implicit integer bit of value 1 (except for special data, see the exponent encoding below). With the 52 bits of the fraction (F) significand appearing in the memory format, the total precision is therefore 53 bits (approximately 16 decimal digits, 53 log10(2) ≈ 15.955). The bits are laid out as follows: The real value assumed by a given 64-bit double-precision datum with a given biased exponent e and a 52-bit fraction is : (-1)^(1.b_b_...b_)_2 \times 2^ or : (-1)^\left(1 + \sum_^ b_ 2^ \right)\times 2^ Between 252=4,503,599,627,370,496 and 253=9,007,199,254,740,992 the representable numbers are exactly the integers. For the next range, from 253 to 254, everything is multiplied by 2, so the representable numbers are the even ones, etc. Conversely, for the previous range from 251 to 252, the spacing is 0.5, etc. The spacing as a fraction of the numbers in the range from 2''n'' to 2''n''+1 is 2''n''−52. The maximum relative rounding error when rounding a number to the nearest representable one (the machine epsilon) is therefore 2−53. The 11 bit width of the exponent allows the representation of numbers between 10−308 and 10308, with full 15–17 decimal digits precision. By compromising precision, the subnormal representation allows even smaller values up to about 5 × 10−324.


Exponent encoding

The double-precision binary floating-point exponent is encoded using an
offset-binary Offset binary, also referred to as excess-K, excess-''N'', excess-e, excess code or biased representation, is a method for signed number representation where a signed number n is represented by the bit pattern corresponding to the unsigned numb ...
representation, with the zero offset being 1023; also known as exponent bias in the IEEE 754 standard. Examples of such representations would be: The exponents 00016 and 7ff16 have a special meaning: * 000000000002=00016 is used to represent a
signed zero Signed zero is zero with an associated sign. In ordinary arithmetic, the number 0 does not have a sign, so that −0, +0 and 0 are identical. However, in computing, some number representations allow for the existence of two zeros, often denoted by ...
(if ''F'' = 0) and
subnormal number In computer science, subnormal numbers are the subset of denormalized numbers (sometimes called denormals) that fill the underflow gap around zero in floating-point arithmetic. Any non-zero number with magnitude smaller than the smallest normal ...
s (if ''F'' ≠ 0); and * 111111111112=7ff16 is used to represent
The infinity symbol (\infty) is a List of mathematical symbols, mathematical symbol representing the concept of infinity. This symbol is also called a lemniscate, after the lemniscate curves of a similar shape studied in algebraic geometry, or ...
(if ''F'' = 0) and
NaN Nan or NAN may refer to: Places China * Nan County, Yiyang, Hunan, China * Nan Commandery, historical commandery in Hubei, China Thailand * Nan Province ** Nan, Thailand, the administrative capital of Nan Province * Nan River People Given name ...
s (if ''F'' ≠ 0), where ''F'' is the fractional part of the significand. All bit patterns are valid encoding. Except for the above exceptions, the entire double-precision number is described by: : (-1)^ \times 2^ \times 1.\text In the case of
subnormal number In computer science, subnormal numbers are the subset of denormalized numbers (sometimes called denormals) that fill the underflow gap around zero in floating-point arithmetic. Any non-zero number with magnitude smaller than the smallest normal ...
s (''e'' = 0) the double-precision number is described by: : (-1)^ \times 2^ \times 0.\text = (-1)^ \times 2^ \times 0.\text


Endianness


Double-precision examples

Encodings of qNaN and sNaN are not completely specified in IEEE 754 and depend on the processor. Most processors, such as the
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...
family and the
ARM In human anatomy, the arm refers to the upper limb in common usage, although academically the term specifically means the upper arm between the glenohumeral joint (shoulder joint) and the elbow joint. The distal part of the upper limb between th ...
family processors, use the most significant bit of the significand field to indicate a quiet NaN; this is what is recommended by IEEE 754. The
PA-RISC PA-RISC is an instruction set architecture (ISA) developed by Hewlett-Packard. As the name implies, it is a reduced instruction set computer (RISC) architecture, where the PA stands for Precision Architecture. The design is also referred to as ...
processors use the bit to indicate a signaling NaN. By default, 1/3 rounds down, instead of up like
single precision Single-precision floating-point format (sometimes called FP32 or float32) is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. A floatin ...
, because of the odd number of bits in the significand. In more detail: Given the hexadecimal representation 3FD5 5555 5555 555516, Sign = 0 Exponent = 3FD16 = 1021 Exponent Bias = 1023 (constant value; see above) Fraction = 5 5555 5555 555516 Value = 2(Exponent − Exponent Bias) × 1.Fraction – Note that Fraction must not be converted to decimal here = 2−2 × (15 5555 5555 555516 × 2−52) = 2−54 × 15 5555 5555 555516 = 0.333333333333333314829616256247390992939472198486328125 ≈ 1/3


Execution speed with double-precision arithmetic

Using double-precision floating-point variables is usually slower than working with their single precision counterparts. One area of computing where this is a particular issue is parallel code running on GPUs. For example, when using
NVIDIA Nvidia CorporationOfficially written as NVIDIA and stylized in its logo as VIDIA with the lowercase "n" the same height as the uppercase "VIDIA"; formerly stylized as VIDIA with a large italicized lowercase "n" on products from the mid 1990s to ...
's
CUDA CUDA (or Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach ...
platform, calculations with double precision take, depending on a hardware, approximately 2 to 32 times as long to complete compared to those done using
single precision Single-precision floating-point format (sometimes called FP32 or float32) is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. A floatin ...
. Additionally, many mathematical functions (e.g., sin, cos, atan2, log, exp and sqrt) need more computations to give accurate double-precision results, and are therefore slower.


Precision limitations on integer values

* Integers from −253 to 253 (−9,007,199,254,740,992 to 9,007,199,254,740,992) can be exactly represented * Integers between 253 and 254 = 18,014,398,509,481,984 round to a multiple of 2 (even number) * Integers between 254 and 255 = 36,028,797,018,963,968 round to a multiple of 4


Implementations

Doubles are implemented in many programming languages in different ways such as the following. On processors with only dynamic precision, such as
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...
without SSE2 (or when SSE2 is not used, for compatibility purpose) and with extended precision used by default, software may have difficulties to fulfill some requirements.


C and C++

C and C++ offer a wide variety of arithmetic types. Double precision is not required by the standards (except by the optional annex F of
C99 C99 (previously known as C9X) is an informal name for ISO/IEC 9899:1999, a past version of the C programming language standard. It extends the previous version ( C90) with new features for the language and the standard library, and helps impl ...
, covering IEEE 754 arithmetic), but on most systems, the double type corresponds to double precision. However, on 32-bit x86 with extended precision by default, some compilers may not conform to the C standard or the arithmetic may suffer from double rounding.


Fortran

Fortran provides several integer and real types, and the 64-bit type real64, accessible via Fortran's intrinsic module iso_fortran_env, corresponds to double precision.


Common Lisp

Common Lisp provides the types SHORT-FLOAT, SINGLE-FLOAT, DOUBLE-FLOAT and LONG-FLOAT. Most implementations provide SINGLE-FLOATs and DOUBLE-FLOATs with the other types appropriate synonyms. Common Lisp provides exceptions for catching floating-point underflows and overflows, and the inexact floating-point exception, as per IEEE 754. No infinities and NaNs are described in the ANSI standard, however, several implementations do provide these as extensions.


Java

On
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mos ...
before version 1.2, every implementation had to be IEEE 754 compliant. Version 1.2 allowed implementations to bring extra precision in intermediate computations for platforms like x87. Thus a modifier
strictfp strictfp is an obsolete and unused reserved word in the Java programming language. Previously, this keyword was used as a modifier that restricted floating-point calculations to IEEE 754 semantics in order to ensure portability. The strictfp keywor ...
was introduced to enforce strict IEEE 754 computations. Strict floating point has been restored in Java 17.


JavaScript

As specified by the
ECMAScript ECMAScript (; ES) is a JavaScript standard intended to ensure the interoperability of web pages across different browsers. It is standardized by Ecma International in the documenECMA-262 ECMAScript is commonly used for client-side scripting o ...
standard, all arithmetic in
JavaScript JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of websites use JavaScript on the client side for webpage behavior, of ...
shall be done using double-precision floating-point arithmetic.


JSON

The JSON data encoding format supports numeric values, and the grammar to which numeric expressions must conform has no limits on the precision or range of the numbers so encoded. However, RFC 8259 advises that, since IEEE 754 binary64 numbers are widely implemented, good interoperability can be achieved by implementations processing JSON if they expect no more precision or range than binary64 offers.


See also

* IEEE 754, IEEE standard for floating-point arithmetic *
D notation (scientific notation) Scientific notation is a way of expressing numbers that are too large or too small (usually would result in a long string of digits) to be conveniently written in decimal form. It may be referred to as scientific form or standard index form, or ...


Notes and references

{{data types Binary arithmetic Computer arithmetic Floating point types