Extended precision refers to
floating-point
In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can b ...
number formats that provide greater
precision
Precision, precise or precisely may refer to:
Science, and technology, and mathematics Mathematics and computing (general)
* Accuracy and precision, measurement deviation from true value and its scatter
* Significant figures, the number of digit ...
than the basic floating-point formats. Extended precision formats support a basic format by
minimizing roundoff and overflow errors in intermediate values of expressions on the base format. In contrast to ''extended precision'',
arbitrary-precision arithmetic
In computer science, arbitrary-precision arithmetic, also called bignum arithmetic, multiple-precision arithmetic, or sometimes infinite-precision arithmetic, indicates that calculations are performed on numbers whose digits of precision are li ...
refers to implementations of much larger numeric types (with a storage count that usually is not a power of two) using special software (or, rarely, hardware).
Extended precision implementations
There is a long history of extended floating-point formats reaching back nearly to the middle of the last century. Various manufacturers have used different formats for extended precision for different machines. In many cases the format of the extended precision is not quite the same as a scale-up of the ordinary single- and double-precision formats it is meant to extend. In a few cases the implementation was merely a software-based change in the floating-point data format, but in most cases extended precision was implemented in hardware, either built into the
central processor
A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, ...
itself, or more often, built into the hardware of an optional, attached processor called a "
floating-point unit
In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can b ...
" (FPU) or "floating-point processor" (
FPP), accessible to the
CPU as a fast input / output device.
IBM extended precision formats
The
IBM 1130
The IBM 1130 Computing System, introduced in 1965, was IBM's least expensive computer at that time. A binary 16-bit machine, it was marketed to price-sensitive, computing-intensive technical markets, like education and engineering, succeeding th ...
, sold in 1965,
offered two floating-point formats: A 32-bit "standard precision" format and a 40-bit "extended precision" format. Standard precision format contains a 24-bit
two's complement
Two's complement is a mathematical operation to reversibly convert a positive binary number into a negative binary number with equivalent (but negative) value, using the binary digit with the greatest place value (the leftmost bit in big- endian ...
significand
The significand (also mantissa or coefficient, sometimes also argument, or ambiguously fraction or characteristic) is part of a number in scientific notation or in floating-point representation, consisting of its significant digits. Depending on ...
while extended precision utilizes a 32-bit two's complement significand. The latter format makes full use of the CPU's 32-bit integer operations. The characteristic in both formats is an 8-bit field containing the power of two
biased by 128. Floating-point arithmetic operations are performed by software, and
double precision
Double-precision floating-point format (sometimes called FP64 or float64) is a floating-point number format, usually occupying 64 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.
Flo ...
is not supported at all. The extended format occupies three 16-bit words, with the extra space simply ignored.
The
IBM System/360
The IBM System/360 (S/360) is a family of mainframe computer systems that was announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover both commercial and scientific applica ...
supports a 32-bit "short" floating-point format and a 64-bit "long" floating-point format. The 360/85 and follow-on
System/370
The IBM System/370 (S/370) is a model range of IBM mainframe computers announced on June 30, 1970, as the successors to the System/360 family. The series mostly maintains backward compatibility with the S/360, allowing an easy migration path f ...
add support for a 128-bit "extended" format. These formats are still supported in the current
design
A design is a plan or specification for the construction of an object or system or for the implementation of an activity or process or the result of that plan or specification in the form of a prototype, product, or process. The verb ''to design'' ...
, where they are now called the "
hexadecimal floating-point" (HFP) formats.
Microsoft MBF extended precision format
The
Microsoft BASIC
Microsoft BASIC is the foundation software product of the Microsoft company and evolved into a line of BASIC interpreters and compiler(s) adapted for many different microcomputers. It first appeared in 1975 as Altair BASIC, which was the first ve ...
port for the
6502
The MOS Technology 6502 (typically pronounced "sixty-five-oh-two" or "six-five-oh-two") William Mensch and the moderator both pronounce the 6502 microprocessor as ''"sixty-five-oh-two"''. is an 8-bit microprocessor that was designed by a small te ...
CPU, such as in adaptations like
Commodore BASIC
Commodore BASIC, also known as PET BASIC or CBM-BASIC, is the dialect of the BASIC programming language used in Commodore International's 8-bit home computer line, stretching from the PET of 1977 to the C128 of 1985.
The core is based on 6502 M ...
,
AppleSoft BASIC
Applesoft BASIC is a dialect of Microsoft BASIC, developed by Marc McDonald and Ric Weiland, supplied with the Apple II series of computers. It supersedes Integer BASIC and is the BASIC in ROM in all Apple II series computers after the original ...
,
KIM-1 BASIC
The KIM-1, short for ''Keyboard Input Monitor'', is a small 6502-based single-board computer developed and produced by MOS Technology, Inc. and launched in 1976. It was very successful in that period, due to its low price (thanks to the inexpe ...
or
MicroTAN BASIC, supports an
extended 40-bit variant of the floating-point format ''
Microsoft Binary Format
In computing, Microsoft Binary Format (MBF) is a format for floating-point numbers which was used in Microsoft's BASIC language products, including MBASIC, GW-BASIC and QuickBASIC prior to version 4.00.
There are two main versions of the format ...
'' (MBF) since 1977.
IEEE 754 extended precision formats
The
IEEE 754
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard for floating-point arithmetic established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). The standard addressed many problems found i ...
floating-point standard recommends that implementations provide extended precision formats. The standard specifies the minimum requirements for an extended format but does not specify an encoding. The encoding is the implementor's choice.
The
IA32
IA-32 (short for "Intel Architecture, 32-bit", commonly called i386) is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the 80386 microprocessor in 1985. IA-32 is the first incarnation o ...
,
x86-64
x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mod ...
, and
Itanium
Itanium ( ) is a discontinued family of 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). Launched in June 2001, Intel marketed the processors for enterprise servers and high-performance computin ...
processors support an 80-bit "double extended" extended precision format with a 64-bit significand. The
Intel 8087
The Intel 8087, announced in 1980, was the first x87 floating-point coprocessor for the 8086 line of microprocessors.
The purpose of the 8087 was to speed up computations for floating-point arithmetic, such as addition, subtraction, multiplicati ...
math
coprocessor
A coprocessor is a computer processor used to supplement the functions of the primary processor (the CPU). Operations performed by the coprocessor may be floating-point arithmetic, graphics, signal processing, string processing, cryptography o ...
was the first
x86
x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introd ...
device which supported floating-point arithmetic in hardware. It was designed to support a 32-bit "single precision" format and a 64-bit "double-precision" format for encoding and interchanging floating-point numbers. The temporary real (extended) format was designed not to store data at higher precision as such, but rather primarily to allow for the computation of double results more reliably and accurately by minimising overflow and roundoff-errors in intermediate calculations.
For example, many floating-point algorithms (e.g.
exponentiation
Exponentiation is a mathematical operation, written as , involving two numbers, the '' base'' and the ''exponent'' or ''power'' , and pronounced as " (raised) to the (power of) ". When is a positive integer, exponentiation corresponds to re ...
) suffer from significant precision loss when computed using the most direct implementations. To mitigate such issues the internal registers in the 8087 were designed to hold intermediate results in an 80-bit "extended precision" format. The 8087 automatically converts numbers to this format when loading floating-point registers from
memory
Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered, ...
and also converts results back to the more conventional formats when storing the registers back into memory. To enable intermediate subexpression results to be saved in extended precision scratch variables and continued across programming language statements, and otherwise interrupted calculations to resume where they were interrupted, it provides
instructions
Instruction or instructions may refer to:
Computing
* Instruction, one operation of a processor within a computer architecture instruction set
* Computer program, a collection of instructions
Music
* Instruction (band), a 2002 rock band from Ne ...
which transfer values between these internal registers and memory without performing any conversion, which therefore enables access to the extended format for calculations – also reviving the issue of the accuracy of functions of such numbers, but at a higher precision.
The
floating-point unit
In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can b ...
s (FPU) on all subsequent
x86
x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introd ...
processors have supported this format. As a result, software can be developed which takes advantage of the higher precision provided by this format.
William Kahan
William "Velvel" Morton Kahan (born June 5, 1933) is a Canadian mathematician and computer scientist, who received the Turing Award in 1989 for "''his fundamental contributions to numerical analysis''",
was named an ACM Fellow in 1994, and inducte ...
, a primary designer of the x87 arithmetic and initial IEEE 754 standard proposal notes on the development of the x87 floating point: "An extended format as wide as we dared (80 bits) was included to serve the same support role as the 13 decimal internal format serves in Hewlett-Packard's 10 decimal calculators." Moreover, Kahan notes that 64 bits was the widest significand across which carry propagation could be done without increasing the cycle time on the 8087, and that the x87 extended precision was designed to be extensible to higher precision in future processors: "For now the
10-byte Extended format is a tolerable compromise between the value of extra-precise arithmetic and the price of implementing it to run fast; very soon two more bytes of precision will become tolerable, and ultimately a
16 byte format. ... That kind of gradual evolution towards wider precision was already in view when
IEEE Standard 754 for Floating-Point Arithmetic was framed."
The
Motorola 6888x math coprocessors and the
Motorola 68040
The Motorola 68040 ("''sixty-eight-oh-forty''") is a 32-bit microprocessor in the Motorola 68000 series, released in 1990. It is the successor to the 68030 and is followed by the 68060, skipping the 68050. In keeping with general Motorola nami ...
and
68060
The Motorola 68060 ("''sixty-eight-oh-sixty''") is a 32-bit microprocessor from Motorola released in 1994. It is the successor to the Motorola 68040 and is the highest performing member of the 68000 series. Two derivatives were produced, the 68L ...
processors support this same 64-bit significand extended precision type (similar to the Intel format although padded to a 96-bit format with 16 unused bits inserted between the exponent and significand fields). The follow-on
Coldfire
The NXP ColdFire is a microprocessor that derives from the Motorola 68000 family architecture, manufactured for embedded systems development by NXP Semiconductors. It was formerly manufactured by Freescale Semiconductor (formerly the semiconductor ...
processors do not support this 96-bit extended precision format.
The FPA10 math coprocessor for early
ARM
In human anatomy, the arm refers to the upper limb in common usage, although academically the term specifically means the upper arm between the glenohumeral joint (shoulder joint) and the elbow joint. The distal part of the upper limb between the ...
processors also supports this extended precision type (similar to the Intel format although padded to a 96-bit format with 16 zero bits inserted between the sign and the exponent fields), but without correct rounding.
The x87 and Motorola 68881 80-bit formats meet the requirements of the IEEE 754 double extended format,
as does the IEEE 754
128-bit
While there are currently no mainstream general-purpose processors built to operate on 128-bit ''integers'' or addresses, a number of processors do have specialized ways to operate on 128-bit chunks of data.
Representation
128-bit processors co ...
format.
x86 extended precision format
The x86 extended precision format is an 80-bit format first implemented in the
Intel 8087
The Intel 8087, announced in 1980, was the first x87 floating-point coprocessor for the 8086 line of microprocessors.
The purpose of the 8087 was to speed up computations for floating-point arithmetic, such as addition, subtraction, multiplicati ...
math
coprocessor
A coprocessor is a computer processor used to supplement the functions of the primary processor (the CPU). Operations performed by the coprocessor may be floating-point arithmetic, graphics, signal processing, string processing, cryptography o ...
and is supported by all processors that are based on the
x86 design that incorporate a
floating-point unit
In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can b ...
(FPU). This 80-bit format uses one bit for the sign of the significand, 15 bits for the exponent field (i.e. the same range as the 128-bit
quadruple precision IEEE 754 format) and 64 bits for the significand. The exponent field is
biased by 16383, meaning that 16383 has to be subtracted from the value in the exponent field to compute the actual power of 2. An exponent field value of 32767 (all fifteen bits 1) is reserved so as to enable the representation of special states such as
infinity
Infinity is that which is boundless, endless, or larger than any natural number. It is often denoted by the infinity symbol .
Since the time of the ancient Greeks, the philosophical nature of infinity was the subject of many discussions amo ...
and
Not a Number
Nan or NAN may refer to:
Places China
* Nan County, Yiyang, Hunan, China
* Nan Commandery, historical commandery in Hubei, China
Thailand
* Nan Province
** Nan, Thailand, the administrative capital of Nan Province
* Nan River
People Given name ...
. If the exponent field is zero, the value is a
denormal number and the exponent of 2 is −16382.
In the following table, "''s''" is the value of the sign bit (0 means positive, 1 means negative), "''e''" is the value of the exponent field interpreted as a positive integer, and "''m''" is the significand interpreted as a positive binary number where the binary point is located between bits 63 and 62. The "''m''" field is the combination of the integer and fraction parts in the above diagram.
In contrast to the
single
Single may refer to:
Arts, entertainment, and media
* Single (music), a song release
Songs
* "Single" (Natasha Bedingfield song), 2004
* "Single" (New Kids on the Block and Ne-Yo song), 2008
* "Single" (William Wei song), 2016
* "Single", by ...
and
double-precision formats, this format does not utilize an implicit/
hidden bit
In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can b ...
. Rather, bit 63 contains the integer part of the significand and bits 62-0 hold the fractional part. Bit 63 will be 1 on all normalized numbers. There were several advantages to this design when the
8087
The Intel 8087, announced in 1980, was the first x87 floating-point coprocessor for the 8086 line of microprocessors.
The purpose of the 8087 was to speed up computations for floating-point arithmetic, such as addition, subtraction, multi ...
was being developed:
* Calculations can be completed a little faster if all bits of the significand are present in the register.
* A 64-bit significand provides sufficient precision to avoid loss of precision when the results are converted back to double-precision format in the vast number of cases.
* This format provides a mechanism for indicating precision loss due to underflow which can be carried through further operations. For example, the calculation generates the intermediate result which is a
denormal and also involves precision loss. The product of all of the terms is which can be represented as a normalized number. The
80287
x87 is a floating-point-related subset of the x86 architecture instruction set. It originated as an extension of the 8086 instruction set in the form of optional floating-point coprocessors that worked in tandem with corresponding x86 CPUs. Th ...
could complete this calculation and indicate the loss of precision by returning an "unnormal" result (exponent not 0, bit 63 = 0). Processors since the
80387 no longer generate unnormals and do not support unnormal inputs to operations. They will generate a denormal if an underflow occurs but will generate a normalized result if subsequent operations on the denormal can be normalized.
Introduction to use
The 80-bit floating-point format was widely available by 1984, after the development of C, Fortran and similar computer languages, which initially offered only the common 32- and 64-bit floating-point sizes. On the
x86 design most
C compilers now support 80-bit extended precision via the
long double
In C and related programming languages, long double refers to a floating-point data type that is often more precise than double precision though the language standard only requires it to be at least as precise as double. As with C's other flo ...
type, and this was specified in the
C99
C99 (previously known as C9X) is an informal name for ISO/IEC 9899:1999, a past version of the C programming language standard. It extends the previous version ( C90) with new features for the language and the standard library, and helps impl ...
/
C11 C11, C.XI, C-11 or C.11 may refer to:
Transport
* C-11 Fleetster, a 1920s American light transport aircraft for use of the United States Assistant Secretary of War
* Fokker C.XI, a 1935 Dutch reconnaissance seaplane
* LET C-11, a license-build var ...
standards (IEC 60559 floating-point arithmetic (Annex F)). Compilers on x86 for other languages often support extended precision as well, sometimes via nonstandard extensions: for example,
Turbo Pascal
Turbo Pascal is a software development system that includes a compiler and an integrated development environment (IDE) for the Pascal (programming language), Pascal programming language running on CP/M, CP/M-86, and DOS. It was originally develo ...
offers an
extended
type, and several
Fortran compilers have a
REAL*10
type (analogous to
REAL*4
and
REAL*8
). Such compilers also typically include extended-precision mathematical
subroutines
In computer programming, a function or subroutine is a sequence of program instructions that performs a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed.
Functions may ...
, such as
square root
In mathematics, a square root of a number is a number such that ; in other words, a number whose ''square'' (the result of multiplying the number by itself, or ⋅ ) is . For example, 4 and −4 are square roots of 16, because .
E ...
and
trigonometric function
In mathematics, the trigonometric functions (also called circular functions, angle functions or goniometric functions) are real functions which relate an angle of a right-angled triangle to ratios of two side lengths. They are widely used in all ...
s, in their standard
libraries
A library is a collection of materials, books or media that are accessible for use and not just for display purposes. A library provides physical (hard copies) or digital access (soft copies) materials, and may be a physical location or a vir ...
.
Working range
The 80-bit floating-point format has a range (including
subnormals) from approximately 3.65×10
−4951 to 1.18×10
4932. Although log
10(2
64) ≅ 19.266, this format is usually described as giving approximately eighteen significant digits of precision (the floor of log
10(2
63), the minimum guaranteed precision). The use of decimal when talking about binary is unfortunate because most decimal fractions are recurring sequences in binary just as 2/3 is in decimal. Thus, a value such as 10.15 is represented in binary as equivalent to 10.1499996185 etc. in decimal for REAL*4 but 10.15000000000000035527etc. in REAL*8: interconversion will involve approximation except for those few decimal fractions that represent an exact binary value, such as 0.625. For REAL*10, the decimal string is 10.1499999999999999996530553etc. The last 9 digit is the eighteenth fractional digit and thus the twentieth significant digit of the string. Bounds on conversion between decimal and binary for the 80-bit format can be given as follows: if a decimal string with at most 18 significant digits is correctly rounded to an 80-bit IEEE 754 binary floating-point value (as on input) then converted back to the same number of significant decimal digits (as for output), then the final string will exactly match the original; while, conversely, if an 80-bit IEEE 754 binary floating-point value is correctly converted and (nearest) rounded to a decimal string with at least 21 significant decimal digits then converted back to binary format it will exactly match the original.
These approximations are particularly troublesome when specifying the best value for constants in formulae to high precision, as might be calculated via
arbitrary-precision arithmetic
In computer science, arbitrary-precision arithmetic, also called bignum arithmetic, multiple-precision arithmetic, or sometimes infinite-precision arithmetic, indicates that calculations are performed on numbers whose digits of precision are li ...
.
Need for the 80-bit format
A notable example of the need for a minimum of 64 bits of precision in the significand of the extended precision format is the need to avoid precision loss when performing exponentiation on
double-precision
Double-precision floating-point format (sometimes called FP64 or float64) is a floating-point number format, usually occupying 64 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.
Flo ...
values.
The
x86
x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introd ...
floating-point units do not provide an instruction that directly performs
exponentiation
Exponentiation is a mathematical operation, written as , involving two numbers, the '' base'' and the ''exponent'' or ''power'' , and pronounced as " (raised) to the (power of) ". When is a positive integer, exponentiation corresponds to re ...
. Instead they provide a set of instructions that a program can use in sequence to perform exponentiation using the equation:
:
In order to avoid precision loss, the intermediate results "" and "" must be computed with much higher precision, because effectively both the exponent and the significand fields of must fit into the significand field of the intermediate result. Subsequently, the significand field of the intermediate result is split between the exponent and significand fields of the final result when is calculated. The following discussion describes this requirement in more detail.
With a little unpacking, an
IEEE 754
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard for floating-point arithmetic established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). The standard addressed many problems found i ...
double-precision
Double-precision floating-point format (sometimes called FP64 or float64) is a floating-point number format, usually occupying 64 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.
Flo ...
value can be represented as:
:
where is the sign of the exponent (either 0 or 1), is the unbiased exponent, which is an integer that ranges from 0 to 1023, and is the significand which is a 53-bit value that falls in the range . Negative numbers and zero can be ignored because the logarithm of these values is undefined. For purposes of this discussion does not have 53 bits of precision because it is constrained to be greater than or equal to one i.e. the hidden bit does not count towards the precision (Note that in situations where is less than 1, the value is actually a de-normal and therefore may have already suffered precision loss. This situation is beyond the scope of this article).
Taking the log of this representation of a
double-precision
Double-precision floating-point format (sometimes called FP64 or float64) is a floating-point number format, usually occupying 64 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.
Flo ...
number and simplifying results in the following:
:
This result demonstrates that when taking base 2 logarithm of a number, the sign of the exponent of the original value becomes the sign of the logarithm, the exponent of the original value becomes the integer part of the significand of the logarithm, and the significand of the original value is transformed into the fractional part of the significand of the logarithm.
Because is an integer in the range 0 to 1023, up to 10 bits to the left of the radix point are needed to represent the integer part of the logarithm. Because falls in the range , the value of will fall in the range so at least 52 bits are needed to the right of the radix point to represent the fractional part of the logarithm. Combining 10 bits to the left of the radix point with 52 bits to the right of the radix point means that the significand part of the logarithm must be computed to at least 62 bits of precision. In practice values of less than
require 53 bits to the right of the radix point and values of less than