Octuple-precision floating-point format

In computing, octuple precision is a binary floating-point-based computer number format that occupies 32 bytes (256 bits) in computer memory. This 256-bit octuple precision is for applications requiring results in higher than quadruple precision. This format is rarely (if ever) used and very few environments support it.


IEEE 754 octuple-precision binary floating-point format: binary256

In its 2008 revision, the IEEE 754 standard specifies a binary256 format among the ''interchange formats'' (it is not a basic format), as having:
* Sign bit: 1 bit
* Exponent width: 19 bits
* Significand precision: 237 bits (236 explicitly stored)

The format is written with an implicit lead bit with value 1 unless the exponent is all zeros. Thus only 236 bits of the significand appear in the memory format, but the total precision is 237 bits (approximately 71 decimal digits: log10(2^237) ≈ 71.34). The bits are laid out as a sign bit, followed by the 19 exponent bits and the 236 explicitly stored significand bits.
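
As a rough illustration (not part of the standard's text), the following Python sketch derives the figures quoted above from the three field widths; the variable names are arbitrary.

import math

# binary256 field widths as specified above
SIGN_BITS = 1
EXPONENT_BITS = 19
STORED_SIGNIFICAND_BITS = 236              # explicitly stored
PRECISION = STORED_SIGNIFICAND_BITS + 1    # 237, counting the implicit leading bit

total_bits = SIGN_BITS + EXPONENT_BITS + STORED_SIGNIFICAND_BITS
exponent_bias = 2 ** (EXPONENT_BITS - 1) - 1   # 262143
decimal_digits = PRECISION * math.log10(2)     # log10(2**237)

print(total_bits)                  # 256
print(exponent_bias)               # 262143
print(round(decimal_digits, 2))    # 71.34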


Exponent encoding

The octuple-precision binary floating-point exponent is encoded using an offset binary representation, with the zero offset being 262143; this offset is also known as the exponent bias in the IEEE 754 standard.
* Emin = −262142
* Emax = 262143
* Exponent bias = 3FFFF₁₆ = 262143

Thus, as defined by the offset binary representation, in order to get the true exponent the offset of 262143 has to be subtracted from the stored exponent. The stored exponents 00000₁₆ and 7FFFF₁₆ are interpreted specially.

The minimum strictly positive (subnormal) value is 2^−262378 ≈ 2.2480 × 10^−78984 and has a precision of only one bit. The minimum positive normal value is 2^−262142 ≈ 2.4824 × 10^−78913. The maximum representable value is 2^262144 − 2^261907 ≈ 1.6113 × 10^78913.
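
To make the bias arithmetic concrete, here is a small Python sketch (the helper name classify_exponent is ours, written for illustration only) that interprets a 19-bit stored exponent field as described above.

BIAS = 0x3FFFF           # 262143, the exponent bias
EXP_ALL_ONES = 0x7FFFF   # reserved for infinities and NaNs
EXP_ALL_ZEROS = 0x00000  # reserved for zeros and subnormal numbers

def classify_exponent(stored):
    """Interpret a 19-bit stored exponent field of a binary256 value."""
    if stored == EXP_ALL_ZEROS:
        return "zero or subnormal (true exponent fixed at Emin = -262142)"
    if stored == EXP_ALL_ONES:
        return "infinity or NaN"
    return "normal number, true exponent = %d" % (stored - BIAS)

print(classify_exponent(0x3FFFF))  # true exponent 0, e.g. the value 1.0
print(classify_exponent(0x00001))  # true exponent -262142 (Emin)
print(classify_exponent(0x7FFFE))  # true exponent 262143 (Emax)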


Octuple-precision examples

These examples are given in bit ''representation'', in hexadecimal, of the floating-point value. This includes the sign, (biased) exponent, and significand.

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆ = +0
8000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆ = −0
7fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆ = +infinity
ffff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆ = −infinity
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001₁₆ = 2^−262142 × 2^−236 = 2^−262378 ≈ 2.24800708647703657297018614776265182597360918266100276294348974547709294462 × 10^−78984 (smallest positive subnormal number)
0000 0fff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff₁₆ = 2^−262142 × (1 − 2^−236) ≈ 2.4824279514643497882993282229138717236776877060796468692709532979137875392 × 10^−78913 (largest subnormal number)
0000 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆ = 2^−262142 ≈ 2.48242795146434978829932822291387172367768770607964686927095329791378756168 × 10^−78913 (smallest positive normal number)
7fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff₁₆ = 2^262143 × (2 − 2^−236) ≈ 1.61132571748576047361957211845200501064402387454966951747637125049607182699 × 10^78913 (largest normal number)
3fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff₁₆ = 1 − 2^−237 ≈ 0.999999999999999999999999999999999999999999999999999999999999999999999995472 (largest number less than one)
3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆ = 1 (one)
3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001₁₆ = 1 + 2^−236 ≈ 1.00000000000000000000000000000000000000000000000000000000000000000000000906 (smallest number larger than one)

By default, 1/3 rounds down like double precision, because of the odd number of bits in the significand. So the bits beyond the rounding point are 0101..., which is less than 1/2 of a unit in the last place (ulp).
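
The encodings above can be checked mechanically. The following Python sketch (decode_binary256 is a hypothetical helper written for illustration, covering finite values only) decodes a 256-bit pattern exactly using rational arithmetic.

from fractions import Fraction

def decode_binary256(hex_string):
    """Decode a 64-hex-digit binary256 pattern into an exact Fraction.
    Finite values only; infinities and NaNs are not handled."""
    bits = int(hex_string.replace(" ", ""), 16)
    sign = bits >> 255                       # 1 sign bit
    stored_exp = (bits >> 236) & 0x7FFFF     # 19 exponent bits
    frac = bits & ((1 << 236) - 1)           # 236 stored significand bits
    if stored_exp == 0:                      # zero or subnormal: no implicit 1
        value = Fraction(frac, 1 << 236) * Fraction(2) ** -262142
    else:                                    # normal: implicit leading 1
        value = (1 + Fraction(frac, 1 << 236)) * Fraction(2) ** (stored_exp - 262143)
    return -value if sign else value

one = "3fff f000" + " 0000" * 14
assert decode_binary256(one) == 1                          # the encoding of 1
tiny = "0000" + " 0000" * 14 + " 0001"
assert decode_binary256(tiny) == Fraction(1, 2 ** 262378)  # smallest subnormal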


Implementations

Octuple precision is rarely implemented, since the need for it is extremely rare. Apple Inc. had an implementation of addition, subtraction, and multiplication of octuple-precision numbers with a 224-bit two's complement significand and a 32-bit exponent. One can use general arbitrary-precision arithmetic libraries to obtain octuple (or higher) precision, but specialized octuple-precision implementations may achieve higher performance.
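
As one illustration of the library route, the third-party Python package mpmath can be set to a 237-bit working precision; this matches the binary256 significand but not its 19-bit exponent range or its subnormal behaviour, so it is only a partial emulation.

from mpmath import mp, mpf   # third-party package; assumed to be installed

mp.prec = 237                # 237-bit significand, as in binary256

third = mpf(1) / 3
print(mp.dps)                # roughly 71 decimal digits of working precision
print(third)                 # 1/3 to that precision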


Hardware support

There is no known hardware implementation of octuple precision.


See also

* IEEE 754
* ISO/IEC 10967, Language-independent arithmetic
* Primitive data type
* Scientific notation

