Unum Type 3
   HOME

TheInfoList



OR:

Unums (''universal numbers'') are a family of formats and arithmetic, similar to
floating point In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can be ...
, proposed by
John L. Gustafson John Leroy Gustafson (born January 19, 1955) is an American computer scientist and businessman, chiefly known for his work in high-performance computing (HPC) such as the invention of Gustafson's law, introducing the first commercial computer cl ...
in 2015. They are designed as an alternative to the ubiquitous IEEE 754 floating-point standard. The latest version (known as posits) can be used as a drop-in replacement for programs that do not depend on specific features of IEEE 754.


Type I Unum

The first version of unums, formally known as Type I unum, was introduced in Gustafson's book ''The End of Error'' as a superset of the IEEE-754 floating-point format. The defining features of the Type I unum format are: * a variable-width storage format for both the
significand The significand (also mantissa or coefficient, sometimes also argument, or ambiguously fraction or characteristic) is part of a number in scientific notation or in floating-point representation, consisting of its significant digits. Depending on ...
and
exponent Exponentiation is a mathematical operation, written as , involving two numbers, the '' base'' and the ''exponent'' or ''power'' , and pronounced as " (raised) to the (power of) ". When is a positive integer, exponentiation corresponds to re ...
, and * a ''u-bit'', which determines whether the unum corresponds to an exact number (''u'' = 0), or an interval between consecutive exact unums (''u'' = 1). In this way, the unums cover the entire extended real number line ˆ’∞,+∞ For computation with the format, Gustafson proposed using
interval arithmetic Interval arithmetic (also known as interval mathematics, interval analysis, or interval computation) is a mathematical technique used to put bounds on rounding errors and measurement errors in mathematical computation. Numerical methods using ...
with a pair of unums, what he called a ''ubound'', providing the guarantee that the resulting interval contains the exact solution. William M. Kahan and Gustafson debated unums at the '' Arith23'' conference.


Type II Unum

Type II Unums were introduced in 2016 as a redesign of Unums that broke IEEE-754 compatibility.


Posit (Type III Unum)

In February 2017, Gustafson officially introduced Type III unums, posits for fixed floating-point-like values and valids for
interval arithmetic Interval arithmetic (also known as interval mathematics, interval analysis, or interval computation) is a mathematical technique used to put bounds on rounding errors and measurement errors in mathematical computation. Numerical methods using ...
. In March 2021, a standard was ratified and published by the Posit Working Group. Posits are a hardware-friendly version of unum where difficulties faced in the original type I unum due to its variable size are resolved. Compared to IEEE 754 floats of similar size, posits offer a bigger dynamic range and more fraction bits for values with magnitude near 1 (but fewer fraction bits for very large or very small values), and Gustafson claims that they offer better accuracy. In an independent study, Lindstrom, Lloyd and Hittinger from
Lawrence Livermore National Laboratory Lawrence Livermore National Laboratory (LLNL) is a federal research facility in Livermore, California, United States. The lab was originally established as the University of California Radiation Laboratory, Livermore Branch in 1952 in response ...
confirmed that posits out-perform floats in accuracy. Posits have superior accuracy in the range near one, where most computations occur. This makes it very attractive to the current trend in deep learning to minimise the number of bits used. It potentially helps any application to accelerate by enabling the use of fewer bits (since it has more fraction bits for accuracy) reducing network and memory bandwidth and power requirements. The format of an ''n''-bit posit is given a label of "posit" followed by the decimal digits of ''n'' (e.g., the 16-bit posit format is "posit16") and consists of four sequential fields: #
sign A sign is an object, quality, event, or entity whose presence or occurrence indicates the probable presence or occurrence of something else. A natural sign bears a causal relation to its object—for instance, thunder is a sign of storm, or me ...
: 1 bit, representing an unsigned integer ''s'' # regime: at least 2 bits and up to (''n'' âˆ’ 1), representing an unsigned integer ''r'' as described below #
exponent Exponentiation is a mathematical operation, written as , involving two numbers, the '' base'' and the ''exponent'' or ''power'' , and pronounced as " (raised) to the (power of) ". When is a positive integer, exponentiation corresponds to re ...
: up to 2 bits as available after regime, representing an unsigned integer ''e'' #
fraction A fraction (from la, fractus, "broken") represents a part of a whole or, more generally, any number of equal parts. When spoken in everyday English, a fraction describes how many parts of a certain size there are, for example, one-half, eight ...
: all remaining bits available after exponent, representing a non-negative real
dyadic rational In mathematics, a dyadic rational or binary rational is a number that can be expressed as a fraction whose denominator is a power of two. For example, 1/2, 3/2, and 3/8 are dyadic rationals, but 1/3 is not. These numbers are important in compute ...
''f'' less than 1 The regime field uses
unary coding Unary coding, or the unary numeral system and also sometimes called thermometer code, is an entropy encoding that represents a natural number, ''n'', with a code of length ''n'' + 1 ( or ''n'' ), usually ''n'' ones followed by a zero (if ...
of ''k'' identical bits, followed by a bit of opposite value if any remaining bits are available, to represent an unsigned integer ''r'' that is −''k'' if the first bit is 0 or ''k'' âˆ’ 1 if the first bit is 1. The sign, exponent, and fraction fields are analogous to IEEE 754 sign, exponent, and significand fields (respectively), except that the posit exponent and fraction fields may be absent or truncated and implicitly extended with zeroes—an absent exponent is treated as 002 (representing 0), a one-bit exponent E1 is treated as E102 (representing the integer 0 if E1 is 0 or 2 if E1 is 1), and an absent fraction is treated as 0. The two encodings in which all non-sign bits are 0 have special interpretations: * If the sign bit is 1, the posit value is NaR ("not a real") * If the sign bit is 0, the posit value is 0 (which is unsigned and the only value for which the sign function returns 0) Otherwise, the posit value is equal to ((1 - 3s) + f) \times 2^, in which ''r'' scales by powers of 16, ''e'' scales by powers of 2, ''f'' distributes values uniformly between adjacent combinations of (''r'', ''e''), and ''s'' adjusts the sign symmetrically about 0.


Examples

Note: 32-bit posit is expected to be sufficient to solve almost all classes of applications.


Quire

For each posit''n'' type of precision n, the standard defines a corresponding "quire" type quire''n'' of precision 16 \times n, used to accumulate exact sums of products of those posits without rounding or overflow in dot products for vectors of up to 231 or more elements (the exact limit is 2^). The quire format is a
two's complement Two's complement is a mathematical operation to reversibly convert a positive binary number into a negative binary number with equivalent (but negative) value, using the binary digit with the greatest place value (the leftmost bit in big- endian ...
signed integer, interpreted as a multiple of units of magnitude 2^ except for the special value with a leading sign bit of 1 and all other bits equal to 0 (which represents NaR). Quires are based on the work of
Ulrich W. Kulisch Ulrich W. Kulisch (born 1933 in Breslau) is a German mathematician specializing in numerical analysis, including the computer implementation of interval arithmetic. Experience After graduation from high school in Freising, Kulisch studied mathe ...
and
Willard L. Miranker Willard L. Miranker (March 8, 1932 – April 28, 2011) was an American mathematician and computer scientist, known for his contributions to applied mathematics and numerical mathematics. Raised in Brooklyn, New York, he earned B.A. (1952), M. ...
.


Valid

Valids are described as a Type III Unum mode that bounds results in a given range.


Implementations

Several software and hardware solutions implement posits. The first complete parameterized posit arithmetic hardware generator was proposed in 2018. Unum implementations have been explored in
Julia Julia is usually a feminine given name. It is a Latinate feminine form of the name Julio and Julius. (For further details on etymology, see the Wiktionary entry "Julius".) The given name ''Julia'' had been in use throughout Late Antiquity (e.g. ...
and
MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation ...
. A
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
version with support for any posit sizes combined with any number of exponent bits is available. A fast implementation in C, SoftPosit, provided by the NGA research team based on Berkeley SoftFloat adds to the available software implementations.


SoftPosit

SoftPosit is a software implementation of posits based on Berkeley SoftFloat. It allows software comparison between posits and floats. It currently supports * Add * Subtract * Multiply * Divide * Fused-multiply-add * Fused-dot-product (with quire) * Square root * Convert posit to signed and unsigned integer * Convert signed and unsigned integer to posit * Convert posit to another posit size * Less than, equal, less than equal comparison * Round to nearest integer


Helper functions

* convert double to posit * convert posit to double * cast unsigned integer to posit It works for 16-bit posits with one exponent bit and 8-bit posit with zero exponent bit. Support for 32-bit posits and flexible type (2-32 bits with two exponent bits) is pending validation. It supports x86_64 systems. It has been tested on
GNU GNU () is an extensive collection of free software (383 packages as of January 2022), which can be used as an operating system or can be used in parts with other operating systems. The use of the completed GNU tools led to the family of operat ...
gcc ( SUSE Linux) 4.8.5 Apple
LLVM LLVM is a set of compiler and toolchain technologies that can be used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is designed around a language-independent intermediate represen ...
version 9.1.0 (clang-902.0.39.2).


Examples

''Add with posit8_t'' #include "softposit.h" int main (int argc, char *argv[]) ''Fused dot product with quire16_t'' //Convert double to posit posit16_t pA = convertDoubleToP16(1.02783203125 ); posit16_t pB = convertDoubleToP16(0.987060546875); posit16_t pC = convertDoubleToP16(0.4998779296875); posit16_t pD = convertDoubleToP16(0.8797607421875); quire16_t qZ; //Set quire to 0 qZ = q16_clr(qZ); //accumulate products without roundings qZ = q16_fdp_add(qZ, pA, pB); qZ = q16_fdp_add(qZ, pC, pD); //Convert back to posit posit16_t pZ = q16_to_p16(qZ); //To check answer double dZ = convertP16ToDouble(pZ);


Critique

William M. Kahan, the principal architect of
IEEE 754-1985 IEEE 754-1985 was an industry standard for representing floating-point numbers in computers, officially adopted in 1985 and superseded in 2008 by IEEE 754-2008, and then again in 2019 by minor revision IEEE 754-2019. During its 23 years, it was the ...
criticizes type I unums on the following grounds (some are addressed in type II and type III standards): * The description of unums sidesteps using calculus for solving physics problems. * Unums can be expensive in terms of time and power consumption. * Each computation in unum space is likely to change the bit length of the structure. This requires either unpacking them into a fixed-size space, or data allocation, deallocation, and garbage collection during unum operations, similar to the issues for dealing with variable-length records in mass storage. * Unums provide only two kinds of numerical exception, quiet and signaling NaN (Not-a-Number). * Unum computation may deliver overly loose bounds from the selection of an algebraically correct but numerically unstable algorithm. * The benefits of unum over short precision floating point for problems requiring low precision are not obvious. * Solving differential equations and evaluating integrals with unums guarantee correct answers but may not be as fast as methods that usually work.


See also

*
Karlsruhe Accurate Arithmetic Karlsruhe Accurate Arithmetic (KAA) or Karlsruhe Accurate Arithmetic Approach (KAAA), augments conventional floating-point arithmetic with good error behaviour with new operations to calculate scalar products with a single rounding error. The fo ...
(KAA) *
Q (number format) The Q notation is a way to specify the parameters of a binary fixed point number format. For example, in Q notation, the number format denoted by Q8.8 means that the fixed point numbers in this format have 8 bits for the integer part and 8 bits ...
*
Significant figures Significant figures (also known as the significant digits, ''precision'' or ''resolution'') of a number in positional notation are digits in the number that are reliable and necessary to indicate the quantity of something. If a number expre ...
*
Floating-point error mitigation Floating-point error mitigation is the minimization of errors caused by the fact that real numbers cannot, in general, be accurately represented in a fixed space. By definition, floating-point error cannot be eliminated, and, at best, can only ...
* Elias gamma (γ) code *
Tapered floating point In computing, tapered floating point (TFP) is a format similar to Floating-point arithmetic, floating point, but with variable-sized entries for the significand and exponent instead of the fixed-length entries found in normal floating-point forma ...
(TFP)


References


Further reading

* * * * (NB. PDFs come without notes

*

* * * * * * * (Roger Stokes' download link

*


External links

* * * * {{cite web , url=https://www.johndcook.com/blog/2018/04/11/anatomy-of-a-posit-number/ , title=Anatomy of a posit number , date=2018-04-11 , access-date=2019-08-09 Floating point types Articles with example C code