In computing, minifloats are floating-point values represented with
very few bits. Predictably, they are not well suited for
general-purpose numerical calculations. They are used for special
purposes, most often in computer graphics, where iterations are small
and precision has aesthetic effects. Additionally, they are frequently
encountered as a pedagogical tool in computer-science courses to
demonstrate the properties and structures of floating-point arithmetic
and
Example
A minifloat in 1 byte (8 bits) with 1 sign bit, 4 exponent bits
and 3 mantissa bits (in short, a 1.4.3.−2 minifloat) should be used
to represent integral values. All
Representation of zero
0 0000 000 = 0

Subnormal numbers
The mantissa is extended with "0.":
0 0000 001 = 0.001₂ × 2^x = 0.125 × 2^x = 1 (least subnormal number)
...
0 0000 111 = 0.111₂ × 2^x = 0.875 × 2^x = 7 (greatest subnormal number)

Normalized numbers
The mantissa is extended with "1.":
0 0001 000 = 1.000₂ × 2^x = 1 × 2^x = 8 (least normalized number)
0 0001 001 = 1.001₂ × 2^x = 1.125 × 2^x = 9
...
0 0010 000 = 1.000₂ × 2^(x+1) = 1 × 2^(x+1) = 16
0 0010 001 = 1.001₂ × 2^(x+1) = 1.125 × 2^(x+1) = 18
...
0 1110 000 = 1.000₂ × 2^(x+13) = 1.000 × 2^(x+13) = 65536
0 1110 001 = 1.001₂ × 2^(x+13) = 1.125 × 2^(x+13) = 73728
...
0 1110 110 = 1.110₂ × 2^(x+13) = 1.750 × 2^(x+13) = 114688
0 1110 111 = 1.111₂ × 2^(x+13) = 1.875 × 2^(x+13) = 122880 (greatest normalized number)

Infinity
0 1111 000 = +infinity
1 1111 000 = −infinity
If the exponent field were not treated specially, the value would be
0 1111 000 = 1.000₂ × 2^(x+14) = 2^17 = 131072

Not a number
x 1111 yyy = NaN (if yyy ≠ 000)
Without the special treatment of the exponent field, the value would be
0 1111 111 = 1.111₂ × 2^(x+14) = 1.875 × 2^17 = 245760

Value of the bias
If the least subnormal value (0 0000 001) should be 1, the value of x has to be x = 3. Therefore the bias has to be −2; that is, every stored exponent has to be decreased by −2 (in other words, increased by 2) to get the numerical exponent.

All values
           ... 000 ... 001 ... 010 ... 011 ... 100 ... 101 ... 110 ... 111
0 0000 ...       0       1       2       3       4       5       6       7
0 0001 ...       8       9      10      11      12      13      14      15
0 0010 ...      16      18      20      22      24      26      28      30
0 0011 ...      32      36      40      44      48      52      56      60
0 0100 ...      64      72      80      88      96     104     112     120
0 0101 ...     128     144     160     176     192     208     224     240
0 0110 ...     256     288     320     352     384     416     448     480
0 0111 ...     512     576     640     704     768     832     896     960
0 1000 ...    1024    1152    1280    1408    1536    1664    1792    1920
0 1001 ...    2048    2304    2560    2816    3072    3328    3584    3840
0 1010 ...    4096    4608    5120    5632    6144    6656    7168    7680
0 1011 ...    8192    9216   10240   11264   12288   13312   14336   15360
0 1100 ...   16384   18432   20480   22528   24576   26624   28672   30720
0 1101 ...   32768   36864   40960   45056   49152   53248   57344   61440
0 1110 ...   65536   73728   81920   90112   98304  106496  114688  122880
0 1111 ...     Inf     NaN     NaN     NaN     NaN     NaN     NaN     NaN
1 0000 ...      −0      −1      −2      −3      −4      −5      −6      −7
1 0001 ...      −8      −9     −10     −11     −12     −13     −14     −15
1 0010 ...     −16     −18     −20     −22     −24     −26     −28     −30
1 0011 ...     −32     −36     −40     −44     −48     −52     −56     −60
1 0100 ...     −64     −72     −80     −88     −96    −104    −112    −120
1 0101 ...    −128    −144    −160    −176    −192    −208    −224    −240
1 0110 ...    −256    −288    −320    −352    −384    −416    −448    −480
1 0111 ...    −512    −576    −640    −704    −768    −832    −896    −960
1 1000 ...   −1024   −1152   −1280   −1408   −1536   −1664   −1792   −1920
1 1001 ...   −2048   −2304   −2560   −2816   −3072   −3328   −3584   −3840
1 1010 ...   −4096   −4608   −5120   −5632   −6144   −6656   −7168   −7680
1 1011 ...   −8192   −9216  −10240  −11264  −12288  −13312  −14336  −15360
1 1100 ...  −16384  −18432  −20480  −22528  −24576  −26624  −28672  −30720
1 1101 ...  −32768  −36864  −40960  −45056  −49152  −53248  −57344  −61440
1 1110 ...  −65536  −73728  −81920  −90112  −98304 −106496 −114688 −122880
1 1111 ...    −Inf     NaN     NaN     NaN     NaN     NaN     NaN     NaN

However, in practice, floats are not shown exactly.[citation needed] Instead, they are rounded; for example, if a float had about 3 significant digits, and the number 8192 was represented, it would be rounded to 8190 to avoid false precision.[citation needed]

Properties of this example
[Figure: Graphical representation of integral (1.4.3.−2) minifloats]

Integral minifloats in 1 byte have a greater range (±122 880) than two's-complement integers, which range from −128 to +127. The greater range is paid for with poor precision, because there are only 3 stored mantissa bits (4 significant bits counting the implicit leading bit), equivalent to slightly more than one decimal place. They also have a greater range than half-precision minifloats, whose range is ±65 504; this too is paid for with a lack of fractions and poor precision.

There are only 242 different values (if +0 and −0 are regarded as different), because 14 bit patterns represent NaN.

The values between 0 and 16 have the same bit pattern as a minifloat and as a two's-complement integer. The first pattern with a different value is 00010001, which is 18 as a minifloat and 17 as a two's-complement integer.

This coincidence does not occur at all with negative values, because this minifloat is a signed-magnitude format.

The (vertical) real line on the right clearly shows the varying density of the floating-point values, a property which is common to any floating-point system. This varying density results in a curve similar to the exponential function.

Although the curve may appear smooth, this is not the case. The graph actually consists of distinct points, and these points lie on line segments with discrete slopes. The value of the exponent bits determines the absolute precision of the mantissa bits, and it is this precision that determines the slope of each linear segment.

Arithmetic
Addition
[Figure: Addition of (1.3.2.3)-minifloats]

The graphic demonstrates the addition of even smaller
(1.3.2.3)-minifloats with 6 bits. This floating-point system follows
the rules of
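
The decoding rules of the 1.4.3.−2 example (zero, subnormal and normalized numbers, infinity, NaN, and a bias of −2) can be sketched in Python as follows. This is only an illustrative sketch: the function name `decode` and its parameters are hypothetical, and applying the same conventions to the (1.3.2.3) format is an assumption made here for illustration.

```python
import math

def decode(bits, exp_bits=4, man_bits=3, bias=-2):
    """Decode a minifloat bit pattern into a Python float.

    Hypothetical helper; the defaults describe the 1.4.3.-2 example.
    Assumes the numerical exponent is the stored exponent minus the bias.
    """
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    fraction = man / (1 << man_bits)

    if exp == (1 << exp_bits) - 1:
        # All-ones exponent: infinity if the mantissa is zero, NaN otherwise.
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:
        # Zero and subnormals: mantissa extended with "0.",
        # using the same scale as the smallest normalized exponent.
        return sign * fraction * 2.0 ** (1 - bias)
    # Normalized numbers: mantissa extended with "1.".
    return sign * (1.0 + fraction) * 2.0 ** (exp - bias)

# Every bit pattern of the 8-bit example, in order.
all_values = [decode(b) for b in range(256)]
```

For instance, `decode(0b00010001)` returns 18.0 and `decode(0b01110111)` returns 122880.0, matching the table of all values. Under the assumed conventions, `decode(0b011011, exp_bits=3, man_bits=2, bias=3)` returns 14.0 for the 6-bit format.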
Subtraction, multiplication and division
[Figures: Subtraction; Multiplication; Division]

See also
Fixed-point arithmetic

References
Buck, Ian (2005-03-13), "Chapter 32. Taking the Plunge into GPU Computing", in Pharr, Matt, GPU Gems, ISBN 0-321-33559-7, retrieved 2018-04-05.
Munafo, Robert (15 May 2016). "Survey of Floating-Point Formats". Retrieved 8 August 2016.

External links
OpenGL half float pixel