HOME

TheInfoList



OR:

In
computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, e ...
, half precision (sometimes called FP16) is a
binary Binary may refer to: Science and technology Mathematics * Binary number, a representation of numbers using only two digits (0 and 1) * Binary function, a function that takes two arguments * Binary operation, a mathematical operation that t ...
floating-point In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can b ...
computer number format A computer number format is the internal representation of numeric values in digital device hardware and software, such as in programmable computers and calculators. Numerical values are stored as groupings of bits, such as bytes and words. The e ...
that occupies
16 bit 16-bit microcomputers are microcomputers that use 16-bit microprocessors. A 16-bit register can store 216 different values. The range of integer values that can be stored in 16 bits depends on the integer representation used. With the two mos ...
s (two bytes in modern computers) in
computer memory In computing, memory is a device or system that is used to store information for immediate use in a computer or related computer hardware and digital electronic devices. The term ''memory'' is often synonymous with the term ''primary storage ...
. It is intended for storage of floating-point values in applications where higher precision is not essential, in particular
image processing An image is a visual representation of something. It can be two-dimensional, three-dimensional, or somehow otherwise feed into the visual system to convey information. An image can be an artifact, such as a photograph or other two-dimensiona ...
and
neural network A neural network is a network or circuit of biological neurons, or, in a modern sense, an artificial neural network, composed of artificial neurons or nodes. Thus, a neural network is either a biological neural network, made up of biological ...
s. Almost all modern uses follow the
IEEE 754-2008 The Institute of Electrical and Electronics Engineers (IEEE) is a 501(c)(3) professional association for electronic engineering and electrical engineering (and associated disciplines) with its corporate office in New York City and its operation ...
standard, where the 16-bit
base-2 A binary number is a number expressed in the base-2 numeral system or binary numeral system, a method of mathematical expression which uses only two symbols: typically "0" (zero) and "1" ( one). The base-2 numeral system is a positional notatio ...
format is referred to as binary16, and the exponent uses 5 bits. This can express values in the range ±65,504, with the minimum value above 1 being 1 + 1/1024. Depending on the computer, half-precision can be over an order of magnitude faster than double precision, e.g. 550 PFLOPS for half-precision vs 37 PFLOPS for double precision on one cloud provider.


History

Several earlier 16-bit floating point formats have existed including that of Hitachi's HD61810 DSP of 1982, Scott's WIF and the 3dfx Voodoo Graphics processor.
ILM Ilm or ILM may refer to: Acronyms * Identity Lifecycle Manager, a Microsoft Server Product * ''I Love Money,'' a TV show on VH1 * Independent Loading Mechanism, a mounting system for CPU sockets * Industrial Light & Magic, an American motion pic ...
was searching for an image format that could handle a wide
dynamic range Dynamic range (abbreviated DR, DNR, or DYR) is the ratio between the largest and smallest values that a certain quantity can assume. It is often used in the context of signals, like sound and light. It is measured either as a ratio or as a base-1 ...
, but without the hard drive and memory cost of single or double precision floating point. The hardware-accelerated programmable shading group led by John Airey at SGI (Silicon Graphics) invented the s10e5 data type in 1997 as part of the 'bali' design effort. This is described in a
SIGGRAPH SIGGRAPH (Special Interest Group on Computer Graphics and Interactive Techniques) is an annual conference on computer graphics (CG) organized by the ACM SIGGRAPH, starting in 1974. The main conference is held in North America; SIGGRAPH Asia ...
2000 paper (see section 4.3) and further documented in US patent 7518615. It was popularized by its use in the open-source
OpenEXR OpenEXR is a high-dynamic range, multi-channel raster file format, released as an open standard along with a set of software tools created by Industrial Light & Magic (ILM), under a free software license similar to the BSD license. It is notabl ...
image format.
Nvidia Nvidia CorporationOfficially written as NVIDIA and stylized in its logo as VIDIA with the lowercase "n" the same height as the uppercase "VIDIA"; formerly stylized as VIDIA with a large italicized lowercase "n" on products from the mid 1990s to ...
and
Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washing ...
defined the half
datatype In computer science and computer programming, a data type (or simply type) is a set of possible values and a set of allowed operations on it. A data type tells the compiler or interpreter how the programmer intends to use the data. Most progra ...
in the Cg language, released in early 2002, and implemented it in silicon in the
GeForce FX The GeForce FX or "GeForce 5" series (codenamed NV30) is a line of graphics processing units from the manufacturer Nvidia. Overview Nvidia's GeForce FX series is the fifth generation of the GeForce line. With GeForce 3, the company introduced pr ...
, released in late 2002. Since then support for 16-bit floating point math in graphics cards has become very common. The
F16C The F16C (previously/informally known as CVT16) instruction set is an x86 instruction set architecture extension which provides support for converting between half-precision and standard IEEE single-precision floating-point formats. History T ...
extension in 2012 allows x86 processors to convert half-precision floats to and from single-precision floats with a machine instruction.


IEEE 754 half-precision binary floating-point format: binary16

The IEEE 754 standard specifies a binary16 as having the following format: *
Sign bit In computer science, the sign bit is a bit in a signed number representation that indicates the sign of a number. Although only signed numeric data types have a sign bit, it is invariably located in the most significant bit position, so the term ...
: 1 bit *
Exponent Exponentiation is a mathematical operation, written as , involving two numbers, the '' base'' and the ''exponent'' or ''power'' , and pronounced as " (raised) to the (power of) ". When is a positive integer, exponentiation corresponds to re ...
width: 5 bits *
Significand The significand (also mantissa or coefficient, sometimes also argument, or ambiguously fraction or characteristic) is part of a number in scientific notation or in floating-point representation, consisting of its significant digits. Depending on ...
precision Precision, precise or precisely may refer to: Science, and technology, and mathematics Mathematics and computing (general) * Accuracy and precision, measurement deviation from true value and its scatter * Significant figures, the number of digit ...
: 11 bits (10 explicitly stored) The format is laid out as follows: The format is assumed to have an implicit lead bit with value 1 unless the exponent field is stored with all zeros. Thus only 10 bits of the
significand The significand (also mantissa or coefficient, sometimes also argument, or ambiguously fraction or characteristic) is part of a number in scientific notation or in floating-point representation, consisting of its significant digits. Depending on ...
appear in the memory format but the total precision is 11 bits. In IEEE 754 parlance, there are 10 bits of significand, but there are 11 bits of significand precision (log10(211) ≈ 3.311 decimal digits, or 4 digits ± slightly less than 5 units in the last place).


Exponent encoding

The half-precision binary floating-point exponent is encoded using an
offset-binary Offset binary, also referred to as excess-K, excess-''N'', excess-e, excess code or biased representation, is a method for signed number representation where a signed number n is represented by the bit pattern corresponding to the unsigned numbe ...
representation, with the zero offset being 15; also known as exponent bias in the IEEE 754 standard. * Emin = 000012 − 011112 = −14 * Emax = 111102 − 011112 = 15 *
Exponent bias In IEEE 754 Floating-point arithmetic, floating-point numbers, the exponent is biased in the biasing, engineering sense of the word – the value stored is offset from the actual value by the exponent bias, also called a biased exponent. Biasing is ...
= 011112 = 15 Thus, as defined by the offset binary representation, in order to get the true exponent the offset of 15 has to be subtracted from the stored exponent. The stored exponents 000002 and 111112 are interpreted specially. The minimum strictly positive (subnormal) value is 2−24 ≈ 5.96 × 10−8. The minimum positive normal value is 2−14 ≈ 6.10 × 10−5. The maximum representable value is (2−2−10) × 215 = 65504.


Half precision examples

These examples are given in bit representation of the floating-point value. This includes the sign bit, (biased) exponent, and significand. By default, 1/3 rounds down like for
double precision Double-precision floating-point format (sometimes called FP64 or float64) is a floating-point number format, usually occupying 64 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. Flo ...
, because of the odd number of bits in the significand. The bits beyond the rounding point are ... which is less than 1/2 of a
unit in the last place In computer science and numerical analysis, unit in the last place or unit of least precision (ulp) is the spacing between two consecutive floating-point numbers, i.e., the value the least significant digit (rightmost digit) represents if it is 1 ...
.


Precision limitations

65519 is the largest number that will round to a finite number (65504), 65520 and larger will round to infinity. This is for round-to-even, other rounding strategies will change this cutoff.


ARM alternative half-precision

ARM processors support (via a floating point
control register A control register is a processor register which changes or controls the general behavior of a CPU or other digital device. Common tasks performed by control registers include interrupt control, switching the addressing mode, paging control, a ...
bit) an "alternative half-precision" format, which does away with the special case for an exponent value of 31 (111112). It is almost identical to the IEEE format, but there is no encoding for infinity or NaNs; instead, an exponent of 31 encodes normalized numbers in the range 65536 to 131008.


Uses of half precision

This format is used in several
computer graphics Computer graphics deals with generating images with the aid of computers. Today, computer graphics is a core technology in digital photography, film, video games, cell phone and computer displays, and many specialized applications. A great de ...
environments to store pixels, including
MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation ...
,
OpenEXR OpenEXR is a high-dynamic range, multi-channel raster file format, released as an open standard along with a set of software tools created by Industrial Light & Magic (ILM), under a free software license similar to the BSD license. It is notabl ...
,
JPEG XR JPEG XR (JPEG extended range) is an image compression standard for continuous tone photographic images, based on the HD Photo (formerly Windows Media Photo) specifications that Microsoft originally developed and patented. It supports both lossy a ...
,
GIMP GIMP ( ; GNU Image Manipulation Program) is a free and open-source raster graphics editor used for image manipulation (retouching) and image editing, free-form drawing, transcoding between different image file formats, and more specialized task ...
,
OpenGL OpenGL (Open Graphics Library) is a cross-language, cross-platform application programming interface (API) for rendering 2D and 3D vector graphics. The API is typically used to interact with a graphics processing unit (GPU), to achieve hardwa ...
, Cg,
Direct3D Direct3D is a graphics application programming interface (API) for Microsoft Windows. Part of DirectX, Direct3D is used to render three-dimensional graphics in applications where performance is important, such as games. Direct3D uses hardware a ...
, and
D3DX In computing, D3DX (Direct3D Extension) is a high level API library which is written to supplement Microsoft's Direct3D graphics API. The D3DX library was introduced in Direct3D 7, and subsequently was improved in Direct3D 9. It provides classe ...
. The advantage over 8-bit or 16-bit integers is that the increased
dynamic range Dynamic range (abbreviated DR, DNR, or DYR) is the ratio between the largest and smallest values that a certain quantity can assume. It is often used in the context of signals, like sound and light. It is measured either as a ratio or as a base-1 ...
allows for more detail to be preserved in highlights and
shadow A shadow is a dark area where light from a light source is blocked by an opaque object. It occupies all of the three-dimensional volume behind an object with light in front of it. The cross section of a shadow is a two-dimensional silhouette, o ...
s for images, and the linear representation of intensity making calculations easier. The advantage over 32-bit
single-precision Single-precision floating-point format (sometimes called FP32 or float32) is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. A floating- ...
floating point is that it requires half the storage and
bandwidth Bandwidth commonly refers to: * Bandwidth (signal processing) or ''analog bandwidth'', ''frequency bandwidth'', or ''radio bandwidth'', a measure of the width of a frequency range * Bandwidth (computing), the rate of data transfer, bit rate or thr ...
(at the expense of precision and range). Hardware and software for
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
or
neural networks A neural network is a network or circuit of biological neurons, or, in a modern sense, an artificial neural network, composed of artificial neurons or nodes. Thus, a neural network is either a biological neural network, made up of biological ...
tend to use half precision: such applications usually do a large amount of calculation, but don't require a high level of precision. If the hardware has instructions to compute half-precision math, it is often faster than single or double precision. If the systems has
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
instructions that can handle multiple floating-point numbers within one instruction, half precision can be twice as fast by operating on twice as many numbers simultaneously. However, if there is no hardware support, math must be done by emulation, or by conversion to single or double precision and then back, and is therefore slower.


Hardware support

Several versions of the
ARM architecture ARM (stylised in lowercase as arm, formerly an acronym for Advanced RISC Machines and originally Acorn RISC Machine) is a family of reduced instruction set computer (RISC) instruction set architectures for computer processors, configured ...
have support for half precision. Support for half precision in the
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introd ...
instruction set In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
is specified in the AVX-512_FP16 instruction set extension to be implemented in the future Intel
Sapphire Rapids Sapphire Rapids is a List of Intel codenames, codename for Intel's server (fourth generation Xeon Scalable) and workstation processors based on 7 nm process, Intel 7. Sapphire Rapids was intended as part of the Eagle Stream server platform. In a ...
processor.


See also

* bfloat16 floating-point format: Alternative 16-bit floating-point format with 8 bits of exponent and 7 bits of mantissa *
Minifloat In computing, minifloats are floating-point values represented with very few bits. Predictably, they are not well suited for general-purpose numerical calculations. They are used for special purposes, most often in computer graphics, where iter ...
: small floating-point formats *
IEEE 754 The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard for floating-point arithmetic established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). The standard addressed many problems found i ...
: IEEE standard for floating-point arithmetic (IEEE 754) *
ISO/IEC 10967 ISO/IEC 10967, Language independent arithmetic (LIA), is a series of standards on computer arithmetic. It is compatible with ISO/IEC/IEEE 60559:2011, more known as IEEE 754-2008, and much of the specifications are for IEEE 754 special values (tho ...
, Language Independent Arithmetic *
Primitive data type In computer science, primitive data types are a set of basic data types from which all other data types are constructed. Specifically it often refers to the limited set of data representations in use by a particular processor, which all compiled pr ...
*
RGBE image format RGBE or Radiance HDR is an image format invented by Gregory Ward Larson for the Radiance rendering system. It stores pixels as one byte each for RGB (red, green, and blue) values with a one byte shared exponent. Thus it stores four bytes per pixe ...
* Power Management Bus § Linear11 Floating Point Format


References


Further reading


Khronos Vulkan signed 16-bit floating point format


External links



(in ''Survey of Floating-Point Formats'')
OpenEXR site

Half precision constants
from
D3DX In computing, D3DX (Direct3D Extension) is a high level API library which is written to supplement Microsoft's Direct3D graphics API. The D3DX library was introduced in Direct3D 7, and subsequently was improved in Direct3D 9. It provides classe ...

OpenGL treatment of half precision

Fast Half Float Conversions

Analog Devices variant
(four-bit exponent)
C source code to convert between IEEE double, single, and half precision can be found here

Java source code for half-precision floating-point conversion


{{DEFAULTSORT:Half-Precision Floating-Point Format Binary arithmetic Floating point types