In computing, half precision (sometimes called FP16) is a binary floating-point computer number format that occupies 16 bits (two bytes in modern computers) in computer memory. It is intended for storage of floating-point values in applications where higher precision is not essential, in particular image processing and neural networks.

Almost all modern uses follow the IEEE 754-2008 standard, where the 16-bit base-2 format is referred to as binary16, and the exponent uses 5 bits. This can express values in the range ±65,504, with the minimum value above 1 being 1 + 1/1024. Depending on the computer, half precision can be over an order of magnitude faster than double precision, e.g. 550 PFLOPS for half precision vs 37 PFLOPS for double precision on one cloud provider.

History

Several earlier 16-bit floating-point formats have existed, including that of Hitachi's HD61810 DSP of 1982, Scott's WIF and the 3dfx Voodoo Graphics processor.

Industrial Light & Magic (ILM) was searching for an image format that could handle a wide dynamic range, but without the hard drive and memory cost of single or double precision floating point. The hardware-accelerated programmable shading group led by John Airey at SGI (Silicon Graphics) invented the s10e5 data type in 1997 as part of the 'bali' design effort. This is described in a SIGGRAPH 2000 paper (see section 4.3) and further documented in US patent 7518615. It was popularized by its use in the open-source OpenEXR image format.

Nvidia and Microsoft defined the half datatype in the Cg language, released in early 2002, and implemented it in silicon in the GeForce FX, released in late 2002. Since then, support for 16-bit floating-point math in graphics cards has become very common. The F16C extension in 2012 allows x86 processors to convert half-precision floats to and from single-precision floats with a machine instruction.

IEEE 754 half-precision binary floating-point format: binary16

The IEEE 754 standard specifies a binary16 as having the following format:
* Sign bit: 1 bit
* Exponent width: 5 bits
* Significand precision: 11 bits (10 explicitly stored)

The format is laid out, from the most significant bit to the least significant, as the sign bit, the 5-bit exponent field, and the 10-bit significand field. The format is assumed to have an implicit lead bit with value 1 unless the exponent field is stored with all zeros. Thus only 10 bits of the significand appear in the memory format but the total precision is 11 bits. In IEEE 754 parlance, there are 10 bits of significand, but there are 11 bits of significand precision (log₁₀(2¹¹) ≈ 3.311 decimal digits, or 4 digits ± slightly less than 5 units in the last place).
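The field widths given above can be illustrated with a minimal Python sketch that splits a 16-bit pattern into its three fields. The function name `split_binary16` is hypothetical, and the sketch assumes the pattern is held in an ordinary integer with the sign in bit 15.

```python
import math

def split_binary16(bits: int) -> tuple[int, int, int]:
    """Split a 16-bit pattern into (sign, exponent field, significand field).

    Assumes the sign is bit 15, the exponent occupies bits 14..10,
    and the stored significand occupies bits 9..0.
    """
    sign = (bits >> 15) & 0x1        # 1 bit
    exponent = (bits >> 10) & 0x1F   # 5 bits, still biased
    fraction = bits & 0x3FF          # 10 explicitly stored bits
    return sign, exponent, fraction

# 11 bits of significand precision correspond to about 3.311 decimal digits.
decimal_digits = math.log10(2 ** 11)
```

For example, `split_binary16(0x3C00)` yields the fields of the value 1.0: sign 0, biased exponent 15, and a zero stored significand.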

Exponent encoding

The half-precision binary floating-point exponent is encoded using an offset-binary representation, with the zero offset being 15; this offset is known as the exponent bias in the IEEE 754 standard.
* E_min = 00001₂ − 01111₂ = −14
* E_max = 11110₂ − 01111₂ = 15
* Exponent bias = 01111₂ = 15

Thus, as defined by the offset-binary representation, the offset of 15 has to be subtracted from the stored exponent to get the true exponent. The stored exponents 00000₂ and 11111₂ are interpreted specially: 00000₂ encodes zero and subnormal numbers, and 11111₂ encodes infinities and NaNs. The minimum strictly positive (subnormal) value is 2⁻²⁴ ≈ 5.96 × 10⁻⁸. The minimum positive normal value is 2⁻¹⁴ ≈ 6.10 × 10⁻⁵. The maximum representable value is (2−2⁻¹⁰) × 2¹⁵ = 65504.
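The exponent encoding described above can be turned into a small decoder. The following Python sketch is illustrative only; `decode_binary16` is a hypothetical name, and the treatment of the two special exponent fields (00000₂ for zero and subnormals, 11111₂ for infinity and NaN) follows the usual IEEE 754 conventions.

```python
BIAS = 15  # exponent bias, 01111 in binary

def decode_binary16(bits: int) -> float:
    """Decode a 16-bit IEEE 754 binary16 pattern into a Python float."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF

    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, exponent fixed at E_min.
        magnitude = (fraction / 1024) * 2.0 ** (1 - BIAS)
    elif exponent == 0x1F:
        # All-ones exponent: infinity if the fraction is zero, otherwise NaN.
        magnitude = float("inf") if fraction == 0 else float("nan")
    else:
        # Normal number: implicit leading 1, true exponent = stored - bias.
        magnitude = (1 + fraction / 1024) * 2.0 ** (exponent - BIAS)

    return -magnitude if sign else magnitude

min_subnormal = decode_binary16(0x0001)  # 2**-24
min_normal = decode_binary16(0x0400)     # 2**-14
max_finite = decode_binary16(0x7BFF)     # (2 - 2**-10) * 2**15
```

The three patterns at the end are the smallest subnormal, the smallest normal, and the largest finite encodings, so the results can be compared directly with the limits quoted in the text.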

Half precision examples

These examples are given in bit representation of the floating-point value. This includes the sign bit, (biased) exponent, and significand.

By default, 1/3 rounds down like for double precision, because of the odd number of bits in the significand. The bits beyond the rounding point are ... which is less than 1/2 of a unit in the last place.
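Bit representations of this kind can be reproduced with Python's standard `struct` module, whose `e` format code packs a float as IEEE 754 binary16. The helper names `half_bits` and `format_fields` below are hypothetical, and the sample values are chosen only for illustration.

```python
import struct

def half_bits(x: float) -> int:
    """Return the 16-bit binary16 pattern that x rounds to."""
    # '<e' packs as little-endian binary16; '<H' reads the same bytes as an integer.
    return struct.unpack("<H", struct.pack("<e", x))[0]

def format_fields(bits: int) -> str:
    """Render a pattern as 'sign exponent significand' in binary."""
    return f"{bits >> 15:01b} {(bits >> 10) & 0x1F:05b} {bits & 0x3FF:010b}"

for value in (1.0, -2.0, 1 / 3, 65504.0):
    print(f"{value!r:>20} -> {format_fields(half_bits(value))}")
```

For 1/3 the stored significand is 0101010101₂ with a biased exponent of 01101₂, which shows the round-down behaviour described above.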

Precision limitations

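65519 is the largest integer that rounds to a finite value (65504); 65520 and larger round to infinity. The cutoff arises because 65520 lies exactly halfway between 65504 and 2¹⁶ = 65536, which is not representable. This applies to round-to-nearest-even; other rounding modes change this cutoff.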
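The cutoff can be checked with a short Python sketch. It relies on the standard `struct` module's `e` format, which rounds to nearest-even. CPython signals an overflow with `OverflowError` instead of returning infinity, so the hypothetical helper `round_to_half` maps that case to infinity to mirror IEEE behaviour; that mapping is an assumption of this sketch, not part of the standard library.

```python
import math
import struct

def round_to_half(x: float) -> float:
    """Round x to the nearest binary16 value, overflowing to infinity."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values that would round past the largest finite
        # binary16 number; IEEE 754 arithmetic would produce infinity here.
        return math.copysign(math.inf, x)

below_cutoff = round_to_half(65519.0)
at_cutoff = round_to_half(65520.0)
```

With round-to-nearest-even, `below_cutoff` is the largest finite value and `at_cutoff` is infinite.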

ARM alternative half-precision

ARM processors support (via a floating-point control register bit) an "alternative half-precision" format, which does away with the special case for an exponent value of 31 (11111₂). It is almost identical to the IEEE format, but there is no encoding for infinity or NaNs; instead, an exponent of 31 encodes normalized numbers in the range 65536 to 131008.
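The difference from the IEEE format can be sketched as a decoder in which the exponent field 31 is treated like any other normal exponent. This is an illustrative Python sketch, not ARM's reference behaviour: `decode_arm_alt_half` is a hypothetical name, and it assumes that an exponent field of 0 keeps its IEEE meaning (zero and subnormals).

```python
def decode_arm_alt_half(bits: int) -> float:
    """Decode a 16-bit pattern in the ARM alternative half-precision format."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF

    if exponent == 0:
        # Assumed identical to IEEE binary16: zero or subnormal.
        magnitude = (fraction / 1024) * 2.0 ** -14
    else:
        # Exponent 31 is not special: it encodes normal numbers scaled by 2**16.
        magnitude = (1 + fraction / 1024) * 2.0 ** (exponent - 15)

    return -magnitude if sign else magnitude

smallest_extra = decode_arm_alt_half(0x7C00)  # exponent 31, fraction 0
largest_extra = decode_arm_alt_half(0x7FFF)   # exponent 31, fraction all ones
```

The two patterns at the end would be infinity and a NaN under IEEE binary16; here they bound the extra range of normalized numbers.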

Uses of half precision

This format is used in several computer graphics environments to store pixels, including MATLAB, JPEG XR, GIMP, OpenGL, Cg, Direct3D, and D3DX. The advantage over 8-bit or 16-bit integers is that the increased dynamic range allows for more detail to be preserved in highlights and shadows for images, and the linear representation of intensity makes calculations easier. The advantage over 32-bit single-precision floating point is that it requires half the storage and bandwidth (at the expense of precision and range).

Hardware and software for machine learning or neural networks tend to use half precision: such applications usually do a large amount of calculation, but do not require a high level of precision.

If the hardware has instructions to compute half-precision math, it is often faster than single or double precision. If the system has SIMD instructions that can handle multiple floating-point numbers within one instruction, half precision can be twice as fast by operating on twice as many numbers simultaneously. However, if there is no hardware support, math must be done by emulation, or by conversion to single or double precision and then back, and is therefore slower.
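The storage trade-off against single precision can be illustrated with Python's standard `struct` module, whose `e` and `f` format codes pack binary16 and binary32 values respectively. The pixel intensities below are hypothetical sample data chosen only for illustration.

```python
import struct

# Hypothetical linear pixel intensities, including values above 1.0
# of the kind found in high-dynamic-range images.
pixels = [0.001, 0.18, 0.5, 1.0, 4.0, 1000.0]
count = len(pixels)

half_bytes = struct.pack(f"<{count}e", *pixels)    # 2 bytes per value
single_bytes = struct.pack(f"<{count}f", *pixels)  # 4 bytes per value

# Round-trip through half precision to measure what was lost.
restored = struct.unpack(f"<{count}e", half_bytes)
max_rel_error = max(abs(r - p) / p for r, p in zip(restored, pixels))

print(len(half_bytes), len(single_bytes), max_rel_error)
```

The half-precision buffer is half the size of the single-precision one. For values in the normal range, the relative rounding error stays within 2⁻¹¹, which follows from the 11 bits of significand precision.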

Hardware support

Several versions of the ARM architecture have support for half precision. Support for half precision in the x86 instruction set is specified in the AVX-512_FP16 instruction set extension, implemented in the Intel Sapphire Rapids processor.
