In computing, half precision (sometimes called FP16) is a binary floating-point computer number format that occupies 16 bits (two bytes in modern computers) in computer memory. It is intended for storage of floating-point values in applications where higher precision is not essential, in particular image processing and neural networks.
Almost all modern uses follow the IEEE 754-2008 standard, where the 16-bit base-2 format is referred to as binary16, and the exponent uses 5 bits. This can express values in the range ±65,504, with the minimum value above 1 being 1 + 1/1024.
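These limits can be checked directly in Python, whose standard-library `struct` module supports the IEEE 754 binary16 layout through the `'e'` format code (available since Python 3.6). A quick sketch, not part of the original text:

```python
import struct

# Convert between a binary16 bit pattern and the float value it encodes,
# using struct's 'e' (half precision) and 'H' (unsigned 16-bit) formats.
def to_half_bits(x: float) -> int:
    return struct.unpack('<H', struct.pack('<e', x))[0]

def from_half_bits(bits: int) -> float:
    return struct.unpack('<e', struct.pack('<H', bits))[0]

print(from_half_bits(0x7BFF))  # 65504.0, the largest finite binary16 value
print(from_half_bits(0x3C01))  # 1.0009765625 = 1 + 1/1024, smallest value above 1
```

The bit patterns `0x7BFF` and `0x3C01` are the maximum finite value and the successor of 1, respectively.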
Depending on the computer, half-precision can be over an order of magnitude faster than double precision, e.g. 550 PFLOPS for half-precision vs 37 PFLOPS for double precision on one cloud provider.
History
Several earlier 16-bit floating-point formats have existed, including that of Hitachi's HD61810 DSP of 1982, Scott's WIF, and the 3dfx Voodoo Graphics processor.
ILM (Industrial Light & Magic) was searching for an image format that could handle a wide dynamic range, but without the hard drive and memory cost of single- or double-precision floating point.
The hardware-accelerated programmable shading group led by John Airey at SGI (Silicon Graphics) invented the s10e5 data type in 1997 as part of the 'bali' design effort. This is described in a SIGGRAPH 2000 paper (see section 4.3) and further documented in US patent 7518615.
It was popularized by its use in the open-source
OpenEXR image format.
Nvidia and Microsoft defined the half datatype in the Cg language, released in early 2002, and implemented it in silicon in the GeForce FX, released in late 2002. Since then, support for 16-bit floating-point math in graphics cards has become very common.
The
F16C extension in 2012 allows x86 processors to convert half-precision floats to and from single-precision floats with a machine instruction.
IEEE 754 half-precision binary floating-point format: binary16
The IEEE 754 standard specifies a binary16 as having the following format:
* Sign bit: 1 bit
* Exponent width: 5 bits
* Significand precision: 11 bits (10 explicitly stored)
The format is laid out, from the most significant to the least significant bit, as the sign bit, the 5 exponent bits, and the 10 explicitly stored significand bits.
The format is assumed to have an implicit lead bit with value 1 unless the exponent field is stored with all zeros. Thus only 10 bits of the significand appear in the memory format, but the total precision is 11 bits. In IEEE 754 parlance, there are 10 bits of significand, but there are 11 bits of significand precision (log₁₀(2¹¹) ≈ 3.311 decimal digits, or 4 digits ± slightly less than 5 units in the last place).
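The 11 bits of significand precision mean that integers are exact only up to 2¹¹ = 2048; beyond that, the spacing between representable values doubles. A quick illustration using Python's built-in binary16 support (the `struct` `'e'` format):

```python
import struct

def round_half(x: float) -> float:
    # Round-trip a float through IEEE 754 binary16 ('e' struct format).
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(round_half(2048.0))  # 2048.0: exactly representable (2^11)
print(round_half(2049.0))  # 2048.0: 11 bits of precision cannot distinguish 2049
print(round_half(2050.0))  # 2050.0: above 2048 the spacing between values is 2
```

2049 lies exactly halfway between the representable values 2048 and 2050, so round-to-nearest-even selects 2048.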
Exponent encoding
The half-precision binary floating-point exponent is encoded using an offset-binary representation, with the zero offset being 15; this is also known as the exponent bias in the IEEE 754 standard.
* Emin = 00001₂ − 01111₂ = −14
* Emax = 11110₂ − 01111₂ = 15
* Exponent bias = 01111₂ = 15
Thus, as defined by the offset-binary representation, the true exponent is obtained by subtracting the offset of 15 from the stored exponent.
The stored exponents 00000₂ and 11111₂ are interpreted specially.
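The encoding rules, including both special exponents, can be sketched as a small decoder. This is an illustrative reimplementation, cross-checked against the Python standard library's own binary16 conversion:

```python
import math
import struct

def decode_binary16(bits: int) -> float:
    """Decode a 16-bit integer as an IEEE 754 binary16 value (illustrative sketch)."""
    sign = -1.0 if bits >> 15 else 1.0
    stored_exp = (bits >> 10) & 0x1F      # 5-bit stored (biased) exponent
    frac = bits & 0x3FF                   # 10 explicitly stored significand bits
    if stored_exp == 0:                   # zeros and subnormals: no implicit lead 1
        return sign * frac * 2.0 ** -24
    if stored_exp == 0x1F:                # all-ones exponent: infinities and NaNs
        return sign * math.inf if frac == 0 else math.nan
    # Normal numbers: implicit lead 1, true exponent = stored exponent - 15 (the bias)
    return sign * (1.0 + frac / 1024.0) * 2.0 ** (stored_exp - 15)

# Cross-check against the standard library's 'e' half-precision format.
for bits in (0x3C00, 0xC000, 0x7BFF, 0x0001, 0x0400):
    assert decode_binary16(bits) == struct.unpack('<e', struct.pack('<H', bits))[0]
```

All of these values are exactly representable in a Python float (binary64), so the comparisons are exact.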
The minimum strictly positive (subnormal) value is 2⁻²⁴ ≈ 5.96 × 10⁻⁸.
The minimum positive normal value is 2⁻¹⁴ ≈ 6.10 × 10⁻⁵.
The maximum representable value is (2 − 2⁻¹⁰) × 2¹⁵ = 65504.
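These three extremes follow directly from the format parameters and can be verified with plain arithmetic:

```python
# Minimum subnormal: smallest significand (2^-10) at the smallest exponent (2^-14).
print(2.0 ** -24)                       # 5.960464477539063e-08
# Minimum positive normal: implicit lead 1 at the smallest exponent.
print(2.0 ** -14)                       # 6.103515625e-05
# Maximum finite: largest significand (2 - 2^-10) at the largest exponent (2^15).
print((2.0 - 2.0 ** -10) * 2.0 ** 15)   # 65504.0
```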
Half precision examples
These examples are given in the bit representation of the floating-point value, comprising the sign bit, the (biased) exponent, and the significand.
By default, 1/3 rounds down, as it does for double precision, because of the odd number of bits in the significand. The bits beyond the rounding point are 0101..., which is less than 1/2 of a unit in the last place.
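This rounding behavior can be observed directly with Python's standard-library binary16 support (the `struct` `'e'` format):

```python
import struct

# Store 1/3 as binary16 and read it back: the value rounds down,
# landing slightly below the true 1/3.
third = struct.unpack('<e', struct.pack('<e', 1 / 3))[0]
print(third)        # 0.333251953125
print(third < 1 / 3)  # True: the discarded tail 0101... was below half a ulp
```

The stored value is (1 + 341/1024) × 2⁻² = 1365/4096 = 0.333251953125 exactly.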
Precision limitations
65519 is the largest number that rounds to a finite half-precision value (65504); 65520 and larger round to infinity. This cutoff assumes round-to-nearest-even; other rounding strategies change it.
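The cutoff can be demonstrated with Python's `struct` `'e'` format; note that CPython reports a finite value rounding to infinity as an `OverflowError` rather than returning infinity:

```python
import struct

def round_half(x: float) -> float:
    # Round-trip a float through IEEE 754 binary16.
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(round_half(65519.0))   # 65504.0: still rounds down to the largest finite value
try:
    round_half(65520.0)      # would round to infinity under round-to-nearest-even ...
except OverflowError:
    print('65520 overflows')  # ... which CPython signals as OverflowError
```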
ARM alternative half-precision
ARM processors support (via a floating-point control register bit) an "alternative half-precision" format, which does away with the special case for an exponent value of 31 (11111₂). It is almost identical to the IEEE format, but there is no encoding for infinity or NaNs; instead, an exponent of 31 encodes normalized numbers in the range 65536 to 131008.
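A sketch of the alternative decoding, where an all-ones exponent is just another normal exponent (an illustrative reimplementation, not ARM code):

```python
def decode_alt_half(bits: int) -> float:
    """Decode ARM 'alternative half-precision': no infinity or NaN encodings."""
    sign = -1.0 if bits >> 15 else 1.0
    stored_exp = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    if stored_exp == 0:                   # zeros and subnormals, as in IEEE binary16
        return sign * frac * 2.0 ** -24
    # Every nonzero exponent, including 31, encodes a normal number.
    return sign * (1.0 + frac / 1024.0) * 2.0 ** (stored_exp - 15)

print(decode_alt_half(0x7C00))  # 65536.0  (IEEE binary16 would decode this as +inf)
print(decode_alt_half(0x7FFF))  # 131008.0 (the format's largest value)
```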
Uses of half precision
This format is used in several computer graphics environments to store pixels, including MATLAB, OpenEXR, JPEG XR, GIMP, OpenGL, Cg, Direct3D, and D3DX. The advantage over 8-bit or 16-bit integers is that the increased dynamic range allows more detail to be preserved in highlights and shadows in images, and the linear representation of intensity makes calculations easier. The advantage over 32-bit single-precision floating point is that it requires half the storage and bandwidth (at the expense of precision and range).
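The storage halving is visible directly in the per-value sizes of Python's half- and single-precision `struct` formats:

```python
import struct

# Per-value storage: binary16 ('e') versus binary32 ('f').
print(struct.calcsize('<e'), struct.calcsize('<f'))  # 2 4: half the bytes per value
```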
Hardware and software for machine learning or neural networks tend to use half precision: such applications usually do a large amount of calculation, but don't require a high level of precision.
If the hardware has instructions to compute half-precision math, it is often faster than single or double precision. If the system has SIMD instructions that can handle multiple floating-point numbers within one instruction, half precision can be twice as fast by operating on twice as many numbers simultaneously. However, if there is no hardware support, the math must be done by emulation, or by conversion to single or double precision and back, and is therefore slower.
Hardware support
Several versions of the ARM architecture have support for half precision.
Support for half precision in the x86 instruction set is specified in the AVX-512 FP16 instruction set extension, first implemented in the Intel Sapphire Rapids processor.
See also
* bfloat16 floating-point format: alternative 16-bit floating-point format with 8 bits of exponent and 7 bits of mantissa
* Minifloat: small floating-point formats
* IEEE 754: IEEE standard for floating-point arithmetic
* ISO/IEC 10967, Language Independent Arithmetic
* Primitive data type
* RGBE image format
* Power Management Bus § Linear11 Floating Point Format
References
Further reading
Khronos Vulkan signed 16-bit floating point format
External links
* (in ''Survey of Floating-Point Formats'')
* OpenEXR site
* Half precision constants from D3DX
* OpenGL treatment of half precision
* Fast Half Float Conversions
* Analog Devices variant (four-bit exponent)
* C source code to convert between IEEE double, single, and half precision
* Java source code for half-precision floating-point conversion