In computing, half precision (sometimes called FP16 or float16) is a binary floating-point computer number format that occupies 16 bits (two bytes in modern computers) in computer memory. It is intended for storage of floating-point values in applications where higher precision is not essential, in particular image processing and neural networks.
Almost all modern uses follow the IEEE 754-2008 standard, where the 16-bit base-2 format is referred to as binary16 and the exponent uses 5 bits. This can express values in the range ±65,504, with the minimum value above 1 being 1 + 1/1024.
Depending on the computer, half-precision can be over an order of magnitude faster than double precision, e.g. 550 PFLOPS for half-precision vs 37 PFLOPS for double precision on one cloud provider.
History
Several earlier 16-bit floating-point formats have existed, including that of Hitachi's HD61810 DSP of 1982 (a 4-bit exponent and a 12-bit mantissa), Thomas J. Scott's WIF of 1991 (5 exponent bits, 10 mantissa bits), and that of the 3dfx Voodoo Graphics processor of 1995 (same as Hitachi).
ILM was searching for an image format that could handle a wide dynamic range, but without the hard drive and memory cost of single- or double-precision floating point.
The hardware-accelerated programmable shading group led by John Airey at SGI (Silicon Graphics) used the s10e5 data type in 1997 as part of the 'bali' design effort. This is described in a SIGGRAPH 2000 paper (see section 4.3) and further documented in US patent 7518615. It was popularized by its use in the open-source OpenEXR image format.
Nvidia and Microsoft defined the half datatype in the Cg language, released in early 2002, and implemented it in silicon in the GeForce FX, released in late 2002. However, hardware support for accelerated 16-bit floating point was later dropped by Nvidia before being reintroduced in the Tegra X1 mobile GPU in 2015.
The F16C extension in 2012 allows x86 processors to convert half-precision floats to and from single-precision floats with a machine instruction.
IEEE 754 half-precision binary floating-point format: binary16
The IEEE 754 standard specifies a binary16 as having the following format:
* Sign bit: 1 bit
* Exponent width: 5 bits
* Significand precision: 11 bits (10 explicitly stored)
The format is laid out, from the most significant bit to the least, as 1 sign bit, followed by 5 exponent bits and 10 significand bits.
The format is assumed to have an implicit lead bit with value 1 unless the exponent field is stored with all zeros. Thus, only 10 bits of the significand appear in the memory format, but the total precision is 11 bits. In IEEE 754 parlance, there are 10 bits of significand, but there are 11 bits of significand precision (log10(2^11) ≈ 3.311 decimal digits, or 4 digits ± slightly less than 5 units in the last place).
Exponent encoding
The half-precision binary floating-point exponent is encoded using an offset-binary representation, with the zero offset being 15 (also known as the exponent bias in the IEEE 754 standard):
* E_min = 00001_2 − 01111_2 = −14
* E_max = 11110_2 − 01111_2 = 15
* Exponent bias = 01111_2 = 15
Thus, as defined by the offset-binary representation, the offset of 15 has to be subtracted from the stored exponent to get the true exponent.
The stored exponents 00000_2 and 11111_2 are interpreted specially.
The minimum strictly positive (subnormal) value is 2^−24 ≈ 5.96 × 10^−8. The minimum positive normal value is 2^−14 ≈ 6.10 × 10^−5. The maximum representable value is (2 − 2^−10) × 2^15 = 65504.
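In terms of the sketch above, half_to_double(0x0001) gives the minimum subnormal 2^−24, half_to_double(0x0400) the minimum normal 2^−14, and half_to_double(0x7BFF) the maximum 65504.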
Half precision examples
These examples are given in bit representation of the floating-point value. This includes the sign bit, (biased) exponent, and significand.
By default, 1/3 rounds down like for double precision, because of the odd number of bits in the significand. The bits beyond the rounding point are 0101... which is less than 1/2 of a unit in the last place.
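For illustration, a few representative encodings, written as sign, exponent, and significand fields with the equivalent hexadecimal bit pattern (reconstructed here from the format definition above):

0 01111 0000000000 = 0x3C00 = 1
0 01111 0000000001 = 0x3C01 = 1 + 2^−10 = 1.0009765625 (smallest value above 1)
1 10000 0000000000 = 0xC000 = −2
0 11110 1111111111 = 0x7BFF = 65504 (largest normal number)
0 00000 0000000001 = 0x0001 = 2^−24 ≈ 5.96 × 10^−8 (smallest subnormal)
0 11111 0000000000 = 0x7C00 = infinity
0 01101 0101010101 = 0x3555 = 0.333251953125 (1/3 rounded down, as described above)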
Precision limitations
65520 and larger numbers round to infinity. This is for round-to-even; other rounding strategies will change this cut-off.
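The cut-off can be demonstrated with an encoding sketch (again in C++; the name double_to_half is ours, and this is a simplified illustration rather than a production converter) that rounds to nearest even:

#include <cstdint>
#include <cmath>

// Simplified sketch: encode a double as binary16 with round-to-nearest-even.
uint16_t double_to_half(double x) {
    uint16_t sign = std::signbit(x) ? 0x8000 : 0;
    double a = std::fabs(x);
    if (std::isnan(x)) return sign | 0x7E00;   // one quiet-NaN pattern
    if (a >= 65520.0)  return sign | 0x7C00;   // at/above the cut-off: infinity
    int e;
    std::frexp(a, &e);                         // a = f * 2^e with f in [0.5, 1)
    e = (e - 1 < -14) ? -14 : e - 1;           // clamp into the subnormal range
    // Scale a so that one unit equals one ulp, then round; nearbyint follows
    // the current rounding mode, which defaults to round-to-nearest-even.
    int m = (int)std::nearbyint(std::ldexp(a, 10 - e));
    if (m == 2048) { m = 1024; ++e; }          // rounding carried into the next binade
    if (m < 1024) return sign | (uint16_t)m;   // subnormal: stored exponent 0
    return sign | (uint16_t)(((e + 15) << 10) | (m - 1024));
}

For example, double_to_half(65519.0) yields 0x7BFF (65504), while double_to_half(65520.0) yields the infinity pattern 0x7C00.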
ARM alternative half-precision
ARM processors support (via a floating-point control register bit) an "alternative half-precision" format, which does away with the special case for an exponent value of 31 (11111_2). It is almost identical to the IEEE format, but there is no encoding for infinity or NaNs; instead, an exponent of 31 encodes normalized numbers in the range 65536 to 131008.
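In terms of a decoder, the change is small; a hedged sketch (alt_half_to_double is an illustrative name):

#include <cstdint>
#include <cmath>

// Sketch of ARM's alternative half precision: exponent 31 is an ordinary
// normal exponent, so there is no encoding for infinity or NaN.
double alt_half_to_double(uint16_t h) {
    int sign = (h >> 15) & 0x1, exp = (h >> 10) & 0x1F, frac = h & 0x3FF;
    double s = sign ? -1.0 : 1.0;
    if (exp == 0) return s * std::ldexp(frac, -24);  // zero or subnormal
    return s * std::ldexp(1024 + frac, exp - 25);    // exp 31 reaches 131008
}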
Uses of half precision
Half precision is used in several computer graphics environments to store pixels, including MATLAB, OpenEXR, JPEG XR, GIMP, OpenGL, Vulkan, Cg, Direct3D, and D3DX. The advantage over 8-bit or 16-bit integers is that the increased dynamic range allows for more detail to be preserved in highlights and shadows for images, and avoids gamma correction. The advantage over 32-bit single-precision floating point is that it requires half the storage and bandwidth (at the expense of precision and range).
Half precision can be useful for mesh quantization. Mesh data is usually stored using 32-bit single-precision floats for the vertices; however, in some situations it is acceptable to reduce the precision to 16-bit half precision, requiring only half the storage at the expense of some precision. Mesh quantization can also be done with 8-bit or 16-bit fixed precision, depending on the requirements.
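As a hedged sketch of the idea, assuming a C++23 compiler and standard library that provide std::float16_t (the function name quantize_vertices is illustrative):

#include <stdfloat>   // std::float16_t, C++23
#include <vector>

// Quantize 32-bit vertex coordinates to half precision, halving storage.
std::vector<std::float16_t> quantize_vertices(const std::vector<float>& verts) {
    std::vector<std::float16_t> out;
    out.reserve(verts.size());
    for (float v : verts)
        out.push_back(static_cast<std::float16_t>(v));  // rounds to nearest
    return out;
}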
Hardware and software for machine learning or neural networks tend to use half precision: such applications usually do a large amount of calculation, but don't require a high level of precision. Because hardware typically does not support 16-bit half-precision floats, neural networks often use the bfloat16 format, which is the single-precision float format truncated to 16 bits.
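A minimal sketch of that relationship (to_bfloat16 is an illustrative name; real converters usually round to nearest rather than truncate):

#include <cstdint>
#include <cstring>

// bfloat16 keeps the top 16 bits of a binary32 value:
// 1 sign bit, the full 8 exponent bits, and 7 fraction bits.
uint16_t to_bfloat16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // reinterpret the float's bit pattern
    return (uint16_t)(bits >> 16);        // truncate the low 16 fraction bits
}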
If the hardware has instructions to compute half-precision math, it is often faster than single or double precision. If the system has SIMD instructions that can handle multiple floating-point numbers within one instruction, half precision can be twice as fast by operating on twice as many numbers simultaneously.
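For example, on x86 with the F16C extension (compile with -mf16c or equivalent), eight values can be converted each way with a single instruction; a hedged sketch:

#include <cstdint>
#include <immintrin.h>

// Convert 8 packed binary16 values to single precision and back,
// using the F16C instructions VCVTPH2PS and VCVTPS2PH.
void roundtrip8(const uint16_t in[8], uint16_t out[8]) {
    __m128i h  = _mm_loadu_si128((const __m128i*)in);
    __m256  f  = _mm256_cvtph_ps(h);   // 8 halves -> 8 floats
    __m128i h2 = _mm256_cvtps_ph(f, _MM_FROUND_TO_NEAREST_INT);
    _mm_storeu_si128((__m128i*)out, h2);
}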
Support by programming languages
Zig provides support for half precision with its f16 type.
.NET 5 introduced half-precision floating-point numbers with the System.Half standard library type. However, no .NET language (C#, F#, Visual Basic, C++/CLI, or C++/CX) has literals (e.g. in C#, 1.0f has type System.Single and 1.0m has type System.Decimal) or a keyword for the type.
Swift introduced half-precision floating-point numbers in Swift 5.3 with the Float16 type.
OpenCL also supports half-precision floating-point numbers with the half datatype, using the IEEE 754-2008 half-precision storage format.
Rust is currently working on adding a new f16 type for IEEE half-precision 16-bit floats.
Julia provides support for half-precision floating-point numbers with the Float16 type.
C++ introduced half precision in C++23 with the std::float16_t type; GCC already implements support for it.
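A short usage sketch, assuming a C++23 toolchain with extended floating-point support:

#include <stdfloat>
#include <cstdio>

int main() {
    std::float16_t x = 1.0f16 / 3.0f16;  // f16 literal suffix is also C++23
    std::printf("%.10g\n", (double)x);   // prints 0.3332519531 (1/3 in binary16)
}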
Hardware support
Several versions of the ARM architecture have support for half precision.
Support for conversions with half-precision floats in the x86 instruction set is specified in the F16C instruction set extension, first introduced in 2009 by AMD and fairly broadly adopted by AMD and Intel CPUs by 2012. This was further extended in the AVX-512_FP16 instruction set extension implemented in the Intel Sapphire Rapids processor.
On RISC-V, the Zfh and Zfhmin extensions provide hardware support for 16-bit half-precision floats. The Zfhmin extension is a minimal alternative to Zfh.
On Power ISA, VSX and the not-yet-approved SVP64 extension provide hardware support for 16-bit half-precision floats as of PowerISA v3.1B and later.
Support for half precision on IBM Z is part of the Neural-network-processing-assist facility that IBM introduced with Telum. IBM refers to half-precision floating-point data as NNP-Data-Type 1 (16-bit).
See also
* bfloat16 floating-point format: alternative 16-bit floating-point format with 8 bits of exponent and 7 bits of mantissa
* Minifloat: small floating-point formats
* IEEE 754: IEEE standard for floating-point arithmetic
* ISO/IEC 10967, Language Independent Arithmetic
* Primitive data type
* RGBE image format
* Power Management Bus § Linear11 Floating Point Format
Further reading
Khronos Vulkan signed 16-bit floating point format
External links
* (in ''Survey of Floating-Point Formats'')
* OpenEXR site
* Half precision constants from D3DX
* OpenGL treatment of half precision
* Fast Half Float Conversions
* Analog Devices variant (four-bit exponent)
* C source code to convert between IEEE double, single, and half precision can be found here
* Java source code for half-precision floating-point conversion