![OpenArena-Rocket](https://upload.wikimedia.org/wikipedia/commons/5/53/OpenArena-Rocket.jpg)
Fast inverse square root, sometimes referred to as Fast InvSqrt() or by the
hexadecimal
In mathematics and computing, the hexadecimal (also base-16 or simply hex) numeral system is a positional numeral system that represents numbers using a radix (base) of 16. Unlike the decimal system representing numbers using 10 symbols, hexa ...
constant 0x5F3759DF, is an algorithm that estimates
, the
reciprocal
Reciprocal may refer to:
In mathematics
* Multiplicative inverse, in mathematics, the number 1/''x'', which multiplied by ''x'' gives the product 1, also known as a ''reciprocal''
* Reciprocal polynomial, a polynomial obtained from another pol ...
(or multiplicative inverse) of the
square root
In mathematics, a square root of a number is a number such that ; in other words, a number whose ''square'' (the result of multiplying the number by itself, or ⋅ ) is . For example, 4 and −4 are square roots of 16, because .
E ...
of a 32-bit
floating-point
In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can b ...
number
in
IEEE 754 floating-point format. This operation is used in
digital signal processing
Digital signal processing (DSP) is the use of digital processing, such as by computers or more specialized digital signal processors, to perform a wide variety of signal processing operations. The digital signals processed in this manner are ...
to
normalize a vector, such as scaling it to length 1. For example,
computer graphics
Computer graphics deals with generating images with the aid of computers. Today, computer graphics is a core technology in digital photography, film, video games, cell phone and computer displays, and many specialized applications. A great de ...
programs use inverse square roots to compute
angles of incidence and
reflection Reflection or reflexion may refer to:
Science and technology
* Reflection (physics), a common wave phenomenon
** Specular reflection, reflection from a smooth surface
*** Mirror image, a reflection in a mirror or in water
** Signal reflection, in ...
for
lighting
Lighting or illumination is the deliberate use of light to achieve practical or aesthetic effects. Lighting includes the use of both artificial light sources like lamps and light fixtures, as well as natural illumination by capturing daylig ...
and
shading
Shading refers to the depiction of depth perception in 3D models (within the field of 3D computer graphics) or illustrations (in visual art) by varying the level of darkness. Shading tries to approximate local behavior of light on the object's ...
. Predated by similar video game algorithms, this one is best known for its implementation in 1999 in ''
Quake III Arena
''Quake III Arena'' is a 1999 multiplayer-focused first-person shooter developed by id Software. The third installment of the ''Quake'' series, ''Arena'' differs from previous games by excluding a story-based single-player mode and focusing prima ...
'', a
first-person shooter
First-person shooter (FPS) is a sub-genre of shooter video games centered on gun and other weapon-based combat in a first-person perspective, with the player experiencing the action through the eyes of the protagonist and controlling the pl ...
video game heavily based on
3D graphics
3D computer graphics, or “3D graphics,” sometimes called CGI, 3D-CGI or three-dimensional computer graphics are graphics that use a three-dimensional representation of geometric data (often Cartesian) that is stored in the computer for the ...
. The algorithm only started appearing on public forums between 2002 and 2003. Computation of square roots usually depends upon many division operations, which for floating point numbers are
computationally expensive
In computer science, the analysis of algorithms is the process of finding the computational complexity of algorithms—the amount of time, storage, or other resources needed to execute them. Usually, this involves determining a function that re ...
. The fast inverse square generates a good approximation with only one division step.
The algorithm accepts a 32-bit floating-point number as the input and stores a halved value for later use. Then, treating the bits representing the floating-point number as a 32-bit integer, a
logical shift
In computer science, a logical shift is a bitwise operation that shifts all the bits of its operand. The two base variants are the logical left shift and the logical right shift. This is further modulated by the number of bit positions a gi ...
right by one bit is performed and the result subtracted from the number
0x5F3759DF (in decimal notation: 1,597,463,007), which is a floating-point representation of an approximation of
.
This results in the first approximation of the inverse square root of the input. Treating the bits again as a floating-point number, it runs one iteration of
Newton's method
In numerical analysis, Newton's method, also known as the Newton–Raphson method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a real-valu ...
, yielding a more precise approximation.
The algorithm was often misattributed to
John Carmack
John D. Carmack II (born August 20, 1970) is an American computer programmer and video game developer. He co-founded the video game company id Software and was the lead programmer of its 1990s games ''Commander Keen'', ''Wolfenstein 3D'', ''Doo ...
, but in fact the code is based on an unpublished paper by
William Kahan
William "Velvel" Morton Kahan (born June 5, 1933) is a Canadian mathematician and computer scientist, who received the Turing Award in 1989 for "''his fundamental contributions to numerical analysis''",
was named an ACM Fellow in 1994, and inducte ...
and K.C. Ng circulated in May 1986. The original constant was produced from a collaboration between Cleve Moler and Gregory Walsh, while they worked for Ardent Computing in the late 1980s.
With subsequent hardware advancements, especially the
x86
x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introd ...
SSE instruction
rsqrtss
, this method is not generally applicable to general purpose computing,
though it remains an interesting example both historically
and for more limited machines, such as low-cost embedded systems. However, more manufacturers of embedded systems are including trigonometric and other math accelerators such as
CORDIC
CORDIC (for "coordinate rotation digital computer"), also known as Volder's algorithm, or: Digit-by-digit method Circular CORDIC (Jack E. Volder), Linear CORDIC, Hyperbolic CORDIC (John Stephen Walther), and Generalized Hyperbolic CORDIC (GH C ...
, avoiding the need for such algorithms.
Motivation
![Surface_normals](https://upload.wikimedia.org/wikipedia/commons/c/cc/Surface_normals.svg)
The inverse square root of a floating point number is used in calculating a
normalized vector
In mathematics, a unit vector in a normed vector space is a vector (often a spatial vector) of length 1. A unit vector is often denoted by a lowercase letter with a circumflex, or "hat", as in \hat (pronounced "v-hat").
The term ''direction vecto ...
. Programs can use normalized vectors to determine angles of incidence and
reflection Reflection or reflexion may refer to:
Science and technology
* Reflection (physics), a common wave phenomenon
** Specular reflection, reflection from a smooth surface
*** Mirror image, a reflection in a mirror or in water
** Signal reflection, in ...
.
3D graphics
3D computer graphics, or “3D graphics,” sometimes called CGI, 3D-CGI or three-dimensional computer graphics are graphics that use a three-dimensional representation of geometric data (often Cartesian) that is stored in the computer for the ...
programs must perform millions of these calculations every second to simulate lighting. When the code was developed in the early 1990s, most floating point processing power lagged the speed of integer processing.
This was troublesome for 3D graphics programs before the advent of specialized hardware to handle
transform and lighting
Transform, clipping, and lighting (T&L or TCL) is a term used in computer graphics.
Overview
Transformation is the task of producing a two-dimensional view of a three-dimensional scene. Clipping means only drawing the parts of the scene that w ...
.
The length of the vector is determined by calculating its
Euclidean norm
Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, that is, in Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are Euclidean s ...
: the square root of the sum of squares of the
vector components
In mathematics, physics, and engineering, a Euclidean vector or simply a vector (sometimes called a geometric vector or spatial vector) is a geometric object that has Magnitude (mathematics), magnitude (or euclidean norm, length) and Direction ( ...
. When each component of the vector is divided by that length, the new vector will be a
unit vector
In mathematics, a unit vector in a normed vector space is a vector (often a spatial vector) of length 1. A unit vector is often denoted by a lowercase letter with a circumflex, or "hat", as in \hat (pronounced "v-hat").
The term ''direction vecto ...
pointing in the same direction. In a 3D graphics program, all vectors are in three-
dimensional
In physics and mathematics, the dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any point within it. Thus, a line has a dimension of one (1D) because only one coordi ...
space, so
would be a vector
.
:
is the Euclidean norm of the vector.
:
is the normalized (unit) vector, using
to represent
.
:
which relates the unit vector to the inverse square root of the distance components. The inverse square root can be used to compute
because this equation is equivalent to
:
where the fraction term is the inverse square root of
.
At the time, floating-point division was generally expensive compared to multiplication; the fast inverse square root algorithm bypassed the division step, giving it its performance advantage.
Overview of the code
The following code is the fast inverse square root implementation from ''
Quake III Arena
''Quake III Arena'' is a 1999 multiplayer-focused first-person shooter developed by id Software. The third installment of the ''Quake'' series, ''Arena'' differs from previous games by excluding a story-based single-player mode and focusing prima ...
'', stripped of
C preprocessor
The C preprocessor is the macro preprocessor for the C, Objective-C and C++ computer programming languages. The preprocessor provides the ability for the inclusion of header files, macro expansions, conditional compilation, and line control ...
directives, but including the exact original comment text:
float Q_rsqrt( float number )
At the time, the general method to compute the inverse square root was to calculate an approximation for
, then revise that approximation via another method until it came within an acceptable error range of the actual result.
Common software methods in the early 1990s drew approximations from a
lookup table
In computer science, a lookup table (LUT) is an array that replaces runtime computation with a simpler array indexing operation. The process is termed as "direct addressing" and LUTs differ from hash tables in a way that, to retrieve a value v wi ...
. The key of the fast inverse square root was to directly compute an approximation by utilizing the structure of floating-point numbers, proving faster than table lookups. The algorithm was approximately four times faster than computing the square root with another method and calculating the reciprocal via floating-point division. The algorithm was designed with the
IEEE 754-1985
IEEE 754-1985 was an industry standard for representing floating-point numbers in computers, officially adopted in 1985 and superseded in 2008 by IEEE 754-2008, and then again in 2019 by minor revision IEEE 754-2019. During its 23 years, it was the ...
32-bit floating-point specification in mind, but investigation from Chris Lomont showed that it could be implemented in other floating-point specifications.
The advantages in speed offered by the fast inverse square root trick came from treating the 32-bit
floating-point
In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can b ...
word
[Use of the type ]long
reduces the portability of this code on modern systems. For the code to execute properly, sizeof
sizeof is a unary operator in the programming languages C and C++. It generates the storage size of an expression or a data type, measured in the number of ''char''-sized units. Consequently, the construct ''sizeof (char)'' is guaranteed to be ' ...
(long)
must be 4 bytes, otherwise negative outputs may result. Under many modern 64-bit systems, sizeof
sizeof is a unary operator in the programming languages C and C++. It generates the storage size of an expression or a data type, measured in the number of ''char''-sized units. Consequently, the construct ''sizeof (char)'' is guaranteed to be ' ...
(long)
is 8 bytes. The more portable replacement is int32_t
. as an
integer
An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign (−1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...
, then subtracting it from a "
magic
Magic or Magick most commonly refers to:
* Magic (supernatural), beliefs and actions employed to influence supernatural beings and forces
* Ceremonial magic, encompasses a wide variety of rituals of magic
* Magical thinking, the belief that unrela ...
" constant, .
This integer subtraction and
bit shift
In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral (considered as a bit string) at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic operati ...
results in a bit pattern which, when
re-defined as a floating-point number, is a rough approximation for the inverse square root of the number. One iteration of Newton's method is performed to gain some accuracy, and the code is finished. The algorithm generates reasonably accurate results using a unique first approximation for
Newton's method
In numerical analysis, Newton's method, also known as the Newton–Raphson method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a real-valu ...
; however, it is much slower and less accurate than using the
SSE instruction
rsqrtss
on x86 processors also released in 1999.
Worked example
As an example, the number
can be used to calculate
. The first steps of the algorithm are illustrated below:
0011_1110_0010_0000_0000_0000_0000_0000 Bit pattern of both x and i
0001_1111_0001_0000_0000_0000_0000_0000 Shift right one position: (i >> 1)
0101_1111_0011_0111_0101_1001_1101_1111 The magic number 0x5F3759DF
0100_0000_0010_0111_0101_1001_1101_1111 The result of 0x5F3759DF - (i >> 1)
Interpreting as IEEE 32-bit representation:
0_01111100_01000000000000000000000 1.25 × 2
−3
0_00111110_00100000000000000000000 1.125 × 2
−65
0_10111110_01101110101100111011111 1.432430... × 2
63
0_10000000_01001110101100111011111 1.307430... × 2
1
Reinterpreting this last bit pattern as a floating point number gives the approximation
, which has an error of about 3.4%. After one single iteration of
Newton's method
In numerical analysis, Newton's method, also known as the Newton–Raphson method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a real-valu ...
, the final result is
, an error of only 0.17%.
Avoiding undefined behavior
According to the C standard, reinterpreting a floating point value as an integer by removing the pointer to it is considered unexpected behavior (
undefined behavior
In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification to which the computer code adheres. This is different from unspecified behavior, ...
). Another way would be to place the floating point value in an anonymous
union
Union commonly refers to:
* Trade union, an organization of workers
* Union (set theory), in mathematics, a fundamental operation on sets
Union may also refer to:
Arts and entertainment
Music
* Union (band), an American rock group
** ''Un ...
containing an additional 32-bit unsigned integer member, and accesses to that integer provides a bit level view of the contents of the floating point value. However,
type punning
In computer science, a type punning is any programming technique that subverts or circumvents the type system of a programming language in order to achieve an effect that would be difficult or impossible to achieve within the bounds of the formal ...
through a
union
Union commonly refers to:
* Trade union, an organization of workers
* Union (set theory), in mathematics, a fundamental operation on sets
Union may also refer to:
Arts and entertainment
Music
* Union (band), an American rock group
** ''Un ...
is also undefined behavior in C++.
#include // uint32_t
float Q_rsqrt(float number)
In C++ the usual method for implementing this function's casts is through C++20's
std::bit_cast
. This also allows the function to work in
constexpr
context:
#include
#include
#include
constexpr float Q_rsqrt(float number) noexcept
Algorithm
The algorithm computes
by performing the following steps:
# Alias the argument
to an integer as a way to compute an approximation of the
binary logarithm
In mathematics, the binary logarithm () is the power to which the number must be raised to obtain the value . That is, for any real number ,
:x=\log_2 n \quad\Longleftrightarrow\quad 2^x=n.
For example, the binary logarithm of is , the b ...
# Use this approximation to compute an approximation of
# Alias back to a float, as a way to compute an approximation of the base-2 exponential
# Refine the approximation using a single iteration of Newton's method.
Floating-point representation
Since this algorithm relies heavily on the bit-level representation of single-precision floating-point numbers, a short overview of this representation is provided here. To encode a non-zero real number
as a single precision float, the first step is to write
as a
normalized binary number:
:
where the exponent
is an integer,
, and
is the binary representation of the "significand"
. Since the single bit before the point in the significand is always 1, it need not be stored. From this form, three unsigned integers are computed:
*
, the "sign bit", is
if
is positive and
negative or zero (1 bit)
*
is the "biased exponent", where
is the "exponent bias"
[ should be in the range for to be representable as a ]normal number
In mathematics, a real number is said to be simply normal in an integer base b if its infinite sequence of digits is distributed uniformly in the sense that each of the b digit values has the same natural density 1/b. A number is said to b ...
. (8 bits)
*
, where
[The only real numbers that can be represented ''exactly'' as floating point are those for which is an integer. Other numbers can only be represented approximately by rounding them to the nearest exactly representable number.] (23 bits)
These fields are then packed, left to right, into a 32-bit container.
As an example, consider again the number
. Normalizing
yields:
:
and thus, the three unsigned integer fields are:
*
*
*
these fields are packed as shown in the figure below:
Aliasing to an integer as an approximate logarithm
If
were to be calculated without a computer or a calculator, a
table of logarithms would be useful, together with the identity
, which is valid for every base
. The fast inverse square root is based on this identity, and on the fact that aliasing a float32 to an integer gives a rough approximation of its logarithm. Here is how:
If
is a positive
normal number
In mathematics, a real number is said to be simply normal in an integer base b if its infinite sequence of digits is distributed uniformly in the sense that each of the b digit values has the same natural density 1/b. A number is said to b ...
:
:
then
:
and since