Fast inverse square root, sometimes referred to as Fast InvSqrt() or by the hexadecimal constant 0x5F3759DF, is an algorithm that estimates

\frac

, the

reciprocal Reciprocal may refer to: In mathematics * Multiplicative inverse, in mathematics, the number 1/''x'', which multiplied by ''x'' gives the product 1, also known as a ''reciprocal'' * Reciprocal polynomial, a polynomial obtained from another pol ...

(or multiplicative inverse) of the

square root In mathematics, a square root of a number is a number such that ; in other words, a number whose ''square'' (the result of multiplying the number by itself, or  ⋅ ) is . For example, 4 and −4 are square roots of 16, because . ...

of a 32-bit floating-point number

x

in IEEE 754 floating-point format. This operation is used in digital signal processing to normalize a vector, such as scaling it to length 1. For example,

computer graphics Computer graphics deals with generating images with the aid of computers. Today, computer graphics is a core technology in digital photography, film, video games, cell phone and computer displays, and many specialized applications. A great de ...

programs use inverse square roots to compute angles of incidence and

reflection Reflection or reflexion may refer to: Science and technology * Reflection (physics), a common wave phenomenon ** Specular reflection, reflection from a smooth surface *** Mirror image, a reflection in a mirror or in water ** Signal reflection, in ...

for

lighting Lighting or illumination is the deliberate use of light to achieve practical or aesthetic effects. Lighting includes the use of both artificial light sources like lamps and light fixtures, as well as natural illumination by capturing daylig ...

and

shading Shading refers to the depiction of depth perception in 3D models (within the field of 3D computer graphics) or illustrations (in visual art) by varying the level of darkness. Shading tries to approximate local behavior of light on the ob ...

. Predated by similar video game algorithms, this one is best known for its implementation in 1999 in ''

Quake III Arena ''Quake III Arena'' is a 1999 multiplayer-focused first-person shooter developed by id Software. The third installment of the ''Quake'' series, ''Arena'' differs from previous games by excluding a story-based single-player mode and focusing prima ...

'', a

first-person shooter First-person shooter (FPS) is a sub-genre of shooter video games centered on gun and other weapon-based combat in a first-person perspective, with the player experiencing the action through the eyes of the protagonist and controlling the p ...

video game heavily based on

3D graphics 3D computer graphics, or “3D graphics,” sometimes called CGI, 3D-CGI or three-dimensional computer graphics are graphics that use a three-dimensional representation of geometric data (often Cartesian) that is stored in the computer for th ...

. The algorithm only started appearing on public forums between 2002 and 2003. Computation of square roots usually depends upon many division operations, which for floating point numbers are

computationally expensive In computer science, the analysis of algorithms is the process of finding the computational complexity of algorithms—the amount of time, storage, or other resources needed to execute them. Usually, this involves determining a function that re ...

. The fast inverse square generates a good approximation with only one division step. The algorithm accepts a 32-bit floating-point number as the input and stores a halved value for later use. Then, treating the bits representing the floating-point number as a 32-bit integer, a

logical shift In computer science, a logical shift is a bitwise operation that shifts all the bits of its operand. The two base variants are the logical left shift and the logical right shift. This is further modulated by the number of bit positions a giv ...

right by one bit is performed and the result subtracted from the number 0x5F3759DF (in decimal notation: 1,597,463,007), which is a floating-point representation of an approximation of

\sqrt

. This results in the first approximation of the inverse square root of the input. Treating the bits again as a floating-point number, it runs one iteration of Newton's method, yielding a more precise approximation. The algorithm was often misattributed to John Carmack, but in fact the code is based on an unpublished paper by William Kahan and K.C. Ng circulated in May 1986. The original constant was produced from a collaboration between Cleve Moler and Gregory Walsh, while they worked for Ardent Computing in the late 1980s. With subsequent hardware advancements, especially the

x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...

SSE instruction rsqrtss, this method is not generally applicable to general purpose computing, though it remains an interesting example both historically and for more limited machines, such as low-cost embedded systems. However, more manufacturers of embedded systems are including trigonometric and other math accelerators such as

CORDIC CORDIC (for "coordinate rotation digital computer"), also known as Volder's algorithm, or: Digit-by-digit method Circular CORDIC (Jack E. Volder), Linear CORDIC, Hyperbolic CORDIC (John Stephen Walther), and Generalized Hyperbolic CORDIC (GH C ...

, avoiding the need for such algorithms.

Motivation

The inverse square root of a floating point number is used in calculating a

normalized vector In mathematics, a unit vector in a normed vector space is a vector (often a spatial vector) of length 1. A unit vector is often denoted by a lowercase letter with a circumflex, or "hat", as in \hat (pronounced "v-hat"). The term ''direction vecto ...

. Programs can use normalized vectors to determine angles of incidence and

programs must perform millions of these calculations every second to simulate lighting. When the code was developed in the early 1990s, most floating point processing power lagged the speed of integer processing. This was troublesome for 3D graphics programs before the advent of specialized hardware to handle

transform and lighting Transform, clipping, and lighting (T&L or TCL) is a term used in computer graphics. Overview Transformation is the task of producing a two-dimensional view of a three-dimensional scene. Clipping means only drawing the parts of the scene that w ...

. The length of the vector is determined by calculating its Euclidean norm: the square root of the sum of squares of the

vector components In mathematics, physics, and engineering, a Euclidean vector or simply a vector (sometimes called a geometric vector or spatial vector) is a geometric object that has magnitude (or length) and direction. Vectors can be added to other vectors a ...

. When each component of the vector is divided by that length, the new vector will be a

unit vector In mathematics, a unit vector in a normed vector space is a vector (often a spatial vector) of length 1. A unit vector is often denoted by a lowercase letter with a circumflex, or "hat", as in \hat (pronounced "v-hat"). The term ''direction v ...

pointing in the same direction. In a 3D graphics program, all vectors are in three- dimensional space, so

\boldsymbol v

would be a vector

(v_1, v_2, v_3)

. :

\, \boldsymbol\,  = \sqrt

is the Euclidean norm of the vector. :

\boldsymbol = \frac

is the normalized (unit) vector, using

\, \boldsymbol\, ^2

to represent

v_1^2+v_2^2+v_3^2

. :

\boldsymbol = \frac

which relates the unit vector to the inverse square root of the distance components. The inverse square root can be used to compute

\boldsymbol

because this equation is equivalent to :

\boldsymbol = \boldsymbol\, \frac

where the fraction term is the inverse square root of

\, \boldsymbol\, ^2

. At the time, floating-point division was generally expensive compared to multiplication; the fast inverse square root algorithm bypassed the division step, giving it its performance advantage.

Overview of the code

The following code is the fast inverse square root implementation from ''

'', stripped of

C preprocessor The C preprocessor is the macro preprocessor for the C, Objective-C and C++ computer programming languages. The preprocessor provides the ability for the inclusion of header files, macro expansions, conditional compilation, and line control ...

directives, but including the exact original comment text: float Q_rsqrt( float number ) At the time, the general method to compute the inverse square root was to calculate an approximation for

\frac

, then revise that approximation via another method until it came within an acceptable error range of the actual result. Common software methods in the early 1990s drew approximations from a lookup table. The key of the fast inverse square root was to directly compute an approximation by utilizing the structure of floating-point numbers, proving faster than table lookups. The algorithm was approximately four times faster than computing the square root with another method and calculating the reciprocal via floating-point division. The algorithm was designed with the

IEEE 754-1985 IEEE 754-1985 was an industry standard for representing floating-point numbers in computers, officially adopted in 1985 and superseded in 2008 by IEEE 754-2008, and then again in 2019 by minor revision IEEE 754-2019. During its 23 years, it was the ...

32-bit floating-point specification in mind, but investigation from Chris Lomont showed that it could be implemented in other floating-point specifications. The advantages in speed offered by the fast inverse square root trick came from treating the 32-bit floating-point wordUse of the type long reduces the portability of this code on modern systems. For the code to execute properly,

sizeof 


sizeof is a unary operator in the programming languages  C and C++. It generates the storage size of an expression or a data type, measured in the number of ''char''-sized units. Consequently, the construct ''sizeof (char)'' is guaranteed to be ' ...
(long)

must be 4 bytes, otherwise negative outputs may result. Under many modern 64-bit systems,

sizeof 


sizeof is a unary operator in the programming languages  C and C++. It generates the storage size of an expression or a data type, measured in the number of ''char''-sized units. Consequently, the construct ''sizeof (char)'' is guaranteed to be ' ...
(long)

is 8 bytes. The more portable replacement is int32_t. as an

integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign ( −1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the languag ...

, then subtracting it from a "

magic Magic or Magick most commonly refers to: * Magic (supernatural), beliefs and actions employed to influence supernatural beings and forces * Ceremonial magic, encompasses a wide variety of rituals of magic * Magical thinking, the belief that unrela ...

" constant, . This integer subtraction and

bit shift In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral (considered as a bit string) at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic oper ...

results in a bit pattern which, when re-defined as a floating-point number, is a rough approximation for the inverse square root of the number. One iteration of Newton's method is performed to gain some accuracy, and the code is finished. The algorithm generates reasonably accurate results using a unique first approximation for Newton's method; however, it is much slower and less accurate than using the SSE instruction rsqrtss on x86 processors also released in 1999.

Worked example

As an example, the number

x=0.15625

can be used to calculate

\frac \approx 2.52982

. The first steps of the algorithm are illustrated below: 0011_1110_0010_0000_0000_0000_0000_0000 Bit pattern of both x and i 0001_1111_0001_0000_0000_0000_0000_0000 Shift right one position: (i >> 1) 0101_1111_0011_0111_0101_1001_1101_1111 The magic number 0x5F3759DF 0100_0000_0010_0111_0101_1001_1101_1111 The result of 0x5F3759DF - (i >> 1) Interpreting as IEEE 32-bit representation: 0_01111100_01000000000000000000000 1.25 × 2⁻³ 0_00111110_00100000000000000000000 1.125 × 2⁻⁶⁵ 0_10111110_01101110101100111011111 1.432430... × 2⁶³ 0_10000000_01001110101100111011111 1.307430... × 2¹ Reinterpreting this last bit pattern as a floating point number gives the approximation

y=2.61486

, which has an error of about 3.4%. After one single iteration of Newton's method, the final result is

y=2.52549

, an error of only 0.17%.

Avoiding undefined behavior

According to the C standard, reinterpreting a floating point value as an integer by removing the pointer to it is considered unexpected behavior (

undefined behavior In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification to which the computer code adheres. This is different from unspecified behavior ...

). Another way would be to place the floating point value in an anonymous

union Union commonly refers to: * Trade union, an organization of workers * Union (set theory), in mathematics, a fundamental operation on sets Union may also refer to: Arts and entertainment Music * Union (band), an American rock group ** ''Un ...

containing an additional 32-bit unsigned integer member, and accesses to that integer provides a bit level view of the contents of the floating point value. However, type punning through a

is also undefined behavior in C++. #include // uint32_t float Q_rsqrt(float number) In C++ the usual method for implementing this function's casts is through C++20's std::bit_cast. This also allows the function to work in constexpr context: #include #include #include constexpr float Q_rsqrt(float number) noexcept

Algorithm

The algorithm computes

\frac

by performing the following steps: # Alias the argument

x

to an integer as a way to compute an approximation of the binary logarithm

\log_(x)

# Use this approximation to compute an approximation of

\log_\left(\frac\right) = -\frac \log_(x)

# Alias back to a float, as a way to compute an approximation of the base-2 exponential # Refine the approximation using a single iteration of Newton's method.

Floating-point representation

Since this algorithm relies heavily on the bit-level representation of single-precision floating-point numbers, a short overview of this representation is provided here. To encode a non-zero real number

x

as a single precision float, the first step is to write

x

as a normalized binary number: :

\begin
x &= \pm 1.b_1b_2b_3\ldots \times 2^\\
  &= \pm 2^ (1 + m_x)
\end

where the exponent

e_x

is an integer,

m_x \in [0, 1)

, and

1.b_1b_2b_3\ldots

is the binary representation of the "significand"

(1 + m_x)

. Since the single bit before the point in the significand is always 1, it need not be stored. From this form, three unsigned integers are computed: *

S_x

, the "sign bit", is

0

x

is positive and

1

negative or zero (1 bit) *

E_x = e_x + B

is the "biased exponent", where

B = 127

is the "exponent bias"

E_x

should be in the range

[1, 254]

for

x

to be representable as a

normal number In mathematics, a real number is said to be simply normal in an integer base b if its infinite sequence of digits is distributed uniformly in the sense that each of the b digit values has the same natural density 1/b. A number is said to b ...

. (8 bits) *

M_x = m_x \times L

, where

L = 2^

The only real numbers that can be represented ''exactly'' as floating point are those for which

M_x

is an integer. Other numbers can only be represented approximately by rounding them to the nearest exactly representable number. (23 bits) These fields are then packed, left to right, into a 32-bit container. As an example, consider again the number

x = 0.15625 = 0.00101_2

. Normalizing

x

yields: :

x = +2^(1 + 0.25)

and thus, the three unsigned integer fields are: *

S = 0

E = -3 + 127 = 124 = 0111\ 1100_2

M = 0.25 \times 2^ = 2\ 097\ 152 = 0010\ 0000\ 0000\ 0000\ 0000\ 0000_2

these fields are packed as shown in the figure below: Float w significand 2

Aliasing to an integer as an approximate logarithm

\frac

were to be calculated without a computer or a calculator, a table of logarithms would be useful, together with the identity

\log_b\left(\frac\right) = \log_b\left(x^\right) = -\frac \log_b(x)

, which is valid for every base

b

. The fast inverse square root is based on this identity, and on the fact that aliasing a float32 to an integer gives a rough approximation of its logarithm. Here is how: If

x

is a positive

: :

x = 2^ (1 + m_x)

then :

\log_2(x) = e_x + \log_2(1 + m_x)

and since

m_x \in optimal_approximation_(the_best_in_the_sense_of_the_uniform_norm.html" ;"title="Approximation theory#Optimal polynomials">optimal approximation (the best in the sense of the uniform norm">Approximation theory#Optimal polynomials">optimal approximation (the best in the sense of the uniform norm of the error). However, this value is not used by the algorithm as it does not take subsequent steps into account.

Thus there is the approximation :

\log_2(x) \approx e_x + m_x + \sigma.

Interpreting the floating-point bit-pattern of

x

as an integer

I_x

yieldsSince

x

is positive,

S_x = 0

. :

\begin
I_x &= E_x L + M_x\\
    &= L (e_x + B + m_x)\\
    &= L (e_x + m_x + \sigma + B - \sigma)\\
    &\approx L \log_2(x) + L (B - \sigma).
\end

It then appears that

I_x

is a scaled and shifted piecewise-linear approximation of

\log_2(x)

, as illustrated in the figure on the right. In other words,

\log_2(x)

is approximated by :

\log_2(x) \approx \frac - (B - \sigma).

First approximation of the result

The calculation of

y=\frac

is based on the identity :

\log_2(y) = - \tfrac\log_2(x)

Using the approximation of the logarithm above, applied to both

x

and

y

, the above equation gives: :

\frac - (B - \sigma) \approx - \frac\left(\frac - (B - \sigma)\right)

Thus, an approximation of

I_y

is: :

I_y \approx \tfrac L (B - \sigma) - \tfrac I_x

which is written in the code as : i = 0x5f3759df - ( i >> 1 ); The first term above is the magic number :

\tfrac L (B - \sigma) = \mathtt

from which it can be inferred that

\sigma \approx 0.0450466

. The second term,

\fracI_x

, is calculated by shifting the bits of

I_x

one position to the right.

Newton's method

With

y

as the inverse square root,

f(y)=\frac-x=0

. The approximation yielded by the earlier steps can be refined by using a root-finding method, a method that finds the zero of a function. The algorithm uses Newton's method: if there is an approximation,

y_n

for

y

, then a better approximation

y_

can be calculated by taking

y_n - \frac

, where

f'(y_n)

is the

derivative In mathematics, the derivative of a function of a real variable measures the sensitivity to change of the function value (output value) with respect to a change in its argument (input value). Derivatives are a fundamental tool of calculus. ...

f(y)

y_n

. For the above

f(y)

, :

y_ = \frac

where

f(y)= \frac - x

and

f'(y) = -\frac

. Treating

y

as a floating-point number, y = y*(threehalfs - x/2*y*y); is equivalent to :

y_ = y_\left (\frac32-\fracy_n^2\right ) = \frac.

By repeating this step, using the output of the function (

y_

) as the input of the next iteration, the algorithm causes

y

converge Converge may refer to: * Converge (band), American hardcore punk band * Converge (Baptist denomination), American national evangelical Baptist body * Limit (mathematics) * Converge ICT, internet service provider in the Philippines *CONVERGE CFD s ...

to the inverse square root. For the purposes of the ''Quake III'' engine, only one iteration was used. A second iteration remained in the code but was commented out.

Accuracy

As noted above, the approximation is very accurate. The single graph on the right plots the error of the function (that is, the error of the approximation after it has been improved by running one iteration of Newton's method), for inputs starting at 0.01, where the standard library gives 10.0 as a result, and InvSqrt() gives 9.982522, making the relative difference 0.0017478, or 0.175% of the true value, 10. The absolute error only drops from then on, and the relative error stays within the same bounds across all orders of magnitude.

History

William Kahan and K.C. Ng at Berkeley wrote an unpublished paper in May 1986 describing how to calculate the square root using bit-fiddling techniques followed by Newton iterations. In the late 1980s,

Cleve Moler Cleve Barry Moler is an American mathematician and computer programmer specializing in numerical analysis. In the mid to late 1970s, he was one of the authors of LINPACK and EISPACK, Fortran libraries for numerical computing. He invented MATL ...

at Ardent Computer learned about this technique and passed it along to his coworker Greg Walsh. Greg Walsh devised the now-famous constant and fast inverse square root algorithm. Gary Tarolli was consulting for Kubota, the company funding Ardent at the time, and likely brought the algorithm to

3dfx Interactive 3dfx Interactive was an American technology company headquartered in San Jose, California, founded in 1994, that specialized in the manufacturing of 3D graphics processing units, and later, video cards. It was a pioneer in the field from the l ...

circa 1994.

Jim Blinn James F. Blinn (born 1949) is an American computer scientist who first became widely known for his work as a computer graphics expert at NASA's Jet Propulsion Laboratory (JPL), particularly his work on the pre-encounter animations for the Vo ...

demonstrated a simple approximation of the inverse square root in a 1997 column for ''

IEEE Computer Graphics and Applications The publications of the Institute of Electrical and Electronics Engineers (IEEE) constitute around 30% of the world literature in the electrical and electronics engineering and computer science fields, publishing well over 100 peer-reviewed journa ...

''. Reverse engineering of other contemporary 3D video games uncovered a variation of the algorithm in

Activision Activision Publishing, Inc. is an American video game publisher based in Santa Monica, California. It serves as the publishing business for its parent company, Activision Blizzard, and consists of several subsidiary studios. Activision is one ...

's 1997

Interstate '76 ''Interstate '76'' is a vehicular combat video game for Microsoft Windows. It was developed and published by Activision and released on March 28, 1997. Plot The game opens in the Southwestern United States in an alternate history of the year 1 ...

, a

video game, was released in 1999 by

id Software id Software LLC () is an American video game developer based in Richardson, Texas. It was founded on February 1, 1991, by four members of the computer company Softdisk: game programmer, programmers John Carmack and John Romero, game designer T ...

and used the algorithm. Brian Hook may have brought the algorithm from 3dfx to id Software. Copies of the fast inverse square root code appeared on

Usenet Usenet () is a worldwide distributed discussion system available on computers. It was developed from the general-purpose Unix-to-Unix Copy (UUCP) dial-up network architecture. Tom Truscott and Jim Ellis conceived the idea in 1979, and it wa ...

and other forums as early as 2002 or 2003. Speculation arose as to who wrote the algorithm and how the constant was derived; some guessed John Carmack. Quake III's full source code was released at QuakeCon 2005, but provided no answers. The authorship question was answered in 2006. In 2007 the algorithm was implemented in some dedicated hardware

vertex shader In computer graphics, a shader is a computer program that calculates the appropriate levels of light, darkness, and color during the rendering of a 3D scene - a process known as ''shading''. Shaders have evolved to perform a variety of speci ...

s using field-programmable gate arrays (FPGA).

Subsequent improvements

Magic number

It is not known precisely how the exact value for the magic number was determined. Chris Lomont developed a function to minimize approximation error by choosing the magic number

R

over a range. He first computed the optimal constant for the linear approximation step as , close to , but this new constant gave slightly less accuracy after one iteration of Newton's method. Lomont then searched for a constant optimal even after one and two Newton iterations and found , which is more accurate than the original at every iteration stage. He concluded by asking whether the exact value of the original constant was chosen through derivation or

trial and error Trial and error is a fundamental method of problem-solving characterized by repeated, varied attempts which are continued until success, or until the practicer stops trying. According to W.H. Thorpe, the term was devised by C. Lloyd Morgan (18 ...

. Lomont said that the magic number for 64-bit IEEE754 size type double is , but it was later shown by Matthew Robertson to be exactly . Jan Kadlec reduced the relative error by a further factor of 2.7 by adjusting the constants in the single Newton's method iteration as well, arriving after an exhaustive search at conv.i = 0x5F1FFFF9 - ( conv.i >> 1 ); conv.f *= 0.703952253f * ( 2.38924456f - x * conv.f * conv.f ); return conv.f; A complete mathematical analysis for determining the magic number is now available for single-precision floating-point numbers.

Zero finding

Intermediate to the use of one vs. two iterations of Newton's method in terms of speed and accuracy is a single iteration of

Halley's method In numerical analysis, Halley's method is a root-finding algorithm used for functions of one real variable with a continuous second derivative. It is named after its inventor Edmond Halley. The algorithm is second in the class of Householder's m ...

. In this case, Halley's method is equivalent to applying Newton's method with the starting formula

f(y) = \frac - xy^ = 0

. The update step is then :

y_ = y_ - \frac = y_n \left(\frac\right),

where the implementation should calculate

xy_n^2

only once, via a temporary variable.

Notes

References

Bibliography

* * * * * * * * * * * *

External links

0x5f3759df
further investigations into accuracy and generalizability of the algorithm by Christian Plesner Hansen
Origin of Quake3's Fast InvSqrt()Quake III Arena source codearchived in Software Heritage

Implementation of InvSqrt
i
DESMOS"Fast Inverse Square Root — A Quake III Algorithm"
(

YouTube YouTube is a global online video sharing and social media platform headquartered in San Bruno, California. It was launched on February 14, 2005, by Steve Chen, Chad Hurley, and Jawed Karim. It is owned by Google, and is the second mo ...

) {{Good article Quake (series) Source code Root-finding algorithms Articles with example C code