
Fast inverse square root, sometimes referred to as or by the
hexadecimal
Hexadecimal (also known as base-16 or simply hex) is a Numeral system#Positional systems in detail, positional numeral system that represents numbers using a radix (base) of sixteen. Unlike the decimal system representing numbers using ten symbo ...
constant , is an
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...
that estimates
, the
reciprocal (or multiplicative inverse) of the
square root
In mathematics, a square root of a number is a number such that y^2 = x; in other words, a number whose ''square'' (the result of multiplying the number by itself, or y \cdot y) is . For example, 4 and −4 are square roots of 16 because 4 ...
of a 32-
bit
The bit is the most basic unit of information in computing and digital communication. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented as ...
floating-point
In computing, floating-point arithmetic (FP) is arithmetic on subsets of real numbers formed by a ''significand'' (a Sign (mathematics), signed sequence of a fixed number of digits in some Radix, base) multiplied by an integer power of that ba ...
number
in
IEEE 754 floating-point format. The algorithm is best known for its implementation in 1999 in ''
Quake III Arena'', a
first-person shooter
A first-person shooter (FPS) is a video game genre, video game centered on gun fighting and other weapon-based combat seen from a First person (video games), first-person perspective, with the player experiencing the action directly through t ...
video game heavily based on
3D graphics
3D computer graphics, sometimes called CGI, 3D-CGI or three-dimensional computer graphics, are graphics that use a three-dimensional representation of geometric data (often Cartesian) that is stored in the computer for the purposes of perfor ...
. With subsequent hardware advancements, especially the
x86
x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
SSE instruction
rsqrtss
, this algorithm is not generally the best choice for modern computers,
though it remains an interesting historical example.
The algorithm accepts a 32-bit floating-point number as the input and stores a halved value for later use. Then, treating the bits representing the floating-point number as a 32-bit integer, a
logical shift
In computer science, a logical shift is a bitwise operation that shifts all the bits of its operand. The two base variants are the logical left shift and the logical right shift. This is further modulated by the number of bit positions a given v ...
right by one bit is performed and the result subtracted from the number , which is a floating-point representation of an approximation of
.
This results in the first approximation of the inverse square root of the input. Treating the bits again as a floating-point number, it runs one iteration of
Newton's method
In numerical analysis, the Newton–Raphson method, also known simply as Newton's method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a ...
, yielding a more precise approximation.
History
William Kahan
William "Velvel" Morton Kahan (born June 5, 1933) is a Canadian mathematician and computer scientist, who is a professor emeritus at University of California, Berkeley. He received the Turing Award in 1989 for "his fundamental contributions to nu ...
and K.C. Ng at Berkeley wrote an unpublished paper in May 1986 describing how to calculate the square root using bit-fiddling techniques followed by Newton iterations.
In the late 1980s,
Cleve Moler
Cleve Barry Moler (born August 17, 1939) is an American mathematician and computer programmer specializing in numerical analysis. In the mid to late 1970s, he was one of the authors of LINPACK and EISPACK, Fortran libraries for numerical compu ...
at
Ardent Computer Stardent Computer, Inc. was a manufacturer of graphics supercomputer workstations in the late 1980s. The company was formed in 1989 when Ardent Computer Corporation (formerly Dana Computer, Inc.) and Stellar Computer Inc. merged.
Both of the found ...
learned about this technique
and passed it along to his coworker Greg Walsh. Greg Walsh devised the now-famous constant and fast inverse square root algorithm. Gary Tarolli was consulting for Kubota, the company funding Ardent at the time, and likely brought the algorithm to
3dfx Interactive
3dfx Interactive, Inc. was an American computer hardware company headquartered in San Jose, California, founded in 1994, that specialized in the manufacturing of 3D graphics, 3D graphics processing units, and later, video cards. It was a pionee ...
circa 1994.
Jim Blinn demonstrated a simple approximation of the inverse square root in a 1997 column for ''
IEEE Computer Graphics and Applications''. Reverse engineering of other contemporary 3D video games uncovered a variation of the algorithm in
Activision
Activision Publishing, Inc. is an American video game publisher based in Santa Monica, California. It serves as the publishing business for its parent company, Activision Blizzard, and consists of several subsidiary studios. Activision is one o ...
's 1997
Interstate '76.
''
Quake III Arena'', a
first-person shooter
A first-person shooter (FPS) is a video game genre, video game centered on gun fighting and other weapon-based combat seen from a First person (video games), first-person perspective, with the player experiencing the action directly through t ...
video game, was released in 1999 by
id Software
id Software LLC () is an American video game developer based in Richardson, Texas. It was founded on February 1, 1991, by four members of the computer company Softdisk: game programmer, programmers John Carmack and John Romero, game designer T ...
and used the algorithm. Brian Hook may have brought the algorithm from 3dfx to id Software.
A discussion of the code appeared on the Chinese developer forum CSDN in 2000,
and
Usenet
Usenet (), a portmanteau of User's Network, is a worldwide distributed discussion system available on computers. It was developed from the general-purpose UUCP, Unix-to-Unix Copy (UUCP) dial-up network architecture. Tom Truscott and Jim Elli ...
and the gamedev.net forum spread the code widely in 2002 and 2003. Speculation arose as to who wrote the algorithm and how the constant was derived; some guessed
John Carmack
John D. Carmack II (born August 21, 1970) is an American computer programmer and video game developer. He co-founded the video game company id Software and was the lead programmer of its 1990s games ''Commander Keen'', ''Wolfenstein 3D'', ''Do ...
.
''Quake III''s full source code was released at
QuakeCon 2005, but provided no answers. The authorship question was resolved in 2006 when
Greg Walsh, the original author, contacted Beyond3D after their speculation gained popularity on
Slashdot
''Slashdot'' (sometimes abbreviated as ''/.'') is a social news website that originally billed itself as "News for Nerds. Stuff that Matters". It features news stories on science, technology, and politics that are submitted and evaluated by site ...
.
In 2007 the
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...
was implemented in some dedicated hardware
vertex shader
In computer graphics, a shader is a computer program that calculates the appropriate levels of light, darkness, and color during the rendering of a 3D scene—a process known as '' shading''. Shaders have evolved to perform a variety of s ...
s using
field-programmable gate arrays (FPGA).
Motivation

The inverse square root of a floating point number is used in
digital signal processing
Digital signal processing (DSP) is the use of digital processing, such as by computers or more specialized digital signal processors, to perform a wide variety of signal processing operations. The digital signals processed in this manner are a ...
to normalize a vector, scaling it to
length
Length is a measure of distance. In the International System of Quantities, length is a quantity with Dimension (physical quantity), dimension distance. In most systems of measurement a Base unit (measurement), base unit for length is chosen, ...
1 to produce a
unit vector
In mathematics, a unit vector in a normed vector space is a Vector (mathematics and physics), vector (often a vector (geometry), spatial vector) of Norm (mathematics), length 1. A unit vector is often denoted by a lowercase letter with a circumfle ...
. For example,
computer graphics
Computer graphics deals with generating images and art with the aid of computers. Computer graphics is a core technology in digital photography, film, video games, digital art, cell phone and computer displays, and many specialized applications. ...
programs use inverse
square roots
In mathematics, a square root of a number is a number such that y^2 = x; in other words, a number whose ''square'' (the result of multiplying the number by itself, or y \cdot y) is . For example, 4 and −4 are square roots of 16 because 4 ...
to compute
angles of incidence and
reflection for
lighting
Lighting or illumination is the deliberate use of light to achieve practical or aesthetic effects. Lighting includes the use of both artificial light sources like lamps and light fixtures, as well as natural illumination by capturing daylight. ...
and
shading
Shading refers to the depiction of depth perception in 3D models (within the field of 3D computer graphics) or illustrations (in visual art) by varying the level of darkness. Shading tries to approximate local behavior of light on the object's ...
.
3D graphics
3D computer graphics, sometimes called CGI, 3D-CGI or three-dimensional computer graphics, are graphics that use a three-dimensional representation of geometric data (often Cartesian) that is stored in the computer for the purposes of perfor ...
programs must perform millions of these calculations every second to simulate lighting. When the code was developed in the early 1990s, most floating point processing power lagged the speed of integer processing.
This was troublesome for 3D graphics programs before the advent of specialized hardware to handle
transform and lighting
Transform, clipping, and lighting (T&L or TCL) is a term used in computer graphics.
Overview
Transformation is the task of producing a two-dimensional view of a 3D computer graphics, three-dimensional scene. Clipping (computer graphics), Clipp ...
. Computation of square roots usually depends upon many division operations, which for floating point numbers are
computationally expensive. The fast inverse square generates a good approximation with only one division step.
The length of the
vector
Vector most often refers to:
* Euclidean vector, a quantity with a magnitude and a direction
* Disease vector, an agent that carries and transmits an infectious pathogen into another living organism
Vector may also refer to:
Mathematics a ...
is determined by calculating its
Euclidean norm
Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, in Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are ''Euclidean spaces'' ...
: the square root of the sum of squares of the
vector components
Vector most often refers to:
* Euclidean vector, a quantity with a magnitude and a direction
* Disease vector, an agent that carries and transmits an infectious pathogen into another living organism
Vector may also refer to:
Mathematics a ...
. When each component of the vector is divided by that length, the new vector will be a unit vector pointing in the same direction. In a 3D graphics program, all vectors are in three-
dimensional space, so
would be a vector
. Then,
:
is the Euclidean norm of the vector, and the normalized (unit) vector is
:
where the fraction term is the inverse square root of
.
At the time, floating-point division was generally expensive compared to multiplication; the fast inverse square root algorithm bypassed the division step, giving it its performance advantage.
Overview of the code
The following
C code is the fast inverse square root implementation from ''
Quake III Arena'', stripped of
C preprocessor
The C preprocessor (CPP) is a text file processor that is used with C, C++ and other programming tools. The preprocessor provides for file inclusion (often header files), macro expansion, conditional compilation, and line control. Although ...
directives, but including the exact original comment text:
float Q_rsqrt( float number )
At the time, the general method to compute the inverse square root was to calculate an approximation for
, then revise that approximation via another method until it came within an acceptable error range of the actual result.
Common software methods in the early 1990s drew approximations from a
lookup table
In computer science, a lookup table (LUT) is an array data structure, array that replaces runtime (program lifecycle phase), runtime computation of a mathematical function (mathematics), function with a simpler array indexing operation, in a proc ...
. The key of the fast inverse square root was to directly compute an approximation by utilizing the structure of floating-point numbers, proving faster than table lookups. The algorithm was approximately four times faster than computing the square root with another method and calculating the reciprocal via floating-point division. The algorithm was designed with the
IEEE 754-1985
IEEE 754-1985 is a historic industry standard for representing floating-point numbers in computers, officially adopted in 1985 and superseded in 2008 by IEEE 754-2008, and then again in 2019 by minor revision IEEE 754-2019. During its 23 years, ...
32-bit floating-point specification in mind, but investigation from Chris Lomont showed that it could be implemented in other floating-point specifications.
The advantages in speed offered by the fast inverse square root trick came from treating the 32-bit floating-point word
[Use of the type ]long
reduces the portability of this code on modern systems. For the code to execute properly, sizeof
sizeof is a unary operator in the C and C++ programming languages that evaluates to the storage size of an expression or a data type, measured in units sized as char. Consequently, the expression sizeof(char) evaluates to 1. The number of b ...
(long)
must be 4 bytes, otherwise negative outputs may result. Under many modern 64-bit systems, sizeof
sizeof is a unary operator in the C and C++ programming languages that evaluates to the storage size of an expression or a data type, measured in units sized as char. Consequently, the expression sizeof(char) evaluates to 1. The number of b ...
(long)
is 8 bytes. The more portable replacement is int32_t
. as an
integer
An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ...
, then subtracting it from a "
magic" constant, .
This integer subtraction and
bit shift
In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral (considered as a bit string) at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic opera ...
results in a bit pattern which, when
re-defined as a floating-point number, is a rough approximation for the inverse square root of the number. One iteration of Newton's method is performed to gain some accuracy, and the code is finished. The algorithm generates reasonably accurate results using a unique first approximation for
Newton's method
In numerical analysis, the Newton–Raphson method, also known simply as Newton's method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a ...
; however, it is much slower and less accurate than using the
SSE instruction
rsqrtss
on x86 processors also released in 1999.
Worked example
As an example, the number
can be used to calculate
. The first steps of the algorithm are illustrated below:
0011_1110_0010_0000_0000_0000_0000_0000 Bit pattern of both x and i
0001_1111_0001_0000_0000_0000_0000_0000 Shift right one position: (i >> 1)
0101_1111_0011_0111_0101_1001_1101_1111 The magic number 0x5F3759DF
0100_0000_0010_0111_0101_1001_1101_1111 The result of 0x5F3759DF - (i >> 1)
Interpreting as
IEEE
The Institute of Electrical and Electronics Engineers (IEEE) is an American 501(c)(3) organization, 501(c)(3) public charity professional organization for electrical engineering, electronics engineering, and other related disciplines.
The IEEE ...
32-bit representation:
0_01111100_01000000000000000000000 1.25 × 2
−3
0_00111110_00100000000000000000000 1.125 × 2
−65
0_10111110_01101110101100111011111 1.432430... × 2
63
0_10000000_01001110101100111011111 1.307430... × 2
1
Reinterpreting this last bit pattern as a floating point number gives the approximation
, which has an error of about 3.4%. After one iteration of
Newton's method
In numerical analysis, the Newton–Raphson method, also known simply as Newton's method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a ...
, the final result is
, an error of only 0.17%.
Avoiding undefined behavior
According to the C standard, reinterpreting a floating point value as an integer by casting then dereferencing the pointer to it is not valid (
undefined behavior
In computer programming, a program exhibits undefined behavior (UB) when it contains, or is executing code for which its programming language specification does not mandate any specific requirements. This is different from unspecified behavior, ...
). This can be avoided by using alternative
type punning
In computer science, a type punning is any programming technique that subverts or circumvents the type system of a programming language in order to achieve an effect that would be difficult or impossible to achieve within the bounds of the formal ...
techniques such as C's unions or
C++20 C20 or C-20 may refer to:
Science and technology
* Carbon-20 (C-20 or 20C), an isotope of carbon
* C20, the smallest possible fullerene (a carbon molecule)
* C20 (engineering), a mix of concrete that has a compressive strength of 20 newtons per squ ...
's
std::bit_cast
.
Algorithm
The algorithm computes
by performing the following steps:
# Alias the argument
to an integer as a way to compute an approximation of the
binary logarithm
In mathematics, the binary logarithm () is the exponentiation, power to which the number must be exponentiation, raised to obtain the value . That is, for any real number ,
:x=\log_2 n \quad\Longleftrightarrow\quad 2^x=n.
For example, th ...
# Use this approximation to compute an approximation of
# Alias back to a float, as a way to compute an approximation of the base-2 exponential
# Refine the approximation using a single iteration of
Newton's method
In numerical analysis, the Newton–Raphson method, also known simply as Newton's method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a ...
.
Floating-point representation
Since this
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...
relies heavily on the bit-level representation of single-precision floating-point numbers, a short overview of this representation is provided here. To encode a non-zero
real number
In mathematics, a real number is a number that can be used to measure a continuous one- dimensional quantity such as a duration or temperature. Here, ''continuous'' means that pairs of values can have arbitrarily small differences. Every re ...
as a single precision float, the first step is to write
as a
normalized binary number:
:
where the exponent
is an integer, and
is the binary representation of the
significand
The significand (also coefficient, sometimes argument, or more ambiguously mantissa, fraction, or characteristic) is the first (left) part of a number in scientific notation or related concepts in floating-point representation, consisting of its s ...
. Since the single bit before the point in the significand is always 1, it does not need be stored. The equation can be rewritten as:
:
where
means
, so
. From this form, three unsigned integers are computed:
*
, the "sign bit", is
if
is positive and
negative or zero (1 bit)
*
is the "biased exponent", where
is the "exponent bias"
[ should be in the range for to be representable as a ]normal number
In mathematics, a real number is said to be simply normal in an integer base b if its infinite sequence of digits is distributed uniformly in the sense that each of the b digit values has the same natural density 1/b. A number is said to ...
. (8 bits)
*
, where
[The only real numbers that can be represented ''exactly'' as floating point are those for which is an integer. Other numbers can only be represented approximately by rounding them to the nearest exactly representable number.] (23 bits)
Thus:
and
.
These fields are then packed, left to right, into a 32-bit container.
As an example, consider again the number
. Normalizing
yields:
:
and thus, the three unsigned integer fields are:
*
*
*
these fields are packed as shown in the figure below:

The number is represented in
binary
Binary may refer to:
Science and technology Mathematics
* Binary number, a representation of numbers using only two values (0 and 1) for each digit
* Binary function, a function that takes two arguments
* Binary operation, a mathematical op ...
as:
Also, since this algorithm works on real numbers,
is only defined for
. The code thus assumes
and
.
The number, given to calculate the square root, could be rewritten as:
:
:
Aliasing to an integer as an approximate logarithm
If
were to be calculated without a computer or a calculator, a
table of logarithms would be useful, together with the identity
, which is valid for every base
. The fast inverse square root is based on this identity, and on the fact that aliasing a float32 to an integer gives a rough approximation of its logarithm. Here is how:
If
is a positive
normal number
In mathematics, a real number is said to be simply normal in an integer base b if its infinite sequence of digits is distributed uniformly in the sense that each of the b digit values has the same natural density 1/b. A number is said to ...
:
:
then
:
and since