HOME

TheInfoList



OR:

Fast inverse square root, sometimes referred to as or by the
hexadecimal Hexadecimal (also known as base-16 or simply hex) is a Numeral system#Positional systems in detail, positional numeral system that represents numbers using a radix (base) of sixteen. Unlike the decimal system representing numbers using ten symbo ...
constant , is an
algorithm In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...
that estimates \frac, the reciprocal (or multiplicative inverse) of the
square root In mathematics, a square root of a number is a number such that y^2 = x; in other words, a number whose ''square'' (the result of multiplying the number by itself, or y \cdot y) is . For example, 4 and −4 are square roots of 16 because 4 ...
of a 32-
bit The bit is the most basic unit of information in computing and digital communication. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented as ...
floating-point In computing, floating-point arithmetic (FP) is arithmetic on subsets of real numbers formed by a ''significand'' (a Sign (mathematics), signed sequence of a fixed number of digits in some Radix, base) multiplied by an integer power of that ba ...
number x in IEEE 754 floating-point format. The algorithm is best known for its implementation in 1999 in '' Quake III Arena'', a
first-person shooter A first-person shooter (FPS) is a video game genre, video game centered on gun fighting and other weapon-based combat seen from a First person (video games), first-person perspective, with the player experiencing the action directly through t ...
video game heavily based on
3D graphics 3D computer graphics, sometimes called CGI, 3D-CGI or three-dimensional computer graphics, are graphics that use a three-dimensional representation of geometric data (often Cartesian) that is stored in the computer for the purposes of perfor ...
. With subsequent hardware advancements, especially the
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
SSE instruction rsqrtss, this algorithm is not generally the best choice for modern computers, though it remains an interesting historical example. The algorithm accepts a 32-bit floating-point number as the input and stores a halved value for later use. Then, treating the bits representing the floating-point number as a 32-bit integer, a
logical shift In computer science, a logical shift is a bitwise operation that shifts all the bits of its operand. The two base variants are the logical left shift and the logical right shift. This is further modulated by the number of bit positions a given v ...
right by one bit is performed and the result subtracted from the number , which is a floating-point representation of an approximation of \sqrt. This results in the first approximation of the inverse square root of the input. Treating the bits again as a floating-point number, it runs one iteration of
Newton's method In numerical analysis, the Newton–Raphson method, also known simply as Newton's method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a ...
, yielding a more precise approximation.


History

William Kahan William "Velvel" Morton Kahan (born June 5, 1933) is a Canadian mathematician and computer scientist, who is a professor emeritus at University of California, Berkeley. He received the Turing Award in 1989 for "his fundamental contributions to nu ...
and K.C. Ng at Berkeley wrote an unpublished paper in May 1986 describing how to calculate the square root using bit-fiddling techniques followed by Newton iterations. In the late 1980s,
Cleve Moler Cleve Barry Moler (born August 17, 1939) is an American mathematician and computer programmer specializing in numerical analysis. In the mid to late 1970s, he was one of the authors of LINPACK and EISPACK, Fortran libraries for numerical compu ...
at
Ardent Computer Stardent Computer, Inc. was a manufacturer of graphics supercomputer workstations in the late 1980s. The company was formed in 1989 when Ardent Computer Corporation (formerly Dana Computer, Inc.) and Stellar Computer Inc. merged. Both of the found ...
learned about this technique and passed it along to his coworker Greg Walsh. Greg Walsh devised the now-famous constant and fast inverse square root algorithm. Gary Tarolli was consulting for Kubota, the company funding Ardent at the time, and likely brought the algorithm to
3dfx Interactive 3dfx Interactive, Inc. was an American computer hardware company headquartered in San Jose, California, founded in 1994, that specialized in the manufacturing of 3D graphics, 3D graphics processing units, and later, video cards. It was a pionee ...
circa 1994. Jim Blinn demonstrated a simple approximation of the inverse square root in a 1997 column for '' IEEE Computer Graphics and Applications''. Reverse engineering of other contemporary 3D video games uncovered a variation of the algorithm in
Activision Activision Publishing, Inc. is an American video game publisher based in Santa Monica, California. It serves as the publishing business for its parent company, Activision Blizzard, and consists of several subsidiary studios. Activision is one o ...
's 1997 Interstate '76. '' Quake III Arena'', a
first-person shooter A first-person shooter (FPS) is a video game genre, video game centered on gun fighting and other weapon-based combat seen from a First person (video games), first-person perspective, with the player experiencing the action directly through t ...
video game, was released in 1999 by
id Software id Software LLC () is an American video game developer based in Richardson, Texas. It was founded on February 1, 1991, by four members of the computer company Softdisk: game programmer, programmers John Carmack and John Romero, game designer T ...
and used the algorithm. Brian Hook may have brought the algorithm from 3dfx to id Software. A discussion of the code appeared on the Chinese developer forum CSDN in 2000, and
Usenet Usenet (), a portmanteau of User's Network, is a worldwide distributed discussion system available on computers. It was developed from the general-purpose UUCP, Unix-to-Unix Copy (UUCP) dial-up network architecture. Tom Truscott and Jim Elli ...
and the gamedev.net forum spread the code widely in 2002 and 2003. Speculation arose as to who wrote the algorithm and how the constant was derived; some guessed
John Carmack John D. Carmack II (born August 21, 1970) is an American computer programmer and video game developer. He co-founded the video game company id Software and was the lead programmer of its 1990s games ''Commander Keen'', ''Wolfenstein 3D'', ''Do ...
. ''Quake III''s full source code was released at QuakeCon 2005, but provided no answers. The authorship question was resolved in 2006 when Greg Walsh, the original author, contacted Beyond3D after their speculation gained popularity on
Slashdot ''Slashdot'' (sometimes abbreviated as ''/.'') is a social news website that originally billed itself as "News for Nerds. Stuff that Matters". It features news stories on science, technology, and politics that are submitted and evaluated by site ...
. In 2007 the
algorithm In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...
was implemented in some dedicated hardware
vertex shader In computer graphics, a shader is a computer program that calculates the appropriate levels of light, darkness, and color during the rendering of a 3D scene—a process known as '' shading''. Shaders have evolved to perform a variety of s ...
s using field-programmable gate arrays (FPGA).


Motivation

The inverse square root of a floating point number is used in
digital signal processing Digital signal processing (DSP) is the use of digital processing, such as by computers or more specialized digital signal processors, to perform a wide variety of signal processing operations. The digital signals processed in this manner are a ...
to normalize a vector, scaling it to
length Length is a measure of distance. In the International System of Quantities, length is a quantity with Dimension (physical quantity), dimension distance. In most systems of measurement a Base unit (measurement), base unit for length is chosen, ...
1 to produce a
unit vector In mathematics, a unit vector in a normed vector space is a Vector (mathematics and physics), vector (often a vector (geometry), spatial vector) of Norm (mathematics), length 1. A unit vector is often denoted by a lowercase letter with a circumfle ...
. For example,
computer graphics Computer graphics deals with generating images and art with the aid of computers. Computer graphics is a core technology in digital photography, film, video games, digital art, cell phone and computer displays, and many specialized applications. ...
programs use inverse
square roots In mathematics, a square root of a number is a number such that y^2 = x; in other words, a number whose ''square'' (the result of multiplying the number by itself, or y \cdot y) is . For example, 4 and −4 are square roots of 16 because 4 ...
to compute angles of incidence and reflection for
lighting Lighting or illumination is the deliberate use of light to achieve practical or aesthetic effects. Lighting includes the use of both artificial light sources like lamps and light fixtures, as well as natural illumination by capturing daylight. ...
and
shading Shading refers to the depiction of depth perception in 3D models (within the field of 3D computer graphics) or illustrations (in visual art) by varying the level of darkness. Shading tries to approximate local behavior of light on the object's ...
.
3D graphics 3D computer graphics, sometimes called CGI, 3D-CGI or three-dimensional computer graphics, are graphics that use a three-dimensional representation of geometric data (often Cartesian) that is stored in the computer for the purposes of perfor ...
programs must perform millions of these calculations every second to simulate lighting. When the code was developed in the early 1990s, most floating point processing power lagged the speed of integer processing. This was troublesome for 3D graphics programs before the advent of specialized hardware to handle
transform and lighting Transform, clipping, and lighting (T&L or TCL) is a term used in computer graphics. Overview Transformation is the task of producing a two-dimensional view of a 3D computer graphics, three-dimensional scene. Clipping (computer graphics), Clipp ...
. Computation of square roots usually depends upon many division operations, which for floating point numbers are computationally expensive. The fast inverse square generates a good approximation with only one division step. The length of the
vector Vector most often refers to: * Euclidean vector, a quantity with a magnitude and a direction * Disease vector, an agent that carries and transmits an infectious pathogen into another living organism Vector may also refer to: Mathematics a ...
is determined by calculating its
Euclidean norm Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, in Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are ''Euclidean spaces'' ...
: the square root of the sum of squares of the
vector components Vector most often refers to: * Euclidean vector, a quantity with a magnitude and a direction * Disease vector, an agent that carries and transmits an infectious pathogen into another living organism Vector may also refer to: Mathematics a ...
. When each component of the vector is divided by that length, the new vector will be a unit vector pointing in the same direction. In a 3D graphics program, all vectors are in three- dimensional space, so \boldsymbol v would be a vector (v_1, v_2, v_3). Then, :\, \boldsymbol\, = \sqrt is the Euclidean norm of the vector, and the normalized (unit) vector is :\begin \boldsymbol &= \frac\boldsymbol\\ &= \frac\sqrt\,\boldsymbol, \end where the fraction term is the inverse square root of v_1^2+v_2^2+v_3^2. At the time, floating-point division was generally expensive compared to multiplication; the fast inverse square root algorithm bypassed the division step, giving it its performance advantage.


Overview of the code

The following C code is the fast inverse square root implementation from '' Quake III Arena'', stripped of
C preprocessor The C preprocessor (CPP) is a text file processor that is used with C, C++ and other programming tools. The preprocessor provides for file inclusion (often header files), macro expansion, conditional compilation, and line control. Although ...
directives, but including the exact original comment text: float Q_rsqrt( float number ) At the time, the general method to compute the inverse square root was to calculate an approximation for \frac, then revise that approximation via another method until it came within an acceptable error range of the actual result. Common software methods in the early 1990s drew approximations from a
lookup table In computer science, a lookup table (LUT) is an array data structure, array that replaces runtime (program lifecycle phase), runtime computation of a mathematical function (mathematics), function with a simpler array indexing operation, in a proc ...
. The key of the fast inverse square root was to directly compute an approximation by utilizing the structure of floating-point numbers, proving faster than table lookups. The algorithm was approximately four times faster than computing the square root with another method and calculating the reciprocal via floating-point division. The algorithm was designed with the
IEEE 754-1985 IEEE 754-1985 is a historic industry standard for representing floating-point numbers in computers, officially adopted in 1985 and superseded in 2008 by IEEE 754-2008, and then again in 2019 by minor revision IEEE 754-2019. During its 23 years, ...
32-bit floating-point specification in mind, but investigation from Chris Lomont showed that it could be implemented in other floating-point specifications. The advantages in speed offered by the fast inverse square root trick came from treating the 32-bit floating-point wordUse of the type long reduces the portability of this code on modern systems. For the code to execute properly,
sizeof sizeof is a unary operator in the C and C++ programming languages that evaluates to the storage size of an expression or a data type, measured in units sized as char. Consequently, the expression sizeof(char) evaluates to 1. The number of b ...
(long)
must be 4 bytes, otherwise negative outputs may result. Under many modern 64-bit systems,
sizeof sizeof is a unary operator in the C and C++ programming languages that evaluates to the storage size of an expression or a data type, measured in units sized as char. Consequently, the expression sizeof(char) evaluates to 1. The number of b ...
(long)
is 8 bytes. The more portable replacement is int32_t.
as an
integer An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ...
, then subtracting it from a " magic" constant, . This integer subtraction and
bit shift In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral (considered as a bit string) at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic opera ...
results in a bit pattern which, when re-defined as a floating-point number, is a rough approximation for the inverse square root of the number. One iteration of Newton's method is performed to gain some accuracy, and the code is finished. The algorithm generates reasonably accurate results using a unique first approximation for
Newton's method In numerical analysis, the Newton–Raphson method, also known simply as Newton's method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a ...
; however, it is much slower and less accurate than using the SSE instruction rsqrtss on x86 processors also released in 1999.


Worked example

As an example, the number x=0.15625 can be used to calculate \frac \approx 2.52982. The first steps of the algorithm are illustrated below: 0011_1110_0010_0000_0000_0000_0000_0000 Bit pattern of both x and i 0001_1111_0001_0000_0000_0000_0000_0000 Shift right one position: (i >> 1) 0101_1111_0011_0111_0101_1001_1101_1111 The magic number 0x5F3759DF 0100_0000_0010_0111_0101_1001_1101_1111 The result of 0x5F3759DF - (i >> 1) Interpreting as
IEEE The Institute of Electrical and Electronics Engineers (IEEE) is an American 501(c)(3) organization, 501(c)(3) public charity professional organization for electrical engineering, electronics engineering, and other related disciplines. The IEEE ...
32-bit representation: 0_01111100_01000000000000000000000 1.25 × 2−3 0_00111110_00100000000000000000000 1.125 × 2−65 0_10111110_01101110101100111011111 1.432430... × 263 0_10000000_01001110101100111011111 1.307430... × 21 Reinterpreting this last bit pattern as a floating point number gives the approximation y=2.61486, which has an error of about 3.4%. After one iteration of
Newton's method In numerical analysis, the Newton–Raphson method, also known simply as Newton's method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a ...
, the final result is y=2.52549, an error of only 0.17%.


Avoiding undefined behavior

According to the C standard, reinterpreting a floating point value as an integer by casting then dereferencing the pointer to it is not valid (
undefined behavior In computer programming, a program exhibits undefined behavior (UB) when it contains, or is executing code for which its programming language specification does not mandate any specific requirements. This is different from unspecified behavior, ...
). This can be avoided by using alternative
type punning In computer science, a type punning is any programming technique that subverts or circumvents the type system of a programming language in order to achieve an effect that would be difficult or impossible to achieve within the bounds of the formal ...
techniques such as C's unions or
C++20 C20 or C-20 may refer to: Science and technology * Carbon-20 (C-20 or 20C), an isotope of carbon * C20, the smallest possible fullerene (a carbon molecule) * C20 (engineering), a mix of concrete that has a compressive strength of 20 newtons per squ ...
's std::bit_cast.


Algorithm

The algorithm computes \frac by performing the following steps: # Alias the argument x to an integer as a way to compute an approximation of the
binary logarithm In mathematics, the binary logarithm () is the exponentiation, power to which the number must be exponentiation, raised to obtain the value . That is, for any real number , :x=\log_2 n \quad\Longleftrightarrow\quad 2^x=n. For example, th ...
\log_(x) # Use this approximation to compute an approximation of \log_\left(\frac\right) = -\frac \log_(x) # Alias back to a float, as a way to compute an approximation of the base-2 exponential # Refine the approximation using a single iteration of
Newton's method In numerical analysis, the Newton–Raphson method, also known simply as Newton's method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a ...
.


Floating-point representation

Since this
algorithm In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...
relies heavily on the bit-level representation of single-precision floating-point numbers, a short overview of this representation is provided here. To encode a non-zero
real number In mathematics, a real number is a number that can be used to measure a continuous one- dimensional quantity such as a duration or temperature. Here, ''continuous'' means that pairs of values can have arbitrarily small differences. Every re ...
x as a single precision float, the first step is to write x as a normalized binary number: :\begin x &= \pm 1.b_1b_2b_3\ldots \times 2^ \end where the exponent e_x is an integer, and 1.b_1b_2b_3\ldots is the binary representation of the
significand The significand (also coefficient, sometimes argument, or more ambiguously mantissa, fraction, or characteristic) is the first (left) part of a number in scientific notation or related concepts in floating-point representation, consisting of its s ...
. Since the single bit before the point in the significand is always 1, it does not need be stored. The equation can be rewritten as: :\begin x &= (-1)^ \cdot 2^ (1 + m_x) \end where m_x means 0.b_1b_2b_3\ldots, so m_x \in [0, 1). From this form, three unsigned integers are computed: * S_x, the "sign bit", is 0 if x is positive and 1 negative or zero (1 bit) * E_x = e_x + B is the "biased exponent", where B = 127 is the "exponent bias"E_x should be in the range [1, 254] for x to be representable as a
normal number In mathematics, a real number is said to be simply normal in an integer base b if its infinite sequence of digits is distributed uniformly in the sense that each of the b digit values has the same natural density 1/b. A number is said to ...
.
(8 bits) * M_x = m_x \times L, where L = 2^The only real numbers that can be represented ''exactly'' as floating point are those for which M_x is an integer. Other numbers can only be represented approximately by rounding them to the nearest exactly representable number. (23 bits) Thus: e_x = E_x -B and m_x = \frac. These fields are then packed, left to right, into a 32-bit container. As an example, consider again the number x = 0.15625 = 0.00101_2. Normalizing x yields: :x = (-1)^ \cdot 2^(1 + 0.25) = +2^(1 + 0.25) and thus, the three unsigned integer fields are: * S = 0 * E = -3 + 127 = 124 = 0111\ 1100_2 * M = 0.25 \times 2^ = 2\ 097\ 152 = 0010\ 0000\ 0000\ 0000\ 0000\ 0000_2 these fields are packed as shown in the figure below: The number is represented in
binary Binary may refer to: Science and technology Mathematics * Binary number, a representation of numbers using only two values (0 and 1) for each digit * Binary function, a function that takes two arguments * Binary operation, a mathematical op ...
as: I_x = S_x \cdot 2^ + E_xL + M_x Also, since this algorithm works on real numbers, \sqrt is only defined for x \geq 0. The code thus assumes x \geq 0 and S_x = 0. The number, given to calculate the square root, could be rewritten as: :I_x = E_xL +M_x :x = (-1)^0 \cdot 2^ (1 + m_x) = + 2^ (1 + m_x)


Aliasing to an integer as an approximate logarithm

If \frac were to be calculated without a computer or a calculator, a table of logarithms would be useful, together with the identity \log_b\left(\frac\right) = \log_b\left(x^\right) = -\frac \log_b(x), which is valid for every base b. The fast inverse square root is based on this identity, and on the fact that aliasing a float32 to an integer gives a rough approximation of its logarithm. Here is how: If x is a positive
normal number In mathematics, a real number is said to be simply normal in an integer base b if its infinite sequence of digits is distributed uniformly in the sense that each of the b digit values has the same natural density 1/b. A number is said to ...
: :x = 2^ (1 + m_x) then :\log_2(x) = e_x + \log_2(1 + m_x) and since m_x \in optimal approximation (the best in the sense of the
uniform norm In mathematical analysis, the uniform norm (or ) assigns, to real- or complex-valued bounded functions defined on a set , the non-negative number :\, f\, _\infty = \, f\, _ = \sup\left\. This norm is also called the , the , the , or, when t ...
of the error). However, this value is not used by the algorithm as it does not take subsequent steps into account. Thus there is the approximation :\log_2(x) \approx e_x + m_x + \sigma. Interpreting the floating-point bit-pattern of x as an integer I_x yieldsSince x is positive, S_x = 0. :\begin I_x &= E_x L + M_x\\ &= L (e_x + B + m_x)\\ &= L (e_x + m_x + \sigma + B - \sigma)\\ &\approx L \log_2(x) + L (B - \sigma). \end It then appears that I_x is a scaled and shifted piecewise-linear approximation of \log_2(x), as illustrated in the figure on the right. In other words, \log_2(x) is approximated by :\log_2(x) \approx \frac - (B - \sigma).


First approximation of the result

The calculation of y=\frac is based on the identity :\log_2(y) = - \tfrac\log_2(x) Using the approximation of the logarithm above, applied to both x and y, the above equation gives: :\frac - (B - \sigma) \approx - \frac\left(\frac - (B - \sigma)\right) Thus, an approximation of I_y is: :I_y \approx \tfrac L (B - \sigma) - \tfrac I_x which is written in the code as : i = 0x5f3759df - ( i >> 1 ); The first term above is the magic number :\tfrac L (B - \sigma) = \mathtt from which it can be inferred that \sigma \approx 0.0450466. The second term, \fracI_x, is calculated by shifting the bits of I_x one position to the right.


Newton's method

The number y=\tfrac is a solution of the equation \tfrac-x=0. The approximation yielded by the earlier steps can be refined by using a root-finding method, a method that finds the
zero of a function In mathematics, a zero (also sometimes called a root) of a real-, complex-, or generally vector-valued function f, is a member x of the domain of f such that f(x) ''vanishes'' at x; that is, the function f attains the value of 0 at x, or equ ...
. The algorithm uses
Newton's method In numerical analysis, the Newton–Raphson method, also known simply as Newton's method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a ...
: if there is an approximation, y_n for y, then a better approximation y_ can be calculated by taking y_n - \tfrac, where f'(y_n) is the
derivative In mathematics, the derivative is a fundamental tool that quantifies the sensitivity to change of a function's output with respect to its input. The derivative of a function of a single variable at a chosen input value, when it exists, is t ...
of f(y) at y_n. Applied to the equation \tfrac-x=0, Newton's method gives :\begin f(y) &= \frac - x\\ f'(y) &= -\frac\\ y_ &= y_n - \frac\\ &= y_n + \frac \left(\frac - x\right)\\ &= y_n \left(\frac - \fracy_n^2\right), \end which is written in the code as y = y * ( threehalfs - ( x2 * y * y ) );. By repeating this step, using the output of the function (y_) as the input of the next iteration, the algorithm causes y to
converge Converge may refer to: * Converge (band), American hardcore punk band * Converge (Baptist denomination), American national evangelical Baptist body * Limit (mathematics) In mathematics, a limit is the value that a function (or sequence) app ...
to the inverse square root. For the purposes of the ''Quake III'' engine, only one iteration was used. A second iteration remained in the code but was commented out.


Accuracy

As noted above, the approximation is very accurate. The single graph on the right plots the error of the function (that is, the error of the approximation after it has been improved by running one iteration of Newton's method), for inputs starting at 0.01, where the standard library gives 10.0 as a result, and InvSqrt() gives 9.982522, making the relative difference 0.0017478, or 0.175% of the true value, 10. The absolute error only drops from then on, and the relative error stays within the same bounds across all orders of magnitude.


Subsequent improvements


Magic number

It is not known precisely how the exact value for the magic number was determined. Chris Lomont developed a function to minimize
approximation error The approximation error in a given data value represents the significant discrepancy that arises when an exact, true value is compared against some approximation derived for it. This inherent error in approximation can be quantified and express ...
by choosing the magic number R over a range. He first computed the optimal constant for the linear approximation step as , close to , but this new constant gave slightly less accuracy after one iteration of Newton's method. Lomont then searched for a constant optimal even after one and two Newton iterations and found , which is more accurate than the original at every iteration stage. He concluded by asking whether the exact value of the original constant was chosen through derivation or
trial and error Trial and error is a fundamental method of problem-solving characterized by repeated, varied attempts which are continued until success, or until the practicer stops trying. According to W.H. Thorpe, the term was devised by C. Lloyd Morgan ( ...
. Lomont said that the magic number for 64-bit IEEE754 size type double is , but it was later shown by Matthew Robertson to be exactly . Jan Kadlec reduced the relative error by a further factor of 2.7 by adjusting the constants in the single Newton's method iteration as well, arriving after an exhaustive search at conv.i = 0x5F1FFFF9 - ( conv.i >> 1 ); conv.f *= 0.703952253f * ( 2.38924456f - x * conv.f * conv.f ); return conv.f; A complete mathematical analysis for determining the magic number is now available for single-precision floating-point numbers.


Zero finding

Intermediate to the use of one vs. two iterations of Newton's method in terms of speed and accuracy is a single iteration of Halley's method. In this case, Halley's method is equivalent to applying Newton's method with the starting formula f(y) = \frac - xy^ = 0. The update step is then :y_ = y_ - \frac = y_n \left(\frac\right), where the implementation should calculate xy_n^2 only once, via a temporary variable.


Obsolescence

Subsequent additions by hardware manufacturers have made this algorithm redundant for the most part. For example, on
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
, Intel introduced the SSE instruction rsqrtss in 1999. In a 2009 benchmark on the
Intel Core 2 Intel Core 2 is a processor family encompassing a range of Intel's mainstream 64-bit x86-64 single-, dual-, and quad-core microprocessors based on the Core microarchitecture. The single- and dual-core models are single- die, whereas the quad-co ...
, this instruction took 0.85ns per float compared to 3.54ns for the fast inverse square root algorithm, and had less error. Some low-cost embedded systems do not have specialized square root instructions. However, manufacturers of these systems usually provide trigonometric and other math libraries, based on algorithms such as
CORDIC CORDIC, short for coordinate rotation digital computer, is a simple and efficient algorithm to calculate trigonometric functions, hyperbolic functions, square roots, multiplications, divisions, and exponentials and logarithms In ma ...
.


See also

* * Magic number


Notes


References


Bibliography

* * * * * Centenary Edition, 2008, . * * * * * * *


Further reading

*


External links


0x5f3759df
further investigations into accuracy and generalizability of the algorithm by Christian Plesner Hansen
Origin of Quake3's Fast InvSqrt()

Quake III Arena source codearchived in Software Heritage

Implementation of InvSqrt
i
DESMOS

"Fast Inverse Square Root — A Quake III Algorithm"
(
YouTube YouTube is an American social media and online video sharing platform owned by Google. YouTube was founded on February 14, 2005, by Steve Chen, Chad Hurley, and Jawed Karim who were three former employees of PayPal. Headquartered in ...
) {{Good article Quake (series) Source code Root-finding algorithms Articles with example C code