HOME

TheInfoList



OR:

The Cooley–Tukey algorithm, named after J. W. Cooley and John Tukey, is the most common fast Fourier transform (FFT) algorithm. It re-expresses the
discrete Fourier transform In mathematics, the discrete Fourier transform (DFT) converts a finite sequence of equally-spaced samples of a function into a same-length sequence of equally-spaced samples of the discrete-time Fourier transform (DTFT), which is a comple ...
(DFT) of an arbitrary
composite Composite or compositing may refer to: Materials * Composite material, a material that is made from several different substances ** Metal matrix composite, composed of metal and other parts ** Cermet, a composite of ceramic and metallic materials ...
size N = N_1N_2 in terms of ''N''1 smaller DFTs of sizes ''N''2,
recursively Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics ...
, to reduce the computation time to O(''N'' log ''N'') for highly composite ''N'' (
smooth number In number theory, an ''n''-smooth (or ''n''-friable) number is an integer whose prime factors are all less than or equal to ''n''. For example, a 7-smooth number is a number whose every prime factor is at most 7, so 49 = 72 and 15750 = 2 × 32 ...
s). Because of the algorithm's importance, specific variants and implementation styles have become known by their own names, as described below. Because the Cooley–Tukey algorithm breaks the DFT into smaller DFTs, it can be combined arbitrarily with any other algorithm for the DFT. For example, Rader's or Bluestein's algorithm can be used to handle large prime factors that cannot be decomposed by Cooley–Tukey, or the prime-factor algorithm can be exploited for greater efficiency in separating out
relatively prime In mathematics, two integers and are coprime, relatively prime or mutually prime if the only positive integer that is a divisor of both of them is 1. Consequently, any prime number that divides does not divide , and vice versa. This is equivale ...
factors. The algorithm, along with its recursive application, was invented by
Carl Friedrich Gauss Johann Carl Friedrich Gauss (; german: Gauß ; la, Carolus Fridericus Gauss; 30 April 177723 February 1855) was a German mathematician and physicist who made significant contributions to many fields in mathematics and science. Sometimes refer ...
. Cooley and Tukey independently rediscovered and popularized it 160 years later.


History

This algorithm, including its recursive application, was invented around 1805 by
Carl Friedrich Gauss Johann Carl Friedrich Gauss (; german: Gauß ; la, Carolus Fridericus Gauss; 30 April 177723 February 1855) was a German mathematician and physicist who made significant contributions to many fields in mathematics and science. Sometimes refer ...
, who used it to interpolate the trajectories of the asteroids
Pallas Pallas may refer to: Astronomy * 2 Pallas asteroid ** Pallas family, a group of asteroids that includes 2 Pallas * Pallas (crater), a crater on Earth's moon Mythology * Pallas (Giant), a son of Uranus and Gaia, killed and flayed by Athena * Pa ...
and
Juno Juno commonly refers to: *Juno (mythology), the Roman goddess of marriage and queen of the gods *Juno (film), ''Juno'' (film), 2007 Juno may also refer to: Arts, entertainment and media Fictional characters *Juno, in the film ''Jenny, Juno'' *Ju ...
, but his work was not widely recognized (being published only posthumously and in
neo-Latin New Latin (also called Neo-Latin or Modern Latin) is the revival of Literary Latin used in original, scholarly, and scientific works since about 1500. Modern scholarly and technical nomenclature, such as in zoological and botanical taxonomy ...
).Heideman, M. T., D. H. Johnson, and C. S. Burrus,
Gauss and the history of the fast Fourier transform
" IEEE ASSP Magazine, 1, (4), 14–21 (1984)
Gauss did not analyze the asymptotic computational time, however. Various limited forms were also rediscovered several times throughout the 19th and early 20th centuries. FFTs became popular after James Cooley of IBM and John Tukey of
Princeton Princeton University is a private research university in Princeton, New Jersey. Founded in 1746 in Elizabeth as the College of New Jersey, Princeton is the fourth-oldest institution of higher education in the United States and one of the ni ...
published a paper in 1965 reinventing the algorithm and describing how to perform it conveniently on a computer. Tukey reportedly came up with the idea during a meeting of President Kennedy's Science Advisory Committee discussing ways to detect nuclear-weapon tests in the
Soviet Union The Soviet Union,. officially the Union of Soviet Socialist Republics. (USSR),. was a List of former transcontinental countries#Since 1700, transcontinental country that spanned much of Eurasia from 1922 to 1991. A flagship communist state, ...
by employing seismometers located outside the country. These sensors would generate seismological time series. However, analysis of this data would require fast algorithms for computing DFTs due to the number of sensors and length of time. This task was critical for the ratification of the proposed nuclear test ban so that any violations could be detected without need to visit Soviet facilities. Another participant at that meeting,
Richard Garwin Richard Lawrence Garwin (born April 19, 1928) is an American physicist, best known as the author of the first hydrogen bomb design. In 1978, Garwin was elected a member of the National Academy of Engineering for contributing to the application ...
of IBM, recognized the potential of the method and put Tukey in touch with Cooley. However, Garwin made sure that Cooley did not know the original purpose. Instead, Cooley was told that this was needed to determine periodicities of the spin orientations in a 3-D crystal of helium-3. Cooley and Tukey subsequently published their joint paper, and wide adoption quickly followed due to the simultaneous development of
Analog-to-digital converter In electronics, an analog-to-digital converter (ADC, A/D, or A-to-D) is a system that converts an analog signal, such as a sound picked up by a microphone or light entering a digital camera, into a digital signal. An ADC may also provide ...
s capable of sampling at rates up to 300 kHz. The fact that Gauss had described the same algorithm (albeit without analyzing its asymptotic cost) was not realized until several years after Cooley and Tukey's 1965 paper. Their paper cited as inspiration only the work by I. J. Good on what is now called the
prime-factor FFT algorithm The prime-factor algorithm (PFA), also called the Good–Thomas algorithm (1958/1963), is a fast Fourier transform (FFT) algorithm that re-expresses the discrete Fourier transform (DFT) of a size ''N'' = ''N''1''N''2 as a two-dimensional ''N''1× ...
(PFA); although Good's algorithm was initially thought to be equivalent to the Cooley–Tukey algorithm, it was quickly realized that PFA is a quite different algorithm (working only for sizes that have
relatively prime In mathematics, two integers and are coprime, relatively prime or mutually prime if the only positive integer that is a divisor of both of them is 1. Consequently, any prime number that divides does not divide , and vice versa. This is equivale ...
factors and relying on the Chinese Remainder Theorem, unlike the support for any composite size in Cooley–Tukey).


The radix-2 DIT case

A radix-2 decimation-in-time (DIT) FFT is the simplest and most common form of the Cooley–Tukey algorithm, although highly optimized Cooley–Tukey implementations typically use other forms of the algorithm as described below. Radix-2 DIT divides a DFT of size ''N'' into two interleaved DFTs (hence the name "radix-2") of size ''N''/2 with each recursive stage. The discrete Fourier transform (DFT) is defined by the formula: : X_k = \sum_^ x_n e^, where k is an integer ranging from 0 to N-1. Radix-2 DIT first computes the DFTs of the even-indexed inputs (x_=x_0, x_2, \ldots, x_) and of the odd-indexed inputs (x_=x_1, x_3, \ldots, x_), and then combines those two results to produce the DFT of the whole sequence. This idea can then be performed
recursively Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics ...
to reduce the overall runtime to O(''N'' log ''N''). This simplified form assumes that ''N'' is a
power of two A power of two is a number of the form where is an integer, that is, the result of exponentiation with number two as the base and integer  as the exponent. In a context where only integers are considered, is restricted to non-negativ ...
; since the number of sample points ''N'' can usually be chosen freely by the application (e.g. by changing the sample rate or window, zero-padding, etcetera), this is often not an important restriction. The radix-2 DIT algorithm rearranges the DFT of the function x_n into two parts: a sum over the even-numbered indices n= and a sum over the odd-numbered indices n=: : \begin X_k & = & \sum \limits_^ x_e^ + \sum \limits_^ x_ e^ \end One can factor a common multiplier e^ out of the second sum, as shown in the equation below. It is then clear that the two sums are the DFT of the even-indexed part x_ and the DFT of odd-indexed part x_ of the function x_n. Denote the DFT of the ''E''ven-indexed inputs x_ by E_k and the DFT of the ''O''dd-indexed inputs x_ by O_k and we obtain: : \begin X_k= \underbrace_ + e^ \underbrace_ = E_k + e^ O_k\qquad\textk=0,\dots,\frac N2-1. \end Note that the equalities hold for k=0,\dots,N-1 but the crux is that E_k and O_k are calculated in this way for k=0,\dots,\frac N2-1 only. Thanks to the periodicity of the complex exponential, X_ is also obtained from E_k and O_k: : \begin X_ & = \sum \limits_^ x_ e^ + e^ \sum \limits_^ x_ e^ \\ & = \sum \limits_^ x_ e^ e^ + e^e^ \sum \limits_^ x_ e^ e^ \\ & = \sum \limits_^ x_ e^ - e^ \sum \limits_^ x_ e^ \\ & = E_k - e^ O_k \end We can rewrite X_k and X_ as: : \begin X_k & = & E_k + e^ O_k \\ X_ & = & E_k - e^ O_k \end This result, expressing the DFT of length ''N'' recursively in terms of two DFTs of size ''N''/2, is the core of the radix-2 DIT fast Fourier transform. The algorithm gains its speed by re-using the results of intermediate computations to compute multiple DFT outputs. Note that final outputs are obtained by a +/− combination of E_k and O_k \exp(-2\pi i k/N), which is simply a size-2 DFT (sometimes called a
butterfly Butterflies are insects in the macrolepidopteran clade Rhopalocera from the order Lepidoptera, which also includes moths. Adult butterflies have large, often brightly coloured wings, and conspicuous, fluttering flight. The group comprise ...
in this context); when this is generalized to larger radices below, the size-2 DFT is replaced by a larger DFT (which itself can be evaluated with an FFT). This process is an example of the general technique of
divide and conquer algorithm In computer science, divide and conquer is an algorithm design paradigm. A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved direc ...
s; in many conventional implementations, however, the explicit recursion is avoided, and instead one traverses the computational tree in
breadth-first Breadth-first search (BFS) is an algorithm for searching a tree data structure for a node that satisfies a given property. It starts at the tree root and explores all nodes at the present depth prior to moving on to the nodes at the next de ...
fashion. The above re-expression of a size-''N'' DFT as two size-''N''/2 DFTs is sometimes called the
Danielson Danielson is an American rock band from Clarksboro, New Jersey, that plays indie pop gospel music. The group consists of frontman Daniel Smith and a number of various artists with whom he collaborates. Smith has also released solo work as Br ...
Lanczos __NOTOC__ Cornelius (Cornel) Lanczos ( hu, Lánczos Kornél, ; born as Kornél Lőwy, until 1906: ''Löwy (Lőwy) Kornél''; February 2, 1893 – June 25, 1974) was a Hungarian-American and later Hungarian-Irish mathematician and physicist. Acco ...
lemma, since the identity was noted by those two authors in 1942 (influenced by Runge's 1903 work). They applied their lemma in a "backwards" recursive fashion, repeatedly ''doubling'' the DFT size until the transform spectrum converged (although they apparently didn't realize the
linearithmic In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
.e., order ''N'' log ''N''asymptotic complexity they had achieved). The Danielson–Lanczos work predated widespread availability of mechanical or electronic computers and required manual calculation (possibly with mechanical aids such as
adding machine An adding machine is a class of mechanical calculator, usually specialized for bookkeeping calculations. In the United States, the earliest adding machines were usually built to read in dollars and cents. Adding machines were ubiquitous of ...
s); they reported a computation time of 140 minutes for a size-64 DFT operating on real inputs to 3–5 significant digits. Cooley and Tukey's 1965 paper reported a running time of 0.02 minutes for a size-2048 complex DFT on an IBM 7094 (probably in 36-bit
single precision Single-precision floating-point format (sometimes called FP32 or float32) is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. A floatin ...
, ~8 digits). Rescaling the time by the number of operations, this corresponds roughly to a speedup factor of around 800,000. (To put the time for the hand calculation in perspective, 140 minutes for size 64 corresponds to an average of at most 16 seconds per floating-point operation, around 20% of which are multiplications.)


Pseudocode

In pseudocode, the below procedure could be written: ''X''0,...,''N''−1 ← ditfft2(''x'', ''N'', ''s''): ''DFT of (x''0, ''xs'', ''x''2''s'', ..., ''x''(''N''-1)''s''): if ''N'' = 1 then ''X''0 ← ''x''0 ''trivial size-1 DFT base case'' else ''X''0,...,''N''/2−1 ← ditfft2(''x'', ''N''/2, 2''s'') ''DFT of (x''0, ''x''2''s'', ''x''4''s'', ..., ''x''(''N''-2)''s'') ''XN''/2,...,''N''−1 ← ditfft2(''x''+s, ''N''/2, 2''s'') ''DFT of (xs'', ''xs''+2''s'', ''xs''+4''s'', ..., ''x''(''N''-1)''s'') for ''k'' = 0 to ''N''/2−1 do ''combine DFTs of two halves into full DFT:'' p ← ''Xk'' q ← exp(−2π''i''/''N'' ''k'') ''Xk''+''N''/2 ''Xk'' ← p + q ''Xk''+''N''/2 ← p − q end for end if Here, ditfft2(''x'',''N'',1), computes ''X''=DFT(''x'') out-of-place by a radix-2 DIT FFT, where ''N'' is an integer power of 2 and ''s''=1 is the
stride Stride or STRIDE may refer to: Computing * STRIDE (security), spoofing, tampering, repudiation, information disclosure, denial of service, elevation of privilege * Stride (software), a successor to the cloud-based HipChat, a corporate cloud-based ...
of the input ''x''
array An array is a systematic arrangement of similar objects, usually in rows and columns. Things called an array include: {{TOC right Music * In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
. ''x''+''s'' denotes the array starting with ''xs''. (The results are in the correct order in ''X'' and no further bit-reversal permutation is required; the often-mentioned necessity of a separate bit-reversal stage only arises for certain in-place algorithms, as described below.) High-performance FFT implementations make many modifications to the implementation of such an algorithm compared to this simple pseudocode. For example, one can use a larger base case than ''N''=1 to amortize the overhead of recursion, the
twiddle factor A twiddle factor, in fast Fourier transform (FFT) algorithms, is any of the trigonometric constant coefficients that are multiplied by the data in the course of the algorithm. This term was apparently coined by Gentleman & Sande in 1966, and has ...
s \exp 2\pi i k/ N/math> can be precomputed, and larger radices are often used for
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache County ...
reasons; these and other optimizations together can improve the performance by an order of magnitude or more.S. G. Johnson and M. Frigo,
Implementing FFTs in practice
" in ''Fast Fourier Transforms'' (C. S. Burrus, ed.), ch. 11, Rice University, Houston TX: Connexions, September 2008.
(In many textbook implementations the
depth-first Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (selecting some arbitrary node as the root node in the case of a graph) and explores as far as possible alon ...
recursion is eliminated in favor of a nonrecursive
breadth-first Breadth-first search (BFS) is an algorithm for searching a tree data structure for a node that satisfies a given property. It starts at the tree root and explores all nodes at the present depth prior to moving on to the nodes at the next de ...
approach, although depth-first recursion has been argued to have better
memory locality In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
.) Several of these ideas are described in further detail below.


Idea

More generally, Cooley–Tukey algorithms recursively re-express a DFT of a composite size ''N'' = ''N''1''N''2 as:Duhamel, P., and M. Vetterli, "Fast Fourier transforms: a tutorial review and a state of the art," ''Signal Processing'' 19, 259–299 (1990) # Perform ''N''1 DFTs of size ''N''2. # Multiply by complex
roots of unity In mathematics, a root of unity, occasionally called a de Moivre number, is any complex number that yields 1 when raised to some positive integer power . Roots of unity are used in many branches of mathematics, and are especially important in ...
(often called the
twiddle factor A twiddle factor, in fast Fourier transform (FFT) algorithms, is any of the trigonometric constant coefficients that are multiplied by the data in the course of the algorithm. This term was apparently coined by Gentleman & Sande in 1966, and has ...
s). # Perform ''N''2 DFTs of size ''N''1. Typically, either ''N''1 or ''N''2 is a small factor (''not'' necessarily prime), called the radix (which can differ between stages of the recursion). If ''N''1 is the radix, it is called a decimation in time (DIT) algorithm, whereas if ''N''2 is the radix, it is decimation in frequency (DIF, also called the Sande–Tukey algorithm). The version presented above was a radix-2 DIT algorithm; in the final expression, the phase multiplying the odd transform is the twiddle factor, and the +/- combination (''butterfly'') of the even and odd transforms is a size-2 DFT. (The radix's small DFT is sometimes known as a
butterfly Butterflies are insects in the macrolepidopteran clade Rhopalocera from the order Lepidoptera, which also includes moths. Adult butterflies have large, often brightly coloured wings, and conspicuous, fluttering flight. The group comprise ...
, so-called because of the shape of the
dataflow diagram A data-flow diagram is a way of representing a flow of data through a process or a system (usually an information system). The DFD also provides information about the outputs and inputs of each entity and the process itself. A data-flow diagram h ...
for the radix-2 case.)


Variations

There are many other variations on the Cooley–Tukey algorithm. Mixed-radix implementations handle composite sizes with a variety of (typically small) factors in addition to two, usually (but not always) employing the O(''N''2) algorithm for the prime base cases of the recursion (it is also possible to employ an ''N'' log ''N'' algorithm for the prime base cases, such as Rader's or Bluestein's algorithm). Split radix merges radices 2 and 4, exploiting the fact that the first transform of radix 2 requires no twiddle factor, in order to achieve what was long the lowest known arithmetic operation count for power-of-two sizes, although recent variations achieve an even lower count. (On present-day computers, performance is determined more by
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache County ...
and
CPU pipeline In computing, a pipeline, also known as a data pipeline, is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time- ...
considerations than by strict operation counts; well-optimized FFT implementations often employ larger radices and/or hard-coded base-case transforms of significant size.). Another way of looking at the Cooley–Tukey algorithm is that it re-expresses a size ''N'' one-dimensional DFT as an ''N''1 by ''N''2 two-dimensional DFT (plus twiddles), where the output matrix is
transpose In linear algebra, the transpose of a matrix is an operator which flips a matrix over its diagonal; that is, it switches the row and column indices of the matrix by producing another matrix, often denoted by (among other notations). The tr ...
d. The net result of all of these transpositions, for a radix-2 algorithm, corresponds to a bit reversal of the input (DIF) or output (DIT) indices. If, instead of using a small radix, one employs a radix of roughly and explicit input/output matrix transpositions, it is called a four-step algorithm (or ''six-step'', depending on the number of transpositions), initially proposed to improve memory locality,Gentleman W. M., and G. Sande, "Fast Fourier transforms—for fun and profit," ''Proc. AFIPS'' 29, 563–578 (1966).Bailey, David H., "FFTs in external or hierarchical memory," ''J. Supercomputing'' 4 (1), 23–35 (1990) e.g. for cache optimization or
out-of-core In computing, external memory algorithms or out-of-core algorithms are algorithms that are designed to process data that are too large to fit into a computer's main memory at once. Such algorithms must be optimized to efficiently fetch and access d ...
operation, and was later shown to be an optimal
cache-oblivious algorithm In computing, a cache-oblivious algorithm (or cache-transcendent algorithm) is an algorithm designed to take advantage of a processor cache without having the size of the cache (or the length of the cache lines, etc.) as an explicit parameter. An ...
.M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In ''Proceedings of the 40th IEEE Symposium on Foundations of Computer Science'' (FOCS 99), p.285-297. 1999
Extended abstract at IEEE
The general Cooley–Tukey factorization rewrites the indices ''k'' and ''n'' as k = N_2 k_1 + k_2 and n = N_1 n_2 + n_1, respectively, where the indices ''k''a and ''n''a run from 0..''N''a-1 (for ''a'' of 1 or 2). That is, it re-indexes the input (''n'') and output (''k'') as ''N''1 by ''N''2 two-dimensional arrays in column-major and row-major order, respectively; the difference between these indexings is a transposition, as mentioned above. When this re-indexing is substituted into the DFT formula for ''nk'', the N_1 n_2 N_2 k_1 cross term vanishes (its exponential is unity), and the remaining terms give :X_ = \sum_^ \sum_^ x_ e^ ::= \sum_^ \left e^ \right \left( \sum_^ x_ e^ \right) e^ ::= \sum_^ \left( \sum_^ x_ e^ \right) e^ . where each inner sum is a DFT of size ''N''2, each outer sum is a DFT of size ''N''1, and the ../nowiki> bracketed term is the twiddle factor. An arbitrary radix ''r'' (as well as mixed radices) can be employed, as was shown by both Cooley and Tukey as well as Gauss (who gave examples of radix-3 and radix-6 steps). Cooley and Tukey originally assumed that the radix butterfly required O(''r''2) work and hence reckoned the complexity for a radix ''r'' to be O(''r''2 ''N''/''r'' log''r''''N'') = O(''N'' log2(''N'') ''r''/log2''r''); from calculation of values of ''r''/log2''r'' for integer values of ''r'' from 2 to 12 the optimal radix is found to be 3 (the closest integer to '' e'', which minimizes ''r''/log2''r''). This analysis was erroneous, however: the radix-butterfly is also a DFT and can be performed via an FFT algorithm in O(''r'' log ''r'') operations, hence the radix ''r'' actually cancels in the complexity O(''r'' log(''r'') ''N''/''r'' log''r''''N''), and the optimal ''r'' is determined by more complicated considerations. In practice, quite large ''r'' (32 or 64) are important in order to effectively exploit e.g. the large number of processor registers on modern processors, and even an unbounded radix ''r''= also achieves O(''N'' log ''N'') complexity and has theoretical and practical advantages for large ''N'' as mentioned above.


Data reordering, bit reversal, and in-place algorithms

Although the abstract Cooley–Tukey factorization of the DFT, above, applies in some form to all implementations of the algorithm, much greater diversity exists in the techniques for ordering and accessing the data at each stage of the FFT. Of special interest is the problem of devising an
in-place algorithm In computer science, an in-place algorithm is an algorithm which transforms input using no auxiliary data structure. However, a small amount of extra storage space is allowed for auxiliary variables. The input is usually overwritten by the output ...
that overwrites its input with its output data using only O(1) auxiliary storage. The most well-known reordering technique involves explicit bit reversal for in-place radix-2 algorithms. Bit reversal is the permutation where the data at an index ''n'', written in
binary Binary may refer to: Science and technology Mathematics * Binary number, a representation of numbers using only two digits (0 and 1) * Binary function, a function that takes two arguments * Binary operation, a mathematical operation that ta ...
with digits ''b''4''b''3''b''2''b''1''b''0 (e.g. 5 digits for ''N''=32 inputs), is transferred to the index with reversed digits ''b''0''b''1''b''2''b''3''b''4 . Consider the last stage of a radix-2 DIT algorithm like the one presented above, where the output is written in-place over the input: when E_k and O_k are combined with a size-2 DFT, those two values are overwritten by the outputs. However, the two output values should go in the first and second ''halves'' of the output array, corresponding to the ''most'' significant bit ''b''4 (for ''N''=32); whereas the two inputs E_k and O_k are interleaved in the even and odd elements, corresponding to the ''least'' significant bit ''b''0. Thus, in order to get the output in the correct place, ''b''0 should take the place of ''b''4 and the index becomes ''b''0''b''4''b''3''b''2''b''1. And for next recursive stage, those 4 least significant bits will become ''b''1''b''4''b''3''b''2, If you include all of the recursive stages of a radix-2 DIT algorithm, ''all'' the bits must be reversed and thus one must pre-process the input (''or'' post-process the output) with a bit reversal to get in-order output. (If each size-''N''/2 subtransform is to operate on contiguous data, the DIT ''input'' is pre-processed by bit-reversal.) Correspondingly, if you perform all of the steps in reverse order, you obtain a radix-2 DIF algorithm with bit reversal in post-processing (or pre-processing, respectively). The logarithm (log) used in this algorithm is a base 2 logarithm. The following is pseudocode for iterative radix-2 FFT algorithm implemented using bit-reversal permutation. algorithm iterative-fft is input: Array ''a'' of ''n'' complex values where n is a power of 2. output: Array ''A'' the DFT of a. bit-reverse-copy(a, A) ''n'' ← ''a''.length for ''s'' = 1 to log(''n'') do ''m'' ← 2''s'' ''ω''''m'' ← exp(−2π''i''/''m'') for ''k'' = 0 to ''n''-1 by ''m'' do ''ω'' ← 1 for ''j'' = 0 to ''m''/''2'' – 1 do ''t'' ← ''ω'' ''A'' 'k'' + ''j'' + ''m''/2 ''u'' ← ''A'' 'k'' + ''j'' ''A'' 'k'' + ''j''← ''u'' + ''t'' ''A'' 'k'' + ''j'' + ''m''/2← ''u'' – ''t'' ''ω'' ← ''ω'' ''ω''''m'' return ''A'' The bit-reverse-copy procedure can be implemented as follows. algorithm bit-reverse-copy(''a'',''A'') is input: Array ''a'' of ''n'' complex values where n is a power of 2. output: Array ''A'' of size ''n''. ''n'' ← ''a''.length for ''k'' = 0 ''to'' ''n'' – 1 do ''A'' ev(k):= ''a'' Alternatively, some applications (such as convolution) work equally well on bit-reversed data, so one can perform forward transforms, processing, and then inverse transforms all without bit reversal to produce final results in the natural order. Many FFT users, however, prefer natural-order outputs, and a separate, explicit bit-reversal stage can have a non-negligible impact on the computation time, even though bit reversal can be done in O(''N'') time and has been the subject of much research. Also, while the permutation is a bit reversal in the radix-2 case, it is more generally an arbitrary (mixed-base) digit reversal for the mixed-radix case, and the permutation algorithms become more complicated to implement. Moreover, it is desirable on many hardware architectures to re-order intermediate stages of the FFT algorithm so that they operate on consecutive (or at least more localized) data elements. To these ends, a number of alternative implementation schemes have been devised for the Cooley–Tukey algorithm that do not require separate bit reversal and/or involve additional permutations at intermediate stages. The problem is greatly simplified if it is out-of-place: the output array is distinct from the input array or, equivalently, an equal-size auxiliary array is available. The Stockham auto-sort algorithm performs every stage of the FFT out-of-place, typically writing back and forth between two arrays, transposing one "digit" of the indices with each stage, and has been especially popular on
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it shoul ...
architectures.P. N. Swarztrauber
FFT algorithms for vector computers
''Parallel Computing'' vol. 1, 45–63 (1984).
Even greater potential SIMD advantages (more consecutive accesses) have been proposed for the Pease algorithm, which also reorders out-of-place with each stage, but this method requires separate bit/digit reversal and O(''N'' log ''N'') storage. One can also directly apply the Cooley–Tukey factorization definition with explicit (
depth-first Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (selecting some arbitrary node as the root node in the case of a graph) and explores as far as possible alon ...
) recursion and small radices, which produces natural-order out-of-place output with no separate permutation step (as in the pseudocode above) and can be argued to have
cache-oblivious In computing, a cache-oblivious algorithm (or cache-transcendent algorithm) is an algorithm designed to take advantage of a processor cache without having the size of the cache (or the length of the cache lines, etc.) as an explicit parameter. An ...
locality benefits on systems with hierarchical memory. A typical strategy for in-place algorithms without auxiliary storage and without separate digit-reversal passes involves small matrix transpositions (which swap individual pairs of digits) at intermediate stages, which can be combined with the radix butterflies to reduce the number of passes over the data.


References


External links

* * * * * {{DEFAULTSORT:Cooley-Tukey FFT algorithm FFT algorithms Articles with example pseudocode Divide-and-conquer algorithms