The Cooley-Tukey Fast Fourier Transform Algorithm*

OpenStax-CNX module: m16334 1 The Cooley-Tukey Fast Fourier Transform Algorithm* C. Sidney Burrus This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 2.0 The publication by Cooley and Tukey [5] in 1965 of an ecient algorithm for the calculation of the DFT was a major turning point in the development of digital signal processing. During the ve or so years that followed, various extensions and modications were made to the original algorithm [6]. By the early 1970's the practical programs were basically in the form used today. The standard development presented in [21], [22], [1] shows how the DFT of a length-N sequence can be simply calculated from the two length-N/2 DFT's of the even index terms and the odd index terms. This is then applied to the two half-length DFT's to give four quarter-length DFT's, and repeated until N scalars are left which are the DFT values. Because of alternately taking the even and odd index terms, two forms of the resulting programs are called decimation- in-time and decimation-in-frequency. For a length of M , the dividing process is repeated times 2 M = log2N and requires N multiplications each time. This gives the famous formula for the computational complexity of the FFT of which was derived in Multidimensional Index Mapping: Equation 34. Nlog2N Although the decimation methods are straightforward and easy to understand, they do not generalize well. For that reason it will be assumed that the reader is familiar with that description and this chapter will develop the FFT using the index map from Multidimensional Index Mapping. The Cooley-Tukey FFT always uses the Type 2 index map from Multidimensional Index Mapping: Equation 11. This is necessary for the most popular forms that have N = RM , but is also used even when the factors are relatively prime and a Type 1 map could be used. The time and frequency maps from Multidimensional Index Mapping: Equation 6 and Multidimensional Index Mapping: Equation 12 are (1) n = ((K1n1 + K2n2))N (2) k = ((K3k1 + K4k2))N Type-2 conditions Multidimensional Index Mapping: Equation 8 and Multidimensional Index Mapping: Equation 11 become K1 = aN2 or K2 = bN1 but not both (3) and K3 = cN2 or K4 = dN1 but not both (4) *Version 1.13: Sep 21, 2009 4:17 pm -0500 http://creativecommons.org/licenses/by/2.0/ http://cnx.org/content/m16334/1.13/ OpenStax-CNX module: m16334 2 The row and column calculations in Multidimensional Index Mapping: Equation 15 are uncoupled by Multidimensional Index Mapping: Equation 16 which for this case are or but not both (5) ((K1K4))N = 0 ((K2K3))N = 0 To make each short sum a DFT, the Ki must satisfy and (6) ((K1K3))N = N2 ((K2K4))N = N1 In order to have the smallest values for Ki the constants in (3) are chosen to be a = d = K2 = K3 = 1 (7) which makes the index maps of (1) become n = N2n1 + n2 (8) k = k1 + N1k2 (9) These index maps are all evaluated modulo N, but in (8), explicit reduction is not necessary since n never exceeds N. The reduction notation will be omitted for clarity. From Multidimensional Index Mapping: Equation 15 and example Multidimensional Index Mapping: Equation 19, the DFT is N −1 N −1 X2 X1 X = x W n1k1 W n2k1 W n2k2 (10) N1 N N2 n2=0 n1=0 This map of (8) and the form of the DFT in (10) are the fundamentals of the Cooley-Tukey FFT. The order of the summations using the Type 2 map in (10) cannot be reversed as it can with the Type-1 map. This is because of the WN terms, the twiddle factors. Turning (10) into an ecient program requires some care. From Multidimensional Index Mapping: Eciencies Resulting from Index Mapping with the DFT we know that all the factors should be equal. If M M−1 N = R , with R called the radix, N1 is rst set equal to R and N2 is then necessarily R . Consider n1 to be the index along the rows and n2 along the columns. The inner sum of (10) over n1 represents a length-N1 DFT for each value of n2. These N2 length-N1 DFT's are the DFT's of the rows of the x (n1; n2) array. The resulting array of row DFT's is multiplied by an array of twiddle factors which are the WN terms in (10). The twiddle-factor array for a length-8 radix-2 FFT is 2 3 2 3 W 0 W 0 1 1 6 7 6 7 6 W 0 W 1 7 6 1 W 7 n2k1 6 7 6 7 (11) TF : W8 = 6 7 = 6 7 6 W 0 W 2 7 6 1 −j 7 4 5 4 5 W 0 W 3 1 −jW The twiddle factor array will always have unity in the rst row and rst column. To complete (10) at this point, after the row DFT's are multiplied by the TF array, the N1 length-N2 DFT's of the columns are calculated. However, since the columns DFT's are of length RM−1, they can be posed as a RM−2 by R array and the process repeated, again using length-R DFT's. After M stages of length-R DFT's with TF multiplications interleaved, the DFT is complete. The ow graph of a length-2 DFT is given in Figure 1 and is called a buttery because of its shape. The ow graph of the complete length-8 radix-2 FFT is shown in Figure 2 . http://cnx.org/content/m16334/1.13/ OpenStax-CNX module: m16334 3 x(0) X(0) = x(0) + x(1) x(1) X(0) = x(0) - x(1) Figure 11: A Radix-2 Buttery x0 x0 x1 - x4 W0 x x 2 - 2 W2 x x 3 - - 6 W0 x x 4 - 1 W1 x x 5 - - 5 W2 W0 x x 6 - - 3 W3 W2 x x 7 - - - 7 Figure 11: Length-8 Radix-2 FFT Flow Graph This ow-graph, the twiddle factor map of (11), and the basic equation (10) should be completely understood before going further. A very ecient indexing scheme has evolved over the years that results in a compact and ecient computer program. A FORTRAN program is given below that implements the radix-2 FFT. It should be studied [3] to see how it implements (10) and the ow-graph representation. http://cnx.org/content/m16334/1.13/ OpenStax-CNX module: m16334 4 N2 = N DO 10 K = 1, M N1 = N2 N2 = N2/2 E = 6.28318/N1 A = 0 DO 20 J = 1, N2 C = COS (A) S =-SIN (A) A = J*E DO 30 I = J, N, N1 L = I + N2 XT = X(I) - X(L) X(I) = X(I) + X(L) YT = Y(I) - Y(L) Y(I) = Y(I) + Y(L) X(L) = XT*C - YT*S Y(L) = XT*S + YT*C 30 CONTINUE 20 CONTINUE 10 CONTINUE Listing 1: A Radix-2 Cooley-Tukey FFT Program This discussion, the ow graph of Winograd's Short DFT Algorithms: Figure 2 and the program of p. ?? are all based on the input index map of Multidimensional Index Mapping: Equation 6 and (1) and the calculations are performed in-place. According to Multidimensional Index Mapping: In-Place Calculation of the DFT and Scrambling, this means the output is scrambled in bit-reversed order and should be followed by an unscrambler to give the DFT in proper order. This formulation is called a decimation-in-frequency FFT [21], [22], [1]. A very similar algorithm based on the output index map can be derived which is called a decimation-in-time FFT. Examples of FFT programs are found in [3] and in the Appendix of this book. 1 Modications to the Basic Cooley-Tukey FFT Soon after the paper by Cooley and Tukey, there were improvements and extensions made. One very important discovery was the improvement in eciency by using a larger radix of 4, 8 or even 16. For example, just as for the radix-2 buttery, there are no multiplications required for a length-4 DFT, and therefore, a radix-4 FFT would have only twiddle factor multiplications. Because there are half as many stages in a radix-4 FFT, there would be half as many multiplications as in a radix-2 FFT. In practice, because some of the multiplications are by unity, the improvement is not by a factor of two, but it is signicant. A radix-4 FFT is easily developed from the basic radix-2 structure by replacing the length-2 buttery by a length-4 buttery and making a few other modications. Programs can be found in [3] and operation counts will be given in "Evaluation of the Cooley-Tukey FFT Algorithms" (Section 3: Evaluation of the Cooley-Tukey FFT Algorithms). http://cnx.org/content/m16334/1.13/ OpenStax-CNX module: m16334 5 Increasing the radix to 8 gives some improvement but not as much as from 2 to 4. Increasing it to 16 is theoretically promising but the small decrease in multiplications is somewhat oset by an increase in additions and the program becomes rather long. Other radices are not attractive because they generally require a substantial number of multiplications and additions in the butteries. The second method of reducing arithmetic is to remove the unnecessary TF multiplications by plus or minus unity or by plus or minus the square root of minus one.

The Cooley-Tukey Fast Fourier Transform Algorithm*

The Fast Fourier Transform

Lecture 11 : Discrete Cosine Transform Moving Into the Frequency Domain

Parallel Fast Fourier Transform Transforms

Evaluating Fourier Transforms with MATLAB

Fourier Transforms & the Convolution Theorem

Fourier Transform, Convolution Theorem, and Linear Dynamical Systems April 28, 2016

Chapter B12. Fast Fourier Transform Taking the FFT of Each Row of a Two-Dimensional Matrix: § 1236 Chapter B12

Parallel Fast Fourier Transform-Numerical Lib 2015

A Fast Computational Algorithm for the Discrete Cosine Transform

Fourier Transforms and the Fast Fourier Transform (FFT) Algorithm Paul Heckbert Feb

Lecture 12 Discrete and Fast Fourier Transforms

Fourier Analysis