
FAST FOURIER TRANSFORMS-FOR FUN AND PROFIT

W. M. Gentleman

Bell Telephone Laboratories, Murray Hill, New Jersey

and

G. Sande*

Princeton University, Princeton, New Jersey

* This work made use of computer facilities supported in part by National Science Foundation grant NSF-GP579. Research was partially supported by the Office of Naval Research under contract Nonr 1858(05) and by the National Research Council of Canada.

PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1966

IMPLEMENTING FAST FOURIER TRANSFORMS

Definition and Elementary Properties of Fourier Transforms

The "Fast Fourier Transform" has now been widely known for about a year. During that time it has had a major effect on several areas of computing, the most striking example being techniques of numerical convolution, which have been completely revolutionized. What exactly is the "Fast Fourier Transform"?

In fact, the Fast Fourier Transform is nothing more than an algorithm whereby, for appropriate length sequences, the finite discrete Fourier transform of the sequence may be computed much more rapidly than by other available algorithms. The properties and uses of the finite discrete Fourier transform, which become practical when used with the fast algorithm, make the technique important. Let us therefore first consider some properties of such transforms.

The usual infinite Fourier integral transform is well known and widely used: the physicist when solving a partial differential equation, the communication engineer looking at noise and the statistician studying distributions may all resort to Fourier transforms, not only because the mathematics may be simplified, but because nature itself is often easier to understand in terms of frequency. What is less well known is that many of the properties of the usual Fourier transform also hold, perhaps with slight modification, for the Fourier transform defined on finite, equispaced, discrete sequences.

We see in Table I that the most significant change required to modify the usual theorems so that they apply to the finite discrete case is that indexing must be considered modulo N. A useful heuristic interpretation of this is to think of the sequence $X(t)$ as a function defined at equispaced points on a circle. This interpretation is to be contrasted with what is often the natural interpretation, as sampled data from a continuous infinite function.

TABLE I
A Comparison of Usual and Finite Discrete Fourier Transforms

Definition
  Usual: $\hat X(\hat t) = \int_{-\infty}^{\infty} X(t)\, e^{2\pi i t \hat t}\, dt$
  Finite discrete: $\hat X(\hat t) = \sum_{t=0}^{N-1} X(t)\, e^{2\pi i t \hat t / N}$

Linearity
  Usual: The Fourier transform is a linear operator.
  Finite discrete: The Fourier transform is a linear operator.

Orthogonality
  Usual: $\int_{-\infty}^{\infty} e^{2\pi i t \hat t}\, e^{-2\pi i t \hat t'}\, dt = \delta(\hat t - \hat t')$, where $\delta(\hat t - \hat t')$ is the Dirac delta function. That is, $\int_a^b f(\hat t)\,\delta(\hat t)\, d\hat t = f(0)$ if 0 is in the interval $(a,b)$; otherwise $\int_a^b f(\hat t)\,\delta(\hat t)\, d\hat t = 0$.
  Finite discrete: $\sum_{t=0}^{N-1} e^{2\pi i t \hat t / N}\, e^{-2\pi i t \hat t' / N} = N\,\delta_N(\hat t - \hat t')$, where $\delta_N$ is the Kronecker delta function with its argument being considered modulo N. That is, $\delta_N(kN) = 1$ for integer $k$; otherwise $\delta_N = 0$.

Inverse Transform
  Usual: If $\hat X(\hat t) = \int_{-\infty}^{\infty} X(t)\, e^{2\pi i t \hat t}\, dt$ then $X(t) = \int_{-\infty}^{\infty} \hat X(\hat t)\, e^{-2\pi i t \hat t}\, d\hat t$, which we observe can be considered as $X(t) = \left\{ \int_{-\infty}^{\infty} \{\hat X(\hat t)\}^{*}\, e^{2\pi i t \hat t}\, d\hat t \right\}^{*}$, or the complex conjugate of the Fourier transform of the complex conjugate of $\hat X$.
  Finite discrete: If $\hat X(\hat t) = \sum_{t=0}^{N-1} X(t)\, e^{2\pi i t \hat t / N}$ then $X(t) = \frac{1}{N} \sum_{\hat t=0}^{N-1} \hat X(\hat t)\, e^{-2\pi i t \hat t / N}$, which we observe can be considered as $X(t) = \frac{1}{N} \left\{ \sum_{\hat t=0}^{N-1} \{\hat X(\hat t)\}^{*}\, e^{2\pi i t \hat t / N} \right\}^{*}$, or the complex conjugate of the Fourier transform of the complex conjugate of $\hat X$, divided by N.

Convolution Theorem
  Usual: $\int_{-\infty}^{\infty} X(\tau)\, Y(t-\tau)\, d\tau = \int_{-\infty}^{\infty} \hat X(\hat t)\, \hat Y(\hat t)\, e^{-2\pi i t \hat t}\, d\hat t$, that is, the inverse Fourier transform of the product of the Fourier transforms.
  Finite discrete: $\sum_{\tau=0}^{N-1} X(\tau)\, Y(t-\tau) = \frac{1}{N} \sum_{\hat t=0}^{N-1} \hat X(\hat t)\, \hat Y(\hat t)\, e^{-2\pi i t \hat t / N}$, that is, the inverse Fourier transform of the product of the Fourier transforms. NOTE: The convolution here must be considered as cyclic, i.e., the indices of X and Y must be interpreted modulo N.

Operational Calculus
  Usual: An operational calculus can be defined, based on the property that $\int_{-\infty}^{\infty} \left( \frac{d}{dt} X(t) \right) e^{2\pi i t \hat t}\, dt = -2\pi i \hat t\, \hat X(\hat t)$, i.e., the Fourier transform of the derivative of a function is the Fourier transform of the function, multiplied by $-2\pi i \hat t$.
  Finite discrete: An operational calculus can be defined, based on the property that $\sum_{t=0}^{N-1} (\Delta X(t))\, e^{2\pi i t \hat t / N} = \left( e^{-2\pi i \hat t / N} - 1 \right) \hat X(\hat t)$, i.e., the Fourier transform of the (forward) difference of a function is the Fourier transform of the function, multiplied by $\left( e^{-2\pi i \hat t / N} - 1 \right)$. NOTE: The difference here must be considered cyclically, so that $\Delta X(t) = X(t+1) - X(t)$ becomes $\Delta X(N-1) = X(N) - X(N-1) = X(0) - X(N-1)$ for the case of $t = N-1$.

Symmetries
  Usual: If X is real then $\hat X$ is hermitian symmetric, i.e., $\hat X(\hat t) = \{\hat X(-\hat t)\}^{*}$; if X is hermitian symmetric then $\hat X$ is real. If Y is imaginary, then $\hat Y$ is hermitian antisymmetric, i.e., $\hat Y(\hat t) = -\{\hat Y(-\hat t)\}^{*}$; if Y is hermitian antisymmetric then $\hat Y$ is imaginary.
  Finite discrete: If X is real then $\hat X$ is hermitian symmetric, i.e., $\hat X(\hat t) = \{\hat X(N-\hat t)\}^{*}$; if X is hermitian symmetric, then $\hat X$ is real. If Y is imaginary, then $\hat Y$ is hermitian antisymmetric, i.e., $\hat Y(\hat t) = -\{\hat Y(N-\hat t)\}^{*}$; if Y is hermitian antisymmetric then $\hat Y$ is imaginary. NOTE: The use of the terms hermitian symmetric and hermitian antisymmetric in the discrete case is consistent with that in the usual case if we interpret indices modulo N.

Shifting Theorems
  Usual: $\int_{-\infty}^{\infty} X(t+h)\, e^{2\pi i t \hat t}\, dt = e^{-2\pi i h \hat t}\, \hat X(\hat t)$ and $\int_{-\infty}^{\infty} e^{2\pi i \hat h t}\, X(t)\, e^{2\pi i t \hat t}\, dt = \hat X(\hat t + \hat h)$.
  Finite discrete: $\sum_{t=0}^{N-1} X(t+h)\, e^{2\pi i t \hat t / N} = e^{-2\pi i h \hat t / N}\, \hat X(\hat t)$ and $\sum_{t=0}^{N-1} e^{2\pi i \hat h t / N}\, X(t)\, e^{2\pi i t \hat t / N} = \hat X(\hat t + \hat h)$, with indices interpreted modulo N.
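Several of the finite discrete properties in Table I can be checked numerically. The following is a minimal sketch, not code from the paper: it evaluates the definition directly (order N² operations) with the positive-exponent convention used in the table, and the helper names `dft`, `inverse_dft`, `cyclic_convolution` and `close` are illustrative assumptions.

```python
import cmath

def dft(x):
    """Finite discrete Fourier transform, positive-exponent convention of Table I."""
    n = len(x)
    return [sum(x[t] * cmath.exp(2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def inverse_dft(xh):
    """Inverse: conjugate of the transform of the conjugate, divided by N."""
    n = len(xh)
    return [v.conjugate() / n for v in dft([u.conjugate() for u in xh])]

def cyclic_convolution(x, y):
    """Convolution with the indices of x and y interpreted modulo N."""
    n = len(x)
    return [sum(x[s] * y[(t - s) % n] for s in range(n)) for t in range(n)]

def close(u, v, tol=1e-9):
    """Element-wise comparison up to rounding error."""
    return all(abs(a - b) < tol for a, b in zip(u, v))

x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]
y = [0.5, -1.0, 2.0, 0.0, 1.0, 4.0]
n = len(x)
xh, yh = dft(x), dft(y)

# Inverse transform recovers the original sequence.
roundtrip_ok = close(inverse_dft(xh), x)

# Convolution theorem: cyclic convolution equals the inverse transform
# of the product of the transforms.
convolution_ok = close(cyclic_convolution(x, y),
                       inverse_dft([a * b for a, b in zip(xh, yh)]))

# Symmetry: real data gives a hermitian symmetric transform (indices modulo N).
symmetry_ok = close(xh, [xh[(n - k) % n].conjugate() for k in range(n)])

# Operational calculus: transform of the cyclic forward difference.
diff = [x[(t + 1) % n] - x[t] for t in range(n)]
difference_ok = close(dft(diff),
                      [(cmath.exp(-2j * cmath.pi * k / n) - 1) * xh[k]
                       for k in range(n)])
```

Each flag compares two sides of one table entry, so the sketch doubles as a check on the sign conventions.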

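Basic Algebra of Fast Fourier Transforms

At this point we will define the notation $e(X) = e^{2\pi i X}$ so that we may write expressions such as those of Table I in a simpler form. The two most important properties of $e(X)$ are that

$$e(X+Y) = e(X)\, e(Y)$$

and

$$e(X) = 1 \quad \text{if } X \text{ is an integer.}$$

Using this notation, the finite discrete Fourier transform of the sequence $X(t)$ is

$$\hat X(\hat t) = \sum_{t=0}^{N-1} X(t)\, e\!\left( \frac{t \hat t}{N} \right).$$

Suppose N has factors A and B so that $N = AB$. Then writing $\hat t = \hat a + \hat b A$ and $t = b + aB$, where $a, \hat a = 0, 1, \ldots, A-1$ and $b, \hat b = 0, 1, \ldots, B-1$, we have

$$\hat X(\hat a + \hat b A) = \sum_{b=0}^{B-1} \sum_{a=0}^{A-1} X(b+aB)\, e\!\left( \frac{(\hat a + \hat b A)(b + aB)}{AB} \right)$$

$$= \sum_{b=0}^{B-1} \sum_{a=0}^{A-1} X(b+aB)\, e\!\left( \frac{a \hat a}{A} + \frac{b \hat b}{B} + \frac{\hat a b}{AB} + a \hat b \right)$$

$$= \sum_{b=0}^{B-1} \sum_{a=0}^{A-1} X(b+aB)\, e\!\left( \frac{a \hat a}{A} \right) e\!\left( \frac{b \hat b}{B} \right) e\!\left( \frac{\hat a b}{AB} \right)$$

as $a \hat b$ is integral implies $e(a \hat b) = 1$,

$$= \sum_{b=0}^{B-1} e\!\left( \frac{b \hat b}{B} \right) \left\{ e\!\left( \frac{\hat a b}{AB} \right) \sum_{a=0}^{A-1} X(b+aB)\, e\!\left( \frac{a \hat a}{A} \right) \right\}.$$

If we define the B different sequences

$$W_b(a) = X(b+aB), \qquad a = 0, 1, \ldots, A-1$$

and the A different sequences

$$Z_{\hat a}(b) = e\!\left( \frac{\hat a b}{AB} \right) \sum_{a=0}^{A-1} X(b+aB)\, e\!\left( \frac{a \hat a}{A} \right), \qquad b = 0, 1, \ldots, B-1$$

we can write the above equation as

$$\hat X(\hat a + \hat b A) = \sum_{b=0}^{B-1} e\!\left( \frac{b \hat b}{B} \right) Z_{\hat a}(b)$$

where

$$Z_{\hat a}(b) = e\!\left( \frac{\hat a b}{AB} \right) \sum_{a=0}^{A-1} e\!\left( \frac{a \hat a}{A} \right) W_b(a).$$

We recognize

$$\sum_{b=0}^{B-1} e\!\left( \frac{b \hat b}{B} \right) Z_{\hat a}(b) \qquad \text{and} \qquad \sum_{a=0}^{A-1} e\!\left( \frac{a \hat a}{A} \right) W_b(a)$$

as Fourier transforms themselves, applied to shorter sequences. Obtaining a subsequence by starting at the bth element and taking every Bth element thereafter, in the manner $W_b(a)$ is obtained from $X(b+aB)$, is called decimating by B. Observe that the sequences $Z_{\hat a}(b)$ are not quite the sequences of frequency $\hat a$ values from the transforms of these decimated sequences, but these values multiplied by a "twiddle factor" $e\!\left( \frac{\hat a b}{AB} \right)$.

The Fourier transform of the complete AB point sequence may thus be accomplished by doing the B different A point Fourier transforms, multiplying through by the appropriate twiddle factors, then doing the A different B point Fourier transforms. This is a recursive formula which defines the larger Fourier transform in terms of smaller ones. The total number of operations is now proportional to $AB(A+B)$ rather than $(AB)^2$ as it would be for a direct implementation of the definition, hence the name "Fast Fourier Transform".

Associated observations:

1. Attention is drawn to the special advantage of factors 2 or 4, in that since a Fourier transform of a 2 or 4 point sequence may be done using only additions and subtractions, one stage of complex arithmetic may be avoided.

2. Although the recursive algorithm above may be implemented directly, it is much better to simulate the recursion as described in the following sections, since one may avoid the unnecessary calculation of some complex exponentials. With our programs, for example, simulating the recursion takes only 2/5 as long for a $2^{10}$ point transform.†

   † All programs discussed in this paper were implemented using ALCOR Algol 60 on the IBM 7094 at Princeton University.

3. The multi-dimensional finite discrete Fourier transform

$$\hat X(\hat t_1, \ldots, \hat t_k) = \sum_{t_1=0}^{N_1-1} \cdots \sum_{t_k=0}^{N_k-1} e\!\left( \frac{t_1 \hat t_1}{N_1} \right) \cdots e\!\left( \frac{t_k \hat t_k}{N_k} \right) X(t_1, \ldots, t_k)$$

   can be computed efficiently by factorizing in each dimension separately, as above.

4. The algebra presented here is not quite that appearing in "An Algorithm for the Machine Calculation of Complex Fourier Series," by J. W. Cooley and J. W. Tukey. The difference can be seen if we consider N as having three factors, $N = ABC$.

   The Cooley-Tukey algebra then is

$$\hat X(\hat c + \hat b C + \hat a BC) = \sum_{a=0}^{A-1} \sum_{b=0}^{B-1} \sum_{c=0}^{C-1} e\!\left( \frac{(\hat c + \hat b C + \hat a BC)(a + bA + cAB)}{ABC} \right) X(a+bA+cAB)$$

$$= \sum_{a=0}^{A-1} \sum_{b=0}^{B-1} \sum_{c=0}^{C-1} X(a+bA+cAB)\, e\!\left( \frac{a(\hat c + \hat b C + \hat a BC)}{ABC} \right) e\!\left( \frac{b(\hat c + \hat b C + \hat a BC)}{BC} \right) e\!\left( \frac{c(\hat c + \hat b C + \hat a BC)}{C} \right).$$

   If, rather than collecting on the unhatted variables as above, we choose to collect on the hatted variables, we obtain

$$\hat X(\hat c + \hat b C + \hat a BC) = \sum_{a=0}^{A-1} \sum_{b=0}^{B-1} \sum_{c=0}^{C-1} X(a+bA+cAB)\, e\!\left( \frac{\hat a(a+bA+cAB)}{A} \right) e\!\left( \frac{\hat b(a+bA+cAB)}{AB} \right) e\!\left( \frac{\hat c(a+bA+cAB)}{ABC} \right).$$

   In both cases we may factor the twiddle factor outside the summation. If we do this we obtain in the first case

$$\hat X(\hat c + \hat b C + \hat a BC) = \sum_{a=0}^{A-1} e\!\left( \frac{a \hat a}{A} \right) e\!\left( \frac{a(\hat c + \hat b C)}{ABC} \right) \sum_{b=0}^{B-1} e\!\left( \frac{b \hat b}{B} \right) e\!\left( \frac{b \hat c}{BC} \right) \sum_{c=0}^{C-1} e\!\left( \frac{c \hat c}{C} \right) X(a+bA+cAB)$$

   (Cooley version)

   In the second case we obtain

$$\hat X(\hat c + \hat b C + \hat a BC) = \sum_{a=0}^{A-1} e\!\left( \frac{a \hat a}{A} \right) \sum_{b=0}^{B-1} e\!\left( \frac{a \hat b}{AB} \right) e\!\left( \frac{b \hat b}{B} \right) \sum_{c=0}^{C-1} e\!\left( \frac{\hat c(a+bA)}{ABC} \right) e\!\left( \frac{c \hat c}{C} \right) X(a+bA+cAB)$$

   (Sande version)

   The difference between these two versions becomes important when we come to the details of implementing the fast Fourier transform. We go on to this next.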
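The two-factor decomposition derived in this section (decimate by B, do B transforms of length A, apply the twiddle factors, then do A transforms of length B) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' Algol program: the function names are hypothetical, and the short transforms are evaluated directly from the definition instead of recursively.

```python
import cmath

def e(x):
    """The paper's notation e(X) = exp(2*pi*i*X)."""
    return cmath.exp(2j * cmath.pi * x)

def direct_dft(x):
    """Direct evaluation of the definition, order N**2 operations."""
    n = len(x)
    return [sum(x[t] * e(t * k / n) for t in range(n)) for k in range(n)]

def two_factor_dft(x, A, B):
    """Transform of N = A*B points using the W and Z sequences of the text."""
    assert len(x) == A * B
    # W_b(a) = X(b + a*B): decimating by B gives B sequences of length A.
    W = [[x[b + a * B] for a in range(A)] for b in range(B)]
    # Z_ahat(b) = e(ahat*b/(A*B)) * sum_a e(a*ahat/A) * W_b(a):
    # an A point transform of each decimated sequence, times the twiddle factor.
    Z = [[e(ah * b / (A * B)) * sum(e(a * ah / A) * W[b][a] for a in range(A))
          for b in range(B)]
         for ah in range(A)]
    # Xhat(ahat + bhat*A) = sum_b e(b*bhat/B) * Z_ahat(b): A transforms of length B.
    out = [0j] * (A * B)
    for ah in range(A):
        for bh in range(B):
            out[ah + bh * A] = sum(e(b * bh / B) * Z[ah][b] for b in range(B))
    return out

x = [complex(t % 5, -t) for t in range(12)]
max_error = max(abs(u - v)
                for u, v in zip(two_factor_dft(x, 3, 4), direct_dft(x)))
```

Here `max_error` should be at rounding-error level. The work inside `two_factor_dft` is proportional to AB(A+B), since each of the AB entries of `Z` costs A terms and each of the AB outputs costs B terms.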

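The Cooley and Sande versions for three factors differ only in where the twiddle factors sit, which a literal evaluation of the nested sums makes visible. The sketch below is an assumption-laden illustration, not an efficient transform: it recomputes the inner sums for every output frequency, and the function name `three_factor_dft` and its `version` argument are hypothetical.

```python
import cmath

def e(x):
    """The paper's notation e(X) = exp(2*pi*i*X)."""
    return cmath.exp(2j * cmath.pi * x)

def direct_dft(x):
    """Direct evaluation of the definition, used as the reference."""
    n = len(x)
    return [sum(x[t] * e(t * k / n) for t in range(n)) for k in range(n)]

def three_factor_dft(x, A, B, C, version):
    """Evaluate the nested sums literally; version is "cooley" or "sande"."""
    N = A * B * C
    assert len(x) == N
    out = [0j] * N
    for ah in range(A):
        for bh in range(B):
            for ch in range(C):
                outer = 0j
                for a in range(A):
                    middle = 0j
                    for b in range(B):
                        # C point transform of points spaced A*B apart.
                        inner = sum(e(c * ch / C) * x[a + b * A + c * A * B]
                                    for c in range(C))
                        if version == "cooley":
                            # Twiddle depends on b and chat only.
                            middle += e(b * bh / B) * e(b * ch / (B * C)) * inner
                        else:
                            # Sande: twiddle depends on chat and a + b*A.
                            middle += e(b * bh / B) * e(ch * (a + b * A) / N) * inner
                    if version == "cooley":
                        # Twiddle depends on a and the combined frequency.
                        outer += e(a * ah / A) * e(a * (ch + bh * C) / N) * middle
                    else:
                        # Sande: twiddle depends on a and bhat only.
                        outer += e(a * ah / A) * e(a * bh / (A * B)) * middle
                out[ch + bh * C + ah * B * C] = outer
    return out

x = [complex(t % 7, 2 * t - 5) for t in range(24)]
reference = direct_dft(x)
cooley_error = max(abs(u - v) for u, v in
                   zip(three_factor_dft(x, 2, 3, 4, "cooley"), reference))
sande_error = max(abs(u - v) for u, v in
                  zip(three_factor_dft(x, 2, 3, 4, "sande"), reference))
```

Both errors should be at rounding-error level, confirming that the two versions are rearrangements of the same triple sum.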
Fast Fourier Transforms Using Scratch Storage

Let us consider the Sande version applied to an X sequence stored in serial order. The summation over c represents a C point Fourier transform of points spaced AB apart. There is one such transform for each of the AB values of $a + bA$. The result of one such transform is a frequency sequence indexed by $\hat c$. If we look at the twiddle factor $e\!\left( \frac{\hat c(a+bA)}{ABC} \right)$ we see that it depends upon $\hat c$ and $a + bA$ in a very convenient manner. To introduce some descriptive terminology we may call $\hat c$ the frequency for this analysis and $a + bA$ the displacement for this analysis. At this stage there are no free indices corresponding to replications. When we store the intermediate results in the scratch storage area, placing each of the C point Fourier transforms in contiguous blocks indexed by the displacement leads to elements stored at $\hat c + C(a+bA)$.

The intermediate summation over b represents a B point Fourier transform with points spaced AC apart; one such transform for each of the AC values of $\hat c + aC$. The result of any one of these transforms is a frequency sequence indexed by $\hat b$. The twiddle factor $e\!\left( \frac{a \hat b}{AB} \right)$ depends only upon $a$ and $\hat b$, leaving $\hat c$ as a free index. Here we would call $\hat b$ the frequency for the analysis, $a$ the displacement for the analysis and $\hat c$ the replication index. We would store the intermediate results at $\hat c + \hat b C + aBC$. In this case the contiguous blocks are spaced out by the replication factor C.

The outer summation over a represents an A point Fourier transform with points spaced BC apart; one such transform for each of the BC values of $\hat c + \hat b C$. The result of any one of these transforms is a frequency sequence indexed by $\hat a$. There is no twiddle factor in this case. Here we would call $\hat a$ the frequency and $\hat c + \hat b C$ the replication index. There is no displacement at this stage. We would store the results at $\hat c + \hat b C + \hat a BC$. At this point we see that the results are stored in serial order of frequency.

When we compute one of the transforms we may wish to include the twiddle factor in the coefficients for the values of the replication index before computing new coefficients for a different displacement. If the Fourier transform is on two points there appears to be no advantage to either choice. For four points it is more economical to do the Fourier transform and then multiply by the twiddle factor. In the other cases it is more efficient if the twiddle factor is absorbed into the transform coefficients.

If we were to use the Cooley version on a sequence stored in serial order we would obtain an algorithm which differs only in minor details. The summation over c represents a C point Fourier transform of points spaced AB apart. The twiddle factor $e\!\left( \frac{b \hat c}{BC} \right)$ depends upon the frequency $\hat c$ and displacement $b$, leaving $a$ free as a replication index. The elements are stored at $\hat c + aC + bAC$. The intermediate summation over b represents a B point Fourier transform of points spaced AC apart. The twiddle factor $e\!\left( \frac{a(\hat c + \hat b C)}{ABC} \right)$ depends upon the combined frequency $\hat c + \hat b C$ and the displacement $a$. There is no free index. The intermediate results are stored at $\hat c + \hat b C + aBC$. The outer summation represents an A point Fourier transform of results spaced BC apart. There is no twiddle factor in this case. The final results would be stored in serial order at $\hat c + \hat b C + \hat a BC$.

These two reductions are essentially identical from an algorithmic viewpoint when we use storage in this manner. We have only reversed the roles of the hatted and unhatted variables in the formation of the twiddle factors.

To obtain the general case of more than three factors we proceed to group our factors in three groups, for example $N = (P_1 \cdots P_{j-1})\, P_j\, (P_{j+1} \cdots P_n)$. If we identify

$$A = P_1 \cdots P_{j-1}, \qquad B = P_j, \qquad C = P_{j+1} \cdots P_n$$

and perform the above reductions we find that after two steps with this identification we arrive at exactly the same place as after only one step with the identification

$$A = P_1 \cdots P_{j-2}, \qquad B = P_{j-1}, \qquad C = P_j \cdots P_n.$$

We can think of this as moving factors from the "A" part through the "B" part to the "C" part.

When we write a program to implement this, we set up a triply nested iteration. The outer loop selects the current factor being moved and sets up the limits and indexing parameters for the inner loops. The intermediate loop, in which we would compute the twiddle factor, corresponds to the displacement. The inner loop provides indexing over the replication variables.

Fast Fourier Transforms in Place

Let us reconsider the storage requirements for our algorithm. In particular, let us consider one of the small Fourier transforms on c for some value of $a + bA$. It is only during this transform that we will need this set of intermediate results. We have just vacated the C cells $a + bA + cAB$ (indexed by $c$) and are looking about for C cells in which to store our answers (indexed by $\hat c$). Why not use the cells that have just been vacated?

Having made this observation, let us reexamine the algorithm corresponding to the Sande factorization. The summation on c represents a C point Fourier transform of points spaced AB apart. The twiddle factor $e\!\left( \frac{\hat c(a+bA)}{ABC} \right)$ depends upon the frequency $\hat c$ and the displacement $a + bA$. There is no replication index. The intermediate results are stored at $a + bA + \hat c AB$. The summation on b represents a B point Fourier transform of points A apart. The twiddle factor $e\!\left( \frac{a \hat b}{AB} \right)$ depends upon the frequency $\hat b$ and the displacement $a$. The replication index is $\hat c$. We treat each of the blocks of length AB indexed by $\hat c$ in an exactly equivalent manner. The summation over a represents contiguous A point Fourier transforms. The frequency is $\hat a$ and the replication index is $\hat b + \hat c B$. There is no displacement. The final answers are then stored at $\hat a + \hat b A + \hat c AB$. The use of "displacement" and "replication" is now much more suggestive and natural than when we used these while describing the use of scratch storage.

The more general case of more than three factors is again obtained by grouping to obtain "A", "B", and "C" and then moving factors from the "A" part to the "C" part; the program being written as a triply nested iteration loop.

When we reexamine the Cooley factorization for serially stored data, we will see that it has an unpleasant feature. The summation over c represents a C point Fourier transformation of points spaced AB apart. The twiddle factor $e\!\left( \frac{b \hat c}{BC} \right)$ depends only upon $b$ and $\hat c$, leaving $a$ as a free index. The intermediate results are stored at $a + bA + \hat c AB$. The summation over b represents a B point Fourier transform of points spaced A apart. The twiddle factor $e\!\left( \frac{a(\hat c + \hat b C)}{ABC} \right)$ depends upon the displacement $a$ and the combined frequency $\hat c + \hat b C$. For data which was originally stored serially, this is not convenient to compute. The intermediate results are stored at $a + \hat b A + \hat c AB$. The summation over a represents contiguous A point Fourier transforms. The final results are stored at $\hat a + \hat b A + \hat c AB$. More than three factors can be handled in the same manner as before.

The common unpleasant feature of both of these algorithms is that the Fourier coefficient for frequency $\hat c + \hat b C + \hat a BC$ is stored at $\hat a + \hat b A + \hat c AB$. Such a storage scheme may be described as storage with "digit reversed subscripts." Before we may use our Fourier coefficients we must unscramble them.‡ Unscrambling has been found to require only a small proportion of the total time for the algorithm. A very elegant solution corresponds to running two counters, one with normal digits (easily obtained by simply incrementing by one) and one with reversed digits (easily obtained by nesting loops with the innermost stepping by the largest increments) and recopying from one array to another where one counter is used for each array. If all of the factors are the same, the required interchanges become pairwise and it is possible to do the unscrambling in place. One may unscramble the real and imaginary parts separately, so scratch space could be generated by backing one or the other onto auxiliary store. A slower method of unscrambling is to follow the permutation cycles so that no scratch storage is needed.

‡ This problem has nothing to do with the representation of numbers in any particular machine, and unless the machine has a "reverse digit order" instruction (e.g., "reverse bit order" for a binary machine), no advantage can be taken of such representation.

Two implementations, both for serially stored sequences, have been described. Alternately, two implementations, both for data stored with digit reversed subscripts, may be developed: in this case the Cooley factorization has more convenient twiddle factors. For the last two, the final results are correctly stored, the "unscrambling" having been done first. Digit reversed subscripts may arise because we have

1. scrambled serially stored data,
2. not unscrambled after a previous Fourier transform,
3. generated data in scrambled order.

We have written various Fourier transform programs in the manner described above. These include one- and multi-dimensional versions of radix 2 (all factors equal to 2), radix 4 (all factors equal to 4), radix 4 + 2 (all factors equal to 4 except perhaps the last, which may be 2), and mixed radices (N is factored into as many 4's as possible, a 2 if necessary, then any 3's, 5's or other primes). Times required to transform a 1024 point sequence are given in Table II, including the time required by a recursive radix 2 transform and by the pre-fast Fourier transform "efficient" Goertzel's method (which still requires order $N^2$ operations) for comparison.

TABLE II
TIME FOR A 1024 POINT TRANSFORM BY VARIOUS METHODS

Method                                              Time
radix 2                                             [illegible]
radix 4 (also radix 4 + 2 and mixed radices)        [illegible]
radix 2 (recursively implemented)                   [illegible]
Goertzel's method                                   [illegible]

Our standard tool is the radix 4 + 2 Fourier transform which combines the speed of radix 4 with the flexibility of radix 2. The mixed radices Fourier transform is used when other factors, for example 10, are desired. The mixed radix is used less as it is more bulky (requiring scratch storage for unscrambling as well as being a larger program) and is less thoroughly optimized than the radix 4 + 2 Fourier transform.

Fast Fourier Transforms Using Hierarchical Store

As we become interested in doing larger and larger Fourier transforms, we reach a point where we may not be able to have all of the data in core simultaneously, and some must be kept in slower store such as drum, disk, or tape. At the other end of the

The 64 files are now brought into core one at a time. When a file comes in, it is Fourier transformed using a standard in-core transform, then the points are multiplied by the twiddle factor $e(\hat a b / 2^{18})$, where $b$ is the number of the file (0 through 63) and $\hat a$ is the frequency index for the particular point. The file is then written out again. By choosing to use transforms of only 4096 points, we can fully buffer these operations, one file being Fourier transformed while the previous file is being written out and the succeeding file read in.

We now start on the second set of transforms. Here we make use of the random access feature of disk to rea