EDUCATIONE DUCATION

Editor: Denis Donnelly, [email protected]

THE FAST FOR EXPERIMENTALISTS PART III: CLASSICAL SPECTRAL ANALYSIS

By Bert Rust and Denis Donnelly

ACH ARTICLE IN THIS CONTINUING SERIES ON THE FAST Pf()= 1 T 2 FOURIER TRANSFORM (FFT) IS DESIGNED TO ILLUMINATE lim∫ xt ( ) exp(−2πi ftdt ) , E T →∞T −T NEW FEATURES OF THE WIDE-RANGING APPLICABILITY OF THIS –f .(3)

TRANSFORM. THIS SEGMENT DEALS WITH SOME ASPECTS OF THE But we have only a discrete, real time series spectrum estimation problem. Before Spectrum Estimation’s we begin, here’s a short refresher Central Problem xj = x(tj), with tj = j t, about two elements we introduced The , invented by j = 0, 1, …, N – 1, (4) previously, windowing1 and convolu- Arthur Schuster in 1898,3 was the tion.2 As we noted in those install- first formal estimator for a time se- defined on a finite time interval of ments, a convolution is an integral ries’s frequency spectrum, but many length Nt. We saw in Part I1 that that expresses the amount of overlap others have emerged in the ensuing sampling x(t) with sample spacing t of one as it is shifted over an- century. Almost all use the FFT in confined our spectral estimates to the other. The result is a blending of the their calculations, but they differ in Nyquist band 0 f 1/2t. We used two functions. Closely related to the their assumptions about the missing the FFT algorithm to compute the dis- convolution process are the processes data; that is, the data outside the ob- crete Fourier transform (DFT) of cross-correlation and autocorrela- servation window. These assumptions tion. Computing the cross-correlation have profound effects on the spectral N −1   =−π j differs only slightly from the convolu- estimates. Let t be time, f be fre- Xxkj∑ exp 2 i k = N tion; it’s useful for finding the degree quency, and x(t) a real function on the j 0 of similarity in signal patterns from interval – < t < . The continuous k = 0, 1, …, N/2, (5) two different data streams and in de- Fourier transform (CFT) of x(t) is de- termining the lead or lag between fined by which approximates the CFT X(f ) at such similar signals. the Fourier frequencies ∞ is also related to the convolution; it’s X ()fxtftdt=−∫ ()exp(2πi ) , k described later. Windowing, used in −∞ f = , k = 0, 1, ..., N/2. (6) k Nt∆ extracting or data, is typi- cally executed by multiplying time- –f ,(1)We then computed periodogram esti- domain data or its autocorrelation mates of both the PSD and the ampli- function by the . A where i ≡−1 . If we knew x(t) per- tude spectrum by disadvantage of windowing is that it fectly and could compute Equation 1, alters or restricts the data, which, of then we could compute an energy = 1 2 Pf()kk||X , k = 0, 1, …, N/2, course, has consequences for the spec- function N tral estimate. In this installment, we continue our discussion, building on E(f ) = |X(f )|2, –f ,(2) = 2 Af()kk||X , k = 0, 1, …, N/2. (7) these concepts with a more general N approach to computing spectrum es- and a power spectral density function timates via the FFT. (PSD) by We also saw that we could approximate

74 Copublished by the IEEE CS and the AIP 1521-9615/05/$20.00 © 2005 IEEE COMPUTING IN SCIENCE & ENGINEERING x(t) = sin [2π(0.50)(t + 0.25)] + noise 2.5 2.0 Signal and noise 1.5 Signal without noise ) i t

( 1.0 x 0.5 = i

x 0.0 –0.5 –1.0 –1.5 the CFT and the frequency spectrum 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 on a denser frequency mesh simply by (a) t (time units) appending zeroes to the time series. Periodogram This practice, called zero padding, is 9 just an explicit assertion of an implicit 8 Signal and noise 7 Signal without noise assumption of the periodogram 6 method—namely, that the time series 5 4 is zero outside the observation window. 3

density (PSD) 2 Frequency spectrum estimation is a Power spectral 1 classic underdetermined problem be- 0 cause we need to estimate the spectrum 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 at an infinite number of frequencies us- (b) Frequency ing only a finite amount of data. This problem has many solutions, differing Figure 1. Original and new time series as defined by Equation 8. (a) The noise- mainly in what they assume about the corrupted time series and the uncorrupted series originally used in Part I’s Figure 1b. missing data. The noise is independently, identically distributed n(0, 0.25). (b) of Before considering other solutions the two times series plotted in (a). For the noise-corrupted series, the peak is ^ to this problem, let’s reconsider one of centered on frequency f 0 = 0.493. the examples from Part I1 (specifically, Figure 1b), but make it more realistic 4 ␶ by simulating some random measure- Tukey’s correlogram estimators. m = m t, m = 0, 1, …, N – 1. (11) ment errors. More precisely, we take They’re based on the autocorrelation N = 32, t = 0.22, and consider the theorem (sometimes called Wiener’s Because we’re working with a real time ␳ ␶ ␳ ␶ time series theorem), which states that if X(f ) is series, and ( –m) = ( m), we don’t need the CFT of x(t), then |X(f )|2 is the to worry about evaluating ␳(␶) at neg- tj = j t, j = 0, 1, 2, …, N – 1, CFT of the autocorrelation function ative lags. (ACF) of x(t). Norbert Wiener defined Because ␳(␶) is a limit of the average ␲ 5 ␶ xj = x(tj) = sin[2 f0(tj + 0.25)] + j, (8) the latter function as value of x*(t)x(t + ) on the interval [–T, T ], the obvious estimator is the 1 T with f = 0.5, and each a random ρτ( )=+ lim∫ xtxtdt * ( ) ( τ ) , sequence of average values 0 j T →∞ T −T number drawn independently from a 2 ρρˆˆ=∆()mt normal distribution with mean zero – < ␶ < , (9) m −− and standard deviation ␴ = 0.25. This 1 Nm1 = ∑ xx new time series is plotted together in which the variable ␶ is called the lag − nnm+ , Nmn=0 with the original uncorrupted series in (the time interval for the correlation of Figure 1a. Both series were zero x(t) with itself), and x*(t) is the com- m = 0, 1, …, N – 1. (12) padded to length 1,024 (992 zeroes plex conjugate of x(t). Thus, if we appended) to obtain the periodogram could access x(t), we could compute This sequence is sometimes called the estimates given in Figure 1b. It’s re- the PSD in two ways: either by Equa- unbiased estimator of ␳(␶) because its markable how well the two spectra tion 3 or by expected value is the true value—that ^ agree, even though the noise’s stan- ∞ is, {␳(mt)} = ␳(mt). But the data =−ρτ π τ τ. (10) dard deviation was 25 percent of the P()ffd∫−∞ ()exp(2 i ) are noisy, and for successively larger ␳^ signal’s amplitude. values of m, the average m is based on But again, we have access to only a fewer and fewer terms, so the variance The Autocorrelation Function noisy time series x0, x1, …, xN–1, so to grows and, for large m, the estimator After the periodogram, the next fre- use the second method, we need esti- becomes unstable. Therefore, it’s quency spectrum estimators to emerge mates for ␳(␶) evaluated at the discrete common practice to use the biased were Richard Blackman and John lag values estimator

SEPTEMBER/OCTOBER 2005 75 E DUCATION

Autocorrelation estimates 1.0 0.8 Biased estimate 0.6 Unbiased estimate 0.4 t)

∆ 0.2 m

( 0.0 ␳ –0.2 –0.4 –0.6 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 corresponding to the biased and un- (a) Lag = m∆t biased ACF estimates, shown in Fig- Correlogram estimates of power spectral density (PSD) ure 2a. The negative sidelobes for the unbiased correlogram show dramati- 14 Using biased ACF cally why most analysts choose the bi- 10 Using unbiased ACF ased estimate even though its central 6 )

f peak is broader. The reason for this 2 P( broadening, and for the damped side- –2 lobes, is that the biased ACF, Equa- –6 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 tion 13, can also be computed by multiplying the unbiased ACF, Equa- (b) f tion 12, by the triangular (Bartlett) ta- pering window Figure 2. Autocorrelation and correlogram estimates for the noisy time series k defined by Equation 8. (a) Biased and unbiased estimates of the autocorrelation w =−1 , k N function (ACF); (b) correlogram estimates obtained from the ACF estimates in (a). k = 0, 1, 2, …, N – 1. (17)

Nm−−1 Recall that we observed the same sort ρρ=∆=1 ^ ˆˆmnnm()mt ∑ xx+ , –1 rm 1, m = 0, 1, …, N – 1. (15) of peak broadening and sidelobe sup- N n=0 pression in Part I’s Figure 10 when we m = 0, 1, …, N – 1, (13) Correlogram PSD Estimators multiplied the observed data by a Once we’ve established the ACF esti- Blackman window before computing which damps those instabilities and has mate, we can use the FFT to calculate the periodogram. a smaller total error (bias + variance) the discrete estimate to the PSD. More Notice that the biased correlogram than does the unbiased estimator. (Bias precisely, the ACF estimate is zero estimate plotted in Figure 2b is identi- is the difference between the estimator’s padded to have M lags, which gives cal to the periodogram estimate plot- expected value and the true value of the M/2 + 1 frequencies in the PSD esti- ted in Figure 1b. The equality of these quantity being estimated.) Figure 2a mate, which we can then compute by two estimates, computed in very dif- gives plots of both estimates for the approximating Equation 10 with ferent ways, constitutes a finite dimen- times series that Equation 8 defines. sional analogue of Wiener’s theorem Pˆˆ= Pf() The ACF we have just described is kk for the continuous PSD. − sometimes called the engineering auto- M 1  j  Figure 2b’s two PSD correlograms =−∑ ρπˆ , correlation to distinguish it from the j exp 2 i k aren’t the only members of the class of = M statistical autocorrelation, which is de- j 0 correlogram estimates. We can obtain fined by k = 0, 1, …, M/2. (16) other variations by truncating the ACF ␶ Nm−−1 estimate at lags < (N – 1) t and by 1 −−Zero padding in this case is an explicit smoothing the truncated (or untrun- ∑ ()()xxxnnm+ x N = expression of the implicit assumption cated) estimate with one of the taper- rˆ = n 0 , m N −1 that the ACF is zero for all lag values ing windows defined in Part I’s 1 2 ∑ ()xx− ␶ > (N – 1)t. We must assume that Equation 11. Most of those windows N n n=0 because we don’t know the data out- were originally developed for the cor- side the observation window. Assum- relogram method; they were then − 1 N 1 ing some nonzero extension for the retroactively applied to the perio- where x = ∑ x . (14) ACF would amount to an implicit dogram method when the latter was N n n=0 assumption about the missing ob- resurrected in the mid 1960s. In those ^ The individual rm are true correlation served data. days, people often used very severe coefficients because they satisfy Figure 2b plots the correlograms truncations, with the estimates being

76 COMPUTING IN SCIENCE & ENGINEERING Untapered correlogram estimates of power spectral density (PSD) 9 mmax = 10 8 mmax = 20 mmax = 31 7 Periodogram

6 set to zero at 90 percent or more of the 5 ) lags. Not only did this alleviate the f P( 4 variance instability problem, but it also reduced the computing time—an im- 3 portant consideration before the in- 2 vention of the FFT algorithm, and when computers were much slower 1 than today. 0 The effect of truncating the biased ACF estimate is shown in Figure 3, –1 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 where mmax is the largest index for which the nonzero ACF estimate is re- f tained. More precisely, −− Figure 3. Three correlogram estimates for Equation 8 computed from the biased 1 Nm1 ρˆ = ∑ autocorrelation function (ACF) estimator in Equation 13. The periodogram, mnnm− xx+ , Nmn=0 although plotted, doesn’t show up as a separate curve because it’s identical to the = m = 31 correlogram. mm01, ,...,max , max ρ = =+ − ˆm 0011,mmmax ,..., N . (18) It’s clear that smaller values of m Tapered correlogram estimates of power spectral density (PSD) max 9 produce more pronounced sidelobes mmax = 10 and broader central peaks than larger 8 mmax = 20 values. The peak broadening is ac- mmax = 31 companied by a compensating de- 7 Periodogram crease in height to keep the area under 6 the curve invariant. PSD is measured ) in units of power-per-unit-frequency f 5 interval, so the peak’s area indicates its P( associated power. 4 Figure 4 shows the effect of tapering the truncated ACF estimates used in 3 Figure 3 with a Hamming window 2  mπ  w =+0..cos 538 0 462 1 m   , mmax m = 0, 1, 2, …, mmax. (19) 0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 f The sidelobes are suppressed by the ta- pering, but the central peaks are fur- Figure 4. Three correlogram estimates for the time series generated by Equation 8. ther broadened. This loss in resolution We computed the estimates by tapering three truncations of the biased estimator in is the price we must pay to smooth the Equation 13 with a Hamming window. The periodogram was also plotted for sidelobes and eliminate their negative comparison. Although it has sidelobes, its central peak is sharper than those of the excursions. correlograms. Tapering the biased ACF estimates with the Hamming window amounts to twice tapering the unbiased esti- Bartlett window, Equation 17. Figure Hamming window, Equation 19. mates; we can obtain the former from 5 shows the effect of a single tapering Note that the sidelobes are not com- the latter by tapering them with the of the unbaised estimates with the pletely suppressed, but they’re not as

SEPTEMBER/OCTOBER 2005 77 E DUCATION

Tapered correlograms using unbiased autocorrelation function (ACF) 9 mmax = 10 8 mmax = 20 mmax = 31 7 Periodogram

6

5 various truncation and windowing ) f strategies, but none of them have P( 4 proven to be advantageous, so correlo- 3 gram estimates are beginning to fall 2 out of favor. For the past 30 years or so, most researchers have concentrated on 1 autoregressive spectral estimates, which, as we shall see in Part 4, give 0 better resolution because they make –1 better assumptions about the the data 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 outside the window of observation. f References Figure 5. Three correlogram estimates for the time series generated by Equation 8. 1. D. Donnelly and B. Rust, “The Fast Fourier We computed the estimates by tapering three truncations of the unbiased estimator Transform for Experimentalists, Part I: Con- cepts,” Computing in Science & Eng., vol. 7, in Equation 12. We also plotted the periodogram for comparison; again, it has a no. 2, 2005, pp. 80–88. sharper peak but larger sidelobes . 2. D. Donelly and B. Rust, “The Fast Fourier Transform for Experimentalists, Part II: Con- volutions,” Computing in Science & Eng., vol. 7, no. 3, 2005, pp. 92–95. pronounced as in Figure 3, in which 3. A. Schuster, “On the Investigation of Hidden the tapering used the Bartlett win- Periodicities with Application to a Supposed dow. However, the central peaks are Twenty-Six-Day Period of Meteorological also slightly broader here. This is yet Phenomena,” Terrestrial Magnetism, vol. 3, no. 1, 1898, pp. 13–41. another example of the trade-off 4. R.B. Blackman and J.W. Tukey, The Measure- between resolution and sidelobe ment of Power Spectra, Dover Publications, suppression. 1959. This particular example contains 5. N. Wiener, Extrapolation, Interpolation, and only a single-sinusoid, so it doesn’t Smoothing of Stationary Time Series, MIT Press, 1949. suggest any advantage for the taper- ing and truncation procedures, but they weren’t developed to analyze a Denis Donnelly is a professor of physics at time series with such a simple struc- Siena College. His research interests include Stay on Track ture. Their advantages are said to be computer modeling and electronics. Donnelly best realized when the signal being received a PhD in physics from the University of IEEE Internet Computing analyzed contains two or more sinu- Michigan. He is a member of the American reports emerging tools, soids with frequencies so closely Physical Society, the American Association of technologies, and applications spaced that sidelobes from two adja- Physics Teachers, and the American Association implemented through cent peaks might combine and rein- for the Advancement of Science. Contact him the Internet to support a force one another to give a spurious at [email protected]. worldwide computing peak in the spectrum. But of course, if environment. two adjacent frequencies are close Bert Rust is a mathematician at the US National enough, then the broadening of both Institute for Standards and Technology. His re- peaks might cause them to merge into search interests include ill-posed problems, an unresolved lump. time-series modeling, nonlinear regression, and observational cosmology. Rust received a PhD in astronomy from the University of Illinois. He www.computer.org/internet/ uch ink has been used in de- is a member of SIAM and the American Astro- M bating the relative merits of the nomical Society. Contact him at [email protected].

78 COMPUTING IN SCIENCE & ENGINEERING