<<

MUS421 Lecture 2 Review of the Discrete (DFT)

Julius O. Smith III ([email protected]) Center for Computer Research in Music and Acoustics (CCRMA) Department of Music, Stanford University Stanford, California 94305

June 27, 2020

Outline

Domains of Definition • Discrete Fourier Transform • Properties of the Fourier Transform • For more details, see

Mathematics of the DFT (Music 320 text): • http://ccrma.stanford.edu/~jos/mdft/ Chapter 2 and Appendix B of • Spectral Audio (our text): http://ccrma.stanford.edu/~jos/sasp/

1 Domains of Definition

The Fourier Transform can be defined for signals that are

Discrete or Continuous Time • Finite or Infinite Duration • This results in four cases:

Time Duration Finite Infinite (FS) Fourier Transform (FT) cont. P + jω t ∞ jωt X(k)= x(t)e− k dt X(ω)= x(t)e− dt time 0 k = Z ,..., + ω Z(−∞ , + ) t −∞ ∞ ∈ −∞ ∞ Discrete FT (DFT) Discrete Time FT (DTFT) discr. N 1 + − jω n ∞ jωn X(k)= x(n)e− k X(ω)= x(n)e− time n=0 n= k =0, X1,...,N 1 ω X( −∞π, +π) n − ∈ − discrete freq. k continuous freq. ω

2 Geometric Interpretation of the Fourier Transform

In all four cases, X(ω)= x,s h ωi where sω is a complex sinusoid at radian frequency ω rad/s:

ejωt (Fourier transform case), • ejωn (DTFT case, T =1/f =1), • s ejωkt (Fourier series case), • ej2πkn/N (DFT case). • Geometrically, X(ω)= x,s is proportional to the h ωi coefficient of projection of the signal x onto the signal sω.

3 Signal and Transform Notation

n,k Z (integers) or Z (integers modulo N) • ∈ N x(n) R (reals) or C (complex numbers) • ∈ x CN means x is a length N complex sequence • ∈ x = x( ) • · X = DFT(x) CN , or • ∈ x X ↔ where “ ” is read as “corresponds to”. ↔ X(k)= DFT (x)= DFT (x) C • k N,k ∈ x(n)= IDFT (X)= IDFT (X) • n N,n C DTFT DFT C For x ∞, X = (x)= (x) 2∞π • ∈ ∞ ∈ x = conjugate of x • ∠x = phase of x • The notation XY or X Y denotes the vector containing · (XY )k = X(k)Y (k), k =0,...,N 1. This is denoted by ‘X .* Y’ in Matlab, where X and−Y may a pair of column vectors, or a pair of row vectors.

4 The Discrete Fourier Transform

The “kth bin” of the Discrete Fourier Transform (DFT) is defined as N 1 − ∆ ∆ ∆ jω tn X(k) = DFT (x) = x,s = x(t )e− k k h ki n n=0 X ∆ s (n) = ejωktn; k =0, 1,...,N 1 k − k 2πk ω =∆ 2π f = ; t =∆ nT k N s NT n The DFT is proportional the coefficients of projection of the signal vector x onto the N sinusoidal basis signals sk, k =0, 1,...,N 1: − X(k)= x,s h ki The normalized DFT is precisely the coefficients of projection of the signal vector x onto the N normalized sinusoidal basis signals s˜ = s / s = s /√N: k k k kk k X˜ (k)= x, s˜ h ki

5 Inverse DFT

The inverse DFT is given by N 1 N 1 − x,s 1 − k jωktn x(tn)= h 2isk(tn)= X(ωk)e sk N Xk=0 k k Xk=0 It can be interpreted as the superposition of the projections, i.e., the sum of the sinusoidal basis signals weighted by their respective coefficients of projection:

x,sk x = h 2isk sk Xk k k The inverse normalized DFT is cleaner: x = x, s˜ s˜ h ki k Xk

6 The DFT, Cont’d

There are several ways to think about the DFT:

1. Projection onto the set of “basis” sinusoids (frequencies at N roots of unity) 2. Coordinate transformation (“natural” RN basis to “sinusoidal” basis)

3. Matrix multiplication X = W∗x, jω tn where W∗[k,n]= e− k 4. Sampled uniform filter bank output

This course will emphasize interpretations 1 and 4.

7 Properties of the DFT

We are going to be performing manipulations on signals and their Fourier Transform throughout this class. It is important to understand how changes we make in one domain affect the other domain. The Fourier theorems are helpful for this purpose. Derivations of the Fourier theorems for the DTFT case may be found in Chapter 2 of the text, and in Mathematics of the DFT1 (Music 320 text) for the DFT case.

1http://ccrma.stanford.edu/~jos/mdft/Fourier Theorems.html

8 Linearity

αx + βx αX + βX 1 2 ↔ 1 2 or DFT(αx + βx ) = α DFT(x )+ β DFT(x ) 1 2 · 1 · 2 α, β C ∈ x ,x ,X ,X CN 1 2 1 2 ∈ The Fourier Transform “commutes with mixing.”

9 Symmetries for Real Signals

If a time-domain signal x is real, then its Fourier transform X is conjugate symmetric (Hermitian): x RN X( k)= X(k) ∈ ⇔ − or Real Hermitian ↔ Hermitian symmetry implies

Real part Symmetric (even): • re X( k) = re X(k) { − } { } Imaginary part Antisymmetric (skew-symmetric, odd): • im X( k) = im X(k) { − } − { } Magnitude Symmetric (even): • X( k) = X(k) | − | | | Phase Antisymmetric (odd): • ∠X( k)= ∠X(k) − −

10 Time Reversal

Definition: Flip (x) =∆ x( n) =∆ x(N n) n − − Note: x(n) =∆ x(n mod N) for signals in CN (DFT case). When computing a sampled DTFT using the DFT, we interpret time indices n =1, 2,...,N/2 1 as positive time indices, and n = N 1,N 2,...,N/− 2 as the negative time indices n =− 1, 2−,..., N/2. Under this interpretation, the Flip− operator− simply− reverses a signal in time. Fourier theorems: Flip(x) Flip(X) ↔ for x CN . In the typical special case of real signals (x R∈N ), we have Flip(X)= X so that ∈ Flip(x) X ↔ Time-reversing a real signal conjugates its spectrum

11 Shift Theorem

The Shift operator is defined as Shift (y) =∆ y(n l). l,n − Since indexing is defined modulo N, Shiftl(y) is a circular right-shift by l samples.

j( )l Shift (y) e− · Y l ↔ or, more loosely,

jωl y(n l) e− Y (ω) − ↔ i.e., jωkl DFTk[Shiftl(y)] = e− Y (ωk) jω l e− k = Linear Phase Term, slope = l  − ∠Y (ω ) += ω l • k − k Multiplying a spectrum Y by a linear phase term • jω l e− k with phase slope l corresponds to a circular right-shift in the time domain− by l samples: negative slope time delay • ⇒ positive slope time advance • ⇒

12

The cyclic convolution of x and y is defined as N 1 − (x y)(n) =∆ x(m)y(n m), x,y CN ∗ − ∈ m=0 X Cyclic convolution is also called , since y(n m) =∆ y(n m (mod N)). − − Convolution is cyclic in the time domain for the DFT and FS cases, and acyclic for the DTFT and FT cases. The Convolution Theorem is then (x y) X Y ∗ ↔ ·

13 Linear Convolution of Short Signals

x(t) h y(t) = (x∗h)(t)

Convolution theorem for DFTs: (h x) H X ∗ ↔ · or DFT (h x)= H(ω )X(ω ) k ∗ k k where h,x CN , and H and X are the N-point DFTs of h and x,∈ respectively. DFT performs circular (or cyclic) convolution: N 1 − y(n) =∆ (x h)(n) =∆ x(m)h(n m) ∗ − N m=0 X where (n m) means “(n m) modulo N” − N − Another way to look at this is as the inner product of x, and Shiftn[Flip(h)], i.e., y(n)= x, Shift [Flip(h)] h n i 14 FFT Convolution

The convolution theorem h x H X shows us that there are two ways to perform∗ circular↔ · convolution. direct calculation of the summation = (N 2) • O frequency-domain approach = (NlgN) • O Fourier Transform both signals • Perform term by term multiplication of the • transformed signals Inverse transform the result to get back to the • time domain Remember ... this still gives us cyclic convolution Idea: If we add enough trailing zeros to the signals being convolved, we can get the same results as in acyclic convolution (in which the convolution summation goes from m =0 to ). ∞ Question: How many zeros do we need to add?

N N N

∗ =

Nx Nh Nx +Nh -1

15 If we perform an acyclic convolution of two signals, x • and h, with lengths Nx and Nh, the resulting signal is length N = N + N 1. y x h − Therefore, to implement acyclic convolution using the • DFT, we must add enough zeros to x and y so that the cyclic convolution result is length Ny or longer. If we don’t add enough zeros, some of our • convolution terms “wrap around” and add back upon others (due to modulo indexing). This can be called time domain aliasing. • We typically zero-pad even further (to the next power • of 2) so we can use the Cooley-Tukey FFT for maximum speed

A sampling-theorem based insight: Zero-padding in the time domain results in more samples (closer spacing) in the . This can be thought of as a higher ‘sampling rate’ in the frequency domain. If we have a high enough frequency-domain sampling rate, we can avoid time domain aliasing.

16 Example FFT Convolution

% matlab/fftconvexample.m x = [1 2 3 4 5 6]; h = [1 1 1]; nx = length(x); nh = length(h); nfft = 2^nextpow2(nx+nh-1) xzp = [x, zeros(1,nfft-nx)]; hzp = [h, zeros(1,nfft-nh)]; X = fft(xzp); H = fft(hzp);

Y = H .* X; y = real(ifft(Y))

Program output: octave:10> fftconvexample nfft = 8 y = 1 3 6 9121511 6

17 FFT Convolution vs. Direct Convolution

Let’s compare the number of operations needed to perform the convolution of 2 length N sequences:

It takes N 2 multiply/add operations to calculate • the convolution≈ summation directly. It takes on the order of N log(N) operations to • · compute an FFT. (Note: H(ωk) can be calculated in advance for time-invariant filtering.)

N FFT Direct Convolution 4 176 16 32 2560 1024 64 5888 4096 128 13,312 16,384 256 29,696 65,536 2048 311,296 4,194,304

In this example (from Strum and Kirk), the FFT (software) beats direct time-domain convolution at length 128 and higher

18 Correlation

The cross-correlation of x and y in CN is defined as: N 1 − (x⋆y)(n) =∆ x(m)y(n + m), x,y CN ∈ m=0 X Using this definition we have the correlation theorem: (x⋆y) X(ω )Y (ω ) ↔ k k The correlation theorem is often used in the context of spectral analysis of filtered noise signals.

Autocorrelation

The autocorrelation of a signal x CN is simply the cross-correlation of x with itself: ∈ N 1 − (x⋆x)(n) =∆ x(m)x(m + n), x CN ∈ m=0 X From the correlation theorem, we have (x⋆x) X(ω ) 2 ↔| k |

19 Power Theorem The inner product of two signals is defined as: x,y =∆ x y h i n n n X Using this notation, we have the following: 1 x,y = X,Y h i N h i

Parseval’s Theorem When we consider the inner product of a signal with itself, we have a special case known as Parseval’s Theorem: 1 X 2 x 2 = x,x = X,X = k k k k h i N h i N (Also called the Rayleigh’s Energy Theorem.)

Vector Cosine

∆ x,y X,Y cos(θ ) = h i = h i xy x y X Y k k·k k k k·k k

20 Stretch

We define the Stretch operator such that: Stretch : CN CNL L → Which means that it transforms a length N complex signal, into a length NL signal. Specifically, we do this by inserting L 1 zeros in between each pair of samples of the signal. −

x y

→ y = Stretch2(x)

......

21 Repeat or Scale

Similarly, the RepeatL operator, defined on the unit circle, frequency-scales its input spectrum by the factor L: ω Lω ← The original spectrum is repeated L times as ω traverses the unit circle. This is illustrated in the following diagram for L =3:

X Y

→ Y = REPEAT3(X)

ω ω

Using these definitions, we have the Stretch Theorem: Stretch (x) Repeat (X) L ↔ L Application: Upsampling by any integer factor L: Passing the stretched signal through an ideal lowpass filter cutting off at ω π/L yields ideal bandlimited interpolation of the original≥ signal by the factor L.

22 Zero-Padding Interpolation ↔

Zero padding in the time domain corresponds to ideal interpolation in the frequency domain. Proof: http://ccrma.stanford.edu/~jos/mdft/Zero Padding Theorem Spectral.html

Downsampling Aliasing ↔

The downsampling operation DownsampleM selects every M th sample of a signal:

∆ DownsampleM,n(x) = x(Mn) N N In the DFT case, DownsampleM maps C to CM , while for the DTFT, DownsampleM maps C∞ to C∞. The Aliasing Theorem states that downsampling in time corresponds to aliasing in the frequency domain: 1 Downsample (x) Alias (X) M ↔ M M where the Alias operator is defined for X CN ∈

23 (DFT case) as

M 1 − N N Alias (X) =∆ X l + k , l =0, 1,..., 1 M,l M M − Xk=0   For X C∞ (DTFT case), the Alias operator is ∈ M 1 ∆ − j( ω +k 2π ) Alias (X) = X e M M , π ω<π M,ω − ≤ k=0   MX1 ∆ − 1 k M = X WM z Xk=0   ∆ j2π/M where WM = e is a common notation for the primitive Mth root of unity, and z = ejω as usual. This normalization corresponds to T =1 after downsampling. Thus, T =1/M prior to downsampling. The summation terms above for k =0 are called aliasing components. 6 The aliasing theorem points out that in order to downsample by factor M without aliasing, we must first lowpass-filter the spectrum to [ πfs/M,πfs/M]. This filtering essentially zeroes out the− spectral regions which alias upon sampling.

24 Ideal Spectral Interpolation

Recall: X(ω) =∆ x,s h ωi where ∆ jωt sω(t) = e (FT) ∆ jωtn ∆ jωn sω(tn) = e = e (DTFT) For signals in the DTFT domain which happen to be time limited to n [ N/2,N/2 1], ∈ − − N/2 1 − ∆ ∞ jωn jωn X(ω) = x,s = x(n)e− = x(n)e− h ωi n= n= N/2 X−∞ X− This can be interpreted as a 0-centered DFT • evaluated at ω instead of ωk =2πk/N It arises as the DTFT of a finite-length signal • Same as DFT plus infinite zero padding • Such signals can be sampled at ω = ωk =2πk/N • without loss of information

25 Meaning of Spectral Interpolation

Let X(ω ) denote the spectrum to be interpolated. • k Then the corresponding time signal is • x = IDFTN (X). We define the spectral interpolation X(ω) as the • projection of our signal x onto an arbitrary sinusoid jωnT sω = e . This is equivalent to X(ω)= DTFT (x): • ω ∆ jωnT X(ω) = x,s = x(n)e− h ωi n = DTFT X0,x, 0,... ω{··· } FFT ZeroPad x ≈ ωk{ L{ }} for some sufficiently large zero-padding factor L. In the Quadratically Interpolated FFT (QIFFT) • method for measuring parameters of spectral peaks, we will choose L to be sufficient in conjunction with quadratic interpolation of spectral log magnitude samples at each peak

26 Interpolating a DFT

Starting with a sampled spectrum X(ωk), k =0, 1,...,N 1, we may interpolate ideally by taking the DTFT of the− zero-padded IDFT:

X(ω) = DTFTω(ZeroPad (IDFTN (X))) ∞ N/2 1 N 1 − ∆ 1 − jω n jωn = X(ω )e k e− N k n= N/2 " k=0 # X− X N 1 N/2 1 − − 1 j(ω ω)n = X(ω ) e k− k N  k=0 n= N/2 X X− N 1 −   = X(ω )asinc (ω ω ) k N − k k=0 X Sample Shift = X, ΩN ( ω(asincN )) = (X ⊛ asinc ) , N ω where ⊛ denotes convolution between a discrete (X) and continuous (asinc) signal. (If math operators adapt to their argument types like perl functions, we can simply use as usual.) ∗ Zero-padding in the time domain corresponds to • “asincN interpolation” in the frequency domain This is “ideal time-limited spectral interpolation” • 27 Practical Zero Padding

To interpolate a uniformly sampled spectrum X(ωk) by the factor L, we may take the inverse DFT, append zeros, and take the FFT (which is very fast):

X(ωl) = FFTLN,l(ZeroPadLN (IDFTN (X))), l =0, . . . , LN 1 − This operation creates L 1 new bins between each pair of original bins in X, thus− increasing the number of spectral samples around the unit circle from N to LN. In matlab, we can specify zero-padding by simply providing the optional FFT-size argument:

X = fft(x,N); % FFT size N > length(x)

28 Reasons for Zero Padding (Spectral Interpolation)

Zero-padding makes our FFTs look like DTFTs when • displaying spectra. Zero-padding enables us to use the FFT with any • window length M. When M is not a power of 2, we append enough zeros to make the FFT size N>M a power of 2. For sinusoidal peak-finding, spectral interpolation via • zero-padding gets us closer to the true maximum of the main lobe when we simply take the maximum-magnitude FFT-bin as our estimate.

29 Zero Padding Examples

Let’s look at the effect of zero padding on the Fourier transform of the popular (causal) Hamming window: 2πn w(n)=0.54 0.46 cos , n =0, 1, 2,...M 1 − M −   where M = 21 in our examples. We will look at shifts of the

critically sampled window transform W (ω ω ), and • k − 0 2 oversampled window transform W (ω ω ) • × k′ − 0 where ω0 =2π 3/M =2π/7 0.9 rad/samp is the normalized radian· frequency of≈ the test sinusoid to which the window is applied.

30 Critically Sampled Hamming Window Transform

Consider performing a length M DFT on a length M windowed signal:

N =∆ DFT size = M =∆ Window length • 2π DFT frequency samples at ωk = k M • (critically sampled DTFT) Window sequence and windowed-sinusoid spectrum: •

Causal Hamming window − M = 21 − no zero padding (a) 1

0.8

0.6

0.4 Amplitude

0.2

0 0 2 4 6 8 10 12 14 16 18 20 Time (samples)

(b)

−10

−20

−30

−40 Magnitude (dB)

−50

−60 0 0.5 1 1.5 2 2.5 3 Normalized Frequency (radians/sample)

DFT bin width = 2π = 2π (critically sampled) • N M 4 samples per main lobe (Hamming window) •

31 2X Oversampled Hamming Window Transform

Let’s now zero-pad by a factor of 2 in the time domain, before we perform our DFT:

Zero-padding factor L =∆ N =2 • M N = DFT size = 2M • 2π 2π DFT frequency samples at ω = k′ = k′ • k′ N 2M Causal zero-padding by a factor of two (L =2):

Causal Hamming window − M = 21 − zero padding factor = 2 (a) 1

0.8

0.6

0.4 Amplitude

0.2

0 0 5 10 15 20 25 30 35 40 45 Time (samples)

(b)

−10

−20

−30

−40 Magnitude (dB)

−50

−60 0 0.5 1 1.5 2 2.5 3 Normalized Frequency (radians/sample)

DFT bin width = 1 2π = 2π (2 oversampled) • L M 2M × 8 samples per main lobe (Hamming window) • 32 Oversampled Spectral Peaks

Note that zero-padding helps in finding the true peak of the sampled window transform.

zero pad factor = 8

1

0.8

0.6

0.4

Magnitude (linear) 0.2

0 0 0.5 1 1.5 2 2.5 3 Normalized Frequency (radians/sample)

33 Zero-Centered Zero-Padding

Blackman Windowed Sinusoid (a) 1

0.5

0 Amplitude

−0.5

−1 −15 −10 −5 0 5 10 15 Time (samples)

(b) 1

0.5

positive time negative time 0 Amplitude

−0.5

−1 0 10 20 30 40 50 60 Time (samples) (a) Blackman window overlaid with windowed data. (b) Zero-padded and loaded into FFT input buffer.

Use zero-centered zero padding with zero-phase • windows Use causal zero padding with causal windows •

34 Zero-Centered Spectra

(a) 8 positive frequencies negative frequencies 6

4

Magnitude (linear) 2

0 0 10 20 30 40 50 60 Frequency (bins))

(b) 8 negative frequencies positive frequencies 6

4

Magnitude (linear) 2

0 −30 −20 −10 0 10 20 30 Frequency (bins)) (a) FFT magnitude data, as returned by the FFT. (b) FFT magnitude spectrum “rotated” to a more “physical” frequency axis in bin numbers.

35 fftshift

Matlab and Octave have a simple utility called fftshift that performs this bin rotation. Consider the following example:

octave:4> fftshift([1 2 3 4]) ans = 3 4 1 2 octave:5>

Note that both Matlab and Octave regard the spectral sample at half the sampling rate as a negative frequency. For odd N, the only reasonable answer is

octave:4> fftshift([1 2 3]) ans = 3 1 2 octave:5> corresponding to frequencies f /3, 0,f /3, respectively. − s s

36