<<

and Downsampling

MUS421/EE367B Lecture 9 Multirate, Polyphase, and Wavelet Filter Banks For the DFT, we have the Stretch Theorem (Repeat Theorem) which relates upsampling (“stretch”) to Julius O. Smith III ([email protected]) , Scott Levine and Harvey Thornburg spectral copies (“images”) in the DFT context (length N Center for Computer Research in Music and Acoustics (CCRMA) and spectra). Department of Music, Stanford University Stanford, California 94305 We also have the Downsampling Theorem ( Theorem) for DFTs which relates downsampling to June 2, 2020 aliasing for finite length signals and spectra. We now look at these relationships in the DTFT case. Outline Thus, the length N is extended to infinity, and the Upsampling and Downsampling spectrum becomes defined continuously over the unit • circle in the z plane. Polyphase Filtering • Critically Sampled Perfect Banks • Perfect Reconstruction Filter Banks • Paraunitary (Orthogonal) Filter Banks • Multirate Analysis of Filter Banks • DFT Filter Banks • Cosine Modulated Filter Banks • Wavelet Filter Banks • 1 2

Upsampling (Stretch) Downsampling (Decimation)

Diagram: Diagram: • • xN y xN y

Basic Idea: To upsample by the integer factor N, Basic Idea: Take every N th sample. • • insert N 1 zeros between x[n] and x[n +1] for all n. Downsample − Time Domain: y = N (x), i.e., Time Domain: y = Stretch (x), i.e., • • N y[n]= x[Nn], n Z ∈ x[n/N],N divides n y[n]= Frequency Domain: Y = AliasN (X), i.e., ( 0, otherwise. • N 1 1 − 1 jm2π Frequency Domain: Y = Repeat (X), i.e., Y (z)= X z N e− N , z C N N ∈ • m=0 Y (z)= X(zN ), z C X   ∈ Thus, the frequency axis is expanded by factor N, Plugging in z = ejω, we see that the spectrum on wrapping N times around the unit circle and adding. • [ π,π) contracts by the factor N, and N images For N =2, two partial spectra are summed, as appear− around the unit circle. For N =2, this is indicated below: depicted below: 1/Ν

−π π −π π −π π −π π

3 4 Twiddle Factor Notation Proof of Downsampling/Aliasing Relationship 1 k Downsample Alias In FFT terminology, W denotes the kth “twiddle N (x) N (X) N ↔ N factor,” where WN is a primitive Nth root of unity: N 1 ∆ j2π/N − WN = e− . or x(nN) 1 X ej2πm/N z1/N ↔ N m=0 The aliasing expression can therefore be written as X   N 1 From the DFT case, we know this is true when x and X 1 − 1 jm2π Y (z) = X z N e− N , z C are each complex sequences of length N , in which case y N ∈ s m=0 and Y are length N /N. Thus, X   s N 1 1 − N 1 m 1/N 1 − 2π Ns = X(WN z ). x(nN) Y (ω N)= X ω + m , k 0, N ↔ k N k N ∈ N m=0 m=0 X X     where we have chosen to keep frequency samples ωk in terms of the original frequency axis prior to downsampling, i.e., ωk =2πk/Ns for both X and Y . This choice allows us to easily take the limit as N s →∞ by simply replacing ωk by ω: N 1 1 − 2π 2π x(nN) Y (ωN)= X ω + m , ω 0, ↔ N N ∈ N m=0 X     Replacing ω by ω′ = ωN and converting to z-transform notation X(z) instead of notation X(ω), with z = ejω′, yields the final result.

5 6

Example: Downsampling by 2 Filtering and Downsampling

As an example, when N =2, y[n]= x[2n], and Because downsampling by N will cause aliasing for any ∆ j2π/2 (since W2 = e− = 1) frequencies in the original signal above ω > π/N, the − input signal must first be lowpass filtered.| | 1 Y (z) = X W 0z1/2 + X W 1z1/2 2 2 2 h    i H(z) N 1 j2π0/2 1/2 j2π1/2 1/2 = X e− z + X e− z 2 1 h    i = X z1/2 + X z1/2 The lowpass filter h[n] is an FIR filter of length M with a 2 − cutoff frequency of π/N. Let’s draw the FIR filter h in 1 h    i = [Stretch (X)+ Stretch (Shift (X))] direct form: 2 2 2 π x[n] h(0) y[n] N

Example: Upsampling by 2 z-1 h(1)

z-1 When N =2, y =[x0, 0,x1, 0,...], and h(2) 2 Y (z)= X(z )= Repeat2(X)

-1 z h(M-1)

7 8 Note that we do not need N 1 out of every N Interpretation: • − • samples due to the N :1 downsampler. – serial to parallel conversion Commute the downsampler through the adders inside from a stream of scalar samples x[n] • the FIR filter: to a sequence of length M buffers every N samples, followed by x[n] h(0) y[n] N – a dot product of each buffer with h(0 : M 1) − z-1 h(1) For N = M, the overall system is equivalent to a N • round-robin demultiplexor, with a different gain h(m) z-1 for each output, followed by an M-sample summer h(2) N which adds the “de-interleaved” signals together:

x[n] h(0) -1 z h(M-1) N h(1) Length M y[n] running sum The multipliers are now running at 1/N times the • sampling frequency of the input signal, x[n]. This reduces the computation requirements by 1/N.

The downsampler outputs are called polyphase signals h(M 1) • − This is a summed polyphase filter bank in which each • “subphase filter” is a constant scale factor h(m).

9 10

Polyphase Processing (Anti-Aliasing Filter) Polyphase Filtering

In multirate , it is often fruitful to split a Subphase 0, • signal or filter into its polyphase components. x(nN) ∞ =[x ,x ,x ,...] |n=0 0 N 2N Let’s look at the case N =2: is scaled by h(0) Begin with the filter Subphase 1, • • ∞ n x(nN + 1) ∞ =[x1,xN+1,x2N+1,...] H(z)= h(n)z |n=0 − n= is scaled by h(1) X−∞ Separate the even and odd terms: • •··· Subphase m, ∞ 2n 1 ∞ 2n • H(z)= h(2n)z− + z− h(2n + 1)z− n= n= x(nN + m) ∞ =[x ,x ,x ,...] |n=0 m N+m 2N+m X−∞ X−∞ Define the polyphase component filters: is scaled by h(m). • ∞ n E0(z) = h(2n)z− n= X−∞ ∞ n E1(z) = h(2n + 1)z− n= X−∞ E0(z) and E1(z) are the polyphase components of the polyphase decomposition of H(z) for N =2.

11 12 Now write H(z) as the sum of the odd and even e0 • e1 terms: e2 2 1 2 -1-2-3-4-5-6 0 1 2 3 4 5 6 H(z)= E0(z )+ z− E1(z )

Example Polyphase Decomposition into 2 Channels The polyphase decomposition into N channels is given by N 1 − H(z)= z lE (zN ) As a simple example, consider − l l=0 1 2 3 X H(z)=1+2z− +3z− +4z− . where the subphase filters are Then the even and odd terms are, respectively, ∞ n 1 El(z)= el(n)z− , l =0, 1,...,N 1, E0(z) = 1+3z− − n= 1 X−∞ E1(z) = 2+4z− with And so H(z) can be written as the sum of the following ∆ el(n) = h(Nn + l). (lth subphase filter) two polyphase components: 2 2 The signal e (n) can be obtained by passing h(n) E0(z ) = 1+3z− l 1 2 1 3 through an advance of l samples, followed by z− E1(z )=2z− +4z− downsampling by the factor N: Polyphase Decomposition into N Channels h(n) el(n) zl N For the general case of arbitrary N, the basic idea is to decompose x[n] into its periodically interleaved subsequences:

13 14

Three-Channel Polyphase Decomposition and Filtering and Downsampling, Revisited Reconstruction As another example of polyphase filtering, we return to For N =3, we have the following system diagram: the previous example about downsampling and filtering.

1 3 E0(z) 3 1 This time,

E (z) 1 H(z) z3 1 3 z− H(z) Let the FIR lowpass filter h[n] be of length • Z 2 E (z) 2 M = LN, L z 3 2 3 z− ∈ The N polyphase filters, el[n], are each length L. Type II Polyphase Decomposition • N 1 N 2 N Recall, H(z)= E0(z )+ z− E1(z )+ z− E2(z )+ • (N 1) N The preceding polyphase decomposition of H(z) into N + z− − EN 1(z ): ··· − channels N 1 x[n] y[n] E (zN) N − l N 0 H(z)= z− El(z ) z-1 l=0 X E (zN) can be termed a “Type I” polyphase decomposition. 1 z-1 In the “Type II”, or reverse polyphase decomposition, the E (zN) powers of z progress in the opposite direction: 2 N 1 − (N l 1) N H(z)= z− − − Rl(z ). -1 l=0 z N X EM-1(z ) We will see later that we need Type I for analysis filters and Type II for synthesis filters in a “perfect reconstruction filter bank”.

15 16 Now commute the N :1 downsampler through the Multirate Noble Identities • adders and through the upsampled polyphase filters, N El(z ): Downsamplers and upsamplers are linear, time-varying

x[n] y[n] operators. Therefore, operation order is very important. N E0(z)

z-1 N N H(z) H(z ) N

N E1(z)

z-1 N H(z) N N H(z ) N E2(z)

Multirate noble identities

z-1

N EN-1(z) It is also important to note that adders or multipliers (any memoryless operators) can commute across Commuting the downsampler through the subphase downsamplers and upsamplers: N filters El(z ) to get El(z) is an example of a “multirate noble identity”. k N N k

k N k

17 18

Critically Sampled Perfect applications. We normally only need it when Reconstruction Filter Banks compression is a requirement. Two-Channel Critically Sampled Filter Banks A perfect reconstruction (PR) filter bank is any filter • bank whose reconstruction is the original signal, Let’s begin with a simple two-channel case, with lowpass possibly delayed, and possibly scaled by a constant. analysis filter H0(z), highpass analysis filter H1(z), lowpass synthesis filter F (z), and highpass synthesis In this context, critical sampling (also called 0 filter F (z): • “maximal downsampling”) means that the 1

downsampling factor is the same as the number of x0(n) v0(n) y0(n) H (z) 2 2 F (z) filter channels. For the STFT, this implies 0 0 x(n) x(n) R = M = N (with M > N for Portnoff windows). x (n) y (n) 1 v1(n) 1 The short-Time Fourier transform (STFT) is a PR H1(z) 2 2 F1(z) • filter bank whenever the constant-overlap-add (COLA) condition is met by the analysis window w The outputs of the two analysis filters are then

and the hop size R. However, only the rectangular Xk(z)= Hk(z)X(z), k =0, 1 window case with no zero-padding is critically sampled After downsampling, the signals become (OLA hop size = FBS downsampling factor = N). 1 1/2 1/2 Advanced audio compression algorithms (“perceptual Vk(z)= Xk(z )+ Xk( z ) , k =0, 1 • 2 − audio coding”) are based on critically sampled filter After upsampling,h the signals become i banks, for obvious reasons. 2 1 Yk(z)= Vk(z ) = [Xk(z)+ Xk( z)] Important Point: We normally do not require 2 − • critical sampling for audio analysis, effects, and music 1 = [H (z)X(z)+ H ( z)X( z)], k =0, 1 2 k k − − 19 20 After substitutions and rearranging, the output xˆ is a For perfect reconstruction, we additionally need filtered replica plus an aliasing term: c = H0(z)F0(z)+ H1(z)F1(z) ˆ 1 X(z) = [H0(z)F0(z)+ H1(z)F1(z)]X(z) (Filtering Cancellation Constraint) (3) 2 1 where c = Ae jωD is any constant A> 0 times a + [H ( z)F (z)+ H ( z)F (z)]X( z) − 2 0 − 0 1 − 1 − linear-phase term corresponding to D samples of delay. ( Reconstruction) (1) Choosing F0 and F1 to cancel aliasing, We require the second term (the aliasing term) to be c = H (z)H ( z) H (z)H ( z) 0 1 − − 1 0 − zero for perfect reconstruction. This is arranged if we set (Filtering and Aliasing Cancellation) (4) F0(z) = H1( z) − Perfect reconstruction thus also imposes a constraint on F (z) = H ( z) 1 − 0 − the analysis filters, which is of course true for any (Aliasing Cancellation Constraints) (2) band-splitting filter bank. Thus, Let H˜ denote H( z). Then both constraints can be expressed in matrix− form as The synthesis lowpass filter F (z) is the rotation by π • 0 of the analysis highpass filter H (z) on the unit circle. H0 H1 F c 1 0 = If H (z) is highpass, cutting off at ω = π/2, then H˜ H˜ F 0 1  0 1  1    F0(z) will be lowpass, cutting off at π/2. The synthesis highpass filter F (z) is the negative of • 1 the π-rotation of the analysis lowpass filter H0(z). Note that aliasing is completely canceled by this choice of synthesis filters F0,F1, for any choice of analysis filters H0,H1.

21 22

Amplitude-Complementary 2-Channel Filter Bank Thus, the lowpass-filter impulse response can be • anything of the form Perhaps the most natural choice of analysis filters for our h =[h (0), h (1),h (2), 0,h (4), 0,h (6), 0,...] two-channel, critically sampled filter bank, is an 0 0 0 0 0 0 amplitude-complementary lowpass/highpass pair, i.e., or H (z)=1 H (z) h =[h (0), 0,h (2), h (3),h (4), 0,h (6), 0,...] 1 − 0 0 0 0 0 0 0 where we impose the unity dc gain constraint H0(1) = 1. etc. Amplitude-complementary thus means constant The corresponding highpass-filter impulse response is overlap-add (COLA) on the unit circle in the z plane. • h1(n)= δ(n) h0(n). Plugging the COLA constraint into the Filtering and − Aliasing Cancellation constraint (4) gives The first example above corresponds to the • c = H (z)[1 H ( z)] [1 H (z)]H ( z) highpass-filter 0 − 0 − − − 0 0 − = H (z) H ( z) h1 = [1 h0(0), h0(1), h0(2), 0, h0(4), 0, h0(6), 0,...] 0 − 0 − ←→ − − − − − Aδ(n D) = h (n) ( 1)nh (n) − 0 − − 0 etc. 0, n even = The above class of amplitude-complementary filters can ( 2h0(n), n odd be characterized as follows: Even-indexed terms of the impulse response are 2 o • H0(z) = E0(z )+ h0(o)z− , E0(1) + h0(o)=1, o odd unconstrained, since they subtract out in the 2 o H (z)=1 H (z)=1 E (z ) h (o)z− constraint. 1 − 0 − 0 − 0 For perfect reconstruction, exactly one odd-indexed In summary, we have shown that an • term must be nonzero in the lowpass impulse amplitude-complementary lowpass/highpass analysis filter response h (n). The simplest choice is h (1) =0. pair yields perfect reconstruction (aliasing and filtering 0 0 6 23 24 cancellation) when there is exactly one odd-indexed term Haar Example in the impulse response of h0(n). Before we leave this case (amplitude-complementary, Problem: two-channel, critically sampled, perfect reconstruction filter banks), let’s see what happens when H (z) is the E (z2) repeats twice around the unit circle. 0 • 0 simplest possible lowpass filter having unity dc gain, i.e., Since we assume real coefficients, the frequency • j2ω 1 1 1 response, E (e ) is magnitude-symmetric about H (z)= + z− 0 0 2 2 ω = π/2 as well as π. 2 This case is obtained above by setting E0(z )=1/2, This is not good since we only have one degree of o =1, and h (1) = 1/2. • o 0 freedom, h0(o)z− , with which we can break the π/2 symmetry to reduce the high-frequency gain and/or The polyphase components of H0(z) are clearly 2 2 boost the low-frequency gain. E0(z )= E1(z )=1/2. In other words, this class of filters cannot be expected Choosing H1(z)=1 H0(z) and choosing F0(z) and • to give us high quality lowpass or highpass behavior. − F1(z) for aliasing cancellation, the four filters become

To enable the use of high-quality lowpass and highpass 1 1 1 2 1 2 H0(z) = + z− = E0(z )+ z− E1(z ) channel filters, we must relax the 2 2 amplitude-complementary constraint (and/or filtering 1 1 1 2 1 2 H (z)==1 H (z)= z− = E (z ) z− E (z ) cancellation and/or aliasing cancellation) and find 1 − 0 2 − 2 0 − 1 another approach. 1 1 1 F (z) = H ( z)= + z− = H (z) 0 1 − 2 2 0 1 1 1 F (z) = H ( z)= + z− = H (z) 1 − 0 − −2 2 − 1

25 26

Thus, both the analysis and reconstruction filter banks Polyphase Haar Example are scalings of the familiar Haar filters (“sum and 1 difference” filters (1 z− )/√2). ± Let’s look at the polyphase representation for this The frequency responses are example. Starting with the filter bank and its reconstruction, jω jω 1 1 jω j ω ω H (e ) = F (e )= + e− = e− 2 cos 0 0 2 2 2 x0(n) v0(n) y0(n)   H0(z) 2 2 F0(z) jω jω 1 1 jω j ω ω H (e ) = F (e )= e− = je− 2 sin x(n) 1 − 0 2 − 2 2 x(n) x (n) y (n)   1 v1(n) 1 which are plotted below: H1(z) 2 2 F1(z)

Two−Channel Haar Filter Bank Frequency Response 1 the polyphase decomposition of H0(z) is 0.9

|H (ω T)|=cos(ω T/2) 1 1 0.8 0 2 1 2 1 |H (ω T)|=sin(ω T/2) 1 H0(z)= E0(z )+ z− E1(z )= + z− 0.7 2 2 2 2 0.6 Thus, E0(z )= E1(z )=1/2, and therefore

0.5 1 H1(z)=1 H0(z)= E0(z) z− E1(z)

Magnitude (linear) 0.4 − −

0.3 We may derive polyphase synthesis filters as follows:

0.2 Xˆ (z)=[F0(z)H0(z)+ F1(z)H1(z)] X(z) 0.1 1 1 1 1 1 1 = + z− H0(z)+ + z− H1(z) 0 0 0.5 1 1.5 2 2.5 3 2 2 −2 2 Normalized Frequency (rad/sample)      1 1 = [H (z) H (z)] + z− [H (z)+ H (z)] 2 0 − 1 0 1  27 28 The polyphase representation of the filter bank and its Quadrature Mirror Filterbanks (QMF) reconstruction can now be drawn as below: The well studied subject of Quadrature Mirror Filters

x0(n) (QMF) is entered by imposing the following symmetry x(n) 2 2 E0(z ) 2 2 E1(z ) −1 ↓ ↑ −1 z z constraint on the analysis filters: x1(n) 2 2 E1(z ) 2 2 E0(z ) xˆ(n) H (z)= H ( z) (QMF Symmetry Constraint) (5) - ↓ ↑ - 1 0 − That is, the filter for channel 1 is constrained to be a Notice that the reconstruction filter bank is formally the π-rotation of filter 0 along the unit circle. In the time transpose of the analysis filter bank. domain, h (n)=( 1)nh (n), i.e., all odd-index 1 − 0 Commuting the downsamplers (by the noble identities), coefficients are negated. we obtain Two-channel QMFs have been around since at least 1976 (see Croisier et al. in Music 421 Citations), and appear to x(n) be the first critically sampled perfect reconstruction filter 2 E0(z) E1(z) 2 −1 ↓ ↑ −1 z z banks.

2 E1(z) E0(z) 2 xˆ(n) ↓ - - ↑ If H0 is a lowpass filter cutting off near ω = π/2 (as is typical), then H1 is a complementary highpass filter. The Since E0(z)= E1(z)=1/2, this is simply the OLA form exact cut-off frequency can be adjusted along with the of an STFT filter bank for N =2, with roll-off rate to to provide a maximally constant N = M = R =2, and rectangular window frequency-response sum. w = [1/2, 1/2]. That is, the DFT size, window length, Historically, the term QMF applied only to two-channel and hop size are all 2, and both the DFT and its inverse filter banks having the QMF symmetry constraint (5). are simply sum-and-difference operations. Today, the term “QMF filter bank” may refer to more

29 30

general PR filter banks with any number of channels and On the other hand, very high quality IIR solutions are not obeying (5). • possible. See Vaidyanathan for details (pp. 201–204). Combining the QMF symmetry constraint with the In practice, approximate “pseudo QMF” filters are • aliasing-cancellation constraints, given by common, which only give approximate perfect reconstruction. We’ll return to this topic later. F (z) = H ( z)= H (z) 0 1 − 0 The Haar filters, which we saw gave perfect F1(z) = H0( z)= H1(z), − − − reconstruction in the amplitude-complementary case, are the perfect reconstruction requirement reduces to also examples of a QMF filter bank: 2 2 1 constant = H0(z)F0(z)+ H1(z)F1(z)= H (z) H ( z) H (z) = 1+ z 0 − 0 − 0 − (QMF Perfect Reconstruction Constraint) (6) 1 H (z)=1 z− 1 − Now, all four filters are determined by H (z). 0 In this example, c0 = c1 =1, and n0 = n1 =0. It is easy to show using the polyphase representation of Linear Phase Quadrature Mirror Filter Banks H0(z) (see Vaidyanathan) that the only causal FIR QMF analysis filters yielding exact perfect reconstruction are two-tap FIR filters of the form It is generally desirable to use linear phase filters whenever possible in audio work. This is because linear 2n0 (2n1+1) H0(z) = c0z− + c1z− phase filters delay all frequencies by equal amounts. 2n (2n +1) H (z) = c z− 0 c z− 1 1 0 − 1 A filter phase response is linear in ω whenever its impulse where c0 and c1 are constants, and n0 and n1 are integers. response h0(n) is symmetric, i.e.,

h0(L n)= h0(n) Only weak channel filters are available in the QMF − • in which case the frequency response can be expressed as case (H1(z)= H0( z)), as we saw in the − jω jωN/2 jω amplitude-complementary case. H0(e )= e− H0(e ) 31 32

Substituting this into the QMF perfect reconstruction Conjugate Quadrature Filters (CQF) constraint (6) gives 2 jωN jω 2 N j(π ω) A class of causal, FIR, two-channel, criticially sampled, constant = e− H (e ) ( 1) H (e − 0 − − 0 exact perfect-reconstruction filter-banks is the set of   so-called Conjugate Quadrature Filters (CQF). When N is even, the right hand side of the above equation is forced to zero at ω = π/2. Therefore, we will In the z-domain, the CQF relationships are only consider odd N, for which the perfect reconstruction (L 1) 1 H (z)= z− − H ( z− ) constraint reduces to 1 0 − 2 In the time domain, the analysis and synthesis filters are jωN jω 2 j(π ω) constant = e− H0(e ) + H0(e − given by   n We see that perfect reconstruction is obtained in the h1[n] = ( 1) h0[L 1 n] − − − − linear-phase case whenever the analysis filters are power f0[n] = h0[L 1 n] complementary. Since FIR QMF filters are constrained to − − f [n] = ( 1)nh (n)= h (L 1 n) the two-tap case, this is best accomplished using IIR 1 − − 0 − 1 − − filters. See Vaidyanathan for details. That is, f0 = Flip(h0) for the lowpass channel, and the highpass channel filters are a modulation of their lowpass counterparts by ( 1)n. − Again, all four analysis and synthesis filters are • determined by the lowpass analysis filter H0(z). It can be shown that this is an orthogonal filter bank. •

33 34

The analysis filters H0(z) and H1(z) are power With the CQF constraints, (1) reduces to • complementary, i.e., 1 1 1 Xˆ (z)= [H (z)H (z− )+ H ( z)H ( z− )]X(z) (7) jω 2 jω 2 0 0 0 0 H0e + H1e =1 (Power Complementary) 2 − −

or Let P (z)= H0(z)H0( z), such that H0(z) is a spectral factor of the half-band− filter P (z) (i.e., P (ejω) is a H˜ (z)H (z)+H˜ (z)H (z)=1 (Power Complementary) 0 0 1 1 nonnegative power response which is lowpass, cutting off ∆ 1 near ω = π/4). Then, (7) reduces to where H˜0(z) = H0(z− ) denotes the paraconjugate of H0(z) (for real filters H0). The paraconjugate is ˆ 1 (L 1) jω X(z)= [P (z)+ P ( z)]X(z)= z− − X(z) (8) the analytic continuation of H0(e ) from the unit 2 − − circle to the z plane. The problem of the PR filter design has been reduced Moreover, the analysis filters H0(z) are power • to designing one half-band filter, P (z). • symmetric, e.g., It can be shown that any half-band filter can be H˜ (z)H (z)+H˜ ( z)H ( z) (Power Symmetric) • 0 0 0 − 0 − written in the form p[2n]= δ[n]. That is, all non-zero even-idexed values of p[n] are set to zero. The power symmetric case was introduced by Smith • and Barnwell in 1984. A simple design of an FIR half-band filter would be to window a : sin[πn/2] p[n]= w[n] (9) πn/2 where w[n] is any suitable window, such as the Kaiser window.

35 36 Note that as a result of (7), the CQF filters are power Orthogonal Two-Channel Filter Banks • complementary. That is, they satisfy: Recall the reconstruction equation for the two-channel, jω 2 jω 2 critically sampled, perfect-reconstruction filter-bank: H0(e ) + H1(e ) =2 1 Xˆ (z) = [H0(z)F0(z)+ H1(z)F1(z)]X(z) Also note that the filters H 0 and H 1 are not linear 2 • 1 phase. + [H ( z)F (z)+ H ( z)F (z)]X( z) 2 0 − 0 1 − 1 − It can be shown that there are no two-channel perfect This can be written in matrix form as • reconstruction filter banks that have all three of the 1 F (z) T H (z) H ( z) X(z) following characteristics (except for the Haar filters): Xˆ (z)= 0 0 0 − 2 F (z) H (z) H ( z) X( z) – FIR  1   1 1 −  −  where the above 2 2 matrix, H (z), is called the alias – orthogonality m component matrix ×(or analysis modulation matrix). If – linear phase H˜ m(z)Hm(z)=2I In this design procedure, we have chosen to satisfy ˜ ∆ T 1 the first two and give up the third. where Hm(z) = Hm(z− ) denotes the paraconjugate of H (z), then the alias component (AC) matrix is lossless, By relaxing “orthogonality” to “biorthogonality”, it m and the (real) filter bank is orthogonal. • becomes possible to obtain FIR linear phase filters in a critically sampled, perfect reconstruction filter bank. It turns out orthogonal filter banks give perfect (See later section on Wavelet filter banks.) reconstruction filter banks for any number of channels. Orthogonal filter banks are also called paraunitary filter banks, which we’ll study shortly in polyphase form. The AC matrix is paraunitary if and only if the polyphase matrix is paraunitary. (See Vaidyanathan.)

37 38

Perfect Reconstruction Filter Banks The next step is to expand each analysis filter Hk(z) into its N-channel “Type 1” polyphase representation: N 1 We now consider filter banks with an arbitrary number of − l N channels, and ask under what conditions do we obtain a Hk(z)= z− Ekl(z ) l=0 perfect reconstruction filter bank? or X H (z) N N N 1 Polyphase analysis will give us the answer readily. 0 E0,0(z ) E0,1(z ) E0,N 1(z ) N N ··· − N 1 H1(z) E1,0(z ) E1,1(z ) E1,N 1(z ) z−  .  =  . . ··· −.   .  Let’s begin with the N-channel filter bank below: . . . . .    N N ··· N   (N 1)   HN 1(z)   EN 1,0(z ) EN 1,1(z ) EN 1,N 1(z )   z− −   −   − − ··· − −    N x(n)  h(z)   E(z )   e(z)  H0(z) R R F0(z) ↓ ↑ |which{z we} can| write as {z } | {z } N H1(z) R R F1(z) ↓ ↑ h(z)= E(z )e(z). Similarly, expand the synthesis filters in a Type II polyphase decomposition: HN−1(z) R R FN−1(z) xˆ(n) ↓ ↑ N 1 − (N l 1) N Fk(z)= z− − − Rlk(z ) The downsampling factor is R N. l=0 • ≤ or X T T For critical sampling, R = N. (N 1) N N N F0(z) z− − R0,0(z ) R0,1(z ) R0,N 1(z ) • (N 2) N N ··· − N F1(z) z− − R1,0(z ) R1,1(z ) R1,N 1(z )  .  =  .   . . ··· −.  . . . . .      N N ··· N   FN 1(z)   1   RN 1,0(z ) RN 1,1(z ) RN 1,N 1(z )   −     − − ··· − −  N  f T (z)   e˜(z)   R(z )  |which{z we} can| write{z as} | {z } f T (z)= e˜(z)R(zN ).

39 40 The polyphase representation can now be depicted as We call E(z) the polyphase matrix. As we will show below, the above simplification can be carried out more generally whenever R divides N (e.g., x(n) R R N/R 1 ↓ ↑ 1 z− z− R = N/2,..., 1). In these cases E(z) becomes E(z ) and R(z) becomes R(zN/R). R R 1 ↓ ↑ z− z 1 N N − E(z ) Rb(z )

1 1 z− z−

R R xˆ(n) ↓ ↑

When R = N, commuting the up/downsamplers gives

x(n) N N 1 ↓ ↑ 1 z− z−

N N ↓ ↑ z 1 − 1 z− E(z) Rb(z)

1 1 z− z−

N N xˆ(n) ↓ ↑

41 42

Simple Examples of Perfect Reconstruction At time 0, the downsamplers and upsamplers “fire”, • transferring x(0) (and N 1 zeros) from the delay − If we can arrange to have line to the output delay chain, summing with zeros. Over the next N 1 clocks, x(0) makes its way R(z)E(z)= I N • toward the output,− and zeros fill in behind it in the then the filter bank will reduce to the simple system output delay chain. below: Simultaneously, the input “buffer” (delay line) is • being filled with samples of x(n). x(n) N N 1 ↓ ↑ z 1 z− − At time N 1, x(0) makes it to the output. • − N N ↓ ↑ At time N, the downsamplers fire again, transferring z 1 − 1 • N z− a length N buffer [x(1 : N)] to the upsamplers. On the same clock pulse, the upsamplers also fire, • 1 1 transferring N samples to the output delay chain. z− z− The bottom-most sample [x(n N +1) = x(1)] goes N N xˆ(n) ↓ ↑ • out immediately at time N. − Over the next N 1 sample clocks, the length N 1 When R = N, we have a simple parallelizer/serializer (or • − − de-multiplexor/multiplexor, or output buffer will be “drained” and refilled by zeros. de-interleaver/re-interleaver), which is Simultaneously, the input buffer will be replaced by perfect-reconstruction by inspection: • new samples of x(n). At time 2N, the downsamplers and upsamplers fire, Think of the input samples x(n) as “filling” a length • • N 1 delay line over N 1 sample clocks. and the process goes on, repeating with period N. − − 43 44 The output of the N-way parellelizer/serializer is Thus, when R =1, the output is therefore xˆ(n)= Nx(n N + 1) xˆ(n)= x(n N + 1) − − and we have perfect reconstruction. and we again have perfect reconstruction. Sliding Polyphase Filter Bank Hopping Polyphase Filter Bank

When R =1, there is no downsampling or upsampling, When 1 < R < N and R divides N, we have, by a and the system further reduces to the case below: similar analysis, x(n) N 1 z 1 z− − xˆ(n)= x(n N + 1) R − which is again perfect reconstruction. z 1 − 1 N z− Note the built-in overlap-add when R < N.

1 1 Sufficient Condition for Perfect Reconstruction z− z−

xˆ(n) Above, we found that, for any integer 1 R N which divides N, a sufficient condition for perfect≤ ≤ Working backward along the output delay chain, the reconstruction is output sum can be written as ∆ 0 (N 1) 1 (N 2) 2 (N 3) P(z) = R(z)E(z)= IN Xˆ (z) = z− z− − + z− z− − + z− z− − + ··· h (N 2) 1 0 (N 1) and the output signal is then +z− − z− + z− z− − X(z) N (N 1) xˆ(n)= x(n N + 1) = Nz− − X(z) i R − 45 46

More generally, we allow any nonzero scaling and any Necessary and Sufficient Conditions for Perfect additional delay: Reconstruction

∆ K P(z) = R(z)E(z)= cz− IN (Perfect Reconstruction Constraint) (10) It can be shown (see Vaidyanathan ’93) that the most general conditions for perfect reconstruction are that where c =0 is any constant and K is any nonnegative 1 6 K 0(N L) L z− IN L integer. In this case, the output signal is R(z)E(z)= cz− − × − IL 0L (N L) N  × −  xˆ(n)= c x(n N +1 K) R − − for some constant c and some integer K 0, where L is any integer between 0 and N 1. ≥ Thus, given any polyphase matrix E(z), we can attempt − 1 to compute R(z)= E− (z): Note that the more general form of R(z)E(z) above can be regarded as a (non-unique) square root of a vector If it is stable, we can use it to build a unit delay, since • perfect-reconstruction filter bank. 1 2 0(N L) L z− IN L 1 − × − = z− IN . However, if E(z) is FIR, R(z) will typically be IIR. IL 0L (N L) •  × −  In the next section, we will look at paraunitary filter • banks, for which R(z) is FIR and paraunitary Thus, the general case is the same thing as K whenever E(z) is. R(z)E(z)= cz− IN . except for some channel swapping and an extra sample of delay in some channels.

47 48 Polyphase View of the STFT Looking again at the polyphase representation of the

N-channel filter bank with hop size R, E(z)= WN∗ , As a familiar special case, set R(z)= WN , R dividing N, we have

E(z)= WN∗ where WN∗ is the DFT matrix: x(n) R R ↓ ↑ 1 z 1 z− j2πkn/N − WN∗ [kn]= e− . R R 1 ↓ ↑ The inverse of this polyphaseh matrix isi then simply the z− z 1 inverse DFT matrix: DFT IDFT − 1 R(z)= WN N 1 1 z− z− We see that the STFT can be seen as the simple special R R xˆ(n) case of a perfect reconstruction filter bank for which the ↓ ↑ polyphase matrix is constant. It is also unitary when Thus, E(z)= WN∗ /√N and R(z)= WN /√N. The channel analysis and synthesis filters are, respectively, The polyphase representation is an overlap-add representation k Hk(z) = H0(zWN ) Our analysis showed that the STFT using a • k rectangular window is a perfect reconstruction filter Fk(z) = F0(zW − ) N bank for all integer hop sizes in the set ∆ j2π/N where WN = e− , as usual, and R N,N/2,N/3, . . . , N/N . N 1 ∈ { } − n The same type of analysis can be applied to the F (z)= H (z)= z− [1, 1,..., 1] 0 0 ↔ • STFT using the other windows we’ve studied, n=0 X corresponding to the rectangular window. including Portnoff windows.

49 50

Example: Polyphase Analysis of the STFT with Example: Polyphase Analysis of the Weighted 50% Overlap, Zero-Padding, and a Overlap Add Case: 50% Overlap, Zero-Padding, Non-Rectangular Window and a Non-Rectangular Window

The figure below illustrates how a window and a hop size We may convert the previous example to a weighted other than N can be introduced: overlap-add (WOLA) filter bank by replacing each w(m) by w(m) and introducing these gains also between the w(0) IDFT and upsamplers: x(n) M M p 2 2 1 ↓ ↑ z 1 z− − w(0) w(0) M M w(1) x(n) 2 2 −1 ↓ p ↑ p −1 M M z z 2 2 1 ↓ ↑ w(1) w(1) z− M M 1 2 2 z− −1 ↓ p ↑ p DFT IDFT z N N z−1 w(M 1) DFT IDFT − w(M 1) N N w(M 1) 0 − − 0 0 1 p p 1 z −1 z− − z−1 z 0 0 0 M M M M xˆ(n) 2 2 xˆ(n) ↓ 2 ↑ 2 ↓ ↑

The constant-overlap-add of the window w(n) is • implemented in the synthesis delay chain (which is technically the transpose of a tapped delay line). The downsampling factor and window must be • selected together to give constant overlap-add, independent of the choice of polyphase matrices E(z) and R(z) (shown here as the DFT and IDFT).

51 52 Paraunitary Filter Banks Lossless Filters

Paraunitary filter banks form an interesting subset of To motivate the idea of paraunitary filters, let’s first perfect reconstruction (PR) filter banks: review some properties of lossless filters, progressing from the simplest cases up to paraunitary filter banks: We saw above that we get a PR filter bank whenever A linear, time-invariant filter H(z) is said to be • the analysis polyphase matrix E(z) times the • lossless (or allpass) if it preserves signal energy. That synthesis polyphase matrix R(z) is the identity is, if the input signal is x(n), and the output signal is matrix, i.e., when y(n)=(h x)(n), then we have ∆ ∗ P(z) = E(z)R(z)= I ∞ ∞ y(n) 2 = x(n) 2 | | | | In particular, if R(z) is the paraconjugate of E(z), n= n= • we say the filter bank is paraunitary. X−∞ X−∞ In terms of the L2 signal norm 2, this can be expressed more succinctly as k · k So what is paraconjugation? y 2 = x 2 The short answer is that it is the generalization of the k k2 k k2 complex conjugate transpose operation from the unit Notice that only stable filters can be lossless since, circle to the entire z plane. A paraunitary filter bank is • otherwise, y = . We further assume all filters therefore a generalization of an orthogonal filter bank. are causal fork simplicity.k ∞ Recall that an orthogonal filter bank is one in which It is straightforward to show that losslessness implies E(ejω) is an orthogonal (or unitary) matrix, to within a • jω H(ejω) =1, ω. constant scale factor, and R(e ) is its transpose (or ∀ Hermitian transpose). That is, the frequency response must have magnitude 1 everywhere on the unit circle in the z plane.

53 54

Another way to express this is to write A causal, stable, filter H(z) is allpass if and only if • jω H(ejω)H(e )=1, ω, H˜ (z)H(z)=1 ∀ ˜ and this form generalizes to H(z)H(z) over the Note that this is equivalent to the previous result on entire the z plane. the unit circle since The paraconjugate of a transfer function may be H˜ (ejω)H(ejω)= H(1/ejω)H(ejω)= H(ejω)H(ejω) • defined as the analytic continuation of the complex conjugate from the unit circle to the whole z plane: To generalize lossless filters to the multi-input, • ∆ 1 multi-output (MIMO) case, we must generalize H˜ (z) = H(z− ) conjugation to MIMO transfer function matrices: where H(z) denotes complex conjugation of the – A p q transfer function matrix H(z) is lossless if coefficients only of H(z) and not the powers of z. × 1 it is stable and its frequency-response matrix For example, if H(z)=1+ jz− , then jω 1 H(e ) is unitary. That is, H(z)=1 jz− . We can write, for example, − jω jω ∆ H (e )H(e )= I H(z) = H (z) ∗ q for all ω, where Iq denotes the q q identity in which the conjugation of z serves to cancel the jω × outer conjugation. matrix, and H∗(e ) denotes the Hermitian transpose (complex-conjugate transpose) of We refrain from conjugating z in the definition of the H(ejω): paraconjugate becase z is not analytic in the jω ∆ T jω complex-variables sense. Instead, we invert z, which H∗(e ) = H (e ) jω jω is analytic, and which reduces to complex conjugation – Note that H∗(e )H(e ) is a q q matrix on the unit circle. product of a q p times a p q×matrix. If q>p, × × The paraconjugate may be used to characterize then the rank must be deficient. Therefore, we allpass filters as follows: must have p q. (There must be at least as ≥ 55 56 many outputs as there are inputs, but it’s ok to Every finite-order, single-input, single-output (SISO), have extra outputs.) • lossless IIR filter (recursive allpass filter) can be – A lossless p q transfer function matrix H(z) is written as × paraunitary, i.e., N ˜ jφ K z− A(z) H(z)= e z− H˜ (z)H(z)= Iq A(z) Thus, every paraunitary matrix transfer function is where K 0, ≥ 1 2 N unitary on the unit circle for all ω. Away from the A(z)=1+ a1z− + a2z− + + aN z− , and ∆ 1 ··· unit circle, paraunitary H(z) is the unique analytic A˜(z) = A(z− ). The polynomial A˜(z) can be continuation of unitary H(ejω). obtained by reversing the order of the coefficients in A(z), conjugating them, and multiplying by zN . (The Lossless Filter Examples N factor z− above serves to restore negative powers of z and hence causality.) Such filters are generally The simplest lossless filter is a unit-modulus gain called allpass filters. • H(z)= ejφ The normalized DFT matrix is an N N order zero • × where φ can be any phase value. In the real case φ paraunitary transformation. This is because the normalized DFT matrix, can only be 0 or π, hence H(z)= 1. nk ± W =[WN ]/√N, n,k =0,...,N 1, where A lossless FIR filter can only consist of a single ∆ − W = e j2π/N , is a unitary matrix: • nonzero tap: N − jφ K H(z)= e z− W∗ W = IN for some fixed integer K, where φ is again some √N √N constant phase, constrained to be 0 or π in the real-filter case. We consider only causal filters here, so K 0. ≥ 57 58

Properties of Paraunitary Systems More generally, we allow paraunitary filter banks to scale and/or delay the input signal: K Paraunitary systems are essentially multi-input, H˜ (z)H(z)= cKz− multi-output (MIMO) allpass filters. Let H(z) denote the p q matrix transfer function of a paraunitary system. where K is some nonnegative integer and cK =0. × 6 Some of its properties include We can note the following properties of paraunitary filter banks: In the square case (p = q), the matrix determinant, • det[H(z)], is an allpass filter. The synthesis filter bank is simply the paraconjugate • Therefore, if a square H(z) contains FIR elements, its of the analysis filter bank: • K determinant is a simple delay: det[H(z)] = z− for F(z)= H˜ (z) some integer K. That is, since the paraconjugate is the inverse of a Properties of Paraunitary Filter Banks paraunitary filter matrix, it is exactly what we need for perfect reconstruction. An N-channel analysis filter bank can be viewed as an The channel filters H (z) are power complementary: • k N 1 MIMO filter jω 2 jω 2 jω 2 × H1(e ) + H2(e ) + + HN (e ) =1 H1(z) ··· This follows immediately from looking at the H2(z) H(z)=   paraunitary property on the unit circle. .   When H(z) is FIR, the corresponding synthesis filter  HN (z)    • ˜   matrix H(z) is also FIR. A paraunitary filter bank must therefore obey When H(z) is FIR, each synthesis filter, • H˜ (z)H(z)=1 Fk(z)= H˜ k(z), k =1,...,N, is simply the Flip of

59 60 its corresponding analysis filter Hk(z)= Hk(z): determinant. For example, 1+ z3 z2 fk(n)= hk(L n) U(z)= − z 1 where L is the filter length. (When the filter   coefficients are complex, Flip includes a complex is unimodular. See Vaidyanathan for further details conjugation as well.) (p. 663). This follows from the fact that paraconjugating an Paraunitary Examples FIR filter amounts to simply flipping (and conjugating) its coefficients. Consider the Haar filter bank discussed previously, for Note that only trivial FIR filters can be paraunitary in which 1 the single-input, single-output (SISO) case. In the 1 1+ z− H(z)= 1 MIMO case, on the other hand, paraunitary systems √2 1 z−  −  can be composed of FIR filters of any order. The paraconjugate of H(z) is FIR analysis and synthesis filters in paraunitary filter 1 • H˜ (z)= 1+ z 1 z banks have the same amplitude response. √2 − This follows from the fact that Flip(h) H, i.e., so that   ↔ 1 flipping an FIR filter impulse response h(n) ˜ 1+ z− H(z)H(z)= 1+ z 1 z 1 =1 conjugates the frequency response, which does not − 1 z− jω  −  affect its amplitude response H(e ) .   | | Thus, the Haar filter bank is paraunitary. This is true for The polyphase matrix E(z) for any FIR paraunitary any power-complementary filter bank, since when H˜ (z) is • perfect reconstruction filter bank can be written as N 1, power-complementary and paraunitary are the the product of a paraunitary and a unimodular matrix. same× property. A unimodular polynomial matrix U(z) is any square For more about paraunitary filter banks, see Chapter 6 of • polynomial matrix having a constant nonzero Vaidyanathan.

61 62

Modulated STFT Filter Bank For H0(z) to be a good anti-aliasing lowpass filter, its • length must exceed the number of bins in the DTFT. Recall the complex Portnoff analysis bank, where (Otherwise, the best we have is the rectangular • H (z), k =1 ...N are Nth-band bandpass filters window, which gives only -13 dB stopband rejection.) k This means we must use a Portnoff window of some related to lowpass prototype H0(z) by modulation ∆ 2π length larger than the DFT length. k j N (e.g., Hk(z)= H0(zWN ), WN = e− ): Let the window length be L > N. In Lecture 7, it is • mentioned that we can still use a length N FFT Xn(ω0) Alias X(z) X0(z) provided h0 is replaced by N h0. I.e., it is X(z) H (z) X0(z) 0n H0(z) 0n 0 WN WN Xn(ω1) time-aliased down to length N. H0(z) X1(z) H1(z) X1(z) −1n 1n WN WN Xn(ω2) With polyphase analysis we obtain this result, along with H0(z) X2(z) H2(z) X2(z) −2n 2n WN WN an efficient FFT implementation: STFT Polyphase Analysis of Portnoff STFT Convolution gives • ∞ km Recall that the analysis filter in an STFT can be X (ω )= x[m]h [n m]W − . n k 0 − N • m= expressed as a frequency-shift of a prototype lowpass X−∞ filter: This is the sliding-window STFT implementation, k where h [n m] is the sliding window “centered” at Hk(z)= H0(zWN ), k =1,...,N 1 0 − − time n, and Xn(ωk) is the kth DTFT bin at time n. In principle, the lowpass filter’s impulse-response can After remodulating the DTFT channel outputs and be any length L. • summing, we obtain perfect reconstruction of X(n) Denote the N-channel polyphase components of provided H (z) is Nyquist(N). • H (z) by E (z), l =0, 1,...,N 1. 0 0 l − 63 64 Then by the polyphase decomposition, • X0(z) N 1 X(z) E0(z^3) F 3 X(z) 3 E0(z) F − l N z^-1 X1(z) z^-1 H0(z) = z− El(z ) E1(z^3) F 3 3 E1(z) F l=0 X z^-1 X2(z) z^-1 N 1 E2(z^3) T 3 3 E2(z) T − k l k N Hk(z) = (zWN )− El((zWN ) ) Xl=0 We see that the polyphase filters compute the N 1 • − appropriate time-aliases of the flipped window l N kl 1 = z− El(z )WN− H0(z− ). l=0 X The window is hopped by N samples, but recall that Consequently, • • it is operating on input date time aliased by the factor N 1 L/N, where L is the filter length. Thus, the hop size − l N kl Hk(z)X(z) = z− El(z )X(z)WN− is only a fraction of the anti-aliasing filter l=0 impulse-response length. X N 0 H0(z) E0(z )z− X(z) kl ... = WN− ... Question: What familiar case do we get when      N (N 1)  HN 1(z) EN 1(z )z− − X(z) Ek(z)=1 for all k? − −       If H (z) is a good Nth-band lowpass, the subband • 0 signals xk[n] are bandlimited to a region of width 2π/N. As a result, there is negligible aliasing when we downsample each of the subbands by N. Commuting the downsamplers to get an efficient • implementation is straightforward:

65 66

Critically Sampled STFT Filter Bank Therefore, (11) can be interpreted as computing a length-N FFT applied to the input block T [u0(m) uN 1(m)] . We will now analyze the filter bank interpretation of the ··· − STFT with hop size set to R = M = N. Now, we’ll form a simpler representation of the sequence u (m). First, define the polyphase decomposition of h(n) The kth subband signal in the DFT filter bank can be r and x(m): written as:

∞ pi(m)= h(mN + i), i =0, 1, ..., N 1 X (m)= h(mN n)x(n)W kn − k − N n= If the filter h(n) has length M = LN, then each of its −∞ X polyphase components pr(m) has length L. Now define j2π/N with W e− . The signal X (m) is regarded as a N k the sequence xr(m) complex sequence≡ formed from the kth DFT bin over x (m) x(mN r), r =0, 1, ..., N 1 time m (in frames). r ≡ − − Making the change of variable n = lN r, the above Finally, (12) can be expressed as: equation becomes: − ∞ N 1 ur(m)= pr(m l)xr(m) (13) − − kn l= Xk(m)= ur(m)WN (11) X−∞ r=0 X Thus, every input bin to the DFT is actually a with convolution between xr(m) and pi(m), the ith polyphase filter of the lowpass prototype filter, h(n). ∞ u (m) h(mN lN + r)x(lN r) (12) r ≡ − − l= X−∞

67 68 Complexity is now the cost of one full-rate Relationships bewteen OLA,LOT, & ELT • convolution with h(n) and the FFT cost. If M N (L 1), we can keep aliasing error • within≫ tolerable≫ limits. Let N be the length of the DFT (or the number of • frequency bins). N is no longer constrained to be the If M = 10N (L = 10), it is possible to keep the same as R. • reconstruction error below 0.1%. R is the downsampling factor on each branch Similar development for the synthesis DFT bank • • L is the length of each polyphase filter, p (m). • r Notice that when the polyphase filters are scalars The length of the lowpass prototype filter • (1-tap FIR filters) of unit gain, then this is simply a • h(n)= M = LN DFT block transform, with a rectangular window. Case 1 : critically sampled, no overlap The frequency response is a sinc, with poor frequency • characteristics. R = N, the filter bank is critically sampled • L =1,M = N, the polyphase filters are simply By extending the length of the polyphase filters, • scalar multiplies • L> 1, then the freqeuency characteristics of the Equivalent to a block transform window can become much better. • Perfect reconstruction only if The window length is no longer restricted to be of the • h(n)=1, n =0, ...N 1 • same length as the transform. − Case 2 : oversampled OLA of 50% overlap • N R = 2 , the filter bank output is oversampled by 2 • L =1,M = N, the polyphase filters are simply • scalar multiplies

69 70

PR requires that the prototype lowpass filter Pseudo-QMF Cosine Modulation Filter • (window) has constant overlap add in time: Bank ∞ h(n iR)=1 − Now, we want an N-channel transform with real i= • X−∞ filters and real outputs. Case 3 : critically sampled OLA of 50% overlap • Only eliminate aliasing between adjacent bands. – R = N, the filter bank is critically sampled • Bandpass filters used in practice attenuate noise about – L =2, the polyphase filters are two-tap FIR filters 96dB, so neglected bands are not much of a concern. – M =2N, the lowpass protoype filter h(n) is twice First, design a lowpass prototype window, h(n), with as long as the transform length, N • length M = LN, L, M, N Z. – A transform slightly different than the DFT matrix ∈ is needed for perfect reconstruction The lowpass design is constrained to give aliasing • cancellation in neighboring subbands: – The same form as the Princen-Bradley filter bank, jω 2 j(π/N) ω 2 where h(n)= h(M 1 n) and H(e ) + H(e − ) = 2, 0 < ω < π/2N | | | | | | h2(n + M )+ h2(n)=2− − H(ejω) 2 = 0, w > π/N 2 | | | | – Considered a Lapped Transform The filter bank analysis filters h (n) are cosine • k Case 4 : critically sampled OLA of 8:1 overlap modulations of h(n): • 1 M 1 π – Similar to Case 3, but L =8 and thus M =8N. h (n)= h(n)cos k + n − + φ , k 2 − 2 N k – Transform matrix is close to that of Case 3.     k =0,...,N 1, where the phases are restricted – The same form as the MPEG layer I,II filter bank − – Considered an Extended Lapped Transform according to π How do we generate the new transform matrices in φk+1 φk = (2r + 1) • − 2 Cases 3 & 4? again for aliasing cancellation.

71 72 Since it is an orthogonal filter bank, the synthesis Perfect Reconstruction Cosine • filters are simply the time reverse of the analysis Modulated Filter Banks filters: fk(n)= hk(M 1 n) By changing the phases φ , the pseudo-QMF filter − − • k This filter bank is used in MPEG audio, layers I,II. bank can yield perfect reconstruction: • 1 π – N=32 bands φ = k + (L + 1) k 2 2 – M=512 taps (L=8)   where L is the length of the polyphase filter (M = LN). If M =2N, then this is the oddly-stacked • Princen-Bradley filter bank, and the analysis filters are related by cosine modulations of the lowpass prototype: N +1 1 π f (n)= h(n)cos n + k + , k =0: N 1 k 2 2 N −     However, the length of the filters M can be any even • multiple of N: M = LN, (L/2) ∈Z L is called the overlapping factor • These filter banks are also referred to as Extended • Lapped Transforms, when K 2. ≥

73 74

MPEG Layer III Filter Bank Geometric Signal Theory

We will now approach filter bank derivation from a Original MPEG subbands were about 687 Hz wide. • “Hilbert space” point of view. This is the most natural Finer frequency resolution was needed. setting for the study of wavelet filter banks. • A Princen-Bradley filter bank with 12 to 36 sub-bands • Signals can be expanded as a linear combination of is appended after each subband of the 32-channel • PQMF cosine modulated analysis filter bank. orthonormal basis signals ϕk: ∞ The number of sub-bands and window shape are x(n) = ϕ ,x ϕ (n) • h k i k signal dependent: k= X−∞ n ( , ), x(n),ϕ (n) – Transients use 12 subbands, which gives better ∈ −∞ ∞ k ∈C time resolution and poorer frequency resolution. where the coefficient of projection of x onto ϕk is – Steady-state tones use 36 subbands, which gives given by better frequency resolution and poorer time ∆ ∞ resolution. ϕ ,x = ϕ∗(n)x(n) (Inner Product) h k i k n= – The encoder generates a function called the X−∞ perceptual entropy (PE), which tells the coder 1, k = l ϕ ,ϕ = δ(k l)= (Orthonormal) when to switch resolutions. h k li − 0, k = l  6 Signal expansion can be viewed as sum of geometric • projections of x onto ϕ { k}

75 76 x N 1 ∆ 1 − jω n = ϕ ,x = x(n)e− k ⇒ h k i √ N n=0 X ∆ √ ∆ √ ϕ = DFTN,k(x)/ N = X(ωk)/ N < ϕ>ϕ k x , k k N 1 − x(n) = ϕ ,x ϕ (n) Example Basis Signals h k i k k=0 XN 1 1 − = X(k)ejωkn “Natural” Basis: N • Xk=0 ∆ ∆ 1 ϕk = [..., 0, 1 , 0,...], i.e., = DFTN,n− (X) kth ϕ (n) =∆ δ(n k) k − |{z} = ϕ ,x = x(k) ⇒ h k i ∞ x(n) = x(k)ϕk(n) k= X−∞ ∞ = x(k)δ(n k) =(∆ x δ)(n) − ∗ k= X−∞ Normalized DFT Basis for CN : • ∆ ∆ ϕ (n) = ejωkn/√N, ω =2πk/N, n,k [0,N 1] k k ∈ −

77 78

Normalized Fourier Transform Basis: Normalized STFT Basis: • • jω n jω n ϕ (t) =∆ ejωt/√2π, ω,t ( , ) ∆ w(n mR)e k w(n mR)e k ω ϕmk(n) = − = − , ∈ −∞ ∞ Shift jωk( ) 2 mR(w)e · n w (n) 1 ∞ jωt = ϕω,x = x(t)e− dt ωk =2πk/N, k [0,N 1], n ( p, P), w(n) ⇒ h i √2π ∈ − ∈ −∞ ∞ ∈R Z−∞ ∆ ∆ – Overcomplete in general = FTω(x)/√2π = X(ω)/√2π – Orthonormal when ∞ R = M = N x(t) = ϕω,x ϕω(t)dω h i (Hop size = Window length = DFT length) Z−∞ 1 ∞ = X(ω)ejωtdω w = Rectangular Window wR 2π Shift ZeroPad DFT = ϕmk = mN [ (ϕk )], ∆ Z−∞ ⇒ ∞ 1 jωkn √ = FTt− (X) i.e., ϕmk(n)= e / N for mN n (m + 1)N 1, and 0 otherwise ≤ ≤ − In this case, Normalized DTFT Basis: • 1 ∞ ∆ jωn jωkn ϕ (n) = e /√2π, ω ( π,π], n ( , ) ϕmk,x = x(n)wR(n mN)e− ω h i √ − ∈ − ∈ −∞ ∞ N n= (Find inner product ϕ ,x and reconstruction of X−∞ ω =∆ STFT (x)/√N =∆ X (ω )/√N x(n) in terms of ϕh as ani exercise.) N,m,k m k { ω}

79 80 N 1 ∞ − – f(t) typically bandpass x(n) = ϕ ,x ϕ (n) h mk i mk – Qualitative example: m= k=0 −∞ X X N 1 ∞ 1 − jω n = w (n mN) X (ω )e k ( ) R − N m k tf m= k=0 −∞ (Mother X X Wavelet) t ∞ Shift ZeroPad 1 = mN,n DFTN− (Xm) ∞ m= 1 t X−∞     f ( ) ∆ 1 2 2 t = STFT−N,n(X) 1 t f ( ) t Continuous Wavelet Transform Basis: 2 4 • ∆ 1 τ t ϕsτ (t) = f ∗ − , s s | |   τ,s,t (p , ) ∈ −∞ ∞

∆ 1 ∞ t τ X(s,τ) = x(t)f − dt s s | | Z−∞   – s = scale parameterp – 1/ s maintains energy invariance | | – X(ps,τ)= wavelet coefficients – f(t)= mother wavelet

81 82

Wavelets Discrete Wavelet Transform

Admissibility condition for mother wavelet ψ(t): Transform (filterbank form): • 2 ∞ Ψ(ω) Cψ = | | dω < k/2 ∞ k ω ∞ X(k,n) = s− x(t)h nT a− t dt Z−∞ | | − Z−∞ Given sufficient decay, this reduces to Ψ(0) = 0 ∞  • = x(t)h nakT t dt (mother wavelet must be zero-mean) − Z−∞ Morlet Wavelet:  • n,k integers ∆ 1 jω t t2/2 ψ(t) = e− 0 e− √2π

←→ (ω ω )2/2 Inverse: Ψ(ω) = e− − 0 x(t)= X(k,n) ϕkn(t) n Xk X basis – Gaussian-windowed complex sinusoid | {z } – Scaled so that ψ =1 k k – Center frequency ω0 typically chosen so that second peak is half of first = ω0 = π 2/ln2 5.336 ⇒ 7 ≈ – Ψ(0) 7 10− 0 (close enough) ≈ × p ≈ Wavelet-based analysis called a scalogram • (analogous to STFT “spectrogram”)

83 84 Wavelet filter banks always Constant-Q: Discrete Wavelet (Dyadic) Filterbank • – Q =∆ center-frequency / bandwidth – Center-frequency =∆ geometric mean of bandlimits ω1 and ω2:

∆ ( ) ωc = √ω1ω2 h 0 2 0 nx 2 f 0 ∆ k k ( ) ωc(k) = ω1(k)ω2(k)= a ω1(0)a ω2(0) = h 1 4 1 nx 4 f 1 akω (0) ( nx ) ... Ã ( nx ) c p p ...... ∆ k ωc(k) a ωc(0) N ( ) N Q(k) = = = Q(0) h N − 1 2 x N − 1 n 2 f N − 1 ω2(k) ω1(k) akω (0) akω (0) − 2 − 1 { { { { Octave Down-samplers Up-samplers Octave In dyadic filter bank, Q = √2: Analysis Synthesis • Filter Bank Filter Bank center-frequency ∆ = ω0(ω0 + bandwidth)= √2ω0 √2ω0 √ p= Q = 2ω ω = 2 ⇒ 0− 0

85 86

STFT Tiling (Dyadic) Discrete Wavelet Generalized STFT Transform Tiling t t ∞ x (n)=(x h )(nR )= x(m) h (nR m) k ∗ k k k k − m= X−∞ analysis filter N 1 4 T 4 T − ∞ | {z } x(n) = (x f )(n)= x (m) f (n mR ) 3 T 3 T k ∗ k k k − k 2 T 2 T k k=0 m= −∞ synthesis filter T T X X X 0 ω ω ω 0 ω ω ω ω 1 2 0 0 0 0 ω Channel Signals | {z } 8 4 2 { ( ) h 0 R 0 0 nx R 0 f 0

( ) h 1 R 1 1 nx R 1 f 1 ( nx ) ...... Ã ( nx ) ( ) h N − 1 R N − 1 x N − 1 n R N − 1 f N − 1 { { { { Analysis Down-samplers Up-samplers Synthesis Filter Bank Filter Bank

Analysis filter h typically complex bandpass • k R downsamples filter output: • k Critical sampling: Rk = π/Width(Hk) Impulse response of synth. filter f = kth basis signal • k If f are orthonormal, f (n)= h∗( n) • { k} k k − More generally, h ,f form a biorthogonal basis. • { k k} 87 88 Biorthogonal Signal Expansions Discrete Wavelet Filterbank

A set of signals h ,f N is said to be a biorthogonal 1 t k k k=1 hk(t) = h , a> 1 { } √ k ak basis set if any signal x can be represented as a   H (ω) = √akH(akω) N ←→ k x = α x,h f k h ki k k=1 X Wavelet channel-filter Hk(ω) is a scaling of where αk is some normalizing scalar dependent only on • channel-filter H0(ω) (scaling in time domain also) hk and/or fk. Thus, in a biorthogonal system, we project In STFT, channel filter H (ω) is a shift of onto the signals hk and resynthesize in terms of fk. • k channel-filter H0(ω) (modulation in time domain) As k increases, h lengthens, H narrows • k k Dyadic filter bank (a =2): • 2  2 ... 1 H 2 H 1 H 0 ω ω ω 0 0 ω ω 0 ... 0 2 0 4 2

H (ω)= top-octave bandpass (BP) filter • 0 H (w)= √2H (2ω)= BP for next octave down • 1 0 H2(w)=2H0(4ω)= octave bandpass below that, • etc.

89 90

Review of STFT Filterbanks STFT, rectangular window, 50% overlap

Perfect reconstruction Let’s take a look at some of the STFT processors we’ve • seen before, now viewed as a polyphase filter bank. Oversampled by 2 (aliasing cancellation) • Since they are all based on the FFT, they are all efficient, Poor channel isolation (13dB) but most are oversampled as “filter banks” go. Some • Not robust to filter-bank modifications, but better is usually preferred outside of a compression • context. STFT, triangular window, 50% overlap The STFT also computes a uniform filter bank, but it can be used as the basis for a variety of non-uniform filter Perfect reconstruction • banks giving frequency resolution closer to that of Oversampled by 2 hearing. • Better channel isolation (26dB) • STFT, rectangular window, no overlap Moderately robust to filter-bank modifications • Perfect reconstruction • STFT, Hamming window, 75% overlap Critically sampled (aliasing cancellation) • Perfect reconstruction Poor channel isolation (13dB) • • Oversampled by 4 Not robust to filter-bank modifications • • Aliasing from sidelobes only • Good channel isolation (42dB) • Moderately robust to filter-bank modifications •

91 92 STFT, Kaiser window, Beta=10, 90 percent References overlap Approximate perfect reconstruction (sidelobes Books • controlled by β) P.P.Vaidyanthan, Multirate Systems and Filter Oversampled by 10 • • Banks, Prentice Hall, 1993. (Comprehensive Excellent channel isolation (80 dB) reference on perfect reconstruction filter banks. Little • Very robust to filter-bank modifications or no treatment of the oversampled case.) • Aliasing from sidelobes only M. Vetterli and J. Kovaˇcevi´c, Wavelets and Subband • • Coding, Prentice Hall, 1995. (Good depth on both Sliding FFT, any window, maximum overlap, perfect reconstruction filter banks and wavelets, and zero-padded by 5 their connections.) Perfect reconstruction H.S.Malvar, Signal Processing with Lapped • • (always true when hop-size = 1) Transforms, Artech House, Boston, 1992. Oversampled by 5M: M. Bosi and R.E. Goldberg, Introduction to Digital • • – M = window length [time-domain oversampling Audio Coding and Standards, Kluwer Academic factor] Publishers, Boston, 2003. – 5 = zero-padding factor [frequency-domain oversampling factor] Excellent channel isolation (set by window sidelobes) • Extremely robust to filter-bank modifications • No aliasing to cancel • 93 94

Papers

J.P. Princen, A.W.Johnson, and A.B. Bradley, • Subband/Transform Coding Using Filter Bank Desings Based on Time Domain Aliasing Cancellation, ICASSP 1987, pp 2161-2164. K. Brandenburg and G. Stoll, ISO-MPEG-I Audio: • A Generic Standard for Coding of High-Quality , Journal of the AES, Vol 42, no 10, October 1994.

95