Digital Speech Processing— Lecture 10 Short-Time Fourier Analysis
Short-Time Fourier Analysis Methods - Filter Bank Design

Review of STFT

1. Definition:

$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x[m]\, w[\hat n - m]\, e^{-j\hat\omega m}$$

   – function of $\hat n$ (looks like a time sequence)
   – function of $\hat\omega$ (looks like a transform)
   – $X_{\hat n}(e^{j\hat\omega})$ is defined for $\hat n = 1, 2, 3, \ldots$; $0 \le \hat\omega \le \pi$

2. Interpretations of $X_{\hat n}(e^{j\hat\omega})$

   1. $n = \hat n$ fixed, $\omega = \hat\omega$ variable: $X_{\hat n}(e^{j\hat\omega}) = \mathrm{DTFT}\big[x[m]\, w[\hat n - m]\big]$ ⇒ DFT view ⇒ OLA implementation
   2. $n = \hat n$ variable, $\hat\omega$ fixed: $X_n(e^{j\hat\omega}) = \big(x[n]\, e^{-j\hat\omega n}\big) * w[n]$ ⇒ linear filtering view ⇒ filter bank implementation ⇒ FBS implementation

3. Sampling rates in time and frequency ⇒ recover $x[n]$ exactly from $X_{\hat n}(e^{j\hat\omega})$

   1. time: $W(e^{j\omega})$ has a bandwidth of $B$ Hertz ⇒ sampling rate of $2B$ samples/sec. Hamming window: $B = 2F_S/L$ (Hz) ⇒ sample at $4F_S/L$ (Hz), or every $L/4$ samples
   2. frequency: $w[n]$ is time limited to $L$ samples ⇒ need at least $L$ frequency samples to avoid time aliasing

• with the OLA method, can recover $x[n]$ exactly using lower sampling rates in either time or frequency, e.g., can sample every $L$ samples (and divide by the window), or can use fewer than $L$ frequency samples (filter bank channels); but these methods are highly subject to aliasing errors with any modifications to the STFT
• can use windows (LPF) that are longer than $L$ samples and still recover $x[n]$ with $N < L$ frequency channels; e.g., the ideal LPF is infinite in time duration, but has zeros spaced $N$ samples apart, where $F_S/N$ is the BW of the ideal LPF

[Figure: filter bank in which $X(e^{j\omega})$ drives $N$ parallel filters $H_0(e^{j\omega}), H_1(e^{j\omega}), \ldots, H_{N-1}(e^{j\omega})$ whose outputs are summed to give $Y(e^{j\omega})$]

$$h_k[n] = w[n]\, e^{j\omega_k n}$$

$$\tilde H(e^{j\omega}) = \sum_{k=0}^{N-1} H_k(e^{j\omega}) = 1$$

$$\tilde h[n] = \sum_{k=0}^{N-1} h_k[n] = w[n]\, p[n] = \delta[n]$$

$$p[n] = N \sum_{r=-\infty}^{\infty} \delta[n - rN]$$

$$\tilde h[n] = N \sum_{r=-\infty}^{\infty} w[rN]\, \delta[n - rN] = N\big(\cdots + w[0]\,\delta[n] + w[N]\,\delta[n - N] + \cdots\big)$$
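The FBS relation above (the channel impulse responses $h_k[n] = w[n]e^{j\omega_k n}$ summing to the window sampled every $N$ samples) can be checked numerically. The following is a minimal sketch, not part of the lecture: it assumes uniformly spaced channels $\omega_k = 2\pi k/N$, a causal window starting at $n = 0$, and a Hamming window chosen only for illustration; the function name `fbs_composite_response` is hypothetical.

```python
import numpy as np

def fbs_composite_response(w, N):
    """Sum over k of h_k[n] = w[n] * exp(j*2*pi*k*n/N) (assumed channel spacing)."""
    n = np.arange(len(w))                    # time index of the causal window
    k = np.arange(N).reshape(-1, 1)          # channel index as a column vector
    h = w * np.exp(2j * np.pi * k * n / N)   # shape (N, L): one row per channel
    return h.sum(axis=0)                     # composite impulse response

# Case L <= N: only the n = 0 sample of the window survives.
w_short = np.hamming(32)
h_short = fbs_composite_response(w_short, N=32)

# Case L > N: samples at n = 0, N, 2N, ... survive, so the composite
# response is no longer a single unit sample for this window.
w_long = np.hamming(64)
h_long = fbs_composite_response(w_long, N=32)
nonzero_long = np.flatnonzero(np.abs(h_long) > 1e-9)

print(np.round(h_short[:4].real, 6))
print(nonzero_long)
```

In the second case the surviving sample at $n = N$ is what a longer window must zero out (as the ideal lowpass filter does) for the composite response to remain a scaled unit sample.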
⇒ need to design digital filters that match the criteria for exact reconstruction of $x[n]$ and which still work with modifications to the STFT

Tree-Decimated Filter Banks

• can sample the STFT in time and frequency using a lowpass filter (window) which is moved in jumps of $R < L$ samples if $L \le N$, where:
   – $L$ is the window length,
   – $R$ is the window shift, and
   – $N$ is the number of frequency channels
• for a given channel at $\omega = \omega_k$, the sampling rate of the STFT need only be twice the bandwidth of the window Fourier transform
   – down-sample STFT estimates by a factor of $R$ at the transmitter
   – up-sample back to the original sampling rate at the receiver
   – final output formed by convolution of the up-sampled STFT with an appropriate lowpass filter, $f[n]$

Filter Bank Channels

[Figure: Fully decimated and interpolated filter bank channels; (a) analysis with bandpass filter, down-shifting frequency and down-sampling; (b) synthesis with up-sampler followed by lowpass interpolation filter and frequency up-shift; (c) analysis with frequency down-shift followed by lowpass filter followed by down-sampling; (d) synthesis with up-sampling followed by frequency up-shift followed by bandpass filter.]

Full Implementation of Analysis-Synthesis System

[Figure: Full implementation of STFT analysis/synthesis system with channel decimation by a factor of $R$, and channel interpolation by the same factor $R$ (including a box for short-time modifications).]

Review of Down and Up-Sampling

Down-sampling by $R$ (aliasing addition):

$$x_d[n] = x[nR], \qquad X_d(e^{j\omega}) = \frac{1}{R} \sum_{r=0}^{R-1} X\big(e^{j(\omega - 2\pi r)/R}\big)$$

Up-sampling by $R$ (imaging removal):

$$x_u[n] = \begin{cases} x[n/R], & n = 0, \pm R, \pm 2R, \ldots \\ 0, & \text{otherwise} \end{cases} \qquad X_u(e^{j\omega}) = X(e^{jR\omega})$$

Filter Bank Reconstruction

• assuming no short-time modifications:

$$X_{rR}[k] = X_{rR}(e^{j\omega_k}) = \sum_{m \in W_{rR}} w(rR - m)\, x(m)\, e^{-j\omega_k m}$$

$$y(n) = \frac{1}{N} \sum_{k=0}^{N-1} P(k) \Big[ \sum_{r=-\infty}^{\infty} Y_{rR}(e^{j\omega_k})\, f(n - rR) \Big] e^{j\omega_k n}$$

$$= \frac{1}{N} \sum_{k=0}^{N-1} P(k) \Big[ \sum_{r=-\infty}^{\infty} X_{rR}[k]\, f(n - rR) \Big] e^{j\omega_k n}$$

$$= \sum_{m} x(m) \Big[ \sum_{r=-\infty}^{\infty} w(rR - m)\, f(n - rR) \Big] \frac{1}{N} \sum_{k=0}^{N-1} P(k)\, e^{j\omega_k (n - m)}$$

• recognizing that when $P(k) = 1, \ \forall k$:

$$\frac{1}{N} \sum_{k=0}^{N-1} e^{j\omega_k (n - m)} = \sum_{q=-\infty}^{\infty} \delta(n - m - qN)$$

• can show that the condition for $y(n) = x(n), \ \forall n$ is

$$\sum_{r=-\infty}^{\infty} w(rR - n + qN)\, f(n - rR) = \delta(q) \ \Rightarrow\ m = n$$

• frequency domain equations, assuming:

$$h_k[n] = w[n]\, e^{j\omega_k n}; \qquad g_k[n] = f[n]\, e^{j\omega_k n};$$

$$H_k(e^{j\omega}) = W(e^{j(\omega - \omega_k)}); \qquad G_k(e^{j\omega}) = F(e^{j(\omega - \omega_k)});$$

$$Y(e^{j\omega}) = \frac{1}{N} \sum_{k=0}^{N-1} P(k)\, G_k(e^{j\omega}) \cdot \Big[ \frac{1}{R} \sum_{l=0}^{R-1} H_k\big(e^{j(\omega - 2\pi l/R)}\big)\, X\big(e^{j(\omega - 2\pi l/R)}\big) \Big]$$

$$= X(e^{j\omega})\, \frac{1}{RN} \sum_{k=0}^{N-1} P(k)\, G_k(e^{j\omega})\, H_k(e^{j\omega}) + \sum_{l=1}^{R-1} X\big(e^{j(\omega - 2\pi l/R)}\big) \cdot \frac{1}{RN} \sum_{k=0}^{N-1} P(k)\, G_k(e^{j\omega})\, H_k\big(e^{j(\omega - 2\pi l/R)}\big)$$

$$= \tilde H(e^{j\omega}) \cdot X(e^{j\omega}) + \text{aliasing terms}$$

• conditions for perfect reconstruction ($\hat Y(e^{j\omega}) = X(e^{j\omega})$):

Flat gain:

$$\tilde H(e^{j\omega}) = \frac{1}{RN} \sum_{k=0}^{N-1} P(k)\, G_k(e^{j\omega})\, H_k(e^{j\omega}) = 1$$

and alias cancellation:

$$\frac{1}{RN} \sum_{k=0}^{N-1} P(k)\, G_k(e^{j\omega})\, H_k\big(e^{j(\omega - 2\pi l/R)}\big) = 0 \quad \text{for } l = 1, 2, \ldots, R-1$$

Decimated Analysis-Synthesis Filter Bank

• practical solution for the case $w(n) = f(n)$, $N = R = 2$ (2-band solution), where the conditions for exact reconstruction become

$$\frac{1}{4} \Big[ P(0)\, F(e^{j\omega})\, W(e^{j\omega}) + P(1)\, F(e^{j(\omega - \pi)})\, W(e^{j(\omega - \pi)}) \Big] = 1$$

and

$$\frac{1}{4} \Big[ P(0)\, F(e^{j\omega})\, W(e^{j(\omega - \pi)}) + P(1)\, F(e^{j(\omega - \pi)})\, W(e^{j(\omega - 2\pi)}) \Big] = 0$$

• aliasing cancels exactly if $P(0) = -P(1) = 1$ (with $F(e^{j\omega}) = W(e^{j\omega})$), giving

$$\frac{1}{4} \Big[ \big(W(e^{j\omega})\big)^2 - \big(W(e^{j(\omega - \pi)})\big)^2 \Big] = e^{-j\omega M}, \qquad \hat y(n) = x(n - M)$$

When aliasing is completely eliminated (by suitable choice of filters), the overall analysis-synthesis system has frequency response:

$$\tilde H(e^{j\omega}) = \frac{1}{RN} \sum_{k=0}^{N-1} P[k]\, W_e(e^{j(\omega - \omega_k)})$$

where

$$W_e(e^{j\omega}) = W(e^{j\omega})\, F(e^{j\omega})$$

$$\tilde h[n] = w_e[n]\, p[n]$$

where

$$w_e[n] = w[n] * f[n], \qquad p[n] = \frac{1}{RN} \sum_{k=0}^{N-1} P[k]\, e^{j\omega_k n}$$

• To design a decimated analysis/synthesis filter bank system, need to determine the following:
   1. the number of channels, $N$, and the decimation/interpolation ratio, $R$. (Usually determined by the desired frequency resolution.)
   2. the window, $w[n]$, and the interpolation filter impulse response, $f[n]$. (These are lowpass filters that should have good frequency-selective properties, such that the bandpass channel responses do not overlap into more than one band on either side of the channel.)
   3. the complex gain factors, $P[k]$, $k = 0, 1, \ldots, N-1$. (These constants are important for achieving flat overall response and alias cancellation.)

Maximally Decimated Filter Banks

• we show here that it is possible to obtain exact reconstruction with $R = N$ and $N < L$.
   – $R = N$ is termed maximal decimation because this is the largest decimation factor that can be used with an $N$-channel filter bank and still achieve exact reconstruction.
   – nominal bandwidth of the bandpass channel signal is increased from $\pm\pi/N$ to $\pm\pi$
   – down-sampling by $N$ accomplishes the frequency down-shifting normally accomplished by modulation

[Figure: Analysis-Synthesis Filter Bank (all modulators removed)]

Two Channel Filter Banks

For this two-band system, we have:

$$R = N = 2; \qquad \omega_k = \pi k, \quad k = 0, 1$$

Applying the frequency domain analysis we get:

$$Y(e^{j\omega}) = \frac{1}{2} \Big[ G_0(e^{j\omega})\, H_0(e^{j\omega}) + G_1(e^{j\omega})\, H_1(e^{j\omega}) \Big] X(e^{j\omega}) + \frac{1}{2} \Big[ G_0(e^{j\omega})\, H_0(e^{j(\omega - \pi)}) + G_1(e^{j\omega})\, H_1(e^{j(\omega - \pi)}) \Big] X(e^{j(\omega - \pi)})$$

not including the multiplier $1/N$ or the gain factors $P[0]$, $P[1]$.

Aliasing can be eliminated completely by choosing the filters such that:

$$G_0(e^{j\omega})\, H_0(e^{j(\omega - \pi)}) + G_1(e^{j\omega})\, H_1(e^{j(\omega - \pi)}) = 0 \quad \text{(aliasing cancellation)}$$

Perfect reconstruction requires:

$$\frac{1}{2} \Big[ G_0(e^{j\omega})\, H_0(e^{j\omega}) + G_1(e^{j\omega})\, H_1(e^{j\omega}) \Big] = 1 \quad \text{(flat gain condition)}$$

Quadrature Mirror Filter (QMF) Banks

Alias cancellation is achieved if the filters are chosen to satisfy the conditions:

$$h_1[n] = e^{j\pi n}\, h_0[n] \iff H_1(e^{j\omega}) = H_0(e^{j(\omega - \pi)})$$

$$g_0[n] = 2\, h_0[n] \iff G_0(e^{j\omega}) = 2\, H_0(e^{j\omega})$$

$$g_1[n] = -2\, h_1[n] \iff G_1(e^{j\omega}) = -2\, H_0(e^{j(\omega - \pi)})$$

Note that there is only a single lowpass filter, with impulse response $h_0[n]$, on which all other filters are based; the filter has a nominal cutoff of $\pi/2$ rad with a narrow transition to a stopband with adequate attenuation to isolate the two bands.

The filter $h_1[n] = e^{j\pi n}\, h_0[n]$ is the complementary highpass filter.

The filters $h_0[n]$ and $h_1[n]$ are called Quadrature Mirror Filters (QMF).

Substituting for the filter relationships we get:

$$2\, H_0(e^{j\omega})\, H_0(e^{j(\omega - \pi)}) - 2\, H_0(e^{j(\omega - \pi)})\, H_0(e^{j(\omega - 2\pi)}) = 0$$

i.e., perfect alias cancellation for any choice of lowpass filter, and

$$Y(e^{j\omega}) = \Big[ \big(H_0(e^{j\omega})\big)^2 - \big(H_0(e^{j(\omega - \pi)})\big)^2 \Big] X(e^{j\omega}) = \tilde H(e^{j\omega})\, X(e^{j\omega})$$

For perfect reconstruction we require that:

$$\Big[ \big(H_0(e^{j\omega})\big)^2 - \big(H_0(e^{j(\omega - \pi)})\big)^2 \Big] = e^{-j\omega M}$$

i.e., flat magnitude with a delay of $M$ samples, due to the delay of the causal filters of the filter bank.

Quadrature Mirror Filters

Assume an FIR lowpass filter of length $L$ samples such that $h_0[n] = h_0[(L-1) - n]$, $0 \le n \le L-1$; this filter has linear phase with a delay of $(L-1)/2$ samples and a frequency response of the form:

$$H_0(e^{j\omega}) = A_0(e^{j\omega})\, e^{-j\omega(L-1)/2} \qquad (L - 1 \text{ must be odd})$$

where $A_0(e^{j\omega})$ is real. Using this lowpass filter in the filter bank, the condition for perfect reconstruction becomes:

$$\tilde H(e^{j\omega}) = \Big[ \big(A_0(e^{j\omega})\big)^2 - e^{j\pi(L-1)} \big(A_0(e^{j(\omega - \pi)})\big)^2 \Big] e^{-j\omega(L-1)}$$

If $L - 1$ is odd, we get:

$$\tilde H(e^{j\omega}) = \Big[ \big(A_0(e^{j\omega})\big)^2 + \big(A_0(e^{j(\omega - \pi)})\big)^2 \Big] e^{-j\omega(L-1)}$$

Linear phase is guaranteed, but not perfect reconstruction.

For flat gain we need to satisfy the condition:

$$\big(A_0(e^{j\omega})\big)^2 + \big(A_0(e^{j(\omega - \pi)})\big)^2 = 1$$

The solution is to use high-order linear phase QMF filters. Johnston proposed an algorithm for the design of the basic lowpass filter that minimized the deviation from unity while providing a desired level of stopband attenuation.

$$h_0[n] = w[n] = g_0[n], \qquad h_1[n] = w[n]\, e^{j\pi n} = g_1[n]$$

[Figure: practical QMF pair. Lowpass filter is designed by windowing; highpass filter is the mirror image of the lowpass filter.]

• practical realization of QMF filters
• note that aliasing is cancelled, but the overall frequency response is not perfectly flat

Issues with Perfect Reconstruction

• The motivation for the subband approach is to process different speech properties independently in each band, so being able to localize each band (compact bandwidth) is important.
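The two QMF properties stated earlier (aliasing cancels for any lowpass $h_0[n]$, while the overall response $\big(H_0(e^{j\omega})\big)^2 - \big(H_0(e^{j(\omega-\pi)})\big)^2$ is generally not flat) can be checked with a short simulation. This is a minimal sketch under stated assumptions, not the lecture's design: the prototype is a Hamming-windowed ideal lowpass filter of even length $L = 32$ (so $L - 1$ is odd) with cutoff $\pi/2$, and the helper names `qmf_filters` and `two_channel_bank` are hypothetical.

```python
import numpy as np

def qmf_filters(h0):
    """Derive h1, g0, g1 from the lowpass prototype h0 using the QMF relations."""
    n = np.arange(len(h0))
    h1 = (-1.0) ** n * h0            # h1[n] = e^{j*pi*n} h0[n]
    return h1, 2.0 * h0, -2.0 * h1   # g0 = 2 h0, g1 = -2 h1

def two_channel_bank(x, h0):
    """Analysis filter, down-sample by 2, up-sample by 2, synthesis filter, sum."""
    h1, g0, g1 = qmf_filters(h0)
    y = np.zeros(len(x) + 2 * len(h0) - 2)
    for h, g in ((h0, g0), (h1, g1)):
        v = np.convolve(x, h)        # analysis filtering
        u = np.zeros_like(v)
        u[::2] = v[::2]              # keep even samples: down- then up-sampling
        y += np.convolve(u, g)       # synthesis (interpolation) filtering
    return y

# Assumed prototype: Hamming-windowed ideal lowpass, cutoff pi/2, even length.
L = 32
m = np.arange(L) - (L - 1) / 2
h0 = 0.5 * np.sinc(0.5 * m) * np.hamming(L)

# Overall impulse response predicted when aliasing cancels: h0*h0 - h1*h1.
h1, _, _ = qmf_filters(h0)
t = np.convolve(h0, h0) - np.convolve(h1, h1)

x = np.random.default_rng(0).standard_normal(200)
y = two_channel_bank(x, h0)
alias_error = np.max(np.abs(y - np.convolve(x, t)))  # residual aliasing

mag = np.abs(np.fft.rfft(t, 1024))   # overall magnitude response
ripple = mag.max() - mag.min()       # deviation from a flat gain

print(alias_error, ripple)
```

The bank output matches the input filtered by the overall response alone to within rounding error, while the magnitude of that response varies with frequency for this simple windowed prototype, which is the deviation that a design such as Johnston's is meant to minimize.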