General Discrete-Time Model of Speech Production
Digital Speech Processing— Lecture 9
Short-Time Fourier Analysis Methods- Voiced Speech: • AVP(z)G(z)V(z)R(z) Introduction Unvoiced Speech:
1 • ANN(z)V(z)R(z) 2
Short-Time Fourier Analysis Why STFT for Speech Signals
• represent signal by sum of sinusoids or • steady state sounds, like vowels, are produced complex exponentials as it leads to convenient by periodic excitation of a linear system => solutions to problems (formant estimation, pitch speech spectrum is the product of the excitation period estimation, analysis-by-synthesis spectrum and the vocal tract frequency response methods), and insight into the signal itself • speech is a time-varying signal => need more sophisticated analysis to reflect time varying • such Fourier representations provide properties – convenient means to determine response to a sum of – changes occur at syllabic rates (~10 times/sec) sinusoids for linear systems – over fixed time intervals of 10-30 msec, properties of – clear evidence of signal properties that are obscured most speech signals are relatively constant (when is in the original signal this not the case)
3 4
Frequency Domain Processing Overview of Lecture
• define time-varying Fourier transform (STFT) analysis method • define synthesis method from time-varying FT (filter-bank summation, overlap addition) • show how time-varying FT can be viewed in terms of a bank of filters model • Coding: • computation methods based on using FFT – transform, subband, homomorphic, channel vocoders • application to vocoders, spectrum displays, • Restoration/Enhancement/Modification: format estimation, pitch period estimation – noise and reverberation removal, helium restoration, time-scale modifications (speed-up and slow-down of speech) 5 6
1 Frequency and the DTFT DTFT and DFT of Speech
• sinusoids The DTFT and the DFT for the infinite duration jnωω00− jn signal could be calculated (the DTFT) and xn ( )==+ cos(ω0 n ) ( e e )/ 2 approximated (the DFT) by the following: where ω0 is the frequency (in radians) of the sinusoid ∞ • the Discrete-Time Fourier Transform (DTFT ) Xe (jjmωω )= ∑ xme ( )− ( DTFT ) ∞ m=−∞ jjnωω− Xe ( )== xne ( ) x (n) L−1 ∑ DTFT {} − jLkm(2π / ) n=−∞ Xk()= ∑ xmwme ( )( ) ,kL=−0,1,..., 1 π m=0 1 jjnωω-1 j ω xn()== Xe ( ) e dω DTFT {} Xe ( ) = Xe()jω ( DFT ) ∫ ωπ=(2kL / ) 2π −π where ω is the frequency variable of Xe (jω ) using a value of L =25000 we get the following plot
7 8
25000-Point DFT of Speech
Magnitude Short-Time Fourier Transform (STFT) Log Magnitude (dB)
9 10
Short-Time Fourier Transform Definition of STFT
∞ jjmωωˆˆˆˆ− • speech is not a stationary signal, i.e., it Xenˆ ( )=−∑ xmwnme ( ) ( ) both n and ωˆ are variables m=−∞ has properties that change with time •− wn (ˆˆ m ) is a real window which determines the portion of xn ( ) jωˆ • thus a single representation based on all that is used in the computation of Xenˆ ( ) the samples of a speech utterance, for the most part, has no meaning • instead, we define a time-dependent Fourier transform (TDFT or STFT) of speech that changes periodically as the speech properties change over time
11 12
2 Short Time Fourier Transform Short-Time Fourier Transform
• STFT is a function of two variables, the time index, nˆ , which • alternative form of STFT (based on change of variables) is is discrete, and the frequency variable, ω ˆ, which is continuous ∞ ∞ jjnmωωˆˆ−−()ˆ jjmωωˆˆ− Xenˆ ( )=− wmxnme ( ) (ˆ ) Xenˆ ()=− xmwnme ()(ˆ ) ∑ ∑ m=−∞ m=−∞ ∞ ˆˆ ˆ =−⇒ DTFT (xmwn ( ) ( m )) n fixed, ω ˆ variable =−exnmwme− jnωωˆˆ∑ ()()ˆ jm m=−∞ • if we define ∞ % jjmωωˆˆˆ Xenˆ ( )=−∑ xnmwme ( ) ( ) m=−∞ jωˆ • then Xenˆ ( ) can be expressed as (using mm′ =− ) jjnjjnωωˆˆ−−ˆˆ ωω ˆˆ ˆ Xenˆ ()== e Xe% n () eDTFT ⎣⎦⎡⎤ xnmwm ( +− )()
13 14
STFT-Different Time Origins Time Origin for STFT
• the STFT can be viewed as having two different time origins 1. time origin tied to signal xn ( ) mnx=−ˆ ⇒ [0] ∞ jjmωωˆˆˆ − Xenˆ ()=−∑ xmwnme ()( ) m=−∞ =− DTFT ⎣⎦⎡⎤xmwn( ) (ˆˆ m ) , n fixed, ωˆ variable Time origin tied to 2. time origin tied to window signal wm (− ) ∞ window wmxnm[][]−+ˆ jjnωωˆˆ−−ˆ ˆ jm ω ˆ Xenˆ ()=+− e∑ xnmwme ( )() m=−∞ = eXe− jnωωˆˆˆ % () j − jnωˆ ˆ =−+ewmxnmnDTFT ⎣⎦⎡⎤( ) (ˆˆ ) , fixed, ωˆ variable
15 16
Interpretations of STFT Fourier Transform Interpretation
jωˆ • there are 2 distinct interpretations of Xenˆ ( ) jωˆ • consider Xenˆ ( ) as the normal Fourier transform of the sequence 1. assume nXeˆ is fixed, then (jωˆ ) is simply the normal Fourier nˆ wn(ˆˆ−−∞<<∞ mxm ) ( ), m for fixed n . transform of the sequence wn (ˆ −−∞<<∞=> mxm ) ( ), m for •− the window wn (ˆ m ) slides along the sequence xm ( ) and defines a jωˆ fixed nXˆ, nˆ ( e ) has the same properties as a normal Fourier new STFT for every value of nˆ transform • what are the conditions for the existence of the STFT jωˆ ˆ 2. consider Xenˆ ( ) as a function of the time index n with ωˆ fixed. •− the sequence wn (ˆ mxm ) ( ) must be absolutely summable for all jωˆ ˆ − jnωˆ ˆ values of nˆ Then Xenˆ ( ) is in the form of a convolution of the signal xne ( ) with the window wn (ˆ ). This leads to an interpretation in the form of - since |xn (ˆ ) |≤ L (32767 for 16-bit sampling) − jnωˆ ˆ ˆ linear filtering of the frequency modulated signal xne (ˆˆ ) by wn ( ). - since |wn ( ) |≤ 1 (normalized window levels) - since window duration is usually finite ˆˆ - we will now consider each of these interpretations of the STFT in •− wn ( mxm ) ( ) is absolutely summable for all n a lot more detail 17 18
3 Frequencies for STFT Signal Recovery from STFT
jωˆ • the STFT is periodic in ω with period 2π , i.e., • since for a given value of nXˆ ,nˆ ( e ) has the same properties as a jjkωωπˆˆ()+2 normal Fourier transform, we can recover the input sequence exactly Xennˆˆ()=∀ Xe ( ), k • since Xe (jωˆ ) is the normal Fourier transform of the windowed • can use any of several frequency variables nˆ sequence wn (ˆ − mxm ) ( ), then to express STFT, including 1 π wn (ˆ −= mxm ) ( ) X ( ejjmωωˆˆ ) e dωˆ --ωˆ =Ωˆ TT (where is the sampling period for ∫ nˆ 2π −π xm ( )) to represent analog radian frequency, •≠ assuming the window satisfies the property that w (00 ) ( a trivial
jTΩˆ requirement), then by evaluating the inverse Fourier transform giving Xenˆ ( ) when mn= ˆ , we obtain ˆ ˆ --ωπˆˆ==22fFT or ωπ to represent normalized 1 π xn (ˆ )= X ( ejjnωωˆ ) eˆ dωˆ ˆ ∫ nˆ frequency (0≤≤f 1 ) or analog frequency 20π w()−π ˆ jf2π ˆ j 2πFTˆ (0≤≤FFs =1 /T ), giving Xenˆ ( ) or Xenˆ ( ) 19 20
Signal Recovery from STFT Properties of STFT
jωˆ Xenˆ ( )=−DTFT [ wnmxm (ˆˆ ) ( )] n fixed, ωˆ variable π 1 jjnωωˆ ˆ • relation to short-time power density function xn (ˆ )= Xnˆ ( e ) e dωˆ ∫ jjωωˆˆ2 jj ωω ˆˆ∗ 20π w()−π ˆ Sennˆˆ ( )==⋅= | Xe ( ) | Xe nn ˆˆ ( ) Xe ( )DTFT [] Rk n ˆ ( ) n fixed •≠ with the requirement that wxn (00 ) , the sequence (ˆ ) ∞ jωˆ ˆˆ Rk()=− wnmxmwnm (ˆˆ )( )( −−+⇔ kxmk )( ) S()e can be recovered exactly from Xe (jjωω ), if Xe ( ) is known nˆ ∑ nˆ nnˆˆ m=−∞ for all values of ωˆ over one complete period • Relation to regular Xe (jωˆ ) (assuming it exists) - sample-by-sample recovery process ∞ jjmωωˆˆ− jωˆ Xe()==DTFT [] xm ()∑ xme () - Xenˆ ( ) must be known for every value of nˆ and for all ωˆ m=−∞ ˆ π can also recover sequence wn (− mxm ) ( ) but can't guarantee jjjjnωθωθθˆˆ1 −−−() ˆ Xe()= We ( )( Xe ) e dθ ˆ nˆ ∫ that xm ( ) can be recovered since w (nm− ) can equal 0 2π −π ˆ −−jjnjθθˆ θ ⎣⎦⎡⎤wn()()()−↔ m xm We e ∗ Xe ()
21 22
Properties of STFT Alternative Forms of STFT
Alternative forms of Xe (jωˆ ) • assume Xe (jωˆ ) exists nˆ 1. real and imaginary parts ∞ jjmωωˆˆ− Xe (jjωωˆˆ )=+ Re⎡⎤⎡⎤ Xe ( ) j Im Xe ( j ω ˆ ) Xe( )==DTFT [] xm ()∑ xme () nnˆˆ⎣⎦⎣⎦ n ˆ m=−∞ =−ajb()ωωˆˆ () π nnˆˆ 1 ˆ Xe()jjjjnωθωθθˆˆ= We (−−− )( Xe() ) e dθ aXe()ωˆ = Re⎡⎤ (jωˆ ) nˆ ∫ nnˆˆ⎣⎦ 2π −π bXe()ωˆ =− Im⎡⎤ (jωˆ ) • limiting case nnˆˆ⎣⎦ •− when xm ( ) and wn (ˆ m ) are both real (usually the case) wn()ˆˆ=−∞<<∞⇔12 n We (jωˆ ) =πδ () ωˆ can show that abˆˆ (ωωωˆˆˆ ) is symmetric in , and ( ) is 1 π nn X() ejjjnjωωθθωˆˆˆ=−2πδ ()( θ Xe()−− ) eˆ d θ = Xe () anti-symmetric in ωˆ nˆ 2π ∫ −π 2. magnitude and phase
i.e., we get the same thing no matter where the window is jjωωˆˆjθωnˆ ()ˆ Xennˆˆ ( )= | Xe ( ) | e shifted jωˆ • can relate |Xennˆˆ ( ) | and θω (ˆ ) to abnnˆˆ (ωωˆˆ ) and ( ) 23 24
4 Role of Window in STFT Role of Window in STFT