<<

Prediction Error

1. Speech Production Model p sn()=−+ asn ( k ) Gun () ∑ k Gu(n) s(n) k =1 H(z) Digital — Sz() G Hz()== Uz() p 1− az−k Lecture 14 ∑ k k =1 2. LPC Model: p en()= sn ()−=− sn () sn ()α sn ( − k ) % ∑ k s(n) e(n) k =1 A(z) Ez() p Az()==−1 α z−k Linear Predictive ∑ k Sz() k =1 3. LPC Error Model: Coding (LPC)-Lattice 11Sz() == Az() Ez () p 1− α z−k e(n) s(n) ∑ k 1/A(z) Methods, Applications k =1 p sn()=+ en ()α sn ( − k ) ∑ k Perfect reconstruction even if 1 k =1 2 ak not equal to αk

Lattice Formulations of LP Lattice Formulations of LP

• define the system function of the i th order inverse filter • both covariance and autocorrelation methods use (prediction error filter) as two step solutions i Az()iik ( ) =−1 α () z− 1. computation of a matrix of correlation values ∑ k k =1 2. efficient solution of a set of linear equations • if the input to this filter is the input segment ˆ ()ii () ˆ • another class of LP methods, called lattice methods, smnˆ ()=+ snmwm ( )(), with output emnˆ ( )=+ enm ( ) i has evolved in which the two steps are combined into em()ii()=− sm ()α () smk ( − ) ∑ k a recursive algorithm for determining LP parameters k =1 ˆ th • where we have dropped subscript n - the absolute • begin with Durbin algorithm--at the i stage the set of location of the signal ()ith• the z-transform gives coefficients {α j ,ji= 12 , ,..., } are coefficients of the i order ⎛⎞i optimum LP Ez()ii ( )==− AzSz () ( ) ( ) ⎜⎟1 α ()ikzSz− () ⎜⎟∑ k ⎝⎠k =1 3 4

Lattice Formulations of LP Lattice Formulations of LP

• where we can interpret the first term as the z-transform • using the steps of the Durbin recursion of the forward prediction error for an (i −1 )st order predictor, ()ii (-)11 (-) i () i and the second term can be similarly interpreted based on (ααjj =−=kkii α ij- , and α i ) defining a backward prediction error ()i ()iii−− ()1111 −−− ii () () i − we can obtain a recurrence formula for Az ( ) Bz ()== zAzSzzA ( )() ( zSzkAzSz )() −i ()() −−11()ii () − 1 of the form =−zB() z kEi () z ()ii (−−−11 ) ii ( ) −1 • with inverse transform Az ( )=− A ( zkzA ) i ()z i bm()iii ( )=−− smi ( )α () smki ( +−= ) b (−1 ) ( m −1)()− ke()i −1 m • giving for the error transform the expression ∑ k i k =1 ()ii ()−−−−111 ii () •− with the interpretation that we are attempting to predict sm ( i ) from the EzAzSzkzAzSz ( )=− ( ) ( )i ( ) ( ) ismi samples of the input that follow (− ) => we are doing a backward ()ii ()−−11 () i ()i em()=− e () mkbmi ( −1 ) prediction and bm ( ) is called the backward prediction error sequence

5 6

1 Lattice Formulations of LP Lattice Formulations of LP

• the prediction error transform and sequence Ezem()ii ( ), () ( ) can now be expressed in terms of forward and backward errors, namely ()ii ()−−−111 () i Ez()=− E () zkzBi () z ()ii (−−11 ) ( i) em ( )=− e ( mkb ) i ()1m −∗1 • similarly we can derive an expression for the backward error transform and sequence at sample m of the form ()ii−−11 () () i − 1 Bz()=− zB () zkEzi () ()ii ()−−11 () i bm ( )=−−∗ b ( m1 ) kemi ( ) 2 • these two equations define the forward and backward prediction error for an i th order predictor in terms of the corresponding prediction errors of an (i −1 )th order predictor, with the reminder that a zeroth order predictor does no prediction, so embmsm()00 ( )== () ( ) ( ) 01 ≤≤− mL ∗ 3 same set of samples is used to forward predict s(m) Ez()== E()p () z Az ()/() Sz and backward predict s(m-i) 7 8

Lattice Formulations of LP Lattice Formulations of LP

Assume we know k1 (from external computation): 1. we compute em(1)[] and bm (1) [] from sm[],0≤≤ m L using Eqs. ∗ 1 and ∗ 2 2. we next compute em(2)[] and bm (2) [] for 01≤≤+mL using Eqs. ∗1 and ∗ 2 3. extend solution (lattice) to p sections giving em()pp[] and bm () [] for 0≤ mLp ≤+− 1 ()p 4. solution is en[]= e [] n at the output of the • lattice network with p sections—the output of • no direct correlations pth lattice section which is the forward prediction error • no alphas • digital network implementation of the prediction • k’s computed from forward 9 error filter with transfer function A(z) and backward error signals10

Lattice Filter for A(z) All-Pole Lattice Filter for H(z) en()p [] en(1)p − [] en(1) [] en(0)[] s []n k k p −1 p k1

− k − k − k p p −1 1

−1 −1 z z bn()p [] bn(1)p − [] bn(1) [] bn(0)[] en()00[]== bn () [] sn []01 ≤≤− nL en()p []= en [],01≤≤−+ nL p ()ii−−11 () () i ()ii (−−11 ) ( i ) enenkbn[]=+ []i [ −1 ] en[ ]=− e [ n ] kbni [ −=1120 ] i , ,..., p , ≤≤−+ nL 1 i ()iii (−−11 ) ( ) ipp= ,−≤≤−+−110 ,..., , nL 1 i 1 bn[ ]=− keni [ ] + b [ n −1120 ] i = , ,..., p , ≤ nL ≤ − 1 + i ()iii ()−−11 () bn[]=− keni [] + b [ n −1 ] en[]=≤≤−+ e()p [], n01 n L p ipp= ,−≤≤−+110 ,..., , nL 1 i 1 11 ()00 () Sz()= 12 Ez () Ez()= AzSz ()() sn[]= e [] n=≤≤− b [], n01 n L Az()

2 All-Pole Lattice Filter for H(z) Lattice Formulations of LP

1. since bi()ii[1]0,−= ∀ , we can first solve for e (− 1) [0] for (1)ii− () ipp=−, 1,...,1,using the relationship: e [0] = e [0] • the lattice structure comes directly out of the Durbin algorithm (0) (0) (i ) This image cannot currently2. be displayed. since be[0]= [0] we can then solve for b [0] for • the ki parameters are obtained from the Durbin equations

()ii (− 1) • the predictor coefficients, αk , do not appear explicitly in the lattice structure ip=1,2,..., using the equation: bke[0]=− i [0] • can relate the k parameters to the forward and backward errors via (1)i− i 3. we can now begin to solve for e [1] as: Li−+1 (1)ii−− () (1) i embm()ii−−11() () (−1 ) eekbipp[1]=+ [1]i [0], =− , 1,...,1 ∑ k =∗m=0 4 (0) (0) (i ) i 12/ 4. we set be[1]= [1] and we can then solve for b [1] for ⎧⎫⎡⎤⎡⎤Li−+11 Li −+ ⎪⎪()ii−−12 () 1 2 ip=1,2,..., using the equation: ⎨⎬⎢⎥⎢⎥∑∑[()][()]em bm−1 ⎩⎭⎪⎪⎣⎦⎣⎦⎢⎥⎢⎥mm==00 ()iii (−− 1) ( 1) bkebi[1]=−i [1] + [0], = 1,2,..., p • where ki is a normalized cross correlation between the forward and backward 5. we iterate for nN=−2,3,..., 1 and end up with prediction error, and is therefore called a partial correlation or PARCOR coefficient sn[]== e(0) [] n b (0) [] n • can compute predictor coefficients recursively from the PARCOR coefficients

13 14

Direct Computation of k Parameters Direct Computation of k Parameters

assume sn [ ] non-zero for 01≤≤− n L assume k chosen to minimize total energy of the forward i • we can also choose to minimize the backward prediction error (or backward) prediction errors 2 Li−+112 Li −+ Ebmkembm()ii==−+−⎡⎤ () ( ) ⎡ ( ii−−11 ) ( ) ( ) (1 ) ⎤ we can then minimize forward prediction error as : backward∑∑⎣⎦ ⎣ i ⎦ mm==00 Li−+11 Li −+ 2 ()ii () 2 ()ii−−11 () ()i Li−+1 Eem= ⎡⎤ ( )()()=−− ⎡emkbm1 ⎤ ∂E ()ii−−11 () ()i −1 forward ∑∑⎣⎦ ⎣i ⎦ backward ==−02⎡⎤ −ke() m + b ( m − 1 )em() mm==00 ∑ ⎣⎦i ∂ki m=0 ()i Li−+1 ∂Eforward ()ii−−11 () () i − 1 Li−+1 ==−02⎡⎤emkbm() − ( − 1 ) bm ( − 1 ) ()ii () ∑ ⎣⎦i ⎡⎤embm−−11() (−1 ) ∂ki m=0 ∑ ⎣⎦ backward m=0 Li−+1 ki = Li−+1 ()ii−−11 () ()i −1 2 ∑ ⎣⎦⎡⎤embm()⋅− (1 ) ⎡⎤em() forward ∑ ⎣⎦ m=0 m=0 ki = Li−+1 ()i −1 2 ∑ ⎣⎦⎡⎤bm()−1 m=0 15 16

Direct Computation of k Parameters Direct Computation of k Parameters

• minimize sum of forward and backward prediction errors over if we window and sum over all time, then • fixed interval (covariance method) 22 Li−+11 Li −+ L−1 22 ()ii−−11 () Eembm()iii=+⎡⎤⎡⎤ () ( ) () ( ) ⎡⎤⎡em ( )=− bm (1 ) ⎤ Burg ∑{}⎣⎦⎣⎦ ∑∑⎣⎦⎣ ⎦ m=0 mm==00 L−1 2 ∞ 2 ⎡⎤emkbm()ii−−11() () ( )⎡⎤ kemb () ii −− 11 () ()()m =−−+−+∑ ⎣⎦ii1 ∑ ⎣⎦−1 therefore m=0 m=−∞ ∂E ()i L−1 Burg 02⎡⎤emkbm()ii−−11() () ( 1 ) bm () i − 1 ( 1 ) ==−∑ ⎣⎦ −i − − ∂ki m=0 PARCOR forward backward forward backward L−1 ()ii−−11 () () i − 1 kkkkkiiiii=== 21⎡⎤ke() m b ( m ) e () m −−∑ ⎣⎦i + − Li−+1 m=0 ()ii−−11 () L−1 embm() (−1 ) ()ii−−11 () ∑ 21∑ ⎣⎦⎡⎤embm()⋅− ( ) m=0 Burg m=0 = ki = 12/ L−−1122L ⎧⎫Li−+1122 Li −+ ()ii−−11 () ⎡⎤⎡⎤e(i −1)()()mbmi −1 (−1 ) ∑∑⎣⎦⎣⎡⎤⎡em()+− bm (1 ) ⎦⎤ ⎨⎬∑∑⎣⎦⎣⎦ m=0 m=0 mm==00 ⎩⎭17 Burg 18 •−≤11ki ≤ always

3 Comparison of Autocorrelation and Burg Spectra Summary of Lattice Procedure

• steps involved in determining the predictor coefficients and the k parameters for the lattice method are as follows 1. initial condition, emsmbm()00 ( )== ( ) () ( ) from Eq. *3 ()1 2. compute k1 = α1 from Eq. *4 3. determine forward and backward prediciton errors embm()11 ( ), () ( ) from Eqs. *1 and *2 4. set i = 2 ()i • significantly less 5. determine ki = αi from Eq. *4 ()i smearing of formant 6. determine α j for j =−12, ,...,i 1 from Durbin iteration peaks using Burg 7. determine em()ii ( ) and bm () ( ) from Eqs. *1 and *2 method 8. set ii=+1 9. if ip is less than or equal to , go to step 5 10. procedure is terminated • predictor coefficients obtained directly from speech samples => without calculation of autocorrelation function

• method is guaranteed to yield stable filters (ki <1) without using window 19 20

Summary of Lattice Procedure Summary of Lattice Procedure (0) (1) (2) s[n] e [n] e [n] e [n] e( p)[n] e[n]

− k1 − k2 − k p −1 − k1 −1 − k − k z z 2 z−1 p ( p) b(0)[n] b(1)[n] b(2)[n] b [n]

embmsm()00 ()= () ()=∗ () 3

Li−+1 ∑ embm()ii−−11() () (−1 ) k = m=0 ∗4 i 12/ ⎧⎫⎡⎤⎡⎤Li−+11 Li −+ ⎪⎪()ii−−12 () 1 2 ⎨⎬⎢⎥⎢⎥∑∑[()][()]em bm−1 ⎩⎭⎪⎪⎣⎦⎣⎦⎢⎥⎢⎥mm==00

()ii ()−−11 () i em ( )= e ( mkbm )−−∗i (1 ) 1 bm()ii ( )= b ()−−11 ( m−−1 ) kem () i ( ) ∗ 2 21 i 22

Prediction Error Signal LPC Comparisons

1. Speech Production Model p sn()=−+ asn ( k ) Gun () ∑ k Gu(n) s(n) k =1 H(z) Sz() G Hz()==p Uz() −k 1− ∑azk k =1 2. LPC Model: p en()=−=− sn () sn () sn ()α sn ( − k ) % ∑ k s(n) e(n) k =1 A(z) p Ez() −k Az()==−1 ∑αk z Sz() k =1 3. LPC Error Model: 11Sz() ==p Az() Ez () −k e(n) s(n) 1− ∑αk z k =1 1/A(z) p sn()=+ en ()α sn ( − k ) ∑ k Perfect reconstruction even if k =1 23 24 ak not equal to αk

4 LPC Comparisons Comparisons Between LP Methods

• the various LP solution techniques can be compared in a number of ways, including the following: – computational issues – numerical issues – stability of solution – number of poles (order of predictor) – window/section length for analysis

25 26

LP Solution Computations LP Solution Comparisons

• stability – guaranteed for autocorrelation method – cannot be guaranteed for covariance method; as window size gets larger, this almost always makes the system stable – guaranteed for lattice method • choice of LP analysis parameters – need 2 poles for each vocal tract resonance below F /2 • assume L1≈ L2>>p; choose values of L1=300, L2=300, L3=300, p=10 s – need 3-4 poles to represent source shape and • computation for radiation load 3 • covariance method ≈ L1p+p ≈ 4000 *,+ – use values of p ≈ 13-14 2 • autocorrelation method ≈ L2p+p ≈ 3100 *,+

• lattice method ≈ 5L3p ≈ 15000 *,+ 27 28

LP Speech Analysis

file:s5, ss:11000, frame size (L):320, lpc order (p):14, cov method Top panel: speech signal Second panel: error The Prediction Error Signal signal Third panel: log magnitude spectra of signal and LP model Fourth panel: log magnitude spectrum of error signal 29 30

5 LP Speech Analysis LP Speech Analysis file:s5, ss:11000, frame size (L):320, lpc order (p):14, ac method file:s3, ss:14000, frame size (L):160, lpc order (p):16, cov method Top panel: Top panel: speech signal speech signal Second Second panel: error panel: error signal signal Third panel: Third panel: log log magnitude magnitude spectra of spectra of signal and LP signal and LP model model Fourth Fourth panel: log panel: log magnitude magnitude spectrum of spectrum of error signal error signal 31 32

LP Speech Analysis Variation of Prediction Error with LP Order

file:s3, ss:14000, frame size (L):160, lpc order (p):16, ac method Top panel: speech signal Second panel: error signal Third panel: log magnitude spectra of signal and LP model Fourth panel: log magnitude spectrum of error signal • prediction error flattens at around p=13-14 • normalized prediction error for unvoiced speech larger than for 33 34 voiced speech => model isn’t quite as accurate

Prediction Error Signal Behavior LP Solution Comparisons

• choice of LP analysis parameters - the prediction error signal is computed as – use small values of L for reduced computation p en ( ) sn ( ) sn ( k ) Gun ( ) =−∑αk −= –use l order of several pitch periods for reliable k =1 prediction-especially when using tapered window - en ( ) should be large at the beginning of –use l from 100-400 samples at 10 kHz for each pitch period (voiced speech) => autocorrelation good signal for pitch detection - can perform autocorrelation on en ( ) and – for lattice and covariance methods, L as small as 2 p detect largest peak has been used (within pitch periods); however if pitch - error spectrum is approximately flat-so pulse occurs within window, bad prediction results => effects of formants on pitch detection are use much larger values of L minimized

35 36

6 Normalized Mean-Squared Error

- for autocorrelation method - the prediction error sequence (defining α =−1 ) is Lp+−1 0 2 p emnˆ () Experimental Evaluations of ∑ emnnˆˆ ( )=−αk sm ( − k ) m=0 ∑ V = k =0 nˆ L−1 2 - giving many forms for the normalized error LPC Parameters smnˆ () ∑ pp m=0 φ (,ij ) V = ααnˆ nˆ ∑∑ ij(,) - for covariance method ij==00 φnˆ 00 L−1 p φ (,i 0 ) em2 () V =− α nˆ ∑ nˆ nˆ ∑ i φnˆ (,)00 V = m=0 i =0 nˆ L−1 p 2 2 sm() Vkˆ =−()1 i ∑ nˆ n ∏ m=0 i =1

37 38

Normalized Mean-Squared Error Normalized Mean-Squared Error

• Vn versus p for synthetic vowel, pitch period of 83 samples, L=60, pitch synchronous analysis • normalized error versus p for pitch asynchronous analysis, L=120, • covariance method-error goes to zero at p=11, the order of the synthesis filter normalized error falls to 0.1 near p=11 for both covariance and

• autocorrelation method, Vn≈0.1 for p≥7 since error dominated by prediction at autocorrelation methods beginning of interval 39 40

Normalized Mean-Squared Error Normalized Mean-Squared Error

• normalized error versus L, p=12 •for L < pitch period (83 samples), covariance method gives smaller normalized error than autocorrelation method • for values of L at or near multiples of pitch period, normalized error jumps due to • results for natural speech large prediction error in vicinity of pitch pulse

• when L > 2 * pitch period, normalized error same for both autocorrelation and 41 42 covariance methods

7 Normalized Mean-Squared Error

• variability of normalized error with positioning of frame • sample-by-sample LP analysis of 40 msec of vowel /i/ • signal energy in part a; normalized error in part b for p=14 pole analysis using 20 msec frame size for Properties of the LPC covariance method; normalized error in part c for autocorrelation method using HW; normalized error Polynomial in part d for autocorrelation method using RW • average pitch period of 84 samples => 2.5 pitch periods in 20 msec window • substantial variations in normalized error for covariance method-especially peaked when pitch pulses enter window => longer windows give larger normalized errors • substantial high frequency variations in normalized error for autocorrelation method with some pitch modulations 43 44

Minimum-Phase Property of A(z) PARCORs and Stability

Az( ) has all its zeros inside the unit circle ()i •≥⇒≥prove that kzij11 for some j 2 Proof: Assume that zzoo (> 1 ) is a zero (root) of Az() i A()zzzAz=− (1 −1 )′ () ()ii ()−1111−−− ii () () i − o A ()zA=− () zkzAzij ( ) =−∏ (1 zz ) The minimum mean-squared error is j =1

∞ π It is easily shown that − ki is the coefficient 2221 jjωω EemAeSednnˆˆ==∑ [ ] | ( )| | n ˆ ( )| ω −ii() () i ∫ of zAz in ( ), i.e., αii= k . Therefore, m=−∞ 2π −π p π 22 2 kz= ()i 1 − jjωω′ j ω ij∏ =−2π ∫ 10zeo A() e Snˆ () e dω > j =1 −π 22 If k ≥ 1 , then either all the roots must be on the −∗−jjωω2 i 111−−ze= z(/ z ) e oo o unit circle or at least one of them must be outside Thus, Az ( ) could not be the optimum filter because the unit circle. ∗ 45 46 we could replace zzoo by (1 / ) and decrease the error.

PARCORs and Stability Root Locations for Optimum LP

•≥ if ki 1 , then either all the roots must be on the Model unit circle or at least one of them must be outside ()i GG the unit circle. Since this is true for all Az ( ), Hz% ()== ip= 12, ,..., , a necessary condition for the roots of p Az() −i Az()p ( ) to be inside the unit circle is: 1− ∑αi z i=1 kii <=112 , , ,..., p p th GGz • for the i -order optimum linear predictor, == i pp EkEkE()ii=− (11021 ) (− ) = ( − 20 ) ( ) > −1 ij∏ (1−−zz ) ( z z ) j =1 ∏∏ii ()p ii==11 so ki < 1 and therefore Az ( ) has all its roots

inside the unit circle. 47 48

8 Pole-Zero Plot for Model Pole Locations Which poles correspond to formant frequencies?

49 50

Pole Locations (FS=10,000 Hz) Estimating Formant Frequencies

• compute A(z) and factor it. • find roots that are close to the unit circle. • compute equivalent analog frequencies from the angles of the roots. • plot formant frequencies as a function of time.

FF=⋅(θ /180) (S / 2) 51 52

Spectrogram with LPC Roots Spectrogram with LPC Roots

53 54

9 Comparison to ABS Methods

• error measure for ABS methods is log ratio of power spectra, i.e.,

π 2 ⎪⎪⎧⎫⎡⎤|(Sejω )|2 Ed′ = ⎨⎬ log⎢⎥n ω ∫ |(Hejω )|2 −π ⎩⎭⎪⎪⎣⎦⎢⎥ • thus for ABS minimization of E′ is equivalent to minimizing mean squared Relation of LP Analysis to error between two log spectra comparing EE and we see the following: • LPC ABS Lossless Tube Models o both error measures related to ratio of signal to model spectra o both tend to perform uniformly over whole frequency range o both are suitable to selective error minimization over specified frequency ranges o error criterion for LP modeling places higher weight on frequency regions jjωω22 where |Sn (eHe) |< | ( ) | , whereas the ABS error criterion places equal weight on both regions ⇒ for unsmoothed signal spectra (as obtained by STFT methods), LP error criterion yields better spectral matches than ABS error criterion ⇒ for smoothed signal spectra (as obtained at output of filter banks), both error criteria will perform well 55 56

Discrete-Time Model - I Discrete-Time Model - II

N -section lossless tube model corresponds to discrete-time system when: cN FS = 2l where cN is the velocity of sound, is the number of

tube sections, FS is the sampling frequency, and l is the total length of the vocal tract.

The reflection coefficients {,1rkNk ≤≤ − 1} are related to the areas of the lossless tubes by: AA− r = kk+1 k AA+ • Make all sections the same length with kk+1 Can find transfer function of digital system subject to

delay constraints of the form rG =1, ρcA/ − Z rr== NL τ =Δx /wherecNx = Δ NL l 57 ρcA/ NL+ Z 58

Discrete-Time Model - III Discrete-Time Model - IV Given the system function: N 051.(++rrz ) ( 1 ) −N / 2 Uz() Gk∏ Vz()==Lk=1 UzG () Dz ()

⎛⎞AAkk+1 − −≤11rk =⎜⎟ ≤ reflection coefficient ⎝⎠AAkk+1 + ρcA/ NL− Z if rRGGNL==∞==1 (i.e., ) and rr ρcA/ NL+ Z then Dz ( ) satisties the recursion

Dz0 ( )= 1 −−k 1 Dzkkkk()=+ DzrzDz−−11 () ( ) k =12 ,,K , N

59 Dz( )= DN ( z ) 60

10 All-Pole Lattice Filter for H(z) Tube Areas from PARCORS e[n] s[n]

k p k p−1 k1 • Relation to areas: − k − k −1 p−1 −1 − k1 −1 ⎛⎞⎛⎞AA−−(/) AA1 p z z z ii++11 ii −≤−=11krii =⎜⎟⎜⎟ = ≤ ⎝⎠⎝⎠AAii++11++(/) AA ii1 Az()0 ( )= 1 •Solve for Ai+1 in terms of Ai A()iiii()zAzkzAzip=− (−−−−111 ) () ( ) ( ) =12 ,, , i K ⎛⎞1− k ⎛⎞ ()p i Akii+1 1− Az( )= A ( z ) AAii+1 = ⎜⎟> 0 =>⎜⎟0 1+ k ⎝⎠i Akii⎝⎠1+ 1 Sz()= Ez () • Log area ratios (good for quantization) Az()

⎛⎞Aii+1 ⎛1− k ⎞Minimizes spectral gi ==log⎜⎟ log ⎜ ⎟ If rkii=− and Np = , then DzAz ( ) = ( ). sensitivity under ⎝⎠Aii ⎝1+ k ⎠ 61 uniform quantization 62

Estimating Tube Areas from Speech Estimation of Tube Areas (Wakita)

1. Sample speech [sn [ ]= sa ( nT )] at sampling Analysis of syllable /IY/ /B/ /AA/ rate FTpcs ==12/ / (l ). • /IY/ sound in part (a) 2. Remove effects of glottal source and radiation by • /B/ sound in parts (b) and (c) pre-emphasis xn[ ]=−− sn [ ] sn [1 ]. • /AA/ sound in part (d) 3. Compute the PARCOR coefficients on a short-

time basis: kii , = 12 , ,..., p . Using Wakita method 41. Assuming A1 = (arbitrary), compute to estimate tube areas from the speech ⎛⎞ waveform as shown in 1− ki lower plot. AAipii+1 ==−⎜⎟,12 , ,..., 1 ⎝⎠1+ ki H. Wakita, IEEE Trans. Audio and Electroacoustics,October, 1973. 63 64

Estimations of Tube Areas

Use of Wakita method to estimate tube areas for the five Alternative Representations vowels, /IY/, /EH/, /AA/, /AO/, /UW/ of the LP Parameters

65 66

11 PARCORs to Prediction Coefficients LP Parameter Sets • assume that kii ,= 12 , ,K , p are given. Then we can

skip the computation of ki in the Levinson recursion. for ip= 12, ,K , ()i αii= k if iji>=−1121 , then for , ,K , ()ii ()−−11 () i ααjj=−k iij α− end end ()p ααjj==jp12,,K , 67 68

Prediction Coefficients to PARCORs LP Parameter Transformations •=assume that α j ,jp12 , ,K , are given. Then we can work backwards through the Levinson Recursion. • roots of the predictor polynomial

()p p p −−k 1 ααjj==for j 12, ,K ,p Az ( ) =−11α z = − zz ∑ kk∏() k = α ()p k =1 k =1 pp • where each root can be expressed as a z-plane or s-plane for ipp=−,12 ,K , root, i.e.,

for ji=−12, ,K , 1 zzkkrki=+ jz ; s k =+Ωσ k j k ()ii () αα+ k z = esTk α ()i −1 = jiij− k j 2 giving 1− ki

end 11−122⎡⎤zki Ω=k tan⎢⎥ ;σ k = log()zz kr + ki ()i −1 Tz⎣⎦kr 2 T kii−−11= α • important for formant estimation end 69 70

LP Parameter Transformations LP Parameter Transformations • IR of all pole system • cepstrum of IR of overall LP system from predictor coefficients p n−1 ⎛⎞k hn ( )= αδ hn (−+ k ) G ( n ) 0 ≤ n hnˆˆ ( )=+αα hk ( ) 1 ≤ n ∑ k nnk∑⎜⎟n − k =1 k =1 ⎝⎠ autocorrelation of IR • predictor coefficients from cepstrum of IR • ∞ n−1 ˆˆ⎛⎞k Ri%% ( )=−=− hnhn ( ) ( i ) R ( i ) ααnnk=−hn ( ) hk ( ) − 1 ≤ n ∑ ∑⎜⎟n k =1 ⎝⎠ n=0 where p Ri%%()=−≤αk R (| i k |) 1 i ∞ G ∑ Hz ( )== hnz ( ) −n k =1 ∑ p n=0 −k p 1− αk z 2 ∑ RRkG%%()0 =+αk () k =1 ∑ 71 k =1 72

12 LP Parameter Transformations LP Parameter Transformations

• autocorrelation of the predictor polynomial p • coefficients from PARCOR coefficients −k Az ( ) =−1 αk z ∑ ⎡⎤Akii+1 ⎡1− ⎤ k =1 gipi == log⎢⎥ log ⎢ ⎥1 ≤≤ with IR of the inverse filter ⎣⎦Akii ⎣1+ ⎦ p with inverse relation an ( )=−δαδ ( n ) ( n − k ) ∑ k 1− egi k =1 kip=≤≤1 i g with autocorrelation 1+ e i pi− R ( i )=+≤≤ akak ( ) ( i ) 0 i p a ∑ k =0 73 74

Quantization of LP Parameters Quantization of LP Parameters • consider the magnitude-squared of the model frequency response 2 1 He (jω )== P (ω , g ) |(Aejω )|2 where gP is a parameter that affects . • spectral sensitivity can be defined as • spectral sensitivity for ⎡⎤π ∂S 11 Pg(,ω i ) log area ratio = lim⎢⎥∫ log dω Δgi →0 ∂giiΔ+ΔgPgg⎣⎦2πω−π (, ii ) parameters, gi – low • spectral sensitivity for ki sensitivity for virtually which measures sensitivity to errors in the gi parameters parameters; low sensitivity entire range is seen around 0; high sensitivity 75 around 1 76

Line Spectral Pair Parameters Line Spectral Pair Parameters

−−12 −p Az( )=+1 αα12 z + z + ... + αp z = all-zero prediction filter with all zeros, z , inside the unit circle k zeros of Pz ( ) and Qz ( ) fall on unit circle and are interleaved with −+()pppp11 − − 1 −+ 1 − −+ () 1 Az%( )==++++ z Az ( )αααp z ...21 z z z each other ⇒ set of {}ωk called Line Spectral Frequencies (LSF) = reciprocal polynomial with inverse zeros, 1 / z k LSFs are in ascending order consider the following: Az%() stability of Hz ( ) guaranteed by quantizing LSF parameters Lz ( )== allpass system ⇒ Le (jω ) =1 , all ω Az() jjωω jω Pe()+ Qe () form the symmetric polynomial Pz ( ) as: Ae ( ) = 2 −+()p 11 − Pz ( )=+=+ Az ( ) Az%% ( ) Az ( ) z Az ( ) ⇒ Pz ( ) has zeros for Lz ( ) =−=−1 ; ( Az ( ) A())z 22 jjωω jωk 2 Pe()+ Qe () ⇒=+⋅=−arg{}Le ( ) ( k12 / ) 2π , k 01 , ,..., p 1 Ae()jω = form the anti-symmetric polynomial Qz ( ) as: 4 Qz ()=−=− Az () Az%% () Az () z−+()p 11 Az ( − ) ⇒ Qz () has zeros for Lz() =+=1 ;(() Az Az ())

j ⇒=⋅=−arg{}Le (ωk ) k201π , k , ,..., p 1 77 78

13 LSP Example Line Spectrum Pair (LSP) Parameters p = 12 • properties of LSP parameters ∗ Pz( ) roots 1. Pz ( ) corresponds to a lossless tube, open at the lips and open ( kp+1 = 1 ) at the glottis o Qz ( ) roots 2. Qz ( ) corresponds to a lossless tube, open at the lips and closed (k = −1 ) p+1 x Az ( ) roots at the glottis 3. all the roots of Pz ( ) and Qz ( ) are on the unit circle 4. if pPzzQz is an even integer, then ( ) has a root at =+1 and ( ) has a root at z =−1

5. a necessary and sufficient condition for kii <=112 , , ,..., p is that the roots of Pz ( ) and Qz ( )alternate on the unit circle 6. the LSP frequencies get close together when roots of Az ( ) are close to the unit circle 7. the roots of Pz ( ) are approximately equal to the formant frequencies 79 80

Line Spectrum Pair (LSP) Parameters LPC Synthesis

• the basic synthesis equation is p sn ( ) sn ( k ) Gun ( ) %%=−+∑αk k =1 • need to update parameters every 10 msec or so • pitch synchronous mode works best

81 82

LPC Analysis-Synthesis LPC header

1. Extract αk parameters properly • bit rates of 72 FS where F =100, 67, 2. Quantize αk parameters properly so that there is little S quantization error and 33 for bit rates • Small number of bits go into coding the α coefficients of 7200 bps, 4800 k bps and 2400 bps 3. Represent e(n) via: • Pitch pulses and noise—LPC Coding 3600 bps • Multiple pulses per 10 msec interval—MPLPC Coding 2400 bps • Codebook vectors—CELP

• Almost all of the coding bits go into coding of e(n) VEV LPC

83 84

14 LPC Basics LPC Basics-Speech Model

e(n)=Gu(n) s(n) sn(m)A(z) en(m)H(z) sn(m) +

p Ez() P(z) Az ( )=−11α z−k =− Pz ( ) = ; prediction error filter ∑ k Sz() k =1 GGSz() p Hz ( ) === em ( )=− sm ( )α sm ( − k ); prediction error p nnˆˆ∑ k n ˆ −k 1− Pz() Uz () k =1 1− ∑αk z 11 1 k =1 Hz ( )==p = ; all pole model Az()−k 1− Pz () 1− ∑αk z jω G k =1 He()= p 2 ∞ ∞ p − jkω 2 ⎛⎞ 1− αke Eemsmnnˆˆ==−∑ []()∑∑⎜⎟ n ˆ () αksmnˆ ()− k ∑ m=−∞ mk=−∞⎝⎠ =1 k =1 85 86

Summary

• the LP model has many interesting and useful properties that follow from the structure of the Levinson-Durbin algorithms • the different equivalent representations have different properties under quantization – polynomial coefficients (bad) – polynomial roots (okay) – PARCOR coefficients (okay) – lossless tube areas (good) – LSP root angles (good) • almost all LPC representations can be used with a range of compression schemes and are all good candidates for the technique of Vector Quantization

87

15