Digital Speech Processing— Lecture 14

Linear Predictive Coding (LPC)-Lattice Methods, Applications

1 Prediction Error Signal

1. Speech Production Model p sn()=−+ asn ( k ) Gun () ∑ k Gu(n) s(n) k =1 H(z) Sz() G Hz()==p Uz() −k 1− ∑azk k =1 2. LPC Model: p en()=−=− sn () sn () sn ()α sn ( − k ) % ∑ k s(n) e(n) k =1 A(z) p α Ez() −k Az()==−1 ∑ k z Sz() k =1 3. LPC Error Model: 11Sz() ==p Az() Ez () −k α e(n) s(n) 1− ∑ k z k =1 1/A(z) p sn()=+ en ()α sn ( − k ) ∑ k Perfect reconstruction even if k =1 2 ak not equal to αk Lattice Formulations of LP

• both covariance and autocorrelation methods use two step solutions 1. computation of a matrix of correlation values 2. efficient solution of a set of linear equations • another class of LP methods, called lattice methods, has evolved in which the two steps are combined into a recursive algorithm for determining LP parameters • begin with Durbin algorithm--at the i th stage the set of ()ith coefficients {α j ,ji= 12 , ,..., } are coefficients of the i order optimum LP

3 Lattice Formulations of LP

• define the system function of the i th order inverse filter (prediction error filter) as i Az()iik ( ) =−1 α () z− ∑ k k =1 • if the input to this filter is the input segment ˆ ()ii () ˆ smnˆ ()=+ snmwm ( )(), with output emnˆ ( )=+ enm ( ) i e()ii() m=− sm ()α () sm ( − k ) ∑ k k =1 • where we have dropped subscript nˆ - the absolute location of the signal • the z-transform gives ⎛⎞i Ez()ii ( )==− AzSz () ( ) ( ) ⎜⎟1 ()ikzSz− () ⎜⎟∑ k ⎝⎠k =1 α 4 Lattice Formulations of LP

• using the steps of the Durbin recursion ()ii (-)11 (-) i () i (ααjj = α−= αkkii ij- , and i ) ()i we can obtain a recurrence formula for Az ( ) of the form ()ii (−−−11 ) ii ( ) −1 Az ( )=− A ( zkzA ) i ()z • giving for the error transform the expression ()ii (−−−−111 ) ii ( ) E ( z )=− A ( zSz ) ( ) kzi A ( z ) Sz ( ) ()ii ()−−11 () i em()=− e () mkbmi ( −1 )

5 Lattice Formulations of LP

• where we can interpret the first term as the z-transform of the forward prediction error for an (i −1 )st order predictor, and the second term can be similarly interpreted based on defining a backward prediction error ()iii−− ()1111 −−− ii ( ) ( i − ) Bz ( )== zAzSzzA ( ) ( ) ( zSzkAzSz ) ( ) −i ( ) ( ) −−11()ii () − 1 =−zB() z kEi () z • with inverse transform i bm()iii ( )=−− smi ( )α () smki ( +−= ) b (−1 ) ( m −1)()− ke()i −1 m ∑ k i k =1 •− with the interpretation that we are attempting to predict sm ( i ) from the ismi samples of the input that follow (− ) => we are doing a backward prediction and bm()i ( ) is called the backward prediction error sequence

6 Lattice Formulations of LP

same set of samples is used to forward predict s(m) and backward predict s(m-i) 7 Lattice Formulations of LP

• the prediction error transform and sequence Ezem()ii ( ), () ( ) can now be expressed in terms of forward and backward errors, namely ()ii (−−−111 ) ( i ) Ez()=− E () zkzBi () z ()ii (−−11 ) ( i) em ( )=− e ( mkb ) i ()1m −∗1 • similarly we can derive an expression for the backward error transform and sequence at sample m of the form ()iii−−11 ( ) ( − 1 ) Bz()=− zB () zkEzi () ()ii (−−11 ) ( i ) bm ( )=−−∗ b ( m1 ) kemi ( ) 2 • these two equations define the forward and backward prediction error for an i th order predictor in terms of the corresponding prediction errors of an (i −1 )th order predictor, with the reminder that a zeroth order predictor does no prediction, so embmsm()00 ()==≤≤− () () ()01 mL ∗ 3 Ez()== E()p () z Az ()/() Sz 8 Lattice Formulations of LP

Assume we know k1 (from external computation): 1. we compute em(1)[] and bm (1) [] from sm[],0≤≤ m L using Eqs. ∗ 1 and ∗ 2 2. we next compute em(2)[] and bm (2) [] for 01≤≤+mL using Eqs. ∗1 and ∗ 2 3. extend solution (lattice) to p sections giving em()pp[] and bm () [] for 0≤ mLp ≤+− 1 4. solution is en[]= e()p [] n at the output of the pth lattice section

9 Lattice Formulations of LP

• lattice network with p sections—the output of • no direct correlations which is the forward prediction error • no alphas • digital network implementation of the prediction • k’s computed from forward error filter with transfer function A(z) and backward error signals10 Lattice Filter for A(z)

en()00[]== bn () [] sn []01 ≤≤− nL ()ii ()−−11 () i en[ ]=− e [ n ] kbni [ −=1120 ] i , ,..., p , ≤≤−+ nL 1 i ()iii ()−−11 () bn[ ]=− keni [ ] + b [ n −1120 ] i = , ,..., p , ≤ nL ≤ − 1 + i en[]=≤≤−+ e()p [], n01 n L p Ez()= AzSz ()() 11 All-Pole Lattice Filter for H(z) en()p [] en(1)p − [] en(1) [] en(0)[] s []n k k p −1 p k1

− k − k − k p p −1 1

−1 −1 z z bn()p [] bn(1)p − [] bn(1) [] bn(0)[] en()p []=≤≤−+ en [],01 nL p ()ii−−11 () () i enenkbn[]=+ []i [ −1 ] ipp=−,110 ,..., , ≤≤−+− nL 1 i 1 ()iii ()−−11 () bn[]=− keni [] + b [ n −1 ] ipp=−,110 ,..., , ≤≤−+ nL 1 i 1 ()00 () Sz()= 12 Ez () snen[]== [] bn [],01 ≤≤− nL Az() All-Pole Lattice Filter for H(z)

1. since bi()ii[1]0,−= ∀ , we can first solve for e (− 1) [0] for ipp=−,1,...,1,using the relationship: e(1)ii− [0][0] = e ()

This image cannot currently be displayed. 2. since be(0)[0]= (0) [0] we can then solve for b (i ) [0] for ()ii (− 1) ip=1,2,..., using the equation: bke[0]=− i [0] 3. we can now begin to solve for e(1)i− [1] as: (1)ii−− () (1) i eekbipp[1]=+ [1]i [0], =− , 1, ...,1 4. we set be(0)[1]= (0) [1] and we can then solve for b (i ) [1] for ip=1,2,..., using the equation: ()iii (−− 1) ( 1) bkebi[1]=−i [1] + [0], = 1,2,..., p 5. we iterate for nN=−2,3,..., 1 and end up with sn[]== e(0) [] n b (0) [] n

13 Lattice Formulations of LP

• the lattice structure comes directly out of the Durbin algorithm

• the ki parameters are obtained from the Durbin equations

• the predictor coefficients, αk , do not appear explicitly in the lattice structure

• can relate the ki parameters to the forward and backward errors via Li−+1 ∑ embm()ii−−11() () (−1 ) k =∗m=0 4 i 12/ ⎧⎫⎡⎤⎡⎤Li−+11 Li −+ ⎪⎪()ii−−12 () 1 2 ⎨⎬⎢⎥⎢⎥∑∑[em ( )] [ bm (−1 )] ⎩⎭⎪⎪⎣⎦⎣⎦⎢⎥⎢⎥mm==00

• where ki is a normalized cross correlation between the forward and backward prediction error, and is therefore called a partial correlation or PARCOR coefficient • can compute predictor coefficients recursively from the PARCOR coefficients

14 Direct Computation of k Parameters

assume sn [ ] non-zero for 01≤ n≤− L

assume ki chosen to minimize total energy of the forward (or backward) prediction errors we can then minimize forward prediction error as : 2 Li−+112 Li −+ Eem()ii⎡ () ( )()()⎤⎡emkbm()ii−−11 () ⎤ forward = ∑∑⎣ ⎦⎣=−−i 1 ⎦ mm==00 ∂E ()i Li−+1 forward ⎡⎤emkbm()ii−−11() () ( ) bm () i − 1 ( ) ==−02∑ ⎣⎦ −i − 1 − 1 ∂ki m=0 Li−+1 ()ii−−11 () ∑ ⎣⎦⎡⎤embm()⋅− (1 ) forward m=0 ki = Li−+1 2 ()i −1 ∑ ⎣⎦⎡⎤bm()−1 m=0 15 Direct Computation of k Parameters

• we can also choose to minimize the backward prediction error 2 Li−+112 Li −+ E()ii⎡⎤ bm () ( ) ⎡ kembm ( ii−−11 ) ( ) ( ) (1 ) ⎤ backward==−+−∑∑⎣⎦ ⎣ i ⎦ mm==00 ∂E ()i Li−+1 backward ⎡⎤ke()ii−−11() m b () ( m )em()i −1 () ==−02∑ ⎣⎦ −i + − 1 ∂ki m=0 Li−+1 ()ii−−11 () ∑ ⎣⎦⎡⎤embm() (−1 ) backward m=0 ki = Li−+1 ()i −1 2 ∑ ⎣⎦⎡⎤em() m=0

16 Direct Computation of k Parameters

• if we window and sum over all time, then

Li−+1122 Li −+ ()ii−−11 () ∑∑⎣⎦⎣⎡⎤⎡em ( )=− bm (1 ) ⎦⎤ mm==00

therefore

PARCOR forward backward forward backward kkkkkiiiii=== Li−+1 ∑ embm()ii−−11() () (−1 ) m=0 = / Li−+11 Li −+ 12 ⎧⎫(i −1)()22i −1 ⎨⎬∑∑⎡⎤⎡⎤⎣⎦⎣⎦e ()mbm (−1 ) mm==00 ⎩⎭17 Direct Computation of k Parameters

• minimize sum of forward and backward prediction errors over fixed interval (covariance method)

L−1 22 Eembm()iii⎡⎤⎡⎤ () ( ) () ( ) Burg =+∑{}⎣⎦⎣⎦ m=0

L−1 2 ∞ 2 ⎡⎤emkbm()ii−−11() () (1 )⎡⎤ kemb () ii −− 11 () ()()m 1 =−−+−+∑ ⎣⎦ii∑ ⎣⎦− m=0 m=−∞ ∂E ()i L−1 Burg 02⎡⎤emkbm()ii−−11() () () 1 bm () i − 1 () 1 ==−∑ ⎣⎦ −i − − ∂ki m=0 L−1 21⎡⎤ke()ii−−11() m b () ( m ) e () i − 1 () m −−∑ ⎣⎦i + − m=0 L−1 ()ii−−11 () 21∑ ⎣⎦⎡⎤embm()⋅− ( ) Burg m=0 ki = L−−1122L ()ii−−11 () ∑∑⎣⎦⎣⎡⎤⎡em()+− bm (1 ) ⎦⎤ m=0 m=0 Burg 18 •−≤11ki ≤ always Comparison of Autocorrelation and Burg Spectra

• significantly less smearing of formant peaks using Burg method

19 Summary of Lattice Procedure

• steps involved in determining the predictor coefficients and the k parameters for the lattice method are as follows 1. initial condition, emsmbm()00 ( )== ( ) () ( ) from Eq. *3 ()1 2. compute k1 = α1 from Eq. *4 3. determine forward and backward prediciton errors embm()11 ( ), () ( ) from Eqs. *1 and *2 4. set i = 2 5. determine k = ()i from Eq. *4 i i α ()i 6. determine j forα j =−12, ,...,i 1 from Durbin iteration 7. determine em()ii ( ) and bm () ( ) from Eqs. *1 and *2 8. set ii=+1 9. if ip is less than or equal to , go to step 5 10. procedure is terminated • predictor coefficients obtained directly from speech samples => without calculation of autocorrelation function

• method is guaranteed to yield stable filters (ki <1) without using window 20 Summary of Lattice Procedure

21 Summary of Lattice Procedure

(0) (1) (2) s[n] e [n] e [n] e [n] e( p)[n] e[n]

− k1 − k2 − k p −1 − k1 −1 − k − k z z 2 z−1 p ( p) b(0)[n] b(1)[n] b(2)[n] b [n]

embmsm()00 ( )= () ( )=∗ ( ) 3

Li−+1 ∑ embm()ii−−11() () (−1 ) k =∗m=0 4 i 12/ ⎧⎫⎡⎤⎡⎤Li−+11 Li −+ ⎪⎪()ii−−12 () 1 2 ⎨⎬⎢⎥⎢⎥∑∑[()][()]em bm−1 ⎩⎭⎪⎪⎣⎦⎣⎦⎢⎥⎢⎥mm==00

()ii ()−−11 () i em ( )= e ( mkbm )−−∗i (1 ) 1

bm()ii ( )= b (−−11 ) ( m−−1 ) kem ( i ) ( ) ∗ 2 i 22 Prediction Error Signal

1. Speech Production Model p sn()=−+ asn ( k ) Gun () ∑ k Gu(n) s(n) k =1 H(z) Sz() G Hz()==p Uz() −k 1− ∑azk k =1 2. LPC Model: p en()=−=− sn () sn () sn ()α sn ( − k ) % ∑ k s(n) e(n) k =1 A(z) p α Ez() −k Az()==−1 ∑ k z Sz() k =1 3. LPC Error Model: 11Sz() ==p Az() Ez () −k α e(n) s(n) 1− ∑ k z k =1 1/A(z) p sn()=+ en ()α sn ( − k ) ∑ k Perfect reconstruction even if k =1 23 ak not equal to αk LPC Comparisons

24 LPC Comparisons

25 Comparisons Between LP Methods

• the various LP solution techniques can be compared in a number of ways, including the following: – computational issues – numerical issues – stability of solution – number of poles (order of predictor) – window/section length for analysis

26 LP Solution Computations

• assume L1≈ L2>>p; choose values of L1=300, L2=300, L3=300, p=10 • computation for

3 • covariance method ≈ L1p+p ≈ 4000 *,+ 2 • autocorrelation method ≈ L2p+p ≈ 3100 *,+

• lattice method ≈ 5L3p ≈ 15000 *,+ 27 LP Solution Comparisons

• stability – guaranteed for autocorrelation method – cannot be guaranteed for covariance method; as window size gets larger, this almost always makes the system stable – guaranteed for lattice method • choice of LP analysis parameters – need 2 poles for each vocal tract resonance below Fs/2 – need 3-4 poles to represent source shape and radiation load – use values of p ≈ 13-14

28 The Prediction Error Signal

29 LP Speech Analysis file:s5, ss:11000, frame size (L):320, lpc order (p):14, cov method Top panel: speech signal Second panel: error signal Third panel: log magnitude spectra of signal and LP model Fourth panel: log magnitude spectrum of error signal 30 LP Speech Analysis file:s5, ss:11000, frame size (L):320, lpc order (p):14, ac method

Top panel: speech signal Second panel: error signal Third panel: log magnitude spectra of signal and LP model Fourth panel: log magnitude spectrum of error signal

31 LP Speech Analysis file:s3, ss:14000, frame size (L):160, lpc order (p):16, cov method Top panel: speech signal Second panel: error signal Third panel: log magnitude spectra of signal and LP model Fourth panel: log magnitude spectrum of error signal 32 LP Speech Analysis file:s3, ss:14000, frame size (L):160, lpc order (p):16, ac method Top panel: speech signal Second panel: error signal Third panel: log magnitude spectra of signal and LP model Fourth panel: log magnitude spectrum of error signal

33 Variation of Prediction Error with LP Order

• prediction error flattens at around p=13-14 • normalized prediction error for unvoiced speech larger than for voiced speech => model isn’t quite as accurate 34 LP Solution Comparisons

• choice of LP analysis parameters – use small values of L for reduced computation – use l order of several pitch periods for reliable prediction-especially when using tapered window – use l from 100-400 samples at 10 kHz for autocorrelation – for lattice and covariance methods, L as small as 2 p has been used (within pitch periods); however if pitch pulse occurs within window, bad prediction results => use much larger values of L

35 Prediction Error Signal Behavior

- the prediction error signal is computed as p en ( )=− sn ( )α sn ( −= k ) Gun ( ) ∑ k k =1 - en ( ) should be large at the beginning of each pitch period (voiced speech) => good signal for pitch detection - can perform autocorrelation on en ( ) and detect largest peak - error spectrum is approximately flat-so effects of formants on pitch detection are minimized

36 Normalized Mean-Squared Error

- for autocorrelation method α - the prediction error sequence (defining α =−1 ) is Lp+−1 0 2 p emnˆ () ∑ emnnˆˆ ( )=−k sm ( − k ) m=0 ∑ V = k =0 nˆ L−1 ααφ 2 - giving many forms for the normalized error smnˆ () φ ∑ pp m=0 (,ij ) V = nˆ nˆ ∑∑ ijα φ - for covariance method nˆ (,)00 ij==00 φ L−1 p (,i 0 ) em2 () V =− nˆ ∑ nˆ nˆ ∑ i nˆ (,)00 V = m=0 i =0 nˆ L−1 p 2 2 sm() Vkˆ =−()1 i ∑ nˆ n ∏ m=0 i =1

37 Experimental Evaluations of LPC Parameters

38 Normalized Mean-Squared Error

• Vn versus p for synthetic vowel, pitch period of 83 samples, L=60, pitch synchronous analysis • covariance method-error goes to zero at p=11, the order of the synthesis filter

• autocorrelation method, Vn≈0.1 for p≥7 since error dominated by prediction at beginning of interval 39 Normalized Mean-Squared Error

• normalized error versus p for pitch asynchronous analysis, L=120, normalized error falls to 0.1 near p=11 for both covariance and autocorrelation methods 40 Normalized Mean-Squared Error

• normalized error versus L, p=12 •for L < pitch period (83 samples), covariance method gives smaller normalized error than autocorrelation method • for values of L at or near multiples of pitch period, normalized error jumps due to large prediction error in vicinity of pitch pulse

• when L > 2 * pitch period, normalized error same for both autocorrelation and 41 covariance methods Normalized Mean-Squared Error

• results for natural speech

42 Normalized Mean-Squared Error

• variability of normalized error with positioning of frame • sample-by-sample LP analysis of 40 msec of vowel /i/ • signal energy in part a; normalized error in part b for p=14 pole analysis using 20 msec frame size for covariance method; normalized error in part c for autocorrelation method using HW; normalized error in part d for autocorrelation method using RW • average pitch period of 84 samples => 2.5 pitch periods in 20 msec window • substantial variations in normalized error for covariance method-especially peaked when pitch pulses enter window => longer windows give larger normalized errors • substantial high frequency variations in normalized error for autocorrelation method with some pitch modulations 43 Properties of the LPC Polynomial

44 Minimum-Phase Property of A(z) Az( ) has all its zeros inside the unit circle 2 Proof: Assume that zzoo (> 1 ) is a zero (root) of Az() −1 A()zzzAz=− (1 o )′ () π The minimumπ mean-squared error is

∞ π 2221 ωωjj ω EemAeSednnˆˆ==[] |( )||( n ˆ )| ω ∑ ∫ ωω m=−∞ 2 − π π 22 2 1 − jj j =−10ze A′() e S () e d > 2 ∫ o nˆ ω −π 22 −∗−jjωω2 111−−zeoo= z(/ z o ) e Thus, Az ( ) could not be the optimum filter because

∗ 45 we could replace zzoo by (1 / ) and decrease the error. PARCORs and Stability

()i •≥⇒≥ prove that kzij11 for some j

i ()ii ()−1111−−− ii () () i − A ()zA=− () zkzAzij ( ) =−∏ (1 zz ) j =1

It is easily shown that − ki is the coefficient −ii() () i of zAz in ( ), i.e., αii= k . Therefore, p ()i kzij= ∏ j =1

If ki ≥ 1 , then either all the roots must be on the unit circle or at least one of them must be outside the unit circle. 46 PARCORs and Stability

•≥ if ki 1 , then either all the roots must be on the unit circle or at least one of them must be outside the unit circle. Since this is true for all Az()i ( ), ip= 12, ,..., , a necessary condition for the roots of Az()p ( ) to be inside the unit circle is:

kii <=112 , , ,..., p • for the i th -order optimum linear predictor, i ()ii21 (− ) 20 ( ) EkEkE=− (110ij ) =∏ ( − ) > j =1 ()p so ki < 1 and therefore Az ( ) has all its roots

inside the unit circle. 47 Root Locations for Optimum LP Model

GG Hz% ()==p Az() −i 1− ∑αi z i=1 GGzp ==pp −1 ∏∏(1−−ziizzz ) ( ) ii==11

48 Pole-Zero Plot for Model Which poles correspond to formant frequencies?

49 Pole Locations

50 Pole Locations (FS=10,000 Hz)

FF= (θ /180)⋅ (S / 2) 51 Estimating Formant Frequencies

• compute A(z) and factor it. • find roots that are close to the unit circle. • compute equivalent analog frequencies from the angles of the roots. • plot formant frequencies as a function of time.

52 Spectrogram with LPC Roots

53 Spectrogram with LPC Roots

54 Comparison to ABS Methods

• error measure for ABS methods is log ratio of power spectra, i.e.,

π 2 ⎪⎪⎧⎫⎡⎤|(Sej )|2 Ed′ = log n ω ⎨⎬⎢⎥j 2 ∫ |(He ω )| − ⎩⎭⎪⎪⎣⎦⎢⎥ π • thus for ABS minimizationω of E′ is equivalent to minimizing mean squared error between two log spectra

• comparing EELPC and ABS we see the following: o both error measures related to ratio of signal to model spectra o both tend to perform uniformly over whole frequency range o both are suitable to selective error minimization over specified frequency ranges o error criterion for LP modeling places higher weight on frequency regions jjωω22 where |Sn (eHe) |< | ( ) | , whereas the ABS error criterion places equal weight on both regions ⇒ for unsmoothed signal spectra (as obtained by STFT methods), LP error criterion yields better spectral matches than ABS error criterion ⇒ for smoothed signal spectra (as obtained at output of filter banks), both error criteria will perform well 55 Relation of LP Analysis to Lossless Tube Models

56 Discrete-Time Model - I

• Make all sections the same length with delay τ =Δxc/ where = Nx Δ l 57 Discrete-Time Model - II

N -section lossless tube model corresponds to discrete-time system when: cN FS = 2l where cN is the velocity of sound, is the number of tube sections, FS is the sampling frequency, and l is the total length of the vocal tract.

The reflection coefficients {,1rkNk ≤≤ − 1} are related to the areas of the lossless tubes by:

AAkk+1 − rk = AAkk+1 + Can find transfer function of digital system subject to constraints of the form rG =1,

ρcA/ NL− Z rrNL== ρcA/ NL+ Z 58 Discrete-Time Model - III

59 Discrete-Time Model - IV

Given the system function: N 051.(++rrz ) ( 1 ) −N / 2 Uz() Gk∏ Vz()==Lk=1 UzG () Dz ()

⎛⎞AAkk+1 − −≤11rk =⎜⎟ ≤ reflection coefficient ⎝⎠AAkk+1 + ρcA/ NL− Z if rRGGNL==∞==1 (i.e., ) and rr ρcA/ NL+ Z then Dz ( ) satisties the recursion

Dz0 ( )= 1 −−k 1 Dzkkkk()=+ DzrzDz−−11 () ( ) k =12 ,,K , N

Dz( )= DN ( z ) 60 All-Pole Lattice Filter for H(z) e[n] s[n]

k p k p−1 k1 − k − k p−1 − k1 p z−1 z−1 z−1

Az()0 ( )= 1 ()iiii (−−−−111 ) ( ) A ()zAzkzAzip=− ()i ( ) =12 ,,K , Az( )= A()p ( z ) 1 Sz()= Ez () Az()

If rkii=− and Np = , then DzAz ( ) = ( ). 61 Tube Areas from PARCORS

• Relation to areas:

⎛⎞⎛⎞AAii++11−−(/) AA ii1 −≤−=11krii =⎜⎟⎜⎟ = ≤ ⎝⎠⎝⎠AAii++11++(/) AA ii1

• Solve for Ai+1 in terms of Ai

⎛⎞1− ki Akii+1 ⎛⎞1− AAii+1 =>⎜⎟0 =>⎜⎟0 1+ k ⎝⎠i Akii⎝⎠1+ • Log area ratios (good for quantization) ⎛⎞A ⎛1− k ⎞Minimizes spectral g ==logii+1 log i ⎜⎟ ⎜ ⎟sensitivity under ⎝⎠Aii ⎝1+ k ⎠ uniform quantization 62 Estimating Tube Areas from Speech (Wakita)

1. Sample speech [sn [ ]= sa ( nT )] at sampling

rate FTpcs ==12/ / (l ). 2. Remove effects of glottal source and radiation by pre-emphasis xn[ ]=−− sn [ ] sn [1 ]. 3. Compute the PARCOR coefficients on a short-

time basis: kii , = 12 , ,..., p .

41. Assuming A1 = (arbitrary), compute

⎛⎞1− ki AAipii+1 ==−⎜⎟,12 , ,..., 1 ⎝⎠1+ ki H. Wakita, IEEE Trans. Audio and Electroacoustics,October, 1973. 63 Estimation of Tube Areas

Analysis of syllable /IY/ /B/ /AA/ • /IY/ sound in part (a) • /B/ sound in parts (b) and (c) • /AA/ sound in part (d)

Using Wakita method to estimate tube areas from the speech waveform as shown in lower plot.

64 Estimations of Tube Areas

Use of Wakita method to estimate tube areas for the five vowels, /IY/, /EH/, /AA/, /AO/, /UW/

65 Alternative Representations of the LP Parameters

66 LP Parameter Sets

67 PARCORs to Prediction Coefficients

•=assume that kii ,12 , ,K , p are given. Then we can

skip the computation of ki in the Levinson recursion. for ip= 12, ,K , ()i αii= k if iji>=−1121 , then for , ,K , ()ii=− (−−11 )k ( i ) jjαα α iij− end end ==()p jp12,, , jjαα K 68 Prediction Coefficients to PARCORs

•=assume that α j ,jp12 , ,K , are given. Then we can work backwards through the Levinson Recursion.

()p ααjj==for j 12, ,K ,p ()p kpp= α for ipp=−,12 ,K , for ji=−12, ,K , 1 ()ii+ k () ()i −1 = jiij− j 1− k 2 α i end αα ()i −1 kii−−11= end 69 α LP Parameter Transformations

• roots of the predictor polynomial p p Az ( ) 11 z−−k zz1 =−∑αkk =∏() − k =1 k =1 • where each root can be expressed as a z-plane or s-plane root, i.e.,

zzkkrki=+ jz ; s k =+Ω k j k

sTk zk = e giving σ

11−122⎡⎤zki Ω=k tan⎢⎥ ;σ k = log()zz kr + ki Tz⎣⎦kr 2 T • important for formant estimation 70 LP Parameter Transformations

• cepstrum of IR of overall LP system from predictor coefficients n−1 ⎛⎞k hnˆˆ ( )=+αα hk ( ) 1 ≤ n nnk∑⎜⎟ − k =1 ⎝⎠n • predictor coefficients from cepstrum of IR n−1 αα⎛⎞k =−hnˆˆ ( ) hk ( ) 1 ≤ n nnk∑⎜⎟ − k =1 ⎝⎠n where ∞ G Hz ( )== hnz ( ) −n ∑ p n=0 1 z−k − ∑αk k =1 71 LP Parameter Transformations • IR of all pole system p hn ( )=−+≤αδ hn ( k ) G ( n ) 0 n ∑ k k =1 • autocorrelation of IR ∞ Ri%% ( )=−=−∑ hnhn ( ) ( i ) R ( i ) n=0 p Ri%%()=−≤ R (| i k |) 1 i ∑ k k =1 α p RRkG%%()0 =+ () 2 ∑ k k =1 α 72 LP Parameter Transformations

• autocorrelation of the predictor polynomial p Az ( ) 1 z−k =−∑αk k =1 with IR of the inverse filter p an ( )=− ( n )δαδ ( n − k ) ∑ k k =1 with autocorrelation pi− R ( i )=+≤≤ akak ( ) ( i ) 0 i p a ∑ k =0 73 LP Parameter Transformations

• log area ratio coefficients from PARCOR coefficients

⎡⎤Akii+1 ⎡1− ⎤ gipi == log⎢⎥ log ⎢ ⎥1 ≤≤ ⎣⎦Akii ⎣1+ ⎦ with inverse relation 1− egi kipi =≤≤1 1+ egi

74 Quantization of LP Parameters • consider the magnitude-squared of the model frequency response 2 1 He (jω )== P (ω , g ) |(Aej )|2 where gP is a parameter ωthat affects . • spectral sensitivity can be defined as ∂S 11⎡⎤πωπ Pg(,ω ) = lim⎢⎥∫ log i d Δgi →0 ∂giiΔ+ΔgPgg⎣⎦2 − (, ii ) π ω which measures sensitivity to errors in the gi parameters

75 Quantization of LP Parameters

• spectral sensitivity for log area ratio

parameters, gi – low • spectral sensitivity for ki sensitivity for virtually parameters; low sensitivity entire range is seen around 0; high sensitivity around 1 76 Line Spectral Pair Parameters

−−12 −p Az()=+1 αα12 z + z α + ... + p z

= all-zero prediction filter with all zeros, zk , inside the unit circle −+()pppp11 − − 1 −+ 1 − −+ () 1 Az%()==++++ z Az ( )p z ...21 z z z = reciprocal polynomial with inverse zeros, 1 / z αααk consider the following: Az%() Lz ( )== allpass system ⇒ Le (jω ) =1 , all ω Az() form the symmetric polynomial Pz ( ) as: Pz ( )=+=+ Az ( ) Az%% ( ) Az ( ) z−+()p 11 Az ( − ) ⇒ Pz ( ) has zeros for Lz ( ) =−=−1 ; ( Az ( ) Az())

jω ⇒=+⋅=−arg{}Le (k ) ( k12 / ) 2π , k 01 , ,..., p 1 form the anti-symmetric polynomial Qz ( ) as: Qz ()=−=− Az{} () Az%% () Az () z−+()p 11 Az ( − ) ⇒ Qz () has zeros for Lz() =+=1 ;(() Az Az ())

j k π ⇒=⋅=−argLe ( )ω k201 , k , ,..., p 1 77 Line Spectral Pair Parameters

zeros of Pz ( ) and Qz ( ) fall on unit circle and are interleaved with

each other ⇒ set of {}ωk called Line Spectral Frequencies (LSF) LSFs are in ascending order stability of Hz ( ) guaranteed by quantizing LSF parameters ω Pe()jjωω+ Qe () Ae (j ) = 2 22 ω jj 2 Pe()+ Qe () Ae()j = ωω 4

78 Line Spectrum Pair (LSP) Parameters

• properties of LSP parameters

1. Pz ( ) corresponds to a lossless tube, open at the lips and open ( kp+1 = 1 ) at the glottis

2. Qz ( ) corresponds to a lossless tube, open at the lips and closed (kp+1 = −1 ) at the glottis 3. all the roots of Pz ( ) and Qz ( ) are on the unit circle 4. if pPzzQz is an even integer, then ( ) has a root at =+1 and ( ) has a root at z =−1

5. a necessary and sufficient condition for kii <=112 , , ,..., p is that the roots of Pz ( ) and Qz ( )alternate on the unit circle 6. the LSP frequencies get close together when roots of Az ( ) are close to the unit circle 7. the roots of Pz ( ) are approximately equal to the formant frequencies 79 LSP Example p = 12

∗ Pz( ) roots o Qz ( ) roots x Az ( ) roots

80 Line Spectrum Pair (LSP) Parameters

81 LPC Synthesis

• the basic synthesis equation is p sn ( ) sn ( k ) Gun ( ) %%=−+∑αk k =1 • need to update parameters every 10 msec or so • pitch synchronous mode works best

82 LPC Analysis-Synthesis

1. Extract αk parameters properly

2. Quantize αk parameters properly so that there is little quantization error

• Small number of bits go into coding the αk coefficients 3. Represent e(n) via: • Pitch pulses and noise—LPC Coding • Multiple pulses per 10 msec interval—MPLPC Coding • Codebook vectors—CELP • Almost all of the coding bits go into coding of e(n)

83 LPC Vocoder header

• bit rates of 72 FS where FS=100, 67, and 33 for bit rates of 7200 bps, 4800 bps and 2400 bps

3600 bps

2400 bps

VEV LPC

84 LPC Basics s (m) e (m) s (m) n A(z) n H(z) n

p −k Ez() Az ( )=−11∑αk z =− Pz ( ) = ; prediction error filter k =1 Sz() p

emnnˆˆ ( )=− sm ( )∑ k sm n ˆ ( − k ); prediction error k =1 α 11 1 Hz ( )==p = ; all pole model Az()−k 1− Pz () 1− ∑ k z k =1 2 ∞ ∞ α p 2 ⎛⎞ Eemsmnnˆˆ==−∑ []()∑∑⎜⎟ n ˆ () αksmnˆ ()− k m=−∞ mk=−∞⎝⎠ =1 85 LPC Basics-Speech Model e(n)=Gu(n) s(n) +

P(z) GGSz() Hz ( ) ===p −k 1− Pz() Uz () 1− ∑αk z k =1

jω G He()= p − jk 1− ∑ ke k =1 α 86 ω Summary

• the LP model has many interesting and useful properties that follow from the structure of the Levinson-Durbin algorithms • the different equivalent representations have different properties under quantization – polynomial coefficients (bad) – polynomial roots (okay) – PARCOR coefficients (okay) – lossless tube areas (good) – LSP root angles (good) • almost all LPC representations can be used with a range of compression schemes and are all good candidates for the technique of Vector Quantization

87