Linear Predictive Coding (LPC)


Linear Predictive Coding (LPC): Introduction
Digital Speech Processing— Lecture 13

LPC Methods
• LPC methods are the most widely used methods in speech coding, speech synthesis, speech recognition, speaker recognition and verification, and speech storage
  – LPC methods provide extremely accurate estimates of speech parameters, and do so extremely efficiently
  – basic idea of Linear Prediction: the current speech sample can be closely approximated as a linear combination of past samples, i.e.,
        s(n) ≈ Σ_{k=1..p} α_k s(n−k)   for some value of p and coefficients α_k

LPC Methods
• LP is based on speech production and synthesis models
  – speech can be modeled as the output of a linear, time-varying system, excited by either quasi-periodic pulses or noise
  – assume that the model parameters remain constant over the speech analysis interval
• LP provides a robust, reliable and accurate method for estimating the parameters of the linear system (the combined vocal tract, glottal pulse, and radiation characteristic for voiced speech)

LPC Methods
• for periodic signals with period N_p, it is obvious that s(n) ≈ s(n − N_p); but that is not what LP is doing: it is estimating s(n) from the p (<< N_p) most recent values of s(n) by linearly predicting its value
• for LP, the predictor coefficients (the α_k's) are determined (computed) by minimizing the sum of squared differences (over a finite interval) between the actual speech samples and the linearly predicted ones

LPC Methods
• LP methods have also been used in control and information theory, where they are called methods of system estimation and system identification
  – used extensively in speech under a group of names, including:
    1. covariance method
    2. autocorrelation method
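The basic prediction idea can be sketched in a few lines of Python (an illustrative demo, not from the slides; the 100 Hz tone, 8 kHz sampling rate, and order p = 2 are assumptions chosen for the example): a pure sinusoid satisfies an exact second-order recursion, so a p = 2 linear predictor reproduces it with essentially zero error.

```python
import math

# Sketch of the basic LP idea (the 100 Hz tone, 8 kHz rate, and p = 2 are
# illustrative assumptions): a pure sinusoid obeys the exact recursion
#   s(n) = 2 cos(w) s(n-1) - s(n-2),
# so a second-order linear predictor reproduces it with essentially zero error.
w = 2 * math.pi * 100 / 8000              # 100 Hz tone at 8 kHz sampling
s = [math.sin(w * n) for n in range(50)]
alpha = [2 * math.cos(w), -1.0]           # predictor coefficients, p = 2

def predict(n):
    # s~(n) = sum_{k=1..p} alpha_k s(n-k)
    return sum(a * s[n - k] for k, a in enumerate(alpha, start=1))

errors = [abs(s[n] - predict(n)) for n in range(2, 50)]
print(max(errors))                        # on the order of machine epsilon
```

Real speech is not exactly predictable like this, which is why the coefficients must be estimated by minimizing the squared error, as the following slides develop.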
    3. lattice method
    4. inverse filter formulation
    5. spectral estimation formulation
    6. maximum likelihood method
    7. inner product method

Basic Principles of LP
• the speech production model is
      s(n) = Σ_{k=1..p} a_k s(n−k) + G u(n)
• the time-varying digital filter represents the combined effects of the glottal pulse shape, the vocal tract impulse response, and radiation at the lips
• the system is excited by an impulse train for voiced speech, or a random noise sequence for unvoiced speech
• the transfer function is
      H(z) = S(z) / (G U(z)) = 1 / (1 − Σ_{k=1..p} a_k z^{−k})
• this "all-pole" model is a natural representation for non-nasal voiced speech, but it also works reasonably well for nasals and unvoiced sounds

LP Basic Equations
• a p-th order linear predictor is a system of the form
      s̃(n) = Σ_{k=1..p} α_k s(n−k)  ⇔  P(z) = S̃(z)/S(z) = Σ_{k=1..p} α_k z^{−k}
• the prediction error, e(n), is of the form
      e(n) = s(n) − s̃(n) = s(n) − Σ_{k=1..p} α_k s(n−k)
• the prediction error is the output of a system with transfer function
      A(z) = E(z)/S(z) = 1 − P(z) = 1 − Σ_{k=1..p} α_k z^{−k}
• if the speech signal obeys the production model exactly, and if α_k = a_k for 1 ≤ k ≤ p, then e(n) = G u(n) and A(z) is an inverse filter for H(z), i.e., H(z) = 1/A(z)

LP Estimation Issues
• need to determine the {α_k} directly from the speech such that they give good estimates of the time-varying spectrum
• need to estimate the {α_k} from short segments of speech
• need to minimize the mean-squared prediction error over short segments of speech
• the resulting {α_k} are assumed to be the actual {a_k} in the speech production model
• intend to show that all of this can be done efficiently, reliably, and accurately for speech

Solution for {α_k}
• the short-time average prediction squared error is defined as
      E_n̂ = Σ_m e_n̂²(m) = Σ_m ( s_n̂(m) − s̃_n̂(m) )² = Σ_m ( s_n̂(m) − Σ_{k=1..p} α_k s_n̂(m−k) )²
• select a segment of speech s_n̂(m) = s(n̂ + m) in the vicinity of sample n̂
• the key issue to resolve is the range of m for the summation (to be discussed later)

Solution for {α_k}
• can find the values of α_k that minimize E_n̂ by setting
      ∂E_n̂/∂α_i = 0,   i = 1, 2, ..., p
• giving the set of equations
      −2 Σ_m s_n̂(m−i) [ s_n̂(m) − Σ_{k=1..p} α̂_k s_n̂(m−k) ] = 0,   1 ≤ i ≤ p
  or, equivalently,
      Σ_m s_n̂(m−i) e_n̂(m) = 0,   1 ≤ i ≤ p
• where the α̂_k are the values of α_k that minimize E_n̂ (from now on just use α_k rather than α̂_k for the optimum values)
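The inverse-filter relation e(n) = G u(n) can be checked numerically. This sketch uses made-up model coefficients and a single-impulse excitation (none of these values come from the slides): it synthesizes s(n) through the all-pole model and then applies A(z) = 1 − Σ a_k z^{−k} to recover the excitation.

```python
# Numerical check of the inverse-filter relation (a made-up example, not from
# the slides): synthesize s(n) through the all-pole model with known a_k and
# gain G, then apply A(z) = 1 - sum_k a_k z^{-k}; the output equals G u(n).
a = [0.9, -0.2]                # assumed model coefficients, p = 2 (stable)
G = 0.5
u = [1.0] + [0.0] * 29         # single-impulse excitation

s = []                         # s(n) = G u(n) + sum_k a_k s(n-k)
for n in range(30):
    sn = G * u[n]
    for k, ak in enumerate(a, start=1):
        if n - k >= 0:
            sn += ak * s[n - k]
    s.append(sn)

e = []                         # e(n) = s(n) - sum_k a_k s(n-k)
for n in range(30):
    en = s[n]
    for k, ak in enumerate(a, start=1):
        if n - k >= 0:
            en -= ak * s[n - k]
    e.append(en)

print(all(abs(e[n] - G * u[n]) < 1e-12 for n in range(30)))   # prints True
```

With α_k = a_k the residual is exactly the scaled excitation; with estimated coefficients it is only approximately so, which is what the minimization in the following slides addresses.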
• the prediction error e_n̂(m) is orthogonal to the signal s_n̂(m−i) for delays i of 1 to p

Solution for {α_k}
• the minimum mean-squared prediction error has the form
      E_n̂ = Σ_m s_n̂²(m) − Σ_{k=1..p} α_k Σ_m s_n̂(m) s_n̂(m−k)
• defining
      φ_n̂(i,k) = Σ_m s_n̂(m−i) s_n̂(m−k)
  we get
      Σ_{k=1..p} α_k φ_n̂(i,k) = φ_n̂(i,0),   i = 1, 2, ..., p
  and the minimum error can be written as
      E_n̂ = φ_n̂(0,0) − Σ_{k=1..p} α_k φ_n̂(0,k)
• this is a set of p equations in p unknowns that can be solved in an efficient manner for the {α_k}
• Process:
  1. compute φ_n̂(i,k) for 1 ≤ i ≤ p, 0 ≤ k ≤ p
  2. solve the matrix equation for the α_k
• need to specify the range of m used to compute φ_n̂(i,k)
• need to specify s_n̂(m)

Autocorrelation Method
• assume s_n̂(m) exists for 0 ≤ m ≤ L−1 and is exactly zero everywhere else, i.e., a window of length L samples (Assumption #1):
      s_n̂(m) = s(n̂ + m) w(m),   0 ≤ m ≤ L−1
  where w(m) is a finite-length window of length L samples
• if s_n̂(m) is non-zero only for 0 ≤ m ≤ L−1, then
      e_n̂(m) = s_n̂(m) − Σ_{k=1..p} α_k s_n̂(m−k)
  is non-zero only over the interval 0 ≤ m ≤ L−1+p, giving
      E_n̂ = Σ_{m=−∞..∞} e_n̂²(m) = Σ_{m=0..L+p−1} e_n̂²(m)
• at values of m near 0 (i.e., m = 0, 1, ..., p−1) we are predicting the signal from zero-valued samples outside the window range, so e_n̂(m) will be (relatively) large
• at values of m near L (i.e., m = L, L+1, ..., L+p−1) we are predicting zero-valued samples (outside the window range) from non-zero samples, so e_n̂(m) will again be (relatively) large; prediction errors are therefore largest at the two ends of the window
• for these reasons, windows that taper the segment to zero (e.g., a Hamming window) are normally used

The "Autocorrelation Method"
• the short-time autocorrelation of the windowed segment is
      R_n̂(k) = Σ_{m=0..L−1−k} s_n̂(m) s_n̂(m+k),   k = 1, 2, ..., p
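The windowing and short-time autocorrelation steps can be sketched as follows (the test tone, frame position, L = 240, and p = 10 are illustrative choices, not values from the lecture):

```python
import math

# Short-time autocorrelation for the autocorrelation method (sketch; the
# test tone, frame position, L = 240 and p = 10 are illustrative choices).
L, p, n_hat = 240, 10, 300
x = [math.sin(2 * math.pi * 0.02 * n) for n in range(1000)]

# Hamming window, tapering the segment to zero at both ends
w = [0.54 - 0.46 * math.cos(2 * math.pi * m / (L - 1)) for m in range(L)]
s = [x[n_hat + m] * w[m] for m in range(L)]     # s_n(m), zero outside 0..L-1

def R(k):
    # R(k) = sum_{m=0}^{L-1-k} s(m) s(m+k)
    return sum(s[m] * s[m + k] for m in range(L - k))

r = [R(k) for k in range(p + 1)]
print(r[0] >= max(abs(v) for v in r))           # R(0) dominates: prints True
```

R(0) is the energy of the windowed segment and, by the Cauchy-Schwarz inequality, bounds every other lag, which is what makes the resulting normal-equation matrix well behaved.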
Autocorrelation Method
• for the calculation of φ_n̂(i,k): since s_n̂(m) = 0 outside the range 0 ≤ m ≤ L−1,
      φ_n̂(i,k) = Σ_{m=0..L−1+p} s_n̂(m−i) s_n̂(m−k),   1 ≤ i ≤ p, 0 ≤ k ≤ p
• which is equivalent to the form
      φ_n̂(i,k) = Σ_{m=0..L−1−(i−k)} s_n̂(m) s_n̂(m+i−k),   1 ≤ i ≤ p, 0 ≤ k ≤ p
• there are L − |i−k| non-zero terms in the computation of φ_n̂(i,k) for each value of i and k; it is easily shown that
      φ_n̂(i,k) = R_n̂(i−k),   1 ≤ i ≤ p, 0 ≤ k ≤ p
  where R_n̂(i−k) is the short-time autocorrelation of s_n̂(m) evaluated at lag i−k, with
      R_n̂(k) = Σ_{m=0..L−1−k} s_n̂(m) s_n̂(m+k)

Autocorrelation Method
• since R_n̂(k) is even, φ_n̂(i,k) = R_n̂(|i−k|), 1 ≤ i ≤ p, 0 ≤ k ≤ p
• thus the basic equation becomes
      Σ_{k=1..p} α_k R_n̂(|i−k|) = R_n̂(i),   1 ≤ i ≤ p
• with the minimum mean-squared prediction error of the form
      E_n̂ = R_n̂(0) − Σ_{k=1..p} α_k R_n̂(k)

Autocorrelation Method
• expressed in matrix form (writing R(k) for R_n̂(k)), ℜα = r:

      [ R(0)    R(1)    ...  R(p−1) ] [ α_1 ]   [ R(1) ]
      [ R(1)    R(0)    ...  R(p−2) ] [ α_2 ] = [ R(2) ]
      [  ...     ...    ...   ...   ] [ ... ]   [ ...  ]
      [ R(p−1)  R(p−2)  ...  R(0)   ] [ α_p ]   [ R(p) ]

  with solution α = ℜ^{−1} r
• ℜ is a p × p Toeplitz matrix: symmetric, with all elements along each diagonal equal; consequently there exist algorithms for solving for the {α_k} that are more efficient than simple matrix inversion

Covariance Method
• there is a second basic approach to defining the speech segment s_n̂(m) and the limits on the sums, namely to fix the interval over which the mean-squared error is computed (Assumption #2):
      E_n̂ = Σ_{m=0..L−1} e_n̂²(m) = Σ_{m=0..L−1} ( s_n̂(m) − Σ_{k=1..p} α_k s_n̂(m−k) )²
      φ_n̂(i,k) = Σ_{m=0..L−1} s_n̂(m−i) s_n̂(m−k),   1 ≤ i ≤ p, 0 ≤ k ≤ p
• the key difference from the autocorrelation method is that the limits of summation include terms before m = 0, so the window effectively extends p samples backwards, covering s(n̂ − p) to s(n̂ + L − 1)
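The slides only note that the Toeplitz structure admits solutions more efficient than matrix inversion; the classic such algorithm is the Levinson-Durbin recursion, sketched below. The recursion itself is standard, but the AR(2) test signal and its coefficients are invented here for illustration and are not part of the original lecture.

```python
import random

def levinson_durbin(r, p):
    # Solve  sum_k alpha_k R(|i-k|) = R(i),  i = 1..p,
    # in O(p^2) operations instead of O(p^3) matrix inversion.
    a = [0.0] * (p + 1)
    E = r[0]                               # prediction error energy
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / E
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a, E = new_a, E * (1.0 - k * k)
    return a[1:], E

# recover the coefficients of a synthetic AR(2) signal from its
# short-time autocorrelation (signal parameters are made up for the demo)
random.seed(0)
true_a = [1.3, -0.6]
s = [0.0, 0.0]
for n in range(2, 20000):
    s.append(true_a[0] * s[-1] + true_a[1] * s[-2] + random.gauss(0, 1))
L = len(s)
r = [sum(s[m] * s[m + k] for m in range(L - k)) for k in range(3)]
alpha, E = levinson_durbin(r, 2)
print([round(v, 2) for v in alpha])        # close to [1.3, -0.6]
```

Because the sample autocorrelation sequence is positive semidefinite, each reflection coefficient k stays inside (−1, 1), so the residual energy E remains positive throughout the recursion.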
• since the window extends backwards rather than being truncated, there is no need to taper it with a Hamming window: there are no transitions at the window edges
• changing the summation index gives
      φ_n̂(i,k) = Σ_{m=−i..L−1−i} s_n̂(m) s_n̂(m+i−k),   1 ≤ i ≤ p, 0 ≤ k ≤ p
               = Σ_{m=−k..L−1−k} s_n̂(m) s_n̂(m+k−i),   1 ≤ i ≤ p, 0 ≤ k ≤ p
• the autocorrelation formulation cannot be used here: this is a true cross-correlation

Covariance Method
• need to solve a set of equations of the form
      Σ_{k=1..p} α_k φ_n̂(i,k) = φ_n̂(i,0),   i = 1, 2, ..., p
  with minimum error
      E_n̂ = φ_n̂(0,0) − Σ_{k=1..p} α_k φ_n̂(0,k)
• in matrix form (writing φ(i,k) for φ_n̂(i,k)), Φα = ψ:

      [ φ(1,1)  φ(1,2)  ...  φ(1,p) ] [ α_1 ]   [ φ(1,0) ]
      [ φ(2,1)  φ(2,2)  ...  φ(2,p) ] [ α_2 ] = [ φ(2,0) ]
      [  ...     ...    ...   ...   ] [ ... ]   [  ...   ]
      [ φ(p,1)  φ(p,2)  ...  φ(p,p) ] [ α_p ]   [ φ(p,0) ]

  with solution α = Φ^{−1} ψ
• since φ_n̂(i,k) = φ_n̂(k,i), the matrix is symmetric but not Toeplitz; its diagonal elements are related as
      φ_n̂(i+1, k+1) = φ_n̂(i,k) + s_n̂(−i−1) s_n̂(−k−1) − s_n̂(L−1−i) s_n̂(L−1−k)
  e.g., φ_n̂(2,2) = φ_n̂(1,1) + s_n̂(−2)² − s_n̂(L−2)²
• every term φ_n̂(i,k) has a fixed number (L) of terms contributing to its computation

Summary of LP
• use a p-th order linear predictor to predict s(n̂) from the p previous samples
• minimize the mean-squared error, E_n̂, over an analysis window of duration L samples
• the solution for the optimum predictor coefficients, {α_k}, is based on solving a matrix equation; two solutions have evolved:
  – autocorrelation method: the signal is windowed by a tapering window in order to minimize the discontinuities at the beginning (predicting speech from zero-valued samples) and end (predicting zero-valued samples from speech samples) of the interval; the matrix φ_n̂(i,k) is shown to be an autocorrelation function
  – covariance method: all terms φ_n̂(i,k) have a fixed number of terms contributing to the computation; the matrix is a true cross-correlation, symmetric but not Toeplitz
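The covariance method can be sketched end to end as follows. This is an illustrative implementation under stated assumptions (the AR(2) test signal, frame position n_hat, and L are invented for the demo); since Φ is not Toeplitz, the system is solved here by plain Gaussian elimination rather than a Toeplitz-specific recursion.

```python
import random

# Sketch of the covariance method (illustrative, not from the slides):
# fix the error interval to m = 0..L-1, build phi(i,k) using the p samples
# before the frame as well, and solve the symmetric (non-Toeplitz) system.
random.seed(1)
p, L, n_hat = 2, 1800, 100
true_a = [1.3, -0.6]                   # assumed AR(2) coefficients
x = [0.0, 0.0]
for n in range(2, 2000):
    x.append(true_a[0] * x[-1] + true_a[1] * x[-2] + random.gauss(0, 1))

def s(m):                              # frame access; m may reach back to -p
    return x[n_hat + m]

def phi(i, k):
    # phi(i,k) = sum_{m=0}^{L-1} s(m-i) s(m-k), no windowing needed
    return sum(s(m - i) * s(m - k) for m in range(L))

# assemble  sum_k alpha_k phi(i,k) = phi(i,0)  and solve by elimination
A = [[phi(i, k) for k in range(1, p + 1)] for i in range(1, p + 1)]
b = [phi(i, 0) for i in range(1, p + 1)]
for col in range(p):                   # tiny Gaussian elimination (no pivoting)
    for row in range(col + 1, p):
        f = A[row][col] / A[col][col]
        A[row] = [ar - f * ac for ar, ac in zip(A[row], A[col])]
        b[row] -= f * b[col]
alpha = [0.0] * p
for i in range(p - 1, -1, -1):
    alpha[i] = (b[i] - sum(A[i][j] * alpha[j] for j in range(i + 1, p))) / A[i][i]
print(alpha)                           # recovered coefficients, close to true_a
```

Because no tapering window is applied and every φ(i,k) sums exactly L products, the covariance estimates track the underlying coefficients closely even for modest frame lengths.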