
Digital Speech Processing: Lecture 13
Linear Predictive Coding (LPC) - Introduction

LPC Methods
• LPC methods are the most widely used in speech coding, speech synthesis, speech recognition, speaker recognition and verification, and for speech storage
  – LPC methods provide extremely accurate estimates of speech parameters, and do so extremely efficiently
  – basic idea of Linear Prediction: the current speech sample can be closely approximated as a linear combination of past samples, i.e.,

    s(n) \approx \sum_{k=1}^{p} \alpha_k\, s(n-k)

  for some value of p and some set of coefficients \alpha_k

LPC Methods
• LP is based on speech production and synthesis models
  – speech can be modeled as the output of a linear, time-varying system, excited by either quasi-periodic pulses or noise
  – assume that the model parameters remain constant over the speech analysis interval
• LP provides a robust, reliable and accurate method for estimating the parameters of the linear system (the combined vocal tract, glottal pulse, and radiation characteristic for voiced speech)

LPC Methods
• for periodic signals with period N_p, it is obvious that s(n) \approx s(n - N_p), but that is not what LP is doing; it estimates s(n) from the p (\ll N_p) most recent values of s(n) by linearly predicting its value
• for LP, the predictor coefficients (the \alpha_k's) are determined (computed) by minimizing the sum of squared differences (over a finite interval) between the actual speech samples and the linearly predicted ones

LPC Methods
• LP methods have been used in control and information theory, where they are called methods of system estimation and system identification
• used extensively in speech under a group of names including:
  1. covariance method
  2. autocorrelation method
  3. lattice method
  4. inverse filter formulation
  5. spectral estimation formulation
  6. maximum likelihood method
  7. inner product method

Basic Principles of LP
• the speech production model is

    s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\, u(n)

• the time-varying digital filter

    H(z) = \frac{S(z)}{G\,U(z)} = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}}

  represents the combined effects of the glottal pulse shape, the vocal tract impulse response, and radiation at the lips
• the system is excited by an impulse train for voiced speech, or a random noise sequence for unvoiced speech
• this 'all-pole' model is a natural representation for non-nasal voiced speech, but it also works reasonably well for nasals and unvoiced sounds

LP Basic Equations
• a pth order linear predictor is a system of the form

    \tilde{s}(n) = \sum_{k=1}^{p} \alpha_k\, s(n-k) \;\Leftrightarrow\; P(z) = \frac{\tilde{S}(z)}{S(z)} = \sum_{k=1}^{p} \alpha_k z^{-k}

• the prediction error, e(n), is of the form

    e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} \alpha_k\, s(n-k)

• the prediction error is the output of a system with transfer function

    A(z) = \frac{E(z)}{S(z)} = 1 - P(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k}

• if the speech signal obeys the production model exactly, and if \alpha_k = a_k for 1 \le k \le p, then e(n) = G\,u(n) and A(z) is an inverse filter for H(z), i.e., H(z) = 1/A(z)

LP Estimation Issues
• need to determine \{\alpha_k\} directly from speech such that they give good estimates of the time-varying spectrum
• need to estimate \{\alpha_k\} from short segments of speech
• need to minimize the mean-squared prediction error over short segments of speech
• the resulting \{\alpha_k\} are assumed to be the actual \{a_k\} of the speech production model
• intend to show that all of this can be done efficiently, reliably, and accurately for speech

Solution for \{\alpha_k\}
• the short-time average prediction squared-error is defined as

    E_{\hat n} = \sum_m e_{\hat n}^2(m) = \sum_m \left( s_{\hat n}(m) - \tilde{s}_{\hat n}(m) \right)^2 = \sum_m \left( s_{\hat n}(m) - \sum_{k=1}^{p} \alpha_k\, s_{\hat n}(m-k) \right)^2

• select a segment of speech s_{\hat n}(m) = s(\hat n + m) in the vicinity of sample \hat n
• the key issue to resolve is the range of m for the summation (to be discussed later)

Solution for \{\alpha_k\}
• can find the values of \alpha_k that minimize E_{\hat n} by setting

    \frac{\partial E_{\hat n}}{\partial \alpha_i} = 0, \quad i = 1, 2, \ldots, p

• giving the set of equations

    \sum_m -2\, s_{\hat n}(m-i) \left[ s_{\hat n}(m) - \sum_{k=1}^{p} \hat{\alpha}_k\, s_{\hat n}(m-k) \right] = 0, \quad 1 \le i \le p

    \sum_m -2\, s_{\hat n}(m-i)\, e_{\hat n}(m) = 0, \quad 1 \le i \le p

• where \hat{\alpha}_k are the values of \alpha_k that minimize E_{\hat n} (from now on just use \alpha_k rather than \hat{\alpha}_k for the optimum values)
• the prediction error e_{\hat n}(m) is orthogonal to the signal s_{\hat n}(m-i) for delays i of 1 to p
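The prediction-error relation above can be made concrete with a short sketch in plain Python (the first-order signal and the coefficient value 0.9 are invented for illustration, not taken from the lecture):

```python
# Sketch of the prediction error (inverse) filter A(z):
# e(n) = s(n) - sum_{k=1}^{p} alpha_k s(n-k), with samples before n = 0
# taken as zero.

def prediction_error(s, alpha):
    """Apply A(z) = 1 - sum_k alpha_k z^{-k} to the sequence s."""
    p = len(alpha)
    e = []
    for n in range(len(s)):
        pred = sum(alpha[k] * s[n - 1 - k] for k in range(p) if n - 1 - k >= 0)
        e.append(s[n] - pred)
    return e

# If s(n) obeys the all-pole model exactly, A(z) recovers the excitation
# G u(n): here s is the impulse response of H(z) = 1 / (1 - 0.9 z^-1), so a
# first-order predictor with alpha_1 = 0.9 reduces the error to an impulse.
s = [0.9 ** n for n in range(10)]
e = prediction_error(s, [0.9])
```

This mirrors the statement in the slides that when \alpha_k = a_k, the error e(n) collapses to the excitation G\,u(n).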
Solution for \{\alpha_k\}
• the minimum mean-squared prediction error has the form

    E_{\hat n} = \sum_m s_{\hat n}^2(m) - \sum_{k=1}^{p} \alpha_k \sum_m s_{\hat n}(m)\, s_{\hat n}(m-k)

Solution for \{\alpha_k\}
• defining

    \phi_{\hat n}(i,k) = \sum_m s_{\hat n}(m-i)\, s_{\hat n}(m-k)

• we get

    \sum_{k=1}^{p} \alpha_k\, \phi_{\hat n}(i,k) = \phi_{\hat n}(i,0), \quad i = 1, 2, \ldots, p

  leading to a set of p equations in p unknowns that can be solved in an efficient manner for the \{\alpha_k\}
• the minimum error can then be written in the form

    E_{\hat n} = \phi_{\hat n}(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_{\hat n}(0,k)

• Process:
  1. compute \phi_{\hat n}(i,k) for 1 \le i \le p, 0 \le k \le p
  2. solve the matrix equation for the \alpha_k
• need to specify the range of m used to compute \phi_{\hat n}(i,k)
• need to specify s_{\hat n}(m)

Autocorrelation Method
• assume s_{\hat n}(m) exists for 0 \le m \le L-1 and is exactly zero everywhere else, i.e., a window of length L samples (Assumption #1):

    s_{\hat n}(m) = s(\hat n + m)\, w(m), \quad 0 \le m \le L-1

  where w(m) is a finite-length window of length L samples

Autocorrelation Method
• if s_{\hat n}(m) is non-zero only for 0 \le m \le L-1, then

    e_{\hat n}(m) = s_{\hat n}(m) - \sum_{k=1}^{p} \alpha_k\, s_{\hat n}(m-k)

  is non-zero only over the interval 0 \le m \le L+p-1, giving

    E_{\hat n} = \sum_{m=-\infty}^{\infty} e_{\hat n}^2(m) = \sum_{m=0}^{L+p-1} e_{\hat n}^2(m)

• at values of m near 0 (i.e., m = 0, 1, \ldots, p-1) we are predicting the signal from zero-valued samples outside the window range, so e_{\hat n}(m) will be (relatively) large
• at values of m near L (i.e., m = L, L+1, \ldots, L+p-1) we are predicting zero-valued samples (outside the window range) from non-zero samples, so e_{\hat n}(m) will again be (relatively) large
• for these reasons, we normally use windows that taper the segment to zero (e.g., a Hamming window)

The "Autocorrelation Method"
(figure: a segment of s[m] is selected between \hat n and \hat n + L - 1 and windowed to give s_{\hat n}[m] = s[\hat n + m]\, w[m])

Autocorrelation Method
• the short-time autocorrelation of the windowed segment is

    R_{\hat n}[k] = \sum_{m=0}^{L-1-k} s_{\hat n}[m]\, s_{\hat n}[m+k], \quad k = 1, 2, \ldots, p
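The windowing and short-time autocorrelation steps of Assumption #1 can be sketched directly. In this sketch the window length L = 160, order p = 10, start sample n_hat = 40, and the sinusoidal test signal are all invented illustrative values, not taken from the lecture:

```python
import math

# Sketch of the autocorrelation method setup: select L samples starting at
# n_hat, taper them with a Hamming window, and form the short-time
# autocorrelation R[k] that feeds the normal equations.

def hamming(L):
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * m / (L - 1)) for m in range(L)]

def windowed_segment(s, n_hat, L):
    """s_nhat[m] = s[n_hat + m] w[m] for 0 <= m <= L-1 (zero elsewhere by assumption)."""
    w = hamming(L)
    return [s[n_hat + m] * w[m] for m in range(L)]

def autocorr(seg, p):
    """R[k] = sum_{m=0}^{L-1-k} seg[m] seg[m+k], for k = 0..p."""
    L = len(seg)
    return [sum(seg[m] * seg[m + k] for m in range(L - k)) for k in range(p + 1)]

s = [math.sin(0.3 * n) for n in range(240)]   # stand-in for a voiced segment
seg = windowed_segment(s, n_hat=40, L=160)
R = autocorr(seg, p=10)                       # R[0..p] feeds the Toeplitz system
```

By the Cauchy-Schwarz inequality, |R[k]| never exceeds R[0], which is what makes the Toeplitz system well behaved.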
The "Autocorrelation Method"
(figure: windowed segment s_{\hat n}[m] = s[\hat n + m]\, w[m] and the resulting prediction error e_{\hat n}[m] = s_{\hat n}[m] - \sum_{k=1}^{p} \alpha_k\, s_{\hat n}[m-k] over 0 \le m \le L+p-1, showing large errors at the ends of the window)

Autocorrelation Method
• for the calculation of \phi_{\hat n}(i,k): since s_{\hat n}(m) = 0 outside the range 0 \le m \le L-1,

    \phi_{\hat n}(i,k) = \sum_{m=0}^{L+p-1} s_{\hat n}(m-i)\, s_{\hat n}(m-k), \quad 1 \le i \le p,\; 0 \le k \le p

• which is equivalent to the form

    \phi_{\hat n}(i,k) = \sum_{m=0}^{L-1-(i-k)} s_{\hat n}(m)\, s_{\hat n}(m+i-k), \quad 1 \le i \le p,\; 0 \le k \le p

• there are L - |i-k| non-zero terms in the computation of \phi_{\hat n}(i,k) for each value of i and k; it is easy to show that

    \phi_{\hat n}(i,k) = R_{\hat n}(i-k), \quad 1 \le i \le p,\; 0 \le k \le p

  where R_{\hat n}(i-k) is the short-time autocorrelation of s_{\hat n}(m) evaluated at i-k, with

    R_{\hat n}(k) = \sum_{m=0}^{L-1-k} s_{\hat n}(m)\, s_{\hat n}(m+k)

Autocorrelation Method
• since R_{\hat n}(k) is even, \phi_{\hat n}(i,k) = R_{\hat n}(|i-k|) for 1 \le i \le p, 0 \le k \le p
• thus the basic equation becomes

    \sum_{k=1}^{p} \alpha_k\, R_{\hat n}(|i-k|) = R_{\hat n}(i), \quad 1 \le i \le p

• with the minimum mean-squared prediction error of the form

    E_{\hat n} = \phi_{\hat n}(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_{\hat n}(0,k) = R_{\hat n}(0) - \sum_{k=1}^{p} \alpha_k\, R_{\hat n}(k)

Autocorrelation Method
• expressed in matrix form:

    \begin{bmatrix}
    R_{\hat n}(0) & R_{\hat n}(1) & \cdots & R_{\hat n}(p-1) \\
    R_{\hat n}(1) & R_{\hat n}(0) & \cdots & R_{\hat n}(p-2) \\
    \vdots & \vdots & \ddots & \vdots \\
    R_{\hat n}(p-1) & R_{\hat n}(p-2) & \cdots & R_{\hat n}(0)
    \end{bmatrix}
    \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix}
    =
    \begin{bmatrix} R_{\hat n}(1) \\ R_{\hat n}(2) \\ \vdots \\ R_{\hat n}(p) \end{bmatrix}

  i.e., \mathbf{R}\,\boldsymbol{\alpha} = \mathbf{r}, with solution \boldsymbol{\alpha} = \mathbf{R}^{-1}\mathbf{r}
• \mathbf{R} is a p \times p Toeplitz matrix (symmetric, with all elements equal along each diagonal), so there exist more efficient algorithms for solving for the \{\alpha_k\} than simple matrix inversion

Covariance Method
• there is a second basic approach to defining the speech segment s_{\hat n}(m) and the limits on the sums: fix the interval over which the mean-squared error is computed (Assumption #2):

    E_{\hat n} = \sum_{m=0}^{L-1} e_{\hat n}^2(m) = \sum_{m=0}^{L-1} \left[ s_{\hat n}(m) - \sum_{k=1}^{p} \alpha_k\, s_{\hat n}(m-k) \right]^2

    \phi_{\hat n}(i,k) = \sum_{m=0}^{L-1} s_{\hat n}(m-i)\, s_{\hat n}(m-k), \quad 1 \le i \le p,\; 0 \le k \le p

Covariance Method
• changing the summation index gives

    \phi_{\hat n}(i,k) = \sum_{m=-i}^{L-1-i} s_{\hat n}(m)\, s_{\hat n}(m+i-k) = \sum_{m=-k}^{L-1-k} s_{\hat n}(m)\, s_{\hat n}(m+k-i), \quad 1 \le i \le p,\; 0 \le k \le p

• the key difference from the autocorrelation method is that the limits of summation include terms before m = 0: the window effectively extends p samples backwards, covering s(\hat n - p) to s(\hat n + L - 1)
• since we are extending the window backwards, there is no need to taper it with a Hamming window, as there is no transition at the window edges
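The slides observe that the Toeplitz structure of the autocorrelation equations admits more efficient solutions than direct matrix inversion. The standard such algorithm (not derived in this excerpt) is the Levinson-Durbin recursion; a minimal sketch, with an invented autocorrelation sequence for the check:

```python
# Sketch of the Levinson-Durbin recursion for the autocorrelation normal
# equations sum_k alpha_k R(|i-k|) = R(i), i = 1..p. This is the standard
# O(p^2) solver that exploits the Toeplitz structure of the matrix.

def levinson_durbin(R, p):
    """R = [R(0), R(1), ..., R(p)]. Returns (alpha, E): predictor
    coefficients alpha_1..alpha_p and the minimum prediction error E."""
    E = R[0]
    alpha = []
    for i in range(1, p + 1):
        # reflection coefficient for order i
        k = (R[i] - sum(alpha[j] * R[i - 1 - j] for j in range(i - 1))) / E
        # order update: alpha_{i,j} = alpha_{i-1,j} - k * alpha_{i-1,i-j}
        alpha = [alpha[j] - k * alpha[i - 2 - j] for j in range(i - 1)] + [k]
        E *= 1.0 - k * k
    return alpha, E

# For R(k) = 0.9^k (the autocorrelation shape of a first-order signal), a
# second-order predictor should reduce to alpha = [0.9, 0]: the extra
# coefficient buys nothing, and E stays at 1 - 0.9^2.
R = [1.0, 0.9, 0.81, 0.729]
alpha, E = levinson_durbin(R, 2)
```

The by-product E at each order equals the minimum mean-squared prediction error E_{\hat n} = R_{\hat n}(0) - \sum_k \alpha_k R_{\hat n}(k) given in the slides.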
Covariance Method
• cannot use the autocorrelation formulation here: \phi_{\hat n}(i,k) is a true cross-correlation
• need to solve the set of equations

    \sum_{k=1}^{p} \alpha_k\, \phi_{\hat n}(i,k) = \phi_{\hat n}(i,0), \quad i = 1, 2, \ldots, p

    E_{\hat n} = \phi_{\hat n}(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_{\hat n}(0,k)

• in matrix form:

    \begin{bmatrix}
    \phi_{\hat n}(1,1) & \phi_{\hat n}(1,2) & \cdots & \phi_{\hat n}(1,p) \\
    \phi_{\hat n}(2,1) & \phi_{\hat n}(2,2) & \cdots & \phi_{\hat n}(2,p) \\
    \vdots & \vdots & \ddots & \vdots \\
    \phi_{\hat n}(p,1) & \phi_{\hat n}(p,2) & \cdots & \phi_{\hat n}(p,p)
    \end{bmatrix}
    \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix}
    =
    \begin{bmatrix} \phi_{\hat n}(1,0) \\ \phi_{\hat n}(2,0) \\ \vdots \\ \phi_{\hat n}(p,0) \end{bmatrix}

  i.e., \boldsymbol{\Phi}\,\boldsymbol{\alpha} = \boldsymbol{\psi}, with solution \boldsymbol{\alpha} = \boldsymbol{\Phi}^{-1}\boldsymbol{\psi}

Covariance Method
• we have \phi_{\hat n}(i,k) = \phi_{\hat n}(k,i), so the matrix is symmetric but not Toeplitz; its diagonal elements are related as

    \phi_{\hat n}(i+1, k+1) = \phi_{\hat n}(i,k) + s_{\hat n}(-i-1)\, s_{\hat n}(-k-1) - s_{\hat n}(L-1-i)\, s_{\hat n}(L-1-k)

  e.g., \phi_{\hat n}(2,2) = \phi_{\hat n}(1,1) + s_{\hat n}^2(-2) - s_{\hat n}^2(L-2)

• all terms \phi_{\hat n}(i,k) have a fixed number of terms (L) contributing to the sum

Summary of LP
• use a pth order linear predictor to predict s(\hat n) from the p previous samples
• minimize the mean-squared error, E_{\hat n}, over an analysis window of duration L samples
• the solution for the optimum predictor coefficients, \{\alpha_k\}, is based on solving a matrix equation; two solutions have evolved:
  – autocorrelation method: the signal is windowed by a tapering window in order to minimize the discontinuities at the beginning (predicting speech from zero-valued samples) and end (predicting zero-valued samples from speech samples) of the interval; the matrix \phi_{\hat n}(i,k) is shown to be an autocorrelation function
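For the covariance method, the analogous computation forms the cross-correlation matrix \phi_{\hat n}(i,k) over the fixed error interval and solves the symmetric (non-Toeplitz) system. A minimal sketch; the AR(2) test signal, n_hat, L, and p are invented for illustration, and plain Gauss-Jordan elimination stands in for the Cholesky decomposition normally used for this symmetric system:

```python
# Sketch of the covariance method: phi(i,k) = sum_{m=0}^{L-1} s(m-i) s(m-k)
# over a fixed interval (the window extends p samples back, no tapering),
# then solve sum_k alpha_k phi(i,k) = phi(i,0).

def covariance_lpc(s, n_hat, L, p):
    """Requires n_hat >= p so that samples s[n_hat - p .. n_hat + L - 1] exist."""
    seg = lambda m: s[n_hat + m]      # error interval reaches back to s[n_hat - p]
    phi = [[sum(seg(m - i) * seg(m - k) for m in range(L)) for k in range(p + 1)]
           for i in range(p + 1)]
    # augmented matrix for the normal equations, rows i = 1..p
    A = [[phi[i][k] for k in range(1, p + 1)] + [phi[i][0]] for i in range(1, p + 1)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Illustrative check: a signal that exactly obeys s(n) = 1.0 s(n-1) - 0.5 s(n-2)
# (impulse-excited at n = 0) should be recovered exactly, since no excitation
# falls inside the error interval.
s = [0.0] * 40
s[0] = 1.0
s[1] = 1.0
for n in range(2, 40):
    s[n] = 1.0 * s[n - 1] - 0.5 * s[n - 2]
alpha = covariance_lpc(s, n_hat=5, L=20, p=2)
```

This exact recovery is the point made in the slides: with the covariance formulation there are no window-edge errors, because no zero-valued samples enter the prediction.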