
M-ARY PREDICTIVE CODING: A NONLINEAR MODEL FOR SPEECH

M. Vandana

Final Year B.E. (ECE), PSG College of Technology, Coimbatore 641004. [email protected]

ABSTRACT

Speech coding is pivotal in the ability of networks to support multimedia services. The technique currently used for speech coding is Linear Prediction, which models the vocal tract as an all-pole filter, i.e. using a linear difference equation. However, the physical nature of the vocal tract is itself a clue to its nonlinear nature. Developing a nonlinear model is difficult, both in the solution of the nonlinear equations involved and in the verification of nonlinear schemes.

In this paper, two nonlinear models, Quadratic Predictive Coding (QPC) and M-ary Predictive Coding (MPC), are proposed. The equations for the models were developed and the coefficients solved for. MATLAB was used for implementation and testing. This paper gives the equations and preliminary results. The proposed enhancements to MPC are also discussed in brief.

1. AIM

The objectives of this endeavour were:
i. To investigate the two nonlinear models proposed by the author for more effective speech coding.
ii. To develop one of the two models, MPC (M-ary Predictive Coding), into a complete coding scheme.

Various deductions were then drawn from the models and the results. These form the remainder of the paper.

2. INTRODUCTION

Speech is a very special signal for a variety of reasons. The most basic of these is the fact that speech is a non-stationary signal, which makes it a rather tough signal to analyze and model.

The second reason is that factors like intelligibility, coherence and other such 'human' characteristics play a vital role in speech analysis, as against statistical parameters like the Mean Square Error.

The third reason is from a communications point of view. The number of discrete values required to describe one second of speech amounts to 8000 (at the minimum). Due to concerns of bandwidth, compression is desirable.

2.1. Linear Predictive Coding

In the technique of linear prediction, the vocal tract is modeled as an all-pole filter[1]. That is, the transfer function of the vocal tract is written as

    H(z) = 1 / ( Σ(i=1..p) ai*z^-i )

In terms of a difference equation,

    y[n] = Σ(i=1..p) ai*y[n-i]

This means that each sample of the sequence describing speech is assumed to be a linear weighted combination of a finite number of previous samples of the sequence.

By the method of minimization of the Mean Square Error, the coefficients of the linear predictive filter (ai) can be determined[1]. This is the method currently used for speech coding, in the form of various implementations like CELP, RELP etc.

A look at the shape of the human throat reveals that the vocal tract is not linear: it is curved. The obvious question, then, is why the vocal tract has been modeled as an all-pole (linear) filter when in reality it is not linear. The answer is that the assumption of linearity greatly simplifies the required calculations: the final system to be solved is simply a set of linear simultaneous equations, and various methods to solve such systems (like the Levinson-Durbin algorithm) are widely known.

However, a simple implementation of LPC yields errors in the speech output at the receiver. Hence a nonlinear model was studied.

With the objectives of error minimization, intelligibility retention and computational simplicity in mind, this project attempted to develop two distinct nonlinear models for speech. Both models were investigated: each was given a mathematical expression, equations were developed for the respective parameters, and MATLAB was then used to code each model and test it against the above-mentioned objectives.

3. NONLINEAR MODELS PROPOSED BY THE AUTHOR

3.1. Quadratic Model for speech

In this model, instead of expressing each sample as a linear weighted sum of a finite number (p) of the values preceding it, each sample is expressed as a quadratic weighted sum of the same.

Preliminary equation for the quadratic model:

    y[n] = Σ(i=1..p) ( ai*y[n-i] + bi*y^2[n-i] )

3.2. M-ary Model for speech

In this model, the current sample of the sequence is modeled as a weighted sum of m previous sample values, with the power to which each sample value is raised increasing with its delay from the current sample. This attempts to model the physical fact that the longer the time elapsed since the occurrence of a sample value, the lesser its effect, i.e. the decay of each sample value with the progressive increase of time.

Preliminary equation for the M-ary model:

    y[n] = Σ(i=1..m) ai*y^i[n-i]

4. QUADRATIC MODEL

4.1. Development of Equations

The preliminary equation, as given above, is

    y[n] = Σ(i=1..p) ( ai*y[n-i] + bi*y^2[n-i] )

This is the equation of the model suggested.
However, this model is to be made to predict the speech signal. Hence, it is more accurate to denote the LHS of the above equation as yp[n] (the predicted value of y[n]) rather than y[n] itself:

    yp[n] = Σ(i=1..p) ( ai*y[n-i] + bi*y^2[n-i] )                          (4.1)

The prediction error is simply the difference between the actual speech signal and its predicted value:

    e[n] = y[n] - yp[n]

The method of minimization of the Mean Square Error was employed to compute the parameters. The mean square error is

    E(e^2[n]) = Σ(n=0..N-1) e^2[n]
              = Σ(n=0..N-1) ( y[n] - yp[n] )^2
              = Σ(n=0..N-1) { y[n] - Σ(i=1..p) ( ai*y[n-i] + bi*y^2[n-i] ) }^2    (4.2)

Setting the partial derivatives to zero,

    ∂E(e^2[n])/∂ai = 0   for i = 1 to p                                    (4.3)
    ∂E(e^2[n])/∂bi = 0   for i = 1 to p                                    (4.4)

Applying Equations 4.3 and 4.4 to 4.2,

    Σ(n=0..N-1) { y[n] - Σ(i=1..p) ( ai*y[n-i] + bi*y^2[n-i] ) } * y[n-j]   = 0
    Σ(n=0..N-1) { y[n] - Σ(i=1..p) ( ai*y[n-i] + bi*y^2[n-i] ) } * y^2[n-j] = 0
    for j = 1 to p.                                                        (4.5)

Simplifying and reversing the order of summation,

    Σ(n) y[n]*y[n-j]   = Σ(i=1..p) ai Σ(n) y[n-i]*y[n-j]   + Σ(i=1..p) bi Σ(n) y^2[n-i]*y[n-j]
    Σ(n) y[n]*y^2[n-j] = Σ(i=1..p) ai Σ(n) y[n-i]*y^2[n-j] + Σ(i=1..p) bi Σ(n) y^2[n-i]*y^2[n-j]
                                                                           (4.6)

Solving and using matrix notation,

    A = [Y2^-1*Y1 - Y4^-1*Y3]^-1 * [Y2^-1*Yj,1 - Y4^-1*Yj,2]               (4.7)
    B = [Y1^-1*Y2 - Y3^-1*Y4]^-1 * [Y1^-1*Yj,1 - Y3^-1*Yj,2]               (4.8)

A and B are the vectors of the coefficients ai and bi respectively, for i from 1 to p. Y1, Y2, Y3 and Y4 are square matrices such that

    Y1(l,m) = Σ(n=0..N-1) y[n-l]*y[n-m]                                    (4.9)
    Y2(l,m) = Σ(n=0..N-1) y[n-l]*y^2[n-m]                                  (4.10)
    Y3(l,m) = Σ(n=0..N-1) y^2[n-l]*y[n-m]                                  (4.11)
    Y4(l,m) = Σ(n=0..N-1) y^2[n-l]*y^2[n-m]                                (4.12)

(l and m take values from 1 to p.) Yj,1 and Yj,2 are column vectors such that

    Yj,1(j) = Σ(n=0..N-1) y[n]*y[n-j]                                      (4.13)
    Yj,2(j) = Σ(n=0..N-1) y[n]*y^2[n-j]                                    (4.14)

(j takes values from 1 to p.)
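The normal equations (4.6) can equivalently be written as one stacked block system, [[Y1, Y2], [Y3, Y4]] [A; B] = [Yj,1; Yj,2], and solved directly, which avoids the explicit inverses of Eqs. 4.7 and 4.8. The paper's own implementation was in MATLAB and is not reproduced here; the following Python/NumPy sketch is a re-creation for a single analysis frame, with the summation range truncated to n = p..N-1 so that every lag stays in bounds:

```python
import numpy as np

def qpc_coefficients(y, p):
    """Solve for the quadratic-prediction coefficients a_i, b_i of one frame.

    Builds the correlation matrices Y1..Y4 and vectors Yj1, Yj2
    (Eqs. 4.9-4.14) over a common summation range n = p..N-1, then solves
    the stacked normal equations [[Y1, Y2], [Y3, Y4]] [A; B] = [Yj1; Yj2],
    algebraically equivalent to the elimination formulas (4.7)-(4.8).
    """
    N = len(y)
    y2 = y ** 2
    n = np.arange(p, N)                      # keep every lag n-i in range
    lag = lambda u, k: u[n - k]
    rng = range(1, p + 1)
    Y1 = np.array([[np.dot(lag(y,  l), lag(y,  m)) for m in rng] for l in rng])
    Y2 = np.array([[np.dot(lag(y,  l), lag(y2, m)) for m in rng] for l in rng])
    Y3 = np.array([[np.dot(lag(y2, l), lag(y,  m)) for m in rng] for l in rng])
    Y4 = np.array([[np.dot(lag(y2, l), lag(y2, m)) for m in rng] for l in rng])
    Yj1 = np.array([np.dot(y[n], lag(y,  j)) for j in rng])
    Yj2 = np.array([np.dot(y[n], lag(y2, j)) for j in rng])
    M = np.block([[Y1, Y2], [Y3, Y4]])
    ab = np.linalg.solve(M, np.concatenate([Yj1, Yj2]))
    return ab[:p], ab[p:]                    # A = (a_i), B = (b_i)
```

For a signal that exactly follows a quadratic recursion with |y| < 1, this recovers the generating coefficients up to floating-point error.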

Thus, the coefficients ai and bi were determined to develop a quadratic model for the given speech signal.

4.2. Deductions from Mathematical Analysis, used in implementation

From the equation set above, the matrices Y1, Y2, Y3 and Y4 are square. Y1 and Y4 are symmetric, i.e. Y1(l,m) = Y1(m,l), and the same holds for Y4. Hence, with 2p coefficients to be estimated in all (i.e. p previous values being considered in determining the current sample), instead of p*p elements to be computed for each of these matrices, it is sufficient to compute only p(p+1)/2 elements for each. Similarly, the matrices Y2 and Y3 are transposes of each other, so computing one of the two is sufficient.

The net result of these deductions is that, of a total of 4*p*p computations that would originally have been performed, the actual number required is only

    N(C) = 2*{p(p+1)/2} + p*p = p(2p+1)                                    (4.15)

i.e. a saving of p(2p-1) terms in computation.

This quadratic model was implemented using MATLAB. However, the algorithm by itself did not yield the expected results. Moreover, this model does not have a physical significance in the way MPC can be shown to have (Section 5). Hence, work on QPC was suspended after investigation of the preliminary results, and attention was turned to M-ary Prediction for the reasons explained thereunder.

5. M-ARY PREDICTION

5.1. Basic Model and reasons for choice

The preliminary equation for the M-ary model is

    y[n] = Σ(i=1..m) ai*y^i[n-i]                                           (5.1)

An assumption that |y[x]| <= 1 for all x was made. (This assumption will later be supported in the implementation; hence its consequences at this point will correctly serve as the justification of the choice of this model.)

Observation of the model equation reveals that the power to which an element (a sample) is raised increases as its delay with respect to the current sample (i.e. the sample to which its contribution is to be estimated) increases.

In the production of speech, as in any physical process, the decay of energy occurs with the passage of time. Similarly, the effect of a sample at time t will decay to a much smaller value at time (t+dt), and as dt increases, this reduction of effect increases progressively. It is this decay that is modeled in the above equation: since it has been assumed that |y[x]| <= 1 for all x under consideration, as the delay i in the equation increases, the contribution of a sample to future samples keeps decreasing, which models the decay phenomenon in actual speech production.

Additionally, the model permits the setting of a value of m that represents the maximum number of previous values that affect the current sample. This can be interpreted as a measure of the Reverberation Time, which in turn helps to control the intelligibility of the reproduced speech.

Now that the reasons for the choice of this model and its constraints have been discussed, we move on to a brief look at the equations and the relevant results that were utilized in the implementation.

5.2. Development of Equations

The preliminary equation is

    y[n] = Σ(i=1..m) ai*y^i[n-i]                                           (5.1)

Again, since this is an approximate model that attempts to predict the signal, y[n] can be substituted with yp[n]. The instantaneous value of the error is

    e[n] = y[n] - yp[n]

The Mean Square Error is

    E(e^2[n]) = Σ(n=0..N-1) e^2[n]
              = Σ(n=0..N-1) ( y[n] - yp[n] )^2
              = Σ(n=0..N-1) { y[n] - Σ(i=1..m) ai*y^i[n-i] }^2             (5.2)

By the method of minimization of the Mean Square Error,

    ∂E(e^2[n])/∂ai = 0   for i = 1 to m                                    (5.3)

Applying Equation 5.3 to 5.2,

    Σ(n=0..N-1) { ( y[n] - Σ(i=1..m) ai*y^i[n-i] ) * y^j[n-j] } = 0        (5.4)
    for j = 1 to m.

Separating and reversing the order of summation,

    Σ(i=1..m) ai * Σ(n=0..N-1) y^i[n-i]*y^j[n-j] = Σ(n=0..N-1) y[n]*y^j[n-j]    (5.5)
    for j = 1 to m.

Writing this in matrix notation,

    Ac * A = Y0,j                                                          (5.6)

where A is the vector of M-ary Prediction coefficients, Ac is the constant coefficient matrix of the equation, and Y0,j is the constant vector:

    A(i) = ai, the i-th M-ary Prediction coefficient;

    Ac(i,j) = Σ(n=0..N-1) y^i[n-i] * y^j[n-j]                              (5.7)

    Y0,j(j) = Σ(n=0..N-1) y[n] * y^j[n-j]                                  (5.8)

In all the above equations, i and j range from 1 to m.
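As a concrete illustration of Eqs. 5.5-5.8, the following Python/NumPy sketch estimates the MPC coefficients of one frame and forms the prediction of Eq. 5.1. It is a re-creation, not the paper's MATLAB code, and it truncates the sums to n = m..N-1 so that every lag stays in range:

```python
import numpy as np

def mpc_coefficients(y, m):
    """Estimate the M-ary prediction coefficients a_1..a_m for one frame.

    Builds Ac(i,j) = sum_n y^i[n-i] * y^j[n-j]  (Eq. 5.7) and
    Y0(j) = sum_n y[n] * y^j[n-j]               (Eq. 5.8),
    then solves Ac * A = Y0 (Eq. 5.6). Assumes |y| <= 1 (Section 5.1).
    """
    N = len(y)
    n = np.arange(m, N)                      # keep every lag n-i in range
    Ac = np.empty((m, m))
    Y0 = np.empty(m)
    for j in range(1, m + 1):
        vj = y[n - j] ** j
        Y0[j - 1] = np.dot(y[n], vj)
        for i in range(j, m + 1):            # Ac is symmetric (Section 5.3)
            Ac[i - 1, j - 1] = Ac[j - 1, i - 1] = np.dot(y[n - i] ** i, vj)
    return np.linalg.solve(Ac, Y0)

def mpc_predict(y, a):
    """yp[n] = sum_{i=1..m} a_i * y^i[n-i]  (Eq. 5.1); first m samples copied."""
    m = len(a)
    yp = y.copy()
    for n in range(m, len(y)):
        yp[n] = sum(a[i - 1] * y[n - i] ** i for i in range(1, m + 1))
    return yp
```

The inner loop fills only the upper triangle and mirrors it, which is exactly the m(m+1)/2 saving noted in Section 5.3.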

5.3. Deductions from Mathematical Analysis, used in implementation

From Eq. 5.7 it can be observed that Ac(i,j) = Ac(j,i). This symmetry was used to reduce the number of computations required: from a total of m*m (summation-term) computations, the required number was reduced to m(m+1)/2.

Thus, the total number of computations (summations over n) required for the determination of the M-ary predictive coefficients is

    N(C) = m + m(m+1)/2                                                    (5.9)

(Note: comparing with Eq. 4.15, which gives N(C) for QPC, it can be observed that MPC requires fewer computations.)

5.4. Implementation

One key feature to be remembered in the analysis of the implementation is that the physical decay of a speech sample can only be modeled by the above equation when, in mathematical terms, |y[x]| <= 1.

The speech data acquisition was done in MATLAB using the simple wavread command. The speech sample sequence returned by this command is already normalized and lies in the range [-1, 1].

However, from a numerical-methods point of view, when |y[x]| << 1, higher and higher powers of the sample tend towards zero, i.e. they make a negligible contribution. Handling the speech data exactly as read in would thus result in a loss of fine detail. This was handled in the implementation by adaptively scaling the data as each frame is encountered, so that the maximum possible use of the amplitude range [-1, 1] is made. This prevents any loss of fine detail due to the relative smallness in magnitude of low-valued samples in a frame, given the limitations of resolution (here MATLAB's; in a real-world implementation, the limitations of the hardware).

The final stage in the implementation was a median filter with a window size of 5, to reduce the noise in the reproduced speech signal.

5.5. Parameters Identified after preliminary work

After the preliminary work, the identified parameters were:
1. The frame size (N in the above equations for the models)
2. The parameter m, representative of the Reverberation Time
3. The gradient of the signal within the observed frame
4. The number of overlapping samples

5.6. Add-ons for MPC

Upsampling is the add-on applied in all the MPC implementations whose statistics are tabulated in this paper. Upsampling is the resampling of the original data at a higher rate. It was found to reduce the first-order gradient of the signal and thus improve the quality of the speech output.

[Figure 1. First frame of 1000 samples]

[Figure 2. Second frame of 1000 samples]

TABLE 1: Comparison of gradient (maximum and mean) and MSE for two 1000-sample frames of speech, under conditions of no upsampling and of upsampling at a factor of 2.

As can be seen from the above figures and table, upsampling greatly improves the tracking of the signal trend by the M-ary Prediction coefficients. However, as can also be observed, the factor by which upsampling is performed is crucial. A large value would lead to excellent signal reproduction but may end up compromising the compression; a lower value, on the other hand, would increase errors but facilitate excellent compression. Hence a balanced choice of the upsampling factor needs to be made. This led to the use of adaptive upsampling for the coding of the speech signal.

The next issue is the upsampling factor for a frame. As is deducible from the above table, the gradient of the frame plays a pivotal role in the accuracy with which the frame can be coded and recovered. Hence, the gradient of the signal was studied and divided into zones, depending on which the upsampling factor was assigned.

The second add-on being considered for MPC is the application of silence-suppression schemes such as Activity Detection or Comfort Noise Generation algorithms[3]. These are currently under study.

5.7. Quantization

The quantization applied was of two kinds: logarithmic quantization[4] and adaptive companding. The former was applied with an A-law compander (A = 87.6) in MATLAB but was found unsatisfactory, as it caused a deterioration of speech quality unaccompanied by any significant improvement. Hence adaptive companding was investigated, wherein the number of bits per sample in the current analysis frame was decided by considering the distribution of the gradients within the frame. Mu-law companding was implemented here with the standard parameter of Mu = 255. (Results with and without adaptive companding are provided in Table 2 under Section 5.8.)

5.8. Results

So far, MPC with fixed upsampling has been studied for male speech. Sample waveforms are given below. Performance statistics for different MPC implementations are tabulated below and compared with the corresponding parameters for some standard speech coding schemes[3][5]. Since there is no scientific objective method to test speech quality[3], three measures, namely Mean Opinion Score (MOS), Segmented SNR (SSNR) and Mean Square Error (MSE), have been selected for comparison.
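Of the three measures, Segmented SNR is the least standardized. One common way to compute it, sketched below in Python/NumPy (with an illustrative frame length, not a value from the paper), is to average the per-frame SNR values in dB:

```python
import numpy as np

def segmented_snr(ref, test, frame_len=256, eps=1e-12):
    """Segmented (frame-wise) SNR in dB.

    The signals are split into non-overlapping frames, the SNR of each
    frame is computed separately, and the dB values are averaged.
    Averaging per-frame SNRs weights quiet regions more fairly than one
    global SNR, which is why it tracks perceptual quality more closely.
    frame_len = 256 is an illustrative choice, not taken from the paper.
    """
    n_frames = len(ref) // frame_len
    snrs = []
    for k in range(n_frames):
        r = ref[k * frame_len:(k + 1) * frame_len]
        e = r - test[k * frame_len:(k + 1) * frame_len]
        snrs.append(10.0 * np.log10((np.sum(r ** 2) + eps) / (np.sum(e ** 2) + eps)))
    return float(np.mean(snrs))
```

For example, a reconstruction that scales every sample by 0.9 leaves a residual of one tenth of the signal in each frame, giving a segmented SNR of 20 dB.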

[Figure 4. Waveforms for the first 1000 samples: upsampling factor of 10]

TABLE 2: Comparison of results (using fixed upsampling for male speech) with speech coding standards

    Scheme            Bitrate (KBps)   MOS   SEGSNR (dB)   MSE (*10^-3)
    MPC75             4.88             3.4   25.67         13.0
    MPC87.5           6.83             3.9   30.75         5.3
    MPC75-AC          4.28             3.0   11.63         34.3
    MPC87.5-AC        5.72             3.6   13.25         23.0
    USFS-1016 CELP¹   0.6              3.2   ---           7.6
    G.721 ADPCM¹      4.0              4.1   ---           19.38
    GSM EFR¹          1.53             4.0   ---           ---
    GSM²              1.625            3.5   ---           ---
    LPC-10²           0.3              2.3   ---           ---

    --- : data unavailable
    ¹ data from [3] and [5]
    ² data from [6]

Segmented SNR has been recorded since it provides a better indication of perceptual quality[4]. For the specified standards, though, since perceptual quality is of utmost concern, the SNR is not as meaningful[5].

The standards given for comparison do not include nonlinear coding techniques. One such nonlinear speech coding technique is sine wave synthesis of speech, which attempts to show that speech can be adequately described by the changing pattern of vocal resonances alone, disregarding the acoustic elements of speech[2]. Though implementation details are unavailable, the quality of its speech output is comparable to that of MPC75-AC.

6. CONCLUSION

The results for the implementations of MPC so far have been obtained with fixed upsampling only. The application of adaptive upsampling is yet to be investigated.

The frequency-domain significance of the model is also being studied. Since conventional transform techniques cannot be applied, statistical means are being investigated to study the frequency response of the M-ary Predictive Model.

7. REFERENCES

[1] Deller, J. R., J. G. Proakis, and J. H. Hansen, Discrete-Time Processing of Speech. New York: Macmillan Publishing, 1993, pp. 677-804.
[2] P. Ruben, R. Remez and J. Pardo, Sine Wave Synthesis of Speech. [Online] Available: www.haskins.yale.edu/Haskins/MISC/SWS/SWS.html
[3] J. Ahonen and A. Laine (1997, April), Realtime speech and voice transmission on the Internet. Helsinki University of Technology, Telecommunications Software and Multimedia Laboratory, Helsinki, Finland. [Online] Available: www.tml.hut.fi
[4] T. P. Barnwell III, K. Nayebi and C. Richardson, Speech Coding. New York: John Wiley and Sons, Inc., 1996, pp. 42-51.
[5] Shannon Wichman (1999, August), A Comparison of Speech Coding Algorithms: ADPCM vs. CELP. Department of Electrical Engineering, The University of Texas at Dallas. [Online] Available: http://www.utdallas.edu/~torlak/wireless/projects/wichman.pdf
[6] Audio and Speech (2001, October). [Online] Available: www.cs.columbia.edu/~hgs/teaching/ais/slides/audio_long.pdf