Fast Linear Prediction and Adaptive Quantization for Voice Compression
Dawei Hunag
Bell Labs Research China, Lucent Technologies
30 Hai Dian Nan Lu, 100080 Beijing, P.R. China

Abstract: Engineers in information technology have used linear prediction techniques for many decades. Over the same period, statisticians have studied the same models in time series analysis. Building on concepts from statistics and aiming at applications in voice compression, this talk introduces two fast linear prediction methods. The first is a fast recursive least squares (RLS) algorithm that needs 7 multiplications and divisions in each step, which improves on existing results [3, 4]. The second is the Weighted Cascaded Least Mean Squares (WCLMS) method. Both algorithms are applied to voice compression with suitable adaptive quantization. The results show that combining statistical methods with engineering approaches can contribute to real information technology problems.

1. Introduction

Voice compression has become a very important technique in modern information technology. You use voice compression whenever you make a mobile, international, or long-distance phone call, or listen to an MP3 player. Engineers in signal processing have developed this technique for about 50 years, and it rests on two components: linear prediction and quantization. The first international standard in voice compression, μ-law (A-law) [1], is based on quantization. Quantization is in fact unavoidable whenever sound is recorded as a digital rather than an analog signal. The problem is to record the sound with as few bits as possible while keeping high quality. The solution draws on both perceptual considerations and the distribution of the voice signal, which in many cases turns out to be exponential. The other key technique is linear prediction.
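To illustrate how μ-law quantization spends its bits, here is a minimal Python sketch of μ-law companding. It assumes the continuous μ-law curve with μ = 255 followed by uniform 8-bit quantization; this is an approximation for illustration, not the bit-exact segmented encoder of the standard [1], and the function names are hypothetical. The curve gives fine steps to small amplitudes, which are the most frequent under a sharply peaked (for example, exponential) amplitude distribution.

```python
import math

MU = 255.0  # mu-law parameter used in North America and Japan


def mu_law_compress(x: float) -> float:
    """Continuous mu-law curve: maps x in [-1, 1] to [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def mu_law_quantize(x: float, bits: int = 8) -> int:
    """Compress, then quantize uniformly to a signed integer code."""
    levels = 2 ** (bits - 1) - 1  # 127 codes per sign for 8 bits
    return round(mu_law_compress(x) * levels)


def mu_law_dequantize(code: int, bits: int = 8) -> float:
    """Map an integer code back to an amplitude in [-1, 1]."""
    levels = 2 ** (bits - 1) - 1
    return mu_law_expand(code / levels)
```

For example, the reconstruction error of `mu_law_dequantize(mu_law_quantize(0.01))` is far smaller than that of `mu_law_dequantize(mu_law_quantize(0.9))`: the absolute error grows with amplitude while the relative error stays roughly constant.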
In the early days, engineers used a simple difference for prediction [2]. A technique called Linear Predictive Coding (LPC) then became popular. More recent methods combine LPC with coding the whole prediction residual sequence in a few bits (as in CELP and MELP). Over roughly the same period, linear prediction theory was developed in time series analysis. Kolmogorov and Wiener first established this theory for stationary processes in the 1940s. Since then, the AutoRegressive (AR) and AutoRegressive Moving-Average (ARMA) models, and their variants with differencing (ARIMA), have been a focus of time series research for several decades. In fact, the LPC model coincides with the stationary AR model. We believe that the concepts and methods of time series analysis and general statistics can help develop tractable algorithms for voice compression.

In this talk, we introduce two fast linear prediction algorithms and their applications to voice compression with suitable quantization. In Section 2, we derive a fast exponentially weighted recursive least squares algorithm using concepts and methods similar to those for stationary processes. This algorithm can improve the widely used standards [1, 2] by using fewer bits per sample while keeping exactly the same quality (that is, compression that is lossless relative to these standards). In Section 3, we introduce the Weighted Cascaded Least Mean Squares (WCLMS) algorithm and apply it to music with perceptual quantization. We summarize the talk in Section 4.

2. Fast recursive least squares algorithm with application to speech compression

The exponentially weighted least squares problem, which is widely used in many areas of information technology, can be stated as follows. Let $\{x_t\}$ be the observation sequence and define

$$\phi_t^{m:n} = [x_{t-m}, x_{t-m-1}, \ldots, x_{t-n}]^T, \quad m < n, \qquad A_t = [a_{t,1}, a_{t,2}, \ldots, a_{t,q}]^T.$$

We want to find $A_t$ such that

$$\min_{A_t} \sum_{l=0}^{t} \lambda^l \left(x_{t-l} - A_t^T \phi_{t-l}^{1:q}\right)^2, \quad 0 < \lambda < 1. \tag{1}$$

Let $R_t^{m:n} = \sum_{l=0}^{t} \lambda^l \phi_{t-l}^{m:n} (\phi_{t-l}^{m:n})^T$. Then the solution satisfies a matrix equation similar to the well-known Yule-Walker equations in time series analysis:

$$R_t^{0:q} \begin{bmatrix} 1 \\ -A_t \end{bmatrix} = \begin{bmatrix} \sigma_t^2 \\ 0 \end{bmatrix}, \tag{2}$$

where $\sigma_t^2$ is the minimized weighted residual sum of squares in (1). However, since $R_t^{0:q}$ is NOT a Toeplitz matrix, the traditional Levinson algorithm cannot be used. We need a supplementary matrix and variables. Let

$$G_t^{m:n} = \begin{bmatrix} 1 & (\phi_t^{m:n})^T \\ \phi_t^{m:n} & R_t^{m:n} \end{bmatrix}, \qquad C_t = [c_{t,1}, c_{t,2}, \ldots, c_{t,q}]^T$$

such that

$$G_t^{1:q} \begin{bmatrix} 1 \\ -C_t \end{bmatrix} = \begin{bmatrix} v_t^2 \\ 0 \end{bmatrix}. \tag{3}$$

Then we have

$$R_{t+1}^{0:q}\left(\begin{bmatrix} 1 \\ -A_t \end{bmatrix} - d_t \begin{bmatrix} 0 \\ C_t \end{bmatrix}\right) = \begin{bmatrix} \lambda\sigma_t^2 + d_t h_t \\ 0 \end{bmatrix} \;\Longrightarrow\; A_{t+1} = A_t + d_t C_t.$$

To obtain $C_{t+1}$, we must also consider a third (backward) predictor $B_t = [b_{t,1}, \ldots, b_{t,q}]^T$ such that

$$R_t^{0:q} \begin{bmatrix} -B_t \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ s_t^2 \end{bmatrix}. \tag{4}$$

Using the three Yule-Walker-type equations (2)-(4) and the relationships among them, we can derive $\{B_{t+1}, C_{t+1}\}$ from $\{A_t, B_t, C_t\}$ with 7 multiplications and divisions. Compared with the existing results [3, 4], this algorithm requires fewer computations.

Now consider a quantization scheme $q_t = Q(x_t)$, $r_t = Q^{-1}(q_t)$. Here and below, $\{x_t\}$ is the input voice sequence, $q_t = Q(x_t)$ is the quantized symbol with fewer bits, and $r_t$ is the recovered voice sequence with the same number of bits as $x_t$. The μ-law [1] converts a 16-bit $x_t$ into an 8-bit $q_t$, while ADPCM [2] converts a 16-bit $x_t$ into a 4-bit $q_t$.

On the encoding side, after applying one of these quantization schemes, we apply the fast RLS algorithm to the dequantized sequence $\{r_t\}$. Starting from the initial values $r_t = 0$ for $t < 0$, we compute the difference between $q_t$ and the quantized prediction:

$$p_t = [r_{t-1}, r_{t-2}, \ldots, r_{t-q}]\,A_{t-1}, \qquad e_t = q_t - Q(p_t). \tag{5}$$

We then code $e_t$ with a lossless source coding method, such as Huffman coding, and transmit or store the result. On the decoding side, we receive the sequence $\{e_t\}$. Starting from the same initial values $r_t = 0$ for $t < 0$, we compute $p_t$ by (5) and recover $q_t = e_t + Q(p_t)$. Once $q_t$ is known, we obtain $r_t = Q^{-1}(q_t)$.
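The encoder and decoder loops described above can be sketched in Python as follows. This is a minimal illustration under several assumptions: `Q` is a hypothetical uniform quantizer standing in for μ-law or ADPCM, the predictor uses the standard O(q²) exponentially weighted RLS update rather than the fast 7-operation recursion (whose details are not reproduced here), the order q = 4, λ = 0.99 and the initialization `delta` are arbitrary choices, and the entropy coder for $e_t$ is omitted. The sketch shows why the scheme is lossless relative to the standard: on the same platform, the decoder repeats exactly the same floating-point computations on the same $r_t$ values, so it reproduces $p_t$ and hence $q_t$ exactly.

```python
import math

STEP = 64.0  # hypothetical uniform step size standing in for mu-law or ADPCM


def Q(x: float) -> int:
    """Hypothetical quantizer: amplitude -> integer symbol q_t."""
    return round(x / STEP)


def Q_inv(code: int) -> float:
    """Dequantizer: integer symbol -> reconstructed amplitude r_t."""
    return code * STEP


class RLSPredictor:
    """Standard exponentially weighted RLS, O(q^2) per sample.

    Used here in place of the fast 7-operation recursion described in the text.
    """

    def __init__(self, order: int, lam: float = 0.99, delta: float = 100.0):
        self.q = order
        self.lam = lam                      # forgetting factor, 0 < lam < 1
        self.A = [0.0] * order              # predictor coefficients A_t
        # P tracks the inverse of the weighted correlation matrix; start at delta * I.
        self.P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]

    def predict(self, phi):
        return sum(a * f for a, f in zip(self.A, phi))

    def update(self, phi, target):
        Pphi = [sum(self.P[i][j] * phi[j] for j in range(self.q)) for i in range(self.q)]
        denom = self.lam + sum(f * v for f, v in zip(phi, Pphi))
        k = [v / denom for v in Pphi]                   # gain vector
        err = target - self.predict(phi)                # a priori prediction error
        self.A = [a + ki * err for a, ki in zip(self.A, k)]
        # P <- (P - k (P phi)^T) / lam, using the symmetry of P
        self.P = [[(self.P[i][j] - k[i] * Pphi[j]) / self.lam for j in range(self.q)]
                  for i in range(self.q)]


def encode(samples, order=4):
    rls, hist, residuals = RLSPredictor(order), [0.0] * order, []
    for x in samples:
        q_t = Q(x)
        p_t = rls.predict(hist)          # p_t = [r_{t-1}, ..., r_{t-q}] A_{t-1}
        residuals.append(q_t - Q(p_t))   # e_t = q_t - Q(p_t), eq. (5)
        r_t = Q_inv(q_t)
        rls.update(hist, r_t)
        hist = [r_t] + hist[:-1]         # shift in r_t; initial zeros are r_t = 0, t < 0
    return residuals


def decode(residuals, order=4):
    rls, hist, codes = RLSPredictor(order), [0.0] * order, []
    for e_t in residuals:
        p_t = rls.predict(hist)
        q_t = e_t + Q(p_t)               # recover the standard's symbol exactly
        codes.append(q_t)
        r_t = Q_inv(q_t)
        rls.update(hist, r_t)            # same update as the encoder
        hist = [r_t] + hist[:-1]
    return codes


if __name__ == "__main__":
    signal = [8000 * math.sin(0.05 * t) + 500 * math.sin(0.31 * t) for t in range(2000)]
    e = encode(signal)
    assert decode(e) == [Q(x) for x in signal]
    print(sum(map(abs, e)) / len(e), sum(abs(Q(x)) for x in signal) / len(signal))
```

On a predictable test signal like the one in the `__main__` block, the residuals $e_t$ are much smaller in magnitude than the symbols $q_t$, which is what lets an entropy coder spend fewer bits per sample than the standard itself.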
The fast RLS algorithm is then used to update $A_t$, exactly as on the encoding side, and the process continues with the next sample. The following table shows the results of our method with the μ-law [1] or ADPCM [2] quantization scheme. The numbers in the second and third rows are the bits per voice sample after our coding; the bit depth in parentheses is that of the standard's own output.

Voice file No.    1     2     3     4     5     6     7     Average
μ-law (8 bits)    2.70  3.09  3.27  3.33  3.35  3.48  3.53  3.24
ADPCM (4 bits)    2.59  2.69  2.73  3.04  2.74  2.69  2.79  2.75

3. Weighted Cascaded Least Mean Squares (WCLMS) method with perceptual audio coding

Another popular prediction algorithm widely used in information technology is the Least Mean Squares (LMS) method [5]:

$$p_t = \theta_{t-1}^T \phi_{t-1}, \qquad e_t = x_t - p_t, \qquad \theta_t = \theta_{t-1} + \frac{e_t}{1 + \lambda \|\phi_{t-1}\|^2}\,\phi_{t-1}.$$

Here and below, $\theta_{t-1}$ is the tap coefficient vector and $\phi_{t-1}$ is the regressor at time $t-1$. This algorithm is very simple, and it converges to the model coefficients of a stationary AR model. However, its convergence rate is much lower than that of RLS. To improve it, we use cascaded LMS: the first LMS takes the voice signal $x_t$ as its input, and each subsequent LMS takes the prediction error of the previous LMS as its input. Let $P_{1,t}$ be the prediction of $x_t$ by the first LMS. For the $i$-th LMS ($i \ge 2$), the prediction $P_{i,t}$ of $x_t$ is $P_{i-1,t}$ plus the $i$-th LMS's prediction of its own input (the prediction error of the previous LMS), so that $P_{i,t}$ accumulates the predictions of the first $i$ stages. We now have multiple predictors of the same voice signal $x_t$. To obtain the final prediction, we use a weighted combination, the standard way of forming a superior prediction in Bayesian statistics [6]:

$$P_t = \sum_i w_{i,t} P_{i,t}, \qquad \sum_i w_{i,t} = 1,$$

where $w_{i,t}$ can be regarded as the posterior probability that $P_{i,t}$ is "correct" given the data up to time $t$. We choose $w_{i,t}$ according to the Predictive Minimum Description Length (PMDL) principle [7] as follows. Experience shows that the prediction errors are well approximated by a Laplacian distribution, and hence so is the conditional distribution of $x_t$ given the previous observations. Also, since the signal is non-stationary, we adopt a forgetting factor $\mu$.
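The cascade and the weighted combination can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each stage uses the LMS update above with a hypothetical λ = 2 (so the effective step size is at most 1/2), the number of stages and the order are arbitrary, and the weights use exponentially discounted absolute errors (a Laplacian likelihood) with forgetting factor μ, in the PMDL form stated next; the values of `c` and `mu` are illustrative only. The names `LMSStage` and `wclms` are hypothetical.

```python
import math


class LMSStage:
    """One normalized LMS predictor: theta += e * phi / (1 + lam * ||phi||^2)."""

    def __init__(self, order: int, lam: float = 2.0):
        self.theta = [0.0] * order   # tap coefficients theta_{t-1}
        self.phi = [0.0] * order     # regressor: this stage's own past inputs
        self.lam = lam

    def predict(self) -> float:
        return sum(a * b for a, b in zip(self.theta, self.phi))

    def update(self, u: float, p: float) -> float:
        """Observe this stage's input u (predicted as p); return the error u - p."""
        e = u - p
        norm = 1.0 + self.lam * sum(f * f for f in self.phi)
        self.theta = [a + e * f / norm for a, f in zip(self.theta, self.phi)]
        self.phi = [u] + self.phi[:-1]
        return e


def wclms(signal, stages=3, order=8, c=0.01, mu=0.99):
    """Return the combined one-step predictions P_t for each sample of signal."""
    lms = [LMSStage(order) for _ in range(stages)]
    score = [0.0] * stages           # S_i = sum_{j<t} mu^(t-j) |x_j - P_{i,j}|
    out = []
    for x in signal:
        preds = [s.predict() for s in lms]   # each stage predicts its own input
        P, acc = [], 0.0
        for p in preds:                      # P_{i,t} = P_{i-1,t} + stage-i prediction
            acc += p
            P.append(acc)
        # w_i proportional to exp(-c (1 - mu) S_i); shift by the max for stability
        z = [-c * (1.0 - mu) * s for s in score]
        zmax = max(z)
        w = [math.exp(v - zmax) for v in z]
        out.append(sum(wi * Pi for wi, Pi in zip(w, P)) / sum(w))
        u = x                                # stage 1 sees x_t ...
        for stage, p in zip(lms, preds):
            u = stage.update(u, p)           # ... stage i+1 sees stage i's error
        score = [mu * (s + abs(x - Pi)) for s, Pi in zip(score, P)]
    return out
```

The discounted score is updated recursively as $S_{i,t+1} = \mu\,(S_{i,t} + |x_t - P_{i,t}|)$, so each weight costs O(1) per sample instead of re-summing the whole error history.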
With this forgetting factor, the PMDL weights are

$$w_{i,t} \propto \exp\left(-c(1-\mu)\sum_{j=0}^{t-1} \mu^{t-j}\,\lvert x_j - P_{i,j}\rvert\right).$$

For quantization, we use novel adaptive pre- and post-filters designed according to a psychoacoustic model [8]. The pre-filter adaptively removes perceptually irrelevant components from the signal without much damage to sound quality. Its output is then quantized below the masked threshold of human perception. After these two steps, we obtain fewer bits per sample than in Section 2:

Music file No.           1     2     3     4     Average
WCLMS (bits per sample)  1.94  1.99  2.16  1.96  2.0125

4. Summary

In this talk, we introduce two linear prediction methods. The first is the fast recursive least squares algorithm, derived in a way similar to deriving the Levinson recursion from the Yule-Walker equations in time series analysis.