The High-Speed LSF Transformation Algorithm in CELP Vocoder
The Fast LSP Transformation Algorithm using not Uniform Searching Interval
SOYEON MIN, KYUNGA JANG, MYUNGJIN BAE Dept. of Electronics Engineering, Soongsil University Dept. of Information & Telecommunication Engineering, Soongsil University 1.1 Sangdo 5 dong Dongjakgu, Seoul, 156-743, KOREA
Abstract: We develop a computation-reduction scheme for the real root method, which is widely used in CELP (Code Excited Linear Prediction) vocoders. The real root method finds the real roots of a pair of polynomial equations and transforms them into LSP (Line Spectrum Pairs) parameters. However, this method takes much time to compute because the root search proceeds sequentially along the frequency axis, whereas most LSP parameters occur in a specific frequency region. The proposed fast algorithm exploits this property: the searching interval is arranged according to the mel scale instead of being uniform, which reduces the amount of LSP computation. In the experimental results, the computational amount of the developed algorithm is reduced by about 45% on average, while the transformed LSP parameters are the same as those of the real root method. Hence, when the proposed algorithm is applied to G.723.1 (6.3 kbps MP-MLQ), the speech quality shows no distortion compared with the original speech quality.

Key-words: LSP, CELP vocoder, mel scale, LPC, real root method, computational amount.

1 Introduction
LPC (Linear Predictive Coding) is a very powerful analysis technique and is used in many speech processing systems. For speech coding and synthesis systems, apart from the different analysis techniques available to obtain the LPC parameters (autocorrelation, covariance, lattice, etc.), the quantization of the LPC parameters is also a very important aspect of LPC analysis, as minimization of coding capacity is the ultimate aim in these applications. The main objective of the quantization procedure is to code the LPC parameters with as few bits as possible without introducing additional spectral distortion. Whilst perfect reconstruction is not possible, subjective transparency is achievable[1~3]. Considerable amounts of work on scalar and vector LPC quantizers have already been reported in the past, but these have been predominantly directed at coding schemes operating above 9.6kb/s (APC, RELP, etc.) or at very low rate vocoders below 4.8kb/s. Thus these have tended to be good quality but high capacity schemes, e.g. 40-50 bits scalar quantization, or low capacity but only reasonable quality vector quantization schemes, e.g. 10-bit codebook vector quantization. Therefore, for medium to low bit rates, i.e. 9.6-4.8kb/s, the previously reported LPC quantization schemes are not directly applicable, hence further investigations are required.

A promising and popular method is the use of the line spectrum pairs representation of the LPC parameters. LSP is used for speech analysis in vocoders or recognizers since it has the advantages of constant spectrum sensitivity, low spectrum distortion and easy linear interpolation. But the method of transforming LPC coefficients to LSP parameters is so complex that it takes much time to compute, because acquiring the LSPs requires finding the roots of polynomial equations.

The conventional methods are the complex root, real root, ratio filter, Chebyshev series, and adaptive sequential LMS (Least Mean Square) methods. Among these, the real root method is considerably simpler than the others, but it still suffers from its indeterministic computation time. In this paper, we propose a computation-reduction scheme for the real root method that uses the distribution of LSP parameters and the formant characteristics[2~5,8~11].

2 LPC to LSP Transformation
An all-pole digital filter for speech synthesis, H(z), can be derived from linear predictive analysis, and is given by

$$H(z) = 1/A_p(z) \tag{1}$$

where

$$A_p(z) = 1 + \sum_{k=1}^{p} \alpha_k z^{-k} \tag{2}$$
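As an illustration of how the coefficients $\alpha_k$ of equation (2) can be obtained from linear predictive analysis, the following is a minimal sketch of the autocorrelation method with the Levinson-Durbin recursion, one of the analysis techniques mentioned in the introduction. It is not taken from the paper: the function name `levinson_durbin` is hypothetical, and the sign convention of the returned reflection coefficients is an assumption (conventions for PARCOR coefficients differ between texts).

```python
def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for
    A(z) = 1 + sum(alpha_k * z^-k), k = 1..order.

    r -- autocorrelation values r[0..order]
    Returns (alpha, refl, err): the predictor coefficients alpha_1..alpha_order,
    the reflection coefficient of each stage (assumed sign convention: it
    equals the last coefficient of that stage), and the final prediction error.
    """
    a = [1.0]                 # a[0] is the leading 1 of A(z)
    err = float(r[0])
    refl = []
    for i in range(1, order + 1):
        # Correlation between the current residual and lag i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Order update: a_new[j] = a[j] + k * a[i - j], and a_new[i] = k.
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        err *= 1.0 - k * k
        refl.append(k)
    return a[1:], refl, err


# Autocorrelation of a first-order process with correlation 0.5.
alpha, refl, err = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this first-order example the second coefficient vanishes, so the sketch returns `alpha` close to `[-0.5, 0.0]` with a residual error of 0.75.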
The PARCOR system is an equivalent representation, and its digital form is shown in Figure 1,

Fig. 1 PARCOR structure of LPC synthesis

where

$$A_{p-1}(z) = A_p(z) + k_p B_{p-1}(z) \tag{3}$$

$$B_p(z) = z^{-1}\left[B_{p-1}(z) - k_p A_{p-1}(z)\right] \tag{4}$$

where $A_0(z) = 1$ and $B_0(z) = z^{-1}$, and

$$B_p(z) = z^{-(p+1)} A_p(z^{-1}) \tag{5}$$

The PARCOR system shown in Figure 1 is stable for $|k_i| < 1$ for all i. The PARCOR synthesis process can be viewed as sound wave propagation through a lossless acoustic tube consisting of p sections of equal length but non-uniform cross sections. The acoustic tube is open at the terminal corresponding to the lips, and each section is numbered from the lips. Mismatching between the adjacent sections p and (p+1) causes wave propagation reflection. The reflection coefficient is equal to the p-th PARCOR coefficient $k_p$. Section p+1, which corresponds to the glottis, is terminated by a matched impedance. The excitation signal applied to the glottis drives the acoustic tube. In PARCOR analysis, the boundary condition at the glottis is impedance matched. Now consider a pair of artificial boundary conditions where the acoustic tube is completely closed or open at the glottis[2~4]. These conditions correspond to $k_{p+1} = 1$ and $k_{p+1} = -1$, a pair of extreme values for the artificially extended PARCOR coefficients, which correspond to perfectly lossless tubes. The value Q of each resonance becomes infinite and the spectrum of distributed energy is concentrated in several line spectra. Of the two feedback conditions $k_{p+1} = \pm 1$, one corresponds to a perfect closure at the input (glottis) and the other to an opening to infinite free space[5].

To derive the line spectra or line spectrum pairs (LSP), we proceed as follows, where it is assumed that the PARCOR filter is stable and the order is even. $A_p(z)$ may be decomposed into a set of two transfer functions, one having an even symmetry and the other having an odd symmetry. This can be accomplished by taking a difference and a sum between $A_p(z)$ and its conjugate. Hence the transfer functions with $k_{p+1} = \pm 1$ are denoted by $P_{p+1}(z)$ and $Q_{p+1}(z)$:

$$\text{For } k_{p+1} = 1, \quad P_{p+1}(z) = A_p(z) - B_p(z) \tag{6}$$

$$\text{For } k_{p+1} = -1, \quad Q_{p+1}(z) = A_p(z) + B_p(z) \tag{7}$$

$$A_p(z) = \tfrac{1}{2}\left[P_{p+1}(z) + Q_{p+1}(z)\right] \tag{8}$$

Substituting equation (5) into (6),

$$P_{p+1}(z) = A_p(z) - z^{-(p+1)} A_p(z^{-1}) = 1 + (\alpha_1 - \alpha_p) z^{-1} + \cdots + (\alpha_p - \alpha_1) z^{-p} - z^{-(p+1)} = z^{-(p+1)} \prod_{i=0}^{p} (z - a_i) \tag{9}$$

Similarly,

$$Q_{p+1}(z) = z^{-(p+1)} \prod_{i=0}^{p} (z - b_i) \tag{10}$$

Since two roots are already known ($k_{p+1} = \pm 1$), the order of $P_{p+1}(z)$ and $Q_{p+1}(z)$ can be reduced,

$$P'(z) = \frac{P_{p+1}(z)}{(1 - z)} = A_0 z^{p} + A_1 z^{(p-1)} + \cdots + A_p \tag{11}$$

and

$$Q'(z) = \frac{Q_{p+1}(z)}{(1 + z)} = B_0 z^{p} + B_1 z^{(p-1)} + \cdots + B_p \tag{12}$$

where $A_0 = 1$, $B_0 = 1$,

$$A_k = (\alpha_k - \alpha_{p+1-k}) + A_{k-1}, \qquad B_k = (\alpha_k + \alpha_{p+1-k}) - B_{k-1}, \qquad \text{for } k = 1, \ldots, p \tag{13}$$

The LSPs are the angular positions of the roots of $P'(z)$ and $Q'(z)$ with $0 \le \omega_i \le \pi$ [2~4].

2.1 Real root Method
As the coefficients of $P'(z)$ and $Q'(z)$ are symmetrical, the order of equation (11) can be reduced to p/2.

$$P'(z) = A_0 z^{p} + A_1 z^{(p-1)} + \cdots + A_1 z^{1} + A_0 = z^{p/2}\left[A_0\left(z^{p/2} + z^{-p/2}\right) + A_1\left(z^{(p/2-1)} + z^{-(p/2-1)}\right) + \cdots + A_{p/2}\right] \tag{14}$$

Similarly,

$$Q'(z) = B_0 z^{p} + B_1 z^{(p-1)} + \cdots + B_1 z^{1} + B_0 = z^{p/2}\left[B_0\left(z^{p/2} + z^{-p/2}\right) + B_1\left(z^{(p/2-1)} + z^{-(p/2-1)}\right) + \cdots + B_{p/2}\right] \tag{15}$$

As all roots are on the unit circle, we can evaluate equations (14) and (15) on the unit circle only.

$$\text{Let } z = e^{j\omega}, \text{ then } z + z^{-1} = 2\cos(\omega) \tag{16}$$

$$P'(z) = 2e^{jp\omega/2}\left[A_0 \cos\!\left(\tfrac{p}{2}\omega\right) + A_1 \cos\!\left(\tfrac{p-2}{2}\omega\right) + \cdots + \tfrac{1}{2}A_{p/2}\right] \tag{17}$$

$$Q'(z) = 2e^{jp\omega/2}\left[B_0 \cos\!\left(\tfrac{p}{2}\omega\right) + B_1 \cos\!\left(\tfrac{p-2}{2}\omega\right) + \cdots + \tfrac{1}{2}B_{p/2}\right] \tag{18}$$

By making the substitution $x = \cos(\omega)$, equations (17) and (18) can be solved for x. The cosines of multiple angles are expanded as polynomials in x, for example $\cos 5\omega = 16x^5 - 20x^3 + 5x$, $\cos 4\omega = 8x^4 - 8x^2 + 1$, $\cos 3\omega = 4x^3 - 3x$ and $\cos 2\omega = 2x^2 - 1$. With p=10, the following is obtained.

$$P'_{10}(x) = 16A_0 x^5 + 8A_1 x^4 + (4A_2 - 20A_0)x^3 + (2A_3 - 8A_1)x^2 + (5A_0 - 3A_2 + A_4)x + (A_1 - A_3 + 0.5A_5) \tag{19}$$

and similarly,

$$Q'_{10}(x) = 16B_0 x^5 + 8B_1 x^4 + (4B_2 - 20B_0)x^3 + (2B_3 - 8B_1)x^2 + (5B_0 - 3B_2 + B_4)x + (B_1 - B_3 + 0.5B_5) \tag{20}$$

The LSPs are then given by

$$LSP(i) = \frac{\cos^{-1}(x_i)}{2\pi T}, \qquad \text{for } 1 \le i \le p \tag{21}$$

2.2 The Characteristics of Mel Scale
Psychophysical studies have shown that human perception of the frequency content of sounds, either for pure tones or for speech signals, does not follow a linear scale. This has led to the idea of defining the subjective pitch of pure tones. Thus for each tone with an actual frequency, measured in Hz, a subjective pitch is measured on a scale called the 'Mel' scale. As a reference point, the pitch of a 1 kHz tone, 40dB above the perceptual hearing threshold, is defined as 1000 mels. Other subjective pitch values are obtained by adjusting the frequency of a tone such that it is half or twice the perceived pitch of a reference tone (with a known mel frequency). The unit of pitch is the Mel. The relation between pitch in mels and frequency is given in equation (22). When frequency is plotted on a logarithmic scale, the axis is linear in musical pitch. Koenig approximates the mel scale by a function which is linear below 1 kHz and logarithmic above[7]. Fant gives the approximation,

$$F_{mel} = \frac{1000}{\log 2} \cdot \log\!\left(1 + \frac{f}{1000}\right) \tag{22}$$

3 The Fast LSP Transformation Algorithm
In the real root method, the odd-order LSP parameters are searched first, and the even-order parameters are then searched between the odd-order parameters. The search for the odd-order parameters takes up most of the transformation time because it proceeds sequentially over the whole frequency range. An important characteristic of LSP, however, is that most LSP parameters occur in a specific frequency region, whereas the searching interval of the real root method is uniform. To reduce the computational amount of the real root method, the proposed algorithm therefore arranges the searching frequency interval according to the mel scale instead of making it uniform. Equation (23) gives the searching interval of the proposed algorithm. Equations (24) and (25) show how the search for the odd-order LSPs depends on equation (23), and equations (26) and (27) show the same for the even-order LSPs. Table 1 lists part of the searching intervals obtained from equation (23); the searching interval of the proposed algorithm is not uniform.

$$f_{n+1} = \left(10^{\,point(n)/(1000/\log_{10} 2)} - 1\right) \times 1000 \times 0.5, \qquad \text{for } 0 \le n < 399 \tag{23}$$

where

$$f_0 = 0, \qquad point(n) = index \times (n+1) \ \text{ for } 0 \le n < 399,$$

$$index = F_{mel}/399, \qquad F_{mel} = \frac{1000}{\log_{10} 2} \cdot \log_{10}\!\left(1 + \frac{FS}{1000}\right), \qquad FS = 8000$$

$$Q'(f_n) = B_0 \cos\!\left(2\pi f_n \tfrac{p}{2}\right) + B_1 \cos\!\left(2\pi f_n \tfrac{p-2}{2}\right) + \cdots + \tfrac{1}{2}B_{p/2}, \qquad \text{for } f_n:\ 0 \le n \le 399 \tag{24}$$

$$Q'(f_i) = B_0 \cos\!\left(2\pi f_i \tfrac{p}{2}\right) + B_1 \cos\!\left(2\pi f_i \tfrac{p-2}{2}\right) + \cdots + \tfrac{1}{2}B_{p/2}, \qquad \text{for } f_i \text{ between } f_n \text{ and the adjacent grid point} \tag{25}$$

$$P'(f_n) = A_0 \cos\!\left(2\pi f_n \tfrac{p}{2}\right) + A_1 \cos\!\left(2\pi f_n \tfrac{p-2}{2}\right) + \cdots + \tfrac{1}{2}A_{p/2}, \qquad \text{for } f_n > LSP(\omega_m),\ m = 1,3,5,7,9 \tag{26}$$

$$P'(f_i) = A_0 \cos\!\left(2\pi f_i \tfrac{p}{2}\right) + A_1 \cos\!\left(2\pi f_i \tfrac{p-2}{2}\right) + \cdots + \tfrac{1}{2}A_{p/2}, \qquad \text{for } f_i < LSP(\omega_{m+2}),\ m = 1,3,5,7,9 \tag{27}$$

| n | Searching interval by non-uniform control to find the LSP parameters (unit: Hz) |
|---|---|
| 0-9 | 0, 2.8, 5.5, 8.3, 11.1, 14.0, 15.8, 19.7, 22.5, 25.4 |
| 10-19 | 28.3, 31.2, 34.2, 37.1, 40.1, 43.1, 46.1, 49.1, 52.1, 55.1 |
| 20-29 | 58.2, 61.3, 64.4, 67.5, 70.6, 73.8, 77.0, 80.2, 83.4, 86.6 |
| 30-39 | 89.8, 93.1, 96.3, 99.6, 103.0, 106.3, 109.6, 113.0, 116.4, 119.8 |
| 40-49 | 123.2, 126.6, 130.1, 133.6, 137.1, 140.6, 144.1, 147.7, 151.3, 154.9 |
| 50-59 | 158.5, 161.2, 165.8, 169.5, 173.2, 176.9, 180.6, 184.4, 188.1, 191.9 |
| 60-69 | 195.8, 199.6, 203.5, 207.4, 211.3, 215.2, 219.1, 223.1, 227.1, 231.1 |
| 70-79 | 235.2, 239.2, 243.3, 247.4, 251.5, 255.7, 259.9, 264.1, 268.3, 272.5 |
| 80-89 | 276.8, 281.2, 285.4, 289.7, 294.1, 298.5, 302.9, 307.3, 311.8, 316.2 |
| 90-99 | 320.8, 325.3, 329.8, 334.4, 339.0, 343.7, 348.3, 353.0, 357.7, 362.5 |
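The search described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that the 0.5 in equation (23) is a multiplicative factor (which reproduces the first entries of Table 1 and places the last grid point at FS/2), that the logarithms are base 10, and that $A_p(z) = 1 + \sum \alpha_k z^{-k}$. The odd-order LSPs are located by scanning $Q'$ (built from the $B_k$ of equation (13)) over the mel-spaced grid and refining each sign change by bisection. For brevity, the even-order LSPs are found by bisecting $P'$ directly between neighbouring odd-order LSPs instead of by the grid scan of equations (26) and (27). All function names are hypothetical.

```python
import math

FS = 8000.0      # sampling frequency in Hz (value used in the paper)
N_STEPS = 399    # number of search steps in equation (23)


def mel_grid(fs=FS, n_steps=N_STEPS):
    """Grid f_0..f_n_steps in Hz, equally spaced on the mel scale (eq. 23)."""
    scale = 1000.0 / math.log10(2.0)
    f_mel = scale * math.log10(1.0 + fs / 1000.0)
    index = f_mel / n_steps
    # Assumption: the trailing 0.5 is a factor, so the grid ends at fs / 2.
    return [(10.0 ** (index * n / scale) - 1.0) * 1000.0 * 0.5
            for n in range(n_steps + 1)]


def reduced_coeffs(alpha):
    """A_0..A_{p/2} and B_0..B_{p/2} from equation (13); p must be even."""
    p = len(alpha)
    a = [0.0] + list(alpha)            # a[k] holds alpha_k
    big_a, big_b = [1.0], [1.0]
    for k in range(1, p // 2 + 1):
        big_a.append((a[k] - a[p + 1 - k]) + big_a[k - 1])
        big_b.append((a[k] + a[p + 1 - k]) - big_b[k - 1])
    return big_a, big_b


def cos_sum(coeffs, freq_hz, fs=FS):
    """Bracketed cosine series of equations (17)/(18) at one frequency."""
    omega = 2.0 * math.pi * freq_hz / fs
    half = len(coeffs) - 1             # equals p / 2
    total = 0.5 * coeffs[half]
    for k in range(half):
        total += coeffs[k] * math.cos((half - k) * omega)
    return total


def bisect_root(func, lo, hi, tol=1e-6):
    """Shrink a bracket [lo, hi] that contains one sign change of func."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def lsp_frequencies(alpha, fs=FS):
    """LSP frequencies in Hz, ascending, for an even-order predictor."""
    big_a, big_b = reduced_coeffs(alpha)
    grid = mel_grid(fs)
    odd = []
    prev = cos_sum(big_b, grid[0], fs)
    for lo, hi in zip(grid, grid[1:]):
        cur = cos_sum(big_b, hi, fs)
        if cur == 0.0 or prev * cur < 0.0:     # sign change inside [lo, hi]
            odd.append(bisect_root(lambda f: cos_sum(big_b, f, fs), lo, hi))
        prev = cur
    # Each even-order LSP lies between two odd-order LSPs (or below fs / 2).
    edges = odd + [fs / 2.0]
    even = [bisect_root(lambda f: cos_sum(big_a, f, fs), lo, hi)
            for lo, hi in zip(edges, edges[1:])]
    return sorted(odd + even)
```

A quick sanity check is the trivial predictor $A_p(z) = 1$ (all $\alpha_k = 0$) with p = 10, for which the LSPs are evenly spaced at multiples of FS/22. The sketch misses a root if two roots of $Q'$ fall inside the same grid cell, a limitation it shares with any fixed-grid sign-change search.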
Table 1. Part of Searching Interval in Fast Algorithm

Figure 2 shows the full block diagram of the proposed method, and Figures 3 and 4 show its sub-block diagrams. Figure 3 represents the searching process for the odd-order parameters in the fast transformation algorithm, and Figure 4 shows that for the even-order parameters.

Fig. 2 Block Diagram of Proposed method

Fig. 3 Sub-Block Diagram of Proposed method (I)

Fig. 4 Sub-Block Diagram of Proposed method (II)

4 Experimental Results
Computer simulation was performed to evaluate the proposed algorithm using a PC interfaced with a 16-bit AD/DA converter. To measure the performance of the proposed algorithm, we used the following speech data. The speech data was sampled at 8kHz and quantized with 16 bits. The following sentences were uttered five times by five male and female speakers in their middle or late 20s. The data were recorded in a quiet room, with an SNR (Signal to Noise Ratio) greater than 30dB.

Utterance 1: /Insune komaneun cheonjaesonyuneul joahanda/
Utterance 2: /Yesunimkeoseo cheonjichangjoeu kyohuneul malseumhasyuda./
Utterance 3: /Soongsildae jeongbo tongshingong hak kwa eumseongtongshin yeungusilida/
Utterance 4: /Changgongeul hechye naganeun ingan eu dojeoneun keuchieobda/

The simulation is performed in two steps to evaluate how far the proposed algorithm reduces the computation of the LSP parameters. First, the computational amount of the LPC to LSP conversion is estimated by applying the proposed algorithm in place of the real root algorithm of the CELP vocoder. Second, the proposed method is applied in the G.723.1 vocoder and we observe how much the SNR of the whole vocoder changes. Figure 5 shows the LSP parameters of one speech frame of utterance 1 in order to compare the real root method and the proposed method. As Figure 5 shows, the proposed method yields the same LSP parameter values as the conventional method.

Fig. 5 The Distribution of LSP Parameters (a) Speech Signal (b) Spectrum characteristics (c) The Real Root Method (d) The Distribution of LSP Parameters from Experimental Results

Table 2 shows the computational amount of the conventional real root method and of the proposed method (step 1). The proposed LPC to LSP transformation algorithm requires less computation than the conventional real root algorithm: the computational amount of the proposed method is reduced by about 45% on average, while the transformed LSP parameters are the same as those of the conventional real root method. Table 3 gives the simulation results of the real root method and the proposed method in terms of the overall SNR of G.723.1 (step 2). As Table 3 shows, the SNR obtained with the proposed algorithm is similar to that of G.723.1 with the real root method.

Table 2. The Computation Amount of LSP Transformation (step 1), unit: [times/frames]

| Utterance | Operation | Real Root Method | Proposed Method | Decreased Ratio (%) | Decreased Ratio, pair mean (%) |
|---|---|---|---|---|---|
| Utterance 1: 132 frames | + | 9057.3 | 5428.3 | 40.06 | 41.36 |
| | - | 6610.5 | 3790.5 | 42.66 | |
| | * | 27027.2 | 14148.2 | 47.65 | 48.15 |
| | / | 1648.9 | 846.9 | 48.64 | |
| Utterance 2: 173 frames | + | 9055.8 | 5426.8 | 40.07 | 41.37 |
| | - | 6609.5 | 3789.5 | 42.67 | |
| | * | 27023.1 | 14144.1 | 47.66 | 48.16 |
| | / | 1648.6 | 846.6 | 48.65 | |
| Utterance 3: 142 frames | + | 9056.1 | 5427.1 | 40.07 | 41.37 |
| | - | 6609.7 | 3789.7 | 42.66 | |
| | * | 27023.9 | 14144.9 | 47.66 | 48.15 |
| | / | 1648.7 | 846.7 | 48.64 | |
| Utterance 4: 154 frames | + | 9056.5 | 5427.5 | 40.07 | 41.37 |
| | - | 6610.0 | 3790.0 | 42.66 | |
| | * | 27025.0 | 14146.1 | 47.66 | 48.15 |
| | / | 1648.8 | 846.8 | 48.64 | |

Table 3. The SNR of the Fast LSP Transformation Algorithm (step 2), unit: [dB]

| Utterance | G.723.1 | Proposed Method |
|---|---|---|
| Utterance 1 | 13.06 | 12.90 |
| Utterance 2 | 10.68 | 10.45 |
| Utterance 3 | 10.18 | 10.32 |
| Utterance 4 | 11.74 | 11.72 |

5 Conclusion
LSP parameters are used for speech analysis in low-bit-rate speech vocoders and recognizers since they have the advantages of constant spectrum sensitivity, low spectrum distortion and easy linear interpolation. But the method of transforming LPC to LSP is so complex that it takes much time to compute. In this paper, we proposed a new transformation algorithm based on the real root algorithm that is widely used in CELP vocoders [4,9~11]. The real root method is simpler than the other transformation methods, but it takes much time to compute because the root search proceeds sequentially along the frequency axis. To reduce the cost of the LSP transformation, the main point of the proposed algorithm is that the searching interval is controlled by the mel scale. The simulation was performed in two steps: first, the computational amount of the LPC to LSP transformation was estimated, and second, the overall SNR was estimated with the proposed algorithm. With the proposed method, the computational amount of the LPC to LSP transformation decreases by about 45% on average.

References:
[1] N. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Signal Processing, Prentice-Hall, 1984, pp. 221-220.
[2] S. Saito and K. Nakata, Fundamentals of Speech Signal, Academic Press, 1985, pp. 126-132.
[3] A. M. Kondoz, Digital Speech, John Wiley & Sons Ltd, 1994, pp. 84-92.
[4] M. J. Bae, Digital Speech Analysis, Dongyoung Press, 1998, pp. 95-120.
[5] P. Kabal and R. P. Ramachandran, The computation of line spectral frequencies using Chebyshev polynomials, IEEE Trans. on ASSP, December 1986.
[6] ITU-T Recommendation G.723.1, March, 1996.
[7] Thomas Parsons, Voice and Speech Processing, McGraw-Hill, pp. 71-73.
[8] F. Soong and B. H. Juang, Line Spectrum pairs and speech data compression, Proc. of ICASSP, 1.10.1-1.10.4, 1984.
[9] SoYeon MIN, MyungJin BAE, A High-Speed LSF Transformation Algorithm for CELP Vocoders, The Journal of the Acoustic Society of Korea, Vol. 20, No. 1E, March, 2001.
[10] EunYoung KANG, SoYeon MIN, MyungJin BAE, A Study on the reduction of LSP (Line Spectrum Pairs) Transformation Time Using the Voice Characteristics, wireless2001,10, July, 2001.
[11] SoYeon MIN, MyungJin BAE, A Study on the LSP Frequency Scaling Methods using LSP parameters Distribution Characteristics, The Journal of the Acoustic Society of Korea, Vol. 21, No. 3, April, 2002.