electronics
Article

An Efficient Codebook Search Algorithm for Line Spectrum Frequency (LSF) Vector Quantization in Speech Codec
Yuqun Xue 1,2, Yongsen Wang 1,2, Jianhua Jiang 1,2, Zenghui Yu 1, Yi Zhan 1, Xiaohua Fan 1 and Shushan Qiao 1,2,*
1 Smart Sensing R&D Center, Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China; [email protected] (Y.X.); [email protected] (Y.W.); [email protected] (J.J.); [email protected] (Z.Y.); [email protected] (Y.Z.); [email protected] (X.F.)
2 Department of Microelectronics, University of Chinese Academy of Sciences, Beijing 100049, China
* Correspondence: [email protected]
Abstract: A high-performance vector quantization (VQ) codebook search algorithm is proposed in this paper. VQ is an important data compression technique that has been widely applied to speech, image, and video compression. However, the codebook search process demands a high computational load. To solve this issue, a novel algorithm that consists of training and encoding procedures is proposed. In the training procedure, a training speech dataset is used to build the squared-error distortion look-up table for each subspace. In the encoding procedure, firstly, an input vector is quickly assigned to a search subspace. Secondly, the candidate code word group is obtained by employing the triangular inequality elimination (TIE) equation. Finally, a partial distortion elimination (PDE) technique is employed to reduce the number of multiplications. The proposed method reduces the number of searches and the computation load significantly, especially when the input vectors are uncorrelated. The experimental results show that the proposed algorithm provides a computational saving (CS) of up to 85% over the full search algorithm, up to 76% over the TIE algorithm, and up to 63% over the iterative TIE algorithm. Further, the proposed method provides CS and load reduction of up to 29–33% and 67–69%, respectively, over the BSS-ITIE algorithm.

Keywords: vector quantization (VQ); codebook search; line spectrum frequency (LSF); speech codec

Citation: Xue, Y.; Wang, Y.; Jiang, J.; Yu, Z.; Zhan, Y.; Fan, X.; Qiao, S. An Efficient Codebook Search Algorithm for Line Spectrum Frequency (LSF) Vector Quantization in Speech Codec. Electronics 2021, 10, 380. https://doi.org/10.3390/electronics10040380
Academic Editor: Manuel Rosa-Zurera
Received: 3 December 2020; Accepted: 2 February 2021; Published: 4 February 2021
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Vector quantization (VQ) is a high-performance technique for data compression. Due to its simple coding and high compression ratio, it has been successfully applied to speech, image, audio, and video compression [1–6]. It is also a key part of the G.729 Recommendation. A major handicap is that a remarkable computational load is required for the VQ of the line spectrum frequency (LSF) coefficients of the speech codec [7–12]. Thus, it is necessary to reduce the computation load of VQ.

Conventionally, a full search algorithm is employed to find the code word that best matches an arbitrary input vector. However, the full search algorithm demands a large computation load. To reduce the encoding search complexity, many approaches have been proposed [13–28]. These approaches can be classified into four main types, according to their base techniques. The first type is the tree-structured VQ (TSVQ) technique [13–17], the second type is the triangular inequality elimination (TIE)-based approach [18–21], the third type is the equal-average equal-variance equal-norm nearest neighbor search (EEENNS) method [22–24] based on the statistical characteristic values of the input signal, and the last type is the binary search space-structured VQ (BSS-VQ) method [25–28].

In the TSVQ approaches [13–17], the search complexity can be reduced significantly. However, the reconstructed speech quality is poor because the selected code word is not necessarily the best match to the input vector. In contrast, the TIE-based method [18–21] can achieve an enormous computational load reduction without loss of speech quality. By
employing the high correlation between adjacent frames, a TIE method is used to reject the impossible candidate code words. However, the performance of computational load reduction is weakened when the input vectors are uncorrelated. The EEENNS technique employs the statistical characteristics of an input vector, such as the mean, the variance, and the norm, to reject the impossible code words. In order to further reduce the computational load, the BSS-VQ method [25] was proposed, in which the number of candidate code words is closely related to the distribution of the hit probability, and a sharpened distribution at specific code words yields an enormous computational load reduction. However, the probability distribution is not uniform; some subspaces are concentrated, and some are flattened. Thus, the performance is sometimes not good.

To solve the above issues, an efficient VQ codebook search algorithm is proposed.
Even though the input vectors are uncorrelated, it can still reduce the computational load significantly while maintaining the quantization accuracy. Compared with previous works on the full search algorithm [6], TIE [18], ITIE [21], and the BSS-ITIE [28], the experimental results show that the proposed algorithm can quantize the input vector with the lowest computational load.

The rest of this paper is organized as follows. Section 2 describes the encoding procedure of the LSF coefficients in G.729. Section 3 presents the theory of the proposed algorithm in detail. Section 4 shows the experimental results and performance comparison between the proposed algorithm and previous works. Finally, Section 5 concludes this work.

2. LSF Coefficients Quantization in G.729

The ITU-T G.729 [29] speech codec was selected as the platform to verify the performance of the proposed algorithm. Thus, before introducing the theory of the proposed method, the principle of the LSF quantization in G.729 is introduced here.

The LSF coefficients are obtained by the equation $\omega_i = \arccos(q_i)$, where $q_i$ is the LSF coefficient in the cosine domain, and $\omega_i$ is the computed LSF coefficient in the frequency domain. The procedure of the LSF quantizer is organized as follows: a switched 4th-order moving average (MA) prediction is used to predict the LSF coefficients of the current frame. The difference between the computed and predicted coefficients is quantized using a two-stage vector quantizer. The first stage is a 10-dimensional VQ using a codebook L1 with 128 entries (7 bits). In the second stage, the quantization error vectors of the first stage are split into two sub-vectors, which are then quantized by a split VQ associated with two codebooks, L2 and L3, each containing 32 entries (5 bits). Figure 1 illustrates the structure of the two-stage VQ for LSF coefficients.
[Figure 1: block diagram. LPC analysis of the speech frame yields the LSF coefficients; a 4th-order MA filter yields the predicted LSF; the prediction error is quantized by the two-stage VQ, selecting the code vectors l1 (codebook L1), l2 (codebook L2), and l3 (codebook L3) that minimize the (weighted) MSE distortion, together with the predictor selection p0.]

Figure 1. Structure of the two-stage vector quantization (VQ) for line spectrum frequency (LSF) coefficients.
To explain the quantization process, it is convenient to describe the decoding process first. Each quantized value is obtained from the sum of two code words, as follows:

$$\hat{l}_i = \begin{cases} L1_i(l1) + L2_i(l2), & i = 1, \ldots, 5 \\ L1_i(l1) + L3_{i-5}(l3), & i = 6, \ldots, 10 \end{cases} \qquad (1)$$
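As a concrete sketch of Equation (1), the decoder-side reconstruction can be written in Python; the array names `L1`, `L2`, `L3` and the 0-based indexing are illustrative assumptions, not the reference G.729 implementation:

```python
import numpy as np

def reconstruct_lsf(L1, L2, L3, l1, l2, l3):
    """Sketch of Equation (1): the quantized vector is the sum of the
    first-stage code word and the split second-stage code words."""
    l_hat = np.empty(10)
    l_hat[:5] = L1[l1, :5] + L2[l2]   # i = 1..5: lower half uses L2
    l_hat[5:] = L1[l1, 5:] + L3[l3]   # i = 6..10: upper half uses L3
    return l_hat
```

Only the indices l1, l2, and l3 (plus p0) are transmitted; the decoder looks up the same codebook tables to rebuild the quantized LSF vector.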
where l1, l2, and l3 are the codebook indices. To guarantee that the reconstructed filters are stable, the vector $\hat{l}_i$ is arranged such that adjacent elements have a minimum distance of $d_{min}$. This rearrangement process is done twice. First, the quantized LSF coefficients $\hat{\omega}_i^{(m)}$ for the current frame $m$ are obtained from the weighted sum of the previous quantizer outputs, $\hat{l}_i^{(m-k)}$, and the current quantizer output, $\hat{l}_i^{m}$:
$$\hat{\omega}_i^{(m)} = \left(1 - \sum_{k=1}^{4} \hat{p}_{i,k}\right) \hat{l}_i^{m} + \sum_{k=1}^{4} \hat{p}_{i,k}\, \hat{l}_i^{(m-k)}, \quad i = 1, \ldots, 10, \qquad (2)$$
where $\hat{p}_{i,k}$ are the coefficients of the switched MA predictor selected by the parameter $p0$. For each of the two MA predictors, the best approximation to the current LSF coefficients has to be found. The best approximation is defined as the one that minimizes the weighted mean-squared error (MSE):
$$E_{lsf} = \sum_{i=1}^{10} w_i\,(\omega_i - \hat{\omega}_i)^2. \qquad (3)$$
The weights emphasize the relative importance of each LSF coefficient. The weights $w_i$ are made adaptive as a function of the unquantized LSF coefficients:

$$w_1 = \begin{cases} 1.0, & \text{if } \omega_2 - 0.04\pi - 1 > 0 \\ 10(\omega_2 - 0.04\pi - 1)^2 + 1, & \text{otherwise} \end{cases} \qquad (4)$$

$$w_i = \begin{cases} 1.0, & \text{if } \omega_{i+1} - \omega_{i-1} - 1 > 0 \\ 10(\omega_{i+1} - \omega_{i-1} - 1)^2 + 1, & \text{otherwise} \end{cases} \quad \text{for } 2 \le i \le 9, \qquad (5)$$

$$w_{10} = \begin{cases} 1.0, & \text{if } -\omega_9 + 0.92\pi - 1 > 0 \\ 10(-\omega_9 + 0.92\pi - 1)^2 + 1, & \text{otherwise.} \end{cases} \qquad (6)$$
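A minimal Python sketch of Equations (4)-(6), using 0-based indexing (the function name is an assumption), including the 1.2 scaling of the fifth and sixth weights noted in the text:

```python
import numpy as np

def lsf_weights(omega):
    """Adaptive weights w_1..w_10 from the unquantized LSF vector
    omega (radians), per Equations (4)-(6); 0-based indexing."""
    w = np.ones(10)
    d = omega[1] - 0.04 * np.pi - 1.0          # Eq. (4): uses omega_2
    if d <= 0.0:
        w[0] = 10.0 * d * d + 1.0
    for i in range(1, 9):                      # Eq. (5): w_2 .. w_9
        d = omega[i + 1] - omega[i - 1] - 1.0
        if d <= 0.0:
            w[i] = 10.0 * d * d + 1.0
    d = -omega[8] + 0.92 * np.pi - 1.0         # Eq. (6): uses omega_9
    if d <= 0.0:
        w[9] = 10.0 * d * d + 1.0
    w[4] *= 1.2                                # w_5 and w_6 scaled by 1.2
    w[5] *= 1.2
    return w
```

The quadratic branch grows the weight as neighboring LSF coefficients crowd together, so closely spaced (perceptually sensitive) formant regions are quantized more accurately.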
Then, the weights $w_5$ and $w_6$ are each multiplied by 1.2. The vector to be quantized for the current frame $m$ is obtained from Equation (7):
" 4 # 4 ! (m) (m) ˆ(m−k) Ii = ωi − ∑ pˆi,k Ii / 1 − ∑ pˆi,k , i = 1, . . . , 10. (7) k = 1 k = 1
The first codebook, L1, is searched, and the entry l1 that minimizes the unweighted MSE is selected. Then the second codebook, L2, is searched by computing the weighted MSE, and the entry l2 that results in the lowest error is selected. After selecting the first-stage vector, l1, and the lower part of the second stage, l2, the higher part of the second stage is searched from the codebook L3. The vector l3 that minimizes the weighted MSE is selected. The resulting vector $\hat{l}_i$, i = 1, ..., 10, is rearranged twice using the above procedure. The procedure is done for each of the two MA predictors defined by p0, and the MA predictor that produces the lowest weighted MSE is selected. Table 1 shows the two groups of coefficients of the MA predictor. When the first group of coefficients is selected, the value of p0 is 0. Otherwise, the value of p0 is 1. Table 2 shows the bit allocation of the LSF quantizer.
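The sequential search described above can be sketched in Python as a full-search baseline (the codebook arrays and names are placeholders; the rearrangement step and the loop over the two p0 predictors are omitted for brevity):

```python
import numpy as np

def two_stage_search(target, w, L1, L2, L3):
    """Full-search sketch of the procedure above: L1 by unweighted MSE,
    then L2 and L3 by weighted MSE on the first-stage residual.

    target : 10-dim vector to quantize
    w      : weights from Equations (4)-(6)
    L1     : (128, 10) array; L2, L3 : (32, 5) arrays (placeholders)
    """
    err1 = ((target - L1) ** 2).sum(axis=1)              # unweighted MSE
    l1 = int(np.argmin(err1))
    resid = target - L1[l1]                              # first-stage error
    err2 = (w[:5] * (resid[:5] - L2) ** 2).sum(axis=1)   # weighted MSE, low half
    l2 = int(np.argmin(err2))
    err3 = (w[5:] * (resid[5:] - L3) ** 2).sum(axis=1)   # weighted MSE, high half
    l3 = int(np.argmin(err3))
    return l1, l2, l3
```

This exhaustive scan of all 128 + 32 + 32 entries is exactly the computational load that the fast search algorithm of Section 3 aims to reduce.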
Table 1. The value of the MA predictor coefficients.
p0 | k | MA Predictor Coefficients
---|---|---------------------------
0 | 1 | 0.2570 0.2780 0.2800 0.2736 0.2757 0.2764 0.2675 0.2678 0.2779 0.2647
0 | 2 | 0.2142 0.2194 0.2331 0.2230 0.2272 0.2252 0.2148 0.2123 0.2115 0.2096
0 | 3 | 0.1670 0.1523 0.1567 0.1580 0.1601 0.1569 0.1589 0.1555 0.1474 0.1571
0 | 4 | 0.1238 0.0925 0.0798 0.0923 0.0890 0.0828 0.1010 0.0988 0.0872 0.1060
1 | 1 | 0.2360 0.2405 0.2499 0.2495 0.2517 0.2591 0.2636 0.2625 0.2551 0.2310
1 | 2 | 0.1285 0.0925 0.0779 0.1060 0.1183 0.1176 0.1277 0.1268 0.1193 0.1211
1 | 3 | 0.0981 0.0589 0.0401 0.0654 0.0761 0.0728 0.0841 0.0826 0.0776 0.0891
1 | 4 | 0.0923 0.0486 0.0287 0.0498 0.0526 0.0482 0.0621 0.0636 0.0584 0.0794
Table 2. Bit allocation of the LSF quantizer.
Procedure | Order | Code Index | Bits
----------|-------|------------|-----
First stage | 10th order | l1 | 7
Second stage | 5th low order | l2 | 5
Second stage | 5th high order | l3 | 5
Selection of predictive filter | — | p0 | 1
Total | | | 18
3. Proposed Search Algorithm

In this paper, a fast codebook search algorithm for vector quantization is proposed to reduce the computation load of the LSF coefficients' quantization in the G.729 speech codec. Even though the input vectors are uncorrelated, it can still reduce the computation load significantly while maintaining the quantization accuracy.

In the BSS-VQ method [25], the number of candidate code words is strongly related to the distribution of the hit probability for each subspace, and a sharpened distribution at specific code words yields an enormous computational load reduction while maintaining the quantization accuracy. However, the probability distribution is not uniform: some subspaces are concentrated, and some are flattened. Thus, its performance is sometimes poor. The TIE algorithm [18] uses the strong correlation between adjacent values to narrow the search range. However, its performance is poor when the inputs are uncorrelated.

Considering the drawbacks of the above methods, and to further reduce the computational load, especially when the inputs are uncorrelated, a novel algorithm is proposed. After a statistical analysis, the code word corresponding to the highest hit probability for each subspace is obtained and selected as the reference code word, and a squared-error distortion look-up table for each subspace is built. Adjacent code words in the look-up table are highly correlated because the squared-error distortion changes only slightly between them. The reference code word corresponding to the highest probability is regarded as the best-matched code word. Thus, the smaller the squared-error distortion with respect to the reference code word, the more likely the candidate code word is to be the best match. Therefore, the TIE technique can be employed to reject the impossible code words. The proposed algorithm consists of a training procedure and an encoding procedure.
The structure of the proposed algorithm is illustrated in Figure 2, and the theory of the proposed algorithm is presented in detail as follows.
[Figure 2: flowchart. Training procedure: a training speech dataset is processed; the code word corresponding to the highest hit probability is obtained; the squared-error distortion look-up table for each subspace is prebuilt. Encoding procedure: given a testing speech dataset and a TQA, a fast locating technique assigns each input to a subspace; the candidate search group CSG(cr) is obtained by employing the TIE equation; the PDE technique is applied to the computation of the squared-error distortion; the index of the encoded code word is output.]

Figure 2. Structure of the proposed search algorithm.
3.1. Training Procedure

In the three-stage training procedure, the squared-error look-up table for each subspace is prebuilt. At stage 1, each dimension is dichotomized into two subspaces. Then, an input vector is assigned to a corresponding subspace according to the entries of the input vector [25]. For instance, when the input is a 10-dimensional vector, there are $2^{10} = 1024$ subspaces in G.729. Before the encoding procedure, a look-up table that contains the hit probability statistical information on the code words is prebuilt.

The dichotomy position for each dimension is defined as the mean of all the code words from the codebook, presented as:

$$mean(k) = \frac{1}{CSize} \sum_{i=1}^{CSize} c_i(k), \quad 0 \le k < Dim, \qquad (8)$$
where $c_i(k)$ is the $k$-th component of the $i$-th code word $c_i$, and $mean(k)$ is the average value of all the $k$-th components. For instance, in the first stage of the LSF coefficients quantization in G.729, $CSize = 128$ and $Dim = 10$ in the codebook L1. A parameter $v_n(k)$ is defined for vector quantization of the $n$-th input vector $x_n$, symbolized as:

$$v_n(k) = \begin{cases} 2^k, & x_n(k) \ge mean(k) \\ 0, & x_n(k) < mean(k) \end{cases}, \quad 0 \le k < Dim, \qquad (9)$$

where $x_n(k)$ is the $k$-th component of $x_n$. Then, $x_n$ is assigned to a subspace $j$ ($bss_j$), where $j$ is the sum of $v_n(k)$ over all the dimensions, presented as:

$$x_n \in bss_j \;\Big|\; j = \sum_{k=0}^{Dim-1} v_n(k). \qquad (10)$$
For instance, given an input $x_n$ = {0.44, 0.08, 0.51, 0.87, 1.26, 1.40, 2.10, 2.18, 2.30, 2.42}, $v_n(k)$ = {1, 0, 0, 8, 16, 0, 64, 128, 0, 0} and $j = 217$ are obtained by (9) and (10), respectively. Then, the input vector $x_n$ is assigned to the subspace $bss_j$ with $j = 217$. This means that an input vector can be assigned to a corresponding subspace quickly, with only a few basic operations.
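The fast locating step of Equations (9)-(10) amounts to building a bitmask. A minimal sketch follows; the `mean` vector below is hypothetical, chosen only so that the worked example above reproduces j = 217:

```python
import numpy as np

def subspace_index(x, mean):
    """Equations (9)-(10): each dimension contributes bit 2^k when
    x(k) >= mean(k); the sum of the bits is the subspace index j."""
    j = 0
    for k in range(len(x)):
        if x[k] >= mean[k]:
            j += 1 << k
    return j

# Input from the worked example; hypothetical per-dimension thresholds.
x = np.array([0.44, 0.08, 0.51, 0.87, 1.26, 1.40, 2.10, 2.18, 2.30, 2.42])
mean = np.array([0.30, 0.50, 0.90, 0.80, 1.00, 1.50, 2.00, 2.10, 2.50, 2.60])
```

With these thresholds, `subspace_index(x, mean)` returns 217 = 1 + 8 + 16 + 64 + 128: only comparisons and bit shifts are needed, matching the claim that locating costs a few basic operations.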
At stage 2, the hit probability table for each subspace is prebuilt through a training mechanism, which includes the probability of each code word being the best-matched code word in each subspace. This is defined as $P_{hit}(c_i \mid bss_j)$, where $1 \le i \le CSize$, $1 \le j \le Snum$, and $Snum = 2^{Dim}$ is the number of subspaces. Subsequently, the table is sorted in descending order by the hit probability value. For example, when $i = 1$, $P_{hit}(c_i \mid bss_j)\big|_{i=1} = \max_{c_i} P_{hit}(c_i \mid bss_j)$ represents the highest hit probability in $bss_j$ and its corresponding code word. The sum of all the hit probabilities for each subspace is 1.0. The hit probability $P_{hit}(c_i \mid bss_j)$ is computed by Equation (11), and the cumulative probability of the top $N$ code words is symbolized by Equation (12):
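The stage-2 training statistics can be sketched empirically: count, per subspace, how often each code word wins a full search over a training set, then sort each subspace's code words by descending hit probability. The function name, the full-search reference encoder, and the table layout are illustrative assumptions:

```python
import numpy as np

def train_hit_table(train_vectors, codebook, mean):
    """Sketch of stage 2: estimate P_hit(c_i | bss_j) by counting how
    often each code word is the full-search winner within each
    subspace, then sort code words by descending hit probability."""
    csize, dim = codebook.shape
    snum = 1 << dim                              # Snum = 2^Dim subspaces
    counts = np.zeros((snum, csize))
    for x in train_vectors:
        # Stage-1 fast locating (Equations (9)-(10)).
        j = sum(1 << k for k in range(dim) if x[k] >= mean[k])
        # Full search gives the "true" best-matched code word.
        best = int(np.argmin(((x - codebook) ** 2).sum(axis=1)))
        counts[j, best] += 1.0
    totals = counts.sum(axis=1, keepdims=True)
    probs = counts / np.maximum(totals, 1.0)     # visited rows sum to 1.0
    order = np.argsort(-probs, axis=1)           # descending P_hit per subspace
    return probs, order
```

Each row of `order` lists a subspace's code words from most to least likely, which is the sorted table the encoding procedure consults first.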