applied sciences
Article A Multi-Frame PCA-Based Stereo Audio Coding Method
Jing Wang *, Xiaohan Zhao, Xiang Xie and Jingming Kuang
School of Information and Electronics, Beijing Institute of Technology, 100081 Beijing, China; [email protected] (X.Z.); [email protected] (X.X.); [email protected] (J.K.) * Correspondence: [email protected]; Tel.: +86-138-1015-0086
Received: 18 April 2018; Accepted: 9 June 2018; Published: 12 June 2018
Abstract: With the increasing demand for high quality audio, stereo audio coding has become more and more important. In this paper, a multi-frame coding method based on Principal Component Analysis (PCA) is proposed for the compression of audio signals, including both mono and stereo signals. The PCA-based method projects the input audio spectral coefficients onto the eigenvectors of their covariance matrices and reduces the coding bitrate by retaining only a small number of these eigenvector directions. The multi-frame joint technique makes the PCA-based method more efficient and feasible. This paper also proposes a quantization method that utilizes Pyramid Vector Quantization (PVQ) to quantize the PCA matrices proposed in this paper with few bits. Parametric coding algorithms are also employed with PCA to ensure the high efficiency of the proposed audio codec. Subjective listening tests with Multiple Stimuli with Hidden Reference and Anchor (MUSHRA) have shown that the proposed PCA-based coding method is efficient at processing stereo audio.
Keywords: stereo audio coding; Principal Component Analysis (PCA); multi-frame; Pyramid Vector Quantization (PVQ)
1. Introduction

The goal of audio coding is to represent audio in digital form with as few bits as possible while maintaining the intelligibility and quality required for particular applications [1]. In audio coding, it is very important to deal with the stereo signal efficiently, which can offer a better experience in applications such as mobile communication and live audio broadcasting. In recent years, a variety of techniques for stereo signal processing have been proposed [2,3], including M/S stereo, intensity stereo, joint stereo, and parametric stereo. M/S stereo coding transforms the left and right channels into a mid channel and a side channel. Intensity stereo works on the principle of sound localization [4]: humans have a less keen sense of perceiving the direction of certain audio frequencies. By exploiting this characteristic, intensity stereo coding can reduce the bitrate with little or no perceived change in apparent quality. Therefore, at very low bitrates, this type of coding usually yields a gain in perceived audio quality. Intensity stereo is supported by many audio compression formats such as Advanced Audio Coding (AAC) [5,6], which is used for the transfer of relatively low bitrate, acceptable-quality audio over modest internet access speeds. Encoders with joint stereo such as Moving Picture Experts Group (MPEG) Audio Layer III (MP3) and Ogg Vorbis [7] use different algorithms to determine when to switch and how much space should be allocated to each channel (the quality can suffer if the switching is too frequent or if the side channel does not get enough bits). Based on the principles of human hearing [8,9], Parametric Stereo (PS) performs sparse coding in the spatial domain. The idea behind parametric stereo coding is to maximize the compression of a stereo signal by transmitting parameters describing the spatial image. For stereo input signals, the compression process basically follows one idea: synthesizing one signal
Appl. Sci. 2018, 8, 967; doi:10.3390/app8060967; www.mdpi.com/journal/applsci

from the two input channels and extracting parameters to be encoded and transmitted in order to add spatial cues for the synthesized stereo at the receiver's end. The parameter estimation is made in the frequency domain [10,11]. AAC with Spectral Band Replication (SBR) and parametric stereo is defined as High-Efficiency Advanced Audio Coding version 2 (HE-AACv2). On the basis of the several stereo algorithms mentioned above, other improved algorithms have been proposed [12], which use Max Coherent Rotation (MCR) to enhance the correlation between the left channel and the right channel, and use the MCR angle to substitute the spatial parameters. This kind of method with MCR reduces the bitrate of the spatial parameters and increases the performance of some spatial audio coding, but it has not been widely used. Audio codecs usually use transforms such as the Discrete Cosine Transform (DCT) [13], the Fast Fourier Transform (FFT) [14], and the Wavelet Transform [15] to transfer the audio signal from the time domain to the frequency domain in suitably windowed time frames. The Modified Discrete Cosine Transform (MDCT) is a lapped transform based on the type-IV Discrete Cosine Transform (DCT-IV). Compared to other Fourier-related transforms, it has half as many outputs as inputs, and it has been widely used in audio coding. These transforms are general transformations; therefore, the energy aggregation can be further enhanced through an additional transformation like PCA [16,17], which is one of the optimal orthogonal transformations based on statistical properties. The orthogonal transformation can be understood as a coordinate transformation. That is, fewer new bases can be selected by PCA to construct a low-dimensional space that describes the data in the original high-dimensional space, which means the compressibility is higher.
Some work has been done on audio coding methods combined with PCA from different views. Paper [18] proposed a novel method to match different subbands of the left channel and the right channel based on PCA, through which the redundancy of the two channels can be reduced further. Paper [19] mainly focused on multichannel processing and the application of PCA in the subband, and it discussed several details of PCA, such as the energy of each eigenvector and the signal waveform after PCA. That paper introduced the rotation angle with the Karhunen-Loève Transform (KLT) instead of the rotation matrix and the reduced-dimensional matrix used in our paper. Paper [20] mainly focused on the localization of multichannel audio based on PCA, with which the original audio is separated into primary and ambient components. Then, these different components are used to analyze spatial perception, respectively, in order to improve the robustness of multichannel audio coding. In this paper, a multi-frame, PCA-based coding method for audio compression is proposed, which makes use of the properties of the orthogonal transformation and explores the feasibility of increasing the compression rate further after the time-frequency transform. Compared to the previous work, this paper proposes a different method of applying PCA in audio coding. The main contributions of this paper include a new matrix construction method, a matrix quantization method based on PVQ, a combination method of PCA and parametric stereo, and a multi-frame technique combined with PCA. In this method, the encoder transfers the matrices generated by PCA instead of the coefficients of the frequency spectrum. The proposed PCA-based coding method can handle both mono signals and, combined with parametric stereo, stereo signals. With the application of the multi-frame technique, the bitrate can be further reduced with a small impact on quality.
To reduce the bitrate of the matrices, a method of matrix quantization based on PVQ [21] is put forward in this paper. The rest of the paper is organized as follows: Section 2 describes the multi-frame, PCA-based coding method for mono signals. Section 3 presents the proposed design of the matrix quantization. In Section 4, the PCA-based coding method for the mono signal is extended to stereo signals combined with improved parametric stereo. The experimental results, discussion, and conclusion are presented in Sections 5–7, respectively.
2. Multi-Frame PCA-Based Coding Method

2.1. Framework of PCA-Based Coding Method

The encoding process can be described as follows: after a time-frequency transformation such as MDCT, the frequency coefficients are used in the PCA module, which includes the multi-frame technique. The matrices generated by PCA are quantized and encoded to the bitstream. The decoder is the mirror image of the encoder: after decoding and de-quantizing, the matrices are used to generate frequency domain signals by inverse PCA (iPCA). Finally, after the frequency-time transformation, the decoder can export audio. Flowcharts of the encoder and decoder for mono signals are shown in Figures 1 and 2. MDCT is used to concentrate the energy of the signal on the low band in the frequency domain, which is good for the process of matrix construction (details are shown in Section 2.4). Some informal listening experiments have been carried out on the performance of applying PCA without MDCT. The experimental results show that without MDCT, the performance of PCA is slightly reduced, which means more bits are needed by the scheme without MDCT in order to achieve the same output quality as the scheme with MDCT. Thus, in this paper MDCT is assumed to enhance the performance of the PCA, although it brings more computational complexity.
Figure 1. Flowchart of mono encoder. (TF, Time-to-Frequency; PCA, Principle Component Analysis).
Figure 2. Flowchart of mono decoder. (iPCA, inverse Principle Component Analysis; FT, Frequency-to-Time).

2.2. Principle of PCA

The PCA's mathematical principle is as follows: after coordinate transformation, the original high-dimensional samples with certain relevance can be transferred to a new set of low-dimensional samples that are uncorrelated with each other. These new samples carry most of the information of the original data and can replace the original samples for follow-up analysis. There are several criteria for choosing new samples or selecting new bases in PCA. The typical method is to use the variance of the new sample F1 (i.e., the variance of the original sample mapped on the new coordinates). The larger Var(Fi) is, the more information Fi contains. So, the first principal component F1 should have the largest variance. If the first principal component F1 is not qualified to replace the original sample, then the second principal component F2 should be considered. F2 is the principal component with the largest variance except F1, and F2 is uncorrelated with F1, that is, Cov(F1, F2) = 0. This means that the base of F1 and the base of F2 are orthogonal to each other, which can reduce the data redundancy between new samples (or principal components) effectively. The third, fourth, and p-th principal components can be constructed similarly. The variance of these principal components is in descending order, and the corresponding base in the new space is uncorrelated with the other new bases. If there are m n-dimensional data, the procedure of PCA is shown in Table 1.
Table 1. PCA algorithm.

Algorithm: PCA (Principle Component Analysis)
(i) Obtain matrix X[m×n] by columns
(ii) Zero-mean the columns of X to get matrix X[m×n]
(iii) Calculate C = (1/m) X^T X
(iv) Calculate eigenvalues λ1, λ2, λ3 ... λn and eigenvectors a1, a2, a3 ... an of C
(v) Use the eigenvectors, ordered according to the eigenvalues, to construct P[n×n]
(vi) Select the first k columns of P[n×n] to construct the rotation matrix P[n×k]
(vii) Complete dimensionality reduction by Y[m×k] = X[m×n] × P[n×k]
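The steps of Table 1 can be sketched in NumPy as follows. This is a minimal illustration, not the paper's codec implementation; the function name and the use of `numpy.linalg.eigh` are our own choices.

```python
import numpy as np

def pca_reduce(X, k):
    """Steps (i)-(vii) of Table 1: return the rotation matrix P (n x k)
    and the reduced-dimensional matrix Y (m x k)."""
    m, n = X.shape
    Xc = X - X.mean(axis=0)                 # (ii) zero-mean each column
    C = (Xc.T @ Xc) / m                     # (iii) covariance matrix, n x n
    eigvals, eigvecs = np.linalg.eigh(C)    # (iv) eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]       # (v) order bases by descending eigenvalue
    P = eigvecs[:, order[:k]]               # (vi) first k columns -> rotation matrix
    Y = Xc @ P                              # (vii) Y[m x k] = X[m x n] x P[n x k]
    return P, Y
```

Because `eigh` returns orthonormal eigenvectors, the columns of P are orthogonal unit vectors, which is what makes the later reconstruction Y × P^T possible.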
The contribution rate of a principal component reflects the proportion of the total amount of data that the principal component accounts for after coordinate transformation, which can effectively solve the problem of dimension selection after dimensionality reduction. In PCA applications, people often use the cumulative contribution rate as the basis for principal component selection. The cumulative contribution rate Mk of the first k principal components is
Mk = ( ∑_{i=1}^{k} λi ) / ( ∑_{i=1}^{n} λi )    (1)

If the contribution rate of the first k principal components meets the specific requirements (the contribution rates are different according to different requirements), the first k principal components can be used to describe the original data to achieve the purpose of dimensionality reduction. PCA is a good transformation due to its properties, as follows:
(i) Each new base is orthogonal to the other new bases; (ii) Mean squared error of the data is the minimum after transformation; (iii) Energy is more concentrated and more convenient for data processing.
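Equation (1) gives a direct rule for picking the number of retained components. A small sketch (the helper name and the default threshold are illustrative, not taken from the paper):

```python
import numpy as np

def choose_k(eigvals, threshold=0.9997):
    """Smallest k whose cumulative contribution rate M_k (Equation (1))
    reaches the given threshold."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending eigenvalues
    M = np.cumsum(lam) / lam.sum()                         # M_1 ... M_n
    return int(np.searchsorted(M, threshold)) + 1          # first k with M_k >= threshold
```

For eigenvalues [8, 1, 0.5, 0.5] and a 0.9 threshold this returns 2, since the first two components already carry 90% of the total variance.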
It is worth noting that PCA does not simply delete the data of little importance. After the PCA transformation, the dimension-reduced data can be transformed back to restore most of the high-dimensional original data, which is a good character for data compression. In this paper, as is shown in Figure 3, the spectrum coefficients of the input signal are divided into multiple samples according to specific rules; then, these samples are used to construct the original matrix X. After the principal component analysis, matrix X is decomposed into the reduced-dimensional matrix Y and the rotation matrix P; the process of calculating the matrices Y and P is shown in Table 1. The matrices Y and P are transmitted to the decoder after quantization and coding. In the decoder, the original matrix can be restored by multiplying the reduced-dimensional matrix and the transposed rotation matrix. There is some data loss during dimension reduction, but the loss is very small, so we can ignore it. For example, we can recover 99.97% of the information through a 6-dimensional matrix when the autocorrelation matrix has 15 dimensions. Ideally, the original matrix X can be restored from the reduced-dimensional matrix Y and the rotation matrix P with

X ≈ Xrestore = Y × P^T    (2)
in which Xrestore is the matrix restored in the decoder and P^T is the transpose of matrix P. Then, Xrestore is reconstructed into spectral coefficients.
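Equation (2) can be exercised numerically. The sketch below uses random stand-in data rather than real MDCT spectra, and skips the mean-removal step for brevity; on actual audio spectra, whose energy concentrates in the leading dimensions, the retained components capture nearly all of the information (the 99.97% figure above).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 15))               # stand-in for an original matrix

# Rotation matrix from the covariance eigenvectors (uncentered for brevity)
C = (X.T @ X) / X.shape[0]
eigvals, eigvecs = np.linalg.eigh(C)
P = eigvecs[:, np.argsort(eigvals)[::-1][:6]]   # 15 x 6 rotation matrix

Y = X @ P                    # 16 x 6 reduced-dimensional matrix
X_restore = Y @ P.T          # Equation (2): X ~ Xrestore = Y x P^T
```

Since the columns of P are orthonormal, projecting the restored matrix back with P recovers Y exactly; the only loss is the part of X lying outside the span of the retained eigenvectors.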
Figure 3. Scheme of PCA-based coding method. (PCA, Principle Component Analysis).
2.3. Format of Each Matrix

In the encoder, when the sampling rate is 48 kHz, each frame has 240 spectral coefficients after MDCT (in this paper, the MDCT frame size is 5 ms with 50% overlap). There are many possible forms of the matrix, such as 6 × 40, 12 × 20, 20 × 12, and so on; each format of matrix brings a different compression rate. In a simple test, several formats of the original matrix were constructed. Then, a subjective test was devised using rotation matrices of those different dimensions. 10 listeners recorded the number of dimensions at which the restored audio had acceptable quality. Then, the compression rate was calculated from the number of dimensions. As is shown in Figure 4, the matrix has the largest compression rate when it has 16 rows. So, the matrix X[16×15] with 16 rows and 15 columns is selected in this paper. That means a 240-coefficient-long frequency domain signal is divided into 16 samples, each sample having 15 dimensions.
Figure 4. Compression rate for different formats of the matrix.

2.4. Way of Matrix Construction

An appropriate way to obtain the 16 samples from the frequency domain coefficients is necessary. This paper proposes one method as follows: suppose the coefficients of one frame in the frequency domain are a1, a2, ..., a240. a1 is filled in the first column and the first row X[1,1], a2 is filled in the first column and the second row X[2,1], and so on, until a16 is filled in the first column and the 16th row X[16,1]. Then, a17 is filled in the first row and second column X[1,2], a18 is filled in the second row and second column X[2,2], and so on, until all the coefficients have been filled in the original matrix X[16×15]; that is,

X[16×15] = | a1   a17  ...  a225 |
           | a2   a18  ...  a226 |
           | ...            ...  |
           | a16  a32  ...  a240 |    (3)

This method has two obvious advantages, which can be found in Figure 5:
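The column-by-column filling described above is exactly a column-major reshape. A minimal sketch (the helper name is ours, not the paper's):

```python
import numpy as np

def build_matrix(coeffs):
    """Fill 240 MDCT coefficients into X[16 x 15] column by column:
    a1..a16 -> column 1, a17..a32 -> column 2, ... (Equation (3))."""
    return np.asarray(coeffs).reshape(16, 15, order="F")  # Fortran = column-major

a = np.arange(1, 241)        # stand-ins for a1 ... a240
X = build_matrix(a)
```

The first row of X is then a1, a17, ..., a225 and the last row is a16, a32, ..., a240, matching Equation (3).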
Figure 5. Example for matrix construction ("value" means the value of cells in the original matrix, "column" means the column of the original matrix, and "row" means the row of the original matrix).

(i) This method takes advantage of the short-time stationary characteristic of signals in the frequency domain. Therefore, the difference between different rows in the same column of the matrix constructed by this sampling method is small. In other words, the difference between the same dimensions of different samples in the matrix is small, and different dimensions have similar linear relationships, which is very conducive to dimensionality reduction.

(ii) This method allows signal energy to gather in the low-dimensional region of the new space. The energy of the frequency domain signal is concentrated in the low frequency region; after PCA, the leading columns of the reduced-dimensional matrix still hold most of the signal energy.
Thus, after dimensionality reduction, we can still focus on the low-dimensional region.

2.5. Multi-Frame Joint PCA

In the experiment, a phenomenon was observed that the rotation matrices of adjacent frames are greatly similar. Therefore, it is possible to do joint PCA with multiple frames to generate one rotation matrix; that is, multiple frames use the same rotation matrix. Thus, the codec can transmit fewer rotation matrices, and the bitrate can be reduced.

Below is one way to do joint PCA with least error. First, the frequency domain coefficients of n sub-frames are constructed as n original matrices X1[16×15], X2[16×15], ..., Xn[16×15], respectively; then, the original matrices of the sub-frames are used to form one original matrix X[16n×15]. This matrix is used to obtain one rotation matrix and n reduced-dimensional matrices.

If too many matrices are analyzed at the same time, the codec delay will be high, which is unbearable for real-time communication. Besides, the average quality of the restored audio signal decreases with the increase in the number of frames. Therefore, the need to reduce the bitrate and the need for real-time communication should be comprehensively considered. A subjective listening test was designed to find the relationship between the number of frames and the quality of the restored signal. 10 audio materials from European Broadcasting Union (EBU) test materials were coded with multi-frame PCA with different numbers of frames. The Mean Opinion Score (MOS) [22] of the restored music was recorded by 10 listeners. The statistical results are shown in Figure 6.
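The stacking step of Section 2.5 can be sketched as follows. This is a toy illustration under our own naming and an assumed k; the paper's codec additionally quantizes and encodes the matrices.

```python
import numpy as np

def joint_pca(frames, k):
    """Joint PCA over n sub-frame matrices (each 16 x 15): stack them into
    X[16n x 15], derive ONE shared rotation matrix, and return it together
    with the n reduced-dimensional (Rd) matrices."""
    X = np.vstack(frames)                           # 16n x 15
    C = (X.T @ X) / X.shape[0]                      # joint covariance
    eigvals, eigvecs = np.linalg.eigh(C)
    P = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # shared rotation matrix, 15 x k
    return P, [F @ P for F in frames]               # one Rd matrix per sub-frame

rng = np.random.default_rng(0)
frames = [rng.standard_normal((16, 15)) for _ in range(8)]
P, rd = joint_pca(frames, 6)                        # 8 Rd matrices + 1 rotation matrix
```

With 8 sub-frames sharing one rotation matrix, only a single 15 × k rotation matrix of side information is sent per 40 ms instead of eight, which is the bitrate saving the section describes.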
Figure 6. Subjective test results for different numbers of frames.

As is shown in Figure 6, when the number of frames is less than 6 or 8, the decrease in audio quality is not obvious. A suitable number of frames is then subjected to joint PCA. Taken together, when 8 sub-frames are analyzed at the same time, the bitrate and the delay of the encoder are acceptable; that is, for every 40 ms of signal, 8 sub-frame reduced-dimensional (Rd) matrices and one rotation matrix are transferred. The main functions of the mono encoder and decoder combined with multi-frame joint PCA are shown in Figures 7 and 8. In the encoder, 40 ms of signal is used to produce 8 Rd matrices and 1 rotation matrix. In the decoder, after receiving the 8 Rd matrices and 1 rotation matrix, 8 frames are restored to generate the 40 ms signal.
[Figure 7 diagram: frequency-domain signal → PCA → Rd matrices 1–8 plus rotation matrix → quantization and encoding.]
Figure 7. Multi-frame in encoder. (PCA, Principal Component Analysis; Rd, reduced-dimensional).
[Figure 8 diagram: received Rd matrices 1–8 plus rotation matrix → dequantization → iPCA → frequency-domain signal.]
Figure 8. Multi-frame in decoder. (iPCA, inverse Principal Component Analysis; Rd, reduced-dimensional).
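The multi-frame joint PCA in Figures 7 and 8 can be sketched as follows. This is a minimal NumPy illustration, not the codec's actual implementation: the function names, the mean-removal step, and the retention of 6 components are our assumptions for the sketch.

```python
import numpy as np

def multiframe_pca_encode(spectra, n_keep=6):
    """Jointly analyze 8 sub-frames of spectral coefficients with one PCA.

    spectra: (8, n_bins) array, one row of frequency-domain coefficients
    per sub-frame.  Returns the 8 reduced-dimensional (Rd) row vectors,
    the shared rotation matrix, and the sub-frame mean.
    """
    mean = spectra.mean(axis=0)
    X = spectra - mean                          # center across sub-frames
    cov = X.T @ X / (X.shape[0] - 1)            # covariance of spectral bins
    eigval, eigvec = np.linalg.eigh(cov)        # ascending eigenvalues
    order = np.argsort(eigval)[::-1][:n_keep]   # strongest components first
    rotation = eigvec[:, order]                 # one rotation matrix per 40 ms
    rd = X @ rotation                           # (8, n_keep) Rd matrices
    return rd, rotation, mean

def multiframe_pca_decode(rd, rotation, mean):
    """Inverse PCA: restore the 8 sub-frames from Rd matrices + rotation."""
    return rd @ rotation.T + mean
```

For every 40 ms of input, only `rd` (8 × 6 values) and the single `rotation` matrix need to be quantized and transmitted, instead of the full 8 × n_bins spectra.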
3. Quantization Design Based on PVQ

According to the properties of matrix multiplication, if the error of one point in matrix Y or P is large, the restored signal may have a large error. Therefore, uniform quantization cannot limit the error of every point in the matrix to an acceptable range under the bitrate limitation, so it is necessary to set a series of new quantization rules based on the properties of the reduced-dimensional matrix and the rotation matrix. It is assumed that the audio signal obeys the Laplace distribution [23], and both PCA and MDCT in this paper are orthogonal transformations; thus, the matrix coefficients also follow a Laplace distribution. Meanwhile, we have observed the values in the reduced-dimensional matrix and the rotation matrix: most cell values are close to 0, and the larger the absolute value, the smaller its probability. Based on these two observations, the distribution of coefficients in the reduced-dimensional matrix and the rotation matrix can be regarded as a Laplace distribution. Lattice vector quantization (LVQ) is widely used in codecs because of its low computational complexity, and PVQ is a form of LVQ that is well suited to the Laplace distribution. Thus, this section presents a quantization design for the reduced-dimensional matrix and the rotation matrix based on PVQ.
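The quantization step itself can be illustrated with a minimal pyramid vector quantizer. The sketch below follows Fischer's PVQ construction (integer codepoints with a fixed L1 norm, i.e., a fixed pulse budget K); the greedy norm-correction loop is a simplification of practical encoders, and the function name is ours:

```python
import numpy as np

def pvq_quantize(x, K):
    """Map x to a pyramid codepoint: an integer vector y with
    sum(|y_i|) == K pulses (Fischer's pyramid vector quantizer)."""
    x = np.asarray(x, dtype=float)
    l1 = np.sum(np.abs(x))
    if l1 == 0:                        # degenerate all-zero input
        y = np.zeros(x.shape, dtype=int)
        y[0] = K
        return y
    scaled = K * x / l1                # project onto the surface of the pyramid
    y = np.round(scaled).astype(int)
    # Greedy correction so the pulse budget is met exactly.
    while np.sum(np.abs(y)) < K:       # too few pulses: add where error is largest
        i = int(np.argmax(np.abs(scaled) - np.abs(y)))
        y[i] += 1 if scaled[i] >= 0 else -1
    while np.sum(np.abs(y)) > K:       # too many pulses: remove the least useful
        excess = np.where(np.abs(y) > 0, np.abs(y) - np.abs(scaled), -np.inf)
        i = int(np.argmax(excess))
        y[i] += -1 if y[i] > 0 else 1
    return y
```

The decoder rescales `y` by the transmitted gain `l1 / K`; only the index of `y` within the pyramid codebook needs to be entropy-coded.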
3.1. Quantization Design of the Reduced-Dimensional Matrix

In the reduced-dimensional matrix, the first column is the first principal component, the second column is the second principal component, etc. According to the properties of PCA, the first principal component carries the most important information of the original signal, and the information carried by the subsequent principal components becomes less and less important. In fact, more than 95% of the original signal energy, which can also be called information, is restored by the first principal component alone. That means that if the quantization error of the first principal component is large, the restored signal will also have a large error compared with the original signal. Therefore, the first principal component needs to be allocated more bits, and the bits for the other principal components should be sequentially reduced. For some kinds of audio, 4 principal components are enough to obtain acceptable quality, while for other kinds 5 principal components may be needed. We choose 6 principal components because they can satisfy almost all kinds of audio. In fact, the fifth and sixth principal components play only a small role in the restored spectrum; therefore, little quantization accuracy is needed for the last two principal components.

Based on the above conclusion, the reduced-dimensional matrix can be divided into certain regions, as is shown in Figure 9. Different regions have different bit allocations: a darker color means more bits are needed.
Figure 9. Bit allocation for the reduced-dimensional matrix (darker color means more bits needed).
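The energy concentration that motivates this bit allocation can be checked numerically. The synthetic sub-frames below are our illustrative data, not the paper's test material: they share one dominant spectral shape, mimicking the strong inter-frame correlation of tonal audio, so the first principal component captures almost all of the energy.

```python
import numpy as np

# Eight synthetic sub-frame spectra sharing one dominant shape.
rng = np.random.default_rng(1)
base = rng.standard_normal(64)                   # common spectral shape
gains = 1.0 + 0.1 * rng.standard_normal(8)       # per-sub-frame gain variation
frames = np.outer(gains, base) + 0.01 * rng.standard_normal((8, 64))

X = frames - frames.mean(axis=0)                 # center across sub-frames
eigval = np.linalg.eigvalsh(X.T @ X)[::-1]       # descending eigenvalues
ratio = eigval[0] / eigval.sum()                 # energy in the first component
print(f"first principal component carries {100 * ratio:.1f}% of the energy")
```

Real audio frames are less perfectly correlated, which is why the second through sixth components still receive bits in Table 2.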
A PVQ quantizer was used to quantize each principal component of the reduced-dimensional matrix with its own bit allocation. Several subjective listening tests were carried out, and the bit assignment policy was determined according to the quality of the restored audio under different bit assignments. Finally, the bits to be allocated to each principal component were determined. Table 2 gives the number of bits required for each principal component of a non-zero reduced-dimensional matrix under the PVQ quantizer.
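Fractional per-point bit counts such as those in Table 2 arise naturally when one PVQ index covers a whole L-dimensional vector. The sketch below derives such figures from the PVQ codebook size via Fischer's counting recurrence; the specific L and K values are illustrative, not taken from the paper.

```python
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def pvq_size(L, K):
    """N(L, K): number of integer vectors of dimension L with
    sum(|y_i|) == K, computed with Fischer's recurrence."""
    if K == 0:
        return 1
    if L == 0:
        return 0
    return pvq_size(L - 1, K) + pvq_size(L, K - 1) + pvq_size(L - 1, K - 1)

def bits_per_point(L, K):
    """Bits per coefficient when a single index covers L coefficients."""
    return log2(pvq_size(L, K)) / L
```

For example, `pvq_size(2, 2)` is 8 codepoints, i.e., 1.5 bit per point; sweeping K for a fixed L yields fractional per-point rates like the 2.5 or 0.45 bit entries in Table 2.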
Table 2. Quantization bits for the reduced-dimensional matrix.

Principal Component             Bits Needed (bit/per point)
First principal component       3
Second principal component      3
Third principal component       2.5
Fourth principal component      1.5
Fifth principal component       0.45
Sixth principal component       0.45

3.2. Quantization Design of the Rotation Matrix