<<

applied sciences

Article A Multi-Frame PCA-Based Stereo Audio Coding Method

Jing Wang *, Xiaohan Zhao, Xiang Xie and Jingming Kuang

School of Information and Electronics, Beijing Institute of Technology, 100081 Beijing, China; [email protected] (X.Z.); [email protected] (X.X.); [email protected] (J.K.) * Correspondence: [email protected]; Tel.: +86-138-1015-0086

 Received: 18 April 2018; Accepted: 9 June 2018; Published: 12 June 2018 

Abstract: With the increasing demand for high quality audio, stereo audio coding has become more and more important. In this paper, a multi-frame coding method based on Principal Component Analysis (PCA) is proposed for the compression of audio signals, including both mono and stereo signals. The PCA-based method makes the input audio spectral coefficients into eigenvectors of covariance matrices and reduces coding bitrate by grouping such eigenvectors into fewer number of vectors. The multi-frame joint technique makes the PCA-based method more efficient and feasible. This paper also proposes a quantization method that utilizes Pyramid Vector Quantization (PVQ) to quantize the PCA matrices proposed in this paper with few bits. Parametric coding algorithms are also employed with PCA to ensure the high efficiency of the proposed audio . Subjective listening tests with Multiple Stimuli with Hidden Reference and Anchor (MUSHRA) have shown that the proposed PCA-based coding method is efficient at processing stereo audio.

Keywords: stereo audio coding; Principal Component Analysis (PCA); multi-frame; Pyramid Vector Quantization (PVQ)

1. Introduction The goal of audio coding is to represent audio in digital form with as few bits as possible while maintaining the intelligibility and quality required for particular applications [1]. In audio coding, it is very important to deal with the stereo signal efficiently, which can offer better experiences of using applications like mobile communication and live audio . Over these years, a variety of techniques for stereo signal processing have been proposed [2,3], including M/S stereo, intensity stereo, joint stereo, and parametric stereo. M/S stereo coding transforms the left and right channels into a mid-channel and a side channel. Intensity stereo works on the principle of sound localization [4]: humans have a less keen sense of perceiving the direction of certain audio frequencies. By exploiting this characteristic, intensity stereo coding can reduce the bitrate with little or no perceived change in apparent quality. Therefore, at very low bitrate, this type of coding usually yields a gain in perceived audio quality. Intensity stereo is supported by many audio compression formats such as (AAC) [5,6], which is used for the transfer of relatively low , acceptable-quality audio with modest internet access speed. Encoders with joint stereo such as Moving Picture Experts Group (MPEG) Audio Layer III (MP3) and Ogg [7] use different algorithms to determine when to switch and how much space should be allocated to each channel (the quality can suffer if the switching is too frequent or if the side channel does not get enough bits). Based on the principle of human hearing [8,9], Parametric Stereo (PS) performs sparse coding in the spatial domain. The idea behind parametric stereo coding is to maximize the compression of a stereo signal by transmitting parameters describing the spatial image. For stereo input signals, the compression process basically follows one idea: synthesizing one signal

Appl. Sci. 2018, 8, 967; doi:10.3390/app8060967 www.mdpi.com/journal/applsci Appl. Sci. 2018, 8, 967 2 of 22 from the two input channels and extracting parameters to be encoded and transmitted in order to add spatial cues for synthesized stereo at the receiver’s end. The parameter estimation is made in the frequency domain [10,11]. AAC with Spectral Band Replication (SBR) and parametric stereo is defined as High-Efficiency Advanced Audio Coding version 2 (HE-AACv2). On the basis of several stereo algorithms mentioned above, other improved algorithms have been proposed [12], which causes Max Coherent Rotation (MCR) to enhance the correlation between the left channel and the right channel, and uses MCR angle to substitute the spatial parameters. This kind of method with MCR reduces the bitrate of spatial parameters and increases the performance of some spatial audio coding, but has not been widely used. Audio codec usually uses subspace-based methods such as Discrete Cosine Transform (DCT) [13], Fast Fourier Transform (FFT) [14], and [15] to transfer audio signal from time domain to frequency domain in suitably windowed time frames. Modified Discrete Cosine Transform (MDCT) is a lapped transform based on the type-IV Discrete Cosine Transform (DCT-IV), with the additional property of being lapped. Compared to other Fourier-related transforms, it has half as many outputs as inputs, and it has been widely used in audio coding. These transforms are general transformations; therefore, the energy aggregation can be further enhanced through an additional transformation like PCA [16,17], which is one of the optimal orthogonal transformations based on statistical properties. The orthogonal transformation can be understood as a coordinate one. That is, fewer new bases can be selected to construct a low dimensional space to describe the data in the original high dimensional space by PCA, which means the compressibility is higher. Some work was done on the audio coding method combined with PCA from different views. Paper [18] proposed a novel method to match different subbands of the left channel and the right channel based on PCA, through which the redundancy of two channels can be reduced further. Paper [19] mainly focused on the multichannel procession and the application of PCA in the subband, and it discussed several details of PCA, such as the energy of each eigenvector and the signal waveform after PCA. This paper introduced the rotation angle with Karhunen-Loève Transform (KLT) instead of the rotation matrix and the reduced-dimensional matrix compared to our paper. The paper [20] mainly focused on the localization of multichannel based on PCA, with which the original audio is separated into primary and ambient components. Then, these different components are used to analyze spatial perception, respectively, in order to improve the robustness of multichannel audio coding. In this paper, a multi-frame, PCA-based coding method for audio compression is proposed, which makes use of the properties of the orthogonal transformation and explores the feasibility of increasing the compression rate further after time-frequency transition. Compared to the previous work, this paper proposes a different method of applying PCA in audio coding. The main contributions of this paper include a new matrix construction method, a matrix quantization method based on PVQ, a combination method of PCA and parametric stereo, and a multi frame technique combined with PCA. In this method, the encoders transfer the matrices generated by PCA instead of the coefficients of the frequency spectrum. The proposed PCA-based coding method can hold both a mono signal and a stereo signal combined with parametric stereo. With the application of the multi-frame technique, the bitrate can be further reduced with a small impact on quality. To reduce the bitrate of the matrices, a method of matrix quantization based on PVQ [21] is put forward in this paper. The rest of the paper is organized as follows: Section2 describes the multi-frame, PCA-based coding method for mono signals. Section3 presents the proposed design of the matrix quantization. In Section4 , the PCA-based coding method for the mono signal is extended to stereo signals combined with improved parametric stereo. The experimental results, discussion, and conclusion are presented in Sections5–7, respectively. Appl. Sci. 2018, 8, 967 3 of 22 Appl. Sci. 2018, 8, x FOR PEER REVIEW 3 of 22

Appl. Sci. 2018, 8, x FOR PEER REVIEW 3 of 22 2.2. Multi-Frame Multi-Frame PCA-Based PCA-Based Coding Coding Method Method 2. Multi-Frame PCA-Based Coding Method 2.1.2.1. Framework Framework of of PCA-Based PCA-Based Coding Coding Method Method 2.1. Framework of PCA-Based Coding Method TheThe encoding process process can be described as follows: after time-frequency transformation such such as as MDCT,MDCT,The the encoding frequency process coefficients coefficients can be describedare used in as the follows: module after of time-frequency PCA, which includes transformation the multi-frame such as technique.technique.MDCT, the Several frequency Several matrices matrices coefficients are are generated are generated used after in afterthe PCA module PCA is quantized isof quantizedPCA, whichand andencoded includes encoded to the bitstream. multi-frame to bitstream. The decoderThetechnique. decoder is the Severalis mirror the mirrormatrices image image ofare the generated of encoder, the encoder, afterafter PCAde aftercoding is decodingquantized and de-quantizing, and and de-quantizing,encoded matrices to bitstream. matrices are used The are to generateuseddecoder to generate isfrequency the mirror frequency domain image domainof signalsthe encoder, signals by inverseafter by inverse de codingPCA PCA and(iPCA). (iPCA). de-quantizing, Finally, Finally, after matrices after frequency-time are used to transformation,transformation,generate frequency the the encoder encoder domain can can signals export export audio. audio.by inverse Flowcharts Flowcharts PCA of of(iPCA). encoder encoder Finally, and and decoder decoder after for forfrequency-time mono mono signals signals arearetransformation, shown shown in in Figures Figures the encoder 11 and and 2.2 . canThe The export part part of audio. of MDCT MDCT Flowcharts is isused used to ofconcentrate to encoder concentrate and energy decoder energy of signal offor signal mono on low onsignals band low inbandare frequency shown in frequency in domain, Figures domain, 1which and which2. isThe good ispart good for of MDCT forthe theprocess is process used of to ofmatrix concentrate matrix construction construction energy of(details (details signal areon are lowshown shown band in in SectionSectionin frequency 2.4).2.4). Some Some domain, informal informal which listening listening is good experiments experiments for the process ha haveve beenof been matrix carried carried construction out out on on the the performance(details performance are shown applying applying in PCAPCASection without 2.4). Some MDCT.MDCT. informal TheThe experimentalexperimental listening experiments results results show show have that that been without without carried MDCT, outMDCT, on the the the performance performance performance of applying PCAof PCA has hasslightPCA slight reduction,without reduction, MDCT. which which The means experimental means more more bits are resultsbits needed are show needed by that the by schemewithout the scheme withoutMDCT, without MDCTthe performance MDCT in order in to orderof achieve PCA to achievethehas same slight the output reduction,same qualityoutput which ofquality the means scheme of the more withscheme bits MDCT. are with needed Thus,MDCT. inby Thus, thisthe paperscheme in this MDCT withoutpaper is MDCT assumedMDCT is inassumed to order enhance to to enhancetheachieve performance thethe performancesame of output the PCA, qualityof the although PCA, of the although itscheme will bring with it will more MDCT. bring computational Thus,more computationalin this complexity.paper MDCT complexity. is assumed to enhance the performance of the PCA, although it will bring more computational complexity. Multi-frame PCA Multi-frame PCA

TF Rotation Quantization Input signal PCA Bitstream transformationTF Rotationmatrix Quantizationand encode Input signal PCA Bitstream transformation matrix and encode

Reduced- dimensionalReduced- dimensionalmatrices matrices

Figure 1. Flowchart of mono encoder. (TF, Time-to-Frequency; PCA, Principle Component Analysis). FigureFigure 1.1. FlowchartFlowchart of mono encoder. (TF, Time-to-Fr Time-to-Frequency;equency; PCA, Principle Principle Component Component Analysis). Analysis).

Decode and Rotation FT Bitstream Decode and Rotation iPCA FT Output Bitstream dequantization matrix iPCA transformation Output dequantization matrix transformation

Reduced- dimensionalReduced- dimensionalmatrices matrices

Figure 2. Flowchart of mono decoder. (iPCA, inverse Principle Component Analysis; FT, Frequency- Figure 2. Flowchart of mono decoder. (iPCA, inverse Principle Component Analysis; FT, Frequency- to-Time).Figure 2. Flowchart of mono decoder. (iPCA, inverse Principle Component Analysis; FT, Frequency-to-Time).to-Time). 2.2. Principle of PCA 2.2. Principle of PCA 2.2. Principle of PCA TheThe PCA’s PCA’s mathematical mathematical principleprinciple isis asas follows:follows: after coordinate transformation,transformation, thethe originaloriginal The PCA’s mathematical principle is as follows: after coordinate transformation, the original high-dimensionalhigh-dimensional samples samples with with certcertainain relevancerelevance can be transferred toto aa newnew setset ofof low-dimensionallow-dimensional high-dimensional samples with certain relevance can be transferred to a new set of low-dimensional samplessamples that that are are unrelated unrelated toto eacheach other.other. TheseThese newnew samples carry most informationinformation ofof thethe original original samples that are unrelated to each other. These new samples carry most information of the original datadata and and can can replace replace the the original original samplessamples forfor follow-upfollow-up analysis. data and can replace the original samples for follow-up analysis. ThereThere are are several several criteria criteria forfor choosingchoosing newnew samples or selecting new basesbases inin PCA.PCA. TheThe typical typical There are several criteria for choosing new1 samples or selecting new bases in PCA. The typical methodmethod is is to to use use the the variance variance ofof newnew samplesample FF1 (i.e.,(i.e., thethe variance ofof thethe originaloriginal samplesample mappingmapping on on themethodthe new new iscoordinates). coordinates). to use the variance The The larger larger of new VarVar sample((FiFi)) is,is, thetheF1 (i.e.,moremore the information variance of Fi the contains.contains. original So,So, sample thethe firstfirst mapping principal principal on the new coordinates). The larger Var (Fi) is, the more information Fi contains. So, the first principal componentcomponent should should have have the the largest largest variancevariance FF11.. IfIf thethe first principal componentcomponent FF11 isis notnot qualified qualified to to component should have the largest variance F . If the first principal component F is not qualified replacereplace the the original original sample, sample, thenthen thethe secondsecond principalprincipal1 component F22 shouldshould bebe considered.considered.1 F F2 2is is the the to replace the original sample, then the second principal component1 2 F should be considered. F is principalprincipal componentcomponent withwith thethe largestlargest variancevariance except F1, and F2 isis 2uncorrelateduncorrelated toto , , thatthat is,2is, the( principal) component with the largest variance1 except F ,2 and F is uncorrelated to F , that is, Cov Cov(,,)=0=0. .This This means means that that thethe basebase ofof FF1 andand thethe base of1 F2 areare orthogonalorthogonal2 toto eacheach other, other,1 which which cancan reduce reduce the the data data redundancyredundancy betweenbetween newnew sampsamples (or principal components)components) effectively.effectively. TheThe third,third, fourth, fourth, andand p-p-thth principalprincipal componentcomponent cancan be constructed similarly.similarly. TheThe variancevariance ofof thesethese Appl. Sci. 2018, 8, 967 4 of 22

Cov(F1, F2) = 0. This means that the base of F1 and the base of F2 are orthogonal to each other, which can reduce the data redundancy between new samples (or principal components) effectively. The third, fourth, and p-th principal component can be constructed similarly. The variance of these principal components is in descending order, and the corresponding base in new space is uncorrelated to other new base. If there are m n-dimensional data, the procession of PCA is shown in Table1.

Table 1. PCA ALGORITHM.

Algorithm: PCA (Principle Component Analysis) (i) Obtain matrix by columns: X [ m n ] (ii) Zero-mean columns in X to get matrix X [ m n ] 1 T (iii) Calculate C = m X X (iv) Calculate eigenvalues λ1, λ2, λ3 ... λn and eigenvectors a1, a2, a3 ... an of C (v) Use eigenvector to construct P according to the eigenvalue [ n n ] (vi) Select the first k columns of P to construct the rotation matrix P [ n n ] [ n k ] (vii) Complete dimensionality reduction by Y = X × P [ m k ] [ m n ] [ n k ]

The contribution rate of the principal component reflects the proportion that each principal component accounts for the total amount of data after coordinate transformation, which can effectively solve the problem of dimension selection after dimensionality reduction. In PCA application, people often use the cumulative contribution rate as the basis for principal components selection. The cumulative contribution rate Mk of the first k principal components is

k ∑i=1 λi Mk = n (1) ∑i=1 λi If the contribution rate of the first k principal components meets the specific requirements (the contribution rates are different according to different requirements), the first k principal components can be used to describe the original data to achieve the purpose of dimensionality reduction. PCA is a good transformation due to its properties, as follows:

(i) Each new base is orthogonal to the other new base; (ii) Mean squared error of the data is the minimum after transformation; (iii) Energy is more concentrated and more convenient for data processing.

It is worthwhile noting that PCA does not simply delete the data of little importance. After PCA transformation, the dimension-reduced data can be transformed to restore most of the high-dimensional original data, which is a good character for . In this paper, as is shown in Figure3, the spectrum coefficients of the input signal are divided into multiple samples according to specific rules; then, these samples will be constructed to the original matrix X. After the principal component analysis, matrix X is decomposed into reduced-dimensional matrix Y and rotation matrix P; the process of calculating matrix Y and P is shown in Table1. The matrix Y and P are transmitted to the decoder after quantization and coding. In decoder, the original matrix can be restored by multiplying reduced-dimensional matrix and transposed rotation matrix. There is some data loss during dimension reduction, but the loss is much less, so we can ignore it. For example, we can recover 99.97% information through a 6-dimension matrix, when the autocorrelation matrix has the 15th dimension. Ideally the original matrix X can be restored by reduced-dimensional matrix Y and rotation matrix P with T X ≈ Xrestore = Y×P (2)

T in which Xrestore is the matrix restored in decoder and P is the transposition rank of matrix P. Then, Xrestore is reconstructed to spectral coefficients. Appl. Sci. 2018, 8, 967 5 of 22 Appl. Sci. 2018, 8, x FOR PEER REVIEW 5 of 22

0.4

0.2

Appl. Sci. 2018, 8, x FOR PEER REVIEW 5 of 22 0 Reduced- PCA Rotation 0.4 dimensional matrix P matrix Y -0.2 0.2

-0.4 0 Reduced- 0 50 100 150 200 250 PCA Rotation dimensional matrix P matrix Y Spectrum-0.2 coefficients Original matrix X

Figure-0.4 3. Scheme of PCA-based coding method. (PCA, Principle Component Analysis). Figure0 3.50Scheme100 of150 PCA-based200 250 coding method. (PCA, Principle Component Analysis).

Spectrum coefficients Original matrix X 2.3. Format of Each Matrix 2.3. Format of EachFigure Matrix 3. Scheme of PCA-based coding method. (PCA, Principle Component Analysis). InIn encoder, encoder, when when the the sampling sampling rate rate is is 48 48 kHz, kHz, th thee frame frame has has 240 240 spectral spectral coefficients coefficients after after MDCT MDCT 2.3. Format of Each Matrix (in(in this this paper, paper, the the MDCT MDCT frame frame size size is is 5 5 ms ms with with 50% 50% overlap). overlap). There There are are many many forms of matrices like 6 In encoder, when the sampling rate is 48 kHz, the frame has 240 spectral coefficients after MDCT 6 × 40,40, 12 12 ×× 20,20, 20 20 × ×12,12, and and so soon; on; each each format format of ofmatr matrixix brings brings different different compression compression rates. rates. In In a asimple simple test, several(in formats this paper, of the original MDCT framematrix size were is 5 ms cons withtructed. 50% overlap). Then, There a subjectiveare many forms test of was matrices devised like using test, several6 × 40, formats 12 × 20, of20 × original 12, and so matrix on; each wereformat constructed. of matrix brings Then, different a subjectivecompression testrates. was In a simple devised using thosethose different differenttest, severaldimensional dimensional formats ofrotation rotation original matrices.matrix matrices. were 10 cons 10 lis listenerstructed.teners recordedThen, recorded a subjective the the number number test was of devised of dimensions dimensions using when when thethe restored restoredthose audio audio different had had acceptabledimensional acceptable rotationquality. quality. matrices. Then, Then, 10th the elis compression teners recorded rateratethe number waswas calculatedcalculated of dimensions byby when the the number number of the restored audio had acceptable quality. Then, the compression rate was calculated by the number ofdimensions. dimensions. As As is is shown shown in in Figure Figure4, the4, the matrix matrix has has the the largest largest compression compression rate rate when when it has it 16has rows. 16 of dimensions. As is shown in Figure 4, the matrix has the largest compression rate when it has 16 rows.So, the So, matrix the matrixX [ with] with 16 16 rows rows and and 15 columns15 columns is selected is selected for transientfor transient frame frame in this in paper.this rows. So,[ the matrix] [ ] with 16 rows and 15 columns is selected for transient frame in this paper. That means 16a 240-coefficient-long 15 frequency domain signal is divided into 16 samples, each That meanspaper. a 240-coefficient-long That means a 240-coefficient-long frequency frequency domain domain signal signal is divided is divided into into 16 16 samples, samples, each each sample sample havingsample 15 having dimensions. 15 dimensions. having 15 dimensions.

0.6 0.6

0.5 0.5 0.4 0.4 0.3

0.3 0.2 compression rate 0.2 0.1

compression rate 0 0.1 10 15 20 25 Row of matrix X

0FigureFigure 4. Compression 4. Compression rate rate for diffe differentrent format format of matrix. of matrix. 10 15 20 25 2.4. Way of Matrix Construction 2.4. Way of Matrix Construction Row of matrix X An appropriate way to obtain the 16 samples from frequency domain coefficients is necessary. An appropriateThis paper proposes wayFigure to one obtain4. methodCompression the as 16 follows: samples rate forsuppo diffe fromse rentthe frequency coefficientsformat of domain matrix.of one frame coefficients in frequency is necessary. This paperdomain proposes are one, … method. asis filled follows: in the suppose first column the and coefficients the first row of one[ frame], is in filled frequency in the domain a a firsta columna and the second row [] , and is filled in the firstX column anda the 16th row 2.4.are Way1, of2 ... Matrix240. Construction1 is filled in the first column and the first row [ ], 2 is filled in the first [ ]. Then, is filled in the first row and second column [], 1 is 1 filled in the second column and the second row X , and a is filled in the first column and the 16th row X . An appropriate way to obtain[ 2 the 1 ] 16 samples16 from frequency domain coefficients is necessary.[ 16 1 ] ThisThen, papera isproposes filled in one the firstmethod row as and follows: second suppo columnse Xthe coefficients, a is of filled one in frame the second in frequency row and 17 [ 1 2 ] 18 domain are , …. is filled in the first column and the first row [], is filled in the second column X , and so on, until all the coefficients have been filled in the original matrix first column and the[ 2 second 2 ] row [] , and is filled in the first column and the 16th row X ; that is, [[ 16 ]. Then, 15 ] is filled in the first row and second column [], is filled in the second Appl. Sci. 2018, 8, x FOR PEER REVIEW 6 of 22

Appl. Sci. 2018, 8, 967 6 of 22 row and second column [], and so on, until all the coefficients have been filled in the original matrix [ ]; that is, , ⋯  a1, ··· a225 [ ] = ⋮⋱ ⋮ (3) X =  . ⋯.. .  (3) [ 16 15 ] , . . .  This method has two obvious advantages, whicha16, can··· be finda240 in Figure 5: This method has two obvious advantages, which can be find in Figure5:

5

4

3

2

1

value 0

-1

-2

-3

-4 20

15 15

10 10 row 5 5 column

0 0

Figure 5. Example for matrix construction (“value” means the value of cells in original matrix, “column” Figuremeans 5. the Example column offor original matrix matrix,construction and “row” (“value” means me theans row the of value original of matrix).cells in original matrix, “column” means the column of original matrix, and “row” means the row of original matrix). (i) This method takes advantage of the short-time stationary characteristic of signals in the frequency(i) This domain. method Therefore, takes advantage the difference of the between short-time different stationary rows incharacteristic the same column of signals of the in matrix the frequencyconstructed domain. by this Therefore, sampling the method difference is small. between In different other words, rows thein the difference same column between of the the matrix same constructeddimensions by of differentthis sampling samples method in the is matrix small. is In small, other and words, different the dimensionsdifference between have similar the same linear dimensionsrelationships, of different which is verysamples conducive in the matrix to dimensionality is small, and reduction. different dimensions have similar linear relationships,(ii) This methodwhich is allows very conducive signal energy to dimensionality to gather still in reduction. the low-dimensional region of the new space. The(ii) energy This ofmethod the frequency allows signal domain energy signal tois gather concentrated still in the in thelow-dimensional region region; ofafter the new PCA , space.the advanced The energy column of the of frequency reduced-dimensional domain signal matrix is concentrated still has the in mostthe low signal frequency energy. region; Thus, after after PCA,dimensionality the advanced reduction, column we of canreduced-dimensional still focus on the low-dimensional matrix still has region.the most signal energy. Thus, after dimensionality reduction, we can still focus on the low-dimensional region. 2.5. Multi-Frame Joint PCA 2.5. Multi-FrameIn the experiment, Joint PCA a phenomenon was observed that the rotation matrices of adjacent frames are greatlyIn the similar. experiment, Therefore, a phenomenon it is possible was to do observed joint PCA that with the multiplerotation matrices frames to of generate adjacent one frames rotation are greatlymatrix, similar. that is, Therefore, multiple frames it is possible use the to same do join rotationt PCA matrix. with multiple Therefore, frames the codecto generate can transmit one rotation fewer matrix,rotation that matrices, is, multiple and bitrateframes can use be the reduced. same rotation matrix. Therefore, the codec can transmit fewer rotationBelow matrices, is one and way bitrate to do can joint be reduced. PCA with least error. First, frequency domain coefficients of n sub-frames are constructed as n original matrices X , X ... X , Below is one way to do joint PCA with least error. First,1[ frequency16 15 ] domain2[ 16 coefficients 15 ] n[ of16 n sub- 15 ] , framesrespectively; are constructed then, the as original n original matrices matrices of each[ sub-frame ], [ are ]… used[ to form] respectively; one original then, matrix the originalX matrices. This of matrixeach sub-frame is used to are obtain used one to rotationform one matrix original and matrixn reduced-dimensional [ ]. This matrix matrices. is [ 16n 15 ] used to obtain one rotation matrix and n reduced-dimensional matrices. If too many matrices are analyzed at the same time, the codec delay will be high, which is If too many matrices are analyzed at the same time, the codec delay will be high, which is unbearable for real-time communication. Besides, the average quality of restored audio signal decreases unbearable for real-time communication. Besides, the average quality of restored audio signal with the increase in the number of frames. Therefore, the need to reduce bitrate and real-time decreases with the increase in the number of frames. Therefore, the need to reduce bitrate and real- communication should be comprehensively considered. A subjective listening test was designed Appl. Sci. 2018, 8, x FOR PEER REVIEW 7 of 22

Appl. Sci. 2018, 8, x FOR PEER REVIEW 7 of 22 timeAppl. communication Sci. 2018, 8, 967 should be comprehensively considered. A subjective listening test was designed7 of 22 to find the relationship between the number of frames and the quality of restored signal. 10 audio time communication should be comprehensively considered. A subjective listening test was designed materialsto find from the European relationship Broadcasting between the Union number (EBU) of fr testames materials and the qualitywere coded of restored with multi-frame signal. 10 audio PCA to find the relationship between the number of frames and the quality of restored signal. 10 audio with differentmaterials numbers from European of frames. Broadcasting The Mean Union Opin (EBU)ion test Score materials (MOS) were [22] coded of the with restored multi-frame music PCA was recordedmaterialswith by different from 10 listeners. European numbers The Broadcasting ofstatistical frames. Theresults Union Mean are (EBU) Opin shownion test inScore materials Figure (MOS) 6. were [22]coded of the withrestored multi-frame music was PCA withrecorded different by numbers 10 listeners. of frames.The statistical The Meanresults Opinionare shown Score in Figure (MOS) 6. [22] of the restored music was recorded by 10 listeners. The statistical results are shown in Figure6.

5 multi frame 5 onemulti frame frame 4.8 one frame 4.8 4.6 4.6 MOS 4.4MOS 4.4

4.2 4.2

4 4

2 2 4 4 6 6 8 8 1010 1212 1414 NumberNumber of of frame frame

FigureFigure 6. 6.Subjective Subjective test test results results forfor different number number of of frames. frames. Figure 6. Subjective test results for different number of frames. As is shown in Figure 6, when the number of frames is less than 6 or 8, the decrease of audio AsAs is isshown shown in inFigure Figure 6,6 ,when when the the number number of of fr framesames is is less less than than 6 6or or 8, 8, the the decrease decrease of of audio audio quality is not obvious. A suitable number of frames is then subjected to joint PCA. Taken together, quality is not obvious. A suitable number of frames is then subjected to joint PCA. Taken together, qualitywhen is not 8 sub-framesobvious. A are suitable analyzed number at the sameof frames time, theis then bitrate subjected and the delayto joint of PCA.encoder Taken is acceptable, together, when 8 sub-frames are analyzed at the same time, the bitrate and the delay of encoder is acceptable, when 8that sub-frames is, for every are 40analyzed ms signal, at the 8 sub-frame same time, redu theced-dimensional bitrate and the (Rd)delay matrices of encoder and isone acceptable, rotation thatthat is, is,matrix for for every everyare transferred. 40 40 msms signal,signal, Main 8 8functions sub-frame sub-frame of reduced-dimensionalthe redu monoced-dimensional encoder and decoder (Rd) (Rd) matrices combinedmatrices and withand one rotationmulti-frameone rotation matrix matrixare transferred.joint are transferred.PCA are Main shown Main functions in functionsFigures of 7 the andof monothe 8. Inmo encoderencoder,no encoder 40 and ms and decoder signal decoder is combined used combined to produce with with multi-frame 8 Rd multi-frame matrices joint jointPCA PCAand are 1are shown rotation shown in matrix. Figuresin Figures In7 decoder,and 7 8and. In after8. encoder, In receiving encoder, 40 ms 8 40 Rd signal ms matrices signal is used and is used to1 rotation produce to produce matrix, 8 Rd matrices88 Rdframes matrices andare 1 androtation 1 rotationrestored matrix. tomatrix. generate In decoder, In 40decoder, ms after signal. after receiving receiving 8 Rd matrices8 Rd matrices and 1 and rotation 1 rotation matrix, matrix, 8 frames 8 frames are restored are restoredto generate to generate 40 ms signal.40 ms signal. Frequency domain Frequency signal domain signal

PCA

PCA

Rd Rd Rd Rd matrix1 matrix2 matrix3 matrix4 Rotation Rd Rd Rd Rd matrix Rd matrix5Rdmatrix6 Rdmatrix7 Rdmatrix8 matrix1 matrix2 matrix3 matrix4 Rotation Rd Rd Rd Rd matrix matrix5 matrix6 matrix7 matrix8 Quantization and encode

Figure 7. Multi-frame in encoder. (PCA,Quantization Principle Component Analysis; Rd, reduced-dimensional). and encode

FigureFigure 7. Multi-frame 7. Multi-frame in inencoder. encoder. (PCA, (PCA, Principle Principle Co Componentmponent Analysis; Analysis; Rd, Rd, reduced-dimensional). reduced-dimensional). Appl. Sci. 2018, 8, 967 8 of 22 Appl. Sci. 2018, 8, x FOR PEER REVIEW 8 of 22

Dequantization

Rd Rd Rd Rd matrix1 matrix2 matrix3 matrix4 Rotation Rd Rd Rd Rd matrix matrix5 matrix6 matrix7 matrix8

iPCA coefficients

Frequency domain signal

FigureFigure 8. Multi-frame 8. Multi-frame in decoder. in decoder. (iPCA, inverse (iPCA, Principle inverse Component Principle Component Analysis; Rd, Analysis; reduced- Rd, dimensional).reduced-dimensional).

3. 3.Quantization Quantization Design Design Based Based On On PVQ PVQ AccordingAccording to tothe the properties properties of ofmatrix matrix multiplication, multiplication, if the if the error error of ofone one point point in inmatrix matrix Y orY or P isP is large,large, the the restored restored signal signal may may have have a largelarge error.error. Therefore, Therefore, uniform uniform quantization quantization cannot cannot limit limit the the error errorof every of every point point in the in the matrix matrix in the in the acceptable acceptable range range with with bitrate bitrate limitation. limitation. So, So, it is it necessary is necessary to setto a setseries a series of new of new quantization quantization rules rules based based on the on properties the properties of the of dimensionality the dimensionality matrix matrix and the and rotation the rotationmatrix. matrix. It is assumed It is assumed that the that audio the signalaudio obeyssignal theobeys distribution the distribution of Laplace of Laplace [23], and [23], both and PCA both and PCAMDCT and inMDCT the paper in the are paper orthogonal are orthogonal transformations. transformations. Thus, the distributionThus, the distribution of matrix coefficients of matrix is coefficientsmaintained is inmaintained Laplace distribution. in Laplace Meanwhile, distribution. we Meanwhile, have observed we the have values observed in reduced-dimensional the values in reduced-dimensionalmatrix and rotation matrix.matrix and It is shownrotation that matrix. most It values is shown of cells that in most matrix values are close of cells to 0, in and matrix the bigger are closethe absoluteto 0, and value,the bigger the smaller the absolute the probability value, the is. smaller Based on the the probability above two statements,is. Based onthe the distribution above two of statements,coefficients the in distribution reduced-dimensional of coefficients matrix in andredu rotationced-dimensional matrix can matrix be regarded and rotation as Laplace matrix distribution. can be regardedLattice vectoras Laplace quantization distribution. (LVQ) Lattice is widely vector used quantization in the codec (LVQ) because is widely of its used low in computational the codec becausecomplexity. of its low PVQ computational is one method complexity. of LVQ that PVQ is suitable is one method for Laplace of LVQ distribution. that is suitable Thus, for this Laplace section distribution.presents a Thus, design this of section quantization presents for a reduced-dimensionaldesign of quantization matrixfor reduce andd-dimensional rotation matrix matrix combined and rotationwith PVQ. matrix combined with PVQ.

3.1.3.1. Quantization Quantization Design Design of ofthe the Reduced-Dimensional Reduced-Dimensional Matrix Matrix In Inthe the reduced-dimensional reduced-dimensional matrix, matrix, the the first first column column is isthe the first first principal principal component, component, the the second second columncolumn is isthe the second second principal principal component, component, etc. etc. Acco Accordingrding to tothe the property property of ofPCA, PCA, the the first first principal principal componentcomponent has has the the most most important important information information of ofthe the original original signal, signal, and and information information carried carried by by otherother principal principal components components becomes becomes less less and and less less important. important. In Infact, fact, more more than than 95% 95% of ofthe the original original signalsignal energy, energy, which which can can be be also also ca calledlled information, information, is restored is restored only only by by the the first first principal principal component. component. ThatThat means means if ifthe the quantization quantization error error of of the the first first principal principal component component is islarge, large, compared compared with with the the originaloriginal signal, signal, the the restored restored signal signal also hashas aa large large error. error. Therefore, Therefore, the the first first principal principal component component needs needsto be to allocated be allocated more more bits, andbits, theand bits the for bits other for other principal principal components components should should be sequentially be sequentially reduced. reduced.For some For kinds some of kinds audio, of 4audio, principal 4 principal components comp areonents enough are enough to obtain to acceptableobtain acceptable quality, quality, while for whileother for kinds other of kinds audio of 5 principalaudio 5 principal components comp mayonents be needed. may be We needed. choose We 6 principal choose 6 components, principal components,because they because can satisfy they almostcan satisfy all kindsalmost of all audio. kinds In of fact, audio. the In fifth fact, and the sixth fifth principal and sixth components principal components play a small role in the restored spectral; therefore, little quantization accuracy is needed for the last two principal components. Appl. Sci. 2018, 8, 967 9 of 22 playAppl. a Sci. small 2018, role 8, x FOR in the PEER restored REVIEW spectral; therefore, little quantization accuracy is needed for the9 last of 22 two principal components. Based on the above conclusion, the reduced dimensional matrix can be divided into certain Based on the above conclusion, the reduced dimensional matrix can be divided into certain regions, as is shown in Figure 9. Different regions have different bit allocations: the darker color regions, as is shown in Figure9. Different regions have different bit allocations: the darker color means means more bits needed. more bits needed.

FigureFigure 9. 9.Bits Bits allocation allocation for for reduced-dimensional reduced-dimensional matrix matrix (darker (darker color color means means more more bits bits needed). needed).

AA PVQPVQ quantizerquantizer waswas usedused toto quantifyquantify thethe distributiondistribution ofof differentdifferent bitsbits inin each each principal principal componentcomponent of of the the reduced-dimensional reduced-dimensional matrix. matrix. Several Severa subjectivel subjective listening listening tests tests have have been been carried carried out, andout, the and bits the assignments bits assignments policy policy is determined is determin accordinged according to the qualityto the quality of the restoredof the restored audio under audio differentunder different bit assignments. bit assignments. Finally, theFinally, bits thatthe needbits tothat be need allocated to be for allocated each principal for each component principal arecomponent determined. are Tabledetermined.2 gives theTable number 2 gives of the bits number required of forbits each required principal for each component principal of component non-zero reduced-dimensionalof non-zero reduced-dimensional matrix under matrix the PVQ under quantizer. the PVQ quantizer.

TableTable 2. 2.Quantization Quantization bit bit for for reduced-dimensional reduced-dimensional matrix. matrix. Principal Component Bits Needed (bit/per Point) PrincipalFirst principal Component component Bits Needed 3 (bit/per Point) FirstSecond principal principal component component 3 3 Third principal component 2.5 Second principal component 3 Fourth principal component 1.5 Third principal component 2.5 Fifth principal component 0.45 Fourth principal component 1.5 Sixth principal component 0.45 Fifth principal component 0.45 Sixth principal component 0.45 3.2. Quantization Design of the Rotation Matrix

3.2. QuantizationAccording Designto Y = XP ofthe in Rotationencoder Matrixand = in decoder, some properties of the rotation matrix can be found: T According to Y = XP in encoder and Xrestore = YP in decoder, some properties of the rotation matrix(i) The can higher be found: row in matrix P is used to restore the region of higher frequencies in the restored signal. (ii) The first column in matrix P corresponds to the first principal component in the reduced- (i) Thedimensional higher row matrix. in matrix ThatP meansis used that to restore the first the column region ofof higherthe rotation frequencies matrix in only the restoredmultiplies signal. with (ii) Thethe first first column column (first in principal matrix component)P corresponds of the toreduced-dimensional the first principal matrix component when calculating in the reduced-dimensionalthe restored signal in the matrix. decoder. That The means second that colu themn first of the column rotation of matrix the rotation only multiplies matrix only with multipliesthe second with column the first(second column principal (first principalcomponent) component) of the reduced-dimensional of the reduced-dimensional matrix, and matrix so on. whenAccording calculating to the the above restored properties signal in of the the decoder. rotation The matrix, second the column quantization of the rotation distribution matrix of only the rotation matrix has been made clearer, that is, the larger the row number is, and the larger the column number is, the fewer allocation bits there are. Appl. Sci. 2018, 8, 967 10 of 22

multiplies with the second column (second principal component) of the reduced-dimensional matrix, and so on. According to the above properties of the rotation matrix, the quantization Appl. Sci.distribution 2018, 8, x FOR of PEER the REVIEW rotation matrix has been made clearer, that is, the larger the row number10 of 22 is, and the larger the column number is, the fewer allocation bits there are. In addition to the above two properties of the rotation matrix, there is another important property.In addition Generally, to the the above data twoin the properties first four of rows the rotation around matrix, the diagonal there is are another bigger important than others. property. The thinkingGenerally, of thethis datacharacteristic in the first in fourthis paper rows aroundis as follows: the diagonal common are audio bigger focuses than more others. energy Thethinking on low- bandof this in characteristicfrequency domain, in this and paper the ismethod as follows: of matrix common construction audio focuses described more in energySection on2.4 low-bandcan keep thein frequency coefficients domain, of low-band and the stay method in low-column. of matrix Thus, construction the first diagonal described value in Section that is 2.4calculated can keep from the thecoefficients first column of low-band must be the stay largest in low-column. one of overall Thus, values the in first rotation diagonal matrix value or autocorrelation that is calculated matrix. from Thethe firstsecond column diagonal must value be the could largest quite one possibly of overall be values the second-largest in rotation matrix value, or and autocorrelation so on. That means matrix. theseThe second data arediagonal more important value could for quitedecoder, possibly so the be quantization the second-largest accuracy value, of these and regions so on. Thatwith meanslarger absolutethese data values are more can important determine for the decoder, error sobetween the quantization the restored accuracy signal of and these the regions original with signal. larger Therefore,absolute values the data can determinearound the the diagonal error between need to the be restored allocated signal with and more the bits. original Figure signal. 10 shows Therefore, the “averagethe data aroundvalue” therotation diagonal matrix need of a to piece be allocated of audio with as an more example bits. Figureto show 10 this shows property the “average more clearly. value” rotation matrix of a piece of audio as an example to show this property more clearly.

1 0.8

0.6 0.4

0.2 0 value

-0.2

-0.4

-0.6

-0.8 15

15 10 10 row 5 5 column 0 0

Figure 10. An example rotation matrix (“value” means the average value of cells in rotation matrices, Figure“column” 10. meansAn example the column rotation of matrix rotation (“value” matrix, means and “row” the average means the value row of of cells rotation in rotation matrix). matrices, “column” means the column of rotation matrix, and “row” means the row of rotation matrix). The rotation matrix also has the following quantization criterion: The rotation matrix also has the following quantization criterion: (i) The first column of the rotation matrix needs to be precisely quantized, because the first principal (i) The first column of the rotation matrix needs to be precisely quantized, because the first principal component of the reduced-dimensional signal is only multiplied by the first column of P in component of the reduced-dimensional signal is only multiplied by the first column of P in decoder to restore signal. decoder to restore signal. (ii) Data in columns 2–6 in row 1 have little effect on the restored signal, so that few bits can be (ii) Data in columns 2–6 in row 1 have little effect on the restored signal, so that few bits can be allocatedallocated for for this this region. region. (iii)(iii) TheThe higher higher row row in in matrix matrix PP isis used used to to restore restore the the region region of of higher higher frequencies frequencies in in the the restored restored signal.signal. The The data data in in lines lines 13, 13, 14, 14, and and 15 15 correspond correspond to to the the frequency frequency that that exceeds exceeds the the range range of of frequenciesfrequencies perceptible perceptible to to the the human human ear, ear, so so these these data data do do not not need need to to be be quantized. quantized. AccordingAccording toto thethe above above quantization quantization criteria, criteria, the rotationthe rotation matrix matrix that is dividedthat is divided into the followinginto the followingregions according regions according to bit allocation to bit isallocation shown in is Figure shown 11 in. TheFigure darker 11. The the darker color is, the the color more is, bits the should more bitsbe allocated. should be allocated. Appl. Sci. 2018, 8, 967 11 of 22 Appl. Sci. 2018, 8, x FOR PEER REVIEW 11 of 22

FigureFigure 11. 11.Bit Bit allocation allocation for for rotation rotation matrix matrix (darker (darker color color means means more more bits bits needed; needed; white white color color means means nono bits). bits).

TheThe same same test test method method as as the the one one for for reduced-dimensional reduced-dimensional matrix matrix was was used used to to determine determine the the numbernumber of of bits bits needed needed in in each each region region in in rotation rotation matrix. matrix. InIn Table Table3, the3, the first first region region corresponds corresponds to the to region the region with the with darkest the darkest color in color Figure in 11 Figure; the second 11; the correspondssecond corresponds to the area to withthe area the with second-darkest the second-darkest color, and color, so on. and The so whiteon. The color white means color there means are there no bitsare allocated no bits allocated to that area. to that area.

TableTable 3. 3.Quantization Quantization bits bits for for rotation rotation matrix. matrix.

Region Bits Needed (bit/per Point) RegionThe first region Bits Needed 4 (bit/per Point) TheThe first second region region 3 4 The secondThe third region region 2 3 TheThe third fourth region region 2 2 The fourthThe fifth region region 0.5 2 TheThe fifth sixth region region 0.5 0.5 TheThe sixth seventh region region 0 0.5 The seventh region 0 3.3. Design of the Low-Pass Filter 3.3. DesignThe noise of the generated Low-Pass Filterfrom quantization and matrix calculation is white noise. There are two ways to reduceThe noise it. The generated first way from is introducing quantization noise and matrix shaping calculation to make isnoise white more noise. comfortable There are twofor human ways tohearing, reduce it.and The the first second way way is introducing is introducing noise a filter shaping in decoder. to make noise more comfortable for human hearing,For and most the signals, second the way energy is introducing concentrates a filter on inlow decoder. frequency domain, therefore the noise in low frequencyFor most domain signals, does the energynot sound concentrates obvious because on low frequencyof simultaneous domain, masking. therefore While the noise in the in lowhigh frequencyfrequency domain part, if the does original not sound signal obvious does not because have high of simultaneousfrequency components masking., the While noise in signal the high will frequencynot be masked part, if and the can original be heard. signal So, does a low-pass not have highfilter frequencycan be set components,to mask the high the noise frequency signal noise will notsignal, be masked without and affecting can be the heard. original So, signal. a low-pass The key filter point can of be the set filter to mask design the is high to determine frequency the noise cut- signal,off frequency. without affecting the original signal. The key point of the filter design is to determine the … cut-off frequency. Given the original matrix [ ] = [ ⋮⋱ ⋮], there are 15 subbands in X, in which the … first subband is the first row, the second subband is the second row, and so on. When C = is calculated in PCA, the first value on the diagonal line , … is calculated by Appl. Sci. 2018, 8, 967 12 of 22

  a1 ... a225 Given the original matrix X =  . .. . , there are 15 subbands in [ 16 15 ]  . . .  a16 ... a240 X, in which the first subband is the first row, the second subband is the second row, and so 1 T on. When C = m X X is calculated in PCA, the first value e1 on the diagonal line e1, e2 ... e15 is calculated by

e1= ((a1 − a) ∗ (a1 − a) + (a2 − a) ∗ (a2 − a) + ... (a16 − a) ∗ (a16 − a))/16 2 2 2 2 = (a1 + a2 + ... a16 + 16a − 2a(a1 + a2 + ... a16))/16 (4) 2 2 2 2 = (a1 + a2 + ... a16 − 16a )/16

2 2 2 in which a1 + a2 + ... a16 is equal to the energy of the first subband E1, and a is the average value of the first subband. Therefore, the relationship between E1 and e1 is

2 E1 = 16(e1 + a ) (5) 2 Actually, the value of a is far less than e1, so E1 is equal to 16e1, and the relationships between E2 ... E15 and e2 ... e15 can be gotten by analogy. Therefore, through PCA, the energy of each subband is calculated, and the filter can be determined by the energy of each band. Considering the proportion of energy accumulation, Ak is

k e = ∑i=1 i Ak 15 (6) ∑i=1 ei

According to some experiments, when Ak = 99.6%, k is the proper cut-off band. When the signal passes through the filter, the noise signal will be filtered out, and the signal itself will not be too much damaged. Considering the frequency characteristics of the audio signal, the stop band setting is not low, and the signal with more than 20,000 Hz is often ignored by default, so each band of the above 15 bands will not be transmitted. Taken together, e1, e2, e3, e12, e13, e14, e15 will not be transmitted, and the index of the left 8 bands are quantized by 3 bits, so the bitrate for cut-off band is 75 bps.

4. PCA-Based Parametric Stereo The stereo coding method proposed in this paper, as the extension of mono coding method mentioned before, is shown in Figures 12 and 13. The encoder and decoder for stereo audio use the same module of PCA and quantization as mono audio. The differences between mono coding and stereo coding are elaborated in the following sections. In encoder, the two channels’ signal carries out MDCT and the two channels’ coefficients gather to generate an original matrix to do PCA; then, an improved parametric stereo module is used to downmix and calculate parameters of the high-band. Finally, a module based on PVQ is used for quantizing coefficients of matrix, and so on. In decoder, coefficients of mid downmix matrix and rotation matrix are used to generate mid channel; then, spatial parameters and other information are introduced to restore stereo signals. After inverse MDCT (iMDCT) and filtering, the signal can be regarded as the output signal. Appl. Sci. 2018, 8, x FOR PEER REVIEW 12 of 22

= (( −) ∗ ( −) + ( −) ∗ ( −) + …( −) ∗ ( −))/16 = ( + +⋯ +16-2( + +… ))/16 (4) = ( + +⋯ -16 )/16 in which + +⋯ is equal to the energy of the first subband , and is the average value of the first subband. Therefore, the relationship between and is

=16( + ) (5) Actually, the value of is far less than , so is equal to 16, and the relationships between … and … can be gotten by analogy. Therefore, through PCA, the energy of each subband is calculated, and the filter can be determined by the energy of each band. Considering the proportion of energy accumulation, is ∑ = (6) ∑

According to some experiments, when = 99.6%, k is the proper cut-off band. When the signal passes through the filter, the noise signal will be filtered out, and the signal itself will not be too much damaged. Considering the frequency characteristics of the audio signal, the stop band setting is not low, and the signal with more than 20,000 Hz is often ignored by default, so each band of the above 15 bands will not be transmitted. Taken together, ,,,,,, will not be transmitted, and the index of the left 8 bands are quantized by 3 bits, so the bitrate for cut-off band is 75 bps.

4. PCA-Based Parametric Stereo The stereo coding method proposed in this paper, as the extension of mono coding method mentioned before, is shown in Figures 12 and 13. The encoder and decoder for stereo audio use the same module of PCA and quantization as mono audio. The differences between mono coding and stereo coding are elaborated in the following sections. In encoder, the two channels’ signal carries out MDCT and the two channels’ coefficients gather to generate an original matrix to do PCA; then, an improved parametric stereo module is used to downmix and calculate parameters of the high-band. Finally, a module based on PVQ is used for quantizing coefficients of matrix, and so on. In decoder, coefficients of mid downmix matrix and rotation matrix are used to generate mid channel; then, Appl.spatial Sci. 2018parameters, 8, 967 and other information are introduced to restore stereo signals. After inverse13 of 22 MDCT (iMDCT) and filtering, the signal can be regarded as the output signal.

8 Rotation Left MDCT Channel subframes matrix Cutoff band Construct Left reduced- Downmix and Mid signal matrix and Quantization dimensional calculate bitstream PCA matrix parameters IC、ILD and encode

8 Right MDCT Right reduced- Channel subframes dimensional First principal component matrix Calculate parameters

Figure 12. 12. FlowchartFlowchart of stereo of stereo encoder. encoder. (MDCT, (MDCT, Modifi Modifieded Discrete Discrete Cosine Transform; Cosine Transform; PCA, Principle PCA, PrincipleComponent Component Analysis; Analysis;IC, Interaural IC, Interaural Coherence; Coherence; ILD, Interaural ILD, Interaural Level Difference). Level Difference). Appl. Sci. 2018, 8, x FOR PEER REVIEW 13 of 22

Mid downmix 8 Left output matrix Left subframes Mid channel Lowpass iPCA iMDCT signal Right filter Decode and Rotation channel 8 Bitstream Right output dequantization matrix subframes

ILD、IC First principal component Cutoff frequency

Figure 13. FlowchartFlowchart of ofstereo stereo decoder. decoder. (MDCT, (MDCT, Modifi Modifieded Discrete Discrete Cosine Cosine Transform; Transform; iPCA, inverse iPCA, inversePrinciple Principle Component Component Analysis; Analysis; IC, Interaural IC, Interaural Coherence; Coherence; ILD, Interaur ILD, Interauralal Level Difference). Level Difference).

4.1. Procession of Stereo SignalSignal Since thethe signalssignals inin two two channels channels of of the the stereo stereo tend tend to to have have high high correlation. correlation. The The signal signal of the of leftthe andleft and right right channels channels can be can constructed be constructed into one into original one or matrix.iginal matrix. Firstly, Firstly, the coefficients the coefficients from left from channel left channel and right channel construct original matrices [ ] and [ ], respectively. Then, and right channel construct original matrices Xl[m n] and Xl[m n] respectively. Then, matrices Xl[m n]. " # [ ] = matrices [ ] and [ ] are used to form a new matrix Xl,[ minn which] . Matrix X is used and Xr[m n] are used to form a new matrix X, in which = . Matrix [ ]X is used to obtain one Xr[m n] to obtain one rotation matrix [ ] by PCA, and [ ] can handle both left and right channel signals. rotation matrix P by PCA, and P can handle both left and right channel signals. That is, That is, [n k] [n k] = × Y[ ]l[m k] = Xl[ ][m n] × P[[ ]n k] (7)(7) = × Y[ ]r[m k] = Xr[ ][m n] × P[n[ ] k] (8)(8) IfIf the the first first six six principal principal components components are preserved, are preserved, most mostmono monoaudio signals audio signalscan be well can restored. be well restored.At this time, At this we time,keep wethe keepfirst thesix bases first six in basesprincipal in principal component component matrix and matrix obtain and rotation obtain rotation matrix matrix[ ] P. The reduced-dimensional. The reduced-dimensional matrices matrices of each of each sub-frame sub-frame are are Y[],,Y[],…,, ... , []Y .. [ 15 6 ] 1[156] 2[156] 8[156] Experiments were done to verify the design for stereo signals: 10 normal audio files and 5 artificial Experiments were done to verify the design for stereo signals: 10 normal audio files and 5 artificial synthesized audio files (the left channel and right channel have less correlation) were chosen as the synthesized audio files (the left channel and right channel have less correlation) were chosen as the test materials. Results of the subjective listening experiments are shown in Figures 14 and 15. We can test materials. Results of the subjective listening experiments are shown in Figures 14 and 15. We can consider that for most stereo signals, in which two channels have high relevance with each other, the consider that for most stereo signals, in which two channels have high relevance with each other, the proposed method for stereo signals perform as well as for mono signals. proposed method for stereo signals perform as well as for mono signals.

5

4

3

MOS 2 mono-PCA 1 stereo-PCA

0 1 2 3 4 5 6 Component left

Figure 14. Subjective MOS of high-relation stereo signal. (MOS, Mean Opinion Score). Appl. Sci. 2018, 8, x FOR PEER REVIEW 13 of 22

Mid downmix 8 Left output matrix Left subframes Mid channel Lowpass iPCA iMDCT signal Right filter Decode and Rotation channel 8 Bitstream Right output dequantization matrix subframes

ILD、IC First principal component Cutoff frequency Figure 13. Flowchart of stereo decoder. (MDCT, Modified Discrete Cosine Transform; iPCA, inverse Principle Component Analysis; IC, Interaural Coherence; ILD, Interaural Level Difference).

4.1. Procession of Stereo Signal Since the signals in two channels of the stereo tend to have high correlation. The signal of the left and right channels can be constructed into one original matrix. Firstly, the coefficients from left

channel and right channel construct original matrices [ ] and [ ], respectively. Then, [ ] matrices [ ] and [ ] are used to form a new matrix X, in which = . Matrix X is used [ ]

to obtain one rotation matrix [ ] by PCA, and [ ] can handle both left and right channel signals. That is,

[ ] =[ ] ×[ ] (7)

[ ] =[ ] ×[ ] (8) If the first six principal components are preserved, most mono audio signals can be well restored. At this time, we keep the first six bases in principal component matrix and obtain rotation matrix

[ ] . The reduced-dimensional matrices of each sub-frame are [],[],…,[] . Experiments were done to verify the design for stereo signals: 10 normal audio files and 5 artificial synthesized audio files (the left channel and right channel have less correlation) were chosen as the test materials. Results of the subjective listening experiments are shown in Figures 14 and 15. We can consider that for most stereo signals, in which two channels have high relevance with each other, the Appl. Sci.proposed2018, 8, 967 method for stereo signals perform as well as for mono signals. 14 of 22

5

4

3

MOS 2 mono-PCA 1 stereo-PCA

0 1 2 3 4 5 6 Component left Appl. Sci. 2018, 8, x FOR PEER REVIEW 14 of 22 FigureFigure 14. Subjective 14. Subjective MOS MOS of of high-relation high-relation stereostereo signal. signal. (MOS, (MOS, Mean Mean Opinion Opinion Score). Score).

5

4

3

MOS 2

mono-PCA 1 stereo-PCA

0 1 2 3 4 5 6 Component left

FigureFigure 15. 15. SubjectiveSubjective MOS MOS ofof low-relationlow-relation stereo stereo signal. signal.

4.2. Parameters4.2. Parameters in Parametric in Parametric Stereo Stereo In parametric stereo, Interaural Level Difference (ILD), Interaural Time Difference (ITD), In parametric stereo, Interaural Level Difference (ILD), Interaural Time Difference (ITD), and and Interaural Coherence (IC) are used to describe the difference between two channels’ signals. InterauralIn MDCT Coherence domain, (IC) the aboveare used parameters to describe in subband the differenceb are calculated between by: two channels’ signals. In MDCT domain, the above parameters in subband b are calculated by: Ab−1 ∑ Xl(k)Xl(k) k=Ab−1 ILD[b] = 10log10 ∑ ()() (9) Ab−1 ILD[b] =10∑ Xr(k)Xr(k) (9) ∑k=Ab−1 () () hXbl(k), Xbr(k)i IC[b] = R(Xbl(k), Xbr(k)) = <(),() > (10) IC[b] = R(),()=|Xbl(k)||Xbr(k)| (10) |()||()| While in MDCT domain, calculating ITD must introduce Modified Discrete Sine Transform While(MDST) in to MDCT calculate domain, Interaural calculating Phase Difference ITD (IPD)must instead introduce of ITD, Modified in which MDSTDiscrete is: Sine Transform (MDST) to calculate Interaural Phase Difference (IPD) instead of ITD, in which MDST is: N−1  2π  1 N  1  N Y(k) = x(n)w(n) sin n + + k + , k = 0, 1, 2 . . . , − 1 (11) ∑ N 2 2 14 2 1 2 () =n=0 ()() + + + , = 0,1,2 … , −1 (11) 2 4 2 2 in which () is the spectrum coefficients, () is the input signal in time domain, and () is the window function. Then, a new transform MDFT is introduced, () =() +(), in which () is the MDCT spectral coefficients, () is the MDST spectral coefficients, and IPD can be calculated by

∗ IPD[b] =∠ () () (12) Traditional decoder uses these parameters and a downmix signal to restore left channel’s signal and right channel’s signal. Compared with formula (4, 9, 10), when the method described in Section 4.1 is used to deal with stereo signals, ∑ () () and ∑ () () can be calculated in the processing of PCA; therefore, parametric stereo and PCA have high associativity. After PCA, we can get ILD and IC only by calculating <(),() >. In addition, we also need to calculate IPD by Formula (12); however, introducing MDST will bring computational complexity, and ITD or IPD mainly works on signals below 1.6 kHz that play smaller roles in domain. Thus, some improvements can be made to the parametric stereo according to the nature of the PCA.

4.3. PCA-Based Parametric Stereo

… … Given that the original matrix is X = [ ⋮⋱ ⋮], and the rotation matrix is P = [ ⋮⋱⋮], … … … the reduced-dimensional matrix is Y = XP = [ ⋮⋱⋮]. For the coefficients in the reduced- … dimensional matrix Y, Appl. Sci. 2018, 8, 967 15 of 22 in which Y(k) is the spectrum coefficients, x(n) is the input signal in time domain, and w(n) is the window function. Then, a new transform MDFT is introduced, Z(k) = X(k) + jY(k), in which X(k) is the MDCT spectral coefficients, Y(k) is the MDST spectral coefficients, and IPD can be calculated by   Ab−1 ∗ IPD[b] = ∠ ∑ Zl(k)Zr (k) (12) k=Ab−1

Traditional decoder uses these parameters and a downmix signal to restore left channel’s signal and right channel’s signal. Compared with formula (4, 9, 10), when the method described in Section 4.1 Ab−1 Ab−1 is used to deal with stereo signals, ∑ Xl(k)Xl(k) and ∑ Xr(k)Xr(k) can be calculated in the k=Ab−1 k=Ab−1 processing of PCA; therefore, parametric stereo and PCA have high associativity. After PCA, we can get ILD and IC only by calculating hXbl(k), Xbr(k)i. In addition, we also need to calculate IPD by Formula (12); however, introducing MDST will bring computational complexity, and ITD or IPD mainly works on signals below 1.6 kHz that play smaller roles in high frequency domain. Thus, some improvements can be made to the parametric stereo according to the nature of the PCA.

4.3. PCA-Based Parametric Stereo   a1 ... a225  . .. .  Given that the original matrix is X =  . . . , and the rotation matrix is P = a16 ... a240     p1 ... p76 b1 ... b49  . .. .   . .. .   . . . , the reduced-dimensional matrix is Y = XP =  . . . . For the p15 ... p90 b16 ... b64 coefficients in the reduced-dimensional matrix Y,

b1 = a1 p1 + a17 p2 + ... a225 p15 (13)

b2 = a2 p1 + a18 p2 + ... a226 p15 (14)

b16 = a16 p1 + a33 p2 + ... a240 p15 (15) The first column is only related to the first column of P (the first base). As Figure9 shows, main energy of the first base in the rotation matrix is entirely concentrated on the data in the first column of the first row. Therefore, the matrix Y can be approximated as

b1 = a1 p1 (16)

b2 = a2 p1 (17)

b16 = a16 p1 (18)

While p1 in the matrix P is approximately equal to 1. Therefore the first column in the matrix Y is equal to the first column originally in matrix X. When the sampling rate is 48 kHz, the first column in X indicates the coefficients from 0 to 1.6 kHz, which means that when calculating the restored signal, the points below 1.6 kHz in the frequency domain happen to be the first principal component. So, the first principal component can be used to restore signals below 1.6 kHz in frequency domain instead of introducing MDST and estimating binaural cues. In decoder, the spectrum of the left and right channels above 1.6 kHz can be restored according to the downmix reduced-dimensional matrix, rotation matrix, and spatial parameters. The spectrum of the left and right channels below 1.6 kHz can be restored according to the first principal component and the downmix reduced-dimensional matrix. Appl. Sci. 2018, 8, 967 16 of 22

4.4. Subbands and Bitrate The spectrums of signal are divided into several segments based on Equivalent Rectangular Bands (ERB) model. The subbands are shown in Table4.

Table 4. Subband division.

index 0 1 2 3 4 5 6 7 start 0 100 200 300 400 510 630 760 index 8 9 10 11 12 13 14 15 start 900 1040 1200 1380 1600 1860 2160 2560 index 16 17 18 19 20 21 22 23 start 3040 3680 4400 5300 6400 7700 9500 12,000 index 24 25 start 15,500 19,880

The quantization of space parameters uses ordinary vector quantization. The codebook with different parameters is designed based on the sensitivity of the human ear and the range of the parameter fluctuation of the experimental corpus. The codebooks of ILD and IC are shown in Tables5 and6, respectively.

Table 5. Codebook for ILD. (ILD, Interaural Level Difference).

index 0 1 2 3 4 5 6 7 ILD −20 −15 −11 −8 −5 −3 −1 0 index 8 9 10 11 12 13 14 15 ILD 1 3 5 8 11 13 15 20

Table 6. Codebook for IC. (IC, Interaural Coherence).

index 0 1 2 3 4 5 6 7 IC 1 0.94 0.84 0.6 0.36 0 −0.56 −1

According to the above codebooks, the ILD parameters of each subband are quantized using 4 bits, and the IC parameters of each subband are quantized using 3 bits. According to the above sub-band division, the number of sub-bands higher than 1.6 kHz accounts for half of the total number of sub-bands in the whole frequency domain, which is 13, so the number of bits needed for each frame’s spatial parameter is 13 × 7 = 91. For frequencies above 1.6 kHz, the rate of quantitative parameters is about 4.5 kbps. In the frequency domain less than 1.6 kHz, the first principal component is used to describe the signal directly. The rate of transmission of the first principal component is around 10 kbps, so the parameter rate of PCA-based parametric stereo is around 15 kbps. In traditional parametric stereo [24], IPD of each subband is quantized by 3 bits, so the parameter rate of the traditional parameter stereo is about (4 + 3 + 3 + 3) × 25 × 50 = 16.25 kbps. Therefore, compared with traditional parametric stereo, the rate of PCA-based parametric stereo is slightly reduced. Figure 16 shows the results of a 0–1 test for spatial sense. In this test, 12 stereo music from EBU test materials is chosen. Score 0 means the sound localization is stable, and score 1 means there are some unstable sound localization in test materials. The ratio in Figure 16 is calculated from the times of unstable localization, and lower ratio means better performance in the quality of spatial sense. Experiments show that compared with the traditional parametric stereo encoding method, the spatial sense of the audio source has been obviously improved through the PCA-based parametric stereo. Through the use of PCA, almost half of the amount of parameter estimation can be reduced, while the computational complexity still rises because of the increasing complexity of PCA. Appl. Sci. 2018, 8, x FOR PEER REVIEW 16 of 22

According to the above codebooks, the ILD parameters of each subband are quantized using 4 bits, and the IC parameters of each subband are quantized using 3 bits. According to the above sub- band division, the number of sub-bands higher than 1.6 kHz accounts for half of the total number of sub-bands in the whole frequency domain, which is 13, so the number of bits needed for each frame’s spatial parameter is 13 × 7 = 91. For frequencies above 1.6 kHz, the rate of quantitative parameters is about 4.5 kbps. In the frequency domain less than 1.6 kHz, the first principal component is used to describe the signal directly. The rate of transmission of the first principal component is around 10 kbps, so the parameter rate of PCA-based parametric stereo is around 15 kbps. In traditional parametric stereo [24], IPD of each subband is quantized by 3 bits, so the parameter rate of the traditional parameter stereo is about (4 + 3 + 3 + 3) × 25 × 50 = 16.25 kbps. Therefore, compared with traditional parametric stereo, the rate of PCA-based parametric stereo is slightly reduced. Figure 16 shows the results of a 0–1 test for spatial sense. In this test, 12 stereo music from EBU test materials is chosen. Score 0 means the sound localization is stable, and score 1 means there are some unstable sound localization in test materials. The ratio in Figure 16 is calculated from the times of unstable localization, and lower ratio means better performance in the quality of spatial sense. Experiments show that compared with the traditional parametric stereo encoding method, the spatial sense of the audio source has been obviously improved through the PCA-based parametric stereo. Through the use of PCA, almost half of the amount of parameter estimation can be reduced, while Appl.the computational Sci. 2018, 8, 967 complexity still rises because of the increasing complexity of PCA. 17 of 22

1 traditional parametric stereo 0.8 PCA-based parametric stereo

0.6

ratio 0.42 0.4

0.2 0.17

0 1 2 any unstable auditory localization

FigureFigure 16.16. TestTestresults results for for spatial spatial sense. sense.

5.5. TestTest and and Results Results TheThe methodmethod proposedproposed inin thisthis paperpaper performsperforms significantlysignificantly betterbetter withwith stereostereo signalssignals comparedcompared toto monomono signals.signals. Thus, Thus, this this section section only only presents presents th thee results results for for stereo stereo signals. signals. In Inorder order to verify to verify the theencoding encoding and and decoding decoding performance performance of ofthe the PCA- PCA-basedbased stereo stereo coding coding method, somesome optimizedoptimized modulesmodules suchsuch asas DTX, noise shaping, and and other other efficient efficient coding coding tools tools in in the the codec codec were were not not used used in intesting testing

5.1.5.1. DesignDesign ofof TestTest Based Based on on MUSHRA MUSHRA Appl. Sci. 2018, 8, x FOR PEER REVIEW 17 of 22 TheThe keykey pointspoints ofof thethe MUSHRAMUSHRA [ 25[25]] test test are are as as follows: follows: 5.1.1. Test Material 5.1.1. Test Material (i) Several typical EBU test sequences were selected: piano, trombone, percussion, vocals, song (i) Several typical EBU test sequences were selected: piano, trombone, percussion, vocals, song of of rock, multi sound source background and mixed voice, and so on. rock, multi sound source background and mixed voice, and so on. (ii) Contrast test objects: PCA-based codec signal that transmits two channels separately, PCA- (ii) Contrast test objects: PCA-based codec signal that transmits two channels separately, based codec signal with traditional parametric stereo, PCA-based codec signal with improved PCA-based codec signal with traditional parametric stereo, PCA-based codec signal with improved parametric stereo, G719 codec signal with traditional parametric stereo [24], HE-AACv2 codec signal, parametric stereo, G719 codec signal with traditional parametric stereo [24], HE-AACv2 codec signal, anchor signal, and original signal. In the algorithm proposed in this paper, the relationship between anchor signal, and original signal. In the algorithm proposed in this paper, the relationship between the quality of the restored signal and bitrate is not linear, as Figure 17 shows, which uses a simple the quality of the restored signal and bitrate is not linear, as Figure 17 shows, which uses a simple subjective test with different bitrate allocation; therefore, the test chooses a case in which the qualities subjective test with different bitrate allocation; therefore, the test chooses a case in which the qualities of restored signal and bitrate are both acceptable. of restored signal and bitrate are both acceptable.

5

4

3

MOS 2

1

0 20 40 60 80 Total bitrate

FigureFigure 17. 17.Relationship Relationshipbetween between quality quality and and bitrate. bitrate.

The bits allocations of each module in PCA–based codec for stereo signal are shown in Table 7.

Table 7. Bitrate allocation in encoder.

Module Bitrate Reduced-dimensional matrix 35 kbps Rotation matrix 5 kbps First principal component 10 kbps Spatial parameters and side information 5 kbps

(iii) In order to eliminate psychological effects, the order and the name of each test material in each group are random. The listener needs to select the original signal from the test signals and score 100 points, and the rest of the signals are scored by 0–100 according to overall quality, including and the spatial reduction degree.

5.1.2. Listeners 10 people with certain listening experiences were selected for the listening test, of which 5 were male, 5 were female, and each listener has normal hearing.

5.1.3. Auditory Environment All 10 listeners use headphones connected to a laptop in quiet environments.

5.2. Test Results After the test is finished, we calculated average value and the 95% confidence interval based on the listeners’ scores. The average confidence interval of each test codec is [77.2, 87.0], [74.4, 84.2], [70.5, Appl. Sci. 2018, 8, 967 18 of 22

The bits allocations of each module in PCA–based codec for stereo signal are shown in Table7.

Table 7. Bitrate allocation in encoder.

Module Bitrate Reduced-dimensional matrix 35 kbps Rotation matrix 5 kbps First principal component 10 kbps Spatial parameters and side information 5 kbps

(iii) In order to eliminate psychological effects, the order and the name of each test material in each group are random. The listener needs to select the original signal from the test signals and score 100 points, and the rest of the signals are scored by 0–100 according to overall quality, including sound quality and the spatial reduction degree.

5.1.2. Listeners 10 people with certain listening experiences were selected for the listening test, of which 5 were male, 5 were female, and each listener has normal hearing.

5.1.3. Auditory Environment All 10 listeners use headphones connected to a laptop in quiet environments.

5.2. Test Results After the test is finished, we calculated average value and the 95% confidence interval based Appl. Sci. 2018, 8, x FOR PEER REVIEW 18 of 22 on the listeners’ scores. The average confidence interval of each test codec is [77.2, 87.0], [74.4, 84.2], [70.5,80.7], 80.7],[65.8, [65.8,76.6], 76.6], [56.1, [56.1, 66.9], 66.9], and and[78.6, [78.6, 86.2]. 86.2]. After After removing removing three three outlier outlier data data (data (data beyond confidenceconfidence interval), thethe testtest resultsresults ofof MUSHRAMUSHRA areare shownshown inin FiguresFigures 1818 andand 1919..

100 100.0 90 82.1 79.8 82.4 80 75.6 71.2 70 61.5 60 50 40

MUSHRA SCORE 30 20 10 0 PCA_2 PCA_PS+ PCA_PS G.719 anchor HE-AACv2 reference

Figure 18. Results of MUSHRA test. (PCA_2 represents the PCA-based codec signal that is transmitted Figure 18. Results of MUSHRA test. (PCA_2 represents the PCA-based codec signal that is transmitted over two channels separately (75 kbps), PCA_PS+ represents PCA-based codec signal with improved parametric stereo (55 kbps), PCA_PS represents PCA-basedPCA-based codec signal with traditional parametric stereo (56 kbps), G.719 represents G.719 codec signalsignal with traditional parametric stereo (56 kbps), anchor represents anchor signal,signal, HE_AACv2 represents HE-AACv2 signal (55 kbps), and reference represents hiddenhidden referencereference signal).signal).

100 90 80 70 60 50 40

per item scores per item 30 20 10 0 123456

PCA_2 PCA_PS+ PCA_PS G.719 anchor HE-AACv2

Figure 19. MUSHRA score of per item test. (PCA_2 represents the PCA-based codec signal that is transmitted over two channels separately (75 kbps), PCA_PS+ represents PCA-based codec signal with improved parametric stereo (55 kbps), PCA_PS represents PCA-based codec signal with traditional parametric stereo (56 kbps), G.719 represents G.719 codec signal with traditional parametric stereo (56 kbps), anchor represents anchor signal; HE-AACv2 represents HE-AACv2 signal (55 kbps), and hidden reference material has been removed. 1–6 represents different test materials).

Compared with traditional parametric stereo, the PCA-based parametric stereo has less bitrate, higher quality, and better spatial sense. Compared with G719 with traditional parametric stereo with the same bitrate, PCA-based codec signal has better quality. Compared with HE-AACv2 signal, the Appl. Sci. 2018, 8, x FOR PEER REVIEW 18 of 22

80.7], [65.8, 76.6], [56.1, 66.9], and [78.6, 86.2]. After removing three outlier data (data beyond confidence interval), the test results of MUSHRA are shown in Figures 18 and 19.

100 100.0 90 82.1 79.8 82.4 80 75.6 71.2 70 61.5 60 50 40

MUSHRA SCORE 30 20 10 0 PCA_2 PCA_PS+ PCA_PS G.719 anchor HE-AACv2 reference

Figure 18. Results of MUSHRA test. (PCA_2 represents the PCA-based codec signal that is transmitted over two channels separately (75 kbps), PCA_PS+ represents PCA-based codec signal with improved parametric stereo (55 kbps), PCA_PS represents PCA-based codec signal with traditional parametric stereo (56 kbps), G.719 represents G.719 codec signal with traditional parametric stereo (56 kbps), Appl. Sci.anchor2018 ,represents8, 967 anchor signal, HE_AACv2 represents HE-AACv2 signal (55 kbps), and reference19 of 22 represents hidden reference signal).

100 90 80 70 60 50 40

per item scores per item 30 20 10 0 123456

PCA_2 PCA_PS+ PCA_PS G.719 anchor HE-AACv2

Figure 19.19. MUSHRA score of per itemitem test.test. (PCA_2 representsrepresents the PCA-based codec signal that is transmitted overover two two channels channels separately separately (75 (75 kbps), kbps), PCA_PS+ PCA_PS+ represents represents PCA-based PCA-based codec codec signal signal with improvedwith improved parametric parametric stereo (55stereo kbps), (55 PCA_PSkbps), PC representsA_PS represents PCA-based PCA-based codec signal codec with signal traditional with parametrictraditional stereoparametric (56 kbps), stereo G.719 (56 representskbps), G.719 G.719 represents codec signal G.719 with codec traditional signal parametricwith traditional stereo (56parametric kbps), anchor stereo represents(56 kbps), anchor anchor signal; represents HE-AACv2 anchor represents signal; HE-AACv2 HE-AACv2 represents signal (55 HE-AACv2 kbps), and hiddensignal (55 reference kbps), materialand hidden hasbeen reference removed. material 1–6 represents has been differentremoved. test 1–6 materials). represents different test materials). Compared with traditional parametric stereo, the PCA-based parametric stereo has less bitrate, Compared with traditional parametric stereo, the PCA-based parametric stereo has less bitrate, higher quality, and better spatial sense. Compared with G719 with traditional parametric stereo higher quality, and better spatial sense. Compared with G719 with traditional parametric stereo with with the same bitrate, PCA-based codec signal has better quality. Compared with HE-AACv2 signal, the same bitrate, PCA-based codec signal has better quality. Compared with HE-AACv2 signal, the the average score of the PCA-based parametric stereo is slightly less than HE-AACv2. HE-AACv2 is a mature codec that uses several techniques to improve the quality, including Quadrature Mirror Filter (QMF), Spectral Band Replication (SBR), noise shaping and so on. The complexity of PCA is less than the part of the 32-band QMF in HE-AACv2. Considering the high complexity and maturity of HE-AACv2, the test results are optimistic. Conclusions can be drawn that the PCA-based codec method possesses good performance, especially for stereo signal in which the audio quality and spatial sense can be recovered well.

5.3. Complexity Analysis The module of principal component analysis can be regarded as a part of the singular value decomposition (SVD): the calculate procession of the right singular matrix and the singular value of original matrix X , therefore the algorithm complexity of principal component analysis module [ m n ] is O(nˆ3). According to the properties of SVD, when n < m, the computation complexity of the right singular matrix is half of the computation complexity of SVD for X . Therefore, the algorithm [ m n ] complexity and delay of PCA are far less than those of SVD. In the Intel i5-5200U processor, 4 GB memory, 2.2 GHz work memory, it takes 20 ms to finish one part of PCA. Given the time reduction of parametric stereo, the delay of PCA-based codec algorithm is in the acceptable range. In the part of multi-frame joint PCA, the forming of the original matrix takes 40 ms. When the first frame finishes MDCT, the process of forming original matrix will begin. Besides, the thread of PCA is different from matrix construction, and MDCT windowing also belongs to the calculating thread. Suppose the time for MDCT of first frame is t1; the whole delay can be regarded as around 40 + t1 ms, which is around 50 ms. The delay of the algorithm proposed in this paper still has space to be improved, and we can Appl. Sci. 2018, 8, x FOR PEER REVIEW 19 of 22 average score of the PCA-based parametric stereo is slightly less than HE-AACv2. HE-AACv2 is a mature codec that uses several techniques to improve the quality, including Quadrature Mirror Filter (QMF), Spectral Band Replication (SBR), noise shaping and so on. The complexity of PCA is less than the part of the 32-band QMF in HE-AACv2. Considering the high complexity and maturity of HE- AACv2, the test results are optimistic. Conclusions can be drawn that the PCA-based codec method possesses good performance, especially for stereo signal in which the audio quality and spatial sense can be recovered well.

5.3. Complexity Analysis The module of principal component analysis can be regarded as a part of the singular value decomposition (SVD): the calculate procession of the right singular matrix and the singular value of original matrix [], therefore the algorithm complexity of principal component analysis module is O(n^3). According to the properties of SVD, when n < m, the computation complexity of the right singular matrix is half of the computation complexity of SVD for []. Therefore, the algorithm complexity and delay of PCA are far less than those of SVD. In the Intel i5-5200U processor, 4 GB memory, 2.2 GHz work memory, it takes 20 ms to finish one part of PCA. Given the time reduction of parametric stereo, the delay of PCA-based codec algorithm is in the acceptable range. In the part of multi-frame joint PCA, the forming of the original matrix takes 40 ms. When the first frame finishes MDCT, the process of forming original matrix will begin. Besides, the thread of PCA is different from matrix construction, and MDCT windowing also belongs to the calculating thread. Suppose the time Appl. Sci. 2018, 8, 967 20 of 22 for MDCT of first frame is t1; the whole delay can be regarded as around 40 + t1 ms, which is around 50 ms. The delay of the algorithm proposed in this paper still has space to be improved, and we can make the balance of delay and bitrate better by adjustingadjusting the number of multi frames using a more intelligentintelligent strategystrategy inin thethe future.future.

6. Discussion This paper just presents aa preliminarypreliminary algorithm.algorithm. There is still much space for improvement in real applications.applications. One question worth further study is how to eliminate the noise. In the experiment, whenwhen thethe number of bits bits or or the the number number of of principa principall components components is too is too small, small, the thenoise noise spectrum spectrum has hasspecial special nature, nature, as Figures as Figures 20–22 20show.–22 show.Signal in Signal Figure in 20 Figure is restored 20 is by restored three components; by three components; compared comparedwith signal with in Figures signal in21 Figures and 22, 21 the and spectrum 22, the spectrumof noise in of noisehigh-frequency in high-frequency domain has domain obvious has obviousrepeatability, repeatability, which occurs which once occurs every once 1.6 kHz. every Ther 1.6efore, kHz. low Therefore, pass filter low mentioned pass filter in mentioned Section 3.3 inis Sectionnot the 3.3best is way not theto get best rid way of tothis get noise: rid of the this damage noise: theof original damage signal of original is unavoidable. signal is unavoidable. Ideally, an Ideally,adaptive an notch adaptive filter notch can filter filter the can spectrum filter the of spectrum noise clearly of noise and clearly not damage and not original damage signal. original However, signal. However,the design the of such design an ofadaptive such an notch adaptive filter notch needs filter to be needs studied to be more studied in the more future. in the future.

Figure 20. The spectrogram of the signal restored by three components. Figure 20. The spectrogram of the signal restored by three components. Appl. Sci. 2018, 8, x FOR PEER REVIEW 20 of 22

Figure 21. The spectrogram of the signal restored by four components. Figure 21. The spectrogram of the signal restored by four components.

Figure 22. The spectrogram of the signal restored by five components.

7. Conclusions The framework of proposed multi-frame PCA-based audio coding method has several differences compared to other ; therefore, there are lots of barriers to the design of an optimal algorithm. This paper proposed several ways to remove those barriers. For mono signal, the design of PCA-based coding method in this paper, including multi-frame signal processing, matrix design, and quantization design can hold it efficiently. As to stereo signal, PCA has high associativity with parametric stereo, which makes PCA-based parametric stereo certainly feasible and significant. Experimental results show satisfactory performance of the multi-frame PCA-based stereo audio coding method compared with the traditional audio codec. In summary, research on the multi-frame PCA-based codec, both for mono and stereo, has certain significance and needs further improvement. This kind of stereo audio coding method has good performance in processing different kinds of audio signals, but further studies are still needed before it can be widely applied.

Author Contributions: J.W. conceived the method and modified the paper, X.Z. performed the experiments and wrote the paper, X.X. and J.K. contributed suggestions, and J.W. supervised all aspects of the research.

Funding: National Natural Science Foundation of China (No. 61571044)

Acknowledgments: The authors would like to thank the reviewers for their helpful suggestions. The work in this paper is supported by the cooperation between BIT and Ericsson. Appl. Sci. 2018, 8, x FOR PEER REVIEW 20 of 22

Appl. Sci. 2018, 8, 967 21 of 22 Figure 21. The spectrogram of the signal restored by four components.

Figure 22. The spectrogram of the signal restored by five components. Figure 22. The spectrogram of the signal restored by five components.

7. Conclusions 7. Conclusions The framework of proposed multi-frame PCA-based audio coding method has several The framework of proposed multi-frame PCA-based audio coding method has several differences differences compared to other codecs; therefore, there are lots of barriers to the design of an optimal compared to other codecs; therefore, there are lots of barriers to the design of an optimal algorithm. algorithm. This paper proposed several ways to remove those barriers. For mono signal, the design This paper proposed several ways to remove those barriers. For mono signal, the design of PCA-based of PCA-based coding method in this paper, including multi-frame signal processing, matrix design, coding method in this paper, including multi-frame signal processing, matrix design, and quantization and quantization design can hold it efficiently. As to stereo signal, PCA has high associativity with design can hold it efficiently. As to stereo signal, PCA has high associativity with parametric stereo, parametric stereo, which makes PCA-based parametric stereo certainly feasible and significant. which makes PCA-based parametric stereo certainly feasible and significant. Experimental results Experimental results show satisfactory performance of the multi-frame PCA-based stereo audio show satisfactory performance of the multi-frame PCA-based stereo audio coding method compared coding method compared with the traditional audio codec. with the traditional audio codec. In summary, research on the multi-frame PCA-based codec, both for mono and stereo, has In summary, research on the multi-frame PCA-based codec, both for mono and stereo, has certain certain significance and needs further improvement. This kind of stereo audio coding method has significance and needs further improvement. This kind of stereo audio coding method has good good performance in processing different kinds of audio signals, but further studies are still needed performance in processing different kinds of audio signals, but further studies are still needed before it before it can be widely applied. can be widely applied. Author Contributions: J.W. conceived the method and modified the paper, X.Z. performed the experiments and Author Contributions: J.W. conceived the method and modified the paper, X.Z. performed the experiments and wrote the paper,paper, X.X.X.X. andand J.K.J.K. contributedcontributed suggestions,suggestions, andand J.W.J.W.supervised supervised all all aspects aspects of of the the research. research. Funding: National Natural Science FoundationFoundation of China (No. 61571044).61571044) Acknowledgments: The authorsauthors wouldwould like like to to thank thank the the reviewers reviewers for for their their helpful helpful suggestions. suggestions. The The work work in this in paperthis paper is supported is supported by the bycooperation the cooperation between between BIT andBIT Ericsson.and Ericsson. Conflicts of Interest: The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Bosi, M.; Goldberg, R.E. Introduction to Coding and Standards; Kluwer Academic Publishers: Dordrecht, The Netherlands, 2003; pp. 399–400, ISBN 1402073577. 2. Fatus, B. Parametric Coding for Spatial Audio. Master’s Thesis, KTH, Stockholm, Sweden, 2015. 3. Faller, C. Parametric joint-coding of audio sources. In Proceedings of the AES 120th Convention, Paris, France, 20–23 May 2016. 4. Blauert, J. Spatial Hearing: The Psychophysics of Human Sound Localization; MIT Press: Cambridge, MA, USA, 1983; pp. 926–927, ISBN 0262021900. 5. ISO/IEC. 13818-7: Information Technology—Generic Coding of Moving Pictures and Associated Audio Information—Art 7: Advanced Audio Coding (AAC); ISO/IEC JTC 1/SC 29: Klagenfurt, Austria, 2006. Appl. Sci. 2018, 8, 967 22 of 22

6. Herre, J. From Joint Stereo to Spatial Audio Coding—Recent Progress and Standardization. In Proceedings of the 7th International Conference of Digital Audio Effects (DAFx), Naples, Italy, 5–8 October 2004. 7. Jannesari, A.; Huda, Z.U.; Atr, R.; Li, Z.; Wolf, F. Parallelizing Audio Analysis Applications—A Case Study. In Proceedings of the 39th International Conference on Software Engineering: Software Engineering Education and Training Track (ICSE-SEET), Buenos Aires, Argentina, 20–28 May 2017; pp. 57–66. 8. ISO/IEC. 14496-3:2001/Amd 2: Parametric Coding for High-Quality Audio; ISO/IEC JTC 1/SC 29: Redmond, WA, USA, 2004. 9. Breebaart, J.; Par, S.V.D.; Kohlrausch, A.; Schuijers, E. Parametric coding of stereo audio. EURASIP J. Appl. Signal Process. 2005, 9, 1305–1322. [CrossRef] 10. Faller, C.; Baumgarte, F.D. Binaural cue coding—Part I: Psychoacoustic fundamentals and design principles. IEEE Trans. Speech Audio Process. 2003, 11, 509–519. [CrossRef] 11. Faller, C.; Baumgarte, F.D. Binaural cue coding—Part II: Schemes and applications. IEEE Trans. Speech Audio Process. 2003, 11, 520–531. [CrossRef] 12. Zhang, S.H.; Dou, W.B; Lu, M. Maximal Coherence Rotation for Stereo Coding. In Proceedings of the 2010 IEEE International conference on multimedia & Expo (ICME), Suntec City, Singapore, 19–23 July 2010; pp. 1097–1101. 13. Ahmed, N.; Natarajan, T.; Rao, K.R. Discrete cosine transform. IEEE Trans. Comput. 1974, 1, 90–93. [CrossRef] 14. Smith, S.W. The Scientist and Engineer's Guide to Digital Signal Processing, 2nd ed.; California Technical Publishing: San Diego, CA, USA, 1997; ISBN 0-9660176. 15. Skodras, A.; Christopoulos, C.; Ebrahimi, T. The jpeg 2000 still standard. IEEE Signal Process. Mag. 2001, 18, 36–58. [CrossRef] 16. Pearson, K. On Lines and Planes of Closest Fit to Systems of Points in Space. Philos. Mag. 1901, 2, 559–572. [CrossRef] 17. Jia, M.S.; Bao, C.C.; Liu, X.; Li, R. A novel super-wideband embedded speech and audio codec based on ITU-T Recommendation G.729.1. In Proceedings of the 2009 Annual Summit and Conference of Asia-Pacific Signal and Information Processing Association (APSIPA ASC), Sapporo, Japan, 4–7 October 2009; pp. 522–525. 18. Jia, M.S.; Bao, C.C.; Liu, X.; Li, X.M.; Li, R.W. An embedded stereo speech and audio coding method based on principal component analysis. In Proceedings of the International Symposium on Signal Processing and Information Technology (ISSPIT), Bilbao, Spain, 14–17 December 2011; Volume 42, pp. 321–325. 19. Briand, M.; Virette, D.; Martin, N. Parametric representation of multichannel audio based on Principal Component Analysis. In Proceedings of the 120th Convention Audio Engineering Society Convention, Paris, France, 1–4 May 2006. 20. Goodwin, M. Primary-Ambient Signal Decomposition and Vector-Based Localization for Spatial Audio Coding and Enhancement. In Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, HI, USA, 15–20 April 2007; pp. I:9–I:12. 21. Chen, S.X.; Xiong, N.; Park, J.H.; Chen, M.; Hu, R. Spatial parameters for audio coding: MDCT domain analysis and synthesis. Multimed. Tools Appl. 2010, 48, 225–246. [CrossRef] 22. Robert, C.S.; Stefan, W.; David, S.H. Mean opinion score (MOS) revisited: Methods and applications, limitations and alternatives. Multimed. Syst. 2016, 22, 213–227. [CrossRef] 23. Hou, Y.; Wang, J. Mixture Laplace distribution speech model research. Comput. Eng. Appl. 2014, 50, 202–205. [CrossRef] 24. Jiang, W.J.; Wang, J.; Zhao, Y.; Liu, B.G.; Ji, X. Multi-channel audio compression method based on ITU-T G.719 codec. In Proceedings of the Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IINMSP), Beijing, China, 8 July 2014. 25. ITU-R. BS.1534: Method for the Subjective Assessment of Intermediate Quality Level of Coding Systems; International Telecommunication Union: Geneva, Switzerland, 2001.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).