A Fast Algorithm of Integer MDCT for Lossless Audio Coding
Total Page:16
File Type:pdf, Size:1020Kb
« ¬ A Fast Algorithm of Integer MDCT for Lossless Audio Coding HUANG HaiBin, RAHARDJA Susanto, YU Rongshan, LIN Xiao Agency for Science, Technology and Research (A*STAR) Institute for Infocomm Research (I2R), Singapore {hhuang, rsusanto, rsyu, [email protected]} scheme [8]. It is then critical that a fast Abstract IntMDCT with low rounding error be developed In this paper, a new fast algorithm to and exploited to meet the industry requirements. implement integer modified Discrete Cosine Transform (IntMDCT) is proposed. It is shown The IntMDCT can be derived from its prototype that the total rounding operations required for – the modified discrete cosines transform type- this algorithm is only 2.5N. As a result, its IV. In his book [9], Malvar gave an efficient approximation error is far less than that of the realization of MDCT by cascading a bank of directly converted integer transforms. At the Givens rotations with a DCT-IV block. It is well same time, the complexity is greatly reduced known that Givens rotation can be factorized which results in improved performance of a into three lifting steps for mapping integers to lossless audio coding system employing the integers [3]. With that, the realization of IntMDCT in terms of compression ratio and IntMDCT mainly relies on an efficient complexity costs. implementation of integer DCT-IV. 1. Introduction Integer transforms can be directly converted Reversible transforms that map integers to from their prototypes by replacing each Givens integers are important tools for lossless coding rotation with three lifting steps. For each lifting schemes. For integer inputs, such a transform step there is one rounding operation, the total generates integer outputs that approximate its rounding number of an integer transform is three floating-point prototype transform outputs. The times the Givens rotation number of the original integer inputs can be completely prototype transform. For discrete trigonometric recovered by passing the integer outputs from transforms, the number of Givens rotations the forward transform through the inverse needed is usually N log 2 N , where N is the transform. block size. Accordingly, the total rounding number is also N log 2 N for the family of Recently, integer transform has attracted more directly converted integer transforms. Because and more attention for both image and audio of the rounding, an integer transform can only coding [1-6]. Fast and efficient algorithms for approximate its floating-point prototype. high order N (N>16) on integer transform still Obviously, the approximation error increases poses a challenge in research. In audio coding, with the number of rounding operations. a modified discrete cosine transform (MDCT) is usually used. It is known that the MDCT can In this paper, a novel fast algorithm for realizing provide critical sampling with good frequency reversible integer DCT-IV is proposed. The fast selectivity. With that, it is employed in MPEG- algorithm has immediate application in 4 advanced audio coding (AAC) system [7]. improving the performance of a lossless audio Very recently, MPEG issued a call for proposal coding system. The total rounding number of on lossless audio coding for its extension 3. this algorithm is only 2.5N while the Integer MDCT (IntMDCT) is one of the key approximation error is shown to be far less than components within this possible new coding that of the directly converted integer transforms. 0-7803-8484-9/04/$20.00 ©2004 IEEE IV - 177 ICASSP 2004 ¬ ¬ As a result, the lossless compression ratio of a IV IV K 1 C N / 2 D N / 2 2I N / 2 C N / 2 lossless audio coding system employing the fast algorithm improves and at same time the IV C N / 2 computational complexity is greatly reduced. K 2 , This paper is organized as follows. Section 2 2 describes the new algorithm. Section 3 presents K 2C IV D I discussions and numerical results on 3 N / 2 N / 2 N / 2 performance comparisons. Finally, in Section 4 a conclusion on this work is discussed. respectively and K1, K2, K3, H1, H2, H3, D are ª (N 1)S º 2. Integer Type-IV DCT tan « 8N » The discrete cosine transform type IV (DCT-IV) « » « » of an N-point real input sequence x(n) is H 3S 1 « tan » defined as follows [2]: « 8N » « S » tan IV « » y CN x ¬ 8N ¼ 2-1 x {x(n)}n 0,1,,N 1,䇭 y {y(m)}m 0,1,,N 1 IV ª S º whereC is the transform matrix defined as sin N « 4N » « 3S » « sin » IV 2 ª § (m 1 2)(n 1 2)S ·º H 2 4N CN «cos¨ ¸» 2-2 « » N ¬ © N ¹¼ « » « (N 1)S » m = 0, 1, …, N-1 and n = 0, 1, …, N-1 sin ¬« 4N ¼» IV CN can be decomposed into the following six H = H matrices: 3 1 IV 1 C N R1 R 2 S T 1 T2 Peo 2-3 ª º « 1 » « » where Peo is the even-odd permutation matrix, D « 1 » « » R1, R2, T1, T2, and S are « » ¬« 1¼» ªI º respectively. Equation 2-3 indicates that integer R N / 2 1 « H I » 2-4 DCT-IV transform can be realized by five lifting ¬ 1 N / 2 ¼ stages. As these lifting stages have identical structures, only the transform equation for one ªI N / 2 H 2 º stage is discussed here. It is trivial and R 2 « » 2-5 ¬ I N / 2 ¼ straightforward to write down the transform equations for all the other stages. ª D K º Let x and y be the input and outputN u1 integer T N / 2 2 2-6 1 « » vectors for the first transform stage, we have ¬ I N / 2 ¼ 2-9 I y T2 x ª N / 2 º 2-7 T2 « » ¬ K 3 I N / 2 ¼ Vectors x and y are divided into two halves, and the equation can be rewritten as: ª I º S N / 2 « » 2-8 ¬H 3 K1 I N / 2 ¼ IV - 178 ¬ ¬ computed and the MPEG-4 lossless audio y I x ª 1 º ª N / 2 º ª 1 º coding evaluation procedure is adopted where « » « » « » ¬y 2 ¼ ¬ K 3 I N / 2 ¼ ¬x2 ¼ the results are compared with the direct implementation method used in the RM0. The Thus, the forward integer transform is MSEs for the forward and inverse IntMDCTs y1 x1 are defined as: L1 1023 1 1 2 y 2 ¬¼K 3 x1 x 2 2-10 MSE e L ¦¦1024 i and the inverse integer transform is ji 0 0 x y Table 1 shows a comparison of the proposed 1 1 IntMDCT algorithm with the algorithm used in x 2 y 2 ¬¼K 3 x1 2-11 MPEG-4 RM0 and the new algorithm is where denotes rounding operation. integrated into the RM0 with the compression ¬¼ ratio performance compared with the original. 3. Discussion and Experimental Table 1. Comparison of proposed IntMDCT Results algorithm with MPEG-4 reference model 0 RM0 (Direct Proposed Compression ratio From Equations 2-10 and 2-11, there are N/2 Implementation) Algorithm improvement rounding operations in each lifting stage. (IntMDCT) *MSE 1.789 0.547 2-3 0.87% Therefore, from Equation , the total number (IntIMDCT) of rounding operations for the proposed integer *MSE 1.774 0.545 DCT-IV algorithm is five times N/2, which is (IntMDCT) **MSE 1.785 0.546 2.5N. From Equation 2-3, it can be seen that the 0.42% (IntIMDCT) majority of computational power is on the four **MSE 1.766 0.546 IV N/2 point DCT-IV subroutines CN / 2 , when N Complexity is a large value, e.g. N = 1024. It can be (instructions) 47,616 34,816 roughly estimated that the complexity of the *48kHz/16 bit test set; **96kHz/24 bit test set proposed integer DCT-IV to be twice that of the floating-point DCT-IV. In Table 1, it is shown that the new The proposed algorithm in this paper is algorithm demonstrates superior performance compared with the IntMDCT implementation compared with the algorithm employed in RM0. used in the MPEG-4 lossless audio codec [10] The RM0 of MPEG-4 lossless audio coding which was adopted as Reference Model 0 system is shown in Fig 2, which is the Advanced (RM0) in July 2003. Fig 1 shows the block Audio Zip (AAZ) technology adopted by MPEG diagram of performance evaluation of the recently [10]. In the encoder, the input audio transforms. Mean square error (MSE) is frames are transformed losslessly by using the integer MDCT (IntMDCT) to generate the rf IntMDCT coefficients. These coefficients are MDCT Round scaled, quantized and coded with perceptually xef weighted so that the noise introduced is best masked by the masking threshold of the human IntMDCT auditory system in the core AAC encoder. The pf resultant core layer bit-stream thus constitutes rt the minimum rate/quality unit of the final IMDCT Round lossless bit-stream. For optimal coding rf et efficiency, an error-mapping procedure is employed to remove the information that has been coded in the core layer from the IntMDCT IntIMDCT pt coefficients before being noiselessly coded by bit-plane Golomb code (BPGC) [11] to form Fig 1. Evaluation test for IntMDCT and IntIMDCT lossless bit stream. In the decoder, a reverse IV - 179 ¬ « process is shown in the lower part of Fig. 2. [3] R. Geiger, T. Sporer, J. Koller, K. According to MPEG audio test requirement, Brandenburg, “Audio Coding based on 450 seconds with 15 different types of music Integer Transforms” AES 111th Convention, files are used for each 48kHz/16-bit and New York, USA, Sept.