➠ ➡

A Fast Algorithm of Integer MDCT for Lossless Audio Coding

HUANG HaiBin, RAHARDJA Susanto, YU Rongshan, LIN Xiao Agency for Science, Technology and Research (A*STAR) Institute for Infocomm Research (I2R), Singapore {hhuang, rsusanto, rsyu, [email protected]}

scheme [8]. It is then critical that a fast Abstract IntMDCT with low rounding error be developed In this paper, a new fast algorithm to and exploited to meet the industry requirements. implement integer modified Discrete Cosine Transform (IntMDCT) is proposed. It is shown The IntMDCT can be derived from its prototype that the total rounding operations required for – the modified discrete cosines transform type- this algorithm is only 2.5N. As a result, its IV. In his book [9], Malvar gave an efficient approximation error is far less than that of the realization of MDCT by cascading a bank of directly converted integer transforms. At the Givens rotations with a DCT-IV block. It is well same time, the complexity is greatly reduced known that Givens rotation can be factorized which results in improved performance of a into three lifting steps for mapping integers to lossless audio coding system employing the integers [3]. With that, the realization of IntMDCT in terms of compression ratio and IntMDCT mainly relies on an efficient complexity costs. implementation of integer DCT-IV.

1. Introduction Integer transforms can be directly converted Reversible transforms that map integers to from their prototypes by replacing each Givens integers are important tools for lossless coding rotation with three lifting steps. For each lifting schemes. For integer inputs, such a transform step there is one rounding operation, the total generates integer outputs that approximate its rounding number of an integer transform is three floating-point prototype transform outputs. The times the Givens rotation number of the original integer inputs can be completely prototype transform. For discrete trigonometric recovered by passing the integer outputs from transforms, the number of Givens rotations the forward transform through the inverse needed is usually N log 2 N , where N is the transform. block size. Accordingly, the total rounding

number is also N log 2 N for the family of Recently, integer transform has attracted more directly converted integer transforms. Because and more attention for both image and audio of the rounding, an integer transform can only coding [1-6]. Fast and efficient algorithms for approximate its floating-point prototype. high order N (N>16) on integer transform still Obviously, the approximation error increases poses a challenge in research. In audio coding, with the number of rounding operations. a modified discrete cosine transform (MDCT) is usually used. It is known that the MDCT can In this paper, a novel fast algorithm for realizing provide critical sampling with good frequency reversible integer DCT-IV is proposed. The fast selectivity. With that, it is employed in MPEG- algorithm has immediate application in 4 advanced audio coding (AAC) system [7]. improving the performance of a lossless audio Very recently, MPEG issued a call for proposal coding system. The total rounding number of on lossless audio coding for its extension 3. this algorithm is only 2.5N while the Integer MDCT (IntMDCT) is one of the key approximation error is shown to be far less than components within this possible new coding that of the directly converted integer transforms.

0-7803-8484-9/04/$20.00 ©2004 IEEE IV - 177 ICASSP 2004 ➡ ➡

As a result, the ratio of a IV IV K 1  C N / 2 ˜D N / 2  2I N / 2 ˜C N / 2 lossless audio coding system employing the fast

algorithm improves and at same time the IV C N / 2 computational complexity is greatly reduced. K 2 , This paper is organized as follows. Section 2 2 describes the new algorithm. Section 3 presents K 2C IV D I discussions and numerical results on 3 N / 2 ˜ N / 2  N / 2 performance comparisons. Finally, in Section 4 a conclusion on this work is discussed. respectively and K1, K2, K3, H1, H2, H3, D are

ª (N 1)S º 2. Integer Type-IV DCT  tan « 8N » The discrete cosine transform type IV (DCT-IV) «  » « » of an N-point real input sequence x(n) is H 3S 1 «  tan » defined as follows [2]: « 8N » « S »  tan IV « » y CN ˜ x ¬ 8N ¼ 2-1

x {x(n)}n 0,1,,N 1,䇭 y {y(m)}m 0,1,,N 1

IV ª S º whereC is the transform matrix defined as sin N « 4N » « 3S » « sin » IV 2 ª § (m 1 2)(n 1 2)S ·º H 2 4N CN «cos¨ ¸» 2-2 « » N ¬ © N ¹¼ «  » « (N  1)S » m = 0, 1, …, N-1 and n = 0, 1, …, N-1 sin ¬« 4N ¼» IV CN can be decomposed into the following six H = H matrices: 3 1

IV 1 C N R1 ˜ R 2 ˜ S ˜ T ˜1 T2 ˜ Peo 2-3 ª º « 1 » «  » where Peo is the even-odd permutation matrix, D « 1 » « » R1, R2, T1, T2, and S are «  » ¬« 1¼» ªI º respectively. Equation 2-3 indicates that integer R N / 2 1 « H I » 2-4 DCT-IV transform can be realized by five lifting ¬ 1 N / 2 ¼ stages. As these lifting stages have identical structures, only the transform equation for one ªI N / 2 H 2 º stage is discussed here. It is trivial and R 2 « » 2-5 ¬ I N / 2 ¼ straightforward to write down the transform equations for all the other stages. ª D K º Let x and y be the input and outputN u1 integer T N / 2 2 2-6 1 « » vectors for the first transform stage, we have ¬ I N / 2 ¼ 2-9 I y T2 ˜x ª N / 2 º 2-7 T2 « » ¬ K 3 I N / 2 ¼ Vectors x and y are divided into two halves, and the equation can be rewritten as: ª I º S N / 2 « » 2-8 ¬H 3  K1 I N / 2 ¼

IV - 178 ➡ ➡

computed and the MPEG-4 lossless audio y I x ª 1 º ª N / 2 º ª 1 º coding evaluation procedure is adopted where « » « » ˜ « » ¬y 2 ¼ ¬ K 3 I N / 2 ¼ ¬x2 ¼ the results are compared with the direct implementation method used in the RM0. The Thus, the forward integer transform is MSEs for the forward and inverse IntMDCTs

y1 x1 are defined as: L1 1023 1 1 2 y 2 ¬¼K 3 ˜ x1  x 2 2-10 MSE e L ¦¦1024 i and the inverse integer transform is ji 0 0 x y Table 1 shows a comparison of the proposed 1 1 IntMDCT algorithm with the algorithm used in x 2 y 2  ¬¼K 3 ˜ x1 2-11 MPEG-4 RM0 and the new algorithm is where denotes rounding operation. integrated into the RM0 with the compression ¬¼ ratio performance compared with the original.

3. Discussion and Experimental Table 1. Comparison of proposed IntMDCT Results algorithm with MPEG-4 reference model 0 RM0 (Direct Proposed Compression ratio From Equations 2-10 and 2-11, there are N/2 Implementation) Algorithm improvement rounding operations in each lifting stage. (IntMDCT) *MSE 1.789 0.547 2-3 0.87% Therefore, from Equation , the total number (IntIMDCT) of rounding operations for the proposed integer *MSE 1.774 0.545 DCT-IV algorithm is five times N/2, which is (IntMDCT) **MSE 1.785 0.546 2.5N. From Equation 2-3, it can be seen that the 0.42% (IntIMDCT) majority of computational power is on the four **MSE 1.766 0.546 IV N/2 point DCT-IV subroutines CN / 2 , when N Complexity is a large value, e.g. N = 1024. It can be (instructions) 47,616 34,816 roughly estimated that the complexity of the *48kHz/16 bit test set; **96kHz/24 bit test set proposed integer DCT-IV to be twice that of the floating-point DCT-IV. In Table 1, it is shown that the new The proposed algorithm in this paper is algorithm demonstrates superior performance compared with the IntMDCT implementation compared with the algorithm employed in RM0. used in the MPEG-4 lossless audio [10] The RM0 of MPEG-4 lossless audio coding which was adopted as Reference Model 0 system is shown in Fig 2, which is the Advanced (RM0) in July 2003. Fig 1 shows the block Audio Zip (AAZ) technology adopted by MPEG diagram of performance evaluation of the recently [10]. In the encoder, the input audio transforms. Mean square error (MSE) is frames are transformed losslessly by using the integer MDCT (IntMDCT) to generate the rf IntMDCT coefficients. These coefficients are MDCT Round scaled, quantized and coded with perceptually xef weighted so that the noise introduced is best masked by the masking threshold of the human IntMDCT auditory system in the core AAC encoder. The pf resultant core layer bit-stream thus constitutes

rt the minimum rate/quality unit of the final IMDCT Round lossless bit-stream. For optimal coding rf et efficiency, an error-mapping procedure is employed to remove the information that has been coded in the core layer from the IntMDCT IntIMDCT pt coefficients before being noiselessly coded by bit-plane Golomb code (BPGC) [11] to form Fig 1. Evaluation test for IntMDCT and IntIMDCT lossless bit stream. In the decoder, a reverse

IV - 179 ➡ ➠

process is shown in the lower part of Fig. 2. [3] R. Geiger, T. Sporer, J. Koller, K. According to MPEG audio test requirement, Brandenburg, “Audio Coding based on 450 seconds with 15 different types of music Integer Transforms” AES 111th Convention, files are used for each 48kHz/16-bit and New York, USA, Sept. 2001 96kHz/24-bit test sets. It can be seen that the [4] R. Geiger, J. Herre, J. Koller and K. total compression ratios for both sequences are Brandenburg, “INTMDCT – a link between improved, and this is achieved without perceptual and lossless audio coding”, Proc. introducing any amendment from the original IEEE ICASSP2002, pp1813-1816 entropy coder and simultaneously the [5] Jin Li, “A Progressive to Lossless complexity cost has been reduced for more than Embedded Audio Coder (PLEAC) with 26%. Reversible Modulated Lapped Transform”, Proc. IEEE ICME2003, ppIII-221—III-224 4. Conclusion (reprint from ICASSP2003) In this paper, a new fast algorithm for realizing [6] Jia Wang, Jun Sun and Songyu Yu, “1-D reversible integer type-IV DCT is proposed. and 2-D transforms from integer to integer”, This algorithm requires only 2.5N rounding Proc. IEEE ICASSP2003, ppII-549 – II552 operations for every block of N input samples. [7] Information Technology – Coding of As a result, the approximation error is greatly Audiovisual Objects, Part 3. Audio, Subpart reduced. The proposed algorithm is low in 4 Time/Frequency Coding, ISO/JTC 1/SC computational complexity and modular in 29/WG11, 1998 structure. This algorithm is integrated and [8] “Final Call for Proposals on MPEG-4 evaluated with MPEG-4 audio extension 3 Lossless Audio Coding”, ISO/IEC JTC 1/SC lossless audio compression reference model 0. 29/WG 11 W5280, Shanghai, Oct. 2002 From the results it is clear that the new [9] H. S. Malvar, “Signal Processing with algorithm contributes to an overall Lapped Transforms” Artech House, 1992 improvement of compression ratio as well as [10]R.Yu, X. Lin and S. Rahardja, “Advanced lesser computational complexity. Audio Zip – A Scalable Perceptual and Lossless ”, ISO/IEC References JTC1/SC29/WG11 (m9134) MPEG2002 Awaji, Dec 2002. [1] K.Komatsu, K.Sezaki, “Design of Lossless [11]R. Yu, C.C. Ko, S. Rahardja, and X. Lin, LOT and its performance evaluation”, “Bit-Plane Golomb Coding for Sources with IEEE ICASSP2000, pp2119-2122 Laplacian Distributions,” Proc. IEEE [2] S.Oraintara, Y.Chen, T.Nguyen, “Integer ICASSP 2003. Fast Fourier Transform (INTFFT)”, Proc. IEEE ICASSP2001

$$& 0XOWLSOH[HU FRUH ,QSXW DXGLR LQW (UURU %3*& $$= 06 0'&7 PDSSLQJ HQFRGHU DXGLR

$ $ =  ( Q F R G H U /RVV\ GHFRGHG 'H0XOWLSOH[HU $$& DXGLR $$= FRUH DXGLR /RVV\ /RVVOHVV GHFRGHG ,QYHUVH %3*& LQW DXGLR HUURU 06 GHFRGHU ,0'&7 0DSSLQJ

$ $ =  ' H F R G H U Fig 2. AAZ lossless audio encoder/decoder system diagram

IV - 180