Improved Lossless Audio Coding Using the Noise-Shaped Intmdct

Improved Lossless Audio Coding using the Noise-Shaped IntMDCT Yoshikazu Yokotani∗, Ralf Geiger†, Gerald Schuller†, Soontorn Oraintara∗, and K. R. Rao∗ ∗University of Texas at Arlington, Electrical Engineering Dept., Arlington,TX, U.S.A. Emails: [yoshi,oraintar,rao]@uta.edu †Fraunhofer Institute for Digital Media Technology, Ilmenau, Germany Email: [ggr,shl]@idmt.fraunhofer.de Abstract— This paper discusses approximation noise shaping composed of (a) an identical sine window and time domain to improve the efficiency of the Integer Modified Discrete Cosine aliasing operation realized by the conventional three lifting Transform (IntMDCT)-based lossless audio codec. The scheme steps and (b) integer discrete cosine transform of type IV is applied to rounding operations associated with lifting steps to shape the noise spectrum towards the low frequency bands. (IntDCT-IV) operation realized by the MDL steps [4]. In In this paper, constraints on the noise shaping filter and a Fig. 1 (a), only the left channel case is drawn. cs[n] and design procedure with the constraints are discussed. Several noise s[n] are the lifting coefficients where cs[n]=(c[n] − 1)/s[n], shaping filters are designed and experimental results showing the c[n]=cosθ[n], s[n]=sinθ[n], and θ[n]=π/(2N)(n +0.5) improvement are presented. for n =0, ..., N/2 − 1. ea, eb, and ec is a rounding error introduced in the rounding operation associated with the first, I. INTRODUCTION AND GOAL second, and third lifting step, respectively. They are assumed The lifting scheme [1] based integer transforms are quite to be white noise and ranged between −0.5 and 0.5, provided useful for lossless coding applications such as audio [2] and that all the lifting coefficients are not quantized. In Fig. 1 (b), image [3] compression. These transforms are composed of the output of the window and time domain aliasing operation, × approximated plane rotations, 2 2 orthogonal matrices, each xtwL and xtwR, are the inputs of the IntDCT-IV operation. of which is generally realized by two or three lifting steps It should be noted that the subscript t is omitted from x twL associated with multiplications and rounding operations. Since and xtwR and the rest of the signals in Fig. 1 (b) since all every rounding operation introduces a rounding error, it is the signals processed by the IntDCT-IV are in the tth frame. accumulated and appears as “noise floor” in the transform Similar to Fig. 1 (a), the rounding errors introduced in the domain. Let us call this approximation noise. Although the first, second, and third MDL step, e1, e2, and e3, are assumed noise can be cancelled by the inverse transform, for lossless to have the same noise statistics. coding application, it is desirable that the noise floor level is kept small since it has a significant impact on the coding III. THEORY OF THE APPROXIMATION NOISE SHAPING efficiency. FILTER To improve the efficiency, the multi-dimensional lifting This MDL scheme-based stereo IntMDCT has the same (MDL) scheme was recently proposed [4]. This scheme sig- structure for the sine window and time domain aliasing op- nificantly lowers the entire level of the noise floor by reducing eration as the conventional IntMDCT. However, as for the the number of rounding operations as presented in [4]. In order IntDCT-IV operation, it requires rounding operations only to improve the efficiency further, reducing the noise at the high after each DCT-IV matrix, whereas the conventional IntMDCT frequency bands can be considered. Typically, the signal being requires a rounding operation after every scalar multiplication. compressed has energy concentrated near low frequencies and Therefore much fewer number of rounding operations are decays at high frequencies. Therefore, it is possible to improve needed, and the approximation noise can be reduced. The the efficiency by shaping the noise towards the low frequency lowered noise level, however, is flat over the transform domain. bands in such a way that the noise spectrum is under the This means that, if an input audio signal is bandlimited, IntMDCT spectral envelope of the signal. only the noise occupies the bands which are outside of the bandwidth of the signal. Since the noise at the bands has to II. MDL SCHEME-BASED STEREO INTMDCT be encoded as it is, this results in a limitation in the lossless The MDL scheme-based stereo IntMDCT [4] transforms coding efficiency. 2N stereo audio samples xtL[m] and xtR[m] for m = As a remedy to this problem, we apply the following 0,...,N − 1 into N spectral lines XL[k] and XR[k] for conventional noise shaping scheme [5] to a rounding operation k =0,...,N − 1 in the left and right channel, respectively, in each lifting step as illustrated in Fig.2. In this figure, where N = 1024. The subscript L and R indicate the left x[n] is an input to the lifting step which is multiplied by a and right channel, respectively. t denotes the frame number. scalar constant. This type of operations appears at the three The IntMDCT for the stereo signal, as illustrated in Fig. 1, is lifting steps of the sine window and time domain aliasing -1 x t L [n] -x (t-1) w L [N/2-1-n] x w L [0] X L [0] e a s[n] e c -1 x w L [N-1] X L [N-1] cs[n] e b cs[n] x t L [N-1-n] -x t w L [N/2+n] e 2 DCT-IV DCT-IV x (t+1) L [n] -x t w L [N/2-1-n] DCT-IV e a s[n] e c e 1 e 3 x w R [0] X R [0] cs[n] e b cs[n] x (t+1) L [N-1-n] -x (t+1) w L [N/2+n] x w R [N-1] X R [N-1] (a) (b) Fig. 1. The structure of the MDL scheme-based stereo IntMDCT. [] symbolizes a rounding operation. (a) the three lifting step structure for a sine window and time domain aliasing operation and (b) the MDL structure for the IntDCT-IV operation. x[n] Scalar/Matrix y[n] the IntDCT-IV computation. The variances are given by (7). Multiplication N−1 N−1 e[n] γ k 1 E e2 n 1 E e2 n , -1 [ ]= [ wL[ ]] = [ wR[ ]] (4) H’ [z] z N N n=0 n=0 N−1 Fig. 2. A block diagram of approximation noise shaping for a lifting step. 1 2 1 γ1[k]= E[e [n]] = , (5) [ ] symbolizes a rounding operation. N 1 12 n=0 N−1 1 2 1 γ2[k]= E[e2[n]] = . (6) operation. The scalar multiplication is replaced by a DCT-IV N 12 n=0 multiplication in case of MDL steps. After each multiplication, ⎧ the signal is added by a filtered version of the approximation 2 N −1− + 2 N −1− +1 ⎪ c [ 2 n] cs [ 2 n] noise e[n]. The result is then rounded and becomes the output ⎨⎪ 12 n ,..., N − of the lifting step y[n]. The filter h [n] is assumed to be causal. E e2 n E e2 n for =0 2 1. [ wL[ ]] = [ wR[ ]] = 2 − N +1 z ⎪ s [n 2 ] The approximation noise at the output is described in the - ⎩⎪ 12 N domain as: for n = 2 ,...,N − 1. (7) Y (z)=X(z)+(1+z−1H (z))E(z), (1) Since (4), (5), and (6) give a constant value for k = 0,...,N − 1 and the variance of e1 and e2 is 1/12 constant, E E where E(z) is the rounding noise spectrum. From (1), E(z) the approximation noise spectra L and R are approximately is shaped by the filter H(z)=1+z−1H (z). Thus, reducing flat all over the IntMDCT domain. E E 2 k E E2 k the noise at the high frequency bands is possible by designing Likewise, in case of noise shaping, [ L[ ]] and [ R[ ]] H(z) as a low pass filter. can be described as follows [6]: E E2 k E E2 k Let [ L[ ]] and [ R[ ]] be the variance of approximation noise after the IntMDCT computation in Case 2:Noise Shaping k the left and right channel, respectively. indicates the E E2 k ≈|H k |2 γ k γ k k ,...,N − [ L[ ]] [ ] ( [ ]+ 1[ ]) spectral line index and =0 1. According to [6], 2 these variances are approximately described as follows in +E[|h[n] ∗ e2[n]| ]+α[k](φ[k]+φ1[k]) case of both no noise shaping and noise shaping: +β[k](ψ[k]+ψ1[k]), (8) E E2 k ≈|H k |2 γ k γ k [ R[ ]] [ ] ( [ ]+ 2[ ]) Case 1:No Noise Shaping 2 +E[e3[n]] + α[k](φ[k]+φ2[k]) β k ψ k ψ k , E E2 k ≈ γ k γ k E e2 k , + [ ]( [ ]+ 2[ ]) (9) [ L[ ]] [ ]+ 1[ ]+ [ 2[ ]] (2) E E2 k ≈ γ k γ k E e2 k , [ R[ ]] [ ]+ 2[ ]+ [ 3[ ]] (3) where H[k] is a noise shaping filter designed in the DFT of type III (DFT-III) domain. h[n] is the impulse response. ∗ where γ[k], γ1[k], γ2[k] is an averaged variance of the approx- indicates the convolution operator and imation noise introduced in the three lifting steps in Fig. 1 (a), α k Re2 W kH k − Im2 W kH k , the first MDL step, and the second MDL step in Fig. 1 (b), [ ]= ( [ ]) ( [ ]) k k respectively. They are given by (4), (5), and (6). e wL[n] and β[k]=2 Re(W H[k])Im(W H[k]) , ewR[n] are the approximation noise associated with xwL[n] − π + 1 k j 2N (k 2 ) and xwR[n], which are illustrated in Fig.

Improved Lossless Audio Coding Using the Noise-Shaped Intmdct

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support