Audio Codec based on Discrete Fractional Cosine Transform

Vijaya C (a), J. S. Bhat
Department of Physics, Karnatak University, Dharwad 580003, Karnataka, India.
(a) SDM College of Engineering & Technology, Dharwad 580002, Karnataka, India.

Mailing Address: Dr. J. S. Bhat Department of Physics Karnatak University Dharwad-580003 Karnataka, India.

e-mail: [email protected], [email protected]

Ph: 0836 2215316, a 0836 2447465

ABSTRACT: In this paper we present a method of coding audio signals using the Discrete Fractional Cosine Transform (DFRCT) and Set Partitioning In Hierarchical Trees (SPIHT). The Fractional Cosine Transform (FRCT), a generalization of the ordinary cosine transform with an order parameter a, is suitable for processing nonstationary signals. The value of a that defines the compact domain is optimized using the criterion of minimum Percent Root mean square Difference (PRD). The compact domain DFRCT coefficients are encoded with the SPIHT technique. The average compression ratio achieved by the codec for Sound Quality Assessment Material (SQAM) signals is above the stipulated 3:1 with insignificant error, resulting in perceptually lossless audio signal compression.

Audio Codec based on Discrete Fractional Cosine Transform

Abstract: We present a scheme for coding audio signals using the Discrete Fractional Cosine Transform (DFRCT). The coefficients are encoded using Set Partitioning In Hierarchical Trees (SPIHT). The scheme results in a high compression ratio with perceptually lossless audio signal compression and avoids the necessity of coding the error signal, leading to reduced computational time and codec complexity. Test results are presented for Sound Quality Assessment Material (SQAM) signals.

Key words: Audio signal compression, DFRCT, SPIHT.

1. Introduction

Signal compression refers to the representation of a signal in a compact form so that it takes the least time for its transmission and the least space in a storage device, with no loss of information of significant importance [2]. Signal compression methods are grouped into three categories: direct compression, parameter extraction and transformation. In the direct data compression technique, the high correlation among successive samples of the original signal is exploited [2]. The basic operation in parameter extraction is the extraction of significant parameters of the signal, such as the amplitude and location of maxima and minima points, changes in slope and zero-crossing intervals. In the transformation technique, the signal is transformed to another domain where it is represented in terms of a few highly de-correlated expansion coefficients. High de-correlation among the coefficients reduces the redundancy in the signal representation [2,5]. This fact enables independent quantization of each coefficient [2,11].

Digital encoding of an audio signal typically represents each sample by 12-16 bits, resulting in a rate of 96-128 kbps. There have been attempts to improve audio coding techniques to increase the efficiency of transmission and storage while maintaining the audio signal quality. Audio coding is also essential for achieving secure communication. Lossless audio coding of CD quality stereo signals is essential for digital music distribution over the Internet. Lossy compression techniques such as MPEG or MP3 may not be acceptable for this application [8]. Because of the limited Internet resources, lossless compression will continue to be important for Internet transmission of audio signals. Archiving and mixing of high fidelity recordings in professional environments also requires lossless compression of audio signals. The new DVD standards for storage of audio signals at higher resolution and sampling rates employ lossless audio coding [8]. Some of the lossless compression algorithms are AudioPak, DVD, LTAC, MUSICompress, OggSquish, Philips, Shorten, Sonarc and WA [8]. Amongst these techniques, Lossless Transform Audio Compression (LTAC) is the only algorithm based on [...] and it employs the DCT. However, in this algorithm the error signal is also transmitted along with the coded coefficients. Moreover, the DCT performs well only when the signal is stationary and the energy is exclusively concentrated in certain bands.

For processing a nonstationary signal, which has time varying spectra, there is a need for a transform that is a joint function of time and frequency and that describes the energy density or signal intensity simultaneously in the time and frequency plane [3,6,7]. The general principle of such a transformation is to map a signal of a single independent variable, say time, to a function of two independent variables, time and frequency [6]. The Short Time Fourier Transform (STFT) and the Wavelet Transform (WT) are examples of linear time-frequency representations (TFRs) [2,6], whereas the Active Unterberger Distribution (AUD) [6], Wigner Distribution (WD) [1,6], Born-Jordan Distribution (BJD) and Page Distribution (PD) are examples of quadratic TFRs. The signal adaptive Radially Gaussian kernel Distribution (RGD) and Cohen's nonnegative Distribution (CND) [6] are nonlinear and non-quadratic TFRs.

The Fractional Cosine Transform (FRCT) is a generalization of the ordinary cosine transform, and it has the same relationship with the Fractional Fourier Transform (FRFT) as the ordinary cosine and sine transforms have with the Fourier Transform (FT) [10]. The fractional domain is useful for solving some problems that cannot be solved in the original domain [13]. Set Partitioning In Hierarchical Trees (SPIHT) is a coding technique that is best suited to data having more zeros than non-zeros [2]. In this paper we present a coding scheme for audio signal compression that employs the DFRCT as the transform and SPIHT for binary coding of the coefficients. The scheme is tested on Sound Quality Assessment Material (SQAM) [9]. We observe that the compression ratio (CR) is significantly high with insignificant reconstruction error.

2. Discrete FRCT

We discretize the FRCT following the method described in [10]. With $X_\alpha(u)$ as the FRFT [1] of $x(t)$, the FRCT is defined as

$$C_\alpha(u) = X_\alpha(u) + X_\alpha(-u) = 2\sqrt{\frac{1 - j\cot\alpha}{2\pi}} \int_{-\infty}^{\infty} e^{j\cot\alpha\,(u^2 + t^2)/2} \cos(ut\csc\alpha)\, x(t)\, dt. \tag{1}$$

The samples of $C_\alpha(u)$ are written as

$$Y^c_\alpha(m) = \sum_{n=0}^{N-1} F^c_\alpha(m,n)\, y(n), \quad m \in (0, M), \tag{2}$$

where $u = m\Delta u$, $t = n\Delta t$, and $y(n)$ denotes the samples of the input with $n \in (0, N-1)$.

The kernel $F^c_\alpha(m,n)$ in equation (2) is defined as

$$F^c_\alpha(m,n) = 2\Delta t \sqrt{\frac{1 - j\cot\alpha}{2\pi}}\; e^{j\cot\alpha\,(m^2\Delta u^2 + n^2\Delta t^2)/2} \cos(mn\,\Delta t\,\Delta u\csc\alpha). \tag{3}$$

The inverse DFRCT is defined as

$$y(n) = \sum_{m=0}^{M} F^{c*}_\alpha(n,m)\, Y^c_\alpha(m), \tag{4}$$

with $F^{c*}_\alpha(n,m)$ being the complex conjugate of $F^c_\alpha(m,n)$.

Substituting equations (2) and (3) into (4),

$$y(n) = \frac{(2\Delta t)^2}{2\pi|\sin\alpha|} \sum_{k=0}^{N-1} y(k)\, e^{j\frac{\cot\alpha}{2}(k^2 - n^2)\Delta t^2} \sum_{m=0}^{M} \cos(mn\,\Delta t\,\Delta u\csc\alpha)\cos(mk\,\Delta t\,\Delta u\csc\alpha). \tag{5}$$

The kernel of the conventional DCT-I is given by [12]

$$C_{N+1}(m,n) = \sqrt{2/N}\; k_m k_n \cos\!\left(\frac{mn\pi}{N}\right), \quad m,n \in (0, N). \tag{6}$$

Hence, to get the same kernel when $\alpha = \pi/2$, the sampling should be such that

$$\Delta t\,\Delta u = \frac{S\pi\sin\alpha}{N}, \tag{7}$$

with $S$ being $\mathrm{sgn}(\sin\alpha)$. Using equation (7) in (5), we get

$$y(n) = \frac{(2\Delta t)^2}{2\pi|\sin\alpha|} \sum_{k=0}^{N-1} y(k)\, e^{j\frac{\cot\alpha}{2}(k^2 - n^2)\Delta t^2} \sum_{m=0}^{M} \cos\!\left(\frac{Smn\pi}{N}\right)\cos\!\left(\frac{Smk\pi}{N}\right). \tag{8}$$

To reduce the right-hand side of equation (8) to $y(n)$, $F^c_\alpha(m,n)$ must be normalized, and the kernel becomes

$$F^c_\alpha(m,n) = k_m k_n \sqrt{\frac{2(1 - j\cot\alpha)\,|\sin\alpha|}{N}}\; e^{j\frac{\cot\alpha}{2}(m^2\Delta u^2 + n^2\Delta t^2)} \cos\!\left(\frac{mn\pi}{N}\right), \tag{9}$$

where $k_i = 1/\sqrt{2}$ for $i = 0$ and $k_i = 1$ otherwise.

The kernel in equation (9) reduces to that of the DCT-I when $\alpha = \pi/2$. Other definitions of the DCT [12] can also be considered in deriving the DFRCT. To compute the IDFRCT, the DFRCT is evaluated with order $-\alpha$, with sampling interval $\Delta u$ at the input and $\Delta t$ at the output.

3. SPIHT encoding

SPIHT is a well-known algorithm for signal compression. In fact, it can be applied directly to binary-code the time domain signal. However, it is best suited to data having more zeros than non-zeros. We employ SPIHT to binary-code the transform coefficients. The algorithm pseudo-sorts the transform coefficients and codes them, along with the sorting information, bit plane by bit plane. The [...] for transmission can thus be precisely defined so that only some of the most significant bits of each coefficient are sent. Due to this feature, the coding technique allows partial but progressive reconstruction of the required coefficients from a small section of the bit stream produced. No complex arithmetic is involved in this encoding technique except for comparisons, bit level manipulations and a single search for the initial threshold.
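The bit-plane behaviour described in Section 3 can be illustrated with a short Python sketch. It is not a SPIHT implementation: the set-partitioning trees and the list management of the real algorithm are omitted, and the function names are hypothetical. It only shows, for integer-valued coefficients, the single search for the initial threshold, which coefficients first become significant at a given bit plane, and what a decoder would hold after receiving only the most significant planes.

```python
import numpy as np


def initial_bitplane(coeffs):
    """Index of the top bit plane, floor(log2(max |c|)): the one search SPIHT needs."""
    peak = int(np.max(np.abs(coeffs)))
    if peak == 0:
        raise ValueError("all coefficients are zero; nothing to encode")
    return peak.bit_length() - 1


def newly_significant(coeffs, n):
    """Indices whose magnitude first reaches the threshold 2**n at bit plane n."""
    mags = np.abs(np.asarray(coeffs, dtype=np.int64))
    return np.flatnonzero((mags >= 2 ** n) & (mags < 2 ** (n + 1)))


def progressive_decode(coeffs, planes):
    """Coefficients as known after only the `planes` most significant bit planes."""
    coeffs = np.asarray(coeffs, dtype=np.int64)
    n_max = initial_bitplane(coeffs)
    dropped = max(n_max + 1 - planes, 0)  # low-order bit planes not yet received
    mags = (np.abs(coeffs) >> dropped) << dropped
    return np.sign(coeffs) * mags
```

Only comparisons and bit shifts are used, in line with the remark that no complex arithmetic is needed. Decoding every bit plane returns the integer coefficients unchanged.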

4. New scheme for audio signal Compression

Lossless audio coding is typically achieved by linear prediction of samples in the time domain, which de-correlates the highly correlated time domain samples and reduces the signal energy that must be coded. Transform coding takes advantage of the more harmonic nature of the audio signal. Application of the DFRCT to a signal results in highly de-correlated coefficients in a certain a-th domain. The least valued coefficients are insignificant and hence are ignored. It is found that [...] representation requires the least number of nonzero significant coefficients. Since the magnitude of the reconstruction error using these coefficients is found to be not more than 0.06, the compression may be termed lossless compression [4].

The audio signal is divided into frames of equal length, each of 23.2 ms duration for a sampling rate of 44.1 kHz. The frame size so chosen reduces the loss of temporal localization as well as the side information, in terms of the number of frames and hence the total number of significant coefficients. The value of a for the compact domain is optimized for a typical frame of the audio signal, selected around the maximum sample value, by minimizing the Percent Root mean square Difference (PRD). PRD is given by

$$\mathrm{PRD} = \sqrt{\frac{\sum_{n=0}^{N-1} \left[x(n) - y(n)\right]^2}{\sum_{n=0}^{N-1} x(n)^2}} \times 100, \tag{10}$$

with $x(n)$ being the original signal and $y(n)$ the reconstructed signal.

The value of a for the compact domain is optimized by repeated computation of the DFRCT, IDFRCT and PRD for different values of a from -2 to 2, until the difference between two successive PRDs is insignificant. For simplicity in computation, the value of a optimized for one of the frames of the audio signal is used for the entire signal.

The optimum values of a for the SQAM signals are listed in Table 1. The transform coefficients are quantized by scaling up and rounding off to integers. A bit stream is formed for each frame by applying the SPIHT encoder to the quantized coefficients. All the bits of the coefficients are considered in encoding and decoding so that no error is introduced by SPIHT. The corresponding inverse transform is employed to reconstruct the audio signal.

Table 1: List of optimum a for SQAM signals.

Test Signal   Optimum a     Test Signal   Optimum a
X1            1.2           X9            0.2
X2            0.4           X10           0.8
X3            1.2           X11           0.4
X4            0.9           X12           1.1
X5            1.9           X13           0.4
X6            1.9           X14           0.3
X7            1.4           X15           0.7
X8            1.7           X16           0.8

5. Results

Figure 1 shows a segment of 250 samples from SQAM signal X1 and the reconstruction error signal, including one of the samples with maximum error, when the DFRCT is employed. The dynamic range of the error signal is small. The magnitude frequency spectra of the original signal X1 and the reconstruction error are shown in Figure 2. Similar results are observed for all the SQAM signals.

Some of the important performance measures of signal compression based on the DFRCT obtained for SQAM signals are listed in Table 2. A segment of 10000 samples is considered for each signal. CR ranges from 2.19 to 10.54 and the bit rate varies from 65.39 kbps to 314.99 kbps. SNR ranges from 25.22 to 43.62 dB. Since the mean square error (MSE) is of the order of $10^{-6}$, there is no need to code the error signal.

Figure 1: 250 samples of original and error signal. (Plot of amplitude versus time in seconds, showing the original signal and the error signal.)
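The order search of Section 4 can be sketched in Python as follows. This is a minimal illustration under several assumptions that the paper does not state: the order a is related to the angle by alpha = a*pi/2 (the usual FRFT convention), equal sampling intervals are used so that equation (7) holds, the weight 1/sqrt(2) is applied at both end indices as in the orthonormal DCT-I, "ignoring" insignificant coefficients is modelled by keeping the `keep` largest magnitudes, the real part of the inverse transform is taken as the reconstruction, and a plain grid search replaces the stopping rule on successive PRDs. All function names are hypothetical.

```python
import numpy as np


def dfrct_kernel(N, a):
    """(N+1) x (N+1) normalized kernel of equation (9), assuming alpha = a*pi/2."""
    alpha = a * np.pi / 2
    s = np.sin(alpha)
    if np.isclose(s, 0.0):
        raise ValueError("kernel is undefined where sin(alpha) = 0 (a = 0, +-2)")
    cot = np.cos(alpha) / s
    step = np.sqrt(np.pi * abs(s) / N)  # assumed dt = du, which satisfies eq. (7)
    idx = np.arange(N + 1)
    k = np.ones(N + 1)
    k[[0, -1]] = 1 / np.sqrt(2)  # assumed DCT-I end-point weights
    chirp = k * np.exp(0.5j * cot * (idx * step) ** 2)
    scale = np.sqrt(2 * (1 - 1j * cot) * abs(s) / N)
    return scale * np.outer(chirp, chirp) * np.cos(np.pi * np.outer(idx, idx) / N)


def prd(x, y):
    """Percent root-mean-square difference of equation (10)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(100.0 * np.sqrt(np.sum((x - y) ** 2) / np.sum(x ** 2)))


def reconstruct(frame, a, keep):
    """Transform, zero all but the `keep` largest coefficients, invert via eq. (4)."""
    frame = np.asarray(frame, dtype=float)
    if not 1 <= keep <= frame.size:
        raise ValueError("keep must be between 1 and the frame length")
    F = dfrct_kernel(frame.size - 1, a)
    Y = F @ frame
    order = np.argsort(np.abs(Y))            # ascending magnitude
    Y[order[: frame.size - keep]] = 0        # discard the smallest coefficients
    return (F.conj().T @ Y).real


def best_order(frame, keep, orders=None):
    """Grid search for the order a with minimum PRD; returns (a, PRD)."""
    if orders is None:
        orders = [a for a in np.round(np.arange(-19, 20) / 10, 1) if a != 0]
    score, a = min((prd(frame, reconstruct(frame, a, keep)), a) for a in orders)
    return float(a), float(score)
```

A 23.2 ms frame at 44.1 kHz is about 1024 samples, so a call such as `best_order(frame, keep=128)` on one frame would give a candidate a for the whole signal; the value 128 is only a placeholder. With all coefficients kept, the PRD is zero for every order because the kernel is unitary, so the search is only meaningful once coefficients are discarded or quantized.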


Figure 2: Magnitude spectrum of original and error signal.

Table 2: Performance measures on SQAM signals.

Test Signal   SNR (dB)   CR      Bit rate (kbps)   MSE (x 10^-6)
X1            34.24      3.22    214.00            4.73
X2            43.62      2.36    291.85            5.55
X3            30.91      3.40    202.65            4.59
X4            31.32      2.73    252.07            6.77
X5            25.22      2.19    314.99            7.09
X6            35.72      10.54   65.39             1.74
X7            31.13      3.86    178.17            4.11
X8            31.72      6.64    103.85            2.62
X9            34.81      2.97    232.32            5.57
X10           31.11      2.23    308.70            6.83
X11           34.15      2.72    252.96            5.42
X12           33.91      3.45    199.95            5.25
X13           37.51      2.42    285.07            6.42
X14           34.52      2.63    262.01            5.91
X15           29.96      4.55    151.88            2.83
X16           31.42      7.42    92.84             2.04

6. Conclusion

Perceptually lossless compression of SQAM signals is demonstrated. As compression is achieved by both the DFRCT and SPIHT, an algorithm combining the DFRCT with SPIHT coding is a promising compression technique for obtaining a low bit rate and a high compression ratio. In spite of the complex transform coefficients, a high compression ratio is achieved with insignificant error. As there is no perceivable distortion in the reconstructed signals, there is no need to code the reconstruction error signal. Thus the scheme simplifies the coding and decoding procedures and reduces the time they take. The performance of the codec can be improved by evaluating the compact domain separately for each frame of the signal.

References:

1. H M Ozaktas, Zeev Zalevsky and M Alper Kutay, 'Fractional Fourier Transform With Applications In Optics And Signal Processing', John Wiley & Sons, Ltd., Chapters 4, 6 and 10.
2. Raghuveer M Rao and Ajit S Bopardikar, 'Wavelet Transform - Introduction To Theory And Applications', Addison Wesley, Chapter 5.
3. L Cohen, 'Time Frequency Distributions - A Review', Proc. of IEEE, vol. 77, no. 7, pp. 941-981, July 1989.
4. Mohammed Raad and Alfred Mertins, 'From Lossy To Lossless Audio Coding Using SPIHT', Proc. of 5th Int. Conf. on Digital Audio Effects, DAFX, pp. 245-250, Sept. 2002.
5. Bryan E Usevitch, 'A Tutorial On Modern Lossy Wavelet Image Compression: Foundations of JPEG 2000', IEEE SP Magazine, pp. 22-35, Sept. 2001.
6. F Hlawatsch and G F Boudreaux Bartels, 'Linear And Quadratic Time-Frequency Signal Representation', IEEE SP Magazine, pp. 21-67, Apr. 1992.
7. L B Almeida, 'The Fractional Fourier Transform and Time Frequency Representation', IEEE Trans. SP, vol. 42, pp. 3084-3091, Nov. 1994.
8. Mat Hans and Ronald W Schafer, 'Lossless Compression of Digital Audio', IEEE SP Magazine, pp. 21-32, July 2001.
9. MPEG web site at http://www.tnt.unihannover.de/project/mpeg/audio
10. Soo Chang Pei and Min-Hung Yeh, 'The Discrete Fractional Cosine and Sine Transforms', IEEE Trans. SP, vol. 49, no. 6, pp. 1198-1207, June 2001.
11. Vivek Goyal, 'Theoretical Foundations of Transform Coding', IEEE SP Magazine, pp. 9-21, Sept. 2001.
12. Zhongde Wang, 'Fast algorithms for the discrete W Transform and for the Discrete Fourier Transform', IEEE Trans. ASSP, vol. ASSP-32, no. 4, pp. 803-816, Aug. 1984.
13. Vijaya C, J. S. Bhat, 'Signal compression based on DFRFT and SPIHT', Proc. of National Conf. on Electronic Circuits and Communication Systems, TIET, Patiala, pp. 187-190, Sept. 2004.
