Speech Compression Using Discrete Wavelet Transform and Discrete Cosine Transform
Total Page:16
File Type:pdf, Size:1020Kb
International Journal of Engineering Research & Technology (IJERT) ISSN: 2278-0181 Vol. 1 Issue 5, July - 2012 Speech Compression Using Discrete Wavelet Transform and Discrete Cosine Transform Smita Vatsa, Dr. O. P. Sahu M. Tech (ECE Student ) Professor Department of Electronicsand Department of Electronics and Communication Engineering Communication Engineering NIT Kurukshetra, India NIT Kurukshetra, India Abstract reproduced with desired level of quality. Main approaches of speech compression used today are Aim of this paper is to explain and implement waveform coding, transform coding and parametric transform based speech compression techniques. coding. Waveform coding attempts to reproduce Transform coding is based on compressing signal input signal waveform at the output. In transform by removing redundancies present in it. Speech coding at the beginning of procedure signal is compression (coding) is a technique to transform transformed into frequency domain, afterwards speech signal into compact format such that speech only dominant spectral features of signal are signal can be transmitted and stored with reduced maintained. In parametric coding signals are bandwidth and storage space respectively represented through a small set of parameters that .Objective of speech compression is to enhance can describe it accurately. Parametric coders transmission and storage capacity. In this paper attempts to produce a signal that sounds like Discrete wavelet transform and Discrete cosine original speechwhether or not time waveform transform based speech compression techniques resembles the original. are implemented with Run length encoding, By removing redundancy between Huffman encoding and Run length encoding neighbouring samples signal can be compressed. In followed by Huffman encoding. Reconstructed this paper we have implemented compression speech obtained from implementation are technique in two step, in 1st step a transform compared on the basis of Compression Factor(CF), function is applied on speech signal to get result Signal to noise ratio (SNR), Peak signal to noise with a new set of data with smaller values and more ratio (PSNR), Normalized root mean square error repetition, 2nd step is coding(compression) step, this (NRMSE), Retained signal energy (RSE). Results step willrepresent the data set in its minimal form shows that Discrete wavelet transform with RUN by using encoding techniques such as Run Length length encoding followed by Huffman encoding encoding, Huffman encoding, Run lengthencoding gives higher compression with intelligible followed by Huffman encoding. Performance reconstructed speech. measurescompression factor(CF),signal to noise ratio(SNR),peak signal to noise 1.Introduction ratio(PSNR),normalized root mean square error(NRMSE), retained signal energy (RSE) is Objective of speech is communication measured for reconstructed speech obtained from whether face to face or cell phone to cell phone. A DWT and DCT based speech compression huge amount of data is a big issue for transmission techniques. A comparative analysis of performance or storage. Speech compression is the technology of transform technique based speech compression of converting human speech into an efficiently systems for 3 different encoding methods is done. encoded representation that can later be decoded to In this paper a brief introduction of speech produce a close approximation of the original compression technique is presented in section 2, signal. Major objective of speech compression is to section 2 discusses Transform techniques, in represent speech with less or few number of bits section 3 implementation of speech compression with level of quality. Speech coding means based different transform technique with several representing speech signal in an efficient way for encoding methods is explained, section 4 presents transmission and storage such a way it can be simulation results obtained from MATLAB 7.12.0, followed by concluding remarks in section 5. www.ijert.org 1 International Journal of Engineering Research & Technology (IJERT) ISSN: 2278-0181 Vol. 1 Issue 5, July - 2012 2. Overview of transform techniques 2.2 Discrete cosine transform In this paper DWT and DCT transforms are We can use DCT for speech compression exploited for speech compression. because of high correlation in adjacent coefficient. Energy compaction property of DCT is good, we 2.1. Discrete wavelet transform can often reconstruct a sequence very accurately from only a few DCT coefficients, this property of A wavelet is a waveform of effectively limited DCT is very useful for applications requiring data duration that has an average value zero. reduction[7]. Fundamental idea behind wavelet is to analyse Discrete cosine transform of 1-D sequence x(n) of according to scale[4]. Wavelets are functions that length N is satisfy certain mathematical requirements and are 1/2 2 N1 (2nm 1) used in representing data or other functions.wavelet X( m ) c x ( n )cos functions are localized in space. In contrast Fourier m NN m0 2 sine and cosine functions are non-local and are active for all time t. This localization feature, along Where m=0,1,…..,N-1. with wavelets localization of frequency, makes The inverse discrete cosine transform is defined as many functions and operators using wavelets 1/2 N1 “sparse”, when transformed into the wavelet 2 (2nm 1) x( n ) c X ( m )cos domain. This sparseness, in turn results in a number m NN m0 2 of useful applications such as data compression, detecting features in images and de -noising In both equations c is defined as signals. The Discrete Wavelet Transform (DWT) m C = (1/2)1/2 for m=0 involves choosing scales and positions based on m 1 for m≠0 powers of two , so called dyadic scales and positions. The mother wavelet is rescaled or dilated, by powers of two and translated by 3. METHODOLOGY FOR COMPRESSION OF integers.Specifically, a function f(t) ∈ L2(R) (defines space of square integrable functions) can SPEECH SIGNAL be represented as[9]. In this paper we are implementing speech compression technique based on transform method L jLi.e. DCT and DWT. When we apply wavelet ft() djk (,)(2 tk ) aLK (,)(2 tk transform ) technique on speech signal original j1 k k Signal can be represented in terms of a wavelet The function ψ(t) is known as the mother wavelet, expansion (using coefficients in a linear while φ(t) is known as the scaling function. The set combination of wavelet function),similarly in case of functions of DCT transform speech can be represented in terms of DCT coefficient. Thus data operation can 2L (2 Lt k );2 j (2 j t k )| j L ,,, j k L Zbe performed using just the corresponding DWT and DCT coefficients. Transform techniques and where Z is the set of integers, is an orthonormal thresholding does not actually compress a signal, it basis for L2(R). simply provides information about the signal, The numbers a(L,k) are known as the which allows the data to be compressed by standard approximation coefficients at scale L, while d(j,k) encoding techniques. are known as the detail coefficients at scale j. Speech compression is achieved by neglecting The approximation and detail coefficients can be small coefficients as insignificant data and expressed as: discarding them and then applying quantization 1 and encoding scheme on coefficients. Speech a( L , k ) f ( t ) (2L t k ) dt compression algorithm is performed in following L 2 steps: (1). Transform technique on speech signal 1 j (2). Thresholding of transformed coefficient. d( j , k ) f ( t ) (2 t k ) dt 2 j (3). Quantization (4). Encoding wavelets with a high number of vanishing By following above 4 steps we will get a moments lead to a more compact signal compressed speech signal. For reconstruction of representation and are hence useful in coding speech signal inverse of above processes are applications[9]. performed i.e. decoding, de-quantization, inverse www.ijert.org 2 International Journal of Engineering Research & Technology (IJERT) ISSN: 2278-0181 Vol. 1 Issue 5, July - 2012 transform. 4 different speech signal samples are selected. Step size ∆ is obtained using above 3 used for data base. Data base used are in .wav parameters. format, sampled at 8khz. ∆=(mmax-mmin)/L Explanation for compression algorithm: Input is divided into L+1 levels with equal interval 3.1. Transform technique on speech signal size ranging from m to m to create the min max DCT and DWT transform methods are used on quantization table. Coefficient values are quantized speech signal. We can reconstruct a sequence very to integer values[1]. accurately from only a few DCT coefficient, this property of DCT transform is used for data compression. Localization feature of wavelet along with time-frequency resolution property makes Speech signal them well suited for speech coding. Sparse coding of wavelet is utilized for compression application. The idea behind signal compression using wavelets is primarily linked to the relative scarceness of the Transform Reconstructed speech signal wavelet domain representation for the signal. techniques Wavelets concentrate speech information (energy and perception) into a few neighbouring Thresholding of Inverse transform coefficients [8]. coefficients Therefore as a result of taking the wavelet transform of a signal, many coefficients will either be zero or have negligible magnitudes. Data Quantization De-quantization compression is then achieved by treating small valued coefficients as insignificant data and thus discarding them. Compressed Encoding Decoding data 3.2. Thresholding After getting coefficient signal from different Fig. 1. Block diagram of compression technique transform methods thresholding is applied to signal. In case of DCT as we mentioned earlier that 3.4. Encoding very few DCT coefficient represent 99% of signal In this paper we are using 3 different approaches energy, hence threshold based on above mentioned property is calculated and applied to coefficient for encoding of quantized coefficients: obtained from DCT transform, coefficients of which values are less than thresholds value are 1) Run length encoding: Coefficients obtained truncated ie.