DCT and Transform Coding


DCT and Transform Coding
Yao Wang, Polytechnic University, Brooklyn, NY 11201
http://eeweb.poly.edu
©Yao Wang, 2006, EE3414: DCT and Transform Coding

Outline
• Transform coding
  – General principle
• DCT
  – Definition, basis images
  – Energy distribution
  – Approximation with different numbers of basis images
  – Quantization of DCT coefficients

A Typical Compression System
Input samples → Transformation → Transformed parameters → Quantization → Quantized parameters → Binary encoding → Binary bitstream
• Transformation: prediction, transforms, model fitting
• Quantization: scalar Q, vector Q
• Binary encoding: fixed length, variable length (Huffman, arithmetic, LZW)

Motivation for "Transformation"
• To yield a more efficient representation of the original samples: the transformed parameters should require fewer bits to code.
• Types of transformation used:
  – For speech coding: prediction (code the predictor and the prediction-error samples)
  – For audio coding: subband decomposition (code the subband samples)
  – For image coding: DCT and wavelet transforms (code the DCT/wavelet coefficients)

Transform Coding
• Represent an image as a linear combination of basis images and specify the linear coefficients t_1, t_2, t_3, t_4, ...

A Typical Transform Coder
Encoder: input samples → forward transform → transform coefficients → quantizer → coefficient indices → run-length and binary coding → coded bitstream → channel
Decoder: channel → run-length and binary decoding → inverse quantizer → quantized coefficients → inverse transform → output samples

Transform Basis Design
• Optimality criteria:
  – Energy compaction: a few basis images should be sufficient to represent a typical image.
  – Decorrelation: the coefficients for separate basis images should be uncorrelated.
• The Karhunen-Loeve Transform (KLT) is the optimal transform for a given covariance matrix of the underlying signal.
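The encoder/decoder chain above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the slides: the sample values and the quantizer step size (0.5) are made-up numbers, and entropy coding is omitted. Because the DCT matrix is orthonormal, the reconstruction error in the sample domain equals the quantization error in the coefficient domain (Parseval).

```python
import numpy as np

def dct_matrix(N):
    # Rows are the 1D DCT basis vectors u_{k,n} = alpha(k) cos(pi*k*(2n+1)/(2N)).
    n = np.arange(N)
    A = np.cos(np.pi * np.outer(np.arange(N), 2 * n + 1) / (2 * N))
    A[0] *= np.sqrt(1.0 / N)   # alpha(0) = sqrt(1/N)
    A[1:] *= np.sqrt(2.0 / N)  # alpha(k) = sqrt(2/N), k >= 1
    return A

A = dct_matrix(8)
s = np.array([12.0, 14.0, 15.0, 15.0, 14.0, 12.0, 11.0, 10.0])  # made-up samples

# Encoder: forward transform, then uniform scalar quantization of the coefficients.
t = A @ s
step = 0.5                         # assumed quantizer step size
t_hat = step * np.round(t / step)  # quantized coefficients

# Decoder: inverse transform (A is orthonormal, so the inverse is A^T).
s_hat = A.T @ t_hat

# The two norms match (Parseval): quantization error passes through unchanged.
print(np.linalg.norm(s - s_hat), np.linalg.norm(t - t_hat))
```

The quantizer is the only lossy step: the transform itself is invertible, and its job is just to concentrate the signal energy so that most quantized coefficients become zero and cheap to code.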
• The Discrete Cosine Transform (DCT) is close to the KLT for images that can be modeled by a first-order Markov process (i.e., a pixel depends only on its previous pixel).

1D Unitary Transform
Consider the N-point signal s(n) as an N-dimensional vector s = [s_0, s_1, ..., s_{N-1}]^T.
The inverse transform says that s can be represented as the sum of N basis vectors:
  s = t_0 u_0 + t_1 u_1 + ... + t_{N-1} u_{N-1},
where u_k = [u_{k,0}, u_{k,1}, ..., u_{k,N-1}]^T is the k-th transform kernel.
The forward transform says that the expansion coefficient t_k is the inner product of s with u_k:
  t_k = (u_k, s) = \sum_{n=0}^{N-1} u_{k,n}^* s_n.

1D Discrete Cosine Transform
• Can be considered a "real" version of the DFT: its basis vectors contain only cosinusoidal patterns.
DFT: u_{k,n} = \frac{1}{\sqrt{N}} \exp\left(j \frac{2\pi k n}{N}\right) = \frac{1}{\sqrt{N}} \left[\cos\frac{2\pi k n}{N} + j \sin\frac{2\pi k n}{N}\right]
DCT: u_{k,n} = \alpha(k) \cos\frac{\pi k (2n+1)}{2N}, with \alpha(0) = \sqrt{1/N} and \alpha(k) = \sqrt{2/N} for k = 1, 2, ..., N-1.

Example: 4-point DCT
Using u_{k,n} = \alpha(k) \cos(k\pi(2n+1)/8) with \alpha(0) = \sqrt{1/4} = 1/2 and \alpha(k) = \sqrt{2/4} = 1/\sqrt{2} for k ≠ 0, the 1D DCT basis vectors are:
  u_0 = (1/2) [1, 1, 1, 1]^T
  u_1 = (1/\sqrt{2}) [\cos(\pi/8), \cos(3\pi/8), \cos(5\pi/8), \cos(7\pi/8)]^T = (1/\sqrt{2}) [0.9239, 0.3827, -0.3827, -0.9239]^T
  u_2 = (1/\sqrt{2}) [\cos(\pi/4), \cos(3\pi/4), \cos(5\pi/4), \cos(7\pi/4)]^T = (1/2) [1, -1, -1, 1]^T
  u_3 = (1/\sqrt{2}) [\cos(3\pi/8), \cos(9\pi/8), \cos(15\pi/8), \cos(21\pi/8)]^T = (1/\sqrt{2}) [0.3827, -0.9239, 0.9239, -0.3827]^T
For s = [2, 4, 5, 3]^T, determine the transform coefficients t_k. Also determine the vectors reconstructed from all coefficients and from the two largest coefficients. (Worked through on the board.)

2D Separable Transform
Consider the M×N-point image S as an M×N array (matrix) with entries S_{m,n}, m = 0, ..., M-1, n = 0, ..., N-1.
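The board exercise above can be checked numerically. A minimal NumPy sketch that builds the 4-point basis, computes t_k for s = [2, 4, 5, 3]^T, and reconstructs from the two largest-magnitude coefficients:

```python
import numpy as np

# 1D DCT basis for N = 4: u_{k,n} = alpha(k) cos(k*pi*(2n+1)/8).
N = 4
n = np.arange(N)
A = np.cos(np.pi * np.outer(np.arange(N), 2 * n + 1) / (2 * N))
A[0] *= np.sqrt(1.0 / N)   # alpha(0) = 1/2
A[1:] *= np.sqrt(2.0 / N)  # alpha(k) = 1/sqrt(2)

s = np.array([2.0, 4.0, 5.0, 3.0])
t = A @ s                  # forward transform: t_k = (u_k, s)
print(np.round(t, 4))      # approximately [7, -0.9239, -2, 0.3827]

# Reconstruction from all coefficients recovers s exactly (A is unitary).
# Keeping only the two largest-magnitude coefficients, t_0 = 7 and t_2 = -2:
t2 = np.where(np.abs(t) >= 2, t, 0.0)
print(A.T @ t2)            # approximately [2.5, 4.5, 4.5, 2.5]
```

Note how the two retained coefficients already capture the coarse shape of s; the discarded t_1 and t_3 carry only the fine detail, which is the energy-compaction property in miniature.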
The inverse transform says that S can be represented as the sum of M×N basis images:
  S = T_{0,0} U_{0,0} + T_{0,1} U_{0,1} + ... + T_{M-1,N-1} U_{M-1,N-1},
where U_{k,l} is the (k,l)-th transform kernel, the outer product
  U_{k,l} = u_k u_l^T, i.e., U_{k,l;m,n} = u_{k,m} u_{l,n}.
The forward transform says that the expansion coefficient T_{k,l} is the inner product of S with U_{k,l}:
  T_{k,l} = (U_{k,l}, S) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} U_{k,l;m,n}^* S_{m,n}.

2D Discrete Cosine Transform
• Basis image = outer product of two 1D DCT basis vectors:
  U_{k,l;m,n} = \alpha(k)\alpha(l) \cos\frac{k\pi(2m+1)}{2M} \cos\frac{l\pi(2n+1)}{2N},
with \alpha(0) = \sqrt{1/N}, \alpha(k) = \sqrt{2/N}, k = 1, 2, ..., N-1 (and likewise with M for the row index).

Basis Images of 8x8 DCT
[Figure: the 64 basis images of the 8×8 DCT, ranging from Low-Low (top left) through High-Low and Low-High to High-High (bottom right).]

Example: 4x4 DCT
Using the 4-point DCT basis vectors u_0, ..., u_3 from the previous example, U_{k,l} = u_k u_l^T yields, e.g.:
  U_{0,0} = (1/4) [ 1  1  1  1;  1  1  1  1;  1  1  1  1;  1  1  1  1]
  U_{0,2} = (1/4) [ 1 -1 -1  1;  1 -1 -1  1;  1 -1 -1  1;  1 -1 -1  1]
  U_{2,0} = (1/4) [ 1  1  1  1; -1 -1 -1 -1; -1 -1 -1 -1;  1  1  1  1]
  U_{2,2} = (1/4) [ 1 -1 -1  1; -1  1  1 -1; -1  1  1 -1;  1 -1 -1  1]
(and similarly for the remaining U_{k,l})
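The basis images above can be generated directly as outer products. A small NumPy sketch (not from the slides) showing how U_{0,0} and U_{2,2} reduce to the ±1/4 patterns listed, and that the basis images are orthonormal:

```python
import numpy as np

# 1D DCT basis for N = 4; row k of A is the basis vector u_k.
N = 4
A = np.cos(np.pi * np.outer(np.arange(N), 2 * np.arange(N) + 1) / (2 * N))
A[0] *= np.sqrt(1.0 / N)
A[1:] *= np.sqrt(2.0 / N)

def basis_image(k, l):
    # U_{k,l} = u_k u_l^T: outer product of two 1D basis vectors.
    return np.outer(A[k], A[l])

U22 = basis_image(2, 2)
print(np.round(4 * U22).astype(int))  # the +/-1 checkerboard pattern of U_{2,2}

# Basis images form an orthonormal set: <U_{k,l}, U_{k',l'}> = delta_{kk'} delta_{ll'}.
print(np.sum(basis_image(0, 2) * basis_image(0, 2)))  # 1.0 up to rounding
print(np.sum(basis_image(0, 2) * basis_image(2, 0)))  # 0.0 up to rounding
```

Separability is what makes the 2D DCT cheap: the M×N basis images never need to be stored, since everything reduces to products of the two 1D bases.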
For S = [1 2 2 0; 0 1 3 1; 0 1 2 1; 1 2 2 -1], compute T_{k,l}.
Using T_{k,l} = (U_{k,l}, S) yields, e.g.,
  T_{0,0} = (U_{0,0}, S) = (1/4)[(1+2+2+0) + (0+1+3+1) + (0+1+2+1) + (1+2+2-1)] = 18/4 = 4.5
  T_{2,2} = (U_{2,2}, S) = (1/4)[(1-2-2+0) + (-0+1+3-1) + (-0+1+2-1) + (1-2-2-1)] = -2/4 = -0.5
Completing the calculation with Matlab ("dct2(S)") yields:
  T = [  4.5     -0.0793  -3       1.1152
         0.4619  -0.5      0.1913  0
         0        2.0391  -0.5    -0.3034
        -0.1913   0        0.4619 -0.5   ]
Reconstruction from the top 2x2 coefficients only yields:
  S_hat = [ 1.09  1.54  1.38  0.86
            1.44  1.90  1.63  0.95
            1.11  1.38  1.63  0.95
            0.61  0.68  0.31  0.05 ]
The reconstructed image varies more slowly than the original, because we cut off the high-frequency coefficients.

DCT on a Real Image Block
>> imblock = lena256(128:135, 128:135)
imblock =
   182   196   199   201   203   201   199   173
   175   180   176   142   148   152   148   120
   148   118   123   115   114   107   108   107
   115   110   110   112   105   109   101   100
   104   106   106   102   104    95    98   105
    99   115   131   104   118    86    87   133
   112   154   154   107   140    97    88   151
   145   158   178   123   132   140   138   133
>> dctblock = dct2(imblock)
dctblock = 1.0e+003 *
    1.0550    0.0517    0.0012   -0.0246   -0.0120   -0.0258    0.0120    0.0232
    0.1136    0.0070   -0.0139    0.0432   -0.0061    0.0356   -0.0134   -0.0130
    0.1956    0.0101   -0.0087   -0.0029   -0.0290   -0.0079    0.0009    0.0096
    0.0359   -0.0243   -0.0156   -0.0208    0.0116   -0.0191   -0.0085    0.0005
    0.0407   -0.0206   -0.0137    0.0171   -0.0143    0.0224   -0.0049   -0.0114
    0.0072   -0.0136   -0.0076   -0.0119    0.0183   -0.0163   -0.0014   -0.0035
   -0.0015   -0.0133   -0.0009    0.0013    0.0104    0.0161    0.0044    0.0011
   -0.0068   -0.0028    0.0041    0.0011    0.0106   -0.0027   -0.0032    0.0016
The low-frequency coefficients (top left corner) are much larger than the rest!
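Because the 2D DCT is separable, the whole coefficient matrix can be computed at once as T = A S A^T, where the rows of A are the 1D basis vectors; this matrix product is mathematically equivalent to Matlab's dct2. A NumPy sketch reproducing the 4x4 example:

```python
import numpy as np

# 1D DCT basis for N = 4; rows of A are u_0 ... u_3.
N = 4
A = np.cos(np.pi * np.outer(np.arange(N), 2 * np.arange(N) + 1) / (2 * N))
A[0] *= np.sqrt(1.0 / N)
A[1:] *= np.sqrt(2.0 / N)

S = np.array([[1, 2, 2, 0],
              [0, 1, 3, 1],
              [0, 1, 2, 1],
              [1, 2, 2, -1]], dtype=float)

# Separable 2D DCT: T = A S A^T (equivalent to dct2(S) in Matlab).
T = A @ S @ A.T
print(np.round(T, 4))  # T[0,0] = 4.5, T[0,2] = -3, T[2,2] = -0.5, ...

# Keep only the top-left 2x2 coefficients and invert (S = A^T T A):
T_low = np.zeros_like(T)
T_low[:2, :2] = T[:2, :2]
S_hat = A.T @ T_low @ A  # varies more slowly than S
```

Two matrix multiplications replace the M·N inner products of the direct formula, which is why practical coders always exploit separability.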
Reconstructed Block
Original block: the 8x8 imblock shown above.
Reconstructed using the top 4x4 coefficients:
   190   192   195   197   196   189   179   172
   175   169   163   163   164   160   150   141
   147   136   124   120   122   121   114   106
   115   110   103    98    96    94    93    92
    96   104   109   104    93    89    96   104
   102   117   130   123   106    98   109   124
   126   139   148   138   119   111   121   136
   148   155   156   144   125   118   126   138
Reconstructed using the top 2x2 coefficients only:
   162   161   158   154   149   146   143   141
   159   157   154   151   147   143   140   138
   153   151   149   145   141   137   135   133
   145   144   141   138   134   131   128   126
   137   135   133   130   126   123   121   119
   129   128   125   122   119   116   114   113
   123   122   119   117   114   111   109   108
   119   118   116   114   111   108   106   105
[Figure: the original block and its reconstructions from the top 4x4 and 2x2 coefficients, displayed as images.]
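The trade-off illustrated above, where more retained coefficients give a better reconstruction, can be checked numerically. An illustrative NumPy sketch (the 8x8 block values are the ones from the slides) that zeroes all but the top-left k×k coefficients and measures the RMS reconstruction error:

```python
import numpy as np

# 8-point DCT basis; rows of A are the 1D basis vectors.
N = 8
A = np.cos(np.pi * np.outer(np.arange(N), 2 * np.arange(N) + 1) / (2 * N))
A[0] *= np.sqrt(1.0 / N)
A[1:] *= np.sqrt(2.0 / N)

block = np.array([
    [182, 196, 199, 201, 203, 201, 199, 173],
    [175, 180, 176, 142, 148, 152, 148, 120],
    [148, 118, 123, 115, 114, 107, 108, 107],
    [115, 110, 110, 112, 105, 109, 101, 100],
    [104, 106, 106, 102, 104,  95,  98, 105],
    [ 99, 115, 131, 104, 118,  86,  87, 133],
    [112, 154, 154, 107, 140,  97,  88, 151],
    [145, 158, 178, 123, 132, 140, 138, 133]], dtype=float)

T = A @ block @ A.T  # 2D DCT; T[0,0] is about 1055, matching dctblock above

def rmse_keeping(k):
    # Reconstruct from the top-left k x k coefficients only.
    T_k = np.zeros_like(T)
    T_k[:k, :k] = T[:k, :k]
    return np.sqrt(np.mean((block - A.T @ T_k @ A) ** 2))

for k in (1, 2, 4, 8):
    print(k, round(rmse_keeping(k), 2))  # error shrinks as k grows; k = 8 is exact
```

The error decreases monotonically because each larger k keeps a superset of the coefficients, and it reaches zero at k = 8 since the transform is invertible; how fast it decays is exactly the energy-compaction property the slides advertise.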