
Beyond Traditional Transform Coding

by

Vivek K Goyal

B.S. (University of Iowa) 1993
B.S.E. (University of Iowa) 1993
M.S. (University of California, Berkeley) 1995

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy

in

Engineering—Electrical Engineering and Computer Sciences

in the

GRADUATE DIVISION of the UNIVERSITY OF CALIFORNIA, BERKELEY

Committee in charge:

Professor Martin Vetterli, Chair
Professor Venkat Anantharam
Professor Bin Yu

Fall 1998

Beyond Traditional Transform Coding

Copyright © 1998 by Vivek K Goyal

Printed with minor corrections 1999.

Abstract

Beyond Traditional Transform Coding

by

Vivek K Goyal

Doctor of Philosophy in Engineering—Electrical Engineering and Computer Sciences

University of California, Berkeley

Professor Martin Vetterli, Chair

Since its inception in 1956, transform coding has become the most successful and pervasive technique for compression of audio, images, and video. In conventional transform coding, the original signal is mapped to an intermediary by a linear transform; the final compressed form is produced by scalar quantization of the intermediary and entropy coding. The transform is much more than a conceptual aid; it makes computations with large amounts of data feasible. This thesis extends the traditional theory toward several goals: improved compression of nonstationary or non-Gaussian sources with unknown parameters; robust joint source–channel coding for erasure channels; and computational complexity reduction or optimization.

The first contribution of the thesis is an exploration of the use of frames, which are overcomplete sets of vectors in Hilbert spaces, to form efficient signal representations. Linear transforms based on frames give representations with robustness to random additive noise and quantization, but with poor rate–distortion characteristics. Nonlinear, signal-adaptive representations can be produced with frames using a greedy algorithm called matching pursuit. Matching pursuit is a computationally feasible alternative to producing the most sparse approximate representation with respect to a frame. It exhibits good compression performance at low rates even with rather arbitrary frames. Optimal reconstruction is described for both linear and nonlinear frame-based representations.

Within the conventional setting of basis representations, the Karhunen–Loève transform (KLT) is known to solve a variety of problems, including giving optimal compression of Gaussian sources. Its dependence on the probability density of the source, which is generally unknown, limits its application. Including a description of an estimated KLT in the coded data may create significant overhead. In a universal system the overhead is asymptotically negligible. This thesis introduces a method for universal transform coding of stationary Gaussian sources which utilizes backward adaptation. In a backward adaptive system, all parameter updates depend only on data already available at the decoder; hence, no side information is needed to describe the adaptation. Related to this is the development of new transform adaptation techniques based on stochastic gradient descent. These are inspired by an analogy between FIR Wiener filtering and transform coding.

Transform coding is normally used as a source coding method; as such, it is optimized for a noiseless communication channel. Providing a noiseless or nearly noiseless communication channel usually requires channel coding or a retransmission protocol. Rather than using a system with separate source and channel coding, better overall performance with respect to rate and distortion, with a bound on delay, can often be achieved with joint source–channel coding. This thesis introduces two computationally simple joint source–channel coding methods for erasure channels. Source coding for an erasure channel is analogous to multiple description coding; it is in this context that these techniques are presented. The first technique uses a square transform to produce transform coefficients that are correlated so lost coefficients can be estimated. This technique uses discrete transforms that are also shown to be useful in reducing the complexity of entropy coding.
Analyzed with fine quantization approximations and ideal entropy coding, the rate allocated to channel coding is continuously adjustable with fixed block length. The second technique uses a quantized frame expansion. This is similar to scalar quantization followed by a block channel code, except that quantization and addition of redundancy are swapped. In certain circumstances—marked especially by block length constraints and a lack of knowledge of the channel state—the end-to-end performance significantly exceeds that of a system with separate source and channel coding. The multiple description coding techniques both give graceful performance degradation in the face of random loss of transform coefficients.

The final topic of the thesis is the joint optimization of computational complexity and rate–distortion performance. The basic idea is that higher computational complexity is justified only by better performance; a framework for formalizing this conviction is presented. Sample analyses compare unstructured and harmonic transforms, and show that JPEG encoding complexity can be reduced with little loss in performance.

Contents

List of Figures vi

List of Tables viii

Acknowledgements ix

1 Introduction and Preview 1
1.1 Traditional Transform Coding ...... 1
1.1.1 Mathematical Communication ...... 2
1.1.2 Quantization ...... 5
1.1.3 Series Signal Expansions and Transform Coding ...... 11
1.1.4 Applications ...... 15
1.2 Thesis Themes ...... 16
1.2.1 Redundant Signal Expansion ...... 17
1.2.2 Adaptive Signal Expansion ...... 17
1.2.3 Computational Optimization ...... 18

2 Quantized Frame Expansions 19
2.1 Introduction ...... 19
2.2 Nonadaptive Expansions ...... 20
2.2.1 Frames ...... 20
2.2.2 Reconstruction from Frame Coefficients ...... 23
2.3 Adaptive Expansions ...... 28
2.3.1 Matching Pursuit ...... 29
2.3.2 Quantized Matching Pursuit ...... 31
2.3.3 Lossy Vector Coding with Quantized Matching Pursuit ...... 36
2.4 Conclusions ...... 43
2.A Proofs ...... 44
2.A.1 Proof of Theorem 2.1 ...... 44
2.A.2 Proof of Proposition 2.2 ...... 45
2.A.3 Proof of Proposition 2.5 ...... 45
2.B Frame Expansions and Hyperplane Wave Partitions ...... 47
2.C Recursive Consistent Estimation ...... 48
2.C.1 Introduction ...... 48
2.C.2 Proposed Algorithm and Convergence Properties ...... 49
2.C.3 A Numerical Example ...... 51
2.C.4 Implications for Source Coding and Decoding ...... 51
2.C.5 Final Comments ...... 52

3 On-line Universal Transform Coding 54
3.1 Introduction ...... 54
3.2 Proposed Coding Methods ...... 55
3.2.1 System with Subtractive Dither ...... 56
3.2.2 Undithered System ...... 57
3.3 Main Results ...... 57
3.3.1 System with Subtractive Dither ...... 57
3.3.2 Undithered System ...... 59
3.4 Derivations ...... 60
3.4.1 Proof of Theorem 3.1 ...... 60
3.4.2 Proof of Theorem 3.2 ...... 61
3.4.3 Proof of Theorem 3.3 ...... 62
3.4.4 Proof of Theorem 3.4 ...... 63
3.5 Variations on the Basic Algorithms ...... 66
3.6 Experimental Results ...... 66
3.6.1 Synthetic Sources ...... 67
3.6.2 Image Coding ...... 67
3.7 Conclusions ...... 70
3.A Parametric Methods for Adaptation ...... 71
3.B Convergence with Independence Assumption ...... 72
3.C Calculation of E[A_ij^(k) A_ij^(ℓ)] ...... 73

4 New Methods for Transform Adaptation 75
4.1 Introduction ...... 75
4.2 Problem Definition, Basic Strategy, and Outline ...... 76
4.3 Performance Criteria ...... 77
4.4 Methods for Performance Surface Search ...... 77
4.4.1 Parameterization of Transform Matrices ...... 78
4.4.2 Random Search ...... 79
4.4.3 Descent Methods ...... 81
4.4.4 Nonparametric Methods ...... 84
4.4.5 Comments and Comparisons ...... 86
4.5 Adaptive Transform Coding Update Methods ...... 86
4.5.1 Explicit Autocorrelation Estimation ...... 87
4.5.2 Stochastic Update ...... 89
4.5.3 Quantized Stochastic Implementation ...... 91
4.5.4 Specialization for a Scalar Source ...... 91
4.6 Conclusions ...... 94
4.A Brief Review of Adaptive FIR Wiener Filtering ...... 96
4.B Alternative Gradient Expressions ...... 97
4.C Evaluation of ∂A_mm^(k)/∂θ_ℓ ...... 97

5 Multiple Description Coding 99
5.1 Introduction ...... 100
5.1.1 Applicability to Packet Networks ...... 101
5.1.2 Historical Notes ...... 103
5.2 Survey of Multiple Description Coding ...... 108
5.2.1 Theoretical Bounds ...... 108
5.2.2 Practical Codes ...... 112
5.3 Statistical Channel Coding with Correlating Transforms ...... 116
5.3.1 Intuition ...... 117
5.3.2 Design ...... 118
5.3.3 Application to Image Coding ...... 134
5.3.4 Application to Audio Coding ...... 138
5.4 Signal-Domain Channel Coding with Frame Expansions ...... 148
5.4.1 Intuition ...... 149
5.4.2 Effect of Erasures in Tight Frame Representations ...... 150
5.4.3 Performance Analyses and Comparisons ...... 156
5.4.4 Application to Image Coding ...... 163
5.4.5 Discussion ...... 164
5.5 Conclusions ...... 166
5.A Pseudo-linear Discrete Transforms ...... 167
5.B Transform Coding with Discrete Transforms ...... 168
5.B.1 A Perspective on Transform Coding ...... 169
5.B.2 Rate Reduction ...... 170
5.B.3 Complexity Reduction ...... 171
5.B.4 Erasure Resilience ...... 174
5.C Proofs ...... 175
5.C.1 Proof of Theorem 5.6 ...... 175
5.C.2 Proof of Theorem 5.7 ...... 179
5.C.3 Proof of Theorem 5.8 ...... 180
5.C.4 Proof of Theorem 5.10 ...... 181

6 Computation-Optimized Source Coding 183
6.1 Introduction ...... 184
6.1.1 Performance versus Complexity in ...... 184
6.1.2 Complexity Measures ...... 186
6.2 An Abstract Framework ...... 189
6.3 Applications to Source Coding ...... 190
6.3.1 Autoregressive Sources: KLT vs. DCT ...... 190
6.3.2 JPEG Encoding with Approximate DCT ...... 194
6.3.3 Pruned Tree-Structured ...... 200
6.4 Conclusions ...... 201
6.A Application to Rootfinding ...... 202

7 Conclusions 203

Publications and Patents 207

Bibliography 210

List of Figures

1.1 Shannon's communication system abstraction [170] ...... 3
1.2 Separation of encoding into source and channel coding ...... 4
1.3 Optimal loading factors for Gaussian and Laplacian sources ...... 9
1.4 Various rate–distortions for a memoryless Gaussian source ...... 10
1.5 Substructure of a communication system with a transform coder ...... 12
1.6 Geometric effect of the Karhunen–Loève transform ...... 14
1.7 A perceptual audio coder ...... 15
1.8 A JPEG image coder ...... 16
1.9 A hybrid motion-compensated predictive DCT video coder ...... 16

2.1 Block diagram of reconstruction from quantized frame expansion ...... 20
2.2 Normalized frame bounds for random frames in R^4 ...... 23
2.3 Illustration of consistent reconstruction ...... 25
2.4 Experimental results for reconstruction from quantized frame expansions ...... 28
2.5 Comparison of energy compaction properties: Synthetic source ...... 31
2.6 Illustrations of consistency constraints (2.15) and (2.16) in R^2 and R^3 ...... 33
2.7 Probability that (2.13) gives an inconsistent reconstruction ...... 34
2.8 A partitioning of the first quadrant of R^2 by quantized matching pursuit ...... 37
2.9 QMP performance improvement with consistent reconstruction ...... 38
2.10 QMP performance improvement with a simple stopping criterion ...... 39
2.11 Performance of QMP as the dictionary size is varied ...... 40
2.12 Performance of QMP with an orthogonal basis dictionary: Synthetic source ...... 40
2.13 Energy compaction and rate–distortion for QMP coding of Barbara ...... 41
2.14 Compression of Barbara at 0.075 bits/pixel for AC coefficients ...... 42
2.15 One period of the signal used in the proof of Lemma 2.8 ...... 46
2.16 Examples of hyperplane wave partitions in R^2 ...... 48
2.17 Recursive algorithm compared to two other reconstruction algorithms ...... 52

3.1 Structure of universal transform coding system with subtractive dither ...... 56
3.2 Bound on the excess rate as a function of the coding rate ...... 58
3.3 Simulations suggesting unrestricted convergence of deterministic iteration ...... 60
3.4 "Next iterate map" for the deterministic iteration (N = 2) ...... 64
3.5 Convergence of undithered universal coder: synthetic source ...... 67
3.6 Neighborhood used for estimating correlations in image coding ...... 68
3.7 Image coding performance of universal transform coder ...... 69

4.1 Simulations of the random search algorithms ...... 80
4.2 Comparison of the gradient descent algorithms ...... 85
4.3 Simulations of the cyclic Jacobi algorithm ...... 86
4.4 Simulations of linear search with respect to J1 ...... 88
4.5 Simulations of stochastic gradient descent with respect to J1 ...... 90
4.6 Structural comparison between forward- and backward-adaptive systems ...... 92
4.7 Simulations of stochastic gradient descent in backward-adaptive configuration ...... 93
4.8 Canonical configuration for Wiener filtering ...... 96

5.1 Scenario for multiple description source coding ...... 100
5.2 Achievable rates for multiple description coding of a binary symmetric source ...... 106
5.3 Simple network considered by Gray and Wyner [86] ...... 108
5.4 Achievable central and side distortions for MD coding of a Gaussian source ...... 110
5.5 Achievable rates for multiple description coding of a Gaussian source ...... 111
5.6 Inner- and outer-bounds for MD coding of a non-Gaussian source ...... 113
5.7 Examples of multiple description scalar quantizers ...... 115
5.8 Performance of MDTC for two variables sent over two channels ...... 124
5.9 Geometric interpretations of optimality for 2 × 2 transforms ...... 125
5.10 Accuracy of the high redundancy approximation (5.48) ...... 129
5.11 Numerical verification of the optimality of nested pairing ...... 132
5.12 Cascade structure for MDTC ...... 134
5.13 Comparison between cascade transform and pairing for four channels ...... 135
5.14 Numerical results for image coding with MDTC ...... 136
5.15 Visual results for image coding with MDTC ...... 137
5.16 PAC coder block diagram ...... 139
5.17 MD PAC encoder block diagram ...... 140
5.18 MD PAC decoder block diagram ...... 141
5.19 Dependence of frequency-domain coefficient variances on ...... 142
5.20 Empirical variances of frequency-domain transform coefficients ...... 143
5.21 Pairing design across all bands ...... 144
5.22 Pairing design within factor bands ...... 145
5.23 Redundancy allocations and transform parameters in audio coder ...... 145
5.24 MD-domain "spectrum" with pairing across factor bands ...... 147
5.25 MD-domain "spectrum" with pairing within factor bands ...... 148
5.26 Performance of quantized frame MDC at a fixed rate ...... 159
5.27 Performance of quantized frame MDC at a fixed probability of erasure ...... 160
5.28 Performance of quantized frame MDC on a bursty erasure channel ...... 161
5.29 Advantage of QF system on channel with unknown erasure probability ...... 161
5.30 Numerical image coding results for quantized frame MDC ...... 164
5.31 Visual image coding results for quantized frame MDC ...... 165
5.32 Partitioning by uniform scalar quantization with and without the KLT ...... 170
5.33 Coding gain with a discrete approximation to the KLT ...... 172
5.34 Reduction of entropy-coding complexity with a discrete transform ...... 173
5.35 Reduction in variance of distortion with balanced discrete transform ...... 174

6.1 Comparison of DCT and KLT coding for different block lengths ...... 192
6.2 Operational D(C) comparison of DCT and KLT coding with two complexity metrics ...... 194
6.3 Operational D(C) for KLT coding with a variable precision model ...... 195
6.4 DCT coefficients calculated by approximate DCT algorithms ...... 196
6.5 Example of an output-pruned DCT signal flow graph ...... 196
6.6 Example of an input-pruned IDCT signal flow graph ...... 197
6.7 Operational C − R − D for JPEG encoding of Lena ...... 198
6.8 D(C) for JPEG encoding with output-pruned approximate DCTs ...... 199
6.9 D(C) for encoding Lena with hand-optimized approximate DCTs ...... 199

List of Tables

1.1 Complexity comparison between scalar and vector quantization ...... 7

2.1 Algorithm for consistent reconstruction from a quantized frame expansion ...... 26
2.2 Projection algorithm for consistent reconstruction from a QMP representation ...... 35

5.1 Probabilities of system states in MDTC of two variables over two channels ...... 122
5.2 Descriptions of audio files used in experiments ...... 142
5.3 Summary of comparison between conventional and quantized frame systems ...... 158
5.4 Comparison of conventional and quantized frame systems for N = 4, M = 5 ...... 159
5.5 Comparison of memoryless and bursty channels ...... 160

6.1 Complexities of arithmetic operations in hardware ...... 188
6.2 Complexities of output-pruned approximate DCT algorithms ...... 197
6.3 Complexities of hand-optimized approximate DCT algorithms [118] ...... 199

Acknowledgements

Though writing a thesis is a solitary act, research is usually collaborative. For this reason, no thesis would be complete without a list of acknowledgements. I have worked significantly with at least half a dozen colleagues at three institutions, so my list may be longer than most. Please bear with me and also note the list of collaborators at the beginning of each chapter.

I owe the most and greatest thanks to my advisor and friend, Martin Vetterli. He has been a tireless and enthusiastic source of good ideas; I only wish I had the energy and dedication to pursue more of them. I would especially like to thank him for leaving Berkeley—in doing so facilitating an adventurous time in Europe—and then for temporarily returning to Berkeley; the final period of working together contiguously was wonderful. I would also like to thank Marie-Laure and Martin for their hospitality in Switzerland.

My latter-day mentor, Jelena Kovačević, arranged my other adventure, a stay at Bell Labs. There I found an open-minded and vibrant research culture unmatched inside or outside of academia. Thanks to all of the Mathematics of Communications Research Department for welcoming me warmly as a student and now as a colleague. I would also like to thank Jelena for sparking my interest in multiple description coding. Finally, her constructive criticism of this document is noteworthy not only for its completeness, but because she was under no obligation whatsoever to read it.

Thanks to my other primary collaborators: Ramon Arean, Christopher Chan, Sundeep Rangan, Nguyen Thao, and Jun Zhuang. Collaborative work with Sundeep is reported in Appendix 2.C. In addition, he helped me solve vexing problems in other diverse areas. Various interactions with Kannan Ramchandran and Antonio Ortega shaped my research and contributed greatly to my education. Mike Orchard, Amy Reibman, Vinay Vaishampayan, and Yao Wang are thanked for providing a preprint that inspired the work in Section 5.3. By sharing their audio coding expertise, Gerald Schuller and Deepen Sinha made the work reported in Section 5.3.4 possible, and it should be noted that Ramon and Jelena are most responsible for this application.

I thank Professors Venkat Anantharam and Bin Yu for being on my committee and providing valuable comments. In addition, innumerable comments and suggestions provided by Alyson Fletcher and Mike Goodwin were indispensable. Thanks to Mary Byrnes, Ruth Gjerde, Rob McNicholas, Phyllis Takahashi, and Ruth Tobey for being the most consistently friendly and adept staff members I encountered in Berkeley.

In both Berkeley and Lausanne, I enjoyed the camaraderie of a supportive research group. In particular, Matt Podolsky sacrificed himself for the collective good by learning a bit more about system administration than he may have wanted to, and Mike's thesis prose set a high-water mark that was inspirational.

The past five years have consisted of more than just work. Further thanks to Jérôme Lebrun and Paolo Prandoni for making my time in Lausanne an all-around success. My friends in Berkeley—especially Allie, Richard, Heath, and Al—ensured there were far more good times than bad, and I will miss them. Last, but not least, the love and support of my family is so unfaltering that I often fail to notice or acknowledge it.

Chapter 1

Introduction and Preview

THE WORLD abounds in signals: detectable physical quantities—pressures, forces, voltages, magnetic field strengths, photon energies and counts—that capture conditions at a particular time and place. Communication of these signals allows for sight and sound reproduction at a level taken for granted in modern life. "Signal" is also used to describe a unitless mathematical abstraction of a physical measurement. We will immediately dispense with physics by taking the latter view.

This thesis addresses a class of methods used in communicating signals called transform coding. The primary focus is on producing compact representations of signals so that the subsequent communication requires fewer resources. This is called compression or source coding. Transform coding is the most prevalent compression technique for signals tolerant to some distortion. This is evidenced by its prominent position in image, audio, and video coding standards. This practical success has been the inspiration for a broader look at transform coding that allows adaptation and the use of overcomplete expansions with the goal of designing joint source–channel codes as well as source codes.

This first chapter provides an introduction to the problems considered in this thesis and to needed terminology. For an initiated reader, the meaty chapters that follow should each stand alone. However, unifying themes appear—predominantly here and in the final chapter. What follows is a description of transform coding: its history, fundamental theory, and application. The work reported here is based on lifting the constraints of the basic theory, in various combinations and settings. The results extend beyond transform coding and touch on various aspects of information theory, signal processing, and harmonic analysis.

1.1 Traditional Transform Coding

Transform coding was invented by Kramer and Mathews [114] as a method for transmitting correlated, continuous-time, continuous-amplitude signals. They showed that the total bandwidth necessary to transmit a set of signals with a prescribed fidelity can be reduced by transmitting a set of linear combinations of the signals instead of the signals themselves. Assuming Gaussian signals and the mean-squared error fidelity criterion, the Karhunen–Loève transform is an optimal transform for this application.

The signals that Kramer and Mathews wished to transmit were the energy signals of a vocoder ("voice coder"). The vocoder,¹ developed by Homer Dudley [43], is an analysis-synthesis system for speech. The analysis unit produces a set of signals that represent estimates of the power in various frequency bands. The synthesis unit produces synthesized speech by adding modulated versions of the energy signals. One could argue that the vocoder is itself a transform coder. The distinction to be made is that because of the indifference to phase, the vocoder does not attempt to reproduce the original signal. That was not unintentional; Dudley was taking advantage of the phase insensitivity of human hearing.

Coding for continuous-valued channels is an almost forgotten art, as most effort goes into designing systems for storage and communication of digital data. The application of Kramer and Mathews, though the original, seems alien. The modern use of transform coding is in systems that include quantization. In this context, the theory was established by Huang and Schultheiss [100]. Their results will be discussed in Section 1.1.3 after some additional background material.

1.1.1 Mathematical Communication

"The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point." With these words,² Claude Shannon launched a mathematical discipline devoted to the processing, transmission, storage, and use of information: information theory. Information theory directly addresses problems in communication, but it has also had fundamental impact on fields beyond communication including probability, statistics, computation theory, physics, and economics [34].³ One of many successes of Shannon was to formulate a concrete but sufficiently flexible abstraction of a communication system. Shannon's abstraction is shown in Figure 1.1.

• The information source produces a message or sequence of messages to be communicated to the destination. A message can be a function of one or more continuous or discrete variables and can itself be continuous- or discrete-valued.

• The transmitter, or encoder, produces from the message a signal compatible with the channel.

• The channel is the physical medium that conducts the transmitted signal. The mathematical abstraction of the transmission is a perturbation by noise. The perturbation process may be considered additive, but since the noise source may be dependent on the transmitted signal, this poses no restriction. Only discrete-time channels will be considered here.

• The receiver, or decoder, attempts to recreate the message from the received signal.

• The destination is the intended recipient of the message.

This abstraction is very general, but the fundamental problems of information theory are already apparent. Given an information source and a complete characterization of the channel, is it possible to determine if there exist a transmitter and a receiver that will communicate the information? If so, how can the transmitter and receiver be designed?

¹ This term is used here narrowly, to refer to the original invention. Its use has since expanded to cover a wide range of algorithms.
² This is actually the first sentence of the second paragraph of [170].
³ Shannon [171] and Elias [49] have written timeless editorials on the application and misapplication of information theory.


Figure 1.1: Shannon’s communication system abstraction [170].

For the moment, let us consider a discrete information source that, at times n ∈ N, emits a message from a discrete set X. Also assume that the channel is discrete and noiseless, i.e., at each time instant it takes as input a symbol from the discrete set Y and faithfully conveys it to the decoder. If we allow any message sequence in X^∞, then it is obvious that |X| ≤ |Y| is a necessary and sufficient condition to allow the communication to take place. Shannon wanted less restrictive conditions on communication, and thus embarked on a statistical approach.⁴ In his work, and almost all subsequent work in the information theory literature, there is some underlying statistical model which governs both the information source and the noise source. More specifically, the information source is a stochastic process, and the noise source is a stochastic process that may depend on the transmitted signal. Whether this is an exact fit to reality is a philosophical question, but it is reasonable and has led to an enormous body of results.

Adjusting the previous example, consider a discrete memoryless source which emits a symbol according

to the probability distribution p(·) (defined on X) at each time n ∈ N. Suppose that the channel is discrete and noiseless as before. For the communication of a single message (single source symbol), reliable transmission still depends on |X| ≤ |Y|. However, the average performance for coding many messages together now depends on the probability distribution of the source symbols. A number associated with the source, the entropy rate, quantifies the difficulty of conveying the messages. The entropy rate of the source is independent of the channel and in this particular case is given by⁵

H(p(·)) = − ∑_{x∈X} p(x) log₂ p(x)  bits per symbol.  (1.1)

Just as the difficulty of transmitting a source can be quantified independent of the channel that will be used, the capacity of a channel can be quantified. The capacity is the average of the logarithm of the number of input

⁴ As confirmed through an acknowledgement in [170], Shannon was influenced along these lines by Wiener's work on time series. Wiener later strongly asserted that information theory "comes under the theory of time series" [206]. A partial alternative to the statistical view (it applies essentially to source coding with a fidelity criterion) is provided by Kolmogorov's ε-entropy [109]. This will not be considered here.
⁵ The use of a logarithmic measure was deduced axiomatically by Shannon and had previously been suggested by Hartley [91].


Figure 1.2: Separation of encoding into source and channel coding.

symbols that can reliably be distinguished from observing the output of the channel. For the noiseless channel of this example, all the input symbols can be distinguished by observing the output, so C = log₂|Y|. The Fundamental Theorem for a Discrete Channel with Noise [170, Thm. 11] states that communication with arbitrarily small probability of error is possible (by coding many messages at once) if H ≤ C and impossible if H > C. In our example, if the source symbols are equally probable, H = log₂|X| and the threshold for reliable communication lies at |X| = |Y|, as expected. For any other source distribution, the source entropy and the required channel capacity are reduced.

If the goal is coding with a prescribed maximum average distortion D, instead of arbitrarily low probability of error, a different characterization of the source is needed. This is given by rate–distortion theory. The rate–distortion function gives the minimum rate needed to approximate a source up to a given distortion. The reader is referred to [170, 172, 13] for details.
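To make the entropy calculation of (1.1) concrete, here is a minimal Python sketch for a toy four-symbol source sent over a noiseless binary channel; the alphabet, probabilities, and printed interpretation are illustrative and not taken from the text.

```python
import numpy as np

def entropy_bits(p):
    """Entropy of a discrete distribution in bits per symbol, as in (1.1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # terms with p(x) = 0 contribute nothing
    return float(-np.sum(p * np.log2(p)))

# Toy example: a four-letter source and a noiseless binary channel (|Y| = 2).
p = [0.5, 0.25, 0.125, 0.125]
H = entropy_bits(p)                   # 1.75 bits per symbol
C = np.log2(2)                        # capacity of the noiseless channel: 1 bit per use
print(H, "bits/symbol;", H / C, "channel uses per source symbol suffice on average")
```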

1.1.1.1 Source and channel coding

The separate characterization of the source and the channel, and a unit of measure to connect them, led to separating encoders into two components: a source coder and a channel coder (see Figure 1.2). The source coder produces an auxiliary random process that is a compressed version of the source. For a discrete-time source, the source coder can be considered to produce some number of bits (symbols from {0, 1}) for each source symbol, without loss of generality. The role of the channel coder is to generate transmitted signals such that the compressed version of the source is conveyed with arbitrarily small probability of bit error. The most that can be asked of the source coder is to produce a representation a little larger than H or R(D) bits per symbol. On the other hand, the most that can be asked of the channel coder is to communicate C bits per symbol. The Fundamental Theorem above connects these two and is thus sometimes referred to as the "Separation Theorem" for stationary memoryless sources and channels. Extensions to other sources and channels are reviewed in [193].

Separation is very useful. It implies that one can devise a method for compressing a source without having to know anything about the channel, and without having to make adjustments for different channels. Similarly, the design of a system to communicate over a particular channel can be based simply on getting bits to the destination, without regard for the significance of each bit. For this reason, source coding and channel coding have grown into separate fields with rather separate communities.

The Separation Theorem bears further reflection. A very technical discussion of the classes of sources and channels for which separation holds could follow, but there are more fundamental points. The Separation Theorem addresses only the possibility of communication, not the best way to achieve it. Compressing a source close to the entropy rate and communicating bits at close to the channel capacity both generally require the simultaneous processing of a large amount of data. If not impractical because of computational complexity and memory requirements, this at least increases the cost of implementing such a system. In general, even for a source and channel for which "separation holds," a system which uses a joint source–channel (JSC) code may have the same performance with less delay and less computation. Of course, the separated solution may be easier to design; this is the power of separation.

Communication channels, although subject to various physical phenomena, can be well characterized statistically. The same cannot be said of sources of practical interest. Images, audio, and video are not well-modeled as realizations of stochastic processes. Therefore they certainly do not fall into some class for which a separation theorem has been proven.

Transform coding has been a very popular technique for source coding. Source coding applications will be the primary focus of this thesis. However, two new transform coding approaches to JSC coding will be introduced in Chapter 5. This very quick tour of "the mathematical theory of communication"⁶ was intended to provide just a flavor for the power of Shannon's statistical view, the basic terminology of source and channel coding, and a few fundamental results.
Information theory is the subject of many books, notably the text by Cover and Thomas [34]; and the specific aspects of channel coding and rate–distortion theory are explained well by Lin and Costello [123] and Berger [13], respectively. Nevertheless, Shannon's lucid, original paper remains the best introduction to the field. We are lucky that Shannon had no motivation to submit manuscripts in least-publishable-increments!

1.1.2 Quantization

Digital transmission of information has come to dominate the design of communications systems to such an extent that information has become synonymous with bits.⁷ The generation of bits from a continuous-valued source inevitably involves some form of quantization, which is simply the approximation of a quantity with an element chosen from a discrete set. Assuming the interface between a source coder and a channel coder is discrete, a finite-rate source coder is synonymous with a quantizer.

⁶ It was with justified confidence that Shannon's paper, "A Mathematical Theory of Communication," became "The Mathematical Theory of Communication" in the book with Weaver [173].
⁷ The advantages of digital communication are described in many texts; e.g., [115].

The communication of any information, continuous or discrete, through quantized values seems very natural today. However, this idea was rather new in Shannon's time. Pulse-code modulation (PCM), which for our purposes is nothing more than quantization for subsequent digital communication, was patented in 1939 [160]. The first fully deployed use of PCM was for a military communication system used in 1945 [154] and described openly in 1947 [17].

The simplest form of quantization is fixed-rate quantization of a scalar source. Here a real-valued source is mapped to one of a finite number of values. Symbolically, the quantizer is a mapping Q : R → C, where

C = {y_1, y_2, ..., y_K} ⊂ R,  y_1 < y_2 < ··· < y_K,

is called the codebook. This can be decomposed into encoding or "quantization" E : R → {1, 2, ..., K} and decoding or "inverse quantization" D : {1, 2, ..., K} → C operations, though this is usually not necessary. Each inverse image Q^{−1}(y_i) is called a cell, and the cells together form the partition induced by the quantizer.

Except in the degenerate case where the source takes on no more than K values, the input and output of the quantizer will differ. This difference is called the quantization error. Naturally, this error should be made small, so we minimize a nonnegative measure of the error. The most common distortion measure is the mean-squared error (MSE), defined for the random variable X and the quantizer Q by

D = E[(X − Q(X))²],

where E[·] denotes expectation. If unspecified, MSE distortion will be assumed. With the MSE distortion measure, there are two simple necessary conditions for an optimal quantizer:

• Nearest neighbor encoding: Consider the codebook to be fixed. In encoding a source sample x, one should choose the codebook element closest to x. This is written as

Q(x) = argmin_{y_i ∈ C} |x − y_i|.

In this case the cell is called a Voronoi cell.

• Centroid condition: With the encoder mapping fixed, the decoder minimizes the distortion by decoding to the average value of the cell:

D(y_i) = E[x | Q(x) = y_i].

Except for a few examples with simple probability density functions, optimal quantizer design cannot be done analytically. Lloyd [128] and Max [133] independently suggested that a quantizer be designed iteratively by alternating enforcements of the above conditions. A quantizer designed in this manner is called a Lloyd–Max quantizer.

The output of a Lloyd–Max quantizer is a random variable taking values in a discrete set of size K. As discussed in Section 1.1.1, the average number of bits required to transmit such a random variable can usually be reduced by entropy coding. There is no reason to believe that cascading a Lloyd–Max quantizer and an entropy coder would give the best trade-off between average rate and distortion; in fact, a joint design of the quantizer and entropy coder is beneficial. The result of a joint design is called an entropy-constrained scalar quantizer. Its first numerical investigation was by Wood [211]. Optimality conditions for MSE distortion were provided by Berger [14].
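The Lloyd–Max design loop just described alternates the nearest neighbor and centroid conditions. Below is a minimal sketch on a Gaussian training set; the initialization, iteration count, sample size, and random seed are arbitrary choices made only for illustration.

```python
import numpy as np

def lloyd_max(samples, K, iters=50):
    """Sketch of Lloyd-Max design: alternate nearest-neighbor encoding and
    the centroid condition on a training set standing in for the source p.d.f."""
    codebook = np.quantile(samples, (np.arange(K) + 0.5) / K)  # initial codewords
    for _ in range(iters):
        # Nearest-neighbor encoding: each sample goes to its closest codeword.
        idx = np.abs(samples[:, None] - codebook[None, :]).argmin(axis=1)
        # Centroid condition: each codeword moves to the mean of its cell.
        for i in range(K):
            cell = samples[idx == i]
            if cell.size:
                codebook[i] = cell.mean()
    idx = np.abs(samples[:, None] - codebook[None, :]).argmin(axis=1)
    mse = np.mean((samples - codebook[idx]) ** 2)
    return np.sort(codebook), mse

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)          # unit-variance Gaussian training data
cb, mse = lloyd_max(x, K=4)
print(cb, mse)                            # roughly +/-0.45, +/-1.51 and MSE near 0.118
```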

                                          Scalar quantization   Vector quantization
Space complexity:  codebook size                N 2^R                2^{NR}
Time complexity:   distance calculations        N 2^R                2^{NR}
                   complexity of distance calc  O(1)                 O(N)
                   overall                      O(N 2^R)             O(N 2^{NR})

Table 1.1: Complexity comparison between scalar and vector quantization. N represents the dimensionality of the source and R the rate in bits per source component. It is assumed that different scalar quantizers are used for each vector component; otherwise, codebook size is reduced from N 2^R to 2^R.
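The counts in Table 1.1 follow from a little arithmetic; the sketch below simply evaluates them, and the function name and example values (N = 8, R = 2) are hypothetical choices for illustration.

```python
def quantizer_complexity(N, R):
    """Codebook sizes and rough encoding operation counts from Table 1.1.

    N is the source dimension and R the rate in bits per component; scalar
    quantization uses a distinct 2^R-point codebook per component."""
    sq = {"codebook size": N * 2**R,
          "distance calcs": N * 2**R,
          "ops (O(1) each)": N * 2**R}
    vq = {"codebook size": 2**(N * R),
          "distance calcs": 2**(N * R),
          "ops (O(N) each)": N * 2**(N * R)}
    return sq, vq

print(quantizer_complexity(N=8, R=2))   # SQ: 32 codewords vs. VQ: 65536
```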

Vector quantization is "merely" the extension of scalar quantization to multidimensional domains, but the implications of this extension are profound. The source coding of any "finite extent" discrete domain source—like an image, audio segment, or video segment—can be considered a single vector quantization operation. This is in contrast to having many scalar quantization operations, say one for each sample of an audio signal. Vector quantization is by definition the "ultimate" coding scheme because it includes all forms of source coding. Vector quantization (VQ) is actually the structure used by Shannon [172] in his theoretical studies of coding with a fidelity criterion, thus it is as old as rate–distortion theory. Interest in VQ rose in the 1980s as the use of digital computers made its implementation more practical. It should be noted, however, that unconstrained VQ is feasible only for vectors of much lower dimension than an entire source realization (image, audio segment, video segment) described above.

The two scenarios for scalar quantization considered above, optimum for a fixed codebook size and entropy-constrained, have been addressed for VQ as well. For the first problem, a design algorithm which generalizes the Lloyd–Max iteration was given by Linde, Buzo, and Gray [124]. It is called the generalized Lloyd, or LBG (for the authors), algorithm. An algorithm for entropy-constrained VQ (ECVQ) design was given by Chou, Lookabaugh, and Gray [29].

The main drawback of VQ is its complexity, both in time (number of operations) and space (amount of memory). Unless some structure is imposed on the codebook, each codeword must be stored independently. For a codebook of size K, the nearest neighbor encoding process requires K distance computations and the selection of the smallest of the K distances. Table 1.1 summarizes a complexity comparison between scalar and vector quantization for an N-dimensional source coded at R bits per component.⁸ It is assumed that the scalar codebooks used for each component are distinct but equal in size. Many techniques for reducing time and space complexity by placing constraints on the codebook design (using a suboptimal codebook) or by replacing the nearest neighbor encoding rule (using suboptimal encoding) have been proposed. The most important variants are described in [60], a comprehensive text on VQ.

Viewing a source coder as a vector quantizer can lead to valuable insights. Examples of this will be seen in Chapters 2 and 5, where the geometries of partitionings are used to understand quantized overcomplete expansions and multiple description coders, respectively. In the first case, we will find that a seemingly reasonable reconstruction algorithm can be far from optimal. The second case leads to techniques for JSC coding.

The design of optimal quantizers can rarely be completed analytically. While this is a practical problem in its own right, it poses distinct difficulty in the design of systems consisting of more than just a quantizer. It is useful to be able to model the quantization process in a general way. Three approaches that lead to tractable analysis are to assume the quantization is fine, uniform, or dithered. These are frequently combined and are described below with emphasis on a few oft-used, simple expressions.

⁸ It is conventional to measure rates on a per component basis (instead of per vector) because one is often free to manipulate the vector length through (re-)blocking.

1.1.2.1 Fine quantization approximations

Consider first Lloyd–Max quantization, i.e., scalar quantization without entropy coding. If the size of the codebook is large and the probability density function (p.d.f.) of the source is smooth, then the p.d.f. is approximately constant over each quantization cell. This approximation, attributed to Bennett [11], facilitates analytical approximations of the distortion as a function of the size of the codebook K and the point density function (see [60]) of the quantizer. In any cell, the point density function is approximately the width of the cell divided by K. The optimum point density function is proportional to the cube root of the p.d.f. and yields distortion of

D ≈ (1/(12K²)) ( ∫_{−∞}^{∞} f_X(x)^{1/3} dx )³.

For a Gaussian random variable with variance σ², this becomes

D ≈ √3 π σ² / (2K²).

Upon relating the rate to the number of cells with R = log₂ K, we obtain

D_{Gaussian, Lloyd–Max} ≈ (√3 π / 2) σ² 2^{−2R}.  (1.2)

Bennett's approximation can be used to analyze entropy-coded scalar quantization (ECSQ) as well. The results are rather startling: Gish and Pierce [61] showed that under weak assumptions on the source p.d.f. and on the distortion criterion, the asymptotically optimal point density is constant on the support of the source p.d.f. For a Gaussian source, one obtains

D_{ECSQ} ≈ (πe/6) σ² 2^{−2R}.  (1.3)

For comparison, the distortion–rate function for a memoryless Gaussian source with the same variance is

D(R) = σ² 2^{−2R}.  (1.4)

One could call this the limiting performance of infinite-dimensional ECVQ.
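The three expressions (1.2)–(1.4) are easy to tabulate; a small sketch follows, with the unit variance and the sample rates chosen arbitrarily for illustration.

```python
import numpy as np

def gaussian_highrate(R, var=1.0):
    """High-rate distortion approximations for a memoryless Gaussian source.

    Returns the Lloyd-Max approximation (1.2), the entropy-coded scalar
    quantization approximation (1.3), and the distortion-rate bound (1.4)."""
    d_lm   = (np.sqrt(3) * np.pi / 2) * var * 2.0 ** (-2 * R)   # (1.2)
    d_ecsq = (np.pi * np.e / 6)       * var * 2.0 ** (-2 * R)   # (1.3)
    d_rd   =                            var * 2.0 ** (-2 * R)   # (1.4)
    return d_lm, d_ecsq, d_rd

for R in (1, 2, 4):
    print(R, gaussian_highrate(R))
# The gaps are constant factors: sqrt(3)*pi/2 ~ 2.72 (about 4.4 dB) for Lloyd-Max
# and pi*e/6 ~ 1.42 (about 0.25 bit/sample) for ECSQ, as in Figure 1.4.
```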

1.1.2.2 Uniform quantization

Uniform quantization is usually not optimal, but it is very common in practice because of its simplicity. A uniform quantizer is distinguished by having equally spaced codewords. Two types of uniform quantizers are of interest: one with a finite number of codewords (a bounded quantizer) and one with a countably infinite number of codewords (an unbounded quantizer).

A bounded uniform quantizer is characterized by the number of cells K, the quantization step size ∆, and any single codebook value, say y_1. The remaining codebook entries are given by y_k = y_1 + (k − 1)∆. The interval I_granular = [y_1 − ∆/2, y_K + ∆/2] is called the granular region. Source samples in this interval will be


Figure 1.3: Optimal loading factors for bounded uniform quantization of Gaussian and Laplacian sources.

approximated within ∆/2 by their quantized values.⁹ Samples in the overload region I_overload = R \ I_granular face quantization error greater than ∆/2 in absolute value. Overload distortion and granular distortion refer to the expected value of each type of distortion.

For a fixed value of K, there is a trade-off in selecting the value of ∆. Decreasing ∆ diminishes the granular distortion, but also contracts the granular region, enhancing the overload distortion.¹⁰ Similarly, attempting to minimize the overload distortion enhances the granular distortion. The size of ∆ can be described by a dimensionless quantity called the loading factor. The loading factor is the length of I_granular divided by twice the standard deviation of the source. For Gaussian and Laplacian sources, the optimal loading factor as a function of the bit rate is shown in Figure 1.3. The difference is due to the heaviness of the distribution tails.

An unbounded uniform quantizer is described by a quantization step size ∆ and an offset a ∈ [−∆/2, ∆/2). The quantization function is then given by

Q_{∆,a}(x) = n∆ − a   if x ∈ [(n − 1/2)∆ − a, (n + 1/2)∆ − a)
           = [x + a]_∆ − a,  (1.5)

where [·]_∆ represents rounding to the nearest multiple of ∆. This is most commonly used with a equal to zero or −∆/2. An unbounded uniform quantizer only makes sense when followed by an entropy code and this is the situation considered.

Except in the small ∆ limit, few analytical calculations can be made regarding the performance of entropy-coded unbounded uniform quantization (ECUQ). The use of Bennett approximations for small ∆ yields the optimality of ECUQ mentioned previously. At high rates, ECUQ performs within (1/2) log₂(πe/6) ≈ 0.255 bits per sample of the rate–distortion function for any memoryless source [61]. A numerical study has shown that for a wide range of memoryless sources, ECUQ is within 0.3 bits per sample of the rate–distortion function at all rates [53]. Figure 1.4 provides a comparison for a unit-variance memoryless Gaussian source between ECUQ

⁹ Midpoint reconstruction is assumed, though centroid reconstruction could potentially reduce the distortion.
¹⁰ If the overload distortion is not affected, ∆ is much too large.


Figure 1.4: Comparison between coding methods, approximations, and rate–distortion bound for a memoryless Gaussian source.

with ideal lossless coding, bounded uniform quantization with optimized loading factor, and various bounds and approximations:

• D-R: the distortion–rate bound (1.4);

• GP: the approximation by Gish and Pierce [61] for the performance of ECUQ (1.3);

• LM: the high rate approximation to the performance of a Lloyd–Max quantizer (1.2);

• ECUQ: the actual performance of an ECUQ with a = 0 where ideal entropy coding is assumed, i.e., rate has been measured by entropy;

• BU: the actual performance of a bounded uniform quantizer with optimized loading factor.

Unbounded uniform quantization is convenient for studies of the statistical properties of quantization error. It is common in the communications literature to assume that the quantization error is signal-independent, uncorrelated (white), and uniformly distributed on [−∆/2, ∆/2]. This assumption is clearly wrong because the quantization error is a deterministic function of the signal. Nevertheless, it is a useful approximation and is justifiable in a precise sense when ∆ is small and the probability distribution between pairs of input samples is given by a smooth p.d.f. Results along these lines are originally due to Bennett [11] and are surveyed in [84] and [203]. This type of quantization also makes possible closed-form expressions for the moments of quantized signals [27] (see Chapter 3).

White quantization noise assumptions have been largely avoided in this thesis. Instead, where the statistical properties of quantization noise are significant, in Chapter 3 and in a small part of Chapter 2, subtractive dithered uniform quantization has been used. This type of quantization precisely satisfies the white quantization noise model.
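A short experiment illustrating the unbounded uniform quantizer of (1.5) and the white-noise model discussed above; the step sizes, Gaussian test signal, and seed are arbitrary choices for illustration.

```python
import numpy as np

def uniform_quantize(x, step, a=0.0):
    """Unbounded uniform quantizer of (1.5): Q(x) = [x + a]_step - a."""
    return np.round((x + a) / step) * step - a

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)          # unit-variance Gaussian test signal
for step in (1.0, 0.1):
    e = x - uniform_quantize(x, step)
    # For a smooth p.d.f. and small step, the error variance is close to
    # step^2/12 and the error is nearly uncorrelated with the signal,
    # in line with Bennett's approximation.
    print(step, e.var(), step**2 / 12, np.corrcoef(x, e)[0, 1])
```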

1.1.2.3 Dithered quantization

Dithering is the addition of a random signal prior to quantization. The dither signal may or may not be subtracted after quantization, yielding subtractive and nonsubtractive dithered quantization, respectively. The purpose of this operation is to manipulate the statistical properties of the quantization noise, e.g., to make them independent of the signal.

In subtractive dithered quantization (SDQ), we will assume an unbounded uniform quantizer and a white dither signal uniformly distributed on [−∆/2, ∆/2). This SDQ is equivalent to the use of (1.5) with randomized offset a. The quantization noise is signal-independent, white, and uniformly distributed, regardless of the signal p.d.f. and ∆. These properties will be used to advantage in Chapters 2 and 3. The first use of SDQ was by Roberts [164] to improve the perceptual quality of PCM-encoded images. Subtractive and nonsubtractive dithered quantization are surveyed in [126, 85].
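A companion sketch for subtractive dithered quantization: with uniform dither over one quantizer cell, the error statistics match the white-noise model even at a coarse step. The step size and the Gaussian test signal are again arbitrary choices.

```python
import numpy as np

def sdq(x, step, rng):
    """Subtractive dithered quantization: add uniform dither, quantize, subtract."""
    dither = rng.uniform(-step / 2, step / 2, size=x.shape)
    return np.round((x + dither) / step) * step - dither

rng = np.random.default_rng(2)
x = rng.standard_normal(200_000)
step = 1.0                               # deliberately coarse
e = x - sdq(x, step, rng)
# The error is uniform on [-step/2, step/2], has variance step^2/12,
# and is uncorrelated with the signal, regardless of the signal p.d.f.
print(e.var(), step**2 / 12, np.corrcoef(x, e)[0, 1])
```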

1.1.3 Series Signal Expansions and Transform Coding

Series expansions are the central operations of signal processing. They date back almost two centuries to the work of Fourier [57] on expansions in terms of sines and cosines. The classical techniques of Fourier analysis occupy a prominent position in engineering curricula and thus will not be reviewed here. Recommended references include [112, 145, 195].

The signals considered in this thesis will be of the discrete-parameter type and have finite extent; thus, they are finite-dimensional vectors. Fourier analysis applies to such signals, specifically the discrete Fourier transform (DFT), but many of the technicalities of harmonic analysis do not apply or are trivially checked. For example, convergence of transforms is not an issue.

Consider the signal space H = R^N. With the inner product

⟨x, y⟩ = ∑_{k=1}^{N} x_k y_k = x^T y = y^T x,

this is a separable Hilbert space. Any set of N linearly independent vectors {ϕ_k}_{k=1}^{N} ⊂ H specifies a transform T : H → H through

y_k = (Tx)_k = ⟨ϕ_k, x⟩,  for k = 1, 2, ..., N.  (1.6)

This invertible linear transform is described by the matrix T = [ϕ_1 ϕ_2 ··· ϕ_N]^T. Each y_i is called a transform coefficient and the vector y is said to be in the transform domain. This is a series expansion of the signal because

x = ∑_{k=1}^{N} y_k ϕ̃_k,  (1.7)

where ϕ̃_k = (T^T T)^{−1} ϕ_k, k = 1, 2, ..., N.

The motivating principle of transform coding is that simple coding may be more effective in the transform domain than in the original signal space. In Kramer and Mathews' case [114], "simple coding" corresponds to retaining only a subset of the signals. When they chose the transform well, keeping M

Figure 1.5: Substructure of a communication system with a transform coder. The transform T is a linear function R^N → R^N. Q is a scalar quantizer mapping from R^N to I_1 × I_2 × ··· × I_N, where I_i is the set of codebook indices for the ith component quantizer. E is an entropy coder. The reconstruction generally uses a linear transform U = T^{−1}.

The modern use of transform coding is schematically captured in Figure 1.5. The linear transform T is followed by a scalar quantizer Q and possibly by an entropy coder E. If the quantizer were allowed to be a vector quantizer, it could subsume the transform, and the typical problems of design and encoding complexity of VQ would appear.
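A toy end-to-end sketch of the structure in Figure 1.5, omitting the entropy coder and using an arbitrary orthogonal transform with a single uniform step size for every coefficient; none of the numerical choices come from the text.

```python
import numpy as np

def transform_code(x, T, step):
    """Sketch of Figure 1.5 without the entropy coder: y = Tx as in (1.6),
    uniform scalar quantization of each coefficient, reconstruction with U = inv(T)."""
    y = T @ x                                   # analysis transform
    y_hat = np.round(y / step) * step           # per-coefficient uniform quantizer
    x_hat = np.linalg.inv(T) @ y_hat            # synthesis, i.e. sum_k y_hat_k * phi_tilde_k
    return x_hat

rng = np.random.default_rng(3)
N = 8
T = np.linalg.qr(rng.standard_normal((N, N)))[0]   # an arbitrary orthogonal transform
x = rng.standard_normal(N)
x_hat = transform_code(x, T, step=0.25)
print(np.mean((x - x_hat) ** 2))                   # per-component MSE
```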

1.1.3.1 Optimality and Karhunen–Loève transforms

The optimal design of the transform was originally approached under the following conditions:

• The source is Gaussian with mean zero and a known, constant, covariance matrix R_x = E[xx^T].

• Distortion is measured by MSE.

• No entropy coding is used.

• The scalar quantizers are Lloyd–Max quantizers with numbers of levels {K_i} for each quantizer controlled by the designer.¹¹ The total number of cells ∏_i K_i is constrained.

Under these assumptions, Huang and Schultheiss [100] first showed that the decoder should use the inverse of the transform used in the encoder (in Figure 1.5, U = T^{−1}). Though this may seem obvious, because of the presence of quantization, it does not go without saying. The proof utilizes statistical properties of Lloyd–Max quantization error. Next, they showed that the optimal transform matrix T is the transpose of an orthogonal diagonalizing similarity transformation for R_x. In other words, T should satisfy

T R_x T^T = diag(λ_1, λ_2, ..., λ_N),  (1.8)

where the λ_i's are eigenvalues of R_x. For concreteness, we assume λ_1 ≥ λ_2 ≥ ··· ≥ λ_N (though T is still not unique). A transform satisfying (1.8) is called a Karhunen–Loève transform (KLT) of the source. The optimality of this transform does not require specific values for the K_i's, but rather the mild condition K_1 ≥ K_2 ≥ ··· ≥ K_N.¹²

Given a constraint ∏_i K_i ≤ K, finding the set {K_i} that minimizes the distortion is typically difficult because the K_i's must be positive integers. If we lift this constraint, an approximate solution can be found via calculus. Using the high rate approximation (1.2), one can find the optimal allocation

K_i = ( K² λ_i^N / ∏_{k=1}^{N} λ_k )^{1/2N}.

This can be written more conveniently in terms of bits, yielding

r_i = R + (1/2) log₂ ( λ_i / (∏_{k=1}^{N} λ_k)^{1/N} ),  (1.9)

where r_i bits are allocated to the ith quantizer and R = (log₂ K)/N is the average bit rate. This bit allocation yields overall performance

D_{KLT, LM} ≈ (√3 π / 2) (∏_{i=1}^{N} λ_i)^{1/N} · 2^{−2R}.

A similar analysis for high-rate ECSQ or ECUQ yields the same bit allocation since the only difference between (1.2) and (1.3) is a multiplicative constant. The overall performance is given by

D_{KLT, ECUQ} ≈ (πe/6) (∏_{i=1}^{N} λ_i)^{1/N} · 2^{−2R}.  (1.10)

The reduction in distortion for a given rate, as compared to a system using the same type of scalar quantization (high-rate ECSQ or ECUQ), is by a factor of

(∏_{i=1}^{N} σ_{x_i}²)^{1/N} / (∏_{i=1}^{N} σ_{y_i}²)^{1/N} = (∏_{i=1}^{N} σ_{x_i}²)^{1/N} / (∏_{i=1}^{N} λ_i)^{1/N},  (1.11)

where σ_{x_i}² and σ_{y_i}² refer to the powers of the ith components of x and y, respectively. This is called the coding gain of the KLT. If the source vectors are blocks of length N from a wide-sense stationary source, R_x is a Toeplitz matrix, and the coding gain can be put in the more common form

( (1/N) ∑_{i=1}^{N} λ_i ) / ( ∏_{i=1}^{N} λ_i )^{1/N}.

The optimality of the KLT is intuitively appealing. The KLT gives transform coefficients which are uncorrelated (and since they are Gaussian, independent): y = Tx and

R_y = E[yy^T] = E[T x x^T T^T] = T R_x T^T = diag(λ_1, λ_2, ..., λ_N).

Since the processing following the transform is done separately for each coefficient, what could be better than making the coefficients independent? In fact, this independence allows us to extend (1.4) to get the distortion–rate function for a memoryless Gaussian vector source with correlation matrix R_x. Optimally allocating the rate amongst the transform coefficients gives, for high rates,¹³

D_vector(R) = (∏_{i=1}^{N} λ_i)^{1/N} · 2^{−2R}.

The geometric effect of the KLT is to align the principal axes of the source p.d.f. with the standard basis. This is shown in Figure 1.6. On the left is a level-curve representation of the p.d.f. of x. If y is a KLT domain version of x, its p.d.f. will be as shown on the right.

¹¹ Under the constraint of scalar quantization and entropy coding, the choice of optimizing the quantizers individually seems obvious. This is assumed in [100] but proven to be optimal in [167].
¹² Note that in this scenario, the KLT is optimal with realizable quantizers and without fine quantization approximations.
¹³ Recall that rates and distortions are measured per component.
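The KLT condition (1.8), the bit allocation (1.9), and the coding gain (1.11) can be checked numerically. The sketch below uses a first-order autoregressive covariance as a stand-in source model; that model, and all parameter values, are assumptions made only for illustration.

```python
import numpy as np

def klt(Rx):
    """Return a Karhunen-Loeve transform T and eigenvalues such that
    T Rx T^T = diag(lambda_1 >= ... >= lambda_N), as in (1.8)."""
    lam, V = np.linalg.eigh(Rx)
    order = np.argsort(lam)[::-1]
    return V[:, order].T, lam[order]

def bit_allocation(lam, R):
    """High-rate allocation (1.9): r_i = R + 0.5*log2(lambda_i / geometric mean)."""
    gm = np.exp(np.mean(np.log(lam)))
    return R + 0.5 * np.log2(lam / gm)

def coding_gain(Rx, lam):
    """Coding gain (1.11): ratio of geometric means of component variances."""
    gm_x = np.exp(np.mean(np.log(np.diag(Rx))))
    gm_y = np.exp(np.mean(np.log(lam)))
    return gm_x / gm_y

# Hypothetical AR(1)-like covariance, N = 8, correlation coefficient 0.9.
N, rho = 8, 0.9
Rx = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
T, lam = klt(Rx)
print(np.allclose(T @ Rx @ T.T, np.diag(lam)))    # True: (1.8) holds
print(bit_allocation(lam, R=2.0), coding_gain(Rx, lam))
```

Note that (1.9) is a high-rate approximation: at low rates or for very spread eigenvalues it can produce negative allocations, which must be handled separately in practice.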


Figure 1.6: Geometric effect of the Karhunen–Lo`eve transform. On the left, level curves of an ellipsoidal source p.d.f. On the right, as a result of the KLT, the principal axes of the p.d.f. are aligned with the standard basis.

1.1.3.2 Karhunen–Loève transforms and R(D) for Gaussian processes

The KLT has a role, not only in practical coding of Gaussian sources as demonstrated in the previous section, but also in obtaining theoretical bounds. Consider again a Gaussian source x with mean zero and correlation matrix R_x. Let T be a KLT for x, so the correlation matrix of y = Tx is given by R_y = diag(λ_1, λ_2, ..., λ_N). Directly computing the rate–distortion function of x requires a minimization of mutual information over all conditional distributions p(x̂ | x) for which the joint distribution p(x, x̂) = p(x)p(x̂ | x) satisfies a distortion constraint [34]. This computation is complicated by the fact that p(x) is not separable. On the other hand, computing the rate–distortion function of y is relatively easy because it is a product source with independent components. The rate–distortion function is given by a simple Lagrangian optimization subject to the availability of the needed slopes [13, Cor. 2.8.3]. The solution is

R(D) = \frac{1}{N} \sum_{i=1}^N \frac{1}{2} \log_2 \frac{\lambda_i}{D_i},   (1.12)

where

D_i = \begin{cases} \lambda, & \text{if } \lambda < \lambda_i, \\ \lambda_i, & \text{if } \lambda \ge \lambda_i, \end{cases}   (1.13)

and λ is chosen so that N^{-1} \sum_{i=1}^N D_i = D. This is the result of “reverse water-filling” rate allocation, whereby components with variances less than λ are allocated no bits and the remaining components are allocated the rate needed to make the component distortion equal λ [34]. The “water level” λ is chosen to make the average distortion equal D. Since an invertible transform does not change mutual information and a unitary transform does not change Euclidean norm, x and y must have identical rate–distortion functions. Thus (1.12)–(1.13) is also the rate–distortion function of x. This is more than an algebraic trick; it leads to an alternative method to asymptotically achieve optimal performance. Suppose the following: L vectors from the source are coded simultaneously, and one can design a sequence of optimal LN-dimensional fixed-rate vector quantizers to operate directly on x for L = 1, 2, .... As L → ∞, the performance approaches arbitrarily close to R(D), but each vector quantizer in the sequence depends on R_x.
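The reverse water-filling computation in (1.12)–(1.13) is easy to carry out numerically. The following sketch is an illustrative example only, not from the thesis; it assumes NumPy and finds the water level λ by bisection.

import numpy as np

def rate_distortion_gaussian(lam, D, tol=1e-12):
    # Evaluate R(D) of Eq. (1.12) for independent Gaussian components with
    # variances lam, using the reverse water-filling allocation of Eq. (1.13).
    lam = np.asarray(lam, dtype=float)
    assert 0.0 < D <= lam.mean(), "D must be positive and no larger than the average variance"
    lo, hi = 0.0, lam.max()                  # bisect on the water level lambda
    while hi - lo > tol:
        level = 0.5 * (lo + hi)
        Di = np.minimum(level, lam)          # per-component distortions, Eq. (1.13)
        if Di.mean() > D:
            hi = level
        else:
            lo = level
    Di = np.minimum(lo, lam)
    return np.mean(0.5 * np.log2(lam / Di))  # bits per component, Eq. (1.12)

print(rate_distortion_gaussian([4.0, 2.0, 1.0, 0.5], D=0.25))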


Figure 1.7: A perceptual audio coder.

The representation of x through y provides a conceptually simpler way to approach R(D). All that is needed are sequences of vector quantizers for a white Gaussian source with unit variance. These vector quantizers are then applied to blocks of size L formed from like transform coefficients, with proper scaling and rate allocation.14 The performance again approaches arbitrarily close to R(D) as L → ∞. This example reinforces the motivating doctrine of transform coding: though a transform is not necessary to achieve good performance, it may simplify the other processing (quantization and entropy coding, for example) needed to achieve certain prescribed performance. Conversely, with a complexity bound, transform coding may give the best performance. Throughout this thesis, the combination of transform and scalar quantization should not be considered an alternative to vector quantization of the same dimension, but rather an alternative to scalar quantization with no transform. In many situations, a transform coding system can realize the space-filling advantage of vector quantization by using vector quantization of like transform coefficients, while still gaining from the transform.

1.1.4 Applications

It is hard to overstate the ubiquity of transform coding in lossy compression, though the techniques used are often more complicated than the simple structure motivated by theory. Popular methods for compression of audio, images, and video will be described briefly to highlight their use of linear transforms and scalar quantization.

Audio coders that exploit understanding of human hearing to optimize perceived quality, instead of a simple distortion measure like MSE, are very popular [105, 176]. A typical structure for such a perceptual audio coder is shown in Figure 1.7. A linear transform, implemented by a filter bank [195], is used to produce a subband representation of the audio signal. The subbands are approximately uncorrelated. The quantization step sizes used in each subband are selected to minimize the perceptibility of the quantization noise. These step sizes depend on a spectral analysis of the signal.

The most common technique for compression of continuous-tone images is given by the JPEG standard [198, 199, 153]. A JPEG image coder is typically implemented in the form shown in Figure 1.8. A linear transform, in this case a two-dimensional discrete cosine transform (DCT) [159], is applied to 8 × 8 blocks from the image. The transform coefficients are scalar quantized and jointly entropy coded.

14 Here transform coefficients are termed like if they occupy the same position in different temporal blocks.


Figure 1.8: A JPEG image coder.


Figure 1.9: A hybrid motion-compensated predictive DCT video coder.

Many video coding techniques incorporate coding similar to JPEG still image coding. The MPEG standard [116] and other popular algorithms use a hybrid between predictive coding and DCT coding: pixel values are predicted temporally, and the residual from this prediction is DCT coded. A schematic is shown in Figure 1.9.

1.2 Thesis Themes

Rather than focusing on a single problem, this thesis explores a variety of closely related problems which may be solved by extensions of traditional transform coding. There are several common themes that unite the various results. In the course of exposing these themes, some of the main results of the thesis will be previewed.

A prototypical communication system employing a transform coder will have separate transform, quantization, entropy coder, and channel coder blocks. All the blocks are time-invariant, and though they may be designed jointly, they operate independently. The transform itself is linear and square. The topics explored here include the use of nonsquare, nonlinear, and adaptive transforms; allowing interaction between the transform and quantization blocks; and merging the channel coding function with the other processes.

1.2.1 Redundant Signal Expansion

One interpretation of the optimality of the KLT is that it removes redundancy. Of course, in light of the fact that the transform is invertible, this is an imprecise statement. There is no statistical redundancy if one subset of the scalar components gives no information about a disjoint subset. Figure 1.6 provides a graphical demonstration for a Gaussian source. The correlation between x_1 and x_2 is evidenced by the dependence of E[x_2 | x_1] on x_1. On the other hand, E[y_2 | y_1] = 0, independent of y_1. Since the source is Gaussian, we can make a stronger statement: the conditional distribution of y_2 given y_1 is independent of y_1, so there is no redundancy. In applications where the KLT is unknown (perhaps ill-defined) or inconvenient, a discrete cosine transform or subband decomposition may remove redundancy in a similar sense. This is a well-known empirical fact [159] and is also justified for large block sizes by an asymptotic equivalence among frequency-selective unitary transforms [87, 83].

What might be the benefit of leaving redundancy in the signal or introducing redundancy to a signal with independent components? This redundancy can be used to combat channel impairments [89, 21, 175, 165]. In fact, channel coding is the introduction of redundancy such that impairments (in the discrete case: errors or erasures) can be mitigated. This redundancy is different from the previously discussed redundancy in that it is deterministic and in the “bit domain,” instead of the “signal domain.” Signal-space redundancy appears in a few ways in this thesis. In Chapter 2, consideration is extended

to overcomplete linear transforms. With reference to (1.6), this means using a set {ϕ_k}_{k=1}^M which spans R^N but is linearly dependent. (Alternatively, the transform is described by a matrix with more rows than columns.) Overcomplete transforms produce transform coefficients with a deterministic redundancy. The transform coefficients lie in a subspace of R^M and thus must satisfy one or more nontrivial linear equations. The quantization of the transform coefficients considerably complicates the redundancy. One fundamental question is: How well does a quantized overcomplete linear expansion describe a signal? This is answered in various specific settings in Chapter 2. It is revealed that careless modeling of the quantization noise leads to suboptimal performance. Quantized overcomplete expansions arise again in Chapter 5 in the context of joint source–channel (JSC) coding for erasure channels. This provides a specific instance where redundancy introduced in the signal domain can be used as channel coding. Another JSC technique is also introduced which uses signal-domain redundancy of the statistical type.
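As a small illustration of this deterministic, signal-domain redundancy (a sketch with assumed names, using NumPy; not code from the thesis), the coefficients y = Fx of an overcomplete transform satisfy M − N linear “parity-check” relations exactly, and only approximately once they are quantized:

import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 7
F = rng.standard_normal((M, N))            # overcomplete transform: more rows than columns

# Rows of P span the orthogonal complement of the range of F, so P @ F = 0.
U, _, _ = np.linalg.svd(F, full_matrices=True)
P = U[:, N:].T                             # (M - N) x M

x = rng.standard_normal(N)
y = F @ x
print(np.allclose(P @ y, 0.0))             # True: unquantized coefficients obey the relations

delta = 0.5
y_hat = delta * np.round(y / delta)        # scalar quantization perturbs the relations
print(np.abs(P @ y_hat).max())             # small but nonzero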

1.2.2 Adaptive Signal Expansion

In the basic theory of transform coding, it is assumed that the source is a stationary random process and that the coding procedure is linear (excluding quantization noise) and time-invariant. This thesis includes explorations of methods for producing nonlinear and adaptive signal expansions. To most, “adaptive” immediately brings to mind memory and time-variance. While this sense of adaptation is also considered, the first form of adaptive signal expansion considered is actually memoryless and time-invariant.

Suppose instead of forming a signal expansion as in (1.7), one has

x \approx \sum_{k=1}^{p} y_k \tilde{\varphi}_{j_k}, \quad \text{with } p < N.   (1.14)

The important distinction is that instead of using all the ϕ_k's, one uses a subset in an a priori unspecified order. Using such an expansion allows signal adaptivity within a memoryless, time-invariant system. Unless the same j_k sequence is used every time, the y_k's are a nonlinear function of the signal.

One method for choosing the yk’s and jk’s (described in Section 2.3.1) is called matching pursuit (MP) [131]. A version of this algorithm including quantization is presented in Section 2.3.2. This algorithm, called quantized matching pursuit (QMP), is the natural way to apply MP to source coding. The melding of transform and quantization in QMP makes its analysis interesting. Adaptation in the traditional sense is considered in Chapters 3 and 4. In the former, a method for transform adaptation which uses only quantized data is proposed. Adaptation based on quantized data is attractive because it eliminates the need for describing the adaptation over a side information channel. This method leads to a source coder which is universal for memoryless Gaussian vector sources with unknown correlation matrices. The computation of transforms for an adaptive system is considered in Chapter 4. An analogy between adapting transforms and finite impulse response (FIR) Wiener filters motivates the development of a new class of algorithms for transform adaptation.

1.2.3 Computational Optimization

The goal of a communications engineer is not just to design a system which meets certain performance requirements, but to quickly design an inexpensive system that meets the requirements. Information theory can contribute to the speed of design by suggesting whether or not the specifications are close to fundamental limits. But since most key results in information theory are nonconstructive, the design of an implementable system may still be difficult. Results which indicate the best possible performance with constrained resources are needed. A commonly constrained resource is computational power measured, say, in arithmetic operations per source sample. This type of constraint motivates various system design choices; for example, DCT instead of KLT, or scalar instead of vector quantization. In forming signal-adapted expansions as in (1.14), the use of matching pursuit is motivated by the high computational complexity of finding more sparse expansions. The recursive consistent estimation algorithm of Appendix 2.C is another example of a “suboptimal but good” algorithm. Complexity concerns arise again in Appendix 5.B, where transform coding with discrete transforms is considered. These transforms may be cheap to compute because low-precision arithmetic suffices, and they offer the potential for computationally simplified entropy coding. The decision to replace a theoretically optimal technique with a computationally simple, suboptimal one should be principled. A theory upon which to base these design choices is presented in Chapter 6.

Chapter 2

Quantized Frame Expansions

LINEAR TRANSFORMS and expansions are the fundamental mathematical tools of signal processing, yet the properties of linear expansions in the presence of coefficient quantization are not fully understood. These properties are most intricate when signal representations are with respect to redundant, or overcomplete, sets of vectors. This chapter considers the effects of quantization in overcomplete finite linear expansions. Both fixed and adaptive basis methods are studied. Although the adaptive basis method represents an input vector as a linear combination of elements from a representation set, it is in fact a nonlinear mapping. While many other issues are explored, the unifying theme is that consistent reconstruction methods [184] give considerable improvement over linear reconstruction methods.

2.1 Introduction

Consider the expansion–quantization–reconstruction scenario depicted in Figure 2.1. A vector x ∈ C^N is left multiplied by a matrix F ∈ C^{M×N} of rank N to get y ∈ C^M. The transformed source vector y is scalar quantized, i.e., quantized with a quantizer which acts separately on each component of y, to get ŷ. As shown in Section 2.2.1.2, this type of representation arises naturally in simple oversampled A/D conversion. In general, this sort of representation may be desirable when many coarse measurements can be made easily, but precise measurements are difficult to make. How can one best estimate x from ŷ? How does the quality of the estimate x̂ depend on the properties of F, in particular its number of rows M? These are the fundamental questions addressed in Section 2.2. To put this in a solid framework, the basic properties of frames are reviewed and a new result on the tightness of random frames is proven. It will be shown that the quality of reconstruction can be improved by using deterministic properties of quantization; in particular, the boundedness of quantization noise. The alternative is to consider quantization as the addition of signal-independent white noise specified only by its variance. The relationship between the redundancy of the frame and the minimum possible reconstruction error is explored.

Without sophisticated coding, a nonadaptive overcomplete expansion can be a very inefficient representation. In the context of Figure 2.1, coding ŷ may be an inefficient way to represent x. But could we get a good representation by choosing a few components of ŷ, a posteriori, which best describe x? This question is related to adaptive basis techniques described in Section 2.3.

This chapter includes research conducted jointly with Martin Vetterli and Nguyen T. Thao [77, 78, 79]; Martin Vetterli [71, 74]; and Sundeep Rangan [158].



Figure 2.1: Block diagram of reconstruction from quantized frame expansion.

The use of a greedy successive approximation algorithm for finding sparse linear representations with respect to an overcomplete set is studied in Section 2.3. This algorithm, called matching pursuit (MP) [131], has recently been applied to image coding [12, 74] and video coding [141, 194, 107, 140], both of which inherently require coarse coefficient quantization. However, the present work is the first to describe the qualitative effects of coefficient quantization in matching pursuit. In particular, as in Section 2.2, we will find that reconstruction can be improved by consistent reconstruction techniques.

Except where noted, we consider vectors in a finite-dimensional Hilbert space H = R^N or C^N. For x, y ∈ H, we use the inner product ⟨x, y⟩ = x^T ȳ and the norm derived from the inner product through ‖x‖ = ⟨x, x⟩^{1/2}. N(µ, Λ) is used to denote the Normal distribution with mean µ and covariance matrix Λ. The term squared error (SE) is used for the square of the norm of the difference between a vector and an estimate of the vector. The term mean squared error (MSE) is reserved for the ensemble average of SE, or expected SE.

2.2 Nonadaptive Expansions

This section describes frames, which provide a general framework for studying nonadaptive linear transforms. Frames were introduced by Duffin and Schaeffer [44] in the context of non-harmonic Fourier series. Recent interest in frames has been spurred by their utility in analyzing the discretization of continuous transforms [94, 36, 37] and time-frequency decompositions [137]. The motivation here is to understand quantization effects and efficient representations. Section 2.2.1 begins with definitions and examples of frames. It concludes with a theorem on the tightness of random frames and a discussion of that result. Section 2.2.2 begins with a review of reconstruction from exactly known frame coefficients. The remainder of the section gives new results on reconstruction from quantized frame coefficients. Most previous work on frame expansions is predicated either on exact knowledge of coefficients or on coefficient degradation by white additive noise. For example, Munch [137] considered a particular type of frame and assumed the coefficients were subject to a stationary noise. This chapter, on the other hand, is in the same spirit as [182, 184, 183, 185, 35] in that it utilizes the deterministic qualities of quantization.

2.2.1 Frames

2.2.1.1 Definitions and basics

This subsection is largely adapted from [37, Ch. 3]. Some definitions and notations have been simplified

because we are limiting our attention to H = R^N or C^N.

Let Φ = {ϕ_k}_{k∈K} ⊂ H, where K is a countable index set. Φ is called a frame if there exist A > 0 and B < ∞ such that

A \|x\|^2 \le \sum_{k \in K} |\langle x, \varphi_k \rangle|^2 \le B \|x\|^2, \quad \text{for all } x \in H.   (2.1)

A and B are called the frame bounds. The cardinality of K is denoted by M. The lower bound in (2.1) is equivalent to requiring that Φ spans H. Thus a frame will always have M ≥ N. Also, notice that one can choose B = \sum_{k \in K} \|\varphi_k\|^2 whenever M < ∞. We will refer to r = M/N as the redundancy of the frame. A frame Φ is called a tight frame if the frame bounds can be taken to be equal. It is easy to verify that if Φ is a tight frame with ‖ϕ_k‖ = 1 for all k ∈ K, then A = r. In particular, an orthonormal basis is a tight frame consisting of unit vectors with r = 1.

Given a frame Φ = {ϕ_k}_{k∈K} in H, the associated frame operator F is the linear operator from H to C^M defined by

(Fx)_k = \langle x, \varphi_k \rangle.   (2.2)

Since H is finite-dimensional, this operation is a matrix multiplication where F is a matrix with kth row equal to ϕ_k^*. Using the frame operator, (2.1) can be rewritten as

A I_N \le F^* F \le B I_N,   (2.3)

where I_N is the N × N identity matrix. (The matrix inequality A I_N ≤ F^*F means that F^*F − A I_N is a positive semidefinite matrix.) In this notation, F^*F = A I_N if and only if Φ is a tight frame. From (2.3) we can immediately conclude that the eigenvalues of F^*F lie in the interval [A, B]; in the tight frame case, all of the eigenvalues are equal. This gives a computational procedure for finding frame bounds. Since it is conventional to assume A is chosen as large as possible and B is chosen as small as possible, we will sometimes take the minimum and maximum eigenvalues of F^*F to be the frame bounds. Note that it also follows from (2.3) that F^*F is invertible because all of its eigenvalues are nonzero, and furthermore

B^{-1} I_N \le (F^* F)^{-1} \le A^{-1} I_N.   (2.4)
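The eigenvalue recipe for frame bounds takes only a few lines of code; the sketch below is illustrative only (NumPy assumed), with the frame vectors supplied as the rows of F.

import numpy as np

def frame_bounds(F):
    # Frame bounds A, B = extreme eigenvalues of F*F (frame vectors are the rows of F).
    eig = np.linalg.eigvalsh(F.conj().T @ F)
    return eig.min(), eig.max()

print(frame_bounds(np.eye(3)))                          # orthonormal basis: A = B = 1
print(frame_bounds(np.vstack([np.eye(3), np.eye(3)])))  # duplicated basis: tight, A = B = 2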

The dual frame of Φ is defined as Φ̃ = {ϕ̃_k}_{k∈K}, where

\tilde{\varphi}_k = (F^* F)^{-1} \varphi_k, \quad \text{for all } k \in K.   (2.5)

Φ̃ is itself a frame with frame bounds B^{-1} and A^{-1}. Since Span(Φ) = H, any vector x ∈ H can be written as

x = \sum_{k \in K} \alpha_k \varphi_k   (2.6)

for some set of coefficients {α_k} ⊂ R. If M > N, {α_k} may not be unique. We refer to (2.6) as a redundant representation even though it is not necessary that more than N of the α_k's are nonzero.

2.2.1.2 Example

The question of whether a set of vectors forms a frame is not very interesting in a finite-dimensional space; any finite set of vectors which spans the space forms a frame. Thus if M ≥ N vectors are chosen randomly with a circularly symmetric distribution on H, they almost surely form a frame.1 So in some sense, it is easier to find a frame than to give an example of a set of vectors which do not form a frame. This section gives an example of a structured family of frames. Certain properties of these frames are proven in Section 2.2.2.4.

Oversampling of a periodic, bandlimited signal can be viewed as a frame operator applied to the signal, where the frame operator is associated with a tight frame. If the samples are quantized, this is exactly the situation of oversampled A/D conversion [184]. Let x = [x_1 x_2 ··· x_N]^T ∈ R^N, with N odd. Define a corresponding continuous-time signal by

x_c(t) = x_1 + \sum_{k=1}^{W} \left( x_{2k} \sqrt{2} \cos\frac{2\pi k t}{T} + x_{2k+1} \sqrt{2} \sin\frac{2\pi k t}{T} \right),   (2.7)

where W = (N − 1)/2. Any real-valued, T-periodic, bandlimited, continuous-time signal can be written in this form. Let M ≥ N. Define a sampled version of x_c(t) by x_d[m] = x_c(mT/M) and let y = [x_d[0] x_d[1] ··· x_d[M − 1]]^T. Then we have y = Fx, where

F = \left[ \varphi_1 \; \varphi_2 \; \cdots \; \varphi_M \right]^T, \quad \text{with} \quad \varphi_k = \left[ 1 \;\; \sqrt{2}\cos\tfrac{2\pi k}{M} \;\; \sqrt{2}\sin\tfrac{2\pi k}{M} \;\; \cdots \;\; \sqrt{2}\cos\tfrac{2\pi W k}{M} \;\; \sqrt{2}\sin\tfrac{2\pi W k}{M} \right]^T.   (2.8)

Using the orthogonality properties of sine and cosine, it is easy to verify that F^*F = M I_N, so F is an operator associated with a tight frame. Pairing terms and using the identity cos²θ + sin²θ = 1, we find that each row of F has norm √N. Dividing F by √N normalizes the frame and results in a frame bound equal to the redundancy ratio r. Also note that r is the oversampling ratio with respect to the Nyquist sampling frequency.
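The tightness of the frame in (2.8) can be checked directly. The following sketch is illustrative only (NumPy assumed; N odd); it builds the frame operator and verifies F^T F = M I_N and the row norms.

import numpy as np

def oversampling_frame(N, M):
    # Frame operator of Eq. (2.8): an M x N matrix, N odd, M >= N.
    assert N % 2 == 1 and M >= N
    W = (N - 1) // 2
    m = np.arange(M)                            # sample index 0, ..., M-1
    F = np.ones((M, N))
    for w in range(1, W + 1):
        F[:, 2 * w - 1] = np.sqrt(2) * np.cos(2 * np.pi * w * m / M)
        F[:, 2 * w] = np.sqrt(2) * np.sin(2 * np.pi * w * m / M)
    return F

F = oversampling_frame(N=5, M=16)
print(np.allclose(F.T @ F, 16 * np.eye(5)))     # tight frame: F^T F = M I_N
print(np.allclose((F ** 2).sum(axis=1), 5.0))   # each row has norm sqrt(N)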

2.2.1.3 Tightness of random frames

Tight frames constitute an important class of frames. As we will see in Section 2.2.2.1, since a tight frame is self-dual, it has some desirable reconstruction properties. These extend smoothly to nearly tight frames, i.e., frames with B/A close to one. Also, for a tight frame, (2.1) reduces to something similar to Parseval’s equality. Thus, a tight frame operator scales the energy of an input by a constant factor A. Furthermore, it is shown in Section 2.2.2.4 that some properties of “typical” frame operators depend only on the redundancy. This motivates our interest in the following theorem.

Theorem 2.1 (Tightness of random frames) Let {Φ_M}_{M=N}^∞ be a sequence of frames in R^N such that Φ_M is generated by choosing M vectors independently with a uniform distribution on the unit sphere in R^N. Let F_M be the frame operator associated with Φ_M. Then, in the mean squared sense,

\frac{1}{M} F_M^* F_M \longrightarrow \frac{1}{N} I_N \quad \text{elementwise as } M \longrightarrow \infty.

Proof: See Appendix 2.A.1. 

Theorem 2.1 shows that a sequence of random frames with increasing redundancy will approach a tight frame. Note that although the proof in Appendix 2.A.1 uses an unrelated strategy, the constant 1/N is intuitive: if Φ_M is a tight frame with normalized elements, then we have F_M^* F_M = (M/N) I_N because the frame bound equals the redundancy of the frame.

1 An infinite set in a finite-dimensional space can form a frame only if the norms of the elements decay appropriately, for otherwise a finite upper frame bound will not exist.


Figure 2.2: Normalized frame bounds for random frames in R^4.

Numerical experiments were performed to confirm this behavior and observe the rate of convergence. Sequences of frames were generated by successively adding random vectors (chosen according to the appropriate distribution) to existing frames. Results shown in Figure 2.2 are averaged results for 1000 sequences of frames in R^4. Figure 2.2 shows that A/M and B/M converge to 1/N and that B/A converges to one.
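The experiment can be reproduced with a few lines of code. This sketch is illustrative only (NumPy assumed); it follows a single sequence of random frames rather than averaging over 1000 sequences.

import numpy as np

rng = np.random.default_rng(1)
N = 4

def random_unit_vectors(count):
    v = rng.standard_normal((count, N))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

F = random_unit_vectors(N)                      # start with M = N random unit vectors
for M in (8, 16, 64, 256, 1024):
    F = np.vstack([F, random_unit_vectors(M - F.shape[0])])
    eig = np.linalg.eigvalsh(F.T @ F)
    print(M, eig.min() / M, eig.max() / M)      # A/M and B/M both approach 1/N = 0.25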

2.2.2 Reconstruction from Frame Coefficients

One cannot rightly call a frame expansion a “signal representation” without considering the viability of reconstructing the original signal. This is the problem addressed presently. Section 2.2.2.1 reviews the basic properties of reconstructing from (unquantized) frame coefficients. This material is adapted from [37]. The subsequent sections consider the problem of reconstructing an estimate of an original signal from quantized frame coefficients. Classical methods attempt only to minimize the ℓ_2-norm of a residual vector, ignoring bounds on the quantization noise. The approach described here uses deterministic qualities of quantization to arrive at the concept of consistent reconstruction [184]. Consistent reconstruction methods yield smaller reconstruction errors than classical methods.

2.2.2.1 Unquantized case

Let Φ be a frame and assume the notation of Section 2.2.1.1. In this subsection, we consider the problem of recovering x from {⟨x, ϕ_k⟩}_{k∈K}. Let F̃ : H → C^M be the frame operator associated with Φ̃. It can be shown that F̃^* F = I_N [37, Prop. 3.2.3]. Thus, a possible reconstruction formula is given by

x = \tilde{F}^* F x = \sum_{k \in K} \langle x, \varphi_k \rangle \tilde{\varphi}_k.

This formula is reminiscent of reconstruction from a Discrete Fourier Transform (DFT) representation, in which case

\varphi_k = \tilde{\varphi}_k = \frac{1}{\sqrt{N}} \left[ 1 \;\; e^{j 2\pi k/N} \;\; \cdots \;\; e^{j 2\pi k (N-1)/N} \right]^T \quad \text{for } k = 0, 1, \ldots, N - 1.

In the DFT and Inverse DFT, one set of vectors plays the roles of both Φ and Φ̃ because it is an orthonormal basis in C^N. Other reconstruction formulas are possible; for details the reader is referred to [37, §3.2].

2.2.2.2 Classical method

We now turn to the question of reconstructing when the frame coefficients {⟨x, ϕ_k⟩}_{k∈K} are degraded in some way. Any mode of degradation is possible, but most practical situations have additive noise due to measurement error or quantization. The latter case is emphasized because of its implications for efficient storage and transmission of information.

Suppose we wish to approximate x ∈ C^N given Fx + β, where F ∈ C^{M×N} is a frame operator and β ∈ C^M is a zero-mean noise, uncorrelated with x. Notice that FH = Ran(F) is an N-dimensional subspace of C^M. Hence the component of β perpendicular to FH should not hinder our approximation, and y can be approximated by the projection of Fx + β onto Ran(F). By [37, Prop. 3.2.3], this approximation is given by

\hat{x} = \tilde{F}^* (F x + \beta).   (2.9)

Furthermore, because the component of β orthogonal to Ran(F) does not contribute, we expect ‖x − x̂‖ = ‖F̃^* β‖ to be smaller than ‖β‖. The following proposition makes this more precise.

Proposition 2.2 (Noise reduction in linear reconstruction) Let {ϕ_k}_{k=1}^M be a frame of unit-norm vectors with frame bounds A and B and associated frame operator F. Let β = [β_1 β_2 ··· β_M]^T, where the β_i's are independent random variables with mean zero and variance σ^2. Then the MSE of the classical reconstruction (2.9) satisfies

\frac{M\sigma^2}{B^2} \le \mathrm{MSE} \le \frac{M\sigma^2}{A^2}.

Proof: See Appendix 2.A.2. 

Corollary 2.3 If the frame in Proposition 2.2 is tight,

\mathrm{MSE} = \frac{N^2 \sigma^2}{M} = \frac{N \sigma^2}{r}.

Now consider the case where the degradation is due to quantization. Let x ∈ R^N and y = Fx, where F ∈ R^{M×N} is a frame operator. Suppose ŷ = Q(y), where Q : R^M → R^M is a scalar quantization function; i.e., Q(y) = [q_1(y_1) q_2(y_2) ... q_M(y_M)]^T, where q_i : R → R, 1 ≤ i ≤ M, is a quantizer. Given ŷ, one approach to approximating x is to treat the quantization noise ŷ − y as an arbitrary random noise, independent in each dimension, and uncorrelated with y. The problem can be treated identically to the previous problem, and x̂ = F̃^* ŷ can be used. However, in actuality, the quantization noise ŷ − y is a deterministic quantity and is constrained by the subspace to which y must belong.
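In matrix form the classical reconstruction is just a least-squares solve; the sketch below is illustrative only (NumPy assumed, with an arbitrary random frame and uniform quantizer) and computes x̂ = (F^T F)^{-1} F^T ŷ via the pseudoinverse.

import numpy as np

rng = np.random.default_rng(2)
N, M, delta = 3, 9, 0.2
F = rng.standard_normal((M, N))          # frame operator (rank N with probability 1)
x = rng.standard_normal(N)

y_hat = delta * np.round(F @ x / delta)  # uniform scalar quantization of y = Fx
x_hat = np.linalg.pinv(F) @ y_hat        # classical/linear reconstruction
print(np.sum((x - x_hat) ** 2))          # squared error of the estimate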


Figure 2.3: Illustration of consistent reconstruction. The quantization noise ŷ − Fx has a component in F(R^N) (“in-subspace error”) and an orthogonal component (“out-of-subspace error”). Linear reconstruction x̂ = (F^T F)^{-1} F^T ŷ removes the out-of-subspace component but may give an estimate that is not in the set (Q ∘ F)^{-1}(ŷ), which is shaded.

2.2.2.3 Consistent reconstruction

Definition 2.1 (Consistency [184]) Let f : X → Y. Let x ∈ X and y = f(x). If f(x̂) = y, then x̂ ∈ X is called a consistent estimate of x from y. An algorithm that produces consistent estimates is called a consistent reconstruction algorithm. An estimate that is not consistent is said to be inconsistent.

The essence of consistency is that x̂ is a consistent estimate if it is compatible with the observed value of y, i.e., it is possible that x̂ is exactly equal to x. In other words, the set of consistent estimates for x is the inverse image of f(x) under f. It is intuitive to want a reconstruction algorithm to be consistent; the need for the term is precisely because many reconstruction algorithms are not. In the case of quantized frame expansions, f = Q ∘ F and one can give a geometric interpretation. Q induces a partitioning of R^M, which in turn induces a partitioning of R^N through the inverse map of Q ∘ F. A consistent estimate is simply one that falls in the same partition region as the original signal vector.

These concepts are illustrated for N = 2 and M = 3 in Figure 2.3. The ambient space is R^M. The cube represents the partition region in R^M containing y = Fx and has codebook value ŷ. The plane is F(R^N) and hence is the subspace within which any unquantized value must lie. The intersection of the plane with the cube gives the shaded triangle within which a consistent estimate must lie. Projecting to F(R^N), as in the classical reconstruction method, removes the out-of-subspace component of y − ŷ. As illustrated, this type of reconstruction is not necessarily consistent. Further geometric interpretations of quantized frame expansions are given in Appendix 2.B.

Assuming only that Q has convex partition regions, a consistent estimate can be determined using the projection onto convex sets (POCS) algorithm [215]. In this case, POCS generates a sequence of estimates by alternately projecting on F(R^N) and Q^{-1}(ŷ). When Q is a scalar quantizer and each component quantizer is uniform, a linear program can be used to find consistent estimates.

Table 2.1: Algorithm for consistent reconstruction from a quantized frame expansion.

1. Form \bar{F} = \begin{bmatrix} F \\ -F \end{bmatrix} and \bar{y} = \begin{bmatrix} \frac{1}{2}\Delta + \hat{y} \\ \frac{1}{2}\Delta - \hat{y} \end{bmatrix}.

2. Pick an arbitrary cost function c ∈ R^N.

3. Use a linear programming method to find x̂ to minimize c^T x̂ subject to \bar{F} \hat{x} \le \bar{y}.

For i = 1, 2, ..., M, denote the quantization step size in the ith component by ∆_i. For notational convenience, assume that the reproduction values lie halfway between decision levels. Then for each i, |ŷ_i − y_i| ≤ ∆_i/2. To obtain a consistent estimate, for each i we must have |(F x̂)_i − ŷ_i| ≤ ∆_i/2. Expanding the absolute value, we find the constraints

F \hat{x} \le \frac{1}{2}\Delta + \hat{y} \quad \text{and} \quad F \hat{x} \ge -\frac{1}{2}\Delta + \hat{y},

where ∆ = [∆_1 ∆_2 ... ∆_M]^T, and the inequalities are elementwise. These inequalities can be combined into

\begin{bmatrix} F \\ -F \end{bmatrix} \hat{x} \le \begin{bmatrix} \frac{1}{2}\Delta + \hat{y} \\ \frac{1}{2}\Delta - \hat{y} \end{bmatrix}.   (2.10)

The formulation (2.10) shows that x̂ can be determined through linear programming [179]. The feasible set of the linear program is exactly the set of consistent estimates, so an arbitrary cost function can be used. This is summarized in Table 2.1. A linear program always returns a corner of the feasible set [179, §8.1], so this type of reconstruction will not be close to the centroid of the partition cell. Since the cells are convex, one could use several cost functions to get different corners of the feasible set and average the results. Another approach is to use a quadratic cost equal to the distance from the projection estimate given by (2.9). Both of these variants on the basic algorithm will reduce the MSE by a constant factor. They do not change the asymptotic behavior of the MSE as the redundancy r is increased.
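Table 2.1 maps directly onto a standard linear-programming routine. The sketch below is one possible realization, not the software used for the experiments in this chapter; it assumes SciPy's linprog, a uniform quantizer with a common step ∆, and midpoint reconstruction.

import numpy as np
from scipy.optimize import linprog

def consistent_estimate(F, y_hat, delta, cost=None):
    # Find any x_hat with |(F x_hat)_i - y_hat_i| <= delta/2, per Table 2.1 and Eq. (2.10).
    M, N = F.shape
    A_ub = np.vstack([F, -F])                                      # F_bar
    b_ub = np.concatenate([y_hat + delta / 2, delta / 2 - y_hat])  # y_bar
    if cost is None:
        cost = np.ones(N)                                          # arbitrary cost function
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * N)
    assert res.success
    return res.x

rng = np.random.default_rng(3)
N, M, delta = 3, 12, 0.2
F = rng.standard_normal((M, N))
x = rng.standard_normal(N)
y_hat = delta * np.round(F @ x / delta)

x_consistent = consistent_estimate(F, y_hat, delta)
x_linear = np.linalg.pinv(F) @ y_hat
print(np.sum((x - x_consistent) ** 2), np.sum((x - x_linear) ** 2))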

2.2.2.4 Error bounds for consistent reconstruction

In orthogonal representations, under very general conditions, the MSE depends on the quantization step size as O(∆^2) for small ∆. For frame expansions, how does the MSE depend on r, for large r, and how does it depend on the reconstruction method? The MSE obtained with any reconstruction method depends in general on the distribution of the source. The evidence suggests that any consistent reconstruction algorithm is essentially optimal—in a sense made clear by the following propositions—and gives O(1/r^2) MSE.

Proposition 2.4 (MSE lower bound) Let x be a random variable with probability density function p supported on a bounded subset B of R^N. Consider any set of quantized frame expansions of x for which

\sup_M \max_{1 \le i \le M} \frac{\|\varphi_i\|}{\Delta_i} = d_0 < \infty.

Unless p is degenerate in a way which allows for exact reconstruction, any reconstruction algorithm will yield an MSE that can be lower bounded by b/r^2, where b is a coefficient independent of r and a function of N, p, the diameter D of B, and the maximum density value d_0.

Proof: A proof due to N. T. Thao appears in [79]. 

Proposition 2.5 (Squared error upper bound (DFT case)) Fix a source vector x ∈ R^N and a quantization step size ∆ ∈ R^+. For a sequence of consistent reconstructions from quantized frame expansions of x, the squared error can be upper bounded by an O(1/r^2) expression under the following conditions:

i. N odd: The frame operators are as in (2.8) and ‖[x_2 x_3 ··· x_N]^T‖ > (N + 1)∆/4; or

ii. N even: The frame operators are as in (2.8) with the first column removed and ‖x‖ > (N + 2)∆/4.

Proof: See Appendix 2.A.3. 

Conjecture 2.6 (MSE upper bound) Under mild conditions on the sequence of frames and source, any algorithm that gives consistent estimates will yield an MSE that can be upper bounded by an O(1/r^2) expression.

Conjecture 2.6 requires some sort of non-degeneracy condition because we can easily construct a sequence of frames for which the frame coefficients give no additional information as r is increased. For example, we can start with an orthonormal basis and increase r by adding copies of vectors already in the frame. Putting aside such pathological cases, simulations for quantization of a source uniformly distributed on [−1, 1]^N support this conjecture. Simulations were performed with three types of frame sequences:

I. A sequence of frames corresponding to oversampled A/D conversion, as given by (2.8). This is the case in which we have proven an O(1/r^2) SE upper bound.

II. For N = 3, 4, and 5, Hardin, Sloane and Smith have numerically found arrangements of up to 130 points on N-dimensional unit spheres that maximize the minimum Euclidean norm separation [90].

III. Frames generated by randomly choosing points on the unit sphere according to a uniform distribution.

Simulation results are given in Figure 2.4. The dashed, dotted, and solid curves correspond to frame types I, II, and III, respectively. The data points marked with +'s correspond to using a linear program based on (2.10) to find consistent estimates. The data points marked with ◦'s correspond to classical reconstruction. The important characteristics of the graph are the slopes of the curves. Note that O(1/r) MSE corresponds to a slope of −3.01 dB/octave, and O(1/r^2) MSE corresponds to a slope of −6.02 dB/octave. The consistent reconstruction algorithm exhibits O(1/r^2) MSE for each of the types of frames. The classical method exhibits O(1/r) MSE behavior, as expected. It is particularly interesting to note that the performance with random frames is as good as with either of the other two types.

The situation we have considered has a random vector source (or a fixed vector x outside of a ball centered at the origin) and deterministic quantization. A dual situation arises with an arbitrary nonzero vector x and randomized quantization. Results along these lines obtained with S. Rangan [158] are summarized in Appendix 2.C; they strongly support Conjecture 2.6. Taking the problem and its “dual” together emphasizes that hard bounds on noise greatly affect the quality of reconstruction that can be obtained from an overcomplete expansion.


Figure 2.4: Experimental results for reconstruction from quantized frame expansions. Shows O(1/r^2) MSE for consistent reconstruction and O(1/r) MSE for classical reconstruction.

2.2.2.5 Rate-distortion trade-offs

Accept for the moment Conjecture 2.6, that optimal reconstruction techniques give an MSE proportional to 1/r^2. Since the O(∆^2) MSE for orthogonal representations extends to the frame case as well, there are two ways to reduce the MSE by approximately a factor of four: double r or halve ∆. Our discussion has focused on expected distortion without concern for rate, and there is no reason to think that these options each have the same effect on the rate. As the simplest possible case, suppose a frame expansion is stored (or transmitted) as M b-bit numbers, for a total rate of Mb/N bits per sample. Doubling r gives 2M b-bit numbers, for a total rate of 2Mb/N bits per sample. On the other hand, halving ∆ results in M (b + 1)-bit numbers for a rate of only M(b + 1)/N bits per sample. This example suggests that halving ∆ is always the better option, but a few comments are in order. One caveat is that in some situations, doubling r and halving ∆ may have very different costs. For example, the higher cost of halving ∆ than of doubling r is a major motivating factor for oversampled A/D conversion. Also, if r is doubled, storing the result as 2M b-bit values is far from the best thing to do. This is because many of the M additional numbers give little or no information on x. This is discussed further in Appendix 2.B.

2.3 Adaptive Expansions

Transform coding theory, as summarized in Section 1.1.3, is predicated on fine quantization approximations and assuming that signals are Gaussian. For most practical coding applications, these assumptions do not hold, so the wisdom of maximizing coding gain—which leads to the optimality of Karhunen–Loève transforms—has been questioned. More fundamentally, we can leave the usual framework of static orthogonal transform coding and consider the application of adaptive and nonlinear transforms. The matching pursuit algorithm [131], described in Section 2.3.1, has both adaptive and nonlinear aspects. Given a source vector x and a frame {ϕ_k}_{k=1}^M, it produces an approximate signal representation x ≈ \sum_{i=0}^{n-1} α_i ϕ_{k_i}. It is adaptive in the sense that the k_i's depend on x, yet it is time-invariant for transforming a sequence of source vectors. On the other hand, it has a linear nature because it produces a linear representation, but it is nonlinear because it does not satisfy additivity.2

Matching pursuit is a greedy algorithm for choosing a subset of the frame and finding a linear combination of that subset that approximates a given signal vector. The use of a greedy algorithm is justified by the computational intractability of finding the optimal subset of the original frame [38, Ch. 2]. In our finite-dimensional setting, this is very similar to the problem of finding sparse approximate solutions to underdetermined systems of linear equations.

The sparse approximate solution problem is as follows. Denote the signal vector by x ∈ R^N. Let A ∈ R^{N×M} contain the expansion vectors. Then α ∈ R^M is an efficient representation of x if ‖Aα − x‖_2 is small and α has a small number of nonzero entries. For an unconstrained A, finding such an α is hard:

• Given ε ∈ R^+ and L ∈ Z^+, determining if there exists α with not more than L nonzero entries such that ‖Aα − x‖_2 ≤ ε is an NP-complete problem [139, 38].

• Given L ∈ Z^+, finding α which minimizes ‖Aα − x‖_2 among all vectors with not more than L nonzero entries is NP-hard [38].

Matching pursuit is equivalent to a well-known greedy heuristic for finding sparse approximate solutions to linear equations [62, 139], without the orthogonalization step. Quantization of coefficients in matching pursuit leads to many interesting issues; some of these are discussed in Section 2.3.2. A source coding method for R^N based on matching pursuit is described in Section 2.3.3.

2.3.1 Matching Pursuit

2.3.1.1 Algorithm

Let D = {ϕ_k}_{k=1}^M ⊂ H be a frame such that ‖ϕ_k‖ = 1 for all k. D is called the dictionary. Matching pursuit is an algorithm to represent x ∈ H by a linear combination of elements of D. In the first step of the algorithm, k_0 is selected such that |⟨ϕ_{k_0}, x⟩| is maximized. Then, x can be written as its projection onto ϕ_{k_0} and a residue R_1 x:

x = \langle \varphi_{k_0}, x \rangle \varphi_{k_0} + R_1 x.

The algorithm is iterated by treating R_1 x as the vector to be best approximated by a multiple of ϕ_{k_1}. At step p + 1, k_p is chosen to maximize |⟨ϕ_{k_p}, R_p x⟩| and

R_{p+1} x = R_p x - \langle \varphi_{k_p}, R_p x \rangle \varphi_{k_p}.

Identifying R_0 x = x, one can write

x = \sum_{i=0}^{n-1} \langle \varphi_{k_i}, R_i x \rangle \varphi_{k_i} + R_n x.   (2.11)

Hereafter, ⟨ϕ_{k_i}, R_i x⟩ will be denoted by α_i. Note that the output of a matching pursuit expansion is not only the coefficients (α_0, α_1, ...), but also the indices (k_0, k_1, ...). For storage and transmission purposes, we will have to account for the indices.

2 The usage of additivity is not obvious. Clearly if x_1 ≈ \sum_{i=0}^{n-1} α_i ϕ_{k_i} and x_2 ≈ \sum_{i=0}^{n-1} β_i ϕ_{k_i}, then x_1 + x_2 ≈ \sum_{i=0}^{n-1} (α_i + β_i) ϕ_{k_i}. But in general the expansions of x_1, x_2, and x_1 + x_2 would not use the same k_i's; for this reason the transform is nonlinear.

Matching pursuit was introduced to the signal processing community in the context of time-frequency analysis by Mallat and Zhang [131]. Mallat and his coworkers have uncovered many of its properties [224, 38, 39].
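For reference, a minimal (unquantized) matching pursuit in this finite-dimensional setting can be written as follows. This is an illustrative sketch (NumPy assumed; dictionary rows are unit-norm), not the implementation used for the experiments reported later in this chapter.

import numpy as np

def matching_pursuit(x, D, n_iter):
    # Greedy expansion x ~ sum_i alpha_i * phi_{k_i} as in Eq. (2.11).
    # D is an M x N array whose rows are the unit-norm dictionary elements.
    residual = np.array(x, dtype=float)
    indices, coeffs = [], []
    for _ in range(n_iter):
        corr = D @ residual                   # inner products <phi_k, R_i x>
        k = int(np.argmax(np.abs(corr)))      # greedy selection of k_i
        alpha = corr[k]
        residual = residual - alpha * D[k]    # R_{i+1} x = R_i x - alpha_i phi_{k_i}
        indices.append(k)
        coeffs.append(alpha)
    return indices, coeffs, residual

rng = np.random.default_rng(4)
D = rng.standard_normal((16, 8))
D /= np.linalg.norm(D, axis=1, keepdims=True)
x = rng.standard_normal(8)
idx, a, r = matching_pursuit(x, D, n_iter=4)
print(idx, np.sum(r ** 2) / np.sum(x ** 2))   # fraction of energy left in the residual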

2.3.1.2 Discussion

Since α_i is determined by projection, α_i ϕ_{k_i} ⊥ R_{i+1} x. Thus we have the “energy conservation” equation

\|R_i x\|^2 = \|R_{i+1} x\|^2 + \alpha_i^2.   (2.12)

This fact, the selection criterion for k_i, and the fact that D spans H can be combined for a simple convergence proof that applies for finite-dimensional spaces. In particular, the energy in the residue is strictly decreasing until x is exactly represented. Even in a finite-dimensional space, matching pursuit is not guaranteed to converge in a finite number of iterations. This is a serious drawback when exact (or very precise) signal expansions are desired, especially since an algorithm which picks dictionary elements jointly would choose a basis from the dictionary and get an exact expansion in N steps. One way to speed convergence is to use an orthogonalized version of MP which at each step modifies the dictionary and chooses a dictionary element perpendicular to all previously chosen dictionary elements. Since orthogonalized matching pursuit does not converge significantly faster than the non-orthogonalized version for a small number of iterations [38, 107, 194], orthogonalized matching pursuit is not considered hereafter.

Matching pursuit has been found to be useful in source coding for two related reasons: the first reason—which was emphasized in the original Mallat and Zhang paper [131]—has been termed flexibility; the second is that the nonlinear approximation framework allows greater energy compaction than a linear transform. MP is often said to have flexibility to differing signal structures. The archetypal illustration is that a Fourier basis provides a poor representation of functions well localized in time, while wavelet bases are not well suited to representing functions whose Fourier transforms have narrow, high frequency support [131]. The implication is that MP, with a dictionary including a Fourier basis and a wavelet basis, would avoid these difficulties.

Looking at the energy compaction properties of MP gives a more extensive view of the potential of MP. Energy compaction refers to the fact that after an appropriately chosen transform, most of the energy of a signal can be captured by a small number of coefficients. In orthogonal transform coding, getting high energy compaction is dependent on designing the transform based on knowledge of source statistics; for fine quantization of a stationary Gaussian source, the Karhunen–Loève transform is optimal (see Section 1.1.3.1). Although both produce an approximation of a source vector which is a linear combination of basis elements, orthogonal transform coding contrasts sharply with MP in that the basis elements are chosen a priori, and hence at best one can make the optimum basis choice on average. In MP, a subset of the dictionary is chosen in a per-vector manner, so much more energy compaction is possible.

To illustrate the energy compaction property of MP, consider the following situation. An N(0, I_N) source is to be transform coded. Because the components of the source are uncorrelated, no orthogonal transform will give energy compaction; so in the linear coding case, k coefficients will capture k/N of the signal energy. A k-step MP expansion will capture much more of the energy. Figure 2.5 shows the results of a simulation with N = 8. The plot shows the fraction of the signal energy in the residual when one- to four-term expansions are used.


Figure 2.5: Comparison of energy compaction properties for coding of an N(0, I_8) source. With a k-term orthogonal expansion, the residual has (8 − k)/8 of the energy (◦'s). The residual energy is much less with MP (solid curves).

The dictionaries are generated randomly according to a uniform distribution on the unit sphere. For an equal number of terms, the energy compaction is much better with MP than with a linear transform. Notice in particular that this is true even if the dictionary is not overcomplete, in which case MP has no more “flexibility” than an orthogonal basis representation.

2.3.2 Quantized Matching Pursuit

Define quantized matching pursuit (QMP) to be a modified version of matching pursuit which incorporates coefficient quantization. In particular, the inner product α_i = ⟨ϕ_{k_i}, R_i x⟩ is quantized to α̂_i = q(α_i) prior to the computation of the residual R_{i+1} x. The quantized value is used in the residual calculation:

R_{i+1} x = R_i x - \hat{\alpha}_i \varphi_{k_i}.

The use of the quantized value in the residual calculation reduces the propagation of the quantization error to subsequent iterations. Although quantized matching pursuit has been applied to low bit rate compression problems [141, 194, 107, 140], which inherently require coarse coefficient quantization, little work has been done to understand the qualitative effects of coefficient quantization in matching pursuit. Some of these effects are explored in this section. The relationship between quantized matching pursuit and other vector quantization (VQ) methods is discussed in Section 2.3.2.1. The issue of consistency in these expansions is explored in Section 2.3.2.2. The potential lack of consistency shows that even though matching pursuit is designed to produce a linear combination to estimate a given source vector, optimal reconstruction in the presence of coefficient quantization requires a nonlinear algorithm. Such an algorithm is provided. In Section 2.3.2.3, a detailed example of the application of QMP to quantization of an R^2-valued source is presented. This serves to illustrate the concepts from Section 2.3.2.2 and demonstrate the potential for improved reconstruction using consistency.
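The change from MP to QMP is essentially a one-line modification of the recursion: the coefficient is quantized before the residual update. A hedged sketch follows (NumPy assumed; the uniform quantizer with reconstruction points m∆ is an illustrative choice, not a prescription from the thesis).

import numpy as np

def quantized_matching_pursuit(x, D, n_iter, delta):
    # QMP: quantize alpha_i, then use the quantized value in the residual update.
    residual = np.array(x, dtype=float)
    output = []                                        # (k_i, alpha_hat_i) pairs, cf. Eq. (2.14)
    for _ in range(n_iter):
        corr = D @ residual
        k = int(np.argmax(np.abs(corr)))
        alpha_hat = delta * np.round(corr[k] / delta)  # q(alpha_i): uniform, reconstruction points m*delta
        residual = residual - alpha_hat * D[k]         # R_{i+1} x = R_i x - alpha_hat_i phi_{k_i}
        output.append((k, alpha_hat))
    return output, residual

rng = np.random.default_rng(5)
D = rng.standard_normal((12, 4))
D /= np.linalg.norm(D, axis=1, keepdims=True)
x = rng.standard_normal(4)

code, r = quantized_matching_pursuit(x, D, n_iter=2, delta=0.25)
x_hat = sum(a * D[k] for k, a in code)                 # linear reconstruction (2.13)
print(code, np.sum((x - x_hat) ** 2))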

2.3.2.1 Relationship to other vector quantization methods

A single iteration of matching pursuit is very similar to shape-gain VQ, which was introduced in [23]. In shape-gain VQ, a vector x ∈ R^N is separated into a gain, g = ‖x‖, and a shape, s = x/g. A shape ŝ is chosen from a shape codebook C_s to maximize ⟨x, ŝ⟩. Then a gain ĝ is chosen from a gain codebook C_g to minimize (ĝ − ⟨x, ŝ⟩)^2. The similarity is clear, with C_s corresponding to D and C_g corresponding to the quantizer for α_0; the only differences are that in MP one maximizes the absolute value of the correlation and thus the gain factor can be negative. Obtaining a good approximation in shape-gain VQ requires that C_s form a dense subset of the unit sphere in R^N. The number of codebook points needed to cover the unit sphere to a given resolution increases exponentially with N, making it difficult to use shape-gain VQ in high-dimensional spaces. A multiple-iteration application of matching pursuit can be seen as a cascade form of shape-gain VQ.

2.3.2.2 Consistency

We have thus far discussed only signal analysis (or encoding) using QMP and not synthesis (reconstruction) from a QMP representation. To the author's knowledge, all previous work with QMP has used

\hat{x} = \sum_{i=0}^{p-1} \hat{\alpha}_i \varphi_{k_i},   (2.13)

which results from simply using quantized coefficients in (2.11) and setting the final residual to zero. Computing this reconstruction has very low complexity, but its shortcoming is that it disregards the effects of quantization; hence, it can produce inconsistent estimates. Suppose p iterations of QMP are performed with the dictionary D and denote the output by

\mathrm{QMP}(x) = (k_0, \hat{\alpha}_0, k_1, \hat{\alpha}_1, \ldots, k_{p-1}, \hat{\alpha}_{p-1}).   (2.14)

Denote the output of QMP (with the same dictionary and quantizers) applied to x̂ by

\mathrm{QMP}(\hat{x}) = (k_0', \hat{\alpha}_0', k_1', \hat{\alpha}_1', \ldots, k_{p-1}', \hat{\alpha}_{p-1}').

By the definition of consistency (Section 2.2.2.3), x̂ is a consistent estimate of x if and only if k_i' = k_i and α̂_i' = α̂_i for i = 0, 1, ..., p − 1. We now develop a description of the set of consistent estimates of x through simultaneous linear inequalities. For notational convenience, we assume uniform scalar quantization of the coefficients with step size ∆ and midpoint reconstruction.3 The selection of k_0 implies

|\langle \varphi_{k_0}, x \rangle| \ge |\langle \varphi, x \rangle| \quad \text{for all } \varphi \in \mathcal{D}.   (2.15)

For each element of D \ {ϕ_{k_0}}, (2.15) specifies a pair of half-space constraints with boundary planes passing through the origin. An example of such a constraint in R^2 is shown in Figure 2.6(a). If ϕ_{k_0} is the vector with the solid arrowhead, chosen from all of the marked vectors, the source vector must lie in the hatched area. For

3 Ambiguities on partition cell boundaries due to arbitrary tie-breaking—in both dictionary element selection and nearest-neighbor scalar quantization—are ignored.


Figure 2.6: (a) Illustration of consistency constraint (2.15) in R^2; (b) illustration of consistency constraints (2.15) & (2.16) in R^3.

N > 2, the intersection of these constraints is two infinite convex polyhedral cones situated symmetrically with their apexes at the origin. The value of α̂_0 gives the constraint

\langle \varphi_{k_0}, x \rangle \in \left[ \hat{\alpha}_0 - \frac{\Delta}{2}, \; \hat{\alpha}_0 + \frac{\Delta}{2} \right].   (2.16)

This specifies a pair of planes, perpendicular to ϕ_{k_0}, between which x must lie. Constraints (2.15) and (2.16) are illustrated in Figure 2.6(b) for R^3. The vector with the solid arrowhead was chosen among all the marked dictionary vectors as ϕ_{k_0}. Then the quantization of α_0 implies that the source vector lies in the volume shown.

At the (i + 1)st step, the selection of k_i gives the constraints

\left| \left\langle \varphi_{k_i}, \; x - \sum_{\ell=0}^{i-1} \hat{\alpha}_\ell \varphi_{k_\ell} \right\rangle \right| \ge \left| \left\langle \varphi, \; x - \sum_{\ell=0}^{i-1} \hat{\alpha}_\ell \varphi_{k_\ell} \right\rangle \right| \quad \text{for all } \varphi \in \mathcal{D}.   (2.17)

This defines M − 1 pairs of linear half-space constraints with boundaries passing through \sum_{\ell=0}^{i-1} \hat{\alpha}_\ell \varphi_{k_\ell}. As before, these define two infinite pyramids situated symmetrically with their apexes at \sum_{\ell=0}^{i-1} \hat{\alpha}_\ell \varphi_{k_\ell}. Then α̂_i gives

\left\langle \varphi_{k_i}, \; x - \sum_{\ell=0}^{i-1} \hat{\alpha}_\ell \varphi_{k_\ell} \right\rangle \in \left[ \hat{\alpha}_i - \frac{\Delta}{2}, \; \hat{\alpha}_i + \frac{\Delta}{2} \right].   (2.18)

This again specifies a pair of planes, now perpendicular to ϕ_{k_i}, between which x must lie. By being explicit about the constraints as above, we see that, except in the case that 0 ∈ [α̂_i − ∆/2, α̂_i + ∆/2] for some i, the partition cell defined by (2.14) is convex.4 Thus by using an appropriate projection operator, one can find a consistent estimate from any initial estimate. The partition cells are intersections of cells of the form shown in Figure 2.6(b).

Notice now that contrary to what would be surmised from (2.13), k_i gives some information on the signal even if α̂_i = 0. The experiments in Section 2.3.3.3 show that when α̂_i = 0, it tends to be inefficient in a rate–distortion sense to store or transmit k_i. If we know α̂_i = 0, but do not know the value of k_i, then

4 The “hourglass” cell that results from 0 ∈ [α̂_i − ∆/2, α̂_i + ∆/2] does not make consistent reconstruction more difficult, but is intuitively undesirable in a rate–distortion sense.


Figure 2.7: Probability that (2.13) gives an inconsistent reconstruction for two-iteration expansions of an R^4-valued source.

(2.17)–(2.18) reduce to

\left| \left\langle \varphi, \; x - \sum_{\ell=0}^{i-1} \hat{\alpha}_\ell \varphi_{k_\ell} \right\rangle \right| \le \frac{\Delta}{2} \quad \text{for all } \varphi \in \mathcal{D}.   (2.19)

Experiments were performed to demonstrate that (2.13) often gives inconsistent estimates and to assess how the probability of an inconsistent estimate depends on the dictionary size and the quantization. Presented here are results for an R^4-valued source with the N(0, I) distribution. The consistency of reconstruction was checked for two-iteration expansions with dictionaries generated randomly according to a uniform distribution on the unit sphere. Dictionary sizes of M = 4, 8, ..., 20 were used. The quantization was uniform with reconstruction points {m∆}_{m∈Z}. The results are shown in Figure 2.7. The probability of inconsistency goes to zero for very coarse quantization and goes to one for fine quantization. The dependence on dictionary size and lack of monotonicity indicate complicated geometric factors. Similar experiments with different sources and dictionaries were reported in [66].

As noted earlier, the cells of the partition generated by QMP are convex or the union of two convex cells that share one point. This fact allows the computation of consistent estimates through the method of alternating projections [215]. One would normally start with an initial estimate given by (2.13). Given an estimate x̂, the algorithm given in Table 2.2 performs the one “most needed” projection; namely, the first projection needed in enforcing (2.15)–(2.18). Among the possible projections in enforcing (2.17), the one corresponding to the largest deviation from consistency is performed. For notational convenience and concreteness, we again assume uniform quantization with q_i(α_i) = m∆ ⟺ α_i ∈ [(m − 1/2)∆, (m + 1/2)∆); steps 5 and 6 could easily be adjusted for a general quantizer.

In a broadly applicable special case, the inequalities (2.15)–(2.18) can be manipulated into a set of elementwise inequalities Ax ≤ b suitable for reconstruction using linear or quadratic programming, where A and b are 2Mp × N and 2Mp × 1, respectively, and A and b depend only on the QMP output. This formulation is possible when each QMP iteration either a) uses a quantizer with zero as a decision point; or b) uses a quantizer which maps a symmetric interval to zero, and the value of k_i is discarded when α̂_i = 0.

Consider first the case where q_i has zero as a decision point. For notational convenience, we will assume

Table 2.2: Projection algorithm for consistent reconstruction from a QMP representation.

1. Set c = 0. This is a counter of the number of steps of QMP that x̂ is consistent with.

2. Let \bar{x} = \hat{x} - \sum_{i=0}^{c-1} \hat{\alpha}_i \varphi_{k_i}, where it is understood that the summation is empty for c = 0.

3. Find ϕ ∈ D that maximizes |⟨ϕ, x̄⟩|. If ϕ = ϕ_{k_c}, go to step 5; else go to step 4.

4. (x̂ is not consistent with k_c.) Let \tilde{\varphi}_{k_c} = \mathrm{sgn}(\langle \varphi_{k_c}, \bar{x} \rangle)\, \varphi_{k_c} and \tilde{\varphi} = \mathrm{sgn}(\langle \varphi, \bar{x} \rangle)\, \varphi. Let \hat{x} = \hat{x} - \frac{\langle \tilde{\varphi}_{k_c} - \tilde{\varphi}, \bar{x} \rangle}{\|\tilde{\varphi}_{k_c} - \tilde{\varphi}\|^2} (\tilde{\varphi}_{k_c} - \tilde{\varphi}), the orthogonal projection of x̂ onto the set described by (2.17). Terminate.

5. (x̂ is consistent with k_c.) If ⟨ϕ_{k_c}, x̄⟩ ∈ [α̂_c − (1/2)∆, α̂_c + (1/2)∆], go to step 7; else go to step 6.

6. (x̂ is not consistent with α̂_c.) Let

\beta = \mathrm{sgn}(\langle \varphi_{k_c}, \bar{x} \rangle - \hat{\alpha}_c) \cdot \min\left\{ \left| \langle \varphi_{k_c}, \bar{x} \rangle - \left(\hat{\alpha}_c + \tfrac{\Delta}{2}\right) \right|, \; \left| \langle \varphi_{k_c}, \bar{x} \rangle - \left(\hat{\alpha}_c - \tfrac{\Delta}{2}\right) \right| \right\}.

Let \hat{x} = \hat{x} - \beta \varphi_{k_c}, the orthogonal projection of x̂ onto the set described by (2.18). Terminate.

7. (x̂ is consistent with α̂_c.) Increment c. If c = p, terminate (x̂ is consistent); else go to step 2.

Consider first the case where q_i has zero as a decision point. For notational convenience, we will assume the decision points and reconstruction values are given by {m∆_i}_{m∈Z} and {(m + 1/2)∆_i}_{m∈Z}, respectively, but all that is actually necessary is that the quantized coefficient α̂_i reveals the sign of the unquantized coefficient α_i. Denote sgn(α_i) by σ_i, and furthermore define the following two (M − 1) × N matrices:

F_i = [ϕ_{k_i}  ϕ_{k_i}  ···  ϕ_{k_i}]^T,
F̄_i = [ϕ_1  ···  ϕ_{k_i − 1}  ϕ_{k_i + 1}  ···  ϕ_M]^T.

First, write (2.17) as

|ϕ_{k_i}^T (x − c)| ≥ |ϕ^T (x − c)|   for all ϕ ∈ D,

where c is shorthand for Σ_{ℓ=0}^{i−1} α̂_ℓ ϕ_{k_ℓ}. Combining the M − 1 nontrivial inequalities gives

σ_i F_i (x − c) ≥ |F̄_i (x − c)|.

Expanding the absolute value one can obtain the stacked inequalities

[ F̄_i − σ_i F_i ; −F̄_i − σ_i F_i ] x ≤ [ (F̄_i − σ_i F_i) c ; (−F̄_i − σ_i F_i) c ].   (2.20)

Writing (2.18) first as

|ϕ_{k_i}^T (x − c) − α̂_i| ≤ ∆_i / 2,

one easily obtains

[ ϕ_{k_i}^T ; −ϕ_{k_i}^T ] x ≤ [ ∆_i/2 + α̂_i + ϕ_{k_i}^T c ; ∆_i/2 − α̂_i − ϕ_{k_i}^T c ].   (2.21)

On the other hand, if q_i maps an interval [−∆_i/2, ∆_i/2] to zero and k_i is not coded, then (2.19) leads similarly to the 2M inequalities

[ F ; −F ] x ≤ [ Fc + ∆_i/2 ; −Fc + ∆_i/2 ].   (2.22)

A formulation of the form Ax ≤ b is obtained by stacking inequalities (2.20)–(2.22) appropriately.

2.3.2.3 An example in R²

Consider quantization of an R²-valued source. Assume that two iterations will be performed with the four-element dictionary

D = { [cos((2k − 1)π/8)  sin((2k − 1)π/8)]^T }_{k=1}^{4}.

Even if the distribution of the source is known, it is difficult to find analytical expressions for optimal quantizers. (Optimal quantizer design is considered for a source with a uniform distribution on [−1, 1]² in [66, §3.3.2].) Since we wish to use fixed, untrained quantizers, we will use uniform quantizers for α_0 and α_1. It will generally be true that ϕ_{k_0} ⊥ ϕ_{k_1}, so it makes sense for the quantization step sizes for α_0 and α_1 to be equal.

The partitions generated by matching pursuit are very intricate. Suppose the quantizer has reconstruction values {m∆}_{m∈Z} and decision points {(m + 1/2)∆}_{m∈Z} for some quantization step size ∆. (The partition is somewhat different when the quantizer has different decision points [66, §3.3.2], but the ensuing conclusions are qualitatively unchanged.) The first quadrant of the resulting partition is shown in Figure 2.8. Heavy lines indicate partition boundaries, except that dotted lines are used for boundaries that are created by the choice of k_0 (k_1) but, depending on the reconstruction method, might not be important because α̂_0 = 0 (α̂_1 = 0). In this partition, most of the cells are squares, but there are also some smaller cells. The fraction of cells that are not square goes to zero as ∆ → 0.

This quantization of R² gives concrete examples of the inconsistency resulting from using (2.13). The linear reconstruction points are indicated in Figure 2.8 by ◦'s. The light line segments connect these to the corresponding optimal reconstruction points, where optimality is with respect to a uniform source distribution. Such a line segment crossing a cell boundary indicates a case of (2.13) giving an inconsistent estimate.

2.3.3 Lossy Vector Coding with Quantized Matching Pursuit

This section explores the efficacy of using QMP as an algorithm for lossy compression. In order to reveal qualitative properties most clearly, simple dictionaries and sources are used in the initial experiments. (Experiments with other dictionaries and sources appear in [66].) Then a few experiments with still images are presented. We do not explore the design of a dictionary or scalar quantizers for a particular application. Dictionary structure has a great impact on the computational complexity of QMP as demonstrated, for example, in [63, 64].

For simplicity, rate and distortion are measured by sample entropy and MSE per component, respectively. The sources used in the initial experiments are multidimensional Gaussian with zero mean and independent components. The inner product quantization is uniform with midpoint reconstruction values at {m∆}_{m∈Z}. Furthermore, the quantization step size ∆ is constant across iterations. This is consistent with equal weighting of error in each direction.
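For reference, a minimal sketch of the QMP encoder and the linear reconstruction (2.13) is given below (plain NumPy, illustrative function names); it follows the greedy selection/quantization loop described above and is not tuned for the complexity reductions of [63, 64].

```python
import numpy as np

def qmp_encode(x, D, Delta, p):
    """Quantized matching pursuit: p greedy iterations with uniform quantization.

    x     : (N,) source vector
    D     : (M, N) dictionary of unit-norm vectors (rows)
    Delta : quantization step size (reconstruction values at multiples of Delta)
    p     : number of iterations
    Returns the index/coefficient pairs (k_i, alpha_hat_i).
    """
    residual = x.copy()
    code = []
    for _ in range(p):
        inner = D @ residual
        k = int(np.argmax(np.abs(inner)))               # largest-magnitude inner product
        alpha_hat = Delta * np.round(inner[k] / Delta)  # midpoint-reconstruction quantizer
        code.append((k, alpha_hat))
        residual = residual - alpha_hat * D[k]          # quantized term removed from the residual
    return code

def qmp_linear_reconstruct(code, D):
    """Linear reconstruction as in (2.13): sum of the quantized terms."""
    return sum(a * D[k] for k, a in code)

# Example: N = 4, random dictionary of M = 11 unit-norm vectors
rng = np.random.default_rng(0)
D = rng.standard_normal((11, 4))
D /= np.linalg.norm(D, axis=1, keepdims=True)
x = rng.standard_normal(4)
code = qmp_encode(x, D, Delta=0.5, p=3)
print(code, np.linalg.norm(x - qmp_linear_reconstruct(code, D)))
```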

2.3.3.1 Basic experimental results

In the first experiment, N = 4 and the dictionary was composed of M = 11 maximally spaced points on the unit sphere [90]. Rate was measured by summing the (scalar) sample entropies of k_0, k_1, ..., k_{p−1} and α̂_0, α̂_1, ..., α̂_{p−1}, where p is the number of iterations. The results are shown in Figure 2.9. The three dotted curves correspond to varying p from 1 to 3 while reconstructing according to (2.13). The points along each dotted curve are obtained by varying ∆. Notice that the number of iterations that minimizes the distortion depends on the available rate. The solid curve is the convex hull of these R-D operating points (converted to a dB scale). In subsequent graphs, only this convex hull performance is shown.

Figure 2.8: Partitioning of first quadrant of R² by quantized matching pursuit with four-element dictionary (heavy lines). Linear reconstruction points (◦'s) are connected to optimal reconstruction points (×'s) by light line segments.

Figure 2.9: Performance comparison between reconstruction based on (2.13) and consistent reconstruction; SNR (dB) versus rate (bits/sample) for p = 1, 2, 3. N = 4 and the dictionary is composed of M = 11 maximally spaced points on the unit sphere [90].

2.3.3.2 Improved reconstruction using consistency

Continuing the experiment above, the degree of improvement obtained by using a consistent reconstruction algorithm was ascertained. Using consistent reconstruction gives the performance shown by the dashed curve in Figure 2.9. Notice that there is no improvement at low bit rates because consistency is not an issue for a single-iteration expansion. The improvement increases monotonically with the bit rate.

2.3.3.3 An effective stopping criterion

Regardless of the reconstruction method, the coding results shown in Figure 2.9 are far from satisfactory, especially at low bit rates. For a p-step expansion, the "baseline" coding method is to apply entropy codes (separately) to k_0, α̂_0, k_1, α̂_1, ..., k_{p−1}, α̂_{p−1}. This coding places a rather large penalty of roughly log₂ M bits on each iteration, i.e., this many bits must be spent in addition to the coding of the coefficient. In particular, the minimum achievable bit rate is about (log₂ M)/N.

Assume that the same scalar quantization function is used at each iteration and that the quantizer maps a symmetric interval to zero. Based on a few simple observations, we can devise a simple alternative coding method which greatly reduces the rate. The first observation is that if α̂_i = 0, then α̂_j = 0 for all j > i because the residual remains unchanged. Secondly, if α̂_i = 0, then k_i carries relatively little information. Thus we propose that a) α̂_i = 0 be used as a stopping criterion which causes a block to be terminated even if the maximum number of iterations has not been reached; and b) k_i be considered conceptually to come after α̂_i = 0, so k_i is not coded if α̂_i = 0. A sketch of the modified encoder loop appears below.

Simulations were performed with the same source, dictionary, and quantizers as before to demonstrate the improvement due to the use of a stopping criterion. The results, shown in Figure 2.10, indicate a sizable improvement at low bit rates.

Figure 2.10: Performance comparison between a fixed number of iterations and a simple stopping criterion; SNR (dB) versus rate (bits/sample). N = 4 and the dictionary is composed of M = 11 maximally spaced points on the unit sphere [90].

2.3.3.4 Further explorations

Having established the merits of consistent reconstruction and the stopping criterion of Section 2.3.3.3, we now explore the effects of varying the size of the dictionary. Again the source is i.i.d. Gaussian in blocks of N = 4 samples, and dictionaries generated randomly according to a uniform distribution on the unit sphere were used. Figure 2.11 shows the performance of QMP with M = 4, 8, ..., 20 (solid curves) and of independent uniform scalar quantization followed by entropy coding (dotted curve). The performance of QMP improves as M is increased and exceeds that of independent uniform scalar quantization at low bit rates. This result highlights the advantage of a nonlinear transform, since no linear transform would give any coding gain for this source. (The use of random dictionaries is not advocated; slightly better performance is expected with an appropriately chosen fixed dictionary.)

In the final experimental investigation, we consider the lowest complexity instance of QMP. This occurs when the dictionary is an orthonormal set. In this case, QMP reduces to nothing more than a linear transform followed by sorting by absolute value and quantization. Here we code an i.i.d. Gaussian source with block sizes N = 1, 2, ..., 8; of course, N = 1 gives independent uniform scalar quantization of each sample. The results shown in Figure 2.12 indicate that even in this computationally simple case without a redundant dictionary, QMP performs well at low bit rates. An interesting phenomenon is revealed: N = 1 is best at high bit rates and N = 2 is best at low bit rates; no larger value of N is best at any bit rate.

2.3.3.5 Image coding experiments

Inspired by the good low bit-rate performance shown in Figure 2.12, QMP was applied to still image coding with a DCT-basis dictionary. Using just a basis contrasts sharply with previous attempts to use MP for image coding [12].

Figure 2.11: Performance of QMP as the dictionary size is varied (solid curves, labeled by M) compared to the performance of independent uniform quantization of each sample (dotted curve); SNR (dB) versus rate (bits/sample).

Figure 2.12: Performance of QMP with an orthogonal basis dictionary as the block size N is varied (curves labeled by N = 1, 2, ..., 8); SNR (dB) versus rate (bits/sample).

Figure 2.13: Simulation results for coding Barbara: (a) energy compaction (fraction of energy retained versus number of coefficients retained per block); and (b) operational rate–distortion performance (PSNR in dB versus rate in bits/pixel, not including DC coefficients).

The experiments compare a "baseline" system with a QMP-based system. Both work on 8 × 8 pixel image blocks which have been transformed using a two-dimensional separable DCT, and both use uniform scalar quantization with equal step size for each DCT coefficient. The baseline system uses 64 scalar entropy codes, one for each horizontal/vertical frequency pair. The other system uses QMP-based coding as in Section 2.3.3.4. Experiments were performed on several standard 512 × 512 pixel, 8-bit deep grayscale test images. To simplify the experiments, the coding of DC coefficients was ignored. Since the DC coefficient was always the largest in magnitude, the relative performances of the two systems were unaffected by this simplification. The peak signal-to-noise ratio (PSNR) figures and reconstructed images shown are based on exact knowledge of the DC coefficient.

Numerical results on compression of Barbara are shown in Figure 2.13. Part (a) shows the greater energy compaction of QMP over a linear transform, as in Figure 2.5, and part (b) gives operational rate–distortion results. The curves were generated by first finding the lower convex hull of rate–distortion operating points and then converting the distortion measure to PSNR. The QMP-based method gave higher PSNR for bit rates up to 0.27 bits/pixel. (Recall that this rate does not include the coding of the DC coefficient.) The peak improvement of 0.133 dB occurs at 0.050 bits/pixel. Figure 2.14 allows subjective quality comparisons. Two coded versions of Barbara are shown at a rate of 0.075 bits/pixel. The difference in the two images is slight at best. Of possible interest is that the QMP coder achieves essentially identical performance while spending slightly more than half of its bits on the index information. The same coding method was also applied to residual frames in motion-compensated video coding (see Figure 1.9). Results appear in [74].
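Because the dictionary here is an orthonormal DCT basis, a QMP pass over an 8 × 8 block amounts to transforming, then repeatedly coding the largest-magnitude remaining coefficient until one quantizes to zero. A minimal sketch of that per-block step (plain NumPy, hypothetical names; the entropy coding stage is omitted) is:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix, applied separably to the rows and columns of a block."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] /= np.sqrt(2)
    return C * np.sqrt(2.0 / n)

def qmp_code_block(block, Delta, C=dct_matrix()):
    """QMP over the orthonormal 2-D DCT basis: pick by magnitude, quantize, stop at zero."""
    coeffs = C @ block @ C.T                    # 2-D separable DCT of an 8x8 block
    flat = coeffs.ravel().copy()
    flat[0] = 0.0                               # DC coefficient excluded, as in the experiments
    code = []
    for _ in range(flat.size):
        k = int(np.argmax(np.abs(flat)))
        alpha_hat = Delta * np.round(flat[k] / Delta)
        if alpha_hat == 0.0:                    # stopping criterion of Section 2.3.3.3
            break
        code.append((k, alpha_hat))
        flat[k] -= alpha_hat                    # only quantization error (< Delta/2) remains;
                                                # if it ever becomes the largest value, the loop stops
    return code
```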

(a) Baseline method, PSNR = 23.8 dB (b) QMP-based coder, PSNR = 24.0 dB

Figure 2.14: Results for compression of Barbara at 0.075 bits/pixel (with exactly known DC coefficients): (a) coded with baseline method; (b) coded with QMP.

2.3.3.6 A few possible variations

The experiments of the previous subsections are the tip of the iceberg in terms of possible design choices. To conclude the discussion of source coding, a few possible variations are presented along with plausibility arguments for their application. An obvious area to study is the design of dictionaries. For static, untrained dictionaries, issues of inter- est include not only R-D performance, but also storage requirements, complexity of inner product computation, and complexity of largest inner product search. There is no a priori reason to use the same dictionary at every iteration. Given a p iteration estimate, the entropy of kp becomes a limiting factor in adding the results of an additional iteration. To reduce this entropy, it might be useful to use coarser dictionaries as the iterations proceed. Another possibility is to adapt the dictionary by augmenting it with samples from the source. (Dictionary elements might also be deleted or adjusted.) The decoder would have to be aware of changes in the dictionary, but depending on the nature of the adaptation, this may come without a rate penalty.

The experimental results that have been presented are based on entropy coding each α̂_i independently of the indices, which are in turn coded separately; there are other possibilities. Joint entropy coding of indices was explored in [77, 66]. Also, conditional entropy coding could exploit the likelihood of consecutively chosen dictionary vectors being orthogonal or nearly orthogonal.

Finally, for a broad class of source distributions, the distributions of the α_i's will have some common properties because they are similar to order statistics. For example, the probability density of α_0 will be small near zero. This could be exploited in quantizer design.

2.4 Conclusions

This chapter has considered the effects of coefficient quantization in overcomplete expansions. Two classes of overcomplete expansions were considered: fixed (frame) expansions and expansions that are adapted to each particular source sample, as given by matching pursuit. In each case, the possible inconsistency of linear reconstruction was exhibited, computational methods for finding consistent estimates were given, and the distortion reduction due to consistent reconstruction was experimentally assessed. For a quantized frame expansion with redundancy r, it was proven that any reconstruction method will give MSE that can be lower bounded by an O(1/r²) expression. Backed by experimental evidence and a proof of a restricted case, it was conjectured that any reconstruction method that gives consistent estimates will have an MSE that can be upper bounded by an O(1/r²) expression. Taken together, these suggest that optimal reconstruction methods will yield O(1/r²) MSE, and that consistency is sufficient to ensure this asymptotic behavior.

Experiments on the application of quantized matching pursuit as a vector compression method demonstrated good low bit rate performance when an effective stopping criterion was used. Since it is a successive approximation method, matching pursuit may be useful in a multiresolution framework, and the inherent hierarchical nature of the representation is amenable to unequal error protection methods for transmission over noisy channels. Because of the dependencies between outputs of successive iterations, MP might also work well coupled with adaptive and/or universal lossless coding.

Appendices

2.A Proofs

2.A.1 Proof of Theorem 2.1

Let Φ_M = {ϕ_k}_{k=1}^M. The corresponding frame operator is given by

F = [ϕ_1  ϕ_2  ···  ϕ_M]^T.

Thus the (i, j)th element of M⁻¹F*F is given by

(M⁻¹F*F)_{ij} = (1/M) Σ_{k=1}^M (F*)_{ik} F_{kj} = (1/M) Σ_{k=1}^M F_{ki} F_{kj} = (1/M) Σ_{k=1}^M (ϕ_k)_i (ϕ_k)_j,

where (ϕ_k)_i is the ith component of ϕ_k.

First consider the diagonal elements (i = j). Since for a fixed i, the random variables (ϕ_k)_i, 1 ≤ k ≤ M, are independent, identically distributed, and have zero mean, we find that

E[(M⁻¹F*F)_{ii}] = µ₂,
Var[(M⁻¹F*F)_{ii}] = (1/M)(µ₄ − ((M − 3)/(M − 1)) µ₂²),   (2.23)

where µ₂ = E[(ϕ_k)_i²] and µ₄ = E[(ϕ_k)_i⁴] [152, §8-1]. For the off-diagonal elements (i ≠ j),

E[(M⁻¹F*F)_{ij}] = 0,   (2.24)
Var[(M⁻¹F*F)_{ij}] = (1/M) E[(ϕ_k)_i² (ϕ_k)_j²].   (2.25)

Noting that µ₂ and µ₄ are independent of M, (2.23) shows that Var[(M⁻¹F*F)_{ii}] → 0 as M → ∞, so (M⁻¹F*F)_{ii} → µ₂ in the mean-squared sense [152, §8-4]. Similarly, (2.24) and (2.25) show that for i ≠ j, (M⁻¹F*F)_{ij} → 0 in the mean-squared sense. This completes the proof, provided µ₂ = 1/N.

We now derive explicit formulas (depending on N) for µ₂, µ₄, and E[(ϕ_k)_i²(ϕ_k)_j²]. For notational convenience, the subscript k is omitted; instead, subscripts are used to identify the components of the vector.

To compute expectations, we need an expression for the joint probability density of (ϕ_1, ϕ_2, ..., ϕ_N). Denote the N-dimensional unit sphere (centered at the origin) by S_N. Since ϕ is uniformly distributed on S_N, the p.d.f. of ϕ is given by

f(ϕ) = 1/c_N   for all ϕ ∈ S_N,   (2.26)

where c_N is the surface area of S_N. Using spherical coordinates, c_N is given by

c_N = ∫₀^{2π} dθ (∫₀^π sin ω_1 dω_1)(∫₀^π sin² ω_2 dω_2) ··· (∫₀^π sin^{N−2} ω_{N−2} dω_{N−2}).   (2.27)

Using (2.26), we can make the following calculation:

µ₂ = E[ϕ_i²] = E[ϕ_N²]
   = ∫_{S_N} (ϕ_N² / c_N) dA,   where dA is a differential area element,
   = (1/c_N) ∫₀^{2π} dθ (∫₀^π sin ω_1 dω_1) ··· (∫₀^π sin^{N−3} ω_{N−3} dω_{N−3}) · (∫₀^π cos² ω_{N−2} sin^{N−2} ω_{N−2} dω_{N−2})   (2.28)
   = (∫₀^π sin^{N−2} ω_{N−2} dω_{N−2})^{−1} (∫₀^π cos² ω_{N−2} sin^{N−2} ω_{N−2} dω_{N−2})   (2.29)
   = 1/N.

In this calculation, (2.28) results from using spherical coordinates and (2.29) follows from substituting (2.27) and cancelling like terms. The final simplification is due to a standard integration formula [168, #323]. Similar calculations give µ₄ = E[ϕ_i⁴] = 3[N(N + 2)]^{−1} and, for i ≠ j, E[ϕ_i²ϕ_j²] = [N(N + 2)]^{−1}.
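The conclusion of the theorem, that M⁻¹F*F approaches N⁻¹I for frames drawn uniformly on the unit sphere, is easy to check numerically; a small sketch (NumPy, illustrative values only) is:

```python
import numpy as np

def random_unit_frame(M, N, rng):
    """M frame vectors drawn i.i.d. from the uniform distribution on the unit sphere in R^N."""
    F = rng.standard_normal((M, N))
    return F / np.linalg.norm(F, axis=1, keepdims=True)

rng = np.random.default_rng(0)
N = 4
for M in (10, 100, 10000):
    F = random_unit_frame(M, N, rng)
    G = F.T @ F / M                      # M^{-1} F^* F for a real frame
    err = np.linalg.norm(G - np.eye(N) / N)
    print(f"M = {M:6d}   ||M^-1 F^T F - I/N||_F = {err:.4f}")
```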

2.A.2 Proof of Proposition 2.2

Subtracting x̂ = Σ_{k=1}^M (⟨x, ϕ_k⟩ + β_k) ϕ̃_k from x = Σ_{k=1}^M ⟨x, ϕ_k⟩ ϕ̃_k gives

x − x̂ = − Σ_{k=1}^M β_k ϕ̃_k.

Then we can calculate

MSE = E‖x − x̂‖² = E‖Σ_{k=1}^M β_k ϕ̃_k‖²
    = E[ Σ_{i=1}^M Σ_{k=1}^M β_i β̄_k ϕ̃_i^* ϕ̃_k ] = Σ_{i=1}^M Σ_{k=1}^M δ_{ik} σ² ϕ̃_i^* ϕ̃_k   (2.30)
    = σ² Σ_{k=1}^M ‖ϕ̃_k‖² = σ² Σ_{k=1}^M ‖(F*F)^{−1} ϕ_k‖²,   (2.31)

where (2.30) results from evaluating expectations using the conditions on β, and (2.31) uses (2.5). From (2.4) we can derive B^{−2}‖ϕ_k‖² ≤ ‖(F*F)^{−1}ϕ_k‖² ≤ A^{−2}‖ϕ_k‖², which simplifies to

B^{−2} ≤ ‖(F*F)^{−1}ϕ_k‖² ≤ A^{−2}   (2.32)

because of the normalization of the frame. Combining (2.31) and (2.32) completes the proof.

2.A.3 Proof of Proposition 2.5

The proof is based on establishing the hypotheses of the following lemma:

Lemma 2.7 Assume x_c(t) defined in (2.7) has at least n = 2W + 1 quantization threshold crossings (QTCs) and consider sampling at a rate of M samples per period. Then there exist constants c > 0 and r_0 ≥ 1, depending only on x_c(t), such that for any M = rn ≥ r_0 n, whenever x_c(t) and x′_c(t) have the same quantized sampled versions,

(1/T) ∫_T |x_c(t) − x′_c(t)|² dt < c/r².


Figure 2.15: One period of the signal used in the proof of Lemma 2.8.

Proof: This is a version of [184, Thm. 4.1] for real-valued signals. 

The following lemma gives a rough estimate which allows us to relate signal amplitude to signal power; here the power of y(t) with period T is the standard notion (1/T) ∫_T |y(t)|² dt:

Lemma 2.8 Among zero-mean, periodic signals with power P, the minimum possible peak-to-peak amplitude is 2√P.

Proof: We will construct a signal y(t) with power P of minimum peak-to-peak amplitude. For convenience, let T = 1. Without loss of generality, we can assume that there exists c, 0 < c < 1, such that y(t) < 0 for t < c and y(t) > 0 for t > c. Then, to have minimum amplitude for given power, y(t) must be piecewise constant as in Figure 2.15, with a > 0 and b > 0. The mean and power constraints can be combined to give ab = P. Under this constraint, the amplitude a + b is uniquely minimized by a = b = √P.

This final lemma relates the peak-to-peak amplitude of a continuous signal to its quantization threshold crossings:

Lemma 2.9 A continuous, periodic signal with peak-to-peak amplitude > A which is subject to uniform quantization with step size ∆ has at least 2⌊A/∆⌋ quantization threshold crossings per period.

Proof: Consider first a signal y(t) with peak-to-peak amplitude A. The “worst case” is for A = k∆ with k ∈ Z, and for min y(t) and max y(t) to lie at quantization thresholds. In this case, the most we can guarantee is k − 1 “increasing” QTCs and k − 1 “decreasing” QTCs per period. If the peak-to-peak amplitude exceeds A, this worst

case cannot happen, and we get at least 2⌊A/∆⌋ QTCs.

Proof of the proposition:

i. N odd: Quantized frame expansion of x with M frame vectors is precisely equivalent to quantized sampling of x_c(t) with M samples per period (see Section 2.2.1.2). Denote the quantized frame expansion of x and the corresponding continuous-time signal by x′ and x′_c(t), respectively. It is easy to verify that the average time-domain SE (1/T) ∫_T |x_c(t) − x′_c(t)|² dt is the same as the vector SE ‖x − x′‖². Let x̃_c(t) = x_c(t) − x_1. Then x̃_c(t) is a zero-mean, T-periodic signal with power ‖[x_2 x_3 ··· x_N]‖², which by hypothesis is greater than ((N + 1)∆/4)². Applying Lemma 2.8 we conclude that x̃_c(t) has peak-to-peak amplitude greater than (N + 1)∆/2. Since x_c(t) has precisely the same peak-to-peak amplitude as x̃_c(t), we can apply Lemma 2.9 to x_c(t) to conclude that x_c(t) has at least 2⌊((N + 1)∆/2)/∆⌋ = 2⌊(N + 1)/2⌋ = N + 1 QTCs. Applying Lemma 2.7 with n = N completes the proof.

ii. N even: We need only make slight adjustments from the previous case. Let

x_c(t) = Σ_{k=1}^{N/2} ( x_{2k−1} √2 cos(2πkt/T) + x_{2k} √2 sin(2πkt/T) ),

and define x′ and x′_c(t) correspondingly. Again the average time-domain SE is the same as the vector SE. The power of x_c(t) equals ‖x‖² > ((N + 2)∆/4)². Applying Lemmas 2.8 and 2.9 implies that x_c(t) has at least 2⌊((N + 2)∆/2)/∆⌋ = 2⌊(N + 2)/2⌋ = N + 2 QTCs. Applying Lemma 2.7, this time with n = N + 1 to match the form of (2.7), completes the proof.

Note that the bounds in the hypotheses of the proposition are not tight. This is evidenced in particular by the fact that the bound in Lemma 2.8 is not attainable by bandlimited signals. For example, for N = 2 the minimum peak-to-peak amplitude is √2 · 2√P and for N = 4 the minimum is ≈ 1.3657 · 2√P, compared to the bound of 2√P. Because of Gibbs' phenomenon, the bound is not even asymptotically tight, but a more complicated lemma would serve no purpose here.

2.B Frame Expansions and Hyperplane Wave Partitions

This appendix gives an interpretation of frame coefficients as measurements along different directions. Given a frame Φ = {ϕ_k}_{k=1}^M, the kth component of y = Fx is y_k = ⟨x, ϕ_k⟩. Thus y_k is a measurement of x along ϕ_k. We can thus interpret y as a vector of M "measurements" of x in directions specified by Φ. Notice that in the original basis representation of x, we have N measurements of x with respect to the directions specified by the standard basis. Each of the N measurements is needed to fix a point in R^N. On the other hand, the M measurements given in y have only N degrees of freedom.

Now suppose y is scalar-quantized to give ŷ by rounding each component to the nearest multiple of ∆. Since y_k specifies the measurement of a component parallel to ϕ_k, y_k = (i + 1/2)∆ specifies an (N − 1)-dimensional hyperplane perpendicular to ϕ_k. Thus quantization of y_k gives a set of parallel hyperplanes spaced by ∆, called a hyperplane single wave. The M hyperplane single waves give a partition with a particular structure called a hyperplane wave partition [185]. Examples of hyperplane wave partitions are shown in Figure 2.16. In each figure, a set of vectors comprising a frame in R² is shown superimposed on the hyperplane wave partition induced by quantized frame expansion with that frame.

Increasing the redundancy r of a frame can now be interpreted as increasing the number of directions in which x is measured. It is well known that MSE is proportional to ∆². Section 2.2.2.4 presents a conjecture that MSE is proportional to 1/r². This conjecture can be recast as saying that, asymptotically, increasing directional resolution is as good as increasing coefficient resolution.

In Section 2.2.2.5 it was mentioned that coding each component of ŷ separately is inefficient when r ≫ 1. This can be explained by reference to Figure 2.16. Specifying ŷ_1 and ŷ_2 defines a parallelogram within which x lies. Then there are a limited number of possibilities for ŷ_3. (In Figure 2.16(a), there are exactly two possibilities. In Figure 2.16(b), there are three or four possibilities.) Then with ŷ_1, ŷ_2, and ŷ_3 specified, there are yet fewer possibilities for ŷ_4. If this is exploited fully in the coding, the bit rate should only slightly exceed the logarithm of the number of partition cells.

Figure 2.16: Examples of hyperplane wave partitions in R²: (a) M = 3; (b) M = 5.

2.C Recursive Consistent Estimation

In the main text, reconstruction from quantized overcomplete expansions was considered with deterministic scalar quantization. Though deterministic quantization is most often used in practice, randomized (dithered) quantization is convenient for theoretical study. This approach is described in this appendix. In particular, after assuming that Q in Figure 2.1 is a subtractively dithered uniform scalar quantizer, one can obtain an O(1/r²) convergence result with no restriction on the source distribution and a very mild restriction on the frame sequence. Moreover, the O(1/r²) behavior is obtained with a simple, recursive reconstruction algorithm. (The development and analysis of the recursive reconstruction algorithm are due to Rangan; the ramifications in this context have been explored collaboratively [158].) This reconstruction algorithm is suboptimal, but unlike standard linear reconstruction algorithms it is optimal to within a constant factor.

2.C.1 Introduction

It is common to analyze systems including a quantizer by modeling the quantizer as a source of signal-independent additive white noise. This model is precisely correct only when one uses subtractive dithered quantization [84, 126, 85] (see also Section 1.1.2.3), but for simplicity it is often assumed to hold for coarse, undithered quantization. What can easily be lost in using this model is that the distribution of the quantization noise can be important, especially its boundedness.

A focus of this chapter has been reconstruction from quantized overcomplete linear expansions. Assuming subtractive dither, this can be abstracted as the estimation of an unknown vector x ∈ R^N from measurements y_k ∈ R:

y_k = ϕ_k^T x + e_k   for 0 ≤ k ≤ M − 1,   (2.33)

where ϕ_k ∈ R^N are known vectors and e_k is an independently identically distributed (i.i.d.) noise process with each e_k uniformly distributed on [−δ, δ]. The maximum noise magnitude δ > 0 is half of the quantization step size and is known a priori. Estimation problems of this form may arise elsewhere as well. At issue are the quality of reconstruction that is possible and the efficient computation of good estimates.

The classical method for estimating the unknown vector x is least-squares estimation [127, 92], which attempts to find x̂ such that the ℓ₂-norm of the residual sequence y_k − ϕ_k^T x̂ is minimized. As in the undithered case, a least-squares estimate may be inconsistent with the bounds on the quantization noise and yields O(1/M) MSE (see Section 2.2.2.3; since the dimension N is fixed, O(1/r) and O(1/M) are interchangeable). A consistent reconstruction algorithm can easily be devised by modifying Table 2.1. Pursuant to the discussion of Section 2.2.2.4, we could expect the distortion to be bounded above by an O(1/M²) expression, but as of yet we have no way to prove this. In addition, the consistent estimation calculation is computationally expensive. Given M data points, finding a consistent estimate requires the solution of a linear program with N variables and 2M constraints. No recursive implementation of this computation is presently known.

This appendix introduces a simple, recursively implementable estimator with a provable O(1/M²) MSE. The proposed estimator is similar to a consistent estimator except that the estimates are only guaranteed to be consistent with the most recent data point. The estimator can be realized with an extremely simple update rule which avoids any linear programming. Under suitable assumptions on the vectors ϕ_k, the simple estimator "almost" achieves the conjectured O(1/M²) MSE. Under mild conditions on the a priori probability density of x, the MSE decay rate of any reconstruction algorithm is bounded below by O(1/M²). Thus, the proposed estimator is optimal to within a constant factor. An O(1/M²) lower bound was shown in [185] under weaker assumptions that do not require uniformly distributed white noise. However, with the uniformly distributed white noise model considered here, one can derive a simple expression for the constant in this lower bound [158]. The lower bound is derived from a recently developed version of the Ziv-Zakai bound [234] presented in [10].

The following section describes the proposed algorithm and its convergence rate. A numerical example is given in Section 2.C.3 to demonstrate the O(1/M²) MSE of the proposed algorithm, a fully consistent reconstruction algorithm, and the Ziv-Zakai lower bound. To contextualize this work, the ramifications for source coding are discussed in Section 2.C.4. These results may also be of interest to harmonic analysts studying the robustness of various overcomplete representations, such as nonorthonormal discrete wavelet expansions [37].

2.C.2 Proposed Algorithm and Convergence Properties

Suppose x ∈ R^N is an unknown vector, and we obtain a set of observations y_k given by (2.33). We wish to find an estimate, x̂_k, of the unknown vector x from the data y_i and ϕ_i for i = 0, 1, ..., M − 1. The noise e_k is unknown, but bounded: |e_k| ≤ δ for all k. Consider the following simple recursive estimation scheme:

x̂_{k+1} = x̂_k + (ϕ_k / (ϕ_k^T ϕ_k)) φ(y_k − ϕ_k^T x̂_k),   (2.34)

where

φ(e) = { 0, if |e| ≤ δ;  e − δ, if e > δ;  e + δ, if e < −δ. }   (2.35)

(φ is a soft-thresholding function.) Any initial estimate x̂_0 may be used.
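A direct transcription of (2.34)-(2.35) (NumPy; variable names are illustrative) shows how little computation each update needs:

```python
import numpy as np

def soft_threshold(e, delta):
    """phi of (2.35): zero inside [-delta, delta], shrink toward zero outside."""
    if abs(e) <= delta:
        return 0.0
    return e - delta if e > delta else e + delta

def recursive_consistent_estimate(y, Phi, delta, x0=None):
    """Recursive estimator (2.34): project onto the constraint set of the newest sample only.

    y     : (M,) observations y_k = phi_k^T x + e_k with |e_k| <= delta
    Phi   : (M, N) array whose kth row is phi_k
    delta : known noise bound (half the quantization step size)
    """
    M, N = Phi.shape
    x_hat = np.zeros(N) if x0 is None else np.asarray(x0, dtype=float).copy()
    for k in range(M):
        phi = Phi[k]
        residual = y[k] - phi @ x_hat
        x_hat = x_hat + (phi / (phi @ phi)) * soft_threshold(residual, delta)
    return x_hat
```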

The motivation behind this estimator is simple. If an observation y_k is consistent with the estimate x̂_k (i.e., |y_k − ϕ_k^T x̂_k| ≤ δ), then the estimate is unchanged. That is, x̂_{k+1} = x̂_k. If the observation is not consistent, then x̂_{k+1} is taken to be the closest point to x̂_k consistent with the observation. Two main results concerning this algorithm have been obtained. The first result states that the estimation error decreases monotonically for any noise sequence e_k with |e_k| ≤ δ. No statistical assumptions are needed.

Theorem 2.10 Fix a vector x ∈ R^N, and consider the algorithm (2.34) acting on a sequence of observations y_k given by (2.33). If |e_k| ≤ δ, then

‖x − x̂_{k+1}‖ ≤ ‖x − x̂_k‖.

Proof: A proof due to Rangan appears in [158]. 

For the second result, we impose the following assumptions.

Assumptions For the measurements (2.33) and algorithm (2.34),

(a) e_k and ϕ_k are independently distributed random processes, independent from one another;

(b) e_k is uniformly distributed on [−δ, δ]; and

(c) there exist constants L > 0 and σ > 0 such that for all k, ‖ϕ_k‖² ≤ L and

E[ |ϕ_k^T z| ] ≥ σ‖z‖   for all z ∈ R^N.   (2.36)

These assumptions provide the simplest scenario in which to examine the algorithm, and are similar to those used in the classical analysis of the least mean squares (LMS) algorithm (see for example [127, 92]). The assumption (2.36), in particular, is a standard and mild persistent excitation condition.

The independence assumption on the vectors ϕ_k is, however, somewhat restrictive, especially for analysis with deterministic ϕ_k. It should be noted that Assumption (a) does not require the vectors ϕ_k to be identically distributed or have zero mean.

Theorem 2.11 Fix a vector x ∈ R^N, and consider the algorithm (2.34) acting on a sequence of observations y_k given by (2.33). If Assumptions (a)-(c) are satisfied then, for every p < 1,

k^p ‖x − x̂_k‖ → 0   almost surely.

Proof: A proof by Rangan appears in [158]. 

Theorem 2.11 is the main result on the performance of the algorithm (2.34). The result states that, under suitable assumptions, the estimation error ‖x − x̂_k‖² converges to zero and, for all p < 1, the rate of convergence is o(k^{−2p}). In this sense, the mean square error is "almost" O(1/k²). As stated in the introduction, this rate is superior to the O(1/k) attained by classical least-squares estimation.

2.C.3 A Numerical Example

As a numerical test, the performance of the proposed recursive algorithm was compared against two other reconstruction methods: a linear programming (LP) algorithm and a classical recursive least squares (RLS) algorithm. For each of the algorithms, the average MSE was measured as a function of the number of samples.

The linear programming (LP) algorithm selects the vector x which minimizes the ℓ∞ norm of the residual sequence y_k − ϕ_k^T x. This estimate corresponds to the maximum likelihood estimate when both x and δ are treated as unknown. The computation of the LP estimate involves the solution of a linear program with N + 1 variables and 2M constraints, where M is the number of samples. This computation cannot be implemented recursively, and the linear program must be recomputed with each new sample. The LP estimate is the most computationally demanding of the three algorithms tested, but is the only one that produces estimates consistent with the noise bounds on all the samples available.

The recursive least squares (RLS) algorithm selects the vector x which minimizes the ℓ₂ norm of the residual sequence y_k − ϕ_k^T x. The RLS estimate is not in general consistent with the noise bounds, but can be computed with a simple recursive update [92].

For the test, data in (2.33) was generated with N = 4 and {ϕ_k} being an i.i.d. process, uniformly distributed on the unit sphere in R⁴. We used a noise bound of δ = 1. The algorithms were started with an initial error of x − x̂_0 = [1, 1, 1, 1]^T. Figure 2.17(a) shows the results of a single simulation. As expected, the proposed recursive method yields nonincreasing distortion. Figure 2.17(b) shows the averaged results of 1000 simulations. Also plotted is the Ziv-Zakai MSE lower bound from [158]. The asymptotic slopes of the curves confirm the O(1/M) MSE for least-squares estimation and the O(1/M²) MSE for the consistent LP estimation and the proposed algorithm. While very simple and recursive, the proposed algorithm performs only a constant factor worse than the non-recursive consistent reconstruction and the theoretical lower bound.

2.C.4 Implications for Source Coding and Decoding

Thus far our discussion has been limited to the problem of estimating x given the y_k and ϕ_k sequences. This section presents the implications for using an entropy-coded version of y_k as a source encoding for x. Specifically, let x ∈ R^N be an arbitrary source vector. A representation of x can be formed through y = Q_K(Fx), where F is an M × N matrix and Q_K(·) is an optimal K-dimensional entropy-coded dithered lattice quantizer [231, 220]. (The K = 1 case is the uniform scalar quantizer used in previous sections.) If x comes from sampling a bandlimited, periodic signal at the Nyquist rate and F is a Fourier matrix, this corresponds to the encoding for bandlimited continuous-time signals considered by Zamir and Feder [221]. They showed that, using least-squares reconstruction, the MSE for the scheme is fixed as long as the ratio of the oversampling factor to the second moment of the quantizer is kept constant. They also showed that as the dimension of the lattice quantizer is increased, the performance of this scheme for a Gaussian source and MSE distortion measure approaches the rate–distortion bound [222].


Figure 2.17: Comparison of three reconstruction algorithms: (a) single simulation; (b) average of 1000 simulations; squared-error distortion versus number of samples. "RLS" refers to the recursive minimum ℓ₂-error reconstruction; "LP" refers to a linear program reconstruction which computes an estimate consistent with the smallest possible noise δ; and "Recursive" refers to the proposed algorithm. "ZZ lower bound" is the Ziv-Zakai theoretical lower bound.

Instead of least-squares reconstruction, now consider using algorithm (2.34) for estimating the vector x from the quantized data, y = Q_K(Fx). Although our analysis does not directly apply to the case when F is a deterministic matrix, Assumption (c) is satisfied under any arbitrarily small random perturbation of the rows of F. Thus, our analysis for random matrices F should apply to generic deterministic matrices as well.

Also, although we have described the algorithm (2.34) only for the scalar quantization case K = 1, it is straightforward to extend the algorithm to the lattice quantization case when K > 1. For the K = 1 case, we have taken each sample to specify a pair of hyperplane constraints on x. This can be viewed as a rotated and shifted version of the Cartesian product of a one-dimensional lattice quantizer cell (an interval) and R^{N−1}. For general K, each set of K samples specifies a constraint on x that is a rotated and shifted version of the Cartesian product of a K-dimensional cell with R^{N−K}. An iterative reconstruction algorithm could update its estimate every K samples with the nearest point of this set.

Theorem 2.11 shows that the reconstruction algorithm (2.34), or the extension of the algorithm for lattice vector quantization, will attain an O(1/r²) MSE, where r = M/N is the oversampling ratio. This rate is superior to the O(1/r) rate attained by least-squares reconstruction, so a reconstruction algorithm that utilizes the hard bounds on the quantization error may have rate–distortion performance better than that described in [221, 222]. In the limit of high oversampling, the MSE would remain fixed when the ratio of the square of the oversampling ratio to the second moment of the quantizer is kept constant. Furthermore, with the extension to lattice quantization, the performance may approach the rate–distortion bound more quickly as the lattice dimension is increased. The last comment bears further investigation.

2.C.5 Final Comments

These results serve as a reminder that simple linear filtering is not optimal for removing non-Gaussian additive noise, even if it is white. Accordingly, the improvement from consistent reconstruction is not because of the determinism of quantization noise, but because of its boundedness.

Motivated by a source coding application, and for concreteness in the proof of Theorem 2.11, uniformly distributed noise was assumed. However, the algorithm itself uses only the bound on the noise δ. This raises the broader issue of the value of hard information. The Cramér-Rao theorem implies that, in the presence of i.i.d. noise with an unbounded, smooth distribution, the MSE of any estimator is bounded below by an O(1/M) rate. That this classic limit can be surpassed for certain bounded noise models suggests that hard information may be fundamentally more informative than "soft," or probabilistic, information. In many systems, all the signals, including the noise, can be bounded using certain physical considerations. This sort of "hard" information should be exploited fully.

Chapter 3

On-line Universal Transform Coding

A SOURCE CODING TECHNIQUE that asymptotically (in the length of the data sequence) achieves the best possible performance amongst a class of codes for all sources within some class is called universal. The ubiquity of universal lossless coding methods is well documented, the commonplace example being that most general-purpose lossless compression is done with some variant of Lempel–Ziv encoding [233]. Universal lossy coding is less common in practice and is an active area of research.

This chapter presents two algorithms for transform coding of sources that emit independent and identically distributed (i.i.d.) Gaussian N-tuples. The algorithms use on-line adaptation, meaning that the encoder and decoder adapt in unison based on the coded data without the explicit transmission of coder parameters. The first algorithm uses subtractive dither and is shown to be universal among high rate transform coders of fixed dimension N. Similar results are shown for a second algorithm that does not use dithering.

3.1 Introduction

Zhang and Wei [227] have identified three typical approaches to universal lossy coding. It is instructive to explore where the present algorithm fits in this taxonomy before delving into the details. The first approach is to use a "universal codebook" which has been trained in advance and is available to the encoder and decoder. Conceptually, the universal codebook contains the union of the optimal codebooks for each source in the class. The coding operation can be described as a two-stage process: in the first stage, a codebook is selected; in the second stage, this codebook is used to code the source. The transmission of the choice of codebook is overhead information; one hopes to make this negligibly small. This approach was used in an existence proof by Neuhoff et al. [142] and in universal code design by Chou et al. [28]. A second approach does not use codebooks which have been trained in advance. Instead, the encoder constructs a codebook based on a finite length observation of the signal and transmits the codebook to the decoder. Methods that are (forward) adaptive fall into this category. Universality of such methods was studied by Ziv [229, 230] and others.

The third, pursued in this work, is to adapt the codebook simultaneously at the encoder and decoder in such a way that the codebook does not need to be transmitted. This is called on-line adaptation, backward adaptation, or adaptation without side information. String matching algorithms (like Lempel–Ziv) used in universal lossless coding are examples of on-line adaptation schemes, and attempts have been made to extend these to lossy coding [178, 129]. Zhang and Wei's "gold washing" algorithm [227] also uses on-line adaptation. The algorithms introduced in this chapter are the first on-line universal transform coding methods. Though several papers on adaptive transform coding have appeared in the literature (e.g., see [42]), it seems that Effros and Chou [46] have done the only work on universal transform coding. Thus this work can be viewed as an "on-line" alternative to [46]. The results of [46] were somewhat inspiring to this study because they indicated superior performance of weighted universal transform coding over weighted universal VQ for reasonable vector dimensions.

The literature contains some more restrictive definitions of "universal." If one requires performance to approach the rate–distortion bound, as opposed to the best performance obtainable in the particular class of coders, then no transform coding method would be universal because of the loss due to scalar quantization of transform coefficients. Davisson [40] requires a universal code to be blockwise memoryless, i.e., each block must be coded independent of past and future blocks. To reduce the possibility of confusion, one can refer to the technique introduced in this chapter as "adaptive universal." However, to insist on this distinction is somewhat disingenuous. If the method presented here is used to code KN-tuples, we can view it as a universal code of block length KN where the performance of an optimal N-tuple transform coder is obtained arbitrarily closely as K → ∞. The particular construction of the code happens to have the property that successive N-tuples can be decoded without waiting for the end of an entire KN-length block.

The two algorithms that are analyzed in this chapter are defined in the following section. The main results are stated in Section 3.3 and proven in Section 3.4. Section 3.5 describes ways the algorithms can be modified to reduce computational complexity or to operate on non-i.i.d. sources. Experimental results, on both synthetic sources and images, are presented in Section 3.6. Section 3.7 offers some final thoughts.

(This chapter includes research conducted jointly with Jun Zhuang and Martin Vetterli [82, 80, 81]; in addition, Christopher Chan contributed to [82].)

3.2 Proposed Coding Methods

The fundamental purpose of source coding is to remove redundancy. When scalar quantization and scalar entropy coding are to be used, correlation between components of a vector is a form of undesirable redundancy. Transform coding (and the choice of the transform therein) is based on removing this simple form of redundancy.

Let {x_n}_{n∈Z⁺} be a sequence of independent, identically distributed, zero-mean Gaussian random vectors of dimension N with correlation matrix R_x = E[xx^T]. (Throughout the chapter, R_v will be used to denote the exact covariance matrix E[vv^T] of a random vector v; R̂_v denotes an estimate of R_v obtained from a finite length observation. Aside from this convention, subscripts indicate the time index of a variable, except where two subscripts are given to indicate the row and column indices or where noted.) Suppose the source is to be transform coded in a system with unbounded uniform scalar quantization and scalar entropy coding. As discussed in Section 1.1.3, an optimal transform is one that results in uncorrelated transform coefficients, i.e., a Karhunen–Loève transform (KLT) of the source.

The problem considered here is universal transform coding of such i.i.d. Gaussian sources with unknown correlation matrix R_x. It is easy to imagine a forward adaptive system which estimates R_x, computes an

[Figure 3.1 block diagram: x_n → T_n → (+ z_n) → Q → E → channel → E^{−1} → (− z_n) → T_n^{−1} → x̂_n, with the transform adaptation at both ends driven by the decoded data.]

Figure 3.1: Structure of universal transform coding system with subtractive dither. T_n is a time-varying orthogonal transform, Q is a scalar quantizer, and E is a scalar entropy coder. Dithering makes the transform-domain quantization error y_n − ŷ_n independent of {x_k}_{k∈Z⁺}. In turn, x_n − x̂_n is uncorrelated with (but not independent of) {x_k}_{k=1}^n. Thus moments estimated from x̂_n suffice for the purpose of adapting the transform T_n.

3.2.1 System with Subtractive Dither

The first coding method is depicted in Figure 3.1. Let T_1 be an arbitrary orthogonal matrix. This will be the initial transform used. Let {z_n} be an i.i.d. dither sequence uniformly distributed on the hypercube [−∆/2, ∆/2]^N, where ∆ > 0. Q represents a uniform scalar quantizer with step size ∆ and E represents a universal scalar entropy coder. The dither signal may or may not be known to the entropy coder; we consider mainly the latter case. The diagram completely specifies the operation of the coder at a particular time step:

y_n = T_n x_n,
ŷ_n = Q(y_n + z_n) − z_n,
x̂_n = T_n^{−1} ŷ_n.

It remains to specify the transform update mechanism.

(Rissanen considers sources described by finite dimensional parametric models, like ours, which take values in a countable set, unlike ours.)

Optimal performance is obtained when the transform diagonalizes R_x. In order for on-line adaptation to be possible, the adaptation must depend only on quantized data. Thus, form the estimate

R̂_x̂^(n) = (1/n) Σ_{k=1}^n x̂_k x̂_k^T   (3.1)

from the quantized data and compute the eigendecomposition of R̂_x̂^(n) in order to fix T_{n+1} such that T_{n+1} R̂_x̂^(n) T_{n+1}^T is diagonal with nonincreasing diagonal elements. The calculation of T_{n+1} will always have sign ambiguities. If the eigenvalues of R̂_x̂^(n) are not distinct, there will be additional ambiguities. These can be resolved arbitrarily.
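A compact simulation of this encoder/decoder pair (NumPy; illustrative parameter values, the entropy coder E omitted, and a plain running-mean covariance estimate with no complexity reductions) might look like the sketch below.

```python
import numpy as np

def klt_from_estimate(R):
    """Orthogonal T whose rows are eigenvectors of R, ordered by nonincreasing eigenvalue."""
    eigval, eigvec = np.linalg.eigh(R)          # ascending eigenvalues
    return eigvec[:, ::-1].T                    # rows = eigenvectors, largest eigenvalue first

def dithered_backward_adaptive_coder(X, Delta, rng):
    """Code the rows of X with the subtractively dithered, backward-adaptive transform coder.

    Both encoder and decoder can form the same R_hat from the decoded vectors x_hat,
    so no transform parameters need to be transmitted (on-line adaptation).
    """
    n_vec, N = X.shape
    T = np.eye(N)                               # arbitrary initial orthogonal transform T_1
    R_hat = np.zeros((N, N))                    # running estimate (3.1), shared with the decoder
    X_hat = np.empty_like(X)
    for n in range(n_vec):
        z = rng.uniform(-Delta / 2, Delta / 2, size=N)      # subtractive dither
        y = T @ X[n]
        y_hat = Delta * np.round((y + z) / Delta) - z       # uniform quantizer, dither removed
        x_hat = T.T @ y_hat                                  # T orthogonal, so T^{-1} = T^T
        X_hat[n] = x_hat
        R_hat = (n * R_hat + np.outer(x_hat, x_hat)) / (n + 1)
        T = klt_from_estimate(R_hat)                         # T_{n+1} from quantized data only
    return X_hat

# Example: Gaussian source with (R_x)_{ij} = 0.8^{|i-j|}
rng = np.random.default_rng(0)
N = 8
Rx = 0.8 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
X = rng.multivariate_normal(np.zeros(N), Rx, size=5000)
X_hat = dithered_backward_adaptive_coder(X, Delta=0.25, rng=rng)
print("per-component MSE:", np.mean((X - X_hat) ** 2))
```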

3.2.2 Undithered System

The second coding method is identical to the first except that the dither signal is eliminated. Also, specify the scalar quantizer function used on each component to be q: R → R given by

q(x) = i∆   for (i − 1/2)∆ ≤ x < (i + 1/2)∆.   (3.2)

This algorithm is more attractive for practical application but is more difficult to analyze. The transform update strategy does not use information on the form of the source distribution, i.e., the source is characterized only by its covariance matrix. Appendix 3.A gives a short discussion of how a more general modeling of the source distribution could be used instead.

3.3 Main Results

The main results are summarized in this section. Proofs are given in Section 3.4.

3.3.1 System with Subtractive Dither

Theorem 3.1 (Convergence of system with subtractive dither) For any initial transform T_1, R̂_x̂^(n) converges in mean square to R_x + (∆²/12) I as n → ∞. Also, the sequence of transforms {T_n} converges in mean square to a KLT for the source.

Notice that the limit of R̂_x̂^(n) depends on ∆, but the convergence does not. Also, although we are assuming Gaussian signals throughout, the proof of the theorem does not depend on the distribution of the source. Hence this system will converge to a transform which maximizes coding gain for any i.i.d. source. However, for non-Gaussian sources maximizing coding gain may not be ideal. When the source is Gaussian, the KLT is the optimal transform and the entropies of the quantized variables are easily estimated. This leads to the following theorem:

Theorem 3.2 (High rate universality of system with subtractive dither) Denote the eigenvalues of R_x by λ_1, λ_2, ..., λ_N. Let L_n denote the per component code length for coding the first n vectors using the universal scheme where the dither signal is not known to the entropy coder and let L_n^⋆ denote the per component code length for coding the first n vectors with a fixed, optimal transform coder (no dither, and transform and scalar entropy coder designed with knowledge of R_x). Then the average excess rate n^{−1}(L_n − L_n^⋆) converges in mean square to a constant ρ. Estimating discrete entropies using differential entropies [34],

ρ < (1/(2N)) Σ_{i=1}^N log₂(1 + ∆²/(12λ_i)).   (3.3)

Figure 3.2: Bound (3.3) on the excess rate ρ (bits per component) as a function of the coding rate for a Gaussian source with (R_x)_{ij} = 0.8^{|i−j|}.

The constant ρ can be interpreted as the asymptotic redundancy of the system. It is the excess rate, in bits per source component, of the universal system, as compared to a fixed, optimal transform code designed with knowledge of R_x. (When the dither signal is known at the entropy coder, performance better than the worst case given by (3.3) can be expected [166].) The origin of this excess rate is the dither signal: H(Q(w_n)) > H(Q(y_n)). The bound

(3.3) comes simply from the variance of w_n. As ∆ → 0, Var(w_n) → Var(y_n) and accordingly ρ → 0. Thus the system is universal for high rate coding. At moderate rates, ρ is quite small. For example, consider the coding of an eight-dimensional Gaussian source with (R_x)_{ij} = 0.8^{|i−j|}. By computing the bound (3.3) and the correspondence between ∆ and the rate of a KLT coder for this particular source, one gets the curve shown in Figure 3.2. This roughly indicates that ρ must decay exponentially with the overall coding rate. In fact, using the high rate approximation

∆²/12 ≈ (πe/6) (∏_{i=1}^N λ_i)^{1/N} 2^{−2R},

where R is the rate of the optimal transform coder, (3.3) can be written as

ρ < (1/(2N)) Σ_{i=1}^N log₂(1 + ∆²/(12λ_i)) < (1/(2N ln 2)) Σ_{i=1}^N ∆²/(12λ_i) ≈ (πe/(12 ln 2)) (∏_{i=1}^N λ_i)^{1/N} 2^{−2R} · (1/N) Σ_{i=1}^N 1/λ_i.

At rates of 2 or 3 bits per component, the excess rate is less than 6% or 1%, respectively.

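The numbers behind Figure 3.2 are easy to regenerate approximately. The sketch below evaluates the bound (3.3) for the eight-dimensional source with (R_x)_{ij} = 0.8^{|i−j|}; to map ∆ to a base rate it uses the standard high-rate estimate R ≈ (1/N) Σ_i [h(y_i) − log₂ ∆] with h the Gaussian differential entropy, which is an assumption of this sketch rather than a computation taken from the thesis.

```python
import numpy as np

N = 8
Rx = 0.8 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
lam = np.linalg.eigvalsh(Rx)                 # eigenvalues of R_x

for Delta in (1.0, 0.5, 0.25, 0.125):
    # High-rate estimate of the optimal transform coder's rate for step size Delta
    rate = np.mean(0.5 * np.log2(2 * np.pi * np.e * lam)) - np.log2(Delta)
    # Bound (3.3) on the excess rate of the universal (dithered) system
    rho_bound = np.mean(np.log2(1 + Delta**2 / (12 * lam))) / 2
    print(f"Delta = {Delta:5.3f}   base rate = {rate:5.2f} b/comp   rho bound = {rho_bound:.4f} b/comp")
```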

3.3.2 Undithered System

The proof of Theorem 3.1 relies on the fact that dithering makes {x̂_n x̂_n^T} an uncorrelated sequence and hence a law of large numbers can be applied to (3.1). Without dithering, it is very hard to make precise statements about the sequence of transforms. Due to the quantization, the distribution of x̂_n depends on T_n, which in turn depends on T_1 and {x_k}_{k=1}^{n−1}. Because of this complicated interdependence between quantization and stochastic effects, we are only able to prove restricted results.

One way to reduce the complexity of the analysis is to neglect the stochastic aspect, meaning to assume there is no variance in moment estimates despite the fact that moments are estimated from finite length observations. The effect is to replace (3.1) with

R_x̂^(n) = E[x̂_n x̂_n^T]   (3.4)

and update the transform such that T_{n+1} R_x̂^(n) T_{n+1}^T is diagonal with nonincreasing diagonal elements. We are left with a deterministic iteration summarized by

R_x̂^(n) = T_n^T R_ŷ^(n) T_n = T_n^T Q̃(R_y^(n)) T_n = T_n^T Q̃(T_n R_x T_n^T) T_n,
T_{n+1} R_x̂^(n) T_{n+1}^T = Λ_n   (diagonal with nonincreasing diagonal elements),

where Q̃: R^{N×N} → R^{N×N} gives the effect of quantization on the correlation matrix. Q̃ depends on the source distribution and ∆ and can be described by evaluating expressions from [27]. Since R_x and R_x̂^(n) generally have different eigenvectors, it is not obvious that this iteration will converge. The following theorem, due to Zhuang [228, 81], gives a limited convergence result:

Theorem 3.3 (Deterministic convergence of undithered system) Let R_x and T_1 be given. Then there exists a sequence of quantization step sizes {∆_n} ⊂ R⁺ such that the deterministic iteration described above converges to a KLT of the source. Since the KLT is ambiguous if the eigenvalues of R_x are not distinct, convergence is indicated by R_ŷ^(n) approaching a diagonal matrix in Frobenius norm.

Theorem 3.3 does not preclude the possibility that the iteration will converge only with inf ∆_n = 0. This limits the practical implications of the theorem. However, numerical calculations suggest that the iteration actually converges for sufficiently small constant sequences of step sizes. Figure 3.3 shows numerical results for a four-dimensional Gaussian source with (R_x)_{ij} = 0.9^{|i−j|}, T_1 = I and various values of ∆. To show the degree to which T_n diagonalizes R_x, |||R_y^(n)||| is plotted as a function of the iteration number n, where |||A||| = Σ_{i≠j} a_{ij}². An approximate correspondence between quantization step size and rate is also given.

Starting from an arbitrary initial transform, |||R_y^(n)||| becomes small after a single iteration (note the logarithmic vertical axis). Then, to the limits of machine precision, it converges exponentially to zero with a rate of convergence that depends on ∆. (For ∆ > 3, loss of significance problems in the computation combined with very slow convergence make it difficult to ascertain convergence numerically.)

The results shown in Figure 3.3 are representative of the performance with an arbitrary R_x. The convergence, as measured by |||R_y^(n)|||, is unaffected by the multiplicities of the eigenvalues of R_x. The eigenspace associated with a multiple eigenvalue can be rotated arbitrarily without affecting |||R_y^(n)||| or the decorrelation and energy compaction properties of the transform.
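The deterministic iteration is easy to reproduce numerically. In the sketch below the map Q̃, whose closed-form expressions from [27] are not reproduced here, is replaced by a Monte Carlo estimate of E[Q(y)Q(y)^T]; that substitution is an assumption of this illustration, not the computation used in the thesis.

```python
import numpy as np

def quantized_correlation_mc(Ry, Delta, rng, n_samples=200_000):
    """Monte Carlo stand-in for Q-tilde: correlation matrix of the quantized Gaussian vector."""
    y = rng.multivariate_normal(np.zeros(Ry.shape[0]), Ry, size=n_samples)
    yq = Delta * np.round(y / Delta)                 # uniform quantizer (3.2), per component
    return yq.T @ yq / n_samples

def deterministic_iteration(Rx, Delta, n_iter, rng):
    """Iterate: quantize in the current transform domain, re-estimate, re-diagonalize."""
    N = Rx.shape[0]
    T = np.eye(N)                                    # T_1 = I as in Figure 3.3
    offdiag = []
    for _ in range(n_iter):
        Ry = T @ Rx @ T.T
        offdiag.append(np.sum(Ry**2) - np.sum(np.diag(Ry)**2))   # |||R_y^(n)|||
        Rxhat = T.T @ quantized_correlation_mc(Ry, Delta, rng) @ T
        eigval, eigvec = np.linalg.eigh(Rxhat)
        T = eigvec[:, ::-1].T                        # rows ordered by nonincreasing eigenvalue
    return offdiag

rng = np.random.default_rng(1)
Rx = 0.9 ** np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
print(deterministic_iteration(Rx, Delta=1.0, n_iter=6, rng=rng))
```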

0

10 ∆ = 5.0

 Rate bits/comp onent

∆ = 2.0 4.475 −5 0.1

10

0.2 3.488 2.536 −10 0.4

10

0.6 2.006 |||

) 1.620 −15 0.8 n

( 10y

∆ = 1.0

1.0 1.322 |||R

−20 1.097 10 1.2

∆ = 0.8

1.4 0.929

−25 0.802

10 ∆ = 0.6 1.6

1.8 0.702

−30 0.623

10 2.0 0.218 0 10 20 30 40 50 60 70 80 5.0 Iteration number

Figure 3.3: Simulations for various fixed quantization step sizes suggest that the deterministic iteration converges more generally than predicted by Theorem 3.3. The source vector length is N = 4 and the initial transform is the identity transform. The accompanying table provides an approximate correspondence between quantization step sizes and rates.

Theorem 3.4 (Deterministic convergence: N =2) Let N =2and let Rx be given. There exists ∆max > 0 such that for any ∆ < ∆max the deterministic iteration converges, in the same sense as before, for any initial transform T1.

The analysis of the dithered system was relatively straightforward because the dithering statistically separated the variation of the transform from the effects of quantization on moment estimation. In place of the deterministic approach, it is also possible to analyze the undithered system with an “independence assumption” similar to that used in analyzing the LMS algorithm. This forcibly disentangles finite observation effects from quantization effects. A convergence result can be shown, though the situation is rather unrealistic; see Appendix 3.B for details.

3.4 Derivations

3.4.1 Proof of Theorem 3.1

2 The idea of the proof is to first show that each term of (3.1) has expected value Rx +∆ I/12, has finite variance, and is elementwise uncorrelated with every other term. Applying the Chebyshev law of large numbers [151] then gives the first desired conclusion. The second conclusion follows easily. First note that

T T − T − xˆk = Tk yˆk = Tk (yk +(ˆyk yk)) = xk + Tk (ˆyk yk).

N Because of the use of subtractive dither,y ˆk − yk is uniformly distributed on the hypercube [−∆/2, ∆/2] and independent of xk and Tk [126, 85]. (The overall errorx ˆk − xk is uniformly distributed on a rotated hypercube, independent of xk but not independent of Tk. Its components are uncorrelated but not independent.) Now,

T T T − T T − T T − T − T xˆkxˆk = xkxk + xk(ˆyk yk )Tk + Tk (ˆyk yk)xk + Tk (ˆyk yk)(ˆyk yk )Tk (3.5) CHAPTER 3. ON-LINE UNIVERSAL TRANSFORM CODING 61 and taking expectations gives

T T T − T T − T T − T − T E[ˆxkxˆk ]=E[xkxk ]+E[xk(ˆyk yk )]Tk + Tk E[(ˆyk yk)xk ]+ Tk E[(ˆyk yk)(ˆyk yk )]Tk ∆2 = R +0+0+T T IT x k 12 k ∆2 = R + I x 12 where we have used the independence of xk andy ˆk −yk and the fact that each has mean zero. The (i, j)element (n) c T (k) of Rxˆ is the average of n random observations of (ˆxkxˆk )ij , which we denote Aij . The calculation above (k) 2 (k) shows that each Aij has mean (Rx)ij +∆ δij /12. It is shown in Appendix 3.C that each Aij has variance (k) (`) 6 bounded by a constant and that Aij is uncorrelated with Aij for k = `. Thus by the Chebyshev law of large (n) c 2 numbers we have that Rxˆ → Rx +∆ I/12 elementwise in mean square. The second conclusion follows from 2 the fact that Rx and Rx +∆ I/12 have the same eigenvectors. c (n) Note that the dither is essential to the proof because it makes the quality of the estimate Rxˆ independent of the sequence of transforms.

3.4.2 Proof of Theorem 3.2

Let N`n denote the number of bits used to code xn. Because of the mean square convergence of the sequence of transforms and the universality of the entropy coder, E[`n] converges to some limit, say `¯.The ? ? ¯? number of bits used by the optimal coder N`n will satisfy E[`n]=` . Since the mean of a sequence that converges to a constant must converge to the same limit, the first part of the theorem is proven with ρ = `¯− `¯?. The second part of the theorem pertains to rate estimates. The optimal fixed transform will use the

KLT, so the discrete variables to be entropy coded will be quantized Gaussians with variances λ1, λ2, ..., λN . Assuming the relationship − H(Q∆(X)) = h(X) log2 ∆ between differential entropy and discrete entropy [34] and using the differential entropy of a Gaussian random variable 1 h(N (0,σ2)) = log 2πeσ2 bits 2 2 gives the following for the entropy of a Gaussian random variable with variance σ2, scalar quantized with bin width ∆: 1 1 2πeσ2 H(Q (N (0,σ2))) = log 2πeσ2 − log ∆= log bits. ∆ 2 2 2 2 2 ∆2 Averaging the rates for the N transform coefficients gives

1 XN 1 2πeλ R = log i . (3.6) opt N 2 2 ∆2 i=1 The universal scheme converges to an optimal transform. However, because of the dithering, the signal at the input to the quantizer is not Gaussian and does not have component variances equal to the λi’s. Since {zn} is 2 independent of {yn}, the variances simply add, giving λi +∆ /12, i =1, 2,...,N. Since a Gaussian p.d.f. has CHAPTER 3. ON-LINE UNIVERSAL TRANSFORM CODING 62 the largest differential entropy for a given variance, the asymptotic rate of the universal coder can be bounded as 1 XN 1 2πe(λ +∆2/12) R < log i . (3.7) univ N 2 2 ∆2 i=1 Subtracting (3.6) from (3.7) and pairing terms gives (3.3).

3.4.3 Proof of Theorem 3.3

The proofs of Theorems 3.3 and 3.4 rely on properties of Qe, the function that describes the effects of quantization on the correlation matrix. The proof of Theorem 3.3 presented here is primary due to Zhuang and is minimally modified from [228]. 2 2 2 2 Let η1 and η2 be jointly Gaussian with E[η1]=E[η2]=0,E[η1]=ν1 , E[η2 ]=ν2 ,andE[η1η2]=ν12. 2 2 2 2 · Defineν ˆ1 = E[q(η1) ],ν ˆ2 = E[q(η2) ], andν ˆ12 = E[q(η1)q(η2)], where q( ) was defined by (3.2). Then using expressions from [27], one can show   2 X∞ 2 ∆ − 2 2 2 2 ∆ νˆ2 = ν2 + + (−1)me 2m π ν1 /∆ +4ν2 (3.8) 1 1 12 m2π2 1 m=1

2 (similarly forν ˆ2 )and

νˆ12 =(1+δ)ν12 + µ, (3.9) where ! X∞ X∞ − 2 2 2 2 − 2 2 2 2 δ =2 (−1)m1 e 2m1π ν1 /∆ + (−1)m2 e 2m2π ν2 /∆

m1=1 m2=1 and ∞     X ∆2 −2π2(m2ν2 + m2ν2) 4π2ν m m − m1+m2 1 1 2 2 12 1 2 µ = ( 1) 2 exp 2 sinh 2 . (3.10) m1m2π ∆ ∆ m1,m2=1 For any correlation matrix R, the diagonal elements of Qe(R) are described by (3.8) and the off-diagonal elements are described by (3.9). For the purpose of this theorem, we need only the following simple property of Qe: ∆2 Qe(R)=R + I + C, where C → 0 elementwise as ∆ → 0. (3.11) 12 To measure the degree to which T diagonalizes R , define a distance measure ||| · ||| between a matrix n P x ||| ||| 2 A and the set of diagonal matrices by A = i=6 j aij . The strategy of the proof is to show that for sufficiently ≥ ||| (n+1)||| ≤ 1 ||| (n)||| small ∆ and n 1, the inequality Ry 2 Ry holds. (n) T (n) (n) T T T (n) Combining Rxˆ = Tn Ryˆ Tn with Rxˆ = Tn+1ΛnTn+1 gives Tn+1ΛnTn+1 = Tn Ryˆ Tn. Define T Hn = TnTn+1 so that (n) T Ryˆ = HnΛnHn . (3.12) Also notice that (n+1) T T T T T (n) Ry = Tn+1RxTn+1 = Tn+1Tn TnRxTn TnTn+1 = Hn Ry Hn. (n) − (n) As a final preparation, define Zn = Ry Ryˆ . We can now make the calculation

||| (n+1)||| ||| T (n) ||| ||| T (n) ||| ||| T ||| Ry = Hn Ry Hn = Hn (Zn + Ryˆ )Hn = Hn ZnHn , CHAPTER 3. ON-LINE UNIVERSAL TRANSFORM CODING 63

T (n) where the last equality follows from Hn Ryˆ Hn being diagonal (see (3.12)). From (3.11), it is clear that if ∆ is ||| ||| ≤ 1 ||| (n)||| ||| ||| ||| T ||| small enough, Zn 4 Ry . It remains now to relate Zn and Hn ZnHn . (n) (n) 2 k k→ → Substitute Ryˆ = Ry +∆ I/12 + C1,where C1 0as∆ 0, in (3.12)toget ∆2 R(n) + I + C = H Λ HT . (3.13) y 12 1 n n n Decrementing the index and rearranging gives

2 T (n−1) ∆ T H R H − + I + H C H − =Λ − . (3.14) n−1 y n 1 12 n−1 1 n 1 n 1

T (n−1) (n) Since Hn−1Ry Hn−1 = Ry , comparing (3.13)and(3.14)gives

T − T HnΛnHn =Λn−1 + C1 Hn−1C1Hn−1. (3.15)

Now let C2 = Hn − I. Substituting in (3.15) and expanding, we conclude that kC2k→0as∆→ 0. Thus by T ||| T ||| − ||| ||| → ||| ||| → → expanding Hn ZnHn we see that Hn ZnHn Zn 0fasterthan Zn 0as∆ 0, so by choosing ||| T ||| ≤ ||| ||| ∆ small enough we have the bound Hn ZnHn 2 Zn . Combining all these calculations gives 1 1 |||R(n+1)||| = |||HT Z H ||| ≤ 2|||Z ||| ≤ 2 · |||R(n)||| = |||R(n)|||. y n n n n 4 y 2 y

3.4.4 Proof of Theorem 3.4

Without loss of generality (rotating the coordinate system and initial transform, if necessary), assume

2 2 ≥ R Rx = diag(σ1 ,σ2), σ1 σ2. The transform iterates are all in SO2( ) and can be parameterized as " # cos θ − sin θ Tθ = , sin θ cos θ where θ ∈ [−π/4,π/4]. We assume σ1 >σ2;ifσ1 = σ2 the situation is uninteresting because Ry is diagonal for any Tθ. Denote the transform iterate that follows after θ by ϕ. The proof will be completed by showing that 2 2 there is a constant ∆max, independent of θ, such that ∆ ≤ ∆max implies sin 2ϕ ≤ sin 2θ with equality only when θ = 0. This will show global convergence to the fixed point zero, which is an optimal transform. As a preview of this result—and to motivate the rest of the proof—we compute and plot the “next iterate” map

θ 7→ ϕ. Figure 3.4 shows the map θ 7→ ϕ when σ1 =1andσ2 =1/2for∆=1, 2, ..., 5. The iteration globally converges as long as the graph of ϕ(θ) lies inside the cone |ϕ|≤|θ|. From the plot, it seems this may be true for any ∆; we endeavor to show this for ∆ less than some ∆max. T The first step is to relate ϕ to θ. By looking at the general form of TθRxˆTθ , one can show that   1 −2(R ) ϕ = arctan xˆ 1,2 . (3.16) 2 (Rxˆ)1,1 − (Rxˆ)2,2

Rxˆ is related to θ through Ry and Ryˆ: " # " # σ2 cos2 θ + σ2 sin2 θ 1 (σ2 − σ2)sin2θ ν2 ν R = T R T T = 1 2 2 1 2 = 1 12 , (3.17) y θ x θ 1 2 − 2 2 2 2 2 2 2 (σ1 σ2 )sin2θσ1 sin θ + σ2 cos θ ν12 ν2 CHAPTER 3. ON-LINE UNIVERSAL TRANSFORM CODING 64

0.25 ∆ = 5 0.2

0.15

0.1

0.05

0

ϕ / π ∆ = 1 −0.05

−0.1

−0.15

−0.2

−0.25 −0.2 −0.1 0 0.1 0.2 θ / π

Figure 3.4: Simulations of the deterministic iteration for N = 2 suggest convergence for any quantization step size ∆. The eigenvalues of Rx are 1 and 1/4. The step sizes shown are ∆ = 1, 2, ..., 5. ϕ is the iterate that follows after θ. Global convergence is indicated by the curves lying inside the cone |ϕ|≤|θ|, which is marked by dotted lines.

" # ∆2 αβ R = Qe(R )=R + I + , yˆ y y 12 βγ where α, β,andγ depend on θ, σ1,andσ2 as given by (3.8), (3.9), and (3.17). Now, after computing Rxˆ = T Tθ RyˆTθ, one finds − 2 − 2 − (Rxˆ)1,1 (Rxˆ)2,2 = σ1 σ2 +(α γ)cos2θ +2β sin 2θ and 1 (R ) = (γ − α)sin2θ + β cos 2θ. (3.18) xˆ 1,2 2 Since sin2(arctan φ)=φ2/(1 + φ2),   2 −2(Rxˆ)1,2 2 (Rxˆ)1,1−(Rxˆ)2,2 (−2(R ) ) 2   xˆ 1,2 sin 2ϕ = 2 = 2 2 −2(Rxˆ)1,2 ((R ) − (R ) ) +(−2(R ) ) 1+ xˆ 1,1 xˆ 2,2 xˆ 1,2 (Rxˆ)1,1−(Rxˆ)2,2 [(α − γ)2 sin 2θ − 2β cos 2θ]2 = 2 − 2 2 2 − 2 − 2 − − 2 2 (σ1 σ2) +2(σ1 σ2 )[(α γ) cos 2θ 2β sin 2θ]+(α γ) +4β [(α − γ)2 sin 2θ − 2β cos 2θ]2 ≤ p , (3.19) 2 − 2 − − 2 2 2 [σ1 σ2 (α γ) +4β ] where (3.19) follows from minimizing the denominator over θ. The following three lemmas allow us to complete the bounding of sin2 2ϕ.

Lemma 3.5 |α − γ|

Proof: The series in (3.8) is an alternating series with terms that monotonically decrease in absolute value. Thus it can be bounded (with appropriate sign) by any partial sum [5]. Using simply the first term,   2 2 2 ∆ −e−2π (Ryˆ)1,1/∆ +4(R ) <α<0, π2 yˆ 1,1 CHAPTER 3. ON-LINE UNIVERSAL TRANSFORM CODING 65 and similarly for γ.Finally,   2 − 2 2 2 ∆ |α − γ|≤max{|α|, |γ|}

since α and γ have the same sign and σ1 <σ2. 

Lemma 3.6 |β|≤c2(∆,σ1,σ2)| sin 2θ| uniformly in θ,withc2(∆,σ1,σ2) → 0 as ∆ → 0.

Proof: Rearranging (3.9) gives β = δ(Ryˆ)1,2 + µ, where the definitions of δ and µ must use (Ryˆ)1,1, (Ryˆ)2,2, 2 2 and (Ryˆ)1,2 in place of ν1 , ν2 ,andν12. As in the proof of Lemma 3.5, δ can be bounded by using the first term in each series:   − 2 2 − 2 2 − 2 2 2 |δ| < 2 e 2π (Ryˆ)1,1/∆ + e 2π (Ryˆ)2,2/∆ < 4e 2π σ2 /∆ . (3.20)

Assume for the moment that the absolute value of the summand of (3.10) decreases monotonically with both m1 and m2. Then computing the double summation (3.10) in either order gives alternating series, so the same bounding technique can be used. We get     ∆2 −2π2((R ) +(R ) ) 4π2(R ) |µ|≤ exp yˆ 1,1 yˆ 2,2 sinh yˆ 1,2 π2 ∆2 ∆2     ∆2 −2π2(σ2 + σ2) 2π2(σ2 − σ2)sin2θ = exp 1 2 sinh 1 2 π2 ∆2 ∆2     ∆2 −2π2(σ2 + σ2) 2π2(σ2 − σ2) ≤ exp 1 2 sinh 1 2 sin 2θ π2 ∆2 ∆2 2 ∆ − 2 2 2 ≤ e 2π σ2 /∆ sin 2θ. (3.21) 2π2 Combining (3.20)and(3.21) gives

1 |β| = |δ(R ) + µ| = δ(σ2 − σ2)sin2θ yˆ 1,2 2 1 2 ≤ 1 2 − 2 | || | | | (σ1 σ2 ) δ sin 2θ + µ 2  2 ∆ − 2 2 2 2 − 2 2π σ2 /∆ | | < 2(σ1 σ2 )+ 2 e sin 2θ . | {z2π }

c2(∆,σ1,σ2)

In general, the terms of (3.10) are not monotonically decreasing. However, the terms are monotonically 2 decreasing (in absolute value) outside of (m1,m2) ∈{1, 2,...,M} for some M<∞. Since each individual

term for 1 ≤ m1,m2 ≤ M can be bounded as above, the bound can be extended to the general case. 

Lemma 3.7 |β|

Proof: This follows immediately from Lemma 3.6. 

2 2 By combining Lemmas 3.5 and 3.7, there exists ∆1 > 0 such that ∆ < ∆1 implies (α − γ) +4β ≤ 2 − 2 2 (σ1 σ2 ) /4, uniformly in θ. Thus assuming ∆ < ∆1 we have [(α − γ)2 sin 2θ − 2β cos 2θ]2 sin2 2ϕ ≤ . 1 2 − 2 2 4 (σ1 σ2 ) CHAPTER 3. ON-LINE UNIVERSAL TRANSFORM CODING 66

Applying Lemmas 3.5 and 3.6, 2 2 2 sin 2ϕ ≤ (c1 +2c2) sin 2θ,

2 and there exists ∆2 > 0 such that ∆ < ∆2 implies (c1 +2c2) < 1. The proof is complete with ∆max = min{∆1, ∆2}. The bounds in this theorem are rather complicated but we can check that the requirements on ∆ are reasonable. Suppose σ1 =1andσ2 =1/2. Then ∆1 > 1.366 and ∆2 > 1.565, so the theorem guarantees convergence for any ∆ < 1.366. (For this range of ∆, (3.10) can be bounded by the m1 = m2 =1termforany θ.) As we found for Theorem 3.3, numerical calculations (see Figure 3.4) suggest convergence for any ∆.

3.5 Variations on the Basic Algorithms

Certain modifications to the basic algorithms can be made to reduce the computational complexity or to facilitate the coding of non-i.i.d. sources. All of the modifications mentioned in this section apply equally well to the dithered and undithered systems. The most complicated step in these algorithms is the computation of the updated transform; thus, the complexity can be reduced by suppressing this computation. Instead of computing an eigendecomposition of c (n) Rxˆ at each step, one can compute the eigendecomposition every L steps, holding the transform constant in between. L need not be constant, but if it is to vary it must be computable from coded data. Having constant L>1 does not affect the conclusions in Theorem 3.1. The coding of a non-i.i.d. source poses many problems. First of all, we must assume that the source (n) is “locally stationary,” meaning that Rxˆ varies slowly. If this is not the case, an on-line algorithm will fail (n) 4 because the coding of xn is based on an estimate of Rxˆ from (recent) past samples. Secondly, the correlation c (n) matrix estimate Rxˆ should be local, e.g., Xn (n) 1 Rc = xˆ xˆT (3.22) xˆ K k k k=n−K+1 or − c (n) c (n 1) − T Rxˆ = ωRxˆ +(1 ω)ˆxnxˆn , (3.23) with appropriate initialization. If the update interval L divides K in (3.22), it is not necessary to store a full window of K past samples [80]. A technique which simultaneously reduces the computational complexity and introduces a correlation estimate equivalent to (3.23) is to replace the eigendecomposition computation with an incremental change in the transform based onx ˆn. This is explored in Chapter 4.

3.6 Experimental Results

The performance of the undithered system has been tested on synthetic sources and still images. The application to image coding requires some additional design choices. These are specified in Section 3.6.2.

4Without local stationarity, a forward adaptive method would presumably be superior. See Chapter 2 and [46]. CHAPTER 3. ON-LINE UNIVERSAL TRANSFORM CODING 67

4.4 optimal 4.2

4

3.8 ∆ = 0.01 3.6 ∆ = 0.5

3.4 ∆ = 1

3.2 Coding gain

3

2.8

2.6 ∆ = 2 2.4 0 1 2 3 4 5 6 Block number

Figure 3.5: Coding gain of undithered universal system compared to optimal coding gain for various coarseness levels.

3.6.1 Synthetic Sources

The undithered system was tested on a source generated by forming blocks of N = 8 contiguous samples from a unit-power, first-order autoregressive source with correlation coefficient 0.9. The initial transform is the identity. The transform is updated after every two received vectors based on all of the vectors received thus far. Figure 3.5 gives the coding gain (1.11) after each update for various quantization step sizes. (The optimal coding gain is approximately 4.28.) After a suitable amount of data has become available, close to optimal coding gain is achieved. This experiment shows the relative importance of the number of samples available and the coarseness of the data. A system that uses unquantized data in adaptation (thus requiring side information) would have coding gain approximately as shown by the ∆ = 0.01 curve. After a few blocks have been processed, the coding gain of the backward adaptive system is close to that of a system that uses unquantized data in adaptation. Other experiments with synthetic sources are reported in [82, 228].

3.6.2 Image Coding

To apply the transform adaptation technique to image coding requires first a way to translate the pixels of an image to a “timed” sequence of vectors. This was accomplished by first subdividing the image into 8 × 8 pixel blocks5 and then “raster scanning;” i.e., taking the first row of blocks, left to right, then the second row, etc. For a K × K pixel image, this gives a sequence of K2/64 vectors of length 64. Conceptually, the two-dimensional structure of the image and of each block are maintained since they will be used later. In block transform coding of images (e.g., in JPEG), separable transforms are typically used. A separable transform can be factored into a processing of the rows and then the columns, or vice versa. For a

B×B B2 ∈ R B × B pixel block, the transform can be computed as T1XT2 with X ∈ R instead of as Tx with x . This reduces the complexity from B4 multiplications to 2B3 multiplications. Aside from complexity, there is

5It is assumed that the number of rows and columns in the image are divisible by 8. CHAPTER 3. ON-LINE UNIVERSAL TRANSFORM CODING 68

Figure 3.6: The transform used in the coding of the shaded block depends on eight neighboring blocks. Based on a raster scan order, the neighborhood is causal. an additional advantage to using a separable transform: Two separate B-dimensional unitary transforms have much fewer degrees of freedom than a single B2-dimensional transform. Thus, if the transform is separable it should require less data to converge to optimal performance or should track a non-stationary source more easily. An image coder was implemented with blockwise raster scanning and a separable transform. For the coding of each block, the transform was adapted according to the observed correlation structure from an eight- block neighborhood, as shown in Figure 3.6.6 Since the raster scan proceeds from top-left to bottom-right, the neighborhood is causal, as required for backward adaptivity. The transform coefficients were quantized with identical uniform scalar quantizers. The column and row correlations could be estimated separately because they were used separately in the row and column transforms. The blocks in the first two rows and columns were transformed with the DCT because they lack a sufficient number of neighbors. The performance of the adaptive coder was compared to those of a static DCT coder and a hypothetical coder. For each block, the hypothetical coder picks a separable transform based on the statistics of the selfsame block. Such a coder would have to quantize and transmit a description of the transform, but it is assumed here that the transform is conveyed to the decoder by a “genie.” The “genie” coder gives an approximate upper bound on the performance possible within the confines of blockwise coding. Results for three images are given in Figure 3.7. The first is a 1024 × 1024 pixel image formed by compositing four uniform texture images from the Brodatz album [20]. The DCT is very effective for uniformly textured regions, but the adaptive coder performs a bit better at moderate rates. This is a somewhat contrived example where information from neighboring blocks is very valuable and the adaptive system is able to converge. The second image is also a sort of composite, but a natural one. It is a page consisting of a photograph and text, scanned from IEEE Spectrum for the purpose of universal experiments [46]. The performance of the adaptive system is slightly better at high rates, but slightly worse at low rates. These two examples suggest that at low rates the information upon which the transform is adapted becomes too coarse to be useful. The third set of results are for the standard Lena image [117]. For this image, the adaptive system performed poorly at all rates. Additional experiments, not directly reported here, indicated that even when the adaptation was based on unquantized data, the performance was poor. The strict causality of the transform adaptation is much more of an impediment than the use of quantized data. Lena is simply not a slowly varying stochastic source. 6The neighborhood shape is somewhat arbitrary. Experiments with other neighborhoods gave very similar results. CHAPTER 3. ON-LINE UNIVERSAL TRANSFORM CODING 69

40

35

30

PSNR (dB) 25

20 DCT universal genie 15 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 Rate (bits/pixel) (a) Composite image: four Brodatz textures [20]

40

35

30

PSNR (dB) 25

20 DCT universal genie 15 0 0.5 1 1.5 2 2.5 Rate (bits/pixel) (b) Composite image: page scanned from IEEE Spectrum [46]

40

35

30

PSNR (dB) 25

20 DCT universal genie 15 0 0.2 0.4 0.6 0.8 1 1.2 Rate (bits/pixel) (c) Lena image

Figure 3.7: Rate-distortion performance of universal transform coder compared to static DCT coder and to coder employing a “genie” for three grayscale images. CHAPTER 3. ON-LINE UNIVERSAL TRANSFORM CODING 70

3.7 Conclusions

Two methods for on-line universal transform coding were developed: one using subtractive dither and the other without dither. Both adapt the transform to a strictly causal KLT estimate, derived from quantized data so the decoder can track the encoder state without side information. The method which uses subtractive dither yields to a precise analysis. For coding a stationary Gaussian source, the transform converges in mean square to the optimal transform. The only gap between the performance of the dithered system and a system designed with knowledge of the source distribution is due to the variance increase caused by dithering. This gap can be precisely bounded and vanishes as the quantization step size becomes small. The undithered system does not suffer from this penalty, but is more difficult to analyze; weaker convergence results were shown. Simulations indicate that the undithered system generally converges. The approach taken was purely backward adaptive. The problem of optimally combining forward and backward adaptive modes remains open. Gaussian sources were assumed throughout the theoretical development. Gaussianity was used in two ways: to justify maximizing coding gain to find the optimal transform and to concretely describe the effect of quantization on moment estimation. Gaussianity is not inherent to either algorithm, e.g., parametric source models were not assumed. Maximizing coding gain is a generally accepted principle even though it is not necessarily optimal for non-Gaussian sources. The convergence, but not necessarily the optimality, of the dithered system thus extends to non-Gaussian sources. Experiments on synthetic sources consistently confirm the convergence results. However, image coding results were merely middling. On images where neighboring blocks have similarly distributed pixel values, the performance was good. However, the performance on typical photographic images was poor. Such images are simply not well-modeled as slowly varying stochastic sources. CHAPTER 3. ON-LINE UNIVERSAL TRANSFORM CODING 71

Appendix

3.A Parametric Methods for Adaptation

Consider the coding of a scalar source and suppose that the source can be described by a parametric model. Then, as described by Yu [216], the parameters of the source can in general be consistently estimated from observations of a quantized version of the source as long as the number of quantization cells exceeds the number of parameters. The conditioning of the problem depends on the number and placement of the cells. A simple example is to observe X ∼N(µ, σ2) quantized with three cells: (−∞,a), [a, b], and (b, ∞).

Let p1 = P (X ∈ (−∞,a)) and p2 = P (X ∈ [a, b]). For any −∞

T Theorem 3.8 Let X =[X1,X2, ..., Xk] , X ∼N(0, Σ),whereΣ is an unknown, non-degenerate covariance b matrix. Let X be a scalar quantized version of X such that for n ∈ Z,either

∈ ⇒ b 1 (i) Xi [n∆i, (n +1)∆i) Xi =(n + 2 )∆i;or ∈ − 1 1 ⇒ b (ii) Xi [(n 2 )∆i, (n + 2 )∆i) Xi = n∆i.

Then for any set of positive, finite quantization step sizes ∆1, ..., ∆k, all moments of X can be recovered exactly from the first and second order moments of Xb.

The proof is based on finding the mapping between the moments of X and the moments of Xb (see (3.8)–(3.10) and [27]) and then showing that this mapping is invertible. The moments of the unquantized signal can be recovered from the moments of the quantized signal. However, this alone does not tell the whole story: the moments of the quantized signal must also be estimated. Since quantization is an irreversible reduction in information, it must be at least as hard to estimate the moments of a signal from a quantized version as it is from the original unquantized signal. This chief disadvantage of a backward adaptive system is quantified below.

7It is interesting to note that even very coarse scalar quantization can yield enough information to fit a reasonable parametric model. For example, quantizing with only three bins will yield 3k − 1 independent probability estimates, where k is the vector k ∈ + k − ≥ 1 k2 3 k dimension. For any Z ,3 1 2 + 2 , so this quantization is fine enough to fit a multivariate Gaussian signal model. 8The number of bins is exponential in the vector dimension. CHAPTER 3. ON-LINE UNIVERSAL TRANSFORM CODING 72

Let X ,X ,...,X be an i.i.d. sequence of Gaussian random variables with mean zero and unknown 1 2 k P 2 2 1 k 2 variance σ . It is easy to check that the sample variance s = k i=1 Xi is an unbiased estimator of the variance.9 The variance of this estimate is given by E[(s2 − σ2)2]=2σ4/k. b b Now suppose that instead of observing X1,X2, ..., Xk, we observe quantized values X1, Xˆ2, ..., Xk, 2 2 b 2 quantized as in case (ii) of Theorem 3.8.Toestimateσ , we can first estimateσ ˆ = E[X1 ] and then invert the mapping which relates σ2 andσ ˆ2. The quality of the estimate thusly obtained depends on the quality of P 2 2 1 k b 2 theestimateofˆσ (the variance of the sample variances ˆ = k i=1 Xi ) and the sensitivity of the relationship between σ2 andσ ˆ2 to errors inσ ˆ2. Using a first order approximation, we obtain

∂[σ2] Var(σ2 estimate) ≈ · Var(ˆs2). (3.24) ∂[ˆσ2]

2 b 4 − b 2 b 4 An elementary calculation shows that Var(ˆs )=(E[X1 ] E[X1 ])/k, where one can obtain expressions for E[X1 ] b 2 2 and E[X1 ] by manipulating expressions from [27]. Normalizing the variance of the estimate of σ (approximated through (3.24)) by 2σ4/k, the variance obtained without quantization) characterizes precisely how much is lost by estimating from quantized data. For example, if the quantization is quite coarse with ∆/σ = 3, one needs about twice as much data to estimate σ2 as well as if the unquantized data were available. A similar analysis can be done for covariance estimates.

3.B Convergence with Independence Assumption

Theorem 3.9 (Convergence with independence assumption) Let N =2and assume that for each n, xn, T1, T2, ..., Tn−1 are independent. With the transforms parameterized as Givens rotations [62]ofanglesθ1,

θ2, ..., θn−1, assume θi is uniformly distributed on [−π/4,π/4]. Then Tn converges with probability one to a ? KLT for the source. With Ln and Ln defined as in Theorem 3.2, L − L? n n converges with probability one to 0. n Proof: The independence assumption makes the terms of (3.1) independent. Thus, in light of (3.16)andthe T T − T 6 10 law of large numbers, it suffices to show that (E[ˆxnxˆn ])1,2 =0and (E[ˆxnxˆn ])1,1 (E[ˆxnxˆn ])2,2 =0. The first of these statements will be proven; the second could be proven similarly. T Let an =(ˆxnxˆn )1,2.By(3.18), 1 E[a | θ ]= (γ − α)sin2θ + β cos 2θ . n n 2 n n

Now we take the expectation over θn. It is straightforward to verify that α and γ are even functions of θn and

β is an odd function of θn. Thus E[an]=0. 

9The sum is divided by k because the mean is known; if the mean was unknown and the variance was estimated by summing the squares of the deviations from the sample mean, dividing by k − 1 would give an unbiased estimator. 10This shows that having a uniformly distributed transform has a similar effect as dithering. In practice, the stochastic fluctuations of the transform provide this dithering-like effect. In some sense, the worst case is when the transform does not vary, but this has been covered by Theorem 3.4. CHAPTER 3. ON-LINE UNIVERSAL TRANSFORM CODING 73

E A(k)A(`) 3.C Calculation of [ ij ij ]

(k) (`) This appendix presents calculations of E[Aij Aij ] in order to complete the proof of Theorem 3.1. (k) T (k) (k) Recall that A =ˆxkxˆk ; we wish to show that Aij has variance bounded by a constant and that Aij is (`) 6 uncorrelated with Aij when k = `.

Let ek =ˆyk − yk ande ˜k = Tkek.Then(3.5) can rewritten elementwise as

(k) i j i j i j i j Aij = xkxk + xke˜k +˜ekxk +˜eke˜k,

(k) (`) where superscripts on vectors are component indices. A complete, explicit computation of E[Aij Aij ] is tedious because a full expansion has 16 terms, many of which take on different values depending on whether i equals j or k equals `. A sampling of the calculations is presented, from which the final result can be inferred.

i j i j 6 i j i j Computation of E[xkxkx`x`]: If k = `,thenxkxk is independent of x`x` ,so

i j i j i j i j 2 E[xkxkx`x` ]=E[xkxk]E[x`x`]=(Rx)ij .

i j i j i 2 j 2 2 When k = `, E[xkxkx`x`]=E[(xk) (xk) ]=(Rx)ii(Rx)jj +2(Rx)ij . The final computation, which can be viewed as a mixed moment of a jointly Gaussian vector, can be made easily using a characteristic function [88].

i j i j i i j i j Computation of E[xkxkx`e˜`]: Let Tk denote the ith row of Tk. To compute the expectation of xkxkx`e˜` = i j i j xkxkx`T` e`,notethate` is independent of the rest and has zero mean, so the expectation is zero. Similarly, i j i j E[xkxke˜`x` ]=0.

i j i j i j i j Computation of E[xkxke˜`e˜`]: Determining E[xkxke˜`e˜`] would be trivial if it were not for the dependence of T` on xk when `>k. The computation can be simplified by conditioning: h i ∆2 ∆2 E[xi xj e˜ie˜j]=E[E[xi xj e˜i e˜j] | T ]=E xi xj δ =(R ) δ . k k ` ` k k ` ` ` k k 12 ij x ij 12 ij

The second equality follows because the conditioning allows the transform T` to be treated as constant.

i j i j i j i j i j i j Computation of E[xke˜kx`e˜`]: When k>`, the expectation of xke˜kx`e˜` = xkTk ekx`T` e` is zero because ek i j has zero mean and is independent of the rest. The k<`case is similar. When k = `, xk ande ˜k are independent, i j i j i 2 j 2 2 i j i j 2 · so E[xke˜kx`e˜`]=E[(xk) ]E[(˜ek) ]=(Rx)ii∆ /12. A similar calculation shows E[˜ekxke˜`x` ]=(Rx)jj∆ /12 δk`.

i j i j i j i j i j i j Computation of E[xke˜ke˜`x` ]: For k>`, E[xke˜ke˜`x` ]=E[xkTk ekT` e`x`] = 0 because ek has zero mean i j i j i j i j and is independent of the rest. The k<`case is similar. If k = `,wehaveE[xke˜ke˜`x`]=E[xkxk]E[˜eke˜k]= 2 (Rx)ij ∆ /12 · δij .

i j i j ≥ i j i j i j i j i j i j Computation of E[xke˜ke˜`e˜`]: For k `, xk is independent ofe ˜ke˜`e˜`,soE[xke˜ke˜`e˜`]=E[xk]E[˜eke˜`e˜`]= 0. i j i j i j i j | For k<`, we can again use the trick of conditioning: E[xke˜ke˜`e˜`]=E[E[xke˜ke˜`e˜`] Tk,T`] = 0. Similarly, i j i j E[˜ekxke˜`e˜`]=0. CHAPTER 3. ON-LINE UNIVERSAL TRANSFORM CODING 74

i j i j i j i j Computation of E[˜eke˜ke˜`e˜`]: First consider k = `. Thoughe ˜ke˜k ande ˜`e˜` are not independent, they are conditionally independent given Tk and T`, and the conditional expectations are constant. So the desired expectation is given by h i h i ∆2 ∆2 E[˜ei e˜j e˜ie˜j]=E E[˜ei e˜j e˜ie˜j | T ,T ] = E E[˜ei e˜j | T ]E[˜eie˜j | T ] = · δ . k k ` ` k k ` ` k ` k k k ` ` ` 12 12 ij For the k = ` case, we need only a constant bound, so by replacing each term with its maximum absolute value, √ i 2 j 2 ≤ 2 4 N∆/2, we get E[(˜ek) (˜ek) ] N ∆ /16. Combining these calculations for k = ` gives

∆2 N 2∆4 E[(A(k))2] ≤ (R ) (R ) +2(R )2 + (4(R ) δ +(R ) +(R ) )+ , ij x ii x jj x ij 12 x ij ij x ii x jj 16 (k) 6 so the variance of Aij is bounded by a constant. For k = `,     ∆2 ∆2 2 ∆2 2 E[A(k)A(`)]=(R )2 +2 δ (R ) + δ = (R ) + δ = E[A(k)]E[A(`)]. ij ij x ij 12 ij x ij 12 ij x ij 12 ij ij ij

(k) (`) Thus Aij and Aij are uncorrelated. 75

Chapter 4

New Methods for Transform Adaptation

N THE on-line universal transform coding algorithm of Chapter 3, each transform update requires the cal- Iculation of a matrix that diagonalizes the measured correlation matrix. This calculation, a determination of all the eigenvectors of the symmetric matrix, is an intensive one. However, since the estimated correlation matrix should vary slowly, the problem is like a typical tracking problem in adaptive signal processing. In this chapter, the analogy between tracking the best transform for transform coding and tracking the best filter for various kinds of filtering problems is used in the development of a new set of algorithms. For coding an N-dimensional source, these algorithms pose the transform adaptation problem as an unconstrained minimization over K = N(N −1)/2 parameters. Performing this minimization through a gradient descent gives an algorithm analogous to LMS. Step size bounds for stability similar in form to those for LMS are proven and linear and fixed-step random search methods are also considered.

4.1 Introduction

Optimal linear transform coding has two striking similarities with optimal finite impulse response (FIR) Wiener filtering: both (often unrealistically) require knowledge of second-order moments of signals; and both require a calculation which is considered expensive if it must be done repeatedly (eigendecomposition and matrix inversion, respectively). In FIR Wiener filtering, it is well-known that these difficulties can be mitigated by adaptation. This chapter establishes new methods in block transform adaptation that are analogous to some of the standard methods in adaptive FIR Wiener filtering. The basis for many adaptive Wiener filtering methods is to specify independent parameters, define a performance surface with respect to these parameters, and to search the performance surface for the optimal parameter values. The most common method of performance surface search is gradient descent—which leads to the LMS algorithm [204]—but linear and fixed-step random searches [205] also fall into this class. This chapter defines two meaningful performance surfaces (cost functions) for linear transform coding and analyzes various

This chapter includes research conducted jointly with Martin Vetterli [75]. CHAPTER 4. NEW METHODS FOR TRANSFORM ADAPTATION 76 search methods for these surfaces. The result is a set of new algorithms for adaptive linear transform coding. Subject to a Gaussian condition on the source and fine-quantization approximations,1 finding an op- timal transform for transform coding amounts to finding an orthonormal set of eigenvectors of a symmetric, positive semidefinite matrix; i.e., finding an optimal transform is an instance of the symmetric eigenproblem, a fundamental problem of numerical analysis [62]. Thus, in finding a method for transform adaptation we are in fact attempting to approximately solve a sequence of symmetric eigenvalue problems. The idea of using performance surface search (i.e., cost function minimization) for this problem seems to be new, although the cost function which will later be called J1 has been used in convergence analyses [62]. The algorithms devel- oped here are not competitive with cyclic Jacobi methods for computing a single eigendecomposition of a large matrix; however, they are potentially useful for computing eigendecompositions of a slowly varying sequence of matrices. The novelty and potential utility of these algorithms for transform coding comes from the following properties: the transform is always represented by a minimal number of parameters, the autocorrelation matrix of the source need not be explicitly estimated, and the computations are more parallelizable than cyclic Jacobi methods. In addition, further insights may come from drawing together techniques from adaptive filtering, transform coding, and numerical linear algebra. The reader is referred to [62] for a thorough treatment of the techniques for computing eigendecompo- sitions including the techniques specific to the common special case where the matrix is symmetric. Section 1.1.3 and Appendix 4.A provide brief reviews of transform coding and adaptive FIR Wiener filtering, respectively.

4.2 Problem Definition, Basic Strategy, and Outline

+ N T 2 { } R

Let xn n∈ Z be a sequence of -valued random vectors and let Xn = E[xnxn ]. We assume that 3 the dependence of Xn on n is mild and desire a procedure which produces a sequence of orthogonal transforms T Tn such that Yn = TnXnTn is approximately diagonal for each n. The procedure should be causal, i.e., Tn { }n should depend only on xk k=1. Xn will not be known, but must be estimated or in some sense inferred from { }n xk k=1.

First of all, note that if Xn is known, then a Tn consisting of normalized eigenvectors of Xn solves our b { }n problem [99]. A traditional approach would be to construct an estimate Xn = f( xk k=1)foreachn,andthen b use an “off the shelf” method to compute the eigenvectors of Xn. The difficulty with this is that the eigenvector computation may be deemed too complex to be done for each n. In analogy to the way the LMS algorithm avoids explicitly solving a linear system of equations (see Appendix 4.A), one can avoid using an explicit eigendecomposition algorithm. The first conceptual step is to replace the problem of finding a diagonalizing transform Tn for Xn with a minimization problem for which a diagonalizing transform achieves the minimum. The next step is to derive a gradient descent iteration for the minimization problem. Note that in these two steps it is assumed that Xn is known. The final step is

1Without these technical conditions, there is no general principle for determining the optimal transform, so in the remainder of the chapter “optimal” is used without qualification. For more details see Section 1.1.3 and [60]. 2 The use of X in place of Rx, though inconsistent with other chapters, was chosen to reduce the need for double subscripts. 3 If the dependence of Xn on n is not mild, then it is rather hopeless to use adaptation in the traditional sense of learning source behavior based on the recent past. Better strategies might include classification [46, 45], matching pursuit (see Chapter 2), or other basis selection methods [1]. CHAPTER 4. NEW METHODS FOR TRANSFORM ADAPTATION 77

b to apply the gradient descent iteration with Xn replaced by a stochastic approximation Xn. The following three sections address these three steps. In Section 4.3, two cost functions are defined which are minimized by a diagonalizing transform. Section 4.4 gives derivations for gradient descents with respect to the two cost functions along with step size bounds which ensure local convergence. Linear and fixed-step random searches are also discussed. Section 4.4 contains the linear algebraic computations which underlie the signal processing algorithms which are ultimately presented in Section 4.5. It is in this final section that stochastic simulations show the applicability to adaptive transform coding.

4.3 Performance Criteria

If two orthogonal transforms only approximately diagonalize X, which of the two is better? In order to use a performance surface search to iteratively find optimal transforms, we need a continuous measure of the diagonalizing performance of a transform. The remainder of the chapter uses two such performance measures. The most obvious choice for a cost function is the squared norm of the off-diagonal elements of Y = T TXT : X 2 J1(T )= Yij . (4.1) i=6 j

This cost function is clearly nonnegative and continuous in each component of T .Also,J1(T ) = 0 if and only if T exactly diagonalizes X. The cost function YN J2(T )= Yii (4.2) i=1 is intimately connected to transform coding theory but is less obviouslyp connected to the diagonalization of X. N Under the standard assumptions of transform coding, for a fixed rate, J2(T ) is proportional to the distortion

(see Section 1.1.3.1). Thus minimizing J2 minimizes the distortion and J2(T ) is minimized by a transform that diagonalizes X. A potential disadvantage of this cost function is that the minimum value is not zero; instead it Q is i λi,whereλi’s are the eigenvalues of X.

4.4 Methods for Performance Surface Search

Section 4.4.3 presents two new eigendecomposition algorithms based on gradient descent with respect to the cost functions J1 and J2. These algorithms and the random search algorithm of Section 4.4.2 are inspired by and parallel the standard methods in adaptive FIR Wiener filtering [205]. For comparison, standard methods which are computationally attractive for computing single eigendecompositions are presented in Section 4.4.4. The effects of the time variation of X and estimation noise are left for subsequent sections. Hence, throughout this section we dispense with time indices and consider iterative methods for diagonalizing a fixed matrix X. CHAPTER 4. NEW METHODS FOR TRANSFORM ADAPTATION 78

4.4.1 Parameterization of Transform Matrices

An N × N orthogonal matrix has fewer than N 2 independent parameters because of the requirement that the columns (or equivalently the rows) form an orthonormal set. In our search for the best orthogonal transform it will sometimes be useful to represent the matrix in terms of the smallest possible number of parameters. To determine the number of degrees of freedom in the parameterization of an orthogonal matrix, imagine that one is constructing such a matrix column-by-column. Making the ith column orthogonal to the earlier columns leaves N − i + 1 degrees of freedom and normalizing gives N − i degrees of freedom plus a choice of sign. Thus overall there are N(N − 1)/2 degrees of freedom plus N sign choices. The sign choices have no − N effect on J1(T )orJ2(T ), so we are left with K = N(N 1)/2 degrees of freedom. K = 2 matches the number of distinct Givens rotations, and we will see in Lemma 4.1 below that the parameters of interest can be taken to be the angles of Givens rotations.

Definition 4.1 A matrix of the form   1 ··· 0 ··· 0 ··· 0    . . . . .   ......   . . . .     0 ··· cos θ ··· sin θ ··· 0    i  . . . . .   ......  Ge =   , (4.3) i,j,θ    0 ··· −sin θ ··· cos θ ··· 0  j    . . . . .   ......  0 ··· 0 ··· 0 ··· 1 0 ··· i ··· j ··· 0 where −π/2 <θ≤ π/2,iscalledaGivens (or Jacobi) rotation [62]. It can be interpreted as a counterclockwise rotation of θ radians in the (i, j) coordinate plane.

Since we will be interested in Givens rotations with i

N×N − T ∈ Lemma 4.1 Let X ∈ R be a symmetric and let K = N(N 1)/2. Then there exists Θ=[θ1,θ2,...,θK ] − K T [ π/2,π/2) such that TΘXTΘ is diagonal, where

TΘ = G1,θ1 G2,θ2 ...GK,θK . (4.4)

Proof: Since X is symmetric, there exists an orthogonal matrix S such that SXST is diagonal [99]. Any orthogonal matrix can be factored as e e ··· e e ··· e ··· e S =(G1,2,θ1,2 G1,3,θ1,3 G1,N,θ1,N )(G2,3,θ2,3 G2,N,θ2,N ) (GN−1,N,θN−1,N )D, CHAPTER 4. NEW METHODS FOR TRANSFORM ADAPTATION 79

 −1 where D = diag(1,...,N ), i = 1, i =1, 2,...,N [4]. It is now obvious that taking T = SD suffices −1 −1 T because D X(D ) = X. 

4.4.2 Random Search

In light of Lemma 4.1 and the discussion of Section 4.3, finding a diagonalizing transform amounts to K minimizing J1 or J2 (written as J where either fits equally) over Θ ∈ [−π/2,π/2) . Conceptually, the simplest way to minimize a function—so simple and naive that it is often excluded from consideration—is to guess. Discretizing the range of interest of Θ, evaluating J at each point on the grid, and taking the minimum of these gives an approximation to the minimum. The accuracy of this approximation will depend on the smoothness of J and the density of the grid. The grid could also be made adaptive to have higher density of points where J is smaller. This exhaustive deterministic approach is not well suited to our application with a slowly-varying sequence of X matrices because information from previous iterations is not easily incorporated. Instead, two approaches that yield random sequences of parameter vectors with expected drift toward the optimum are presented. In a fixed-step random search, a small random change is tentatively added to the parameter vector. The change is adopted if it decreases the objective function; else, it is discarded. Formally, the update is described by ( Θk + αηk if J(Θk + αηk)

+ T ∈ R where α and E[ηkηk ]=I.

A fixed-step random search makes no progress on an iteration where Θk +αηk is found to be worse than

Θk. Another possibility is a linear random search [205]. In this case, instead of taking no step if ηk seems to be a step in the wrong direction, one takes a step in the opposite direction; the size of each step is proportional to the increase or decrease in J. The update is described by

Θk+1 =Θk + α[J(Θk) − J(Θk + σηk)]ηk,

+ T ∈ R where α, σ and E[ηkηk ]=I. It is intuitively clear that, using either cost function, for sufficiently small α both random search algorithms tend to drift toward local minima of J. The fixed-step and linear random search algorithms were 1 1 1 simulated on the problem X = diag([1, 2 , 4 , 8 ]) with initial guess Θ0 chosen randomly according to a uniform distribution on [−π/2,π/2]6. Figure 4.1 gives the averaged results of 400 simulations of 400 iterations each for various values of α. Gaussian η was used for the fixed-step search; for the linear search, η is uniformly distributed on the unit sphere and σ =0.01. AsshowninFigure4.1(a)–(b), the fixed-step searches have the undesirable quality that the best choice of α depends on the number of iterations: for a small number of iterations a large α is preferred while for a large number of iterations the opposite is true. A simple interpretation of this is that for large α the first few steps are more beneficial, but as the optimum Θ is approached, tentative steps are very unlikely to be accepted; close to the optimum Θ, small α is more likely to yield improvements. While the fixed-step algorithm tends to get stuck when α is large, the performance of the linear search algorithm degrades in a different way. When α is large, many steps are taken which increase J; hence the CHAPTER 4. NEW METHODS FOR TRANSFORM ADAPTATION 80

−2 10 −1 10 α = 0.01 α = 0.01

−3 α = 1 10

−2 α = 1 α = 0.3 10 α = 0.1 2 min

1 α = 0.3 J α = 0.1 −4 − J 10 2 J

−3 α = 0.03 10 −5 α = 0.03 10

−4 −6 10 10 0 50 100 150 200 250 300 350 400 0 50 100 150 200 250 300 350 400 Iteration Iteration

(a) Fixed-step search with respect to J1 (b) Fixed-step search with respect to J2

α = 100 α = 1000 α = 0.3 α = 30 α = 3 −2 10 α = 1 α = 10 −1 10 α = 300 2 min 1 α = 3 J α = 30 − J 2 J

α = 10 α = 100

−2 −3 10 10 0 50 100 150 200 250 300 350 400 0 50 100 150 200 250 300 350 400 Iteration Iteration

(c) Linear search with respect to J1 (d) Linear search with respect to J2

1 1 1 Figure 4.1: Simulations of the random search algorithms. X = diag([1, 2 , 4 , 8 ]) and results are averaged over 400 randomly chosen initial conditions Θ0. CHAPTER 4. NEW METHODS FOR TRANSFORM ADAPTATION 81 convergence gets more erratic. For very large α there is no negative drift in J. This is shown in Figure 4.1(c)– (d). The conceptual simplicity of random search algorithms comes from utilizing no knowledge of the function to be minimized. Using gradient descent is one way to utilize knowledge of the function to be minimized. This is discussed in the following section.

4.4.3 Descent Methods

This section explores gradient descent based methods for minimizing J1 or J2. The idea of a gradient descent is very simple. Suppose we wish to find Θ which minimizes a function J(Θ) and we have an initial ∇ | guess Θ0. Assuming a first-order approximation of J, changing Θ0 in the direction of J Θ=Θ0 produces the maximum increase in J, so taking a step in the opposite direction produces the maximum decrease in J. This leads to the general update formula for gradient descent:

− ∇ | Θk+1 =Θk α J Θ=Θk , (4.5)

+ where α ∈ R is the step size. We now compute the gradient and the bounds on α for stability for each of the cost function of Section 4.3.

4.4.3.1 Minimization of J1

Start by computing ∇J1 elementwise. Firstly, X X ∂J1 ∂ 2 ∂Yij = Yij = 2Yij . (4.6) ∂θk ∂θk ∂θk i=6 j i=6 j

For notational convenience, let U(a,b) = Ga,θa Ga+1,θa+1 ...Gb,θb ,whereU(a,b) = I if b

(k) T T T T A = U(1,k−1)VkU(k+1,K)XU(1,K) + U(1,K)XU(k+1,K)Vk U(1,k−1). (4.7)

Combining (4.6)and(4.7), X ∂J1 (k) =2 Yij Aij . (4.8) ∂θk i=6 j

Theorem 4.2 Denote the eigenvalues of X by λ1,λ2,...,λN and let Θ? correspond to a diagonalizing transform for X. Then for Θ0 sufficiently close to Θ? the gradient descent algorithm described by (4.5)and(4.8) converges to Θ? if  −1 2 0 <α< 2max(λi − λj) . i,j

Proof: Without loss of generality, we can assume that X = diag([λ1 λ2 ···λN ]). This amounts to selecting the coordinates such that Θ? = 0. The key to the proof is observing that (4.5) describes an autonomous, nonlinear, discrete-time dynam- ical system and linearizing the system. Write

Θk+1 =Θk − αf(Θk), CHAPTER 4. NEW METHODS FOR TRANSFORM ADAPTATION 82 which upon linearization about 0 gives b b Θk+1 =(I − αF )Θk, h i ∂ where Fij = fi(Θ) . A sufficient condition for local convergence is that the eigenvalues of I − αF lie in ∂θj Θ=0 the unit circle. The fact that the local exponential stability of the original nonlinear system can be inferred from an eigenvalue condition on the linearized system follows from the continuous differentiability of f [196]. Now evaluate F . Differentiating (4.8) gives ! 2 X (k) ∂ J1 ∂Aij (k) ∂Yij =2 Yij + Aij ∂θ` ∂θk 6 ∂θ` ∂θ` i=j ! X (k) ∂Aij (k) (`) =2 Yij + Aij Aij . (4.9) ∂θ` i=6 j

Evaluating (4.9)atΘ=0, Y becomes X (diagonal), so the first term makes no contribution; we need not (k) (k) T attempt to calculate ∂Aij /∂θ`.Byinspectionof(4.7), A becomes VkX + XVk . This simplifies further to a − matrix which is all zeros except for having λjk λik in the (ik,jk) and (jk,ik) positions, where (ik,jk) is the (i, j) pair corresponding to k in the index remapping discussed following Definition 4.1. Noting now that A(k) and A(`) have nonzero entries in the same positions only if k = `, we are prepared to conclude that   ( 2 − 2 ∂ J1 4(λik λjk ) if k = `, Fk` = = (4.10) ∂θ` ∂θk Θ=0 0 otherwise.

− − − 2 The eigenvalues of I αF are 1 4α(λik λjk ) . The proof is completed by requiring that these all

lie in the unit circle. 

The nonlinear nature of the iteration makes analysis very difficult without linearization. In the N =2 case, the iteration can be analyzed directly; a stronger result is thus obtained. Similar, stronger results may be true for larger N.

Theorem 4.3 In the case N =2, the result of Theorem 4.2 can be strengthened to an “almost global” exponen- tial stability result, i.e., from any initial condition except a maximum, the iteration will converge exponentially to the desired minimum of J1.

Proof: Without loss of generality, assume X = diag([λ1 λ2]). First notice that when N =2,thesetoftransforms under consideration are described by a single scalar parameter, or K =1. Dropping all unnecessary subscripts, (4.7) reduces to " # (λ − λ )sin2θ (λ − λ )cos2θ A = VXUT + UXV T = 2 1 2 1 , (λ2 − λ1)cos2θ −(λ2 − λ1)sin2θ and " # λ cos2 θ + λ sin2 θ 1 (λ − λ )sin2θ Y = TXTT = 1 2 2 2 1 . 1 − 2 2 2 (λ2 λ1)sin2θλ1 sin θ + λ2 cos θ Simplifying (4.8) gives ∂J 1 =2(Y A + Y A )=(λ − λ )2 sin 4θ. ∂θ 12 12 21 21 2 1 CHAPTER 4. NEW METHODS FOR TRANSFORM ADAPTATION 83

Thus the iteration to analyze is 2 θk+1 = θk − α(λ2 − λ1) sin 4θk. (4.11)

It is immediately clear that all multiples of π/4 are fixed points; the even multiples correspond to the desired transforms and the odd multiples are the only initial conditions for which the iteration does not converge to a diagonalizing transform. For convenience, we can consider only 0 < |θ0| <π/4; other cases are similar. We −2 would like to show that limk→∞ θk =0.Suppose0 <θ0 <π/4. Then using sin 4θ ≤ 4θ and α<(λ2 − λ1) /2 2 one can show that α(λ2 − λ1) sin 4θ0 ∈ (0, 2θ0). Thus |θ1| < |θ0|. The −π/4 <θ0 < 0 case is similar. Since (4.11) is a strictly contractive mapping on (−π/4,π/4) the iteration must converge to the only fixed point in the

interval, zero. 

4.4.3.2 Minimization of J2

We will continue to use the notation introduced in Section 4.4.3.2. ∇J2 is given elementwise by     YN XN YN XN YN ∂J2 ∂ ∂Ymm   (k)  = Yii = Yii = Amm Yii (4.12) ∂θk ∂θk ∂θk i=1 m=1 i=1,i=6 m m=1 i=1,i=6 m   XN 1 = J (T ) A(k) , 2 Y mm m=1 mm where A(k) wasdefinedin(4.7). As before, the gradient descent update is specified by (4.5).

Theorem 4.4 Denote the eigenvalues of X by λ1,λ2,...,λN and let Θ? correspond to a diagonalizing transform for X. Then for Θ0 sufficiently close to Θ? the gradient descent algorithm described by (4.5)and(4.12) converges to Θ? if   2 −1 (λi − λj ) 0 <α< Jmin max , i,j λiλj Q N where Jmin = i=1 λi.

Proof: The method of proof is again to linearize the autonomous, nonlinear, discrete-time dynamical system that we have implicitly defined, and again the analysis is simplified by assuming that X = diag([λ1 λ2 ···λN ]). Differentiating (4.12) gives   2 XN YN ∂ J2 ∂  (k)  = Amm Yii ∂θ` ∂θk ∂θ` m=1 i=1,i=6 m     XN (k) YN YN ∂Amm   (k) ∂  = Yii + Amm Yii . (4.13) ∂θ` ∂θ` m=1 i=1,i=6 m i=1,i=6 m

When (4.13) is evaluated at Θ=0, the second term does not contribute because the diagonal of A(k) is zero for (k) all k.Evaluationof∂Amm/∂θ` is somewhat tedious and is left for Appendix 4.C. The result is summarized as   −  2(λjk λik ) if k = ` and m = ik; ∂A(k) mm = − (4.14)  2(λik λjk ) if k = ` and m = jk; ∂θ`  0 otherwise; CHAPTER 4. NEW METHODS FOR TRANSFORM ADAPTATION 84

where (ik,jk) is related to k as before. Combining (4.13)and(4.14) gives    − 2 2  2Jmin(λik λjk ) ∂ J2 if k = `, λik λjk Fk` = =  ∂θ` ∂θk Θ=0 0 otherwise.

Requiring the eigenvalues of I − αF to lie in the unit circle completes the proof. 

In the N =2case(butnot in general) the two gradient descent algorithms are equivalent. Hence

Theorem 4.3 applies to the descent with respect to J2 also. Again, the convergence result of Theorem 4.4 might be strengthened for general N, but the analysis seems difficult.

4.4.3.3 Comparison of descent methods

The linearizations used in the proofs of Theorems 4.2 and 4.4 facilitate easy analyses of the rates of convergence of the two descent methods. Consider the descent with respect to J1. Using (4.10), the error in the − − 2 n kth component of Θ at the nth iteration is approximately c[1 4α(λik λjk ) ] . If we assume for the moment − 2 that we know (λik λjk ) , we could choose α to make the bracketed quantity equal to zero; then modulo the linearization, the convergence is in one step. The problem is that even if we could do this, the other components of Θ might not converge quickly or converge at all. Thus a quantity of fundamental interest in using the descent − 2 with respect to J1 is the variability of (λik λjk ) ,measuredbythepseudo-eigenvalue spread (designated as such since it is analogous to the eigenvalue spread in LMS adaptive filtering [30]):

2 maxi,j (λi − λj) s1(X)= 2 . mini,j (λi − λj)

The corresponding quantity for the descent with respect to J2 is

− 2 max (λi λj ) i,j λiλj s2(X)= − 2 . min (λi λj ) i,j λiλj

The difference between s1(X)ands2(X) suggests that the superior algorithm will depend on X along with the choice of α. This is confirmed through the following calculations and simulations. Consider the matrices 7 5 1 1 1 1 X1 = diag([1, 8 , 8 , 2 ]) and X2 = diag([1, 2 , 4 , 8 ]), for which

s1(X1)=16< 28 = s2(X1) and 49 s (X )=49> = s (X ). 1 2 4 2 2

Based on the pseudo-eigenvalue spreads, one expects a descent with respect to J1 to perform better than a descent with respect to J2 for diagonalizing X1, and vice versa for X2. Simulations were performed with α at half the maximum value for stability and 100 randomly selected initial conditions Θ0. The averaged results, shown in Figure 4.2, indicate that the performance is as predicted.

4.4.4 Nonparametric Methods

Figure 4.2: Simulations of the gradient descent algorithms. In each case α is set to half the maximum value for stability and results are averaged over 100 randomly chosen initial conditions Θ0. The relative performances of descents with respect to J1 and J2 are as predicted by the pseudo-eigenvalue spreads. Parts (a) and (b) are for matrices X1 and X2, respectively; each panel plots J1 and J2 − J2,min versus iteration (the curve labels refer to the left and right y-axes).

As noted above, finding the optimal transform is equivalent to finding an eigendecomposition of a symmetric matrix. The best algorithms (rated in terms of the number of floating point operations) for the symmetric eigenproblem do not use a parameterization of a diagonalizing transform as in the preceding sections. The best algorithms to date for computing the eigendecomposition of a single symmetric matrix are variations of the QR algorithm. However, these algorithms do not allow one to take advantage of knowledge of approximate eigenvectors, as one would have with a slowly varying sequence of X matrices. This section briefly outlines Jacobi methods, which allow this prior information to be effectively incorporated. Details on QR and Jacobi algorithms can be found in [62].

The idea of the classical Jacobi algorithm is to at each iteration choose a Givens rotation to reduce the off-diagonal energy as much as possible. More specifically, the algorithm produces a sequence {Tk} and also keeps track of Ak = TkXTk^T. If the Givens rotation Gi,j,θ (see (4.3)) is chosen in computing Tk+1, the maximum reduction in the off-diagonal energy (by correctly choosing θ) is (Ak)ij²; thus, the best choice for (i, j) is that which maximizes (Ak)ij². It is a greedy minimization of J1, but since Givens rotations do not commute, it is hard to interpret it in terms of the parameterization we used earlier.

A drawback of the classical Jacobi algorithm is that while each iteration requires only O(N) operations for the updates to Tk and Ak, choosing (i, j) requires O(N²) operations. This can be remedied by eliminating the search step and instead choosing (i, j) in a predetermined manner. This is called the cyclic Jacobi algorithm, and each cycle through the K = N(N − 1)/2 distinct (i, j) pairs is called a sweep.

To provide a basis of comparison with the results of Sections 4.4.2 and 4.4.3, simulations of the cyclic Jacobi algorithm were performed on X = diag([1, 1/2, 1/4, 1/8]) with random initial transforms corresponding to the random initial parameter vectors used before. The averaged results of 400 simulations are shown in Figure 4.3. Note that the x-axis shows the number of rotations, not the number of sweeps. An attractive feature of the cyclic Jacobi algorithm is that the updates can be partitioned into sets of "noninteracting" rotations, i.e., rotations involving disjoint sets of rows and columns. These noninteracting rotations can be done in parallel. All Jacobi algorithms have the advantage that a good initial transform speeds up convergence.

Figure 4.3: Simulations of the cyclic Jacobi algorithm on X = diag([1, 1/2, 1/4, 1/8]) with randomly chosen initial transform T0. The plot shows J1 versus the number of rotations.
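For concreteness, here is a minimal Python sketch of the cyclic Jacobi iteration just described. It is an illustrative reimplementation under assumed conventions (sweep ordering, rotation sign), not the code used to produce Figure 4.3.

import numpy as np

def cyclic_jacobi(X, T0=None, sweeps=10):
    # Cyclic Jacobi: returns T such that T X T^T is (nearly) diagonal.
    N = X.shape[0]
    T = np.eye(N) if T0 is None else T0.copy()
    A = T @ X @ T.T
    for _ in range(sweeps):
        for i in range(N - 1):
            for j in range(i + 1, N):      # one sweep = K = N(N-1)/2 rotations
                # Rotation angle that zeros the (i, j) entry of A.
                theta = 0.5 * np.arctan2(2 * A[i, j], A[i, i] - A[j, j])
                G = np.eye(N)
                c, s = np.cos(theta), np.sin(theta)
                G[i, i], G[i, j], G[j, i], G[j, j] = c, s, -s, c
                A = G @ A @ G.T
                T = G @ T
    return T, A

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal matrix
X = Q @ np.diag([1, 1/2, 1/4, 1/8]) @ Q.T          # symmetric test matrix
T, A = cyclic_jacobi(X)
print(np.sort(np.diag(A)))                          # recovers the eigenvalues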

4.4.5 Comments and Comparisons

Comments on the relative merits of random search, gradient descent, and Jacobi methods are in order. By comparing Figures 4.1–4.3, it is clear that the cyclic Jacobi method gives the fastest convergence rate in terms of the number of iterations or rotations. Since the Jacobi method also has the lowest complexity, it is clearly the best choice for computing a single eigendecomposition. Interest in the random search and gradient search methods is due to the analogy to adaptive filtering that is developed more fully in the following section. A potential benefit of the random search and gradient descent methods is that they operate directly on a minimal parameterization of the transform matrices of interest. Of course, a transform matrix could be determined using a Jacobi method and then parameterized afterward.

4.5 Adaptive Transform Coding Update Methods

In the previous section a set of algorithms for iteratively determining the optimal transform was established, assuming that the source correlation matrix Xn is constant. Recalling our overall strategy, we would now like to apply these algorithms in an adaptive setting.

The traditional implementation approach would be to calculate a sequence of estimates {X̂n} using a windowed time average and to use these averages in the adaptive algorithms. The extreme case of this approach is to use a time average over only one sample, i.e., X̂n = xn xn^T. This results in a computational savings and, in the case of gradient descent parameter search, gives an algorithm very much in the spirit of LMS. Specifically, in a transform coding application, it may be desirable to eliminate the need for side information by putting quantization inside the adaptation loop, as in Chapter 3. These implementation possibilities are described in detail in the remainder of this section.

In the interest of brevity, simulation results are not provided for each combination of cost function, implementation structure, and search algorithm. The greatest emphasis is placed on gradient descent parameter surface search with no time averaging in the correlation estimation. This is chosen because gradient search outperforms random search and the transform update is simplified by having rank-one autocorrelation estimates.

4.5.1 Explicit Autocorrelation Estimation

The most obvious way to implement an adaptive transform coding system is to use a windowed correlation estimate of the form
\[ \hat{X}_n = \frac{1}{M}\sum_{k=n-M+1}^{n} x_k x_k^T. \tag{4.15} \]
If the true correlation is constant, then X̂n is elementwise an unbiased, consistent estimator of X [30]. There will be "estimation noise" (variance in X̂n due to having finite sample size) which decreases monotonically with M. If {Xn} is slowly varying, there will also be "tracking noise" (mismatch between X̂n and Xn caused by the causal observation window) which increases with M. Thus, in the time-varying case there is a trade-off, controlled by M, and one can expect there to be an optimal value of M depending on the rate at which {Xn} varies.

To illustrate the ability to track a time-varying source and the dependence on M, construct the following synthetic source: For each time n ∈ Z⁺, xn is a zero-mean jointly Gaussian vector with correlation matrix
\[ X_n = U_n^T \begin{bmatrix} 1 & 0 & 0 \\ 0 & \tfrac12 & 0 \\ 0 & 0 & \tfrac14 \end{bmatrix} U_n, \]
where Un is a time-varying unitary matrix specified as

\[ U_n = G_{1,\,\omega_1 n + \varphi_1}\, G_{2,\,\omega_2 n + \varphi_2}\, G_{3,\,\omega_3 n + \varphi_3}. \]

Un is an ideal transform to be used at time n. The ωi's are fixed "angular velocities" to be tracked and the ϕi's are independent, uniformly distributed phases. Averaging over randomly selected phases removes any periodic components from simulation results.

This source was used in simulations of the linear search with respect to J1. For all the simulations α = 3 and σ = 0.01. In the first set of experiments (see Figure 4.4(a)) ω1 = ω2 = ω3 = 0. Since the source is not time varying, there is no tracking noise and the estimation noise decreases as M is increased, so the overall performance improves as M is increased. The second and third sets of experiments use ω1 = ω2 = ω3 = 0.001 and ω1 = ω2 = ω3 = 0.002, respectively. Now since the source is time varying, the performance does not improve monotonically as M is increased because as the estimation noise decreases, the tracking noise increases. For the slower varying source (see Figure 4.4(b)) the performance improves as M is increased from 5 to 20 and then is about the same for M = 40. For the faster varying source (see Figure 4.4(c)) the tracking noise is more significant, so the best value of M is lower. A faster varying source may also justify a larger value of α.

Figure 4.4: Simulations of linear search with respect to J1 with explicit, windowed correlation estimation: (a) ω1 = ω2 = ω3 = 0; (b) ω1 = ω2 = ω3 = 0.001; (c) ω1 = ω2 = ω3 = 0.002. The source is slowly varying as described in the text. M is the length of the data window. Fixed parameters: α = 3, σ = 0.01. Results are averaged over 400 randomly chosen initial conditions Θ0 and source phases ϕ1, ϕ2, ϕ3.

The estimate (4.15) implicitly uses a rectangular window to window the incoming data stream, so each sample vector is equally weighted. One way to more heavily weight the later sample vectors is to use a "forgetting factor" β (as in Recursive Least Squares [104]), which is equivalent to using an exponential window:
\[ \hat{X}_n = \beta\,\hat{X}_{n-1} + (1-\beta)\,x_n x_n^T. \]

This scheme also reduces memory requirements.
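As a concrete illustration of the two estimators, the following Python fragment (illustrative only; the window length, forgetting factor, and test source are arbitrary choices) computes the rectangular-window estimate (4.15) and the exponential-window update.

import numpy as np

def windowed_estimate(samples, M):
    # Rectangular-window estimate (4.15): average of the last M outer products.
    recent = samples[-M:]
    return sum(np.outer(x, x) for x in recent) / len(recent)

def exponential_update(X_hat, x, beta):
    # Forgetting-factor update: X_hat <- beta*X_hat + (1 - beta)*x*x^T.
    return beta * X_hat + (1 - beta) * np.outer(x, x)

rng = np.random.default_rng(0)
samples = [rng.multivariate_normal(np.zeros(3), np.diag([1, 0.5, 0.25]))
           for _ in range(200)]
X_rect = windowed_estimate(samples, M=40)
X_exp = np.zeros((3, 3))
for x in samples:
    X_exp = exponential_update(X_exp, x, beta=0.95)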

4.5.2 Stochastic Update

Taking the autocorrelation estimation of the previous section to the extreme of estimating the autocorrelation based on a single sample vector gives
\[ \hat{X}_n = x_n x_n^T. \tag{4.16} \]

The use of this extremely simple estimate simplifies the calculations associated with parameter surface search. This will be referred to as the stochastic implementation because it is the result of replacing an expected value by its immediate, stochastic value.

Both random search methods require calculation of J. For a general X ∈ R^{N×N}, computing TXT^T = G1G2···GK X GK^T···G2^T G1^T requires 8KN multiplications and 4KN additions because each multiplication by a Givens matrix requires 4N multiplications and 2N additions. With the rank-one X̂n given by (4.16), one can first write
\[ T\hat{X}_n T^T = G_1G_2\cdots G_K\, x_n x_n^T\, G_K^T\cdots G_2^T G_1^T = \left(G_1G_2\cdots G_K x_n\right)\left(G_1G_2\cdots G_K x_n\right)^T. \]
Then, since multiplying a vector by a Givens matrix requires 4 multiplications and 2 additions, the bracketed terms can be computed with 4K multiplications and 2K additions. Now J1(T) can be computed with K additional multiplications and K − 1 additional additions, or J2(T) can be computed with N additional multiplications. The computation of ∇J is similarly simplified.

The stochastic implementation of gradient descent parameter search was simulated for the source described in the previous section (see Figure 4.5). There is a single parameter to choose: the step size α.

Using Theorems 4.2 and 4.4 gives maximum step sizes of 8/9 and 32/9 for descent with respect to J1 and J2, respectively. These theorems apply only to iterative computations with exact knowledge of the correlation matrix; however, they provide rough guidelines for step size choice in the stochastic setting.

When the source distribution is time-invariant (ω1 = ω2 = ω3 = 0 for the source we are considering), the effect of the step size α is easy to discern. A larger step size reduces the adaptation time constants, so steady-state performance is reached more quickly. However, because the parameter vector Θ is adapted based on each source vector, the steady-state performance has a "noisy" stochastic component. This "excess" in J increases as the step size is increased. This has not been characterized analytically, but qualitatively it is similar to the "excess" mean-square error in LMS filtering [205]. Referring to Figure 4.5(a), the steady-state value of J1 decreases monotonically as α is decreased, but the convergence is slower.

Because the source is time-invariant, there is a conceptually simple alternative to the stochastic gradient descent which provides a bound to attainable performance. This is to use all the source vectors observed thus far to estimate the correlation, using (4.15) with M = n, and then computing the eigendecomposition of the correlation estimate to full machine precision. This bound is the lowest curve in Figure 4.5(a).
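A minimal sketch of the stochastic implementation is given below. It assumes N = 3 with the pair ordering (1,2), (1,3), (2,3), uses an assumed sign convention for the Givens rotations, and replaces the closed-form gradient expressions with a finite-difference approximation; it is meant only to convey the structure of the update, not to reproduce the simulations of Figure 4.5.

import numpy as np

PAIRS = [(0, 1), (0, 2), (1, 2)]          # (i_k, j_k) pairs for N = 3, K = 3

def transform(theta, N=3):
    # Product of Givens rotations T = G_1 G_2 ... G_K (assumed ordering/signs).
    T = np.eye(N)
    for (i, j), t in zip(PAIRS, theta):
        G = np.eye(N)
        G[i, i] = G[j, j] = np.cos(t)
        G[i, j], G[j, i] = np.sin(t), -np.sin(t)
        T = T @ G
    return T

def J1(theta, X):
    Y = transform(theta) @ X @ transform(theta).T
    return np.sum(Y ** 2) - np.sum(np.diag(Y) ** 2)   # off-diagonal energy

def sgd_step(theta, x, alpha, eps=1e-6):
    # One stochastic update: rank-one estimate X_hat = x x^T and a
    # finite-difference gradient standing in for the closed-form gradient.
    X_hat = np.outer(x, x)
    grad = np.array([(J1(theta + eps * e, X_hat) - J1(theta - eps * e, X_hat)) / (2 * eps)
                     for e in np.eye(len(theta))])
    return theta - alpha * grad

rng = np.random.default_rng(0)
theta = 0.1 * rng.standard_normal(3)
for _ in range(2000):
    x = rng.multivariate_normal(np.zeros(3), np.diag([1, 0.5, 0.25]))
    theta = sgd_step(theta, x, alpha=(8 / 9) / 500)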

Figure 4.5: Simulations of stochastic gradient descent with respect to J1: (a) ω1 = ω2 = ω3 = 0; (b) ω1 = ω2 = ω3 = 0.001; (c) ω1 = ω2 = ω3 = 0.002. The source is slowly varying as described in the text. Step sizes are given by α = αmax/γ, where αmax = 8/9 is the maximum step size for stability predicted by Theorem 4.2. Curves are labeled by the value of γ. Results are averaged over 400 random initial conditions Θ0 and source phases ϕ1, ϕ2, ϕ3. In (a) the performance is compared to computing an exact eigendecomposition of a correlation estimate from all the sample vectors observed thus far.

The situation is more complicated when the source distribution is time-varying. Now the step size affects the ability to track the time variation along with determining the steady-state noise and speed of convergence. Figures 4.5(b) and (c) show the results of simulations with ω1 = ω2 = ω3 = 0.001 and ω1 = ω2 = ω3 = 0.002, respectively. In the first of these simulations, the best performance is achieved for α between (8/9)/500 and (8/9)/200. The larger of these gives slightly faster convergence and the smaller gives slightly lower steady-state error. For the faster-varying source, (8/9)/500 is too small for effectively tracking the source.

4.5.3 Quantized Stochastic Implementation

In adaptive transform coding, if the transform adaptation is based upon the incoming uncoded data stream, then in order for the decoder to track the encoder state, the transform adaptation must be described over a side information channel. This situation, which is commonly called forward-adaptive, is depicted in Figure 4.6(a). The need for side information can be eliminated if the adaptation is based on the coded data, as shown in Figure 4.6(b). This backward-adaptive configuration again has an analogy in adaptive FIR Wiener filtering: In adaptive prediction, where the linear predictor is in fact an adaptive FIR Wiener filter, making the adaptation depend on quantized data yields adaptive differential pulse code modulation (ADPCM).4

The stochastic gradient descent was simulated in the backward-adaptive configuration. Since quantization is an irreversible reduction in information, it must be at least as hard to estimate the moments of a signal from a quantized version as it is from the original unquantized signal. Thus one would expect the convergence rate to be somewhat worse in the backward-adaptive configuration. Figure 4.7(a) shows simulation results for a time-invariant source (ω1 = ω2 = ω3 = 0). The lower set of curves is for direct computation as in Figure 4.5(a) and the upper set of curves is for stochastic gradient descent with step size α = (8/9)/500. With quantization step size ∆ = 0.125 or 0.25, the rate of convergence is almost indistinguishable from the unquantized case. As the quantization becomes coarser, the convergence slows. Notice that with direct computation, quantization does not seem to lead to a nonzero steady-state error. Though the quantization is undithered, the random variation of the transform has a similar effect as using a dither signal in conjunction with the quantizer. This may explain this convergence behavior and is suggestive of universal performance of the backward-adaptive scheme, as discussed in Chapter 3.

For a slowly varying source (ω1 = ω2 = ω3 = 0.001; see Figure 4.7(b)), it is again true that the performance with ∆ = 0.125 or 0.25 is indistinguishable from the performance without quantization. The convergence slows as the quantization becomes coarser, but here there may also be a small increase in steady-state error.

4.5.4 Specialization for a Scalar Source

In many applications that involve processing of vectors, the vectors are actually generated by forming blocks from a scalar-valued source. The methods developed in this chapter are general and hence applicable to this case. However, a few specific refinements facilitate better performance than in the general case.

Suppose the original scalar source is a wide-sense stationary process {zn}, which we observe for n ≥ 1, and that we generate a vector source {xn} by forming blocks of length N. Then the correlation matrix X = E[xn xn^T] is a symmetric, Toeplitz matrix with Xij = rz(i − j) = E[zi zj].

4Note that ADPCM is often used to refer to a system with adaptive quantization. However, quantization adaptation is beyond the scope of this thesis; see [147, 148].


Figure 4.6: Structural comparison between forward- and backward-adaptive systems: (a) forward-adaptive transform coding system; (b) backward-adaptive transform coding system. The backward-adaptive system does not require a side information channel to convey transform state.

Figure 4.7: Simulations of stochastic gradient descent with respect to J1 in backward-adaptive configuration: (a) time-invariant source (ω1 = ω2 = ω3 = 0); (b) slowly varying source (ω1 = ω2 = ω3 = 0.001). Step sizes are given by α = αmax/500, where αmax = 8/9 is the maximum step size for stability predicted by Theorem 4.2. Curves are labeled by the value of the quantization step size ∆. Results are averaged over 400 randomly chosen initial conditions Θ0 and source phases ϕ1, ϕ2, ϕ3. In (a) the performance is also compared to computing an exact eigendecomposition of a correlation estimate based on all the (quantized) sample vectors observed thus far.

One consequence of the symmetric, Toeplitz structure of X is that there are actually fewer than K = N(N − 1)/2 independent parameters to estimate to find a diagonalizing transform. For N = 3, for example, one can show that
\[ X = \begin{bmatrix} a & b & c \\ b & a & b \\ c & b & a \end{bmatrix} \]
always has [−1 0 1]^T as an eigenvector and that it suffices to consider transforms of the form
\[ T = \begin{bmatrix} -\sqrt{2}/2 & 0 & \sqrt{2}/2 \\ \zeta & \sqrt{1-2\zeta^2} & \zeta \\ \sqrt{1-2\zeta^2}/\sqrt{2} & -\sqrt{2}\,\zeta & \sqrt{1-2\zeta^2}/\sqrt{2} \end{bmatrix}. \]

This can be used to derive new performance surface search methods with fewer parameters.

A second consequence is that estimates better than (4.15) can be used. Having observed M N-tuples from the source, (4.15) gives
\[ \hat{X}_{ij} = \frac{1}{M}\sum_{n=1}^{M}(x_n)_i(x_n)_j = \frac{1}{M}\sum_{n=1}^{M} z_{N(n-1)+i}\, z_{N(n-1)+j}. \tag{4.17} \]
Each of the terms of (4.17) has expected value rz(i − j), and by averaging over M observations one clearly gets an unbiased, consistent estimate. However, with MN samples we can actually average over MN − (i − j) terms to get a much lower variance estimate:
\[ \hat{X}_{ij} = \hat{r}_z(i-j) = \frac{1}{MN-(i-j)}\sum_{n=1}^{MN-(i-j)} z_n\, z_{n+(i-j)}. \]
For a time-varying source, either a finite window or a forgetting factor could be used.
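A small Python sketch of this lower-variance Toeplitz estimator follows (illustrative; the AR(1) test source and its parameters are arbitrary assumptions).

import numpy as np

def toeplitz_estimate(z, N):
    # Estimate the N x N correlation matrix of a blocked scalar source by
    # averaging z_n z_{n+m} over all available lags m = 0, ..., N-1.
    r = [np.mean(z[:len(z) - m] * z[m:]) for m in range(N)]
    return np.array([[r[abs(i - j)] for j in range(N)] for i in range(N)])

rng = np.random.default_rng(0)
z = np.zeros(4000)
for n in range(1, len(z)):                 # synthetic AR(1) source
    z[n] = 0.9 * z[n - 1] + rng.standard_normal()
X_hat = toeplitz_estimate(z, N=3)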

4.6 Conclusions

A new class of algorithms based on parameter surface search was introduced for computing the eigenvectors of a symmetric matrix. These algorithms are potentially useful for adaptive transform coding or on-line principal component analysis. The development is conceptually summarized as follows: A matrix of eigenvectors forms an orthogonal diagonalizing similarity transformation; it suffices to consider orthogonal matrices which are parameterized as a product of Givens rotations; and appropriate parameter values can be found as an unconstrained minimization.

The key is the formulation of unconstrained minimization problems over a minimal number of parameters. Borrowing from the adaptive filtering literature, linear and fixed step random search and gradient descent were applied to the resulting minimization problems. In the gradient descent case, step size bounds were found to ensure convergence in the absence of estimation noise. Simulations demonstrated that in the presence of estimation noise, the gradient descent converges when the step size is chosen small relative to the bound.

In a transform coding application, one may use a rank-one stochastic estimate of the correlation matrix. This simplifies the computations in the gradient descent update. In a backward-adaptive configuration, the adaptation is driven by quantized data so that the decoder and encoder can remain synchronized without the need for side information. As long as the quantization is not too coarse, the algorithms presented here seem to converge.

Figure 4.8: Canonical configuration for Wiener filtering. The objective is to design the filter f(n) such that the power of e(n) is minimized.

Appendices

4.A Brief Review of Adaptive FIR Wiener Filtering

The canonical Wiener filtering problem is described as follows.5 Let x(n) and d(n) be jointly wide-sense stationary, zero-mean, scalar random processes. Design a linear filter f(n) such that the mean-squared error between the desired signal d(n) and the output of the filter y(n) = x(n) ∗ f(n) is minimized (see Figure 4.8). Two common applications are separating signal from noise and channel equalization. For denoising, x(n) = d(n) + w(n), where w(n) is unknown but has known spectral density and is uncorrelated with d(n). For equalization, x(n) = d(n) ∗ c(n), where c(n) is a channel impulse response. The case where f(n) is constrained to be a causal, L-tap FIR filter is considered here. This and other cases are discussed in detail in [30].

Finding the optimal filter is conceptually simple once we select a convenient vector notation. Let f̄ = [f(0), f(1), ..., f(L−1)]^T and x̄n = [x(n), x(n−1), ..., x(n−L+1)]^T. Then e(n) = d(n) − x̄n^T f̄. The power of e(n) is a quadratic function of the filter vector:

J(f)=E[e(n)2].

It is this function which we call the performance surface, and finding the optimal filter amounts to finding the parameter vector which yields the minimum of the performance surface.6 It can be shown that the gradient of J with respect to f̄ is given by

\[ \nabla J = 2\left(X\bar{f} - \bar{r}_{dx}\right), \]

where (consistent with the main text) X = E[x̄n x̄n^T] and r̄dx = E[x̄n d(n)]. From this we can conclude that the optimal filter is described by
\[ \bar{f}_{\mathrm{opt}} = X^{-1}\bar{r}_{dx}. \tag{4.18} \]
There are two practical problems in applying the analytical solution (4.18). The first is that X and r̄dx may be unknown and may depend on n. A remedy would be to estimate these moments as the incoming data is processed, giving X(n) and r̄dx(n). This leads to the second problem, which is that each update of the

filter requires solving a linear system X(n) f̄ = r̄dx(n).

5The anonymous designation of "optimal least-squares filtering" is also used.
6Under some technical conditions, the minimum is unique.

The LMS or stochastic gradient algorithm addresses both of these problems. There are two main ideas. Firstly, instead of exactly minimizing J by using (4.18), iteratively update f̄ by adding −α∇J, where α > 0 is called the step size. As long as α is chosen small enough, this procedure will converge to f̄opt; however, as long as we still require knowledge of X and r̄dx, this is not very useful. The second main idea is to replace X and r̄dx by the simplest possible stochastic approximations: X(n) ≈ x̄n x̄n^T and r̄dx ≈ x̄n d(n). This yields the update equation for LMS:

\[ \bar{f}(n+1) = \bar{f}(n) - 2\alpha\left(y(n) - d(n)\right)\bar{x}_n. \tag{4.19} \]
One normally studies the stability and rate of convergence of (4.19) by analyzing

\[ \bar{f}(n+1) = \bar{f}(n) - 2\alpha\left(X\bar{f} - \bar{r}_{dx}\right). \]

This can be interpreted as ignoring the stochastic aspect of the algorithm or as looking at the mean of f¯ and applying the so-called “independence assumption” [204].
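For reference, the LMS recursion (4.19) is only a few lines of code. The sketch below (an illustrative denoising example; the signal, filter length, and step size are arbitrary assumptions) applies it with x(n) = d(n) + w(n).

import numpy as np

def lms(x, d, L, alpha):
    # LMS update (4.19): f <- f - 2*alpha*(y(n) - d(n)) * xbar_n.
    f = np.zeros(L)
    y = np.zeros(len(x))
    for n in range(L - 1, len(x)):
        xbar = x[n - L + 1:n + 1][::-1]      # [x(n), x(n-1), ..., x(n-L+1)]
        y[n] = f @ xbar
        f -= 2 * alpha * (y[n] - d[n]) * xbar
    return f, y

rng = np.random.default_rng(0)
d = np.sin(0.05 * np.arange(5000))           # desired signal
x = d + 0.5 * rng.standard_normal(5000)      # noisy observation
f, y = lms(x, d, L=8, alpha=0.01)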

4.B Alternative Gradient Expressions

The gradient expressions given in Section 4.4.3 were intended to facilitate Theorems 4.2 and 4.4. Alternative expressions for ∇J1 and ∇J2 are given in this appendix.

We will use the chain rule to compute ∇Jℓ, ℓ = 1, 2, through
\[ \frac{\partial J_\ell}{\partial\theta_k} = \sum_{i,j}\frac{\partial J_\ell}{\partial T_{ij}}\,\frac{\partial T_{ij}}{\partial\theta_k}. \]

Recalling the definitions of U(a,b) and Vk,θ from Section 4.4.3.1, if we define B^{(k)}_{ij} = ∂Tij/∂θk, then differentiating (4.4) gives
\[ B^{(k)} = U_{(1,k-1)}V_kU_{(k+1,K)}. \]
For both ℓ = 1 and ℓ = 2, the following intermediate calculation is useful:

\[ \frac{\partial Y_{ab}}{\partial T_{ij}} = \frac{\partial}{\partial T_{ij}}\sum_{r=1}^{N}\sum_{s=1}^{N} T_{as}X_{sr}(T^T)_{rb} = \delta_{i-a}\,T_{b*}X_{*j} + \delta_{i-b}\,T_{a*}X_{*j}, \]
where Tb* is the bth row of T and X*j is the jth column of X. Now
\[ \frac{\partial J_1}{\partial T_{ij}} = \sum_{a\ne b} 2Y_{ab}\frac{\partial Y_{ab}}{\partial T_{ij}} = \sum_{a\ne b} 2Y_{ab}\left(\delta_{i-a}T_{b*}X_{*j} + \delta_{i-b}T_{a*}X_{*j}\right) = 4\,e_i^T\, Y\left(I - e_ie_i^T\right)T X e_j, \]
where ei is the column vector with one in the ith position and zeros elsewhere. For J2 we have
\[ \frac{\partial J_2}{\partial T_{ij}} = \frac{\partial}{\partial T_{ij}}\prod_{\ell=1}^{N} Y_{\ell\ell} = \sum_{a=1}^{N}\frac{\partial Y_{aa}}{\partial T_{ij}}\prod_{\ell=1,\,\ell\ne a}^{N} Y_{\ell\ell} = \sum_{a=1}^{N}\frac{J_2}{Y_{aa}}\,2\delta_{i-a}T_{a*}X_{*j} = 2\,\frac{J_2}{Y_{ii}}\,T_{i*}X_{*j}. \]

4.C Evaluation of ∂A^{(k)}_{mm}/∂θℓ

In this appendix we derive (4.14). First consider the case ℓ = k. Let Wk = ∂Vk/∂θ. Differentiating (4.7) gives

\[ \frac{\partial A^{(k)}}{\partial\theta_k} = U_{(1,k-1)}W_kU_{(k+1,K)}XU_{(1,K)}^T + U_{(1,k-1)}V_kU_{(k+1,K)}XU_{(k+1,K)}^TV_k^TU_{(1,k-1)}^T + U_{(1,K)}XU_{(k+1,K)}^TW_k^TU_{(1,k-1)}^T + U_{(1,k-1)}V_kU_{(k+1,K)}XU_{(k+1,K)}^TV_k^TU_{(1,k-1)}^T, \]
which upon evaluation at Θ = 0 reduces to

\[ \left.\frac{\partial A^{(k)}}{\partial\theta_k}\right|_{\Theta=0} = W_kX + V_kXV_k^T + XW_k^T + V_kXV_k^T. \]

The simple structures of Vk and Wk now allow one to easily show that the only nonzero entries of ∂A^{(k)}/∂θk evaluated at Θ = 0 are 2(λjk − λik) in the (ik, ik) position and 2(λik − λjk) in the (jk, jk) position.

Now consider the case ℓ < k. Differentiating (4.7) with respect to θℓ and evaluating at Θ = 0 gives

\[ \left.\frac{\partial A^{(k)}}{\partial\theta_\ell}\right|_{\Theta=0} = V_\ell V_k X + V_k X V_\ell^T + X V_k^T V_\ell^T + V_\ell X V_k^T. \tag{4.20} \]
To satisfy (4.14) we would like to show that the diagonal of (4.20) is zero.

Lemma 4.5 For ℓ < k, the diagonal of VℓVk is zero.

Proof: Because ℓ < k, one of the following holds:

(a) iℓ < ik; or

(b) iℓ = ik and jℓ < jk.

Recall also that iℓ < jℓ and ik < jk.

The only potentially nonzero elements of VℓVk are in the (iℓ, ik), (iℓ, jk), (jℓ, ik), and (jℓ, jk) positions.

The (iℓ, jk) element cannot be on the diagonal because either iℓ < ik < jk or iℓ = ik < jk; the same inequality iℓ < jk forces the (jℓ, ik) element to be zero. The (iℓ, ik) element is −δ(jℓ − jk) and hence when this element is on the diagonal, it is zero; similarly for the (jℓ, jk) element. □

Corollary 4.6 For ℓ < k, the diagonals of VkVℓ^T, Vk^TVℓ^T, and VℓVk^T are also zero.

Since X is diagonal, Lemma 4.5 and Corollary 4.6 can be combined to show that the diagonal of (4.20) is zero. The ℓ > k case is similar to the ℓ < k case.

Chapter 5

Multiple Description Coding

Source coding researchers are demanding consumers of communication systems: They ask for every bit they produce to be reliably delivered. Depending on what is known about the channel, this may be possible in Shannon's sense, but at what cost? At the very least, depending on the acceptable probability of failure and on how close the rate is to the channel capacity, large block sizes and complex encoding and decoding may be needed. In addition, compression may greatly increase the sensitivity to any remaining uncorrected errors.

A simple example is a text file containing a story. If a handful of characters are deleted at random, the reader may be distracted, but the meaning is likely to be fully conveyed. On the other hand, losing a few random bytes of a Lempel–Ziv compressed version of the story could be catastrophic. If the compression is by a factor of, say, ten, the effect is much more pronounced than the loss of ten times as many bytes. The deletions make it nearly impossible to correctly interpret the meanings of the symbols in the file. This effect suggests that if the probability of error cannot be made zero by channel coding, it may be beneficial to leave data uncompressed.

To not compress is a rather extreme reaction to the possibility of a bit error. A more temperate approach is to account for the possibility of (uncorrected) channel impairments in the design of the source coding. For example, this may motivate the use of variable-to-fixed length codes instead of the more common fixed-to-variable length codes, of which the Huffman code is an example.

This chapter addresses the problem of multiple description (MD) source coding, which can be cast as a source coding method for a channel whose end-to-end performance (with channel coding) includes uncorrected erasures. This channel is encountered in a packet communication system that has effective error detection but does not have retransmission of incorrect or lost packets. After a comprehensive introduction, two new transform-based methods for MD source coding are introduced. These are also applied to image and audio coding.

This chapter includes research conducted jointly with Jelena Kovačević [67, 113]; Jelena Kovačević and Martin Vetterli [69, 70, 68, 76]; and Ramon Arean and Jelena Kovačević [6, 7].

Figure 5.1: Scenario for multiple description source coding with two channels and three receivers. The general case has M channels and 2^M − 1 receivers.

5.1 Introduction

At the September 1979 Shannon Theory Workshop held at the Seven Springs Conference Center, Mount Kisco, New York, the following question was posed by Gersho, Ozarow, Witsenhausen, Wolf, Wyner, and Ziv [48]:1 If an information source is described by two separate descriptions, what are the concurrent limitations on qualities of these descriptions taken separately and jointly? This problem would come to be known as the multiple description problem. The primary theoretical results in this area were provided by the aforementioned researchers along with Ahlswede, Berger, Cover, El Gamal, and Zhang in the 1980's.

1A transitive chain of acknowledgements suggests that the problem originated with Gersho.

Multiple description coding refers to the scenario depicted in Figure 5.1. An encoder is given a sequence of source symbols {Xk}_{k=1}^N to communicate to three receivers over two noiseless (or error-corrected) channels. One decoder (the central decoder) receives information sent over both channels while the remaining two decoders (the side decoders) receive information only over their respective channels. The transmission rate over channel i is denoted by Ri, i = 1, 2; i.e., signaling over channel i uses at most 2^{NRi} symbols. Denoting by {X̂i,k}_{k=1}^N the reconstruction sequence produced by decoder i, we have three distortions

\[ D_i = \frac{1}{N}\sum_{k=1}^{N} E\left[\delta_i\left(X_k, \hat{X}_{i,k}\right)\right], \qquad i = 0, 1, 2, \]
where the δi( · , · )'s are potentially distinct, nonnegative, real-valued distortion measures. The central theoretical problem is to determine the set of achievable values (in the usual Shannon sense) for the quintuple (R1, R2, D0, D1, D2). Specifically, (r1, r2, d0, d1, d2) is achievable if for sufficiently large N there exist encoding and decoding mappings such that

\[ R_i \le r_i,\quad i = 1, 2; \qquad D_i \le d_i,\quad i = 0, 1, 2. \]

Decoder 1 receives R1 bits and hence cannot have distortion less than D(R1), where D( · ) is the distortion–rate function of the source. Making similar arguments for the other two decoders and using rate–distortion functions instead of distortion–rate functions gives the following bounds on the achievable region:2

R1 + R2 ≥ R(D0), (5.1)

R1 ≥ R(D1), (5.2)

R2 ≥ R(D2). (5.3)

Achieving equality simultaneously in (5.1)–(5.3) would imply that an optimal rate R1 + R2 description can be partitioned into optimal rate R1 and rate R2 descriptions. Unfortunately, this is not true because optimal individual descriptions at rates R1 and R2 are similar to each other and hence redundant when combined. Making descriptions individually good and yet "independent" of each other is the fundamental trade-off in this problem. The counterpart to achievability is the design of explicit codes that operate on finite blocks of the source. The original contributions of this chapter are in this practical area and appear in Sections 5.3 and 5.4. The principal theoretical and practical results on multiple description coding are surveyed in Section 5.2.

The multiple description problem can be generalized to more than two channels and more than three receivers. The natural extension is to M channels and 2^M − 1 receivers, one receiver for each nonempty subset of channels. This generalization was considered by Witsenhausen [208] for the restricted case where the source has finite entropy rate and lossless communication is required when e

5.1.1 Applicability to Packet Networks

Recently the problem of transmitting data over heterogeneous packet networks has received considerable attention. A typical scenario might require data to move from a fiber link to a wireless link, which necessitates dropping packets to accommodate the lower capacity of the latter. If the network is able to provide preferential treatment to some packets, then the use of a multiresolution or layered source coding system is the obvious solution. But what if the network will not look inside packets and discriminate? Then packets will be dropped at random, and it is not clear how the source (or source–channel) coding should be designed. If packet retransmission is not an option (e.g., due to a delay constraint or lack of a feedback channel), one has to devise a way of getting meaningful information to the recipient despite the loss. The situation is similar if packets are lost due to transmission errors or congestion.

Drawing an analogy between packets and channels, packet communication with M packets is equivalent to generalized multiple description coding with M channels. Each of the 2^M − 1 nonempty subsets of the packets leads to a potentially distinct reconstruction with some distortion.

2Since the distortion metrics may differ, the use of a single symbol R( · ) for the rate–distortion function of the source is a slight abuse of notation.

The case where no packet is received is ignored to maintain the analogy with the classical MD problem and because the source coding in that case is irrelevant. A recent surge of interest in multiple description coding seems to be due primarily to this application (see [102, 189, 201, 146, 202, 169, 7]), and yet the present work is among the first to effectively use more than two packets [67, 69, 70, 68].

If our goal is to effectively communicate over packet networks, we should not discount the use of established techniques. To this end it should be noted that when feasible, retransmission of lost packets is effective.3 The use of a retransmission protocol, like TCP [200], requires at a minimum that a feedback channel is available to indicate which packets have been successfully received. Even if feedback is available, many factors may preclude the retransmission of lost or corrupted packets. Retransmission adds delay of at least one round-trip transmission time. This may be unacceptable for two-way communication and necessitates additional buffering for streaming multimedia. Retransmission is generally not feasible in broadcast environments because of the so-called feedback implosion problem, whereby the loss of a single packet may spark many retransmission requests. In this chapter, only feedforward coding strategies are considered. The conventional approach is to use separate source and channel coding, with a forward erasure-correcting code (FEC) to mitigate the effect of lost packets. Various comparisons to FEC are presented in Sections 5.3 and 5.4.

From an information-theoretic perspective, an idealized model is to assume that packet losses (erasures) are i.i.d. with a known probability p and that the message sequence is arbitrarily long. Then, assuming that the packets have a fixed payload of one unit, the capacity of the channel is 1 − p per channel use. Furthermore, this capacity can be attained by choosing a sequence of good (M, N) block codes with rate N/M < 1 − p, with M, N → ∞.4 Attaining error-free transmission at a rate arbitrarily close to the channel capacity is intimately tied to having an arbitrarily long message sequence. "Good" (M, N) codes are ones which allow the N data symbols to be decoded as long as at least N of the M channel symbols are received. (Any time fewer than N channel symbols are received, there is ambiguity about the transmitted codeword.) The number of received channel symbols r is the sum of M independent Bernoulli random variables. Thus by the Strong Law of Large Numbers [88], r satisfies
\[ \frac{r}{M} \to 1 - p \quad \text{almost surely as } M \to \infty. \]
Since N/M < 1 − p, asymptotically the probability that at least N channel symbols are received is one. To highlight the asymptotic nature of this result, note that the probability of receiving fewer than N channel symbols

is at least
\[ \binom{M}{N-1}\, p^{M-N+1}(1-p)^{N-1} > 0. \]
(This is obtained by computing the probability that exactly N − 1 channel symbols are received.) Therefore for a finite block size there is always a nonzero probability that the communication will fail. For fixed M and p < 1/2, this probability increases with the code rate N/M, and it can be significant for moderate values of M.

The emphasis in this chapter is on situations in which long block codes cannot be used. Note that the length of the code is limited to the number of packets used in the communication: robustness to erasures comes from redundancy spread across packets; redundancy within a packet is not useful.

3From an information-theoretic view, feedback does not increase capacity (for a discrete memoryless channel) but does make it easier to achieve it [34].
4The use of (M, N) in place of the usual (n, k) is to keep notation consistent between this chapter, Chapter 2, and the harmonic analysis and linear algebra literature.

For example, consider a network using Internet Protocol, Version 6 (IPv6) [41]. An IPv6 node is required to handle 576-byte packets without fragmentation, and it is recommended that larger packets be accommodated.5 Accounting for packet headers, a 576-byte packet may have a payload as large as 536 bytes. With packets of this size, a typical image associated with a WWW page may be communicated in a handful of packets; say, ten. One cannot use the law of large numbers to analyze channel codes with only ten output symbols. An explicit evaluation of the performance of a length-ten linear block channel code is given in Section 5.4.3. Another reason for using short channel codes is to keep buffering requirements and delay small.
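The failure probability is easy to evaluate exactly for the short codes of interest. The following Python fragment (illustrative parameters only) computes the probability that fewer than N of M packets arrive, which is the probability that a "good" (M, N) erasure code fails.

from math import comb

def failure_probability(M, N, p):
    # Probability that fewer than N of M packets arrive (i.i.d. loss prob. p),
    # i.e., that an (M, N) erasure code of the "good" type cannot recover the data.
    return sum(comb(M, r) * (1 - p) ** r * p ** (M - r) for r in range(N))

# e.g., ten packets carrying an image, 10% loss rate
for N in (5, 7, 9):
    print(N, failure_probability(10, N, 0.1))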

5.1.2 Historical Notes

At the previously mentioned September 1979 meeting, Wyner presented preliminary results on MD coding obtained with Witsenhausen, Wolf, and Ziv for a binary source and Hamming distortion. At that very meeting [33], Cover and El Gamal determined and reported the achievable rate region later published in [48]. Ozarow's contribution was to show the tightness of the El Gamal–Cover region for memoryless Gaussian sources and squared-error distortion [149]. Subsequently, Ahlswede [2] showed that the El Gamal–Cover region is tight in the "no excess sum rate" case (where there is equality in (5.1)), and Zhang and Berger [225] showed that this region is not tight when there is excess rate. The complementary situation, where (5.2)–(5.3) hold with equality, is called the "no excess marginal rate" case and was also studied by Zhang and Berger [226].

The history of the MD problem for a memoryless binary symmetric source with Hamming distortion and no excess sum rate is interesting and allows simple, concrete comparisons to single-channel bounds. Let

{Xk} be a sequence of i.i.d. Bernoulli(1/2) random variables. This is called a memoryless binary symmetric source. Also let
\[ \delta_i(x, \hat{x}) = \begin{cases} 0, & x = \hat{x}, \\ 1, & x \ne \hat{x}, \end{cases} \qquad \text{for } i = 0, 1, 2. \]

For this source and distortion, consider a restriction of the MD problem to D0 = 0, D1 = D2, and R1 = R2 = r.

Since the source has entropy rate 1 bit per symbol, we must have r ≥ 1/2 in order to have D0 = 0. Let us fix r = 1/2 and investigate the minimum value for D1 = D2, which we denote by d. The rate–distortion function is given by
\[ R(D) = \begin{cases} 1 - h(D), & 0 \le D \le 1/2, \\ 0, & D > 1/2, \end{cases} \tag{5.4} \]
where
\[ h(p) = -p\log_2 p - (1-p)\log_2(1-p) \]
is the binary entropy function [34]. Thus the single-channel rate–distortion bound (5.2) gives
\[ \tfrac12 \ge 1 + d\log_2 d + (1-d)\log_2(1-d), \]
which calculates to
\[ d > 0.110. \tag{5.5} \]

5Without the "Jumbo Payload" option, the maximum packet size is 65 575 bytes (65 535-byte payload).

As was discussed earlier, it is unlikely that each individual description can be very good (approaching the rate–distortion bound) while together forming a very good description. So how might we actually encode? Since the source is memoryless and incompressible, one might guess that the best we can do is to send alternate symbols over the two channels. This clearly satisfies r = 1/2 and D0 = 0. The performance at the side decoders is

D1 = D2 = 1/4 because half the bits are received and on average half of the remaining bits can be guessed correctly. This gives an initial upper bound
\[ d \le \tfrac14. \tag{5.6} \]
Closing the difference of more than a factor of two between (5.5) and (5.6) took sophisticated techniques and about four years. El Gamal and Cover lowered the upper bound on d to
\[ d \le (\sqrt{2}-1)/2 \tag{5.7} \]
as a special case of the following theorem [48]:

Theorem 5.1 (Achievable rates for multiple description coding) Let X1, X2, ... be a sequence of i.i.d. finite-alphabet random variables drawn according to a probability mass function p(x). Let δi( · , · ) be bounded. An achievable rate region for distortions (D0, D1, D2) is given by the convex hull of all (R1, R2) such that
\[ R_1 > I(X; \hat{X}_1), \qquad R_2 > I(X; \hat{X}_2), \qquad R_1 + R_2 > I(X; \hat{X}_0, \hat{X}_1, \hat{X}_2) + I(\hat{X}_1; \hat{X}_2), \]
for some probability mass function
\[ p(x, \hat{x}_0, \hat{x}_1, \hat{x}_2) = p(x)\, p(\hat{x}_0, \hat{x}_1, \hat{x}_2 \mid x), \tag{5.8} \]
such that
\[ D_0 \ge E\delta_0(X, \hat{X}_0), \qquad D_1 \ge E\delta_1(X, \hat{X}_1), \qquad D_2 \ge E\delta_2(X, \hat{X}_2). \]

To use this theorem, one chooses the distribution of the auxiliary random variables X̂0, X̂1, and X̂2 (jointly with X) to satisfy (5.8) and the distortion criteria. Each set of auxiliary random variables yields an achievable rate region. The convex hull of these regions is also achievable.

The bound (5.7) is easily obtained with a weaker theorem of El Gamal and Cover. This theorem apparently first appeared in print in [225] because it was superseded by Theorem 5.1 before the publication of [48].

Theorem 5.2 (Weak El Gamal–Cover theorem) The quintuple (R1, R2, D0, D1, D2) is achievable if there exist random variables U and V jointly distributed with a generic source random variable X such that
\[ R_1 > I(X; U), \qquad R_2 > I(X; V), \qquad R_1 + R_2 > I(X; U, V) + I(U; V), \]
and there exist random variables of the forms
\[ \hat{X}_0 = g_0(U, V), \qquad \hat{X}_1 = g_1(U), \qquad \hat{X}_2 = g_2(V), \]
such that Eδi(X, X̂i) ≤ Di for i = 0, 1, 2.

A choice of auxiliary random variables that leads to (5.7) is now exhibited.6 Let U and V be i.i.d. Bernoulli random variables with probability 2^{−1/2} of equaling one. With X = U · V, X is Bernoulli(1/2) as desired. Let g0(U, V) = UV = X, g1(U) = U, and g2(V) = V. We can now check that Theorem 5.2 gives the desired bound. Firstly, X̂0 = X, so D0 = 0 as required. Next,
\[ E\delta_1(X, \hat{X}_1) = P(\hat{X}_1 \ne X) = P(X=0, U=1) + P(X=1, U=0) = P(UV=0, U=1) + P(UV=1, U=0) = P(V=0, U=1) + 0 = \frac{\sqrt{2}}{2}\left(1 - \frac{\sqrt{2}}{2}\right) = \frac{\sqrt{2}-1}{2}, \]
and similarly Eδ2(X, X̂2) = (√2 − 1)/2. Now for the side rate bounds we compute

\[ I(X; U) = H(X) - H(X \mid U) = 1 - H(UV \mid U) = 1 - P(U=0)H(UV \mid U=0) - P(U=1)H(UV \mid U=1) = 1 - 0 - P(U=1)H(V) = 1 - \frac{\sqrt{2}}{2}\, h\!\left(\frac{\sqrt{2}}{2}\right) \approx 0.383. \]
I(X; V) is given by the same expression. Finally,

\[ I(X; U, V) + I(U; V) = H(X) - H(X \mid U, V) + I(U; V) = 1 - 0 + 0 = 1. \]
It follows now from Theorem 5.2 that (1/2, 1/2, 0, (√2 − 1)/2, (√2 − 1)/2) is achievable. Moreover, any quintuple (R1, R2, 0, (√2 − 1)/2, (√2 − 1)/2) with R1 > 0.383, R2 > 0.383, and R1 + R2 ≥ 1 is achievable. These findings on the achievable rates for distortions (D0, D1, D2) = (0, (√2 − 1)/2, (√2 − 1)/2) are summarized in Figure 5.2.
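The numbers in this example are easily checked numerically. The short Python script below (a verification aid, not part of the original development) recomputes the bound (5.5) and the side distortion and side rate of the El Gamal–Cover construction.

import numpy as np

h = lambda p: -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # binary entropy

# Single-channel bound (5.5): smallest d with 1 - h(d) <= 1/2.
grid = np.linspace(1e-4, 0.5, 100000)
d_min = grid[np.argmax(1 - h(grid) <= 0.5)]

# El Gamal--Cover construction: U, V i.i.d. Bernoulli(2**-0.5), X = U*V.
q = 2 ** -0.5
side_distortion = q * (1 - q)          # P(V = 0, U = 1) = (sqrt(2) - 1)/2
side_rate = 1 - q * h(q)               # I(X; U) = 1 - (sqrt(2)/2) h(sqrt(2)/2)
print(d_min, side_distortion, side_rate)   # approx. 0.110, 0.207, 0.383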

The boundaries at Ri ≈ 0.264 come from evaluating (5.2)–(5.3) with (5.4).

Let us now return to the historical narrative on MD coding of a binary symmetric source with no excess rate. The application of Theorem 5.2 above gives an upper bound on d: d ≤ (√2 − 1)/2. The first improvement to the lower bound (5.5) was provided by Wolf, Wyner, and Ziv [210] through the following theorem:

Theorem 5.3 If (R1, R2, D0, D1, D2) is achievable, then
\[ R_1 + R_2 \ge \begin{cases} 2 - \bar{h}(D_0) - \bar{h}(D_1 + 2D_2), \\ 2 - \bar{h}(D_0) - \bar{h}(2D_1 + D_2), \end{cases} \]
where
\[ \bar{h}(\lambda) = \begin{cases} 0, & \lambda = 0, \\ -\lambda\log_2\lambda - (1-\lambda)\log_2(1-\lambda), & 0 < \lambda \le 1/2, \\ 1, & \lambda > 1/2. \end{cases} \tag{5.9} \]

6The analysis here follows the explicit discussion by Berger and Zhang [15], though they report (5.7) to be "well-known."

Figure 5.2: Achievable rates for multiple description coding of a binary symmetric source with Hamming distortion and (D0, D1, D2) = (0, (√2 − 1)/2, (√2 − 1)/2). The excluded regions come from evaluating the rate–distortion bounds (5.1)–(5.3) with (5.4). The included region comes from an application of Theorem 5.2. Note that there is a gap between the regions known to be achievable and unachievable. This could possibly be reduced through applications of Theorem 5.2 with other choices of auxiliary random variables.

This specializes to give d ≥ 1/6. Theorem 5.3 was soon improved by Witsenhausen and Wyner [209]tothe following:

Theorem 5.4 If (R1, R2, D0, D1, D2) is achievable, then in all cases
\[ R_1 + R_2 \ge 1 - \bar{h}(D_0); \]
furthermore, if 2D1 + D2 ≤ 1, then
\[ R_1 + R_2 \ge 2 - \bar{h}(D_0) - \bar{h}\!\left(2D_1 + D_2 - \frac{2D_1^2}{1-D_2}\right), \]
and if D1 + 2D2 ≤ 1, then
\[ R_1 + R_2 \ge 2 - \bar{h}(D_0) - \bar{h}\!\left(D_1 + 2D_2 - \frac{2D_2^2}{1-D_1}\right), \]
where h̄( · ) is given by (5.9).

Applied to the special case at hand, Theorem 5.4 yields d ≥ 1/5. The gap in the bounds on d was finally closed by Berger and Zhang [15], who showed that d = (√2 − 1)/2. Further results on binary multiple descriptions in the Shannon setting appear in [225, 226].

Witsenhausen's original paper on MD coding [208], which did not use the name "multiple description," has been omitted from this discussion because it requires the central decoder to make no errors whatsoever, as opposed to the less stringent Shannon theory requirement of vanishingly small error probability. In this alternative setting, Witsenhausen showed
\[ \left(d_1 + \tfrac12\right)\left(d_2 + \tfrac12\right) \ge \tfrac12, \]
which specializes to d ≥ (√2 − 1)/2.

For historical completeness, it should be noted that in an earlier paper [86], Gray and Wyner considered bounds on source coding matched to a simple network model, as shown in Figure 5.3. Instead of having a single source sequence to communicate over two channels to three receivers, they have a sequence of pairs of random variables {(Xk, Yk)}_{k=1}^∞ to communicate to two receivers over three channels. Receiver 1 is interested only in

{Xk} and forms its estimate from Channel 1 and a common channel. Receiver 2 has its own private channel and is interested in the other sequence {Yk}. As they suggest, a natural coding scheme is for the common channel to carry a "coarse" version of the pair (X, Y) and for the private channels to add refinement information for the individual components. This paper influenced the development of successive refinement and multiresolution coding (see, e.g., [50]) but is not immediately applicable to the multiple description problem. Since the papers on MD coding in the 1980's did not reference Gray and Wyner, it may be assumed that the later work on this problem was done independently.


Figure 5.3: Simple network considered by Gray and Wyner [86].

It should also be noted that there is a substantial literature on the error sensitivity of compressed data and more generally on trade-offs between source and channel coding. The reader is referred to [144, 103, 54, 134, 52, 89, 21, 97, 175, 96] for a sampling of the results.

Finally, MD coding includes as a special case the better-known successive refinement or multiresolution coding. The successive refinement problem can also be described by Figure 5.1, but the interest is only in characterizing achievable (R1, R2, D0, D1). In other words, no attempt is made to estimate the source from Channel 2 alone; or, Channel 1 is always present. The conditions for perfect successive refinement, where (5.1) and (5.2) hold with equality, are described in [50]. The result follows from the tightness of Theorem 5.1 for the no excess rate case, proven by Ahlswede [2]. See also [161].

5.2 Survey of Multiple Description Coding

The early history of multiple description coding was dominated by investigations with a discrete-alphabet source and Hamming distortion measure. This thesis addresses primarily the communication of images and continuous-valued random processes; thus, our attention now turns to continuous-alphabet sources. In accordance with convention, and for convenience, mean-squared error (MSE) distortion is used exclusively.

5.2.1 Theoretical Bounds

As was mentioned in the previous section, the achievable rate–distortion region is completely known only for a memoryless Gaussian source. This result, obtained by Ozarow in 1980 [149], is summarized by the following theorem:

Theorem 5.5 Let X1, X2, ... be a sequence of i.i.d. unit variance Gaussian random variables. The achievable set of rates and mean-squared error distortions is the union of points satisfying

\[ D_1 \ge 2^{-2R_1}, \tag{5.10} \]
\[ D_2 \ge 2^{-2R_2}, \tag{5.11} \]
\[ D_0 \ge 2^{-2(R_1+R_2)}\cdot\frac{1}{1-\left(\sqrt{\Pi}-\sqrt{\Delta}\right)^2}, \tag{5.12} \]
where
\[ \Pi = (1-D_1)(1-D_2) \tag{5.13} \]
and
\[ \Delta = D_1D_2 - 2^{-2(R_1+R_2)}. \tag{5.14} \]

The “forward” part of this theorem, the achievability of any point in the described set, is proven by using Theorem 5.2 with auxiliary variables

U = X + N1,

V = X + N2, where N1 and N2 are jointly zero-mean Gaussian with covariance matrix
\[ \begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho \\ \sigma_1\sigma_2\rho & \sigma_2^2 \end{bmatrix}. \]

Though this is the standard argument, it should be noted that Theorems 5.1 and 5.2 are proven for discrete-alphabet sources. The "converse" part is proven through an unusual use of an auxiliary random variable.

Since Theorem 5.5 is the key result in coding continuous-valued sources, the region defined therein warrants a close look. The bounds (5.10)–(5.11) are simply the side-channel rate–distortion bounds, a repeat of (5.2)–(5.3). In the final inequality (5.12), [1 − (√Π − √∆)²]^{−1} is the factor by which the central distortion must exceed the rate–distortion bound. Denote this factor γ. A few examples will clarify the behavior of γ and the resulting properties of the achievable region.

First, suppose that the descriptions are individually very good, yielding D1 = 2^{−2R1} and D2 = 2^{−2R2}. Then ∆ = 0 and we may write

\[ D_0 \ge D_1D_2\cdot\frac{1}{1-(1-D_1)(1-D_2)} = \frac{D_1D_2}{D_1 + D_2 - D_1D_2}. \]

A further chain of inequalities gives D0 ≥ min{D1,D2}/2, so the joint description is only slightly better than the better of the two individual descriptions.

−2(R1+R2) On the other hand, suppose the joint description is as good as possible, so D0 =2 .Then γ =1,soΠ=∆,andthus −2(R1+R2) D1 + D2 =1+2 . (5.15)

Recall that a distortion value of 1 is obtained with no information, simply estimating the source by its mean. For anything but a very low rate, (5.15) implies a very poor reconstruction for at least one of the side decoders.

To gauge the transition between these extreme cases, we may estimate γ under the assumptions R1 = R2 ≫ 1 and D1 = D2 ≈ 2^{−2(1−α)R1} with 0 < α ≤ 1. Then
\[ \frac{1}{\gamma} = 1 - \left(\sqrt{\Pi}-\sqrt{\Delta}\right)^2 = 1 - \left((1-D_1) - \sqrt{D_1^2 - 2^{-4R_1}}\right)^2 \approx 1 - \left((1-D_1) - D_1\right)^2 = 4D_1 - 4D_1^2 \approx 4D_1. \]


Figure 5.4: Achievable central and side distortions for multiple description coding of a memoryless Gaussian source with squared-error distortion. D1 = D2 and R1 = R2 are assumed.

Substituting γ = (4D1)^{−1} in (5.12) gives D0 ≥ 2^{−4R1}(4D1)^{−1}, so the product of central and side distortions is approximately lower bounded by (1/4)·2^{−4R1}. The best decay of the central distortion is now D0 ≈ (1/4)·2^{−2(1+α)R1}. This shows that the penalty in the exponential rate of decay of D1 (the difference from the optimal decay rate, indicated by α being positive) is precisely the increase in the rate of decay of D0. The region of achievable (D0, D1) pairs when D1 = D2 and R1 = R2 is shown in Figure 5.4.
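Ozarow's bound is straightforward to evaluate numerically. The Python sketch below (illustrative rates and distortions; it assumes the operating point satisfies D1 D2 ≥ 2^{−2(R1+R2)} so that ∆ ≥ 0) computes the right-hand side of (5.12) for a unit-variance Gaussian source.

import numpy as np

def ozarow_central_bound(R1, R2, D1, D2):
    # Lower bound (5.12) on D0; assumes D1*D2 >= 2**(-2*(R1+R2)) so Delta >= 0.
    Pi = (1 - D1) * (1 - D2)                    # (5.13)
    Delta = D1 * D2 - 2 ** (-2 * (R1 + R2))     # (5.14)
    return 2 ** (-2 * (R1 + R2)) / (1 - (np.sqrt(Pi) - np.sqrt(Delta)) ** 2)

# Side descriptions backed off from their rate-distortion bounds (alpha = 1/4).
R = 3.0
D_side = 2 ** (-2 * (1 - 0.25) * R)
print(D_side, ozarow_central_bound(R, R, D_side, D_side))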

At first glance, Theorem 5.5 describes achievable distortions given R1 and R2. The bounds (5.10)–(5.12) can be turned around to produce an achievable rate region given D0, D1, and D2.7 The shape of this region is shown in Figure 5.5, where δ is a function of (D0, D1, D2) that represents the minimum excess (sum) rate necessary to achieve these distortions.

For non-Gaussian sources, no technique for precisely determining the achievable rate–distortion region is known. Zamir [219] has found inner and outer bounds to the achievable rate region for MDC of any continuous-valued memoryless source with squared-error distortion. Zamir's result is an extension of Shannon's bounds on rate–distortion functions (see [172, 59, 13]) to MDC.

7The achievable rate region has the advantage of being in a lower-dimensional space, and thus is easier to draw.


Figure 5.5: Achievable rates for multiple description coding of a unit-variance memoryless Gaussian source with squared-error distortion. The minimum excess rate δ is a function of (D0, D1, D2) and may be zero.

Let X be the generic random variable representing a memoryless source. If the variance and differential entropy of X are denoted σX² and hX, respectively, then
\[ \frac12\log_2\frac{\sigma_X^2}{D} \;\ge\; R_X(D) \;\ge\; \frac12\log_2\frac{p_X}{D}, \qquad\text{where}\quad p_X = \frac{1}{2\pi e}\,2^{2h_X} \]
is called the entropy power. Thus we see that the rate–distortion function of X is bounded between the rate–distortion functions of a white Gaussian source with the same power and a white Gaussian source with the same differential entropy. For large classes of difference distortion measures and general source densities, the lower bound is asymptotically tight at high rates [125]. Let R(σ², D0, D1, D2) denote the achievable rate region for a memoryless Gaussian source with variance σ² and distortion triple (D0, D1, D2). Then the achievable rate region for X, denoted RX(D0, D1, D2), is bounded by
\[ \mathcal{R}(\sigma_X^2, D_0, D_1, D_2) \;\subseteq\; \mathcal{R}_X(D_0, D_1, D_2) \;\subseteq\; \mathcal{R}(p_X, D_0, D_1, D_2). \]
The outer bound is asymptotically tight as (D0, D1, D2) approaches (0, 0, 0) along a straight line [219]. The situation is depicted by Figure 5.6.

5.2.2 Practical Codes

All of the results discussed thus far, for both binary and Gaussian sources, are non-constructive. They are bounds to performance when an infinitely long set of source symbols is coded.

5.2.2.1 Multiple description scalar quantization

A constructive approach is to start from the opposite end and devise schemes for multiple description coding of scalars. This approach was pioneered by Vaishampayan, who introduced multiple description scalar quantization (MDSQ) in [188]. Conceptually, MDSQ can be seen as the use of a pair of independent scalar quantizers to give two descriptions of a scalar source sample. As in all multiple description coding, the design challenge is to simultaneously provide good individual descriptions and a good joint description.

The simplest example is to have scalar quantizers with nested thresholds, as shown in Figure 5.7(a). Each quantizer outputs an index that can be used by itself to estimate the source sample. Using Qi : R → {1, 2, ..., 6}, i = 1, 2, to denote the encoding map of quantizer i, the reconstruction knowing Qi(x) = ki should be the centroid of the cell Qi^{−1}(ki). The central decoder has both Q1(x) = k1 and Q2(x) = k2 and thus reconstructs to the centroid of the intersection cell Q1^{−1}(k1) ∩ Q2^{−1}(k2). In the example, the intersection cells are about half as big as the individual quantizer cells, so the central distortion is about a quarter of the side distortions. Asymptotically, if the side rates are R1 = R2 = R, then D0, D1, and D2 are all O(2^{−2R}). This is optimal decay for D1 and D2, but far from optimal for D0.

Recalling the discussion following Theorem 5.5, it should be possible to speed the decay of D0 at the expense of slowing the decay of D1 and/or D2. Let ni, i =1, 2, denote the number of cells in quantizer

Qi. Let n0 denote the number of intersections between cells of Q1 and Q2 that are nonempty. Notice that in

Figure 5.7(a), n0 = n1 + n2 − 1. When n1 = n2 = n, the exponential rate of decay of D0 is changed only if n0 grows faster than linearly with n. Accomplishing this requires something never seen in single-description scalar CHAPTER 5. MULTIPLE DESCRIPTION CODING 113

R2

R 2 (D )+δ 2 σX 2 σX

R (D )+δ pX 2 pX Achievable

R 2 (D ) σX 2 Unachievable

RpX (D2)

R1

R 2 (D ) R 2 (D )+δ 2 σX 1 σX 1 σX

RpX (D1) RpX (D1)+δpX

Figure 5.6: Inner- and outer-bounds for the achievable rate region for a memoryless non-Gaussian source with squared-error distortion. The achievable rate region for non-Gaussian source X contains the region for a Gaussian source with the same power and is contained in the region for a Gaussian source with the same 2 differential entropy. Rσ2 ( · ) is the rate–distortion function of a memoryless Gaussian source with variance σ and δσ2 is the minimum excess rate for this source at the chosen (D0,D1,D2). CHAPTER 5. MULTIPLE DESCRIPTION CODING 114 or vector quantization: disconnected partition cells. The maximum number of central decoder partition cells is −1 ∩ −1 n0 = n1n2. This occurs when each Q1 (k1) Q2 (k2) is nonempty, as shown in Figure 5.7(b). Quantizer Q2 is individually poor. The asymptotic performance of this scheme with R1 = R2 = R is at the opposite extreme −4R of the previous example; D0 = O(2 ), but at least one of D1 and D2 is O(1). Given a desired partitioning for the central encoder,8 the crux of MDSQ design is the assignment of indices to the individual quantizers. Vaishampayan’s main results in [188] are this observation and an idealized index assignment scheme that gives the optimal combined exponential decay rates for the central and side distortions. An MDSQ designed with Vaishampayan’s “modified nested” index assignment is shown in Figure 5.7(c). In contrast to the MDSQ of Figure 5.7(b), the side distortions are approximately equal and the quantizers “refine each other” in a symmetric fashion. For a given number of side levels n, the central distortion is smaller—at the cost of higher side distortions—than for an MDSQ as in Figure 5.7(a). Additional developments in this area include a design procedure for entropy-constrained codebooks [192], joint optimization of an orthogonal transform and MDSQ [9, 190], and a method for reducing granular distor- tion [191]. Applications of MDSQ are described in [102, 214, 189, 169].

5.2.2.2 Pairwise correlating transforms

A considerably different approach to MD coding was introduced by Wang, Orchard, and Reibman [201]. Instead of using MDSQ to produce two indices that describe the same quantity, the MD character is achieved with a linear transform that introduces correlation between a pair of random variables; quantization is treated as secondary. 2 2 Let X1 and X2 be independent zero-mean Gaussian random variables with variances σ1 >σ2 .For conventional (single-description) source coding, there would be no advantage to using a linear transform prior to quantization. Assuming high-rate entropy-coded uniform quantization, the distortion at R bits per sample would be given by (see (1.10)) πe D = σ σ 2−2R. 0 6 1 2 This is the best single-description performance that can be obtained with scalar quantization.

Now suppose that the quantized versions of X1 and X2 are sent on channels 1 and 2, respectively, in a multiple description system. Side decoder 1 cannot estimate X2, aside from using its mean. Thus πe D = σ σ 2−2R + σ2, 1 12 1 2 2 and similarly πe D = σ σ 2−2R + σ2. 2 12 1 2 1 Assume for the moment that each channel is equally likely to fail. Then instead of concerning ourselves with

D1 and D2 separately, we will use the average distortion when one channel is lost: 1 1 πe D¯ = (D + D )= (σ2 + σ2)+ σ σ 2−2R. (5.16) 1 2 1 2 2 1 2 12 1 2

D¯ 1 could be reduced if side decoder i had some information about Xj, i =6 j. This can be accomplished by transmitting not Xi’s, but correlated transform coefficients. The simplest possibility, as proposed in [201], 8 There is no “central encoder,” but Q1 and Q2 effectively implement a quantizer with cells given by the intersections of the cells of the individual quantizers. An MDSQ could be viewed as a single quantizer that outputs two-tuple indices. CHAPTER 5. MULTIPLE DESCRIPTION CODING 115

12345 6 Q1

1 23456 Q2

11 12 22 23 33 34 44 45 55 56 66 Q0

(a)

123 Q1

122 3 1 31 Q2

1112 13 23 21 22 32 33 31 Q0

(b)

1 2 1 23434 Q1

142323 Q2

1121 12 22 23 32 33 43 34 44 Q0

(c)

Figure 5.7: Three multiple description scalar quantizers. (a) The simplest form of MDSQ, with nested quantiza- −2R tion thresholds. When R1 = R2 = R, all three distortions, D0, D1,andD2, are O(2 ). (b) An MDSQ which −4R minimizes D0 for a given rate. Asymptotically, D0 = O(2 ), but at least one of D1 and D2 must be O(1). (c) An MDSQ based on Vaishampayan’s “modified nested” index assignment [188]. This construction system- atically trades off central and side distortions while maintaining optimal joint asymptotic decay of distortion with rate. CHAPTER 5. MULTIPLE DESCRIPTION CODING 116

is to transmit quantized versions of Y1 and Y2 given by " # " #" # Y 1 11 X 1 = √ 1 . Y2 2 1 −1 X2

2 2 Now the variances of Y1 and Y2 are both (σ1 + σ2 )/2, so the central decoder performance is πe D = (σ2 + σ2)2−2R, 0 12 1 2 which is worse than the performance without the transform by a constant factor of 9 1 γ = (σ2 + σ2) / (σ σ ). 2 1 2 1 2 Now consider the situation at side decoder 1. The distortion is approximately equal to the quantization error plus the distortion in estimating Y2 from Y1. Since Y1 and Y2 are jointly Gaussian, Y2 | Y1 = y1 is Gaussian | | 2 2 −1 2 − 2 and E[Y2 Y1 = y1] is a linear function of y1. Specifically, Y2 Y1 = y1 has mean (σ1 + σ2 ) (σ1 σ2 )y1 and 2 2 −1 2 2 variance 2(σ1 + σ2 ) σ1 σ2 .Thus 2 2 ¯ ≈ 2σ1σ2 πe −2R D1 2 2 + σ1σ22 . (5.17) σ1 + σ2 12 Comparing (5.16)and(5.17), the constant term has been reduced by a factor of γ2. By varying the transform, this method allows a trade-off between the constant factors in D0 and D¯ 1. The high-rate asymptotic behavior of the pairwise correlating transform method is not interesting −2R because, independent of the choice of the transform, D0 = O(2 )andD1 = D2 = O(1). However, in practice high rates are not necessarily important and constant factors can be very important. As a case in point: this method was introduced in the context of loss-resilient image coding, where it was shown to be successful [201]. Subsequently, this method was extended to nonorthogonal transforms [146]. The contribution of Section 5.3 is to generalize this method to communicating N variables over M channels, M ≤ N, where the channel failures may be dependent and may have unequal probabilities. A conceptually very different method for M>Nis described in Section 5.4.

5.3 Statistical Channel Coding with Correlating Transforms

Channel codes provide protection against impairments by making certain transmitted symbols (or sequences) illegal. If the received symbol corresponds to an illegal transmitted symbol, an error is detected; and if furthermore one legal transmitted symbol is closer than all others, an attempt at correcting the error can be made.10 The design of channel codes has always been in accordance with minimizing the probability of symbol error, but for communication subject to a smooth distortion measure, we may impose a much less stringent requirement with the hope of achieving satisfactory performance with less coding overhead. To this end, a channel code could be designed so that not even a single erasure can be corrected, but instead the effect of erasures is only mitigated. This is consistent with the philosophy of multiple description coding, where we

9This factor is like the coding gain of (1.11), but it is now the increase in distortion from coding correlated quantities. 10Note that in general, an error may be undetected and an attempted correction may be incorrect. Erasure channels are easier to handle than channels with errors because what is received is known to be correct. The only remaining problem is to guess at the erased information, if necessary. CHAPTER 5. MULTIPLE DESCRIPTION CODING 117 generally do not aim for “full quality” when reconstructing from a proper subset of the descriptions. In fact, we could require “full quality,” but this would necessitate high overhead. This section describes a method for multiple description coding based on using transform coefficients or sets of transform coefficients as “descriptions.” A square transform is used, so for coding an N-dimensional source at most N descriptions are produced. The method is a generalization of the pairwise correlating transforms of Orchard, Wang, Vaishampayan, and Reibman [146]toN>2. In addition, a more complete analysis of the N = 2 case is provided. The reconstruction from a proper subset of the descriptions exploits a statistical correlation between transform coefficients. Thus this technique may be dubbed statistical channel coding for an erasure channel.

5.3.1 Intuition

Before introducing the general structure for multiple description transform coding (MDTC) with a square correlating transform, let us revisit the pairwise correlating transform method of Section 5.2.2.2.The limitations of this method, along with an insight of Vaishampayan, led to the work reported in [146]. This in turn led to the general theory presented here. T As in Section 5.2.2.2,letX =[X1,X2] where X1 and X2 are independent zero-mean Gaussian random

2 2 2 { } R variables with variances σ1 >σ2. Let e1,e2 denote the standard basis of . Any level curve of the joint p.d.f. of X is an ellipse with principal axis aligned with e1 and secondary axis aligned with e2, as in the right side of Figure 1.6. Using the standard basis corresponds to representing X by (hX, e1i, hX, e2i).

Now imagine that, in a multiple description scenario, uniform scalar quantized versions of hX, e1i and hX, e2i are used as descriptions. It was demonstrated in Section 5.2.2.2 that for a given total rate, the average of the side distortions D¯ 1 =(D1 + D2)/2 can be decreased in exchange for an increase in the the central distortion D0 by using the representation D E D E 1 1 X, √ [1, 1]T , X, √ [−1, 1]T . (5.18) 2 2

But this is only a single operating point, whereas one would like to be able to trade off D0 and D¯ 1 in a continuous manner. We may recognize (5.18)as  hX, Gπ/4 e1i, hX, Gπ/4 e2i , where Gθ is a Givens rotation of angle θ (see Definition 4.1). Then the natural extension is to consider all representations of the form

(hX, Gθ e1i, hX, Gθ e2i) , 0 ≤ θ ≤ π/4.

This indeed creates a continuous trade-off between D0 and D¯1. However, it has an undesirable asymmetry. For

0 ≤ θ<π/4, the side distortions are not equal. It is an easy exercise to calculate D1 and D2, but instead let us look geometrically at why they are unequal. D1 is the variation of X which is not captured by hX, Gθ e1i,or 11 the variation perpendicular to Gθ e1. Similarly, D2 is the variation perpendicular to Gθ e2. Now since Gθ e1 and Gθ e2 are not symmetrically situated with respect to the p.d.f. of X (except for θ = π/4), D1 and D2 are unequal.

11 We are neglecting quantization error at this point because D1 and D2 are equally affected by quantization. CHAPTER 5. MULTIPLE DESCRIPTION CODING 118

This in itself does not imply that the scheme can be improved, but since we are trying to have two channels of equal importance12 we might expect equal side distortions. Based on the geometric observation above, it makes sense to represent X by

(hX, Gθ e1i, hX, G−θ e1i) , 0 <θ<π/2. (5.19)

Furthermore, in order to be capturing most of the principal component of the source, the basis should be skewed toward e1,soθ should be between 0 and some maximum value θmax <π/2. This yields D1 = D2, but introduces a new problem. The representation of X by (5.19)is(forθ =6 π/4) a nonorthogonal basis expansion. The uniform scalar quantization of such a representation produces non-square partition cells.13 These partition cells have higher normalized second moments than square cells, and thus are undesirable [32]. The insight attributed to Vaishampayan is that a correlating transform can be applied after quantization has been performed in an orthogonal basis representation. This allows the introduction of correlation (more generally than with an orthogonal transform) to coexist with square partition cells. The advantage of this approach over the original pairwise correlating method was shown in [146]. If we do not want the two descriptions to be equally important, for example if they are sent over links with different failure probabilities, then we expect the optimal representation to be different from (5.19).

Recalling σ1 >σ2, we would expect the description over the more reliable link to be closer to the e1 direction, as this captures most of the energy of the source. This sort of intuition is vindicated by the general framework and optimization that follow.

5.3.2 Design

It is time to attach a bit more formality to multiple description transform coding (MDTC) with N correlating transforms. Let {Xk} be an i.i.d. sequence of zero-mean jointly Gaussian vectors in R with a known distribution. Without loss of generality, we may assume that the components of Xk are independent 2 ≥ 2 ≥···≥ 2 with variances σ1 σ2 σN because we could use a Karhunen–Lo`eve transform at the encoder. MDTC refers to processing each source vector x as follows:

· 1. x is quantized with an unbounded uniform scalar quantizer with step size ∆; i.e., xqi =[xi]∆,where[ ]∆ denotes rounding to the nearest multiple of ∆.

T b N → 2. The vector xq =[xq1,xq2, ..., xqN ] is transformed with an invertible, discrete transform T :∆Z N b b ∆Z , y = T (xq). T is within a certain quasilinear class described below.

3. The components of y are placed into M sets (in an a priori fixed manner). These sets will be sent over M different channels.

4. The M sets of coefficients are independently entropy coded. To improve the efficiency of this stage, many coefficients within one channel may be coded jointly.

12 “Equal importance” comes from the equal weighting of D1 and D2 in D¯ 1. Later the weights will be arbitrary. 13In higher dimensions, non-hypercubic cells. CHAPTER 5. MULTIPLE DESCRIPTION CODING 119

The transform Tb is a discrete transform derived from a linear transform T ,withdetT =1.First T is factored into matrices with unit diagonals and nonzero off-diagonal elements only in one row or column:

T = T1T2 ···Tk. The discrete transform is then given by     Tb(x )= T T ...[T x ] . q 1 2 k q ∆ ∆ ∆ b N

The “lifting” construction of the transform ensures that T is invertible on ∆Z . See Appendix 5.A for details. The coding structure presented here is a generalization of the method proposed by Orchard, Wang, Vaishampayan, and Reibman [146]. They considered coding of two variables with the transform " # 1 β T = , (5.20) −(2β)−1 1/2 approximated by "" #"" # # # 10 1 β Tb(x)= [x] . (5.21) −(2β)−1 1 01 ∆ ∆ ∆ The mysterious form of (5.20) provided the initial motivation for this work. The analysis and optimization that follow are based on a high-rate (or fine-quantization, small ∆) assumption. In particular, the following assumptions or approximations are used: • b The scalar entropy of y = T ([x]∆) is the same as that of [Tx]∆. • The correlation structure of y is unaffected by the quantization; i.e.,

E[yyT ]=E[Tb(x)Tb(x)T ]=E[Tx(Tx)T ].

• When one or more components of y are lost, the distortion is dominated by the effect of the erasure, so quantization can be ignored.

These approximations facilitate the analytical treatment of the design of T , or at least the generation of an optimization criterion. A careful accounting of coarse quantization effects would probably make symbolic optimization impossible. When all the components of y are received, the reconstruction process is to (exactly) invert the trans- b form T to getx ˆ = xq. The distortion is precisely the quantization error from Step 1. If some components of y are lost, they are estimated from the received components using the correlation introduced by the transform Tb. The estimatex ˆ is then generated by inverting the transform as before. 2 2 2 Denote the variances of the components of x by σ1, σ2 ,...,σN and denote the correlation matrix 2 2 2 of x by Rx = diag(σ1 ,σ2,...,σn). Under fine quantization approximations, assume the correlation matrix T of y is Ry = TRxT . By renumbering the variables if necessary, assume that y1, y2, ..., yN−` are received T and yN−`+1, ..., yN are lost. Partition y into “received” and “not received” portions as y =[˜yr, y˜nr] where T T y˜r =[y1,y2, ..., yN−`] andy ˜nr =[yN−`+1, ..., yN−1,yN ] . The minimum MSE estimate of x giveny ˜r is E[x | y˜r], which has a simple closed form because x is a jointly Gaussian vector. Using the linearity of the expectation operator gives the following sequence of calculations:

xˆ = E[x | y˜ ]=E[T −1Tx| y˜ ]=T −1E[Tx| y˜ ] ""r # # r " #r

y˜ y˜ −1 r −1 r = T E y˜r = T . (5.22) y˜nr E[˜ynr | y˜r] CHAPTER 5. MULTIPLE DESCRIPTION CODING 120

If the correlation matrix of y is partitioned compatibly with the partition of y as " # R B R = TR T T = 1 , y x T B R2

| T −1 − T −1 14 theny ˜nr y˜r is Gaussian with mean B R1 y˜r and correlation matrix R2 B R1 B. Thus | T −1 E[˜ynr y˜r]=B R1 y˜r and the reconstruction is " # I xˆ = T −1 y˜ . (5.23) T −1 r B R1

5.3.2.1 Optimization criterion

The choice of T , of course, determines the performance of the system. This section develops the relationships between the transform, rates, and distortions necessary to design T .

Estimating the rate is straightforward. Since the quantization is assumed to fine, yi is approximately the same as [(Tx)i]∆, i.e., a uniformly quantized Gaussian random variable. If yi is treated as a Gaussian 2 random variable with power σyi =(Ry)ii quantized with bin width ∆, the entropy of the quantized coefficient is approximately [34,Ch.9] 1 1 1 1 H(y ) ≈ log 2πeσ2 − log ∆ = log σ2 + log 2πe − log ∆ = log σ2 + k , i 2 yi 2 yi 2 2 yi ∆ where k∆ =(log2πe)/2 − log ∆ and all logarithms are base-two. Notice that k∆ depends only on ∆. We thus estimate the rate per component as

1 XN 1 YN R = H(y )=k + log σ2 . (5.24) N i ∆ 2N yi i=1 i=1 Q Q N 2 N 2 The minimum rate occurs when i=1 σyi = i=1 σi and at this rate the components of y are uncorrelated. Interestingly, T = I is not the only transform which achieves the minimum rate. In fact, an arbitrary split of the total rate among the different components of y is possible. This is a justification for using a total rate constraint in our following analyses. However, we will pay particular attention to the case where the rates sent across each channel are equal. We now turn to the distortion, and first consider the average distortion due only to quantization. Since the quantization noise is approximately uniform, this distortion is ∆2/12 for each component. Thus the per component distortion when no components are erased is given by

∆2 D = (5.25) 0 12 and is independent of T . For the case when `>0 components are lost, the distortion computation was almost completed in the development of (5.23). Let η =˜ynr − E[˜ynr | y˜r], which is Gaussian with zero mean and correlation matrix − T −1 A = R2 B R1 B. η is the error in predictingy ˜nr fromy ˜r and hence is the error caused by the lost coefficients. 14 This correlation matrix is the Schur complement of R1 in Ry [99]. CHAPTER 5. MULTIPLE DESCRIPTION CODING 121

However, because we have used a nonorthogonal transform, we must return to the original coordinates using −1 T in order to compute the distortion. Substitutingy ˜nr − η for E[˜ynr | y˜r]in(5.22)gives " # " # y˜ 0 xˆ = T −1 r = x + T −1 , y˜nr − η −η so " # 2 0 kx − xˆk2 = T −1 = ηT U T Uη, (5.26) −η where U is the last ` columns of T −1. Finally,

Ekx − xˆk2 = E[tr ηT U T Uη]=E[tr ηηT U T U]=trAU T U. (5.27)

With M sets of coefficients sent over M channels, there are 2M −1 nontrivial reconstructions, each with a potentially distinct distortion. For the sake of optimization, it is useful to have a scalar distortion measure.

Assign to each channel a state si ∈{0, 1} to denote whether the channel is received (1) or not received (0). For × ×··· any system state S = s1 s2 sM , the distortion will be denoted D(s1,s2,...,sM ) and can be computed with (5.27). We will consider weighted distortions of the form X ¯ D = α(s1,s2,...,sM )D(s1,s2,...,sM ), (5.28) si∈{0,1},1≤i≤M where X

α(s1,s2,...,sM ) =1. si∈{0,1},1≤i≤M ¯ In the simplest case, α(s1,s2,...,sM ) could be the probability of state (s1,s2,...,sM ). In this case, D is the overall

average MSE. Other meaningful choices are available. For example, if a certain minimum quality is required !  P M when k of M channels are received, then the states with si = k can be assigned equal weights of

k !  −1 M with the remaining states having no weight. In this case the optimization will make the distortion k equalineachofthestateswithk channels received and this distortion will be an upper bound for the distortion when more than k channels are received. The problem will be to minimize D¯ subject to a constraint on R. The expressions given in this section can be used to numerically determine transforms to realize this goal. Analytical solutions are possible in certain special cases. Some of these are outlined in the following sections.

5.3.2.2 General solution for two variables

Let us now apply the analysis of the previous section to find the best transforms for sending N =2 variables over M = 2 channels. In the most general situation, channel outages may have unequal probabilities and may be dependent. Suppose the probabilities of the system states are given by Table 5.1. Let " # ab T = , cd normalized so that det T =1.Then " # d −b T −1 = −ca CHAPTER 5. MULTIPLE DESCRIPTION CODING 122

Channel 1 not received received Channel 2 not received p0,0 p1,0 received p0,1 p1,1

Table 5.1: Probabilities of system states in optimal transform determination for MDTC of two variables over two channels.

and " # a2σ2 + b2σ2 acσ2 + bdσ2 R = TR T T = 1 2 1 2 . y x 2 2 2 2 2 2 acσ1 + bdσ2 c σ1 + d σ2 By (5.24), the rate per component is given by 1 1 R = k + log(R ) (R ) = k + log(a2σ2 + b2σ2)(c2σ2 + d2σ2). (5.29) ∆ 4 y 11 y 22 ∆ 4 1 2 1 2 Minimizing (5.29) over transforms with determinant one gives a minimum possible rate of 1 R∗ = k + log σ2σ2. (5.30) ∆ 4 1 2 As in [146], we refer to ρ = R − R∗ as the redundancy; i.e., the price we pay in rate in order to potentially reduce the distortion when there are erasures. Subtracting (5.30) from (5.29)gives

2 2 2 2 2 2 2 2 1 (a σ1 + b σ2)(c σ1 + d σ2) ρ = log 2 2 . 4 σ1σ2 Now we compute the constituents of (5.28). Two terms are obvious: When both channels are received, 2 the distortion (due to quantization only) is D(1,1) =∆ /12; and if neither channel is received, the distortion is 2 2 D(0,0) =(σ1 + σ2 )/2. These terms do not depend on T and thus do not affect the optimization. The remaining cases require the application of the results of the previous section. Substituting in (5.27),   1 2 D(1,0) = E kx − xˆk y1 is received and y2 is not 2 " # 2   1 −b (R )2 · − y 12 = (Ry)22 2 a (Ry)11 | {z } | {z } T A1,1 (U U)1,1 2 2 1 2 2 · σ1 σ2 = (a + b ) 2 2 2 2 , 2 a σ1 + b σ2 wherewehaveuseddetT = ad − bc = 1 in the simplification. Similarly,

2 2 1 2 2 σ1 σ2 D(0,1) = (c + d ) 2 2 2 2 . 2 c σ1 + d σ2 The overall average distortion is     2 2 2 2 2 2 2 2 2 ¯ ∆ 1 2 2 (a + b )σ1 σ2 (c + d )σ1 σ2 D = p1,1 + p0,0 (σ1 + σ2 ) + p1,0 2 2 2 2 + p0,1 2 2 2 2 , (5.31) 12 2 | 2(a σ1 + b σ2) {z 2(c σ1 + d σ2 ) }

D¯ 1 CHAPTER 5. MULTIPLE DESCRIPTION CODING 123 where the first bracketed term is independent of T . Thus our optimization problem is to minimize the second 15 bracketed term, D¯ 1, for a given redundancy ρ. 2 First note that if the source p.d.f. is circularly symmetric, i.e., σ1 = σ2,thenD(1,0) = D(0,1) = σ1 /2 independent of T . Henceforth we assume σ1 >σ2. For a given value of ρ, the admissible transforms are simultaneous solutions of

2 2 2 2 2 2 2 2 2 2 4ρ (a σ1 + b σ2 )(c σ1 + d σ2 )=σ1σ2 2 , (5.32) ad − bc =1.

There are several branches" of solutions.# 16 First we may consider the case where a = 0. In this case, the 0 b transforms are of the form . This branch is not particularly useful because one channel carries only 1/b d "the lower variance# component of x. Another special case is where c = 0. These transforms are of the form ab , and are not useful for the same reason. Now assuming a =0and6 c =6 0, we may substitute 01/a d =(1+bc)/a into (5.32) and rearrange to get     4 2  2 2 2 4 σ1 a 2 2 2 − 4ρ σ1 b (1 + bc) a + 2 b c +(1+bc) 2 + 2 =0. σ2 c σ2 c Solving this quadratic in a2 and choosing signs appropriately gives17 p  1 σ p a = 2 24ρ − 1+ 24ρ − 1 − 4bc(bc +1) . 2c σ1

When this value of a is used, D¯1 depends only on the product b · c, not on the individual values of b and c.The optimal value of bc is given by !−    2   1/2 1 1 p1,0 p1,0 p1,0 −4ρ (bc)optimal = − + − 1 +1 − 4 2 , 2 2 p0,1 p0,1 p0,1 which is interestingly independent of σ1/σ2.

It is easy to check that (bc)optimal ranges from -1 to 0 as p1,0/p1,0 ranges from 0 to ∞. The limiting behavior can be explained as follows: Suppose p1,0  p0,1, i.e., Channel 1 is much more reliable than Channel

2. Since (bc)optimal approaches 0, ad must approach 1, and hence one optimally sends x1 (the larger variance component) over Channel 1 (the more reliable channel), and vice-versa. This is the intuitive, layered solution. The multiple description approach is most useful when the channel failure probabilities are comparable, but this demonstrates that the multiple description framework subsumes layered coding.

Equal channel failure probabilities If p1,0 = p0,1,then(bc)optimal = −1/2, independent of ρ.Theoptimal set of transforms is described by √ −1 a =6 0 (but otherwise arbitrary),b= aσ−1σ 22ρ + 24ρ − 1 , 2 1 (5.33) c = −1/2b, d =1/2a,

15 The previous usage of D¯ 1 is generalized here to a weighted average of distortions with one erasure. 16The multiplicity of solutions is not present in conventional transform coding with orthogonal linear transforms, where for minimum rate (ρ = 0) the optimal transform is unique up to reflections and permutations of coordinates. Uses for the extra design freedom in using discrete transforms are discussed further in Appendix 5.B. 17In the solution for a2, one of the two branches of the quadratic is valid. Then in the square root of this expression we arbitrarily choose a ≥ 0 since the sign of a does not affect ρ or D¯ 1. CHAPTER 5. MULTIPLE DESCRIPTION CODING 124

−4.5 R = 4 R = 3 R = 2 R = 3.5 R = 2.5 0.3 −5

−5.5 0.25 −6

0.2 −6.5

−7 0.15 (MSE in dB)

−7.51 D (MSE per component)

1 0.1 −8 D

−8.5 0.05 −9

0 −9.5 0 0.5 1 1.5 2 −30 −25 −20 −15 −10 −5 0 5 D (MSE in dB) Redundancy ρ (bits per component) 0 (a) (b)

Figure 5.8: Optimal R-D0-D1 trade-offs for σ1 =1,σ2 =0.5: (a) Relationship between redundancy ρ and D1; (b) Relationship between D0 and D1 for various rates. and using a transform from this set gives

2 − 2 1 2 σ1 σ√2 D1 = D(1,0) = D(0,1) = σ + . (5.34) 2 2 4 · 22ρ 22ρ + 24ρ − 1

This relationship is plotted in Figure 5.8(a). Notice that, as expected, D1 starts at a maximum value of 2 2 2 (σ1 + σ2)/4 and asymptotically approaches a minimum value of σ2 /2. By combining (5.24), (5.25), and (5.34), one can find the relationship between R, D0,andD1. For various values of R, the trade-off between D0 and

D1 is plotted in Figure 5.8(b). The solution for the optimal set of transforms (5.33) has an “extra” degree of freedom which does not b affect the ρ vs. D1 performance. Fixing a = 1 gives the transforms suggested in [146] and allows T ( · )tobe implemented with two lifting steps instead of three. This degree of freedom can also be used to control the partitioning of the rate between channels. Other uses are described in Appendix 5.B.

Geometric interpretation The transmitted representation of x is given by y1 = hx, ϕ1i and y2 = hx, ϕ2i, T T where ϕ1 =[a, b] and ϕ2 =[c, d] . In order to gain some insight into the vectors ϕ1 and ϕ2 that result in an optimal transform, let us neglect the rate and distortion that are achieved, and simply consider the transforms described by ad =1/2andbc = −1/2. We can show that ϕ1 and ϕ2 form the same (absolute) angles with the positive x1-axis (see Figure 5.9(a)). For convenience, suppose a>0andb<0. Then c, d > 0. Let

θ1 and θ2 be the angles by which ϕ1 and ϕ2 are below and above the positive x1-axis, respectively. Then tan θ1 = −b/a = d/c =tanθ2. If we assume σ1 >σ2, then the maximum angle (for ρ = 0) is arctan(σ1/σ2)and the minimum angle (for ρ →∞) is zero. This has the nice interpretation of emphasizing x1 over x2—because it has higher variance—as the coding rate is increased (see Figure 5.9(b)).

Optimal transforms that give balanced rates The transforms of [146] do not give channels with equal rate (or, equivalently, power). In practice, this can be remedied through time-multiplexing. An alternative is to

CHAPTER 5. MULTIPLE DESCRIPTION CODING 125

x x

2 2

' d = b

2

d '

2

c a a = c

2

x x

1 1

1

b ' ' b

1 1

(a) (b)

Figure 5.9: Geometric interpretations. (a) When σ1 >σ2, the optimality condition (ad =1/2, bc = −1/2) is equivalent to θ1 = θ2 <θmax = arctan(σ1/σ2). (b) If in addition to the optimality condition we require the output streams to have equal rate, the analysis vectors are symmetrically situated to capture the dimension with greatest variation. At ρ =0,θ1 = θ2 = θmax;asρ →∞, ϕ1 and ϕ2 close on the x1-axis.

use the “extra” degree of freedom to make R1 = R2. Doing this is equivalent to requiring |a| = |c| and |b| = |d|, and yields r  p  1 σ a =  2 22ρ + 24ρ − 1 (5.35) 2 σ1 r and  p  1 1 σ b =  =  1 22ρ − 24ρ − 1 . 2a 2 σ2 These balanced-rate transforms will be used in the applications sections and in the development of a cascade structure. The notation " # α (2α)−1 Tα = (5.36) −α (2α)−1 will be used. When there are no erasures, the reconstruction uses " # −1 − −1 −1 (2α) (2α) Tα = . (5.37) αα

For when there are erasures, evaluating (5.23) gives optimal estimates " # 2α 2α2σ2 xˆ = 1 y (5.38) 4 2 2 2 1 4α σ1 + σ2 σ2 and " # 2α −2α2σ2 xˆ = 1 y (5.39) 4 2 2 2 2 4α σ1 + σ2 σ2 for reconstructing from y1 and y2, respectively.

5.3.2.3 Techniques for two channels

Hindsight reveals that the simplest generalization of sending two variables over two channels is to keep the number of channels the same, but to increase the number of variables N. The significance of having two CHAPTER 5. MULTIPLE DESCRIPTION CODING 126 channels is that the transform coefficients must be placed in two sets. The distortion expression (5.28)hasjust 22 =4terms—not2N terms—because each set is either received in full or lost in full. The general solution for sending two variables over two channels can be used to derive methods for sending more variables over two channels. These methods use at most bN/2c transforms of size 2 × 2 in parallel and thus have complexity that is only linear in the number of variables. For simplicity it is assumed that the channels are equally likely to fail.

Three variables The natural first step is to consider the transmission of three variables. Suppose y1 is transmitted on Channel 1 and (y2,y3) is transmitted on Channel 2. We could start from first principles as before, designing a 3 × 3 transform with determinant 1 to minimize the distortion given by (5.27)–(5.28). The eight free parameters make this a difficult optimization. A much easier way to determine the optimal performance is to first reduce the number of parameters without risking a loss in performance. It turns out that it is sufficient to send one of the original variables as y3 and to use an optimal 2 × 2 transform to produce y1 and y2. This assertion is formalized by the following theorem:

Theorem 5.6 Consider multiple description transform coding where y1 is sent on Channel 1 and (y2,y3) is sent on Channel 2. To minimize the average side distortion D1 with an upper bound on redundancy ρ,itis sufficient to optimize over transforms of the form " # e T 0 × 1 2 P, (5.40) 02×1 1

e 2×2 e with T ∈ R , det T =1,andP a permutation matrix.

Proof: See Appendix 5.C.1. 

Theorem 5.6 reduces the number of design parameters from eight to three and makes the design of an optimal transform a simple application of the results of Section 5.3.2.2. A transform of the form (5.40)shuffles the input variables and then correlates the first two. Since the order of the elements in a correlated pair does not matter, the permutation can be limited to one of the following three:      100  010     P1 = I, P2 =  001 , and P3 =  001 . 010 100

Let us consider a generic choice among the three by assuming that the outputs of the permutation have variances 2 2 2 2 2 2 2 2 2 ς1 , ς2 ,andς3 ; i.e., applying the permutation represented by P to (σ1 ,σ2,σ3 )gives(ς1 ,ς2 ,ς3 ). Recall the original 2 ≥ 2 ≥ 2 ordering of the variances (σ1 σ2 σ3 ) and notice that the three permutations under consideration preserve 2 ≥ 2 the ordering of the first two components: ς1 ς2 . 2 The component with variance ς3 is sent over Channel 2 without any channel protection. (It is uncor- related with the other components.) Since Channel 2 is lost half of the time and the distortion is measured per 2 e e component, this component contributes ς3 /6toD1, independent of T . Now the optimization of T in (5.40)is precisely the problem of the previous section. Thus Te can be chosen in the form of (5.36)and 2 − 2 1 2 1 2 ς1 ς2 D1 = ς + ς + √ . (5.41) 6 3 3 2 6 · 23ρ 23ρ + 26ρ − 1 CHAPTER 5. MULTIPLE DESCRIPTION CODING 127

The second and third terms of (5.41) come from evaluating (5.34) and rescaling the redundancy and distortion to account for the change from two to three components. Now we can choose the best permutation; i.e., the permutation yielding the lowest side distortion. The three permutations give the following distortions, respectively: 2 − 2 1 2 1 2 σ1 σ2 (D1)1 = σ + σ + √ , 6 3 3 2 6 · 23ρ 23ρ + 26ρ − 1 2 − 2 1 2 1 2 σ1 σ3 (D1)2 = σ + σ + √ , 6 2 3 3 6 · 23ρ 23ρ + 26ρ − 1 2 − 2 1 2 1 2 σ2 σ3 (D1)3 = σ + σ + √ . 6 1 3 3 6 · 23ρ 23ρ + 26ρ − 1

The best permutation is P2 because !

1 2 2 1 (D1)1 − (D1)2 = (σ − σ ) 1 − √  ≥ 0 6 2 3 23ρ 23ρ + 26ρ − 1 and ! 1 2 2 1 (D1)3 − (D1)2 = (σ − σ ) 1 − √  ≥ 0. 6 1 2 23ρ 23ρ + 26ρ − 1 To summarize, transforms of the form   −1  α 0(2α)   −   −α 0(2α) 1  01 0 attain the optimal performance 2 − 2 1 2 1 2 σ1 σ3 D1 = σ + σ + √ . 6 2 3 3 6 · 23ρ 23ρ + 26ρ − 1

This performance is matched by many other transforms, but not surpassed.

Four variables We now move on to communicating four variables over two channels. The problem is the most similar to the one we just solved if one channel carries y1 and the other carries (y2,y3,y4). In this case, a result analogous to Theorem 5.6 holds, revealing that it is sufficient to consider transforms of the form " # e T 0 × T = 2 2 P, 02×2 I2×2 where Te is a 2 × 2 correlating transform and P is one of six permutations. The best choice of permutation causes the correlating transform to be applied to the components with the largest and smallest variances. The result is a transform of the form   α 00(2α)−1    −1   −α 00(2α)     010 0  001 0 CHAPTER 5. MULTIPLE DESCRIPTION CODING 128 and optimal performance given by

2 − 2 1 2 1 2 1 2 σ1 σ4 D1 = σ + σ + σ + √ . 8 2 8 3 4 4 8 · 24ρ 24ρ + 28ρ − 1

Let us now consider the case where each channel carries the same number of coefficients; for concrete- ness, one carries the odd-numbered coefficients and the other carries the even-numbered coefficients. The transmission over two channels and the allocation of the coefficients to channels does not place any limitation on the transform. However, we can again place a simplifying limitation on the transform without loss of optimality. It is sufficient to consider pairing the input coefficients, applying a 2×2 correlating transform to each pair, and sending one output of each 2 × 2 sub-transform over each channel. This is justified by the following theorem:

Theorem 5.7 Consider multiple description transform coding where (y1,y3) is sent on Channel 1 and (y2,y4) is sent on Channel 2. To minimize the average side distortion D1 with an upper bound on ρ, it is sufficient to optimize over transforms of the form " # e T1 02×2 e P, (5.42) 02×2 T2

e 2×2 e with Ti ∈ R , det Ti =1, i =1, 2,andP a permutation matrix.

Proof: See Appendix 5.C.2.  e Since the canonical building blocks defined in (5.36) solve the problem of designing Ti, i =1, 2, we may write the transform as " # T 0 T = α1 P. (5.43) 0 Tα2 We only have to select the pairing and two transform parameters. Since the ordering of a pair does not affect the possible performance, there are three permutations of interest:     1000 1000          0010  0001 P1 = I, P2 =   , and P3 =   .  0100  0100 0001 0010

Let us consider a generic choice among the three by assuming that the inputs to transform Tα1 have variances 2 2 2 ≥ 2 2 2 2 ≥ 2 ς1 and ς2 , ς1 ς2 ; and the inputs to transform Tα2 have variances ς3 and ς4 , ς3 ς4 . Denote the redundancies associated with the pairs ρ1 and ρ2, respectively. The redundancy and distortion are both additive between the pairs, so the problem is to minimize 1 d = (d + d ) (5.44) 2 1 2 subject to the constraint18

ρ1 + ρ2 ≤ 2ρ, (5.45)

18The factor of 2 is present as normalization because redundancy is measured per component. This applies also to the 1/2 in (5.44). CHAPTER 5. MULTIPLE DESCRIPTION CODING 129

1 actual 0.9 approximation

0.8

0.7

0.6 )

ρ 0.5 c(

0.4

0.3

0.2

0.1

0 0 0.5 1 1.5 2 ρ

Figure 5.10: Illustration of the accuracy of the approximation of c(ρ)givenby(5.48) where, according to (5.34),

2 − 2 1 2 ς1 ς√2 d1 = ς2 + , (5.46) 2 4 · 22ρ1 (22ρ1 + 24ρ1 − 1) 2 − 2 1 2 ς3 ς√4 d2 = ς4 + . (5.47) 2 4 · 22ρ2 (22ρ2 + 24ρ2 − 1)

Since d1 and d2 are strictly decreasing functions of ρ1 and ρ2, respectively, (5.45) can be replaced by an equality. The optimal split occurs when both pairs operate at the same distortion-redundancy slope. However, 19 since ∂Dαi /∂ρi is complicated, there is no simple closed form for operating at the same slope. Let 1 c(ρ)= √ . 22ρ(22ρ + 24ρ − 1) For large ρ, the 1 in the denominator becomes negligible, so 1 1 c(ρ) ≈ √ = . (5.48) 22ρ(22ρ + 24ρ) 2 · 24ρ

The error made in approximation (5.48) is shown in Figure 5.10. Using (5.48) it becomes feasible to optimally allocate redundancies. Operating at equal slopes means that ∂D ∂D α1 = α2 , (5.49) ∂ρ1 ∂ρ2

19Matching the slopes gives q 8 4 4 2 8 4 4 2 8 4 4 γ(16α1ς1 − ς2 )+ γ (16α1ς1 − ς2 ) +64α1ς3 ς4 α4 , 2 = 4 4 32α1ς3 where ς2ς2(ς2 − ς2) γ 3 4 3 4 . = 2 2 2 2 ς1 ς2 (ς1 − ς2 ) This exact relationship is used in generating Figure 5.11, but is too complicated to use in drawing general conclusions about best pairings. CHAPTER 5. MULTIPLE DESCRIPTION CODING 130 yielding, together with (5.45), two linear equations with two unknowns:

1 ς2 − ς2 ρ − ρ = log 1 2 , 1 2 2 − 2 4 ς3 ς4 ρ1 + ρ2 =2ρ.

Solving the system gives the optimal split of redundancies as

1 ς2 − ς2 ρ = ρ + log 1 2 , (5.50) 1 2 − 2 8 σ3 ς4 1 ς2 − ς2 ρ = ρ − log 1 2 . (5.51) 2 2 − 2 8 σ3 ς4 Substituting (5.50)–(5.51)into(5.44)gives

1 1 1/2 d = (ς2 + ς2)+ (ς2 − ς2)(ς2 − ς2) 2−4ρ. 4 3 4 8 1 2 3 4

To minimize d for large ρ, we can immediately conclude that ς3 and ς4 should be the smallest of the σ’s. This eliminates permutation P1. The following sequence of inequalities shows that ς1 >ς3 >ς4 >ς2 is the ideal sorting, i.e., the largest-variance and smallest-variance components should be paired:

2 2 ς1 >ς2 2 2 − 2 2 2 − 2 ς1 (ς4 ς3 ) <ς2 (ς4 ς3 ) 2 2 − 2 2 2 2 − 2 2 ς1 ς4 ς1 ς3 <ς4 ς2 ς2 ς3 2 2 − 2 2 − 2 2 2 2 2 2 − 2 2 − 2 2 2 2 ς1 ς2 ς1 ς3 ς4 ς2 + ς4 ς3 <ς1 ς2 ς1 ς4 ς2 ς3 + ς4 ς3 2 − 2 2 − 2 2 − 2 2 − 2 (ς1 ς4 )(ς2 ς3 ) < (ς1 ς3 )(ς2 ς4 )

In other words, P3 is the best permutation. We will see shortly that this “nested” pairing method generalizes to K pairs as well.

2K paired variables Let us now consider transmission of 2K variables over the two channels with the odd and even indexed coefficients sent over Channels 1 and 2, respectively. The extension of Theorem 5.7 to K pairs of variables would seem to naturally hold, but no proof of this has been found. Consider transforms of the following form, though we have not proven that this restriction is innocuous:   T  α1     Tα2    T =  .  P,  .. 

TαK

2 2 2 where P is a permutation matrix. Again let ς1 ,ς2 , ..., ς2K denote the variances of the components after the 2 ≥ 2 permutation, with ς2i−1 ς2i for i =1, 2, ..., K. Denote the redundancy associated with Tαi by ρi.Wethen have a similar problem as before: Minimize 1 XK d = d , (5.52) K i i=1 CHAPTER 5. MULTIPLE DESCRIPTION CODING 131 with ς2 − ς2 1 2 2i−1 √2i  di = ς2i + , for i =1, 2, ..., K, (5.53) 2 4 · 22ρi 22ρi + 24ρi − 1 subject to the constraint 1 XK ρ = ρ . (5.54) K i i=1 Using the approximation (5.48) for large ρ and imposing the equal-slope conditions gives a system of K linear equations with K unknowns (K − 1 equations coming from equal-slope conditions and an additional one from (5.54)). The solution of this system is the optimal redundancy allocation of " # XK 2 − 2 1 ς2i−1 ς2i ρi = ρ + log 2 2 , for i =1, 2, ..., K. 4K ς − − ς j=1,j=6 i 2j 1 2j

The resulting average distortion with half the coefficients lost is " # XK YK 1/K 1 2 1 2 2 −4ρ d = ς + (ς − − ς ) 2 . (5.55) 2K 2i 8 2i 1 2i i=1 i=1 This distortion is minimized by the “nested” pairing under the conditions of the following theorem.

Theorem 5.8 (Optimal pairing) Consider the minimization problem in (5.52)-(5.54), where in addition to choosing the ρi’s one can choose the pairing by permuting ςi’s. The ςi’s are a permutation of σ1 ≥ σ2 ≥ ··· ≥

σ2K . At high redundancies the minimum distortion is achieved with the nested pairing ς2i−1 = σi, ς2i = σ2K+1−i, i =1, 2, ..., K.

Proof: See Appendix 5.C.3. 

Applying the nested pairing of Theorem 5.8,(5.55) becomes

" #1/K 1 XK 1 YK d = σ2 + (σ2 − σ2 ) 2−4ρ. 2K K+i 8 i 2K+1−i i=1 i=1

Whereas using (5.48) helped us derive the optimal pairing and the optimal redundancy allocation, there are two problems with using this approximation: First, (5.48) is not a good approximation when ρ is small (see Figure 5.10). Second, the Lagrangian redundancy allocation solution (5.50)–(5.51) may ask for a negative redundancy, which is impossible. However, numerical calculations (see Figure 5.11) verify that the nested pairing is best over all redundancies.

Other allocations of coefficients to channels Theorems 5.6 and 5.7 are suggestive of the following more general result:

Conjecture 5.9 (Generality of pairing) Consider a multiple description transform coding system with N variables sent over two channels. Suppose the transform coefficients are assigned to channels with

(y1,y3, ...,y2K−1), K ≤ N/2, sent on Channel 1 and the remaining coefficients sent on Channel 2. Then for CHAPTER 5. MULTIPLE DESCRIPTION CODING 132

σ = 1.3, σ = 1.1, σ = 0.99, σ = 0.35 1 2 3 4 σ = 1.32, σ = 1.22, σ = 0.99, σ = 0.88, σ = 0.82, σ = 0.58 1 2 3 4 5 6 0.5 0.5 {1,2} {1,3} {1,2},{3,4} {1,4} {1,2},{3,5} {1,2},{3,6} {1,3},{2,4} 0.45 {1,3},{2,5} {1,3},{2,6} 0.45 {1,4},{2,3} {1,4},{2,5} {1,4},{2,6} {1,5},{2,3} {1,5},{2,4} 0.4 {1,5},{2,6} {1,6},{2,3} {1,6},{2,4} 0.4 {1,6},{2,5} 0.35 (MSE per component) (MSE per component) 1 1 D D 0.35 0.3

0.25 0.3 0 0.2 0.4 0.6 0.8 1 0 0.1 0.2 0.3 0.4 0.5 ρ (bits per component) ρ (bits per component (a) (b)

Figure 5.11: Numerical calculations using the exact redundancy allocation solution (without approximation (5.48)) confirm that the nested pairing is optimal for all rates: (a) A random instance with two pairs; (b) A random instance with three pairs.

any redundancy ρ, a transform that minimizes the average side distortion D1 canbefoundintheform   T  α1     Tα2     ..  T =  .  P,    TαK 

IN−2K

where each Tαi is of the form (5.36)andP is a permutation matrix. The permutation maps

T T [x1,x2, ..., xN ] to [x1,xN ,x2,xN−1, ..., xK ,xN+1−K ,xK+1, ..., xN−K ] .

In attempting to prove this conjecture using the techniques of Appendices 5.C.1 and 5.C.2, one is faced with the following problem: Let   K  Λ1 A1 A2   T  K Ry =  A1 Λ2 A3  T T N − K A2 A3 Λ3 2 KKN− 2K be a positive definite matrix with block dimensions as marked and Λ1,Λ2,andΛ3 positive diagonal matrices.

K×K N−K×N−K ∈ R The problem is to find V1 ∈ R and V2 , each with determinant 1, such that   " # " # Γ1 B 0K×N−2K V 0 V T 0   1 1  T  R = B Γ 0 × − , y T  2 K N 2K  0 V2 0 V2 0N−2K×K 0N−2K×K Γ3 CHAPTER 5. MULTIPLE DESCRIPTION CODING 133

with Γ1,Γ2,Γ3,andB all diagonal matrices. 2 2 Choosing V1 and V2, each with determinant 1, gives K +(N − K) − 2 degrees of freedom. The number of independent constraints is N(N − 1)/2 − K.20 For all N ≥ 2andK ≥ 1, the number of variables is greater than the number of constraints. This suggests that a solution can be found, but obviously does not guarantee it. The proofs of the earlier theorems use explicit determinations of suitable V1 and V2; unfortunately, these depend on Ry in a nonlinear fashion, so they cannot be generalized in an obvious way.

5.3.2.4 Techniques for more than two channels

The extension of the correlating transform method for MDTC to more than two channels is hindered by design complexity. Applying the results of Section 5.3.2.1 to the design of 3 × 3 transforms is considerably more complicated than what has been presented thus far because there are eight degrees of freedom remaining after fixing det T = 1. Even in the case of equal channel failures, a closed form solution would be much more complicated that (5.33). When σ1 >σ2 >σ3 and erasure probabilities are equal and small, a set of transforms which gives good performance is described by  √  3σ a σ a − 1 − √ 2  2   σ2 6 3σ a2   σ 1   2a 0 √ 2   2 2  .  √ 6 3σ1 a   3σ a σ  a 1 − √ 2 2 2 σ2 6 3σ1 a The design of this set was based on enforcing equal rates for the three channels, equal “importance” of the channels (measured through the distortion when there are erasures), and a dependence on a similar to the two-channel case. The performance when two of three components are received is almost as good as with a numerically optimized transform. This heuristic design has reduced the eight available degrees of freedom in designing T to a single parameter a. However, it sacrifices in performance. Just as the parallel use of two-by-two transforms gave a method for sending 2K variables over two channels, a cascade combination of these transforms gives a method for sending 2K variables over 2K channels. The cascade structure simplifies the encoding, decoding (without erasures), and design when compared to using a general 2K × 2K transform. The simplest instance of the cascade structure is shown in Figure 5.12, where four variables are sent over four channels. This is equivalent to the use of a transform of the form   1000 " #   " #   Tγ 0  0010 Tα 0 T =   . (5.56) 0 Tγ  0100 0 Tβ 0001 Though this has only three degrees of freedom—in place of 15 in a general determinant 1 transform of this size— empirical evidence suggests that this class of transforms is sufficiently rich to give nearly optimal performance. Numerical optimizations of the cascade structure for one, two, and three component erasures give performance nearly as good as numerically-optimized unstructured transforms. 20“Independent” is used loosely here to indicate constraints that are not obviously identical due to the symmetry of the product T VRyV .

CHAPTER 5. MULTIPLE DESCRIPTION CODING 134

x y

1 1

T T

x y

2 2

x y

3 3

T T

x y

4 4

Figure 5.12: Cascade structure for MDTC of four variables to be transmitted over four channels. The cascade structure simplifies the design procedure be reducing 15 free parameters to 3. The use of γ in both second-stage blocks is because each first-stage block produces streams of equal rate and equal importance.

Since the correlating transform method described in this section is a generalization of the technique for two variables proposed by Orchard et al., the improvement afforded by this generalization should be noted. The generalization consists of facilitating optimization for an arbitrary p.d.f. on the system state and for an arbitrary number of channels. Let us look specifically at the second part. To transmit four variables over four channels that are equally likely to fail using the technique from [146], one would form two sub-problems of sending two variables over two channels. Each pair would be transmitted after the application of the transform (5.21). The cascade transform (5.56) can give strictly better performance simultaneously for one, two, and three erasures.

A numerical example is shown in Figure 5.13. The source is described by Rx = diag([1, 0.64, 0.36, 0.16]), and redundancy ρ =0.125 bits/component is added to mitigate erasure effects. The signal-to-noise ratio (SNR) with the cascade transform is simultaneously higher for both one erasure and two erasures.

5.3.3 Application to Image Coding

We now turn our attention to the communication of still images. The most common way to commu- nicate an image over the Internet is to use a progressive encoding system and to transmit the coded image as a sequence of packets over a TCP connection. When there are no packet losses, the receiver can reconstruct the image as the packets arrive; but when there is a packet loss, there is a large period of latency while the transmitter determines that the packet must be retransmitted and then retransmits the packet. The latency is due to the fact that the application at the receiving end uses the packets only after they have been put in the proper sequence. Changing to UDP does not solve the problem: because of the progressive nature of the encoding, the packets are useful only in the proper sequence. (The problem is more acute if there are stringent delay requirements, for example, for fast browsing or for streaming video. In this case retransmission is not just undesirable but impossible.) To combat this latency problem, it is desirable to have a communication system that is robust to arbitrarily placed packet erasures and that can reconstruct an image progressively from packets received in any order. The MDTC method described earlier seems suitable for this task. As a proof of concept, let us design a system which uses four channels and is robust to the loss of any one of the channels. We consider the channels to be equally likely to fail and use transforms which can be implemented with the cascade combination of four 2 × 2 transforms as in Figure 5.12. CHAPTER 5. MULTIPLE DESCRIPTION CODING 135

5

4.5

4

3.5

3

2.5

2

1.5 SNR (dB) with two channels lost 1

0.5 Cascade Two pairs 0 0 2 4 6 8 10 12 SNR (dB) with one channel lost

Figure 5.13: Comparison between the cascade transform (5.56) and pairing for sending four variables over four channels. The source is given by Rx = diag([1, 0.64, 0.36, 0.16]), and in both cases redundancy ρ =0.125 bits/component. With the zero-erasure performance held equal, the cascade transform can simultaneously give better performance with one or two component lost.

This method is designed to operate on source vectors with uncorrelated components with different variances. We approximately obtain such a condition by forming vectors from DCT coefficients separated both in frequency and in space. A straightforward application proceeds as follows:

1. An 8 × 8 block DCT of the image is computed.

2. The DCT coefficients are uniformly quantized.

3. Vectors of length 4 are formed from DCT coefficients separated in frequency and space. The spatial separation is maximized, i.e., for 512 × 512 images, the samples that are grouped together are spaced by 256 pixels horizontally and/or vertically.

4. Correlating transforms are applied to each 4-tuple.

5. Entropy coding akin to that of JPEG is applied.

The system design is completed by determining which frequencies are to be grouped together and designing a transform (an (α, β, γ)-tuple for use as in Figure 5.12) for each set. This can be done based on training data. Even with a Gaussian model for the source data, the transform parameters must be numerically optimized.21 An abstraction of this system was simulated. If we were to use precisely the strategy outlined above, the importance of the DC coefficient would dictate allocating most of the redundancy to the group containing the DC coefficient. Thus for simplicity we assume that the quantized DC coefficient is communicated reliably through some other means. We separate the remaining coefficients into those that are placed in groups of four and those that are sent by one of the four channels only. The optimal allocation of redundancy between groups

21In the case of pairing transforms as in [146], the optimal pairing and allocation of redundancy between the pairs can be found analytically as shown in Section 5.3.2.3. CHAPTER 5. MULTIPLE DESCRIPTION CODING 136

42 Baseline MD ρ = 0.02 MD ρ = 0.05 40 MD ρ = 0.1

38

36

PSNR (dB) 34

32

30

28 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 AC rate (bits/sample)

Figure 5.14: Rate-distortion results for coding of 512 × 512 Lena image with the correlating transform method. Top (solid) and bottom (dashed) curves are for when all and three-fourths of the transmitted data arrives at the decoder, respectively. is difficult, so we allocate approximately the same redundancy to each group. For comparison we consider a baseline JPEG system that also communicates the DC coefficient reliably. The AC coefficients for each block are sent over one of the four channels. The rate is estimated by sample scalar entropies. Simulation results for the standard 512 × 512 Lena image [117] are given in Figure 5.14.Thecurve type indicates whether all (solid) or three-fourths (dashed) of the transmitted data arrives at the decoder. The marker type differentiates the MD and baseline JPEG systems. For the MD systems, the objective redundancy ρ in bits per pixel is also given. As desired, the MD system gives a higher-quality image when one of four packets is lost at the expense of worse rate–distortion performance when there are no packet losses. Size 128 × 128 sections of sample images are given in Figure 5.15. The results presented here are only preliminary because the MDTC techniques from this section have been applied without much regard for the the structure and properties of images. The transform design is based on high-rate entropy estimates for uniformly quantized Gaussian random variables. Effects of coarse quantization, dead zone, divergence from Gaussianity, run length coding, and Huffman coding are neglected. Incorporating these will require a refinement of the theory and/or an expansive numerical optimization. Aside from transform optimization, this coder could be improved by using a perceptually tuned quantization matrix as suggested by the JPEG standard. Here we have used a constant quantization matrix for simplicity. With this type of tuning it should be possible to design a system which would perform precisely as well as the system in [201] when two or four of four packets arrive, but which performs better when one or three packets arrive. The results in [201] are ahead of those presented here because they have, quite correctly, used a classification of image blocks which allows for better statistical modeling and redundancy allocation. A full image communication system would probably require packetization. We have not explicitly considered this, so we do not produce four streams with precisely the same number of bits. The expected number of bits for each stream is equal because of the form of (5.36). In contrast, with the transforms used in [146] one must multiplex the streams to produce packets of approximately the same size. CHAPTER 5. MULTIPLE DESCRIPTION CODING 137

Figure 5.15: Results for correlating transform method at 1 bpp. Top row: no packet losses; bottom row: one packet lost. Left column: baseline system; right column: MD system.

5.3.4 Application to Audio Coding

As a second application of statistical channel coding, consider the problem of robust transmission of audio over a lossy packet network such as the Internet. Rather than building a joint source–channel audio coder from scratch, the correlating transform method was introduced in an existing coder to provide the multiple description feature. The experiments confirm that indeed, in the presence of channel erasures—lost packets—the new MD audio coder is robust and gracefully degrades as the number of erasures increases. Multiple description transform coding was applied to a state-of-the-art audio coder developed at Bell Labs, the Perceptual Audio Coder (PAC) [177]. This is a successful and well-known audio coder. Section 5.3.4.1 reviews the basic principles of perceptual audio coding and describes Bell Labs’ PAC. The design of a multiple description PAC (MD PAC) coder is described in Section 5.3.4.2. Section 5.3.4.3 discusses experimental results.

5.3.4.1 Perceptual audio coding

In audio compression one wishes to find digital representations of audio signals that provide maximum signal quality with a given bit rate and delay. Whereas in speech coding both the source model (speech production model) and the destination model (hearing properties) can be used to lead to efficient compression, audio coding has to rely mainly on the destination model due to the high complexity of modeling audio sources. Human perception plays a key role in compression of audio material. As a result, recent audio standards work has concentrated on a class of audio coders known as perceptual coders. Rather than trying to understand the source, perceptual coders model the listener and attempt to remove irrelevant information contained in the input signal. For a given rate, a perceptual coder will typically have a lower SNR than a lossy source coder designed to maximize SNR, but will provide superior perceived quality to the listener. By the same token, at equivalent perceived quality, the perceptual coder will need a lower bit rate. The combination of an appropriate signal representation, by means of a transform or a filter bank, and a psychoacoustic model of the destination provides the means to achieve efficient compression.

Bell Labs’ PAC coder Like most perceptual coders, PAC combines both source coding techniques to remove signal redundancy and perceptual coding techniques to remove signal irrelevancy, as shown in Figure 5.16. PAC divides the input signal into 1024-sample blocks of data—frames—that will be used throughout the encoding process. It consists of five basic parts:

• The analysis filter bank converts the time-domain data to the frequency domain. First, the 1024-sample block is analyzed and, depending on its characteristics, such as stationarity and time resolvability, a Modified Discrete Cosine Transform (MDCT) or a discrete wavelet transform is applied [177]. The total given bit rate and the sampling rate also play a role in the design of the transform. The MDCT, the wavelet filter bank, and their combinations produce either 1024- or 128-sample blocks of frequency-domain coefficients. In both cases, the base unit for further processing is a block of 1024 samples.

• The perceptual model determines the frequency-domain thresholds of masking from both the time-domain signal and the output of the analysis filter bank. A threshold of masking is the maximum noise one can add to the audio signal at a given frequency without perceptibly altering it.


Figure 5.16: PAC coder block diagram.

Depending on the transform that was used previously, each 1024-sample block is split into a predefined number of groups of bands—gain factor bands. Within each gain factor band, a perceptual threshold value is computed.

• Quantizer – Within each factor band the quantization step sizes are adjusted according to the computed perceptual threshold values in order to meet the noise level requirements. The quantization step sizes may also be adjusted to comply with the target bit rate, hence the feedback from the noiseless coder to the quantizer.

• Noiseless coding – Huffman coding is used to provide an efficient representation of the quantized coefficients. A set of optimized codebooks is used; each codebook codes sets of two or four integers. For efficiency, consecutive factor bands with the same quantization step size are grouped into sections, and the same codebook is used within each section. Failure to meet the target bit rate may trigger a recomputation of quantization step sizes.

• The frame formatter forms the bit stream, adding to the coded quantized coefficients the side information needed at the decoder to reconstruct the 1024-sample block. This block is defined as the frame and contains, along with one 1024-sample or eight 128-sample blocks, the following side information for each of them: the transform used in the analysis filter bank, section boundaries, codebooks, and quantization step sizes for sections. Side information accounts for 15 to 20% of the total bit rate of the coded signal.

At the decoder, the entropy coding, quantization, and transform blocks are inverted and an error mitigation block is added between the inverse quantization and the synthesis filter bank. In this block, lost frames are interpolated based on the previous and following frames.


Figure 5.17: MD PAC encoder block diagram.

5.3.4.2 A multiple description perceptual audio coder

Figure 5.17 depicts the MD version of the PAC coder. The only difference when compared to the PAC coder is the addition of the MDTC block together with the off-line design.

• An MD transform block is inserted between the quantizer and the noiseless coder. Within each 1024-sample block or eight 128-sample blocks contained in the 1024-sample unit block, MDTC is applied to the quantized coefficients (integers) coming out of the quantizer. The transform is applied to pairs of quantized coefficients and produces pairs of MD-domain quantized coefficients, using the off-line-designed side information. Within each pair, MD-domain quantized coefficients are then assigned to Channel 1 (quantized coefficient with higher variance) or Channel 2 (quantized coefficient with smaller variance).22 The side information must now contain both the way quantized coefficients have been paired and the parameter of the transform for each pair. Then, the MD-domain quantized coefficients are passed to the noiseless coder.

As in the encoder, an inverse MDTC block and off-line side information are added in the decoder (see Figure 5.18).

• Inverse MD transform – This block is the core of the MDTC scheme, since it performs the estimation and recovery of lost MD-domain quantized coefficients. The inverse MDTC block is inserted between the noiseless decoder and the inverse quantizer. Within each 1024-sample block or eight 128-sample blocks contained in the 1024-sample unit, the inverse MDTC function is applied to the MD-domain quantized coefficients (integers) coming out of the noiseless decoder block.

22One channel consistently having higher variance than the other is an unexplained empirical fact. If the source is Gaussian, uncorrelated, and zero-mean, and if the quantizer preserves the zero mean, then the outputs of the MDTC block should have the same variance. The unequal variances indicate that the quantizer does not preserve the zero mean, the source is too far from Gaussian, or some flaw remains in the software. This investigation is left for future work.


Figure 5.18: MD PAC decoder block diagram.

Then, one of the following MDTC inversion strategies is applied:

– When both channels are received, the MD transform is inverted by applying a discrete version of (5.37), recovering perfectly the quantized coefficients.
– When Channel 1 is lost, the lost coefficients are estimated from their counterparts in Channel 2 and the MDTC is inverted; or directly, (5.38) is used.
– When Channel 2 is lost, (5.39) is used.
– When both channels are lost, the built-in loss mitigation feature of the PAC is used.

As in the encoder, side information provides the way quantized coefficients have to be paired, the parameter α of the inverse transform for each pair, and the variances to be used in the estimation of lost MD-domain quantized coefficients. Once the MDTC has been inverted according to one of these four strategies, the output quantized coefficients are simply passed to the inverse quantizer.
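To make the pairwise mechanics concrete, the following sketch illustrates the statistical-redundancy idea on a single pair. It is not the thesis’s parametric transform T_α of (5.36)–(5.39); instead it uses the simplest orthonormal correlating pair transform, the variances sigma1 and sigma2 are hypothetical, and quantization is omitted:

import numpy as np

# Illustrative correlating transform on one pair of coefficients (a stand-in
# for the parametric T_alpha designed off-line in the MD PAC).
rng = np.random.default_rng(0)
sigma1, sigma2 = 3.0, 1.0                      # assumed per-coefficient standard deviations
x = np.vstack([sigma1 * rng.standard_normal(100000),
               sigma2 * rng.standard_normal(100000)])

T = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
y = T @ x                                      # y[0] -> Channel 1, y[1] -> Channel 2

# Both channels received: exact inversion (T is orthonormal).
x_both = T.T @ y

# Channel 2 lost: estimate y[1] from y[0] using the known second-order statistics.
rho = (sigma1**2 - sigma2**2) / (sigma1**2 + sigma2**2)   # E[y2 | y1] = rho * y1
x_one = T.T @ np.vstack([y[0], rho * y[0]])

print(np.mean((x - x_both) ** 2))              # essentially zero: nothing lost
print(np.mean((x - x_one) ** 2))               # graceful degradation instead of losing x2 entirely

Estimating the lost channel from the received one recovers the part of the pair that the two channels share; this is the statistical channel coding at work.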

Audio file statistics When applying MDTC, knowledge of the second-order statistics of the source is needed, not only for designing the optimal pairing and transform, but also for the estimation of lost coefficients. In the PAC structure, the bit rate affects the MDCT or filter bank choice and thus the coefficient variances. This can be seen in Figure 5.19, which gives the frequency-domain coefficient variances for an audio segment at three different bit rates. As the rate is increased, more of the original spectrum is retained. Since we are interested in Internet audio applications, a bit rate of 20 kbps was selected. Five files recommended by the European Broadcasting Union [51] and four additional files were analyzed. Although audio signals are inherently nonstationary and their statistics may change completely from one file to another, as a first estimate, variances were computed based on whole files. Our analysis was conducted on three different kinds of music: solo instruments, classical orchestra, and pop/rock.


Figure 5.19: Coefficient variances as a function of frequency for bit rates of (from left to right) 20 kbps, 30 kbps and 48 kbps.

file              description                                        duration [s]
casta.44          castagnette                                        7.66
pipe.48           pipe                                               14.42
track59.44        violin (Ravel) [51]                                29.00
track60.44        piano (Schubert) [51]                              92.00
track65.44        classical (Strauss) [51]                           112.00
track67.44        classical (Mozart) [51]                            82.00
track68.44        classical (Baird) [51]                             164.00
comedance.22      pop/rock (Come Dancing by The Kinks)               42.71
underthestars.22  pop/rock (Underneath The Stars by Mariah Carey)    48.40

Table 5.2: Descriptions of the analyzed audio files. The file extensions give the sampling frequencies, where 44 stands for 44.1 kHz, 48 for 48.0 kHz, and 22 for 22.05 kHz.

Table 5.2 lists the files used in this project and gives brief descriptions of musical content. The file extensions give sampling frequencies, where 44 stands for 44.1 kHz, 48 for 48.0 kHz, and 22 for 22.05 kHz. Among these files are the standard highly nonstationary castagnette and very stationary pipe files. We analyzed the frequency-domain coefficients at the output of the analysis filter bank of the PAC coder. Figure 5.20 shows the empirical variances of two audio files for two MDTC lengths (1024 and 128). The power spectrum of the castagnette file exhibits two distinct peaks. This file and the pipe file have significantly different variance distributions from the rest of the audio files; the majority of audio files have power spectra similar to that of the track65 file.

Pairing design As described in Section 5.3.2.3, when there are 2N variables and two channels, the optimal pairing consists of pairing the variable with the highest variance with the one with the lowest variance, the second highest variance variable with the second lowest variance one, etc. There may be either 1024 or 128 variables to pair, leading to either 512 or 64 pairs.


Figure 5.20: Empirical variances of frequency-domain transform coefficients: (a) casta.44 with MDCT transform of length 1024; (b) casta.44 with MDCT transform of length 128; (c) track65.44 with MDCT transform of length 1024; (d) track65.44 with MDCT transform of length 128. Empirical variances of most of the audio files follow that of the track65.44 file.


Figure 5.21: Pairing design across all bands for audio file track67.44.c20M.dec. The pairing is performed according to the optimal scheme, as described by Theorem 5.8.

Since factor bands may have different quantization step sizes, this approach implies a rescaling of the domain spanned by the variables prior to the application of MDTC, by multiplying variables by their respective quantization steps. As will be explained in Section 5.3.4.3, the theoretically optimal pairing across all bands did not work well. A second approach was to take the factor bands into account, and pair variables belonging to the same factor band. For the audio file track67.44.c20M.pac (c20M.pac meaning that the file was compressed to a mono bit rate of 20 kbps with PAC), Figure 5.21 depicts the optimal pairing across all bands while Figure 5.22 depicts the optimal pairing within factor bands.
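A minimal sketch of the nested pairing rule just described (highest variance with lowest, second highest with second lowest, and so on), with an optional restriction to factor bands; the band boundaries in the example are hypothetical:

import numpy as np

def nested_pairs(variances):
    """Pair highest variance with lowest, second highest with second lowest, ..."""
    order = np.argsort(variances)              # indices in ascending variance
    n = len(order)
    return [(order[n - 1 - i], order[i]) for i in range(n // 2)]

def pairs_within_bands(variances, band_boundaries):
    """Apply the nested pairing separately inside each factor band.
    band_boundaries is a hypothetical list of (start, stop) index ranges."""
    pairs = []
    for start, stop in band_boundaries:
        local = nested_pairs(variances[start:stop])
        pairs += [(start + i, start + j) for i, j in local]
    return pairs

# Example: 8 coefficients split into two 4-coefficient factor bands.
var = np.array([9.0, 4.0, 2.0, 1.0, 0.5, 0.2, 0.1, 0.05])
print(nested_pairs(var))                          # pairing across all coefficients
print(pairs_within_bands(var, [(0, 4), (4, 8)]))  # pairing within factor bands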

Transform design Given a set of pairings, we next must design the correlating transform T_α defined by (5.36) for each pair. This requires a redundancy allocation between pairs as described in Section 5.3.2.3. The transform parameter α is then given by (5.35). Figure 5.23 depicts, for audio file track67.44.c20M.dec, both the optimal redundancy allocation between pairs and the optimal transform parameter α for each pair when the mean redundancy is 0.1 or 0.5 bits per variable. A pairing of variables within factor bands has been used.

Entropy coding After the application of MDTC, the two channels (the MD-domain quantized coefficients) must be entropy coded. Because of difficulties in modifying the PAC software, the entropy coder of the single description (SD) PAC was used without modification. This provides a crude rate estimate. Since this entropy coder processes more than one sample at a time it cannot really be used in MD PAC unless the channels are separated. One may have the impression that the joint processing gives artificially low rate estimates. In actuality, the entropy coder uses static Huffman code tables designed for uncorrelated samples. The performance would almost certainly improve with Huffman codes optimized for MDTC. Thus the comparisons between SD PAC and MD PAC are fair. In any case, this is a serious limitation of the current test bed.


Figure 5.22: Pairing design within factor bands for audio file track67.44.c20M.dec. Factor band boundaries are indicated by small dots on top of the plots. The optimal nested pairing is used within each factor band. This pairing scheme is suboptimal but avoids pairing coefficients quantized with different quantization step sizes.


Figure 5.23: Transform design for audio file track67.44.c20M.dec: (a) optimally allocated redundancies (ρ’s); and (b) transform parameters (α’s) for each of the 512 pairs.

Side information From one set of variables, the MDTC scheme produces two distinct channels that have to be sent separately through a network. In our case, from each 1024- or 128-sample block, the MDTC produces two 512- or 64-sample sets of coefficients. As described previously, the set with higher variances is called Channel 1 and the other Channel 2. Since these two channels have to be sent separately, side information of the original frame has to be doubled and sent with each channel. This leads to an increase in the total bit rate of the coded audio output and is accounted for in all reported rates. In the monophonic case, the side information amounts to up to 20% of the total target bit rate. Side information containing the MDTC parameters has to be transmitted to the receiver as well. This could be done at the beginning of the transmission. The size of such a file is only a few tens of kbits. This is small compared to the size of the compressed files, representing less than 3 seconds of compressed data. Moreover, the transform need not be adjusted for each audio file.

5.3.4.3 Experimental results

This section includes first a description of experiments to compare the MD PAC to the original single description version (SD PAC). An explanation is then given for why pairing within factor bands performs better than the theoretically optimal pairing across all bands. Both subjective and objective means are used to interpret the results. All the experiments with the MD PAC were done with a small amount of redundancy, ρ = 0.1 bits per variable.

• Experiment 1: The first experiment is to compare SD PAC and MD PAC at the same bit rate when no frames are lost. Since we are introducing redundancy, the MD version should sound slightly worse.

• Experiment 2: Then, still without network impairments, we increase the bit rate in the MD PAC until we reach the same quality as in the SD PAC. This shows the price paid in bits for robustness; these bits are wasted when no data is lost.

• Experiment 3: Finally, we compare the MD PAC and SD PAC at various loss rates. The SD PAC uses frame interpolation to recover lost frames. If frame information from only one channel is lost, the MD PAC uses statistical redundancy as explained earlier. If frame information from both channels is lost, i.e., the whole frame is lost, the SD PAC error recovery scheme is used.

In what follows, P_i is the loss rate of Channel i, i = 1, 2. The overall average loss rate is defined as P_tot = (P_1 + P_2)/2. For example, if P_1 = 20%, then 20% of half-frames corresponding to Channel 1 are lost.

Pairing across all bands As described in Section 5.3.2.3, the optimal pairing of 2N variables is obtained by pairing the highest-variance variable with the lowest-variance one, the second highest with the second lowest, and so on. According to this scheme, the optimal pairing may form pairs between coefficients in different factor bands. Doing this we have to take into account the fact that variables in a pair may have had different quantization step sizes. Using pairing across all bands, the results of Experiment 1 were quite disappointing. The quality degradation in the MD PAC was extreme. Here is an explanation why: After applying MDTC to the quantized coefficients, the MD-domain outputs were simply passed to the original PAC noiseless coder. Since the correlating transforms have been designed to produce two equal-rate outputs, we are basically introducing nonzero values at the positions where the noiseless coder is expecting zeros.


Figure 5.24: MD-domain “spectrum” modification when pairing is across all factor bands. The figure demonstrates that what the entropy coder expects is far from what the MDTC block provides; thus, the coder runs out of bits. The problem stems from a lack of a “frequency” interpretation of the MD-domain coefficients.

Thus, modifying the input to the noiseless coder in such a way led to ineffective coding, resulting in quality degradation. This effect is depicted in Figure 5.24. A solution to this problem would be to design and optimize new entropy codes to be used for the MD-domain quantized coefficients. This is left for future work. The MDTC analysis developed in Section 5.3.2 assumes equal quantization step sizes for each variable. To properly handle variables paired from different factor bands, the theory would have to be further generalized. As a first approximation to accounting for this discrepancy, a rescaling described in [6] was used. Further investigation is needed.

Pairing within factor bands We now restrict ourselves to pairing variables belonging to the same factor band. The results were much better. First, we did not face the problem of pairing variables quantized with different step sizes. Also, the MD-domain spectrum is only slightly modified by applying the correlating transform, as can be seen in Figure 5.25. Note that after applying the MDTC transform, the outputs lose their “frequency”-domain meaning. For convenience, when ordering the MD-domain coefficients, we keep the first MDTC output at the frequency-domain position of its first input, and the second MDTC output at the frequency-domain position of its second input; the ordering within a pair is arbitrary. Throughout the rest of this section, the reader will be referred to the WWW for experimental results. Audio files are provided in three formats, aiff, wave, and next, and can be accessed on-line at

http://cm.bell-labs.com/who/jelena/Interests/MD/AudioDemo/DemoList.html.

Listening to these files is the only way to genuinely assess the results. In [6], one can find plots of power spectra of reconstructed signals. These give some impression of the robustness to packet losses. Experiment 1 was performed at 20 kbps. The quality degradation due to the redundancy was very low, though noticeable to expert listeners (listen to Files 1 and 2 under “No losses”). In Experiment 2, the difference between the SD PAC at 20 kbps and the MD PAC at 22 kbps is hardly noticeable (listen to Files 1 and 3 under “No losses”).


Figure 5.25: MD-domain “spectrum” modification when pairing is within factor bands. Now what the entropy coder expects is close to what the MDTC block provides, resulting in more efficient coding.

Experiment 3 was performed at various loss rates. For the MD PAC coder, results are available on the Internet for loss rates in the following set:

(P_1, P_2) ∈ {(5%, 5%), (10%, 0%), (20%, 20%), (40%, 0%), (50%, 50%), (0%, 100%), (100%, 0%)}.

Results for the SD PAC coder are available at the corresponding average loss rates.

The SD PAC error recovery scheme is very effective below P_tot = 10%. However, for higher loss rates, the interpolation process becomes insufficient, and gaps appear in the music (e.g., listen to File 4 under “Total loss rate = 50%”). The MD PAC estimation process seems to work well, even at high loss rates. However, we should remember that when both halves of a particular frame are lost, the system reverts to the SD PAC frame interpolation scheme. Thus, if the probability of losing both halves of a frame reaches about 10%, the insufficiency of frame interpolation will be audible. Also, some of the audible artifacts may be due to jumping between perfectly received frames, frames estimated using the MD scheme, and interpolated frames. Further descriptions of these experiments appear in [6, 7]. Reanalyzing the reconstructed audio revealed a slight bias from the original spectrum. Moreover, audio is inherently nonstationary, yet we are using only one set of estimated variances for an entire audio file. A possibility for future work is to implement an adaptive scheme where the variances are estimated on shorter audio segments.

5.4 Signal-Domain Channel Coding with Frame Expansions

Let us revisit the operation of a conventional channel code over an erasure channel. For concreteness,

consider a linear systematic block code over Z_2^k that maps N-tuples to M-tuples. If the code is good, it is possible to recover the N information symbols from any N of the M transmitted symbols. As detailed in Section 5.1.1, this is all that is needed from a set of codes to asymptotically achieve the capacity of a memoryless channel with large N and M. The capacity is achieved if N and M are chosen such that exactly N symbols are received with high probability. But what happens when more or less than N symbols are received? When M is not huge or the erasures are bursty or time-varying, this is frequently the case.

When more than N transmitted symbols are received, those in excess of N are useless; thus, the full throughput of the channel has not been utilized. One plausible reaction is to wish that the value of M had been lower, but this would certainly increase the probability of receiving less than N symbols. Another reaction is to wish that the received symbols contained more information about the source. With N symbols from a discrete source this does not make any sense, but if there is an underlying continuous source, then it may be advantageous for each received symbol (even in excess of N) to contain some independent information about the source. On the other hand, when less than N symbols are received, it is usually not possible to determine all of the information symbols.23 This is generally considered a failed communication attempt, without further ado. One might try to recover as many of the information symbols as possible. Techniques for this are limited, and in practice, recovering the received portion of the systematic part of the code is often all that is done. In order to achieve good average performance and robustness to channel variation without very large block lengths, it is desirable to have channel protection that works well with any number of received packets. This is precisely in the spirit of multiple description coding: we are interested in the performance when any subset of the M transmitted symbols is received, instead of just subsets of size N. A method for joint source–channel coding that is very similar to an (M,N) block code is presented in this section. The difference is that the expansion from N to M dimensions is done in the original, continuous domain of the source. The resulting representation is then scalar quantized. This new technique partially remedies the weaknesses of using a discrete block code when more or less than N symbols are received.
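As a concrete illustration of the conventional approach, the sketch below uses the simplest systematic code of this type—a single parity symbol formed by bitwise modulo-2 addition—so N = 4, M = 5 and any one erasure can be corrected; the symbol values are arbitrary:

# Systematic (5,4) single-parity erasure code over bit strings (Z_2^k symbols, k = 4 here).
info = [0b1011, 0b0110, 0b1110, 0b0001]        # four information symbols
parity = 0
for s in info:
    parity ^= s                                 # bitwise Z_2 addition
sent = info + [parity]                          # M = 5 transmitted symbols

received = sent.copy()
received[2] = None                              # one symbol erased by the channel

# Any single erasure is recovered as the XOR of the four surviving symbols.
missing = [i for i, s in enumerate(received) if s is None]
if len(missing) == 1:
    rec = 0
    for s in received:
        if s is not None:
            rec ^= s
    received[missing[0]] = rec

assert received[:4] == info                     # information symbols recovered exactly

With two or more erasures this code gives nothing back beyond the surviving systematic symbols, which is exactly the behavior the quantized frame approach below tries to soften.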

5.4.1 Intuition

Consider the communication of a source taking values in R^N across an erasure channel. Denote the channel alphabet by X and suppose |X| = 2^{(N/M)R}, where M is the number of channel uses per N-tuple and R is the overall rate per component (including channel coding). The objective of this section is to compare the following two techniques:

• Conventional system: Each component of the source vector is quantized using an (N/M)R-bit quantizer, giving N codewords. A linear systematic block code X^N → X^M is applied to the quantizer output, and the M resulting codewords are sent on the channel.

• Quantized frame system: The source vector is expanded using a linear transform F : R^N → R^M. Each transform coefficient is quantized using an (N/M)R-bit quantizer. The M resulting codewords are sent on the channel.

Since a linear block code is a linear transform, the difference between the systems is the swapping of transform and quantization operations. In contrast to Section 5.3 and Appendix 5.B, the transform is rectangular with more rows than columns. The second system uses a quantized frame expansion, as analyzed in Chapter 2. The conventional system works by producing a linear dependence between the transmitted symbols. A valid transmitted M-tuple must lie in a specified N-dimensional subspace of X^M. When N or more symbols are received, they are consistent with exactly one valid element of X^M, so the information symbols are known.

23It is definitely not possible for all transmitted N-tuples to be distinguishable from less than N received symbols. Whether it is ever possible to determine the information N-tuple from less than N received symbols will not be considered here.

This works very well when exactly N of the M transmitted codewords are received. However, when more or less than N codewords are received, there either is no benefit from the extra information or it is difficult to recover partial information about the source vector. The quantized frame (QF) system has a similar way of adding redundancy. Denote the signal vector by x. The expanded signal y = Fx has a linear dependence between its components. Thus, if N or more components of y are known, x can be recovered exactly. However, the components of y are not directly transmitted; it is the quantization that makes the two systems different. Quantization makes the components of ŷ = Q(y) linearly independent, so each component of ŷ—even in excess of N—gives distinct information on the value of x. It is known that from a source-coding point of view, a quantized frame expansion (with linear reconstruction) is not competitive with a basis expansion (see [221, 222] and Section 2.2.2.5). Here the “baseline” fidelity is given by a basis expansion, and the noise reduction property of frames (see Proposition 2.2) improves the fidelity when we are “lucky” to receive more than N components. One should not get the impression that the QF system is automatically as good as the conventional system when N components are received and better when more than N are received. The comparison is more subtle because all basis expansions are not equally good. The conventional system can use the best orthogonal basis (the KLT) or at least an orthogonal basis. On the other hand, it is not possible to make all N-element subsets of the frame associated with F orthogonal.24 Quantizing in a nonorthogonal basis is inherently suboptimal [32]. Comparisons appear in Section 5.4.3.3. When less than N components are received, the QF representation fails to localize x to a finite cell. Neglect quantization error for the moment, and assume k < N components are received; then x is known only to within an (N − k)-dimensional subspace, and distributional knowledge of the source is needed to form a useful estimate.

5.4.2 Effect of Erasures in Tight Frame Representations

Let Φ = {ϕ_k}_{k=1}^M ⊂ R^N be a tight frame with ‖ϕ_k‖ = 1 for all k, and let F be the frame operator associated with Φ. A source vector x ∈ R^N is represented by ŷ = Q(y), where y = Fx and Q is a scalar quantizer. Since the components of ŷ will be used as “descriptions” in a multiple description system, we are interested in the distortion incurred in reconstruction from a subset of the components of ŷ.

24Frame terminology from Section 2.2.1.1 will be used frequently. The reader may wish to review Section 2.2 before continuing.

We will model η = ŷ − y as a white noise independent of x with component variances σ_η². This is a common model, though η is actually completely determined by x when Q is a deterministic quantizer. If subtractive dithered uniform quantization with step size ∆ is used, the model is precisely valid with σ_η² = ∆²/12. We will ignore the distribution of the quantization error and use linear reconstruction strategies that minimize the residual norm ‖ŷ − Fx̂‖². This is tantamount to assuming that the p.d.f. of η has unbounded support (see Appendix 2.C).

In the multiple description system, there are M channels, each of which carries one coefficient ⟨x, ϕ_k⟩ + η_k. If there are no erasures, the minimum MSE reconstruction uses the dual frame Φ̃ = (N/M)Φ; the resulting reconstruction error is given by25

MSE_0 = E[N^{-1} ‖x − x̂‖²] = (N/M) σ_η².    (5.57)

(The MSE with e erasures has been denoted MSE_e.) If there are erasures, the reconstruction strategy should be different. Let E denote the index set of the erasures; i.e., suppose {⟨x, ϕ_k⟩ + η_k}_{k∈E} are erased. If Φ′ = Φ \ {ϕ_k}_{k∈E} is a frame, the minimum MSE estimate is obtained with the dual frame of Φ′; otherwise, x can only be estimated to within a subspace and distributional knowledge is needed to get a good estimate. Until Section 5.4.3, we assume that Φ′ is a frame; hence, e ≤ M − N. An estimate of the MSE can be obtained by bounding the frame bounds of Φ′ and applying Proposition 2.2. However, this gives a very pessimistic estimate. Deleting a single element of Φ can reduce the lower frame bound from M/N by 1. Then Proposition 2.2 gives

MSE_1 ≤ ( (M − 1)N / (M − N)² ) σ_η²,

which may be much worse than the exact value

MSE_1 = ( 1 + 1/(M − N) ) (N/M) σ_η².    (5.58)

The exact value is derived below by making computations similar to those in the proof of Proposition 2.2 (see Appendix 2.A.2).

25See Corollary 2.3 and note that the distortion is measured per component.

5.4.2.1 One erasure

Rather than starting immediately with the general case, a detailed analysis is given which yields (5.58). Since the numbering of the frame elements is arbitrary, we can assume for notational simplicity that the erased component is ⟨x, ϕ_1⟩ + η_1. Denote the frame operator associated with Φ_1 = Φ \ {ϕ_1} by F_1 and use the shorthand ϕ = [ϕ_1]. (When more than one erasure is considered, ϕ will be a matrix with e columns.) We will assume that Φ_1 is a frame. This will always be the case when M is larger than N because the lower frame bound of Φ_1 is M/N − 1, which is positive. The frame operator associated with the new frame is

F_1 = [ϕ_2, ϕ_3, ..., ϕ_M]^*,

while the elements of the dual frame of Φ_1 are

ϕ̃_{1k} = (F_1^* F_1)^{-1} ϕ_k,   k = 2, 3, ..., M.

The expansion that was, in effect, used was

x = Σ_{k=2}^{M} ⟨x, ϕ_k⟩ ϕ̃_{1k},

and the minimum MSE reconstruction is

x̂ = Σ_{k=2}^{M} ( ⟨x, ϕ_k⟩ + η_k ) ϕ̃_{1k}.

The error is

x̂ − x = Σ_{k=2}^{M} η_k ϕ̃_{1k}.    (5.59)

Taking the norm of (5.59) and using the independence of the η_k’s leads to

MSE_1 = N^{-1} σ_η² Σ_{k=2}^{M} ‖ϕ̃_{1k}‖² = N^{-1} σ_η² Σ_{k=2}^{M} ‖(F_1^* F_1)^{-1} ϕ_k‖².    (5.60)

To evaluate further we will need to simplify (F_1^* F_1)^{-2}. First, note the following identity:

(A − BCD)^{-1} = A^{-1} + A^{-1} B (C^{-1} − D A^{-1} B)^{-1} D A^{-1}.    (5.61)

Now using

F_1^* F_1 = F^* F − ϕ_1 ϕ_1^* = (M/N) I − ϕ_1 ϕ_1^*

and identity (5.61), with A = N^{-1} M I_N, B = ϕ_1, C = 1, and D = ϕ_1^*, yields

(F_1^* F_1)^{-1} = ( (M/N) I − ϕ_1 ϕ_1^* )^{-1} = (N/M) I + ( N² / (M(M − N)) ) ϕ_1 ϕ_1^*.

Bearing in mind that ϕ_1^* ϕ_1 = 1, we can now compute

(F_1^* F_1)^{-2} = ( (N/M) I + (N²/(M(M − N))) ϕ_1 ϕ_1^* ) ( (N/M) I + (N²/(M(M − N))) ϕ_1 ϕ_1^* )
                = (N²/M²) I + ( 2N³/(M²(M − N)) ) ϕ_1 ϕ_1^* + ( N⁴/(M²(M − N)²) ) ϕ_1 ϕ_1^* ϕ_1 ϕ_1^*
                = (N²/M²) I + ( N³(2M − N)/(M²(M − N)²) ) ϕ_1 ϕ_1^*.

Substituting this into (5.60) gives

MSE_1 = N^{-1} σ_η² Σ_{k=2}^{M} ‖(F_1^* F_1)^{-1} ϕ_k‖²
      = N^{-1} σ_η² Σ_{k=2}^{M} ϕ_k^* (F_1^* F_1)^{-2} ϕ_k
      = N^{-1} σ_η² Σ_{k=2}^{M} ϕ_k^* ( (N²/M²) I + (N³(2M − N)/(M²(M − N)²)) ϕ_1 ϕ_1^* ) ϕ_k
      = σ_η² ( N(M − 1)/M² + (N²(2M − N)/(M²(M − N)²)) ϕ_1^* F_1^* F_1 ϕ_1 )
      = σ_η² ( N(M − 1)/M² + (N²(2M − N)/(M²(M − N)²)) ϕ_1^* ( (M/N) I − ϕ_1 ϕ_1^* ) ϕ_1 )
      = σ_η² ( N(M − 1)/M² + (N²(2M − N)/(M²(M − N)²)) ( M/N − 1 ) )
      = ( σ_η² N / M ) ( 1 + 1/(M − N) )
      = ( 1 + 1/(M − N) ) MSE_0.

The effect of any single erasure on the MSE is simply multiplication by the bracketed term. Note that dividing by M − N is not a problem because when M = N, Φ_1 is not a frame and the entire analysis using the dual frame is invalid.

5.4.2.2 e Erasures

Assume now that there are e erasures and that the elements have been renumbered so that the erasures occurred at positions 1, 2, ..., e. The matrix ϕ is now defined as ϕ = [ϕ_1, ..., ϕ_e]. All the notation from Section 5.4.2.1 carries over with subscript 1 replaced by e. The MSE is now

MSE_e = N^{-1} σ_η² Σ_{k=e+1}^{M} ‖(F_e^* F_e)^{-1} ϕ_k‖² = N^{-1} σ_η² Σ_{k=e+1}^{M} ϕ_k^* (F_e^* F_e)^{-2} ϕ_k.

Using F_e^* F_e = N^{-1} M I_N − ϕϕ^* and (5.61) we can write

(F_e^* F_e)^{-1} = (N/M) I_N + (N/M) I_N ϕ ( I_e − ϕ^* (N/M) I_N ϕ )^{-1} ϕ^* (N/M) I_N
                = (N/M) I_N + (N²/M²) ϕ ( I_e − (N/M) ϕ^*ϕ )^{-1} ϕ^*.    (5.62)

Note that the matrix to be inverted is e × e and that it depends only on the inner products between erased components of the original tight frame, not on the surviving elements. There are two ways one could evaluate the above expression. The first is to iteratively compute F_i^* F_i = F_{i−1}^* F_{i−1} − ϕ_i ϕ_i^* and then invert (F_i^* F_i)^{-1} using the identity (5.61). However, as e increases it seems simpler to directly use (5.62); this is what we will do below. Let

A^{(e)} = ( I_e − (N/M) ϕ^*ϕ )^{-1},
B^{(e)} = 2 A^{(e)} + (N/M) A^{(e)} ϕ^*ϕ A^{(e)},
C^{(e)} = ( (M/N) ϕ^*ϕ − ϕ^*ϕ ϕ^*ϕ )^T.

Since

(F_e^* F_e)^{-2} = (N²/M²) I + (N³/M³) ϕ B^{(e)} ϕ^*

and

Σ_{k=e+1}^{M} ϕ_k^* ϕ B^{(e)} ϕ^* ϕ_k = Σ_{i,j=1}^{e} B^{(e)}_{ij} C^{(e)}_{ij},

we can compute

Σ_{k=e+1}^{M} ϕ_k^* (F_e^* F_e)^{-2} ϕ_k = (N²/M²)(M − e) + (N³/M³) Σ_{i,j=1}^{e} B^{(e)}_{ij} C^{(e)}_{ij}.

With the required normalizations we finally obtain

MSE_e = [ 1 − e/M + (N/M²) Σ_{i,j=1}^{e} B^{(e)}_{ij} C^{(e)}_{ij} ] MSE_0.    (5.63)

Computations for e = 2, 3, 4 suggest that (5.63) can be written as

MSE_e = ( 1 + ( e(M − N)^{e−1} + Y_e ) / ( (M − N)^e − Z_e ) ) MSE_0,    (5.64)

where Y_e and Z_e depend only on the inner products between erased vectors.
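A quick numerical sanity check of (5.63): the sketch below builds an assumed 6 × 4 normalized tight frame (the same cosine/sine pattern as the 5 × 4 example of Section 5.4.3.3, so that M − N = 2 and up to two erasures can be handled) and compares (5.63) against a direct dual-frame computation:

import numpy as np

N, M, sigma_eta2 = 4, 6, 1.0
i = np.arange(M)[:, None]
F = np.hstack([np.cos(np.pi * i * np.array([[1, 3]]) / M),
               np.sin(np.pi * i * np.array([[1, 3]]) / M)]) / np.sqrt(2)
assert np.allclose(F.T @ F, (M / N) * np.eye(N))       # normalized tight frame

mse0 = (N / M) * sigma_eta2                            # eq. (5.57)

def mse_direct(erased):
    keep = [k for k in range(M) if k not in erased]
    dual = F[keep] @ np.linalg.inv(F[keep].T @ F[keep])
    return sigma_eta2 / N * np.sum(dual ** 2)

def mse_formula(erased):                               # eq. (5.63)
    e = len(erased)
    phi = F[list(erased)].T                            # N x e matrix of erased vectors
    P = phi.T @ phi                                    # inner products phi* phi
    A = np.linalg.inv(np.eye(e) - (N / M) * P)
    B = 2 * A + (N / M) * A @ P @ A
    C = (M / N) * P - P @ P
    return (1 - e / M + (N / M ** 2) * np.sum(B * C)) * mse0

for erased in [(0,), (0, 1), (0, 3)]:
    print(erased, mse_direct(erased), mse_formula(erased))   # each pair agrees

The erasure set (0, 3) happens to be an orthogonal pair for this frame, so it also exercises the orthogonal-erasure special case derived below.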

Two Erasures The expression (5.63) is applicable to any number of erasures, as long as the remaining vectors Φ_e form a frame. This computation is made here explicitly for e = 2. Let α = ϕ_1^* ϕ_2. Then

ϕ^*ϕ = [ [ϕ_1^*ϕ_1, ϕ_1^*ϕ_2], [ϕ_2^*ϕ_1, ϕ_2^*ϕ_2] ] = [ [1, α], [ᾱ, 1] ]

and so

A^{(2)} = ( I_e − (N/M) ϕ^*ϕ )^{-1} = ( M / ((M − N)² − N²|α|²) ) [ [M − N, Nα], [Nᾱ, M − N] ].

Furthermore,

B^{(2)} = ( M / ((M − N)² − N²|α|²)² ) [ [b_1, b_2], [b̄_2, b_1] ],    (5.65)

where

b_1 = 2M³ − 5M²N + 4MN² − N³ + N³|α|²,
b_2 = Nα(3M² − 4MN + N² − N²|α|²),

and

C^{(2)} = [ [N^{-1}M − 1 − |α|², ᾱ(N^{-1}M − 2)], [α(N^{-1}M − 2), N^{-1}M − 1 − |α|²] ].    (5.66)

Substituting (5.65) and (5.66) in (5.63) yields

MSE_2 = ( 1 + ( 2(M − N) + 2N²|α|² ) / ( (M − N)² − N²|α|² ) ) MSE_0.

This matches (5.64) with Y_2 = 2N²|α|² and Z_2 = N²|α|².

Three Erasures Denote the inner products between the three erased frame elements by α, β, and γ. The MSE expression (5.64) again holds, with

Y_3 = (2M − 3N) N ( |α|² + |β|² + |γ|² ) + 3N² ( ᾱβγ̄ + αβ̄γ ),
Z_3 = (M − N) N² ( |α|² + |β|² + |γ|² ) + N³ ( αβ̄γ + ᾱβγ̄ ).

Four Erasures As a final example, consider the case of four erasures. Denote the six inner products in ϕ^*ϕ by α, β, γ, δ, ε, and ζ. A long computation yields expressions of the form

Y_4 = N [ 2K(M − N)(M − 2N) + 4(·)N² + 3(·)MN ],
Z_4 = N² [ K(M − N)² + (·)N² + (·)MN ],

where K and the parenthesized quantities are real-valued combinations of the six inner products between the erased vectors; the complete expressions are lengthy and offer no additional insight. There is no obvious pattern in {Y_1, Y_2, Y_3, Y_4} or {Z_1, Z_2, Z_3, Z_4}, so (5.63) may be the best general expression for the MSE.

Orthogonal Erasures A simple special case is when the erased components are pairwise orthogonal. In this case, ϕ^*ϕ = I_e, and A^{(e)}, B^{(e)}, and C^{(e)} reduce to

A^{(e)} = ( M/(M − N) ) I_e,
B^{(e)} = ( M(2M − N)/(M − N)² ) I_e,
C^{(e)} = ( (M − N)/N ) I_e.

Substituting in (5.63) gives

MSE_e = ( 1 + e/(M − N) ) MSE_0.

5.4.2.3 Comments

A linear reconstruction method has been assumed throughout. In light of the many results of Chapter 2, consistent reconstruction may reduce the MSE by a factor of N/M. However, most of the consistent reconstruction results are for large M/N. For the multiple description coding application, the interesting values of M/N are small and the improvement from consistent reconstruction is less than the asymptotic predictions. The analysis presented thus far makes no assumptions about the source and instead makes strong assumptions about the quantization error. In effect, it is a distortion-only analysis; since the source has not entered the picture, there is no relationship between σ_η² and the rate. This deficiency is remedied in the following section.

5.4.3 Performance Analyses and Comparisons

Let x be a zero-mean, white, Gaussian vector with covariance matrix R_x = σ² I_N. This source is convenient for analytical comparisons between the QF system and a conventional system with a block channel code. Entropy-coded uniform quantization (ECUQ) will be used in both systems. The distortion–rate performance of ECUQ on a Gaussian variable with variance σ² will be denoted D_{σ²}(R). This function directly describes the performance of the conventional communication system when the channel code is successful in eliminating the effect of erasures and is also useful in describing the performance of the QF system. (A high-rate approximation of D_{σ²}(R) is given by (1.3) and a plot is given in Figure 1.4.)
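For reference, D_{σ²}(R) can be traced out numerically. The sketch below sweeps the step size of a uniform quantizer on a unit-variance Gaussian, computing the output entropy (rate) and MSE (distortion) by dense numerical integration with cell-midpoint reconstruction; at high rate the points approach the well-known (πe/6)·2^{−2R} behavior for ECUQ of a Gaussian:

import numpy as np

x = np.linspace(-8, 8, 200001)                     # dense grid for the N(0,1) density
pdf = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
dx = x[1] - x[0]

for step in [2.0, 1.0, 0.5, 0.25]:
    cells = np.round(x / step)                     # index of the quantizer cell containing x
    D = np.sum((x - step * cells) ** 2 * pdf) * dx # MSE with midpoint reconstruction
    p = np.array([np.sum(pdf[cells == c]) * dx for c in np.unique(cells)])
    p = p[p > 0]
    R = -np.sum(p * np.log2(p))                    # entropy of the quantizer output
    print(f"step {step:4.2f}: R = {R:4.2f} bits, D = {D:.4f}, "
          f"high-rate approx = {np.pi * np.e / 6 * 2 ** (-2 * R):.4f}")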

5.4.3.1 Performance of conventional system

Let us first analyze the conventional system. For coding at a total rate of R bits per component of x (including channel coding), NR bits are split among M descriptions. Thus the overall average distortion per component with e erasures is

D̄_e = D_{σ²}(NR/M),   for e = 0, 1, ..., M − N.    (5.67)

When e > M − N, the channel code cannot correct all of the erased information symbols. Since the code is systematic, the decoder will have received some number of information symbols and some number of parity symbols. Assume that the decoder discards the parity symbols and estimates the erased symbols by their means.

Denoting the number of erased information symbols by e_s, the average distortion is then

D_{e_s} = (e_s/N) σ² + ((N − e_s)/N) D_{σ²}(NR/M),   for e = M − N + 1, ..., M − 1, M.    (5.68)

As it is, (5.68) does not completely describe the average distortion because the relationship between e and e_s is not specified. In fact, given that there are e total erasures, e_s is a random variable. There are \binom{M}{e} ways that e erasures can occur and we assume these to be equally likely. The probability of k erased information symbols is then

P( e_s = k | e total erasures ) = \binom{N}{k} \binom{M − N}{e − k} / \binom{M}{e},   for e − (M − N) ≤ k ≤ min(e, N).    (5.69)

Using this with (5.68) gives the average distortion per component as

D̄_e = Σ_{k=e−(M−N)}^{min(e,N)} P( e_s = k | e total erasures ) D_k
    = Σ_{e_s=e−(M−N)}^{min(e,N)} \binom{M}{e}^{-1} \binom{N}{e_s} \binom{M − N}{e − e_s} [ (e_s/N) σ² + ((N − e_s)/N) D_{σ²}(NR/M) ],    (5.70)

for e = M − N + 1, ..., M − 1, M, because the received components of x are subject to quantization error and the erased components have variance σ².

There is no denying that discarding the parity symbols is not the optimal reconstruction strategy—to minimize MSE or probability of error. However, it comes close to minimizing the MSE, and actually minimizing the MSE seems computationally difficult. Investigation of a couple of cases provides a credible, though incomplete, justification for discarding the parity. Consider e = M − N + 1, one more erasure than can be corrected. One extreme case is e_s = 1, where all the parity symbols are erased. In this case there is no parity information, so estimating the erased information symbol by its mean is clearly the best that can be done. In the other extreme case, e_s = e and all the parity is received. For convenience, number the erased components so that x̂_1, x̂_2, ..., x̂_e are lost. If a single one of these was known, then the rest could be determined because the code can correct e − 1 erasures. So a possible decoding method is as follows: For each possible value of x̂_1, determine x̂_2, x̂_3, ..., x̂_e. Since the x̂_i’s are independent, it is easy to compute the probabilities of each of the [x̂_1, x̂_2, ..., x̂_e]^T vectors. The centroid of the vectors gives the minimum MSE estimate. There are two main difficulties with this computation. Firstly, the number of possibilities to enumerate is exponential in e − (M − N)—namely |X|^{e−(M−N)}, where X is the alphabet for x̂—and may be very large. More importantly, it may simply not be useful to compute the probability density of the possible vectors. The nature of the channel code is to make values in each possible [x̂_1, x̂_2, ..., x̂_e]^T vector more or less uniform. Thus the minimum MSE estimate is often close to simply estimating each component by its mean.
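Equations (5.67)–(5.70) are easy to evaluate. The following sketch expresses the conventional system’s average distortion as D̄_e = a_e σ² + b_e D_{σ²}(NR/M) and computes the exact coefficients (a_e, b_e) for the N = 4, M = 5 example used later; the values can be compared against the left column of Table 5.4:

from fractions import Fraction
from math import comb

def conventional_coeffs(N, M, e):
    """Coefficients (a, b) with  D_e = a*sigma^2 + b*D_{sigma^2}(NR/M),  eqs. (5.67)-(5.70)."""
    if e <= M - N:                                     # all erasures corrected
        return Fraction(0), Fraction(1)
    a = b = Fraction(0)
    for es in range(e - (M - N), min(e, N) + 1):       # number of erased information symbols
        p = Fraction(comb(N, es) * comb(M - N, e - es), comb(M, e))   # eq. (5.69)
        a += p * Fraction(es, N)                       # erased symbols contribute sigma^2 each
        b += p * Fraction(N - es, N)                   # received symbols contribute D(NR/M)
    return a, b

for e in range(6):
    print(e, conventional_coeffs(4, 5, e))
# e = 0..5 -> (0,1), (0,1), (2/5,3/5), (3/5,2/5), (4/5,1/5), (1,0)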

5.4.3.2 Performance of quantized frame system

When F is the frame operator associated with a normalized tight frame Φ, each component of y = Fx is Gaussian with mean zero and variance σ². Thus D_{σ²}, as defined previously, can again be used to describe the distortion–rate characteristics of the quantized coefficients (the y_i’s). These distortions, however, do not equal the component distortions in the reconstruction of x because of the use of frames and nonorthogonal bases. We assume the frame is designed such that all subsets of at least N elements form a frame. Then when there are at most M − N erasures, we can approximate the distortion using the expressions from Section 5.4.2. Specifically, using (5.57) and noting that D_{σ²} connects the coding rate to the quantization noise power σ_η², we obtain

D̄_0 = (N/M) D_{σ²}(NR/M),    (5.71)

which is better than the performance of the conventional system. With erasures, there is no simple closed form for the distortion, but it can be written as

D̄_e = c_e D_{σ²}(NR/M),   for e = 1, 2, ..., M − N.

The constant c_e is N/M times the average of the bracketed term of (5.63), where the average is taken over all possible erasure patterns. With a given frame, additional measurements always reduce the average reconstruction error, so {c_e}_{e=0}^{M−N} is an increasing sequence. When there are more than M − N erasures, the decoder has less than a basis representation of x. Let E be the index set of the erasures; i.e., suppose {ŷ_k}_{k∈E} are erased. The source vector x can be orthogonally decomposed as

x = x_S + x_{S⊥},   where x_S ∈ S = span( {ϕ_k}_{k∉E} ).

Since the source is white and Gaussian, x_S and x_{S⊥} are independent. Thus not only does the decoder have no direct measurement of x_{S⊥}, but it has absolutely no way to estimate it aside from using its mean. Estimating x_{S⊥} = 0 introduces a distortion of N^{-1}(e − (M − N)) σ² because the dimension of S⊥ is e − (M − N). The received coefficients {ŷ_k}_{k∉E} provide a quantized basis representation of x_S. The basis will generally be a nonorthogonal basis, so the per-component distortion will exceed D_{σ²}(NR/M) by a constant factor which depends on the skew of the basis.

Number of erasures          e ≤ M − N                              e > M − N

Conventional system         Block code corrects all erasures       e_s-dim. subspace lost, e_s ≥ e − (M − N)
                            Orthogonal basis expansion             Orthogonal basis expansion
                            D̄_e = D_{σ²}(NR/M)                     D̄_{e_s} = (e_s/N) σ² + ((N − e_s)/N) D_{σ²}(NR/M)

Quantized frame system      Representation covers full space       k-dim. subspace lost, k = e − (M − N)
                            Frame expansion                        Nonorthogonal basis expansion
                            D̄_e = c_e D_{σ²}(NR/M)                 D̄_e = (k/N) σ² + ((N − k)/N) c_e D_{σ²}(NR/M)

Table 5.3: Comparison of systems for communicating a white Gaussian N-tuple source across an erasure channel. The conventional system uses scalar quantization in an orthonormal basis and an (M,N) block code. The quantized frame system uses an M × N tight frame operator followed by scalar quantization. Reconstruction methods are described in the text. The constants c_e depend on the noise reduction of frames and non-cubic cell shapes.

Thus we conclude

D̄_e = ( (e − (M − N))/N ) σ² + ( (M − e)/N ) c_e D_{σ²}(NR/M),   for e = M − N + 1, ..., M − 1, M.

The constant factor c_e can be computed through calculations similar to those in Appendix 2.A.2. It is always larger than 1, since it is not possible for all subsets of a given size of the frame to be orthogonal.

5.4.3.3 Numerical comparison

The findings of Sections 5.4.3.1 and 5.4.3.2 are summarized in Table 5.3. These expressions are not necessarily very enlightening by themselves, so let us construct a simple example with a specific frame. Let N = 4 and M = 5 and let Φ be a real harmonic tight frame similar to those constructed in Section 2.2.1.2. The frame operator is given explicitly by

F = (1/√2) [ 1           1            0           0
             cos(π/5)    cos(3π/5)    sin(π/5)    sin(3π/5)
             cos(2π/5)   cos(6π/5)    sin(2π/5)   sin(6π/5)
             cos(3π/5)   cos(9π/5)    sin(3π/5)   sin(9π/5)
             cos(4π/5)   cos(12π/5)   sin(4π/5)   sin(12π/5) ].

The channel code for the conventional system can simply form a parity symbol by the Z_2 addition of the four information symbols. The performance of the conventional system is easy to calculate using (5.67) and (5.70). For the quantized frame system, D̄_0 and D̄_1 are given by (5.71) and (5.58), respectively. The remaining distortions require a computation of c_e averaged over all erasure patterns. The results of these computations are given in Table 5.4. At high rates, D_{σ²}(4R/5) ≪ σ². Thus, except when there is one erasure, the quantized frame system has lower distortion. It is not surprising that the conventional system is best when e = M − N because in this case it makes perfect use of the channel, conveying an orthogonal basis representation of the source with no wasted received data.

        Conventional system                    Quantized frame system

D̄_0     D_{σ²}(4R/5)                           (4/5) D_{σ²}(4R/5)
D̄_1     D_{σ²}(4R/5)                           (8/5) D_{σ²}(4R/5)
D̄_2     (2/5) σ² + (3/5) D_{σ²}(4R/5)          (1/4) σ² + (9/10) D_{σ²}(4R/5)
D̄_3     (3/5) σ² + (2/5) D_{σ²}(4R/5)          (2/4) σ² + (8/15) D_{σ²}(4R/5)
D̄_4     (4/5) σ² + (1/5) D_{σ²}(4R/5)          (3/4) σ² + (1/4) D_{σ²}(4R/5)
D̄_5     σ²                                     σ²

Table 5.4: Comparison between conventional and quantized frame systems for N = 4, M = 5. The average distortion–rate performance is given for each possible number of erasures.


Figure 5.26: Comparison of conventional system and quantized frame system with N = 4, M = 5, and R = 3. The average signal-to-noise ratio is given for each system for each possible number of erasures.

At lower rates, the systems can be compared only by numerical calculation. Figure 5.26 shows each of the distortions from Table 5.4 for rate R = 3. (Actually, signal-to-noise ratios are shown in order to normalize for σ².) At this rate, the SNR is higher for the quantized frame system except when there is one erasure. This is consistent with the high-rate approximation. The flat performance of the conventional system with zero or one erasure, followed by a sharp drop-off with more than one erasure, is sometimes called the “cliff effect” of channel coding. The performance of the QF system degrades gracefully as the number of erasures increases. To have a single comparison between the two systems, the performance with each number of erasures can be given weights. A natural weighting is to simply use the probability of that number of erasures, though other weights may be used. If the channel is memoryless with probability p of erasure, the probability of e erasures is

P( e erasures ) = \binom{M}{e} p^e (1 − p)^{M−e}.    (5.72)

[Figure 5.27 plot: average SNR (dB) versus rate (bits/sample including redundancy) for the two systems.]

Figure 5.27: Comparison of conventional system and quantized frame system with N = 4, M = 5, and memoryless erasures with probability 1/5.

                                    Probability of the indicated number of erased symbols
Symbols erased                      0 of 5     1 of 5     2 of 5     3 of 5     4 of 5     5 of 5
Memoryless channel (p = 1/5)        0.32768    0.4096     0.2048     0.0512     0.0064     0.00032
Bursty channel (p = 1/5, β = 6)     0.52488    0.1944     0.1316     0.08       0.0432     0.02592

Table 5.5: Probabilities of the various numbers of erasures for a memoryless channel and a bursty channel with the same overall probability of erasure.

Using this weighting with p = 1/5 gives the overall average SNRs shown in Figure 5.27. In this figure, the performance of the conventional system and the high-rate performance of the quantized frame system are computed analytically using the expressions above. The low-rate performance of the QF system is simulated because the computations assuming uniformly distributed noise overestimate the distortion. Of course, the erasures need not be independent. Many channels exhibit some sort of burstiness. A simple two-state bursty channel has β times higher probability of an erasure following an erasure than following a received symbol. Consider a channel with overall probability of erasure p = 1/5 as before, but with burstiness described by β = 6. The probabilities of different numbers of erasures for this channel are compared to those given by (5.72) in Table 5.5. Weighting by these probabilities gives the performance shown in Figure 5.28. The QF system works well on the bursty channel because the number of erasures is less clustered around the expected number of erasures. As a final numerical comparison, let us again assume that the channel is memoryless but now fix the rate and allow the erasure probability to vary. Figure 5.29(a) shows the performance gain of the QF system as a function of the erasure probability on a memoryless channel. Results are given for a few rates. At higher rates, the QF system is better than the conventional system at all probabilities of erasure. At lower rates, there is a connected interval of erasure probabilities for which the conventional system is better. This comparison shows that the QF system is less sensitive to choosing the correct coding rate (N/M) to match the channel. For the bursty channel (see Figure 5.29(b)), the advantage of the QF system is more pronounced.
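The Table 5.5 entries can be reproduced directly. For the bursty channel, one parameterization consistent with the description above is a two-state Markov chain started in its stationary distribution, with P(erasure | previous received) = q and P(erasure | previous erased) = βq, where q is chosen so that the stationary erasure probability equals p; with p = 1/5 and β = 6 this gives q = 0.1 and it reproduces the tabulated values. The sketch enumerates all 2^5 loss patterns:

from itertools import product
from math import comb

M, p, beta = 5, 0.2, 6.0
q = p / (1 + (beta - 1) * p)          # stationary erasure probability p  =>  q = 0.1 here

def pattern_prob(pattern, p_first, p_after_loss, p_after_recv):
    prob = p_first if pattern[0] else 1 - p_first
    for prev, cur in zip(pattern, pattern[1:]):
        pe = p_after_loss if prev else p_after_recv
        prob *= pe if cur else 1 - pe
    return prob

memoryless = [comb(M, e) * p ** e * (1 - p) ** (M - e) for e in range(M + 1)]   # eq. (5.72)

bursty = [0.0] * (M + 1)
for pattern in product([0, 1], repeat=M):                  # 1 = erasure, 0 = received
    bursty[sum(pattern)] += pattern_prob(pattern, p, beta * q, q)

print([round(v, 5) for v in memoryless])   # 0.32768, 0.4096, 0.2048, 0.0512, 0.0064, 0.00032
print([round(v, 5) for v in bursty])       # 0.52488, 0.1944, ... (cf. Table 5.5)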


Figure 5.28: Comparison of conventional system and quantized frame system with N = 4, M = 5, and bursty erasure channel with p = 1/5, β = 6.

[Figure 5.29 plots: SNR gain (dB) of the QF system versus probability of erasure, for rates R = 2, 2.5, and 3.]

Figure 5.29: Comparison of conventional system and quantized frame system with N = 4, M = 5, and an erasure channel. The plot shows the SNR advantage of the QF system as a function of the channel erasure probability: (a) memoryless channel; (b) bursty channel with β = 6.

5.4.3.4 Asymptotic behavior

The comparisons presented in Section 5.4.3 provide some insight into the performance of the quantized frame system, but are based on just a single example. The primary obstacle to making general statements about the value of the QF system is that the cell-shape constants (the c_e’s) depend on the frame, but there is no design procedure for the frame. This section makes a few comments on the asymptotic behavior of the QF system. Both high-rate and large block length performance are considered. In the limit of high rate, quantization error is negligible in comparison to the distortion caused by completely missing one orthogonal component of the source. The distortion goes to zero when there are at most M − N erasures, but for larger numbers of erasures the distortion approaches N^{-1}(e − (M − N)) σ². Compared to an unconstrained multiple description source coding scheme, the asymptotic performance with more than M − N erasures is very poor. One could use M independent vector quantizers to form the M descriptions. In this case every side distortion would asymptotically approach zero. Such a scheme would presumably have high encoding complexity and high decoding complexity (in time, memory, or both); this is why we are interested in linear transform-based approaches. Comparing the QF system to the conventional system at high rate, the QF system is better when there are more than M − N erasures. In this case the QF system loses an (e − (M − N))-dimensional part of the signal while the conventional system loses at least this much; averaging over all erasure patterns, the conventional system loses even more. For lower numbers of erasures, the relative performance depends on the factor c_e. This constant generally depends on the tight frame, but has a simple form in two cases: c_0 = N/M and c_1 = M^{-1}N(1 + (M − N)^{-1}). When the tight frame is not a basis (M > N), c_{M−N} must be larger than 1; it is not possible to design a tight frame such that all subsets of size N are orthogonal bases. {c_e}_{e=0}^{M−N} is monotonic and crosses 1 somewhere between e = 0 and e = M − N. In information theory it is typical to look at performance limits as the block size grows without bound. In channel coding for a memoryless channel this makes the number of erasures predictable, by the law of large numbers. Using multiple description coding as an abstraction for coding for an erasure channel is in part an attempt to avoid large block sizes and to cope with unpredictable channels. Nevertheless, it is useful to understand the performance of the QF system with large block sizes. A performance analysis must depend in some part on a choice of a set of frames. Intuition suggests that the best frame, at least for a white source, is one that uniformly covers the space. The following theorem shows that asymptotically, the most uniform frame approaches an orthonormal basis in a certain sense.

Theorem 5.10 Suppose that M/N = r, with 1 < r < ∞ fixed. For each N, let {ϕ_k}_{k=1}^{M} ⊂ R^N be a set of M unit-norm vectors chosen to maximize the minimum angle between the lines spanned by pairs of frame elements. Then this minimum angle approaches π/2 as N → ∞.

Proof: See Appendix 5.C.4. 

An upper bound on the constant c_{M−N} close to 1 would be useful in bounding the worst-case performance of the QF system with respect to the conventional system. Theorem 5.10 suggests that if a frame is designed to maximize uniformity in the specified manner and any M − N elements of the frame are deleted, the remaining set is approximately an orthonormal basis. Unfortunately, the convergence in Theorem 5.10 does not lead to small bounds on the c_e's. Numerical computations show that as M and N are increased with M/N

held constant, the constant factor c_{M−N} increases. This holds for harmonic frames as well as frames designed as in Theorem 5.10. What we would like is for an arbitrary N-element subset of the frame to not only be a basis, but to be a good basis for a quantized representation. This is related to bounding the eigenvalues of A^T A away from zero, where A is a matrix of basis vectors. For random bases generated according to a uniform distribution, the eigenvalues of A^T A cannot be bounded away from zero [106, 132]. This negative result is not surprising given that optimally "uniform" frames also fail to give this property. These negative results should not discourage the use of the QF system with small numbers of descriptions.

5.4.4 Application to Image Coding

As an example, we construct an image communication system that uses a quantized frame expansion as an alternative to a (10, 8) block code. For the 10 × 8 frame operator F we use a matrix corresponding to a length-10 real Discrete Fourier Transform of a length-8 sequence (see Section 2.2.1.2). This can be constructed as F = [F^{(1)} F^{(2)}], where

$$F^{(1)}_{ij} = \frac{1}{2}\cos\frac{\pi(i-1)(2j-1)}{10}, \qquad 1 \le i \le 10,\ 1 \le j \le 4,$$
and
$$F^{(2)}_{ij} = \frac{1}{2}\sin\frac{\pi(i-1)(2j-1)}{10}, \qquad 1 \le i \le 10,\ 1 \le j \le 4.$$
In order to profit from the psychovisual tuning that has gone into JPEG coding, we apply this technique to DCT coefficients and use quantization step sizes as in a typical JPEG coder. The coding proceeds as follows:

1. An 8 × 8 block DCT of the image is computed.

2. Vectors of length 8 are formed from DCT coefficients of like frequency, separated in space.

3. Each length 8 vector is expanded by left-multiplication with F .

4. Each length 10 vector is uniformly quantized with a step size depending on the frequency.

The baseline system against which we compare uses the same quantization step sizes, but quantizes the DCT coefficients directly and then applies a systematic (10, 8) block code which can correct any two erasures. As before, it is assumed that if there are more than two erasures, only the systematic part of the received data is used. The two systems were simulated with quantization step sizes conforming to a quality setting of 75 in the Independent JPEG Group’s software.26 For the Lena image, this corresponds to a rate of about 0.98 bits per pixel plus 25% channel coding. In order to avoid issues related to the propagation of errors in variable length codes, we consider an abstraction in which sets of coefficients are lost. The alternative would require explicitly forming ten entropy coded packets. The reconstruction for the frame method follows a least-squares strategy. For the baseline system, when eight or more of the ten descriptions arrive, the block code ensures that the image is received at full fidelity. The effect of having less than eight packets received is simulated using the probabilities (5.69).

26 Version 6b of cjpeg. The current version is available at ftp://ftp.uu.net/graphics/jpeg/.
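The frame construction and the expansion, quantization, and least-squares reconstruction steps are easy to prototype. The following sketch (Python with numpy, not the code used for the reported experiments) builds the 10 × 8 frame operator F, quantizes the expansion of one length-8 coefficient vector, and reconstructs from a subset of received descriptions; the input vector, step size, and erasure pattern are illustrative assumptions.

```python
import numpy as np

# Frame operator F = [F(1) F(2)] from the length-10 real DFT of a length-8 sequence.
i = np.arange(1, 11).reshape(-1, 1)          # 1 <= i <= 10
j = np.arange(1, 5).reshape(1, -1)           # 1 <= j <= 4
F1 = 0.5 * np.cos(np.pi * (i - 1) * (2 * j - 1) / 10)
F2 = 0.5 * np.sin(np.pi * (i - 1) * (2 * j - 1) / 10)
F = np.hstack([F1, F2])                      # 10 x 8 frame operator

rng = np.random.default_rng(0)
x = rng.standard_normal(8)                   # stand-in for a length-8 vector of DCT coefficients
delta = 0.1                                  # illustrative quantization step
y = delta * np.round(F @ x / delta)          # steps 3 and 4: expand, then uniformly quantize

received = [0, 1, 2, 4, 5, 7, 8, 9]          # two of the ten descriptions erased
x_hat = np.linalg.pinv(F[received, :]) @ y[received]   # least-squares reconstruction
print("reconstruction MSE:", np.mean((x - x_hat) ** 2))
```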

[Figure 5.30 plot: PSNR (dB) versus number of descriptions received, with curves for the baseline, frame, and GMD systems.]

Figure 5.30: Numerical image coding results for quantized frame system. Results are for compressing Lena at about 1 bit per pixel, plus 25% channel coding. The mean performance is shown for each number of packets received along with the standard deviation.

Numerical results are shown in Fig. 5.30 for one through ten received packets. As expected, the frame system has better performance when fewer than eight packets are received. It is disappointing that, contrary to expectation, the frame system did not perform better when all ten packets are received. Sample images are given in Fig. 5.31. (From the 512 × 512 images, 64 × 64 pixel detail images are shown.) Numerically and visually it is apparent that the performance of the MD system degrades gracefully as the number of lost packets increases.

A probable reason for the poor performance of the quantized frame system with all ten packets received is as follows: With the exception of the DC terms, the DCT coefficients are well approximated by independent Laplacian random variables. These are particularly well suited to quantization in the standard basis. (Gaussian random variables, on the other hand, are invariant to basis changes.) The performance should be improved by designing an appropriate frame that operates directly on the pixel intensities.

5.4.5 Discussion

The use of overcomplete expansions in source coding for robustness to erasures seems to be a novel development. In the latter stages of this work, the author became aware of two related works.

Recently—after the dissemination of the present work, but likely not inspired by it—Wang, Orchard, and Reibman [202] have extended the framework of [146] in a way that sometimes uses an overcomplete expansion. They recognized that the correlating transform method exhibits good performance when a small amount of redundancy is added to a pair, but the return on this investment diminishes very quickly (see Figure 5.8(a)). They have proposed a scheme for two-channel multiple description coding which is a hybrid between pairwise correlating and a 4 × 2 frame expansion. When a small amount of redundancy is to be added, the correlating transform is used alone, but for larger redundancies each channel individually carries information based on a basis expansion.

Figure 5.31: Visual image coding results for quantized frame system. Results are for compressing 512×512 Lena at about 1 bit per pixel. 64 × 64-pixel detail images are shown. Left column: baseline system; right column: MD system. From top to bottom, number of packets received is 8, 7, 6, and 5.

An overcomplete representation was also used by Yu, Liu, and Marcellin [217, 218] in an error concealment algorithm for packet video. Many video coding standards—including H.261, H.263, and MPEG [116]—use block transform coding of residual images after motion compensation. Yu et al. considered the problem of increasing the robustness to block loss. They proposed the use of DCT-domain coding of overlapping 9 × 9 blocks of the residual images, with one pixel overlap on each side. Because of the overlap, corner pixels are included in four DCT blocks and non-corner edge pixels are included in two DCT blocks. In parts of the image affected by lost blocks, the extra information is successfully used to eliminate visible error propagation in certain circumstances. In undegraded parts of the image, the overlap gives the decoder overcomplete representations of the blocks; an instance of the projection onto convex sets (POCS) algorithm is used to compute consistent reconstructions, which lowers the quantization noise (see Section 2.2.2.3).

5.5 Conclusions

This chapter has provided a comprehensive introduction to the multiple description problem and two new methods for practical multiple description coding. Both methods combine transforms and scalar quantization, and thus are computationally simpler than vector quantization. The first method uses a basis representation, but in contrast to transform coding systems designed purely for compression, the transform coefficients are deliberately correlated to provide robustness. This method is very effective in increasing robustness with a small amount of redundancy. The second method uses an overcomplete expansion. It is similar to the use of a block channel code except that the redundancy is added to continuous-valued data prior to quantization. This method works well when the actual number of received descriptions is likely to vary significantly from its mean. Practical applications of these methods, to image and audio coding, were also presented. All of the applications are very promising, but are at the "proof of concept" stage.

The multiple description scenario provides a good analogy to communication over a lossy packet network. For this reason, "description," "channel," and "packet" have been used interchangeably. However, this is not the only communication environment in which MD coding may be useful. Effros and Goldsmith [47] have studied the various notions of capacity for general time-varying channels. One of their results is that more information can be reliably received than can be reliably transmitted. With some thought, this is an intuitive result: It is less demanding to ask for every bit that gets across the channel to be correct than to ask for every bit that is transmitted to correctly get across the channel. For such a general channel it may be useful to use a multiple description source code since all the received information will be useful, but the loss of some of the transmitted information is not catastrophic.

Appendices

5.A Pseudo-linear Discrete Transforms

Recently, several researchers have proposed using invertible discrete-domain to discrete-domain transforms [98, 223, 24]. They appear under various names (lossless transforms, integer-to-integer transforms, lifting factorizations) and in various flavors (finite-dimensional matrices, or Fourier or wavelet domain operators). All these transforms are based on factorizations of matrices which make information flow in a simple, regular way. Inversion can then be achieved by reversing the information flow. For example, one can factor any 2 × 2 matrix with determinant 1 into three lower- and upper-triangular matrices with unit diagonals:
$$T = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
 = \underbrace{\begin{bmatrix} 1 & 0 \\ (d-1)/b & 1 \end{bmatrix}}_{T_1}
   \underbrace{\begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix}}_{T_2}
   \underbrace{\begin{bmatrix} 1 & 0 \\ (a-1)/b & 1 \end{bmatrix}}_{T_3}
 \quad\text{or}\quad
   \underbrace{\begin{bmatrix} 1 & (a-1)/c \\ 0 & 1 \end{bmatrix}}_{T_1}
   \underbrace{\begin{bmatrix} 1 & 0 \\ c & 1 \end{bmatrix}}_{T_2}
   \underbrace{\begin{bmatrix} 1 & (d-1)/c \\ 0 & 1 \end{bmatrix}}_{T_3}.$$

Since the inverse of a block
$$\begin{bmatrix} 1 & 0 \\ x & 1 \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} 1 & y \\ 0 & 1 \end{bmatrix}$$
is simply
$$\begin{bmatrix} 1 & 0 \\ -x & 1 \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} 1 & -y \\ 0 & 1 \end{bmatrix},$$
respectively, the inverse of T can be found by reversing the order of the factors and changing the signs of the off-diagonal elements. The more profound fact is that the simplicity of inversion remains if the off-diagonal elements represent nonlinear functions. Let [ · ]_∆ represent rounding to the nearest multiple of ∆ and let
$$T_1 = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}.$$

If x ∈ ∆Z^2, then
$$[T_1 x]_\Delta = \left[\begin{bmatrix} 1 & a \\ 0 & 1\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix}\right]_\Delta = \left[\begin{bmatrix} x_1 + a x_2 \\ x_2\end{bmatrix}\right]_\Delta = \begin{bmatrix} x_1 + [a x_2]_\Delta \\ x_2\end{bmatrix}.$$

Thus [T_1 · ]_∆ is an identity operator except for a nonlinear function of x_2 being added to x_1. Direct computation shows that on the domain ∆Z^2, [T_1^{-1} · ]_∆ is the inverse operator. A cascade of such operations is invertible

in the same manner, so a factorization T = T_1 T_2 T_3 yields an invertible discrete transform T̂ : ∆Z^2 → ∆Z^2 "derived from T" through
$$\hat{T}(x) = \Bigl[T_1\bigl[T_2[T_3 x]_\Delta\bigr]_\Delta\Bigr]_\Delta. \tag{5.73}$$
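To make the mechanics of (5.73) concrete, here is a minimal numerical sketch (Python/numpy; the matrix T, the step size, and the test data are arbitrary illustrative choices). It implements the first factorization above, applies the rounded lifting steps, and checks that reversing the steps with negated signs recovers the input exactly while T̂(x) stays close to Tx.

```python
import numpy as np

Delta = 0.5
T = np.array([[0.8, 0.6], [-0.5, 0.875]])     # hypothetical matrix with det(T) = 1
a, b, d = T[0, 0], T[0, 1], T[1, 1]
l1, l3 = (d - 1) / b, (a - 1) / b             # off-diagonal entries of T1 and T3 (first factorization)

def rnd(v):
    """Round to the nearest multiple of Delta."""
    return Delta * np.round(v / Delta)

def forward(x1, x2):                           # T_hat(x) = [T1 [T2 [T3 x]_D]_D]_D
    x2 = x2 + rnd(l3 * x1)                     # T3: lower-triangular lifting step
    x1 = x1 + rnd(b * x2)                      # T2: upper-triangular lifting step
    x2 = x2 + rnd(l1 * x1)                     # T1: lower-triangular lifting step
    return x1, x2

def inverse(y1, y2):                           # reverse the steps, negate the signs
    y2 = y2 - rnd(l1 * y1)
    y1 = y1 - rnd(b * y2)
    y2 = y2 - rnd(l3 * y1)
    return y1, y2

rng = np.random.default_rng(0)
x = rnd(4 * rng.standard_normal((2, 1000)))    # points on the lattice Delta * Z^2
y1, y2 = forward(x[0], x[1])
z1, z2 = inverse(y1, y2)
print(np.allclose(z1, x[0]) and np.allclose(z2, x[1]))    # exact inversion: True
print(np.max(np.abs(np.vstack([y1, y2]) - T @ x)))        # T_hat(x) - Tx stays O(Delta)
```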

The discrete transform T̂ depends not only on T, but also on the factorization of T. Among the possible factorizations, one can minimize a bound on ||T̂(x) − Tx||. Let
$$T_1 = \begin{bmatrix} 1 & 0 \\ a & 1 \end{bmatrix}, \qquad T_2 = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix}, \qquad\text{and}\qquad T_3 = \begin{bmatrix} 1 & 0 \\ c & 1 \end{bmatrix}.$$

For x ∈ ∆Z^2, the computation (5.73) involves three rounding operations. Using δ_i's to denote the roundoff errors gives
$$\hat{T}(x) = T_1\left(T_2\left(T_3 x + \begin{bmatrix} 0 \\ \delta_1 \end{bmatrix}\right) + \begin{bmatrix} \delta_2 \\ 0 \end{bmatrix}\right) + \begin{bmatrix} 0 \\ \delta_3 \end{bmatrix}.$$

Expanding and using T_1 T_2 T_3 = T, one can compute
$$\left\| \hat{T}(x) - Tx \right\|_\infty
 = \left\| T_1 T_2 \begin{bmatrix} 0 \\ \delta_1 \end{bmatrix}
   + T_1 \begin{bmatrix} \delta_2 \\ 0 \end{bmatrix}
   + \begin{bmatrix} 0 \\ \delta_3 \end{bmatrix} \right\|_\infty
 = \left\| \begin{bmatrix} b\delta_1 \\ (1+ab)\delta_1 \end{bmatrix}
   + \begin{bmatrix} \delta_2 \\ a\delta_2 \end{bmatrix}
   + \begin{bmatrix} 0 \\ \delta_3 \end{bmatrix} \right\|_\infty
 \le \bigl(1 + \max\{|b|,\, |a| + |1+ab|\}\bigr)\,\frac{\Delta}{2}.$$

This shows that T̂ approximates T in a precise sense; in particular, T̂(x) ≈ Tx when ∆ is small.

For N × N matrices, the process is similar. T is factored into a product of matrices with unit diagonals and nonzero off-diagonal elements only in one row or column: T = T_1 T_2 ··· T_k. The discrete version of the transform is then given by
$$\hat{T}(x) = \Bigl[T_1\bigl[T_2 \ldots [T_k x]_\Delta\bigr]_\Delta\Bigr]_\Delta. \tag{5.74}$$
The lifting structure ensures that the inverse of T̂ can be implemented by reversing the calculations in (5.74):
$$\hat{T}^{-1}(y) = \Bigl[T_k^{-1}\bigl[\ldots\,\bigl[T_2^{-1}[T_1^{-1}y]_\Delta\bigr]_\Delta\bigr]_\Delta\Bigr]_\Delta.$$
The existence of such a factorization follows from the fact that any nonsingular matrix can be reduced to an identity matrix by multiplication with elementary matrices [180]. Since our original matrix has determinant 1, it is sufficient to consider the following three types of elementary matrices:

• E_{ij}^{(λ)}, to subtract a multiple λ of row j from row i.

• P_{ij}, to exchange rows i and j.

• D_{ij}^{(λ)}, to multiply row i by λ and row j by 1/λ.

E_{ij}^{(λ)} is already in the desired form. The remaining two can be factored as desired using the factorization of 2 × 2 matrices above.

5.B Transform Coding with Discrete Transforms

The phrase "transform coding" evokes a structure consisting of a linear transform, scalar quantization, and entropy coding—in that order. Zero-tree structures [174] and similar developments mix the quantization and entropy coding to some degree, but it remains that the transform is calculated on continuous-valued (or "full precision") data. Since the data will ultimately be represented coarsely, it seems that it should be sufficient to compute the transform coarsely.27 Another approach that may reduce complexity is to compute the transform on a discrete domain of about the same "size" as the final representation. Computing the transform on a "smaller" domain implies that the source is first quantized and then undergoes a transform, as in the correlating transform method for multiple description coding described in Section 5.3.

This appendix analyzes a few systems which combine—in the unusual specified order—scalar quantization, transform, and entropy coding. The use of discrete transforms provides more design freedom than we can handle, but by restricting attention to a particular family of discrete transforms, we can describe forward and inverse transforms simply and follow principled design rules. At the same time, some things are possible with discrete transforms that cannot be done with continuous-valued orthogonal transforms. The resulting systems provide opportunities for complexity reduction and reducing sensitivity to erasures.

The appendix is organized as follows: Section 5.B.1 provides a review of transform coding. Sections 5.B.2–5.B.4 give the results on achieving coding gain, reduction in entropy-coding complexity, and robustness to erasures using a class of discrete transforms. This class of transforms is described in detail in Appendix 5.A.

5.B.1 A Perspective on Transform Coding

In its simplest incarnation, transform coding is the representation of a random vector x ∈ R^N by the following three steps:

• A transform coefficient vector is computed as y = Tx, T ∈ R^{N×N}.

• Each transform coefficient is quantized by a scalar quantizer:

ŷ_i = q_i(y_i), i = 1, 2, ..., N.

The overall quantizer is denoted Q : R^N → R^N.

• An entropy code is applied to each quantized coefficient: E_i(ŷ_i), i = 1, 2, ..., N.

The decoder reverses the steps to produce an estimate x̂. We will consider only the coding of i.i.d. jointly Gaussian sources. Under either high rate or optimal fixed-rate quantizer assumptions, the transform should be a Karhunen–Loève transform (KLT) of the source; i.e., a transform that produces uncorrelated transform coefficients. An analysis of this arrangement including optimal design of the transform is given in Section 1.1.3.1.

Why does transform coding work? An algebraic answer is that the transform makes ∏_i σ^2_{y_i} < ∏_i σ^2_{x_i}, but this is not very enlightening. The geometry of the situation is shown in Figure 5.32. The ellipses represent level curves of the p.d.f. of the source. Quantization in the original coordinates is shown on the left, and quantization after the KLT is shown on the right. Is the second partition any better than the first? Put in another way, is Q(T(·)) a better vector quantization encoder mapping than Q(·)?

27 This is explored in Section 6.3.


Figure 5.32: Partitioning induced by uniform scalar quantization with and without the application of the Karhunen–Loève transform. The ellipses are level curves of the p.d.f. of the source.

In the high rate limit, the answer is no, since either gives distortion D = ∆^2/12. Transform coding does not improve the quantization, but rather makes the scalar entropy coding that follows it work well; if the entropy coding processed an entire vector at a time, the transform would give no advantage.28 Since the quantization performance is not affected much by the transform, we may try to replace the "T-Q-E" structure of transform–quantization–entropy coding with a "Q-T-E" structure.

Considering only linear transforms from ∆Z^N to ∆Z^N would be too restrictive, but placing no restriction on the transform gives more design freedom than we can deal with well. Thus, to have easily implemented transforms and to simplify the design process, we use only transforms which are derived from (continuous) linear transforms, as described in Appendix 5.A. Discrete-domain transforms can nearly achieve the coding gain of traditional transform coding, but at the same time introduce other possibilities. Though the remainder of the appendix addresses the coding of Gaussian sources, the use of discrete transforms extends the applicability of transform coding to discrete sources, and perhaps to abstract alphabet (non-numerical) sources. For simplicity, most expressions and all simulations are for two-dimensional sources.

5.B.2 Rate Reduction

In transform coding with quantization preceding the transform, the distortion is fixed (at approximately ∆^2/12) independent of the transform. The role of the transform is to reduce the coded rate, but an invertible transform cannot affect the entropy for a discrete random variable [34]. This seeming contradiction is resolved by again remembering that we wish for the entropy coding to operate on scalars.

Denote the source by (x_1, x_2), the transform by T̂, and the transform coefficients by (y_1, y_2) = T̂(x_1, x_2).

In the best case scenario, y1 and y2 are independent, so we take advantage of

$$H(x_1) + H(x_2) \ge H(x_1, x_2) = H(y_1, y_2) = H(y_1) + H(y_2),$$
where the left-hand side is a lower bound on the rate without the transform. We cannot normally expect to make y_1 and y_2 independent, but we can approximately achieve this condition by choosing T̂ to be an approximation to a KLT for x.

28 We are concerned here with high rates; at low rates it is hard to predict the best coordinates for scalar quantization.

Because the construction of the discrete transform introduces only O(∆) error (see Appendix 5.A), in the high rate limit y_1 and y_2 are independent. This was experimentally confirmed with a two-dimensional Gaussian source with correlation matrix
$$R_x = \begin{bmatrix} 1 & 0.9 \\ 0.9 & 1 \end{bmatrix}.$$

The KLT is a π/4 radian rotation. A comparison between using no transform, using the KLT (before quantization), and using a discrete approximation to the KLT is shown in Figure 5.33. If the entropy coding operates on vectors there is virtually no difference between the three transform choices.29 Removing the correlation in the source is important with scalar entropy coding. The discrete transform performs almost as well as the KLT; of course, it cannot perform better because the KLT makes the transform coefficients independent.
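A rough version of this experiment can be reproduced with a few lines of Python/numpy (a sketch under stated assumptions: the step size, sample count, and the particular lifting factorization of the π/4 rotation are illustrative, not the exact settings behind Figure 5.33). Quantization is performed first, a lifting-based discrete approximation of the KLT is applied to the quantized pair, and the sums of empirical scalar entropies are compared.

```python
import numpy as np

rng = np.random.default_rng(0)
n, delta = 200_000, 0.25
x = rng.multivariate_normal([0, 0], [[1.0, 0.9], [0.9, 1.0]], size=n)

def rnd(v):
    return delta * np.round(v / delta)

def scalar_entropy(v):
    """Empirical entropy in bits of a sample taking values in delta * Z."""
    _, counts = np.unique(np.round(v / delta).astype(np.int64), return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

xq = rnd(x)                                    # quantize first ("Q-T-E" ordering)
c = s = np.sin(np.pi / 4)                      # pi/4 rotation: cos = sin = 1/sqrt(2)
p = (c - 1) / s
y1, y2 = xq[:, 0].copy(), xq[:, 1].copy()
y1 += rnd(p * y2)                              # three rounded lifting steps approximating the KLT
y2 += rnd(s * y1)
y1 += rnd(p * y2)

print("no transform :", scalar_entropy(xq[:, 0]) + scalar_entropy(xq[:, 1]), "bits per vector")
print("discrete KLT :", scalar_entropy(y1) + scalar_entropy(y2), "bits per vector")
```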

5.B.3 Complexity Reduction

The previous section demonstrated that a discrete transform can do about as well as a continuous transform when scalar entropy coding is to be used. It was implicit that each scalar entropy code was optimized to its corresponding transform coefficient. Having N separate entropy codes increases the memory requirements and is thus undesirable. In the previous example, this could be seen as an argument for using scalar entropy coding in the original coordinates, despite the higher rate.

For simplicity, consider a source with independent components: R_x = diag(σ_1^2, σ_2^2), σ_1 ≥ σ_2. There is little flexibility in the choice of continuous transforms since they must be orthogonal to maintain cubic partition cells. For this source, only reflections and trivial rotations of kπ/2 radians do not increase the rate; thus, there is no hope of equalizing the p.d.f.'s of the transform coefficients without hurting the rate–distortion performance.

The family of discrete transforms used here gives more flexibility. We need not start with an orthogonal transform; any transform with determinant 1 will suffice. Discrete transforms derived from initial transforms of the form
$$\hat{T} = \begin{bmatrix} \alpha & \sigma_2^{-1}\alpha\sigma_1 \\ \mp(2\alpha\sigma_1)^{-1}\sigma_2 & \pm(2\alpha)^{-1} \end{bmatrix}$$
all give the optimal coding gain. In particular,
$$\hat{T} = \begin{bmatrix} \alpha & (2\alpha)^{-1} \\ -\alpha & (2\alpha)^{-1} \end{bmatrix}, \qquad \text{with } \alpha = \sqrt{\frac{\sigma_2}{2\sigma_1}}, \tag{5.75}$$
gives optimal coding gain and produces transform coefficients with identical distributions. This is a special case of the general analysis of Section 5.3.2.2.

An experimental confirmation is shown in Figure 5.34. The source was chosen to have the same power and eigenvalue spread as in the previous example, so σ_1^2 = 1.9 and σ_2^2 = 0.1. Since the source components are independent, the best-case performance is to quantize and apply separately optimized entropy codes to the two variables. However, when a discrete transform based on (5.75) is used, the best performance is almost matched even with a single entropy code applied to both transform coefficients.
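The determinant and variance claims for (5.75) are easy to verify numerically. The following minimal check (Python/numpy, with the σ values of the example above) is only illustrative:

```python
import numpy as np

var1, var2 = 1.9, 0.1                          # source component variances sigma_1^2, sigma_2^2
s1, s2 = np.sqrt(var1), np.sqrt(var2)
alpha = np.sqrt(s2 / (2 * s1))
T = np.array([[ alpha, 1 / (2 * alpha)],
              [-alpha, 1 / (2 * alpha)]])      # transform (5.75)

Ry = T @ np.diag([var1, var2]) @ T.T
print("det T                 :", np.linalg.det(T))     # 1, so the rate is not affected
print("coefficient variances :", np.diag(Ry))          # equal (sigma_1 * sigma_2 each)
print("product of variances  :", np.prod(np.diag(Ry)), "vs", var1 * var2)
```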

29 The "no transform; vector" and "discrete T; vector" cases give precisely the same rates, as do "continuous T; vector" and "continuous T; scalar."


Figure 5.33: Experiment showing that the coding gain of the (continuous) KLT is almost matched by a discrete transform which approximates the KLT. The horizontal axis gives the number of cells covering [−6, 6] for each component quantizer. Legend entries indicate whether no transform, a KLT, or a discrete approximation of the KLT was used; and whether the entropy coding is based on scalars or vectors. (a) Rates based on empirical entropies; (b) Rates based on explicit Huffman codes.


Figure 5.34: Experiment showing that a discrete transform makes it possible to simultaneously achieve optimal coding gain and use the same entropy code for each transform coefficient. The horizontal axis gives the number of cells covering [−6, 6] for each component quantizer. Legend entries indicate whether a transform is used and whether separate entropy codes are used for each transform coefficient. (a) Rates based on empirical entropies; (b) Rates based on explicit Huffman codes.


Figure 5.35: Probability densities of the norm of the reconstruction error with and without the transform (5.75). The source has independent components with variances σ_1^2 = 1.9 and σ_2^2 = 0.1. The transform does not change E[||e||^2], but it reduces E[||e||^4]. Thus the MSE distortion is unchanged but the variance of the squared error is lowered.

Other manipulations of the transform coefficient variances may be useful. For example, the probability of a zero coefficient affects both the efficacy of run-length coding and decoding optimizations in the spirit of [119, 120].

5.B.4 Erasure Resilience

The set of transforms described by (5.75) is familiar from Section 5.3.2.2 as optimal transforms for multiple description transform coding when both channels are equally likely to fail. In addition, these transforms produce two channels with equal rates. The choice of α in (5.75) is the extreme case where no redundancy is added and the average MSE when there is an erasure is not reduced. However, even in this case there is a change in the distribution of the distortion. This effect is described in this section.

As in the previous section, consider the coding of a zero-mean Gaussian source described by R_x = diag(σ_1^2, σ_2^2), σ_1 ≥ σ_2. Suppose that in a multiple description system, finely-quantized versions of the two components are sent independently over two channels. Since the components are independent, a reconstruction from only x_1 would simply be x̂ = [[x_1]_∆, 0]^T. Neglecting the quantization error, the reconstruction error magnitude ||x − x̂|| is the absolute value of a Gaussian random variable with variance σ_2^2. Similarly, reconstructing from x_2 gives an error magnitude which is the absolute value of a Gaussian random variable with variance σ_1^2. Assume the channels are equally likely to fail and denote the reconstruction error with one channel failure by e = x − x̂. Then the norm of e is the absolute value of a Gaussian mixture. For σ_1^2 = 1.9 and σ_2^2 = 0.1, the probability density of ||e|| is shown in Figure 5.35 with the dashed curve.

Using the transform (5.75) with the specified α does not change the total rate or the average distortion when one of the two channels is lost. However, it does change the distribution of ||e||. Evaluating (5.26) for either channel failure shows that ||e|| is the absolute value of a Gaussian random variable with variance (σ_1^2 + σ_2^2)/2. The probability density is shown in Figure 5.35 with a solid curve.

The transform has not changed the mean-squared error E[||e||^2]. This is consistent with Section 5.3, where reductions in the MSE with channel failures came only at the expense of increased rate. However, the probability of large errors has been reduced, which may be considered an increase in robustness despite the unchanged MSE. This can easily be characterized by computing E[||e||^4] (recall that E[Z^4] = 3σ^4 for a zero-mean Gaussian Z with variance σ^2). Without the transform,
$$E[\|e\|^4] = \frac{3}{2}\left(\sigma_1^4 + \sigma_2^4\right),$$
but with the transform
$$E[\|e\|^4] = \frac{3}{4}\left(\sigma_1^2 + \sigma_2^2\right)^2 \le \frac{3}{2}\left(\sigma_1^4 + \sigma_2^4\right).$$
Thus the variance of ||e||^2 is unchanged or reduced by the transform.

5.C Proofs

5.C.1 Proof of Theorem 5.6

The proofs of Theorems 5.6 and 5.7 utilize the following lemma [157]:

Lemma 5.11 Let A, X, and Y be symmetric, real, positive definite matrices and let Q = Y^{1/2} A Y^{1/2}. Suppose X and Q each have distinct eigenvalues. Denote orthogonal eigendecompositions of X and Q by Q = V_Q Λ_Q V_Q^T and X = V_X Λ_X V_X^T, respectively, where Λ_Q and Λ_X have decreasing diagonals. Then the optimization problem

$$\text{minimize } J(U) = \operatorname{tr} A U^T U \quad \text{subject to} \quad U^T X U = Y$$
is solved by
$$U_0 = V_X \Lambda_X^{-1/2} V_Q^T Y^{1/2}, \tag{5.76}$$
yielding
$$J(U_0) = \operatorname{tr} \Lambda_Q \Lambda_X^{-1}. \tag{5.77}$$

This solution is unique up to the sign choices in defining V_X and V_Q.
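Because Lemma 5.11 is used repeatedly below, a quick numerical sanity check may be helpful. The following sketch (Python/numpy; random matrices, not tied to any particular R_x or R_y) forms U_0 as in (5.76), verifies the constraint, compares J(U_0) with (5.77), and confirms that random feasible transforms do no better.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

def rand_spd():
    m = rng.standard_normal((n, n))
    return m @ m.T + n * np.eye(n)             # symmetric positive definite

def sym_sqrt(m):
    w, v = np.linalg.eigh(m)
    return v @ np.diag(np.sqrt(w)) @ v.T

def eig_desc(m):                                # eigendecomposition, eigenvalues decreasing
    w, v = np.linalg.eigh(m)
    order = np.argsort(w)[::-1]
    return w[order], v[:, order]

A, X, Y = rand_spd(), rand_spd(), rand_spd()
Q = sym_sqrt(Y) @ A @ sym_sqrt(Y)
lamQ, VQ = eig_desc(Q)
lamX, VX = eig_desc(X)

U0 = VX @ np.diag(lamX ** -0.5) @ VQ.T @ sym_sqrt(Y)          # (5.76)
print(np.allclose(U0.T @ X @ U0, Y))                          # constraint holds: True
print(np.trace(A @ U0.T @ U0), np.sum(lamQ / lamX))           # J(U0) matches (5.77)

# Every feasible U can be written X^{-1/2} W Y^{1/2} with W orthogonal;
# random feasible choices should never beat U0.
Xm12 = np.linalg.inv(sym_sqrt(X))
for _ in range(10):
    W, _ = np.linalg.qr(rng.standard_normal((n, n)))
    U = Xm12 @ W @ sym_sqrt(Y)
    assert np.trace(A @ U.T @ U) >= np.trace(A @ U0.T @ U0) - 1e-9
```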

In the proof of Lemma 5.11, the following elementary fact is used:

Lemma 5.12 Suppose tr BS = 0 for all skew-symmetric matrices S. Then B is symmetric.

Proof (Lemma 5.12): For any i ≠ j, let S be the matrix with +1 in the (i, j) position, −1 in the (j, i) position,

and remaining elements equal to zero. Since 0 = tr BS = B_{ji} − B_{ij}, we have B_{ij} = B_{ji}. □

Proof (Lemma 5.11): We may first convert the constraint U^T X U = Y to a simpler form. Left and right multiplying by Y^{-1/2} and splitting X = X^{1/2} · X^{1/2} gives Y^{-1/2} U^T X^{1/2} · X^{1/2} U Y^{-1/2} = I. With the definition Ũ = X^{1/2} U Y^{-1/2}, we have the constraint Ũ^T Ũ = I. The objective function is now

$$\begin{aligned}
\tilde{J}(\tilde{U}) &= J(X^{-1/2}\tilde{U}Y^{1/2}), \qquad (5.78)\\
&= \operatorname{tr} A Y^{1/2}\tilde{U}^T X^{-1}\tilde{U} Y^{1/2},\\
&= \operatorname{tr} Y^{1/2} A Y^{1/2}\tilde{U}^T X^{-1}\tilde{U}, \qquad (5.79)\\
&= \operatorname{tr} Q \tilde{U}^T X^{-1}\tilde{U}, \qquad (5.80)
\end{aligned}$$
where (5.79) uses the fact that cyclic permutation of factors does not affect the trace. We are left with minimizing (5.80) over orthogonal transforms Ũ.

A differential analysis will reveal a single critical point, up to sign choices. Since A is positive definite, this critical point must be a minimum. Consider a small change to Ũ, Ũ_δ = Ũ + δ. To obey the orthogonality constraint, we must have Ũ_δ^T Ũ_δ = I. Expanding Ũ_δ^T Ũ_δ and neglecting the δ^2 term gives the constraint

$$\tilde{U}^T\delta + \delta^T\tilde{U} = 0. \tag{5.81}$$

The perturbation has the following effect on the objective function:

$$\begin{aligned}
\tilde{J}(\tilde{U}_\delta) &= \operatorname{tr} Q(\tilde{U}+\delta)^T X^{-1}(\tilde{U}+\delta),\\
&= \operatorname{tr} Q\tilde{U}^T X^{-1}\tilde{U} + \operatorname{tr} Q\tilde{U}^T X^{-1}\delta + \operatorname{tr} Q\delta^T X^{-1}\tilde{U} + \operatorname{tr} Q\delta^T X^{-1}\delta,\\
&\approx \operatorname{tr} Q\tilde{U}^T X^{-1}\tilde{U} + 2\operatorname{tr} Q\tilde{U}^T X^{-1}\delta, \qquad (5.82)\\
&= \tilde{J}(\tilde{U}) + 2\operatorname{tr} Q\tilde{U}^T X^{-1}\delta,
\end{aligned}$$
where the approximation (5.82) results from discarding the O(||δ||^2) term and using tr M = tr M^T. Thus a critical point of J̃( · ) is a transform Ũ that satisfies

$$\operatorname{tr} Q\tilde{U}^T X^{-1}\delta = 0 \tag{5.83}$$
for all small δ satisfying (5.81). The solutions of (5.81) are simple; they are δ = ŨS, where S is an arbitrary skew-symmetric matrix. Thus the solutions of (5.83) are Ũ such that tr QŨ^T X^{-1}ŨS = 0 for all skew-symmetric S. By Lemma 5.12, QŨ^T X^{-1}Ũ must be symmetric. Notice the effect of transposing this matrix: since Q and X are symmetric, the transpose is Ũ^T X^{-1}ŨQ; thus, Q and Ũ^T X^{-1}Ũ commute.

Diagonalizable matrices commute if and only if they are simultaneously diagonalizable [99]. Furthermore, for a matrix with distinct eigenvalues the orthogonal transform that diagonalizes it and leaves the diagonal in decreasing order is unique up to sign choices. On one hand, Q is diagonalized as

$$\underbrace{V_Q^T}_{\text{diagonalizing transform}} \cdot\; Q \;\cdot \underbrace{V_Q}_{\text{transposed transform}} = \Lambda_Q;$$
on the other hand, Ũ^T X^{-1} Ũ is diagonalized as

$$\underbrace{V_X^T\tilde{U}}_{\text{diagonalizing transform}} \cdot\; \tilde{U}^T \underbrace{V_X \Lambda_X^{-1} V_X^T}_{X^{-1}} \tilde{U} \;\cdot \underbrace{\tilde{U}^T V_X}_{\text{transposed transform}} = \Lambda_X^{-1}.$$

Ignoring sign choices, we may equate the diagonalizing transforms using a permutation matrix P :

$$P V_Q^T = V_X^T \tilde{U}.$$

The permutation will be chosen after its effect is made clear. A simple sequence of substitutions yields the optimal transform U_0:

$$U_0 = X^{-1/2}\tilde{U}Y^{1/2} = V_X\Lambda_X^{-1/2}V_X^T \cdot V_X P V_Q^T \cdot Y^{1/2} = V_X\Lambda_X^{-1/2} P V_Q^T Y^{1/2}. \tag{5.84}$$

Evaluating J(U_0) gives

$$\begin{aligned}
J(U_0) &= \operatorname{tr} A U_0^T U_0,\\
&= \operatorname{tr} Y^{-1/2} Q Y^{-1/2} \cdot Y^{1/2} V_Q P^T \Lambda_X^{-1/2} V_X^T \cdot V_X \Lambda_X^{-1/2} P V_Q^T Y^{1/2}, \qquad (5.85)\\
&= \operatorname{tr} V_Q^T Q V_Q \cdot P^T \Lambda_X^{-1} P,\\
&= \operatorname{tr} \Lambda_Q P^T \Lambda_X^{-1} P, \qquad (5.86)
\end{aligned}$$
where terms are separated to emphasize substitutions, and the cyclic-permutation property of the trace is used to commute factors between (5.85) and the following line. Since Λ_Q and Λ_X are already sorted in the same order, (5.86) is minimized by choosing the

identity permutation P = I. This finally yields (5.76) and (5.77). □

We are now prepared to prove the theorem. The overall strategy is as follows: Starting with any transform T, we can find a transform V such that VT results in identical side distortion and at most the same redundancy as T. At the same time, VT yields a correlation R_y with a particular, simple form. This simple form in turn leads to a simple expression for the side distortion D_1. Lemma 5.11 is then used to show that the transform that yields minimum D_1 among transforms with correlation R_y has the desired form (5.40). Since the performance with any transform can at least be matched with a transform of the form (5.40), the proof is complete. Each step is now detailed.

Recall that T is an arbitrary transform. Let R_y = T R_x T^T, the correlation matrix of the transform coefficients when T is used. Since y_2 and y_3 are sent on the same channel, an invertible transform applied to the two will not change D_1. However, if y_2 and y_3 are correlated, the rate can be reduced by applying a decorrelating transform. Denote a KLT for
$$\begin{bmatrix} (R_y)_{22} & (R_y)_{23} \\ (R_y)_{32} & (R_y)_{33} \end{bmatrix}$$
by Ṽ_1, and let
$$V_1 = \begin{bmatrix} 1 & 0_{1\times 2} \\ 0_{2\times 1} & \tilde{V}_1 \end{bmatrix}.$$

Then using V_1 T in place of T does not change the side distortion D_1, and does not increase the redundancy ρ.

After the application of V_1, the correlation can be written in the following form:
$$V_1 T R_x T^T V_1^T = \begin{bmatrix} \varsigma_1^2 & a_1 & a_2 \\ a_1 & \varsigma_2^2 & 0 \\ a_2 & 0 & \varsigma_3^2 \end{bmatrix}.$$

This type of correlation structure cannot be produced by a transform of the desired form (5.40) unless a_2 = 0, so we simplify the correlation structure further. Let
$$\tilde{V}_2 = \frac{1}{\sqrt{a_1^2\varsigma_3^2 + a_2^2\varsigma_2^2}}
\begin{bmatrix} \sigma_2^{-1} a_1 \varsigma_3^2 & \sigma_2^{-1} a_2 \varsigma_2^2 \\ a_2\sigma_2 & -a_1\sigma_2 \end{bmatrix}
\qquad\text{and}\qquad
V_2 = \begin{bmatrix} 1 & 0_{1\times 2} \\ 0_{2\times 1} & \tilde{V}_2 \end{bmatrix}.$$
Then
$$V_2 V_1 T R_x T^T V_1^T V_2^T =
\begin{bmatrix}
\varsigma_1^2 & \sigma_2^{-1}\sqrt{a_1^2\varsigma_3^2 + a_2^2\varsigma_2^2} & 0 \\
\sigma_2^{-1}\sqrt{a_1^2\varsigma_3^2 + a_2^2\varsigma_2^2} & \sigma_2^{-2}\varsigma_2^2\varsigma_3^2 & 0 \\
0 & 0 & \sigma_2^2
\end{bmatrix}. \tag{5.87}$$

(The reason for selecting V_2 with deference to σ_2 is revealed below.) With reference to (5.24), recall that the redundancy depends only on the product of the variances of the transform coefficients. Since the products of the diagonal elements of V_2 V_1 T R_x T^T V_1^T V_2^T and V_1 T R_x T^T V_1^T are equal, using V_2 V_1 T in place of V_1 T does not change the redundancy. Furthermore, since V_2 merely alters the second and third components in an invertible manner, D_1 is also unchanged. With V = V_2 V_1, we have found a transform such that VT is at least as good for multiple description coding as T, and V T R_x T^T V^T has a simple form; the first step of the proof is complete.

We now wish to show that for R_y of the form (5.87), the optimal transform has the desired form (5.40). In light of the uniqueness of the solution in Lemma 5.11, this lemma could be used to directly compute the best transform for the given R_y.30 This is not done for several reasons. Most importantly, our ultimate goal is to minimize the side distortion for a given redundancy, not a given R_y. Even if two transform coefficient correlation matrices yield the same redundancy, their corresponding minimum distortions may not be the same; after finding the minimum distortion as a function of R_y we would have to minimize over all R_y with a particular redundancy. Finding only the form of the optimal transform also simplifies the following computations significantly.

To simplify notation, let
$$R_y = \begin{bmatrix} \gamma_1 & a & 0 \\ a & \gamma_2 & 0 \\ 0 & 0 & \sigma_2^2 \end{bmatrix}.$$
Inverting T R_x T^T = R_y gives U^T R_x^{-1} U = R_y^{-1} where, as in Section 5.3.2.1, U = T^{-1}. When y_1 (Channel 1) is lost, A_1 in (5.27) is given by
$$A_1 = \gamma_1 - \begin{bmatrix} a & 0 \end{bmatrix}\begin{bmatrix} \gamma_2 & 0 \\ 0 & \sigma_2^2\end{bmatrix}^{-1}\begin{bmatrix} a \\ 0 \end{bmatrix} = \gamma_1 - \frac{a^2}{\gamma_2}.$$

When (y_2, y_3) (Channel 2) is lost, the corresponding quantity is
$$A_2 = \begin{bmatrix} \gamma_2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix} - \begin{bmatrix} a \\ 0 \end{bmatrix}\gamma_1^{-1}\begin{bmatrix} a & 0 \end{bmatrix} = \begin{bmatrix} \gamma_2 - \gamma_1^{-1}a^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}.$$
The average side distortion per component when the channels are equally likely to be lost is given by

$$D_1 = \frac{1}{6}\operatorname{tr} A U^T U \qquad\text{where}\qquad A = \begin{bmatrix} A_1 & 0_{1\times 2} \\ 0_{2\times 1} & A_2 \end{bmatrix}.$$

It is now clear that we have an optimization that can be solved with Lemma 5.11. Identify X = R_x^{-1} = diag(σ_1^{-2}, σ_2^{-2}, σ_3^{-2}) and
$$Y = R_y^{-1} = \begin{bmatrix} \tilde{Y} & 0_{2\times 1} \\ 0_{1\times 2} & \sigma_2^{-2} \end{bmatrix},$$
where we need not specify Ỹ because we are primarily interested in sparsity. Other quantities that appear in Lemma 5.11 can easily be computed.31 Since X is already diagonal, V_X = I and Λ_X^{-1/2} = diag(σ_1, σ_2, σ_3). The sparsity of Y gives
$$Q = Y^{1/2} A Y^{1/2} = \begin{bmatrix} \tilde{Y}^{1/2} & 0_{2\times 1} \\ 0_{1\times 2} & \sigma_2^{-1} \end{bmatrix} A \begin{bmatrix} \tilde{Y}^{1/2} & 0_{2\times 1} \\ 0_{1\times 2} & \sigma_2^{-1} \end{bmatrix} = \begin{bmatrix} \tilde{Y}^{1/2}\tilde{A}\tilde{Y}^{1/2} & 0_{2\times 1} \\ 0_{1\times 2} & 1 \end{bmatrix}, \tag{5.88}$$
where Ã denotes the upper-left 2 × 2 block of A,

30 The plain reason is that we need not do more than satisfy the statement of the theorem.
31 Arbitrary sign and permutation choices are made in diagonalizing transforms; sorting of diagonal elements is handled later.

so a diagonalizing transform of Q will have the form
$$V_Q = \begin{bmatrix} V_{\tilde{Q}} & 0_{2\times 1} \\ 0_{1\times 2} & 1 \end{bmatrix}.$$

Now substituting in (5.84) gives
$$U_0 = \operatorname{diag}(\sigma_1, \sigma_2, \sigma_3)\, P \begin{bmatrix} V_{\tilde{Q}}^T & 0_{2\times 1} \\ 0_{1\times 2} & 1 \end{bmatrix} \begin{bmatrix} \tilde{Y}^{1/2} & 0_{2\times 1} \\ 0_{1\times 2} & \sigma_2^{-1} \end{bmatrix}.$$

The optimal transform is the inverse of U_0:
$$T = \begin{bmatrix} \tilde{Y}^{-1/2}V_{\tilde{Q}} & 0_{2\times 1} \\ 0_{1\times 2} & \sigma_2 \end{bmatrix} P^T \operatorname{diag}(\sigma_1^{-1}, \sigma_2^{-1}, \sigma_3^{-1}). \tag{5.89}$$

It remains now to determine the permutation P in (5.89). This depends on how Λ_Q must be permuted to match the ordering of Λ_X. First note that Λ_X = X is sorted in decreasing order. It is not necessary to precisely determine the eigenvalues of Q to find the required permutation. The sum of the eigenvalues of Q is given by
$$\operatorname{tr} Q = \operatorname{tr} Y^{1/2} A Y^{1/2} = \operatorname{tr} A Y = \operatorname{tr}\begin{bmatrix} 1 & -a/\gamma_2 & 0 \\ -a/\gamma_1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = 3.$$
From the form of (5.88) one of the eigenvalues is 1. In the generic case, the eigenvalues are distinct and so the remaining eigenvalues sandwich the eigenvalue 1. To counteract the different sorting of Λ_X and Λ_Q, the permutation must move the third element to the middle. With such a permutation, (5.89) simplifies to
$$T = \begin{bmatrix} \tilde{Y}^{-1/2}V_{\tilde{Q}}\operatorname{diag}(\sigma_1^{-1},\sigma_3^{-1})\,\tilde{P} & 0_{2\times 1} \\ 0_{1\times 2} & 1 \end{bmatrix}\begin{bmatrix} 1&0&0\\0&0&1\\0&1&0 \end{bmatrix},$$

where P̃ is a 2 × 2 permutation that depends on whether V_{Q̃} is a clockwise or counterclockwise rotation. (The cancellation of σ_2 now retrospectively explains why this standard deviation was singled out.) The final transform is in the form (5.40) and thus the proof is complete.

This analysis could be pushed further to determine the optimal transform. However, this would merely be the optimal transform given R_y, not the optimal transform for a given redundancy ρ. Applying this theorem, an optimal transform with an upper bound on ρ is easy to compute using the results from Section 5.3.2.2 (see Section 5.3.2.3).

5.C.2 Proof of Theorem 5.7

This theorem is similar to Theorem 5.6 so an abbreviated proof is given. The strategy of the proof is to start with an arbitrary transform T. The corresponding correlation matrix T R_x T^T may be fully dense, but there is a transform V with determinant 1 such that V T R_x T^T V^T has a simple desired form and VT is no worse than T for use in the system. An application of Lemma 5.11 then shows that the optimal transform is of the desired form (5.42). The simplification of the correlation matrix is detailed below, but the application of Lemma 5.11 is omitted.

As in Appendix 5.C.1, components sent over the same channel can be made uncorrelated without affecting the side distortion, while not increasing the redundancy. Using Karhunen–Loève transforms to decorrelate in such a manner gives
$$V_1 T R_x T^T V_1^T = \begin{bmatrix} \varsigma_1^2 & a_{11} & 0 & a_{12} \\ a_{11} & \varsigma_2^2 & a_{21} & 0 \\ 0 & a_{21} & \varsigma_3^2 & a_{22} \\ a_{12} & 0 & a_{22} & \varsigma_4^2 \end{bmatrix}.$$
Now we would like to find W̃_1 and W̃_2 with det W̃_i = 1, i = 1, 2, such that
$$V_2 = \begin{bmatrix} \tilde{W}_1 & 0_{2\times 2} \\ 0_{2\times 2} & \tilde{W}_2 \end{bmatrix} P, \qquad\text{with}\qquad P = \begin{bmatrix} 1&0&0&0\\0&0&1&0\\0&1&0&0\\0&0&0&1 \end{bmatrix},$$
gives the desired correlation structure. A solution is obtained with
$$\tilde{W}_1 = \begin{bmatrix} \alpha & \beta \\ -(\alpha^2\varsigma_1^2 + \beta^2\varsigma_3^2)^{-1}\beta\varsigma_3^2 & (\alpha^2\varsigma_1^2 + \beta^2\varsigma_3^2)^{-1}\alpha\varsigma_1^2 \end{bmatrix}$$
$$\tilde{W}_2 = \begin{bmatrix} \gamma & \delta \\ -(\gamma^2\varsigma_2^2 + \delta^2\varsigma_4^2)^{-1}\delta\varsigma_4^2 & (\gamma^2\varsigma_2^2 + \delta^2\varsigma_4^2)^{-1}\gamma\varsigma_2^2 \end{bmatrix}$$
where α is a root of
$$\alpha^2 - \frac{\beta\left(\varsigma_3^2\varsigma_4^2 a_{11} + \varsigma_2^2\varsigma_3^2 a_{12} - \varsigma_1^2\varsigma_4^2 a_{21} + \varsigma_1^2\varsigma_2^2 a_{22}\right)}{\varsigma_1^2\left(\varsigma_4^2 a_{11}a_{21} + \varsigma_2^2 a_{12}a_{22}\right)}\,\alpha - \frac{\beta^2\varsigma_3^2}{\varsigma_1^2} = 0 \tag{5.90}$$
and
$$\gamma = \frac{\delta\varsigma_4^2\left(\alpha a_{11} + \beta a_{21}\right)}{\varsigma_2^2\left(\alpha a_{12} + \beta a_{22}\right)}. \tag{5.91}$$
The validity of this solution places no constraint on δ. The choice of β should ensure that (5.90) has real roots and (5.91) does not involve a division by zero. These requirements are easily satisfied by choosing β to have the same sign as ς_3^2 ς_4^2 a_{11} + ς_2^2 ς_3^2 a_{12} − ς_1^2 ς_4^2 a_{21} + ς_1^2 ς_2^2 a_{22} and eliminating a few isolated points.32

32 The case ς_4^2 a_{11}a_{21} + ς_2^2 a_{12}a_{22} = 0 should be handled separately. In this situation one can achieve the desired correlation structure with W̃_1 = I.

With V = V_2 V_1, we have a transform such that VT is at least as good as T; i.e., VT gives the same average side distortion and at most the same redundancy as T. Also, V T R_x T^T V^T has a simple block diagonal form. This block diagonal form permits an application of Lemma 5.11 which shows that the optimal transform is of the desired form (5.42). The application of Lemma 5.11 parallels its use in Appendix 5.C.1 and is thus omitted. This completes the proof.

Note that the complicated dependence of V2 on the various parameters hinders extending this method of proof to Conjecture 5.9.

5.C.3 Proof of Theorem 5.8

This appendix gives a proof of Theorem 5.8. First note that for high ρ, the distortion given by (5.55) is dominated by the first term, so the ς_{2i}'s must be the K smallest variances. Since we are interested in the pairing but not the order of the pairs, we may assign

$$\varsigma_{2i} = \sigma_{2K+1-i}, \qquad i = 1, 2, \ldots, K,$$
without loss of generality. Now it remains to show that

$$\varsigma_{2i-1} = \sigma_i, \qquad i = 1, 2, \ldots, K, \tag{5.92}$$
minimizes (5.55). The proof is completed by showing that any permutation other than (5.92) can be improved, or is already equivalent to (5.92) because of nondistinct σ_i's.

Suppose some permutation other than (5.92) is used and let i* be the smallest i for which (5.92) is violated. Say ς_{2i*−1} = σ_j (instead of σ_{i*}). Then j > i* because j ≤ i* would contradict the definition of i*. Similarly, if σ_{i*} is paired with σ_k, then k < 2K − i* + 1, for if not (5.92) would be violated at i = 2K − k + 1.

We assert that the distortion is reduced by swapping σ_{i*} and σ_j (unless σ_{i*} = σ_j or σ_k = σ_{2K−i*+1}, in which case the distortion is unchanged and we may proceed by looking for larger i*). The first term in (5.55) is unaffected by the swap, but the second term is multiplied by
$$\left[\frac{(\sigma_{i^\star}^2 - \sigma_{2K-i^\star+1}^2)(\sigma_j^2 - \sigma_k^2)}{(\sigma_{i^\star}^2 - \sigma_k^2)(\sigma_j^2 - \sigma_{2K-i^\star+1}^2)}\right]^{1/K}.$$

When the σ's in question are distinct, this factor is less than one because of the calculations for the two-pair case. Since only (5.92) and equivalent permutations are not improved by this process, the theorem is proven.

5.C.4 Proof of Theorem 5.10

This appendix gives a proof of Theorem 5.10. A packing of lines as required by the theorem is called an optimal M-packing in the Grassmannian space G(N, 1) [31]. This packing is equivalent to the packing of antipodal spherical caps. The proof given here closely mimics computations by Wyner [212] on the optimal packing of (non-antipodal) spherical caps.

For a given N and angle θ, suppose a maximum size frame has been constructed which has minimum angle θ between frame elements. This frame has M(N, θ) elements. For each frame vector, construct a pair of antipodal spherical caps cut from the surface of the unit sphere with half angle θ, situated such that their axes align with the vector. The set of all such caps must cover the entire surface of the sphere; if not, a vector from the origin to the uncovered point could be added to the frame. Since the surface area of the sphere is
$$S_N = \Gamma\!\left(\frac{N+2}{2}\right)^{-1} N\pi^{N/2}$$
and the surface area of a single cap of half angle θ is
$$A_N(\theta) = \Gamma\!\left(\frac{N+1}{2}\right)^{-1}(N-1)\,\pi^{(N-1)/2}\int_0^\theta \sin^{N-2}\phi\, d\phi,$$
the covering property gives the following bound:

$$M(N,\theta) \ge \frac{S_N}{2A_N(\theta)} = \frac{N\sqrt{\pi}}{2(N-1)}\,\Gamma\!\left(\frac{N+2}{2}\right)^{-1}\Gamma\!\left(\frac{N+1}{2}\right)\left[\int_0^\theta \sin^{N-2}\phi\,d\phi\right]^{-1}. \tag{5.93}$$

The remainder of the proof uses asymptotic estimates as N → ∞. Showing that M(N, θ) must grow exponentially with N for any θ < π/2 will complete the proof. Taking logarithms of both sides of (5.93) and dividing by N gives
$$\frac{1}{N}\ln M(N,\theta) \ge \frac{1}{N}\ln\frac{N\sqrt{\pi}}{2(N-1)} + \frac{1}{N}\ln\left[\Gamma\!\left(\frac{N+2}{2}\right)^{-1}\Gamma\!\left(\frac{N+1}{2}\right)\right] - \frac{1}{N}\ln\int_0^\theta \sin^{N-2}\phi\,d\phi. \tag{5.94}$$
It is shown in [212, App. G] that
$$\lim_{N\to\infty}\frac{1}{N}\ln\int_0^\theta \sin^{N-2}\phi\,d\phi = \ln\sin\theta. \tag{5.95}$$
Taking the limit of (5.94) and using (5.95) gives
$$\lim_{N\to\infty}\frac{1}{N}\ln M(N,\theta) \ge -\ln\sin\theta. \tag{5.96}$$
Since M grows linearly with N, the left side of (5.96) is zero. This shows that for the construction specified in the theorem, the asymptotic value of θ must be π/2. This completes the proof.

Chapter 6

Computation-Optimized Source Coding

THE MOST well-known aspects of information theory are bounds on the performance of communication systems. These theoretical bounds depend on the way that sources and channels are modeled, but not on the technology of encoding and decoding. The practice of communication is, however, limited by the technology used in realizing the encoder and the decoder—both the algorithms and the hardware. In addition, performance requirements may introduce a constraint inconsistent with the theory, for example, a maximum delay. Thus, even with the simplest of sources and channels, there will generally be a gap between the "optimal" performance and the performance of a constructed system.

Imagine that you are faced with the problem of designing a source coder for a memoryless source with a known distribution. Guided only by a random coding proof of the achievability of the rate–distortion function [34], you would take the following steps: choose a block length N; for rate R, generate a codebook with 2^{NR} sequences of length N by randomly drawing samples from the source; encode by the nearest-neighbor rule. This is highly impractical and would not perform well. It is impractical because nearest-neighbor encoding has high complexity, unless the block length is very low (see Table 1.1). The performance would be improved by using the samples drawn from the source in an iterative codebook design algorithm. This demonstrates that an analytical framework for optimal rate–distortion performance does not directly lead to techniques with good operational performance.

Taking a different point of view, we may ask: What is the best compression performance possible with C instructions per source sample on microprocessor X? The question suggests that an arbitrarily long sequence of samples will be processed and that the constraint applies to the limiting average. In this form, the problem is certainly unsolvable because the number of possible execution sequences grows without bound. This chapter takes a "doubly operational" approach to computation-optimized source coding, since optimization is performed over a precisely defined set of algorithms and the performance criterion is operational rate–distortion. Applications to transform coding are analyzed in detail, along with a few others.

This chapter includes research conducted jointly with Martin Vetterli [72, 73].
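A toy version of the random-coding recipe above is easy to write down (Python/numpy; a unit-variance memoryless Gaussian source, a small block length, and the squared-error criterion are illustrative assumptions). It makes both problems visible: the encoder performs 2^{NR} distance computations per block, and at small N the distortion sits well above the distortion–rate function D(R) = 2^{−2R}.

```python
import numpy as np

rng = np.random.default_rng(0)
N, R = 8, 2.0                                  # block length and rate in bits per sample
K = int(2 ** (N * R))                          # codebook size 2^{NR}
codebook = rng.standard_normal((K, N))         # codewords drawn from the source distribution
blocks = rng.standard_normal((200, N))         # source blocks to encode

total = 0.0
for blk in blocks:                             # nearest-neighbor encoding: exhaustive search
    d2 = ((codebook - blk) ** 2).sum(axis=1)   # 2^{NR} distance computations per block
    total += d2.min()
D = total / blocks.size
print("operational distortion :", D)
print("D(R) lower bound       :", 2.0 ** (-2 * R))
```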

6.1 Introduction

It is clear to all source coding practitioners that computational complexity is of importance, and there is no question that a lot of effort has gone into optimizing certain calculations which are common in source coding; e.g., matrix multiplications, filtering operations, and discrete cosine transforms. But, at least amongst theoreticians, there seems to be a gap between the design of coding algorithms and their computational optimization. One occasionally finds discussions of computational trade-offs in papers on coding algorithms, but rarely finds precise numerical comparisons or justifications. Comparisons between algorithms should ideally be done by comparing their performances with a fixed computational budget. Since many coding methods are at least partially computation-scalable (for example, by changing the block size in block transform coding or vector quantization), this is a sensible objective.

This chapter describes a flexible framework for systematically studying the trade-off between computational complexity and coding performance. When specialized to a communication scenario, it yields operational computation–rate–distortion, to replace operational rate–distortion. The value of the framework itself is to provide a common vocabulary; it does not intrinsically aid in the analysis.

The bulk of the chapter is devoted to analyses of transform coding. Section 6.3.1 addresses the efficacy of the Karhunen–Loève transform (KLT) and the discrete cosine transform (DCT) for block transform coding of a Gauss–Markov source. Stochastic simulations are avoided through the use of reasonable approximations and explicit calculations. Section 6.3.2 considers the even more practical problem of JPEG encoding and decoding. Through the analysis of a set of simplified encoding and decoding algorithms, a precise characterization of an achievable set of rate–PSNR–complexity triples is found. ("Simplified" means that certain calculations that would be performed in a standard JPEG encoder/decoder are omitted even though this will generally impair the rate–distortion performance.) In particular, it is shown that for a fixed rate, the computation vs. distortion trade-off exhibits a diminishing returns characteristic, so the computation can be reduced somewhat with little effect on image quality. An application outside of source coding is briefly outlined in Appendix 6.A.

Before the introduction of the framework in Section 6.2, the following subsection reviews some related results from the information theory literature, and Section 6.1.2 gives a brief overview of complexity measures.

6.1.1 Performance versus Complexity in Information Theory

The extraordinary importance of channel capacity stems from the fact that it provides not just a bound on the rate of reliable communication, but, through the channel coding theorem [34, Thm. 8.7.1], a tight bound. The rate–distortion function is similarly tight; it is a lower bound on the achievable rate, and all higher rates can be achieved [13]. Thus, in his original paper [170], Shannon "solved" the communication problem, at least for memoryless sources.

The problem is that the random coding arguments used in proofs of these results suggest codes that are very difficult to implement. Specifically, each suggests a sequence of codes, each of which is an unstructured mapping from a sequence of source symbols to a sequence of channel symbols. Attaining distortion close to the distortion–rate function evaluated at the channel capacity requires the processing of long sequences. Unfortunately, the encoding and decoding operations require searches over sets with numbers of elements exponential in the sequence length, so performance near the theoretical bound comes with unrealistically high complexity.

The relationship between performance and complexity has been a central research topic in channel coding.1 Consider a system that uses a block code of length n and maximum likelihood (ML) decoding for communication over a memoryless channel. Gallager [58] showed that, when the code is chosen optimally, the probability of error decreases exponentially with the block length as

P (error) ≈ exp(−nE(R)).

E(R), called the error exponent, is a function of the rate R. It is positive for rates less than the capacity; thus, the probability of error can be made arbitrarily small at rates less than the capacity. Suppose the complexity is measured by the number of operations in the ML decoding. The decoding complexity increases exponentially with n and the probability of error decreases only subexponentially with the complexity. This subexponential rate inspired Forney's study of concatenated codes [55]. He showed that with the appropriate choice of inner and outer codes it is possible to have probability of error that decreases exponentially with complexity. The explosive popularity of turbo codes [16] is due to the efficacy of iterative decoding of certain concatenated codes. In particular, it is not just that the codes themselves are good for a given constraint length, but that the complexity versus performance trade-off is good.

Source encoding with an unstructured codebook is dual to maximum likelihood decoding of an unstructured channel code. Both involve searching over a codebook for a minimum distance element. In source encoding, the distance measure is specified by a fidelity (distortion) criterion, while in channel decoding the distance measure incorporates the channel model. Recalling that an "unstructured source code" could also be called an unstructured vector quantizer (UVQ), the duality reinforces that the encoding complexity of UVQ is exponential in the vector length, as shown in Table 1.1.

Much of the activity in VQ research is aimed at replacing UVQ with methods having lower encoding complexity. The lower complexity is accompanied by at least one of the following: a constraint on codebook structure that may preclude optimality, suboptimal encoding for a given codebook, and increased storage. Many such methods have been proposed and all but the most recent of the important variants are described in [60]. Tree-structured and tree-searched VQ are important and illustrative examples.

In tree-structured VQ (TSVQ) [23], exhaustive search over the full codebook is replaced by a sequential encoding operation. For convenience, neglect the possibility of entropy coding and assume a balanced, binary tree, though any tree could be used.2 Encoding an n-dimensional source at rate R bits per component involves the use of a binary decision tree of depth nR. Each of the nodes is labeled with an n-tuple and the 2^{nR} leaves correspond to the codewords. The mapping of a source vector to a codeword is completed by traversing the tree from the root to a leaf, making nR binary decisions. The decision at each node is to choose the child node whose label is closer to the source vector. Comparing to Table 1.1, the space complexity is doubled because each intermediate node is labeled with a vector, but the number of distance calculations is reduced from 2^{nR} to 2nR. While TSVQ does not constrain the locations of the codewords, it suffers from suboptimal encoding: source vectors are not necessarily mapped to the nearest codeword. Still, TSVQ is popular in practice because the performance loss is offset by the lower encoding complexity. Designing an optimal decision tree is an NP-complete problem [101], so TSVQ design generally follows one of two heuristic strategies: tree-growing [130, 162] or tree-pruning [29]. Tree-pruning will be discussed further in Section 6.3.3, where joint optimization with respect

1 The summary here follows [56]. See [56] and references therein for more details.
2 Decision trees are described fully in [19]. Only cursory familiarity is needed here.

to rate, distortion, and encoding complexity will be considered. Though UVQ is the subject of many analytical studies, [143] and [122] are notable as two of the few papers to analytically predict or bound the performance of TSVQ.

In tree-searched VQ (TS-UVQ),3 full search over the codebook is replaced without sacrificing optimality of encoding. The encoding process combines the use of a binary decision tree with a final search. First the tree is followed from the root to a leaf. As in TSVQ, at each intermediate (non-leaf) node, the branch traversed depends on which side of a hyperplane the source vector lies. Each leaf has associated with it not just a single codeword but a set of codewords, called a bucket. The encoding is completed by a full search over the codewords in the bucket. Viewing the decision tree as a TSVQ, the bucket associated with a leaf contains all the codewords whose Voronoi cells intersect the cell of the TSVQ. The design of a TS-UVQ encoder is subtle. If one insists that each bucket contain a single codeword, eliminating the search step, then the average depth of the tree will be high—perhaps much higher than the logarithm of the number of codewords.4 On the other hand, little is gained from the decision tree if its depth is too low. An effective design method and many earlier methods are described in [155].

Tree-searching was used in a very interesting study by Moayeri and Neuhoff [135]. They showed that in TS-UVQ and tree-based fine–coarse VQ (which will not be described here), time- and space-complexity can be traded off while approximately maintaining a particular operational rate–distortion performance point. Performance versus memory trade-offs appear also in universal lossless coding [213, 232].

Before turning to a discussion of complexity measures, a final note is in order. The closest analogy in source coding to the error exponent result for channel coding would be to look at the best possible performance of a source coder that acts independently on blocks of length n. This is a standard concept in rate–distortion theory; the bounding functions are denoted R_n(D) and D_n(R) in [13]. Results on the convergence of R_n(D) to R(D) are summarized in [13, Section 6.2.1].
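To make the TSVQ encoding operation concrete, here is a small sketch (Python/numpy). The tree-growing rule used here, recursive two-means splitting of a training set, is only one of the heuristics mentioned above, and the dimension, rate, and training size are illustrative assumptions; the point is that encoding costs nR binary decisions rather than a search over 2^{nR} codewords.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_means(data, iters=20):
    """Split a training set into two cells with the generalized Lloyd algorithm."""
    c = data[rng.choice(len(data), 2, replace=False)].copy()
    for _ in range(iters):
        assign = ((data[:, None, :] - c[None]) ** 2).sum(-1).argmin(1)
        for k in (0, 1):
            if np.any(assign == k):
                c[k] = data[assign == k].mean(0)
    return c, assign

def grow_tree(data, depth):
    """Balanced TSVQ tree grown by recursive two-means splitting."""
    if depth == 0 or len(data) < 4:
        return {"leaf": data.mean(0)}
    c, assign = two_means(data)
    if assign.min() == assign.max():           # degenerate split: stop early
        return {"leaf": data.mean(0)}
    return {"labels": c,
            "children": [grow_tree(data[assign == k], depth - 1) for k in (0, 1)]}

def encode(node, x):
    """nR binary decisions: descend to the child whose label is closer to x."""
    while "leaf" not in node:
        k = ((node["labels"] - x) ** 2).sum(1).argmin()
        node = node["children"][k]
    return node["leaf"]

n, R = 4, 2                                    # dimension and rate: tree depth nR = 8, 256 leaves
tree = grow_tree(rng.standard_normal((20000, n)), n * R)
test = rng.standard_normal((2000, n))
errs = [np.sum((x - encode(tree, x)) ** 2) for x in test]
print("TSVQ distortion per component:", np.mean(errs) / n)
```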

6.1.2 Complexity Measures

Without the benefit of any formal definitions, the previous section made declarations regarding the complexities of various algorithms. Complexity was measured with the intuitive notion of counting “elementary” operations (arithmetic operations, comparisons, branches) in a sequential computation. This form of complexity will be used routinely in this chapter, but many other formal concepts of complexity have been developed.

Multiplicative complexity The field of multiplicative complexity is based on minimizing the number of multiplications and divisions in computing a function for an arbitrary input. During the peak of this field in the 1970’s and ’80’s, counting multiplications and divisions while ignoring additions and subtractions was justified by the difference in computation times for these operations on the hardware of the day. In addition, the framework was chosen to facilitate the advance of the theory. Many tight lower bounds have been proven, and, of course, any working algorithm yields an upper bound on complexity. Several of the key results in this field are due to Winograd [207]. In multiplicative or arithmetic complexity theory, it is typical to consider the computation of quantities

3 This abbreviation, following [135], emphasizes that the result of the encoding is equivalent to full-search UVQ.
4 If it is not clear that it is possible for each bucket to contain a single codeword, see [60, Figures 10.10–11].

of the form5

ψ_k = \sum_{j=1}^{s} \sum_{i=1}^{r} a_{ijk} x_i y_j,   k = 1, 2, ..., t,   (6.1)

where the x_i’s and y_j’s denote arbitrary real input data and the a_{ijk}’s are real constants independent of the data. In general this is called a bilinear form; if the y_j’s all equal one, it is a linear form. The problem is to compute (6.1) with a minimum number of general multiplications and divisions (m/d steps). An m/d step is any field operation (+, −, ×, /; denoted ◦) other than:

• b ± c; or

• b ◦ c with b and c both independent of the indeterminates {x_i} ∪ {y_j}; or

• b × c with b ∈ Q (the rational numbers).

In essence, only multiplications and divisions where both operands are general (i.e., not necessarily rational) are counted, except those that could be precomputed independently of the input data. The justification for ignoring rational multiplications is that they can be computed as a sequence of additions and a final rescaling of the problem. A sense of the style of multiplicative complexity theory is revealed by looking at the multiplication of two 2 × 2 matrices. The product

\begin{bmatrix} ψ_{11} & ψ_{12} \\ ψ_{21} & ψ_{22} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} \begin{bmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{bmatrix}

can be written as

ψ_{11} = x_{11} y_{11} + x_{12} y_{21},
ψ_{12} = x_{11} y_{12} + x_{12} y_{22},
ψ_{21} = x_{21} y_{11} + x_{22} y_{21},
ψ_{22} = x_{21} y_{12} + x_{22} y_{22},

showing that it can be computed with eight multiplications and four additions. Strassen’s algorithm [181] is a clever rearrangement, with some reuse of intermediate answers, that computes the same result with only seven multiplications:

h_1 = (x_{12} − x_{22})(y_{21} + y_{22}),
h_2 = (x_{11} + x_{22})(y_{11} + y_{22}),
h_3 = (x_{11} − x_{21})(y_{11} + y_{12}),
h_4 = (x_{11} + x_{12}) y_{22},
h_5 = x_{11}(y_{12} − y_{22}),
h_6 = x_{22}(y_{21} − y_{11}),
h_7 = (x_{21} + x_{22}) y_{11},

5 Notation is taken from [207].

Microprocessor        MIPS R3010   Weitek 3364   TI 8847
Cycles/add                 2            2            2
Cycles/mult                5            2            3
Cycles/divide             19           17           11
Cycles/square root         –           30           14

Table 6.1: Summary of relative execution times for arithmetic operations on three floating-point microprocessors. Reproduced from [95].

ψ_{11} = h_1 + h_2 − h_4 + h_6,
ψ_{12} = h_4 + h_5,
ψ_{21} = h_6 + h_7,
ψ_{22} = h_2 − h_3 + h_5 − h_7.
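As a quick numerical check of the seven-multiplication construction (a verification sketch only, not part of the original text), h_1 through h_7 and the four recombinations can be evaluated for random matrices and compared against the direct product:

```python
import numpy as np

def strassen_2x2(x, y):
    """Compute the 2x2 product via Strassen's seven multiplications."""
    (x11, x12), (x21, x22) = x
    (y11, y12), (y21, y22) = y
    h1 = (x12 - x22) * (y21 + y22)
    h2 = (x11 + x22) * (y11 + y22)
    h3 = (x11 - x21) * (y11 + y12)
    h4 = (x11 + x12) * y22
    h5 = x11 * (y12 - y22)
    h6 = x22 * (y21 - y11)
    h7 = (x21 + x22) * y11
    return np.array([[h1 + h2 - h4 + h6, h4 + h5],
                     [h6 + h7,           h2 - h3 + h5 - h7]])

x, y = np.random.randn(2, 2), np.random.randn(2, 2)
assert np.allclose(strassen_2x2(x, y), x @ y)  # agrees with the direct product
```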

It can be shown that the number of multiplications cannot be reduced below seven. The following two results are used in Section 6.3.1.1.

Proposition 6.1 If n = 2^k, k ∈ Z, the multiplicative complexity of multiplying two n × n matrices is at most 7^k = n^{\log_2 7}.

Proof: Strassen’s algorithm as described above does not require commutativity, and hence can be applied to multiplication of block matrices. Recursively halving the size of the n × n matrix multiplication problem and using Strassen’s algorithm gives a method with 7^k multiplications. The number of additions also grows only as n^{\log_2 7} [18]. □

Proposition 6.2 If n = 2^k, k ∈ Z, the multiplicative complexity of computing a length-n DCT is 2^{k+1} − k − 2.

Proof: No elementary proof is known; see [93]. □

Note that Proposition 6.1 is not tight for large n. Denoting the minimum complexity of multiplying n × n matrices by O(n^ω), ω is called the exponent of matrix multiplication, and determining upper bounds for the exponent has received considerable attention. The current “world record” of ω < 2.38 was obtained by Coppersmith and Winograd; see [22] for details. Determination of the minimum exponent is a formidable research area, tangential to this chapter. Strassen’s algorithm was exhibited as a concrete example of an algorithm with exponent less than three. On current hardware, the execution time difference between an addition and a multiplication is not very large (see Table 6.1). Moreover, actual execution time depends on memory access times, cache hit rate, instruction and data pipelining, and many other factors [95]. A further shortcoming of multiplicative complexity theory is that it concerns only exact answers computed from exact data in exact arithmetic.

Kolmogorov and Chaitin complexities Multiplicative complexity is intended to reflect the minimum time needed to compute a function on a serial computer. An alternative is to gauge the length of the shortest program that computes a desired result. Complexity metrics of this type were developed by Kolmogorov [110, 111] and Chaitin [25, 26]. We need not look any further than the matrix multiplication example above to see that this can give completely different relative complexities. Although we would have to agree on a syntax for programs, it is clear that Strassen’s algorithm would require a longer program than the straightforward calculation with eight multiplications. Similarly, one would expect TS-UVQ and TSVQ to require longer programs than full-searched UVQ, even though the programs may terminate more quickly. Complexity based on program length also allows one to define a quantity like Shannon’s entropy for deterministic strings. This type of complexity measure will not be pursued further in this chapter, in part because it is not suitable for operational optimization. The reader is referred to [121, 150, 235], in addition to the references above, for details.

Other computational models Let us note again that multiplicative complexity is reasonably well-suited to computations that are completed through a sequence of calculations on current serial computing hardware. Another concrete computational model that has been studied extensively is VLSI [197]. It has been shown that the fundamental limits to the computing power of a VLSI circuit depend on the product of the area of the circuit and the square of the time the circuit is allowed to compute. The computational model matters, as evidenced by comparing the sorting of N elements and a length N DFT. Both are typically considered to have O(N log N) complexity,6 but their VLSI complexities are Ω(N^2 log N) and Ω(N^2 log^2 N), respectively [186, 187].7 It is completely possible that new computational models—like quantum, optical, or biochemical computing—will turn upside down our view of easy and hard computations. Thus the best we can hope for is to optimize within the constraints of a particular computational model. This is the approach taken in the sequel.

6.2 An Abstract Framework

With a few ideas of how to measure computational complexity, we are now ready to establish a formal framework for optimization. Let P be a set of computational problems which are posed according to some underlying probability distribution and let ρ be a distortion measure on approximate solutions to problems in P. Suppose also there is a computational cost function c : A × P → R^+ on algorithms for (approximately) solving P ∈ P, where A is a set of such algorithms. Then define the distortion–computation function of algorithms A for problems P by

D(C) = \inf_{\{A ∈ A \,:\, E\,c(A,P) ≤ C\}} E\,ρ(P, A(P)).   (6.2)

In the context of source coding, we can specialize the definition. Consider the problem of finding a variable-length, approximate representation of a source with expected length bounded above by R. Denote the source and the reproduction by x and x̂, respectively. Then, define the distortion–computation function at rate R by

D_R(C) = \inf_{\{A ∈ A \,:\, E\,c(A,x) ≤ C,\; E\,ℓ(x̂) ≤ R\}} E\,ρ(x, x̂),   (6.3)

6 See [108] for the complexity of sorting.
7 The Ω( · ) notation means “grows at least as fast as.”

where ℓ(x̂) is the length of the representation of x̂ in bits. Notice that in contrast to the definition of a rate distortion function [34], we do not use the mutual information between x and x̂. Doing so would implicitly assume that the entropy coding of x̂ is ideal; instead, we would like to leave open the possibility that the entropy coding is included in the computational cost. Varying the parameter R yields a computation–rate–distortion surface.

A few properties of these functions are obvious: D(C) must be nonincreasing and D_R(C) must be nonincreasing with respect to both R and C. If the set of algorithms produces a discrete set of complexities, as is the case in Section 6.3.2, then D(C) may be only piecewise continuous. If we allow time- or probabilistic multiplexing, then operational D(C) and D_R(C) can always be made convex. The optimization of a system with independent units that each contribute additively to the computation and distortion is also obvious; if all the needed slopes are available, a Lagrangian solution can be used.
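For a finite, discrete set of algorithms, the operational D(C) is simply the lower staircase of the measured (complexity, distortion) pairs. The sketch below computes that staircase for hypothetical measurements; it merely illustrates the definition and is not a tool used in the thesis.

```python
def operational_DC(points):
    """Lower staircase of (complexity, distortion) points.

    points: list of (C, D) pairs, one per candidate algorithm.
    Returns the non-dominated pairs (no other algorithm has both lower
    complexity and lower distortion), sorted by increasing complexity.
    """
    frontier, best_d = [], float("inf")
    for c, d in sorted(points):          # sort by complexity, then distortion
        if d < best_d:                   # keep only strict distortion improvements
            frontier.append((c, d))
            best_d = d
    return frontier

# Hypothetical measured (multiplications/sample, MSE/sample) pairs.
print(operational_DC([(1.0, 0.30), (2.0, 0.21), (3.0, 0.23), (5.0, 0.18)]))
```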

6.3 Applications to Source Coding

This section presents three applications of distortion–computation analysis and optimization. The first application is a detailed comparison between the KLT and DCT for transform coding of a Gauss–Markov source. In this same context, Gormish and Gill [65] made earlier mention of the concept of a computation–rate–distortion surface. Their analysis is similar to the one presented here, but less detailed. The other applications are to JPEG image encoding and pruned tree-structured vector quantization encoding.

6.3.1 Autoregressive Sources: KLT vs. DCT

The usual justification of using the DCT rather than the KLT in transform coding is that the DCT is a fixed transform which can be implemented with a fast algorithm; thus, even if the KLT would give better rate–distortion performance than the DCT for a fixed block size n, the DCT may be preferable because it makes larger values of n feasible. The distortion–computation function approach allows a precise characterization of this trade-off. As an example, we will consider the transform coding (with scalar quantization) of a Gaussian first-order autoregressive source X with correlation coefficient α, i.e., a source with autocorrelation sequence r_X(m) = α^{|m|}. Distortion is measured by MSE per sample. First, algorithms based on an exactly computed transform followed by quantization are considered, and complexity is measured by the minimum number of general multiplications.8 This case allows for many precise statements but is of limited practical consequence. A more relevant comparison uses the numbers of multiplications in typical implementations. A scenario with variable-precision KLT computation is also considered.

6.3.1.1 Coding with exact computations

Computing distortion As a preliminary to finding operational distortion–computation functions for DCT and KLT coding, we first study the D(R) performance of these methods as the block size n is varied. This step relies on certain assumptions about the coding process, namely in relation to the bit allocation and design of the

8 See Section 6.1.2.

scalar quantizers. No assumptions about the computational model are needed. For comparison, the performance attained without any transform and the optimal performance attainable for any method that utilizes correlation only within n-tuples are exhibited.9

Denote the KLT for block size n by T_n, i.e., T_n R_X T_n^T = Λ, where Λ is a diagonal matrix with nonincreasing entries. Let U denote a DCT matrix given elementwise by10

u_{ij} = \begin{cases} \sqrt{1/n}\, \cos\!\left(\tfrac{π}{n}(i−1)(j−\tfrac{1}{2})\right), & i = 1, \\ \sqrt{2/n}\, \cos\!\left(\tfrac{π}{n}(i−1)(j−\tfrac{1}{2})\right), & i = 2, 3, \ldots, n. \end{cases}

Since the quantization is scalar, the performance depends only on the transform coefficient variances, which are given (without regard to ordering) by (λ_1, λ_2, ..., λ_n) = diag(T_n R_X T_n^T) and (µ_1, µ_2, ..., µ_n) = diag(U R_X U^T) for KLT and DCT coding, respectively. For large n, the approximation

λ_k ≈ µ_k ≈ S_X(2πk/n)   (6.4)

holds, where

S_X(ω) = \frac{1 − α^2}{1 − 2α \cos ω + α^2}

is the power spectral density of X [87, 83]. This approximation, however, dismisses the coding gain difference between the KLT and the DCT and obscures the dependence on n, so in the remainder of the chapter the exact values of the λ_k’s and µ_k’s are used.

Using high rate approximations and assuming optimal scalar quantization leads to the bit allocation (1.9), but by abandoning high rate approximations one can obtain more precise and realistic results. In particular, instead of using approximate expressions for optimal bit allocations, the distortions given in this section follow from using nonnegative integer bit allocation obtained with a greedy algorithm [60, §8.4] and uniform quantization with optimal loading. The results are shown in Figure 6.1, along with the performance obtained with no transform and D_n(R). Note that for α = 0.9, the performance of KLT coding is virtually indistinguishable from that of DCT coding. On the other hand, the performance gap is significant for α = −0.9, although as in the previous case, the DCT is asymptotically equivalent to the KLT. Note also that the D_n(R) curve is based on using a transform of length n and then coding each of n coefficient streams at the rate–distortion bound. This explains the gap between D_n(R) and the other curves even for n = 1.
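The quantities entering this comparison are easy to reproduce numerically. The sketch below (an illustration under the stated AR(1) model, not the coder used to produce Figure 6.1) forms R_X, builds the DCT-II matrix U defined above, and extracts the coefficient variances λ_k and µ_k, with an eigendecomposition standing in for the KLT.

```python
import numpy as np

def ar1_covariance(n, alpha):
    """Covariance matrix R_X with entries alpha^{|i-j|}."""
    idx = np.arange(n)
    return alpha ** np.abs(idx[:, None] - idx[None, :])

def dct_ii_matrix(n):
    """DCT-II matrix U with u_ij as defined in the text (1-based indices)."""
    i = np.arange(1, n + 1)[:, None]
    j = np.arange(1, n + 1)[None, :]
    u = np.sqrt(2.0 / n) * np.cos(np.pi / n * (i - 1) * (j - 0.5))
    u[0, :] = np.sqrt(1.0 / n)            # first row uses the 1/sqrt(n) scaling
    return u

n, alpha = 8, 0.9
R = ar1_covariance(n, alpha)
lam = np.linalg.eigvalsh(R)               # KLT coefficient variances (lambda_k)
U = dct_ii_matrix(n)
mu = np.diag(U @ R @ U.T)                 # DCT coefficient variances (mu_k)
print(sorted(lam, reverse=True), sorted(mu, reverse=True))
```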

Estimating computational load Consider first the computational complexity measure given by the number of multiplications between arbitrary real numbers per input sample. This model is familiar because of its connection to Winograd convolution algorithms and is described in Section 6.1.2. The computation of any square linear transform of size n can be viewed as a multiplication between an n × n matrix and an n × 1 vector. Since the KLT does not in general have a structure conducive to a fast algorithm, the KLT algorithms that minimize the number of multiplications are simply those that use the most efficient matrix multiplication techniques. For convenience, assume that n = 2^k, k ∈ Z^+. To code n vectors at a time would entail multiplying pairs of n × n matrices, which can be done with n^{\log_2 7} multiplications

9 The latter quantities are R_n(D) points as defined in [13].
10 This is the “original” DCT first reported in [3] and classified as DCT-II in [159].

[Figure 6.1: two panels plotting Distortion (MSE/sample) versus block size n (logarithmic scale), with curves for no transform, KLT, DCT, and D_n(R); (a) source correlation coefficient α = 0.9, (b) source correlation coefficient α = −0.9.]

Figure 6.1: Operational D(n) for DCT and KLT coding of first-order autoregressive sources at rate 0.5 bits/sample. This is based on the use of greedy nonnegative integer bit allocation and optimally loaded uniform scalar quantizers. Performance with no transform and a theoretical bound are also shown. Note that the block size is given on a logarithmic scale.

using Strassen’s method (see Proposition 6.1). Normalizing by n^2, the number of samples transformed in each multiplication, gives a multiplicative complexity of

C_{KLT} = n^{(\log_2 7) − 2} multiplies/sample.   (6.5)

On the other hand, the special structure of the DCT allows calculations to be done much more efficiently than as a general matrix multiplication, especially when n is a power of two. The minimum number of multiplications to compute a length-n DCT is 2n − \log_2 n − 2 (see Proposition 6.2). Normalizing gives

C_{DCT} = 2 − \frac{1}{n}(\log_2 n + 2) multiplies/sample.   (6.6)

When using moderate block sizes and typical computer architectures, algorithms that minimize the number of multiplications are generally not efficient. For example, while Strassen’s algorithm for multiplying a pair of 2 × 2 matrices uses only 7 multiplications (instead of the usual 8), it increases the number of additions from 4 to 18. Similarly, DCT algorithms that have very low numbers of multiplications tend to have more additions and more complicated data flow. Therefore we would like to also compare multiplicative complexity for typical implementations of the KLT and DCT. Computing the KLT using typical matrix–vector multiplication requires n^2 multiplications. If m is the number of coefficients allocated at least one bit, only those coefficients need be computed, giving m multiplications per sample. At high rates and for large n,

m/n ≈ ω^*/π,

where ω^* ∈ [0, π) is a solution of (1 − α)^2 2^{2R} = 1 − 2α \cos ω + α^2. Since this approximation masks the difference in energy compaction between the KLT and the DCT, explicitly computed greedy nonnegative integer bit allocations are instead used in the following. When n is a power of two, one possible implementation of the DCT (which does not have an inordinate number of additions) has \tfrac{1}{2} n \log_2 n multiplications [159]. To maintain an analogy with the multiplication count for the KLT, we should consider pruned computations that determine only the DCT coefficients with positive bit allocations. The complexities of pruned DCT algorithms are not easily captured in a single expression, so we use the following simple bound:

C_{DCT} = \min\{m, \tfrac{1}{2} \log_2 n\} multiplies/sample,   (6.7)

where as above m is the number of coefficients allocated at least one bit. This reflects the strategy of using a matrix multiplication when it is more efficient than a full DCT.
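For reference, (6.5)–(6.7) are direct functions of the block size n and, for the pruned case, of the number m of coefficients given bits; a literal transcription follows (the sample block sizes and the choice m = n/4 are arbitrary).

```python
import numpy as np

def c_klt_exact(n):
    """(6.5): Strassen-based KLT cost, multiplies per sample."""
    return n ** (np.log2(7) - 2)

def c_dct_exact(n):
    """(6.6): minimum-multiplication DCT cost, multiplies per sample."""
    return 2 - (np.log2(n) + 2) / n

def c_dct_typical(n, m):
    """(6.7): pruned typical DCT cost, m = coefficients allocated bits."""
    return min(m, 0.5 * np.log2(n))

for n in (8, 64, 512):
    print(n, c_klt_exact(n), c_dct_exact(n), c_dct_typical(n, m=n // 4))
```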

Operational distortion–computation Combining the D(n) and C(n) expressions derived above gives parametric descriptions of the operational D(C) for KLT and DCT coding. Figure 6.2(a) shows the operational curves D_{KLT}(C) and D_{DCT}(C) for coding at 0.5 bits/sample when computation is estimated using (6.5)–(6.6); Figure 6.2(b) shows the corresponding curves when computation is estimated from the typical implementations described above.

Limitation of the computational model The calculations made thus far pertain only to operational D(C) for various implementations of two particular transform methods. The true distortion–computation function over the class of algorithms that follows a linear transform with scalar quantization—where complexity is measured by the number of general multiplications—has a very simple form. For any block size n, one can approximate the KLT of the source by a rational matrix. Since multiplication by rational numbers has no cost, the computational complexity of using this transform is zero. Making n arbitrarily large and using an arbitrarily good approximation of the KLT gives that D(C) is a constant for all C, with distortion given by the infimum of distortions over all transform methods. The infeasibility of using the coding strategy described above highlights the importance of having a good computational complexity metric. A good metric reflects the actual cost in some application environment; e.g., execution time with particular hardware, or hardware costs to meet certain performance specifications. Yet at the same time, when the set of algorithms is limited to practical schemes as in Figure 6.2(b), this framework provides a reasonable comparison between algorithms.

6.3.1.2 Coding with finite precision computations

The second set of computational complexity calculations in Section 6.3.1.1 used the fact that transform coefficients that are not allocated any bits need not be calculated. This can be viewed as a special case of the

[Figure 6.2: two panels plotting Distortion (MSE/sample in dB) versus Computation (multiplications/sample, logarithmic scale), with curves for no transform, DCT (α = −0.9 and α = 0.9), and KLT (α = −0.9, α = 0.9).]

Figure 6.2: Operational D(C) for DCT and KLT coding of first-order autoregressive sources with correlation coefficients α = −0.9 and α = 0.9 at rate 0.5 bits/sample. Block sizes are limited to powers of two and are given on a logarithmic scale. (a) The complexity is based on implementations that minimize the number of multiplications. (b) The complexity is based on typical implementations without inordinate numbers of additions.

principle that transform coefficient calculations need only be accurate enough to determine proper quantized values.11 More generally, it may not be important to compute a transform coefficient with high accuracy if it will be quantized coarsely. This section presents an analysis of the relative benefits of spending computational resources on accurate transform coefficient calculation and on increased block lengths. Consider a KLT-based coding system which uses nonnegative integer bit allocation. As before, denote by m the number of transform coefficients that have positive bit allocations. If the ith transform coefficient is computed with a B_i-bit mantissa, a reasonable cost function for a hardware implementation is

C = \sum_{i=1}^{m} \left[ B_i^2 + B_i\left(1 − \tfrac{1}{n}\right) \right].

This is based on costs of B^2 and B for each B-bit multiplication and addition, respectively. Modeling each roundoff as the addition of uniformly distributed noise and using the central limit theorem leads to the approximation ŷ_i = y_i(1 + δ_i) for each computed transform coefficient, where y_i is the exact transform coefficient and δ_i ∼ N(0, 2^{−2B_i}/(3n)). An optimization was performed based on the additional assumption that the quantization error is independent of the errors from finite precision computations. The resulting operational D(C) is as shown in Figure 6.3.
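A minimal transcription of this cost and noise model, under the reading of the variance as 2^{−2B_i}/(3n) given above, might look as follows; the mantissa widths are hypothetical and the joint optimization that produced Figure 6.3 is not shown.

```python
import numpy as np

def precision_cost(B, n):
    """Cost of computing coefficients with B_i-bit mantissas:
    sum of B_i^2 + B_i*(1 - 1/n), per the hardware model above."""
    B = np.asarray(B, dtype=float)
    return np.sum(B ** 2 + B * (1.0 - 1.0 / n))

def roundoff_relative_errors(B, n, rng=np.random.default_rng(0)):
    """Draw delta_i ~ N(0, 2^{-2 B_i}/(3n)), so y_hat_i = y_i*(1 + delta_i)."""
    B = np.asarray(B, dtype=float)
    sigma = np.sqrt(2.0 ** (-2 * B) / (3.0 * n))
    return rng.normal(0.0, sigma)

B = [10, 8, 6, 4]        # hypothetical mantissa widths for m = 4 coefficients
print(precision_cost(B, n=16), roundoff_relative_errors(B, n=16))
```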

6.3.2 JPEG Encoding with Approximate DCT

In a typical transform image coder, there is no explicit bit allocation. Instead, zigzag scanning and run length entropy coding cause the average bit rate attributable to high-frequency coefficients to be low, without

11Obviously, determining which of a finite set of intervals a transform coefficient lies in cannot be harder than calculating the coefficient precisely.

[Figure 6.3: Distortion (MSE/sample) versus Computation per sample (logarithmic scale), with curves at 1 bit/sample and 2 bits/sample.]

Figure 6.3: Operational D(C) for transform coding of a first-order autoregressive source with correlation coefficient α = 0.9 in which the block length and computational precisions of each transform coefficient are jointly optimized.

forcing the allocation of no bits. This precludes the computational optimizations of the previous section. Is there any point in spending computational resources to compute a transform coefficient that will likely be quantized to zero, even if occasionally it would have a nonzero quantized value? This question is explored in this section in the context of JPEG encoding.

6.3.2.1 Analysis

Of the major steps in JPEG coding (DCT, scalar quantization, zigzag scanning, run length coding, and entropy coding), only the computation of the DCT seems to be computation scalable; thus, let us focus on this step. P, A, ρ, and c must be explicitly defined in order to use the computation–rate–distortion framework:

P = JPEG-compatible encoding at R bits/pixel.
A = Approximate DCT followed by standard JPEG quantization and entropy coding. The approximate DCT is described below.
ρ = MSE.
c = Number of multiplications per block in the approximate DCT computation.

The set of approximate DCT algorithms are algorithms that compute a subset of the DCT coefficients and assume that the remaining coefficients are zero. The image blocks in JPEG are 8 × 8, so there are 2^{64} approximate algorithms. Because of the frequency domain characteristics of natural images, it is clear that only a small number of the possible algorithms are of interest. In this investigation, the set of calculated coefficients is limited to be a triangle or rectangle of low frequency coefficients, as shown in Figure 6.4. The use of the rectangular regions is motivated by the trade-off between the number of coefficients calculated and the computational complexity. Triangular regions are motivated by the zigzag scanning of AC coefficients. The DCT calculations are done separably using output-pruned decimation-in-frequency 1-D DCT algorithms. These are decimation-in-frequency length-8 DCT algorithms [159] which are simplified because only

Figure 6.4: The DCT coefficients computed form either a triangle or rectangle. The lowest frequencies are in the upper left.


Figure 6.5: Output-pruned signal flow graph for computing the first two coefficients of a length-8 DCT. The full signal flow graph is from [159, p. 61] and represents a decimation-in-frequency algorithm. (Boxes represent multiplications other than by 1 and subtractions are not distinguished from additions.) The dotted curves represent calculations that can be eliminated because we desire only X[0] and X[1]. The computational complexity is reduced from 12 multiplications and 29 additions for the full calculation to 7 multiplications and 20 additions.

a subset of the eight coefficients are desired. Figure 6.5 shows an output-pruned DCT where only the first two coefficients (X[0] and X[1]) are desired. Because of the shapes of the regions considered (see Figure 6.4), we always require the k lowest frequency DCT coefficients, 1 ≤ k ≤ 8. The computational complexities for these output-pruned 1-D DCTs are given in Table 6.2. From Table 6.2 it might seem that significant computational savings are achieved only if, say, fewer than three of eight DCT coefficients are calculated; but a large fraction of the savings comes from the 2-D separable nature of the computation. For example, computing the 4 × 4 block of lowest frequency coefficients requires eight horizontal DCTs from which the first four coefficients are desired (8 times 11 multiplications) and four vertical DCTs from which the first four coefficients are desired (4 times 11 multiplications) for a total of 132 multiplications. This is roughly two-thirds of the 192 multiplications for a full 8 × 8 DCT. The situation is very similar for decoding. If we assume that the DCT coefficients outside of a certain region are zero, we can use input-pruned 1-D inverse DCT algorithms. A signal flow graph for one such algorithm is shown in Figure 6.6.
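The separable bookkeeping in the preceding paragraph is easy to mechanize. The sketch below uses the output-pruned counts of Table 6.2 for a k_h × k_v rectangle of low-frequency coefficients and reproduces the 132-versus-192 comparison; it is accounting only, not an implementation of the pruned transforms.

```python
# Multiplications for the first k outputs of an output-pruned length-8 DCT
# (Table 6.2, indexed by k = 1..8).
PRUNED_MULTS = {1: 0, 2: 7, 3: 10, 4: 11, 5: 12, 6: 12, 7: 12, 8: 12}

def rect_block_mults(kh, kv):
    """Multiplications per 8x8 block to compute a kh-by-kv rectangle of
    low-frequency 2-D DCT coefficients, done separably: 8 row DCTs pruned
    to kh outputs, then kh column DCTs pruned to kv outputs."""
    return 8 * PRUNED_MULTS[kh] + kh * PRUNED_MULTS[kv]

print(rect_block_mults(4, 4))   # 132, versus 192 for the full 8x8 DCT
print(rect_block_mults(8, 8))   # 192
```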

6.3.2.2 Results

A large number of computation–rate–distortion points were determined for the standard Lena image [117] using approximate DCT calculations as described. The remaining components (quantization, zigzag

k   multiplications   additions
1          0              7
2          7             20
3         10             25
4         11             27
5         12             29
6         12             29
7         12             29
8         12             29

Table 6.2: Number of multiplications and additions to compute the first k coefficients of a length-8 DCT using algorithms designed through output-pruning.


Figure 6.6: Input-pruned signal flow graph for a length-8 inverse DCT where the last five input coefficients are zero. The full signal flow graph is the inverse of Figure 6.5. Dotted curves represent calculations that can be eliminated because the result is known to be zero. The computational complexity is reduced from 12 multiplications and 29 additions for the full calculation to 10 multiplications and 25 additions.

[Figure 6.7: surface plot of PSNR (dB) versus Mult/block and Rate (bits/pixel).]

Figure 6.7: Operational C − R − D for JPEG encoding of Lena with approximate DCT algorithms designed through output pruning.

scanning, and entropy coding) were implemented as in an Independent JPEG Group encoder12 with “quality factors” 5, 10, . . . , 95, and 99. The points with rates less than 2 bits per pixel which are presumably on the computation–rate–distortion surface are shown in Figure 6.7, where the connected points are at the same complexity.13 Figure 6.8(a) shows the computation–distortion for several rates. For low- to moderate-rate coding, the distortion as a function of computation becomes very flat. For example, at 0.5 bits/pixel, one can have a 19% complexity reduction while lowering the PSNR by less than 0.1 dB, or have a 30% complexity reduction while lowering the PSNR by less than 0.4 dB. Similar results are shown in Figure 6.8(b) for the baboon image.

6.3.2.3 Variations and related methods

Not computing certain coefficients and setting them to zero seems a very rudimentary way to produce an approximate DCT algorithm, but it works reasonably well for this application. Because of the run length and entropy coding used in JPEG, even when a high-frequency DCT coefficient has a nonzero quantized value, coding that coefficient (as opposed to rounding it to zero) may not be wise in a rate–distortion sense [156]. The approximate DCT considered here forces longer runs of zeros and hence gets good coding efficiency. Output pruning is not the optimal way to produce the desired 1-D DCT algorithms, but was done for conceptual transparency and so that the set of algorithms A could be precisely defined. Lengwehasatit [118] has provided the operation counts for hand-optimized algorithms which assume integer valued inputs and use some bit shifting. These operation counts are given in Table 6.3 and yield the D(C) curves in Figure 6.9. The complexity reductions in Figure 6.9 are even more dramatic than those in Figure 6.8. At 0.5 bits/pixel, 0.1 dB and 0.4 dB PSNR losses occur with 38% and 46% reductions in complexity, respectively. One approach to improving on the results presented here is to use a variable complexity algorithm (VCA). The algorithms we have discussed process each block identically, regardless of how many DCT coefficients of the block are zero. For decoding, blocks with many zero coefficients are clearly easiest to handle, assuming that they can be identified efficiently. The design of such input-dependent algorithms is generally done in an ad hoc fashion. A notable exception is the work of Lengwehasatit and Ortega [119, 120]. They have developed

12 To download software, follow links from http://www.ijg.org/.
13 These are “presumably” the best operating points because only a subset of the 2^{64} algorithms in A were tested.

[Figure 6.8: PSNR (dB) versus Multiplications (fraction of full calculation) at 0.50, 0.75, 1.00, and 1.25 bpp; (a) Lena, (b) baboon.]

Figure 6.8: Operational D(C) at various bit rates for JPEG encoding with approximate DCT algorithms designed through output pruning.

k   multiplications   additions
1          0              7
2          4             17
3          6             21
4          8             25
5          8             26
6          9             27
7         10             28
8         11             29

Table 6.3: Operation counts to compute the first k coefficients of a length-8 DCT [118].

[Figure 6.9: PSNR (dB) versus Multiplications (fraction of full calculation) at 0.50, 0.75, 1.00, and 1.25 bpp.]

Figure 6.9: Operational D(C) at various bit rates for JPEG encoding of Lena with computation counts from Table 6.3.

a decoder which classifies blocks based on their patterns of zero and nonzero coefficients and uses inverse DCT algorithms optimized for each class. The definitions of the classes themselves have been computationally optimized, including the cost of classifying. In a sense, they have developed a compiler which optimizes the implementation of the inverse DCT given the distribution of quantized DCT coefficients. The VCA work has a distinct advantage over the present work: the complexity is reduced with no degradation of rate–distortion performance. On the other hand, the degree of complexity reduction with this approach is limited.

6.3.2.4 Concluding comments

Traditionally, image coding techniques have been compared almost exclusively on the basis of their rate vs. distortion performances, with consideration of computational complexity, if any, reduced to binary determinations: “too complex” or “not too complex.” Through a simple example, we have gotten a glimpse at what happens when a measure of computational complexity is thrown into the mix. A precise framework was used to define the optimal trade-off between complexity and distortion. However, it should be noted that the ability to draw precise conclusions is dependent on the set of algorithms A and the computational complexity metric c. With A a relatively small discrete set, we were able to (essentially) exactly characterize D_R(C) by using an (effectively) exhaustive search. If we were to enlarge A, then we would have only an upper bound for D_R(C). In principle, to find the best JPEG-compatible representation of an image it may be necessary to exhaustively search all syntactically valid bit streams of a given length.

6.3.3 Pruned Tree-Structured Vector Quantization

The final application of computation–rate–distortion optimization is to entropy pruned tree-structured vector quantization (EPTSVQ) [29]. For VQ with unconstrained codebook design and full-search encoding, the rate determines the codebook size and hence determines the computational complexity of encoding. Therefore to have variable computational load requires varying the vector dimension.14 In contrast to full-search VQ, the complexity of EPTSVQ is not determinable from the output rate. Thus we can attempt to optimize simultaneously for rate, distortion, and computation. In tree-structured VQ (TSVQ) [23], a binary tree is constructed with a codeword at each node. In the encoding process, one starts at the root of the tree and iteratively traverses the branch to the child node whose codeword is closest to the source vector. Coding terminates when a leaf node is reached. In the simplest form of TSVQ, the route from root to leaf is the channel codeword (say, each left child selected gives a zero and each right child gives a one). The rate can be lowered by using an entropy code on the leaf nodes. This is called entropy coded TSVQ. EPTSVQ is a design method for entropy coded TSVQ. The idea is to start with a deep TSVQ tree and prune it to minimize J = R + λD, where R is the entropy coded rate, D is the distortion, and λ controls the trade-off between R and D. (Optimality is within the set of all subtrees of the original tree.) The pruning uses the greedy algorithm of Breiman, Friedman, Olshen, and Stone (BFOS) [19] which in general is not optimal, but is optimal in this case because the objective functions are monotonic, affine tree functionals [29]. Assuming a pair of distance determinations and a comparison takes one unit of computation, the average computational complexity of TSVQ encoding is the weighted average of the depths of the leaf nodes.

14The situation is slightly more complicated with entropy coding, but the fact remains that for a fixed output entropy it is difficult to meaningfully vary the size of the codebook.

Because the average tree depth is a monotonic, affine tree functional, J′ = R + λD + µC can also be minimized with the BFOS algorithm. Thus, we have an efficient method to find the lower convex hull of computation–rate–distortion points in entropy coded TSVQ. However, because of the close coupling between rate and computation, the optimal pruning does not depend much on the relative weighting of rate and computation. Experiments for image coding at 1 bit per pixel show that computation can be reduced by about 5% while incurring an increase in distortion of about 5%.
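Since the rate, distortion, and average depth are all leaf-weighted averages, J′ is immediate to evaluate for any candidate subtree once per-leaf statistics are known. The sketch below does only that evaluation; the leaf statistics are hypothetical inputs, and the BFOS pruning search itself is not shown.

```python
def eptsvq_objective(leaves, lam, mu):
    """Evaluate J' = R + lam*D + mu*C for an entropy coded TSVQ subtree.

    leaves: list of (probability, distortion, code_length_bits, depth).
    R is the expected code length, D the expected distortion, and C the
    expected depth (one distance-pair-plus-comparison unit per level).
    """
    R = sum(p * l for p, _, l, _ in leaves)
    D = sum(p * d for p, d, _, _ in leaves)
    C = sum(p * t for p, _, _, t in leaves)
    return R + lam * D + mu * C, (R, D, C)

# Hypothetical four-leaf subtree.
leaves = [(0.5, 0.10, 1.0, 1), (0.25, 0.08, 2.0, 2),
          (0.15, 0.05, 3.3, 3), (0.10, 0.04, 3.6, 3)]
print(eptsvq_objective(leaves, lam=2.0, mu=0.1))
```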

6.4 Conclusions

An abstract framework for joint optimization of computational complexity and performance was introduced in this chapter. The bulk of the chapter was devoted to two analyses of transform coding. The first addressed the relative merits of the KLT and DCT in coding autoregressive sources. The second explored the possibility of using simple approximate DCT algorithms in JPEG encoding. The results in this chapter are all operational, meaning that an operating point is known to be achievable when an explicit algorithm is exhibited. This is a significant limitation. Nevertheless, it has been demonstrated that, in certain cases, analytical techniques can be used to find algorithms that provide a good trade-off between performance and complexity.

Appendix

6.A Application to Rootfinding

The computation–distortion framework is not limited to source coding. An application to rootfinding is summarized in this appendix. Because of the tangential relation with source coding, details are omitted. Newton’s method is an iterative technique for finding a solution to an equation of the form f(x)=0.

From a given iterate x_n, the next iterate is computed by finding the root of Taylor’s linear approximation to f at x_n, yielding

x_{n+1} = x_n − \frac{f(x_n)}{f'(x_n)}.   (6.8)

The convergence properties, including the set of initial guesses x_0 for which the iteration converges, depend on the function f. A common use of Newton’s method is for computing square roots. To compute the square root of a > 0, use f(x) = x^2 − a. Then (6.8) becomes

x_{n+1} = \frac{1}{2}\left( x_n + \frac{a}{x_n} \right),   (6.9)

and the iteration converges for all x_0 ≠ 0 [8]. Newton’s method is used in the implementation of hardware square root instructions [95]. In this case, one can use (6.9) directly or use an alternative form which avoids division:

x_{n+1} = x_n \left[ 1 + \frac{1}{2}\left( 1 − \frac{1}{a} x_n^2 \right) \right].

Note that adding one and dividing by two can be simply implemented with bit shifts. The division by a becomes a multiplication if 1/√· is the function to be calculated. Consider the computation of (6.9) in binary arithmetic. The division by two is a trivial change in the exponent, which does not affect the accuracy of the computation. If the division and addition have relative errors of δ_1 and δ_2, respectively, then the iteration becomes

x_{n+1} = \frac{1}{2}\left( x_n + \frac{a}{x_n}(1 + δ_1) \right)(1 + δ_2).   (6.10)

Using a computational model with a b_1-bit mantissa for the division and a b_2-bit mantissa for the addition, it is reasonable to model δ_1 and δ_2 as independent random variables, with δ_i uniformly distributed on [−2^{−b_i}, 2^{−b_i}], i = 1, 2. Now b_1 and b_2 determine in some fashion the cost of one step of the iteration, and the computation–distortion framework can be used to optimize their selection.
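The model is simple to simulate: the sketch below iterates (6.10) with δ_1 and δ_2 drawn uniformly from [−2^{−b_1}, 2^{−b_1}] and [−2^{−b_2}, 2^{−b_2}]; the operand a, starting guess, and mantissa widths are arbitrary illustrative choices.

```python
import numpy as np

def noisy_newton_sqrt(a, x0, b1, b2, iters, rng=np.random.default_rng(0)):
    """Iterate x <- 0.5*(x + (a/x)*(1 + d1))*(1 + d2), per (6.10).

    d1 models the roundoff of the division (b1-bit mantissa) and d2 the
    roundoff of the addition (b2-bit mantissa); the halving is exact.
    """
    x = x0
    for _ in range(iters):
        d1 = rng.uniform(-2.0 ** -b1, 2.0 ** -b1)
        d2 = rng.uniform(-2.0 ** -b2, 2.0 ** -b2)
        x = 0.5 * (x + (a / x) * (1 + d1)) * (1 + d2)
    return x

a = 2.0
print(noisy_newton_sqrt(a, x0=1.5, b1=8, b2=8, iters=3), np.sqrt(a))
```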

The optimization was undertaken for the very simple cost metric b_1 + b_2. The result is qualitatively simple. With a reasonable, not too small, upper bound on b_1 + b_2, it is optimal to have b_1 ≈ b_2. For small values of the maximum complexity, the addition is favored over the division (b_1 < b_2). This is because its relative error multiplies a larger factor. A similar result follows from optimizing the split of computational resources between two iterations: With a small limit on the total computation, one high-precision iteration can be more accurate than two low-precision iterations, but it depends on the accuracy of the initial guess. In practice, initial guesses come from low-resolution lookup tables.

Chapter 7

Conclusions

THIS THESIS has touched on a range of issues in source and source–channel coding with transforms. The key results are summarized here, followed by a brief discussion of open issues.

Frames for signal representation At all stages, most source coding systems use a basis representation of the information signal. If not a basis, they use less than a basis; i.e., the signal is approximated by its projection onto a proper subspace of the original space. In Chapter 2, signal representations with respect to frames, overcomplete sets of vectors, are analyzed. A linear representation with respect to a frame is obtained by forming inner products between each frame element and the signal vector. This representation has greater robustness than a basis representation. For a frame with redundancy r, meaning the number of elements in the frame is r times the dimension of the space, the frame representation leads to O(1/r) mean-squared error (MSE) when the expansion coefficients are subjected to a general, random, white additive noise (see Proposition 2.2). When the noise is bounded, as in the case of quantization, the representation leads to O(1/r^2) MSE (see Proposition 2.5, Conjecture 2.6, and Theorem 2.11). It is important to note that this improved performance with respect to increasing r is obtained only when the reconstruction procedure utilizes the hard bounds on the noise. A reconstruction satisfying all these hard bounds is called consistent. The order improvement of consistent reconstruction suggests that in this case “hard” information is much more valuable than “soft” information. Regardless of improvements resulting from consistent reconstruction, a frame expansion as described above does not generally give good performance in source coding. The reason is simply that the bits used in representing the extra transform coefficients have more impact if they are used to more finely quantize an expansion in a basis subset of the frame. An alternative use of a frame is to represent a signal by a linear combination of a small number of frame elements. While the previous method took advantage of the redundancy of the frame to get extra measurements of the data, this method relies on the signal being close to one of the many subspaces spanned by some small subset of the frame. Finding an optimal representation with a bound on the size of the subset of the frame is not computationally feasible. Thus, a suboptimal, greedy technique called matching pursuit is often used. It is shown in Chapter 2 that when the coefficients in matching pursuit are quantized—as would be the case in any compression application—a linear reconstruction does not meet the hard bounds on the quantization error. Nonlinear consistent reconstruction algorithms are given, and the potentially large reduction in distortion from this improved reconstruction is demonstrated experimentally. The compression performance of quantized matching pursuit is evaluated with random frames and certain structured frames. With consistent reconstruction and effective lossless coding, the low-rate performance is very good.

Basis adaptation for source coding In a variety of situations, it is useful to use a Karhunen–Lo`eve basis, in which the expansion coefficients of a random signal are uncorrelated. In transform coding of a Gaussian signal, a KL basis is optimal either when the quantizer is optimal (Lloyd–Max) or the rate is high. The KL basis is problematic, however, in that it depends on the source probability density, which is generally unknown and may be time-varying. For this reason, it is necessary to use an approximation of the KL basis; this approximation may be adaptive. Using an adaptive basis for transform coding leads immediately to two questions: How should the transform be varied, and how does this affect performance? The transform adaptation could follow either of two limiting strategies or any number of intermediate strategies. One extreme is to form a block of source vectors, compute a transform to use for this block, communicate the choice of transform (side information) to the decoder, and then use the selected transform to compress the block. This is a forward adaptive strategy. The other extreme case, a purely backward adaptive strategy, is proposed and analyzed in Chapter 3. In this approach, the encoder computes a transform based only on the data already available at the decoder, eliminating the transmission of side information. The adaptation is then limited to be strictly causal and driven by quantized data. For a source that produces independent, identically distributed vectors, there is a well-defined optimal transform. It is proven in Chapter 3 that the performance gap between using the optimal method and a backward adaptive method is asymptotically negligible (see Theorem 3.2). Thus, a universal transform coding system is obtained. The results of Chapter 3 are for an idealized situation where an exact eigendecomposition is periodically computed. The computational difficulty of eigendecomposition motivates a search for alternative methods for computing transform updates. As detailed in Chapter 4, an analogy can be drawn between filter adaptation in FIR Wiener filtering and transform adaptation in transform coding. This suggests the transplantation of adaptive filtering techniques to adaptive transform coding. Random search and stochastic gradient descent are applied to transform adaptation in Chapter 4. In gradient descent, any number of cost functions can lead to convergent techniques with nontrivially different behavior. Two cost functions are proposed, and detailed convergence analyses are given for both (see Theorems 4.2–4.4).

Source–channel coding for erasure channels The dominant paradigm of communication includes the separation of source and channel coding. This works well when the decoded information is independent of the particular stochastic realization of the channel. In such cases, the channel coding—possibly including a retransmission protocol in addition to forward error- or erasure-correction—should provide a noiseless channel. For an ergodic channel, as block sizes grow without bound, restricting the source and channel coding to be separate is not a significant limitation because the channel impairment is predictable through the law of large numbers. However, for short block sizes, restricting to this framework may greatly impact performance. Block size limits are particularly important for interactive communication and transmission of small data sets. A block channel code for an erasure channel is designed to deliver a certain number of bits as long as the number of erasures does not exceed a fixed maximum, namely the minimum distance of the code. In that case, the transmission is considered successful. No attention is given to the number of correctly decoded bits in an unsuccessful transmission. These conditions lead naturally to a “cliff effect” whereby the number of correctly delivered bits is constant when the number of erasures is small and drops precipitously when the number of erasures exceeds the minimum distance of the code. The cliff effect is not a problem if the probability of exceeding the minimum distance of the code is very small.1 However, with small block sizes, the relative deviation from the mean number of erasures is large, so the minimum distance of the code is often exceeded. Explicit consideration of each possible number of erasures gives more control over the performance of the system and may improve the average performance. This gives a direct analogy to (generalized) multiple description coding. In multiple description coding, a set of descriptions are produced and reconstructions are computed from each nonempty subset of descriptions. Such a reconstruction can arise in a multichannel communication system where a receiver gets a subset of the channels (see Figure 5.1) or at the end of an erasure channel where a subset of symbols have been erased. With a fixed total rate, to minimize average distortion or satisfy other criteria one can trade off the various distortions. Two methods for multiple description coding (MDC) are introduced in Chapter 5. The redundancy properties produced by each method differ from those of block channel coding. The outputs of a block channel code have a strict deterministic relationship which enables decoding when there are a few erasures, but extra received symbols contain no additional information. The first MDC method is a statistical block channel code. It gives descriptions that are statistically correlated; thus, when a subset is lost, its elements can be estimated from the remaining descriptions. This provides a mechanism for adding a small amount of redundancy without having large block lengths. The second MDC method uses a quantized frame expansion to produce descriptions. These descriptions have an approximate linear dependence, yet each provides independent information about the source, regardless of how many other descriptions are received. These multiple description techniques give promising results when applied to audio and image coding.

Complexity reduction and computational optimization The first MDC method of Chapter 5 uses transforms from a simple discrete, quasilinear class. These transforms can approximately match the coding gain of the Karhunen–Lo`eve transform (KLT), even though they operate on a discretized version of the source. In addition, they can be used to manipulate probability densities of transform coefficients in ways that continuous orthogonal transforms can not. One application of such manipulation is to reduce the complexity of the entropy coding stage of a transform coding system. Other examples of complexity reduction are also discussed in this thesis. A framework for computational optimization is introduced in Chapter 6. The framework provides a vocabulary for joint optimization of computational complexity and compression performance. For example, a comparison between the KLT and discrete cosine transform (DCT) using this formalism shows that the DCT should be used for coding a first-order Gauss–Markov source with high correlation. Another example is to compare a set of approximate JPEG encoding algorithms. It is demonstrated that the complexity of the

transform computation can be reduced by up to 30% without much performance degradation.

➢ ➢➣ ➣

An attempt to list the open problems left in the wake of this thesis would be futile; there are obviously many. A few of the subjects not covered are notable because of their proximity to the main results.

1The minimum distance must be balanced with the choice of code rate.

The properties of linear representations with respect to frames (see Chapters 2 and 5) depend on the frame and on the suitability of the frame for the signal. Nevertheless, rather than confronting the problem of designing application-specific frames, general methods for frame design have been employed. To some extent, this approach impedes specific results, but it also leads to some results that are independent of the frame. In particular, several distortion computations depend only on the frame being a normalized tight frame. Also, dictionary designs for matching pursuit have not been optimized. The experimental results are for ad hoc or random dictionaries, and they would be improved by dictionary optimization. The techniques introduced and analyzed in this thesis have a decided bias, not just to transform coding, but to the transform itself. This was partially the result of a conscious effort to maintain focus. Also, the use of unbounded uniform scalar quantization simplified both notation and analysis, but this can generally be improved upon. In particular, universal coding and multiple description coding can be addressed by reconsidering the quantization and entropy coding blocks in a transform coding system, or by abandoning the transform altogether.

Publications and Patents

Journal papers and manuscripts

1. V. K. Goyal, M. Vetterli, and N. T. Thao, “Quantized Overcomplete Expansions in R^N: Analysis, Synthesis and Algorithms,” IEEE Trans. Information Th., vol. 44, no. 1, pp. 16–31, Jan. 1998.

2. V. K. Goyal, J. Zhuang, and M. Vetterli, “On-line Algorithms for Universal Transform Coding,” submitted to IEEE Trans. Information Th., April 1998.

3. S. Rangan and V. K. Goyal, “Recursive Consistent Estimation with Bounded Noise,” submitted to IEEE Trans. Information Th., July 1998.

4. R. Arean, J. Kovaˇcevi´c, and V. K. Goyal, “Multiple Description Source-Channel Coding of Audio,” submitted to IEEE Trans. Speech Audio Proc., Aug. 1998.

Conference, symposium, and workshop papers

1. V. K. Goyal, M. Vetterli, and N. T. Thao, “Quantization of Overcomplete Expansions,” Proc. IEEE Data Compression Conf. 1995 (Snowbird, UT, March 28–30), pp. 13–22.

2. V. K. Goyal and M. Vetterli, “Consistency in Quantized Matching Pursuit,” Proc. IEEE Int. Conf. Acoustics, Speech, & Sig. Proc. 1996 (Atlanta, GA, May 7–10), vol. 3, pp. 1787–1790.

3. V. K. Goyal, M. Vetterli, and N. T. Thao, “Efficient Representations with Quantized Matching Pursuit,” Proc. Int. Conf. Analysis & Opt. of Systems 1996 (Paris, France, June 26–28), pp. 305–311.

4. V. K. Goyal, J. Zhuang, M. Vetterli, and C. Chan, “Transform Coding Using Adaptive Bases and Quantization,” Proc. IEEE Int. Conf. Image Proc. 1996 (Lausanne, Switzerland, Sept. 16–19), vol. II, pp. 365–368.

5. V. K. Goyal and M. Vetterli, “Dependent Coding in Quantized Matching Pursuit,” Proc. IS&T/SPIE Visual Comm. & Image Proc. 1997 (San Jose, CA, Feb. 12–14), vol. 3024, pt. 1, pp. 2–12.

6. V. K. Goyal, J. Zhuang, and M. Vetterli, “Universal Transform Coding Based On Backward Adaptation,” Proc. IEEE Data Compression Conf. 1997 (Snowbird, UT, March 25–27), pp. 231–240.

7. V. K. Goyal and M. Vetterli, “Computation–Distortion Characteristics of Block Transform Coding,” Proc. IEEE Int. Conf. Acoustics, Speech, & Sig. Proc. 1997 (Munich, Germany, April 21–24), vol. 4, pp. 2729–2732.

8. V. K. Goyal and M. Vetterli, “Computation–Distortion Characteristics of JPEG Encoding and Decoding,” Proc. 31st Asilomar Conf. on Signals, Systems, & Computers 1997 (Pacific Grove, CA, Nov. 2–5), vol. 1, pp. 229–233.

9. V. K. Goyal and J. Kovaˇcevi´c, “Optimal Multiple Description Transform Coding of Gaussian Vectors,” Proc. IEEE Data Compression Conf. 1998 (Snowbird, UT, March 30–April 1), pp. 388–397.

10. V. K. Goyal and M. Vetterli, “Block Transform Adaptation By Stochastic Gradient Descent,” Proc. IEEE Digital Signal Proc. Workshop 1998 (Bryce Canyon, UT, Aug. 9–12).

11. V. K. Goyal, J. Kovaˇcevi´c, and M. Vetterli, “Multiple Description Transform Coding: Robustness to Erasures using Tight Frame Expansions,” Proc. IEEE Int. Symp. Inform. Th. 1998 (Cambridge, MA, Aug. 16–21), p. 408.

12. J. Kovaˇcevi´c and V. K. Goyal, “Multiple Descriptions: Source–Channel Coding Methods for Communications,” Proc. 10th Tyrrhenian Int. Workshop on Dig. Comm.: Multimedia Comm. (Ischia, Italy, Sep. 16–18, 1998).

13. V. K. Goyal, J. Kovaˇcevi´c, and M. Vetterli, “Quantized Frame Expansions as Source–Channel Codes for Erasure Channels,” Proc. & Applications Workshop 1998 (Ascona, Switzerland, Sep. 28–Oct. 2).

14. V. K. Goyal, J. Kovaˇcevi´c, R. Arean, and M. Vetterli, “Multiple Description Transform Coding of Images,” Proc. IEEE Int. Conf. Image Proc. 1998 (Chicago, IL, Oct. 4–7).

15. V. K. Goyal and M. Vetterli, “Manipulating Rates, Complexity, and Error-Resilience with Discrete Transforms,” Proc. 32nd Asilomar Conf. on Signals, Systems, & Computers 1998 (Pacific Grove, CA, Nov. 1–4).

16. V. K. Goyal, J. Kovaˇcevi´c and M. Vetterli, “Quantized Frame Expansions as Source–Channel Codes for Erasure Channels,” Proc. IEEE Data Compression Conf. 1999 (Snowbird, UT, March 29–31).

Technical memoranda, solution manual, and other presented work

1. J. Gifford, V. K. Goyal, M. R. Grenier, L. K. Gross, and S. J. Henley, “The Use of Bootstrapping in Principal Component Analysis.” Presented at the Special Session on Undergraduate Research at the 1991 Annual Meeting of the Mathematical Association of America.

2. V. K. Goyal, “Quantized Overcomplete Expansions: Analysis, Synthesis and Algorithms,” Univ. of California, Berkeley, EECS Dept. Tech. Memo. No. UCB/ERL M95/57, July 1995.

3. S. G. Chang, M. M. Goodwin, V. K. Goyal, and T. Kalker, “Solution Manual for Wavelets and Subband Coding by Martin Vetterli and Jelena Kovaˇcevi´c,” Prentice-Hall, Englewood Cliffs, NJ, 1995.

4. V. K. Goyal, J. Kovaˇcevi´c, and M. Vetterli, “Multiple Description Transform Coding: Robustness to Erasures Using Tight Frame Expansions,” Bell Labs Tech. Memo. No. BL0112170-971124-24TM, Nov. 1997.

5. V. K. Goyal and J. Kovaˇcevi´c, “Optimal Multiple-Description Transform Coding of Gaussian Vectors,” Bell Labs Tech. Memo. No. BL0112170-971124-25TM, Nov. 1997.

6. J. Kovaˇcevi´c, V. K. Goyal, and R. Arean, “Multiple Description Source–Channel Coding of Audio,” Bell Labs Tech. Memo. No. BL0112170-980818-19TM, Aug. 1998.

7. J. Kovaˇcevi´c and V. K. Goyal, “Multiple Descriptions: Source–Channel Coding Methods for Communications,” Bell Labs Tech. Memo. No. BL0112170-980918-30TM, Sep. 1998.

Patents

1. V. K. Goyal and J. Kovačević, “Multiple Description Transform Coding Using Optimal Transforms of Arbitrary Dimension,” U.S. Patent filed February 25, 1998.

2. V. K. Goyal, J. Kovačević and M. Vetterli, “Multiple Description Transform Coding of Images Using Optimal Transforms of Arbitrary Dimension,” U.S. Patent filed September 30, 1998.

3. R. Arean, J. Kovačević, V. K. Goyal and M. Vetterli, “Multiple Description Transform Coding of Audio Using Optimal Transforms of Arbitrary Dimension,” U.S. Patent filed November 12, 1998.

4. V. K. Goyal, “Method and Apparatus for Reduced Complexity Entropy Coding,” U.S. Patent filed January 20, 1999.

Bibliography

[1] J. Adler, B. D. Rao, and K. Kreutz-Delgado. Comparison of basis selection methods. In Proc. Asilomar Conf. on Sig., Sys. & Computers, volume 1, pages 252–257, Pacific Grove, CA, November 1996. 76

[2] R. Ahlswede. The rate distortion region for multiple descriptions without excess rate. IEEE Trans. Inform. Th., IT-31(6):721–726, November 1985. 103, 108

[3] N. Ahmed, T. Natarajan, and K. R. Rao. Discrete cosine transform. IEEE Trans. Comp., C-23(1):90–93, January 1974. 191

[4] T. W. Anderson, I. Olkin, and L. G. Underhill. Generation of random orthogonal matrices. SIAM J. Sci. Stat. Comput., 8(4):625–629, July 1987. 79

[5] T. M. Apostol. Mathematical Analysis. Addison-Wesley, second edition, 1974. 64

[6] R. Arean. Multiple description transform coding: Application to the PAC audio coder. Professional Thesis Report filed with Institut Eurécom, Sophia Antipolis, France, July 1998. 99, 147, 147, 148

[7] R. Arean, J. Kovaˇcevi´c, and V. K Goyal. Multiple description source-channel coding of audio. IEEE Trans. Speech Audio Proc., August 1998. Submitted. 99, 102, 148

[8] K. E. Atkinson. An Introduction to Numerical Analysis. John Wiley & Sons, New York, second edition, 1989. 202

[9] J.-C. Batllo and V. A. Vaishampayan. Asymptotic performance of multiple description transform codes. IEEE Trans. Inform. Th., 43(2):703–707, March 1997. 114

[10] K. L. Bell, Y. Steinberg, Y. Ephraim, and H. L. Van Trees. Extended Ziv-Zakai lower bound for vector parameter estimation. IEEE Trans. Inform. Th., 43(2):624–637, March 1997. 49

[11] W. R. Bennett. Spectra of quantized signals. Bell Syst. Tech. J., 27(3):446–472, July 1948. 8, 10

[12] F. Bergeaud and S. Mallat. Matching pursuit of images. In Proc. IEEE Int. Conf. Image Proc., volume I, pages 53–56, Washington, DC, October 1995. 20, 41

[13] T. Berger. Rate Distortion Theory. Prentice-Hall, Englewood Cliffs, NJ, 1971. 4, 5, 14, 110, 184, 186, 186, 191

[14] T. Berger. Optimum quantizers and permutation codes. IEEE Trans. Inform. Th., IT-18(6):759–765, November 1972. 6

[15] T. Berger and Z. Zhang. Minimum breakdown degradation in binary source encoding. IEEE Trans. Inform. Th., IT-29(6):807–814, November 1983. 105, 107

[16] C. Berrou and A. Glavieux. Near optimum error correcting coding and decoding: Turbo-codes. IEEE Trans. Comm., 44(10):1261–1271, October 1996. 185

[17] H. S. Black and E. O. Edson. PCM equipment. Elec. Eng., 66:1123–1125, November 1947. 5

[18] R. E. Blahut. Fast Algorithms for Digital Signal Processing. Addison-Wesley, 1985. Reprinted with corrections 1987. 188

[19] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, Belmont, CA, 1984. 185, 200

[20] P. Brodatz. Textures: A Photographic Album for Artists and Designers. Dover, New York, 1966. 68, 69

[21] G. Buch, F. Burkert, J. Hagenauer, and B. Kukla. To compress or not to compress? In IEEE Globecom, pages 196–203, London, November 1996. 17, 108

[22] P. Bürgisser, M. Clausen, and M. A. Shokrollahi. Algebraic Complexity Theory, volume 315 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag, Berlin, Germany, 1997. 188

[23] A. Buzo, A. H. Gray, Jr., R. M. Gray, and J. D. Markel. Speech coding based upon vector quantization. IEEE Trans. Acoust. Speech Signal Proc., ASSP-28:562–574, October 1980. 32, 185, 200

[24] R. Calderbank, I. Daubechies, W. Sweldens, and B.-L. Yeo. Wavelet transforms that map integers to integers. Appl. Computational Harmonic Anal., 5(3):332–369, July 1998. 167

[25] G. J. Chaitin. On the difficulty of computations. IEEE Trans. Inform. Th., IT-16(1):5–9, January 1970. 189

[26] G. J. Chaitin. Information-theoretic computational complexity. IEEE Trans. Inform. Th., IT-20(1):10–15, January 1974. 189

[27] L. Cheded and P. A. Payne. The exact impact of amplitude quantization on multi-dimensional, high-order moments estimation. Signal Proc., 39(3):293–315, September 1994. 10, 59, 62, 71, 72

[28] P. A. Chou, M. Effros, and R. M. Gray. A vector quantization approach to universal noiseless coding and quantization. IEEE Trans. Inform. Th., 42(4):1109–1138, July 1996. 54

[29] P. A. Chou, T. Lookabaugh, and R. M. Gray. Optimal pruning with applications to tree-structured source coding and modeling. IEEE Trans. Inform. Th., 35(2):299–315, March 1989. 7, 185, 200, 200

[30] P. M. Clarkson. Optimal and Adaptive Signal Processing. CRC Press, Boca Raton, FL, 1993. 84, 87, 96

[31] J. H. Conway, R. H. Hardin, and N. J. A. Sloane. Packing lines, planes, etc.: Packings in Grassmannian spaces. Experimental Mathematics, 5(2):139–159, 1996. 181

[32] J. H. Conway and N. J. A. Sloane. Sphere Packings, Lattices and Groups, volume 290 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag, New York, 1988. 118, 150

[33] T. M. Cover. Personal communication, August 1998. 103

[34] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, New York, 1991. 2, 5, 14, 14, 58, 61, 102, 103, 120, 170, 183, 184, 190

[35] Z. Cvetković. Overcomplete Expansions for Digital Signal Processing. PhD thesis, Univ. California, Berkeley, 1995. Available as Univ. California, Berkeley, Electron. Res. Lab. Memo. No. UCB/ERL M95/114, December 1995. 20

[36] I. Daubechies. The wavelet transform, time-frequency localization and signal analysis. IEEE Trans. Inform. Th., 36:961–1005, September 1990. 20

[37] I. Daubechies. Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1992. 20, 20, 23, 23, 24, 24, 49

[38] G. Davis. Adaptive Nonlinear Approximations. PhD thesis, New York Univ., September 1994. 29, 29, 29, 30, 30

[39] G. Davis, S. Mallat, and Z. Zhang. Adaptive time-frequency approximations with matching pursuits. Technical Report 657, New York Univ., March 1994. 30

[40] L. D. Davisson. Universal noiseless coding. IEEE Trans. Inform. Th., IT-19(6):783–795, November 1973. 55

[41] S. Deering and R. Hinden. Internet Protocol, version 6 (IPv6) specification. Network Working Group Request for Comments 1883, December 1995. Available on-line at ftp://ftp.isi.edu/in-notes/rfc1883.txt. 103

[42] R. D. Dony and S. Haykin. Optimally adaptive transform coding. IEEE Trans. Image Proc., 4(10):1358–1370, October 1995. 55

[43] H. W. Dudley. The vocoder. Bell Lab. Rec., 18:122–126, December 1939. 2

[44] R. J. Duffin and A. C. Schaeffer. A class of nonharmonic Fourier series. Trans. Amer. Math. Soc., 72:341–366, 1952. 20

[45] M. Effros. Fast weighted universal transform coding: Toward optimal, low complexity bases for image compression. In J. A. Storer and M. Cohn, editors, Proc. IEEE Data Compression Conf., pages 211–220, Snowbird, Utah, March 1997. IEEE Comp. Soc. Press. 76

[46] M. Effros and P. A. Chou. Weighted universal transform coding: Universal image compression with the Karhunen-Loève transform. In Proc. IEEE Int. Conf. Image Proc., volume II, pages 61–64, Washington, DC, October 1995. 55, 55, 55, 66, 68, 69, 76

[47] M. Effros and A. Goldsmith. Capacity definitions and coding strategies for general channels with receiver side information. In Proc. IEEE Int. Symp. Inform. Th., page 39, Cambridge, MA, August 1998. 166

[48] A. A. El Gamal and T. M. Cover. Achievable rates for multiple descriptions. IEEE Trans. Inform. Th., IT-28(6):851–857, November 1982. 100, 103, 104, 104

[49] P. Elias. Two famous papers. IRE Trans. Inform. Th., IT-4(3):99, September 1958. 2

[50] W. H. R. Equitz and T. M. Cover. Successive refinement of information. IEEE Trans. Inform. Th., 37(2):269–275, March 1991. 107, 108

[51] European Broadcast Union. CD- assessment material: Recording for subjective tests. PolyGram Germany, 1997. 141, 142, 142, 142, 142, 142

[52] N. Farvardin. A study of vector quantization for noisy channels. IEEE Trans. Inform. Th., 36(4):799–809, July 1990. 108

[53] N. Farvardin and J. W. Modestino. Optimum quantizer performance for a class of non-Gaussian memoryless sources. IEEE Trans. Inform. Th., IT-30(3):485–497, May 1984. 9

[54] T. J. Ferguson and J. H. Rabinowitz. Self-synchronizing Huffman codes. IEEE Trans. Inform. Th., IT-30(4):687–693, July 1984. 108

[55] G. D. Forney, Jr. Concatenated Codes. MIT Press, Cambridge, MA, 1966. 185

[56] G. D. Forney, Jr. Performance and complexity. IEEE Inform. Th. Soc. Newsletter, 46(1):3–4+, March 1996. From a 1995 Shannon Lecture. 185, 185

[57] J. Fourier. Théorie Analytique de la Chaleur. Didot, Paris, France, 1822. 11

[58] R. G. Gallager. A simple derivation of the coding theorem and some applications. IEEE Trans. Inform. Th., IT-11(1):3–18, January 1965. 185

[59] A. M. Gerrish and P. M. Schultheiss. Information rates of non-Gaussian processes. IEEE Trans. Inform. Th., IT-10(4):265–271, October 1964. 110

[60] A. Gersho and R. M. Gray. Vector Quantization and Signal Compression. Kluwer Acad. Pub., Boston, MA, 1992. 7, 8, 76, 185, 186, 191

[61] H. Gish and J. P. Pierce. Asymptotically efficient quantizing. IEEE Trans. Inform. Th., IT-14(5):676–683, September 1968. 8, 9, 10

[62] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins Univ. Press, Baltimore, MD, second edition, 1989. 29, 72, 76, 76, 76, 78, 85

[63] M. Goodwin. Matching pursuit with damped sinusoids. In Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., pages 2037–2040, Munich, Germany, April 1997. 36

[64] M. M. Goodwin. Adaptive Signal Models: Theory, Algorithms, and Audio Applications. PhD thesis, Univ. California, Berkeley, 1997. Available as Univ. California, Berkeley, Electron. Res. Lab. Memo. No. UCB/ERL M97/91, December 1997. 36

[65] M. J. Gormish and J. T. Gill. Computation-rate-distortion in transform coders for image compression. In Proc. SPIE Conf. Image and Video Proc., volume 1903, pages 146–152, San Jose, CA, February 1993. 190

[66] V. K Goyal. Quantized overcomplete expansions: Analysis, synthesis and algorithms. Master’s thesis, Univ. California, Berkeley, 1995. Available as Univ. California, Berkeley, Electron. Res. Lab. Memo. No. UCB/ERL M95/57, July 1995. 34, 36, 36, 36, 42

[67] V. K Goyal and J. Kovačević. Optimal multiple description transform coding of Gaussian vectors. In J. A. Storer and M. Cohn, editors, Proc. IEEE Data Compression Conf., pages 388–397, Snowbird, Utah, March 1998. IEEE Comp. Soc. Press. 99, 102

[68] V. K Goyal, J. Kovačević, R. Arean, and M. Vetterli. Multiple description transform coding of images. In Proc. IEEE Int. Conf. Image Proc., Chicago, October 1998. 99, 102

[69] V. K Goyal, J. Kovačević, and M. Vetterli. Multiple description transform coding: Robustness to erasures using tight frame expansions. In Proc. IEEE Int. Symp. Inform. Th., page 408, Cambridge, MA, August 1998. 99, 102

[70] V. K Goyal, J. Kovačević, and M. Vetterli. Quantized frame expansions as source-channel codes for erasure channels. In Proc. Wavelets & Appl. Workshop, Ascona, Switzerland, September–October 1998. 99, 102

[71] V. K Goyal and M. Vetterli. Consistency in quantized matching pursuit. In Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., volume 3, pages 1787–1790, Atlanta, GA, May 1996. 19

[72] V. K Goyal and M. Vetterli. Computation-distortion characteristics of block transform coding. In Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., volume 4, pages 2729–2732, Munich, Germany, April 1997. 183

[73] V. K Goyal and M. Vetterli. Computation-distortion characteristics of JPEG encoding and decoding. In Proc. 31st Asilomar Conf. Sig., Sys., & Computers, volume 1, pages 229–233, Pacific Grove, CA, November 1997. 183

[74] V. K Goyal and M. Vetterli. Dependent coding in quantized matching pursuit. In Proc. SPIE Conf. on Vis. Commun. and Image Proc., volume 3024, pages 2–14, San Jose, CA, February 1997. 19, 20, 41

[75] V. K Goyal and M. Vetterli. Block transform adaptation by stochastic gradient descent. In IEEE Dig. Sig. Proc. Workshop, Bryce Canyon, UT, August 1998. 75

[76] V. K Goyal and M. Vetterli. Manipulating rates, complexity, and error-resilience with discrete transforms. In Proc. 32nd Asilomar Conf. Sig., Sys., & Computers, volume 1, Pacific Grove, CA, November 1998. 99

[77] V. K Goyal, M. Vetterli, and N. T. Thao. Quantization of overcomplete expansions. In J. A. Storer and M. Cohn, editors, Proc. IEEE Data Compression Conf., pages 13–22, Snowbird, Utah, March 1995. IEEE Comp. Soc. Press. 19, 42

[78] V. K Goyal, M. Vetterli, and N. T. Thao. Efficient representations with quantized matching pursuit. In Proc. 12th Int. Conf. Anal. & Opt. of Sys.: Images, Wavelets & PDE’s, pages 305–311, Paris, France, June 1996. Springer-Verlag. 19

[79] V. K Goyal, M. Vetterli, and N. T. Thao. Quantized overcomplete expansions in R^N: Analysis, synthesis, and algorithms. IEEE Trans. Inform. Th., 44(1):16–31, January 1998. 19, 27

[80] V. K Goyal, J. Zhuang, and M. Vetterli. Universal transform coding based on backward adaptation. In J. A. Storer and M. Cohn, editors, Proc. IEEE Data Compression Conf., pages 231–240, Snowbird, Utah, March 1997. IEEE Comp. Soc. Press. 54, 66

[81] V. K Goyal, J. Zhuang, and M. Vetterli. On-line algorithms for universal transform coding. IEEE Trans. Inform. Th., April 1998. Submitted. 54, 59

[82] V. K Goyal, J. Zhuang, M. Vetterli, and C. Chan. Transform coding using adaptive bases and quantization. In Proc. IEEE Int. Conf. Image Proc., volume II, pages 365–368, Lausanne, Switzerland, September 1996. 54, 54, 67, 71

[83] R. M. Gray. On the asymptotic eigenvalue distribution of Toeplitz matrices. IEEE Trans. Inform. Th., IT-18(6):725–730, November 1972. 17, 191

[84] R. M. Gray. Quantization noise spectra. IEEE Trans. Inform. Th., 36(6):1220–1244, November 1990. 10, 48

[85] R. M. Gray and T. G. Stockham, Jr. Dithered quantizers. IEEE Trans. Inform. Th., 39(3):805–812, May 1993. 11, 48, 60

[86] R. M. Gray and A. D. Wyner. Source coding for a simple network. Bell Syst. Tech. J., 53(9):1681–1721, November 1974. vii, 107, 108

[87] U. Grenander and G. Szegö. Toeplitz Forms and Their Applications. Univ. California Press, Berkeley, CA, 1958. 17, 191

[88] G. R. Grimmett and D. R. Stirzaker. Probability and Random Processes. Oxford Univ. Press, second edition, 1992. 73, 102

[89] J. Hagenauer. Source-controlled channel decoding. IEEE Trans. Comm., 43(9):2449–2457, September 1995. 17, 108

[90] R. H. Hardin, N. J. A. Sloane, and W. D. Smith. Library of best ways known to us to pack n points on sphere so that minimum separation is maximized. URL: http://www.research.att.com/~njas/packings. 27, 36, 38, 39

[91] R. V. L. Hartley. Transmission of information. Bell Syst. Tech. J., 7:535–563, July 1928. 3

[92] S. S. Haykin. Adaptive Filter Theory. Prentice-Hall, Upper Saddle River, NJ, third edition, 1996. 49, 50, 51

[93] M. T. Heideman. Multiplicative Complexity, Convolution, and the DFT. Springer-Verlag, New York, 1988. 188

[94] C. Heil and D. Walnut. Continuous and discrete wavelet transforms. SIAM Rev., 31:628–666, 1989. 20

[95] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufman, San Mateo, CA, 1990. Appendix A contributed by D. Goldberg. 188, 188, 202

[96] B. Hochwald. Tradeoff between source and channel coding on a Gaussian channel. IEEE Trans. Inform. Th., 44(7):3044–3055, November 1998. 108

[97] B. Hochwald and K. Zeger. Tradeoff between source and channel coding. IEEE Trans. Inform. Th., 43(5):1412–1424, September 1997. 108

[98] J. Hong. Discrete Fourier, Hartley, and Cosine Transforms in Signal Processing. PhD thesis, Columbia Univ., 1993. 167

[99] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge Univ. Press, 1985. Reprinted with corrections 1987. 76, 78, 120, 176

[100] J. J. Y. Huang and P. M. Schultheiss. Block quantization of correlated Gaussian random variables. IEEE Trans. Comm., 11:289–296, September 1963. 2, 12, 12

[101] L. Hyafil and R. L. Rivest. Constructing optimal binary decision trees is NP-complete. Inform. Proc. Letters, 5(1):15–17, May 1976. 185

[102] A. Ingle and V. A. Vaishampayan. DPCM system design for diversity systems with applications to packetized speech. IEEE Trans. Speech Audio Proc., 3(1):48–57, January 1995. 102, 114

[103] F. Jelinek and K. S. Schneider. On variable-length-to-block coding. IEEE Trans. Inform. Th., IT-18(6):765–774, November 1972. 108

[104] C. R. Johnson, Jr. Lectures on Adaptive Parameter Estimation. Prentice-Hall, Englewood Cliffs, NJ, 1988. 89

[105] J. D. Johnston and A. J. Ferreira. Sum-difference stereo transform coding. In Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., volume 2, pages 569–572, San Francisco, March 1992. 15

[106] D. Jonsson. Some limit theorems for the eigenvalues of a sample covariance matrix. J. Multivariate Anal., 12(1):1–38, March 1982. 163

[107] T. Kalker and M. Vetterli. Projection methods in motion estimation and compensation. In Proc. IS & T/SPIE, volume 2419, pages 164–175, San Jose, CA, February 1995. 20, 30, 31

[108] D. E. Knuth. The Art of Computer Programming. Volume 3: Sorting and Searching. Addison-Wesley, second edition, 1998. 189

[109] A. N. Kolmogorov. Evaluation of minimal number of elements of ε-nets in different functional classes and their application to the problem of representation of functions of several variables by superposition of functions of a smaller number of variables. Usp. Mat. Nauk, 10(1):192–194, 1955. In Russian. 3

[110] A. N. Kolmogorov. Three approaches for defining the concept of information quantity. Information Transmission, 1:3–11, 1965. 189

[111] A. N. Kolmogorov. Logical basis for information theory and probability theory. IEEE Trans. Inform. Th., IT-14(5):662–664, September 1968. 189

[112] T. W. Körner. Fourier Analysis. Cambridge Univ. Press, New York, 1988. Paperback edition with corrections, 1989. 11

[113] J. Kovačević and V. K Goyal. Multiple descriptions: Source-channel coding methods for communications. In Proc. Int. Workshop on Multimedia Comm., Ischia, Italy, September 1998. 99

[114] H. P. Kramer and M. V. Mathews. A linear coding for transmitting a set of correlated signals. IRE Trans. Inform. Th., 23(3):41–46, September 1956. 1, 11

[115] E. A. Lee and D. G. Messerschmitt. Digital Communication. Kluwer Acad. Pub., Boston, MA, second edition, 1994. 5

[116] D. LeGall. MPEG: A video compression standard for multimedia applications. Comm. ACM, 34(4):46–58, April 1991. 16, 166

[117] Lena. Photograph of Lena Sjööblom from Playboy Magazine, November 1972, scanned and modestly cropped at the University of Southern California [138]. Used here simply because it is familiar to image processing researchers. 68, 136, 196

[118] K. Lengwehasatit. Personal communication, October 1997. viii, 198, 199

[119] K. Lengwehasatit and A. Ortega. DCT computation with minimal average number of operations. In Proc. SPIE Conf. on Vis. Commun. and Image Proc., volume 3024, pages 71–82, San Jose, CA, February 1997. 174, 198

[120] K. Lengwehasatit and A. Ortega. Distortion/decoding time tradeoffs in software DCT-based image coding. In Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., volume 4, pages 2725–2728, Munich, Germany, April 1997. 174, 198

[121] S. K. Leung-Yan-Cheong and T. M. Cover. Some equivalences between Shannon entropy and Kolmogorov complexity. IEEE Trans. Inform. Th., IT-24(3):331–338, May 1978. 189

[122] J. Lin and J. A. Storer. Improving search for tree-structured vector quantization. In J. A. Storer and M. Cohn, editors, Proc. IEEE Data Compression Conf., pages 339–348, Snowbird, Utah, March 1992. IEEE Comp. Soc. Press. 186

[123] S. Lin and D. J. Costello. Error Control Coding: Fundamentals and Applications. Prentice-Hall, Englewood Cliffs, NJ, 1983. 5

[124] Y. Linde, A. Buzo, and R. M. Gray. An algorithm for vector quantizer design. IEEE Trans. Comm., COM-28(1):84–95, January 1980. 7

[125] T. Linder and R. Zamir. On the asymptotic tightness of the Shannon lower bound. IEEE Trans. Inform. Th., 40(6):2026–2031, November 1994. 112

[126] S. P. Lipshitz, R. A. Wannamaker, and J. Vanderkooy. Quantization and dither: A theoretical survey. J. Audio Eng. Soc., 40(5):355–375, May 1992. 11, 48, 60

[127] L. Ljung. Theory and Practice of Recursive Identification. MIT Press, Cambridge, MA, 1983. 49, 50

[128] S. P. Lloyd. Least squares quantization in PCM. IEEE Trans. Inform. Th., IT-28(2):129–137, March 1982. Originally an unpublished Bell Telephone Laboratories tech. memo., 1957. 6

[129] T. Luczak and W. Szpankowski. A suboptimal lossy data compression based on approximate string matching. IEEE Trans. Inform. Th., 43(5):1439–1451, September 1997. 55

[130] J. Makhoul, S. Roucos, and H. Gish. Vector quantization in speech coding. Proc. IEEE, 73(11):1551–1588, November 1985. 185

[131] S. G. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Proc., 41(12):3397–3415, December 1993. 18, 20, 28, 30, 30, 30

[132] V. A. Marčenko and L. A. Pastur. Distribution of eigenvalues for some sets of random matrices. Math. USSR–Sbornik, 1(4):457–483, 1967. 163

[133] J. Max. Quantizing for minimum distortion. IRE Trans. Inform. Th., IT-6(1):7–12, March 1960. 6

[134] J. C. Maxted and J. P. Robinson. Error recovery for variable length codes. IEEE Trans. Inform. Th., IT-31(6):794–801, November 1985. See also [136]. 108

[135] N. Moayeri and D. L. Neuhoff. Time-memory tradeoffs in vector quantizer codebook searching based on decision trees. IEEE Trans. Speech Audio Proc., 2(4):490–506, October 1994. 186, 186

[136] M. E. Monaco and J. M. Lawler. Corrections and additions to “Error recovery for variable length codes”. IEEE Trans. Inform. Th., IT-33(3):454–456, May 1987. 217

[137] N. J. Munch. Noise reduction in tight Weyl-Heisenberg frames. IEEE Trans. Inform. Th., 38(2):608–616, March 1992. 20, 20

[138] D. C. Munson, Jr. A note on Lena. IEEE Trans. Image Proc., 5(1):3, January 1996. 216

[139] B. K. Natarajan. Sparse approximate solutions to linear systems. SIAM J. Computing, 24(2):227–234, April 1995. 29, 29

[140] R. Neff and A. Zakhor. Very low bit-rate video coding based on matching pursuits. IEEE Trans. Circuits Syst. Video Technol., 7(1):158–171, February 1997. 20, 31

[141] R. Neff, A. Zakhor, and M. Vetterli. Very low bit rate video coding using matching pursuits. In Proc. SPIE Conf. on Vis. Commun. and Image Proc., volume 2308, pages 47–60, Chicago, September 1994. 20, 31

[142] D. L. Neuhoff, R. M. Gray, and L. D. Davisson. Fixed rate universal block source coding with a fidelity criterion. IEEE Trans. Inform. Th., IT-21(5):511–523, September 1975. 54

[143] D. L. Neuhoff and D. H. Lee. On the performance of tree-structured vector quantization. In Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., volume 4, pages 2277–2280, Toronto, Canada, April 1991. 186

[144] P. G. Neumann. Efficient error-limiting variable-length codes. IEEE Trans. Inform. Th., IT-8(4):292–304, July 1962. 108

[145] A. V. Oppenheim and R. W. Schafer. Discrete-Time Signal Processing. Prentice-Hall, Englewood Cliffs, NJ, 1989. 11

[146] M. T. Orchard, Y. Wang, V. Vaishampayan, and A. R. Reibman. Redundancy rate-distortion analysis of multiple description coding using pairwise correlating transforms. In Proc. IEEE Int. Conf. Image Proc., volume I, pages 608–611, Santa Barbara, CA, October 1997. 102, 116, 117, 117, 118, 119, 122, 124, 124, 134, 135, 136, 164

[147] A. Ortega. Optimization Techniques for Adaptive Quantization of Image and Video Under Delay Constraints. PhD thesis, Columbia Univ., 1994. 91

[148] A. Ortega and M. Vetterli. Adaptive scalar quantization without side information. IEEE Trans. Image Proc., 6(5):665–676, May 1997. 91

[149] L. Ozarow. On a source-coding problem with two channels and three receivers. Bell Syst. Tech. J., 59(10):1909–1921, December 1980. 103, 108

[150] E. W. Packel, J. F. Traub, and H. Woźniakowski. Measures of uncertainty and information in computation. Informational Sciences, 65(3):253–273, November 1992. 189

[151] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, New York, 1965. 60

[152] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, New York, third edition, 1994. 44, 44

[153] W. B. Pennebaker and J. L. Mitchell. JPEG Still Image Data Compression Standard. Van Nostrand Reinhold, New York, 1993. 15

[154] J. R. Pierce. The early days of information theory. IEEE Trans. Inform. Th., IT-19(1):3–8, January 1973. 5

[155] V. Ramasubramanian and K. K. Paliwal. Fast K-dimensional tree algorithms for nearest neighbor search with application to vector quantization encoding. IEEE Trans. Signal Proc., 40(3):518–531, March 1992. 186

[156] K. Ramchandran and M. Vetterli. Rate-distortion optimal fast thresholding with complete JPEG/MPEG decoder compatibility. IEEE Trans. Image Proc., 3(5):700–704, September 1994. 198

[157] S. Rangan. Personal communication, December 1998. 175

[158] S. Rangan and V. K Goyal. Recursive consistent estimation with bounded noise. IEEE Trans. Inform. Th., July 1998. Submitted. 19, 27, 48, 49, 50, 50, 51

[159] K. R. Rao and P. Yip. Discrete Cosine Transform: Algorithms, Advantages, Applications. Academic Press, 1990. 15, 17, 191, 193, 195, 196

[160] A. H. Reeves. French Patent 852 183, October 23, 1939; U.S. Patent 2 272 070, February 3, 1942. 5

[161] B. Rimoldi. Successive refinement of information: Characterization of the achievable rates. IEEE Trans. Inform. Th., 40(1):253–259, January 1994. 108

[162] E. A. Riskin and R. M. Gray. A greedy tree growing algorithm for the design of variable rate vector quantizers. IEEE Trans. Signal Proc., 39(11):2500–2507, November 1991. 185

[163] J. Rissanen. Universal coding, information, prediction, and estimation. IEEE Trans. Inform. Th., IT-30(4):629–636, July 1984. 56

[164] L. G. Roberts. Picture coding using pseudo-random noise. IRE Trans. Inform. Th., IT-8(2):145–154, February 1962. 11

[165] K. Sayood and J. C. Borkenhagen. Use of residual redundancy in the design of joint source/channel coders. IEEE Trans. Comm., 39(6):838–846, June 1991. 17

[166] D. W. E. Schobben, R. A. Beuker, and W. Oomen. Dither and data compression. IEEE Trans. Signal Proc., 45(8):2097–2101, August 1997. 58

[167] A. Segall. Bit allocation and encoding for vector sources. IEEE Trans. Inform. Th., IT-22(2):162–169, March 1976. 12

[168] S. M. Selby, editor. Standard Mathematical Tables. CRC Press, eighteenth edition, 1970. 45

[169] S. D. Servetto, K. Ramchandran, V. Vaishampayan, and K. Nahrstedt. Multiple-description wavelet based image coding. In Proc. IEEE Int. Conf. Image Proc., Chicago, October 1998. 102, 114

[170] C. E. Shannon. A mathematical theory of communication. Bell Syst. Tech. J., 27:379–423, July 1948. Continued 27:623–656, October 1948. vi, 2, 3, 3, 4, 4, 184

[171] C. E. Shannon. The bandwagon. IRE Trans. Inform. Th., IT-2(1):3, March 1956. 2

[172] C. E. Shannon. Coding theorems for a discrete source with a fidelity criterion. IRE Int. Conv. Rec., part 4, 7:142–163, 1959. Reprinted with changes in Information and Decision Processes, ed. R. E. Machol, McGraw-Hill, New York, 1960, pp. 93–126. 4, 7, 110

[173] C. E. Shannon and W. Weaver. The Mathematical Theory of Communication. Univ. Illinois Press, Urbana, IL, 1963. 5

[174] J. M. Shapiro. Embedded image coding using zero trees of wavelet coefficients. IEEE Trans. Signal Proc., 41(12):3445–3462, December 1993. 168

[175] P. G. Sherwood and K. Zeger. Error protection of wavelet coded images using residual source redundancy. In Proc. 31st Asilomar Conf. Sig., Sys., & Computers, Pacific Grove, CA, November 1997. 17, 108

[176] D. Sinha and J. D. Johnston. Audio compression at low bit rates using a signal adaptive switched filterbank. In Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., volume 2, pages 1053–1056, Atlanta, GA, May 1996. 15

[177] D. Sinha, J. D. Johnston, S. Dorward, and S. Quackenbush. The perceptual audio coder (PAC). In The Digital Signal Processing Handbook, chapter 42, pages 42.1–42.18. CRC and IEEE Press, 1998. 138, 138

[178] Y. Steinberg and M. Gutman. An algorithm for source-coding subject to a fidelity-criterion, based on string-matching. IEEE Trans. Inform. Th., 39(3):877–886, May 1993. 55

[179] G. Strang. Introduction to Applied Mathematics. Wellesley-Cambridge Press, 1986. 26, 26

[180] G. Strang. Linear Algebra and Its Applications, Third Edition. Harcourt Brace Jovanovich, San Diego, CA, 1988. 168

[181] V. Strassen. Gaussian elimination is not optimal. Numer. Math., 13:354–356, 1969. 187

[182] N. T. Thao. Deterministic Analysis of Oversampled A/D Conversion and Sigma-Delta Modulation, and Decoding Improvements using Consistent Estimates. PhD thesis, Columbia Univ., 1993. 20

[183] N. T. Thao and M. Vetterli. Deterministic analysis of oversampled A/D conversion and decoding improvement based on consistent estimates. IEEE Trans. Signal Proc., 42(3):519–531, March 1994. 20

[184] N. T. Thao and M. Vetterli. Reduction of the MSE in R-times oversampled A/D conversion from O(1/R) to O(1/R^2). IEEE Trans. Signal Proc., 42(1):200–203, January 1994. 19, 20, 22, 23, 25, 46

[185] N. T. Thao and M. Vetterli. Lower bound on the mean-squared error in oversampled quantization of periodic signals using vector quantization analysis. IEEE Trans. Inform. Th., 42(2):469–479, March 1996. 20, 47, 49

[186] C. D. Thompson. Fourier transforms in VLSI. IEEE Trans. Comp., C-32(11):1047–1057, November 1983. 189

[187] C. D. Thompson. The VLSI complexity of sorting. IEEE Trans. Comp., C-32(12):1171–1184, December 1983. 189

[188] V. A. Vaishampayan. Design of multiple description scalar quantizers. IEEE Trans. Inform. Th., 39(3):821–834, May 1993. 112, 114, 115

[189] V. A. Vaishampayan. Application of multiple description codes to image and video transmission over lossy networks. In Proc. Int. Workshop on Packet Video, March 1996. 102, 114

[190] V. A. Vaishampayan and J.-C. Batllo. Asymptotic analysis of multiple description quantizers. IEEE Trans. Inform. Th., March 1997. 114

[191] V. A. Vaishampayan, J.-C. Batllo, and A. R. Calderbank. On reducing granular distortion in multiple description quantization. In Proc. IEEE Int. Symp. Inform. Th., page 98, Cambridge, MA, August 1998. 114

[192] V. A. Vaishampayan and J. Domaszewicz. Design of entropy-constrained multiple-description scalar quantizers. IEEE Trans. Inform. Th., 40(1):245–250, January 1994. 114

[193] S. Vembu, S. Verdú, and Y. Steinberg. The source-channel separation theorem revisited. IEEE Trans. Inform. Th., 41(1):44–54, January 1995. 4

[194] M. Vetterli and T. Kalker. Matching pursuit for compression and application to motion compensated video coding. In Proc. IEEE Int. Conf. Image Proc., volume 1, pages 725–729, Austin, TX, November 1994. 20, 30, 31

[195] M. Vetterli and J. Kovaˇcevi´c. Wavelets and Subband Coding. Prentice-Hall, Englewood Cliffs, NJ, 1995. 11, 15

[196] M. Vidyasagar. Nonlinear Systems Analysis. Prentice-Hall, Englewood Cliffs, NJ, 1978. 82

[197] J. Vuillemin. A combinatorial limit to the computing power of VLSI circuits. IEEE Trans. Comp., 32(3):294–300, March 1983. 189

[198] G. K. Wallace. The JPEG still picture compression standard. Comm. ACM, 34(4):30–44, April 1991. 15

[199] G. K. Wallace. The JPEG still picture compression standard. IEEE Trans. Consum. Electron., 38(1):xviii– xxxiv, February 1992. 15

[200] J. Walrand. Communication Networks: A First Course. McGraw-Hill, Boston, MA, second edition, 1998. 102

[201] Y. Wang, M. T. Orchard, and A. R. Reibman. Multiple description image coding for noisy channels by pairing transform coefficients. In Proc. IEEE Workshop on Multimedia Sig. Proc., pages 419–424, Princeton, NJ, June 1997. 102, 114, 114, 116, 136, 136

[202] Y. Wang, M. T. Orchard, and A. R. Reibman. Optimal pairwise correlating transforms for multiple description coding. In Proc. IEEE Int. Conf. Image Proc., Chicago, October 1998. 102, 164

[203] B. Widrow, I. Kollár, and M.-C. Liu. Statistical theory of quantization. IEEE Trans. Instrum. Meas., 45(2):353–361, April 1996. 10

[204] B. Widrow, J. M. McCool, M. G. Larimore, and C. R. Johnson, Jr. Stationary and nonstationary learning characteristics of the LMS adaptive filter. Proc. IEEE, 64(8):1151–1162, August 1976. 75, 97

[205] B. Widrow and S. D. Stearns. Adaptive Signal Processing. Prentice-Hall, Upper Saddle River, NJ, 1985. 75, 77, 79, 89

[206] N. Wiener. What is information theory? IRE Trans. Inform. Th., IT-2(2):48, June 1956. 3

[207] S. Winograd. Arithmetic Complexity of Computations. Soc. for Industrial and Applied Mathematics, Philadelphia, PA, 1980. 186, 187

[208] H. S. Witsenhausen. On source networks with minimal breakdown degradation. Bell Syst. Tech. J., 59(6):1083–1087, July–August 1980. 101, 107

[209] H. S. Witsenhausen and A. D. Wyner. Source coding for multiple descriptions II: A binary source. Bell Syst. Tech. J., 60(10):2281–2292, December 1981. 107

[210] J. K. Wolf, A. D. Wyner, and J. Ziv. Source coding for multiple descriptions. Bell Syst. Tech. J., 59(8):1417–1426, October 1980. 105

[211] R. C. Wood. On optimum quantization. IEEE Trans. Inform. Th., IT-15(2):248–252, March 1969. 6

[212] A. D. Wyner. Capabilities of bounded discrepancy decoding. Bell Syst. Tech. J., 44:1061–1122, July– August 1965. 181, 182

[213] A. D. Wyner and J. Ziv. Classification with finite memory. IEEE Trans. Inform. Th., 42(2):337–347, March 1996. 186

[214] S.-M. Yang and V. A. Vaishampayan. Low-delay communication for Rayleigh fading channels: An application of the multiple description quantizer. IEEE Trans. Comm., 43(11):2771–2783, November 1995. 114

[215] D. C. Youla. Mathematical theory of image restoration by the method of convex projections. In H. Stark, editor, Image Recovery: Theory and Application. Academic Press, 1987. 25, 34

[216] B. Yu. A statistical analysis of adaptive scalar quantization based on quantized past data. IEEE Trans. Inform. Th., 1996. Submitted. 71

[217] G. Yu, M. M.-K. Liu, and M. W. Marcellin. POCS based error concealment for packet video using multi-frame overlap information. IEEE Trans. Circuits Syst. Video Technol., August 1996. Submitted. 166

[218] G. Yu, M. W. Marcellin, and M. M.-K. Liu. Recovery of video in the presence of packet loss using interleaving and spatial redundancy. In Proc. IEEE Int. Conf. Image Proc., volume II, pages 105–108, Lausanne, Switzerland, September 1996. 166

[219] R. Zamir. Gaussian codes and Shannon bounds for multiple descriptions. IEEE Trans. Inform. Th., June 1998. Submitted. 110, 112

[220] R. Zamir and M. Feder. On universal quantization by randomized uniform/lattice quantization. IEEE Trans. Inform. Th., 38(2):428–436, March 1992. 51

[221] R. Zamir and M. Feder. Rate-distortion performance in coding bandlimited sources by sampling and dithered quantization. IEEE Trans. Inform. Th., 41(1):141–154, January 1995. 51, 52, 150

[222] R. Zamir and M. Feder. Information rates of pre/post-filtered dithered quantizers. IEEE Trans. Inform. Th., 42(5):1340–1353, September 1996. 51, 52, 150

[223] A. Zandi, J. D. Allen, E. L. Schwartz, and M. Boliek. CREW: Compression with reversible embedded wavelets. In J. A. Storer and M. Cohn, editors, Proc. IEEE Data Compression Conf., pages 212–221, Snowbird, Utah, March 1995. IEEE Comp. Soc. Press. 167

[224] Z. Zhang. Matching Pursuit. PhD thesis, New York Univ., 1993. 30

[225] Z. Zhang and T. Berger. New results in binary multiple descriptions. IEEE Trans. Inform. Th., IT-33(4):502–521, July 1987. 101, 103, 104, 107

[226] Z. Zhang and T. Berger. Multiple description source coding with no excess marginal rate. IEEE Trans. Inform. Th., 41(2):349–357, March 1995. 103, 107

[227] Z. Zhang and V. K. Wei. An on-line universal lossy data compression algorithm via continuous codebook refinement—Part I: Basic results. IEEE Trans. Inform. Th., 42(3):803–821, May 1996. 54, 55

[228] J. Zhuang. Adaptive transforms and quantization. Master’s thesis, Univ. California, Berkeley, 1997. Available as Univ. California, Berkeley, Electron. Res. Lab. Memo. No. UCB/ERL M97/54, August 1997. 59, 62, 67

[229] J. Ziv. Coding for sources with unknown statistics: Part I. Probability of error. IEEE Trans. Inform. Th., IT-18(3):384–389, May 1972. 54

[230] J. Ziv. Coding for sources with unknown statistics: Part II. Distortion relative to a fidelity criterion. IEEE Trans. Inform. Th., IT-18(3):389–394, May 1972. 54

[231] J. Ziv. On universal quantization. IEEE Trans. Inform. Th., IT-31(3):344–347, May 1985. 51, 56

[232] J. Ziv. Back from infinity: A constrained resources approach to information theory. IEEE Inform. Th. Soc. Newsletter, 48(1):1+, March 1998. From a 1997 Shannon Lecture. 186

[233] J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Trans. Inform. Th., IT-23(3):337–343, May 1977. 54

[234] J. Ziv and M. Zakai. Some lower bounds on signal parameter estimation. IEEE Trans. Inform. Th., IT-15(3):386–391, May 1969. 49

[235] W. H. Zurek. Thermodynamic cost of computation, algorithmic complexity and the information metric. Nature, 341(6238):119–124, September 1989. 189