Data Compression: Transform Coding in Practice 2 / 50 Last Lecture Last Lectures: Orthogonal Block Transforms

Total Page:16

File Type:pdf, Size:1020Kb

Data Compression: Transform Coding in Practice 2 / 50 Last Lecture Last Lectures: Orthogonal Block Transforms Transform Coding in Practice Last Lecture Last Lectures: Basic Concept Transform Coding Transform reduces linear dependencies (correlation) between samples before scalar quantization For correlated sources: Scalar quantization in transform domain is more efficient encoder decoder 0 u0 q0 q0 u0 α0 β0 forward entropy entropy 0 inverse s u1 q1 b q1 u1 s0 transform α1 coding decoding β1 transform . γ −1 . 0 −1 A uN−1 qN−1 γ qN−1 uN−1 A αN−1 βN−1 Encoder (block-wise) Decoder (block-wise) −1 Forward transform: u = A · s Entropy decoding: fqk g = γ (b) Scalar quantization: q = α (u ) 0 k k k Inverse quantization: uk = βk (qk ) 0 −1 0 Entropy coding: b = γ( fqk g ) Inverse transform: s = A · u Heiko Schwarz (Freie Universität Berlin) — Data Compression: Transform Coding in Practice 2 / 50 Last Lecture Last Lectures: Orthogonal Block Transforms Transform matrix has property: A−1 = AT (special case of unitary matrix: A−1 = (A∗)T) 2 b0 3 2 3 6 b1 7 6 7 6 7 6 7 6 b2 7 −1 T 6 b b b ··· b 7 A = 6 7 A = A = 6 0 1 2 N−1 7 6 . 7 6 7 4 . 5 4 5 bN−1 −1 T Basis vectors bk (rows of A, columns of A = A ) form an orthonormal basis Geometric interpretation: Rotation (and potential reflection) in N-dimensional signal space Why Orthogonal Transforms ? 0 2 0 2 Same MSE distortion in sample and transform space: jju − ujj2 = jjs − sjj2 Minimum MSE in signal space can be achieved by minimization of MSE for each individual transform coefficient Heiko Schwarz (Freie Universität Berlin) — Data Compression: Transform Coding in Practice 3 / 50 Last Lecture Last Lectures: Bit Allocation and High-Rate Approximations Bit Allocation of Transform Coefficients Optimal bit allocation: Pareto condition @ Dk (Rk ) = −λ = const =) high rates: Dk (Rk ) = const @Rk High-Rate Approximation High-rate distortion rate function for transform coding with optimal bit allocation 1 1 2 2 −2R 2 Y 2N 2 Y 2N D(R) =" ~ · σ~ · 2 with "~ = "k ; σ~ = σk k k High-rate transform coding gain GT and energy compaction measure GEC D (R) "2 · σ2 σ2 1 PN−1 σ2 G = SQ = S S ; G = S = N k=0 k T 2 2 EC 2 q DTC (R) "~ · σ~ σ~ N QN−1 2 k=0 σk Heiko Schwarz (Freie Universität Berlin) — Data Compression: Transform Coding in Practice 4 / 50 Last Lecture Last Lectures: Karhunen Loève Transform (KLT) Design criterion: Orthogonal transform A that yields uncorrelated transform coefficients 2 2 3 σ0 0 ··· 0 6 0 σ2 ··· 0 7 T 6 1 7 2 CUU = A · CSS · A = 6 . 7 =) CSS · bk = σ · bk 6 . 7 k 4 . 5 2 0 0 ··· σN−1 Eigenvector equation for all basis vectors bk (rows of transform matrix A) Rows of KLT matrix A are the unit-norm eigenvectors of CSS 2 Transform coefficient variances σk are the eigenvalues of CSS 2 2 b0 3 2 σ0 0 ··· 0 3 2 6 b1 7 6 0 σ ··· 0 7 6 7 6 1 7 A = 6 . 7 C UU = 6 . 7 6 . 7 6 . .. 7 4 . 5 4 . 5 2 bN−1 0 0 ··· σN−1 Heiko Schwarz (Freie Universität Berlin) — Data Compression: Transform Coding in Practice 5 / 50 Last Lecture Last Lectures: Maximum Energy Compaction and Optimality High-Rate Approximation for KLT and Gauss-Markov High-rate operational distortion-rate function N−1 2 2 2 N −2R DN (R) = " · σS · (1 − % ) · 2 High-rate transform coding gain: Increases with transform size N N N 2 1−N 1 1 G = G = (1 − % ) N =) G = T EC T 1 − %2 For N ! 1, gap to fundamental lower bound reduces to space-filling gain (1.53 dB) On Optimality of KLT KLT yields uncorrelated transform coefficients and maximizes energy compaction GEC KLT is the optimal transform for stationary Gaussian sources Other sources: Optimal transform is hard to find (iterative algorithm) Heiko Schwarz (Freie Universität Berlin) — Data Compression: Transform Coding in Practice 6 / 50 Signal-Independent Unitary Transforms Transform Selection in Practice Optimal Unitary Transform Stationary Gaussian sources: KLT General sources: Not straightforward to determine (typically KLT close to optimal) Signal dependent (may change due to signal instationarities) Adaptive Transform Selection Determine transform in encoder, include transform specification in bitstream Increased side information may lead to sub-optimal overall coding efficiency Simple variant: Switched transforms (e.g., in H.266/VVC) Signal-Independent Transforms Choose transform that provides good performance for variety of signals Not optimal, but often close to optimal for typical signal Most often used design in practice Heiko Schwarz (Freie Universität Berlin) — Data Compression: Transform Coding in Practice 7 / 50 Signal-Independent Unitary Transforms / Walsh-Hadamard Transform Walsh-Hadamard Transform For transform sizes N that are positive integer powers of 2 1 AN=2 AN=2 h i AN = p with A1 = 1 : 2 AN=2 −AN=2 Examples: Transform matrices for N = 2, N = 4, and N = 8 2 1 1 1 1 1 1 1 1 3 1 1 1 A2 = p 6 1 −1 1 −1 1 −1 1 −1 7 2 1 −1 6 7 6 1 1 −1 −1 1 1 −1 −1 7 6 7 2 3 1 6 1 −1 −1 1 1 −1 −1 1 7 1 1 1 1 A8 = p 6 7 8 6 1 1 1 1 −1 −1 −1 −1 7 1 6 1 −1 1 −1 7 6 7 A4 = p 6 7 6 1 −1 1 −1 −1 1 −1 1 7 4 4 1 1 −1 −1 5 6 7 4 1 1 −1 −1 −1 −1 1 1 5 1 −1 −1 1 1 −1 −1 1 −1 1 1 −1 Very simple orthogonal transform (only additions, subtractions, and final scaling) Heiko Schwarz (Freie Universität Berlin) — Data Compression: Transform Coding in Practice 8 / 50 Signal-Independent Unitary Transforms / Walsh-Hadamard Transform Basis Functions of the WHT (Example for N = 8) b0 b4 b1 b5 b2 b6 b3 b7 Media coding: Walsh-Hadamard transform with strong quantization Piece-wise constant basis vectors yield subjectively disturbing artifacts Heiko Schwarz (Freie Universität Berlin) — Data Compression: Transform Coding in Practice 9 / 50 Signal-Independent Unitary Transforms / Fourier Transform Discrete Version of the Fourier Transform The Fourier Transform Fundamental transform used in mathematics, physics, signal processing, communications, ... Integral transform representing signal as integral of frequency components Forward and inverse transform are given by 1 1 Z Z X (f ) = Fx(t) = x(t) · e−2πift dt () x(t) = F −1x(t) = X (f ) · e2πift df −∞ −∞ 2πift Basis functions are complex exponentials bf (t) = e Discrete Version of the Fourier Transform Fourier transform for finite discrete signals Could also be useful for coding of discrete signals Can be derived using sampling and windowing Heiko Schwarz (Freie Universität Berlin) — Data Compression: Transform Coding in Practice 10 / 50 Signal-Independent Unitary Transforms / Fourier Transform Important Properties of the Fourier Transform n o Linearity: F a · h(t) + b · g(t) = a · H(f ) + b · G(f ) n o 1 f Scaling: F h(a · t) = · H jaj a n o −2πit0f Translation: F h(t − t0) = e · H(f ) n o 2πitf0 Modulation: F e · h(t) = H(f − f0) n o Duality: F H(t) = h(−f ) n o Z 1 Convolution: F h(t) ∗ g(t) = F g(τ) h(t − τ) dτ = H(f ) · G(f ) −∞ n o Multiplication: F h(t) · g(t) = H(f ) ∗ G(f ) Heiko Schwarz (Freie Universität Berlin) — Data Compression: Transform Coding in Practice 11 / 50 Signal-Independent Unitary Transforms / Fourier Transform The Dirac Delta Function Dirac Delta Function Not a function in traditional sense Dirac delta distribution Can be thought of function with the following properties 1 +1 : x = 0 Z δ(x) = and δ(x) dx = 1 0 : x 6= 0 −∞ Important Properties Z 1 Sifting: h(t) δ(t − t0) dt = h(t0) −∞ Z 1 Convolution: h(t) ∗ δ(t − t0) = h(τ) δ(t − t0 − τ) dτ = h(t − t0) −∞ 1 ! 1 Z 1 X X Sampling: h(t) δ(t − k · t0) dt = h(k · t0) −∞ k=−∞ k=−∞ Heiko Schwarz (Freie Universität Berlin) — Data Compression: Transform Coding in Practice 12 / 50 Signal-Independent Unitary Transforms / Fourier Transform Selected Fourier Transform Pairs Dirac delta function δ(t − T ) complex exponential F δ(t − T ) T t f x(t) = δ(t − T ) X (f ) = e−2πifT = cos(2πfT ) + i sin(2πfT ) Dirac comb шT (t) Dirac comb ШT (f ) 1 T T 1 t 1 f X X шT (t) = δ(t − kT ) ШT (f ) = δ(f − k=T ) k=−∞ k=−∞ Heiko Schwarz (Freie Universität Berlin) — Data Compression: Transform Coding in Practice 13 / 50 Signal-Independent Unitary Transforms / Fourier Transform Selected Fourier Transform Pairs rectangular window rectT (t) Sinc filter F rectT (f ) T 2 T t f 1 : jtj ≤ T =2 1 rectT (t) = F rect (f ) = sin(πfT ) = T sinc(fT ) 0 : jtj > T =2 T πf Gaussian g(t) Gaussian G(f ) t f 2 1 2 g(t) = e−π·t with σ2 = G(f ) = e−π·f = g(f ) t 2π Heiko Schwarz (Freie Universität Berlin) — Data Compression: Transform Coding in Practice 14 / 50 Signal-Independent Unitary Transforms / Discrete Fourier Transform Derivation of Discrete Fourier Transform: (1) Sampling of Signal continuous signal s(t) Fourier transform S(f ) t f × (multiplication) ∗ (convolution) Dirac comb ш0(t) Dirac comb Ш0(f ) t f = = sampled signal s(t) ш0(t) Fourier transform S(f )∗Ш0(f ) t f Heiko Schwarz (Freie Universität Berlin) — Data Compression: Transform Coding in Practice 15 / 50 Signal-Independent Unitary Transforms / Discrete Fourier Transform Derivation of Discrete Fourier Transform: (2) Time Restriction sampled signal s(t) ш0(t) Fourier transform S(f )∗Ш0(f ) t f × (multiplication) ∗ (convolution) rectangular window r(t) Sinc filter R(f ) t f = = finite sampled signal s(t) ш0(t) r(t) Fourier transform S(f )∗Ш0(f )∗R(f ) t f Heiko Schwarz (Freie Universität Berlin) — Data Compression: Transform Coding in Practice 16 / 50 Signal-Independent Unitary Transforms / Discrete Fourier Transform Derivation of Discrete Fourier Transform: (3) Sampling of Spectrum finite sampled signal s(t) ш0(t) r(t) Fourier transform S(f )∗Ш0(f )∗R(f ) t f ∗ (convolution) × (multiplication) Dirac comb ш1(t) Dirac comb Ш1(f ) t f = = periodic sampled signal s(t) ш0(t) r(t) ∗ ш1(t) Fourier transform [S(f )∗Ш0(f )∗R(f )] Ш1(f ) t f Heiko Schwarz (Freie Universität Berlin) — Data Compression:
Recommended publications
  • Error Correction Capacity of Unary Coding
    Error Correction Capacity of Unary Coding Pushpa Sree Potluri1 Abstract Unary coding has found applications in data compression, neural network training, and in explaining the production mechanism of birdsong. Unary coding is redundant; therefore it should have inherent error correction capacity. An expression for the error correction capability of unary coding for the correction of single errors has been derived in this paper. 1. Introduction The unary number system is the base-1 system. It is the simplest number system to represent natural numbers. The unary code of a number n is represented by n ones followed by a zero or by n zero bits followed by 1 bit [1]. Unary codes have found applications in data compression [2],[3], neural network training [4]-[11], and biology in the study of avian birdsong production [12]-14]. One can also claim that the additivity of physics is somewhat like the tallying of unary coding [15],[16]. Unary coding has also been seen as the precursor to the development of number systems [17]. Some representations of unary number system use n-1 ones followed by a zero or with the corresponding number of zeroes followed by a one. Here we use the mapping of the left column of Table 1. Table 1. An example of the unary code N Unary code Alternative code 0 0 0 1 10 01 2 110 001 3 1110 0001 4 11110 00001 5 111110 000001 6 1111110 0000001 7 11111110 00000001 8 111111110 000000001 9 1111111110 0000000001 10 11111111110 00000000001 The unary number system may also be seen as a space coding of numerical information where the location determines the value of the number.
    [Show full text]
  • Soft Compression for Lossless Image Coding
    1 Soft Compression for Lossless Image Coding Gangtao Xin, and Pingyi Fan, Senior Member, IEEE Abstract—Soft compression is a lossless image compression method, which is committed to eliminating coding redundancy and spatial redundancy at the same time by adopting locations and shapes of codebook to encode an image from the perspective of information theory and statistical distribution. In this paper, we propose a new concept, compressible indicator function with regard to image, which gives a threshold about the average number of bits required to represent a location and can be used for revealing the performance of soft compression. We investigate and analyze soft compression for binary image, gray image and multi-component image by using specific algorithms and compressible indicator value. It is expected that the bandwidth and storage space needed when transmitting and storing the same kind of images can be greatly reduced by applying soft compression. Index Terms—Lossless image compression, information theory, statistical distributions, compressible indicator function, image set compression. F 1 INTRODUCTION HE purpose of image compression is to reduce the where H(X1;X2; :::; Xn) is the joint entropy of the symbol T number of bits required to represent an image as much series fXi; i = 1; 2; :::; ng. as possible under the condition that the fidelity of the It’s impossible to reach the entropy rate and everything reconstructed image to the original image is higher than a we can do is to make great efforts to get close to it. Soft com- required value. Image compression often includes two pro- pression [1] was recently proposed, which uses locations cesses, encoding and decoding.
    [Show full text]
  • Generalized Golomb Codes and Adaptive Coding of Wavelet-Transformed Image Subbands
    IPN Progress Report 42-154 August 15, 2003 Generalized Golomb Codes and Adaptive Coding of Wavelet-Transformed Image Subbands A. Kiely1 and M. Klimesh1 We describe a class of prefix-free codes for the nonnegative integers. We apply a family of codes in this class to the problem of runlength coding, specifically as part of an adaptive algorithm for compressing quantized subbands of wavelet- transformed images. On test images, our adaptive coding algorithm is shown to give compression effectiveness comparable to the best performance achievable by an alternate algorithm that has been previously implemented. I. Generalized Golomb Codes A. Introduction Suppose we wish to use a variable-length binary code to encode integers that can take on any non- negative value. This problem arises, for example, in runlength coding—encoding the lengths of runs of a dominant output symbol from a discrete source. Encoding cannot be accomplished by simply looking up codewords from a table because the table would be infinite, and Huffman’s algorithm could not be used to construct the code anyway. Thus, one would like to use a simply described code that can be easily encoded and decoded. Although in practice there is generally a limit to the size of the integers that need to be encoded, a simple code for the nonnegative integers is often still the best choice. We now describe a class of variable-length binary codes for the nonnegative integers that contains several useful families of codes. Each code in this class is completely specified by an index function f from the nonnegative integers onto the nonnegative integers.
    [Show full text]
  • The Pillars of Lossless Compression Algorithms a Road Map and Genealogy Tree
    International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 6 (2018) pp. 3296-3414 © Research India Publications. http://www.ripublication.com The Pillars of Lossless Compression Algorithms a Road Map and Genealogy Tree Evon Abu-Taieh, PhD Information System Technology Faculty, The University of Jordan, Aqaba, Jordan. Abstract tree is presented in the last section of the paper after presenting the 12 main compression algorithms each with a practical This paper presents the pillars of lossless compression example. algorithms, methods and techniques. The paper counted more than 40 compression algorithms. Although each algorithm is The paper first introduces Shannon–Fano code showing its an independent in its own right, still; these algorithms relation to Shannon (1948), Huffman coding (1952), FANO interrelate genealogically and chronologically. The paper then (1949), Run Length Encoding (1967), Peter's Version (1963), presents the genealogy tree suggested by researcher. The tree Enumerative Coding (1973), LIFO (1976), FiFO Pasco (1976), shows the interrelationships between the 40 algorithms. Also, Stream (1979), P-Based FIFO (1981). Two examples are to be the tree showed the chronological order the algorithms came to presented one for Shannon-Fano Code and the other is for life. The time relation shows the cooperation among the Arithmetic Coding. Next, Huffman code is to be presented scientific society and how the amended each other's work. The with simulation example and algorithm. The third is Lempel- paper presents the 12 pillars researched in this paper, and a Ziv-Welch (LZW) Algorithm which hatched more than 24 comparison table is to be developed.
    [Show full text]
  • New Approaches to Fine-Grain Scalable Audio Coding
    New Approaches to Fine-Grain Scalable Audio Coding Mahmood Movassagh Department of Electrical & Computer Engineering McGill University Montreal, Canada December 2015 Research Thesis submitted to McGill University in partial fulfillment of the requirements for the degree of PhD. c 2015 Mahmood Movassagh In memory of my mother whom I lost in the last year of my PhD studies To my father who has been the greatest support for my studies in my life Abstract Bit-rate scalability has been a useful feature in the multimedia communications. Without the need to re-encode the original signal, it allows for improving/decreasing the quality of a signal as more/less of a total bit stream becomes available. Using scalable coding, there is no need to store multiple versions of a signal encoded at different bit-rates. Scalable coding can also be used to provide users with different quality streaming when they have different constraints or when there is a varying channel; i.e., the receivers with lower channel capacities will be able to receive signals at lower bit-rates. It is especially useful in the client- server applications where the network nodes are able to drop some enhancement layer bits to satisfy link capacity constraints. In this dissertation, we provide three contributions to practical scalable audio coding systems. Our first contribution is the scalable audio coding using watermarking. The proposed scheme uses watermarking to embed some of the information of each layer into the previous layer. This approach leads to a saving in bitrate, so that it outperforms (in terms of rate-distortion) the common scalable audio coding based on the reconstruction error quantization (REQ) as used in MPEG-4 audio.
    [Show full text]
  • Critical Assessment of Advanced Coding Standards for Lossless Audio Compression
    TONNY HIDAYAT et al: A CRITICAL ASSESSMENT OF ADVANCED CODING STANDARDS FOR LOSSLESS .. A Critical Assessment of Advanced Coding Standards for Lossless Audio Compression Tonny Hidayat Mohd Hafiz Zakaria, Ahmad Naim Che Pee Department of Information Technology Faculty of Information and Communication Technology Universitas Amikom Yogyakarta Universiti Teknikal Malaysia Melaka Yogyakarta, Indonesia Melaka, Malaysia [email protected] [email protected], [email protected] Abstract - Image data, text, video, and audio data all require compression for storage issues and real-time access via computer networks. Audio data cannot use compression technique for generic data. The use of algorithms leads to poor sound quality, small compression ratios and algorithms are not designed for real-time access. Lossless audio compression has achieved observation as a research topic and business field of the importance of the need to store data with excellent condition and larger storage charges. This article will discuss and analyze the various lossless and standardized audio coding algorithms that concern about LPC definitely due to its reputation and resistance to compression that is audio. However, another expectation plans are likewise broke down for relative materials. Comprehension of LPC improvements, for example, LSP deterioration procedures is additionally examined in this paper. Keywords - component; Audio; Lossless; Compression; coding. I. INTRODUCTION Compression is to shrink / compress the size. Data compression is a technique to minimize the data so that files can be obtained with a size smaller than the original file size. Compression is needed to minimize the data storage (because the data size is smaller than the original), accelerate information transmission, and limit bandwidth prerequisites.
    [Show full text]
  • Answers to Exercises
    Answers to Exercises A bird does not sing because he has an answer, he sings because he has a song. —Chinese Proverb Intro.1: abstemious, abstentious, adventitious, annelidous, arsenious, arterious, face- tious, sacrilegious. Intro.2: When a software house has a popular product they tend to come up with new versions. A user can update an old version to a new one, and the update usually comes as a compressed file on a floppy disk. Over time the updates get bigger and, at a certain point, an update may not fit on a single floppy. This is why good compression is important in the case of software updates. The time it takes to compress and decompress the update is unimportant since these operations are typically done just once. Recently, software makers have taken to providing updates over the Internet, but even in such cases it is important to have small files because of the download times involved. 1.1: (1) ask a question, (2) absolutely necessary, (3) advance warning, (4) boiling hot, (5) climb up, (6) close scrutiny, (7) exactly the same, (8) free gift, (9) hot water heater, (10) my personal opinion, (11) newborn baby, (12) postponed until later, (13) unexpected surprise, (14) unsolved mysteries. 1.2: A reasonable way to use them is to code the five most-common strings in the text. Because irreversible text compression is a special-purpose method, the user may know what strings are common in any particular text to be compressed. The user may specify five such strings to the encoder, and they should also be written at the start of the output stream, for the decoder’s use.
    [Show full text]
  • Lec 05 Arithmetic Coding
    ECE 5578 Multimedia Communication Lec 05 Arithmetic Coding Zhu Li Dept of CSEE, UMKC web: http://l.web.umkc.edu/lizhu phone: x2346 Z. Li, Multimedia Communciation, 2018 p.1 Outline Lecture 04 ReCap Arithmetic Coding About Homework-1 and Lab Z. Li, Multimedia Communciation, 2018 p.2 JPEG Coding Block (8x8 pel) based coding DCT transform to find sparse * representation Quantization reflects human visual system Zig-Zag scan to convert 2D to 1D string Run-Level pairs to have even more = compact representation Hoffman Coding on Level Category Quant Table: Fixed on the Level with in the category Z. Li, Multimedia Communciation, 2018 p.3 Coding of AC Coefficients Zigzag scanning: Example 8 24 -2 0 0 0 0 0 -31 -4 6 -1 0 0 0 0 0 -12 -1 2 0 0 0 0 0 0 -2 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Example: zigzag scanning result 24 -31 0 -4 -2 0 6 -12 0 0 0 -1 -1 0 0 0 2 -2 0 0 0 0 0 -1 EOB (Run, level) representation: (0, 24), (0, -31), (1, -4), (0, -2), (1, 6), (0, -12), (3, -1), (0, -1), (3, 2), (0, -2), (5, -1), EOB Z. Li, Multimedia Communciation, 2018 p.4 Coding of AC Coefficients Run / Base Run / Base … Run / Base codeword Catg. codeword Catg. Codeword Cat. EOB 1010 - - … ZRL 1111 1111 001 0/1 00 1/1 1100 … 15/1 1111 1111 1111 0101 0/2 01 1/2 11011 … 15/2 1111 1111 1111 0110 0/3 100 1/3 1111001 … 15/3 1111 1111 1111 0111 0/4 1011 1/4 111110110 … 15/4 1111 1111 1111 1000 0/5 11010 1/5 11111110110 … 15/5 1111 1111 1111 1001 … … … … … … … ZRL: represent 16 zeros when number of zeros exceeds 15.
    [Show full text]
  • Optimization Methods for Data Compression
    OPTIMIZATION METHODS FOR DATA COMPRESSION A Dissertation Presented to The Faculty of the Graduate School of Arts and Sciences Brandeis University Computer Science James A. Storer Advisor In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy by Giovanni Motta May 2002 This dissertation, directed and approved by Giovanni Motta's Committee, has been accepted and approved by the Graduate Faculty of Brandeis University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY _________________________ Dean of Arts and Sciences Dissertation Committee __________________________ James A. Storer ____________________________ Martin Cohn ____________________________ Jordan Pollack ____________________________ Bruno Carpentieri ii To My Parents. iii ACKNOWLEDGEMENTS I wish to thank: Bruno Carpentieri, Martin Cohn, Antonella Di Lillo, Jordan Pollack, Francesco Rizzo, James Storer for their support and collaboration. I also thank Jeanne DeBaie, Myrna Fox, Julio Santana for making my life at Brandeis easier and enjoyable. iv ABSTRACT Optimization Methods for Data Compression A dissertation presented to the Faculty of the Graduate School of Arts and Sciences of Brandeis University, Waltham, Massachusetts by Giovanni Motta Many data compression algorithms use ad–hoc techniques to compress data efficiently. Only in very few cases, can data compressors be proved to achieve optimality on a specific information source, and even in these cases, algorithms often use sub–optimal procedures in their execution. It is appropriate to ask whether the replacement of a sub–optimal strategy by an optimal one in the execution of a given algorithm results in a substantial improvement of its performance. Because of the differences between algorithms the answer to this question is domain dependent and our investigation is based on a case–by–case analysis of the effects of using an optimization procedure in a data compression algorithm.
    [Show full text]
  • Lecture 14 Announcements
    Image Compression and Watermarking Image Processing CSE 166 Lecture 14 Announcements • Assignment 4 is due May 20, 11:59 PM • Assignment 5 will be released May 20 – Due May 29, 11:59 PM • Reading – Chapter 8: Image Compression and Watermarking • Sections 8.1, 8.9, 8.10, and 8.12 CSE 166, Spring 2020 2 Data compression • Data redundancy where compression ratio where CSE 166, Spring 2020 3 Data redundancy in images Coding Spatial Irrelevant redundancy redundancy information Does not need all Information is Information is 8 bits unnecessarily not useful replicated CSE 166, Spring 2020 4 Image information • Entropy where – It is not possible to encode input image with fewer than bits/pixel CSE 166, Spring 2020 5 Fidelity criteria, objective (quantitative) • Total error • Root-mean-square error • Mean-square signal to noise ratio (SNR) CSE 166, Spring 2020 6 Fidelity criteria, subjective (qualitative) CSE 166, Spring 2020 7 Approximations Objective (quantitative) quality rms error (in intensity levels) Lower is 5.17 15.67 14.17 better (a) (b) (c) (a) is better Subjective (qualitative) quality, relative than (b), (b) is better CSE 166, Spring 2020 than (c)8 Compression system CSE 166, Spring 2020 9 Compression methods • Huffman coding • Golomb coding • Arithmetic coding • Lempel-Ziv-Welch (LZW) coding • Run-length coding • Symbol-based coding • Bit-plane coding • Block transform coding • Predictive coding • Wavelet coding CSE 166, Spring 2020 10 Symbol-based coding (0,2) (3,10) … CSE 166, Spring 2020 11 Block-transform coding Encoder Decoder
    [Show full text]
  • Optimal Prefix Codes for Pairs of Geometrically-Distributed Random Variables Frédérique Bassino, Julien Clément, Gadiel Seroussi, Alfredo Viola
    Optimal prefix codes for pairs of geometrically-distributed random variables Frédérique Bassino, Julien Clément, Gadiel Seroussi, Alfredo Viola To cite this version: Frédérique Bassino, Julien Clément, Gadiel Seroussi, Alfredo Viola. Optimal prefix codes for pairs of geometrically-distributed random variables. IEEE Transactions on Information Theory, Institute of Electrical and Electronics Engineers, 2013, 59 (4), pp.2375 - 2395. 10.1109/TIT.2012.2236915. hal-00511248 HAL Id: hal-00511248 https://hal.archives-ouvertes.fr/hal-00511248 Submitted on 24 Aug 2010 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Optimal prefix codes for pairs of geometrically-distributed random variables Fred´ erique´ Bassino LIPN UMR 7030 Universite´ Paris 13 - CNRS, France [email protected] Julien Clement´ GREYC UMR 6072 CNRS, Universite´ de Caen, ENSICAEN, France [email protected] Gadiel Seroussi HP Labs, Palo Alto, California and Universidad de la Republica,´ Montevideo, Uruguay [email protected] Alfredo Viola Universidad de la Republica,´ Montevideo, Uruguay, and LIPN UMR 7030, Universite´ Paris 13 - CNRS, France [email protected] Abstract Optimal prefix codes are studied for pairs of independent, integer-valued symbols emitted by a source with a geometric probability distribution of parameter q, 0<q<1.
    [Show full text]
  • Efficient Inverted Index Compression Algorithm Characterized by Faster
    entropy Article Efficient Inverted Index Compression Algorithm Characterized by Faster Decompression Compared with the Golomb-Rice Algorithm Andrzej Chmielowiec 1,* and Paweł Litwin 2 1 The Faculty of Mechanics and Technology, Rzeszow University of Technology, Kwiatkowskiego 4, 37-450 Stalowa Wola, Poland 2 The Faculty of Mechanical Engineering and Aeronautics, Rzeszow University of Technology, Powsta´ncówWarszawy 8, 35-959 Rzeszow, Poland; [email protected] * Correspondence: [email protected] Abstract: This article deals with compression of binary sequences with a given number of ones, which can also be considered as a list of indexes of a given length. The first part of the article shows that the entropy H of random n-element binary sequences with exactly k elements equal one satisfies the inequalities k log2(0.48 · n/k) < H < k log2(2.72 · n/k). Based on this result, we propose a simple coding using fixed length words. Its main application is the compression of random binary sequences with a large disproportion between the number of zeros and the number of ones. Importantly, the proposed solution allows for a much faster decompression compared with the Golomb-Rice coding with a relatively small decrease in the efficiency of compression. The proposed algorithm can be particularly useful for database applications for which the speed of decompression is much more important than the degree of index list compression. Citation: Chmielowiec, A.; Litwin, P. Efficient Inverted Index Compression Keywords: inverted index compression; Golomb-Rice coding; runs coding; sparse binary sequence Algorithm Characterized by Faster compression Decompression Compared with the Golomb-Rice Algorithm.
    [Show full text]