Entropy Coding of Adaptive Sparse Data Representations by Context Mixing
by Thomas Wenzel
Student ID (Matrikel-Nr.) 5960073, born 23 July 1987
Master's thesis in Mathematics
Universität Hamburg, 2013
First examiner: Prof. Dr. Armin Iske
Second examiner: Dr. Laurent Demaret

Declaration

I have written this thesis independently and have used no aids other than those listed, in particular no Internet sources not named in the list of references. I have not previously submitted this thesis in any other examination procedure. The submitted written version corresponds exactly to the version on the electronic storage medium.

Abstract

This thesis presents improvements to the video compression scheme AT3D, which relies on linear spline interpolation over anisotropic Delaunay tetrahedralizations. A more sophisticated entropy coding stage is introduced, and several intermediate computations are optimized to reduce compression time. A detailed literature summary of information theory basics and a detailed summary of the AT3D implementation are given. The original entropy coding stage is replaced by the core of the general-purpose data compression scheme PAQ by Matt Mahoney, which employs a geometric context-mixing technique based on online convex programming for coding cost minimization. Novel models for pixel position and grayscale value encoding are incorporated into the new AT3D_PAQ implementation and investigated numerically. A significant average improvement is achieved on a wide variety of test sequences, and in a comparison with H.264 a class of sequences is presented on which AT3D_PAQ outperforms H.264. Theoretical and numerical results additionally demonstrate the higher computational efficiency of AT3D_PAQ and show significantly reduced compression times in a variety of test environments.

Contents

1 Introduction .......... 3
  1.1 Thesis organization and contributions .......... 6
2 Fundamentals .......... 8
  2.1 Notational conventions .......... 8
  2.2 A brief introduction to video compression .......... 8
    2.2.1 Summary and outlook .......... 13
  2.3 MPEG4-AVC/H.264 .......... 14
    2.3.1 Overview .......... 14
    2.3.2 Coding units .......... 14
    2.3.3 Intra- and inter-frame prediction .......... 16
    2.3.4 Transform coding .......... 17
    2.3.5 Scan, quantization and entropy coding .......... 18
    2.3.6 In-loop deblocking filter .......... 18
    2.3.7 Rate control .......... 19
    2.3.8 Summary .......... 19
  2.4 Introduction to information theory .......... 20
    2.4.1 Basic definitions .......... 20
    2.4.2 Entropy .......... 21
    2.4.3 Linking entropy to data compression .......... 25
    2.4.4 Arithmetic coding .......... 29
    2.4.5 Separation of modeling and encoding .......... 34
    2.4.6 Summary .......... 35
  2.5 AT3D .......... 36
    2.5.1 Delaunay tetrahedralizations .......... 37
    2.5.2 Linear splines over Delaunay tetrahedralizations .......... 38
    2.5.3 Pre-processing .......... 40
    2.5.4 Adaptive thinning .......... 41
    2.5.5 Post-processing: Pixel exchange .......... 43
    2.5.6 Post-processing: Best approximation .......... 45
    2.5.7 Quantization .......... 46
    2.5.8 Implementation of arithmetic coding .......... 47
    2.5.9 Entropy coding: Pixel positions .......... 47
    2.5.10 Entropy coding: Grayscale values .......... 51
    2.5.11 Numerical results regarding stationarity .......... 54
3 Development of AT3D_PAQ .......... 55
  3.1 PAQ .......... 56
    3.1.1 PAQ8 - Building blocks .......... 57
    3.1.2 Context mixing in PAQ1-3 .......... 60
    3.1.3 Context mixing in PAQ4-6 .......... 61
    3.1.4 Context mixing in PAQ7-8: Logistic/geometric mixing .......... 64
    3.1.5 PAQ8 - Modeling tools .......... 67
    3.1.6 PAQ8 - Adaptive probability maps .......... 68
  3.2 Integration of PAQ into AT3D .......... 69
  3.3 Modifications of PAQ after the integration .......... 70
    3.3.1 Modeling pixel positions .......... 71
    3.3.2 Modeling grayscale values .......... 75
  3.4 Numerical results: AT3D_PAQ vs. AT3D .......... 79
    3.4.1 Pixel position encoding results .......... 81
    3.4.2 Grayscale value encoding results .......... 85
    3.4.3 Total compression results for selected sequences .......... 85
    3.4.4 Summary and analysis of overall results .......... 90
    3.4.5 Compression time comparison: AT3D_PAQ vs. AT3D .......... 90
  3.5 Numerical results: AT3D_PAQ vs. H.264 .......... 91
    3.5.1 Summary and analysis of results .......... 98
  3.6 Conclusion .......... 98
  3.7 Outlook .......... 99
4 Runtime optimization .......... 100
  4.1 Improvements of the AT3D implementation .......... 101
    4.1.1 Assignment of pixels to tetrahedra .......... 101
    4.1.2 Custom queue implementation .......... 117
    4.1.3 Further improvements .......... 119
  4.2 Numerical results .......... 120
  4.3 Conclusion .......... 127
  4.4 Outlook .......... 127
  4.5 Further code improvements .......... 128
Appendices .......... 129
  A List of used hard- and software .......... 129
  B Test video sequences .......... 130
  C Detailed compression results .......... 133
  D Detailed H.264 results .......... 137
  E Detailed runtime results .......... 139

1 Introduction

Given a sequence S of symbols from an alphabet A that is to be stored on a medium or transmitted over a channel, it is natural to ask what the shortest representation of S is that still allows its exact reconstruction, and how to obtain it. This is the task of lossless data compression, or entropy coding: the transformation of an input sequence into a preferably shorter output sequence. First answers were given by Shannon in 1948, when he created the field of information theory. Shannon introduced the concept of the entropy of a sequence S. Entropy may be measured in bits per symbol and refers to the expected information content per symbol of a sequence with respect to a probability distribution P of the symbols in S. One of the most important results of his work was to show that entropy is a lower bound on the minimal length of an encoded sequence that still allows exact reconstruction. We will present results from the information theory literature extending Shannon's results to sources S, defined as random processes generating sequences Si, and to their entropy rates, which describe the expected information content per symbol of a sequence emitted by S.
Those results show that a lower bound for the expected code length per symbol of such a sequence Si exists and that it is given by the entropy rate of S if S is stationary. Stationarity refers to the independence of the joint probability distributions of arbitrary symbols from their time of emission. Even if S is not stationary, which is the case in our application, a lower bound on the expected code length per symbol is given by the joint entropy of a finite consecutive subsequence of the random variables that S consists of; it is calculated from the joint probability distribution of those random variables. Unfortunately, none of these lower bounds is computable in practical applications. The above concepts and results will be introduced in detail later; they provide a theoretical background for data compression and lead to a better understanding of our later goals.

We will focus on a specific kind of data to compress: video data, a digital representation of a sequence of images. The need for efficient video compression schemes stems from the large size of typical video sequences and from the variety of fields in which video sequences have to be stored or transmitted. Most video compression schemes, and in particular the ones considered in this thesis, are lossy schemes, which introduce an error into the reconstructed video data. In 1989 the scheme H.262/MPEG-1 was introduced, which created the basis of today's state-of-the-art general-purpose compression scheme H.264/MPEG4-AVC. Both rely on intra-frame compression via a discrete cosine transform (or a closely related transform, respectively) and on translational inter-frame motion compensation of pixel blocks. Besides a vast number of small improvements, H.264 additionally introduced an in-loop deblocking filter, which significantly improved the visual quality of its output by reducing the tendency to produce visible block artifacts. A detailed introduction to the specification of H.264 is given in a dedicated chapter.
In this thesis we focus on the improvement of a different compression scheme, which we will later compare to H.264: AT3D. The latter was developed by Demaret, Iske, and Khachabi, based on the two-dimensional image compression scheme AT (adaptive thinning). The given data is interpreted as a 3D scalar field with domain X, and the scheme then relies on linear spline interpolation over anisotropic Delaunay tetrahedralizations. The linear spline space is determined indirectly by the greedy iterative adaptive thinning algorithm, which selects a sparse set Xn of significant pixels from the initial grid by removing one least significant pixel or edge in each iteration.
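The greedy remove-the-least-significant-point loop can be illustrated by a much-simplified one-dimensional analogue. AT3D's actual significance measure involves the spline approximation error over the tetrahedra affected by a removal; here, purely for illustration, significance is taken as the local linear-interpolation error at a point, and the function name `thin` is ours:

```python
def thin(points, values, n_keep):
    """Greedy thinning (1D sketch): repeatedly remove the interior point whose
    removal causes the smallest local linear-interpolation error, until only
    n_keep points remain. Endpoints are always kept so the domain stays covered."""
    vals = dict(zip(points, values))
    xs = sorted(vals)
    while len(xs) > n_keep:
        best_x, best_err = None, float("inf")
        for i in range(1, len(xs) - 1):
            x0, x, x1 = xs[i - 1], xs[i], xs[i + 1]
            # value at x if it were reconstructed from its neighbors
            interp = vals[x0] + (vals[x1] - vals[x0]) * (x - x0) / (x1 - x0)
            err = abs(interp - vals[x])
            if err < best_err:
                best_x, best_err = x, err
        xs.remove(best_x)  # discard the least significant point
    return xs

# Samples of a ramp that flattens out: the "corner" at x=2 is significant
# and survives, while points that are linearly redundant are removed first.
kept = thin([0, 1, 2, 3, 4], [0.0, 1.0, 2.0, 2.0, 2.0], 3)
print(kept)  # → [0, 2, 4]
```

In AT3D the same greedy principle operates on a 3D pixel grid, with the Delaunay tetrahedralization updated after each removal.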