Source Coding: Part I of Fundamentals of Source and Video Coding
Total Page:16
File Type:pdf, Size:1020Kb
Foundations and Trends R in sample Vol. 1, No 1 (2011) 1–217 c 2011 Thomas Wiegand and Heiko Schwarz DOI: xxxxxx Source Coding: Part I of Fundamentals of Source and Video Coding Thomas Wiegand1 and Heiko Schwarz2 1 Berlin Institute of Technology and Fraunhofer Institute for Telecommunica- tions — Heinrich Hertz Institute, Germany, [email protected] 2 Fraunhofer Institute for Telecommunications — Heinrich Hertz Institute, Germany, [email protected] Abstract Digital media technologies have become an integral part of the way we create, communicate, and consume information. At the core of these technologies are source coding methods that are described in this text. Based on the fundamentals of information and rate distortion theory, the most relevant techniques used in source coding algorithms are de- scribed: entropy coding, quantization as well as predictive and trans- form coding. The emphasis is put onto algorithms that are also used in video coding, which will be described in the other text of this two-part monograph. To our families Contents 1 Introduction 1 1.1 The Communication Problem 3 1.2 Scope and Overview of the Text 4 1.3 The Source Coding Principle 5 2 Random Processes 7 2.1 Probability 8 2.2 Random Variables 9 2.2.1 Continuous Random Variables 10 2.2.2 Discrete Random Variables 11 2.2.3 Expectation 13 2.3 Random Processes 14 2.3.1 Markov Processes 16 2.3.2 Gaussian Processes 18 2.3.3 Gauss-Markov Processes 18 2.4 Summary of Random Processes 19 i ii Contents 3 Lossless Source Coding 20 3.1 Classification of Lossless Source Codes 21 3.2 Variable-Length Coding for Scalars 22 3.2.1 Unique Decodability 22 3.2.2 Entropy 27 3.2.3 The Huffman Algorithm 29 3.2.4 Conditional Huffman Codes 31 3.2.5 Adaptive Huffman Codes 33 3.3 Variable-Length Coding for Vectors 33 3.3.1 Huffman Codes for Fixed-Length Vectors 34 3.3.2 Huffman Codes for Variable-Length Vectors 36 3.4 Elias Coding and Arithmetic Coding 40 3.4.1 Elias Coding 41 3.4.2 Arithmetic Coding 47 3.5 Probability Interval Partitioning Entropy Coding 52 3.6 Comparison of Lossless Coding Techniques 61 3.7 Adaptive Coding 62 3.8 Summary of Lossless Source Coding 64 4 Rate Distortion Theory 66 4.1 The Operational Rate Distortion Function 67 4.1.1 Distortion 68 4.1.2 Rate 70 4.1.3 Operational Rate Distortion Function 70 4.2 The Information Rate Distortion Function 72 4.2.1 Mutual Information 72 4.2.2 Information Rate Distortion Function 76 4.2.3 Properties of the Rate Distortion Function 80 4.3 The Shannon Lower Bound 81 4.3.1 Differential Entropy 81 4.3.2 Shannon Lower Bound 85 4.4 Rate Distortion Function for Gaussian Sources 89 4.4.1 Gaussian IID Sources 90 4.4.2 Gaussian Sources with Memory 91 Contents iii 4.5 Summary of Rate Distortion Theory 98 5 Quantization 100 5.1 Structure and Performance of Quantizers 101 5.2 Scalar Quantization 104 5.2.1 Scalar Quantization with Fixed-Length Codes 106 5.2.2 Scalar Quantization with Variable-Length Codes 111 5.2.3 High-Rate Operational Distortion Rate Functions 119 5.2.4 Approximation for Distortion Rate Functions 125 5.2.5 Performance Comparison for Gaussian Sources 127 5.2.6 Scalar Quantization for Sources with Memory 129 5.3 Vector Quantization 133 5.3.1 Vector Quantization with Fixed-Length Codes 133 5.3.2 Vector Quantization with Variable-Length Codes 137 5.3.3 The Vector Quantization Advantage 138 5.3.4 Performance and Complexity 142 5.4 Summary of Quantization 144 6 Predictive Coding 146 6.1 Prediction 148 6.2 Linear Prediction 152 6.3 Optimal Linear Prediction 154 6.3.1 One-Step Prediction 156 6.3.2 One-Step Prediction for Autoregressive Processes 158 6.3.3 Prediction Gain 160 6.3.4 Asymptotic Prediction Gain 160 6.4 Differential Pulse Code Modulation (DPCM) 163 6.4.1 Linear Prediction for DPCM 165 6.4.2 Adaptive Differential Pulse Code Modulation 172 6.5 Summary of Predictive Coding 174 7 Transform Coding 176 7.1 Structure of Transform Coding Systems 179 7.2 Orthogonal Block Transforms 180 iv Contents 7.3 Bit Allocation for Transform Coefficients 187 7.3.1 Approximation for Gaussian Sources 188 7.3.2 High-Rate Approximation 190 7.4 The Karhunen Lo`eve Transform (KLT) 191 7.4.1 On the Optimality of the KLT 193 7.4.2 Asymptotic Operational Distortion Rate Function 197 7.4.3 Performance for Gauss-Markov Sources 199 7.5 Signal-Independent Unitary Transforms 200 7.5.1 The Walsh-Hadamard Transform (WHT) 201 7.5.2 The Discrete Fourier Transform (DFT) 201 7.5.3 The Discrete Cosine Transform (DCT) 203 7.6 Transform Coding Example 205 7.7 Summary of Transform Coding 207 8 Summary 209 Acknowledgements 212 References 213 1 Introduction The advances in source coding technology along with the rapid develop- ments and improvements of network infrastructures, storage capacity, and computing power are enabling an increasing number of multime- dia applications. In this text, we will describe and analyze fundamental source coding techniques that are found in a variety of multimedia ap- plications, with the emphasis on algorithms that are used in video cod- ing applications. The present first part of the text concentrates on the description of fundamental source coding techniques, while the second part describes their application in modern video coding. The application areas of digital video today range from multi- media messaging, video telephony, and video conferencing over mo- bile TV, wireless and wired Internet video streaming, standard- and high-definition TV broadcasting, subscription and pay-per-view ser- vices to personal video recorders, digital camcorders, and optical stor- age media such as the digital versatile disc (DVD) and Blu-Ray disc. Digital video transmission over satellite, cable, and terrestrial channels is typically based on H.222.0/MPEG-2 systems [37], while wired and wireless real-time conversational services often use H.32x [32, 33, 34] or SIP [64], and mobile transmission systems using the Internet and mo- 1 2 Introduction bile networks are usually based on RTP/IP [68]. In all these application areas, the same basic principles of video compression are employed. Video s Video b Channel Modu- Scene Capture Encoder Encoder lator Video Error Control Channel Channel Codec Human Video s' Video b' Channel Demodu- Observer Display Decoder Decoder lator Fig. 1.1 Typical structure of a video transmission system. The block structure for a typical video transmission scenario is il- lustrated in Fig. 1.1. The video capture generates a video signal s that is discrete in space and time. Usually, the video capture consists of a camera that projects the 3-dimensional scene onto an image sensor. Cameras typically generate 25 to 60 frames per second. For the con- siderations in this text, we assume that the video signal s consists of progressively-scanned pictures. The video encoder maps the video sig- nal s into the bitstream b. The bitstream is transmitted over the error control channel and the received bitstream b′ is processed by the video decoder that reconstructs the decoded video signal s′ and presents it via the video display to the human observer. The visual quality of the decoded video signal s′ as shown on the video display affects the view- ing experience of the human observer. This text focuses on the video encoder and decoder part, which is together called a video codec. The error characteristic of the digital channel can be controlled by the channel encoder, which adds redundancy to the bits at the video encoder output b. The modulator maps the channel encoder output to an analog signal, which is suitable for transmission over a phys- ical channel. The demodulator interprets the received analog signal as a digital signal, which is fed into the channel decoder. The chan- nel decoder processes the digital signal and produces the received bit- stream b′, which may be identical to b even in the presence of channel noise. The sequence of the five components, channel encoder, modula- 1.1. The Communication Problem 3 tor, channel, demodulator, and channel decoder, are lumped into one box, which is called the error control channel. According to Shannon’s basic work [69, 70] that also laid the ground to the subject of this text, by introducing redundancy at the channel encoder and by introducing delay, the amount of transmission errors can be controlled. 1.1 The Communication Problem The basic communication problem may be posed as conveying source data with the highest fidelity possible without exceeding an available bit rate, or it may be posed as conveying the source data using the lowest bit rate possible while maintaining a specified reproduction fi- delity [69]. In either case, a fundamental trade-off is made between bit rate and signal fidelity. The ability of a source coding system to suit- able choose this trade-off is referred to as its coding efficiency or rate distortion performance. Video codecs are thus primarily characterized in terms of: throughput of the channel: a characteristic influenced by the • transmission channel bit rate and the amount of protocol and error-correction coding overhead incurred by the trans- mission system; distortion of the decoded video: primarily induced by the • video codec and by channel errors introduced in the path to the video decoder. However, in practical video transmission systems, the following addi- tional issues must be considered: delay: a characteristic specifying the start-up latency and • end-to-end delay.