Efficient Video Coding with Motion-Compensated Orthogonal Transforms

Du Liu
Master's Degree Project
Stockholm, Sweden, July 2011
XR-EE-SIP 2011:011

Abstract

Well-known standard hybrid coding techniques utilize the concept of motion-compensated predictive coding in a closed loop. The resulting coding dependencies are a major challenge for packet-based networks like the Internet. On the other hand, subband coding techniques avoid the dependencies of predictive coding and are able to generate video streams that better match packet-based networks. An interesting class for subband coding is the so-called motion-compensated orthogonal transform. It generates orthogonal subband coefficients for arbitrary underlying motion fields. In this project, a theoretical lossless signal model based on a Gaussian distribution is proposed. It is possible to obtain the optimal rate allocation from this model. Additionally, a rate-distortion efficient video coding scheme is developed that takes advantage of motion-compensated orthogonal transforms. The scheme combines multiple types of motion-compensated orthogonal transforms, variable block size, and half-pel accurate motion compensation. The experimental results show that this scheme outperforms individual motion-compensated orthogonal transforms.

Acknowledgements

This thesis was carried out at the Sound and Image Processing Lab, School of Electrical Engineering, KTH. I would like to express my appreciation to my supervisor Markus Flierl for the opportunity of doing this thesis. I am grateful for his patience and valuable suggestions and discussions. Many thanks to Haopeng Li, Mingyue Li, and Zhanyu Ma, who helped me a lot during my research. I would also like to thank my parents and my friends Alicia Wang and Peng Wu for their constant support.

Contents

Abstract
Acknowledgements
1 Introduction
2 Background
  2.1 Motion-Compensated Orthogonal Transforms
    2.1.1 Motion Compensation
    2.1.2 The Orthogonal Transform
  2.2 Adaptive Spatial Wavelet Transforms
    2.2.1 Type-1 Spatial Wavelet Transform
    2.2.2 Type-2 Spatial Wavelet Transform
  2.3 Quantization
  2.4 Entropy Coding
I Theoretical Model
3 Theoretical Signal Model
  3.1 General Transform Model
  3.2 Memoryless Gaussian Model
4 Numerical Results
II Practical System
5 Efficient Video Coding Scheme
  5.1 Construction of Various MCOTs
    5.1.1 Multiple Types of MCOT
    5.1.2 Multi-hypothesis MCOT
  5.2 Obtaining Motion Vectors
  5.3 Variable Block Size
  5.4 Mode Decision
6 Experimental Results
7 Conclusions
Bibliography

List of Figures

2.1 Block-based motion compensation with two matching blocks $x_{1,i}$ and $x_{2,j}$ and a motion vector $mv$.
2.2 Half-pel accurate motion compensation for integer position A. Position 1 to Position 8 are the possible half-pel positions for Position A.
2.3 The distribution of a 2-dimensional noisy image for (a) the Haar wavelet transform with a rotation of 45° and (b) the MCOT with an optimal decorrelation angle $\alpha^*$.
2.4 Type-2 spatial wavelet transform of Lena with three decomposition levels.
2.5 Structure of a bitstream for one code-block. Sign: signs of the coefficients. SP: Significance Propagation Pass. MR: Magnitude Refinement Pass. CP: Cleanup Pass.
3.1 Theoretical signal model.
3.2 The theoretical curve $g(R_p)$ of the variance of the clean high band over $R_p$.
3.3 The theoretical curve of the total rate $h_t$ over $R_p$.
3.4 Different $g$ for different $f$.
3.5 Different $h_c$ for different $f$.
4.1 The total rate $h_t$ over $R_p$ with $\gamma = 9$ for different noise levels. $g_0 = \sigma_v^2 = 1$.
4.2 The rate of the coefficients $h_c$ over $R_p$ with $\gamma = 9$ for different noise levels. $g_0 = \sigma_v^2 = 1$.
5.1 Efficient video coding system.
5.2 Multi-hypothesis for bidirectional half-pel motion estimation.
5.3 An example of 6-hypothesis motion estimation.
5.4 Partitions of a 16 × 16 macroblock for motion estimation.
5.5 Structure of the minimization of the cost function with the three levels.
6.1 Luminance PSNR vs. rate for the QCIF sequence Foreman at 30 fps with 64 frames and a GOP size of 8 frames. The compared transforms include the proposed MCOT, the bidirectional MCOT with variable block size (VBS) and half-pel motion compensation (HP), the bidirectional MCOT without VBS or HP, the Haar wavelet transform without VBS or HP, and intra coding.
6.2 Luminance PSNR vs. rate for the QCIF sequence Mother & Daughter at 30 fps with 64 frames and a GOP size of 8 frames. The compared transforms include the proposed MCOT, the bidirectional MCOT with variable block size (VBS) and half-pel motion compensation (HP), the bidirectional MCOT without VBS or HP, the Haar wavelet transform without VBS or HP, and intra coding.

Chapter 1 Introduction

Video communication is widely used in today's communication and visual services, such as terrestrial broadcast, cable TV, satellite TV, real-time conversation, and Internet video. For all these applications, video coding techniques play an important part in the storage, transmission, and representation of video data. Since the storage space or the transmission bandwidth is usually limited, most video coding schemes are lossy. There is obviously a trade-off between the video quality and the hardware and software requirements. A video coding technique is therefore expected to code video sequences efficiently, so that the decoded video provides the highest possible quality for a given storage space or a given data rate.

The standard video compression techniques, such as H.261 [1], H.263 [2], MPEG-1 [3], MPEG-4 Part 2 [4], and more recently H.264/AVC [5], utilize the concept of motion-compensated predictive coding. Predicted frames (P-frames) and bi-predicted frames (B-frames) are used to exploit the temporal redundancy of the sequence, with one key frame (I-frame) for each group of pictures (GOP). Because predictive coding operates in a closed-loop fashion, the coded video depends heavily on the relationship between successive pictures. These dependencies introduce the risk of error propagation to subsequently decoded pictures, so predictive coding might be suboptimal for channels with packet loss [6]. On the other hand, the motion-compensated orthogonal transform (MCOT) is a class of subband coding techniques that operates in an open-loop fashion. It generates orthogonal subband coefficients for arbitrary underlying motion fields. It does not depend on predictive coding and therefore avoids the error propagation, which makes it more suitable for packet-based networks like the Internet.

The goal of this project is to develop a rate-distortion efficient video coding scheme that takes advantage of motion-compensated orthogonal transforms. A theoretical transform coding model is proposed to analyze the optimal rate allocation. The performance of the practical system is evaluated by the peak signal-to-noise ratio (PSNR).

The report is organized as follows: Chapter 2 introduces the background of the motion-compensated orthogonal transforms, the adaptive spatial wavelet transforms, the quantization, and the entropy coding. Chapter 3 proposes a theoretical signal model for the transform coding. Numerical results for the theoretical model are presented in Chapter 4. Chapter 5 describes the implemented video coding system.
Chapter 6 presents the experimental results for the coding system.

Chapter 2 Background

2.1 Motion-Compensated Orthogonal Transforms

The class of MCOTs includes the unidirectional motion-compensated orthogonal transform [7], the bidirectional motion-compensated orthogonal transform [8], a half-pel accurate transform [9], and a multi-hypothesis transform [10]. In this thesis, various types of MCOTs are combined with various motion models to adapt efficiently to the actual motion of the coded image sequence.

2.1.1 Motion Compensation

Motion compensation exploits the similarity of consecutive pictures and is very commonly used in today's video coding techniques. The successive frames of a sequence are usually similar, and motion compensation is used to exploit this redundancy. Applied to Internet video services, it can reduce the data rate from several megabytes per second to 10 kbps [11].

In block-based motion compensation, each frame is divided into blocks of, for example, 8 × 8 or 16 × 16 pixels. A reference frame is defined, and the motion-compensation algorithm searches it for the block that best matches the block currently being processed. A motion vector indicates the shift between the current block and the reference block. Fig. 2.1 depicts the two matching blocks $x_{1,i}$ in $x_1$ and $x_{2,j}$ in $x_2$. Here $x_{2,j}$ is the block currently being processed, and $x_1$ is the reference frame in which the best match for $x_{2,j}$ is found with a motion vector $mv$. The matching criterion is usually the Sum of Squared Differences (SSD) or the Sum of Absolute Differences (SAD). The components of the motion vector need not be integers: they can point to sub-sample positions such as half-pixel or quarter-pixel positions. This provides more accurate motion compensation for the block matching scheme and therefore reduces the information in the residual signals.

Figure 2.1: Block-based motion compensation with two matching blocks $x_{1,i}$ and $x_{2,j}$ and a motion vector $mv$.
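
The block matching described in this section can be illustrated with a minimal sketch of an integer-pel full search. It is not the thesis implementation: the function names (`sad`, `ssd`, `full_search`), the default block size and search range, and the convention that the motion vector (dy, dx) points from the current block to its match in the reference frame are all assumptions made for illustration. Sub-sample positions are not covered.

```python
from typing import Callable, List, Optional, Tuple

Frame = List[List[int]]  # luminance samples, indexed as frame[row][col]


def sad(cur: Frame, ref: Frame, top: int, left: int,
        dy: int, dx: int, size: int) -> int:
    """Sum of Absolute Differences between the current block and the
    reference block displaced by (dy, dx)."""
    return sum(
        abs(cur[top + r][left + c] - ref[top + dy + r][left + dx + c])
        for r in range(size)
        for c in range(size)
    )


def ssd(cur: Frame, ref: Frame, top: int, left: int,
        dy: int, dx: int, size: int) -> int:
    """Sum of Squared Differences for the same pair of blocks."""
    return sum(
        (cur[top + r][left + c] - ref[top + dy + r][left + dx + c]) ** 2
        for r in range(size)
        for c in range(size)
    )


def full_search(cur: Frame, ref: Frame, top: int, left: int,
                size: int = 8, search_range: int = 4,
                cost: Callable[..., int] = sad
                ) -> Tuple[Tuple[int, int], Optional[int]]:
    """Test every integer displacement within +/- search_range and return
    the motion vector (dy, dx) with the lowest cost, plus that cost.

    Assumes the current block lies inside the frame, so (0, 0) is always
    a valid candidate and the returned cost is not None.
    """
    height, width = len(ref), len(ref[0])
    best_mv: Tuple[int, int] = (0, 0)
    best_cost: Optional[int] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would leave the reference frame.
            if not (0 <= top + dy and top + dy + size <= height
                    and 0 <= left + dx and left + dx + size <= width):
                continue
            c = cost(cur, ref, top, left, dy, dx, size)
            # Strict comparison: on ties the first candidate found is kept.
            if best_cost is None or c < best_cost:
                best_mv, best_cost = (dy, dx), c
    return best_mv, best_cost
```

Calling `full_search(cur, ref, top, left)` for a block whose content is an exact shifted copy of a region in `ref` returns that shift with a cost of zero; passing `cost=ssd` switches the matching criterion. An exhaustive search is the simplest choice for a sketch, and practical encoders typically use faster search strategies.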