Seminar Report

H.264/MPEG-4 Advanced Video Coding

Alexander Hermans
Matriculation Number: 284141
RWTH
September 11, 2012

Contents

1 Introduction
  1.1 MPEG-4 AVC/H.264 Overview
  1.2 Structure of this paper
2 Codec overview
  2.1 Coding structure
  2.2 Encoder
  2.3 Decoder
3 Main differences
  3.1 Intra frame prediction
  3.2 Inter frame prediction
  3.3 Transform function
  3.4 Quantization
  3.5 Entropy encoding
  3.6 In-loop deblocking filter
4 Profiles and levels
  4.1 Profiles
  4.2 Levels
5 Evaluation
6 Patents
7 Summary

1 Introduction

Ever since digital images were first created, the need for compression has been clear. The amount of data needed to store an uncompressed image is large. But while it might still be feasible to store a full-resolution uncompressed image, storing video sequences without compression is not. Assuming a normal frame rate of about 25 frames per second, the amount of data needed to store one hour of high-definition video is about 560 GB¹. Compressing each image on its own would reduce this amount, but it would not make the video small enough to be stored on today's typical storage media. To overcome this problem, video compression algorithms exploit the temporal redundancies between video frames: by using previously encoded, or decoded, frames to predict values for the next frame, the data can be compressed at such a rate that storage becomes feasible. This approach to video compression is called the motion compensation approach. Like earlier standards, such as MPEG-1 [4], MPEG-2 [5] and MPEG-4 advanced simple profile (ASP) [6], the MPEG-4 advanced video coding (AVC) standard [7] is a lossy compression standard and also based on the motion compensation approach. In many ways these standards are very similar, but each successive standard offered higher compression while maintaining similar quality.
This has been achieved by improving the individual steps of the compression algorithm, often at the cost of higher complexity. MPEG-4 AVC is generally said to deliver double the compression rate of MPEG-4 ASP at the same quality. MPEG-4 AVC was developed by the Moving Picture Experts Group (MPEG) together with the ITU-T Video Coding Experts Group (VCEG). Since both groups have their own way of naming new compression standards, MPEG-4 AVC is also called H.264, the name given by VCEG. Both groups have competed in creating better video compression standards since the late eighties. VCEG created H.261 in 1988, which was the first practical video compression standard, and many of today's video compression algorithms are still similar to H.261. Over the years both VCEG and MPEG created their own standards. In 1996 they released MPEG-2 Part 2/H.262, which was created in cooperation between the two groups. Then in 2003 they released MPEG-4 AVC/H.264. In 2012 the two teams released a first draft of the High Efficiency Video Coding (HEVC) standard. This will be the successor to MPEG-4 AVC, and again it should provide double the compression rate compared to MPEG-4 AVC [1].

¹ 1920 × 1080 pixels × 3 bytes/pixel × 25 frames/s × 3600 s = 559,872,000,000 bytes ≈ 560 GB.

1.1 MPEG-4 AVC/H.264 Overview

The MPEG-4 AVC standard was created to offer a higher compression rate, often at the cost of higher complexity. This compression gain is achieved by many small improvements over earlier standards, among them better motion compensation, image segmentation into finer blocks, improved entropy encoding schemes and a mandatory deblocking filter. It is, however, not a single standard but rather a family of standards: it defines a set of compression tools, which are used or not used based on a selected profile. There are several profiles, designed to match the standard to the user's needs.
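The uncompressed-storage estimate from the introduction's footnote can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name is ours, not part of any standard:

```python
# Uncompressed size of one hour of 1080p video at 25 fps,
# assuming 3 bytes per pixel (i.e. no chroma subsampling),
# as in the footnote's estimate.
def uncompressed_bytes(width, height, bytes_per_pixel, fps, seconds):
    return width * height * bytes_per_pixel * fps * seconds

total = uncompressed_bytes(1920, 1080, 3, 25, 3600)
print(total)  # 559872000000 bytes, i.e. about 560 GB
```

With 4:2:0 subsampling (introduced in section 2.1) the raw size already drops to half, but it remains far too large without motion-compensated compression.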
For example, the baseline profile offers a low compression rate and some error resilience while maintaining low complexity; the main profile, on the other hand, offers high compression gain at the cost of high complexity. Based on these profiles the standard can be used in many fields: for example, it is used to store videos on Blu-ray discs, but also to stream videos on websites such as YouTube or Vimeo.

1.2 Structure of this paper

Section 2 describes a general encoder and decoder pair based on the MPEG-4 AVC standard and points out the important differences from earlier standards. These are then explained in more detail in section 3. Section 4 explains the profiles and levels, which are used to define the required capabilities of a decoder. An evaluation of the standard is given in section 5, followed by a short discussion of some patenting issues in section 6.

2 Codec overview

Before looking at the changes in detail, this section gives a high-level overview of how the general encoding and decoding structure has changed in the MPEG-4 AVC standard. First the coding structure of the actual data is discussed, and then a general overview of an encoder and a decoder is given.

2.1 Coding structure

The structure in which a video is described is similar to that of earlier standards. A video sequence is split into frames, which are divided into slices, i.e. groups of macroblocks. A macroblock (MB) is, as in earlier standards, a block of 16 × 16 pixels. Each macroblock can be partitioned into sub-macroblocks of 16 × 8, 8 × 16 or 8 × 8 pixels. Unlike in earlier standards, these sub-macroblocks can be refined further into 8 × 4, 4 × 8 and 4 × 4 blocks. The 4 × 4 pixel block is the smallest pixel block and the basic encoding unit in the MPEG-4 AVC standard; in previous standards, the 8 × 8 block was the smallest encoding unit. How the luminance and chrominance components of a pixel are stored depends on the chosen encoding format.
Since the human eye is more sensitive to luminance than to chrominance, the chrominance information is stored in a sub-sampled way. In each 2 × 2 pixel block, every pixel has one byte of luminance information, while the whole pixel block shares one byte per chrominance channel (Cb and Cr). This is called the 4:2:0 format, which is also used by most of the earlier standards. For a macroblock this means that there are 16 × 16 pixels, each with its own luminance value, and together these pixels share two 8 × 8 blocks of chrominance values. Each macroblock thus consists of sixteen 4 × 4 blocks containing luminance values and four 4 × 4 blocks per channel containing chrominance values, giving a total of twenty-four 4 × 4 blocks per macroblock. Initially this was the only supported format, but later updates of the standard added other encoding formats, such as 4:2:2 and 4:4:4 [8]. In the 4:4:4 format, a pixel has one byte for each of the three channels.

As in earlier standards, the notion of Intra-frames (I-frames), Predictive-frames (P-frames) and Bidirectional-frames (B-frames) still exists. In the MPEG-4 AVC standard, however, the encoding type is no longer decided at the frame level but at the slice level, so there are I-slices, P-slices and B-slices. I-frames, P-frames and B-frames can easily be simulated, either by using just one slice per frame or by giving all slices in a frame the same type. Frames containing only I-slices are called instantaneous decoder refresh (IDR) frames. Once an IDR frame is decoded, no previous frames will be used for motion prediction; these frames can thus be used as random entry points. I-slices do not use any motion prediction and offer the least compression gain. The macroblocks in P-slices can use motion prediction, but only from one slice in the future or the past.
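Returning to the 4:2:0 layout described above, the block bookkeeping per macroblock can be checked in a few lines of Python (plain arithmetic re-deriving the counts in the text; the variable names are ours):

```python
# Bookkeeping for one 4:2:0 macroblock: 16x16 luma samples plus
# two 8x8 chroma planes (Cb and Cr), split into 4x4 blocks.
MB = 16  # macroblock width/height in luma samples

luma_blocks = (MB // 4) * (MB // 4)        # sixteen 4x4 luma blocks
chroma_blocks = 2 * ((MB // 2) // 4) ** 2  # four 4x4 blocks per chroma channel

total_blocks = luma_blocks + chroma_blocks   # 24 blocks in total
bytes_per_mb = MB * MB + 2 * (MB // 2) ** 2  # 256 luma + 128 chroma = 384 bytes

print(total_blocks, bytes_per_mb)  # 24 384
```

The 384 bytes per macroblock also show the factor-of-two saving of 4:2:0 over 4:4:4, which would need 3 × 256 = 768 bytes.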
In previous standards, P-frames were only allowed to use the previous I-frame as a reference frame for motion prediction. Macroblocks in B-slices use one slice in the future and one slice in the past for motion prediction; unlike in previous standards, both predictions may also lie in the future or both in the past. Furthermore, B-slices can themselves be stored and used for motion prediction. Macroblocks in both P-slices and B-slices can also be encoded without motion prediction, if this gives better results.

2.2 Encoder

As in previous standards, the encoder is not part of the actual standard. How the resulting implementation works is left to the developer, but the resulting bitstream has to be in a format specified by the standard. Still,

Figure 1: A general MPEG-4 AVC encoder flowchart. The raw video data is first dissected down to the 4 × 4 blocks. For each block a prediction is created by either inter or intra frame prediction. The prediction is subtracted from the original data to obtain the residual data.
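The prediction/residual step at the heart of Figure 1 can be illustrated with a deliberately simplified sketch. This is not the actual H.264 intra prediction, just a toy DC-style predictor (all names are ours) showing why residuals are cheaper to encode than raw samples:

```python
# Toy illustration of the predict-and-subtract step from Figure 1:
# predict a 4x4 block with a single value (here the rounded mean,
# similar in spirit to DC intra prediction) and keep only the residual.
def predict_dc(block):
    flat = [v for row in block for v in row]
    return round(sum(flat) / len(flat))

def residual(block, pred):
    return [[v - pred for v in row] for row in block]

def reconstruct(res, pred):
    return [[v + pred for v in row] for row in res]

block = [[10, 12, 11, 10],
         [11, 13, 12, 11],
         [10, 11, 12, 12],
         [11, 12, 11, 10]]

pred = predict_dc(block)      # a single predicted value per block
res = residual(block, pred)   # small values, cheap to entropy-encode
assert reconstruct(res, pred) == block  # lossless before quantization
```

In the real encoder the residual is then transformed and quantized (sections 3.3 and 3.4), which is where the actual loss is introduced; the prediction itself is reversible, as the final assertion shows.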