Coding Mode Analysis of MPEG-2 to H.264/AVC Transcoding for Digital TV Applications
Yi-Nung Liu, Chi-Sun Tang, and Shao-Yi Chien
Media IC and System Lab
Graduate Institute of Electronics Engineering and Department of Electrical Engineering
National Taiwan University
BL-421, 1, Sec. 4, Roosevelt Rd., Taipei 106, Taiwan
[email protected]

Abstract: MPEG-2 to H.264/AVC transcoding is an important module for video recoding in digital TV applications. For pixel domain transcoding, the MPEG-2 bitstream is decoded and then re-encoded by an H.264 encoder. Since the behavior of the decoded video is different from that of the original video, in this paper the performance of each coding mode is analyzed to select the effective coding tools. The analysis shows that transcoding using motion estimation with only one reference frame and a 16x16 block size, together with the deblocking filter, can achieve almost the same video quality with only 19% of the computation. The analysis result could be an important reference for the implementation of an MPEG-2 to H.264 transcoder.

I. INTRODUCTION

One of the most important applications of the MPEG-2 video coding standard [1] is digital TV (DTV) broadcasting. When using the recording function of a set-top box, the limited hard disk capacity is a big problem. The newly established video coding standard, H.264/MPEG-4 AVC [2], can achieve much higher coding efficiency than MPEG-2 does. Converting MPEG-2 videos to H.264 videos can achieve almost the same quality while requiring only half the bit rate. That is why transcoding has become an active research topic.

The differences in coding tools between MPEG-2 and H.264 are shown in Table I. For motion estimation (ME) tools, H.264 provides many more coding modes than MPEG-2 does. MPEG-2 supports only a fixed block size (16x16), while H.264 supports variable block size ME (VBS-ME), where block sizes of 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4 can be used. MPEG-2 supports at most two reference frames for ME (B-frame), while H.264 supports at most five reference frames, which is called multiple reference frame ME (MRF-ME). MPEG-2 supports ME accuracy down to half-pel, while H.264 supports quarter-pel ME. The variety of ME modes in H.264 also leads to a large increase in computational complexity. H.264 also provides coding tools for rate-distortion optimization (RDO) to find the optimal coding modes. The in-loop deblocking filter (DF), in addition, can enhance the visual quality. Other new coding tools, such as intra prediction, quantization tools, and entropy coding tools with CAVLC and CABAC, are also employed in H.264 to further increase the coding efficiency. However, the computational complexity increases dramatically with these coding tools. Thus it is necessary to find computationally efficient approaches for transcoding [3].

TABLE I
CODING MODE COMPARISON BETWEEN MPEG-2 AND H.264.

                               MPEG-2                 H.264 Baseline Profile
ME Block Size                  16x16                  VBS-ME
ME Reference Frame             2 (B-frame)            5 (MRF-ME)
ME Accuracy (pel)              1/2                    1/4
Intra Prediction               No                     Spatial domain
Block Transform                8x8 DCT                4x4 Integer DCT
Quantization                   Linear or non-linear   Exponential
Rate-Distortion-Optimization   No                     Yes
Deblocking Filter              No                     In-loop filter
Entropy Coding                 VLC                    CAVLC or CABAC

Although coding mode analysis for the H.264 encoder can be found in the literature, there are two reasons why a complete coding mode analysis is still required for MPEG-2 to H.264 transcoding. The first reason is the fundamental difference between MPEG-2 and H.264. The transform methods are different, and the coefficient mapping is also a problem because of the exponential quantizer of H.264. The second reason is that, since the input videos of the transcoder have been compressed by MPEG-2, their behavior is quite different from that of uncompressed video sequences. Some useful coding tools may no longer be effective in transcoding.

Consequently, in this paper, the performance of different H.264 coding tools in MPEG-2 to H.264 transcoding is analyzed. Based on the analysis, the effective coding tools can be selected, which provides a good reference for the implementation of an MPEG-2 to H.264 transcoder.

The organization of this paper is as follows. The transcoding algorithm is first described in Sec. II. Next, in Sec. III, the experiment environment is introduced, and then the simulation results are shown. Finally, a short conclusion is given in Sec. V.

II. TRANSCODING ALGORITHMS

Video transcoding converts a video from one format into another format [4]. A format is defined by many characteristics, such as the bit rate, frame rate, spatial resolution, coding syntax, and content. The transcoding algorithms can be classified into two categories: frequency domain (transform domain) transcoding and pixel domain transcoding [5].

A. Transform Domain Transcoding

In transform domain transcoding, an input bitstream is transcoded in the DCT domain without being fully decoded to video frames. It is generally used in homogeneous transcoding [5] because of the identical DCT/IDCT process. Transform domain transcoding theoretically reduces the computation because most transform operations of the encoder are skipped [6]. The drawback of this scheme is the error drift problem [7].

B. Pixel Domain Transcoding

Pixel domain transcoding transcodes a compressed video by fully decoding the bitstream to video frames in the pixel domain. This scheme is flexible since the decoding loop and the encoding loop can be totally independent of each other. In pixel domain transcoding, the abundant useful information in the input bitstream, such as MVs or QP parameters, can help reduce computation; related research can be found in [8], [9].

C. MPEG-2 to H.264 Transcoding

In H.264, both the intra prediction and the in-loop deblocking filter are operations in the pixel domain. Furthermore, most of the core functions in a macroblock (MB) processing unit cannot be replaced easily. Therefore, the mainstream of MPEG-2 to H.264 transcoding is pixel domain transcoding.

III. EXPERIMENT ENVIRONMENT SETUP

The experiments are designed to analyze the transcoding performance of different coding tools of an H.264 encoder. Each major coding tool is turned on or off, and the rate-distortion (RD) curves are plotted for different test sequences. The TM5 and JM10 codecs available on the web are used as the standard MPEG-2 and H.264 codecs. The simulation platform is a PC with an Intel Pentium 4 3 GHz CPU and 1 GB RAM, and the runtime of the encoder is used to evaluate the computational complexity.

A. Test Sequences

To derive convincing results, nine standard sequences with different frame sizes and characteristics are used as test sequences. Some sequences have large motion, such as Stefan, Foreman, Soccer, and Night; some have complicated texture, such as Mobile (Mobile and Calendar), Preakness, Night, and Harbour; some have large illuminance changes in a short time, like Crew; some have complicated motion, like Preakness. The variety of sequences makes the experimental results reliable and convincing. All CIF sequences, including Foreman, Mobile, and Stefan, contain 100 frames; the D1 (480p) sequences, Toshiba and Soccer, contain 50 frames; the HD720p sequences, Crew, Night, Harbour, and Preakness, contain 24 frames.

B. Transcoding Analysis

The flowchart of the experiments is shown in Fig. 1. In the experiments, a pixel domain cascaded transcoder is considered, in which the TM5 MPEG-2 codec is followed by the JM10 H.264 codec. The raw video sequences, I_O, are first encoded by the TM5 encoder into MPEG-2 bitstreams, B_MPEG2, at bit rates appropriate for DTV applications (2 Mb/s for CIF, 6 Mb/s for D1 (480p), and 15 Mb/s for HD720p), and then decoded to YUV sequences, I_MPEG2. The YUV sequences are slightly different from the original I_O because of the lossy compression. Next, the sequences I_MPEG2 are re-encoded by the H.264 video encoder to generate H.264 bitstreams, B_H264, with different coding configurations, as shown in Table II. After the H.264 bitstreams are decoded, the PSNR values of the decoded video sequences, I_H264, are calculated with the original sequences I_O as the reference. Finally, the relationship between the coding tools and the corresponding RD curves and CPU time is recorded as the experimental results.

Fig. 1. Flowchart of the experiments. (Original raw sequence I_O -> TM5 MPEG-2 encoder -> bitstream B_MPEG2 -> TM5 MPEG-2 decoder -> raw sequence I_MPEG2 -> JM10 H.264 encoder -> bitstream B_H264 -> JM10 H.264 decoder -> transcoded video I_H264; the PSNR and bit rate of the MPEG-2 and H.264 stages form the RD curves.)

TABLE II
H.264 JM10.1 TEST CONFIGURATIONS.

Configuration            No. Ref.   VBME            DF    RD Opt.   FME   I4
(a) Ref. Frames = 5      5          4x4 to 16x16    on    on        on    on
(b) Ref. Frames = 2      2          4x4 to 16x16    on    on        on    on
(c) Ref. Frames = 1      1          4x4 to 16x16    on    on        on    on
(d) VBS-ME 16x16 8x8     1          8x8 and 16x16   on    on        on    on
(e) VBS-ME 16x16 only    1          16x16           on    on        on    on
(f) DF off               1          4x4 to 16x16    off   on        on    on
(g) RDO off              1          4x4 to 16x16    on    off       on    on
(h) RDO off DB off       1          4x4 to 16x16    off   off       on    on
(i) FME off              1          4x4 to 16x16    on    on        off   on
(j) I4 off               1          4x4 to 16x16    on    on        on    off

C. Reference Transcoder

The reference (or optimal) transcoder is a pixel domain cascaded transcoder, where the TM5 MPEG-2 decoder is followed by a JM10 H.264 baseline encoder with all coding tools enabled. It is denoted as (a) "Ref. Frames = 5" in Table II. In MPEG-2 to H.264 transcoding, the quality of the output sequences, I_H264, is bounded above by that of the input sequences, I_MPEG2.
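
The PSNR measurement described in Sec. III-B can be sketched as follows. This is only a minimal illustration under stated assumptions, not the TM5 or JM10 code: the names psnr and sequence_psnr are hypothetical, each frame is assumed to be a flat list of 8-bit luma samples, and averaging the per-frame PSNR over a sequence is an assumption that the text does not specify.

```python
import math


def psnr(reference, test, peak=255):
    """Return the PSNR in dB between two frames given as flat sample lists.

    `reference` plays the role of a frame of I_O; `test` is the matching
    frame of I_MPEG2 or I_H264. `peak` is the maximum sample value
    (255 for 8-bit video, an assumption of this sketch).
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("frames must be non-empty and of equal size")
    # Sum of squared errors over all samples of the frame.
    sse = sum((r - t) ** 2 for r, t in zip(reference, test))
    if sse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    mse = sse / len(reference)
    return 10.0 * math.log10(peak * peak / mse)


def sequence_psnr(ref_frames, test_frames, peak=255):
    """Average the per-frame PSNR over a sequence (assumed convention)."""
    if len(ref_frames) != len(test_frames) or not ref_frames:
        raise ValueError("sequences must be non-empty and of equal length")
    values = [psnr(r, t, peak) for r, t in zip(ref_frames, test_frames)]
    return sum(values) / len(values)


# Toy data standing in for I_O, I_MPEG2 and I_H264 (two 4-sample frames each).
i_o = [[16, 32, 64, 128], [20, 40, 80, 160]]
i_mpeg2 = [[17, 31, 64, 129], [20, 41, 79, 160]]
i_h264 = [[18, 30, 65, 130], [21, 42, 78, 159]]

psnr_mpeg2 = sequence_psnr(i_o, i_mpeg2)  # quality of the transcoder input
psnr_h264 = sequence_psnr(i_o, i_h264)    # quality of the transcoder output
```

Both values use the original sequence I_O as the reference, which is what allows the output quality of each configuration in Table II to be compared against the quality of the MPEG-2 input.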