Coding Mode Analysis of MPEG-2 to H.264/AVC Transcoding for Digital TV Applications

Yi-Nung Liu, Chi-Sun Tang, and Shao-Yi Chien
Media IC and System Lab, Graduate Institute of Electronics Engineering and Department of Electrical Engineering
National Taiwan University, BL-421, 1, Sec. 4, Roosevelt Rd., Taipei 106, Taiwan
[email protected]

Abstract— MPEG-2 to H.264/AVC transcoding is an important module for video recording in digital TV applications. In pixel domain transcoding, the MPEG-2 bitstream is decoded and then re-encoded by an H.264 encoder. Since the decoded video behaves differently from the original video, this paper analyzes the performance of each coding mode in order to select the effective coding tools. The analysis shows that transcoding with motion estimation (ME) using only one reference frame and the 16x16 block size, with the deblocking filter on and RDO off, achieves almost the same video quality with only 19% of the computation. The analysis result could be an important reference for implementing MPEG-2 to H.264 transcoders.

TABLE I
CODING MODE COMPARISON BETWEEN MPEG-2 AND H.264.
Coding tool | MPEG-2 | H.264 Baseline Profile
ME Block Size | 16x16 | VBS-ME
ME Reference Frame | 2 (B-frame) | 5 (MRF-ME)
ME Accuracy | 1/2 | 1/4
Intra Prediction | No | Spatial domain
Block Transform | 8x8 DCT | 4x4 Integer DCT
Quantization | Linear or non-linear | Exponential
Rate-Distortion Optimization | No | Yes
Deblocking Filter | No | In-loop filter
Entropy Coding | VLC | CAVLC or CABAC

I. INTRODUCTION
One of the most important applications of the MPEG-2 video coding standard [1] is digital TV (DTV) broadcasting. When the recording function of a set-top box is used, the limited capacity of the hard disk is a serious problem. The newly established video coding standard H.264/MPEG-4 AVC [2] achieves much higher coding efficiency than MPEG-2: converting MPEG-2 video to H.264 video can preserve almost the same quality at only half the bit rate. That is why transcoding has become a hot research topic.

The differences in coding tools between MPEG-2 and H.264 are shown in Table I. For motion estimation (ME), H.264 provides many more coding modes than MPEG-2 does. MPEG-2 supports only a fixed block size (16x16), while H.264 supports variable block size ME (VBS-ME) with block sizes of 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4. MPEG-2 supports at most two reference frames for ME (in B-frames), while H.264 supports multiple reference frame ME (MRF-ME); up to five reference frames are considered in this paper (Table I). MPEG-2 supports ME down to half-pel accuracy, while H.264 supports quarter-pel ME. This variety of ME modes also greatly increases the computational complexity of H.264. H.264 further provides rate-distortion optimization (RDO) tools to find the optimal coding modes, and its in-loop deblocking filter (DF) enhances visual quality. Other new coding tools, such as intra prediction, new quantization tools, and CAVLC and CABAC entropy coding, are also employed in H.264 to further increase coding efficiency. However, the computational complexity increases dramatically with these coding tools, so computationally efficient approaches to transcoding are necessary [3].

Although coding mode analyses for the H.264 encoder can be found in the literature, a complete coding mode analysis is still required for MPEG-2 to H.264 transcoding, for two reasons. The first is the fundamental difference between MPEG-2 and H.264: the transform methods differ, and coefficient mapping is also a problem because of the exponential quantizer of H.264. The second is that the input videos of the transcoder have already been compressed by MPEG-2, so they behave quite differently from uncompressed video sequences, and some coding tools that are useful for uncompressed input may no longer be effective in transcoding.

Consequently, this paper analyzes the performance of the different H.264 coding tools in MPEG-2 to H.264 transcoding. Based on the analysis, the effective coding tools can be selected, which provides a good reference for implementing MPEG-2 to H.264 transcoders. The paper is organized as follows. The transcoding algorithms are described in Sec. II. Sec. III introduces the experiment environment, and the simulation results are presented and discussed in Sec. IV. Finally, a short conclusion is given in Sec. V.

II. TRANSCODING ALGORITHMS
Video transcoding converts a video from one format into another [4]. A format is defined by many characteristics, such as bit rate, frame rate, spatial resolution, coding syntax, and content. Transcoding algorithms can be classified into two categories: frequency domain (transform domain) transcoding and pixel domain transcoding [5].

[Figure: raw original video sequence I_O → TM5 MPEG-2 encoder → MPEG-2 bitstream B_MPEG2 → transcoder (TM5 MPEG-2 decoder → raw sequence I_MPEG2 → JM10 H.264 encoder) → H.264 bitstream B_H264 → JM10 H.264 decoder → transcoded video I_H264. Bit rate and PSNR are measured to plot the RD curves.]
Fig. 1. Flowchart of the experiments.

A. Transform Domain Transcoding
In transform domain transcoding, the input bitstream is transcoded in the DCT domain without being fully decoded to video frames. It is generally used in homogeneous transcoding [5] because the DCT/IDCT process is identical on both sides. Transform domain transcoding theoretically reduces computation because most transform operations of the encoder are skipped [6]. The drawback of this scheme is the error drift problem [7].
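The transform mismatch that makes MPEG-2 to H.264 transform domain transcoding hard can be illustrated numerically. The following Python/NumPy sketch is an illustration only, not the transcoder studied in this paper, and its function names are hypothetical. It applies the standard H.264 4x4 integer core transform (whose normalization H.264 folds into quantization) and the ideal orthonormal 8x8 DCT that MPEG-2 approximates.

```python
import numpy as np

# H.264/AVC 4x4 forward core transform matrix (an integer approximation of
# the DCT). The remaining per-coefficient scaling is folded into
# quantization by the standard and is not applied here.
CF = np.array([[1, 1, 1, 1],
               [2, 1, -1, -2],
               [1, -1, -1, 1],
               [1, -2, 2, -1]], dtype=np.int64)


def h264_core_transform_4x4(block):
    """Integer core transform W = CF @ X @ CF^T of a 4x4 residual block X."""
    x = np.asarray(block, dtype=np.int64)
    if x.shape != (4, 4):
        raise ValueError("expected a 4x4 block")
    return CF @ x @ CF.T


def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix; row k holds the k-th basis vector."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def dct_8x8(block):
    """Ideal floating-point 8x8 2-D DCT, the transform MPEG-2 approximates."""
    c = dct_matrix(8)
    return c @ np.asarray(block, dtype=np.float64) @ c.T


if __name__ == "__main__":
    flat = np.full((8, 8), 10)
    # MPEG-2 view: one 8x8 block whose DC coefficient is 640 / 8 = 80.
    print(dct_8x8(flat)[0, 0])
    # H.264 view: four 4x4 blocks, each with an unnormalized DC of 16 * 10 = 160.
    for row in (0, 4):
        for col in (0, 4):
            print(h264_core_transform_4x4(flat[row:row + 4, col:col + 4])[0, 0])
```

For a flat block of value 10, the 8x8 DCT yields a single DC coefficient of 80, whereas each of the four co-located H.264 4x4 blocks has an unnormalized DC value of 160. A transform domain transcoder therefore has to re-map and re-scale coefficients (compare [6]) rather than copy them.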

B. Pixel Domain Transcoding
Pixel domain transcoding fully decodes the compressed bitstream to video frames in the pixel domain and then re-encodes them. This scheme is flexible since the decoding loop and the encoding loop can be totally independent of each other. In pixel domain transcoding, the abundant information in the input bitstream, such as motion vectors (MVs) or quantization parameters (QPs), can help reduce computation; related research can be found in [8] [9].

C. MPEG-2 to H.264 Transcoding
In H.264, both intra prediction and the in-loop deblocking filter operate in the pixel domain. Furthermore, most of the core functions in a macroblock (MB) processing unit cannot be replaced easily. Therefore, the mainstream approach to MPEG-2 to H.264 transcoding is pixel domain transcoding.

III. EXPERIMENT ENVIRONMENT SETUP
The experiments are designed to analyze the transcoding performance of the different coding tools of an H.264 encoder. Each major coding tool is turned on or off, and the rate-distortion (RD) curves are plotted for different test sequences. The TM5 and JM10 codecs available on the web are used as the standard MPEG-2 and H.264 codecs. The simulation platform is a PC with an Intel Pentium 4 3 GHz CPU and 1 GB RAM, and the runtime of the encoder is used to evaluate computational complexity.

A. Test Sequences
To derive convincing results, nine standard sequences with different frame sizes and characteristics are used as test sequences. Some sequences have large motion, such as Stefan, Foreman, Soccer, and Night; some have complicated texture, such as Mobile (Mobile and Calendar), Preakness, Night, and Harbour; some have a great illuminance change within a short time, like Crew; and some have complicated motion, like Preakness. This variety of sequences makes the experimental results reliable. All CIF sequences (Foreman, Mobile, and Stefan) contain 100 frames; the D1 (480p) sequences, Toshiba and Soccer, contain 50 frames; and the HD720p sequences, Crew, Night, Harbour, and Preakness, contain 24 frames.

B. Transcoding Analysis
The flowchart of the experiments is shown in Fig. 1. In the experiments, a pixel domain cascaded transcoder is considered, in which the TM5 MPEG-2 codec is cascaded with the JM10 H.264 codec. The raw video sequences, I_O, are first encoded by the TM5 encoder into MPEG-2 bitstreams, B_MPEG2, at bit rates appropriate for DTV applications (2 Mb/s for CIF, 6 Mb/s for D1 (480p), and 15 Mb/s for HD720p), and then decoded to YUV sequences, I_MPEG2. These YUV sequences differ slightly from the original I_O because MPEG-2 compression is lossy. Next, the sequences I_MPEG2 are re-encoded by the H.264 encoder to generate H.264 bitstreams, B_H264, with the different coding configurations listed in Table II. After the H.264 bitstreams are decoded, the PSNR values of the decoded video sequences, I_H264, are calculated with the original sequences I_O as the reference. Finally, the relationship between the coding tools and the corresponding RD curves and CPU time is recorded as the experimental result.

TABLE II
H.264 JM10.1 TEST CONFIGURATIONS.
Configuration | No. Ref. | VBS-ME | DF | RD Opt. | FME | I4
(a) Ref. Frames = 5 | 5 | 4x4 to 16x16 | on | on | on | on
(b) Ref. Frames = 2 | 2 | 4x4 to 16x16 | on | on | on | on
(c) Ref. Frames = 1 | 1 | 4x4 to 16x16 | on | on | on | on
(d) VBS-ME 16x16 8x8 | 1 | 8x8 and 16x16 | on | on | on | on
(e) VBS-ME 16x16 only | 1 | 16x16 | on | on | on | on
(f) DF off | 1 | 4x4 to 16x16 | off | on | on | on
(g) RDO off | 1 | 4x4 to 16x16 | on | off | on | on
(h) RDO off DF off | 1 | 4x4 to 16x16 | off | off | on | on
(i) FME off | 1 | 4x4 to 16x16 | on | on | off | on
(j) I4 off | 1 | 4x4 to 16x16 | on | on | on | off

C. Reference Transcoder
The reference (or optimal) transcoder is a pixel domain cascaded transcoder in which the TM5 MPEG-2 decoder is cascaded with a JM10 H.264 baseline encoder using all coding tools. It is denoted as (a) "Ref. Frames = 5" in Table II. In MPEG-2 to H.264 transcoding, the quality of the output sequences, I_H264, is bounded by that of the input sequences, I_MPEG2. The RD curve of the reference transcoder is also the theoretical upper limit of a transcoder. The target is to find the effective coding tool set that performs as well as the reference transcoder with the lowest computation requirement. Note that the search range is set to ±32, ±64, and ±128 for CIF, D1 (480p), and HD720p videos, respectively.

[Figure: RD curves, PSNR (dB) versus bit rate (Kbps or Mbps); curves: mpeg2, transcoding input, ref = 1, ref = 2, ref = 5. Panels: (a) Mobile @ CIF; (b) Harbour @ HD720p.]
Fig. 2. RD curves of MRF-ME analyses, (a)(b)(c) in Table II.

IV. EXPERIMENTAL RESULTS AND DISCUSSIONS
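The results in this section rest on two measurements described in Sec. III: the PSNR of the transcoded sequence I_H264 against the original I_O, and the CPU-time saving of each configuration relative to the reference transcoder (a) in Table III. A minimal Python/NumPy sketch of both follows. It assumes 8-bit samples (peak value 255) and a plain mean of per-frame PSNR over a sequence; the function names are illustrative and are not JM10.1 APIs.

```python
import numpy as np


def psnr(reference, test, peak=255.0):
    """PSNR (dB) of one frame against its reference; 8-bit samples assumed."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0.0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))


def sequence_psnr(ref_frames, test_frames):
    """Mean per-frame PSNR over a sequence (averaging convention assumed)."""
    values = [psnr(r, t) for r, t in zip(ref_frames, test_frames)]
    return float(np.mean(values))


def complexity_reduction(total_ms, reference_ms):
    """CPU time saved (%) relative to the reference transcoder, config (a)."""
    return 100.0 * (1.0 - total_ms / reference_ms)


# Total transcoding times (ms) for the D1 sequence Toshiba, from Table III.
TOTAL_MS = {
    "(a) Ref. Frames = 5": 1_821_298,
    "(c) Ref. Frames = 1": 716_100,
    "(e) VBS-ME 16x16 only": 581_341,
    "(g) RDO off": 347_096,
}

if __name__ == "__main__":
    ref_ms = TOTAL_MS["(a) Ref. Frames = 5"]
    for name, t in TOTAL_MS.items():
        print(f"{name}: {complexity_reduction(t, ref_ms):.2f}%")
```

Run as a script, it reproduces the Complexity Reduction column of Table III for the listed rows: 0.00, 60.68, 68.08, and 80.94%.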

The complexity reduction of each configuration can be evaluated from the CPU time data in Table III, which lists the CPU time for transcoding the D1 sequence Toshiba. It shows that MRF-ME takes most of the transcoding computation, VBS-ME is also costly, the RDO operations introduce a large computation overhead as well, and the DF has only a small overhead.

The RD curves for the different MRF-ME configurations (a)(b)(c) are shown in Fig. 2. In these charts, the RD curve of the MPEG-2 encoder is also shown as reference data, and "transcoding input" denotes the RD data of the input bitstream selected for the transcoder. When the MPEG-2 videos are decoded and re-encoded by H.264, the performance with different numbers of reference frames is almost the same for all sequences except Mobile. This implies that MRF-ME is not effective for this kind of input sequence and that one reference frame is enough. The reason may be that the correlation between frames has already been weakened by the MPEG-2 encoding process. Note that, because of space limitations, only two RD curves are shown. As for computational complexity, in JM10.1 the computation of MRF-ME is proportional to the number of reference frames. As shown in Table III, when only one reference frame is used, the quality is almost the same and 60% of the computation is saved.

The RD curves for the different VBS-ME settings (c)(d)(e) are shown in Fig. 3 for different sequences. VBS-ME is important when coding small videos or sequences with complicated motion, like Preakness. However, for the larger frame sizes considered in DTV applications, such as D1 and 720p, the RD curves are almost the same whether or not VBS-ME is employed, with PSNR differences within 0.2 dB. Therefore, the 16x16 block size is enough for DTV applications. Note that the RD curves of the other five sequences are similar to Fig. 3(c). On the other hand, as shown in Table III, about 20% of the computation can be saved if VBS-ME is turned off when one reference frame is used.

[Figure: RD curves, PSNR (dB) versus bit rate (Kbps or Mbps); curves: mpeg2, transcoding input, ref = 1, 16x16 8x8, 16x16. Panels: (a) Foreman @ CIF; (b) Stefan @ CIF; (c) Toshiba @ D1(480p); (d) Preakness @ HD720p.]
Fig. 3. RD curves for VBS-ME analyses, (c)(d)(e) in Table II.

The RD curves of the different RDO and DF configurations (c)(f)(g)(h) are shown in Fig. 4 for different sequences. RDO improves the video quality by only 0.1 to 0.2 dB regardless of frame size, motion, and texture; when RDO is turned off, the bit rate increases slightly at the same PSNR. As shown in Table III, the RDO operations cost 50% of the CPU time when one reference frame is used, yet provide only a small improvement. On the other hand, the deblocking filter (DF) performs outstandingly. For most test sequences, DF improves video quality by about 0.3 to 0.4 dB at the same QP, and Table III shows that the DF operations introduce only a slight computation overhead. Consequently, DF is an effective coding tool and should always be turned on.

Some tools in H.264 are critical in transcoding, such as fractional motion estimation (FME). The simulation results for FME are shown in Fig. 5 together with those for the intra 4x4 (I4) mode. When FME is turned off, the RD curve drops by about 0.5 to 1.5 dB, which is unacceptable for our application. Furthermore, the RD curve drops by about 0.2 dB for HD720p when the I4 mode is turned off; the impact of the I4 mode depends on the frame size. Note that similar results can also be found for the other sequences.

[Figure: RD curves, PSNR (dB) versus bit rate (Kbps or Mbps); curves: mpeg2, transcoding input, ref = 1, DF off, RDO off, RD off loop off. Panels: (a) Foreman @ CIF; (b) Soccer @ D1(480p); (c) Night @ HD720p; (d) Preakness @ HD720p.]
Fig. 4. RD curves for RDO & DF analyses, (c)(f)(g)(h) in Table II.

[Figure: RD curves, PSNR (dB) versus bit rate (Mbps); curves: mpeg2, transcoding input, ref = 1, FME off, I4 off. Panels: (a) Night @ HD720p; (b) Harbour @ HD720p.]
Fig. 5. RD curves for FME & I4, (i)(j) in Table II.

TABLE III
CPU TIME OF DIFFERENT CONFIGURATIONS.
Configuration | Total Time (ms) | ME Time (ms) | Complexity Reduction (%)
(a) Ref. Frames = 5 | 1,821,298 | 1,382,838 | 0.00
(b) Ref. Frames = 2 | 995,628 | 568,332 | 45.33
(c) Ref. Frames = 1 | 716,100 | 287,042 | 60.68
(d) VBS-ME 16x16 8x8 | 647,609 | 222,940 | 64.44
(e) VBS-ME 16x16 only | 581,341 | 196,781 | 68.08
(f) DF off | 702,125 | 286,073 | 61.45
(g) RDO off | 347,096 | 284,821 | 80.94
(h) RDO off DF off | 340,390 | 288,672 | 81.31
(i) FME off | 594,583 | 185,693 | 67.35
(j) I4 off | 402,328 | 247,047 | 77.91

V. CONCLUSION
From the experimental results for the different coding mode configurations of MPEG-2 to H.264 transcoding shown above, several interesting facts can be found. Compressed video sequences behave quite differently from the original video sequences, which changes the selection of effective coding modes. First, MRF-ME may be useful in an H.264 encoder but not in an MPEG-2 to H.264 transcoder. Next, when transcoding HD720p sequences, the VBS-ME tools only slightly improve the RD curve but require heavy computation. Third, the DF tool in H.264 enhances video quality with only a small computation overhead, even though the sequences were compressed before. Finally, the RDO tool contributes only a small bit-rate improvement at a large computational overhead.

In summary, when transcoding HD720p MPEG-2 sequences to H.264 for DTV applications, the appropriate coding tool set uses one reference frame, the 16x16 block size, and DF, with RDO turned off. This configuration requires less than 20% of the CPU time of the reference transcoder and provides almost the same visual quality. Other techniques, such as search range reduction and reuse of MV information from the MPEG-2 bitstream, can further reduce computation. This means that the cost of an MPEG-2 to H.264 pixel domain transcoder should be much lower than that of a standard H.264 encoder in both software and hardware, a topic worth studying that will be our future work.

REFERENCES
[1] Recommendation ITU-T H.262, ISO/IEC Std. 13818-2, 1995.
[2] Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification, ITU-T Recommendation H.264 and ISO/IEC 14496-10 Std., 2003.
[3] S.-Y. Chien, Y.-W. Huang, C.-Y. Chen, H. H. Chen, and L.-G. Chen, "Hardware architecture design of video compression for multimedia communication systems," IEEE Commun. Mag., vol. 43, no. 8, pp. 122–131, Aug. 2005.
[4] J. Xin, C.-W. Lin, and M.-T. Sun, " transcoding," Proc. IEEE, vol. 93, no. 1, pp. 84–97, Jan. 2005.
[5] I. Ahmad, X.-H. Wei, Y. Sun, and Y.-Q. Zhang, "Video transcoding: An overview of various techniques and research issues," IEEE Trans. Multimedia, vol. 7, no. 5, pp. 793–804, Oct. 2005.
[6] Y.-J. Chuang and J.-L. Wu, "An efficient matrix-based DCT splitter/merger for MPEG-2-to-AVC/H.264 transform kernel conversion," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 1, pp. 120–125, Jan. 2007.
[7] T. Qian, J. Sun, D. Li, X. Yang, and J. Wang, "Transform domain transcoding from MPEG-2 to H.264 with interpolation drift-error compensation," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 4, pp. 523–534, Mar. 2006.
[8] J. Xin, A. Vetro, S. I. Sekiguchi, and K. Sugimoto, "Motion and mode mapping for MPEG-2 to H.264/AVC transcoding," in Proc. IEEE International Conference on Multimedia and Expo (ICME'06), Aug. 2006, pp. 313–316.
[9] M. Kucukgoz and M. T. Sun, "Early-stop and motion vector re-using for MPEG-2 to H.264 transcoding," in Proc. SPIE-IS&T Electronic Imaging, Visual Communications and Image Processing 2004, vol. 5308, 2004, pp. 932–936.