15th European Signal Processing Conference (EUSIPCO 2007), Poznan, Poland, September 3-7, 2007, copyright by EURASIP

AN EFFICIENT INTER-CODING ALGORITHM FOR H.264

Laeeq Aslam and Nadeem Ahmad Khan
Department of Computer Science, Lahore University of Management Sciences
Opposite Sector ‘U’, DHA, 54792, Lahore, Pakistan
phone: + (92)42-5722670-9, fax: + (92)42-5722591, email: {laeeq, nkhan}@lums.edu.pk
web: http://cs.lums.edu.pk

ABSTRACT
An efficient inter-coding algorithm for the H.264 coding standard is presented. It improves encoding time and bit rate at the same time, without a loss in SNR. The algorithm extends the 3DRS algorithm, previously known for fixed block sizes, to variable block sizes. We use the central difference to find boundaries in a consistent motion vector field, and these boundaries drive the macroblock partition decision. Compared with the Full Search (FS) algorithm, the proposed algorithm on average takes 66.65% less computational time and produces 2.9% fewer bits, with an average SNR gain of 0.053 dB. Compared with the Fast Motion Estimation (FME) reference algorithm, it on average saves 6.31% of computational time and 1.5% of bits, with an SNR gain of 0.072 dB. Experimental results on six different sequences demonstrate the advantage of the proposed algorithm.

1. INTRODUCTION
H.264 defines three key profiles: Baseline, Main, and Extended. The Baseline profile is the simplest. It targets applications with limited processing resources and low delay requirements. The Main profile adds features that improve video quality at the expense of a significant increase in computational complexity. The Extended profile targets streaming video. It includes features that improve error resilience and facilitate switching between different bit streams [1]. Overcoming this high computational complexity requires fast algorithms, efficient implementations and enhanced computational resources. All three profiles use inter-coding to exploit temporal redundancy. This work addresses inter-coding. It is most useful for the Baseline profile because of that profile's limited processing resources, although the other profiles also benefit from the proposed scheme.

Inter-coding in H.264 partitions macroblocks in different ways, as shown in figure 1. Luma block sizes range from a minimum of 4x4 to a maximum of 16x16. The primary macroblock partition patterns are 16x16, 16x8, 8x16 and 8x8. If the 8x8 partitioning is selected, each 8x8 block can be further partitioned into 8x4, 4x8 or 4x4 [1]. Object-based coding and fixed-block coding can be taken as two extreme approaches. H.264 offers the flexibility of variable block sizes, which allows the trade-offs between them to be explored.

Figure 1: Allowed macroblock and sub-macroblock partitioning in H.264

In general, a large area with consistent motion is more likely to be coded using a large block size. An area containing motion boundaries is more likely to be coded using smaller block sizes. The basic idea is to exploit the homogeneity of motion in the scene. The encoder selects the best macroblock partition and prediction mode for each macroblock so that video coding performance is optimized.

Section 2 gives an overview of related work from the literature. Section 3 describes the suggested algorithm, Section 4 presents the experimental results, and Section 5 gives our conclusions.

2. OVERVIEW OF RELATED WORK
Much recent work has proposed algorithms that find the best mode and partition [2, 3, 6, 7]. If the macroblocks that are likely to be skipped can be identified before motion estimation, many computations can be saved. One such technique, suggested in [2], reduces encoding time by 29.67% on average without significant loss in rate-distortion performance. It achieves this by estimating a Lagrangian rate-distortion cost function.

In [3], an early termination algorithm for variable block-size motion estimation is given that concentrates on zero motion. If the rate-distortion cost of a partition in a macroblock at motion vector (0, 0) is less than a threshold, the partition is declared a zero-motion block. For sequences with little motion, this technique saves up to 93.4% of search points per macroblock, while the reduction in PSNR is not more than 0.05 dB. For sequences with high motion, it saves up to 60% of search points with a negligible loss in PSNR.

According to [6], the homogeneity decision for a block depends on edge information, and macroblock differencing can be used to judge whether a macroblock is stationary in time. Based on these facts, [6] presents a fast algorithm for inter-coding. It reduces encoding time by up to 30% with a loss of 0.03 dB in PSNR and a 0.6% increase in bit rate.

the discontinuity of the motion vector field or not. A consistent, or true, motion field is obtained using the 3DRS algorithm with a fixed block size [8]. Examples of such motion fields are shown in figure 3. The scaled absolute central difference is applied to the motion field to find the macroblocks that contain motion boundaries. These macroblocks are then partitioned to obtain high coding efficiency. The suggested algorithm consists of two steps:
1. Keep the mode of each macroblock as inter 16x16 and find the motion vectors using the 3DRS algorithm [8]. Apply a fine search to refine the motion vectors found.
2. Based on the motion vector information found in step 1, decide the macroblock partitions and encode.

The 3-D recursive search algorithm was previously used for fixed block sizes in [8] and for variable block sizes in [9]. Its key idea is to use the known motion vectors of spatial and temporal neighbours to find the motion vector of the current block. Spatial neighbours are blocks whose motion vectors have already been found in the current frame. Temporal neighbours are blocks that do not yet have motion vector information in the current frame. For these, the information from the previous frame is still available and has not been overwritten.

Two estimators, ‘a’ and ‘b’, are defined with diagonally opposite convergence directions. Each estimator provides a set of candidates. The set consists of a temporal predictor plus up to four further candidates, obtained by adding random update vectors to a spatial predictor. These candidates are derived from the spatial and temporal neighbours of the current block. All candidates are evaluated with a distortion measure such as SAD, and the best candidate is assigned as the motion vector of the current block. In the fine search, four further points are evaluated based on SAD to refine the motion vector. These are the points to the left, right, top and bottom of the already determined motion vector.

In the first step, our algorithm obtains the consistent motion field using the 3DRS algorithm with a fixed block size of 16x16. In the second step, the obtained motion field is used to determine whether a given macroblock lies on a motion vector field discontinuity and what the right partition of that macroblock is. This process is performed as follows. Assume that a frame contains (j x k) blocks. After the first step, the motion vector field can be represented as:

$$\mathrm{MV} = \{(X_{xy}, Y_{xy}) \mid 0 \le x < j,\ 0 \le y < k\}$$

Figure 2: Image plane considered in this paper

The strength of an edge in this case is a measure of the relative motion between two blocks in the horizontal or vertical direction.

A forward or backward difference does not give enough information to locate the blocks that contain the motion boundaries. Such a difference only tells us that two blocks are moving in different directions. To find the macroblocks that contain motion boundaries, we have to take the central difference.

Using the central difference as an approximate derivative, we can capture many local properties. If the left and right, or top and bottom, neighbouring blocks are moving in different directions, the gradient magnitude quantifies the difference in motion. Blocks containing the boundary between two objects or surfaces in relative motion are also identified in this way. All such blocks are candidates for further splitting, which leads to higher coding efficiency.

We take the absolute values of DH and DV as a measure of magnitude. If a macroblock has a strong DH magnitude and a weak DV magnitude, it should be split into two 16x8 partitions. Conversely, if DV is strong and DH is weak, the macroblock should be partitioned into two 8x16 partitions. If both DH and DV are strong enough, then we


should partition the macroblock into four 8x8 blocks. Sub-macroblock partitions (8x4, 4x8 and 4x4) are not used. As reported in [7], they are the optimal choice for only 2 to 4 percent of macroblocks, so evaluating them would be inefficient.

To quantify the strength or weakness of these derivative magnitudes, we define a threshold T. The value of T specifies the strength of motion field edge that is sufficient for deciding partitions. Increasing T decreases the sensitivity of the algorithm to motion changes, and decreasing T increases it. In the next section we give results with T = 1 for both DH and DV, which is on the conservative side.

Restricting macroblock partitioning to motion boundaries increases the gain in terms of reduced bitrate. It also reduces time complexity in terms of the total number of computations required, because the mode of each macroblock is decided from the DH and DV arrays.

Figure 3: Two example frames showing macroblock partitioning by the suggested algorithm (panels: frame number 71 of the foreman sequence; frame number 54 of the trevor sequence)

The working of the algorithm is illustrated in figure 3, which contains two frames from different sequences: the 71st frame of foreman and the 54th frame of trevor. The dark grid shows the 16x16 macroblock boundaries. The short black lines within a block represent the motion vectors found in the first pass of the proposed algorithm. Light lines show the macroblock partition decisions. A light line passing horizontally splits a macroblock into two 16x8 partitions, a vertical line splits it into two 8x16 partitions, and lines in both directions split it into four 8x8 blocks.

These examples clearly show that partitioning is done only on motion boundaries. The areas on either side of the light lines have a uniform motion field, and macroblocks are partitioned only if they lie on the boundary between two areas with different motion trends. Note that this motion field is determined using a fixed block size of 16x16. These two frames serve only to explain the working of the algorithm.

Once the partitioning decision has been made, the partitions of that macroblock are re-encoded. Motion vector predictions are taken from spatial and temporal neighbours as described by the 3DRS algorithm. They are evaluated, and the best one is assigned. The same procedure is repeated for all partitions of a macroblock. The motion vector selected from the predictors by the distortion measure is then refined. For this refinement, one local diamond search is performed around the position of that motion vector [9].

4. EXPERIMENTAL RESULTS
We implemented our algorithm and compared it with the full search and fast motion estimation (FME) algorithms at integer-pel level only. Sub-pel motion estimation in the reference code was disabled, and all other options were kept the same for comparison purposes. Results were generated on a Pentium-IV 3.2 GHz machine with 504 MB of RAM. We used reference encoder JM version 8.2, compiled with MSVC 6.0, under the Windows XP operating system. We report results with T = 1.

Results were compiled for six standard video sequences: news, foreman, coastguard, trevor, susie and silent. We used 100 frames of each at QCIF (176x144) resolution. These sequences were selected because they contain a variety of motion. Foreman and coastguard contain camera movement, and susie contains foreground movement. News contains both foreground and background movement, silent has a static background, and trevor contains six differently moving insets.

Encoding time and bit-stream size were recorded for FS, FME and the suggested algorithm with a quantization parameter value of 28. On average, the proposed algorithm saves 66.65% of computational time compared to FS and 6.31% compared to FME. Detailed data are reported in Table 1. For every sequence, the SNR of the proposed algorithm equals or exceeds that of FME. It also equals or exceeds that of FS in every case except one, where it is 0.01 dB lower. On average, the proposed algorithm gains 0.053 dB over FS and 0.072 dB over FME. It saves 2.9% of bits compared to FS and 1.5% compared to FME. In some cases our algorithm saves up to 19% of encoding time compared to FME, and up to 67% compared to FS. In most cases the proposed algorithm saves time and bits simultaneously, with better SNR than both FME and FS. Overall, the proposed algorithm shows a significant improvement in encoding performance without compromising SNR. Comparisons of encoding time, SNR and bit-stream size with full search and FME are shown in figures 4, 5 and 6 respectively.


Table 1: Performance comparison of the proposed algorithm with FS and FME

Algorithm  Metric       News (1)  Foreman (2)  Coastguard (3)  Trevor (4)  Silent (5)  Susie (6)
FS         Time (sec)   31.112    31.983       31.812          31.547      30.70       31.811
           Bits         409040    1390440      1731000         778168      435616      731528
           SNR (dB)     36.07     34.64        33.4            35.83       35.34       36.41
FME        Time (sec)   9.541     12.41        13.9            10.77       10.054      11.386
           Bits         396232    1376704      1722688         766720      430016      724504
           SNR (dB)     36.03     34.6         33.4            35.82       35.34       36.39
Proposed   Time (sec)   10.06     10.862       11.18           10.36       10.155      10.423
           Bits         390048    1370128      1724504         764368      404440      718680
           SNR (dB)     36.07     34.7         33.42           35.82       35.42       36.58

Figure 4: Total encoding time comparison of full search, FME and the proposed algorithm (encoding time in seconds for video sequences 1-6)

Figure 5: SNR comparison of full search, FME and the proposed algorithm (SNR in dB for video sequences 1-6)

Figure 6: Comparison of bit stream size for full search, FME and the proposed algorithm (bitstream size in Kbits for video sequences 1-6)

Figure 4 shows that the proposed algorithm is faster than full search for every sequence and, on average, faster than FME. Figure 5 shows that the proposed algorithm is competitive with FS and FME in SNR and performs better in some cases. Figure 6 compares the bit-stream sizes generated by full search, FME and the proposed algorithm. Detailed data for these plots, along with the video sequence numbers, are given in Table 1.

Finally, figure 7 shows the rate-distortion comparison for the susie and silent sequences, with the quantization parameter varied from 10 to 50. SNR is shown on the y-axis and the number of kilobits on the x-axis. The primary x-axis (at the bottom) is used for the proposed algorithm, and the secondary x-axis (at the top) for FME. For both sequences, the proposed algorithm performs better than, or at least as well as, FME. Exact numbers are given in Table 2.

Figure 7: Rate distortion comparison of the proposed algorithm and FME (panels: rate distortion curves for Susie and for Silent; SNR in dB against bitstream size in Kbits)
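The averages quoted in the abstract and in Section 4 can be reproduced from Table 1. Take the arithmetic mean of the per-sequence percentage savings and of the per-sequence SNR differences. The short Python check below does this. The helper names saving_pct and average_gains are hypothetical. Averaging per-sequence ratios, rather than comparing totals, is our reading of how the reported figures were obtained. It matches them to the precision given.

```python
from statistics import mean

# Table 1 values for sequences (1)-(6), in table column order.
TABLE1 = {
    "FS": {"time": [31.112, 31.983, 31.812, 31.547, 30.70, 31.811],
           "bits": [409040, 1390440, 1731000, 778168, 435616, 731528],
           "snr": [36.07, 34.64, 33.4, 35.83, 35.34, 36.41]},
    "FME": {"time": [9.541, 12.41, 13.9, 10.77, 10.054, 11.386],
            "bits": [396232, 1376704, 1722688, 766720, 430016, 724504],
            "snr": [36.03, 34.6, 33.4, 35.82, 35.34, 36.39]},
    "Proposed": {"time": [10.06, 10.862, 11.18, 10.36, 10.155, 10.423],
                 "bits": [390048, 1370128, 1724504, 764368, 404440, 718680],
                 "snr": [36.07, 34.7, 33.42, 35.82, 35.42, 36.58]},
}


def saving_pct(reference, new):
    """Per-sequence saving of `new` relative to `reference`, in percent."""
    return [100.0 * (1.0 - n / r) for r, n in zip(reference, new)]


def average_gains(ref_name, new_name="Proposed"):
    """Mean time saving (%), bit saving (%) and SNR gain (dB) over all sequences."""
    ref, new = TABLE1[ref_name], TABLE1[new_name]
    return {
        "time_saving_pct": mean(saving_pct(ref["time"], new["time"])),
        "bit_saving_pct": mean(saving_pct(ref["bits"], new["bits"])),
        "snr_gain_db": mean(n - r for r, n in zip(ref["snr"], new["snr"])),
    }


if __name__ == "__main__":
    for ref in ("FS", "FME"):
        g = average_gains(ref)
        print(f"vs {ref}: time {g['time_saving_pct']:.2f}%, "
              f"bits {g['bit_saving_pct']:.2f}%, SNR {g['snr_gain_db']:+.3f} dB")
```

Running it prints, relative to FS and FME respectively:

- time savings of 66.65% and 6.31%
- bit savings of 2.86% and 1.50%
- SNR gains of +0.053 dB and +0.072 dB

These round to the values reported in the text. The per-sequence list from saving_pct also shows that the time saving relative to FME is negative for sequences (1) and (5).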


Table 2: Rate distortion data for the proposed algorithm and FME

                  Proposed Algorithm       FME
Sequence   QP     SNR (dB)   Bits          SNR (dB)   Bits
Susie      50     25.02      36088         24.96      38000
           40     29.74      146120        29.32      125200
           30     35.27      540616        35         532520
           20     42.02      1928464       41.81      2027056
           10     49.61      5430912       49.56      5682112
Silent     50     23.1       25672         23.17      24616
           40     27.81      84912         27.64      80880
           30     33.96      310784        33.84      329984
           20     41.58      1007736       41.51      1058656
           10     50.02      3053864       49.99      3139648

5. CONCLUSIONS
An efficient inter-coding algorithm is presented to improve the encoding efficiency of H.264. It extends the 3DRS algorithm to variable block sizes. A consistent motion field is obtained by 3DRS with a fixed block size. The scaled absolute central difference, a gradient estimator, is then used to identify discontinuities in the motion vector field. Macroblocks containing these boundaries are partitioned. This captures the essence of variable block-size motion estimation and makes the algorithm motion adaptive.

Experimental results on six standard sequences show that, compared with FS, the proposed algorithm on average consumes 66.65% less encoding time and produces 2.9% fewer bits, with a 0.053 dB gain in SNR. Compared with FME, it saves on average 6.31% of computational time and 1.5% of bits, with a 0.072 dB gain in SNR. The proposed algorithm thus reduces both the number of bits and the encoding time without a loss of SNR, which makes H.264 more practical for applications that demand reduced bitrates.

We believe that the performance of H.264 can be further improved by investigating the presented algorithm for sub-macroblock partitioning and for sub-pel motion estimation. In future work we also plan to extend our algorithm to multiple reference frames.

6. ACKNOWLEDGEMENTS
This work is funded by HEC (Higher Education Commission) Pakistan, and we gratefully acknowledge this support. We would also like to thank Amna Ahmad and our other colleagues, whose discussions were useful in this work.

REFERENCES
[1] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, “Video coding with H.264/AVC: Tools, Performance, and Complexity,” IEEE Circuits and Systems Magazine, vol. 4, pp. 7-28, First Quarter 2004.
[2] Tran Duc Hai Du, “Macroblock Mode Decision for H.264,” MIR-05, ACM, Singapore, November 10-11, 2005.
[3] L. Yang, K. Yu, J. Li and S. Li, “An Effective Variable Block-Size Early Termination Algorithm for H.264 Video Coding,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, no. 6, June 2005.
[4] C. Kim, K. Shih and C. C. Jay Kuo, “Feature-Based Intra-Prediction Mode Decision for H.264,” IEEE, ICIP 2004.
[5] M. Yang and W. Wang, “Fast Macroblock Mode Selection Based on Motion Content Classification in H.264/AVC,” IEEE, ICIP, Singapore, pp. 741-744, October 2004.
[6] D. Wu, F. Pan, K. P. Lim, S. Wu, Z. G. Li, X. Lin, S. Rahardja and C. C. Ko, “Fast Intermode Decision in H.264/AVC Video Coding,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, no. 6, July 2005.
[7] J. Bu, S. Lou, C. Chen, and J. Zhu, “A Predictive Block-Size Mode Selection for in H.264,” IEEE, ICASSP, vol. 2, pp. II-917-II-920, 14-19 May 2006.
[8] G. de Haan, P. W. A. C. Biezen, O. A. Huijgen, “True Motion Estimation with 3-D Recursive Search Block Matching,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 3, pp. 368-379, 388, October 1993.
[9] N. A. Khan, S. Masud, and A. Ahmad, “A Variable Block Size Motion Estimation Algorithm for Real-Time H.264 Video Encoding,” Signal Processing: Image Communication, Elsevier, vol. 21, issue 4, pp. 306-315, April 2006.
