EE 5359

PROJECT REPORT

Topic: Complexity Reduction Algorithm for Intra Mode Selection in H.264/AVC Video Coding

By

Amruta Kulkarni

Student ID: 1000666836

Under the guidance of Dr. K. R. Rao

TABLE OF ACRONYMS

AVC advanced video coding

CABAC context adaptive binary arithmetic coding

DCT discrete cosine transform

I-frame intra frame

JM joint model

MSE mean square error

PSNR peak signal to noise ratio

SSIM structural similarity index metric

VLC variable length coding

RDO rate distortion optimization

MPEG moving picture experts group

VCEG video coding experts group

FMO flexible macro block ordering

ASO arbitrary slice ordering

RS redundant slices

SATD sum of absolute transformed differences

Implementation of Complexity reduction algorithm for intra mode selection in H.264/AVC Video Coding

Objective:

This project implements a complexity reduction algorithm for intra mode selection in H.264/AVC video coding [5]. The need arises because the mode decision algorithm used in H.264 encoding is inherently complex: the encoding time for a single macro block depends on the number of computations performed. The algorithm simplifies the mode decision process by reducing that number.

Introduction:

H.264 is an advanced video compression standard developed by the ITU-T Video Coding Experts Group together with the ISO/IEC Moving Picture Experts Group [1]. It is a widely used video codec in mobile applications, internet video (YouTube, Flash players), set-top boxes, TV, etc. An H.264 encoder converts video into a compressed format (H.264), and a decoder converts the compressed video back into an uncompressed format. An H.264 video encoder carries out prediction, transform and encoding processes to produce a compressed H.264 bit stream. The H.264 encoder block diagram is shown in Fig. 1 [1].

Fig. 1 H.264 encoder block diagram [1]

A decoder carries out a complementary process by decoding, inverse transform and reconstruction to output a decoded video sequence. Fig. 2 shows the basic building blocks of H.264 decoder.

Fig. 2 H.264 decoder block diagram [1]

The H.264 encoder forms a prediction of the current macro block. One option is to base the prediction on the current frame, using intra (spatial) prediction. Intra prediction is an important technique in image and video compression that exploits spatial correlation within one picture. It supports the following block sizes [8].

a)  16x16 (for Luma) –

  H.264 Intra 16x16 prediction modes are shown in Fig. 2a

Ø  Mode 0 (vertical): extrapolation from upper samples (H).

Ø  Mode 1 (horizontal): extrapolation from left samples (V).

Ø  Mode 2 (DC): mean of upper and left-hand samples (H+V).

Ø  Mode 3 (Plane): a linear “plane” function is fitted to the upper and left-hand samples H and V. This works well in areas of smoothly-varying luminance.

Fig.2a 16x16 intra prediction modes [9]
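The first three 16x16 modes listed above can be illustrated with a minimal sketch. It assumes both the upper and left neighbouring samples are available; the function name `predict_block` is hypothetical and is not taken from JM, and the plane mode is left out for brevity.

```python
def predict_block(mode, upper, left):
    """Return an n x n prediction block as a list of rows.

    upper: n reconstructed samples above the block (H in the text)
    left:  n reconstructed samples to the left of the block (V in the text)
    """
    n = len(upper)
    if mode == 0:
        # Vertical: every row is a copy of the upper samples.
        return [list(upper) for _ in range(n)]
    if mode == 1:
        # Horizontal: each row is filled with its left neighbour.
        return [[left[r]] * n for r in range(n)]
    if mode == 2:
        # DC: rounded mean of the upper and left samples.
        dc = (sum(upper) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("plane mode (3) is omitted from this sketch")
```

For a 16x16 luma block, `upper` and `left` would each hold 16 samples; the same function works for any block size with equally long boundaries.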

b) 8x8 (for Chroma) –

Ø  Mode 0 (DC): mean of upper and left-hand samples (H+V).

Ø  Mode 1 (horizontal): extrapolation from left samples (V).

Ø  Mode 2 (vertical): extrapolation from upper samples (H).

Ø  Mode 3 (Plane): a linear “plane” function is fitted to the upper and left-hand samples H and V. This works well in areas of smoothly-varying luminance.

c)  4x4 (for Luma) –

H.264 Intra 4x4 prediction modes are shown in Fig. 2b. Intra 4x4 prediction has 9 modes (8 directional modes plus DC), as listed below.

Ø  Mode 0 - Vertical

Ø  Mode 1 - Horizontal

Ø  Mode 2 - DC

Ø  Mode 3 - Diagonal-down-left

Ø  Mode 4 - Diagonal-down-right

Ø  Mode 5 - Vertical-Right

Ø  Mode 6 - Horizontal-down

Ø  Mode 7 - Vertical-left

Ø  Mode 8 - Horizontal-up

Fig. 2b Intra prediction 4x4 modes [2]

A-H -> previously coded pixels above and above-right of the current block, available at both the encoder and the decoder.

I-L -> previously coded pixels to the left of the current block, available at both the encoder and the decoder.

M -> previously coded pixel above-left of the current block.


Inter Prediction in H.264- [2]

Inter prediction is the process of predicting a block of luma and chroma samples from a picture that has previously been coded and transmitted, a reference picture. This involves selecting a prediction region, generating a prediction block and subtracting this from the original block of samples to form a residual that is then transformed, coded and transmitted. The block size can range from 16x16 to 4x4 luma and corresponding chroma samples as shown in Fig. 4 and Fig.5.

Fig. 4 Macro block partitions: 16x16, 16x8, 8x16, 8x8 [9]

Fig. 5 Macro block sub partitions: 8x8, 4x8, 8x4, 4x4 [9]

Rate Distortion Optimization [6]-

The H.264/AVC intra-prediction is conducted for all types of blocks such as 4x4 luma blocks, 16x16 luma blocks, and 8x8 chroma blocks. The residual between the current block and its prediction is then transformed, quantized, and entropy coded.

To obtain the best mode among these modes, the H.264/AVC encoder performs the rate-distortion optimization (RDO) technique for each macro block.

  Set macro block parameters: QP (quantization parameter) and Lagrangian multiplier λMODE

  Calculate: $\lambda_{MODE} = 0.85 \times 2^{(QP-12)/3}$

  Then calculate the cost, which determines the best mode:

$Cost = D + \lambda_{MODE} \times R$

D – Distortion

R – Bit rate with given QP

Distortion (D) is obtained by SSD (sum of squared differences) between the original macro block and its reconstructed block.

Bit rate (R) includes the number of bits for the mode information and the transform coefficients of the macro block.
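The cost computation can be sketched as follows, reading the multiplier as λMODE = 0.85 x 2^((QP-12)/3). The function names are hypothetical, and the rate is simply passed in as a bit count; in a real encoder it would come from entropy coding the mode and the coefficients.

```python
def lambda_mode(qp):
    """Lagrangian multiplier for mode decision at quantization parameter qp."""
    return 0.85 * 2 ** ((qp - 12) / 3)


def ssd(original, reconstructed):
    """Sum of squared differences between two equally sized blocks (lists of rows)."""
    return sum(
        (o - r) ** 2
        for o_row, r_row in zip(original, reconstructed)
        for o, r in zip(o_row, r_row)
    )


def rd_cost(distortion, rate_bits, qp):
    """Rate-distortion cost: Cost = D + lambda_MODE * R."""
    return distortion + lambda_mode(qp) * rate_bits
```

The encoder would evaluate `rd_cost` for every candidate mode and keep the mode with the smallest value.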

Considering the RDO procedure for intra mode selection in H.264/AVC, the number of mode combinations in one macro block is N8 x (16 x N4 + N16), where

Ø  N8 – number of modes of an 8x8 chroma block

Ø  N4 – number of modes of a 4x4 luma block

Ø  N16 – number of modes of a 16x16 luma block

Computing the number of mode combinations for one macro block:

N8 x (16 x N4 + N16) = 4 x (16 x 9 + 4) = 592

Thus, to select the best mode for one macro block in intra prediction, the H.264/AVC encoder carries out 592 RDO calculations, which greatly increases the complexity of the encoder.
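The count can be checked with a few lines of code. This is only an illustration of the formula; the function name `rdo_combinations` is hypothetical.

```python
def rdo_combinations(n8, n4, n16):
    """Mode combinations per macro block: N8 x (16 x N4 + N16).

    A macro block holds sixteen 4x4 luma blocks, hence the factor 16.
    """
    return n8 * (16 * n4 + n16)


# Full search: 4 chroma modes, 9 modes per 4x4 block, 4 modes for 16x16.
print(rdo_combinations(4, 9, 4))  # 592
```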

Goal: This project uses the baseline profile, as it provides simplicity in implementation. The profiles supported by H.264/AVC are shown in Fig.6:

·  Baseline profile

·  Main profile

·  High profile

·  Extended profile

Fig. 6 Specific coding parts of the profiles in H.264 [10]

The important features of the baseline profile are –

a)  I and P slice coding

b)  Enhanced error resilience tools such as flexible macro block ordering (FMO), arbitrary slice ordering (ASO) and redundant slices (RS)

c)  Context adaptive variable length coding (CAVLC)

The baseline profile is primarily used in low-cost applications and in applications that require robustness to data loss.

The joint model (JM 17.2) implementation of the H.264 encoder is used in this project. [7]

This project has implemented the complexity reduction algorithm for all three block sizes:

1) 16x16 luma

2) 4x4 luma

3) 8x8 chroma

The proposed intra mode selection algorithm for a 16x16 luma block is summarized as follows: [6]

  Step 1 - Examine the sizes of the adjacent blocks: if both the upper block and the left block are 16x16, go to Step 2; otherwise go to Step 4.

  Step 2 - Examine the modes of the adjacent blocks (modeA and modeB): if both modes are the same, go to Step 3; otherwise, of the two adjacent modes modeA and modeB, select as the best mode for the 16x16 luma block the one with the minimum SATD (sum of absolute transformed differences).

  Step 3 - If both adjacent modes are DC mode, go to Step 4; otherwise, of the adjacent mode and DC mode, select as the best mode for the 16x16 luma block the one with the minimum SATD.

  Step 4 - Let ΔV be a vertical difference between upper boundary pixels of the current block and boundary pixels of the upper block, and ΔH be a horizontal difference between left boundary pixels of the current block and boundary pixels of the left block as follows.

Ø  ΔV = Σ |u(i)-q(i)| for i =0 to 15.

Ø  ΔH = Σ |l(i)-r(i)| for i =0 to 15.

where u(i) -> upper block boundary pixels

q(i) -> upper boundary pixels of current block

l(i) -> boundary pixels of the left block

r(i) -> left boundary pixels of the current block

Fig. 7 Calculation for ΔV and ΔH in 16x16 luma block. [5]

  Obtain the candidate modes from the two difference values ΔV and ΔH, where T2 is a positive threshold set equal to 32: if |ΔV − ΔH| is smaller than 2xT2, the candidate modes are DC mode and plane mode; if (ΔV − ΔH) is larger than T2, the candidate modes are DC mode and horizontal mode; if (ΔV − ΔH) is smaller than −T2, the candidate modes are DC mode and vertical mode.

  Finally, select the best mode among the candidate modes by choosing the one with the minimum SATD.
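Steps 1 to 4 can be sketched as a small candidate-selection routine. This is a minimal illustration and not the JM 17.2 code: all function names are hypothetical, the mode numbers follow the 16x16 list given earlier, and the three threshold tests are evaluated in the order they are stated, with the first match taken (an assumption, since the stated ranges overlap).

```python
VERTICAL, HORIZONTAL, DC, PLANE = 0, 1, 2, 3
T2 = 32  # threshold value given in the text


def boundary_difference(neighbour, current):
    """Sum of absolute differences along a boundary (delta V or delta H)."""
    return sum(abs(a - b) for a, b in zip(neighbour, current))


def candidates_from_boundaries(delta_v, delta_h, t2=T2):
    """Step 4: choose two candidate modes from the boundary differences."""
    d = delta_v - delta_h
    if abs(d) < 2 * t2:
        return [DC, PLANE]
    if d > t2:
        return [DC, HORIZONTAL]
    return [DC, VERTICAL]  # remaining case: d < -t2


def candidate_modes_16x16(upper_is_16, left_is_16, mode_a, mode_b,
                          delta_v, delta_h):
    """Return the candidate modes for a 16x16 luma block."""
    if upper_is_16 and left_is_16:        # Step 1
        if mode_a != mode_b:              # Step 2
            return [mode_a, mode_b]
        if mode_a != DC:                  # Step 3
            return [mode_a, DC]
    return candidates_from_boundaries(delta_v, delta_h)  # Step 4


def best_mode(candidates, satd_of):
    """Pick the candidate with the minimum SATD; satd_of maps mode -> cost."""
    return min(candidates, key=satd_of)
```

In every branch only two modes reach the SATD comparison, which is where the reduction from 4 to 2 evaluated modes for a 16x16 block comes from.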


The complexity reduction algorithm, implemented to reduce the number of computations needed to decide the best mode, is shown in Fig. 8.

Fig. 8 Algorithm for 16x16 luma intra prediction.


The following steps are performed in JM 17.2 to calculate the SATD:

a) Find the difference between the original 16x16 block and the predicted 16x16 block.

b) Apply the Hadamard transform to every 4x4 block.

c) Sum the transform coefficients of every 4x4 block, excluding the DC coefficient.

d) Check whether sum_cost > max_cost. If yes, return max_cost.

e) If not, apply the Hadamard transform to the 16 DC coefficients (one from each 4x4 block) and add the sum of the results to sum_cost.

f) Check whether sum_cost > max_cost. If yes, return max_cost; otherwise return sum_cost.

Where,

sum_cost -> cost calculated for each block partition.

max_cost -> maximum cost value allowed for each block size.
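The idea behind the SATD can be shown with a simplified sketch. It is not the JM 17.2 routine: it transforms each 4x4 difference block in full, sums the absolute coefficients without any normalization, and omits the separate pass over the DC coefficients described in steps c) to e). The function names are hypothetical; only the early exit against max_cost mirrors the steps above.

```python
# 4x4 Hadamard matrix (symmetric, so H equals its transpose).
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul4(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd_4x4(orig, pred):
    """Sum of absolute Hadamard-transformed differences of one 4x4 block."""
    diff = [[orig[r][c] - pred[r][c] for c in range(4)] for r in range(4)]
    coeffs = matmul4(matmul4(H4, diff), H4)  # H * D * H^T with H symmetric
    return sum(abs(v) for row in coeffs for v in row)


def satd_16x16(orig, pred, max_cost):
    """Accumulate the SATD over the sixteen 4x4 blocks of a 16x16 block.

    Returns max_cost as soon as the running sum exceeds it.
    """
    sum_cost = 0
    for by in range(0, 16, 4):
        for bx in range(0, 16, 4):
            o = [row[bx:bx + 4] for row in orig[by:by + 4]]
            p = [row[bx:bx + 4] for row in pred[by:by + 4]]
            sum_cost += satd_4x4(o, p)
            if sum_cost > max_cost:
                return max_cost
    return sum_cost
```

The early return lets the encoder abandon a candidate mode as soon as it is already worse than the best cost found so far.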

Comparison of number of RDO computations [6]

After implementing the complexity reduction algorithm, the number of modes evaluated drops from 4 to 2 for I16x16 luma blocks and 8x8 chroma blocks, and from 9 to 4 for I4x4 luma blocks.

Intra block size / Number of modes (original JM 17.2 implementation) / Number of modes (complexity reduction algorithm)
16x16 / 4 / 2
8x8 / 4 / 2
4x4 / 9 / 4
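As a rough worked example, suppose the combination formula N8 x (16 x N4 + N16) from the RDO discussion still applies with the reduced mode counts in the table above. This is an assumption, since the report does not state a reduced total.

```latex
N_8 \times (16 \times N_4 + N_{16}) = 2 \times (16 \times 4 + 2) = 132
```

Under that assumption the count falls from 592 to 132 combinations per macro block, about 78% fewer. This counts mode combinations only and is not a measure of encoding time.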


Results:

Tables 1 and 2 show the results after implementing the algorithm for intra prediction in the JM 17.2 reference software [6]. The tests were run on a 2.10 GHz Intel Core 2 Duo (T6500) processor with 4 GB RAM. This project uses the baseline profile, with QP (quantization parameter) set to 28 and 100 frames encoded per sequence (I frames only). Tables 1 and 2 show results for some QCIF and CIF sequences.

Computational efficiency is measured by the amount of time reduction, which is computed as follows:

  Δ Time is calculated as:

  Δ MSE is calculated as:

  Δ Bit rate is calculated as :

  Δ PSNR is calculated as :

QCIF and CIF resolution sequences:

  CIF (Common Intermediate Format) is a format used to standardize the horizontal and vertical resolutions in pixels of Y, Cb, Cr sequences in video signals, commonly used in video teleconferencing systems.

  QCIF means "Quarter CIF": the frame has one fourth of the area of a CIF frame, so both the height and the width are halved.

  The Y, Cb and Cr resolutions of CIF and QCIF are compared in Fig. 9 [16].

Fig. 9 CIF and QCIF resolutions(Y, Cb, Cr).
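The relationship between the two formats can be made concrete with a short sketch. It uses the standard luma resolutions (CIF 352x288, QCIF 176x144) and assumes 4:2:0 chroma sampling, where Cb and Cr have half the luma width and height; the sampling format is an assumption, since the figure is not reproduced here. The names are hypothetical.

```python
FORMATS = {"CIF": (352, 288), "QCIF": (176, 144)}  # luma width, height


def plane_sizes(width, height):
    """Y, Cb and Cr plane sizes, assuming 4:2:0 chroma sampling."""
    chroma = (width // 2, height // 2)
    return {"Y": (width, height), "Cb": chroma, "Cr": chroma}


def macroblocks_per_frame(width, height):
    """Number of 16x16 luma macro blocks in one frame."""
    return (width // 16) * (height // 16)


for name, (w, h) in FORMATS.items():
    print(name, plane_sizes(w, h), macroblocks_per_frame(w, h))
```

A CIF frame holds 396 macro blocks and a QCIF frame holds 99, so the intra mode decision runs four times as often per CIF frame.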

Test Sequences

Fig. 10 shows the QCIF and CIF sequences used to test the complexity reduction algorithm [12]:

  Akiyo

  Foreman

  Car phone

  Hall monitor

  Silent

  News

  Container

  Coastguard

Fig. 10 CIF and QCIF test sequences.
Table 1. Simulation results for QCIF sequences (only I frames)

Sequence (QCIF) / Δ Time (%) / Δ PSNR (dB) / Δ Bit rate (%) / Δ MSE
Akiyo / -10.203 / 0.014 / 3.59 / 0.033
Foreman / -10.942 / -0.004 / 2.03 / -0.012
Car phone / -9.768 / -0.002 / 4.33 / -0.012
Hall monitor / -10.826 / 0.002 / 2.78 / 0.011
Silent / -10.669 / -0.002 / 2.98 / -0.007
News / -10.566 / 0.004 / 1.81 / 0.080
Container / -9.107 / 0.008 / 1.33 / -0.019
Coastguard / -10.629 / -0.021 / 2.72 / -0.082

*Negative values indicate the gain (e.g. decrease in encoding time)

*Positive values indicate the loss (e.g. increase in the bit rate)

Table 2. Simulation results for CIF sequences (only I frames)

Sequence (CIF) / Δ Time (%) / Δ PSNR (dB) / Δ Bit rate (%) / Δ MSE
Bus / -10.459 / -0.006 / 5.37 / 0.111
Container / -10.495 / 0.050 / 3.93 / -0.027
Coastguard / -10.287 / -0.001 / 2.72 / -0.020

*Negative values indicate the gain (e.g. decrease in encoding time)

*Positive values indicate the loss (e.g. increase in the bit rate)

Conclusions:

This project has implemented the complexity reduction algorithm for intra prediction on block sizes 16x16 and 4x4 for luma and 8x8 for the chroma components. The algorithm has been successfully implemented in JM 17.2. For the sequences tested, encoding time is reduced by about 9% to 11% with negligible change in PSNR (within about 0.05 dB) and MSE, at the cost of a 1.3% to 5.4% increase in bit rate.


References:

[1]  T. Wiegand, G. Sullivan, G. Bjontegaard and A. Luthra, “Overview of the H.264/AVC Video Coding Standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, pp.560-576, July 2003.

[2]  I. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-Generation Multimedia, Wiley, 2003.