Chapter 9 H.263

9.1 How does H.263 differ from H.261 and MPEG-1?

9.1.1 Coding of H.263 coefficients

The coefficients are coded as a three-dimensional event of (last, run, level).

9.1.2 Coding of motion vectors

MVDx = MVx – predx

MVDy = MVy – predy (9.2)

Figure 9.1 Motion vector prediction

Figure 9.2 Motion vector prediction for the border macroblocks
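The differential coding of equation (9.2) can be sketched in Python. The median-of-three predictor over the candidates MV1 (left), MV2 (above) and MV3 (above-right) follows H.263; the function names, the tuple representation and the omission of the border special cases of Figure 9.2 are illustrative simplifications:

```python
def median(a, b, c):
    """Component-wise median of three scalar values."""
    return sorted([a, b, c])[1]

def mv_difference(mv, mv1, mv2, mv3):
    """Return (MVDx, MVDy) for the current vector mv, given the three
    candidate predictors MV1 (left), MV2 (above) and MV3 (above-right).
    The predictor is the component-wise median of the candidates."""
    predx = median(mv1[0], mv2[0], mv3[0])
    predy = median(mv1[1], mv2[1], mv3[1])
    return (mv[0] - predx, mv[1] - predy)
```

For example, with candidates (1, 0), (4, -1) and (2, -3), the predictor is (2, -1), so a current vector of (3, -2) is coded as the difference (1, -1).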

9.1.3 Source pictures

Table 9.1 Number of pixels per line and number of lines per picture for each of the H.263 picture formats

Picture format   Luminance        Luminance          Chrominance      Chrominance
                 pixels per line  lines per picture  pixels per line  lines per picture
sub-QCIF         128              96                 64               48
QCIF             176              144                88               72
CIF              352              288                176              144
4CIF             704              576                352              288
16CIF            1408             1152               704              576

Extensions of H.263 (H.263+, H.263++)

This work has led to H.26L and eventually to H.264/AVC (MPEG-4 Part 10).

Scope and goals of H.263+

The expected enhancements of H.263+ over H.263 fall into two basic categories:

- enhancing quality within existing applications
- broadening the current range of applications.

A few examples of the enhancements are:

- improving perceptual compression efficiency
- reducing video coding delay
- providing greater resilience to bit errors and data losses.

Scope and goals of H.26L

- enhanced visual quality at very low bit rates, particularly at PSTN rates (e.g. below 24 kbit/s)
- enhanced error robustness, to accommodate the higher error rates experienced when operating, for example, over mobile links
- low complexity, appropriate for small, relatively inexpensive audio-visual terminals
- low end-to-end delay, as required in bidirectional personal communications.

In addition, the group worked closely with the MPEG-4 experts group to include new coding methods and promote interoperability. This work was formally ratified in 2003 and named H.264 in ITU-T and MPEG-4 Part 10 in ISO/IEC. Details of this codec will be covered in Chapter 11.

Optional modes of H.263

Advanced motion estimation/compensation

9.4.1.1 Four motion vectors per macroblock

Figure 9.3 Redefinition of the candidate predictors MV1, MV2 and MV3 for each luminance block in a macroblock

9.4.1.2 Overlapped motion compensation

Figure 9.4 Weighting values for prediction with motion vectors of the luminance blocks on top or bottom of the current luminance block, H1(i,j)

Figure 9.5 Weighting values for prediction with motion vectors of luminance blocks to the left or right of the current luminance block, H2(i,j)

The creation of each interpolated (overlapped) pixel, p(i,j), in a reference luminance block is governed by:

p(i,j) = (q(i,j) × H0(i,j) + r(i,j) × H1(i,j) + s(i,j) × H2(i,j) + 4)/8    (9.3)

where q(i,j), r(i,j) and s(i,j) are the motion compensated pixels from the reference picture, obtained with three motion vectors: that of the current block, that of the block above or below, and that of the block to the left or right, weighted by H0, H1 and H2 respectively.

Figure 9.6 Weighting values for prediction with motion vector of current block, H0(i,j)
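Equation (9.3) can be sketched per pixel as below. The real weighting values H0, H1 and H2 are the 8 × 8 arrays of Figures 9.4 to 9.6, which sum to 8 at every position; here they are passed as plain arguments purely for illustration:

```python
def obmc_pixel(q, r, s, h0, h1, h2):
    """Overlapped motion compensation of one pixel (equation 9.3):
    q, r and s are the motion compensated pixels obtained with the
    current, vertical-neighbour and horizontal-neighbour block vectors,
    weighted by H0, H1 and H2. The weights sum to 8, so the result is
    normalised with rounding by adding 4 and dividing by 8."""
    return (q * h0 + r * h1 + s * h2 + 4) // 8
```

When all three predictions agree the weighting is transparent: with q = r = s = 100 the overlapped pixel is again 100, whatever the weight split.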

9.4.2 Importance of motion estimation

Figure 9.7 PSNR of Claire sequence coded at 256 kbit/s, with MPEG-1, H.261 and H.263

- motion compensation on smaller block sizes results in smaller error signals than the macroblock-based compensation used in the other codecs
- overlapped motion compensation: by removing the blocking artefacts at the block boundaries, the prediction picture has a better quality, reducing the error signal and hence the number of significant DCT coefficients
- efficient coding of DCT coefficients through three-dimensional (last, run, level) events
- efficient representation of the combined macroblock type and block patterns.

Deblocking filter

Figure 9.8 Filtering of pixels at the block boundaries

Figure 9.9 d1 as a function of d

Motion estimation/compensation with spatial transforms

Figure 9.10 Mapping of a block to a quadrilateral

One of the methods for this purpose is the bilinear transform, defined as [11]:

u = a0 + a1 x + a2 y + a3 x y
v = a4 + a5 x + a6 y + a7 x y    (9.7)

where (u, v) is the transformed coordinate of the pixel at (x, y), and a0 ... a7 are the eight mapping coefficients determined from the four corner displacements of the quadrilateral.

Figure 9.11 Intensity interpolation of a nongrid pixel
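As a sketch of the block-to-quadrilateral mapping of Figure 9.10, the bilinear transform can be parameterised directly by the four corner coordinates; this corner-based form is equivalent to the eight-coefficient form, and the names used here are illustrative:

```python
def bilinear_map(corners, x, y):
    """Map normalised block coordinates (x, y) in [0, 1] x [0, 1] to a
    point inside the quadrilateral with the given corner coordinates.
    corners = [top-left, top-right, bottom-left, bottom-right],
    each an (x, y) pair."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = corners
    u = (1 - x) * (1 - y) * x0 + x * (1 - y) * x1 \
        + (1 - x) * y * x2 + x * y * x3
    v = (1 - x) * (1 - y) * y0 + x * (1 - y) * y1 \
        + (1 - x) * y * y2 + x * y * y3
    return u, v
```

If the quadrilateral is the unit square itself, the mapping reduces to the identity, which is a convenient sanity check.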

Figure 9.12 Reconstructed pictures, operating with the BMST and BMA motion vectors individually

Figure 9.13 Frame-by-frame reconstruction of the pictures by BMST

Figure 9.14 Mesh-based motion compensation: (a) mesh; (b) motion compensated picture

Figure 9.15 Performance of spatial transform motion compensation

9.5 Treatment of B-pictures

Figure 9.16 Prediction in PB frames mode

9.5.1.1 Macroblock type

One of the modes of the MCBPC is the intra macroblock type, which has the following meaning:

- the P-blocks are intra coded
- the B-blocks are inter coded with prediction as for an inter block.

9.5.1.2 Motion vectors for B-pictures in PB frames

9.5.1.3 Prediction for a B-block in PB frames

Figure 9.17 Forward and bi-directional prediction for a B-block

9.5.2 Improved PB frames

Bidirectional prediction: in the bidirectional prediction mode, prediction uses the reference pictures before and after the BPB-picture.

Forward prediction: In the forward prediction mode the vector data contained in MVDB are used for forward prediction from the previous reference picture (an intra or inter picture, or the P-picture part of PB or improved PB frames).

Backward prediction: in the backward prediction mode the prediction of the BPB-macroblock is identical to BREC of the normal PB frames mode. No motion vector data is used for the backward prediction.

9.5.3 Quantisation of B-pictures

Table 9.2 dbquant codes and relation between quant and bquant

dbquant   bquant
00        (5 × quant)/4
01        (6 × quant)/4
10        (7 × quant)/4
11        (8 × quant)/4
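Assuming the dbquant-to-bquant relation quoted for Table 9.2 (factors 5/4 to 8/4 of quant, one per 2-bit code), the derivation can be sketched as follows; the function name is illustrative, while the clipping to the legal quantiser range 1 to 31 follows the standard:

```python
def bquant_from_dbquant(dbquant, quant):
    """Derive the B-picture quantiser bquant from the P-picture
    quantiser quant and the 2-bit dbquant code:
    '00' -> (5*quant)/4, '01' -> (6*quant)/4,
    '10' -> (7*quant)/4, '11' -> (8*quant)/4,
    clipped to the legal quantiser range 1..31."""
    factor = 5 + int(dbquant, 2)     # 5, 6, 7 or 8
    bq = (factor * quant) // 4
    return max(1, min(31, bq))
```

This biases B-pictures towards coarser quantisation than the co-located P-picture, which suits their lesser importance in the prediction chain.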

10.4 Advanced variable length coding

10.4.1 Syntax-based arithmetic coding (more in H.264)

10.4.2 Reversible variable length coding

Figure 9.18 A reversible VLC

10.4.3 Resynchronisation markers

10.4.4 Advanced intra/inter VLC

10.4.4.1 Advanced intra coding

Scanning

Figure 9.19 Alternate scans: (a) horizontal, (b) vertical

Figure 9.20 Three neighbouring blocks in the DCT domain

The reconstruction for each coding mode is given by:

Mode 0: DC prediction only

(9.9)

Mode 1: DC and AC prediction from the block above

(9.10)

Mode 2: DC and AC prediction from the block to the left

(9.11)
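The prediction directions of the three modes above can be sketched as follows. This is a simplified illustration only: the standard's rescaling of the predictor coefficients by the neighbours' quantiser ratio is omitted, and the fallback DC value of 1024 when no neighbour is available is an assumption borrowed from common DCT-domain practice:

```python
def predict_intra_block(mode, above, left):
    """Form the 8x8 DCT-domain prediction for an intra block from its
    reconstructed neighbours (Figure 9.20). above/left are 8x8 lists of
    dequantised coefficients, or None if unavailable.
    mode 0: DC prediction only; mode 1: DC and first-row AC from the
    block above; mode 2: DC and first-column AC from the block to the
    left. Quantiser rescaling is omitted in this sketch."""
    pred = [[0] * 8 for _ in range(8)]
    if mode == 0:
        # DC predicted from the mean of the available neighbours' DC values
        dcs = [b[0][0] for b in (above, left) if b is not None]
        pred[0][0] = sum(dcs) // len(dcs) if dcs else 1024  # 1024: assumed fallback
    elif mode == 1 and above is not None:
        pred[0] = above[0][:]            # DC and first row of AC
    elif mode == 2 and left is not None:
        for i in range(8):
            pred[i][0] = left[i][0]      # DC and first column of AC
    return pred
```

The encoder would then code only the difference between the actual coefficients and this prediction, choosing the mode that minimises the residual.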

10.4.4.2 Advanced inter coding with switching between two VLC tables

The outcome of this annex [25-S], in a different form, is also used in H.264. In H.264, rather than switching between two VLC tables, selection is made among 11 VLC tables. The decision to select the most suitable table is based on the context of the data to be coded. This is called context adaptive VLC (CAVLC), to be explained in greater detail in Chapter 11.

10.5 Protection against error

10.5.1 Forward error correction

To allow the video data and error correction parity information to be identified by the decoder, an error correction framing pattern is included. This pattern consists of multiframes of eight frames, each frame comprising 1 bit of framing, a 1-bit fill indicator (FI), 492 bits of coded data and 18 bits of parity. One bit from each of the eight frames provides the frame alignment pattern (S1S2S3S4S5S6S7S8) = (00011011), which helps the decoder to resynchronise itself after the occurrence of errors.

The error detection/correction code is a BCH (511, 493) [16]. The parity is calculated over a code of 493 bits, comprising a 1-bit fill indicator (FI) and 492 bits of coded video data. The generator polynomial is given by:

g(x) = (x^9 + x^4 + 1)(x^9 + x^6 + x^4 + x^3 + 1)    (9.12)

The parity bits are calculated by dividing the 493 bits of video data (including the fill bit), left shifted by 18 bits, by this generator polynomial. Since the generator is a degree-18 (19-bit) polynomial, the remainder is an 18-bit binary number (which is why the data bits had to be shifted 18 bits to the left), to be used as the parity bits. For example, for the input data 01111...11 (493 bits), the resulting correction parity bits are 011011010100011011 (18 bits). The encoder appends these 18 bits to the 493 data bits, and the whole 511 bits are sent to the receiver as a block of data. This 511-bit block is exactly divisible by the generator polynomial, so the remainder is zero. Thus the receiver can perform a similar division, and if there is any remainder, it is an indication of channel error. This is a very robust form of error detection, since bursts of errors can also be detected.
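The parity computation described above is ordinary polynomial division over GF(2). A minimal sketch follows; the generator factorisation g(x) = (x^9 + x^4 + 1)(x^9 + x^6 + x^4 + x^3 + 1) is the one commonly quoted for this BCH (511, 493) framing code, and the helper names are illustrative:

```python
def gf2_mul(a, b):
    """Carry-less (GF(2)) multiplication of polynomials as bitmasks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def gf2_mod(a, g):
    """Remainder of polynomial a modulo g over GF(2)."""
    dg = g.bit_length() - 1
    while a.bit_length() - 1 >= dg:
        # align the generator under the current leading bit and cancel it
        a ^= g << (a.bit_length() - 1 - dg)
    return a

# Generator of the BCH (511, 493) code, degree 18:
# g(x) = (x^9 + x^4 + 1)(x^9 + x^6 + x^4 + x^3 + 1)
G = gf2_mul(0b1000010001, 0b1001011001)

def bch_parity(data_493_bits):
    """18 parity bits: remainder of the data, shifted left by 18 bits,
    divided by g(x), as described in the text."""
    return gf2_mod(data_493_bits << 18, G)
```

Appending the parity makes the 511-bit block exactly divisible by g(x), so the receiver detects an error whenever its own division leaves a nonzero remainder.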

10.5.2 Back channel

T: transform; Q: quantiser; CC: coding control; P: picture memory with motion compensated variable delay; AP: additional picture memory; v: motion vector; p: flag for INTRA/INTER; t: flag for transmitted or not; qz: quantisation indication; q: quantisation index for DCT coefficients

Figure 9.21 An encoder with multiple reference pictures

A positive acknowledgment (ACK) or a negative acknowledgment (NACK) is returned depending on whether the decoder successfully decodes a GOB.

An improved version of this annex [25-N] is used in H.264, under the concept of multiple reference frames. It is used both to improve compression efficiency, by searching for better motion vectors among several frames, and to improve error robustness, as discussed here. Multiple reference motion compensation and error-free selection of the reference frame will be studied in some detail in Chapter 11.

10.5.3 Data partitioning

Figure 9.22 Effects of errors (a) with and (b) without data partitioning

Table 9.3 VLC and RVLC bits of MCBPC (for I-pictures)

Index  MB type      CBPC  Normal VLC  RVLC
0      3 (Intra)    00    1           1
1      3            01    001         010
2      3            10    010         0110
3      3            11    011         01110
4      4 (Intra+Q)  00    0001        00100
5      4            01    000001      011110
6      4            10    000010      001100
7      4            11    000011      0111110

Table 9.4 Number of bits per slice for data partitioning

Slice No  Slice header  MV  Coeff  SUM   Normal
1         52            30  211    293   269
2         63            34  506    603   571
3         45            42  748    835   803
4         48            42  1025   1115  1083
5         45            71  959    1075  1043
6         41            46  844    931   899
7         48            34  425    507   475
8         51            32  408    491   459
9         38            24  221    283   251

Figure 9.23 Error in a bit stream


(9.13) where N is the total number of pixels at the upper and lower borders of the MB.

Figure 9.24 Pixels at the boundary of (a) a macroblock and (b) four blocks

Figure 9.25 Step-by-step decoding and skipping of bits in the bit stream

10.5.4 Error concealment

10.5.4.1 Intraframe error concealment

Figure 9.26 An example of Intraframe error concealment

9.7.5.1 Interframe error concealment

Figure 9.27 A grid of macroblocks in the current and previous frame

Zero mv

Previous mv

Top mv

Mean mv

mv = (1/N) × Σ_{i=1..N} mv_i    (9.14)

Majority mv

; N  6 (9.15)

Vector median mv

the j-th motion vector, mv_j, is the median if:

Σ_{i=1..N} ||mv_j − mv_i|| ≤ Σ_{i=1..N} ||mv_k − mv_i||, for all k = 1, ..., N    (9.16)
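The vector median of equation (9.16) simply selects the candidate whose summed distance to all the other candidates is smallest. A minimal sketch, using the Euclidean norm and illustrative names:

```python
def vector_median(mvs):
    """Vector median of a list of candidate motion vectors: the
    candidate minimising the sum of Euclidean distances to all the
    other candidates (equation 9.16)."""
    def dist_sum(v):
        return sum(((v[0] - u[0]) ** 2 + (v[1] - u[1]) ** 2) ** 0.5
                   for u in mvs)
    return min(mvs, key=dist_sum)
```

Unlike the component-wise mean, the vector median is always one of the actual candidates, so an outlying neighbour vector cannot drag the concealment vector off all of them.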

Table 9.5 PSNR [dB] of various error concealment methods (64 kbit/s, 5 frames/s, QCIF)

Method      Seq-1   Seq-2   Seq-3   Seq-4

Zero 17.04 18.15 14.54 13.08

Previous 17.28 18.34 14.53 13.48

Top 19.27 21.08 17.04 16.25

Average 19.18 21.74 17.51 16.18

Majority 19.35 21.83 17.89 16.61

Median 19.87 22.52 18.29 16.89

No Errors 22.57 26.94 20.85 19.88

Table 9.6 PSNR [dB] of the various error concealment methods (64 kbit/s, 12.5 frames/s, QCIF)

Method      Seq-1   Seq-2   Seq-3   Seq-4

Zero 20.62 22.11 18.64 16.62

Previous 22.49 22.19 18.19 16.53

Top 22.97 25.16 20.97 20.04

Average 22.92 25.99 21.24 20.08

Majority 23.33 26.32 21.57 20.36

Median 24.36 26.72 22.16 20.81

No Errors 26.12 29.69 23.93 23.04

Figure 9.28 An erroneous picture along with its error concealed one

Bidirectional mv

Figure 9.29 A group of alternate P and B pictures

To estimate a missing motion vector for a P-picture, say P31, the available motion vectors of the same spatial coordinates of the B-pictures can be used, with the following substitutions:

if only B23 is available, then P31 = 2 × B23

if only B21 is available, then P31 = -2 × B21

if both B23 and B21 are available, then P31 = B23 - B21

If none of them are available, then set P31 = 0

To estimate a missing motion vector of a B-picture, simply divide that of the P-picture by two;

B23=½P31 or B21= -½P31
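The substitution rules above can be summarised in a small function; the name and the (x, y) tuple representation of a vector are illustrative:

```python
def estimate_p_vector(b23=None, b21=None):
    """Estimate a lost P-picture motion vector (e.g. P31) from the
    co-located B-picture vectors, following the substitution rules in
    the text. Vectors are (x, y) pairs; a missing vector is None."""
    if b23 is not None and b21 is not None:
        return (b23[0] - b21[0], b23[1] - b21[1])  # P31 = B23 - B21
    if b23 is not None:
        return (2 * b23[0], 2 * b23[1])            # P31 = 2 x B23
    if b21 is not None:
        return (-2 * b21[0], -2 * b21[1])          # P31 = -2 x B21
    return (0, 0)                                  # nothing available
```

The three rules are mutually consistent: with B23 = ½P31 and B21 = −½P31, each of the substitutions recovers the same P31.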

9.7.5.2 Loss concealment

9.7.5.3 Selection of best estimated motion vector

Scalability

Figure 8.11 Block diagram of a two-layer SNR scalable coder

Figure 8.12 A DCT based base layer encoder

Figure 8.13 A two-layer SNR scalable encoder with drift at the base layer

Figure 8.14 A three-layer drift-free SNR scalable encoder

Figure 8.15 A block diagram of a three-layer SNR decoder

Figure 8.16 Picture quality of the base layer of the SNR encoder at 2 Mbit/s, and base + enhancement at 8 Mbit/s

Spatial scalability

Figure 8.17 Block diagram of a two-layer spatial scalable encoder

Figure 8.19 Details of spatial scalability encoder

Figure 8.20 (a) Base layer picture of a spatial scalable encoder at 2 Mbit/s, and (b) its enlarged version

Temporal scalability

Figure 8.21 A block diagram of a two-layer temporal scalable encoder

Hybrid scalability

Figure 8.25 SNR, spatial and temporal hybrid scalability encoder

Overhead due to scalability

Figure 8.26 Increase in bit rate due to scalability

Table 8.4 Applications of SNR scalability

Base layer              Enhancement layer                          Application
ITU-R-601               same resolution and format as lower layer  two quality service for standard TV
high definition         same resolution and format as lower layer  two quality service for HDTV
4:2:0 high definition   4:2:2 chroma simulcast                     video production/distribution

Table 8.5 Applications of spatial scalability

Base                 Enhancement          Application
progressive (30 Hz)  progressive (30 Hz)  CIF/QCIF compatibility or scalability
interlace (30 Hz)    interlace (30 Hz)    HDTV/SDTV scalability
progressive (30 Hz)  interlace (30 Hz)    ISO/IEC 11172-2 compatibility with this specification
interlace (30 Hz)    progressive (60 Hz)  migration to HR progressive HDTV

Table 8.6 Applications of temporal scalability

Base                 Enhancement          Higher               Application
progressive (30 Hz)  progressive (30 Hz)  progressive (60 Hz)  migration to HR progressive HDTV
interlace (30 Hz)    interlace (30 Hz)    progressive (60 Hz)  migration to HR progressive HDTV

Scalability in H.263

Figure 9.30 B-picture prediction dependency in the temporal scalability

SNR scalability

Figure 9.31 Prediction flow in SNR scalability

Spatial scalability

Figure 9.32 Prediction flow in spatial scalability

Multilayer scalability

Figure 9.33 Positions of the base and enhancement-layer pictures in a multilayer scalable bit-stream

Transmission order of pictures

Figure 9.34 Example of picture transmission order. Numbers next to each picture indicate the bit stream order, separated by commas for the two alternatives.