Chapter 9 H.263

9.1 How does H.263 differ from H.261 and MPEG-1?

9.1.1 Coding of H.263 coefficients

In H.263, quantised DCT coefficients are coded as a three-dimensional event of (last, run, level), where last signals whether this is the final nonzero coefficient of the block, run is the number of zero coefficients preceding it, and level is its value.
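The event formation can be illustrated with a short sketch in Python (function name hypothetical):

```python
def run_level_events(zigzag_coeffs):
    """Group a zigzag-scanned list of quantised DCT coefficients into
    H.263-style (last, run, level) events: 'run' zeros followed by a
    nonzero 'level', with 'last' = 1 on the final nonzero coefficient."""
    events = []
    run = 0
    for c in zigzag_coeffs:
        if c == 0:
            run += 1          # count zeros preceding the next nonzero value
        else:
            events.append([0, run, c])
            run = 0
    if events:
        events[-1][0] = 1     # flag the last nonzero coefficient of the block
    return [tuple(e) for e in events]

# A block with three nonzero coefficients after the zigzag scan:
print(run_level_events([12, 0, 0, -3, 0, 5, 0, 0]))
# → [(0, 0, 12), (0, 2, -3), (1, 1, 5)]
```

Each event is then mapped to a single VLC codeword, so trailing zeros never need to be transmitted.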

9.1.2 Coding of motion vectors

MVDx = MVx − predx

MVDy = MVy − predy   (9.2)

Figure 9.1 Motion vector prediction

Figure 9.2 Motion vector prediction for the border macroblocks
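The predictors predx and predy are formed, per component, as the median of the three candidate vectors MV1, MV2 and MV3 of Figure 9.1; a minimal sketch (function names hypothetical):

```python
def median3(a, b, c):
    """Median of three values: sort them and take the middle one."""
    return sorted((a, b, c))[1]

def motion_vector_difference(mv, mv1, mv2, mv3):
    """Differential coding of a motion vector against the component-wise
    median of the three candidate predictors (cf. equation (9.2))."""
    pred_x = median3(mv1[0], mv2[0], mv3[0])
    pred_y = median3(mv1[1], mv2[1], mv3[1])
    return (mv[0] - pred_x, mv[1] - pred_y)

# Current vector (3, -2); candidates from the left, above and above-right blocks:
print(motion_vector_difference((3, -2), (1, 0), (4, -1), (2, -3)))
# → (1, -1)
```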

9.1.3 Source pictures

Table 9.1 Number of pixels per line and number of lines per picture for each of the H.263 picture formats

Picture format / Number of pixels for luminance per line / Number of lines for luminance per picture / Number of pixels for chrominance per line / Number of lines for chrominance per picture
subQCIF / 128 / 96 / 64 / 48
QCIF / 176 / 144 / 88 / 72
CIF / 352 / 288 / 176 / 144
4CIF / 704 / 576 / 352 / 288
16CIF / 1408 / 1152 / 704 / 576

Extensions of H.263 (H.263+, H.263++)

This has led to H.26L and eventually to H.264/AVC (MPEG-4 Part 10).

Scope and goals of H.263+

The expected enhancements of H.263+ over H.263 fall into two basic categories:

  • enhancing quality within existing applications
  • broadening the current range of applications.

A few examples of the enhancements are:

  • improving perceptual compression efficiency
  • reducing video coding delay
  • providing greater resilience to bit errors and data losses.

Scopes and goals of H.26L

  • enhanced visual quality at very low bit rates, particularly at PSTN rates (e.g. below 24 kbit/s)
  • enhanced error robustness, to accommodate the higher error rates experienced when operating, for example, over mobile links
  • low complexity, appropriate for small, relatively inexpensive audio-visual terminals
  • low end-to-end delay, as required in bidirectional personal communications.

In addition, the group worked closely with the MPEG-4 experts group to include new coding methods and promote interoperability. This work was formally ratified in 2003 and named H.264 in ITU-T and MPEG-4 Part 10 in ISO/IEC. Details of this codec are covered in Chapter 11.

Optional modes of H.263

Advanced motion estimation/compensation

9.4.1.1 Four motion vectors per macroblock

Figure 9.3 Redefinition of the candidate predictors MV1, MV2 and MV3 for each luminance block in a macroblock

9.4.1.2 Overlapped motion compensation

Figure 9.4 Weighting values for prediction with motion vectors of the luminance blocks on top or bottom of the current luminance block, H1(i,j)

Figure 9.5 Weighting values for prediction with motion vectors of luminance blocks to the left or right of current luminance block, H2(i,j)

The creation of each interpolated (overlapped) pixel, p(i,j), in a luminance block is governed by:

p(i,j) = (q(i,j) × H0(i,j) + r(i,j) × H1(i,j) + s(i,j) × H2(i,j) + 4)/8   (9.3)

where q(i,j), r(i,j) and s(i,j) are the motion compensated pixels from the reference picture with the three motion vectors defined by:

Figure 9.6 Weighting values for prediction with motion vector of current block, H0(i,j)
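A single overlapped pixel of equation (9.3) can be sketched as below; the weights must be taken from the matrices H0, H1 and H2 of Figures 9.4–9.6, and the values used here are illustrative only:

```python
def obmc_pixel(q, r, s, h0, h1, h2):
    """One pixel of overlapped motion compensation: a weighted average of
    three motion compensated predictions. The three weights sum to 8, and
    adding 4 before the integer division rounds to the nearest integer."""
    return (q * h0 + r * h1 + s * h2 + 4) // 8

# Near the centre of a block the current-block weight dominates
# (illustrative weights h0=6, h1=1, h2=1, chosen to sum to 8):
print(obmc_pixel(100, 110, 90, 6, 1, 1))  # → 100
```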

9.4.2Importance of motion estimation


Figure 9.7 PSNR of Claire sequence coded at 256 kbit/s, with MPEG-1, H.261 and H.263

  • motion compensation on smaller block sizes of pixels results in smaller error signals than for the macroblock compensation used in the other codecs
  • overlapped motion compensation; by removing the blocking artefacts on the block boundaries, the prediction picture has a better quality, so reducing the error signal, and hence the number of significant DCT coefficients
  • efficient coding of DCT coefficients through three-dimensional (last, run, level) events
  • efficient representation of the combined macroblock type and block patterns.

Deblocking filter

Figure 9.8 Filtering of pixels at the block boundaries

Figure 9.9 d1 as a function of d

Motion estimation/compensation with spatial transforms

Figure 9.10 Mapping of a block to a quadrilateral

One of the methods for this purpose is the bilinear transform, defined as [11]:

Figure 9.11 Intensity interpolation of a nongrid pixel

(9.7)

which is simplified to

Figure 9.12 Reconstructed pictures using the BMST and BMA motion vectors individually

Figure 9.13 Frame-by-frame reconstruction of the pictures by BMST

Figure 9.14 Mesh-based motion compensation: (a) mesh, (b) motion compensated picture

Figure 9.15 Performance of spatial transform motion compensation

9.5Treatment of B-pictures

Figure 9.16 Prediction in PB frames mode

9.5.1.1 Macroblock type

One of the modes of the MCBPC is the intra macroblock type that has the following meaning:

  • the P-blocks are intra coded
  • the B-blocks are inter coded with prediction as for an inter block.

9.5.1.2 Motion vectors for B-pictures in PB frames

9.5.1.3 Prediction for a B-block in PB frames

Figure 9.17 Forward and bi-directional prediction for a B-block

9.5.2 Improved PB frames

Bidirectional prediction: in the bidirectional prediction mode, prediction uses the reference pictures before and after the BPB-picture.

Forward prediction: in the forward prediction mode, the vector data contained in MVDB are used for forward prediction from the previous reference picture (an intra or inter picture, or the P-picture part of a PB or improved PB frame).

Backward prediction: in the backward prediction mode, the prediction of the BPB-macroblock is identical to BREC of the normal PB frames mode. No motion vector data is used for the backward prediction.

9.5.3 Quantisation of B-pictures

Table 9.2 dbquant codes and the relation between QUANT and BQUANT

dbquant / BQUANT
00 / (5 × QUANT)/4
01 / (6 × QUANT)/4
10 / (7 × QUANT)/4
11 / (8 × QUANT)/4

(division with truncation; BQUANT is clipped to the legal quantiser range 1 to 31)
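Assuming Table 9.2 encodes the relation BQUANT = ((dbquant + 5) × QUANT)/4 with truncating division and clipping to the legal quantiser range, the derivation can be sketched as:

```python
def bquant(quant, dbquant):
    """Derive the B-picture quantiser BQUANT from the P-picture quantiser
    QUANT and the two-bit dbquant code (0..3), assuming the relation
    BQUANT = ((dbquant + 5) * QUANT) / 4 with truncation, clipped to 1..31."""
    b = ((dbquant + 5) * quant) // 4
    return max(1, min(31, b))

print(bquant(8, 0b00))  # → 10  (5 × 8 / 4)
print(bquant(8, 0b11))  # → 16  (8 × 8 / 4)
```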

9.6 Advanced variable length coding

9.6.1 Syntax-based arithmetic coding (more in H.264)

9.6.2 Reversible variable length coding

Figure 9.18 A reversible VLC

9.6.3 Resynchronisation markers

9.6.4 Advanced Intra/Inter VLC

9.6.4.1 Advanced intra coding

Scanning

Figure 9.19 Alternate scans: (a) horizontal, (b) vertical

Figure 9.20 Three neighbouring blocks in the DCT domain

The reconstruction for each coding mode is given by:

Mode 0: DC prediction only

(9.9)

Mode 1: DC and AC prediction from the block above

(9.10)

Mode 2: DC and AC prediction from the block to the left

(9.11)

9.6.4.2 Advanced inter coding with switching between two VLC tables

The outcome of this annex [25-S], in a different form, is also used in H.264. In H.264, rather than switching between two VLC tables, selection is made among 11 VLC tables, based on the context of the data to be coded. This is called context adaptive VLC (CAVLC) and is explained in greater detail in Chapter 11.

9.7 Protection against error

9.7.1 Forward error correction

To allow the video data and error correction parity information to be identified by the decoder, an error correction framing pattern is included. This pattern consists of multiframes of eight frames, each frame comprising 1 framing bit, 1 fill indicator (FI) bit, 492 bits of coded data and 18 parity bits. One bit from each of the eight frames provides the frame alignment pattern (S1S2S3S4S5S6S7S8) = (00011011), which helps the decoder to resynchronise itself after the occurrence of errors.

The error detection/correction code is a BCH (511, 493) [16]. The parity is calculated over a code of 493 bits, comprising the fill indicator (FI) bit and 492 bits of coded video data. The generator polynomial is given by:

g(x) = (x^9 + x^4 + 1)(x^9 + x^6 + x^4 + x^3 + 1)   (9.12)

The parity bits are calculated by dividing the 493 bits of video data (including the fill bit), left shifted by 18 bits, by this generator polynomial. Since the generator is a degree-18 polynomial, the remainder is an 18-bit binary number (which is why the data bits had to be shifted 18 bits to the left), and this remainder is used as the parity. For example, for the input data 01111...11 (493 bits), the resulting parity bits are 011011010100011011 (18 bits). The encoder appends these 18 bits to the 493 data bits, and the whole 511-bit block is sent to the receiver. This 511-bit block is exactly divisible by the generator polynomial, leaving a remainder of zero. The receiver can therefore perform a similar division, and any nonzero remainder is an indication of channel error. This is a very robust form of error detection, since bursts of errors can also be detected.
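The modulo-2 polynomial division described above can be sketched with a toy generator; the real code uses the degree-18 generator of equation (9.12) over 493-bit blocks, which would be unwieldy to print here:

```python
def fec_parity(data_bits, generator_bits):
    """Parity = remainder of modulo-2 (XOR) division of the data bits,
    left shifted by the parity length, by the generator polynomial.
    Polynomials are lists of 0/1 coefficients, most significant first."""
    n_parity = len(generator_bits) - 1
    rem = list(data_bits) + [0] * n_parity   # data shifted left
    for i in range(len(data_bits)):
        if rem[i]:
            for j, g in enumerate(generator_bits):
                rem[i + j] ^= g
    return rem[-n_parity:]

def syndrome(codeword_bits, generator_bits):
    """Receiver side: divide the received block by the generator; any
    nonzero remainder indicates a channel error."""
    rem = list(codeword_bits)
    for i in range(len(codeword_bits) - len(generator_bits) + 1):
        if rem[i]:
            for j, g in enumerate(generator_bits):
                rem[i + j] ^= g
    return rem[-(len(generator_bits) - 1):]

data = [1, 1, 0, 1]
gen = [1, 0, 1, 1]                       # x^3 + x + 1, a toy generator
parity = fec_parity(data, gen)
print(parity)                            # → [0, 0, 1]
print(syndrome(data + parity, gen))      # → [0, 0, 0]  (error free)
```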

9.7.2 Back channel

T: Transform; Q: Quantiser; CC: Coding control

P: Picture memory with motion compensated variable delay

AP: Additional picture memory; v: Motion vector

p: Flag for INTRA/INTER; t: Flag for transmitted or not

qz: Quantisation indication; q: Quantisation index for DCT coefficients

Figure 9.21 An encoder with multiple reference pictures

A positive acknowledgment (ACK) or a negative acknowledgment (NACK) is returned depending on whether the decoder successfully decodes a GOB.

An improved version of this annex [25-N] is used in H.264 under the concept of multiple reference frames. It is used both to improve compression efficiency, by searching for better motion vectors among several frames, and to improve error robustness, as discussed here. Multiple reference motion compensation and error-free selection of the reference frame will be studied in some detail in Chapter 11.

9.7.3 Data partitioning

Figure 9.22 Effects of errors on (a) with and (b) without data partitioning

Table 9.3 VLC and RVLC bits of MCBPC (for I-pictures)

Index / MB type / CBPC / Normal VLC / RVLC
0 / 3 (Intra) / 00 / 1 / 1
1 / 3 / 01 / 001 / 010
2 / 3 / 10 / 010 / 0110
3 / 3 / 11 / 011 / 01110
4 / 4 (Intra+Q) / 00 / 0001 / 00100
5 / 4 / 01 / 000001 / 011110
6 / 4 / 10 / 000010 / 001100
7 / 4 / 11 / 000011 / 0111110
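The reversibility of the RVLC column of Table 9.3 can be verified directly: every codeword is a palindrome, and the set is prefix free, so the bit stream is uniquely decodable in both the forward and backward directions. A small check in Python:

```python
# RVLC codewords of MCBPC from Table 9.3
rvlc = ["1", "010", "0110", "01110", "00100", "011110", "001100", "0111110"]

def prefix_free(codes):
    """True if no codeword is a proper prefix of another, i.e. the code is
    uniquely decodable when read forwards."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

print(all(c == c[::-1] for c in rvlc))   # → True (palindromes, so suffix free too)
print(prefix_free(rvlc))                 # → True
```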

Table 9.4 Number of bits per slice for data partitioning

Data Partitioning
Slice No / Slice header / MV / Coeff / SUM / Normal
1 / 52 / 30 / 211 / 293 / 269
2 / 63 / 34 / 506 / 603 / 571
3 / 45 / 42 / 748 / 835 / 803
4 / 48 / 42 / 1025 / 1115 / 1083
5 / 45 / 71 / 959 / 1075 / 1043
6 / 41 / 46 / 844 / 931 / 899
7 / 48 / 34 / 425 / 507 / 475
8 / 51 / 32 / 408 / 491 / 459
9 / 38 / 24 / 221 / 283 / 251

Figure 9.23 Error in a bit stream


(9.13)

where N is the total number of pixels at the upper and lower borders of the MB.

Figure 9.24 Pixels at the boundary of (a) a macroblock and (b) four blocks

Figure 9.25 Step-by-step decoding and skipping of bits in the bit stream

9.7.5 Error concealment

9.7.5.1 Intraframe error concealment

Figure 9.26 An example of Intraframe error concealment

9.7.5.2 Interframe error concealment

Figure 9.27 A grid of macroblocks in the current and previous frame

Zero mv

Previous mv

Top mv

Mean mv

mv = (1/N) Σi mvi,  i = 1, ..., N   (9.14)

Majority mv

; N 6(9.15)

Vector median mv

the jth motion vector, mvj, is the median if:

Σi ||mvj − mvi|| ≤ Σi ||mvk − mvi||,  for all k = 1, ..., N   (9.16)
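Using the Euclidean norm, the vector median criterion of equation (9.16) amounts to picking the candidate with the smallest summed distance to all other candidates; a minimal sketch (function name hypothetical):

```python
import math

def vector_median(mvs):
    """Vector median: the candidate vector whose summed Euclidean distance
    to all the other candidates is smallest (cf. equation (9.16))."""
    return min(mvs, key=lambda v: sum(math.dist(v, w) for w in mvs))

# Candidate vectors collected from the neighbours of a lost macroblock;
# the outlier (10, 8) cannot win, and (2, 1) lies closest to the rest:
print(vector_median([(1, 1), (2, 1), (10, 8), (1, 2)]))  # → (2, 1)
```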

Table 9.5 PSNR [dB] of various error concealment methods at 5 frames/s.

Type / 64 kbit/s, 5 frames/s, QCIF
Sequence / Seq-1 / Seq-2 / Seq-3 / Seq-4
Zero / 17.04 / 18.15 / 14.54 / 13.08
Previous / 17.28 / 18.34 / 14.53 / 13.48
Top / 19.27 / 21.08 / 17.04 / 16.25
Average / 19.18 / 21.74 / 17.51 / 16.18
Majority / 19.35 / 21.83 / 17.89 / 16.61
Median / 19.87 / 22.52 / 18.29 / 16.89
No Errors / 22.57 / 26.94 / 20.85 / 19.88

Table 9.6 PSNR [dB] of the various error concealment methods at 12.5 frames/s.

Type / 64 kbit/s, 12.5 frames/s, QCIF
Sequence / Seq-1 / Seq-2 / Seq-3 / Seq-4
Zero / 20.62 / 22.11 / 18.64 / 16.62
Previous / 22.49 / 22.19 / 18.19 / 16.53
Top / 22.97 / 25.16 / 20.97 / 20.04
Average / 22.92 / 25.99 / 21.24 / 20.08
Majority / 23.33 / 26.32 / 21.57 / 20.36
Median / 24.36 / 26.72 / 22.16 / 20.81
No Errors / 26.12 / 29.69 / 23.93 / 23.04

Figure 9.28 An erroneous picture along with its error-concealed version

Bidirectional mv

Figure 9.29 A group of alternate P and B pictures

To estimate a missing motion vector for a P-picture, say P31, the available motion vectors at the same spatial coordinates in the B-pictures can be used, with the following substitutions:

  • if only B23 is available, then P31 = 2 × B23
  • if only B21 is available, then P31 = −2 × B21
  • if both B23 and B21 are available, then P31 = B23 − B21
  • if none of them is available, then set P31 = 0.

To estimate a missing motion vector of a B-picture, simply divide that of the P-picture by two: B23 = ½P31 or B21 = −½P31.
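These substitutions can be collected into a small helper, operating on one vector component (function name hypothetical):

```python
def conceal_p_vector(b23=None, b21=None):
    """Estimate one component of a lost P-picture motion vector (e.g. P31)
    from the co-sited B-picture vector components, using the substitutions
    listed above; None marks an unavailable vector."""
    if b23 is not None and b21 is not None:
        return b23 - b21      # both B-vectors available
    if b23 is not None:
        return 2 * b23        # only B23 available
    if b21 is not None:
        return -2 * b21       # only B21 available
    return 0                  # none available

print(conceal_p_vector(b23=3, b21=-2))  # → 5
print(conceal_p_vector(b23=3))          # → 6
```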

9.7.5.3 Loss concealment

9.7.5.4 Selection of the best estimated motion vector

Scalability

Figure 8.11 Block diagram of a two-layer SNR scalable coder
Figure 8.12 A DCT based base layer encoder
Figure 8.13 A two-layer SNR scalable encoder with drift at the base layer
Figure 8.14 A three layer drift free SNR scalable encoder
Figure 8.15 A block diagram of a three-layer SNR decoder

Figure 8.16 Picture quality of the SNR scalable encoder: base layer at 2 Mbit/s, base + enhancement at 8 Mbit/s

Spatial scalability

Figure 8.17 Block diagram of a two-layer spatial scalable encoder
Figure 8.19 Details of spatial scalability encoder


Figure 8.20 (a) Base layer picture of a spatial scalable encoder at 2 Mbit/s, and (b) its enlarged version

Temporal scalability
Figure 8.21 A block diagram of a two-layer temporal scalable encoder

Hybrid scalability

Figure 8.25 SNR, spatial and temporal hybrid scalability encoder

Overhead due to scalability

Figure 8.26 Increase in bit rate due to scalability

Table 8.4 Applications of SNR scalability

Base layer / Enhancement layer / Application
ITU-R 601 / same resolution and format as lower layer / two-quality service for standard TV
high definition / same resolution and format as lower layer / two-quality service for HDTV
4:2:0 high definition / 4:2:2 chroma simulcast / video production/distribution

Table 8.5 Applications of spatial scalability

Base / Enhancement / Application
progressive (30 Hz) / progressive (30 Hz) / CIF/QCIF compatibility or scalability
interlace (30 Hz) / interlace (30 Hz) / HDTV/SDTV scalability
progressive (30 Hz) / interlace (30 Hz) / ISO/IEC 11172-2 compatibility with this specification
interlace (30 Hz) / progressive (60 Hz) / migration to HR progressive HDTV

Table 8.6 Applications of temporal scalability

Base / Enhancement / Higher / Application
progressive (30 Hz) / progressive (30 Hz) / progressive (60 Hz) / migration to HR progressive HDTV
interlace (30 Hz) / interlace (30 Hz) / progressive (60 Hz) / migration to HR progressive HDTV

Scalability in H.263

Figure 9.30 B-picture prediction dependency in the temporal scalability

SNR scalability

Figure 9.31 Prediction flow in SNR scalability

Spatial scalability

Figure 9.32 Prediction flow in spatial scalability

Multilayer scalability

Figure 9.33 Positions of the base and enhancement-layer pictures in a multilayer scalable bit stream

Transmission order of pictures

Figure 9.34 Example of picture transmission order

Numbers next to each picture indicate the bit stream order, separated by commas for the two alternatives.