
Motion Estimation for Coding

• Motion-Compensated Prediction
• Allocation
• Motion Models
• Efficiency of Techniques

T. Wiegand / B. Girod: EE398A Image and Video Compression, Motion estimation

Hybrid Video Encoder

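[Figure: hybrid video encoder block diagram. Signals: input s[x, y, t], prediction ŝ[x, y, t], residual u[x, y, t], decoded residual u'[x, y, t], reconstruction s'[x, y, t]. Blocks: coder control (control data), intra-frame DCT coder (DCT coefficients), intra-frame decoder, motion-compensated predictor with intra/inter switch (0 for intra), motion estimator (motion data).]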

Motion-Compensated Prediction

[Figure: previous frame and current frame along the time axis t, with a stationary background and a moving object; the object is displaced by the vector (d_x, d_y) between the two frames.]

Prediction for the luminance signal s[x, y, t] within the moving object:

$$\hat{s}[x, y, t] = s(x - d_x,\; y - d_y,\; t - \Delta t)$$
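The prediction rule can be illustrated with a minimal sketch for integer displacements. It assumes the sign convention ŝ[x, y, t] = s(x - d_x, y - d_y, t - Δt), a frame stored as a list of rows, and a displaced block that lies fully inside the previous frame; the function name `predict_block` and the toy frame are hypothetical.

```python
def predict_block(prev, x0, y0, bw, bh, dx, dy):
    """Return the bh x bw prediction for the block whose top-left corner is (x0, y0).

    Assumed convention: the predicted sample at (x, y) is read from the
    previous frame at (x - dx, y - dy). No boundary handling is done.
    """
    return [[prev[y - dy][x - dx] for x in range(x0, x0 + bw)]
            for y in range(y0, y0 + bh)]


# Toy 6x6 previous frame with a bright 2x2 object at columns 1-2, rows 1-2.
prev = [[0] * 6 for _ in range(6)]
for y in (1, 2):
    for x in (1, 2):
        prev[y][x] = 255

# The object is assumed to have moved by (dx, dy) = (2, 1) in the current frame,
# so the block at columns 3-4, rows 2-3 is predicted from the object itself.
pred = predict_block(prev, x0=3, y0=2, bw=2, bh=2, dx=2, dy=1)
print(pred)  # [[255, 255], [255, 255]]
```

With a zero displacement the same block would be predicted from the background instead, which is what frame difference coding without motion compensation amounts to.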

Motion-Compensated Prediction: Example

[Figure: six panels. Frame 1 s[x, y, t-1] (previous); frame 2 s[x, y, t] (current); partition of frame 2 into blocks (schematic); referenced blocks in frame 1; frame 2 with displacement vectors; difference u[x, y, t] between motion-compensated prediction and current frame.]

Parameters: size of blocks, accuracy of motion vectors.

Motion Models

• Motion in 3-D space corresponds to displacements in the image plane
• Motion compensation in the image plane is conducted to provide a prediction signal for efficient video compression
• Efficient motion-compensated prediction often uses side information to transmit the displacements
• Displacements must be efficiently represented for video compression
• Motion models relate 3-D motion to displacements assuming reasonable restrictions of the motion and objects in the 3-D world

Motion model:

$$d_x = x' - x = f_x(\mathbf{a}, x, y), \qquad d_y = y' - y = f_y(\mathbf{b}, x, y)$$

x, y: location in previous image
x', y': location in current image
a, b: vectors of motion coefficients
d_x, d_y: displacements

Representation of Video Signal

Decoded video signal is given as

$$s[x, y, t] = \hat{s}[x, y, t] + u[x, y, t]$$

Motion-compensated prediction signal:

$$\hat{s}[x, y, t] = s(x - d_x,\; y - d_y,\; t - \Delta t), \qquad d_x = \sum_{i=0}^{N-1} a_i \varphi_i(x, y), \quad d_y = \sum_{i=0}^{N-1} b_i \varphi_i(x, y)$$

Prediction residual signal:

$$u(x, y) = \sum_{j=0}^{M-1} c_j \psi_j(x, y)$$

Transmitted motion parameters: $R_m = f(\mathbf{a}, \mathbf{b})$, $\mathbf{a} = (a_0, \ldots)$, $\mathbf{b} = (b_0, \ldots)$
Transmitted residual parameters: $R_u = f(\mathbf{c})$, $\mathbf{c} = (c_0, \ldots)$

Rate-Constrained Motion Estimation

[Figure: bit-rate plotted over the displacement error variance, with one curve for the motion vector rate R_m and one for the prediction error rate R_u.]

• Optimum trade-off: $\dfrac{dD}{dR_m} = \dfrac{dD}{dR_u}$, with $R = R_m + R_u$
• Displacement error variance can be influenced via
  – Block-size, quantization of motion parameters
  – Choice of motion model
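The equal-slope condition follows from a short Lagrangian argument. The sketch below assumes that D is a differentiable, convex function of the two rates and that the total rate R = R_m + R_u is fixed; the multiplier λ is introduced here only for the derivation.

```latex
\begin{aligned}
J(R_m, R_u) &= D(R_m, R_u) + \lambda\,(R_m + R_u) \\
\frac{\partial J}{\partial R_m} = 0 \;&\Rightarrow\; \frac{\partial D}{\partial R_m} = -\lambda \\
\frac{\partial J}{\partial R_u} = 0 \;&\Rightarrow\; \frac{\partial D}{\partial R_u} = -\lambda \\
&\Rightarrow\; \frac{\partial D}{\partial R_m} = \frac{\partial D}{\partial R_u}
\end{aligned}
```

In words: at the optimum, one more bit spent on motion parameters reduces the distortion by the same amount as one more bit spent on the prediction error.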

Lagrangian Optimization in Video Coding

• A number of interactions are often neglected
  – Temporal dependency due to DPCM loop
  – Spatial dependency of coding decisions
  – Conditional entropy coding
• Rate-Constrained Motion Estimation [Sullivan, Baker 1991]:

$$\min\; D_m + \lambda_m R_m$$

  D_m: distortion after motion compensation; λ_m: Lagrange parameter; R_m: number of bits for motion vector

• Rate-Constrained Mode Decision [Wiegand, et al. 1996]:

$$\min\; D + \lambda R$$

  D: distortion after reconstruction; λ: Lagrange parameter; R: number of bits for coding mode
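A minimal sketch of rate-constrained motion vector selection, choosing the candidate that minimizes D + λR. The rate model is an assumption made for illustration: each vector component is charged the length of a signed Exp-Golomb code word, without any vector prediction. The candidate list and its distortion values are made up.

```python
def se_golomb_bits(v):
    """Length in bits of a signed Exp-Golomb code word for integer v (assumed rate model)."""
    k = 2 * v - 1 if v > 0 else -2 * v   # map signed value to a non-negative code number
    return 2 * (k + 1).bit_length() - 1


def mv_rate(dx, dy):
    """Assumed motion vector rate: both components coded independently."""
    return se_golomb_bits(dx) + se_golomb_bits(dy)


def choose_vector(candidates, lam):
    """candidates: list of (dx, dy, distortion). Returns the minimizer of D + lam * R."""
    return min(candidates, key=lambda c: c[2] + lam * mv_rate(c[0], c[1]))


# Hypothetical candidates: (dx, dy, distortion after motion compensation).
candidates = [(0, 0, 120), (4, -3, 100), (1, 0, 110)]

print(choose_vector(candidates, lam=0))   # (4, -3, 100): distortion only
print(choose_vector(candidates, lam=4))   # (1, 0, 110): moderate rate penalty
print(choose_vector(candidates, lam=20))  # (0, 0, 120): rate dominates
```

As λ grows, the choice moves from the lowest-distortion vector toward the cheapest vector, which is the trade-off the Lagrangian cost expresses.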

Motion Models

N 1 N 1 dx  ai i (x, y), d y  bi i (x, y) i0 i0

. Translational motion model dx  a0 , d y  b0 . 4-Parameter motion model: translation, zoom (isotropic Scaling), rotation in image plane d x a0a1x a2y

d y b0 a2x a1y

. Affine motion model: d x a0a1x a2y

d y b0 b1x b2 y

2 2 . Parabolic motion model dx a0a1x a3y a2x a6y a5xy 2 2 dx b0 b1x b3y b2 x b6 y b5 xy
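The models differ only in which basis functions of (x, y) contribute to the displacement. The sketch below evaluates the affine model and the 4-parameter model at one sample position. It assumes that the 4-parameter model is the affine model with b_1 = -a_2 and b_2 = a_1; the function names are hypothetical.

```python
def affine_displacement(a, b, x, y):
    """Affine model with a = (a0, a1, a2) and b = (b0, b1, b2)."""
    dx = a[0] + a[1] * x + a[2] * y
    dy = b[0] + b[1] * x + b[2] * y
    return dx, dy


def four_parameter_displacement(a0, b0, a1, a2, x, y):
    """Translation, isotropic zoom and rotation, written as a constrained affine model.

    Assumed constraint: b1 = -a2 and b2 = a1.
    """
    return affine_displacement((a0, a1, a2), (b0, -a2, a1), x, y)


# Pure translation: the displacement is the same at every position.
print(affine_displacement((2, 0, 0), (-1, 0, 0), x=7, y=9))          # (2, -1)

# Pure zoom (a1 only): the displacement grows with the distance from the origin.
print(four_parameter_displacement(0, 0, 0.5, 0, x=10, y=20))         # (5.0, 10.0)

# Pure rotation term (a2 only): the displacement is perpendicular to (x, y).
print(four_parameter_displacement(0, 0, 0, 0.5, x=10, y=20))         # (10.0, -5.0)
```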

Impact of the Affine Parameters

[Figure: displacement vector fields plotted over x and y from -150 to 150 for d_x = a_0 (translation), d_x = a_1 x (scaling), and d_x = a_3 y (shearing).]

Impact of the Parabolic Parameters

[Figure: displacement vector fields plotted over x and y from -150 to 150 for d_x = a_2 x^2, d_x = a_6 y^2, and d_x = a_5 xy.]

Differential Motion Estimation

• Assume small displacements d_x, d_y:

$$u(x, y, t) = s(x, y, t) - \hat{s}(x, y, t, d_x, d_y) \approx s(x, y, t) - s(x, y, t - \Delta t) + \frac{\partial s(x, y, t - \Delta t)}{\partial x}\, d_x + \frac{\partial s(x, y, t - \Delta t)}{\partial y}\, d_y$$

  u(x, y, t): displaced frame difference; ∂s/∂x, ∂s/∂y: horizontal and vertical gradient of image signal s

• Aperture problem: several observations required
• Inaccurate for displacements > 0.5 pel → multigrid methods, iteration
• Minimize

$$\min \sum_{y=1}^{B_y} \sum_{x=1}^{B_x} u^2(x, y, t)$$
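A minimal sketch of the differential method for one block: the linearized displaced frame difference is minimized in the least-squares sense, which leads to a 2x2 system of normal equations. It assumes the convention ŝ = s(x - d_x, y - d_y, t - Δt), central differences for the gradients of the previous frame, and a single non-iterated step; the function name and the synthetic test image p(x, y) = x·y are hypothetical.

```python
def estimate_displacement(cur, prev):
    """Least-squares estimate of (dx, dy) for a block from the first-order model
    u = (cur - prev) + gx * dx + gy * dy, summed over the block interior."""
    h, w = len(cur), len(cur[0])
    sxx = sxy = syy = bx = by = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (prev[y][x + 1] - prev[y][x - 1]) / 2.0   # horizontal gradient
            gy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0   # vertical gradient
            fd = cur[y][x] - prev[y][x]                    # frame difference
            sxx += gx * gx
            sxy += gx * gy
            syy += gy * gy
            bx += gx * fd
            by += gy * fd
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        # Gradients all point in one direction: the aperture problem.
        raise ValueError("normal equations are singular")
    # Solve [[sxx, sxy], [sxy, syy]] [dx, dy]^T = [-bx, -by]^T by Cramer's rule.
    dx = (-bx * syy + by * sxy) / det
    dy = (-by * sxx + bx * sxy) / det
    return dx, dy


TRUE_DX, TRUE_DY = 0.3, -0.2


def p(x, y):
    """Synthetic image with gradients that vary in direction across the block."""
    return x * y


prev = [[p(x, y) for x in range(8)] for y in range(8)]
cur = [[p(x - TRUE_DX, y - TRUE_DY) for x in range(8)] for y in range(8)]

dx, dy = estimate_displacement(cur, prev)
print(round(dx, 2), round(dy, 2))  # close to the true (0.3, -0.2)
```

The small remaining error comes from the second-order term that the linearization drops; for displacements beyond about half a pel this term grows, which is why iteration or multigrid refinement is used.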

Gradient-Based Affine Refinement

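• Displacement vector field is represented as

$$d_x = a_1 + a_2 x + a_3 y, \qquad d_y = b_1 + b_2 x + b_3 y$$

• Combination

$$u(x, y, t) \approx s(x, y, t) - s(x, y, t - \Delta t) + \frac{\partial s}{\partial x}(a_1 + a_2 x + a_3 y) + \frac{\partial s}{\partial y}(b_1 + b_2 x + b_3 y)$$

  yields a system of linear equations:

$$u = s(x, y, t) - s(x, y, t - \Delta t) + \left( \frac{\partial s}{\partial x},\; \frac{\partial s}{\partial x} x,\; \frac{\partial s}{\partial x} y,\; \frac{\partial s}{\partial y},\; \frac{\partial s}{\partial y} x,\; \frac{\partial s}{\partial y} y \right) \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ b_1 \\ b_2 \\ b_3 \end{pmatrix}$$

• System can be solved using, e.g., pseudo-inverse, by minimizing

$$\arg\min_{a_1, a_2, a_3, b_1, b_2, b_3} \sum_{y=1}^{B_y} \sum_{x=1}^{B_x} u^2(x, y, t)$$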

Block-matching Algorithm

• Subdivide current frame into blocks.
• Find one displacement vector for each block.
• Within a search range, find a “best match” that minimizes an error measure.
• Intelligent search strategies can reduce computation.

[Figure: a block of the current frame S_k and its search range in the previous frame S_{k-1}.]

Block-matching Algorithm

[Figure: previous frame and current frame.]

A block of pixels is selected as a measurement window. The measurement window is compared with a shifted block of pixels in the other image to determine the best match.

Block-matching Algorithm

[Figure: previous frame and current frame.]

... process repeated for another block.

Error Measures for Block-matching

• Mean squared error (sum of squared errors)

$$\mathrm{SSD}(d_x, d_y) = \sum_{y=1}^{B_y} \sum_{x=1}^{B_x} \left[ s(x, y, t) - s(x - d_x,\; y - d_y,\; t - \Delta t) \right]^2$$

• Sum of absolute differences

$$\mathrm{SAD}(d_x, d_y) = \sum_{y=1}^{B_y} \sum_{x=1}^{B_x} \left| s(x, y, t) - s(x - d_x,\; y - d_y,\; t - \Delta t) \right|$$

• Approximately same performance → SAD less complex for some architectures
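Both error measures can be computed in one pass over the block. The sketch below assumes integer displacements, frames stored as lists of rows, the convention that the reference sample is read at (x - d_x, y - d_y), and no boundary handling; the function name is hypothetical.

```python
def block_errors(cur, prev, x0, y0, bw, bh, dx, dy):
    """Return (SSD, SAD) between the current block at (x0, y0) and the
    block of the previous frame displaced by (dx, dy)."""
    ssd = sad = 0
    for y in range(y0, y0 + bh):
        for x in range(x0, x0 + bw):
            diff = cur[y][x] - prev[y - dy][x - dx]
            ssd += diff * diff   # squared error emphasizes large differences
            sad += abs(diff)     # absolute error needs no multiplication
    return ssd, sad


cur = [[1, 2],
       [3, 4]]
prev = [[1, 4],
        [0, 4]]

# Sample differences are 0, -2, 3, 0.
print(block_errors(cur, prev, 0, 0, 2, 2, 0, 0))  # (13, 5)
```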

Block-matching: Search Strategies

Full search
• All possible displacements within the search range are compared.
• Computationally expensive
• Highly regular, parallelizable

[Figure: grid of candidate displacements (d_x, d_y) covering the search range.]
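A minimal full-search sketch with SAD as the error measure. It assumes integer-pel displacements, square blocks, the convention that the reference sample is read at (x - d_x, y - d_y), and that candidates pointing outside the previous frame are skipped; the function names and the synthetic test frames are hypothetical.

```python
def sad(cur, prev, x0, y0, bsize, dx, dy):
    """Sum of absolute differences for one candidate displacement."""
    return sum(abs(cur[y][x] - prev[y - dy][x - dx])
               for y in range(y0, y0 + bsize)
               for x in range(x0, x0 + bsize))


def full_search(cur, prev, x0, y0, bsize, search_range):
    """Test every (dx, dy) in [-search_range, search_range]^2; return (cost, dx, dy)."""
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x0 - dx, y0 - dy   # top-left corner of the reference block
            if rx < 0 or ry < 0 or rx + bsize > w or ry + bsize > h:
                continue                # candidate leaves the previous frame
            cost = sad(cur, prev, x0, y0, bsize, dx, dy)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic frames: the current frame is the previous frame moved by (2, 1).
W = H = 16
prev = [[x * x + 10 * y * y for x in range(W)] for y in range(H)]
cur = [[prev[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(W)]
       for y in range(H)]

print(full_search(cur, prev, x0=6, y0=6, bsize=4, search_range=4))  # (0, 2, 1)
```

The two nested candidate loops are independent of each other, which is the regularity that makes full search easy to parallelize despite its cost of (2·range + 1)² block comparisons.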

Speedup of Block-matching

Complexity of block-matching: evaluation of complex error measure for many candidates

Reduce complexity of the error measure
• Approximations
• Early terminations
• Exclude candidates

Reduce number of candidates
• Cover likely search areas
• Unequal steps between searched candidates

Combine both approaches: choose a starting point and search order that maximize the likelihood of efficient approximations, early terminations and exclusion of candidates

Approximations

• Stop search, if match is “good enough” (SSD, SAD < threshold or J = D + λR is small enough)
• Practical method in video conferencing for static background: test zero-vector first and stop search if match is good enough

Early Termination

• Partial distortion measure: $D_K(d_x, d_y)$, $K = N, \ldots, B_y$

$$D_K(d_x, d_y) = \sum_{y=1}^{K} \sum_{x=1}^{B_x} \left| s(x, y, t) - s(x - d_x,\; y - d_y,\; t - \Delta t) \right|^p$$

• Previously computed minimum: $J_{\min}$
• Early termination without loss: stop, if

$$D_K(d_x, d_y) + \lambda_m R(d_x, d_y) \ge J_{\min}$$

• Early termination with possible loss, but higher speedup: stop, if

$$D_K(d_x, d_y) + \lambda_m R(d_x, d_y) \ge \alpha(K)\, J_{\min}, \qquad \text{e.g., } \alpha(K) = (K / B_y)^{1/2}$$
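A minimal sketch of early termination with a row-wise partial SAD (p = 1). It assumes that the test is made after every block row, that the rate term λ_m·R is passed in as a precomputed number `rate_cost`, and that the lossy variant scales the threshold by (K / B_y)^(1/2); the function name and the toy frames are hypothetical.

```python
def sad_with_early_exit(cur, prev, x0, y0, bsize, dx, dy,
                        j_min, rate_cost=0.0, lossy=False):
    """Accumulate the SAD row by row and give up as soon as the partial cost
    reaches the threshold. Returns the full cost, or None if terminated."""
    partial = 0
    for k in range(1, bsize + 1):                 # k = number of rows summed so far
        y = y0 + k - 1
        for x in range(x0, x0 + bsize):
            partial += abs(cur[y][x] - prev[y - dy][x - dx])
        threshold = j_min
        if lossy:
            threshold *= (k / bsize) ** 0.5       # assumed scaling for the lossy test
        if partial + rate_cost >= threshold:
            return None                           # candidate rejected early
    return partial + rate_cost


cur = [[10, 10],
       [10, 10]]
prev = [[0, 0],
        [10, 10]]

# After the first row the partial SAD is already 20.
print(sad_with_early_exit(cur, prev, 0, 0, 2, 0, 0, j_min=15))              # None
print(sad_with_early_exit(cur, prev, 0, 0, 2, 0, 0, j_min=25))              # 20.0
print(sad_with_early_exit(cur, prev, 0, 0, 2, 0, 0, j_min=25, lossy=True))  # None
```

The last call shows the possible loss: the candidate has a total cost of 20, below the current minimum of 25, but the scaled threshold rejects it after the first row.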

Block Comparison Speed-Ups

• Triangle inequality for samples in block B (here SAD)

$$\sum_{B} |S_k - S_{k-1}| \;\ge\; \left| \sum_{B} |S_k| - \sum_{B} |S_{k-1}| \right|$$

• Strategy:
  1. Compute partial sums for blocks in current and previous frame
  2. Compare blocks based on partial sums
  3. Omit full block comparison, if partial sums indicate worse error measure than previous best result

• Block partitioning $B = \bigcup_n B_n$, $B_n \cap B_m = \emptyset$ for $n \ne m$

$$\sum_{B} |S_k - S_{k-1}| \;\ge\; \sum_{n} \left| \sum_{B_n} (S_k - S_{k-1}) \right|$$

• Choose blocks $B_n$ to be nested (4x4, 8x8, 16x16, …) – efficient for modern variable block size video coding
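A minimal sketch of the strategy with a single partial sum per block: a candidate is skipped when the difference of the block sums is already at least as large as the best SAD found so far. It assumes integer displacements, square blocks, and the convention that the reference sample is read at (x - d_x, y - d_y). The block sums are recomputed naively here; a practical encoder would typically precompute them. All names and the synthetic frames are hypothetical.

```python
def block_sum(frame, x0, y0, bsize):
    return sum(sum(row[x0:x0 + bsize]) for row in frame[y0:y0 + bsize])


def sad(cur, prev, x0, y0, bsize, dx, dy):
    return sum(abs(cur[y][x] - prev[y - dy][x - dx])
               for y in range(y0, y0 + bsize)
               for x in range(x0, x0 + bsize))


def search_with_sum_bound(cur, prev, x0, y0, bsize, search_range):
    """Full search that skips candidates whose lower bound cannot beat the best SAD.
    Returns ((cost, dx, dy), number_of_skipped_candidates)."""
    h, w = len(prev), len(prev[0])
    cur_sum = block_sum(cur, x0, y0, bsize)
    best, skipped = None, 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x0 - dx, y0 - dy
            if rx < 0 or ry < 0 or rx + bsize > w or ry + bsize > h:
                continue
            bound = abs(cur_sum - block_sum(prev, rx, ry, bsize))  # SAD >= bound
            if best is not None and bound >= best[0]:
                skipped += 1            # full comparison cannot improve the result
                continue
            cost = sad(cur, prev, x0, y0, bsize, dx, dy)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best, skipped


W = H = 16
prev = [[x * x + 10 * y * y for x in range(W)] for y in range(H)]
cur = [[prev[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(W)]
       for y in range(H)]

best, skipped = search_with_sum_bound(cur, prev, 6, 6, 4, 4)
print(best)         # (0, 2, 1), the same result as an exhaustive search
print(skipped > 0)  # True: some full block comparisons were avoided
```

Because a candidate is only skipped when its lower bound is not smaller than the current best, the result is identical to that of the exhaustive search.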

Blockmatching: Search Order I

2D logarithmic search [Jain + Jain, 1981]

• Iterative comparison of error measure values at 5 neighboring points
• Logarithmic refinement of the search pattern if
  – best match is in the center of the 5-point pattern
  – center of search pattern touches the border of the search range
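A simplified sketch of the 5-point logarithmic search described above. It is not a faithful reproduction of the published algorithm: the step size is simply halved on refinement, the search ends when a refinement is requested at step size 1, and any final stage of the original method is omitted. The error measure is passed in as a function `cost(dx, dy)`; the convex toy cost used in the example is an assumption chosen so that the result can be checked by hand.

```python
def log_search(cost, search_range):
    """Return the (dx, dy) found by a 5-point pattern with logarithmic refinement."""
    cx = cy = 0
    step = max(1, search_range // 2)
    while True:
        pattern = [(cx, cy), (cx + step, cy), (cx - step, cy),
                   (cx, cy + step), (cx, cy - step)]
        pattern = [(x, y) for x, y in pattern
                   if abs(x) <= search_range and abs(y) <= search_range]
        # min() keeps the first minimum, so ties are resolved in favor of the center.
        best = min(pattern, key=lambda p: cost(*p))
        at_border = max(abs(cx), abs(cy)) == search_range
        if best == (cx, cy) or at_border:
            if step == 1:
                return best
            step //= 2                  # refine the pattern
        cx, cy = best                   # re-center on the best point


calls = []


def toy_cost(dx, dy):
    """Convex stand-in for SAD with its minimum at (5, -3)."""
    calls.append((dx, dy))
    return abs(dx - 5) + abs(dy + 3)


print(log_search(toy_cost, search_range=7))  # (5, -3)
print(len(calls) < 15 * 15)                  # True: far fewer tests than full search
```

On real image data the error surface is not guaranteed to be unimodal, so a search of this kind can end in a local minimum that full search would avoid.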

Blockmatching: Search Order II

Diamond search [Li, Zeng, Liou, 1994] [Zhu, Ma, 1997]

[Figure: large and small diamond search patterns in the (d_x, d_y) plane.]

• Start with large diamond pattern at (0,0)
• If best match lies in the center of large diamond, proceed with small diamond
• If best match does not lie in the center of large diamond, center large diamond pattern at new best match
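The three rules above can be sketched as follows. The point sets are assumptions: a 9-point large diamond with offsets up to 2 and a 5-point small diamond with offsets of 1. The error measure is passed in as a function `cost(dx, dy)`, and the convex toy cost is used only so that the result can be verified by hand.

```python
LARGE_DIAMOND = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2),
                 (1, 1), (1, -1), (-1, 1), (-1, -1)]
SMALL_DIAMOND = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def diamond_search(cost, search_range):
    """Return the (dx, dy) found by large-diamond steps followed by one small diamond."""
    cx = cy = 0

    def best_of(pattern):
        points = [(cx + ox, cy + oy) for ox, oy in pattern
                  if abs(cx + ox) <= search_range and abs(cy + oy) <= search_range]
        # The center comes first in each pattern, so ties keep the center.
        return min(points, key=lambda p: cost(*p))

    while True:
        best = best_of(LARGE_DIAMOND)
        if best == (cx, cy):
            break                       # best match is in the center
        cx, cy = best                   # re-center the large diamond
    return best_of(SMALL_DIAMOND)       # final refinement


def toy_cost(dx, dy):
    """Convex stand-in for SAD with its minimum at (5, -3)."""
    return abs(dx - 5) + abs(dy + 3)


print(diamond_search(toy_cost, search_range=7))  # (5, -3)
```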

Hierarchical blockmatching

[Figure: hierarchical block matching. The current frame and the previous frame are filtered and subsampled in two stages; block matching is carried out at each resolution level and yields the displacement vector field.]

Sub-pel Translational Motion

• Motion vectors are often not restricted to only point into the integer-pel grid of the reference frame
• Typical sub-pel accuracies: half-pel and quarter-pel
• Sub-pel positions are often estimated by “refinement”

[Figure: candidate positions in the (d_x, d_y) plane, marked 1 for the first search step and 2 for the refinement step around the best match.]

Sub-pel Motion Compensation

• Sub-pel positions are obtained via interpolation
• Example: half-pel accurate displacements

[Figure: sample grid with an example displacement (d_x, d_y) = (4.5, 4.5) that points to a half-pel position.]

Bi-linear Interpolation

[Figure: bi-linear interpolation. Axis: brightness; label: interpolated pixel value.]
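A minimal sketch of bi-linear interpolation for reading a reference frame at a sub-pel position. It assumes floating-point arithmetic, frames stored as lists of rows, positions inside the frame, and clamping at the right and bottom borders; standardized codecs define their own integer filters and rounding, which are not reproduced here. The function name is hypothetical.

```python
import math


def bilinear_sample(frame, x, y):
    """Interpolate frame at the real-valued position (x, y) from its four
    integer-pel neighbours."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0                      # fractional parts in [0, 1)
    x1 = min(x0 + 1, len(frame[0]) - 1)          # clamp at the right border
    y1 = min(y0 + 1, len(frame) - 1)             # clamp at the bottom border
    top = (1 - fx) * frame[y0][x0] + fx * frame[y0][x1]
    bottom = (1 - fx) * frame[y1][x0] + fx * frame[y1][x1]
    return (1 - fy) * top + fy * bottom


frame = [[10, 20],
         [30, 40]]

print(bilinear_sample(frame, 0.5, 0.0))  # 15.0: halfway between 10 and 20
print(bilinear_sample(frame, 0.5, 0.5))  # 25.0: average of all four samples
print(bilinear_sample(frame, 1.0, 1.0))  # 40.0: integer position, no interpolation
```

For half-pel accurate displacements the fractional parts are either 0 or 0.5, so each interpolated sample is the average of one, two or four integer-pel samples.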

History of Motion Compensation

• Intraframe coding: only spatial correlation exploited → DCT [Ahmed, Natarajan, Rao 1974], JPEG [1992]
• Conditional replenishment → H.120 [1984] (DPCM, scalar quantization)
• Frame difference coding → H.120 Version 2 [1988]
• Motion compensation: integer-pel accurate displacements → H.261 [1991]
• Half-pel accurate motion compensation → MPEG-1 [1993], MPEG-2/H.262 [1994]
• Variable block-size (16x16 & 8x8) motion compensation → H.263 [1996], MPEG-4 [1999]
• Variable block-size (16x16 – 4x4) and multi-frame motion compensation → H.264/MPEG-4 AVC [2003]

Complexity increases from the top of the list to the bottom.

Milestones in Video Coding

[Figure: PSNR [dB] (28 to 40) over rate [kbit/s] (0 to 300) for Foreman, 10 Hz, QCIF, 100 frames. Curves: intraframe DCT coding (JPEG, 1990); conditional replenishment (H.120); frame difference coding (H.120, 1988); integer-pel motion compensation (H.261, 1991); half-pel motion compensation (MPEG-1 1993, MPEG-2 1994); variable block size (16x16 – 8x8) (H.263, 1996) + quarter-pel motion compensation (MPEG-4, 1998); variable block size (16x16 – 4x4) + quarter-pel + multi-frame motion compensation (H.264/AVC, 2003). Annotation: bit-rate reduction of 75%.]

Visual Comparison

[Figure: decoded Foreman frames, QCIF, 10 Hz, 100 kbit/s: JPEG versus H.264/AVC.]

Summary

• Video coding as a hybrid of motion compensation and prediction residual coding
• Motion models can represent various kinds of motions
• Lagrangian bit-allocation rules specify constant slope allocation to motion coefficients and prediction error
• In practice: affine or 8-parameter model for camera motion, translational model for small blocks
• Differential methods calculate displacement from spatial and temporal differences in the image signal
• Block matching computes error measure for candidate displacements and finds best match
• Speed up block matching by fast search methods, approximations, early terminations and clever application of triangle inequality
• Hybrid video coding has been drastically improved by enhanced motion compensation capabilities
