Lec 09 - Video Processing I Motion Estimation & Compensation
ECE 5578 Multimedia Communication
Lec 09 - Video Processing I: Motion Estimation & Compensation
Zhu Li
Dept of CSEE, UMKC
Office: FH560E, Email: [email protected], Ph: x 2346
http://l.web.umkc.edu/lizhu
Slides created with WPS Office Linux and EqualX LaTeX equation editor
Z. Li, Multimedia Communication, 2018

Outline
• Lecture 07 Re-Cap
• Arithmetic Coding HW
• Color Space and Sampling
• Motion Estimation and Compensation
  o Optical Flow (pixel based)
  o Block Based Motion Estimation
  o Fast Block Motion Estimation

Scalar Quantization: Uniform Quantizer & Distortions
• Distortion metric, MSE, for M quantization bins with boundaries b_i and reconstruction levels y_i:
$$ d = \int_{-\infty}^{\infty} (x - \hat{x})^2 f(x)\,dx = \sum_{i=1}^{M} \int_{b_{i-1}}^{b_i} (y_i - x)^2 f(x)\,dx $$
• Uniform Q MSE, for a uniform pdf over M bins of width \Delta:
$$ d = M \int_{0}^{\Delta} \frac{1}{M\Delta} \left(x - \frac{\Delta}{2}\right)^2 dx = \frac{1}{\Delta} \cdot \frac{1}{12}\Delta^3 = \frac{1}{12}\Delta^2 $$

Scalar Quantization: Non-Uniform Scalar Quantization
• Intuition: denser sampling in regions where the pdf f(x) is higher.
$$ d(\{b_k, y_k\}) = \int_{-\infty}^{\infty} (x - \hat{x})^2 f(x)\,dx = \sum_{k=1}^{M} \int_{b_{k-1}}^{b_k} (y_k - x)^2 f(x)\,dx $$
• Formulation: minimize d() over {b_k, y_k}; take the Lagrangian and apply the KKT conditions:
$$ \frac{\partial d}{\partial b_i} = 0 \;\Rightarrow\; (y_i - b_i)^2 f(b_i) - (y_{i+1} - b_i)^2 f(b_i) = 0 \;\Rightarrow\; b_i = \frac{y_i + y_{i+1}}{2} $$
$$ \frac{\partial d}{\partial y_i} = 0 \;\Rightarrow\; y_i = E\{X \mid X \in I_i\} = \frac{\int_{b_{i-1}}^{b_i} x f(x)\,dx}{\int_{b_{i-1}}^{b_i} f(x)\,dx} $$
• Each boundary b_i is the midpoint of its two neighboring reconstruction levels, and each y_i is the centroid of its interval I_i.

Vector Quantizer
• A better solution that more closely approaches the information-theoretic R-D bound.
• kmeans():
    % desired rate
    R=8;
    [indx, vq_codebook]=kmeans(x, 2^R);
• kd-tree implementation:
    [kdt.indx, kdt.leafs, kdt.mbox]=buildVisualWordList(x, 2^R);
    [node, prefix_code]=searchVisualWordList(q, kdt.indx, kdt.leafs);

Arithmetic Coding
• Quantization/ExpGolomb Binarization:
    imfilter(im, f1)
    Res = res(:) - mean(res(:))
    ResQ = fix(Res/Delta);
• Arithmetic Coding
  o SFU code, thanks to Prof. Jie Liang!
• Binary Image Adaptive Arithmetic Coding
  o Re-use HEVC BAC
• DNA sequence coding (bonus)
  o Re-use HEVC BAC
• More lab sessions will be held to help if necessary, so don't worry.

Digital Video Basics
[Figure: sample frames 1, 51, 71, 91 and 111 of a video sequence]
• Neighboring frames are usually very similar, so predictive coding is very efficient.
• Typical video coding methods:
  o Motion compensated hybrid video coding: the most popular approach. It combines block-based motion estimation/compensation with a block transform (DCT).
  o Wavelet based video coding (AVC Scalable Video Coding; Shi-Tah Hsiang, Motorola Lab).
  o Model-based video coding (MPEG-4 object based coding): uses analysis/synthesis techniques and encodes model parameters for the decoder to synthesize.

Color Space Conversion
• RGB to YUV conversion:
  o Provides backward compatibility with old black-and-white TVs, which use only the luma (Y) signal.
  o Color (chroma) does not carry as much information as brightness.

Color Space Conversion: Matlab example
• Play YUV sequences:
    ffplay -s '1920x1080' -pix_fmt yuv420p -f rawvideo Beauty_1920x1080_120fps_420_8bit_YUV.yuv
• Access YUV sequences:
    ffmpeg -f rawvideo -s cif -i stefan_cif.yuv -vcodec png stefan_yuv_%03d.png
    f0 = imread('stefan_yuv_100.png');

Color Space: Down-sampling
• RGB components of an image are strongly correlated, so the image can be converted to YUV space for better compression.
• The HVS is more sensitive to details in brightness than in color, so the color components can be down-sampled to improve compression.
[Figure: luma and chroma sample positions for each format; panels labeled MPEG-1 and MPEG-2]
• YUV 4:4:4: no downsampling of chroma.
• YUV 4:2:2: 2:1 horizontal downsampling of chroma components; 2 chroma samples (per chroma component) for every 4 luma samples.
• YUV 4:2:0: 2:1 horizontal and vertical downsampling of chroma components; 1 chroma sample (per chroma component) for every 4 luma samples.
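The three sampling formats above can be sketched as a small plane-size calculation. This is a minimal illustration, not part of the original slides: the function names, the 352 x 288 frame size, and the assumptions of 8-bit samples and even frame dimensions are all chosen here for the example.

```python
# Hypothetical helpers (not from the slides). Assumes 8-bit samples and
# even frame dimensions, so one sample occupies exactly one byte.

def chroma_plane_size(width, height, fmt):
    """Return (width, height) of ONE chroma plane (U or V) for a sampling format."""
    # (horizontal divisor, vertical divisor) applied to the luma dimensions
    divisors = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}
    div_x, div_y = divisors[fmt]
    return width // div_x, height // div_y

def frame_bytes(width, height, fmt):
    """Bytes per frame: one luma plane plus two chroma planes."""
    cw, ch = chroma_plane_size(width, height, fmt)
    return width * height + 2 * cw * ch

if __name__ == "__main__":
    for fmt in ("4:4:4", "4:2:2", "4:2:0"):
        print(fmt, chroma_plane_size(352, 288, fmt), frame_bytes(352, 288, fmt))
```

Under these assumptions, a 4:2:0 frame needs half the bytes of a 4:4:4 frame, because each of the two chroma planes shrinks to one quarter of the luma plane.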
YUV 4:2:0 File Format
• In YUV 4:2:0, the U plane and the V plane each hold 1/4 as many samples as the Y plane.
• Y, U and V samples are stored separately (planar layout):
  o Image: YYYY…..Y UU…U VV…V (row by row in each channel)
  o Video: YUV of frame 1, YUV of frame 2, ……
• CIF (Common Intermediate Format): 352 x 288 pixels for Y, 176 x 144 pixels for U, V
• QCIF (Quarter CIF): 176 x 144 pixels for Y, 88 x 72 pixels for U, V
[Figure: QCIF planes; Y: 176 x 144, U: 88 x 72, V: 88 x 72]

Layered Structure for Video Data
• Video data layers:
  o Sequence layer
  o Group of Picture (GOP) layer
  o Picture (frame) layer
  o Group of Block (GOB) or slice layer
  o Macroblock (MB) layer
  o Block layer
  o Sub-block layer
• Block: usually 8x8 pixels (4x4 pixels in H.264)
• Macroblock (MB): 16 x 16 region
  o An MB includes 4 luma blocks (Y1, Y2, Y3, Y4) and 2 chroma blocks (Cr, Cb), each 8x8.
• Group of Blocks (GOB) or slice:
  o H.261: 3 rows of 11 MBs (176 x 48 pixels)
  o QCIF: 176 x 144 → 3 GOBs
  o CIF: 352 x 288 → 12 GOBs
  o Slice: more flexible definition in H.264
• Picture layer: one frame
• GOP layer: group of frames
• Sequence: entire video sequence
[Figure: MB structure and GOB numbering; QCIF (176 x 144) with 3 GOBs, CIF (352 x 288) with 12 GOBs, one GOB highlighted]

Key Idea in Video Coding
• Predictive coding: predict each frame from the previous frame(s) and encode only the prediction error.
  o The prediction error has smaller energy than the frame itself and is easier to compress.
• Prediction can be performed at the frame level, macroblock level, block level, or even sub-block level.
[Figure: current frame and previous frame, with x and y axes]
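The predictive-coding idea can be illustrated with a toy example. This is only a sketch under stated assumptions: the two 4 x 4 "frames" are made-up values, the prediction is simply the previous frame (no motion compensation), and the helper names are hypothetical.

```python
# Toy illustration of predictive coding (made-up data, hypothetical helpers).
# The previous frame is used directly as the prediction of the current frame.

def energy(frame):
    """Sum of squared sample values of a 2-D list."""
    return sum(p * p for row in frame for p in row)

def residual(cur, pred):
    """Prediction error: current frame minus prediction, sample by sample."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(cur, pred)]

def reconstruct(pred, res):
    """Decoder side: prediction plus transmitted prediction error."""
    return [[p + r for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(pred, res)]

prev_frame = [[100, 101, 102, 103],
              [104, 105, 106, 107],
              [108, 109, 110, 111],
              [112, 113, 114, 115]]
cur_frame = [[101, 101, 103, 103],
             [104, 104, 106, 108],
             [108, 109, 110, 111],
             [113, 113, 114, 114]]

res = residual(cur_frame, prev_frame)

if __name__ == "__main__":
    print("frame energy:   ", energy(cur_frame))
    print("residual energy:", energy(res))
    print("lossless reconstruction:", reconstruct(prev_frame, res) == cur_frame)
```

Because the two frames differ by at most one gray level per sample, the residual energy is tiny compared with the energy of the frame itself, which is what makes the residual cheaper to code.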
Motion Field
• Image velocity of a point moving in the scene.
[Figure: scene point r_o moving by v_o dt, its image point r_i moving by v_i dt, focal length f', axes X, Y, Z]
• Scene point velocity:
$$ \mathbf{v}_o = \frac{d\mathbf{r}_o}{dt} $$
• Image velocity:
$$ \mathbf{v}_i = \frac{d\mathbf{r}_i}{dt} $$
• Perspective projection, with \mathbf{Z} the unit vector along the optical axis:
$$ \frac{1}{f'}\,\mathbf{r}_i = \frac{\mathbf{r}_o}{\mathbf{r}_o \cdot \mathbf{Z}} $$
• Motion field:
$$ \mathbf{v}_i = \frac{d\mathbf{r}_i}{dt} = f'\,\frac{(\mathbf{r}_o \cdot \mathbf{Z})\,\mathbf{v}_o - (\mathbf{v}_o \cdot \mathbf{Z})\,\mathbf{r}_o}{(\mathbf{r}_o \cdot \mathbf{Z})^2} = f'\,\frac{(\mathbf{r}_o \times \mathbf{v}_o) \times \mathbf{Z}}{(\mathbf{r}_o \cdot \mathbf{Z})^2} $$

Optical Flow
• Optical flow = apparent motion of brightness patterns.
• Ideally, it is the projection of 3D object motion onto the 2D image plane.

Lucas-Kanade OF estimator
• I(x, y, t) is the pixel value at location (x, y) on the image plane at time t.
• Brightness constancy assumption:
$$ I\!\left(x + \frac{dx}{dt}\,dt,\; y + \frac{dy}{dt}\,dt,\; t + dt\right) = I(x, y, t) $$
• Optical Flow Constraint Equation:
$$ \frac{dI}{dt} = \frac{\partial I}{\partial x}\frac{dx}{dt} + \frac{\partial I}{\partial y}\frac{dy}{dt} + \frac{\partial I}{\partial t} = 0 $$
• This leads to the Lucas-Kanade algorithm (details skipped).
• The optimal OF [u, v], with u = dx/dt and v = dy/dt, should satisfy the constraint equation in the least-squares sense over a local window.

Block Based Motion Estimation (ME)
• For each ME unit (MB or block), find the best match in the previous frame (reference frame).
  o Upper-left corner of the block being encoded: (x0, y0)
  o Upper-left corner of the matched block in the reference frame: (x1, y1)
  o Motion vector (dx, dy): the offset between the two blocks:
    (dx, dy) = (x1 – x0, y1 – y0) = (x1, y1) - (x0, y0)
    (x0, y0) + (dx, dy) = (x1, y1)
  o The motion vector needs to be sent to the decoder.
• Prediction error: e = A – B, where A is the block being encoded at (x0, y0) and B is the matched block at (x1, y1).

GOP, I, P, and B Frames
[Figure: frame sequence I P P P … P P P | I P P P … P P P, each run starting with an I frame forming one GOP]
• GOP: Group of pictures (frames).
• I frames (key frames):
  o Intra-coded frame, coded as a still image.
  o Can be decoded directly.
  o Used at the GOP head or at scene changes.
  o Allow random access and improve error resilience.
• P frames (inter-coded frames):
  o Predicted from the previous frame.
• B frames: bi-directionally interpolated prediction frames
  o Predicted from both the previous frame and the next frame: more flexibility, better prediction.
  o Useful when new objects come into the scene.
• Before H.264, B frames are not used as reference for future frames:
  o B frames can be coded with lower quality, or discarded, without affecting future frames.
  o This allows temporal scalability.
• Frame types in display order: I B B P B B P
  o Display order: 1 2 3 4 5 6 7
  o Encoding/decoding order: 1 4 2 3 7 5 6
  o Each B frame is coded after the later reference frame it depends on, so more frame buffers are needed.

Basic Encoder Block Diagram
[Figure: encoder block diagram with blocks DCT, Q, Entropy Coding, Q^-1, IDCT, Memory (reconstructed previous frame), ME (motion vectors) and MC (prediction), and an Intra/Inter switch selecting the input frame or the prediction error]
• Motion compensation (MC): form the prediction of the current frame using the MV information.
• Use the reconstructed frame, not the original, in the prediction loop so that the encoder and decoder predict from the same data; this prevents drift.

Basic Decoder Block Diagram
[Figure: decoder block diagram with blocks Entropy Decoding, Q^-1, IDCT, Memory (reconstructed previous frame) and MC (prediction from motion vectors), and an Intra/Inter switch producing the reconstructed frame]
• The decoder is simpler than the encoder: there is no need to do motion estimation.

Motion Estimation
• For the k-th MB, the prediction error with MV (dx, dy):
$$ e(k, dx, dy) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| f_n(x_k + i,\, y_k + j) - f_{n-1}(x_k + dx + i,\, y_k + dy + j) \right|^p $$
  o (x_k, y_k): the upper-left coordinate of the k-th MB
  o f_n(x, y): the (x, y)-th pixel in frame n
  o N: ME block size (16 for MB-level ME)
  o p = 1: Sum of Absolute Difference (SAD)
  o p = 2: Sum of Squared Difference (SSD)
• Search window: dx and dy each range from -R to R.
• Number of MV candidates: (2R + 1) x (2R + 1)
• Objective: find the MV that minimizes the prediction error.
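The exhaustive (full) search implied by the objective above can be sketched as follows. This is a minimal illustration under stated assumptions, not the lecture's reference code: frames are plain 2-D lists indexed as frame[y][x], candidates whose block would fall outside the reference frame are skipped, ties keep the first candidate in scan order, and the function names and synthetic test frames are hypothetical.

```python
# Minimal full-search block matching sketch (hypothetical names, toy data).

def block_cost(cur, ref, xk, yk, dx, dy, n, p=1):
    """Prediction error e(k, dx, dy): SAD for p=1, SSD for p=2."""
    total = 0
    for j in range(n):
        for i in range(n):
            diff = cur[yk + j][xk + i] - ref[yk + dy + j][xk + dx + i]
            total += abs(diff) ** p
    return total

def full_search(cur, ref, xk, yk, n, r, p=1):
    """Test all (2r+1) x (2r+1) candidates; return (cost, dx, dy) of the best."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            x1, y1 = xk + dx, yk + dy
            # Assumption: skip candidates that leave the reference frame.
            if x1 < 0 or y1 < 0 or x1 + n > width or y1 + n > height:
                continue
            cost = block_cost(cur, ref, xk, yk, dx, dy, n, p)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Synthetic 16x16 reference frame in which every sample value is unique.
SIZE = 16
ref_frame = [[16 * y + x for x in range(SIZE)] for y in range(SIZE)]
# Current frame: the reference content displaced by (2, 1), clamped at the edges.
cur_frame = [[ref_frame[min(y + 1, SIZE - 1)][min(x + 2, SIZE - 1)]
              for x in range(SIZE)] for y in range(SIZE)]

if __name__ == "__main__":
    print(full_search(cur_frame, ref_frame, xk=4, yk=4, n=4, r=3))
```

For this toy data the 4 x 4 block at (4, 4) is matched exactly at offset (dx, dy) = (2, 1). The cost of the search grows with (2R + 1)^2 candidates times N^2 samples per block, which is why faster search strategies are of interest.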