Motion Estimation and Compensation
Video Codec Design, Iain E. G. Richardson. Copyright © 2002 John Wiley & Sons, Ltd. ISBNs: 0-471-48553-5 (Hardback); 0-470-84783-2 (Electronic)

6.1 INTRODUCTION

In the video coding standards described in Chapters 4 and 5, blocks of image samples or residual data are compressed using a block-based transform (such as the DCT) followed by quantisation and entropy encoding. There is limited scope for improved compression performance in the later stages of encoding (DCT, quantisation and entropy coding), since the operation of the DCT and the codebook for entropy coding are specified by the relevant video coding standard. However, there is scope for significant performance improvement in the design of the first stage of a video CODEC (motion estimation and compensation).

Efficient motion estimation reduces the energy in the motion-compensated residual frame and can dramatically improve compression performance. Motion estimation can be very computationally intensive, so this compression performance may come at the expense of high computational complexity. This chapter describes the motion estimation and compensation process in detail and discusses implementation alternatives and trade-offs.

The motion estimation and compensation functions have many implications for CODEC performance. Key performance issues include:

• Coding performance (how efficient is the algorithm at minimising the residual frame?)
• Complexity (does the algorithm make effective use of computation resources, how easy is it to implement in software or hardware?)
• Storage and/or delay (does the algorithm introduce extra delay and/or require storage of multiple frames?)
• ‘Side’ information (how much extra information, e.g. motion vectors, needs to be transmitted to the decoder?)
• Error resilience (how does the decoder perform when errors occur during transmission?)

These issues are interrelated and are potentially contradictory (e.g. better coding performance may lead to increased complexity and delay and poor error resilience) and different solutions are appropriate for different platforms and applications. The design and implementation of motion estimation, compensation and reconstruction can be critical to the performance of a video coding application.

6.2 MOTION ESTIMATION AND COMPENSATION

6.2.1 Requirements for Motion Estimation and Compensation

Motion estimation creates a model of the current frame based on available data in one or more previously encoded frames (‘reference frames’). These reference frames may be ‘past’ frames (i.e. earlier than the current frame in temporal order) or ‘future’ frames (i.e. later in temporal order). The design goals for a motion estimation algorithm are to model the current frame as accurately as possible (since this gives better compression performance) whilst maintaining acceptable computational complexity.

In Figure 6.1, the motion estimation module creates a model by modifying one or more reference frames to match the current frame as closely as possible (according to a matching criterion). The current frame is motion compensated by subtracting the model from the frame to produce a motion-compensated residual frame. This is coded and transmitted, along with the information required for the decoder to recreate the model (typically a set of motion vectors). At the same time, the encoded residual is decoded and added to the model to reconstruct a decoded copy of the current frame (which may not be identical to the original frame because of coding losses). This reconstructed frame is stored to be used as a reference frame for further predictions.

The residual frame (or displaced frame difference, DFD) is encoded and transmitted, together with any ‘side information’ (such as motion vectors) needed to recreate the model at the decoder.
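The encode/reconstruct loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the tiny 2 × 2 'frames' and the uniform rounding step that stands in for the lossy transform/quantisation stage are all hypothetical and are not taken from any standard.

```python
def motion_compensate(current, model):
    """Residual (DFD): subtract the model from the current frame, sample by sample."""
    return [[c - m for c, m in zip(c_row, m_row)]
            for c_row, m_row in zip(current, model)]


def lossy_code(residual, step):
    """Hypothetical stand-in for encode + decode of the residual:
    each sample is rounded to the nearest multiple of `step`."""
    return [[step * round(r / step) for r in row] for row in residual]


def reconstruct(decoded_residual, model):
    """Add the decoded residual back to the model to rebuild the frame."""
    return [[d + m for d, m in zip(d_row, m_row)]
            for d_row, m_row in zip(decoded_residual, model)]


# Made-up sample values, for illustration only.
current = [[10, 12], [14, 17]]
model = [[9, 12], [15, 13]]   # prediction formed from a reference frame

residual = motion_compensate(current, model)
decoded_residual = lossy_code(residual, step=4)
reconstructed = reconstruct(decoded_residual, model)

# With step=1 nothing is lost and the frame is recovered exactly.
lossless = reconstruct(lossy_code(residual, step=1), model)
```

With `step=4` the reconstructed frame differs slightly from `current`, which is why the encoder keeps the reconstructed frame (not the original) as the reference for further predictions: it is the frame the decoder will actually have.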
The ‘best’ compression performance is achieved when the size of the coded DFD and coded side information is minimised. The size of the coded DFD is related to the energy remaining in the DFD after motion compensation. Figure 6.2 shows a previous, current and residual frame (DFD) without motion compensation: there is clearly a significant amount of energy present around the boundaries of moving objects (the girl and the bicycle in this case). It should be possible to reduce this energy (and improve compression performance) using motion estimation and compensation.

Figure 6.1 Motion estimation and compensation block diagram

Figure 6.2 (a) Previous frame; (b) current frame; (c) DFD (no motion compensation)

6.2.2 Block Matching

In the popular video coding standards (H.261, H.263, MPEG-1, MPEG-2 and MPEG-4), motion estimation and compensation are carried out on 8 × 8 or 16 × 16 blocks in the current frame. Motion estimation of complete blocks is known as block matching.

For each block of luminance samples (say 16 × 16) in the current frame, the motion estimation algorithm searches a neighbouring area of the reference frame for a ‘matching’ 16 × 16 area. The best match is the one that minimises the energy of the difference between the current 16 × 16 block and the matching 16 × 16 area. The area in which the search is carried out may be centred around the position of the current 16 × 16 block, because (a) there is likely to be a good match in the immediate area of the current block due to the high similarity (correlation) between subsequent frames and (b) it would be computationally intensive to search the whole of the reference frame.

Figure 6.3 illustrates the block matching process.
The current ‘block’ (in this case, 3 × 3 pixels) is shown on the left and this block is compared with the same position in the reference frame (shown by the thick line in the centre) and the immediate neighbouring positions (±1 pixel in each direction). The mean squared error (MSE) between the current block and the same position in the reference frame (position (0, 0)) is given by

\[
\frac{(1-4)^2 + (3-2)^2 + (2-3)^2 + (6-4)^2 + (4-2)^2 + (3-2)^2 + (5-4)^2 + (4-3)^2 + (3-3)^2}{9} = 2.44
\]

Figure 6.3 Current 3 × 3 block and 5 × 5 reference area

The complete set of MSE values for each search position is listed in Table 6.1 and shown graphically in Figure 6.4. Of the nine candidate positions, (-1, 1) gives the smallest MSE and hence the ‘best’ match. In this example, the best ‘model’ for the current block (i.e. the best prediction) is the 3 × 3 region in position (-1, 1).

A video encoder carries out this process for each block in the current frame:

1. Calculate the energy of the difference between the current block and a set of neighbouring regions in the reference frame.
2. Select the region that gives the lowest error (the ‘matching region’).
3. Subtract the matching region from the current block to produce a difference block.
4. Encode and transmit the difference block.
5. Encode and transmit a ‘motion vector’ that indicates the position of the matching region, relative to the current block position (in the above example, the motion vector is (-1, 1)).

Steps 1 and 2 above correspond to motion estimation and step 3 to motion compensation.

The video decoder reconstructs the block as follows:

1. Decode the difference block and motion vector.
2. Add the difference block to the matching region in the reference frame (i.e. the region ‘pointed’ to by the motion vector).
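The encoder and decoder steps above can be sketched as a small full-search routine. This is a minimal sketch, not the book's implementation: the helper names are hypothetical, the 5 × 5 reference area `ref` is invented for illustration (it is not the data behind Table 6.1), and the vector convention (dx, dy) with y increasing downwards is an assumption. Only `eq_current` and `eq_reference` reuse the nine sample pairs from the worked MSE calculation.

```python
def mse(block, region):
    """Mean squared error between two equally sized 2-D sample arrays."""
    n = len(block) * len(block[0])
    return sum((b - r) ** 2
               for b_row, r_row in zip(block, region)
               for b, r in zip(b_row, r_row)) / n


def region_at(frame, top, left, size):
    """Extract a size x size region whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(block, ref, top, left, search_range=1):
    """Steps 1-2 (motion estimation): test every offset within +/- search_range
    and return (motion_vector, error, matching_region) for the lowest MSE."""
    size = len(block)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > len(ref) or x + size > len(ref[0]):
                continue  # candidate falls outside the reference area
            region = region_at(ref, y, x, size)
            err = mse(block, region)
            if best is None or err < best[1]:
                best = ((dx, dy), err, region)
    return best


# Sample pairs taken from the worked MSE calculation for position (0, 0).
eq_current = [[1, 3, 2], [6, 4, 3], [5, 4, 3]]
eq_reference = [[4, 2, 3], [4, 2, 2], [4, 3, 3]]
mse_origin = round(mse(eq_current, eq_reference), 2)

# Invented 5 x 5 reference area (illustrative only).
ref = [[5, 5, 5, 5, 5],
       [5, 5, 5, 5, 5],
       [1, 3, 2, 5, 5],
       [6, 4, 3, 5, 5],
       [5, 4, 4, 5, 5]]
block = eq_current          # current block, located at row 1, column 1
vector, error, match = full_search(block, ref, top=1, left=1)

# Step 3 (motion compensation): difference block sent with the vector.
difference = [[b - m for b, m in zip(b_row, m_row)]
              for b_row, m_row in zip(block, match)]

# Decoder: fetch the region the vector points to and add the difference.
dx, dy = vector
pointed = region_at(ref, 1 + dy, 1 + dx, len(block))
decoded = [[p + d for p, d in zip(p_row, d_row)]
           for p_row, d_row in zip(pointed, difference)]
```

In this sketch the difference block is passed to the decoder without loss, so `decoded` equals `block`; in a real CODEC the difference block would be transformed, quantised and entropy coded first.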
Table 6.1 MSE values for block matching example

Position (x, y)   (-1, -1)   (0, -1)   (1, -1)   (-1, 0)   (0, 0)   (1, 0)   (-1, 1)   (0, 1)   (1, 1)
MSE               4.67       2.78      2.89      3.22      2.44     3.33     0.22      5.33     2.56

Figure 6.4 MSE map: (a) surface plot; (b) pseudocolour plot

6.2.3 Minimising Difference Energy

The name ‘motion estimation’ is misleading because the process does not necessarily identify ‘true’ motion; instead, it attempts to find a matching region in the reference frame that minimises the energy of the difference block. Where there is clearly identifiable linear motion, such as large moving objects or global motion (camera panning, etc.), motion vectors produced in this way should roughly correspond to the movement of blocks between the reference and the current frames. However, where the motion is less obvious (e.g. small moving objects that do not correspond to complete blocks, irregular motion, etc.), the ‘motion vector’ may not indicate genuine motion but rather the position of a good match.

Figure 6.5 16 × 16 block motion vectors

Figure 6.5 shows the motion vectors produced by motion estimation for each of the 16 × 16 blocks (‘macroblocks’) of the frame in Figure 6.2. Most of the vectors do correspond to motion: the girl and bicycle are moving to the left and so the vectors point to the right (i.e. to the region the objects have moved from). There is an anomalous vector in the middle (it is larger than the rest and points diagonally upwards). This vector does not correspond to ‘true’ motion; it simply indicates that the best match can be found in this position.

There are many possible variations on the basic block matching process, some of which will be described later in this chapter. Alternative measures of DFD energy may be used (to reduce the computation required to calculate MSE).
Varying block sizes, or irregular-shaped regions, can be more efficient at matching ‘true’ motion than fixed 16 × 16 blocks. A better match may be found by searching within two or more reference frames (rather than just one). The order of searching neighbouring regions can have a significant effect on matching efficiency and computational complexity.
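As an illustration of the first variation listed above (an alternative measure of DFD energy), the sketch below compares the sum of squared differences, which is the numerator of the MSE, with the sum of absolute differences. The sum of absolute differences is one commonly used cheaper measure because it needs no multiplications; choosing it here is an assumption for illustration, and the function names are hypothetical. The sample values are the nine pairs from the worked MSE calculation in Section 6.2.2.

```python
def ssd(block, region):
    """Sum of squared differences (MSE multiplied by the number of samples)."""
    return sum((b - r) ** 2
               for b_row, r_row in zip(block, region)
               for b, r in zip(b_row, r_row))


def sad(block, region):
    """Sum of absolute differences: no multiplications required."""
    return sum(abs(b - r)
               for b_row, r_row in zip(block, region)
               for b, r in zip(b_row, r_row))


current = [[1, 3, 2], [6, 4, 3], [5, 4, 3]]
reference = [[4, 2, 3], [4, 2, 2], [4, 3, 3]]

ssd_value = ssd(current, reference)      # 22, so MSE = 22 / 9 = 2.44
sad_value = sad(current, reference)      # 12
```

Because the block size is fixed during a search, dividing by the number of samples does not change which candidate wins, so an encoder can compare raw sums directly. The two measures do not always rank candidates identically, since squaring penalises large individual errors more heavily.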