CH 7: FUNDAMENTALS OF CODING

Inter frame redundancy: An inter frame is a frame in a video compression stream which is expressed in terms of one or more neighboring frames. The "inter" part of the term refers to the use of inter frame prediction. This kind of prediction tries to take advantage of the temporal redundancy between neighboring frames, enabling higher compression rates.

The temporal encoding aspect of this system relies on the assumption that rigid body motion is responsible for the differences between two or more successive frames. The objective of the motion estimator is to estimate the rigid body motion between two frames. The motion estimator operates on all 16 x 16 image blocks of the current frame and generates the pixel displacement, or motion vector, for each block. The technique used to generate motion vectors is called block-matching motion estimation. The method uses the current frame I_k and the previous reconstructed frame f_{k-1} as input. Each block in the previous frame is assumed to have a displacement that can be found by searching for it in the current frame. The search is usually constrained to a reasonable neighborhood so as to minimize the complexity of the operation. Search matching is usually based on a minimum MSE or MAE criterion. When a match is found, the pixel displacement is used to encode the particular block. If a search does not meet a minimum MSE or MAE threshold criterion, the motion compensator will indicate that the current block is to be spatially encoded using the intraframe mode.

Motion estimation techniques, full search and fast search: Motion estimation (ME) is used extensively in video coders based on the MPEG-4 standards to remove interframe redundancy. Motion estimation is based on the block-matching method, which evaluates block mismatch by the sum of squared differences (SSD) measure. Winograd's Fourier transform is applied, and the redundancy of the overlapped-area computation among reference blocks is eliminated, in order to reduce the computational cost of the ME. When the block size is N x N and the number of reference blocks in a search window is the same as the current block, this method reduces the computational amount (additions and multiplications) by 58% relative to the straightforward approach for N = 8 and by 81% for N = 16, without degrading motion tracking capability. The proposed fast full-search ME method enables more accurate motion estimation than conventional fast ME methods, so it can be applied in video systems.

The popularity of video as a means of data representation and transmission is increasing; hence the requirements on the quality and size of video are growing. High visual quality of video is provided by coding. In the 1960s, motion estimation (ME) and compensation were proposed to improve the efficiency of video coding. The current frame is divided into non-overlapping blocks. For each block of the current frame, the most similar block within a limited search area of the reference frame is found. The criterion of similarity of the two blocks is called the metric of comparison of the two blocks. The position of the block for which an extremum of the metric is found determines the coordinates of the motion vector of the current block. The full search algorithm is the most accurate method of block ME, i.e. the proportion of true motion vectors found is the highest. The current block is compared to all candidate blocks within the restricted search area in order to find the best match, so this ME algorithm requires a lot of computing resources.
Therefore, many alternative fast motion estimation algorithms have been developed. In 1981, T. Koga and co-authors proposed the three-step search algorithm (TSS). The disadvantage of fast search methods is that they may find only a local extremum of the block-difference function. Consequently, the motion estimation accuracy can drop (in some sequences by as much as half) compared to the brute-force full search, and the visual quality of the video degrades as well.
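To make the block-matching procedure above concrete, the following is a minimal Python/NumPy sketch of full-search motion estimation for a single 16 x 16 block using the MAE (mean absolute error) criterion; the frame arrays, block position and search range are illustrative assumptions, not part of any standard.

```python
import numpy as np

def full_search(cur, ref, bx, by, N=16, R=8):
    """Full-search block matching for one NxN block.

    cur, ref : 2-D luminance arrays (current and reference frame)
    bx, by   : top-left corner of the current block
    R        : search range in pixels (+/- R around the block position)
    Returns the motion vector (dy, dx) and the best MAE value.
    """
    block = cur[by:by+N, bx:bx+N].astype(np.float64)
    best, best_mae = (0, 0), np.inf
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            y, x = by + dy, bx + dx
            # skip candidates that fall outside the reference frame
            if y < 0 or x < 0 or y + N > ref.shape[0] or x + N > ref.shape[1]:
                continue
            cand = ref[y:y+N, x:x+N].astype(np.float64)
            mae = np.mean(np.abs(block - cand))
            if mae < best_mae:
                best_mae, best = mae, (dy, dx)
    return best, best_mae

# toy usage with a synthetically shifted frame (illustrative only)
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64)).astype(np.uint8)
cur = np.roll(ref, shift=(2, 3), axis=(0, 1))   # simulate a global shift of (2, 3) pixels
mv, err = full_search(cur, ref, bx=16, by=16)
print(mv, err)   # expect (-2, -3): this block's content came from 2 rows up, 3 columns left in ref
```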

The Criterion to Compare Blocks: The standards of video coding do not regulate the choice of the criterion (metric) for matching two blocks. One of the most popular metrics is the sum of squared differences (SSD):

SSD(i, j) = Σ_{y=0}^{Nh−1} Σ_{x=0}^{Nw−1} ( B(x, y) − S(x + i, y + j) )²

where i, j are the coordinates of the motion vector of the current block, i ∈ (−Vw/2; Vw/2), j ∈ (−Vh/2; Vh/2); Vw × Vh is the size of the area over which the upper-left corner of the candidate block may move on the reference frame; x, y are the coordinates within the current block B; Nw × Nh is the size of block B; S is the reference area of size Sw × Sh, where Sw = Nw + Vw and Sh = Nh + Vh; B and S are luminance components of images in the YUV color format. Within the search area of size Sw × Sh, the minimum value of the SSD criterion for the current block B determines the coordinates of its motion vector.

SSD can be calculated with fewer operations by decomposing it into three components:

SSD(i, j) = Σ_{y=0}^{Nh−1} Σ_{x=0}^{Nw−1} B²(x, y) − 2 Σ_{y=0}^{Nh−1} Σ_{x=0}^{Nw−1} B(x, y) S(x + i, y + j) + Σ_{y=0}^{Nh−1} Σ_{x=0}^{Nw−1} S²(x + i, y + j)

The first term does not depend on (i, j), the third term can be updated incrementally as the candidate position moves, and the middle cross-correlation term can be computed efficiently with fast transforms.
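As a quick numerical check of the SSD decomposition above, the short NumPy sketch below compares the direct computation with the three-term form for one candidate block position; the block contents are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
B = rng.integers(0, 256, (N, N)).astype(np.float64)   # current block
S = rng.integers(0, 256, (N, N)).astype(np.float64)   # candidate block at position (i, j)

# direct SSD
ssd_direct = np.sum((B - S) ** 2)

# decomposition: sum(B^2) - 2*sum(B*S) + sum(S^2)
ssd_decomp = np.sum(B * B) - 2.0 * np.sum(B * S) + np.sum(S * S)

print(ssd_direct, ssd_decomp)   # identical values
```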

We propose to replace the straightforward computation of this cross-correlation term by other fast transforms: the Winograd algorithm and the Fermat number-theoretic transform (NTT).

Backward motion estimation: The motion estimation that we have discussed in Section 20.3 and Section 20.4 is essentially backward motion estimation, since the current frame is considered the candidate frame and the reference frame on which the motion vectors are searched is a past frame; that is, the search is backward. Backward motion estimation leads to forward motion prediction. Backward motion estimation is illustrated in the figure below.

Forward motion estimation: It is just the opposite of backward motion estimation. Here, the search for motion vectors is carried out on a frame that appears later than the candidate frame in temporal ordering. In other words, the search is "forward". Forward motion estimation leads to backward motion prediction. Forward motion estimation is illustrated in Fig. 20.3.

It may appear that forward motion estimation is unusual, since one requires future frames to predict the candidate frame. However, this is not unusual, since the candidate frame for which the motion vector is being sought is not necessarily the current, that is, the most recent, frame. It is possible to store more than one frame and to use one of the past frames as a candidate frame, with another frame, appearing later in the temporal order, as its reference. Forward motion estimation (leading to backward prediction) is supported under the MPEG-1 and MPEG-2 standards, in addition to the conventional backward motion estimation. The standards also support bi-directional motion compensation, in which the candidate frame is predicted from a past reference frame as well as a future reference frame with respect to the candidate frame.

Frame classification: In video compression, a video frame is compressed using different algorithms. These different algorithms for video frames are called picture types or frame types, and they are I, P and B. The characteristics of the frame types are:
I-frame: I-frames are the least compressible but do not require other video frames to decode.
P-frame: It can use data from previous frames to decompress and is more compressible than an I-frame.
B-frame: It can use both previous and forward frames for data reference to get the highest amount of data compression.
An I-frame (Intra coded picture) is a complete image, like a JPG image file. A P-frame (Predicted picture) holds only the changes in the image from the previous frame. For example, in a scene where a car moves across a stationary background, only the car's movements need to be encoded; the encoder does not need to store the unchanging background pixels in the P-frame, which saves space. P-frames are also known as delta frames. A B-frame (Bidirectional predicted picture) saves even more space by using differences between the current frame and both the preceding and following frames to specify its content.
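As a simple illustration of the P-frame idea just described, here is a hedged NumPy sketch that encodes a frame as the residual from a motion-compensated previous frame and reconstructs it at the decoder; the frames and the single global motion vector are illustrative assumptions, not part of any standard.

```python
import numpy as np

rng = np.random.default_rng(2)
prev = rng.integers(0, 256, (64, 64)).astype(np.int16)   # reconstructed previous frame
true_mv = (0, 3)                                          # pretend the whole scene moved 3 px right
cur = np.roll(prev, shift=true_mv, axis=(0, 1))           # current frame

# encoder: motion-compensated prediction + residual
pred = np.roll(prev, shift=true_mv, axis=(0, 1))          # prediction using the motion vector
residual = cur - pred                                     # what actually gets coded in a P-frame

# decoder: rebuild the frame from the previous frame, the motion vector and the residual
recon = np.roll(prev, shift=true_mv, axis=(0, 1)) + residual
print(np.array_equal(recon, cur), int(np.abs(residual).sum()))  # True, 0 (prediction is perfect in this toy case)
```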

Picture/Frame: The terms picture and frame are used interchangeably. The term picture is the more general notion, as a picture can be either a frame or a field. A frame is a complete image, and a field is the set of odd-numbered or even-numbered scan lines composing a partial image. For example, an HD 1080 picture has 1080 lines of pixels. An odd field consists of pixel information for lines 1, 3, 5, ..., 1079. An even field has pixel information for lines 2, 4, 6, ..., 1080. When video is sent in interlaced scan format, each frame is sent in two fields: the field of odd-numbered lines followed by the field of even-numbered lines. A frame used as a reference for predicting other frames is called a reference frame.
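To make the frame/field relationship concrete, the following small NumPy sketch (illustrative only, not tied to any standard API) splits a progressive frame into its odd and even fields and weaves them back together at the receiver.

```python
import numpy as np

frame = np.arange(1, 9)[:, None] * np.ones((1, 4), dtype=int)  # 8-line toy "frame"; each row holds its line number

odd_field  = frame[0::2, :]   # lines 1, 3, 5, 7 (1-based line numbering)
even_field = frame[1::2, :]   # lines 2, 4, 6, 8

# interlaced transmission sends the two fields separately; the receiver re-interleaves them
rebuilt = np.empty_like(frame)
rebuilt[0::2, :] = odd_field
rebuilt[1::2, :] = even_field
print(np.array_equal(rebuilt, frame))   # True
```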

Frames encoded without information from other frames are called I-frames. Frames that use prediction from a single preceding reference frame are called P-frames. Frames that use prediction from an average of two reference frames, one preceding and one succeeding, are called B-frames.
Slices: A slice is a spatially distinct region of a frame that is encoded separately from any other region in the same frame. I-slices, P-slices and B-slices take the place of I, P and B frames.
Macroblock: It is a processing unit in image and video compression formats based on linear block transforms, typically the DCT. It consists of 16x16 samples, is subdivided into transform blocks, and may be further subdivided into prediction blocks.
Partitioning of a picture:
Slices:
•A picture is split into 1 or several slices
•Slices are self-contained
•Slices are a sequence of macroblocks
Macroblocks:
•Basic syntax and processing unit
•Contains 16x16 luma samples and 2 x 8x8 chroma samples
•Macroblocks within a slice depend on each other
•Macroblocks can be further partitioned
Elements of video encoding and decoding:

Video coding basic system

Encoder block diagram of typical block based hybrid coder

Discrete Cosine Transform- The DCT decomposes each input block into a series of waveforms, each with a specific spatial frequency, and outputs an 8x8 block of horizontal and vertical frequency coefficients.
Quantization- The quantization block uses psychovisual characteristics to eliminate the unimportant, high-frequency DCT coefficients.
Inverse Quantization- IQ computes the inverse quantization by multiplying the quantized DCT coefficients with the quantization table.
Inverse Discrete Cosine Transform- IDCT recomputes the original input block; errors are expected due to quantization.
Motion Estimation- ME uses a scheme with fewer search locations and fewer pixels to generate motion vectors indicating the directions of the moving images.
Motion Compensation- The MC block increases the compression ratio by removing the redundancies between frames.
Variable Length Coding (lossless)- VLC reduces the bit rate by sending shorter codes for common (run of zeros, non-zero level) pairs and longer codes for less common pairs.
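The following is a minimal sketch of the transform/quantization/inverse path described above, using SciPy's DCT and an illustrative flat quantization step rather than a real standard quantization table.

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(3)
block = rng.integers(0, 256, (8, 8)).astype(np.float64)   # one 8x8 image block

Q = 16.0                                      # illustrative uniform quantizer step
coeff   = dctn(block, norm='ortho')           # forward 8x8 DCT
quant   = np.round(coeff / Q)                 # quantization (information is lost here)
dequant = quant * Q                           # inverse quantization
recon   = idctn(dequant, norm='ortho')        # inverse DCT

print(np.max(np.abs(recon - block)))          # small reconstruction error caused by quantization
```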

Decoder block diagram

Example of a wireless video codec (encoder and decoder) that includes pre-processing of the captured data to interface with the encoder and post-processing of the data to interface with the LCD panel. The video codec is compliant with the low bit rate codec for multimedia telephony defined by the Third Generation Partnership Project (3GPP). The baseline codec defined by 3GPP is H.263, and MPEG-4 Simple Visual Profile is defined as optional. The video codec implemented supports the following video formats:
1. SQCIF or 128 x 96 resolution
2. QCIF or 176 x 144 resolution at Simple Profile Level 1
3. CIF or 352 x 288 resolution at Simple Profile Level 2
4. 64 kbit/s for Simple Profile Level 1
5. 128 kbit/s for Simple Profile Level 2
Video CODEC Description: The video encoder implemented requires a YUV 4:2:0 non-interlaced video input and, therefore, pre-processing of the video input may be required depending on the application. For the video decoder, post-processing is needed to convert the decoded YUV 4:2:0 data to RGB for display.
Features:
1. Pre-processing: YUV 4:2:2 interlaced (from a camera, for example) to YUV 4:2:0 non-interlaced; only decimation and no filtering of the UV components.
2. Post-processing: YUV 4:2:0 to RGB conversion; display formats of 16-bit or 12-bit RGB; 0 to 90 degrees rotation for landscape and portrait displays.
3. MPEG-4 Simple Profile Level 0, Level 1 and Level 2 support.
4. H.263 and MPEG-4 decoder and encoder compliant.
5. MPEG-4 video decoder options: AC/DC prediction; Reversible Variable Length Coding (RVLC); Resynchronization Marker (RM); Data Partitioning (DP); error concealment (proprietary techniques); 4 Motion Vectors per macroblock (4MV); Unrestricted Motion Compensation; decoding of VOS layers.
6. MPEG-4 video encoder options: Reversible Variable Length Coding (RVLC); Resynchronization Marker (RM); Data Partitioning (DP); 4 Motion Vectors per Macroblock (4MV); Header Extension Codes; bit-rate target change during encoding; coding frame-rate change during encoding; insertion (or not) of the Visual Object Sequence start code.
7. Support for insertion of an I-frame during the encoding of a sequence.
8. Encoder Adaptive Intra Refresh (AIR) support.
9. Multi-codec support: multiple codecs running from the same code.
Video Architecture, Pixel Representation: Red, Green and Blue (RGB) are the primary colors for the computer display, and the color depth supported by the OMAP5910 is programmable up to 16 bits per pixel, RGB565 (5 bits for Red, 6 bits for Green and 5 bits for Blue). In consumer video such as DVD, camera, digital TV and others, the common color coding scheme is YCbCr, where Y is the luminance, Cb is the blue chrominance and Cr is the red chrominance. Human eyes are much more sensitive to the Y component of the video, and this enables sub-sampling of the chrominance components without the loss being noticeable to the human eye. The sub-sampling scheme is referred to as YCbCr 4:2:0, YCbCr 4:2:2 or YCbCr 4:4:4.
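To illustrate the post-processing step mentioned above, here is a hedged NumPy sketch of a YCbCr 4:2:0 to RGB conversion, assuming the common BT.601 full-range conversion coefficients; actual hardware may use different coefficients, fixed-point arithmetic and better chroma upsampling.

```python
import numpy as np

def yuv420_to_rgb(Y, Cb, Cr):
    """Y: HxW luma plane; Cb, Cr: (H/2)x(W/2) chroma planes (4:2:0 sampling)."""
    # upsample chroma to full resolution by simple pixel replication
    Cb_full = np.repeat(np.repeat(Cb, 2, axis=0), 2, axis=1).astype(np.float64) - 128.0
    Cr_full = np.repeat(np.repeat(Cr, 2, axis=0), 2, axis=1).astype(np.float64) - 128.0
    Yf = Y.astype(np.float64)

    # assumed BT.601 full-range conversion formulas
    R = Yf + 1.402 * Cr_full
    G = Yf - 0.344136 * Cb_full - 0.714136 * Cr_full
    B = Yf + 1.772 * Cb_full
    return np.clip(np.stack([R, G, B], axis=-1), 0, 255).astype(np.uint8)

# toy 4x4 frame in 4:2:0 layout
Y  = np.full((4, 4), 128, dtype=np.uint8)
Cb = np.full((2, 2), 128, dtype=np.uint8)
Cr = np.full((2, 2), 200, dtype=np.uint8)
print(yuv420_to_rgb(Y, Cb, Cr)[0, 0])   # a reddish pixel
```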

Video coding standards MPEG and H.26X: The Moving Picture Experts Group (MPEG) was established in 1988 in the framework of the Joint ISO/IEC Technical Committee (JTC 1) on Information Technology, with the mandate to develop standards for the coded representation of moving pictures, associated audio and their combination when used for storage and retrieval on digital storage media at bitrates up to about 1.5 Mbit/s. The standard was nicknamed MPEG-1 and was issued in 1992. The scope of the group was later extended to provide appropriate MPEG-2 video and associated audio compression algorithms for a wide range of audio-visual applications at substantially higher bitrates not successfully covered or envisaged by the MPEG-1 standard. Specifically, MPEG-2 was given the charter to provide video quality not lower than NTSC/PAL and up to CCIR 601 quality, with bitrates targeted between 2 and 10 Mbit/s. Emerging applications, such as digital cable TV distribution, networked database services via ATM, digital VTR applications, and satellite and terrestrial digital broadcasting distribution, were seen to benefit from the increased quality expected to result from the emerging MPEG-2 standard. The MPEG-2 standard was released in 1994. Table I summarizes the primary applications and quality requirements targeted by the MPEG-1 and MPEG-2 video standards, together with examples of typical video input parameters and compression ratios achieved.

The MPEG-1 and MPEG-2 video compression techniques developed and standardized by the MPEG group have developed into important and successful video coding standards worldwide, with an increasing number of MPEG-1 and MPEG-2 VLSI chip-sets and products becoming available on the market. One key factor for this success is the generic structure of the MPEG standards, supporting a wide range of applications and application-specific parameters [schaf, siko1]. To support the wide range of application profiles, a diversity of input parameters, including flexible picture size and frame rate, can be specified by the user. Another important factor is that the MPEG group only standardized the decoder structures and the bitstream formats. This allows a large degree of freedom for manufacturers to optimize the coding efficiency (in other words, the video quality at a given bit rate) by developing innovative encoder algorithms even after the standards were finalized.

MPEG-1 Standard (1991) (ISO/IEC 11172)
•Target bit-rate about 1.5 Mbps
•Typical image format CIF, no interlace
•Frame rate 24 ... 30 fps
•Main application: video storage for multimedia (e.g., on CD-ROM)
MPEG-2 Standard (1994) (ISO/IEC 13818)
•Extension for interlace, optimized for TV resolution (NTSC: 704 x 480 pixels)
•Image quality similar to NTSC, PAL, SECAM at 4-8 Mbps
•HDTV at 20 Mbps
MPEG-4 Standard (1999) (ISO/IEC 14496)
•Object based coding
•Wide range of applications, with choices of interactivity, scalability, error resilience, etc.

MPEG-1: coding of I-pictures
•I-pictures: intraframe coded
•8x8 DCT
•Arbitrary weighting matrix for coefficients
•Differential coding of DC coefficients
•Uniform quantization
•Zig-zag scan, run-level coding
•Entropy coding
•Unfortunately, not quite JPEG
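To illustrate the zig-zag scan and run-level coding steps listed above, the sketch below orders an 8x8 block of quantized coefficients along the standard zig-zag path and emits (run-of-zeros, level) pairs; the block contents are made up for illustration, and the entropy coding of the pairs is omitted.

```python
import numpy as np

def zigzag_indices(n=8):
    """Return the (row, col) visiting order of the standard zig-zag scan."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_level(block):
    """Run-level code the AC coefficients of a quantized 8x8 block."""
    scan = [block[r, c] for r, c in zigzag_indices()][1:]   # skip the DC coefficient
    pairs, run = [], 0
    for v in scan:
        if v == 0:
            run += 1
        else:
            pairs.append((run, int(v)))
            run = 0
    pairs.append(('EOB',))                                  # end-of-block marker
    return pairs

blk = np.zeros((8, 8), dtype=int)
blk[0, 0], blk[0, 1], blk[1, 0], blk[2, 2] = 50, 3, -2, 1   # a sparse quantized block
print(run_level(blk))   # [(0, 3), (0, -2), (9, 1), ('EOB',)]
```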

MPEG-1: coding of P-pictures
•Motion-compensated prediction from an encoded I-picture or P-picture (DPCM)
•Half-pel accuracy of motion compensation, bilinear interpolation
•One displacement vector per macroblock
•Differential coding of displacement vectors
•Coding of prediction error with 8x8 DCT, uniform threshold quantization, zig-zag scan as in I-pictures

MPEG-1: coding of B-pictures
•Motion-compensated prediction from two consecutive P- or I-pictures, either:
 ‒only forward prediction (1 vector/macroblock), or
 ‒only backward prediction (1 vector/macroblock), or
 ‒average of forward and backward prediction = interpolation (2 vectors/macroblock)
•Half-pel accuracy of motion compensation, bilinear interpolation
•Coding of prediction error with 8x8 DCT, uniform quantization, zig-zag scan as in I-pictures
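The following hedged sketch illustrates the B-picture mode decision described above: for one macroblock it compares forward, backward and interpolated (averaged) prediction and keeps the one with the smallest error. The candidate blocks are random stand-ins; a real encoder would obtain them by motion-compensating the past and future reference pictures.

```python
import numpy as np

rng = np.random.default_rng(4)
cur    = rng.integers(0, 256, (16, 16)).astype(np.float64)   # current macroblock
fwd    = cur + rng.normal(0, 6, (16, 16))                    # forward prediction (from past reference)
bwd    = cur + rng.normal(0, 12, (16, 16))                   # backward prediction (from future reference)
interp = 0.5 * (fwd + bwd)                                   # interpolated prediction

candidates = {'forward': fwd, 'backward': bwd, 'interpolated': interp}
errors = {name: np.sum(np.abs(cur - p)) for name, p in candidates.items()}
mode = min(errors, key=errors.get)
print(mode, {k: round(v) for k, v in errors.items()})        # the mode with the least difference gets coded
```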

MPEG-4
•Supports highly interactive multimedia applications as well as traditional applications
•Advanced functionalities: interactivity, scalability, error resilience, ...
•Coding of natural and synthetic audio and video, as well as graphics
•Enables the multiplexing of audiovisual objects and their composition in a scene
MPEG-4: Scene with audiovisual objects

MPEG Family
◼ MPEG-1: similar to H.263 CIF in quality
◼ MPEG-2: higher quality: DVD, Digital TV, HDTV
◼ MPEG-4/H.264: more modern codec, aimed at lower bitrates; works well for HDTV too
MPEG-1 Compression
◼ MPEG: Motion Pictures Expert Group
◼ Finalized in 1991
◼ Optimized for video resolutions:
‒352x240 pixels at 30 fps (NTSC)
‒352x288 pixels at 25 fps (PAL/SECAM)
◼ Optimized for bit rates around 1-1.5 Mb/s
◼ Syntax allows up to 4095x4095 at 60 fps, but this is not commonly used
◼ Progressive scan only (not interlaced)
MPEG Frame Types
◼ Unlike H.261, each frame must be of one type. H.261 can mix intra- and inter-coded MBs in one frame.
◼ Three types in MPEG:
‒I-frames (like H.261 intra-coded frames)
‒P-frames ("predictive", like H.261 inter-coded frames)
‒B-frames ("bidirectional predictive")
MPEG I-frames
◼ Similar to JPEG, except:
‒Luminance and chrominance share quantization tables.
‒Quantization is adaptive (the table can change) for each macroblock.
◼ Unlike H.261, every n frames a full intra-coded frame is included.
‒Permits skipping: start decoding at the first I-frame following the point you skip to.
‒Permits fast scan: just play I-frames.
‒Permits playing backwards: decode the previous I-frame, decode the frames that depend on it, and play the decoded frames in reverse order.
◼ An I-frame and the successive frames up to the next I-frame (n frames) is known as a group of pictures (GOP).
MPEG P-Frames
◼ Similar to an entire frame of H.261 inter-coded blocks. Half-pixel accuracy in motion vectors (pixels are averaged if needed).
◼ May be coded from the previous I-frame or the previous P-frame.
B-frames
◼ Bidirectional predictive frames.
◼ Each macroblock contains two sets of motion vectors.
◼ Coded from one previous frame, one future frame, or a combination of both:
1. Do the motion vector search separately in the past reference frame and the future reference frame.
2. Compare: the difference from the past frame, the difference from the future frame, and the difference from the average of the past and future frames.
3. Encode the version with the least difference.
B-frame disadvantages
◼ Computational complexity: more motion search, and the need to decide whether or not to average.
◼ Increase in memory bandwidth: an extra picture buffer is needed, and frames must be stored and encoded or played back out of order.
◼ Delay: adds several frames of delay at the encoder waiting for the needed later frame; adds several frames of delay at the decoder holding the decoded I/P frame while decoding and playing the prior B-frames that depend on it.
B-frame advantage
◼ B-frames increase compression. Typically twice as many B-frames are used as I+P frames.
MPEG-2
◼ ISO/IEC standard in 1995
◼ Aimed at higher quality video
◼ Supports interlaced formats
◼ Many features, but has profiles which constrain common subsets of those features:
‒Main profile (MP): 2-15 Mb/s over broadcast channels (e.g. DVB-T) or storage media (e.g. DVD)
‒PAL quality: 4-6 Mb/s; NTSC quality: 3-5 Mb/s
MPEG-3
◼ Doesn't exist. It was aimed at HDTV and ended up being folded into MPEG-2.
MPEG-4
◼ ISO/IEC designation 'ISO/IEC 14496': 1999
◼ MPEG-4 Version 2: 2000
◼ Aimed at low bitrates (10 Kb/s)
◼ Can scale very high (1 Gb/s)
◼ Based around the concept of the composition of basic video objects into a scene

H.26X

H.261 Video Compression Standard

•First major video compression standard
•Targeted at 2-way video conferencing and ISDN networks that supported 40 Kbps to 2 Mbps
•Supported resolutions include CIF and QCIF
•Chrominance resolution subsampling 4:2:0
•Low complexity and low delay to support real-time communications
•Only I and P frames; no B frames
•Full-pixel accuracy motion estimation
•8x8 block-based DCT coding of the residual
•Fixed linear quantization across all AC coefficients of the DCT
•Run-length coding of quantized DCT coefficients followed by Huffman coding for DCT and motion information
•Loop filtering (a simple digital filter applied on block edges) applied to reference frames to reduce blocking artifacts
ISO/IEC MPEG-2 / ITU-T H.262
•Profiles defined for scalable video applications with scalable coding tools to allow multiple-layer video coding, including temporal, spatial, and SNR scalability, and data partitioning.
•MPEG-2 Main Profile supports single-layer coding (non-scalable) and is the one that is widely deployed.
•MPEG-2 non-scalable (single layer) profiles:
‒Simple profile: no B frames, for low-delay applications
‒Main profile: support for B frames; can also decode MPEG-1 video

•MPEG-2 scalable profiles:
‒SNR profile: adds enhancement layers for DCT coefficient refinement
‒Spatial profile: adds support for enhancement layers carrying the coded image at different spatial resolutions (sizes)
‒High profile: adds support for coding a 4:2:2 video signal and includes the scalability tools of the SNR and spatial profiles
ITU-T H.263: Main Features
Enhancements of the H.261 baseline algorithm:
•Half-pixel accuracy motion estimation and compensation.
•MVs differentially coded, with median MV prediction.
•8x8 discrete cosine transform and uniform quantization.
•Variable length coding of DCT coefficients and MVs.
Four optional modes:
•Unrestricted Motion Vector (UMV) mode: increased motion vector range with frame boundary extrapolation.
•Advanced Prediction (AP) mode: 4 MVs per macroblock; Overlapped Block Motion Compensation (OBMC).
•PB frame mode: bi-directional prediction.
•Arithmetic coding mode.
•About 3 to 4 dB PSNR improvement over H.261 at bit-rates less than or equal to 64 Kbit/s.
•30% saving in bit-rate compared to MPEG-1.
Design flexibility (things not specified by the standard):
•The H.263 standard inherently has the capability to adapt to varying input video content.
•Frame level: Intra, Inter or skipped.
•Macroblock (MB) level: Intra, Inter or un-coded; one MV or 4 MVs; quantizer parameter (QP) value. Constant QP gives almost constant quality at a variable bit-rate; varying QP gives variable quality while trying to achieve an almost constant bit-rate.
H.261:
1. Compression standard defined by the ITU-T for the provision of video telephony and videoconferencing services over ISDN.
2. 64 kbps; CIF (videoconferencing) or quarter CIF (QCIF) (video telephony) used.
3. Each frame divided into macroblocks of 16 x 16 pixels.
Only I- and P-frames are used, with three P-frames between each pair of I-frames. The start of each new encoded video frame is indicated by the picture start code.
H.263:
Defined by the ITU-T for use in video applications over wireless networks and the PSTN, e.g. video telephony, video conferencing, security surveillance, interactive game playing, and real-time applications over a modem (therefore 28.8 kbps - 56 kbps). Based on H.261, but H.261 gives poor picture quality below 64 kbps, so H.263 is more advanced. QCIF and sub-QCIF are used, with reduced horizontal resolution. Uses I-, P- and B-frames. Also, neighbouring pairs of P- and B-frames can be encoded as a single entity, a PB-frame, which reduces encoding overheads and increases the frame rate. Other mechanisms used: unrestricted motion vectors, error resilience, error tracking, independent segment decoding, and reference picture selection.
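H.263's median prediction of motion vectors, mentioned in the feature list above, can be sketched as follows; the three candidate predictors (left, above and above-right neighbours) and the numeric values are illustrative assumptions.

```python
def median_mv_predictor(mv_left, mv_above, mv_above_right):
    """Component-wise median of the three neighbouring motion vectors."""
    med = lambda a, b, c: sorted((a, b, c))[1]
    return (med(mv_left[0], mv_above[0], mv_above_right[0]),
            med(mv_left[1], mv_above[1], mv_above_right[1]))

# current macroblock MV and its three coded neighbours (illustrative values, half-pel units)
mv_cur = (5, -2)
pred = median_mv_predictor((4, -1), (6, -3), (1, 0))
mvd = (mv_cur[0] - pred[0], mv_cur[1] - pred[1])   # only the difference (MVD) is entropy coded
print(pred, mvd)                                   # (4, -1) (1, -1)
```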

CH 8: VIDEO SEGMENTATION

Temporal Segmentation: Segmentation is highly dependent on the model and the criteria for grouping pixels into regions. In motion segmentation, pixels are grouped together based on their similarity in motion. For any given application, the segmentation algorithm needs to find a balance between model complexity and analysis stability. An insufficient model will inevitably result in over-segmentation, while complicated models introduce more complexity and require more computation and constraints for stability. In image coding, the objective of segmentation is to exploit the spatial and temporal coherences in the video data by adequately identifying the coherent motion regions with simple motion models. Block-based video coders avoid the segmentation problem altogether by artificially imposing a regular array of blocks and applying motion coherence within these blocks. This model requires very small overhead in coding, but it does not accurately describe an image and does not fully exploit the coherences in the video data. Region-based approaches, which exploit the coherence of object motion by grouping similar motion regions into a single description, have shown improved performance over block-based coders.

In layered representation coding [14, 15], video data is decomposed into a set of overlapping layers. Each layer consists of: an intensity map describing the intensity profile of a coherent motion region over many frames; an alpha map describing its relationship with other layers; and a parametric motion map describing the motion of the region. The layered representation has the potential for achieving greater compression because each layer exploits both the spatial and temporal coherences of the video data. In addition, the representation is similar to those used in computer graphics, so it provides a convenient way to manipulate video data. Our goal in spatiotemporal segmentation is to identify the spatial and temporal coherences in video data and derive the layered representation for the image sequence.

Temporal coherence: Motion estimation provides the necessary information for locating corresponding regions in different frames. The new positions of each region can be predicted given the previously estimated motion for that region. Motion models are estimated within each of these predicted regions and an updated set of motion hypotheses is derived for the image. Alternatively, the motion models estimated from the previous segmentation can be used by the region classifier to directly determine the corresponding coherent motion regions. Thus, segmentation based on motion conveniently provides a way to track coherent motion regions. In addition, when the analysis is initialized with the segmentation results from the previous frame, computation is reduced and the robustness of estimation is increased.
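As a hedged illustration of grouping blocks by motion similarity, the sketch below clusters per-block motion vectors into two coherent motion regions with a simple k-means step (scikit-learn is assumed to be available); real layered-representation systems fit parametric (e.g. affine) motion models and refine the regions iteratively.

```python
import numpy as np
from sklearn.cluster import KMeans

# per-block motion vectors for an 8x8 grid of blocks (toy data):
# the left half of the scene moves right, the right half is static, plus estimation noise
rng = np.random.default_rng(5)
mv = np.zeros((8, 8, 2))
mv[:, :4, 0] = 4.0                                  # horizontal motion of 4 px in the left half
mv += rng.normal(0, 0.3, mv.shape)                  # estimation noise

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(mv.reshape(-1, 2))
print(labels.reshape(8, 8))                         # two coherent motion regions emerge
```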

Temporal segmentation adds structure to the video by partitioning the video into chapters. This is a first step for video summarization methods, which should also enable fast browsing and indexing so that a user can quickly discover important activities or objects.

Shot boundary detection, hard cuts and soft cuts: The concept of temporal image sequence (video) segmentation is not a new one, as it dates back to the first days of motion pictures, well before the introduction of computers. Motion picture specialists perceptually segment their works into a hierarchy of partitions. A video (or film) is completely and disjointly segmented into a sequence of scenes, which are subsequently segmented into a sequence of shots. Scenes (also called story units) are a concept that is much older than motion pictures, ultimately originating in the theater. Traditionally, a scene is a continuous sequence that is temporally and spatially cohesive in the real world, but not necessarily cohesive in the projection of the real world on film. On the other hand, shots originate with the invention of motion cameras and are defined as the longest continuous sequence that originates from a single camera take, which is what the camera images in an uninterrupted run.

In general, the automatic segmentation of a video into scenes ranges from very difficult to intractable. On the other hand, video segmentation into shots is both exactly defined and characterized by distinctive features of the video stream itself. This is because video content within a shot tends to be continuous, due to the continuity of both the physical scene and the parameters (motion, zoom, focus) of the camera that images it. Therefore, in principle, the detection of a shot change between two adjacent frames simply requires computing an appropriate continuity or similarity metric. However, this simple concept has three major complications. The first, and most obvious, is defining a continuity metric for the video that is insensitive to gradual changes in camera parameters, lighting and physical scene content, easy to compute, and discriminating enough to be useful. The simplest way to do this is to extract one or more scalar or vector features from each frame and to define distance functions on the feature domain. Alternatively, the features themselves can be used either for clustering the frames into shots or for detecting shot transition patterns. The second complication is deciding which values of the continuity metric correspond to a shot change and which do not. This is not trivial, since the feature variation within certain shots can exceed the respective variation across shots. Decision methods for shot boundary detection include fixed thresholds, adaptive thresholds and statistical detection methods. The third complication, and the most difficult to handle, is the fact that not all shot changes are abrupt. Using motion picture terminology, changes between shots can belong to the following categories:

1. Cut. This is the classic abrupt change case, where one frame belongs to the disappearing shot and the next one to the appearing shot.
2. Dissolve. In this case, the last few frames of the disappearing shot temporally overlap with the first few frames of the appearing shot. During the overlap, the intensity of the disappearing shot decreases from normal to zero (fade out), while that of the appearing shot increases from zero to normal (fade in).
3. Fade. Here, first the disappearing shot fades out into a blank frame, and then the blank frame fades in into the appearing shot.
4. Wipe. This is actually a set of shot change techniques, where the appearing and disappearing shots coexist in different spatial regions of the intermediate video frames, and the region occupied by the former grows until it entirely replaces the latter.
5. Other transition types. There is a multitude of inventive special-effects techniques used in motion pictures. These are in general very rare and difficult to detect.

Shot-boundary detection is the first step towards scene extraction in video, which is useful for video content analysis and indexing. A shot in a video is a sequence of frames taken continuously by one camera. A common approach to detecting shot boundaries consists of computing the similarity between pairs of consecutive frames and marking the occurrence of a boundary wherever the similarity is lower than some threshold. The similarity is measured either globally, for example with a histogram, or locally within rectangular blocks. Previously, luminance/color, edges, texture and SIFT features have been used to represent individual frames.
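The following is a minimal sketch of the histogram-difference approach just described, assuming frames are available as grayscale NumPy arrays; the fixed threshold is illustrative, and real systems often use adaptive thresholds as noted earlier.

```python
import numpy as np

def hist_similarity(f1, f2, bins=32):
    """Histogram-intersection similarity between two grayscale frames (1.0 = identical distribution)."""
    h1, _ = np.histogram(f1, bins=bins, range=(0, 255), density=True)
    h2, _ = np.histogram(f2, bins=bins, range=(0, 255), density=True)
    return np.minimum(h1, h2).sum() / h1.sum()

def detect_cuts(frames, threshold=0.6):
    """Mark a hard cut between frame i and i+1 when the similarity drops below the threshold."""
    return [i for i in range(len(frames) - 1)
            if hist_similarity(frames[i], frames[i + 1]) < threshold]

# toy sequence: 5 dark frames followed by 5 bright frames -> one cut between indices 4 and 5
rng = np.random.default_rng(6)
dark   = [rng.integers(0, 80,   (48, 64)).astype(np.uint8) for _ in range(5)]
bright = [rng.integers(170, 255, (48, 64)).astype(np.uint8) for _ in range(5)]
print(detect_cuts(dark + bright))   # [4]
```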