Motion Compensated Two-Link Chain Coding for Binary Shape Sequence
Total Page:16
File Type:pdf, Size:1020Kb
1 Motion Comp ensated Two-link Chain Co ding for Binary Shap e Sequence Zhitao Lu William A. Pearlman Electrical Computer and System Engineering Department Rensselare Polytechnic Institute Troy NY 12180 luz2, p [email protected] ABSTRACT In this pap er, we present a motion comp ensated two-link chain co ding technique to e ectively enco de 2-D binary shap e sequences for ob ject-based video co ding. This technique consists of a contour motion estimation and comp ensation algorithm and a two-link chain co ding algorithm. The ob ject contour is de ned on a 6-connected contour lattice for a smo other contour representation. The contour in the current frame is rst predicted by global motion and lo cal motion based on the deco ded contour in the previous frame; then, it is segmented into motion success segments, which can b e predicted by the global motion or the lo cal motion, and motion failure segments, which can not b e predicted by the global and lo cal motion. For each motion failure segment, a two- link chain co de, which uses one chain co de to representtwo consecutivecontour links, followed by an arithmetic co der is prop osed for ecient co ding. Each motion success segment can b e represented by the motion vector and its length. For contour motion estimation and comp ensation, b esides the translational motion mo del, an ane global motion mo del is prop osed and investigated for complex global motion. We test the p erformance of the prop osed technique by several MPEG-4 shap e test sequences. The exp erimental results show that our prop osed 1 scheme is b etter than the CAE technique which is applied in the MPEG-4 veri cation mo del. Keywords: Shap e Co ding, Video Co ding, Chain Co ding, MPEG-4, Motion Estimation, Motion Comp ensation. 1. INTRODUCTION With the emergence of multimedia applications, functions such as access, searching, indexing and manipulation of visual information at the semantic ob ject level, are b ecoming very imp ortant issues in research and standard- ization e orts, such as MPEG-4. In MPEG-4, each ob ject is represented by three sets of parameters, shap e, texture, and motion so that the ob ject can b e enco ded, accessed and manipulated in arbitrary shap e. Among these three sets of parameters, shap e information is crucial for ob ject representation and ob ject-based co ding. 1{7 In order to transmit the shap e of an ob ject eciently,alargenumber of techniques have b een prop osed. The shap e co ding techniques are typically classi ed as blo ck-based techniques and contour-based techniques. Blo ck-based techniques use a binary image to represent the shap e of the video ob ject; this binary image is enco ded blo ckbyblock as in the conventional image co ding technique. The context-based arithmetic enco ding 1 is one of the most successful metho ds for binary image co ding and has b een adopted in the MPEG-4 (CAE) veri cation mo del. Contour-based techniques p erform the compression along the b oundary of the video ob ject. A p olygon or a contour list is usually used to represent the shap e of a video ob ject. For these representations, distortion b etween the deco ded and original shap e information is easy and well de ned. According to the sp eci ed distortion, the co ding algorithm can achieve lossy and/or lossless co ding. The shap e images b ear a high degree of similarity along the temp oral dimension of an image sequence. The 3, 5 co ding eciency can b e improved by exploiting the temp oral redundancy in a shap e sequence. In this pap er, wepresent a motion comp ensated two-link chain co ding technique. The contour of the video ob ject is de ned on a 6-connected contour lattice for a smo other contour representation. Atow-link chain co ding technique is prop osed to exploit the spatial redundancy within the ob ject contour. In contour motion estimation and comp ensation, b esides translational global motion mo del, wealsoinvestigate an ane global motion mo del for complex motion in the contour sequence. The whole pap er is organized as follows: In section 2, we present the two-link chain co ding technique. In section 3, b oth translational and ane global contour motion estimation are presented. The exp erimental results on MPEG-4 shap e sequences are presented in section 4. The pap er is concluded in section 5. 2. TWO-LINK CHAIN CODE Our approach is a mo di ed chain co ding metho d which b elongs to the contour-based technique. We de ne the contour p oints on the contour lattice which consists of the half-pixel p ositions on the image. As illustrated in Fig. 1, the represents the original image pixel; the and + represent the half-pixel p ositions between two neighb oring image pixels in the horizontal and vertical directions; The b ox represents the remaining half-pixel p ositions. Throughout the whole pap er, top p osition, left p osition and vertex p osition are used to refer to these three typ es of half-pixel p ositions. In our chain co ding system, A contour p ointisonatop (or left ) p osition when its twoneighb oring image pixels in the vertical (or horizontal) direction have di erent lab els. Any contour which passes a vertex p osition can b e represented byacontour without passing that vertex p osition. Therefore, our contour lattice only contains the top and left half-pixel p ositions. One advantage of this contour lattice representation is that it makes contour smo other. In order to improve the contour co ding eciency, we imp ose constrains on the contour to be enco ded by 3 applying a ma jority lter. A ma jority lter simpli es the contour by removing its rugged comp onents. The e ect is similar to the p erfect 8-connectivity constraint. Another feature of our chain co de is the jointencodingoftwo consecutive links in o der to exploit the spatial redundancy of the contour. An illustration of the enco ding of two consecutive links is shown in Fig. 1. On the contour lattice,eachcontour p oint has 6 p ossible links to its six neighb ors. After ma jority ltering, the number of the next p ossible links is more limited. If the current link is in the horizontal or vertical direction, there are 3 p ossible directions for the next link. If the current link is in a diagonal direction, there are only 2 p ossible directions for the next link ; other directional links are not p ossible. If the last link is in the horizontal or vertical direction, there are 7 p ossible combinations for the next two contour links; If the last link is in the diagonal direction, there are 5 p ossible combinations for the next two contour links. In order to reduce the bit-rate for the straightcontour segment, we add two more dashed links. When the contour segment is a straight line in a diagonal direction, one co de can represent 3 or 4 contour links. Without using an entropy enco der, the bit-rate of the prop osed chain co de is 1:5 bits/link. ******** + + + + + + + + + ******* * + + + + + + + + + *******pL pc * + + + + + + + + + Image Pixel ******* * * Top Position + + + + + + + + + + Left Positon *** **** * + + + + + + + + + ******* * pL -- last contour point + + + + + + + + + pc pc -- current contour point *******pL * + + + + + + + + + ******* * + + + + + + + + + ******* * Figure 1. Prop osed two-pixel chain co de We use a context-based arithmetic enco der to enco de the chain co de sequence for a higher co ding eciency. The context is the direction of the last contour link. In total, there are 12 contexts, six of them b egin from a top p osition, and the other six b egin from a left p osition. The probability of eachcodeword under each context is adapted during the co ding pro cedure. 3. CONTOUR MOTION ESTIMATION AND COMPENSATION Since a contour sequence has very high correlation in temp oral domain as the texture do es, a straightforward metho d to exploit its temp oral redundancy is using motion estimation and comp ensation. The contour in the current frame can b e predicted from the contour obtained in the previous frame. Only the contour segment which can not b e predicted is enco ded by the shap e co ding technique. This can reduce the bit-rate of shap e co ding drastically. In the literature, the contour motion is usually assumed as translational motion. This approach works well for image sequences with low sp eed or simple motion. When there is zo oming and/or rotation, this assumption do es not work well. In this pap er, we use two global motion mo dels, translational global motion mo del and ane global motion mo del, to predict the global motion of the contour sequence to b e enco ded. 3.1. Translational Contour Motion Estimation In this contour motion estimation/comp ensation scheme, the ob ject contour is assumed to undergo a translational motion. A global motion vector for the whole ob ject contour is searched according to the numberofmatched contour p oints b etween twocontours. The whole contour is segmented into global motion success segments and global motion failure segments. For each global motion failure segment, a local motion vector search pro cess is applied. This pro cess searches for the motion vector that which minimizes the following function: f (mv )=bit(mv )+ bit(l eng th)+ bit(f ail ur eseg ment) (1) where bit(mv ) denotes the numb er of bits used for the lo cal motion vector mv ; bit(l eng th) the numb er of bits used for the length of the lo cal motion success segment under motion vector mv ;andbit(f ail ur eseg ment) the numb er of estimated bits used for the motion failure segment under motion vector mv .