Motion Compensated Two-link Chain Coding for Binary
Shape Sequence
Zhitao Lu
William A. Pearlman
Electrical, Computer, and Systems Engineering Department
Rensselaer Polytechnic Institute, Troy, NY 12180
luz2, p [email protected]
ABSTRACT
In this paper, we present a motion compensated two-link chain coding technique to effectively encode 2-D binary shape sequences for object-based video coding. This technique consists of a contour motion estimation and compensation algorithm and a two-link chain coding algorithm. The object contour is defined on a 6-connected contour lattice for a smoother contour representation. The contour in the current frame is first predicted by global motion and local motion based on the decoded contour in the previous frame; then, it is segmented into motion success segments, which can be predicted by the global motion or the local motion, and motion failure segments, which can be predicted by neither. For each motion failure segment, a two-link chain code, which uses one chain code to represent two consecutive contour links, followed by an arithmetic coder is proposed for efficient coding. Each motion success segment can be represented by its motion vector and its length. For contour motion estimation and compensation, besides the translational motion model, an affine global motion model is proposed and investigated for complex global motion. We test the performance of the proposed technique on several MPEG-4 shape test sequences. The experimental results show that our proposed scheme is better than the CAE technique [1], which is applied in the MPEG-4 verification model.
Keywords: Shape Coding, Video Coding, Chain Coding, MPEG-4, Motion Estimation, Motion Compensation.
1. INTRODUCTION
With the emergence of multimedia applications, functions such as access, searching, indexing, and manipulation of visual information at the semantic object level are becoming very important issues in research and standardization efforts such as MPEG-4. In MPEG-4, each object is represented by three sets of parameters, shape, texture, and motion, so that the object can be encoded, accessed, and manipulated in arbitrary shape. Among these three sets of parameters, shape information is crucial for object representation and object-based coding.

In order to transmit the shape of an object efficiently, a large number of techniques have been proposed [1-7]. The shape coding techniques are typically classified as block-based techniques and contour-based techniques. Block-based techniques use a binary image to represent the shape of the video object; this binary image is encoded block by block as in conventional image coding techniques. The context-based arithmetic encoding (CAE) [1] is one of the most successful methods for binary image coding and has been adopted in the MPEG-4 verification model. Contour-based techniques perform the compression along the boundary of the video object. A polygon or a contour list is usually used to represent the shape of a video object. For these representations, the distortion between the decoded and original shape information is easy to measure and well defined. According to the specified distortion, the coding algorithm can achieve lossy and/or lossless coding.

The shape images bear a high degree of similarity along the temporal dimension of an image sequence. The coding efficiency can be improved by exploiting the temporal redundancy in a shape sequence [3,5].
In this paper, we present a motion compensated two-link chain coding technique. The contour of the video object is defined on a 6-connected contour lattice for a smoother contour representation. A two-link chain coding technique is proposed to exploit the spatial redundancy within the object contour. In contour motion estimation and compensation, besides the translational global motion model, we also investigate an affine global motion model for complex motion in the contour sequence. The paper is organized as follows: In Section 2, we present the two-link chain coding technique. In Section 3, both translational and affine global contour motion estimation are presented. The experimental results on MPEG-4 shape sequences are presented in Section 4. The paper is concluded in Section 5.
2. TWO-LINK CHAIN CODE
Our approach is a modified chain coding method belonging to the contour-based class of techniques. We define the contour points on a contour lattice which consists of the half-pixel positions of the image. Fig. 1 shows the original image pixels, the half-pixel positions between two neighboring image pixels in the vertical and horizontal directions, and the remaining half-pixel positions. Throughout the paper, top position, left position, and vertex position are used to refer to these three types of half-pixel positions.
In our chain coding system, a contour point is on a top (or left) position when its two neighboring image pixels in the vertical (or horizontal) direction have different labels. Any contour which passes through a vertex position can be represented by a contour which does not pass through that vertex position. Therefore, our contour lattice contains only the top and left half-pixel positions. One advantage of this contour lattice representation is that it makes the contour smoother.
In order to improve the contour coding efficiency, we impose constraints on the contour to be encoded by applying a majority filter [3]. A majority filter simplifies the contour by removing its rugged components. The effect is similar to the perfect 8-connectivity constraint.
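As a concrete illustration, a minimal majority filter on a binary shape image might look as follows. The 3x3 window and replicated-edge border handling are assumptions for the sketch; the paper does not specify these details:

```python
def majority_filter(img):
    """3x3 majority filter on a binary image (list of lists of 0/1).
    Each pixel takes the majority label of its neighborhood, which
    removes small rugged components before contour extraction."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            votes = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # clamp at the border (replicate edge pixels)
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    votes += img[yy][xx]
            # majority of the 9 window pixels: at least 5 set -> 1
            out[y][x] = 1 if votes >= 5 else 0
    return out
```

An isolated pixel is erased while solid regions are preserved, which smooths the resulting contour in the way the text describes.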
Another feature of our chain code is the joint encoding of two consecutive links in order to exploit the spatial redundancy of the contour. An illustration of the encoding of two consecutive links is shown in Fig. 1. On the contour lattice, each contour point has 6 possible links to its six neighbors. After majority filtering, the number of possible next links is more limited. If the current link is in the horizontal or vertical direction, there are 3 possible directions for the next link. If the current link is in a diagonal direction, there are only 2 possible directions for the next link; other directions are not possible. Consequently, if the last link is in the horizontal or vertical direction, there are 7 possible combinations for the next two contour links; if the last link is in a diagonal direction, there are 5 possible combinations for the next two contour links. In order to reduce the bit-rate for straight contour segments, we add two more dashed links. When the contour segment is a straight line in a diagonal direction, one code can then represent 3 or 4 contour links. Without using an entropy encoder, the bit-rate of the proposed chain code is 1.5 bits/link.

Figure 1. Proposed two-link chain code. The lattice shows the image pixels together with the top and left half-pixel positions; p_L denotes the last contour point and p_c the current contour point.
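The combination counts above follow from the branching rules and can be checked with a small enumeration. The labels "hv" and "diag" are illustrative stand-ins for the actual lattice directions; the successor lists encode the counts stated in the text (three successors after a horizontal/vertical link, two after a diagonal one):

```python
# After a horizontal/vertical link: 3 possible next links
# (the totals 7 and 5 imply one straight h/v and two diagonal).
# After a diagonal link: 2 possible next links (one h/v, one diagonal).
NEXT = {
    "hv": ["hv", "diag", "diag"],
    "diag": ["hv", "diag"],
}

def two_link_combinations(last):
    """Enumerate all (link1, link2) pairs that can follow a link of
    the given type, i.e. the codewords of the two-link chain code."""
    return [(a, b) for a in NEXT[last] for b in NEXT[a]]
```

The enumeration reproduces the 7 combinations after a horizontal/vertical link and the 5 combinations after a diagonal link quoted above.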
We use a context-based arithmetic encoder to encode the chain code sequence for higher coding efficiency. The context is the direction of the last contour link. In total, there are 12 contexts; six of them begin from a top position, and the other six begin from a left position. The probability of each codeword under each context is adapted during the coding procedure.
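The adaptive context modelling that feeds the arithmetic coder can be sketched as per-context symbol counts; each of the 12 contexts pairs a start-position type (top or left) with the direction of the last link. The Laplace (+1) initialization is an assumption, since the paper does not give the initial counts:

```python
import math
from collections import defaultdict

class ContextModel:
    """Adaptive per-context symbol frequencies, as fed to an
    arithmetic coder. A context is e.g. ('top', last_direction);
    12 contexts total in the paper's scheme."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def cost_bits(self, context, symbol, alphabet_size):
        """Ideal arithmetic-code cost (in bits) of `symbol` under
        `context`, with Laplace (+1) smoothing."""
        c = self.counts[context]
        total = sum(c.values()) + alphabet_size  # +1 per symbol
        return -math.log2((c[symbol] + 1) / total)

    def update(self, context, symbol):
        """Adapt the model after coding one symbol."""
        self.counts[context][symbol] += 1
```

As the counts adapt, frequent codewords under a context become cheap, which is what lets the coder beat the fixed 1.5 bits/link rate.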
3. CONTOUR MOTION ESTIMATION AND COMPENSATION
Since a contour sequence has very high correlation in the temporal domain, as the texture does, a straightforward method to exploit its temporal redundancy is motion estimation and compensation. The contour in the current frame can be predicted from the contour obtained in the previous frame. Only the contour segments which cannot be predicted are encoded by the shape coding technique. This can reduce the bit-rate of shape coding drastically. In the literature, the contour motion is usually assumed to be translational. This approach works well for image sequences with slow or simple motion. When there is zooming and/or rotation, this assumption does not hold. In this paper, we use two global motion models, a translational global motion model and an affine global motion model, to predict the global motion of the contour sequence to be encoded.
3.1. Translational Contour Motion Estimation
In this contour motion estimation/compensation scheme, the object contour is assumed to undergo a translational motion. A global motion vector for the whole object contour is found by maximizing the number of matched contour points between the two contours. The whole contour is then segmented into global motion success segments and global motion failure segments. For each global motion failure segment, a local motion vector search process is applied. This process searches for the motion vector which minimizes the following function:
f(mv) = bit(mv) + bit(length) + bit(failure segment)    (1)

where bit(mv) denotes the number of bits used for the local motion vector mv; bit(length) the number of bits used for the length of the local motion success segment under motion vector mv; and bit(failure segment) the number of estimated bits used for the motion failure segment under motion vector mv.
By using this cost function, small motion success segments, which would consume more bits to encode than they save, can be avoided.
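The selection rule of Eq. (1) can be sketched as an exhaustive search over candidate vectors. The three bit-count estimators are hypothetical callables for illustration, since the paper does not spell out their exact form:

```python
def best_local_motion(candidates, matched_length, mv_bits, length_bits,
                      failure_bits):
    """Pick the local motion vector minimizing Eq. (1):
        f(mv) = bit(mv) + bit(length) + bit(failure segment).
    `matched_length(mv)` gives the length of the motion-success run
    under mv; the *_bits callables estimate the coding cost of each
    bitstream component."""
    best_mv, best_cost = None, float("inf")
    for mv in candidates:
        n = matched_length(mv)
        cost = mv_bits(mv) + length_bits(n) + failure_bits(mv, n)
        if cost < best_cost:
            best_mv, best_cost = mv, cost
    return best_mv, best_cost
```

Because the cost includes the bits spent on the vector and the length field, a vector that matches only a short run loses to one with a longer success segment, which is exactly the effect described above.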
3.2. Affine Contour Motion Estimation
When there is more complex motion, such as zoom and/or rotation, the contour cannot be well compensated by a translational motion model. Here we investigate an affine global motion model for these cases. We use the following six-parameter affine motion model as the global contour motion model:
x̂ = a1 x + a2 y + a3    (2)
ŷ = a4 x + a5 y + a6    (3)

In the above equations, x̂ and ŷ are the coordinates of the contour points in the current frame; x and y are the coordinates of the contour points in the previous frame.
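Applying Eqs. (2)-(3) to a contour is a direct point-wise map; a minimal sketch:

```python
def affine_predict(points, a):
    """Map contour points from the previous frame into the current
    one with the six-parameter affine model of Eqs. (2)-(3):
        x_hat = a1*x + a2*y + a3,  y_hat = a4*x + a5*y + a6."""
    a1, a2, a3, a4, a5, a6 = a
    return [(a1 * x + a2 * y + a3, a4 * x + a5 * y + a6)
            for x, y in points]
```

With (a1, a5) = (1, 1) and (a2, a4) = (0, 0) the model reduces to the pure translation (a3, a6), so the translational model of Section 3.1 is a special case.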
The problem is to estimate the parameter vector [a1, a2, a3, a4, a5, a6] from the available contours, as shown in Fig. 2. First, the corner points of each contour are detected according to their curvature values. Then, the corner points are matched by a corner matching process [8]. The motion vectors are calculated from these matched corner pairs. The affine parameters are estimated by a least median of squares algorithm [9].
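A least-median-of-squares fit can be sketched by drawing random minimal samples of three matched corner pairs: each non-degenerate sample determines the six parameters exactly, and the candidate with the smallest median squared residual over all pairs wins, so mismatched corners cannot skew the estimate. The trial count, seed, and tie-breaking are illustrative choices, not the paper's exact procedure:

```python
import random

def solve3(M, b):
    """Solve a 3x3 linear system by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(M)
    out = []
    for j in range(3):
        Mj = [row[:] for row in M]
        for i in range(3):
            Mj[i][j] = b[i]
        out.append(det(Mj) / d)
    return out

def lmeds_affine(prev_pts, curr_pts, trials=200, seed=0):
    """Least-median-of-squares estimate of [a1..a6] from matched
    point pairs, tolerant of outlier matches."""
    rng = random.Random(seed)
    n = len(prev_pts)
    best, best_med = None, float("inf")
    for _ in range(trials):
        idx = rng.sample(range(n), 3)
        M = [[prev_pts[i][0], prev_pts[i][1], 1.0] for i in idx]
        try:
            ax = solve3(M, [curr_pts[i][0] for i in idx])  # a1, a2, a3
            ay = solve3(M, [curr_pts[i][1] for i in idx])  # a4, a5, a6
        except ZeroDivisionError:   # degenerate (collinear) sample
            continue
        res = sorted(
            (ax[0] * x + ax[1] * y + ax[2] - u) ** 2
            + (ay[0] * x + ay[1] * y + ay[2] - v) ** 2
            for (x, y), (u, v) in zip(prev_pts, curr_pts))
        med = res[n // 2]           # median squared residual
        if med < best_med:
            best, best_med = ax + ay, med
    return best
```

Unlike a least-squares fit, the median criterion ignores up to roughly half of the correspondences, which is why a few bad corner matches do not corrupt the global motion estimate.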
In order to test the corner matching and the affine motion estimation algorithms, we create shape images undergoing translational and rotational motion from an available shape image. The actual motion parameters and the estimated motion parameters are listed in Table 1. The numbers of contour points correctly predicted by the decompressed motion parameters are listed in Table 2. We can see that the estimated affine parameters are fairly accurate.

Figure 2. Diagram of the affine motion estimation for a contour sequence. Corner detection is applied to the contours in the current and previous frames; the corner matching process yields matched corner pairs and their motion vectors, from which the LMS estimator computes the affine parameters.
Table 1. True and estimated affine parameters

        a1      a2      a3      a4      a5      a6
True    0.      -.055   -5.     .055    0.      5
Est     .0017   -.054   -5.07   .059    -.0007  3.9
True    0       0       -5      0       0       5
Est     0.      0.      -5.     0       0       5
Table 2. Number of pixels which are correctly predicted

               Correct Pixels   Total Pixels
Translational  129              514
Affine         408              514
3.3. Proposed Motion Compensated Chain Coding System
Two coding modes, the intra mode and the inter mode, are used in our shape coding scheme. The contour in the first frame of a GOP (group of pictures) is encoded in the intra mode. The contours in the other frames are encoded in the inter mode. The diagram of the inter mode contour coding is shown in Fig. 3. In the intra mode, the contour is encoded directly by the two-link chain coding method. In the inter mode, the contour is first predicted by motion estimation and compensation; then only the motion failure segments are encoded. The syntax of the bitstream is shown in Fig. 4. The first bit is the global motion flag, followed by the global motion vector, the coordinates of the start point, and then the bitstream for each contour segment. For a global motion success segment, only its length is encoded. For a local motion success segment, both its length and the motion vector are encoded. For a motion failure segment, a series of chain codes follows. An END code tells the decoder that the end of the segment has been reached.
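As a sketch of the syntax of Fig. 4, the segment layout can be flattened into a token stream. The tuple spelling is illustrative (the real bitstream is binary and arithmetic-coded), and using the motion type 11 as the end-of-contour marker is our reading of Fig. 4:

```python
def serialize_contour(global_flag, global_mv, start, segments):
    """Flatten one contour into the token stream of Fig. 4.
    `segments` holds ('global', length), ('local', mv, length) or
    ('fail', [chain codes]) entries."""
    out = [("gflag", global_flag)]
    if global_flag:
        out.append(("gmv", global_mv))
    out.append(("start", start))
    for seg in segments:
        if seg[0] == "global":      # type 00: global motion segment
            out += [("type", 0b00), ("len", seg[1])]
        elif seg[0] == "local":     # type 01: local motion segment
            out += [("type", 0b01), ("mv", seg[1]), ("len", seg[2])]
        else:                       # type 10: motion failure segment
            out += [("type", 0b10)]
            out += [("chain", c) for c in seg[1]]
            out.append(("end",))    # END closes the chain-code run
    out.append(("type", 0b11))      # type 11: end of the contour
    return out
```

A decoder would walk the same layout in reverse: read the 2-bit type, then either a length, a vector and length, or chain codes up to END.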
4. EXPERIMENTAL RESULTS
We test the performance of the proposed algorithm by coding several widely used MPEG-4 test shape sequences: the Akiyo and Weather sequences in QCIF and CIF format with a frame rate of 30 fps. The total length of the sequences is 300 frames. We code the sequences in both the intra mode and the inter mode. First the shape

Figure 3. Diagram for the inter mode contour coding. The object contour is compared with the contour predicted from the contour memory; the result is chain coded and entropy coded. The decoder entropy decodes and chain decodes the bitstream, and the contour reconstructor produces the reconstructed object contour using the decoded motion vector.
Figure 4. Bitstream syntax for contour coding. The bitstream consists of the global motion flag, the global motion vector, the start point, and a sequence of contour segments. Each segment begins with a 2-bit motion type (00 -- global motion segment, 01 -- local motion segment, 10 -- motion failure segment, 11 -- end), followed by the segment length for a global motion segment, the motion vector and segment length for a local motion segment, or the chain codes terminated by an END code for a motion failure segment.
image is transformed into the object contour defined on the contour lattice. Then the contour links are encoded by the proposed chain coding technique.
We compare the performance of our proposed shape coding technique with the CAE and the baseline-based coder [7]. The results are listed in Table 3. From Table 3, the performance of our algorithm is better than that of the CAE and the baseline-based technique.
Table 3. Comparison of shape coding techniques in intra mode

Sequence  Frame  Format  CAE      Baseline  Proposed algorithm
Akiyo     0      QCIF    0.06059  N/A       0.0576
Akiyo     0      CIF     0.03657  N/A       0.0260
Weather   0      QCIF    0.05581  N/A       0.0574
Weather   30     QCIF    0.0801   0.0745    0.0579
Weather   0      CIF     0.03167  N/A       0.0252
We also compare the performance of our proposed shape coding technique with the CAE and another motion compensated contour-based technique, GPSC [5]. The results are listed in Table 4. From Table 4, the performance of our algorithm is better than that of the CAE and GPSC techniques.
Table 4. Comparison of shape coding techniques in inter mode (bits/frame)

Sequence  Format  Frame rate  CAE  GPSC  Proposed algorithm
Weather   QCIF    30 fps      303  N/A   288
Weather   QCIF    10 fps      382  394   356
5. CONCLUSION
In this paper, we present a new motion compensated two-link chain coding method for 2-D shape coding. The contour points are defined on the contour lattice, a 6-connected lattice of half-pixel positions. We impose a smoothness constraint on the object contour by applying a majority filter to improve the coding efficiency. In the inter mode, we investigate translational and affine-model based motion compensated chain coding schemes for shape sequence coding. The experimental results show that the affine motion model handles complicated global motion better than the traditional translational motion model. They also show that our proposed scheme uses fewer bits than the CAE technique, which is applied in MPEG-4 VM7.0, and other contour-based techniques, GPSC and the baseline-based technique.
REFERENCES
1. N. Brady, F. Bossen, and N. Murphy, "Context-based arithmetic encoding of 2D shape sequences," in Special Session on Shape Coding, ICIP 97, 1997.
2. H. Freeman, "On the encoding of arbitrary geometric configurations," IRE Trans. Electron. Comput., vol. 10, pp. 260-268, June 1961.
3. C. Gu and M. Kunt, "Contour simplification and motion compensated coding," Signal Processing: Image Communication (Special Issue on Coding Techniques for Very Low Bit-Rate Video), vol. 7, pp. 279-296, Nov. 1995.
4. T. Kaneko and M. Okudaira, "Encoding of arbitrary curves based on the chain code representation," IEEE Trans. on Communications, vol. COM-33, pp. 697-706, July 1985.
5. J. I. Kim, A. C. Bovik, and B. L. Evans, "Generalized predictive binary shape coding using polygon approximation," Signal Processing: Image Communication, pp. 643-663, July 2000.
6. L. Labelle, D. Lauzon, J. Konrad, and E. Dubois, "Arithmetic coding of a lossless contour-based representation of label images," in Proc. of the International Conf. on Image Processing, vol. 1, 1998.
7. S. H. Lee, D.-S. Cho, Y.-S. Cho, S. H. Son, E. S. Jang, J.-S. Shin, and Y. S. Seo, "Binary shape coding using baseline-based method," IEEE Trans. on Circuits and Systems for Video Technology, vol. 9, pp. 44-58, Feb. 1999.
8. R. N. Strickland and Z. Mao, "Computing correspondences in a sequence of non-rigid shapes," Pattern Recognition, vol. 25, pp. 901-912, Sept. 1992.
9. P. Rousseeuw and A. Leroy, Robust Regression and Outlier Detection. New York, NY: John Wiley & Sons, 1987.