
Motion Compensated Two-link Chain Coding for Binary Shape Sequence

Zhitao Lu
William A. Pearlman

Electrical, Computer, and Systems Engineering Department
Rensselaer Polytechnic Institute, Troy, NY 12180
luz2, p [email protected]

ABSTRACT

In this paper, we present a motion compensated two-link chain coding technique to effectively encode 2-D binary shape sequences for object-based coding. The technique consists of a contour motion estimation and compensation algorithm and a two-link chain coding algorithm. The object contour is defined on a 6-connected contour lattice for a smoother contour representation. The contour in the current frame is first predicted by global motion and local motion based on the decoded contour in the previous frame; it is then segmented into motion success segments, which can be predicted by the global or local motion, and motion failure segments, which cannot. For each motion failure segment, a two-link chain code, which uses one chain code to represent two consecutive contour links, followed by an arithmetic coder, is proposed for efficient coding. Each motion success segment can be represented by its motion vector and length. For contour motion estimation and compensation, besides the translational motion model, an affine global motion model is proposed and investigated for complex global motion. We test the performance of the proposed technique on several MPEG-4 shape test sequences. The experimental results show that our proposed scheme outperforms the CAE technique applied in the MPEG-4 verification model.

Keywords: Shape Coding, Video Coding, Chain Coding, MPEG-4, Motion Estimation, Motion Compensation.

1. INTRODUCTION

With the emergence of multimedia applications, functions such as access, searching, indexing, and manipulation of visual information at the semantic object level are becoming very important issues in research and standardization efforts such as MPEG-4. In MPEG-4, each object is represented by three sets of parameters, shape, texture, and motion, so that the object can be encoded, accessed, and manipulated in arbitrary shape. Among these three sets of parameters, shape information is crucial for object representation and object-based coding.

In order to transmit the shape of an object efficiently, a large number of techniques have been proposed [1-7].

Shape coding techniques are typically classified as block-based techniques and contour-based techniques. Block-based techniques use a binary image to represent the shape of the video object; this binary image is encoded block by block as in conventional image coding. Context-based arithmetic encoding (CAE) [1] is one of the most successful methods for binary image coding and has been adopted in the MPEG-4 verification model. Contour-based techniques perform the compression along the boundary of the video object. A polygon or a contour list is usually used to represent the shape of a video object. For these representations, the distortion between the decoded and original shape information is easy to compute and well defined. According to the specified distortion, the coding algorithm can achieve lossy and/or lossless coding.

Shape images bear a high degree of similarity along the temporal dimension of an image sequence. The coding efficiency can be improved by exploiting the temporal redundancy in a shape sequence [3,5].

In this paper, we present a motion compensated two-link chain coding technique. The contour of the video object is defined on a 6-connected contour lattice for a smoother contour representation. A two-link chain coding technique is proposed to exploit the spatial redundancy within the object contour. In contour motion estimation and compensation, besides the translational global motion model, we also investigate an affine global motion model for complex motion in the contour sequence. The paper is organized as follows. In Section 2, we present the two-link chain coding technique. In Section 3, both translational and affine global contour motion estimation are presented. Experimental results on MPEG-4 shape sequences are presented in Section 4. The paper is concluded in Section 5.

2. TWO-LINK CHAIN CODE

Our approach is a modified chain coding method which belongs to the contour-based category. We define the contour points on a contour lattice which consists of the half-pixel positions of the image. As illustrated in Fig. 1, the lattice distinguishes the original image pixels and three types of half-pixel positions: those between two vertically neighboring image pixels, those between two horizontally neighboring image pixels, and the remaining half-pixel positions. Throughout the paper, the terms top position, left position, and vertex position refer to these three types of half-pixel positions.

In our chain coding system, a contour point lies on a top (or left) position when its two neighboring image pixels in the vertical (or horizontal) direction have different labels. Any contour which passes through a vertex position can be represented by a contour that does not pass through that vertex position. Therefore, our contour lattice contains only the top and left half-pixel positions. One advantage of this contour lattice representation is that it makes the contour smoother.
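To make this lattice concrete, the sketch below (an illustrative coordinate convention of ours, not code from the paper) extracts the top and left half-pixel contour positions of a binary mask; a position is kept exactly when the two image pixels it separates carry different labels.

```python
def contour_lattice_points(mask):
    """Collect the half-pixel contour positions of a binary mask.

    A 'top' position lies between two vertically neighboring pixels
    with different labels; a 'left' position between two horizontally
    neighboring ones. Coordinates are in half-pixel units (doubled
    row/column), an assumed convention the paper does not fix.
    """
    rows, cols = len(mask), len(mask[0])
    points = []
    for r in range(rows - 1):          # top positions: vertical neighbors differ
        for c in range(cols):
            if mask[r][c] != mask[r + 1][c]:
                points.append(("top", 2 * r + 1, 2 * c))
    for r in range(rows):              # left positions: horizontal neighbors differ
        for c in range(cols - 1):
            if mask[r][c] != mask[r][c + 1]:
                points.append(("left", 2 * r, 2 * c + 1))
    return points
```

A single foreground pixel yields two top and two left positions, i.e. a closed loop of four contour points around the pixel.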

In order to improve the contour coding efficiency, we impose constraints on the contour to be encoded by applying a majority filter [3]. A majority filter simplifies the contour by removing its rugged components. The effect is similar to a perfect 8-connectivity constraint.
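A minimal sketch of such a filter on the binary shape image is shown below; the 3x3 window and the tie handling are our assumptions, since the paper only names the filter.

```python
def majority_filter(mask):
    """One pass of a 3x3 majority filter over a binary mask.

    Each pixel takes the majority label of its 3x3 neighborhood, with
    pixels outside the image counted as background. This removes
    isolated pixels and small protrusions, smoothing the contour.
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ones = sum(mask[rr][cc]
                       for rr in range(r - 1, r + 2)
                       for cc in range(c - 1, c + 2)
                       if 0 <= rr < rows and 0 <= cc < cols)
            out[r][c] = 1 if ones >= 5 else 0   # majority of the 9 cells
    return out
```

An isolated pixel disappears after one pass, and a one-pixel-wide cross is trimmed to its center, illustrating how rugged components are removed.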

Another feature of our chain code is the joint encoding of two consecutive links in order to exploit the spatial redundancy of the contour. An illustration of the encoding of two consecutive links is shown in Fig. 1. On the contour lattice, each contour point has 6 possible links to its six neighbors. After majority filtering, the number of possible next links is more limited. If the current link is in the horizontal or vertical direction, there are 3 possible directions for the next link. If the current link is in a diagonal direction, there are only 2 possible directions for the next link; other directions are not possible. Consequently, if the last link is in the horizontal or vertical direction, there are 7 possible combinations for the next two contour links; if the last link is in a diagonal direction, there are 5 possible combinations for the next two contour links. In order to reduce the bit-rate for straight contour segments, we add two more dashed links. When the contour segment is a straight line in a diagonal direction, one code can then represent 3 or 4 contour links. Without using an entropy encoder, the bit-rate of the proposed chain code is about 1.5 bits/link.

[Figure 1 about here: the contour lattice, showing the image pixels, top positions, and left positions, with the last contour point pL and the current contour point pc marked.]

Figure 1. Proposed two-link chain code
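The 7 and 5 codeword counts follow directly from the next-link restrictions; the sketch below (an abstraction of ours that tracks only the horizontal/vertical versus diagonal link classes, not the actual lattice geometry) reproduces them by enumeration.

```python
# Allowed next-link classes after a link, abstracted to two classes:
# "hv" (horizontal or vertical) and "d" (diagonal). An "hv" link admits
# one straight continuation and two diagonals (3 options); a "d" link
# admits one return to the axis and one diagonal continuation (2
# options), mirroring the 3-versus-2 counts in the text.
NEXT = {"hv": ["hv", "d", "d"], "d": ["hv", "d"]}

def two_link_codewords(last):
    """All (first, second) next-link pairs after a link of class `last`;
    each pair is represented by a single two-link chain code."""
    return [(a, b) for a in NEXT[last] for b in NEXT[a]]
```

With 7 codewords, a fixed-length code spends 3 bits on two links, i.e. 1.5 bits/link, matching the figure quoted above; the context-based arithmetic coder of the next paragraph pushes this lower.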

We use a context-based arithmetic encoder to encode the chain code sequence for higher coding efficiency. The context is the direction of the last contour link. In total, there are 12 contexts: six of them begin from a top position, and the other six begin from a left position. The probability of each codeword under each context is adapted during the coding procedure.
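The adaptive per-context probabilities can be maintained with simple symbol counts, as in the minimal sketch below; the arithmetic coder itself is omitted, and the count-of-one initialization is our assumption.

```python
from collections import defaultdict

class AdaptiveContextModel:
    """Adaptive frequency model for a context-based arithmetic coder.

    A context pairs the start position type with the last link
    direction: 2 position types x 6 directions = 12 contexts, as in
    the text. Probabilities adapt as symbols are coded.
    """
    def __init__(self, num_codewords):
        # start every count at 1 so no codeword ever has zero probability
        self.counts = defaultdict(lambda: [1] * num_codewords)

    def probability(self, context, codeword):
        c = self.counts[context]
        return c[codeword] / sum(c)

    def update(self, context, codeword):
        # adapt the model after each coded symbol
        self.counts[context][codeword] += 1
```

Encoder and decoder run the same updates in lockstep, so no probability table needs to be transmitted.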

3. CONTOUR MOTION ESTIMATION AND COMPENSATION

Since a contour sequence has very high correlation in the temporal domain, just as the texture does, a straightforward method to exploit its temporal redundancy is motion estimation and compensation. The contour in the current frame can be predicted from the contour obtained in the previous frame; only the contour segments which cannot be predicted are encoded by the shape coding technique. This can reduce the bit-rate of shape coding drastically. In the literature, the contour motion is usually assumed to be translational. This approach works well for image sequences with slow or simple motion; when there is zooming and/or rotation, the assumption does not hold. In this paper, we use two global motion models, a translational global motion model and an affine global motion model, to predict the global motion of the contour sequence to be encoded.

3.1. Translational Contour Motion Estimation

In this contour motion estimation/compensation scheme, the object contour is assumed to undergo a translational motion. A global motion vector for the whole object contour is found by maximizing the number of matched contour points between the two contours. The whole contour is then segmented into global motion success segments and global motion failure segments. For each global motion failure segment, a local motion vector search process is applied. This process searches for the motion vector which minimizes the following function:

f(mv) = bit(mv) + bit(length) + bit(failure segment)    (1)

where bit(mv) denotes the number of bits used for the local motion vector mv; bit(length) the number of bits used for the length of the local motion success segment under motion vector mv; and bit(failure segment) the estimated number of bits used for the motion failure segment under motion vector mv.

By using this function, small motion success segments which consume more bits to be encoded can be avoided.
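In code form, the local search is a direct minimization of Eq. (1) over the candidate motion vectors; the three bit-count callables below are a hypothetical encoder interface of ours, since the paper does not specify one.

```python
def best_local_mv(candidates, bits_mv, bits_length, bits_failure):
    """Return the candidate motion vector minimizing Eq. (1):
    f(mv) = bit(mv) + bit(length) + bit(failure segment)."""
    return min(candidates,
               key=lambda mv: bits_mv(mv) + bits_length(mv) + bits_failure(mv))
```

Because the cost charges the motion vector and length overhead against the bits saved on the failure remainder, a candidate that matches only a short run loses to one whose failure segment becomes cheap, which is exactly how small success segments are avoided.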

3.2. Affine Contour Motion Estimation

When there is more complex motion, such as zooming and/or rotation, the contour cannot be well compensated by a translational motion model. Here we investigate an affine global motion model for these cases. We use the following six-parameter affine motion model as the global contour motion model:

x^ = a1 x + a2 y + a3    (2)
y^ = a4 x + a5 y + a6    (3)

In the above equations, x^ and y^ are the coordinates of contour points in the current frame, and x, y are the coordinates of contour points in the previous frame.

The problem is to estimate the parameter vector [a1, a2, a3, a4, a5, a6] from the available contours, as shown in Fig. 2. First, the corner points of each contour are detected according to their curvature values. Then, the corner points are matched by a corner matching process [8]. The motion vectors are calculated from these matched corner pairs, and the affine parameters are estimated by a least median of squares algorithm [9].
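The least median of squares step can be sketched as follows: repeatedly fit an exact affine map to 3 random corner correspondences, score each fit by the median squared residual over all pairs, and keep the best. This is a simplified sketch under our assumptions (random minimal sampling, fixed trial count), not the exact estimator of the cited reference.

```python
import random

def _solve3(rows, rhs):
    """Solve a 3x3 linear system by Cramer's rule; None if singular."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(rows)
    if abs(d) < 1e-12:
        return None
    return [det([[rhs[i] if k == j else rows[i][k] for k in range(3)]
                 for i in range(3)]) / d for j in range(3)]

def affine_lmeds(prev_pts, curr_pts, trials=200, seed=0):
    """Estimate [a1..a6] of Eqs. (2)-(3) by least median of squares.

    Each trial fits an exact affine map to 3 random correspondences;
    the fit with the smallest median squared residual over all matched
    corner pairs wins, so up to half the matches may be outliers.
    """
    rng = random.Random(seed)
    n = len(prev_pts)
    best, best_med = None, float("inf")
    for _ in range(trials):
        idx = rng.sample(range(n), 3)
        rows = [[prev_pts[i][0], prev_pts[i][1], 1.0] for i in idx]
        ax = _solve3(rows, [curr_pts[i][0] for i in idx])  # a1, a2, a3
        ay = _solve3(rows, [curr_pts[i][1] for i in idx])  # a4, a5, a6
        if ax is None or ay is None:
            continue  # degenerate (collinear) sample
        res = sorted(
            (ax[0] * x + ax[1] * y + ax[2] - u) ** 2
            + (ay[0] * x + ay[1] * y + ay[2] - v) ** 2
            for (x, y), (u, v) in zip(prev_pts, curr_pts))
        med = res[n // 2]
        if med < best_med:
            best_med, best = med, ax + ay
    return best  # [a1, a2, a3, a4, a5, a6]
```

The median criterion is what lets the estimator ignore a minority of false corner matches, which least squares cannot do.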

In order to test the corner matching and affine motion estimation algorithms, we create shape images undergoing translational and rotational motion from an available shape image. The actual motion parameters and the estimated motion parameters are listed in Table 1. The number of contour points correctly predicted by the decompressed motion parameters is listed in Table 2. We can see that the estimated affine parameters are fairly accurate.

[Figure 2 about here: the contours in the current and previous frames each pass through corner detection; a corner matching step produces matched corner points and motion vectors, and an LMS estimator outputs the affine parameters.]

Figure 2. Diagram of the affine motion estimation for a contour sequence

Table 1. True and estimated affine parameters

       a1      a2      a3      a4      a5      a6
True   0.      -.055   -5.     .055    0.      5
Est    .0017   -.054   -5.07   .059    -.0007  3.9
True   0       0       -5      0       0       5
Est    0.      0.      -5.     0       0       5

Table 2. Number of pixels which are correctly predicted

               Correct Pixels   Total Pixels
Translational  129              514
Affine         408              514

3.3. Proposed Motion Compensated Chain Coding System

Two coding modes, the Intra mode and the Inter mode, are used in our shape coding scheme. The contour in the first frame within a GOP (group of pictures) is encoded in the intra mode. The contours in the other frames are encoded in the inter mode. The diagram of the inter mode contour coding is shown in Fig. 3. In intra mode coding, the contour is encoded directly by the two-link chain coding method. In inter mode coding, the contour is first predicted by motion estimation and compensation, and then only the motion failure segments are encoded. The syntax of the bitstream is shown in Fig. 4. The first bit is the global motion flag, followed by the global motion vector, the coordinates of the start point, and then the bitstream for each contour segment. For a global motion success segment, only its length is encoded. For a local motion success segment, both its length and its motion vector are encoded. For a motion failure segment, a series of chain codes follows; an END symbol tells the decoder that the end of the segment has been reached.
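Following the motion type codes of Fig. 4 (00 global, 01 local, 10 failure, 11 END), the per-segment fields can be packed as sketched below; the fixed field widths are illustrative stand-ins of ours for the entropy-coded fields of the real bitstream.

```python
# 2-bit motion type codes, as listed in Fig. 4
MOTION_TYPE = {"global": "00", "local": "01", "failure": "10", "end": "11"}

def segment_bits(kind, length=None, mv=None, chain_codes=None):
    """Pack one contour segment as a bit string.

    Global success: type + length.
    Local success:  type + motion vector + length.
    Failure:        type + chain codes + END marker.
    """
    bits = MOTION_TYPE[kind]
    if kind == "global":
        bits += format(length, "08b")        # toy 8-bit length field
    elif kind == "local":
        bits += format(mv[0] & 0xF, "04b")   # toy 4-bit vector components
        bits += format(mv[1] & 0xF, "04b")
        bits += format(length, "08b")
    elif kind == "failure":
        bits += "".join(format(c, "03b") for c in chain_codes)
        bits += MOTION_TYPE["end"]           # END terminates the chain-code run
    return bits
```

The decoder reads the 2-bit type first and then knows exactly which fields follow, so segments of different kinds can be concatenated freely.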

4. EXPERIMENTAL RESULTS

We test the performance of the proposed algorithm by coding several widely used MPEG-4 test shape sequences: the Akiyo and Weather sequences in QCIF and CIF format at a frame rate of 30 fps. The total length of the sequences is 300 frames. We code the sequences in both the intra mode and the inter mode. First, the shape image is transformed into an object contour defined on the contour lattice. Then the contour links are encoded by the proposed chain coding technique.

[Figure 3 about here: block diagram of the inter mode contour coding. The object contour is differenced with the predicted contour from the contour predictor and contour memory, then chain coded and entropy coded; the decoder entropy decodes, chain decodes, and reconstructs the object contour.]

Figure 3. Diagram for the inter mode contour coding

[Figure 4 about here: bitstream layout. Each contour segment record carries a motion type followed by its fields; the motion type codes are 00 for a global motion segment (length only), 01 for a local motion segment (motion vector and length), 10 for a motion failure segment (chain codes terminated by END), and 11 for End.]

Figure 4. Bitstream syntax for contour coding

We compare the performance of our proposed shape coding technique with the CAE and the baseline-based coder [7]. The results are listed in Table 3. From Table 3, the performance of our algorithm is better than that of the CAE and the baseline-based technique.

Table 3. Comparison of shape coding techniques in intra mode

Sequence  Frame  Format  CAE      Baseline  Proposed algorithm
Akiyo     0      QCIF    0.06059  N/A       0.0576
Akiyo     0      CIF     0.03657  N/A       0.0260
Weather   0      QCIF    0.05581  N/A       0.0574
Weather   30     QCIF    0.0801   0.0745    0.0579
Weather   0      CIF     0.03167  N/A       0.0252

We also compare the performance of our proposed shape coding technique with the CAE and another motion compensated contour-based technique, GPSC [5]. The results are listed in Table 4. From Table 4, the performance of our algorithm is better than that of the CAE and GPSC techniques.

Table 4. Comparison of shape coding techniques in inter mode (bits/frame)

Sequence  Format  Frame rate  CAE  GPSC  Proposed algorithm
Weather   QCIF    30 fps      303  N/A   288
Weather   QCIF    10 fps      382  394   356

5. CONCLUSION

In this paper, we present a new motion compensated two-link chain coding method for 2-D shape coding. The contour points are defined on a 6-connected contour lattice. We impose a smoothness constraint on the object contour by applying a majority filter to improve the coding efficiency. In the inter mode, we investigate both translational and affine-model based motion compensated chain coding schemes for shape sequence coding. The experimental results show that the affine motion model works well for more complicated global motion than the traditional translational motion model. The results also show that our proposed scheme uses fewer bits than the CAE technique applied in MPEG-4 VM7.0 and than other contour-based techniques, GPSC and the baseline-based technique.

REFERENCES

1. N. Brady, F. Bossen, and N. Murphy, "Context-based arithmetic encoding of 2D shape sequences," in Special Session on Shape Coding, ICIP 97, 1997.
2. H. Freeman, "On the encoding of arbitrary geometric configurations," IRE Trans. Electron. Comput., vol. 10, pp. 260-268, June 1961.
3. C. Gu and M. Kunt, "Contour simplification and motion compensated coding," Signal Processing: Image Communication (Special Issue on Coding Techniques for Very Low Bit-Rate Video), vol. 7, pp. 279-296, Nov. 1995.
4. T. Kaneko and M. Okudaira, "Encoding of arbitrary curves based on the chain code representation," IEEE Trans. on Communications, vol. COM-33, pp. 697-706, July 1985.
5. J. I. Kim, A. C. Bovik, and B. L. Evans, "Generalized predictive binary shape coding using polygon approximation," Signal Processing: Image Communication, pp. 643-663, July 2000.
6. L. Labelle, D. Lauzon, J. Konrad, and E. Dubois, "Arithmetic coding of a lossless contour-based representation of label images," in Proc. of the International Conf. on Image Processing, vol. 1, 1998.
7. S. H. Lee, D.-S. Cho, Y.-S. Cho, S. H. Son, E. S. Jang, J.-S. Shin, and Y. S. Seo, "Binary shape coding using baseline-based method," IEEE Trans. on Circuits and Systems for Video Technology, vol. 9, pp. 44-58, Feb. 1999.
8. R. N. Strickland and Z. Mao, "Computing correspondences in a sequence of non-rigid shapes," Pattern Recognition, vol. 25, pp. 901-912, Sept. 1992.
9. P. Rousseeuw and A. Leroy, Robust Regression and Outlier Detection. New York, NY: John Wiley & Sons, 1987.