Quick viewing(Text Mode)

1. Introduction

1. Introduction



FACTA UNIVERSITATIS (NIS)

Series: Electronics and Energetics vol. 9, No. 2 (1996), 173{184

SOME IMPROVEMENTS OF THE IMAGE

SEQUENCE COMPRESSION



Dusan Curap ov and Mio drag Pop ovic

Abstract. In this pap er, some improvements of the standard metho d for image

sequence compression using Discrete cosine transform and motion comp ensation

are describ ed. The main improvements are realized by dividing the images into

blo cks of various size and by tuning the quantization matrices according to

image activity. The realized signal to noise ratio was 37 to 39 dB at the rate

of 0.3 bpp, what represents improvementof2to4dBover similar DCT{based

metho ds.

1. Intro duction

The recentadvances in communication and computer systems havemade

p ossible the transmission of images at low bit rates, using standard telephone

or ISDN channels. In order to transmit TV signal through such links, its

spatial and temp oral redundancies should b e made signi cantly smaller. To

reduce the spatial redundancy the Discrete cosine transform is usually used,

b ecause of its ability to compress the signal into a small number of DCT

co ecients. Further, the DCT is a real transform, and fast algorithms for

its computation exist. In order to realize compression in real time using

mo dern VLSI integrated circuits for DCT computation, the image is usually

divided into blo cks of 8  8 [7].

The simplest metho d for the removal of temp oral redundancy is the co ding

of the di erence between successive images from the sequence. The more

elab orate metho ds extract some information ab out the motion and use it to

comp ensate the motion b efore the di erence and co ding op erations.

Manuscript received January 27, 1995.

Aversion of this pap er was presented at the second Conference Telecommunications

in Mo dern Satellite and Cables Services, TELSIKS'95, Octob er 1995, Nis, Yugoslavia..

The authors are with Faculty of Electrical Engineering, University of Belgrade, Bule-

var Revolucije 73, P.O. Box 816, 11001 Belgrade, Yugoslavia. 173

174 Facta Universitatis ser.: Elect. and Energ. vol. 9, No.2 (1996)

In order to facilitate the transmission and storage of signals, sev-

eral international standards are prop osed, such as H.261 [1] and MPEG [3].

In these standards, the DCT is used to remove the spatial redundancy,and

motion comp ensation is optionally used to remove the temp oral redundancy.

However, no sp eci c motion comp ensation metho d is sp eci ed in these stan-

dards, only the p osition of motion vector in the data stream and its size are

sp eci ed.

The simpli ed blo ck diagrams of co der and deco der in b oth systems are

very similar, and they are presented in Fig. 1.

(a)

(b)

Figure 1. The blo ck diagram of the motion comp ensated video

signal transmission: (a) co der, (b) deco der.

D. Curap ov and M. Pop ovic: Some improvements of the image... 175

In the last few years, several solutions for motion comp ensation in image

sequences were prop osed [6]. It has b een shown that motion comp ensation

can signi cantly improve visual quality of images at the xed , or

p ermit the transmission at lower bit rate at the xed image quality. However,

no metho d app ears to b e the b est solution.

In this pap er an exp erimental analysis is p erformed to establish what can

be obtained if images are divided into blo cks of variable sizes (instead of

xed size 8  8 pixels) in the pro cess of transformation and co ding. The

results are evaluated using signal to noise ratio as the ob jective criterion,

andalsoby a sub jectiveevaluation. To facilitate the comparison with other

results, the sequence known as Miss America was used in our exp eriments.

2. Motion comp ensation

The motion comp ensation is a very imp ortant part of any algorithm for

the compression of image sequences. Every algorithm for the motion com-

p ensation consists of two parts: the estimation of the motion, and the com-

pensation of the estimated motion. In order to realize ecient motion esti-

mation and comp ensation, images are usually divided into small blo cks, and

the motion of the blo cks is estimated giving a set of motion vectors. The

quality of the algorithm is dep endent on its computational

complexity and accuracy of the estimated motion vectors.

In the usual video conferencing sequences, the motions of the blo cks be-

tween two successive images are only few pixels. Because of that, the motion

estimation of a blo ck n  n from the current image, reduces to the deter-

mination of the b est match of that blo ck and blo cks of the same size in the

previous image which lay in the search region (n +2p)  (n +2p), where

p is the maximal tolerable motion. The most accurate, but the least ef-

fective, motion estimation metho d is the blo ck matching metho d based on

the criterion of the maximal cross-correlation (or minimal average absolute

error) between blo cks from the two images. In both cases, it is necessary

2

to examine (2p +1) p ossible matches, what represents very high compu-

tational complexity. Because of that, in the last few years several more

e ective metho ds for the motion estimation were prop osed [6], that improve

computational complexity up to ten times. In all these metho ds only the

lo cal optimum is found, but this solution is acceptable in this application.

One of the b est algorithms for the motion estimation is the three-step

algorithm [5]. In the rst step of this algorithm, the mean absolute error

(MAE) is calculated in nine p oints with co ordinates (i  p ;j  p ), p =0

1 1 1

or 3, based on formula [4]:

176 Facta Universitatis ser.: Elect. and Energ. vol. 9, No.2 (1996)

n n

X X

1

MAE(i; j )= j f (u; v ) f (u + i; v + j ) j (1)

k k 1

2

n

u=1 v =1

These p oints are denoted by 1 in Fig. 2. From these nine p oints, the p oint

with minimum MAE is chosen, which represent the rst approximation of

the motion vector. In Fig. 1 this is the p oint(i +3;j+3). In the second step,

the MAE is calculated in new eight points (denoted by 2 in Fig. 2), using

ner resolution p

2 1

determined. This is the p oint(i +3;j +5) in Fig. 2. The second step can b e

rep eated several times, every time with ner resolution p

i i1

step, the resolution is p =1. When the size of the search region is p  6,

i

only three steps are needed. The nal p osition of the motion vector in Fig.

2is(i +2;j +6). This metho d is also suitable for hardware realizations in

VLSI.

In this pap er, in order to obtain higher compression, several exp eriments

using variable blo ck size will b e p erformed. It is known that blo cks of greater

size can b e co ded using smaller numb er of p er (bpp), but their use

is not suitable if there is a large motion b etween images. That is the reason

why in standard co ding metho ds the blo ck size of 8  8 pixels is used. In

our exp eriment, in the rst phase of the motion estimation, the blo ck size of

16  16 pixels was used. Each blo ckischaracterized bytwo parameters: the

motion vector and the MAE. The value of MAE determines the activity of

the blo ck, e.g. it represents the amount of motion in the blo ck that can not

b e comp ensated. Using the MAE, all blo cks are classi ed into three classes,

so that class I contains the blo cks with the highest activity. Then, in the

second phase of the algorithm, four neighb oring blo cks are examined. If at

least three blo cks b elong to class I I I and only one to class I I, then these four

blo cks are merged into a larger blo ckofsize32 32, with motion vector equal

to the arithmetic mean of four motion vectors. If a blo ck b elongs to class I,

it will b e divided into four smaller blo cks of size 8  8. The motion vectors of

these new blo cks should b e determined again, but using the existing motion

vector as a go o d initial guess to reduce the computation. The remaining

blo cks, which are not group ed nor divided, retain the initial size of 16  16

pixels. Using this pro cedure, only the blo cks that contain large motion will

b e co ded using blo cksizeof8 8 pixels.

D. Curap ov and M. Pop ovic: Some improvements of the image... 177

Figure 2. The three{step algorithm for motion estimation.

3. Discrete cosine transform

When motion vectors of all blo cks of an image are determined, the dif-

ferences between blo cks from the current image and corresp onding motion

comp ensated blo cks from the previous image are formed. The di erence

blo cks are then transformed using 2{D Discrete cosine transform (DCT)

using the expression:

n1 n1

X X

c c (2x +1)u (2y +1)v

u v

p

F (u; v )= f (x; y ) cos cos (2)

2n 2n

2n

y =0 x=0

where:

8

1

<

p

u; v =0

c ;c = (3)

2

u v

:

1 u; v 6=0

and n =8; 16 or 32.

The values of DCT co ecients F (u; v ) are very di erent. It has been

found by a detailed examination that the values of all DCT co ecients have

178 Facta Universitatis ser.: Elect. and Energ. vol. 9, No.2 (1996)

Laplacian probability densitywithmeanvalue equal to zero. Therefore, the

expression for Laplacian probability density

jxj

p(x)= e (4)

2

with  = 0, can be used to represent the probability density of the DCT

co ecient of di erence blo cks. Also, the variance of DCT co ecients is given

by:

2

2

(5)  =

4. Design of the quantizers

Based on the variances of the DCT co ecients, the bit allo cation ma-

trices which show the allo cation of the available bits to the quantizers can

b e determined. In the available literature, two typ es of bit allo cation algo-

rithms are describ ed. In the rst typ e, the xed bit allo cation pattern is

used, which is determined from the image statistics. In the second typ e, the

DCT co ecients are divided by the elements of the predetermined quan-

tization matrix and subsequently quantized and co ded. In this case, some

overhead information ab out the p osition and size of DCT co ecients must

b e transmitted. The rst typ e is considered simpler but less e ective.

In this pap er, an integer xed bit allo cation algorithm is used, which

is describ ed in more detail in [4], [2]. In this algorithm, B represents the

2

desired average bit rate, and N = n is the numb er of DCT co ecients in a

blo ck.

1. Initialization of the algorithm:

0

n =0; 0  k  N 1; j =1 (6)

k

2. Calculate

j 1 j

+  (k i) (7) = n n

k k

where i is the value of the index k for which the expression

h i

j j 1 j 1

2

 =  d(n ) d(n +1) (8)

k

k k k

j

reaches its maximum. The  represents the reduction in distortion if the

k

j -th bit is allo cated to the k -th DCT co ecient, and d(m) represents the

mean square distortion of the m-bit quantizer for the unityvariance input.

D. Curap ov and M. Pop ovic: Some improvements of the image... 179

3. The pro cess of bit allo cation is nished if the inequality

N 1

X

j

n  NB (9)

k

k =0

is satis ed. If it is not satis ed, put j ! j +1, and rep eat the second step

of the algorithm.

The second step in this algorithm need some additional comments. In

the most common case of Lloyd{Max optimum quantizer, the mean square

distortion function d(m) has the form:

bm

d(m)=a2 (10)

where the parameters a and b dep end on the typ e of the distribution and

the range of variable m. Their values can b e determined from Table 4.4 in

Ref. [4]. In this case eqn. (8) reduces to:

j j 1

b 2

 =(1 2 ) d(n ) (11)

k

k k

that is much easier to compute.

Another problem may o ccur in step 2 when two or more indices k give

j

equal  . In that case an additional bit is allo cated to each quantizer.

k

Using this algorithm, and starting from the prescrib ed bit rates for blo ck

size 8  8, B , 16  16, B , and 32  32, B , the quantization matrices

8 16 32

can be formed. But, it has been exp erimentally found that for blo ck size

8  8 the distribution of DCT co ecients can be very di erent. Hence, in

this case, it is useful to prop ose several quantization matrices. The decision,

what matrix should b e used, is made based on the value of the ACactivity

measure, de ned as:

" ! #

1=2

7 7

X X

1

2 2

(12) F (k; l) F (0; 0) AC T =

63

k =0 l=0

In this work vetyp es of 8  8quantization matrices are used, shown in

Fig. 3a. The quantization matrices for blo cksizes 16  16 and 32  32 are

shown in Fig. 3b and Fig. 3c. Taking into account the frequency of the

app earance of various blo cks, and its bit rates B = 0:5 bpp, B = 0:25

8 16

bpp, and B =0:125 bpp, the average bit rate is ab out 0:3 bpp.

32

5. Computer simulation

In order to assess the p erformance of the prop osed mo di cations using

the ob jective and sub jective criterions, some simulations were p erformed on

180 Facta Universitatis ser.: Elect. and Energ. vol. 9, No.2 (1996)

(a)

(b)

(c)

Figure 3. The quantization matrices for the sequence Miss America:

(a) 8  8(B =0:5 bpp), (b) 16  16 (B =0:25 bpp),

8 16

(c) 32  32 (B =0:125 bpp). 32

D. Curap ov and M. Pop ovic: Some improvements of the image... 181

a computer system consisting of a PC 486/DX2 computer, the video sub-

system Imaging Technology Series 151, and a high quality video display.

All simulation programs that simulate the op erations in co der and deco der

were written in Microsoft C language. However, there are some di erences

between the simulations and a real transmission system. The DCT and its

inverse were calculated using oating p ointinsteadof xedpoint arithmetic.

Also, the entropy co ding and error correcting co ding were omitted from sim-

ulations, since the main goal was to determine the in uence of the blo ck

size on the compression due to quantization of DCT co ecients. Since other

exp eriments, using xed size blo cks, were conducted using the same exp er-

imental setup, the obtained results can be used for comparison with other

metho ds. The system was tested using several known video conferencing

sequences.

In order to evaluate the quality of the reconstructed sequence, the peak

signal to noise ratio (PSNR), de ned as:

2

255

PSNR = 10 log (13)

10

N

TOT

P

1

2

^

(f f )

i i

N

TOT

i=1

was used as the ob jective criterion. In this expression, N represents

TOT

the total number of pixels in an image, f represent the value of a pixel

i

^

from original image, and f represent the value of the same pixel from the

i

reconstructed image. In Fig. 4 are shown the results of simulations for the

rst 30 frames from the sequence Miss America, at the bit rate B =0:3bpp.

To facilitate the simulation, the rst frame in the reconstructed sequence was

taken from the original sequence.

Figure 4. The p eak signal{to{noise ratio for the rst 30 frames

from the sequence Miss America at B =0:3bpp.

182 Facta Universitatis ser.: Elect. and Energ. vol. 9, No.2 (1996)

It is easily seen in Fig. 4 that the realized PSNR varies between 37 dB

and 39 dB. This result compares very well with the results known from the

literature. For example, in our previous simulations of DCT-based metho ds

using xed size blo cks of 8  8 pixels [2], the realized PSNR was smaller for

2{4 dB.

(a)

(b)

Figure 5. (a) The reconstructed 16th frame from the sequence Miss America,

(b) The negative of the di erence b etween original

and reconstructed 16-th frame multiplied by20.

D. Curap ov and M. Pop ovic: Some improvements of the image... 183

The sub jectiveevaluation of the quality of the reconstructed sequence was

p erformed by observing the images on a high quality large video display.

Only at the bit rates smaller than 0.5 bpp some small distortions can be

noticed in regions having high motion (lips, eyes, neck, and part of the hair)

as can b e seen in Fig. 5b. At the higher bit rates such distortions can not

b e noticed.

6. Conclusion

In this pap er a new system for the compression of image sequences, suit-

able for video conferencing systems, is describ ed. The main improvements

over standard systems lays in the dividing the images into blo cks of variable

sizes (8  8, 16  16 or 32  32 pixels), and co ding of blo cks according to the

statistical prop erties of their DCT co ecients. It has b een found that DCT

co ecients of di erences between motion comp ensated blo cks from current

and previous images have Laplace distribution. Basedonthisfact, aset of

quantization matrices and corresp onding optimal quantizers were designed.

The second improvement is in the use of several quantization matrices for

the smallest blo cks of 8  8 pixels, according to their activity. The prop osed

mo di cations were tested using rst 30 frames from the video conferencing

sequence Miss America, and it has b een found that the p eak signal to noise

ratio (PSNR) varies between 37 dB and 39 dB, at the bit rate of 0.3 bpp.

This represents an improvement of 2 dB to 4 dB over existing DCT-based

metho ds using xed size blo cks. The sub jective quality of the image se-

quence was evaluated by viewing the sequence on a high quality display.

Some small distortions can b e noticed in regions having high motion only at

the bit rates smaller than 0.5 bpp.

REFERENCES

1. CCITT Recommendation H.261: VideoCodecforAudiovisual Services at p 

64 K bit=s. Dec. 1990.





2. D. Curapov and M. Popovic: Use of motion compensation in the ecient coding

of image sequences. YU Info Symp., Brezovica, Yugoslavia, April 1995.

3. ISO/IEC-11172: Information Technology { Coding of Moving Pictures and Asso-

ciatedAudio for Digital Storage Mediaatupto1:5 M bit=s. Oct. 1992.

4. A. K. Jain: Fundamentals of . Prentice Hall, Englewood

Cli s, 1989.

5. T. Koga, K. Iinuma, A. Hirano, Y. Iijima and T. Ishiguro: Motion{com-

pensated interframe coding for videoconferencing. Pro c. Nat. Telecomm. Conf.

NTC'81, New Orleans, Dec. 1981, pp. G5.3.1-G5.3.5.

6. H. G. Musmann, P. Pirsch and H. J. Grallert: Advances in PictureCoding.

Pro c. IEEE, vol. 73, No. 4, April 1985, pp. 523-548.

184 Facta Universitatis ser.: Elect. and Energ. vol. 9, No.2 (1996)

7. K. R. Rao and P. Yip: Discrete Cosine Transform { Algorithms, Advantages,

Applications. Academic Press, London, 1990.