1. Introduction
FACTA UNIVERSITATIS (NIS)
Series: Electronics and Energetics vol. 9, No. 2 (1996), 173{184
SOME IMPROVEMENTS OF THE IMAGE
SEQUENCE COMPRESSION
Dusan Curap ov and Mio drag Pop ovic
Abstract. In this pap er, some improvements of the standard metho d for image
sequence compression using Discrete cosine transform and motion comp ensation
are describ ed. The main improvements are realized by dividing the images into
blo cks of various size and by tuning the quantization matrices according to
image activity. The realized signal to noise ratio was 37 to 39 dB at the bit rate
of 0.3 bpp, what represents improvementof2to4dBover similar DCT{based
metho ds.
1. Intro duction
The recentadvances in communication and computer systems havemade
p ossible the transmission of images at low bit rates, using standard telephone
or ISDN channels. In order to transmit TV signal through such links, its
spatial and temp oral redundancies should b e made signi cantly smaller. To
reduce the spatial redundancy the Discrete cosine transform is usually used,
b ecause of its ability to compress the signal into a small number of DCT
co ecients. Further, the DCT is a real transform, and fast algorithms for
its computation exist. In order to realize compression in real time using
mo dern VLSI integrated circuits for DCT computation, the image is usually
divided into blo cks of 8 8 pixels [7].
The simplest metho d for the removal of temp oral redundancy is the co ding
of the di erence between successive images from the sequence. The more
elab orate metho ds extract some information ab out the motion and use it to
comp ensate the motion b efore the di erence and co ding op erations.
Manuscript received January 27, 1995.
Aversion of this pap er was presented at the second Conference Telecommunications
in Mo dern Satellite and Cables Services, TELSIKS'95, Octob er 1995, Nis, Yugoslavia..
The authors are with Faculty of Electrical Engineering, University of Belgrade, Bule-
var Revolucije 73, P.O. Box 816, 11001 Belgrade, Yugoslavia. 173
174 Facta Universitatis ser.: Elect. and Energ. vol. 9, No.2 (1996)
In order to facilitate the transmission and storage of video signals, sev-
eral international standards are prop osed, such as H.261 [1] and MPEG [3].
In these standards, the DCT is used to remove the spatial redundancy,and
motion comp ensation is optionally used to remove the temp oral redundancy.
However, no sp eci c motion comp ensation metho d is sp eci ed in these stan-
dards, only the p osition of motion vector in the data stream and its size are
sp eci ed.
The simpli ed blo ck diagrams of co der and deco der in b oth systems are
very similar, and they are presented in Fig. 1.
(a)
(b)
Figure 1. The blo ck diagram of the motion comp ensated video
signal transmission: (a) co der, (b) deco der.
D. Curap ov and M. Pop ovic: Some improvements of the image... 175
In the last few years, several solutions for motion comp ensation in image
sequences were prop osed [6]. It has b een shown that motion comp ensation
can signi cantly improve visual quality of images at the xed bit rate, or
p ermit the transmission at lower bit rate at the xed image quality. However,
no metho d app ears to b e the b est solution.
In this pap er an exp erimental analysis is p erformed to establish what can
be obtained if images are divided into blo cks of variable sizes (instead of
xed size 8 8 pixels) in the pro cess of transformation and co ding. The
results are evaluated using signal to noise ratio as the ob jective criterion,
andalsoby a sub jectiveevaluation. To facilitate the comparison with other
results, the sequence known as Miss America was used in our exp eriments.
2. Motion comp ensation
The motion comp ensation is a very imp ortant part of any algorithm for
the compression of image sequences. Every algorithm for the motion com-
p ensation consists of two parts: the estimation of the motion, and the com-
pensation of the estimated motion. In order to realize ecient motion esti-
mation and comp ensation, images are usually divided into small blo cks, and
the motion of the blo cks is estimated giving a set of motion vectors. The
quality of the motion estimation algorithm is dep endent on its computational
complexity and accuracy of the estimated motion vectors.
In the usual video conferencing sequences, the motions of the blo cks be-
tween two successive images are only few pixels. Because of that, the motion
estimation of a blo ck n n from the current image, reduces to the deter-
mination of the b est match of that blo ck and blo cks of the same size in the
previous image which lay in the search region (n +2p) (n +2p), where
p is the maximal tolerable motion. The most accurate, but the least ef-
fective, motion estimation metho d is the blo ck matching metho d based on
the criterion of the maximal cross-correlation (or minimal average absolute
error) between blo cks from the two images. In both cases, it is necessary
2
to examine (2p +1) p ossible matches, what represents very high compu-
tational complexity. Because of that, in the last few years several more
e ective metho ds for the motion estimation were prop osed [6], that improve
computational complexity up to ten times. In all these metho ds only the
lo cal optimum is found, but this solution is acceptable in this application.
One of the b est algorithms for the motion estimation is the three-step
algorithm [5]. In the rst step of this algorithm, the mean absolute error
(MAE) is calculated in nine p oints with co ordinates (i p ;j p ), p =0
1 1 1
or 3, based on formula [4]:
176 Facta Universitatis ser.: Elect. and Energ. vol. 9, No.2 (1996)
n n
X X
1
MAE(i; j )= j f (u; v ) f (u + i; v + j ) j (1)
k k 1
2
n
u=1 v =1
These p oints are denoted by 1 in Fig. 2. From these nine p oints, the p oint
with minimum MAE is chosen, which represent the rst approximation of
the motion vector. In Fig. 1 this is the p oint(i +3;j+3). In the second step,
the MAE is calculated in new eight points (denoted by 2 in Fig. 2), using
ner resolution p
2 1
determined. This is the p oint(i +3;j +5) in Fig. 2. The second step can b e
rep eated several times, every time with ner resolution p
i i 1
step, the resolution is p =1. When the size of the search region is p 6,
i
only three steps are needed. The nal p osition of the motion vector in Fig.
2is(i +2;j +6). This metho d is also suitable for hardware realizations in
VLSI.
In this pap er, in order to obtain higher compression, several exp eriments
using variable blo ck size will b e p erformed. It is known that blo cks of greater
size can b e co ded using smaller numb er of bits p er pixel (bpp), but their use
is not suitable if there is a large motion b etween images. That is the reason
why in standard co ding metho ds the blo ck size of 8 8 pixels is used. In
our exp eriment, in the rst phase of the motion estimation, the blo ck size of
16 16 pixels was used. Each blo ckischaracterized bytwo parameters: the
motion vector and the MAE. The value of MAE determines the activity of
the blo ck, e.g. it represents the amount of motion in the blo ck that can not
b e comp ensated. Using the MAE, all blo cks are classi ed into three classes,
so that class I contains the blo cks with the highest activity. Then, in the
second phase of the algorithm, four neighb oring blo cks are examined. If at
least three blo cks b elong to class I I I and only one to class I I, then these four
blo cks are merged into a larger blo ckofsize32 32, with motion vector equal
to the arithmetic mean of four motion vectors. If a blo ck b elongs to class I,
it will b e divided into four smaller blo cks of size 8 8. The motion vectors of
these new blo cks should b e determined again, but using the existing motion
vector as a go o d initial guess to reduce the computation. The remaining
blo cks, which are not group ed nor divided, retain the initial size of 16 16
pixels. Using this pro cedure, only the blo cks that contain large motion will
b e co ded using blo cksizeof8 8 pixels.
D. Curap ov and M. Pop ovic: Some improvements of the image... 177
Figure 2. The three{step algorithm for motion estimation.
3. Discrete cosine transform
When motion vectors of all blo cks of an image are determined, the dif-
ferences between blo cks from the current image and corresp onding motion
comp ensated blo cks from the previous image are formed. The di erence
blo cks are then transformed using 2{D Discrete cosine transform (DCT)
using the expression:
n 1 n 1
X X
c c (2x +1)u (2y +1)v
u v
p
F (u; v )= f (x; y ) cos cos (2)
2n 2n
2n
y =0 x=0
where:
8
1
<
p
u; v =0
c ;c = (3)
2
u v
:
1 u; v 6=0
and n =8; 16 or 32.
The values of DCT co ecients F (u; v ) are very di erent. It has been
found by a detailed examination that the values of all DCT co ecients have
178 Facta Universitatis ser.: Elect. and Energ. vol. 9, No.2 (1996)
Laplacian probability densitywithmeanvalue equal to zero. Therefore, the
expression for Laplacian probability density