
Video Compression using the Three Dimensional Discrete Cosine Transform (3D-DCT)

Article · August 1997 · DOI: 10.1109/COMSIG.1997.629976


Marc Servais and Gerhard de Jager, Member, IEEE

G. de Jager is with the Digital Image Processing Group, Department of Electrical Engineering, University of Cape Town, South Africa. Email: gdj@eleceng.uct.ac.za. M. Servais is a Masters student at the University of Cape Town. Email: marcs@dip.ee.uct.ac.za.

Abstract: Sequences of images typically require vast amounts of electronic memory for storage and occupy much bandwidth during transmission. Widely used image and video compression standards such as JPEG and MPEG use the two-dimensional Discrete Cosine Transform (DCT) to achieve near-optimal compression of individual frames. This is done by decomposing the frames into components of different spatial frequencies. This paper presents an extension of the DCT to the temporal dimension. It describes the implementation of a 3D-DCT based compression scheme, which involves the computation of a three dimensional DCT of successive groups of eight frames. This compression scheme transforms the eight image frames into eight DCT frames with components of both spatial and temporal frequencies. The compression scheme was implemented and tested with standard MPEG video test sequences. It showed compression performance comparable to various MPEG implementations.

Keywords: Discrete Cosine Transform (DCT), three dimensional (3D), video compression, transform coding.

I. Introduction

Transforms, and in particular integral transforms, are used primarily for the reduction of complexity in mathematical problems. The Fourier Transform and the Karhunen-Loève Transform (KLT), which decorrelates a sequence, are well-known examples in the digital signal processing area.

The KLT is a series representation of a given random function. This transform is optimal in that it completely decorrelates the random function (i.e. the signal sequence) in the transform domain and produces uncorrelated coefficients. Decorrelation of the coefficients is very important for compression, because each coefficient can then be treated independently without loss of compression efficiency.

The Discrete Cosine Transform (DCT) was introduced by Ahmed, Natarajan and Rao [2]. They showed that this particular transform was very close to the KLT for natural images. The two dimensional DCT has been applied to both image compression and intraframe compression of video frames.

Rao and Yip [3] report several implementations of the three dimensional DCT. This compression technique has been applied to multispectral scanner data, based on cubes composed of blocks from each of the four spectral bands [4]. A compression ratio of … was achieved.

Probably the best-known application is the 3D DCT of three-dimensional blocks displaced by motion estimation performed on each frame. These blocks are generated from non-interlaced High Definition (HDTV) frames [5].

The three dimensional DCT described in this paper utilises the high degree of temporal correlation between successive frames in a video sequence. In contrast to motion-vector implementations of interframe compression, performing a 3D DCT involves using the same technique in all three dimensions: horizontal, vertical and temporal.

II. Definition

The two-dimensional Discrete Cosine Transform (DCT) of an $N_R \times N_C$ block of pixels and the Inverse Discrete Cosine Transform (IDCT) are defined by Rao and Yip [3] as:

Forward 2D DCT:

$$S(v,u) = \alpha_{2D}(v,u) \sum_{y=0}^{N_R-1} \sum_{x=0}^{N_C-1} s(y,x) \cos t_1 \cos t_2$$

Inverse 2D DCT:

$$s(y,x) = \sum_{v=0}^{N_R-1} \sum_{u=0}^{N_C-1} \alpha_{2D}(v,u)\, S(v,u) \cos t_1 \cos t_2$$

where

$$t_1 = \frac{\pi(2x+1)u}{2N_C}, \qquad t_2 = \frac{\pi(2y+1)v}{2N_R}, \qquad \alpha_{2D}(v,u) = \sqrt{\frac{4}{N_R N_C}}\, C(v)\, C(u)$$

and

$$C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & \text{otherwise.} \end{cases}$$

Here $s(y,x)$ is a value within the $N_R \times N_C$ image block and $S(v,u)$ a DCT coefficient in the corresponding $N_R \times N_C$ DCT block.

If, in addition to the two spatial dimensions, one considers the dimension of time, then the $N_R \times N_C$ block referred to above can be extended to an $N_F \times N_R \times N_C$ cube¹.

¹ Technically, "cube" would only be correct for the case $N_F = N_R = N_C$. A more general three-dimensional term such as "block" may be considered preferable. However, "block" is commonly used to refer to part of a frame, as in "macroblock". Consequently the term "cube", although not entirely general, will be used to refer to a three-dimensional array of either pixels or DCT coefficients with $N_F$ frames, $N_R$ rows and $N_C$ columns.

There are then $N_F$ successive frames of an $N_R \times N_C$ block forming a cube in 3D-DCT space. The three dimensional DCT and IDCT are defined as:

Forward 3D DCT:

$$S(w,v,u) = \alpha_{3D}(w,v,u) \sum_{z=0}^{N_F-1} \sum_{y=0}^{N_R-1} \sum_{x=0}^{N_C-1} s(z,y,x) \cos t_1 \cos t_2 \cos t_3$$

Inverse 3D DCT:

$$s(z,y,x) = \sum_{w=0}^{N_F-1} \sum_{v=0}^{N_R-1} \sum_{u=0}^{N_C-1} \alpha_{3D}(w,v,u)\, S(w,v,u) \cos t_1 \cos t_2 \cos t_3$$

where

$$t_1 = \frac{\pi(2x+1)u}{2N_C}, \qquad t_2 = \frac{\pi(2y+1)v}{2N_R}, \qquad t_3 = \frac{\pi(2z+1)w}{2N_F}, \qquad \alpha_{3D}(w,v,u) = \sqrt{\frac{8}{N_F N_R N_C}}\, C(w)\, C(v)\, C(u)$$

and

$$C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & \text{otherwise.} \end{cases}$$

Here $s(z,y,x)$ is a pixel value in one of the $N_F$ image frames ($0 \le z \le N_F - 1$) and $S(w,v,u)$ is a DCT coefficient in the corresponding $N_F \times N_R \times N_C$ DCT cube. The $N_F \times N_R \times N_C$ DCT cube then contains information regarding each of the $N_F$ image frames.

An example 3D DCT is illustrated below. Eight successive frames taken from a video sequence are transformed, producing a corresponding eight frames in the DCT domain, as depicted in Figures 1 and 2 respectively. Note that in this example altogether … cubes² are transformed into the DCT domain.

In the above example it is apparent that DCT frame 0 contains much of the information in the DCT cube, while the opposite is true for the remaining DCT frames: most of the information in DCT frames 1 to 7 is contained in the area of motion, i.e. the bottom right-hand corner.

It will be shown that DCT frame 0 can be thought of as the DC frame, conveying the information common to each of the image frames. The other DCT frames all convey AC information, which corresponds to motion in the original image sequence.
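The defining equations of Section II can be evaluated literally, which also gives a quick check of the DC/AC interpretation above. The following is a minimal pure-Python sketch, not the authors' encoder: a cube is assumed to be stored as nested lists indexed `s[z][y][x]` (frame, row, column), the helper names (`dct3d_direct`, `idct3d_direct`, `frame_energy`) are hypothetical, and the two 8-frame sequences are synthetic. For a static sequence all the energy should fall in DCT frame 0, whereas a moving feature should put energy into DCT frames 1 to 7.

```python
import math


def c(k):
    """Normalisation factor C(k): 1/sqrt(2) for k = 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def _alpha(w, v, u, nf, nr, nc):
    """alpha_3D(w, v, u) = sqrt(8 / (N_F N_R N_C)) C(w) C(v) C(u)."""
    return math.sqrt(8 / (nf * nr * nc)) * c(w) * c(v) * c(u)


def _basis(z, y, x, w, v, u, nf, nr, nc):
    """cos(t1) cos(t2) cos(t3), with t1, t2, t3 as defined in Section II."""
    return (math.cos(math.pi * (2 * x + 1) * u / (2 * nc))
            * math.cos(math.pi * (2 * y + 1) * v / (2 * nr))
            * math.cos(math.pi * (2 * z + 1) * w / (2 * nf)))


def dct3d_direct(s):
    """Forward 3D DCT: coefficients S[w][v][u] from pixels s[z][y][x]."""
    nf, nr, nc = len(s), len(s[0]), len(s[0][0])
    return [[[_alpha(w, v, u, nf, nr, nc)
              * sum(s[z][y][x] * _basis(z, y, x, w, v, u, nf, nr, nc)
                    for z in range(nf) for y in range(nr) for x in range(nc))
              for u in range(nc)] for v in range(nr)] for w in range(nf)]


def idct3d_direct(S):
    """Inverse 3D DCT: pixels s[z][y][x] from coefficients S[w][v][u]."""
    nf, nr, nc = len(S), len(S[0]), len(S[0][0])
    return [[[sum(_alpha(w, v, u, nf, nr, nc) * S[w][v][u]
                  * _basis(z, y, x, w, v, u, nf, nr, nc)
                  for w in range(nf) for v in range(nr) for u in range(nc))
              for x in range(nc)] for y in range(nr)] for z in range(nf)]


def frame_energy(S):
    """Sum of squared coefficients in each DCT frame S[w]."""
    return [sum(val * val for row in frame for val in row) for frame in S]


# Hypothetical 8-frame sequence of 4x4 blocks (illustrative data, not the paper's).
NF, NR, NC = 8, 4, 4
background = [[10.0 * (y + x) for x in range(NC)] for y in range(NR)]
static = [[row[:] for row in background] for _ in range(NF)]
# "Motion": a bright pixel that moves along the bottom row from frame to frame.
moving = [[[background[y][x] + (50.0 if (y, x) == (NR - 1, z % NC) else 0.0)
            for x in range(NC)] for y in range(NR)] for z in range(NF)]

for name, seq in (("static", static), ("moving", moving)):
    energy = frame_energy(dct3d_direct(seq))
    # static: AC energy is zero up to rounding; moving: AC energy is non-zero
    print(name, "DC frame:", round(energy[0], 1),
          "AC frames 1-7:", round(sum(energy[1:]), 1))
```

Because the transform is orthonormal, the AC energy of the moving sequence equals the temporal variation energy of the four bottom-row pixels. The direct triple sum is slow, but it is a useful reference when checking faster implementations such as the separable one described in Section III-A.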

III. Properties

A. Separability

Fig. 1. A sequence of eight images. Note the motion of the arm in the bottom right.

² The image size is … × ….

The most straightforward method of implementing the 3D DCT or IDCT is to follow the defining equations directly. For an 8 × 8 × 8 cube, each coefficient is then a sum over all 512 pixels of the cube, so that every coefficient requires 512 multiplication and addition operations.

However, the DCT is a separable transform. This implies that a multidimensional DCT may be implemented as a series of one-dimensional discrete cosine transforms. Numerous fast algorithms exist for implementing the DCT and IDCT in both one and two dimensions. Thus, for example, a fast 2D DCT could be implemented on the rows and columns of each of the $N_F$ frames. This could be followed by a fast 1D DCT along the time axis, i.e. on corresponding pixels in each of the frames.
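A sketch of the separable approach follows, in pure Python and assuming the same `s[z][y][x]` indexing. It applies a 2D DCT to every frame (rows, then columns), then a 1D DCT along the time axis, and compares the result with the defining triple sum. The 1D DCT is evaluated directly here for brevity; a fast algorithm could be substituted without changing the result. All function names are hypothetical.

```python
import math
import random


def dct1d(vec):
    """Orthonormal 1D DCT: X(k) = sqrt(2/N) C(k) sum_n x(n) cos(pi (2n+1) k / 2N).

    Evaluated directly (N multiply-adds per output); a fast 1D DCT could be
    substituted here without changing the result.
    """
    n = len(vec)
    return [math.sqrt(2 / n) * (1 / math.sqrt(2) if k == 0 else 1.0)
            * sum(vec[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
            for k in range(n)]


def dct2d(frame):
    """2D DCT of one frame: 1D DCT along every row (x), then every column (y)."""
    rows = [dct1d(r) for r in frame]                  # rows[y][u]
    cols = [dct1d(list(col)) for col in zip(*rows)]   # cols[u][v]
    return [list(r) for r in zip(*cols)]              # back to [v][u]


def dct3d_separable(frames):
    """2D DCT of each of the N_F frames, then a 1D DCT along the time axis."""
    planes = [dct2d(f) for f in frames]               # planes[z][v][u]
    nf, nr, nc = len(planes), len(planes[0]), len(planes[0][0])
    out = [[[0.0] * nc for _ in range(nr)] for _ in range(nf)]
    for v in range(nr):
        for u in range(nc):
            # The same coefficient position in every frame forms a time series.
            temporal = dct1d([planes[z][v][u] for z in range(nf)])
            for w in range(nf):
                out[w][v][u] = temporal[w]
    return out


def dct3d_direct(s):
    """Reference: the defining triple sum, N_F * N_R * N_C terms per coefficient."""
    nf, nr, nc = len(s), len(s[0]), len(s[0][0])

    def cf(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    def coef(w, v, u):
        acc = sum(s[z][y][x]
                  * math.cos(math.pi * (2 * x + 1) * u / (2 * nc))
                  * math.cos(math.pi * (2 * y + 1) * v / (2 * nr))
                  * math.cos(math.pi * (2 * z + 1) * w / (2 * nf))
                  for z in range(nf) for y in range(nr) for x in range(nc))
        return math.sqrt(8 / (nf * nr * nc)) * cf(w) * cf(v) * cf(u) * acc

    return [[[coef(w, v, u) for u in range(nc)] for v in range(nr)] for w in range(nf)]


# Compare the two on a random 3 x 4 x 5 cube (illustrative data).
random.seed(0)
cube = [[[random.uniform(0, 255) for _ in range(5)] for _ in range(4)] for _ in range(3)]
sep, ref = dct3d_separable(cube), dct3d_direct(cube)
err = max(abs(sep[w][v][u] - ref[w][v][u]) for w in range(3) for v in range(4) for u in range(5))
print(err < 1e-9)  # True
```

Counting multiply-add operations, the direct sum needs $N_F N_R N_C = 512$ per coefficient for an 8 × 8 × 8 cube, whereas the three passes of directly evaluated 8-point transforms need 8 + 8 + 8 = 24 per coefficient; fast 1D algorithms reduce this further.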

B. Relationship to the 2D DCT

The Delta-modification process

In this section the relationship between the three dimensional DCT of an image sequence and the two dimensional DCT of the first frame of the same sequence is examined.

Consider $N_F$ frames of an $N_R \times N_C$ image block. Let frame 0 be 2D-DCT transformed to produce one frame with 2D-DCT coefficients $S_{Frame0}(v,u)$. Next, let all $N_F$ frames (i.e. frame 0 to frame $N_F - 1$) be 3D-DCT transformed to produce a cube of $N_F$ frames with 3D-DCT coefficients $S(w,v,u)$. The coefficients $S(0,v,u)$ can be calculated from the forward 3D DCT by setting $w = 0$. Since $\cos t_3 = 1$ and $\alpha_{3D}(0,v,u) = \alpha_{2D}(v,u)/\sqrt{N_F}$ in that case, this gives

$$S(0,v,u) = \frac{\alpha_{2D}(v,u)}{\sqrt{N_F}} \sum_{z=0}^{N_F-1} \sum_{y=0}^{N_R-1} \sum_{x=0}^{N_C-1} s(z,y,x) \cos t_1 \cos t_2$$

where $s(z,y,x)$ corresponds to a pixel value within the original image sequence.

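Then, letting

$$s(z,y,x) = s(0,y,x) + \delta(z,y,x),$$

the operations on $s(0,y,x)$ and $\delta(z,y,x)$ can be considered separately. Thus $S(0,v,u)$ can be decomposed into two components:

(i) a term proportional to $S_{Frame0}(v,u)$, and
(ii) a term proportional to the amount of motion present in the $N_F$ image frames relative to the first frame³.

Because $s(0,y,x)$ does not depend on $z$, summing it over the $N_F$ frames contributes a factor of $N_F$, and $N_F/\sqrt{N_F} = \sqrt{N_F}$. Accordingly, this gives

$$S(0,v,u) = \sqrt{N_F}\, S_{Frame0}(v,u) + S_\delta(v,u)$$

where

$$S_\delta(v,u) = \frac{\alpha_{2D}(v,u)}{\sqrt{N_F}} \sum_{z=0}^{N_F-1} \sum_{y=0}^{N_R-1} \sum_{x=0}^{N_C-1} \delta(z,y,x) \cos t_1 \cos t_2$$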
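The decomposition can be checked numerically. The sketch below is pure Python; its function names are hypothetical and the test sequence is random data rather than the paper's. It computes $S(0,v,u)$ from the full 3D definition with $w = 0$, the 2D DCT $S_{Frame0}(v,u)$ of frame 0, and $S_\delta(v,u)$, and confirms that $S(0,v,u) = \sqrt{N_F}\,S_{Frame0}(v,u) + S_\delta(v,u)$ up to rounding error. In these terms the Delta modification amounts to keeping `s_delta(seq)` in place of `dct3d_plane0(seq)`.

```python
import math
import random


def c(k):
    """C(k) = 1/sqrt(2) for k = 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def cos12(y, x, v, u, nr, nc):
    """cos(t1) cos(t2), with t1 = pi(2x+1)u/(2 N_C) and t2 = pi(2y+1)v/(2 N_R)."""
    return (math.cos(math.pi * (2 * x + 1) * u / (2 * nc))
            * math.cos(math.pi * (2 * y + 1) * v / (2 * nr)))


def alpha_2d(v, u, nr, nc):
    """alpha_2D(v, u) = sqrt(4 / (N_R N_C)) C(v) C(u)."""
    return math.sqrt(4 / (nr * nc)) * c(v) * c(u)


def dct2d(frame):
    """Ordinary 2D DCT; applied to frame 0 it gives S_Frame0(v, u)."""
    nr, nc = len(frame), len(frame[0])
    return [[alpha_2d(v, u, nr, nc)
             * sum(frame[y][x] * cos12(y, x, v, u, nr, nc)
                   for y in range(nr) for x in range(nc))
             for u in range(nc)] for v in range(nr)]


def dct3d_plane0(seq):
    """S(0, v, u) from the full 3D definition with w = 0 (so cos t3 = 1)."""
    nf, nr, nc = len(seq), len(seq[0]), len(seq[0][0])

    def alpha_3d(v, u):
        return math.sqrt(8 / (nf * nr * nc)) * c(0) * c(v) * c(u)

    return [[alpha_3d(v, u)
             * sum(seq[z][y][x] * cos12(y, x, v, u, nr, nc)
                   for z in range(nf) for y in range(nr) for x in range(nc))
             for u in range(nc)] for v in range(nr)]


def s_delta(seq):
    """S_delta(v, u): the same sum over delta(z, y, x) = s(z, y, x) - s(0, y, x)."""
    nf, nr, nc = len(seq), len(seq[0]), len(seq[0][0])
    return [[alpha_2d(v, u, nr, nc) / math.sqrt(nf)
             * sum((seq[z][y][x] - seq[0][y][x]) * cos12(y, x, v, u, nr, nc)
                   for z in range(nf) for y in range(nr) for x in range(nc))
             for u in range(nc)] for v in range(nr)]


# Random test sequence (illustrative only): N_F = 8 frames of 4x4 pixels.
random.seed(0)
NF, NR, NC = 8, 4, 4
seq = [[[random.uniform(0, 255) for _ in range(NC)] for _ in range(NR)] for _ in range(NF)]
s0, sf0, sd = dct3d_plane0(seq), dct2d(seq[0]), s_delta(seq)
worst = max(abs(s0[v][u] - (math.sqrt(NF) * sf0[v][u] + sd[v][u]))
            for v in range(NR) for u in range(NC))
print(worst < 1e-9)  # True
```

For a sequence with no motion, every $\delta(z,y,x)$ is zero, so `s_delta` returns an all-zero frame; this is the entropy reduction that the Delta modification exploits.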

Fig. 2. The 3D DCT of the image frames in Figure 1. Grey represents zero-valued coefficients, white represents positive-valued coefficients and black represents negative-valued coefficients.

The above equations are illustrated with an example. Using the eight image frames shown in Figure 1 as the input sequence, $S(0,v,u)$ and $S_\delta(v,u)$ were calculated. The resulting DCT coefficients⁴ are represented graphically in the figures of DCT frame 0 before and after Delta modification, respectively. For regions of low movement within the $N_F$ frames comprising the image sequence, it is evident that the modified coefficients $S_\delta(v,u)$ are clustered more tightly around zero. This has the effect of reducing the entropy of the frame. Note that in areas of greater movement (e.g. the bottom right), the modified coefficients $S_\delta(v,u)$ represent a greater amount of information.

³ Note that $\delta(z,y,x)$ indicates a pixel value relative to the corresponding pixel in the first frame of the $N_F$-frame sequence.

The Delta-modification procedure therefore involves replacing the coefficients of DCT frame 0, i.e. $S(0,v,u)$, with $S_\delta(v,u)$. The entropy of DCT frame 0 before and after the Delta modification is depicted by the magnified histograms of its coefficient distribution. It will be shown in Section IV that the Delta modification can be implemented by allowing successive groups of $N_F$ image frames to overlap by one frame.

C. Aspects of Motion in the 3D DCT Domain

The Epsilon-modification process

As shown in the previous section, $S_\delta(v,u)$ is dependent upon motion within the image sequence. In addition, motion is also represented by the AC frames in the DCT cube, namely all the DCT frames with the exception of the first. This section explores the nature of the relationship between $S_\delta(v,u)$ on the one hand and DCT frames 1 to $N_F - 1$ on the other.

Consider an image sequence of $N_F$ frames. These frames can be 3D-DCT transformed to produce a DCT cube of $N_F$ frames. The DCT coefficients $S(0,v,u)$ in the first frame of the DCT cube can then be replaced by $S_\delta(v,u)$, as described above in Section III-B.

An interesting observation can be made by referring to the representation of DCT frame 0 after Delta modification. Note that the coefficients in frame 0 are negligible in the areas of low motion. However, the magnitudes of the DCT coefficients are considerably greater towards the bottom right of the frame, the region of motion. Furthermore, the same is true of the AC DCT frames (frames 1 to $N_F - 1$). Consequently, the question arises as to whether this motion information can be represented more compactly.

In attempting to answer this question, consider a further modification to frame 0 of the DCT cube. Let $S_\epsilon(v,u)$ be defined as follows:

$$S_\epsilon(v,u) = S_\delta(v,u) + \sum_{w=1}^{N_F-1} S(w,v,u)$$

Thus $S_\epsilon(v,u)$ is defined as the sum of $S_\delta(v,u)$ and the corresponding $N_F - 1$ coefficients in the AC (or motion) DCT frames. The modified coefficients in frame 0, i.e. $S_\delta(v,u)$, can then be replaced with $S_\epsilon(v,u)$. The resulting coefficients in the Epsilon-modified frame are shown in the corresponding figure, together with a histogram of the resulting distribution of the coefficients in DCT frame 0.

This histogram demonstrates an interesting phenomenon: the entropy of DCT frame 0 has been further reduced. Notice that no information has been lost, since the original 3D-DCT coefficients of frame 0 can be determined as follows:

$$S(0,v,u) = \sqrt{N_F}\, S_{Frame0}(v,u) + S_\epsilon(v,u) - \sum_{w=1}^{N_F-1} S(w,v,u)$$

In the next section it will be illustrated how the technique of predicting $S(0,v,u)$ from $S_{Frame0}(v,u)$ can be implemented. Naturally, this requires that $S_{Frame0}(v,u)$ be known.

IV. Implementation and Results

A. Implementation

A software-based encoder and decoder was developed to implement video compression using a 3D-DCT scheme. DCT cubes of size 8 × 8 × 8 were used. The entropy-reduction techniques described in Section III (the Delta and Epsilon modifications to DCT frame 0) were achieved by allowing consecutive groups of eight frames to overlap by one frame. This effectively resulted in eight DCT frames for every seven image frames. However, this was outweighed by the advantage of the reduced entropy achieved through the use of the Delta and Epsilon modifications for the first frame in each group of eight DCT frames.

Each frame of DCT coefficients was quantized using the standard JPEG quantization matrix [7]. Run-length coding was not used, but the quantized DCT coefficients were compressed using arithmetic coding.

B. Tests Performed

The 3D-DCT encoder and decoder were tested with three sequences of images, each of which was … frames in length, with each frame measuring … pixels per line by … lines. All the sequences were …-bit greyscale. Two standard test sequences, namely the Tempest and the Flower Garden, were used⁵. The third sequence, the Talking Head, was a non-standard sequence with relatively little motion but with two scene cuts.

The performance of the encoder and decoder was measured in terms of the compression achieved, expressed in bits per pixel, and the quality of the compressed video sequence relative to the original. The peak signal-to-noise ratio (PSNR) was used as a measure of image quality.

C. Results

Table I provides a summary of the results obtained when the test images listed above were compressed and subsequently decompressed. A frame-by-frame analysis of the compressed Talking Head sequence is shown in the graph of PSNR and compression. The compressed file size is shown as the average for each group of DCT frames.
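The Delta and Epsilon modifications, combined with the one-frame overlap between groups described in Section IV-A, can be sketched as an encoder/decoder pair. This is a lossless illustration under stated assumptions, not the authors' software: quantization and arithmetic coding are omitted, the DCT is computed separably with directly evaluated 1D transforms, and, because the paper does not say how the very first frame of a sequence is handled, the sketch hypothetically transmits its 2D DCT as side information. All function names are our own.

```python
import math


def dct1d(vec, inverse=False):
    """Orthonormal 1D DCT (inverse=False) or its inverse, evaluated directly."""
    n = len(vec)
    scale = [math.sqrt(1 / n)] + [math.sqrt(2 / n)] * (n - 1)  # sqrt(2/N) * C(k)

    def basis(i, k):
        return math.cos(math.pi * (2 * i + 1) * k / (2 * n))

    if not inverse:
        return [scale[k] * sum(vec[i] * basis(i, k) for i in range(n)) for k in range(n)]
    return [sum(scale[k] * vec[k] * basis(i, k) for k in range(n)) for i in range(n)]


def _along_axis0(arrays, f):
    """Apply f along axis 0 of a list of equally shaped nested lists."""
    if not isinstance(arrays[0], list):
        return f(arrays)
    parts = [_along_axis0([a[i] for a in arrays], f) for i in range(len(arrays[0]))]
    return [[p[k] for p in parts] for k in range(len(arrays))]


def dct_nd(a, inverse=False):
    """Separable DCT/IDCT over every axis: 2D for a frame, 3D for a cube."""
    if not isinstance(a[0], list):
        return dct1d(a, inverse)
    inner = [dct_nd(x, inverse) for x in a]
    return _along_axis0(inner, lambda v: dct1d(v, inverse))


def encode_group(cube, s_frame0):
    """3D DCT of one group with DCT frame 0 replaced by S_eps(v, u)."""
    S = dct_nd(cube)
    nf = len(S)
    for v in range(len(S[0])):
        for u in range(len(S[0][0])):
            s_delta = S[0][v][u] - math.sqrt(nf) * s_frame0[v][u]  # Delta modification
            S[0][v][u] = s_delta + sum(S[w][v][u] for w in range(1, nf))  # Epsilon
    return S


def decode_group(mod, s_frame0):
    """Restore S(0,v,u) = sqrt(N_F) S_Frame0 + S_eps - sum_w S(w,v,u), then invert."""
    nf = len(mod)
    S = [[row[:] for row in frame] for frame in mod]
    for v in range(len(S[0])):
        for u in range(len(S[0][0])):
            S[0][v][u] = (math.sqrt(nf) * s_frame0[v][u] + mod[0][v][u]
                          - sum(mod[w][v][u] for w in range(1, nf)))
    return dct_nd(S, inverse=True)


def encode_sequence(frames, nf=8):
    """Groups of nf frames overlapping by one frame (nf DCT frames per nf - 1 new frames).

    Hypothetical choices: the 2D DCT of frames[0] is returned as side information,
    and any incomplete final group is simply dropped.
    """
    groups = []
    for start in range(0, len(frames) - nf + 1, nf - 1):
        cube = frames[start:start + nf]
        groups.append(encode_group(cube, dct_nd(cube[0])))
    return dct_nd(frames[0]), groups


def decode_sequence(first, groups):
    """Decode the groups in order, each predicting from the previous group's last frame."""
    out, s_frame0 = [], first
    for mod in groups:
        cube = decode_group(mod, s_frame0)
        out.extend(cube if not out else cube[1:])  # shared frame already decoded
        s_frame0 = dct_nd(cube[-1])  # last frame is frame 0 of the next group
    return out


# Hypothetical 15-frame sequence of 2x2 blocks: two overlapping groups of eight.
frames = [[[float(10 * z + 3 * y + x) for x in range(2)] for y in range(2)] for z in range(15)]
first, groups = encode_sequence(frames)
decoded = decode_sequence(first, groups)
print(len(groups), len(decoded))  # 2 15
```

The decoder never needs the replaced coefficients $S(0,v,u)$: it recovers them from $S_\epsilon(v,u)$, the AC frames and $S_{Frame0}(v,u)$, the last of which comes from the final frame of the previous group. In a lossy codec the encoder should derive `s_frame0` from its own reconstruction of that frame, as the decoder does, rather than from the original, so that both sides predict from the same data.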

⁴ Both frames of coefficients are shown subsequent to quantization.

⁵ Both of these sequences differed slightly from the originals in that they contained several scene jumps, which correspond to additional motion information.

Fig.: A representation of the coefficients in DCT frame 0.

Fig.: A magnified histogram of the coefficient distribution in DCT frame 0. Std. deviation of coefficients: ….

Fig.: A representation of the coefficients in DCT frame 0 after Delta modification.

Fig.: A magnified histogram of the coefficient distribution in DCT frame 0 after Delta modification. Std. deviation of coefficients: ….

Fig.: A representation of the coefficients in DCT frame 0 after Epsilon modification.

Fig.: A magnified histogram of the coefficient distribution in DCT frame 0 after Epsilon modification. Std. deviation of coefficients: ….
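The quantization step of Section IV-A and the two performance measures of Section IV-B can be sketched as follows. Assumptions: the "standard JPEG quantization matrix" is taken to be the luminance table from Annex K of the JPEG standard (the sequences are greyscale) and is applied to every 8 × 8 DCT frame exactly as in 2D JPEG; a hypothetical `scale` factor stands in for the coarser quantization used in some tests; and the PSNR peak value of 255 assumes 8-bit samples. This PSNR pools the squared error over all frames; the paper does not state whether per-frame values were averaged instead.

```python
import math

# Luminance quantization table (JPEG standard, Annex K, Table K.1).
JPEG_LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantize_frame(frame, scale=1.0):
    """Quantize one 8x8 frame of DCT coefficients to integer levels.

    scale > 1 gives coarser quantization (a hypothetical knob for this sketch).
    """
    return [[int(round(frame[v][u] / (JPEG_LUMA_Q[v][u] * scale))) for u in range(8)]
            for v in range(8)]


def dequantize_frame(levels, scale=1.0):
    """Map quantized levels back to approximate DCT coefficients."""
    return [[levels[v][u] * JPEG_LUMA_Q[v][u] * scale for u in range(8)] for v in range(8)]


def psnr(original, decoded, peak=255.0):
    """PSNR in dB between two equally sized sequences of frames (lists of rows)."""
    sq = [(a - b) ** 2
          for fo, fd in zip(original, decoded)
          for ro, rd in zip(fo, fd)
          for a, b in zip(ro, rd)]
    mse = sum(sq) / len(sq)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def bits_per_pixel(compressed_bytes, num_frames, width, height):
    """Average number of coded bits per pixel."""
    return 8 * compressed_bytes / (num_frames * width * height)


# Usage with illustrative numbers only.
frame = [[0.0] * 8 for _ in range(8)]
frame[0][0], frame[0][1] = 100.0, -7.0
levels = quantize_frame(frame)
print(levels[0][0], levels[0][1])                  # 6 -1

orig = [[[100.0, 120.0], [130.0, 140.0]]]
dec = [[[101.0, 119.0], [131.0, 139.0]]]           # every sample off by 1
print(round(psnr(orig, dec), 2))                   # 48.13
print(bits_per_pixel(1000, 8, 10, 10))             # 10.0
```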

TABLE I
The compression achieved for the three test sequences

Image sequence        Bits/pixel    PSNR (dB)
Flower Garden         …             …
Flower Garden (a)     …             …
Tempest               …             …
Talking Head          …             …
Talking Head (b)      …             …

(a) The same sequence as above, but with coarser quantization.
(b) The same sequence as above, but with coarser quantization.

Fig.: A graph showing the PSNR and the degree of compression achieved for the frames of the Talking Head sequence.

D. Analysis

Table I demonstrates that an image sequence can be compressed more heavily at the expense of picture quality, as represented by the PSNR. In the cases of the Flower Garden and Talking Head sequences, increasing the degree of quantization resulted in a significant improvement in the degree of compression achieved, but a corresponding loss of quality.

The graph above shows image quality and compression for each frame of the Talking Head sequence. The image quality is indicated by the PSNR, which varies between … and … dB for most of the frames. The exception to this is frame …, at which point the PSNR shows a sharp drop to just … dB. In the original image sequence there is a scene cut at this point, and this frame depicts a combination of two scenes. The plot of the compression achieved exhibits two interesting features:

1) The degree of compression is significantly reduced for the frames in the region of the two scene cuts. This is a result of the increased amount of information conveyed during the first frame of a scene change, since forward prediction cannot be used. Note that frame 0 also conveys new information, since it is the start of the sequence. However, the degree to which the initial group of frames was compressed is significantly less than the compression achieved for the groups of frames around the scene cut. This problem could possibly be avoided in a similar way to that employed by MPEG: when an MPEG encoder detects a scene cut, it does not use forward prediction to encode the new frame. Instead, the frame is encoded as an I-frame.

2) The frames between the two scene cuts are characterised by low motion. This is reflected in the graph, which shows that these frames are compressed to under … bits per pixel⁶.

⁶ Less than … bits per pixel, compared to … and … bits per pixel.

V. Conclusion

This paper has highlighted the technique of using a 3D DCT for video compression. The main advantages and disadvantages of the compression scheme can be summarised as follows:

1) The 3D-DCT based compression technique is conceptually simple. It can be viewed as a natural extension of the 2D-DCT image compression method to include the temporal dimension. Furthermore, since the 3D DCT is a separable transform, it can be implemented as a series of 1D transforms, or as a 2D transform followed by a 1D transform. An important advantage of this is that numerous fast techniques for implementing the 1D and 2D transforms exist, both in hardware and software.

2) Perhaps the major disadvantage is that an implementation of the algorithm requires large amounts of memory as temporary storage, since eight frames must be stored in buffers at any one time⁷.

3) It is likely that a significant improvement in the degree of compression achieved could be obtained through the use of run-length coding.

⁷ More generally, $N_F$ frames are required to be buffered.

References

[1] D. F. Elliott and K. R. Rao, Fast Transforms: Algorithms, Analyses, Applications. Academic Press Inc., 1982.
[2] N. Ahmed, T. Natarajan and K. R. Rao, "Discrete cosine transform," IEEE Trans. Computers, vol. C-23, pp. 90–93, Jan. 1974.
[3] K. R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications. Academic Press Inc., 1990.
[4] N. Ahmed and T. Natarajan, "Some aspects of adaptive transform coding of multispectral data," in Asilomar Conf. on Circuits, Systems and Computers, Nov.
[5] O. Chantelou and C. Remus, "Adaptive transform coding of HDTV pictures," in 2nd International Workshop on Signal Processing of HDTV, Mar.
[6] M. Servais, "Video compression using the three dimensional discrete cosine transform," Undergraduate Thesis, Electrical Engineering, University of Cape Town, Oct.
[7] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard. Van Nostrand Reinhold, 1993.
