
Chapter 3

Multimedia Systems Technology: Coding and Compression

3.1 Introduction

In system design, storage and transport of information play a significant role. Multimedia information is inherently voluminous and therefore requires very high storage capacity and very high bandwidth transmission capacity. For instance, the storage for a frame with 640 × 480 pixel resolution is 7.3728 million bits, if we assume that 24 bits are used to encode the luminance and chrominance components of each pixel. Assuming a rate of 30 frames per second, the entire 7.3728 million bits should be transferred in 33.3 milliseconds, which is equivalent to a bandwidth of 221.184 million bits per second. That is, the transport of such a large number of bits of information in such a short time requires high bandwidth.

There are two possible approaches: one is to develop technologies that provide higher bandwidth, of the order of Gigabits per second or more, and the other is to find ways and means by which the number of bits to be transferred can be reduced without compromising the information content. It amounts to saying that we need a transformation of a string of characters in some representation (such as ASCII) into a new string (e.g., of bits) that contains the same information but whose length is as small as possible; i.e., compression. Data compression is often referred to as coding, whereas coding is a general term encompassing any special representation of data that achieves a given goal. For example, reducing the number of bits is by itself a goal.

In 1948, Claude E. Shannon, the father of information theory, introduced a concept called entropy to measure the information content of a source [1].

Fig. 3.1. Shannon's Communication System Model: Source (Sender) → Encoder → Channel → Decoder → Destination (Receiver)

Shannon proposed a communication system model, shown pictorially in Fig. 3.1, which addresses two basic issues:

• How can a communication system efficiently transmit the information that a source produces?

• How can a communication system achieve reliable communication over a noisy channel?

The first issue relates to compression and the second relates to error control. In both cases, there is a transformation of the number of bits that represent the information produced by a source. These two aspects are also known as source coding and channel coding. Information theory is the study of efficient coding and its consequences in the form of speed of transmission and probability of error.

In this chapter we focus on source coding (compression) and standards for compression. Information coded at the source end has to be correspondingly decoded at the receiving end. Coding can be done in such a way that the information content is not lost; that means it can be recovered fully on decoding at the receiver. However, media such as image and video, meant primarily for human consumption, provide opportunities to encode more efficiently but with a loss.

Coding, and consequently the compression, of multimedia information is subject to certain quality constraints.¹ For example, the quality of a picture should be the same when coded and, later on, decoded. Dealing with perceptual quality, it is possible for one to design a coding method that may be lossy. By lossy coding we mean that there is information loss, but the loss does not affect the perceptual quality. In practice, one or more parameters related to the source may be used in designing coding schemes that result in acceptable quality. Also, the complexity and the execution time of the techniques used for coding and decoding should be minimal. These constraints are related to the nature of the application (live or orchestrated) as well. It should be noted that all these requirements are based on the characteristics of human perception of different media. For example, for an interactive multimedia application such as video conferencing, the end-to-end delay should be less than 150 milliseconds. In the context of current high-speed transmission technologies, such an end-to-end delay requirement would imply that compression and decompression, if any, should be contained within 50 milliseconds. Similarly, multimedia applications that are retrieval based, such as on-demand video service, require that random access to single images and audio frames be possible under 500 milliseconds.

¹ By quality, we mean perceptual quality.

Coding and compression techniques are critical to the viability of multimedia at both the storage level and the communication level. Some of the multimedia information has to be coded in continuous (time-dependent) format and some in discrete (time-independent) format. In the multimedia context, the primary motive in coding is compression. By nature, the audio, image, and video sources have built-in redundancy, which makes it possible to achieve compression through appropriate coding. As the image data and video data are voluminous and act as prime motivating factors for compression, our references in this chapter will often be to images, even though other data types can be coded (compressed).

3.2 What is Image Compression?

Image data can be compressed without significant degradation of the visual (perceptual) quality because images contain a high degree of:

• spatial redundancy, due to correlation between neighboring pixels (this is also referred to as statistical redundancy),

• spectral redundancy, due to correlation among color components, and

• psycho-visual redundancy, due to perceptual properties of the human visual system.

The higher the redundancy, the higher the achievable compression. This is what we referred to as source coding earlier in the communication system of Shannon. A typical image compression system (or source encoder) consists of a transformer, a quantizer, and a coder, as illustrated in Fig. 3.2.

Fig. 3.2. Typical Image Compression System: raw image data → Transformer (image representation more amenable to compression) → Quantizer (symbols) → Coder → binary bit-stream

Transformer applies a one-to-one transformation to the input image data. The output of the transformer is an image representation which is more amenable to efficient compression than the raw image data. Unitary mappings such as the Discrete Cosine Transform, which pack the energy of the signal into a small number of coefficients, are a popular method in image compression standards.

Quantizer generates a limited number of symbols that can be used in the representation of the compressed image. Quantization is a many-to-one mapping which is irreversible. It can be performed by scalar or vector quantizers. Scalar quantization refers to element-by-element quantization of the data, and vector quantization refers to quantization of a block at a time.

Coder assigns a code word, a binary bit-stream, to each symbol at the output of the quantizer. The coder may employ fixed-length or variable-length codes. Variable Length Coding (VLC), also known as entropy coding, assigns code words in such a way as to minimize the average length of the binary representation of the symbols. This is achieved by assigning shorter code words to more probable symbols, which is the fundamental principle of entropy coding.

Different image compression systems are based on different combinations of transformer, quantizer, and coder. Image compression systems can be broadly classified as:

• Lossless: Compression systems which aim at minimizing the bit-rate of the compressed output without any distortion of the image. The decompressed bit-stream is identical to the original bit-stream. This method is used in cases where accuracy of the information is essential. Examples of such situations are programs, data, medical imaging, and so on. Lossless compression systems are also referred to as bit-preserving or reversible compression systems.

• Lossy: Compression systems which aim at obtaining the best possible fidelity for a given bit-rate, or minimizing the bit-rate to achieve a given fidelity measure. Such systems are suited for video and audio.

The transformation and coding stages are lossless. However, the quantization is lossy.

3.3 Taxonomy of Compression Techniques

Based on whether the compression is lossless or lossy, encoding is broadly classified as entropy encoding, leading to lossless compression, and source encoding, leading to lossy compression.

It is interesting to note that the term entropy encoding refers to all those coding and compression techniques which do not take into account the nature of the information to be compressed. In other words, entropy encoding techniques simply treat all data as a sequence of bits and ignore the semantics of the information to be compressed.

On the contrary, source encoding takes cognizance of the type of the original signal. For example, if the original signal is audio or video, source encoding uses its inherent characteristics in achieving a better compression ratio. Normally, it is expected that source encoding will produce better compression compared to entropy encoding techniques. But, of course, the actual degree of success will depend on the semantics of the data itself and may vary from application to application. The designer of a compression system based on source encoding has the choice to render it lossless or lossy.

All of this aside, in practical systems and standards, both entropy encoding and source encoding are usually combined for better effect in compression.

Let us understand the principle behind some of the coding techniques. For an exhaustive treatment of all the techniques, the reader may want to refer to [4, 5, 6, 7].

3.4 Entropy Encoding Techniques

3.4.1 Run-length Encoding

In this technique, the sequence of image elements (pixels) in a scan line x_1, x_2, ..., x_n is mapped into a sequence of pairs (c_1, l_1), (c_2, l_2), ..., (c_k, l_k), where c_i represents a color (or intensity) and l_i the length of the i-th run (a sequence of pixels with equal intensity).

Example: Let digits 1, 2, 3 represent Red, Green, and Blue; these will correspond to the c_i. Let a scan line of length 35 consist of

11111111111333333333322222222211111

as the x_i. Then, the run-length encoded stream will be the series of tuples (1,11), (3,10), (2,9), and (1,5), where 11, 10, 9, and 5 are the l_i.
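The mapping above can be sketched in a few lines of Python (the function name is ours, not from the text):

```python
def run_length_encode(pixels):
    """Map a scan line into (value, run_length) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return [tuple(r) for r in runs]

line = "11111111111333333333322222222211111"
print(run_length_encode(line))
# [('1', 11), ('3', 10), ('2', 9), ('1', 5)]
```

Note that run-length encoding only pays off when runs are long; a scan line with no repeated values would actually grow.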

3.4.2 Repetition Suppression

In this technique, a series of n successive occurrences of a specific character is replaced by a special character called a flag, followed by a number representing the repetition count. Normal applications are suppression of 0s in a data file and suppression of blanks in a text or program file.

Example: Consider a sequence of digits in a data file which looks like the following:

98400000000000000000000000000000000

When we use the repetition sequence suppression method, the sequence will look like 984f32. The savings is obvious and is dependent upon the contents.
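A sketch of the idea, assuming 'f' as the flag character and a small threshold below which runs are left alone (both our choices; a real scheme must also escape any genuine flag character occurring in the data):

```python
FLAG = "f"  # assumed flag character; must not occur unescaped in the data

def suppress_repeats(text, char="0", threshold=4):
    """Replace runs of `char` longer than `threshold` with FLAG + count."""
    out, i = [], 0
    while i < len(text):
        if text[i] == char:
            j = i
            while j < len(text) and text[j] == char:
                j += 1                # find the end of the run
            run = j - i
            out.append(FLAG + str(run) if run > threshold else char * run)
            i = j
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

print(suppress_repeats("984" + "0" * 32))   # 984f32
```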

3.4.3 Pattern Substitution

Pattern substitution is a form of statistical coding method. The idea here is to substitute an oft-repeated pattern by a code. If we are trying to compress the book The Hound of the Baskervilles, the name Sherlock Holmes can be substituted by S*, the name Watson by W*, and the expression Elementary my dear Watson by E*. Obviously, frequent patterns will use shorter codes for better compression.

Example: Consider the lines:

This book is an exemplary example of a book on multimedia and networking. Nowhere else will you find this kind of coverage and completeness. This is truly a one-stop-shop for all that you want to know about multimedia and networking.

If we simply count, there are a total of 193 characters without counting blanks and 232 with blanks. If we group words such as a, about, all, an, and, for, is, of, on, that, this, to, and will, they occur 2, 1, 1, 1, 3, 1, 2, 2, 1, 1, 3, 1, and 1 times, respectively. All of them have a blank character on either side, except when they happen to be the first word or last word of a sentence. The sentence delimiter (period) is always followed by a blank character. The words multimedia and networking appear twice each. With all this knowledge about the text, we can develop a substitution table that will be very effective. Notice that there is no loss and the coding is reversible. Let us represent the group of words that we identified for the text under consideration by 1, 2, 3, 4, 5, 6, 7, 8, 9, +, &, =, and , respectively. Let us also substitute multimedia by m* and networking by n*. The resulting coded string will be (with sp denoting a blank):

& b o o k 7 4 e x e m p l a r y sp e x a m p l e 8 1 b o o k 9 m * 5 n * . N o w h e r e sp e l s e  y o u sp f i n d & k i n d 8 c o v e r a g e 5 c o m p l e t e n e s s . & 7 t r u l y 1 o n e - s t o p - s h o p 6 3 + y o u sp w a n t = k n o w 2 m * 5 n * .

That is a total of 129 characters, a 33.16% compression.
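The substitution step can be sketched as below. The table shown is a hypothetical fragment in the spirit of the book's example, not its complete table, and a practical scheme must guarantee that the codes cannot collide with ordinary text:

```python
# Hypothetical substitution table: frequent words map to short codes.
TABLE = {"this": "&", "is": "7", "an": "4", "of": "8", "a": "1",
         "on": "9", "and": "5", "multimedia": "m*", "networking": "n*"}

def substitute(text):
    """Replace each whole word found in TABLE by its code."""
    return " ".join(TABLE.get(w.lower(), w) for w in text.split())

print(substitute("This book is an example of a book on multimedia and networking"))
# & book 7 4 example 8 1 book 9 m* 5 n*
```

Decoding simply applies the inverse table, which is why the method is lossless as long as the codes are unambiguous.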

3.4.4 Huffman Coding

Huffman coding is based on the frequency of occurrence of a character (or an octet, in the case of images). The principle is to use a lower number of bits to encode the character that occurs more frequently. The codes are stored in a code book. The code book may be constructed for every image or for a set of images, when applied to still or moving images. In all cases, the code book should be transferred to the receiving end so that decoding can take place.

Example: In the sentences given in the example for pattern substitution, the occurrence of the alphabets is as follows:

Character:  a        b        c         d        e
Frequency:  15       3        2         7        18
Code:       1100     0101011  0010011   01000    1011

Character:  f        g        h         i        j
Frequency:  4        3        6         14       0
Code:       0001100  0101000  101010    00111    00000111

Character:  k        l        m         n        o
Frequency:  6        11       7         16       21
Code:       101011   1000     00010     1111     1110

Character:  p        q        r         s        t
Frequency:  5        0        7         10       15
Code:       0100111  000011000 10100    0011     001000

Character:  u        v        w         x        y
Frequency:  6        1        6         2        4
Code:       0000111  0000111  110101    0100100  0000011

Character:  z        .        sp
Frequency:  0        3        39
Code:       0000100  0000100  0111

If the text is coded as per the code book given above, the binary bit stream will be:

001000 101010 00111 0011 0111 00111 0011 0111 1100 1111
..T... ..h... ..i.. ..s. .sp. ..i.. ..s. .sp. ..a. ..n.

0111
.sp. ... and so on.

The total size of the bit stream will be 1124 bits, as compared to 1856 bits (232 × 8) for the original text; a compression of 39.44%.
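A code book like the one above can be built with the classic Huffman construction: repeatedly merge the two least-frequent subtrees, prefixing '0' to the codes on one side and '1' to the codes on the other. A compact sketch (our own, using Python's heapq; not the book's exact codes):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code book {symbol: bit-string} from symbol frequencies."""
    freq = Counter(text)
    # Heap entries: (frequency, unique tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

text = "this is an example of huffman coding"
codes = hu = huffman_codes(text)
bits = "".join(codes[ch] for ch in text)
# The bit stream plus the code book is all a decoder needs to recover
# the text exactly; more probable symbols receive shorter codes.
```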

3.5 Source Encoding Techniques

Source encoding is based on the content of the original signal and hence is rightly termed semantic-based coding. The compression that can be achieved using source coding may be very high, when compared to strict entropy coding. At the same time, it should be noted that the degree of compression is highly dependent on the data semantics. In general, source encoding may operate either in a lossless or a lossy mode.

Source encoding techniques can be classified into three basic types; viz., transform encoding, differential encoding, and vector quantization. Each of these methods is effective for a signal or raw data with certain characteristics. The methods and where they are applicable are explained in this section. Fig. 3.3 provides a bird's eye view of coding techniques.

3.5.1 Transform Encoding

In transform encoding, the raw data undergoes a mathematical transformation from the original form in the spatial or temporal domain into an abstract domain which is more suitable for compression. The transform process is reversible, and the original signal can be obtained by applying the inverse transform. The Fourier transform and the Cosine transform are two popular examples; they transform the signal from the space or time domain to the frequency domain.

Fig. 3.3. Bird's Eye View of Coding Techniques: entropy coding (repetitive sequence suppression, including run-length coding, pattern substitution, and zero-length suppression, plus statistical coding) and source coding (transform encoding such as the Discrete Cosine Transform, differential encoding such as DPCM, DM, and ADPCM, and vector quantization)

The important point to note here is that the choice of the type of transformation depends on the type of data. For instance, in the case of images, transforming the signal from the initial spatial domain to the frequency domain has advantages. The reason is that the spectral representation (i.e., the frequency spectrum) of images captures the changes in color or luminance rapidly. When an image signal f(x) in the spatial domain is transformed to F(u) in the frequency domain, the DC component and the low-frequency components carry most of the information contained in the original signal f(x). Even though the signal is an infinite series in the transformed domain, the most significant coefficients are concentrated in the first few terms. In the quantization stage, normally the less significant coefficients are dropped; only the first k coefficients are actually used. The choice of k, however, depends on the application requirement. Moreover, the significant coefficients can be coded with better accuracy than the less significant ones.

The Discrete Cosine Transform (DCT) is the transform encoding technique used in image compression standards such as JPEG. The retained coefficients (the significant part of the transformed signal) are called DCT coefficients. This method is further discussed in Section 3.8.
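The energy-compaction behavior can be illustrated with a naive 1-D DCT-II sketch (illustrative only; real codecs use fast 2-D transforms with standard normalizations):

```python
import math

def dct(signal):
    """Naive unnormalized 1-D DCT-II: N samples -> N frequency coefficients."""
    N = len(signal)
    return [sum(x * math.cos(math.pi * (n + 0.5) * k / N)
                for n, x in enumerate(signal))
            for k in range(N)]

def idct(coeffs):
    """Matching inverse transform, reconstructing the original samples."""
    N = len(coeffs)
    return [(coeffs[0] / 2 + sum(coeffs[k] * math.cos(math.pi * (n + 0.5) * k / N)
                                 for k in range(1, N))) * 2 / N
            for n in range(N)]

# A slowly varying signal: most energy lands in the DC and low-frequency terms.
signal = [10, 10, 9, 8, 8, 7, 6, 6]
coeffs = dct(signal)
truncated = coeffs[:3] + [0] * 5   # keep only the first k = 3 coefficients
approx = idct(truncated)           # close to the original despite the loss
```

Reconstructing from all eight coefficients is exact; reconstructing from the first three is already a good approximation, which is exactly the redundancy the quantizer exploits.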

3.5.2 Differential Encoding

In differential encoding, only the difference between the actual value of a sample and a prediction of that value is encoded. Because of this approach, differential encoding is also known as predictive encoding. Techniques such as differential pulse code modulation, delta modulation, and adaptive differential pulse code modulation belong to this class of differential encoding techniques. All these methods essentially differ in the prediction part.

Fig. 3.4. Differential PCM Example: a signal f(t) and its samples f(t_i) at instants t_1 through t_7

The differential encoding technique is well suited to signals in which successive samples do not differ much from each other, but they should be significantly different from zero values! These techniques naturally apply to motion video, where one can transmit the difference in images across successive frames, and to audio signals.

Differential Pulse Code Modulation (DPCM): In DPCM, the prediction function is simply the following:

    f_predicted(t_i) = f_actual(t_{i-1})

So what needs to be encoded at every sampling instant is:

    Δf(t_i) = f_actual(t_i) - f_actual(t_{i-1})

If successive sample values are close to each other, then only the first sample needs a larger number of bits to encode, and the rest can be encoded with a relatively shorter number of bits.

Example: Consider the signal and its samples as shown in Fig. 3.4. The f_actual(t_i) at the various instants t_1, t_2, t_3, t_4, t_5, t_6, and t_7 are 9, 10, 8, 7, 6, 7, and 9. The f_predicted(t_i) are 0, 9, 10, 8, 7, 6, and 7. Therefore, the Δf(t_i) are +9, +1, -2, -1, -1, +1, and +2.
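The scheme of the example can be sketched directly (we use 0 as the initial prediction, as in the example):

```python
def dpcm_encode(samples):
    """Emit the first sample, then successive differences."""
    diffs, previous = [], 0           # the predictor starts at 0
    for s in samples:
        diffs.append(s - previous)
        previous = s                  # prediction = previous actual sample
    return diffs

def dpcm_decode(diffs):
    """Accumulate the differences to recover the samples."""
    out, previous = [], 0
    for d in diffs:
        previous += d
        out.append(previous)
    return out

samples = [9, 10, 8, 7, 6, 7, 9]      # f_actual(t_i) from Fig. 3.4
print(dpcm_encode(samples))           # [9, 1, -2, -1, -1, 1, 2]
```

The differences match the Δf(t_i) of the example, and decoding reproduces the samples exactly, so DPCM by itself is lossless; loss enters only if the differences are then quantized.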

Delta Modulation: Delta modulation is a special case of DPCM. The prediction function is the same as in DPCM; the difference is in coding the error (the difference between the predicted and the actual values). Delta modulation codes the error as a single bit or digit: it simply indicates that the `current' sample is to be increased by a step or decreased by a step. Delta modulation is therefore more suited to the coding of signals that do not change too rapidly with respect to the sampling rate.
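A sketch of delta modulation with a one-bit code per sample (the step size and starting level here are our choices):

```python
def delta_modulate(samples, step=1, start=0):
    """One bit per sample: 1 means 'go up a step', 0 means 'go down a step'."""
    bits, approx, track = [], start, []
    for s in samples:
        if s >= approx:
            bits.append(1)
            approx += step
        else:
            bits.append(0)
            approx -= step
        track.append(approx)   # the staircase the decoder rebuilds
    return bits, track

bits, track = delta_modulate([9, 10, 8, 7, 6, 7, 9], step=1, start=8)
print(bits)    # [1, 1, 0, 0, 0, 1, 1]
print(track)   # [9, 10, 9, 8, 7, 8, 9]
```

When the signal changes faster than one step per sample, the staircase lags behind it; this is the distortion (slope overload) that makes delta modulation unsuitable for rapidly changing signals.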

Adaptive DPCM: ADPCM is a sophisticated version of DPCM. The predicted value is extrapolated from a series of values of preceding samples using a time-varying function. That means, instead of using a constant prediction function, the estimation is made variable, in line with the characteristics of the sampled signal.

Vector Quantization: In vector quantization, the given data stream is divided into blocks called vectors. These blocks can be one- or two-dimensional. Typically, when we deal with images, the blocks are square blocks of pixels. Often, the vectors are of the same size. Consider the image presented in Fig. 3.5 and its digital representation with a resolution of 8 × 8. In the figure, there are 2 × 2 blocks that together make up the 8 × 8 image. Each of the 2 × 2 blocks will have a pattern between all 0s and all 1s, both inclusive. A table called a code book is used to find a match for each of the 2 × 2 blocks; its entries will correspond to some or all of the patterns. Each entry in the code book (basically, it is a table of entries, as shown in Fig. 3.5) is a 2 × 2 pattern of 0s and 1s written in linear form by stacking the rows of the 2 × 2 matrix one after another. This is indicated as the position index in Fig. 3.5. The code book itself may be predefined or dynamically constructed.

Using the code book, each vector in the image is coded using the best match available. As the code book is present both at the coding and the decoding ends, only the reference to the entry is actually transferred. When there is no exact match available in the code book, the vector of the image is coded with reference to the nearest pattern, and this will appear as distortion in the decoded picture. Also, while coding and transmitting, one can transfer the errors (the differences between the entries in the code book and the vectors in the image) in quantized form. Depending on the quantization level of the errors and on whether or not the errors are sent along with the image, the scheme may become lossy or lossless.

Vector quantization is useful in coding or compressing a source whose characteristics are particularly well known. Moreover, it should be possible to construct a code book that can approximate a wide range of actual image vectors. Vector quantization is generally found to be useful in the coding of speech. The reader may be interested in understanding more about the optimal construction of code books and the algorithms to find the best pattern match.
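A toy version of Fig. 3.5's scheme, with a full code book of all sixteen 2 × 2 binary patterns (indices here run 0 to 15 rather than the figure's 1 to G labeling), can be sketched as:

```python
# Code book of 2 x 2 binary patterns, flattened row by row as in Fig. 3.5.
# With all 16 patterns present, coding is exact; a smaller code book would
# force nearest-match coding and introduce distortion.
CODE_BOOK = {i: tuple((i >> b) & 1 for b in (3, 2, 1, 0)) for i in range(16)}
LOOKUP = {pattern: index for index, pattern in CODE_BOOK.items()}

def vq_encode(image):
    """Split an image (list of rows of 0/1) into 2 x 2 blocks; emit indices."""
    indices = []
    for r in range(0, len(image), 2):
        for c in range(0, len(image[0]), 2):
            block = (image[r][c], image[r][c + 1],
                     image[r + 1][c], image[r + 1][c + 1])
            indices.append(LOOKUP[block])
    return indices

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [1, 0, 0, 0],
         [1, 0, 0, 0]]
print(vq_encode(image))   # [0, 15, 10, 0]
```

Only the four indices are transmitted; the decoder, holding the same code book, pastes the corresponding patterns back into place.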

Fig. 3.5. Vector Quantization: an 8 × 8 binary image is divided into 2 × 2 blocks; each block, flattened row by row into a 4-bit pattern (0000 through 1111, entries 1 through G), is matched against the code book and coded by its entry number (position index)

3.6 Image Compression System

The uncompressed picture is in analog form and the compressed picture is in digital form, and there are several steps involved in compressing and decompressing a picture. The source image data, which is in digital form, is compressed using an encoder and reconstructed from the compressed form using a decoder. The encoder and the decoder are the two functional blocks of an image compression system. This is shown in Fig. 3.6.

Fig. 3.6. Image Compression System: source image → Encoder → compressed image → Decoder → reconstructed image

Both the encoder and the decoder have different parts. A generic composition of the units that together make up an encoder is shown in Fig. 3.7. These units correspond to different phases in the encoding process. The different phases together convert the analog form of the picture to a corresponding compressed digital stream at the output.

Fig. 3.7. Basic Units of an Encoder: uncompressed picture → Phase I (picture preparation) → Phase II (picture processing) → Phase III (quantization) → Phase IV (entropy coding) → compressed picture

Phase I is the picture preparation phase. In this phase the uncompressed analog signal is converted to its digital counterpart through sampling. Each sample is then represented in digital form using the appropriate number of bits per sample. A picture is considered as consisting of a number of pixels along the X and Y axes to cover the entire screen. The picture (pixels) is also divided into blocks consisting of an 8 × 8 array of pixels each. This is true for JPEG², the standard for still pictures or images [4]. In the case of motion video compression based on MPEG³, motion-compensation units called macroblocks of size 16 × 16 are used. This size is arrived at based on a trade-off between the coding gain provided by motion information and the cost associated with coding the motion information [5, 7]. These two standards are discussed further in this chapter.

Phase II is the picture processing phase. In this phase, the actual compression takes place using a source coding (or lossy) algorithm. A transformation from the time domain to the frequency domain is performed. As most of the `information' content is in the DC and low frequencies, appropriate weights are used in selecting the coefficients for inter-frame coding in this phase.

Phase III is the quantization phase. In this phase, the output of the previous stage, which consists of coefficients expressed as real numbers, is mapped into integers, resulting in a reduction of precision. In the transformed domain, the coefficients are distinguished according to their significance; for example, they could be quantized using a different number of bits per coefficient.

² Joint Photographic Experts Group.
³ Motion Pictures Experts Group.
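A uniform scalar quantizer makes the precision loss concrete (the step size here is an arbitrary illustration; JPEG actually uses a per-coefficient quantization table):

```python
def quantize(coefficients, step):
    """Map real coefficients to integers; at most step/2 of precision is lost."""
    return [round(c / step) for c in coefficients]

def dequantize(levels, step):
    """Reverse mapping: many inputs map to one level, so this is approximate."""
    return [q * step for q in levels]

coeffs = [63.7, -12.4, 3.1, 0.6, -0.2]
levels = quantize(coeffs, step=2.0)
print(levels)                       # [32, -6, 2, 0, 0]
print(dequantize(levels, 2.0))      # [64.0, -12.0, 4.0, 0.0, 0.0]
```

Note how the two smallest coefficients collapse to the same level 0: quantization is a many-to-one mapping and hence irreversible, as stated earlier.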

Phase IV is the entropy encoding phase. In this phase, which is the last step in the compression process, the digital data stream output from the quantization phase is compressed without losses.

In some cases, depending on the content of the image, repeated application of the processing phase and the quantization phase will result in better compression. Some compression techniques recognize this and gainfully employ it. Such techniques are called adaptive compression schemes, and they essentially repeat Phases II and III several times for better compression. After compression, the data looks as shown in Fig. 3.8. A preamble, the coding technique used, the actual data, and possibly an error correction code are all part of a typical compressed picture.

Fig. 3.8. The Compressed Picture: start (preamble), coding method, coded picture data, and error correction code

At the receiving end, this information has to be decompressed before presentation. Decompression is the inverse of the compression process. If an application uses similar techniques resulting in the same execution time for both compression and decompression, then we call that application symmetric; otherwise it is called asymmetric. Interactive applications are symmetric and presentation applications are asymmetric. Teleconferencing is an example of a symmetric application, and an audio-visual tutoring program is an example of an asymmetric application. As we mentioned earlier, depending on the picture quality and time constraints, an appropriate compression technique will be selected. The standards applicable for pictures provide for this option.

3.7 Compression System Standards

There are several coding and compression standards already available; some of them are in use in today's products while others are still being developed. The most important standards are:

• JPEG, a standard developed jointly by ISO and ITU-TS for the compression of still images.

• MPEG-1, MPEG-2, and MPEG-4, standards developed by the ISO committee IEC/JTC1/SC29/WG11 for coding combined video and audio information.

• H.261, a standard developed by Study Group XV and known popularly as Video Codec for Audiovisual Services at p × 64 Kbps.

• ITU-TS H.263 for videophone applications at a bit-rate below 64 Kbps.

• ISO JBIG for compressing bilevel images.

• DVI, a de facto standard for compression from Intel that enables storage compression and real-time decompression for presentation purposes. This will not be discussed further, as this standard is becoming obsolete.

3.8 JPEG

JPEG is the acronym for Joint Photographic Experts Group, a joint collaboration between the ISO committee designated JTC1/SC2/WG10 and the CCITT SGVIII (ITU-T) to establish an international standard for continuous-tone (multilevel) still images, both grey-scale and color. JPEG can compress typical images to between 1/10 and 1/50 of their uncompressed bit size without visibly affecting image quality. JPEG addresses several applications such as photovideotex, color facsimile, medical imaging, graphic arts, newspaper wire photo transmission, and so on. In 1992, JPEG became an ISO standard and an international standard [4, 11, 12].

The goal of JPEG has been to develop a method for continuous-tone image compression which meets the following requirements:

• The encoder should be parameterizable, so that the application can set the desired compression/quality tradeoff. The compression scheme should result in image fidelity.

• It should be applicable to practically any kind of continuous-tone digital source image; it should not be restricted to certain dimensions, color spaces, pixel aspect ratios, etc., nor limited to classes of scene imagery with restrictions on scene content such as complexity, range of colors, etc.

• It should have tractable computational complexity, to make it possible to implement on a range of CPUs as well as in special-purpose hardware. The compression process has to be completed in real-time.

• It should be possible to implement in hardware at a viable cost for applications requiring high performance.

• It should have the following modes of operation:

  - Sequential encoding: each image component is encoded in a single left-to-right, top-to-bottom scan.

  - Progressive encoding: the image is encoded in multiple scans, for applications in which transmission time is long and the viewer prefers to watch the image build up in multiple coarse-to-clear phases.

  - Lossless encoding: the image is encoded so as to guarantee exact recovery of every source image sample value.

  - Hierarchical encoding: the image is encoded at multiple resolutions, so that lower-resolution versions may be accessed without first having to decompress the image at its full resolution.

JPEG aims at a very general scheme which is picture-feature independent and technique independent. Basically, it defines a framework that can take care of all the requirements listed above. The stages of JPEG encoding are shown in Fig. 3.9; the corresponding stages in decoding, decompression, and reconstruction of the picture are shown in Fig. 3.10. The JPEG standard is general and is independent of picture size, color, and aspect ratio. JPEG considers a picture as consisting of a number of components, each with a different size. For example, brightness and color are two components of a picture.

An Aside: The elements specified in the JPEG standard are an encoder, a decoder, and an interchange format.

An encoder is an embodiment of an encoding process. An encoder takes as input digital source image data and table specifications, and by means of a specified set of procedures generates as output compressed image data.

A decoder is an embodiment of the decoding process. A decoder takes as input compressed image data and table specifications (the same as the ones used by the encoder), and by means of a set of procedures generates as output digital reconstructed image data.

An interchange format is a compressed image data representation which includes all table specifications used in the encoding process. The interchange format is intended for exchange between application environments.

Sec. 3.8. JPEG 93

[Fig. 3.9. JPEG Encoding of a Picture: the digitized source image passes through the Forward Discrete Cosine Transform (FDCT), the quantizer, and the entropy encoder to produce the compressed image data; the quantizer and entropy encoder are driven by table specifications.]

[Fig. 3.10. JPEG Decoding of a Picture: the compressed image data passes through the entropy decoder, the dequantizer, and the Inverse Discrete Cosine Transform (IDCT) to produce the reconstructed image data; the decoding stages are driven by the same table specifications.]

3.8.1 JPEG Compression Process

The JPEG standard enables the design of two processes, viz., image compression and decompression. In the sequential mode, each of these processes consists of several steps, as shown in Table 3.1.

• JPEG Compression Process
  – Preparation of Data Blocks
  – Source Encoding step
    ∗ Discrete Cosine Transform (DCT), also called forward DCT (FDCT)
    ∗ Quantization
  – Entropy Encoding step
    ∗ Run Length Coding
    ∗ Huffman or Arithmetic Coding
• JPEG Decompression Process
  – Entropy Decoding step
    ∗ Huffman or Arithmetic Coding
    ∗ Run Length Coding
  – Source Decoding step
    ∗ Dequantization
    ∗ Inverse Discrete Cosine Transform (IDCT)

Table 3.1. Steps in the JPEG Process

JPEG may compress grey-scale and color images. As there are many ways to represent color images, the standard does not impose an initial format for the representation of color images. In general, the pictures are considered as a matrix of colored dots called pixels. Each pixel may be represented by an RGB triplet, a YUV triplet (European TV), a YIQ triplet (North American and Japanese TV), a YCrCb triplet, or by many other combinations.⁴ The digitized variables which represent the image are called the image components. R, Y, Q, or Cb are examples of image components, which are represented as matrices of values. Sometimes these components may be sub-sampled, which is normally done in the case of the color-difference components. As the size of the image to be compressed is variable and the sub-sampling unknown, JPEG has foreseen the need to deal with a relatively large number of image components. The maximum number of components that JPEG can deal with is 255. In practice, regular color images use three components only.

⁴Composite signals: RGB values are direct and the rest are derivatives, as explained earlier in Chapter 2.

JPEG utilizes a methodology based on the Discrete Cosine Transform (DCT) for compression. It is a symmetrical process, with the same complexity for coding and for decoding. The reader should note that JPEG does not have an embedded encoded/compressed audio stream and is aimed at pictures or images only.

In DCT-based coding, two distinct modes are possible: sequential and progressive. For the sequential DCT-based mode, 8x8 sample blocks are typically input block-by-block from left-to-right, and then block-row by block-row from top-to-bottom. After a block has been quantized and prepared for entropy encoding, all 64 of its quantized DCT coefficients can be immediately entropy encoded and output as part of the compressed image data, thereby minimizing the storage requirements. In this approach, the image slowly unfolds from top-to-bottom.

For the progressive DCT-based mode, 8x8 blocks are also typically encoded in the same order, but in multiple scans through the image. This is accomplished by adding an image-sized coefficient memory buffer between the quantizer and the entropy encoder. As each block is quantized, its coefficients are stored in the buffer. The DCT coefficients in the buffer are then partially encoded in each of multiple scans. This approach makes a silhouette of the image appear first and then progressively sharpens and brightens it.

Preparation of Data Blocks: As explained earlier in Chapter 2, the source image has to be converted from analog to digital form. The source digital image in its original form is a matrix of sampled values of signal amplitude (color or gray scale). If it is color, there will be as many matrices as there are color components; if it is gray scale, there will be a single component. This is illustrated in Fig. 3.11. The discrete cosine transform is not applied to the entire image. Instead, in the sequential lossy mode, the image is divided into individual blocks. Each individual block is treated separately at each step of the processing chain. The blocks are small square matrices of 8 × 8 sampled values.

Example: Consider a color image represented by three components: the luminance Y and the two color differences U and V. The size of the image is 640 pixels per line by 480 lines, which corresponds to the VGA standard. The luminance therefore consists of a matrix of 640 by 480 values, and each color-difference component is a matrix of 320 by 240 values, if we assume a 4:1:1 chrominance reduction (Fig. 3.11 and Fig. 3.9). The block preparation step will submit to the DCT process 4800 blocks of luminance and twice 1200 blocks for the color differences. Blocks may then be passed to the DCT, component after component, and within each component left to right and top to bottom. This technique is called non-interleaved data ordering.
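The block-count arithmetic in this example can be checked with a short Python sketch (it assumes component dimensions that are exact multiples of the block size; a real codec must also pad edge blocks):

```python
def count_blocks(width, height, block=8):
    # Number of block x block tiles in a component; dimensions are
    # assumed here to be exact multiples of the block size.
    return (width // block) * (height // block)

# 640x480 luminance, 320x240 color-difference components (4:1:1 reduction)
print(count_blocks(640, 480))  # 4800 luminance blocks
print(count_blocks(320, 240))  # 1200 blocks per color-difference component
```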

Forward Discrete Cosine Transform: In transform coding, the initial spatial or temporal domain is transformed into an abstract domain which is more suitable for compression. The process is reversible; that is, applying the inverse transformation will restore the original data.

Each 8 × 8 block in Fig. 3.9 can be represented by 64 point values denoted as f(x,y), 0 ≤ x ≤ 7, 0 ≤ y ≤ 7, where x and y are the two spatial dimensions. That is, each 8x8 block of source image samples can be viewed as a 64-point discrete signal that is a function of the two spatial dimensions x and y.

The DCT transforms these values to the frequency domain as c = g(F_u, F_v), where c is the coefficient and F_u and F_v are the respective spatial frequencies for each direction, using the transformation [4]:

For 0 ≤ u ≤ 7, 0 ≤ v ≤ 7:

    F(u,v) = 0.25 · C(u) · C(v) · Σ_{x=0..7} Σ_{y=0..7} f(x,y) · cos((2x+1)uπ/16) · cos((2y+1)vπ/16)

where C(u) = C(v) = 1/√2 for u,v = 0, and 1 otherwise.
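A direct, unoptimized sketch of this transformation (practical codecs use fast factored DCT algorithms instead of this O(n⁴) form):

```python
import math

def fdct_8x8(f):
    """Forward DCT of an 8x8 block f[x][y], per the F(u,v) formula above."""
    def C(w):
        return 1 / math.sqrt(2) if w == 0 else 1.0
    F = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (f[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            F[u][v] = 0.25 * C(u) * C(v) * s
    return F

# A flat block: all energy ends up in the DC coefficient.
flat = [[100] * 8 for _ in range(8)]
coeffs = fdct_8x8(flat)
# coeffs[0][0] is 800.0 (the DC term); every other coefficient is ~0
```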

The output of this equation gives another set of 64 values known as the DCT coefficients, i.e., the value of a particular frequency, and no longer the amplitude of the signal at the sampled position (x,y). The coefficient corresponding to the vector (0,0) is called the DC coefficient, and the rest are called AC coefficients. The DC coefficient generally contains a significant fraction of the total image energy. Because sample values typically vary slowly from point to point across an image, the FDCT (Forward Discrete Cosine Transform) processing compresses data by concentrating most of the signal in the lower values of the (u,v) space. For a typical 8x8 sample block from a typical source image, many, if not most, of the (u,v) pairs have zero or near-zero coefficients and therefore need not be encoded. At the decoder, the IDCT (Inverse Discrete Cosine Transform) reverses this processing step. For each DCT-based mode of operation, the JPEG proposal specifies different codes for 8-bit and 12-bit source-image samples.

[Fig. 3.11. Image Preparation Phase: the source image data is separated into its components (brightness Y and the colors U and V), and each component is divided into 8 × 8 blocks.]

Intuition: In a block representing an image, sampled values usually vary slightly from point to point. Thus, the coefficients of the lowest frequencies will be high, but the medium and high frequencies will have a small or zero value. The energy of the signal is concentrated in the lowest spatial frequencies.

Imagine a flat monochromatic wall. Take a picture of it and divide it into 8 × 8 blocks. These are the f(x,y). In each block, the amplitude of the signal will be nearly constant from pixel to pixel. As the value does not change much in each direction, the zero-frequency coefficient will be high (the frequency measures the rate of change in each direction), whereas the other frequencies will be nearly zero.

If a sharp black line is drawn on the picture, the image becomes more complex. Some of the blocks will be affected, but not all. In the affected blocks, there will be a rapid change between two consecutive values, which will entail a high coefficient for one of the highest frequencies.

In practice, continuous-tone images such as photographs do not have too many sharp lines or zeros. The transitions between areas are often smooth. Thus, the information is usually mainly contained in the low frequencies. In fact, this is the underlying assumption in JPEG. JPEG is not designed for too-complex images, in particular images which look like bitonal images.

In principle, the DCT introduces no loss to the source image samples; it simply transforms them to a domain in which they can be more efficiently encoded. This means that, if the FDCT and IDCT could be computed with perfect accuracy, and if the DCT coefficients were not quantized, the original 8x8 block could be recovered exactly. But as can be seen from the equations, the FDCT (and also the IDCT) equations contain transcendental functions. Therefore, no finite-time implementation can compute them with perfect accuracy. Several algorithms have been proposed to compute these values approximately. But no single algorithm is optimal for all implementations: an algorithm that runs optimally in software normally does not operate optimally in firmware (e.g., in a programmable DSP) or in hardware.

Because of the finite precision of the DCT inputs and outputs, coefficients calculated by different algorithms, or by independent implementations of the same algorithm, will result in slightly different output for identical input. To enable innovation and customization, JPEG has chosen not to specify a unique FDCT/IDCT algorithm, but to address the quality issue by specifying an accuracy test to guard against grossly inaccurate coefficients that degrade image quality.

Quantization: The output of the FDCT (the 64 DCT coefficients) is uniformly quantized in conjunction with a 64-element quantization table, to be specified by the application (or the user) as an input to the encoder. Each element can be an integer from 1 to 255, which specifies the step size of the quantizer for its corresponding DCT coefficient. The purpose of quantization is to achieve further compression by representing DCT coefficients with no greater precision than is necessary to achieve the desired image quality. It may be noted that quantization is a many-to-one mapping, and therefore is fundamentally lossy. It is the principal cause of lossiness in DCT-based encoders. Quantization is defined as division of each DCT coefficient by its corresponding quantizer step size, followed by rounding to the nearest integer:

    F^Q(u,v) = IntegerRound( F(u,v) / Q(u,v) )
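As a sketch, the quantization step is a pair of nested loops over the block (note that Python's round() rounds exact halves to even, which may differ from a particular codec's IntegerRound rule on .5 values):

```python
def quantize(F, Q):
    # F^Q(u,v) = IntegerRound(F(u,v) / Q(u,v)) -- the only lossy step
    # in the DCT-based processing chain.
    return [[round(F[u][v] / Q[u][v]) for v in range(8)] for u in range(8)]

# A coefficient of 185 quantized with step size 16 becomes round(11.56) = 12
```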

After quantization, the DC coefficient is treated separately from the 63 AC coefficients. The DC coefficient is a measure of the average value of the 64 image samples. Because there is usually a strong correlation between the DC coefficients of adjacent 8x8 blocks, the quantized DC coefficient is encoded as the difference from the DC term of the previous block in the encoding order, as shown in Fig. 3.12. This special treatment is worthwhile, as DC coefficients frequently contain a significant fraction of the total image energy.
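The differential coding of DC terms is plain DPCM; a minimal sketch, assuming the predictor starts at zero for the first block:

```python
def dc_differences(dc_values):
    """DIFF_i = DC_i - DC_{i-1}, with the predictor for the first
    block taken as 0 (an assumption for this sketch)."""
    prev, diffs = 0, []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

# Quantized DC terms 182, 185, 180 encode as 182, +3, -5
```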

Finally, all of the quantized coefficients are ordered into the "zig-zag sequence" shown in Fig. 3.12. This ordering helps to facilitate entropy coding by placing low-frequency coefficients, which are more likely to be non-zero, before high-frequency coefficients.
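The zig-zag order can be generated rather than stored as a 64-entry table, since it visits the anti-diagonals of constant u+v in alternating directions; a sketch:

```python
def zigzag_order():
    """(u, v) visiting order of the 8x8 zig-zag scan: anti-diagonals
    of constant u + v, traversed in alternating directions."""
    return sorted(((u, v) for u in range(8) for v in range(8)),
                  key=lambda p: (p[0] + p[1],                       # diagonal
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

# First entries: (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), ...
```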

The final step in the DCT-based encoder is entropy coding. This step achieves additional compression losslessly, by encoding the quantized DCT coefficients more compactly based on their statistical characteristics. The JPEG proposal specifies two entropy coding methods: Huffman coding and arithmetic coding. It is useful to consider entropy coding as a two-step process. The first step converts the zig-zag sequence of quantized coefficients into an intermediate sequence of symbols. The second step converts the symbols to a data stream in which the symbols no longer have identifiable boundaries. The form and definition of the intermediate symbols depend on both the DCT-based mode of operation and the entropy coding method. Huffman coding requires that one or more sets of Huffman code tables be specified by the application. The same tables used to compress the image are needed to decompress it. Huffman tables may be pre-defined and used within an application as defaults, or computed specifically for a given image in an initial statistics-gathering pass prior to compression. Such choices are the responsibility of JPEG applications; as such, the JPEG proposal does not endorse any particular Huffman tables. A comprehensive example is given in Fig. 3.13. The reader should note that the numbers given in the example are only indicative of what the various matrices will look like, and are not from any real image.

[Fig. 3.12. Differential DC Coding and Zig-Zag Sequencing: the quantized DC coefficient of block i is encoded as ΔDC_i = DC_i − DC_{i−1}, and the AC coefficients are scanned from AC(0,1) through AC(7,7) along the diagonals of the 8 × 8 block.]
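A simplified sketch of the first (symbol-forming) step for the AC coefficients: each non-zero coefficient becomes a (zero-run, value) pair, and trailing zeros collapse into an end-of-block marker. A real JPEG coder additionally splits each value into a size category plus amplitude bits, but the run-length idea is the same:

```python
def rle_symbols(zigzag_ac):
    """Convert the 63 zig-zag-ordered AC coefficients into simplified
    (zero_run, value) symbols, ending with 'EOB' if zeros remain."""
    symbols, run = [], 0
    for coeff in zigzag_ac:
        if coeff == 0:
            run += 1
        else:
            symbols.append((run, coeff))
            run = 0
    if run:
        symbols.append("EOB")  # trailing zeros collapse to end-of-block
    return symbols

# e.g. [5, 0, 0, -3, 0, 0, ...] -> [(0, 5), (2, -3), 'EOB']
```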

3.8.2 Compression and Picture Quality

For color images with moderately complex scenes, all DCT-based modes of operation typically produce the levels of picture quality shown in Table 3.2 for the indicated ranges of compression.

3.8.3 Processing at the Receiver

At the receiving end, upon decoding, de-quantization (the inverse function of quantization) is to be performed. That simply means that the normalization is removed by multiplying by the step size, which returns the result to a representation appropriate for input to the IDCT. The corresponding equation is:

    F'(u,v) = F^Q(u,v) · Q(u,v)

This is then fed to the IDCT, which is represented by the following idealized mathematical definition for an 8x8 block:

    f(x,y) = 0.25 · Σ_{u=0..7} Σ_{v=0..7} C(u) · C(v) · F(u,v) · cos((2x+1)uπ/16) · cos((2y+1)vπ/16)

where C(u) = C(v) = 1/√2 for u,v = 0, and 1 otherwise.
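De-quantization and the IDCT translate directly from these equations; a sketch:

```python
import math

def C(w):
    return 1 / math.sqrt(2) if w == 0 else 1.0

def dequantize(FQ, Q):
    # F'(u,v) = F^Q(u,v) * Q(u,v): undoes the normalization,
    # but not the rounding performed at the encoder.
    return [[FQ[u][v] * Q[u][v] for v in range(8)] for u in range(8)]

def idct_8x8(F):
    """Inverse DCT matching the idealized f(x,y) definition above."""
    f = [[0.0] * 8 for _ in range(8)]
    for x in range(8):
        for y in range(8):
            s = 0.0
            for u in range(8):
                for v in range(8):
                    s += (C(u) * C(v) * F[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            f[x][y] = 0.25 * s
    return f
```

A DC-only coefficient block (F(0,0) = 800, all AC terms zero) synthesizes back into a flat block of 100s, matching the harmonic-synthesizer view below.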

[Fig. 3.13. JPEG Example: an 8 × 8 sample block f(x,y), the DCT coefficient matrix F(u,v), the quantization table Q(u,v), the quantized coefficients F^Q(u,v) with DC coefficient F^Q(0,0), and the resulting zig-zag sequence.]


Intuitively, the FDCT and IDCT behave like a harmonic analyzer and a harmonic synthesizer, respectively. At the decoder, the IDCT takes the 64 DCT coefficients and reconstructs a 64-point output image signal by summing the basis signals.

3.8.4 Hierarchical Mode of Operation

In the hierarchical mode of operation, an image is encoded as a sequence of frames. These frames provide reference reconstructed components which are usually needed for prediction in subsequent frames. Like the progressive DCT-based mode, the hierarchical mode offers a progressive presentation. It is useful in environments which have multi-resolution requirements. The hierarchical mode also offers the capability of progressive transmission to a final lossless stage.

Using hierarchical coding, a set of successively smaller images can be created by down-sampling (low-pass filtering and sub-sampling) the preceding larger image in the set. Then, starting with the smallest image in the set, the image set is coded with increasing resolution. After the first stage, each lower-resolution image is scaled up to the next resolution (up-sampled) and used as a prediction for the following stage. When the set of images is stacked as shown in Fig. 3.14, it sometimes resembles a pyramid. Just as smaller images generated as part of a hierarchical progression can be scaled up to full resolution, full-resolution images generated as part of a non-hierarchical progression can be scaled down to smaller sizes. These scalings are done at the decoder and are not restricted to powers of 2.

Compression Range      Picture Quality
0.25-0.5 bits/pixel    Moderate to good quality, sufficient for some applications
0.5-0.75 bits/pixel    Good to very good quality, sufficient for many applications
0.75-1.5 bits/pixel    Excellent quality, sufficient for most applications
1.5-2.0 bits/pixel     Usually indistinguishable from the original picture, sufficient for the most demanding applications

Table 3.2. Compression Range versus Picture Quality [4]

Fig. 3.14. Hierarchical Multi-Resolution Encoding
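The pyramid construction can be sketched as repeated 2x averaging and sub-sampling (a real encoder would use a better low-pass filter than a 2x2 box, and would also code the inter-level prediction differences):

```python
def downsample(img):
    """Halve each dimension by averaging 2x2 neighborhoods (a crude
    low-pass filter followed by sub-sampling)."""
    h, w = len(img), len(img[0])
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1] +
              img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) // 4
             for c in range(w // 2)] for r in range(h // 2)]

def pyramid(img, levels):
    """Successively smaller images, largest first."""
    out = [img]
    for _ in range(levels - 1):
        out.append(downsample(out[-1]))
    return out
```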

3.8.5 Multiple-Component Images

In the JPEG standard, a source image may contain from 1 to 255 image components, also referred to as color (or spectral) bands or channels. Each component consists of a rectangular array of samples. All samples are defined to be unsigned integers with precision P bits, with any value in the range [0, 2^P − 1]. All samples of all components within the same source image must have the same precision P, the value of which can be 8 or 12 for DCT-based codecs and 2 to 16 for predictive codecs, respectively. The multi-component source image model is shown in Fig. 3.15.

[Fig. 3.15. JPEG Source Image Model: components C_1, C_2, ..., C_N, each a rectangular array of x_i × y_i samples arranged in lines.]

The component C_i has sample dimensions x_i × y_i. As the formats in which some image components are sampled may differ from those of other components, in the JPEG source model each component can have different dimensions. The dimensions must have a mutual integral relationship defined by H_i and V_i, the relative horizontal and vertical sampling factors, which must be specified for each component. The overall image dimensions X and Y are defined as the maximum x_i and y_i for all components in the image, and can be any number up to 2^16. H and V are allowed only the integer values 1 through 4. The encoded parameters are X, Y, and the H_i's and V_i's for each component. The decoder reconstructs the dimensions x_i and y_i for each component according to the following ceiling functions:

    x_i = ⌈X × H_i / H_max⌉
    y_i = ⌈Y × V_i / V_max⌉

Example: In the example discussed so far, the C_i's are Y, U, and V; the corresponding H_i and V_i are H_Y = 4, V_Y = 4; H_U = 2, V_U = 2; and H_V = 2, V_V = 2 for Y, U, and V, respectively. The resolution of each component (Y, U, and V) can then be taken as X_Y = 512, Y_Y = 512; X_U = 256, Y_U = 256; and X_V = 256, Y_V = 256.
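The two ceiling functions translate directly into code; a sketch, where H_max and V_max are the largest sampling factors over all components:

```python
import math

def component_dims(X, Y, H, V, H_max, V_max):
    """x_i = ceil(X * H_i / H_max), y_i = ceil(Y * V_i / V_max)."""
    return math.ceil(X * H / H_max), math.ceil(Y * V / V_max)

# 512x512 image, luminance at H=V=4, chroma at H=V=2 (H_max = V_max = 4):
# luminance keeps full resolution, chroma is reconstructed at 256x256.
print(component_dims(512, 512, 4, 4, 4, 4))  # (512, 512)
print(component_dims(512, 512, 2, 2, 4, 4))  # (256, 256)
```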

3.8.6 Encoding Order and Interleaving for Multi-Component Images

In practice, many applications need to pipeline the process of displaying multi-component images in parallel with the process of decompression. For many systems, this is only feasible if the components are interleaved together within the compressed data stream. The JPEG proposal defines the concept of a data unit, which is a single sample in the case of predictive codecs and an 8x8 block of samples in the case of DCT-based codecs. When two or more components are interleaved, each component C_i is partitioned into rectangular regions of H_i by V_i data units, as shown in Fig. 3.15. Regions are ordered within a component from left-to-right and top-to-bottom, and within a region, data units are ordered from left-to-right and top-to-bottom. The JPEG proposal defines the term Minimum Coded Unit (MCU) as the smallest group of interleaved data units. In the example given in Fig. 3.16, MCU_1 consists of data units taken from the top-left-most region of C_1, followed by data units from the same region of C_2, and likewise for C_3 and C_4. MCU_2 continues the pattern, as shown in Fig. 3.16. Thus the interleaved data is an ordered sequence of MCUs, and the number of data units contained in an MCU is determined by the number of components interleaved and their relative sampling factors. The maximum number of components which can be interleaved is 4, and the maximum number of data units in an MCU is 10.

[Fig. 3.16. Interleaved Data Ordering Example: four components C_1 through C_4 with different region sizes; MCU_1 = ABCD ab 01 *, MCU_2 = EFGH cd 23 +, MCU_3 = IJKL ef 45 =, MCU_4 = MNOP gh 67 #.]
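The data-unit count per MCU is the sum of H_i × V_i over the interleaved components, which is exactly what the two limits above constrain; a sketch (the 4:2:0-style sampling in the comment is an illustrative assumption, not from the text):

```python
def mcu_data_units(sampling_factors):
    """Data units per MCU: sum of H_i * V_i over interleaved components.
    JPEG allows at most 4 interleaved components and 10 data units/MCU."""
    if len(sampling_factors) > 4:
        raise ValueError("at most 4 components may be interleaved")
    total = sum(h * v for h, v in sampling_factors)
    if total > 10:
        raise ValueError("an MCU may hold at most 10 data units")
    return total

# Luminance at H=V=2 plus two chroma components at H=V=1:
# 4 + 1 + 1 = 6 data units per MCU
```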

3.8.7 Using Multiple Tables

When multiple components and interleaving are involved, JPEG codecs must control the use of the proper data table for each component. It must be ensured that the same quantization table and the same entropy coding table be used to encode all samples within a component. JPEG decoders can store up to four different quantization tables and up to four different entropy coding tables simultaneously. Therefore, it is necessary to switch between tables during decompression of a scan containing multiple interleaved components, as the tables cannot be loaded during decompression of a scan. Fig. 3.17 illustrates the table-switching control that must be managed in conjunction with multiple-component interleaving on the encoder side.

[Fig. 3.17. Table Switching Control: components A, B, and C feed the encoding process, which switches between table specifications 1 and 2 to produce the compressed image data.]

3.8.8 Some Remarks on JPEG

Most importantly, an interchange format syntax is specified which ensures that a JPEG-compressed image can be exchanged successfully between different application environments. The applications have the freedom to specify default or referenced tables, as appropriate.

In essence, JPEG is a standard to be strictly followed for image exchange, and efficient hardware, software, and hybrid implementations are available [14]. While a multimedia system designer cannot afford to change the JPEG process, the designer can certainly fine-tune the implementation.

3.9 JBIG

Images can be broadly classified into two categories: bi-tonal and continuous-tone images. Regular printed text without any gray-scale characters is an ideal example of a bi-tonal image. JBIG concentrates on bi-tonal images, while JPEG concentrates on continuous-tone images. Bi-tonal images are also called bi-level images; that is, images that have only two colors, such as black and white. ITU-T Recommendations T.4 and T.30 are coding standards for Group 3 facsimile machines. The compression algorithm of the Group 3 standard was laid down in 1990 for use with switched networks. The coding standard Recommendation T.6 for Group 4 facsimile machines was published in 1984. While T.6 provides better resolution, it requires line speeds of the order of 56 Kbps or 64 Kbps to operate.

JBIG (Joint Bi-level Image experts Group) is an advanced compression scheme utilizing lossless, predictive methods [6,8,9]. The JBIG compression algorithm is defined by ISO/IEC Standard 11544:1993. The JBIG specification defines the compression scheme, not the file format. The JBIG Alliance is sponsoring the JBIG Interchange Recommendation, which provides a well-defined method for storing JBIG-compressed data in industry-standard TIFF (Tagged Image File Format) formatted files.

JBIG is an advanced lossless compression algorithm originally developed for facsimile transmission. The efficiency of the algorithm derives from arithmetic encoding, a mathematically complex concept based on image contexts. Because JBIG stores data in independent bit-planes, it efficiently compresses both bi-tonal and gray-scale data.

The main characteristics of JBIG are:

• Compatible progressive/sequential coding. This means that a progressively coded image can be decoded sequentially, and the other way around.

• JBIG will be a lossless image compression standard: all bits in the images before and after compression and decompression will be exactly the same.

JBIG specifies the number of bits per pixel in the image. Its allowable range is 1 through 255, but starting at 8 or so, compression will be more efficient using other algorithms. On the other hand, medical images such as chest X-rays are often stored with 12 bits per pixel while no distortion is allowed, so JBIG can certainly be of use in this area. To limit the number of bit changes between adjacent decimal values (e.g., 127 and 128), it is wise to use Gray coding before compressing multi-level images with JBIG. JBIG then compresses the image on a bit-plane basis, so the rest of this text assumes bi-level pixels.
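The standard binary-reflected Gray code achieves exactly this: adjacent integers map to codewords that differ in a single bit. A sketch:

```python
def binary_to_gray(n):
    # Binary-reflected Gray code: adjacent integers differ in one bit.
    return n ^ (n >> 1)

# 127 (0111_1111) and 128 (1000_0000) differ in all 8 bits,
# but their Gray codes differ in exactly one bit.
g127, g128 = binary_to_gray(127), binary_to_gray(128)
print(bin(g127 ^ g128).count("1"))  # 1
```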

Progressive coding is a way to send an image gradually to a receiver instead of all at once. JBIG uses discrete steps of detail by successively doubling the resolution. Compatibility between progressive and sequential coding is achieved by dividing an image into stripes. Each stripe is a horizontal bar with a user-definable height. Each stripe is separately coded and transmitted, and the user can define in which order stripes, resolutions, and bit-planes are intermixed in the coded data. A progressively coded image can be decoded sequentially by decoding each stripe, beginning with the one at the top of the image, to its full resolution, and then proceeding to the next stripe. Progressive decoding can be done by decoding only a specific resolution layer from all stripes.

After dividing an image into bit-planes, resolution layers, and stripes, eventually a number of small bi-level images are left to compress. Compression is done using a Q-coder.

The Q-coder codes bi-level pixels as symbols using the probability of occurrence of these symbols in a certain context. JBIG defines two kinds of context: one for the lowest resolution layer (the base layer), and one for all other layers (differential layers). Differential-layer contexts contain pixels in the layer to be coded and in the corresponding lower-resolution layer.

For each combination of pixel values in a context, the probability distribution of black and white pixels can be different. In an all-white context, the probability of coding a white pixel will be much greater than that of coding a black pixel. The Q-coder assigns, just like a Huffman coder, more bits to less probable symbols, and so achieves compression. Unlike a Huffman coder, the Q-coder can assign one output code-bit to more than one input symbol, and thus is able to compress bi-level pixels without explicit clustering, as would be necessary using a Huffman coder.

Maximum compression will be achieved when all probabilities (one set for each combination of pixel values in the context) follow the probabilities of the pixels. The Q-coder therefore continuously adapts these probabilities to the symbols it sees.
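The adaptation idea can be illustrated with a toy, context-free probability estimator (a real Q-coder instead steps through a fixed finite-state table of probability estimates, one state per context; the counting scheme below is an illustrative assumption, not the Q-coder's actual mechanism):

```python
def adaptive_estimates(bits):
    """Re-estimate P(white) before coding each symbol, from running
    counts with Laplace-style initial values."""
    n_white, n_total = 1, 2
    estimates = []
    for b in bits:                    # 0 = white pixel, 1 = black pixel
        estimates.append(n_white / n_total)
        n_white += (b == 0)
        n_total += 1
    return estimates

# A run of white pixels drives the estimate of P(white) upward:
# adaptive_estimates([0, 0, 0, 0]) starts at 0.5 and ends at 0.8
```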

3.10 MPEG

The MPEG system layer has the basic task of combining one or more audio and video compressed bit-streams into a single bit-stream. It defines the data stream syntax that provides for timing control and the interleaving and synchronization of audio and video bit-streams. From the systems perspective, an MPEG bit-stream consists of a system layer and compression layers. The system layer basically provides an envelope for the compression layers. The compression layers contain the data to be given to the decoders, whereas the system layer provides the control for demultiplexing the interleaved compression layers. A typical MPEG system block diagram is shown in Fig. 3.18.

[Fig. 3.18. MPEG System Structure: the audio and video compression layers are multiplexed into a single bit-stream (e.g., on digital storage); the system decoder, using the system clock for timing, demultiplexes it and feeds the audio and video decoders.]

The MPEG bit-stream consists of a sequence of packs that are in turn subdivided into packets, as shown in Fig. 3.22. Each pack consists of a unique 32-bit byte-aligned pack start code and header, followed by one or more packets of data. Each packet consists of a packet start code (another unique 32-bit byte-aligned code) and header, followed by packet data (compressed audio or video data). The system decoder parses this bit-stream and feeds the separated audio and video to the appropriate decoders along with timing information.

The MPEG system uses an idealized decoder called the System Target Decoder (STD), which interprets the pack and packet headers to deliver the elementary bit-streams to the appropriate audio or video decoder. In the STD, the bits for an access unit (a picture or an audio access unit) are removed from the buffer instantaneously, at a time dictated by a Decoding Time Stamp (DTS) in the bit-stream. The bit-stream also contains another type of time stamp, the Presentation Time Stamp (PTS). Buffer overflow or underflow is controlled by the DTS; synchronization between audio and video decoding is controlled by the PTS.

3.10.1 MPEG Audio Compression

MPEG audio coding supports sampling rates of 32, 44.1, and 48 KHz. At 16 bits per sample, the uncompressed audio would require about 1.5 Mbps. After compression, the bit-rates for monophonic channels are between 32 and 192 Kbps; the bit-rates for stereophonic channels are between 128 and 384 Kbps.

The MPEG audio system first segments the audio into windows 384 samples wide and uses a filter bank to decompose each window into 32 sub-bands, each with a width of approximately 750 Hz for a sampling rate of 48 KHz. As is typical of sub-band coding, each sub-band is decimated such that the sampling rate per sub-band is 1.5 KHz and there are 12 samples per window. An FFT (Fast Fourier Transform) of the audio input is used to compute a global masking threshold for each sub-band, and from this a uniform quantizer is chosen that provides the least audible distortion at the required bit-rate. For more information on MPEG audio coding, the reader is referred to [3].
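The window and sub-band figures above are mutually consistent; a quick check (the numbers are taken directly from the text, the variable names are ours):

```python
# Sub-band arithmetic for MPEG audio, using the figures from the text.
sampling_rate = 48_000       # samples per second (the 48 KHz case)
window = 384                 # samples per analysis window
subbands = 32                # filter-bank outputs

subband_width = sampling_rate / 2 / subbands   # bandwidth per sub-band (Hz)
subband_rate = sampling_rate / subbands        # decimated rate per sub-band (Hz)
samples_per_window = window // subbands        # sub-band samples per window

print(subband_width, subband_rate, samples_per_window)   # 750.0 1500.0 12
```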

3.10.2 MPEG Video Compression

MPEG is a generic compression standard that addresses video compression, audio compression, and the issue of audio-video synchronization for sequences (movie or VCR output). MPEG compresses the video and audio signals at about 1.5 Mbps per channel. The MPEG system addresses the issue of synchronization and multiplexing of multiple compressed audio and video bit-streams. The motivation for the MPEG standard comes from the need to handle full-motion video for storage and retrieval with random access. MPEG is designed to compress across multiple frames and therefore can yield compression ratios of 50:1 to 200:1, instead of the 20:1 or 25:1 of JPEG. Its algorithm is asymmetrical; that is, it requires more computational complexity (hardware) to compress than to decompress. This is useful for applications where the signal is produced at one source but is distributed to many, or compressed once and decompressed several times.

At this point, the reader should recall that JPEG deals with a single still image (picture), and its compression was an effort to reduce the spatial redundancy in the source. In the case of motion video, we are dealing with a stream of single still images (pictures) that are available from the source at a constant rate. A sequence of video frames inherently carries a lot of redundant information. Therefore, MPEG compression concentrates on reducing the temporal redundancy across the frames in addition to the spatial redundancy within a frame. The MPEG standard incorporates a special concept called "group of pictures" (GOP) to deal with temporal redundancy and to allow random access to stored MPEG-coded video.

There are three types of pictures that are considered in MPEG; viz., intra pictures (I-Frames), predicted pictures (P-Frames), and bi-directionally interpolated pictures (B-Frames). The I-Frame is the first frame of each GOP. The intra pictures (I-Frames) provide reference points for random access and are subjected to moderate compression. From the compression point of view, I-pictures are equivalent to still images and can be DCT encoded using a JPEG-like algorithm. In P- and B-pictures, the motion-compensated prediction errors are DCT coded. Only forward prediction is used in the P-pictures, which are always encoded relative to the preceding I- or P-pictures. The prediction of the B-pictures can be forward, backward, or bidirectional relative to other I- or P-pictures. In addition to these three, there is a fourth type of picture called D-pictures, containing only the DC component of each block. They are mainly useful for browsing at very low bit-rates. The number of I, P, and B frames in a GOP is application dependent. For example, it depends on the access time requirement and the bit-rate requirement.

Example: Fig. 3.19 shows the relationship between the different pictures and their relative roles. The example introduces an intra-coded picture every 8 frames, and the sequence of I, B, P frames is IBBBPBBBI. Note that the first frame is an I-frame. In MPEG, the order in which the pictures are processed is not necessarily the same as their time-sequential order. The pictures in the example can be coded as 1,5,2,3,4,9,6,7,8 or 1,2,3,5,4,9,6,7,8, since the prediction for P- and B-pictures should be based on pictures that are already transmitted.
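The reordering in the example can be reproduced mechanically: every reference picture (I or P) must be emitted before the B-pictures that depend on it. A small sketch (the helper function is our own illustration, not part of the standard):

```python
def coding_order(types):
    """Reorder display-order frame types ('I'/'P'/'B', frames numbered
    from 1) so that each B-frame follows the reference frame it needs."""
    order, pending = [], []
    for i, t in enumerate(types, start=1):
        if t == 'B':
            pending.append(i)      # hold B-frames until the next reference
        else:
            order.append(i)        # emit the reference frame first...
            order.extend(pending)  # ...then the B-frames it anchors
            pending = []
    return order + pending

# Display order 1..9 for the pattern IBBBPBBBI from the example:
print(coding_order("IBBBPBBBI"))   # [1, 5, 2, 3, 4, 9, 6, 7, 8]
```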

Predicted pictures are coded with reference to a past picture (intra (I) or predicted (P)) and can, in general, be used as a reference for future predicted pictures. Interpolated pictures, or B-pictures, achieve the highest amount of compression and require both a past and a future reference frame. In addition, bi-directional pictures (B-pictures) are never used as a reference. In all cases, when a picture is coded with respect to a reference, motion compensation (a technique used to compress video) is used to improve the coding efficiency. Fig. 3.20 expands on the example and shows how the frames take part in inter-frame coding, along with the relative sizes of the compressed frames. As in the example, an intra-coded picture is introduced every 8 frames and the sequence of I, B, P frames is IBBBPBBBI. Motion compensation is the basic philosophy behind the coding of B and P frames.


Fig. 3.19. Group of Frames (frames 1 through 9 with the pattern I BBB P BBB, followed by the I-frame of the next group)

3.10.3 Motion Compensation

There are two techniques that are employed in motion compensation: prediction and interpolation. Motion-compensated prediction assumes that the current picture can be modeled as a transformation of the picture at some previous time. This means that the signal amplitude and the direction of displacement need not be the same everywhere in the picture. Motion-compensated interpolation is a multi-resolution technique: a sub-signal with low temporal resolution (typically 1/2 to 1/3 the frame rate) is coded, and the full-resolution signal is obtained by interpolation of the low-resolution signal and addition of a correction term. The signal to be reconstructed by interpolation is obtained by adding a correction term to a combination of a past and a future reference [3,5,7].


3.10.4 Motion Representation

MPEG uses a macroblock consisting of 16 × 16 pixels. There is always a trade-off between the coding gain provided by the motion information and the cost associated with coding the motion information. The choice of 16 × 16 blocks for the motion compensation unit is based on such a trade-off. In a bi-directionally coded picture, each 16 × 16 macroblock can be of type Intra, Forward-Predicted, Backward-Predicted, or Average.

3.10.5 Motion Estimation

The extraction of motion information from a video sequence is called motion estimation. If we use the block matching technique, the motion vector is obtained by minimizing a cost function measuring the mismatch between a block and each predictor candidate. The search range V of the possible motion vectors and the selection of the cost function are left to the implementation. Exhaustive searches, where all the possible motion vectors are considered, are known to give good results, but at the expense of a very large computational complexity for large ranges.

Fig. 3.20. MPEG Coding for Full-Motion Video (sixteen full-motion video frames at 1/30th-of-a-second intervals, grouped as I BBB P BBB I, with the coded size per frame shown for the group)

Sequence Layer              Random Access Unit: Context
Group of Pictures Layer     Random Access Unit: Video Coding
Picture Layer               Primary Coding Unit
Slice Layer                 Re-synchronization Unit
Macroblock Layer            Motion Compensation Unit
Block Layer                 DCT Unit

Table 3.3. Six Layers of the MPEG Video
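As a sketch of the exhaustive block-matching search just described, the following uses the sum of absolute differences as the cost function; the block size, search range, and cost function here are illustrative choices, since the standard leaves them to the implementation:

```python
def best_motion_vector(prev, cur, bx, by, n, search):
    """Exhaustively search `prev` for the n-by-n block of `cur` whose
    top-left corner is (bx, by), over displacements in [-search, search],
    minimizing the sum of absolute differences (SAD)."""
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= by + dy <= h - n and 0 <= bx + dx <= w - n):
                continue          # candidate block falls outside the picture
            sad = sum(abs(cur[by + y][bx + x] - prev[by + dy + y][bx + dx + x])
                      for y in range(n) for x in range(n))
            if best is None or sad < best[0]:
                best = (sad, dx, dy)
    return best[1], best[2]       # the motion vector (dx, dy)

prev = [[9, 1, 2, 0],
        [3, 4, 5, 0],
        [6, 7, 8, 0],
        [0, 0, 0, 0]]
# The same content shifted right and down by one pixel:
cur = [[0, 0, 0, 0],
       [0, 9, 1, 2],
       [0, 3, 4, 5],
       [0, 6, 7, 8]]
print(best_motion_vector(prev, cur, 1, 1, 2, 2))   # (-1, -1)
```

The vector (-1, -1) says the best match for the current block lies one pixel up and to the left in the reference picture, which is exactly the shift applied above.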

3.10.6 Layered Structure, Syntax, and Bit-stream

The MPEG syntax allows for the provision of many application-specific features without penalizing other applications. For example, random access and easy accessibility require many access points, implying groups of pictures of short duration (e.g., 6 pictures, 1/5 second) coded with a fixed amount of bits to make editability possible. Similarly, in order to provide robustness in the case of broadcast over a noisy channel, the predictors are frequently reset and each intra and predicted picture is segmented into many slices. The composition of an MPEG video bit-stream contains six layers, as given in Table 3.3.

Each layer supports a definite function: either a signal processing function (DCT, motion compensation) or a logical function (re-synchronization, random access point). The MPEG syntax defines an MPEG bit-stream as any sequence of binary digits.

3.10.7 MPEG System

The MPEG standard is a three-part standard; viz., video coding, audio coding, and system coding. The MPEG system coding part specifies the system multiplexer and encoder that produce an MPEG stream from the encoded video and audio streams, with a system reference clock as the base. That is, the MPEG stream is the synchronization of elementary streams and carries enough information in the data fields to do the following tasks:

• Parsing the multiplexed stream after a random access.

• Managing coded information buffers in the decoders.

• Identifying the absolute time of the coded information.

The most important task of the system coding process is the actual multiplexing. It includes the coordination of input data streams and output data streams, the adjustment of clocks, and buffer management. The system's semantic rules impose some requirements on the decoders and allow freedom in the implementation of the encoding process. Fig. 3.21 shows the schematic of such an encoder at the functional level. The video encoder receives the uncoded digitized pictures in units called Video Presentation Units (VPUs) at discrete time intervals; similarly, at discrete time intervals, the audio encoder receives uncoded digitized blocks of audio samples in units called Audio Presentation Units (APUs). The video and audio encoders encode the video and audio, producing coded units called Video Access Units (VAUs) and Audio Access Units (AAUs). These outputs are referred to as elementary streams. The system encoder and multiplexer produces a multiplexed stream that contains the elementary streams and some additional information to help in handling synchronization and timing. MPEG supports audio from 32 Kbps to 384 Kbps in a single channel or in two stereo channels.

Timing and Synchronization: The audio and video sample rates at the encoder are significantly different from one another, and may or may not have an exact and fixed relationship to one another. Also, the duration of a block of audio samples (or APUs) is generally not the same as the duration of a video picture. There is a single, common system clock in the encoder, and this clock is used to create time stamps that indicate the correct presentation and decoding timing of audio and video, as well as to create time stamps that indicate the instantaneous values of the system clock itself at sampled intervals. The time stamps that indicate the presentation time of audio and video are called Presentation Time Stamps (PTS); those that indicate the decoding time are called Decoding Time Stamps (DTS); and those that indicate the value of the system clock are called the System Clock Reference (SCR). It is the presence of the common system clock in the encoder, the time stamps that are created from it, the recreation of the clock in the decoder, and the correct use of the time stamps that provides the facility to synchronize properly the operation of the decoder.

Fig. 3.21. MPEG Data Stream Generation Model (the audio and video sources feed their respective encoders; the system multiplexer and encoder produce the MPEG stream, which is transmitted through a network; at the receiver, the system decoder and demultiplexer feed the audio and video decoders, and the clock is derived from the MPEG stream)
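A toy illustration of the relationship between PTS and DTS: MPEG system time stamps are expressed in units of a 90 KHz system clock, but the one-frame-period decode offset below is purely an assumed example value:

```python
CLOCK_HZ = 90_000   # resolution of the MPEG system clock

def to_ticks(seconds):
    """Convert a time in seconds into system-clock ticks."""
    return round(seconds * CLOCK_HZ)

# A picture to be presented at t = 0.5 s whose decoding must finish one
# frame period (1/30 s) earlier (an illustrative offset, not a rule):
pts = to_ticks(0.5)
dts = to_ticks(0.5 - 1 / 30)
print(pts, dts)   # 45000 42000
```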

The MPEG system embodies a timing model in which all digitized pictures and audio samples that enter the encoder are presented exactly once each, after a constant end-to-end delay, at the output of the decoder. As such, the sample rates, i.e., the video picture rate and the audio sample rate, are precisely the same at the decoder as they are at the encoder. Constant delay, as envisaged in the MPEG system, is required for correct synchronization; however, some deviations are possible due to the variable delays at the encoding buffer, during the transmission through the network, and at the decoding buffer.

Fig. 3.22. MPEG Bit-stream Layering (a stream consists of zero or more packs followed by a 32-bit end code; each pack carries a 32-bit pack start code, the System Clock Reference (SCR, 40 bits), the maximum rate (24 bits), an optional system header, and zero or more packets; each packet carries a 32-bit packet start code, a 16-bit packet length, and optional fields followed by the packet information)

To achieve synchronization in multimedia systems that decode multiple video and audio signals originating from a storage or transmission medium, there must be a `time master' in the decoding system. MPEG does not specify which entity is the time master. The time master can be any of the decoders, the information stream source, or an external time base. All other entities in the system (decoders and information sources) must slave their timing to the time master. For example, if a decoder is taken as the time master, the time when it presents a presentation unit is considered to be the correct time for the other entities. If the information stream source is the time master, the SCR values indicate the correct time at the moment these values are received. The decoders then use this information to pace their decoding and presentation timing.

MPEG Stream: The most important task of the MPEG stream generation process is multiplexing. It includes the coordination of input data streams and output data streams, the adjustment of clocks, and buffer management. The MPEG stream syntax consists of three coding layers: stream layer, pack layer, and packet layer, as shown schematically in Fig. 3.22. Of these three layers, it is the packet layer that actually carries the information from the elementary streams; each packet carries information from exactly one elementary stream. Zero or more packets are grouped together to form a pack, which carries the SCR value. The decoder gets the information necessary for its resource reservation from this multiplexed stream. The maximal data rate is included in the first pack at the beginning of each MPEG stream. Though MPEG does not specify directly the method of multiplexing the elementary streams, such as VAUs and AAUs, MPEG does lay down some constraints that must be followed by an encoder and multiplexer in order to produce a valid MPEG data stream. For example, the individual stream buffers must not overflow or underflow. That is, the sizes of the individual stream buffers impose limits on the behavior of the multiplexer. Decoding the MPEG data stream starting at the beginning of the stream is straightforward, since there are no ambiguous bit patterns. But starting the decoding operation at random points requires locating pack or packet start codes within the data stream.
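Locating a pack at a random point thus amounts to scanning for a byte-aligned start code. A minimal sketch, assuming the MPEG-1 pack start code value 0x000001BA (the 24-bit prefix 0x000001 followed by the pack identifier byte):

```python
PACK_START = bytes([0x00, 0x00, 0x01, 0xBA])   # 32-bit pack start code

def find_pack(stream: bytes, start: int = 0) -> int:
    """Return the offset of the next byte-aligned pack start code at or
    after `start`, or -1 if none is found."""
    return stream.find(PACK_START, start)

data = b"garbage" + PACK_START + b"...pack header and packets..."
print(find_pack(data))   # 7: the decoder can resynchronize here
```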

MPEG Decoding Process: The MPEG standard defines the decoding process, not the decoder. There are many ways to implement a decoder, and the standard does not recommend a particular way. The decoder structure given in Fig. 3.23 is a typical one, with a buffer at the input of the decoder. The bit-stream is demultiplexed into overhead information (such as motion information, quantizer step-size, and macroblock type) and quantized DCT coefficients. The quantized DCT coefficients are dequantized and input to the Inverse Discrete Cosine Transform (IDCT). The reconstructed waveform from the IDCT is added to the result of the prediction. Because of the particular nature of bidirectional prediction, two reference pictures are used to form the predictor.
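The decoder path just described (dequantize, inverse transform, add the prediction) can be sketched for one 8 × 8 block as follows. This is a simplified illustration: the uniform step-size dequantization below ignores the quantization matrices and rounding rules of the actual standard:

```python
import math

N = 8

def idct2(F):
    """Naive 8x8 2-D inverse DCT: the IDCT box in Fig. 3.23."""
    def c(u):
        return 1 / math.sqrt(2) if u == 0 else 1.0
    return [[sum(c(u) * c(v) * F[u][v]
                 * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                 * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                 for u in range(N) for v in range(N)) / 4
             for y in range(N)] for x in range(N)]

def reconstruct(quantized, stepsize, prediction):
    """Dequantize the coefficients, inverse-transform them, and add the
    motion-compensated prediction, as in the decoder path of the text."""
    dequantized = [[q * stepsize for q in row] for row in quantized]
    residual = idct2(dequantized)
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]

# A DC-only block: a quantized DC value of 8 at step-size 2 on a zero
# prediction reconstructs a flat block of value 16/8 = 2.
coeffs = [[0] * N for _ in range(N)]
coeffs[0][0] = 8
flat = reconstruct(coeffs, 2, [[0.0] * N for _ in range(N)])
print(round(flat[0][0], 6))   # 2.0
```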

3.10.8 MPEG-2, MPEG-3, MPEG-4

The MPEG standard is an optimal solution for a target data rate of about 1.5 Mbps. The optimality is with respect to perceptual quality and not necessarily performance. To address the high-quality video requirement, with a target rate of about 40 Mbps, the MPEG-2 video standard has been developed. MPEG-2 builds upon the MPEG standard (also referred to as the MPEG-1 standard) and supports interlaced video formats and HDTV. MPEG-3 was originally started for HDTV and, after the development of MPEG-2, which addresses this issue, MPEG-3 was dropped. MPEG-4 looks at the other end of the spectrum: low bit-rate coding of audiovisual programs. It is expected to be used in mobile telephony systems.

Fig. 3.23. MPEG Decoding Process (the buffered bit-stream is demultiplexed into the quantizer step-size, macroblock type, motion vectors, and quantized coefficients; the coefficients pass through the inverse quantizer and the IDCT, and the result is added to the prediction formed from the two reference pictures using the motion vectors)

3.11 H.261 Standard

CCITT Recommendation H.261 is a video compression standard developed to facilitate video conferencing and videophone services over the Integrated Services Digital Network (ISDN) at p × 64 Kbps, p = 1, ..., 30 [6,13]. For example, when p = 1, one 64-Kbps channel may be appropriate for a low-quality videophone service, where the video signal can be transmitted at a rate of 48 Kbps and the remaining 16 Kbps is used for the audio signal. Video conferencing services generally require higher image quality, which can be achieved with p ≥ 6, that is, 384 Kbps or higher. Note that the maximum available bit-rate over an ISDN channel is 1.92 Mbps (p = 30), which is sufficient to obtain VHS-quality images.

H.261 has two important features:

• It specifies a maximum coding delay of 150 milliseconds, because it is mainly intended for bidirectional video communication. It has been estimated that delays exceeding 150 milliseconds do not give the viewer the impression of direct visual feedback.

• It is amenable to low-cost VLSI implementation, which is important for widespread commercialization.

To permit a single recommendation for use in and between regions using 625- and 525-line TV standards, the H.261 input picture is specified in the Common Intermediate Format (CIF). For lower bit-rates, a smaller format, QCIF, which is one quarter of the CIF, has been adopted. The numbers of active pels per line for luminance and chrominance are 360 and 180 (352 and 176 visible) for CIF, and 180 and 90 (176 and 88 visible) for QCIF, respectively. The numbers of active lines per picture for luminance and chrominance are 288 and 144 for CIF, and 144 and 72 for QCIF, respectively. The temporal rates are adjustable to 30, 15, 10, or 7.5 frames per second for both CIF and QCIF. If we consider 30 frames per second, the raw data rates are 37.3 Mbps for CIF and 9.35 Mbps for QCIF, respectively. It may be noted that even with QCIF images at 10 frames per second, 48:1 compression is required for videophone services over a 64 Kbps channel. CIF images may be used when p ≥ 6, that is, for video conferencing applications. The method for converting from CIF to QCIF, or vice versa, is not specified by the recommendation.
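The raw-rate figures quoted above can be reproduced from the active pel counts (luminance plus two chrominance components at half the luminance resolution in each dimension):

```python
def raw_mbps(luma_w, luma_h, fps, bits=8):
    """Raw bit-rate in Mbps: luminance plus two chrominance components,
    each at half the luminance resolution in both dimensions."""
    samples = luma_w * luma_h + 2 * (luma_w // 2) * (luma_h // 2)
    return samples * bits * fps / 1e6

cif = raw_mbps(360, 288, 30)                     # about 37.3 Mbps for CIF
qcif = raw_mbps(180, 144, 30)                    # about 9.3 Mbps for QCIF
ratio = raw_mbps(180, 144, 10) * 1e6 / 64_000    # QCIF at 10 fps over 64 Kbps
print(round(cif, 1), round(qcif, 1), round(ratio, 1))   # 37.3 9.3 48.6
```

The last figure, about 48.6, is the source of the "48:1 compression" requirement for videophone service over a single 64 Kbps channel.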

3.12 H.263

H.263 is a low bit-rate video standard for teleconferencing applications that has both MPEG-1 and MPEG-2 features [6,7]. It operates at 64 Kbps. The video coding of H.263 is based on that of H.261. In fact, H.263 is an extension of H.261 and describes a hybrid DPCM/DCT video coding method. Both H.261 and H.263 use techniques such as DCT, motion compensation, variable-length coding, and scalar quantization. Both use the concept of a macroblock structure.

The idea of the PB frame in H.263 is interesting. A PB frame consists of two pictures being coded as one unit. The name PB is derived from the MPEG terminology of P-pictures and B-pictures. Thus, a PB frame consists of one P-picture that is predicted from the last decoded P-picture, and one B-picture that is predicted from both the last decoded P-picture and the P-picture currently being decoded. This last picture is called a B-picture because parts of it may be bi-directionally predicted from the past and future P-pictures.

It has been shown that the H.263 system typically outperforms H.261 by 2.5 to 1. That means that, for a given picture quality, the H.261 bit-rate is approximately 2.5 times that of the H.263 codec.

3.13 Summary

In this chapter, we described the image compression system and explained some of the key coding and compression techniques. The concepts relating to coding, compression, and the transformation of digital samples of analog signals to their frequency-domain equivalents were discussed at length. The two popular coding standards, JPEG and MPEG, were explained in some detail: JPEG is aimed at still pictures and MPEG is aimed at motion pictures. Other standards, such as H.261, H.263, and JBIG, were discussed to enable the reader to understand the purpose and positioning of the different standards. It may be pointed out that this chapter is one of the key chapters in the entire book, as it explains the nature of the one dominant medium in all multimedia systems, viz., video. Further information about multimedia systems can be obtained from [10, 11, 12].

REFERENCES

1. C. Shannon. A Mathematical Theory of Communication, Bell System Technical Journal, Vol. 27, pp. 379-423, July 1948; and pp. 623-656, October 1948.

2. John G. Proakis. Digital Communications, 3rd Edition, McGraw-Hill Inc., 1995.

3. D. Pan. An Overview of the MPEG Audio Compression Algorithm, in SPIE Digital Video Compression on Personal Computers: Algorithms and Technologies, Vol. 2187, pp. 260-273, 1994.

4. Gregory K. Wallace. The JPEG Still Picture Compression Standard, Communications of the ACM, Vol. 34, No. 4, pp. 31-44, April 1991.

5. Didier Le Gall. MPEG: A Video Compression Standard for Multimedia Applications, Communications of the ACM, Vol. 34, No. 4, pp. 46-58, April 1991.

6. Richard Schaphorst. Videoconferencing and Video-telephony: Technology and Standards, Artech House Inc., 1996.

7. Joan L. Mitchell, William B. Pennebaker, Chad E. Fogg, and Didier J. LeGall (Eds.). MPEG Video Compression Standard, Chapman & Hall, 1997.

8. www..org

9. ISO/IEC JTC1/SC2/WG9. Progressive Bi-level Image Compression, Revision 4.1, CD 11544, September 16, 1991.

10. W.B. Pennebaker, J.L. Mitchell, G.G. Langdon, and R.B. Arps. An Overview of the Basic Principles of the Q-Coder Adaptive Binary Arithmetic Coder, IBM Journal of Research and Development, Vol. 32, No. 6, November 1988, pp. 717-726.

11. Ralf Steinmetz and Klara Nahrstedt. Multimedia: Computing, Communications and Applications, Prentice-Hall, Inc., 1995.

12. Francois Fluckiger. Understanding Networked Multimedia: Applications and Technology, Prentice-Hall, Inc., 1995.

13. A. Murat Tekalp. Digital Video Processing, Prentice-Hall, Inc., 1995.

14. William B. Pennebaker and Joan L. Mitchell. JPEG: Still Image Data Compression Standard, Van Nostrand Reinhold, 1992.


Exercises

1. Find the digitized values of the samples of the signal given in Fig. 3.24. Each sample is 8 bits.

Fig. 3.24. Signal for Exercise 1 in Chapter 3 (an analog waveform with samples S1 through S6 at amplitude levels between 0.75 and 2.0)

2. Calculate the Fourier Transform of the digital samples from the previous exercise.

3. An 8×8 pixel formation has elements with values that are the inverse of (r + s)^2, where r and s are the row and column numbers. Compute the value of the pixel matrix.

4. Find the FDCT of the pixel matrix from the previous exercise.

5. The weights to be associated with each FDCT coefficient from the previous exercise are inversely proportional to the position value. Encode the values as per this weightage, keeping the order of the coefficients the same as the order in the pixel matrix (left-to-right and top-to-bottom). Use run-length coding for coding the resulting string.

6. Repeat the previous exercise, using the zig-zag pattern. Use run-length coding for coding the resulting string.

7. Scan your picture after disabling compression in the scanner. Now compress the picture using the JPEG standard. Scan the picture again with compression enabled in the scanner. Compare.

8. Obtain an MPEG trace from the Internet. Find out the order in which I, P, and B frames appear in that trace.

9. Find out the sizes of the I, P, and B frames from any MPEG trace. Calculate the maximum and minimum sizes for all three types of frames.