
To appear in Proceedings of IEEE Conference


Optimal Multiple Description Transform Coding of Gaussian Vectors

Jelena Kovačević                          Vivek K. Goyal
Bell Laboratories                         Dept. of Elec. Eng. & Comp. Sci.
Murray Hill, NJ                           University of California, Berkeley
jelena@bell-labs.com                      vgoyal@ieee.org

Abstract

Multiple description coding (MDC) is source coding for multiple channels such that a decoder which receives an arbitrary subset of the channels may produce a useful reconstruction. Orchard et al. proposed a transform coding method for MDC of pairs of independent Gaussian random variables. This paper provides a general framework which extends multiple description transform coding (MDTC) to any number of variables and expands the set of transforms which are considered. Analysis of the general case is provided, which can be used to numerically design optimal MDTC systems. The case of two variables sent over two channels is analytically optimized in the most general setting, where channel failures need not have equal probability or be independent. It is shown that when channel failures are equally probable and independent, the transforms used by Orchard et al. are in the optimal set, but many other choices are possible. A cascade structure is presented which facilitates low-complexity design, coding, and decoding for a system with a large number of variables.

Introduction

For decades after the inception of information theory, techniques for source and channel coding developed separately. This was motivated both by Shannon's famous separation principle and by the conceptual simplicity of considering only one or the other. Recently, the limitations of separate source and channel coding have led many researchers to the problem of designing joint source-channel (JSC) codes. An examination of Shannon's result reveals the primary motivating factor for constructing joint source-channel codes: the separation theorem is an asymptotic result which requires infinite block lengths, and hence infinite complexity and delay, at both the source coder and the channel coder; for a particular finite complexity or delay, one can often do better with a JSC code. JSC codes have also drawn interest for being robust to channel variation.

Multiple description transform coding is a technique which can be considered a JSC code for erasure channels. The basic idea is to introduce correlation between transmitted coefficients in a known, controlled manner so that erased coefficients can be statistically estimated from received coefficients. This correlation is used at the decoder at the coefficient level, as opposed to the bit level, so it is fundamentally different from schemes that use information about the transmitted data to produce likelihood information for the channel decoder; the latter is a common element of JSC coding systems.

Our general model for multiple description coding is as follows. A source sequence $\{x_k\}$ is input to a coder which outputs $m$ streams at rates $R_1, R_2, \ldots, R_m$. These streams are sent on $m$ separate channels. There are many receivers, and each receives a subset of the channels and uses a decoding algorithm based on which channels it receives. Specifically, there are $2^m - 1$ receivers, one for each distinct subset of streams except for the empty set, and each experiences some distortion. This is equivalent to communicating with a single receiver when each channel may be working or broken, and the status of each channel is known to the decoder but not to the encoder.

This is a reasonable model for a lossy packet network (for example, the Internet when UDP is used, as opposed to TCP). Each channel corresponds to a packet or set of packets. Some packets may be lost, but because of header information it is known which packets are lost. An appropriate objective is to minimize a weighted sum of the distortions subject to a constraint on the total rate.

When $m = 2$, the situation is that studied in information theory as the multiple description problem. Denote the distortions when both channels are received, when only channel 1 is received, and when only channel 2 is received by $D_0$, $D_1$, and $D_2$, respectively. The classical problem is to determine the achievable $(R_1, R_2, D_0, D_1, D_2)$-tuples. A complete characterization is known only for an i.i.d. Gaussian source and squared-error distortion.

This paper considers the case where $\{x_k\}$ is an i.i.d. sequence of zero-mean, jointly Gaussian vectors with a known correlation matrix $R_x = E[x_k x_k^T]$. Distortion is measured by the mean-squared error (MSE). The technique we develop is based on square linear transforms and simple scalar quantization, and the design of the transform is paramount. Rather dissimilar methods have been developed which use nonsquare transforms; the problem could also be addressed with an emphasis on quantizer design.

Proposed Coding Structure

Since the source is jointly Gaussian, we can assume without loss of generality that the components are independent; if not, one can use a Karhunen-Loève transform of the source at the encoder and its inverse at each decoder. We propose the following steps for multiple description transform coding (MDTC) of a source vector $x$. (The vectors can be obtained by blocking a scalar Gaussian source.)

1. $x$ is quantized with a uniform scalar quantizer with stepsize $\Delta$: $x_{q_i} = [x_i]_\Delta$, where $[\cdot]_\Delta$ denotes rounding to the nearest multiple of $\Delta$.

2. The vector $x_q = [x_{q_1}, x_{q_2}, \ldots, x_{q_n}]^T$ is transformed with an invertible, discrete transform $\hat{T}: \Delta\mathbb{Z}^n \to \Delta\mathbb{Z}^n$, $y = \hat{T}(x_q)$. The design and implementation of $\hat{T}$ are described below.

3. The components of $y$ are independently entropy coded.

4. If $m < n$, the components of $y$ are grouped to be sent over the $m$ channels.

When all the components of $y$ are received, the reconstruction process is to exactly invert the transform $\hat{T}$ to get $\hat{x} = x_q$. The distortion is then precisely the quantization error from Step 1. If some components of $y$ are lost, they are estimated from the received components using the statistical correlation introduced by the transform $\hat{T}$. The estimate $\hat{x}$ is then generated by inverting the transform as before.

Starting with a linear transform $T$ with determinant one, the first step in deriving a discrete version $\hat{T}$ is to factor $T$ into "lifting" steps, i.e., into a product of upper and lower triangular matrices with unit diagonals, $T = T_1 T_2 \cdots T_k$. The discrete version of the transform is then given by

$$\hat{T}(x_q) = \left[T_1 \left[T_2 \cdots \left[T_k\, x_q\right]_\Delta \cdots \right]_\Delta\right]_\Delta.$$

The lifting structure ensures that the inverse of $\hat{T}$ can be implemented by reversing the calculations above:

$$\hat{T}^{-1}(y) = \left[T_k^{-1} \cdots \left[T_2^{-1}\left[T_1^{-1} y\right]_\Delta\right]_\Delta \cdots \right]_\Delta.$$

The factorization of $T$ into lifting steps is not unique; for example,

$$\begin{bmatrix} a & b \\ c & \frac{1+bc}{a} \end{bmatrix}
= \begin{bmatrix} 1 & \frac{a-1}{c} \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ c & 1 \end{bmatrix}
\begin{bmatrix} 1 & \frac{1+bc-a}{ac} \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ \frac{1+bc-a}{ab} & 1 \end{bmatrix}
\begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ \frac{a-1}{b} & 1 \end{bmatrix}.$$

Different factorizations yield different discrete transforms, except in the limit as $\Delta$ approaches zero.

The coding structure proposed here is a generalization of the method proposed by Orchard et al.; there, only transforms implemented in two lifting steps were considered. By fixing $a = 1$, both factorizations above reduce to having two nonidentity factors.

It is very important to note that we first quantize and then use a discrete transform. If we were to apply a continuous transform first and then quantize, the use of a nonorthogonal transform would lead to noncubic partition cells, which are inherently suboptimal among the class of partition cells obtainable with scalar quantization. The present configuration allows one to use discrete transforms derived from nonorthogonal linear transforms and thus obtain better performance.
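To make the quantize-then-transform structure concrete, here is a minimal sketch (Python/NumPy; the particular values of a, b, and c and the helper names are illustrative assumptions, not from the paper) of a 2 x 2 discrete transform built from the three lifting steps of the first factorization above, together with its exact inverse.

```python
import numpy as np

def round_to_lattice(v, delta):
    """[.]_Delta: round each entry to the nearest multiple of delta."""
    return delta * np.round(np.asarray(v, dtype=float) / delta)

def lifting_steps(a, b, c):
    """Factor T = [[a, b], [c, (1 + b*c)/a]] (det T = 1) into three lifting steps."""
    T1 = np.array([[1.0, (a - 1.0) / c], [0.0, 1.0]])
    T2 = np.array([[1.0, 0.0], [c, 1.0]])
    T3 = np.array([[1.0, (1.0 + b * c - a) / (a * c)], [0.0, 1.0]])
    T = np.array([[a, b], [c, (1.0 + b * c) / a]])
    assert np.allclose(T1 @ T2 @ T3, T)          # sanity check of the factorization
    return [T1, T2, T3]

def discrete_transform(x_q, steps, delta):
    """y = [T1 [T2 [T3 x_q]_D ]_D ]_D, applied right to left with rounding."""
    y = x_q
    for Ti in reversed(steps):
        y = round_to_lattice(Ti @ y, delta)
    return y

def discrete_inverse(y, steps, delta):
    """Invert by undoing the lifting steps in the opposite order."""
    x = y
    for Ti in steps:
        x = round_to_lattice(np.linalg.inv(Ti) @ x, delta)
    return x

if __name__ == "__main__":
    delta = 0.1
    steps = lifting_steps(a=1.2, b=0.8, c=-0.625)    # example parameters (assumption)
    x = np.random.randn(2)
    x_q = round_to_lattice(x, delta)                 # Step 1: uniform scalar quantization
    y = discrete_transform(x_q, steps, delta)        # Step 2: discrete correlating transform
    assert np.allclose(discrete_inverse(y, steps, delta), x_q)
```

Because each lifting step changes only one component while leaving the other on the lattice, the rounding performed after every step can be undone exactly, which is what makes the discrete transform invertible despite the quantization.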

Analysis of an MDTC System

The analysis and optimizations presented in this paper are based on fine-quantization approximations. Specifically, we make three assumptions which are valid for small $\Delta$. First, we assume that the scalar entropy of $y = \hat{T}(x_q)$ is the same as that of $Tx$. Second, we assume that the correlation structure of $y$ is unaffected by the quantization. Finally, when at least one component of $y$ is lost, we assume that the distortion is dominated by the effect of the erasure, so quantization can be ignored.

Denote the variances of the components of $x$ by $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, and denote the correlation matrix of $x$ by $R_x = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$. Let $R_y = T R_x T^T$. In the absence of quantization, $R_y$ would be exactly the correlation matrix of $y$. Under our fine-quantization approximations, we will use $R_y$ in the estimation of rates and distortions.

Estimating the rate is straightforward. Since the quantization is fine, $y_i$ is approximately the same as $(Tx)_i$, i.e., a uniformly quantized Gaussian random variable. If we treat $y_i$ as a Gaussian random variable with power $\sigma_{y_i}^2 = (R_y)_{ii}$, quantized with bin width $\Delta$, we get for the entropy of the quantized coefficient

$$H(y_i) \approx \tfrac{1}{2}\log 2\pi e \sigma_{y_i}^2 - \log\Delta = \tfrac{1}{2}\log\sigma_{y_i}^2 + \tfrac{1}{2}\log 2\pi e - \log\Delta = \tfrac{1}{2}\log\sigma_{y_i}^2 + k_\Delta,$$

where $k_\Delta = \tfrac{1}{2}\log 2\pi e - \log\Delta$ and all logarithms are base-two. Notice that $k_\Delta$ depends only on $\Delta$. We thus estimate the total rate as

$$R = \sum_{i=1}^{n} H(y_i) = n k_\Delta + \frac{1}{2}\log\prod_{i=1}^{n}\sigma_{y_i}^2.$$

The minimum rate occurs when $\prod_{i=1}^{n}\sigma_{y_i}^2 = \prod_{i=1}^{n}\sigma_i^2$, and at this rate the components of $y$ are uncorrelated. Interestingly, $T = I$ is not the only transform which achieves the minimum rate; in fact, an arbitrary split of the total rate among the different components of $y$ is possible. This is a justification for using a total rate constraint in the analyses that follow. However, we will pay particular attention to the case where the rates sent across each channel are equal.
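As a quick numerical illustration of this rate estimate (not from the paper; the transform, variances, and stepsize below are arbitrary assumptions), one can compare $n k_\Delta + \tfrac{1}{2}\log_2\prod_i \sigma_{y_i}^2$ with the empirical entropy of finely quantized transform coefficients.

```python
import numpy as np

def k_delta(delta):
    """k_Delta = (1/2) log2(2 pi e) - log2(delta)."""
    return 0.5 * np.log2(2 * np.pi * np.e) - np.log2(delta)

def rate_estimate(T, Rx, delta):
    """Estimated total rate: n*k_Delta + (1/2) log2 prod_i (R_y)_ii."""
    var_y = np.diag(T @ Rx @ T.T)
    return len(var_y) * k_delta(delta) + 0.5 * np.sum(np.log2(var_y))

def empirical_entropy(samples, delta):
    """Empirical entropy (bits) of samples quantized with stepsize delta."""
    _, counts = np.unique(np.round(samples / delta).astype(int), return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    delta = 0.05
    Rx = np.diag([1.0, 0.25])                     # component variances (assumption)
    T = np.array([[1.0, 0.5], [-0.5, 0.75]])      # an arbitrary transform with det T = 1
    x = rng.multivariate_normal([0.0, 0.0], Rx, size=200_000)
    y = x @ T.T
    emp = sum(empirical_entropy(y[:, i], delta) for i in range(2))
    print(f"estimated rate {rate_estimate(T, Rx, delta):.3f} bits/vector, empirical {emp:.3f}")
```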

We now turn to the distortion, and first consider the average distortion due only to quantization. Since the quantization noise is approximately uniform, this distortion is $\Delta^2/12$ for each component. Thus the distortion when no components are erased is given by

$$D_0 = \frac{n\Delta^2}{12},$$

and is independent of $T$.

Now consider the case when $\ell$ components are lost. We first must determine how the reconstruction should proceed. By renumbering the variables if necessary, assume that $y_1, y_2, \ldots, y_{n-\ell}$ are received and $y_{n-\ell+1}, \ldots, y_n$ are lost. Partition $y$ into received and not-received portions as $y = [y_r^T\ \ y_{nr}^T]^T$, where $y_r = [y_1, \ldots, y_{n-\ell}]^T$ and $y_{nr} = [y_{n-\ell+1}, \ldots, y_n]^T$. The minimum MSE estimate of $x$ given $y_r$ is $E[x \mid y_r]$, which has a simple closed form because $x$ is a jointly Gaussian vector. Using the linearity of the expectation operator gives the following sequence of calculations:

$$\hat{x} = E[x \mid y_r] = E[T^{-1}Tx \mid y_r] = T^{-1}E[Tx \mid y_r] = T^{-1}\begin{bmatrix} y_r \\ E[y_{nr} \mid y_r] \end{bmatrix}.$$

If the correlation matrix of $y$ is partitioned in a way compatible with the partition of $y$, as

$$R_y = T R_x T^T = \begin{bmatrix} R_1 & B \\ B^T & R_2 \end{bmatrix},$$

then it can be shown that $y_{nr} \mid y_r$ is Gaussian with mean $B^T R_1^{-1} y_r$ and correlation matrix $A = R_2 - B^T R_1^{-1} B$. Thus $E[y_{nr} \mid y_r] = B^T R_1^{-1} y_r$, and $\eta = y_{nr} - E[y_{nr} \mid y_r]$ is Gaussian with zero mean and correlation matrix $A$; $\eta$ is the error in predicting $y_{nr}$ from $y_r$, and hence is the error caused by the erasure. However, because we have used a nonorthogonal transform, we must return to the original coordinates using $T^{-1}$ in order to compute the distortion. Using $E[y_{nr} \mid y_r]$ in place of $y_{nr}$ in the reconstruction gives

$$x - \hat{x} = T^{-1}\begin{bmatrix} y_r \\ y_{nr} \end{bmatrix} - T^{-1}\begin{bmatrix} y_r \\ E[y_{nr} \mid y_r] \end{bmatrix} = T^{-1}\begin{bmatrix} 0 \\ \eta \end{bmatrix},
\qquad \text{so} \qquad
\| x - \hat{x} \|^2 = \eta^T U^T U \eta,$$

where $U$ is the last $\ell$ columns of $T^{-1}$. Finally,

$$E\|x - \hat{x}\|^2 = \sum_{i}\sum_{j}\left(U^T U\right)_{ij} A_{ij}.$$

We denote the distortion with $\ell$ erasures by $D_\ell$. To determine $D_\ell$, we must now average over the $\binom{n}{\ell}$ possible erasures of $\ell$ components, weighted by their probabilities if necessary.
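The conditional-Gaussian calculation above translates directly into a short numerical routine. The sketch below (illustrative; the function and variable names are ours) computes the extra distortion caused by an arbitrary erasure pattern and the corresponding reconstruction.

```python
import numpy as np

def erasure_distortion(T, Rx, lost):
    """Expected squared error added by erasing the components of y listed in `lost`."""
    n = Rx.shape[0]
    recv = [i for i in range(n) if i not in lost]
    Ry = T @ Rx @ T.T
    R1 = Ry[np.ix_(recv, recv)]
    R2 = Ry[np.ix_(lost, lost)]
    B = Ry[np.ix_(recv, lost)]
    A = R2 - B.T @ np.linalg.solve(R1, B)        # covariance of the prediction error of y_nr
    U = np.linalg.inv(T)[:, lost]                # columns of T^{-1} for the lost components
    return float(np.sum((U.T @ U) * A))

def mmse_reconstruction(T, Rx, y, lost):
    """x-hat from the received components of y (quantization ignored)."""
    n = Rx.shape[0]
    recv = [i for i in range(n) if i not in lost]
    Ry = T @ Rx @ T.T
    R1 = Ry[np.ix_(recv, recv)]
    B = Ry[np.ix_(recv, lost)]
    y_hat = np.array(y, dtype=float)
    y_hat[lost] = B.T @ np.linalg.solve(R1, y_hat[recv])   # E[y_nr | y_r]
    return np.linalg.inv(T) @ y_hat

if __name__ == "__main__":
    Rx = np.diag([1.0, 0.25])                    # example variances (assumption)
    T = np.array([[1.0, 0.5], [-0.5, 0.75]])     # det T = 1
    print("distortion if y2 lost:", erasure_distortion(T, Rx, lost=[1]))
    print("distortion if y1 lost:", erasure_distortion(T, Rx, lost=[0]))
    x = np.array([0.7, -0.3])
    print("x-hat from y1 alone :", mmse_reconstruction(T, Rx, T @ x, lost=[1]))
```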

Our final distortion criterion is a weighted sum of the distortions incurred with different numbers of channels available:

$$\bar{D} = \sum_{\ell=0}^{n} \alpha_\ell D_\ell.$$

For the case where each channel has an outage probability of $p$ and the channel outages are independent, the weighting $\alpha_\ell = \binom{n}{\ell} p^\ell (1-p)^{n-\ell}$ makes $\bar{D}$ the overall expected MSE. However, there are certainly other reasonable choices for the weights. Consider an image coding scenario in which an image is split over ten packets. One might want acceptable image quality as long as eight or more packets are received; in this case, one should place weight only on $D_0$, $D_1$, and $D_2$, i.e., set $\alpha_\ell = 0$ for $\ell > 2$.

For a given rate $R$, our goal is to minimize $\bar{D}$. The expressions given in this section can be used to numerically determine transforms to realize this goal. Analytical solutions are possible in certain special cases; some of these are given in the following section.

Sending Two Variables Over Two Channels

General case: Let us now apply the analysis of the previous section to find the best transforms for sending $n = 2$ variables over $m = 2$ channels. In the most general situation, channel outages may have unequal probabilities and may be dependent. Suppose the probabilities of the combinations of channel states are given by the following table, where $p_1$ and $p_2$ denote the probabilities that only channel 1 or only channel 2 is broken, respectively, and $p_{12}$ denotes the probability that both channels are broken:

                                Channel 1
                        broken                working
  Channel 2   broken    $p_{12}$              $p_2$
              working   $p_1$                 $1 - p_1 - p_2 - p_{12}$

Let $T = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, normalized so that $\det T = ad - bc = 1$. Then $T^{-1} = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$, and

$$R_y = T R_x T^T = \begin{bmatrix} a^2\sigma_1^2 + b^2\sigma_2^2 & ac\sigma_1^2 + bd\sigma_2^2 \\ ac\sigma_1^2 + bd\sigma_2^2 & c^2\sigma_1^2 + d^2\sigma_2^2 \end{bmatrix}.$$

By the rate estimate of the previous section, the total rate is given by

$$R = H(y_1) + H(y_2) = 2k_\Delta + \tfrac{1}{2}\log\left[\left(a^2\sigma_1^2 + b^2\sigma_2^2\right)\left(c^2\sigma_1^2 + d^2\sigma_2^2\right)\right].$$

Minimizing over transforms with determinant one gives a minimum possible rate of $R^* = 2k_\Delta + \tfrac{1}{2}\log\sigma_1^2\sigma_2^2$. We refer to $\rho = R - R^*$ as the redundancy, i.e., the price we pay in rate in order to potentially reduce the distortion when there are erasures.

In order to evaluate the overall average distortion, we must form a weighted average of the distortions for each of the four possible channel states. If both channels are working, the distortion, due to quantization only, is $D_0 = \Delta^2/6$. If neither channel is working, the distortion is $\sigma_1^2 + \sigma_2^2$. The remaining cases require the application of the results of the previous section. We first determine $D_{1,0}$, the MSE distortion when $y_1$ is received but $y_2$ is lost. Substituting into the erasure-distortion expression gives

$$D_{1,0} = \underbrace{\left(a^2 + b^2\right)}_{U^T U}\;\underbrace{\frac{\sigma_1^2\sigma_2^2}{a^2\sigma_1^2 + b^2\sigma_2^2}}_{A} = \frac{\left(a^2 + b^2\right)\sigma_1^2\sigma_2^2}{a^2\sigma_1^2 + b^2\sigma_2^2},$$

where we have used $\det T = ad - bc = 1$ in the simplification. Similarly, the distortion when $y_2$ is received but $y_1$ is lost is $D_{0,1} = \left(c^2 + d^2\right)\sigma_1^2\sigma_2^2/\left(c^2\sigma_1^2 + d^2\sigma_2^2\right)$. The overall average distortion is

$$\bar{D} = \left(1 - p_1 - p_2 - p_{12}\right) D_0 + p_2 D_{1,0} + p_1 D_{0,1} + p_{12}\left(\sigma_1^2 + \sigma_2^2\right)
= \left[\left(1 - p_1 - p_2 - p_{12}\right) D_0 + p_{12}\left(\sigma_1^2 + \sigma_2^2\right)\right] + \left(p_1 + p_2\right)\underbrace{\left[\frac{p_2}{p_1 + p_2} D_{1,0} + \frac{p_1}{p_1 + p_2} D_{0,1}\right]}_{\bar{D}_1},$$

where the first bracketed term is independent of $T$. Thus our optimization problem is to minimize $\bar{D}_1$ for a given redundancy $\rho$.
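The two-by-two expressions above are easy to evaluate numerically. In the sketch below (illustrative; the transform entries and probabilities are arbitrary assumptions apart from det T = 1), we compute the redundancy, the side distortions, and the weighted term.

```python
import numpy as np

def two_by_two_stats(a, b, c, d, s1, s2, p1, p2):
    """Redundancy and side distortions for T = [[a, b], [c, d]] with det T = 1.

    s1, s2 are the variances of x1, x2; p1 (p2) is the probability that only
    channel 1 (only channel 2) is broken.
    """
    assert abs(a * d - b * c - 1.0) < 1e-9
    Ry11 = a * a * s1 + b * b * s2
    Ry22 = c * c * s1 + d * d * s2
    rho = 0.5 * np.log2(Ry11 * Ry22 / (s1 * s2))          # redundancy (bits/vector)
    D10 = (a * a + b * b) * s1 * s2 / Ry11                # y1 received, y2 lost
    D01 = (c * c + d * d) * s1 * s2 / Ry22                # y2 received, y1 lost
    Dbar1 = (p2 * D10 + p1 * D01) / (p1 + p2)
    return rho, D10, D01, Dbar1

if __name__ == "__main__":
    s1, s2 = 1.0, 0.25            # sigma1^2 > sigma2^2 (example values)
    p1, p2 = 0.01, 0.01           # equal single-failure probabilities (assumption)
    a, b = 0.9, 0.7
    c = -1.0 / (2 * b)            # bc = -1/2, i.e., a member of the optimal set
    d = (1 + b * c) / a
    print(two_by_two_stats(a, b, c, d, s1, s2, p1, p2))
```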

If the source has a circularly symmetric probability density, i.e., $\sigma_1 = \sigma_2$, then $\bar{D}_1 = \sigma_1^2 = \sigma_2^2$, independent of $T$. Henceforth we assume $\sigma_1 > \sigma_2$.

After eliminating $d$ through $d = (1 + bc)/a$, one can show that the optimal transform will satisfy

$$|a| = \frac{\sigma_2}{\sigma_1\,|c|}\left[-bc\left(1 + bc\right)\frac{p_2 + \left(p_1 + p_2\right)bc}{p_1 + \left(p_1 + p_2\right)bc}\right]^{1/2}.$$

Furthermore, $\bar{D}_1$ depends only on the product $bc$, not on the individual values of $b$ and $c$. The optimal value of $bc$ can be given in closed form as a function of $p_1$, $p_2$, and the redundancy $\rho$; it is easy to check that $(bc)_{\mathrm{optimal}}$ ranges from $0$ to $-1$ as $p_1/p_2$ ranges from $0$ to $\infty$.

The limiting behavior can be explained as follows. Suppose $p_1 \ll p_2$, i.e., channel 1 is much more reliable than channel 2. Since $(bc)_{\mathrm{optimal}}$ approaches 0, $ad$ must approach 1, and hence one optimally sends $x_1$ (the larger-variance component) over channel 1 (the more reliable channel), and vice versa. This is the intuitive layered solution. The multiple description approach is most useful when the channel failure probabilities are comparable, but this demonstrates that the multiple description framework subsumes layered coding.

Equal channel failure probabilities: If $p_1 = p_2$, then $(bc)_{\mathrm{optimal}} = -1/2$, independent of $\rho$. The optimal set of transforms is described by

$$a \neq 0 \text{ (but otherwise arbitrary)}, \qquad c = -\frac{1}{2b}, \qquad b = \pm\frac{\sigma_1}{\sigma_2}\left(2^{\rho} - \sqrt{2^{2\rho} - 1}\right)a, \qquad d = \frac{1}{2a},$$

and using a transform from this set gives

$$\bar{D}_1 = \frac{1}{2}\left(\sigma_1^2 + \sigma_2^2\right) - \frac{1}{2}\left(\sigma_1^2 - \sigma_2^2\right)\sqrt{1 - 2^{-2\rho}}.$$

This relationship is plotted in Figure 1(a). Notice that, as expected, $\bar{D}_1$ starts at a maximum value of $(\sigma_1^2 + \sigma_2^2)/2$ and asymptotically approaches a minimum value of $\sigma_2^2$. By combining this relationship with the rate and quantization-distortion expressions, one can find the relationship between $R$, $D_0$, and $\bar{D}_1$. For various values of $R$, the tradeoff between $D_0$ and $\bar{D}_1$ is plotted in Figure 1(b).

The solution for the optimal set of transforms has an interesting property: after fixing $\rho$, there is an extra degree of freedom which does not affect the $\rho$-versus-$\bar{D}_1$ performance. This degree of freedom can be used to control the partitioning of the rate between channels or to give a simplified implementation.

Optimality of the Orchard et al. transforms: In that work it is suggested to use transforms of the form

$$\begin{bmatrix} 1 & b \\ -\frac{1}{2b} & \frac{1}{2} \end{bmatrix}.$$

As a result of our analysis, we conclude that these transforms in fact lie in the optimal set of transforms. The extra degree of freedom has been used by fixing $a = 1$, which yields a transform that can be factored into two lifting steps; in the general case, three lifting steps are needed.

Figure 1: Optimal rate-redundancy-distortion tradeoffs. (a) Relationship between the redundancy $\rho$ (bits/vector) and the side distortion $\bar{D}_1$ (MSE). (b) Relationship between $D_0$ and $\bar{D}_1$ (MSE, in dB) for various rates $R = 4, 5, 6, 7, 8$.

Optimal transforms that give balanced rates: The transforms just described do not give channels with equal rate or, equivalently, equal power. In practice this can be remedied

through time-multiplexing. An alternative is to use the extra degree of freedom to make $R_1 = R_2$. Doing this is equivalent to requiring $|a| = |c|$ and $|b| = |d|$, and yields

$$a = \pm\sqrt{\frac{\sigma_2}{2\sigma_1}\left(2^{\rho} + \sqrt{2^{2\rho} - 1}\right)}, \qquad b = \frac{1}{2a}.$$

In the next section, when we apply a two-by-two correlating transform, we will assume a balanced-rate transform; specifically, we will use

$$T = \begin{bmatrix} a & \frac{1}{2a} \\ -a & \frac{1}{2a} \end{bmatrix}.$$
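The sketch below builds this balanced-rate transform for a target redundancy and checks numerically that det T = 1, bc = -1/2, the achieved redundancy equals the target, and the resulting side distortion agrees with the closed-form relation given above. It is a consistency check of the formulas as stated here, not code from the paper; the numerical values are assumptions.

```python
import numpy as np

def balanced_transform(rho, sigma1, sigma2):
    """Balanced-rate member of the optimal set (equal channel failure probabilities)."""
    a = np.sqrt(sigma2 / (2 * sigma1) * (2**rho + np.sqrt(4**rho - 1)))
    return np.array([[a, 1 / (2 * a)], [-a, 1 / (2 * a)]])

def redundancy(T, s1, s2):
    Ry = T @ np.diag([s1, s2]) @ T.T
    return 0.5 * np.log2(Ry[0, 0] * Ry[1, 1] / (s1 * s2))

def side_distortion(T, s1, s2):
    """Average of the two single-erasure distortions, i.e. D-bar-1 for p1 = p2."""
    (a, b), (c, d) = T
    D10 = (a * a + b * b) * s1 * s2 / (a * a * s1 + b * b * s2)
    D01 = (c * c + d * d) * s1 * s2 / (c * c * s1 + d * d * s2)
    return 0.5 * (D10 + D01)

if __name__ == "__main__":
    sigma1, sigma2, rho = 1.0, 0.5, 1.5          # example values (assumptions)
    s1, s2 = sigma1**2, sigma2**2
    T = balanced_transform(rho, sigma1, sigma2)
    closed_form = 0.5 * (s1 + s2) - 0.5 * (s1 - s2) * np.sqrt(1 - 4.0**(-rho))
    assert abs(np.linalg.det(T) - 1) < 1e-9
    assert abs(T[0, 1] * T[1, 0] + 0.5) < 1e-9            # bc = -1/2
    assert abs(redundancy(T, s1, s2) - rho) < 1e-9
    assert abs(side_distortion(T, s1, s2) - closed_form) < 1e-9
    print("D1 at rho =", rho, "is", closed_form)
```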

Geometric interpretation: The transmitted representation of $x$ is given by $y_1 = \langle x, \varphi_1 \rangle$ and $y_2 = \langle x, \varphi_2 \rangle$, where $\varphi_1 = [a\ \ b]^T$ and $\varphi_2 = [c\ \ d]^T$. In order to gain some insight into the vectors $\varphi_1$ and $\varphi_2$ that result in an optimal transform, let us neglect the rate and distortion that are achieved and simply consider the transforms described by $ad = 1/2$ and $bc = -1/2$. We can show that $\varphi_1$ and $\varphi_2$ form the same absolute angles with the positive $x_1$-axis (see Figure 2(a)). For convenience, suppose $a > 0$ and $b < 0$; then $c > 0$ and $d > 0$. Let $\theta_1$ and $\theta_2$ be the angles by which $\varphi_1$ and $\varphi_2$ lie below and above the positive $x_1$-axis, respectively. Then $\tan\theta_1 = -b/a = d/c = \tan\theta_2$. If we assume $\sigma_1 \geq \sigma_2$, the maximum angle for $\theta_1 = \theta_2$ is $\arctan(\sigma_1/\sigma_2)$ and the minimum angle is zero. This has the nice interpretation of emphasizing $x_1$ over $x_2$ (because it has the higher variance) as the redundancy is increased (see Figure 2(b)).

Three or More Variables

Figure 2: Geometric interpretations. (a) With $ad = 1/2$ and $bc = -1/2$, the analysis vectors $\varphi_1$ and $\varphi_2$ make equal angles $\theta_1 = \theta_2$ with the positive $x_1$-axis. (b) If, in addition to the optimality condition, we require the output streams to have equal rate, the analysis vectors are symmetrically situated to capture the dimension with greatest variation: at $\rho = 0$, $\theta_1 = \theta_2 = \theta_{\max} = \arctan(\sigma_1/\sigma_2)$, and as $\rho \to \infty$, $\varphi_1$ and $\varphi_2$ close on the $x_1$-axis.

Three variables over three channels: Applying the results of the analysis section to the design of $3 \times 3$ transforms is considerably more complicated than what has been presented thus far. Even in the case of equal channel failure probabilities, a closed-form solution is much more complicated than in the two-variable case. When the erasure probabilities are equal and small, a set of transforms which gives nearly optimal performance can be described in closed form, parameterized by a single scalar $a$; a derivation of this set must be omitted for lack of space.

Four and more variables: For sending any even number of variables over two channels, Orchard et al. have suggested the following: form pairs of variables, add correlation within each pair, and send one variable from each pair across each channel. A necessary condition for optimality is that all the pairs operate at the same distortion-redundancy slope. If $T_1$ is used to transform variables with variances $\sigma_1^2$ and $\sigma_2^2$, and $T_2$ is used to transform variables with variances $\sigma_3^2$ and $\sigma_4^2$, then the equal-slope condition implies that we should have

$$\left(\sigma_1^2 - \sigma_2^2\right)\frac{2^{-2\rho_1}}{\sqrt{1 - 2^{-2\rho_1}}} = \left(\sigma_3^2 - \sigma_4^2\right)\frac{2^{-2\rho_2}}{\sqrt{1 - 2^{-2\rho_2}}},$$

where $\rho_1$ and $\rho_2$ are the redundancies allocated to the two pairs. Finding the optimal transform under this pairing constraint still requires finding the optimal pairing.
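As one hypothetical way to use the equal-slope condition, the sketch below splits a total redundancy budget across pairs by bisecting on the common slope of the side-distortion-versus-redundancy relation given above; the variance pairs and the budget are arbitrary assumptions.

```python
import numpy as np

LN2 = np.log(2.0)

def slope_mag(rho, s_big, s_small):
    """|dD1/drho| for one pair, from D1(rho) = (s_big+s_small)/2 - (s_big-s_small)/2 * sqrt(1 - 2^(-2 rho))."""
    return 0.5 * (s_big - s_small) * LN2 * 2.0**(-2 * rho) / np.sqrt(1 - 4.0**(-rho))

def rho_for_slope(target, s_big, s_small, hi=64.0):
    """Invert the (monotone decreasing) slope magnitude by bisection."""
    lo = 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if slope_mag(mid, s_big, s_small) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def allocate(pairs, rho_total):
    """Split rho_total so that every pair operates at the same distortion-redundancy slope."""
    def total(slope):
        return sum(rho_for_slope(slope, s1, s2) for s1, s2 in pairs)
    lo, hi = 1e-9, 1e9
    for _ in range(200):
        mid = np.sqrt(lo * hi)                    # bisect the slope on a log scale
        if total(mid) > rho_total:
            lo = mid
        else:
            hi = mid
    slope = np.sqrt(lo * hi)
    return [rho_for_slope(slope, s1, s2) for s1, s2 in pairs]

if __name__ == "__main__":
    pairs = [(1.0, 0.3), (0.8, 0.5)]              # (larger, smaller) variances per pair (assumptions)
    print(allocate(pairs, rho_total=2.0))
```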

Cascade structures: In order to extend these schemes to an arbitrary number of channels while maintaining reasonable ease of design, we propose the cascaded use of pairing transforms, as shown in Figure 3. The cascade structure simplifies the encoding, the decoding (both with and without erasures), and the design when compared to using a general $n \times n$ transform. Empirical evidence suggests that for $n = m = 4$, considering up to one component erasure, there is no performance penalty in restricting consideration to cascade structures; this phenomenon is under investigation. As expected, there is great improvement over simple pairing of coefficients, since pairing cannot be expected to be near optimal for $m$ larger than two.

Figure 3: A cascade of pairing transforms allows efficient MDTC of large vectors.
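The sketch below assembles a 4 x 4 transform from 2 x 2 pairing transforms in two stages. The wiring between the stages is one plausible reading of the cascade figure and, like the parameter values, is an assumption on our part; the point is only that the overall transform is built from small, separately designed blocks.

```python
import numpy as np

def pair_transform(a, b, c):
    """2x2 correlating transform with det = 1 (the last entry is determined by a, b, c)."""
    return np.array([[a, b], [c, (1 + b * c) / a]])

def cascade_4x4(T1, T2, T3, T4):
    """Two-stage cascade: stage 1 acts on the pairs (x1, x2) and (x3, x4);
    stage 2 acts on the pairs (w1, w3) and (w2, w4) of the stage-1 outputs."""
    stage1 = np.zeros((4, 4))
    stage1[np.ix_([0, 1], [0, 1])] = T1
    stage1[np.ix_([2, 3], [2, 3])] = T2
    stage2 = np.zeros((4, 4))
    stage2[np.ix_([0, 2], [0, 2])] = T3
    stage2[np.ix_([1, 3], [1, 3])] = T4
    return stage2 @ stage1

if __name__ == "__main__":
    T = cascade_4x4(pair_transform(1.0, 0.5, -0.5),
                    pair_transform(1.0, 0.4, -0.4),
                    pair_transform(1.0, 0.6, -0.6),
                    pair_transform(1.0, 0.3, -0.3))
    print(np.round(T, 3))
    print("det =", round(float(np.linalg.det(T)), 6))     # product of unit determinants
```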

References

M. T. Orchard, Y. Wang, V. Vaishampayan, and A. R. Reibman, "Redundancy rate-distortion analysis of multiple description coding using pairwise correlating transforms," in Proc. IEEE Int. Conf. Image Processing.

J. K. Wolf, A. D. Wyner, and J. Ziv, "Source coding for multiple descriptions," Bell Syst. Tech. J.

L. Ozarow, "On a source-coding problem with two channels and three receivers," Bell Syst. Tech. J.

A. A. El Gamal and T. M. Cover, "Achievable rates for multiple descriptions," IEEE Trans. Inform. Theory.

V. K. Goyal, J. Kovačević, and M. Vetterli, "Multiple description transform coding: Robustness to erasures using tight frame expansions," in Proc. IEEE Int. Symp. Inform. Theory (submitted, August).

V. A. Vaishampayan, "Design of multiple description scalar quantizers," IEEE Trans. Inform. Theory.

J.-C. Batllo and V. A. Vaishampayan, "Asymptotic performance of multiple description transform codes," IEEE Trans. Inform. Theory.

I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps," Technical report, Bell Laboratories, Lucent Technologies, September.

A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Kluwer Acad. Pub., Boston, MA.

T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, New York.