IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER 2009

On the Redundancy of Slepian–Wolf Coding

Da-ke He, Member, IEEE, Luis A. Lastras-Montaño, Senior Member, IEEE, En-hui Yang, Fellow, IEEE, Ashish Jagmohan, and Jun Chen, Member, IEEE

Abstract—In this paper, the redundancy of both variable and fixed rate Slepian–Wolf coding is considered. Given any jointly memoryless source-side information pair {(X_i, Y_i)}_{i=1}^∞ with finite alphabet, the redundancy R_n(ε_n) of variable rate Slepian–Wolf coding of X_1^n with decoder only side information Y_1^n depends on both the block length n and the decoding block error probability ε_n, and is defined as the difference between the minimum average compression rate of order n variable rate Slepian–Wolf codes having decoding block error probability less than or equal to ε_n, and the conditional entropy H(X|Y), the rate of the source given the side information. The redundancy of fixed rate Slepian–Wolf coding of X_1^n with decoder only side information Y_1^n is defined similarly and denoted by R_F,n(ε_n). It is proved that under mild assumptions about ε_n,

  R_n(ε_n) = d_v √(log(1/ε_n)/n) + o(√(log(1/ε_n)/n))  and
  R_F,n(ε_n) = d_f √(log(1/ε_n)/n) + o(√(log(1/ε_n)/n))

where d_f and d_v are two constants completely determined by the joint distribution of the source-side information pair. Since d_v is generally smaller than d_f, our results show that variable rate Slepian–Wolf coding is indeed more efficient than fixed rate Slepian–Wolf coding.

Index Terms—Compression rate, error probability, lossless data compression, redundancy, side information, Slepian–Wolf coding, source coding.

Manuscript received February 14, 2008; revised June 01, 2009. Current version published November 20, 2009. The work of E.-h. Yang was supported in part by the Natural Sciences and Engineering Research Council of Canada under Grants RGPIN203035-02 and RGPIN203035-06, and by the Canada Research Chairs Program. The material in this paper was presented in part at the 2006 IEEE International Symposium on Information Theory (ISIT), Seattle, WA, the 2006 IEEE Information Theory Workshop, Chengdu, China, and the 2007 IEEE International Symposium on Information Theory (ISIT), Nice, France.

D.-k. He was with the IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA. He is now with RIM/SlipStream, Waterloo, ON N2L 5Z5, Canada (e-mail: [email protected]).
L. A. Lastras-Montaño and A. Jagmohan are with the IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 USA (e-mail: [email protected]; [email protected]).
E.-h. Yang is with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada (e-mail: [email protected]).
J. Chen was with the IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 USA. He is now with the Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4K1, Canada (e-mail: [email protected]).

Communicated by E. Ordentlich, Associate Editor for Source Coding.
Digital Object Identifier 10.1109/TIT.2009.2032803

I. INTRODUCTION

LET (X, Y) be a pair of random variables taking values in finite alphabets 𝒳 and 𝒴, respectively. Let {(X_i, Y_i)}_{i=1}^∞ denote a sequence of independent copies of (X, Y). For brevity, the memoryless sources {X_i} and {Y_i} are sometimes referred to as the sources X and Y, respectively. Suppose that the source X is to be compressed without essential loss of information, and Y is available only to the decoder as a helper, or in other words, the side information. This problem of source coding with decoder only side information was first introduced and studied by Slepian and Wolf in their groundbreaking paper [13], and is now commonly referred to as the Slepian–Wolf (SW) coding problem (with one encoder). It was shown in [13] that for any memoryless pair (X, Y), there exist SW codes that can compress X with the decoder-only side information Y arbitrarily close to H(X|Y) (asymptotically) while still recovering X with vanishing error probability, which is defined as the probability that the source's message differs from the reconstructed message at the decoder. In other words, one can do, asymptotically, as well as in the case where the side information is also available at the encoder. This result is one of the first, and remains one of the most influential, results in network information theory ever obtained.

Beyond its significant impact on network information theory, SW coding has also attracted a lot of attention from a practical side; recently there have been significant advances in constructing practical SW codes [11], [7], [12], [14], [16], and in researching their use in various applications (see [8] and the references therein). A natural consequence of this surge of interest in the practice of SW coding is a renewed interest in understanding better the fundamental limitations of finite block length codes from an information theoretical point of view. For this purpose, we define the redundancy of an (order n) SW code C_n as r_n(C_n) − H(X|Y), where P_e(C_n) denotes the decoding error probability of C_n, and r_n(C_n) denotes the average compression rate in bits per letter resulting from using the code C_n to compress X_1^n. Further define

  R_n(ε_n) = min [ r_n(C_n) − H(X|Y) ]   (1.1)

where the minimization is taken over all (order n) variable rate SW codes C_n with decoding error probability no greater than ε_n. The quantity R_n(ε_n) is called the redundancy of variable rate SW coding of X_1^n with the decoder only side information Y_1^n and decoding block error probability ε_n. If the minimization in (1.1) is limited to the set of all (order n) fixed rate SW codes with decoding error probability no greater than ε_n, the resulting quantity (denoted by R_F,n(ε_n)) is called the redundancy of fixed rate SW coding of X_1^n with the decoder only side information Y_1^n and decoding block error probability ε_n.

Previous research efforts in understanding the performance of finite block length SW codes have mainly focused on fixed rate codes. Specifically, Wolfowitz's treatment [15] of SW coding demonstrates the existence of fixed rate codes that operate within c(ε)/√n of the conditional entropy H(X|Y) for some function c(ε). Classic results by Gallager [6] and by Csiszár and Körner [5] (obtained also with K. Marton) describe good upper and lower bounds on the error exponent of fixed rate SW coding¹ (the best exponential rate of decay of the decoding error probability as a function of the rate provided).


More recently, Baron et al. [2] studied R_F,n(ε) for a pair of uniform binary random variables connected through a binary symmetric channel.

In contrast, other than the fact that

  (1.2)

little progress has been made towards fully understanding the redundancy of variable rate SW coding. The purpose of this paper is to characterize this quantity R_n(ε_n), along with R_F,n(ε_n), and hence gain insights into the performance of SW codes in practice. We shall show that under mild assumptions² about ε_n

  R_n(ε_n) = d_v √(log(1/ε_n)/n) + o(√(log(1/ε_n)/n))   (1.3)
  R_F,n(ε_n) = d_f √(log(1/ε_n)/n) + o(√(log(1/ε_n)/n))   (1.4)

where d_v and d_f are two constants completely determined by the joint distribution of the source-side information pair. The exact formulae to compute d_f and d_v are provided later in Sections II and III, respectively.

A couple of implications can be drawn immediately from (1.3) and (1.4). First, the redundancy of SW coding is significantly larger than that of conditional coding with the side information available to both the encoder and decoder, which is of order (log n)/n. This difference might explain why designing efficient SW codes, in comparison to designing conditional source codes, is so challenging. Second, the design of practical SW codes with finite block length is not simply a matter of approaching the conditional entropy rate H(X|Y); instead, it is more about the tradeoff among the compression rate, decoding error probability, and block length. Third, since d_v is strictly less than d_f in general, variable rate SW coding is indeed more efficient than fixed rate SW coding for finite block length, which could not be revealed by the first order performance analysis.

The rest of the paper is organized as follows. In Section II, we introduce the concept of intrinsic entropy and prove some preliminary results which will facilitate our later discussions. Section III is devoted to establishing lower bounds to R_n(ε_n) and R_F,n(ε_n). In Section IV, we show that the lower bounds in Section III are indeed tight by establishing matching upper bounds. In Section V, we compare the compression performance of variable rate SW coding with that of fixed rate Slepian–Wolf coding for finite block lengths; the proofs of the main theorems are deferred to Sections VI and VII. Final discussions are given in Section VIII.

¹The tradeoff between compression rate and decoding error probability in fixed rate SW coding is interestingly related to that in fixed rate classic source coding with side information available to both the encoder and the decoder. Specifically, it is eloquently argued in [6] that at rates within a certain range, there exists at least one fixed rate SW code that performs (in terms of error exponent) as well as the best classic fixed rate code using the side information at the encoder; at rates beyond that range, the error exponent of fixed rate SW coding is unknown and might be inferior to that of classic fixed rate coding. Similarly, it is not hard to see that when ε is in a certain range, R_F,n(ε) is the same as the redundancy of fixed rate classic source coding with side information available at the encoder.

²As n → ∞, ε_n will go to 0 fast enough, but not exponentially.

II. INTRINSIC CONDITIONAL ENTROPY

Motivated and inspired by [17], in this section we introduce and analyze the concept of intrinsic entropy, which will play a fundamental role in our performance analysis of SW coding in terms of the tradeoff between the redundancy and decoding error probability.

We first describe the notation to be used throughout this paper. Let A be a finite set. The notation |A| stands for the cardinality of A, and for any finite sequence x from A, |x| denotes the length of x. For any positive integer n, Aⁿ denotes the set of all sequences of length n from A. For convenience, we will sometimes write a sequence (x_i, x_{i+1}, ..., x_j) as x_i^j, where i ≤ j are two integers, or simply as x^j when i = 1. A similar convention will be applied to sequences of random variables as well. We use P(A) to denote the set of all probability distributions on A, and P⁺(A) to denote the subset of P(A) from which probability distributions with zero entries are excluded. Let P denote a probability distribution in P(𝒳 × 𝒴). The marginal distributions of P over 𝒳 and 𝒴 are referred to as P_X and P_Y, respectively. The conditional distribution P_{X|Y} is defined by

  P_{X|Y}(x|y) = P(x, y)/P_Y(y)  for (x, y) with P_Y(y) > 0.

Occasionally, we shall also write P_{X|Y}(x|y) as P(x|y) for convenience. Unless specified otherwise, log denotes the logarithm to base 2, ln denotes the natural logarithm, and e denotes the base of the natural logarithm.

Our analysis in this paper makes heavy use of the method of types [4]. A distribution Q on a finite set A is said to be an n-type if nQ(a) is an integer for any a ∈ A. The set of all n-types on A is denoted by P_n(A). The type Q_x of a sequence x ∈ Aⁿ is defined as the empirical distribution Q_x(a) = n(a|x)/n, which is an n-type on A, where n(a|x) denotes the number of occurrences of a in x. For Q ∈ P_n(A), T_Q denotes the set of all length-n sequences from A with type Q, i.e., T_Q = {x ∈ Aⁿ : Q_x = Q}.

We now introduce the notion of intrinsic entropy. For P ∈ P(A) and ε ≥ 0, we define the intrinsic ε-entropy of P as

  H_ε(P) ≜ sup { H(Q) : Q ∈ P(A), D(Q‖P) ≤ ε }.   (2.1)

Throughout this paper, D(·‖·) denotes the relative entropy between two distributions, i.e.

  D(Q‖P) = Σ_{a∈A} Q(a) log (Q(a)/P(a))

if both Q and P are defined over A. Observe that the set {Q ∈ P(A) : D(Q‖P) ≤ ε} is convex and compact. Since the entropy function is continuous, the supremum in (2.1) is attainable.
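To make the definition (2.1) concrete, the following is a minimal numerical sketch, ours and not from the paper, that evaluates the intrinsic ε-entropy of a binary distribution by bisection; the function names (h2, kl2, intrinsic_entropy) are illustrative. It relies only on the facts noted above: the feasible set is convex, and along the segment from P toward the uniform distribution the entropy increases while the divergence increases monotonically.

```python
# A minimal numerical sketch (ours) of the intrinsic epsilon-entropy
# H_eps(P) = max{ H(Q) : D(Q||P) <= eps } for a binary distribution P.
import math

def h2(t):  # binary entropy in bits
    if t in (0.0, 1.0):
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def kl2(t, p):  # binary relative entropy D((t,1-t) || (p,1-p)) in bits
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log2(a / b)
    return term(t, p) + term(1 - t, 1 - p)

def intrinsic_entropy(p, eps, tol=1e-12):
    """H_eps for P = (p, 1-p), 0 < p < 1. Entropy grows as mass moves toward
    the uniform point t = 1/2, while D((t,1-t)||P) grows monotonically along
    the same path, so the constrained maximum is found by bisection."""
    p = min(p, 1 - p)                 # exploit symmetry; now p <= 1/2
    if kl2(0.5, p) <= eps:            # uniform Q is feasible: H_eps = 1 bit
        return 1.0
    lo, hi = p, 0.5                   # D = 0 at t = p, D > eps at t = 1/2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl2(mid, p) <= eps:
            lo = mid
        else:
            hi = mid
    return h2(lo)

if __name__ == "__main__":
    p = 0.1
    for eps in (0.0, 0.001, 0.01, 0.1):
        print(f"eps={eps:<6} H_eps={intrinsic_entropy(p, eps):.6f} "
              f"(H(P)={h2(p):.6f})")
```

The printed values increase with ε, in line with Property 3) of Lemma 1 below.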


From (2.1), it is easy to see that i) H_0(P) = H(P); and ii) H_ε(P) is nondecreasing in ε. We then have the following result.

Lemma 1: The intrinsic entropy H_ε(P) has the following properties:
1) H_ε(P) is a concave function of ε;
2) H_ε(P) is continuous in ε; and
3) for any fixed P ∈ P⁺(A), H_ε(P) is a strictly increasing function in ε.

The proof of Lemma 1 follows immediately from the definition of H_ε(P) in (2.1).

Extending the notion of intrinsic entropy, we define the intrinsic ε-conditional entropy of P ∈ P(𝒳 × 𝒴) as follows:

  H_ε(X|Y)(P) ≜ max { H_Q(X|Y) : Q ∈ P(𝒳 × 𝒴), D(Q‖P) ≤ ε }   (2.2)

where H_Q(X|Y) = H(Q) − H(Q_Y), and Q_Y denotes the marginal of Q over 𝒴. Similarly, the intrinsic ε-conditional entropy of P with a constrained marginal is defined by

  H_ε^c(X|Y)(P) ≜ max { H_Q(X|Y) : Q ∈ P(𝒳 × 𝒴), D(Q‖P) ≤ ε, Q_X = P_X }.   (2.3)

The definitions (2.2) and (2.3) will be used later to analyze the redundancy of fixed rate and variable rate SW coding, respectively. From (2.2) and (2.3), it is easy to see that H_0(X|Y)(P) = H_0^c(X|Y)(P) = H_P(X|Y), and that H_ε(X|Y)(P) ≥ H_ε^c(X|Y)(P) for any ε ≥ 0. We then have the following result.

Lemma 2: The intrinsic conditional entropy H_ε(X|Y)(P) (H_ε^c(X|Y)(P), respectively) has the following properties:
1) it is a concave function of ε;
2) it is continuous in ε; and
3) for any fixed P ∈ P⁺(𝒳 × 𝒴), it is a strictly increasing function in ε.

Proof of Lemma 2: We briefly show Property 1) for H_ε(X|Y)(P). Fix ε₁, ε₂ ≥ 0 and 0 ≤ λ ≤ 1, and let Q₁ and Q₂ be two distributions achieving H_{ε₁}(X|Y)(P) and H_{ε₂}(X|Y)(P), respectively. From the convexity of relative entropy, we have

  D(λQ₁ + (1−λ)Q₂ ‖ P) ≤ λ D(Q₁‖P) + (1−λ) D(Q₂‖P).   (2.4)

Note that if D(Q₁‖P) ≤ ε₁ and D(Q₂‖P) ≤ ε₂, then D(λQ₁ + (1−λ)Q₂ ‖ P) ≤ λε₁ + (1−λ)ε₂, so that λQ₁ + (1−λ)Q₂ is feasible at divergence level λε₁ + (1−λ)ε₂. We get

  H_{λε₁+(1−λ)ε₂}(X|Y)(P) ≥ H_{λQ₁+(1−λ)Q₂}(X|Y) ≥ λ H_{Q₁}(X|Y) + (1−λ) H_{Q₂}(X|Y).   (2.5)

In the above, the first inequality is due to the feasibility just established together with the expansion H_Q(X|Y) = H(Q) − H(Q_Y) and a similar expansion of H(Q_Y); and the last inequality follows from applying the log-sum inequality, i.e., from the concavity of the conditional entropy in Q. Inequalities (2.4) and (2.5) together imply Property 1). Similarly, we can prove Property 1) for H_ε^c(X|Y)(P). Properties 2) and 3) follow immediately from (2.2) and (2.3) and Property 1). This completes the proof of Lemma 2.

In order to gain insights into intrinsic entropy and intrinsic conditional entropy, and more importantly, to see how they can be used to analyze the redundancy in source coding, we relate them to classical conditional entropy. Let (X, Y) be a pair of random variables with joint distribution P ∈ P(𝒳 × 𝒴). Let P_X and P_Y denote the marginals of P over 𝒳 and 𝒴, respectively; and let P_{X|Y} and P_{Y|X} denote the conditional probability distributions of X given Y and of Y given X, respectively. The following two lemmas are proved in Appendices A and B, respectively.

Lemma 3 (Intrinsic Versus Classical): Suppose P ∈ P⁺(A). There exists a constant c > 0 such that

  H(P) ≤ H_ε(P) ≤ H(P) + c √ε

for all ε less than some threshold.

Lemma 4 (Intrinsic Conditional vs Classical Conditional): Assume P ∈ P⁺(𝒳 × 𝒴). Then the following results hold:
• If X, or X conditioned on Y, is not uniformly distributed over 𝒳, then there exists a δ > 0 such that for any 0 < ε ≤ δ

  H_ε(X|Y)(P) = H_P(X|Y) + d_f √ε + O(ε),  where d_f ≜ √(2 ln 2 · [Var(−log P_{X|Y}(X|Y))]).   (2.6)

• For any γ > 0, there exists a δ > 0, depending only on γ and the alphabet sizes, such that for any 0 < ε ≤ δ and any distribution Q satisfying Q(x, y) ≥ γ for all (x, y) ∈ 𝒳 × 𝒴,

  H_ε^c(X|Y)(Q) = H_Q(X|Y) + d(Q) √ε + O(ε),  uniformly in Q, where
  d(Q) ≜ √(2 ln 2 · [Σ_{x∈𝒳} Q_X(x) Var(−log Q_{X|Y}(X|Y) | X = x)]).   (2.7)
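As a quick illustration of the type notation just introduced (ours, not from the paper), the following snippet enumerates a small binary type class and checks it against the standard bounds (n + 1)^{−|A|} 2^{nH(Q)} ≤ |T_Q| ≤ 2^{nH(Q)} from the method of types [4]:

```python
# Enumerate a binary type class and compare its size with the standard
# method-of-types bounds (illustrative sketch, ours).
import math
from collections import Counter
from itertools import product

n, alphabet = 10, (0, 1)
counts = {0: 7, 1: 3}                  # the n-type Q with Q(0)=0.7, Q(1)=0.3
H_Q = -sum((k / n) * math.log2(k / n) for k in counts.values())
t_q = [x for x in product(alphabet, repeat=n) if Counter(x) == counts]
print(len(t_q))                        # |T_Q| = 10!/(7!3!) = 120
print((n + 1) ** -len(alphabet) * 2 ** (n * H_Q), 2 ** (n * H_Q))
# 120 indeed lies between the two bounds, as guaranteed for every n-type.
```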


Remark 1: Lemma 3 can be strengthened to have an expression similar to that of Lemma 4.

Remark 2: The quantity inside the bracket of the equation defining d_f can be interpreted as the variance of −log P_{X|Y}(X|Y) (taken with respect to P). Similarly, the quantity inside the bracket of the equation defining d_v can be interpreted as the average (over X) of the variance of −log P_{X|Y}(X|Y) (taken with respect to P_{Y|X}(·|X)). The conditions assumed in Lemma 4 above are made so that these two variances are bounded away from 0, respectively.

In the following, we will make use of intrinsic entropy, intrinsic conditional entropy, and the above lemmas to investigate the performance of SW coding of X with decoder side information Y in terms of the tradeoff between the redundancy and decoding error probability.

III. LOWER BOUNDS

Let {(X_i, Y_i)}_{i=1}^∞ be a memoryless source-side information pair with finite alphabet 𝒳 × 𝒴. In this section, we establish a lower bound on the compression rate of SW coding of X with decoder side information Y and a given decoding error probability.

We begin with the formal definition of variable rate SW coding. Let B denote a set of finite binary codewords satisfying the prefix condition. An order n SW code C_n is described by a pair (f_n, g_n), where f_n : 𝒳ⁿ → B, acting as an encoder, maps source sequences of block length n from 𝒳ⁿ to binary codewords in B, and g_n : B × 𝒴ⁿ → 𝒳ⁿ, acting as a decoder, reconstructs the encoded source sequences upon receiving codewords and with the help of the side information sequences. Since the mapping f_n is often many-to-one, we sometimes refer to an entry f_n(x) as a bin index, as in the literature of SW coding. The order n SW code C_n is called a fixed rate code if f_n(𝒳ⁿ) consists of binary codewords of the same length, and a variable rate code if codewords may have different lengths. Clearly, the class of all (order n) variable rate SW codes includes that of all (order n) fixed rate SW codes as a strict subclass.

When C_n is applied to encode X_1^n, the resulting average compression rate is given by

  r_n(C_n) = (1/n) E[ |f_n(X_1^n)| ]

where |b| denotes the length in bits of the binary sequence b. On the decoder side, let X̂_1^n = g_n(f_n(X_1^n), Y_1^n) denote the decoder output. The decoding error probability of C_n is given by

  P_e(C_n) = Pr{ X̂_1^n ≠ X_1^n }.

In [13], it was shown that r_n(C_n) can be made arbitrarily close to H(X|Y) while maintaining a small decoding error probability when n is large enough. On the other hand, it is clear that the smaller P_e(C_n), the larger r_n(C_n). Therefore, to fully understand the performance of SW coding, it is desirable to investigate the best tradeoff between r_n(C_n) and P_e(C_n) among all order n SW codes C_n. Under the condition that P_e(C_n) ≤ ε_n, a prescribed threshold, we first derive lower bounds on r_n(C_n) in Theorems 1 and 2 below for variable rate and fixed rate SW coding, respectively. The proofs of these two theorems are deferred to Section VI.

Theorem 1: Assume P ∈ P⁺(𝒳 × 𝒴) and that the quantity inside the bracket in (3.3) is positive. Let {ε_n} be a sequence of positive real numbers satisfying log(1/ε_n)/n → 0 and log(1/ε_n)/log n → ∞. Then for sufficiently large n and any order n variable rate code C_n with

  P_e(C_n) ≤ ε_n   (3.1)

one has

  r_n(C_n) ≥ H(X|Y) + d_v √(log(1/ε_n)/n) + o(√(log(1/ε_n)/n))   (3.2)

where

  d_v ≜ d(P) = √(2 ln 2 · [Σ_{x∈𝒳} P_X(x) Var(−log P_{X|Y}(X|Y) | X = x)]).   (3.3)

Theorem 2: Assume P ∈ P⁺(𝒳 × 𝒴), and let the intrinsic conditional entropy be defined as in Section II. Then for sufficiently large n, the following hold.
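Before stating the bounds, it may help to see the objects just defined in action. The following toy simulation, ours and not the construction analyzed in this paper, implements an order n fixed rate SW code by random binning with minimum-Hamming-distance decoding inside the received bin, and estimates the pair (r_n, P_e) empirically; all names and parameter choices are illustrative.

```python
# Toy fixed rate SW code (ours): random binning plus in-bin ML decoding.
# X is i.i.d. Bernoulli(1/2) and Y = X xor Z with Z i.i.d. Bernoulli(q),
# so H(X|Y) = h(q).
import itertools, math, random
from collections import defaultdict

def simulate(n=12, rate_bits=8, q=0.1, trials=2000, seed=1):
    rng = random.Random(seed)
    bin_of, members = {}, defaultdict(list)
    for x in itertools.product((0, 1), repeat=n):   # random binning
        b = rng.randrange(2 ** rate_bits)
        bin_of[x] = b
        members[b].append(x)
    errors = 0
    for _ in range(trials):
        x = tuple(rng.randrange(2) for _ in range(n))
        y = tuple(xi ^ (1 if rng.random() < q else 0) for xi in x)
        # decoder: the most likely sequence in the received bin is the one
        # closest to y in Hamming distance (ML for a BSC with q < 1/2)
        xhat = min(members[bin_of[x]],
                   key=lambda c: sum(ci != yi for ci, yi in zip(c, y)))
        errors += (xhat != x)
    return rate_bits / n, errors / trials

if __name__ == "__main__":
    q = 0.1
    h_q = -q * math.log2(q) - (1 - q) * math.log2(1 - q)
    rate, pe = simulate(q=q)
    print(f"rate = {rate:.3f} bits/symbol, H(X|Y) = {h_q:.3f}, "
          f"estimated P_e = {pe:.3f}")
```

Raising rate_bits drives the estimated P_e toward 0 while moving the rate away from H(X|Y); Theorems 1–4 quantify exactly this tradeoff.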


a) For any order n fixed rate code C_n with decoding error probability P_e(C_n) ≤ ε, the rate r_n(C_n) exceeds H(X|Y) by a term of order √(log(1/ε)/n) whenever ε lies below a threshold, where the constants involved depend only on P.
b) If X, or X conditioned on Y, is not uniformly distributed over 𝒳, then for any order n fixed rate code C_n with decoding error probability P_e(C_n) ≤ ε

  r_n(C_n) ≥ H(X|Y) + d_f √(log(1/ε)/n) + o(√(log(1/ε)/n))

whenever log(1/ε)/n ≤ δ, where δ is specified in Part 1 of Lemma 4, and d_f is given by (2.6).

Remark 3: We say that the above coding theorem is proved from the encoder's perspective because the center of argument is placed on the encoder, where we are mainly concerned with how many distinct source sequences the encoder can pack into one bin under a probability constraint. The careful reader might have noticed that this type of packing argument is similar to the sphere packing method in channel coding. Indeed, our proof clearly demonstrates the underlying connection between SW coding and channel coding: each bin performs like a channel code, where every sequence in the same bin is mapped to a set of side information sequences equivalent to channel realizations.

The coding theorem above can also be proven from the decoder's perspective [9]. From this perspective, the center of argument is placed on the decoder, where we are mainly concerned that, under a probability constraint, the decoder has to receive enough bits to decode a sufficient number of distinct source sequences. As a result, the argument relies more on the classical Kraft inequality, as in the analysis of classical lossless source codes. From this perspective, the connection between SW coding and classical source coding becomes apparent: for each typical side information sequence, one can extract a good embedded source code from a good SW code.

Remark 4: The assumption that ε_n goes to 0 faster than polynomially (i.e., log(1/ε_n)/log n → ∞) has intuitive explanations in addition to being viewed as the logical consequence of our technical argument. When instead log(1/ε_n) = O(log n), one can construct a code (see Remark 7 in Section IV) that achieves the conditional entropy H(X|Y) from below by discarding some atypical source types whose total probability is of order ε_n; the reduction in rate cannot be offset by the redundancy needed for encoding the typical types. Thus, in contrast to fixed rate coding, there is then no redundancy in variable rate SW coding. This clearly demonstrates the fundamental difference between fixed rate and variable rate SW coding in the regime of slowly decaying error probabilities.

Remark 5: Theorems 1 and 2 can be extended to the case where the coding rate satisfies the following condition.

C1: The rate is strictly less than the minimum zero-error coding rate of X achievable asymptotically with decoder only side information [18], [1], [10].

In [1], [10], the minimum zero-error coding rate of X achievable asymptotically with decoder only side information is shown to be given by the complementary graph entropy of the characteristic graph associated with (X, Y) [18]. Together with the fact that this rate is at least H(X|Y), Condition C1 implies that the arguments below remain applicable. To extend the two theorems to rates satisfying Condition C1, one can follow an argument similar to that used to prove Theorem 1 in Section VI. Some necessary modifications are in order, which begin with the definitions of the intrinsic ε-conditional entropy and the intrinsic ε-conditional entropy with a constrained marginal in (2.2) and (2.3), respectively. Specifically, the maximum in (2.2) and (2.3) is now taken over all Q whose support set is a subset of the support set of P. The rest of the changes follow accordingly, and the details are omitted as they do not provide much new insight.

IV. UPPER BOUNDS

In this section, the lower bounds established in Section III are shown to be tight by establishing the respective matching upper bounds. Specifically, we have the following two theorems.

Theorem 3: Assume P ∈ P⁺(𝒳 × 𝒴) and that the quantity inside the bracket in (3.3) is positive. Let {ε_n} be a sequence of positive real numbers such that log(1/ε_n)/n → 0 and log(1/ε_n)/log n → ∞. Then there exists a sequence of variable rate SW codes {C_n} with P_e(C_n) ≤ ε_n such that for sufficiently large n

  r_n(C_n) ≤ H(X|Y) + d_v √(log(1/ε_n)/n) + o(√(log(1/ε_n)/n))   (4.1)

where d_v is given by (3.3).

Theorem 4: Assume P ∈ P⁺(𝒳 × 𝒴). Further assume that either X or X conditioned on Y is not uniformly distributed. Let {ε_n} be a sequence of positive real numbers such that log(1/ε_n)/n → 0 and log(1/ε_n)/log n → ∞. Then there exists a sequence of fixed rate codes {C_n} with P_e(C_n) ≤ ε_n such that for sufficiently large n

  r_n(C_n) ≤ H(X|Y) + d_f √(log(1/ε_n)/n) + o(√(log(1/ε_n)/n))   (4.2)

where d_f is given by (2.6).

The proofs of Theorems 3 and 4 are provided in Section VII.

Remark 6: Combining the proof of Theorem 1 with that of Theorem 3, we see that an efficient variable rate SW code should assign the same average decoding error probability to each bin, and the same decoding error probability to each sequence with the same type.

Remark 7: In view of Theorem 3 and its proof in Section VII, we see that for any constant not depending upon n, there exists a sequence of variable rate SW codes {C_n} such that for sufficiently large n

  (4.3)

where the constant involved does not depend on n. Briefly, the construction of C_n is similar to that in the proof of Theorem 3, with the following modifications. Let x be the sequence to be encoded, with type Q_x. In Step 2 on the encoder side, if the type Q_x is sufficiently atypical, the encoder sends nothing to the decoder in this step; otherwise, it selects the number of bins so that the bin rate matches the conditional entropy associated with Q_x plus an appropriate margin, and performs random binning of the type class of Q_x with that many bins. Step 2 on the decoder side is modified accordingly to reconstruct an arbitrary sequence whenever the decoded type is atypical. An argument similar to that used in the proof of Theorem 3 can then be used to prove (4.3). Further generalizing the construction of C_n, one sees that there exist SW codes with P_e(C_n) ≤ ε_n that achieve H(X|Y) from below, echoing Remark 4.

Remark 8: It is well known that in classical lossless coding, the redundancy of zero-error variable rate coding, which is of order (log n)/n, is much better than that of fixed rate coding, which is of order √(log(1/ε)/n), where ε denotes the decoding error probability. Though one might expect a similar result in SW coding, Theorem 1 in Section III and Theorem 4 paint a different picture: for a wide range of error probabilities, the redundancy of variable rate SW coding and that of fixed rate SW coding are in fact of the same order √(log(1/ε)/n). Since the redundancy of variable rate SW coding was never fully characterized before, this result, to the best knowledge of the authors, is the first of its kind in the literature showing that one cannot distinguish variable rate SW coding from fixed rate SW coding simply by their respective redundancy orders. In order to demonstrate that variable rate SW coding is indeed more efficient than fixed rate SW coding in general, in the next section we shall take a closer look at the constant terms d_v and d_f.
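To put the orders in Remark 8 side by side, here is a small numerical illustration (ours), with ε_n = 1/n chosen arbitrarily as a polynomially decaying error probability:

```python
# Order comparison (illustration, ours): classical zero-error variable rate
# redundancy ~ (log n)/n versus the common SW redundancy order
# sqrt(log(1/eps_n)/n), here with eps_n = 1/n.
import math
for n in (10**2, 10**3, 10**4, 10**5):
    eps = 1.0 / n
    classical = math.log2(n) / n
    sw = math.sqrt(math.log2(1.0 / eps) / n)
    print(f"n={n:>6}  (log n)/n = {classical:.2e}   "
          f"sqrt(log(1/eps)/n) = {sw:.2e}")
```

The SW order dominates by a widening margin, consistent with the first implication drawn from (1.3) and (1.4) in Section I.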

V. VARIABLE RATE VS FIXED RATE

Having established tight performance upper and lower bounds for both variable rate and fixed rate SW coding, we are now in a position to compare the performance of variable rate SW coding with that of fixed rate SW coding for large finite block lengths.

Assume P ∈ P⁺(𝒳 × 𝒴) and that X, or X conditioned on Y, is not uniformly distributed over 𝒳. Let {ε_n} be a sequence of positive real numbers satisfying the conditions of Theorems 1–4. From Sections III and IV, it follows that for large n, the best compression performance in bits per symbol of order n variable rate SW coding with decoding error probability less than or equal to ε_n, and that of order n fixed rate SW coding with decoding error probability ε_n, are equal to

  H(X|Y) + d_v √(log(1/ε_n)/n) + o(√(log(1/ε_n)/n))

and

  H(X|Y) + d_f √(log(1/ε_n)/n) + o(√(log(1/ε_n)/n))

respectively. Thus the comparison between them for large finite block lengths boils down to comparing d_v with d_f. To this end, we have the following result.

Theorem 5: Assume P ∈ P⁺(𝒳 × 𝒴). Then we have

  d_v ≤ d_f

with equality if and only if

  E[ −log P_{X|Y}(X|Y) | X = x ] = Σ_y P_{Y|X}(y|x) (−log P_{X|Y}(x|y))

is a constant, i.e., does not depend on the actual symbol x.

Proof of Theorem 5: In view of the definitions of d_v and d_f in (3.3) and (2.6), respectively, we see that in order to prove Theorem 5, it suffices to compare Σ_x P_X(x) Var(−log P_{X|Y}(X|Y) | X = x) against Var(−log P_{X|Y}(X|Y)). It is not hard to verify that

  Var(−log P_{X|Y}(X|Y))
    = Σ_x P_X(x) Var(−log P_{X|Y}(X|Y) | X = x) + Var_X( E[−log P_{X|Y}(X|Y) | X] )
    ≥¹ Σ_x P_X(x) Var(−log P_{X|Y}(X|Y) | X = x).

In the above, the inequality 1) is due to the nonnegativity of variance, with equality if and only if there is a constant c such that E[−log P_{X|Y}(X|Y) | X = x] = c for every x ∈ 𝒳, i.e., the conditional mean is a constant. This completes the proof of Theorem 5.

Remark 9: Theorem 5 implies that even though variable rate SW coding and fixed rate SW coding approach asymptotically the same compression limit at a speed of the same order √(log(1/ε_n)/n), for finite block lengths n, variable rate SW coding is indeed more efficient than fixed rate SW coding in general.

Remark 10: It is also interesting and surprising to see that for sources for which E[−log P_{X|Y}(X|Y) | X = x] is a constant, variable rate SW coding and fixed rate SW coding have the same compression performance up to the second order inclusive. This implies that for these types of sources, there is no need to use variable rate SW coding in practice.

We conclude this section with an example to demonstrate the difference between d_v and d_f.

Example: Consider the case where X takes values in the binary alphabet {0, 1} with P_X(1) = p, and the channel from X to Y is a binary symmetric channel (BSC) with crossover probability q, i.e.

  P_{Y|X}(y|x) = 1 − q if y = x, and P_{Y|X}(y|x) = q if y ≠ x.   (5.1)

We will assume that 0 < q < 1/2. When p = 1/2,³

  d_v = d_f = √(2 ln 2 · q(1 − q)) · log((1 − q)/q).

When p ≠ 1/2, the two constants differ in general; Fig. 1 shows the curves of d_v and d_f as a function of p for a fixed q.

Fig. 1. Comparison of redundancy constants d_v and d_f.

³The redundancy of fixed rate SW coding for the case where p = 0.5 was considered in [2, Theorem 2], where a term involving q(1 − q) is obtained by using an argument based on the central limit theorem.
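To make the comparison concrete, the following small computation, ours and not from the paper, evaluates the two bracketed variance terms of Remark 2 for the example above; with the normalization used in (2.6) and (3.3), these brackets equal (d_f)²/(2 ln 2) and (d_v)²/(2 ln 2). Function names are illustrative.

```python
# The bracketed variance terms of Remark 2 for the binary example (ours):
# bracket_f = Var[-log P(X|Y)]; bracket_v = E_X[ Var[-log P(X|Y) | X] ].
import math

def bracket_terms(joint):
    """joint: dict mapping (x, y) -> probability of the pair."""
    p_x, p_y = {}, {}
    for (x, y), pr in joint.items():
        p_x[x] = p_x.get(x, 0.0) + pr
        p_y[y] = p_y.get(y, 0.0) + pr
    iota = {(x, y): -math.log2(pr / p_y[y]) for (x, y), pr in joint.items()}
    mean = sum(pr * iota[xy] for xy, pr in joint.items())       # = H(X|Y)
    bracket_f = sum(pr * (iota[xy] - mean) ** 2 for xy, pr in joint.items())
    bracket_v = 0.0
    for x0, px in p_x.items():
        cond = {y: joint[(x0, y)] / px for (x, y) in joint if x == x0}
        cmean = sum(w * iota[(x0, y)] for y, w in cond.items())
        bracket_v += px * sum(w * (iota[(x0, y)] - cmean) ** 2
                              for y, w in cond.items())
    return bracket_f, bracket_v, mean

def bsc_joint(p, q):   # X ~ Bernoulli(p), Y = X xor Z, Z ~ Bernoulli(q)
    return {(0, 0): (1 - p) * (1 - q), (0, 1): (1 - p) * q,
            (1, 0): p * q, (1, 1): p * (1 - q)}

for p in (0.5, 0.3):
    bf, bv, hxy = bracket_terms(bsc_joint(p, q=0.1))
    print(f"p={p}: H(X|Y)={hxy:.4f}  bracket_f={bf:.4f}  bracket_v={bv:.4f}")
# At p = 0.5 the two brackets coincide: the conditional mean of -log P(X|Y)
# given X = x does not depend on x, the equality condition of Theorem 5.
```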

Since is a prefix set, it is easy to see that

When ; Fig. 1 shows the curves of and as a function of for .

3The redundancy of fixed rate SW coding for the case where p aHXS was considered in [2, Theorem 2], where the term @log A q@I qA is obtained by using an argument based on the central limit theorem. (6.5)

Authorized licensed use limited to: Jun Chen. Downloaded on November 18, 2009 at 20:26 from IEEE Xplore. Restrictions apply. 5614 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER 2009

which, together with (6.5), implies

(6.6)

where the last inequality is due to the fact that

Since each sequence in is equally probable, it follows that in (6.6)

Define

and

We are then led to upper bound . Fix and . Since

(6.7)

we can think of as a channel code for the memoryless channel given by with the average decoding error proba- bility given by . Specifically, for each , define

Then the set mapping from to specifies a d d Fig. 1. Comparison of redundancy constants and . channel code for with acting as a decoding set in for and the average decoding error probability . Note that when . Consequently, we shall Since the sets , are disjoint, we can use a consider only the case where . In view of the definition sphere packing argument to upper bound . This is essen- of entropy, we have for any tially the line we shall follow below. Applying the sphere packing argument directly to , however, encounters two problems: (1) the cardinalities of the sets may not have a tight uniform lower bound, and (2) the cardinality of the union of the sets over all may not have a tight upper bound. To overcome these two

To overcome these two problems, we shall limit our attention to a subset of good "codewords" and the intersection of each decoding set with some type classes. To this end, define the good codewords of a bin as those whose conditional decoding error probability is at most twice the average over the bin. From Markov's inequality, it follows that at least half of the sequences in each bin are good in this sense.⁴

Fix a bin index b and an n-type on 𝒳 as above, and let the accuracy parameter be a constant to be specified later. In view of Lemma 2, let Q be a distribution on 𝒳 × 𝒴, with marginal over 𝒳 equal to the fixed type, such that

  (6.8)

Note that since the intrinsic conditional entropy increases as its argument increases, such a Q exists for sufficiently large n. Regarding Q as a matrix whose row sums equal the fixed type, we can construct a nearby joint n-type Q̂ as follows.

S1: For every row of Q, move all entries but the last to the nearest integer multiples of 1/n.
S2: Set the last column of the resulting matrix so that its row sums equal the fixed type. Verify that Q̂ is indeed a joint n-type.

It then follows from the standard Taylor series expansion that there exist constants c₁ > 0 and c₂ > 0 such that for all sufficiently large n

  (6.9)

and

  (6.10)

Putting (6.8), (6.9), and (6.10) together, we have

  (6.11)

At this point, we invoke the following lemma, which will be proved in Appendix C and which enables us to tightly lower bound the cardinalities of the relevant conditional type classes.

Lemma 5 (ρ-Neighborhood of a Type): Assume P ∈ P⁺(𝒳 × 𝒴). Then for any ρ > 0, there exists a constant c > 0 such that for any sufficiently large n, any n-type on 𝒳, any joint n-type Q̂ whose marginal over 𝒳 equals that type and which lies within L1 distance ρ of P, and any sequence x of that type, the following results hold:
1) the conditional type class of Q̂ given x has probability, under P_{Y|X}(·|x), lower bounded as in (C4) and (C5);
2) its cardinality is lower bounded as in (C3); and
3) for any 0 < λ ≤ 1 and any subset of the conditional type class with conditional probability no smaller than λ, the cardinality of the subset is lower bounded as in (C7).

For brevity, let us use B(x) to denote the decoding set of a good codeword x, and B^c(x) to denote the complement of B(x) in 𝒴ⁿ. Select now the accuracy parameter appropriately. In view of Lemma 5 and its proof, we see that for the bins and types under consideration

  (6.12)

where the last inequality is due to our parameter selection. Equation (6.12), together with Lemma 5, implies that

  (6.13)

Regard each intersection of B(x) with the conditional type class of Q̂ given x as a sphere mapped to x.

⁴Due to Markov's inequality, to upper bound |X_b(Q)| it suffices to bound the number of good codewords: the average conditional error probability over the bin controls the fraction of bad codewords, which, together with the fact that all sequences of the same type are equally probable, implies that |X_b(Q)| is at most twice the number of good codewords. In the above, E stands for the standard expectation operator.

In order to apply the sphere packing argument mentioned above, we upper bound the total cardinality of these spheres via the corresponding conditional type classes, since each sphere is a subset of the type class it lives in. To this end, we observe from the definition of the good codewords that their conditional error probabilities are controlled,

which in turn implies

  (6.14)

where the constant involved is positive. In the above, the disjointness of the decoding sets of distinct good codewords is used to upper bound their total cardinality. Putting (6.13) and (6.14) together, we get

  (6.15)

where inequality 1) follows from (6.13), and inequality 2) is due to the inequality derived above. It follows immediately from (6.15) that

  (6.16)

Equation (6.16) gives the desired upper bound on the number of sequences (of the fixed type) which the encoder can place into one bin without violating the decoding error probability constraint. From (6.16), and the fact that the number of n-types grows only polynomially in n, it follows that for n sufficiently large

  (6.17)

where the equality 1) follows from Lemma 4, and d(·) is defined in (2.7). To proceed, note that (6.17) holds uniformly for all bins and all types under consideration. We first average (6.17) with respect to the bin index; observe that only one term on the right-hand side of (6.17) depends on the bin index. Note that the term being averaged is convex in the relevant range; thus, if the per-bin error probabilities are small enough, we can apply Jensen's inequality and get

  (6.18)

We continue to average (6.18) with respect to the type of X_1^n. (We can do this because (6.18) holds uniformly over the types under consideration.) Let us first calculate the mean of the leading term. Observe that although the term in question is convex with respect to one of its arguments, it is concave with respect to the type. To address this problem, we expand the term around P_X by using Taylor's series.


In the expansion, the types and distributions involved are regarded as row vectors, and ᵀ denotes the transpose. When n is large, the deviation of the type of X_1^n from P_X is small with high probability. Thus

  (6.19)

In the above, inequality 1) is due to the fact that the linear term of the expansion vanishes after taking expectations, and inequality 2) follows from Jensen's inequality. At this moment, we invoke the following lemma, whose proof can be found in Appendix D.

Lemma 6: There exists a constant c > 0 such that E‖Q_{X_1^n} − P_X‖ ≤ c/√n for all n.

It follows from (6.19) and Lemma 6 that

  (6.20)

where the remaining constant depends only on P. We next calculate the corresponding second-order term:

  (6.21)

In the above, inequality 1) is due to Lemma 6; inequality 2) follows from the L1 bound on relative entropy; and, to see that inequality 3) holds, we expand the function involved at P_X by using Taylor's series and bound the remainder separately in the regimes of small and large deviations.


In view of (6.18), (6.20), and (6.21), we see that

  (6.22)

Combining the terms collected so far, we have

  (6.23)

where the last inequality follows from the observation that the function involved is monotone for sufficiently small arguments. Putting (6.23) back into (6.22), we finally get

  (6.24)

Because of (6.24) and Lemma 4, this completes the proof of Theorem 1.

Proof of Theorem 2: Theorem 2 can be proved in various ways. Here we only sketch a proof that is simplified from the proof of Theorem 1 and that of Lemma 5. Let C_n be any fixed rate code as specified in Theorem 2. In view of the definition of the conditional intrinsic entropy in Section II and Lemma 2, we let Q be a distribution such that both the divergence constraint and the conditional entropy requirement below hold, where ρ is a positive number to be specified later. Around Q, we define a type set analogous to that of Appendix C. Following an argument similar to that used to prove Lemma 5, we can show that

  (6.25)

Select now ρ so that the right-hand side of (6.25) is equal to the prescribed error threshold. That is

  (6.26)

For brevity, let us define the set of source sequences whose types fall in the type set above. It then follows from (6.26), the definition of the type set, and an argument similar to that used in the proof of Lemma 5 that

  (6.27)

where the constant involved is positive. Observe that the number of side information sequences compatible with the set is correspondingly bounded. This implies that the code needs at least

  (6.28)

distinct codewords (or equivalently bins) to encode the source sequences in the set. Equation (6.28), coupled with the fact that C_n is a fixed rate code and the definition of ρ above, further implies the desired lower bound on the rate.


Because of our selection of ρ and Lemma 4, this completes the proof of Theorem 2.

VII. PROOF OF THEOREMS 3 AND 4

In this section, we prove Theorems 3 and 4.

Proof of Theorem 3: Let P_X and P_Y denote the marginals of P over 𝒳 and 𝒴, respectively, and let P_{Y|X} denote the conditional probability distribution of Y given X. To prove this theorem, we shall construct a sequence of codes C_n = (f_n, g_n) with the desired decoding error probability and redundancy. Our construction directly reflects the knowledge that we learned from the analysis of the redundancy lower bound of SW coding in Section III, and makes conscious use of the marginal type of the source sequence.

To describe C_n, it suffices to specify how f_n and g_n work. For any x ∈ 𝒳ⁿ with type Q_x, the encoder f_n encodes x as follows.

Step 1: Encode the type Q_x by using

  (7.29)

bits.

Step 2: If

  (7.30)

encode x losslessly by using

  (7.31)

bits; otherwise, select the number of bins so that

  (7.32)

where the two free parameters are positive numbers to be specified later. Construct the corresponding bins. f_n then randomly selects a bin with uniform probability, places x into the bin, and encodes the bin index by using the corresponding number of bits. Note that the random bin selection is independent across sequences, and that the codeword length depends only on the type of x.

On the decoder side, the decoding function g_n works as follows.

Step 1: Decode the type from the transmitted codeword.
Step 2: If the decoded type satisfies condition (7.30), decode the source sequence directly from the transmitted codeword; otherwise, decode the bin index from the transmitted codeword, and continue to Step 3 below.
Step 3: This step is executed only if condition (7.30) fails. For any side information sequence y, define the candidate set of y as the set of sequences whose type equals the decoded type and which are jointly typical with y in the relative entropy sense. Note that the candidate set could be empty. Our code then reconstructs x̂ as a sequence, in the bin specified by the transmitted bin index, that is also in the candidate set.⁵ That is, if we use X_b to denote the set of all sequences from 𝒳ⁿ in the bth bin, then x̂ is taken from the intersection of X_b and the candidate set, where b denotes the bin index sent from the encoder; if the intersection is empty, x̂ is selected arbitrarily. If the intersection consists of more than one sequence, g_n selects an arbitrary one as x̂.

Suppose that X_1^n = x. In view of the encoding and decoding process of C_n described above, we see that a decoding error happens only if one of the following two events occurs:
i) x is not in the candidate set determined by the realized side information sequence; or
ii) x is in the candidate set, but other sequences in the candidate set also fall in the bin containing x.

In view of these, we have

  (7.33)

In the following, our job is to upper bound the two terms on the right-hand side of (7.33). The derivation of these bounds makes use of the concept of intrinsic conditional entropy. To upper bound the probability of event i), we define the typical neighborhood of types around P, and let y denote a realization of the side information sequence. Then

  (7.34)

⁵The decoding rule described in Step 3 of the decoding function g_n is similar to the standard joint typicality decoding rule, with two subtle differences: 1) the joint type must satisfy the marginal constraint, i.e., the candidate set consists of only sequences with the same type known to the decoder; and 2) the typicality is evaluated by using relative entropy instead of the L1 norm. The main reason for these subtle touches on the decoding rule is to derive a sharp upper bound on the redundancy of SW coding while at the same time maintaining simplicity compared to the maximum a posteriori (MAP) decoding rule.
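The following schematic sketch, ours and deliberately simplified, mirrors Steps 1 and 2 of the encoder f_n described above for a binary source: send the type first, fall back to raw transmission when the type is atypical, and otherwise send a bin index at a rate selected from the type. The atypicality threshold and the rate rule are illustrative stand-ins for the paper's precise conditions (7.30)–(7.32), and the CRC hash stands in for the shared random binning.

```python
# Schematic variable rate SW encoder (ours, simplified; thresholds and the
# rate rule are hypothetical stand-ins for (7.30)-(7.32)).
import math, zlib

def h2(p):
    p = min(max(p, 1e-12), 1 - 1e-12)
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def encode(x, p_x=0.5, gap=0.25, slack=0.1):
    n = len(x)
    p_hat = sum(x) / n                          # the type of x (binary case)
    if abs(p_hat - p_x) > gap:                  # stand-in for condition (7.30)
        return ('type', p_hat, 'raw', tuple(x)) # lossless fallback, ~n bits
    rate = 0.6 * h2(p_hat) + slack              # stand-in for selection (7.32)
    num_bins = 2 ** max(1, math.ceil(n * rate))
    # deterministic hash = a bin assignment shared with the decoder
    return ('type', p_hat, 'bin', zlib.crc32(bytes(x)) % num_bins)

print(encode([1, 0, 1, 1, 0, 0, 1, 0] * 10))
```

Note how the codeword length depends only on the type of x, exactly the property the proof exploits when averaging the compression rate in (7.49).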


In the above, the last equality follows from Stirling’s approxi- mation of integer factorials (see (C3) in Appendix C for details). To continue, we partition into a series of mutually exclusive subsets as follows: (7.38)

where denotes the integer factorial with convention , and the last equality follows from our assumption of finite where . Thus alphabets. the sum on the right-hand side of Leaving the selection of in (7.38) to a later moment, we (7.34) can be upper-bounded by turn our attention to . Suppose that .Given , the probability of the event is upper bounded by the probability that another sequence from is placed in the same bin as . Recall that each bin is selected with uniform probability , and that the bin selection is independent of . Define

and

(7.35)

Then a standard argument can be used to show that It remains to upper bound . Exam- ining the definition of above, we find that for (7.39) It is clear that in order to upper bound (7.39), it suffices to upper bound . We hint at this moment that our bound on (7.36) is related to intrinsic conditional entropy. Toward this direction, we see that from an argument similar to that used in the proof where the first inequality follows from the L1 bound on relative of Theorem 1, it follows that for any and entropy [3, Lemma 12.6.1]. This bound implies that

(7.37) (7.40)

Combining (7.34), (7.35), and (7.37), we arrive at Let denote the type in such that

Then for each with nonempty ,wehave

(7.41)

In the above, the equality 1) is due to (7.40). Using the argument that led to (7.37), we can upper-bound the remaining factor by

  (7.42)

Putting (7.42) back into (7.41) leads to

  (7.43)

where the last inequality follows from the definition of intrinsic conditional entropy in (2.3) and the definitions of the sets above. Since (7.43) holds for any type and any y such that the candidate set is nonempty, it follows that

  (7.44)

which, together with (7.39), implies

  (7.45)

where the last inequality is due to (7.32). For convenience, we now combine (7.33), (7.38), and (7.45) into

  (7.46)

In view of (7.46), we make the selections of the two free parameters as follows:

  (7.47)

where the constants involved are selected so that the two terms on the right-hand side of (7.46) are each upper bounded by ε_n/2. Consequently, it follows from (7.46) and the description of our code above that

  (7.48)

for sufficiently large n. To finish the proof, we need to upper bound the average compression rate of the code C_n. Note that even though C_n is a random code by construction, the codeword length, for each x, depends only on the type of x. Thus, for any realization of C_n, we have

  (7.49)

In the above, the inequality 1) is due to the concavity of the intrinsic conditional entropy with respect to ε (see Lemma 2); the equality 2) follows from Lemma 4 and its proof in Appendix B; and the equality 3) is obtained by plugging in our selections in (7.47). Combining (7.48) with (7.49) now implies that there exists an order n deterministic variable rate SW code with decoding error probability less than or equal to ε_n and average compression rate upper bounded as in (7.49). This completes the proof of Theorem 3.

Proof of Theorem 4: In order to prove Theorem 4, we modify the code constructed in the proof of Theorem 3 so that it becomes a fixed rate code. Specifically, in Step 2 of the encoder f_n, the same number of bins is selected for all types, with the parameters selected as in (7.47). The modified code then randomly selects a bin with uniform probability, places x into the bin, and encodes the bin index by using a fixed number of bits. Since in Step 1 the type is encoded into a constant number of bits, the modified code is indeed a fixed rate code. Using an argument similar to that used in the proof of Theorem 3, we can easily show that for sufficiently large n the decoding error probability of the modified code is at most ε_n. This, coupled with the definition of R_F,n(ε_n) and Lemma 4, completes the proof of Theorem 4.


VIII. CONCLUSION

In this paper, we have characterized the compression performance in bits per symbol of both variable rate and fixed rate SW coding for memoryless source-side information pairs up to the second order inclusive, when the decoding error probability goes to 0 fast enough, but not exponentially, as the block length n → ∞. The characterization, on one hand, implies that, surprisingly, variable rate SW coding and fixed rate SW coding approach asymptotically the same compression limit at a speed of the same order √(log(1/ε_n)/n). This sharply contrasts with the fact that in classical lossless coding, the redundancy of zero-error variable rate coding, which is of order (log n)/n, is much better than that of fixed rate coding, which is of order √(log(1/ε)/n). On the other hand, the characterization also implies that for large finite block lengths n, variable rate SW coding is indeed more efficient than fixed rate SW coding in general. A necessary and sufficient condition has also been derived under which variable rate SW coding and fixed rate SW coding have the same compression performance up to the second order inclusive. The design of practical SW codes with finite block lengths is not simply a matter of approaching the conditional entropy rate H(X|Y); instead, it is more about the tradeoff among the compression rate, decoding error probability, and block length. During the course of proving our main results, new information quantities called intrinsic entropy and intrinsic conditional entropy have been introduced and analyzed. It is expected that these information quantities will have applications to other problems in information theory as well, such as SW coding with multiple encoders.

APPENDIX A

In this appendix, we prove Lemma 3. For any Q with D(Q‖P) ≤ ε, write Q = P + δ for a perturbation δ. Then on the one hand we have

  (A1)

where the constant involved is positive; the last inequality follows from the assumption that P ∈ P⁺(A). On the other hand, when ε is small, there exists a constant such that

  (A2)

Selecting the constant c of the lemma accordingly, and using (A1), (A2), and the fact that H_ε(P) ≥ H(P) by definition, we complete the proof of Lemma 3.

APPENDIX B

In this appendix, we prove Lemma 4. For brevity, let ι(x, y) = −log P_{X|Y}(x|y). In view of (2.2) and Lemma 2, we see that in order to prove the first part of Lemma 4, it suffices to solve the following constrained maximization problem:

  maximize H_Q(X|Y) subject to D(Q‖P) ≤ ε.   (B1)

Let δ = Q − P. From [3, Lemma 12.6.1], we see that ‖δ‖ is small whenever ε is small. Thus, when ε is small, we can use Taylor's series to expand D(Q‖P) at P, and get

  (B2)

Similarly, we expand H_Q(X|Y) at P, and get

  (B3)

In view of (B2) and (B3), we see that when ε is small, we can simplify (B1) into the following constrained maximization problem:

  maximize Σ_{x,y} δ(x, y) ι(x, y)
  subject to Σ_{x,y} δ(x, y) = 0 and (1/(2 ln 2)) Σ_{x,y} δ(x, y)²/P(x, y) ≤ ε.

To solve the above problem, we can use the standard Lagrange multiplier method. Define

  (B4)


The first derivative of the Lagrangian with respect to δ(x, y) is given by

  (B5)

Letting (B5) equal zero, we have

  (B6)

(B6), together with the constraints in (B4), leads to

  (B7)

and

  (B8)

Note that the difference within the brackets above is positive whenever X, or X conditioned on Y, is not uniformly distributed over 𝒳. Consequently

  (B9)

This completes the proof of the first part of Lemma 4.

We now use the same strategy to prove the second part of Lemma 4. For brevity, let ι_Q(x, y) = −log Q_{X|Y}(x|y) from this moment on. In view of (2.3) and Lemma 2, we see that in order to prove Lemma 4, it suffices to solve the following constrained maximization problem:

  maximize H_{Q'}(X|Y) subject to D(Q'‖Q) ≤ ε and Q'_X = Q_X.   (B10)

Let δ = Q' − Q, and observe that the marginal constraint forces Σ_y δ(x, y) = 0 for any x. From [3, Lemma 12.6.1], we see that ‖δ‖ is small whenever ε is small. Thus, when ε is small, we can use Taylor's series to expand D(Q'‖Q) at Q, and get

  (B11)

Similarly, we expand H_{Q'}(X|Y) at Q, and get

  (B12)

In view of (B11) and (B12), we see that when ε is small, we can simplify (B10) into the following constrained maximization problem:

  maximize Σ_{x,y} δ(x, y) ι_Q(x, y)
  subject to Σ_y δ(x, y) = 0 for any x ∈ 𝒳 and (1/(2 ln 2)) Σ_{x,y} δ(x, y)²/Q(x, y) ≤ ε.   (B13)

To solve the above problem, we again use the standard Lagrange multiplier method. Define

  (B14)

The first derivative of the Lagrangian with respect to δ(x, y) is given by

  (B15)


Letting (B15) equal zero, we have

  (B16)

(B16), together with the constraints in (B13), leads to (B17) and (B18) below.

  (B17)

  (B18)

Note that the difference within the brackets above is positive whenever the quantity inside the bracket in (2.7) is positive. Consequently

  (B19)

Note that the resulting expansions match (2.6) and (2.7), and that the above argument is valid uniformly for all Q satisfying Q(x, y) ≥ γ for all (x, y). This completes the proof of Lemma 4.

APPENDIX C

In this appendix, we prove Lemma 5. Fix an n-type on 𝒳 and a joint type according to the lemma's statement. Around the latter, we define a type set consisting of all joint n-types that have the prescribed marginal over 𝒳 and lie within a prescribed L1 distance of it.


We pause at this moment to discuss some properties of the set. When n is sufficiently large, the set is an L1 ball of types centered at the fixed joint type and is thus symmetric with respect to the center, i.e.

  (C1)

Furthermore, observe that there are |𝒳|(|𝒴| − 1) degrees of freedom for moving the entries of the center type at step size 1/n while staying within the set. This implies that

  (C2)

We now count the number of distinct sequences in the union of the associated conditional type classes as follows:

  (C3)

for sufficiently large n. In the above, the inequality 1) is due to the convexity of the exponential function; the equality 2) is obtained from (C2); and the last equality 3) is argued in Appendix E.

We next lower bound the probability of the conditional type classes. Let y denote a sequence in a conditional type class constructed above. Then

  (C4)

which implies

  (C5)

for sufficiently large n. In the above, the equality 1) follows from an argument similar to that applied in the derivation of (C3) above; the inequality 2) is due to the convexity of the exponential function; the equality 3) follows from (C2); and the last equality follows from an argument similar to that in Appendix E.

It remains to lower bound the cardinality of any subset of a conditional type class with conditional probability no smaller than λ, where λ is a real number satisfying 0 < λ ≤ 1. Recall that the log-probability of a sequence pair is a linear function of its joint type when the two distributions involved are defined over the same alphabet. From (C4) and the construction of the set, it follows that for any such y

  (C6)

where the constant involved is positive. It then follows immediately from (C6) that any subset of the conditional type class with conditional probability at least λ satisfies

  (C7)

for sufficiently large n. Select the free constant to take care of the polynomial terms in (C3) and (C5). We then get all the desired inequalities of Lemma 5 from (C3), (C5), and (C7). This completes the proof of Lemma 5.
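The derivations around (C3), and earlier around (7.35), rely on Stirling's approximation of integer factorials. The following quick check, ours, verifies the standard two-sided bound √(2πn)(n/e)^n ≤ n! ≤ e√n (n/e)^n that such arguments use:

```python
# Numeric sanity check (ours) of standard Stirling-type bounds on ln n!.
import math
for n in (1, 5, 10, 100, 1000):
    lf = math.lgamma(n + 1)                            # ln n!
    lo = 0.5 * math.log(2 * math.pi * n) + n * (math.log(n) - 1)
    hi = 1 + 0.5 * math.log(n) + n * (math.log(n) - 1)
    assert lo <= lf <= hi
    print(f"n={n:<5} ln n!={lf:.4f}  lower={lo:.4f}  upper={hi:.4f}")
```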

APPENDIX D

In this appendix, we prove Lemma 6. Let us partition the set of n-types on 𝒳 into a sequence of mutually exclusive subsets defined by their L1 distance to P_X.


For each subset, the probability that the type of X_1^n falls in it is upper bounded by

  (D1)

In the above, inequality 1) is due to the L1 bound on relative entropy [3, Lemma 12.6.1], i.e.

  D(Q‖P) ≥ ‖Q − P‖² / (2 ln 2)

and equality 2) follows from observing that the number of n-types in each subset grows at most polynomially in n. Consequently,

  (D2)

where the constant involved is positive. In the above, inequality 1) follows from (D1). This completes the proof of Lemma 6.

APPENDIX E

In this appendix, we detail the derivation of the equality 3) in (C3). Specifically, we show that there exists a constant, dependent only on the underlying distribution, such that

  (E1)

for sufficiently large n.

Regard the types involved as row vectors for brevity. Expanding the entropy function at the center of the type set by using Taylor's series, we get

  (E2)

where ᵀ denotes the transpose. Since the first-order term is linear with respect to the perturbation, it follows from the symmetry (C1) of the type set that the linear contributions cancel in pairs:

  (E3)

Let q_min denote the smallest entry in the center type. It is easy to verify that the absolute value of every entry of the second-order term is upper bounded by a constant depending only on q_min. Consequently,

  (E4)

Combining (E2), (E3), and (E4), we arrive at

  (E5)

where the last inequality follows from the definition of the type set in Appendix C. This completes the proof of (E1).
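Appendices B–E repeatedly invoke the L1 bound on relative entropy [3, Lemma 12.6.1]. The following numerical check, ours, confirms D(P‖Q) ≥ ‖P − Q‖₁²/(2 ln 2), with D measured in bits, on random distribution pairs:

```python
# Numeric check (ours) of the L1 bound on relative entropy (Pinsker-type).
import math, random

def kl_bits(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

rng = random.Random(0)
for _ in range(10000):
    raw_p = [rng.random() for _ in range(4)]
    raw_q = [rng.random() for _ in range(4)]
    p = [v / sum(raw_p) for v in raw_p]
    q = [v / sum(raw_q) for v in raw_q]
    l1 = sum(abs(a - b) for a, b in zip(p, q))
    assert kl_bits(p, q) >= l1 * l1 / (2 * math.log(2)) - 1e-12
print("L1 bound held on 10000 random distribution pairs")
```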

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers and the Associate Editor, Dr. Ordentlich, for their constructive comments that helped improve the presentation of the paper.

REFERENCES

[1] N. Alon and A. Orlitsky, "Source coding and graph entropies," IEEE Trans. Inf. Theory, vol. 42, pp. 1329–1339, 1996.
[2] D. Baron, M. A. Khojastepour, and R. G. Baraniuk, "Redundancy rates of Slepian-Wolf coding," in Proc. Allerton Conf., Monticello, IL, USA, 2004.
[3] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[4] I. Csiszár, "The method of types," IEEE Trans. Inf. Theory, vol. 44, pp. 2505–2523, 1998.
[5] I. Csiszár and J. Körner, "Towards a general theory of source networks," IEEE Trans. Inf. Theory, vol. 26, pp. 155–165, 1980.
[6] R. G. Gallager, Source Coding With Side Information and Universal Coding, MIT LIDS Tech. Rep. LIDS-P-937, 1976.
[7] J. Garcia-Frias and Y. Zhao, "Compression of correlated binary sources using turbo codes," IEEE Commun. Lett., pp. 417–419, 2001.
[8] B. Girod, A. Aaron, S. Rane, and D. Rebollo-Monedero, "Distributed video coding," Proc. IEEE, vol. 93, pp. 71–83, 2005.
[9] D.-k. He, L. A. Lastras-Montaño, and E.-h. Yang, "Redundancy of variable rate Slepian-Wolf codes from the decoder's perspective," in Proc. IEEE Int. Symp. Information Theory (ISIT'07), Nice, France, Jun. 2007, pp. 1321–1325.
[10] P. Koulgi, E. Tuncel, S. L. Regunathan, and K. Rose, "On zero-error source coding with decoder side information," IEEE Trans. Inf. Theory, vol. 49, pp. 99–111, 2003.
[11] S. S. Pradhan and K. Ramchandran, "Distributed source coding using syndromes (DISCUS): Design and construction," IEEE Trans. Inf. Theory, vol. 49, pp. 626–643, 2003.
[12] D. Schonberg, K. Ramchandran, and S. S. Pradhan, "Distributed code constructions for the entire Slepian-Wolf rate region for arbitrarily correlated sources," in Proc. DCC'04, Snowbird, UT, 2004.
[13] D. Slepian and J. K. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inf. Theory, vol. 19, pp. 471–480, 1973.
[14] V. Stankovic, A. Liveris, Z. Xiong, and C. Georghiades, "On code design for the general Slepian-Wolf problem and for lossless multiterminal communication networks," IEEE Trans. Inf. Theory, submitted for publication.
[15] J. Wolfowitz, Coding Theorems of Information Theory, 3rd ed. New York: Springer-Verlag, 1978.
[16] E.-H. Yang, D.-K. He, T. Uyematsu, and R. W. Yeung, "Universal multiterminal source coding algorithms with asymptotically zero feedback: Fixed database case," IEEE Trans. Inf. Theory, vol. 54, pp. 5575–5590, 2008.
[17] Z. Zhang, E.-H. Yang, and V. K. Wei, "The redundancy of source coding with a fidelity criterion—Part one: Known statistics," IEEE Trans. Inf. Theory, vol. 43, pp. 71–91, 1997.
[18] H. S. Witsenhausen, "The zero-error side information problem and chromatic numbers," IEEE Trans. Inf. Theory, vol. 22, pp. 592–593, 1976.


Da-ke He (S'01–M'06) received the B.S. and M.S. degrees, both in electrical engineering, from Huazhong University of Science and Technology, Wuhan, Hubei, China, in 1993 and 1996, respectively, and the Ph.D. degree in electrical engineering from the University of Waterloo, Waterloo, ON, Canada, in 2003. From 1996 to 1998, he worked at Apple Technology China (Zhuhai) as a software engineer. From 2003 to 2004, he worked in the Department of Electrical and Computer Engineering at the University of Waterloo as a postdoctoral research fellow in the Leitch-University of Waterloo Multimedia Communications Laboratory. From 2005 to 2008, he was a Research Staff Member in the Department of Multimedia Technologies at the IBM T. J. Watson Research Center, Yorktown Heights, NY. Since 2008, he has been a Technical Manager in SlipStream Data, a subsidiary of Research In Motion, in Waterloo, ON, Canada. His research interests are in source coding theory and algorithm design, multimedia data compression and transmission, multiterminal source coding theory and algorithms, and digital communications.

Luis A. Lastras-Montaño (M'96–SM'06) received the Licenciatura degree in electronics and digital systems from the Universidad Autónoma de San Luis Potosí, México, and the M.S. and Ph.D. degrees in electrical engineering from Cornell University, Ithaca, NY, in 1998 and 2000, respectively. He has been a Research Staff Member with the IBM T. J. Watson Research Center, Yorktown Heights, NY, since 2000, where he is currently a member of the Memory Systems Department. His academic research interests lie in the areas of information theory, coding theory, and statistical signal processing; his work for IBM is focused on the reliability, performance, and architecture of computer memory systems, with a special interest in nonvolatile memory technologies.

En-Hui Yang (M'97–SM'00–F'08) was born in Jiangxi, China, on December 26, 1966. He received the B.S. degree in applied mathematics from HuaQiao University, Qianzhou, China, and the Ph.D. degree in mathematics from Nankai University, Tianjin, China, in 1986 and 1991, respectively. Since June 1997, he has been with the Department of Electrical and Computer Engineering, University of Waterloo, ON, Canada, where he is currently a Professor and Canada Research Chair in information theory and multimedia compression. He held a Visiting Professor position at the Chinese University of Hong Kong from September 2003 to June 2004; positions of Research Associate and Visiting Scientist at the University of Minnesota, Minneapolis–St. Paul, the University of Bielefeld, Bielefeld, Germany, and the University of Southern California, Los Angeles, from January 1993 to May 1997; and a faculty position (first as an Assistant Professor and then an Associate Professor) at Nankai University, Tianjin, China, from 1991 to 1992. He is the founding Director of the Leitch-University of Waterloo multimedia communications lab, and a Co-Founder of SlipStream Data Inc. (now a subsidiary of Research In Motion). His current research interests are multimedia compression, multimedia watermarking, multimedia transmission, digital communications, information theory, source and channel coding including distributed source coding and space-time coding, Kolmogorov complexity theory, and applied probability theory and statistics.
Dr. Yang is a recipient of several research awards, including the 1992 Tianjin Science and Technology Promotion Award for Young Investigators; the 1992 third Science and Technology Promotion Award of the Chinese National Education Committee; the 2000 Ontario Premier's Research Excellence Award, Canada; the 2000 Marsland Award for Research Excellence, University of Waterloo; the 2002 Ontario Distinguished Researcher Award; the prestigious Inaugural (2007) Premier's Catalyst Award for the Innovator of the Year; and the 2007 Ernest C. Manning Award of Distinction, one of Canada's most prestigious innovation prizes. Products based on his inventions and commercialized by SlipStream received the 2006 Ontario Global Traders Provincial Award and were deployed by over 2200 Service Providers in more than 50 countries, servicing millions of home and wireless subscribers worldwide every day. He served, among many other roles, as a General Co-Chair of the 2008 IEEE International Symposium on Information Theory, a Technical Program Vice-Chair of the 2006 IEEE International Conference on Multimedia & Expo (ICME), the Chair of the award committee for the 2004 Canadian Award in Telecommunications, a Co-Editor of the 2004 Special Issue of the IEEE TRANSACTIONS ON INFORMATION THEORY, a Co-Chair of the 2003 US National Science Foundation (NSF) workshop on the interface of Information Theory and Computer Science, and a Co-Chair of the 2003 Canadian Workshop on Information Theory. He currently also serves as an Associate Editor for the IEEE TRANSACTIONS ON INFORMATION THEORY. He is a Fellow of the Canadian Academy of Engineering and a Fellow of the Royal Society of Canada: the Academies of Arts, Humanities and Sciences of Canada.

Ashish Jagmohan received the B.Tech. degree in electrical engineering from the Indian Institute of Technology, Delhi, in 1999, and the M.S. and Ph.D. degrees in electrical engineering from the University of Illinois at Urbana-Champaign, in 2002 and 2004, respectively. Since 2004, he has worked in the Departments of Multimedia Technologies and Memory Systems at the IBM T. J. Watson Research Center, Yorktown Heights, NY. His research interests include memory system technologies, video compression, multimedia communication, signal processing, and information theory.

Jun Chen (S'03–M'06) received the B.E. degree with honors in communication engineering from Shanghai Jiao Tong University, Shanghai, China, in 2001, and the M.S. and Ph.D. degrees in electrical and computer engineering from Cornell University, Ithaca, NY, in 2003 and 2006, respectively. He was a Postdoctoral Research Associate in the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign, Urbana, IL, from 2005 to 2006, and a Josef Raviv Memorial Postdoctoral Fellow at the IBM T. J. Watson Research Center, Yorktown Heights, NY, from 2006 to 2007. He is currently an Assistant Professor of Electrical and Computer Engineering at McMaster University, Hamilton, ON, Canada, where he holds the Barber-Gennum Chair in Information Technology. His research interests include information theory, wireless communications, and signal processing.
