High-Rate Quantization and Transform Coding with Side Information at the Decoder∗

David Rebollo-Monedero†, Shantanu Rane, Anne Aaron and Bernd Girod
Information Systems Lab., Dept. of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
Abstract

We extend high-rate quantization theory to Wyner-Ziv coding, i.e., lossy source coding with side information at the decoder. Ideal Slepian-Wolf coders are assumed, thus rates are conditional entropies of quantization indices given the side information. This theory is applied to the analysis of orthonormal block transforms for Wyner-Ziv coding. A formula for the optimal rate allocation and an approximation to the optimal transform are derived. The case of noisy high-rate quantization and transform coding is included in our study, in which a noisy observation of source data is available at the encoder, but we are interested in estimating the unseen data at the decoder, with the help of side information. We implement a transform-domain Wyner-Ziv video coder that encodes frames independently but decodes them conditionally. Experimental results show that using the discrete cosine transform yields a rate-distortion improvement over the pixel-domain coder. Transform coders of noisy images under different communication constraints are compared. Experimental results show that the noisy Wyner-Ziv transform coder achieves a performance close to the case in which the side information is also available at the encoder.
Keywords: high-rate quantization, transform coding, side information, Wyner-Ziv coding, distributed source coding, noisy source coding
1. Introduction

Rate-distortion theory for distributed source coding [3–6] shows that under certain conditions, the performance of coders with side information available only at the decoder is close to the case in which both encoder and decoder have access to the side information. Under much more restrictive statistical conditions, this also holds for coding of noisy observations of unseen data [7,8]. One of the many applications of this result is reducing the complexity of video encoders by eliminating motion compensation, and decoding using past frames as side information, while keeping the efficiency close to that of motion-compensated encoding [9–11]. In addition, even if the image captured by the video encoder is corrupted by noise, we would still wish to recover the clean, unseen data at the decoder, with the help of side information, consisting of previously decoded frames, and perhaps some additional local noisy image.

In these examples, due to complexity constraints in the design of the encoder, or simply due to the unavailability of the side information at the encoder, conventional, joint denoising and coding techniques are not possible. We need practical systems for noisy source coding with decoder side information, capable of the rate-distortion performance predicted by information-theoretic studies. To this end, it is crucial to extend the building blocks of traditional source coding and denoising, such as lossless coding, quantization, transform coding and estimation, to distributed source coding.

It was shown by Slepian and Wolf [3] that lossless distributed coding can achieve the same performance as joint coding. Soon after, Wyner and Ziv [4,12] established the rate-distortion limits for lossy coding with side information at the decoder, which we shall refer to as Wyner-Ziv (WZ) coding. Later, an upper bound on the rate loss due to the unavailability of the side information at the encoder was found in [5], which also proved that for power-difference distortion measures and smooth source probability distributions, this rate loss vanishes in the limit of small distortion. A similar high-resolution result was obtained in [13] for distributed coding of several sources without side information, also from an information-theoretic perspective, that is, for arbitrarily large dimension. In [14] (unpublished), it was shown that tessellating quantizers followed by Slepian-Wolf coders are asymptotically optimal in the limit of small distortion and large dimension.

It may be concluded from the proof of the converse to the WZ rate-distortion theorem [4] that there is no asymptotic loss in performance by considering block codes of sufficiently large length, which may be seen as vector quantizers, followed by fixed-length coders. This suggests a convenient implementation of WZ coders as quantizers, possibly preceded by transforms, followed by Slepian-Wolf coders, analogously to the implementation of nondistributed coders.

Practical distributed lossless coding schemes have been proposed, adapting channel coding techniques such as turbo codes and low-density parity-check codes, which are approaching the Slepian-Wolf bound, e.g., [15–23]. See [24] for a much more exhaustive list.

The first studies on quantizers for WZ coding were based on high-dimensional nested lattices [25–27], or heuristically designed scalar quantizers [16,28], often applied to Gaussian sources, with fixed-length coding or entropy coding of the quantization indices. A different approach was followed in [29–32], where the Lloyd algorithm [33] was generalized for a variety of settings. In particular, [32] considered the important case of ideal Slepian-Wolf coding of the quantization indices, at a rate equal to the conditional entropy given the side information. In [34–36], nested lattice quantizers and trellis-coded quantizers followed by Slepian-Wolf coders were used to implement WZ coders.

The Karhunen-Loève Transform (KLT) [37–39] for distributed source coding was investigated in [40,41], but it was assumed that the covariance matrix of the source vector given the side information does not depend on the values of the side information, and the study was not in the context of a practical coding scheme with quantizers for distributed source coding. Very recently, the distributed KLT was studied in the context of compression of Gaussian source data, assuming that the transformed coefficients are coded at the information-theoretic rate-distortion performance [42,43]. Most of the recent experimental work on WZ coding uses transforms [44,1,24].

There is extensive literature on source coding of a noisy observation of an unseen source. The nondistributed case was studied in [45–47], and [7,48–50,8] analyzed the distributed case from an information-theoretic point of view. Using Gaussian statistics and Mean-Squared Error (MSE) as a distortion measure, [13] proved that distributed coding of two noisy observations without side information can be carried out with a performance close to that of joint coding and denoising, in the limit of small distortion and large dimension. Most of the operational work on distributed coding of noisy sources, that is, for a fixed dimension, deals with quantization design for a variety of settings [51–54], but does not consider the characterization of such quantizers at high rates, or transforms.

A key aspect in the understanding of operational coding is undoubtedly the theoretic characterization of quantizers at high rates [55], which is also fundamental in the theoretic study of transforms for data compression [56]. In the literature reviewed at this point, the studies of high-rate distributed coding are information theoretic, thereby requiring arbitrarily large dimension, among other constraints. On the other hand, the aforementioned studies of transforms applied to compression are valid only for Gaussian statistics and assume that the transformed coefficients are coded at the information-theoretic limit.

In this paper, we provide a theoretic characterization of high-rate WZ quantizers for a fixed dimension, assuming ideal Slepian-Wolf coding of the quantization indices, and we apply it to develop a theoretic analysis of orthonormal transforms for WZ coding. Both the case of coding of directly observed data, and the case of coding of a noisy observation of unseen data, are considered. We shall refer to these two cases as coding of clean sources and noisy sources, respectively. The material in this paper was presented partially in [1,2].

Section 2 presents a theoretic analysis of high-rate quantization of clean sources, and Section 3, of noisy sources. This analysis is applied to the study of transforms of the source data in Sections 4 and 5, also for clean and noisy sources, respectively. Section 6 analyzes the transformation of the side information itself. In Section 7, experimental results on a video compression scheme using WZ transform coding, and also on image denoising, are shown to illustrate the clean and noisy coding cases.

Throughout the paper, we follow the convention of using uppercase letters for random variables, including random scalars, vectors, or abstract random ensembles, and lowercase letters for the particular values that they take on. The measurable space in which a random variable takes values will be called its alphabet. Let X be a random variable, discrete, continuous or in an arbitrary alphabet, possibly vector valued. Its probability function, if it exists, will be denoted by p_X(x), whether it is a probability mass function (PMF) or a probability density function (PDF)(a). For notational convenience, the covariance operator Cov and the letter Σ will be used interchangeably. For example, the conditional covariance of X given y is the matrix function Σ_{X|Y}(y) = Cov[X|y].

(a) As a matter of fact, a PMF is a PDF with respect to the counting measure.

∗ This work was supported by NSF under Grant No. CCR-0310376. The material in this paper was presented partially at the 37th and 38th editions of the Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Nov. 2003 [1] and 2004 [2].
† Corresponding author. Tel.: +1-650-724-3647. E-mail address: [email protected] (D. Rebollo-Monedero).

2. High-Rate WZ Quantization of Clean Sources

We study the properties of high-rate quantizers for the WZ coding setting in Fig. 1. The source data to be quantized is modeled by a continuous random vector X of finite dimension n. Let the quantization function q(x) map the source data into the quantization index Q. A random variable Y, distributed in an arbitrary alphabet, discrete or continuous, plays the role of side information, available only at the receiver. The side information and the quantization index are used jointly to estimate the source data. Let X̂ represent this estimate, obtained with the reconstruction function x̂(q, y).

Fig. 1. WZ quantization.

MSE is used as a distortion measure, thus the expected distortion per sample is D = (1/n) E‖X − X̂‖². The rate for asymmetric Slepian-Wolf coding of Q given Y has been shown to be H(Q|Y) in the case when the two alphabets of the random variables involved are finite [3], in the sense that any rate greater than H(Q|Y) would allow arbitrarily low probability of decoding error, but any rate lesser than H(Q|Y) would not. In [57], the validity of this result has been generalized to countable alphabets, but it is still assumed that H(Y) < ∞. We show in Appendix A that the asymmetric Slepian-Wolf result remains true under the assumptions in this paper, namely, for any Q in a countable alphabet and any Y in an arbitrary alphabet, possibly continuous, regardless of the finiteness of H(Y). The formulation in this work assumes that the coding of the index Q with side information Y is carried out by an ideal Slepian-Wolf coder, with negligible decoding error probability and rate redundancy. The expected rate per sample is defined accordingly as R = (1/n) H(Q|Y) [32].

We emphasize that the quantizer only has access to the source data, not to the side information. However, the joint statistics of X and Y are assumed to be known, and are exploited in the design of q(x) and x̂(q, y). We consider the problem of characterizing the quantization and reconstruction functions that minimize the expected Lagrangian cost C = D + λR, with λ a nonnegative real number, for high rate R.

The theoretic results are presented in Theorem 1. The theorem holds if the Bennett assumptions [58,59] apply to the conditional PDF p_{X|Y}(x|y) for each value of the side information y, and if Gersho's conjecture [60] is true (known to be the case for n = 1), among other technical conditions, mentioned in [55]. For a rigorous treatment of high-rate theory that does not rely on Gersho's conjecture, see [61,62].

We shall use the term uniform tessellating quantizer in reference to quantizers whose quantization regions are possibly rotated versions of a common convex polytope, with equal volume. Lattice quantizers are, strictly speaking, a particular case. In the following results, Gersho's conjecture for nondistributed quantizers, which allows rotations, will be shown to imply that optimal WZ quantizers are also tessellating quantizers, and the uniformity of the cell volume will be proved as well(b). M_n denotes the minimum normalized moment of inertia of the convex polytopes tessellating R^n (e.g., M_1 = 1/12).

(b) A tessellating quantizer need not be uniform. A trivial example is a partition of the real line into intervals of different length. In one dimension, uniform tessellating quantizers are uniform lattice quantizers. It is easy to construct simple examples in R^2 of uniform and nonuniform tessellating quantizers that are not lattice quantizers using rectangles. However, the optimal nondistributed, fixed-rate quantizers for dimensions 1 and 2 are known to be lattices: the Z-lattice and the hexagonal lattice, respectively.

Theorem 1 (High-rate WZ quantization). Suppose that for each value y in the alphabet of Y, the statistics of X given Y = y are such that the conditional differential entropy h(X|y) exists and is finite. Suppose further that for each y, there exists an asymptotically optimal entropy-constrained uniform tessellating quantizer of x, q(x|y), with rate R_{X|Y}(y) and distortion D_{X|Y}(y), with no two cells assigned to the same index and with cell volume V(y) > 0, which satisfies, for large R_{X|Y}(y),

D_{X|Y}(y) ≈ M_n V(y)^{2/n},  (1)
R_{X|Y}(y) ≈ (1/n) (h(X|y) − log2 V(y)),  (2)
D_{X|Y}(y) ≈ M_n 2^{(2/n) h(X|y)} 2^{−2 R_{X|Y}(y)}.  (3)

Then, there exists an asymptotically optimal quantizer q(x) for large R, for the WZ coding setting considered, such that:

1. q(x) is a uniform tessellating quantizer with minimum moment of inertia M_n and cell volume V.
2. No two cells of the partition defined by q(x) need to be mapped into the same quantization index.
3. The rate and distortion satisfy

D ≈ M_n V^{2/n},  (4)
R ≈ (1/n) (h(X|Y) − log2 V),  (5)
D ≈ M_n 2^{(2/n) h(X|Y)} 2^{−2R}.  (6)

Proof: The proof uses the quantization setting in Fig. 2, which we shall refer to as a conditional quantizer, along with an argument of optimal rate allocation for q(x|y), where q(x|y) can be regarded as a quantizer on the values x and y taken by the source data and the side information, or a family of quantizers on x indexed by y. In this case, the side information Y is available to the sender, and the design of the quantization function q(x|y) on x, for each value y, is a nondistributed entropy-constrained quantization problem.

Fig. 2. Conditional quantizer.

More precisely, for all y define

D_{X|Y}(y) = (1/n) E[‖X − X̂‖² | y],
R_{X|Y}(y) = (1/n) H(Q|y),
C_{X|Y}(y) = D_{X|Y}(y) + λ R_{X|Y}(y).
By iterated expectation, D = E D_{X|Y}(Y) and R = E R_{X|Y}(Y), thus the overall cost satisfies C = E C_{X|Y}(Y). As a consequence, a family of quantizers q(x|y) minimizing C_{X|Y}(y) for each y also minimizes C.

Since C_{X|Y}(y) is a convex function of R_{X|Y}(y) for all y, it has a global minimum where its derivative vanishes, or equivalently, at R_{X|Y}(y) such that λ ≈ 2 ln 2 D_{X|Y}(y). Suppose that λ is small enough for R_{X|Y}(y) to be large and for the approximations (1)-(3) to hold, for each y. Then, all quantizers q(x|y) introduce the same distortion (proportional to λ) and consequently have a common cell volume V(y) ≈ V. This, together with the fact that E_Y [h(X|y)]_{y=Y} = h(X|Y), implies (4)-(6). Provided that a translation of the partition defined by q(x|y) affects neither the distortion nor the rate, all uniform tessellating quantizers q(x|y) may be set to be (approximately) the same, which we denote by q(x). Since none of the quantizers q(x|y) maps two cells into the same indices, neither does q(x). Now, since q(x) is asymptotically optimal for the conditional quantizer and does not depend on y, it is also optimal for the WZ quantizer in Fig. 1.

Equation (6) means that, asymptotically, there is no loss in performance by not having access to the side information in the quantization.

Corollary 2 (High-rate WZ reconstruction). Under the hypotheses of Theorem 1, asymptotically, there is a quantizer that leads to no loss in performance by ignoring the side information in the reconstruction.

Proof: Since index repetition is not required, the distortion (4) would be asymptotically the same if the reconstruction x̂(q, y) were of the form x̂(q) = E[X|q].

Corollary 3. Let X and Y be jointly Gaussian random vectors. Then, the conditional covariance Σ_{X|Y} does not depend on y, and for large R,

D ≈ M_n 2πe (det Σ_{X|Y})^{1/n} 2^{−2R} → (det Σ_{X|Y})^{1/n} 2^{−2R} as n → ∞.

Proof: Use h(X|Y) = (1/2) log2 ((2πe)^n det Σ_{X|Y}) and M_n → 1/(2πe) [63], together with Theorem 1.

3. High-Rate WZ Quantization of Noisy Sources

In this section, we study the properties of high-rate quantizers of a noisy source with side information at the decoder, as illustrated in Fig. 3, which we shall refer to as WZ quantizers of a noisy source. A noisy observation Z of some unseen source data X is quantized at the encoder. The quantizer q(z) maps the observation into a quantization index Q. The quantization index is losslessly coded, and used jointly with some side information Y, available only at the decoder, to obtain an estimate X̂ of the unseen source data. x̂(q, y) denotes the reconstruction function at the decoder. X, Y and Z are random variables with known joint distribution, such that X is a continuous random vector of finite dimension n. No restrictions are imposed on the alphabets of Y and Z.

Fig. 3. WZ quantization of a noisy source.

MSE is used as a distortion measure, thus the expected distortion per sample of the unseen source is D = (1/n) E‖X − X̂‖². As in the previous section, it is assumed that the coding of the index Q is carried out by an ideal Slepian-Wolf coder, at rate per sample R = (1/n) H(Q|Y). We emphasize that the quantizer only has access to the observation, not to the source data or the side information. However, the joint statistics of X, Y and Z can be exploited in the design of q(z) and x̂(q, y). We consider the problem of characterizing the quantizers and reconstruction functions that minimize the expected Lagrangian cost C = D + λR, with λ a nonnegative real number, for high rate R. This includes the problem in the previous section as the particular case Z = X.
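As a quick numerical illustration of (6) and Corollary 3 in the scalar case (n = 1, M_1 = 1/12), the sketch below evaluates the high-rate approximation for jointly Gaussian X and Y; the variance and correlation values are illustrative assumptions, not taken from the paper.

```python
import math

# Scalar Gaussian illustration of Eq. (6): D ~ M_n 2^{(2/n) h(X|Y)} 2^{-2R}.
# For n = 1, M_1 = 1/12 and h(X|Y) = (1/2) log2(2*pi*e * var(X|Y)).
# All parameter values below are illustrative assumptions.

def high_rate_wz_distortion(var_x_given_y: float, rate: float) -> float:
    """High-rate WZ distortion approximation of Eq. (6) for n = 1."""
    h_x_given_y = 0.5 * math.log2(2 * math.pi * math.e * var_x_given_y)
    return (1.0 / 12.0) * 2 ** (2 * h_x_given_y) * 2 ** (-2 * rate)

var_x = 1.0                              # source variance (assumed)
rho = 0.9                                # correlation of X and Y (assumed)
var_x_given_y = var_x * (1 - rho ** 2)   # conditional variance, Gaussian case

R = 4.0                                  # rate in bits per sample (assumed)
d_wz = high_rate_wz_distortion(var_x_given_y, R)  # decoder exploits Y
d_blind = high_rate_wz_distortion(var_x, R)       # Y ignored entirely

# Closed form of Corollary 3 for n = 1: D ~ (pi*e/6) * var(X|Y) * 2^{-2R}.
assert abs(d_wz - (math.pi * math.e / 6) * var_x_given_y * 2 ** (-2 * R)) < 1e-12

# Side information at the decoder scales distortion by the factor 1 - rho^2.
assert abs(d_wz / d_blind - (1 - rho ** 2)) < 1e-9
```

Note that the quantizer itself never sees Y: under ideal Slepian-Wolf coding, the entire gain enters through the conditional entropy h(X|Y) in the rate.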
3.1. Nondistributed Case

We start by considering the simpler case of quantization of a noisy source without side information, depicted in Fig. 4. The following theorem extends the main result of [46,47] to entropy-constrained quantization, valid for any rate R = H(Q), not necessarily high. Define x̄(z) = E[X|z], the best MSE estimator of X given Z, and X̄ = x̄(Z).

Fig. 4. Quantization of a noisy source without side information.

Theorem 4 (MSE noisy quantization). For any nonnegative λ and any Lagrangian-cost optimal quantizer of a noisy source without side information (Fig. 4), there exists an implementation with the same cost in two steps:

1. Obtain the minimum MSE estimate X̄.
2. Quantize the estimate X̄ regarded as a clean source, using a quantizer q(x̄) and a reconstruction function x̂(q), minimizing E‖X̄ − X̂‖² + λ H(Q).

This is illustrated in Fig. 5. Furthermore, the total distortion per sample is

D = (1/n) (E tr Cov[X|Z] + E‖X̄ − X̂‖²),  (7)

where the first term is the MSE of the estimation step.

Fig. 5. Optimal implementation of MSE quantization of a noisy source without side information.

Proof: The proof is a modification of that in [47], replacing distortion by Lagrangian cost. Define the modified distortion measure d̃(z, x̂) = E[‖X − x̂‖² | z]. Since X ↔ Z ↔ X̂, it is easy to show that E‖X − X̂‖² = E d̃(Z, X̂). By the orthogonality principle of linear estimation,

d̃(z, x̂) = E[‖X − x̄(z)‖² | z] + ‖x̄(z) − x̂‖².

Take expectation to obtain (7). Note that the first term of (7) does not depend on the quantization design, and the second is the MSE between X̄ and X̂.

Let r(q) be the codeword length function of a uniquely decodable code, that is, satisfying Σ_q 2^{−r(q)} ≤ 1, with R = E r(Q). The Lagrangian cost of the setting in Fig. 4 can be written as

C = (1/n) (E tr Cov[X|Z] + inf_{x̂(q), r(q)} E inf_q {‖x̄(Z) − x̂(q)‖² + λ r(q)}),

and the cost of the setting in Fig. 5 as

C = (1/n) (E tr Cov[X|Z] + inf_{x̂(q), r(q)} E inf_q {‖X̄ − x̂(q)‖² + λ r(q)}),

which give the same result. Now, since the expected rate is minimized for the (admissible) rate measure r(q) = − log p_Q(q) and E r(Q) = H(Q), both settings give the same Lagrangian cost with a rate equal to the entropy.

Similarly to the remarks on Theorem 1, the hypotheses of the next theorem are believed to hold if the Bennett assumptions apply to the PDF p_X̄(x̄) of the MSE estimate, and if Gersho's conjecture is true, among other technical conditions.

Theorem 5 (High-rate noisy quantization). Assume that h(X̄) < ∞ and that there exists a uniform tessellating quantizer q(x̄) of X̄ with cell volume V that is asymptotically optimal in Lagrangian cost at high rates. Then, there exists an asymptotically optimal quantizer q(z) of a noisy source in the setting of Fig. 4 such that:

1. An asymptotically optimal implementation of q(z) is that of Theorem 4, represented in Fig. 5, with a uniform tessellating quantizer q(x̄) having cell volume V.
2. The rate and distortion per sample satisfy

D ≈ (1/n) E tr Cov[X|Z] + M_n V^{2/n},
R ≈ (1/n) (h(X̄) − log2 V),
D ≈ (1/n) E tr Cov[X|Z] + M_n 2^{(2/n) h(X̄)} 2^{−2R}.
Proof: Immediate from Theorem 4 and conventional theory of high-rate quantization of clean sources.

3.2. Distributed Case

We are now ready to consider the WZ quantization of a noisy source in Fig. 3. Define x̄(y, z) = E[X|y, z], the best MSE estimator of X given Y and Z, X̄ = x̄(Y, Z), and D_∞ = (1/n) E tr Cov[X|Y, Z]. The following theorem extends the results on high-rate WZ quantization in Section 2 to noisy sources. The remark on the hypotheses of Theorem 5 also applies here, where the Bennett assumptions apply instead to the conditional PDF p_{X̄|Y}(x̄|y) for each y.

Theorem 6 (High-rate noisy WZ quantization). Suppose that the conditional expectation function x̄(y, z) is additively separable, i.e., x̄(y, z) = x̄_Y(y) + x̄_Z(z), and define X̄_Z = x̄_Z(Z). Suppose further that for each value y in the alphabet of Y, h(X̄|y) < ∞, and that there exists a uniform tessellating quantizer q(x̄, y) of X̄, with no two cells assigned to the same index and cell volume V(y) > 0, with rate R_{X̄|Y}(y) and distortion D_{X̄|Y}(y), such that, at high rates, it is asymptotically optimal in Lagrangian cost and

D_{X̄|Y}(y) ≈ M_n V(y)^{2/n},
R_{X̄|Y}(y) ≈ (1/n) (h(X̄|y) − log2 V(y)),
D_{X̄|Y}(y) ≈ M_n 2^{(2/n) h(X̄|y)} 2^{−2 R_{X̄|Y}(y)}.

Then, there exists an asymptotically optimal quantizer q(z) for large R, for the WZ quantization setting represented in Fig. 3, such that:

1. q(z) can be implemented as an estimator x̄_Z(z) followed by a uniform tessellating quantizer q(x̄_Z) with cell volume V.
2. No two cells of the partition defined by q(x̄_Z) need to be mapped into the same quantization index.
3. The rate and distortion per sample satisfy

D ≈ D_∞ + M_n V^{2/n},  (8)
R ≈ (1/n) (h(X̄|Y) − log2 V),  (9)
D ≈ D_∞ + M_n 2^{(2/n) h(X̄|Y)} 2^{−2R}.  (10)

4. h(X̄|Y) = h(X̄_Z|Y).

Proof: The proof is similar to that for clean sources in Theorem 1, and only the differences are emphasized. First, as in the proof of WZ quantization of a clean source, a conditional quantization setting is considered, as represented in Fig. 6. An entirely analogous argument using conditional costs, as defined in the proof for clean sources, implies that the optimal conditional quantizer is an optimal conventional quantizer for each value of y. Therefore, using statistics conditioned on y everywhere, by Theorem 4, the optimal conditional quantizer can be implemented as in Fig. 7, with conditional costs

D_{X|Y}(y) ≈ (1/n) E[tr Cov[X|y, Z] | y] + M_n V(y)^{2/n},
R_{X|Y}(y) ≈ (1/n) (h(X̄|y) − log2 V(y)),
D_{X|Y}(y) ≈ (1/n) E[tr Cov[X|y, Z] | y] + M_n 2^{(2/n) h(X̄|y)} 2^{−2 R_{X|Y}(y)}.

Fig. 6. Conditional quantization of a noisy source.

Fig. 7. Optimal implementation of MSE conditional quantization of a noisy source.

The derivative of C_{X|Y}(y) with respect to R_{X|Y}(y) vanishes when λ ≈ 2 ln 2 M_n V(y)^{2/n}, which, as in the proof for clean sources, implies that all conditional quantizers have a common cell volume V(y) ≈ V (however, only the second term of the distortion is constant, not the overall distortion). Taking expectation of the conditional costs proves that (8) and (9) are valid for the conditional quantizer of Fig. 7. The validity of (10) for the conditional quantizer can be shown by solving for V in (9) and substituting the result into (8).

The assumption that x̄(y, z) = x̄_Y(y) + x̄_Z(z) means that for two values of y, y1 and y2, x̄(y1, z) and x̄(y2, z), seen as functions of z, differ only by a constant vector. Since the conditional quantizer of X̄, q(x̄|y), is a uniform tessellating quantizer at high rates, a translation will neither affect the distortion nor the rate, and therefore x̄(y, z) can be replaced by x̄_Z(z) with no impact on the Lagrangian cost. In addition, since all conditional quantizers have a common cell volume, the same translation argument implies that a common unconditional quantizer q(x̄_Z) can be used instead, with performance given by (8)-(10), and since conditional quantizers do not reuse indices, neither does the common unconditional quantizer. The last item of the theorem follows from the fact that h(x̄_Y(y) + X̄_Z | y) = h(X̄_Z | y).

Clearly, the theorem is a generalization of Theorem 1, since Z = X implies x̄(y, z) = x̄_Z(z) = z, trivially additively separable. The case in which X can be written as X = f(Y) + g(Z) + N, for any (measurable) functions f, g and any random variable N with E[N|y, z] constant with (y, z), gives an example of additively separable estimator. This includes the case in which X, Y and Z are jointly Gaussian. Furthermore, in the Gaussian case, since x̄_Z(z) is an affine function and q(x̄_Z) is a uniform tessellating quantizer, the overall quantizer q(x̄_Z(z)) is also a uniform tessellating quantizer, and if Y and Z are uncorrelated, then x̄_Y(y) = E[X|y] and x̄_Z(z) = E[X|z], but not in general.

Observe that, according to the theorem, if the estimator x̄(y, z) is additively separable, there is no asymptotic loss in performance by not using the side information at the encoder.

Corollary 7. Assume the hypotheses of Theorem 6, and that the optimal reconstruction levels x̄̂(q, y) for each of the conditional quantizers q(x̄, y) are simply the centroids of the quantization cells for a uniform distribution. Then, there is a WZ quantizer q(x̄_Z) that leads to no asymptotic loss in performance if the reconstruction function is x̂(q, y) = x̄̂_Z(q) + x̄_Y(y), where x̄̂_Z(q) are the centroids of q(x̄_Z).

Proof: In the proof of Theorem 6, q(x̄_Z) is a uniform tessellating quantizer without index repetition, a translated copy of q(x̄, y).

Theorem 6 and Corollary 7 show that the WZ quantization setting of Fig. 3 can be implemented as depicted in Fig. 8, where x̄̂_Z(q, y) can be made independent from y without asymptotic loss in performance, so that the pair q(x̄_Z), x̄̂_Z(q) form a uniform tessellating quantizer and reconstructor for X̄_Z.

Fig. 8. Asymptotically optimal implementation of MSE WZ quantization of a noisy source with additively separable x̄(y, z).

Finally, if x̄_Z(z) is a bijective vector field, then, under mild conditions, including continuous differentiability of x̄_Z(z) and its inverse, it can be shown that h(X̄|Y) in Theorem 6 satisfies

h(X̄|Y) = h(Z|Y) + E log2 |det (dx̄_Z/dz)(Z)|,

where dx̄_Z(z)/dz denotes the Jacobian matrix of x̄_Z(z).

4. WZ Transform Coding of Clean Sources

The following intermediate definitions and results will be useful to analyze orthonormal transforms for WZ coding. Define the geometric expectation of a positive random scalar S as G S = b^{E log_b S}, for any positive real b different from 1. Note that if S were discrete with PMF p_S(s), then G S = Π_s s^{p_S(s)}. The constant factor in the rate-distortion approximation (3) can be expressed as

M_n 2^{(2/n) h(X|y)} = ε²_{X|Y}(y) (det Σ_{X|Y}(y))^{1/n},

where ε²_{X|Y}(y) depends only on M_n and p_{X|Y}(x|y), normalized with covariance identity. If h(X|y) is finite, then ε²_{X|Y}(y) (det Σ_{X|Y}(y))^{1/n} > 0, and since G_Y [2^{(2/n) [h(X|y)]_{y=Y}}] = 2^{(2/n) h(X|Y)}, (6) is equivalent to

D ≈ G[ε²_{X|Y}(Y)] G[(det Σ_{X|Y}(Y))^{1/n}] 2^{−2R}.

We are now ready to consider the transform coding setting in Fig. 9. Let X = (X1, ..., Xn) … estimate of X given Y, i.e., E[X|Y]. In fact, the orthogonality principle of conditional estimation implies

Σ̄_{X|Y} + Cov E[X|Y] = Cov X,

thus Σ̄_{X|Y} ≼ Cov X, with equality if and only if E[X|Y] is a constant with probability 1.

[Fig. 9: transform coding setting, with transform T, subband quantizers q_i', Slepian-Wolf coders (SWC), subband reconstructions x̂_i', and inverse transform U.]

Theorem 8 (WZ transform coding). Assume R_i large, so that the results for high-rate approximation of Theorem 1 can be applied to each subband in Fig. 9, i.e.,