High-Rate Quantization and Transform Coding with Side Information at the Decoder∗

David Rebollo-Monedero†, Shantanu Rane, Anne Aaron and Bernd Girod
Information Systems Lab., Dept. of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
Abstract

We extend high-rate quantization theory to Wyner-Ziv coding, i.e., lossy source coding with side information at the decoder. Ideal Slepian-Wolf coders are assumed, thus rates are conditional entropies of quantization indices given the side information. This theory is applied to the analysis of orthonormal block transforms for Wyner-Ziv coding. A formula for the optimal rate allocation and an approximation to the optimal transform are derived. The case of noisy high-rate quantization and transform coding is included in our study, in which a noisy observation of source data is available at the encoder, but we are interested in estimating the unseen data at the decoder, with the help of side information. We implement a transform-domain Wyner-Ziv video coder that encodes frames independently but decodes them conditionally. Experimental results show that using the discrete cosine transform yields a rate-distortion improvement over the pixel-domain coder. Transform coders of noisy images under different communication constraints are compared. Experimental results show that the noisy Wyner-Ziv transform coder achieves a performance close to the case in which the side information is also available at the encoder.
Keywords: high-rate quantization, transform coding, side information, Wyner-Ziv coding, distributed source coding, noisy source coding
1. Introduction

Rate-distortion theory for distributed source coding [3–6] shows that under certain conditions, the performance of coders with side information available only at the decoder is close to the case in which both encoder and decoder have access to the side information. Under much more restrictive statistical conditions, this also holds for coding of noisy observations of unseen data [7,8]. One of the many applications of this result is reducing the complexity of video encoders by eliminating motion compensation, and decoding using past frames as side information, while keeping the efficiency close to that of motion-compensated encoding [9–11]. In addition, even if the image captured by the video encoder is corrupted by noise, we would still wish to recover the clean, unseen data at the decoder, with the help of side information, consisting of previously decoded frames, and perhaps some additional local noisy image.

In these examples, due to complexity constraints in the design of the encoder, or simply due to the unavailability of the side information at the encoder, conventional, joint denoising and coding techniques are not possible. We need practical systems for noisy source coding with decoder side information, capable of the rate-distortion performance predicted by information-theoretic studies. To this end, it is crucial to extend the building blocks of traditional source coding and denoising, such as lossless coding, quantization, transform coding and estimation, to distributed source coding.

It was shown by Slepian and Wolf [3] that lossless distributed coding can achieve the same performance as joint coding. Soon after, Wyner and Ziv [4,12] established the rate-distortion limits for lossy coding with side information at the decoder, which we shall refer to as Wyner-Ziv (WZ) coding. Later, an upper bound on the rate loss due to the unavailability of the side information at the encoder was found in [5], which also proved that for power-difference distortion measures and smooth source probability distributions, this rate loss vanishes in the limit of small distortion. A similar high-resolution result was obtained in [13] for distributed coding of several sources without side information, also from an information-theoretic perspective, that is, for arbitrarily large dimension. In [14] (unpublished), it was shown that tessellating quantizers followed by Slepian-Wolf coders are asymptotically optimal in the limit of small distortion and large dimension.

It may be concluded from the proof of the converse to the WZ rate-distortion theorem [4] that there is no asymptotic loss in performance by considering block codes of sufficiently large length, which may be seen as vector quantizers, followed by fixed-length coders. This suggests a convenient implementation of WZ coders as quantizers, possibly preceded by transforms, followed by Slepian-Wolf coders, analogously to the implementation of nondistributed coders.

Practical distributed lossless coding schemes have been proposed, adapting channel coding techniques such as turbo codes and low-density parity-check codes, which are approaching the Slepian-Wolf bound, e.g., [15–23]. See [24] for a much more exhaustive list.

The first studies on quantizers for WZ coding were based on high-dimensional nested lattices [25–27], or heuristically designed scalar quantizers [16,28], often applied to Gaussian sources, with fixed-length coding or entropy coding of the quantization indices. A different approach was followed in [29–32], where the Lloyd algorithm [33] was generalized for a variety of settings. In particular, [32] considered the important case of ideal Slepian-Wolf coding of the quantization indices, at a rate equal to the conditional entropy given the side information. In [34–36], nested lattice quantizers and trellis-coded quantizers followed by Slepian-Wolf coders were used to implement WZ coders.

The Karhunen-Loève Transform (KLT) [37–39] for distributed source coding was investigated in [40,41], but it was assumed that the covariance matrix of the source vector given the side information does not depend on the values of the side information, and the study was not in the context of a practical coding scheme with quantizers for distributed source coding. Very recently, the distributed KLT was studied in the context of compression of Gaussian source data, assuming that the transformed coefficients are coded at the information-theoretic rate-distortion performance [42,43]. Most of the recent experimental work on WZ coding uses transforms [44,1,24].

There is extensive literature on source coding of a noisy observation of an unseen source. The nondistributed case was studied in [45–47], and [7,48–50,8] analyzed the distributed case from an information-theoretic point of view. Using Gaussian statistics and Mean-Squared Error (MSE) as a distortion measure, [13] proved that distributed coding of two noisy observations without side information can be carried out with a performance close to that of joint coding and denoising, in the limit of small distortion and large dimension. Most of the operational work on distributed coding of noisy sources, that is, for a fixed dimension, deals with quantization design for a variety of settings [51–54], but does not consider the characterization of such quantizers at high rates, or transforms.

A key aspect in the understanding of operational coding is undoubtedly the theoretic characterization of quantizers at high rates [55], which is also fundamental in the theoretic study of transforms for data compression [56]. In the literature reviewed at this point, the studies of high-rate distributed coding are information theoretic, thereby requiring arbitrarily large dimension, among other constraints. On the other hand, the aforementioned studies of transforms applied to compression are valid only for Gaussian statistics and assume that the transformed coefficients are coded at the information-theoretic limit.

In this paper, we provide a theoretic characterization of high-rate WZ quantizers for a fixed dimension, assuming ideal Slepian-Wolf coding of the quantization indices, and we apply it to develop a theoretic analysis of orthonormal transforms for WZ coding. Both the case of coding of directly observed data, and the case of coding of a noisy observation of unseen data, are considered. We shall refer to these two cases as coding of clean sources and noisy sources, respectively. The material in this paper was presented partially in [1,2].

Section 2 presents a theoretic analysis of high-rate quantization of clean sources, and Section 3, of noisy sources. This analysis is applied to the study of transforms of the source data in Sections 4 and 5, also for clean and noisy sources, respectively. Section 6 analyzes the transformation of the side information itself. In Section 7, experimental results on a video compression scheme using WZ transform coding, and also on image denoising, are shown to illustrate the clean and noisy coding cases.

Throughout the paper, we follow the convention of using uppercase letters for random variables, including random scalars, vectors, or abstract random ensembles, and lowercase letters for the particular values that they take on. The measurable space in which a random variable takes values will be called its alphabet. Let X be a random variable, discrete, continuous or in an arbitrary alphabet, possibly vector valued. Its probability function, if it exists, will be denoted by p_X(x), whether it is a probability mass function (PMF) or a probability density function (PDF)(a). For notational convenience, the covariance operator Cov and the letter Σ will be used interchangeably. For example, the conditional covariance of X given y is the matrix function Σ_{X|Y}(y) = Cov[X|y].

(a) As a matter of fact, a PMF is a PDF with respect to the counting measure.

∗ This work was supported by NSF under Grant No. CCR-0310376. The material in this paper was presented partially at the 37th and 38th editions of the Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Nov. 2003 [1] and 2004 [2].
† Corresponding author. Tel.: +1-650-724-3647. E-mail address: [email protected] (D. Rebollo-Monedero).

2. High-Rate WZ Quantization of Clean Sources

We study the properties of high-rate quantizers for the WZ coding setting in Fig. 1. The source data to be quantized is modeled by a continuous random vector X of finite dimension n. Let the quantization function q(x) map the source data into the quantization index Q. A random variable Y, distributed in an arbitrary alphabet, discrete or continuous, plays the role of side information, available only at the receiver. The side information and the quantization index are used jointly to estimate the source data. Let X̂ represent this estimate, obtained with the reconstruction function x̂(q, y).

Fig. 1. WZ quantization.

MSE is used as a distortion measure, thus the expected distortion per sample is D = (1/n) E‖X − X̂‖². The rate for asymmetric Slepian-Wolf coding of Q given Y has been shown to be H(Q|Y) in the case when the two alphabets of the random variables involved are finite [3], in the sense that any rate greater than H(Q|Y) would allow arbitrarily low probability of decoding error, but any rate lesser than H(Q|Y) would not. In [57], the validity of this result has been generalized to countable alphabets, but it is still assumed that H(Y) < ∞. We show in Appendix A that the asymmetric Slepian-Wolf result remains true under the assumptions in this paper, namely, for any Q in a countable alphabet and any Y in an arbitrary alphabet, possibly continuous, regardless of the finiteness of H(Y). The formulation in this work assumes that the coding of the index Q with side information Y is carried out by an ideal Slepian-Wolf coder, with negligible decoding error probability and rate redundancy. The expected rate per sample is defined accordingly as R = (1/n) H(Q|Y) [32].

We emphasize that the quantizer only has access to the source data, not to the side information. However, the joint statistics of X and Y are assumed to be known, and are exploited in the design of q(x) and x̂(q, y). We consider the problem of characterizing the quantization and reconstruction functions that minimize the expected Lagrangian cost C = D + λR, with λ a nonnegative real number, for high rate R.

The theoretic results are presented in Theorem 1. The theorem holds if the Bennett assumptions [58,59] apply to the conditional PDF p_{X|Y}(x|y) for each value of the side information y, and if Gersho's conjecture [60] is true (known to be the case for n = 1), among other technical conditions, mentioned in [55]. For a rigorous treatment of high-rate theory that does not rely on Gersho's conjecture, see [61,62].

We shall use the term uniform tessellating quantizer in reference to quantizers whose quantization regions are possibly rotated versions of a common convex polytope, with equal volume. Lattice quantizers are, strictly speaking, a particular case. In the following results, Gersho's conjecture for nondistributed quantizers, which allows rotations, will be shown to imply that optimal WZ quantizers are also tessellating quantizers, and the uniformity of the cell volume will be proved as well(b). M_n denotes the minimum normalized moment of inertia of the convex polytopes tessellating R^n (e.g., M_1 = 1/12).

(b) A tessellating quantizer need not be uniform. A trivial example is a partition of the real line into intervals of different length. In one dimension, uniform tessellating quantizers are uniform lattice quantizers. It is easy to construct simple examples in R^2 of uniform and nonuniform tessellating quantizers that are not lattice quantizers using rectangles. However, the optimal nondistributed, fixed-rate quantizers for dimensions 1 and 2 are known to be lattices: the Z-lattice and the hexagonal lattice, respectively.

Theorem 1 (High-rate WZ quantization). Suppose that for each value y in the alphabet of Y, the statistics of X given Y = y are such that the conditional differential entropy h(X|y) exists and is finite. Suppose further that for each y, there exists an asymptotically optimal entropy-constrained uniform tessellating quantizer of x, q(x|y), with rate R_{X|Y}(y) and distortion D_{X|Y}(y), with no two cells assigned to the same index and with cell volume V(y) > 0, which satisfies, for large R_{X|Y}(y),

D_{X|Y}(y) ≈ M_n V(y)^{2/n},  (1)
R_{X|Y}(y) ≈ (1/n) (h(X|y) − log2 V(y)),  (2)
D_{X|Y}(y) ≈ M_n 2^{(2/n) h(X|y)} 2^{−2 R_{X|Y}(y)}.  (3)

Then, there exists an asymptotically optimal quantizer q(x) for large R, for the WZ coding setting considered, such that:

1. q(x) is a uniform tessellating quantizer with minimum moment of inertia M_n and cell volume V.
2. No two cells of the partition defined by q(x) need to be mapped into the same quantization index.
3. The rate and distortion satisfy

D ≈ M_n V^{2/n},  (4)
R ≈ (1/n) (h(X|Y) − log2 V),  (5)
D ≈ M_n 2^{(2/n) h(X|Y)} 2^{−2R}.  (6)

Proof: The proof uses the quantization setting in Fig. 2, which we shall refer to as a conditional quantizer, along with an argument of optimal rate allocation for q(x|y), where q(x|y) can be regarded as a quantizer on the values x and y taken by the source data and the side information, or a family of quantizers on x indexed by y. In this case, the side information Y is available to the sender, and the design of the quantization function q(x|y) on x, for each value y, is a nondistributed entropy-constrained quantization problem.

Fig. 2. Conditional quantizer.

More precisely, for all y define

D_{X|Y}(y) = (1/n) E[‖X − X̂‖² | y],
R_{X|Y}(y) = (1/n) H(Q|y),
C_{X|Y}(y) = D_{X|Y}(y) + λ R_{X|Y}(y).
By iterated expectation, D = E D_{X|Y}(Y) and R = E R_{X|Y}(Y), thus the overall cost satisfies C = E C_{X|Y}(Y). As a consequence, a family of quantizers q(x|y) minimizing C_{X|Y}(y) for each y also minimizes C.

Since C_{X|Y}(y) is a convex function of R_{X|Y}(y) for all y, it has a global minimum where its derivative vanishes, or equivalently, at R_{X|Y}(y) such that λ ≈ 2 ln 2 D_{X|Y}(y). Suppose that λ is small enough for R_{X|Y}(y) to be large and for the approximations (1)-(3) to hold, for each y. Then, all quantizers q(x|y) introduce the same distortion (proportional to λ) and consequently have a common cell volume V(y) ≈ V. This, together with the fact that E_Y [h(X|y)]_{y=Y} = h(X|Y), implies (4)-(6). Provided that a translation of the partition defined by q(x|y) affects neither the distortion nor the rate, all uniform tessellating quantizers q(x|y) may be set to be (approximately) the same, which we denote by q(x). Since none of the quantizers q(x|y) maps two cells into the same indices, neither does q(x). Now, since q(x) is asymptotically optimal for the conditional quantizer and does not depend on y, it is also optimal for the WZ quantizer in Fig. 1.

Equation (6) means that, asymptotically, there is no loss in performance by not having access to the side information in the quantization.

Corollary 2 (High-rate WZ reconstruction). Under the hypotheses of Theorem 1, asymptotically, there is a quantizer that leads to no loss in performance by ignoring the side information in the reconstruction.

Proof: Since index repetition is not required, the distortion (4) would be asymptotically the same if the reconstruction x̂(q, y) were of the form x̂(q) = E[X|q].

Corollary 3. Let X and Y be jointly Gaussian random vectors. Then, the conditional covariance Σ_{X|Y} does not depend on y, and for large R,

D ≈ M_n 2πe (det Σ_{X|Y})^{1/n} 2^{−2R} → (det Σ_{X|Y})^{1/n} 2^{−2R} as n → ∞.

Proof: Use h(X|Y) = (1/2) log2 ((2πe)^n det Σ_{X|Y}) and M_n → 1/(2πe) [63], together with Theorem 1.

3. High-Rate WZ Quantization of Noisy Sources

In this section, we study the properties of high-rate quantizers of a noisy source with side information at the decoder, as illustrated in Fig. 3, which we shall refer to as WZ quantizers of a noisy source. A noisy observation Z of some unseen source data X is quantized at the encoder. The quantizer q(z) maps the observation into a quantization index Q. The quantization index is losslessly coded, and used jointly with some side information Y, available only at the decoder, to obtain an estimate X̂ of the unseen source data. x̂(q, y) denotes the reconstruction function at the decoder. X, Y and Z are random variables with known joint distribution, such that X is a continuous random vector of finite dimension n. No restrictions are imposed on the alphabets of Y and Z.

Fig. 3. WZ quantization of a noisy source.

MSE is used as a distortion measure, thus the expected distortion per sample of the unseen source is D = (1/n) E‖X − X̂‖². As in the previous section, it is assumed that the coding of the index Q is carried out by an ideal Slepian-Wolf coder, at rate per sample R = (1/n) H(Q|Y). We emphasize that the quantizer only has access to the observation, not to the source data or the side information. However, the joint statistics of X, Y and Z can be exploited in the design of q(z) and x̂(q, y). We consider the problem of characterizing the quantizers and reconstruction functions that minimize the expected Lagrangian cost C = D + λR, with λ a nonnegative real number, for high rate R. This includes the problem in the previous section as the particular case Z = X.
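As a quick numerical illustration of (6) and Corollary 3 in the scalar case (n = 1, M_1 = 1/12), the sketch below evaluates the high-rate approximation for jointly Gaussian X and Y; the variance and correlation values are illustrative assumptions, not taken from the paper.

```python
import math

# Scalar Gaussian illustration of Eq. (6): D ~ M_n 2^{(2/n) h(X|Y)} 2^{-2R}.
# For n = 1, M_1 = 1/12 and h(X|Y) = (1/2) log2(2*pi*e * var(X|Y)).
# All parameter values below are illustrative assumptions.

def high_rate_wz_distortion(var_x_given_y: float, rate: float) -> float:
    """High-rate WZ distortion approximation of Eq. (6) for n = 1."""
    h_x_given_y = 0.5 * math.log2(2 * math.pi * math.e * var_x_given_y)
    return (1.0 / 12.0) * 2 ** (2 * h_x_given_y) * 2 ** (-2 * rate)

var_x = 1.0                              # source variance (assumed)
rho = 0.9                                # correlation of X and Y (assumed)
var_x_given_y = var_x * (1 - rho ** 2)   # conditional variance, Gaussian case

R = 4.0                                  # rate in bits per sample (assumed)
d_wz = high_rate_wz_distortion(var_x_given_y, R)  # decoder exploits Y
d_blind = high_rate_wz_distortion(var_x, R)       # Y ignored entirely

# Closed form of Corollary 3 for n = 1: D ~ (pi*e/6) * var(X|Y) * 2^{-2R}.
assert abs(d_wz - (math.pi * math.e / 6) * var_x_given_y * 2 ** (-2 * R)) < 1e-12

# Side information at the decoder scales distortion by the factor 1 - rho^2.
assert abs(d_wz / d_blind - (1 - rho ** 2)) < 1e-9
```

Note that the quantizer itself never sees Y: under ideal Slepian-Wolf coding, the entire gain enters through the conditional entropy h(X|Y) in the rate.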
3.1. Nondistributed Case

We start by considering the simpler case of quantization of a noisy source without side information, depicted in Fig. 4. The following theorem extends the main result of [46,47] to entropy-constrained quantization, valid for any rate R = H(Q), not necessarily high. Define x̄(z) = E[X|z], the best MSE estimator of X given Z, and X̄ = x̄(Z).

Fig. 4. Quantization of a noisy source without side information.

Theorem 4 (MSE noisy quantization). For any nonnegative λ and any Lagrangian-cost optimal quantizer of a noisy source without side information (Fig. 4), there exists an implementation with the same cost in two steps:

1. Obtain the minimum MSE estimate X̄.
2. Quantize the estimate X̄ regarded as a clean source, using a quantizer q(x̄) and a reconstruction function x̂(q), minimizing E‖X̄ − X̂‖² + λ H(Q).

This is illustrated in Fig. 5. Furthermore, the total distortion per sample is

D = (1/n) (E tr Cov[X|Z] + E‖X̄ − X̂‖²),  (7)

where the first term is the MSE of the estimation step.

Fig. 5. Optimal implementation of MSE quantization of a noisy source without side information.

Proof: The proof is a modification of that in [47], replacing distortion by Lagrangian cost. Define the modified distortion measure d̃(z, x̂) = E[‖X − x̂‖² | z]. Since X ↔ Z ↔ X̂, it is easy to show that E‖X − X̂‖² = E d̃(Z, X̂). By the orthogonality principle of linear estimation,

d̃(z, x̂) = E[‖X − x̄(z)‖² | z] + ‖x̄(z) − x̂‖².

Take expectation to obtain (7). Note that the first term of (7) does not depend on the quantization design, and the second is the MSE between X̄ and X̂.

Let r(q) be the codeword length function of a uniquely decodable code, that is, satisfying Σ_q 2^{−r(q)} ≤ 1, with R = E r(Q). The Lagrangian cost of the setting in Fig. 4 can be written as

C = (1/n) (E tr Cov[X|Z] + inf_{x̂(q), r(q)} E inf_q {‖x̄(Z) − x̂(q)‖² + λ r(q)}),

and the cost of the setting in Fig. 5 as

C = (1/n) (E tr Cov[X|Z] + inf_{x̂(q), r(q)} E inf_q {‖X̄ − x̂(q)‖² + λ r(q)}),

which give the same result. Now, since the expected rate is minimized for the (admissible) rate measure r(q) = − log p_Q(q) and E r(Q) = H(Q), both settings give the same Lagrangian cost with a rate equal to the entropy.

Similarly to the remarks on Theorem 1, the hypotheses of the next theorem are believed to hold if the Bennett assumptions apply to the PDF p_X̄(x̄) of the MSE estimate, and if Gersho's conjecture is true, among other technical conditions.

Theorem 5 (High-rate noisy quantization). Assume that h(X̄) < ∞ and that there exists a uniform tessellating quantizer q(x̄) of X̄ with cell volume V that is asymptotically optimal in Lagrangian cost at high rates. Then, there exists an asymptotically optimal quantizer q(z) of a noisy source in the setting of Fig. 4 such that:

1. An asymptotically optimal implementation of q(z) is that of Theorem 4, represented in Fig. 5, with a uniform tessellating quantizer q(x̄) having cell volume V.
2. The rate and distortion per sample satisfy

D ≈ (1/n) E tr Cov[X|Z] + M_n V^{2/n},
R ≈ (1/n) (h(X̄) − log2 V),
D ≈ (1/n) E tr Cov[X|Z] + M_n 2^{(2/n) h(X̄)} 2^{−2R}.
Proof: Immediate from Theorem 4 and conventional theory of high-rate quantization of clean sources.

3.2. Distributed Case

We are now ready to consider the WZ quantization of a noisy source in Fig. 3. Define x̄(y, z) = E[X|y, z], the best MSE estimator of X given Y and Z, X̄ = x̄(Y, Z), and D_∞ = (1/n) E tr Cov[X|Y, Z]. The following theorem extends the results on high-rate WZ quantization in Section 2 to noisy sources. The remark on the hypotheses of Theorem 5 also applies here, where the Bennett assumptions apply instead to the conditional PDF p_{X̄|Y}(x̄|y) for each y.

Theorem 6 (High-rate noisy WZ quantization). Suppose that the conditional expectation function x̄(y, z) is additively separable, i.e., x̄(y, z) = x̄_Y(y) + x̄_Z(z), and define X̄_Z = x̄_Z(Z). Suppose further that for each value y in the alphabet of Y, h(X̄|y) < ∞, and that there exists a uniform tessellating quantizer q(x̄, y) of X̄, with no two cells assigned to the same index and cell volume V(y) > 0, with rate R_{X̄|Y}(y) and distortion D_{X̄|Y}(y), such that, at high rates, it is asymptotically optimal in Lagrangian cost and

D_{X̄|Y}(y) ≈ M_n V(y)^{2/n},
R_{X̄|Y}(y) ≈ (1/n) (h(X̄|y) − log2 V(y)),
D_{X̄|Y}(y) ≈ M_n 2^{(2/n) h(X̄|y)} 2^{−2 R_{X̄|Y}(y)}.

Then, there exists an asymptotically optimal quantizer q(z) for large R, for the WZ quantization setting represented in Fig. 3, such that:

1. q(z) can be implemented as an estimator x̄_Z(z) followed by a uniform tessellating quantizer q(x̄_Z) with cell volume V.
2. No two cells of the partition defined by q(x̄_Z) need to be mapped into the same quantization index.
3. The rate and distortion per sample satisfy

D ≈ D_∞ + M_n V^{2/n},  (8)
R ≈ (1/n) (h(X̄|Y) − log2 V),  (9)
D ≈ D_∞ + M_n 2^{(2/n) h(X̄|Y)} 2^{−2R}.  (10)

4. h(X̄|Y) = h(X̄_Z|Y).

Proof: The proof is similar to that for clean sources in Theorem 1, and only the differences are emphasized. First, as in the proof of WZ quantization of a clean source, a conditional quantization setting is considered, as represented in Fig. 6. An entirely analogous argument using conditional costs, as defined in the proof for clean sources, implies that the optimal conditional quantizer is an optimal conventional quantizer for each value of y. Therefore, using statistics conditioned on y everywhere, by Theorem 4, the optimal conditional quantizer can be implemented as in Fig. 7, with conditional costs

D_{X|Y}(y) ≈ (1/n) E[tr Cov[X|y, Z] | y] + M_n V(y)^{2/n},
R_{X|Y}(y) ≈ (1/n) (h(X̄|y) − log2 V(y)),
D_{X|Y}(y) ≈ (1/n) E[tr Cov[X|y, Z] | y] + M_n 2^{(2/n) h(X̄|y)} 2^{−2 R_{X|Y}(y)}.

Fig. 6. Conditional quantization of a noisy source.

Fig. 7. Optimal implementation of MSE conditional quantization of a noisy source.

The derivative of C_{X|Y}(y) with respect to R_{X|Y}(y) vanishes when λ ≈ 2 ln 2 M_n V(y)^{2/n}, which, as in the proof for clean sources, implies that all conditional quantizers have a common cell volume V(y) ≈ V (however, only the second term of the distortion is constant, not the overall distortion). Taking expectation of the conditional costs proves that (8) and (9) are valid for the conditional quantizer of Fig. 7. The validity of (10) for the conditional quantizer can be shown by solving for V in (9) and substituting the result into (8).

The assumption that x̄(y, z) = x̄_Y(y) + x̄_Z(z) means that for two values of y, y1 and y2, x̄(y1, z) and x̄(y2, z), seen as functions of z, differ only by a constant vector. Since the conditional quantizer of X̄, q(x̄|y), is a uniform tessellating quantizer at high rates, a translation will neither affect the distortion nor the rate, and therefore x̄(y, z) can be replaced by x̄_Z(z) with no impact on the Lagrangian cost. In addition, since all conditional quantizers have a common cell volume, the same translation argument implies that a common unconditional quantizer q(x̄_Z) can be used instead, with performance given by (8)-(10), and since conditional quantizers do not reuse indices, neither does the common unconditional quantizer. The last item of the theorem follows from the fact that h(x̄_Y(y) + X̄_Z | y) = h(X̄_Z | y).

Clearly, the theorem is a generalization of Theorem 1, since Z = X implies x̄(y, z) = x̄_Z(z) = z, trivially additively separable. The case in which X can be written as X = f(Y) + g(Z) + N, for any (measurable) functions f, g and any random variable N with E[N|y, z] constant with (y, z), gives an example of additively separable estimator. This includes the case in which X, Y and Z are jointly Gaussian. Furthermore, in the Gaussian case, since x̄_Z(z) is an affine function and q(x̄_Z) is a uniform tessellating quantizer, the overall quantizer q(x̄_Z(z)) is also a uniform tessellating quantizer, and if Y and Z are uncorrelated, then x̄_Y(y) = E[X|y] and x̄_Z(z) = E[X|z], but not in general.

Observe that, according to the theorem, if the estimator x̄(y, z) is additively separable, there is no asymptotic loss in performance by not using the side information at the encoder.

Corollary 7. Assume the hypotheses of Theorem 6, and that the optimal reconstruction levels x̄̂(q, y) for each of the conditional quantizers q(x̄, y) are simply the centroids of the quantization cells for a uniform distribution. Then, there is a WZ quantizer q(x̄_Z) that leads to no asymptotic loss in performance if the reconstruction function is x̂(q, y) = x̄̂_Z(q) + x̄_Y(y), where x̄̂_Z(q) are the centroids of q(x̄_Z).

Proof: In the proof of Theorem 6, q(x̄_Z) is a uniform tessellating quantizer without index repetition, a translated copy of q(x̄, y).

Theorem 6 and Corollary 7 show that the WZ quantization setting of Fig. 3 can be implemented as depicted in Fig. 8, where x̄̂_Z(q, y) can be made independent from y without asymptotic loss in performance, so that the pair q(x̄_Z), x̄̂_Z(q) form a uniform tessellating quantizer and reconstructor for X̄_Z.

Fig. 8. Asymptotically optimal implementation of MSE WZ quantization of a noisy source with additively separable x̄(y, z).

Finally, if x̄_Z(z) is a bijective vector field, then, under mild conditions, including continuous differentiability of x̄_Z(z) and its inverse, it can be shown that h(X̄|Y) in Theorem 6 satisfies

h(X̄|Y) = h(Z|Y) + E log2 |det (dx̄_Z/dz)(Z)|,

where dx̄_Z(z)/dz denotes the Jacobian matrix of x̄_Z(z).

4. WZ Transform Coding of Clean Sources

The following intermediate definitions and results will be useful to analyze orthonormal transforms for WZ coding. Define the geometric expectation of a positive random scalar S as G S = b^{E log_b S}, for any positive real b different from 1. Note that if S were discrete with PMF p_S(s), then G S = Π_s s^{p_S(s)}. The constant factor in the rate-distortion approximation (3) can be expressed as

M_n 2^{(2/n) h(X|y)} = ε²_{X|Y}(y) (det Σ_{X|Y}(y))^{1/n},

where ε²_{X|Y}(y) depends only on M_n and p_{X|Y}(x|y), normalized with covariance identity. If h(X|y) is finite, then ε²_{X|Y}(y) (det Σ_{X|Y}(y))^{1/n} > 0, and since G_Y [2^{(2/n) [h(X|y)]_{y=Y}}] = 2^{(2/n) h(X|Y)}, (6) is equivalent to

D ≈ G[ε²_{X|Y}(Y)] G[(det Σ_{X|Y}(Y))^{1/n}] 2^{−2R}.

We are now ready to consider the transform coding setting in Fig. 9. Let X = (X1, ..., Xn) … estimate of X given Y, i.e., E[X|Y]. In fact, the orthogonality principle of conditional estimation implies

Σ̄_{X|Y} + Cov E[X|Y] = Cov X,

thus Σ̄_{X|Y} ≼ Cov X, with equality if and only if E[X|Y] is a constant with probability 1.

[Fig. 9: transform coding setting, with transform T, subband quantizers q_i', Slepian-Wolf coders (SWC), subband reconstructions x̂_i', and inverse transform U.]

Theorem 8 (WZ transform coding). Assume R_i large, so that the results for high-rate approximation of Theorem 1 can be applied to each subband in Fig. 9, i.e.,