
Gaussian Approximation of Quantization Error for Estimation from Compressed Data

Alon Kipnis∗ and Galen Reeves†
∗ Department of Statistics, Stanford University
† Department of Electrical and Computer Engineering, Department of Statistical Science, Duke University

Abstract—We consider the statistical connection between the quantized representation of a high dimensional signal X using a random spherical code and the observation of X under additive white Gaussian noise (AWGN). We show that given X, the conditional Wasserstein distance between its bitrate-R quantized version and its observation under AWGN of signal-to-noise ratio 2^{2R} − 1 is sub-linear in the problem dimension. We then utilize this fact to connect the mean squared error (MSE) attained by an estimator based on an AWGN-corrupted version of X to the MSE attained by the same estimator when fed with its bitrate-R quantized version.

[Fig. 1: system diagram. θ generates X ∼ P_{X|θ}; X is encoded at bitrate R (Enc → index in {1, ..., 2^{nR}} → Dec) into Y and, in parallel, corrupted by additive noise N(0, σ²I) into Z.]
Fig. 1. Inference about the parameter vector θ is based on degraded observations of the data X. The effects of bitrate constraints are compared to the effect of additive Gaussian noise by studying the quadratic Wasserstein distance between P_{Y|X} and P_{Z|X}. Under random spherical encoding, we show that this distance is order one, and thus independent of the problem dimensions.

I. INTRODUCTION

Due to the disproportionate size of modern datasets compared to available computing and communication resources, many inference techniques are applied to a compressed representation of the data rather than to the data itself. In the attempt to develop and analyze inference techniques based on a degraded version of the data, it is tempting to model inaccuracies resulting from lossy compression as additive noise. Indeed, there exists a rich literature devoted to the characterization of this "noise", i.e., the difference between the original data and its compressed representation [1]. Nevertheless, due to the difficulty of analyzing non-linear compression operations, this characterization is generally limited to the high bitrate compression regime [2]–[4] or to compression using a dithered quantizer [5].

In this paper, we show that quantizing the data using a random spherical code (RSC) leads to a strong and relatively simple characterization of the distribution of the quantization noise. Specifically, consider the output codeword of a bitrate-R RSC and the output of a Gaussian noise channel with signal-to-noise ratio (SNR) 2^{2R} − 1 illustrated in Fig. 1; our main result says that the conditional Wasserstein distance between these two outputs is independent of the problem dimension. This connection allows us to adapt inference procedures designed for AWGN-corrupted data to be used with the quantized data, as well as to analyze the performance of such procedures.

Spherical codes have multiple theoretical and practical uses in numerous fields [6]. In the context of source coding, Sakrison [7] and Wyner [8] provided a geometric understanding of random spherical coding in a Gaussian setting, and our main result can be seen as an extension of their insights. Specifically, consider the representation of an n-dimensional standard Gaussian vector X using 2^{nR} codewords uniformly distributed over the sphere of radius r = √(n(1 − 2^{−2R})). The left side of Fig. 2, adapted from [7], shows that the angle α between X and its nearest codeword X̂ concentrates as n increases so that sin(α) = 2^{−R} with high probability. As a result, the quantized representation of X and the error X − X̂ are orthogonal, and thus the MSE between X and its quantized representation, averaged over all random codebooks, converges to the Gaussian distortion-rate function (DRF) D_G(R) ≜ 2^{−2R}. In fact, as noted in [9], this Gaussian coding scheme¹ achieves the Gaussian DRF when X is generated by any ergodic source of unit variance, implying that the distribution of ‖X − X̂‖² is independent of the distribution of X as the problem dimension n goes to infinity.

In this paper, we show that a much stronger statement holds for a scaled version of the quantized representation (Y in Fig. 2): in the limit of high dimension, the distribution of Y − X is independent of the distribution of X and is approximately Gaussian. This property of Y − X suggests that the underlying signal or parameter vector θ can now be estimated as if X were observed under AWGN. This paper formalizes this intuition by showing that estimators of θ from an AWGN-corrupted version of X (Z in Fig. 1) attain similar performance when applied to the scaled representation Y.

We emphasize that the scaling of the quantized representation and the estimation of θ are done at the decoder, and thus X is essentially represented by the codeword maximizing the cosine similarity. In particular, no additional information about X or θ is required by the encoder. This situation is in contrast to optimal quantization schemes in indirect source coding [10], [11] and in problems involving estimation from compressed information [12]–[17], which depend on P_{X|θ}. As a result, the random spherical coding scheme we rely on is in general sub-optimal, although it can be applied in situations where θ or the model that generated X are unspecified or unknown. Coding schemes with similar properties were studied in the context of indirect source coding under the name compress-and-estimate in [18]–[21].

To summarize the contributions of this paper, consider the vectors Y and Z illustrated in Fig. 1, describing the output of a bitrate-R RSC applied to X and the output of an AWGN channel with input X, respectively. Our main result shows that the quadratic Wasserstein distance between Y and Z given the input X is sub-linear in the problem dimension n. We further use this result to establish the equivalence

  D_sp(R) ≈ M(snr),  snr = 2^{2R} − 1,  (1)

where D_sp(R) is the MSE of estimating θ from Y as a function of the bitrate, and M(snr) is the MSE of estimating θ under additive Gaussian noise as a function of the SNR in the channel from X to Z. Next, we use (1) to analyze the MSE of Lipschitz continuous estimators of θ from Y by considering their MSE in estimating θ from Z. The benefit is twofold: inference techniques designed for AWGN-corrupted measurements can be applied directly to the quantized measurements to recover θ, and the expected MSE of this recovery can be predicted. Finally, we apply our results to two special cases where the measurement model P_{X|θ} is Gaussian. The first case demonstrates how RSC can be used in conjunction with the prior distribution to attain MSE smaller than D_G(R) when this prior is not Gaussian. The second case demonstrates how to derive an achievable distortion for the quantized compressed sensing problem [18], [22] using the approximate message passing (AMP) algorithm and its state evolution property [23], [24].

The work of G. Reeves was supported in part by funding from the NSF under Grant No. 1750362.
¹We denote this scheme as Gaussian since r is chosen according to the Gaussian DRF.

[Fig. 2: two geometric sketches showing the input sphere of radius √n, the representation sphere, the error sphere, and the angle α between X and its representation (X̂ on the left, Y on the right).]
Fig. 2. Left: In the standard source coding problem [7], [8], the representation sphere is chosen such that the error X̂ − X is orthogonal to the reconstruction X̂. Right: In this paper the representation sphere is chosen such that the error Y − X is orthogonal to the input X.

II. PRELIMINARIES

Our basic setting considers an n-dimensional random vector X with distribution P_{X|θ}, where θ is a d-dimensional random vector with prior P_θ. We refer to X as the measurements and to θ as the underlying signal, and consider the problem of recovering θ from a bitrate-R quantized version of X.

A. Random Spherical Codes

We quantize X using a random spherical code. Such a code consists of a list of M codewords (codebook):

  C_sp(M, n) ≜ {C(1), ..., C(M)},

where each C(i), i = 1, ..., M, is drawn independently from the uniform distribution over the sphere of radius √n in R^n. The quantized representation of X is given by the codeword c(i*) that maximizes the cosine similarity between X and the codewords, namely:

  Enc(X) = c(i*),  i* = arg max_{1 ≤ i ≤ M} ⟨c(i), X⟩ / (‖c(i)‖ ‖X‖).

Let the cosine similarity between the vector X and its quantized representation be denoted by the random variable

  S ≜ ⟨X, Enc(X)⟩ / (‖X‖ ‖Enc(X)‖).

Proposition 1 ([7], [8]). The cosine similarity S is independent of X and has distribution function

  Pr[S ≤ s] = (1 − Q_n(s))^M,  (2)

where Q_n(s) = κ_n ∫_s^1 (1 − t²)^{(n−3)/2} dt and κ_n = Γ(n/2) / (√π Γ((n−1)/2)).

Proposition 2. If M = ⌈2^{nR}⌉ with R > 0, then

  E[(S − √(1 − 2^{−2R}))²] = O( log²n / (n²(2^{2R} − 1)) + 1/n² ),
  E[(√(1 − S²) − 2^{−R})²] = O( log²n / (n² 2^{2R}) + 1/n² ).

Proof. With a bit of work, it can be verified that

  Pr[S ∈ [a, b]] ≥ 1 − exp( −(M κ_n / (n − 1)) (1 − a²)^{(n−1)/2} ) − M κ_n (1 − b²)^{(n−1)/2}

for all 0 ≤ a ≤ b ≤ 1. For any ε ∈ (0, 1) such that log(1/ε) < M κ_n / (n − 1), letting

  a² = 1 − ( (n − 1) log(1/ε) / (M κ_n) )^{2/(n−1)},  (3)
  b² = 1 − ( ε / (M κ_n) )^{2/(n−1)},  (4)

leads to Pr[S ∈ [a, b]] ≥ 1 − 2ε. Since S ∈ [−1, 1], it follows that for any s ∈ [a, b],

  E[|S − s|²] ≤ (b − a)² + 8ε.

Our goal is to show that this term is small with s = √(1 − M^{−2/(n−1)}). Let ε = 1/n². For all

  M > 2(n − 1) log n / κ_n,

it follows that a ≤ S ≤ b with probability greater than 1 − 2/n². This implies that

  ( 1/(n² M κ_n) )^{1/(n−1)} ≤ √(1 − S²) ≤ ( 2(n − 1) log n / (M κ_n) )^{1/(n−1)}

with probability greater than 1 − 2/n². Using the inequality y − x ≤ y log(y/x), we see that the difference between the upper and lower bounds is

  √(1 − a²) − √(1 − b²) = ( 2(n − 1) log n / (M κ_n) )^{1/(n−1)} − ( 1/(n² M κ_n) )^{1/(n−1)}
    ≤ ( 2(n − 1) log n / (M κ_n) )^{1/(n−1)} · log(2n²(n − 1) log n) / (n − 1)
    = O( M^{−1/n} log n / n ).

Hence, we conclude that

  E[(√(1 − S²) − M^{−1/(n−1)})²] = O( (M^{−2/n} log²n + 1) / n² ).

To deal with S, observe that

  (b − a) / (√(1 − a²) − √(1 − b²)) = (√(1 − a²) + √(1 − b²)) / (b + a) ≤ 2/b.  (5)

Thus,

  E[(S − √(1 − M^{−2/(n−1)}))²] = O( M^{−2/n} log²n / ((1 − M^{−2/n}) n²) + 1/n² ).  (6)
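A minimal simulation sketch makes the concentration in Proposition 2 concrete. The dimension n, rate R, and trial count below are arbitrary illustrative choices (the codebook size M = ⌈2^{nR}⌉ grows exponentially in n, so only small instances are feasible), and at such small n the agreement with the asymptotic values is only approximate.

    import numpy as np

    def random_spherical_codebook(M, n, rng):
        # M codewords drawn uniformly from the sphere of radius sqrt(n) in R^n
        C = rng.standard_normal((M, n))
        C *= np.sqrt(n) / np.linalg.norm(C, axis=1, keepdims=True)
        return C

    def encode(X, C):
        # codeword maximizing the cosine similarity with X
        cos = (C @ X) / (np.linalg.norm(C, axis=1) * np.linalg.norm(X))
        i_star = np.argmax(cos)
        return C[i_star], cos[i_star]

    rng = np.random.default_rng(0)
    n, R = 24, 0.5                        # toy instance, M = 2^{nR} = 4096
    M = int(np.ceil(2 ** (n * R)))
    target = np.sqrt(1 - 2 ** (-2 * R))   # Proposition 2: S concentrates here

    S_samples = []
    for _ in range(200):
        X = rng.standard_normal(n)        # any source works; S is independent of X
        C = random_spherical_codebook(M, n, rng)
        _, S = encode(X, C)
        S_samples.append(S)

    print("mean S          :", np.mean(S_samples))
    print("sqrt(1-2^-2R)   :", target)
    print("mean sqrt(1-S^2):", np.mean(np.sqrt(1 - np.square(S_samples))), "vs 2^-R =", 2 ** (-R))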

X as a function of the bitrate, and M(snr) is the MSE of κ −M n (1−a2)(n−1)/2 2 (n−1)/2 estimating θ under additive Gaussian noise as a function of Pr[S ∈ [a, b]] ≥ 1 − e n−1 − Mκn(1 − b ) the SNR in the channel from X to Z. Next, we use (1) to for all 0 ≤ a ≤ b ≤ 1. For any  ∈ (0, 1) such that log(1/) < analyze the MSE of Lipschitz continuous estimators of θ Mκn/(n − 1), letting from Y by considering their MSE in estimating θ from Z. The benefit is twofold: Allowing inference techniques from (n − 1) log(1/)2/(n−1) a2 = 1 − (3) AWGN-corrupted measurements to be applied directly on the Mκn quantized measurements to recover θ, as well as providing the 2/(n−1)    expected MSE in this recovery. Finally, we apply our results b2 = 1 − , (4) Mκn to two special cases where the measurement model PX|θ is Gaussian. The first case demonstrates how RSC can be used in leads to Pr[S ∈ [a, b]] ≥ 1 − 2. Since S ∈ [−1, 1], it follows conjunction with the prior distribution to attain MSE smaller that for any s ∈ [a, b], than D (R) when this prior is not Gaussian. The second case G  2 2 demonstrates how to derive an achievable distortion for the E |S − s| ≤ (b − a) + 8 quantized compressed sensing problem [18], [22] using the Or goal is to show that this term is small with approximate message passing (AMP) algorithm and its state p evolution property [23], [24]. s = 1 − M −2/(n−1) Let  = 1/n2. For all Remark 1. The difference between the conditional and 2(n − 1) log n unconditional Wasserstein distances in (8) can be arbitrarily M > large. For example, suppose that U is uniform on {µ, −µ} κn and let PX|U=u = N (u, I) and PY |U=u = N (−u, I). Then 2 it follows that a ≤ S ≤ b with probability greater than 1−2/n . W2(X,Y ) = 0 but W2(X,Y | U) = W2(N (2µ, I), N (0,I)), This implies that which increases without bound as kµk increases .  1/(n−1)  1/(n−1) 1 p 2 2(n − 1) log n Given a Markov chain θ → X → (Y,Z), the conditional 2 ≤ 1 − S ≤ 2n Mκn Mκn Wasserstein distance associated with PY,Z|X provides a natural upper bound on the difference between the marginals P with probability greater than 1 − 2/n2. Using the inequality θ,Y and P . For the following result, recall that a function f : y − x ≤ y log(y/x) we see that the difference between the θ,Z n → d is L-Lipschitz if kf(u) − f(v)k ≤ Lku − vk for all upper and lower bounds is R R u, v ∈ Rn p 2 p 2 1 − a − 1 − b d n Proposition 3. Let Pθ,X be a distribution on R × R and  1/(n−1)  1/(n−1) n 2(n − 1) log n 1 let PY |X and PZ|X be conditional distributions from R to = − 2 n n d Mκn 2n Mκn R . For any L-Lipshitz function f : R → R , we have  1/(n−1) 2  2(n − 1) log n log 4n (n − 1) log n p 2 p 2 ≤ E[kθ − f(Y )k ] − E[kθ − f(Z)k ] ≤ LW2(Y,Z | X), Mκn n − 1 M −1/n log n provided that the expectations exist. = O n Proof. For any Markov chain θ → X → (Y,Z), the triangle Hence, we conclude that inequality yields p  p 2 M −2/n log2 n + 1 E[kθ − f(Y )k2] 1 − S2 − M −1/(n−1) = O E 2 p p n ≤ E[kθ − f(Z)k2] + E[kf(Y ) − f(Z)k2] To deal with the S, observe that ≤ p [kθ − f(Z)k2] + Lp [kY − Zk2]. √ √ E E b − a 1 − a2 + 1 − b2 2 √ √ = ≤ (5) Taking the infimum over all possible couplings of (X,Y,Z) 1 − a2 − 1 − b2 (b + a) b leads to one direction of the stated inequality. Interchanging Thus, the role of Y and Z completes the proof.    
−2/n 2  p 2 M log n 1 III.QUANTIZATION ERRORVERSUS GAUSSIAN NOISE S − 1 − M −2/(n−1) = O + E −2/n 2 2 1 − M n n This section addresses the extent to which the quantization (6) error induced by random spherical coding can be approximated by isotropic Gaussian noise. Given an integer M and positive number ρ, let P be the conditional distribution of B. Wasserstein Distance and Lipschitz Continuity Y |X The quadratic Wasserstein distance between Rn-valued Y = ρ Enc(X), (9) random vectors X and Y is defined as obtained by marginalizing over the randomness in the codebook. q √ 2 By construction, Y is supported on the sphere of radius nρ. W2(X,Y ) = W2(PX ,PY ) , inf E[kX − Y k2], For comparison, we define the Gaussian noise observation where the infimum is over all joint distributions (X,Y ) model PZ|X according to satisfying the marginal constraints X ∼ PX and Y ∼ PY . The quadratic Wasserstein distance is a metric on the space of Z = X + σW, (10) distributions with finite second moments. where W ∼ N (0,I) is standard Gaussian independent of X For the purposes of this paper, it is also useful to intro- and σ2 is the noise power. duce the conditional Wasserstein distance as follows. Given conditional distributions PX|U and PY |U and PU , we define A. Finite-Length Bounds Z Theorem 4. Let PY |Z and PZ|X be the conditional distribution 2 2 W2 (X,Y | U) , W2 (PX|U=u,PY |U=u) dPU . (7) defined by (9) and (10), respectively. For any distribution PX n on R , there exists a joint distribution PX,Y,Z with marginals Equivalently, we have W 2(X,Y | U) = inf kX − Y k2, 2 E 2 PX , PY |Z , and PZ|X , such that where the infimum is over all couplings of (U, X, Y ) satisfying √ √ √ 2 2 2 2 the marginal constraints (U, X) ∼ PU,X and (U, Y ) ∼ PU,Y . kY − Zk = ( nρS − kXk − σU) + ( nρ 1−S − σV ) , Notice that where (S, U, V, X) are mutually independent with U ∼ W2(X,Y ) ≤ W2(PU,X ,PU,Y ) ≤ W2(X,Y | U). (8) N (0, 1), V ∼ χn−1, and S is distributed according to (2). Proof. For any (U, V, S) satisfying the marginal constraints, Combining Proposition 3 and Theorem 5 provides a mean- let (X,X⊥) be drawn independently of (U, V, S) where the ingful comparison whenever the Lipschitz constant times the ⊥ marginal of X is according to PX and the marginal of X variance of kXk is sufficiently√ small. For example, one setting is uniform over the set {u ∈ Rn, uT X = 0, kuk = 1}. Next, of interest is if L = o( d) and Var kθk = O(1), since in this construct a coupling on (Y,Z) given X according to case the difference between the normalized MSE in estimating θ using f(Y ) and f(Z) converges to zero. √  X √ p ⊥ Y = X + nρS − kXk + nρ 1 − S2X (11) A useful bound on Var(kXk) is as follows: kXk X Lemma 6. Let X be a random vector with finite fourth mo- Z = X + σ U + σV X⊥. (12) √ √ kXk ments. If there exists c > 0 such that Pr[kXk ≤ c n] ≤ 1/ n, then Under this coupling we have Var(kXk2) c kY − Zk2 = Var(kXk) ≤ + . nc2 2 2 X √  ⊥√ p  2 nρS − kXk − σU + X nρ 1 − S2 − σV Proof. Let A and B be independent copies of kXk and let kXk √ √ E = {( A + B)2 ≤ δ2}. Then, √ √ 2 2 p 2 = nρS − kXk − σU + nρ 1 − S − σV , 1 √ √ 2 Var(kXk) = E A − B where the last transition is because XT X⊥ = 0. 2 1 √ √ 2  1 √ √ 2  = A − B 1 + A − B 1c . Theorem 4 is general in the sense that it holds for any 2E E 2E E choice of the parameters (M, ρ, σ). Optimizing over (ρ, σ) as a function of E[kXk] and M leads to the following upper By the definition of E, we have have Pr[E] ≤ 2 and so bound on the conditional Wasserstein distance.  
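The coupling in the proof of Theorem 4 is easy to exercise numerically: run the encoder, extract the orthogonal decomposition of Enc(X), build Z from the same (U, V), and check that ‖Y − Z‖² matches the two-term scalar expression while remaining far smaller than ‖X‖². The sketch below is purely illustrative; the sizes are small, and the choices of ρ and σ are made in the spirit of the scaling used in Theorem 5 below, with E[‖X‖] replaced by the realized norm ‖X‖ for this single draw.

    import numpy as np

    rng = np.random.default_rng(1)
    n, R = 24, 0.5
    M = int(np.ceil(2 ** (n * R)))

    # draw X and a random spherical codebook, and encode as in Section II-A
    X = rng.standard_normal(n)
    C = rng.standard_normal((M, n))
    C *= np.sqrt(n) / np.linalg.norm(C, axis=1, keepdims=True)
    cos = (C @ X) / (np.linalg.norm(C, axis=1) * np.linalg.norm(X))
    enc = C[np.argmax(cos)]
    S = np.max(cos)

    # orthogonal decomposition of Enc(X): sqrt(n)*S along X/||X|| plus sqrt(n(1-S^2)) along X_perp
    x_hat = X / np.linalg.norm(X)
    X_perp = enc - (enc @ x_hat) * x_hat
    X_perp /= np.linalg.norm(X_perp)

    rho = np.linalg.norm(X) / np.sqrt(n * (1 - 2 ** (-2 * R)))   # per-draw stand-in for E||X||
    sigma = np.linalg.norm(X) / np.sqrt(n * (2 ** (2 * R) - 1))

    U = rng.standard_normal()
    V = np.sqrt(rng.chisquare(n - 1))            # V ~ chi_{n-1}

    Y = rho * enc                                 # eq. (9); equals eq. (11) by construction
    Z = X + sigma * x_hat * U + sigma * V * X_perp  # eq. (12)

    lhs = np.sum((Y - Z) ** 2)
    rhs = (np.sqrt(n) * rho * S - np.linalg.norm(X) - sigma * U) ** 2 \
          + (np.sqrt(n) * rho * np.sqrt(1 - S ** 2) - sigma * V) ** 2
    print("||Y - Z||^2 =", lhs, " two-term formula =", rhs, " ||X||^2 =", np.sum(X ** 2))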
Theorem 4 is general in the sense that it holds for any choice of the parameters (M, ρ, σ). Optimizing over (ρ, σ) as a function of E[‖X‖] and M leads to the following upper bound on the conditional Wasserstein distance.

Theorem 5. Let P_X be a distribution on R^n with finite second moments. Given R > 0, let M = ⌈2^{nR}⌉ and put

  ρ = E[‖X‖] / √(n(1 − 2^{−2R})),  σ = E[‖X‖] / √(n(2^{2R} − 1)).

Then, the conditional Wasserstein distance between the distributions defined in (9) and (10) satisfies

  W_2²(Y, Z | X) ≤ Var(‖X‖) + 2σ² + C_R E[‖X‖]² log²n / n²,

where C_R is a constant that depends only on R.

Proof Sketch. Under the coupling of (X, Y, Z) described in Theorem 4, one obtains

  E[‖Y − Z‖²] = E[(√n ρ S − E[‖X‖])²] + Var(‖X‖) + σ²
    + E[(√n ρ √(1 − S²) − σ E[V])²] + σ² Var(V).

Standard properties of chi random variables yield Var(V) ≤ 1 and √(n − 2) ≤ E[V] ≤ √(n − 1). The terms involving S can then be upper bounded using Proposition 2.

The significance of Theorem 5 is that the second and third terms are inversely proportional to the dimension of the signal. Dividing both sides by the expected norm yields

  W_2(Y, Z | X) / E[‖X‖] ≤ √(Var(‖X‖)) / E[‖X‖] + O(1/√n).

It is also interesting to note that the signal-to-noise ratio of the Gaussian observation model (10) depends primarily on the code bitrate R, with

  E[‖X‖]² / (n σ²) = 2^{2R} − 1.

Combining Proposition 3 and Theorem 5 provides a meaningful comparison whenever the Lipschitz constant times the variance of ‖X‖ is sufficiently small. For example, one setting of interest is when L = o(√d) and Var(‖θ‖) = O(1), since in this case the difference between the normalized MSEs in estimating θ using f(Y) and f(Z) converges to zero.

A useful bound on Var(‖X‖) is as follows:

Lemma 6. Let X be a random vector with finite fourth moments. If there exists c > 0 such that Pr[‖X‖ ≤ c√n] ≤ 1/√n, then

  Var(‖X‖) ≤ Var(‖X‖²) / (n c²) + c²/2.

Proof. Let A and B be independent copies of ‖X‖² and let E = {(√A + √B)² ≤ δ²}. Then,

  Var(‖X‖) = (1/2) E[(√A − √B)²] = (1/2) E[(√A − √B)² 1_E] + (1/2) E[(√A − √B)² 1_{E^c}].

By the definition of E, we have Pr[E] ≤ ε², where ε ≜ Pr[‖X‖ ≤ δ], and so

  E[(√A − √B)² 1_E] ≤ E[(√A + √B)² 1_E] ≤ δ² ε².

Meanwhile,

  E[(√A − √B)² 1_{E^c}] = E[ ((A − B)² / (√A + √B)²) 1_{E^c} ] ≤ E[(A − B)²] / δ² = 2 Var(‖X‖²) / δ².

The proof is completed by combining these inequalities and setting δ = c√n, for which ε ≤ 1/√n by assumption.

B. Asymptotic Equivalence

In the context of the problem of estimating θ from a quantized version of X, our results can be stated as follows. We consider a sequence of distributions on (θ, X) where both d and n scale to infinity with

  (1/n) E[‖X‖²] = γ + o(1),  Var(‖X‖) = O(1).  (13)

We further suppose that there exists a nonincreasing function M : R₊ → R₊ with M(0) < ∞ and a sequence of estimators θ̂ = θ̂_{d,n,σ} such that for almost all σ ∈ R₊, we have

  Lip(θ̂) ≤ o(√d),  (14)
  (1/d) E[‖θ − θ̂(Z)‖²] = M(snr) + o(1),  (15)

where snr = γ/σ². The following result follows immediately from Proposition 3 and Theorem 5.

Theorem 7. For almost all R ∈ R₊, the MSE distortion of random spherical encoding with decoding function θ̂ satisfies

  (1/d) E[‖θ − θ̂(Y)‖²] = M(2^{2R} − 1) + o(1).  (16)

Proof. Theorem 7 is an immediate corollary of Theorem 5 and Proposition 3.

Remark 2. The assumption that (15) holds almost everywhere allows for the possibility that M(snr) has a countable number of jump discontinuities. This can arise, for example, in the context of random linear estimation [25].
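Theorem 7 says that a Lipschitz estimator tuned for the AWGN channel at snr = 2^{2R} − 1 attains essentially the same MSE on the scaled RSC output. A minimal illustration, under the simplifying assumptions that θ has i.i.d. N(0, γ) entries and is observed directly (X = θ), so that the Bayes estimator is a 1-Lipschitz linear shrinkage; the sizes are small, so the match with the prediction γ·2^{−2R} is only rough.

    import numpy as np

    rng = np.random.default_rng(2)
    n, R = 24, 0.5                        # toy sizes; M = 2^{nR} codewords
    M = int(np.ceil(2 ** (n * R)))
    snr = 2 ** (2 * R) - 1
    gamma = 1.0                           # prior variance of each coordinate of theta

    def rsc_encode(X, rng):
        C = rng.standard_normal((M, n))
        C *= np.sqrt(n) / np.linalg.norm(C, axis=1, keepdims=True)
        cos = (C @ X) / (np.linalg.norm(C, axis=1) * np.linalg.norm(X))
        return C[np.argmax(cos)]

    # linear (1-Lipschitz) Bayes estimator for theta ~ N(0, gamma I) at this snr
    sigma2 = gamma / snr
    shrink = lambda v: (gamma / (gamma + sigma2)) * v
    rho = np.sqrt(gamma / (1 - 2 ** (-2 * R)))   # Theorem 5 scaling with E||X|| ~ sqrt(n*gamma)

    mse_Y, mse_Z = [], []
    for _ in range(100):
        theta = np.sqrt(gamma) * rng.standard_normal(n)   # here X = theta
        X = theta
        Y = rho * rsc_encode(X, rng)
        Z = X + np.sqrt(sigma2) * rng.standard_normal(n)
        mse_Y.append(np.mean((theta - shrink(Y)) ** 2))
        mse_Z.append(np.mean((theta - shrink(Z)) ** 2))

    print("MSE from quantized Y  :", np.mean(mse_Y))
    print("MSE from AWGN      Z  :", np.mean(mse_Z))
    print("prediction gamma*2^-2R:", gamma * 2 ** (-2 * R))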
IV. APPLICATIONS AND EXAMPLES

This section provides some specific examples of problem settings satisfying the assumptions of Theorem 7. Given jointly random vectors (θ, U) on R^d × R^n, we use

  mmse(θ | U) ≜ (1/d) E[‖θ − E[θ | U]‖²],

the quadratic Bayes optimal risk in estimating θ from U. In the following we give some examples where the Bayes optimal estimator of θ from Z satisfies the Lipschitz condition (14), and thus M(snr) corresponds to mmse(θ | Z).

A. Noisy Source Coding

Consider the additive Gaussian noise model

  X = θ + εW,

where W ∼ N(0, I). The problem of quantizing X at bitrate R and estimating θ from this quantized version is an instance of the noisy (a.k.a. indirect or remote) source coding problem [10], [11]. An achievable distortion for this problem using RSC is given by the following theorem.

Theorem 8. Assume that the entries of θ are i.i.d. from a distribution with second moment γ and finite fourth moment. There exists a bitrate-R spherical code achieving MSE distortion

  D_sp(R, ε) ≜ mmse(θ_1 | θ_1 + ηW),

where W ∼ N(0, 1) and

  η = η(R, ε²) = √( (ε² + 2^{−2R}γ) / (1 − 2^{−2R}) ).

Proof. We present two different proofs. The first is obtained by approximating the distribution of the input using a distribution for which the Bayes optimal estimator is Lipschitz. The second is obtained by approximating the estimator using a Lipschitz function.

Proof I: We prove the result first for the case where P_θ has a bounded support [−B, B]. Consider the Bayes rule θ̂*(z) ≜ E[θ | Z = z] of θ from Z = X + σW. Since P_{X|θ} = N(θ, ε²I), we have

  ∇_z θ̂*(z) = (1/(ε² + σ²)) Cov(θ | Z = z).

The assumption that θ is i.i.d. implies

  ‖∇_z θ̂*(z)‖ = (1/(ε² + σ²)) max_i Var(θ_i | Z_i = z_i) = (1/(ε² + σ²)) max_i Var(θ_1 | Z_1 = z_i).

Hence, since the entries are bounded, we have

  Lip(θ̂*) = sup_z ‖∇_z θ̂*(z)‖ ≤ B² / (ε² + σ²).

The risk of θ̂* is given by

  mmse(θ_1 | θ_1 + √(ε² + σ²) W).

We see that conditions (13), (14), and (15) are met with

  M(snr) ≜ mmse(θ_1 | θ_1 + √(ε² + (γ + ε²)/snr) W).

Thm. 7 implies that the limit

  lim_{n→∞} (1/n) E[‖θ − θ̂*(Y)‖²]

exists and equals

  M(2^{2R} − 1) = mmse(θ_1 | θ_1 + ηW).

For the general case where θ is not necessarily bounded, we consider the random vector θ_B obtained by trimming each coordinate of θ to [−B, B] and its corresponding Bayes rule

  θ̂*_B(z) = E[θ_B | θ_B + √(σ² + ε²) W].

From the first part of the proof, we have

  (1/n) E[‖θ − θ̂*_B(Y)‖²] = A_{n,B} + (1/n) E[‖θ_B − θ̂*_B(Y)‖²]
    = A_{n,B} + M_B(2^{2R} − 1) + o(1),  (17)

where M_B(snr) is defined as M(snr) with θ_1 replaced by the trimmed variable θ_{B,1}, and

  A_{n,B} = (1/n) E[‖θ − θ_B‖²] − (2/n) E[⟨θ − θ_B, θ_B − θ̂*_B(Y)⟩].

Since θ_1 has a finite p-th moment for p > 2, we have

  (1/n) E[‖θ − θ_B‖]² ≤ (1/n) E[‖θ − θ_B‖²] = E[θ_1² 1_{[B,∞)}(|θ_1|)] ≤ E[|θ_1|^{2+p}] B^{−p}.

This implies

  A_{n,B} ≤ (1/n) E[‖θ − θ_B‖²] + 2 √( (1/n²) E[‖θ − θ_B‖²] E[‖θ_B − θ̂*_B(Y)‖²] )
    ≤ E[|θ_1|^{2+p}] B^{−p} + 2 √( E[|θ_1|^{2+p}] B^{−p} (M_B(2^{2R} − 1) + O(1)) )
    = O(B^{−p/2}).

Finally, since θ_B → θ in distribution as B → ∞, we have that M_B(snr) → M(snr) [26, Thm. 4]. To summarize, we conclude that for any ε′ > 0, there exists B large enough such that, for all n ∈ N, A_{n,B} ≤ ε′/3. Given such B, one may take n large enough such that the o(1) term in (17), as well as the difference |M(2^{2R} − 1) − M_B(2^{2R} − 1)|, are smaller than ε′/3. For such B and n, we have that

  (1/n) E[‖θ − θ̂*_B(Y)‖²] ≤ M(2^{2R} − 1) + ε′.

Proof II: Consider the Bayes rule θ̂*(z) ≜ E[θ | Z = z] of θ from Z = X + σW. The Bayes optimal risk is given by

  mmse(θ | Z) = mmse(θ_1 | θ_1 + √(σ² + ε²) W) = mmse(θ_1 | θ_1 + √(ε² + (γ + ε²)/snr) W) =: M(snr).

Since P_{X|θ} = N(θ, ε²I) and θ is i.i.d., each coordinate of θ̂*(z) is analytic on R and hence has a bounded derivative over [−t, t] for any t > 0. It follows that the estimator g defined as

  g_i(z) = arg min_{u ∈ [−t, t]} |u − θ̂*_i(z)|,  i = 1, ..., d,

is Lipschitz continuous on R^n with Lipschitz constant equal to

  sup_{z ∈ [−t, t]} (d/dz) θ̂*_i(z) = sup_{z ∈ [−t, t]} (1/(ε² + σ²)) Var(θ_1 | Z_1 = z).

Since θ is i.i.d. with a finite second moment, we have that

  (1/n) E[‖θ − g(Z)‖²] = E[(θ_1 − g_1(Z_1))²] = E[(θ_1 − g_1(θ_1 + √(ε² + (γ + ε²)/snr) W))²] =: M_g(snr).

Furthermore, the regret of g satisfies

  M_g(snr) − M(snr) = (1/n) E[‖θ̂*(Z) − g(Z)‖²]
    = E[(θ̂*_1(Z) − g_1(Z))²] = E[(θ̂*_1(Z) − g_1(Z))² 1_{[t,∞)}(|θ̂*_1(Z)|)]
    ≤ E[(θ̂*_1(Z))² 1_{[t,∞)}(|θ̂*_1(Z)|)] ≤ t^{−p} E[|θ̂*_1(Z)|^{2+p}] ≤ t^{−p} E[|θ_1|^{2+p}].

Since θ is i.i.d. with a finite second moment, it satisfies the conditions of Lemma 6 and so Var(‖θ‖) = O(1). We see that conditions (13), (14), and (15) are met for the estimator g with an asymptotic MSE function M_g(snr). Theorem 7 implies that the limit

  lim_{n→∞} (1/n) E[‖θ − g(Y)‖²]

exists and equals

  M_g(2^{2R} − 1) ≤ M(2^{2R} − 1) + t^{−p} E[|θ_1|^{2+p}].

Since t may be chosen arbitrarily large,

  D_sp(R, ε) ≜ M(2^{2R} − 1) = mmse(θ_1 | θ_1 + ηW)

is achievable.
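To make Theorem 8 concrete, D_sp(R, ε) = mmse(θ_1 | θ_1 + ηW) can be evaluated numerically for a given prior. The sketch below does this for an equiprobable binary prior θ_1 ∈ {−√γ, +√γ} (the prior used in Fig. 3 below, for which the posterior mean is a tanh shrinker), using a simple one-dimensional quadrature; the grid of bitrates is an arbitrary illustrative choice, and setting ε = 0 recovers the standard source coding case discussed next.

    import numpy as np

    def mmse_binary(eta, gamma=1.0):
        # mmse(theta1 | theta1 + eta*W) for theta1 uniform on {-sqrt(gamma), +sqrt(gamma)}, W ~ N(0,1)
        s = np.sqrt(gamma)
        w = np.linspace(-10, 10, 20001)
        pdf = np.exp(-w ** 2 / 2) / np.sqrt(2 * np.pi)
        # by symmetry, condition on theta1 = +sqrt(gamma); the observation is s + eta*w
        post_mean = s * np.tanh(s * (s + eta * w) / eta ** 2)
        return np.sum((s - post_mean) ** 2 * pdf) * (w[1] - w[0])

    gamma, eps = 1.0, 0.0
    for R in (0.25, 0.5, 1.0, 2.0):
        eta = np.sqrt((eps ** 2 + 2 ** (-2 * R) * gamma) / (1 - 2 ** (-2 * R)))
        print(f"R={R}: D_sp={mmse_binary(eta, gamma):.4f}  D_G={2 ** (-2 * R) * gamma:.4f}")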

[Fig. 3: plot of MSE versus coding bitrate R comparing the curves D_sp(R), D_G(R), and D_Sh(R).]
Fig. 3. MSE in encoding an equiprobable binary signal versus coding bitrate R: D_sp(R) is achievable using a RSC, D_G(R) is the Gaussian distortion-rate function achievable using a Gaussian code, and D_Sh(R) is Shannon's distortion-rate function describing the minimal achievable distortion under any bitrate-R code.

B. Standard Source Coding

An interesting special case of the setting of Theorem 8 arises in the limit γ/ε² → ∞, in which the noisy source coding problem reduces to a standard one. The minimal achievable distortion for this problem is known to be Shannon's DRF of θ, denoted here by D_Sh(R). Since Thm. 8 implies that there exists a spherical code attaining

  D_sp(R) ≜ D_sp(R, 0) = mmse(θ_1 | θ_1 + √(γ/(2^{2R} − 1)) W_1),

we conclude that D_sp(R) is an achievable distortion for this problem. We further note that if θ is zero mean, then

  D_Sh(R) ≤ D_sp(R) ≤ D_G(R) ≜ 2^{−2R} γ,  (18)

with equality if and only if P_{θ_1} = N(0, γ). The fact that the Gaussian DRF D_G(R) is achievable for this problem already follows from [7], [9]. Our results imply that one can usually do much better than the Gaussian DRF by applying the optimal Bayes estimator of θ from θ + √(γ/(2^{2R} − 1)) W to an appropriately scaled version of the encoded representation.

Figure 3 illustrates the distortion attained by RSC compared to Shannon's and the Gaussian DRFs in encoding an i.i.d. signal uniform over {−1, 1}. This figure reveals a disadvantage of RSC in this setting: it does not attain the Shannon limit unless θ is Gaussian. In particular, for P_θ with a finite entropy, D_sp(R) > 0 for any R whereas D_Sh(R) equals zero for any R exceeding this entropy. In this sense, RSC can be seen as a compromise between the optimal indirect source coding scheme that requires the full specification of P_{θ|X} at the encoder [11] and the Gaussian coding scheme that assumes the least favorable prior on X.

C. Inference in High-Dimensional Linear Models

Consider the case where X = Aθ + εW with A ∈ R^{n×d} and W ∼ N(0, I). The corresponding relation between θ and Z is given by

  Z = Aθ + ξW,  (19)

where ξ ≜ ξ(snr) ≜ √(ε² + E[‖X‖²]/(n snr)). We use the approximate message passing (AMP) algorithm [23] to estimate θ from Z, and evaluate the Bayes risk of this estimator using the state evolution property of AMP. Specifically, let θ^T_AMP(z) denote the result of T iterations of the AMP algorithm applied to z ∈ R^n.
Under the assumption that θ is i.i.d. with a bounded support, it can be shown that θ^T_AMP has a Lipschitz constant independent of n and d. In addition, let M^T_AMP(snr) = τ²_T be defined by the recursion

  τ²_t = ξ² + (d/n) Var(θ_1),  t = 0,
  τ²_t = ξ² + (d/n) E[(E[θ_1 | θ_1 + τ_{t−1}W_1] − θ_1)²],  t = 1, ..., T.

If we assume that A has i.i.d. entries N(0, 1/n) and n/d → δ ∈ (0, ∞), then the main result of [24] says that

  (1/d) E[‖θ − θ^T_AMP(Z)‖²] = M^T_AMP(snr) + o(1).

Thm. 7 now implies that for almost all R, δ and ε, the limit

  lim_{n→∞} (1/d) E[‖θ − θ^T_AMP(Y)‖²]

exists and is given by M^T_AMP(2^{2R} − 1).
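The state evolution recursion above is straightforward to evaluate once the scalar denoiser E[θ_1 | θ_1 + τ W_1] is available. Below is a small sketch, again assuming the equiprobable binary ±1 prior (so the denoiser is a tanh and the inner expectation reduces to the same one-dimensional quadrature as before); the values of R, ε, d/n, and T are illustrative, and ξ² is formed from the approximation E[‖X‖²]/n ≈ (d/n)E[θ_1²] + ε².

    import numpy as np

    def scalar_mmse_binary(tau):
        # E[(E[theta1 | theta1 + tau*W1] - theta1)^2] for theta1 uniform on {-1, +1}
        w = np.linspace(-10, 10, 20001)
        pdf = np.exp(-w ** 2 / 2) / np.sqrt(2 * np.pi)
        post_mean = np.tanh((1 + tau * w) / tau ** 2)   # condition on theta1 = +1 by symmetry
        return np.sum((1 - post_mean) ** 2 * pdf) * (w[1] - w[0])

    def state_evolution(R, eps, d_over_n, T, EX2_over_n):
        snr = 2 ** (2 * R) - 1
        xi2 = eps ** 2 + EX2_over_n / snr       # xi^2 = eps^2 + E||X||^2 / (n * snr)
        tau2 = xi2 + d_over_n * 1.0             # t = 0, Var(theta1) = 1 for the +/-1 prior
        for _ in range(T):
            tau2 = xi2 + d_over_n * scalar_mmse_binary(np.sqrt(tau2))
        return tau2                              # M_AMP^T(snr)

    d_over_n, eps, R, T = 0.5, 0.1, 1.0, 20
    EX2_over_n = d_over_n * 1.0 + eps ** 2       # E||X||^2/n ~ (d/n)*E[theta1^2] + eps^2
    print("M_AMP^T(2^{2R}-1) ~", state_evolution(R, eps, d_over_n, T, EX2_over_n))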
V. CONCLUSIONS

We considered the problem of estimating an underlying signal from the lossy compressed version of another high dimensional signal. We showed that when the compression is obtained using a RSC of bitrate R, the conditional distribution of the output codeword is close in Wasserstein distance to the conditional distribution of the output of an AWGN channel with SNR 2^{2R} − 1. This equivalence between the noise associated with lossy compression and an AWGN channel allows us to adapt existing techniques for inference from AWGN-corrupted measurements to estimate the underlying signal from the compressed measurements, as well as to characterize their asymptotic performance.

ACKNOWLEDGMENTS

The authors wish to thank Cynthia Rush and David Donoho for helpful discussions and comments.

REFERENCES

[1] R. Gray and D. Neuhoff, "Quantization," IEEE Trans. Inform. Theory, vol. 44, no. 6, pp. 2325–2383, Oct 1998.
[2] W. R. Bennett, "Spectra of quantized signals," Bell System Technical Journal, vol. 27, no. 3, pp. 446–472, 1948.
[3] R. Zamir and M. Feder, "On lattice quantization noise," IEEE Trans. Inform. Theory, vol. 42, no. 4, pp. 1152–1159, Jul 1996.
[4] D. H. Lee and D. L. Neuhoff, "Asymptotic distribution of the errors in scalar and vector quantizers," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 446–460, 1996.
[5] L. Schuchman, "Dither signals and their effect on quantization noise," IEEE Trans. Commun. Technol., vol. 12, no. 4, pp. 162–165, 1964.
[6] P. Delsarte, J.-M. Goethals, and J. J. Seidel, "Spherical codes and designs," in Geometry and Combinatorics. Elsevier, 1991, pp. 68–93.
[7] D. Sakrison, "A geometric treatment of the source encoding of a Gaussian random variable," IEEE Trans. Inform. Theory, vol. 14, no. 3, pp. 481–486, 1968.
[8] A. D. Wyner, "Communication of analog data from a Gaussian source over a noisy channel," The Bell System Technical Journal, vol. 47, no. 5, pp. 801–812, 1968.
[9] A. Lapidoth, "On the role of mismatch in rate distortion theory," IEEE Trans. Inform. Theory, vol. 43, no. 1, pp. 38–47, 1997.
[10] R. Dobrushin and B. Tsybakov, "Information transmission with additional noise," IRE Trans. Inform. Theory, vol. 8, no. 5, pp. 293–304, 1962.
[11] T. Berger, Rate-Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, NJ: Prentice-Hall, 1971.
[12] T. S. Han, "Hypothesis testing with multiterminal data compression," IEEE Trans. Inform. Theory, vol. 33, no. 6, pp. 759–772, 1987.
[13] Z. Zhang and T. Berger, "Estimation via compressed information," IEEE Trans. Inform. Theory, vol. 34, no. 2, pp. 198–211, 1988.
[14] S. Amari et al., "Statistical inference under multiterminal data compression," IEEE Trans. Inform. Theory, vol. 44, no. 6, pp. 2300–2324, 1998.
[15] J. C. Duchi, M. I. Jordan, M. J. Wainwright, and Y. Zhang, "Optimality guarantees for distributed statistical estimation," arXiv preprint arXiv:1405.0782, 2014.
[16] Y. Han, P. Mukherjee, A. Ozgur, and T. Weissman, "Distributed statistical estimation of high-dimensional and nonparametric distributions," in 2018 IEEE International Symposium on Information Theory (ISIT). IEEE, 2018, pp. 506–510.
[17] A. Kipnis and J. C. Duchi, "Mean estimation from adaptive one-bit measurements," in Communication, Control, and Computing (Allerton), 2017 55th Annual Allerton Conference on. IEEE, 2017.
[18] A. Kipnis, G. Reeves, and Y. C. Eldar, "Single letter formulas for quantized compressed sensing with Gaussian codebooks," 2018, available online: http://www.stanford.edu/~kipnisal/Papers/QuantizedCS_ISIT2018.pdf
[19] A. Kipnis, S. Rini, and A. J. Goldsmith, "Compress and estimate in multiterminal source coding," 2017, unpublished. [Online]. Available: https://arxiv.org/abs/1602.02201
[20] A. Kipnis, A. J. Goldsmith, and Y. C. Eldar, "The distortion-rate function of sampled Wiener processes," IEEE Trans. Inform. Theory, vol. 65, no. 1, pp. 482–499, Jan 2019.
[21] G. Murray, A. Kipnis, and A. J. Goldsmith, "Lossy compression of decimated Gaussian random walks," in 2018 52nd Annual Conference on Information Sciences and Systems (CISS), March 2018, pp. 1–6.
[22] A. Kipnis, G. Reeves, Y. C. Eldar, and A. J. Goldsmith, "Fundamental limits of compressed sensing under optimal quantization," in Information Theory (ISIT), 2017 IEEE International Symposium on, 2017.
[23] D. L. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing," Proceedings of the National Academy of Sciences, vol. 106, no. 45, pp. 18914–18919, 2009.
[24] M. Bayati and A. Montanari, "The dynamics of message passing on dense graphs, with applications to compressed sensing," IEEE Trans. Inform. Theory, vol. 57, no. 2, pp. 764–785, Feb 2011.
[25] G. Reeves and H. D. Pfister, "The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact," in Proc. IEEE Int. Symp. Inform. Theory, Barcelona, Spain, Jul. 2016, pp. 665–669.
[26] Y. Wu and S. Verdu, "Functional properties of minimum mean-square error and mutual information," IEEE Trans. Inform. Theory, vol. 58, no. 3, pp. 1289–1301, March 2012.