
Gaussian Approximation of Quantization Error for Estimation from Compressed Data

Alon Kipnis∗ and Galen Reeves†
∗ Department of Statistics, Stanford University
† Department of Electrical and Computer Engineering, Department of Statistical Science, Duke University

Abstract—We consider the statistical connection between the quantized representation of a high dimensional signal X using a random spherical code and the observation of X under additive white Gaussian noise (AWGN). We show that given X, the conditional Wasserstein distance between its bitrate-R quantized version and its observation under AWGN of signal-to-noise ratio 2^{2R} − 1 is sub-linear in the problem dimension. We then utilize this fact to connect the mean squared error (MSE) attained by an estimator based on an AWGN-corrupted version of X to the MSE attained by the same estimator when fed with its bitrate-R quantized version.

[Fig. 1: system diagram. θ generates X ∼ P_{X|θ}; X is encoded at bitrate R (Enc → index in {1, ..., 2^{nR}} → Dec) into Y and, in parallel, corrupted by additive noise N(0, σ²I) into Z.]
Fig. 1. Inference about the parameter vector θ is based on degraded observations of the data X. The effects of bitrate constraints are compared to the effect of additive Gaussian noise by studying the quadratic Wasserstein distance between P_{Y|X} and P_{Z|X}. Under random spherical encoding, we show that this distance is order one, and thus independent of the problem dimensions.

I. INTRODUCTION

Due to the disproportionate size of modern datasets compared to available computing and communication resources, many inference techniques are applied to a compressed representation of the data rather than to the data itself. In the attempt to develop and analyze inference techniques based on a degraded version of the data, it is tempting to model inaccuracies resulting from lossy compression as additive noise. Indeed, there exists a rich literature devoted to the characterization of this "noise", i.e., the difference between the original data and its compressed representation [1]. Nevertheless, due to the difficulty of analyzing non-linear compression operations, this characterization is generally limited to the high bitrate compression regime [2]–[4] or to compression using a dithered quantizer [5].

In this paper, we show that quantizing the data using a random spherical code (RSC) leads to a strong and relatively simple characterization of the distribution of the quantization noise. Specifically, consider the output codeword of a bitrate-R RSC and the output of a Gaussian noise channel with signal-to-noise ratio (SNR) 2^{2R} − 1 illustrated in Fig. 1; our main result says that the conditional Wasserstein distance between these two outputs is independent of the problem dimension. This connection allows us to adapt inference procedures designed for AWGN-corrupted data to be used with the quantized data, as well as to analyze the performance of such procedures.

Spherical codes have multiple theoretical and practical uses in numerous fields [6]. In the context of source coding, Sakrison [7] and Wyner [8] provided a geometric understanding of random spherical coding in a Gaussian setting, and our main result can be seen as an extension of their insights. Specifically, consider the representation of an n-dimensional standard Gaussian vector X using 2^{nR} codewords uniformly distributed over the sphere of radius r = √(n(1 − 2^{−2R})). The left side of Fig. 2, adapted from [7], shows that the angle α between X and its nearest codeword X̂ concentrates as n increases so that sin(α) = 2^{−R} with high probability. As a result, the quantized representation of X and the error X − X̂ are orthogonal, and thus the MSE between X and its quantized representation, averaged over all random codebooks, converges to the Gaussian distortion-rate function (DRF) D_G(R) ≜ 2^{−2R}. In fact, as noted in [9], this Gaussian coding scheme¹ achieves the Gaussian DRF when X is generated by any ergodic source of unit variance, implying that the distribution of ‖X − X̂‖² is independent of the distribution of X as the problem dimension n goes to infinity.

In this paper, we show that a much stronger statement holds for a scaled version of the quantized representation (Y in Fig. 2): in the limit of high dimension, the distribution of Y − X is independent of the distribution of X and is approximately Gaussian. This property of Y − X suggests that the underlying signal or parameter vector θ can now be estimated as if X were observed under AWGN. This paper formalizes this intuition by showing that estimators of θ from an AWGN-corrupted version of X (Z in Fig. 1) attain similar performance when applied to the scaled representation Y.

We emphasize that the scaling of the quantized representation and the estimation of θ are done at the decoder, and thus X is essentially represented by the codeword maximizing the cosine similarity. In particular, no additional information about X or θ is required by the encoder. This situation is in contrast to optimal quantization schemes in indirect source coding [10], [11] and in problems involving estimation from compressed information [12]–[17], which depend on P_{X|θ}. As a result, the random spherical coding scheme we rely on is in general sub-optimal, although it can be applied in situations where θ or the model that generated X are unspecified or unknown. Coding schemes with similar properties were studied in the context of indirect source coding under the name compress-and-estimate in [18]–[21].

To summarize the contributions of this paper, consider the vectors Y and Z illustrated in Fig. 1, describing the output of a bitrate-R RSC applied to X and the output of an AWGN channel with input X, respectively. Our main result shows that the quadratic Wasserstein distance between Y and Z given the input X is sub-linear in the problem dimension n. We further use this result to establish the equivalence

  D_sp(R) ≈ M(snr),  snr = 2^{2R} − 1,  (1)

where D_sp(R) is the MSE of estimating θ from Y as a function of the bitrate, and M(snr) is the MSE of estimating θ under additive Gaussian noise as a function of the SNR in the channel from X to Z. Next, we use (1) to analyze the MSE of Lipschitz continuous estimators of θ from Y by considering their MSE in estimating θ from Z. The benefit is twofold: inference techniques designed for AWGN-corrupted measurements can be applied directly to the quantized measurements to recover θ, and the expected MSE of this recovery can be predicted. Finally, we apply our results to two special cases where the measurement model P_{X|θ} is Gaussian. The first case demonstrates how RSC can be used in conjunction with the prior distribution to attain MSE smaller than D_G(R) when this prior is not Gaussian. The second case demonstrates how to derive an achievable distortion for the quantized compressed sensing problem [18], [22] using the approximate message passing (AMP) algorithm and its state evolution property [23], [24].

The work of G. Reeves was supported in part by funding from the NSF under Grant No. 1750362.
¹We denote this scheme as Gaussian since r is chosen according to the Gaussian DRF.

[Fig. 2: two geometric sketches showing the input sphere of radius √n, the representation sphere, the error sphere, and the angle α between X and its representation (X̂ on the left, Y on the right).]
Fig. 2. Left: In the standard source coding problem [7], [8], the representation sphere is chosen such that the error X̂ − X is orthogonal to the reconstruction X̂. Right: In this paper the representation sphere is chosen such that the error Y − X is orthogonal to the input X.

II. PRELIMINARIES

Our basic setting considers an n-dimensional random vector X with distribution P_{X|θ}, where θ is a d-dimensional random vector with prior P_θ. We refer to X as the measurements and to θ as the underlying signal, and consider the problem of recovering θ from a bitrate-R quantized version of X.

A. Random Spherical Codes

We quantize X using a random spherical code. Such a code consists of a list of M codewords (codebook):

  C_sp(M, n) ≜ {C(1), ..., C(M)},

where each C(i), i = 1, ..., M, is drawn independently from the uniform distribution over the sphere of radius √n in R^n. The quantized representation of X is given by the codeword c(i*) that maximizes the cosine similarity between X and the codewords, namely:

  Enc(X) = c(i*),  i* = arg max_{1 ≤ i ≤ M} ⟨c(i), X⟩ / (‖c(i)‖ ‖X‖).

Let the cosine similarity between the vector X and its quantized representation be denoted by the random variable

  S ≜ ⟨X, Enc(X)⟩ / (‖X‖ ‖Enc(X)‖).

Proposition 1 ([7], [8]). The cosine similarity S is independent of X and has distribution function

  Pr[S ≤ s] = (1 − Q_n(s))^M,  (2)

where Q_n(s) = κ_n ∫_s^1 (1 − t²)^{(n−3)/2} dt and κ_n = Γ(n/2) / (√π Γ((n−1)/2)).

Proposition 2. If M = ⌈2^{nR}⌉ with R > 0, then

  E[(S − √(1 − 2^{−2R}))²] = O( log²n / (n²(2^{2R} − 1)) + 1/n² ),
  E[(√(1 − S²) − 2^{−R})²] = O( log²n / (n² 2^{2R}) + 1/n² ).

Proof. With a bit of work, it can be verified that

  Pr[S ∈ [a, b]] ≥ 1 − exp( −(M κ_n / (n − 1)) (1 − a²)^{(n−1)/2} ) − M κ_n (1 − b²)^{(n−1)/2}

for all 0 ≤ a ≤ b ≤ 1. For any ε ∈ (0, 1) such that log(1/ε) < M κ_n / (n − 1), letting

  a² = 1 − ( (n − 1) log(1/ε) / (M κ_n) )^{2/(n−1)},  (3)
  b² = 1 − ( ε / (M κ_n) )^{2/(n−1)},  (4)

leads to Pr[S ∈ [a, b]] ≥ 1 − 2ε. Since S ∈ [−1, 1], it follows that for any s ∈ [a, b],

  E[|S − s|²] ≤ (b − a)² + 8ε.

Our goal is to show that this term is small with s = √(1 − M^{−2/(n−1)}). Let ε = 1/n². For all

  M > 2(n − 1) log n / κ_n,

it follows that a ≤ S ≤ b with probability greater than 1 − 2/n². This implies that

  ( 1/(n² M κ_n) )^{1/(n−1)} ≤ √(1 − S²) ≤ ( 2(n − 1) log n / (M κ_n) )^{1/(n−1)}

with probability greater than 1 − 2/n². Using the inequality y − x ≤ y log(y/x), we see that the difference between the upper and lower bounds is

  √(1 − a²) − √(1 − b²) = ( 2(n − 1) log n / (M κ_n) )^{1/(n−1)} − ( 1/(n² M κ_n) )^{1/(n−1)}
    ≤ ( 2(n − 1) log n / (M κ_n) )^{1/(n−1)} · log(2n²(n − 1) log n) / (n − 1)
    = O( M^{−1/n} log n / n ).

Hence, we conclude that

  E[(√(1 − S²) − M^{−1/(n−1)})²] = O( (M^{−2/n} log²n + 1) / n² ).

To deal with S, observe that

  (b − a) / (√(1 − a²) − √(1 − b²)) = (√(1 − a²) + √(1 − b²)) / (b + a) ≤ 2/b.  (5)

Thus,

  E[(S − √(1 − M^{−2/(n−1)}))²] = O( M^{−2/n} log²n / ((1 − M^{−2/n}) n²) + 1/n² ).  (6)
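A minimal simulation sketch makes the concentration in Proposition 2 concrete. The dimension n, rate R, and trial count below are arbitrary illustrative choices (the codebook size M = ⌈2^{nR}⌉ grows exponentially in n, so only small instances are feasible), and at such small n the agreement with the asymptotic values is only approximate.

    import numpy as np

    def random_spherical_codebook(M, n, rng):
        # M codewords drawn uniformly from the sphere of radius sqrt(n) in R^n
        C = rng.standard_normal((M, n))
        C *= np.sqrt(n) / np.linalg.norm(C, axis=1, keepdims=True)
        return C

    def encode(X, C):
        # codeword maximizing the cosine similarity with X
        cos = (C @ X) / (np.linalg.norm(C, axis=1) * np.linalg.norm(X))
        i_star = np.argmax(cos)
        return C[i_star], cos[i_star]

    rng = np.random.default_rng(0)
    n, R = 24, 0.5                        # toy instance, M = 2^{nR} = 4096
    M = int(np.ceil(2 ** (n * R)))
    target = np.sqrt(1 - 2 ** (-2 * R))   # Proposition 2: S concentrates here

    S_samples = []
    for _ in range(200):
        X = rng.standard_normal(n)        # any source works; S is independent of X
        C = random_spherical_codebook(M, n, rng)
        _, S = encode(X, C)
        S_samples.append(S)

    print("mean S          :", np.mean(S_samples))
    print("sqrt(1-2^-2R)   :", target)
    print("mean sqrt(1-S^2):", np.mean(np.sqrt(1 - np.square(S_samples))), "vs 2^-R =", 2 ** (-R))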

X as a function of the bitrate, and M(snr) is the MSE of κ −M n (1−a2)(n−1)/2 2 (n−1)/2 estimating θ under additive Gaussian noise as a function of Pr[S ∈ [a, b]] ≥ 1 − e n−1 − Mκn(1 − b ) the SNR in the channel from X to Z. Next, we use (1) to for all 0 ≤ a ≤ b ≤ 1. For any  ∈ (0, 1) such that log(1/) < analyze the MSE of Lipschitz continuous estimators of θ Mκn/(n − 1), letting from Y by considering their MSE in estimating θ from Z. The benefit is twofold: Allowing inference techniques from (n − 1) log(1/)2/(n−1) a2 = 1 − (3) AWGN-corrupted measurements to be applied directly on the Mκn quantized measurements to recover θ, as well as providing the 2/(n−1)    expected MSE in this recovery. Finally, we apply our results b2 = 1 − , (4) Mκn to two special cases where the measurement model PX|θ is Gaussian. The first case demonstrates how RSC can be used in leads to Pr[S ∈ [a, b]] ≥ 1 − 2. Since S ∈ [−1, 1], it follows conjunction with the prior distribution to attain MSE smaller that for any s ∈ [a, b], than D (R) when this prior is not Gaussian. The second case G  2 2 demonstrates how to derive an achievable distortion for the E |S − s| ≤ (b − a) + 8 quantized compressed sensing problem [18], [22] using the Or goal is to show that this term is small with approximate message passing (AMP) algorithm and its state p evolution property [23], [24]. s = 1 − M −2/(n−1) Let  = 1/n2. For all Remark 1. The difference between the conditional and 2(n − 1) log n unconditional Wasserstein distances in (8) can be arbitrarily M > large. For example, suppose that U is uniform on {µ, −µ} κn and let PX|U=u = N (u, I) and PY |U=u = N (−u, I). Then 2 it follows that a ≤ S ≤ b with probability greater than 1−2/n . W2(X,Y ) = 0 but W2(X,Y | U) = W2(N (2µ, I), N (0,I)), This implies that which increases without bound as kµk increases .  1/(n−1)  1/(n−1) 1 p 2 2(n − 1) log n Given a Markov chain θ → X → (Y,Z), the conditional 2 ≤ 1 − S ≤ 2n Mκn Mκn Wasserstein distance associated with PY,Z|X provides a natural upper bound on the difference between the marginals P with probability greater than 1 − 2/n2. Using the inequality θ,Y and P . For the following result, recall that a function f : y − x ≤ y log(y/x) we see that the difference between the θ,Z n → d is L-Lipschitz if kf(u) − f(v)k ≤ Lku − vk for all upper and lower bounds is R R u, v ∈ Rn p 2 p 2 1 − a − 1 − b d n Proposition 3. Let Pθ,X be a distribution on R × R and  1/(n−1)  1/(n−1) n 2(n − 1) log n 1 let PY |X and PZ|X be conditional distributions from R to = − 2 n n d Mκn 2n Mκn R . For any L-Lipshitz function f : R → R , we have  1/(n−1) 2  2(n − 1) log n log 4n (n − 1) log n p 2 p 2 ≤ E[kθ − f(Y )k ] − E[kθ − f(Z)k ] ≤ LW2(Y,Z | X), Mκn n − 1 M −1/n log n provided that the expectations exist. = O n Proof. For any Markov chain θ → X → (Y,Z), the triangle Hence, we conclude that inequality yields p  p 2 M −2/n log2 n + 1 E[kθ − f(Y )k2] 1 − S2 − M −1/(n−1) = O E 2 p p n ≤ E[kθ − f(Z)k2] + E[kf(Y ) − f(Z)k2] To deal with the S, observe that ≤ p [kθ − f(Z)k2] + Lp [kY − Zk2]. √ √ E E b − a 1 − a2 + 1 − b2 2 √ √ = ≤ (5) Taking the infimum over all possible couplings of (X,Y,Z) 1 − a2 − 1 − b2 (b + a) b leads to one direction of the stated inequality. Interchanging Thus, the role of Y and Z completes the proof.    
−2/n 2  p 2 M log n 1 III.QUANTIZATION ERRORVERSUS GAUSSIAN NOISE S − 1 − M −2/(n−1) = O + E −2/n 2 2 1 − M n n This section addresses the extent to which the quantization (6) error induced by random spherical coding can be approximated by isotropic Gaussian noise. Given an integer M and positive number ρ, let P be the conditional distribution of B. Wasserstein Distance and Lipschitz Continuity Y |X The quadratic Wasserstein distance between Rn-valued Y = ρ Enc(X), (9) random vectors X and Y is defined as obtained by marginalizing over the randomness in the codebook. q √ 2 By construction, Y is supported on the sphere of radius nρ. W2(X,Y ) = W2(PX ,PY ) , inf E[kX − Y k2], For comparison, we define the Gaussian noise observation where the infimum is over all joint distributions (X,Y ) model PZ|X according to satisfying the marginal constraints X ∼ PX and Y ∼ PY . The quadratic Wasserstein distance is a metric on the space of Z = X + σW, (10) distributions with finite second moments. where W ∼ N (0,I) is standard Gaussian independent of X For the purposes of this paper, it is also useful to intro- and σ2 is the noise power. duce the conditional Wasserstein distance as follows. Given conditional distributions PX|U and PY |U and PU , we define A. Finite-Length Bounds Z Theorem 4. Let PY |Z and PZ|X be the conditional distribution 2 2 W2 (X,Y | U) , W2 (PX|U=u,PY |U=u) dPU . (7) defined by (9) and (10), respectively. For any distribution PX n on R , there exists a joint distribution PX,Y,Z with marginals Equivalently, we have W 2(X,Y | U) = inf kX − Y k2, 2 E 2 PX , PY |Z , and PZ|X , such that where the infimum is over all couplings of (U, X, Y ) satisfying √ √ √ 2 2 2 2 the marginal constraints (U, X) ∼ PU,X and (U, Y ) ∼ PU,Y . kY − Zk = ( nρS − kXk − σU) + ( nρ 1−S − σV ) , Notice that where (S, U, V, X) are mutually independent with U ∼ W2(X,Y ) ≤ W2(PU,X ,PU,Y ) ≤ W2(X,Y | U). (8) N (0, 1), V ∼ χn−1, and S is distributed according to (2). Proof. For any (U, V, S) satisfying the marginal constraints, Combining Proposition 3 and Theorem 5 provides a mean- let (X,X⊥) be drawn independently of (U, V, S) where the ingful comparison whenever the Lipschitz constant times the ⊥ marginal of X is according to PX and the marginal of X variance of kXk is sufficiently√ small. For example, one setting is uniform over the set {u ∈ Rn, uT X = 0, kuk = 1}. Next, of interest is if L = o( d) and Var kθk = O(1), since in this construct a coupling on (Y,Z) given X according to case the difference between the normalized MSE in estimating θ using f(Y ) and f(Z) converges to zero. √  X √ p ⊥ Y = X + nρS − kXk + nρ 1 − S2X (11) A useful bound on Var(kXk) is as follows: kXk X Lemma 6. Let X be a random vector with finite fourth mo- Z = X + σ U + σV X⊥. (12) √ √ kXk ments. If there exists c > 0 such that Pr[kXk ≤ c n] ≤ 1/ n, then Under this coupling we have Var(kXk2) c kY − Zk2 = Var(kXk) ≤ + . nc2 2 2 X √  ⊥√ p  2 nρS − kXk − σU + X nρ 1 − S2 − σV Proof. Let A and B be independent copies of kXk and let kXk √ √ E = {( A + B)2 ≤ δ2}. Then, √ √ 2 2 p 2 = nρS − kXk − σU + nρ 1 − S − σV , 1 √ √ 2 Var(kXk) = E A − B where the last transition is because XT X⊥ = 0. 2 1 √ √ 2  1 √ √ 2  = A − B 1 + A − B 1c . Theorem 4 is general in the sense that it holds for any 2E E 2E E choice of the parameters (M, ρ, σ). Optimizing over (ρ, σ) as a function of E[kXk] and M leads to the following upper By the definition of E, we have have Pr[E] ≤ 2 and so bound on the conditional Wasserstein distance.  
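The coupling in the proof of Theorem 4 is easy to exercise numerically: run the encoder, extract the orthogonal decomposition of Enc(X), build Z from the same (U, V), and check that ‖Y − Z‖² matches the two-term scalar expression while remaining far smaller than ‖X‖². The sketch below is purely illustrative; the sizes are small, and the choices of ρ and σ are made in the spirit of the scaling used in Theorem 5 below, with E[‖X‖] replaced by the realized norm ‖X‖ for this single draw.

    import numpy as np

    rng = np.random.default_rng(1)
    n, R = 24, 0.5
    M = int(np.ceil(2 ** (n * R)))

    # draw X and a random spherical codebook, and encode as in Section II-A
    X = rng.standard_normal(n)
    C = rng.standard_normal((M, n))
    C *= np.sqrt(n) / np.linalg.norm(C, axis=1, keepdims=True)
    cos = (C @ X) / (np.linalg.norm(C, axis=1) * np.linalg.norm(X))
    enc = C[np.argmax(cos)]
    S = np.max(cos)

    # orthogonal decomposition of Enc(X): sqrt(n)*S along X/||X|| plus sqrt(n(1-S^2)) along X_perp
    x_hat = X / np.linalg.norm(X)
    X_perp = enc - (enc @ x_hat) * x_hat
    X_perp /= np.linalg.norm(X_perp)

    rho = np.linalg.norm(X) / np.sqrt(n * (1 - 2 ** (-2 * R)))   # per-draw stand-in for E||X||
    sigma = np.linalg.norm(X) / np.sqrt(n * (2 ** (2 * R) - 1))

    U = rng.standard_normal()
    V = np.sqrt(rng.chisquare(n - 1))            # V ~ chi_{n-1}

    Y = rho * enc                                 # eq. (9); equals eq. (11) by construction
    Z = X + sigma * x_hat * U + sigma * V * X_perp  # eq. (12)

    lhs = np.sum((Y - Z) ** 2)
    rhs = (np.sqrt(n) * rho * S - np.linalg.norm(X) - sigma * U) ** 2 \
          + (np.sqrt(n) * rho * np.sqrt(1 - S ** 2) - sigma * V) ** 2
    print("||Y - Z||^2 =", lhs, " two-term formula =", rhs, " ||X||^2 =", np.sum(X ** 2))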
Theorem 4 is general in the sense that it holds for any choice of the parameters (M, ρ, σ). Optimizing over (ρ, σ) as a function of E[‖X‖] and M leads to the following upper bound on the conditional Wasserstein distance.

Theorem 5. Let P_X be a distribution on R^n with finite second moments. Given R > 0, let M = ⌈2^{nR}⌉ and put

  ρ = E[‖X‖] / √(n(1 − 2^{−2R})),  σ = E[‖X‖] / √(n(2^{2R} − 1)).

Then, the conditional Wasserstein distance between the distributions defined in (9) and (10) satisfies

  W_2²(Y, Z | X) ≤ Var(‖X‖) + 2σ² + C_R E[‖X‖]² log²n / n²,

where C_R is a constant that depends only on R.

Proof Sketch. Under the coupling of (X, Y, Z) described in Theorem 4, one obtains

  E[‖Y − Z‖²] = E[(√n ρ S − E[‖X‖])²] + Var(‖X‖) + σ²
    + E[(√n ρ √(1 − S²) − σ E[V])²] + σ² Var(V).

Standard properties of chi random variables yield Var(V) ≤ 1 and √(n − 2) ≤ E[V] ≤ √(n − 1). The terms involving S can then be upper bounded using Proposition 2.

The significance of Theorem 5 is that the second and third terms are inversely proportional to the dimension of the signal. Dividing both sides by the expected norm yields

  W_2(Y, Z | X) / E[‖X‖] ≤ √(Var(‖X‖)) / E[‖X‖] + O(1/√n).

It is also interesting to note that the signal-to-noise ratio of the Gaussian observation model (10) depends primarily on the code bitrate R, with

  E[‖X‖]² / (n σ²) = 2^{2R} − 1.

Combining Proposition 3 and Theorem 5 provides a meaningful comparison whenever the Lipschitz constant times the variance of ‖X‖ is sufficiently small. For example, one setting of interest is when L = o(√d) and Var(‖θ‖) = O(1), since in this case the difference between the normalized MSEs in estimating θ using f(Y) and f(Z) converges to zero.

A useful bound on Var(‖X‖) is as follows:

Lemma 6. Let X be a random vector with finite fourth moments. If there exists c > 0 such that Pr[‖X‖ ≤ c√n] ≤ 1/√n, then

  Var(‖X‖) ≤ Var(‖X‖²) / (n c²) + c²/2.

Proof. Let A and B be independent copies of ‖X‖² and let E = {(√A + √B)² ≤ δ²}. Then,

  Var(‖X‖) = (1/2) E[(√A − √B)²] = (1/2) E[(√A − √B)² 1_E] + (1/2) E[(√A − √B)² 1_{E^c}].

By the definition of E, we have Pr[E] ≤ ε², where ε ≜ Pr[‖X‖ ≤ δ], and so

  E[(√A − √B)² 1_E] ≤ E[(√A + √B)² 1_E] ≤ δ² ε².

Meanwhile,

  E[(√A − √B)² 1_{E^c}] = E[ ((A − B)² / (√A + √B)²) 1_{E^c} ] ≤ E[(A − B)²] / δ² = 2 Var(‖X‖²) / δ².

The proof is completed by combining these inequalities and setting δ = c√n, for which ε ≤ 1/√n by assumption.

B. Asymptotic Equivalence

In the context of the problem of estimating θ from a quantized version of X, our results can be stated as follows. We consider a sequence of distributions on (θ, X) where both d and n scale to infinity with

  (1/n) E[‖X‖²] = γ + o(1),  Var(‖X‖) = O(1).  (13)

We further suppose that there exists a nonincreasing function M : R₊ → R₊ with M(0) < ∞ and a sequence of estimators θ̂ = θ̂_{d,n,σ} such that for almost all σ ∈ R₊, we have

  Lip(θ̂) ≤ o(√d),  (14)
  (1/d) E[‖θ − θ̂(Z)‖²] = M(snr) + o(1),  (15)

where snr = γ/σ². The following result follows immediately from Proposition 3 and Theorem 5.

Theorem 7. For almost all R ∈ R₊, the MSE distortion of random spherical encoding with decoding function θ̂ satisfies

  (1/d) E[‖θ − θ̂(Y)‖²] = M(2^{2R} − 1) + o(1).  (16)

Proof. Theorem 7 is an immediate corollary of Theorem 5 and Proposition 3.

Remark 2. The assumption that (15) holds almost everywhere allows for the possibility that M(snr) has a countable number of jump discontinuities. This can arise, for example, in the context of random linear estimation [25].
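Theorem 7 says that a Lipschitz estimator tuned for the AWGN channel at snr = 2^{2R} − 1 attains essentially the same MSE on the scaled RSC output. A minimal illustration, under the simplifying assumptions that θ has i.i.d. N(0, γ) entries and is observed directly (X = θ), so that the Bayes estimator is a 1-Lipschitz linear shrinkage; the sizes are small, so the match with the prediction γ·2^{−2R} is only rough.

    import numpy as np

    rng = np.random.default_rng(2)
    n, R = 24, 0.5                        # toy sizes; M = 2^{nR} codewords
    M = int(np.ceil(2 ** (n * R)))
    snr = 2 ** (2 * R) - 1
    gamma = 1.0                           # prior variance of each coordinate of theta

    def rsc_encode(X, rng):
        C = rng.standard_normal((M, n))
        C *= np.sqrt(n) / np.linalg.norm(C, axis=1, keepdims=True)
        cos = (C @ X) / (np.linalg.norm(C, axis=1) * np.linalg.norm(X))
        return C[np.argmax(cos)]

    # linear (1-Lipschitz) Bayes estimator for theta ~ N(0, gamma I) at this snr
    sigma2 = gamma / snr
    shrink = lambda v: (gamma / (gamma + sigma2)) * v
    rho = np.sqrt(gamma / (1 - 2 ** (-2 * R)))   # Theorem 5 scaling with E||X|| ~ sqrt(n*gamma)

    mse_Y, mse_Z = [], []
    for _ in range(100):
        theta = np.sqrt(gamma) * rng.standard_normal(n)   # here X = theta
        X = theta
        Y = rho * rsc_encode(X, rng)
        Z = X + np.sqrt(sigma2) * rng.standard_normal(n)
        mse_Y.append(np.mean((theta - shrink(Y)) ** 2))
        mse_Z.append(np.mean((theta - shrink(Z)) ** 2))

    print("MSE from quantized Y  :", np.mean(mse_Y))
    print("MSE from AWGN      Z  :", np.mean(mse_Z))
    print("prediction gamma*2^-2R:", gamma * 2 ** (-2 * R))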
IV. APPLICATIONS AND EXAMPLES

This section provides some specific examples of problem settings satisfying the assumptions of Theorem 7. Given jointly random vectors (θ, U) on R^d × R^n, we use

  mmse(θ | U) ≜ (1/d) E[‖θ − E[θ | U]‖²],

the quadratic Bayes optimal risk in estimating θ from U. In the following we give some examples where the Bayes optimal estimator of θ from Z satisfies the Lipschitz condition (14), and thus M(snr) corresponds to mmse(θ | Z).

A. Noisy Source Coding

Consider the additive Gaussian noise model

  X = θ + εW,

where W ∼ N(0, I). The problem of quantizing X at bitrate R and estimating θ from this quantized version is an instance of the noisy (a.k.a. indirect or remote) source coding problem [10], [11]. An achievable distortion for this problem using RSC is given by the following theorem.

Theorem 8. Assume that the entries of θ are i.i.d. from a distribution with second moment γ and finite fourth moment. There exists a bitrate-R spherical code achieving MSE distortion

  D_sp(R, ε) ≜ mmse(θ_1 | θ_1 + ηW),

where W ∼ N(0, 1) and

  η = η(R, ε²) = √( (ε² + 2^{−2R}γ) / (1 − 2^{−2R}) ).

Proof. We present two different proofs. The first is obtained by approximating the distribution of the input using a distribution for which the Bayes optimal estimator is Lipschitz. The second is obtained by approximating the estimator using a Lipschitz function.

Proof I: We prove the result first for the case where P_θ has a bounded support [−B, B]. Consider the Bayes rule θ̂*(z) ≜ E[θ | Z = z] of θ from Z = X + σW. Since P_{X|θ} = N(θ, ε²I), we have

  ∇_z θ̂*(z) = (1/(ε² + σ²)) Cov(θ | Z = z).

The assumption that θ is i.i.d. implies

  ‖∇_z θ̂*(z)‖ = (1/(ε² + σ²)) max_i Var(θ_i | Z_i = z_i) = (1/(ε² + σ²)) max_i Var(θ_1 | Z_1 = z_i).

Hence, since the entries are bounded, we have

  Lip(θ̂*) = sup_z ‖∇_z θ̂*(z)‖ ≤ B² / (ε² + σ²).

The risk of θ̂* is given by

  mmse(θ_1 | θ_1 + √(ε² + σ²) W).

We see that conditions (13), (14), and (15) are met with

  M(snr) ≜ mmse(θ_1 | θ_1 + √(ε² + (γ + ε²)/snr) W).

Thm. 7 implies that the limit

  lim_{n→∞} (1/n) E[‖θ − θ̂*(Y)‖²]

exists and equals

  M(2^{2R} − 1) = mmse(θ_1 | θ_1 + ηW).

For the general case where θ is not necessarily bounded, we consider the random vector θ_B obtained by trimming each coordinate of θ to [−B, B] and its corresponding Bayes rule

  θ̂*_B(z) = E[θ_B | θ_B + √(σ² + ε²) W].

From the first part of the proof, we have

  (1/n) E[‖θ − θ̂*_B(Y)‖²] = A_{n,B} + (1/n) E[‖θ_B − θ̂*_B(Y)‖²]
    = A_{n,B} + M_B(2^{2R} − 1) + o(1),  (17)

where M_B(snr) is defined as M(snr) with θ_1 replaced by the trimmed variable θ_{B,1}, and

  A_{n,B} = (1/n) E[‖θ − θ_B‖²] − (2/n) E[⟨θ − θ_B, θ_B − θ̂*_B(Y)⟩].

Since θ_1 has a finite p-th moment for p > 2, we have

  (1/n) E[‖θ − θ_B‖]² ≤ (1/n) E[‖θ − θ_B‖²] = E[θ_1² 1_{[B,∞)}(|θ_1|)] ≤ E[|θ_1|^{2+p}] B^{−p}.

This implies

  A_{n,B} ≤ (1/n) E[‖θ − θ_B‖²] + 2 √( (1/n²) E[‖θ − θ_B‖²] E[‖θ_B − θ̂*_B(Y)‖²] )
    ≤ E[|θ_1|^{2+p}] B^{−p} + 2 √( E[|θ_1|^{2+p}] B^{−p} (M_B(2^{2R} − 1) + O(1)) )
    = O(B^{−p/2}).

Finally, since θ_B → θ in distribution as B → ∞, we have that M_B(snr) → M(snr) [26, Thm. 4]. To summarize, we conclude that for any ε′ > 0, there exists B large enough such that, for all n ∈ N, A_{n,B} ≤ ε′/3. Given such B, one may take n large enough such that the o(1) term in (17), as well as the difference |M(2^{2R} − 1) − M_B(2^{2R} − 1)|, are smaller than ε′/3. For such B and n, we have that

  (1/n) E[‖θ − θ̂*_B(Y)‖²] ≤ M(2^{2R} − 1) + ε′.

Proof II: Consider the Bayes rule θ̂*(z) ≜ E[θ | Z = z] of θ from Z = X + σW. The Bayes optimal risk is given by

  mmse(θ | Z) = mmse(θ_1 | θ_1 + √(σ² + ε²) W) = mmse(θ_1 | θ_1 + √(ε² + (γ + ε²)/snr) W) =: M(snr).

Since P_{X|θ} = N(θ, ε²I) and θ is i.i.d., each coordinate of θ̂*(z) is analytic on R and hence has a bounded derivative over [−t, t] for any t > 0. It follows that the estimator g defined as

  g_i(z) = arg min_{u ∈ [−t, t]} |u − θ̂*_i(z)|,  i = 1, ..., d,

is Lipschitz continuous on R^n with Lipschitz constant equal to

  sup_{z ∈ [−t, t]} (d/dz) θ̂*_i(z) = sup_{z ∈ [−t, t]} (1/(ε² + σ²)) Var(θ_1 | Z_1 = z).

Since θ is i.i.d. with a finite second moment, we have that

  (1/n) E[‖θ − g(Z)‖²] = E[(θ_1 − g_1(Z_1))²] = E[(θ_1 − g_1(θ_1 + √(ε² + (γ + ε²)/snr) W))²] =: M_g(snr).

Furthermore, the regret of g satisfies

  M_g(snr) − M(snr) = (1/n) E[‖θ̂*(Z) − g(Z)‖²]
    = E[(θ̂*_1(Z) − g_1(Z))²] = E[(θ̂*_1(Z) − g_1(Z))² 1_{[t,∞)}(|θ̂*_1(Z)|)]
    ≤ E[(θ̂*_1(Z))² 1_{[t,∞)}(|θ̂*_1(Z)|)] ≤ t^{−p} E[|θ̂*_1(Z)|^{2+p}] ≤ t^{−p} E[|θ_1|^{2+p}].

Since θ is i.i.d. with a finite second moment, it satisfies the conditions of Lemma 6 and so Var(‖θ‖) = O(1). We see that conditions (13), (14), and (15) are met for the estimator g with an asymptotic MSE function M_g(snr). Theorem 7 implies that the limit

  lim_{n→∞} (1/n) E[‖θ − g(Y)‖²]

exists and equals

  M_g(2^{2R} − 1) ≤ M(2^{2R} − 1) + t^{−p} E[|θ_1|^{2+p}].

Since t may be chosen arbitrarily large,

  D_sp(R, ε) ≜ M(2^{2R} − 1) = mmse(θ_1 | θ_1 + ηW)

is achievable.
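To make Theorem 8 concrete, D_sp(R, ε) = mmse(θ_1 | θ_1 + ηW) can be evaluated numerically for a given prior. The sketch below does this for an equiprobable binary prior θ_1 ∈ {−√γ, +√γ} (the prior used in Fig. 3 below, for which the posterior mean is a tanh shrinker), using a simple one-dimensional quadrature; the grid of bitrates is an arbitrary illustrative choice, and setting ε = 0 recovers the standard source coding case discussed next.

    import numpy as np

    def mmse_binary(eta, gamma=1.0):
        # mmse(theta1 | theta1 + eta*W) for theta1 uniform on {-sqrt(gamma), +sqrt(gamma)}, W ~ N(0,1)
        s = np.sqrt(gamma)
        w = np.linspace(-10, 10, 20001)
        pdf = np.exp(-w ** 2 / 2) / np.sqrt(2 * np.pi)
        # by symmetry, condition on theta1 = +sqrt(gamma); the observation is s + eta*w
        post_mean = s * np.tanh(s * (s + eta * w) / eta ** 2)
        return np.sum((s - post_mean) ** 2 * pdf) * (w[1] - w[0])

    gamma, eps = 1.0, 0.0
    for R in (0.25, 0.5, 1.0, 2.0):
        eta = np.sqrt((eps ** 2 + 2 ** (-2 * R) * gamma) / (1 - 2 ** (-2 * R)))
        print(f"R={R}: D_sp={mmse_binary(eta, gamma):.4f}  D_G={2 ** (-2 * R) * gamma:.4f}")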

[Fig. 3: plot of MSE versus coding bitrate R comparing the curves D_sp(R), D_G(R), and D_Sh(R).]
Fig. 3. MSE in encoding an equiprobable binary signal versus coding bitrate R: D_sp(R) is achievable using a RSC, D_G(R) is the Gaussian distortion-rate function achievable using a Gaussian code, and D_Sh(R) is Shannon's distortion-rate function describing the minimal achievable distortion under any bitrate-R code.

B. Standard Source Coding

An interesting special case of the setting of Theorem 8 arises in the limit γ/ε² → ∞, in which the noisy source coding problem reduces to a standard one. The minimal achievable distortion for this problem is known to be Shannon's DRF of θ, denoted here by D_Sh(R). Since Thm. 8 implies that there exists a spherical code attaining

  D_sp(R) ≜ D_sp(R, 0) = mmse(θ_1 | θ_1 + √(γ/(2^{2R} − 1)) W_1),

we conclude that D_sp(R) is an achievable distortion for this problem. We further note that if θ is zero mean, then

  D_Sh(R) ≤ D_sp(R) ≤ D_G(R) ≜ 2^{−2R} γ,  (18)

with equality if and only if P_{θ_1} = N(0, γ). The fact that the Gaussian DRF D_G(R) is achievable for this problem already follows from [7], [9]. Our results imply that one can usually do much better than the Gaussian DRF by applying the optimal Bayes estimator of θ from θ + √(γ/(2^{2R} − 1)) W to an appropriately scaled version of the encoded representation.

Figure 3 illustrates the distortion attained by RSC compared to Shannon's and the Gaussian DRFs in encoding an i.i.d. signal uniform over {−1, 1}. This figure reveals a disadvantage of RSC in this setting: it does not attain the Shannon limit unless θ is Gaussian. In particular, for P_θ with a finite entropy, D_sp(R) > 0 for any R whereas D_Sh(R) equals zero for any R exceeding this entropy. In this sense, RSC can be seen as a compromise between the optimal indirect source coding scheme that requires the full specification of P_{θ|X} at the encoder [11] and the Gaussian coding scheme that assumes the least favorable prior on X.

C. Inference in High-Dimensional Linear Models

Consider the case where X = Aθ + εW with A ∈ R^{n×d} and W ∼ N(0, I). The corresponding relation between θ and Z is given by

  Z = Aθ + ξW,  (19)

where ξ ≜ ξ(snr) ≜ √(ε² + E[‖X‖²]/(n snr)). We use the approximate message passing (AMP) algorithm [23] to estimate θ from Z, and evaluate the Bayes risk of this estimator using the state evolution property of AMP. Specifically, let θ^T_AMP(z) denote the result of T iterations of the AMP algorithm applied to z ∈ R^n.
Under the assumption that θ is i.i.d. with a bounded support, it can be shown that θ^T_AMP has a Lipschitz constant independent of n and d. In addition, let M^T_AMP(snr) = τ²_T be defined by the recursion

  τ²_t = ξ² + (d/n) Var(θ_1),  t = 0,
  τ²_t = ξ² + (d/n) E[(E[θ_1 | θ_1 + τ_{t−1}W_1] − θ_1)²],  t = 1, ..., T.

If we assume that A has i.i.d. entries N(0, 1/n) and n/d → δ ∈ (0, ∞), then the main result of [24] says that

  (1/d) E[‖θ − θ^T_AMP(Z)‖²] = M^T_AMP(snr) + o(1).

Thm. 7 now implies that for almost all R, δ and ε, the limit

  lim_{n→∞} (1/d) E[‖θ − θ^T_AMP(Y)‖²]

exists and is given by M^T_AMP(2^{2R} − 1).
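The state evolution recursion above is straightforward to evaluate once the scalar denoiser E[θ_1 | θ_1 + τ W_1] is available. Below is a small sketch, again assuming the equiprobable binary ±1 prior (so the denoiser is a tanh and the inner expectation reduces to the same one-dimensional quadrature as before); the values of R, ε, d/n, and T are illustrative, and ξ² is formed from the approximation E[‖X‖²]/n ≈ (d/n)E[θ_1²] + ε².

    import numpy as np

    def scalar_mmse_binary(tau):
        # E[(E[theta1 | theta1 + tau*W1] - theta1)^2] for theta1 uniform on {-1, +1}
        w = np.linspace(-10, 10, 20001)
        pdf = np.exp(-w ** 2 / 2) / np.sqrt(2 * np.pi)
        post_mean = np.tanh((1 + tau * w) / tau ** 2)   # condition on theta1 = +1 by symmetry
        return np.sum((1 - post_mean) ** 2 * pdf) * (w[1] - w[0])

    def state_evolution(R, eps, d_over_n, T, EX2_over_n):
        snr = 2 ** (2 * R) - 1
        xi2 = eps ** 2 + EX2_over_n / snr       # xi^2 = eps^2 + E||X||^2 / (n * snr)
        tau2 = xi2 + d_over_n * 1.0             # t = 0, Var(theta1) = 1 for the +/-1 prior
        for _ in range(T):
            tau2 = xi2 + d_over_n * scalar_mmse_binary(np.sqrt(tau2))
        return tau2                              # M_AMP^T(snr)

    d_over_n, eps, R, T = 0.5, 0.1, 1.0, 20
    EX2_over_n = d_over_n * 1.0 + eps ** 2       # E||X||^2/n ~ (d/n)*E[theta1^2] + eps^2
    print("M_AMP^T(2^{2R}-1) ~", state_evolution(R, eps, d_over_n, T, EX2_over_n))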
V. CONCLUSIONS

We considered the problem of estimating an underlying signal from the lossy compressed version of another high dimensional signal. We showed that when the compression is obtained using a RSC of bitrate R, the conditional distribution of the output codeword is close in Wasserstein distance to the conditional distribution of the output of an AWGN channel with SNR 2^{2R} − 1. This equivalence between the noise associated with lossy compression and an AWGN channel allows us to adapt existing techniques for inference from AWGN-corrupted measurements to estimate the underlying signal from the compressed measurements, as well as to characterize their asymptotic performance.

ACKNOWLEDGMENTS

The authors wish to thank Cynthia Rush and David Donoho for helpful discussions and comments.

REFERENCES

[1] R. Gray and D. Neuhoff, "Quantization," IEEE Trans. Inform. Theory, vol. 44, no. 6, pp. 2325–2383, Oct 1998.
[2] W. R. Bennett, "Spectra of quantized signals," Bell System Technical Journal, vol. 27, no. 3, pp. 446–472, 1948.
[3] R. Zamir and M. Feder, "On lattice quantization noise," IEEE Trans. Inform. Theory, vol. 42, no. 4, pp. 1152–1159, Jul 1996.
[4] D. H. Lee and D. L. Neuhoff, "Asymptotic distribution of the errors in scalar and vector quantizers," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 446–460, 1996.
[5] L. Schuchman, "Dither signals and their effect on quantization noise," IEEE Trans. Commun. Technol., vol. 12, no. 4, pp. 162–165, 1964.
[6] P. Delsarte, J.-M. Goethals, and J. J. Seidel, "Spherical codes and designs," in Geometry and Combinatorics. Elsevier, 1991, pp. 68–93.
[7] D. Sakrison, "A geometric treatment of the source encoding of a Gaussian random variable," IEEE Trans. Inform. Theory, vol. 14, no. 3, pp. 481–486, 1968.
[8] A. D. Wyner, "Communication of analog data from a Gaussian source over a noisy channel," The Bell System Technical Journal, vol. 47, no. 5, pp. 801–812, 1968.
[9] A. Lapidoth, "On the role of mismatch in rate distortion theory," IEEE Trans. Inform. Theory, vol. 43, no. 1, pp. 38–47, 1997.
[10] R. Dobrushin and B. Tsybakov, "Information transmission with additional noise," IRE Trans. Inform. Theory, vol. 8, no. 5, pp. 293–304, 1962.
[11] T. Berger, Rate-Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, NJ: Prentice-Hall, 1971.
[12] T. S. Han, "Hypothesis testing with multiterminal data compression," IEEE Trans. Inform. Theory, vol. 33, no. 6, pp. 759–772, 1987.
[13] Z. Zhang and T. Berger, "Estimation via compressed information," IEEE Trans. Inform. Theory, vol. 34, no. 2, pp. 198–211, 1988.
[14] S. Amari et al., "Statistical inference under multiterminal data compression," IEEE Trans. Inform. Theory, vol. 44, no. 6, pp. 2300–2324, 1998.
[15] J. C. Duchi, M. I. Jordan, M. J. Wainwright, and Y. Zhang, "Optimality guarantees for distributed statistical estimation," arXiv preprint arXiv:1405.0782, 2014.
[16] Y. Han, P. Mukherjee, A. Ozgur, and T. Weissman, "Distributed statistical estimation of high-dimensional and nonparametric distributions," in 2018 IEEE International Symposium on Information Theory (ISIT). IEEE, 2018, pp. 506–510.
[17] A. Kipnis and J. C. Duchi, "Mean estimation from adaptive one-bit measurements," in Communication, Control, and Computing (Allerton), 2017 55th Annual Allerton Conference on. IEEE, 2017.
[18] A. Kipnis, G. Reeves, and Y. C. Eldar, "Single letter formulas for quantized compressed sensing with Gaussian codebooks," 2018, available online: http://www.stanford.edu/~kipnisal/Papers/QuantizedCS_ISIT2018.pdf
[19] A. Kipnis, S. Rini, and A. J. Goldsmith, "Compress and estimate in multiterminal source coding," 2017, unpublished. [Online]. Available: https://arxiv.org/abs/1602.02201
[20] A. Kipnis, A. J. Goldsmith, and Y. C. Eldar, "The distortion-rate function of sampled Wiener processes," IEEE Trans. Inform. Theory, vol. 65, no. 1, pp. 482–499, Jan 2019.
[21] G. Murray, A. Kipnis, and A. J. Goldsmith, "Lossy compression of decimated Gaussian random walks," in 2018 52nd Annual Conference on Information Sciences and Systems (CISS), March 2018, pp. 1–6.
[22] A. Kipnis, G. Reeves, Y. C. Eldar, and A. J. Goldsmith, "Fundamental limits of compressed sensing under optimal quantization," in Information Theory (ISIT), 2017 IEEE International Symposium on, 2017.
[23] D. L. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing," Proceedings of the National Academy of Sciences, vol. 106, no. 45, pp. 18914–18919, 2009.
[24] M. Bayati and A. Montanari, "The dynamics of message passing on dense graphs, with applications to compressed sensing," IEEE Trans. Inform. Theory, vol. 57, no. 2, pp. 764–785, Feb 2011.
[25] G. Reeves and H. D. Pfister, "The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact," in Proc. IEEE Int. Symp. Inform. Theory, Barcelona, Spain, Jul. 2016, pp. 665–669.
[26] Y. Wu and S. Verdu, "Functional properties of minimum mean-square error and mutual information," IEEE Trans. Inform. Theory, vol. 58, no. 3, pp. 1289–1301, March 2012.