
Estimation from Quantized Gaussian Measurements: When and How to Use Dither

Joshua Rapp, Robin M. A. Dawson, and Vivek K Goyal

J. Rapp and V. K. Goyal are with the Department of Electrical and Computer Engineering, Boston University, Boston, MA, 02215 USA (e-mail: [email protected]; [email protected]). J. Rapp and R. Dawson are with the Charles Stark Draper Laboratory, Cambridge, MA, 02139 USA. This work was supported in part by a Draper Fellowship under Navy contract N00030-16-C-0014, the U.S. National Science Foundation under Grant 1422034, and the DARPA REVEAL program under Contract No. HR0011-16-C-0030. This work was presented in part at the IEEE International Conference on Image Processing, Athens, Greece, October 2018.

Abstract—Subtractive dither is a powerful method for removing the signal dependence of quantization error for coarsely quantized signals. However, estimation from dithered measurements often naively applies the sample mean or midrange, even when the total noise is not well described with a Gaussian or uniform distribution. We show that the generalized Gaussian distribution approximately describes subtractively-dithered, quantized samples of a Gaussian signal. Furthermore, a generalized Gaussian fit leads to simple estimators based on order statistics that match the performance of more complicated maximum likelihood estimators requiring iterative solvers. The order statistics-based estimators outperform both the sample mean and midrange for nontrivial sums of Gaussian and uniform noise. Additional analysis of the generalized Gaussian approximation yields rules of thumb for determining when and how to apply dither to quantized measurements. Specifically, we find subtractive dither to be beneficial when the ratio between the Gaussian standard deviation and quantization interval length is roughly less than 1/3. If that ratio is also greater than 0.822/K^0.930 for the number of measurements K > 20, we present estimators more efficient than the midrange.

Keywords—Quantization, subtractive dither, generalized Gaussian distribution, order statistics, L-estimator, alpha-trimmed mean, midrange

I. INTRODUCTION

Estimation of the mean of a Gaussian distribution from independent and identically distributed (i.i.d.) samples is a canonical problem in statistics, yet it has important subtleties when the samples are quantized. Without quantization, the sample mean is an unbiased, efficient, and consistent estimator. With uniformly quantized samples, the situation is immediately more complicated: the sample mean is an unbiased estimate only when the true mean falls on the quantizer's reproduction grid or asymptotically in the limit of fine quantization [1]; and in the opposite extreme of very coarse quantization, all the samples are identical, so the estimates do not even improve with increasing numbers of data samples.

The use of subtractive dither changes the situation substantially. The sample mean is then an unbiased and consistent estimator—like in the unquantized case—but it may be arbitrarily far from minimizing the mean-squared error (MSE). For example, when the population variance vanishes, the sample mean estimator has MSE inversely proportional to the number of samples, whereas the MSE achieved by the midrange estimator is inversely proportional to the square of the number of samples [2].

In this paper, we develop estimators for cases where the quantization is neither extremely fine nor extremely coarse. The motivation for this work stemmed from a series of experiments performed by the authors and colleagues with single-photon lidar. In [3], temporally spreading a narrow laser pulse, equivalent to adding non-subtractive Gaussian dither, was found to reduce the effects of the detector's coarse temporal resolution on ranging accuracy. Later work on a similar system showed that implementing subtractive dither could likewise reduce the effects of coarse quantization [4], [5]. Our aim was then to compare the two approaches by determining performance limits, optimal estimators, and when one method might be preferable over the other. The estimators we develop in this work are based on a generalized Gaussian (GG) approximation for the combination of sample variation and quantization noise, which the authors first proposed in [5]. While the GG approximation did not yield improved results for the lidar data due to model mismatch, our framework is valid for a more general set of problems in which quantization of a Gaussian scalar signal occurs. We propose a number of estimators for additive GG noise and find a clear computational advantage with negligible loss in accuracy for simple estimators based on order statistics.

A. Main Contributions

This paper makes the following contributions:

1) Estimation Efficiency: We demonstrate the inefficiency of the mean and midrange estimators for subtractively-dithered measurements of a Gaussian signal by deriving the maximum likelihood estimator and Cramér-Rao bound.

2) Generalized Gaussian Approximation: We expand upon the generalized Gaussian approximation introduced in [5] for the sum of Gaussian and uniform random variables that arises from subtractively-dithered quantization of a Gaussian signal, using the approximation to determine three distinct regimes of estimator behavior.

3) Estimator Proposal: We consider a family of location estimators based on the GG approximation, in particular linear combinations of the measurement order statistics. We introduce a version of the trimmed mean estimator with trimming determined by the GG approximation that is simple, computationally efficient, and performs as well as the ML estimator. Monte Carlo estimator comparisons are shown versus the number of measurements K and versus σ_Z/∆, the ratio of the Gaussian standard deviation to the quantization bin size.

4) Rules of Thumb: We determine several key rules of thumb for deciding when and how to use subtractive dither. For instance, we find dither is not beneficial roughly for σ_Z/∆ > 1/3; below this value, however, applying subtractive dither and a GG-based estimator lowers the MSE. Moreover, if the quantization is coarser than σ_Z/∆ = 0.822/K^0.930 and K > 20, then the midrange is a good estimator.

B. Outline

This paper is organized as follows. Section II sets up the problem of measuring a Gaussian signal with a subtractively-dithered quantizer and explores the fact that the mean and midrange are inefficient estimators. Section III motivates the use of the generalized Gaussian distribution and estimators based on order statistics. Section IV discusses several estimator implementations for our noise model. Section V introduces mean-squared error expressions as a guide to better understanding the results of the numerical simulations presented in Section VI, which tests several estimators and compares the use of quantized data with and without dithering. Finally, Section VII presents our conclusions regarding which estimators to use and when to apply dither.

II. FORMULATION, BACKGROUND, AND MOTIVATION

A. Quantized Measurement

We begin by presenting and expanding upon the signal acquisition model introduced in [5]. Suppose we have an unknown constant signal µ_X corrupted by additive, zero-mean Gaussian noise Z ∼ N(0, σ_Z²). Then estimation of µ_X from K independent samples

X_i = µ_X + Z_i,  i = 1, 2, ..., K,

is straightforward, as the sample mean µ̂ = (1/K) Σ_{i=1}^K X_i can easily be shown to be an efficient estimator of the mean of a Gaussian distribution. However, all measurement instruments perform some quantization. For instance, consider a uniform midtread quantizer q(·) with bin size ∆ applied to X_i when σ_Z ≪ ∆. Except when µ_X is close to a quantizer threshold, it will be the case that U_i = q(X_i) is identical for all i, so that the "quantized-sample mean" given as

µ̂_Q = (1/K) Σ_{i=1}^K U_i,  (1)

is no more informative an estimate of µ_X than any single measurement. For σ_Z not too small compared to ∆, estimation error can be reduced by properly accounting for the quantization and the underlying distribution, e.g., via the maximum likelihood estimator for quantized samples of a Gaussian signal [6], [7]:

µ̂_QML = arg max_{µ_X} Σ_{i=1}^K log[ Φ((u_i − µ_X + ∆/2)/σ_Z) − Φ((u_i − µ_X − ∆/2)/σ_Z) ],  (2)

where Φ(·) is the cumulative distribution function (CDF) of a standard normal random variable. Still, µ̂_QML is no more accurate than µ̂_Q when all of the samples have the same value. Because the coarse quantization maps every value in [j∆ − ∆/2, j∆ + ∆/2] to j∆ for j ∈ Z, the resolution of an estimate µ̂_X is limited by the bin size ∆, and the quantization error is signal-dependent.

Statisticians have long recognized that working with rounded data is not the same as working with underlying continuous-valued data. Let X_hist be the continuous random variable with density constant on intervals ((j − 1/2)∆, (j + 1/2)∆) with P(X_hist ∈ ((j − 1/2)∆, (j + 1/2)∆)) = P(U = j∆) for all j ∈ Z. Because of the piecewise-constant form, X_hist is said to have a histogram density [8]. The widely known Sheppard's corrections introduced in [9], [10] relate the moments of U and the moments of X_hist [11]. From the construction of X_hist, it is immediate that these corrections are zero for odd moments. See [12] for a review of Sheppard's corrections and [13] for results for autoregressive and moving average processes and more recent references.

The moments of X_hist being close to the moments of X depends on continuity arguments and ∆ being small. In contrast, our interest here is in situations where the quantization is coarse relative to the desired precision in estimating µ_X. Quantization may be coarse because of limitations of instruments, such as the fundamental trade-offs in analog-to-digital converters [14] or the time resolution in time-correlated single photon counting [15], which may be coarse relative to the resolution desired for time-of-flight ranging.

When quantizing X_hist, the quantization error E_hist = q(X_hist) − X_hist is uniformly distributed on [−∆/2, ∆/2] and independent of X_hist. In general, however, quantization error being uniformly distributed and independent of the input does not extend to the quantization of X; approximating quantization error as such—without regard to whether the input has a histogram density—is often called the "additive-noise model," "quantization-noise model," or "white-noise model." A substantial literature is devoted to understanding the validity of this approximation, e.g., [16]–[20].

One approach to improving the precision of estimates from quantization-limited measurements is the use of dither, a small signal introduced before the discretization to produce enough variation in the input such that it spans multiple quantization levels. By combining multiple dithered measurements, estimates can achieve resolution below the least-significant bit, and the result may also have desirable statistical and perceptual properties, such as whitened noise. Early applications empirically demonstrating the benefits of dither include control systems [21], [22], image display [23], [24], and audio [25], with numerous contributions to the statistical theory developed in [16], [17], [26]–[30], among others.
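To make (1) and (2) concrete, the following minimal sketch (our own illustration, not code from the paper) quantizes Gaussian samples and maximizes the log-likelihood in (2) numerically with SciPy; the one-bin search bracket around the quantized-sample mean and all parameter values are assumptions made only for this example.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def quantize(x, delta=1.0):
    """Uniform midtread quantizer q(.) with bin size delta."""
    return delta * np.round(x / delta)

def qml_estimate(u, sigma_z, delta=1.0):
    """Eq. (2): ML estimate of mu_X from quantized Gaussian samples u."""
    def neg_log_lik(mu):
        p = (norm.cdf((u - mu + delta / 2) / sigma_z)
             - norm.cdf((u - mu - delta / 2) / sigma_z))
        return -np.sum(np.log(np.maximum(p, 1e-300)))
    u_bar = np.mean(u)  # eq. (1), the quantized-sample mean
    res = minimize_scalar(neg_log_lik, bounds=(u_bar - delta, u_bar + delta),
                          method="bounded")  # assumed one-bin bracket
    return res.x

rng = np.random.default_rng(0)
mu_x, sigma_z, delta, K = 0.3, 0.2, 1.0, 100
u = quantize(mu_x + sigma_z * rng.standard_normal(K), delta)
print(np.mean(u), qml_estimate(u, sigma_z, delta))
```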

More recent work has focused on varying the quantizer thresholds, primarily for 1-bit measurements in wireless sensing networks, including [31]–[37]. While these works consider various methods for optimizing or adapting thresholds, they are restricted to considering only nonsubtractively-dithered quantization.

B. Subtractively-Dithered Quantization

If it is possible to know the dither signal exactly, subtractively-dithered quantization with the proper dither signal makes the quantization error uniformly distributed and independent of the input. Define the dither signal D_i, i = 1, ..., K as a sequence of i.i.d. random variables, independent of the noisy quantizer input X_i. The output of a subtractively-dithered quantizer is

Y_i = q(X_i + D_i) − D_i,  (3)

with the quantization error defined as

W_i = Y_i − X_i.  (4)

Define the characteristic function of the dither signal probability density function (PDF) as

M_D(ju) = E[e^{juD}].  (5)

Then Schuchman's condition [28] is the property of the dither PDF that

M_D(j2πℓ/∆) = 0,  ℓ ∈ Z \ {0}.  (6)

As long as the quantizer has a sufficient number of levels so that it does not overload, by [17], [29] the Schuchman condition is necessary and sufficient for X_i to be independent of W_j for all i, j, with i.i.d. W_i ∼ U[−∆/2, ∆/2]. Subtractive dither often uses a uniform dither signal with D ∼ U[−∆/2, ∆/2] because its characteristic function

M_D(ju) = sin(u∆/2)/(u∆/2)

meets Schuchman's condition (6).

The rest of this paper considers only when Schuchman's condition is met, with an i.i.d. input signal of the form X_i = µ_X + Z_i, Z ∼ N(0, σ_Z²), an i.i.d. dither signal D_i ∼ U[−∆/2, ∆/2] independent of the input signal, and a non-overloading uniform quantizer. Then the dithered measurements take the form

Y_i = µ_X + Z_i + W_i,  (7)

and the problem of estimating µ_X simply becomes one of mitigating independent additive noise. The sum of the Gaussian and uniform terms can be combined into a single total noise term to obtain

Y_i = µ_X + V_i,  (8)

where V_i = Z_i + W_i are i.i.d. Then the means and variances simply add, so that µ_V = 0 and σ_V² = σ_Z² + ∆²/12.

For convenient shorthand, we refer to measurements from a quantizer without dither as "quantized" and measurements from a subtractively-dithered quantizer as "dithered." The usual approach to estimating µ_X from K dithered measurements Y_i, i = 1, 2, ..., K, is via the sample mean

µ̂_mean = (1/K) Σ_{i=1}^K Y_i.  (9)

The MSE of the sample mean is

MSE(mean) = σ_V²/K,  (10)

which is O(K⁻¹). Although using the sample mean is logical when σ_Z ≫ ∆, so that the contribution of the uniform noise component is negligible, the sample mean is not in general an efficient estimator. For example, in the alternative case of σ_Z = 0, a maximum likelihood (ML) estimator is the midrange¹

µ̂_mid = (1/2)[Y_(1) + Y_(K)],  (11)

where Y_(1) ≤ Y_(2) ≤ ··· ≤ Y_(K) are the order statistics of the K measured samples. Whereas the MSE of the sample mean for σ_Z = 0 is ∆²/(12K), the MSE of the midrange is

MSE(mid) = ∆²/[2(K + 1)(K + 2)],  (12)

which is O(K⁻²) and hence better than the sample mean by an unbounded factor [39]. Nevertheless, the midrange is not a good estimator in the general case of σ_Z > 0, as it relies on the finite support of the uniform distribution. If instead σ_Z is much larger than ∆, rendering the uniform component negligible, then the MSE of the midrange would only improve as O(1/log(K)) [40]. As others have noted for quantization of a Gaussian signal without dither [6], [7], the key figure of merit for determining estimator performance is then σ_Z/∆, a measure of the relative sizes of the noise components. We observe that normalizing the MSE by ∆² removes the separate dependence on σ_Z and ∆, resulting in

NMSE(mean) = [(σ_Z/∆)² + 1/12]/K,  (13)

and

NMSE(mid) = 1/[2(K + 1)(K + 2)].  (14)

Except in trivial cases (σ_Z ≪ ∆ or σ_Z ≫ ∆), V has neither a Gaussian nor a uniform distribution, so the conventional mean and midrange estimators are expected to be suboptimal. Furthermore, existing nonlinear processing schemes for dithered measurements do not adapt to best suit the noise statistics [41].

¹Any statistic in [Y_(K) − ∆/2, Y_(1) + ∆/2] is an ML estimator for the mean of a uniform distribution with known variance [38, p. 282]. The midrange is commonly used because it is unbiased and the minimum-variance estimator among linear functions of order statistics [39]. However, no uniformly minimum-variance unbiased estimator exists [38, p. 331].

A first approach to finding a better estimator for arbitrary σ_Z/∆ is to derive the maximum likelihood (ML) estimator for the dithered noise model. From the definitions of the random variables, the PDF of W is

f_W(w) = 1/∆ for w ∈ [−∆/2, ∆/2], and 0 otherwise,

and the PDF of Z is f_Z(z) = φ(z/σ_Z)/σ_Z, where φ(x) is the standard normal PDF. Since the total noise is the sum of independent noise terms, the PDF of the samples is given by the convolution

f_V(v) = (f_Z ∗ f_W)(v) = (1/∆) ∫_{−∆/2}^{∆/2} f_Z(v − τ) dτ = (1/∆)[ Φ((v + ∆/2)/σ_Z) − Φ((v − ∆/2)/σ_Z) ].  (15)

[Figure: NMSE versus σ_Z/∆ for K = 125, comparing the sample mean, midrange, and ML estimators with the NCRB.]

For i.i.d. samples from a dithered quantizer, the likelihood function is then the product of f_V(y_i − µ_X) over the K measurements.
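The total-noise density (15) is simple to evaluate numerically; a small helper (our illustration) using scipy.stats.norm:

```python
import numpy as np
from scipy.stats import norm

def f_v(v, sigma_z, delta=1.0):
    """Total-noise PDF of V = Z + W from eq. (15)."""
    return (norm.cdf((v + delta / 2) / sigma_z)
            - norm.cdf((v - delta / 2) / sigma_z)) / delta

v = np.linspace(-1.5, 1.5, 7)
print(f_v(v, sigma_z=0.04))  # flat top with Gaussian-like tails
```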

C. Mixed Measurements of Vector Signals

Other estimation problems involving quantized data have been studied extensively. In particular, interest in linear inverse problems—both undersampled and oversampled—has resulted in work on estimating vectors from quantized linearly mixed measurements, with and without subtractive dither. Quantized, mixed, noisy measurements of a vector x can be represented as y = Q(Ax + z), where A ∈ C^{K×N} is the mixing matrix.

[Figure: GG shape approximation p̂_Ve versus σ_Z/∆.]

Fig. 3: The three plots show the noise PDF calculated numerically from the true density (15) (solid red) and via the GG approximation (20) (dashed black). The close agreement suggests the GGD is a good approximation for the noise.

Fig. 4: Relationship between β(p) and p. As p increases beyond p = 2, β(p) becomes much less than 1, implying that MSE(µ̂_GGML) is much lower than MSE(µ̂_mean).
To verify the quality of the generalized Gaussian approximation to the output noise distribution using the kurtosis match, Fig. 3 shows comparisons between the true density, computed numerically according to (15), and its GG approximation from (20). We test σ_Z/∆ = 0.004, 0.04, and 0.4, maintaining ∆ = 1 for consistency. For σ_Z ≪ ∆, the distribution is close to uniform, and at σ_Z ≈ ∆, the distribution is almost Gaussian. In the intermediate regime, however, the distribution combines attributes of each component, with the flat top of the uniform distribution and the exponential tails of the Gaussian distribution. The GGD appears to be a good approximation of the true noise distribution, almost perfectly matching the shape behavior.

B. Estimation

For i.i.d. samples of a GG distribution, the likelihood function is

L({v_i}_{i=1}^K; µ, σ, p) = Π_{i=1}^K f_Ve(v_i; µ, σ, p).  (22)

By differentiating the log of (22) with respect to µ, the ML estimator µ̂_GGML for the mean of a GGRV is given in [50] as the solution to

Σ_{i=1}^K sgn(y_i − µ̂_GGML) |y_i − µ̂_GGML|^{p−1} = 0,  (23)

and is shown to be asymptotically normal and efficient in K for p ≥ 2, which is the regime of interest. The asymptotic variance of µ̂_GGML normalized by ∆² is given by

NVar(GGML) = β(p)[(σ_Z/∆)² + 1/12]/K,  (24)

where

β(p) = Γ²(1/p) / [ p² Γ((2p − 1)/p) Γ(3/p) ].  (25)

We notice that (24) decreases as O(K⁻¹), but the coefficient β(p), which is plotted in Fig. 4, is much less than 1 for large p, suggesting that µ̂_GGML should outperform µ̂_mean for p̂_Ve > 2. Since the GGD closely approximates the total noise distribution, it would be ideal if µ̂_GGML reduced to a computationally simple estimator such as one based on order statistics for all p. Unfortunately, the ML estimator does not generally have a closed-form expression, except in special cases such as p = 1, 2, and ∞ (an explicit expression has also recently been derived for p = 4 [53]), so an iterative solution would again be necessary.
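Because (23) is the stationarity condition of minimizing Σ_i |y_i − µ|^p, µ̂_GGML can be computed with a generic bounded scalar minimizer; a minimal sketch (our illustration), assuming the shape p has already been estimated:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ggml_estimate(y, p):
    """ML location estimate under GG noise: minimizes sum |y_i - mu|^p,
    whose stationarity condition is eq. (23)."""
    cost = lambda mu: np.sum(np.abs(y - mu) ** p)
    # For p >= 1 the cost is convex and the minimizer lies in [min, max].
    return minimize_scalar(cost, bounds=(y.min(), y.max()),
                           method="bounded").x
```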
We have already observed that the ML estimators for p = 1, 2, and ∞ all belong to a class of linear combinations of order statistics called L-estimates [54], which are attractive because they have closed-form definitions of the form

µ̂_X = Σ_{i=1}^K a_i Y_(i).  (26)

We thus consider how to obtain the coefficients a_i for L-estimates that perform well for GG noise when p is not one of the special cases.

An effective L-estimate should weight the order statistics in accordance with the noise distribution. Notable past approaches include that of Lloyd, who derived the minimum-variance unbiased estimator among linear functions of the order statistics in [39]. This formulation is impractical, however, as it requires the correlations of the order statistics for a given distribution, which are often not known even for common special cases like the Gaussian distribution. Bovik et al. [55], [56] further specified the minimum-variance unbiased L-estimate and then numerically computed results for several values of p from samples of a GGD with K = 3.

A number of approximations to Lloyd's formulation exist to more simply compute near-optimal coefficients for linear combinations of order statistics, including [57], [58]. Öten and de Figueiredo introduced one such method using Taylor expansion approximations to get around the difficulties of knowing distributions of order statistics [59]. This method does still require knowledge of the inverse CDF of the noise distribution, and while there is no closed-form expression for the GGD, the necessary values can be pre-computed numerically.

Simpler L-estimates have much longer histories, with consideration of trimming extreme or middle order statistics at least as old as [60] (credited to Gergonne in [61]), with the first known mathematical analysis by Daniell, who called such an estimate the "discard-average" [62], [63]. The method now known as the α-trimmed mean and popularized by Tukey [64], [65] avoids extensive computation of the weights by trimming a fixed fraction α from the extremes of the order statistics. Restrepo and Bovik defined a complementary α-"outer" trimmed mean [66], which retains a fraction α of the data by trimming the middle order statistics and is suitable for distributions with short tails within the range from Gaussian to uniform distributions. They tabulated several instances of the trimmed mean for GGDs with multiple combinations of K and p.

Lastly, Beaulieu and Guo introduced an estimator specifically for the GGD but using nonlinear combinations of the order statistics [67]. The weighting of the order statistics depends on p via a heuristically-justified function and is shown to perform almost identically to µ̂_GGML. This estimator is unbiased and exactly matches the ML estimator for the special cases of p = 2 and ∞.

In the following section, we consider three of the most computationally-efficient order statistics-based estimators to use for the GG approximation: the nearly-best L-estimate µ̂_NB of [59], the trimmed-mean estimator µ̂_α modeled on [66], and the nonlinear estimator µ̂_NL of [67]. Each estimator takes the form of (26) with different computations of the coefficients a_i. While µ̂_NL is specifically designed for use with GG noise, we modify the more general µ̂_NB and µ̂_α to match the GG approximation. For µ̂_NB, we use the PDF and inverse CDF (computed numerically) of the GG approximation to determine the coefficients. One could alternatively compute the coefficients for µ̂_NB directly from the true noise distribution in (15); however, additional numerical evaluation would be required for the inverse CDF, which we eschew in our search for computationally efficient estimators. There is no explicit distribution assumed by µ̂_α, but we propose a choice of the trimmed fraction α based on the estimated value p̂_Ve to implicitly link the estimator to the GGD.
IV. ESTIMATOR IMPLEMENTATIONS

A. ML Estimators

An EM algorithm for obtaining the quantized-sample ML estimate µ̂_QML was introduced by Papadopoulos et al. [31, Appendix E]:

µ̂_QML^(j+1) = µ̂_QML^(j) + [σ_Z/(K√(2π))] Σ_{i=1}^K m(u_i),  (27)

where

m(u_i) = { exp(−[u_i − ∆/2 − µ̂_QML^(j)]²/(2σ_Z²)) − exp(−[u_i + ∆/2 − µ̂_QML^(j)]²/(2σ_Z²)) } / { Φ([u_i + ∆/2 − µ̂_QML^(j)]/σ_Z) − Φ([u_i − ∆/2 − µ̂_QML^(j)]/σ_Z) }.  (28)

A good initialization is µ̂_QML^(0) = µ̂_Q, since the estimators are equal for σ_Z = 0 and ∞. A similar algorithm, derived in [48] for quantized, linearly-mixed vector measurements, is equivalent to that in (27) for the special case of a repeated scalar input and no mixing (i.e., the mixing matrix is a column of 1s). Since µ̂_DML has the same formulation as µ̂_QML, the same algorithm also works for continuous-valued dithered measurements:

µ̂_DML^(j+1) = µ̂_DML^(j) + [σ_Z/(K√(2π))] Σ_{i=1}^K m(y_i).  (29)

We initialize with µ̂_DML^(0) = µ̂_mid, since the midrange is known to be the ML estimator for σ_Z = 0. A solver for µ̂_GGML was likewise initialized with µ̂_GGML^(0) = µ̂_mid.

B. Order Statistics-Based Estimators

To evaluate the GG noise approximation and find the best non-iterative estimator, we compared the three simplest estimators based on the order statistics: the nearly-best L-estimate, the α-outer mean, and the nonlinear combination from [67]. Since the GGD is symmetric, the coefficients of an unbiased order statistics-based estimator are defined symmetrically, and only half must be uniquely computed. It is thus useful to define M = ⌊K/2⌋ and N = ⌈K/2⌉ using the floor and ceiling functions, respectively.

To derive the nearly-best L-estimate of [59]

µ̂_NB = Σ_{i=1}^K a_i^NB Y_(i),  (30)

we first compute

b_1 = f_V(c_1)[−2f_V(c_1) + f_V(c_2)],  (31a)
b_i = f_V(c_i)[f_V(c_{i−1}) − 2f_V(c_i) + f_V(c_{i+1})],  i = 2, ..., N − 1,  (31b)
b_N = f_V(c_N)[f_V(c_{N−1}) − f_V(c_N)],  (31c)

where c_i = F_V⁻¹(i/(K + 1)) and F_V⁻¹ is the inverse of the GG CDF. From this, the weights are derived for i = 1, ..., N as

a_i^NB = a_{K−i+1}^NB = b_i / (2 Σ_{i=1}^N b_i) for K even;  b_i / (b_N + 2 Σ_{i=1}^M b_i) for K odd.  (32)

For the simulations in Python, the inverse CDF was numerically computed with the stats.gennorm.ppf GGD percentile function in scipy, as no closed-form expression exists.
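A sketch (ours) of the weight computation in (30)–(32) using scipy.stats.gennorm, assuming the shape p and total-noise standard deviation σ_V have already been estimated; the scale conversion uses the gennorm variance α²Γ(3/p)/Γ(1/p).

```python
import numpy as np
from scipy.stats import gennorm
from scipy.special import gamma

def nb_coefficients(K, p, sigma_v):
    """Nearly-best L-estimate weights from eqs. (30)-(32) under the GG
    approximation with shape p and standard deviation sigma_v."""
    alpha = sigma_v * np.sqrt(gamma(1 / p) / gamma(3 / p))  # GG scale
    N = (K + 1) // 2                                        # ceil(K/2)
    c = gennorm.ppf(np.arange(1, N + 1) / (K + 1), p, scale=alpha)
    f = gennorm.pdf(c, p, scale=alpha)
    b = np.empty(N)
    b[0] = f[0] * (-2 * f[0] + (f[1] if N > 1 else 0.0))    # eq. (31a)
    for i in range(1, N - 1):                               # eq. (31b)
        b[i] = f[i] * (f[i - 1] - 2 * f[i] + f[i + 1])
    if N > 1:
        b[-1] = f[-1] * (f[-2] - f[-1])                     # eq. (31c)
    if K % 2 == 0:
        a_half = b / (2 * b.sum())                          # eq. (32), K even
    else:
        a_half = b / (b[-1] + 2 * b[:-1].sum())             # eq. (32), K odd
    # mirror: a_i = a_{K-i+1}; drop the duplicated median term when K is odd
    return np.concatenate([a_half, a_half[::-1][K % 2:]])
```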

For the α-outer mean estimate

µ̂_α = Σ_{i=1}^K a_i^α Y_(i),  (33)

the order statistics' weights a_i^α are only given in [66] for a symmetric filter applied to an odd number of samples:

a_i^α = a_{K−i+1}^α =
  1/(Kα),  i ≤ ⌊Kα/2⌋;
  1/2 − ⌊Kα/2⌋/(Kα),  i = ⌊Kα/2⌋ + 1, for α ∈ [0, 1 − 1/K];
  1 − 2⌊Kα/2⌋/(Kα),  i = ⌊Kα/2⌋ + 1, for α ∈ [1 − 1/K, 1];
  0,  otherwise.  (34)

Since an even number of measurements is also possible, we similarly define, for all α ∈ [0, 1],

a_i^α = a_{K−i+1}^α =
  1/(Kα),  i ≤ ⌊Kα/2⌋;
  1/2 − ⌊Kα/2⌋/(Kα),  i = ⌊Kα/2⌋ + 1;
  0,  otherwise.  (35)

Note that the outer mean is equivalent to µ̂_mean when α = 1 and reduces to µ̂_mid for α = 0. To match the GGD behavior, we thus propose to define α = 2/p̂_Ve, which yields the ML estimate for both p̂_Ve = 2 and p̂_Ve = ∞. Finally, the nonlinear estimator of [67] is given as

µ̂_NL = Σ_{i=1}^K a_i^NL Y_(i),  (36)

where the data-dependent coefficients are given for i = 1, ..., M by

a_i^NL = a_{K−i+1}^NL = (1/2) [Y_(K−i+1) − Y_(i)]^{p−2} / Σ_{j=1}^M [Y_(K−j+1) − Y_(j)]^{p−2}.  (37)

Note that if K is odd, the median term (i = N) is ignored, as it would correspond to a numerator of zero.
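A sketch (our illustration) of µ̂_α from (33) and (35) with the proposed α = 2/p̂_Ve, along with the nonlinear estimator (36)–(37); both assume a previously estimated shape p̂_Ve and do not guard against degenerate inputs such as all-equal samples.

```python
import numpy as np

def alpha_outer_mean(y, p_hat):
    """alpha-outer trimmed mean, eqs. (33) and (35), with alpha = 2/p_hat."""
    alpha = min(1.0, 2.0 / p_hat)
    y = np.sort(y)
    K = len(y)
    m = int(np.floor(K * alpha / 2))
    a = np.zeros(K)
    a[:m] = 1.0 / (K * alpha)
    if m < K - m:                 # fractional weight on the next order statistic
        a[m] = 0.5 - m / (K * alpha)
    a += a[::-1]                  # symmetric weights: a_i = a_{K-i+1}
    return np.dot(a, y)

def nl_estimate(y, p_hat):
    """Beaulieu-Guo nonlinear estimator, eqs. (36)-(37)."""
    y = np.sort(y)
    M = len(y) // 2
    gaps = (y[::-1][:M] - y[:M]) ** (p_hat - 2)   # [Y_(K-i+1) - Y_(i)]^(p-2)
    a = gaps / (2 * gaps.sum())
    # each symmetric pair shares its weight; the median (K odd) gets zero
    return np.dot(a, y[:M]) + np.dot(a, y[::-1][:M])
```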

V. DITHER NOISE REGIMES

To better understand the dither noise behavior, we have previously described three regimes of the dither noise distribution, with Regimes I and III corresponding to approximately uniform and Gaussian noise, respectively. We have furthermore proposed the GGD with p ∈ (2, ∞) as an approximation for the noise distribution in Regime II. However, the boundaries of these regions are imprecise, and we aim to more rigorously define them in this section. We first define ξ_1 and ξ_2 as the values of the ratio σ_Z/∆ separating the regimes, such that the noise distribution is approximately uniform for σ_Z/∆ < ξ_1, GG for ξ_1 ≤ σ_Z/∆ < ξ_2, and Gaussian for σ_Z/∆ ≥ ξ_2. In each regime, we have an expression for the expected MSE or asymptotic variance of the ML estimator, so we use the intersection or approximate point of convergence of these expressions to define ξ_1 and ξ_2.

Fig. 5: The value of ξ_1 substantially decreases and ξ_2 slowly increases as K increases, expanding Regime II. (a) A log-log-cubic fit can be used to compute a close approximation to ξ_1 for all K, while a log-log-linear fit suffices for K > 20. (b) The square root of a log-quadratic fit closely approximates ξ_2.

A. Defining ξ_1

We define ξ_1 as the value of σ_Z/∆ where NMSE(mid) and NVar(GGML) intersect, which from (14) and (24) is the solution to

β(p̂_Ve)[(σ_Z/∆)² + 1/12] = (K/2) / (K² + 3K + 2)  (38)

for a given K. We remind the reader that p̂_Ve is also dependent on σ_Z/∆, as shown in (21). Fig. 5a shows that ξ_1 decreases as K increases, since the probability of observing an "outlier" measurement due to the exponential tails increases with K, so a lower σ_Z/∆ value (i.e., with shorter tails) is needed for the midrange estimator to achieve nearly-optimal performance. The figure shows the exact values of ξ_1 computed by solving (38), as well as a log-log-cubic least-squares fit

log ξ_1 ≈ 0.0104(log K)³ − 0.1760(log K)² + 0.0274(log K) − 1.8511,  (39)

which can be used to quickly calculate an approximation for a desired value of K. Since the relationship appears fairly linear for K > 20, the simple log-log-linear fit

log ξ_1 ≈ −0.9301(log K) − 0.1963,  (40)

which can be rewritten as ξ_1 ≈ 0.8217/K^0.9301, is also useful for quick computation. The natural logarithm is used in each case.
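The fits (39) and (40) are one-liners to evaluate; a small helper (ours) that reproduces the quoted boundary values to within the fit error:

```python
import numpy as np

def xi1_approx(K):
    """Regime I/II boundary from the log-log fits (39) and (40)."""
    logK = np.log(K)                       # natural logarithm, as in the text
    if K > 20:                             # eq. (40): xi_1 ~ 0.8217 / K^0.9301
        return 0.8217 * K ** -0.9301
    cubic = (0.0104 * logK**3 - 0.1760 * logK**2
             + 0.0274 * logK - 1.8511)     # eq. (39)
    return float(np.exp(cubic))

print([round(xi1_approx(K), 4) for K in (5, 25, 125)])
```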
B. Defining ξ_2

Since NMSE(mean) and NVar(GGML) both have 1/K factors, they converge where β(p̂_Ve) = 1, which is only the case for p̂_Ve = 2. This suggests that equality requires the noise to be exactly Gaussian, which only occurs for σ_Z/∆ → ∞. Instead, we can look for a point where NVar(GGML) and NMSE(mean) can reasonably be considered to have converged (i.e., where the GG is close enough to a Gaussian).

We propose that a reasonable definition of ξ_2 is the value of σ_Z/∆ that minimizes NMSE(Q), the expected normalized MSE of µ̂_Q. Intuitively, as σ_Z/∆ increases from ξ_2, the Gaussian variance will dominate for both quantized and dithered measurements, so that the effect of the quantization error is negligible, whether signal-independent for dithered measurements or signal-dependent without dither.

Thus the point at which NMSE(Q) is minimized indicates where the Gaussian variance begins to dominate and is a reasonable place to consider a GG approximation to be sufficiently Gaussian. We derive in Appendix C that NMSE(Q) is given as

NMSE(Q) = E[(µ̂_Q − µ_X)²]/∆²
  = 1/12 + (1/K) ∫_{−1/2}^{1/2} Σ_{m=−M}^{M} m² Ψ(m, µ_X) dµ_X
  + [(K − 1)/K] ∫_{−1/2}^{1/2} ( Σ_{m=−M}^{M} m Ψ(m, µ_X) )² dµ_X
  − 2 ∫_{−1/2}^{1/2} µ_X Σ_{m=−M}^{M} m Ψ(m, µ_X) dµ_X,  (41)

where

Ψ(m, µ_X) = Φ( (m + 1/2 − µ_X)/(σ_Z/∆) ) − Φ( (m − 1/2 − µ_X)/(σ_Z/∆) ).  (42)

Defining

ξ_2 = arg min_{σ_Z/∆} E[(µ̂_Q − µ_X)²]/∆²  (43)

and solving via a Nelder-Mead algorithm [68] and numerical integration, we show in Fig. 5b that the value of ξ_2 changes only slightly as a function of K. This range of values is notably very close to the σ_Z/∆ = 1/2 recommended by Vardeman and Lee [6], or the value σ_Z/∆ = 1/3 at which Moschitta et al. suggest that the loss of information from quantizing samples of a Gaussian distribution becomes negligible in estimation of the mean [7].
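The minimization (43) is easy to reproduce; the sketch below (ours) evaluates (41)–(42) on a grid with the bin sum truncated at |m| ≤ M, and uses a bounded scalar search in place of the paper's Nelder-Mead solver. The grid size, truncation M, and search bounds are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def nmse_q(r, K, M=10):
    """Normalized MSE of the quantized-sample mean, eqs. (41)-(42),
    with r = sigma_Z / Delta and the bin sum truncated at |m| <= M."""
    mu = np.linspace(-0.5, 0.5, 501)              # integration grid over a bin
    m = np.arange(-M, M + 1)[:, None]
    psi = norm.cdf((m + 0.5 - mu) / r) - norm.cdf((m - 0.5 - mu) / r)
    s1 = (m**2 * psi).sum(axis=0)                 # sum_m m^2 Psi(m, mu)
    s2 = (m * psi).sum(axis=0)                    # sum_m m   Psi(m, mu)
    integrand = s1 / K + (K - 1) / K * s2**2 - 2 * mu * s2
    return 1 / 12 + np.trapz(integrand, mu)

K = 25
res = minimize_scalar(lambda r: nmse_q(r, K), bounds=(0.05, 1.0),
                      method="bounded")          # eq. (43)
print("xi_2 ~", res.x)   # roughly 0.31 for K = 25, consistent with Fig. 5b
```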
For quick computation, ξ_2 can be approximated by the square root of a log-quadratic fit:

ξ_2 ≈ √( −0.000756(log K)² + 0.0328 log K ).  (44)

We notice that the Regime boundary definitions are inconsistent for K < 3, as ξ_1 > ξ_2; however, the Regimes are meaningless for K = 1 or 2 anyway, as symmetric order statistics-based estimators (e.g., mean, median, midrange) are all equivalent for such small numbers of measurements, so there is no advantage to distinguishing between noise distributions. We notice also that since ξ_1 decreases monotonically and ξ_2 increases monotonically with K, Regime II grows as K increases, since small mismatches between the assumed and true PDFs become easier to observe. Intuitively, ξ_1 decreases much faster than ξ_2 increases because the difference between a PDF with finite support (σ_Z/∆ = 0) and one with infinite support (σ_Z/∆ > 0) is more significant for large K than the difference between a finite σ_Z/∆ (e.g., a GG approximation with p̂_Ve > 2) and σ_Z/∆ → ∞ (p̂_Ve = 2).

VI. NUMERICAL RESULTS

Monte Carlo simulations were performed to compare the NMSE performance of the generalized Gaussian and order statistics-based estimators (µ̂_NB, µ̂_NL, µ̂_α) against the ML estimators (µ̂_DML, µ̂_GGML) and the conventional sample mean (µ̂_mean) and midrange (µ̂_mid). Estimates were also computed applying the sample mean (µ̂_Q) and ML estimator (µ̂_QML) to the quantized data to determine under which conditions subtractive dithering actually provides an advantage. As in the motivating example in Section II, for each Monte Carlo trial, µ_X was chosen uniformly at random from [−∆/2, ∆/2], and K samples of signal noise Z ∼ N(0, σ_Z²) and dither D ∼ U(−∆/2, ∆/2) were generated for (3). The quantization bin size was maintained at ∆ = 1 throughout. The normalized MSE was computed for T = 20,000 trials.

A. Normalized MSE vs. σ_Z/∆

We begin by discussing the plots in Fig. 6 of NMSE as a function of σ_Z/∆ for K = 5, 25, and 125. Because nine separate estimators and four NMSE bounds are displayed in each plot, we acknowledge that the figures can be difficult to follow due to the overlapping curves. A flowchart is included in Fig. 7 that summarizes the results and provides a decision-making process for whether to use dither and which estimator to choose.

1) Generalized Gaussian Estimators: The GG-based estimators (µ̂_NB, µ̂_NL, µ̂_α, µ̂_GGML) have effectively identical performance and match that of µ̂_DML. The actual differences in performance vary on the order of a few percent over large ranges of K and σ_Z/∆, compared to the orders-of-magnitude differences for the mean and midrange. The negligible performance difference further validates approximating the total noise with the GGD. For this same reason, we collectively discuss µ̂_DML and the GG-based estimators in the following sections.

The GG-based estimators meet or exceed the performance of all other estimators for all σ_Z/∆ and for all K. More specifically, the GG estimators converge to and match the performance of the midrange in Regime I and likewise converge to and match the performance of the mean in Regime III. In Regime II, the GG estimators outperform both the mean and the midrange. Thus, a GG estimator should be the default estimator choice for any σ_Z/∆.

Given the approximate equivalence of the GG estimators, we argue that the trimmed-mean µ̂_α is the best choice of general-purpose estimator for dithered data. The other estimators either require iterative solvers (µ̂_DML, µ̂_GGML), rely on numerical computation for the GG inverse CDF (µ̂_NB), or are data-dependent (µ̂_NL). On the other hand, µ̂_α has a simple closed-form solution that can be tabulated if needed.

2) Performance by Regime—Dithered Measurements: The plots in Fig. 6 validate the concept of three distinct regimes of noise behavior. In the plots, the approximate regime boundaries are computed to be ξ_1 = {0.1098, 3.85 × 10⁻², 9.56 × 10⁻³} and ξ_2 = {0.2296, 0.3132, 0.3737} for K = {5, 25, 125}, respectively, confirming that Regime II expands as K increases.


Fig. 6: The performance of the estimators is evaluated for K = (a) 5, (b) 25, and (c) 125 to show the range of behavior as σ_Z/∆ varies. The ML estimator for dithered measurements µ̂_DML and the estimators based on the GGD (µ̂_GGML, µ̂_NB, µ̂_NL, and µ̂_α) achieve the lowest NMSE for each σ_Z/∆ regime (curves are overlapping). Results are shown for 20,000 Monte Carlo trials.

In Regime I, the NMSE performance of all estimators on the dithered data is basically flat and equal to NMSE(mid). This suggests that for a practical system where σ_Z/∆ can be tuned, once the system is operating in Regime I (dependent on a fixed K), there is no benefit from further decreasing σ_Z/∆; performance can only be improved by increasing K. In Regime II, the GG-based estimators approach NCRB(µ_X), especially for large K. We note that while NVar(GGML) and NCRB(µ_X) are close in Regime II, NCRB(µ_X) is a tighter bound, as it is based on the true noise distribution, although NVar(GGML) may be easier to compute for a rough estimate of performance. In Regime III, the NMSE performance of all estimators on the dithered data is equal to NMSE(mean). In both Regimes II and III, the NMSE decreases as σ_Z/∆ decreases. Performance likewise improves with increasing K.

3) Performance by Regime—Quantized Measurements: While the three Regimes were technically defined for dithered measurements in particular, they are also informative of the behavior of estimators applied to quantized measurements. In Regime I, σ_Z/∆ is so small that, unless µ_X lies on the boundary between quantization bins, all measurements are quantized to the same value. As a result, the NMSE of both µ̂_Q and µ̂_QML is dominated by the squared bias term, which is 1/12 (the variance is zero). Further decreasing σ_Z/∆ or increasing K provides no benefit.

In Regime II, σ_Z/∆ is large enough that there is often some variation in the measurements due to the signal even without the addition of dither. This phenomenon is sometimes referred to in the literature as self-dithering, equivalent to adding nonsubtractive Gaussian dither to a constant signal µ_X [69]. Within Regime II, both µ̂_Q and µ̂_QML improve as σ_Z/∆ increases because the increased signal variation reduces the bias term of the NMSE faster than the variance increases. The NMSE is minimized for µ̂_Q by definition at ξ_2, and then the NMSE increases as σ_Z/∆ increases in Regime III. This suggests that if σ_Z/∆ is small and subtractive dither cannot be used, then quantized measurements benefit from adding nonsubtractive Gaussian dither such that σ_Z/∆ = ξ_2, which is approximately 1/3. It is in Regime II that µ̂_QML shows the largest improvement in performance over µ̂_Q, with the ML estimator accounting for the form of the signal variation for quantized measurements.

Fig. 7: The results of our Monte Carlo simulations lead to a simplified decision process for when and how to use dither. If σ_Z/∆ > ξ_2 (≈ 1/3), there is no benefit to using anything but µ̂_Q applied to the quantized measurements, whereas dither leads to reduced estimation error when σ_Z/∆ ≤ 1/3. If subtractive dithering is not possible, the best performance can be achieved by adding Gaussian noise to set σ_Z/∆ ≈ 1/3 and applying µ̂_QML (although µ̂_Q can be used if simplicity is required). However, larger performance improvements can be achieved with a subtractively-dithered quantizer. For K subtractively-dithered measurements, compute ξ_1 from either (39) or (40) to determine whether to use µ̂_mid (in Regime I) or µ̂_α (in Regime II).
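The Fig. 7 flowchart reduces to a few branches of code; this helper is our paraphrase of the figure (not code from the paper) and assumes the xi1_approx function from the earlier sketch.

```python
def choose_estimator(ratio, K, can_dither, need_simple=False):
    """Paraphrase of the Fig. 7 decision flowchart; ratio = sigma_Z / Delta."""
    if ratio > 1 / 3:
        return "no dither: quantized-sample mean (mu_Q)"
    if not can_dither:
        # add non-subtractive Gaussian noise to reach sigma_Z/Delta ~ 1/3
        return "mu_Q" if need_simple else "mu_QML after adding Gaussian noise"
    if ratio <= xi1_approx(K):                       # Regime I
        return "subtractive dither + midrange (mu_mid)"
    return "subtractive dither + trimmed mean (mu_alpha)"  # Regime II
```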


Fig. 8: Example dithered measurements are shown in (a–c) for σ_Z/∆ = 0.004, 0.04, and 0.4 with K = 20. Plots (d–f) show the resulting coefficient values for the GG estimators given the estimated value of p̂_Ve above. In (g), σ_Z/∆ = 0.04 and K = 100, highlighting how the coefficients change as K increases. Note that the coefficients of the order statistics for the NL estimator depend on the measured data sequence shown above.

In Regime III, the NMSE of µ̂_Q and µ̂_QML matches that of the best estimators applied to dithered data. Clearly, σ_Z/∆ is large enough that even the quantized measurements contain sufficient information about the signal variation. This suggests that dither provides no benefit in Regime III, since equal performance can be achieved without dither. Again, for both Regimes II and III, the NMSE decreases as K is increased.

B. Order Statistics-Based Estimator Coefficients

To better understand why the order statistics-based estimators have essentially identical performance, in Fig. 8 we plot the coefficients a_i from (26) for each estimator. The top row shows example measurements for K = 20, ∆ = 1, and σ_Z/∆ = 0.004, 0.04, and 0.4, respectively, with the samples spreading out as the Gaussian variance increases. The second row of plots depicts the resulting coefficients for µ̂_NB, µ̂_NL, and µ̂_α using the estimated value p̂_Ve. Fig. 8d shows the coefficients are equivalent to those of µ̂_mid for small σ_Z/∆. In Figs. 8e and 8f, the coefficients of the various estimators are no longer identical. However, the coefficients follow the same trends for each estimator, with zero weight on the middle order statistics for small σ_Z/∆ and more evenly-distributed weights as the noise model approaches a Gaussian. We note that the coefficients for µ̂_NL vary depending on the particular set of measurements shown in the top row, and that different sample realizations can result in coefficients more or less similar to those of µ̂_NB and µ̂_α. To show the behavior of the coefficients as K increases, we also plot {a_i}_{i=1}^K for K = 100 and σ_Z/∆ = 0.04 in Fig. 8g. This plot underscores that the coefficients for µ̂_α are basically indicators of the most significant non-zero coefficients of µ̂_NB and µ̂_NL. Using the simple formulation of µ̂_α as a guide, in the limit as K → ∞, only Kα coefficients would have nonzero weight. Since the performance of all three order statistics-based estimators is similar, this further suggests that the selection of which order statistics are used is more important than exactly how much they are weighted.

C. Normalized MSE vs. K

To better understand how the number of measurements affects the estimators' performance, we plot results for three fixed values of σ_Z/∆, one in each regime (0.004, 0.04, 0.4), while varying K in Fig. 9.

In Fig. 9a, µ̂_mid follows NMSE(mid) as expected for Regime I until K ≈ 200. At that point, the NMSE of the midrange begins to diverge, with slower improvement as K increases. Similarly, the GG estimators follow NMSE(mid) until K ≈ 200 and then switch to NCRB(µ_X). This suggests that σ_Z/∆ = 0.004 is in Regime I for K < 200 and in Regime II for K > 200.



Fig. 9: The performance of the order statistics estimators is evaluated for σ_Z/∆ = (a) 0.004, (b) 0.04, and (c) 0.4 to show the full range of behavior as the number of measurements K increases. The plots show the MSE normalized to ∆ = 1 from 20,000 trials per data point. Dashed lines show the theoretical NMSE of the mean and midrange, and the asymptotic variance of the ML estimator of the GGD mean. The ML estimator for dithered measurements µ̂_DML and the estimators based on the GGD (µ̂_GGML, µ̂_NB, µ̂_NL, and µ̂_α) achieve the lowest NMSE for all K (curves are overlapping).

This switch between regimes occurs near the intersection of NMSE(mid) and NVar(GGML) as a function of K, further validating these bounds as useful demarcations of estimator performance. For all K in the plotted range, the midrange and GG estimators outperform the mean. The quantized estimators show almost no improvement as K increases.

In Fig. 9b, the midrange performance is similar to that in Fig. 9a, with µ̂_mid following NMSE(mid) until the intersection of NMSE(mid) and NVar(GGML) and then improving more slowly as a function of the number of measurements, eventually being outperformed by µ̂_mean for large K. The GG estimators likewise follow NMSE(mid) for small K and switch to following NCRB(µ_X) after the intersection. For large K, the NMSE of the GG estimators is a constant factor lower than that of µ̂_mean, with this factor approximately given by β(p̂_Ve). The NMSE of the quantized estimators decreases slowly as K increases, with marginally better performance for µ̂_QML than µ̂_Q.

Figures 9a and 9b help answer the question of how the order statistics-based estimators "between" the midrange and the mean would perform as a function of K. The results suggest that these estimators ultimately have O(K⁻¹) NMSE reduction, although this reduction is faster for small values of K.

In Fig. 9c, the noise can be sufficiently described as Gaussian for K < 359; however, for larger K, ξ_2 > 0.4, as shown in Fig. 5b. The midrange has poor performance for all K, while the other dithered estimators and the quantized estimators have essentially identical performance for K < 359. Those estimators follow NMSE(mean), NVar(GGML), and NCRB(µ_X), which have converged. In this regime, it is clear that there is no benefit to using dither, as there is minimal improvement in performance even for large K. In fact, implementing a dithered quantizer is likely more complicated in practice and is discouraged for Regime III.

VII. CONCLUSION

This work studied the task of estimating the mean of a Gaussian signal from quantized measurements. By applying subtractive dither to the measurement process, the noise becomes signal-independent but no longer has a Gaussian distribution. We showed that the generalized Gaussian distribution is a close and useful approximation for the Gaussian-plus-uniform total noise distribution. Estimators using the generalized Gaussian approximation effectively match the performance of the ML estimator for the total noise, which is a significant improvement over the conventional mean and midrange estimators in Regime II. Due to its computational simplicity and efficient performance, we recommend the trimmed mean µ̂_α. From further comparison against estimators for quantized measurements, we determined simple design rules for deciding whether and how to use quantized measurements. In short, there is value in using dither in Regimes I and II, and a GG-based estimator should be used in Regime II. Future work will address variations on the measurement model, including non-Gaussian signal distributions and different dither implementations.

APPENDIX A
CRAMÉR-RAO BOUND

The Cramér-Rao bound is a lower bound on the variance of an unbiased estimator [42], given by

CRB(µ_X) = 1/I(µ_X),

where I(µ_X) is the Fisher information computed as

I(µ_X) = E[ (∂ log f_Y(y; µ_X, σ_Z, ∆)/∂µ_X)² ]
  (a) = ∫ [∂f_Y(y; µ_X, σ_Z, ∆)/∂µ_X]² / f_Y(y; µ_X, σ_Z, ∆) dy
  (b) = (1/(σ_Z²∆)) ∫ [φ((v − ∆/2)/σ_Z) − φ((v + ∆/2)/σ_Z)]² / [Φ((v + ∆/2)/σ_Z) − Φ((v − ∆/2)/σ_Z)] dv
  (c) = (1/σ_Z²) ∫ [φ((u − 1/2)/(σ_Z/∆)) − φ((u + 1/2)/(σ_Z/∆))]² / [Φ((u + 1/2)/(σ_Z/∆)) − Φ((u − 1/2)/(σ_Z/∆))] du,  (45)

where step (a) uses the definition of expectation and the chain rule, (b) differentiates (15) with respect to µ_X for v = y − µ_X, and (c) changes variables to u = v/∆. Normalizing by ∆² removes the separate dependence on σ_Z or ∆, so we define the normalized CRB as

NCRB(µ_X) = CRB(µ_X)/∆²
  = (σ_Z/∆)² / ∫ { [φ((u − 1/2)/(σ_Z/∆)) − φ((u + 1/2)/(σ_Z/∆))]² / [Φ((u + 1/2)/(σ_Z/∆)) − Φ((u − 1/2)/(σ_Z/∆))] } du.  (46)

Finally, Fisher information is additive for independent observations, so for K independent samples, the lower bound on the NCRB is 1/K times that for one observation.
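The integral in (46) has no closed form but is easy to evaluate; a sketch (ours) using scipy.integrate.quad, where the finite integration window and the underflow guard on the denominator are numerical assumptions:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def ncrb(r, K=1):
    """Normalized Cramer-Rao bound (46) for r = sigma_Z / Delta, K samples."""
    def integrand(u):
        num = (norm.pdf((u - 0.5) / r) - norm.pdf((u + 0.5) / r)) ** 2
        den = norm.cdf((u + 0.5) / r) - norm.cdf((u - 0.5) / r)
        return num / max(den, 1e-300)   # guard against 0/0 in the far tails
    L = 1.0 + 10.0 * r                  # assumed truncation of the real line
    total, _ = quad(integrand, -L, L, limit=200)
    return r**2 / (K * total)
```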

APPENDIX B
KURTOSIS MATCHING

The kurtosis of a random variable B is the standardized fourth central moment [70], defined as

κ(B) = E[(B − µ_B)⁴] / {E[(B − µ_B)²]}² = µ_4(B)/σ_B⁴.  (47)

The excess kurtosis γ(B) = κ(B) − 3 is often used to simplify computations. Define A = B + C, where B and C are independent random variables. The kurtosis of the sum can be computed by expanding (47) as follows:

κ(A) = E[(A − µ_A)⁴] / {E[(A − µ_A)²]}²
  = E[{(B − µ_B) + (C − µ_C)}⁴] / {E[((B − µ_B) + (C − µ_C))²]}²
  = [µ_4(B) + µ_4(C) + 6σ_B²σ_C²] / (σ_B² + σ_C²)²,

where independence eliminates the odd cross terms. Then the excess kurtosis is

γ(A) = [σ_B⁴ γ(B) + σ_C⁴ γ(C)] / σ_A⁴.  (48)

The kurtosis of Gaussian and uniform random variables is well-known and straightforward to compute from the definition; the excess kurtosis is 0 for a Gaussian and −6/5 for a uniform distribution. From [52], we have that the excess kurtosis of a GGRV V with shape parameter p_v is³

γ(V) = Γ(1/p_v)Γ(5/p_v) / [Γ(3/p_v)]² − 3.  (49)

To fit the GGD to the sum of uniform and Gaussian random variables, we set the kurtosis of the approximation to match the kurtosis of the sum using (48):

Γ(1/p_v)Γ(5/p_v) / [Γ(3/p_v)]² = 3 + [σ_Z⁴ · 0 + σ_W⁴ · (−6/5)] / (σ_W² + σ_Z²)²
  = 3 − (6/5) · 1 / [1 + 12(σ_Z/∆)²]²,  (50)

where σ_W² = ∆²/12.

³Note that the definition of kurtosis in [52] corresponds to the excess kurtosis in this work.

APPENDIX C
MEAN SQUARED ERROR OF µ̂_Q

We use iterated expectation to compute the MSE of µ̂_Q as

E[(µ̂_Q − µ_X)²] = E{ E[(µ̂_Q − µ_X)² | µ_X] },  (51)

with no prior knowledge on the true value, so that we assume µ_X ∼ U[−∆/2, ∆/2] within a bin. Define a function g: R → R as g(x) := E[(µ̂_Q − µ_X)² | µ_X = x]; then

g(x) = E[ ( (1/K) Σ_{i=1}^K q(x + Z_i) − x )² ]
  = x² + (1/K²){ Σ_{i=1}^K E[(q(x + Z_i))²] + Σ_{i=1}^K Σ_{j≠i} E[q(x + Z_i)] E[q(x + Z_j)] } − (2x/K) Σ_{i=1}^K E[q(x + Z_i)]
  = x² + (1/K) E[(q(x + Z))²] + [(K − 1)/K] (E[q(x + Z)])² − 2x E[q(x + Z)].  (52)

Using the definition

Ψ(m, x) = Φ( (m + 1/2 − x)/(σ_Z/∆) ) − Φ( (m − 1/2 − x)/(σ_Z/∆) ),  (53)

note that

E[q(x + Z)] = lim_{M→∞} Σ_{m=−M}^{M} m∆ ∫_{m∆−∆/2}^{m∆+∆/2} (1/σ_Z) φ((z − x)/σ_Z) dz ≈ ∆ Σ_{m=−M}^{M} m Ψ(m, x)

for some large number M. Similarly,

E[(q(x + Z))²] ≈ ∆² Σ_{m=−M}^{M} m² Ψ(m, x).

The MSE normalized by ∆² then follows as

E[(µ̂_Q − µ_X)²]/∆² = 1/12 + (1/K) ∫_{−1/2}^{1/2} Σ_{m=−M}^{M} m² Ψ(m, x) dx + [(K − 1)/K] ∫_{−1/2}^{1/2} ( Σ_{m=−M}^{M} m Ψ(m, x) )² dx − 2 ∫_{−1/2}^{1/2} x Σ_{m=−M}^{M} m Ψ(m, x) dx.  (54)

ACKNOWLEDGMENT

The authors would like to thank Dr. Yanting Ma for assistance with the derivations in Appendix C and Dr. Dongeek Shin for helpful discussions on the role of nonsubtractive dither in improving estimation accuracy. Computing resources provided by Boston University's Research Computing Services are also gratefully appreciated.

REFERENCES

[1] N. F. Gjeddebaek, "Contribution to the study of grouped observations. IV. Some comments on simple estimates," Biometrics, vol. 15, no. 3, pp. 433–439, Sep. 1959.
[2] R. A. Fisher, "On the mathematical foundations of theoretical statistics," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 222, no. 594-604, pp. 309–368, Jan. 1922.
[3] D. Shin, F. Xu, D. Venkatraman, R. Lussana, F. Villa, F. Zappa, V. K. Goyal, F. N. Wong, and J. H. Shapiro, "Photon-efficient imaging with a single-photon camera," Nature Commun., vol. 7, p. 12046, 2016.
[4] J. Rapp, R. M. A. Dawson, and V. K. Goyal, "Dither-enhanced lidar," in Imaging and Applied Optics, Orlando, FL, Jun. 2018, p. JW4A.38.
[5] ——, "Improving lidar depth resolution with dither," in Proceedings of the IEEE International Conference on Image Processing, Oct. 2018, pp. 1553–1557.
[6] S. B. Vardeman and C.-S. Lee, "Likelihood-based statistical estimation from quantized data," IEEE Trans. Instrum. Meas., vol. 54, no. 1, pp. 409–414, Feb. 2005.
[7] A. Moschitta, J. Schoukens, and P. Carbone, "Information and statistical efficiency when quantizing noisy DC values," IEEE Trans. Instrum. Meas., vol. 64, no. 2, pp. 308–317, Feb. 2015.
[8] S. B. Vardeman, "Sheppard's correction for variances and the "quantization noise model"," IEEE Trans. Instrum. Meas., vol. 54, no. 5, pp. 2117–2119, Oct. 2005.
[9] W. F. Sheppard, "On the calculation of the average square, cube, &c., of a large number of magnitudes," J. Roy. Statist. Soc., vol. 60, no. 3, pp. 698–703, Sep. 1897.
[10] ——, "On the calculation of the most probable values of frequency-constants, for data arranged according to equidistant divisions of a scale," Proc. London Math. Soc., vol. 29, pp. 353–380, 1898.
[11] M. G. Kendall, "The conditions under which Sheppard's corrections are valid," J. Roy. Statist. Soc., vol. 101, no. 3, pp. 592–605, 1938.
[12] D. F. Heitjan, "Inference from grouped continuous data: A review," Statist. Sci., vol. 4, no. 2, pp. 164–179, 1989.
[13] Z. Bai, S. Zheng, B. Zhang, and G. Hua, "Statistical analysis for rounded data," J. Statist. Plan. Infer., vol. 139, pp. 2526–2542, 2009.
[14] R. H. Walden, "Analog-to-digital converter survey and analysis," IEEE J. Sel. Areas Comm., vol. 17, no. 4, pp. 539–550, Apr. 1999.
[15] D. V. O'Connor and D. D. Phillips, Time-Correlated Single Photon Counting. Academic Press, 1984.
[16] B. Widrow, "Statistical analysis of amplitude-quantized sampled-data systems," Transactions of the American Institute of Electrical Engineers, Part II: Applications and Industry, vol. 79, no. 6, pp. 555–568, 1961.
[17] A. B. Sripad and D. L. Snyder, "A necessary and sufficient condition for quantization errors to be uniform and white," IEEE Trans. Acoust. Speech Signal Process., vol. 25, no. 5, pp. 442–448, Oct. 1977.
[18] T. A. C. M. Claasen and A. Jongepier, "Model for the power spectral density of quantization noise," IEEE Trans. Acoust. Speech Signal Process., vol. ASSP-29, no. 4, pp. 914–917, Aug. 1981.
[19] H. Viswanathan and R. Zamir, "On the whiteness of high-resolution quantization errors," IEEE Trans. Inform. Theory, vol. 47, no. 5, pp. 2029–2038, Jul. 2001.
[20] D. Marco and D. L. Neuhoff, "The validity of the additive noise model for uniform scalar quantizers," IEEE Trans. Inform. Theory, vol. 51, no. 5, pp. 1739–1755, May 2005.
[21] L. R. A. MacColl, Fundamental Theory of Servomechanisms. D. Van Nostrand Company, Inc., 1945.
[22] T. Ishikawa, "Linearization of contactor control systems by external dither signals," Electronics Laboratories, Stanford University, Stanford, CA, Tech. Rep., 1960.
[23] W. M. Goodall, "Television by pulse code modulation," Bell Syst. Tech. J., vol. 30, no. 1, pp. 33–49, Jan. 1951.
[24] L. Roberts, "Picture coding using pseudo-random noise," IEEE Trans. Inform. Theory, vol. 8, no. 2, pp. 145–154, Feb. 1962.
[25] N. S. Jayant and L. R. Rabiner, "The application of dither to the quantization of speech signals," Bell Syst. Tech. J., vol. 51, no. 6, pp. 1293–1304, Jul. 1972.
[26] G. G. Furman, "Improving performance of quantizer feedback systems by use of external dither," M.S. Thesis, Massachusetts Institute of Technology, 1959.
[27] R. C. Jaffe, "Causal and statistical analyses of dithered systems containing three-level quantizers," M.S. Thesis, Massachusetts Institute of Technology, 1959.
[28] L. Schuchman, "Dither signals and their effect on quantization noise," IEEE Trans. Commun., vol. 12, no. 4, pp. 162–165, Dec. 1964.
[29] R. M. Gray and T. G. Stockham, "Dithered quantizers," IEEE Trans. Inform. Theory, vol. 39, no. 3, pp. 805–812, 1993.
[30] R. A. Wannamaker, S. P. Lipshitz, J. Vanderkooy, and J. N. Wright, "A theory of nonsubtractive dither," IEEE Trans. Signal Process., vol. 48, no. 2, pp. 499–516, 2000.
[31] H. C. Papadopoulos, G. W. Wornell, and A. V. Oppenheim, "Sequential signal encoding from noisy measurements using quantizers with dynamic bias control," IEEE Trans. Inform. Theory, vol. 47, no. 3, pp. 978–1002, 2001.

[32] D. Rousseau and G. V. Anand, "Nonlinear estimation from quantized signals: Quantizer optimization and stochastic resonance," in Third International Symposium on Physics in Signal and Image Processing, 2003, pp. 89–92.
[33] A. Ribeiro and G. B. Giannakis, "Bandwidth-constrained distributed estimation for wireless sensor networks - part I: Gaussian case," IEEE Trans. Signal Process., vol. 54, no. 3, pp. 1131–1143, Mar. 2006.
[34] O. Dabeer and A. Karnik, "Signal parameter estimation using 1-bit dithered quantization," IEEE Trans. Inform. Theory, vol. 52, no. 12, pp. 5389–5405, Dec. 2006.
[35] J. Fang and H. Li, "Distributed adaptive quantization for wireless sensor networks: From delta modulation to maximum likelihood," IEEE Trans. Signal Process., vol. 56, no. 10, pp. 5246–5257, Oct. 2008.
[36] R. C. Farias and J. M. Brossier, "Scalar quantization for estimation: From an asymptotic design to a practical solution," IEEE Trans. Signal Process., vol. 62, no. 11, pp. 2860–2870, Jun. 2014.
[37] C. Gianelli, L. Xu, J. Li, and P. Stoica, "One-bit compressive sampling with time-varying thresholds: Maximum likelihood and the Cramér-Rao bound," in Conf. Rec. Asilomar Conf. on Signals, Syst. & Computers, Nov. 2016, pp. 399–403.
[38] A. M. Mood, F. A. Graybill, and D. C. Boes, Introduction to the Theory of Statistics, 3rd ed. McGraw-Hill, 1974.
[39] E. H. Lloyd, "Least-squares estimation of location and scale parameters using order statistics," Biometrika, vol. 39, no. 1/2, pp. 88–95, Apr. 1952.
[40] J. D. Broffitt, "An example of the large sample behavior of the midrange," The American Statistician, vol. 28, no. 2, pp. 69–70, 1974.
[41] P. Carbone, C. Narduzzi, and D. Petri, "Performance of stochastic quantizers employing nonlinear processing," IEEE Trans. Instrum. Meas., vol. 45, no. 2, pp. 435–439, Apr. 1996.
[42] H. L. Van Trees, K. L. Bell, and Z. Tian, Detection, Estimation, and Modulation Theory: Detection, Estimation, and Filtering Theory, 2nd ed. Hoboken, NJ: John Wiley & Sons, Inc., 2013.
[43] N. T. Thao and M. Vetterli, "Reduction of the MSE in R-times oversampled A/D conversion from O(1/R) to O(1/R²)," IEEE Trans. Signal Process., vol. 42, no. 1, pp. 200–203, Jan. 1994.
[44] ——, "Lower bound on the mean-squared error in oversampled quantization of periodic signals using vector quantization analysis," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 469–479, Mar. 1996.
[45] V. K. Goyal, M. Vetterli, and N. T. Thao, "Quantized overcomplete expansions in R^N: Analysis, synthesis, and algorithms," IEEE Trans. Inform. Theory, vol. 44, no. 1, pp. 16–31, Jan. 1998.
[46] S. Rangan and V. K. Goyal, "Recursive consistent estimation with bounded noise," IEEE Trans. Inform. Theory, vol. 47, no. 1, pp. 457–464, Jan. 2001.
[47] L. Jacques, D. K. Hammond, and J. M. Fadili, "Dequantizing compressed sensing: When oversampling and non-Gaussian constraints combine," IEEE Trans. Inform. Theory, vol. 57, no. 1, pp. 559–571, Jan. 2011.
[48] A. Zymnis, S. Boyd, and E. Candès, "Compressed sensing with quantized measurements," IEEE Signal Process. Lett., vol. 17, no. 2, pp. 149–152, Feb. 2010.
[49] U. Kamilov, V. K. Goyal, and S. Rangan, "Message-passing de-quantization with applications to compressed sensing," IEEE Trans. Signal Process., vol. 60, no. 12, pp. 6270–6281, Dec. 2012.
[50] M. K. Varanasi and B. Aazhang, "Parametric generalized Gaussian density estimation," The Journal of the Acoustical Society of America, vol. 86, no. 4, pp. 1404–1415, 1989.
[51] Q. Zhao, H.-W. Li, and Y.-T. Shen, "On the sum of generalized Gaussian random signals," in Proceedings of the International Conference on Signal Processing, vol. 1, Aug. 2004, pp. 50–53.
[52] H. Soury and M. S. Alouini, "New results on the sum of two generalized Gaussian random variables," in 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Dec. 2015, pp. 1017–1021.
[53] N. C. Beaulieu and Q. Guo, "True ML estimator for the location parameter of the generalized Gaussian distribution with p = 4," IEEE Comm. Lett., vol. 17, no. 1, pp. 155–157, Jan. 2013.
[54] G. R. Arce, Nonlinear Signal Processing: A Statistical Approach. John Wiley & Sons, 2005.
[55] A. Bovik, T. Huang, and D. Munson, "Nonlinear filtering using linear combinations of order statistics," in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process., vol. 7, May 1982, pp. 2067–2070.
[56] ——, "A generalization of median filtering using linear combinations of order statistics," IEEE Trans. Acoust. Speech Signal Process., vol. 31, no. 6, pp. 1342–1350, Dec. 1983.
[57] A. K. Gupta, "Estimation of the mean and standard deviation of a normal population from a censored sample," Biometrika, vol. 39, no. 3, pp. 260–273, Dec. 1952.
[58] G. Blom, "On linear estimates with nearly minimum variance," Arkiv för Matematik, vol. 3, no. 4, pp. 365–369, Jan. 1957.
[59] R. Öten and R. J. P. de Figueiredo, "An efficient method for L-filter design," IEEE Trans. Signal Process., vol. 51, no. 1, pp. 193–203, Jan. 2003.
[60] Anonymous, "Dissertation sur la recherche du milieu le plus probable, entre les résultats de plusieurs observations ou expériences," Annales de Mathématiques pures et appliquées, vol. 12, pp. 181–204, 1821.
[61] S. M. Stigler, "The anonymous Professor Gergonne," Historia Mathematica, vol. 3, no. 1, pp. 71–74, Feb. 1976.
[62] P. Daniell, "Observations weighted according to order," American Journal of Mathematics, vol. 42, no. 4, pp. 222–236, 1920.
[63] S. M. Stigler, "Simon Newcomb, Percy Daniell, and the history of robust estimation 1885-1920," Journal of the American Statistical Association, vol. 68, no. 344, p. 31, 1973.
[64] J. W. Tukey, "The future of data analysis," The Annals of Mathematical Statistics, vol. 33, no. 1, pp. 1–67, 1962.
[65] J. W. Tukey and D. H. McLaughlin, "Less vulnerable confidence and significance procedures for location based on a single sample: Trimming/Winsorization 1," Sankhyā: The Indian Journal of Statistics, Series A (1961-2002), vol. 25, no. 3, pp. 331–352, 1963.
[66] A. Restrepo and A. C. Bovik, "Adaptive trimmed mean filters for image restoration," IEEE Trans. Acoust. Speech Signal Process., vol. 36, no. 8, pp. 1326–1337, 1988.
[67] N. C. Beaulieu and Q. Guo, "Novel estimator for the location parameter of the generalized Gaussian distribution," IEEE Comm. Lett., vol. 16, no. 12, pp. 2064–2067, Dec. 2012.
[68] J. A. Nelder and R. Mead, "A simplex method for function minimization," The Computer Journal, vol. 7, no. 4, pp. 308–313, Jan. 1965.
[69] P. Carbone and D. Petri, "Effect of additive dither on the resolution of ideal quantizers," IEEE Trans. Instrum. Meas., vol. 43, no. 3, pp. 389–396, Jun. 1994.
[70] E. J. Dudewicz and S. N. Mishra, "Moments," in Modern Mathematical Statistics. New York: Wiley, 1988, ch. 5, pp. 216–228.