Optimal Trade-off Between Sampling Rate and Quantization Precision in A/D conversion

Alon Kipnis, Yonina C. Eldar and Andrea J. Goldsmith

Abstract—The jointly optimized sampling rate and quantization precision in A/D conversion is studied. In particular, we consider a basic pulse code modulation A/D scheme in which a stationary process is sampled and quantized by a scalar quantizer. We derive an expression for the minimal mean squared error under linear estimation of the analog input from the digital output, which is also valid under sub-Nyquist sampling. This expression allows for the computation of the sampling rate that minimizes the error under a fixed bitrate at the output, which is the result of an interplay between the number of bits allocated to each sample and the distortion resulting from sub-Nyquist sampling. We illustrate the results for several examples, which demonstrate the optimality of sub-Nyquist sampling in certain cases.

I. INTRODUCTION

Representing an analog signal by a sequence of bits leads to a fundamental trade-off between the minimal distortion in the reconstruction of the signal from this sequence and its bitrate. This trade-off is described by the distortion-rate function (DRF) of the analog source. While the DRF gives the minimal distortion only as a function of the bitrate of the digital representation, in practice, A/D conversion schemes involve sampling and quantization. Therefore, hardware limitations on sampling and quantization determine current A/D technology. For instance, a key idea in determining the analog DRF is to map the continuous-time process into a discrete-time process based on sampling at or above the Nyquist frequency [1, Sec. 4.5.3]. However, since wideband signaling and A/D technology limitations can preclude sampling signals at their Nyquist rate [2], an optimal source code based on such a discrete-time representation may be impractical in certain scenarios.

An approach combining sampling and source coding, in which an analog Gaussian process is described from a rate-limited version of its samples, was considered in [3]. The main finding of [3] is that sampling at or above the Nyquist rate may not be necessary in order to achieve the DRF D(R). Specifically, for each bitrate R there exists another fundamental rate f_RD, which may be smaller than the Nyquist rate, such that sampling at the rate f_RD is enough to achieve D(R). In addition, [3] derives the existence of a range of sampling frequencies where distortion due to sub-Nyquist sampling can be traded with distortion due to lossy compression, without affecting the overall distortion sum. This general result serves as the motivation for the present work, where we seek to derive the optimal trade-off between sampling rate and quantization precision to minimize distortion in A/D converters with fixed bitrate outputs. Under this output constraint, a faster sampling frequency results in a lower quantization precision, and vice versa. Hence, there may be some A/D converters for which sub-Nyquist sampling minimizes distortion. Our results will thus indicate when sampling at Nyquist or sub-Nyquist rates yields minimum A/D distortion.

A very basic A/D conversion scheme is obtained by sampling and quantizing each sample using a scalar quantizer, which is referred to as pulse code modulation (PCM) [4]. Under this scheme, the overall bitrate R in the resulting digital representation is the product of the sampling rate f_s and the quantizer bit precision q. In this work we analyze A/D conversion via PCM as a source coding scheme: we consider the minimal error as a function of the bitrate R by assuming a statistical model on the input process and mean squared error (MSE) as our performance metric. The quantization distortion is modeled as an additive white noise whose magnitude decreases exponentially with the bits per sample q, where q = R/f_s. While this model was found to be accurate when the quantizer resolution is relatively high [5], the white noise assumption may not hold under a very coarse quantizer. Nevertheless, an analysis of the minimal MSE from single-bit measurements which does not use the white noise model provided an MSE improvement of only up to 3 dB per octave in the bitrate compared to an analysis using the white noise model [6]. This implies that the results in this work would suffer only a minor change under an exact model of the low-resolution quantizer. This approximation does not affect our conclusions, which are based only on the scaling behavior of the MSE as a function of the sampling rate and the quantizer resolution. We elaborate more on this approximation in Section II.

Fig. 1: Combined sampling and source coding model.

When bitrate considerations are ignored, the PCM A/D conversion scheme considered here and higher order schemes such as Sigma-Delta modulation (ΣΔM) benefit from oversampling (sampling above the Nyquist rate of the input signal), which reduces in-band quantization noise. This increases the effective resolution of the quantizer, which is usually taken to be very coarse (typically 1 bit). While these modulators are attractive due to their relatively cheap hardware implementation [7], correlation between consecutive time samples taken at high sampling rates implies that conventional oversampled modulators cannot lead to an efficient memory utilization without further coding of the samples [8]. Even with the additional coding

suggested in [9], the sampling rate required to approach the DRF is still very high compared to sampling at the Nyquist rate. Other oversampled A/D conversion approaches which achieve exponential error reduction with the bitrate were proposed in [10], but so far have not been realized in practice. Since A/D technology limits sampling rates, oversampled A/D may not be practical for applications with signals of wide bandwidth, such as white-space estimation in cognitive radio systems [11], [2]. In addition, high sampling rates increase the memory requirements of the A/D and the system power consumption.

Fig. 2: Reverse waterfilling interpretation of (1): the function D(f_s,R) of a unimodal S_X(f) and zero noise is given by the sum of the sampling error and the lossy compression error.

These challenges in A/D technology motivate us to understand how to sample in a memory-efficient manner. We impose a constraint on the bitrate at the output of the system and examine the trade-off between sampling rate and distortion. We show that under certain assumptions on the signal, the rate-distortion function can be approached by sampling below the Nyquist rate.

The main result of this paper is an expression for the minimal MSE (MMSE) in A/D conversion using PCM under a fixed bitrate R at the output of the modulator. The result is valid for any sampling rate, regardless of whether the input is band-limited or not. This result allows us to compute the sampling rate f_s* that minimizes the MMSE for this bitrate. To our knowledge, this is the first analysis of A/D conversion under a fixed bitrate in the sub-Nyquist regime. We show that in the case where the input signal is band-limited, f_s* is obtained at the Nyquist rate or below it. The value of f_s* depends on the power spectral density (PSD): the more uniformly the power is distributed over the band, the closer f_s* is to the Nyquist rate. We compare the behavior of f_s* as a function of R to the minimal sampling rate f_RD, as defined in [3], that is needed to achieve the quadratic Gaussian DRF for several example input signals.

The rest of this paper is organized as follows: in Section II we provide the relevant background on PCM, MSE estimation in sub-Nyquist sampling and the distortion-rate function of sub-Nyquist sampled processes. Our main results and discussion are given in Section III. Concluding remarks are provided in Section IV.

II. BACKGROUND AND PROBLEM FORMULATION

A. Distortion-Rate Theory of Sampled Processes

An information theoretic bound on the MMSE in any A/D conversion scheme whose output bitrate is constrained to R bits per time unit is given by the DRF of the analog source. For a Gaussian stationary process X(·), this DRF is obtained in terms of S_X(f), the PSD of X(·), by a parametric expression derived by Pinsker [12] with a reverse waterfilling interpretation. A key idea in proving the source coding theorem which ties Pinsker's expression to the A/D conversion problem is to map X(·) into a discrete-time process based on sampling above its Nyquist frequency f_Nyq [1, Sec. 4.5.3]. The situation in which sampling at the Nyquist rate f_Nyq cannot be achieved due to system constraints [2] gives rise to the combined sampling and source coding problem depicted in Fig. 1 and solved in [13]. In this setting, X(·) is described from a rate-R limited version of its sub-Nyquist samples Y[·]. The minimal distortion in reconstruction taken over all such descriptions is denoted as D(f_s,R). Under the assumption that S_X(f) is unimodal and H_a(f) is lowpass with cutoff frequency f_s/2, D(f_s,R) takes the following form [13, Eq. 9]:

R(θ) = (1/2) ∫_{-f_s/2}^{f_s/2} log⁺[S_X(f)/θ] df,   (1a)

D(θ) = mmse_X(f_s) + ∫_{-f_s/2}^{f_s/2} min{S_X(f), θ} df,   (1b)

where log⁺(x) = max{0, log(x)} and

mmse_X(f_s) ≜ ∫_{ℝ∖(-f_s/2, f_s/2)} S_X(f) df.   (2)

A waterfilling interpretation of (1) is illustrated in Fig. 2. Assume that X(·) is band-limited to f_B. If f_s > 2f_B, then there is no loss of information in the sampling process, in which case we have

D(f_s,R) = D(R),   f_s ≥ 2f_B,   (3)

where D(R) is the (standard) quadratic DRF of the analog Gaussian source X(·), which is obtained by the celebrated reverse waterfilling expression of Pinsker [12]. In fact, it follows from [3] that if the energy of X(·) is not uniformly distributed over its bandwidth, then there exists a source coding rate R and a minimal sampling rate f_RD < 2f_B such that (3) holds for all f_s ≥ f_RD. This critical sampling rate can be computed from the equation

R = (1/2) ∫_{-f_RD/2}^{f_RD/2} log⁺[S_X(f)/S_X(f_RD/2)] df.   (4)

It is shown in [3] that f_RD is monotonically increasing in R and approaches the Nyquist rate 2f_B as R goes to infinity. This result can be seen as an extension of the Shannon-Nyquist-Whittaker sampling theorem to the scenario where a finite bitrate constraint is imposed [14].

Fig. 3: System model of A/D with a scalar quantizer.

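To make the reverse waterfilling pair (1a)-(1b) concrete, the following numerical sketch (our own illustration, not part of the original derivation) evaluates D(f_s,R) by bisecting on the water level θ until the in-band rate integral (1a) equals R, and then reading the distortion off (1b). The triangular PSD, the unit power σ² = 1, and the grid sizes are all illustrative assumptions.

```python
import math

def triangular_psd(f, f_B=1.0):
    # Illustrative unimodal PSD with unit total power (sigma^2 = 1).
    return max(1.0 - abs(f) / f_B, 0.0) / f_B

def sampled_drf(fs, R, psd, n=4000):
    # Evaluate D(fs, R) of (1a)-(1b): bisect on the water level theta until
    # the in-band rate integral (1a) equals R, then read off (1b).
    fgrid = [-fs / 2 + fs * (i + 0.5) / n for i in range(n)]
    df = fs / n
    S = [psd(f) for f in fgrid]

    def rate(theta):  # (1a), in bits per time unit; log+ keeps only S > theta
        return 0.5 * df * sum(math.log2(s / theta) for s in S if s > theta)

    lo, hi = 1e-12, max(S) + 1e-12   # rate(lo) is huge, rate(hi) is zero
    for _ in range(200):             # geometric bisection on theta
        mid = math.sqrt(lo * hi)
        if rate(mid) > R:
            lo = mid
        else:
            hi = mid
    theta = math.sqrt(lo * hi)
    mmse_x = 1.0 - df * sum(S)       # (2): out-of-band power, with sigma^2 = 1
    return mmse_x + df * sum(min(s, theta) for s in S)   # (1b)
```

Under this model, D(f_s,R) decreases as R grows at a fixed sampling rate, and sampling below the Nyquist rate can only increase it, consistent with (3).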
In this paper we compare the distortion-rate characteristics of a non-ideal A/D conversion scheme under a bitrate constraint to the information theoretic bound D(f_s,R).

B. Problem Formulation: Pulse Code Modulation

We study the performance of the sampling, quantization and reconstruction scheme described in Fig. 3. The input process is an analog wide-sense stationary (WSS) process X(·) = {X(t), t ∈ ℝ} with PSD

S_X(f) ≜ ∫_{-∞}^{∞} E[X(t+τ)X(t)] e^{-2πiτf} dτ.

The discrete-time process Y[·] = {Y[n], n ∈ ℤ} is obtained by uniformly sampling the filtered process at frequency f_s, namely

Y[n] ≜ (X(·) ⋆ h_a(·))(n/f_s),   n ∈ ℤ,

where h_a(t) is the impulse response of the analog filter H_a(f). Let Ŷ[n] be the process at the output of the quantizer at time n, and denote by η[n] the quantization error, i.e.,

Ŷ[n] = Y[n] + η[n],   n ∈ ℤ.   (5)

The variance of η[n] is proportional to the size of the quantization bins, and decreases exponentially with the bit resolution q, provided the size of the bins decreases uniformly [15]. The non-linear relation between the quantizer input and its output complicates the analysis and usually calls for a simplifying assumption that linearizes the problem. A common assumption which we will adopt here is:

(A1) The process η[·] is i.i.d., uncorrelated with Y[·] and with variance

σ_η² = c_0/(2^q - 1)².   (6)

This assumption implies that the PSD of η[·] equals S_η(e^{2πiφ}) = c_0/(2^q - 1)² for any φ ∈ (-0.5, 0.5). The constant c_0 depends on statistical assumptions on the input signal. For example, if the amplitude of the input signal is bounded within the interval (-A_m/2, A_m/2), then we can assume that the quantization bins are uniformly spaced and c_0 = A_m²/12. If the input is Gaussian with variance σ² and the quantization rule is chosen according to the ideal point density allocation of the Lloyd algorithm [16], then [17, Eq. 10]

c_0 = (π√3/2) σ².   (7)

There exists a vast literature on the conditions under which assumption (A1) provides a good approximation to the system behavior. For example, in [15] it was shown that two consecutive samples η[n] and η[n+1] are approximately uncorrelated if the distribution of Y[·] is smooth enough, where this holds even if the sizes of the quantization bins are on the order of the variance of Y[·] [18]. This justifies the assumption that the process η[·] is white. Bennett [19] derived the following conditions under which η[·] and Y[·] are approximately uncorrelated: a smooth PSD of Y[·], uniform quantization bins and a high quantizer resolution q. Since in our setting we are also interested in the low quantizer resolution regime, a better justification for this approximation is required. This will be the result of the following proposition, the proof of which can be found in Appendix A.

Proposition 1. The MMSE in estimating X(·) from Ŷ[·] in (5) is not smaller than the MMSE in estimating X(·) from the process

Ỹ[n] ≜ Y[n] + η̃[n],   n ∈ ℤ,

where η̃[·] is a stationary process possibly correlated with Y[·], with PSD S_η̃(e^{2πiφ}) = S_η(e^{2πiφ}).

Proposition 1 implies that the assumption of an uncorrelated quantization noise and input signal can only increase the error, compared to an estimation scheme under the same marginal noise statistics that also takes into account the correlation between the samples and the quantization noise. We conclude that the analysis under assumption (A1) yields a good approximation to the true error if the quantizer resolution q is high, and provides an upper bound when q is low. The tightness of this upper bound can be learned from [20], where it was shown that PCM with a single-bit quantizer leads to a reduction in the MSE of no more than 3 dB per octave compared to an analysis that assumes (A1).

Under (A1), the relation between the input and the output of the system can be represented in the z domain by

Ŷ(z) = Y(z) + η(z).   (8)

This leads to the following relation between the corresponding PSDs:

S_Ŷ(e^{2πiφ}) = S_Y(e^{2πiφ}) + S_η(e^{2πiφ}) = f_s ∑_{k∈ℤ} S_X(f - f_s k)|H_a(f - f_s k)|² + σ_η²,   (9)

where f = φ f_s. A system that realizes the input-output relation (8) is given in Fig. 4, where, in accordance with (A1), η[·] is white noise independent of X(·).

Fig. 4: Sampling and quantization system model.

C. MMSE Estimation

We are interested in the MMSE in estimating X(·) from its sub-Nyquist and noisy samples Ŷ[·]. Namely, the minimal value of

lim_{T→∞} (1/2T) ∫_{-T}^{T} E(X(t) - X̂(t))² dt   (10)

over all possible reconstruction methods of X(·) from Ŷ[·] of the form

X̂(t) = ∑_{n∈ℤ} w(t,n) Ŷ[n],

where w(t,n) is square summable in n for every t ∈ ℝ. Standard linear estimation theory leads to the following proposition:

Proposition 2. Consider the system in Fig. 4. The minimal time-averaged MSE (10) in linear estimation of X(·) from Ŷ[·] is given by

mmse ≜ mmse_{X|Ŷ}(f_s, H_a) = σ_X² - ∫_{-f_s/2}^{f_s/2} [∑_{k∈ℤ} S_X(f - f_s k)|H_a(f - f_s k)|²]² / [∑_{k∈ℤ} S_X(f - f_s k)|H_a(f - f_s k)|² + σ_η²/f_s] df.   (11)

Proof: see Appendix B.

Note that in Proposition 2 we have not limited ourselves to band-limited input processes or to sub-Nyquist sampling. An expression for the optimal estimator w*(t,n) can be derived from the proof. It can be shown to be of the form w*(t,n) = w(t - n/f_s), where the Fourier transform of w(t) is

W(f) = H_a*(f) S_X(f) / [∑_{k∈ℤ} S_X(f - f_s k)|H_a(f - f_s k)|² + σ_η²/f_s].

The details are given in [21].

Using Hölder's inequality and the monotonicity of the function x → x/(x+1), the integrand in (11) can be bounded for each f in the integration interval (-f_s/2, f_s/2) by

(S*(f))² / [S*(f) + σ_η²/f_s],   (12)

where

S*(f) = sup_{k∈ℤ} S_X(f - f_s k)|H_a(f - f_s k)|².   (13)

This leads to a lower bound on mmse_{X|Ŷ}(f_s, H_a). Under the assumption that S_X(f) is unimodal, in the sense that it is symmetric and non-increasing for f > 0, for each f ∈ (-f_s/2, f_s/2) the supremum in (13) is obtained for k = 0. This implies that (12) is achievable if the pre-sampling filter is a low-pass filter with cut-off frequency f_s/2, namely

H_a*(f) = 1 for |f| ≤ f_s/2, and 0 otherwise.   (14)

This choice of H_a(f) in (11) leads to the following:

mmse*_{X|Ŷ}(f_s) = mmse_X(f_s) + ∫_{-f_s/2}^{f_s/2} S_X(f)/(1 + SNR(f)) df,   (15)

where mmse_X(f_s) is defined in (2) and

SNR(f) ≜ f_s S_X(f)/σ_η²,   -f_s/2 ≤ f ≤ f_s/2.   (16)

Henceforth, we will consider only processes with unimodal PSD, so that the MMSE under optimal pre-sampling filtering is given by (15). See [13] for the optimization of expressions of the form (11) in the case where S_X(f) is not unimodal.

Since the SNR increases linearly in f_s, the MMSE of X(·) given Ŷ[·] decreases by a factor of 1/f_s for f_s > 2f_B, provided all other parameters are independent of f_s. In the next section we study (15) when, in addition, the quantizer resolution is inversely proportional to f_s, so as to keep a constant bitrate at the output as f_s varies.

III. MAIN RESULT: PCM UNDER A FIXED BITRATE

In the PCM A/D conversion system of Fig. 3 with sampling frequency f_s and a quantizer resolution of q bits per sample, the amount of memory per time unit, or the bitrate at the output of the system, equals

R ≜ q f_s

bits per time unit. Since in this model the A/D converter must use at least one bit per sample, we limit f_s to be smaller than the bitrate R. In this section we fix R and study the MMSE as a function of the sampling frequency f_s. Under this assumption, the variance of the quantization noise from (6) satisfies

σ_η² = c_0/(2^q - 1)² = c_0/(2^{R/f_s} - 1)².   (17)

The linear MMSE in estimating X(·) from Ŷ under the optimal pre-sampling filter gives rise to an approximation to the
operational distortion-rate function of the PCM, which we denote by D̃(f_s,R). From (15) and (17) we obtain the following expression for D̃(f_s,R):

Proposition 3. The MMSE in estimating X(·) from Ŷ[·] assuming (A1) and R = q f_s is given by

D̃(f_s,R) = mmse_X(f_s) + ∫_{-f_s/2}^{f_s/2} S_X(f)/(1 + SNR(f)) df,   (18)

where

SNR(f) = SNR_{f_s,R}(f) = (f_s/c_0)(2^{R/f_s} - 1)² S_X(f),   (19)

and mmse_X(f_s) is given by (2).

We will denote the two terms on the RHS of (18) as the sampling error and the quantization error, respectively. Fig. 6 shows the MMSE (18) as a function of f_s for a given R and various PSDs, compared to their corresponding quadratic Gaussian DRF under sub-Nyquist sampling (1). In Fig. 6 and in the other figures throughout, we use the c_0 in (7), which corresponds to an optimal point density for the Gaussian distribution.

Fig. 5: Spectral interpretation of Proposition 3 for (a) f_s > 2f_B and (b) f_s < 2f_B: there is no sampling error when sampling above the Nyquist rate, but the intensity of the in-band quantization noise increases.

Fig. 6: MMSE D̃(f_s,R) as a function of f_s for a fixed R and various PSDs (rectangular Π(f), triangular Λ(f) and Gaussian S_G(f)), which are given in the small frames. The dashed curves are the corresponding DRF under sub-Nyquist sampling, D(f_s,R). The rates f_s* and f_RD correspond to the marked points on each curve.

A. An Optimal Sampling Rate

The quantization error in (18) is an increasing function of f_s, whereas the sampling error mmse_X(f_s) decreases in f_s. This situation is illustrated in Fig. 5. The sampling rate f_s* that minimizes (18) is obtained at an equilibrium point where the derivatives of both terms are of equal magnitude. Fig. 6 shows that f_s* depends on the particular form of the input signal's PSD. If the signal is band-limited, we obtain the following result:

Proposition 4. If S_X(f) = 0 for all |f| > f_B, then the sampling rate f_s* that minimizes D̃(f_s,R) is not bigger than 2f_B.

Proof: Note that SNR_{f_s,R}(f) is an increasing function of f_s in the interval 0 ≤ f_s ≤ R. Since we assume X(·) is band-limited, we have mmse_X(f_s) = 0 for f_s ≥ 2f_B. This implies that D̃(2f_B,R) ≤ D̃(f_s,R) for all f_s > 2f_B.

How far f_s* lies below 2f_B is determined by the derivative of mmse_X(f_s), which equals -2S_X(f_s/2). For example, in the case of the rectangular PSD

Π(f) = σ²/(2f_B) for |f| ≤ f_B, and 0 for |f| > f_B,   (20)

the derivative -2S_X(f_s/2) equals -σ²/f_B for f_s < 2f_B. The derivative of the second term in (18) is smaller than σ²/f_B for most choices of the system parameters¹. It follows that 0 is in the sub-gradient of D̃(f_s,R) at f_s = 2f_B, and thus f_s* = 2f_B, i.e., Nyquist rate sampling is optimal when the energy of the signal is uniformly distributed over its bandwidth. Two more input signal examples are given below.

Example 1 (triangular PSD). Consider an input signal with PSD

Λ(f) = (σ²/f_B) max{1 - |f|/f_B, 0}.   (21)

For any f_s ≤ 2f_B, we have

mmse_X(f_s) = σ² - (σ²/f_B)(f_s - f_s²/(4f_B)).

Since the derivative of mmse_X(f_s), which is -2Λ(f_s/2), changes continuously from 0 to -2σ²/f_B as f_s varies from 2f_B to 0, we have 0 < f_s* < 2f_B. The exact value of f_s* depends on R and on the ratio σ²/c_0, and it converges to 2f_B as either of these two increases.

¹This holds if 1 > (c_0/σ²)(2^{0.5R/f_B} - 1)^{-2}.
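Propositions 3 and 4 and Example 1 can be illustrated numerically. The sketch below is our own illustration, not part of the paper; it assumes σ² = f_B = 1, a bitrate of R = 8 bits per time unit, the Gaussian point-density constant c_0 of (7), and plain Riemann sums. It evaluates D̃(f_s,R) of (18)-(19) for the rectangular and triangular PSDs.

```python
import math

C0 = math.pi * math.sqrt(3.0) / 2.0   # c0 of (7) for a unit-variance Gaussian input

def rect_psd(f):
    # Pi(f) of (20) with sigma^2 = f_B = 1
    return 0.5 if abs(f) <= 1.0 else 0.0

def tri_psd(f):
    # Lambda(f) of (21) with sigma^2 = f_B = 1
    return max(1.0 - abs(f), 0.0)

def pcm_distortion(fs, R, psd, n=20000):
    # D~(fs, R) of (18): sampling error plus in-band quantization error,
    # with SNR(f) of (19) under the fixed-bitrate noise variance (17).
    gain = fs * (2.0 ** (R / fs) - 1.0) ** 2 / C0
    fgrid = [-fs / 2 + fs * (i + 0.5) / n for i in range(n)]
    df = fs / n
    sampling_err = 1.0 - df * sum(psd(f) for f in fgrid)   # (2), sigma^2 = 1
    quant_err = df * sum(psd(f) / (1.0 + gain * psd(f)) for f in fgrid)
    return sampling_err + quant_err
```

With these (illustrative) parameters, the rectangular PSD is minimized at f_s = 2f_B, while for the triangular PSD sampling at f_s = 1.9 < 2f_B already yields a smaller D̃ than Nyquist-rate sampling, in line with Proposition 4 and Example 1.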
Fig. 7: Optimal sampling rate f_s* and f_RD versus R for the process with PSD Λ(f) of (21).²

Example 2 (PSD of unbounded support). Consider an input signal PSD of the form

S_G(f) = (σ²/(√(2π) f_0)) e^{-(f/f_0)²/2},   f ∈ ℝ,   (22)

where f_0 > 0. For a process with the PSD (22) there exists a non-zero sampling error mmse_X(f_s) for any finite sampling rate f_s, and therefore the argument in Proposition 4 does not hold.

We can compare f_s* in each of the examples above to the minimal sampling rate f_RD that achieves the quadratic Gaussian distortion-rate function of a process with the same PSD, given by (4). In the case of the PSD (21), the relation between f_RD and R was derived in [3]:

R = (f_B/ln 2) [ln(1/(1 - f_RD/(2f_B))) - f_RD/(2f_B)].   (23)

We plot f_s* and the corresponding f_RD in Fig. 7 as a function of R. It can be seen that f_s* is smaller than f_RD, and that both approach 2f_B as R increases. In the case of the PSD (22), the relation between f_RD and R can be computed from (4). This is plotted together with f_s* versus R in Fig. 8. Note that since S_G(f) is not band-limited, f_RD is not bounded in R, since there is no sampling rate that guarantees perfect reconstruction for this signal.

It follows that f_s* cannot be larger than the Nyquist rate, as stated in Proposition 4, and that it is strictly smaller than Nyquist when the energy of X(·) is not uniformly distributed over its bandwidth, as in Example 1. In this case, some distortion due to sampling is preferred in order to increase the quantizer resolution. In other words, restricted to scalar quantization, the optimal rate-R code is achieved by sub-sampling. This behavior of D̃(f_s,R) is similar to that of the information theoretic bound D(f_s,R), as both provide an optimal sampling rate which balances sampling error and lossy compression error. On the other hand, oversampling introduces redundancy into the PCM representation, and yields a worse distortion-rate code than with f_s = f_s*. In this aspect the behavior of D̃(f_s,R) is different from that of D(f_s,R), since the latter does not penalize oversampling³.

The trade-off between sampling rate and quantization precision is particularly interesting in the case where the signal is not band-limited: although there is no sampling rate that guarantees perfect reconstruction, there is still a sampling rate that optimizes the aforementioned trade-off and minimizes the MMSE under a bitrate constraint.

The similarity between f_s* and f_RD as functions of R suggests that in order to implement a sub-Nyquist A/D converter that operates close to the minimal information theoretic sampling rate f_RD, the principle of trading quantization bits for sampling rate must be taken into account. The observation that

f_s* ≤ f_RD   (24)

in Examples 1 and 2 raises the conjecture as to whether (24) holds in general. This may be explained by the diminishing effect of reducing the sampling rate on the overall error. In other words, the fact that D̃(f_s*,R) ≥ D(R) implies that a distortion-rate achieving scheme is more sensitive to changes in the sampling rate than the sub-optimal implementation of A/D conversion via PCM. The dependency of f_s* on the spectral energy distribution S_X(f) has a time-domain explanation: for a fixed variance σ_X², two consecutive time samples taken at the Nyquist rate are more correlated (in their absolute value) when the PSD is not flat. Consequently, more redundancy is present after sampling than in the case where the PSD is flat. The main discovery of this paper is that part of this redundancy can be removed simply by sub-sampling, and this is in fact the optimal way to remove it when we are restricted to the PCM setting of Fig. 3.

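Since the right-hand side of (4) is increasing in f_RD, the critical rate can be found by bisection for any unimodal PSD. The following sketch is our own illustration: it assumes the triangular PSD of (21) with σ² = f_B = 1 and illustrative grid sizes, and it also cross-checks the closed form (23) at the point f_RD = f_B.

```python
import math

def tri_psd(f, f_B=1.0):
    # Lambda(f) of (21) with sigma^2 = 1
    return max(1.0 - abs(f) / f_B, 0.0) / f_B

def rate_needed(f_rd, n=4000):
    # Right-hand side of (4) for the triangular PSD, in bits per time unit;
    # the water level is theta = Lambda(f_rd / 2).
    theta = tri_psd(f_rd / 2.0)
    fgrid = [-f_rd / 2 + f_rd * (i + 0.5) / n for i in range(n)]
    df = f_rd / n
    return 0.5 * df * sum(math.log2(tri_psd(f) / theta)
                          for f in fgrid if tri_psd(f) > theta)

def critical_rate(R, iters=80):
    # Bisect on f_RD in (0, 2 f_B): the RHS of (4) grows with f_RD.
    lo, hi = 1e-9, 2.0 - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rate_needed(mid) < R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Consistent with Fig. 7, the computed f_RD grows with R toward the Nyquist rate 2f_B but never reaches it at any finite bitrate.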
Discussion

Under a fixed bitrate constraint, oversampling no longer reduces the MMSE, since increasing the sampling rate reduces the quantizer resolution and increases the magnitude of the quantization noise. As illustrated in Fig. 5, for any f_s below the Nyquist rate the bandwidth of both the signal and the noise occupies the entire digital frequency domain, whereas the magnitude of the noise decreases as more bits are used in quantizing each sample.

²The curves do not go further left since in our model we restrict the sampling rate to be smaller than the output bitrate R.
³This is because in the system model of Fig. 1, the encoder has the freedom to discard redundant information.

Fig. 8: Optimal sampling rate f_s* and f_RD versus R for the process with PSD (22).

IV. CONCLUSIONS

A/D conversion via pulse-code modulation under a fixed bitrate at the output introduces a trade-off between the sampling rate and the number of bits used to quantize each sample. The optimal sampling rate obtained as a result of this trade-off, which minimizes the MMSE, is lower than the Nyquist rate. That is, our analysis shows that in order to minimize the MMSE between the A/D input and output, some sampling distortion is preferred so as to increase the quantizer resolution.

ACKNOWLEDGMENT

This work was supported in part by the NSF Center for Science of Information (CSoI) under grant CCF-0939370, the BSF Transformative Science Grant 2010505 and the Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI).

REFERENCES

[1] T. Berger, Rate-Distortion Theory. Wiley Online Library, 1971.
[2] Y. C. Eldar and T. Michaeli, "Beyond bandlimited sampling," IEEE Signal Process. Mag., vol. 26, no. 3, pp. 48-68, 2009.
[3] A. Kipnis, A. J. Goldsmith, and Y. C. Eldar, "Sub-Nyquist sampling achieves optimal rate-distortion," in IEEE Information Theory Workshop (ITW), Apr 2015.
[4] B. Oliver, J. Pierce, and C. Shannon, "The philosophy of PCM," Proc. IRE, vol. 36, no. 11, pp. 1324-1331, Nov 1948.
[5] R. Gray, "Quantization noise spectra," IEEE Trans. Inf. Theory, vol. 36, no. 6, pp. 1220-1244, Nov 1990.
[6] N. Thao and M. Vetterli, "Reduction of the MSE in R-times oversampled A/D conversion from O(1/R) to O(1/R²)," IEEE Trans. Signal Process., vol. 42, no. 1, pp. 200-203, Jan 1994.
[7] J. de la Rosa, "Sigma-delta modulators: Tutorial overview, design guide, and state-of-the-art survey," IEEE Trans. Circuits Syst., vol. 58, no. 1, pp. 1-21, Jan 2011.
[8] N. Thao and M. Vetterli, "Lower bound on the mean squared error in multi-loop Sigma Delta modulation with periodic bandlimited signals," in The Twenty-Eighth Asilomar Conference on Signals, Systems and Computers, vol. 2, Oct 1994, pp. 1536-1540.
[9] Z. Cvetkovic and M. Vetterli, "On simple oversampled A/D conversion in L²(ℝ)," IEEE Trans. Inf. Theory, vol. 47, no. 1, pp. 146-154, Jan 2001.
[10] C. S. Güntürk, "One-bit sigma-delta quantization with exponential accuracy," Communications on Pure and Applied Mathematics, vol. 56, no. 11, pp. 1608-1630, 2003.
[11] G. Hattab and M. Ibnkahla, "Multiband spectrum access: Great promises for future cognitive radio networks," Proceedings of the IEEE, vol. 102, no. 3, pp. 282-306, March 2014.
[12] A. Kolmogorov, "On the Shannon theory of information transmission in the case of continuous signals," IRE Trans. Inform. Theory, vol. 2, no. 4, pp. 102-108, December 1956.
[13] A. Kipnis, A. J. Goldsmith, T. Weissman, and Y. C. Eldar, "Distortion-rate function of sub-Nyquist sampled Gaussian sources," 2014, submitted for publication. [Online]. Available: http://arxiv.org/abs/1405.5329
[14] A. Kipnis, A. J. Goldsmith, and Y. C. Eldar, "Sampling stationary processes under bitrate constraints," in preparation.
[15] H. Viswanathan and R. Zamir, "On the whiteness of high-resolution quantization errors," IEEE Trans. Inf. Theory, vol. 47, no. 5, pp. 2029-2038, Jul 2001.
[16] S. Lloyd, "Least squares quantization in PCM," IEEE Trans. Inf. Theory, vol. 28, no. 2, pp. 129-137, Mar 1982.
[17] R. M. Gray and D. L. Neuhoff, "Quantization," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2325-2383, 1998.
[18] B. Widrow, "A study of rough amplitude quantization by means of Nyquist sampling theory," IRE Trans. Circuit Theory, vol. 3, no. 4, pp. 266-276, Dec 1956.
[19] W. R. Bennett, "Spectra of quantized signals," Bell System Technical Journal, vol. 27, no. 3, pp. 446-472, 1948.
[20] N. Thao and M. Vetterli, "Lower bound on the mean-squared error in oversampled quantization of periodic signals using vector quantization analysis," IEEE Trans. Inf. Theory, vol. 42, no. 2, pp. 469-479, Mar 1996.
[21] A. Kipnis, A. J. Goldsmith, and Y. C. Eldar, "Optimal trade-off between sampling frequency and quantization precision in A/D conversion," in preparation.

APPENDIX A
PROOF OF PROPOSITION 1

Let X[·] and Z[·] be two jointly stationary processes and let

Y[n] = X[n] + Z[n],   n ∈ ℤ.

The MMSE under linear estimation of X[·] from Y[·] is given by

E_corr = ∫_{-1/2}^{1/2} [S_X(e^{2πiφ}) S_Z(e^{2πiφ}) - |S_XZ(e^{2πiφ})|²] / [S_X(e^{2πiφ}) + S_Z(e^{2πiφ}) + 2ℜS_XZ(e^{2πiφ})] dφ.   (25)

We will show that E_corr cannot exceed the MMSE when the correlation between X(·) and Z(·) is zero. Let

S_XZ(e^{2πiφ}) = ℜS_XZ(e^{2πiφ}) + iℑS_XZ(e^{2πiφ}) =: u + iv,

where u, v ∈ ℝ. The integrand in (25) can be written as

[S_X(e^{2πiφ}) S_Z(e^{2πiφ}) - u² - v²] / [S_X(e^{2πiφ}) + S_Z(e^{2πiφ}) + 2u].   (26)

Since S_X(e^{2πiφ}) S_Z(e^{2πiφ}) ≥ |S_XZ(e^{2πiφ})|², we have

u² + v² ≤ S_X(e^{2πiφ}) S_Z(e^{2πiφ}).   (27)

Note that (26) is positive and maximizing it is equivalent to maximizing (25). Since (26) is convex in u and v, it obtains its maximum over the boundary defined by (27). Specifically, the maximum of (26) in the domain (27) is obtained at u = v = 0. This implies that

E_corr ≤ ∫_{-1/2}^{1/2} [S_X(e^{2πiφ}) S_Z(e^{2πiφ})] / [S_X(e^{2πiφ}) + S_Z(e^{2πiφ})] dφ,   (28)

and the RHS of (28) is the expression for the MMSE under linear estimation when S_XZ(e^{2πiφ}) ≡ 0, i.e., when Z(·) is uncorrelated with X(·).

APPENDIX B
PROOF OF PROPOSITION 2

In this appendix we provide the proof of Proposition 2. For 0 ≤ Δ ≤ 1 define

X_Δ[n] ≜ X((n + Δ)T_s),   n ∈ ℤ,

where T_s ≜ f_s^{-1}. Also define X̂_Δ[n] to be the optimal MSE estimator of X_Δ[n] from Ŷ[·], that is

X̂_Δ[n] = E[X_Δ[n] | Ŷ[·]],   n ∈ ℤ.

The MSE in (10) can be written as

mmse_{X|Ŷ} = lim_{N→∞} (1/(2N+1)) ∫_{-N}^{N+1} E(X(t) - X̂(t))² dt
= lim_{N→∞} (1/(2N+1)) ∑_{n=-N}^{N} ∫_0^1 E(X((n+Δ)T_s) - X̂((n+Δ)T_s))² dΔ
= lim_{N→∞} (1/(2N+1)) ∑_{n=-N}^{N} ∫_0^1 E(X_Δ[n] - X̂_Δ[n])² dΔ
= ∫_0^1 E(X_Δ[n] - X̂_Δ[n])² dΔ.   (29)

Note that S_{X_Δ}(e^{2πiφ}) = S_Y(e^{2πiφ}), and X_Δ[·] and Ŷ[·] are jointly stationary with cross-PSD

S_{X_ΔŶ}(e^{2πiφ}) = S_{X_ΔY}(e^{2πiφ}) = f_s ∑_{k∈ℤ} S_{X_a}(f_s(k - φ)) e^{2πiΔ(k-φ)}.

Denote by S_{X_Δ|Ŷ}(e^{2πiφ}) the PSD of the estimator obtained by the discrete Wiener filter for estimating X_Δ[·] from Ŷ[·]. We have

S_{X_Δ|Ŷ}(e^{2πiφ}) = S_{X_ΔŶ}(e^{2πiφ}) S*_{X_ΔŶ}(e^{2πiφ}) / S_Ŷ(e^{2πiφ})
= f_s² ∑_{n,k} S_{X_a}(f_s(k - φ)) S_{X_a}(f_s(n - φ)) e^{2πiΔ(k-n)} / [S_Y(e^{2πiφ}) + S_η(e^{2πiφ})],   (30)

where S_{X_a}(f) = S_X(f)|H_a(f)|² is the PSD of the process at the output of the analog filter. The estimation error in Wiener filtering is given by

E(X_Δ[n] - X̂_Δ[n])² = ∫_{-1/2}^{1/2} S_{X_Δ}(e^{2πiφ}) dφ - ∫_{-1/2}^{1/2} S_{X_Δ|Ŷ}(e^{2πiφ}) dφ
= σ_X² - ∫_{-1/2}^{1/2} S_{X_Δ|Ŷ}(e^{2πiφ}) dφ.   (31)

Equations (29), (30) and (31) lead to

mmse_{X|Ŷ} = ∫_0^1 E(X_Δ[n] - X̂_Δ[n])² dΔ
= σ_X² - ∫_0^1 ∫_{-1/2}^{1/2} S_{X_Δ|Ŷ}(e^{2πiφ}) dφ dΔ
(a)= σ_X² - ∫_{-1/2}^{1/2} f_s² ∑_{k∈ℤ} S²_{X_a}(f_s(k - φ)) / [S_Y(e^{2πiφ}) + S_η(e^{2πiφ})] dφ,   (32)

where (a) follows from (30) and the orthogonality of the functions {e^{2πixk}, k ∈ ℤ} over 0 ≤ x ≤ 1. Equation (11) is obtained from (32) by changing the integration variable from φ to f = φ f_s.
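As a final numerical sanity check (ours, not part of the paper), the ideal low-pass choice (14) collapses the alias sums in (11), after the change of variables above, to the single k = 0 term, so (11) must agree with the split form (15)-(16). The sketch below verifies this for the triangular PSD of (21) with σ² = f_B = 1 and illustrative noise variances.

```python
def tri_psd(f):
    # Lambda(f) of (21) with sigma^2 = f_B = 1
    return max(1.0 - abs(f), 0.0)

def mmse_direct(fs, var_eta, n=20000):
    # (11) with the ideal low-pass filter (14): only the k = 0 alias survives.
    fgrid = [-fs / 2 + fs * (i + 0.5) / n for i in range(n)]
    df = fs / n
    return 1.0 - df * sum(tri_psd(f) ** 2 / (tri_psd(f) + var_eta / fs)
                          for f in fgrid)

def mmse_split(fs, var_eta, n=20000):
    # (15)-(16): out-of-band power mmse_X(fs) plus the in-band SNR-limited error.
    fgrid = [-fs / 2 + fs * (i + 0.5) / n for i in range(n)]
    df = fs / n
    mmse_x = 1.0 - df * sum(tri_psd(f) for f in fgrid)
    err = df * sum(tri_psd(f) / (1.0 + fs * tri_psd(f) / var_eta)
                   for f in fgrid)
    return mmse_x + err
```

The two evaluations agree to within floating-point accuracy for both sub- and super-Nyquist sampling rates, as the algebra behind (15) predicts.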