A Framework for Sample Efficient Interval Estimation with Control Variates

Shengjia Zhao, Christopher Yeh, Stefano Ermon
Stanford University

Abstract

We consider the problem of estimating confidence intervals for the mean of a random variable, where the goal is to produce the smallest possible interval for a given number of samples. While minimax optimal algorithms are known for this problem in the general case, improved performance is possible under additional assumptions. In particular, we design an estimation algorithm to take advantage of side information in the form of a control variate, leveraging order statistics. Under certain conditions on the quality of the control variates, we show improved asymptotic efficiency compared to existing estimation algorithms. Empirically, we demonstrate superior performance on several real world surveying and estimation tasks where we use the output of regression models as the control variates.

1 Introduction

Many real world problems require estimation of the mean of a random variable from unbiased samples. In high risk applications, the estimation algorithm should output a confidence interval, with the guarantee that the true mean belongs to the interval with e.g. 99% probability. The classic tools for this task are concentration inequalities, such as the Chernoff, Bernstein, or Chebychev inequalities (Hoeffding, 1962; Bernstein, 1924; Vershynin, 2010).

However, for many tasks, obtaining unbiased samples is expensive. For example, when estimating demographic quantities, such as income or political preference, drawing unbiased samples can require field surveys. To make the situation worse, often we seek more granular estimates, e.g. for each individual district or county, meaning that we need to draw a sufficient number of samples for every district or county. Another difficulty can arise when we need high accuracy (i.e. a small confidence interval), since confidence intervals produced by typical concentration inequalities have size O(1/√m), where m is the number of samples. In other words, to reduce the size of a confidence interval by a factor of 10, we need 100 times more samples.

This dilemma is generally unavoidable because concentration inequalities such as Chernoff or Chebychev are minimax optimal: there exist distributions for which these inequalities cannot be improved. No-free-lunch results (Van der Vaart, 2000) imply that any alternative estimation algorithm that performs better (i.e. outputs a confidence interval with smaller size) on some problems must perform worse on other problems. Nevertheless, we can identify a subset of problems where better estimation algorithms are possible.

We consider the class of problems where we have side information. We formalize side information as a random variable with known expectation whose value is close (with high probability) to the random variable whose mean we want to estimate. Following the terminology in the Monte Carlo simulation literature, we call this side information random variable a "control variate" (Lemieux, 2017). Instead of the original estimation task, we can estimate the expected difference between the original random variable and the control variate. The hope is that the distribution of this difference is concentrated around 0. Some estimation algorithms output very good (small sized) confidence intervals for distributions concentrated around 0 compared to classic methods such as Chernoff bounds.

Many practical problems have very good control variates. One important class of problems is when we have a predictor for the random variable, and we can use the prediction as the control variate.
For example, if we would like to estimate a neighborhood's average voting pattern, then we might have a prediction function for political preference from Google Street View images of that neighborhood (Gebru et al., 2017); if we would like to estimate the average asset wealth of households in a certain geographic region, we might have a regressor that predicts income from satellite images (Jean et al., 2016). These classifiers could be trained on past (e.g. previous year survey results) or similar datasets, or they could even be crafted by hand.

For these problems we propose concentration bounds based on order statistics. In particular, we draw a connection between the recently proposed concept of resilience (Steinhardt, 2018) and concentration bounds on order statistics. We show how to use these concentration inequalities to design estimation algorithms that output better confidence intervals when we have a good control variate (e.g. confidence intervals of size O(1/m) in the number of samples m).

Our proposed estimation algorithm always produces valid confidence intervals, i.e. the true mean belongs to the interval with a specified probability. The only risk is that when the control variate is poor, the confidence interval could be worse (larger) than classic baselines such as the Chernoff inequality. We empirically show superior performance of the proposed estimation algorithm on three real world tasks: bounding regression error, estimating average wealth with satellite images, and estimating the covariance between wealth and education level.

2 Problem Setup

Our objective is to estimate the mean of some random variable Y taking values in 𝒴 ⊆ R^d. Given i.i.d. samples y_1, ..., y_m ∼ Y and some choice of confidence level ζ ∈ (0, 1), an estimation algorithm outputs µ̂ ∈ R^d and a confidence interval size c ∈ R_+. The estimation algorithm must satisfy

Pr[ ‖µ̂ − E[Y]‖ > c ] ≤ ζ    (1)

where the probability Pr is with respect to the random samples y_1, ..., y_m and any additional randomness in the execution of the (randomized) algorithm, and ‖µ̂ − E[Y]‖ is any choice of semi-norm. We will first focus on one dimensional problems where Y ∈ R, and choose the norm ‖µ̂ − E[Y]‖ = |µ̂ − E[Y]|; we then extend several results to more general setups.

2.1 Baseline and Optimality

The classical approach is to estimate E[Y] with the empirical mean µ_mean = (1/m) Σ_{i=1}^m y_i, whose estimation error |µ_mean − E[Y]| can be controlled using concentration inequalities.

To obtain concentration inequalities, we need some assumptions on Y. For example, if there is some very small probability that Y = +∞, then E[Y] is unbounded, but µ_mean can be finite, which makes it impossible to bound |µ_mean − E[Y]|. Common assumptions include sub-Gaussian, bounded moments, sub-exponential, etc. (Vershynin, 2010). We will consider the sub-Gaussian and bounded moments assumptions in this paper.

Sub-Gaussian: If we assume ∀t ∈ R, E[e^{t(Y − E[Y])}] ≤ e^{σ²t²/2}, then the Chernoff–Hoeffding bound is the classic concentration inequality for confidence interval estimation (Hoeffding, 1962):

Pr[ |µ̂ − E[Y]| > √( (2σ²/m) log(2/ζ) ) ] ≤ ζ.

In particular, when Y is supported on [a, b] we have

Pr[ |µ̂ − E[Y]| > √( ((b − a)²/(2m)) log(2/ζ) ) ] ≤ ζ.

Bounded Moments: Another common assumption is that the k-th order moment of Y is bounded, i.e. E[|Y − E[Y]|^k] ≤ σ^k for some σ > 0. Under bounded moment assumptions, several concentration inequalities are known, such as the Chebychev inequality, the Kolmogorov inequality (Hájek and Rényi, 1955), and the Bernstein inequality (Bernstein, 1924). For example, when Y has a bounded second order moment, the Chebychev inequality states the following:

Pr[ |µ̂ − E[Y]| ≥ σ/√(ζm) ] ≤ ζ    (2)

It is known that these bounds are (asymptotically in m) minimax optimal: there exist random variables Y satisfying the respective assumptions of each inequality for which the inequality cannot be improved (Hoeffding, 1962). Therefore, to further improve these estimation algorithms, additional assumptions will be necessary.
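To make the two baselines concrete, the following is a minimal sketch in our own notation (the function names are ours, not taken from the paper's released code) of the intervals implied by the two inequalities above:

```python
import numpy as np

def hoeffding_interval(y, a, b, zeta):
    """Chernoff-Hoeffding interval for E[Y], assuming Y is supported on [a, b].

    Returns the empirical mean and the half-width c such that
    Pr[|mu_hat - E[Y]| > c] <= zeta.
    """
    y = np.asarray(y, dtype=float)
    m = len(y)
    c = np.sqrt((b - a) ** 2 / (2 * m) * np.log(2 / zeta))
    return y.mean(), c

def chebyshev_interval(y, sigma, zeta):
    """Chebychev interval for E[Y], assuming E[(Y - E[Y])^2] <= sigma^2."""
    y = np.asarray(y, dtype=float)
    m = len(y)
    c = sigma / np.sqrt(zeta * m)
    return y.mean(), c

# Example: 100 samples from a distribution supported on [0, 1].
rng = np.random.default_rng(0)
y = rng.beta(2.0, 5.0, size=100)
print(hoeffding_interval(y, 0.0, 1.0, zeta=0.01))
print(chebyshev_interval(y, sigma=0.5, zeta=0.01))
```

These are the baselines our method is compared against in Section 5.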

2.2 Control Variates

Suppose Ỹ is another random variable jointly distributed with Y, and that we know its mean E[Ỹ] (or have a very accurate estimate of it). We also have samples drawn from their joint distribution, (y_1, ỹ_1), ..., (y_m, ỹ_m) ∼ Y, Ỹ.

For Ỹ to be useful for our application, its value needs to be close to Y. In other words, Y − Ỹ should be a random variable that is concentrated around 0. The purpose of this variable Ỹ is similar to a "control variate" in the Monte Carlo community, but we use it here for the different task of interval estimation.

For example, in our household wealth estimation example, let Y denote the household-level wealth in a randomly sampled village, and let Ỹ be the predicted household-level wealth based on the satellite image of that village. If our predictor is accurate (i.e. Y ≈ Ỹ with high probability), then Ỹ could be an effective control variate for Y. In addition, we can also estimate E[Ỹ] very accurately because a very large number of satellite images are available at little cost (Wulder et al., 2012), so obtaining samples from Ỹ (without the corresponding Y) is inexpensive.

We would like to design an estimation algorithm that takes as input the samples (y_1, ỹ_1), ..., (y_m, ỹ_m) and the known value of E[Ỹ], and outputs a µ̂ along with an estimation error c ∈ R_+ such that Eq.(1) holds. The hope is that because of the additional information Ỹ we are no longer restricted by the minimax bounds on the confidence interval size c, and can output confidence intervals of smaller size.
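As an illustration of this setup (a toy sketch with made-up data, not an experiment from the paper), the control variate is the output of a predictor f, and E[Ỹ] is estimated to high precision from a large unlabeled pool:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical pretrained regressor (stands in for, e.g., a CNN on images).
    return 2.0 * x + 0.1

# E[Y_tilde] can be estimated almost exactly: unlabeled inputs are cheap.
x_pool = rng.normal(size=1_000_000)
mean_y_tilde = f(x_pool).mean()

# The labeled sample (y_i, y_tilde_i) is small and expensive.
x = rng.normal(size=100)
y = 2.0 * x + 0.1 + rng.normal(scale=0.05, size=100)  # true targets
y_tilde = f(x)                                        # control variate values

z = y - y_tilde  # Z = Y - Y_tilde is concentrated around 0 if f is accurate
print(mean_y_tilde, z.mean(), z.std())
```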

3 Control Variate Interval Estimation

3.1 Estimation by Order Statistics

For convenience we denote Z = Y − Ỹ and z_i = y_i − ỹ_i. Given the samples z_1, ..., z_m we can compute their order statistics, which are the m samples in sorted order, Z_(1) ≤ ··· ≤ Z_(m). We first study the one dimensional real valued case (i.e. Z ∈ 𝒵 = R).

Our control variate estimation algorithm consists of three steps: given input samples z_1, ..., z_m, E[Ỹ] ∈ R, and desired confidence ζ ∈ (0, 1),

1) Choose some value of k ∈ {0, ..., m − 1}. Set

µ̂ = E[Ỹ] + (Z_(m−k) + Z_(1+k))/2    (3)

2) Let r be the smallest value such that the following concentration inequalities are true:

Pr[ Z_(m−k) < E[Z] − r ] ≤ ζ/2
Pr[ Z_(1+k) > E[Z] + r ] ≤ ζ/2    (4)

Algorithm 1 computes such a value of r.

3) Output µ̂ as the estimate of E[Y] and r + (Z_(m−k) − Z_(1+k))/2 as the confidence interval size.

The following proposition guarantees the correctness of the control variate estimation algorithm.

Proposition 1. Let µ̂ be defined as in Eq.(3). If Eq.(4) is satisfied for some ζ ∈ (0, 1) and r > 0, then

Pr[ |µ̂ − E[Y]| > r + (Z_(m−k) − Z_(1+k))/2 ] ≤ ζ.    (5)

Proof of Proposition 1. See Appendix.

Note that in Eq.(5), the confidence interval size consists of two parts: (Z_(m−k) − Z_(1+k))/2 and r. We will show that r is much smaller (in fact, has asymptotically better rates) compared to the baselines previously discussed in Section 2.1. If we have a good control variate Ỹ (close to Y), then (Z_(m−k) − Z_(1+k))/2 will be small. On the other hand, if the control variate Ỹ is poor, this algorithm could produce worse (larger) confidence intervals. This trade-off is unavoidable because of "no-free-lunch results" (Van der Vaart, 2000).

3.2 Order Statistics Concentration Inequalities

We will now show several bounds on the order statistics in Eq.(4). There are several known bounds (De Haan and Ferreira, 2007; Castillo, 2012) that can be applied to satisfy Eq.(4). However, these bounds are distribution dependent: we must assume that Z either is distributed as some known distribution (e.g. Gaussian), or asymptotically converges to some fixed distribution (e.g. Gumbel, Weibull or Frechet using the Fisher–Tippett–Gnedenko theorem) when m is very large (Fisher and Tippett, 1928).

Our contribution is to associate bounds on the order statistics with the notion of resilience used in the robust statistics literature. Many distribution independent conditions for resilience are known, which imply distribution independent bounds on the order statistics. We first reproduce the definition of resilience from Steinhardt (2018) with a small modification.

Definition 1. Let s : [0, 1] → R_+ be any function. We say a random variable Z ∈ 𝒵 is s-resilient from above if for any ε ∈ [0, 1] and measurable B ⊆ 𝒵 such that Pr[Z ∈ B] ≥ 1 − ε, we have

E[Z | Z ∈ B] − E[Z] ≤ s(ε).

It is s-resilient from below if

E[Z] − E[Z | Z ∈ B] ≤ s(ε).

We say that Z is s-resilient if it is both s-resilient from above and s-resilient from below.

When a random variable is s-resilient, we are essentially bounding the probability that Z takes a value much larger (or smaller) than E[Z]. To see this, suppose for some ε ∈ (0, 1) we choose B = {Z : Z > E[Z] + s(ε)}; then E[Z | Z ∈ B] − E[Z] > s(ε), so by our requirement of resilience, it must be that Pr[Z ∈ B] < 1 − ε. Similar to other types of assumptions such as sub-Gaussian or bounded moments, resilience is also an assumption on how much a random variable can differ from its expectation. What is special about resilience is that it is particularly convenient for showing bounds on the order statistics.

Without loss of generality, we also assume that s(ε) is monotonically non-decreasing. If a random variable is ŝ-resilient, but ŝ is not monotonically non-decreasing, we can define s(ε) = inf_{ε ≤ ε′ ≤ 1} ŝ(ε′), which is monotonically non-decreasing. It is easy to show that Z is s-resilient if and only if it is ŝ-resilient.

Many assumptions on the random variable Z can be converted into assumptions on resilience. For example, in the following Lemma 1, (1) is trivial to show, (2) is proved in (Steinhardt, 2018), and (3) we prove in the appendix.

Lemma 1. The following random variables are resilient:

1. Bounded: If 𝒵 ⊆ [a, b], then Z is (b − E[Z])-resilient from above and (E[Z] − a)-resilient from below. It is (b − a)-resilient.

2. Bounded Moments: If E[|Z − E[Z]|^l] ≤ σ^l for some l > 1, then Z is s-resilient for s(ε) = σ/(1 − ε)^{1/l}.

3. Sub-Gaussian: If Z − E[Z] is σ² sub-Gaussian, then Z is s-resilient for s(ε) = 2σ√(log(1/(1 − ε))) + √(2π)σ.
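The three cases of Lemma 1 translate directly into resilience functions s(ε) that can be passed to Algorithm 1 below. A minimal sketch follows; the closed forms for cases 2 and 3 are our reconstruction of garbled formulas, so treat them as assumptions to be checked against the paper's appendix:

```python
import numpy as np

def s_bounded(a, b):
    # Lemma 1, case 1: Z supported on [a, b] is (b - a)-resilient.
    return lambda eps: b - a

def s_moment(sigma, l):
    # Lemma 1, case 2 (reconstructed): E|Z - E[Z]|^l <= sigma^l gives
    # s(eps) = sigma / (1 - eps)^(1/l).
    return lambda eps: sigma / (1.0 - eps) ** (1.0 / l)

def s_subgaussian(sigma):
    # Lemma 1, case 3 (reconstructed): sigma^2 sub-Gaussian gives
    # s(eps) = 2*sigma*sqrt(log(1/(1 - eps))) + sqrt(2*pi)*sigma.
    return lambda eps: (2.0 * sigma * np.sqrt(np.log(1.0 / (1.0 - eps)))
                        + np.sqrt(2.0 * np.pi) * sigma)
```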

Given the definition and examples of resilience, we can now use the following theorem to provide bounds on the order statistics when resilience holds.

Algorithm 1 Order Statistics Confidence Interval
Input: sample size m ∈ N_+; order k < m/2; confidence ζ ∈ (0, 1); resilience s : (0, 1) → R_+; and number of iterations T ∈ N
  Set v_0 = (ζ/(m + 1)^k)^{1/(m−k)}
  for i = 1, ..., T do
    Set v_i = (ζ/(m(1 − v_{i−1}) + 1)^k)^{1/(m−k)}
  end for
  Set r = s(v_T)(v_T^{−1} − 1)
  Return v_T and r

Theorem 1. Let Z_(1), ..., Z_(m) denote the order statistics of m independent samples of a random variable Z. If Z is s-resilient, then ∀ζ ∈ (0, 1), T ∈ N and 0 ≤ k < m/2, letting r be the output of Algorithm 1, we have

Pr[ Z_(m−k) ≤ E[Z] − r ] ≤ ζ    (6)
Pr[ Z_(1+k) ≥ E[Z] + r ] ≤ ζ    (7)

If Z is s-resilient only from above/below, then only Eq.(6)/Eq.(7) holds.

Proof of Theorem 1. See Appendix.

Note that the algorithm has an additional hyperparameter T, the number of iterations. We can choose any value of T ∈ N, but choosing a larger T always leads to a better confidence interval compared to choosing a smaller T. We observe in the experiments that choosing any T ≥ 10 is optimal (up to floating point errors). Note that the choice of T does not affect the asymptotic performance in Corollary 1.

Theorem 1 involves expressions with good constants. This obscures the asymptotic performance of the bound, which we reveal in the following corollary.

Corollary 1. If Z is s-resilient with s(ε) = b_1(1 − ε)^a + b_2 for any constants a ∈ [−1, 0] and b_1, b_2 ∈ R_+, then there exists λ > 0 such that for sufficiently large m

Pr[ E[Z] ≤ Z_(1+k) − λ (log(1/ζ) + k log m)/m^{1+a} ] ≤ ζ
Pr[ E[Z] ≥ Z_(m−k) + λ (log(1/ζ) + k log m)/m^{1+a} ] ≤ ζ.

Proof of Corollary 1. See Appendix.

Based on the different assumptions on resilience in Lemma 1, we can further simplify Corollary 1 and obtain more concrete bounds in Section 3.4.

Finally, in Lemma 1, a random variable Z bounded in [a, b] is s-resilient, but s depends on E[Z], which is unknown. Therefore, we cannot evaluate s in Algorithm 1. One option is to use the weaker conclusion that Z is (b − a)-resilient and obtain a bound with worse constants (we choose this option in our asymptotic rate analysis in Section 3.4). In practice it is possible to compute a bound with better constants: we only compute v_T in Algorithm 1, which does not depend on s, and use the following improved bound.

Corollary 2. Let Z_(1), ..., Z_(m) denote the order statistics of m independent samples of a random variable Z. If Z is bounded in [a, b], then ∀ζ ∈ (0, 1), T ∈ N and 0 ≤ k < m/2, letting v_T be computed as in Algorithm 1, we have

Pr[ E[Z] ≤ a + v_T(Z_(1+k) − a) ] ≤ ζ
Pr[ E[Z] ≥ b − v_T(b − Z_(m−k)) ] ≤ ζ.

Proof of Corollary 2. See Appendix.

Instead of the standard estimation procedure in Section 2.2, in the experiments we directly output

( a + v_T(Z_(1+k) − a),  b − v_T(b − Z_(m−k)) )

as our confidence interval for E[Z] (which holds with probability 1 − 2ζ).
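Putting the pieces together, here is a sketch of Algorithm 1 and the interval of Proposition 1. The fixed-point update for v_i is our reconstruction of the pseudocode above (the released code at https://github.com/ermongroup/ControlVariateBound is authoritative); we call Algorithm 1 with ζ/2 so that each tail in Eq.(4) holds with probability ζ/2:

```python
import numpy as np

def algorithm1(m, k, zeta, s, T=10):
    """Order Statistics Confidence Interval (Algorithm 1, sketch).

    Iterates v to a fixed point, then sets r = s(v_T) * (1/v_T - 1).
    """
    v = (zeta / (m + 1) ** k) ** (1.0 / (m - k))
    for _ in range(T):
        v = (zeta / (m * (1.0 - v) + 1) ** k) ** (1.0 / (m - k))
    r = s(v) * (1.0 / v - 1.0)
    return v, r

def control_variate_interval(y, y_tilde, mean_y_tilde, k, zeta, s, T=10):
    """Point estimate (Eq. 3) and interval half-width (Proposition 1)."""
    z = np.sort(np.asarray(y, dtype=float) - np.asarray(y_tilde, dtype=float))
    m = len(z)
    z_lo, z_hi = z[k], z[m - 1 - k]              # Z_(1+k) and Z_(m-k)
    mu_hat = mean_y_tilde + (z_hi + z_lo) / 2.0  # Eq.(3)
    _, r = algorithm1(m, k, zeta / 2.0, s, T)    # zeta/2 per tail, Eq.(4)
    return mu_hat, r + (z_hi - z_lo) / 2.0       # Proposition 1, Eq.(5)
```

For example, with the bounded-support case of Lemma 1 one would pass s = s_bounded(a, b) from the sketch above.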

3.3 Multi-Dimensional Extension

We now extend the above results to multi-dimensional estimation problems. We first provide a multi-dimensional definition of resilience that extends Definition 1.

Definition 2. Let Z be a random variable on 𝒵 ⊆ R^d, and let ‖·‖, ‖·‖_* be a pair of dual norms on R^d. We say Z is s-resilient if ∀v ∈ R^d such that ‖v‖_* = 1, for any ε ∈ [0, 1] and all measurable B ⊆ 𝒵 such that Pr[B] ≥ 1 − ε, we have

E[⟨Z, v⟩ | B] − E[⟨Z, v⟩] ≤ s(ε).

As before, resilience in multiple dimensions also implies concentration bounds on the order statistics. The following theorem is the analog of Theorem 1.

Theorem 2. Let Z_(1), ..., Z_(m) be independent samples of Z ordered such that ‖Z_(1)‖ ≤ ··· ≤ ‖Z_(m)‖. If Z is s-resilient, then for any r output by Algorithm 1 and for any ζ ∈ (0, 1), we have

Pr[ ‖Z_(m−k)‖ ≤ ‖E[Z]‖ − r ] ≤ ζ.

Proof of Theorem 2. See Appendix.
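The ordering used in Theorem 2 sorts the samples by norm rather than by value; a minimal sketch of that ordering (our own helper, not from the paper's code):

```python
import numpy as np

def order_by_norm(z_samples, p=2):
    """Sort d-dimensional samples Z_i by ||Z_i|| (the ordering in Theorem 2).

    Returns the samples sorted by norm and the sorted norms themselves.
    """
    z = np.asarray(z_samples, dtype=float)
    norms = np.linalg.norm(z, ord=p, axis=1)
    idx = np.argsort(norms)
    return z[idx], norms[idx]
```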

3.4 Rate Comparison

Table 1 summarizes the asymptotic performance of our method compared to the baselines. Even though we consider the low sample setup (i.e. m is small), these asymptotic rates still provide insight into the trade-off between different methods.

In particular, as shown in Proposition 1, our method has two terms that determine the confidence interval, r and (Z_(m−k) − Z_(1+k))/2:

Pr[ |µ̂ − E[Y]| > r + (Z_(m−k) − Z_(1+k))/2 ] ≤ ζ.

The latter term we will denote as B; it is determined by the quality of the control variate. For baseline methods there is only a single term c that determines the size of the confidence interval:

Pr[ |µ̂ − E[Y]| > c ] ≤ ζ.

In Table 1, we show that under each class of assumptions, r in our proposed algorithm always has a better rate compared to c (i.e. it is smaller when m, ζ are sufficiently large). Whether the improvement can justify the additional term B determines whether our algorithm performs well in practice.

4 Related Work

Extreme value theory (De Haan and Ferreira, 2007; Castillo, 2012) studies the probability of rare events or large deviations. Most results are asymptotic (De Haan and Ferreira, 2007) and are not applicable to our setup, which assumes a small sample size. Several non-asymptotic results are also used in our proofs, such as Eq.(8).

The notion of resilience (Steinhardt et al., 2017; Steinhardt, 2018) is most commonly used in analyzing robust estimation. Our paper draws the connection between resilience and order statistics concentration bounds. We hope this connection can be further exploited in future work to transfer results between the two fields of research.

A line of research related to ours is semi-supervised transfer learning and domain adaptation (Daumé III et al., 2010; Donahue et al., 2013; Kumar et al., 2010; Ding et al., 2018; Lopez-paz et al., 2012; Saito et al., 2019). In both setups, we have a pretrained classifier or regressor; in the target domain, there is a small amount of labeled data. The difference is in the objective: domain adaptation uses the labeled data to fine-tune the classifier or regressor, while our objective is confidence interval estimation on the target domain. The different objectives lead to different sets of tools and desiderata.

5 Experiments

5.1 Certifying Regression Performance

Our first task is to upper and lower bound the difference between the output of a regression function Ỹ and the true target attribute Y. For this task, our goal is to bound the expected error Z = Y − Ỹ of the regression function. We are not directly interested in the expected value of the target attribute E[Y]; instead we only want to show bounds LB ≤ E[Z] ≤ UB. In addition, instead of a single global accuracy, we might care about accuracy for sub-groups in the data (e.g., based on some feature or sensitive attribute). In other words, letting U be some random variable taking a finite set of values, we want to know the regression error E[Z | U = u] for each value u.

Conditions on Z                            | Bounded in [a, b]                 | Finite E[Z²]
-------------------------------------------|-----------------------------------|--------------------------------
Mean (1/m) Σ_i Z_i                         | Θ(√((1/m) log(1/ζ))) (Chernoff)   | Θ(1/√(mζ)) (Chebyshev)
Maximum (Minimum) Z_(m), Z_(1)             | B + O((1/m) log(1/ζ))             | B + O((1/√m) log(1/ζ))
k-th largest (smallest) Z_(m−k), Z_(1+k)   | B + O((log(1/ζ) + k log m)/m)     | B + O((log(1/ζ) + k log m)/√m)

Table 1: Summary of the asymptotic size of the confidence interval for estimation algorithms using different concentration inequalities. B is a value that corresponds to the bias of our method, due to the (Z_(m−k) − Z_(1+k))/2 term in Proposition 1.

Figure 1: Confidence intervals (both upper and lower bounds) on E[Z] from different estimation algorithms. For our algorithm, we try different values of k (the k-th largest / smallest order statistic). For the MPG dataset, we also evaluate the regression error conditioned on country of origin (top). For the housing dataset, we evaluate the regression error conditioned on crime rate level (bottom). In most of the experiments, our estimation algorithm performs better (outputs smaller confidence intervals). Chebychev sometimes performs better, especially with large sample size. Hoeffding always performs worse in this setup.

Figure 2: Size of the confidence interval as a function of the error of the control variate. We scale the error up (α > 1.0) or down (α < 1.0), i.e. Z_new = αZ_old for α ∈ [0, 2]. When the control variate has smaller error, the confidence interval is significantly better compared to Hoeffding or Chebychev.

This can be important for fairness or for identifying particular failure cases. If the regression function is accurate, Z should be concentrated around 0, making it feasible to obtain better bounds with order statistics. We compare the bounds of Section 3.2 with the baseline bounds of Section 2.1. Code is available at https://github.com/ermongroup/ControlVariateBound.

Datasets: We use two classic regression datasets from the UCI repository (Asuncion and Newman, 2007): Auto MPG, where the task is to predict the miles per gallon (MPG) of a vehicle based on 10 features; and Boston housing price prediction, where the task is to predict the housing price from 13 features.

Assumptions: As explained in Section 2.1, all estimation algorithms require some assumptions. Here, we either assume bounded support or bounded variance. The optimal choice of these assumptions usually relies on domain knowledge about the problem and is beyond the scope of this paper. Here, we simply assume that the error is bounded by ±b/2, where b is the maximum MPG in the entire dataset. The reason is that any regression algorithm can trivially output b/2 and achieve this error. In the bounded variance case, we first compute an upper bound on E[Z²] that holds with probability 1 − ζ/2 by the Hoeffding inequality, and use it as an upper bound on the variance.

Results: The results are shown in Figure 1. Our order statistics bound works better in general if the number of test samples m is small. Here both datasets contain approximately 100 test samples, and our bound performs on par with Chebychev and better than Hoeffding. Our bound also performs better when the control variate is more accurate (i.e. Z is concentrated around zero). This is empirically verified in Figure 2.

Our bound also depends on the choice of k. In general, with more data choosing a larger value of k is preferable, and vice versa.
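The variance bound in the Assumptions paragraph is itself estimated from data. The following sketch (our own helper, under the stated assumption that Z is bounded) computes a high-probability upper bound on E[Z²] via a one-sided Hoeffding bound on Z², which can then be fed into a Chebychev-style interval:

```python
import numpy as np

def variance_upper_bound(z, z_max, zeta):
    """Upper bound on E[Z^2] holding with probability >= 1 - zeta.

    Z^2 lies in [0, z_max^2], so a one-sided Hoeffding deviation added to
    the empirical second moment upper-bounds E[Z^2], and hence Var(Z).
    """
    z = np.asarray(z, dtype=float)
    m = len(z)
    dev = np.sqrt(z_max ** 4 / (2.0 * m) * np.log(1.0 / zeta))
    return np.mean(z ** 2) + dev
```

Calling this with zeta/2 and taking the remaining ζ/2 budget for the interval itself matches the 1 − ζ/2 construction described above.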
5.2 Poverty Estimation Task

We apply our estimation algorithm to a real world task, where we estimate the average household-level asset wealth across provinces of 23 sub-Saharan African countries. We used DHS surveys collected between 2009–16 and constructed an average household asset wealth index for 19,669 villages following the procedure described in Jean et al. (2016).

Setup: We emulate the setup where we have survey results from several countries, and train a regressor (a convolutional neural network) to predict asset wealth from satellite images (multispectral Landsat and nighttime VIIRS). To estimate the average asset wealth for a new country, we apply this regressor to satellite images from that country; we use the output of the regressor as the control variate. More specifically, we randomly pick 80% of the countries to train the regressor, and test the performance of our estimation algorithm on the remaining countries. We also use cross validation to more accurately evaluate our performance.

Assumptions: As in Section 5.1, for estimation with bounded random variables, we upper- and lower-bound household wealth by the maximum and minimum wealth across the entire dataset. For estimation with bounded moment random variables, we use the empirical standard deviation estimated across the entire country, multiplied by an additional margin of 1.5×. Because of the small sample size, bounding the standard deviation via a Chernoff bound as in Section 5.1 is infeasible.

Results: The results are shown in Figure 3. Although our regression model is trained on all 23 countries in our dataset, we only test our method on the 36 provinces across 13 countries from which we have at least 90 labeled survey examples. Compared to the Chernoff bound or Chebychev, our estimation algorithm performs better when the sample size is small, and worse when the sample size is large. Because of the difficulty of predicting wealth from satellite images, the control variate is not very accurate, and further improvements are possible with improved prediction accuracy.

Figure 3: Histograms of 99%-confidence interval sizes for the average household-level asset wealth within a province, for 36 provinces across 13 countries in DHS surveys of sub-Saharan Africa. 1000 random subsets of size m are sampled from each province. We assume that household-level asset wealth is a random variable either with finite variance (left) or bounded in [0, 1] (right). The histograms show interval sizes pooled over all 36 test provinces. For small sample sizes per province (i.e. <20 samples per province) our method achieves smaller confidence intervals. With more samples, the bias of our estimation algorithm dominates and our method performs worse.

5.3 Covariance Estimation

The DHS surveys also include other demographic variables besides household asset wealth. Policy makers may be interested in the covariance of these demographic quantities, such as between maternal education level and household asset wealth.

More formally, let W be the random variable that represents the average level of maternal education in a village, and let U represent the average household asset wealth in the same village. The random variable we actually want to estimate is Y = UW − U·E[W], because

E[Y] = E[UW] − E[U]E[W] = Cov(U, W).

We assume that W is a quantity that is easy to survey, or has available data, so E[W] is known. This is commonly the case when certain demographic variables are more widely surveyed than others. As before, we can train a regressor to predict U from satellite images, and we denote its output as Ũ. Our control variate is then Ỹ = ŨW − Ũ·E[W]. By using these new definitions for Y and Ỹ we can apply our estimation algorithm.

Setup: The setup is identical to Section 5.2, where we train the asset wealth regression model on 80% of the countries and test our estimation algorithm on the remaining countries. However, we only have maternal education survey data for 9 countries, so we only estimate confidence intervals of the covariance in these countries. Maternal education level is measured for each household on an integer scale from 0 to 3, then averaged within each village. All other assumptions are identical to before.

Results: The results are shown in Figure 4. As expected, our estimation algorithm achieves superior performance compared to baseline estimators when the sample size m is small.

Figure 4: Histograms of confidence interval sizes at ζ = 0.01 for the covariance between maternal education and household asset wealth, for 9 countries in DHS surveys of sub-Saharan Africa. 1000 random subsets of size m are sampled from each province. The confidence interval derived from order statistics outperforms both the Hoeffding interval and the Chebychev interval.
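The reduction in this section is mechanical: build Y and Ỹ from (U, W, Ũ) and reuse the scalar machinery. A short sketch (variable names are ours; control_variate_interval is the hypothetical helper sketched after Corollary 2):

```python
import numpy as np

def covariance_pairs(u, w, u_tilde, mean_w):
    """Construct Y = U*W - U*E[W] and its control variate
    Y_tilde = U_tilde*W - U_tilde*E[W], so that E[Y] = Cov(U, W)."""
    u, w, u_tilde = (np.asarray(v, dtype=float) for v in (u, w, u_tilde))
    y = u * w - u * mean_w
    y_tilde = u_tilde * w - u_tilde * mean_w
    return y, y_tilde

# E[Y_tilde] would then be estimated from the large pool of satellite images
# (where U_tilde and W are available), and (y, y_tilde) passed to the scalar
# interval estimator exactly as before.
```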

6 Conclusion

In this paper we propose a framework for estimating confidence intervals given a control variate random variable as side information. We show that under certain conditions on the control variate, the estimation algorithm out-performs classic minimax optimal estimation algorithms both asymptotically and empirically.

A major weakness of the method is diminished performance when we have a large number of samples. Because of no-free-lunch results, trade-offs are unavoidable, but it is an interesting direction of future research to find either better trade-offs or to prove their impossibility.

7 Acknowledgments

This research was supported by TRI, Intel, NSF (#1651565, #1522054, #1733686), and ONR. The survey datasets are generously provided by Marshall Burke at Stanford University.

References

Arthur Asuncion and David Newman. UCI machine learning repository, 2007.

Sergei Bernstein. On a modification of Chebyshev's inequality and of the error formula of Laplace. Ann. Sci. Inst. Sav. Ukraine, Sect. Math, 1(4):38–49, 1924.

Enrique Castillo. Extreme value theory in engineering. Elsevier, 2012.

Hal Daumé III, Abhishek Kumar, and Avishek Saha. Frustratingly easy semi-supervised domain adaptation. In Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing, pages 53–59. Association for Computational Linguistics, 2010.

Laurens De Haan and Ana Ferreira. Extreme value theory: an introduction. Springer Science & Business Media, 2007.

Zhengming Ding, Nasser M. Nasrabadi, and Yun Fu. Semi-supervised deep domain adaptation via coupled neural networks. IEEE Transactions on Image Processing, 27(11):5214–5224, 2018.

Jeff Donahue, Judy Hoffman, Erik Rodner, Kate Saenko, and Trevor Darrell. Semi-supervised domain adaptation with instance constraints. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 668–675, 2013.

Ronald Aylmer Fisher and Leonard Henry Caleb Tippett. Limiting forms of the frequency distribution of the largest or smallest member of a sample. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 24, pages 180–190. Cambridge University Press, 1928.

Timnit Gebru, Jonathan Krause, Yilun Wang, Duyun Chen, Jia Deng, Erez Lieberman Aiden, and Li Fei-Fei. Using deep learning and Google Street View to estimate the demographic makeup of neighborhoods across the United States. Proceedings of the National Academy of Sciences, 114(50):13108–13113, 2017.

J. Hájek and A. Rényi. Generalization of an inequality of Kolmogorov. Acta Mathematica Hungarica, 6(3-4):281–283, 1955.

Wassily Hoeffding. Probability inequalities for sums of bounded random variables. 1962.

Neal Jean, Marshall Burke, Michael Xie, W. Matthew Davis, David B. Lobell, and Stefano Ermon. Combining satellite imagery and machine learning to predict poverty. Science, 353(6301):790–794, 2016.

Abhishek Kumar, Avishek Saha, and Hal Daumé. Co-regularization based semi-supervised domain adaptation. In Advances in Neural Information Processing Systems, pages 478–486, 2010.

Christiane Lemieux. Control Variates, pages 1–8. American Cancer Society, 2017. ISBN 9781118445112.

David Lopez-paz, Jose M. Hernández-lobato, and Bernhard Schölkopf. Semi-supervised domain adaptation with non-parametric copulas. In Advances in Neural Information Processing Systems 25, pages 665–673, December 2012.

Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, and Kate Saenko. Semi-supervised domain adaptation via minimax entropy. In 2019 IEEE International Conference on Computer Vision (ICCV), 2019.

Jacob Steinhardt. Robust Learning: Information Theory and Algorithms. PhD thesis, Stanford University, 2018.

Jacob Steinhardt, Moses Charikar, and Gregory Valiant. Resilience: A criterion for learning in the presence of arbitrary outliers. arXiv preprint arXiv:1703.04940, 2017.

Aad W. Van der Vaart. Asymptotic Statistics, volume 3. Cambridge University Press, 2000.

Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010.

Michael A. Wulder, Jeffrey G. Masek, Warren B. Cohen, Thomas R. Loveland, and Curtis E. Woodcock. Opening the archive: How free data has enabled the science and monitoring promise of Landsat. Remote Sensing of Environment, 122:2–10, 2012. ISSN 0034-4257. Landsat Legacy Special Issue.