Estimating accuracy of the MCMC variance estimator: a central limit theorem for batch means estimators

Saptarshi Chakraborty*, Suman K. Bhattacharya† and Kshitij Khare†

*Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, 485 Lexington Ave, New York, NY 10017, USA. e-mail: [email protected]

†Department of Statistics, University of Florida, 101 Griffin Floyd Hall, Gainesville, Florida 32601, USA. e-mail: [email protected]; e-mail: [email protected]

Abstract: The batch means estimator of the MCMC variance is a simple and effective measure of accuracy for MCMC-based ergodic averages. Under various regularity conditions, the estimator has been shown to be consistent for the true variance. However, the estimator can be unstable in practice, as it depends directly on the raw MCMC output. A measure of accuracy of the batch means estimator itself, ideally in the form of a confidence interval, is therefore desirable. The asymptotic variance of the batch means estimator is known; however, without any knowledge of the asymptotic distribution, asymptotic variances are in general insufficient to describe variability. In this article we prove a central limit theorem for the batch means estimator that allows for the construction of asymptotically accurate confidence intervals for the batch means estimator. Additionally, our results provide a Markov chain analogue of the classical CLT for the sample variance parameter for i.i.d. observations. Our result assumes standard regularity conditions similar to the ones assumed in the literature for proving consistency. Simulated and real data examples are included as illustrations and applications of the CLT.

arXiv:1911.00915v1 [stat.CO] 3 Nov 2019

MSC 2010 subject classifications: Primary 60J22; secondary 62F15.
Keywords and phrases: MCMC variance, batch means estimator, asymptotic normality.

1. Introduction

Markov chain Monte Carlo (MCMC) techniques are indispensable tools of modern-day computation. Routinely used in Bayesian analysis and machine learning, MCMC finds a major application in the approximation of intractable and often high-dimensional integrals. To elaborate, let $(\mathcal{X}, \mathcal{F}, \nu)$ be an arbitrary measure space and let $\Pi$ be a probability measure on $\mathcal{X}$, with associated density $\pi(\cdot)$ with respect to $\nu$. The quantity of interest is the integral
\[
\pi f = E_\pi f := \int_{\mathcal{X}} f(x) \, d\Pi(x) = \int_{\mathcal{X}} f(x) \, \pi(x) \, \nu(dx),
\]
where $f$ is a real-valued, $\Pi$-integrable function on $\mathcal{X}$. In many modern applications, such an integral is often intractable, i.e., (a) it does not have a closed form, (b) deterministic approximations are inefficient, often due to the high dimensionality of $\mathcal{X}$, and (c) it cannot be estimated via classical or i.i.d. Monte Carlo techniques, as i.i.d. random generation from $\Pi$ is in general infeasible. Markov chain Monte Carlo (MCMC) techniques are the go-to method of approximation for such integrals. Here, a Markov chain $(X_n)_{n \geq 1}$ with an invariant probability distribution $\Pi$ [see, e.g., 22, for definitions] is generated using some MCMC sampling technique, such as the Gibbs sampler or the Metropolis-Hastings algorithm. Then, ergodic averages $\bar{f}_n := n^{-1} \sum_{i=1}^{n} f(X_i)$ based on realizations of the Markov chain $(X_n)_{n \geq 1}$ are used as approximations of $E_\pi f$. Measuring the errors incurred in such approximations is a critical step in any numerical analysis.
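For concreteness, the following minimal Python sketch approximates $E_\pi f$ via an ergodic average from a random-walk Metropolis-Hastings chain. The standard normal target, the proposal scale, and the choice $f(x) = x^2$ are illustrative assumptions, not part of the setup above.

```python
import numpy as np

def rw_metropolis(log_pi, n, x0=0.0, scale=1.0, rng=None):
    """Random-walk Metropolis-Hastings chain targeting the (unnormalized)
    log-density log_pi; returns the realized chain X_1, ..., X_n."""
    rng = np.random.default_rng() if rng is None else rng
    chain = np.empty(n)
    x, lp = x0, log_pi(x0)
    for i in range(n):
        prop = x + scale * rng.standard_normal()   # symmetric proposal
        lp_prop = log_pi(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

# Ergodic average f_bar_n approximating E_pi f for f(x) = x^2 under
# Pi = N(0, 1); the true value of the integral is 1.
chain = rw_metropolis(lambda x: -0.5 * x**2, n=100_000)
f_bar_n = np.mean(chain**2)
```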
It is well known that when a Markov chain is Harris ergodic (i.e., aperiodic, $\varphi$-irreducible, and Harris recurrent [see 22, for definitions]), ergodic averages based on realizations of the Markov chain always furnish strongly consistent estimates of the corresponding population quantities [22, Theorem 13.0.1]. In other words, if a Harris ergodic chain is run long enough, then the estimate $\bar{f}_n$ is always guaranteed to provide a reasonable approximation to the otherwise intractable quantity $E_\pi f$ (under some mild regularity conditions on $f$). Determining an MCMC sample (or iteration) size $n$ that justifies this convergence, however, requires a measurement of accuracy. As in i.i.d. Monte Carlo estimation, the standard error of $\bar{f}_n$ obtained from the MCMC central limit theorem (MCMC CLT) is the natural quantity to use for this purpose. The MCMC CLT requires additional regularity conditions compared to its i.i.d. counterpart: if the Markov chain $(X_n)_{n \geq 1}$ is geometrically ergodic (see, e.g., Meyn and Tweedie [22] for definitions), and if $E_\pi |f|^{2+\delta} < \infty$ for some $\delta > 0$ (or $E_\pi f^2 < \infty$ if $(X_n)_{n \geq 1}$ is geometrically ergodic and reversible), then it can be shown that, as $n \to \infty$,
\[
\sqrt{n} \left( \bar{f}_n - E_\pi f \right) \xrightarrow{d} N(0, \sigma_f^2),
\]
where $\sigma_f^2$ is the MCMC variance defined as
\[
\sigma_f^2 = \operatorname{var}_\pi f(X_1) + 2 \sum_{i=2}^{\infty} \operatorname{cov}_\pi \left( f(X_1), f(X_i) \right). \tag{1.1}
\]
Here $\operatorname{var}_\pi$ and $\operatorname{cov}_\pi$ respectively denote the variance and (auto-)covariance computed under the stationary distribution $\Pi$. Note that other sufficient conditions ensuring the above central limit theorem also exist; see the survey articles of Jones et al. [16] and Roberts and Rosenthal [32] for more details. When the regularity conditions hold, a natural measure of accuracy for $\bar{f}_n$ is therefore given by the MCMC standard error (MCMCSE), defined as $\sigma_f / \sqrt{n}$. Note that this formula for the MCMCSE, alongside measuring the error in approximation, also helps determine the optimum iteration size $n$ required to achieve a pre-specified level of precision, thus providing a stopping rule for terminating MCMC sampling. A related use of $\sigma_f^2$ lies in the computation of the effective sample size $\mathrm{ESS} = n \operatorname{var}_\pi f(X_1) / \sigma_f^2$ [18, 29]. The ESS measures how $n$ dependent MCMC samples compare to $n$ i.i.d. observations from $\Pi$, thus providing a univariate measure of the quality of the MCMC samples. To summarize, the MCMC variance $\sigma_f^2$ facilitates the determination of three crucial aspects of an MCMC implementation, namely (a) a stopping rule for terminating simulation, (b) the effective sample size (ESS) of the MCMC draws, and (c) the precision of the MCMC estimate $\bar{f}_n$.
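To make these three uses concrete, the sketch below computes the MCMCSE, the ESS, and a CLT-based stopping-rule sample size from a given estimate of $\sigma_f^2$. The function name, the plug-in of the sample variance for $\operatorname{var}_\pi f(X_1)$, and the default precision targets are illustrative assumptions.

```python
import numpy as np

def mcmc_summaries(fvals, sigma2_f, eps=0.01, z=1.96):
    """Given f(X_1), ..., f(X_n) and an estimate sigma2_f of the MCMC
    variance sigma_f^2, return (MCMCSE, ESS, iteration size needed so
    that the normal half-width z * sigma_f / sqrt(n) is at most eps)."""
    n = len(fvals)
    mcmcse = np.sqrt(sigma2_f / n)                      # sigma_f / sqrt(n)
    ess = n * np.var(fvals, ddof=1) / sigma2_f          # n var_pi f(X_1) / sigma_f^2
    n_needed = int(np.ceil((z / eps) ** 2 * sigma2_f))  # stopping rule
    return mcmcse, ess, n_needed
```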
In most non-trivial applications, however, the MCMC variance $\sigma_f^2$ is unknown and must be estimated. A substantial literature has been devoted to the estimation of $\sigma_f^2$ [see, e.g., 3, 9, 12, 13, 14, 23, 31, 10, 11, to name a few], and several methods, such as regenerative sampling, spectral variance estimation, and overlapping and non-overlapping batch means estimation, have been developed. In this paper, we focus on the non-overlapping batch means estimator, henceforth called the batch means estimator for simplicity, where estimation of $\sigma_f^2$ is performed by breaking the $n = a_n b_n$ Markov chain iterations into $a_n$ non-overlapping blocks or batches of equal size $b_n$. Then, for each $k \in \{1, 2, \ldots, a_n\}$, one calculates the $k$-th batch mean $Z_k := b_n^{-1} \sum_{i=1}^{b_n} Z_{(k-1) b_n + i}$ and the overall mean $\bar{Z} := a_n^{-1} \sum_{k=1}^{a_n} Z_k$, where $Z_i = f(X_i)$ for $i = 1, 2, \ldots$, and finally estimates $\sigma_f^2$ by
\[
\hat{\sigma}^2_{BM,f} = \hat{\sigma}^2_{BM,f}(n; a_n, b_n) = \frac{b_n}{a_n - 1} \sum_{k=1}^{a_n} \left( Z_k - \bar{Z} \right)^2. \tag{1.2}
\]
The batch means estimator is straightforward to implement and can be computed post hoc without making any changes to the original MCMC algorithm, as opposed to some other methods, such as regenerative sampling. Under various sets of regularity conditions, the batch means estimator $\hat{\sigma}^2_{BM,f}$ has been shown to be strongly consistent [7, 15, 17, 11] and also mean squared consistent [5, 11] for $\sigma_f^2$, provided the batch size $b_n$ and the number of batches $a_n$ both increase with $n$. Note that the estimator depends on the choice of the batch size $b_n$ (and hence the number of batches $a_n = n / b_n$). Optimal selection of the batch size is still an open problem, and both $b_n = n^{1/2}$ and $b_n = n^{1/3}$ have been deemed desirable in the literature; the former ensures that the batch means $\{Z_k\}$ approach asymptotic normality at the fastest rate (under certain regularity conditions, [6]), while the latter minimizes the asymptotic mean squared error of $\hat{\sigma}^2_{BM,f}$ (under different regularity conditions, [34]).

It is, however, important to recognize that consistency alone does not in general justify practical usefulness, and a measurement of accuracy is always required to assess the validity of an estimator. It is known that the asymptotic variance of the batch means estimator is given by $\operatorname{var} \hat{\sigma}^2_{BM,f} = 2 \sigma_f^4 / a_n + o(1/n)$, under various regularity conditions [5, 11]. However, without any knowledge of the asymptotic distribution, the asymptotic variance alone is generally insufficient for assessing the accuracy of an estimator. For example, a $\pm 2$ standard error bound does not in general guarantee more than 75% coverage, as obtained from the Chebyshev inequality, and to ensure a pre-specified (say, 95%) coverage, a much larger interval ($\sim \pm 4.5$ standard errors) is necessary in general. This provides a strong practical motivation for determining the asymptotic distribution of the batch means estimator. To the best of our knowledge, however, no such result is available. The main purpose of this paper is to establish a central limit theorem that guarantees asymptotic normality of the batch means estimator under mild and standard regularity conditions (Theorem 2.1). There are two major motivations for our work. As discussed above, the first motivation lies in the immediate practical implication of this work: as a consequence of the CLT, the use of approximate normal confidence intervals for measuring the accuracy of batch means estimators is justified.
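The following Python sketch implements the batch means estimator (1.2) together with the approximate normal confidence interval that such a CLT justifies, using the plug-in asymptotic variance $2 \hat{\sigma}^4_{BM,f} / a_n$ stated above. The default batch size rule $b_n = \lfloor n^{1/2} \rfloor$ and the critical value $z = 1.96$ are illustrative choices; the precise regularity conditions are those of Theorem 2.1.

```python
import numpy as np

def batch_means_ci(fvals, b_n=None, z=1.96):
    """Non-overlapping batch means estimate of sigma_f^2, as in (1.2),
    with an approximate normal confidence interval based on the plug-in
    asymptotic variance 2 * sigma_f^4 / a_n."""
    fvals = np.asarray(fvals, dtype=float)
    n = len(fvals)
    b_n = int(np.floor(np.sqrt(n))) if b_n is None else b_n  # b_n = n^{1/2} rule
    a_n = n // b_n                                           # number of full batches
    Z = fvals[: a_n * b_n].reshape(a_n, b_n).mean(axis=1)    # batch means Z_1..Z_{a_n}
    sigma2_bm = b_n * np.sum((Z - Z.mean()) ** 2) / (a_n - 1)
    half_width = z * np.sqrt(2.0 * sigma2_bm**2 / a_n)       # CLT-based half-width
    return sigma2_bm, (sigma2_bm - half_width, sigma2_bm + half_width)

# Usage with the chain and f(x) = x^2 from the earlier sketch:
# sigma2_hat, ci = batch_means_ci(chain**2)
```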