Monte Carlo Integration Using Importance Sampling and Gibbs Sampling
Wolfgang Hörmann and Josef Leydold
Department of Statistics, University of Economics and Business Administration, Vienna, Austria
[email protected]

Abstract- To evaluate the expectation of a simple function with respect to a complicated multivariate density, Monte Carlo integration has become the main technique. Gibbs sampling and importance sampling are the most popular methods for this task. In this contribution we propose a new, simple, general-purpose importance sampling procedure. In a simulation study we compare the performance of this method with the performance of Gibbs sampling and of importance sampling using a vector of independent variates. It turns out that the new procedure is much better than independent importance sampling; up to dimension five it is also better than Gibbs sampling. The simulation results indicate that for higher dimensions Gibbs sampling is superior.

1. Introduction

A standard problem in scientific computing is the evaluation of an integral of the form

$$ I \;=\; \frac{\int_{\mathbb{R}^d} g(x)\, f(x)\, dx}{\int_{\mathbb{R}^d} f(x)\, dx}, $$

where $x$ denotes a vector in $\mathbb{R}^d$, $f(x)$ is the multiple of a density function with bounded but often unknown integral, and $g(x)$ is an arbitrary real-valued function. In typical applications we want to find the expectation, variance and some quantiles of all marginal distributions together with all correlations between the variables. If we consider, for example, the expectation, it is obvious that we have to use $g(x) = x_i$ to obtain the expectation of the $i$-th marginal. Also for the variances, quantiles and correlations the evaluation of $g(x)$ is very cheap, whereas the evaluation of the density $f(x)$ can be very expensive.

An application of the above setting is, for example, the parameter estimation problem of Bayesian statistics. There $f$ is the posterior density, which is typically very expensive to evaluate. It is standard to calculate expectation, variance and several quantiles (e.g. 2.5%, 50%, and 97.5%) of all or several important one-dimensional marginals. These integrals are calculated as they represent the Bayesian point estimates and confidence intervals for the parameters of the model.

It is important to note that the methods discussed here and the conclusions of this paper are valid for any integral of the form given above as long as the evaluation of $g(x)$ is much faster than the evaluation of $f(x)$.

Using naive Monte Carlo integration, random vectors $x_i$ with density proportional to $f$ are generated. The average of the $g(x_i)$ is the estimator (i.e. the approximate result) of the above integral. However, often the direct generation of iid. vectors with density proportional to $f$ is impossible or very slow. There are three general approaches to evaluate the above integral if direct generation is not possible (a sketch of the naive baseline follows the list below):

+ Simulation of vectors with density proportional to $f$ using the rejection technique.
+ Importance sampling.
+ Markov chain Monte Carlo algorithms, in particular Gibbs sampling as the most popular one in Bayesian statistics.
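As a concrete illustration, the following sketch (in Python) computes the naive Monte Carlo estimate of the first marginal mean; the standard bivariate normal target is an assumption made purely so that direct generation is possible, which is precisely the case where the methods of this paper are not needed.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 2, 100_000

    # iid. vectors with density f, here a standard bivariate normal
    # (illustrative assumption: direct sampling is possible).
    x = rng.standard_normal((n, d))

    # g(x) = x_1 picks out the first marginal; its sample average
    # estimates the expectation of that marginal (true value 0 here).
    estimate = x[:, 0].mean()
    print(estimate)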
There are quite a few papers on this topic in the literature, certainly an indication that this is a difficult practical problem. Papers mainly concerned with the computing methodology are e.g. [6] (among many others) for the Gibbs sampler and [8], [11], [7], [13], [12] and [10] for importance sampling.

In recent years many authors have observed that routine use shifts more and more to Gibbs sampling or other Markov chain Monte Carlo methods. On the other hand, it is obvious that it is very easy to parallelize importance sampling that uses independent sampling from the same distribution. In Markov chain Monte Carlo, normally one long Markov chain is generated, which makes parallelization more difficult. Nevertheless, it is possible to run a small number of chains in parallel.

We were not able to find papers that compare the performance of these two so different approaches. Thus, the aim of this paper is twofold. We develop a new simple general-purpose importance sampling scheme and we compare its performance to the performance of Gibbs sampling.

2. Importance Sampling

For importance sampling we have to find an importance sampling density (also called proposal density) $h(x)$. In most applications it is necessary to find more than one characteristic for several marginal distributions. It is thus in practice not possible to find the optimal importance density for a single $g$. In this situation it is argued in the literature (see e.g. [8] and [3]) that $h(x)$ should be roughly proportional to $f(x)$ but with higher tails. The idea of importance sampling is then to sample vectors $x_i$ from the proposal density $h(x)$ and to evaluate the integral

$$ \int g(x)\, w(x)\, h(x)\, dx \qquad \text{with} \qquad w(x) = \frac{f(x)}{h(x)}. $$

The function $w$ is called the weight function and $w(x_i)$ the weight for $x_i$. As we do not know the integral of $f$, we cannot use the classic importance sampling estimate $\frac{1}{N} \sum_{i=1}^{N} g(x_i)\, w(x_i)$ but have to use the ratio estimate

$$ \hat{I} \;=\; \frac{\sum_{i=1}^{N} g(x_i)\, w(x_i)}{\sum_{i=1}^{N} w(x_i)}; $$

see [8] for its properties.

It is quite clear that good weights should be approximately constant; in other words, the weights should have a small variance. In [8] it is suggested to use the "effective sample size"

$$ \mathrm{ESS} \;=\; \frac{\bigl(\sum_{i=1}^{N} w(x_i)\bigr)^2}{\sum_{i=1}^{N} w(x_i)^2}. $$

Maximizing the effective sample size is equivalent to minimizing the variance of the weights. It is not difficult to show that, for large $N$,

$$ \frac{N}{\mathrm{ESS}} \;\approx\; \int \frac{\bar{f}(x)^2}{\bar{h}(x)}\, dx, $$

where $\bar{f}$ and $\bar{h}$ denote the normed densities associated with $f$ and $h$.
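The following sketch illustrates the ratio estimate and the effective sample size on a toy problem; the unnormalized Gaussian target and the wider normal proposal are illustrative assumptions, not the proposal class developed in this paper.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 2, 100_000

    def f(x):
        # Unnormalized target: standard normal kernel, with the
        # normalizing constant deliberately unknown to the estimator.
        return np.exp(-0.5 * np.sum(x**2, axis=1))

    # Proposal h: normal with higher tails (scale 2), density known.
    scale = 2.0
    x = scale * rng.standard_normal((n, d))
    h_vals = np.exp(-0.5 * np.sum(x**2, axis=1) / scale**2) \
             / (2.0 * np.pi * scale**2) ** (d / 2)

    w = f(x) / h_vals                      # weights w(x_i) = f(x_i)/h(x_i)

    # Ratio estimate of the first marginal mean, g(x) = x_1.
    estimate = np.sum(x[:, 0] * w) / np.sum(w)

    # Effective sample size: (sum of weights)^2 / (sum of squared weights).
    ess = np.sum(w) ** 2 / np.sum(w ** 2)
    print(estimate, ess)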
3. Rejection Sampling

For rejection sampling (see [9] or [2] for a detailed explanation) it is necessary to find a hat function $h(x)$ with $h(x) \ge f(x)$ for all $x$. The hat function must have bounded integral and should be chosen such that it is not too difficult to generate random vectors with density proportional to $h(x)$. The integral of the hat function should be as small as possible, as the expected number $M$ of trials necessary to generate one random vector is

$$ M \;=\; \frac{\int h(x)\, dx}{\int f(x)\, dx}. $$

Note that if we consider the normed densities $\bar{f}$ and $\bar{h}$ associated with $f$ and $h$, then we have

$$ \sup_{x} \frac{\bar{f}(x)}{\bar{h}(x)} \;\le\; M. $$

3.1. Comparing rejection and importance sampling

Perhaps the most frequently used error measure in statistics is the mean squared error (MSE). To compare the advantages and disadvantages of the different integration methods, it is therefore instructive to consider the MSE when estimating the mean of a marginal distribution. (Without loss of generality we restrict the discussion to the first marginal distribution.) If we can generate iid. random variates directly, it is well known that the MSE for estimating the mean is equal to $\sigma^2 / N$, where $\sigma^2$ denotes the variance of the marginal distribution and $N$ the sample size. As we are focusing on the case that the evaluation of $f$ is very expensive, it seems to be of interest to express the MSE in terms of $N_f$, the number of evaluations of the density $f$ that are necessary.

If we consider rejection sampling and assume that no squeeze (simple lower bound of the density) is used, which is rarely possible for multidimensional problems, we can express the mean squared error of rejection sampling as a function of $N_f$:

$$ \mathrm{MSE}_{RS} \;=\; \frac{M\, \sigma^2}{N_f}. $$

For importance sampling we can use the fact that the MSE for the mean of a marginal distribution is approximately equal to $\sigma^2 / \mathrm{ESS}$, as ESS is the effective sample size. Thus we get for importance sampling:

$$ \mathrm{MSE}_{IS} \;\approx\; \frac{\sigma^2}{N_f} \int \frac{\bar{f}(x)^2}{\bar{h}(x)}\, dx. $$

If we compare importance sampling with rejection sampling using the same normed densities $\bar{f}$ and $\bar{h}$, we can see (as pointed out in [4]) that we always have

$$ \mathrm{MSE}_{IS} \;\le\; \mathrm{MSE}_{RS}, $$

as

$$ \int \frac{\bar{f}(x)^2}{\bar{h}(x)}\, dx \;\le\; \sup_{x} \frac{\bar{f}(x)}{\bar{h}(x)} \;\le\; M. $$

Of course the above result does not mean that rejection sampling is useless. Only for the case that we cannot use a squeeze (lower bound for the density) and that the evaluation of the density $f(x)$ is much more expensive than the evaluation of $g(x)$ does importance sampling result in a smaller MSE for approximately the same computing effort.

Note: Our setting here is a typical one for Monte Carlo integration. For most discrete event simulations the situation is different. There the evaluation of $g(x)$ involves the whole simulation system and is in most cases much more time consuming than the generation of the random variates.

4. A new class of importance sampling densities

In practice the success of importance sampling heavily depends on the importance sampling density used. If we consider the variety of shapes a multi-modal density can have in higher dimensions, it is not astonishing that it is impossible to find a general IS density for arbitrary densities in high dimension. There are few general-purpose suggestions for how to select the proposal density. An exception are adaptive importance sampling schemes (cf. e.g. [7] or [13]), which have the disadvantage that they still need an importance sampling density to start with.

We therefore consider proposal densities that are mixtures, with probabilities $p_q$, of exponential densities over the $2^d$ quadrants; the density in the first quadrant is then given by:

$$ h(x_1, \dots, x_d) \;=\; \Bigl( \prod_{i=1}^{d} b_i \Bigr) \exp\Bigl( -\sum_{i=1}^{d} b_i\, x_i \Bigr), \qquad x_1, \dots, x_d \ge 0. $$

To choose the parameters $b_i$ for all coordinates in each quadrant we use the following heuristic formula, which is based on the idea of finding a hyperplane not too different from the (log-)transformed density:

$$ b_i \;=\; \tfrac{1}{2} \bigl( \log f(\mathbf{0}) + \log f(\mathbf{1}) - \log f(e_i) - \log f(\mathbf{1} + e_i) \bigr), $$

where $\mathbf{1}$ is the vector of dimension $d$ with all entries equal to 1 and $e_i$ denotes the $i$-th vector of the canonical basis. If the probabilities $p_q$ of the proposal distribution are selected proportional to the reciprocal value of the product of all $b_i$, this implies that the importance sampling densities of all quadrants have the same value in the origin. In our numerical experiments we observed that for distributions with high correlations the probabilities for some quadrants are too close to 0.
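The following sketch implements one possible reading of this construction for a toy two-dimensional target: the mixture over the $2^d$ quadrants, exponential rates $b_i$ from the difference-quotient heuristic, and quadrant probabilities $p_q$ proportional to $1/\prod_i b_i$ follow the description above, while the target $f$ and all numerical settings are assumptions for illustration.

    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    d = 2

    def log_f(x):
        # Unnormalized toy target: standard normal kernel.
        return -0.5 * np.sum(np.asarray(x) ** 2)

    def f(x):
        return np.exp(-0.5 * np.sum(x ** 2, axis=1))

    signs = np.array(list(itertools.product([1.0, -1.0], repeat=d)))

    # Rates b_i per quadrant from the difference-quotient heuristic
    # (our reading of the formula above, applied in quadrant direction s).
    ones = np.ones(d)
    b = np.empty((len(signs), d))
    for q, s in enumerate(signs):
        for i in range(d):
            e = np.zeros(d)
            e[i] = s[i]                      # i-th basis vector, signed
            b[q, i] = 0.5 * (log_f(np.zeros(d)) + log_f(s * ones)
                             - log_f(e) - log_f(s * ones + e))

    # Quadrant probabilities proportional to 1/prod(b_i): every quadrant
    # density then takes the same value at the origin.
    p = 1.0 / np.prod(b, axis=1)
    p /= p.sum()

    def sample(n):
        # The quadrant densities have disjoint supports, so the mixture
        # density at x equals p_q * h_q(x) for the quadrant q containing x.
        q = rng.choice(len(signs), size=n, p=p)
        u = rng.exponential(size=(n, d)) / b[q]      # Exp(rate b_i) coords
        x = u * signs[q]                             # reflect into quadrant q
        h = p[q] * np.prod(b[q], axis=1) * np.exp(-np.sum(b[q] * u, axis=1))
        return x, h

    x, h = sample(100_000)
    w = f(x) / h                             # importance weights
    print(np.sum(x[:, 0] * w) / np.sum(w))   # ratio estimate, close to 0
    print(np.sum(w) ** 2 / np.sum(w ** 2))   # effective sample size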