Simulation Efficiency and an Introduction to Variance Reduction

Monte Carlo Simulation: IEOR E4703 c 2017 by Martin Haugh Columbia University Simulation Efficiency and an Introduction to Variance Reduction Methods In these notes we discuss the efficiency of a Monte-Carlo estimator. This naturally leads to the search for more efficient estimators and towards this end we describe some simple variance reduction techniques. In particular, we describe control variates, antithetic variates and conditional Monte-Carlo, all of which are designed to reduce the variance of our Monte-Carlo estimators. We will defer a discussion of other variance reduction techniques such as common random numbers, stratified sampling and importance sampling until later. 1 Simulation Efficiency Suppose as usual that we wish to estimate θ := E[h(X)]. Then the standard simulation algorithm is: 1. Generate X1;:::; Xn Pn 2. Estimate θ with θbn = j=1 Yj=n where Yj := h(Xj). 3. Approximate 100(1 − α)% confidence intervals are then given by σbn σbn θbn − z1−α/2 p ; θbn + z1−α/2 p n n where σbn is the usual estimate of Var(Y ) based on Y1;:::;Yn. One way to measure the quality of the estimator, θbn, is by the half-width, HW , of the confidence interval. For a fixed α, we have r Var(Y ) HW = z : 1−α/2 n We would like HW to be small, but sometimes this is difficult to achieve. This may be because Var(Y ) is too large, or too much computational effort is required to simulate each Yj so that n is necessarily small, or some combination of the two. As a result, it is often imperative to address the issue of simulation efficiency. There are a number of things we can do: 1. Develop a good simulation algorithm. 2. Program carefully to minimize storage requirements. For example we do not need to store all the Yj's: we P P 2 only need to keep track of Yj and Yj to compute θbn and approximate CI's. 3. Program carefully to minimize execution time. 4. Decrease the variability of the simulation output that we use to estimate θ. The techniques used to do this are usually called variance reduction techniques. We will now study some of the simplest variance reduction techniques, and assume that we are doing items (1) to (3) as well as possible. Before proceeding to study these techniques, however, we should first describe a measure of simulation efficiency. Simulation Efficiency and an Introduction to Variance Reduction Methods 2 1.1 Measuring Simulation Efficiency Suppose there are two random variables, W and Y , such that E[W ] = E[Y ] = θ. Then we could choose to either simulate W1;:::;Wn or Y1;:::;Yn in order to estimate θ. Let Mw denote the method of estimating θ by simulating the Wi's. My is similarly defined. Which method is more efficient, Mw or My? To answer this, let nw and ny be the number of samples of W and Y , respectively, that are needed to achieve a half-width, HW . Then we know that z 2 n = 1−α/2 Var(W ) w HW z 2 n = 1−α/2 Var(Y ): y HW Let Ew and Ey denote the amount of computational effort required to produce one sample of W and Y , respectively. Then the total effort expended by Mw and My, respectively, to achieve a half width HW are z 2 TE = 1−α/2 Var(W ) E w HW w z 2 TE = 1−α/2 Var(Y ) E : y HW y We then say that Mw is more efficient than My if TEw < T Ey. Note that TEw < T Ey if and only if Var(W )Ew < Var(Y )Ey: (1) We will use the quantity Var(W )Ew as a measure of the efficiency of the simulator, Mw. Note that (1) implies we cannot conclude that one simulation algorithm, Mw, is better than another, My, simply because Var(W ) < Var(Y ); we also need to take Ew and Ey into consideration. However, it is often the case that we have two simulators available to us, Mw and My, where Ew ≈ Ey and Var(W ) << Var(Y ). In such cases it is clear that using Mw provides a substantial improvement over using My. 2 Control Variates Suppose you wish to determine the mean midday temperature, θ, in Grassland and that your data consists of f(Ti;Ri): i = 1; : : : ng where Ti and Ri are the midday temperature and daily rainfall, respectively, on some random day, Di. Then θ = E[T ] is the mean midday temperature.If the Di's are drawn uniformly from f1;:::; 365g, then an obvious estimator for θ is Pn i=1 Ti θbn = n and we then know that E[θbn] = θ. Suppose, however, that we also know: 1. E[R], the mean daily rainfall in Grassland 2. Ri and Ti are dependent; in particular, it tends to rain more in the cold season Is there any way we can exploit this information to obtain a better estimate of θ? The answer of course, is yes. Pn Let Rn := i=1 Ri=n and now suppose Rn > E[R]. Then this implies that the Di's over-represent the rainy season in comparison to the dry season. But since the rainy season tends to coincide with the cold season, it also means that the Di's over-represent the cold season in comparison to the warm season. As a result, we expect θbn < θ. Therefore, to improve our estimate, we should increase θbn. Similarly, if Rn < E[R], we should decrease θbn. In this example, rainfall is the control variate since it enables us to better control our estimate of θ. The principle idea behind many variance reduction techniques (including control variates) is to \use what you know" Simulation Efficiency and an Introduction to Variance Reduction Methods 3 about the system. In this example, the system is Grassland's climate, and what we know is E[R], the average daily rainfall. We will now study control variates more formally, and in particular, we will determine by how much we should increase or decrease θbn. 2.1 The Control Variate Method Suppose again that we wish to estimate θ := E[Y ] where Y = h(X) is the output of a simulation experiment. Suppose that Z is also an output of the simulation or that we can easily output it if we wish. Finally, we assume that we know E[Z]. Then we can construct many unbiased estimators of θ: 1. θb = Y , our usual estimator 2. θbc := Y + c(Z − E[Z]) for any c 2 R. The variance of θbc satisfies 2 Var(θbc) = Var(Y ) + c Var(Z) + 2c Cov(Y; Z): (2) and we can choose c to minimize this quantity. Simple calculus then implies the optimal value of c is given by Cov(Y; Z) c∗ = − Var(Z) and that the minimized variance satisfies Cov(Y; Z)2 Var(θbc∗ ) = Var(Y ) − Var(Z) Cov(Y; Z)2 = Var(θb) − : Var(Z) In order to achieve a variance reduction it is therefore only necessary that Cov(Y; Z) 6= 0. The new resulting Monte Carlo algorithm proceeds by generating n samples of Y and Z and then setting Pn ∗ i=1 (Yi + c (Zi − E[Z])) θbc∗ = : n There is a problem with this, however, as we usually do not know Cov(Y; Z). We overcome this problem by doing p pilot simulations and setting Pp j=1(Yj − Y p)(Zj − E[Z]) Cov(d Y; Z) = : p − 1 If it is also the case that Var(Z) is unknown, then we also estimate it with Pp 2 j=1(Zj − E[Z]) Var(d Z) = p − 1 and finally set ∗ Cov(d Y; Z) bc = − : Var(d Z) Assuming we can find a control variate, our control variate simulation algorithm is as follows. Note that the Vi's are IID, so we can compute approximate confidence intervals as before. Simulation Efficiency and an Introduction to Variance Reduction Methods 4 Control Variate Simulation Algorithm for Estimating E[Y ] =∗ Do pilot simulation first ∗= for i = 1 to p generate (Yi;Zi) end for ∗ compute bc =∗ Now do main simulation ∗= for i = 1 to n generate (Yi;Zi) ∗ set Vi = Yi + bc (Zi − E[Z]) end for Pn set θ ∗ = V = V =n bbc n i=1 i 2 P 2 set σ = (V − θ ∗ ) =(n − 1) bn;v i bbc h σn;v σn;v i set 100(1 − α)% CI = θ ∗ − z bp ; θ ∗ + z bp bbc 1−α/2 n bbc 1−α/2 n Example 1 2 Suppose we wish to estimate θ = E[e(U+W ) ] where U; W ∼ U(0; 1) and IID. In our notation we then have 2 Y := e(U+W ) . The usual approach is: 1. Generate U1;:::;Un and W1;:::;Wn, all IID ∼ U(0; 1) 2 2 (U1+W1) (Un+Wn) 2. Compute Y1 = e ;:::;Yn = e Pn 3. Construct the estimator θbn;y = j=1 Yj=n p 2 4. Build confidence intervals θbn;y ± z1−α/2 σbn;y= n where σbn;y is the usual estimate of Var(Y ). Now consider using the control variate technique. First we have to choose an appropriate control variate, Z. There are many possibilities including Z1 := U + W 2 Z2 := (U + W ) U+W Z3 := e Note that we can easily compute E[Zi] for i = 1; 2; 3 and that it's also clear that Cov(Y; Zi) 6= 0. In a simple ∗ experiment we used Z3, estimating bc on the basis of a pilot simulation with 100 samples.

Load more