Chapter 5: Monte Carlo Integration and Variance Reduction
Lecturer: Zhao Jianhua
Department of Statistics, Yunnan University of Finance and Economics

Outline
5.2 Monte Carlo Integration
  5.2.1 Simple MC estimator
  5.2.2 Variance and Efficiency
5.3 Variance Reduction
5.4 Antithetic Variables
5.5 Control Variates
  5.5.1 Antithetic variate as control variate
  5.5.2 Several control variates
  5.5.3 Control variates and regression
5.6 Importance Sampling
5.7 Stratified Sampling
5.8 Stratified Importance Sampling

5.2 Monte Carlo (MC) Integration

- Monte Carlo (MC) integration is a statistical method based on random sampling. MC methods were developed in the late 1940s, after World War II, but the idea of random sampling was not new.
- Let $g(x)$ be a function and suppose that we want to compute $\int_a^b g(x)\,dx$. Recall that if $X$ is a random variable with density $f(x)$, then the mathematical expectation of the random variable $Y = g(X)$ is
$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx.$$
- If a random sample is available from the distribution of $X$, an unbiased estimator of $E[g(X)]$ is the sample mean.

5.2.1 Simple MC estimator

- Consider the problem of estimating $\theta = \int_0^1 g(x)\,dx$. If $X_1, \dots, X_m$ is a random U(0,1) sample, then
$$\hat\theta = \overline{g_m(X)} = \frac{1}{m}\sum_{i=1}^m g(X_i)$$
converges to $E[g(X)] = \theta$ with probability 1, by the Strong Law of Large Numbers (SLLN).
- The simple MC estimator of $\int_0^1 g(x)\,dx$ is $\overline{g_m(X)}$.

Example 5.1 (Simple MC integration)

Compute a MC estimate of $\theta = \int_0^1 e^{-x}\,dx$ and compare the estimate with the exact value.

m <- 10000
x <- runif(m)
theta.hat <- mean(exp(-x))
print(theta.hat)
print(1 - exp(-1))

[1] 0.6355289
[1] 0.6321206

The estimate is $\hat\theta \approx 0.6355$ and $\theta = 1 - e^{-1} \approx 0.6321$.

To compute $\int_a^b g(t)\,dt$, make a change of variables so that the limits of integration are from 0 to 1. The linear transformation is $y = (t-a)/(b-a)$ with $dy = dt/(b-a)$, so
$$\int_a^b g(t)\,dt = \int_0^1 g(y(b-a)+a)(b-a)\,dy.$$
Alternately, replace the U(0,1) density with any other density supported on the interval between the limits of integration. For example,
$$\int_a^b g(t)\,dt = (b-a)\int_a^b g(t)\,\frac{1}{b-a}\,dt$$
is $(b-a)$ times the expected value of $g(Y)$, where $Y$ has the uniform density on $(a,b)$. The integral is therefore $(b-a)$ times $\overline{g_m(Y)}$, the average value of $g(\cdot)$ over $(a,b)$.

Example 5.2 (Simple MC integration, cont.)

Compute a MC estimate of $\theta = \int_2^4 e^{-x}\,dx$ and compare the estimate with the exact value of the integral.

m <- 10000
x <- runif(m, min = 2, max = 4)
theta.hat <- mean(exp(-x)) * 2
print(theta.hat)
print(exp(-2) - exp(-4))

[1] 0.1172158
[1] 0.1170196

The estimate is $\hat\theta \approx 0.1172$ and $\theta = e^{-2} - e^{-4} \approx 0.1170$.

To summarize, the simple MC estimator of the integral $\theta = \int_a^b g(x)\,dx$ is computed as follows.
1. Generate $X_1, \dots, X_m$, iid from U(a,b).
2. Compute $\overline{g(X)} = \frac{1}{m}\sum_{i=1}^m g(X_i)$.
3. $\hat\theta = (b-a)\,\overline{g(X)}$.
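These three steps translate directly into a short R function. The following sketch is illustrative rather than part of the original notes (the name simple.mc and its interface are my own); it reproduces Example 5.2 up to MC error.

simple.mc <- function(g, a, b, m = 10000) {
  x <- runif(m, min = a, max = b)   # step 1: X1, ..., Xm iid U(a, b)
  g.bar <- mean(g(x))               # step 2: average g over the sample
  (b - a) * g.bar                   # step 3: rescale by the interval length
}
simple.mc(function(x) exp(-x), a = 2, b = 4)   # close to exp(-2) - exp(-4)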
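The change-of-variables identity above gives an equivalent recipe that always samples from U(0,1), whatever the limits of integration. A minimal sketch of my own (this comparison is not in the notes), redoing the integral of Example 5.2 this way:

m <- 10000
y <- runif(m)                          # Y ~ U(0,1)
t <- y * (4 - 2) + 2                   # t = y(b - a) + a maps (0,1) onto (2,4)
theta.hat <- mean(exp(-t)) * (4 - 2)   # (b - a) times the average of g
print(theta.hat)                       # again close to exp(-2) - exp(-4)

Fixing the sampling distribution at U(0,1) via a substitution is exactly the idea the next example applies to an unbounded interval.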
Example 5.3 (MC integration, unbounded interval)

Use the MC approach to estimate the standard normal cdf
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt.$$
Since the integration covers an unbounded interval, we break the problem into two cases, $x \ge 0$ and $x < 0$, and use the symmetry of the normal density to handle the second case.

- To estimate $\theta = \int_0^x e^{-t^2/2}\,dt$ for $x > 0$, we could generate random U(0, x) numbers, but the parameters of the uniform distribution would then change with each value of $x$. We prefer an algorithm that always samples from U(0,1), via a change of variables. Making the substitution $y = t/x$, we have $dt = x\,dy$ and
$$\theta = \int_0^1 x e^{-(xy)^2/2}\,dy.$$
Thus $\theta = E_Y[x e^{-(xY)^2/2}]$, where the random variable $Y$ has the U(0,1) distribution.

Generate iid U(0,1) random numbers $u_1, \dots, u_m$, and compute
$$\hat\theta = \overline{g_m(u)} = \frac{1}{m}\sum_{i=1}^m x e^{-(u_i x)^2/2}.$$
The sample mean $\hat\theta \to E[\hat\theta] = \theta$ as $m \to \infty$. If $x > 0$, the estimate of $\Phi(x)$ is $0.5 + \hat\theta/\sqrt{2\pi}$. If $x < 0$, compute $\Phi(x) = 1 - \Phi(-x)$.

x <- seq(.1, 2.5, length = 10)
m <- 10000
u <- runif(m)
cdf <- numeric(length(x))
for (i in 1:length(x)) {
  g <- x[i] * exp(-(u * x[i])^2 / 2)
  cdf[i] <- mean(g) / sqrt(2 * pi) + 0.5
}

Now the estimates $\hat\theta$ for ten values of $x$ are stored in the vector cdf. Compare the estimates with the values $\Phi(x)$ computed (numerically) by the pnorm function.

Phi <- pnorm(x)
print(round(rbind(x, cdf, Phi), 3))

The MC estimates appear to be very close to the pnorm values. (The estimates will be worse in the extreme upper tail of the distribution.)

     [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
x    0.10 0.367 0.633 0.900 1.167 1.433 1.700 1.967 2.233 2.500
cdf  0.54 0.643 0.737 0.816 0.879 0.925 0.957 0.978 0.990 0.997
Phi  0.54 0.643 0.737 0.816 0.878 0.924 0.955 0.975 0.987 0.994

- It would have been simpler to generate random U(0, x) variables and skip the transformation. This is left as an exercise.
- In fact, the integrand is itself a density function, and we can generate random variables from this density. This provides a more direct approach to estimating the integral.

Example 5.4 (Example 5.3, cont.)

Let $I(\cdot)$ be the indicator function, and $Z \sim N(0,1)$. Then for any constant $x$ we have $E[I(Z \le x)] = P(Z \le x) = \Phi(x)$. Generate a random sample $z_1, \dots, z_m$ from the standard normal distribution. Then the sample mean
$$\widehat{\Phi}(x) = \frac{1}{m}\sum_{i=1}^m I(z_i \le x) \to E[I(Z \le x)] = \Phi(x).$$

x <- seq(.1, 2.5, length = 10)
m <- 10000
z <- rnorm(m)
dim(x) <- length(x)
p <- apply(x, MARGIN = 1, FUN = function(x, z) {mean(z < x)}, z = z)

Compare the estimates in p to pnorm:

     [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
x    0.10 0.367 0.633 0.900 1.167 1.433 1.700 1.967 2.233 2.500
p   0.546 0.652 0.741 0.818 0.876 0.925 0.954 0.976 0.988 0.993
Phi  0.54 0.643 0.737 0.816 0.878 0.924 0.955 0.975 0.987 0.994

Compared with Example 5.3, the agreement with pnorm is better in the upper tail, but worse near the center.

Summarizing, if $f(x)$ is a probability density function supported on a set $A$ (that is, $f(x) \ge 0$ for all $x \in \mathbb{R}$ and $\int_A f(x)\,dx = 1$), then to estimate the integral
$$\theta = \int_A g(x) f(x)\,dx,$$
generate a random sample $x_1, \dots, x_m$ from the distribution $f(x)$, and compute the sample mean
$$\hat\theta = \frac{1}{m}\sum_{i=1}^m g(x_i).$$
Then with probability one, $\hat\theta$ converges to $E[\hat\theta] = \theta$ as $m \to \infty$.

The standard error of $\hat\theta = \frac{1}{m}\sum_{i=1}^m g(x_i)$

The variance of $\hat\theta$ is $\sigma^2/m$, where $\sigma^2 = \mathrm{Var}_f(g(X))$. When the distribution of $X$ is unknown, we substitute for $F_X$ the empirical distribution $F_m$ of the sample $x_1, \dots, x_m$. The variance of $\hat\theta$ can be estimated by
$$\frac{\hat\sigma^2}{m} = \frac{1}{m^2}\sum_{i=1}^m \left[g(x_i) - \overline{g(x)}\right]^2. \qquad (5.1)$$
Note that $\frac{1}{m}\sum_{i=1}^m [g(x_i) - \overline{g(x)}]^2$ is the plug-in estimate of $\mathrm{Var}(g(X))$: it is the variance of $U$, where $U$ is uniformly distributed on the set of replicates $g(x_i)$. The estimate of the standard error of $\hat\theta$ is
$$\widehat{se}(\hat\theta) = \frac{\hat\sigma}{\sqrt{m}} = \frac{1}{m}\left\{\sum_{i=1}^m \left[g(x_i) - \overline{g(x)}\right]^2\right\}^{1/2}. \qquad (5.3)$$
The Central Limit Theorem (CLT) implies that $(\hat\theta - E[\hat\theta])/\sqrt{\mathrm{Var}\,\hat\theta}$ converges in distribution to N(0,1) as $m \to \infty$. Hence, for large $m$, $\hat\theta$ is approximately normal with mean $\theta$. The approximate normal distribution of $\hat\theta$ can be used to put confidence limits or error bounds on the MC estimate of the integral, and to check for convergence.
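The general recipe above, with $f$ other than a uniform density, works the same way. As a quick sketch of my own (not in the notes), estimate $\theta = \int_0^\infty x^2 e^{-x}\,dx = \Gamma(3) = 2$ by sampling from the Exp(1) density $f(x) = e^{-x}$ with $g(x) = x^2$:

m <- 10000
x <- rexp(m)            # sample from f(x) = exp(-x), x > 0
theta.hat <- mean(x^2)  # sample mean of g(x) = x^2
print(theta.hat)        # should be close to 2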
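Equation (5.3) and the CLT combine into a reusable pattern: report the estimate together with its standard error and a normal-theory confidence interval. The following wrapper is a sketch of my own (the name mc.ci is hypothetical, not from the notes); Example 5.5 below carries out the same computation directly.

mc.ci <- function(gx, level = 0.95) {
  # gx: the replicates g(x1), ..., g(xm)
  m <- length(gx)
  theta.hat <- mean(gx)
  se.hat <- sqrt(sum((gx - theta.hat)^2)) / m   # eq. (5.3)
  z <- qnorm(1 - (1 - level) / 2)               # 1.96 for level = 0.95
  c(estimate = theta.hat, se = se.hat,
    lower = theta.hat - z * se.hat,
    upper = theta.hat + z * se.hat)
}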
Example 5.5 (Error bounds for MC integration)

Estimate the variance of the estimator in Example 5.4, and construct approximate 95% confidence intervals for the estimates of $\Phi(2)$ and $\Phi(2.5)$.

x <- 2
m <- 10000
z <- rnorm(m)
g <- (z < x)   # the indicator function
v <- mean((g - mean(g))^2) / m
cdf <- mean(g)
c(cdf, v)
c(cdf - 1.96 * sqrt(v), cdf + 1.96 * sqrt(v))

[1] 9.772000e-01 2.228016e-06
[1] 0.9742744 0.9801256

The probability $P(I(Z < x) = 1)$ is $\Phi(2) \approx 0.977$. The variance of $\hat\theta$ is therefore approximately $(0.977)(1 - 0.977)/10000 = 2.223 \times 10^{-6}$. The MC estimate of the variance, $2.228 \times 10^{-6}$, is quite close to this value.
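The example statement also asks for $\Phi(2.5)$; rerunning the same code with x <- 2.5 gives the second interval. A sketch (the numerical output will vary with the random draws; for reference, $\Phi(2.5) \approx 0.9938$):

x <- 2.5
m <- 10000
z <- rnorm(m)
g <- (z < x)
v <- mean((g - mean(g))^2) / m
cdf <- mean(g)
c(cdf - 1.96 * sqrt(v), cdf + 1.96 * sqrt(v))   # 95% CI centered near 0.9938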