
Lab 4: Monte Carlo Integration and Reduction

Lecturer: Zhao Jianhua

Department of Statistics, Yunnan University of Finance and Economics

Task

The objective of this lab is to learn methods for Monte Carlo Integration and Variance Reduction, including Monte Carlo Integration, Antithetic Variables, Control Variates, Importance Sampling, Stratified Sampling, and Stratified Importance Sampling.

1 Monte Carlo Integration

1.1 Simple Monte Carlo estimator

1.1.1 Example 5.1 (Simple Monte Carlo integration)

Compute a Monte Carlo (MC) estimate of

θ = ∫_0^1 e^{-x} dx

and compare the estimate with the exact value.

m <- 10000
x <- runif(m)
theta.hat <- mean(exp(-x))
print(theta.hat)

## [1] 0.6324415

print(1 - exp(-1))

## [1] 0.6321206

The estimate is θ̂ ≈ 0.6324 and the exact value is θ = 1 − e^{-1} ≈ 0.6321.
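A rough standard error for this estimate is available from the sample standard deviation of the replicates e^{-x_i}; a minimal sketch reusing m and x from the chunk above (Example 5.5 treats error bounds more carefully):

se.hat <- sd(exp(-x))/sqrt(m)  # estimated standard error of theta.hat
se.hat                         # roughly 0.002 for m = 10000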

1.1.2 Example 5.2 (Simple Monte Carlo integration, cont.)

Compute a MC estimate of θ = ∫_2^4 e^{-x} dx and compare the estimate with the exact value of the integral.

m <- 10000
x <- runif(m, min = 2, max = 4)
theta.hat <- mean(exp(-x)) * 2
print(theta.hat)

## [1] 0.1168929

print(exp(-2) - exp(-4))

## [1] 0.1170196

The estimate is θ̂ ≈ 0.1169 and the exact value is θ = e^{-2} − e^{-4} ≈ 0.1170.
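The same device works on any finite interval: if U ∼ U(a, b), then ∫_a^b g(x) dx = (b − a) E[g(U)], which is why the factor 2 appears above. A small generic helper along these lines (a sketch; the name simple.mc is not part of the lab code):

simple.mc <- function(g, a, b, m = 10000) {
    u <- runif(m, min = a, max = b)
    (b - a) * mean(g(u))  # (b - a) * mean(g(U)) estimates the integral of g over (a, b)
}
simple.mc(function(x) exp(-x), 2, 4)  # should be close to exp(-2) - exp(-4)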

1.1.3 Example 5.3 (Monte Carlo integration, unbounded interval)

Use the MC approach to estimate the standard normal cdf

Φ(x) = ∫_{-∞}^x (1/√(2π)) e^{-t^2/2} dt.

Since the integration covers an unbounded interval, we break the problem into two cases, x ≥ 0 and x < 0, and use the symmetry of the normal density to handle the second case.

To estimate θ = ∫_0^x e^{-t^2/2} dt for x > 0, we could generate random U(0, x) numbers, but that would change the parameters of the uniform distribution for each different value of x. We prefer an algorithm that always samples from U(0, 1), obtained via a change of variables. Making the substitution y = t/x, we have dt = x dy and

θ = ∫_0^1 x e^{-(xy)^2/2} dy.

Thus θ = E_Y[x e^{-(xY)^2/2}], where the random variable Y has the U(0, 1) distribution. Generate iid U(0, 1) random numbers u_1, ..., u_m, and compute

θ̂ = g_m(u) = (1/m) Σ_{i=1}^m x e^{-(u_i x)^2/2}.

The sample mean θ̂ → E[θ̂] = θ as m → ∞. If x > 0, the estimate of Φ(x) is 0.5 + θ̂/√(2π); if x < 0, compute Φ(x) = 1 − Φ(−x).

x <- seq(0.1, 2.5, length = 10)
m <- 10000
u <- runif(m)
cdf <- numeric(length(x))
for (i in 1:length(x)) {
    g <- x[i] * exp(-(u * x[i])^2/2)
    cdf[i] <- mean(g)/sqrt(2 * pi) + 0.5
}

Phi <- pnorm(x)
print(round(rbind(x, cdf, Phi), 3))

##      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
## x    0.10 0.367 0.633 0.900 1.167 1.433 1.700 1.967 2.233 2.500
## cdf  0.54 0.643 0.737 0.816 0.879 0.925 0.957 0.978 0.991 0.999
## Phi  0.54 0.643 0.737 0.816 0.878 0.924 0.955 0.975 0.987 0.994

Now the estimates θ̂ for ten values of x are stored in the vector cdf. Compare the estimates with the values Φ(x) computed (numerically) by the pnorm function. The MC estimates appear to be very close to the pnorm values. (The estimates will be worse in the extreme upper tail of the distribution.)
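To see the degradation in the extreme upper tail, one can compare estimated and exact upper-tail probabilities at a larger value of x, say x = 3.5 (a quick check, not part of the original example; it reuses the sample u from the chunk above):

x0 <- 3.5
g0 <- x0 * exp(-(u * x0)^2/2)
cdf0 <- mean(g0)/sqrt(2 * pi) + 0.5
c(1 - cdf0, 1 - pnorm(x0))  # the relative error of the tail probability is much larger here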

1.1.4 Example 5.4 (Example 5.3, cont.)

Let I(·) be the indicator function and Z ∼ N(0, 1). Then for any constant x we have E[I(Z ≤ x)] = P(Z ≤ x) = Φ(x). Generate a random sample z_1, ..., z_m from the standard normal distribution. Then the sample mean

Φ̂(x) = (1/m) Σ_{i=1}^m I(z_i ≤ x) → E[I(Z ≤ x)] = Φ(x).

x <- seq(0.1, 2.5, length = 10)
m <- 10000
z <- rnorm(m)
dim(x) <- length(x)
p <- apply(x, MARGIN = 1, FUN = function(x, z) {
    mean(z < x)
}, z = z)

Phi <- pnorm(x)
print(round(rbind(x, p, Phi), 3))

##      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
## x   0.100 0.367 0.633 0.900 1.167 1.433 1.700 1.967 2.233 2.500
## p   0.548 0.648 0.741 0.815 0.877 0.925 0.956 0.976 0.987 0.994
## Phi 0.540 0.643 0.737 0.816 0.878 0.924 0.955 0.975 0.987 0.994

Compared with Example 5.3, this estimator shows better agreement with pnorm in the upper tail, but worse agreement near the center.

1.1.5 Example 5.5 (Error bounds for MC integration)

Estimate the variance of the estimator in Example 5.4, and construct approximate 95% confidence intervals for estimates of Φ(2) and Φ(2.5).

x <- 2
m <- 10000
z <- rnorm(m)
g <- (z < x)  # the indicator function
v <- mean((g - mean(g))^2)/m
cdf <- mean(g)
c(cdf, v)

## [1] 9.771000e-01 2.237559e-06

c(cdf - 1.96 * sqrt(v), cdf + 1.96 * sqrt(v))

## [1] 0.9741681 0.9800319

The probability P(I(Z < x) = 1) is Φ(2) ≈ 0.977, so the variance of g(X) is approximately (0.977)(1 − 0.977)/10000 = 2.223e−06. The MC estimate 2.238e−06 of the variance is quite close to this value. Similarly, P(I(Z < x) = 1) is Φ(2.5) ≈ 0.995, and the MC estimate 5.272e−07 of the variance is approximately equal to the theoretical value (0.995)(1 − 0.995)/10000 = 4.975e−07.
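The corresponding figures for Φ(2.5) can be reproduced by repeating the same steps with x = 2.5; a sketch reusing z and m from above (the numbers will vary with the random sample):

x <- 2.5
g <- (z < x)
v <- mean((g - mean(g))^2)/m
cdf <- mean(g)
c(cdf, v)
c(cdf - 1.96 * sqrt(v), cdf + 1.96 * sqrt(v))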

2 Antithetic Variables

Refer to Example 5.3, estimation of the standard normal cdf

Φ(x) = ∫_{-∞}^x (1/√(2π)) e^{-t^2/2} dt.

Now use antithetic variables, and find the approximate reduction in standard error. The target parameter is θ = E_U[x e^{-(xU)^2/2}], where U has the U(0, 1) distribution. By restricting the simulation to the upper tail, the function g(·) is monotone, so the hypothesis of Corollary 5.1 is satisfied. Generate random numbers u_1, ..., u_{m/2} ∼ U(0, 1) and compute half of the replicates using

Y_j = g^{(j)}(u) = x e^{-(u_j x)^2/2}, j = 1, ..., m/2,

as before, but compute the remaining half of the replicates using

Y'_j = x e^{-((1 − u_j) x)^2/2}, j = 1, ..., m/2.

The sample mean

θ̂ = (1/m) Σ_{j=1}^{m/2} ( x e^{-(u_j x)^2/2} + x e^{-((1 − u_j) x)^2/2} )
  = (1/(m/2)) Σ_{j=1}^{m/2} ( x e^{-(u_j x)^2/2} + x e^{-((1 − u_j) x)^2/2} ) / 2

converges to E[θ̂] = θ as m → ∞. If x > 0, the estimate of Φ(x) is 0.5 + θ̂/√(2π); if x < 0, compute Φ(x) = 1 − Φ(−x).

2.1 Example 5.6 (Antithetic variables)

MC.Phi below implements MC estimation of Φ(x), computing the estimate with or without antithetic sampling. The MC.Phi function could be made more general by adding an argument naming a function, the integrand (see integrate).

MC.Phi <- function(x, R = 10000, antithetic = TRUE) {
    u <- runif(R/2)
    if (!antithetic) v <- runif(R/2) else v <- 1 - u
    u <- c(u, v)
    cdf <- numeric(length(x))
    for (i in 1:length(x)) {
        g <- x[i] * exp(-(u * x[i])^2/2)
        cdf[i] <- mean(g)/sqrt(2 * pi) + 0.5
    }
    cdf
}

Compare estimates obtained from a single MC experiment:

x <- seq(0.1, 2.5, length = 5)
Phi <- pnorm(x)
set.seed(123)

MC1 <- MC.Phi(x, anti = FALSE)
set.seed(123)
MC2 <- MC.Phi(x)
print(round(rbind(x, MC1, MC2, Phi), 5))

##        [,1]    [,2]    [,3]    [,4]    [,5]
## x   0.10000 0.70000 1.30000 1.90000 2.50000
## MC1 0.53983 0.75825 0.90418 0.97311 0.99594
## MC2 0.53983 0.75805 0.90325 0.97132 0.99370
## Phi 0.53983 0.75804 0.90320 0.97128 0.99379

The approximate reduction in variance can be estimated for a given x by a simulation under both methods.

m <- 1000
MC1 <- MC2 <- numeric(m)
x <- 1.95
for (i in 1:m) {
    MC1[i] <- MC.Phi(x, R = 1000, anti = FALSE)
    MC2[i] <- MC.Phi(x, R = 1000)
}
print(sd(MC1))

## [1] 0.006874616

print(sd(MC2))

## [1] 0.0004392972

print((var(MC1) - var(MC2))/var(MC1))

## [1] 0.9959166
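The corresponding reduction in standard error, which is what this section's task asks for, follows directly from the two standard deviations (about 94% in this run):

(sd(MC1) - sd(MC2))/sd(MC1)  # approximate proportional reduction in standard error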

3 Control Variates

3.0.1 Example 5.7 (Control variate)

Apply the control variate approach to compute

θ = E[e^U] = ∫_0^1 e^u du,

where U ∼ U(0, 1). By direct integration, θ = e − 1 ≈ 1.718282. If the simple MC approach is applied with m replicates, the variance of the estimator is Var(g(U))/m, where

Var(g(U)) = Var(e^U) = E[e^{2U}] − θ^2 = (e^2 − 1)/2 − (e − 1)^2 ≈ 0.2420356.

A natural choice for a control variate is U ∼ U(0, 1). Then E[U] = 1/2, Var(U) = 1/12, and

Cov(e^U, U) = 1 − (1/2)(e − 1) ≈ 0.1408591.

Hence

c* = −Cov(e^U, U)/Var(U) = −12 + 6(e − 1) ≈ −1.690309.

Our controlled estimator is θ̂_{c*} = e^U − 1.690309(U − 0.5). For m replicates, m Var(θ̂_{c*}) is

Var(e^U) − [Cov(e^U, U)]^2/Var(U) = (e^2 − 1)/2 − (e − 1)^2 − 12(1 − (e − 1)/2)^2
                                  ≈ 0.2420356 − 12(0.1408591)^2 ≈ 0.003940175.
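These constants can be verified numerically in R; a small sketch evaluating the closed-form expressions above:

var.g <- (exp(2) - 1)/2 - (exp(1) - 1)^2    # Var(e^U)
cov.gU <- 1 - (exp(1) - 1)/2                # Cov(e^U, U)
c.star <- -cov.gU/(1/12)                    # c* = -Cov(e^U, U)/Var(U), with Var(U) = 1/12
var.ctl <- var.g - cov.gU^2/(1/12)          # per-replicate variance of the controlled estimator
c(c.star, var.ctl, 100 * (var.g - var.ctl)/var.g)  # -1.690309, 0.003940175, about 98.37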

The percent reduction in variance using the control variate compared with the simple MC estimate is 100 × (0.2420356 − 0.003940175)/0.2420356 ≈ 98.37%. Empirically comparing the simple MC estimate with the control variate approach:

m <- 10000
a <- -12 + 6 * (exp(1) - 1)
U <- runif(m)
T1 <- exp(U)                   # simple MC
T2 <- exp(U) + a * (U - 1/2)   # controlled
mean(T1)

## [1] 1.711419

mean(T2)

## [1] 1.717676

(var(T1) - var(T2))/var(T1)

## [1] 0.9837252

illustrating that the theoretical percent reduction in variance of about 98.37% derived above is approximately achieved in this simulation.

3.0.2 Example 5.8 (MC integration using control variates)

Use the method of control variates to estimate

∫_0^1 e^{-x}/(1 + x^2) dx.

Here θ = E[g(X)] with g(X) = e^{-X}/(1 + X^2), where X ∼ U(0, 1).

We seek a function 'close' to g(x) with a known expected value, such that g(X) and f(X) are strongly correlated. For example, the function f(x) = e^{-0.5}(1 + x^2)^{-1} is 'close' to g(x) on (0, 1), and we can compute its expectation. If U ∼ U(0, 1), then

E[f(U)] = e^{-0.5} ∫_0^1 1/(1 + u^2) du = e^{-0.5} arctan(1) = e^{-0.5} π/4.

Setting up a preliminary simulation to obtain an estimate of the constant c*, we also obtain an estimate of Cor(g(U), f(U)) ≈ 0.974.

f <- function(u) exp(-0.5)/(1 + u^2)
g <- function(u) exp(-u)/(1 + u^2)
set.seed(510)  # needed later
u <- runif(10000)
B <- f(u)
A <- g(u)

Estimates of c* and Cor(f(U), g(U)) are

cor(A, B)

## [1] 0.9740585

a <- -cov(A, B)/var(B)  # est of c*
a

## [1] -2.436228

Simulation results with and without the control variate follow.

m <- 1e+05
u <- runif(m)
T1 <- g(u)
T2 <- T1 + a * (f(u) - exp(-0.5) * pi/4)
c(mean(T1), mean(T2))

## [1] 0.5253543 0.5250021

c(var(T1), var(T2))

## [1] 0.060231423 0.003124814

(var(T1) - var(T2))/var(T1)

## [1] 0.9481199

Here the approximate reduction in variance of g(X) + c*(f(X) − µ) compared with g(X) is 95%. We will return to this problem to apply another approach to variance reduction, the method of importance sampling.

3.1 Control variates and regression

3.1.1 Example 5.9 (Control variate and regression)

Returning to Example 5.8, let us repeat the estimation by fitting a regression model. In this problem the target is

θ = ∫_0^1 e^{-x}/(1 + x^2) dx,

with g(x) = e^{-x}/(1 + x^2), and the control variate is f(x) = e^{-0.5}(1 + x^2)^{-1}, 0 < x < 1,

with µ = E[f(X)] = e^{-0.5}π/4. To estimate the constant c*:

set.seed(510)
u <- runif(10000)
f <- exp(-0.5)/(1 + u^2)
g <- exp(-u)/(1 + u^2)
c.star <- -lm(g ~ f)$coeff[2]  # c* = -beta.hat[1], the negative of the slope
mu <- exp(-0.5) * pi/4
c.star

##         f 
## -2.436228

We used the same seed as in Example 5.8 and obtained the same estimate for c*. Now θ̂_{c*} is the predicted response at the point µ ≈ 0.4763681, so

u <- runif(10000)
f <- exp(-0.5)/(1 + u^2)
g <- exp(-u)/(1 + u^2)
L <- lm(g ~ f)
theta.hat <- sum(L$coeff * c(1, mu))  # pred. value at mu

The estimate θ̂, the residual MSE, and the proportion of reduction in variance (R-squared) agree with the estimates obtained in Example 5.8.

theta.hat

## [1] 0.5253113

summary(L)$sigma^2

## [1] 0.003117644

summary(L)$r.squared

## [1] 0.9484514
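Equivalently, the predicted response at µ can be obtained with predict() on the fitted model L, which should reproduce theta.hat (a sketch):

predict(L, newdata = data.frame(f = mu))  # same as sum(L$coeff * c(1, mu))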

4 Importance Sampling

4.1 Example 5.10 (Choice of the importance function)

Several possible choices of importance functions to estimate

∫_0^1 e^{-x}/(1 + x^2) dx

by the importance sampling (IS) method are compared. The candidates are

f0(x) = 1, 0 < x < 1,
f1(x) = e^{-x}, 0 < x < ∞,
f2(x) = (1 + x^2)^{-1}/π, −∞ < x < ∞,
f3(x) = e^{-x}/(1 − e^{-1}), 0 < x < 1,
f4(x) = 4(1 + x^2)^{-1}/π, 0 < x < 1.

The plot of the importance functions is given below.

# par(ask = TRUE)  # uncomment to pause between graphs
x <- seq(0, 1, 0.01)
w <- 2
f1 <- exp(-x)
f2 <- (1/pi)/(1 + x^2)
f3 <- exp(-x)/(1 - exp(-1))
f4 <- 4/((1 + x^2) * pi)
g <- exp(-x)/(1 + x^2)

# for color, change lty to col

# figure (a)
plot(x, g, type = "l", main = "", ylab = "", ylim = c(0, 2), lwd = w)
lines(x, g/g, lty = 2, lwd = w)
lines(x, f1, lty = 3, lwd = w)
lines(x, f2, lty = 4, lwd = w)
lines(x, f3, lty = 5, lwd = w)
lines(x, f4, lty = 6, lwd = w)
legend("topright", legend = c("g", 0:4), lty = 1:6, lwd = w, inset = 0.02)

[Figure 5.1(a): the integrand g and the importance functions f0-f4 plotted on (0, 1).]

# figure (b)
plot(x, g, type = "l", main = "", ylab = "", ylim = c(0, 3.2), lwd = w, lty = 2)
lines(x, g/f1, lty = 3, lwd = w)
lines(x, g/f2, lty = 4, lwd = w)
lines(x, g/f3, lty = 5, lwd = w)
lines(x, g/f4, lty = 6, lwd = w)
legend("topright", legend = c(0:4), lty = 2:6, lwd = w, inset = 0.02)

[Figure 5.1(b): the ratios g(x)/f(x) for the importance functions f0-f4 on (0, 1).]

The integrand is

g(x) = e^{-x}/(1 + x^2) if 0 < x < 1, and g(x) = 0 otherwise.

While all five candidates are positive on the set 0 < x < 1 where g(x) > 0, f1 and f2 have larger ranges and many of the simulated values will contribute zeros to the sum, which is inefficient. All of these distributions are easy to simulate; f2 is standard Cauchy, or t(ν = 1). The densities are plotted on (0, 1) for easy comparison. The function that corresponds to the most nearly constant ratio g(x)/f(x) appears to be f3, which can be seen more clearly in Fig. 5.1(b). From the graphs, we might prefer f3 for the smallest variance. (Code to display Fig. 5.1(a) and (b) is given on page 152.)

m <- 10000
theta.hat <- se <- numeric(5)
g <- function(x) {
    exp(-x - log(1 + x^2)) * (x > 0) * (x < 1)
}

x <- runif(m)  # using f0
fg <- g(x)
theta.hat[1] <- mean(fg)
se[1] <- sd(fg)
x <- rexp(m, 1)  # using f1
fg <- g(x)/exp(-x)
theta.hat[2] <- mean(fg)
se[2] <- sd(fg)
x <- rcauchy(m)  # using f2
i <- c(which(x > 1), which(x < 0))
x[i] <- 2  # to catch overflow errors in g(x)
fg <- g(x)/dcauchy(x)
theta.hat[3] <- mean(fg)
se[3] <- sd(fg)
u <- runif(m)  # f3, inverse transform method
x <- -log(1 - u * (1 - exp(-1)))
fg <- g(x)/(exp(-x)/(1 - exp(-1)))
theta.hat[4] <- mean(fg)
se[4] <- sd(fg)
u <- runif(m)  # f4, inverse transform method
x <- tan(pi * u/4)
fg <- g(x)/(4/((1 + x^2) * pi))
theta.hat[5] <- mean(fg)
se[5] <- sd(fg)

The five estimates of ∫_0^1 g(x) dx and their standard errors se are

rbind(theta.hat, se)

##                [,1]      [,2]      [,3]       [,4]      [,5]
## theta.hat 0.5259481 0.5224536 0.5235617 0.52530826 0.5233785
## se        0.2456857 0.4206927 0.9513543 0.09623623 0.1410705

f3 and possibly f4 produce the smallest variance among the five candidates, while f2 produces the highest variance. The standard MC estimate without importance sampling (f0 = 1) has se ≈ 0.246. f2 is supported on (−∞, ∞), while g(x) is evaluated only on (0, 1). A very large number of zeros (about 75%) is produced in the ratio g(x)/f2(x), and all other values are far from 0, resulting in a large variance. Summary statistics for g(x)/f2(x) confirm this.

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.5173  0.0000  3.1380
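The code that produced this summary is not shown above; it can be reproduced along the following lines (a sketch; the exact numbers depend on the random sample):

x2 <- rcauchy(m)            # fresh draws from f2
x2[x2 > 1 | x2 < 0] <- 2    # as above, move out-of-range values so g() returns 0 without overflow
ratio <- g(x2)/dcauchy(x2)
summary(ratio)
mean(ratio == 0)            # proportion of zero contributions, about 0.75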

For f1 there is a similar inefficiency, since f1 is supported on (0, ∞). Choice of importance function: f3 is supported on exactly the set where g(x) > 0, and the ratio g(x)/f3(x) is nearly constant.

5 Stratified Sampling

5.1 Example 5.11 (Example 5.10, cont.)

In Fig. 5.1(a), it is clear that g(x) is not constant on (0, 1). Divide the interval into, say, four subintervals, and compute a MC estimate of the integral on each subinterval using 1/4 of the total number of replicates. Then combine these four estimates to obtain the estimate of ∫_0^1 e^{-x}(1 + x^2)^{-1} dx. The results below show that stratification has reduced the variance by more than an order of magnitude (a factor of about 25 in this run). For integrands that are monotone functions, stratification similar to Example 5.11 should be an effective way to reduce variance.

M <- 20  # number of replicates
T2 <- numeric(4)
estimates <- matrix(0, 10, 2)
g <- function(x) {
    exp(-x - log(1 + x^2)) * (x > 0) * (x < 1)
}
for (i in 1:10) {
    estimates[i, 1] <- mean(g(runif(M)))
    T2[1] <- mean(g(runif(M/4, 0, 0.25)))
    T2[2] <- mean(g(runif(M/4, 0.25, 0.5)))
    T2[3] <- mean(g(runif(M/4, 0.5, 0.75)))
    T2[4] <- mean(g(runif(M/4, 0.75, 1)))
    estimates[i, 2] <- mean(T2)
}
estimates

##            [,1]      [,2]
##  [1,] 0.5264330 0.5326060
##  [2,] 0.5033366 0.5159873
##  [3,] 0.4587725 0.5166573
##  [4,] 0.5551324 0.5441103
##  [5,] 0.5003521 0.5059311
##  [6,] 0.5473194 0.5275441
##  [7,] 0.6521611 0.5137131
##  [8,] 0.5658337 0.5187298
##  [9,] 0.5086914 0.5378670
## [10,] 0.6389043 0.5160286

apply(estimates, 2, mean)

## [1] 0.5456936 0.5229174

apply(estimates, 2, var)

## [1] 0.0037406995 0.0001459295
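The improvement factor can be read directly off the two column variances (about 25 in this run):

v <- apply(estimates, 2, var)
v[1]/v[2]  # ratio of the simple MC variance to the stratified variance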

6 Stratified Importance Sampling

6.1 Example 5.12 (Examples 5.10-5.11, cont.)

Stratified sampling is implemented in a more general way for ∫_0^1 e^{-x}(1 + x^2)^{-1} dx. The standard MC estimate is used for comparison.

M <- 10000  # number of replicates
k <- 10  # number of strata
r <- M/k  # replicates per stratum
N <- 50  # number of times to repeat the estimation
T2 <- numeric(k)
estimates <- matrix(0, N, 2)
g <- function(x) {
    exp(-x - log(1 + x^2)) * (x > 0) * (x < 1)
}
for (i in 1:N) {
    estimates[i, 1] <- mean(g(runif(M)))
    for (j in 1:k) T2[j] <- mean(g(runif(M/k, (j - 1)/k, j/k)))
    estimates[i, 2] <- mean(T2)
}

The result of this simulation produces the following estimates.

apply(estimates, 2, mean)

## [1] 0.5246942 0.5248675

apply(estimates, 2, var)

## [1] 7.274107e-06 5.826139e-08

This represents a more than 98% reduction in variance.
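For reference, the exact percent reduction implied by the two variances printed above can be computed directly (above 99% here):

v <- apply(estimates, 2, var)
100 * (v[1] - v[2])/v[1]  # percent reduction in variance from stratification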
