Optimal Importance Sampling for Diffusion Processes
U.U.D.M. Project Report 2019:38
Malvina Fröberg
Degree project in mathematics, 30 credits
Supervisor: Erik Ekström
Examiner: Denis Gaidashev
June 2019
Department of Mathematics, Uppsala University

Acknowledgements

I would like to express my sincere gratitude to my supervisor, Professor Erik Ekström, for all his help and support. I am thankful for his encouragement and for introducing me to the topic, as well as for the hours spent guiding me.

Abstract

Variance reduction techniques are used to increase the precision of estimates in numerical calculations and more specifically in Monte Carlo simulations. This thesis focuses on a particular variance reduction technique, namely importance sampling, and applies it to diffusion processes with applications within the field of financial mathematics. Importance sampling attempts to reduce variance by changing the probability measure. The Girsanov theorem is used when changing measure for stochastic processes. However, a change of the probability measure gives a new drift coefficient for the underlying diffusion, which may lead to an increased computational cost. This issue is discussed, formulated as a stochastic optimal control problem, and studied further by using the Hamilton-Jacobi-Bellman equation with a penalty term to account for computational costs. The objective of this thesis is to examine whether there is an optimal change of measure or an optimal new drift for a diffusion process. This thesis provides examples of optimal measure changes in cases where the set of possible measure changes is restricted but not penalized, as well as examples for unrestricted measure changes but with penalization.

Contents

1 Introduction
2 Monte Carlo Methods
  2.1 Monte Carlo Integration and Convergence of Error
  2.2 Importance Sampling
  2.3 Time Discretization Error
    2.3.1 Smooth Coefficients
3 Change of Measure
4 Stochastic Optimal Control Problem
5 Importance Sampling for Diffusions
  5.1 Introductory Problem
    5.1.1 Constant Coefficients
  5.2 Stochastic Process with Controlled Drift
    5.2.1 Constant Push in Specified Interval
6 Optimal Importance Sampling for Diffusions
  6.1 Other Penalty Terms
  6.2 The Finite Horizon Version
References

1 Introduction

Variance reduction techniques are used to increase the precision of estimates in numerical calculations and more specifically in Monte Carlo simulations. Some of the most commonly used techniques are antithetic variates, control variates and importance sampling. This thesis focuses on importance sampling and applies it to diffusion processes with applications primarily in the field of financial mathematics.

Importance sampling attempts to reduce variance by changing the probability measure. The Girsanov theorem is used when changing measure for stochastic processes. After applying importance sampling, the resulting diffusion process, which generates a smaller variance of estimation, might have, for example, an exploding drift causing large fluctuations. To handle this numerically, smaller time steps in the Monte Carlo simulation might be needed to compensate for the possible loss of precision. We arrive at a problem of finding a balance between the number of sample paths, the size of the time steps and the magnitude of the drift added to the diffusion process when using the Girsanov theorem to change measure in the importance sampling method.
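To illustrate the basic mechanism of importance sampling before turning to diffusions, here is a minimal Python sketch, not taken from the thesis: it estimates the rare-event probability $P(X > c)$ for a standard normal $X$, once by plain Monte Carlo and once by sampling from a shifted normal distribution and reweighting with the likelihood ratio. The threshold $c$, the shift $\mu$ and the sample size are arbitrary illustrative choices; the likelihood ratio plays the same role as the Radon-Nikodym derivative supplied by the Girsanov theorem in the diffusion setting.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
N = 100_000          # number of samples
c = 4.0              # rare-event threshold: estimate P(X > c), X ~ N(0, 1)

# Plain Monte Carlo: very few samples ever exceed c, so the estimate is noisy.
x = rng.standard_normal(N)
plain = np.mean(x > c)

# Importance sampling: draw from N(mu, 1) so the event is common, and
# reweight each sample with the likelihood ratio dP/dQ(y) = exp(-mu*y + mu^2/2).
mu = c
y = rng.standard_normal(N) + mu
weights = np.exp(-mu * y + 0.5 * mu**2)
is_estimate = np.mean((y > c) * weights)

print(f"plain MC: {plain:.2e}, importance sampling: {is_estimate:.2e}")
# True value: P(X > 4) is roughly 3.2e-5; for the same N, the importance
# sampling estimate has a much smaller relative error than plain Monte Carlo.
```

The shifted distribution concentrates the samples where the integrand matters, and the weights correct for the change of measure, which is exactly the variance reduction idea that the rest of the thesis develops for diffusion processes.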
To generalize the situation, there are two types of estimation errors that occur in the method of Monte Carlo and importance sampling, namely the variance inherent in the Monte Carlo approach and an error connected with the discretization of the stochastic differential equation.

Adding a constant drift in the method of importance sampling is presumably not optimal. For the technique to be as efficient as possible, we need to allow for an exploding drift somewhere in time and space. However, non-constant (possibly exploding) coefficients lead to numerical difficulties when simulating trajectories. Indeed, such a situation calls for adaptive methods to distribute mesh points, which are more complex than standard methods with time steps of a fixed size. We do not wish for the new and improved stochastic process with smaller variance to be difficult to simulate. The central dilemma of this thesis is consequently the trade-off between the improvement due to importance sampling and the numerical efficiency of the problem and its simulation.

Our next approach is to penalize the circumstances where additional computational costs arise after reducing the variance. This issue is discussed and formulated as a stochastic optimal control problem and studied further using the Hamilton-Jacobi-Bellman equation with a penalty term to account for computational cost. The objective of this thesis is thus to examine whether there is an optimal change of measure or an optimal new drift for a diffusion process. This is followed by a discussion of the trade-off between the benefits of importance sampling and its computational costs.

In the following section, we introduce importance sampling and touch upon some of the numerical issues in Monte Carlo methods. In Sections 3 and 4, we go through the necessary background material: change of measure and the Girsanov theorem, as well as the stochastic optimal control problem and the Hamilton-Jacobi-Bellman equation. In Sections 5 and 6, we apply these results to carefully selected examples.

2 Monte Carlo Methods

Monte Carlo methods are a class of computational algorithms that can be applied to a vast range of problems. They provide approximate solutions and are used in cases where analytical or numerical solutions do not exist or are too difficult to implement. A Monte Carlo method makes numerical estimates by taking the empirical mean of repeated random samples. It is an easy way of modeling complex situations, which allows for applications in a wide range of fields such as finance and engineering.

When simulating with Monte Carlo methods, there are two main factors that affect the cost-effectiveness: the number of sample paths and the size of the time steps. Let $N$ be the number of sample paths and $h = \Delta t$ be the size of the time steps in the general Monte Carlo integration method. Then, according to Seydel (2009) and Hirsa (2012), the numerical errors, or rates of convergence, depending on $h$ and $N$ are
$$\epsilon_h = O(\sqrt{h}) \quad \text{and} \quad \epsilon_N = O(1/\sqrt{N}),$$
respectively. Further explanations of these rates of convergence are found in Sections 2.1 and 2.3.
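To make the interplay between $N$ and $h$ concrete, the following Python sketch estimates an expectation of a diffusion by simulating $N$ Euler-Maruyama paths with step size $h$. It is not taken from the thesis: the geometric Brownian motion dynamics, the payoff $\max(X_T - K, 0)$ and all parameter values are illustrative assumptions. The reported standard error shrinks at rate $O(1/\sqrt{N})$, while the time-discretization error of the Euler-Maruyama scheme is controlled by $h$, so both must be refined together.

```python
import numpy as np

def euler_maruyama_mc(mu, sigma, x0, T, K, N, h, seed=0):
    """Estimate E[max(X_T - K, 0)] for dX = mu*X dt + sigma*X dW
    using N Euler-Maruyama sample paths with step size h."""
    rng = np.random.default_rng(seed)
    steps = int(round(T / h))
    x = np.full(N, x0)
    for _ in range(steps):
        dW = rng.standard_normal(N) * np.sqrt(h)
        x = x + mu * x * h + sigma * x * dW        # Euler-Maruyama update
    payoffs = np.maximum(x - K, 0.0)
    estimate = payoffs.mean()
    std_error = payoffs.std(ddof=1) / np.sqrt(N)   # statistical error ~ O(1/sqrt(N))
    return estimate, std_error

# Quadrupling N halves the statistical error; halving h targets the
# discretization error, so the total error only drops when both are refined.
for N, h in [(10_000, 0.01), (40_000, 0.01), (40_000, 0.005)]:
    est, se = euler_maruyama_mc(mu=0.05, sigma=0.2, x0=1.0, T=1.0, K=1.0, N=N, h=h)
    print(f"N={N:6d}, h={h:.3f}: estimate={est:.4f} (std error {se:.4f})")
```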
2.1 Monte Carlo Integration and Convergence of Error

Assume a probability distribution with density $f$. Then the expectation of a function $h$ is
$$E[h(X)] = \int_{\mathbb{R}} h(x) f(x)\,dx.$$
In the one-dimensional case, for a definite integral on some interval $I = [a, b]$, we use the uniform distribution with density
$$f = \frac{1}{b-a}\,\mathbf{1}_I = \frac{1}{d(I)}\,\mathbf{1}_I,$$
where $d(I)$ denotes the length of the interval $I$. Let
$$\alpha := d(I)\,E[h(X)] = \int_a^b h(x)\,dx.$$
For independent samples $X_i \sim U[a, b]$, the law of large numbers implies that the approximation
$$\hat{\alpha}_N := d(I)\,\frac{1}{N}\sum_{i=1}^N h(X_i) \to \alpha \quad \text{a.s. as } N \to \infty.$$

To generalize to the higher-dimensional case, let $I \subset \mathbb{R}^m$. We want to calculate the integral
$$\alpha^m := \int_I h(x)\,dx.$$
Again, we draw independent and uniformly distributed samples $X_1, \ldots, X_N \in I$, and we get the approximation
$$\hat{\alpha}^m_N := d_m(I)\,\frac{1}{N}\sum_{i=1}^N h(X_i),$$
where $d_m(I) < \infty$ now is the volume, or the $m$-dimensional Lebesgue measure, of $I$. Following the law of large numbers, $\hat{\alpha}^m_N$ converges almost surely to $\alpha^m = d_m(I)\,E[h(X)] = \int_I h(x)\,dx$ as $N \to \infty$.

Let
$$\delta_N := \int_I h(x)\,dx - \hat{\alpha}^m_N$$
be the error. Before deriving the variance of the error, let us examine the zero-mean and correlation properties. We have
$$
\begin{aligned}
\delta_N &= \int_I h(x)\,dx - d_m(I)\,\frac{1}{N}\sum_{i=1}^N h(X_i) \\
&= \frac{1}{N}\sum_{i=1}^N \left( \int_I h(x)\,dx - d_m(I)\,h(X_i) \right) \\
&= \frac{1}{N}\sum_{i=1}^N d_m(I) \left( \int_I h(x)\,\frac{1}{d_m(I)}\,dx - h(X_i) \right) \\
&= \frac{d_m(I)}{N}\sum_{i=1}^N \left( \int_I h(x)\,\frac{1}{d_m(I)}\,dx - h(X_i) \right).
\end{aligned}
$$
It is easy to show that $\int_I h(x)\,\frac{1}{d_m(I)}\,dx - h(X_i)$ has zero mean, and since $X_i$ and $X_j$ are independent for $i \neq j$, the terms $\int_I h(x)\,\frac{1}{d_m(I)}\,dx - h(X_i)$ and $\int_I h(x)\,\frac{1}{d_m(I)}\,dx - h(X_j)$ are uncorrelated. We have
$$E\left[ \int_I h(x)\,\frac{1}{d_m(I)}\,dx - h(X_i) \right] = 0$$
and
$$
E\left[ \left( \int_I h(x)\,\frac{1}{d_m(I)}\,dx - h(X_i) \right) \left( \int_I h(x)\,\frac{1}{d_m(I)}\,dx - h(X_j) \right) \right]
= E\left[ \int_I h(x)\,\frac{1}{d_m(I)}\,dx - h(X_i) \right] E\left[ \int_I h(x)\,\frac{1}{d_m(I)}\,dx - h(X_j) \right] = 0.
$$
Further, we can have a look at the variance of the error:
$$
\begin{aligned}
\mathrm{Var}(\delta_N) &= E[\delta_N^2] - (E[\delta_N])^2 \\
&= E[\delta_N^2] \\
&= \frac{(d_m(I))^2}{N^2}\sum_{i=1}^N E\left[ \left( \int_I h(x)\,\frac{1}{d_m(I)}\,dx - h(X_i) \right)^2 \right] \\
&= \frac{(d_m(I))^2}{N}\,\mathrm{Var}(h),
\end{aligned}
$$
where the variance of $h$ is
$$\mathrm{Var}(h) := \int_I h^2(x)\,\frac{1}{d_m(I)}\,dx - \left( \int_I h(x)\,\frac{1}{d_m(I)}\,dx \right)^2.$$
Thus, the standard deviation of the error $\delta_N$ tends to zero with the order
$$\epsilon_N := \sqrt{\mathrm{Var}(\delta_N)} = O(1/\sqrt{N}).$$
Square integrability of $h$ suffices ($h \in L^2$); the integrand $h$ need not be smooth (Seydel, 2009).
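As a small numerical illustration of the estimator $\hat{\alpha}^m_N$ and the $O(1/\sqrt{N})$ decay of the error, the following Python sketch integrates a function over a box with uniform samples. It is not part of the thesis; the integrand $h(x, y) = e^{-(x^2 + y^2)}$, the domain $I = [0, 1]^2$ and the sample sizes are arbitrary illustrative choices.

```python
import numpy as np

def mc_integrate(h, lower, upper, N, rng):
    """Monte Carlo estimate of the integral of h over the box [lower, upper] in R^m,
    using N uniform samples: alpha_hat = d_m(I) * (1/N) * sum_i h(X_i)."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    volume = np.prod(upper - lower)                         # d_m(I), the volume of the box
    x = rng.uniform(lower, upper, size=(N, len(lower)))     # uniform samples X_i in I
    values = h(x)
    estimate = volume * values.mean()
    std_error = volume * values.std(ddof=1) / np.sqrt(N)    # ~ d_m(I) * sqrt(Var(h)/N)
    return estimate, std_error

h = lambda x: np.exp(-np.sum(x**2, axis=1))   # integrand on R^2
rng = np.random.default_rng(seed=0)

# Quadrupling N roughly halves the standard error, matching the O(1/sqrt(N)) rate.
for N in [1_000, 4_000, 16_000, 64_000]:
    est, se = mc_integrate(h, lower=[0.0, 0.0], upper=[1.0, 1.0], N=N, rng=rng)
    print(f"N={N:6d}: estimate={est:.5f}, std error={se:.5f}")
```

The reported standard error is the empirical counterpart of $d_m(I)\sqrt{\mathrm{Var}(h)/N}$ derived above, and no smoothness of the integrand is needed for this rate, only square integrability.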