Regression-Based Monte Carlo For Pricing High-Dimensional American-Style Options

Niklas Andersson [email protected]

Umeå University, Department of Physics, April 7, 2016

Master’s Thesis in Engineering Physics, 30 hp. Supervisor: Oskar Janson ([email protected]). Examiner: Markus Ådahl ([email protected])

Abstract

Pricing different financial derivatives is an essential part of the financial industry. For some derivatives there exists a closed form solution; however, the pricing of high-dimensional American-style derivatives is still today a challenging problem. This project focuses on the derivative called option, and especially on the pricing of American-style basket options, i.e. options with both an early exercise feature and multiple underlying assets. In high-dimensional problems, which is definitely the case for American-style options, Monte Carlo methods are advantageous. Therefore, in this thesis, regression-based Monte Carlo has been used to determine early exercise strategies for the option. The well-known Least Squares Monte Carlo (LSM) algorithm of Longstaff and Schwartz (2001) has been implemented and compared to Robust Regression Monte Carlo (RRM) by C. Jonen (2011). The difference between these methods is that robust regression is used instead of least squares regression to calculate continuation values of American-style options. Since robust regression is more stable against outliers, this approach is claimed by C. Jonen to give better estimates of the option price. It was hard to compare the techniques without the duality approach of Andersen and Broadie (2004); therefore this method was added. The numerical tests then indicate that the exercise strategy determined using RRM produces a higher lower bound and a tighter upper bound compared to LSM. The difference between upper and lower bound could be up to 4 times smaller using RRM. Importance sampling and quasi Monte Carlo have also been used to reduce the variance in the estimation of the option price and to speed up the convergence rate.

Sammanfattning

Pricing different financial derivatives is an important part of the financial sector. For some derivatives a closed-form solution exists, but the pricing of high-dimensional American-style derivatives is still a challenging problem. This project focuses on the derivative called option and especially the pricing of American basket options, i.e. options that both can be exercised early and are written on several underlying assets. For high-dimensional problems, which is definitely the case for American-style options, Monte Carlo methods are advantageous. In this thesis, regression-based Monte Carlo has therefore been used to determine exercise strategies for the option. The well-known least squares Monte Carlo (LSM) algorithm of Longstaff and Schwartz (2001) has been implemented and compared with Robust Regression Monte Carlo (RRM) by C. Jonen (2011). The difference between the methods is that robust regression is used instead of the least squares method to calculate continuation values for American-style options. Since robust regression is more stable against outliers, C. Jonen claims that this method gives better estimates of the option price. It was hard to compare the techniques without the duality approach of Andersen and Broadie (2004); therefore this method was added. The numerical tests then indicate that the exercise strategy determined with RRM produces a higher lower bound and a tighter upper bound compared with LSM. The difference between the upper and lower bound could be up to 4 times smaller with RRM. Importance sampling and quasi Monte Carlo have also been used to reduce the variance in the estimate of the option price and to speed up the convergence rate.


Contents

1 Introduction
  1.1 Options
  1.2 Pricing Options

2 Theory
  2.1 Monte Carlo
    2.1.1 Kolmogorov’s strong law of large numbers
    2.1.2 Monte Carlo simulation
    2.1.3 Central Limit Theorem
    2.1.4 Error estimation
    2.1.5 Advantages of Monte Carlo
  2.2 Dynamics of the stock price
  2.3 Monte Carlo for pricing financial derivatives
  2.4 Robust Regression
  2.5 Duality approach
  2.6 Variance reduction
    2.6.1 Importance sampling
    2.6.2 Importance sampling in finance
  2.7 Quasi Monte Carlo
    2.7.1 Discrepancy and error estimation
    2.7.2 Sobol sequence
    2.7.3 Dimensionality Reduction

3 Method
  3.1 Algorithms
    3.1.1 LSM and RRM
    3.1.2 Duality approach

4 Results
  4.1 LSM vs RRM
    4.1.1 Duality approach
  4.2 Quasi MC
  4.3 Importance Sampling
  4.4 Combinations

5 Discussion

6 Appendix A1
  6.1 Importance sampling example
  6.2 Not only in-the-money paths

7 References

1 Introduction

After the financial crisis that started in 2007, the derivatives markets have been much criticized and many governments have introduced rules requiring some over-the-counter (OTC) derivatives to be cleared by clearing houses [1]. This thesis has been performed at Cinnober Financial Technology. Cinnober is an independent supplier of financial technology to marketplaces and clearing houses. As more and more complex derivatives are brought to the market, the requirements on the clearing houses to clear and evaluate them increase. Therefore, the pricing of different derivatives is an important part of the financial industry today. A financial derivative is a financial instrument that is built upon a more basic underlying variable like a bond, interest rate or stock. This project focuses on the derivative called option and specifically on the pricing of American-style multi-asset options using regression-based Monte Carlo methods.

1.1 Options

There are two different kinds of options, call options and put options. Buying a call option means buying a contract that gives the right to buy the underlying asset at a specified date, T, for a specified price, K. This is also referred to as taking a long position in a call option, and for this you pay a premium. If one instead takes a short position in a call option, one has the obligation to sell the underlying asset at the expiration date T for the strike price K, and for this obligation one receives the premium. On the other hand, a long position in a put option gives the right to sell the underlying asset, whereas the short position has the obligation to buy the underlying asset, see Table 1.

Table 1: Explanation of the states Long/Short in a Call/Put option.

Call    Long:  Gives the right to buy the underlying asset.
        ↓ premium
        Short: Obligation to sell if the long position chooses to exercise.

Put     Long:  Gives the right to sell the underlying asset.
        ↓ premium
        Short: Obligation to buy if the long position chooses to exercise.

The difference between an option and a future/forward contract is that the holder of an option has the right, but not the obligation, to do something, whereas the holder of a future/forward contract is bound by the contract. Let's take a call option on a stock S(t) with strike price K and expiration date T as an example. If the stock price S(T) exceeds the strike price K at the future time T > t, the holder can exercise the option with a profit of S(T) − K. If instead the stock price S(T) is less than or equal to the strike price K, the option is worthless for the holder. The payoff of a call option C with strike price K and expiration date T is thus,

C(S(T), K, T) = \max\{0, S(T) - K\}. \qquad (1)

Options that can only be exercised at the expiration date T are called European options and options that can be exercised at any time are called American options. There are also options whose value depends not only on the value of the underlying asset at expiration but on the whole path of the asset. The barrier option is an example of this kind of option. It either becomes active or stops being active if the underlying asset hits a pre-determined barrier during the lifetime of the option. A knock-in barrier option becomes active only if the barrier is crossed and a knock-out barrier option pays nothing if the barrier is crossed. The payoff of a down-and-in barrier call option with strike

price K, maturity date T and barrier H is,

C(S(T), K, H, T) = \mathbf{1}_{S(t) \le H} \cdot \max\{0, S(T) - K\}, \qquad (2)

where \mathbf{1}_{S(t) \le H} is 1 if S(t) \le H at any time in the interval t \in [0, T] and 0 otherwise.

There are also options on multiple underlying assets; these are called basket options. The payoff of a basket option is typically a function of either the maximum, minimum, arithmetic average or geometric average of the asset prices, see Table 2.

Table 2: Different payoffs of multi-asset Call and Put options with strike price K and D number of assets.

Type                 Call                                        Put

Maximum:             \max\{0, \max(s_1, \ldots, s_D) - K\}       \max\{0, K - \max(s_1, \ldots, s_D)\}
Minimum:             \max\{0, \min(s_1, \ldots, s_D) - K\}       \max\{0, K - \min(s_1, \ldots, s_D)\}
Geometric average:   \max\{0, (\prod_{i=1}^{D} s_i)^{1/D} - K\}  \max\{0, K - (\prod_{i=1}^{D} s_i)^{1/D}\}
Arithmetic average:  \max\{0, (\sum_{i=1}^{D} s_i)/D - K\}       \max\{0, K - (\sum_{i=1}^{D} s_i)/D\}
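As a concrete illustration of Table 2, the payoffs can be coded directly. The following Java sketch is my own (class and method names are not taken from the thesis implementation) and evaluates four of the listed payoffs for an asset price vector s and strike K.

import java.util.Arrays;

public class BasketPayoffs {
    // max{0, max(s_1,...,s_D) - K}
    static double maxCall(double[] s, double K) {
        return Math.max(0.0, Arrays.stream(s).max().getAsDouble() - K);
    }
    // max{0, K - min(s_1,...,s_D)}
    static double minPut(double[] s, double K) {
        return Math.max(0.0, K - Arrays.stream(s).min().getAsDouble());
    }
    // max{0, (prod s_i)^(1/D) - K}, computed in log space for numerical safety
    static double geometricCall(double[] s, double K) {
        double logSum = Arrays.stream(s).map(Math::log).sum();
        return Math.max(0.0, Math.exp(logSum / s.length) - K);
    }
    // max{0, (sum s_i)/D - K}
    static double arithmeticCall(double[] s, double K) {
        return Math.max(0.0, Arrays.stream(s).average().getAsDouble() - K);
    }
}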

1.2 Pricing Options

Pricing European options is often done using the well-known Black-Scholes-Merton model. This model has had a big influence on the financial market since it provides a theoretical price for European options. The importance of the model was recognized in 1997 when the inventors were awarded the Nobel Prize in economics. Pricing American options is harder than pricing European options; in fact, the pricing and optimal exercise of options with early exercise features is one of the most challenging problems in mathematical finance [12]. Before you can value this kind of option you need to determine when the option will be exercised. There has been a lot of research in this area trying to determine the optimal stopping strategy and it is one topic that this project will focus on. Regression-based Monte Carlo techniques for determining optimal stopping strategies had a breakthrough with the proposal of Longstaff and Schwartz in 2001 [3]. In their method least squares regression is used to determine exercise dates. C. Jonen published his dissertation in 2011 where he instead suggests using robust regression in the regression step. This project will focus on comparing the least squares method by Longstaff and Schwartz, and the robust regression method by C. Jonen [4].

As mentioned, there is no analytic formula for pricing an American option, so one instead turns to numerical methods, like the binomial model or the finite difference method, to solve the partial differential equation describing the price process of the option. These methods work very well for options that have only one underlying asset, but if the option is a multi-asset option with two or more underlying assets these methods lose their strength. Another approach to price options is Monte Carlo methods, and this is the approach that will be used in this project. Instead of finding a solution for the price numerically, we simulate the paths of the underlying assets using a stochastic representation and value the option from these paths. We repeat this procedure many times, calculate the average of all these simulations and use this as an estimate of the option price.

An advantage of Monte Carlo is that it is a relatively easy procedure which can be applied to price various kinds of derivatives. The disadvantage is that it is a computationally costly method, since stochastic paths have to be calculated, which implies a slow convergence rate. However, the convergence rate O(1/\sqrt{n}) holds for all dimensions, in contrast to numerical methods where the convergence rate decreases with the number of dimensions. American-style basket options incorporate

two sources of high-dimensionality, the underlying stocks and time to maturity. Therefore, Monte Carlo methods are often the method of choice when pricing American-style basket options [2].

For the most part there is nothing or very little that can be done about the rather slow convergence rate of Monte Carlo; under appropriate conditions quasi Monte Carlo is an exception. We can, however, look for better sampling techniques and variance reduction methods to reduce the variance in the estimation. There are many different types of variance reduction techniques; antithetic variates, control variates, importance sampling, stratified sampling and quasi Monte Carlo are some well-known techniques. Which method is best may vary depending on the type of option. In this project the use of importance sampling will be examined; this is a more complex method and requires some knowledge of the underlying problem, but it has the capacity to produce orders of magnitude of variance reduction. The effect of quasi Monte Carlo will also be studied.

2 Theory

Before I explain how to use Monte Carlo for financial applications I will start this section by giving the basic theory for Monte Carlo methods in general; this is handled in subsection 2.1. Subsections 2.2 and 2.3 are about how to model stock prices and how to use regression-based Monte Carlo to determine exercise dates for options. The basic idea of robust regression and the extension to duality methods are explained in subsections 2.4 and 2.5, and I will end this section with some theory of importance sampling and quasi Monte Carlo.

2.1 Monte Carlo

2.1.1 Kolmogorov’s strong law of large numbers

This law is the main justification of Monte Carlo methods and states that the average of a sequence of iid, i.e. independent identically distributed, variates will converge almost surely to the expected value. Given a sequence of iid variates \zeta_i with expectation

E[\zeta_i] = \mu, \qquad (3)

define the average as

\bar{X}_N = \frac{1}{N} \sum_{i=1}^{N} \zeta_i. \qquad (4)

Then this average will converge almost surely to \mu,

\bar{X}_N \xrightarrow{a.s.} \mu \quad \text{as } N \to \infty. \qquad (5)

2.1.2 Monte Carlo simulation

Let’s consider the following integral,

\theta = \int_0^1 f(x)\,dx = [F(x)]_0^1. \qquad (6)


If the function f(x) is a complicated function and it is hard to find the primitive function F(x), we can instead use Monte Carlo to approximate the integral with \tilde{\theta}_N = \frac{1}{N} \sum_{i=1}^{N} f(U_i), where U_i \sim U(0,1) are random uniform variables with density

g_U(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise.} \end{cases} \qquad (7)

To understand that \tilde{\theta}_N is an estimator of \theta, let's first show that E[f(U)] = \int_0^1 f(x)\,dx:

E[f(U)] = \int_{\mathbb{R}} f(x)\,g_U(x)\,dx = \int_{-\infty}^{0} f(x) \cdot 0\,dx + \int_0^1 f(x) \cdot 1\,dx + \int_1^{\infty} f(x) \cdot 0\,dx = \int_0^1 f(x)\,dx. \qquad (8)

The next step is to use Kolmogorov's strong law of large numbers:

\tilde{\theta}_N = \frac{1}{N} \sum_{i=1}^{N} f(U_i) \xrightarrow{a.s.} E[f(U)] = \theta \quad \text{as } N \to \infty. \qquad (9)

Hence, by simulating U_1, U_2, \ldots, U_N, evaluating the function at these points and averaging, we get an unbiased estimator of \theta.

2.1.3 Central Limit Theorem

Let \zeta_1, \zeta_2, \ldots, \zeta_N be iid random variables with E[\zeta_i] = \mu and Var[\zeta_i] = \sigma^2 < \infty. Then the composite variable X_N converges in distribution to the standard normal as N increases,

X_N = \frac{\frac{1}{N} \sum_{i=1}^{N} \zeta_i - \mu}{\sigma/\sqrt{N}} \xrightarrow{d} \mathcal{N}(0,1) \quad \text{as } N \to \infty. \qquad (10)

2.1.4 Error estimation

Given the Monte Carlo estimation in (9) the central limit theorem gives

\frac{\tilde{\theta}_N - \theta}{\sigma/\sqrt{N}} \xrightarrow{d} \mathcal{N}(0,1) \quad \text{as } N \to \infty, \qquad (11)

where \sigma is the standard deviation of f(U). Since \sigma is unknown we have to approximate it with s,

s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (f(U_i) - \tilde{\theta}_N)^2}. \qquad (12)

By inserting s instead of \sigma in (11) we see that \tilde{\theta}_N \xrightarrow{d} \mathcal{N}(\theta, \frac{s^2}{N}); this leads to the definition of the standard error SE, which is a measure of the deviation in \tilde{\theta}_N:

SE = \sqrt{Var(\tilde{\theta}_N)} = \frac{s}{\sqrt{N}}. \qquad (13)

We can now create a 100(1 - \alpha)\% confidence interval for the estimator \tilde{\theta}_N as

\left( \tilde{\theta}_N - z_{1-\alpha/2} \cdot SE;\; \tilde{\theta}_N + z_{1-\alpha/2} \cdot SE \right), \qquad (14)

where z_{1-\alpha/2} denotes the (1 - \alpha/2) quantile of the normal distribution and z_{1-0.05/2} \approx 1.96 for a 95\% confidence interval.
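To make the estimator (9) and the error quantities (12)-(14) concrete, here is a minimal Java sketch; the integrand f(x) = e^{-x^2} and the seed are arbitrary illustrations of mine, not values from the thesis.

import java.util.Random;

public class MonteCarloIntegral {
    public static void main(String[] args) {
        int n = 1_000_000;
        Random rng = new Random(42);           // fixed seed for reproducibility
        double sum = 0.0, sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double u = rng.nextDouble();       // U ~ U(0,1)
            double f = Math.exp(-u * u);       // example integrand f(x) = exp(-x^2)
            sum += f;
            sumSq += f * f;
        }
        double theta = sum / n;                                      // estimator (9)
        double s = Math.sqrt((sumSq - n * theta * theta) / (n - 1)); // sample std dev (12)
        double se = s / Math.sqrt(n);                                // standard error (13)
        System.out.printf("theta = %.6f, 95%% CI = [%.6f, %.6f]%n",  // interval (14)
                theta, theta - 1.96 * se, theta + 1.96 * se);
    }
}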


2.1.5 Advantages of Monte Carlo

The integral in (6) can also be calculated numerically using for example the trapezoidal rule,

\tilde{\theta}_N^{t.r.} = \frac{f(0) + f(1)}{2N} + \frac{1}{N} \sum_{i=1}^{N-1} f(i/N). \qquad (15)

The error from using the trapezoidal rule is of order O(N^{-2}), which is better than the Monte Carlo estimate, O(N^{-1/2}), so why do we use Monte Carlo? The strength of Monte Carlo is that the error is of order O(N^{-1/2}) regardless of the dimensionality of the problem, whereas the error for the trapezoidal rule is of order O(N^{-2/d}) in d dimensions. This degradation in convergence rate as the dimension of the problem increases is common for all deterministic integration methods [2]. Hence, Monte Carlo methods are superior for evaluating integrals in higher dimensions.

2.2 Dynamics of the stock price

In order to use Monte Carlo methods to price options we need to generate paths of the underlying assets, that is, the stocks. One common assumption in financial applications is that the evolution of the stock price can be represented by a stochastic process, in particular the geometric Brownian motion under risk-neutral valuation,

\frac{dS(t)}{S(t)} = (r - \delta)\,dt + \sigma\,dW(t), \qquad (16)

where r is the risk-free interest rate, \delta the yearly dividend yield, \sigma the volatility and W is a Brownian motion. A standard Brownian motion (or Wiener process) is defined by the following conditions:

1. W(0) = 0.
2. The process W has independent increments, i.e. if r < s < t < u then W(u) − W(t) and W(s) − W(r) are independent stochastic variables.
3. For s < t the stochastic variable W(t) − W(s) has a Gaussian distribution N(0, \sqrt{t-s}).
4. W has continuous trajectories.

The solution of (16) above is given by,

S(t_{l+1}) = S(t_l)\, e^{(r - \delta - \frac{1}{2}\sigma^2)\Delta t + \sigma \Delta W}, \qquad (17)

and due to conditions 2 and 3 of the Brownian motion it is possible to simulate the value of a stock at discrete time steps l = 0, \ldots, L-1 as

S(t_{l+1}) = S(t_l)\, e^{(r - \delta - \frac{1}{2}\sigma^2)\Delta t + \sigma Z(l+1)}, \qquad (18)

where Z(1), Z(2), \ldots are independent N(0, \sqrt{\Delta t}) variables. For a derivation of equations (16) and (17) I recommend the book "Arbitrage Theory in Continuous Time" by Tomas Björk [13].
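A minimal Java sketch of the recursion (18) might look as follows; parameter names are illustrative and java.util.Random stands in for whatever normal generator is used in practice.

import java.util.Random;

public class GbmPath {
    // One path of (18): S_{l+1} = S_l * exp((r - delta - 0.5*sigma^2)*dt + sigma*sqrt(dt)*Z),
    // with Z standard normal, so that sigma*sqrt(dt)*Z ~ N(0, sigma*sqrt(dt)).
    static double[] simulate(double s0, double r, double delta, double sigma,
                             double T, int steps, Random rng) {
        double dt = T / steps;
        double drift = (r - delta - 0.5 * sigma * sigma) * dt;
        double vol = sigma * Math.sqrt(dt);
        double[] path = new double[steps + 1];
        path[0] = s0;
        for (int l = 0; l < steps; l++) {
            path[l + 1] = path[l] * Math.exp(drift + vol * rng.nextGaussian());
        }
        return path;
    }
}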

To generate paths of multiple assets the multidimensional Geometric Brownian Motion is used

\frac{dS_i(t)}{S_i(t)} = (r - \delta_i)\,dt + \sigma_i\,dX_i(t), \quad i = 1, \ldots, d, \qquad (19)


where r is the risk-free interest rate, and \delta_i and \sigma_i are the yearly dividend yield and the volatility of the i-th asset S_i. X is the d-dimensional Brownian motion with covariance matrix \Sigma_{i,j} = \rho_{i,j} \sigma_i \sigma_j. In the multidimensional Brownian motion the increments are multivariate normally distributed with covariance matrix \Sigma, X(t) - X(s) \sim N(\vec{0}, (t-s)\Sigma). Hence, the following equation can be used for generating paths of multiple assets:

S_i(t_{l+1}) = S_i(t_l)\, e^{(r - \delta_i - \frac{1}{2}\sigma_i^2)\Delta t + \sqrt{\Delta t}\, \sum_{j=1}^{d} A_{ij} Z_{l+1,j}}, \quad i = 1, \ldots, d, \qquad (20)

where Z_l = (Z_{l1}, \ldots, Z_{ld}) \sim N(0, I_d) and A is chosen to be the Cholesky factor of the covariance matrix \Sigma, satisfying AA^T = \Sigma.

For more information and algorithms on how to model dependencies between variables I refer to Glasserman [2] or Jäckel [5].
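The path generation in (20) can be sketched in Java as below; the small Cholesky routine assumes \Sigma is symmetric positive definite, and all names are my own rather than from the thesis code.

import java.util.Random;

public class MultiAssetGbm {
    // Cholesky factor A of a symmetric positive definite matrix, A*A^T = Sigma.
    static double[][] cholesky(double[][] sigma) {
        int d = sigma.length;
        double[][] a = new double[d][d];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j <= i; j++) {
                double s = sigma[i][j];
                for (int k = 0; k < j; k++) s -= a[i][k] * a[j][k];
                a[i][j] = (i == j) ? Math.sqrt(s) : s / a[j][j];
            }
        }
        return a;
    }

    // One time step of (20) for all d assets; A already contains the volatilities,
    // since Sigma_{ij} = rho_{ij} * sigma_i * sigma_j.
    static void step(double[] s, double rate, double[] delta, double[] vol,
                     double[][] a, double dt, Random rng) {
        int d = s.length;
        double[] z = new double[d];
        for (int j = 0; j < d; j++) z[j] = rng.nextGaussian();   // Z ~ N(0, I_d)
        for (int i = 0; i < d; i++) {
            double corr = 0.0;
            for (int j = 0; j < d; j++) corr += a[i][j] * z[j];  // sum_j A_ij Z_j
            s[i] *= Math.exp((rate - delta[i] - 0.5 * vol[i] * vol[i]) * dt
                             + Math.sqrt(dt) * corr);
        }
    }
}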

2.3 Monte Carlo for pricing financial derivatives

In the world of finance there is a great need to calculate expectations and determine future values of different derivatives. The underlying processes are often simulated using stochastic differential equations, and it is common that, if the expectations are written as integrals, the dimensions are very large. Depending on the problem, the dimensionality is often at least as large as the number of time steps. The strength of Monte Carlo is that it is a very attractive method for solving problems with high dimensionality, and therefore the method is widely used in financial engineering.

To price a financial derivative by Monte Carlo one usually follows these steps,

• Simulate paths of the underlying assets. • Evaluate the discounted payoffs from the simulated paths. • Use the average of these discounted payoffs as an estimate of the derivative price.

However, the early exercise feature of an American-style option makes valuation harder, and before we can price an American-style option we need a way to determine when the option will be exercised. When using Monte Carlo we simulate paths of the underlying asset as described in section 2.2 and use these to value the option; hence it may seem an easy problem to determine when the option should be exercised. However, using knowledge of the asset paths and exercising at the optimum is referred to as perfect foresight and tends to overestimate the option price. To better estimate the exercise time one can use parametric approximation to determine exercise regions before the simulation. But in higher-dimensional problems the optimal exercise regions can be hard to approximate using this technique [2, p. 427].

Another approach to estimate the value of American options is regression-based Monte Carlo methods. At each exercise time the holder of the option compares the value of exercising the option immediately with the expected payoff of continuation. The value of exercising the option immediately at time t is called the intrinsic value, whereas the expected payoff of holding on to the option is called the continuation value. The key insight in regression-based Monte Carlo methods is that the continuation value can be estimated using regression. Before I go on and describe the method, let's begin with some assumptions [4].

• (\Omega, \mathcal{F}, \tilde{P}) is a complete probability space, where the time horizon [0, T] is finite and F = \{\mathcal{F}_t \mid 0 \le t \le T\} is the filtration with the \sigma-algebra \mathcal{F}_t at time t.


• There are no arbitrage opportunities in the market and the market is complete. This implies the existence of a unique martingale measure P, which is equivalent to \tilde{P}.

• B_t denotes the value at time t of 1 money unit invested in a riskless money market account at time t = 0, i.e. B_t is described by

dB_t = r_t B_t\,dt, \quad B_0 = 1,

where r_t is the risk-free interest rate at time t. Then D_{s,t} denotes the discount factor given by D_{s,t} = B_s/B_t, s, t \in [0, T]. In the special case of a constant risk-free rate r, we have

B_t = e^{rt} \quad \text{and} \quad D_{s,t} = e^{-r(t-s)}.

Furthermore, we let the underlying asset \{S(t), 0 \le t \le T\} be a Markov process that contains all necessary information about the asset price, and we restrict ourselves to the valuation of Bermudan options, i.e. options that can only be exercised at a fixed set of exercise opportunities t_1 < t_2 < \cdots < t_L; we denote by S_l the state of the Markov process at time step l. In the following, the conditional expectation given the Markov process up until time step l is denoted E_l[\cdot] = E[\cdot \mid S_{t_l}]. A fair price of a Bermudan option at time t_0 is then given by the optimal stopping problem

\sup_{\tau \in \mathcal{T}_{0,L}} E_0[D_{0,\tau} Z_\tau], \qquad (21)

where \mathcal{T}_{0,L} is the set of all stopping times with values in \{0, \ldots, L\} and (Z_l)_{0 \le l \le L} is an adapted payoff process. Arbitrage reasoning justifies calling this a fair price for the option [2]. If we let V_l(s) denote the value of the option at t_l given S_l = s, we are interested in the value V_0(S_0), which may be determined recursively as follows:

V_L = Z_L, \qquad (22)
V_l = \max(Z_l, E_l[D_{l,l+1} V_{l+1}]), \quad l = L-1, \ldots, 0.

This is the dynamic programming principle (DPP) in terms of the value process V_l, which is a natural ansatz to solve the optimal stopping problem in (21). We start at the last time step because we know that the value of the American option at this time step equals the value of the European option, which can be determined.

The dynamic programming recursions (22) focus on the option value, but sometimes it is more convenient to work with the optimal stopping time instead.

\tau_L^* = L,
\tau_l^* = \begin{cases} l, & Z_l \ge E_l[D_{l,\tau_{l+1}^*} Z_{\tau_{l+1}^*}] \\ \tau_{l+1}^*, & \text{otherwise,} \end{cases} \quad l = L-1, \ldots, 0. \qquad (23)

Following the DPP and using a set of N simulated paths \{S_{nl}\}_{n=1,\ldots,N,\ l=0,\ldots,L}, the value at each exercise date for each path can be determined by

V_L^n = Z_L^n, \qquad (24)
V_l^n = \max(Z_l^n, C_l(S_{nl})), \quad l = L-1, \ldots, 0,

where C_l(s) denotes the continuation value in state s and at time step l,

C_l(s) = E_l[D_{l,l+1} V_{l+1}(S_{l+1})], \qquad (25)

for l = 0, \ldots, L-1. The idea of regression-based Monte Carlo methods is to estimate a model function for the continuation value via regression. A model function \hat{C}_l for the continuation value at time step l is given by a linear combination of M basis functions \phi_m(\cdot), i.e.

\hat{C}_l(s) = \sum_{m=1}^{M} \hat{\beta}_m \phi_m(s), \qquad (26)

where the coefficients \hat{\beta}_m might be determined by solving the least squares problem

\min_{\hat{\beta} \in \mathbb{R}^M} \| C_l - \hat{C}_l \|_2^2. \qquad (27)

Numerically the problem in (27) can be solved by

\min_{\hat{\beta} \in \mathbb{R}^M} \frac{1}{N} \sum_{n=1}^{N} \left( C_l^n - \hat{C}_l(S_{nl}) \right)^2, \qquad (28)

where N is the number of paths, \hat{C}_l(S_{nl}) = \sum_{m=1}^{M} \hat{\beta}_m \phi_m(S_{nl}) for n = 1, \ldots, N, and the C_l^n are the realizations of the continuation value of each path n,

C_l^n = e^{-r\Delta t} V_{l+1}^n = e^{-r\Delta t} \max\{ Z_{l+1}^n, \hat{C}_{l+1}(S_{n,l+1}) \}, \qquad (29)

where Z_l^n is the payoff at time step l of path n. Following this procedure gives an approximation of the option price today, i.e.

\hat{V}_0 = \max\left\{ Z_0, \frac{1}{N} \sum_{n=1}^{N} e^{-r\Delta t} V_1^n \right\}. \qquad (30)
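The regression step (28) amounts to an ordinary least squares fit. Below is a minimal Java sketch of my own that solves the normal equations (\Phi^T \Phi)\beta = \Phi^T c; it uses unpivoted Gaussian elimination for brevity, whereas a production system would rather use a QR or SVD solver.

public class RegressionStep {
    // phi[n][m] = phi_m(S_nl) over the regression paths, c[n] = C_l^n as in (29)/(32).
    static double[] fit(double[][] phi, double[] c) {
        int n = phi.length, m = phi[0].length;
        double[][] ata = new double[m][m + 1];      // augmented [Phi^T Phi | Phi^T c]
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < m; j++)
                for (int k = 0; k < n; k++) ata[i][j] += phi[k][i] * phi[k][j];
            for (int k = 0; k < n; k++) ata[i][m] += phi[k][i] * c[k];
        }
        for (int p = 0; p < m; p++) {               // forward elimination (no pivoting)
            for (int i = p + 1; i < m; i++) {
                double f = ata[i][p] / ata[p][p];
                for (int j = p; j <= m; j++) ata[i][j] -= f * ata[p][j];
            }
        }
        double[] beta = new double[m];
        for (int i = m - 1; i >= 0; i--) {          // back substitution
            double s = ata[i][m];
            for (int j = i + 1; j < m; j++) s -= ata[i][j] * beta[j];
            beta[i] = s / ata[i][i];
        }
        return beta;
    }
}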

Estimating option prices through (30) has, however, been shown to give poor estimates, and a breakthrough for regression-based Monte Carlo came with the least squares Monte Carlo (LSM) technique proposed by Longstaff and Schwartz in 2001 [3]. They work with the DPP in terms of the optimal stopping time as in (23), and hence try to determine the optimal stopping time for each simulated path by

\tau_L^n = L,
\tau_l^n = \begin{cases} l, & Z_l(S_{nl}) \ge \hat{C}_l(S_{nl}) \\ \tau_{l+1}^n, & \text{otherwise,} \end{cases} \quad l = L-1, \ldots, 1, \qquad (31)

where the dependent variable C_l^n in the regression step is determined by

C_l^n = e^{-r(\tau_{l+1}^n - l)\Delta t} Z_{\tau_{l+1}^n}, \quad n = 1, \ldots, N, \quad l = 1, \ldots, L, \qquad (32)

instead of as in (29). Once the optimal stopping time for each path has been determined, the option value can be estimated as

\hat{V}_0 = \max\left\{ Z_0, \frac{1}{N} \sum_{n=1}^{N} e^{-r\tau_1^n \Delta t} Z_{\tau_1^n} \right\}. \qquad (33)

To evaluate \hat{V}_0 above one can either use the same set of paths that determined the stopping strategy, or one might simulate a new set of paths and exercise according to the determined exercise strategy. Longstaff and Schwartz recommend using only in-the-money paths in estimating the continuation value, i.e. only paths that can be exercised with a profit are considered in the regression step.


2.4 Robust Regression

The idea of robust regression is to take outliers into account when calculating the continuation value in the least squares Monte Carlo approach. Therefore the minimization problem in (28) is replaced with

\min_{\hat{\beta} \in \mathbb{R}^M} \frac{1}{N} \sum_{n=1}^{N} \ell\left( C_l^n - \hat{C}_l(S_{nl}) \right), \qquad (34)

where \ell is a suitable loss function. Thus, in order to implement the robust regression technique in an already running LSM system, we only replace the solver for the regression. In the following, the residuals r_l^n are denoted

r_l^n = C_l^n - \hat{C}_l(S_{nl}), \quad l = 1, \ldots, L-1, \quad n = 1, \ldots, N.

The loss functions that will be used in this project are the ordinary least squares (OLS), Huber and Jonen loss functions, see Table 3. To get an intuition of the properties of these loss functions, Figure 1 shows their graphs.

Table 3: Different loss functions \ell(\cdot) for robust regression.

OLS:   \ell(r) = r^2

Huber: \ell(r) = \begin{cases} r^2, & |r| \le \gamma_1 \\ 2\gamma_1 |r| - \gamma_1^2, & |r| > \gamma_1 \end{cases}

Jonen: \ell(r) = \begin{cases} r^2, & |r| \le \gamma_1 \\ 2\gamma_1 |r| - \gamma_1^2, & \gamma_1 < |r| < \gamma_2 \\ 2\gamma_1\gamma_2 - \gamma_1^2, & |r| \ge \gamma_2 \end{cases}

Figure 1: Illustration of the different loss functions for robust regression, γ1 and γ2 are transition points.


The idea of robust regression is to get a better approximation of the continuation value by giving outliers less weight in the regression step. Outliers arise from strongly fluctuating paths, and the determination of outliers is not trivial [4, p. 21]. The transition points \gamma_1 and \gamma_2 determine which points are considered outliers. The outlier detection procedure recommended by C. Jonen is the following:

r_n^{help} = |r_n|, \quad n = 1, \ldots, N, \quad r^{help} = (r_1^{help}, \ldots, r_N^{help}),

r^{help} = \text{sort}(r^{help}), \qquad (35)

\gamma_1 = r^{help}_{\lfloor \alpha N \rfloor}, \quad 0 \ll \alpha < 1,

\gamma_2 = r^{help}_{\lfloor \beta N \rfloor}, \quad \alpha < \beta < 1.

The assumption is hence that (1 - \alpha) \cdot 100 percent of the data points are outliers. The reason that this empirical \alpha-quantile procedure is suggested is that we are not familiar with any distribution of the error [4].
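A minimal Java sketch of the loss functions in Table 3 and of the empirical quantile selection (35) of the transition points could look as follows; the 0-based array indexing of \lfloor \alpha N \rfloor is an implementation choice of mine.

import java.util.Arrays;

public class RobustLoss {
    // Huber loss with transition point gamma1, cf. Table 3.
    static double huber(double r, double g1) {
        double a = Math.abs(r);
        return a <= g1 ? r * r : 2 * g1 * a - g1 * g1;
    }
    // Jonen loss with transition points gamma1 < gamma2, cf. Table 3.
    static double jonen(double r, double g1, double g2) {
        double a = Math.abs(r);
        if (a <= g1) return r * r;
        if (a < g2)  return 2 * g1 * a - g1 * g1;
        return 2 * g1 * g2 - g1 * g1;
    }
    // Empirical quantile selection of gamma1 and gamma2 as in (35).
    static double[] transitionPoints(double[] residuals, double alpha, double beta) {
        double[] abs = Arrays.stream(residuals).map(Math::abs).sorted().toArray();
        int n = abs.length;
        return new double[] { abs[(int) Math.floor(alpha * n)],
                              abs[(int) Math.floor(beta * n)] };
    }
}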

2.5 Duality approach

Andersen and Broadie introduced their Primal-Dual Simulation Algorithm in 2004 [9], which today is a valuable extension to regression-based Monte Carlo methods. Their method combined with the LSM method of Longstaff and Schwartz [3] is often the method of choice and is implemented in many option pricing systems of financial institutions today [4]. Following the strategy of Longstaff and Schwartz as in (31) usually produces a low estimator of the option price. This is because the value of the option V_0 is given by the supremum in (21), which is achieved by an optimal stopping time \tau^*; hence following the procedure in (31) produces a low estimate of the option price,

L_0 := E_0[D_{0,\tau} Z_\tau] \le V_0 = \sup_{\tau \in \mathcal{T}_{0,L}} E_0[D_{0,\tau} Z_\tau]. \qquad (36)

The idea of Andersen and Broadie's approach is to create a high estimate, constricting the fair option price V_0 between the low and high estimates.

A consequence of the DPP (22) is that

V_l \ge E_l[D_{l,l+1} V_{l+1}(S_{l+1})], \qquad (37)

for all l = 0, \ldots, L-1. This is the defining property of a supermartingale. For more theory on martingales I refer to [13]. Due to (37) we can derive a dual problem for pricing options with an early exercise feature as follows [4].

Let H^1 be the space of martingales M = (M_l)_{0 \le l \le L} for which \sup_{0 \le l \le L} |M_l| \in L^p, where L^p denotes the space of all random variables with finite p-th moment, p \ge 1. For any martingale M \in H^1, we have

\sup_{\tau \in \mathcal{T}_{1,L}} E_0[D_{0,\tau} Z_\tau] = \sup_{\tau \in \mathcal{T}_{1,L}} E_0[D_{0,\tau} Z_\tau - M_\tau + M_\tau]
= \sup_{\tau \in \mathcal{T}_{1,L}} E_0[D_{0,\tau} Z_\tau - M_\tau] + M_0 \qquad (38)
\le E_0[\max_{l=1,\ldots,L} (D_{0,l} Z_l - M_l)] + M_0,

where the second equality follows from the martingale property and the optional sampling theorem. As the upper bound holds for any martingale M, the option value might be estimated by

\sup_{\tau \in \mathcal{T}_{1,L}} E_0[D_{0,\tau} Z_\tau] \le \inf_{M \in H^1} \left( E_0[\max_{l=1,\ldots,L} (D_{0,l} Z_l - M_l)] + M_0 \right). \qquad (39)

The right-hand side of (39) is called a dual problem for pricing options with an early exercise feature. One can show that (39) holds with equality if the optimal martingale is found; to show this one may consider the Doob-Meyer decomposition of the supermartingale (D_{0,l} V_l)_{0 \le l \le L}, see Rogers [14]. Finding the optimal martingale appears to be as difficult as solving the original stopping problem, but if we can find a martingale \hat{M} that is close to the optimal martingale, we can use

E_0[\max_{l=1,\ldots,L} (D_{0,l} Z_l - \hat{M}_l)] + \hat{M}_0 \qquad (40)

to estimate an upper bound for the option price. A suboptimal exercise policy provides a lower bound, and by extracting a martingale from this suboptimal policy the dual value complements the lower bound with an upper bound. Thus the strategy becomes [4]:

1. Find a stopping policy \tau with values in \{1, \ldots, L\} and a martingale \hat{M} that are optimal in the sense that they are good approximations to the optimal stopping time \tau_1^* and the optimal martingale M^*, respectively.

2. Calculate a lower bound L_0 and an upper bound U_0 for the fair value of an American-style option by

L_0 := E_0[D_{0,\tau} Z_\tau] \le \sup_{\tau \in \mathcal{T}_{1,L}} E_0[D_{0,\tau} Z_\tau] \le E_0[\max_{l=1,\ldots,L} (D_{0,l} Z_l - \hat{M}_l)] + \hat{M}_0 =: U_0. \qquad (41)

The algorithm for this approach and an example of how to find a martingale are described in section 3.1.2. When pricing high-dimensional options it is often hard to calculate any benchmark values; therefore the method of Andersen and Broadie is often implemented so that we are not in the dark with our calculations. Another reason for using this method is that the difference between the lower and upper bound can be seen as a measure of how well we have estimated the optimal stopping time: the smaller the difference, the better the estimate of the optimal stopping time. Therefore we can use this method to compare and evaluate least squares Monte Carlo (LSM) against robust regression Monte Carlo (RRM).

2.6 Variance reduction

Let \theta be the parameter of interest, such that \theta = E[X], where we can simulate X_1, X_2, \ldots, X_N and use the average of these as an estimator \hat{\theta}_N of \theta. To make \hat{\theta}_N a good estimator we need the standard error SE = \sqrt{Var(X)/N} to be small. This can be done by increasing the number of samples N, but we notice that in order to increase the precision by a factor of 2 we need to increase the number of samples by a factor of 4. Another strategy is to work with different variance reduction techniques. We cannot decrease Var(X) directly, but we can find another variable Y such that E[Y] = E[X] but Var[Y] < Var[X]. There are many different variance reduction techniques that can be used: antithetic variables, control variates and stratified sampling, to mention some. The method that will be examined and used in this project is importance sampling, which is a more complex method and requires some knowledge of the particular problem, but it has the capacity to reduce the variance significantly.


2.6.1 Importance sampling

The standard Monte Carlo approach will have problems estimating \theta = E_f[h(X)] = \int h(x) f(x)\,dx if it is very unlikely that h(x) is obtained under the probability density f(x), i.e. only in very rare cases do we observe h(x). If we instead can find a probability density g(x) that makes it more likely that we obtain h(x), we observe that

\theta = E_f[h(X)] = \int h(x) f(x)\,dx = \int \frac{h(x) f(x)}{g(x)}\, g(x)\,dx = E_g\!\left[ \frac{h(X) f(X)}{g(X)} \right],

where X has the new density g instead of f. So instead of using \hat{\theta} = \frac{1}{N} \sum_{i=1}^{N} h(X_i), where the X_i are sampled from the density f(x), we can use

\hat{\theta}_{IS} = \frac{1}{N} \sum_{i=1}^{N} \frac{h(X_i) f(X_i)}{g(X_i)}

as our estimator, where the X_i are sampled under g(x). The weight f(X_i)/g(X_i) is called the likelihood ratio, and we should try to find g(x) \propto h(x) f(x) to obtain a large variance reduction using this technique.

In order to select a proper density g, the exponentially tilted density is often used,

g(x) = f_t(x) = \frac{e^{tx} f(x)}{M(t)}, \qquad (42)

where -\infty < t < \infty and M(t) is the moment generating function M(t) = E[e^{tX}]. See the Appendix for an example of how to use importance sampling.

2.6.2 Importance sampling in finance

The idea of importance sampling here is to change the drift of the underlying assets and drive paths into important regions. We would like to drive the paths in such a way that early exercise is enforced and zero-paths vanish, i.e. paths that lie out of the money at maturity. To get a feeling for how importance sampling can be used and the advantages of the method, I have created the plots in Figure 2, where the paths in the left figure have a change of drift such that they go down to a barrier and up to the strike in 50 time steps. This can be compared to the figure where no importance sampling is used. If we, for example, want to calculate the price of a down-and-in barrier option it would of course be favorable to use the paths in the left figure.


Figure 2: Simulated price paths of 10 stocks with importance sampling to the left and without to the right.

Suppose the underlying asset S is modeled through the geometric Brownian motion GBM(\mu, \sigma^2) with drift \mu and volatility \sigma. We can write this as

S(t_n) = S(0)\, e^{L_n}, \quad L_n = \sum_{i=1}^{n} X_i, \qquad (43)

with X_i \sim N(\tilde{\mu}, \tilde{\sigma}^2), where \tilde{\mu} = (\mu - 0.5\sigma^2)\Delta t, \tilde{\sigma}^2 = \sigma^2 \Delta t and \Delta t = t_{n+1} - t_n. From this we know that the probability density function of X_i is

f(x) = \frac{1}{\tilde{\sigma}\sqrt{2\pi}}\, e^{-\frac{(x - \tilde{\mu})^2}{2\tilde{\sigma}^2}}. \qquad (44)

If we then change the drift \tilde{\mu} according to any proposed strategy and use the drift \tilde{\mu}^*, the probability density function for this new drift becomes

g(x) = \frac{1}{\tilde{\sigma}\sqrt{2\pi}}\, e^{-\frac{(x - \tilde{\mu}^*)^2}{2\tilde{\sigma}^2}}. \qquad (45)

The likelihood ratio for this change of measure is

\prod_{i=1}^{n} \frac{f(X_i)}{g(X_i)}. \qquad (46)

Assuming that the expectation E_f exists and is finite, the expectation of any function h can then be calculated as

E_f[h(S_{t_1}, S_{t_2}, \ldots, S_{t_n})] = E_g\!\left[ h(S_{t_1}, S_{t_2}, \ldots, S_{t_n}) \prod_{i=1}^{n} \frac{f(X_i)}{g(X_i)} \right], \qquad (47)

where E_f denotes the expectation under the original measure and E_g the expectation under the new measure. For more information and examples on how to use importance sampling, see [2].

To obtain a large variance reduction using this technique we need to optimize the change of drift for the specified problem; how to optimize the change of drift can be seen in [4]. In this project a simpler approach will be investigated, where the paths of an out-of-the-money option are driven into in-the-money regions. Once the paths are in the money they evolve according to the ordinary geometric Brownian motion.
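A small Java sketch of the path weight (46) for such a drift change is given below; it assumes normal increments with a common standard deviation \tilde{\sigma} and accumulates the product of density ratios in log space for numerical stability. Names are my own.

public class DriftChangeWeight {
    // Likelihood ratio (46) for n normal increments x[i] simulated under the new
    // drift muStar but weighted back to the original drift mu (equal variances).
    static double likelihoodRatio(double[] x, double mu, double muStar, double sigma) {
        double logW = 0.0;
        for (double xi : x) {
            // log f(xi) - log g(xi) for two normal densities with equal variance
            logW += (-(xi - mu) * (xi - mu) + (xi - muStar) * (xi - muStar))
                    / (2 * sigma * sigma);
        }
        return Math.exp(logW);
    }
}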


2.7 Quasi Monte Carlo

Quasi Monte Carlo, or low discrepancy, methods differ from ordinary Monte Carlo in that they make no attempt to mimic randomness. Let's consider the example of calculating the discounted payoff of a European arithmetic put option with d assets,

h(s_1(T), \ldots, s_d(T)) = e^{-rT} \cdot \max\left( K - \frac{1}{d} \sum_{i=1}^{d} s_i(T),\; 0 \right). \qquad (48)

The price can then be calculated as

E[h(s_1, \ldots, s_d)] = E\!\left[ e^{-rT} \max\left( K - \frac{1}{d} \sum_{i=1}^{d} s_i(0)\, e^{(r - 0.5\sigma_i^2)T + \sigma_i \sqrt{T}\, \Phi^{-1}(U_i)},\; 0 \right) \right] = E[f(U_1, \ldots, U_d)] = \int_{[0,1)^d} f(x)\,dx. \qquad (49)

Instead of generating pseudo-random numbers U_i \sim U(0,1) and estimating the integral with \frac{1}{N} \sum_{i=1}^{N} f(U_i), let's choose X_1, X_2, \ldots, X_N carefully to fill the hypercube [0,1)^d evenly and use these to estimate the integral as \frac{1}{N} \sum_{i=1}^{N} f(X_i). A sequence that evenly fills the d-dimensional hypercube [0,1)^d is called a low discrepancy sequence.
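As an illustrative sketch, quasi Monte Carlo integration with a Sobol sequence can be done with the SobolSequenceGenerator from the Apache Commons Math library, assuming commons-math3 is available (the thesis only states that built-in Java methods were used). The test integrand f(x) = \prod_j x_j, with exact integral 2^{-d}, is my own choice.

import org.apache.commons.math3.random.SobolSequenceGenerator;

public class QuasiMonteCarlo {
    public static void main(String[] args) {
        int d = 5, n = 1 << 14;
        SobolSequenceGenerator sobol = new SobolSequenceGenerator(d);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double[] x = sobol.nextVector();   // next point of the Sobol sequence in [0,1)^d
            double f = 1.0;
            for (double xj : x) f *= xj;
            sum += f;
        }
        System.out.printf("QMC estimate %.6f vs exact %.6f%n", sum / n, Math.pow(0.5, d));
    }
}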

2.7.1 Discrepancy and error estimation

Discrepancy is a measure of how uniformly a sequence is distributed in the unit hypercube. Given a collection \mathcal{A} of subsets of [0,1)^d, the discrepancy of the point set \{x_1, \ldots, x_n\} relative to \mathcal{A} is

D(x_1, \ldots, x_n; \mathcal{A}) = \sup_{A \in \mathcal{A}} \left| \frac{\#\{x_i \in A\}}{n} - \text{vol}(A) \right|, \qquad (50)

where \#\{x_i \in A\} denotes the number of x_i contained in A and \text{vol}(A) denotes the volume of A. Taking \mathcal{A} to be the collection of all rectangles in [0,1)^d of the form

\prod_{j=1}^{d} [u_j, v_j), \quad 0 \le u_j < v_j \le 1, \qquad (51)

yields the ordinary discrepancy D(x_1, \ldots, x_n). Restricting \mathcal{A} to rectangles of the form

\prod_{j=1}^{d} [0, u_j) \qquad (52)

defines the star discrepancy D^*(x_1, \ldots, x_n) [2, p. 284]. Suppose that we fix an infinite sequence x_1, x_2, x_3, \ldots where every subsequence x_1, \ldots, x_n has a low discrepancy. Such a sequence is called a low discrepancy sequence if the star discrepancy D^*(x_1, \ldots, x_n) is of O\!\left( \frac{\log(n)^d}{n} \right). The star discrepancy is a central part of error bounds for quasi Monte Carlo integration, and an upper bound for the integration error is given by the Koksma-Hlawka inequality. The Koksma-Hlawka inequality says that if f has a bounded variation V(f) in the sense of Hardy and Krause on [0,1)^d, then for any sequence X_1, \ldots, X_N on [0,1)^d the following inequality holds [2, p. 288]:

\left| \frac{1}{N} \sum_{i=1}^{N} f(X_i) - \int_{[0,1)^d} f(x)\,dx \right| \le V(f)\, D^*(X_1, \ldots, X_N). \qquad (53)


This inequality can be compared to the error information available in standard Monte Carlo. From the central limit theorem we know that

\left| \frac{1}{N} \sum_{i=1}^{N} f(U_i) - \int_{[0,1)^d} f(u)\,du \right| \le z_{\delta/2} \cdot \frac{\sigma_f}{\sqrt{N}}, \qquad (54)

where U_i \sim U(0,1). In comparing the standard Monte Carlo error with quasi Monte Carlo, it can be seen that in both cases the error bound is a product of two terms, one depending on the integrand f and the other on the properties of the sequence. Both V(f) and D^*(X_1, \ldots, X_N) in (53) are extremely hard to compute, which means that the Koksma-Hlawka inequality has limited practical use as an error bound. This leads to problems in comparing a quasi Monte Carlo estimate with a standard Monte Carlo estimate [12, p. 15-16].

2.7.2 Sobol sequence

Sobol sequences are low discrepancy sequences, and probably the most used sequences for high-dimensional integration [12]. For more information on how to create a Sobol sequence and the theory behind it I refer to P. Glasserman [2] or P. Jäckel [5]. In this section I will instead show why Sobol sequences are a better choice for high-dimensional integration than other low discrepancy techniques. The aim of a low discrepancy sequence is to fill a given domain as evenly and homogeneously as possible; however, many techniques lose their capability to produce homogeneous sequences as the dimension increases [5]. To demonstrate why Sobol's sequence is often preferred over other techniques, like Halton's method, I have created Figures 4 and 5, where I have plotted the first 1000 points of the Sobol sequence and the Halton sequence onto two-dimensional projections of adjacent dimensions. The result of using a pseudo-random number generator is illustrated in Figure 3, and one can see that there is no difference between low and high dimensions. The result of using Sobol's sequence is seen in Figure 4 and the result of using Halton's sequence in Figure 5. One can see that the Halton sequence does not fill the domain evenly in higher dimensions, while Sobol's sequence still does.

Figure 3: Pseudo-random numbers; panels (a)-(c) show two-dimensional projections of adjacent dimensions.

Figure 4: Sobol's sequence; panels (a)-(c) show two-dimensional projections of adjacent dimensions.

Figure 5: Halton's sequence; panels (a)-(c) show two-dimensional projections of adjacent dimensions.

2.7.3 Dimensionality Reduction

As mentioned earlier, the dimensionality of the problem can be very large when pricing American-style basket options. There are, however, techniques that can be used to lower the effective dimension of the problem. Principal component analysis can be used to reduce the number of state variables, whereas the Brownian bridge construction can be used to reduce the time dimensions. This is not something that this project focuses on, but I will explain the Brownian bridge technique briefly and refer to [2] or [12] for further reading.

The Brownian motion W is the driving process when simulating stock paths as in (17). It can easily be generated from left to right by the following recursion:

W(t_{i+1}) = W(t_i) + \sqrt{t_{i+1} - t_i}\, z_{i+1}, \qquad (55)

where z_{i+1} is a standard normal random variable and t_{i+1} > t_i. The Brownian bridge construction provides an alternative way of generating the Brownian motion, where the terminal nodes W(t_N) of the paths are determined first. All the intermediate nodes are then added conditional on the already known realizations of the path. Suppose that we know two values of the Brownian motion, W(t_i) and W(t_j); we can then use the Brownian bridge construction to calculate an intermediate value W(t_k) as

W(t_k) = \frac{t_j - t_k}{t_j - t_i} W(t_i) + \frac{t_k - t_i}{t_j - t_i} W(t_j) + \sqrt{\frac{(t_k - t_i)(t_j - t_k)}{t_j - t_i}}\, z_k, \qquad (56)

where t_i < t_k < t_j and z_k is a standard normal random variable. Figure 6 illustrates the principle of the method.


Figure 6: Illustration of the Brownian bridge construction.
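A minimal Java sketch of the conditional sampling step (56) might look as follows; names are illustrative.

import java.util.Random;

public class BrownianBridge {
    // Given W(ti) = wi and W(tj) = wj, draw W(tk) for ti < tk < tj as in (56).
    static double bridge(double ti, double wi, double tj, double wj,
                         double tk, Random rng) {
        double a = (tj - tk) / (tj - ti);                 // weight on W(ti)
        double b = (tk - ti) / (tj - ti);                 // weight on W(tj)
        double var = (tk - ti) * (tj - tk) / (tj - ti);   // conditional variance
        return a * wi + b * wj + Math.sqrt(var) * rng.nextGaussian();
    }
}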

This implies that we can simulate a path of the Brownian motion in any time order. Since many options are more sensitive to the terminal value of the underlying than to intermediate states, the variance will be concentrated in the large time steps. We can then use the first dimensions of the quasi sequence to control much of the generated paths and use pseudo-random numbers for the intermediate steps, hence getting a lower effective dimensionality [12].

3 Method

This thesis has been performed at Cinnober Financial Technology, where I have implemented a system for pricing American-style options on multiple underlying assets using regression-based Monte Carlo. The aim of the first part of the project was to make a comparative study between the LSM method and the RRM method. The LSM method of Longstaff and Schwartz was implemented with the possibility to replace the least squares solver with robust regression. To compare the results we focus our attention on the duality approach by Andersen and Broadie. The second part of the project focused on convergence rates; therefore the possibility to use quasi Monte Carlo and importance sampling was implemented in the system as part of the investigation.

I have also built an n-dimensional extension of the lattice binomial method according to [6], which has been used for comparison and verification. This method works very well for the one and two asset cases, but if more assets are used the computational time explodes. For high-dimensional valuation problems, the binomial method is not practically useful.

In all the numerical investigations I have assumed that the underlying assets follow a multi-dimensional geometric Brownian motion as in (16). Monte Carlo algorithms are highly dependent on the quality of the random number generator, and creating pseudo-random numbers is still today not a trivial thing. I have used the Mersenne Twister generator, MT19937, to produce uniform pseudo-random numbers, and the transformation to normal random numbers is made through the Box-Muller method. All code has been written in Java, which provides already built-in methods for the generation of both pseudo-random numbers and Sobol's low discrepancy numbers. I could also use built-in solvers for least squares problems. I did not, however, find any method to perform robust regression, so I had to implement it myself following the algorithm in [4].

When working with Monte Carlo and random numbers it is favorable to set a seed in the pseudo-random number generator, and I have used the linear congruential method to produce a seed vector,

x_0 = (a s_0 + b) \bmod M, \quad x_i = (a x_{i-1} + b) \bmod M, \qquad (57)

where i = 1, \ldots, 623, s_0 = \text{seed} = 5, a = 214013, b = 2531011, M = 429496729.
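A sketch of (57) in Java, assuming the resulting vector is fed to the Mersenne Twister as its 624-word initial state; the constants are copied from (57) as printed.

public class SeedVector {
    // Linear congruential recursion (57): expands a single seed into the
    // 624-word state vector expected by the Mersenne Twister.
    static long[] seedVector(long seed) {
        long a = 214013L, b = 2531011L, m = 429496729L; // constants as given in (57)
        long[] x = new long[624];
        x[0] = (a * seed + b) % m;
        for (int i = 1; i < 624; i++) x[i] = (a * x[i - 1] + b) % m;
        return x;
    }
}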

3.1 Algorithms

3.1.1 LSM and RRM

The LSM and RRM algorithm can be summarized as follows

1. Generate N_0 independent paths \{S_{nl}\}_{n=1,\ldots,N_0,\ l=1,\ldots,L} according to the multivariate geometric Brownian motion.

2. At the terminal nodes, loop over all paths and set CF_n = Z_L(S_{nL}) and \tau_n = L.

3. Apply backward induction for l = L-1 down to 1:

   • set J = 0, and for n = 1 to N_0:
     – if S_{nl} is in-the-money, then
       ∗ set J = J + 1 and \pi(J) = n
       ∗ fill the regressor matrix A_{Jm} = \phi_m(S_{nl}), m = 1, \ldots, M
       ∗ fill the vector of regressands b_J = e^{-r(\tau_n - l)\Delta t} CF_n
   • perform the regression to calculate the coefficients \hat{\beta}_{ml}
   • for j = 1 to J do
     – \hat{C} = \sum_{m=1}^{M} \hat{\beta}_{ml} A_{jm}
     – if Z_l(S_{\pi(j)l}) \ge \hat{C}:
       ∗ set CF_{\pi(j)} = Z_l(S_{\pi(j)l}) and \tau_{\pi(j)} = l

4. Calculate \hat{C}_0 = \frac{1}{N_0} \sum_{n=1}^{N_0} e^{-r\tau_n \Delta t} CF_n.

5. Calculate the value of the option by \hat{V}_0 = \max(Z_0, \hat{C}_0).

Note that the last step of this algorithm implies that the option can be exercised at time 0, i.e. the option can be bought and exercised at the same time step. The number of exercise dates is determined in advance in the case of Bermudan options, and in my numerical investigations I did not allow exercising at time 0; instead the option was evaluated as \hat{V}_0 = \hat{C}_0. As can be seen in the algorithm above, we can easily switch between robust regression and least squares regression once the algorithm has been implemented: just change the regression part where the coefficients \hat{\beta}_{ml} are calculated. A condensed one-asset Java version of the algorithm is sketched below.
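This sketch is illustrative only: it prices a one-asset Bermudan put with basis {1, s, s^2}, uses my own parameter values rather than those of the thesis experiments, and solves the regression with a tiny unpivoted Gaussian elimination.

import java.util.Random;

public class LsmPut {
    public static void main(String[] args) {
        int n = 100_000, L = 9;
        double s0 = 100, K = 100, r = 0.05, sigma = 0.2, T = 1.0, dt = T / L;
        Random rng = new Random(5);
        double[][] s = new double[n][L + 1];
        for (int p = 0; p < n; p++) {                      // step 1: simulate paths
            s[p][0] = s0;
            for (int l = 0; l < L; l++)
                s[p][l + 1] = s[p][l] * Math.exp((r - 0.5 * sigma * sigma) * dt
                        + sigma * Math.sqrt(dt) * rng.nextGaussian());
        }
        double[] cf = new double[n];
        int[] tau = new int[n];
        for (int p = 0; p < n; p++) {                      // step 2: terminal payoff
            cf[p] = Math.max(K - s[p][L], 0); tau[p] = L;
        }
        for (int l = L - 1; l >= 1; l--) {                 // step 3: backward induction
            // regress discounted cash flows on {1, s, s^2} over in-the-money paths
            double[][] m = new double[3][4];
            for (int p = 0; p < n; p++) {
                if (K - s[p][l] <= 0) continue;
                double x = s[p][l], y = Math.exp(-r * (tau[p] - l) * dt) * cf[p];
                double[] phi = {1, x, x * x};
                for (int i = 0; i < 3; i++) {
                    for (int j = 0; j < 3; j++) m[i][j] += phi[i] * phi[j];
                    m[i][3] += phi[i] * y;
                }
            }
            double[] beta = solve3(m);
            for (int p = 0; p < n; p++) {                  // exercise decision
                double z = K - s[p][l];
                if (z <= 0) continue;
                double cont = beta[0] + beta[1] * s[p][l] + beta[2] * s[p][l] * s[p][l];
                if (z >= cont) { cf[p] = z; tau[p] = l; }
            }
        }
        double v = 0;                                      // steps 4-5: discount back
        for (int p = 0; p < n; p++) v += Math.exp(-r * tau[p] * dt) * cf[p];
        System.out.printf("LSM price approx %.4f%n", v / n);
    }

    static double[] solve3(double[][] m) {                 // tiny Gaussian elimination
        for (int p = 0; p < 3; p++)
            for (int i = p + 1; i < 3; i++) {
                double f = m[i][p] / m[p][p];
                for (int j = p; j < 4; j++) m[i][j] -= f * m[p][j];
            }
        double[] b = new double[3];
        for (int i = 2; i >= 0; i--) {
            double acc = m[i][3];
            for (int j = i + 1; j < 3; j++) acc -= m[i][j] * b[j];
            b[i] = acc / m[i][i];
        }
        return b;
    }
}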

If we would like to use another set of paths \{S^2_{nl}\}_{n=1,\ldots,N_1,\ l=1,\ldots,L} to evaluate the option, and hence separate the coefficient determination part from the evaluation part, we can easily do so. Follow steps 1-3 in the algorithm above to determine the regression coefficients in each time step, but replace steps 4 and 5 with:

4. Generate a new set of independent paths \{S^2_{nl}\}_{n=1,\ldots,N_1,\ l=1,\ldots,L} according to the multivariate geometric Brownian motion.

5. For n = 1 to N_1 do:
   • set l = 1 and \hat{C} = \sum_{m=1}^{M} \hat{\beta}_{ml} \phi_m(S^2_{nl})
   • while (Z(S^2_{nl}) \le 0 or Z(S^2_{nl}) < \hat{C}) and l \ne L do
     – update l = l + 1 and \hat{C} = \sum_{m=1}^{M} \hat{\beta}_{ml} \phi_m(S^2_{nl})
   • set CF_n = Z(S^2_{nl}) and \tau_n^* = l

6. Calculate \hat{C}_0 = \frac{1}{N_1} \sum_{n=1}^{N_1} e^{-r\tau_n^* \Delta t} CF_n.

7. Calculate the value of the option by \hat{V}_0 = \max(Z_0, \hat{C}_0).

3.1.2 Duality approach

In the following subsection I will go through the Andersen and Broadie algorithm that I use to create a lower and upper estimate of the option price.

1. Run the LSM/RRM method with a set of paths \{S^0_{nl}\}_{n=1,\ldots,N_0,\ l=1,\ldots,L} and determine the coefficients that define the stopping criterion \tau at each time step, given by \tau_l = \inf\{k \ge l \mid Z_k \ge \hat{C}_k\}.

2. Estimate the lower bound L_0 by simulating a new set of paths \{S^1_{nl}\}_{n=1,\ldots,N_1,\ l=1,\ldots,L} and stopping according to \tau_1:

\hat{L}_0 = \frac{1}{N_1} \sum_{n=1}^{N_1} e^{-r\tau_1^n \Delta t} Z_{\tau_1^n}. \qquad (58)

A valid (1 - \alpha) confidence interval for L_0 can be created by

\hat{L}_0 \pm z_{1-\alpha/2} \sqrt{\frac{\hat{\sigma}_L^2}{N_1}}, \qquad (59)

with \hat{\sigma}_L being the estimated standard deviation of \hat{L}_0 and z_{1-\alpha/2} denoting the (1 - \alpha/2) quantile of a standard normal distribution.

3. The upper bound U_0 is given as

U_0 = E_0[\max_{l=1,\ldots,L} (e^{-rl\Delta t} Z_l - M_l)] + M_0 =: \Delta_0 + M_0, \qquad (60)

with M_0 = E_0[e^{-r\tau_1\Delta t} Z_{\tau_1}], M_l = M_{l-1} + E_l[e^{-r\tau_l\Delta t} Z_{\tau_l}] - E_{l-1}[e^{-r\tau_l\Delta t} Z_{\tau_l}] =: M_{l-1} + \delta_l, l = 1, \ldots, L, and

E_l[e^{-r\tau_l\Delta t} Z_{\tau_l}] = \begin{cases} e^{-rl\Delta t} Z_l, & \text{if } Z_l \ge \hat{C}_l \\ E_l[e^{-r\tau_{l+1}\Delta t} Z_{\tau_{l+1}}], & \text{if } Z_l < \hat{C}_l. \end{cases} \qquad (61)

To estimate the upper bound U_0 we simulate another set of N_2 paths \{S^2_{nl}\}_{n=1,\ldots,N_2,\ l=0,\ldots,L} to approximate \Delta_0 in (60) by

\hat{\Delta}_0 := \frac{1}{N_2} \sum_{n=1}^{N_2} \Delta_n, \qquad (62)


where

\Delta_n = \max_{l=1,\ldots,L} (e^{-rl\Delta t} Z_l^n - M_l^n), \quad M_l^n = M_{l-1}^n + \delta_l^n,

and \delta_l^n is evaluated by

\delta_l^n = \begin{cases} e^{-rl\Delta t} Z_l^n - C_{l-1}^n, & \text{if } Z_l^n \ge \hat{C}_l(S^2_{nl}) \\ C_l^n - C_{l-1}^n, & \text{if } Z_l^n < \hat{C}_l(S^2_{nl}), \end{cases}

where the C_l^n are the result of running a simulation within the simulation, i.e. creating N_3 new paths \{S^3_{nk}\}_{n=1,\ldots,N_3,\ k=l,\ldots,L} starting at S^2_{nl} and stopping according to \tau_{l+1},

C_l^n = \frac{1}{N_3} \sum_{m=1}^{N_3} e^{-r\tau_{l+1}^m \Delta t} Z_{\tau_{l+1}^m}. \qquad (63)

4. From the results of the procedure above, an upper bound U_0 can be estimated as

\hat{U}_0 = \hat{\Delta}_0 + \hat{L}_0,

and a valid 100(1 - \alpha)\% confidence interval for \hat{U}_0 is given by

\hat{U}_0 \pm z_{1-\alpha/2} \sqrt{\frac{\hat{\sigma}_L^2}{N_1} + \frac{\hat{\sigma}_\Delta^2}{N_2}}, \qquad (64)

where \hat{\sigma}_\Delta is the estimated standard deviation of \hat{\Delta}_0.

5. A valid 100(1 - \alpha)\% confidence interval for the true option value can then be determined by

\left[ \hat{L}_0 - z_{1-\alpha/2} \frac{\hat{\sigma}_L}{\sqrt{N_1}},\; \hat{U}_0 + z_{1-\alpha/2} \sqrt{\frac{\hat{\sigma}_L^2}{N_1} + \frac{\hat{\sigma}_\Delta^2}{N_2}} \right]. \qquad (65)

4 Results

In this section I will present some numerical results. To start with, I concentrate on investigating differences between the robust regression and least squares regression techniques; this is subsection 4.1. Subsections 4.2 and 4.3 are about speeding up the convergence rates via quasi Monte Carlo and importance sampling. In subsection 4.4 I combine quasi Monte Carlo and importance sampling with the RRM method and compare it to the LSM method.

4.1 LSM vs RRM

To investigate how the choice of basis functions influences the result, I created Table 4, where the price of a max call option on 2 assets has been calculated using both least squares regression (LSR) and robust regression (RR) with different bases. The payoff of this option is Z(s_1, s_2) = \max(0, \max(s_1, s_2) - K) and the values of these calculations can be compared with the result from the binomial tree, 8.0735. From the results in Table 4 it is clear that the choice of basis affects the results, and if one compares with the value from the binomial tree it seems preferable to choose a basis that includes the payoff Z(s_1, s_2) and the interaction term s_1 s_2.


Table 4: Option value of a Bermudan max call option on 2 stocks for the LSR, RRHuber and RRJonen methods with different bases.

Basis                                                              LSR    RRHuber  RRJonen
s_i^3, s_i^2, s_i, 1                                               8.00   7.96     7.95
s_1 s_2, s_i^3, s_i^2, s_i, 1                                      8.02   8.02     8.01
max(s_1, s_2), s_1, s_2, s_i^3, s_i^2, s_i, 1                      8.05   8.06     8.08
s_1 s_2, s_1^2 s_2, s_1 s_2^2, s_i^3, s_i^2, s_i, 1                8.04   8.02     8.02
Z(s_1, s_2), s_1 s_2, s_1^2 s_2, s_1 s_2^2, s_i^3, s_i^2, s_i, 1   8.03   8.05     8.05
Z(s_1, s_2), s_1 s_2, s_i^2, s_i, 1                                8.05   8.06     8.06

Notes. Common option parameters are T = 3, exercise opportunities = 9, K = 100, S_0^i = 90, i = 1, 2, r = 0.05, \delta_d = 0.1, d = 1, 2, \sigma_d = 0.2, d = 1, 2, \rho_{1,2} = 0. Results are created using the average of 10 runs and 100000 replications in each run, with \alpha = 0.92, \beta = 0.993. The result from the binomial tree, calculated with 360 time steps, is 8.0735.

The values in Table 4 have been calculated with the parameters \alpha = 0.92 and \beta = 0.993 for determining the transition points. Now, using the basis \{Z(s_1, s_2), s_1 s_2, s_i^2, s_i, 1\} from Table 4, I instead examine the effect of different transition points on the same option. The results can be seen in Figure 7; the first three points from the left are the prices using Huber's loss function with different values of \alpha, and the following 9 points are the prices using Jonen's loss function with different \alpha and \beta. The two points to the right are the prices calculated using least squares regression (LSR). The line corresponds to the value of the binomial tree with 360 time steps. The transition points clearly affect the results, and remembering that these estimates actually are low estimates of the option price, it is a good sign that the values from robust regression are higher than the least squares regression values.

Figure 7: Price of max call option on 2 assets with different transition points in the robust regression method. S0 = 90 for both stocks. Basis according to Entry 6 in Table 4. The results are based on the average of 10 runs and 100000 replications per run. For other option parameters see the notes in Table 4.

If we for a moment set aside Andersen and Broadie's approach and try to investigate differences between the RRM and LSM methods without the duality approach, we will see that it is quite hard to find any differences. In Table 5 I have priced a max call option on 2, 3 and 5 underlying assets using both RRM with Huber's loss function and LSM. These values can then be compared to the value from the binomial tree (in the 2 and 3 asset cases). It seems that we get slightly smaller confidence intervals and slightly higher estimates using robust regression. I use different bases depending on the number of assets, and these are

\{Z(s_1, s_2), s_1 s_2, s_1, s_2, s_1^2, s_2^2, 1\}, \qquad (66)

\{1, s_1, s_2, s_3, s_1^2, s_2^2, s_3^2, s_1^3, s_2^3, s_1 s_2, s_1 s_3, s_2 s_3, s_1^2 s_2, s_1 s_2^2\}, \qquad (67)

\{1, s_1, s_2, s_3, s_4, s_5, s_1^2, s_2^2, s_3^2, s_4^2, s_5^2, s_1^3, s_2^3, s_1 s_2, s_1 s_3, s_2 s_3, s_1^2 s_2, s_1 s_2^2\}, \qquad (68)

for the 2, 3 and 5 asset cases respectively, where the stock prices have been ordered such that s_1 > s_2 > \cdots > s_d.

Table 5: Calculated 95% confidence interval using LSM and RRM method for Bermudan max call option on 2, 3 and 5 assets.

Nr of assets   S_0   Method   95% CI             Size CI   Est. Price   Binomial tree

2              90    LSM      [7.9658, 8.1130]   0.154     8.0420       8.0735
                     RRM      [7.9828, 8.1334]   0.151     8.0581
               100   LSM      [13.745, 13.937]   0.192     13.841       13.901
                     RRM      [13.781, 13.970]   0.189     13.875
               110   LSM      [21.181, 21.404]   0.223     21.292       21.354
                     RRM      [21.205, 21.427]   0.222     21.316

3              90    LSM      [11.167, 11.341]   0.174     11.254       11.322
                     RRM      [11.177, 11.348]   0.171     11.262
               100   LSM      [18.566, 18.782]   0.216     18.674       18.692
                     RRM      [18.575, 18.789]   0.213     18.682
               110   LSM      [27.443, 27.691]   0.249     27.567       27.620
                     RRM      [27.426, 27.675]   0.249     27.551

5              90    LSM      [16.531, 16.734]   0.203     16.632       -
                     RRM      [16.522, 16.723]   0.201     16.623
               100   LSM      [26.025, 26.267]   0.242     26.146       -
                     RRM      [26.037, 26.278]   0.241     26.158
               110   LSM      [36.574, 36.847]   0.273     36.711       -
                     RRM      [36.617, 36.891]   0.274     36.754

Notes. Common option parameters are T = 3, exercise opportunities = 9, K = 100, r = 0.05, \delta_d = 0.1 and \sigma_d = 0.2 for d = 1, \ldots, 5, \rho_{de} = 0 for all d \ne e. For the binomial tree, 360 time steps are used for 2 assets and 90 time steps for 3 assets. The number of paths in the MC calculations is 100000, repeated 10 times. The bases for 2, 3 and 5 assets are according to (66), (67) and (68), respectively. The transition point is \alpha = 0.90 with the Huber loss function.

To compare the variances of the methods I have also plotted the standard error for the 2 asset case with S_0 = 100 for both stocks, see Figure 8. From the results in Table 5 and Figure 8 it is hard to say that one method is better than the other.


Figure 8: Standard error calculated with robust regression and least squares regression for a Bermudan max call option on 2 stocks. S_0 = 100 for all stocks; see Table 5 for other option parameters.

To try to find differences between the RRM and LSM techniques before adding the technique of Andersen and Broadie, I created the histograms in Figure 9 and the density plots in Figure 10. For these figures I have calculated the price of a max call option on 3 and 5 assets 2500 and 1500 times, respectively, using 100000 paths per run. In the density plot I have also added the value calculated with the binomial tree for the 3 asset case (18.69). From the density plots one can see that the distributions differ a little, and again we see that the values calculated with the RRM technique are higher than those from LSM. From these figures it seems as if the data are normally distributed, and this cannot be rejected with statistical tests. Therefore I also perform a two-sided Student's t-test of whether the means of the two populations are equal, and this test shows that there is a significant difference between the means of the two distributions. However, a test for equal variances does not fail, which strengthens the belief that the variances of the populations do not differ. An interesting note from the histograms is that there appear to be more "extreme" values from the LSM method, i.e. values that lie below 18.55 in the 3 asset case and below 25.95 in the 5 asset case.


Figure 9: Histograms of the price of a max call option on (a) 3 assets and (b) 5 assets. See Table 5 for parameter settings.


Figure 10: Density plots of the price of a max call option on (a) 3 assets and (b) 5 assets. See Table 5 for parameter settings.

4.1.1 Duality approach

If we add the method of Andersen and Broadie, and thus create both an upper and a lower bound for the option price, it becomes easier to find differences between the RRM and LSM methods. In Table 6 I have priced an arithmetic average call option on 5 assets; the payoff of this option is given by

Z(s_1, s_2, s_3, s_4, s_5) = \max\left(0,\ \frac{s_1 + s_2 + s_3 + s_4 + s_5}{5} - K\right).

The basis that I use is according to

\{1,\ s_1,\ s_2,\ s_3,\ s_4,\ s_5,\ s_1^2,\ s_2^2,\ s_3^2,\ s_4^2,\ s_5^2\},   (69)

and to make a fair comparison of the methods I set a seed in the pseudo-random number generator according to (57). In the table I calculate both a lower and an upper bound of the option price (standard errors in parentheses) and a 95% confidence interval according to (65). The ∆ ratio is simply the gap between the high and low estimators from the LSM approach divided by the corresponding gap from the RRM approach, i.e. ∆ = (High_LSM − Low_LSM)/(High_RRM − Low_RRM). The tighter the gap between the lower and upper bounds, and the higher the lower bound, the more accurate the approximated early exercise strategy; the ∆ ratio can therefore be used to compare the methods. As can be seen in Table 6, we get higher lower bounds and tighter upper bounds with the RRM method, using both Huber's loss function and Jonen's loss function, compared to the LSM method. The confidence intervals are smaller and we can see a slight reduction in variance using robust regression. The ∆ ratios are between 2.5 and 4.
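To make the bookkeeping explicit, here is a small sketch of how the interval and ∆ ratio are formed from the low and high estimators, using the rounded S0 = 90 values of Table 6 as input; the 1.96 factor assumes that (65) is the usual normal-quantile construction.

```python
z = 1.96                                   # normal quantile assumed in (65)
low_lsm, se_low_lsm = 1.534, 0.0048        # values from Table 6, S0 = 90
high_lsm, se_high_lsm = 1.554, 0.0057
low_rrm, se_low_rrm = 1.545, 0.0046
high_rrm, se_high_rrm = 1.552, 0.0048

ci_lsm = (low_lsm - z * se_low_lsm, high_lsm + z * se_high_lsm)
ci_rrm = (low_rrm - z * se_low_rrm, high_rrm + z * se_high_rrm)

# Computed from the rounded table values this gives ~2.9; the 3.1 in
# Table 6 comes from the unrounded estimators.
delta_ratio = (high_lsm - low_lsm) / (high_rrm - low_rrm)
print(ci_lsm, ci_rrm, delta_ratio)
```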

Table 6: Low estimator, high estimator and a calculated 95% confidence interval for a Bermudan arithmetic average call option on 5 assets, using both the LSM and RRM methods.

S0   Method      Low             High            95% CI          Size CI  ∆ Ratio
90   LSM         1.534 (0.0048)  1.554 (0.0057)  [1.525, 1.565]  0.040    -
     RRM, Huber  1.545 (0.0046)  1.552 (0.0048)  [1.536, 1.561]  0.025    3.1
     RRM, Jonen  1.546 (0.0045)  1.552 (0.0047)  [1.537, 1.561]  0.024    3.2
100  LSM         3.955 (0.0068)  3.995 (0.0077)  [3.942, 4.010]  0.068    -
     RRM, Huber  3.979 (0.0064)  3.995 (0.0067)  [3.966, 4.008]  0.042    2.5
     RRM, Jonen  3.984 (0.0063)  3.994 (0.0064)  [3.971, 4.006]  0.035    4.0
110  LSM         9.313 (0.0078)  9.371 (0.0091)  [9.298, 9.389]  0.091    -
     RRM, Huber  9.340 (0.0075)  9.364 (0.0080)  [9.326, 9.379]  0.054    2.5
     RRM, Jonen  9.343 (0.0074)  9.359 (0.0077)  [9.329, 9.374]  0.046    3.6

Notes. Common option parameters are T = 3, exercise opportunities = 9, K = 100, r = 0.05, δ_d = 0.1 and σ_d = 0.08 + 0.08 · d for d = 1, ..., 5, and ρ_de = 0 ∀ d ≠ e. The number of paths in the MC calculations are N0 = 400000, N1 = 1000000, N2 = 1500, N3 = 8000. The basis is according to (69). The transition points are α = 0.9 for Huber's loss function, and α = 0.87 and β = 0.993 for Jonen's loss function.

In Table 7 I again use the seed in (57) but change the option to a max call option on 2 assets and use the basis in (66). I calculate low and high estimates, 95% confidence intervals and ∆ ratios. For this option benchmark values can be calculated using the binomial tree; these are 8.074, 13.90 and 21.35. It can be seen that the calculated confidence intervals cover these "true" values, and again we get higher lower bounds and tighter upper bounds using the RRM method compared to the LSM method. The ∆ ratios are not as large as in Table 6, but they are between 1.4 and 2.4.


Table 7: Low estimator, high estimator and a calculated 95% confidence interval for a Bermudan max call option on 2 assets, using both the LSM and RRM methods.

S0   Method      Low              High             95% CI            Size CI  ∆ Ratio
90   LSM         8.044 (0.012)    8.075 (0.013)    [8.020, 8.100]    0.080    -
     RRM, Huber  8.062 (0.012)    8.077 (0.012)    [8.039, 8.101]    0.063    2.1
     RRM, Jonen  8.064 (0.012)    8.077 (0.012)    [8.041, 8.101]    0.060    2.4
100  LSM         13.866 (0.0155)  13.918 (0.0163)  [13.836, 13.950]  0.115    -
     RRM, Huber  13.888 (0.0153)  13.917 (0.0157)  [13.858, 13.948]  0.0903   1.7
     RRM, Jonen  13.895 (0.0152)  13.921 (0.0155)  [13.865, 13.952]  0.0861   2
110  LSM         21.289 (0.0180)  21.378 (0.0192)  [21.253, 21.415]  0.162    -
     RRM, Huber  21.313 (0.0179)  21.376 (0.0187)  [21.278, 21.412]  0.134    1.4
     RRM, Jonen  21.322 (0.0178)  21.374 (0.0184)  [21.287, 21.410]  0.123    1.7

Notes. Common option parameters are T = 3, exercise opportunities = 9, K = 100, r = 0.05, δ_d = 0.1 and σ_d = 0.2 for d = 1, 2, and ρ_de = 0 ∀ d ≠ e. The number of paths in the MC calculations are N0 = 400000, N1 = 1000000, N2 = 1500, N3 = 8000. The basis is according to (66). The transition points are α = 0.9 for Huber's loss function, and α = 0.87 and β = 0.993 for Jonen's loss function.

The values in Table 8 have been created without any seed set in the pseudo-random number generator. In this table I have priced a max call option on 5 underlying assets using the basis in (68). We see a slight reduction in variance and ∆ ratios between 1.2 and 1.6 in this case.

Table 8: Low estimator, high estimator and a calculated 95% confidence interval for a Bermudan max call option on 5 assets, using both the LSM and RRM methods.

S0   Method      Low              High             95% CI            Size CI  ∆ Ratio
90   LSM         27.398 (0.0329)  27.685 (0.0377)  [27.334, 27.758]  0.425    -
     RRM, Huber  27.483 (0.0316)  27.725 (0.0356)  [27.422, 27.795]  0.373    1.2
     RRM, Jonen  27.431 (0.0311)  27.680 (0.0359)  [27.370, 27.750]  0.380    1.2
100  LSM         37.614 (0.0380)  38.131 (0.0479)  [37.539, 38.225]  0.686    -
     RRM, Huber  37.740 (0.0360)  38.055 (0.0437)  [37.670, 38.141]  0.471    1.6
     RRM, Jonen  37.660 (0.0352)  38.008 (0.0413)  [37.591, 38.089]  0.499    1.5
110  LSM         49.116 (0.0416)  49.627 (0.0491)  [49.034, 49.723]  0.689    -
     RRM, Huber  49.180 (0.0399)  49.621 (0.0497)  [49.102, 49.718]  0.617    1.2
     RRM, Jonen  49.093 (0.0391)  49.461 (0.0453)  [49.017, 49.549]  0.533    1.4

Notes. Common option parameters are T = 3, exercise opportunities = 9, K = 100, r = 0.05, δ_d = 0.1 and σ_d = 0.08 + 0.08 · d for d = 1, ..., 5, and ρ_de = 0 ∀ d ≠ e. The number of paths in the MC calculations are N0 = 400000, N1 = 1000000, N2 = 1500, N3 = 8000. The basis is according to (68). The transition points are α = 0.9 for Huber's loss function, and α = 0.87 and β = 0.993 for Jonen's loss function.

4.2 Quasi MC

To examine the effect of Quasi Monte Carlo I price a max call option on 2 assets with the same option parameters as in Table 5 and value the option with the same paths that I use to determine the exercise strategy. As can be seen in Figure 11, the values calculated with Sobol numbers converge much faster and stay stable compared to when pseudo-random numbers are used. It looks as if the values calculated using Sobol numbers are stable already after 50000 paths. The value from the binomial tree has also been plotted for comparison.

Figure 11: Bermudan max call option on 2 stocks, S0 = 100 for both stocks, valued using robust regression with both pseudo-random numbers and Sobol numbers. The value calculated with the binomial tree, 13.90, is included for comparison. See Table 5 for other parameters.

The same striking result can be seen in Figures 12 and 13, where the lower bound for a max call option on 2 and 3 assets respectively has been determined using both Sobol numbers and pseudo-random numbers, but the same exercise strategy.


Figure 12: Convergence of the low estimator using least square regression with pseudo-random numbers and Sobol numbers, for a Bermudan max call option on 2 stocks. S0 = 90 for both stocks; see Table 7 for other option parameters. N0 = 500000.

Figure 13: Convergence of the low estimator using least square regression with both pseudo-random numbers and Sobol numbers, for a Bermudan max call option on 3 stocks. S0 = 100 for all stocks; see Table 7 for other option parameters. N0 = 500000.
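For reference, a minimal sketch (assuming Python with scipy ≥ 1.7; not the implementation used in the thesis) of generating Sobol-driven normal increments for the 2-asset, 9-exercise-date setting:

```python
import numpy as np
from scipy.stats import qmc, norm

d = 2 * 9                              # dimensions: 2 assets x 9 exercise dates
sobol = qmc.Sobol(d=d, scramble=True, seed=1)
u = sobol.random_base2(m=16)           # 2^16 scrambled Sobol points in (0, 1)^d
z = norm.ppf(u)                        # inverse CDF gives N(0, 1) increments
z = z.reshape(-1, 9, 2)                # one (time step, asset) block per path
```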


4.3 Importance Sampling

In Table 9 I have calculated the lower bound of an arithmetic average call option on 3 assets with expiries of both 1 and 3 years, both with and without importance sampling. The option is an out-of-the-money (OTM) option with S0 set to 90 and strike price K set to 100. In the importance sampling case I set the drift of the assets so that the option is in-the-money (ITM) after 4 time steps; thereafter the paths evolve according to the ordinary geometric Brownian motion. To calculate the results I use the basis

\{1,\ s_1,\ s_2,\ s_3,\ s_1^2,\ s_2^2,\ s_3^2,\ (s_1 + s_2 + s_3)/3\},   (70)

and use the LSM method to determine the exercise strategy. The seed is set according to (57). The results show that the lower bound can be calculated with a much smaller standard error (SE) using importance sampling than without it. The results also indicate that the importance sampling technique gives a more striking result the more unlikely the outcome is.
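The following is a sketch of the drift change described above, written for a single asset under geometric Brownian motion (hypothetical function and parameter names; the likelihood-ratio weight follows from the normal densities of the tilted and untilted Brownian increments):

```python
import numpy as np

def gbm_paths_with_tilt(S0, r, delta, sigma, T, n_steps, n_paths, c, k, rng):
    """Simulate GBM paths whose Brownian motion gets an extra drift c during
    the first k steps; returns the paths and the likelihood-ratio weights
    that undo the tilt in the price estimate (c = 0 gives plain MC)."""
    dt = T / n_steps
    S = np.full(n_paths, S0, dtype=float)
    log_w = np.zeros(n_paths)
    paths = np.empty((n_steps + 1, n_paths))
    paths[0] = S
    for i in range(n_steps):
        z = rng.standard_normal(n_paths)
        dW = (c * dt if i < k else 0.0) + np.sqrt(dt) * z
        if i < k:
            # density ratio N(0, dt) / N(c*dt, dt) evaluated at dW
            log_w += -c * dW + 0.5 * c**2 * dt
        S = S * np.exp((r - delta - 0.5 * sigma**2) * dt + sigma * dW)
        paths[i + 1] = S
    return paths, np.exp(log_w)
```

The importance sampling estimate of the lower bound is then the average of weight times discounted payoff along the tilted paths; choosing c so that the paths reach at-the-money after 4 steps reproduces the setup above.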

Table 9: Low estimator and standard error for an arithmetic average call option on 3 assets with maturities of 1 and 3 years.

Method         T=1 Low estimator  T=1 SE   T=3 Low estimator  T=3 SE
Standard MC    0.47377            0.0019   1.0964             0.0034
Import. Samp.  0.47302            0.00061  1.0961             0.0013
Binomial tree  0.47622            -        1.0910             -

Notes. Common option parameters are exercise opportunities = 9, K = 100, S0 = 90 for all assets, r = 0.05, δ_d = 0.1 and σ_d = 0.2 for d = 1, 2, 3, and ρ_de = 0 ∀ d ≠ e. For the binomial tree 90 time steps are used. The number of paths in the MC calculations are N0 = 500000 and N1 = 1000000. The seed is set according to (57) and the basis according to (70). Early exercise regions are calculated with the LSM. The drift when using importance sampling is set so that at-the-money is reached after 4 time steps.

I also create a convergence plot of the low estimator for the same option, using the same parameters as in the example above but without any seed set (the exercise strategy is the same in both cases), see Figure 14. One can see that the low estimator seems to be somewhat high compared to the true value from the binomial tree. A more interesting result can be seen in Figure 15, where the standard error has been plotted. There is clearly a big difference between the case when importance sampling is used and when it is not.


Figure 14: Convergence of the low estimator using least square regression with and without importance sampling, for a Bermudan arithmetic average option on 3 stocks. S0 = 90 for all stocks and T = 3. See Table 9 for other option parameters. N0 = 500000; no seed set, but the same regression coefficients are used in both cases.

Figure 15: Standard error of the low estimator using least square regression with and without importance sampling, for a Bermudan arithmetic average option on 3 stocks. S0 = 90 for all stocks and T = 3. See Table 9 for other option parameters. N0 = 500000; no seed set, but the same regression coefficients are used in both cases.


4.4 Combinations

As a final result I create a convergence plot of both the low and high estimates of the price of an arithmetic average call option on 2 assets using the basis

\{1,\ s_1,\ s_2,\ s_1^2,\ s_2^2,\ s_1^3,\ s_2^3\}.   (71)

I do this using both the least square technique and the robust regression technique to determine the exercise strategy. Once the exercise strategy has been determined, I calculate the lower and upper bounds using the exercise strategy determined by least squares, without any quasi-random numbers or importance sampling. I also calculate lower and upper bounds using the exercise strategy from robust regression, but then I add quasi-random numbers and importance sampling. Importance sampling is only used to calculate the lower bound, and the drift is set so that in-the-money regions are reached after 4 time steps. The results can be seen in Figure 16, and it is clear that robust regression gives a much smaller interval than least square regression; this can also be seen in Figure 17, where I plot the difference between the upper and lower bounds for both cases. It is also clear that the values calculated using Sobol numbers and importance sampling converge much faster than when pseudo-random numbers are used.

Figure 16: Upper and lower bounds calculated for an arithmetic average call option on 2 assets, using both LSM with pseudo-random numbers and RRM combined with Sobol numbers and importance sampling to calculate the lower bound and Sobol numbers to calculate the upper bound. The calculations with robust regression are made with Jonen's loss function using α = 0.87 and β = 0.993. S0 = 90 for both stocks and T = 3. See Table 9 for other option parameters. N0 = 300000; N1 varies between 10000 and 500000, N2 between 100 and 500, N3 between 200 and 1500. The seed is set when calculating coefficients.


Figure 17: Difference between upper and lower bound from Figure 16.

5 Discussion

This project has considered the pricing of multi-dimensional American-style options. Regression-based Monte Carlo techniques have been used to determine early exercise strategies, and two different regression procedures for calculating continuation values have been evaluated. It appeared to be hard to find advantages or differences between the methods without the duality approach of Andersen and Broadie, so this method was added. Using this method it was shown that the robust regression technique provides a higher lower bound and a tighter upper bound. The results show that the intervals can be as much as 4 times smaller using the robust regression technique.

A recurrent question when using regression-based Monte Carlo is how to choose the basis functions. This is a topic of its own and there are many articles on the subject. However, this project has not focused on it beyond verifying that the choice has a big impact on the result, so it is definitely something that should be taken into consideration in further work. Another problem that appears when using robust regression is how to choose the loss function and transition points. In this project I have only considered Huber's loss function and the loss function recommended by C. Jonen [4], but other loss functions should also be tested in future work. Regarding the transition points I mainly relied on the result from Figure 7 and used values of 0.87 or 0.90 for α and 0.993 for β; these values also coincide with the ones C. Jonen uses in most of his numerical investigations.
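For concreteness, here is a minimal sketch of Huber's loss and the corresponding down-weighting of large residuals; the threshold k is passed in directly, whereas in the thesis the transition point is parameterized through the level α (the mapping between the two is an assumption of this sketch).

```python
import numpy as np

def huber_loss(r, k):
    """Huber's loss: quadratic for |r| <= k, linear beyond k."""
    a = np.abs(r)
    return np.where(a <= k, 0.5 * a**2, k * a - 0.5 * k**2)

def huber_weights(r, k):
    """IRLS weights min(1, k/|r|): residuals beyond k are down-weighted,
    which is what makes the regression robust against outliers."""
    a = np.maximum(np.abs(r), 1e-12)
    return np.minimum(1.0, k / a)
```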

Another factor that might impact the result of the regression step is whether or not to use only in-the-money paths when performing the regression. There are different opinions on this question: Longstaff and Schwartz [3] and C. Jonen [4] recommend using only in-the-money paths, while P. Glasserman [2] recommends using all paths. This is a question that should be investigated further, since I found that the results might differ quite a lot depending on the choice. In Table 11 in Appendix 6.2 I have replicated Table 9 from Section 4.3 but used all paths in the regression

step, and it can be seen that the results differ a lot. It might be that OTM options that are not very volatile are more sensitive to the choice of using only in-the-money paths or not. The reason for using only in-the-money paths in the regression is that the zero values would otherwise affect the regression and give inferior results; it is also less time consuming to use only in-the-money paths. For an OTM option that is not very volatile there will be very few paths that are in the money, so the regression will be based on a very small set of paths; the results of using only in-the-money paths will therefore differ more than when many paths are in the money. This should be investigated further in order to draw conclusions about when to use only in-the-money paths and when to use all paths.

In this project we have focused on Bermudan options, i.e. options that can only be exercised at a finite set of exercise opportunities. The step to pricing American options is not very large, since we can always increase the number of exercise opportunities. However, if the number of exercise opportunities increases, the number of evaluation steps also increases, and for multi-dimensional options the memory demand gets very large. It might therefore be recommended to work with dimensionality reduction techniques. For many options it is enough to create values for one time step at a time, so the Brownian bridge technique should be implemented to reduce the amount of memory needed when evaluating these options.
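As a sketch of the idea (standard Brownian-bridge conditioning, not code from the thesis): simulate the terminal Brownian value first, then fill in intermediate exercise dates on demand instead of storing whole paths.

```python
import numpy as np

def brownian_bridge_point(t0, W0, t1, W1, t, rng):
    """Sample W(t) given W(t0) = W0 and W(t1) = W1, for t0 < t < t1.

    The conditional law is normal with
      mean = W0 + (t - t0)/(t1 - t0) * (W1 - W0)
      var  = (t1 - t)*(t - t0)/(t1 - t0),
    so intermediate values can be generated one date at a time."""
    mean = W0 + (t - t0) / (t1 - t0) * (W1 - W0)
    var = (t1 - t) * (t - t0) / (t1 - t0)
    return mean + np.sqrt(var) * rng.standard_normal()
```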

The second part of this project focused on variance reduction, and the use of low discrepancy sequences was therefore considered instead of pseudo-random numbers. Due to its superiority in higher dimensions, Sobol's technique was chosen and investigated. It is problematic to calculate error estimates for Sobol sequences, so it is hard to compare the method to pseudo-random numbers, but the convergence plots show that the results using Sobol numbers converge much faster than with pseudo-random numbers. A drawback of low discrepancy techniques is that they lose their strength in higher dimensions; I have however not noticed this in my numerical tests, and due to the striking improvement of the convergence rate I strongly recommend this technique. Ready-made methods for creating Sobol numbers are available in most programming languages, so it is also very easy to add this technique to an already running option pricing system.

Importance sampling has also been considered in this project. This technique is not as easily implemented as Quasi Monte Carlo, but requires a lot from the user. The method has the potential to reduce the variance significantly and can be especially useful when pricing path dependent options, like the barrier option. I spent a lot of time trying to make importance sampling work together with the regression part until I finally realized that these methods are highly incompatible with each other. Using importance sampling before the regression often led to American options being valued as European options, because the calculated exercise strategies never suggested early exercise. Therefore, in this project, I only considered a change of drift such that the paths of an OTM option were driven into in-the-money regions, after which the paths evolve according to the ordinary geometric Brownian motion. Using this strategy, importance sampling could be combined with the regression, because the option will never be exercised until it is in the money. The variance reduction was at its largest when the paths reached in-the-money regions after about half of the total number of time steps. Later I realized that one can separate determining the exercise strategy from valuing the option; in that case importance sampling can be used in the valuation part but not for determining the exercise strategy. I did not have enough time to work further with importance sampling, but if more effort is to be put into this subject it would be preferable to consider algorithms for solving the underlying stochastic optimization problem, as C. Jonen [4] does with great success.

I have not made any comparison of the computational time between the methods in this project, because a fair comparison is hard to accomplish. When robust regression is used the residuals need to be ordered (I have used Quicksort, O(n log n)), so it is natural that the computational time is larger with this method, which I have also noticed in the numerical investigations. But since the results are more accurate using robust regression, such a comparison is not fair. To make a fair comparison one could instead run the LSM with more basis functions than the RRM and compare the time required to reach a certain ∆ ratio. As mentioned in the method section, I implemented the robust regression method myself while I use an already built-in method for least square regression. Therefore, if an investigation of CPU time is to be considered, one would have to use the robust regression implementation but with the loss function of ordinary least squares (OLS), so that the implementation does not affect the result. To compare the CPU time of the implemented variance reduction techniques one could run the algorithms until a certain level of the standard error is reached and compare the results.

Another interesting topic for future work is a change of the underlying model. In this project I have assumed that the assets can be modeled by geometric Brownian motion with constant interest rate and volatility. It would be interesting to instead investigate non-constant volatility models, i.e. models where the volatility is a function of price and time. Examples of such models are the Heston model explained in [8] and the model explained by Tullie [7].

6 Appendix A1

6.1 Importance sampling example

Let’s say we want to estimate θ = P (X > 4) when X is a standard random variable X ∼ N (0, 1). The naive way to estimate θ is by generating X1, X2, ... XN from N (0, 1) and using ˆ 1 PN θ = N i=1 1{Xi>4} as an estimate where 1{Xi>4} is equal to 1 if Xi > 4 and zero otherwise. The problem using this naive approach is that it is very unlikely to get a Xi > 4 and we will have to use a large number of random numbers to get an estimate that is not zero. If we instead use importance N ˆ 1 P f(Xi) sampling we change the distribution that Xi are sampled from and use θIS = 1 as N {Xi>4} g(Xi) i=1 our estimator where Xi are sampled under g(x). Let’s use exponential tilting to find the probability density g(x) of this new distribution,

g(x) = \frac{e^{tx} f(x)}{M(t)},

where M(t) is the moment generating function,

M(t) = E_f[e^{tX}] = \int_{\mathbb{R}} e^{tx} f(x)\,dx = \int_{\mathbb{R}} e^{tx} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{tx - x^2/2}\,dx
     = \int_{\mathbb{R}} \frac{1}{\sqrt{2\pi}} e^{-(x^2 - 2tx + t^2 - t^2)/2}\,dx = e^{t^2/2} \int_{\mathbb{R}} \frac{1}{\sqrt{2\pi}} e^{-(x-t)^2/2}\,dx = e^{t^2/2},   (72)

where I have used \int_{\mathbb{R}} \frac{1}{\sqrt{2\pi}} e^{-(x-t)^2/2}\,dx = 1. We can now construct g(x),

g(x) = \frac{e^{tx} f(x)}{M(t)} = \frac{e^{tx} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}}{e^{t^2/2}} = \frac{1}{\sqrt{2\pi}} e^{-(x-t)^2/2}.   (73)

Hence we should sample X_1, X_2, ..., X_N from N(t, 1). The next step is to determine the optimal t. In the general case, when we are trying to calculate P(X > a), the optimal t is such that E_g[X_i] = a,


where the subscript g means that the X_i are sampled from the exponentially tilted density g(x). Since we know that E_g[X_i] = t, we can directly see that the optimal t is t = 4 for this particular problem.

We can now construct the importance sampling estimator as,

\hat{\theta}_{IS} = \frac{1}{N} \sum_{i=1}^{N} 1_{\{X_i > 4\}} \frac{f(X_i)}{g(X_i)},

where the X_i are sampled from the density function g(x). Table 10 shows the results of using both the naive approach and the importance sampling approach to estimate P(X_i > 4), where X_i ∼ N(0, 1). These results can be compared to the exact value P(X_i > 4) = 3.1671 · 10^-5.
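A runnable sketch of this example (the density ratio f(y)/g(y) simplifies to exp(−4y + 8) when g is the N(4, 1) density):

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

# Naive estimator: sample under f = N(0, 1) and count exceedances of 4
x = rng.standard_normal(N)
theta_naive = np.mean(x > 4)

# Importance sampling: sample under the tilted density g = N(4, 1)
y = rng.normal(4.0, 1.0, N)
weights = np.exp(-4.0 * y + 8.0)        # f(y)/g(y) for t = 4
theta_is = np.mean((y > 4) * weights)

print(theta_naive, theta_is)            # compare with the exact 3.1671e-5
```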

Table 10: The estimated value of θ = P (Xi > 4) using both naive Monte Carlo and importance sampling for N = 1000, 10000, 100000, 1000000 simulations.

Method               N=1000          N=10000         N=100000        N=1000000
Naive                0               0               1.0000 · 10^-5  2.9000 · 10^-5
Importance sampling  2.9738 · 10^-5  3.2881 · 10^-5  3.1897 · 10^-5  3.1655 · 10^-5

6.2 Not only in-the-money paths

Table 11: (Not only in-the-money paths) Low estimator and standard error for an arithmetic average call option on 3 assets with maturities of 1 and 3 years.

Method         T=1 Low estimator  T=1 SE   T=3 Low estimator  T=3 SE
Standard MC    0.44693            0.0015   1.0746             0.003
Import. Samp.  0.44762            0.00047  1.0719             0.0011
Binomial tree  0.47622            -        1.0910             -

Notes. Common option parameters are exercise opportunities = 9, K = 100, S0 = 90 for all assets, r = 0.05, δ_d = 0.1 and σ_d = 0.2 for d = 1, 2, 3, and ρ_de = 0 ∀ d ≠ e. For the binomial tree 90 time steps are used. The number of paths in the MC calculations are N0 = 500000 and N1 = 1000000. The seed is set according to (57) and the basis according to (70). Early exercise regions are calculated with the LSM. The drift when using importance sampling is set so that at-the-money is reached after 4 time steps. (All paths, not only in-the-money paths, are used in the regression step.)

References

[1] John C. Hull, Options, Futures, and Other Derivatives, Pearson Education, 8th edition, 2012.

[2] Paul Glasserman, Monte Carlo Methods in Financial Engineering, Springer Science, 2003.


[3] Francis A. Longstaff, Eduardo S. Schwartz, Valuing American Options by Simulation: A Simple Least-Squares Approach, The Review of Financial Studies, Vol. 14, No. 1, pp. 113-147, 2001.

[4] Christian Jonen, Efficient Pricing of High-Dimensional American-Style Derivatives: A Robust Regression Monte Carlo Method, PhD dissertation, University of Cologne, 2011.

[5] Peter Jäckel, Monte Carlo Methods in Finance, John Wiley & Sons, 2002.

[6] P. Boyle, J. Evnine, S. Gibbs, Numerical Evaluation of Multivariate Contingent Claims, The Review of Financial Studies, Vol. 2, No. 2, pp. 241-250, 1989.

[7] Tracey Andrew Tullie, Variance Reduction for Monte Carlo Simulation of European, American or Barrier Options in a Stochastic Volatility Environment, North Carolina State University, 2002.

[8] Steven L. Heston, A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options, The Review of Financial Studies, Vol. 6, No. 2, pp. 327-343, 1993.

[9] Leif Andersen, Mark Broadie, Primal-Dual Simulation Algorithm for Pricing Multidimensional American Options, Management Science, Vol. 50, No. 9, pp. 1222-1234, 2004.

[10] Mark Broadie, Paul Glasserman, Pricing American-style securities using simulation, Journal of Economic Dynamics and Control, Vol. 21, pp. 1323-1352, 1997.

[11] Espen Gaarder Haug, The Complete Guide to Option Pricing Formulas, McGraw-Hill Professional, 2nd edition, 2007.

[12] d-fine GmbH, Valuation of American Basket Options using Quasi-Monte Carlo Methods, MSc in Mathematical Finance, Trinity Term 2009, University of Oxford.

[13] Tomas Björk, Arbitrage Theory in Continuous Time, Oxford University Press, 3rd edition, 2009.

[14] L.C.G. Rogers, Monte Carlo valuation of American options, Mathematical Finance, Vol. 12, pp. 271-286, 2002.
