Delft University of Technology Faculty of Electrical Engineering, Mathematics and Computer Science Delft Institute of Applied Mathematics
Application of State Space Hidden Markov Models to the approximation of (embedded) option prices
A thesis submitted to the Delft Institute of Applied Mathematics in partial fulfillment of the requirements
for the degree
MASTER OF SCIENCE in APPLIED MATHEMATICS
by
Josephine Alberts Delft, the Netherlands November 2016
Copyright © 2016 by Josephine Alberts. All rights reserved.
MSc THESIS APPLIED MATHEMATICS
“Application of State Space Hidden Markov Models to the approximation of (embedded) option prices”
Josephine Alberts
Delft University of Technology
Responsible Professor: Prof. Dr. Ir. C.W. Oosterlee
Daily Supervisors: Prof. Dr. Ir. C.W. Oosterlee, Dr. Ir. L.A. Grzelak
Other thesis committee members: Ir. S.N. Singor, Dr. P. Cirillo
November 2016 Delft, the Netherlands
Acknowledgments
This thesis has been submitted for the degree Master of Science in Applied Mathematics at Delft University of Technology. The responsible professor is Kees Oosterlee, professor at the Numerical Analysis group of Delft Institute of Applied Mathematics. Research for this project was carried out at Ortec Finance, under the supervision of Stefan Singor. Ortec Finance is a company aiming to improve investment decision-making by providing consistent solutions for risk and return management through a combination of market knowledge, mathematical models and information technology. First of all, I would like to thank Kees Oosterlee, Stefan Singor and Lech Grzelak for their close involvement in this project and their valuable advice. I would also like to thank Pasquale Cirillo for being part of the examination committee. Furthermore I would like to thank my colleagues at Ortec Finance for providing a pleasant and inspiring working environment. Lastly, I would like to thank my family and friends for all their encouragement and moral support over the whole duration of my studies.
Abstract
This thesis discusses dimension reduction of the risk drivers that determine embedded option values by using the class of State Space Hidden Markov Models. As embedded options are typically valued by nested Monte Carlo simulations, this dimension reduction leads to a major reduction in computing time. This is especially important for insurance companies that deal with many embedded option valuations in order to determine the market value of their liabilities. To achieve the dimension reduction of the risk driver process, this thesis proposes a specific Hidden Markov Model approach. An overview of current methods for state and parameter inference within this class of models is presented. Insights into the state-of-the-art CPF-SAEM method are obtained by investigating an example of the dimension reduction model. Furthermore, the satisfactory behavior of this HMM approach is investigated in more detail for multiple (market) cases. Lastly, the dimension reduction model is applied to the calibration of the Heston model parameters to market data. It is shown that this approach avoids overfitting issues and results in a more stable model than direct calibration of the parameters.
Contents
1 Introduction
  1.1 General setting
  1.2 Research objectives
  1.3 Organization of the report

2 Overview of State Space Hidden Markov Models
  2.1 Introduction to Hidden Markov Models
  2.2 State inference
  2.3 Combined state and parameter inference

3 Dimension reduction in option valuation models by a HMM approach
  3.1 Model description
  3.2 Black-Scholes example
  3.3 Benchmark: Kalman Filter within the EM framework
  3.4 Solving the BS example with the CPF-SAEM method
  3.5 Influence of the underlying HMM
  3.6 Conclusions

4 Test cases for the HMM approach
  4.1 Non-linear example
  4.2 Extensive example: Heston model
  4.3 Market example: basket of S&P-500 index options
  4.4 Conclusions

5 Application to reduction of overfitting in the Heston model
  5.1 Calibration of the Heston model
  5.2 Overfitting
  5.3 Hidden Markov Model approach
  5.4 Out-of-sample testing
  5.5 Conclusions

6 Conclusions
  6.1 Summary and conclusions
  6.2 Future research

References

A Conditioning on the particle with highest weight
B The Unscented Kalman Filter
C Correlation matrices for the risk drivers in Section 4.3
D Market Data for the Heston Calibration in Chapter 5
E Alternative conditionings for the second out-of-sample test in Section 5.4
CHAPTER 1
Introduction
1.1 General setting
Asset and Liability Management
Asset and liability management (ALM) plays an important role in the strategic decision making of liability-driven companies, such as insurers, pension funds, housing corporations and banks. It refers to the practice of managing the risks faced by a company that arise due to a mismatch between assets and liabilities [68]. Within ALM, the maximum allowable risk with respect to the objectives and constraints of the stakeholders is determined by analyzing the balance sheet. Thereafter, it helps specify policies which provide optimal returns given that maximum risk. ALM models are used as guidance to determine, for example, contribution, premium, indexation and investment policies. Besides this, the models also need to provide insight and transparency for regulating authorities and for other stakeholders, especially after the financial crisis of 2008. Most ALM problems involve many different stakeholders. In a pension plan the stakeholders are for example the sponsor, employees and beneficiaries (retired and non-active members of the plan). An insurance company has to consider for example policyholders and shareholders. Both also need to take into consideration indirect stakeholders such as regulators, government and accountants. This wide variety of stakeholders can have conflicting interests and requirements. For example, for shareholders it is important to have stable and high returns on their invested equity. However, investing in risky assets which provide higher expected returns implies more solvency risk, and this is not allowed by the regulator [69]. In practice, ALM problems are approached with scenario analysis, in which external uncertainties are modeled by a set of plausible future developments, called scenarios.
The external uncertainties concern both the future development of economic variables such as interest rates, risk premiums of equity and inflation, and the development of non-economical variables such as the coverage ratio and the size and composition of the group of policyholders. The scenarios are constructed to capture as many stylized facts of the market as possible, based on historical data and assumptions (market models and expert views). These scenarios form the input for an ALM model which determines scores on the required ALM criteria with respect to the objectives and constraints that the management of the company has set. In Figure 1.1 a visualization of the scenario approach for ALM problems is given. We refer to recent Ortec Finance papers for a complete overview [59] and the relevance [60] of this scenario approach. In the comprehensive handbooks [68, 69] more information about general ALM techniques can be found.
Figure 1.1: ALM approach by scenario analysis, adapted from [69]
Valuation of embedded options
Following the financial crisis of 2008, new regulatory frameworks and accounting standards (e.g. Solvency II) were introduced. Insurance companies and pension funds are now obliged to value their liabilities at market value instead of at book value (which meant that future cash flows were simply discounted at a fixed interest rate) [62]. Especially for insurance companies, asset and liability management has become much more complicated, because they need to determine the amount of capital they have to hold against unforeseen losses. A difficult aspect of this calculation is the market valuation of so-called embedded options. An embedded option is built into the structure of a financial security and gives one of the parties the right, but not the obligation, to exercise some action by a certain date on terms that are established in advance. These options typically have a long-term contract duration and are very sensitive to interest rates. An example is a policy conversion option, which gives the insurance policyholder the right to convert the current policy into another at pre-specified conditions [55]. Ortec Finance has developed an advanced simulation framework in which these complicated insurance liabilities can be modeled and many other questions concerning investment decisions can be answered. The value of an embedded option calculated by Ortec Finance is denoted by
V (r1, . . . , rn).
In their valuation model, the price thus depends on n economical and non-economical variables, the so-called risk drivers [51]. To value an embedded option at time step t > 0, real world scenarios are generated based on assumed distributions for all of these risk drivers under the real world measure P. These distributions correspond to appropriate time-series models for specific variables, for example a Hull-White model for modeling interest rates. As mentioned before, real world scenarios are thus instances of all of the risk drivers (r̂1, . . . , r̂n)_t at each time step t > 0. Since the valuation function V(r̂1, . . . , r̂n) is typically not known in closed form, multiple Monte Carlo simulations have to be generated under the risk neutral measure Q in order to determine the option value. Note that these so-called risk neutral scenarios have to be calculated for every real world scenario at each time step. This leads to time-consuming nested Monte Carlo simulations (see Figure 1.2). To reduce computing times for the clients of Ortec Finance, we would like to reduce the dimension of the risk driver process in the option valuation model. This will lead to a major reduction of the number of real world scenarios required for the option valuation, and therefore to an even greater reduction of the number of risk neutral scenarios. For example, if we have 5 risk drivers determining the option price and we want to generate scenarios with 3 possible realizations for each of the risk drivers, this results in 3^5 = 243 different real world scenarios. If we could approximate the option price using only 2 risk drivers, it would require only 3^2 = 9 real world scenarios to obtain 3 possible realizations of each of the risk drivers. If every real world scenario leads to 10 risk neutral scenarios, we would obtain a reduction of 2340 risk neutral Monte Carlo runs at every time step for this small example. In practice, a risk neutral valuation consists of tens of thousands of scenarios for thousands of real world scenarios at multiple time points. We will obtain the dimension reduction by assuming that the n-dimensional risk driver process is driven by some lower dimensional hidden process which captures the most important properties of the risk drivers.
This hidden process needs to be inferred from relevant option instruments that are observed in the market. We will make use of the class of State Space Hidden Markov Models, which provides a general modeling framework and is used in a broad range of applications. Besides the dimension reduction itself, we will investigate whether this Hidden Markov Model approach leads to more stable out-of-sample option valuations compared to regular calibration of the model parameters. In other words, we will analyze whether this approach avoids overfitting issues and performs well on unseen data.
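The scenario-count arithmetic of the small example above (5 versus 2 risk drivers, 3 realizations each, 10 risk neutral scenarios per real world scenario) can be checked with a short script; the function and parameter names here are ours, chosen for illustration only.

```python
# Count nested Monte Carlo scenarios for the small example in the text.
def nested_runs(n_drivers, k_realizations, m_rn_per_rw):
    """Return (#real-world scenarios, #risk-neutral runs) per time step."""
    rw = k_realizations ** n_drivers      # one scenario per combination
    return rw, rw * m_rn_per_rw           # each spawns m risk-neutral runs

rw_full, rn_full = nested_runs(5, 3, 10)  # all 5 risk drivers
rw_red, rn_red = nested_runs(2, 3, 10)    # reduced 2-dimensional process
print(rw_full, rn_full)                   # 243 2430
print(rw_red, rn_red)                     # 9 90
print(rn_full - rn_red)                   # 2340 fewer runs per time step
```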
Figure 1.2: Visualization of the real world (dark blue) and risk neutral (light blue) scenarios
1.2 Research objectives
The aim of this thesis is to investigate dimension reduction of the real world risk driver process determining option values in a risk neutral option valuation model V (r1, . . . , rn). We note that for this purpose we cannot rely on standard dimension reduction techniques, such as the well-known Principal Component Analysis [67]. The drawback of these methods is that they only reduce the dimension of the data matrix of the risk driver process at each time step, which makes it unclear how to compute the option value. A method is necessary that also takes the transformation from the lower dimensional process to the option prices into account. The class of State Space Hidden Markov Models fulfills this requirement. Another major advantage of this class of models is that within the model itself we already assume a transition distribution f for the hidden states that drive the risk drivers. Therefore, we do not have to separately calibrate a time-series model to the estimated hidden states in order to generate real world scenarios. We can simply take the estimated states and model parameters and sample realizations of the hidden state process according to this transition distribution f, see Figure 1.3.
Figure 1.3: Visualization of the scenario generation of the hidden states: model parameters θ and hidden states Xt are estimated from historical option prices Vt
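This scenario-generation step can be sketched in a few lines. We stress that the one-dimensional Gaussian AR(1)-type transition density f with estimated parameters θ = (a, q) below is purely an illustrative assumption of ours; the transition densities actually used are part of the models defined later in this thesis.

```python
import random

def sample_state_scenarios(x_last, theta, n_scen, horizon, seed=1):
    """Sample real world scenarios for the hidden state by drawing from
    an assumed transition density f: here x' | x ~ N(a*x, q), with
    theta = (a, q) the estimated parameters (illustrative choice)."""
    a, q = theta
    rng = random.Random(seed)
    scenarios = []
    for _ in range(n_scen):
        x, path = x_last, []
        for _ in range(horizon):
            x = rng.gauss(a * x, q ** 0.5)  # one draw from f(. | x)
            path.append(x)
        scenarios.append(path)
    return scenarios

paths = sample_state_scenarios(x_last=0.2, theta=(0.9, 0.01),
                               n_scen=100, horizon=12)
print(len(paths), len(paths[0]))  # 100 12
```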
The objectives of this thesis are as follows:
- Gain insight into the class of State Space Hidden Markov Models and make an inventory of existing methods for state and parameter inference within these models.
- Propose and test a Hidden Markov Model for dimension reduction of the risk driver process in option valuation models.
- Apply this approach to reduce the problem of overfitting within the Heston model.
Since it is complicated to construct an example where we calculate the price of some embedded option in the nested simulation framework of Ortec Finance, we restrict the analysis in this thesis to calculating values of European call and put options. Note that this means that we replace the risk neutral Monte Carlo simulations based on an instance (r̂1, . . . , r̂n)_t of all risk drivers (the gray square in Figure 1.2) by, for example, simply evaluating the Black-Scholes formulas for this instance. The key point is that we need a realization of all risk drivers in order to determine the option value (either by risk neutral MC simulations or in closed form). The aim of this thesis is to reduce the dimension of this risk driver process.
A relationship between European options and embedded options can be found, for example, in unit-linked life insurance products. A unit-linked life insurance product is a contract between a policyholder and an insurance company. The policyholder pays either a regular premium or a lump sum, which is invested by the insurance company. The insurance company promises to pay out a guaranteed amount when the contract expires. This guaranteed return is an example of an embedded option and can be seen as a put option written by the insurance company. If the fund value is below the guaranteed value at expiration, the option is ‘in-the-money’ and the insurance company has to settle the difference. A very crude approximation of the price of such an option is given by the price of some European put option, although in practice these guarantee options are valued by risk neutral Monte Carlo simulations [51].
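To make this crude approximation concrete, a guarantee at level K on a fund with current value S can be proxied by a standard Black-Scholes put; the interpretation of S as fund value and K as guarantee level is ours, and, as noted above, actual guarantee options are valued by risk neutral simulations.

```python
import math

def bs_put(S, K, T, r, sigma):
    """Black-Scholes price of a European put with spot S, strike K,
    maturity T, risk-free rate r and volatility sigma."""
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # normal CDF
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return K * math.exp(-r * T) * N(-d2) - S * N(-d1)

# Fund value 100, guarantee 90, 10-year horizon: the guarantee is
# currently out-of-the-money but still carries time value.
print(round(bs_put(S=100.0, K=90.0, T=10.0, r=0.02, sigma=0.2), 2))
```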
1.3 Organization of the report
In Chapter 2 we will first give a general introduction to Hidden Markov Models and Bayesian inference. After this, we will present an overview of existing methods for state inference, with a special focus on Particle Filters. Then, we will show how to use these methods within a framework for combined state and parameter estimation, which leads to the recent CPF-SAEM method. In Chapter 3 we present a new approach for dimension reduction in risk neutral option valuation models by defining a specific Hidden Markov Model. We present an example of this model and test the CPF-SAEM method for state and parameter inference on this example. We continue the examination of our Hidden Markov Model approach in Chapter 4, where we consider test cases for which we expect convergence difficulties. Besides this, we investigate the estimated hidden states and the error distribution of a market example. In Chapter 5 we show that direct calibration of the Heston model parameters leads to overfitting. We then apply our Hidden Markov Model approach to this Heston calibration and show that this results in a more stable model. We finish this chapter by performing two out-of-sample tests. Lastly, Chapter 6 contains the overall conclusions of this thesis, as well as recommendations for future research.
CHAPTER 2
Overview of State Space Hidden Markov Models
In this chapter a (non-exhaustive) overview of methods available for inference in State Space Hidden Markov Models is presented. For clarity, only a brief overview of the most important methods is given. We especially elaborate on the famous Kalman Filter and on methods relevant for the understanding of the sophisticated CPF-SAEM algorithm introduced in 2013 in [45]. In Section 2.1 a general introduction to Hidden Markov Models and Bayesian inference is presented. Section 2.2 describes the most important algorithms for state inference in these models. We show how to use these state inference methods in frameworks for combined state and parameter inference in Section 2.3.
2.1 Introduction to Hidden Markov Models
Hidden Markov Models provide a general and flexible framework for modelling time-series in a broad range of applications. Examples of application areas are financial mathematics, machine learning, telecommunication, gene prediction and speech recognition. A broad and thorough introduction to the field can be found in the books of Cappé [10] and Särkkä [56]. Let (Ω, F, P) be a probability space, where Ω represents the space of all possible states in the real world financial market and P is the physical (or real world) probability measure. By the filtration {Ft}t≥0 ⊆ F we represent all information available up to time t. All stochastic processes described in this thesis are defined on this probability space. We start by recalling the definitions of the Markov property and a Markov process. Then we provide the definition of a Hidden Markov Model as given in the tutorial [27].
Definition 2.1 (Markov property [9]). Let (Ω, F,P ) be a probability space with a filtration {Fn}n≥1 ⊆ F. An X -valued stochastic process {Xn}n≥1 adapted to the filtration satisfies the Markov property with respect to {Fn}n≥1 if
P (Xn ∈ A | Fs) = P (Xn ∈ A | Xs) for each A ∈ X and for each s < n.
Definition 2.2 (Markov process [9]). A Markov process is a stochastic process that satisfies the Markov property with respect to its natural filtration.
Definition 2.3 (Hidden Markov Model [27]). Consider an X-valued discrete-time Markov process {Xn}n≥1 such that
X1 ∼ µ(x1) and Xn | (Xn−1 = xn−1) ∼ f (xn | xn−1) , (2.1)
where Xn is the state of the model at time n, µ(x) is a probability density function and f(x | x′) denotes the transition probability density associated with moving from x′ to x. We are interested in {Xn}n≥1 but can only observe a Y-valued process {Yn}n≥1. Given {Xn}n≥1, the observations {Yn}n≥1 are statistically independent and their marginal densities are given by
Yn | (Xn = xn) ∼ g(yn | xn),   (2.2)

where g(y | x) denotes the observation probability density. Models compatible with (2.1)-(2.2) are called Hidden Markov Models (HMM) or general state-space models.
In Figure 2.1 we show the dependence structure of a HMM graphically. The observations {Yn}n≥1 can for example represent the observed value of embedded option(s) at time n.
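A minimal simulation of such a model can be sketched as follows; the Gaussian choices for µ, f and g below are our own illustrative example, not prescribed by the definition.

```python
import random

def simulate_hmm(T, seed=0):
    """Sample a path (x_1..x_T, y_1..y_T) from a toy HMM of the form
    (2.1)-(2.2): X1 ~ mu = N(0, 1), Xn | Xn-1 ~ f = N(0.9 x_{n-1}, 0.1),
    Yn | Xn ~ g = N(x_n, 0.05) (illustrative densities)."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)                   # X1 ~ mu
    xs, ys = [], []
    for _ in range(T):
        xs.append(x)
        ys.append(rng.gauss(x, 0.05 ** 0.5))  # observation Yn | Xn ~ g
        x = rng.gauss(0.9 * x, 0.1 ** 0.5)    # transition Xn+1 | Xn ~ f
    return xs, ys

xs, ys = simulate_hmm(100)
print(len(xs), len(ys))  # 100 100
```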
Figure 2.1: Graphical representation of the dependence structure of a Hidden Markov Model.
We now want to estimate the hidden states x1:T = (x1, . . . , xT ) from the observed measurements y1:T = (y1, . . . , yT ). This means, in Bayesian sense, that we want to compute the joint posterior distribution of all states given all observations. To achieve this, note that the Hidden Markov Model given by (2.1) and (2.2) can be analyzed by using Bayesian techniques, where the joint prior distribution is given by
p(x1:T) = µ(x1) ∏_{n=2}^{T} f(xn | xn−1),   (2.3)

and the joint likelihood function by

p(y1:T | x1:T) = ∏_{n=1}^{T} g(yn | xn).   (2.4)
Now the posterior distribution can be calculated by a straightforward application of Bayes’ theorem and equations (2.3)-(2.4):
p(x1:T | y1:T) = p(x1:T, y1:T) / p(y1:T)
              = p(y1:T | x1:T) p(x1:T) / ∫ p(x1:T, y1:T) dx1:T
              = [ µ(x1) ∏_{n=1}^{T} g(yn | xn) ∏_{n=2}^{T} f(xn | xn−1) ] / ∫ p(x1:T, y1:T) dx1:T,   (2.5)

where p(y1:T) can be seen as a normalizing constant. In a few special cases it is possible to calculate the posterior (2.5) in closed form. However, for most non-linear non-Gaussian models this is not possible and we have to rely on numerical methods to estimate it. In the next section we will investigate techniques to sample from this posterior distribution and its marginals.
2.2 State inference
Within state inference we can distinguish between the optimal filtering and smoothing problems. Filtering means estimating the underlying hidden states up to time n, given the observations up to time n. It can refer to sequentially estimating the joint distributions {p(x1:n | y1:n)}n≥1; in some literature the term is alternatively used to describe estimation of the marginal distributions {p(xn | y1:n)}n≥1. In this thesis we will state explicitly which filtering distribution we refer to. Smoothing means using future observations when estimating distributions at a certain time, i.e. using filtering techniques to sequentially estimate the marginals {p(xk | y1:n)} where k ≤ n. In general, smoothing is computationally more challenging but leads to smoother trajectory estimates than filtering.
We can recursively compute the filter distributions p (x1:n | y1:n) and p (xn | y1:n) of the HMM defined by (2.1) and (2.2) [11]. Since we know that
p(x1:n, y1:n) = p(x1:n−1, y1:n−1) f(xn | xn−1) g(yn | xn),

we can consequently calculate the posterior by Bayes’ theorem and the Markov property of the HMM with the following recursion:

p(x1:n | y1:n) = p(x1:n−1 | y1:n−1) f(xn | xn−1) g(yn | xn) / p(yn | y1:n−1),

where, by the law of total probability and the Markov property,

p(yn | y1:n−1) = ∫ p(xn−1 | y1:n−1) f(xn | xn−1) g(yn | xn) dxn−1:n.

By integrating out x1:n−1, the recursion satisfied by the marginal filter distribution p(xn | y1:n) can be obtained:

p(xn | y1:n) = g(yn | xn) p(xn | y1:n−1) / p(yn | y1:n−1),

where

p(xn | y1:n−1) = ∫ f(xn | xn−1) p(xn−1 | y1:n−1) dxn−1

is called the Chapman-Kolmogorov equation [56, 27]. These recursion formulas explain the sequential approach of all filtering and smoothing methods. The history of state inference in Hidden Markov Models starts with the Wiener Filter in 1950 [66], which was later shown to be a limiting special case of the well-known Kalman Filter. See Figure 2.2 for an overview of the most important methods for estimating filtering and smoothing distributions.
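When the state space X is finite, the integrals in these recursions become sums and the marginal filter can be computed exactly. The following sketch implements the prediction (Chapman-Kolmogorov) and update steps for that case; the two-state example at the bottom is our own.

```python
def forward_filter(mu, f, g, ys):
    """Compute the marginal filter p(xn | y1:n) for a finite state space.
    mu: prior over states, f[i][j] = f(x_j | x_i), g(y, j) = g(y | x_j)."""
    K = len(mu)
    post = None
    for n, y in enumerate(ys):
        if n == 0:
            pred = list(mu)                   # p(x1) = mu
        else:                                 # Chapman-Kolmogorov as a sum
            pred = [sum(post[i] * f[i][j] for i in range(K))
                    for j in range(K)]
        unnorm = [g(y, j) * pred[j] for j in range(K)]  # update with g
        z = sum(unnorm)                       # = p(yn | y1:n-1)
        post = [u / z for u in unnorm]
    return post

# Sticky two-state chain observed through a noisy channel.
mu = [0.5, 0.5]
f = [[0.9, 0.1], [0.1, 0.9]]
g = lambda y, j: 0.8 if y == j else 0.2
print(forward_filter(mu, f, g, [0, 0, 0]))  # mass concentrates on state 0
```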
Figure 2.2: Overview of methods for state inference in Hidden Markov Models
The Kalman Filter
The widespread Kalman Filter (KF), also known as the Kalman-Bucy Filter, was first introduced and partially developed in the historical papers of Kalman and Bucy in 1960 [40, 41]. Due to its great importance in engineering and econometrics applications, a large body of literature on the filter exists. See for example [13] for a recent discussion of the mathematical theory, computational algorithms and applications of the Kalman filter. An early overview of the filter and its derivation can be found in [4]. For the special case that the state space model is linear and Gaussian, the KF gives an exact numerical evaluation of the underlying hidden states. It is a recursive estimator that first predicts the state from the estimate at the previous time step. Then, it combines this prediction with the current observation to refine the state estimate by calculating a weighted average. The estimates are chosen in such a way that the mean-squared error is minimized. Although the original derivation of the KF was based on the least squares approach, it is also possible to obtain its equations by a purely probabilistic Bayesian analysis [56]. Mathematically, the Kalman Filter gives an exact solution to the Hidden Markov Model where the hidden states depend linearly on the previous states and the observations depend linearly on
the current states, both with some additive noise. This system is given by

xn = An−1 xn−1 + Rn−1 Un−1,
yn = Bn xn + Sn Vn,

where Un, Vn ∼ N(0, I) and x0 ∼ N(0, Σ0) are all uncorrelated. The state transition matrix An−1, the measurement transition matrix Bn, the square root of the state process noise covariance Rn−1 and the square root of the measurement noise covariance Sn are all known matrices with appropriate dimensions. Note that this model defines a special case of the general state space model in equations (2.1) and (2.2), where X = R^n, Y = R^m, µ(x1) = N(0, Σ0) and