Amsterdam School of Economics Faculty of Economics and Business

Managing Risk with a Dynamic SABR Model

MSc in Econometrics Financial Econometrics track

Frank de Zwart 10204245

supervised by Dr. S.A. Broda

and by Ms. Hiltje Bijkersma, supervisor at ABN AMRO

July 28, 2017

ABN AMRO Bank N.V. CRM | Regulatory Risk | Model Validation

Statement of Originality

This document is written by student Frank de Zwart, who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Abstract

This thesis focuses on models that can be used to estimate risk measures, like Value at Risk and Expected Shortfall. The displaced Black's model and the displaced SABR model are used to price a portfolio of swaptions. The aim here is to capture the dynamics of the SABR parameters in a time series model to obtain more accurate swaption risk estimates. This time series model is used to simulate the one-day-ahead profit and loss distribution and is then compared to the Historical Simulation method. In an empirical study, we compute the Value at Risk and Expected Shortfall estimates based on the Historical Simulation method as well as the time series model. These models are analyzed with several backtests and diagnostic tests to be able to answer the following research question. Can one outperform the Historical Simulation Value at Risk and Expected Shortfall forecasts by fitting a time series model to the calibrated SABR model parameters instead? A vector autoregressive model is used as well as a local level model. Based on these two models, we are not able to outdo the Historical Simulation estimates of the risk measures. Diagnostic tests show remaining significant autocorrelation as well as heteroskedasticity in the residuals of the vector autoregressive model. The backtests that are carried out also show that the vector autoregressive model performs worse than the Historical Simulation method.

Contents

1 Introduction

2 Preliminaries on financial notation
2.1 Interest rate instruments
2.2 Bootstrapping the zero curve
2.3 Swaptions
2.4 Martingales and Measures

3 Literature review

4 Models and method
4.1 Option pricing models
4.2 Time series analysis
4.3 Risk measurement
4.4 Backtests

5 Data
5.1 Calculating the implied volatilities
5.2 Leaving out some strikes

6 Empirical study and results
6.1 Calibrating the SABR model parameters
6.2 Fitting a model through the SABR parameters time series
6.3 Risk measurement
6.4 Backtests
6.5 Robustness check: Local level model

7 Conclusion

References

A Appendix

1 Introduction

The Basel Committee (2013) has introduced the Fundamental Review of the Trading Book (FRTB). To contribute to a more resilient banking sector, they have decided to change the current framework's reliance on Value at Risk (VaR) to the Expected Shortfall (ES) measure to estimate market risk. On the other hand, Pérignon and Smith (2010) state that most banks use Historical Simulation (HS) to estimate their VaR. This Historical Simulation method computes the VaR by using past returns of the portfolio's present assets, so that one obtains a distribution of price changes that would have been realized had the current portfolio been held throughout the observation period. The decision described in the FRTB shows that it is becoming even more important for financial institutions to estimate their market risk accurately. However, we also see that a relatively simple method is still used to obtain these risk measures. One of the main drawbacks of this Historical Simulation method is that it does not take the decreasing predictability of older returns into account.

Derivatives are traded extensively these days, and one of these products is a swap option, or swaption. A swaption is an option on an interest rate swap. Swaptions are traded over-the-counter, so compared to derivatives that are traded on an exchange, the information is more scarce and not publicly available. This makes it an interesting challenge to find an accurate method to assess the risk of holding these derivatives. Besides this, negative interest rates also affect almost all the valuation methods for these options. In the current interest rate environment, the Historical Simulation method that is used to produce the VaR and ES estimates of market risk may not be reliable. Hence, finding a method to get more reliable estimates for the VaR and ES, based on historical swaption data, is of interest. This leads to the following research question, which defines the main purpose of this thesis. Can one outperform the Historical Simulation Value at Risk and Expected Shortfall forecasts by fitting a time series model to the calibrated SABR model parameters instead?

An empirical study is performed to be able to answer this question. This study is based on an ICAP data set of swaption premiums, interest rate deposits, and interest rate swaps. The time series of the displaced SABR volatility model parameters, on a time grid of approximately 2.5 years, will be analyzed to obtain a one-day-ahead forecast of the price of a portfolio of swaptions. Finally, a backtesting procedure will assess the quality of this new method compared to the well-known Historical Simulation method.

The remainder of this research report is structured as follows. First, in Section 2, the necessary background theory for this research will be discussed. This includes theory on interest rate instruments in general as well as a description of an interpolation method called bootstrapping to obtain the zero and discount curves. We will then give a description of a swaption and discuss some of its relevant trading strategies. This section is concluded with a description of martingales and measures. Then we will briefly discuss the relevant literature for this research in Section 3 and subsequently continue in Section 4 with the theory that is used to price the swaptions. In this section, the well-known model of Black (1976) is described in detail. Besides this, we will focus on the SABR volatility model of Hagan et al. (2002) and the correction to their work by Obłój (2008).
We will then discuss the implications of negative interest rates on these models. In Section 4.2, some basic time series models that are used in this research will be discussed. We will then continue with the risk measurement concepts, like Value at Risk and Expected Shortfall. Finally, several different backtests will be described. Different backtests are used to be able to assess the quality of our model estimates as thoroughly as possible. The data set will be described in Section 5. Not only the raw data will be described, but also the different pre-processing techniques will be explained. This section also contains information on some of the limitations and

argumentation on why some adjustments are made. In the next section, Section 6, the empirical study and results are described. This section follows the structure of Section 4 and starts with the calibrated SABR parameters, continues with the time series analysis and risk measurement, and concludes with the backtesting procedure. Besides the backtests themselves, some diagnostic tests are carried out to assess the quality of the fit of the time series analysis. The results are discussed at every step. Finally, in Section 7, the main findings are summarized and a conclusion is drawn. The research question will be answered and some limitations and recommendations for further research will be provided.

2 Preliminaries on financial notation

Trading in derivatives has become an indispensable part of the financial industry. There are multiple different derivatives for every type of investment asset. The magnitude of this market shows that it is of great importance to have an understanding of how these derivatives work. Consequently, many researchers have focused on these derivatives. Numerous papers and books describe how derivatives work and what risks the holder of an open position in them is taking. We will first explain some basic but crucial concepts of interest rate instruments. Then, in Section 4, we will describe the models and methods that are applied in the empirical analysis of this research.

2.1 Interest rate instruments

Interest rates are crucial in the valuation of derivatives. Especially the 'risk-free' rate is of concern when evaluating derivatives. Hull (2012) explains that the interest rates implied by Treasury bills are artificially low because of a favorable tax treatment and other regulations. For this reason the LIBOR rate became commonly used instead. However, when the rates ascended during the crisis in 2007, many derivatives dealers had to review their practices. The LIBOR rate is the short-term opportunity cost of capital of AA-rated financial institutions and it rose sharply due to the unwillingness of banks to lend to each other during the crisis. Many dealers have now switched to using overnight indexed swaps (OIS), because they are closer to being 'risk-free'. This research focuses on the Euro market and makes use of the Euribor rate. The Euribor rate is similar to the LIBOR rate; however, it is based only on a panel of European banks. The rate at which these banks borrow funds from one another is the Euro Interbank Offered Rate (Euribor). Although the Euribor rate is not theoretically risk-free, it is still considered a good alternative against which to measure the risk and return trade-off. There is one very important assumption that makes the risk-free rate even more crucial. This is known as the assumption of a risk-neutral world. In a risk-neutral world it is assumed that all investors are risk-neutral. In other words, they do not require a higher expected return from an investment that is more risky.

Theorem 2.1. This leads to the following two characteristics of a risk-neutral world (Hull, 2012):

1. The expected return on an investment is the risk-free rate.

2. The discount rate used for the expected payoff on a financial instrument is the risk-free rate.

This makes the pricing of derivatives much simpler. The world is not actually risk-neutral; however, it can be shown that if we compute the price of a derivative under the risk-neutral world

assumption, we obtain the correct price for the derivative in all worlds. This makes a significant difference, because much is still unknown about the risk preferences of buyers and sellers of derivatives. The main focus of this research is on swaptions. The underlying of this product is an interest rate swap, and therefore this derivative will be discussed first. However, before the swaps are considered, we briefly discuss forward rate agreements. These agreements give insight into how we can price a swap. We will then continue with an interpolation method, known as bootstrapping, that is used to obtain the zero curve. Then the swaption will be explained together with some of its most common trading strategies. Finally, we will continue with theory about martingales and measures. These measures are used to compute the discounted expected value of a certain future payoff. A forward rate agreement (FRA) is an agreement defined to ensure that a certain interest rate will apply to either borrowing or lending a certain principal during a specified future period of time. We define $R_K$ as the interest rate agreed to in the FRA and define $R_F$ as the forward value of the reference rate at time $T_\alpha$ for the period between times $T_\alpha$ and $T_\beta$. We denote the value of a FRA at time $t$, where $R_K$ is received, as

$$V_{FRA}(t) = L\,(R_K - R_F)(T_\beta - T_\alpha)P(t, T_\beta), \tag{2.1}$$

where $L$ is the principal of the FRA and $P(t,T)$ is the present value at time $t$ of 1 Euro received at time $T$ (Brigo and Mercurio, 2007). The forward interest rate that is used in FRAs is implied by current zero rates for periods of time in the future. An $n$-year zero rate is the rate of interest earned on an investment that starts today and lasts for $n$ years. All the interest and principal is realized at the end of $n$ years. A curve of zero rates can be created from market quotes by using a popular interpolation method known as the bootstrap method. This method will be described in more detail in Section 2.2. A fixed-for-floating swap, also known as a payer swap, is the most common type of swap. In this swap an investor agrees to pay interest at a predetermined fixed rate on a notional principal for a predetermined number of years. In return it receives interest at a floating rate on the same notional principal for the same period of time. A swap can be characterized as a portfolio of forward rate agreements and this can be used to determine its value. The value of the swap is simply the sum of multiple FRAs, so we find that the value of a payer swap is given by

$$V_{swap}(t) = L \sum_{i=\alpha+1}^{\beta} (R_K - R_{F_i})(T_i - T_{i-1})P(t, T_i), \tag{2.2}$$

where the length of the swap, $T_\beta - T_\alpha$, is called the tenor, with $n$ years between $T_\alpha$ and $T_\beta$ and $m$ cash flows per year. Throughout this entire paper we will denote $m$ as the swap payment frequency per annum. So in total we have $n \times m$ cash flows, which can be valued like FRAs. This leads to the sum shown in (2.2), which sums over $n \times m$ different cash flows in total (Brigo and Mercurio, 2007).
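To make this concrete, the sketch below values an FRA and a swap as in equations (2.1) and (2.2). It is a minimal illustration rather than the thesis's implementation: the discount factors and forward rates are hypothetical inputs that would normally come from the bootstrapped curve of Section 2.2, and the sign convention follows eq. (2.2), where the fixed rate $R_K$ is received in each FRA.

```python
import numpy as np

def fra_value(L, R_K, R_F, T_alpha, T_beta, P_t_Tbeta):
    """Value of an FRA in which the fixed rate R_K is received, eq. (2.1)."""
    return L * (R_K - R_F) * (T_beta - T_alpha) * P_t_Tbeta

def swap_value(L, R_K, forwards, times, discounts):
    """Value of a swap as a portfolio of FRAs, following eq. (2.2).

    times:     payment times T_alpha, ..., T_beta
    forwards:  forward rates R_{F_i}, one per accrual period
    discounts: discount factors P(t, T_i), one per payment date
    """
    accruals = np.diff(times)  # T_i - T_{i-1}
    return L * np.sum((R_K - np.asarray(forwards)) * accruals
                      * np.asarray(discounts))

# Example with made-up numbers: a 2-year swap with semi-annual payments.
times = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
print(swap_value(L=1e6, R_K=0.01, forwards=[0.008, 0.009, 0.011, 0.012],
                 times=times, discounts=[0.999, 0.997, 0.994, 0.990]))
```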

2.2 Bootstrapping the zero curve

Only spot rates are quoted in the market, so the bootstrapping method is used to obtain the forward rates and forward swap rates. This method works by incrementally computing zero-coupon bond prices in order of increasing maturity. As curve inputs, multiple market quotes based on the Euribor rate are used first. More precisely, interest rate deposits with different maturities varying from overnight up to 3 weeks are used. To expand


the time grid out to 30 years, swaps are used with a maturity varying from 1 month up to 30 years. The interpolation over these market quotes will give us all zero-coupon prices, which we can use to compute the forward rates and the forward swap rates. Uri (2000) lists the different payment frequencies, compounding frequencies and day count conventions, as applicable to each currency-specific interest rate type. The conventions for the Euro rates are used for this research, namely the day count convention of ACT/360 for Euro deposit rates and the day count convention of 30/360 for Euro swap rates, respectively. The deposit rates that are used for the time grid of the swap curve up to 3 weeks are inherently zero-coupon rates. For this reason they only need to be converted to the base rate compounding frequency and day count convention. The day count convention of the deposit is ACT/360, so we can directly interpolate the data points to obtain the first part of the zero curve. For the middle part of the curve one could use market quotes of forward rate agreements, as described by Uri (2000). This can be preferable, because they carry a fixed time horizon to settlement and settle at maturity. However, FRAs can lack liquidity, which results in inaccurate market quotes. For this reason only swaps and deposits are used. The annually compounded zero swap rate is used to construct most of the zero curve. The different day count convention of the swaps is taken into account. The discount rates are computed based on the deposit and swap rates. Brigo and Mercurio (2007) define the zero curve at time $t$ as a graph of the simply-compounded interest rates for maturities up to one year and of the annually compounded rates for maturities larger than one year. The simply compounded and the annually compounded interest rates are defined as follows

$$L(t,T) = \frac{1 - P(t,T)}{(T - t)\,P(t,T)}, \tag{2.3}$$

$$\text{and} \quad Y(t,T) = \frac{1}{[P(t,T)]^{1/(T-t)}} - 1, \tag{2.4}$$

where $L(t,T)$ represents the simply compounded interest rate at time $t$ for maturity $T$ and $Y(t,T)$ represents the annually compounded interest rate, respectively. The simply compounded interest rates that represent the first part of the zero curve are now combined with the annually compounded rates that are used for the other part of the zero curve. To do so we first define $R(t_i)$ as the interest rate corresponding to maturity $t_i$, where $i$ is the market observation index. Hence, $R(t_i)$ represents the simply compounded interest rate if $t_i \le 1$ and the annually compounded interest rate if $t_i > 1$. There is no single way to construct this complete zero curve correctly. It is however important that the derived yield curve is consistent, smooth, and closely tracks the observed market points. Uri (2000) however also mentions that over-smoothing the yield curve might cause the elimination of valuable market pricing information. Piecewise linear interpolation and piecewise cubic spline interpolation are two commonly used methods that are appropriate for market pricing. The piecewise linear interpolation method is simple to implement, because the value of a new data point is simply assigned according to its position along a straight line between observed market data points. One drawback of this method however is that it produces kinks in the areas where the yield curve is changing slope. The piecewise linear interpolation can be constructed in closed form as follows

$$R(t) = R(t_i) + \frac{t - t_i}{t_{i+1} - t_i}\,[R(t_{i+1}) - R(t_i)], \tag{2.5}$$

where $t_i \le t \le t_{i+1}$. To avoid the kinks produced by the linear method, one can choose to fit a polynomial function through the observed market data points instead. It is possible to either use a single high-order polynomial or a

number of lower-order polynomials. The latter method is preferred, because the extra degrees of freedom can be used to impose additional constraints to ensure smoothness of the curve. The piecewise cubic spline technique goes through all observed data points and creates the smoothest curve that fits the observations and avoids kinks. We can construct a cubic polynomial for each of the $n - 1$ splines between the $n$ market observations. Now let $Q_i(t)$ denote the cubic polynomial associated with the segment $[t_i, t_{i+1}]$

$$Q_i(t) = a_i(t - t_i)^3 + b_i(t - t_i)^2 + c_i(t - t_i) + R(t_i), \tag{2.6}$$

where $R(t_i)$ again represents market observation point $i$ and $t_i$ represents the time to maturity of market observation $i$. With three coefficients per spline and $n - 1$ splines, we have $3n - 3$ unknown coefficients and we impose the following constraints

$$\begin{aligned}
a_i(t_{i+1} - t_i)^3 + b_i(t_{i+1} - t_i)^2 + c_i(t_{i+1} - t_i) &= R(t_{i+1}) - R(t_i),\\
3a_{i-1}(t_i - t_{i-1})^2 + 2b_{i-1}(t_i - t_{i-1}) + c_{i-1} - c_i &= 0,\\
6a_{i-1}(t_i - t_{i-1}) + 2b_{i-1} - 2b_i &= 0,\\
b_1 &= 0,\\
6a_{n-1}(t_n - t_{n-1}) + 2b_{n-1} &= 0.
\end{aligned} \tag{2.7}$$

The first set of $n - 1$ constraints is imposed in order to force the polynomials to fit perfectly to each other at the knot points. To also let the first and second order derivatives of the polynomials match, we set the second and third sets of $2n - 2$ constraints. Finally, two endpoint constraints are required to set the second derivative equal to zero at both ends. We end up with a linear system of $3n - 3$ equations and $3n - 3$ unknowns, which is solved to obtain the optimal piecewise cubic spline. Both methods are used and plotted in Figure 2.1 below. The main advantage of the linear interpolation method is that it is closed form. The piecewise cubic spline interpolation method however takes longer to compute. The figures show almost no difference between both methods. This relatively small difference can be explained by the large number of data points and the smooth structure of the rates with respect to their time to maturity.

[Figure 2.1 shows four panels for 02-Jun-2016: the interpolated zero curve (top) and discount curve (bottom), each plotted against time to maturity from 0 to 30 years, with the market quotes and the linear interpolated line (left) and the cubic spline interpolated line (right).]

Figure 2.1: The interpolated zero and discount curves.

The zero curves as well as the discount curves are computed daily over the entire time grid. The discount curves are used to price the swaptions and are obtained for maturities up to 30 years. However, we will only focus on swaptions with a maximum maturity of 10 years and a maximum underlying swap tenor of 10 years. As shown in Figure 2.1, the two interpolation methods differ very little. The computation time of the linear interpolation method is however significantly shorter. For this practical reason and the aim of

this research, we have chosen to use only the linear interpolation method.
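As an illustration of this section, the sketch below builds both interpolations of a zero curve and converts the interpolated rates into discount factors via (2.3) and (2.4). The market quotes are invented for the example; scipy's natural cubic spline matches the endpoint constraints of (2.7), where the second derivative vanishes at both ends.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical zero rates R(t_i) observed at maturities t_i (in years).
t_obs = np.array([0.25, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 30.0])
r_obs = np.array([-0.003, -0.002, -0.001, 0.000, 0.002, 0.005, 0.008, 0.009])

t_grid = np.linspace(t_obs[0], t_obs[-1], 500)

# Piecewise linear interpolation, eq. (2.5).
r_linear = np.interp(t_grid, t_obs, r_obs)

# Natural cubic spline: second derivative zero at both ends, cf. (2.7).
r_spline = CubicSpline(t_obs, r_obs, bc_type='natural')(t_grid)

# Discount factors: simple compounding up to one year (2.3), annual
# compounding beyond one year (2.4).
def discount(t, r):
    return np.where(t <= 1.0, 1.0 / (1.0 + r * t), (1.0 + r) ** -t)

P_linear = discount(t_grid, r_linear)
P_spline = discount(t_grid, r_spline)
print(np.max(np.abs(P_linear - P_spline)))  # the two curves barely differ
```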

2.3 Swaptions

Swap options, or swaptions, are options on interest rate swaps. They give the holder the right to enter into a certain interest rate swap at a certain time in the future. Depending on whether the swaption is a call or a put, we call it a payer swaption or a receiver swaption, respectively. The swap rate of the swap contract equals the strike of the swaption. In a payer swaption, the owner pays the fixed leg and receives the floating leg, and in a receiver swaption this is the other way around. For example, a '2y10y' European payer swaption with a strike of 1% represents a contract in which the owner has the right to enter, two years from now, a swap with a tenor of ten years in which he pays a fixed rate of 1%. First we define the annuity factor $A$ that is used for discounting

$$A_{\alpha,\beta}(t) = \sum_{i=\alpha+1}^{\beta} (T_i - T_{i-1})P(t, T_i), \tag{2.8}$$

where we again have $n \times m$ cash flows, with $n$ the number of years and $m$ the number of cash flows per year, respectively. We also define the forward swap rate at time $t$ for the set of times $T_i$, like Brigo and Mercurio (2007). The forward swap rate is the rate in the fixed leg of the interest rate swap that makes the contract fair at the present time. We denote the forward swap rate as follows

$$S_{\alpha,\beta}(t) = \frac{P(t, T_\alpha) - P(t, T_\beta)}{A_{\alpha,\beta}(t)}. \tag{2.9}$$

Now one can define the value of a payer swaption with strike $K$ and resetting at $T_\alpha, \ldots, T_{\beta-1}$ as follows

$$V_{swaption}(t) = A_{\alpha,\beta}(t)\,\mathbb{E}_t\!\left[(S_{\alpha,\beta}(T_\alpha) - K)^+\right]. \tag{2.10}$$

The value of the swaption clearly depends on the expected value of the difference between the forward swap rate and the strike rate. To obtain an arbitrage-free price of a swaption, we need to define the corresponding measure used to derive the expected value. This will be elaborated on in more detail in Section 2.4. Some swaptions or combinations of swaptions will briefly be explained in this section, because of their relevance for this research. An at-the-money (ATM) swaption is a swaption that has a strike equal to the par swap rate of the underlying swap of the swaption. There are multiple trading strategies involving swaptions. We define a straddle as the sum of an ATM payer swaption and an ATM receiver swaption with the same ATM strike. If the interest rate is close to the strike rate at expiry of the options, the straddle leads to a loss. However, if there is a sufficiently large move in either direction, a significant profit will be the result. We also define a strangle, which is the sum of a receiver swaption with a strike of 'ATM - offset' and a payer swaption with a strike of 'ATM + offset'. The market normally refers to strangles as a '2Y into 10Y 100 out/wide skew strangle', in which 100 is the width (in basis points) between the payer and receiver strike; the offset from the ATM to both the payer and receiver swaption is thus width/2. For example, if we assume an ATM strike of 1%, the receiver strike is 0.5% and the payer strike is 1.5%. A strangle is a similar strategy to a straddle. The investor is betting that there will be a large movement in the interest rate, but is uncertain whether it will be an increase or a decrease. If we compare the payoff of both strategies, we see that the interest rate has to move farther in a strangle than


in a straddle for the investor to make a profit. However, the downside risk if the interest rate ends up at a central value is less with a strangle. Finally, a collar is also defined. A collar is a payer swaption with a strike of 'ATM + offset' minus a receiver swaption with a strike of 'ATM - offset'. A collar is normally quoted as a '2Y into 10Y 100 out/wide skew collar', in which the width of 100 basis points is again the width between the payer and receiver strike. So, the holder will pay floating if the swap rate is within the interval of 'ATM ± offset' and pay a fixed rate for the range of the swap rate outside this interval.
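The payoffs of these packages at expiry are easy to express per unit of notional and annuity; the short sketch below does so for the straddle, strangle and collar described above, with a hypothetical ATM strike of 1% and a width of 100 basis points.

```python
import numpy as np

def payer(S, K):     # payoff of a payer swaption at expiry
    return np.maximum(S - K, 0.0)

def receiver(S, K):  # payoff of a receiver swaption at expiry
    return np.maximum(K - S, 0.0)

atm, width = 0.01, 0.01          # ATM strike 1%, 100bp out/wide
S = np.linspace(-0.01, 0.03, 9)  # swap rate at expiry

straddle = payer(S, atm) + receiver(S, atm)
strangle = payer(S, atm + width / 2) + receiver(S, atm - width / 2)
collar   = payer(S, atm + width / 2) - receiver(S, atm - width / 2)
```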

2.4 Martingales and Measures

The models that are used to price derivatives try to estimate the expected payoff of the derivative. These models are based on a stochastic process, which is simply a variable whose value changes over time in an uncertain way. Processes where only the current value of a variable is relevant for predicting the future are called Markov processes. The Markov property is very useful, because it states that the future value of a variable is independent of the path it has followed in the past. This corresponds to the assumption of weak market efficiency and states that all the relevant information is captured in the current value of the variable (Hull, 2012). A stochastic process that satisfies the Markov property is known as a Markov process. We now focus on a particular kind of Markov process, which is known as a Wiener process (or a Brownian motion). Formally, we define a P-Wiener process as stated in the theorem below (Tsay, 2005).

Theorem 2.2. A real-valued stochastic process $\{W_t\}_{t \ge 0}$ is a P-Wiener process if for some real constant $\sigma$, under P,

1. for each $s \ge 0$ and $t \ge 0$ the random variable $W_{t+s} - W_s$ has the normal distribution with mean zero and variance $\sigma^2 t$,

2. for each $n \ge 1$ and any times $0 \le t_0 \le t_1 \le \cdots \le t_n$, the random variables $\{W_{t_r} - W_{t_{r-1}}\}$ are independent,

3. $W_0 = 0$,

4. $W_t$ is continuous in $t \ge 0$.

The probability measure P is defined as the probability of each event $A \in \mathcal{F}$. We can think of $\mathcal{F}$ here as a collection of subsets of the entire sample space. Finally, $\mathcal{F}_t$ contains all the information about the evolution of the stochastic process up until time $t$. The price of a non-dividend paying stock is often modelled as a Geometric Brownian motion. Before we define this Geometric Brownian motion, we first define a standard Brownian motion as a Wiener process with zero drift and a variance proportional to the length of the time interval. This corresponds to a rate of change in the expectation that is equal to zero and a rate of change in the variance that is equal to one. We now consider a generalized Wiener process, where the expectation has a drift rate equal to $\mu$ and the rate of change in the variance is equal to $\sigma^2$ (Tsay, 2005). This leads to the following generalized Wiener process

$$dx_t = \mu(x_t, t)\,dt + \sigma(x_t, t)\,dW_t, \tag{2.11}$$


where $W_t$ is a standard Brownian motion. We then consider the modelled change in the price of a non-dividend paying stock over time, which results in the following Geometric Brownian motion

$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t \quad\Longrightarrow\quad \frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t, \tag{2.12}$$

where $\mu$ and $\sigma$ are constant. Now Itô's lemma can be used to derive the process followed by the logarithm

of $S_t$ (Itô, 1951). First consider the general case for the continuous-time stochastic process $x_t$ of (2.11).

We also define $G(x_t, t)$ as a differentiable function of $x_t$ and $t$ and find

$$dG = \left(\frac{\partial G}{\partial x}\mu(x_t, t) + \frac{\partial G}{\partial t} + \frac{1}{2}\frac{\partial^2 G}{\partial x^2}\sigma^2(x_t, t)\right)dt + \frac{\partial G}{\partial x}\sigma(x_t, t)\,dW_t. \tag{2.13}$$

We then apply Itô's lemma to obtain a continuous-time model for the logarithm of the stock price. The

differentiable function is now defined as $G(S_t, t) = \ln(S_t)$. This leads to

$$d\ln(S_t) = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dW_t. \tag{2.14}$$

This stochastic process has a constant drift rate of $\mu - \sigma^2/2$ and a constant variance of $\sigma^2$. This implies that the price of a stock at some future time $T$ is log-normally distributed, given the current value of the stock at time $t$:

$$\ln S_T \sim \phi\!\left(\ln S_0 + \left(\mu - \frac{\sigma^2}{2}\right)\Delta,\; \sigma^2\Delta\right), \tag{2.15}$$

where $\Delta$ is the fixed time interval $T - t$. Black's model is based on this lognormal property together with the property of a risk-neutral world, as will be further explained in Section 4.1. In order to be able to normalize different asset prices, one can use a numeraire $Z$ as reference asset. A numeraire is defined as any positive non-dividend-paying asset. A key result used in the pricing of derivatives is the relation between the concept of absence of arbitrage and the existence of a probability measure like the martingale measure (or risk-neutral measure). Brigo and Mercurio (2007) denote this relation as follows, based on a numeraire $Z$:

$$\frac{S_t}{Z_t} = \mathbb{E}^Z\!\left[\frac{S_T}{Z_T}\,\Big|\,\mathcal{F}_t\right], \quad 0 \le t \le T, \tag{2.16}$$

where the price of any traded asset $S$ (without intermediate payments) relative to $Z$ is a martingale under probability measure $Q^Z$. This probability measure $Q$ is equivalent to the real-world probability measure P. A martingale is a zero-drift stochastic process, so under probability measure $Q^Z$ we have, for

a sequence of random variables $S_0, S_1, \ldots$

$$\mathbb{E}^Z[S_i \mid S_{i-1}, S_{i-2}, \ldots, S_0] = S_{i-1}, \quad \forall\, i > 0. \tag{2.17}$$

The preferred numeraire to use depends on the derivative that is priced. Two frequently used numeraires are now briefly described: first a numeraire based on the zero-coupon bond, and secondly a numeraire based on the annuity of a swap. A zero-coupon bond, with a maturity $T$ equal to that of the derivative, is commonly used as a numeraire. We denote the value of this numeraire at time $t$ as $Z_t$ and note that $Z_T = P(T,T) = 1$. We also denote the measure associated with this numeraire as the T-forward measure $Q^T$ with expectation $\mathbb{E}^T$. This way we are able to price a derivative by computing the expectation of its payoff under this measure. This leads to the following price of a derivative at time $t$:

$$V(t) = P(t,T)\,\mathbb{E}^T\!\left[\frac{V(T)}{P(T,T)}\,\Big|\,\mathcal{F}_t\right] = P(t,T)\,\mathbb{E}^T[V(T)], \tag{2.18}$$

for $0 \le t \le T$ (Brigo and Mercurio, 2007). Notice that the forward rate is a martingale under this measure, which makes the forward measure convenient to work with. The annuity of a swap is a linear combination of zero-coupon bonds. A numeraire is defined as a positive non-dividend paying asset, so the annuity of a swap can also be used as a numeraire. The numeraire in this case will be the following portfolio of zero-coupon bonds:

$$Z_T = A_{\alpha,\beta}(T) = \sum_{i=\alpha+1}^{\beta} (T_i - T_{i-1})P(T, T_i), \tag{2.19}$$

which leads to the swap measure $Q^{\alpha,\beta}$. Under this measure we find that the swap rate $S_{\alpha,\beta}(t)$ is a martingale:

$$S_{\alpha,\beta}(t) = \frac{P(t, T_\alpha) - P(t, T_\beta)}{A_{\alpha,\beta}(t)} \tag{2.20}$$

$$\Longrightarrow\quad \frac{P(t, T_\alpha) - P(t, T_\beta)}{Z_t} = \mathbb{E}^{\alpha,\beta}\!\left[\frac{P(T, T_\alpha) - P(T, T_\beta)}{Z_T}\,\Big|\,\mathcal{F}_t\right], \quad 0 \le t \le T. \tag{2.21}$$

These numeraires and their related measures are used in arbitrage-free pricing, which is an essential part of the option pricing models that are used. These models and their assumptions are further explained in Section 4.1. However, first we will review several studies that are relevant for this research in the next section.
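A small Monte Carlo sketch ties these results together: under the swap measure the forward swap rate is a driftless martingale, so modelling it as a driftless Geometric Brownian motion and averaging the payoff, scaled by the annuity numeraire, reproduces the pricing recipe of (2.10). All inputs below are illustrative values, not data from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

S0, K = 0.01, 0.01    # forward swap rate and strike (at-the-money)
sigma, T = 0.30, 2.0  # lognormal volatility and expiry in years
annuity = 8.5         # A_{alpha,beta}(t), hypothetical annuity factor

# Driftless GBM under Q^{alpha,beta}: the terminal value is lognormal,
# cf. eq. (2.15) with mu = 0.
Z = rng.standard_normal(1_000_000)
S_T = S0 * np.exp(-0.5 * sigma**2 * T + sigma * np.sqrt(T) * Z)

print(S_T.mean())  # approximately S0: the martingale property, eq. (2.17)
price = annuity * np.mean(np.maximum(S_T - K, 0.0))  # eq. (2.10)
print(price)
```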

3 Literature review

This study combines several methods with different underlying assumptions. First of all, the risk-neutral world assumption is used, and both Black's model and the SABR volatility model are used to price a swaption on an interval of strike rates. When pricing these swaptions we also take negative interest rates into account by using a displacement parameter. Then the risk measures Value at Risk and Expected Shortfall are computed. The estimated underlying profit and loss distribution is based on the real-world probabilities, in order to estimate valid risk measures. Hence, we make a bridge between the Q-measure and the P-measure. To obtain the forecasts of the risk measures, we use two different methods. The quality of the estimates of these two methods is evaluated by several different backtests. The methods that are used differ in several ways. The Historical Simulation method, for instance, gives all historical returns in the estimation window an equal weight and uses them to construct the profit and loss distribution of the portfolio. The time series analysis that is performed, on the other hand, simulates one-day-ahead forecasts of the SABR model parameters. These SABR parameters represent the characteristics of the volatility structure of the individual swaptions. As a result, we estimate the risk measures based on one-day-ahead simulations of this volatility structure instead of the value of the portfolio itself. We will now discuss some studies that have also focused on the aspects that we are looking at. Pérignon and Smith (2010), for instance, compare the disclosed quantitative information and VaR estimates of up to fifty international commercial banks in their paper. They use panel data over the period 1996-2005 and find that VaR estimates are in general excessively conservative, and also note that there is no improvement in the estimates of the VaR over time. Besides this, they find that the most popular VaR method is the Historical Simulation method. They also conclude that this method helps little in forecasting the volatility of future trading revenues. Pérignon and Smith (2010) use the Unconditional Coverage test of Kupiec (1995) to test whether the proportion of VaR violations equals the desired proportion p. The number of VaR violations that is found is extremely small and the null

hypothesis of unconditional coverage is rejected for every year except 1998 at the 5% confidence level. This study clearly shows the relevance of finding an improved and less conservative method to estimate risk measures like the Value at Risk. There are multiple improvements proposed in the literature that are based on the Historical Simulation method, such as the Filtered Historical Simulation method (FHS) as described by Barone-Adesi et al. (2002). While the Historical Simulation method was found to be excessively conservative by Pérignon and Smith (2010), it is also known to underestimate risk in some particular situations. This is because the method is based on the assumption that the risks do not change over time. Hence, when the market conditions change and the market becomes more volatile, the risk is underestimated by the method. This can fortunately be solved by first standardizing the historical returns and then scaling them to the current volatility, as is done with the Filtered Historical Simulation method. In this method a GARCH model is fitted to the historical data and the residuals are divided by their corresponding volatility estimates. These standardized residuals are then randomly drawn and used to simulate the one-day-ahead profit and loss distribution. Even though this method overcomes a shortcoming of the Historical Simulation method, it still needs some care. According to the work of Gurrola and Murphy (2015), the filtering process changes the return distribution in ways that may not be intuitive. Furthermore, it is important to make a careful selection of the application in which the FHS method is used, and besides this, re-calibration and re-testing are essential to ensure that the model remains relevant. Finally, Pritsker (2001) also shows that one has to be careful when dealing with limited data sets. He shows for example that two years of historical data is not sufficient for the FHS method to estimate the Value at Risk accurately at a 10-day horizon. The Historical Simulation method is based on historical returns; however, to obtain these returns we first need to price the swaptions. There are numerous models that can be used to price a swaption. However, we also need to take the smile risk into account, due to the fact that the volatility of the swaptions differs for different strike rates. To capture this smile risk in the implied volatility, Hagan et al. (2002) introduce the SABR volatility model. West (2005) calibrates the parameters of the SABR model in a situation where input data is very scarce. The calibration is based on equity futures which are traded at the South African Futures Exchange. The study focuses on packages of options that combine multiple derivatives, like a collar or a butterfly for example. Some of these packages are traded about 800 times in total, while there are more than double that number of strike combinations. West (2005) compares two cases. First, he estimates all of the SABR model parameters daily; in the second case he keeps one of the parameters (β) fixed while he still estimates the other parameters daily. This is because hedging efficiency can be ensured by changing the parameters only once a month while changing the input values of $F$ and $\sigma_{ATM}$ daily. West (2005) finds that the calibrated parameters of the model only change infrequently when the value for β is fixed. In fact, they are always changing up to a very high precision, but they remain unchanged up to a fairly high precision.
For this reason, he finds that keeping the value of β fixed leads to an infrequent change of the other SABR parameters. These infrequent changes ultimately result in lower hedging costs. Hence, this research shows a robust algorithm to capture the volatility smile based on the SABR model while the input data is very scarce, and also shows the advantages of keeping the parameter β fixed. Bogerd (2015) also uses the SABR volatility model, but he combines it with the Historical Simulation method. He focuses on the volatility structure of swaptions specifically. He uses daily observations of the calibrated SABR model parameters and also uses a displacement parameter to deal with negative

interest rates. He simulates 1000 one-day-ahead estimates of the profit and loss distribution based on historical changes in the SABR model parameters. A distinction is made here between the curvature and the level of the volatility structure. Varying only one of the SABR parameters (i.e. α) results in just a vertical shift of the volatility skew. Bogerd (2015) notes that this is a reasonable approximation, because most of the variation in the swaption volatility over time is caused by vertical movements of the volatility smile. He performs an unconditional coverage test as well as an independence test and only rejects the independence property for the Historical Simulation method applied to all of the SABR model parameters. The independence property is tested here with the backtest of Du and Escanciano (2015), which is based on the Ljung-Box statistic. These results imply that there are possibilities to obtain valid forecasts of the risk measures based on estimates of the one-day-ahead volatility structure. We note however that the SABR parameters that represent the volatility structure are dependent on each other, as described in Section 4.1.2. When dealing with such a time series of interdependent parameters, a multivariate time series model can be used to capture the dynamics of the parameters over time. This makes it interesting to investigate whether it is possible to improve the one-day-ahead forecasts of the volatility structure by using a time series analysis. There are however some difficulties when applying the Historical Simulation method to the SABR model parameters. Moni (2014) explains that it is questionable whether it is meaningful to add past changes in the SABR parameters to their current values. A change in the SABR parameters changes the entire volatility structure. Such a change may not always be valid, especially if the values of the historical SABR parameters are significantly different from the current values of the SABR parameters. For this reason, the Historical Simulation method will not be applied to the SABR parameters in this study. We will compare estimated risk measures of the Historical Simulation method based on the portfolio returns with the estimated risk measures based on a time series analysis of the SABR model parameters. In this study we make use of two different measures, each with their own underlying assumptions. The risk-neutral world assumption makes it possible for us to compute the expected value of future payoffs without having to deal with the different risk preferences of buyers and sellers of derivatives. Giordano and Siciliano (2013) clarify in their paper that this risk-neutral hypothesis is acceptable for pricing derivatives. However, they also note that the risk-neutral assumption cannot be used to forecast the future value of a financial product. So, if we estimate the one-day-ahead value of a swaption, we need to take the risk premium into account. Hence, we compute the estimated profit and loss distribution based on the real-world probability measure P. We therefore use the risk-neutral world assumption only to compute the volatility structure of the derivatives based on the quoted historical swaption premiums. These volatility structures are then used together with the risk-neutral assumption to price the swaptions up to and including the last day of the estimation window.
The methods that are then used to estimate the one-day-ahead profit and loss distribution do not depend on the risk-neutral assumption. The one-day-ahead forecasts of the price of the swaptions are estimated based on the real-world probabilities. The risk measures are then computed based on these estimates of the profit and loss distribution. The adequacy of the forecasts based on these models will be assessed by several backtests. Piontek (2009) reviews various backtests that assess the quality of models that produce VaR estimates. He analyzes some commonly used backtesting methods in his research and focuses on the problems regarding limited data sets and the low power of the tests. The simulations are performed for different sample sizes, with the number of observations between 100 and 1000. He finds a low power for the backtest of Kupiec (1995) for all of these sample sizes. He tests, for example, based on 250 observations and an inaccurate


model that gives 3% or 7% violations, instead of the chosen tolerance level of 5%. In this example the backtest only rejects the model in 35% of the draws. This shows that an inaccurate model in such a situation is not rejected in 65% of the cases at a significance level of 5%. A low power is also found for other backtests, and this shows that we cannot assume that a model is correct if it is not rejected by a backtest. In the empirical study of this research we also have to deal with a limited backtesting sample size of 363 observations. For this reason, we apply numerous different backtests that enable us to assess the quality of our methods more extensively. In the next section we will first discuss the models and methods that are used in the empirical part of this research. We will then continue with a description of the data and then also discuss the results of the empirical study.

4 Models and method

The SABR volatility model that is used will be explained in more detail in Section 4.1.2. It will be used to convert the quoted market swaption premiums into a volatility surface that allows us to price swaptions for arbitrary non-quoted strikes. This will be done for a selected combination of the expiry and tenor, so not the entire surface will be taken into account.

4.1 Option pricing models

Under the right corresponding measure, we have seen that both the forward rate and the swap rate are martingales. In this research we use the Euribor forward rate, which is a martingale under the forward measure $Q^T$. We also have that forward swap rates are martingales under their measure $Q^{\alpha,\beta}$. The option pricing models are based on the following stochastic process

$$dF_t = c(t, \ldots)\,dW_t. \tag{4.1}$$

The Brownian motion $W_t$ and the coefficient $c$ can be deterministic or random. Note that the dynamics do not have a drift term, since the forward rate is a martingale under its corresponding measure.

4.1.1 Black’s model

Black (1976) introduced a model which gives a closed-form solution for the price of an option under the assumption that price movements of the forward rate $F_t$ follow a log-normal distribution. The dynamics in Black's model depend on the current value of the forward rate $F_t$ and one parameter $\sigma_B$, called Black's volatility, and are given by the following equation

$$dF_t = \sigma_B F_t\,dW_t, \quad F_0 = F > 0. \tag{4.2}$$

The standard continuous-time stochastic process is denoted in (2.11). Notice that the drift parameter $\mu$ has dropped out of Black's differential equation. This implies that the equation is independent of risk preferences. Black, Scholes and Merton use in their analysis the fact that a riskless portfolio can be set up from the stock and the derivative. This portfolio is riskless for an instantaneously short period, but can be rebalanced frequently. This way one can assume that investors are risk-neutral and therefore use the following results: the expected return on all securities is the risk-free interest rate $r$, and the present value of any cash flow can be obtained by discounting its expected value at the risk-free rate (Tsay, 2005).


The expected payoff of a European call option on a futures contract under the forward measure is

$$\mathbb{E}^T[\max(V(T) - K, 0)], \tag{4.3}$$

where $\mathbb{E}^T$ denotes the expected value under the forward measure and $V(T)$ is the value of the underlying of the option at time $t = T$. We denote the price of this call option at time $t$ as

$$c_t = P(t,T)\,\mathbb{E}^T[\max(V(T) - K, 0)]. \tag{4.4}$$

Using the dynamics of (4.1), the following well-known solution for the price of a European call option on a futures contract can be derived

$$c_0(F_0, K, T; \sigma_B) = P(0,T)\,[F_0\,\phi(d_1) - K\,\phi(d_2)],$$

$$d_1 = \frac{\log\left(\frac{F_0}{K}\right) + \frac{\sigma^2}{2}T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T}. \tag{4.5}$$

Besides this general formula, one can also compute the price of a payer swaption with Black's formula, as described in Hull (2012):

$$d_1 = \frac{\ln\left(\frac{S_{\alpha,\beta}(T_\alpha)}{K}\right) + \frac{\sigma^2}{2}T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T}, \tag{4.6}$$

$$V_{Swaption}(t) = L\,A_{\alpha,\beta}(t)\,[S_{\alpha,\beta}(T_\alpha)N(d_1) - K\,N(d_2)],$$

where $L$ is the notional principal value of the contract. In this formula the swap rate is used instead of the discounted futures price; based on this swap rate and the swap measure, we can price a swaption in a similar manner to an option on a futures contract.
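For concreteness, a minimal sketch of formula (4.6) follows; the inputs (annuity factor, forward swap rate, volatility) are hypothetical example values.

```python
import numpy as np
from scipy.stats import norm

def black_payer_swaption(L, A, S, K, sigma, T):
    """Price of a payer swaption under Black's model, eq. (4.6)."""
    d1 = (np.log(S / K) + 0.5 * sigma**2 * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return L * A * (S * norm.cdf(d1) - K * norm.cdf(d2))

# Example: an ATM 2y10y payer swaption with annuity 8.5 and 30% volatility.
print(black_payer_swaption(L=1.0, A=8.5, S=0.01, K=0.01, sigma=0.30, T=2.0))
```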

4.1.2 SABR volatility model

One of the assumptions of the Black model is that a fractional change in the futures price over any interval follows a lognormal distribution (Black, 1976). If this assumption is violated, some of the outcomes will change as a result. If, for example, the probability of a large positive movement in the interest rate were actually significantly higher than implied by the lognormal property, this would lead to a higher expected payoff of an out-of-the-money (OTM) payer swaption with a strike rate in this region. The corresponding price of such a swaption will subsequently also need to be higher than the price based on the lognormal assumption. This phenomenon is observed in the market and leads to a volatility that varies for different strike rates, as opposed to the constant Black's volatility. For this reason, we introduce a volatility model to take this volatility skew into account. The Stochastic Alpha Beta Rho (SABR) model, as derived by Hagan et al. (2002), is given by a system of

two stochastic differential equations. The state variables $F_t$ and $\alpha_t$ are defined as the forward interest rate and a volatility parameter, respectively. The dynamics of the model are as follows

$$\begin{aligned}
dF_t &= \alpha_t F_t^\beta\,dW_t^{(1)}, \quad F_0 = F > 0,\\
d\alpha_t &= \nu\,\alpha_t\,dW_t^{(2)}, \quad \alpha_0 = \alpha > 0,\\
dW_t^{(1)}dW_t^{(2)} &= \rho\,dt,
\end{aligned} \tag{4.7}$$


where the power parameter $\beta \in [0,1]$ and $\nu > 0$ is the volatility of $\alpha_t$, so the volatility of the volatility of the forward rate. $W_t^{(1)}$ and $W_t^{(2)}$ are two $\rho$-correlated Brownian motions. The factors $F$ and $\alpha$ are stochastic and the parameters $\beta$, $\rho$ and $\nu$ are not. West (2005) describes the parameters in more detail. $\alpha$ is a 'volatility-like' parameter: not equal to the volatility, but there will be a functional relationship between this parameter and the at-the-money volatility. Including the constant $\nu$ acknowledges that volatility obeys the well-known clustering in time. The parameter $\beta \in [0,1]$ defines the relationship between the futures spot and the at-the-money volatility. A value of $\beta$ close to one indicates that the user believes that if the market were to move up or down in an orderly fashion, the at-the-money volatility level would not be affected significantly, whereas a value of $\beta \ll 1$ indicates that if the market were to move, the at-the-money volatility would move in the opposite direction. The closer $\beta$ is to zero, the more distinct this effect would be. Moreover, the value of $\beta$ also gives insight into the distribution of the underlying. If $\beta$ is close to one, the stochastic model is said to be more lognormal, and the closer $\beta$ is to zero, the closer the stochastic model follows the normal distribution instead. Hagan et al. (2002) show that the price of a vanilla option under the SABR model is given by the appropriate Black's formula, provided the correct implied volatility is used. For given $\alpha$, $\beta$, $\rho$, $\nu$ and $\tau$, this volatility is given by

$$\sigma(K, F, \tau) = \frac{\alpha\left(1 + \left[\frac{(1-\beta)^2}{24}\frac{\alpha^2}{(FK)^{1-\beta}} + \frac{1}{4}\frac{\rho\beta\nu\alpha}{(FK)^{(1-\beta)/2}} + \frac{2-3\rho^2}{24}\nu^2\right]\tau\right)}{(FK)^{(1-\beta)/2}\left[1 + \frac{(1-\beta)^2}{24}\ln^2\frac{F}{K} + \frac{(1-\beta)^4}{1920}\ln^4\frac{F}{K}\right]} \cdot \frac{z}{\chi(z)}, \tag{4.8}$$

$$\text{where} \quad z = \frac{\nu}{\alpha}\,(FK)^{(1-\beta)/2}\ln\frac{F}{K}, \tag{4.9}$$

$$\text{and} \quad \chi(z) = \ln\!\left(\frac{\sqrt{1 - 2\rho z + z^2} + z - \rho}{1 - \rho}\right), \tag{4.10}$$

for an option with strike $K$, given that the current value of the underlying forward is $F$. Here we note that in our case the forward value is equal to the par swap rate. Hence, we have $F = S_{\alpha,\beta}(T_\alpha)$ and note that if $F = K$ the swaption is said to be at-the-money. For the ATM strike rate, we can remove the terms $z$ and $\chi(z)$ from the equation, because in the limit we have $z/\chi(z) = 1$. So for an at-the-money volatility, one can rewrite the equation as follows

$$\sigma_{ATM}(F, \tau) = \frac{\frac{(1-\beta)^2\tau}{24F^{2-2\beta}}\,\alpha^3 + \frac{\rho\beta\nu\tau}{4F^{1-\beta}}\,\alpha^2 + \left(1 + \frac{2-3\rho^2}{24}\nu^2\tau\right)\alpha}{F^{1-\beta}}, \tag{4.11}$$

where $\tau$ is the year fraction to maturity. This formula is closed form, which makes the model very convenient for the pricing of an option.
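A direct transcription of (4.8)-(4.11) is sketched below, without the displacement of Section 4.1.3; the function names are our own.

```python
import numpy as np

def sabr_vol_atm(F, tau, alpha, beta, rho, nu):
    """At-the-money SABR volatility, eq. (4.11)."""
    return (alpha / F ** (1 - beta)) * (
        1 + ((1 - beta) ** 2 / 24 * alpha ** 2 / F ** (2 - 2 * beta)
             + rho * beta * nu * alpha / (4 * F ** (1 - beta))
             + (2 - 3 * rho ** 2) / 24 * nu ** 2) * tau)

def sabr_vol_hagan(K, F, tau, alpha, beta, rho, nu):
    """Implied Black volatility under the SABR model, eqs. (4.8)-(4.10)."""
    if np.isclose(F, K):  # at-the-money limit: z / chi(z) -> 1
        return sabr_vol_atm(F, tau, alpha, beta, rho, nu)
    FK = F * K
    z = (nu / alpha) * FK ** ((1 - beta) / 2) * np.log(F / K)  # eq. (4.9)
    chi = np.log((np.sqrt(1 - 2 * rho * z + z ** 2) + z - rho)
                 / (1 - rho))                                   # eq. (4.10)
    num = alpha * (1 + ((1 - beta) ** 2 / 24 * alpha ** 2 / FK ** (1 - beta)
                        + rho * beta * nu * alpha / (4 * FK ** ((1 - beta) / 2))
                        + (2 - 3 * rho ** 2) / 24 * nu ** 2) * tau)
    den = FK ** ((1 - beta) / 2) * (1 + (1 - beta) ** 2 / 24 * np.log(F / K) ** 2
                                    + (1 - beta) ** 4 / 1920 * np.log(F / K) ** 4)
    return (num / den) * (z / chi)
```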

There is however one main drawback of Hagan's formula: it is known to produce wrong prices in the region of small strikes for large maturities. For this reason, Obłój (2008) proposes an improvement to the original formulas that compute the volatility as defined by Hagan et al. (2002). In his paper he gives several arguments for using the formula derived by Berestycki et al. (2004). To understand why we use the formula of Berestycki et al. (2004), we consider the Taylor expansion of the implied volatility surface

$$\sigma(K, F, \tau) = \sigma^0(K,F)\left(1 + \sigma^1(K,F)\,\tau\right) + O(\tau^2). \tag{4.12}$$

Obłój (2008) then compares the explicit expressions of Hagan et al. (2002) and Berestycki et al. (2004) for $\sigma^0(K,F)$ and $\sigma^1(K,F)$. It can be shown that both expressions for $\sigma^0(K,F)$ and $\sigma^1(K,F)$ are exactly


the same when either $K = F$, $\nu = 0$ or $\beta = 1$. However, when $\beta < 1$ the results for $\sigma^0(K,F)$ of the two papers differ, and Obłój (2008) argues that the formula of Berestycki et al. (2004) is correct and should be used. This conclusion is based on two arguments. First of all, Hagan's formula is inconsistent as $\beta \to 0$. Secondly, the formula suggested by Obłój (2008) produces, in most cases, correct prices in the region of small strikes for large maturities, unlike Hagan's formula. The formula for the implied volatility is now obtained by combining $\sigma^0(K,F)$ from Berestycki et al. (2004) and $\sigma^1(K,F)$ from Hagan et al. (2002). We define the fine-tuned implied volatility as follows

$$\sigma(K, F, \tau) = \frac{\nu\ln\frac{F}{K}}{\chi(z)}\left(1 + \left[\frac{(1-\beta)^2}{24}\frac{\alpha^2}{(FK)^{1-\beta}} + \frac{1}{4}\frac{\rho\beta\nu\alpha}{(FK)^{(1-\beta)/2}} + \frac{2-3\rho^2}{24}\nu^2\right]\tau\right), \tag{4.13}$$

$$\text{where} \quad z = \frac{\nu}{\alpha}\,\frac{F^{1-\beta} - K^{1-\beta}}{1-\beta}, \tag{4.14}$$

$$\text{and} \quad \chi(z) = \ln\!\left(\frac{\sqrt{1 - 2\rho z + z^2} + z - \rho}{1 - \rho}\right), \tag{4.15}$$

which is used instead of (4.8) if there is reason to assume that $\beta < 1$. In the empirical part of this research we will only find values of $\beta < 1$, so as a result we will only work with (4.13) instead of (4.8). Nevertheless, Obłój (2008) showed that the expressions from Hagan et al. (2002) and Berestycki et al. (2004) are exactly the same for the volatility of an at-the-money swaption. For this reason, (4.11) remains valid. A sketch of the corrected formula is given below.
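Only the leading factor and the definition of $z$ change relative to Hagan's expression, so this sketch mirrors sabr_vol_hagan above and reuses sabr_vol_atm from the previous snippet in the at-the-money limit.

```python
import numpy as np

def sabr_vol_obloj(K, F, tau, alpha, beta, rho, nu):
    """Implied volatility with the Obłój (2008) correction, eqs. (4.13)-(4.15)."""
    if np.isclose(F, K):
        return sabr_vol_atm(F, tau, alpha, beta, rho, nu)  # eq. (4.11)
    FK = F * K
    z = (nu / alpha) * (F ** (1 - beta) - K ** (1 - beta)) / (1 - beta)      # (4.14)
    chi = np.log((np.sqrt(1 - 2 * rho * z + z ** 2) + z - rho) / (1 - rho))  # (4.15)
    correction = 1 + ((1 - beta) ** 2 / 24 * alpha ** 2 / FK ** (1 - beta)
                      + rho * beta * nu * alpha / (4 * FK ** ((1 - beta) / 2))
                      + (2 - 3 * rho ** 2) / 24 * nu ** 2) * tau
    return nu * np.log(F / K) / chi * correction                             # (4.13)
```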

We will now discuss the method that is used to calibrate the SABR model parameters. We follow the steps from West (2005) and notice the following relation

$$\ln \sigma_{ATM} = \ln \alpha - (1-\beta)\ln F + \ldots, \tag{4.16}$$

so the right value of $\beta$ can be estimated from a log-log plot of $\sigma_{ATM}$ and $F$. Hagan et al. (2002) suggest that it is appropriate to fit this parameter in advance and never change it. So the appropriate value for $\beta$ is chosen first. Then (4.11) is inverted to obtain an expression of $\alpha$ in the other SABR parameters and the at-the-money volatility. This is done by setting the equation equal to zero and selecting the smallest positive real root. In the final step we minimize the difference between the market volatilities and the volatilities computed with the SABR model

$$\min_{\rho,\nu}\, \left|\sigma_M - \sigma_{SABR}(\alpha, \beta, \rho, \nu, \tau)\right|, \tag{4.17}$$

where $\beta$ is already estimated and $\alpha(\sigma_{ATM}, \beta, \rho, \nu, \tau)$ follows from the inversion described above. The time to maturity $\tau$ is also known, so we calibrate $\rho$ and $\nu$ by minimizing this difference. In this method, we calibrate the parameters so that the produced at-the-money volatilities are exactly equal to the market quotes. The at-the-money volatilities are important to match, because they are traded most frequently. Finally, when all of the parameters are calibrated and we have estimated the SABR volatility for a swaption, we can use (4.6) to price this swaption. The steps to calibrate the SABR model parameters are all applied and described in more detail in Section 6.1.
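The calibration loop can then be sketched as follows. This is an illustration under our own choices, not the thesis's code: $\alpha$ is recovered from the ATM quote as the smallest positive real root of the cubic form of (4.11), and $(\rho, \nu)$ are found numerically; a least-squares variant of the absolute-difference objective (4.17) is minimised for smoothness. The market inputs sigma_mkt, strikes, sigma_atm, F and tau are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def alpha_from_atm(sigma_atm, F, tau, beta, rho, nu):
    """Invert eq. (4.11): smallest positive real root of the cubic in alpha."""
    coeffs = [(1 - beta) ** 2 * tau / (24 * F ** (2 - 2 * beta)),
              rho * beta * nu * tau / (4 * F ** (1 - beta)),
              1 + (2 - 3 * rho ** 2) / 24 * nu ** 2 * tau,
              -sigma_atm * F ** (1 - beta)]
    roots = np.roots(coeffs)
    return roots[np.isreal(roots) & (roots.real > 0)].real.min()

def calibrate(sigma_mkt, strikes, sigma_atm, F, tau, beta):
    """Calibrate (rho, nu) with beta fixed; alpha follows from the ATM quote."""
    def objective(x):
        rho, nu = x
        alpha = alpha_from_atm(sigma_atm, F, tau, beta, rho, nu)
        model = [sabr_vol_obloj(K, F, tau, alpha, beta, rho, nu)
                 for K in strikes]
        return np.sum((np.asarray(model) - np.asarray(sigma_mkt)) ** 2)
    res = minimize(objective, x0=[0.0, 0.3],
                   bounds=[(-0.999, 0.999), (1e-4, None)])
    rho, nu = res.x
    return alpha_from_atm(sigma_atm, F, tau, beta, rho, nu), rho, nu
```

By construction, the ATM volatility produced by the calibrated parameters matches the market quote exactly, as required above.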

4.1.3 Pricing in a negative interest rate environment

Before we continue with the time series analysis, we first need to consider a method that enables us to price derivatives in a negative interest rate environment. The option pricing models that are used in this research do not allow interest rates to become negative. However, a lot has changed since these models were constructed and we need to adjust our models to be able to deal with the negative interest rates

that have occurred over the past years. Frankema (2016) describes the displaced Black's model as well as the displaced SABR model, which allow interest rates to be negative. The shifted models with shift $s > 0$ allow rates larger than $-s$ to be modelled. This leads to the following adjusted dynamics of Black's model, which is also known as a displaced diffusion process

$$dF_t = d(F_t + s) = \sigma_B(F_t + s)\,dW_t, \tag{4.18}$$

where $s$ is the constant displacement (or shift) parameter. Note that $\hat{F}_t \equiv (F_t + s)$ follows a lognormal (or

Black) process. This fact, together with the fact that the payoff of a European call option, $\max(F_T - K, 0)$, can be written as

$$\max(F_T - K, 0) = \max((F_T + s) - (K + s), 0) \equiv \max(\hat{F}_T - \hat{K}, 0), \tag{4.19}$$

leads to the conclusion that European calls and puts can be valued under the displaced diffusion model by plugging $\hat{F}_0 \equiv (F_0 + s)$ and $\hat{K} = (K + s)$ into Black's model. A similar adjustment leads to the following dynamics of the displaced SABR model

$$\begin{aligned}
dF_t &= \alpha_t(F_t + s)^\beta\,dW_t^{(1)},\\
d\alpha_t &= \nu\,\alpha_t\,dW_t^{(2)},\\
\mathbb{E}[dW_t^{(1)}dW_t^{(2)}] &= \rho\,dt.
\end{aligned} \tag{4.20}$$

Hence, we use the formulas from Black's model (4.6) and the SABR model (4.13) with the displaced values $\hat{F}_0$ and $\hat{K}$ instead of $F_0$ and $K$. A drawback of the displaced models, however, is that the shift parameter needs to be selected a priori. So an assumption has to be made on the minimum of the interest rate. To overcome this drawback, Antonov et al. (2015) describe the free boundary model. However, for this research the displaced SABR model is preferred.
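Since the displacement only shifts the forward and the strike, it can be sketched as a thin wrapper around the undisplaced functions from the earlier snippets (black_payer_swaption and sabr_vol_obloj):

```python
def displaced_sabr_vol(K, F, tau, alpha, beta, rho, nu, s):
    """Displaced SABR volatility: shift forward and strike by s, cf. eq. (4.20)."""
    return sabr_vol_obloj(K + s, F + s, tau, alpha, beta, rho, nu)

def displaced_black_payer_swaption(L, A, S, K, sigma, T, s):
    """Displaced Black's formula: eq. (4.6) applied to S + s and K + s."""
    return black_payer_swaption(L, A, S + s, K + s, sigma, T)

# With a shift of, say, s = 3%, a forward swap rate of -1% can still be priced.
```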

4.2 Time series analysis

The SABR volatility model parameters are estimated on a daily basis. The aim of this research is to estimate the risk related to a portfolio of swaptions. Therefore, an analysis of these SABR parameters over time is of interest, to be able to forecast the one-day-ahead volatility structure. In this section, some models will be discussed that are used to capture the dynamics of the parameters $\alpha_t$, $\rho_t$ and $\nu_t$ over time.

4.2.1 Vector Autoregressive model

A time series is called white noise if all autocorrelation functions (ACF) of the sequence $\{\gamma_t\}$ are equal to zero. So for a white noise series we need all sample ACFs to be close to zero. To achieve this, we need to apply some time series models to model the dynamic structure of our time series. Tsay (2005) first denotes the simple autoregressive model of order 1, or simply the AR(1) model. This model is defined as follows:

γt = φ0 + φ1γt−1 + at, (4.21)

where {at} is assumed to be a white noise series with mean zero and variance σ_a². This model could make sense for the individual parameters, but we have to obtain a forecast of all of the SABR parameters together. These parameters clearly depend on each other, as


described in (4.7). Hence, a model that takes the correlation between these time series into account is desired. The vector autoregressive model (VAR) can be used for this kind of linear dynamic structure of a multivariate time series. We fit a VAR model to the three time series α, ρ and ν

Γ_t = φ_0 + Φ Γ_{t−1} + a_t,   where Γ_t = (α_t, ρ_t, ν_t)′.    (4.22)

The vectors Γt and φ0 are k-dimensional, Φ is a k × k matrix, and {at} is a sequence of serially uncorrelated random vectors with mean zero and covariance matrix Σ. Note that we are modelling three different SABR parameters over time and for this reason have k = 3. For our VAR(p) model estimation, we have to decide how many lags p to include. A vector autoregressive model of lag length p refers to a time series in which the current value depends on its first p lagged values. There are several tools that can be used to decide which lag length to include. Firstly, the sample autocorrelation function (ACF) of the parameters can be used to check their level of autocorrelation. If we have a weakly stationary return series γt, we define the lag-l autocorrelation of

γt, ACF_l, as the correlation coefficient between γt and γt−l. We define ACF_l as follows (Tsay, 2005)

ACF_l = Cov(γ_t, γ_{t−l}) / √(Var(γ_t) Var(γ_{t−l})) = Cov(γ_t, γ_{t−l}) / Var(γ_t).    (4.23)

Another method to determine the optimal selection of lags is to use information criteria. Criteria like the Akaike information criterion (AIC), Bayes information criterion (BIC) and Hannan-Quinn criterion (HQC) can be used to measure the relative quality of statistical models for a given set of data. Liew(2004) compares these criteria in a simulation study to obtain the best choice of lag length criterion for an autoregressive model. He finds that for a relatively large sample, with 120 or more observations, the Hannan-Quinn criterion outdoes the rest in correctly identifying the true lag length.
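For reference, lag selection by the Hannan-Quinn criterion could be sketched as follows with the VAR implementation in statsmodels; the DataFrame of daily calibrated parameters is an assumed input.

```python
import pandas as pd
from statsmodels.tsa.api import VAR

def select_var_lag(gamma: pd.DataFrame, max_lags: int = 20) -> int:
    """Return the lag length that minimizes the Hannan-Quinn criterion.

    gamma: T x 3 DataFrame with the daily calibrated alpha, rho and nu
    (an assumed input of this sketch).
    """
    order = VAR(gamma).select_order(maxlags=max_lags)
    return order.hqic  # selected lag according to HQC

# fitted = VAR(gamma).fit(select_var_lag(gamma))
```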

4.2.2 Local level model

A local level model is a type of state space model, which, like the VAR model, can also be used for time series analysis. In a classical regression model, a trend and an intercept are estimated. However, for a time series this intercept might in reality not be fixed over time. When this level component changes over time, it applies locally, and for this reason the model is known as the local level model. The local level model allows the intercept to change over time and is defined as follows

µ_{t+1} = I_m µ_t + B η_t,   where µ_t = (µ_t^(1), µ_t^(2), µ_t^(3))′,    (4.24)

Γ_t = C µ_t + D ε_t,    (4.25)

where Γt is the vector of SABR parameters that is defined in (4.22). The observation or measurement equation (4.25) contains the values of the three observed time series at time t. Besides this, we also have an m × 1 vector of unobserved variables µt. Three unobserved variables are used in this research, so we have m = 3. These unobserved variables represent the unknown level components


and we define (4.24) as the state equation. We define εt as the observation disturbances and ηt as the state disturbances, respectively. These disturbances are independent and follow the standard normal distribution. The state disturbance coefficient matrix B is here defined as a 3 × 3 matrix, which results in a state covariance matrix equal to BB′. The observation innovation coefficient matrix D is defined in a similar way as a 3 × 3 matrix, which leads to an observation innovation covariance matrix equal to DD′. Both the state disturbance coefficient matrix and the observation innovation coefficient matrix are defined as diagonal matrices. The diagonal elements of these matrices are estimated by maximum likelihood.

Furthermore, we note that Im is the identity matrix of size m = 3. Finally, the 3 × 3 matrix C links

the unobservable factors of the state vector µt to the observation vector Γt. All the coefficients of the matrix C are also estimated by maximum likelihood. The state equation is defined as a random walk, and in the measurement equation an irregular component εt is added, which makes this model a random walk plus noise. The state equation is essential in time series analysis, because the time dependencies in the observed time series are dealt with by letting the state at time t + 1 depend on the state at time t (Commandeur and Koopman, 2007).
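To make the filtering step behind (4.24)-(4.25) explicit, the following minimal Kalman filter sketch produces one-step-ahead predictions of Γ_t, assuming the matrices B, D and C are given (in the thesis they are estimated by maximum likelihood); the diffuse-style initialization is an assumption of the sketch.

```python
import numpy as np

def local_level_predictions(Y, B, D, C):
    """Kalman filter for mu_{t+1} = mu_t + B eta_t, Gamma_t = C mu_t + D eps_t.

    Y: T x 3 array of observed SABR parameters. Returns the one-step-ahead
    predictions C mu_{t|t-1} for every t. B, D, C are assumed given here.
    """
    Q, H = B @ B.T, D @ D.T                # state / observation covariances
    m = C.shape[1]
    mu, P = np.zeros(m), np.eye(m) * 1e6   # (assumed) diffuse start
    preds = np.zeros_like(Y)
    for t in range(Y.shape[0]):
        preds[t] = C @ mu                  # predicted observation
        F = C @ P @ C.T + H                # prediction error variance
        K = P @ C.T @ np.linalg.inv(F)     # Kalman gain
        mu = mu + K @ (Y[t] - preds[t])    # filtered state (identity transition)
        P = P - K @ C @ P + Q              # next-period state variance
    return preds
```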

4.3 Risk measurement

The option pricing models are based on a probability measure Q that is related to a risk-neutral world. The real-world probability measure P, on the other hand, is used to estimate the risk of a portfolio. These two measures give different weights to the same possible outcomes for the same derivatives. The risk measures are based on estimates of the profit and loss distribution. The probability of observing a certain value from this profit and loss distribution needs to be equivalent to the real-world probability P to obtain a valid risk measure. In this research, we will use option pricing models together with the risk-neutral measure to price the swaptions. These swaption prices, as well as the calibrated parameters of the SABR model, are then used to derive the profit and loss distribution under the real-world probability. In this section the concepts of financial risk and some methods of measuring risk will be introduced. This includes a definition of Value at Risk and Expected Shortfall as well as their limitations.

4.3.1 Risk measures

Financial risk can be seen as the chance of a loss in a financial position, caused by an unexpected change in the underlying risk factor. In this research we focus on a portfolio of swaptions, so the relevant risk is the risk of losses in positions arising from movements in market prices. The risk we are trying to measure is called market risk and, in our specific case, interest rate risk. Now a formal definition of a risk measure is provided. We have a finite set of states of nature Ω, a set of all risks χ and the set of all real-valued functions X ∈ χ, which represent the final net worth of an instrument for each element of Ω. We now define a risk measure ρ(X) as a mapping of χ into R (Roccioletti, 2016). To assess whether a risk measure is acceptable, the axioms of a coherent risk measure are defined. In other words, a risk measure is said to be coherent if it satisfies the following four properties. Axiom 1. Translation Invariance For all X ∈ χ and for all m ∈ R, we have

ρ(X + m) = ρ(X) − m (4.26)

18 Frank de Zwart Abn Amro Model Validation

In words, translation invariance implies that the addition of a sure amount of capital reduces the risk by the same amount. Axiom 2. Sub-additivity

For all X1 ∈ χ and X2 ∈ χ, we have

ρ(X1 + X2) ≤ ρ(X1) + ρ(X2) (4.27)

So, the risk of two portfolios together cannot get any worse than the sum of the two risks separately. Axiom 3. Positive Homogeneity For all X ∈ χ and for all τ > 0, we have

ρ(τX) = τρ(X) (4.28)

Again in words, positive homogeneity implies the risk of a position is proportional to its size. Axiom 4. Monotonicity

For all X1 ∈ χ and X2 ∈ χ with X1 ≤ X2, we have

ρ(X1) ≥ ρ(X2) (4.29)

Finally, as described by Roccioletti(2016), the monotonicity axiom explains that if, in each state of the world, the position X2 performs better than position X1, then the risk associated with X1 should be higher than that related to X2. The Value at Risk measure is a single estimate of the amount by which an institution's position in a risk category could decline due to general market movements during a given holding period. Define

∆Vl as the change in value of the assets of a financial position from time t to t + l. This quantity is measured in euros and is a random variable at time index t. The cumulative distribution function of ∆Vl is expressed as F_l(x). The Value at Risk measure is defined such that a loss will not exceed VaR with probability 1 − p over a given time horizon (Tsay, 2005). The VaR is given by

p = Pr[∆V_l ≤ −VaR] = F_l(−VaR).    (4.30)

Although VaR is widely used among banks, it also has several limitations. First of all, as described by the Basel Committee(2013), the VaR measure does not capture tail risk. As it is a single estimate of the minimal potential loss in an adverse market outcome, it will underestimate the actual potential loss. The Value at Risk measure gives no estimate of the magnitude of the loss in such an event. Besides this, the sub-additivity property fails to hold for VaR in general, meaning that it is not a coherent risk measure and we can have

V aR(X1 + ··· + Xd) > V aR(X1) + ··· + V aR(Xd). (4.31)

While portfolio diversification in general leads to risk reduction, the VaR measure does not always reflect this. This is especially a problem when we consider the capital adequacy requirements for a financial institution made up of several businesses. With a decentralized approach, where the VaR number is calculated for every branch separately, we cannot be sure whether the aggregated overall risk is an accurate estimate. We note, however, that although VaR is not sub-additive in general, whether this is the case depends on the properties of the joint loss distribution. To overcome the shortcomings of the Value at Risk measure, the Expected Shortfall measure can be used instead. Expected Shortfall is the expected return of the portfolio given that a loss has exceeded


the VaR. We define the ES as

−ES^(1−p) = E[∆V_l | ∆V_l ≤ −VaR^(1−p)],    (4.32)

−ES^(1−p) = (1/p) ∫_{−∞}^{−VaR^(1−p)} x f_l(x) dx,    (4.33)

where f_l(x) is the probability density function of ∆Vl. In these formulas we assume a long position in the portfolio, but the same can be derived for a short position. Expected Shortfall fulfills all four axioms above, so it is a coherent risk measure. Also, the tail risk is taken into account with the ES measure. There are still some other issues with this measure. To obtain the ES forecast, we first need to ascertain the VaR estimate, to subsequently compute the tail expectation. This brings a greater uncertainty into the estimation. There is also some difficulty with the validation of risk models' ES forecasts. As shown by Gneiting(2011), Expected Shortfall is not elicitable. A functional is elicitable if there exists a scoring function that is strictly consistent for it. The difficulty with the ES forecasts is that they measure all risk in the tail of the return distribution. Some losses far out in the tail, however, will not be observed in regular backtesting. Despite these drawbacks, ES is still proposed as a replacement of the VaR measure. To assess the risk related to the swaptions, we want to compute the Value at Risk and Expected Shortfall forecasts. The Historical Simulation method will now be described. This procedure uses historical returns to predict the VaR. It is easy to implement, but has some shortcomings. All of the returns are given the same weight, so this procedure does not take into account the decreasing predictability of data that are further away from the present.

Let rt, rt−1, . . . , rt−K be the returns of a portfolio in the sample period. So first the changes in swaption price over our sample are computed. Then we sort the returns in ascending order: r[1], r[2], . . . , r[K]. The one-day-ahead Value at Risk is given by:

−VaR^(1−p) = r_[k],    (4.34)

where k = Kp. The Expected Shortfall follows from the previous steps and can be computed as follows:

−ES^(1−p) = (1/k) Σ_{i=1}^{k} r_[i].    (4.35)

We note that classical HS is only valid in theory when the volatility and the correlations are constant over time; when dealing with a time-varying volatility, another method is needed.
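A direct translation of (4.34)-(4.35), under the assumption that a window of portfolio returns is available as a NumPy array, could look as follows; the function name is illustrative.

```python
import numpy as np

def hs_var_es(returns: np.ndarray, p: float = 0.01):
    """Historical Simulation VaR and ES following (4.34)-(4.35)."""
    r_sorted = np.sort(returns)        # r_[1] <= ... <= r_[K]
    k = max(int(len(returns) * p), 1)  # index of the p-quantile
    var = -r_sorted[k - 1]             # -VaR^(1-p) = r_[k]
    es = -r_sorted[:k].mean()          # average of the k worst returns
    return var, es

# Example: 99% VaR and ES from a 250-day window of daily P&L
# var99, es99 = hs_var_es(pnl_window, p=0.01)
```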

4.4 Backtests

Backtesting can be described as checking whether realizations are in line with the model forecasts. Financial institutions base their decisions partly on their estimates of risk measures, so it is very important to test whether these estimates are accurate. Various tests have been developed over time to assess the quality of the models that produce these estimates. Even though it may seem like a simple task, there are some complications. The main difficulty is that the methods produce an estimate of the profit and loss distribution daily, but to assess the quality of this estimated distribution only one true profit or loss is observed. Especially the evaluation of the accuracy of an Expected Shortfall estimate is challenging. If we focus on the ES(0.975), for example, in theory we only incur a loss that exceeds the VaR(0.975) in 2.5% of the cases. With this minimal number of actual losses that exceed the


VaR, we need to assess whether the ES forecast actually represents the true expected value of the tail loss. Besides this, the tail loss is estimated based on a different profit and loss distribution for every new forecast. Fortunately, there are methods to backtest the models we are using. We will mainly focus on backtests based on the VaR estimates, but we will also perform a backtest to assess the performance of the models with regard to their Expected Shortfall estimates. Campbell(2007) reviews a variety of backtests. He defines a hit function that creates a sequence like, for example, (0, 0, 0, 1, 0, 0, . . . , 1), where a 1 stands for a loss that exceeds the VaR measure. Determining the accuracy of the VaR measure can be reduced to determining whether the hit sequence satisfies two properties. First of all, the probability of a loss that exceeds the VaR^(1−p) measure must be p. Secondly, any two elements of the hit sequence must be independent of each other. Only hit sequences that satisfy both properties can be described as evidence of an accurate VaR model. Let this hit function be defined as follows

I_t = 1 if r_{t+1} < −VaR^(1−p), and I_t = 0 if r_{t+1} ≥ −VaR^(1−p).    (4.36)

The hit function is used to test the unconditional coverage property with the backtest proposed by Kupiec(1995), and to test the independence property with the backtest proposed by Christoffersen(1998). In addition, the magnitude of losses that exceed the VaR can be taken into account with a magnitude-based test.

4.4.1 Unconditional coverage backtesting

The unconditional coverage backtest, proposed by Kupiec(1995), tests the null hypothesis of E[It] = p. The hit function defined at the beginning of this section is used and we first compute the total number of hits

n_1 = Σ_{t=1}^{T} I_t,    (4.37)

and we also define n_0 = T − n_1 as the total number of returns larger than or equal to −VaR^(1−p). The estimated hit probability now becomes

π̂ = n_1 / (n_0 + n_1).    (4.38)

This corresponds to the following hypotheses based on the returns and the Value at Risk measure

H_0: π̂ = p,   H_1: π̂ ≠ p.    (4.39)

The likelihood under the null hypothesis is defined as

L(p; I_1, I_2, . . . , I_T) = (1 − p)^{n_0} p^{n_1},    (4.40)

and under the alternative hypothesis as

L(π; I_1, I_2, . . . , I_T) = (1 − π)^{n_0} π^{n_1}.    (4.41)

This can be tested with a standard likelihood ratio test

LR_uc = −2 log [ L(p; I_1, I_2, . . . , I_T) / L(π̂; I_1, I_2, . . . , I_T) ]  ∼asy  χ²(m − 1).    (4.42)


The variable m is the number of possible outcomes of the hit sequence, so in this case we have m = 2. Under the null hypothesis the LR statistic converges to the chi-squared distribution with one degree of freedom

LR_uc = 2 log [ ((1 − π̂)/(1 − p))^{n_0} (π̂/p)^{n_1} ] →d χ²(1).    (4.43)
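A sketch of this test on a 0/1 hit sequence, writing out (4.43) term by term so that empty counts contribute nothing, might look as follows (function name illustrative):

```python
import numpy as np
from scipy.stats import chi2

def kupiec_uc_test(hits: np.ndarray, p: float):
    """Kupiec unconditional coverage LR test (4.43) on a 0/1 hit series."""
    n1 = int(hits.sum())
    n0 = len(hits) - n1
    pi_hat = n1 / (n0 + n1)
    lr = 0.0
    if n0 > 0:
        lr += 2 * n0 * np.log((1 - pi_hat) / (1 - p))
    if n1 > 0:
        lr += 2 * n1 * np.log(pi_hat / p)
    return lr, 1 - chi2.cdf(lr, df=1)  # statistic and chi2(1) p-value
```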

4.4.2 Magnitude-based test

Frequency tests do not take the magnitude of the losses into account; therefore, it is desirable to also perform a magnitude-based test. Consider, for example, two different banks that both report a VaR^(0.99) estimate, and say that both encounter three losses exceeding their Value at Risk estimate within the same time period. The unconditional coverage test would indicate that the performance of both models is similar. However, it could be the case that Bank A has incurred three losses that exceed the VaR by one million euros, while Bank B has incurred losses that exceed the VaR by one billion euros. This difference in risk is obvious, so for that reason a multivariate version of the unconditional coverage test will be applied. Colletaz et al.(2013) describe such a method to validate risk models. The test is based on the intuition that a large loss will not only exceed the VaR^(1−p), but is also likely to exceed the VaR^(1−p′) with p′ < p. A standard Value at Risk violation is defined as an exception, and a super exception is defined as r_t < −VaR^(1−p′). Based on these two concepts, the following null hypothesis is defined

H_0: E[I_t(p)] = p  and  E[I_t(p′)] = p′.    (4.44)

To test this hypothesis, we define two hit functions to indicate the frequency of returns that fall in each interval

J_{1,t} = I_t(p) − I_t(p′) = 1 if −VaR^(1−p′) < r_t < −VaR^(1−p), and 0 otherwise,    (4.45)

J_{2,t} = I_t(p′) = 1 if r_t < −VaR^(1−p′), and 0 otherwise,    (4.46)

and J_{0,t} = 1 − J_{1,t} − J_{2,t} = 1 − I_t(p). The hit functions {J_{i,t}}, i = 0, 1, 2, are Bernoulli random variables equal to one with probability 1 − p, p − p′, and p′, respectively. The hit functions are not independent of each other, and we now denote n_i = Σ_{t=1}^{T} J_{i,t}, for i = 0, 1, 2. Then we define the proportions of exceptions as follows

π_0 = n_0/(n_0 + n_1 + n_2),   π_1 = n_1/(n_0 + n_1 + n_2),   and   π_2 = n_2/(n_0 + n_1 + n_2).    (4.47)

The likelihood ratio test can now also be defined for the multivariate case

LR_muc(p, p′) = 2 ln [ (π_0/(1 − p))^{n_0} (π_1/(p − p′))^{n_1} (π_2/p′)^{n_2} ] →d χ²(2),    (4.48)

where the χ² distribution has m − 1 degrees of freedom, with m = 3 in this case.
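Under the assumption that aligned VaR forecasts at both levels are available, a sketch of (4.48) could be:

```python
import numpy as np
from scipy.stats import chi2

def magnitude_uc_test(returns, var_p, var_pp, p, pp):
    """Multivariate unconditional coverage test of Colletaz et al. (4.48).

    var_p, var_pp: VaR^(1-p) and VaR^(1-p') forecasts; pp = p' < p.
    All argument names are illustrative.
    """
    super_ex = returns < -var_pp              # super exceptions J_2
    ex_only = (returns < -var_p) & ~super_ex  # ordinary exceptions J_1
    n = len(returns)
    counts = np.array([n - ex_only.sum() - super_ex.sum(),
                       ex_only.sum(), super_ex.sum()])
    probs = np.array([1 - p, p - pp, pp])
    mask = counts > 0                         # empty cells contribute zero
    lr = 2 * np.sum(counts[mask] * np.log(counts[mask] / n / probs[mask]))
    return lr, 1 - chi2.cdf(lr, df=2)
```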

4.4.3 Independence backtesting

The next step is to test whether any two outcomes of the hit sequence are independent of each other. Christoffersen(1998) proposed a test that examines whether the likelihood of a VaR violation today is

dependent on a violation yesterday. The hypotheses are constructed as follows

π_ij = P(I_t = j | I_{t−1} = i),  i, j = 0, 1,    (4.49)
H_0: π_01 = π_11 = p,   H_1: π_01 ≠ π_11.

Christoffersen(1998) tests the independence property against an explicit first-order Markov alternative.

First, a transition probability matrix is defined based on the binary first-order Markov chain {I_t}

Π_1 = [ 1 − π_01   π_01 ; 1 − π_11   π_11 ].    (4.50)

We now define nij as the number of observations with value i followed by j and this leads to the following likelihood function

L(Π_1; I_1, I_2, . . . , I_T) = (1 − π_01)^{n_00} π_01^{n_01} (1 − π_11)^{n_10} π_11^{n_11}.    (4.51)

Conditioned on the first observation, the log likelihood can be maximized, and the parameters are ratios of the counts of the appropriate cells

Π̂_1 = [ n_00/(n_00 + n_01)   n_01/(n_00 + n_01) ; n_10/(n_10 + n_11)   n_11/(n_10 + n_11) ].    (4.52)

We now consider a similar interval model, with the same output sequence {I_t}. This Markov chain model has the independence property and is given by

Π_2 = [ 1 − π_2   π_2 ; 1 − π_2   π_2 ].    (4.53)

This gives us the likelihood under the null hypothesis

L(Π_2; I_1, I_2, . . . , I_T) = (1 − π_2)^{n_00 + n_10} π_2^{n_01 + n_11},    (4.54)

where we can again maximize the likelihood function and estimate the parameters. This leads to

Π̂_2 = π̂_2 = (n_01 + n_11)/(n_00 + n_10 + n_01 + n_11).    (4.55)

Now the likelihood ratio test follows and is, like the unconditional coverage test, asymptotically χ² distributed with (m − 1)² degrees of freedom

" ˆ # L(Π2; I1,I2,...,IT ) asy 2 2 LRind = −2 log ∼ χ ((m − 1) ). (4.56) L(Πˆ 1; I1,I2,...,IT )

This again leads to a χ² distribution with one degree of freedom, because we again have m = 2

LR_ind = 2 log [ (1 − π̂_01)^{n_00} π̂_01^{n_01} (1 − π̂_11)^{n_10} π̂_11^{n_11} / ( (1 − π̂_2)^{n_00 + n_10} π̂_2^{n_01 + n_11} ) ] →d χ²(1).    (4.57)
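Counting the four transitions of the hit sequence gives the statistic of (4.57) directly; a sketch (illustrative function name, zero counts skipped in the log likelihood):

```python
import numpy as np
from scipy.stats import chi2

def christoffersen_ind_test(hits: np.ndarray):
    """Christoffersen (1998) independence LR test (4.57) on a 0/1 series."""
    prev, curr = hits[:-1], hits[1:]
    n00 = np.sum((prev == 0) & (curr == 0))
    n01 = np.sum((prev == 0) & (curr == 1))
    n10 = np.sum((prev == 1) & (curr == 0))
    n11 = np.sum((prev == 1) & (curr == 1))
    pi01 = n01 / (n00 + n01)
    pi11 = n11 / (n10 + n11) if (n10 + n11) > 0 else 0.0
    pi2 = (n01 + n11) / (n00 + n01 + n10 + n11)

    def loglik(p01, p11):
        terms = [(n00, 1 - p01), (n01, p01), (n10, 1 - p11), (n11, p11)]
        return sum(n * np.log(q) for n, q in terms if n > 0)

    lr = 2 * (loglik(pi01, pi11) - loglik(pi2, pi2))
    return lr, 1 - chi2.cdf(lr, df=1)
```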

4.4.4 Duration-based test

In addition to the tests described above, one could also assess the duration between two consecutive hits. The baseline idea is that if the one-day-ahead Value at Risk is correctly specified for a coverage rate p, then the durations between two consecutive hits must have a geometric distribution with a success


probability equal to p (Candelon et al., 2010). When the model satisfies the unconditional coverage property (UC) as well as the independence property (IND), the VaR forecasts are said to have a correct conditional coverage (CC). Under this property, the VaR violation process is a martingale difference

E[It(p) − p|Ft−1] = 0. (4.58)

The hit series {It(p)} is a random sample from a Bernoulli distribution with a success probability equal to p. We denote the duration between two consecutive violations as

di = ti − ti−1, (4.59)

where t_i represents the date of the i-th violation. A GMM moment condition test is used to backtest the UC, IND and CC properties, but now based on the durations. First we define the orthonormal polynomials associated with a geometric distribution with success probability p as follows

M_{k+1}(d, p) = [ ((1 − p)(2k + 1) + p(k − d + 1)) / ((k + 1)√(1 − p)) ] M_k(d, p) − [ k/(k + 1) ] M_{k−1}(d, p),    (4.60)

for any order k ∈ N, with M_{−1}(d, p) = 0 and M_0(d, p) = 1. If the true distribution is a geometric distribution with a success probability p, then we have

E[M_k(d, p)] = 0,  ∀ k ∈ N*, ∀ d ∈ N*.    (4.61)

This leads to the following hypotheses for each property

H_{0,uc}: E[M_1(d_i, p)] = 0,
H_{0,ind}: E[M_k(d_i, q)] = 0,  k = 1, . . . , K,    (4.62)
H_{0,cc}: E[M_k(d_i, p)] = 0,  k = 1, . . . , K,

where K is defined as the number of moment conditions. The unconditional coverage property is tested with the first hypothesis. This hypothesis states that the expected value of the first moment condition

is equal to zero for the sequence of durations {d_1, . . . , d_N}. The second hypothesis is used to test the independence property; it states that the expected value of every moment condition is equal to zero. There is, however, one difference: the probability q in the moment conditions does not have to be equal to the true success probability p. Finally, the conditional coverage property is tested with the final hypothesis, which is a combination of the other two. Now the test statistics of the three different tests are defined

GMM_uc(K) = ( (1/√N) Σ_{i=1}^{N} M_1(d_i, p) )²  →d  χ²(1),    (4.63)

GMM_ind(K) = ( (1/√N) Σ_{i=1}^{N} M(d_i, q) )′ ( (1/√N) Σ_{i=1}^{N} M(d_i, q) )  →d  χ²(K),    (4.64)

GMM_cc(K) = ( (1/√N) Σ_{i=1}^{N} M(d_i, p) )′ ( (1/√N) Σ_{i=1}^{N} M(d_i, p) )  →d  χ²(K),    (4.65)

where M(d_i, ·) denotes the (K × 1) vector with components M_k(d_i, ·). Note, however, that in the second equation the value of q is not known, so it has to be estimated. Candelon et al.(2010) show that the distribution of the GMM statistic GMM_ind based on M_k(d_i, q̂) is similar to the one based on M_k(d_i, q), and this leads to

GMM_ind(K) = ( (1/√N) Σ_{i=1}^{N} M(d_i, q̂) )′ ( (1/√N) Σ_{i=1}^{N} M(d_i, q̂) )  →d  χ²(K − 1),    (4.66)

because the first polynomial is used to estimate the maximum likelihood estimator q̂. The first polynomial M_1(d_i, q̂) is strictly proportional to the score used to define the maximum likelihood estimator q̂, so we solve M_1(d_i, q̂) = 0 to obtain our estimate of q.
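Evaluating the recursion (4.60) and stacking the moments gives, for example, the conditional coverage statistic (4.65); the sketch below assumes the violation durations have already been collected.

```python
import numpy as np

def geometric_orthonormal_polys(d, p, K):
    """Evaluate M_1, ..., M_K of (4.60) at duration d (M_-1 = 0, M_0 = 1)."""
    m_prev, m_curr = 0.0, 1.0
    out = []
    for k in range(K):
        m_next = (((1 - p) * (2 * k + 1) + p * (k - d + 1))
                  / ((k + 1) * np.sqrt(1 - p)) * m_curr
                  - k / (k + 1) * m_prev)
        out.append(m_next)
        m_prev, m_curr = m_curr, m_next
    return np.array(out)

def gmm_cc_stat(durations, p, K=3):
    """GMM conditional coverage statistic (4.65); chi-squared(K) under H0."""
    M = np.array([geometric_orthonormal_polys(d, p, K) for d in durations])
    m_bar = M.sum(axis=0) / np.sqrt(len(durations))
    return float(m_bar @ m_bar)
```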

4.4.5 Kolmogorov Smirnov test

To assess the goodness of fit of a statistical model, one can use the Kolmogorov Smirnov test. The test, as described in Massey(1951), is based on the maximum difference between an empirical and a hypothetical cumulative distribution. The first distribution is a specified cumulative distribution function

F0(x). This is compared with an observed cumulative step-function of the sample SN (x) = k/N, where k is the number of observations less than or equal to x. This results in the following test statistic

D_N = max_x |F_0(x) − S_N(x)|.    (4.68)

When (x_1, x_2, . . . , x_n) are mutually independent and all come from the same distribution function F_0(x), the distribution of D_N does not depend on F_0(x). This means that a table used to test the hypothesis that numbers come from a uniform distribution may also be used to test the hypothesis that numbers come from a normal distribution, or from any completely specified continuous distribution (Miller, 1956).

The statistic D_N is used to test the null hypothesis that the observations come from F_0(x) against the alternative that they come from another distribution. Based on formulas noted in Miller(1956), one can derive the values of ε based on the sample size N and the desired level of significance (1 − a).

These values of ε define the distribution of the statistic D_N: P = Prob(D_N ≤ ε).
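In practice this test is a one-liner; the sketch below checks, as a hypothetical example, whether standardized residuals are compatible with a standard normal distribution.

```python
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(0)
residuals = rng.standard_normal(250)       # placeholder standardized residuals
stat, pval = kstest(residuals, norm.cdf)   # D_N of (4.68) and its p-value
```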

4.4.6 Expected Shortfall backtesting

We mainly focus on backtests based on the estimated Value at Risk measure. However, we also estimate the Expected Shortfall measure and therefore also want to assess the quality of our methods based on these ES estimates. Acerbi and Szekely(2014) describe three different backtests based on the Expected Shortfall measure. They only make the assumption that the profit and loss distributions are continuous. This way the Expected Shortfall can be written as

ES^(1−p)(t) = −E[∆V_l(t) | ∆V_l(t) + VaR^(1−p)(t) < 0].    (4.69)

The tests that are used are model independent, so besides continuity no assumption is made on the true distribution of the returns. The general hypothesis of the Expected Shortfall backtests is constructed as follows

H_0: F_l(t) = P_l(t),    (4.70)
H_1: ES_F^(1−p)(t) > ES_P^(1−p)(t),

where F_l(t) is the unknown true distribution of the returns ∆V_l(t) and P_l(t) is the forecasted distribution of the returns ∆V_l(t) based on the model. Furthermore, we define ES_F^(1−p)(t) as the Expected Shortfall based on the unknown true distribution F_l(t) and ES_P^(1−p)(t) as the Expected Shortfall estimate based on the model distribution P_l(t). We perform one of the proposed backtests, which is sensitive to both the magnitude and the frequency of exceptions. Besides this, we only estimate one-day-ahead forecasts and for this reason set l = 1. The test is based on the returns r_t of a portfolio in a sample period with T observations in total.

25 Frank de Zwart Abn Amro Model Validation

Acerbi and Szekely(2014) base the test statistic of this test on the following relation

ES_F^(1−p)(t) = −E[ r_t I_t / p ],    (4.71)

where I_t is the indicator function as defined in (4.36). This leads to the following test statistic

Z(r⃗) = Σ_{t=1}^{T} [ r_t I_t / (T p ES_F^(1−p)(t)) ] + 1.    (4.72)

The hypothesis of this specific test is defined as follows

H_0: F_1^(1−p)(t) = P_1^(1−p)(t)  ∀ t,
H_1: ES_F^(1−p)(t) ≥ ES_P^(1−p)(t)  ∀ t,    (4.73)
     and ES_F^(1−p)(t) > ES_P^(1−p)(t) for some t,
     and VaR_F^(1−p)(t) ≥ VaR_P^(1−p)(t)  ∀ t.

So under the null hypothesis we have a model that estimates the tail risk correctly, while if the null hypothesis is rejected we have a model that underestimates the tail risk. The expected value of this test statistic Z is equal to zero under the null hypothesis and strictly smaller than zero under the alternative. We perform this test with a significance level of 5%, and Acerbi and Szekely(2014) show that we do not need to perform a Monte Carlo simulation to compute the p-value for Z. They show that the p-values are remarkably stable when all financially realistic cases are taken into account. This leads to a selected critical value of the test statistic equal to −0.7.
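Given aligned series of realized returns and one-day-ahead VaR and ES forecasts, the statistic (4.72) could be computed as in the following sketch (illustrative names; the −0.7 critical value is the one reported by Acerbi and Szekely(2014)):

```python
import numpy as np

def acerbi_szekely_z(returns, var_fc, es_fc, p=0.025):
    """Test statistic Z of (4.72); values below -0.7 signal underestimated
    tail risk at the 5% level.

    returns: realized P&L; var_fc, es_fc: aligned VaR^(1-p) and ES^(1-p)
    one-day-ahead forecasts (assumed inputs).
    """
    hits = returns < -var_fc          # indicator I_t of (4.36)
    T = len(returns)
    return float(np.sum(returns * hits / (T * p * es_fc)) + 1.0)
```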

5 Data

Two data sets, each with a different source, are combined for this research. The swaption data is provided by ICAP and the zero curve data is collected from Thomson Reuters Eikon and Bloomberg. All of the data is available from 13/Jan/2015 up to and including 01/Jun/2017. The data only contains trading days, which leads to a total of 613 observations for each variable.

Little pre-processing is needed to obtain the necessary zero curve data. The interest rates and interest rate swaps that are used to construct the zero and discount curves are based on the Euribor rate. The quotes are end-of-day and based on a floating tenor of three months. Furthermore, we use so-called mid rates, which are computed based on the bid and ask quotes as observed in the market. The day count convention is also quoted for each product, and this is all used together with the bootstrapping method described in Section 2.2 to obtain the zero and discount curves.

We then start with the pre-processing of the ICAP swaption data. The initial data consists of two raw ICAP end-of-day data files. The first file contains the ATM data, including the ATM straddle premiums for various swaption expiry and tenor combinations. The second file contains the skew data, including payer, receiver, collar and strangle premiums for various expiry, tenor and relative strike combinations. First the relevant data is extracted from these raw files, then we convert them into files that are used as input to the calibration. We store the ATM straddle premiums in a separate file, in an expiry-tenor grid. In another file we store the payer and receiver swaption premiums for different relative strikes. For some strikes no payer and receiver premiums are available, but only collars and strangles. In this case the payer and receiver swaptions are derived using the relationship

payer = (collar + strangle)/2,   receiver = strangle − payer.    (5.1)


We end up with two files with payer and receiver swaption premiums. These files contain the exact same values; we have only separated the premiums for expiries up to one year from the premiums for expiries of one year and beyond. The only reason for the two separate files is to follow the set-up of the raw input ICAP data. Finally, we also create a file which contains all of the ICAP displacement values for every expiry-tenor combination. The descriptive statistics of these deposit rates, swap rates, and the '10y10y' swaption premiums are shown in Table 5.1. The displacement parameter is excluded from this table, because it only takes on a small number of discrete values on the entire time grid. A plot of the magnitude of the displacement parameter for the '5y5y' and the '10y10y' swaption is shown instead in Figure 6.3. The value for the standard deviation that is shown in the table is the average of the standard deviations between the different tenors and strike rates for the Euribor data and the swaption data respectively. Aggregating the data gives a clearer view of its main characteristics; on the other hand, some information is lost because of the aggregation. For this reason, we show boxplots of both the Euribor data and the swaption data in Section A.1.

                         Euribor deposit rate   Euribor swap rate    Swaption premium
Tenor                    Overnight - 3 weeks    1 month - 60 years   10 years
Maturity                 -                      -                    10 years
Min                      -0.3320 %              -0.3980 %            57.85 euro
Max                      0.0710 %               1.7965 %             875.64 euro
Mean                     -0.1787 %              0.2781 %             551.49 euro
Median                   -0.2420 %              -0.1490 %            582.98 euro
Std. Dev.                0.0014                 0.0020               25.23
Number of observations   3678                   23907                10421

Table 5.1: Descriptive statistics of the data.

5.1 Calculating the implied volatilities

Next, we have to convert the premiums to volatilities, which we can then use to calibrate the SABR model. To obtain the volatilities, we will use the displaced Black's model as described in (4.18). First we will use the ATM implied volatility to compute the correct principal value of the contract. This way we link the correct volatilities to the ICAP premiums. The ATM volatility is given in our dataset, so this makes a good starting point. We compute the principal value of the contract L as follows

d_{1,ATM} = σ_ATM √T / 2,
d_{2,ATM} = −σ_ATM √T / 2,    (5.2)
L = P_{swaption,ATM} / ( A_{α,β}(0) [ S_{α,β}(T_α) N(d_{1,ATM}) − K N(d_{2,ATM}) ] ).

This notional principal is then used in the next step to compute the out-of-the-money volatilities. The premiums for both receiver and payer OTM swaptions are quoted in the data set. The interval of these strikes relative to the par swap rate of the underlying swap of the swaption is as follows


Receiver   −3%   −2%   −1.5%   −1%   −0.75%   −0.5%   −0.25%   −0.125%   −0.0625%
ATM        0%
Payer      +0.0625%   +0.125%   +0.25%   +0.5%   +0.75%   +1%   +1.5%   +2%   +3%

Table 5.2: Available strikes relative to the par swap rate.

Now we also compute the absolute rates of the ATM strikes based on the par swap rates. The ICAP data strikes are all relative to the ATM strike, so to get the absolute strikes we need to compute the par swap rate. To do so we use (2.9) together with the bootstrapped discount curve based on the Euribor rate. The OTM volatilities are now computed by inverting (4.6) and solving this function for σ. We make use of the displaced variant of Black’s model, so we use Fˆ and Kˆ , as described in Section 4.1.3. This way we obtain the market points of the implied volatilities of the swaption.
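The inversion of (4.6) for σ has no closed form, so a root finder is needed; a sketch under the assumption that the hypothetical displaced Black pricer from the Section 4.1.3 sketch is available:

```python
from scipy.optimize import brentq

def implied_vol(premium, F0, K, tau, s, df=1.0, lo=1e-4, hi=5.0):
    """Solve displaced_black_call(F0, K, sigma, tau, s, df) = premium for sigma.

    Reuses the hypothetical displaced_black_call sketch; lo and hi bracket
    the root and are assumptions of this illustration.
    """
    objective = lambda sigma: displaced_black_call(F0, K, sigma, tau, s, df) - premium
    return brentq(objective, lo, hi)
```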

5.2 Leaving out of some strikes

Firstly, there are some strikes missing in our data set. We only focus on the most frequently traded expiry-tenor combinations to minimize the number of missing values, but still some premiums are missing. Especially the receiver swaptions with strikes of −3% and −2% relative to the par swap rate are often missing. For this reason, we choose to exclude those two strikes on the entire interval. Furthermore, there is one day in particular (25/Mar/2015) where the premiums of only 11 out of the 19 strikes are available. Fortunately, this day is the only exception; for the '10y10y' swaption there are at least 17 out of the 19 premiums available on all other days. The missing premiums here are the receiver swaptions with strikes of −3% and −2%, which are excluded from our calibration. This results in a complete premium vector for our interval of strikes for all days except 25/Mar/2015. To obtain a more stable time series of SABR parameters, we choose to exclude the quotes on 25/Mar/2015 from our data set. Secondly, as will be described in Section 6.1, the shape of the volatility structure depends on the chosen level of displacement. This volatility structure is then used to calibrate the SABR model. The SABR model can, however, have difficulties calibrating to both the low and the high strikes simultaneously. Some of the low strike receiver swaptions will be removed in the calibration to obtain a better calibration to the higher strikes, in which practitioners have the most exposure. The impact of these low strike receiver swaptions, with a high volatility, on the SABR parameters is too big in relation to their importance. Leaving them out will not only result in a better calibration for the other strikes, but also prevent calibrated SABR smiles that result in big repricing differences. We remove a strike (K_[1]) from the range we use for the calibration in one of the following two cases

1. |σ_K[1] − σ_K[2]| > 0.2,

2. σ_K[1] < σ_K[2],

where the strikes are ordered in ascending order from the receiver swaption with the lowest strike up to the payer swaption with the highest strike, and K_[i] represents the i-th strike in this sorted strike range. By removing strikes with a too high (case 1) or a too low (case 2) volatility, we improve our overall calibration. These two cases only occur in the period from 13/Jan/2015 until 25/Mar/2015, and in total no more than 23 strikes are removed on this interval. Note that, for example, in Figure 6.2 a strike is removed from the interval for the lower two displacements.


6 Empirical study and results

The models and theory described in the previous sections will now be applied to our data set. First, we will argue which values for β and the displacement parameter are preferred. Then we will continue by calibrating the other SABR model parameters, and subsequently we will start with the time series analysis. The vector autoregressive model is estimated and analyzed. The estimates of the risk measures based on this model will then be compared to the estimates of the Historical Simulation method with multiple backtests. Finally, this section ends with the estimation of the local level model, which is used as a robustness check of the vector autoregressive model.

6.1 Calibrating the SABR model parameters

Section 5 describes how the implied volatilities are obtained from the input data. These volatilities are now used as inputs for the SABR volatility model. The model will be calibrated daily, and the time series of the parameters will then be stored and finally also analyzed; this is described in Section 6.2. The first step in calibrating the SABR parameters is to determine which value for β fits the data best. Our main focus in this research is on a swaption with 10 years to maturity and an underlying swap

tenor of 10 years as well. In Figure 6.1 the log-log plot of σ_ATM and F is displayed. This can be used together with the theoretical relation described in (4.16). Now one can estimate the value for β, and we use a simple OLS regression to do so. The linear approximation is plotted as well, and the OLS estimates are shown in Table 6.1.

[Figure: log-log plot of σ_ATM against F for the data, together with the fitted OLS approximation.]

Figure 6.1: Log-log plot for the ’10y10y’ swaption.

The OLS estimation gives us the following results:

               Log α     −(1−β)    α        β
OLS estimate   −3.5563   −0.5262   0.0285   0.4738

Table 6.1: OLS estimates for α and β.
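The regression behind Table 6.1 follows from taking logs in the relation (4.16), log σ_ATM = log α − (1 − β) log F; a sketch of the estimation (illustrative function name):

```python
import numpy as np

def estimate_alpha_beta(sigma_atm: np.ndarray, forward: np.ndarray):
    """OLS fit of log(sigma_ATM) on log(F): intercept = log(alpha),
    slope = -(1 - beta), matching the layout of Table 6.1."""
    X = np.column_stack([np.ones_like(forward), np.log(forward)])
    coef, *_ = np.linalg.lstsq(X, np.log(sigma_atm), rcond=None)
    log_alpha, slope = coef
    return np.exp(log_alpha), 1.0 + slope  # (alpha, beta)
```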


So, as mentioned before, one fixed value for β can be used for the entire time grid. This method of estimating the best value for β is, however, not always used; a common alternative is to simply set β equal to 0.5. We note that the estimated value of β lies close to 0.5 for the '10y10y' swaption. On the other hand, this does not hold for every swaption: if we compute the log-log plot for the '5y5y' swaption, we find an optimal value of β = 0.7191. The log-log plot and OLS estimates for the '5y5y' swaption are displayed in appendix Section A.2. Before we start with the calibration of the other SABR parameters, we first need to select the level of the displacement parameter. This level of displacement has by itself no impact when repricing a single swaption: if a given displacement is used to imply the volatility, then recomputing the premium will result in an identical premium, independent of the size of the displacement. However, the displacement parameter does have an impact on the underlying volatility structure for different strikes. So, we need to take two things into account when choosing the displacement parameter. First of all, the displacement parameter s needs to be larger than the absolute value of the lowest strike K. This is necessary to be able to use Black's model for the entire range of strikes. Also, if we have K + s > 0 but really close to zero, this will result in very high volatilities. Secondly, a large displacement parameter will flatten the volatility structure or will even result in a frown. This effect is clearly shown in Figure 6.2, based on our data set.

[Figure: implied Black volatility against strike on 10-Mar-2017, showing the market volatilities, the ATM volatility, and SABR calibrations for displacements of 1%, 1.6%, and 3%.]

Figure 6.2: Implied volatilities and SABR calibration for different levels of displacement.

The interest rate and the par swap rate vary over time. For this reason it also makes sense to vary the magnitude of the displacement parameter over time. The proposed magnitude of this variable displacement parameter is provided with the data and shown in Figure 6.3. The fixed displacement value of 1.25% is proposed for the '10y10y' swaption, because a larger value will result in a worse calibration for the positive interest rate period. On the other hand, a smaller value for the displacement parameter forces us to remove some of the lowest strikes in the negative interest rate period. We also see a dynamic displacement in Figure 6.3; these displacements are used by the data supplier ICAP. The dynamic displacement parameter makes sure that we obtain a well-behaved volatility structure on the entire interval.

30 Frank de Zwart Abn Amro Model Validation

[Figure: dynamic and fixed displacement parameters in % over time for the ’5y5y’ (left) and ’10y10y’ (right) swaption.]

Figure 6.3: Magnitude of displacement for the ’5y5y’ and ’10y10y’ swaption respectively.

Once we have obtained the optimal value for β and the displacement parameter s, we can calibrate the other SABR parameters: α, ρ, and ν. In Figure 6.4, we can see the effect of a change in one of these parameters while the other parameters remain unchanged. Again, the relationship between α and β, as given in (4.16), is clear to see. An increase (decrease) in α or a decrease (increase) in β leads to an increase (decrease) in all of the implied volatilities. So, a shift in one of these two parameters results in a vertical shift of the entire volatility structure. In the lower-left panel, we can see that a change in ρ leads to a tilt in the volatility skew: an increase (decrease) in ρ results in a decrease (increase) of the implied volatility for the OTM receiver swaption strikes and in an increase (decrease) for the OTM payer swaption strikes. Finally, a shift in ν affects the structure in yet another way: an increase (decrease) in ν leads to a more (less) curved volatility structure. These responses to a change in one of the SABR parameters hold in general for every swaption. The plots below are based on the '5y5y' swaption, but the conclusions also hold for swaptions with another expiry-tenor combination, like the '10y10y' swaption.

31 Frank de Zwart Abn Amro Model Validation

[Figure: four panels, dated 10-Mar-2017, of implied Black volatility against strike, each varying one SABR parameter while keeping the others fixed: α ∈ {0.0344, 0.0844, 0.1344}, β ∈ {0.6227, 0.7227, 0.8227}, ρ ∈ {−0.6896, −0.1896, 0.3104}, and ν ∈ {0.0669, 0.2669, 0.4669}; market volatilities and the ATM volatility are shown in each panel.]

Figure 6.4: Effect of changes in SABR parameters for the ’5y5y’ swaption.

6.2 Fitting a model through the SABR parameters time series

Now that we have calibrated the SABR volatility model, we obtain the volatility structure for our expiry-tenor combination. Our input strikes are relative to the par swap rate, so they differ over our time period. For the next step in our research, we will focus on one fixed interval of strikes for the entire time period. We will compute the volatilities again with the formula suggested by Obłój(2008) and our calibrated SABR parameters, for 100 strikes equally distributed on an interval between 0.1% and 3.0%. The calibrated SABR parameters for a fixed displacement of 1.25% are displayed in the left part of Figure 6.5. As can be seen from this plot, the parameter ρ is very unstable up to 25/Mar/2015. We expect that these unstable results are due to the relatively high displacement for this period. The right part of Figure 6.5 shows again the calibrated SABR parameters, but now with the dynamic displacement. The dynamic displacement is significantly lower in the first months of 2015, and this solves our problem of ρ being unstable. This clearly shows the importance of using the right magnitude of the displacement parameter. The same steps are followed for the '5y5y' swaption and the results are similar; the calibrated SABR parameters are displayed in Section A.3. Different magnitudes of the fixed and dynamic displacement parameter are proposed for the '5y5y' swaption, but again the first months of our time grid are calibrated with a relatively high value for the fixed displacement parameter. The dynamic displacement parameter, which is related to the level of the interest rates in the current time period, results also for the '5y5y' swaption in more stable SABR parameters.


[Figure: calibrated SABR parameters over time for the ’10y10y’ swaption, with fixed displacement (left) and dynamic displacement (right).]

Figure 6.5: SABR parameters for the ’10y10y’ swaption.

We will now try to capture the linear interdependencies among our variables α, ρ, and ν. We will focus on the '10y10y' swaption with a dynamic displacement parameter. Again, the decision to focus on the '10y10y' swaption is based on the fact that its quoted premiums are the most reliable and complete. The dynamic displacement is preferred because it results in a more stable time series of our parameters, so this combination is the most promising in leading to reliable estimates of our risk measures. As discussed in Section 6.1, the volatility surface depends on the magnitude of the displacement parameter. This results in some shocks in our calibrated SABR parameters: a shift in the dynamic displacement parameter causes a change in the volatility structure, and this results in slightly different calibrated SABR parameters. In Figure A.5 of the appendix the SABR parameters and the level of the dynamic displacement parameter are displayed in one figure. These plots give a clear view of the effect of the level of displacement on the calibrated SABR parameters. For now we do not adjust our time series analysis to deal with these small shocks, but we do note this occurrence. To estimate the one-day-ahead forecasts, we use a moving window of n = 250 observations. The

first estimation will be based on the t_1, . . . , t_n interval, where t_1 is the first day of our data set, namely 13/Jan/2015, and t_n represents 27/Jan/2016. This results in the first estimated profit or loss on 28/Jan/2016. We fit a new vector autoregressive model for every day between 28/Jan/2016 and 01/Jun/2017. The moving window method implies that we use interval t_2, . . . , t_{n+1} to estimate t_{n+2}, and so on. The SABR parameters α, ρ, and ν are shown in Figure 6.6. In addition, we have also computed the first differences of the three SABR parameters and show them as well in Figure 6.6.


[Figure: six panels showing the calibrated SABR parameters α, ρ, and ν (left column) and their first differences (right column) over time.]

Figure 6.6: SABR parameters and first differences for the ’10y10y’ swaption with dynamic displacement.

Now we will determine how many lags p to include in our vector autoregression. To do so, we first check the sample ACFs of the parameters and of the parameters in first differences, respectively. Plots of the ACFs for different intervals can be found in Section A.4. These ACFs are by themselves not enough to decide how many lags to include: if we look at the ACFs of our parameters in first differences, there can still be some significant autocorrelation up to lag 200. It does not make sense to include this many lags in our estimation, so we check the lag order selection criteria. The values of several criteria are compared in Table A.2, which can be found in the appendix. In our decision we follow the argumentation of Liew(2004) and make our selection based on the Hannan-Quinn criterion, because of our large sample. We check for lags between zero and twenty, and the Hannan-Quinn criterion reaches its minimum value on this interval at three lags. For this reason we decide to include p = 3 lags in our vector autoregressive model.

We estimate the parameters of our VAR(3) model based on our moving window of 250 observations. The results of this estimation for the first period are denoted in Section A.5. One can use the VAR model to obtain forecasts of the SABR parameters. In Section A.7, a '10-day-ahead' forecast of the parameters can be found. The figure shows the estimated trend of the parameters based on our fitted VAR(3) model. However, we are especially interested in the one-day-ahead forecasts. For this reason we simulate 20000 different one-day-ahead forecasts for our SABR parameters.

We first want to check the fit of our vector autoregressive model to the data, and multiple diagnostic tests are performed. These tests show that the VAR model does not fit the data as well as required. The VAR models are stable, which can be evaluated by checking the inverse roots of the characteristic polynomial. However, a different preferred number of lags is found if we compare the Hannan-Quinn information criterion for different estimation windows over time: instead of the three lags that are used now, for some estimation windows one lag is preferred, and for others as many as thirteen. We also perform some other diagnostic tests based on the VAR(3) model and show these results in Section A.6. Recall that we want to model the dynamic structure of the time series such that the remaining residuals are white noise. First we check whether there is still significant autocorrelation within the residuals by performing the Portmanteau test and the LM test for serial correlation. The null hypothesis of the Portmanteau test states that there is no serial correlation up to lag h. This hypothesis is rejected for h > 6, with a significance level of 1%. Furthermore, the null hypothesis of the LM test states that there is no serial correlation, and this hypothesis is rejected at the same significance level for lags 6, 12,


13 and 19 if we take up to 20 lags into account. Subsequently, the White test is performed to check for heteroskedasticity in the errors. The test is carried out both with and without cross terms and rejects the null hypothesis for every combination of the individual components. The test without cross terms tests for heteroskedasticity only, while the test with cross terms also tests for a specification error. Both show that we have not captured the dynamics of our parameters as intended. In the final test, we assess whether the residuals follow the multivariate normal distribution. This Jarque-Bera test is based on the square root of the correlation matrix as orthogonalization method and rejects the null hypothesis of normality. Only the test on the skewness of the residuals of the equation for ν does not reject the null hypothesis of normality; for every other component the null is rejected. The diagnostic tests thus show that the VAR(3) model is not able to capture all the dynamics of the SABR parameters over time. We note these findings, but we will nevertheless compute the risk measures based on these simulations and compare the estimates of the risk measures to the estimates of the Historical Simulation method. Then we will perform the backtests, and in addition also perform one robustness check. In Section 6.5, we will estimate the local level model and compare the simulations based on this model to the simulations generated by the VAR(3) model.
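The 20000 one-day-ahead parameter simulations could be generated from the fitted VAR as in the following sketch, where the residual draws are taken as multivariate normal with the fitted covariance (an assumption of the sketch, given the rejected normality tests):

```python
import numpy as np
from statsmodels.tsa.api import VAR

def simulate_one_day_ahead(window, p=3, n_sims=20000, seed=0):
    """Simulate one-day-ahead draws of (alpha, rho, nu) from a VAR(p).

    window: 250 x 3 array, the moving estimation window of SABR parameters
    (an assumed input of this illustration).
    """
    rng = np.random.default_rng(seed)
    fit = VAR(window).fit(p)
    mean_fc = fit.forecast(window[-p:], steps=1)[0]      # conditional mean
    shocks = rng.multivariate_normal(np.zeros(3), fit.sigma_u, size=n_sims)
    return mean_fc + shocks                              # n_sims x 3 draws
```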

6.3 Risk measurement

Now we use the simulated SABR parameters of the 28/Jan/2016 - 01/Jun/2017 period to compute 20000 volatility structures for each day. We then compute the premiums for the swaptions on the fixed range of strikes based on these volatility structures. These premiums are used to create the profit and loss distribution. We will focus on a portfolio of three different swaptions. A strangle is one of the most popular trading strategies, and for this reason we will focus on this strategy. The portfolio is completed by adding an ATM payer swaption. This results in the following portfolio

Π = Swaption_Receiver(K_1) + Swaption_Payer(K_2) + Swaption_Payer(K_3),

where K_1 = 0.62%, K_2 = 1.62%, and K_3 = 2.62%.

This portfolio is based on the par swap rate at the first day of our data set, K_2 = 1.62%. The strangle is a combination of a receiver swaption with strike K_1 = ATM − offset and a payer swaption with strike K_3 = ATM + offset, where we have chosen an offset of 1%. A plot of the par swap rates together with K_1, K_2, K_3, and the used range of strikes is displayed in Section A.8. We compute the profit and loss distribution for all of our individual strikes based on the Historical Simulation method, which makes use of the 249 most recent past returns. We also compute a profit and loss distribution for every strike based on the 20000 simulated swaption prices. These distributions are used to compute the 99% VaR by simply selecting the value of the sorted profit and loss distribution that represents the lowest one percentile. For the 97.5% ES we compute the 97.5% VaR and then compute the expected value within this tail. We compare our Value at Risk and Expected Shortfall estimates between the two methods in Figure 6.7.


[Figure: −99% VaR (left) and −97.5% ES (right) estimates over time, for the VAR(3) model and the Historical Simulation method.]

Figure 6.7: VAR(3) and Historical Simulation −99% VaR and −97.5% ES.

The graph clearly shows that, if we use a vector autoregressive model together with the SABR model to estimate the risk measures, we get far more unstable results. The Historical Simulation method reacts relatively slowly to changes in the market, because all of the 249 historical returns are taken into account with equal weights. The VAR(3) model, however, focuses on the more recent dynamics of the SABR model parameters. We also see that the somewhat less stable first part of our calibrated SABR parameters results in higher estimates of the VaR and ES. The figures below show the losses of our portfolio over time together with the estimates of the two risk measures. The percentage of violations is also given in the title of these plots.

[Figure: four panels of portfolio losses over time with violations marked: −99% HS VaR (1.105% violated), −99% VAR(3) VaR (0.551% violated), −97.5% HS ES (0.829% violated), and −97.5% VAR(3) ES (0.551% violated).]

Figure 6.8: Losses over time for both methods with estimated VaR and ES.


We can now compare the proportion of violations with the theoretical value p. The Historical Simulation method gives the results we would expect: the 99% VaR results in four violations on the estimated interval, which is close to the expected one percent of the total number of estimations. The values for the 97.5% ES are also shown in the lower two plots of Figure 6.8. The title of these plots also shows a percentage of violations, but we note that this cannot be used to assess the accuracy of the estimates of the Expected Shortfall measure; the models that produce the ES forecasts will be backtested in Section 6.4.6. The VAR(3) model, on the other hand, results in only two losses larger than the 99% VaR. This is in itself not that strange, but the values of the risk measures are. Some estimates do not make sense, because they are either extremely high or way too low. For the 99% Value at Risk, for example, we find values ranging from −31.8681 up to 354.0662. A VaR of 354.0662 corresponds to a very large loss of our portfolio and is not very likely to be correct, let alone the value of −31.8681, which would mean that we are 99% sure that the return of the portfolio over this one day is at least 31.86 euros. The simulated returns are displayed in gray in Figure 6.9. The mean of each set of simulated returns is also displayed and compared to the actual returns based on the data. In Figure A.9 a plot of the difference between the mean of the simulation and the actual return is displayed. These errors are compared to the errors of the Historical Simulation method, defined as the difference between the mean of the Historical Simulation profit and loss and the actual return. If we compute the mean squared error (MSE), we find for the VAR(3) model an MSE of 563.7987 and for the Historical Simulation method an MSE of 378.7922.

Figure 6.9: VAR(3)-model simulations compared to the data set.

In the next section the results of several backtests are presented. These statistical tests give us a better way to assess the quality of the models: it is hard to evaluate them based on the number of violations alone, because we are looking at the tails of the distributions only and therefore do not have enough data to draw conclusions with sufficient certainty.


6.4 Backtests

In this section the quality of the two models is assessed by applying several backtests. The Value at Risk and the Expected Shortfall are both estimated for 363 days, for a range of tail probabilities p. Table 6.2 reports the number and proportion of Value at Risk violations for both methods at p = 0.05, 0.025, 0.01 and 0.001.

           Historical Simulation        VAR(3) model
p      Risk measure  Violations  Proportion    Violations  Proportion
0.05   VaR           18          4.9587%       9           2.4793%
0.025  VaR           10          2.7548%       4           1.1019%
0.01   VaR           4           1.1049%       2           0.5510%
0.001  VaR           0           0%            1           0.2755%

Table 6.2: Proportions and total number of violations for different values of p.

The Historical Simulation method results in numbers of violations that are very close to the theoretical values for all four values of p. The vector autoregressive model, on the other hand, deviates from these values. The 99.9% Value at Risk, for example, should in theory be exceeded on one day out of a thousand, so with 363 estimates in total we would not expect to find any violation; despite this small probability, we still find one violation for the VAR(3) model VaR.

6.4.1 Kupiec

Kupiec's unconditional coverage test allows us to evaluate this more formally. The test is performed with a significance level of 5% and the results can be found in Table 6.3. The test confirms the deviation of the VAR(3) model and rejects the null hypothesis for p = 0.05. We recall the null hypothesis of this test, E[It] = p: in words, the expected proportion of losses that exceed the VaR is equal to p. For the Historical Simulation method, unlike the VAR(3) model, the null hypothesis is rejected in none of the five cases. We also note that the null hypothesis for the VAR(3) model VaR estimates with p = 0.025 is close to being rejected; at a significance level of 6% it would be rejected as well.

       Historical VaR                       VAR(3) model VaR
p      LR-statistic  p-value  Reject H0    LR-statistic  p-value  Reject H0
0.05   0.0013        0.9711   False        5.9146        0.0150   True
0.025  0.0937        0.7596   False        3.6686        0.0554   False
0.01   0.0369        0.8477   False        0.8830        0.3474   False
0.005  0.0183        0.8923   False        0.0183        0.8923   False
0.002  0             1        False        0.0926        0.7609   False

Table 6.3: Kupiec unconditional coverage test, with a significance level of 5%.
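A minimal sketch of the Kupiec likelihood-ratio statistic may help to make the test concrete. The implementation below is the textbook form of the test; the function name and the convention 0 · log 0 = 0 are our own choices.

import numpy as np
from scipy.stats import chi2

def kupiec_uc(x, T, p):
    # Kupiec (1995) unconditional coverage LR test.
    # x: number of VaR violations, T: evaluation days, p: coverage rate.
    pi = x / T                                  # observed violation rate

    def loglik(q):
        # Bernoulli log-likelihood with the convention 0 * log(0) = 0.
        out = (T - x) * np.log(1.0 - q) if x < T else 0.0
        if x > 0:
            out += x * np.log(q)
        return out

    lr = -2.0 * (loglik(p) - loglik(pi))
    return lr, 1.0 - chi2.cdf(lr, df=1)

# Example: the HS row of Table 6.2 at p = 0.05 (18 violations, 363 days);
# this reproduces the LR-statistic of roughly 0.0013 in Table 6.3.
lr_stat, p_value = kupiec_uc(18, 363, 0.05)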


The Historical Simulation method thus satisfies the unconditional coverage property according to Kupiec's backtest: the LR-statistics based on the HS method are close to zero for every value of p, which shows that there is little reason to suspect that H0 does not hold. A well-known drawback of these VaR-based backtests, however, is that they often have low power, particularly for a small number of observations such as in the data set used in this research. Nevertheless, this backtest indicates that the HS method VaR, unlike the VAR(3) model VaR, satisfies the unconditional coverage property.

6.4.2 Magnitude-based test

In the next step we also take the magnitude of the losses into account. This is done by performing a multivariate backtest on both the normal exceptions and the super exceptions, as described in Section 4.4.2. As stated before, the power of most of these tests is relatively low for a data set of our size. The magnitude-based test rejects the null hypothesis in none of the six cases, and the results are in line with what we have found so far: we again find very low LR-statistics for the HS VaR, and at a larger significance level (e.g. 10%) we would again reject the null hypothesis in two of the three cases for the VAR(3) model VaR estimates. For the p = 0.01 and p′ = 0.002 coverage rates, on the other hand, we note a different outcome: here the LR-statistic based on the Historical Simulation method is even larger than the statistic based on the VAR(3) model. This can possibly be explained by the very small number of VaR violations at these coverage rates, which makes the results of this test inaccurate and the model harder to assess.

              Historical VaR                       VAR(3) model VaR
p | p′        LR-statistic  p-value  Reject H0    LR-statistic  p-value  Reject H0
0.050 | 0.010 0.0554        0.9727   False        5.9417        0.0513   False
0.025 | 0.005 0.0937        0.9543   False        5.4537        0.0654   False
0.010 | 0.002 1.8220        0.4021   False        1.7756        0.4116   False

Table 6.4: Magnitude-based test, with a significance level of 5%.

A sketch of such a trinomial test is given below. To assess the quality of our methods further, we then also test the independence property and apply a duration-based test.
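The sketch below implements a trinomial likelihood-ratio test on exceptions and super exceptions, in the spirit of the risk-map test of Colletaz et al. (2013); the exact statistic defined in Section 4.4.2 may differ in its details, and the counts in the example are hypothetical.

import numpy as np
from scipy.stats import chi2

def magnitude_lr(n1, n2, T, p, p_super):
    # Trinomial LR test on normal exceptions (coverage p) and super
    # exceptions (coverage p_super < p). n1 = exceptions that are not
    # super exceptions, n2 = super exceptions, T = evaluation days.
    counts = np.array([T - n1 - n2, n1, n2], dtype=float)
    theo = np.array([1.0 - p, p - p_super, p_super])  # H0 cell probabilities
    mle = counts / T                                  # unrestricted MLE

    def loglik(q):
        mask = counts > 0                             # 0 * log(0) = 0
        return float(np.sum(counts[mask] * np.log(q[mask])))

    lr = 2.0 * (loglik(mle) - loglik(theo))
    return lr, 1.0 - chi2.cdf(lr, df=2)

# Example with hypothetical counts: 15 exceptions, 3 of them super.
lr_stat, p_value = magnitude_lr(12, 3, 363, 0.05, 0.01)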

6.4.3 Christoffersen

The third test checks whether the occurrences of a loss greater than the Value at Risk on two different dates, for the same coverage rate, are independently distributed. We test with a significance level of 5% and reject the null hypothesis of independent outcomes in none of the eight cases. This indicates that the models in general do not violate the independence property.


       Historical VaR                       VAR(3) model VaR
p      LR-statistic  p-value  Reject H0    LR-statistic  p-value  Reject H0
0.05   1.8791        0.1704   False        0.4577        0.4987   False
0.025  0.5666        0.4516   False        0.0891        0.7653   False
0.01   0.0891        0.7653   False        0.0222        0.8817   False
0.005  0.0222        0.8817   False        0.0222        0.8817   False
0.002  0             1        False        0.0055        0.9407   False

Table 6.5: Christoffersen independence property test, with a significance level of 5%.

Note that, based on the LR-statistics here, the VAR(3) model actually performs better than the Historical Simulation method. One of the drawbacks of the HS Value at Risk estimates is that they do not always satisfy the independence property. In this case we are not able to reject the null hypothesis of independence, but the Historical Simulation method does perform somewhat worse than the VAR(3) model. A minimal sketch of the independence test is given below.
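The sketch implements the first-order Markov version of the test; the function name and the convention 0 · log 0 = 0 are our own choices.

import numpy as np
from scipy.stats import chi2

def christoffersen_ind(hits):
    # Christoffersen (1998) LR test of first-order independence of a
    # 0/1 violation sequence (1 = loss exceeded the VaR).
    h = np.asarray(hits, dtype=int)
    prev, curr = h[:-1], h[1:]
    n00 = int(np.sum((prev == 0) & (curr == 0)))
    n01 = int(np.sum((prev == 0) & (curr == 1)))
    n10 = int(np.sum((prev == 1) & (curr == 0)))
    n11 = int(np.sum((prev == 1) & (curr == 1)))

    def term(n, q):                         # n * log(q) with 0 * log(0) = 0
        return n * np.log(q) if n > 0 else 0.0

    pi01 = n01 / max(n00 + n01, 1)          # P(hit | no hit yesterday)
    pi11 = n11 / max(n10 + n11, 1)          # P(hit | hit yesterday)
    pi = (n01 + n11) / max(len(h) - 1, 1)   # unconditional hit rate

    ll_h0 = term(n00 + n10, 1.0 - pi) + term(n01 + n11, pi)
    ll_h1 = (term(n00, 1.0 - pi01) + term(n01, pi01)
             + term(n10, 1.0 - pi11) + term(n11, pi11))
    lr = 2.0 * (ll_h1 - ll_h0)
    return lr, 1.0 - chi2.cdf(lr, df=1)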

6.4.4 Duration-based test

The duration-based test is based on a GMM framework (Candelon et al., 2010). The test is performed for a number of different moment conditions, but the results are only shown for K = 6, as results for other numbers of moment conditions were similar; this is also the number of moment conditions that Candelon et al. (2010) impose in their empirical research. The duration-based test can be used to test the unconditional coverage property, the independence property, and the conditional coverage property. The results for the conditional coverage property are shown in Table 6.6, and again we are not able to reject the null hypothesis in any of the cases. However, we note again the difference in GMM-statistics between the two methods, and at a significance level of 1% we are able to reject the UC property for the VAR(3) model VaR(0.95). We do not display results for p = 0.002, because this leads to too few violations: to test the independence property we need at least two durations, that is, at least three violations. The results for the tests of the UC and IND properties separately are shown in Section A.9.

       Historical VaR                          VAR(3) model VaR
p      GMM-statistic  p-value  Reject CC H0   GMM-statistic  p-value  Reject CC H0
0.05   3.0972         0.9281   False          15.2962        0.0536   False
0.025  0.6814         0.9996   False          12.8338        0.1177   False
0.01   0.9753         0.9984   False          2.9089         0.9399   False
0.005  1.3723         0.9946   False          4.5482         0.8046   False

Table 6.6: Duration-based CC property test, with a significance level of 5% and K = 6.

The HS method performs well in the duration-based test of the conditional coverage property. Again the results show that the VAR(3) model does not produce better VaR estimates than the HS method, consistent with the unconditional coverage test and the magnitude-based test.


6.4.5 Kolmogorov-Smirnov

The next step in assessing the quality of the Historical Simulation method is a goodness-of-fit test. We have 363 estimation samples, from which we compute 363 Value at Risk estimates. The profit and loss distributions used to estimate the upcoming returns consist of 249 equally weighted historical returns. We now use the Kolmogorov-Smirnov test to assess whether the actually observed returns behave as random draws from these estimation samples. This gives us the opportunity to test a crucial assumption in our analysis, namely whether the historical returns can be used with equal weights to obtain an accurate estimate of the one-day-ahead return. The output of the test is displayed in Table 6.7.

Significance level  ks-statistic  p-value  Reject H0
0.05                0.0386        0.9461   False

Table 6.7: Kolmogorov-Smirnov test for Historical Simulation method.

The Historical Simulation method is used to estimate the profit and loss distribution, and the true return is compared to this sample. First, we sort the returns of the estimated profit and loss distribution in ascending order. Then we determine the rank of the observed return in this sorted distribution and convert it into a relative rank by dividing by the number of observations in the estimation window. For the Historical Simulation method to be valid, the observed returns need to be random draws from the 249 historical returns that represent the estimated profit and loss distribution at every time t. We check this by testing whether the relative rank of the actual return with respect to the values of the profit and loss distribution is a random draw from the uniform distribution: the two-sample Kolmogorov-Smirnov test is used to compare the sample of relative ranks to theoretical values from the uniform distribution. The theoretical uniform distribution is plotted together with the sample of relative ranks in Figure A.10. Table 6.7 already showed that the null hypothesis is not rejected at a significance level of 5%, and the graph also shows little difference between the two CDFs. To conclude, based on the Kolmogorov-Smirnov test we are not able to reject the assumption that the historical returns can be used to estimate current returns. The sketch below illustrates the construction of the relative ranks.
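The sketch uses a one-sample KS test against the uniform CDF, which is equivalent in spirit to the two-sample comparison used here; the function name is our own and the windows and realized returns are synthetic placeholders.

import numpy as np
from scipy.stats import kstest

def relative_ranks(windows, realized):
    # Relative rank of each realized return within its estimation window
    # of historical returns (the estimated P&L distribution at time t).
    return np.array([np.searchsorted(np.sort(w), r, side="right") / len(w)
                     for w, r in zip(windows, realized)])

# Synthetic example: under a correct HS model the relative ranks are
# approximately U(0,1), which the KS test against the uniform CDF checks.
rng = np.random.default_rng(1)
windows = [rng.normal(size=249) for _ in range(363)]
realized = rng.normal(size=363)
ks_stat, p_value = kstest(relative_ranks(windows, realized), "uniform")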

6.4.6 Expected Shortfall backtest

Next, the backtest based on the Expected Shortfall measure is performed. We again focus on different values of p and expect, under the null hypothesis, to find a value of Z equal to zero. The results of the backtest are shown in Table 6.8, and for the Historical Simulation method we find values of the Z statistic close to zero. The Z statistic is not defined for the Historical Simulation method with p = 0.001, because the indicator function is in that case equal to zero for all t.


       Historical ES              VAR(3) model ES
p      Z-statistic  Reject H0    Z-statistic  Reject H0
0.05   0.0292       False        0.4616       False
0.025  -0.0595      False        0.9380       False
0.01   0.0624       False        -0.3730      False
0.005  0.1305       False        -0.2686      False
0.001  --           --           -0.8821      True

Table 6.8: Expected Shortfall backtest, with a significance level of 5%.

The Z statistic is strictly negative for the VAR(3) model estimates with p ≤ 0.01, but we only reject the VAR(3) model for p = 0.001. For the other values of p we are not able to reject the null hypothesis, yet the results are in line with what we found based on the Value at Risk estimates: the VAR(3) model also performs worse than the Historical Simulation method in terms of the Expected Shortfall forecasts. A minimal sketch of such a Z statistic is given below.
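The sketch implements one candidate form of the statistic, the conditional test of Acerbi and Szekely (2014), which is undefined when no violation occurs, matching the behavior in Table 6.8; the exact definition used in Section 4.4 may differ in detail, and the function name is our own.

import numpy as np

def es_backtest_z(losses, var, es):
    # Conditional ES backtest statistic in the spirit of Acerbi and
    # Szekely (2014): one minus the average ratio of realized tail losses
    # to the forecast ES, over days with a VaR violation. Losses, VaR and
    # ES are all positive numbers for a loss. Under H0, E[Z] is close to
    # zero; clearly negative values indicate realized tail losses larger
    # than the forecast ES. Significance is typically assessed by
    # simulating the distribution of Z under H0.
    losses, var, es = map(np.asarray, (losses, var, es))
    hits = losses > var                     # VaR violation indicator I_t
    if not hits.any():
        return np.nan                       # statistic not defined
    return 1.0 - np.mean(losses[hits] / es[hits])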

6.5 Robustness check: Local level model

The results we find based on the VAR(3) model are less stable than we had hoped for. The main goal is to find an accurate risk measure, but stability is also valued, because it makes a risk measure more suitable to be actually enforced by a financial institution: large shifts in the level of the risk measures make it more difficult and more expensive to adjust the required amount of capital that needs to be held. We therefore also apply a local level model to the time series of SABR parameters in first differences. The model specification and the estimates of the model parameters are given in Section A.10. Based on this model, we find the following simulations and VaR estimates over time.

[Figure: 99% VaR estimates in euros over time (Q4-15 to Q3-17) for the VAR(3) model and the local level model (LLM), ranging roughly from −20 to 200.]

Figure 6.10: Local level model simulations.

The results found with the local level model are similar to those of the VAR(3) model. Comparing the simulated profit and loss distributions, shown in gray in the right part of Figure 6.10, we see outcomes that are close to the VAR(3) model simulations. Hence, applying the local level model does not result in more stable simulations. This is unfortunate, but it does corroborate the results we found based on the VAR(3) model.


7 Conclusion

The main goal of this research was to find more accurate risk measures for swaptions, based on a time series analysis of SABR model parameters. An empirical study was used to assess the quality of this new method compared to the commonly used Historical Simulation method. Before we were able to apply the time series analysis, the SABR volatility model had to be calibrated. The optimal value for β is first estimated from the log-log plot of σ_ATM against F, and we found a value close to 0.5 for the '10y10y' swaption. Some studies, however, do not make use of this log-log plot and simply set β = 0.5. When we estimated the optimal value for β for the '5y5y' swaption, we found a value significantly different from 0.5. Estimating β from the log-log plot is straightforward and takes little time to carry out, so we recommend always checking this estimate before deciding to set β equal to 0.5.

We then noticed the relationship between the magnitude of the displacement parameter and the volatility structure. We chose the displaced SABR model to be able to deal with negative interest rates, and we find that a high value of the displacement parameter can lead to unstable calibrated SABR parameters. We therefore conclude that a dynamic value for the displacement parameter is preferred: using a displacement parameter that reflects the interest rate environment of the current time period results in a more stable SABR calibration.

In the next step, a time series analysis was applied to the SABR model parameters. After several diagnostic tests, we conclude that a vector autoregressive model is not able to capture the dynamic structure of the time series. As a result, we obtain unstable estimates of our risk measures over time. This is unfavorable, so based on the VAR model we are not able to improve on the estimates of the Historical Simulation method. We then also used a local level model to analyze the time series, but found results similar to those of the VAR model.

We recall the research question of this thesis. Can one outperform the Historical Simulation Value at Risk and Expected Shortfall forecasts by fitting a time series model to the calibrated SABR model parameters instead? Based on our empirical study, we conclude that we were not able to improve on the Historical Simulation estimates of the risk measures by using a vector autoregressive model or a local level model. Comparing the results of the numerous backtests, the Historical Simulation method performs relatively well. The independence property is in general sometimes violated when the HS method is used, but in our case we do not reject the null hypothesis of independence; we do note somewhat higher LR-statistics and keep in mind that the power of our backtests may be too low to reject in our case. Nevertheless, the HS method performs relatively well here, even though its risk measure estimates respond slowly to changes in the profit and loss distribution. The vector autoregressive model, on the other hand, performs worse in the unconditional coverage test, the magnitude-based test, and the duration-based test. Also, in the test based on the Expected Shortfall estimates, the HS forecasts are more accurate than those of the VAR(3) model. The backtests are in line with what we observed from the estimated risk measures themselves and confirm that the Historical Simulation method outperforms the vector autoregressive model in the estimation of the risk measures.

In our conclusion, we distinguish between two possibilities. First, it could be the case that another, more advanced time series model is able to produce better estimates of the risk measures. This would be interesting for follow-up research. We saw, for example, that the shifts in the dynamic displacement parameter caused shifts in the calibrated SABR model parameters. These shifts are ignored in this study, but it would be interesting to check to what extent the estimates of the risk measures could be improved by taking these shocks into account. On the other hand, it could also be the case that the uncertainty in the simulated one-day-ahead SABR model parameters simply has too large an impact on the volatility structure, and hence on the price of the swaptions. If that is the case, the time series analysis itself is not the main issue. In follow-up research, it would be interesting to investigate whether better estimates can be obtained with a more advanced time series model and, if not, why this is the case.


References

Acerbi, C. and Szekely, B. (2014). Backtesting expected shortfall. Risk.

Antonov, A., Konikov, M., and Spector, M. (2015). The free boundary SABR: Natural extension to negative rates. Risk.

Barone-Adesi, G., Giannopoulos, K., and Vosper, L. (2002). Backtesting derivative portfolios with filtered historical simulation (FHS). European Financial Management, 8(1):31–58.

Basel Committee (2013). Fundamental Review of the Trading Book: A revised market risk framework. Bank for International Settlements.

Berestycki, H., Busca, J., and Florent, I. (2004). Computing the implied volatility in stochastic volatility models. Communications on Pure and Applied Mathematics, 57(10):1352–1373.

Black, F. (1976). The pricing of commodity contracts. Journal of Financial Economics, 3(1-2):167–179.

Bogerd, K. (2015). Smile risk in expected shortfall estimation for interest rate options. Utrecht University.

Brigo, D. and Mercurio, F. (2007). Interest rate models - theory and practice: with smile, inflation and credit. Springer.

Campbell, S. (2007). A review of backtesting and backtesting procedures. The Journal of Risk, 9(2):1–17.

Candelon, B., Colletaz, G., Hurlin, C., and Tokpavi, S. (2010). Backtesting value-at-risk: A GMM duration-based test. Journal of Financial Econometrics, 9(2):314–343.

Christoffersen, P. F. (1998). Evaluating interval forecasts. International Economic Review, 39(4):841–862.

Colletaz, G., Hurlin, C., and Pérignon, C. (2013). The risk map: A new tool for validating risk models. Journal of Banking & Finance, 37(10):3843–3854.

Commandeur, J. J. F. and Koopman, S. J. (2007). An introduction to state space time series analysis. Oxford University Press.

Du, Z. and Escanciano, J. C. (2015). Backtesting expected shortfall: Accounting for tail risk. Management Science.

Frankema, L. (2016). Pricing and hedging options in a negative interest rate environment. Delft University of Technology.

Giordano, L. and Siciliano, G. (2013). Real-world and risk-neutral probabilities in the regulation on the transparency of structured products. SSRN Electronic Journal.

Gneiting, T. (2011). Making and evaluating point forecasts. Journal of the American Statistical Association, 106(494):746–762.

Gurrola, P. and Murphy, D. (2015). Filtered historical simulation value-at-risk models and their competitors. Bank of England.

Hagan, P., Kumar, D., Lesniewski, A., and Woodward, D. (2002). Managing smile risk. Wilmott Magazine, 1:84–108.


Hull, J. (2012). Options, futures, and other derivatives. Prentice Hall.

Itô, K. (1951). On stochastic differential equations. Memoirs of the American Mathematical Society, 4:1–51.

Kupiec, P. H. (1995). Techniques for verifying the accuracy of risk measurement models. The Journal of Derivatives, 3(2):73–84.

Liew, V. K.-S. (2004). Which lag length selection criteria should we employ? Economics Bulletin, 3(33):1–9.

Massey, F. J. (1951). The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, 46(253):68–78.

Miller, L. H. (1956). Table of percentage points of Kolmogorov statistics. Journal of the American Statistical Association, 51(273):111–121.

Moni, C. (2014). Risk managing smile risk with SABR model. WBS Interest Rate Conference.

Obłój, J. (2008). Fine-tune your smile: Correction to Hagan et al. Wilmott Magazine, 1.

Pérignon, C. and Smith, D. R. (2010). The level and quality of value-at-risk disclosure by commercial banks. Journal of Banking & Finance, 34(2):362–377.

Piontek, K. (2009). The analysis of power for some chosen VaR backtesting procedures: Simulation approach. Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pages 481–490.

Pritsker, M. G. (2001). The hidden dangers of historical simulation. Journal of Banking & Finance, 30(2):561–582.

Roccioletti, S. (2016). Backtesting value at risk and expected shortfall. Springer Gabler.

Tsay, R. S. (2005). Analysis of financial time series. Wiley Series in Probability and Statistics.

Uri, R. (2000). A practical guide to swap curve construction. Bank of Canada.

West, G. (2005). Calibration of the SABR model in illiquid markets. Applied Mathematical Finance, 12(4):371–385.


A Appendix

A.1 Data

Boxplots of the data are displayed below. The Euribor data consists of deposits for all tenors up to and including three weeks and of swaps for all remaining tenors. The boxplots show the minimum, the quantiles, and the outliers of the data for every tenor and strike rate, respectively. The boxplot marks a value as an outlier if it is larger than q3 + w(q3 − q1) or smaller than q1 − w(q3 − q1), with whisker w = 1.5 and q1 and q3 equal to the 25th and 75th percentiles of the sample data, respectively. We note that the Euribor data is shown in a more compact way, but this boxplot would still show outliers if they existed in the data. The boxplot of the swaption premiums, on the other hand, shows multiple outliers. If the data were normally distributed, we would expect about 0.7% outliers for every strike rate, which equals 4.3 outliers per strike rate in our sample; see the check below. However, we notice more outliers for swaptions with strike rates that are further out-of-the-money, which indicates that the swaption premiums are not identically distributed across strike rates.
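The 0.7% figure follows from the whisker rule under normality, which can be verified quickly:

from scipy.stats import norm

# For standard normal data the upper whisker sits at q3 + 1.5 * (q3 - q1),
# about 2.698 standard deviations, so the two-sided tail mass is ~0.7%;
# multiplying by the per-strike sample size gives the 4.3 expected
# outliers mentioned above.
q1, q3 = norm.ppf(0.25), norm.ppf(0.75)
whisker = q3 + 1.5 * (q3 - q1)               # ~ 2.698
outlier_fraction = 2.0 * norm.cdf(-whisker)  # ~ 0.0070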

[Figure: boxplot of Euribor deposit and swap rates in % (−0.5 to 1.5) for tenors from ON to 60Y.]

Figure A.1: Boxplot of Euribor data.


[Figure: boxplot of '10y10y' swaption premiums in euros (0 to 900) by relative strike rate, from −1.5% to 3%.]

Figure A.2: Boxplot of premiums for the ’10y10y’ swaption.

A.2 Determining the optimal value for β

[Figure: log-log plot of log σ_ATM against log F with an OLS approximation; log F ranges from −5.4 to −3.8 and log σ_ATM from −1.65 to −1.2.]

Figure A.3: Log-log plot for the ’5y5y’ swaption.

The OLS estimation gives us the following results:

               log α     −(1−β)    α        β
OLS estimate   -2.7005   -0.2809   0.0672   0.7191

Table A.1: OLS estimates for α and β.
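A minimal sketch of the regression behind Table A.1: the SABR backbone implies log σ_ATM = log α − (1 − β) log F, so β follows from the slope of an OLS fit. The data arrays below are placeholders, not the thesis data.

import numpy as np

log_f = np.log(np.array([0.0045, 0.0060, 0.0080, 0.0110, 0.0150]))
log_vol = np.log(np.array([0.28, 0.26, 0.24, 0.22, 0.21]))
slope, intercept = np.polyfit(log_f, log_vol, 1)  # slope = -(1 - beta)
beta = 1.0 + slope
alpha = np.exp(intercept)                         # intercept = log(alpha)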


A.3 Time series of SABR parameters

[Figure: calibrated SABR parameters for the '5y5y' swaption, Q4-14 to Q3-17, with fixed displacement (left) and dynamic displacement (right); parameter values between −1 and 1.]

Figure A.4: SABR parameters for the ’5y5y’ swaption.

[Figure: SABR parameters with dynamic displacement for the '5y5y' swaption (left) and the '10y10y' swaption (right), Q4-14 to Q3-17; left axis SABR parameters (−1 to 1), right axis displacement in % (0 to 3).]

Figure A.5: SABR parameters with dynamic displacement.


A.4 Lag length selection

[Figure: sample autocorrelation functions of the SABR parameter series with dynamic displacement, for lags up to 60 and up to 600.]

Figure A.6: Sample ACF’s for SABR parameters with Dynamic displacement.


Below the results for several selection criteria are shown. LR stands for the sequential modified likelihood-ratio test statistic, FPE is the final prediction error, AIC is the Akaike information criterion, SC is the Schwarz information criterion, and HQ represents the Hannan-Quinn information criterion. The tests in the table are at the 5% level and the optimal lag order selected by each criterion is denoted with an asterisk.

Lag  LogL      LR         FPE        AIC        SC         HQ
0    2261.784  NA         5.92e-13   -19.64160  -19.59675  -19.62351
1    2290.584  56.59829   4.98e-13   -19.81377  -19.63440* -19.74142
2    2303.829  25.68476   4.80e-13   -19.85069  -19.53678  -19.72407
3    2322.641  35.98778   4.41e-13   -19.93601  -19.48757  -19.75512*
4    2329.173  12.32593   4.51e-13   -19.91455  -19.33157  -19.67939
5    2337.536  15.56151   4.53e-13   -19.90901  -19.19150  -19.61958
6    2349.805  22.51040   4.41e-13   -19.93743  -19.08539  -19.59373
7    2361.361  20.90162   4.31e-13   -19.95966  -18.97308  -19.56169
8    2369.328  14.20298   4.36e-13   -19.95068  -18.82957  -19.49845
9    2391.234  38.47807   3.90e-13   -20.06291  -18.80726  -19.55640
10   2398.282  12.19566   3.97e-13   -20.04593  -18.65575  -19.48516
11   2409.427  18.99526   3.90e-13   -20.06458  -18.53987  -19.44954
12   2419.645  17.14795   3.87e-13*  -20.07517* -18.41592  -19.40587
13   2425.063  8.95164    4.00e-13   -20.04402  -18.25024  -19.32045
14   2433.286  13.37230   4.04e-13   -20.03727  -18.10896  -19.25943
15   2438.372  8.13667    4.19e-13   -20.00323  -17.94039  -19.17112
16   2449.252  17.12415   4.14e-13   -20.01958  -17.82220  -19.13320
17   2464.643  23.82352   3.93e-13   -20.07516  -17.74325  -19.13451
18   2468.343  5.62935    4.13e-13   -20.02907  -17.56262  -19.03415
19   2471.675  4.98327    4.36e-13   -19.97978  -17.37880  -18.93060
20   2483.380  17.20143*  4.29e-13   -20.00330  -17.26779  -18.89985

Table A.2: VAR lag order selection criteria.
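A selection table of this kind can be produced with standard software; the sketch below uses statsmodels, with a random placeholder standing in for the (T x 3) matrix of first-differenced SABR parameters.

import numpy as np
from statsmodels.tsa.api import VAR

# Placeholder for the first-differenced SABR parameter series (alpha,
# rho, nu); in the thesis this is the calibrated series itself.
rng = np.random.default_rng(0)
d_params = rng.normal(scale=0.01, size=(249, 3))

order = VAR(d_params).select_order(maxlags=20)  # AIC, BIC, FPE and HQIC
print(order.summary())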


A.5 Vector Autoregression

AR-Stationary 3-Dimensional VAR(3) Model

Effective Sample Size: 246
Number of Estimated Parameters: 30
Log-Likelihood: 2432.32
AIC: -4804.63
BIC: -4699.47

              Value        StandardError  TStatistic  PValue
Constant(1)   -4.9616e-05  6.4909e-05     -0.7644     0.44463
Constant(2)   -0.0007553   0.0025274      -0.29885    0.76506
Constant(3)   -0.0003377   0.0017135      -0.19708    0.84377
AR{1}(1,1)    -0.019815    0.07939        -0.2496     0.8029
AR{1}(2,1)    -3.6426      3.0912         -1.1784     0.23865
AR{1}(3,1)    -2.6018      2.0958         -1.2415     0.21444
AR{1}(1,2)    0.001924     0.0021331      0.90198     0.36707
AR{1}(2,2)    -0.34306     0.083057       -4.1305     3.62e-05
AR{1}(3,2)    -0.055376    0.056311       -0.98341    0.32541
AR{1}(1,3)    0.0019307    0.0028546      0.67634     0.49883
AR{1}(2,3)    -0.048717    0.11115        -0.43829    0.66118
AR{1}(3,3)    0.016625     0.075359       0.22061     0.8254
AR{2}(1,1)    0.11722      0.078275       1.4976      0.13425
AR{2}(2,1)    -10.905      3.0478         -3.578      0.00034629
AR{2}(3,1)    5.5086       2.0664         2.6658      0.0076798
AR{2}(1,2)    -0.0036061   0.0021917      -1.6453     0.099902
AR{2}(2,2)    -0.14852     0.08534        -1.7403     0.081801
AR{2}(3,2)    -0.033632    0.057859       -0.58128    0.56105
AR{2}(1,3)    -0.0113      0.0028421      -3.9758     7.0137e-05
AR{2}(2,3)    0.30217      0.11067        2.7304      0.0063252
AR{2}(3,3)    -0.24748     0.07503        -3.2985     0.00097215
AR{3}(1,1)    0.050474     0.079474       0.6351      0.52537
AR{3}(2,1)    -7.2978      3.0945         -2.3583     0.018359
AR{3}(3,1)    -0.36743     2.098          -0.17513    0.86098
AR{3}(1,2)    -0.0016881   0.0021352      -0.79059    0.42918
AR{3}(2,2)    -0.0095256   0.083142       -0.11457    0.90879
AR{3}(3,2)    -0.1261      0.056368       -2.2371     0.025282
AR{3}(1,3)    -0.012139    0.0028895      -4.2012     2.6556e-05
AR{3}(2,3)    0.54294      0.11251        4.8257      1.3953e-06
AR{3}(3,3)    -0.29708     0.07628        -3.8946     9.835e-05


Innovations covariance matrix:
 0.0000  -0.0000   0.0000
-0.0000   0.0016  -0.0006
 0.0000  -0.0006   0.0007

Innovations correlation matrix:
 1.0000  -0.5863   0.4330
-0.5863   1.0000  -0.5310
 0.4330  -0.5310   1.0000
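Output of this form can be reproduced with a standard VAR estimator; a minimal sketch (again with a random placeholder for the SABR parameter differences):

import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
d_params = rng.normal(scale=0.01, size=(249, 3))  # placeholder series

res = VAR(d_params).fit(3)           # VAR(3) estimated by OLS
print(res.summary())                 # coefficients, std errors, t-stats
sigma_u = np.asarray(res.sigma_u)    # innovations covariance matrix
corr = np.corrcoef(res.resid.T)      # innovations correlation matrix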

A.6 Evaluating the time series analysis

The results of the performed diagnostic tests are summarized below. The lag length selection criteria are shown for different samples and the other tests are all based on the first estimation window (13/Jan/2015 - 06/Jan/2016).

      Portmanteau test                              LM test (χ2(9))
Lags  Q-Stat    Prob.   Adj Q-Stat  Prob.   df   LM-Stat   Prob.
1     0.395149  NA*     0.396762    NA*     NA*  10.44895  0.3154
2     1.948446  NA*     1.96279     NA*     NA*  11.24182  0.2595
3     4.111961  NA*     4.153016    NA*     NA*  16.09898  0.0648
4     7.868684  0.5474  7.971833    0.537   9    4.45073   0.8793
5     11.26682  0.8827  11.44047    0.8747  18   3.858909  0.9205
6     44.47365  0.0185  45.47747    0.0145  27   37.52362  0
7     58.42196  0.0105  59.83431    0.0076  36   13.75203  0.1314
8     78.40643  0.0015  80.49053    0.0009  45   20.5727   0.0147
9     93.57406  0.0007  96.23414    0.0004  54   15.46203  0.079
10    102.7398  0.0012  105.7882    0.0006  63   9.46408   0.3956
11    115.5032  0.0009  119.149     0.0004  72   13.45859  0.1429
12    139.618   0.0001  144.5006    0       81   24.38307  0.0037
13    173.1128  0       179.8642    0       90   36.33583  0
14    177.9179  0       184.9593    0       99   5.150915  0.821
15    184.9171  0       192.4129    0       108  7.294864  0.6064
16    195.7306  0       203.9787    0       117  11.4214   0.2479
17    212.7453  0       222.2564    0       126  17.5525   0.0407
18    224.2442  0       234.6632    0       135  12.33173  0.1952
19    243.6628  0       255.7071    0       144  21.86051  0.0093
20    252.3717  0       265.1867    0       153  8.914652  0.4452

Table A.3: Portmanteau test and LM test for auto correlation.


Without cross terms
Dependent  R-squared  F(18,227)  Prob.  χ2(18)    Prob.
res1*res1  0.646044   23.01793   0      158.9268  0
res2*res2  0.546893   15.22139   0      134.5356  0
res3*res3  0.283992   5.001986   0      69.86214  0
res2*res1  0.657318   24.19009   0      161.7002  0
res3*res1  0.619349   20.51922   0      152.3597  0
res3*res2  0.588867   18.06292   0      144.8612  0
Joint test (χ2(108)): 312.5637, with prob. 0.

With cross terms
Dependent  R-squared  F(18,227)  Prob.  χ2(18)    Prob.
res1*res1  0.929864   46.89429   0      228.7466  0
res2*res2  0.910374   35.92741   0      223.952   0
res3*res3  0.465922   3.085659   0      114.6168  0
res2*res1  0.981364   186.2624   0      241.4156  0
res3*res1  0.924792   43.49284   0      227.4988  0
res3*res2  0.927955   45.55747   0      228.2768  0
Joint test (χ2(324)): 720.2421, with prob. 0.

Table A.4: White heteroskedasticity test.

Component  Skewness     χ2        df  Prob.
1          -2.294449    92.91814  1   0
2          -0.433951    7.546527  1   0.006
3          -1.176436    40.00863  1   0
Joint test               140.4733  3   0

Component  Kurtosis     χ2        df  Prob.
1          20.48321     20.32191  1   0
2          18.2069      444.9114  1   0
3          22.57069     417.4646  1   0
Joint test               882.698   3   0

Component  Jarque-Bera  df  Prob.
1          113.24       2   0
2          452.458      2   0
3          457.4732     2   0
Joint test 1023.171     6   0

Table A.5: Normality test.


A.7 Forecasts

Figure A.7: Forecasts based on VAR(3) model fitted to ’10y10y’ swaption with dynamic displacement.


A.8 Risk measurement

[Figure: strikes K1, K2, K3, the par swap rate and the fixed strike interval in % (0 to 3.5) over time, Q4-14 to Q3-17.]

Figure A.8: Range of strikes and the par swap rate over time.

[Figure: overestimates in euros of the VAR(3) model (MSE: 563.7987) and the Historical Simulation method (MSE: 378.7922) over time, Q4-15 to Q3-17.]

Figure A.9: Comparison between actual and mean of estimated returns.


A.9 Backtests

       Historical VaR                          VAR(3) model VaR
p      GMM-statistic  p-value  Reject UC H0   GMM-statistic  p-value  Reject UC H0
0.05   0.2006         0.6542   False          7.5003         0.0062   True
0.025  0.2234         0.6365   False          1.4720         0.2250   False
0.01   5.3872e-04     0.9815   False          0.8001         0.3711   False
0.005  4.0201e-04     0.9840   False          0.8975         0.3434   False

Table A.6: Duration-based UC property test, with a significance level of 5% and K = 6 moment conditions.

       Historical VaR                           VAR(3) model VaR
p      GMM-statistic  p-value  Reject IND H0   GMM-statistic  p-value  Reject IND H0
0.05   2.8965         0.9407   False           7.7958         0.4537   False
0.025  0.4581         0.9999   False           11.3618        0.1820   False
0.01   0.9747         0.9984   False           2.1088         0.9775   False
0.005  1.3719         0.9946   False           3.6506         0.8872   False

Table A.7: Duration-based IND property test, with a significance level of 5% and K = 6 moment conditions.

[Figure: empirical CDF of the relative ranks against the uniform distribution CDF on [0, 1].]

Figure A.10: Comparison between empirical and theoretical distribution Historical Simulation method.


A.10 Local level model

The local level model that is used is specified as follows:

μ_t^(1) = μ_{t-1}^(1) + c_1 ε_t^(1)
μ_t^(2) = μ_{t-1}^(2) + c_2 ε_t^(2)
μ_t^(3) = μ_{t-1}^(3) + c_3 ε_t^(3)                                   (A.1)

α_t = c_4 μ_t^(1) + c_7 μ_t^(2) + c_{10} μ_t^(3) + c_{13} η_t
ρ_t = c_5 μ_t^(1) + c_8 μ_t^(2) + c_{11} μ_t^(3) + c_{14} η_t
ν_t = c_6 μ_t^(1) + c_9 μ_t^(2) + c_{12} μ_t^(3) + c_{15} η_t

The following initial state mean and covariance matrix estimates are used:

Initial state means:
x1            x2           x3
-6.5489e-05   5.2572e-04   -4.7769e-05

Initial state covariance matrix:
      x1          x2          x3
x1    1.66e-07    -3.25e-06   -3.92e-08
x2    -3.25e-06   6.65e-04    2.24e-05
x3    -3.92e-08   2.24e-05    3.71e-05

The coefficients of the model equations are estimated by maximum likelihood and this results in the following values:

58 Frank de Zwart Abn Amro Model Validation

        Coeff      Std Err    t Stat      Prob.
c(1)    -0.00009   0.00624    -0.01466    0.98830
c(2)    -0.00005   0.30904    -0.00015    0.99988
c(3)    -0.00004   0.36125    -0.00011    0.99991
c(4)    0.08108    3.01323    0.02691     0.97853
c(5)    0.46985    20.65594   0.02275     0.98185
c(6)    0.43666    23.87897   0.01829     0.98541
c(7)    0.02826    0.58728    0.04812     0.96162
c(8)    0.40308    15.57251   0.02588     0.97935
c(9)    0.42118    11.34973   0.03711     0.97040
c(10)   0.02609    5.85120    0.00446     0.99644
c(11)   0.43940    79.70170   0.00551     0.99560
c(12)   0.42112    87.15293   0.00483     0.99614
c(13)   -0.00109   0.00004    -26.26307   0
c(14)   -0.04227   0.00086    -49.09493   0
c(15)   -0.02838   0.00100    -28.50648   0

        Final State  Final Std Dev  t Stat      Prob.
x(1)    0.00011      0.00161        0.07058     0.94373
x(2)    -0.00676     0.02566        -0.26360    0.79209
x(3)    0.00535      0.02502        0.21382     0.83069

Table A.8: Parameter estimates of the local level model.

This is based on the first differences of the SABR parameters in the first estimation window, with a sample size of 249. We also find a log-likelihood of 2307.61, an Akaike information criterion of -4585.21, and a Bayesian information criterion of -4532.45.
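As a univariate illustration of how a local level model of this kind can be estimated by maximum likelihood, the sketch below fits the statsmodels local level specification to a synthetic series; the model above is a multivariate version with three common level components, and the series here is only a placeholder for the first difference of one SABR parameter.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
y = (np.cumsum(rng.normal(scale=0.001, size=249))
     + rng.normal(scale=0.01, size=249))   # random walk plus noise

mod = sm.tsa.UnobservedComponents(y, level="local level")
res = mod.fit(disp=False)
print(res.summary())                 # sigma2.irregular and sigma2.level
level = res.smoothed_state[0]        # smoothed level component mu_t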
