ESTIMATING RISK USING STOCHASTIC VOLATILITY MODELS AND PARTICLE STOCHASTIC APPROXIMATION EXPECTATION MAXIMIZATION

HENRIK KRAGH

Master’s thesis 2020:E62, Mathematicarum Scientiarum Centrum

Faculty of Science, Centre for Mathematical Sciences, Mathematical Statistics

Abstract. In this thesis several stochastic volatility models are presented and used to estimate the risk of a collection of Swedish stocks, as well as of a portfolio consisting of said stocks. Model parameters are estimated using the PSAEM algorithm. It is concluded that these models are adequate at estimating the one day ahead five percent Value at Risk of the data in terms of conditional coverage.

Contents

1 Introduction
  1.1 Stochastic Volatility
  1.2 Thesis Objective
2 Theory and Concepts
  2.1 Log Returns
  2.2 Value at Risk
    2.2.1 Backtesting Value at Risk
  2.3 Hidden Markov Model
  2.4 Maximum Likelihood
  2.5 Exponential Family Distributions
3 Sequential Monte Carlo Methods
  3.1 Expectation Maximization
  3.2 Monte Carlo EM
  3.3 Sequential Monte Carlo
  3.4 Particle Filter
  3.5 Particle Gibbs
  3.6 Particle Gibbs with Ancestor Sampling
  3.7 Particle Stochastic Approximation EM
4 Method
  4.1 Data
  4.2 Simple Univariate Model
  4.3 Correlated Univariate Model
  4.4 Multivariate Model
  4.5 Cholesky Decomposition
  4.6 Spherical Parameterization
  4.7 Portfolio
5 Results
  5.1 Simple Univariate Model
    5.1.1 Simulation Study
    5.1.2 Parameter Estimates
    5.1.3 Model Evaluation
  5.2 Correlated Univariate Model
    5.2.1 Parameter Estimates
    5.2.2 Model Evaluation
  5.3 Multivariate Model
    5.3.1 Parameter Estimates
    5.3.2 Model Evaluation
6 Discussion and Conclusion
  6.1 PSAEM
  6.2 Simulation Study
  6.3 Simple Univariate Model
  6.4 Correlated Univariate Model
  6.5 Multivariate Model
  6.6 Conclusion
  6.7 Future Studies
A Appendix
  A.1 Exponential Form of the Multivariate Model
  A.2 Figures
  A.3 Full Parameter Estimates of the Multivariate Model
7 References

1 Introduction

Financial risk management has become a big part of the global economy, allowing companies as well as private investors to maximize potential earnings while controlling the amount of risk they are exposed to. A big part of risk management within finance is the construction of portfolios consisting of several assets in such a way that the risk of the portfolio matches the risk preferences of its holder. To manage this risk it is essential to be able to estimate and predict it, which requires sufficient understanding of the behaviour of the returns of the assets. Thus finding ways of modeling these returns holds great relevance. Such models often have complex likelihoods, and require optimization over a very large parameter space (such as the hidden state of a hidden Markov model). For this reason models for the returns of financial assets are commonly fitted using Monte Carlo methods, which will be done in this thesis.

1.1 Stochastic Volatility

Volatility is a measure of the unforeseeable dispersion that occurs within the value of an asset over time. Thus any method of approximating risk must take into consideration the volatility of the asset.

Figure 1: Daily log returns of Holmen B from 2006-01-02 to 2020-01-23.

The returns of an asset often vary between periods with low volatility and periods with high volatility. This behaviour is known as volatility clustering [5]. An example of this is presented in Figure 1, where a cluster of high volatility can be seen at the end of 2008 due to the subprime mortgage crisis. Modern financial models capture this behaviour by allowing volatility to change over time. This often results in models where the volatility is driven by randomness, which are known as stochastic volatility models.

1.2 Thesis Objective

The aim of this thesis is twofold. Firstly, to evaluate the performance of the Particle Stochastic Approximation Expectation Maximization (PSAEM) algorithm when applied to financial models. Secondly, and primarily, to present several new stochastic volatility models for estimating the risk of daily asset returns. These models are evaluated via backtesting of the one day Value at Risk of individual Swedish stocks, as well as of a portfolio consisting of several stocks.

2 Theory and Concepts

2.1 Log Returns

The log returns (y_1, ..., y_T) = y_{1:T} of a price process S_{0:T} are given by the logarithm of the current price divided by the price one time unit back. That is,

$$y_t = \log\left(\frac{S_t}{S_{t-1}}\right). \tag{1}$$

Log returns are typically used to describe changes in the value of financial assets, as they have the convenient property of being time additive: the log return over several time steps is the sum of all individual log returns in the interval.
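As a small illustration, the snippet below computes log returns from a vector of closing prices and checks the time additivity numerically (the price values are made up for the example):

import numpy as np

prices = np.array([102.0, 104.5, 101.2, 103.8])  # hypothetical closing prices
log_returns = np.log(prices[1:] / prices[:-1])   # y_t = log(S_t / S_{t-1}), eq. (1)

# Time additivity: the sum of the one-step log returns equals
# the log return over the whole interval.
assert np.isclose(log_returns.sum(), np.log(prices[-1] / prices[0]))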

2.2 Value at Risk

The Value at Risk at level α, referred to as VaR(α), of a stochastic return Y is defined as

$$\mathrm{VaR}(\alpha) = -\max\{v : P(Y \le v) < \alpha\} = -F_Y^{-1}(\alpha),$$

that is, the negative α quantile of Y [7]. In this thesis VaR will be used to calculate the one day risk of different assets, based on parametric estimates of the distribution of the conditional log returns y_{t+1} | y_{1:t}, x_{0:t}. At time t we wish to estimate

$$\mathrm{VaR}_{t+1}(\alpha) = -F^{-1}_{y_{t+1}\mid y_{1:t},\,x_{0:t},\,\theta}(\alpha).$$

2.2.1 Backtesting Value at Risk

For a given return process y_{1:T} with a corresponding α level Value at Risk process VaR_{1:T}(α), consider the indicator process I_{1:T}(α) defined by

$$I_t(\alpha) = \begin{cases} 1 & \text{if } y_t \le -\mathrm{VaR}_t(\alpha) \\ 0 & \text{if } y_t > -\mathrm{VaR}_t(\alpha). \end{cases} \tag{2}$$

For a VaR process which correctly predicts the α quantiles this sequence will have the properties

• P (It(α) = 1) = α, for t = 1, . . . ,T.

• Any pair {I_i(α), I_j(α)}, i ≠ j, is independent.

The first property implies that the VaR sequence must correctly predict the α quantile of the return at all time points, and is known as the unconditional coverage property. The second property means that the occurrence of a VaR exceedance must not give any information about future exceedances, and is known as the independence property. Showing that these two properties hold true turns out to be equivalent to showing that

$$I_t(\alpha) \overset{\text{i.i.d.}}{\sim} B(\alpha) \quad \text{for } t = 1, \dots, T,$$

where B(α) is a Bernoulli distribution with parameter α [4]. This is known as the conditional coverage property.

To test that the unconditional coverage property of the indicator sequence holds, i.e. that P(I_t(α) = 1) = α, one can perform a likelihood ratio test of the model I ∼ B(α) against a Bernoulli distribution with a free parameter. This gives the likelihood ratio

$$\mathrm{LR}_1 = -2\log\left(\frac{(1-\alpha)^{T-n}\,\alpha^{n}}{(1-\hat\alpha)^{T-n}\,\hat\alpha^{n}}\right) \tag{3}$$

where $n = \sum_{t=1}^{T} I_t(\alpha)$ and $\hat\alpha = n/T$. If the unconditional coverage property holds, LR_1 will asymptotically follow a $\chi^2_1$ distribution. Therefore the property can be rejected at confidence level α′ if $\mathrm{LR}_1 > F^{-1}_{\chi^2_1}(\alpha')$.

The independence property of the indicator sequence can be tested with a likelihood ratio test of whether modeling I_{1:T} as i.i.d. B(α̂) is sufficient, or if the sequence would be better modeled as a binary Markov chain with transition probabilities $P(I_t = 0 \mid I_{t-1} = 0) = \hat\alpha_{00}$ and $P(I_t = 1 \mid I_{t-1} = 1) = \hat\alpha_{11}$. This is done using the likelihood ratio

$$\mathrm{LR}_2 = -2\log\left(\frac{(1-\hat\alpha)^{T-n}\,\hat\alpha^{n}}{\hat\alpha_{00}^{\,n_{00}}\,(1-\hat\alpha_{00})^{\,n_{01}}\,(1-\hat\alpha_{11})^{\,n_{10}}\,\hat\alpha_{11}^{\,n_{11}}}\right) \tag{4}$$

where $n_{ij}$ is the number of transitions in $I_{0:T}$ from state i to state j, and $\hat\alpha_{ij} = n_{ij}/(n_{i0} + n_{i1})$. Under the second property of $I_{0:T}(\alpha)$, LR_2 converges to a $\chi^2_1$ distribution.

As the three models used in the previous likelihood ratio tests are nested, a test can be constructed which simultaneously tests both properties of the indicator sequence. This means that

$$\mathrm{LR}_{\mathrm{VaR}_\alpha} = -2\log\left(\frac{(1-\alpha)^{T-n}\,\alpha^{n}}{\hat\alpha_{00}^{\,n_{00}}\,(1-\hat\alpha_{00})^{\,n_{01}}\,(1-\hat\alpha_{11})^{\,n_{10}}\,\hat\alpha_{11}^{\,n_{11}}}\right) \tag{5}$$

will be asymptotically $\chi^2_2$ distributed. Thus the hypothesis that conditional coverage of the indicator sequence holds true can be tested by comparing $\mathrm{LR}_{\mathrm{VaR}_\alpha}$ to the quantiles of a $\chi^2_2$ distribution [4].
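A minimal sketch of the three tests, written directly from equations (3)-(5); the function name and layout are our own choices, and degenerate sequences (any transition count equal to zero) would need special handling:

import numpy as np

def christoffersen_tests(I, alpha):
    """Compute LR_1, LR_2 and LR_VaR for a 0/1 exceedance sequence I, eqs. (3)-(5)."""
    I = np.asarray(I)
    T, n = len(I), I.sum()
    a_hat = n / T
    # Transition counts n_ij of the binary Markov chain (assumed all positive here).
    n00 = np.sum((I[:-1] == 0) & (I[1:] == 0))
    n01 = np.sum((I[:-1] == 0) & (I[1:] == 1))
    n10 = np.sum((I[:-1] == 1) & (I[1:] == 0))
    n11 = np.sum((I[:-1] == 1) & (I[1:] == 1))
    a00 = n00 / (n00 + n01)
    a11 = n11 / (n10 + n11)
    # Log likelihoods of the three nested models.
    ll_null = (T - n) * np.log(1 - alpha) + n * np.log(alpha)
    ll_iid = (T - n) * np.log(1 - a_hat) + n * np.log(a_hat)
    ll_markov = (n00 * np.log(a00) + n01 * np.log(1 - a00)
                 + n10 * np.log(1 - a11) + n11 * np.log(a11))
    lr1 = -2 * (ll_null - ll_iid)     # unconditional coverage, chi2 with 1 df
    lr2 = -2 * (ll_iid - ll_markov)   # independence, chi2 with 1 df
    return lr1, lr2, lr1 + lr2        # LR_VaR = LR_1 + LR_2, chi2 with 2 df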

2.3 Hidden Markov Model

A pair of stochastic processes {x_{0:T}, y_{1:T}} is called a hidden Markov model (HMM) if

$$\begin{cases} x_{0:T} \text{ is a Markov process that is not observed,} \\ P(y_t \mid y_{1:t-1}, x_{0:t}) = P(y_t \mid x_t) \quad \text{for } 1 \le t \le T. \end{cases} \tag{6}$$

This means that at each time step, a new hidden state x_t is generated from a distribution that only depends on x_{t-1}, and then an observation y_t is generated from x_t.

2.4 Maximum Likelihood

The perhaps most frequently used method of estimating model parameters from observations is the maximum likelihood estimator: the parameter values for which the observations are as likely as possible. For an observation y, it is defined as

$$\hat\theta_{ML} = \arg\max_\theta\, p_\theta(y).$$

In the case of a stochastic process y_{1:T} with a hidden state x_{0:T}, the likelihood of the process is equal to the joint likelihood integrated over the distribution of x_{0:T}. That is,

$$\hat\theta_{ML} = \arg\max_\theta\, p_\theta(y_{1:T}) = \arg\max_\theta \int p_\theta(y_{1:T}, x_{0:T})\, dx_{0:T}.$$

Additionally, if (x0:T ,y1:T ) is a HMM, the joint probability in the integral can be rewritten as

$$p_\theta(y_{1:T}, x_{0:T}) = p_\theta(x_0) \prod_{t=1}^{T} p_\theta(x_t \mid x_{t-1})\, p_\theta(y_t \mid x_t). \tag{7}$$

2.5 Exponential Family Distributions

A distribution is said to be part of the exponential family if it can be written on the form

$$p_\theta(x) = h(x)\, e^{\langle S(x),\, \phi(\theta)\rangle - \psi(\theta)}$$

where S, φ, ψ and h are known functions.

Distributions that are part of an exponential family have the very useful property that the likelihood of n i.i.d. observations x1, . . . xn is given by

$$p_\theta(x_1, \dots, x_n) = \prod_{i=1}^{n} h(x_i)\, e^{\langle S(x_i),\, \phi(\theta)\rangle - \psi(\theta)} \overset{\theta}{\propto} e^{\langle S_x,\, \phi(\theta)\rangle - n\psi(\theta)} \tag{8}$$

where $S_x = \sum_{i=1}^{n} S(x_i)$.
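As a concrete example (a standard fact, not taken from the thesis itself), the Gaussian distribution N(µ, σ²) can be written on this form:

$$p_\theta(x) = \frac{1}{\sqrt{2\pi}} \exp\left( \left\langle \begin{pmatrix} x \\ x^2 \end{pmatrix}, \begin{pmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{pmatrix} \right\rangle - \left( \frac{\mu^2}{2\sigma^2} + \log \sigma \right) \right)$$

so that summing $S(x_i) = (x_i, x_i^2)^\top$ over the observations yields the sufficient statistic $S_x$ in (8).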


3 Sequential Monte Carlo Methods

3.1 Expectation Maximization

The EM algorithm iterates the two procedures that give it its name. After an initiation θ_0 these are

• An Expectation step, where

$$Q_k(\theta) \leftarrow \mathbb{E}\left[\log p_\theta(y_{1:T}, x_{0:T}) \mid \theta_{k-1},\, y_{1:T}\right] = \int \log(p_\theta(y_{1:T}, x_{0:T}))\, p_{\theta_{k-1}}(x_{0:T} \mid y_{1:T})\, dx_{0:T}. \tag{9}$$

• A Maximization step, where

$$\theta_k \leftarrow \arg\max_\theta\, Q_k(\theta). \tag{10}$$

That is, in the first step the likelihood function of y_{1:T} is approximated by the expected joint likelihood of y_{1:T} and x_{0:T}, where x_{0:T} is assumed to be distributed according to the previous parameter estimate θ_{k-1}. In the second step, this proxy likelihood is maximized, producing the new parameters θ_k. Iterating these steps leads to convergence of θ to a local maximum of p_θ(y_{1:T}) [12].

3.2 Monte Carlo EM

In many cases the integral in the EM algorithm cannot be calculated directly and therefore requires a numerical solution. The perhaps most common, and simplest, solution is to replace the integral in (9) with the Monte Carlo sum

$$\hat Q_k(\theta) = \frac{1}{N} \sum_{n=1}^{N} \log(p_\theta(y_{1:T}, x^n_{0:T}))$$

where $x^1_{0:T}, \dots, x^N_{0:T}$ are observations from the distribution $p_{\theta_{k-1}}(x_{0:T} \mid y_{1:T})$, which can be generated via Monte Carlo methods.

Algorithm 1: MCEM
Input: Initial parameter θ_0.
Output: Parameter estimate θ_K.
for k = 1, ..., K do
    Run a MC sampler with θ = θ_{k-1} for trajectories x^1_{0:T}, ..., x^J_{0:T}
    Set Q_k(θ) ← (1/J) Σ_{j=1}^{J} log(p_θ(y_{1:T}, x^j_{0:T}))
    Set θ_k ← arg max_θ Q_k(θ)
end

This however creates a new problem: in order to converge, the EM algorithm now also requires N → ∞ for the sum to converge to the integral it approximates. It may also be very difficult to draw correctly from the high dimensional distribution p_θ(x_{0:T} | y_{1:T}).

3.3 Sequential Monte Carlo

Drawing samples from p_θ(x_{0:T} | y_{1:T}) may be a very high dimensional problem, and therefore very difficult. Say that one instead knows p_θ(x_{0:T} | y_{1:T}) up to some normalizing constant, i.e.

$$p_\theta(x_{0:T} \mid y_{1:T}) = \frac{z(x_{0:T})}{c},$$

where we can evaluate z(x_{0:T}) but not p_θ(x_{0:T} | y_{1:T}) itself. In that case one can use an instrumental distribution r(x_{0:T}) such that r(x_{0:T}) = 0 ⟹ p_θ(x_{0:T} | y_{1:T}) = 0. Then for any function φ(x_{0:T}) it follows that

$$\begin{aligned}
\mathbb{E}[\varphi(x_{0:T}) \mid \theta_{k-1}] &= \int \varphi(x_{0:T})\, p_{\theta_{k-1}}(x_{0:T} \mid y_{1:T})\, dx_{0:T} \\
&= \frac{\int \varphi(x_{0:T})\, c\, p_{\theta_{k-1}}(x_{0:T} \mid y_{1:T})\, dx_{0:T}}{\int c\, p_{\theta_{k-1}}(x_{0:T} \mid y_{1:T})\, dx_{0:T}} \\
&= \frac{\int \varphi(x_{0:T})\, \frac{c\, p_{\theta_{k-1}}(x_{0:T} \mid y_{1:T})}{r(x_{0:T})}\, r(x_{0:T})\, dx_{0:T}}{\int \frac{c\, p_{\theta_{k-1}}(x_{0:T} \mid y_{1:T})}{r(x_{0:T})}\, r(x_{0:T})\, dx_{0:T}} \\
&= \frac{\int \varphi(x_{0:T})\, w(x_{0:T})\, r(x_{0:T})\, dx_{0:T}}{\int w(x_{0:T})\, r(x_{0:T})\, dx_{0:T}}
= \frac{\mathbb{E}_r[\varphi(x_{0:T})\, w(x_{0:T})]}{\mathbb{E}_r[w(x_{0:T})]}
\end{aligned}$$

where $w(x_{0:T}) = z(x_{0:T})/r(x_{0:T})$. Therefore the integral in (9) can be approximated by assigning $\varphi(x_{0:T}) = \log(p_\theta(y_{1:T}, x_{0:T}))$, generating $\{x^i_{0:T}\}_1^N$ from r, and using that

$$\frac{\mathbb{E}_r[\varphi(x_{0:T})\, w(x_{0:T})]}{\mathbb{E}_r[w(x_{0:T})]} \approx \frac{\sum_{i=1}^{N} \varphi(x^i_{0:T})\, w(x^i_{0:T})}{\sum_{i=1}^{N} w(x^i_{0:T})}.$$
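A small sketch of this self-normalized importance sampling estimator in a toy one dimensional case; the target and instrumental densities here are illustrative choices, not the smoothing distribution of the thesis:

import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Instrumental distribution r: standard Gaussian samples.
x = rng.standard_normal(N)
r_density = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# Unnormalized target z: a Gaussian with mean 1, scaled by an unknown constant.
z = 3.0 * np.exp(-(x - 1.0)**2 / 2)

w = z / r_density                        # importance weights w = z / r
phi = x**2                               # any test function phi
estimate = np.sum(phi * w) / np.sum(w)   # self-normalized estimator
print(estimate)                          # close to E[phi(X)] = 1 + 1 = 2 under the target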

This can also be used to generate a single trajectory $x^*_{0:T}$ from $p_\theta(x_{0:T} \mid y_{1:T})$: simply generate a large number of trajectories $x^1_{0:T}, \dots, x^N_{0:T}$ from r, and draw $x^*_{0:T}$ with probability

$$P(x^*_{0:T} = x^j_{0:T}) = \frac{w(x^j_{0:T})}{\sum_{i=1}^{N} w(x^i_{0:T})}.$$

The remaining theory regarding sequential Monte Carlo methods will be given with regard to a HMM with the choice of instrumental distribution $r(x_{0:T}) = \prod_{t=1}^{T} r_t(x_t)$, where $r_t(x_t) = p_\theta(x_t \mid x_{t-1})$. Additionally, the smoothing distribution $p_\theta(x_{0:T} \mid y_{1:T})$ will be known up to a normalizing constant by

$$p_\theta(x_{0:T} \mid y_{1:T}) = \frac{p_\theta(y_{1:T}, x_{0:T})}{p_\theta(y_{1:T})} \propto p_\theta(y_{1:T}, x_{0:T}) = p_\theta(x_0) \prod_{t=1}^{T} p_\theta(y_t \mid x_t)\, p_\theta(x_t \mid x_{t-1}),$$

meaning the correct choice of weight for a particle $x^i_t$ becomes

$$w_t(x^i_t) = \frac{z(x^i_{0:t})}{z(x^i_{0:t-1})\, r_t(x^i_t \mid x^i_{t-1})} = \frac{p_\theta(y_{1:t}, x^i_{0:t})}{p_\theta(y_{1:t-1}, x^i_{0:t-1})\, p_\theta(x^i_t \mid x^i_{t-1})} = \frac{p_\theta(y_t, x^i_t \mid y_{1:t-1}, x^i_{t-1})}{p_\theta(x^i_t \mid x^i_{t-1})} = p_\theta(y_t \mid x^i_t),$$

where then $w(x^i_{0:T}) = \prod_{t=1}^{T} w_t(x^i_t)$.

3.4 Particle Filter

Algorithm 2: Standard Particle Filter
Input: Parameter θ.
Output: Trajectory x*_{0:T}.
Draw x_0^i ∼ p(x_0) for i = 1, ..., N
Set w_0^i ← 1 for i = 1, ..., N
for t = 1, ..., T do
    Draw a_t^i with Pr(a_t^i = j) ∝ w_{t-1}^j for i = 1, ..., N
    Draw x_t^i ∼ p_θ(x_t | x_{t-1}^{a_t^i}) for i = 1, ..., N
    Set w_t^i ← p_θ(y_t | x_t^i) for i = 1, ..., N
end
Draw I with Pr(I = i) ∝ w_T^i for i = 1, ..., N
Set x*_T ← x_T^I
for t = T − 1, ..., 1 do
    Set I ← a_{t+1}^I
    Set x*_t ← x_t^I
end

The standard particle filter is the simplest SMC sampler that uses resampling to generate particles. At each time step t the standard particle filter will for each x_t^i sample an ancestor x_{0:t-1}^{a_t^i} based on the weights, which measure the suitability of each particle to the data. This is done so that particles which are unlikely given the observations get replaced by more suitable ones. After that the new particles x_t^i are drawn from their proposal density, conditioned on their ancestors. Finally, new weights w_t^i are calculated based on the suitability of the new particles to the observations. After the final time step T a trajectory x*_{0:T} is chosen by

$$P(x^*_{0:T} = x^i_{0:T}) \propto w^i_T.$$
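A compact sketch of Algorithm 2 for the stochastic volatility model of Section 4.2, using the transition as proposal so that the weight reduces to p_θ(y_t | x_t); the function layout and the stationary initialization (which assumes |a| < 1) are our own choices:

import numpy as np

def bootstrap_pf(y, a, m, sigma, mu, N=100, seed=1):
    """Standard particle filter for x_t = a x_{t-1} + m + sigma*eps_t,
    y_t = mu - exp(x_t)/2 + exp(x_t/2)*eta_t. Returns one sampled trajectory."""
    rng = np.random.default_rng(seed)
    T = len(y)
    x = np.empty((T + 1, N))
    anc = np.zeros((T + 1, N), dtype=int)
    # Initialize from the stationary distribution of the AR(1) state.
    x[0] = m / (1 - a) + sigma / np.sqrt(1 - a**2) * rng.standard_normal(N)
    w = np.full(N, 1.0 / N)
    for t in range(1, T + 1):
        anc[t] = rng.choice(N, size=N, p=w)          # resample ancestor indices
        x[t] = a * x[t - 1, anc[t]] + m + sigma * rng.standard_normal(N)
        sd = np.exp(x[t] / 2)                        # conditional std of y_t
        logw = -0.5 * ((y[t - 1] - (mu - sd**2 / 2)) / sd) ** 2 - np.log(sd)
        w = np.exp(logw - logw.max())
        w /= w.sum()                                 # normalized weights p(y_t | x_t)
    i = rng.choice(N, p=w)                           # draw the final index I
    traj = np.empty(T + 1)
    for t in range(T, 0, -1):                        # trace the ancestry backwards
        traj[t] = x[t, i]
        i = anc[t, i]
    traj[0] = x[0, i]
    return traj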

3.5 Particle Gibbs

The particle Gibbs sampler is very similar to the standard particle filter, except that it also makes use of a predefined trajectory x'_{0:T}. This trajectory is implemented into the algorithm at each step as the path x_{0:T}^N. Additionally, the ancestor of x_t^N is set to N, preserving the complete trajectory as a path of the algorithm. The use of this trajectory in a sense limits the generation of the new trajectory to the more relevant areas, as if new particles receive small weight the algorithm is likely to replicate the conditional trajectory. This results in the sampler drawing from the correct distribution for all N ≥ 2 [1].

Algorithm 3: Particle Gibbs
Input: Conditional trajectory x'_{0:T}, parameter θ.
Output: Trajectory x*_{0:T}.
Draw x_0^i ∼ p(x_0) for i = 1, ..., N − 1
Set x_0^N ← x'_0
Set w_0^i ← 1 for i = 1, ..., N
for t = 1, ..., T do
    Draw a_t^i with Pr(a_t^i = j) ∝ w_{t-1}^j for i = 1, ..., N − 1
    Draw x_t^i ∼ p_θ(x_t | x_{t-1}^{a_t^i}) for i = 1, ..., N − 1
    Set a_t^N ← N
    Set x_t^N ← x'_t
    Set w_t^i ← p_θ(y_t | x_t^i) for i = 1, ..., N
end
Draw I with Pr(I = i) ∝ w_T^i for i = 1, ..., N
Set x*_T ← x_T^I
for t = T − 1, ..., 1 do
    Set I ← a_{t+1}^I
    Set x*_t ← x_t^I
end

As good as this may sound, the algorithm introduces a new problem: bad mixing for small N. Say that one, for a small N, were to iterate the algorithm using the previously generated trajectory x*_{0:T} as the new conditional trajectory x'_{0:T}. In that case the consecutive trajectories are likely to be very similar, especially early in the process. This is firstly because if only a small number of new trajectories are generated, the conditional trajectory is more likely to be chosen, resulting in no change in the generated trajectory. Secondly, each newly generated trajectory will be very likely to at some point take on the conditional trajectory in the resampling step, causing the beginning of the new trajectory to be an exact replica of the old one.

3.6 Particle Gibbs with Ancestor Sampling

The Particle Gibbs with Ancestor Sampling (PGAS) sampler is an extension of the ordinary particle Gibbs sampling algorithm which allows a new ancestor to be generated for the conditional trajectory x'_t at each step. This algorithm has been extensively studied, and has been shown to have the same convergence properties as the regular particle Gibbs algorithm [9][3], and to have greatly improved mixing, as the new trajectory can now replicate the conditional trajectory at any point without doing so at all earlier stages as well.

Algorithm 4: PGAS
Input: Conditional trajectory x'_{0:T}, parameter θ.
Output: Trajectory x*_{0:T}.
Draw x_0^i ∼ p(x_0) for i = 1, ..., N − 1
Set x_0^N ← x'_0
Set w_0^i ← 1 for i = 1, ..., N
for t = 1, ..., T do
    Draw a_t^i with Pr(a_t^i = j) ∝ w_{t-1}^j for i = 1, ..., N − 1
    Draw x_t^i ∼ p_θ(x_t | x_{t-1}^{a_t^i}) for i = 1, ..., N − 1
    Draw a_t^N with Pr(a_t^N = j) ∝ w_{t-1}^j p_θ(x'_t | x_{t-1}^j)
    Set x_t^N ← x'_t
    Set w_t^i ← p_θ(y_t | x_t^i) for i = 1, ..., N
end
Draw I with Pr(I = i) ∝ w_T^i for i = 1, ..., N
Set x*_T ← x_T^I
for t = T − 1, ..., 1 do
    Set I ← a_{t+1}^I
    Set x*_t ← x_t^I
end

3.7 Particle Stochastic Approximation EM

The PSAEM algorithm couples the convergence criteria N → ∞ and K → ∞ of the MCEM algorithm to create an algorithm that, in theory, converges as only K → ∞. This is done by replacing the Monte Carlo sum Q_k(θ) in the MCEM algorithm with a recursively updated quasi likelihood

$$Q_k(\theta) = (1 - \gamma_k)\, Q_{k-1}(\theta) + \gamma_k\, \hat Q(\theta)$$

where $\hat Q(\theta)$ is the joint log likelihood $\log(p_\theta(y_{1:T}, x^k_{0:T}))$ of the observations and a single draw $x^k_{0:T} \sim p_{\theta_{k-1}}(x_{0:T} \mid y_{1:T})$. Here $\{\gamma_k\}_1^\infty$ is a positive and decreasing sequence of weights chosen such that

$$\sum_{k=1}^{\infty} \gamma_k = \infty, \qquad \sum_{k=1}^{\infty} \gamma_k^2 < \infty,$$

which acts as a forgetting factor for Q_k(θ), causing the algorithm to weigh the optimization more heavily towards more recent samples $x^k_{0:T}$. A typical choice of $\{\gamma_k\}_1^\infty$ is

$$\gamma_k = \frac{1}{k^\alpha}$$

for some 1/2 < α ≤ 1. A choice of α closer to one results in the algorithm moving more slowly towards the correct parameters, but allows the variance between steps within the algorithm to decrease faster. Choices of α closer to one half allow for potentially faster movement in the parameter estimates, but at the cost of a slower decrease in variance. The choice of $\{\gamma_k\}_1^\infty$ provided above is used for all applications of the PSAEM algorithm in this thesis. The hyperparameter α = 0.8 was chosen as a compromise, so as not to constrain the movement of the parameters too much while still decreasing the variance between steps relatively quickly.

Algorithm 5: PSAEM
Input: Initial state x^0_{0:T}, initial parameter θ_0.
Output: Parameter estimate θ_K.
for k = 1, ..., K do
    Run PGAS with x'_{0:T} = x^{k-1}_{0:T} and θ = θ_{k-1} to sample x^k_{0:T}
    Set Q_k(θ) ← (1 − γ_k) Q_{k-1}(θ) + γ_k log(p_θ(y_{1:T}, x^k_{0:T}))
    Set θ_k ← arg max_θ Q_k(θ)
end

However, the convergence of this algorithm is not as simple as desired. Because Q_k(θ) is defined recursively and then maximized over θ in each step, the algorithm becomes more computationally heavy as k increases. This can be avoided if the joint conditional distribution of the hidden state and the observations is part of an exponential family, as defined earlier. For a HMM this reduces to the conditional distributions p_θ(y_t | x_t) and p_θ(x_t | x_{t-1}) being part of an exponential family. In that case (8) can be used to get

$$\log(p_\theta(x_{0:T}, y_{1:T})) = \log p(x_0) + \sum_{t=1}^{T} \log(p_\theta(y_t \mid x_t)) + \log(p_\theta(x_t \mid x_{t-1})) \overset{\theta}{\propto} -(\psi_x(\theta) + \psi_y(\theta)) + \langle S_{x_{0:T}}, \phi_x(\theta)\rangle + \langle S_{y_{1:T}}, \phi_y(\theta)\rangle, \tag{11}$$

where $S_{x_{0:T}} = \sum_{t=1}^{T} S_x(x_t, x_{t-1})$ and $S_{y_{1:T}} = \sum_{t=1}^{T} S_y(y_t, x_t)$. This can then be inserted into Q_k(θ) to move the recursive updating to the sufficient statistics [6], resulting in the quasi likelihood

$$Q_k(\theta) = -(\psi_x(\theta) + \psi_y(\theta)) + \langle S^k_x, \phi_x(\theta)\rangle + \langle S^k_y, \phi_y(\theta)\rangle,$$

where $S^0_y = S^0_x = 0$ and

$$S^k_y = (1 - \gamma_k)\, S^{k-1}_y + \gamma_k \sum_{t=1}^{T} S_{y_t}, \qquad S^k_x = (1 - \gamma_k)\, S^{k-1}_x + \gamma_k \sum_{t=1}^{T} S_{x_t}.$$

This holds because, if the previous quasi likelihood can be written on the exponential form

$$Q_{k-1}(\theta) = -(\psi_x(\theta) + \psi_y(\theta)) + \langle S^{k-1}_x, \phi_x(\theta)\rangle + \langle S^{k-1}_y, \phi_y(\theta)\rangle,$$

then the linearity of the inner product can be used to rewrite the new quasi likelihood as

$$\begin{aligned}
Q_k(\theta) &= (1 - \gamma_k)\, Q_{k-1}(\theta) + \gamma_k \log(p_\theta(x_{0:T}, y_{1:T})) \\
&\overset{\theta}{\propto} (1 - \gamma_k)\left(-(\psi_x(\theta) + \psi_y(\theta)) + \langle S^{k-1}_x, \phi_x(\theta)\rangle + \langle S^{k-1}_y, \phi_y(\theta)\rangle\right) \\
&\quad + \gamma_k\left(-(\psi_x(\theta) + \psi_y(\theta)) + \langle S_{x_{0:T}}, \phi_x(\theta)\rangle + \langle S_{y_{1:T}}, \phi_y(\theta)\rangle\right) \\
&= -(\psi_x(\theta) + \psi_y(\theta)) + \gamma_k\left(\langle S_{x_{0:T}}, \phi_x(\theta)\rangle + \langle S_{y_{1:T}}, \phi_y(\theta)\rangle\right) + (1 - \gamma_k)\left(\langle S^{k-1}_x, \phi_x(\theta)\rangle + \langle S^{k-1}_y, \phi_y(\theta)\rangle\right) \\
&= -(\psi_x(\theta) + \psi_y(\theta)) + \langle \gamma_k S_{x_{0:T}} + (1 - \gamma_k) S^{k-1}_x, \phi_x(\theta)\rangle + \langle \gamma_k S_{y_{1:T}} + (1 - \gamma_k) S^{k-1}_y, \phi_y(\theta)\rangle.
\end{aligned}$$

This changes the PSAEM algorithm so that the only recursive part, S^k, does not depend on θ. This means that the recursive part no longer changes during the optimization, resulting in the algorithm no longer increasing in complexity as k → ∞ [8].
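As a small sketch of what this amounts to in practice (the helper names pgas_draw, sum_statistics and argmax_Q are hypothetical placeholders, not code from the thesis):

def gamma_k(k, alpha=0.8):
    # Step size gamma_k = 1 / k^alpha with 1/2 < alpha <= 1, as chosen above.
    return 1.0 / k**alpha

# Hypothetical PSAEM loop using the sufficient statistics recursion:
# S = 0
# for k in range(1, K + 1):
#     x = pgas_draw(theta, x)                       # one PGAS trajectory x^k_{0:T}
#     S = (1 - gamma_k(k)) * S + gamma_k(k) * sum_statistics(x, y)
#     theta = argmax_Q(S)                           # maximize the quasi likelihood
# The maximization sees only the fixed statistics S, so its cost does not grow with k.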


4 Method

4.1 Data

Ten different Swedish stocks were chosen for the analysis, for which the closing prices from 2006-01-02 to 2020-01-23 (T = 3536) were used. As dependencies between the returns of different stocks will be considered, their corresponding industries were of interest. BillerudKorsnäs, Holmen and Stora Enso were chosen as they are part of the paper industry, which could result in higher correlation in their returns compared to stocks from different industries. Boliden, Lundin and SSAB from the mining sector, as well as Tele2 and Telia from the telecom industry, were chosen for the same reasons. Additionally, Alfa Laval and Swedish Match were chosen from individual industries. This was done in an attempt to have a mix of some groups of highly correlated stocks, as well as some stocks that are not highly correlated with any other.

All data used were taken from [10]. The log returns were then calculated as in (1), and events such as splits were compensated for. Some extreme outliers that were not caused by splits or merges were also removed. For example, on 2013-05-16 Tele2 gave an extra dividend of 25.6 SEK per stock to its shareholders, with a closing price of 102 SEK the day before, resulting in a 22% drop in value of the stock during a time with otherwise low volatility. This kind of event is not wanted in the data, partly because it is impossible to predict using the models presented in this thesis, but mainly because an argument could be made that this is not a "real" loss in value of the stock from the perspective of a portfolio, as all shareholders were compensated for it. Optimally this compensation would be made for all significant dividends given to holders of the modeled stocks, but due to the difficulty of doing so only the most extreme cases were corrected. Such occurrences y_{t'} were replaced by

$$y_{t'} = \log\left(\frac{S_{t'} + D}{S_{t'-1}}\right)$$

where D is the amount of dividend handed out.

Figure 2: Log returns of the data, from 2006-01-02 to 2020-01-23.

# Name Industry
1 Alfa Laval Industrial Machinery
2 BillerudKorsnäs Paper & Paper Products
3 Boliden AB Industrial Metals & Mining
4 Holmen B Paper & Paper Products
5 Lundin Mining Corp. Copper
6 SSAB A Steel
7 Stora Enso R Paper & Paper Products
8 Swedish Match Tobacco
9 Tele2 B Telecom Services
10 Telia Telecom Services

Table 1: Chosen stocks along with their respective industries.

4.2 Simple Univariate Model

The simplest model fitted to the data using PSAEM was a HMM with Gaussian log returns y, whose log volatility is an autoregressive Markov process. This model has several properties which can be observed in real data. Firstly, the independence of the driving noise in the returns will result in independent returns, which is a well known property of real daily stock returns. Additionally, the autoregressive property of the log volatility process will result in volatility clustering, the property that returns with high volatility tend to be followed by returns with high volatility, and vice versa.

$$\begin{aligned}
x_t &= a x_{t-1} + m + \sigma \varepsilon_t \\
y_t &= \mu - \frac{e^{x_t}}{2} + e^{x_t/2}\, \eta_t
\end{aligned} \tag{12}$$

where $\varepsilon_t, \eta_t \sim N(0, 1)$. The negative $e^{x_t}/2$ term is added to y_t so that the expected return of the stock is finite, as otherwise

$$\mathbb{E}[e^{y_t}] = \mathbb{E}[\mathbb{E}[e^{y_t} \mid x_t]] = \mathbb{E}[e^{\mu + e^{x_t}/2}] = \int e^{\mu + e^{x}/2}\, dF_X(x) \propto \int e^{\mu + e^{x}/2 - x^2/2}\, dx = \infty.$$

With the added term the dependencies on x_t in the conditional expectation cancel out, resulting in the expectation

$$\mathbb{E}[e^{y_t}] = \mathbb{E}[e^{\mu - e^{x_t}/2 + e^{x_t}/2}] = \int e^{\mu}\, dF_X(x) = e^{\mu}.$$

This modification makes the model much more suitable for real data, as an infinite expectation of the returns would imply that the value of the stock was infinite, which would be an odd assumption. This term is present in all models presented in this thesis, for the same reasons.
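A short simulation sketch of model (12), with parameter values taken from the simulation study in Section 5.1.1:

import numpy as np

def simulate_sv(T=3536, a=0.95, m=-0.5, sigma=0.3, mu=0.0002, seed=42):
    """Simulate the simple univariate stochastic volatility model (12)."""
    rng = np.random.default_rng(seed)
    x = np.empty(T + 1)
    x[0] = m / (1 - a)                     # start at the stationary mean
    y = np.empty(T)
    for t in range(1, T + 1):
        x[t] = a * x[t - 1] + m + sigma * rng.standard_normal()
        vol = np.exp(x[t] / 2)             # conditional volatility e^{x_t/2}
        y[t - 1] = mu - vol**2 / 2 + vol * rng.standard_normal()
    return x, y

x, y = simulate_sv()
print(np.mean(np.exp(y)))  # sample mean of e^y, roughly e^mu = 1.0002 on average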

4.3 Correlated Univariate Model

The previous model lacks one of the widely accepted, although stylized, facts within finance: a negative correlation between the returns and the changes in volatility of an asset, meaning that below average returns are associated with an increase in volatility, and vice versa [2]. A second model was constructed to accommodate this property,

$$\begin{aligned}
x_t &= a x_{t-1} + m + \sigma \varepsilon_t \\
y_t &= \mu - \frac{e^{x_{t-1}}}{2} + e^{x_{t-1}/2}\, \eta_t
\end{aligned} \tag{13}$$

where Cov(η_t, ε_t) = ρ. In essence this model adds a correlation between the driving noise in the log return and the driving noise in the log volatility at the next time step. This introduces a direct dependency between y_t and x_{t-1}, so the model is not a HMM, which requires a modification of the PSAEM algorithm compared to the previous model. The likelihood of such a model can be expressed as

$$p_\theta(y_{1:T}, x_{0:T}) = p(x_0) \prod_{t=1}^{T} p_\theta(y_t \mid x_{t-1})\, p_\theta(x_t \mid y_t, x_{t-1}). \tag{14}$$

An algorithm which generates observations from p_θ(x_{0:T} | y_{1:T}) can be constructed by changing the transition distribution (from which new suggestions for x_t are generated) in the PGAS to $r_{t+1}(x_{t+1} \mid x_t) = p_\theta(x_{t+1} \mid y_{t+1}, x_t)$, and by compensating for this in the weights by assigning

$$w^i_{t+1} = \frac{z(x^i_{0:t+1})}{z(x^i_{0:t})\, r_{t+1}(x^i_{t+1} \mid x^i_t)} = \frac{p_\theta(y_{1:t+1}, x^i_{0:t+1})}{p_\theta(y_{1:t}, x^i_{0:t})\, p_\theta(x^i_{t+1} \mid y_{t+1}, x^i_t)} = \frac{p_\theta(y_{t+1} \mid x^i_t)\, p_\theta(x^i_{t+1} \mid y_{t+1}, x^i_t)}{p_\theta(x^i_{t+1} \mid y_{t+1}, x^i_t)} = p_\theta(y_{t+1} \mid x^i_t).$$
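For completeness, since $(\eta_t, \varepsilon_t)$ are jointly Gaussian, both densities needed above follow from standard Gaussian conditioning; under model (13) this gives (a derivation not spelled out in the text, so treat it as a sketch):

$$y_{t+1} \mid x_t \sim N\!\left(\mu - \frac{e^{x_t}}{2},\; e^{x_t}\right), \qquad
x_{t+1} \mid y_{t+1}, x_t \sim N\!\left(a x_t + m + \sigma\rho\, \hat\eta_{t+1},\; \sigma^2(1 - \rho^2)\right)$$

where $\hat\eta_{t+1} = e^{-x_t/2}\left(y_{t+1} - \mu + \frac{e^{x_t}}{2}\right)$ is the realized return noise.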

4.4 Multivariate Model

It is widely known that movements in the stock market often affect the entire market, implying that to model the log returns of several stocks, or of a portfolio of stocks, one needs to take into consideration the potential dependencies between log returns. The same argument can certainly be made for the hidden volatility processes. Therefore a multivariate model was created which allows for a full correlation structure within both the driving noise of the log returns and the driving noise of the volatility process, as well as between the two. This model is defined very similarly to the univariate case, except scaled up to the correct dimensions:

$$\begin{aligned}
\begin{pmatrix} x^1_t \\ \vdots \\ x^d_t \end{pmatrix} &=
\begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_d \end{pmatrix}
\begin{pmatrix} x^1_{t-1} \\ \vdots \\ x^d_{t-1} \end{pmatrix} +
\begin{pmatrix} m_1 \\ \vdots \\ m_d \end{pmatrix} +
\begin{pmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_d \end{pmatrix} \varepsilon_t \\
\begin{pmatrix} y^1_t \\ \vdots \\ y^d_t \end{pmatrix} &=
\begin{pmatrix} \mu_1 \\ \vdots \\ \mu_d \end{pmatrix} -
\begin{pmatrix} e^{x^1_{t-1}}/2 \\ \vdots \\ e^{x^d_{t-1}}/2 \end{pmatrix} +
\begin{pmatrix} e^{x^1_{t-1}/2} & & \\ & \ddots & \\ & & e^{x^d_{t-1}/2} \end{pmatrix} \eta_t
\end{aligned} \tag{15}$$

where $[\eta_t, \varepsilon_t]^\top \sim N(0, \Sigma)$ and

$$\Sigma = \begin{pmatrix} \Sigma_\eta & \Sigma_{\eta\varepsilon} \\ \Sigma_{\eta\varepsilon}^\top & \Sigma_\varepsilon \end{pmatrix}$$

is their joint correlation matrix. This model contains a full correlation matrix of size 2d × 2d, which needs to retain the properties of a correlation matrix (symmetric, positive definite) throughout the optimization of the quasi likelihood in the PSAEM algorithm. Therefore there is a need to parameterize the correlation matrix in an efficient manner.

4.5 Cholesky Decomposition

A lower Cholesky decomposition of a positive semi-definite matrix Σ is a lower triangular matrix L such that LL⊤ = Σ. This decomposition could be used to parameterize a covariance matrix Σ, by assigning θ to be the lower triangular elements of L. This parameterization, however, has some issues.

The largest problem is that the Cholesky decomposition is not unique: if L is a Cholesky decomposition of Σ, then so is any matrix constructed by multiplying any combination of columns of L by −1. This may cause big issues when optimizing a likelihood over Σ, as there will be multiple optima which give the same likelihood, making convergence more difficult. Additionally, this parameterization cannot be used to parameterize a correlation matrix, as there are no restrictions forcing the diagonal of LL⊤ to equal one.
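A quick numerical illustration of this non-uniqueness (a toy example, not from the thesis): flipping the sign of a column of L leaves LL⊤ unchanged.

import numpy as np

Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
L = np.linalg.cholesky(Sigma)   # lower triangular, positive diagonal

L_flip = L.copy()
L_flip[:, 1] *= -1              # negate the second column
assert np.allclose(L_flip @ L_flip.T, Sigma)  # still a valid decomposition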

4.6 Spherical Parameterization

The spherical parameterization, also known as the triangular angle parameterization, of an n × n correlation matrix Σ consists of the n(n−1)/2 parameters 0 ≤ θ_{i,j} ≤ π that define the lower triangular matrix

$$L_\theta = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots \\
\cos\theta_{2,1} & \sin\theta_{2,1} & 0 & 0 & \cdots \\
\cos\theta_{3,1} & \cos\theta_{3,2}\sin\theta_{3,1} & \sin\theta_{3,2}\sin\theta_{3,1} & 0 & \cdots \\
\cos\theta_{4,1} & \cos\theta_{4,2}\sin\theta_{4,1} & \cos\theta_{4,3}\sin\theta_{4,2}\sin\theta_{4,1} & \sin\theta_{4,3}\sin\theta_{4,2}\sin\theta_{4,1} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}$$

such that $L_\theta$ is the Cholesky decomposition of Σ, i.e., $L_\theta L_\theta^\top = \Sigma$. This parameterization guarantees that for all choices of θ the corresponding matrix Σ will be a correlation matrix, that is, symmetric and positive definite. This property is very practical in the optimization, as it restricts the optimization to the true parameter space. For the optimization over the likelihoods these parameters were replaced by unconstrained parameters $\theta^*_{i,j}$, which were mapped onto [0, π] by

$$\theta_{i,j} = \frac{\pi}{1 + e^{\theta^*_{i,j}}}.$$

As sin(θ) is positive for θ ∈ (0, π), this guarantees that for all choices of θ* the diagonal of the Cholesky factorization will be positive, resulting in a unique parameterization [11]. The spherical parameterization is used for the correlation matrix Σ in the optimization step of the multivariate model.
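A sketch of how such a parameterization can be implemented (the function name is our own; the construction itself follows the definition above):

import numpy as np

def spherical_to_corr(theta_star, n):
    """Map unconstrained parameters theta* to a valid n x n correlation matrix."""
    theta = np.pi / (1.0 + np.exp(np.asarray(theta_star)))  # squash into (0, pi)
    L = np.zeros((n, n))
    L[0, 0] = 1.0
    idx = 0
    for i in range(1, n):
        angles = theta[idx:idx + i]
        idx += i
        sin_prod = 1.0
        for j in range(i):
            L[i, j] = np.cos(angles[j]) * sin_prod
            sin_prod *= np.sin(angles[j])
        L[i, i] = sin_prod          # positive, since sin > 0 on (0, pi)
    return L @ L.T

Sigma = spherical_to_corr(np.random.randn(45), 10)  # 10*9/2 = 45 angles
assert np.allclose(np.diag(Sigma), 1.0)             # unit diagonal
assert np.all(np.linalg.eigvalsh(Sigma) > 0)        # positive definite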

4.7 Portfolio

A portfolio was constructed from the stocks $S^1_t, \dots, S^d_t$ in order to evaluate the performance of the previously defined models. At each time point, the portfolio value P is distributed equally across the modeled stocks, resulting in the log returns

$$y_{p,t+1} = \log\left(\frac{1}{d}\, e^{y^1_{t+1}} + \cdots + \frac{1}{d}\, e^{y^d_{t+1}}\right) \tag{16}$$

where $y^1, \dots, y^d$ are the log returns of the stocks included in the portfolio. The VaR of the portfolio returns was approximated via Monte Carlo methods, as explicitly calculating the VaR of this asset is very difficult. In the multivariate case this was done by generating individual log returns with the estimated correlation structure $\hat\Sigma_\eta$, whereas two different structures were used for the univariate models.

The first structure, from here on referred to as Portfolio 1, was given by an assumption that the driving noises in the log returns are mutually independent. The second correlation structure (Portfolio 2) took into consideration that even though the univariate models do not take other data into account during the estimation, it is clear that the returns of the stocks will have some dependence structure. Therefore the correlation structure of η was estimated after the fitting of the univariate models as the correlation of the recreated driving noise. That is, $\hat\Sigma_\eta = \mathrm{corr}(\hat\eta)$, where $\hat\eta$ is the reconstructed driving noise of the log returns.
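A sketch of the Monte Carlo VaR approximation for the portfolio return (16), assuming one-day-ahead means mu, volatilities vol and an estimated correlation matrix Sigma_eta have already been produced by one of the models (the function name is our own):

import numpy as np

def portfolio_var(mu, vol, Sigma_eta, alpha=0.05, n_sim=100_000, seed=7):
    """Monte Carlo one day VaR(alpha) of the equally weighted portfolio, eq. (16)."""
    rng = np.random.default_rng(seed)
    d = len(mu)
    eta = rng.multivariate_normal(np.zeros(d), Sigma_eta, size=n_sim)
    y = mu - vol**2 / 2 + vol * eta               # simulated individual log returns
    y_p = np.log(np.mean(np.exp(y), axis=1))      # equally weighted portfolio return
    return -np.quantile(y_p, alpha)               # VaR is the negative alpha quantile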

5 Results

This section presents the estimated parameters of each model as well as the performance of their corresponding one day VaR(0.05). All figures shown as examples of the general result will be of the first stock in the data, Alfa Laval. For figures corresponding to the other assets, see Appendix A.2.

5.1 Simple Univariate Model

5.1.1 Simulation Study

The PSAEM algorithm was applied to simulated data in order to compare the variance of the estimates caused by the algorithm to that caused by variations in the data. The generated processes were of the same length as the stock data (3536), with parameter values µ = 0.0002, a = 0.95, m = −0.5, σ = 0.3. These parameter values were chosen to be similar to the values one could expect for real log returns of a stock. The initial parameter values were set to the true values plus uniform noise on [−c_θ, c_θ], where c_θ is a suitable, noticeable number chosen differently for each parameter. To create a valid initial guess for the hidden state, the PGAS algorithm was run with a very high number of generated particles for the first iteration of the algorithm, in this case N = 1000.

Figures 3 and 4 show histograms of the estimates of σ. The first is calculated by repeatedly running the algorithm on the same generated sequence of observations; for the second histogram a new observed sequence was generated for each parameter estimation. These histograms firstly show that the algorithm appears to generate sufficiently unbiased estimates of the parameters. Secondly, they show that the variance in the estimates that comes from the generation of the process itself is much greater than that from the algorithm. This indicates that the choices of N = 20 and K = 1000 are sufficient for the variance of the algorithm to be low enough for accurate estimates.

Figure 3: Histogram of estimates σ̂ for one simulated process y.

Figure 4: Histogram of estimates σ̂ for different simulated processes y.

Figure 5: Estimated hidden state (red) along with the true simulated process (blue).

Parameter | True Value | Estimate SE | Algorithm SE
µ | 0.0002 | 9.577·10⁻⁵ | 5.7995·10⁻⁶
a | 0.95 | 0.0094 | 0.0025
m | −0.5 | 0.0938 | 0.0257
σ | 0.3 | 0.0282 | 0.0099

Table 2: Standard error of parameter estimates using the PSAEM algorithm.

5.1.2 Parameter Estimates

Table 3 shows the estimated parameters from fitting the simple univariate model to the data using PSAEM with N = 30 particles and K = 2000 iterations. Additionally, a visualization of the estimated correlation matrix Σ̂_η used for VaR estimation of the portfolio can be seen in Figure 6, where the data is ordered as in Table 1.

Figure 6: Visualization of the estimated correlation matrix Σ̂_η.

Name | µ̂ | â | m̂ | σ̂
Alfa Laval | 0.0009 | 0.9533 | −0.3835 | 0.2621
BillerudKorsnäs | 0.0006 | 0.9136 | −0.6954 | 0.3857
Boliden | 0.0009 | 0.9673 | −0.2495 | 0.2399
Holmen | 0.0005 | 0.9093 | −0.7822 | 0.3493
Lundin | 0.0006 | 0.9678 | −0.2339 | 0.2092
SSAB | 0.0002 | 0.9708 | −0.2202 | 0.1843
Stora Enso | 0.0008 | 0.9653 | −0.2761 | 0.1872
Swedish Match | 0.0007 | 0.9105 | −0.7847 | 0.3451
Tele2 | 0.0007 | 0.8476 | −1.3051 | 0.5392
Telia | 0.0004 | 0.9171 | −0.7372 | 0.3840

Table 3: Parameter estimates of the simple univariate model using PSAEM with N = 30 and K = 2000.

5.1.3 Model Evaluation

Model (12) implies that

$$\eta_t = \left(y_t - \mu + \frac{e^{x_t}}{2}\right) e^{-x_t/2} \sim N(0, 1),$$

therefore the adequacy of this model can be quantified by looking at the distribution of the estimated driving noise η. That is, by checking whether

$$\hat\eta_t = \left(y_t - \hat\mu + \frac{e^{\hat x_t}}{2}\right) e^{-\hat x_t/2} \sim N(0, 1). \tag{17}$$

Figure 7 shows the normality plot of η̂ for the log returns of Alfa Laval, which would follow a Gaussian distribution under the model assumptions.

Figure 7: Normality plot of η̂ for Alfa Laval.

The likelihood ratio tests given in (3)-(5) were applied to the indicator sequences of the one step VaR estimates of each stock independently, as well as to the portfolio consisting of equal investments in each stock. The results of these tests can be seen in Table 4. Recall that under the null hypothesis, LR_1 and LR_2 will follow a χ²₁ distribution, and LR_VaR a χ²₂ distribution. As comparison, the 95% quantiles of these χ² distributions are 3.8415 and 5.9915, respectively.

Name | LR1 | LR2 | LRVaR
Alfa Laval | 0.0038 | 0.2880 | 0.2918
BillerudKorsnäs | 0.8675 | 1.6243 | 2.4918
Boliden | 0.0606 | 2.5824 | 2.6430
Holmen | 2.3464 | 5.6370 | 7.9835
Lundin | 2.4216 | 0.6865 | 3.1081
SSAB | 0.3674 | 1.5781 | 1.9455
Stora Enso | 0.3946 | 1.0142 | 1.4088
Swedish Match | 0.8675 | 0.4864 | 1.3538
Tele2 | 1.1712 | 0.4031 | 1.5743
Telia | 0.3048 | 3.0068 | 3.3115
Portfolio 1 | 981.6675 | 3.7313 | 985.3988
Portfolio 2 | 3.0799 | 0.2945 | 3.3744

Table 4: Result of likelihood ratio tests on the indicator sequences, with values failing the test at 95 percent confidence level marked in bold.

Figure 8: Negative VaR(0.05) of the log returns of Alfa Laval (blue) with returns above (green) and below (red) the quantile.

5.2 Correlated Univariate Model

5.2.1 Parameter Estimates

Table 5 shows the estimated parameters from fitting the correlated univariate model to the data using PSAEM with N = 30 particles and K = 2000 iterations, and Figure 9 shows a visualization of the estimated correlation matrix Σ̂_η.

Name | µ̂ | â | m̂ | σ̂ | ρ̂
Alfa Laval | 0.0006 | 0.9595 | −0.3320 | 0.2412 | −0.2758
BillerudKorsnäs | 0.0003 | 0.9450 | −0.4404 | 0.2924 | −0.2581
Boliden | 0.0005 | 0.9794 | −0.1570 | 0.1868 | −0.3096
Holmen | 0.0003 | 0.9209 | −0.6801 | 0.3184 | −0.2607
Lundin | 0.0000 | 0.9765 | −0.1699 | 0.1828 | −0.4114
SSAB | −0.0001 | 0.9775 | −0.1691 | 0.1612 | −0.3111
Stora Enso | 0.0006 | 0.9671 | −0.2623 | 0.1772 | −0.2825
Swedish Match | 0.0004 | 0.9225 | −0.6776 | 0.3146 | −0.2800
Tele2 | 0.0004 | 0.9019 | −0.8356 | 0.4073 | −0.3074
Telia | 0.0001 | 0.9329 | −0.5966 | 0.3341 | −0.3201

Table 5: Parameter estimates of each stock for the univariate model with correlation (N = 30, K = 2000).

Figure 9: Visualization of Σ̂_η of the univariate model with correlation.

5.2.2 Model Evaluation

Name | LR1 | LR2 | LRVaR
Alfa Laval | 0.0287 | 0.2476 | 0.2763
BillerudKorsnäs | 0.4959 | 1.0800 | 1.5759
Boliden | 0.0866 | 0.1292 | 0.2158
Holmen | 1.5193 | 0.3749 | 1.8941
Lundin | 0.1595 | 0.4057 | 0.5652
SSAB | 0.8471 | 2.4348 | 3.2819
Stora Enso | 0.2264 | 0.1383 | 0.3647
Swedish Match | 0.3674 | 0.7410 | 1.1084
Tele2 | 0.0038 | 0.1092 | 0.1130
Telia | 0.4959 | 0.1803 | 0.6762
Portfolio 1 | 914.1447 | 3.0075 | 917.1522
Portfolio 2 | 0.3048 | 0.3324 | 0.6371

Table 6: Result of likelihood ratio tests on the indicator process of the correlated univariate model, with values failing the test at 95 percent confidence level marked in bold.

Similarly to the simple univariate model, the fit of the model can be evaluated by examining the estimated driving noise η̂, which can be calculated as in (17), the only change being a replacement of x̂_t with x̂_{t-1} due to the changed dynamics of the correlated model. The normality plot of the driving noise of Alfa Laval can be seen in Figure 10. The results of applying the likelihood ratio tests to the estimated one day VaR can be seen in Table 6.

Figure 10: Normality plot of the estimated driving noise η̂ for Alfa Laval.

Figure 11: Negative VaR(0.05) of the log return of Alfa Laval (blue) with returns above (green) and below (red) the quantile.

5.3 Multivariate Model

The estimated parameters of the multivariate model can be seen in Table 7, and Figure 12 shows a visualization of the estimated correlation matrix Σ̂_η. For a visualization of the full 20 × 20 correlation matrix of the driving noise, as well as the exact values, see Appendix A.3.

27 Figure 12: Visualization of the estimated correlation matrix Σˆ η of the multivariate model.

5.3.1 Parameter Estimates

Name | µ̂ | â | m̂ | σ̂
Alfa Laval | 0.0003 | 0.8281 | −1.4196 | 0.4446
BillerudKorsnäs | −0.0001 | 0.8629 | −1.1032 | 0.4358
Boliden | 0.0001 | 0.9012 | −0.7547 | 0.3369
Holmen | 0.0001 | 0.8235 | −1.5240 | 0.4451
Lundin | −0.0003 | 0.9197 | −0.5827 | 0.2687
SSAB | −0.0006 | 0.9030 | −0.7348 | 0.2831
Stora Enso | 0.0001 | 0.8912 | −0.8697 | 0.2830
Swedish Match | 0.0004 | 0.8618 | −1.2121 | 0.4188
Tele2 | 0.0003 | 0.7713 | −1.9563 | 0.6037
Telia | 0.0000 | 0.8043 | −1.7468 | 0.5464

Table 7: Parameter estimates of the multivariate model for N = 100 and K = 2000.

5.3.2 Model Evaluation

Name | LR1 | LR2 | LRVaR
Alfa Laval | 2.1234 | 0.4937 | 2.6171
BillerudKorsnäs | 0.3946 | 0.6862 | 1.0808
Boliden | 1.3398 | 0.3400 | 1.6797
Holmen | 3.6219 | 0.6727 | 4.2946
Lundin | 1.3401 | 1.0665 | 2.4066
SSAB | 0.3674 | 0.2639 | 0.6313
Stora Enso | 0.1042 | 1.5501 | 1.6543
Swedish Match | 0.8471 | 0.5180 | 1.3651
Tele2 | 0.1595 | 6.4899 | 6.6495
Telia | 0.6085 | 0.1100 | 0.7185
Portfolio | 1.5193 | 1.2788 | 2.7981

Table 8: Result of likelihood ratio tests on the indicator process of the multivariate model, with values failing the test at 95 percent confidence level marked in bold.

Similarly to the correlated univariate model, the 10 dimensional driving noise η of the multivariate model can be estimated by

$$\hat\eta_t = \begin{pmatrix} e^{-\hat x^1_{t-1}/2} & & \\ & \ddots & \\ & & e^{-\hat x^d_{t-1}/2} \end{pmatrix}
\left(\begin{pmatrix} y^1_t \\ \vdots \\ y^d_t \end{pmatrix} - \begin{pmatrix} \hat\mu_1 \\ \vdots \\ \hat\mu_d \end{pmatrix} + \begin{pmatrix} e^{\hat x^1_{t-1}}/2 \\ \vdots \\ e^{\hat x^d_{t-1}}/2 \end{pmatrix}\right).$$

Each individual sequence η̂^i is then expected to follow a standard Gaussian distribution. The normality plot of one such sequence can be seen in Figure 13.

Figure 13: Normality plot of the estimated driving noise η̂ of Alfa Laval.

Figure 14: Negative VaR(0.05) of the log return of Alfa Laval (blue) with returns above (green) and below (red) the quantile.

6 Discussion and Conclusion

6.1 PSAEM

Applying PSAEM to a model where the transition probabilities are not part of an exponential family results in the algorithm slowing down considerably within a couple of hundred iterations, due to the recursive definition of Q_k in the maximization step. Therefore this algorithm can be problematic to use in combination with models whose conditional distributions are not part of an exponential family. For this reason, all models in this thesis were constructed using Gaussian noise. This simplifies the construction of models in which the conditional distributions are part of the exponential family, as the exponential form of the Gaussian distribution is the exponential of a square. For models in which e.g. both the mean and variance of the observed state depend on the hidden state, the square in the likelihood can simply be expanded to obtain the statistics of the distribution.

Additionally, the algorithm can be very difficult to apply even to models which in theory can be implemented as in (11), as the derivation of the statistics in the conditional distributions can be very cumbersome for models in which the observed process has a complex dependency on the hidden state. It should therefore be taken into consideration that the actual implementation of a model into the PSAEM algorithm may require additional effort relative to other Monte Carlo methods.

6.2 Simulation Study

The simulation study of the simple univariate model indicates that the PSAEM algorithm performs very well when applied to a one dimensional Gaussian HMM. It appears to converge quickly to parameter values which are close to the true values, and to find the correct parameter values even for poorly chosen initial parameters and states. Additionally, the variance introduced by the algorithm is relatively small: only between 0.5 and 10 percent of the variance of the different parameter estimates is caused by variance in the algorithm, as opposed to estimation variance introduced by variations in the data.

6.3 Simple Univariate Model

The simple univariate model was effectively fitted to the data via PSAEM, reaching useful results within minutes. The estimated one day VaR(0.05) appears to correctly predict the 5 percent quantiles of the data. The only stock whose likelihood ratios were significantly different from the theorized behavior is Holmen, whose indicator sequence shows statistically significant autocorrelation. Additionally, it is clear that an uncorrelated approach for the log returns of the portfolio is not feasible, greatly underestimating VaR and therefore scoring poorly on the likelihood ratio tests (see Table 4).

The estimated driving noise η̂ appears to diverge somewhat from the assumptions made in the model. The normality plot in Figure 7 shows that the noise behaves similarly to a Gaussian distribution up to approximately two standard deviations from the mean, but that both tails of the distribution are heavier, indicating that the Gaussian distribution could be an insufficient choice of distribution.

6.4 Correlated Univariate Model

The correlated model supports the theory that the log returns and changes in volatility of a stock tend to be negatively correlated, with highly significant correlation parameter estimates for all modeled stocks (see Table 5). This model also generally outperforms its uncorrelated counterpart in terms of VaR estimation. The indicator sequence of the correlated model behaved closer to the theorized properties for most of the stocks (see Table 6), indicating a more suitable model. Most importantly, this model does not produce results that show statistically significant deviations for Holmen, as the simpler model did. The first portfolio approach still fails to produce satisfactory results, but the second approach does, improving the LR_VaR value to 0.6371, compared to 3.3744 in the previous model.

The correlated model has the same issue as the simple model in that the tails of the estimated driving noise η̂ fit a Gaussian distribution quite poorly, with both the upper and the lower tail of the distribution appearing too heavy for the Gaussian assumption.

6.5 Multivariate Model

The multivariate model performed moderately well, passing the VaR likelihood ratio tests at confidence level 0.95 for most stocks as well as for the portfolio. It did however produce unsatisfactory results for the VaR of Tele2, failing the independence test at the 95 percent confidence level (see Table 8). In general, the multivariate model performed worse than its univariate counterparts, even though it performed adequately. This could stem from numerical issues caused by the very large number of parameters in the multivariate model (230), as well as from the surface of the quasi likelihood likely being difficult to optimize over due to the parameterization of the correlation matrix.

The multivariate model appears to have captured the tail behaviour of the driving noise more accurately, producing an estimated driving noise η̂ whose upper tail appears to closely follow a Gaussian distribution. However, the lower tail of the fitted distribution is still too light, as in the previous models (Figure 13).

In addition to producing worse VaR estimates than the univariate models, the multivariate model is also considerably more computationally heavy. The running time of the algorithm was not analysed in depth in this thesis, and neither were ways to optimize the number of iterations, particles, or hyperparameters needed for fast convergence. In spite of this, it is clear that the multivariate model is up to an order of magnitude slower than running the univariate models for each asset. This is mostly due to the high complexity of the optimization of the quasi likelihood.

The estimated correlation matrix Σ̂_η of the multivariate model, as well as the more crudely estimated versions associated with the univariate models, support the idea that the returns of assets within the same industry show higher correlation than those from different industries. For example, each model shows high correlation between Boliden, Lundin, and SSAB, which are all part of the mining sector. Tele2 and Telia, which are both part of the telecom industry, have relatively low correlation with all stocks except each other. Additionally, Swedish Match, the only analysed asset that is part of the tobacco industry, has comparably low correlation with all other stocks.

6.6 Conclusion

The PSAEM algorithm appears to be a solid choice for fitting some state space models, producing accurate results in a relatively short time span for simple models, while struggling with models where the optimization is more complex. All models were competent at estimating the one day VaR(0.05) of individual stocks as well as of a portfolio of stocks. Of these, the correlated univariate model is recommended most highly, as it was significantly faster than the multivariate model while producing quite accurate VaR estimates in terms of conditional coverage.

6.7 Future Studies

The PSAEM algorithm appears to fit stochastic volatility models to financial data effectively. However, no formal optimization of the hyperparameters or comparisons to different Monte Carlo methods were made in this thesis. Therefore a study in which the performance of the PSAEM algorithm is optimized, or compared to algorithms such as standard Monte Carlo EM or particle marginal Metropolis Hastings, would be of great interest.

Based on the normality plots of the estimated driving noises of the applied models (Figures 7, 10 and 13), it is clear that the Gaussian distribution does not have enough density in the tails to model the driving noise in the log returns. This means that although all models performed fairly well when evaluated using the one day VaR(0.05), they are likely to underestimate Value at Risk at lower probabilities, as well as to produce suboptimal estimates for risk measures which take the entire lower tail of the distribution into consideration. Therefore a suitable continuation of the presented models would be to introduce a distribution which is similar to the Gaussian distribution close to its center, but which has heavier tails.

A common choice of distribution with these properties is the Student's t distribution. However, as it is not part of an exponential family, it could be difficult to make use of the property of the PSAEM algorithm given by (11), implying that such a model could be problematic to implement using PSAEM. Another choice could be the generalized Gaussian distribution, given by the density function

$$f(x) = \frac{\gamma}{2\sigma\Gamma(1/\gamma)}\, e^{-(|x-\mu|/\sigma)^\gamma}$$

where Γ(·) is the gamma function, and where µ, σ > 0 and γ > 0 are the location, scale and shape parameters, respectively. This distribution has heavier tails than the Gaussian distribution for γ < 2, and is equal to the Gaussian distribution for γ = 2. However, it is only part of an exponential family for γ equal to an even integer, which means this distribution could also be problematic to implement using PSAEM.

In its current form the PSAEM algorithm is only used to estimate parameters and reconstruct a hidden state for the entire data set at once. A possible extension could therefore be to allow the parameters to change over time by moving the algorithm along the data. For example, at each time point t the algorithm can be run only on observations within some window W_t. After acquiring some initial parameters suitable for t = 0, the algorithm can in each iteration move one time step ahead, using the parameters and reconstructed hidden state from the previous iteration. The recursive definition of the quasi likelihood in the PSAEM algorithm will act as a forgetting factor in the optimization step, producing a sequence of parameters θ_1, ..., θ_T corresponding to each time step in the data.

Algorithm 6: PSAEM - Possible Extension
Input: Initial state x^0_{0:T}, initial parameter θ_0, window function W.
Output: Parameter estimates θ_1, ..., θ_T.
for t = 1, ..., T do
    Run PGAS with x'_{W_t} = {x_{W_{t-1}}, x*_t} and θ = θ_{t-1} to sample x_{W_t}
    Set Q_t(θ) ← (1 − γ_t) Q_{t-1}(θ) + γ_t log(p_θ(y_{W_t}, x_{W_t}))
    Set θ_t ← arg max_θ Q_t(θ)
end

A Appendix

A.1 Exponential Form of the Multivariate Model

The log likelihood of the multivariate model can be written on the exponential form

$$\begin{aligned}
\log(p_\theta(y_{1:T}, x_{0:T})) &= \log(p_\theta(x_0)) + \sum_{t=1}^{T} \log(p_\theta(y_t \mid x_t, x_{t-1})) + \log(p_\theta(x_t \mid x_{t-1})) \\
&\overset{\theta}{\propto} \log(p_\theta(x_0)) + \mathrm{tr}(\phi^1_y(\theta) S^1_{y_{1:T}}) - 2\,\mathrm{tr}(\phi^2_y(\theta) S^2_{y_{1:T}}) - 2\,\mathrm{tr}(\phi^3_y(\theta) S^3_{y_{1:T}}) \\
&\quad + 2\,\mathrm{tr}(\phi^4_y(\theta) S^4_{y_{1:T}}) + 2\, S^5_{y_{1:T}} \phi^5_y(\theta) + \mathrm{tr}(\phi^6_y(\theta) S^6_{y_{1:T}}) + 2\,\mathrm{tr}(\phi^7_y(\theta) S^7_{y_{1:T}}) \\
&\quad - 2\,\mathrm{tr}(\phi^8_y(\theta) S^8_{y_{1:T}}) - 2\, S^9_{y_{1:T}} \phi^9_y(\theta) + \mathrm{tr}(\phi^{10}_y(\theta) S^{10}_{y_{1:T}}) - 2\,\mathrm{tr}(\phi^{11}_y(\theta) S^{11}_{y_{1:T}}) \\
&\quad - 2\, S^{12}_{y_{1:T}} \phi^{12}_y(\theta) + \mathrm{tr}(\phi^{13}_y(\theta) S^{13}_{y_{1:T}}) + 2\, S^{14}_{y_{1:T}} \phi^{14}_y(\theta) - T \psi_y(\theta) \\
&\quad + \mathrm{tr}(\phi^1_x(\theta) S^1_{x_{1:T}}) - 2\,\mathrm{tr}(\phi^2_x(\theta) S^2_{x_{1:T}}) - 2\, S^3_{x_{1:T}} \phi^3_x(\theta) \\
&\quad + \mathrm{tr}(\phi^4_x(\theta) S^4_{x_{1:T}}) + 2\, S^5_{x_{1:T}} \phi^5_x(\theta) - T \psi_x(\theta)
\end{aligned}$$

where the functions φ_y, φ_x, ψ_y and ψ_x are given by

$$\begin{aligned}
\psi_y(\theta) &= -m^\top \Sigma_x^{-1} \Sigma_{yx}^\top \Sigma_{y|x}^{-1} \Sigma_{yx} \Sigma_x^{-1} m - \log|\Sigma_{y|x}| \\
\psi_x(\theta) &= -\log|\Sigma_x| - m^\top \Sigma_x^{-1} m \\
\phi_x(\theta) &= \left\{ \Sigma_x^{-1},\; \Sigma_x^{-1} A,\; \Sigma_x^{-1} m,\; A \Sigma_x^{-1} A,\; A \Sigma_x^{-1} m \right\} \\
\phi_y(\theta) &= \left\{ \Sigma_{y|x}^{-1},\; \Sigma_{y|x}^{-1} \Lambda_\mu,\; \Sigma_{y|x}^{-1} \Sigma_{yx} \Sigma_x^{-1},\; \Sigma_{y|x}^{-1} \Sigma_{yx} \Sigma_x^{-1} A,\; \Sigma_{y|x}^{-1} \Sigma_{yx} \Sigma_x^{-1} m,\; \Lambda_\mu \Sigma_{y|x}^{-1} \Lambda_\mu, \right. \\
&\qquad \Lambda_\mu \Sigma_{y|x}^{-1} \Sigma_{yx} \Sigma_x^{-1},\; \Lambda_\mu \Sigma_{y|x}^{-1} \Sigma_{yx} \Sigma_x^{-1} A,\; \Lambda_\mu \Sigma_{y|x}^{-1} \Sigma_{yx} \Sigma_x^{-1} m,\; \Sigma_x^{-1} \Sigma_{yx}^\top \Sigma_{y|x}^{-1} \Sigma_{yx} \Sigma_x^{-1}, \\
&\qquad \Sigma_x^{-1} \Sigma_{yx}^\top \Sigma_{y|x}^{-1} \Sigma_{yx} \Sigma_x^{-1} A,\; \Sigma_x^{-1} \Sigma_{yx}^\top \Sigma_{y|x}^{-1} \Sigma_{yx} \Sigma_x^{-1} m, \\
&\qquad \left. A \Sigma_x^{-1} \Sigma_{yx}^\top \Sigma_{y|x}^{-1} \Sigma_{yx} \Sigma_x^{-1} A,\; A \Sigma_x^{-1} \Sigma_{yx}^\top \Sigma_{y|x}^{-1} \Sigma_{yx} \Sigma_x^{-1} m \right\}
\end{aligned}$$

where $\Sigma_{y|x} = \Sigma_y - \Sigma_{yx} \Sigma_x^{-1} \Sigma_{yx}^\top$, $\Sigma_y = \Sigma_\eta$, $\Sigma_x = \sigma^\top \Sigma_\varepsilon \sigma$, $\Sigma_{yx} = \Sigma_{\eta\varepsilon}\sigma$, and $\Lambda_\mu$, $A$ are diagonal matrices with elements $\mu_i$ and $a_i$ as in (15), respectively.

The statistics $S_{x_{1:T}}$, $S_{y_{1:T}}$ are given by

$$S^i_{\bullet_{1:T}} = \sum_{t=1}^{T} S^i_{\bullet_t}$$

where

$$\begin{aligned}
S_{x_t} &= \left\{ x_t x_t^\top,\; x_{t-1} x_t^\top,\; x_t^\top,\; x_{t-1} x_{t-1}^\top,\; x_{t-1}^\top \right\} \\
S_{\bar y_t} &= \left\{ \bar y_t \bar y_t^\top,\; \Lambda_{x_{t-1}} \bar y_t^\top,\; x_t \bar y_t^\top,\; x_{t-1} \bar y_t^\top,\; \bar y_t^\top,\; \Lambda_{x_{t-1}} \Lambda_{x_{t-1}}^\top,\; x_t \Lambda_{x_{t-1}}^\top,\; x_{t-1} \Lambda_{x_{t-1}}^\top, \right. \\
&\qquad \left. \Lambda_{x_{t-1}}^\top,\; x_t x_t^\top,\; x_{t-1} x_t^\top,\; x_t^\top,\; x_{t-1} x_{t-1}^\top,\; x_{t-1}^\top \right\}
\end{aligned}$$

where $\Lambda_{x_{t-1}}$ is a diagonal matrix with elements $e^{-x^i_{t-1}/2}$ and $\bar y^i_t = e^{-x^i_{t-1}/2}\left(y^i_t + e^{x^i_{t-1}}/2\right)$.

A.2 Figures

Figure 15: Negative VaR(0.05) from the simple univariate model of the log returns of the nine stocks not shown in the results (blue), with their corresponding returns above (green) and below (red) the quantiles.

Figure 16: Normality plots of the nine remaining driving noises η̂ generated by the simple univariate model.

Figure 17: Negative VaR(0.05) from the correlated univariate model of the log returns of the nine stocks not shown in the results (blue), with their corresponding returns above (green) and below (red) the quantiles.

Figure 18: Normality plots of the nine remaining driving noises η̂ generated by the correlated univariate model.

Figure 19: Negative VaR(0.05) from the multivariate model of the log returns of the nine stocks not shown in the results (blue), with their corresponding returns above (green) and below (red) the quantiles.

Figure 20: Normality plots of the nine remaining driving noises η̂ generated by the multivariate model.

A.3 Full Parameter Estimates of the Multivariate Model

Figure 21: Visualization of the full 20 × 20 correlation matrix of the driving noise in the multivariate model.

39 1.000 .4418 .5742 .4764 .4513 .5583 .5065 .2866 .4382 .4635 .4418 1.000 .4592 .4907 .3516 .4383 .5257 .2476 .3425 .3739   .5742 .4592 1.000 .4606 .6562 .6516 .5047 .2254 .3919 .4059   .4764 .4907 .4606 1.000 .3433 .4729 .6061 .2534 .3601 .3951   ˆ .4513 .3516 .6562 .3433 1.000 .5314 .3898 .1527 .2937 .2984 Ση =   .5583 .4383 .6516 .4729 .5314 1.000 .5060 .1873 .3621 .3872   .5065 .5257 .5047 .6061 .3898 .5060 1.000 .2350 .3714 .4034   .2866 .2476 .2254 .2534 .1527 .1873 .2350 1.000 .2671 .3207   .4382 .3425 .3919 .3601 .2937 .3621 .3714 .2671 1.000 .5642 .4635 .3739 .4059 .3951 .2984 .3872 .4034 .3207 .5642 1.000

1.000 .5876 .7307 .6851 .6078 .6782 .6870 .5269 .7259 .7690 .5876 1.000 .5606 .5568 .4623 .4026 .6450 .3627 .5620 .5176   .7307 .5606 1.000 .6252 .6891 .6981 .6316 .5033 .6243 .6194   .6851 .5568 .6252 1.000 .4952 .5228 .6007 .4563 .6111 .5633   ˆ .6078 .4623 .6891 .4952 1.000 .5764 .4983 .3783 .4991 .5121 Σε =   .6782 .4026 .6981 .5228 .5764 1.000 .5555 .4226 .5680 .6116   .6870 .6450 .6316 .6007 .4983 .5555 1.000 .5106 .5822 .5773   .5269 .3627 .5033 .4563 .3783 .4226 .5106 1.000 .5486 .4552   .7259 .5620 .6243 .6111 .4991 .5680 .5822 .5486 1.000 .7656 .7690 .5176 .6194 .5633 .5121 .6116 .5773 .4552 .7656 1.000

−.0526 −.0401 −.0497 −.0549 −.1802 −.0109 .0020 −.1391 −.0691 −.0985 −.0817 −.0849 −.0717 −.0807 −.1351 −.0445 .0009 −.0501 −.0939 −.1034   −.0855 −.0378 −.0550 −.0463 −.0988 −.0166 .0011 −.1315 −.0652 −.0495   −.0939 −.0422 −.0789 −.0723 −.1372 −.0191 −.0163 −.0729 −.0909 −.1156   ˆ −.1099 −.0281 −.0727 −.0516 −.2143 −.0257 −.0460 −.1117 −.1307 −.1460 Σηε =   −.0645 −.0205 −.0635 −.0294 −.1287 −.0615 −.0192 −.0883 −.0457 −.0678   −.0879 −.0641 −.0617 −.0963 −.2074 −.0523 −.0246 −.0775 −.0729 −.1248   −.0515 −.0230 −.0248 −.1193 −.0611 −.0406 −.0330 −.1997 −.0799 −.0399   −.0596 −.0394 −.0116 −.0146 −.0748 −.0206 −.0029 −.0868 −.1217 −.0559 −.0575 −.0619 −.1040 −.0825 −.1513 −.0677 −.0968 −.1312 −.0962 −.1371

40 7 References

[1] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov Chain Monte Carlo Methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269–342, 2010.

[2] J. Bae, C. J. Kim, and C. R. Nelson. Why are Stock Returns and Volatility Negatively Correlated? Journal of Empirical Finance, 14:41–58, 2007.

[3] N. Chopin and S. S. Singh. On Particle Gibbs Sampling. Bernoulli, 21(3):1855–1883, 2015.

[4] P. F. Christoffersen. Evaluating Interval Forecasts. International Economic Review, 39(4):841–862, 1998.

[5] R. Cont. Volatility Clustering in Financial Markets: Empirical Facts and Agent-Based Models. Long Memory in Economics, 1, 05 2005.

[6] B. Delyon, M. Lavielle, and E. Moulines. Convergence of a Stochastic Approximation Version of the EM Algorithm. The Annals of Statistics, 27:94–128, 1999.

[7] P. Jorion. Value at Risk - The New Benchmark for Managing Financial Risk. 2007.

[8] A. Lindholm and F. Lindsten. Learning Dynamical Systems with Particle Stochastic Approximation EM. 2019.

[9] F. Lindsten, M. I. Jordan, and T. B. Schön. Particle Gibbs with Ancestor Sampling. Journal of Machine Learning Research, 15:2145–2184, 2014.

[10] Nasdaq. Nasdaq Nordic shares, 2020. http://www.nasdaqomxnordic.com/shares, ac- cessed on 2020-01-23.

[11] F. Rapisarda, D. Brigo, and F. Mercurio. Parameterizing Correlations: A Geometric Interpretation. IMA Journal of Management Mathematics, 18, 2006.

[12] C. F. J. Wu. On the Convergence Properties of the EM Algorithm. Annals of Statistics, 11(1):95–103, 1983.


Master’s Theses in Mathematical Sciences 2020:E62
ISSN 1404-6342
LUNFMS-3095-2020
Mathematical Statistics
Centre for Mathematical Sciences
Lund University
Box 118, SE-221 00 Lund
http://www.maths.lth.se/