Technische Universiteit Delft
Faculteit Elektrotechniek, Wiskunde en Informatica
Delft Institute of Applied Mathematics

Estimating the Expected Shortfall using Extreme Value Theory

Report submitted to the Delft Institute of Applied Mathematics in partial fulfilment of the requirements

for the degree of

BACHELOR OF SCIENCE in APPLIED MATHEMATICS (TECHNISCHE WISKUNDE)

by

Stan Hermanus Antoine Tendijck

Delft, the Netherlands, June 28, 2016

Copyright © 2016 by Stan Tendijck. All rights reserved.

BSc report APPLIED MATHEMATICS (TECHNISCHE WISKUNDE)

“Estimating the Expected Shortfall using Extreme Value Theory”

Stan Hermanus Antoine Tendijck

Technische Universiteit Delft

Supervisor

Dr. J.-J. Cai

Other committee members

Dr. ir. F.H. van der Meulen
Dr. K.P. Hart
Dr. ir. M. Keijzer

June 28, 2016, Delft

CONTENTS

Abstract
Preface
List of abbreviations and symbols
1 Introduction
  1.1 The expected shortfall
  1.2 Extreme value theory
2 Convergence of the expected shortfall
3 Estimation of the expected shortfall
4 Simulations and applications
  4.1 The Hill estimator
  4.2 Simulations
  4.3 Applications
5 Further studies
  5.1 Asymptotic normality
  5.2 Improving the estimator for large γ
6 Proofs
  6.1 Probability theory
  6.2 Convergence of the expected shortfall
  6.3 Further studies
A Analysis
B Regular variation
C Computations
D Simulations
  D.1 Generalized Pareto distribution with negative parameter ξ
  D.2 Generalized Pareto distribution with non-negative parameter ξ
  D.3 Conclusions
References


ABSTRACT

The behavior of the conditional expectation of $X$ given that $X$ exceeds the quantile $Q_p$, as $p$ increases to 1, is analyzed. This conditional expectation is known as the expected shortfall, a quantity frequently used in, for example, risk management. It is assumed that $X \in D(G_\gamma)$ with $\gamma < 1$.¹ From this assumption, the following limit relation is derived using extreme value theory.

$$\lim_{p \uparrow 1} \frac{ES(p)}{Q_p} = \lim_{p \uparrow 1} \frac{E\{X \mid X > Q_p\}}{Q_p} = \frac{1}{1 - \gamma_+}$$

This relation leads to the following two estimators for $p$ close to 1:

$$\widehat{ES}(p) = \frac{\hat{Q}_p}{1 - \hat{\gamma}_+}$$

and

$$\widetilde{ES}(p) = \frac{\hat{Q}_p}{X_{n-k,n}} \cdot \frac{1}{k} \sum_{i=0}^{k-1} X_{n-i,n},$$

with $\hat{Q}_p$ and $\hat{\gamma}_+$ consistent estimators for the $p$th quantile $Q_p$ and the extreme value index $\gamma_+$, respectively. Estimators satisfying this can be found in de Haan and Ferreira [2, pp. 69-116, 133-140]. In Chapter 4 and Appendix D, the estimators are tested, and it is stated which estimators for $\gamma_+$ and $Q_p$ give the best estimate of the expected shortfall. The estimators are eventually put into practice; this can also be found in Chapter 4.

¹This is defined under Definition 2.


PREFACE

If there is a huge storm coming and it is known that the water level will rise above some threshold, or that it will be the worst storm in 30 years, then what can we expect to happen? Will this expectation exceed the height of the dike or not? If you want to know the answer to this question, then you should keep reading, since this is one of the practical applications of this report. In this report, it will be shown how you can solve these types of questions to some extent. In the end, you should be able to compute it yourself.

Many other similar questions can be formulated as well, such as what we can expect to lose if the stock market crashes, or how much damage there will be in case of internal fraud in a small company. These types of questions can easily be answered with the empirical estimator if the threshold is not too high. However, the threshold will usually be too high when the most extreme cases play a significant role in answering the question. Thus, the focus of this report is on solving problems where this is the case, so that it is possible to conclude something sensible about extreme events when the amount of data is relatively small. For example, 1877 severe storms have been recorded in Delfzijl over a period of more than 100 years, de Haan and Ferreira [2], but the government wants to preclude a flood at all times. So how high should the government build the dikes such that a flood will only occur with probability 1/10,000 every year?

These types of questions can all be formulated as: what to expect when things go really bad. It is the primary goal of this report to solve problems like this. To do this, one needs a theory to extrapolate the available observations. A possibility is to use extreme value theory. This approach will be used to construct an estimator that answers such a question in a formal way.

The word bad is actually not always the best word to describe the event. For example, if the question is to estimate how old you are going to be, conditioned on the fact that you are very old, it is quite an inappropriate thing to say. However, it will still be called bad; this should simply be understood as an event crossing a threshold.

Stan Tendijck Delft, June 2016


LIST OF ABBREVIATIONS AND SYMBOLS

This chapter is a summary of all the notation that is frequently used throughout this report.

$\stackrel{D}{=}$  equality in distribution
$\stackrel{D}{\longrightarrow}$  convergence in distribution
$\stackrel{P}{\longrightarrow}$  convergence in probability
$\stackrel{a.s.}{\longrightarrow}$  almost sure convergence
$\stackrel{L^p}{\longrightarrow}$  convergence in $p$th mean, $p \geq 1$
$a(t) \sim b(t)$  $\lim_{t \to \infty} a(t)/b(t) = 1$
$\gamma$  the extreme value index
$\rho$  the second order parameter
$\mathbf{1}\{A\}$  indicator function: equals 1 if $x \in A$ and 0 otherwise
$a_+$  $\max\{a, 0\}$
a.s.  almost surely
a.e.  almost everywhere
$D(G_\gamma)$  the domain of attraction of $G_\gamma$
$E\{X\}$  the expectation of the random variable $X$
$f^{\leftarrow}$  the (usually left-continuous) inverse function of $f$
$F$  distribution function (usually of the random variable $X$)
$F_n$  right-continuous empirical distribution function
$G_\gamma$  extreme value distribution function
GPD  generalized Pareto distribution
i.i.d.  independent and identically distributed
$\mathbb{R}_+$  $[0, \infty)$
$Q_{1-p}$  $(1-p)$th quantile, i.e. $1 - p = F(Q_{1-p})$
$Q_{n,1-p}$  empirical $(1-p)$th quantile based on $n$ observations
$\hat{Q}_{1-p}$  estimator for the $(1-p)$th quantile other than the empirical one (usually based on $n$ observations)
$RV_\alpha$  regularly varying with index $\alpha$
$\Pi(a)$  extended regularly varying class with auxiliary function $a(t)$
$X_i, X$  random variables
$U$  (usually left-continuous) inverse of $1/(1-F)$
$x^*$  $\sup\{x \mid F(x) < 1\} = U(\infty)$

1 INTRODUCTION

1.1. THE EXPECTED SHORTFALL
What is the expected shortfall and why is there a need to estimate it? Well, the expected shortfall is a tool to measure risk, and risk measures play quite an important role in risk management. Besides, the expected shortfall is not just a common risk measure. It is a spectral risk measure, which is a particular case of being coherent. The exact definition of a spectral risk measure can be found anywhere on the internet, but the most important thing is that it can be used intuitively, and therefore risk managers frequently utilize the expected shortfall. So, how is it defined?

Definition 1. Let $p$ be a probability; then the expected shortfall of a random variable $X$ is defined as the conditional expectation of $X$ given $X > Q_p$, i.e.

$$E\{X \mid X > Q_p\} = \frac{1}{1-p}\, E\{X \mathbf{1}\{X > Q_p\}\} = \frac{1}{1-p} \int_{Q_p}^{\infty} x f_X(x)\,dx,$$

where $Q_p$ is the $p$th quantile of the distribution. It is denoted by

$$ES(p) := E\{X \mid X > Q_p\}.$$

This conditional expectation is thus the quantity that will play a prominent role in this report. It will be estimated for probabilities $p$ very close to 1 ($p < 1$), i.e. $1 - p = O(1/n)$ with $n$ the number of observations. Note that in this regime the empirical estimators are useless, so we continue with extreme value theory.
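For a distribution with a known tail, Definition 1 can be checked directly by simulation. Below is a minimal Python sketch (the setup and names are our own, not part of this report's code), using the standard exponential distribution, where memorylessness gives the closed form $E\{X \mid X > t\} = t + 1$:

    import numpy as np

    rng = np.random.default_rng(seed=1)
    n, p = 10**6, 0.99

    x = rng.exponential(size=n)        # standard exponential sample
    q_p = -np.log(1 - p)               # exact p-th quantile of Exp(1)

    es_empirical = x[x > q_p].mean()   # empirical E{X | X > Q_p}
    es_exact = q_p + 1.0               # memorylessness: E{X | X > t} = t + 1
    print(es_empirical, es_exact)

For a $p$ this moderate the empirical estimator still works; the point of this report is precisely the regime $1 - p = O(1/n)$, where the mask x > q_p would select almost no observations.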

1.2. EXTREME VALUE THEORY
Let's start by explaining a few basics of extreme value theory, accompanied by some theorems. This whole section is a summary of the first chapter of de Haan and Ferreira [2], but it only discusses the part that is needed for this report. The proofs of the theorems that are listed can be found in [2].


What is extreme value theory? Extreme value theory is the study of the behavior of the maximum of random variables. To do this properly, one normalizes the maximum in some way comparable to the central limit theorem. Then one classifies the types of limit distributions that can occur. From that point on, the theory can be extended in many directions, a few of which will be discussed. The following definition and proposition formalize the basics of extreme value theory.

Definition 2. If for a sequence of i.i.d. random variables $\{X_i\}$ there exist sequences of constants $a_n > 0$ and $b_n$ real such that

$$\frac{\max(X_1, X_2, \ldots, X_n) - b_n}{a_n}$$

has a non-degenerate limit distribution as $n \to \infty$, i.e.

$$\lim_{n \to \infty} F^n(a_n x + b_n) = G(x),$$

say, for every continuity point $x$ of $G(x)$, with $G$ a non-degenerate distribution function, then $G$ is an extreme value distribution. Notation:

$$F \in D(G).$$

However, usually $X \in D(G)$ will be written, where $X$ has the same distribution as $X_i$.

Proposition 1 (Fisher and Tippett (1928), Gnedenko (1943)). The class of extreme value distributions is $G_\gamma(ax + b)$ with $a > 0$, $b$ real, where

$$G_\gamma(x) = \exp\left(-(1 + \gamma x)^{-1/\gamma}\right), \quad 1 + \gamma x > 0,$$

with $\gamma$ real and where for $\gamma = 0$ the right-hand side is interpreted as $\exp(-\exp(-x))$.

This leads to the following statements.
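Before these statements, Proposition 1 can be observed numerically. The following sketch (ours, with assumed constants) uses standard exponential samples, for which $a_n = 1$ and $b_n = \log n$ are known to work and the limit is the Gumbel distribution $G_0$:

    import numpy as np

    rng = np.random.default_rng(seed=2)
    n, m = 1000, 5000                      # block size and number of blocks

    # block maxima of Exp(1) samples, normalized with a_n = 1, b_n = log(n)
    z = rng.exponential(size=(m, n)).max(axis=1) - np.log(n)

    # compare the empirical CDF of the maxima with G_0(x) = exp(-exp(-x))
    for v in (-1.0, 0.0, 1.0, 2.0):
        print(v, (z <= v).mean(), np.exp(-np.exp(-v)))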

Proposition 2. For $\gamma \in \mathbb{R}$ the following statements are equivalent:

1. There exist constants $a_n > 0$ and $b_n$ real such that

$$\lim_{n \to \infty} F^n(a_n x + b_n) = G_\gamma(x) = \exp\left(-(1 + \gamma x)^{-1/\gamma}\right),$$

for all $x$ with $1 + \gamma x > 0$.

2. There is a positive function $a$ such that for $x > 0$,

$$\lim_{t \to \infty} \frac{U(tx) - U(t)}{a(t)} = \frac{x^\gamma - 1}{\gamma} =: D_\gamma(x),$$

where for $\gamma = 0$ the right-hand side is interpreted as $\log x$, which is the limit of the right-hand side as $\gamma \to 0$.

3. There is a positive function $a$ such that

$$\lim_{t \to \infty} t\left(1 - F(a(t)x + U(t))\right) = (1 + \gamma x)^{-1/\gamma},$$

for all $x$ with $1 + \gamma x > 0$.

4. There exists a positive function $f$ such that

$$\lim_{t \uparrow x^*} \frac{1 - F(t + x f(t))}{1 - F(t)} = (1 + \gamma x)^{-1/\gamma}$$

for all $x$ for which $1 + \gamma x > 0$, where $x^* = \sup\{x \mid F(x) < 1\}$.

Moreover, the equation in 1. holds with $b_n := U(n)$ and $a_n := a(n)$. Also, the equation in 4. holds with $f(t) = a(1/(1 - F(t)))$.

Proposition 3. The distribution function $F \in D(G_\gamma)$ if and only if there exist positive functions $c$ and $f$, $f$ continuous, such that for all $t \in (t_0, x^*)$, $t_0 < x^*$,

$$1 - F(t) = c(t) \exp\left(-\int_{t_0}^{t} \frac{ds}{f(s)}\right)$$

with $\lim_{t \uparrow x^*} c(t) = c \in (0, \infty)$ and

$$\begin{cases} \lim_{t \to \infty} f(t)/t = \gamma, & \gamma > 0, \\ \lim_{t \uparrow x^*} f(t)/(x^* - t) = -\gamma, & \gamma < 0, \\ \lim_{t \uparrow x^*} f'(t) = 0 \text{ and } \lim_{t \uparrow x^*} f(t) = 0 \text{ if } x^* < \infty, & \gamma = 0. \end{cases} \tag{1.1}$$

Moreover, the function $f$ is asymptotically the same as the function $f$ in Proposition 2.(4).

Note that from Proposition 2.(4) the following is immediate.

Corollary 1. Let $X$ be a random variable and suppose $X \in D(G_\gamma)$. Then there exists a positive function $f(t)$ for which

$$Y_t := \left(\frac{X - t}{f(t)} \,\Big|\, X > t\right) \stackrel{D}{\longrightarrow} Y \quad (\text{as } t \to x^*)$$

with $Y \sim GPD(\gamma)$. This function $f$ satisfies the limit relations described in Proposition 3.

Enough foreknowledge of extreme value theory has been introduced for now. With this foreknowledge, the limit behavior of the expected shortfall in the tail will be derived.
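Corollary 1 can also be illustrated numerically; a small sketch under our own choices (Exp(1) input, for which $f(t) \equiv 1$ and the GPD(0) limit is again a standard exponential):

    import numpy as np

    rng = np.random.default_rng(seed=3)
    x = rng.exponential(size=10**7)    # Exp(1) lies in D(G_0), with f(t) = 1

    t = 8.0                            # a high threshold
    exc = x[x > t] - t                 # the exceedances (X - t | X > t)

    # for gamma = 0 the GPD limit is Exp(1); compare survival probabilities
    for y in (0.5, 1.0, 2.0):
        print(y, (exc > y).mean(), np.exp(-y))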

2 CONVERGENCE OF THE EXPECTED SHORTFALL

In the preface, it has been said that the goal is to determine what to expect when things go really bad. In mathematical terms, the goal is to estimate the expected shortfall, i.e.

$$ES(1-p) := E\{X \mid X > Q_{1-p}\},$$

for $p$ close to 0. To study this quantity well, it is simplified first. Define $x^*$ as the endpoint of the distribution of $X$, which may be infinite. The goal is then to estimate

$$E\{X \mid X > t\}$$

for large $t$, i.e. $t$ close to $x^*$ ($t < x^*$). In order to make useful approximations for large $t$, it is convenient to compute the limit as $t$ increases to $x^*$. Since $x^*$ will equal infinity in many cases, this limit will tend to infinity as well. We therefore construct a normalized version, from which we can derive at what rate the quantity tends to infinity, if it does. A division by $t$ is exactly the normalization we are looking for. The details are worked out in Chapter 6, and the results are stated in the next theorem and corollary. The proof of the theorem can also be found in Chapter 6; in this chapter, only a sketch is given.

Theorem 1. Suppose $X \in D(G_\gamma)$ with $\gamma < 1$. Then

$$\lim_{t \uparrow x^*} \frac{E\{X \mid X > t\}}{t} = \frac{1}{1 - \gamma_+},$$

where $\gamma_+ = \max\{\gamma, 0\}$.

Corollary 2. Suppose $X \in D(G_\gamma)$ with $\gamma < 1$. Then

$$\lim_{p \downarrow 0} \frac{ES(1-p)}{Q_{1-p}} = \frac{1}{1 - \gamma_+}$$

where $Q_{1-p}$ is the $(1-p)$th quantile of the distribution of $X$ and where $ES$ is the expected shortfall.
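As a numerical check of Theorem 1 (our own sketch, not part of the report): for the strict Pareto distribution with $1 - F(x) = x^{-2}$, $x \geq 1$, we have $\gamma = 1/2$, so $E\{X \mid X > t\}/t$ should tend to $1/(1 - \gamma_+) = 2$:

    import numpy as np

    rng = np.random.default_rng(seed=4)
    x = rng.pareto(2.0, size=10**7) + 1.0   # Pareto: 1 - F(x) = x**-2

    # the ratio E{X | X > t} / t should approach 2 as t grows
    for t in (5.0, 20.0, 80.0):
        print(t, x[x > t].mean() / t)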

The proof of the theorem relies on two important aspects. The first aspect is one of the most important results of extreme value theory,

$$\left(\frac{X - t}{f(t)} \,\Big|\, X > t\right) \stackrel{D}{\longrightarrow} Y \sim GPD(\gamma),$$

and the second one is that there exists a $T > 0$ such that the set of random variables

$$\left\{\left(\frac{X - t}{f(t)} \,\Big|\, X > t\right) : t \geq T\right\} \tag{2.1}$$

is uniformly integrable. It is proved that these two together imply convergence of the expectations, i.e.

$$E\left\{\frac{X - t}{f(t)} \,\Big|\, X > t\right\} \to E\{Y\} = \frac{1}{1 - \gamma}.$$

From that point on, the result is straightforward. The most difficult part is proving that the set defined in (2.1) is uniformly integrable. To show this, regular variation of functions and extended regular variation are introduced. With this theory, some integrable bounds are found, such that uniform integrability follows. For the details, see Chapter 6 and Appendix B.

3 ESTIMATION OF THE EXPECTED SHORTFALL

Before we can finally answer the question stated at the beginning, we need information about estimating the parameter $\gamma$ and the quantile $Q_{1-p}$. There already exist quite a few estimators for $\gamma$ [2, pp. 69-116]. The following three estimators for $\gamma$ will be used in this report.

1. The moment estimator ($\gamma \in \mathbb{R}$),
2. The probability-weighted moment estimator ($\gamma < 1$), and
3. The negative Hill estimator ($\gamma < -0.5$).

All of these estimators converge in probability to the true $\gamma$ under the right conditions. The next question is how to estimate the quantile $Q_{1-p}$. For that purpose, we need to assume that $p$ depends on $n$, i.e. $p = p_n$ (but we will write only $p$), that $1 \geq p \downarrow 0$ as $n \to \infty$, and lastly that $p = O(1/n)$. So, we need to extrapolate outside the range of observations. In this case, the quantile $Q_{1-p}$ cannot be estimated empirically as $n \to \infty$. Fortunately, this quantile can be estimated using extreme value theory¹. In short, there are multiple estimators for the quantile, depending on $\gamma$. In this report, only the following type of estimator will be used:

$$\hat{Q}_{1-p} = X_{n-k,n} + \hat{\sigma}\, \frac{\left(\frac{k}{np_n}\right)^{\hat{\gamma}} - 1}{\hat{\gamma}}.$$

This estimator is useful for all $\gamma$, and it is therefore an excellent one to use. So, it will be combined with each of the three estimators for $\gamma$ mentioned earlier. If we fill in the moment estimators for $\gamma$ and $\sigma$, we have the moment estimator for $Q$. Next, if we fill in the probability-weighted moment estimators for $\gamma$ and $\sigma$, we have the probability-weighted moment estimator for $Q$.

¹All the details can be found in de Haan and Ferreira (2006, pp. 133-140).


Finally, if $\gamma < 0$, we can fill in the negative Hill estimator for $\gamma$ and the moment estimator for $\sigma$; the resulting estimator for $Q$ will be called the negative Hill estimator. Recall that $p$ depends on $n$, so the limit $n \to \infty$ implies $p \downarrow 0$. Since $n \to \infty$ is the more general formulation, it will be used instead of $p \downarrow 0$. Corollary 4 leads to the first estimator, which is defined in the next theorem.
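Before stating it, here is a minimal Python sketch of the quantile estimator above with the moment estimators for $\hat{\gamma}$ and $\hat{\sigma}$ filled in. The precise form of the scale estimator follows one common version from de Haan and Ferreira; treat the details, and the function name, as assumptions of this illustration rather than as the implementation used for the simulations in this report:

    import numpy as np

    def moment_quantile_estimator(x, k, p):
        # Moment estimator for gamma and the quantile estimator above.
        # Assumes positive observations; at gamma_hat == 0 the last factor
        # should be replaced by its limit log(k / (n * p)).
        xs = np.sort(x)
        n = len(xs)
        x_nk = xs[n - k - 1]                         # X_{n-k,n}
        logs = np.log(xs[n - k:]) - np.log(x_nk)     # log-excesses of top k
        m1, m2 = logs.mean(), (logs**2).mean()

        gamma_minus = 1.0 - 0.5 / (1.0 - m1**2 / m2)
        gamma_hat = m1 + gamma_minus                 # moment estimator
        sigma_hat = x_nk * m1 * (1.0 - gamma_minus)  # scale estimator

        q_hat = x_nk + sigma_hat * ((k / (n * p))**gamma_hat - 1.0) / gamma_hat
        return gamma_hat, q_hat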

Theorem 2 (Method 1). Suppose $X, X_1, X_2, X_3, \ldots$ i.i.d. and $X \in D(G_\gamma)$ with $\gamma < 1$. Define

$$\widehat{ES}(1-p) := \frac{\hat{Q}_{1-p}}{1 - \hat{\gamma}_+}. \tag{3.1}$$

Suppose that the conditions of Theorem 4.3.1 in de Haan and Ferreira (2006) are satisfied; then we have defined a consistent estimator, i.e.

$$\frac{ES(1-p)}{\widehat{ES}(1-p)} \stackrel{P}{\longrightarrow} 1 \quad (\text{as } n \to \infty)$$

if we use consistent estimators for $\gamma_+$ and for the quantile $Q_{1-p}$.

Remark 1. Note that the method depends on the sign of $\gamma$. So if one is using this estimation method, the sign of $\gamma$ should first be determined before continuing. It will be assumed that determining the sign of $\gamma$ poses no problem. In other words, it is assumed that there exists a (combined) method for which $\hat{\gamma}_+ \stackrel{P}{\longrightarrow} \gamma_+$ holds.

Proof. It has already been noted that such estimators exist, and the result follows using Corollary 4:

$$\frac{ES(1-p)}{\widehat{ES}(1-p)} = \frac{(1 - \hat{\gamma}_+)\, ES(1-p)}{\hat{Q}_{1-p}} = \frac{(1 - \gamma_+)\, ES(1-p)}{Q_{1-p}} \cdot \frac{Q_{1-p}}{\hat{Q}_{1-p}} \cdot \frac{1 - \hat{\gamma}_+}{1 - \gamma_+} \stackrel{P}{\longrightarrow} 1 \quad (\text{as } n \to \infty).$$
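In code, Method 1 is a one-liner on top of the earlier sketch (again our toy version, with the same caveats):

    def es_method1(x, k, p):
        # Method 1, eq. (3.1): ES-hat = Q-hat / (1 - max(gamma-hat, 0)),
        # reusing the moment_quantile_estimator sketch from above.
        gamma_hat, q_hat = moment_quantile_estimator(x, k, p)
        return q_hat / (1.0 - max(gamma_hat, 0.0))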

The second estimator for the expected shortfall is a consequence of the first one. Suppose $X, X_1, X_2, X_3, \ldots$ i.i.d. and $X \in D(G_\gamma)$ with $\gamma < 1$. Then the second estimator is defined as

$$\widetilde{ES}(1-p) := \frac{\hat{Q}_{1-p}}{X_{n-k,n}} \cdot \frac{1}{k} \sum_{i=1}^{n} X_i \mathbf{1}\{X_i > X_{n-k,n}\}, \tag{3.2}$$

where

$$X_{1,n} \leq X_{2,n} \leq \cdots \leq X_{n,n}$$

are the order statistics and where $\hat{Q}_{1-p}$ is a consistent estimator for $Q_{1-p}$. Moreover, if the conditions of Theorem 4.3.1 in de Haan and Ferreira [2] are satisfied and if for some sequence $k = k(n) \to \infty$, $1 \geq k(n)/n \downarrow 0$ holds, then this estimator is intuitively correct, i.e. $\widetilde{ES}(1-p) \approx ES(1-p)$. The idea of this claim is given below.

First of all, note that one has

$$\sqrt{k}\; \frac{X_{n-k,n} - Q_{1-\frac{k}{n}}}{a\left(\frac{n}{k}\right)} \stackrel{D}{\longrightarrow} Z \sim N(0,1). \quad \text{([2, p. 50])}$$

So this validates the approximation $X_{n-k,n} \approx Q_{1-\frac{k}{n}}$. Next, the following limit relations hold by Corollary 4.

$$\lim_{n \to \infty} \frac{ES\left(1 - \frac{k}{n}\right)}{Q_{1-\frac{k}{n}}} = \frac{1}{1 - \gamma_+} = \lim_{n \to \infty} \frac{ES(1-p)}{Q_{1-p}}.$$

This leads to

$$\lim_{n \to \infty} \frac{ES(1-p)}{ES\left(1 - \frac{k}{n}\right)} \cdot \frac{Q_{1-\frac{k}{n}}}{Q_{1-p}} = 1.$$

Also we have

$$ES\left(1 - \tfrac{k}{n}\right) = E\left\{X \,\Big|\, X > Q_{1-\frac{k}{n}}\right\} \approx \frac{1}{k} \sum_{i=1}^{n} X_i \mathbf{1}\{X_i > Q_{1-\frac{k}{n}}\} \approx \frac{1}{k} \sum_{i=1}^{n} X_i \mathbf{1}\{X_i > X_{n-k,n}\}.$$

The first approximation is its empirical counterpart. Since $k = k(n) \to \infty$ and since the expectation is finite, this is a proper estimate. The second one is intuitively correct because of the earlier noted approximation $X_{n-k,n} \approx Q_{1-\frac{k}{n}}$.

It follows that $\widetilde{ES}(1-p) \approx ES(1-p)$ for small $p$ (or big $n$), because

$$\frac{ES(1-p)}{\widetilde{ES}(1-p)} = \frac{ES\left(1-\frac{k}{n}\right)}{\frac{1}{k}\sum_{i=1}^{n} X_i \mathbf{1}\{X_i > X_{n-k,n}\}} \cdot \frac{ES(1-p)}{ES\left(1-\frac{k}{n}\right)} \cdot \frac{Q_{1-\frac{k}{n}}}{Q_{1-p}} \cdot \frac{X_{n-k,n}}{Q_{1-\frac{k}{n}}} \cdot \frac{Q_{1-p}}{\hat{Q}_{1-p}} \approx 1.$$

Remark 2. This estimator can be an advantage over the first one if there is a lot of uncertainty in the estimation of $\gamma$, and especially if $\gamma$ is close to 1.
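A matching sketch of the second estimator (same assumptions as before):

    import numpy as np

    def es_method2(x, k, p):
        # Method 2, eq. (3.2): the mean of the k largest observations,
        # rescaled by Q-hat / X_{n-k,n}.
        xs = np.sort(x)
        n = len(xs)
        x_nk = xs[n - k - 1]           # X_{n-k,n}
        top_mean = xs[n - k:].mean()   # (1/k) * sum of X_i above X_{n-k,n}
        _, q_hat = moment_quantile_estimator(x, k, p)
        return q_hat / x_nk * top_mean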

Concluding, this chapter gave us two estimators for the expected shortfall: $\widehat{ES}$ in (3.1) and $\widetilde{ES}$ in (3.2). The first is consistent, and the second one should also work provided some conditions are satisfied. It remains to check the performance of the estimators when the sample comes from a known distribution. In the next chapter, this will be done by simulations.

4 SIMULATIONS AND APPLICATIONS

4.1. THE HILL ESTIMATOR
If you have acquired more foreknowledge about extreme value theory, you are probably wondering why the Hill estimator for $\gamma$ isn't mentioned in the previous chapter, since the Hill estimator is, I think, the most famous estimator for the tail index, and, next to that, one of the best estimators for $\gamma$. The answer is actually kind of interesting.

The Hill estimator is a consistent estimator for the extreme value index. So, in theory, it gives a good estimate for $\gamma$. However, you should note that this is due to the misleading concept of asymptotic behavior. The realistic sample size that we will take in this chapter is $n = 1000$, and this is simply not big enough for $\gamma < 1$. The estimates for $\gamma$ are then not favorable, and the estimation of the expected shortfall will fail. If a sample size of $n = 10{,}000$ or even bigger were used, then the Hill estimator would have worked significantly better than it did for $n = 1000$, and it could be worth it to take the Hill estimator into account.

If you are a big fan of the Hill estimator and you want to use it in your practical situation, then the first thing you need to do is count your number of observations. Next, you run a simulation with a comparable extreme value index and a sample size equal to the same number of observations you counted. If the number of measurements appears to be big enough for a good estimate, then I encourage you to use the Hill estimator; otherwise, just estimate with the estimators I discussed, since these also work fine.
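For reference, the Hill estimator itself is only a few lines (a sketch of ours; it assumes positive observations and $\gamma > 0$):

    import numpy as np

    def hill_estimator(x, k):
        # Hill estimator: the mean log-excess of the top k order statistics
        # over X_{n-k,n}.
        xs = np.sort(x)
        n = len(xs)
        return np.mean(np.log(xs[n - k:]) - np.log(xs[n - k - 1]))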

4.2. SIMULATIONS
Here some simulation results of the estimators introduced in Chapter 3 will be shown. These simulations are based on a known distribution with sample size $n = 1000$ and probability $p = 10^{-4}$, which is just a little less than $1/(n \log n)$. The simulations will be independently replicated $r = 5000$ times for different values of $k$. Every simulation leads to a value for the expected shortfall, and these values are averaged in the end to show how the estimator behaves given the value of $k$.


Type of distribution: $ES(1-p)$
- Uniform: $(2-p)/2$;
- Normal: $(p\sqrt{2\pi})^{-1} \exp\left(-Q^2/2\right)$, where the quantile $Q$ solves $\int_Q^\infty \exp\left(-x^2/2\right) dx = p\sqrt{2\pi}$;
- Generalized Pareto: $p^{-\gamma}\left(\gamma(1-\gamma)\right)^{-1} - \gamma^{-1}$ ($\gamma \neq 0$); $\log(e/p)$ ($\gamma = 0$);
- Student t (3 d.o.f.): $\frac{\sqrt{3}}{p\pi}\left(1 + Q^2/3\right)^{-1}$, where the quantile $Q$ solves $\int_Q^\infty \left(1 + x^2/3\right)^{-2} dx = \frac{p\pi\sqrt{3}}{2}$.

Table 4.1: The expected shortfall. For the computations, see Appendix C.

Next to that, the variability of the estimators is classified via the normalized mean squared errors of the estimator, i.e.

$$\frac{1}{r}\sum_{i=1}^{r}\left(\frac{\widehat{ES}_{i,k}}{ES} - 1\right)^2 \quad \text{or} \quad \frac{1}{r}\sum_{i=1}^{r}\left(\frac{\widetilde{ES}_{i,k}}{ES} - 1\right)^2.$$

For the estimator of the expected shortfall, we need to choose compatible estimators for the quantile and the extreme value index. It is convenient to choose the same estimator for the extreme value index as used in estimating the quantile, leaving us with a moment estimator, a probability-weighted moment estimator, and a negative Hill estimator for the expected shortfall. Since it is known from which distribution the sample comes, the real expected shortfall can easily be computed, see Table 4.1. The actual value will then be compared to the estimated one. We will run the simulations for the following distributions:

1. The uniform distribution on $[0,1]$, where $\gamma = -1$;
2. The standard normal distribution, where $\gamma = 0$;
3. The generalized Pareto distribution with $\xi = \gamma = 0.2$;
4. The Student t-distribution with 3 degrees of freedom, where $\gamma = 1/3$.

Before we give the results, the notation in the figures should be explained. We write Moment for the moment estimator, PWM for the probability-weighted moment estimator, and NegHill for the negative Hill estimator. These three types of estimators depend on the value of $\gamma$, which is known in these cases. They particularly depend on the sign of $\gamma$, which has to be determined first in a practical situation. If $\gamma$ is estimated to be close to 0, one can always use the moment and the probability-weighted moment estimators for estimating the expected shortfall, since these should work in all cases. For the results concerning the distributions mentioned above, see Figures 4.1-4.4. More results are discussed in Appendix D.
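The simulation loop itself can be sketched as follows, reusing the es_method1 sketch from Chapter 3. The GPD case with $\xi = 0.2$ is shown, and $r$ is reduced from 5000 to keep the example fast:

    import numpy as np

    rng = np.random.default_rng(seed=5)
    gamma, n, p = 0.2, 1000, 1e-4

    def gpd_sampler(n):
        # GPD(gamma) by inverse transform: Q(u) = ((1-u)**-gamma - 1) / gamma
        return ((1 - rng.uniform(size=n))**(-gamma) - 1) / gamma

    es_true = p**(-gamma) / (gamma * (1 - gamma)) - 1 / gamma   # Table 4.1

    def normalized_mse(k, r=200):
        errs = [es_method1(gpd_sampler(n), k, p) / es_true - 1 for _ in range(r)]
        return np.mean(np.square(errs))

    for k in (50, 100, 200):
        print(k, normalized_mse(k))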


Figure 4.1: Uniform distribution: estimates of the real quantity, which is 0.99995 (this is indicated by the horizontal line), with the corresponding mean squared errors. The upper two figures show the estimator from (3.1) and the lower two the one from (3.2).

Figure 4.2: Standard normal distribution: estimates of the real quantity, which is approximately 3.9585 (this is indicated by the horizontal line), with the corresponding mean squared errors. The upper two figures show the estimator from (3.1) and the lower two the one from (3.2).


Figure 4.3: Generalized Pareto distribution with $\xi = 0.2$: estimates of the real quantity, which is approximately 34.4348 (this is indicated by the horizontal line), with the corresponding mean squared errors. The upper two figures show the estimator from (3.1) and the lower two the one from (3.2).

Figure 4.4: Student t-distribution with 3 degrees of freedom: estimates of the real quantity, which is approximately 33.3461 (this is indicated by the horizontal line), with the corresponding mean squared errors. The upper two figures show the estimator from (3.1) and the lower two the one from (3.2).

The mean squared errors and the averages don't give the full picture: knowing only these values, the shape of the distribution of the estimator can merely be guessed. Luckily, the shape of the distribution can be approximated by the simulations. An approximation of the distribution can be found in Figures 4.5-4.8, where the value $k = 100$ has been chosen. The Shapiro-Wilk test is applied to the errors to check the normality assumption. The highest p-value of all the tests we did is approximately $10^{-16}$, as a result of the occurrence of some outliers. So the distribution of the errors, and thus of the estimator, is not normal.


Figure 4.5: Uniform distribution: kernel density plot of the estimates of the quantity, which is 0.99995 (this is indicated by the horizontal line). The figure on the right shows the performance of the estimator from (3.1) and the figure on the left shows the performance of the estimator from (3.2).

Figure 4.6: Normal distribution: kernel density plot of the estimates of the quantity, which is 3.9585 (this is indicated by the horizontal line). The figure on the right shows the performance of the estimator from (3.1) and the figure on the left shows the performance of the estimator from (3.2).

4.3. APPLICATIONS
Let's now apply all the discussed methods to the real world. Suppose that you buy the S&P 500 stock and you hold it for 30 years. If you want to know what you can expect to happen to the value of the stock on the worst day in these 30 years, then you have come to the right place, since this is the question that will be answered. If you have a similar problem, then you can just follow the steps I have taken.

Figure 4.7: Generalized Pareto distribution with $\xi = 0.2$: kernel density plot of the estimates of the quantity, which is 34.4348 (this is indicated by the horizontal line). The figure on the right shows the performance of the estimator from (3.1) and the figure on the left shows the performance of the estimator from (3.2).

Figure 4.8: Student t-distribution with 3 degrees of freedom: kernel density plot of the estimates of the quan- tity, which is 33.3461 (this is indicated by the horizontal line). The figure on the right shows the performance of the estimator from (3.1) and the figure on the left shows the performance of the estimator from (3.2).

To answer the question, the daily closing prices over a 16-year period, from January 1st, 2000 until December 31st, 2015, have been downloaded. You might be wondering why I didn't download the biggest dataset I could get. Well, next to providing an acceptable estimate, the data that the estimate is based on should be relevant for the next period. In the case of stocks, it is always difficult to determine such a term, since mistakes in the past are less likely to happen in the future. In the event of floods, it is also difficult, since the climate is changing quite rapidly. So, when using historical data, you should always consider to what extent it is relevant. For the sake of illustration, I have chosen a 16-year period here. The log returns of these daily closing prices are computed, and the theory will be applied to these log returns. It should be noted that the log returns are not independent. However, weak dependence is a common assumption on the log returns of daily closing prices. The results of the previous theorems can then, under these assumptions, still be used. The probability of once in every 30 years is

$$p = \frac{1}{30 \cdot 365} \approx 9.13 \cdot 10^{-5}.$$

Thus, to answer this question, $ES\left(1 - 9.13 \cdot 10^{-5}\right)$ needs to be determined. Before this can be done, a rough estimate of $\gamma$ has to be made, since the estimators depend on the sign and value of $\gamma$. By using the general estimators for $\gamma$ (the moment estimator and the probability-weighted moment estimator), it follows that $\gamma$ is approximately 0.2. So it makes sense to look at the moment estimator and the probability-weighted moment estimator. The results can be found in Figure 4.9. As in the simulations we have done, we observe similar behavior of the estimators for the expected shortfall. From the simulations, we know that the best option is to use the probability-weighted moment estimator with $k$ between 100 and 300. Applying this result to the data gives rise to multiple types of estimates. We can, for example, average all of them:

$$\widehat{ES}_1(1-p) = 0.1661,$$

or use $k = 200$:

$$\widehat{ES}_2(1-p) = 0.1583.$$

These estimates correspond to a decline in the value of the stock of around 18% and 17%, respectively.
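The preprocessing behind these numbers is short. A sketch with a hypothetical price array standing in for the actual downloaded series (the real computation uses the full 16 years of closing prices):

    import numpy as np

    # hypothetical closing prices standing in for the S&P 500 series
    close = np.array([100.0, 101.2, 99.8, 100.5, 97.3])

    log_returns = np.log(close[1:] / close[:-1])
    losses = -log_returns              # the estimators are applied to the losses

    p = 1 / (30 * 365)                 # once-in-30-years probability, as above
    # es_hat = es_method1(losses, k=200, p=p)   # with the real series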

Figure 4.9: Expected loss in log returns of the worst day in 30 years.

A single estimate for a quantity is nice, but it would be great if we could construct some confidence intervals. I have done the following. I have taken $k = 100, 200, 300$ and simulated data from the generalized Pareto distribution with parameter $\xi = 0.2$. I took a sample size of $n = 1000$, and I repeated this $r = 10{,}000$ times. Remember that for positive $\gamma$ we had four estimators. So we end up with 10,000 times four estimates of the expected shortfall. Next, we divide the estimates by the real value and subtract 1,

$$\frac{\widehat{ES}}{ES} - 1,$$

to get a distribution centered around 0. Then we compute from these simulated distributions the 2.5% quantile and the 97.5% quantile to get a 95% confidence interval. A simple computation then leads to the following: with 95% certainty we have that

$$ES \in \left[\frac{\widehat{ES}}{\hat{Q}_{0.975} + 1},\ \frac{\widehat{ES}}{\hat{Q}_{0.025} + 1}\right].$$

This method is the basis of the construction of the 95% confidence interval. The results are shown in Figure 4.10.
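A sketch of this construction, built on the earlier helper functions (our own rendering of the procedure described above):

    import numpy as np

    def es_confidence_interval(es_hat, sampler, es_true, k,
                               n=1000, p=1e-4, r=2000):
        # simulate the relative error ES-hat/ES - 1 on a known distribution,
        # take its 2.5% and 97.5% quantiles, and invert around the estimate
        errs = np.array([es_method1(sampler(n), k, p) / es_true - 1.0
                         for _ in range(r)])
        lo, hi = np.quantile(errs, [0.025, 0.975])
        return es_hat / (1.0 + hi), es_hat / (1.0 + lo)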

Furthermore, the illustration corresponding to the value $k = 300$ is the most satisfying compared to the others, since the confidence intervals are relatively small. If we use this one with the information we had on the performance of the estimators, we would conclude that with 95% certainty the real expected shortfall is in the interval

$$ES \in (0.07272,\ 0.30038).$$

This range corresponds to an expected loss between 7.5% and 35.0%, where in this case the estimate would be a loss of 16.0%.


Figure 4.10: 95% confidence intervals of the estimators. The dots are the estimates, and the lines represent the confidence intervals. The figure on the top left corresponds to the value $k = 100$, the one on the top right to the value $k = 200$, and the one in the middle to the value $k = 300$.

5 FURTHER STUDIES

What could we do next? Two ways of improving or supplementing this report are stated in the next two short sections.

5.1. ASYMPTOTIC NORMALITY
One obvious addition is to prove asymptotic normality of the error of the estimator. This property is quite useful, since then we can construct mathematically justified confidence intervals.

The only flaw in proving asymptotic normality is that the error of the estimation in the simulation was not normally distributed. This was checked by applying the Shapiro-Wilk test. In a further study, one could determine the lowest number of observations for which the normality assumption is verified, and prove the actual asymptotic normality.

5.2. IMPROVING THE ESTIMATOR FOR LARGE γ
It can be seen in Appendix D that the estimators do not work when $\gamma$ is close to 1. What can we do in that case?

One possibility is to take the logarithm of the observations. It can be shown that if $X \in D(G_\gamma)$ with $\gamma \geq 0$, then $\log X \in D(G_0)$; for the proof, see Chapter 6. Then it is possible to estimate the expected shortfall of the logarithm of $X$, which leads to a lower bound for the expected shortfall by Jensen's inequality. A simulation could check how well this works; this is left to a further study or the reader. If the simulations show that this method of estimation works, then a lower bound has been found. This lower bound could then be used, for example, as a first estimate of the minimal actions that should be taken.
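The suggested procedure would be only a few lines on top of the earlier sketches, e.g.:

    import numpy as np

    def es_lower_bound(x, k, p):
        # estimate the expected shortfall of log X (a gamma = 0 problem) and
        # exponentiate; by Jensen's inequality this is a lower bound for the
        # expected shortfall of X itself (sketch; assumes positive data)
        return np.exp(es_method1(np.log(x), k, p))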


6 PROOFS

Before we can prove the relevant results, our foreknowledge of probability theory should be adequate. Let's start with some probability theory.

6.1. PROBABILITY THEORY
This part is far from complete, since to define probability measures one should understand a large part of real analysis, which exceeds the scope of this report. However, some basic and advanced concepts of probability theory will still be introduced. First of all, we need to get through some general knowledge about expectations. What follows is based on Grimmett and Welsh [3].

Definition 3. If $X$ is a continuous random variable, the expectation of $X$ is denoted by $E\{X\}$ and defined by

$$E\{X\} = \int_{-\infty}^{\infty} x \, dF(x)$$

where $F$ is the distribution function of $X$.

This definition leads to the following proposition.

Proposition 4. If $X$ is a continuous random variable, then the expectation can be written as

$$E\{X\} = \int_{-\infty}^{\infty} x f(x)\,dx$$

where $f$ is the density function of $X$.

For non-negative random variables, the definition could even be simplified further.

Proposition 5. If $X$ is a continuous non-negative random variable, then the expectation can be written as

$$E\{X\} = \int_0^{\infty} (1 - F(x))\,dx$$

whenever this integral exists.


This proposition can be proved quite easily with integration by parts. Another primary aspect of probability theory is the convergence of random variables. We know four types of convergence: almost sure convergence, convergence in $L^p$, convergence in probability, and convergence in distribution. I'll state the definitions from Jacod and Protter [4].

Definition 4. We say that a sequence of random variables $\{X_n\}_{n \geq 1}$ converges almost surely to a random variable $X$ if

$$P\left(\left\{\omega \,\middle|\, \lim_{n \to \infty} X_n(\omega) \neq X(\omega)\right\}\right) = 0.$$

Notation: $X_n \stackrel{a.s.}{\longrightarrow} X$.

Definition 5. A sequence of random variables $\{X_n\}_{n \geq 1}$ converges in $L^p$ to $X$ (where $1 \leq p < \infty$) if $E\{|X_n|^p\}, E\{|X|^p\} < \infty$ and

$$\lim_{n \to \infty} E\{|X_n - X|^p\} = 0.$$

Notation: $X_n \stackrel{L^p}{\longrightarrow} X$.

Definition 6. A sequence of random variables $\{X_n\}_{n \geq 1}$ converges in probability to $X$ if for any $\varepsilon > 0$ we have

$$\lim_{n \to \infty} P(|X_n - X| > \varepsilon) = 0.$$

Notation: $X_n \stackrel{P}{\longrightarrow} X$.

Definition 7. Let $\mu_n$ and $\mu$ be probability measures on $\mathbb{R}^d$ ($d \geq 1$). The sequence $\mu_n$ converges weakly to $\mu$ if $\int f(x)\,d\mu_n(x)$ converges to $\int f(x)\,d\mu(x)$ for each $f$ which is real-valued, continuous, and bounded on $\mathbb{R}^d$.

Definition 8. A sequence of $\mathbb{R}^d$-valued random variables $\{X_n\}_{n \geq 1}$ converges in distribution to $X$ if the distribution measures $P_{X_n}$ converge weakly to $P_X$. Notation: $X_n \stackrel{D}{\longrightarrow} X$.

These definitions are not equivalent. However, they are all related, as described in the next proposition.

Proposition 6. Suppose we have a sequence of random variables $\{X_n\}_{n \geq 1}$. Then the following hold:

1. If $X_n \stackrel{L^p}{\longrightarrow} X$, then $X_n \stackrel{P}{\longrightarrow} X$;

2. If $X_n \stackrel{a.s.}{\longrightarrow} X$, then $X_n \stackrel{P}{\longrightarrow} X$;

3. If $X_n \stackrel{P}{\longrightarrow} X$, then there exists a subsequence $n_k$ such that $X_{n_k} \stackrel{a.s.}{\longrightarrow} X$;

4. If $X_n \stackrel{P}{\longrightarrow} X$ and $|X_n| \leq Y$ for all $n$ with $E\{|Y|^p\} < \infty$, then $E\{|X|^p\} < \infty$ and $X_n \stackrel{L^p}{\longrightarrow} X$;

5. If $X_n \stackrel{P}{\longrightarrow} X$, then $X_n \stackrel{D}{\longrightarrow} X$;

6. If $X_n \stackrel{D}{\longrightarrow} X$ and $X = c$ a.s., then $X_n \stackrel{P}{\longrightarrow} X$.

It is now possible to formulate some relevant theorems for this report. First, we have two quite general ones, which can also be applied to expectations (as integrals).

Theorem 3 (Monotone convergence theorem). Let $(X, \Sigma, \mu)$ be a measure space. Assume that the sequence $\{f_n\}$ of integrable functions satisfies $f_n \leq f_{n+1}$ almost everywhere for each $n$ and $\lim \int f_n \, d\mu < \infty$. Then there exists an integrable function $f$ such that $f_n \uparrow f$ almost everywhere, and hence $\int f_n \, d\mu \uparrow \int f \, d\mu$ holds.

Theorem 4 (Lebesgue’s dominated convergence theorem). Let {fn} be a sequence of inte- grable functions on a measure space (S,Σ,µ) satisfying f g almost everywhere for all n | n| ≤ and some fixed integrable function g. If f f a.e., then f defines an integrable function n → and Z Z Z lim fn dµ lim fn dµ f dµ n n →∞ = →∞ = Proofs of these theorems can be found in Jacod and Protter [4] as well. The next 6 proposition is also quite an important one.

Proposition 7. Suppose we have two sequences of random variables $\{X_n\}_{n \geq 1}$, $\{Y_n\}_{n \geq 1}$ which satisfy

$$X_n \stackrel{P}{\longrightarrow} X, \quad Y_n \stackrel{P}{\longrightarrow} Y$$

for some random variables $X$ and $Y$. Then

$$X_n Y_n \stackrel{P}{\longrightarrow} XY.$$

This proposition is not listed in [4], and the proof is left to the reader. Now, two more theorems will be formulated, but for that purpose, it is necessary to define uniform integrability, see Billingsley [1].

Definition 9. A class $C$ of random variables is called uniformly integrable if, given $\varepsilon > 0$, there exists $K \in [0, \infty)$ such that

$$\sup_{X \in C} E\{|X| \mathbf{1}\{|X| \geq K\}\} \leq \varepsilon,$$

where $\mathbf{1}\{|X| \geq K\}$ is the indicator function.

With that defined, the next two theorems can be formulated (these are grouped together since they will be useful in the same proof). The first theorem shows how to get from convergence in distribution to an almost sure type of convergence, [1].

Theorem 5 (Skorokhod’s representation theorem). Let µ , n N be a sequence of prob- n ∈ ability measures on a metric space S such that µn converges weakly to some probability measure µ on S as n . Then there exists on the Lebesgue interval random elements X → ∞ n and X which have respective laws µ and µ and satisfy lim X (ω) X (ω) for each ω, in n n n = other words X X almost surely. n → The next theorem provides us a way to derive from almost sure convergence conver- gence in mean, Rudin [5, p. 133].

Theorem 6 (Vitali convergence theorem). Let $(\Omega, \mathcal{F}, P)$ be a probability space. If

• $\{X_n\}$ is uniformly integrable;
• $X_n \stackrel{a.s.}{\longrightarrow} X$;
• $|X(\omega)| < \infty$ a.e.,

then the following hold:

• $E\{|X|\} < \infty$;
• $X_n \stackrel{L^1}{\longrightarrow} X$, i.e. $\lim_{n \to \infty} E\{|X_n - X|\} = 0$. In particular, $\lim_{n \to \infty} E\{X_n\} = E\{X\}$.

6.2. CONVERGENCE OF THE EXPECTED SHORTFALL
In the original part of the text, we normalized the expected shortfall by dividing the quantity by $t$. In this part, where we prove the results, we will use another normalization, for which the assumption that $X \in D(G_\gamma)$ with $\gamma < 1$ is necessary.

Choose a positive function $f(t)$ such that the relationship in Proposition 2.(4) holds, i.e.

$$\lim_{t \uparrow x^*} \frac{1 - F(t + x f(t))}{1 - F(t)} = (1 + \gamma x)^{-1/\gamma}. \tag{6.1}$$

The quantity will be normalized as

$$E\left\{\frac{X - t}{f(t)} \,\Big|\, X > t\right\},$$

where $f$ satisfies (6.1). We write $Y_t = ((X - t)/f(t) \mid X > t)$ to simplify notation for further purposes. Eventually, we will be able to compute the limit of $E\{Y_t\}$ as $t$ increases to $x^*$.

Before we get to the formal part of the proof, it is nice to get a global (informal) overview first. By Corollary 1 we already know that $Y_t \stackrel{D}{\longrightarrow} Y$ where $Y \sim GPD(\gamma)$. From this, we want to derive that the expectation of $Y_t$ also converges to the expectation of $Y$, which equals $1/(1 - \gamma)$. Then we relate this back to the expected shortfall. We will see that the expected shortfall divided by the quantile converges to $1/(1 - \gamma_+)$, which is the result we want. However, formalizing the fact that the expectation of $Y_t$ converges to the expectation of $Y$ is not trivial at all. For that purpose, we will prove that the random variable $Y_t$ satisfies the conditions of the next two statements, so that we may apply them to $Y_t$, from which the result follows.

Theorem 7. Suppose we have a collection of positive random variables:

$$\{Z_i \mid i \in I\}.$$

Suppose also that $E\{Z_i^{1+\varepsilon}\} < C$ for all $i \in I$ and some $\varepsilon > 0$, $C \in \mathbb{R}$. Then $\{Z_i \mid i \in I\}$ is uniformly integrable.

Proof. Note that for any $M > 0$ we may write

$$E\{Z_i^{1+\varepsilon}\} = E\{Z_i^{1+\varepsilon} \mathbf{1}\{Z_i \leq M\}\} + E\{Z_i^{1+\varepsilon} \mathbf{1}\{Z_i > M\}\}.$$

Since the random variable $Z_i^{1+\varepsilon} \mathbf{1}\{Z_i \leq M\}$ monotonically increases to $Z_i^{1+\varepsilon}$ as $M$ goes to infinity, we have by Lebesgue's monotone convergence theorem that

$$E\{Z_i^{1+\varepsilon} \mathbf{1}\{Z_i \leq M\}\} \to E\{Z_i^{1+\varepsilon}\}.$$

Thus it follows that

$$\lim_{M \to \infty} E\{Z_i^{1+\varepsilon} \mathbf{1}\{Z_i > M\}\} = 0,$$

and in particular $Z_i^{1+\varepsilon} \mathbf{1}\{Z_i > M\}$ is measurable. Without loss of generality, assume that $M > 1$; writing the expectation as an integral yields

$$E\{Z_i \mathbf{1}\{Z_i > M\}\} = \int_M^{\infty} z f_{Z_i}(z)\,dz = \int_M^{\infty} z^{-\varepsilon} z^{1+\varepsilon} f_{Z_i}(z)\,dz,$$

where $f_{Z_i}$ is the density function of $Z_i$.

Choose $\delta > 0$ arbitrarily and choose $M$ such that $z^{-\varepsilon} < \delta/C$ for $z > M$. Then it follows that

$$E\{Z_i \mathbf{1}\{Z_i > M\}\} \leq \frac{\delta}{C} \int_M^{\infty} z^{1+\varepsilon} f_{Z_i}(z)\,dz = \frac{\delta}{C}\, E\{Z_i^{1+\varepsilon} \mathbf{1}\{Z_i > M\}\} \leq \frac{\delta}{C} \cdot C = \delta$$

for all $i$. Since the sequence $E\{Z_i \mathbf{1}\{Z_i > n\}\}$ is decreasing, it follows that for all $\delta > 0$ and $m \geq M$

$$\sup_i E\{Z_i \mathbf{1}\{Z_i > m\}\} \leq \delta,$$

which implies uniform integrability.

Note that in general one cannot conclude from $Z_i \stackrel{D}{\longrightarrow} Z$ and $\{Z_i\}$ uniformly integrable that $Z_i \stackrel{L^1}{\longrightarrow} Z$, but we can say that $E\{Z_i\} \to E\{Z\}$, which is formalized in the next theorem.

Theorem 8. Suppose $Z_i \stackrel{D}{\longrightarrow} Z$ and $\{Z_i : i \geq C\}$ is uniformly integrable for some $C \in \mathbb{R}$. Then $E\{Z_i\} \to E\{Z\}$.

Proof. Let $\mu_i$ be the law of $Z_i$ and $\mu$ the law of $Z$. Then, by definition, $\mu_i$ converges to $\mu$ weakly. Now we can apply Skorokhod's representation theorem to find a probability space $(\Omega, \mathcal{F}, P)$ and random variables $W$ and $W_i$, $i \in \mathbb{N}$, with $\mathcal{L}(W_i) = \mu_i$ and $\mathcal{L}(W) = \mu$, such that $W_i \stackrel{a.s.}{\longrightarrow} W$ in the sense of the new probability space. Note that $E\{W_i\} = E\{Z_i\}$ and $E\{W\} = E\{Z\}$, since expectations depend only on the laws. It is also allowed to apply the Vitali convergence theorem to get convergence in mean, i.e. $E\{W_i\} \to E\{W\}$. Combining all results completes the proof:

$$E\{Z_i\} = E\{W_i\} \to E\{W\} = E\{Z\}.$$

Now to get back to where we originally wanted to go: if we can show that for the collection $\{Y_t : t \geq T\}$, for some $T$, we have $E\{Y_t^{1+\varepsilon}\} \leq C$ for all $t \geq T$ and for some $\varepsilon > 0$, $C \in \mathbb{R}$, then we have convergence of the expectations. To show this fact, we have to split everything up into three cases: $0 < \gamma < 1$, $\gamma < 0$, and $\gamma = 0$. Every case is worked out in detail below.

To prove the case where $0 < \gamma < 1$, we first need the following lemma.

Lemma 1. Suppose $X \in D(G_\gamma)$ with $0 < \gamma < 1$ and choose $f$ and $Y_t$ as in Corollary 1. Define

$$Z_t := \left(\frac{X - t}{t} \,\Big|\, X > t\right);$$

then there exist $\varepsilon > 0$, $C > 0$, $T > 0$ such that $E\{Y_t^{1+\varepsilon}\} \leq C$ for all $t \geq T$ iff there exist $\delta > 0$, $D > 0$, $T > 0$ such that $E\{Z_t^{1+\delta}\} \leq D$ for all $t \geq T$.

Proof. It follows immediately since $t/f(t)$ is bounded, see Proposition 3 and Corollary 1:

$$E\{Y_t^{1+\varepsilon}\} = E\left\{\left(\frac{X - t}{f(t)}\right)^{1+\varepsilon} \Big|\, X > t\right\} = \left(\frac{t}{f(t)}\right)^{1+\varepsilon} E\left\{\left(\frac{X - t}{t}\right)^{1+\varepsilon} \Big|\, X > t\right\} = \left(\frac{t}{f(t)}\right)^{1+\varepsilon} E\{Z_t^{1+\varepsilon}\}.$$

Theorem 9. Suppose $X \in D(G_\gamma)$ with $0 < \gamma < 1$ and choose $f$ and $Y_t$ as in Corollary 1. Then there exist an $\varepsilon > 0$, a $C > 0$, and a $T > 0$ such that $E\{Y_t^{1+\varepsilon}\} \leq C$ for all $t \geq T$.

Proof. Let's take a look at the random variable

$$Z_t := \left(\frac{X - t}{t} \,\Big|\, X > t\right).$$

By Lemma 1 it is sufficient to show that there exist an $\varepsilon > 0$, a $C > 0$, and a $T > 0$ such that $E\{Z_t^{1+\varepsilon}\} \leq C$ for all $t \geq T$.

Choose $0 < \varepsilon < 1/\gamma - 1$ arbitrarily and simplify the expectation of $Z_t^{1+\varepsilon}$:

$$\begin{aligned} E\{Z_t^{1+\varepsilon}\} &= \frac{1}{P(X > t)} \int_0^{\infty} P\left(\left(\frac{X-t}{t}\right)^{1+\varepsilon} > x,\ X > t\right) dx \\ &= \int_0^{1} \frac{1 - F\left(t\left(1 + x^{1/(1+\varepsilon)}\right)\right)}{1 - F(t)}\,dx + \int_1^{\infty} \frac{1 - F\left(t\left(1 + x^{1/(1+\varepsilon)}\right)\right)}{1 - F(t)}\,dx \\ &= \int_0^{1} \frac{1 - F\left(t\left(1 + x^{1/(1+\varepsilon)}\right)\right)}{1 - F(t)}\,dx + \int_2^{\infty} \frac{1 - F(ty)}{1 - F(t)} \cdot (1+\varepsilon)(y-1)^{\varepsilon}\,dy. \end{aligned}$$

Now we want to show that both of these integrals are convergent. For this, we want to apply Lebesgue's dominated convergence theorem to both integrals. For the first integral, it is quite easy to find a bound, since the integrand is bounded by 1. By Lebesgue's dominated convergence theorem it follows that the first integral is bounded by 1.

The second integral is somewhat harder to bound. To find a bound, one should note that $1 - F \in RV_{-1/\gamma}$, see Appendix B and [2, p. 19]. Now it is allowed to apply Proposition 8 to $1 - F$. Note that $-1/\gamma + \varepsilon < -1$. So we can choose $\delta_2$ such that $-1/\gamma + \delta_2 + \varepsilon < -1$, and $\delta_1 = 0.5$ for the sake of illustration. Then this proposition states that there exists a $t_0$ such that for $t > t_0$ and $ty > t_0$:

$$\frac{1 - F(ty)}{1 - F(t)} < \frac{3}{2}\, y^{-1/\gamma} \max\left\{y^{\delta_2}, y^{-\delta_2}\right\}.$$

Since we are only interested in a bound for $y > 2$, it is sufficient that $t > t_0$, where $t_0$ is fixed. Because we take $t \geq T$, where $T$ still needs to be determined, this is no problem; it follows that $t_0$ defines $T$. The second integral can thus be bounded by

$$\int_2^{\infty} \frac{3}{2}\, y^{-1/\gamma} \max\left\{y^{\delta_2}, y^{-\delta_2}\right\} (1+\varepsilon)(y-1)^{\varepsilon}\,dy = \frac{3(1+\varepsilon)}{2} \int_2^{\infty} y^{-1/\gamma + \delta_2}(y-1)^{\varepsilon}\,dy \leq \frac{3(1+\varepsilon)}{2} \int_2^{\infty} y^{-1/\gamma + \delta_2 + \varepsilon}\,dy = \frac{3(1+\varepsilon)}{2} \cdot \frac{2^{-1/\gamma + \delta_2 + \varepsilon + 1}}{1/\gamma - \delta_2 - \varepsilon - 1}.$$

So for all $0 < \varepsilon < 1/\gamma - 1$ it holds that $E\{Z_t^{1+\varepsilon}\}$ is bounded for $t \geq T$. So there exist an $\varepsilon > 0$ and a $C > 0$ such that for all $t \geq T$

$$E\{Y_t^{1+\varepsilon}\} \leq C.$$

Theorem 10. Suppose $X \in D(G_\gamma)$ with $\gamma < 0$ and choose $f$ and $Y_t$ as in Corollary 1. Then there exist an $\varepsilon > 0$, a $C > 0$, and a $T > 0$ such that $E\{Y_t^{1+\varepsilon}\} \leq C$ for all $t \geq T$.

Proof. Since the distribution of $X$ has a finite endpoint $x^*$, we have

$$E\left\{\left(\frac{X - t}{f(t)}\right)^{1+\varepsilon} \Big|\, X > t\right\} \leq E\left\{\left(\frac{x^* - t}{f(t)}\right)^{1+\varepsilon} \Big|\, X > t\right\} = \left(\frac{x^* - t}{f(t)}\right)^{1+\varepsilon}.$$

We know that $(x^* - t)/f(t)$ is convergent (by (1.1), it converges to $-1/\gamma$), thus it is bounded, and thus the quantity to the power $1 + \varepsilon$ is also bounded. It follows that there exist an $\varepsilon > 0$ and a $C > 0$ with $E\{Y_t^{1+\varepsilon}\} \leq C$ for all $t \geq T$.

Theorem 11. Suppose $X \in D(G_0)$ and choose $f$ and $Y_t$ as in Corollary 1. Then there exist an $\varepsilon > 0$, a $C > 0$, and a $T > 0$ such that $E\{Y_t^{1+\varepsilon}\} \leq C$ for all $t \geq T$.

Proof. Since convergent sequences are bounded, it suffices to show that there exists an $\varepsilon > 0$ such that $E\{Y_t^{1+\varepsilon}\}$ converges. Choose $0 < \varepsilon < 1$ arbitrarily; then, as a first step, the limit can be simplified:

$$\begin{aligned} \lim_{t \uparrow x^*} E\{Y_t^{1+\varepsilon}\} &= \lim_{t \uparrow x^*} E\left\{\left(\frac{X-t}{f(t)}\right)^{1+\varepsilon} \Big|\, X > t\right\} \\ &= \lim_{t \uparrow x^*} \frac{1}{1 - F(t)} \int_0^{\infty} P\left(\left(\frac{X-t}{f(t)}\right)^{1+\varepsilon} > x,\ X > t\right) dx \\ &= \lim_{t \uparrow x^*} \int_0^{\infty} \frac{1 - F\left(t + x^{1/(1+\varepsilon)} f(t)\right)}{1 - F(t)}\,dx \\ &= (1+\varepsilon) \lim_{t \uparrow x^*} \int_0^{\infty} x^{\varepsilon}\, \frac{1 - F(t + x f(t))}{1 - F(t)}\,dx \\ &= (1+\varepsilon) \lim_{t \uparrow x^*} \int_0^{1} x^{\varepsilon}\, \frac{1 - F(t + x f(t))}{1 - F(t)}\,dx + (1+\varepsilon) \lim_{t \uparrow x^*} \int_1^{\infty} x^{\varepsilon}\, \frac{1 - F(t + x f(t))}{1 - F(t)}\,dx. \end{aligned}$$

It is easy to see that the integrand of the first integral is bounded by $x^{\varepsilon}$, because the distribution function $F$ satisfies the monotonicity property. Since $x^{\varepsilon}$ is an integrable function over $[0,1]$, it is allowed to apply Lebesgue's dominated convergence theorem. This results in the following limit value of the first integral:

$$(1+\varepsilon)\lim_{t \uparrow x^*} \int_0^{1} x^{\varepsilon}\, \frac{1 - F(t + x f(t))}{1 - F(t)}\,dx = (1+\varepsilon)\int_0^{1} x^{\varepsilon} \lim_{t \uparrow x^*} \frac{1 - F(t + x f(t))}{1 - F(t)}\,dx = (1+\varepsilon)\int_0^{1} x^{\varepsilon} e^{-x}\,dx =: C_\varepsilon < \infty.$$

Let's take a look at the second integral: if it is bounded, we are done. To bound it, I am going to find an integrable bound for the integrand and then apply Lebesgue's dominated convergence theorem. To find this bound, we have to look at the inverses and then relate them back to the original function.

Since the function $U$ (the inverse of $1/(1-F)$) is an element of $\Pi$, Corollary 5 can be applied to $U$. Thus for any $\delta > 0$ there exist $t_0, c > 0$ such that for $t \geq t_0$, $x \geq 1$,

$$\left|\frac{U(tx) - U(t)}{a(t)}\right| \leq c x^{\delta}.$$

Note that it is allowed to replace $t$ by $1/(1-F(t))$, since both tend to infinity as $t \to \infty$, and thus both will be above the threshold $t_0$ provided $t$ is above another threshold $t_1$ (this $t_1$ determines $T$). Thus for any $\delta > 0$ there exist $t_1, c > 0$ such that for $t \geq t_1$, $x \geq 1$,

$$\left|\frac{U\left(\frac{x}{1-F(t)}\right) - t}{a\left(\frac{1}{1-F(t)}\right)}\right| \leq c x^{\delta}.$$

Applying Lemma 4 gives

$$\left|\frac{1 - F(t)}{1 - F\left(t + x a\left(\frac{1}{1-F(t)}\right)\right)}\right| \geq c^{-1/\delta} x^{1/\delta},$$

which leads to

$$\left|\frac{1 - F\left(t + x a\left(\frac{1}{1-F(t)}\right)\right)}{1 - F(t)}\right| \leq c^{1/\delta} x^{-1/\delta},$$

and notice that the function $x^{\varepsilon - 1/\delta}$ with $\delta > 0$ is integrable over $[1, \infty)$ if and only if $\delta < 1/(1+\varepsilon)$. Thus now choose an arbitrary $0 < \delta < 1/(1+\varepsilon)$. Note that for $\gamma = 0$ we have $f(t) = a(1/(1-F(t)))$ (Proposition 2). Now it is allowed to apply Lebesgue's dominated convergence theorem:

$$(1+\varepsilon)\lim_{t \uparrow x^*} \int_1^{\infty} x^{\varepsilon}\, \frac{1 - F(t + x f(t))}{1 - F(t)}\,dx = (1+\varepsilon)\int_1^{\infty} x^{\varepsilon} \lim_{t \uparrow x^*} \frac{1 - F(t + x f(t))}{1 - F(t)}\,dx = (1+\varepsilon)\int_1^{\infty} x^{\varepsilon} e^{-x}\,dx =: D_\varepsilon < \infty.$$

It has been accomplished that for all $t \geq T$ and for any $0 < \varepsilon < 1$,

$$\lim_{t \to x^*} E\{Y_t^{1+\varepsilon}\} = (1+\varepsilon)\int_0^{\infty} x^{\varepsilon} e^{-x}\,dx = C_\varepsilon + D_\varepsilon < \infty.$$

This completes the proof.

Combining all results gives us the following powerful statement. 30 6.P ROOFS

Corollary 3. Suppose $X \in D(G_\gamma)$ with $\gamma < 1$ and choose $f$ and $Y_t$ as in Corollary 1. Then

$$E\{Y_t\} \to E\{Y\} = \frac{1}{1 - \gamma} \quad \text{as } t \uparrow x^*,$$

where $x^* = \sup\{x \mid F(x) < 1\}$, which may be finite or infinite, and where $Y \sim GPD(\gamma)$.

Proof. We know by Theorems 9, 10, and 11 that there exist an $\varepsilon > 0$ and a $C > 0$ such that $E\{Y_t^{1+\varepsilon}\} \leq C$ for all $t \geq T$, provided $\gamma < 1$. It follows that the collection $\{Y_t : t \geq T\}$ is uniformly integrable by Theorem 7. Applying Theorem 8 then leads to the result:

$$E\{Y_t\} \to E\{Y\} = \frac{1}{1 - \gamma} \quad \text{as } t \uparrow x^*.$$

This leads to the theorem and corollary stated in Chapter 2.

Theorem 12. Suppose $X \in D(G_\gamma)$ with $\gamma < 1$. Then

$$\lim_{t \uparrow x^*} \frac{E\{X \mid X > t\}}{t} = \frac{1}{1 - \gamma_+},$$

where $\gamma_+ = \max\{\gamma, 0\}$.

Proof. First, let's prove the case where $0 < \gamma < 1$. From Corollary 3 we know that

$$E\left\{\frac{X - t}{f(t)} \,\Big|\, X > t\right\} \to \frac{1}{1 - \gamma}.$$

Next to that, equation (1.1) holds, i.e. $f(t)/t \to \gamma$. Note that

$$E\left\{\frac{X - t}{f(t)} \,\Big|\, X > t\right\} = \frac{E\{X \mid X > t\}}{t} \cdot \frac{t}{f(t)} - \frac{t}{f(t)}.$$

By uniqueness of limits,

$$\lim_{t \uparrow x^*} \frac{E\{X \mid X > t\}}{t} = \lim_{t \uparrow x^*} \frac{f(t)}{t} \cdot E\left\{\frac{X - t}{f(t)} \,\Big|\, X > t\right\} + 1 = \gamma \cdot \frac{1}{1 - \gamma} + 1 = \frac{1}{1 - \gamma} = \frac{1}{1 - \gamma_+}.$$

As $t \uparrow x^*$ for $\gamma \leq 0$ we have $f(t)/t \to 0$; this is deduced from equation (1.1). The result is immediate:

$$\lim_{t \uparrow x^*} \frac{E\{X \mid X > t\}}{t} = \lim_{t \uparrow x^*} \frac{f(t)}{t} \cdot E\left\{\frac{X - t}{f(t)} \,\Big|\, X > t\right\} + 1 = 0 \cdot \frac{1}{1 - \gamma} + 1 = 1 = \frac{1}{1 - \gamma_+}.$$

Corollary 4. Suppose $X \in D(G_\gamma)$ with $\gamma < 1$. Then

$$\lim_{p \downarrow 0} \frac{ES(1-p)}{Q_{1-p}} = \frac{1}{1 - \gamma_+}$$

where $Q_{1-p}$ is the $(1-p)$th quantile of the distribution of $X$ and where $ES$ is the expected shortfall.

6.3. FURTHER STUDIES
As announced in Chapter 5, the proof of the claim made to encourage further studies can be found here.

Theorem 13. Suppose $X \in D(G_\gamma)$ with $\gamma > 0$; then $\log X \in D(G_0)$ and $x^*_{\log X} = \infty$.

Proof. Define $Y = \log X$. Then we can derive

$$F_Y(t) = F_X\left(e^t\right),$$

and by Proposition 3 it is sufficient to show that there exist positive functions $c$ and $f$, $f$ continuous, such that for all $t \in (t_1, \infty)$, $t_1 < \infty$,

$$1 - F_Y(t) = c(t) \exp\left(-\int_{t_1}^{t} \frac{ds}{f(s)}\right)$$

with $\lim_{t \to \infty} c(t) = c \in (0, \infty)$ and

$$\lim_{t \to \infty} f'(t) = 0.$$

Since we assumed that $X \in D(G_\gamma)$, we know there exist positive functions $d$ and $g$, $g$ continuous, such that for all $t \in (t_0, \infty)$, $t_0 < \infty$,

$$1 - F_X(t) = d(t) \exp\left(-\int_{t_0}^{t} \frac{ds}{g(s)}\right)$$

with $\lim_{t \to \infty} d(t) = d \in (0, \infty)$ and

$$\lim_{t \to \infty} \frac{g(t)}{t} = \gamma.$$

Now define $t_1 = \log t_0$, $c(t) := d\left(e^t\right)$, $f(s) := g\left(e^s\right) e^{-s}$. Then I claim that these functions give the desired result. The first identity is immediate:

$$1 - F_Y(t) = 1 - F_X\left(e^t\right) = d\left(e^t\right) \exp\left(-\int_{t_0}^{e^t} \frac{ds}{g(s)}\right) = c(t) \exp\left(-\int_{\log t_0}^{t} \frac{e^s\,ds}{g\left(e^s\right)}\right) = c(t) \exp\left(-\int_{t_1}^{t} \frac{ds}{f(s)}\right).$$

Note that $d$ and $f$ are positive functions, and that $\lim_{t \to \infty} c(t) = \lim_{t \to \infty} d\left(e^t\right) = d \in (0, \infty)$. It remains to show that $f$ is a continuous function and that the derivative of $f$ converges to 0. The continuity of $f$ follows directly from the continuity of $g$ and basic analysis principles. The derivative of $f$ needs somewhat more work:

$$f'(t) = \frac{d}{dt}\left(g\left(e^t\right) e^{-t}\right) = g'\left(e^t\right) e^t e^{-t} - g\left(e^t\right) e^{-t} = g'\left(e^t\right) - g\left(e^t\right) e^{-t}.$$

By assumption we know that $g(t) \sim \gamma t$ as $t \to \infty$. It can now be derived (using l'Hôpital's rule) that $g'(t) \sim \gamma$ as $t \to \infty$, and the result follows:

$$\lim_{t \to \infty} f'(t) = \lim_{t \to \infty}\left(\gamma \cdot \frac{g'\left(e^t\right)}{\gamma} - \gamma \cdot \frac{g\left(e^t\right)}{\gamma e^t}\right) = \gamma \cdot 1 - \gamma \cdot 1 = 0.$$

Thus $\log X \in D(G_0)$, provided $X \in D(G_\gamma)$ with $\gamma > 0$.

Lemma 2. If $f$ is a function for which $f'(x) \to 0$ as $x \to \infty$, then

$$f(x) = o(x).$$

Proof. Note that the result is trivial if the function $f$ is convergent as $x \to \infty$. So suppose that $f(x) \to \infty$ as $x \to \infty$. Then the result is immediate if l'Hôpital's rule is applied:

$$\lim_{x \to \infty} \frac{f(x)}{x} = \lim_{x \to \infty} \frac{f'(x)}{1} = 0,$$

which is exactly the definition of $f(x) = o(x)$.

Theorem 14. Suppose $X \in D(G_\gamma)$ with $\gamma = 0$; then $\log X \in D(G_0)$ with $x^*_{\log X} = \log x^*_X$, which may be infinite.

Proof. One should note that the first part of the proof is exactly the same as the first part of the proof of Theorem 13. Define $y = x^*_{\log X} = \log x^*_X$; it is sufficient to show that $f(t) = g\left(e^t\right) e^{-t}$ complies with the necessary conditions

$$\lim_{t \uparrow y} f'(t) = 0 \quad \text{and} \quad \lim_{t \uparrow y} f(t) = 0 \text{ if } y < \infty.$$

Note that $g$ satisfies this condition, and it follows that

$$f'(t) = g'\left(e^t\right) - g\left(e^t\right) e^{-t}$$

certainly converges to 0 if $y < \infty$. Also, if $y < \infty$, then

$$f(t) = g\left(e^t\right) e^{-t}$$

converges to 0, since $e^{-t}$ is bounded by 1 for $t > 0$ and $g$ converges to 0 if its argument converges to $x^*$, which is indeed the case. Now we only need to prove that $\lim_{t \uparrow y} f'(t) = 0$ in the case $y = \infty$. Therefore, it is only necessary to prove that

$$\lim_{t \to \infty} g\left(e^t\right) e^{-t} = 0.$$

It is allowed to apply Lemma 2, since $g'(t) \to 0$ as $t \to \infty$. It follows immediately that

$$g\left(e^t\right) = o\left(e^t\right),$$

which by definition implies $g\left(e^t\right) e^{-t} \to 0$. Thus $f$ satisfies the necessary condition, from which we can conclude that $\log X \in D(G_0)$.

Remark 3. If $X \in D(G_\gamma)$ with $\gamma < 0$, then $\log X \notin D(G_0)$ in general. For example, if $X \sim U([1,2])$, then $X \in D(G_{-1})$ and $\log X \in D(G_{-1})$.

This appendix covers a few claims I made; here they are formulated precisely. First of all, we need something about convergence of inverses, given that the functions themselves converge. This can be formulated in the following lemma.

Lemma 3. Suppose $f_n$ is a sequence of non-decreasing functions and $g$ is a non-decreasing function. Suppose that for each $x$ in some open interval $(a, b)$ that is a continuity point of $g$:

$$\lim_{n \to \infty} f_n(x) = g(x). \tag{A.1}$$

Let $f_n^{\leftarrow}, g^{\leftarrow}$ be the left-continuous inverses of $f_n$ and $g$. Then, for each $x \in (g(a), g(b))$ that is a continuity point of $g^{\leftarrow}$, we have

$$\lim_{n \to \infty} f_n^{\leftarrow}(x) = g^{\leftarrow}(x). \tag{A.2}$$

Moreover, if relation (A.1) holds uniformly on $(a,b)$, then relation (A.2) holds uniformly on $(g(a), g(b))$. For a proof we refer to [2].

One can obtain the uniform convergence part easily by imitating the proof given in [2], but choosing the $n_0$ uniformly. We also want to state an inequality for the inverses of a sequence of functions when the sequence is bounded by another sequence. This is formulated in the next lemma.


Lemma 4. Suppose $f_n$ is a sequence of non-decreasing functions and $h_n$ is a sequence of positive non-decreasing functions. Suppose that there exists an $x_0$ such that for all $n$ and $x \geq x_0$ the following holds:

$$|f_n(x)| \leq h_n(x),$$

and suppose $\sup\{f_n(x_0) : n \in \mathbb{N}\} < \infty$. Let $f_n^{\leftarrow}, h_n^{\leftarrow}$ be the left-continuous inverses of $f_n$ and $h_n$, respectively. Then there exists an $x_1$ such that for all $n$ and $x \geq x_1$ we have

$$|f_n^{\leftarrow}(x)| \geq h_n^{\leftarrow}(x).$$

Proof. First of all, note that if a function is non-decreasing, then its inverse is also non-decreasing. Define

$$x_1 := \sup\{f_n(x_0) : n \in \mathbb{N}\}.$$

Pick an arbitrary $k \in \mathbb{N}$. Then it will be shown that for any $x \geq x_1$,

$$|f_k^{\leftarrow}(x)| \geq h_k^{\leftarrow}(x).$$

So, choose an arbitrary $x \geq x_1$. Assume that there exists a $z$ such that $f_k(z) = x$ (otherwise the statement is vacuous), and define $x_2$ as the smallest $z$ that satisfies this relationship. We know by the definition of $x_1$ that $f_k(x_2) = x \geq x_1 \geq f_k(x_0)$. The non-decreasing property and the minimality of $x_2$ then lead to $x_2 \geq x_0$, and thus

$$h_k(x_2) \geq f_k(x_2).$$

This inequality then transforms to

$$h_k\left(f_k^{\leftarrow}(x)\right) \geq x.$$

Since $h_k$ is non-decreasing, $h_k^{\leftarrow}$ is also non-decreasing, and so applying $h_k^{\leftarrow}$ to both sides of the inequality leads to

$$f_k^{\leftarrow}(x) \geq h_k^{\leftarrow}\left(h_k\left(f_k^{\leftarrow}(x)\right)\right) \geq h_k^{\leftarrow}(x),$$

which completes the proof.

B REGULAR VARIATION

This part contains a few relevant definitions one is probably not familiar with, accompanied by some theorems needed for a few proofs in this report. The proofs of these theorems can all be found in de Haan and Ferreira [2]. First of all, we need to introduce the concept of regular variation.

Definition 10. A Lebesgue measurable function $f : \mathbb{R}_+ \to \mathbb{R}$ that is eventually positive is regularly varying (at infinity) if for some $\alpha \in \mathbb{R}$,

$$\lim_{t \to \infty} \frac{f(tx)}{f(t)} = x^{\alpha}, \quad x > 0. \tag{B.1}$$

Notation: $f \in RV_\alpha$.

One link to extreme value theory is the fact that if $X \in D(G_\gamma)$, then $1 - F \in RV_{-1/\gamma}$ ($\gamma \neq 0$).

For the proof of one theorem, an integrable bound for $f(tx)/f(t)$ is required. The next proposition helps a lot for this purpose.

Proposition 8 (Potter, 1942). Suppose $f \in RV_\alpha$. If $\delta_1, \delta_2 > 0$ are arbitrary, there exists a $t_0 = t_0(\delta_1, \delta_2)$ such that for $t \geq t_0$, $tx \geq t_0$,

$$(1 - \delta_1) x^{\alpha} \min\left\{x^{\delta_2}, x^{-\delta_2}\right\} < \frac{f(tx)}{f(t)} < (1 + \delta_1) x^{\alpha} \max\left\{x^{\delta_2}, x^{-\delta_2}\right\}.$$

Note that, conversely, if $f$ satisfies the above property, then $f \in RV_\alpha$.

We also need to introduce the concept of extended regular variation.

Definition 11. A Lebesgue measurable function $f : \mathbb{R}_+ \to \mathbb{R}$ is said to belong to the class $\Pi$ if there exists a function $a : \mathbb{R}_+ \to \mathbb{R}_+$ such that for $x > 0$,

$$\lim_{t \to \infty} \frac{f(tx) - f(t)}{a(t)} = \log x. \tag{B.2}$$

Notation: $f \in \Pi$ or $f \in \Pi(a)$. The function $a$ is called an auxiliary function for $f$.


Note that this comes into play when $X \in D(G_0)$: the function $U$ is then an element of $\Pi$. This definition leads to the following corollary, which is also used in this report.

Corollary 5. If $f \in \Pi(a)$, then for any $\varepsilon > 0$ there exist $t_0, c > 0$ such that for $t \ge t_0$, $x \ge 1$,
$$\left|\frac{f(tx) - f(t)}{a(t)}\right| \le c\,x^{\varepsilon}.$$
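A concrete example (my own illustration): $f(t) = \log t$ belongs to $\Pi(a)$ with auxiliary function $a(t) \equiv 1$, since $(\log(tx) - \log t)/1 = \log x$, and the bound $\log x \le c\,x^{\varepsilon}$ for $x \ge 1$ can be checked directly. The constants below are crude but sufficient, since $\log x / x^{0.1}$ is maximal at $x = e^{10}$ with value $10/e \approx 3.7$.

```python
import numpy as np

# f(t) = log t is in Pi(a) with a(t) = 1, since (log(tx) - log t)/1 = log x.
eps, c = 0.1, 25.0                    # c chosen so that log x <= c * x**eps for x >= 1
x = np.linspace(1, 1e6, 100_000)
t = 100.0
lhs = np.abs(np.log(t * x) - np.log(t))   # equals log x
assert np.all(lhs <= c * x**eps)
print("Corollary 5 bound verified for f = log on the grid")
```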

C COMPUTATIONS

Here I will show how Table 4.1 has been made. We start with the uniform distribution and simplify the expected shortfall:
$$ES(1-p) = E\{X \mid X > Q_{1-p}\} = \frac{\int_{Q_{1-p}}^{\infty} x f_X(x)\,dx}{p} = \frac{1}{p}\int_{Q_{1-p}}^{1} x\,dx = \frac{1 - Q_{1-p}^2}{2p}.$$

We can compute $Q_{1-p}$ in closed form in terms of $p$:
$$p = \int_{Q_{1-p}}^{\infty} f_X(x)\,dx = \int_{Q_{1-p}}^{1} dx = 1 - Q_{1-p}.$$

Thus it follows that $Q_{1-p} = 1 - p$, and the expected shortfall can be simplified to
$$ES(1-p) = \frac{1}{2p}\left(1 - (1-p)^2\right) = \frac{2 - p}{2}.$$
For the standard normal distribution we have
$$ES(1-p) = \frac{\int_{Q_{1-p}}^{\infty} x f_X(x)\,dx}{p} = \frac{1}{p\sqrt{2\pi}}\int_{Q_{1-p}}^{\infty} x\,e^{-x^2/2}\,dx = \frac{1}{p\sqrt{2\pi}}\,e^{-Q_{1-p}^2/2},$$
where $Q_{1-p}$ solves
$$p = \int_{Q_{1-p}}^{\infty} f_X(x)\,dx = \frac{1}{\sqrt{2\pi}}\int_{Q_{1-p}}^{\infty} \exp\left(-x^2/2\right) dx.$$
For the generalized Pareto distribution with $\gamma \neq 0$ we have
$$ES(1-p) = \frac{\int_{Q_{1-p}}^{\infty} x f_X(x)\,dx}{p} = \frac{1}{p}\int_{Q_{1-p}}^{a(\gamma)} x\,(1+\gamma x)^{-1/\gamma - 1}\,dx = \frac{1}{p}\left[\frac{(x+1)(1+\gamma x)^{-1/\gamma}}{\gamma - 1}\right]_{Q_{1-p}}^{a(\gamma)},$$


where $a(\gamma) = \infty$ if $\gamma > 0$ and $a(\gamma) = -1/\gamma$ if $\gamma < 0$. Note that in both cases the upper limit contributes 0 when we fill in this $a(\gamma)$ in the equation above. So we get

$$ES(1-p) = \frac{(Q_{1-p} + 1)(1 + \gamma Q_{1-p})^{-1/\gamma}}{p(1-\gamma)}.$$
Note that
$$p = \int_{Q_{1-p}}^{a(\gamma)} (1+\gamma x)^{-1/\gamma - 1}\,dx = \left[-(1+\gamma x)^{-1/\gamma}\right]_{Q_{1-p}}^{a(\gamma)}.$$
Again, if we fill in $a(\gamma)$ in the equation above, the upper limit contributes 0, such that

$$p = (1 + \gamma Q_{1-p})^{-1/\gamma}.$$
So $Q_{1-p} = \frac{1}{\gamma}\left(p^{-\gamma} - 1\right)$, and the expected shortfall simplifies to
$$ES(1-p) = \frac{Q_{1-p} + 1}{1-\gamma} = \frac{\frac{1}{\gamma}\left(p^{-\gamma} - 1\right) + 1}{1-\gamma} = \frac{p^{-\gamma} - 1 + \gamma}{\gamma(1-\gamma)} = \frac{p^{-\gamma}}{\gamma(1-\gamma)} - \gamma^{-1}.$$
If $\gamma = 0$, we have $f_X(x) = e^{-x}$ for $x \ge 0$. So in that case
$$ES(1-p) = \frac{\int_{Q_{1-p}}^{\infty} x f_X(x)\,dx}{p} = \frac{\int_{Q_{1-p}}^{\infty} x\,e^{-x}\,dx}{p} = \frac{\left[-e^{-x}(x+1)\right]_{Q_{1-p}}^{\infty}}{p} = \frac{e^{-Q_{1-p}}\left(Q_{1-p} + 1\right)}{p}$$

and
$$p = \int_{Q_{1-p}}^{\infty} f_X(x)\,dx = \int_{Q_{1-p}}^{\infty} e^{-x}\,dx = e^{-Q_{1-p}}.$$
Thus $Q_{1-p} = -\log(p)$, and the expected shortfall can be simplified further to

$$ES(1-p) = \frac{e^{-Q_{1-p}}\left(Q_{1-p} + 1\right)}{e^{-Q_{1-p}}} = Q_{1-p} + 1 = 1 - \log(p) = \log\left(\frac{e}{p}\right).$$
For the Student t-distribution with 3 degrees of freedom we have

$$E\{X \mid X > Q_{1-p}\} = \frac{1}{p}\int_{Q_{1-p}}^{\infty} x \cdot \frac{\Gamma(2)}{\sqrt{3\pi}\,\Gamma\!\left(\frac{3}{2}\right)}\left(1 + \frac{x^2}{3}\right)^{-2} dx = \frac{\sqrt{3}}{p\pi}\int_{Q_{1-p}}^{\infty} \frac{2x}{3}\left(1 + \frac{x^2}{3}\right)^{-2} dx = \frac{\sqrt{3}}{p\pi}\left(1 + \frac{Q_{1-p}^2}{3}\right)^{-1},$$

where $Q_{1-p}$, the $(1-p)$th quantile, solves
$$p = \int_{Q_{1-p}}^{\infty} f(x)\,dx = \frac{2}{\pi\sqrt{3}}\int_{Q_{1-p}}^{\infty} \left(1 + \frac{x^2}{3}\right)^{-2} dx.$$
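The closed forms above are easy to verify numerically. The sketch below (my own check, not the report's original code) compares the derived expressions for the standard normal and the generalized Pareto distribution with $\gamma \neq 0$ against direct numerical integration, assuming scipy is available:

```python
import numpy as np
from scipy import integrate, stats

p = 0.01  # tail probability

# Standard normal: ES(1-p) = exp(-Q^2/2) / (p * sqrt(2*pi))
Q = stats.norm.ppf(1 - p)
closed = np.exp(-Q**2 / 2) / (p * np.sqrt(2 * np.pi))
numeric, _ = integrate.quad(lambda x: x * stats.norm.pdf(x), Q, np.inf)
print(closed, numeric / p)   # both approximately 2.665

# GPD with gamma != 0: ES(1-p) = p^(-gamma)/(gamma*(1-gamma)) - 1/gamma
gamma = 0.3
Q = (p**-gamma - 1) / gamma                       # from p = (1 + gamma*Q)^(-1/gamma)
closed = p**-gamma / (gamma * (1 - gamma)) - 1 / gamma
pdf = lambda x: (1 + gamma * x)**(-1 / gamma - 1)
numeric, _ = integrate.quad(lambda x: x * pdf(x), Q, np.inf)
print(closed, numeric / p)   # both approximately 15.62
```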

D SIMULATIONS

More results and details than listed in Chapter 4 can be found here. The estimators are going to be tested, such that a clear image of their performance is given and, in particular, of how to apply them in practice if one has a good indication of the extreme value index. In this part, we will only discuss different types of generalized Pareto distributions, since for almost every distribution the tail approximately follows a generalized Pareto distribution. We will look at 11 different parameters $\xi$ for the generalized Pareto distribution:

$$\xi \in \{-2, -1, -0.5, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7\}.$$
The simulation parameters are the same as discussed earlier in Chapter 3 ($n = 1000$, $p = 10^{-4}$, $p_0 = 0.99$, $r = 5000$).

We are going to present the simulation results for all these distributions in the form of figures. In the end, we will also try to find the best estimators for these distributions. It would be delightful if we could just choose the best $k$, the upper order statistic index, for every estimator and distribution. Unfortunately, this is not very useful, since the generalized Pareto distribution is only used and introduced to model tail behavior. The $k$ should thus be chosen such that the $k$th upper order statistics are in the tail of the distribution, where the approximation by the generalized Pareto distribution makes sense. It should not be chosen according to the results in this chapter but should be analyzed separately.

All figures that are listed consist of 6 graphs. The first two rows of figures correspond to the method 1 and method 2 estimators with their normalized mean squared errors. The real value of the expected shortfall is indicated with a horizontal line in the two left figures. The last row of figures displays the distribution of the scaled error of the estimation,
$$\left(\frac{\hat{ES}_{i,k}}{ES} - 1\right) \quad \text{or} \quad \left(\frac{\tilde{ES}_{i,k}}{ES} - 1\right),$$
where the value $k = 100$ for the upper order statistic index has been chosen.
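To give an idea of how one such simulation run can be set up, here is a minimal sketch for one configuration: it draws generalized Pareto samples by inverse transform and computes the method 2 estimator, using the Hill estimator for $\gamma$ and the Weissman-type quantile estimator for $Q_{1-p}$. It is a reconstruction under these assumptions (including the example choice $\xi = 0.3$), not the exact code behind the figures.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p, r = 1000, 100, 1e-4, 5000     # sample size, order statistics used, tail prob., repetitions
xi = 0.3                                # true GPD shape parameter (the extreme value index)

true_es = p**-xi / (xi * (1 - xi)) - 1 / xi    # closed form from Appendix C (gamma != 0)

scaled_errors = []
for _ in range(r):
    u = rng.uniform(size=n)
    x = np.sort((u**-xi - 1) / xi)      # GPD(xi) sample by inverse transform
    threshold = x[n - k - 1]            # X_{n-k,n}
    gamma_hat = np.mean(np.log(x[n - k:])) - np.log(threshold)   # Hill estimator
    q_hat = threshold * (k / (n * p))**gamma_hat                 # Weissman quantile estimator
    es_tilde = q_hat / threshold * np.mean(x[n - k:])            # method 2 estimator
    scaled_errors.append(es_tilde / true_es - 1)

print(np.mean(scaled_errors), np.mean(np.square(scaled_errors)))
```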


D.1. GENERALIZED PARETO DISTRIBUTION WITH NEGATIVE PARAMETER ξ

For negative $\xi$ the estimator is pretty accurate. However, one should note that the bigger the parameter $\xi$, the worse the method works, although the errors remain manageable. It should be pointed out that the best estimators in the case of $\xi < 0$ are the moment estimator and the probability-weighted moment estimator. The negative Hill estimator is not the one we prefer according to these simulations, since it is only valid for $\gamma < -0.5$ and it has relatively big errors for $\gamma = -2$.
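For completeness, a minimal sketch of the moment estimator (the estimator itself is defined in de Haan and Ferreira [2]; the implementation and function name here are my own) could look as follows. Unlike the Hill estimator, it can also handle $\gamma < 0$; note that it requires positive observations.

```python
import numpy as np

def moment_estimator(x, k):
    """Moment estimator of the extreme value index gamma, based on the
    k upper order statistics of the (positive) sample x."""
    xs = np.sort(x)
    logs = np.log(xs[-k:]) - np.log(xs[-k - 1])   # log-excesses over X_{n-k,n}
    m1 = np.mean(logs)                            # first moment: the Hill estimator
    m2 = np.mean(logs**2)                         # second moment
    return m1 + 1 - 0.5 / (1 - m1**2 / m2)
```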


Figure D.1: Estimates, mean squared errors, and the distribution of the errors of estimating the expected shortfall with argument $1 - 10^{-4}$ from the generalized Pareto distribution with parameter $\xi = -2$, based on a sample size of 1000 and 5000 repetitions.


Figure D.2: Estimates, mean squared errors, and the distribution of the errors of estimating the expected shortfall with argument $1 - 10^{-4}$ from the generalized Pareto distribution with parameter $\xi = -1$, based on a sample size of 1000 and 5000 repetitions.


Figure D.3: Estimates, mean squared errors, and the distribution of the errors of estimating the expected shortfall with argument $1 - 10^{-4}$ from the generalized Pareto distribution with parameter $\xi = -0.5$, based on a sample size of 1000 and 5000 repetitions.

D.2. GENERALIZED PARETO DISTRIBUTION WITH NON-NEGATIVE PARAMETER ξ

For non-negative $\xi$, the methods work fine if $\xi < 0.5$.

For $\xi = 0.6$ or higher, the moment estimator from method 1 should not be chosen because of its significant errors. If $\xi$ becomes strictly greater than 0.6, it should also be noted that estimates of infinity occur for the moment estimator. This will be the case if the extreme value index is estimated above 1. These estimates of infinity are, logically, not included when averaging the estimations, nor in the normalized mean squared errors.

Next to that, recall that the probability-weighted moment estimator only works for $\gamma < 1$. So, if $\gamma$ comes close to 1, the probability-weighted moment estimator probably underestimates the quantity, and the moment estimator is not an accurate one either, because of its high variability. What can we do next? This is one of the further studies, and one possible option is formulated in Chapter 5.
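For reference, here is a sketch of a probability-weighted moment estimator of $\gamma$, following the Hosking–Wallis construction applied to the excesses over $X_{n-k,n}$; the plotting positions and the function name are my own choices, so this may differ in detail from the estimator used in the simulations.

```python
import numpy as np

def pwm_estimator(x, k):
    """Probability-weighted moment estimator of the extreme value index
    (valid for gamma < 1), based on the excesses over X_{n-k,n}."""
    xs = np.sort(x)
    excess = xs[-k:] - xs[-k - 1]             # excesses of the k largest points
    a0 = np.mean(excess)                       # plain mean excess
    ranks = np.arange(k, 0, -1)                # empirical exceedance ranks k, ..., 1
    a1 = np.mean(excess * (ranks - 0.5) / k)   # weighted moment, E[Y (1 - F(Y))]
    return 2 - a0 / (a0 - 2 * a1)              # gamma = 2 - a0/(a0 - 2 a1) for the GPD
```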


Figure D.4: Estimates, mean squared errors, and the distribution of the errors of estimating the expected shortfall with argument $1 - 10^{-4}$ from the generalized Pareto distribution with parameter $\xi = 0$, based on a sample size of 1000 and 5000 repetitions.


Figure D.5: Estimates, mean squared errors, and the distribution of the errors of estimating the expected shortfall with argument $1 - 10^{-4}$ from the generalized Pareto distribution with parameter $\xi = 0.1$, based on a sample size of 1000 and 5000 repetitions.


Figure D.6: Estimates, mean squared errors, and the distribution of the errors of estimating the expected shortfall with argument $1 - 10^{-4}$ from the generalized Pareto distribution with parameter $\xi = 0.2$, based on a sample size of 1000 and 5000 repetitions.


Figure D.7: Estimates, mean squared errors, and the distribution of the errors of estimating the expected shortfall with argument $1 - 10^{-4}$ from the generalized Pareto distribution with parameter $\xi = 0.3$, based on a sample size of 1000 and 5000 repetitions.


Figure D.8: Estimates, mean squared errors, and the distribution of the errors of estimating the expected shortfall with argument $1 - 10^{-4}$ from the generalized Pareto distribution with parameter $\xi = 0.4$, based on a sample size of 1000 and 5000 repetitions.


Figure D.9: Estimates, mean squared errors, and the distribution of the errors of estimating the expected shortfall with argument $1 - 10^{-4}$ from the generalized Pareto distribution with parameter $\xi = 0.5$, based on a sample size of 1000 and 5000 repetitions.

From now on, the estimations of the moment estimator and of the probability-weighted moment estimator are no longer included in the same figure, since the difference between the estimators and between their errors is so big that the details would otherwise not be readable.

The moment estimator:


The probability-weighted moment estimator:

Figure D.10: Estimates, mean squared errors, and the distribution of the errors of estimating the expected shortfall with argument $1 - 10^{-4}$ from the generalized Pareto distribution with parameter $\xi = 0.6$, based on a sample size of 1000 and 5000 repetitions.

The errors are now so big that displaying the distribution of the errors is no longer meaningful.

The moment estimator:


The probability-weighted moment estimator:

Figure D.11: Estimates and mean squared errors of estimating the expected shortfall with argument $1 - 10^{-4}$ from the generalized Pareto distribution with parameter $\xi = 0.7$, based on a sample size of 1000 and 5000 repetitions.

D.3. CONCLUSIONS

Drawing conclusions from these results and applying them to cases with similar extreme value indices is quite a powerful thing to do, and it should not be done without thinking. If you have considered all options, then Table D.1 could be useful. For the magnitude of the errors, it is best to take a look at the corresponding figure(s).

γ ≤ 0.5: probability-weighted moment or moment estimator;
0.5 < γ < 0.8 (if you are sure of this): probability-weighted moment estimator;
γ ≥ 0.8: actually none.

Table D.1: An indication of which estimator to use.

REFERENCES

[1] Billingsley (1971). Weak Convergence of Measures: Applications in Probability. Philadelphia.

[2] de Haan and Ferreira (2006). Extreme Value Theory: An Introduction. New York.

[3] Grimmett and Welsh (2014). Probability: An Introduction. New York.

[4] Jacod and Protter (2002). Probability Essentials.

[5] Rudin (1986). Real and Complex Analysis.
