There is a VaR beyond usual approximations. Towards a toolkit to compute risk measures of aggregated heavy tailed risks

Marie Kratz September 2013

SWISS FINANCIAL MARKET SUPERVISORY AUTHORITY FINMA Bern - Zürich

Acknowledgement.

This work was carried out during my internship (July-December 2012) at the Swiss Financial Market Supervisory Authority (FINMA). I would like to thank FINMA for its hospitality and rich environment, and ISFA (Univ. Claude Bernard, Lyon 1) for having given me the opportunity to learn more about actuarial sciences and finance, with a view to becoming an IA actuary.

A special thanks to Dr. Hansjörg Furrer (Head of the QRM department, FINMA) for having made this stay possible and for bringing this interesting practical issue to my attention.

Thanks to all of my colleagues from the Quantitative Risk Management (QRM) - Financial Risk SST department, for making my stay at FINMA a rich, pleasant and fruitful experience. Besides the study presented in this report, it was very interesting to get the chance to participate in the review process, discussions and reporting for a company, to learn more about the Swiss Solvency Test (SST) through discussions with colleagues, and to take part in the revision of the French translation of the SST.

My warm thanks also to Dr. Michel Dacorogna, for stimulating and interesting discussions on this study.

Abstract

Basel II and Solvency 2 both use the Value-at-Risk (VaR) as the risk measure to compute the capital requirements. In practice, to calibrate the VaR, a normal approximation is often chosen for the unknown distribution of the yearly log returns of financial assets. This is usually justified by the use of the Central Limit Theorem (CLT), when assuming aggregation of independent and identically distributed (iid) observations in the portfolio model. Such a choice of modeling, in particular using light tail distributions, proved during the crisis of 2008/2009 to be an inadequate approximation when dealing with the presence of extreme returns; as a consequence, it leads to a gross underestimation of the risks. The main objective of our study is to obtain the most accurate evaluations of the aggregated risks distribution and risk measures when working on financial or insurance data in the presence of heavy tails, and to provide practical solutions for accurately estimating high quantiles of aggregated risks. We explore new approaches to handle this problem, numerically as well as theoretically, based on properties of upper order statistics. We compare them with existing methods, for instance with one based on the Generalized Central Limit Theorem.

Résumé

The Basel II and Solvency 2 frameworks are both based on the choice of the Value-at-Risk (VaR) as the risk measure for computing the solvency capital. In practice, a normal approximation is often chosen to estimate the underlying distribution of the yearly log returns of financial assets, in view of evaluating the VaR. This is justified by the use of the CLT when the portfolio model is assumed to consist of the aggregation of iid observations. Such a modeling choice, relying on a so-called 'light' tailed distribution, proved completely inappropriate during the financial crisis of 2008/09, during which the presence of extreme returns became manifest. This model error led to a substantial underestimation of the risks. The main objective of this study is to obtain the best possible evaluation of the distribution of aggregated risks and of the associated risk measures when dealing with heavy-tailed financial or insurance data, and to propose practical solutions for estimating, as optimally as possible, the extreme quantiles of aggregated risks. We explore new approaches to this problem, from a theoretical as well as a numerical point of view, based on the properties of the largest order statistics. We then compare these approaches with existing methods, such as the one based on the generalized CLT.

Keywords: aggregated risk, (refined) Berry-Esséen inequality, (generalized) central limit theorem, conditional (Pareto) distribution, conditional (Pareto) moment, convolution, extreme values, financial data, high frequency data, market risk, order statistics, rate of convergence, risk measures, stable distribution, Value-at-Risk

Synopsis

• Motivation
Every financial institution, bank or insurance company, has to manage a portfolio of risks. These aggregated risks, modeled by random variables, constitute the basis of any internal model. It is still very frequent, in practice, to use a normal approximation to estimate the (unknown) distribution of the yearly log returns of financial assets. This is justified by an application of the Central Limit Theorem (CLT) under the assumption of independent and identically distributed (iid) observations with finite variance. Such a modeling choice, using a so-called light tailed distribution, has proved inappropriate for the evaluation of risk measures, because it underestimates the risk.
Two essential problems arise when applying the CLT to this type of data with a relatively heavy distribution tail (for example of Pareto type with a shape parameter larger than 2). The first concerns the quality of the approximation, related to the moments of the parent variable; the second concerns the relevance of a graphical method for detecting a heavy distribution tail once the data are aggregated, which yields samples of smaller size.

↪ Let us come back to the first question raised. One should not lose sight of the fact that the CLT is a theorem describing the average behavior of a phenomenon, and not that of the extremes (located in the tail of the distribution). It has been shown, since the studies of the 1980s, in particular that of Hall ([27]), that removing the extremes from the sample may improve the rate of convergence (smaller variance) of the mean when applying the CLT. Moreover, even if we were interested only in the average behavior, we know that it is an asymptotic theorem depending on the sample size, and that the approximation may be poor for samples of small size. To improve the quality of the approximation, the existence of moments of order higher than 2 turns out to be necessary, as the Edgeworth expansion shows.
↪ Concerning the second problem, related to aggregation, note that the existence of an underlying heavy-tailed distribution can be detected by empirical/graphical methods, such as the QQ-plot, clearly on high frequency data (for example daily data), but no longer on aggregated data, for example yearly data (samples of small size),

whereas it is well known, by Fisher's theorem, that the tail index of the underlying distribution remains invariant under aggregation. This is a phenomenon on which many authors have insisted (see e.g. [13]).
This second problem is well illustrated by the QQ-plots obtained for the S&P 500 returns shown in the Introduction. Indeed, while the QQ-plot of the daily data from 1987 to 2007 detects a heavy-tailed distribution, this is no longer the case when these data are aggregated monthly. In that case, the QQ-plot looks rather normal, and would then lead one to treat the financial crises of 1998 and 1987 as outliers. Even when considering the monthly aggregated data over a somewhat longer period, namely from 1987 to 2013, the QQ-plot remains more or less unchanged, i.e. normal, except that this time another 'outlier' appears, dated ... October 2008! Without going back to daily data, but substantially increasing the sample size, with monthly data from 1791 to 2013, it becomes manifest again that the underlying distribution is heavy tailed. The financial crisis of 2008 is an event appearing in the tail of the distribution and can no longer be discarded from the analysis as an outlier. These different figures illustrate well the importance of the sample size when evaluating risks in the presence of a relatively heavy distribution tail. It is therefore fundamental to propose a method that does not depend on the sample size to detect the shape of the tail.

• Objective
Our main objective is the search for approaches yielding the most accurate possible evaluations of risk measures when analyzing financial data in the presence of heavy distribution tails. We will explore various methods, existing and new, to address this problem, theoretically and numerically. With financial or actuarial applications in mind, we will use power law models for the marginals of the risks, such as the Pareto distribution with parameter $\alpha$ defined by
$$\bar F(x) := 1 - F(x) = x^{-\alpha}, \quad \alpha > 0, \; x \ge 1 \qquad (1)$$
We will define the returns by
$$X_i^{(n)} := \ln P_i - \ln P_{i-n}, \quad n \ge 1$$

where $P_i$ is the daily price and $n$ the aggregation factor. Note that we can also write
$$X_i^{(n)} = \sum_{j=i-n+1}^{i} X_j^{(1)}$$

We will simplify the notation $X_i^{(1)}$ to $X_i$, and will then consider an n-sample $(X_i,\, i = 1,\dots,n)$ with parent variable $X$ following a Pareto distribution, with associated order statistics $X_{(1)} \le \dots \le X_{(n)}$.
• Questions and remarks
When tackling the search for appropriate methods, several questions arise at once, some of them about the choice of assumptions.

– Why consider data simulated from a Pareto distribution? This assumption is justified by Extreme Value Theory (EVT). Indeed, it suffices to recall the two following fundamental results.
∗ On one hand, Pickands' theorem states that, for a sufficiently high threshold $u$, the Generalized Pareto Distribution (GPD) $G_{\xi,\sigma(u)}$ (with shape parameter $\xi$ and scale parameter $\sigma(u)$) is a very good approximation of the distribution of the excesses above this threshold, defined by $F_u(x) = \mathbb{P}[X - u \le x \mid X > u]$:
$$F_u(y) \underset{u\to\infty}{\approx} G_{\xi,\sigma(u)}(y)$$
∗ On the other hand, for $\xi > 0$, the tail of the GPD is asymptotically equivalent to a Pareto tail:

$$\bar G_{\xi,\sigma(u)}(y) := 1 - G_{\xi,\sigma(u)}(y) \underset{y\to\infty}{\sim} c\, y^{-1/\xi} \qquad (c > 0 \text{ some constant})$$
These two reminders show that it is therefore quite natural, and fairly general, to take a Pareto distribution as the underlying one when working on data in the presence of heavy tails, with the evaluation of risk measures (extreme risks) in view.
– Is the 'iid' condition too restrictive for our study? We can answer this somewhat provocative question by appealing again to EVT, as well as to a recent study by Embrechts et al. ([19]).
↪ We know, via EVT, that the tail index of the distribution of the aggregated risks corresponds to that of the marginal with the heaviest tail; this therefore does not depend on the issue of dependence.
↪ Recently, a new numerical algorithm has been introduced by Embrechts et al. ([19]) for computing the lower and upper bounds of the Value-at-Risk (VaR) of inhomogeneous portfolios (i.e. aggregated risks) of high dimension, whatever the dependence structure. (Recall that the VaR of order q of a variable corresponds to the quantile of order q of this variable.) The authors observe that the quality of these bounds depends essentially on the marginal information available for the model and, surprisingly, much less on the information about the dependence structure. Moreover, in the Pareto case, these bounds turn out to be very close.

Hence it is natural to simplify our study by assuming the variables to be iid.

Other questions will be addressed during the construction of our methods, for example:

– As soon as we consider aggregated data, which means a smaller sample size, is it reasonable to propose methods based on asymptotic theorems to evaluate the underlying distribution?
– What type of approximation can be used in the presence of heavy tails? Does it depend on the tail index?

Finally, note that this study fills in the last missing 'brick' concerning the behavior of the sum of iid variables in the case of 'moderately' heavy distribution tails, for which the CLT applies (for the average behavior!) but with a possibly slow rate of convergence, and for which the approximation of the distribution tail is not satisfactory.

• Outline
We will review the existing methods, from the one based on the generalized CLT (GCLT) (with convergence to a stable law if the shape parameter of the Pareto distribution is smaller than 2, and to a normal law otherwise), up to the method of the maximum (EVT). We will then propose two methods, both inspired by a study of Zaliapin et al. ([49]) in which the sum of n iid variables is rewritten as the sum of the associated order statistics, whose moments can be studied.
The first method, called Normex, is developed theoretically and numerically; it allows one to understand the divergence between the underlying moderately heavy-tailed distribution and the normal approximation, when the CLT applies. It identifies the number of largest order statistics to extract from the total sum in order to propose a mixture between a normal law applied to the trimmed sum (with a good rate of convergence) and the exact distribution of the rest of the sum. This number of largest order statistics varies with the value of the shape parameter of the Pareto distribution, but it is very small and independent of the sample size! Among the various methods tested, Normex gives in general the best results, independently of the sample size and of the heaviness of the distribution tail. We make an analytical comparison between the exact distribution of the aggregated risks and the one obtained via Normex, then develop the application of Normex to the evaluation of risk measures.
The second proposed method is empirical and consists of a normal approximation with weighted parameters, in the case where the shape parameter of the Pareto distribution is larger than 2. This method has the merit of providing a very simple tool, allowing one to remain within the Gaussian framework, but with correcting terms

for both the mean and the variance. It is in general the second best method after Normex.
Finally, we end this study with the application of the various methods to the evaluation of risk measures, in order to test and compare them numerically. We take the VaR as an example of a risk measure and consider the extreme quantiles (of orders 95%, 99% and 99.5%) used in solvency calculations. The various evaluations, together with their relative errors, appear in the tables of Section 4.2.

• Main results
To close this synopsis, let us state the main result associated with each of the two proposed methods.

Theorem 1 - Normex. The distribution of the Pareto sum $S_n$ can be approximated by the distribution $G_{n,\alpha,k}$, defined for all $x \ge 1$ by

$$G_{n,\alpha,k}(x) = \begin{cases} \displaystyle\int_1^x f_{(n-k+1)}(y)\left(\int_0^{x-y} \varphi_{m_1(y),\sigma(y)} \star h_y^{\star(k-1)}(v)\, dv\right) dy & \text{if } k \ge 2 \\[2ex] \displaystyle\int_1^x \frac{f_{(n)}(y)}{\sigma(y)}\left(\int_0^{x-y} \varphi\Big(\frac{v-m_1(y)}{\sigma(y)}\Big)\, dv\right) dy & \text{if } k = 1 \end{cases}$$

For $k = 1$, the distribution of $S_n$ is given by
$$G_{n,\alpha,1}(x) = n\alpha \int_1^x y^{-(1+\alpha)}\,(1-y^{-\alpha})^{n-1}\,\frac{1}{\sigma(y)}\int_0^{x-y} \varphi\Big(\frac{v-m_1(y)}{\sigma(y)}\Big)\, dv\, dy$$
For $k \ge 2$ (but small), we have
$$G_{n,\alpha,k}(x) = \int_1^x \frac{f_{(n-k+1)}(y)}{\sigma(y)}\int_0^{x-y}\!\!\int_0^v \varphi\Big(\frac{v-u-m_1(y)}{\sigma(y)}\Big)\, h_y^{\star(k-1)}(u)\, du\, dv\, dy$$

where the convolution product $h_y^{\star(k-1)}$ can easily be evaluated numerically, using the recursive convolution equation applied to $h$, or through an exact formula in the cases $\alpha = 1, 2$.
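As a numerical illustration for k = 1, the double integral above can be evaluated directly. The R sketch below assumes (as made precise in Section 3.2) that, given $X_{(n)} = y$, the trimmed sum $T_1$ is a sum of n − 1 iid variables with the Pareto distribution truncated to [1, y], so that $m_1(y)$ and $\sigma^2(y)$ are n − 1 times the truncated mean and variance; it is an illustration of ours only, written for $\alpha \ne 1, 2$.

# Evaluate G_{n,alpha,1}(x) of Theorem 1 by numerical integration (alpha != 1, 2)
G1 <- function(x, n, alpha) {
  mu  <- function(y) alpha/(alpha - 1)*(1 - y^(1 - alpha))/(1 - y^(-alpha))  # E[X | X <= y]
  mu2 <- function(y) alpha/(alpha - 2)*(1 - y^(2 - alpha))/(1 - y^(-alpha))  # E[X^2 | X <= y]
  m1  <- function(y) (n - 1)*mu(y)                                 # conditional mean of T_1
  sd1 <- function(y) sqrt(pmax((n - 1)*(mu2(y) - mu(y)^2), 1e-12)) # conditional sd of T_1
  f <- function(y)   # density of X_(n) times P(0 <= T_1 <= x - y | X_(n) = y)
    n*alpha*y^(-(1 + alpha))*(1 - y^(-alpha))^(n - 1) *
      (pnorm((x - y - m1(y))/sd1(y)) - pnorm(-m1(y)/sd1(y)))
  integrate(f, 1 + 1e-8, x)$value
}
G1(160, n = 100, alpha = 3)   # approximates P(S_100 <= 160) for Pareto(3) risks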

Result 2. Consider $k = k(n,\gamma)$ with $\gamma = 0.9$, such that

k = k(n, γ) = [n(1 − γ)] = [n/10]

The distribution of $S_n$ can then be approximated, for any $n$ and any $\alpha > 2$, by the normal distribution with weighted parameters

$$\mathcal{N}\Big(m_1(\alpha,n,k) + k\,\mathrm{ES}_\gamma(X)\,,\; \sigma^2(\alpha,n,k)\times\gamma\Big)$$
where $\mathrm{ES}_\gamma(X) = \frac{\alpha}{\alpha-1}\,(1-\gamma)^{-1/\alpha}$, and $m_1(\alpha,n,k)$, $\sigma^2(\alpha,n,k)$ are the parameters of the normal distribution of the sum trimmed of the k largest order statistics.

Contents

1 Introduction
  1.1 Motivation and objective
  1.2 Preliminaries

2 Existing methods
  2.1 A GCLT approach
    2.1.1 Limit theorems
    2.1.2 Evaluation of risk measures
  2.2 An EVT approach
    2.2.1 GEV approach
    2.2.2 Mean Excess Plot approach

3 New approaches - mixed limit theorems
  3.1 Introduction
    3.1.1 On the selection of the threshold for the best mean behavior of aggregated heavy tail distributed risks
    3.1.2 Order statistics
  3.2 Method 1: Normex - a mixed normal-extremes limit
    3.2.1 A conditional decomposition
    3.2.2 On the quality of the approximation of the distribution of the Pareto sum Sn
  3.3 Method 2: a weighted normal limit

4 Application to risk measures
  4.1 Possible approximations of VaR
  4.2 Numerical study - comparison of the various methods
    4.2.1 Presentation of the study
    4.2.2 Estimation of the VaR with the various methods
    4.2.3 Discussion of the results

5 Conclusion

6 Bibliography

7 Appendix
  7.1 Numerical study of K(α, n)
  7.2 Results concerning VaR
  7.3 R codes

1 Introduction

1.1 Motivation and objective

• Main issue / Motivation:

– A normal approximation is often chosen in practice for the unknown distribution of the yearly log returns of financial assets, justified by the use of the CLT, when assuming independent and identically distributed (iid) observations. Such a choice of modeling, in particular using light tail distributions, has been revealed to be an inadequate approximation when dealing with risk measures; as a consequence, it leads to underestimating the risk.
– Recently, a study was done by Furrer ([23]) on simulated iid Pareto random variables (rv's) to measure the impact of the choice and the use of the limiting distribution of aggregated risks, in particular for the computation of standard risk measures (VaR or ES). In this study, the standard General Central Limit Theorem (GCLT) (see e.g. [44]) is recalled, providing a limiting stable distribution or a normal one, depending on the value of the shape parameter of the Pareto rv's. Then, considering Pareto samples of various sizes and different values of the shape parameter, Furrer compared the distance between the empirical distribution and the theoretical limiting distribution, then computed the empirical VaR and TVaR and compared them with the ones computed from the limiting distribution. It appeared clearly that not only the choice of the limiting distribution matters, but also the rate of convergence, hence the way of aggregating the variables. From this study, we also notice that the normal approximation appears really inadequate when considering aggregated risks coming from a moderately heavy tail distribution, i.e. a Pareto with a shape parameter or tail index larger than 2 but below 4.
– A few comments can be added to this study. First, the numerical results obtained in [23] confirm what is already known in the literature. In particular, there are two main drawbacks when using the CLT for moderately heavy tail distributions (e.g. Pareto with a shape parameter larger than 2). On one hand, even if the CLT applies to the sample mean because of the finite variance, we also know that it provides a normal approximation with a very slow rate of convergence, which may be improved by removing extremes from the sample (see e.g. [27]). Hence, even if we are interested only in the sample mean, samples of small or moderate sizes will lead to a bad approximation. To improve the rate of convergence, the existence of moments of order larger than 2 is necessary (see e.g. §3.2 in [20], or, for more details, [39]). On the other hand, we know that it has also been proved theoretically (see

e.g. [11]) as well as empirically (see e.g. [13], §5.4.3) that the CLT approach applied to a heavy tail distributed sample does not bring any information on the tail, and therefore should not be used to evaluate risk measures. Indeed, a heavy tail may appear clearly on high frequency data (e.g. daily ones) but may no longer be visible when aggregating them into e.g. yearly data (i.e. short samples), although it is known, by the Fisher theorem, that the tail index of the underlying distribution remains constant under aggregation. It is a phenomenon on which many authors have insisted, as e.g. in [13]. The figures on the S&P 500 returns illustrate this last issue very clearly.

On the figures above, the QQ-plot of the S&P 500 daily returns from 1987 to 2007 helps to detect a heavy tail. When aggregating the daily returns into monthly returns, the QQ-plot looks more like a normal one, and the financial crises of 1998 and 1987 could, on this graph, be considered as outliers. Now, look at the figures below. When adding data from 2008 to 2013, the QQ-plot looks pretty much the same, i.e. normal, except that another "outlier" appears ... with the date of October 2008! Instead of looking again at daily data for the same years, let us consider a larger sample of monthly data from 1791 to 2013¹. With a larger sample size, the heavy tail becomes visible again. And now we see that the financial crisis of 2008 does belong to the heavy tail of the distribution and cannot be considered as an outlier anymore. So we clearly see the importance of the sample size when dealing with moderately heavy tails to estimate the risk. Thus we need a method that does not depend on the sample size, but looks at the shape of the tail.

¹ As compiled by Global Finance Data (https://www.globalfinancialdata.com/index.html)

• Objective. The main objective is to obtain the most accurate evaluation of the distribution of aggregated risks and of risk measures when working on financial data in the presence of fat tails. We explore various approaches to handle this problem, theoretically as well as empirically and numerically.

• Plan. After briefly reviewing the existing methods, from the General Central Limit Theorem (GCLT) to Extreme Value Theory (EVT), we will propose and develop two new methods, both inspired by the work of Zaliapin et al. (see [49]) in which the sum of n iid rv's is rewritten as the sum of the associated order statistics. The first method, named Normex, answers the question of how many largest order statistics explain the divergence between the underlying moderately heavy tail distribution and the normal approximation, whenever the CLT applies, and combines a normal approximation with the exact distribution of this number (independent of the size of the sample) of largest order statistics. It provides in general the sharpest results among the different methods, whatever the sample size and for any heaviness of the tail. The second method is empirical and consists of a weighted normal approximation. Of course, we cannot expect as sharp a result as the one obtained with Normex. However, it provides a simple tool allowing us to remain in the Gaussian realm. We introduce a shift in the mean and a weight in the variance, as correcting terms for the Gaussian parameters. Then we will proceed to an analytical comparison between the exact distribution of the Pareto sum and its approximation given by Normex, before turning to the

application to the evaluation of risk measures. Finally, a numerical study will follow, applying the various methods to simulated samples to compare the accuracy of the extreme quantiles used as risk measures in solvency calculations. With financial/actuarial applications in mind, and without loss of generality, we will use power law models for the marginal distributions of the risks, such as the Pareto distribution.

1.2 Preliminaries

• Main notation

Let Φ and ϕ denote, respectively, the cdf and the density function of the standard normal distribution $\mathcal{N}(0,1)$.

Let X be a random variable (rv), Pareto (type I) distributed with shape parameter α and cumulative distribution function (cdf) F defined by

$$\bar F(x) := 1 - F(x) = x^{-\alpha}, \quad \alpha > 0, \; x \ge 1 \qquad (2)$$

and probability density function (pdf) denoted by f. Note that the inverse function $F^\leftarrow$ of F is given by

$$F^\leftarrow(z) = (1-z)^{-1/\alpha}, \quad \text{for } 0 < z < 1 \qquad (3)$$
Recall that for $\alpha > 1$, $\mathbb{E}(X) = \frac{\alpha}{\alpha-1}$ and, for $\alpha > 2$, $\mathrm{var}(X) = \frac{\alpha}{(\alpha-1)^2(\alpha-2)}$.
We denote by $S_n$ the Pareto sum $S_n := \sum_{i=1}^n X_i$, $(X_i,\, i = 1,\dots,n)$ being an n-sample with parent rv X, and by $X_{(1)} \le \dots \le X_{(n)}$ the order statistics of $(X_i)_{1\le i\le n}$.

When dealing with financial assets (market risk data), we define the returns as
$$X_i^{(n)} := \ln P_i - \ln P_{i-n}, \quad n \ge 1$$
$P_i$ being the daily price, and n representing the aggregation factor. Note that we can also write
$$X_i^{(n)} = \sum_{j=i-n+1}^{i} X_j^{(1)}$$

In what follows, we will denote $X_i^{(1)}$ by $X_i$.
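As an aside, the inverse function (3) immediately yields a simulation recipe for such samples. The short R sketch below (ours, in the spirit of the codes collected in Appendix 7.3) draws an α-Pareto n-sample, forms the Pareto sum $S_n$, and checks the empirical moments against the formulas just recalled.

# Simulate an alpha-Pareto (type I) n-sample via the inverse transform (3)
rpareto <- function(n, alpha) (1 - runif(n))^(-1/alpha)

set.seed(1)
alpha <- 2.5; n <- 1e4
X  <- rpareto(n, alpha)
Sn <- sum(X)                                     # the Pareto sum S_n
c(emp.mean = mean(X), th.mean = alpha/(alpha - 1))
c(emp.var  = var(X),  th.var  = alpha/((alpha - 1)^2*(alpha - 2)))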

• Risk measures

– Definitions
Let us introduce the risk measures used in solvency calculations, namely the Value-at-Risk, denoted VaR, and the Expected Shortfall (also named Tail-Value-at-Risk) ES (or TVaR), of a rv X with cdf $F_X$ (and inverse function denoted by $F_X^\leftarrow$).

∗ The Value-at-Risk of order q of X is simply the quantile of $F_X$ of order q, $q \in (0,1)$:
$$VaR_q(X) = \inf\{y \in \mathbb{R} : \mathbb{P}[X > y] \le 1-q\} = F_X^\leftarrow(q)$$

∗ If $\mathbb{E}|X| < \infty$, the expected shortfall (ES) at confidence level $q \in (0,1)$ is defined as
$$ES_q(X) = \frac{1}{1-q}\int_q^1 VaR_\beta(X)\, d\beta \quad\text{or}\quad ES_q(X) = \mathbb{E}[X \mid X \ge VaR_q]$$
It can also be thought of as an average over all risks exceeding $VaR_q(X)$. This risk measure depends only on the tail of the cdf of X and satisfies $ES_q(X) \ge VaR_q(X)$.

Note that we will simplify the notation of those risk measures, writing $VaR_q$ or $ES_q$ when no confusion is possible.
– Pareto risks
In the case of an α-Pareto distribution, we deduce from (3) analytical expressions for those two risk measures, namely

$$VaR_q(X) = F^\leftarrow(q) = (1-q)^{-1/\alpha} \qquad (4)$$

and, if $X \in L^1$, i.e. if $\alpha > 1$, then
$$ES_q(X) = \frac{\alpha}{(\alpha-1)(1-q)}\,\big(VaR_q(X)\big)^{1-\alpha} = \frac{\alpha}{\alpha-1}\,(1-q)^{-1/\alpha} \qquad (5)$$

For α-Pareto iid rv’s, the risk measure VaR is asymptotically superadditive if α ∈ (0, 1) and subadditive if α ≥ 1.

Recall also that the shape parameter α totally determines the ratio $ES_q/VaR_q$ when we go far enough out into the tail:
$$\lim_{q\to 1} \frac{ES_q}{VaR_q} = (1 - 1/\alpha)^{-1} \;\text{ if } \alpha > 1, \text{ and } 1 \text{ otherwise.}$$
Note that this result holds also for the Generalized Pareto Distribution with shape parameter $\xi = 1/\alpha$, which will be defined later.
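In R, the closed forms (4) and (5) and the limiting ratio read as follows (a small sketch with helper names of our own):

# Analytical VaR and ES of an alpha-Pareto rv, from (4) and (5)
VaR.pareto <- function(q, alpha) (1 - q)^(-1/alpha)
ES.pareto  <- function(q, alpha) alpha/(alpha - 1)*(1 - q)^(-1/alpha)   # needs alpha > 1

alpha <- 3; q <- c(0.95, 0.99, 0.995)
rbind(VaR = VaR.pareto(q, alpha), ES = ES.pareto(q, alpha))
# far out in the tail, ES/VaR approaches (1 - 1/alpha)^(-1):
c(ratio = ES.pareto(0.99999, alpha)/VaR.pareto(0.99999, alpha), limit = 1/(1 - 1/alpha))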

– Aggregated risks
When looking at aggregated risks $\sum_{i=1}^n X_i$, it is well known that the risk measure ES is coherent (see [2]). In particular it is subadditive, i.e.

$$ES_q\Big(\sum_{i=1}^n X_i\Big) \le \sum_{i=1}^n ES_q(X_i)$$
whereas VaR is not a coherent measure, because it is not subadditive. Indeed many examples can be given where VaR is superadditive, i.e.

$$VaR_q\Big(\sum_{i=1}^n X_i\Big) \ge \sum_{i=1}^n VaR_q(X_i) \qquad \text{(see e.g. [18], [14])}$$
We have the following property.
Proposition 1.1 ([18]) Consider iid rv's $X_i$, $i = 1,\dots,n$, with parent rv X and cdf $F_X$. Assume they are regularly varying with tail index $\beta > 0$, which means that the right tail $1-F_X$ of the distribution satisfies
$$\lim_{x\to\infty} \frac{1-F_X(ax)}{1-F_X(x)} = a^{-\beta}, \quad \forall a > 0$$

Then the risk measure VaR is asymptotically subadditive for $X_1,\dots,X_n$ if and only if $\beta \ge 1$:
$$\lim_{q\nearrow 1} \frac{VaR_q\big(\sum_{i=1}^n X_i\big)}{\sum_{i=1}^n VaR_q(X_i)} \le 1 \;\Longleftrightarrow\; \beta \ge 1$$
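This dichotomy is easy to visualize by simulation; the following sketch (illustrative settings of ours, not the numerical study of Section 4) compares the VaR of the sum with the sum of the marginal VaRs for iid Pareto risks below and above tail index 1.

# Monte Carlo check of the sub/superadditivity of VaR for iid alpha-Pareto risks
set.seed(1)
var.ratio <- function(alpha, n = 5, N = 2e5, q = 0.999) {
  S <- replicate(N, sum((1 - runif(n))^(-1/alpha)))    # N realizations of the sum
  unname(quantile(S, q))/(n*(1 - q)^(-1/alpha))        # VaR of sum / sum of VaRs
}
var.ratio(0.8)    # alpha < 1: ratio above 1 (superadditive far in the tail)
var.ratio(2.5)    # alpha > 1: ratio below 1 (subadditive)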

Recently, numerical and analytical techniques have been developed in order to evaluate the risk measures VaR and ES under different dependence assumptions regarding the loss random variables. This certainly helps for a better understanding of the aggregation and diversification properties of risk measures, in particular of non-coherent ones such as VaR. We will not review these techniques and results in this report, but refer to [19] for an overview, and references therein. Let us add to those references some recent work by Mikosch and Wintenberger (see [33]) on large deviations under dependence, which allows an evaluation of VaR. Nevertheless, it is worth mentioning a new numerical algorithm that has been introduced by Embrechts and coauthors (see [19]), which allows for the computation of reliable lower and upper bounds for the VaR of high-dimensional (inhomogeneous) portfolios, whatever the dependence structure is. Quoting the authors, "surprisingly, additional positive dependence information (like positive correlation) does typically not improve the upper

bound substantially. In contrast, higher order marginal information on the model, when available, may lead to strongly improved bounds." This is good news since, in practice, typically only the marginal loss distribution functions are known or statistically estimated, while the dependence structure between the losses is either completely or partially unknown.

• Further comments or questions

– Is it still worth considering iid rv's, while most recent research focuses on dependent ones?
The previous quotation already answers this provocative question to a large extent, when evaluating the VaR or conservative values for the VaR. Moreover, in the case of aggregated Pareto risks, the bounds provided in [19] turn out to be very close, whatever the dependency is, as noticed by H. Bühlmann in his talk at the symposium for Paul Embrechts' 60th birthday (see [6]). Another theoretical reason comes from Extreme Value Theory (EVT); indeed, we know that the tail index of the aggregated distribution corresponds to the one of the marginal with the heaviest tail, hence does not depend on the issue of dependence. Finally, there was still, mathematically, a missing 'brick' in the study of the behavior of the sum of iid rv's with a moderately heavy tail, for which the CLT applies (for the center of the distribution!) but with a slow convergence for the mean behavior, and which certainly does not provide a satisfactory approximation for the tail. Our study aims at filling this gap, by looking for an appropriate limit distribution.

– Why consider the Pareto distribution?
Again, this is justified by EVT. It is enough to go back to Pickands' theorem, which states that for a sufficiently high threshold u, the GPD $G_{\xi,\sigma(u)}$ (with shape parameter ξ and scale parameter σ(u)) is a very good approximation of the excess cdf defined by $F_u(x) = \mathbb{P}[X-u \le x \mid X > u]$:

$$F_u(y) \underset{u\to\infty}{\approx} G_{\xi,\sigma(u)}(y)$$
and to recall that, for $\xi > 0$,
$$\bar G_{\xi,\sigma(u)}(y) \underset{y\to\infty}{\sim} c\, y^{-1/\xi} \qquad (c > 0 \text{ some constant})$$

Hence it is quite natural and reasonable to consider iid Pareto rv’s for this study.

– A last remark concerns the parameter α, which we consider as given in our study. A prerequisite, when working on real data, would be to estimate α. Recall that there are various ways to test the presence of a heavy tail and to estimate the tail index, using e.g. the Hill estimator (see [28]) or the QQ-estimator (see [31]). We will not provide an inventory of these methods, except a brief recall, in the next section, of an important empirical EVT method used for estimating the heaviness of the tail. Let us also mention a test, easy to use in practice, for the existence of fat tails, namely the scaling law (see [13], §5.5). It consists of comparing the two plots, for p = 1 and 2 respectively, of $\big(\ln(n),\, \ln \|\sum_{i=1}^n X_i\|_p\big)$, $n \ge 1$; if the scaling exponent for p = 1 is larger than for p = 2, then it is a sign of the existence of a fat tail. For financial data, there are numerous studies that show the existence of fat tails, with a tail index between 2 and 4 for developed markets (see e.g. [12] and references therein).

2 Existing methods

Limit theorems for the sum of iid rv's are well known. Nevertheless, they can be misused in practice, for various reasons, among which a too small sample size, as we have seen. As a consequence, this leads to wrong estimations of the risk measures for aggregated data. To help practitioners become sensitive to this issue, we consider the simple example of aggregated heavy-tailed risks, where the risks are represented by iid Pareto random variables. We start by reviewing the existing methods, from the General Central Limit Theorem (GCLT) to Extreme Value Theory (EVT), and apply them to simulated Pareto samples to show the pros and cons of those methods, especially for the evaluation of risk measures.

2.1 A GCLT approach

2.1.1 Limit theorems

• For the sake of completeness, let us recall the General Central Limit Theorem (GCLT) (see e.g. [44]), which states that the properly normalized sum of a large number of iid rv's belonging to the domain of attraction of an α-stable law may be approximated by a stable distribution with index α ($0 < \alpha \le 2$).

Theorem 2.1 (GCLT)
Let $(X_i,\, i \ge 1)$ be iid rv's, with mean μ and variance $\sigma^2$ (when they exist), with parent rv X and parent distribution F attracted by an α-stable law $F_\alpha$ (we say that X belongs to the domain of attraction of $F_\alpha$). Suppose that $\alpha > 0$.

(i) If $\mathbb{E}(X^2) < \infty$, then
$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \;\overset{d}{\to}\; \Phi$$
(ii) If $\mathbb{E}(X^2) = \infty$ and $\alpha = 2$, or if $\alpha < 2$, then
$$\frac{S_n - \mu_n}{n^{1/\alpha}L(n)} \;\overset{d}{\to}\; G_\alpha \quad \text{α-stable distribution}$$
with $\mu_n := n \int_{|x|\le a_n} x\, dF(x)$, $a_n = \inf\{x : \mathbb{P}(|X| > x) < 1/n\}$, and L an appropriate slowly varying function, i.e. satisfying $\lim_{x\to\infty} L(tx)/L(x) = 1$, $\forall t > 0$.

• In our specific case where we consider iid Pareto rv’s with shape parameter α, their sum Sn might then be approximated by a stable distribution whenever 0 < α < 2 (via the GCLT) and by a standard normal distribution for α ≥ 2 (via the CLT for α > 2; for α = 2, it comes back to a normal limit with a variance different from var(X), which does not exist):

$$\text{If } 0 < \alpha < 2, \qquad \frac{S_n - b_n}{n^{1/\alpha}C_\alpha} \;\overset{d}{\to}\; G_\alpha \qquad (6)$$

$$\text{If } \alpha \ge 2, \qquad \frac{1}{d_n}\Big(S_n - \frac{n\alpha}{\alpha-1}\Big) \;\overset{d}{\to}\; \Phi \qquad (7)$$
with
$$b_n = \begin{cases} 0 & \text{if } 0 < \alpha < 1 \\[0.5ex] \dfrac{\pi n}{2}\displaystyle\int_1^\infty \sin\Big(\dfrac{\pi x}{2n}\Big)\, dF(x) \;\simeq\; n\big(\log n + 1 - C - \log(2/\pi)\big) & \text{if } \alpha = 1 \\[1.5ex] n\alpha/(\alpha-1) & \text{if } 1 < \alpha < 2 \end{cases} \qquad (8)$$
(C = Euler constant 0.5772)
$$C_\alpha = \begin{cases} \big(\Gamma(1-\alpha)\cos(\pi\alpha/2)\big)^{1/\alpha} & \text{if } \alpha \ne 1 \\ \pi/2 & \text{if } \alpha = 1 \end{cases} \qquad (9)$$
$$d_n = \begin{cases} \dfrac{\sqrt{n\alpha}}{(\alpha-1)\sqrt{\alpha-2}} & \text{if } \alpha > 2 \\[1.5ex] \inf\Big\{x : \dfrac{2n\log x}{x^2} \le 1\Big\} & \text{if } \alpha = 2 \end{cases} \qquad (10)$$
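For later use, the constants (8)-(10) can be coded directly; in the sketch below (our implementation), the α = 1 centering is computed numerically with integrate, and the α = 2 scaling by root search on [√n, n], where the defining function changes sign for n ≥ 6.

# Norming constants (8)-(10) for the alpha-Pareto sum
b_n <- function(n, alpha) {
  if (alpha < 1) return(0)
  if (alpha > 1) return(n*alpha/(alpha - 1))
  # alpha = 1: the integral in (8) with dF(x) = x^(-2) dx
  pi*n/2*integrate(function(x) sin(pi*x/(2*n))*x^(-2), 1, Inf,
                   subdivisions = 1000L)$value
}
C_alpha <- function(alpha)
  if (alpha == 1) pi/2 else (gamma(1 - alpha)*cos(pi*alpha/2))^(1/alpha)
d_n <- function(n, alpha) {
  if (alpha > 2) return(sqrt(n*alpha)/((alpha - 1)*sqrt(alpha - 2)))
  uniroot(function(x) 2*n*log(x)/x^2 - 1, c(sqrt(n), n))$root   # alpha = 2, n >= 6
}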

2.1.2 Evaluation of risk measures

Limit theorems can be used to approximate the distribution of aggregated risks and, then, to evaluate the risk measures. In this report, we will take the example of the VaR, since it is the risk measure used in Basel II and Solvency 2.

• Well known approximation techniques based on the General Central Limit Theorem (GCLT) establish the general limit for the Pareto sum in terms of stable distributions; these approximations work well for large numbers of summands. Many studies, performed in practice or in theory, refer to this approach (see e.g. [44], [49], [23]). In particular, these approximation techniques have been presented and discussed in Zaliapin et al. (see [49]) for approximating the distribution of the sum of iid Pareto rv's with shape parameter $0 < \alpha < 2$, in order to evaluate quantiles of the distribution of the sum with arbitrary numbers of summands. They have also been discussed in Furrer (see [23]) for a positive shape parameter α. Note that in this last reference, both risk measures, VaR and ES, have been considered. With the GCLT approach, an approximation of an arbitrary quantile $z_q$ of the Pareto sum $S_n$, defined by $\mathbb{P}(S_n \le z_q) = q$, $0 \le q \le 1$, $n > 1$, is given by:
(i) for $0 < \alpha < 2$ (see e.g. [49] or [23]),

$$z_q^{(1)} := n^{1/\alpha} C_\alpha x_q + b_n \quad \text{where } x_q \text{ solves the equation } G_\alpha(x_q) = q \qquad (11)$$

with $b_n$ defined in (8), $C_\alpha$ in (9), and $G_\alpha$ the limiting α-stable distribution defined in (6);
(ii) for $\alpha \ge 2$ (see e.g. [23]),
$$z_q^{(2)} := d_n \tilde x_q + \frac{n\alpha}{\alpha-1} \quad \text{where } \tilde x_q \text{ solves the equation } \Phi(\tilde x_q) = q \qquad (12)$$

with $d_n$ defined in (10).
• Another approximation for upper quantiles has been proposed in [49], when considering the distribution of the sum of iid Pareto rv's with shape parameter $1/2 < \alpha < 2$. It uses the simple asymptotic $(C_\alpha x)^{-\alpha}$ for the tail of a stable distribution, with $C_\alpha$ defined in (9) (see e.g. [44]), which has the advantage of providing a closed form expression for the approximated upper quantile, namely, for $q > 0.95$,
$$z_q^{(3)} := n^{1/\alpha}(1-q)^{-1/\alpha} + b_n \qquad (13)$$

bn being defined in (8).
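In code (a sketch reusing b_n and d_n from above; the stable quantile $x_q$ in (11) would additionally require a stable-distribution package, so only (12) and the closed form (13) are shown):

# GCLT-based quantile approximations of the Pareto sum
z2 <- function(q, n, alpha) d_n(n, alpha)*qnorm(q) + n*alpha/(alpha - 1)     # (12), alpha >= 2
z3 <- function(q, n, alpha) n^(1/alpha)*(1 - q)^(-1/alpha) + b_n(n, alpha)   # (13), 1/2 < alpha < 2

z2(0.995, n = 100, alpha = 2.5)   # approximate 99.5% quantile of S_100, alpha = 2.5
z3(0.995, n = 100, alpha = 1.5)   # same level, alpha = 1.5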

2.2 An EVT approach

When focusing on risk measures such as VaR or ES, the information on the entire distribution is not necessary. Hence the alternative EVT approach, which focuses on the tail of the distribution, makes sense in order to estimate the risk measures.

2.2.1 GEV approach

For heavy tail distributions, it has been proved that the tail of the distribution of the maximum determines the tail of the distribution of the sum (see e.g. [20] or [32]). Therefore the Fisher-Tippett three-types theorem (see [22]) is needed, providing the limiting distribution for the rescaled sample maximum, which can only be of three types: Fréchet, Weibull and Gumbel. The three types of extreme value distribution have been combined into a single three-parameter family (Jenkinson-Von Mises, 1955; ...; Hosking et al., 1985) known as the Generalized Extreme Value (GEV) distribution, given by
$$H_\xi(x) = H_{\mu,\sigma,\xi}(x) = \exp\left[-\Big(1 + \xi\,\frac{x-\mu}{\sigma}\Big)^{-1/\xi}\right], \quad \text{for } 1 + \xi\,\frac{x-\mu}{\sigma} > 0$$
with $\sigma > 0$ (scale parameter), $\mu \in \mathbb{R}$ (location parameter), $\xi \in \mathbb{R}$ (shape parameter). The shape parameter ξ determines the nature of the tail distribution: ξ > 0: Fréchet; ξ = 0: Gumbel; ξ < 0: Weibull.

Under the assumption of regular variation of the tail distribution, the tail of the cdf of the sum of iid rv’s is mainly determined by the tail of the cdf of the maximum of these rv’s. Indeed, we have:

Corollary 2.1 (see e.g. [20]) Assume that $X_i$, $i = 1,\dots,n$, are iid rv's with cdf $F_X$ having a regularly varying tail with tail index $\beta \ge 0$; then, for all $n \ge 1$,

$$\bar F^{\star n}(x) \sim n\,\bar F(x) \quad \text{as } x\to\infty$$

which means that
$$\mathbb{P}[S_n > x] \sim \mathbb{P}\big[\max_{1\le i\le n} X_i > x\big] \quad \text{as } x\to\infty$$
It applies of course to Pareto rv's.

It is one of the approaches applied in [49], replacing the Pareto sum with tail index $0 < \alpha < 2$ by its maximal summand. It provides another possible approximation for an upper quantile, namely

$$z_q^{(4)} := n^{1/\alpha}\big(\log(1/q)\big)^{-1/\alpha} + b_n \qquad (14)$$

with $b_n$ defined in (8). This approximation leads to less than 10% relative error (defined by $z_q^{(4)}/z_q - 1$) for the upper quantiles when the Pareto shape parameter satisfies $\alpha < 1$, and becomes large otherwise.
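A quick simulation (with settings of our own choosing) shows both the quality of (14) for α < 1 and its deterioration above 1:

# Max-summand approximation (14) against an empirical quantile of S_n
z4 <- function(q, n, alpha) n^(1/alpha)*log(1/q)^(-1/alpha) + b_n(n, alpha)
set.seed(1)
rel.err <- function(alpha, n = 100, q = 0.99, N = 2e4) {
  S <- replicate(N, sum((1 - runif(n))^(-1/alpha)))
  z4(q, n, alpha)/unname(quantile(S, q)) - 1
}
rel.err(0.8)   # small relative error for alpha < 1
rel.err(1.5)   # much larger for alpha > 1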

2.2.2 Mean Excess Plot approach

For completeness, to estimate the heaviness of a tail in empirical data, let us briefly recall the Mean Excess Plot (MEP) approach, introduced by Davison and Smith (see [16]; see also e.g. [20], or [24] for the asymptotic behavior of ME plots for large thresholds).

Mean Excess Plot (MEP)

Recall that the excess cumulative distribution function (cdf) $F_u$ of a random variable X over a threshold $u \in \mathbb{R}$ is defined in the Peaks Over Threshold (POT) approach as the cdf of $X-u$ conditioned on $X > u$, namely
$$F_u(x) = \mathbb{P}(X - u \le x \mid X > u), \quad x > 0.$$
The corresponding mean excess function e of X is defined by

$$e(u) = \mathbb{E}(X - u \mid X > u)$$
whenever it exists.

The plot of the mean excess function e is a useful graphical tool to help distinguish between heavy and light tails. Heavy-tailed distribution functions have a mean excess function tending to infinity, typically along an asymptotically straight line; an exponential distribution with parameter λ has a constant mean excess function equal to 1/λ, whereas distribution functions with tails decaying faster than exponentially are characterized by a mean excess function tending to 0.

In practice, one uses the empirical mean excess plot
$$\big\{(X_{(k)},\, e_n(X_{(k)})) : k \in \{1,\dots,n-1\}\big\}, \qquad (15)$$
where $e_n(u)$ is the empirical mean excess function, defined using the empirical cumulative distribution function by
$$e_n(u) = \frac{1}{N_u}\sum_{j\in I_n(u)} (X_j - u) \quad\text{with } I_n(u) = \{j : 1\le j\le n,\, X_j > u\} \text{ and } N_u = \mathrm{card}(I_n(u)).$$
The threshold u is often chosen in practice as one of the order statistics, leading to considering
$$e_n(X_{(k)}) = \frac{1}{n-k}\sum_{j=k+1}^n X_{(j)} - X_{(k)}.$$
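A direct R version of $e_n$ evaluated at the order statistics, as in (15) (a sketch of ours):

# Empirical mean excess function at the order statistics, as in (15)
mep <- function(x) {
  xs <- sort(x); n <- length(xs); k <- 1:(n - 1)
  e  <- rev(cumsum(rev(xs)))[k + 1]/(n - k) - xs[k]   # mean of xs[(k+1):n], minus X_(k)
  list(u = xs[k], e = e)
}
set.seed(1)
m <- mep((1 - runif(2000))^(-1/2.5))    # Pareto(2.5): e_n increases roughly linearly
plot(m$u, m$e, xlab = "threshold u", ylab = "e_n(u)")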

Fitting a GPD to the excesses over a threshold

Pickands proved in [40] that, for a sufficiently high threshold u, the excess cdf $F_u$ of any random variable X in the domain of attraction of an extreme value distribution can be well approximated by a GPD $G_{\xi,\sigma(u)}$, with shape parameter ξ and scale parameter $\sigma = \sigma(u) > 0$:
$$G(y) = G_{\xi,\sigma(u)}(y) = \begin{cases} 1 - \Big(1 + \dfrac{\xi y}{\sigma(u)}\Big)^{-1/\xi} & \text{if } \xi \ne 0, \\[1.5ex] 1 - \exp\Big(-\dfrac{y}{\sigma(u)}\Big) & \text{otherwise,} \end{cases}$$
where $y \ge 0$ if $\xi \ge 0$ and $0 \le y \le -\sigma(u)/\xi$ if $\xi < 0$. Most of the "textbook" random variables are in the domain of attraction of some extreme value distribution, so the above approximation of the excess cdf is very general. The shape parameter ξ > 0 arises when X is heavy tailed, while ξ ≤ 0 corresponds to light tails. Therefore, the parameters of the fitted GPD provide information on the tail of X.

An important question is how to select an appropriate high threshold u; we choose it by plotting the empirical mean excess function and picking u in the range where the latter appears to be linear or stable. The parameters of a GPD can be estimated via different methods. We will use the method of moments; see [29]. If $(Y_j)_{1\le j\le N_u}$ denote the excesses over a given threshold u in a given sample, then the moment estimators of the parameters ξ and σ(u) of the approximating GPD are given, respectively, by

$$\hat\xi = \frac12\Big(1 - \frac{\bar Y^2}{S_Y^2}\Big) \quad\text{and}\quad \hat\sigma = \hat\sigma(u) = \frac{\bar Y}{2}\Big(1 + \frac{\bar Y^2}{S_Y^2}\Big), \qquad (16)$$

where $\bar Y$ and $S_Y^2$ are the sample mean and variance of the excesses:
$$\bar Y = \frac{1}{N_u}\sum_{i=1}^{N_u} Y_i \quad\text{and}\quad S_Y^2 = \frac{1}{N_u-1}\sum_{i=1}^{N_u}\big(Y_i - \bar Y\big)^2.$$
Provided that the shape parameter satisfies $\xi < 1/4$, it can be shown by standard methods that the random vector $(\hat\sigma, \hat\xi)$ is asymptotically normal with covariance matrix A satisfying, as the sample size increases,
$$N_u\, A \sim \Gamma = \frac{(1-\xi)^2}{(1-2\xi)(1-3\xi)(1-4\xi)}\,(a_{ij})_{1\le i,j\le 2},$$

with $a_{11} = 2\sigma^2(u)(1 - 6\xi + 12\xi^2)$, $a_{22} = (1-2\xi)^2(1 - \xi + 6\xi^2)$ and $a_{12} = a_{21} = \sigma(u)(1-2\xi)(1 - 4\xi - 12\xi^2)$, from which a confidence interval with asymptotic confidence level γ can be deduced:
$$\begin{pmatrix}\hat\sigma\\ \hat\xi\end{pmatrix} + \Big(\frac{1}{N_u}\Gamma\Big)^{1/2}\begin{pmatrix}z_{(1-\gamma)/2}\\ z_{(1-\gamma)/2}\end{pmatrix} \;\le\; \begin{pmatrix}\sigma(u)\\ \xi\end{pmatrix} \;\le\; \begin{pmatrix}\hat\sigma\\ \hat\xi\end{pmatrix} + \Big(\frac{1}{N_u}\Gamma\Big)^{1/2}\begin{pmatrix}z_{(1+\gamma)/2}\\ z_{(1+\gamma)/2}\end{pmatrix} \qquad (17)$$

with $z_q$ denoting the q-th quantile of the standard normal distribution.
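The estimators (16) and interval (17) in R (a sketch; the matrix square root of Γ is taken via an eigendecomposition):

# GPD moment estimators (16) with asymptotic confidence intervals (17)
gpd.mom <- function(Y, level = 0.99) {
  Nu <- length(Y); Yb <- mean(Y); S2 <- var(Y)
  xi  <- (1 - Yb^2/S2)/2
  sig <- Yb/2*(1 + Yb^2/S2)
  stopifnot(xi < 1/4)                       # condition for asymptotic normality
  G <- (1 - xi)^2/((1 - 2*xi)*(1 - 3*xi)*(1 - 4*xi))*
    matrix(c(2*sig^2*(1 - 6*xi + 12*xi^2),
             sig*(1 - 2*xi)*(1 - 4*xi - 12*xi^2),
             sig*(1 - 2*xi)*(1 - 4*xi - 12*xi^2),
             (1 - 2*xi)^2*(1 - xi + 6*xi^2)), 2, 2)
  ev <- eigen(G)
  Gh <- ev$vectors %*% diag(sqrt(pmax(ev$values, 0))) %*% t(ev$vectors)  # Gamma^(1/2)
  zl <- qnorm((1 - level)/2); zu <- qnorm((1 + level)/2)
  list(sigma = sig, xi = xi,
       lower = c(sig, xi) + Gh %*% c(zl, zl)/sqrt(Nu),
       upper = c(sig, xi) + Gh %*% c(zu, zu)/sqrt(Nu))
}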

Application of the MEP approach on a sample

We proceed in a few steps to apply the MEP approach.

• we plot the empirical mean excess function, in order to judge whether it appears to increase linearly for large levels, or to decay to 0;

• on the MEP, we select a level u in the range where $e_n$ looks approximately linear or stable. It is well known that selecting a proper threshold u is not an easy task, as it implies a balance between bias and variance: too high a value of u leads to too few exceedances and, consequently, high variance of the estimators, whereas too small a value of u increases the bias of the estimators. The standard practice is to adopt as low a threshold as possible, subject to the limiting GPD model providing a reasonable approximation to the empirical tail. We assess this graphically as well;

• having selected a level u, we estimate the corresponding GPD parameters as in (16), with the associated asymptotic confidence intervals (CI), given in (17), at the confidence level 99%.

3 New approaches - mixed limit theorems

3.1 Introduction

An alternative approach to the GCLT one has been proposed by Zaliapin et al. (see [49]) when the Pareto shape parameter satisfies $1/2 < \alpha < 2$, i.e. in the case where the variance of the Pareto rv's $X_i$ does not exist. The neat idea of the method is to rewrite the sum of the $X_i$'s as the sum of the order statistics $X_{(i)}$, $i = 1,\dots,n$, and, for $2/3 < \alpha < 2$, to separate it into two terms: one with the first n − 2 order statistics, which have finite variance, and the other with the complement. They can then treat these two sub-sums separately. Even if not always rigorously developed in this paper, or, say, quite approximate, as we will see later, their method provides a better approximation for the Pareto sum than the GCLT does, for an arbitrary number of summands and with a higher degree of accuracy; it also gives a better result for the evaluation of the Value-at-Risk than the GCLT approach does. Nevertheless, there are some mathematical issues in this paper. One of them is that the authors treat the two sub-sums as independent. Another one is that they approximate the quantile of the total (Pareto) sum by the direct summation of the quantiles of each sub-sum, although quantiles are not additive. For the case $1/2 < \alpha < 2/3$, they reduce the behavior of the sum arbitrarily to the last two upper order statistics.

We are mainly interested in the case of a shape parameter larger than 2, since it is the usual case when studying market risk data, for instance. In such a case, the CLT applies because of the finiteness of the 2nd moment, but it provides wrong results for the tails, as expected. Indeed, the CLT only concentrates on the average behavior; it is equivalent to the CLT on the trimmed sum (i.e. $S_n$ minus a given number of the largest order statistics) (see [35]), which emphasizes that the tail is not considered. Moreover, it is known that the rate of convergence improves for trimmed sums (see [27]). As already mentioned, a fat tail behavior may clearly appear on high frequency data but no longer when aggregating data or when considering short samples, although it is well known that the shape parameter of the underlying distribution remains constant under aggregation. Hence we really have to be aware that using the CLT to obtain information on anything other than the average is simply wrong in the presence of fat tails, even if in some situations the plot of the empirical distribution fits a normal one well, as seen in the example of monthly stock index data in the introduction.

In this study, we go further in the direction of separating mean and extreme behaviors in order to improve the approximations, for any α, and we build two alternative methods. This also means answering rigorously the question of how many largest order statistics $X_{(n-j)}$ would be needed to explain the divergence between the underlying distribution and the normal approximation when considering a Pareto sum with $\alpha \ge 2$, or the stable approximation when considering a Pareto sum with $\alpha < 2$. Both methods rely on a first idea, inspired by Zaliapin et al.'s paper, which consists of splitting the Pareto sum into a trimmed sum to which the CLT applies, and another sum with the remaining largest order statistics. The main idea of the two methods is to determine in an 'optimal way', which we are going to explain, the number k that corresponds to a threshold when splitting the sum of order statistics into two sums, the second one containing the k largest order statistics, and then to treat this second sum in a specific way. Our two methods differ from each other in two respects:

• the way of selecting this number k

• the way of evaluating the sum of the k largest order statistics, which is of course related to the choice of k.

Moreover, our way of providing approximations for the distribution of the Pareto sum is general enough to apply to any α > 0, so we also come back to the case α ≤ 2, although it is not our main focus. Then we deduce the evaluation of the VaR. Although the study is developed on the Pareto example, note that its goal is to propose a method that may be applied to other examples and to real data; hence this choice of looking for limit theorems in order to approximate the true (and most of the time unknown) distribution.

3.1.1 On the selection of the threshold for the best mean behavior of aggregated heavy tail distributed risks

Let us start by studying the behavior of the trimmed sum $T_k$ when writing down the sum $S_n$ of the iid α-Pareto rv's (with $\alpha > 0$), $S_n := \sum_{i=1}^n X_i$, as
$$S_n = T_k + U_{n-k} \quad\text{with}\quad T_k := \sum_{j=1}^{n-k} X_{(j)} \quad\text{and}\quad U_{n-k} := \sum_{j=0}^{k-1} X_{(n-j)} \qquad (18)$$
Much literature since the 80's has been concerned with the behavior of trimmed sums obtained by removing extremes from the sample; see e.g. [27], [35], [26].
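In code, the decomposition (18) is immediate on the sorted sample (a sketch):

# Split S_n into the trimmed sum T_k and the top-k sum U_{n-k}, as in (18)
split.sum <- function(x, k) {
  xs <- sort(x); n <- length(xs)
  list(Tk = sum(xs[1:(n - k)]), Unk = sum(xs[(n - k + 1):n]))
}
set.seed(1)
x <- (1 - runif(1000))^(-1/1.5)            # alpha = 1.5
s <- split.sum(x, k = 2)
c(split = s$Tk + s$Unk, direct = sum(x))   # identical: the decomposition is exact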

The main issue is the choice of the threshold k, in order to use the CLT but also to improve its fit, since we want to approximate the behavior of $T_k$ by a normal one. We know that a necessary and sufficient condition for the CLT to apply to $T_k$ is to require the summands $X_{(j)}$, $j = 1,\dots,n-k$, to be $L^2$-rv's. Yet, we also know that requiring only the finiteness of the 2nd moment may lead to a poor normal approximation if higher moments do not exist, as occurs for instance with financial market data. In particular, including the finiteness of the third moment provides a better rate of convergence to the normal distribution in the CLT (Berry-Esséen inequality). Another piece of information that may be quite useful to improve the approximation of the distribution of $S_n$ by its limit distribution is the Fisher index, defined by the ratio $\gamma = \frac{\mathbb{E}[(X-\mathbb{E}(X))^4]}{(\mathrm{var}(X))^2}$, which is a kurtosis index. We know that for the standard normal distribution it equals 3; if γ < 3, the distribution is flatter (hypo-normal), whereas γ > 3 indicates a hyper-normal distribution. The following Edgeworth expansion, involving the Hermite polynomials $(H_n,\, n \ge 0)$, points out that requiring the finiteness of the 4th moment appears as what we call the 'optimal' solution (of course, the higher the order of existing moments, the finer the normal approximation becomes, but this would imply too strong conditions, difficult to check in practice). If $F_n$ denotes the cdf of the normalized sum $(S_n - n\mathbb{E}(X))/\sqrt{n\,\mathrm{var}(X)}$, then
$$F_n(x) - \Phi(x) = \frac{1}{\sqrt n}\, Q_1(x) + \frac{1}{n}\, Q_2(x) + o(1/n) \qquad (19)$$
with
$$Q_1(x) = -\varphi(x)\,\frac{H_2(x)}{6}\,\frac{\mathbb{E}[(X-\mathbb{E}(X))^3]}{(\mathrm{var}(X))^{3/2}}$$
$$Q_2(x) = -\varphi(x)\left\{\frac{H_5(x)}{72}\,\frac{\big(\mathbb{E}[(X-\mathbb{E}(X))^3]\big)^2}{(\mathrm{var}(X))^3} + \frac{H_3(x)}{24}\,(\gamma - 3)\right\}$$
and
$$H_2(x) = x^2 - 1; \quad H_3(x) = x^3 - 3x; \quad H_5(x) = x^5 - 10x^3 + 15x$$

Note that the skewness $\mathbb{E}[(X-\mathbb{E}(X))^3]/(\mathrm{var}(X))^{3/2}$ of X and γ − 3 measure the closeness of the cdf F to Φ, and would vanish if X were Gaussian. Hence we will choose k based on the condition that the fourth moment of the first n − k order statistics does exist. This condition is naturally linked to the value of α.

Therefore we select the threshold $k = k(\alpha)$ such that
$$\mathbb{E}\big(|X_{(j)}|^p\big) \begin{cases} < \infty & \forall j \le n-k \\ = \infty & \forall j > n-k \end{cases} \qquad (20)$$
which, applied to our case of α-Pareto iid rv's, using (25), gives:
$$k > \frac{p}{\alpha} - 1 \qquad (21)$$
In our case, we set p = 4 to require the existence of the 4th moment of the summands of $T_k$ (nevertheless we prefer to keep the notation p so that it remains general). Indeed, such a choice provides what we call an 'optimal' approximation since both skewness and kurtosis are taken into account. It may also be a way to limit the (small) number of largest observations appearing in the second sum $U_{n-k}$, by choosing the smallest k satisfying (21), namely $k = k(\alpha) = [p/\alpha - 1] + 1$ (where [z] denotes the largest integer smaller than $z \in \mathbb{R}$); this argument will be used in one of the methods. Note that this condition is independent of the size n of the sample.
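This rule is a one-liner; the sketch below implements the strict inequality in (21) and reproduces Table 2 below (up to the boundary point α = 2, which Table 2 assigns to k = 1):

# Threshold k(alpha): smallest integer k with k > p/alpha - 1, cf. (21)
k.alpha <- function(alpha, p = 4) floor(p/alpha - 1) + 1
sapply(c(0.55, 0.6, 0.7, 0.9, 1.2, 1.5, 3), k.alpha)   # 7 6 5 4 3 2 1, as in Table 2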

Let us summarize in the table below the necessary and sufficient condition on α for the existence of the p-th moments of the upper order statistics, for p = 2, 3, 4 (and for α > 1/4; we could of course complete the table for any choice of α > 0 using (25)), using (21) written as $\alpha > \frac{p}{k+1}$.

k | moment | p = 2 | p = 3 | p = 4
0 | E(X_(n)^p) < ∞ | iff α > 2 | iff α > 3 | iff α > 4
1 | E(X_(n−1)^p) < ∞ | iff α > 1 | iff α > 3/2 | iff α > 2
2 | E(X_(n−2)^p) < ∞ | iff α > 2/3 | iff α > 1 | iff α > 4/3
3 | E(X_(n−3)^p) < ∞ | iff α > 1/2 | iff α > 3/4 | iff α > 1
4 | E(X_(n−4)^p) < ∞ | iff α > 2/5 | iff α > 3/5 | iff α > 4/5
5 | E(X_(n−5)^p) < ∞ | iff α > 1/3 | iff α > 1/2 | iff α > 2/3
6 | E(X_(n−6)^p) < ∞ | iff α > 2/7 | iff α > 3/7 | iff α > 4/7
7 | E(X_(n−7)^p) < ∞ | iff α > 1/4 | iff α > 3/8 | iff α > 1/2

Table 1: Necessary and sufficient condition on α for having E(|X_(n−k)|^p) < ∞

from which we deduce the value of the threshold k = k(α) satisfying (21) for which the 4th moment is finite, according to the range of α:

26 1 4 4 2 2 4 4 4 4 α ∈ I(k) with I(k) = ] 2 ; 7 ] ] 7 ; 3 ] ] 3 ; 5 ] ] 5 ; 1 ]1; 3 ] ] 3 ; 2[ [2,4] k = k(α) = 7 6 5 4 3 2 1

4 Table 2: Value of k(α) for having up to E(|X(n−k(α))| ) < ∞

We notice from this table that we would use Zaliapin et al.'s decomposition $S_n = \sum_{j=1}^{n-2} X_{(j)} + \sum_{j=0}^{1} X_{(n-j)}$ only when $\alpha \in\, ]4/3; 2[$, using then the limit distribution of each term to approximate the distribution of $S_n$ (a normal one for the first sum, and the exact joint distribution of the two largest observations for the second one). When considering, as they do, $\alpha > 2/3$, we would rather introduce the decomposition $S_n = \sum_{j=1}^{n-5} X_{(j)} + \sum_{j=0}^{4} X_{(n-j)}$ instead of the previous one, to improve the approximation of the distribution of $S_n$.

Let us turn now to the construction of two alternative methods, each one having its pros and cons, to fit not only the mean behavior of the sum of heavy tail distributed rv’s (aggregated heavy tailed risks) but also its tail, so that a reasonable approximation of the associated risk measures may be offered. Our methods will thus result in good approximations for the entire distribution, center and tail.

Just before doing so, let us briefly recall some known properties (see e.g. [15]) and add some new ones, obtained after straightforward computations, since our methods are based on order statistics.

3.1.2 Order statistics

• Distribution of order statistics

Recall that the pdf $f_{(i)}$ of $X_{(i)}$ ($1 \le i \le n$) is given by
$$f_{(i)}(x) = \frac{1}{B(i,\, n-i+1)}\, F^{i-1}(x)\,\big(1-F(x)\big)^{n-i} f(x) = \frac{n!}{(i-1)!\,(n-i)!}\; \alpha\, (1-x^{-\alpha})^{i-1}\, x^{-\alpha(n-i+1)-1} \qquad (22)$$
and that the joint pdf $f_{(n_1)\dots(n_k)}$ of the α-Pareto order statistics $X_{(n_j)}$, $j = 1,\dots,k$, with $1 \le n_1 < \dots < n_k \le n$, $1 \le k \le n$, is, for $1 < x_1 \le \dots \le x_k$,
$$f_{(n_1)\dots(n_k)}(x_1,\dots,x_k) = n!\;\frac{(F(x_1))^{n_1-1}}{(n_1-1)!}\;\frac{(1-F(x_k))^{n-n_k}}{(n-n_k)!}\;\prod_{j=1}^{k} f(x_j)\prod_{j=1}^{k-1}\frac{\big(F(x_{j+1})-F(x_j)\big)^{n_{j+1}-n_j-1}}{(n_{j+1}-n_j-1)!} \qquad (23)$$
$$= n!\;\alpha^k\prod_{j=1}^{k}\frac{1}{x_j^{\alpha+1}}\;\prod_{j=0}^{k}\frac{\big(x_j^{-\alpha}-x_{j+1}^{-\alpha}\big)^{n_{j+1}-n_j-1}}{(n_{j+1}-n_j-1)!}$$

setting $x_0 := 1$, $x_{k+1} := +\infty$, $n_0 := 0$ and $n_{k+1} := n+1$. It becomes, for successive order statistics, for $i \ge 1$, $j \ge 2$, with $i+j \le n$,
$$f_{(i+1)\dots(i+j)}(x_1,\dots,x_j) = \frac{n!}{i!\,(n-i-j)!}\; F^i(x_1)\,\big(1-F(x_j)\big)^{n-i-j}\prod_{l=1}^{j} f(x_l) \qquad (24)$$
$$= \frac{n!\;\alpha^j}{i!\,(n-i-j)!}\;(1-x_1^{-\alpha})^i\; x_j^{-\alpha(n-i-j)}\prod_{l=1}^{j}\frac{1}{x_l^{\alpha+1}}$$

Moments of α-Pareto order statistics satisfy
$$\mathbb{E}\big(X_{(j)}^p\big) < \infty \;\text{ iff }\; p < \alpha(n-j+1) \qquad (25)$$
$$\mathbb{E}\big(X_{(j)}^p\big) = \frac{n!}{(n-j)!}\,\frac{\Gamma(n-j+1-p/\alpha)}{\Gamma(n+1-p/\alpha)}$$
and, for $1 \le i < j \le n$,
$$\mathbb{E}\big(X_{(i)}X_{(j)}\big) < \infty \;\text{ iff }\; \min\big(n-j+1,\,(n-i+1)/2\big) > 1/\alpha \qquad (26)$$
$$\mathbb{E}\big(X_{(i)}X_{(j)}\big) = \frac{n!}{(n-j)!}\,\frac{\Gamma(n-j+1-1/\alpha)\,\Gamma(n-i+1-2/\alpha)}{\Gamma(n-i+1-1/\alpha)\,\Gamma(n+1-2/\alpha)}$$
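Formula (25) is best evaluated through log-gamma functions to avoid overflow of the factorials; a sketch with a Monte Carlo cross-check:

# p-th moment of X_(j) for an alpha-Pareto n-sample, from (25)
mom.os <- function(p, j, n, alpha) {
  stopifnot(p < alpha*(n - j + 1))          # existence condition in (25)
  exp(lgamma(n + 1) - lgamma(n - j + 1) +
      lgamma(n - j + 1 - p/alpha) - lgamma(n + 1 - p/alpha))
}
set.seed(1)
n <- 50; alpha <- 2.5; j <- n - 1           # second largest order statistic
X <- matrix((1 - runif(2e4*n))^(-1/alpha), ncol = n)
c(theory = mom.os(1, j, n, alpha),
  mc     = mean(apply(X, 1, function(r) sort(r)[j])))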

• Conditional distribution of order statistics

Let us now look at conditional distributions. We deduce from (23) that the pdf of $X_{(i)}$ given $X_{(j)} = y$, for $1 \le i < j \le n$, is, for $x \le y$,
$$f_{X_{(i)}/X_{(j)}=y}(x) = \frac{(j-1)!}{(i-1)!\,(j-i-1)!}\;\frac{F^{i-1}(x)}{F^{j-1}(y)}\; f(x)\,\big(F(y)-F(x)\big)^{j-i-1} \qquad (27)$$
$$= \frac{\alpha\,(j-1)!}{(i-1)!\,(j-i-1)!}\;\frac{(1-x^{-\alpha})^{i-1}}{(1-y^{-\alpha})^{j-1}}\; x^{-\alpha-1}\big(x^{-\alpha}-y^{-\alpha}\big)^{j-i-1}$$

and that the joint pdf of $(X_{(i)}, X_{(j)})$ given $X_{(k)} = z$, for $1 \le i < j < k \le n$, is, for $x \le y \le z$,
$$f_{X_{(i)},X_{(j)}/X_{(k)}=z}(x,y) = \frac{(k-1)!}{(i-1)!\,(j-i-1)!\,(k-j-1)!}\; f(x)f(y)\;\frac{F^{i-1}(x)\,\big(F(y)-F(x)\big)^{j-i-1}\big(F(z)-F(y)\big)^{k-j-1}}{F^{k-1}(z)} \qquad (28)$$
$$= \frac{\alpha^2\,(k-1)!}{(i-1)!\,(j-i-1)!\,(k-j-1)!}\; x^{-\alpha-1}y^{-\alpha-1}\;\frac{(1-x^{-\alpha})^{i-1}\big(x^{-\alpha}-y^{-\alpha}\big)^{j-i-1}\big(y^{-\alpha}-z^{-\alpha}\big)^{k-j-1}}{(1-z^{-\alpha})^{k-1}}$$

Using (24) provides, for $y \le x_1 \le \dots \le x_{j-1}$,
$$f_{X_{(i+2)}\dots X_{(i+j)}/X_{(i+1)}=y}(x_1,\dots,x_{j-1}) = \frac{(n-i-1)!}{(n-i-j)!}\;\frac{\big(1-F(x_{j-1})\big)^{n-i-j}}{\big(1-F(y)\big)^{n-i-1}}\prod_{l=1}^{j-1} f(x_l) \qquad (29)$$
$$= \frac{(n-i-1)!}{(n-i-j)!}\;\frac{\alpha^{j-1}}{y^{-\alpha(n-i-1)}}\; x_{j-1}^{-\alpha(n-i-j+1)-1}\prod_{l=1}^{j-2} x_l^{-\alpha-1}$$
Then we can compute the first conditional moments. We obtain, using (27),

$$\mathbb{E}\big(X_{(i)}/X_{(j)}=y\big) = \frac{(j-1)!}{(i-1)!\,(j-i-1)!}\;\frac{1}{F^{j-1}(y)}\int_1^y x\, F^{i-1}(x)\big(F(y)-F(x)\big)^{j-i-1}\, dF(x)$$
$$= \frac{(j-1)!}{(i-1)!\,(j-i-1)!}\int_0^1 F^\leftarrow\big(uF(y)\big)\, u^{i-1}(1-u)^{j-i-1}\, du \qquad (30)$$
(with the change of variables $u = F(x)/F(y)$)
$$= \frac{(j-1)!}{(i-1)!\,(j-i-1)!}\int_0^1 \big(1-uF(y)\big)^{-1/\alpha}\, u^{i-1}(1-u)^{j-i-1}\, du$$
$$\simeq \frac{1}{B(i,\,j-i)}\left\{B(i,\,j-i) + \sum_{l\ge 1}\frac{(F(y))^l}{l!}\; B(i+l,\,j-i)\prod_{m=0}^{l-1}\big(m + 1/\alpha\big)\right\}$$
$$= 1 + \frac{\Gamma(j)}{\Gamma(i)}\sum_{l\ge 1} (F(y))^l\;\frac{\Gamma(i+l)}{l\,\Gamma(j+l)\,\Gamma(l)}\prod_{m=0}^{l-1}\big(m + 1/\alpha\big)$$
and, with the same change of variables,
$$\mathbb{E}\big(X_{(i)}^2/X_{(j)}=y\big) = \frac{(j-1)!}{(i-1)!\,(j-i-1)!}\int_0^1 \Big[F^\leftarrow\big(uF(y)\big)\Big]^2 u^{i-1}(1-u)^{j-i-1}\, du \qquad (31)$$
$$= \frac{(j-1)!}{(i-1)!\,(j-i-1)!}\int_0^1 \big(1-uF(y)\big)^{-2/\alpha}\, u^{i-1}(1-u)^{j-i-1}\, du$$
$$\simeq 1 + \frac{\Gamma(j)}{\Gamma(i)}\sum_{l\ge 1} (F(y))^l\;\frac{\Gamma(i+l)}{l\,\Gamma(j+l)\,\Gamma(l)}\prod_{m=0}^{l-1}\big(m + 2/\alpha\big)$$

and, for $1 \le i < j < k \le n$, via (28),
$$\mathbb{E}\big(X_{(i)}X_{(j)}/X_{(k)}=y\big) = \frac{(k-1)!}{(i-1)!\,(j-i-1)!\,(k-j-1)!}\;\frac{1}{F^{k-1}(y)}\int_1^y x_i\, F^{i-1}(x_i)\left(\int_{x_i}^y x_j\,\big(F(x_j)-F(x_i)\big)^{j-i-1}\big(F(y)-F(x_j)\big)^{k-j-1}\, dF(x_j)\right) dF(x_i)$$
$$= \frac{(k-1)!}{(i-1)!\,(j-i-1)!\,(k-j-1)!}\int_0^1 F^\leftarrow\big(uF(y)\big)\, u^{i-1}\left(\int_u^1 F^\leftarrow\big(vF(y)\big)\,(v-u)^{j-i-1}(1-v)^{k-j-1}\, dv\right) du \qquad (32)$$
(with the changes of variables $u = F(x_i)/F(y)$ and $v = F(x_j)/F(y)$)
$$= \frac{(k-1)!}{(i-1)!\,(j-i-1)!\,(k-j-1)!}\int_0^1 \big(1-uF(y)\big)^{-1/\alpha}\, u^{i-1}\left(\int_u^1 \big(1-vF(y)\big)^{-1/\alpha}\,(v-u)^{j-i-1}(1-v)^{k-j-1}\, dv\right) du$$
where $F(y) = 1 - y^{-\alpha}$.
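Numerically, (30) and (31) are pleasant to handle: the combinatorial factor times $u^{i-1}(1-u)^{j-i-1}$ is exactly the Beta(i, j−i) density, so the conditional moments reduce to one-dimensional integrals (a sketch):

# E[X_(i)^p | X_(j) = y] for alpha-Pareto order statistics, via (30)-(31)
cond.mom <- function(p, i, j, y, alpha) {
  Fy <- 1 - y^(-alpha)
  integrate(function(u) (1 - u*Fy)^(-p/alpha)*dbeta(u, i, j - i), 0, 1)$value
}
cond.mom(1, i = 8, j = 10, y = 3, alpha = 2)   # conditional mean of X_(8) given X_(10) = 3
cond.mom(2, i = 8, j = 10, y = 3, alpha = 2)   # conditional second moment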

Moreover, the joint conditional distribution of (X(i+1),...,X(p−1)) given (X(k) =

xk, k ≤ i, k ≥ p), for 1 ≤ i < p ≤ n, denoted by fX(i+1),...,X(p−1) /X(k)=xk,k≤i,k≥p, or

f(i+1),...,(p−1) /X(k)=xk,k≤i,k≥p when no ambiguity, is, for x1 < . . . < xn,

p−1 (p − i − 1)! Y f (x , . . . , x ) = f(x ) (33) (i+1),...,(p−1) /X(k)=xk,k≤i,k≥p i+1 p−1  p−i−1 l F (xp) − F (xi) l=i+1

p−1 (p − i − 1)! αp−i−1 Y 1 =  p−i−1 α+1 −α −α xl xi − xp l=i+1

It implies that X(i+1),...,X(p−1) are independent of X(1),...,X(i−1) and X(p+1),...,X(n) when X(i) and X(p) are given, and that the order statistics form a Markov chain. • Limit distribution of Upper Order Statistics Recall the following results when considering the k upper order statistics (see e.g. [20]; §4.2)

Theorem 3.1 1. Assume that F belongs to the maximum domain of attraction of the extreme value distribution H (F ∈ MDA(H)) with norming constants cn > 0, dn ∈ R, i.e. that

lim nF (cnx + dn) = − ln H(x) ∀x ∈ n→∞ R

30 Let h denote the density of H. Then, for every fixed k ∈ N∗,

 −1  d (n−i+1) cn (X(n−i+1) − dn), i = 1, . . . , k → (Y , i = 1, . . . , k) as n → ∞ (34) where (Y (n−i+1), i = 1, . . . , k) is a k-dimensional Hextremal variate with (k) joint density h defined for x1 < . . . < xk by

k Y h(xj) h(k)(x , . . . , x ) = H(x ) 1 k 1 H(x ) j=1 j

2. If F is the (type I -)Pareto distribution with tail index α, then F ∈ MDA(Φα), Φα denoting the Fr´echetdistribution, and (34) holds with, for 0 < x1 < . . . < xk,

k (k) k n −α X o h (x1, . . . , xk) = φα(x1, . . . , xk) = α exp − x1 − (α + 1) ln xj j=1

Moreover, we have

k k −1 X  d X (n−i+1) (n−i) cn X(n−i+1) − kX(n−k) → i(Y − Y ) (35) i=1 i=1

where the spacings (Y (n−i+1) − Y (n−i), i = 1, . . . , k) have joint density, for y1 > 0, . . . , yk > 0, Z ∞ k+1 −α−1 −z−α g(Y (n−i+1)−Y (n−i),i=1,...,k)(y1, . . . , yk) = α z(z+y1) ... (z+y1+...+yk) e dz 0

3.2 Method 1: Normex - a mixed normal-extremes limit

3.2.1 A conditional decomposition A first approach consists of determining a fixed number k = k(α), function of the shape parameter α of the underlying heavy tailed distribution of the Xi’s but not of the size n of the sample. We take it as small as possible in order to fit both the mean behavior of Sn, as well as its tail behavior. Note that we look for the smallest possible k to be able to compute explicitly the distribution of the last upper order statistics appearing as the summands of the second sum Un−k. For this reason, based on condition (21), we choose k = [p/α − 1] + 1 (36)

31 Because of the small number of k upper order statistics, we are able to compute explic- itly the distribution of the 2nd term Un−k, independently of the sample size. However, combining it with the normal approximation of the distribution of the first term Tk obtained in Proposition 3.3 may be an issue since these two terms are not at all inde- pendent. To circumvent this problem, we decompose the Pareto sum Sn in a slightly different way than in (18), namely

Sn = Tk + Xn−k+1 + Un−k+1 (37) where Tk and Un−k+1 are defined in (18), in order to make use of the property of conditional independence (recalled in the previous section) between the two sub-sums Tk/X(n−k+1) and Un−k+1/X(n−k+1).

From this point on, we develop the method in three steps.

. First, we can write the cdf of Sn, for any x ≥ 1 (note that P(Sn ≤ x) = P(1 ≤ Sn ≤ x)) as Z x   P(Sn ≤ x) = P Tk + Un−k+1 ≤ x − y / X(n−k+1) = y f(n−k+1)(y)dy 1 Hence, if k = 1, Z x   P(Sn ≤ x) = P T1 ≤ x − y / X(n) = y f(n)(y)dy 1 Z x Z x−y

= f(n)(y) fT1/X(n)=y(v)dv dy (38) 1 0 and, for k ≥ 2,

Z x Z x−y

P(Sn ≤ x) = f(n−k+1)(y) fTk+Un−k+1/X(n−k+1)=y(v)dv dy 1 0 Z x Z x−y

= f(n−k+1)(y) fTk/X(n−k+1)=y ∗ fUn−k+1/X(n−k+1)=y(v)dv dy (39) 1 0 using the conditional independence of Tk/X(n−k+1) and Un−k+1/X(n−k+1) in this last equality.

So it remains to evaluate the limit distribution of Tk / (X(n−k+1) = y) and the distri- bution of Un−k+1 / (X(n−k+1) = y).

. A limiting normal distribution for Tk/(X(n−k+1)

First, let us provide the limit behavior of the conditional term Tk/X(n−k+1), as follows:

32 Proposition 3.1 The conditional distribution of the trimmed sum Tk defined in (18), given the (n − k + 1)th largest rv X(n−k+1), can be approximated, for large n, by the  2  normal distribution N m1(α, n, k, y), σ (α, n, k, y) :

  d  2  L Tk/(X(n−k+1) = y) ∼ N m1(α, n, k, y), σ (α, n, k, y) (40) n→∞ 2 with y > 1 and where the mean m1(α, n, k, y) and the variance σ (α, n, k, y) are defined in (42) and (43) respectively. Proof of Proposition 3.1 n−k X Notice that Tk/X(n−k+1) has the same distribution as Yj, where (Yj) is an (n − k)- j=1   sample with parent cdf defined by FY (.) = P Xi ≤ ./Xi < X(n−k+1) . Thus, we may apply the CLT whenever the 2nd moment of Yj is finite. Yet, for the reasons explained previously, we choose p ≥ 4 for a better fit (after noticing that if X(i) has a finite pth moment, for i ≤ n − k, so does X/X < X(n−k+1)). We need to compute the first two moments of Tk/(X(n−k+1) = y), m1(α, n, k, y) and m2(α, n, k, y) respectively. Using (30), (31) and (32) respectively, and applying the multinomial theorem, we obtain that for k > p/α − 1

n−k X  m1(y) := m1(α, n, k, y) := E X(j)/X(n−k+1) = y j=1 n−k Z 1   X (n − k)! = F ← uF (y) uj−1(1 − u)n−k−j du (j − 1)!(n − k − j)! 0 j=1 n−k−1 Z 1   X n − k − 1 = (n − k) F ← uF (y) uj(1 − u)n−k−1−j du j 0 j=0 n − k − 1 where denotes the binomial coefficient j Z 1   = (n − k) F ← uF (y) du 0 (using the binomial theorem) i.e. when considering the α-Pareto distribution, using (3), Z 1 −1/α m1(y) = (n − k(α)) 1 − uF (y) du 0  1−1/α  1 − F (y) n − k(α)  if α 6= 1 = × 1 − 1/α F (y)   | ln F (y) | if α = 1

33 hence (42). Let us compute the 2nd moment m2(y) := m2(α, n, k, y); we introduce the notation a = F (y).

n−k n−k j−1 X 2  X X  m2(y) = E X(j)/X(n−k+1) = y + 2 E X(i)X(j)/X(n−k+1) = y j=1 j=2 i=1 Z 1 2 Z 1 Z 1 = (n − k) F ←(au) du + 2 F ←(au) F ←(av) 0 0 u n−k j−1 X X (n − k)! j−i−1 n−k−j × ui−1v − u 1 − v dv du (i − 1)!(j − i − 1)!(n − k − j)! j=2 i=1 Z 1 2 Z 1 Z 1 = (n − k) F ←(au) du + 2(n − k)(n − k − 1) F ←(au) F ←(av) 0 0 u n−k−2 j   X X n − k − 2 j−i n−k−2−j × uiv − u 1 − v dv du i, j − i, n − k − 2 − j j=0 i=0  n − k − 2  where denotes the multinomial coefficient i, j − i, n − k − 2 − j Z 1 2 Z 1 Z 1  = (n − k) F ←(au) du + 2(n − k − 1) F ←(au) F ←(av)dv du 0 0 u (using the multinomial theorem)

Hence, for Pareto, via (3), it comes

Z 1 Z 1 Z 1 n −2/α −1/α −1/α o m2(y) = (n − k) (1 − au) du + 2(n − k − 1) (1 − au) (1 − av) dv du 0 0 u Z 1 = (n − k) (1 − au)−2/αdu + (n − k)(n − k − 1) × 0  Z 1 Z 1   2 1−2/α 1−1/α −1/α  (1 − au) du − (1 − a) (1 − au) du if α 6= 1  a(1 − 1/α) 0 0

 ln2(1 − a)  if α = 1  a2

34  1−2/α  1 − F (y) n − k(α)  if α 6= 2 i.e. m (y) = × 1 − 2/α 2 F (y)   | ln F (y)| if α = 2  1 − (F (y))1−1/α2    n − k(α) n − k(α) − 1  2 if α 6= 1 + × (1 − 1/α) F 2(y)   ln2 F (y) if α = 1  1 − y2−α  if α 6= 2   n − k(α)  1 − 2/α n − k(α) n − k(α) − 1 = −α × + −α 2 1 − y  (1 − y )  2 ln(y) if α = 2  (1 − y1−α)2  if α 6= 1  (1 − 1/α)2 ×   ln2(y) if α = 1 hence (43). Then the result given in Proposition 3.1 follows. 2

. A Pareto distribution for the conditional sum of the largest order statistics

 Now, let us look at the exact behavior of Un−k+1/ X(n−k+1) = y , assuming k ≥ 2. Its distribution may be computed explicitly via (29), which becomes, when taking i = n−k and j = k, and for y ≤ x1 ≤ ... ≤ xk−1,

k−1 k−1 (k − 1)! Y (k − 1)! αk−1 Y f (x , . . . , x ) = f(x ) = x−α−1 X(n−k+2),...,X(n)/X(n−k+1)=y 1 k−1 k−1 l y−α(k−1) l 1 − F (y) l=1 l=1

k−1 Y i.e. fX(n−k+2),...,X(n)/X(n−k+1)=y(x1, . . . , xk−1) = (k − 1)! hy(xl)1I(x1≤...≤xk−1) l=1 where hy is the probability density function (df) of a Pareto rv with parameters α and y. Taking into account the number of possible permutations, we can then deduce that the conditional density of the sum Un−k+1 given (X(n−k+1) = y) is defined, for any s ≥ (k − 1)y, by (k−1)∗ fUn−k+1/(X(n−k+1)=y)(s) = hy (s) (41) where (k − 1)∗ denotes the convolution product of order k − 1. Note that we could have retrieved this conditional density, noticing, as previously for

35 Tk, that Un−k+1/X(n−k+1) can be written as

k−1 d X Un−k+1/X(n−k+1) = Zj j=1

where the Zj are i.i.d. rv’s with parent rv Z and parent cdf defined by h i FZ (.) = P X ≤ . / X > X(n−k+1)

. Main result

Combining Proposition 3.1 with the results (38), (39) and (41), we obtain the following approximation for the distribution of the Pareto sum Sn, for k ≥ 1 (i.e. when at least the largest order statistics has a p-th moment which is not finite).

Theorem 3.2 The cdf of Sn expressed in (37) with k ≥ 1 defined in (36), can be approximated by Gn,α,k defined, for any x ≥ 1, by

 Z x 1 (1 − y−α)n−1 Z x−y v − m (y)  nα ϕ 1 dv dy if k = 1  1+α  1 σ(y) y 0 σ(y) Gn,α,k(x) =  Z x Z x−y  f (y) ϕ ∗ h(k−1)∗(v)dv dy if k ≥ 2  (n−k+1) m1(y),σ(y) y 1 0

with f(i) computed in (22), hy the probability density function of a Pareto rv with α yα parameters α and y, i.e. defined by h (x) = 1I , and where the mean m (y) y xα+1 (x≥y) 1 2 and the variance σ (y) of the normal density ϕm1(y),σ(y) are defined respectively by  1 − y1−α  if α 6= 1 n − k(α)  1 − 1/α m1(y) = m1(α, n, k, y) = −α × (42) 1 − y   ln(y) if α = 1

and

2 2 2 σ (y) = σ (α, n, k, y) := m2(α, n, k, y) − m1(α, n, k, y)  y ln2(y)  y2  y − 1 = (n − k(1)) y 1 − 1I + 2(n − k(2)) ln(y) − 2 1I (y − 1)2 (α=1) y2 − 1 y + 1 (α=2) n − k(α) 1 − y2−α 1 (1 − y1−α)2  + − × 1I (43) 1 − y−α 1 − 2/α (1 − 1/α)2 1 − y−α (α6=1,2)

Comments

36 1. The distribution Gn,α,k can also be expressed as Z x −α n−1   (1 − y ) m1(y) m1(y) − (x − y) Gn,α,1(x) = α n 1+α Φ − Φ dy (44) 1 y σ(y) σ(y) and, for k ≥ 2, Z x Z x−yZ v  f(n−k+1)(y) v − u − m1(y) (k−1)∗ Gn,α,k(x) = ϕ hy (u)du dv dy (45) 1 σ(y) 0 0 σ(y) 2. Note that we considered iid Pareto rv’s only as an example to illustrate Normex. It does not prevent from extending it to unknown distributions, using the CLT for the mean behavior and heavy tail distributions of the Pareto type for the tail. Since the exact distribution of the Pareto sum Sn of iid Pareto rv’s is known, we can judge the quality of the approximation proposed in Theorem 3.2, when comparing it with the exact distribution of Sn. Then we can also compare the respective associated risk measures. Recall that the distribution of the sum Sn of iid Pareto rv’s is given by the following (see [42] and references therein).

 For 0 < α < 2 and α 6= 1, the pdf fn of Sn is given explicitly by the series expansion (see Brennan et al., 68, and Blum, 70 [5]) n   ∞ −1 X n j X Γ(m + αj + 1) f (x) = −Γ(1−α) sin(παj) C (46) n π j n−j,m xm+αj+1 j=1 m=0

where Ck,m is the mth coefficient in the series expansion of the kth power of ∞ ∞ !k X X  −α tj the confluent hypergeometric function: C tm = k,m j − α j! m=0 j=0 but computational difficulties may arise for large values of n and certain ranges of x and α, as pointed out in [5].  An alternative method, based on the inversion of the Laplace transform, has been proposed in [42] and provides an explicit expression as well in the case ∗ α ∈ N and for a Pareto Hβ (i.e. not only for the case β = 1). We have Z ∞  1 − 1+ t v fn(t) = γm,n(v/n)e nβ dv (47) nβ 0 where, for v > 0,

[(n−1)/2]    m 2j+1 X n n−2j−1 v γ (v) := (−1)n+1mn (−π2)j Ei (v) m,n 2j + 1 m+1 m! j=0 m ! xm X 1 X xj Ei (x) := γ + ln x − + m+1 m! j (j − m)j! j=1 j≥0,j6=m γ being the Euler constant.

37 n∗  The pdf fn of Sn, satisfying fn = f , can also be simply evaluated numeri- cally using the recursive convolution equation Z x f j∗(x) = f ∗(j−1) ∗ f(x) = f (j−1)∗(x − u)f(u) du, for j ≥ 2, (48) 0 and f 1∗ = f. This recursive approach may yield good results, but is rela- tively time consuming for large values of n and x.

(k−1)∗ 3. The convolution product hy appearing in Gn,α,k can be numerically evaluated using, either the recursive convolution equation (48) applied to h, or, for α ∈ N∗, the explicit expression (47) when replacing β by y.

4. Finally recall the following result by Feller (see [21]) on the convolution closure of distributions with regularly varying tails, which applies in our Pareto example but may also be useful when extending the method.

Lemma 3.1 If F1 and F2 are two cdfs with regularly varying tails with tail index β ≥ 0, then the convolution F1 ∗ F2 is also regularly varying with the same tail index β.

Note that this lemma implies the result given in Lemma 2.1.

A consequence of Lemma 3.1 in our Pareto case, is Z ∞ Z ∞ (k−1)∗ hy (u)du ∼ (k − 1) hy(u)du (49) x x→∞ x

3.2.2 On the quality of the approximation of the distribution of the Pareto sum Sn To estimate the quality of the approximation of the distribution of the Pareto sum Sn, we compare analytically the exact distribution of Sn with the distribution given in Theorem 3.2. It could be also done numerically, as for instance in [23] with the distance Z ∞ i between two distributions F and G defined by di(F,G) = F (x) − G(x) dx, with 1 i = 1, 2. Instead, when looking at the entire distribution, we focus on the analytical comparison, mainly for the case α > 2 (with some hints for the case α ≤ 2). Then, we estimate numerically the distance in the tails, through the VaR measure (see §4.2). Note that it is not possible to compare directly the expressions of the VaR correspond- ing to, respectively, the exact and approximative distributions, since they can only be

38 expressed as the inverse function of a cdf. Nevertheless, we can compare the tails of these two distributions to calibrate the accuracy of the approximative VaR, since

P(Sn > x) − Gn,α;k = |P(Sn ≤ x) − Gn,α;k| Moreover, we will compare analytically our result with a normal approximation made on the entire sum (and not the trimmed one) since, for α > 2, the CLT applies and, as already noticed, it is often used in practice.

Since we do not approximate the distribution of the last upper order statistics in Normex, comparing the true distribution of Sn with its approximation Gn,α;k simply comes back to the comparison of the true distribution of n − k iid rv’s with the normal distribution (when applying the CLT). Note that, when extending Normex to any dis- tribution, an error term should be added to this latter evaluation. It comes from the approximation of the extremes distribution by a Pareto one.

Suppose α > 2, case of our main focus. For such a case, it is often the normal 2 2 approximation N (µn, sn), with µn := E(Sn) and sn := var(Sn), which is used in nα practice, where in the case of a Pareto sum, µ = nα , and s2 = . n α−1 n (α − 1)2(α − 2) We know that applying the CLT directly to Sn leads to bad results, in particular for the estimation of risk measures, since for any x, the quantity Q1(x), involving the 3rd moment of X, appearing in the error (19) made when approximating the exact distribution of Sn by a normal one, is infinite for any 2 < α ≤ 3. Indeed, using (19) implies that

2 |P(Sn ≤ x) − Φµn,sn (x)| = O(1)

When α > 3, even if the rate of convergence improves because Q1(x) < ∞, we still have Q2(x) = ∞ (because the 4th moment of X does not exist), which means that we cannot get a rate of order 1/n. Now let us look at the rate of convergence when approximating Sn with Gn,α;k, pro- posed on purpose to improve the rate of convergence. The main idea is to use a normal approximation not on the entire sum Sn, but only on the sum of the first n − k order statistics having finite moments of order p ≥ 3. Then, we choose p = 4 to improve the rate of convergence for the mean behavior, and we keep the exact distribution for the k largest order statistics with infinite 3rd or 4th moments. Remember that k is small and fixed (depending on α).

Recall (see the proof of Theorem 3.2) that for k = 1, Z x   Z x Z x−y P(Sn ≤ x) = P T1 ≤ x − y / X(n) = y f(n)(y)dy = f(n)(y) fT1/X(n)=y(v)dvdy 1 1 0 and, for k ≥ 2, Z x Z x−y

P(Sn ≤ x) = f(n−k+1)(y) fTk/X(n−k+1)=y ∗ fUn−k+1/X(n−k+1)=y(v)dv dy 1 0

39 Considering the exact distribution of the Pareto sum Sn means taking, at given y > 1 and for any k ≥ 1 and z ≤ (n − k)y: α f (z) = g(n−k)∗(z) with g(u) = u−α−11I (50) Tk/X(n−k+1)=y F (y) (1≤u≤y)

whereas considering our approximation means to replace fTk/X(n−k+1)=y by the pdf  2  ϕ 2 , of the normal distribution N m1(α, n, k, y), σ (α, n, k, y) defined in Proposi- m1k,σk tion 3.1.

Recall that, for k ≥ 1,

n−k d X Tk ≤ x − y / (X(n−k+1) = y) = Yj j=1 with Yj iid rv’s with parent rv Y having pdf g defined in (50) with finite pth moment. Although the direct dependence is on α (and y) and only indirectly on k since k = k(α), we introduce k in the index notation for convenience and have m (α, n, k; y) µ := (Y ) = 1 y E n − k 1 1 − y1−α ln(y) 2 = × 1I + 1I + 1I 1 − 1/α 1 − y−α (α6=1,2) 1 − y−1 (α=1) 1 + y−1 (α=2)

(note that µ(y) > 1, for any α that we consider, and any y > 1), and

2 2 2 γy = γy := var(Y ) = σ (α, n, k; y)/(n − k) 1 1 − y2−α 1 (1 − y1−α)2  = − × 1I 1 − y−α 1 − 2/α (1 − 1/α)2 1 − y−α (α6=1,2)  y ln2(y)  y2  y − 1 + y 1 − 1I + 2 ln(y) − 2 1I (y − 1)2 (α=1) y2 − 1 y + 1 (α=2)

A straightforward computation provides α (|Y − µ |3) = [2h(µ) − h(1) − h(y)] E y 1 − y−α where h denotes the antiderivative of the function H(z) = (µ3 −3µ2z+3µz2 −z3)z−α−1, i.e., if α 6= 1, 2,

α 1I 3µ 3µ2 µ3 (|Y − µ |3) = (α6=3) y3−α + 1I ln(y) + y y2−α − y y1−α + y y−α+ E y 1 − y−α 3 − α (α=3) α − 2 α − 1 α 3−α 3 2 12 µ 1I(α6=3)  11 µ 3µ 3µ 1I  y − 2 ln µ + 1I + y − y + y − (α6=3) α(α − 1)(α − 2)(α − 3) y 3 (α=3) α α − 1 α − 2 α − 3

40 whereas, if α = 1,

1 y2 1  ln3(y)  1 1  (|Y − µ |3) = − y + 3 ln(y) + 3 + + E y (1 − y−1)2 2 2 1 − y−1 y − 1 1 − y−1 3 ln2(y) 1 − y−1  + 1 − 2 ln (y) + 2 ln(1 − y−1) − 3 ln(y) + 1 − y−1 2 2

and, if α = 2,

4  2y 6 2 1 + y−1 (|Y − µ |3) = − 3 ln(y) − + + E y 1 − y−1 1 + y−1 1 + y (1 + y)2 2 6 2  + 3 1 + 2 ln 2 − 2 ln(1 + y−1) − + 1 + y−1 (1 + y−1)2

For simplicity, let us look at the case 2 < α ≤ 3 and consider the Berry-Ess´eenin- equality. For α > 3, we would use the Edgeworth expansion, with similar arguments as developed below. Note that the Berry-Ess´eeninequality has been proved by Petrov to hold also for probability density functions (see [38], or [39]). Since then, various authors have worked on this type of inequality, in particular to sharpen the accuracy of the constant appearing in it. In the case of uniform Berry-Ess´eenbounds, the value of the constant factor c has decreased from 7.59 by Ess´een(1942) to 0.4785 by Tyurin (2010; [48]), to 0.4748 by Shevtsova (2011; [45]; see also [30] for a detailed review), in the independent case, and to 0.5600 in the general case. On the other hand, Ess´eenhad shown that c admits 0.4097 as a lower bound. In the case of non-uniform Berry-Ess´een bounds, the best upper bound on c in the iid case is over 25 times the corresponding best known lower bound, and this gap factor is greater than 31 in the general case. Note also that these past decades, much literature has been dedicated to the generalization of this type of inequality; we will not provide exhaustive references, besides pointing out the remarkable contribution by Stein (1972 [46], 1986 [47]) who proposed a uniform upper bound to the normal approximation as in the Berry-Ess´eenbound, but under general distributional assumptions, allowing dependent and nonidentical distributions; the Stein method has been used to develop many studies, in particular by Chen & Shao (2004, [7]) to obtain sharp bounds of the Berry-Ess´eentype under local dependence (see also [41] for new developments).

Since α > 2, we only have to consider the case k = 1 (see Table 2). We can write

Z x   | (S ≤ x)−G | ≤ | T ≤ x−y / X = y −Φ 2 (x−y)|f (y)dy P n n,α;1 P 1 (n) (n−1)µ1y,(n−1)γ1y (n) 1 Since the conditions on moments of Y are satisfied, we can use the Berry-Ess´eenin- equality to provide a uniform bound of the error made when approximating the exact

41 distribution by Gn,α;1. Indeed we have   T ≤ x − y / X = y − Φ 2 (x − y) P 1 (n) (n−1)µy,(n−1)γy n−1  X  2 ≤ sup P Yi ≤ v − Φ(n−1)µy,(n−1)γy (v) v i=1

Pn−1 Y − (n − 1)µ  i=1 i y = sup P √ ≤ v − Φ(v) v n − 1 γy C(y) (|Y − µ |3) √ E y ≤ c × with C(y) := 3 (51) n − 1 γy where c is a constant (with 0.4 < c < 0.5) independent of the parameters and of x. We deduce that, for any x > 1, introducing β such that 0 < β < α, c Z x |P(Sn ≤ x) − Gn,α;1(x)| ≤ √ C(y) f(n)(y) dy n − 1 1 c α n Z x = √ C(y) (1 − y−α)n−1y−α−1 dy n − 1 1 1− 1 x c α n α Z ≤ × × max (1 − y−α)n−1C(y) y−α+β f (y)dy 1 − 1 P(β) (n − 1) 2 n α β y∈(1,x] 1 i.e.

1− 1 c α n α −α n−1 −α+β −β | (Sn ≤ x)−Gn,α;1(x)| ≤ × max (1 − y ) C(y) y ×(1−x ) P 1 − 1 (n − 1) 2 n α β y∈(1,x] (52) where C(y) satisfies (51) and fP(β) denotes the pdf of a Pareto with shape parameter β (and support [1, ∞)). Notice that we made the Pareto distribution with parameter β appear in the bound given on the RHS of this last equation (52) in order to keep a bound expressed as a function of x. We will have to chose β small enough since it is increasing with β. 1− 1 n α The next study concerns × max (1 − y−α)n−1C(y) y−α+β, which, as a function β y∈(1,x] of y, increases then decreases. We study it numerically (for any α ∈ (2; 4], although we are looking for a bound of the distance between the distributions in the case 2 < α ≤ 3), since it would be too tedious to do it analytically. It appears that it is decreasing with increasing n, β and α, respectively. Hence to take both opposite conditions on β into account, we chose β = 0.1α. It can be shown numerically that, for α ∈ (2; 4], β = 0.1α and ∀n ≥ 52,

1− 1 n α max (1 − y−α)n−1C(y) y−α+β < K(α, n) (53) β y∈(1,x]

42 with K(a, n) which is decreasing with increasing n (and more or less stable with α) K(α, n) ≤ 1.23, ∀α > 2, ∀n ≥ 52 and, more precisely,

Table 3: Value of the function K(α, n)

α 2.01 2.1 2.5 3 3.5 4 n 52 1.134 1.158 1.211 1.225 1.207 1.175 100 1.054 1.081 1.161 1.203 1.208 1.191 250 0.953 0.986 1.096 1.169 1.201 1.206 500 0.886 0.923 1.047 1.139 1.187 1.206

Hence the following result, deduced from (52) :

Proposition 3.2 The error between the true distribution of Sn and its approximation Gn,α;1 is bounded by:

c α −0.1α | (Sn ≤ x) − Gn,α;1(x)| < K(α, n) × (1 − x ) (54) P 1 − 1 (n − 1) 2 n α where K(α, n) is a small constant depending on α and n with some values given in Table 3. The Kolmogorov-Smirnov distance between Sn and its approximation can be then deduced, namely c α K(α, n) 0.5 α sup | (Sn ≤ x) − Gn,α;1(x)| < < P 1 − 1 1 − 1 x (n − 1) 2 n α (n − 1) 2 n α

Note that the bound given in (54) could be refined using a non-uniform Berry-Ess´een bound, as follows   T1 ≤ x − y / X(n) = y − Φ 2 (x − y) P (n−1)µ1y,(n−1)γ1y ! Pn−1 Y − (n − 1)µ x − y − (n − 1)µ x − y − (n − 1)µ  i=1 i 1y 1y 1y = P √ ≤ √ − Φ √ n − 1 γ1y n − 1 γ1y n − 1 γ1y c C(y) 1 ≤ √ × (55) 3 n − 1  x−y−(n−1)µ1y  1 + √ n−1 γ1y from which we deduce that, for any x ≥ 1, c Z x C(y) | (S ≤ x) − G (x)| ≤ √ f (y) dy P n n,α;1 3 (n) n − 1  x−y−(n−1)µ1y  1 1 + √ n−1 γ1y

43 In any case, it would have to be solved numerically, as there is no known analytical solution for this antiderivative.

Instead, to asymptotically refine the upper bound (54) given in Proposition 3.2, as 1− 1 n α x → ∞, we use the property of the function × (1 − y−α)n−1C(y) y−α+β, which β rapidly decays to 0 whenever y becomes large.

Let us introduce (ln(n))α 1  a := exp + α + (56) n n1/α [α]

where [x] denotes the integer part of x ∈ R. If x ≤ an, we keep this bound; otherwise, i.e. if x > an, we write

|P(Sn ≤ x) − Gn,α;1(x)| ≤ c α n Z an Z x  √ C(y) (1 − y−α)n−1y−α−1 dy + C(y) (1 − y−α)n−1y−α−1 dy ≤ n − 1 1 an ( 1− 1 ) c α n α K(α, n) × 1 − a−β + × max (1 − y−α)n−1C(y) y−α+β (a−β − x−β) 1 − 1 n n (n − 1) 2 n α β y∈(an,x]

It can be shown numerically, as for K(α, n), that

1− 1 n α × max (1 − y−α)n−1C(y) y−α+β < K˜ (α, n) β y∈(an,x]

with K˜ (α, n) < 1/2 and soon close to 0 for large n. One could provide a similar table for K˜ (α, n) as for K(α, n), which would show much smaller numbers.

We can then conclude to the following bound for the error between the distributions, much better asymptotically in x, and thus for the risk measures.

c α  −β | (Sn ≤ x) − Gn,α;1(x)| ≤ × K(α, n) × 1 − a P 1 − 1 n (n − 1) 2 n α o ˜ −β −β + K(α, n) × an − x (57)

˜ with an defined in (56), K(α, n) in Table 3 and K(α, n) < 1/2.

Remark. Let us briefly look at the case α ≤ 2 (and α > 1/2). We have seen that such a case implies k ≥ 2 (see Table 2). We have

Z x Z x−y

|P(Sn ≤ x) − Gn,α;k(x)| ≤ f(n−k+1)(y) fUn−k+1/X(n−k+1)=y ∗ 1 0 (n−k)∗ g − ϕ 2 (v)dv dy (n−k)µy,(n−k)γy

44 (n−k)∗ To evaluate g − ϕ 2 , we will use Petrov’s result for pdf ([38]), so we (n−k)µy,(n−k)γy n−k X Yi − µy need to go back to the pdf of the standardized sum √ of iid rv’s with pdf i=1 n − k γy g˜, which can be expressed as 1  . − µ  g˜(n−k)∗ where g(.) = √ g˜ √ y n − k γy n − k γy It is straigthforward to show by induction that 1 v − (n − k)µ  g(n−k)∗(v) = √ g˜(n−k)∗ √ y n − k γy n − k γy 1 x − a Then, since ϕ 2 (x) = ϕ , we can write a,b b b   (n−k)∗ 1 (n−k)∗  v − (n − k)µy 2 √ √ g (v) − ϕ(n−k)µy,(n−k)γy (v) = g˜ − ϕ (58) n − k γy n − k γy

Since we consider a sum of (n−k) iid rv’s Yi (i = 1, . . . , n−k) with parent rv Y having a finite pth moment, we obtain via Petrov ([38]) that there exist a constant c (with 0.4 < c < 0.5) independent of all the parameters and x such that   (n−k)∗  v − (n − k)µy (n−k)∗ c C(y) sup g˜ − ϕ √ = sup g˜ (v) − ϕ(v) ≤ √ (59) v n − k γy v n − k where C(y) is defined in (51). Hence, combining (58) and (59) gives 3 (n−k)∗ c C(y) c E(|Y − µy| ) sup g (v) − ϕ 2 (v) ≤ = × (n−k)µy,(n−k)γy 2 v (n − k)γy n − k var (Y ) and so c Z x C(y) Z x−Zy |P(Sn ≤ x) − Gn,α;k| ≤ f(n−k+1)(y) fUn−k+1/X(n−k+1)=y(t)dt dv dy (n − k) 1 γy 0 α n! with f (y) = (1 − y−α)n−ky−αk−1 (see (22)). (n−k+1) (n − k)!(k − 1)! We could propose, choosing β < αk, and following the same way as for the case α > 2,

|P(Sn ≤ x) − Gn,α;k| ≤   Z x c α n! C(y) −α n−k −αk+β 2 max (x − y) (1 − y ) y fP(β)(y) dy = β(n − k) (n − k − 1)!(k − 1)! y∈(1,x) γy 1   c α n! C(y) −α n−k −αk+β −β 2 max (x − y) (1 − y ) y (1 − x ) β(n − k) (n − k − 1)!(k − 1)! y∈(1,x) γy but this last upper bound would be reasonable for x < ky, but not for large x. Some further study would be required.

45 3.3 Method 2 : a weighted normal limit

In this method, we go back to the first decomposition (18) of Sn and use limit theorems for both terms Tk and Un−k instead of proceeding via conditional independence and considering a small given k = k(α). It means that we need to choose k as a function of n such that k = k(n) → ∞ as n → ∞, for the approximation of the distribution of Un−k via its limit to be relevant.

First we consider a normal approximation for the trimmed sum Tk, which implies some conditions on the threshold k (see [8]). We need to select a threshold k such that   k satisfies (21) k → ∞ as n → ∞ (60)  k/n → 0 as n → ∞ or k = [nρ] with 0 < ρ < 1/2 Note that the condition (21) will be implied by the condition k → ∞. Hence, for n→∞ this method, k does not depend directly on the value of α.

We can then enunciate the following:

Proposition 3.3 Take α > 1/4. Let p ≥ 2 and k = k(n, α) satisfy (60). The distribution of the trimmed sum Tk defined in (18) can be approximated, for large n,  2  by the normal distribution N m1(α, n, k), σ (α, n, k) :

  d  2  L Tk ∼ N m1(α, n, k), σ (α, n, k) (61) n→∞

2 where the mean m1(α, n, k) and the variance σ (α, n, k) are defined respectively by

n−k n−k X X n!Γ(n − i + 1 − 1/α) m (α, n, k) := (X ) = (62) 1 E (i) (n − i)!Γ(n + 1 − 1/α) i=1 i=1 n−k i−1 X Y n − j = (63) n − j − 1/α i=1 j=0

2 2 σ (α, n, k) := m2(α, n, k) − m1(α, n, k) (64)

46 with

n−k n−k j−1 X 2 X X m2(α, n, k) := E(X(i)) + 2 E(X(i)X(j)) i=1 j=2 i=1 n−k n! X Γ(n − i + 1 − 2/α) = + (65) Γ(n + 1 − 2/α) (n − i)! i=1 n−k j−1 ! X X Γ(n − j + 1 − 1/α)Γ(n − i + 1 − 2/α) 2 (n − j)!Γ(n − i + 1 − 1/α) j=2 i=1 n−k j−1 j−1 i−1 j−1 ! X Y n − l X Y n − l Y n − l = + 2 (66) n − l − 2/α n − l − 2/α n − l − 1/α j=1 l=0 i=1 l=0 l=i

Note that k = k(α) is chosen in such a way that σ2(α, n, k) is finite. The case k = 2 corresponds to the one developed in Zaliapin et al’s. (but with a different set of defi- nition for α).

Proof of Proposition 3.3 n−k n−k X X It is sufficient to notice that Tk can be considered as the sum Tk = X(j) = Yj j=1 j=1  with (Yj) an (n − k)-sample with parent cdf defined by FY (.) = P Xi ≤ ./Xi <  X(n−k+1) . Hence the CLT applies to Tk when requiring p ≥ 2. The result follows directly using (25) and (26) to obtain the first equalities (62) and (65). The simplified expressions (63) and (66) come from straightforward computations using the recursive definition of the gamma function for non negative real numbers z, namely: Γ(z + 1) = zΓ(z). Indeed we can write, for 1 ≤ i ≤ n − k, and assuming α ≥ max(1/(k + 1), 1/4),

Γ(n + 1 − 1/α) = (n − 1/α)Γ(n − 1/α) = (n − 1/α)(n − 1 − 1/α)Γ(n − 1 − 1/α) = ... i−1 Y = (n − j − 1/α)Γ(n − i + 1 − 1/α) j=0 from which we deduce that, for 1 ≤ i ≤ n − k with k ≥ 1, for α ≥ max(1/(k + 1), 1/4),

n! Γ(n − i + 1 − 1/α) (X ) = h(n, i, 1/α) := × E (i) (n − i)! Γ(n + 1 − 1/α) i−1 Y (n − j) = (67) (n − j − 1/α) j=0

47 and so (63). Let us proceed in the same way to simplify (65). Note that m2(α, n, k) can be expressed, in terms of the function h introduced above, as

n−k j−1 ! X X m2(α, n, k) = h(n, 1, 2/α) + h(n, j, 2/α) + 2 h(n, i, 2/α) h(n − i, j − i, 1/α) j=2 i=1 therefore, using (67), it comes, for α ≥ max(2/k, 1/4),

n−k j−1 j−1 i−1 j−i−1 ! X Y (n − l) X Y (n − l) Y (n − i − l) m (α, n, k) = + 2 2 (n − l − 2/α) (n − l − 2/α) (n − i − l − 1/α) j=1 l=0 i=1 l=0 l=0 n−k j−1 j−1 i−1 j−1 ! X Y (n − l) X Y (n − l) Y (n − l) = + 2 2 (n − l − 2/α) (n − l − 2/α) (n − l − 1/α) j=1 l=0 i=1 l=0 l=i

Let us turn now to the limit behavior of the partial sum Un−k. The main idea of this method relies on using an estimation (involving the last order statistics) of the Expected Shortfall ES(X) of X to propose an approximation for the second term 1 Un−k. So it implies to assume X ∈ L , i.e. α > 1 (to be able to define ES(X)). Let us recall the following result (see [1] for the proof, or [20]), that we are going to use. 1 Lemma 3.2 For a sequence (Li)i∈N of L -iid rvs with cdf FL, we have P[n(1−γ)]−1 i=0 Ln−i,n lim = ESγ(L) a.s. n→∞ [n(1 − γ)]

where L1,n ≤ ... ≤ Ln,n are the order statistics of L1, ··· ,Ln and where [n(1 − γ)] denotes the largest integer not exceeding n(1 − γ), 0 < γ < 1. In other words, expected shortfall at confidence level γ can be thought of as the lim- iting average of the [n(1 − γ)] upper order statistics from a sample of size n from the loss distribution.

Now we can enunciate the main empirical result.

Result 3.1 Let X be a α-Pareto rv with cdf (2), with α > 2, and (Xi, i = 1, . . . , n) an n-sample with parent rv X. Let us choose k = k(n, γ) with γ = 0.9, such that k = k(n, γ) = [n(1 − γ)] = [n/10] (68)

The distribution of Sn expressed in (18) can be approximated, for any n and any α > 2, by a normal approximation with mean (m1(α, n, k) + kESγ) and variance 2 2 2 γ σ (α, n, k) ESγ :

 2 2 2 N m1(α, n, k) + k ESγ(X) , γ σ (α, n, k) ESγ

48 2 where m1(α, n, k), σ (α, n, k) are defined in (63), (64) respectively, and ESγ(X) in (5), α i.e. ES (X) = (1 − γ)−1/α γ (α − 1)

Proof of Result 3.1 Note that the choice (68) of k implies that it satisfies (60) with ρ = 1 − γ, and has been made according to Lemma 3.2 and [8]. We deduce from Lemma 3.2 that Un−k defined in (18) satisfies, with k chosen as in (68), k−1 Un−k 1 X a.s. = X −→ ES (X) as n → ∞ k k (n−j) γ j=0

d d d Recall that if, as n → ∞, Zn → Z and Wn → a, then Zn + Wn → Z + a, for any rv’s (Zn), Z,(Wn) and any constant a. On one hand, by Proposition 3.3, we have  2  Tk ∼ N m1(α, n, k), σ (α, n, k) . On the other hand, we have Un−k ∼ k ESγ(X). n→∞ n→∞ Therefore, for large n,

d  2  L(Tk + Un−k) ∼ N m1(α, n, k), σ (α, n, k) + k ESγ(X) =

 2  N m1(α, n, k) + k ESγ(X) , σ (α, n, k)

Note that m1(α, n, k) + k ESγ(X) is close to E(Sn) = n var(X) = m1(α, n, 0) (a bit larger), which makes sense when looking at the tool we used. But the variance 2 2 σ (α, n, k) of Tk is too small compared with var(Sn) = σ (α, n, 0), hence a correction 2 has to be made, of at least the order var(Sn)/σ (α, n, k), taking into account the num- ber k of extremes we considered, α and ESγ (as for the mean). 2 2 2 It has been checked numerically that considering a variance as γ σ (α, n, k) ESγ , with γ = 0.9, allows to get a good approximation of the tail of the distribution of Sn, for any n and any α > 2, as seen in the results tables. 2

Comments

1. This result is interesting since it shows that, even if we want to consider a normal approximation, there must be a correction based on ESγ and the number of extremes that we considered, such that both the mean and the variance become larger than the ones of Sn. 2. The final approximation being normal, its tail distribution is light, hence we do not expect a priori an evaluation of the VaR as accurate as the one provided by Normex, but better than the normal one applied directly on Sn. The light tail should still lead to an underestimation of the VaR, but certainly not as gross as the one when applying directly the CLT, because of the correcting term expressed in terms of the ES.

49 3. We will compare numerically the tail approximation with the exact one, but also the modified normal approximation with the normal one made directly on Sn. 4. To obtain a good fit requires a calibration of γ. Numerically, it appeared that the value γ = 0.9 provides a reasonable fit, for any n and any α > 2. It is an advantage that γ does not have to be chosen differently, depending on these parameters n and α, in order to keep the generality of the method. Next step will consist in the analytical evaluation of this method and to generalize it, if possible, to any α > 0.

4 Application to risk measures

4.1 Possible approximations of VaR As an example, we treat the case of one of the two main risk measures, and choose the VaR, since it is the main one used for solvency requirement. We would proceed in the same way for the Expected Shortfall.

• Let us deduce the approximations of the VaR of order q from the two new limit results coming from Normex and the weighted normal method, and recall those given (i) for the other existing methods (see Section 1). Let zq denotes the VaR computed from the chosen method (i), namely (1) for the GCLT approach, (2) for the CLT one, (3) for the max one, (4) for the Zaliapin et al.’s method, (5) for Normex and (6) for our weighted normal method. Note that, for further reference and practical use, we will make them appear according to the value of α, even if it may appear repetitive.

. For 0 < α < 2:

- via the GCLT : (1) 1/α ← zq = n Cα Gα (q) + bn (Gα (α, 1, 1, 0)-stable distribution) for 1/2 < α < 2, and for q > 0.95, (1bis) 1/α −1/α zq = n q + bn

- via the Max (EVT) approach, for high order q : −1/α (3) 1/α  zq = n log(1/q) + bn

- via the Zaliapin et al.’s method (see [49]), for 2/3 < α < 2 : (4)  ←  ← zq = σ(α, n, 2) Φ (q) + m1(α, n, 2) + Tα,n(q)  with Tα,n the cdf of X(n−1) + X(n)

50 - via Normex : (5) ← zq = Gn,α,k(q) with Z x Z x−yZ v  f(n−k+1)(y) v − u − m1(y) (k−1)∗ Gn,α,k(x) = ϕ hy (u)du dv dy 1 σ(y) 0 0 σ(y)

. For α = 2:

- via the (G)CLT : (1) ← zq = dn Φ (q) + 2n

- via the Max (EVT) approach, for high order q : −1/α (3) 1/α  zq = n log(1/q) + bn

- via Normex : (5) ← zq = Gn,α,2(q)

. For 2 < α ≤ 4:

- via the CLT √: nα nα z(2) = √ Φ←(q) + q (α − 1) α − 2 α − 1

- via the Max (EVT) approach, for high order q : −1/α (3) 1/α  zq = n log(1/q) + bn

- via Normex : (5) ← zq = Gn,α,1(q) with Z x Z x−y   1 −(1+α) −α n−1 v − m1(y) Gn,α,1(x) = nα y (1 − y ) ϕ dv dy 1 σ(y) 0 σ(y)

- via the Weighted Normal limit : (6)  ←  zq = γ ESγ σ α, n, k Φ (q) + m1 α, n, k + k ESγ α with k = [n(1 − γ)], , γ = 0.9,ES = (1 − γ)−1/α γ (α − 1) 2 m1(α, n, k) and σ (α, n, k) defined in (63), (64), respectively

51 • Comparison

Since there is no explicit analytical formula for the true quantiles of Sn, their compar- ison with the quantiles obtained by other methods will be mainly numerical. Nevertheless, in the case α > 2, we can compare analytically the VaR obtained when (2) doing a rough normal approximation directly on Sn (namely zq ) with the one obtained (6) via the shifted normal method (namely zq ) . As seen in (7), applying directly the  n α n α  CLT on S provides the normal approximation N , , hence a n α − 1 (α − 1)2(α − 2) (2) V aRq given by zq expressed in (12). So, we obtain the correcting term to the CLT as:  √  (6) (2) nα ← z − z = γ ESγ σ(α, n, k) − √ Φ (q) q q (α − 1) α − 2 n α [n(1 − γ)]  + m (α, n, k) + (1 − γ)−1/α − 1 1 (α − 1) n

4.2 Numerical study - comparison of the various methods 4.2.1 Presentation of the study

We simulate (Xi, i = 1, . . . , n) with parent r.v. X α-Pareto distributed, with different sample sizes, varying from n = 52 (corresponding to aggregating weekly returns to obtain yearly returns), through n = 250 (corresponding to aggregating daily returns to obtain yearly returns), to n = 500 representing a large size portfolio.

We consider different shape parameters, namely α = 3/2; 2; 5/2; 3; 4, respectively. Recall that simulated Pareto rv’s Xi’s (i ≥ 1) can be obtained simulating a uniform rv −1/α U on (0, 1] then applying the transformation Xi = U .

For each n and each α, we aggregate the realizations xi’s (i = 1, . . . , n). We repeat the 7 7 operation N = 10 times, thus obtaining 10 realizations of the Pareto sum Sn, from which we can estimate its quantiles.

Let zq denotes the empirical quantile of order q of the Pareto sum Sn (associated with the empirical cdf FSn and pdf fSn ), defined by

zq := inf{t | FSn (t) ≥ q}, with 0 < q < 1.

Recall, for completeness, that the empirical quantile of Sn converges to the true quantile as N → ∞ and has an asymptotic normal behavior, from which we deduce the following confidence interval at probability a for the true quantile: p ← q(1 − q) zq ± Φ (a/2) × √ (69) fSn (q) N

52 where fSn can be empirically estimated for such a large N. We do not compute them numerically: N being very large, bounds will be close.

The tables, given below, present, for various values of α and n, the values of the quantiles of order q obtained by the GCLT method, the Max one, Normex, and the (i) Weighted Normal one when α > 2, respectively. We compare these quantiles zq ((i) indicating the chosen method) with the (empirical) quantile zq obtained via Pareto simulations (representing the true quantile). For that, we introduce the approximative relative empirical error: z(i) δ(i) = δ(i)(q) = q − 1 zq We consider three possible order q: 95%, 99% (threshold for Basel II) and 99.5% (threshold for Solvency 2) and obtain the results given below.

When applying Normex for the case α ≤ 2, the correct choice of k is 2 and only the results with this value should be considered. Nevertheless, we also report in the tables the results obtained when only the maximum is put aside, i.e. when k = 1, to explore the influence of the choice of k on the result.

We use the software R to perform this numerical study, with different available packages (see the appendix). Let us particularly mention the use of the procedure Vegas in the package R2Cuba for the computation of the double integrals. This procedure turns out not to be always very stable for the most extreme quantiles, mainly for low values of α. In practice, for the computation of integrals, we would advise and plan to test various procedures in R2Cuba (Suave, Divonne and Cuhre, besides Vegas) or to look for other packages. Another possibility would be to implement the algorithm using all together a different software, as e.g. Python.

4.2.2 Estimation of the VaR with the various methods

• Case α = 3/2

53 n = 52 Simul GCLT Max Normex (1) (3) (5) q zq zq zq zq δ(1) (%) δ(3)(%) δ(5)(%) 95% 246.21 280.02 256.92 246.3 (k=1) 245.86 (k=2) 13.73 4.35 0.04 (k=1) - 0.14 (k=2) 99% 450.74 481.30 455.15 469.5 (k=1) 453.92 (k=2) 6.78 0.97 4.16 (k=1) 0.71 (k=2) 99.5% 629.67 657.91 631.66 759.21 (k=1) 645.60 (k=2) 4.48 0.31 20.6 (k=1) 2.53 (k=2)

n = 100 Simul GCLT Max Normex (1) (3) (5) q zq zq zq zq δ(1) (%) δ(3)(%) δ(5)(%) 95% 442.41 491.79 456.06 441.8 (k=1) 443.08 (k=2) 11.16 3.09 - 0.14 (k=1) 0.15 (k=2) 99% 757.82 803.05 762.61 800.14 (k=1) 761.66 (k=2) 5.97 0.63 5.58 (k=1) 0.51(k=2) 99.5% 1031.56 1076.18 1035.58 - (k=1) 1032.15 (k=2) 4.33 0.39 - (k=1) 0.06 (k=2)

54 n = 250 Simul GCLT Max Normex (1) (3) (5) q zq zq zq zq δ(1) (%) δ(3)(%) δ(5)(%) 95% 1017.64 1103.27 1037.47 1019.22 (k=1) 1022.04 (k=2) 8.42 1.95 0.16 (k=1) 0.43 (k=2) 99% 1594.97 1676.63 1602.13 1630.50 (k=1) 1629.60 (k=2) 5.12 0.45 2.23 (k=1) 2.17 (k=2) 99.5% 2099.49 2179.73 2104.94 - (k=1) 2238.71 (k=2) 3.82 0.26 - (k=1) 6.63 (k=2)

n = 500 Simul GCLT Max Normex (1) (3) (5) q zq zq zq zq δ(1) (%) δ(3) (%) δ(5) (%) 95% 1929.32 2060.79 1956.32 1929.88 (k=1) 1930.31 (k=2) 6.81 1.40 0.03 (k=1) 0.05 (k=2) 99% 2850.51 2970.93 2852.67 2883.57 (k=1) 2850.17 (k=2) 4.22 0.076 1.16 (k=1) - 0.01 (k=2) 99.5% 3651.13 3769.55 3650.84 - (k=1) 3706.05 (k=2) 3.24 - 0.79 - (k=1) 1.50 (k=2)

• Case α = 2

55 n = 52 Simul GCLT Max Normex (1) (3) (5) q zq zq zq zq δ(1) (%) δ(3)(%) δ(5)(%) 95% 135.48 132.29 135.84 135.49 (k=1) 135.47(k=2) -2.35 0.26 0.01 (k=1) - 0.01 (k=2) 99% 177.28 144.02 175.93 177.48 (k=1) 180.16 (k=2) -18.76 -0.76 0.11 (k=1) 1.62 (k=2) 99.5% 207.09 148.31 205.85 224.42 (k=1) 215.29 (k=2) -28.38 - 0.60 8.37 (k=1) 3.96 (k=2)

n = 100 Simul GCLT Max Normex (1) (3) (5) q zq zq zq zq δ(1) (%) δ(3)(%) δ(5)(%) 95% 245.82 241.85 244.15 245.55 (k=1) 245.79 (k=2) -1.62 -0.68 -0.11 (k=1) - 0.01 (k=2) 99% 303.45 259.19 299.75 305.07 (k=1) 308.77 (k=2) -14.59 -1.22 0.53 (k=1) 1.75 (k=2) 99.5% 344.23 265.53 341.24 361.80 (k=1) 353.98 (k=2) - 22.86 - 0.87 5.10 (k=1) 2.83 (k=2)

56 n = 250 Simul GCLT Max Normex (1) (3) (5) q zq zq zq zq δ(1) (%) δ(3) (%) δ(5) (%) 95% 576.82 571.42 569.81 576.90 (k=1) 576.83 (k=2) -0.93 -1.21 0.01 (k=1) 0.00 (k=2) 99% 666.66 601.01 657.72 669.07 (k=1) 677.33 (k=2) -9.85 -1.34 0.36 (k=1) 1.60 (k=2) 99.5% 730.79 611.85 723.33 754.27 (k=1) 750.41(k=2) -16.28 -1.02 3.21 (k=1) 2.68 (k=2)

n = 500 Simul GCLT Max Normex (1) (3) (5) q zq zq zq zq δ(1) (%) δ(3) (%) δ(5) (%) 95% 1113.04 1106.19 1098.73 1113.01 (k=1) 1114.08 (k=2) -0.62 -1.29 0.00 (k=1) 0.09 (k=2) 99% 1240.02 1150.18 1223.05 1242.14 (k=1) 1246.97 (k=2) -7.25 -1.37 0.17 (k=1) 0.56 (k=2) 99.5% 1330.40 1166.29 1315.83 1356.62 (k=1) 1369.18 (k=2) -12.33 - 1.1 1.97 (k=1) 2.92 (k=2)

• Case α = 5/2

57 n = 52 Simul CLT Max Normex Weighted Normal (2) (3) (5) (6) q zq zq zq zq zq δ(1) (%) δ(3)(%) δ(5)(%) δ(6)(%) 95% 103.23 104.35 102.60 103.17 109.25 1.08 -0.61 - 0.06 5.83 99% 119.08 111.67 117.25 119.11 118.57 -6.22 -1.54 0.03 -0.43 99.5% 128.66 114.35 127.07 131.5 121.98 -11.12 -1.24 1.21 -5.19

n = 100 Simul CLT Max Normex Weighted Normal (2) (3) (5) (6) q zq zq zq zq zq δ(1) (%) δ(3)(%) δ(5)(%) δ(6)(%) 95% 189.98 191.19 187.37 189.84 197.25 0.63 -1.38 -0.07 3.83 99% 210.54 201.35 206.40 209.98 209.74 -4.36 -1.96 -0.27 -0.38 99.5% 222.73 205.06 219.14 223.77 214.31 -7.93 -1.61 0.47 -3.78

n = 250 Simul CLT Max Normex Weighted Normal (2) (3) (5) (6) q zq zq zq zq zq δ(1) (%) δ(3) (%) δ(5) (%) δ(6)(%) 95% 454.76 455.44 446.53 453.92 464.28 0.15 -1.81 -0.18 2.09 99% 484.48 471.5 473.99 483.27 483.83 -2.68 -2.17 -0.25 -0.13 99.5% 501.02 477.38 492.38 501.31 490.98 -4.72 -1.73 0.06 -2.00

n = 500 Simul CLT Max Normex Weighted Normal (2) (3) (5) (6) q zq zq zq zq zq δ(1) (%) δ(3) (%) δ(5) (%) δ(6)(%) 95% 888.00 888.16 872.74 886.07 900.26 0.02 -1.72 -0.22 1.38 99% 928.80 910.88 908.97 925.19 927.80 -1.93 -2.14 -0.39 -0.11 99.5% 950.90 919.19 933.23 948.31 937.89 -3.33 -1.86 -0.27 -1.37

58 • Case α = 3

n = 52 Simul CLT Max Normex Weighted Normal (2) (3) (5) (6) q zq zq zq zq zq δ(1) (%) δ(3) (%) δ(5) (%) δ(6)(%) 95% 88.68 88.27 88.05 88.65 91.25 -0.47 -0.71 - 0.04 2.89 99% 96.76 92.53 95.30 97.00 96.7 -4.37 -1.51 0.25 - 0.06 99.5% 100.94 94.09 99.81 101.08 98.7 -6.79 -11.03 0.14 - 2.22

n = 100 Simul CLT Max Normex Weighted Normal (2) (3) (5) (6) q zq zq zq zq zq δ(1) (%) δ(3) (%) δ(5) (%) δ(6)(%) 95% 164.91 164.24 162.49 164.77 168.02 -0.41 -1.47 - 0.08 1.89 99% 175.30 170.15 171.51 175.13 175.36 -2.94 -2.16 -0.10 0.03 99.5% 180.62 172.31 177.12 180.56 178.05 -4.60 -1.94 - 0.03 -1.42

n = 250 Simul CLT Max Normex Weighted Normal (2) (3) (5) (6) q zq zq zq zq zq δ(1) (%) δ(3) (%) δ(5) (%) δ(6)(%) 95% 398.39 397.52 391.95 398.23 403.08 -0.22 -1.62 -0.04 1.18 99% 413.32 406.85 404.19 412.71 414.59 -1.56 -2.21 -0.15 0.31 99.5% 420.49 410.27 411.81 419.28 418.8 -2.43 -2.06 -0.29 -0.40

n = 500 Simul CLT Max Normex Weighted Normal (2) (3) (5) (6) q zq zq zq zq zq δ(1) (%) δ(3) (%) δ(5) (%) δ(6)(%) 95% 782.99 781.85 771.36 782.56 789.48 -0.15 -1.49 -0.05 0.83 99% 802.08 795.05 786.78 801.44 805.7 -0.88 -1.91 -0.08 0.45 99.5% 810.95 799.88 796.38 809.73 811.64 -1.35 -1.80 -0.15 0.09

59 • Case α = 4

n = 52 Simul CLT Max Normex Weighted Normal (2) (3) (5) (6) q zq zq zq zq zq δ(1) (%) δ(3) (%) δ(5) (%) δ(6)(%) 95% 75.35 74.92 74.97 75.30 75.86 -0.57 -0.49 -0.07 0.68 99% 79.11 77.24 77.81 78.92 78.54 - 5.12 -1.63 -0.25 -0.72 99.5% 80.82 78.09 79.43 80.49 79.52 -3.37 -1.73 -0.41 -1.61

n = 100 Simul CLT Max Normex Weighted Normal (2) (3) (5) (6) q zq zq zq zq zq δ(1) (%) δ(3) (%) δ(5) (%) δ(6)(%) 95% 141.58 141.09 139.98 141.52 142.26 -0.35 -1.13 -0.04 0.48 99% 146.32 144.30 143.32 146.12 145.88 -1.38 -2.05 -0.14 -0.30 99.5% 148.38 145.47 145.22 148.05 147.2 -1.96 -2.12 -0.22 -0.80

n = 250 Simul CLT Max Normex Weighted Normal (2) (3) (5) (6) q zq zq zq zq zq δ(1) (%) δ(3) (%) δ(5) (%) δ(6)(%) 95% 346.31 345.59 341.69 346.06 347.24 -0.21 -1.33 -0.07 0.27 99% 352.97 350.67 345.89 352.34 352.93 -0.65 -2.00 -0.18 -0.01 99.5% 355.74 352.53 348.28 355.22 355.01 -0.90 -2.19 -0.15 - 0.21

n = 500 Simul CLT Max Normex Weighted Normal (2) (3) (5) (6) q zq zq zq zq zq δ(1) (%) δ(3) (%) δ(5) (%) δ(6)(%) 95% 684.99 684 676.60 685.56 686.22 -0.14 -1.22 0.08 0.18 99% 693.85 691.19 681.60 695.55 694.23 -0.38 -1.77 0.25 0.05 99.5% 697. 36 693.81 684.44 698.85 697.18 - 0.51 -1.85 0.21 -0.03

60 4.2.3 Discussion of the results

• Those numerical results are subject to numerical errors due to the finite sample of simulation of the theoretical value, as well as the choice of random generators, but the most important reason for numerical error of our methods resides in the convergence of the integration methods. Thus, one should read the results, even if reported with many significant digits, to a confidence we estimate to be around 0.1%

• Concerning our first method, Normex, we find out that:

- the accuracy of the results appears more or less independent of the sample size n, which is the major advantage of our method when dealing with the issue of aggregation - for α > 2, it always gives sharp results (error less than 0.5% and often extremely close); for most of them, the estimation is indiscernible from the true value, obviously better than the ones obtained with the other methods - for α = 2, the choice k = 1 gives in general better results for this method than k = 2; in fact k = 2 is a limiting case for choosing the number of extremes to trim - for α ≤ 2, the results for the most extreme quantiles are less satisfactory than expected. We attribute this to a numerical instability in the integration procedure used in R; we plan to explore this problem further - For very large quantiles (≥ 99.5%), the convergence of the integral seems a bit more unstable (due to the use of the package Vegas in R), which may explain why the accuracy decreases a bit, and may sometimes be less than with the max method

• Our second method, the weighted normal one, gives sharp results for very extreme quantiles (above 99%) for α > 2, improving the issue of underestimation when using directly the CLT. It is often the 2nd best to Normex. It overestimates the 95% quantile, whereas the CLT is still correctly performing at this order. The accuracy increases with n.

• The max method overestimates for α < 2 and underestimates for α ≥ 2; it improves a bit for higher quantiles and α ≤ 2. It is a method that practitioners should think about, because it is very simple to use and gives already a first good approximation for the VaR (as the CLT does for the mean)

• The GCLT method (α < 2) overestimates the quantiles but improves with higher quantiles and when n increases

• Concerning the CLT method, we find out that:

61 - the higher the quantile, the higher the underestimation; it improves slightly when n increases, as expected - the smaller α, the larger the underestimation - for α ≥ 2, the VaR evaluated with the normal approximation is always lower than the VaR evaluated via Normex or the Weighted Normal method. The lower n and α, the higher the difference - the difference between the VaR estimated by the CLT and the one estimated with Normex, appears large for relatively small n, with a relative error going up to 13%, and decreases when n becomes larger - for α ≥ 2, the difference between the VaR estimated by the CLT and the one estimated with the modified CLT appears large for quantiles above 99%.

• We have concentrated our study on the VaR risk measure because it is the one used in solvency regulations both for banks and insurances. However, the Ex- pected Shortfall, which is the only coherent measure in presence of fat tails, would be more appropriate for measuring the risk of the companies.

• The difference between the VaR estimated by the CLT and the one estimated with Normex is overall not that big for large n but would certainly be much larger when the risk is measured with the Expected Shortfall, pleading for using this measure in presence of fat tails.

5 Conclusion

The main motivation of this study was to propose a sharp approximation of the entire distribution of aggregated risks when working on financial or insurance data under the presence of fat tails. In particular the aim is to obtain the most accurate evaluations of risk measures. After reviewing the existing methods, we built two new methods, Normex and Weighted Normal. Normex is a method mixing a CLT and the exact distribution for a small number (defined according to the range of α and the choice of the number of existing moments of order p) of the largest order statistics. The second approach is based on a weighted normal limit, with a shifted mean and a weighted variance, both expressed in terms of the tail distribution.

In this study, Normex has been proved, theoretically as well as numerically, to deliver a sharp approximation of the true distribution, for any sample size n and for any pos- itive tail index α, and is generally better than existing methods. The weighted normal method consists of trimming the total sum by taking away a large number of extremes, and approximating the trimmed sum with a normal distribution, then shifting it by the (almost sure) limit of the average of the extremes and correcting the variance with

62 a weight depending on the shape of the tail. It is a simple, and reasonable tool, which allows to express explicitly the tail contribution to be added to the VaR when applying the CLT to the entire sample. It has been developed empirically in this work and still requires further analytical study. It constitutes a simple and exploratory tool to remediate the underestimation of extreme quantiles over 99% .

An advantage of both methods, Normex and the weighted normal, is their generality. Indeed, trimming the total sum by taking away extremes having infinite moments (of order p ≥ 3) is always possible and allows to better approximate the distribution of the trimmed sum with a normal one (via the CLT). Moreover, fitting a normal distri- bution for the mean behavior can apply, not only for the Pareto distribution, but for any underlying distribution, without having to know about it, whereas for the extreme behavior, we pointed out that a Pareto type is standard in this context.

Normex could also be used from another point of view. We could apply it for an in- verse problem, namely for finding out a range for the tail index α when fitting this explicit mixed distribution to the empirical one. Other approaches have been proposed to estimate the heaviness of the tail, which may be put into two classes, supervised pro- cedures in which the threshold to estimate the tail is chosen according to the problem (as e.g. the MEP, Hill, or QQ methods) and unsupervised ones, where the threshold is algorithmically determined (as in the recent work [10]). Normex would be a new unsupervised approach, since the k is chosen algorithmically for a range of α.

Other perspectives concern the application of this study to real data, its extension to the dependent case, using CLT under weak dependence and some recent results on stable limits for sums of dependent infinite variance r.v. from Bartkiewicz et al. (see [3]) and Large Deviation Principles from Mikosch et al. (see [33]).

Finally this study may constitute a first step in understanding the behavior of VaR under aggregation and be helpful in analyzing the scaling behavior of VaR under ag- gregation, next important problem that we want to tackle.

63 6 Bibliography References

[1] C. Acerbi, D. Tasche, On the coherence of expected shortfall. J. Banking & Finance 26, (2002) 1487-1503.

[2] P. Arztner, Coherent measures of risks. Mathematical Finance 9, (1999) 203-228.

[3] K. Bartkiewicz, A. Jakubowski, T. Mikosch, 0. Wintenberger, Stable limits for sums of dependent infinite variance random variables. Probab. Theory Relat. Fields 150, (2012) 337-372.

[4] Basel Committee on Banking Supervision, Developments in modelling risk aggregation. Basel: Bank for International Settlements (2010).

[5] M. Blum, On the sums of independently distributed Pareto variates. SIAM J. Appl. Math. 19:1, (1970) 191-198.

[6] H. Buhlmann¨ , Paul Embrechts! http://www.risklab.ch/paul60

[7] L.H. Chen, Q.-M. Shao, Normal approximation under local dependence. Ann. Probab. 32:3, (2004) 1985-2028.

[8] S. Csorg¨ o,¨ L. Horvath,´ D. Mason What portion of the sample makes a partial sum asymp- totically stable or normal? Probab. Theory Relat. Fields 72, (1986) 1-16.

[9] G. Christoph, W. Wolf Convergence theorems with a stable limit law. Akademie-Verlag, Berlin.

[10] N. Debbabi, M. Kratz, M. Mboup, Unsupervised threshold determination for Hybrid models. Preprint 2013

[11] O. Pictet, M. Dacorogna, U.A. Muller¨ , Hill, Bootstrap and Jackknife Estimators for heavy tails, In Practical guide for heavy tails distributions. Ed. M.Taqqu, Birkh¨auser (1996).

[12] U.A. Muller,¨ M. Dacorogna, O. Pictet , Heavy tails in high frequency financial data, In Practical guide for heavy tails distributions. Ed. M.Taqqu, Birkh¨auser(1996).

[13] M. Dacorogna, R. Genc¸ay, U.A. Muller,¨ R. Olsen, O. Pictet, An introduction to High-Frequency Finance. Academic Press (2001).

[14] J. Dan´ıelsson, B. Jorgensen, G. Samorodnitsky, M. Sarma, C. de Vries (2005). Sub- additivity re-examined: the case for Value-at-Risk. FMG Discussion Papers, London School of Economics.

[15] H.A. David, H.N. Nadaraja, Order Statistics. Wiley (2003, 3rd ed.)

[16] A. Davison, R. Smith, Models for exceedances over high thresholds. J. Royal Stat. Soc. Series B 52 (3), (1990) 393-442.

[17] D. Dupuis, M.-P. Victoria-Feser, A robust prediction error criterion for Pareto modeling of upper tails. Canadian J. Stat. 34 (4), (2006) 339-358.

[18] P. Embrechts, D. Lambrigger, M. Wuthrich¨ (2009). Multivariate extremes and the ag- gregation of dependent risks: examples and counter-examples. Extremes 12, (2009) 107-127.

64 [19] P. Embrechts, G. Puccetti, L. Ruschendorf¨ , Model uncertainty and VaR aggregation. Preprint. (2012).

[20] P. Embrechts, C. Kluppelberg,¨ T. Mikosch, Modelling Extremal Events for Insurance and Finance. Springer (1997).

[21] W. Feller, An introduction to probability theory and its applications, Vol.II. Wiley (1966).

[22] R.A. Fisher, L.H.C. Tippett, Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proceedings of the Cambridge Philosophical Society 24, (1928) 180-190.

[23] H. Furrer, Uber die Konvergenz zentrierter und normierter Summen von Zufallsvariablen und ihre Auswirkungen auf die Risikomessung. ETH preprint (2012).

[24] S. Gosh, S.Resnick, A discussion on mean excess plots. Stoch. Proc. Appl. 120, (2010) 1492- 1517.

[25] T. Goudon, S. Junca, G. Toscani, Fourier-Based Distances and Berry-Esseen Like Inequal- ities for Smooth Densities. Monatshefte f¨urMathematik 135, (2002) 115-136

[26] M.G. Hahn, D.M. Mason, D.C. Weiner , Sums, Trimmed Sums and Extremes. Progress in Probability 23 Birkh¨auser(1991).

[27] P. Hall, On the influence of extremes on the rate of convergence in the central limit theorem. Ann. Probab. 12, (1984) 154-172.

[28] B. Hill, A simple approach to inference about the tail of a distribution. Ann. Statist. 3, (1975) 1163-1174.

[29] J. Hosking, J. Wallis, Parameter and quantile estimation for the Generalized Pareto distribution. Technometrics 29(3), (1987) 339-349.

[30] V.Y. Korolev, I.G. Shevtsova. On the upper bound for the absolute constant in the Berry- Esseen inequality. J. Theory of Probab. Applic. 54:4, (2010) 638-658.

[31] M. Kratz, S. Resnick. The QQ-estimator and heavy tails. Stoch. Models 12, (1996) 699-724.

[32] T. Mikosch, Non-life Insurance Mathematics. Springer (2004).

[33] T. Mikosch, O. Wintenberger, Precise large deviations for dependent regularly varying sequences. Preprint (2012).

[34] M. Guillén, F. Prieto, J.M. Sarabia, Modelling losses and locating the tail with the Pareto Positive Stable distribution. Insurance: Math. and Economics 49, (2011) 454-461.

[35] T. Mori, On the limit distribution of lightly trimmed sums. Math. Proc. Camb. Phil. Soc. 96, (1984) 507-516.

[36] A. McNeil, R. Frey, P. Embrechts, Quantitative Risk Management. Princeton (2005).

[37] C. Perignon,´ D.R. Smith, The level and quality of Value-at-Risk disclosure by commercial banks. J. Banking Finance 34, (2010) 362-377.

[38] V. V. Petrov, A Local Theorem for Densities of Sums of Independent Random Variables. J. Theory of Probab. Applic. 1:3, (1956) 316-322.

[39] V.V. Petrov, Limit Theorems of Probability Theory: Sequences of Independent Random Variables. Oxford Science Publications (1995).

[40] J. Pickands, Statistical inference using extreme order statistics. Ann. Stat. 3, (1975) 119-131.

[41] I. Pinelis, On the nonuniform Berry-Esseen bound. arXiv:1301.2828 (2013).

[42] C.M. Ramsay, The distribution of sums of certain i.i.d. Pareto variates. Communications in Stat. - Theory and Methods 35:3, (2006) 395-405.

[43] S. Resnick, Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. Springer (2006).

[44] G. Samorodnitsky, M.S. Taqqu, Stable non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall (1994).

[45] I.G. Shevtsova, On the absolute constants in the Berry-Esseen type inequalities for identically distributed summands. Abstracts of the XXX Seminar on Stability Problems for Stochastic Models, Moscow (2012); arXiv:1111.6554v1 (2011).

[46] C. Stein. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. Proc. Sixth Berkeley Symp. Math. Statist. Probab. 2, (1972) 583-602.

[47] C. Stein, Approximate Computation of Expectations. IMS Lecture Notes - Monograph Series, vol. 7 (1986).

[48] I.S. Tyurin. An improvement of upper estimates of the constants in the Lyapunov theorem. Russian Mathematical Surveys 65:3, (2010) 201-202.

[49] I.V. Zaliapin, Y.Y. Kagan, F.P. Schoenberg, Approximating the distribution of Pareto sums. Pure Appl. Geophysics 162, (2005) 1187-1228.

7 Appendix

7.1 Numerical study of K(α, n)

We refer here to Section 3.2.2, where we investigate analytically the quality of the approximation, via Normex, of the distribution of the Pareto sum $S_n$.

We want to study the behavior of the function
$$\frac{n^{1-1/\alpha}}{\beta}\,\left(1-y^{-\alpha}\right)^{n-1}\,C(y)\,y^{-\alpha+\beta}$$
(see (53)), since we have to integrate it over (1, x]. It is not straightforward to handle it in closed form, hence we study it numerically (for any α ∈ (2, 4]).

As a function of y, this function first increases and then decreases. It also appears to decrease as n, β and α increase, respectively.
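To locate this maximum numerically, one may for instance apply R's optimize() to the function above; the following is a minimal sketch for the case k = 1 (assuming α ∈ (2, 4] with α ≠ 3, and β = 0.1α), where Cy and Kfun are helper names introduced here that simply restate the closed form of C(y) implemented in the code of Section 7.3:

# minimal sketch: location of the maximum of the function above (k = 1, alpha != 3)
Cy <- function(a, y) {
  mu     <- ((1 - y^(1-a))/(1 - y^(-a)))/(1 - 1/a)
  gamma2 <- ((1 - y^(2-a))/(1 - 2/a) - ((1 - y^(1-a))^2/(1 - y^(-a)))/(1 - 1/a)^2)/(1 - y^(-a))
  ECube  <- (y^(3-a)/(3-a) + 3*mu*y^(2-a)/(a-2) - 3*mu^2*y^(1-a)/(a-1) + mu^3*y^(-a)/a +
             12*mu^(3-a)/(a*(a-1)*(a-2)*(a-3)) + mu^3/a - 3*mu^2/(a-1) + 3*mu/(a-2) -
             1/(a-3)) * a/(1 - y^(-a))
  ECube/gamma2^(3/2)
}
Kfun <- function(y, a, n, b) n^(1-1/a) * (1 - y^(-a))^(n-1) * Cy(a, y) * y^(-a+b)/b
opt <- optimize(Kfun, interval = c(1.001, 200), maximum = TRUE, a = 2.5, n = 52, b = 0.25)
opt$maximum    # location of the maximum in y
opt$objective  # value of the maximum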

Here are the various plots we obtained, for different n and α.

[Figures: the function plotted against y for n = 52, 100, 250 and 500, with β = 0.1α, in six panels corresponding to α = 2.01, 2.1, 2.5, 3, 3.5 and 4; a last panel compares the curves for these six values of α at fixed n = 500.]

7.2 Results concerning VaR

The results shown in the various tables are obtained by sampling the cdfs numerically via Normex. To obtain the value at a given threshold, we use, in most cases, a local linear interpolation between adjacent points around the threshold. When the noise is too large, we choose a linear interpolation with a larger step, based on the plot of the data, so as to minimize the noise.
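As an illustration, here is a minimal sketch of such an inversion by linear interpolation with R's approx(); the grid xx and the cdf values Gx are hypothetical placeholders for the output of Normex:

# xx: grid of points x; Gx: values of the sampled cdf on this grid (placeholders)
xx <- seq(85, 105, by = 0.5)
Gx <- pnorm(xx, mean = 95, sd = 4)
q  <- 0.99
VaR.q <- approx(x = Gx, y = xx, xout = q)$y   # linear interpolation of the inverse cdf
VaR.q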

For the benchmark values of VaR based on simulation, we evaluate the numerical noise by changing the seed of the random generator, using 10^7 simulations. The results are of course not exact: the noise increases with the threshold and decreases with the aggregation size. We tested this by running the simulation 12 times for α = 3. The benchmark VaR reported in the tables of Section 4.2.2 for α = 3 is the average over these 12 runs. In Table 4, we give the Coefficient of Variation (CV) as well as the minimum and maximum reached over those runs, for the different thresholds and aggregation sizes.

We choose to investigate the stability of the simulation numerically, although we could have computed the theoretical bounds given in (69). The main reason is that we also want to explore the stability of the random generator and the precision of the computation. Nevertheless, we see that the values in Table 4 follow the same behavior as the theoretical bounds, namely they increase with the threshold q and decrease with the aggregation size n.
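For illustration, here is a sketch of how such a stability check over 12 seeds can be reproduced (with a reduced number of simulations, 10^5, whereas 10^7 were used for Table 4):

# CV, minimum and maximum of a simulated quantile of the Pareto sum over 12 seeds
library(actuar)
alpha <- 3; k <- 1; n <- 52; q <- 0.995
N <- 10^5                                # reduced here; 10^7 in the study
runs <- sapply(1:12, function(seed){
  set.seed(seed)
  S <- colSums(matrix(rpareto1(n*N, alpha, k), nrow = n))
  quantile(S, probs = q)
})
c(CV = sd(runs)/mean(runs), min = min(runs), max = max(runs))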

Table 4: Descriptive statistics for the 12 runs of simulation for the Pareto sum (with α = 3)

   q        n = 52              n = 100               n = 250               n = 500
          CV     min/max       CV      min/max        CV      min/max        CV      min/max
  95%    0.20%  88.40/89.04   0.14%   164.30/165.29  0.06%   398.08/398.82  0.07%   782.21/784.07
  99%    0.60%  95.78/97.66   0.34%   174.37/176.27  0.15%   411.92/414.30  0.12%   800.19/803.71
  99.5%  1.06%  99.02/103.45  0.56%   179.23/182.64  0.38%   418.24/422.85  0.21%   807.44/813.78

7.3 R codes

Here are the various R codes used for the numerical study of the first method, Normex, namely:

• to simulate the Pareto sums to obtain the benchmark quantiles against which the quantiles estimated through various methods are measured. This code is taken from the previous study graciously provided by H. Furrer (see [23]).

• to produce quantile estimations with known methods

• to compute quantile estimations with Normex, together with the R code exploring the properties of the function appearing in (53), which has to be maximized in the analytical comparison provided for Normex, namely
$$\frac{n^{1-1/\alpha}}{\beta}\,\left(1-y^{-\alpha}\right)^{n-1}\,C(y)\,y^{-\alpha+\beta}.$$

Note that these codes could certainly be improved, both in their presentation and in their time-efficiency.

##########################################
#
# simulation of Pareto sums, tail index alpha (> 2)
#
##########################################

# load package 'actuar'
library(actuar)

seed <- 56
set.seed(seed)
alpha = 3                                      # shape parameter of the Pareto distribution
beta  = 1                                      # skewness parameter of the (stable) limit distribution
k     = 1                                      # lower bound of the support of the Pareto distribution
mu    = (alpha * k)/(alpha - 1)                # mean value
var   = (alpha * k^2)/((alpha-1)^2*(alpha-2))  # variance
n     = c(1,12,52,100,250,500)                 # numbers of summands

N = 10^7   # number of simulations
m = 10^4   # batch size

# a.n = sqrt(var)*sqrt(n)          # normalising constants
# b.n = (n * alpha * k)/(alpha-1)
levels = c(0.95,0.99,0.995)
data.matrix = matrix(0, nrow=length(n), ncol=N)
rownames(data.matrix) = c('1','12','52','100','250','500')

#
# Note: the case n=1 can be determined analytically; no simulation is required!
# The 'for' loop therefore starts at s=2.
#
for (s in 2:length(n)){

  Y        <- matrix(rpareto1(n[s]*m, alpha, k), nrow=n[s], ncol=m)
  Y.star.n <- apply(Y, 2, sum)
  out      <- Y.star.n

  for (i in 1:((N/m)-1)){
    Y        = matrix(rpareto1(n[s]*m, alpha, k), nrow=n[s], ncol=m)
    Y.star.n = apply(Y, 2, sum)
    out      = c(out, Y.star.n)   # accumulate the batch of sums
  }
  data.matrix[s,] = out
}

quantiles = matrix(0, nrow=length(n), ncol=length(levels))
dimnames(quantiles) = list(c('1','12','52','100','250','500'), c('95','99','99.5'))
for (s in 2:length(n)) {
  quantiles[s,] = quantile(data.matrix[s,], probs = levels)
}
quantiles[1,] = qpareto1(levels, alpha, k)   # n=1: analytical quantiles
print(cbind(alpha, N, seed))
VaR = rbind(quantiles)
round(VaR, 3)

############################
#
# Computation of the quantiles with existing methods
#
############################

# load packages 'actuar' and 'stabledist'
library(actuar)
library(stabledist)

############################################
# quantiles using the GCLT for alpha <= 2:
#   z_q^(1) = (n*C_alpha)^(1/alpha) * G_alpha^{-1}(q) + b_n    (alpha < 2)
#   z_q^(3) = d_n * Phi^{-1}(q) + b_n,  with b_n = mun
############################################
alpha      <- c(3/2, 2)
conflevels <- c(0.95,0.99,0.995)

# give n
n <- 250
zq1 <- matrix(data=NA, nrow=length(conflevels), ncol=length(alpha))
mun <- numeric(length(alpha))
sdn <- numeric(length(alpha))

# computation of the normalizing constant d_n in the case alpha = 2
fa <- function (x, r) { 2*r * log(x) / x^2 }

# d.n <- rep(0,length(n))

# for (j in 2:length(n)){
d <- 1.1
while (fa(d, n) > 1) {
  d <- d + 0.001
}
print(d)

# d.n[j] <- x
# }
# d.n[1] <- 1
for (t in 1:length(alpha)){
  mun[t] <- n*alpha[t]/(alpha[t] - 1)
}
Ca     <- (gamma(1-alpha[1])*cos(pi*alpha[1]/2))^(1/alpha[1])
sdn[1] <- Ca*n^(1/alpha[1])
sdn[2] <- d

# print(cbind(mun[1],sdn[1]))
# print(cbind(mun[2],sdn[2]))
for (s in 1:length(conflevels)){
  print(conflevels[s])
  zq1[s,1] <- sdn[1]*qstable(conflevels[s], alpha[1], 1, 1, pm=0) + mun[1]
  print(cbind(zq1[s,1], sdn[1], mun[1]))
  zq1[s,2] <- sdn[2]*qnorm(conflevels[s], mean=0, sd=1, lower.tail=TRUE, log.p=FALSE) + mun[2]
  print(cbind(zq1[s,2], sdn[2], mun[2]))
}
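# An alternative sketch for the normalizing constant in the case alpha = 2:
# instead of the step-wise search above, one may solve 2*n*log(d)/d^2 = 1
# directly with uniroot(); fa1 is a shifted variant of fa introduced here
# (the root sought is the larger of the two solutions, as in the search above)
fa1 <- function(x, r) { 2*r*log(x)/x^2 - 1 }
d <- uniroot(fa1, interval = c(1.1, 10^4), r = n)$root
print(d)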

# computation of the approximate relative error RE = Delta,
# where zq denotes the matrix of benchmark quantiles obtained by simulation

RE <- matrix(data=NA, nrow=length(conflevels), ncol=length(alpha))
for (t in 1:length(alpha)){
  for (s in 1:length(conflevels)){
    RE[s,t] <- zq1[s,t]/zq[s,t] - 1
  }
}

######################################
# quantiles using the Max method:
#   z_q^(3) = n^(1/alpha) * (log(1/q))^(-1/alpha) + b_n
######################################
alpha      <- c(3/2, 2, 5/2, 3, 4)
conflevels <- c(0.95,0.99,0.995)

# give n
n <- 250
zq3 <- matrix(data=NA, nrow=length(conflevels), ncol=length(alpha))
mun <- numeric(length(alpha))
sdn <- numeric(length(alpha))
for (t in 1:length(alpha)){
  mun[t] <- n*alpha[t]/(alpha[t] - 1)
  for (s in 1:length(conflevels)){
    zq3[s,t] <- n^(1/alpha[t])*(log(1/conflevels[s]))^(-1/alpha[t]) + mun[t]
  }
}

# computation of the approximate relative error RE = Delta, in percent
alpha <- c(3/2, 2, 5/2, 4)

# give n
n <- 500
zq3 <- matrix(data=NA, nrow=length(conflevels), ncol=length(alpha))
mun <- numeric(length(alpha))
sdn <- numeric(length(alpha))
for (t in 1:length(alpha)){
  mun[t] <- n*alpha[t]/(alpha[t] - 1)
  for (s in 1:length(conflevels)){
    zq3[s,t] <- n^(1/alpha[t])*(log(1/conflevels[s]))^(-1/alpha[t]) + mun[t]
  }
}

RE <- matrix(data=NA, nrow=length(conflevels), ncol=length(alpha))
for (t in 1:length(alpha)){
  for (s in 1:length(conflevels)){
    RE[s,t] <- (zq3[s,t]/zq[s,t] - 1)*100
  }
}

############################################
# quantiles using the CLT for alpha > 2:
#   z_q^(1) = sqrt(n*alpha)/((alpha-1)*sqrt(alpha-2)) * Phi^{-1}(q) + n*alpha/(alpha-1)
############################################
alpha      <- c(5/2, 3, 4)
conflevels <- c(0.95,0.99,0.995)

# give n
n <- 250
zq3 <- matrix(data=NA, nrow=length(conflevels), ncol=length(alpha))
mun <- numeric(length(alpha))
sdn <- numeric(length(alpha))

# for (s in 1:length(conflevels)){
#   print(cbind(conflevels[s], qnorm(conflevels[s], mean=0, sd=1, lower.tail=TRUE, log.p=FALSE)))
# }
for (s in 1:length(conflevels)){
  for (t in 1:length(alpha)){
    mun[t]   <- n*alpha[t]/(alpha[t] - 1)
    sdn[t]   <- sqrt(n*alpha[t]/((alpha[t]-1)^2*(alpha[t]-2)))
    zq3[s,t] <- sdn[t]*qnorm(conflevels[s], mean=0, sd=1, lower.tail=TRUE, log.p=FALSE) + mun[t]
  }
}

############################
#
# Pareto tail index alpha (> 1/4)
#
# Computation of the quantiles using the 1st method (normal + extremes limit)
#
############################

# load the required packages
library(actuar)
library(stabledist)
library(R2Cuba)
library(distr)
library(distrEx)
library(RobAStRDA)
library(rrcov)
library(RobAStBase)
library(pcaPP)
library(mvtnorm)
library(ROptEst)
library(evd)
library(RobExtremes)

# value of the aggregation size n
n <- 100
# value of alpha
alpha <- 3/2

# condition ensuring the existence of the p-th moment of the first n-k order
# statistics (condition (35) in the report)
p <- 4

########
# Computation of k, the fixed (and small) number of upper order statistics
# removed from the total Pareto sum to obtain a trimmed sum
########
ka = function(alpha, p){
  return(floor((p/alpha) - 1) + 1)
}
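# usage examples (values implied by the definition of ka above):
ka(3, 4)    # 1: for alpha = 3, one upper order statistic is removed
ka(3/2, 4)  # 2: for a heavier tail, two upper order statistics are removed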

########
# Computation of the moments of the normal approximation of the trimmed sum,
# when removing the k largest order statistics
########

########
# definition of the function mOney(alpha,n,k,y), for y > 1:
#   if alpha != 1: m_1(alpha,n,k,y) = (n-k)/(1-1/alpha) * (1-y^(1-alpha))/(1-y^(-alpha))
#   if alpha == 1: m_1(alpha,n,k,y) = (n-k) * log(y)/(1-y^(-alpha))
########
mOney = function(alpha, n, k, y){
  result = 0
  if (alpha != 1) {
    result = (n-k)/(1-1/alpha)*((1-y^(1-alpha))/(1-y^(-alpha)))
  } else {
    result = (n-k)*(log(y)/(1-y^(-alpha)))
  }
  return(result)
}

########
# definition of the function mTwoy(alpha,n,k,y), the second conditional moment
# given in the report
########
mTwoy = function(alpha, n, k, y){
  result1 = result2 = result = 0
  if (alpha != 2) {
    result2 = (1/(1-2/alpha))*((1-y^(2-alpha))/(1-y^(-alpha)))
  } else {
    result2 = 2*(log(y)/(1-y^(-alpha)))
  }
  if (alpha != 1) {
    result1 = result2 + ((n-k-1)/((1-1/alpha)^2))*(((1-y^(1-alpha))^2)/((1-y^(-alpha))^2))
  } else {
    result1 = result2 + (n-k-1)*((log(y))^2/(1-y^(-alpha))^2)
  }
  return((n-k)*result1)
}
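# A quick Monte Carlo sanity check (a sketch) of mOney: by construction,
# m_1(alpha,n,k,y) = (n-k) * E[X | X <= y] for X Pareto(alpha) on (1,Inf);
# the .chk variables are throwaway names used only for this check
alpha.chk <- 2.5; n.chk <- 52; k.chk <- 1; y.chk <- 10
X.chk <- rpareto1(10^6, alpha.chk, 1)
print((n.chk - k.chk) * mean(X.chk[X.chk <= y.chk]))   # empirical value
print(mOney(alpha.chk, n.chk, k.chk, y.chk))           # closed form above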

########################
# Case k=1:
# Computation of the cdf of the mixed distribution G_{n,alpha,1} given in Th 2.1:
#   G_{n,alpha,1}(x) = int_1^x f_(n)(y)/sigma(y) * int_0^{x-y} phi((v-m_1(y))/sigma(y)) dv dy
########################

###
# definition of the function fquasimax(x,alpha,n) = x^(-(alpha+1)) * (1-x^(-alpha))^(n-1),
# i.e. 1/(n*alpha) times the density of max(X_i)
###
fquasimax <- function(x, alpha, n){
  h = x^(-(alpha+1))*((1-x^(-alpha))^(n-1))
  return(h)
}

# we will have to multiply fquasimax by n*alpha/sigma(y) to get the first part
# of the integrand

###
# Computation of the analytical convolution via integration, using vegas, for a given x
###
integrand <- function(arg, weight){
  y  <- arg[1]
  v  <- arg[2]
  sy <- sqrt(mTwoy(alpha,n,1,y) - mOney(alpha,n,1,y)^2)   # sigma(y)
  ff <- (fquasimax(y,alpha,n)/sy) * dnorm((v - mOney(alpha,n,1,y))/sy) * ((v <= x-y)*1)
  return(n*alpha*ff)
}

# give x
k  <- ka(alpha, p)
xx <- c(85,86,87,88,89,90,95,96,97,98,99,100,101,102,103)
h  <- length(xx)
z  <- as.vector(c(1:h), mode="numeric")
print(cbind(alpha, p, n, k))
for (i in 1:h){
  x <- xx[i]
  result <- vegas(2, 1, integrand, lower=c(1,0), upper=c(x,x),
                  rel.tol=1e-3, abs.tol=1e-12, flags=list(verbose=0))
  z[i] = result[4]
  print(cbind(i, xx[i], z[i]))
}
plot(xx, z, type='l', xaxs='i', yaxs='i', ylab="", xlab="", ylim=c(0,1))
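# A sketch of an alternative for the case k = 1: the inner integral in v is a
# difference of normal cdf values, so the double integral reduces to a
# one-dimensional one handled by integrate(); G.k1 is a helper name
G.k1 <- function(x, alpha, n){
  f <- function(y){
    m1 <- mOney(alpha, n, 1, y)
    s  <- sqrt(mTwoy(alpha, n, 1, y) - m1^2)
    n*alpha*fquasimax(y, alpha, n) * (pnorm((x - y - m1)/s) - pnorm(-m1/s))
  }
  integrate(f, lower = 1, upper = x, rel.tol = 1e-6)$value
}
# e.g. G.k1(x, alpha, n) can be compared with the vegas result above for the same x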

########################
# Case k>1:
# Computation of the cdf of the mixed distribution G_{n,alpha,k} given in Th 2.1:
#   G_{n,alpha,k}(x) = int_1^x f_(n-k+1)(y) * int_0^{x-y} phi_{m_1(y),sigma(y)} * h_y^{*(k-1)}(v) dv dy
########################
fnk <- function(y, alpha, nn, k){
  prod = 1
  for (p in 0:(k-1)){
    prod = prod*(nn-p)/(k-p)
  }
  if (prod < 0) prod <- Inf   # test condition on a and k
  return(k*prod*alpha*(1-y^(-alpha))^(nn-k)*y^(-k*alpha-1))
}

integrand1 <- function(arg, weight){   # case k = 2: one Pareto term plus the normal
  y   <- arg[1]
  v   <- arg[2]
  m1y <- mOney(alpha, nn, k, y)
  m2y <- mTwoy(alpha, nn, k, y)
  X <- Pareto(shape=alpha, Min=y)
  Y <- Norm(mean=m1y, sd=sqrt(m2y-m1y^2))
  S <- X + Y
  ff <- (fnk(y,alpha,nn,k)*d(S)(v)*((v <= x-y)*1))
  return(ff)
}

integrand2 <- function(arg, weight){   # case k > 2: k-1 Pareto terms plus the normal
  y   <- arg[1]
  v   <- arg[2]
  m1y <- mOney(alpha, nn, k, y)
  m2y <- mTwoy(alpha, nn, k, y)
  X <- Pareto(shape=alpha, Min=y)
  Y <- Norm(mean=m1y, sd=sqrt(m2y-m1y^2))
  S <- X + Y
  for (i in 2:(k-1)){                  # add the remaining k-2 Pareto terms
    S <- S + X
  }
  ff <- (fnk(y,alpha,nn,k)*d(S)(v)*((v <= x-y)*1))
  return(ff)
}

# first choose k
k <- ka(alpha, p)
if (k == 2){
  integrand <- integrand1
}
if (k > 2){
  integrand <- integrand2
}

# define the grid of x values
xx <- c(3750,3800,3850,3900,3950,4000,4050,2780,2770,2760,2750)
h  <- length(xx)
z  <- as.vector(c(1:h), mode="numeric")
nn <- 0
for (s in 1:length(n)){
  nn <- n[s]                     # convert the vector n into a number
  print(cbind(alpha, p, nn, k))  # print the data
  for (i in (1:h)){
    x <- xx[i]
    result <- vegas(2, 1, integrand, lower=c(1,0), upper=c(x,x),
                    rel.tol=1e-3, abs.tol=1e-12, flags=list(verbose=0))
    z[i] = result[4]
  }
  # print(cbind(xx,z))
  # if (s==1) {
  #   plot(xx,z, type='l', xaxs='i', yaxs='i', ylab="", xlab="", ylim=c(0,1))
  # } else {
  #   lines(xx,z, type='l', col="red")
  # }
}

#######################
# Exploring the properties of the function to be maximized in the analytical comparison
# Method 1
# Estimation of K(a,n)
#######################

##
# function C(y)
##

C = function(a, y){
  mu     <- ((1-y^(1-a))/(1-y^(-a)))/(1-1/a)
  gamma2 <- ((1-y^(2-a))/(1-2/a)-((1-y^(1-a))^2/(1-y^(-a)))/(1-1/a)^2)/(1-y^(-a))
  if (a == 3){
    e1 = log(y)
    e5 = -11/3 - 2*log(mu)
    e9 = 0
  } else {
    e1 = y^(3-a)/(3-a)
    e5 = 12*mu^(3-a)/(a*(a-1)*(a-2)*(a-3))
    e9 = 1/(a-3)
  }
  e2 = 3*mu*y^(2-a)/(a-2)
  e3 = 3*mu^2*y^(1-a)/(a-1)
  e4 = mu^3*y^(-a)/a
  e6 = mu^3/a
  e7 = 3*mu^2/(a-1)
  e8 = 3*mu/(a-2)
  ECube <- (e1+e2-e3+e4+e5+e6-e7+e8-e9)*a/(1-y^(-a))
  return(ECube/gamma2^(3/2))
}

##
# function C(y)/gamma_y
##
Cg = function(a, y){
  mu     <- ((1-y^(1-a))/(1-y^(-a)))/(1-1/a)
  gamma2 <- ((1-y^(2-a))/(1-2/a)-((1-y^(1-a))^2/(1-y^(-a)))/(1-1/a)^2)/(1-y^(-a))
  if (a == 3){
    e1 = log(y)
    e5 = -11/3 - 2*log(mu)
    e9 = 0
  } else {
    e1 = y^(3-a)/(3-a)
    e5 = 12*mu^(3-a)/(a*(a-1)*(a-2)*(a-3))
    e9 = 1/(a-3)
  }
  e2 = 3*mu*y^(2-a)/(a-2)
  e3 = 3*mu^2*y^(1-a)/(a-1)
  e4 = mu^3*y^(-a)/a
  e6 = mu^3/a
  e7 = 3*mu^2/(a-1)
  e8 = 3*mu/(a-2)
  ECube <- (e1+e2-e3+e4+e5+e6-e7+e8-e9)*a/(1-y^(-a))
  return(ECube/gamma2^2)
}

##
# function to be maximized; choose b < k*a
##
FMax = function(a, b, y, n, k){
  if (k > 1){
    return(Cg(a,y)*(1-y^(-a))^(n-k)*y^(-a*k+b))
  } else {
    return(C(a,y)*(1-y^(-a))^(n-1)*y^(-a+b))
  }
}

ka = function(a, p){
  return(floor((p/a)-1)+1)
}

# choice of alpha (k will depend on this choice)
a <- 2.01
###
p <- 4
k <- ka(a, p)
# choice of an arbitrary beta
b <- 0.1*k*a

n    <- 52
n100 <- 100
n250 <- 250
n500 <- 500

##
# we look at the function n^(1-1/a) * function(y) / beta
# variation with n
##
h    <- 1000
step <- 0.2
yy <- as.vector(c(1:h), mode="numeric")
u  <- as.vector(c(1:h), mode="numeric")
v  <- as.vector(c(1:h), mode="numeric")
w  <- as.vector(c(1:h), mode="numeric")
z  <- as.vector(c(1:h), mode="numeric")
counter <- 1.0001 - step
for (i in (1:h)){
  counter <- counter + step
  yy[i] <- counter
  z[i] <- n^(1-1/a)   *FMax(a,b,counter,n,k)/b
  u[i] <- n100^(1-1/a)*FMax(a,b,counter,n100,k)/b
  v[i] <- n250^(1-1/a)*FMax(a,b,counter,n250,k)/b
  w[i] <- n500^(1-1/a)*FMax(a,b,counter,n500,k)/b
}
vmax <- c(max(z), max(u), max(v), max(w))
xmax <- c(yy[which.max(z)], yy[which.max(u)], yy[which.max(v)], yy[which.max(w)])  # locations of the maxima
max  <- max(vmax)
nn   <- c(n, n100, n250, n500)
print(cbind(a))
print(cbind(nn, xmax, vmax))
plot(yy, z, type='l', xaxs='i', yaxs='i', ylab="", xlab="", ylim=c(0, max*1.1))
text(yy[which.max(z)]+step*30, 1.02*max(z), labels="n=52", cex=0.7)
# change the label according to the choice of alpha (a) and the value of beta (b);
# k is not displayed since k=1 here
text(max(yy)*2/3, 0.7*max(z), labels="alpha=2.01, beta=0.1*alpha", cex=0.9)
lines(yy, u, type='l', col="green2")
text(yy[which.max(u)]+step*52, 1.02*max(u), col="green2", labels="n=100", cex=0.7)
lines(yy, v, type='l', col="red")
text(yy[which.max(v)]+step*55, 1.02*max(v), col="indianred2", labels="n=250", cex=0.7)
lines(yy, w, type='l', col="blue")
text(yy[which.max(w)]+step*60, 1.02*max(w), col="blue", labels="n=500", cex=0.7)

##
# without the multiplication by n^(1-1/a)
##
counter <- 0.5001
for (i in (1:h)){
  counter <- counter + step
  yy[i] <- counter
  z[i] <- FMax(a,b,counter,n,k)/b
  u[i] <- FMax(a,b,counter,n100,k)/b
  v[i] <- FMax(a,b,counter,n250,k)/b
  w[i] <- FMax(a,b,counter,n500,k)/b
}
vmax <- c(max(z), max(u), max(v), max(w))
max  <- max(vmax)
plot(yy, z, type='l', xaxs='i', yaxs='i', ylab="", xlab="", ylim=c(0, max*1.1))
text(300, 0.7*max(z), labels="alpha=3/2", cex=0.7)
lines(yy, u, type='l', col="indianred4")
lines(yy, v, type='l', col="indianred3")
lines(yy, w, type='l', col="indianred2")

##
# variation with alpha
##
a201 <- 2.01
a2   <- 2.1
a52  <- 5/2
a3   <- 3
a35  <- 3.5
a4   <- 4
k201 <- ka(a201, p)
k2   <- ka(a2, p)
k52  <- ka(a52, p)
k3   <- ka(a3, p)
k35  <- ka(a35, p)
k4   <- ka(a4, p)
b201 <- 0.1*k201*a201
b2   <- 0.1*k2*a2
b52  <- 0.1*k52*a52
b3   <- 0.1*k3*a3
b35  <- 0.1*k35*a35
b4   <- 0.1*k4*a4
h    <- 1000
step <- 0.05
yy <- as.vector(c(1:h), mode="numeric")
u  <- as.vector(c(1:h), mode="numeric")
v  <- as.vector(c(1:h), mode="numeric")
w  <- as.vector(c(1:h), mode="numeric")
ww <- as.vector(c(1:h), mode="numeric")
xx <- as.vector(c(1:h), mode="numeric")
z  <- as.vector(c(1:h), mode="numeric")
counter <- 1.0001 - step
for (i in (1:h)){
  counter <- counter + step
  yy[i] <- counter
  z[i]  <- n^(1-1/a201)*FMax(a201,b201,counter,n,k201)/b201
  u[i]  <- n^(1-1/a2)  *FMax(a2,b2,counter,n,k2)/b2
  v[i]  <- n^(1-1/a52) *FMax(a52,b52,counter,n,k52)/b52
  w[i]  <- n^(1-1/a3)  *FMax(a3,b3,counter,n,k3)/b3
  ww[i] <- n^(1-1/a35) *FMax(a35,b35,counter,n,k35)/b35
  xx[i] <- n^(1-1/a4)  *FMax(a4,b4,counter,n,k4)/b4
}
vmax <- c(max(z), max(u), max(v), max(w), max(ww), max(xx))
xmax <- c(yy[which.max(z)], yy[which.max(u)], yy[which.max(v)],
          yy[which.max(w)], yy[which.max(ww)], yy[which.max(xx)])
alpha <- c(a201, a2, a52, a3, a35, a4)
print(cbind(n))
print(cbind(alpha, xmax, vmax))
max <- max(vmax)
plot(yy, z, type='l', xaxs='i', yaxs='i', ylab="", xlab="", ylim=c(0, max*1.1))
text(max(yy)*2/3, 0.7*max, labels="n=52, beta=0.1*k*alpha", cex=0.9)
text(yy[which.max(z)], 1.05*max(z), labels="alpha=2.01", cex=0.7)
lines(yy, u, type='l', col=118)
text(yy[which.max(u)], 1.04*max(u), col=118, labels="alpha=2.1", cex=0.7)
lines(yy, v, type='l', col="green2")
text(yy[which.max(v)], 1.04*max(v), col="green2", labels="alpha=2.5", cex=0.7)
lines(yy, w, type='l', col="indianred2")
text(yy[which.max(w)], 1.03*max(w), col="indianred2", labels="alpha=3", cex=0.7)
lines(yy, ww, type='l', col="blue")
text(yy[which.max(ww)], 1.03*max(ww), col="blue", labels="alpha=3.5", cex=0.7)
lines(yy, xx, type='l', col="magenta")
text(yy[which.max(xx)], 1.03*max(xx), col="magenta", labels="alpha=4", cex=0.7)