ACTUARIAL MODELLING OF EXTREMAL EVENTS USING TRANSFORMED GENERALIZED EXTREME VALUE DISTRIBUTIONS AND GENERALIZED PARETO DISTRIBUTIONS

A Thesis

Presented in Partial Fulfillment of the Requirements for

the Degree Doctor of Philosophy in the

Graduate School of the Ohio State University

By

Zhongxian Han, B.S., M.A.S.

*****

The Ohio State University 2003

Doctoral Examination Committee:

Bostwick Wyman, Advisor Approved by Peter March

Gerald Edgar Advisor Department of Mathematics ABSTRACT

In 1928, (EVT) originated in work of Fisher and Tippett de- scribing the behavior of maximum of independent and identically distributed random variables. Various applications have been implemented successfully in many fields such as: actuarial science, , climatology, engineering, and economics and

finance.

This paper begins with introducing examples that extreme value theory comes to encounter. Then classical results from EVT are reviewed and the current research approaches are introduced. In particular, statistical methods are emphasized in detail for the modeling of extremal events. A case study of hurricane damages over the last century is presented using the “excess over threshold” (EOT) method.

In most actual cases, the range of the data collected is finite with an upper bound while the fitted Generalized Extreme Value (GEV) and Generalized Pareto (GPD) distributions have infinite tails. Traditionally this is treated as trivial based on the assumption that the upper bound is so large that no significant result is affected when it is replaced by infinity. However, in certain circumstances, the models can be im- proved by implementing more specific techniques. Different transforms are introduced to rescale the GEV and GPD distributions so that they have finite supports.

ii All classical methods can be applied directly to transformed models if the upper bound is known. In case the upper bound is unknown, we set up models with one additional parameter based on transformed distributions. Properties of the transform functions are studied and applied to find the cumulative density functions (cdfs) and probability density functions (pdfs) of the transformed distributions. We characterize the transformed distribution from the plots of their cdfs and mean residual life. Then we apply our findings to determine which transformed distribution should be used in the models. At the end some results of parameter estimation are obtained through the maximum likelihood method.

iii Dedicated to

My Parents

iv ACKNOWLEDGMENTS

I am deeply indebted to my advisor Dr. Wyman, for his general and his patience over these past years. Particularly he always gives me directions for searching a field which I am really interested in.

I thank Dr. Lijia Guo from University of Central Florida. She has my gratitude both for introducing me the topic of this thesis and directing me for supporting materials. My discusss with her were very helpful in deciding the final topic.

I thank the members of my Examination Committee (Dr. Bostwick Wyman, Peter

March, and Gerald Edgar of the Department of Mathematics and Dr. Mario Peruggia of the Department of ) for their comments, criticisms and suggestions.

Also, I thank my girlfirend, Yun Fu for being a good listener. Also her encour- agement and enlightment helped me to finish this paper more easily.

This document was prepared using -LATEX, the American Mathematical So- AMS ciety’s LATEXmacro system. Most of the results involving statistics run under S-plus and many function plots are generated by Matlab.

v VITA

December 18, 1975 ...... Born - Lanzhou, P.R.China.

1997 ...... B.S., Peking University Beijing, China.

1997 - present ...... Graduate Teaching and Research Asso- ciate, The Ohio State University.

2001 ...... Master in Applied Statistics The Ohio State University.

FIELDS OF STUDY

Major field: Mathematics

Specialization: Extreme Value Theory

vi TABLE OF CONTENTS

Abstract ...... ii

Dedication ...... iv

Acknowledgments ...... v

Vita ...... vi

List of Tables ...... ix

List of Figures ...... x

CHAPTER PAGE

1 Introduction of Extreme Value Theory ...... 1

1.1 Extremal Events ...... 1 1.2 Modeling Extremal Events ...... 5 1.3 Extreme Value Theory ...... 9 1.4 Generalized Extreme Value Distributions (GEV) . . . . . 13 1.5 Generalized Pareto Distributions (GPD) ...... 17

2 Statistical Methods in Application of EVT ...... 21

2.1 Exploratory Data Analysis ...... 21 2.2 Parameter Estimation for the GEV ...... 26 2.3 The Excess Over Threshold Method ...... 29 2.4 Approximate Normality of the MLE and Profile Likelihood 33

3 Case Study: Hurricane Damages over the Last Century ...... 39

3.1 Data Resource and Information ...... 39 3.2 Fitting Data by the EOT Method: Threshold Selection . 41 3.3 Statistical Analysis of S-Plus Plots ...... 43

vii 4 Transforms of GEV and GPD ...... 47

4.1 Class of Transforms ...... 47 4.2 Properties of Transformed GEV and GPD Distributions . 54 4.3 Conditional Expectation of Excess Over Threshold . . . . 62

5 Modeling over Transformed Distributions ...... 68

5.1 Model Selection: Transformed or Not? ...... 68 5.2 Parameter Estimations of Transformed Distributions . . . 73 5.3 Future Research Directions ...... 77

Bibliography ...... 80

viii LIST OF TABLES

TABLE PAGE

1.1 10 Largest Catastrophe Events in US (Source: ISO’s PCS) ...... 3

1.2 California Earthquake Data (1971 – 1994) ...... 4

1.3 Analog between CLT and EVT ...... 11

3.1 Some descriptive statistics of hurricane data ...... 39

4.1 cdfs and pdfs of transformed GEV distributions ...... 61

4.2 cdfs and pdfs of transformed GPD distributions ...... 61

5.1 Maximum points of the mean excess function for ξ = 0.1, 0.2, ..., 2.0 . 72

ix LIST OF FIGURES

FIGURE PAGE

1.1 Density Plots of Generalized Extreme Distributions ...... 14

3.1 Scatter plot of the hurricane data ...... 40

3.2 Mean residual plot of hurricane data ...... 41

3.3 Parameter estimations for different thresholds ...... 42

3.4 Goodness-of-fit Plots for hurricane data, u = 500 ...... 43

3.5 Profile log-likelihood estimate of shape parameter ξ ...... 44

3.6 Profile log-likelihood estimate of 10 year return level ...... 46 1 4.1 pdf of hyperbolic transformed GPD family when ξ = , 1, 2 . . . . . 58 2 1 4.2 pdf of logarithmic transformed GPD family when ξ = , 1, 2 . . . . . 59 2 4.3 pdf of logarithmic transformed GPD family whenu ˜ = 10, 20, 30 . . . 60

4.4 Plot of m(λ)...... 64

4.5 Plot of mr(λ) ...... 66 5.1 Mean residual life plot for daily rainfall data, Source: Coles[6] . . . . 70

5.2 Mean residual life plot for fire data, Source: Embretchs[9] ...... 71

5.3 Mean excess function of transformed GPD for ξ = 0.1, 0.2, ..., 2.0 . . . 72

x CHAPTER 1

INTRODUCTION OF EXTREME VALUE THEORY

1.1 Extremal Events

Extremal events are also called “rare” events. Extremal events share three charac- teristics: “relatively rareness,” “huge impact,” and “statistical unexpectness.”

From the name of it, extremal events are extreme cases; that is, the chance of occurrence is very small. But over the last decades, we have observed several. We list some recent events with a large impact in time order: Hurricane Andrew (1992),

Northridge earthquake (1994), terrorism attack (2001), disintegration of space shuttle

Columbia (2003), winter blizzard of the northeast (2003). If we assume those events are in different fields and independent of each other, the probability of the occurrence of each event is small. The terminology t-year event has been used by hydrologists to describe how frequent a certain event will occur. In general, suppose for every year, the probability of a certain event is p, where p is relatively small. Given that the events of each year are independent of each other, the number of the year when the event occurs follows a with parameter p, and thus has an 1 expectation of . On the other words, we say Hurricane Andrew is a 30-year event p

1 means that for every year the probability of having a hurricane which is more severe 1 than Hurricane Andrew is approximately . 30 The enormous impact of catastrophic events on our society is deep and long. Not only we need to investigate the causes of such events and develop plans to protect against them, but also we will have to resolve the resulting huge financial loss. Finan- cial plans have been established for the reduction of the impact of extremal events. For example, the Homeland Security Act of 2002 was passed on Nov 15, 2002 to prevent future terrorism attacks. In the industry of, the magnitude of property- casualty risk has become a major topic of research and discussion. Traditionally, insurers buy reinsurance to hedge against catastrophic risk. For a catastrophic event causing more than $5 billion of insured losses, the overwhelming majority is not cov- ered by reinsurance, according to Froot and Connell [11]. On December 11, 1992, the Chicago Board of Trade launched the first generation of catastrophe insurance futures, also called CAT-Futures. The related future PCS options were launched shortly after. CAT-Futures provides a way of transferring insurance risk to the finan- cial market and CAT-Futures can be traded as standardized securities. Table 1.1 lists

10 catastrophic events that caused the largest losses in history of the United States.

Since the population and fortune grow at a relatively constant rate, we expect larger catastrophic events entering the list in the future. The task of understanding and resolving the underlying risk is still a big challenge for researchers.

2 Year Catastrophic Event Estimated Losses (in billions)

2001 Fire and explosion 20.3

1992 Hurricane Andrew 15.5

1994 Northridge earthquake 12.5

1989 Hurricane Hugo 4.2

1998 Hurricane Georges 3.0

2001 Tropical Storm Allison 2.5

1995 Hurricane Opal 2.1

1999 Hurricane Floyd 2.0

2001 St.Louis hailstorm 1.9

1993 Midwest blizzard 1.8

Table 1.1 10 Largest Catastrophe Events in US (Source: ISO’s PCS).

In the modeling of extremal events, statistical methods are commonly used for inference from historical data. However, classical statistical methods treat those extreme data, exactly the point of our interest, just as any other data. In some cases, these data are even treated as “” and dropped from the analysis. Therefore, any forecasting or prediction could be largely off the target. Table 1.2 shows loss- ratios (yearly data) for earthquake insurance in California from 1991 till 1994. The data are taken from Jaffe and Russell [15].

Note: “Loss-ratio” is the ratio (in percentage) of total claims paid to total premi- ums collected during a certain period (usually one year) for policies in a portfolio.

3 Year Loss Ratio Year Loss Ratio Year Loss Ratio

1971 17.4 1979 2.2 1987 22.8

1972 0 1980 9.2 1988 11.5

1973 0.6 1981 0.9 1989 129.8

1974 3.4 1982 0 1990 47.0

1975 0 1983 2.9 1991 17.2

1976 0 1984 5.0 1992 12.8

1977 0.7 1985 1.3 1993 3.2

1978 1.5 1986 9.3 1994 ?

Table 1.2 California Earthquake Data (1971 – 1994).

Suppose we want to make a guess on the 1994 entry by having the data from

1971 to 1993. Any experienced statistician would be impressed by the relatively low values (which means the insurance company makes profits) during all years but 1989

(year of the earthquake in San Francisco). Even though there seems an increasing trend of loss-ratio in time, a reasonable guess would still be between 0 to 50. If we apply a general statistical model (say, a linear regression) for the above data, our prediction to 1994 year data is 31.7, with a 95% confidence interval of [0.0, 83.0]. It is time for the true data, which is amazingly 2272.7 ! Indeed, an earthquake recorded at 6.6 Richter scale hit Northridge, California on Jan 17, 1994. The insured damage was about 12.5 billion while the total damage exceeded 30 billion. Obviously, the insurance and reinsurance industry needs to reevaluate the risk in insuring future

4 damages. Extreme Value Theory (EVT) emerges as a basic tool in modeling such risk.

1.2 Modeling Extremal Events

In the modeling of extremal events, different approaches had been proposed for certain circumstances. We would like to introduce some illustrative examples.

On February 1, 1953, a severe storm hit the Netherlands and caused a record storm surge. At various locations the sea dikes collapsed and the flood hit a large part of coastal Holland. The property and casualty loss was so huge that the Delta- committee was established later by the Dutch government. The government required the Delta-committee to estimate the necessary height of the dikes to prevent damages from any future sea surge for a t-year event where t is sufficiently large.

To simplify the problem, suppose X1,X2, ..., Xn are hourly or daily observations of sea level in a year. Let

Mn = max(X1,X2, ..., Xn) be the annual maximum sea level, which we call a block maximum. A t-year event corresponds to a value xt which is defined by:

1 xt = inf x R : F (x) (1 ) { ∈ ≥ − t } 1 where F is the distribution function of Mn. We can interpret xt as the (1 ) − t upper quantile of the distribution F. As we discussed in section 1.1, assuming that the annual maxima Mn are identically and independently distributed, the expected

5 time of occurrence of an more extremal event than Mn xt is t years. The his- { ≥ } torical data recorded a previous record surge in year 1570 with a recorded value of (NAP + 4)meters(m), where NAP stands for normal Amsterdam level. The

1953 surge recorded with value (NAP + 3.85)m. A quick approximation is that

(NAP + 3.85)m corresponds to a 383-year event. The Delta-committee report led to an estimate of (NAP + 5.14)m for 10,000-year event, based on a total of 166 his- torical observations. For further readings on this topic, the paper by de Haan [13] is recommended.

A second example relates to the stochastic behavior of the stock market. An extremal event occurs when the stock market rallies or crashes. For managers in charge of portfolios worth millions, lowering the risk during market crashes is as important as maximizing the profit. Analysis can be applied to a single stock, stocks in a particular sector, or major indices that represent the overall market movement.

Using stock market crashes as examples, we can trace the overall stock market performance for the last century by the Dow Jones Index. To identify stock market crashes, as pointed out in Mishkin and White [16], October 1929 and October 1987 can be used as benchmarks. On October 28 and 29, 1929, the Dow Jones declined

12.8 and 11.7 percent; and on October 19, 1987, the Dow Jones fell 22.6 percent.

To capture the speed and duration characteristics of a market crash, we can use a moving window with a fixed length. The length could be chosen as one day, two days, one week, one month and one year. The identification of a market crash is equivalent to detect windows that have an overall decline of more than 20%.

6 In modeling market crashes as an extremal events, we let Sn be the close market

th Sn Sn 1 index on the n window. Define Cn = − − be the percentage change in the Sn 1 th − n window. We can split Cn into a positive part Pn and a negative part Nn. To study the market crashes, we define

Mn = min Nn : for n such that Cn < 0 { } The objective is to fit historical data into a model to get a better understanding of the conditional distribution of (Mn u Mn < u) where Mn is defined as above, and u is − | a “high threshold” chosen so that the Extreme Value Theory conditions are satisfied.

On the other hand, we can rewrite Mn as a maximum: −

Mn = max Nn : for n such that Cn < 0 − {− } In the similar way, we can see that any minimization problem can be treated as a maximization problem. Our concern is to fit historical data into a model to get an approximation of its distribution.

Finally, modeling extremal events in climatology is a very interesting task. In general, the iid assumptions are not valid and thus more sophiscated techniques need to be employed. Two examples will be presented as follows.

Weather conditions are an important factor in crop growth of and affect the whole agricultural economy. The study of conditions at a certain location attracts more and more scientists for the purpose of accurate prediction and preven- tion procedures. With available historical data, analysis can be done on a vast of aspects, including long period of drought, heavy precipitation, extreme low temper- atures during winter, etc.

7 One important feather of extreme weather data is the attached spatial character.

Different locations may focus on different categories of weather conditions. For ex- ample, snow storms and extreme low temperature during winter are not concerns at south Florida, but they would affect most areas in the northeast. Also, if we com- pare data collected from New York City and Boston, it is not a surprise to find some positive correlation. How to include spatial factors into the mathematical model is still a task which needs to be explored.

Seasonality is another feather of most weather data. Using extreme low tempera- ture as example, the subject of the study could be simply the absolute low tempera- ture (< 0F ◦ any time) or including relative low temperature (36F ◦ below average). In the later case, we would have to model the non-stationary process of annual temper- atures. In [5], Coles and Tawn fit the expected annual temperature with a sinusoidal curve and then use the difference of expected and actual in the analysis of extreme cases. In the similar way, long term’s trend in the data can be introduced into the model easily.

A second example is the hazard of hurricanes for the southeast coastal area. Caus- ing the costliest losses among natural disasters, the annual variation of hurricane hur- ricanes frequency and intensity needs to be studied. There are many ways to evaluate the intensity of a hurricane, including Saffir-Simpson Hurricane Scale which uses the wind speed at the center of the storm when it is land-falling. For example, hurricane

Andrew of 1992 was firstly assessed to be a Category 4 and later upgraded to be a Category 5 - the highest intensity scale possible - at its land fall in southeastern

Florida with peak sustained winds of 165 mph.

8 However, the total damages that are caused by a hurricane is a more important concern to the insurance industry. In order to make use of historical data, we need to estimate the total loss if a certain hurricane would happen today. There are several factors we need to count: intensity of the hurricane, population affected by hurricane, average fortune index, inflation factors, and the insured property ratio. This issue will be revisited in Chapter 3 in detail, where we will do analysis on the adjusted data.

From next section, we will introduce the main extreme value theory and necessary mathematical and statistical tools for the analysis.

1.3 Extreme Value Theory

Theorem 1.1. Extreme Value Theorem (Fisher and Tippett, 1928) Suppose X1,X2, ..., Xn are iid with distribution (df) F. If there exist constants cn > 0 and dn R such that ∈

Mn dn − Y, n (1.1) cn → → ∞ where Mn = X1,n = max(X1, ..., Xn), Y is non-degenerate with distribution function G. Then G is of one the following types:

1. (Gumbel)

x Λ(x) = exp e− , x R {− } ∈ 2. (Frechet)  0, if x 0 Φα(x) =  ≤  α exp x− , if x > 0 {− } 

9 3. (Weibull)  exp ( x)α , if x < 0 {− − } Ψα(x) =  0, if x 0 ≥  Sketch of the Proof The following theorem is important in the proof.

Theorem 1.2 (Convergence to types theorem). Let A, B, A1,A2, be random ··· variables and bn > 0, βn > 0 and an, αn R be constants. Suppose that ∈ An an d − A bn −→ Then the relation

An αn d − B (1.2) βn −→ holds iff

bn an αn lim = b 0, lim − = a R (1.3) n β n β →∞ n ≥ →∞ n ∈ If (1.2) holds then B d bA+a and a, b are the unique constants for which this holds. → And A is non-degenerate iff b > 0, and then A and B belong to the same type.

Proofs of theorem (1.2) could be found at Gnedenko [12] or Resnick [18]. Equation

(1.1) implies that for any t > 0,

nt F b c(c nt x + d nt ) G(x), x R b c b c → ∈ where denotes the function taking integer part. On the other hand, b·c

nt n nt /n t F b c(c nt x + d nt ) = F b c(c nt x + d nt ) b c G (x), x R b c b c  b c b c  → ∈ So that by theorem (1.2), there exists functions γ(t) > 0, δ(t) R satisfying ∈ cn dn d nt lim = γ(t), lim − b c = δ(t), t > 0 n c nt n c nt →∞ b c →∞ b c 10 and

Ht(x) = H[γ(t)x + δ(t)] (1.4) then

H[γ(st)x + δ(st)] = Hst(x) = Hs[γ(t)x + δ(t)] = H[γ(s)γ(t)x + γ(s)δ(t) + δ(s)] thus, we obtain

γ(st) = γ(s)γ(t), δ(st) = γ(s)δ(t) + δ(s) (1.5)

The solutions of the functional equations (1.5) are (up to some constants):

 γ(t) = 1  γ(t) = tξ  γ(t) = at    (1.6) δ(t) = ln t δ(t) = 0 δ(t) = 0 −    Finally, plugging the above results into (1.4) leads to the three types Λ, Φα and Ψα.

Details of the proof are to be found in Resnick [18], Proposition 0.3.  The Central Limit Theorem (CLT) is one of the most important tools in probabil- ity theory and statistics, stating that the is the only distribution of iid sums under certain conditions. As an analog, EVT states the three types (stan- dard GEV) are the only possibilities for the limiting distribution of the maxima. We have a brief comparison of these two theorem in the following table:

Analog CLT EVT

Conditions on (Xn) iid with finite second moment iid and df is regularly varying

Study Object Sn (Sum) Mn (Maxima) Limiting Distribution Normal GEV

Table 1.3 Analog between CLT and EVT.

11 For the next three examples, we simply provide simple distributions that lead to three types of limiting distribution in Theorem(1.1).

Example 1 Maxima for

Let (Xi) be a sequence of iid standard exponential random variables. So the cdf

x is F (x) = 1 e− and then −

n P (Mn ln n x) = [F (x + ln n)] − ≤ x ln n n = 1 e− − − x n e− = 1  − n x exp e− → {− } = Λ(x)

For x R. Here the normalizing constants cn = 1, dn = ln n. ∈  Example 2 Maxima for

Let (Xi) be a sequence of iid standard Cauchy random variables. So the pdf is 1 f(x) = and then π(1 + x2)

x 1 1 1 F (x) = dt = + tan 1(x) Z π(1 + t2) 2 π − −∞ Apply l’Hopital’s rule we obtain

1 1 1 F¯(nx/π) = 1 F (nx/π) = tan− (nx/π) − 2 − π 1 n 1 F¯(nx/π) −π π (nx/π)2 + 1 n2x2 lim = lim = lim = 1 x (nx) 1 x 1 1 x n2x2 + π2 →∞ − →∞ →∞ −n x2

12 Hence n nx ¯ nx P (Mn ) = 1 F ( ) ≤ π h − π i 1 1 n = 1  + o(1) − n x 1 exp x− → {− } = Φ1(x) n For x > 0. Here the normalizing constants are c = , d = 0. n π n  Example 3 Maxima for uniform distribution

Let (Xi) be a sequence of iid uniform random variables on [0, 1]. So the cdf is F (x) = x for 0 x 1 and then ≤ ≤ x n P (n(Mn 1) x) = 1 + − ≤  n exp x → { } = Ψ1(x) 1 For x < 0. Here the normalizing constants c = , d = 1. n n n 

1.4 Generalized Extreme Value Distributions (GEV)

The three types of limiting distribution in section 1.3 are in standard form. We can parameterize them within the location and scale families.

1. (Gumbel) x d Λ(x) = exp  exp   −  , x R − − c ∈ 2. (Frechet)  0, if x d Φα(x) =  ≤  x d α exp ( − )− , if x > d {− c }  13 3. (Weibull)

x d α  exp ( − ) , if x < d {− − c } Ψα(x) =  0, if x d ≥ 

Figure 1.1: Density Plots of Generalized Extreme Distributions

It is clear from Figure(1.1) that the three types of distributions have different support and tail behavior. The density of the decays exponen- tially as the decaying in algebraic order for the left tail in and the right tail in Frechet distribution. It is advantageous to reformulate these three types into one family of distributions.

14 Lemma The generalized Gumbel, Frechet and Weibull families can be combined into a single family of distributions in the form

1/ξ x µ − x µ G(x) = exp ( 1 + ξ  −  ) where 1 + ξ  −  > 0 (1.7) − σ σ

It is straightforward to check the result by letting σ when ξ > 0 1 σ  ξ α = , d = µ , c =  σ (1.8) ξ − ξ when ξ < 0 − ξ   The model has three parameters: a µ R, a ∈ σ > 0 and a shape parameter ξ R. The generalized Frechet and Weibull distribu- ∈ tions correspond respectively to the cases ξ > 0 and ξ < 0. When ξ = 0, we can take the limit of (1.7) as ξ 0 to obtain the following Gumbel family with distribution: → x µ G(x) = exp  exp   −  where x R (1.9) − − σ ∈

In general, the normalizing constants in Theorem(1.1) are difficult to find. This problem can be solved by the introduced GEV family of distributions. It is easy to see that if

Mn dn P r − x G(x) { cn ≤ } ∼ for large enough n, then

x dn P r Mn x G( − ) { ≤ } ∼ cn

x dn Since G is a member of GEV family, G( − ) is also a member of GEV family cn with different location and scale parameters. In practice, the distribution of Mn itself

15 can also be approximated by a member of GEV family. Therefore the location and scale parameters µ, σ in the estimation depend on the data size n but not the shape parameter ξ. To emphasize it for the later statistical analysis, we state the above result as a theorem:

Theorem 1.3. If there exist constants cn > 0 and dn R such that ∈

Mn dn P r − x G(x) as n (1.10) { cn ≤ } → → ∞ for a non- G, then G is a member of the GEV family as in equation(1.7) where µ, ξ R and σ > 0. Moreover, the approximate distribution of ∈ Mn itself is a different member of the same family with the same shape parameter ξ.



In practice, data are blocked into sequences of independent observations X1,X2, ..., Xn for some large integer n, to generate a series of block maxima Mn,1,Mn,2, ..., Mn,m. The block maxima will be fitted to a GEV distribution and corresponding parameters will be estimated. A natural way of choosing the blocks is to use data from a certain time period so that seasonality effects can be neglected. For example, hourly observed sea level data can be blocked into monthly maxima and daily observed rainfall data can be blocked into annual maxima.

The method above has one disadvantage that it may throw away lots of useful information since only the block maxima are recorded. An alternative method is to use upper order statistics, where the EVT can be used as the fundamental tool to approximate the joint distributions of the k largest observations in a block. For reference books, David([8]) and Arnold([1]) are recommended.

16 1.5 Generalized Pareto Distributions (GPD)

As the GEV in the previous section describes the limit distribution of normalized maxima, the Generalized (GPD) is the limit distribution of scaled excess of high thresholds. The main connection is in the following theorem.

Theorem 1.4. the Generalized Pareto Distribution (GPD). Suppose X1,X2, ..., Xn are iid with distribution F. As in Theorem(1.3),

1/ξ x µ − x µ G(x) = exp ( 1 + ξ  −  ) where 1 + ξ  −  > 0 − σ σ is the limit distribution of the maxima Mn = X1,n = max(X1, ..., Xn). Then for a large enough threshold u, the conditional distribution function of Y = (X u X > u), − | is approximately

1/ξ ξx − P r X u < x X > u H(x) = 1 1 +  (1.11) { − | } ∼ − σ˜ defined on x : x > 0 and (1 + ξx/σ˜) > 0 , where { }

σ˜ = σ + ξ(u µ) −

In case ξ = 0, the limit distribution should be interpreted as

H(x) = 1 exp (1 x/σ˜), for x > 0 − −

Sketch of the proof First, notice that

P r X > u + x 1 F (u + x) P r X u > x X > u = { } = − (1.12) { − | } P r X > u 1 F (u) { } −

17 For large u, F (u) 1 and by Taylor expansion →

ln[F (u)] = ln 1 + [F (u) 1] [1 F (u)] { − } ≈ − −

n and since P r Mn < u = F (u) G(u), we have { } ≈ 1/ξ x µ − u µ F n(u) exp ( 1 + ξ  −  ) where 1 + ξ  −  > 0 ≈ − σ σ

Taking ln( ) on both sides gives the approximation of ln[F (u)] · 1/ξ 1 x µ − ln[F (u)] 1 + ξ  −  ≈ −n σ

Put it back into equation(1.12) so we have

1/ξ 1 u + x µ − 1 + ξ  −  −n σ P r X u > x X > u 1/ξ { − | } ≈ 1 x µ − 1 + ξ  −  −n σ The result of the theorem comes from the simplification of the right hand side of the above equation.  The family of distributions defined by equation(1.11) is called the General Pareto family (GPD). For a fixed high threshold u, the two parameters are the shape pa- rameter ξ and the scale parameterσ ˜. For simpler notation, we may just use σ for the scale parameter if there is no confusion. The GPD distribution has many good properties, see for example Embrechs [9], page 165. For the purpose of this paper, we particularly compute the expectation of the GPD.

Definition 1.1. Let X be a with distribution function F (x).

Fu(x) = P r X u x X > u , for x 0 { − ≤ | } ≥ 18 is the excess distribution of X over the threshold u. Define

e(u) = E(X u X > u) (1.13) − | and e(u) is called the mean excess function of X.

Suppose a random variable X follows GPD with parameters ξ > 0 and σ. Then

e(u) = E(X u X > u) − | ∞ x u = Z − dH(x) u 1 H(u) 1 − ∞ = Z (x u) d[1 H(x)] 1 H(u) u − − − − 1 ∞ =  lim x[1 H(x)] + Z [1 H(x)]dx 1 H(u) x − − − − →∞ u 1/ξ ξx − where H(x) is defined in equation(1.11), so 1 H(x) = 1 +  . The integra- − σ tion ∞ (x u) d[1 H(x)] converges only when ξ < 1, where lim x[1 H(x)] = 0. Z x u − − − →∞ − − And when 0 < ξ < 1, 1/ξ 1/ξ+1 ∞ ∞ ξx − σ ξu − Z [1 H(x)]dx = Z 1 +  dx = 1 +  − σ 1 ξ σ u u − thus, the mean excess function of GPD is linear in u when 0 < ξ < 1. σ + ξu for 0 < ξ < 1  1 ξ e(u) =  − for ξ 1  ∞ ≥ At the end of this section, we present the three examples in Section 1.4 in terms of the excess of threshold model with GPD. In other words, we will compute the 1 F (u + x) limit distribution function P r X u > x X > u = − for each example. { − | } 1 F (u) − x Example 1 For the standard exponential model, F (x) = 1 e− for x > 0. Then − (u+x) 1 F (u + x) e− u − = = e− 1 F (u) e x − − 19 for all x > 0. The distribution of exceedances over excess itself is exponential distri- bution, corresponding to ξ = 0 and σ = 1 in the General Pareto family.  1/x Example 2 For the standard Frechet model, F (x) = e− for x > 0. Then

1/(u+x) 1 1 F (u + x) 1 e− 1/(u + x) x − = − = 1 + − as u 1 F (u) 1 e 1/u ≈ 1/u  u → ∞ − − − This corresponds to the case of ξ = 1 and σ = u in the General Pareto family.  Example 3 For the uniform distribution model, F (x) = x for 0 x 0. Then ≤ ≤ 1 F (u + x) 1 (u + x) ( 1) x − = − = 1 + − 1 F (u) 1 u 1 u − − − This corresponds to ξ = 1 and σ = 1 u in the General Pareto family. − − 

20 CHAPTER 2

STATISTICAL METHODS IN APPLICATION OF EVT

2.1 Exploratory Data Analysis

In the real world, the extreme value theory as we described in chapter 1 needs to be applied through statistical data: the observed sea level, major insurance claims, large variation of security market values over a certain time period, daily records of temperature and precipitation at a certain location, etc. We hope the modeling of the empirical data through extremes would manifest most of these variations. Appro- priate models can be used to manage financial risks, to set up prevention procedure or to obtain estimations and predictions.

Before getting into any statistical analysis, we want to learn our best knowledge of the data set: How is the data collected? Is there any missing or unreported data?

Since it is particularly important that the data is error-free for extreme observations, we should give special attentions to data that are outliers.

The first step in exploratory analysis is to have a scatter plot of all observations.

It is important just by looking at data to obtain the following information: range of data, several extremes, trends and seasonalities, any violation of independence and stationarity conditions. If trends and seasonalities are suspected, we recommend to

21 fit the location, scale and shape parameters with a time variable t. See for example, chapter 6 in Coles [6] and references therein.

Embrechts, Kluppelberg and Mikosch’s book [9] and its references give a thorough list of both graphical and analytical methods, and in Bassi [3], a survival kit of some terminologies on quantile estimation is provided. We have introduced the mean excess function in section 1.5, which is crucial in the later analysis on the determination of choice of transforms. We will emphasize two other methods used in this paper: QQ- plots, the return period and return levels.

Probability and Quantile Plots (QQ-plot)

Suppose X1,X2, ..., Xn are continuous random variables and x1, x2, ..., xn are inde- pendent observations from a common population with unknown distribution F . An estimate of F , say Fˆ, has been obtained. The probability and quantile plots provide a graphical accessment to the fitted distribution Fˆ. It follows from the Quantile

Transformation Lemma ([9] Lemma 4.1.9) that F (Xi) has a uniform distribution on

(0,1) for i = 1, ..., n. Furthermore, if x(1) x(2) x(n) are the ordered sample, ≤ ≤ · · · ≤ then the expectation of these quantiles can be computed as:

i E[F (X )] = , for i = 1, ..., n (i) n + 1 and this leads to the following definition.

Definition 2.1. Given an ordered sample of independent observations

x(1) x(2) x(n) ≤ ≤ · · · ≤

22 from a population with estimated distribution function Fˆ, a graph

ˆ i F (x(i)),  : i = 1, ..., n n + 1 is called a probability plot (PP-plot). 

If Fˆ is a reasonable estimation of the population distribution, the points on the probability plots should lie close to the unit diagonal. Substantial deviation from linearity provides evidence of the lack of fit of Fˆ to the model. This plot can be modified by using the data values as the x-values of the points. The following is more commonly used:

Definition 2.2. Given an ordered sample of independent observations

x(1) x(2) x(n) ≤ ≤ · · · ≤ from a population with estimated distribution function Fˆ, a graph

ˆ 1 i F − ( ), x(i) : i = 1, ..., n n + 1 is called a quantile plot (QQ-plot).  i The quantities x is the empirical quantile of the population distribution F (i) n + 1 i while Fˆ 1( ) is the estimation. If Fˆ is a reasonable estimation of F, the quantile − n + 1 plot should look roughly linear. This remains true if the data come from a linear transformation of the distribution. And since that, the change of location and scale parameters only change the plot through y-intercept and slope.

Outliers can be easily identified on the QQ-plot in a general statistical analysis.

While the subject of the extreme value study concerns the upper tail, we should be

23 particularly cautious about any point that substantially deviates from the model on the large observation end. Since the shape parameter ξ determines how heavy the tail distribution is, some difference in distributional shape may be deduced from the plot. In general, an overestimation of ξ (heavy tail) will result in a concave down curve in the QQ-plot, and an underestimation of ξ (light tail) will result in a concave up curve in the QQ-plot.

The Return Period and The Return Level

The return period and the return level are very silimar to the t-year event we mentioned in section 1.1. We make the definition precise as below:

Definition 2.3. Let (Xi) be a sequence of iid random variables with continuous dis-

tribution F and u a given threshold. Consider the sequence (I(Xi>u)) of iid Bernoulli random variables with success probability p = 1 F (u). Then − 1 E[L(u)] = p is called the return period of the event (Xi > u), where L(u) = min i 1 : Xi > u { ≥ } is the time of first exceedance of the threshold u. L(u) follows a geometric distribution with parameter p. 

From the definition, return period is the expected revisiting period corresponding to the threshold u. This leads to the other definition:

Definition 2.4. Let (Xi) be a sequence of iid random variables with continuous distribution F . if

F (zp) = 1 p (2.1) − 24 1 then z is called the return level associated with the return period . p p  For the sea dikes example in chapter 1, in the above language, one would say that the 1953 sea level corresponds to a return period of 382 years. The expectation from the government is a dike level that can sustain a 1,000 or even 10,000 year event.

And it is clear from this example that the solution of extremal events needs to be able to make inferences about very high quantiles.

Combining equation 1.7 and 2.1, we can solve the return level zq in terms of parameters in the model and p.

σ ξ zp = µ 1 [ ln(1 p)]− for ξ = 0 − ξ  − − − 6 In the similar way, combining 1.9 and 2.1 in case ξ = 0, we obtain:

zp = µ σ ln[ ln(1 p)] for ξ = 0 − − −

Define yp = ln(1 p), we have − − σ ξ  µ [1 yp− ] for ξ = 0 − ξ − 6 zp =  (2.2) µ σ ln yp for ξ = 0  − The plot of zp against ln yp is a referred as return level plot. When ξ = 0, the − plot is linear with µ and σ as y-intercept and slope. If ξ > 0, the plot is convex with σ asymptotic limit µ as p 0; if ξ > 0, the plot is concave and unbounded as p 0. − ξ → → The return level plots are convenient for both model presentation and validation, for its simple interpretation and highlighted extrapolation of the tail distribution under the compressed scale.

Often the number of blocks n is chosen to correspond to a time period of length one year, then the block maxima is annual maxima and the return period is in years.

25 2.2 Parameter Estimation for the GEV

Once the model has been set up, parameters in the model need to be estimated using appropriate procedure(s). Suppose X1,X2, ..., Xn is a sequence of random variables of iid GEV distribution with parameters ξ, µ and σ > 0, and x1, x2, ..., xn are the recorded observations. As in section 1.4, the distribution function G(x) is

1/ξ x µ − x µ  exp ( 1 + ξ  −  ) when ξ = 0, 1 + ξ  −  > 0  − σ 6 σ G(x) =   x µ exp  exp   −  when ξ = 0, x R  − − σ ∈  In this section, we introduce two methods: maximum likelihood method and method of probability-weighted moments. For Bayesian estimations, Coles and Powell [7] give a general review.

Maximum Likelihood Method The method of maximum likelihood is, by far, the most popular technique for deriving .

Definition 2.5. Let f(x θ) denote the joint pdf or pmf of the sample X = (X1,X2, ..., Xn). | Given that X = x = (x1, x2, ..., xn) is observed, the function of θ defined by

L(θ X) = f(X θ) | | is called the likelihood function, and

l(θ X) = ln L(θ X) | | defines the log-likelihood function. The maximum likelihood (MLE)

θˆ(X) is the function of X that maximizes L(θ X) or l(θ X). | | 

26 As an example, we consider the Gumbel family where ξ = 0.

n n xi µ xi µ 1 L((0; µ, σ) x) = exp " exp  − # exp "  − # X σ X σ σn | − i=1 − · − i=1 · Thus, the log-likelihood is

n n xi µ xi µ l((0; µ, σ) x) = exp  −   −  n ln σ X σ X σ | − i=1 − − i=1 − One approach to locate the maximum is to differentiate the above equation with respect to µ and σ and set them to zero. This leads to the following likelihood equations: n 1 xi µ n  exp  −  + = 0 −σ X − σ σ  i=1  n (2.3) xi µ xi µ n  − 1 exp  −  = 0 X σ2 − − σ − σ  i=1  There exists no explicit solution to equations(2.3). Numerical procedures to find maximums are called for the approximation. The likelihood equations for the case

ξ = 0 is more complicated. For the and asymptotic properties of the MLE, 6 it is shown in Smith [19] that the classical properties of the MLE hold whenever 1 1 ξ > ; but not for ξ in general. −2 ≤ −2  Method of Probability-Weighted Moments The technique of moments method is to equate the model-moments with empirical moments. Even though the general properties of obtained estimators can be unreliable, the method of moments can be very useful in obtaining an initial approximation of the intended statistics. The class of probability-weighted moments stands out to be more promising in obtaining a

27 good first estimate. The results are used as the initial values for other methods when numerical techniques are applied. Let θ = (ξ; µ, σ) and define

r wr(θ) = E [XGθ(X)] for r = 0, 1, 2, ...

In case ξ < 1, calculation yields

1 σ ξ wr(θ) = µ [1 Γ(1 ξ)(1 + r) ] (2.4) r + 1 − ξ − −

t 1 x where Γ denotes the Gamma function Γ(t) = ∞ x − e− dx for t > 0. Choosing R0 r = 0, 1, 2, we can then immediately obtain σ w0(θ) = µ [1 Γ(1 ξ)]  − ξ − −   σ ξ  2w1(θ) w0(θ) = Γ(1 ξ)(2 1) (2.5)  − ξ − − σ ξ  3w2(θ) w0(θ) = Γ(1 ξ)(3 1)  − ξ − −  (ξ; µ, σ) can be explicitly solved from the above system (2.5). For example

ξ 2 1 2w1(θ) w0(θ) ξ − = − (2.6) 3 1 3w2(θ) w0(θ) − −

Parameter estimation is obtained by replacing the model-moments wr(θ) in (2.6) by empirical momentsw ˆr(θ). To obtain empirical moments, notice

1 n wˆ (θ) = X Gr(X ), for r = 0, 1, 2 (2.7) r n X (j) θ (j) j=1 where X(1),X(2), ..., X(n) are order sample. Again it follows from the Quantile Trans- formation Lemma ([9] Lemma 4.1.9) that

d (Gθ(X(1)), ..., Gθ(X(n))) = (U(1), ..., U(n))

28 where U(1),U(2), ..., U(n) are the order statistics of an iid sequence U1,U2, ..., Un uni- formly distributed on (0, 1). Thus, 2.7 can be rewritten as: 1 n wˆ (θ) = X U r , for r = 0, 1, 2 (2.8) r n X (j) (j) j=1 r where U(j) are often approximated by their expectations. 

2.3 The Excess Over Threshold Method

Model framework and parameter estimation

The modeling using the excess over threshold follows the assumptions and con- clusions in Theorem(1.4). Suppose x1, x2, ..., xn are raw observations independently from a common distribution F . Given a high threshold u, assume x(1), x(2), ..., x(k) are observations that exceeds u. Here we abuse our notation and define the exceedances as xi = x(i) u for i = 1, 2, ..., k. By Theorem(1.4), xi may be regarded as realization − of independently random variable which follows a General Pareto family with un- known parameters ξ and σ. General parameter estimation procedure can be applied to obtain references, followed by goodness of fit procedure and further extrapolation.

As we applied two parameter estimation methods to GEV in Section 2.2, we use both two methods in the parameter estimation of GPD model. In case ξ = 0, the 6 likelihood function can be obtained directly from (1.11):

k 1/ξ 1 1 ξxi − − L((ξ, σ) x) = " 1 +  # Y σ σ | i=1 The log-likelihood function is k 1 ξxi l((ξ, σ) x) = k ln σ ( + 1) ln 1 +  ξ X σ | − − i=1

29 Taking partial derivatives of l((ξ, σ) x) with respect to ξ and σ, we obtain |

k k  1 ξxi xi 2 ln 1 +  = 0  ξ X σ − X σ + ξxi  i=1 i=1 (2.9)  n x k + (1 + ξ) i = 0  − X σ + ξx  i=1 i  ξ We can solve (2.9) through a reparametrization (ξ, σ) (ξ, λ) where λ = . The → σ solution is determined through

k k λxi ln(1 + λxi) X X 1 + λxi ξ = i=1 and ξ = i=1 k k λxi 1 X 1 + λx X 1 + λx i=1 i i=1 i 1 Again the MLE properties like consistency and asymptotic efficiency hold if ξ > . −2 ˆ Assume Hθ(x) is the GPD distribution where θ = (ξ, σ). Let Hθ(x) = 1 Hθ(x) − and define the probability-weighted moment

ˆ r wr(θ) = E [XHθ (X)] for r = 0, 1, 2, ...

Direct calculation leads to

σ σ w = and w = 0 1 ξ 1 4 2ξ − − and we obtain the solution

w0 4w1 2w0w1 ξ = − and σ = w0 2w1 w0 2w1 − −

And finally replace w0 and w1 by empirical moments to obtain estimation. 

30 Threshold selection As we vary threshold u from low to high, there is an important trade-off between bias and . If u is chosen small, the asymptotic basis of Theorem(1.4) is violated and there may be systematic bias. A very large choice of u would directly decrease the number of exceedance so the variance of estimation is high. The standard practice is to adopt as low a threshold as possible such that the limiting model still provides a reasonable estimation. Two methods are available for this purpose: one is through an explanatory plot (mean residual life plot) prior to model estimation; the other is through the analysis of the stability of model estimation, based on the fitting of models across a range of different thresholds.

As we discussed in Section 1.5, the mean excess function of GPD

σ + ξu e(u) = for 0 < ξ < 1 1 ξ − is linear in u. Therefore we can check the linearity in the plot described blow:

Definition 2.6. Given a sample of iid observations x1, x2, ..., xn and a threshold u.

Let x(1) x(2) x(n ) be the observations that exceeds u, a graph ≤ ≤ · · · ≤ u 1 nu ( u, (x u)! : x u < x ) n X (i) min max u i=1 − ≤ is called a mean residual life plot (mrl). 

The mean residual life plot provides an accessible approximation to the mean excess function. As we will discuss in the next section, confidence intervals can be added to the plot based on the approximate normality of sample means.

In practice, the interpretation of a mean residual life plot may not be simple.

Often the linearity is vague for small choice of u and for large u, the sparseness of

31 the data available for calculation causes the large variation of the plot toward the right end. For our purposes, the mlr plot is used as a graphical tool in distinguishing between light- and heavy-tail models.

A complementary technique is to fit the generalized Pareto distribution at a range of thresholds, and to look for stability of parameter estimates. Suppose the conditions in Theorem(1.4) are satisfied, then for threshold u which is large enough, the shape parameter is nearly constant. And in the estimation

σ˜ = σ + ξ(u µ) − if we reparametrize σ∗ =σ ˜ ξu, then the estimation of σ∗ should be also nearly − constant if the threshold u is large enough that the Theorem is approximately valid. ˆ This argument suggests that by plotting estimations ξ and σˆ∗ against a range of u, we can select the threshold as the lowest value of u such that these estimates remain near-constant. Again, confidence intervals for each estimate can be combined into the plot after finding approximate normality means.  The return level for a N-year period

In the approach of the excess over threshold method, the number of exceedances over a time period follows a for a chosen threshold u. Suppose that a generalized Pareto distribution with parameters ξ and σ is a suitable model, then for x > 0, 1/ξ x µ − P r X > x X > u = 1 + ξ  −  { | } σ

Let pu = P r X > u , then the m-observation return level xm is the solution of { } 1 P r X > xm = { } m 32 use the fact that P r X > xm = P r X > xm X > u P r X > u , we obtain { } { | }· { } 1/ξ xm µ − 1 1 + ξ  −  pu = σ · m or explicitly

σ ξ xm = µ + (mpu) 1 (2.10) ξ  −  provided ξ = 0 and m is large enough to ensure xm > u. For the case ξ = 0, we can 6 obtain an explicit return level in the same way:

xm = µ + σ ln(mpu)

Suppose we have approximately ny observations annually, then for a N year period, the number of observations that exceeds the threshold u follow a binomial distribution with parameter N ny and pu. Thus the corresponding return level ·

σ ξ  µ + (N ny pu) 1 for ξ = 0 ξ  · · −  6 xN =  (2.11) µ + σ ln(N ny pu) for ξ = 0  · · 

2.4 Approximate Normality of the MLE and Profile Likeli-

hood

Various standard and widely applicable approximations are available for many use- ful sampling distributions if the maximum likelihood method is used for parameter estimations. For references, Azzalini [2] and Casella & Berger [4] provide the basics of statistics inference and modeling in these aspects. Here, we briefly list the major

33 results without introducing any detailed proofs. Each of the results is an asymptotic law which becomes more accurate as the sample size n increases to infinity. And they are valid under certain regularity conditions. We will assume these conditions to be satisfied (or nearly satisfied).

Theorem 2.1. Let x1, x2, ..., xn be independent observations from a common distribu- tion within a parametric family . Let l(θ x) and θ0 denote the log-likelihood function F | and the MLE of the d-dimensional model parameter θ0. Then, under suitable regu- larity conditions, for large n

ˆ d 1 θ0 MVNd(θ0,IE(θ0)− ) ∼ where MVNd denotes the d-dimensional multivariate normal distribution and

∂2 IE(θ) = E  l(θ) −∂θi∂θj 1 i,j d ≤ ≤ is a d d matrix which measures the expected curvature of the log-likelihood surface. × 

Theorem(2.1) can be used to obtain approximate confidence intervals for individ- ual components of θ0 = (θ1, ..., θd). Denoting IE(θ0) = ψi,j , it follows from the { } properties of the multivariate normal distribution that for large n,

ˆ d θi N(θi, ψi,i) ∼

Since the true value of θ0 is generally unknown, it is usual to approximate the terms of IE(θ) with those 2 ∂ ˆ IO(θ) =  l(θ) −∂θi∂θj 1 i,j d ≤ ≤ 34 ˆ ˆ where θ is the MLE estimate of θ. Again, denote IO(θ0) = nψi,jo, then an approxi- mate (1 α) confidence interval for θi is − ˆ ˆ θi z α qψi,i ± 2

IE(θ) and IO(θ) are usually referred to as the expected information matrix and observed information matrix respectively.

Perhaps one of the most useful properties of maximum likelihood estimators is what has come to be known as the invariance property. Suppose that a distribution is parameterized by θ, but our interest is in finding inferences for some function of

θ, say τ(θ), where τ(θ) may have a different dimension to θ. We state the invariance property as a theorem:

Theorem 2.2. Invariance Property of MLEs If θˆ is the MLE of θ, then for ˆ any function τ(θ), τ(θ) is the MLE of τ(θ). 

It is easy to verify the result in case τ(θ) is a one-to-one function. For more general function of τ(θ), yet the above theorem holds only for the induced likelihood function L∗, given by

L∗(η x) = sup L(θ x) | θ:τ(θ)=η | { } The valueη ˆ which maximizes L∗(η x) will be called the MLE of η = τ(θ). | ˆ Theorem 2.3. Let θ0 be the large-sample MLE of the d-dimensional parameter

θ0 with approximate variance-covariance matrix Vθ. Then if φ = g(θ) is a scalar function, the maximum likelihood estimator of φ0 = g(θ0) satisfies

ˆ d φ0 N(φ0,Vφ) ∼ 35 T T ∂φ ∂φ ˆ where Vφ = φ Vθ φ and φ =  , ...,  evaluated at θ0.  ∇ ∇ ∇ ∂θ1 ∂θd

Theorem (2.3) is usually referred to as the delta method, which enables the approximate normality of φ0 to be used in obtaining confidence intervals for φ.

As an alternative way of making inferences on a parameter component θi of a param- eter vector θ, a profile log-likelihood method often provide unsymmetric, yet more

c c accurate results. The log-likelihood for θ can be written as l(θi, θi ), where θi denotes all components of θ excluding θi. The profile log-likelihood for θi is defined as

l (θ ) = max l(θ , θc) p i c i i θi

From the definition, given a fixed θi, the profile log-likelihood is the maximized log- likelihood with respect to all other components of θ. In other words, lp(θi) is the profile of the log-likelihood surface view from the θi axis. A profile log-likelihood plot consists points of (lp(θi), θi) for a range of θi and a confidence region described below. A generalized situation to the profile likelihood is to partition θ into two parts,

(θ(1), θ(2)), of which θ(1) is the k-dimensional vector of interest and θ(2) corresponds to the remaining (d k) components. the profile log-likelihood for θ(1) is defined − similarly,

(1) (1) (2) lp(θ ) = max l(θ , θ ) θ(1) The previous definion is the case when k = 1.

Theorem 2.4. Let x1, x2, ..., xn be independent observations from a common distri- ˆ bution within a parametric family . Let θ0 denote the MLE of the d-dimensional F

36 (1) (2) (1) model parameter θ0 = (θ , θ ), where θ is a k-dimensional subset of θ0. Then, under suitable regularity conditions, for large n

(1) ˆ (1) d 2 Dp(θ ) =4 2 l(θ0) lp(θ ) χ { − } ∼ k



(1) 2 Dp(θ ) is usually referred to as the deviance statistic and χk is the known chi-square distribution with index k. Applying Theorem 2.4 directly to the profile likelihood, we can obtain a confidence interval for θi.

Corollary 2.1. Suppose conditions in Theorem 2.4 are all satisfied and k = 1, θ(1) =

(2) c θi, θ = θi then,

Cα = θi : Dp(θi) cα { ≤ }

2 is a (1 α) confidence interval, where cα is the (1 α) quantile of the χ distribution. − − 1 

Another application of Theorem 2.4 is to model selection or model reduction.

Suppose our interest is whether some factors should be introduced in explaining the variations of the data. The procedure is to obtain the maximized likelihood for both models with and without the intended factors. The magnitude of the improvement in the maximized log-likelihood when the new factors are added determines the ac- ceptance or rejection of our hypothesis.

37 (2) Corollary 2.2. Suppose 0 with parameter θ is the sub-model of model 1 with M M (1) (2) parameter θ0 = (θ , θ ) under the constraint that the k-dimensional sub-vector θ(1) = c, where c is known. Then

d 2 D =4 2 l1( 1) l0( 0) χ { M − M } ∼ k where l0( 0) and l1( 1) are the maximized log-likelihood values for model 0 and M M M 1 respectively. A hypothesis test of the validity of model 0 relative to 1 at a M M M level of significance is to reject 0 in favor of 1 if D > cα, where cα is the (1 α) M M − 2 quantile of the χk distribution. 

38 CHAPTER 3

CASE STUDY: HURRICANE DAMAGES OVER THE

LAST CENTURY

3.1 Data Resource and Information

As discussed in chapter 1, we need to work on the raw data so that the total damage

(in millions) reflects the impact of the corresponding hurricane if it happened today, and thus make the estimation and prediction meaningful. As in Pielke [17], it is as- sumed that losses are proportional to three factors: inflation, wealth and population.

The result of normalizing the data is to produce the estimated impact of any storm as if it had made land-falling in 1995. Before we get into detailed analysis of the data, let’s take a look at some descriptive statistics of the data and the scatter plot:

Statistics Observed Data

Maximum $72.3 Billion

Mean $6.7 Billion

the Ratio of Maximum and Sum 0.04

$10B Return Period 9 years (approx)

Table 3.1 Some descriptive statistics of hurricane data.

39 Figure 3.1: Scatter plot of the hurricane data

There are 253 normalized data in total, range from year 1900 to 1995. In the data from year 1900 to year 1925, all hurricanes that have small damages amount were omitted from the records. Since the omission of small amount of data does not affect the result when excess over threshold method is used, we will keep data from this period in our study.

The plot of the data does not indicate any serious violation of the independence and stationarity assumptions. However, there is seemingly an over normalization for data before year 1940. While the costliest hurricane (normalized to year 1995) happened in year 1926 with an actual damage of $105M and a normalized damage of

$72.3B, hurricane Andrew (1992) ranks second with a normalized loss of $33.1B. It takes on average 9 years to observe a hurricane with a normalized damage over $10B.

40 3.2 Fitting Data by the EOT Method: Threshold Selection

Since the number of observations is very limited for each year, we use the EOT method for the analysis of the data. It follows from section 2.4 that the excesses follows approximately a generalized Pareto distribution, thus the mean excess function is linear in threshold u. The mean residual life plot in figure(3.2) looks like a straight line except at the very high end. We can also try to estimate ξ from the slope of the ξ line and set it equal to . For example, if slope 1 is observed from the plot, we 1 ξ − can have a very rough estimation for ξ as 0.5.

Figure 3.2: Mean residual plot of hurricane data

As an alternative method mentioned in section 2.4, the selection of a threshold can be carried out more carefully through a stability check of MLEs of parameters. Sup- pose the CAT future defines any total damage loss over $500 million as a catastrophe

41 event, we are particularly interested in finding out if $500 million is an appropriate threshold. Figure(3.3) are the MLEs of parametersσ ˜ and ξ together with a 95% confidence interval of the estimates. We can directly tell from these plots that when u = 500, both parameter estimations are nearly constant. So the rest of our analysis uses the threshold u = 500M.

Figure 3.3: Parameter estimations for different thresholds

Usually after the threshold is taken, we generate another scatter plot consisting only the excesses of the data and recheck the assumptions in Theorem(1.4). In Coles

[6] (page 90), the excesses of transformed Dow Jones Index series demonstrate the tendency that extreme events are clustered. In this example, the plot does not indicate any serious violation of those assumptions.

42 3.3 Statistical Analysis of S-Plus Plots

Figure(3.4) gives the diagnostic plots for the fitted GPD model of excesses data over threshold u = 500. The probability plot and the quantile plot show a good fit of the model, especially the higher end fit in the quantile plot. The return level plot and the density plot can be hardly distinguished since the large scale in the data. This can be improved by a log-log plot of the results.

Figure 3.4: Goodness-of-fit Plots for hurricane data, u = 500

As a comparison, we present both the estimates by the MLE method and by the profile log-likelihood method. Maximum likelihood estimation for parameters are:

(ˆσ, ξˆ) = (2108.4, 0.71479)

43 with a corresponding maximized log-likelihood of -614.82. The variance-covariance matrix is calculated as  247167.6 64.83216  −  64.83216 0.0485135   −  leading to standard error of 497.16 and 0.22026 forσ ˆ and ξˆ respectively. In particular, it follows that a 95% confidence interval for ξ is obtained as 0.71479 1.96 0.22026 = ± × [0.2831, 1.1465]. Since the 95% interval for ξ does not include 0, there is strong evidence that ξ > 0 with the exact p-value less than 0.001.

As a comparison, the profile log-likelihood plot in Figure 3.5 suggests a non-symmetric confidence interval for the shape parameter ξ. A 95% confidence interval for ξ can be read from the plot as [0.35, 1.25]. This is not so different from the previous estimate [0.2831, 1.1465], but it strengthens the conclusion that ξ > 0.

Figure 3.5: Profile log-likelihood estimate of shape parameter ξ

There are 69 exceedances of the threshold u = 500 in the complete set of 253 observations, so the maximum likelihood estimate of the exceedance probability is p̂u = 69/253 = 0.27273, with approximate variance p̂u(1 − p̂u)/253 = 0.000784. The complete variance-covariance matrix for (p̂u, σ̂, ξ̂) is
\[ V = \begin{pmatrix} 0.000784 & 0 & 0 \\ 0 & 247167.6 & -64.83216 \\ 0 & -64.83216 & 0.0485135 \end{pmatrix}. \]
To obtain the 10-year return level, we need to make an assumption about the frequency of hurricanes each year. Since we have about 253 observations over the 90-year span, and some reports are probably missing between 1905 and 1920, we take the frequency to be 3 per year, even though in recent years this number seems slightly higher. The 10-year return period then corresponds to the 30-observation return period (m = 30) in (2.10). Substituting the parameter estimates, we obtain x̂m = 10,802, and the delta method gives the variance estimate
\[ \mathrm{Var}(\hat{x}_m) \approx \nabla x_m^{T}\, V\, \nabla x_m = 7{,}403{,}393. \]

The standard deviation is 2,721. Thus a 95% confidence interval for the 10-year return level xm is 10,802 ± 1.96 × 2,721 = [5469, 16135]. Figure 3.6 gives the profile log-likelihood plot for the 10-year return level, from which we obtain a non-symmetric confidence interval of [7300, 19000]. Compared with the delta-method interval, this one shifts toward the higher end as a result of the curvature of the likelihood surface. A nice interpretation of this result: we are 95% confident that the costliest hurricane in the next 10-year period will cost between $7.3B and $19B.
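For reference, the delta-method computation can be reproduced from the reported estimates. The sketch below assumes the return-level formula x_m = u + (σ̃/ξ)[(m·p_u)^ξ − 1] behind (2.10); it reproduces x̂m = 10,802, and since the inputs are the rounded published values the variance only approximately matches the quoted 7,403,393.

```python
# Sketch of the delta-method variance for the m-observation return level.
import numpy as np

u, m = 500.0, 30
p, sigma, xi = 0.27273, 2108.4, 0.71479
V = np.array([[0.000784,  0.0,        0.0      ],
              [0.0,       247167.6,  -64.83216 ],
              [0.0,      -64.83216,   0.0485135]])

mp = m * p
x_m = u + sigma / xi * (mp**xi - 1.0)

# Gradient of x_m with respect to (p_u, sigma, xi).
grad = np.array([
    sigma * m**xi * p**(xi - 1.0),                 # d x_m / d p_u
    (mp**xi - 1.0) / xi,                           # d x_m / d sigma
    -sigma / xi**2 * (mp**xi - 1.0)
        + sigma / xi * mp**xi * np.log(mp),        # d x_m / d xi
])
var = grad @ V @ grad
print(x_m, var, np.sqrt(var))   # ~10802 and a variance near 7.4e6
```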

Figure 3.6: Profile log-likelihood estimate of 10 year return level

CHAPTER 4

TRANSFORMS OF GEV AND GPD

4.1 Class of Transforms

The origin of transforms

In the modeling of extremal events, the asymptotic distributions of block maxima and of excesses over a threshold were stated as the GEV and the GPD. In case the shape parameter ξ is positive, the support of the distribution function is of the form [a, ∞), where
\[ a = \begin{cases} \mu-\dfrac{\sigma}{\xi}, & X \sim \mathrm{GEV}(\mu,\sigma,\xi) \\[2mm] 0, & X \sim \mathrm{GPD}(\tilde\sigma,\xi). \end{cases} \]
In many of the examples mentioned earlier, however, the collected data have compact support, i.e., the range is bounded. In the example of stock market returns, the daily percentage change has range [−100%, ∞); in particular, in the analysis of market crashes, the data we use would be all negative returns, which have range [0, 100%] after negation. In the sea level example, we expect an upper bound for the surge in the model. In modeling the total loss caused by a catastrophe event, a natural upper bound would be the total wealth of the covered region. In the low temperature example, the natural limit of any low temperature is absolute zero.

Different examples require different strategies. If we are tracking a major stock market index such as the Dow Jones or the S&P 500, the probability of a near −100% drop would be comparable to the default rate of a treasury bond, which is negligible. If our concern is the return of a single stock or a small portfolio then, given the recent bankruptcies of corporations including Enron and WorldCom, the default risk may still be small, yet it should be counted.

Suppose we denote by r the percentage change of a stock price during a trading day, where −1 ≤ r < ∞. Positive and negative r correspond to the stock price moving up and down, respectively. Since the stock price can never go negative, the distribution of r is not symmetric. More importantly, the percentage change r does not satisfy the addition property: the total percentage change over a year is not the sum of the daily percentage changes. To overcome these difficulties, we have two solutions:

Solution 1: Use log-returns for both positive and negative r. Let x = ln(1 + r); then the distribution of x will be symmetric and the addition property is satisfied.

Solution 2: Fix a positive r and find an 'equivalent' negative r that 'negates' the effect of the positive one. Let
\[ r = \begin{cases} u, & r > 0 \\ -d, & r < 0. \end{cases} \]
Suppose we observe the behavior of a stock price on two consecutive days: the price goes up by u (in percentage) on the first day and goes down by d on the second. If at the close of the second day the stock price is back to its original level, we have (1 + u)(1 − d) = 1, or u = d/(1 − d); that is, a drop of d is 'equivalent' to a rise of d/(1 − d) in value. For example, a 20% drop is undone by a 25% rise. This suggests that applying the transform x = d/(1 − d) to the negative-return data makes the comparison with the positive-return data meaningful.

Transforms from [0, 1) to [0, ∞)

From the discussion above, it is necessary to transform data with bounded support onto an unbounded domain to make the application of extreme value theory more suitable. We first consider transforms that map [0, 1) to [0, ∞) and that are smooth, simple, and meaningful for plausible applications. Based on the discussion of the stock-return example, three types of transforms can be found among the elementary functions:

Type I (standard hyperbolic transform with index p):
\[ T(x) = \frac{x}{(1-x)^{p}}, \qquad p > 0. \tag{4.1} \]

Type II (standard logarithmic transform):
\[ T(x) = -\ln(1-x). \tag{4.2} \]

Type III (standard trigonometric transform):
\[ T(x) = \frac{2}{\pi}\tan\left(\frac{\pi}{2}x\right). \tag{4.3} \]

Note: these transforms are selected from the elementary functions that have vertical asymptotes. In (4.1), p is typically chosen to be 1 unless other information favors a different choice; the model would be broadly similar if we incorporated the parameter p, so to keep the calculations simple we make this assumption in the later analysis. As for subject matter, a hyperbolic transform should be suitable for most subjects, while a logarithmic transform can apply to financial data and a trigonometric transform to engineering data.
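For concreteness, here is a minimal implementation of the three standard transforms and their inverses (with p = 1 in the hyperbolic case); it is a sketch for experimentation, not code taken from the thesis.

```python
# The standard transforms (4.1)-(4.3) from [0, 1) to [0, inf) and inverses.
import numpy as np

def T1(x, p=1.0):
    return x / (1.0 - x)**p                        # hyperbolic: x / (1 - x)^p

def T2(x):
    return -np.log1p(-x)                           # logarithmic: -ln(1 - x)

def T3(x):
    return 2.0 / np.pi * np.tan(np.pi * x / 2.0)   # trigonometric

def T1_inv(y):                                     # inverse of T1 for p = 1
    return y / (1.0 + y)

def T2_inv(y):
    return -np.expm1(-y)                           # 1 - exp(-y)

def T3_inv(y):
    return 2.0 / np.pi * np.arctan(np.pi * y / 2.0)

x = np.linspace(0.0, 0.999, 5)
assert np.allclose(T1_inv(T1(x)), x) and np.allclose(T3_inv(T3(x)), x)
# All three fix 0, blow up as x -> 1, and satisfy T(x) >= x (Theorem 4.1).
```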

Transformed GEV and GPD families

Using transforms similar to those listed in (4.1)–(4.3), we can map any bounded interval [a, b) to [a, ∞). Suppose T is the transform; we require that T fix a and map b to ∞, i.e., T(a) = a and lim_{x→b} T(x) = ∞. We first use the linear map (x − a)/(b − a) to send [a, b) onto [0, 1), and then apply the standard transforms to obtain the following:

Type I (general hyperbolic transform with index p):
\[ \frac{T_1(x)-a}{b-a} = \frac{\dfrac{x-a}{b-a}}{\left(1-\dfrac{x-a}{b-a}\right)^{p}}, \qquad p > 0. \tag{4.4} \]

Type II (general logarithmic transform):
\[ \frac{T_2(x)-a}{b-a} = -\ln\left(1-\frac{x-a}{b-a}\right). \tag{4.5} \]

Type III (general trigonometric transform):
\[ \frac{T_3(x)-a}{b-a} = \frac{2}{\pi}\tan\left(\frac{\pi}{2}\cdot\frac{x-a}{b-a}\right). \tag{4.6} \]

We apply these transforms to the GEV and GPD distributions, where a = μ − σ/ξ and a = 0 respectively. We introduce a new parameter τ = b − a for the spread of the domain; τ will be referred to as the range parameter in later use.

Definition 4.1. We say that X follows a transformed GEV or GPD distribution if Z follows a GEV or GPD distribution, where Z is defined by one of the following:

Type I (general hyperbolic transform with index p of GEV or GPD):
\[ Z = \frac{X-a}{\left(1-\dfrac{X-a}{\tau}\right)^{p}} + a. \tag{4.7} \]

Type II (general logarithmic transform of GEV or GPD):
\[ Z = -\tau\ln\left(1-\frac{X-a}{\tau}\right) + a. \tag{4.8} \]

Type III (general trigonometric transform of GEV or GPD):
\[ Z = \frac{2\tau}{\pi}\tan\left(\frac{\pi(X-a)}{2\tau}\right) + a. \tag{4.9} \]

Here a = μ − σ/ξ in the GEV case and a = 0 in the GPD case, and p > 0, σ > 0, τ > 0.

From the definition, the supports of the transformed GEV and GPD distributions are bounded: [μ − σ/ξ, μ − σ/ξ + τ) and [0, τ − ũ) respectively.

Example 4.1. CDF of the hyperbolic transform (p = 1) of the GEV distribution, G_{T1}(x).
The cumulative distribution function of X is easily derived. Suppose Z follows a GEV distribution and the transform is
\[ Z = \frac{X-a}{1-\frac{X-a}{\tau}} + a = \frac{\tau(X-a)}{\tau-(X-a)} + a, \qquad a = \mu - \frac{\sigma}{\xi}. \]
Since the transform function is strictly increasing (the properties of the transforms are proved in the next section), we obtain
\begin{align*}
G_{T_1}(x) &= \Pr(X \le x) = \Pr\left(Z \le \frac{x-a}{1-\frac{x-a}{\tau}} + a\right) \\
&= \exp\left\{-\left[1+\xi\,\frac{\frac{\tau(x-a)}{\tau-(x-a)}+a-\mu}{\sigma}\right]^{-1/\xi}\right\} \\
&= \exp\left\{-\left[\frac{\xi\tau}{\sigma}\cdot\frac{x-a}{\tau-(x-a)}\right]^{-1/\xi}\right\}, \tag{4.10}
\end{align*}
where the last step uses a − μ = −σ/ξ.

Example 4.2. CDF of the logarithmic transform of the GPD distribution, H_{T2}(x).
Suppose (Y − u | Y > u) follows a GPD distribution for every large enough threshold u (say u ≥ u0), and the transform is
\[ Y = -\tau\ln\left(1-\frac{X-a}{\tau}\right) + a. \]
Using the monotonicity of the transform, we obtain
\[ H_{T_2}(x) = \Pr(X-u \le x \mid X > u) = \Pr\left(X \le x+u \,\Big|\, Y > -\tau\ln\Big(1-\frac{u-a}{\tau}\Big)+a\right). \]
Denote u_T = −τ ln(1 − (u−a)/τ) + a; then
\begin{align*}
H_{T_2}(x) &= \Pr\left(Y-u_T \le -\tau\ln\Big(1-\frac{x+u-a}{\tau}\Big)+a-u_T \,\Big|\, Y > u_T\right) \\
&= \Pr\left(Y-u_T \le -\tau\ln\frac{\tau-(x+u-a)}{\tau-(u-a)} \,\Big|\, Y > u_T\right) \\
&= 1-\left[1+\frac{\xi}{\tilde\sigma}\,\tau\ln\frac{\tau-(u-a)}{\tau-(x+u-a)}\right]^{-1/\xi}, \tag{4.11}
\end{align*}
where
\[ \tilde\sigma = \sigma+\xi(u_T-\mu) = \sigma+\xi\left(-\tau\ln\Big(1-\frac{u-a}{\tau}\Big)+a-\mu\right) = -\xi\tau\ln\left(1-\frac{u-a}{\tau}\right), \tag{4.12} \]
again using a − μ = −σ/ξ. Replacing σ̃ in (4.11) by (4.12):
\begin{align*}
H_{T_2}(x) &= 1-\left[1+\frac{\ln(\tau+a-u)-\ln(\tau+a-x-u)}{\ln\tau-\ln(\tau+a-u)}\right]^{-1/\xi} \\
&= 1-\left[\frac{\ln\tau-\ln(\tau+a-x-u)}{\ln\tau-\ln(\tau+a-u)}\right]^{-1/\xi}.
\end{align*}
The CDF thus depends only on ξ and u − a. Denoting ũ = u − a = u − μ + σ/ξ and ũ0 = u0 − a = u0 − μ + σ/ξ, we get
\[ H_{T_2}(x) = 1-\left[\frac{\ln\tau-\ln(\tau-\tilde u-x)}{\ln\tau-\ln(\tau-\tilde u)}\right]^{-1/\xi}, \qquad 0 < \tilde u_0 \le \tilde u < \tau, \quad 0 \le x < \tau-\tilde u. \]
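The closed form for H_{T2} can be checked by simulation: draw the GPD excesses of Y over u_T with the scale σ̃ of (4.12), map back through the inverse logarithmic transform, and compare the empirical distribution of the X-excesses with the formula. A sketch with illustrative values of (ξ, τ, u) and a = 0:

```python
# Monte Carlo check of the logarithmic-transform excess cdf (a = 0).
import numpy as np
from scipy.stats import genpareto

xi, tau, u = 0.5, 100.0, 20.0
u_T = -tau * np.log1p(-u / tau)           # transformed threshold
sigma_t = xi * u_T                        # (4.12): sigma~ = -xi*tau*ln(1 - u/tau)
y = u_T + genpareto.rvs(xi, scale=sigma_t, size=200_000,
                        random_state=np.random.default_rng(1))
x_exc = tau * (-np.expm1(-y / tau)) - u   # X = T2^{-1}(Y), then excess over u

def H_T2(x):
    return 1.0 - ((np.log(tau) - np.log(tau - u - x))
                  / (np.log(tau) - np.log(tau - u)))**(-1.0 / xi)

for x in (5.0, 20.0, 50.0):
    print(x, (x_exc <= x).mean(), H_T2(x))   # empirical vs closed form
```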

4.2 Properties of Transformed GEV and GPD Distributions

The transforms T1(x), T2(x), T3(x) in (4.4)–(4.6) are elementary functions of simple form. At the same time, these functions have nice properties that make them easy to use in the calculation of probabilities. Theorem 4.1 lists the main results.

Theorem 4.1. The transform functions T1(x), T2(x), T3(x) in (4.4)–(4.6) satisfy the following properties:

(a) All functions are strictly increasing and concave up.

(b) All functions are smooth, and their inverse functions exist and are smooth.

(c) Ti(x) ≥ x for all x ∈ [a, b), i = 1, 2, 3.

(d) Fix a and x ∈ [a, b); then |Ti(x) − x| → 0 as τ = b − a → ∞, i = 1, 2, 3.

Proof. (a) We need to show that the first and second derivatives of each Ti(x) are positive:
\[ T_1'(x) = \left(\frac{b-a}{b-x}\right)^{p}\left[1+p\,\frac{x-a}{b-x}\right], \qquad T_1''(x) = \left(\frac{b-a}{b-x}\right)^{p}\frac{2p(b-x)+p(p+1)(x-a)}{(b-x)^2} > 0, \]
\[ T_2'(x) = \frac{b-a}{b-x}, \qquad T_2''(x) = \frac{b-a}{(b-x)^2} > 0, \]
\[ T_3'(x) = \sec^{2}\left(\frac{\pi}{2}\cdot\frac{x-a}{b-a}\right), \qquad T_3''(x) = \frac{\pi}{b-a}\,\sec^{2}\left(\frac{\pi}{2}\cdot\frac{x-a}{b-a}\right)\tan\left(\frac{\pi}{2}\cdot\frac{x-a}{b-a}\right) > 0. \]
In all of the above, a ≤ x < b, and each first derivative is ≥ 1, with equality only at x = a.

(b) The inverse functions can be written down explicitly in all cases except T1 with p ≠ 1:
\[ T_1^{-1}(x) = \frac{bx-a^{2}}{x+b-2a} \ (p=1), \qquad T_2^{-1}(x) = b-(b-a)\exp\left(-\frac{x-a}{b-a}\right), \qquad T_3^{-1}(x) = \frac{2}{\pi}(b-a)\arctan\left(\frac{\pi}{2}\cdot\frac{x-a}{b-a}\right)+a, \]
each for x ≥ a. For T1 with p ≠ 1, the existence of an inverse is obvious from (a), since the function is monotone; smoothness follows from the Inverse Function Theorem.

(c) Let Gi(x) = Ti(x) − x. Then Gi(a) = 0 and Gi'(x) = Ti'(x) − 1 > 0 for x > a by (a). Thus Gi(x) ≥ 0, i.e., Ti(x) ≥ x for all x ∈ [a, b), i = 1, 2, 3.

(d) For T1(x) we have
\[ T_1(x)-x = \frac{x-a}{\left(1-\frac{x-a}{b-a}\right)^{p}}+a-x = (x-a)\left[\left(1-\frac{x-a}{b-a}\right)^{-p}-1\right], \]
where (1−(x−a)/(b−a))^{−p} ↓ 1 as b − a → ∞; thus T1(x) − x → 0.
For T2(x), the Taylor expansion of −ln(1 − z) is
\[ -\ln(1-z) = z+\frac{z^{2}}{2}+\frac{z^{3}}{3}+\cdots, \qquad 0 \le z < 1. \tag{4.13} \]
Setting z = (x−a)/(b−a) and applying (4.13) to T2(x),
\[ T_2(x) = (b-a)\left[\frac{x-a}{b-a}+O\left(\left(\frac{x-a}{b-a}\right)^{2}\right)\right]+a = x+(x-a)^{2}\,O\left(\frac{1}{b-a}\right). \]
Similarly, from the Taylor expansion of tan z,
\[ \tan z = z+\frac{z^{3}}{3}+O(z^{5}), \qquad 0 \le z < \frac{\pi}{2}, \tag{4.14} \]
with z = (π/2)(x−a)/(b−a) we obtain
\[ T_3(x) = \frac{2}{\pi}(b-a)\left[\frac{\pi}{2}\cdot\frac{x-a}{b-a}+O\left(\left(\frac{x-a}{b-a}\right)^{3}\right)\right]+a = x+(x-a)^{3}\,O\left(\frac{1}{(b-a)^{2}}\right). \]
We conclude that T2(x) − x → 0 and T3(x) − x → 0 as b − a → ∞. □

Comments 4.1. (1) From the construction of the transforms, each transform depends only on the relative position of x between the end points a and b: if (x − a)/(b − a) = λ with 0 ≤ λ < 1, then (Ti(x) − a)/(b − a) = fi(λ), i = 1, 2, 3.

(2) Property (b) indicates that all transforms are isomorphisms from [a, b) onto [a, ∞). By (a) and (c), each transform performs an expansion of [a, b): the point a is fixed, the point b is pulled to infinity, and the scale of the expansion increases as the point moves closer to b.

(3) Property (d) indicates the consistency of the transformed distribution with the original GEV or GPD in modeling: the new model converges to the original GEV or GPD model as the range parameter τ = b − a goes to infinity.

Next we study the behavior of the pdfs of transformed distributions.

Example 4.3. pdf of the hyperbolic transform of the GEV distribution, g_{T1}(x).
Suppose Z follows a GEV distribution and the transform is
\[ Z = \frac{X-a}{\left(1-\frac{X-a}{\tau}\right)^{p}} + a, \qquad a = \mu-\frac{\sigma}{\xi}. \]
We can obtain the cdf of X as in Example 4.1:
\[ G_{T_1}(x) = \exp\left\{-\left[\frac{\xi}{\sigma}\cdot\frac{x-a}{\left(1-\frac{x-a}{\tau}\right)^{p}}\right]^{-1/\xi}\right\}. \]
Taking the derivative, the pdf is
\begin{align*}
g_{T_1}(x) &= G_{T_1}(x)\cdot\frac{1}{\xi}\left[\frac{\xi}{\sigma}\cdot\frac{x-a}{\left(1-\frac{x-a}{\tau}\right)^{p}}\right]^{-1/\xi-1}\cdot\frac{\xi}{\sigma}\left(1-\frac{x-a}{\tau}\right)^{-p-1}\left[1+(p-1)\frac{x-a}{\tau}\right] \\
&= G_{T_1}(x)\cdot\frac{1}{\sigma}\left[\frac{\xi}{\sigma}(x-a)\right]^{-1/\xi-1}\left(1-\frac{x-a}{\tau}\right)^{p/\xi-1}\left[1+(p-1)\frac{x-a}{\tau}\right].
\end{align*}
Since (1 − (x−a)/τ)^{p/ξ−1}[1 + (p−1)(x−a)/τ] → 1 as x ↓ a, the pdf of X is close to the pdf of Z at the left end point x = a; in other words,
\[ \lim_{x\downarrow a}\frac{g_{T_1}(x)}{g(x)} = 1. \]
When x approaches the right end point a + τ, we obtain
\[ \lim_{x\uparrow a+\tau} g_{T_1}(x) = \begin{cases} 0, & \xi < p \\[1mm] \dfrac{p}{\sigma}\left(\dfrac{\xi\tau}{\sigma}\right)^{-1/\xi-1}, & \xi = p \\[2mm] \infty, & \xi > p. \end{cases} \]
These properties can be verified from the plots in Figure 4.1.

Figure 4.1: pdf of the hyperbolic transformed GPD family for ξ = 1/2, 1, 2



Example 4.4. pdf of the logarithmic transform of the GPD distribution, h_{T2}(x).
Suppose (Y − u | Y > u) follows a GPD distribution and the transform is
\[ Y = -\tau\ln\left(1-\frac{X-a}{\tau}\right)+a. \]
We obtained the cdf of X in Example 4.2:
\[ H_{T_2}(x) = 1-\left[\frac{\ln\tau-\ln(\tau-\tilde u-x)}{\ln\tau-\ln(\tau-\tilde u)}\right]^{-1/\xi}. \]
Taking the derivative, the pdf is
\[ h_{T_2}(x) = \frac{\left[\ln\tau-\ln(\tau-\tilde u)\right]^{1/\xi}}{\xi\,(\tau-\tilde u-x)}\left[\ln\tau-\ln(\tau-\tilde u-x)\right]^{-1/\xi-1}, \]
where 0 ≤ ũ < τ and 0 ≤ x < τ − ũ. Similar to the approach in Example 4.3, we obtain
\[ \lim_{x\downarrow0}\frac{h_{T_2}(x)}{h(x)} = \frac{\tilde u}{\tau-\tilde u}\cdot\frac{1}{\ln\tau-\ln(\tau-\tilde u)} \qquad\text{and}\qquad \lim_{x\uparrow\tau-\tilde u} h_{T_2}(x) = \infty. \]
Figures 4.2 and 4.3 show the plots for ξ = 1/2, 1, 2 and for ũ = 10, 20, 30, respectively.

Figure 4.2: pdf of the logarithmic transformed GPD family for ξ = 1/2, 1, 2

Figure 4.3: pdf of the logarithmic transformed GPD family for ũ = 10, 20, 30

Comments 4.2.

(1) As indicated by Theorem 4.1(d) and checked above, the pdfs of the transformed GEV and GPD are very close to the original pdfs when x is close to the left end point.

(2) The behavior at the right end point is interesting. For the pdf under transform T1(x), the end behavior depends on ξ: if ξ is larger than the index p (corresponding to a heavy tail), the pdf blows up at the right end point, which suggests 'point mass' behavior. For transform T2(x), this 'point mass' behavior occurs for every positive ξ; we can interpret T2(x) as the limiting behavior of T1(x) as p ↓ 0. The pdf under transform T3(x) behaves like the case p = 1 of T1(x).

Tables 4.1–4.2 collect the cdfs and pdfs of the transformed GEV and GPD distributions for reference.

Table 4.1: cdfs and pdfs of transformed GEV distributions, where a = μ − σ/ξ.
\begin{align*}
G_{T_1}(x) &= \exp\left\{-\left[\frac{\xi}{\sigma}\cdot\frac{x-a}{\left(1-\frac{x-a}{\tau}\right)^{p}}\right]^{-1/\xi}\right\} \\
g_{T_1}(x) &= G_{T_1}(x)\cdot\frac{1}{\sigma}\left[\frac{\xi}{\sigma}(x-a)\right]^{-1/\xi-1}\left(1-\frac{x-a}{\tau}\right)^{p/\xi-1}\left[1+(p-1)\frac{x-a}{\tau}\right] \\
G_{T_2}(x) &= \exp\left\{-\left[-\frac{\xi\tau}{\sigma}\ln\left(1-\frac{x-a}{\tau}\right)\right]^{-1/\xi}\right\} \\
g_{T_2}(x) &= G_{T_2}(x)\cdot\frac{1}{\sigma}\cdot\frac{\tau}{\tau-(x-a)}\cdot\left[-\frac{\xi\tau}{\sigma}\ln\left(1-\frac{x-a}{\tau}\right)\right]^{-1/\xi-1} \\
G_{T_3}(x) &= \exp\left\{-\left[\frac{2\xi\tau}{\pi\sigma}\tan\left(\frac{\pi}{2}\cdot\frac{x-a}{\tau}\right)\right]^{-1/\xi}\right\} \\
g_{T_3}(x) &= G_{T_3}(x)\cdot\frac{1}{\sigma}\sec^{2}\left(\frac{\pi}{2}\cdot\frac{x-a}{\tau}\right)\left[\frac{2\xi\tau}{\pi\sigma}\tan\left(\frac{\pi}{2}\cdot\frac{x-a}{\tau}\right)\right]^{-1/\xi-1}
\end{align*}

Table 4.2: cdfs and pdfs of transformed GPD distributions, where ũ = u − a = u − μ + σ/ξ.
\begin{align*}
H_{T_1}(x) &= 1-\tilde u^{\,p/\xi}(\tau-\tilde u)^{-p/\xi}(x+\tilde u)^{-p/\xi}(\tau-\tilde u-x)^{p/\xi} \\
h_{T_1}(x) &= \frac{p\tau}{\xi}\,\tilde u^{\,p/\xi}(\tau-\tilde u)^{-p/\xi}(x+\tilde u)^{-p/\xi-1}(\tau-\tilde u-x)^{p/\xi-1} \\
H_{T_2}(x) &= 1-\left[\frac{\ln\tau-\ln(\tau-\tilde u-x)}{\ln\tau-\ln(\tau-\tilde u)}\right]^{-1/\xi} \\
h_{T_2}(x) &= \frac{[\ln\tau-\ln(\tau-\tilde u)]^{1/\xi}}{\xi(\tau-\tilde u-x)}\,[\ln\tau-\ln(\tau-\tilde u-x)]^{-1/\xi-1} \\
H_{T_3}(x) &= 1-\left[\tan\frac{\pi\tilde u}{2\tau}\right]^{1/\xi}\left[\tan\frac{\pi(\tilde u+x)}{2\tau}\right]^{-1/\xi} \\
h_{T_3}(x) &= \frac{\pi}{2\tau\xi}\left[\tan\frac{\pi\tilde u}{2\tau}\right]^{1/\xi}\left[\tan\frac{\pi(\tilde u+x)}{2\tau}\right]^{-1/\xi-1}\sec^{2}\frac{\pi(\tilde u+x)}{2\tau}
\end{align*}
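The table entries can be sanity-checked numerically, since each pdf must integrate to the corresponding cdf. A sketch for the hyperbolic pair (H_{T1}, h_{T1}) of Table 4.2 with illustrative parameters:

```python
# Numerical consistency check: the integral of h_T1 equals H_T1.
import numpy as np
from scipy.integrate import quad

def H_T1(x, xi, u_t, tau, p=1.0):
    r = p / xi
    return 1.0 - u_t**r * (tau - u_t)**(-r) * (x + u_t)**(-r) * (tau - u_t - x)**r

def h_T1(x, xi, u_t, tau, p=1.0):
    r = p / xi
    return (p * tau / xi) * u_t**r * (tau - u_t)**(-r) \
        * (x + u_t)**(-r - 1.0) * (tau - u_t - x)**(r - 1.0)

xi, u_t, tau = 0.5, 10.0, 50.0
val, _ = quad(h_T1, 0.0, 25.0, args=(xi, u_t, tau))
print(val, H_T1(25.0, xi, u_t, tau))     # the two numbers should agree
```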

4.3 Conditional Expectation of Excess Over Threshold

As we discussed in earlier chapters, the conditional expectation of the excess over a threshold is helpful in determining the choice of threshold u, but a few reasons restrict the use of the corresponding results. Recall the formula
\[ E(X-u \mid X>u) = \frac{\sigma_{u_0}+\xi u}{1-\xi}, \]
valid provided ξ < 1; when ξ ≥ 1, the mean is infinite. In practice, since the data collected always produce a finite mean residual life plot, no interpretation is possible in case ξ ≥ 1. Also, by the above formula, the conditional expectation is a linear function of the threshold u whose slope ξ/(1 − ξ) does not depend on u. Both the mean residual life plot of the rainfall data (Figure 5.1) from [6] and the mean residual life plot of the fire data (Figure 5.2) from [9] indicate a forced damping at the higher end, which a linear mean excess function cannot capture. The figures and more discussion can be found in Chapter 5.

Moreover, the method should be used very cautiously because of the lack of stability of the plot, even when the underlying distribution is exactly a Pareto distribution. We refer to [9] (pages 294-303) for the discussion.

Things are very different if transformed distributions are used. We concentrate our results on the transformed GPD distributions. First, the conditional expectation is finite for all ξ > 0, since the support of the transformed distribution is bounded. Moreover, when the threshold approaches the upper end of the support, the conditional expectation converges to zero. Explicit formulas are not available for all transformed distributions; however, in the cases that we are able to solve, the picture is clear, as in the next example.

Example 4.5. Conditional expectation of the excess over threshold for H_{T1}(x).
Notice that the distribution depends on p and ξ only through the ratio p/ξ, so we first study the case p/ξ = 1. Correspondingly,
\[ h_{T_1}(x) = \frac{\tau\tilde u}{\tau-\tilde u}\,(x+\tilde u)^{-2}. \]
Thus for ũ > ũ0,
\begin{align*}
E(X-u \mid X>u) &= \int_0^{\tau-\tilde u} x\,h_{T_1}(x)\,dx = \frac{\tau\tilde u}{\tau-\tilde u}\int_0^{\tau-\tilde u}\frac{x}{(x+\tilde u)^{2}}\,dx \qquad(\text{substitute } y = x+\tilde u) \\
&= \frac{\tau\tilde u}{\tau-\tilde u}\int_{\tilde u}^{\tau}\frac{y-\tilde u}{y^{2}}\,dy = \frac{\tau\tilde u}{\tau-\tilde u}\left[\ln y+\frac{\tilde u}{y}\right]_{y=\tilde u}^{y=\tau} \\
&= \frac{\tau\tilde u}{\tau-\tilde u}\left(\ln\tau-\ln\tilde u+\frac{\tilde u}{\tau}-1\right) = \frac{\tau\tilde u}{\tau-\tilde u}\ln\frac{\tau}{\tilde u}-\tilde u.
\end{align*}
Let ũ = λτ; then
\[ E(X-u \mid X>u) = \tau\left(\frac{-\lambda\ln\lambda}{1-\lambda}-\lambda\right) \qquad\text{for } \lambda_0 = \frac{\tilde u_0}{\tau} < \lambda < 1. \]
A plot of m(λ) := −λ ln λ/(1 − λ) − λ against λ can be generated by Matlab. From Figure 4.4, we conclude the following properties:

• The plot of m(λ) is a bell-shaped curve, skewed to the left.

• The maximum occurs at about λ0 = 0.3162, i.e., ũ = 0.3162τ.

• The curve is asymptotic to a line with slope −1/2 at the right end point λ = 1.

Figure 4.4: Plot of m(λ)

To obtain these properties, we simply look at
\[ m'(\lambda) = \frac{-(1-\lambda)(2-\lambda)-\ln\lambda}{(1-\lambda)^{2}}, \qquad m''(\lambda) = \frac{\lambda-\frac{1}{\lambda}-2\ln\lambda}{(1-\lambda)^{3}} < 0 \quad\text{for } 0 < \lambda < 1, \]
where the numerator of m''(λ) is negative on (0, 1) because it vanishes at λ = 1 and is increasing. Then
\[ \lim_{\lambda\to0} m'(\lambda) = \infty \qquad\text{and}\qquad \lim_{\lambda\to1} m'(\lambda) = -\frac{1}{2}, \]
and the only solution of m'(λ) = 0 is λ0 ≈ 0.3162.
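The value λ0 ≈ 0.3162 is easy to confirm numerically by solving m'(λ) = 0 with a bracketing root finder:

```python
# Locating the maximum of m(lambda) = -lambda*ln(lambda)/(1-lambda) - lambda.
import numpy as np
from scipy.optimize import brentq

def m_prime(lam):
    # m'(lambda) = [-(1-lam)(2-lam) - ln(lam)] / (1-lam)^2
    return (-(1.0 - lam) * (2.0 - lam) - np.log(lam)) / (1.0 - lam)**2

lam0 = brentq(m_prime, 0.01, 0.99)
print(lam0)   # about 0.3162, as quoted above
```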

Next we carry out a similar computation in the general case p ≠ ξ. Define r := p/ξ > 0; the distribution function of the hyperbolic transformed GPD distribution is then
\[ H_{T_1}(x) = 1-\left(\frac{\tilde u}{\tau-\tilde u}\right)^{r}\left(\frac{\tau-\tilde u-x}{x+\tilde u}\right)^{r}, \]
where ũ > ũ0, and

\begin{align*}
E(X-u \mid X>u) &= \int_0^{\tau-\tilde u} x\,dH_{T_1}(x) = \Big[x\,H_{T_1}(x)\Big]_{x=0}^{x=\tau-\tilde u}-\int_0^{\tau-\tilde u} H_{T_1}(x)\,dx \\
&= (\tau-\tilde u)-\int_0^{\tau-\tilde u}\left[1-\left(\frac{\tilde u}{\tau-\tilde u}\right)^{r}\left(\frac{\tau-\tilde u-x}{x+\tilde u}\right)^{r}\right]dx \\
&= \left(\frac{\tilde u}{\tau-\tilde u}\right)^{r}\int_0^{\tau-\tilde u}\left(\frac{\tau-\tilde u-x}{x+\tilde u}\right)^{r}dx \qquad(\text{substitute } y = x+\tilde u) \\
&= \left(\frac{\tilde u}{\tau-\tilde u}\right)^{r}\int_{\tilde u}^{\tau}\left(\frac{\tau-y}{y}\right)^{r}dy \qquad(\text{substitute } z = y/\tau,\ \tilde u = \lambda\tau) \\
&= \tau\left(\frac{\lambda}{1-\lambda}\right)^{r}\int_{\lambda}^{1}\left(\frac{1-z}{z}\right)^{r}dz.
\end{align*}
Define
\[ m_r(\lambda) := \left(\frac{\lambda}{1-\lambda}\right)^{r}\int_{\lambda}^{1}\left(\frac{1-z}{z}\right)^{r}dz; \]
it can be interpreted as the excess-over-threshold ratio. We use Matlab to generate plots of m_r(λ) in Figure 4.5 for different choices of r, and we conclude the following properties of m_r(λ):

• The plot of m_r(λ) is a bell-shaped curve with a single maximum.

• The maximum excess-over-threshold ratio decreases as r increases, and the corresponding λ0 decreases.

• The curve is asymptotic to a line with slope −1/(r + 1) at the right end point λ = 1.

Figure 4.5: Plot of m_r(λ)

Some of the properties can be derived from the calculation of
\begin{align*}
m_r'(\lambda) &= \left[\left(\frac{\lambda}{1-\lambda}\right)^{r}\right]'\int_{\lambda}^{1}\left(\frac{1-z}{z}\right)^{r}dz+\left(\frac{\lambda}{1-\lambda}\right)^{r}\frac{d}{d\lambda}\int_{\lambda}^{1}\left(\frac{1-z}{z}\right)^{r}dz \\
&= \frac{r\lambda^{r-1}}{(1-\lambda)^{r+1}}\int_{\lambda}^{1}\left(\frac{1-z}{z}\right)^{r}dz-1.
\end{align*}
At the right end point, L'Hopital's rule gives
\[ \lim_{\lambda\to1} m_r'(\lambda) = r\lim_{\lambda\to1}\frac{\int_{\lambda}^{1}\left(\frac{1-z}{z}\right)^{r}dz}{(1-\lambda)^{r+1}/\lambda^{r-1}}-1 = r\lim_{\lambda\to1}\frac{-\left(\frac{1-\lambda}{\lambda}\right)^{r}}{-(r+1)(1-\lambda)^{r}}-1 = \frac{r}{r+1}-1 = -\frac{1}{r+1}. \]
If r < 1, then since (1 − z)/z ≥ 1 for z ≤ 1/2,
\[ \int_{\lambda}^{1}\left(\frac{1-z}{z}\right)^{r}dz \ge \int_{\lambda}^{1/2}1\,dz = \frac{1}{2}-\lambda, \]
so
\[ \lim_{\lambda\downarrow0} m_r'(\lambda) \ge \lim_{\lambda\downarrow0}\, r\lambda^{r-1}\left(\frac{1}{2}-\lambda\right)-1 = \infty; \]
the case r = 1 is similar, since the integral itself diverges as λ ↓ 0. If r > 1, then lim_{λ↓0} ∫_λ^1((1−z)/z)^r dz = ∞, so by L'Hopital's rule
\[ \lim_{\lambda\downarrow0} m_r'(\lambda) = r\lim_{\lambda\downarrow0}\frac{\int_{\lambda}^{1}\left(\frac{1-z}{z}\right)^{r}dz}{\lambda^{1-r}}-1 = r\lim_{\lambda\downarrow0}\frac{-\left(\frac{1-\lambda}{\lambda}\right)^{r}}{(1-r)\lambda^{-r}}-1 = \frac{r}{r-1}-1 = \frac{1}{r-1} > 0. \]
In summary, we have
\[ \lim_{\lambda\downarrow0} m_r'(\lambda) = \begin{cases} \infty, & r \le 1 \\[1mm] \dfrac{1}{r-1}, & r > 1. \end{cases} \]
Thus the mean excess over threshold function increases for small thresholds and decreases for large thresholds. By the continuity of E(X − u | X > u) with respect to u, there exists λ = λ0, i.e., ũ = λ0τ, at which E(X − u | X > u) reaches its maximum.

CHAPTER 5

MODELING OVER TRANSFORMED DISTRIBUTIONS

5.1 Model Selection: Transformed or Not?

As we discussed in Chapter 4, modeling over transformed distributions is a technique that extends the original models to cases in which a natural limit exists. It enables us to choose models from a broader range, with the additional range parameter τ. We approach the model selection in the following steps.

The first thing we need to consider is whether a natural limit exists. In most cases the answer is yes, since all of our measurements and readings are finite. However, our major concern is whether the introduction of this limit will affect the modeling.

For example, suppose the age distribution of the United States population is our interest. Currently, people who are at least 100 years old form only a negligible portion of the study population, so any error caused by inaccurate estimation in this age range has only a very slight effect on the whole model. Things are different if our study object changes to the population aged at least 80; then a more accurate model is desirable for the population with age over 100. In detail, since the generalized Pareto distribution has a linear mean excess function while there is a natural force of mortality for very old people, the mean excess function changes to fit this situation when we apply transforms. Thus a transformed model is preferable when an elderly population is studied.

For the specific transforms to be used in the model, we recommend adopting the customary transforms of each field. For many models in hydrology and climatology, the hyperbolic transform (particularly with p = 1) is often a good choice. For quantities known to be roughly log-normal, stock returns for instance, the logarithmic transform may perform well. The trigonometric transform might be attractive to engineers. The overriding criterion is to choose the simplest model that fits the data reasonably well.

The mean residual life plot is suggested for use in the model selection. Notice that the major difference between the original GPD distribution and the transformed one is that for the transformed distribution the mean excess function first increases and then decreases, and the whole curve is concave down. If we look at the mean residual life plot of the hurricane data in Chapter 3 (Figure 3.2), only one observation represents the decreasing part of the plot. Also, in the hurricane example the natural limit of the total loss is the total wealth of the people who live close to the southeast coastline, which is huge compared with the recorded damage losses. Hence we conclude that the range parameter τ may not be necessary in a model of the hurricane data.

We examine two examples where a transform is quite important. The first example is the daily rainfall data in Coles [6] (page 80); the mean residual life plot is given in Figure 5.1.

Figure 5.1: Mean residual life plot for daily rainfall data. Source: Coles [6]

The complete data set has 17,531 observations, and there are 152 exceedances of the threshold u = 30. We can see clearly the maximum point in the plot, and about 10 observations represent the decreasing part of the plot. Even though the evidence is not convincingly strong, we should give full consideration to using transformed distributions in the model.

The next example is from Embrechts, Klüppelberg and Mikosch [9] (page 320); the mean residual life plot is reproduced in Figure 5.2. About 50 observations, in the range above 6000, compose the decreasing part of the plot. The evidence that the data cluster towards some specific upper value is quite strong, and a transformed model becomes very appealing here.

Figure 5.2: Mean residual life plot for fire data. Source: Embrechts [9]

Based on what we have discussed, we can go one step further and estimate the shape parameter ξ. Suppose we fit the data with a transformed GPD model using the EOT method. Fix p = 1 in the hyperbolic transform and recall the definition from Chapter 4:
\[ m_r(\lambda) := \left(\frac{\lambda}{1-\lambda}\right)^{r}\int_{\lambda}^{1}\left(\frac{1-z}{z}\right)^{r}dz, \]
where r = 1/ξ = α. We generate plots of m_r(λ) for ξ = 0.1, 0.2, ..., 2.0 as in Figure 5.3. Keep in mind that m_r(λ) does not take the range parameter τ into account; since τ is in general unknown, we cannot get any information about ξ directly from τ. Notice, however, that the relative position of the rescaled maximum point (the ratio of its y and x coordinates) stays constant for fixed ξ. Table 5.1, consisting of the rescaled maximum points and y/x ratios for various ξ, is generated for reference.

Figure 5.3: Mean excess function of the transformed GPD for ξ = 0.1, 0.2, ..., 2.0

ξ     maximum point      y/x ratio      ξ     maximum point      y/x ratio
0.1   (0.475, 0.0249)    0.0524         1.1   (0.305, 0.2330)    0.7640
0.2   (0.451, 0.0495)    0.1097         1.2   (0.294, 0.2491)    0.8473
0.3   (0.429, 0.0735)    0.1712         1.3   (0.284, 0.2645)    0.9312
0.4   (0.409, 0.0966)    0.2362         1.4   (0.275, 0.2792)    1.0151
0.5   (0.390, 0.1189)    0.3048         1.5   (0.267, 0.2932)    1.0981
0.6   (0.372, 0.1402)    0.3768         1.6   (0.259, 0.3066)    1.1839
0.7   (0.356, 0.1605)    0.4509         1.7   (0.251, 0.3195)    1.2729
0.8   (0.342, 0.1800)    0.5262         1.8   (0.244, 0.3318)    1.3600
0.9   (0.329, 0.1985)    0.6033         1.9   (0.237, 0.3437)    1.4502
1.0   (0.316, 0.2162)    0.6841         2.0   (0.231, 0.3551)    1.5370

Table 5.1: Maximum points of the mean excess function for ξ = 0.1, 0.2, ..., 2.0.
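Each row of Table 5.1 can be regenerated by maximizing m_r(λ) numerically. A sketch for ξ = 1 (so r = 1), whose maximum point (0.316, 0.2162) and y/x ratio 0.684 appear in the table:

```python
# Reproducing a row of Table 5.1 by numerical maximization of m_r(lambda).
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

def m_r(lam, r):
    integral, _ = quad(lambda z: ((1.0 - z) / z)**r, lam, 1.0)
    return (lam / (1.0 - lam))**r * integral

def max_point(xi):
    r = 1.0 / xi
    res = minimize_scalar(lambda lam: -m_r(lam, r),
                          bounds=(0.01, 0.99), method="bounded")
    return res.x, m_r(res.x, r)

lam0, y0 = max_point(1.0)
print(lam0, y0, y0 / lam0)    # about 0.316, 0.2162 and ratio 0.684
```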

The y/x ratio of the maximum point is monotonic in ξ, so we can estimate ξ from the value of the y/x ratio in the mean residual life plot. We apply this to the two examples mentioned above. From Figure 5.1 for the rainfall data, the y/x ratio of the maximum point is between 0.25 and 0.30; checking Table 5.1, we obtain an estimate of ξ between 0.4 and 0.5. Similarly, the y/x ratio of the maximum point in Figure 5.2 for the fire data is about 0.7, which gives an estimate of ξ between 1.0 and 1.1.

Once ξ is estimated, we can combine Table 5.1 with the mean residual life plot to get a reference value for the range parameter τ. Suppose we take ξ = 0.4 in the rainfall example. The observed x-coordinate of the maximum point is 62; then 62/0.409 = 151.6 is the estimate of the natural limit τ in the model. Similarly, in the fire data we take ξ = 1.0, and the estimate of the range parameter is τ = 5600/0.316 = 17,721.

Even though the mean excess function of a transformed distribution generally has better stability than before, the lack of stability is still the major concern when the plot is used as a precise reference. We recommend the standard model selection criteria described in Section 2.4 for deciding on the addition of the range parameter τ. For more accurate parameter estimation, we prefer the MLE method of the next section.

5.2 Parameter Estimation of Transformed Distributions

We adopt the MLE principle for the parameter estimation of transformed distributions. Suppose x = (x1, x2, ..., xn) are realizations of independent variables from a common distribution F. The parameter estimates are the values of the parameters that maximize the log-likelihood function over the parameter space. We list some of the calculation results for reference.

(1) Hyperbolic transform of the GEV family: case p = 1

Suppose
\[ F(x) = G_{T_1}(x) = \exp\left\{-\left[\frac{\xi\tau}{\sigma}\cdot\frac{x-a}{\tau-(x-a)}\right]^{-1/\xi}\right\} \]
and θ = (ξ, a, σ, τ); then
\[ L(\theta \mid x) = \frac{1}{\sigma^{n}}\exp\left\{-\sum_{i=1}^{n}\left[\frac{\xi\tau}{\sigma}\cdot\frac{x_i-a}{\tau-(x_i-a)}\right]^{-1/\xi}\right\}\prod_{i=1}^{n}\left[\frac{\xi}{\sigma}(x_i-a)\right]^{-1/\xi-1}\prod_{i=1}^{n}\left(1-\frac{x_i-a}{\tau}\right)^{1/\xi-1} \]
and
\[ l(\theta \mid x) = -n\ln\sigma-\sum_{i=1}^{n}\left[\frac{\xi\tau}{\sigma}\cdot\frac{x_i-a}{\tau-(x_i-a)}\right]^{-1/\xi}-\left(\frac{1}{\xi}+1\right)\sum_{i=1}^{n}\ln\left[\frac{\xi}{\sigma}(x_i-a)\right]+\left(\frac{1}{\xi}-1\right)\sum_{i=1}^{n}\ln\left(1-\frac{x_i-a}{\tau}\right). \]
The log-likelihood equations obtained by taking the partial derivatives of l(θ | x) are too complicated to display here; again we rely on numerical procedures for the maximization.

(2) Hyperbolic transform of the GPD family: case p = 1 and a = 0

Suppose
\[ F(x) = H_{T_1}(x) = 1-\left[\frac{u}{\tau-u}\cdot\frac{\tau-x-u}{x+u}\right]^{1/\xi} \]
and θ = (ξ, τ); then
\[ L(\theta \mid x) = \tau^{n}\,\xi^{-n}\left(\frac{u}{\tau-u}\right)^{n/\xi}\prod_{i=1}^{n}(x_i+u)^{-1/\xi-1}\prod_{i=1}^{n}(\tau-x_i-u)^{1/\xi-1} \]
and
\[ l(\theta \mid x) = n\ln\frac{\tau}{\xi}+\frac{n}{\xi}\ln\frac{u}{\tau-u}-\left(\frac{1}{\xi}+1\right)\sum_{i=1}^{n}\ln(x_i+u)+\left(\frac{1}{\xi}-1\right)\sum_{i=1}^{n}\ln(\tau-x_i-u). \]
Taking the partial derivatives of l(θ | x) with respect to ξ and τ, we obtain the log-likelihood equations
\[ n\xi+n\ln\frac{u}{\tau-u}-\sum_{i=1}^{n}\ln\frac{x_i+u}{\tau-x_i-u} = 0, \qquad \frac{n}{\tau}-\frac{n}{\xi(\tau-u)}+\left(\frac{1}{\xi}-1\right)\sum_{i=1}^{n}\frac{1}{\tau-x_i-u} = 0. \tag{5.1} \]
Solving the first equation in (5.1), we obtain
\[ \xi = -\ln\frac{u}{\tau-u}+\frac{1}{n}\sum_{i=1}^{n}\ln\frac{x_i+u}{\tau-x_i-u} > 0. \]
Replacing ξ by this expression in the second equation yields an implicit equation in τ. Solutions satisfying τ > max{x1, ..., xn} + u can be found by numerical procedures.
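Numerically, the system (5.1) reduces to a one-dimensional search: for a trial τ, the closed form above gives ξ(τ), and substituting it back into l(θ | x) yields a profile log-likelihood in τ alone. A minimal sketch with illustrative excess data; the optimizer bounds are arbitrary placeholders, and a profile that stays flat near the upper bound signals that the data carry little information about τ.

```python
# Profile-likelihood MLE for the hyperbolic-transformed GPD (p = 1, a = 0).
import numpy as np
from scipy.optimize import minimize_scalar

def xi_hat(tau, x, u):
    # Closed form from the first equation of (5.1).
    return np.log((tau - u) / u) + np.mean(np.log((x + u) / (tau - x - u)))

def neg_profile_loglik(tau, x, u):
    n, xi = len(x), xi_hat(tau, x, u)
    return -(n * np.log(tau / xi) + n / xi * np.log(u / (tau - u))
             - (1.0 / xi + 1.0) * np.sum(np.log(x + u))
             + (1.0 / xi - 1.0) * np.sum(np.log(tau - x - u)))

x = np.array([120.0, 35.0, 260.0, 80.0, 15.0])   # illustrative excesses over u
u = 500.0
res = minimize_scalar(lambda t: neg_profile_loglik(t, x, u),
                      bounds=(1.001 * (x.max() + u), 1e6), method="bounded")
print(res.x, xi_hat(res.x, x, u))                # tau-hat and xi(tau-hat)
```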

(3) Logarithmic transform of the GPD family: case a = 0

Suppose
\[ F(x) = H_{T_2}(x) = 1-\left[\frac{\ln\tau-\ln(\tau-x-u)}{\ln\tau-\ln(\tau-u)}\right]^{-1/\xi} \]
and θ = (ξ, τ); then
\[ L(\theta \mid x) = \xi^{-n}\left(\ln\frac{\tau}{\tau-u}\right)^{n/\xi}\prod_{i=1}^{n}\frac{1}{\tau-x_i-u}\prod_{i=1}^{n}\left(\ln\frac{\tau}{\tau-x_i-u}\right)^{-1/\xi-1} \]
and
\[ l(\theta \mid x) = -n\ln\xi+\frac{n}{\xi}\ln\ln\frac{\tau}{\tau-u}-\sum_{i=1}^{n}\ln(\tau-x_i-u)-\left(\frac{1}{\xi}+1\right)\sum_{i=1}^{n}\ln\ln\frac{\tau}{\tau-x_i-u}. \]
Taking the partial derivatives of l(θ | x) with respect to ξ and τ, we obtain the log-likelihood equations:

\[ n\xi+n\ln\ln\frac{\tau}{\tau-u}-\sum_{i=1}^{n}\ln\ln\frac{\tau}{\tau-x_i-u} = 0, \qquad \left(\frac{1}{\xi}+1\right)\sum_{i=1}^{n}\frac{x_i+u}{\tau(\tau-x_i-u)\ln\frac{\tau}{\tau-x_i-u}} = \frac{nu}{\xi\,\tau(\tau-u)\ln\frac{\tau}{\tau-u}}+\sum_{i=1}^{n}\frac{1}{\tau-x_i-u}. \tag{5.2} \]
Solving the first equation in (5.2), we obtain
\[ \xi = -\ln\ln\frac{\tau}{\tau-u}+\frac{1}{n}\sum_{i=1}^{n}\ln\ln\frac{\tau}{\tau-x_i-u} > 0. \]
Replacing ξ by this expression in the second equation yields an implicit equation in τ. Solutions satisfying τ > max{x1, ..., xn} + u can be found by numerical procedures.

(4) Trigonometric transform of the GPD family: case a = 0

Suppose
\[ F(x) = H_{T_3}(x) = 1-\left[\tan\frac{\pi u}{2\tau}\right]^{1/\xi}\left[\tan\frac{\pi(u+x)}{2\tau}\right]^{-1/\xi} \]
and θ = (ξ, τ); then
\[ L(\theta \mid x) = \left(\frac{\pi}{2\tau\xi}\right)^{n}\left(\tan\frac{\pi u}{2\tau}\right)^{n/\xi}\prod_{i=1}^{n}\sec^{2}\frac{\pi(u+x_i)}{2\tau}\prod_{i=1}^{n}\left(\tan\frac{\pi(u+x_i)}{2\tau}\right)^{-1/\xi-1} \]
and
\[ l(\theta \mid x) = n\ln\frac{\pi}{2\tau\xi}+\frac{n}{\xi}\ln\tan\frac{\pi u}{2\tau}+2\sum_{i=1}^{n}\ln\sec\frac{\pi(u+x_i)}{2\tau}-\left(\frac{1}{\xi}+1\right)\sum_{i=1}^{n}\ln\tan\frac{\pi(u+x_i)}{2\tau}. \]
Taking the partial derivatives of l(θ | x) with respect to ξ and τ, we obtain the log-likelihood equations:

\[ n\xi+n\ln\tan\frac{\pi u}{2\tau}-\sum_{i=1}^{n}\ln\tan\frac{\pi(u+x_i)}{2\tau} = 0, \qquad n\tau+\frac{\pi n u}{\xi\sin(\pi u/\tau)}+\pi\sum_{i=1}^{n}(u+x_i)\tan\frac{\pi(u+x_i)}{2\tau} = \left(\frac{1}{\xi}+1\right)\pi\sum_{i=1}^{n}\frac{u+x_i}{\sin(\pi(u+x_i)/\tau)}. \tag{5.3} \]
Solving the first equation in (5.3), we obtain
\[ \xi = -\ln\tan\frac{\pi u}{2\tau}+\frac{1}{n}\sum_{i=1}^{n}\ln\tan\frac{\pi(u+x_i)}{2\tau} > 0. \]
Replacing ξ by this expression in the second equation yields an implicit equation in τ. Solutions satisfying τ > max{x1, ..., xn} + u can be found by numerical procedures.

The results obtained here give a very simple connection between the shape parameter ξ and the range parameter τ. Its form is very similar to that of the well-known Hill estimator, which provides inference for the shape parameter ξ under the general maximum domain of attraction condition. For more detail, see Hill [14].

5.3 Future Research Directions

In general, transforming bounded data so that they can be fitted by unbounded distribution models provides a useful tool when a natural bound exists. This technique can easily be combined with many other techniques at the research frontier of extreme values.

• For most problems involving actuarial modeling and value at risk in finance, ξ is positive and the transforms work fine. A very natural problem is to extend the model to the cases where the shape parameter ξ ≤ 0. When ξ < 0, one possible way is to find appropriate transforms (−∞, b] → (a, b] and take steps similar to those in the previous chapters to obtain corresponding results. The transforms in Chapter 4 all have little effect in the vicinity of the fixed point b; thus the effect on the extreme (maximum) value is also small, and may even be negligible in this situation. We would need either to seek solutions that improve the model or to give a reasonable explanation of why transforms are not necessary. For the Gumbel distribution, transforms can only be performed on the half interval that includes the end of interest.

• As we have seen in Section 4.2, the probability density functions (pdfs) of the transformed GEV or GPD distributions may have an infinite limit at the upper bound a + τ. This indicates that a mixture of a continuous and a discrete distribution may fit the model as well. If the point mass at the upper bound is ε, the pdf can be written as
\[ f(x) = (1-\varepsilon)\,h(x)+\varepsilon\,\delta(x-a-\tau), \]
where h(x) is the pdf of the transformed distribution and δ(x) is the unit point mass density.

• One of the most active areas in this field is extreme value theory applied to time-dependent sequences and certain time series models. By adopting the general theory of point processes, there is a natural way of explaining the independence between parameters. In particular, the range parameter τ added to the GEV and GPD models can also depend on the time variable t, and we can apply all the techniques used on the other parameters to fit a reasonable function for τ. For example, one of the interesting topics is the limits of human athletic ability. Taking the record time of the 100-meter dash as an index, we certainly know that there should be a natural limit for this record, for instance 9.5 seconds, and our interest may be in the improvement of this limit as time evolves. Several approaches are possible here: one way is to collect the top 10 finishes of each year and apply order statistics techniques; an alternative way is to collect all runs with record times less than 10.2 seconds and then apply the EOT method.

• So far we have studied the extreme value theory of the univariate case. If there is an inter-relation between two variables, it is beneficial to study the joint behavior of the combined data. Some examples of bivariate models: the distance between the moon and the earth vs. the sea level; extreme weather vs. the El Niño effect; the exchange rate of the US dollar against both UK sterling and the Japanese yen. Our interest is in the effect of the introduction of the range parameter τ. If the two variates in a bivariate model are positively correlated, and the range parameter τ is a significant factor for one of the variables, then probably the other variable can also be fitted with a transformed model. Further research also needs to explore the possibility of constructing transforms that map unbounded multidimensional regions to simple regions.

• Our last approach is to apply the technique of proper transforms to extreme value models using Bayesian inference and/or Markov chains. A Bayesian analysis of extreme value data is desirable if we can acquire another source of information through a prior distribution. It is especially helpful when the number of available observations is not large enough to apply the large-sample approximations of extreme value theory. A byproduct of the Bayesian analysis is a more complete inference, through the posterior distribution, than the MLE method provides. In case the extreme values seem to cluster, we can model the successive observations x1, x2, ..., xn with a first-order Markov process. The MLE method works well in this setting, and the introduction of the range parameter τ only increases the dimension of the parameter space by 1.

BIBLIOGRAPHY

[1] Arnold, B.C., Balakrishnan, N. & Nagaraja, H.N. A First Course in Order Statistics. Wiley, New York. (1992), 119, 186, 195.

[2] Azzalini, A. Statistical Inference Based on the Likelihood. Chapman and Hall, London. (1996).

[3] Bassi, F. & Embrechts, P. & Kafetzaki, M. “A survival kit on quantile estimation”. Environmetrics 5 (1994), 221–239.

[4] Casella, G. & Berger, J.O. Statistical Inference. Duxbury, Belmont, California. (1990).

[5] Coles, S.G. & Tawn, J.A. “A seasonal Markov model for extremely low tem- peratures”. Environmetrics 5 (1994), 221–239.

[6] Coles, S.G. An Introduction to Statistical Modeling of Extreme Values. Springer- Verlag, London. (2001)

[7] Coles, S.G. & Powell, E. A. “Bayesian methods in extreme value modeling: a review and new developments”. International Statistical Review 64 (1996), 119–136.

[8] David, H.A. Order Statistics. Wiley, New York. (1981), 195, 293.

[9] Embrechts, P., Klüppelberg, C. & Mikosch, T. Modelling Extremal Events for Insurance and Finance. Springer, Berlin. (1997).

[10] Fisher, R.A. & Tippett, L.H.C. "Limiting forms of the frequency distribution of the largest or smallest member of a sample". Proc. Cambridge Philos. Soc. 24 (1928), 180–190.

[11] Froot, K.A. & O'Connell, P.G. "The Pricing of US Catastrophe Reinsurance". NBER Working Paper No. 6043, May 1997; in The Financing of Catastrophe Risk, K.A. Froot, ed. Chicago: University of Chicago Press, 1999.

[12] Gnedenko, B.V. & Kolmogorov, A.N. Limit Theorems for Sums of Independent Random Variables. Addison-Wesley, Cambridge, Mass. (1954), 80–81.

[13] de Haan, L. “Fighting the arch-enemy with mathematics”. Statistica Neer- landica 44 (1990), 45–68.

[14] Hill, B.M. "A simple general approach to inference about the tail of a distribution". Ann. Statist. 3 (1975), 1163–1174.

[15] Jaffe, D.M. & Russell, T. “Catastrophe insurance, capital markets and unin- surable risk”. (1996), Financial Institutions Center, The Wharton School, 96–12.

[16] Mishkin, F.S. & White, E.N. "U.S. stock market crashes and their aftermath: implications for monetary policy". (2002), Asset Price Bubbles Conference, Federal Reserve Bank of Chicago and the World Bank, Chicago, Illinois.

[17] Pielke, R.A., Jr. & Landsea, C.W. "Normalized Hurricane Damages in the United States: 1925-1995". Weather and Forecasting 13 (1998), 621–631.

[18] Resnick, S.I. Extreme Values, Regular Variation, and Point Processes. Springer, New York. (1987), Proposition 0.3.

[19] Smith, R. L. “Maximum likelihood estimation in a class of non-regular cases”. Biometrika 72 (1985), 67–90.
