
Journal of the American Statistical Association

ISSN: 0162-1459 (Print) 1537-274X (Online) Journal homepage: http://www.tandfonline.com/loi/uasa20

Fixed-k Asymptotic Inference About Tail Properties

Ulrich K. Müller & Yulong Wang

To cite this article: Ulrich K. Müller & Yulong Wang (2017) Fixed-k Asymptotic Inference About Tail Properties, Journal of the American Statistical Association, 112:519, 1334-1343, DOI: 10.1080/01621459.2016.1215990

To link to this article: https://doi.org/10.1080/01621459.2016.1215990

Accepted author version posted online: 12 Aug 2016. Published online: 13 Jun 2017.


Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=uasa20

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, Theory and Methods

Fixed-k Asymptotic Inference About Tail Properties

Ulrich K. Müller and Yulong Wang

Department of Economics, Princeton University, Princeton, NJ

ABSTRACT

We consider inference about tail properties of a distribution from an iid sample, based on extreme value theory. All of the numerous previous suggestions rely on asymptotics where eventually, an infinite number of observations from the tail behave as predicted by extreme value theory, enabling the consistent estimation of the key tail index, and the construction of confidence intervals using the delta method or other classic approaches. In small samples, however, extreme value theory might well provide good approximations for only a relatively small number of tail observations. To accommodate this concern, we develop asymptotically valid confidence intervals for high quantiles and tail conditional expectations that only require extreme value theory to hold for the largest $k$ observations, for a given and fixed $k$. Small-sample simulations show that these "fixed-k" intervals have excellent small-sample coverage properties, and we illustrate their use with mainland U.S. hurricane data. In addition, we provide an analytical result about the additional asymptotic robustness of the fixed-k approach compared to $k_n \to \infty$ inference.

ARTICLE HISTORY: Received February; accepted July

KEYWORDS: Extreme quantiles; tail conditional expectations; extreme value distribution

1. Introduction

Tail properties of distributions have been an important and ongoing empirical and theoretical issue. A large literature is devoted to exploiting implications of extreme value theory and tail regularity conditions for purposes of estimation and inference: see Embrechts, Klüppelberg, and Mikosch (1997), Coles (2001), Reiss and Thomas (2007), Resnick (2007), de Haan and Ferreira (2007), Beirlant, Caeiro, and Gomes (2012) and Gomes and Guillou (2015) for reviews and references. As demonstrated by Balkema and de Haan (1974) and Pickands (1975), a distribution $F$ is in the domain of attraction of an extreme value distribution with tail index $\xi$ if and only if its tail is well approximated by a generalized Pareto distribution with shape parameter $\xi$. This regularity assumption about the tail implies that common tail properties of interest, such as tail probabilities, high quantiles, or tail conditional expectations, are functions of $\xi$ and the location and scale of the generalized Pareto distribution, at least approximately. Corresponding estimators of tail properties can hence be constructed by plugging in estimators of these three parameters. The literature contains numerous suggestions along those lines, reviewed in the surveys mentioned above.

A common feature of all of these approaches is that they model an increasing number $k_n \to \infty$, $k_n/n \to 0$ of tail observations in a sample of size $n$ as stemming from the approximate generalized Pareto tail. Indeed, under additional regularity conditions, tail index estimators $\hat{\xi}$ can typically be shown to be consistent and asymptotically Gaussian, enabling the construction of confidence intervals about tail properties using variants of the delta method or similar approaches. For these asymptotics to provide a good approximation, however, $k_n$ has to satisfy a delicate balance: on the one hand, picking $k_n$ small invalidates the asymptotic approximation of consistency and/or Gaussianity of $\hat{\xi}$, and thus undermines the distribution theory that justifies the confidence interval construction. On the other hand, picking $k_n$ large amounts to imposing the generalized Pareto approximation on a relatively large fraction of the distribution, especially if $n$ is only moderately large. This can be a poor approximation, even for underlying distributions in the domain of attraction of an extreme value distribution, leading to substantial bias and undersized confidence intervals. As such, for some combinations of $n$ and $F$, no choice of $k_n$ leads to satisfactory inference derived under $k_n \to \infty$ asymptotics.

This article develops an alternative asymptotic embedding to address this issue. In particular, we derive asymptotically valid inference under the sole assumption that extreme value theory applies to the $k$ largest observations, for given and fixed $k$. This should yield good small-sample approximations whenever (a little more than) a fraction $k/n$ of the tail of $F$ is well approximated by a generalized Pareto distribution. These "fixed-k" asymptotics reflect the small-sample nature of the inference problem in the sense that only relatively few observations are assumed to stem from the nicely behaved tail, while previous approaches crucially exploit that the number of such observations is (eventually) large. Our approach is close in spirit to recently developed alternative asymptotic embeddings in other contexts, such as the "fixed-b" asymptotics considered by Kiefer and Vogelsang (2005) in the context of heteroskedasticity and autocorrelation robust inference, or the small bandwidth asymptotics in nonparametric kernel estimation studied by Cattaneo, Crump and Jansson (2010, 2014).

More specifically, we focus in this article on the construction of confidence intervals for the $1 - h/n$ quantile, for given $h$, and corresponding tail conditional expectation from an iid sample. By restricting attention to scale and location equivariant

CONTACT Ulrich K. Müller [email protected] Department of Economics, Princeton University, Princeton, NJ. © American Statistical Association

intervals, the asymptotic problem under fixed-k extreme value theory becomes a reasonably transparent parametric small-sample problem: Given a single $k$-dimensional draw from the joint extreme value distribution (which is indexed only by the scalar $\xi$), construct an equivariant confidence interval for a specific deterministic function of $\xi$ and $h$. One straightforward construction is obtained by inverting a generalized likelihood ratio statistic. A more systematic approach along the lines of Elliott, Müller, and Watson (2015) and Müller and Norets (2016) minimizes weighted expected length (over $\xi$) subject to the coverage constraint.

Section 2 contains the details of the corresponding derivations. Monte Carlo simulations in Section 3 show that these new confidence intervals have excellent small-sample coverage and length properties for moderately large $k$ and $n$ compared to previous confidence interval constructions. We illustrate the application of the new intervals with data on mainland U.S. hurricane damage in Section 4. Finally, in Section 5, we conclude with a theorem that provides an analytical sense in which the fixed-k approach leads to more robust large sample inference compared to potentially more informative $k_n \to \infty$ approaches.

2. Derivation of Fixed-k Inference

2.1. High Quantile

We observe a random sample $Y_1, Y_2, \ldots, Y_n$ from some population with cumulative distribution function $F$. Write $U(F, p)$ for the $1 - 1/p$ quantile of $F$. We initially consider inference about the quantile $U(F, n/h)$ for given $h$. The objective is to construct an asymptotically valid confidence interval for $U(F, n/h)$ of level $1 - \alpha$, where $h$ does not vary with $n$ (so the quantile of interest is of the same order of magnitude as the sample maximum).

Let $Y_{n:1} \le Y_{n:2} \le \cdots \le Y_{n:n}$ denote the order statistics, so that $Y_{n:n}$ is the sample maximum. The fundamental result in extreme value theory due to Fisher and Tippett (1928) and Gnedenko (1943) states that if there exist sequences $a_n$ and $b_n$ such that

$$\frac{Y_{n:n} - b_n}{a_n} \Rightarrow X_1 \tag{1}$$

for some nondegenerate random variable $X_1$, then the distribution of $X_1$ is, up to location and scale normalization, the generalized extreme value distribution with c.d.f.

$$G_\xi(x) = \begin{cases} \exp[-(1 + \xi x)^{-1/\xi}], & 1 + \xi x \ge 0, \text{ for } \xi \ne 0 \\ \exp[-e^{-x}], & x \in \mathbb{R}, \ \xi = 0 \end{cases} \tag{2}$$

where $\xi < 0$, $\xi = 0$ and $\xi > 0$ correspond to Weibull, Gumbel and Fréchet type extreme value distributions, respectively. Without loss of generality, assume that any location and scale normalization of $X_1$ is subsumed in $a_n$ and $b_n$, so that the c.d.f. of $X_1$ in (1) is equal to $G_\xi$. It is well known (see, for instance, Theorem 3.5 of Coles (2001)) that if (1) holds, then extreme value theory also holds jointly for the first $k$ order statistics

$$\left( \frac{Y_{n:n} - b_n}{a_n}, \ldots, \frac{Y_{n:n-k+1} - b_n}{a_n} \right)' \Rightarrow \mathbf{X} = (X_1, \ldots, X_k)' \tag{3}$$

for any fixed $k$, where the joint p.d.f. of $\mathbf{X}$ is given by $f_{\mathbf{X}|\xi}(x_1, \ldots, x_k) = G_\xi(x_k) \prod_{i=1}^{k} g_\xi(x_i)/G_\xi(x_i)$ on $x_k \le x_{k-1} \le \cdots \le x_1$ with $g_\xi(x) = dG_\xi(x)/dx$, and zero otherwise.

We seek to construct an inference method that yields valid asymptotic inference whenever (3) holds, for some given finite $k$, to better approximate the small-sample effect of having a finite number of order statistics that are reasonably modeled as stemming from the generalized Pareto tail. Thus, the effective data becomes $\mathbf{Y}_k = (Y_{n:n}, \ldots, Y_{n:n-k+1})'$, and the objective is to construct a confidence set $S(\mathbf{Y}_k) \subset \mathbb{R}$ such that $P(U(F, n/h) \in S(\mathbf{Y}_k)) \ge 1 - \alpha$, at least as $n \to \infty$.

By definition of $U(F, n/h)$, $P(Y_{n:n} \le U(F, n/h)) = (1 - h/n)^n \to e^{-h}$. Thus, under (3), $(U(F, n/h) - b_n)/a_n$ converges to the $e^{-h}$ quantile of $X_1$, denoted $q(\xi, h)$ in the following. A straightforward calculation shows $q(\xi, h) = (h^{-\xi} - 1)/\xi$ for $\xi \ne 0$ and $q(0, h) = -\log(h)$. If $a_n$ and $b_n$ were known, the asymptotic problem would hence become inference about $q(\xi, h)$ based on the $k \times 1$ vector of observations $\mathbf{X}$. But $a_n$ and $b_n$ are not known, and they depend on tail properties of $F$ themselves. To make further progress, we hence impose location and scale equivariance on the confidence set $S$. Specifically, we impose that for any constants $a > 0$ and $b$, $S(a\mathbf{Y}_k + b) = aS(\mathbf{Y}_k) + b$, where $aS(\mathbf{Y}_k) + b = \{y : (y - b)/a \in S(\mathbf{Y}_k)\}$. Under this equivariance constraint, we can write

$$P(U(F, n/h) \in S(\mathbf{Y}_k)) = P\left( \frac{U(F, n/h) - b_n}{a_n} \in S\left( \frac{\mathbf{Y}_k - b_n}{a_n} \right) \right) \to P_\xi(q(\xi, h) \in S(\mathbf{X})),$$

where the notation $P_\xi$ indicates that the probability of the event $q(\xi, h) \in S(\mathbf{X})$ for any given set valued function $S$ only depends on $\xi$. Let $\Xi \subset \mathbb{R}$ be the set of tail indices for which we impose asymptotically valid inference. The asymptotic problem then is the construction of a location and scale equivariant $S$ that satisfies

$$P_\xi(q(\xi, h) \in S(\mathbf{X})) \ge 1 - \alpha \text{ for all } \xi \in \Xi \tag{4}$$

since any $S$ that satisfies (4) also satisfies $\liminf_{n \to \infty} P(U(F, n/h) \in S(\mathbf{Y}_k)) \ge 1 - \alpha$ under (3). This is a fairly transparent small-sample problem, involving a single observation $\mathbf{X} \in \mathbb{R}^k$ from a parametric distribution indexed only by the scalar parameter $\xi \in \Xi$.

One straightforward construction of $S$ is based on the inversion of a generalized likelihood ratio statistic: let $L_{\tilde{\mathbf{X}}}(\mu, \sigma, \xi) = \log f_{\mathbf{X}|\xi}\left( \frac{X_1 - \mu}{\sigma}, \ldots, \frac{X_k - \mu}{\sigma} \right) - k \log(\sigma)$ be the log-likelihood of the $k \times 1$ random vector $\tilde{\mathbf{X}} = \mu + \sigma \mathbf{X}$. To test the null hypothesis $H_0 : q(\xi, h) = q_0$ based on $\mathbf{X}$, consider the test statistic

$$\mathrm{LR}(q_0, \mathbf{X}) = \max_{\{(\mu, \sigma, \xi) : \sigma > 0, \, \xi \in \Xi\}} L_{\tilde{\mathbf{X}}}(\mu, \sigma, \xi) - \max_{\{(\mu, \sigma, \xi) : \mu + \sigma q(\xi, h) = q_0, \, \sigma > 0, \, \xi \in \Xi\}} L_{\tilde{\mathbf{X}}}(\mu, \sigma, \xi).$$

Clearly, $\mathrm{LR}(aq_0 + b, a\mathbf{X} + b) = \mathrm{LR}(q_0, \mathbf{X})$ for all $a > 0$ and $b \in \mathbb{R}$, so the set

$$S_{\mathrm{LR}}(\mathbf{X}) = \{q_0 : \mathrm{LR}(q_0, \mathbf{X}) < \mathrm{cv}_{\mathrm{LR}}\} \tag{5}$$

is equivariant. To ensure (4) holds for $S = S_{\mathrm{LR}}$, $\mathrm{cv}_{\mathrm{LR}}$ here is chosen to solve $\sup_{\xi \in \Xi} P_\xi(\mathrm{LR}(q(\xi, h), \mathbf{X}) > \mathrm{cv}_{\mathrm{LR}}) = \alpha$.
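The limit objects of Section 2.1 can be illustrated with a short Python sketch. It evaluates $G_\xi$ from (2) and $q(\xi, h)$, and simulates the joint limit $\mathbf{X}$ in (3). The simulation relies on the standard representation $X_i = (\Gamma_i^{-\xi} - 1)/\xi$, where $\Gamma_i$ are partial sums of iid unit exponentials; this representation is an assumption brought in for illustration rather than something stated in the article, and the function names `q`, `gev_cdf`, and `draw_X` are hypothetical.

```python
import math
import random


def q(xi, h):
    """e^{-h} quantile of G_xi: (h^{-xi} - 1)/xi, or -log(h) when xi = 0."""
    if xi == 0.0:
        return -math.log(h)
    return (h ** (-xi) - 1.0) / xi


def gev_cdf(x, xi):
    """Generalized extreme value c.d.f. G_xi(x) as in equation (2)."""
    if xi == 0.0:
        return math.exp(-math.exp(-x))
    t = 1.0 + xi * x
    if t <= 0.0:
        # Outside the support: below it for xi > 0, above it for xi < 0.
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def draw_X(k, xi, rng):
    """One draw of X = (X_1 >= ... >= X_k) from the joint limit in (3).

    Assumption (not from the article): X_i = (Gamma_i^{-xi} - 1)/xi with
    Gamma_i the i-th partial sum of iid Exp(1) variables.
    """
    draws, gamma = [], 0.0
    for _ in range(k):
        gamma += rng.expovariate(1.0)
        if xi == 0.0:
            draws.append(-math.log(gamma))
        else:
            draws.append((gamma ** (-xi) - 1.0) / xi)
    return draws


if __name__ == "__main__":
    rng = random.Random(0)
    print(q(0.25, 1.0), gev_cdf(q(0.25, 1.0), 0.25), math.exp(-1.0))
    print(draw_X(5, 0.25, rng))
```

By construction `gev_cdf(q(xi, h), xi)` equals $e^{-h}$ up to rounding, which mirrors the statement that $q(\xi, h)$ is the $e^{-h}$ quantile of $X_1$.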

A more systematic approach to confidence interval construction seeks to minimize the weighted average expected length criterion

$$\int E_\xi[\mathrm{lgth}(S(\mathbf{X}))] \, dW(\xi), \tag{6}$$

where $W$ is a positive measure with support on $\Xi$, and $\mathrm{lgth}(A) = \int \mathbf{1}[y \in A] \, dy$ for any Borel set $A \subset \mathbb{R}$. To solve the program of minimizing (6) subject to (4) among all equivariant set estimators $S$, introduce

$$\mathbf{X}^s = \left( \frac{X_1 - X_k}{X_1 - X_k}, \frac{X_2 - X_k}{X_1 - X_k}, \ldots, \frac{X_k - X_k}{X_1 - X_k} \right)'$$

and

$$Y^s(\xi) = \frac{q(\xi, h) - X_k}{X_1 - X_k}.$$

In this notation, equivariance of $S$ implies $E_\xi[\mathrm{lgth}(S(\mathbf{X}))] = E_\xi[(X_1 - X_k) \, \mathrm{lgth}(S(\mathbf{X}^s))] = E_\xi[\kappa_\xi(\mathbf{X}^s) \, \mathrm{lgth}(S(\mathbf{X}^s))]$ with $\kappa_\xi(\mathbf{X}^s) = E_\xi[X_1 - X_k | \mathbf{X}^s]$, and $P_\xi(q(\xi, h) \in S(\mathbf{X})) = P_\xi(Y^s(\xi) \in S(\mathbf{X}^s))$. Thus, the program of minimizing (6) subject to (4) among all equivariant set estimators $S$ equivalently becomes

$$\min_{S(\cdot)} \int E_\xi[\kappa_\xi(\mathbf{X}^s) \, \mathrm{lgth}(S(\mathbf{X}^s))] \, dW(\xi) \quad \text{s.t.} \quad P_\xi(Y^s(\xi) \in S(\mathbf{X}^s)) \ge 1 - \alpha \text{ for all } \xi \in \Xi. \tag{7}$$

Note that (7) only involves $S$ evaluated at $\mathbf{X}^s$, which is an element of a smaller dimensional subspace compared to $\mathbf{X}$ (since the first and last elements of $\mathbf{X}^s$ are always equal to 1 and 0, respectively). But any solution to (7) also provides the form of $S$ for unconstrained $\mathbf{X}$ via equivariance, $S(\mathbf{X}) = (X_1 - X_k) S(\mathbf{X}^s) + X_k$.

By writing the expectations in (7) as integrals over the densities $f_{\mathbf{X}^s|\xi}$ of $\mathbf{X}^s$ and $f_{Y^s(\xi), \mathbf{X}^s|\xi}$ of $(Y^s(\xi), \mathbf{X}^s)$, it is not very difficult to see that its solution is

$$S_\Lambda(\mathbf{X}^s) = \left\{ y : \int \kappa_\xi(\mathbf{X}^s) f_{\mathbf{X}^s|\xi}(\mathbf{X}^s) \, dW(\xi) < \int f_{Y^s(\xi), \mathbf{X}^s|\xi}(y, \mathbf{X}^s) \, d\Lambda(\xi) \right\} \tag{8}$$

provided the nonnegative measure $\Lambda$ has support on $\{\xi : P_\xi(Y^s(\xi) \in S_\Lambda(\mathbf{X}^s)) = 1 - \alpha\} \subset \Xi$, and $S_\Lambda(\mathbf{X}^s)$ satisfies the constraint in (7).* The only remaining challenge is thus to identify suitable Lagrangian weights $\Lambda$, and to this end we resort to the numerical approach developed in Elliott, Müller, and Watson (2015) and Müller and Watson (2015). The output of these numerical approximations is a level $1 - \alpha$ equivariant set that has, by construction, $W$-weighted average expected length that is no more than 1% longer than any other level $1 - \alpha$ equivariant set (that is, any other equivariant set satisfying (4)). Further details are provided in the Appendix.

2.2. Tail Conditional Expectation

Now consider the problem of constructing an asymptotically valid fixed-k confidence interval for the tail conditional expectation $T_n = E[Y_i | Y_i \ge U(F, n/h)]$, for given $h$, where $Y_i$ is iid with c.d.f. $F$. Assume $F$ is in the domain of attraction with tail index $\xi < 1$ (otherwise, the tail conditional expectation does not exist). Recall that for a positive random variable $Z$ with c.d.f. $F_Z$, $E[Z] = \int_0^\infty (1 - F_Z(z)) \, dz$. Thus

$$\frac{T_n - b_n}{a_n} = \frac{U(F, n/h) - b_n}{a_n} + h^{-1} \int_{\frac{U(F, n/h) - b_n}{a_n}}^{\infty} n(1 - F(a_n y + b_n)) \, dy.$$

As noted before, a necessary condition for $F$ to be in the domain of attraction of an extreme value distribution with index $\xi$ is that its tail is approximately generalized Pareto. This can be written in the form

$$n(1 - F(a_n y + b_n)) \to (1 + \xi y)^{-1/\xi} \text{ for all } y \text{ such that } 1 + \xi y > 0$$

as in Theorem 1.1.6 of de Haan and Ferreira (2007) (with $(1 + \xi y)^{-1/\xi}$ interpreted as $e^{-y}$ for $\xi = 0$). Furthermore, as in Section 2.1 above, $(U(F, n/h) - b_n)/a_n \to q(\xi, h)$. Putting this together yields the convergence†

$$\frac{T_n - b_n}{a_n} \to q(\xi, h) + h^{-1} \int_{\{y : y > q(\xi, h) \text{ and } (1 + \xi y) > 0\}} (1 + \xi y)^{-1/\xi} \, dy = \frac{h^{-\xi}}{\xi(1 - \xi)} - \frac{1}{\xi} \equiv \tau(\xi, h).$$

Thus, the only difference between fixed-k asymptotic equivariant inference about the quantile $U(F, n/h)$ and the tail conditional expectation $T_n$ is that in the resulting small-sample problem involving $\mathbf{X}$, the object of interest is $\tau(\xi, h)$, rather than $q(\xi, h)$. The two approaches described in Section 2.1 to the construction of asymptotically valid confidence sets thus readily carry over to tail conditional expectations.

2.3. Implementation and Asymptotic Properties

The suggested confidence sets require a choice for the parameter space $\Xi$ of the tail index $\xi$, and, for $S_\Lambda$, a weight function $W$. We choose $\Xi = [-1/2, 1/2]$, although the methods could equally well be implemented for other $\Xi$ (for $S_\Lambda$, all values in $\Xi$ have to be smaller than one for the expected length to exist, though). Since $\xi < 1/2$ is necessary for $F$ to have a finite second moment, $\Xi = [-1/2, 1/2]$ should be fairly agnostic for most applications.

For the weighting function $W$, we choose a density on $\Xi$ that is inversely proportional to the minimal expected length of an equivariant confidence set under $\xi \in \Xi$ known. The idea is to use the known tail index case as a benchmark for the relative difficulty of obtaining informative inference. The inverse weighting then puts equal weight on all $\xi \in \Xi$ relative to this benchmark, so loosely speaking, we seek to minimize the average regret of not knowing $\xi$. This weighting scheme has the additional advantage that the (essentially arbitrary) scale normalization for different $\xi$ of the extreme value distributions in (2) plays no role in the determination of $S_\Lambda$.

* A theorem in Müller and Norets (2016) provides a corresponding formal result.

† The convergence formally follows from the dominated convergence theorem for $\xi < 0$ (since $F$ then has bounded support), and by Karamata's Theorem for $\xi > 0$, as in Zhu and Li.

Table . Asymptotic lengths of fixed-k confidence intervals.

h .  ξ −0.5 −0.25 0.00.25 0.5 −0.5 −0.25 0.00.25 0.5

Quantile, k = 10 Fixed-k LR . . . . . . . . . . Fixed-k opt . . . . . . . . . . Fixed-k env . . . . . . . . . . Quantile, k = 20 Fixed-k LR . . . . . . . . . . Fixed-k opt . . . . . . . . . . Fixed-k env . . . . . . . . . . Quantile, k = 50 Fixed-k LR . . . . . . . . . . Fixed-k opt . . . . . . . . . . Fixed-k env . . . . . . . . . . Tail conditional expectation, k = 10 Fixed-k LR . . . . . . . . . . Fixed-k opt . . . . . . . . . . Fixed-k env . . . . . . . . . . Tail conditional expectation, k = 20 Fixed-k LR . . . . . . . . . . Fixed-k opt . . . . . . . . . . Fixed-k env . . . . . . . . . . Tail conditional expectation, k = 50 Fixed-k LR . . . . . . . . . . Fixed-k opt . . . . . . . . . . Fixed-k env . . . . . . . . . .

NOTES: Entries are asymptotic expected lengths of % confidence intervals about the 1 − h/n quantile and tail conditional expectation relative to the expected length of the k = 10 fixed-k LR interval. See the main text for a description of the three types of confidence intervals.

For any given $k$, $h$ and confidence level $\alpha$, the critical value $\mathrm{cv}_{\mathrm{LR}}$ and measure $\Lambda$ need to be determined only once. Given $\mathrm{cv}_{\mathrm{LR}}$ and $\Lambda$, the confidence sets $S_{\mathrm{LR}}$ and $S_\Lambda$ are readily computed from (5) and (8). We provide corresponding Matlab code, and tables of $\mathrm{cv}_{\mathrm{LR}}$ and $\Lambda$ distributions for $k \in \{5, 10, 15, 20, 30, 40, 50, 75, 100\}$, $\log h \in \{-5.0, -4.5, \ldots, 3.0\}$ and $\alpha \in \{0.01, 0.05, 0.10, 0.20\}$. Over this range of parameters, the weighted expected length (6) of $S_{\mathrm{LR}}$ is up to 60% longer than $S_\Lambda$, with the largest differences occurring for small $h$ and moderately large $k$. But for many other parameter configurations, the gains are more moderate: the difference in weighted expected length is 12%. In this sense, $S_{\mathrm{LR}}$ is fairly close to weighted expected length optimal in many scenarios.

Table 1 reports the asymptotic expected lengths of 95% confidence intervals $S_{\mathrm{LR}}$ ("fixed-k LR") and the weighted expected length optimal interval $S_\Lambda$ ("fixed-k opt") for $\xi \in \{-0.5, -0.25, 0, 0.25, 0.5\}$, $k \in \{10, 20, 50\}$ and $h \in \{0.1, 5\}$, relative to the expected lengths of $S_{\mathrm{LR}}$ for $k = 10$ and the same value of $\xi$. As an additional comparison, we also report for each value of $\xi$ the expected length of the length optimal confidence interval with degenerate weight function $W$ with all mass at that value of $\xi$. These "fixed-k env" rows do not correspond to a feasible interval, but provide the lower envelope on the expected length of any 95% fixed-k confidence interval for each value of $\xi$.

As can be seen from Table 1, the $S_{\mathrm{LR}}$ intervals tend to be shorter than $S_\Lambda$ for $\xi = 0.5$, but they have relatively less attractive properties for $\xi \le 0$. The $S_\Lambda$ intervals are up to 40% longer than the envelope, although for $\xi \le 0.25$ the differences are much less pronounced. This demonstrates that a uniformly smallest expected length interval does not exist and correspondingly, the choice of $W$ matters. At the same time, for $\xi \le 0.25$, the expected length of $S_\Lambda$ comes reasonably close to the envelope, so no other choice of $W$ could lead to much shorter intervals there. All these differences are more pronounced for $h = 0.1$ than for $h = 5$. Looking across different values of $k$, expected lengths tend to decrease at a rate slower than $k^{-1/2}$, with a pronounced exception for $h = 0.1$ and negative $\xi$, where larger $k$ enable dramatically more informative inference.

3. Monte Carlo Results

This section reports some small-sample results for the resulting confidence sets for $\alpha = 0.05$ and $n = 250$. We consider six data-generating processes: a standard normal and a standard log-normal distribution (both in the domain of attraction of the Gumbel law, $\xi = 0$), a Student-t distribution with 3 degrees of freedom (in the domain of attraction of the Fréchet law with $\xi = 1/3$), an F-distribution with 4 degrees of freedom in both the numerator and denominator (in the domain of attraction of the Fréchet law with $\xi = 0.5$), a 0.8 / 0.2 mixture between a standard normal distribution and a Student-t distribution with 3 degrees of freedom (in the same domain of attraction as the pure Student-t distribution, since the tail is eventually dominated by the Student-t distribution), and a symmetric triangular distribution with support equal to $[-1, 1]$ (in the domain of attraction of the Weibull law with $\xi = -0.5$).

We compare the two intervals derived above ("fixed-k LR" and "fixed-k opt") with several alternatives developed in the literature under $k_n \to \infty$ asymptotics. The first method

("W-H") is the classic Weissman (1978) estimator that relies on a Pareto tail approximation and estimates the tail index by the Hill (1975) estimator. Denoting the Hill estimator by $\hat{\xi}^H$, the $1 - h/n$ quantile estimator is $\hat{U}_{WH} = Y_{n:n-k} (h/k)^{-\hat{\xi}^H}$, and the corresponding tail conditional expectation estimator is $\hat{T}_{WH} = (Y_{n:n-k} (h/k)^{-\hat{\xi}^H})/(1 - \hat{\xi}^H)$. Confidence intervals are obtained by exploiting the asymptotic normal limit of these estimators (after suitable scaling). Note that the method is asymptotically valid only for $\xi > 0$. For further discussion and small-sample simulation results, see Drees (2003). The second method ("dH-F") is described in Chapter 4 of de Haan and Ferreira's (2007) textbook and is based on the asymptotically normal estimator $\hat{U}_{dHF} = Y_{n:n-k} + \hat{a}(n/k)((h/k)^{-\hat{\xi}^M} - 1)/\hat{\xi}^M$, where $\hat{a}(n/k)$ and $\hat{\xi}^M$ are estimators of the scale and the tail index, respectively (similar to what was derived by Dekkers and de Haan (1989) and de Haan and Rootzén (1993)). The analogous estimator for the tail conditional expectation is $\hat{T}_{dHF} = Y_{n:n-k} + \hat{a}(n/k)((h/k)^{-\hat{\xi}^M} - 1 + \hat{\xi}^M)/(\hat{\xi}^M (1 - \hat{\xi}^M))$. See de Haan and Ferreira (2007) for further details on the construction of the corresponding confidence intervals. None of these estimators are location and scale equivariant, since $\hat{\xi}^H$ and $\hat{\xi}^M$ are not translation invariant. Invariant estimators were discussed by Santos, Alves, and Gomes (2006), but we found that the suggested confidence intervals based on these estimators do not perform well in our small sample. Finally, we consider intervals constructed from the profile likelihood based on a Poisson and generalized Pareto approximation ("profile"), as suggested and implemented by Davison and Smith (1990). This method assumes that the number of exceedances above a high threshold $u$ follows a Poisson distribution with mean $\lambda$, and conditionally on the number of exceedances, the excess values are iid generalized Pareto. In our implementation, we choose $u = Y_{n:n-k}$, which renders the corresponding intervals scale and translation equivariant. For all three $k_n \to \infty$ methods, we impose the same parameter space restriction $\Xi = [-1/2, 1/2]$ on the tail index that we chose in the implementation of the fixed-k method.‡

‡ Not doing so leads to intervals that are very much longer on average, and with typically no better (and often worse) coverage properties.

Tables 2 and 3 report the coverage and length properties of these five procedures for $h \in \{0.1, 5\}$. We find that the fixed-k approaches have excellent small-sample coverage, especially for $k \le 20$. In contrast, the W-H and dH-F intervals display very substantial undercoverage in many of the considered cases. In comparison, the profile likelihood intervals fare much better, corroborating corresponding remarks about the superior small-sample performance in Coles (2001) and Smith (2004). Still, the fixed-k approaches provide much more reliable inference, especially for $k \le 20$.

The small-sample performance of the fixed-k intervals mirrors their asymptotic properties of Table 1: size control is very similar, and the LR intervals are shorter for underlying distributions with heavy tails, but at the expense of worse performance under thinner tails. In our view, the relative simplicity and transparency of the LR intervals give them the edge for applied work, unless there is good reason to believe that the data might exhibit thin tails.

Larger values of $k$ mostly leave coverage of the fixed-k intervals close to the nominal level, and average length decreases. This pattern reflects that most distributions considered in Tables 2 and 3 have nicely behaved tails, so that a relatively large fraction can be well approximated by a generalized Pareto distribution. But as demonstrated by the mixture distribution, it is easy to construct underlying distributions whose tail behavior is much less benign: most draws stem from the normal component, misleadingly indicating a thin tail, but the tail properties of interest are in fact determined by the Student-t component, especially for small $h$. In that case, choosing $k > 10$ leads to substantial undercoverage also for the fixed-k approaches.

4. Application to Hurricane Damage

Hurricanes cause significant damage to coastal communities in the U.S. It seems interesting to learn about tail features of the distribution of these damages, both from a public policy and insurance perspective. In its technical memorandum NWS NHC-6, the National Weather Service provides estimates of the damage of the 30 costliest mainland U.S. tropical cyclones from 1900–2010, measured in 2010 US$. These estimates, however, are based on variable methodology, which is standardized only from the 1995 hurricane season onwards. We thus only rely on the data about the $k = 10$ costliest hurricanes in the 16 year window from 1995 to 2010. Panel A in Table 4 replicates these data points for convenience.

Note that in this example, the number $n$ of total tropical cyclones in the period 1995–2010 is not known to us. Under the assumption that hurricane damage is iid and hurricane arrival is stationary, the $1 - h/n$ quantile nevertheless has a straightforward interpretation: a hurricane causing at least that amount of damage is expected every $16/h$ years. Similarly, the tail conditional expectation corresponds to the average damage of these extreme hurricanes.

Panel B in Table 4 provides 95% confidence intervals for these quantities for $h \in \{0.1, 1, 5\}$ using the two fixed-k approaches developed above. In all cases, the LR intervals are fully contained in the weighted expected length minimizing ones. It is interesting to note that the upper bound of the LR interval for $h = 1$ is $116.3 billion, only slightly larger than the costliest hurricane that was observed in the sample (Katrina in 2005 with $105.8 billion). As can be seen in the first panel, Katrina is an outlier relative to the other costliest hurricanes in 1995–2010, making it relatively implausible that the $1 - 1/n$ quantile is much bigger than the largest observation. The corresponding upper bound on the 95% LR confidence interval for the tail conditional expectation is $266.4 billion. This is very large in absolute terms, corresponding to roughly 1.7% of current U.S. gross domestic product, and indicates substantial insurance needs even from a macroeconomic perspective.

5. Robustness of Fixed-k Asymptotic Inference

For a given distribution $F$ in the domain of attraction of an extreme value distribution, there are eventually infinitely many observations from the generalized Pareto tail as $n \to \infty$. For this reason, we expect the fixed-k confidence intervals to be most useful for inference in small to moderately large samples, where a large $k$ would amount to modeling a large fraction of $F$ as being generalized Pareto. In this sense, the fixed-k approach can be

Table . Small-sample performance for 1 − h/n quantile.

      k Cov Lgth Cov Lgth Cov Lgth Cov Lgth Cov Lgth Cov Lgth

h = 0.1 Normal Student-t() Fixed-k LR . . . . . . . . . . . . Fixed-k opt . . . . . . . . . . . . W-H . . . . . . . . . .  . dH-F . . . . . . . . . . . . Profile . . . . . . . . . . . . Log-Normal F(,) Fixed-k LR . . . . . . .  .  .  Fixed-k opt .  . . . . .  .  .  W-H .  .  . . .  .  .  dH-F . . . . . . .  .  .  Profile . . . . . . .  .  . . Mixture N (0, 1)/Student-t() Triangular Fixed-k LR . . . . . . . . . . . . Fixed-k opt . . . . . . . . . . . . W-H . . . . . . . . . . . . dH-F . . . . . . . . . . . . Profile . . . . . .  .  .  . h = 5 Normal Student-t() Fixed-k LR . . . . . . . . . . . . Fixed-k opt . . . . . . . . . . . . W-H . . . . . . . . . . . . dH-F . . . . . . . . . . . . Profile . . . . . . . . . . . . Log-Normal F(,) Fixed-k LR . . . . . . . . . . . . Fixed-k opt . . . . . . . . . . . . W-H . . . . . . . . . . . . dH-F . . . . . . . . . . . . Profile . . . . . . . . . . . . Mixture N (0, 1)/Student-t() Triangular Fixed-k LR . . . . . . . . . . . . Fixed-k opt . . . . . . . . . . . . W-H 
. . . . . . . . . . . . dH-F . . . . . . . . . . . . Profile . . . . . . . . . . . .

NOTES: Entries are coverage probabilities (in percent) and expected lengths of nominal % confidence intervals in a sample of size n = 250 about the 1 − h/n quantile of the underlying distribution, based on the largest k order statistics. See the main text for a description of the five types of confidence intervals. Based on  Monte Carlo simulations. thoughtofasasmall-sampleadjustmenttokn →∞asymptotic an and bn approaches. n [F (a x + b )] → Gξ (x). (9) At the same time, for any sample size n, there exist F that are 0 n n in the domain of attraction of the extreme value distribution, yet Suppose the sequence of distribution functions Fn is choosing k large yields poor inference (think of the mixture dis- such that for some rn →∞, Fn(y) = F0(y) for all y ≥ tribution case in the simulation section). This suggests studying U (F0, n/rn ).LetYn:1,...,Yn:n be the order statistics from the robustness of tail inference under triangular array asymp- an iid sample of size n from Fn.Thenforanyfixedk ∈ N, totics, where the iid sample of size n is drawn from a population − − = Yn:n bn Yn:n−k+1 bn indexed by n,sothatF Fn. The following theorem contrasts ,..., ⇒ (X1,...,Xk) fixed-k asymptotic inference with potentially more informative an an k →∞inference under such triangular array asymptotics. Its (10) n ,..., proof is in the Appendix. where X1 Xk have joint extreme value distribution with tail index ξ. (b) Write Paξ for the c.d.f. of a Pareto distribution with sup- Theorem 5.1. port [1, ∞) and shape parameter α = 1/ξ > 0. Suppose ˆ (a) Let F0 be in the domain of attraction of an extreme value ξn is a scale invariant estimator of ξ that, when applied to ξ ξ distribution with tail index , that is for some sequences an iid sample from Paξ0 ,convergesinprobabilityto 0 for 1340 U. K. MÜLLER AND Y. WANG

Table . Small-sample performance for 1 − h/n tail conditional expectation.

      k Cov Lgth Cov Lgth Cov Lgth Cov Lgth Cov Lgth Cov Lgth

h = 0.1

Normal Student-t()
Fixed-k LR . . . . . . . . . . . .
Fixed-k opt . . . . . . .  . . . .
W-H . . . . . . . . . . . .
dH-F . . . . . . . . . . . .
Profile . . . . . . . . . . . .

Log-Normal F(,)
Fixed-k LR .  .  .  .  .  .
Fixed-k opt .  .  .  .  .  .
W-H .  .  .  .  .  .
dH-F .  .  .  .  .  .
Profile .  .  .  .  .  .

Mixture N(0, 1)/Student-t() Triangular
Fixed-k LR . . . . . . . . . . . .
Fixed-k opt . . . . . . . . . . . .
W-H . . . . . . . . . . . .
dH-F . . . . . . . . . . . .
Profile . . . . . .  .  .  .

h = 5

Normal Student-t()
Fixed-k LR . . . . . . . . . . . .
Fixed-k opt . . . . . . . . . . . .
W-H . . . . . . . . . . . .
dH-F . . . . . . . . . . . .
Profile . . . . . . . . . . . .

Log-Normal F(,)
Fixed-k LR . . . . . . . . . . . .
Fixed-k opt . . . . . . . . . . . .
W-H . . . . . . . . . . . .
dH-F . . . . . . . . . . . .
Profile . . . . . . . . . . . .

Mixture N(0, 1)/Student-t() Triangular
Fixed-k LR . . . . . . . . . . . .
Fixed-k opt . . . . . . . . . . . .
W-H . . . . . . . . . . . .
dH-F . . . . . . . . . . . .
Profile . . . . . .  . . . . .

NOTES: Entries are coverage probabilities (in percent) and expected lengths of nominal % confidence intervals in a sample of size n = 250 about the tail conditional expectation E[Yi|Yi > U(F, n/h)] of the underlying distribution F, based on the largest k order statistics. See the main text for a description of the five types of confidence intervals. Based on  Monte Carlo simulations.
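To make the two targets of these tables concrete, the following minimal sketch evaluates the 1 − h/n quantile U(F, n/h) and the tail conditional expectation E[Yi|Yi > U(F, n/h)] in closed form for a Pareto distribution with c.d.f. 1 − y^(−1/ξ) on [1, ∞). The Pareto choice, the function names, and the value ξ = 0.5 are illustrative assumptions; none of the distributions in the table is Pareto, and the closed form for the tail conditional expectation requires ξ < 1.

```python
def pareto_quantile(n, h, xi):
    """1 - h/n quantile of the Pareto c.d.f. 1 - y**(-1/xi), y >= 1.

    Solving 1 - y**(-1/xi) = 1 - h/n gives y = (n/h)**xi.
    """
    if not 0 < h <= n:
        raise ValueError("need 0 < h <= n")
    if xi <= 0:
        raise ValueError("need xi > 0")
    return (n / h) ** xi


def pareto_tce(n, h, xi):
    """Tail conditional expectation E[Y | Y > u] with u the 1 - h/n quantile.

    For the Pareto tail, Y/u given Y > u is again Pareto, so the
    conditional mean is u / (1 - xi); it is finite only for xi < 1.
    """
    if xi >= 1:
        raise ValueError("tail conditional expectation is infinite for xi >= 1")
    return pareto_quantile(n, h, xi) / (1.0 - xi)


if __name__ == "__main__":
    n, xi = 250, 0.5  # n matches the tables; xi is an arbitrary illustration
    for h in (0.1, 5):
        print(h, pareto_quantile(n, h, xi), pareto_tce(n, h, xi))
```

For h < 1 the quantile lies beyond the level that the sample maximum typically reaches, which is why inference for h = 0.1 is a genuine extrapolation.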

Table . Empirical results on damage of U.S. mainland hurricanes.

Panel A: Damage in US$ billion of  costliest hurricanes in –

. . . . . . . . . .

Panel B: Endpoints of % confidence intervals for the 1 − h/n quantile and tail conditional expectation

h = 0.1 h = 1 h = 5

Fixed-k LR Quantile . . . . . .
Fixed-k LR TCE . . . . . .
Fixed-k opt Quantile . . . . . .
Fixed-k opt TCE . . . . . .

NOTE: Based on the k = 10 order statistics from Panel A.
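Since the choice of k (here k = 10) amounts to an assumption about how far the generalized Pareto tail extends, it can help to put numbers on the tail-mass heuristic r_k = k + 3 + 3√k from the discussion of the choice of k. The sketch below computes r_k and the probability that at least k of n iid observations exceed U(F, n/r_k), using the binomial distribution of the exceedance count and its Poisson approximation. The function names and the sample size n = 250 in the usage lines are illustrative assumptions; the snippet only evaluates the probability for given n and k and does not verify a bound over all k and n.

```python
import math


def tail_mass(k):
    """Heuristic tail mass r_k = k + 3 + 3*sqrt(k) (in units of 1/n)."""
    return k + 3 + 3 * math.sqrt(k)


def prob_at_least_k_exceedances(n, k, r):
    """P(K >= k) for K ~ Binomial(n, r/n), the number of exceedances
    above the 1 - r/n quantile in an iid sample of size n."""
    if r > n:
        raise ValueError("need n >= r so that r/n is a probability")
    p = r / n
    below = sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))
    return 1.0 - below


def poisson_prob_at_least_k(k, r):
    """P(K >= k) for K ~ Poisson(r), the large-n approximation."""
    below = sum(math.exp(-r) * r**j / math.factorial(j) for j in range(k))
    return 1.0 - below


if __name__ == "__main__":
    for k in (5, 10):
        r = tail_mass(k)
        print(k, round(r, 3),
              prob_at_least_k_exceedances(250, k, r),
              poisson_prob_at_least_k(k, r))
```

Reading the output for several values of k shows how much tail mass must be approximately generalized Pareto for the largest k order statistics to come from that tail with high probability.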

some $\xi_0 > 0$. Then for any $\xi_1 > 0$ there exists a sequence of distributions $F_n$ and a real sequence $r_n \to \infty$ such that $F_n(y) = \mathrm{Pa}_{\xi_1}(y)$ for all $y \ge U(\mathrm{Pa}_{\xi_1}, n/r_n)$, yet $\hat\xi_n$ applied to an iid sample of size $n$ from $F_n$ converges in probability to $\xi_0$.
(c) For some fixed $h > 0$, let $[\hat l_n, \hat u_n]$ be a scale equivariant confidence interval for $U(F, n/h)$. Suppose $\hat l_n / U(\mathrm{Pa}_{\xi_0}, n/h)$ and $\hat u_n / U(\mathrm{Pa}_{\xi_0}, n/h)$ converge in probability to unity when applied to an iid sample of size $n$ from $\mathrm{Pa}_{\xi_0}$. Then for any $\xi_1 \ne \xi_0$, $\xi_1 > 0$ there exists a sequence of distributions $F_n$ and a real sequence $r_n \to \infty$ such that $F_n(y) = \mathrm{Pa}_{\xi_1}(y)$ for all $y \ge U(\mathrm{Pa}_{\xi_1}, n/r_n)$, yet $[\hat l_n, \hat u_n]$ applied to an iid sample of size $n$ from $F_n$ contains $U(F_n, n/h) = U(\mathrm{Pa}_{\xi_1}, n/h)$ with probability converging to zero.

Part (a) shows that it suffices for the sequence of distributions $F_n$ to have a tail in the domain of attraction of extreme value distribution of mass equal to $r_n/n$ to induce the corresponding joint extreme value distribution for the largest $k$ order statistics, for any fixed $k$ and arbitrarily slowly increasing sequence $r_n \to \infty$. Thus, to the extent that the object of interest concerns properties of this (extreme) tail, fixed-$k$ asymptotic inference is asymptotically valid.

In contrast, part (b) demonstrates that any approach to scale invariant consistent tail index estimation is necessarily nonrobust to some sequence $F_n$ of this sort. All popular tail index estimators are scale invariant and consistent under iid data from a Pareto distribution. Thus, Theorem 1(b) applies and shows that for a large enough sample size, one can always construct an underlying distribution for which the extreme tail properties are Pareto with parameter $\xi_1$, yet the tail index estimator takes on values close to an entirely different value $\xi_0 \ne \xi_1$ with high probability. This typically implies very poor coverage properties of confidence intervals based on consistent tail index estimators.

The fixed-$k$ intervals about $U(F, n/h)$ derived here have length $O_p(U(F, n/h))$ for data from a Pareto distribution. In contrast, intervals derived under $k_n \to \infty$ asymptotics usually are more informative and have length $o_p(U(F, n/h))$, so that their endpoints satisfy the assumption of part (c). The theorem shows that any such scale equivariant interval has zero asymptotic coverage for some underlying sequence of distributions $F_n$ for which fixed-$k$ inference is asymptotically valid, whether or not its construction involves a consistent tail index estimator $\hat\xi_n$. Taken together, Theorem 1 thus provides a precise sense in which fixed-$k$ asymptotic inference is more robust than all previously suggested inference approaches we are aware of.

A key challenge for the empirical implementation of fixed-$k$ based inference is the selection of $k$. Of course, also under $k_n \to \infty$, the determination of $k_n$ in a given sample of size $n$ is widely recognized as a difficult issue. But the problem is arguably even harder under fixed-$k$ asymptotics, as there cannot exist a procedure based on the largest $k$ order statistics that consistently determines whether, say, $k_1 \le k$ or $k_2 < k_1$ is appropriate.

One useful way to think about the choice of $k$ is to recall that the number $K_r$ of exceedances above $U(F, n/r)$ in an iid sample from $F$ has a binomial distribution with parameters $n$ and $p = r/n$, so it has mean $r$ and a variance no larger than $r$. In fact, a calculation shows that if the upper tail of $F$ of mass $r_k/n$ with $r_k = k + 3 + 3\sqrt{k}$ is equal to a generalized Pareto distribution $F_0$, then with probability of at least 99.88%, the largest $k$ order statistics stem from $F_0$, for all $k \ge 5$ and $n \ge r_k$. The only remaining approximation of fixed-$k$ asymptotic inference in small samples then involves the approximation of the distribution of $K_{r_k}$ by a Poisson distribution with mean $r_k$. Unreported small-sample simulations with $n$ as small as 25 show excellent coverage properties of the fixed-$k$ intervals derived here for all $F$ with upper tail of mass $r_k/n$ equal to a generalized Pareto distribution, even if below the $U(F, n/r_k)$ quantile, the shape of $F$ differs arbitrarily from the generalized Pareto tail distribution. In practice, the choice of $k$ is thus directly interpretable as an assumption about the extent of the (approximate) generalized Pareto tail, and it makes sense to present consumers of tail inference with results under various choices for this key regularity assumption. We leave further consideration of this important issue to future research.

A. Appendix

A.1. Computational Details

The set $S$ in (8) requires evaluation of $\kappa_\xi(\cdot) f_{\mathbf{X}^s|\xi}(\cdot)$ and $f_{Y^s(\xi),\mathbf{X}^s}(\cdot)$. Using the expression for $f_{\mathbf{X}|\xi}$ below (2), straightforward calculations yield

$$\kappa_\xi(\mathbf{X}^s) f_{\mathbf{X}^s|\xi}(\mathbf{X}^s) = \Gamma(k - \xi) \int_0^{b_0(\xi)} s^{k-1} \exp\left[-(1 + \xi^{-1}) \sum_{i=1}^k \log(1 + \xi X_i^s s)\right] ds,$$

where $\Gamma$ is the Gamma function, $b_0(\xi) = -1/\xi$ for $\xi < 0$, and $b_0(\xi) = \infty$ otherwise, and

$$f_{Y^s(\xi),\mathbf{X}^s}(y, \mathbf{X}^s) = \int_{a_1(\xi)}^{b_1(\xi)} y^{-k}\, |q(\xi, h) - s|^{k-1} \exp\left[-(1 + \xi s)^{-1/\xi} - (1 + \xi^{-1}) \sum_{i=1}^k \log\left(1 + \xi s + X_i^s\, \xi\, \frac{q(\xi, h) - s}{y}\right)\right] ds,$$

where $a_1(\xi)$ and $b_1(\xi)$ are such that for all $s \in (a_1(\xi), b_1(\xi))$, $1 + \xi s > 0$, $1 + \xi s + \xi\,(q(\xi, h) - s)/y > 0$ and $(q(\xi, h) - s)/y > 0$. We evaluate these integrals by numerical quadrature.

The determination of $\Lambda$ closely follows the approach developed in Elliott, Müller, and Watson (2015) and Müller and Watson (2015). As discussed there, consider $\Lambda = c\tilde\Lambda$, where $\tilde\Lambda$ is a given probability distribution with support on $\Xi$. Suppose the scalar $c > 0$ is such that if $\xi$ is random and drawn from $\tilde\Lambda$, then $S_{c\tilde\Lambda}$ has coverage equal to the nominal level, that is $c$ solves $\int P_\xi(Y^s(\xi) \in S_{c\tilde\Lambda}(\mathbf{X}^s))\, d\tilde\Lambda(\xi) = 1 - \alpha$ (so $c$ is a function of $\tilde\Lambda$). Then, the $W$-weighted average expected length of $S_{c\tilde\Lambda}$, $V_{\tilde\Lambda} = \int E_\xi[\kappa_\xi(\mathbf{X}^s)\, \mathrm{lgth}(S_{c\tilde\Lambda}(\mathbf{X}^s))]\, dW(\xi)$, provides a lower bound for the

$W$-weighted average expected length of any set $S$ with uniform coverage $P_\xi(Y^s(\xi) \in S(\mathbf{X}^s)) \ge 1 - \alpha$ for all $\xi \in \Xi$, since uniform coverage implies (at least) $\tilde\Lambda$-weighted average coverage for any probability distribution $\tilde\Lambda$ with support on $\Xi$, and by construction of $S$, $S_{c\tilde\Lambda}$ minimizes $W$-weighted average expected length among all sets with $\tilde\Lambda$-weighted coverage of at least $1 - \alpha$.

So suppose we knew of some probability distribution $\tilde\Lambda^*$ with support on $\Xi$ and constant $d^* > 0$ such that (i) $P_\xi(Y^s(\xi) \in S_{d^*\tilde\Lambda^*}(\mathbf{X}^s)) \ge 1 - \alpha$ for all $\xi \in \Xi$ and (ii) $\int E_\xi[\kappa_\xi(\mathbf{X}^s)\, \mathrm{lgth}(S_{d^*\tilde\Lambda^*}(\mathbf{X}^s))]\, dW(\xi) \le (1 + \varepsilon) V_{\tilde\Lambda^*}$. Then we would have identified a level $1 - \alpha$ confidence set, $S_{d^*\tilde\Lambda^*}(\mathbf{X}^s)$, that is demonstrably no more than $100\varepsilon$% longer in a $W$-weighted average expected sense than any other confidence set of the same level.

The remaining challenge is the determination of a suitable distribution $\tilde\Lambda^*$. To this end, we restrict $\tilde\Lambda^*$ to be a discrete distribution with support on $\Xi_a = \{-1/2, -1/2 + 1/59, \ldots, 1/2\}$, and determine the 60 point masses by fixed-point iterations based on importance sample Monte Carlo estimates of coverage, as suggested by Elliott, Müller, and Watson (2015) and Müller and Watson (2015). In particular, we simulate rejection probabilities with 100,000 iid draws from a proposal with $\xi$ drawn uniformly from $\Xi_p = \{-0.5, -0.5 + 1/29, \ldots, 0.5\}$ (which yields Monte Carlo standard errors of approximately 0.1% for 95% level confidence sets), and iteratively increase or decrease the 60 point masses on $\Xi_a$ as a function of whether the (estimated) inclusion probability under the corresponding $\xi \in \Xi_a$ is larger or smaller than the nominal level. After 500 iterations, the resulting discrete distribution is a candidate for $\tilde\Lambda^*$. We compute $V_{\tilde\Lambda^*}$, and then numerically determine $d^*$ so that $\int E_\xi[\kappa_\xi(\mathbf{X}^s)\, \mathrm{lgth}(S_{d^*\tilde\Lambda^*}(\mathbf{X}^s))]\, dW(\xi) = (1 + \varepsilon) V_{\tilde\Lambda^*}$ for $\varepsilon = 0.01$. In a last step, we check whether $S_{d^*\tilde\Lambda^*}$ indeed controls coverage uniformly by computing coverage probabilities over the fine grid $\Xi_f = \{-1/2, -1/2 + 1/199, \ldots, 1/2\}$.

The critical value $\mathrm{cv}_{LR}$ of $S_{LR}$ in (5) is computed from the same 100,000 importance sampling draws. In all cases, $P_\xi(LR(q(\xi, h), \mathbf{X}) > \mathrm{cv}_{LR}) = \alpha$ for $\xi = 1/2$.

For a given value of $k$ and $h$, these computations take about one minute on a modern PC. Of course, in actual applications, there is no need to recompute $\mathrm{cv}_{LR}$, $d^*$, and $\tilde\Lambda^*$; the determination of the confidence sets simply requires the numerical determination of (5) and (8) for the values of $\mathrm{cv}_{LR}$ and $\Lambda = d^*\tilde\Lambda^*$ we already computed. For the latter, note that with $M_{\tilde X} = \max_{\{(\mu,\sigma,\xi):\, \sigma > 0,\, \xi \in \Xi\}} L_{\tilde X}(\mu, \sigma, \xi)$, the upper endpoint of $S_{LR}$ solves the nonlinearly constrained optimization problem

$$\max_{\{(\mu,\sigma,\xi):\, \sigma > 0,\, \xi \in \Xi\}} \mu + \sigma q(\xi, h) \quad \text{s.t.} \quad L_{\tilde X}(\mu, \sigma, \xi) \ge M_{\tilde X} - \mathrm{cv}_{LR},$$

and the lower endpoint solves the corresponding minimization problem. We provide tables of $\mathrm{cv}_{LR}$ and $\Lambda$ and corresponding Matlab code on our website, www.princeton.edu/~umueller.

A.2. Proof of Theorem 1

(a) Let $U_i \sim \text{iid } U[0, 1]$ and $F_n^-(p) = \inf_y \{y \in \mathbb{R} : F_n(y) \ge p\}$. Generate the sample of size $n$ from $F_n$ via $Y_{n,i} = F_n^-(U_i)$, and denote the corresponding sample from $F_0$ by $\tilde Y_{n,i} = F_0^-(U_i)$, so that the order statistics are given by $Y_{n:i} = F_n^-(U_{n:i})$ and $\tilde Y_{n:i} = F_0^-(U_{n:i})$. As is well known (see Theorem 3.5 of Coles (2001), for instance), (9) implies that (10) holds for the sample $\tilde Y_{n,i}$, that is $a_n^{-1}(\tilde Y_{n:n} - b_n, \ldots, \tilde Y_{n:n-k+1} - b_n) \Rightarrow (X_1, \ldots, X_k)$. Let $K_n$ be the number of $U_1, \ldots, U_n$ that exceed $1 - r_n/n$. The event $Y_{n:i} = \tilde Y_{n:i}$ for $i = n - k + 1, \ldots, n$ is clearly implied by the event $K_n \ge k$. But $K_n$ is binomially distributed with parameters $r_n/n$ and $n$, so $r_n \to \infty$ implies $P(K_n \ge k) \to 1$, and the result follows.

(b) For $r > 0$, let $u_{r,n} = U(\mathrm{Pa}_{\xi_0}, n/r) = (n/r)^{\xi_0}$ and define the c.d.f. $H_{n,r}$ as

$$H_{n,r}(y) = \begin{cases} \mathrm{Pa}_{\xi_0}(y) & \text{for } y \le u_{r,n} \\ \mathrm{Pa}_{\xi_1}\big(U(\mathrm{Pa}_{\xi_1}, n/r)\, y / u_{r,n}\big) & \text{for } y > u_{r,n} \end{cases} = \begin{cases} 1 - y^{-1/\xi_0} & \text{for } y \le u_{r,n} \\ 1 - u_{r,n}^{1/\xi_1 - 1/\xi_0}\, y^{-1/\xi_1} & \text{for } y > u_{r,n}. \end{cases}$$

We first show that for fixed $r$, the experiment of observing an iid sample of size $n$ from $H_{n,r}$, $\{Y_{n,i}\}_{i=1}^n$, is contiguous to the experiment of observing an iid sample of size $n$ from $\mathrm{Pa}_{\xi_0}$. Write $F_n^n$ for the product measure of an iid sample of size $n$ from the distribution $F_n$. Note that the log likelihood ratio $\log(dH_{n,r}^n / d\mathrm{Pa}_{\xi_0}^n)$ is given by

$$\log(dH_{n,r}^n / d\mathrm{Pa}_{\xi_0}^n) = \sum_{i=1}^n \mathbf{1}[Y_{n,i} \ge u_{r,n}] \big((1/\xi_0 - 1/\xi_1) \log(Y_{n,i}/u_{r,n}) + \log(\xi_0/\xi_1)\big).$$

Let $K_n$ be the number of exceedances of $u_{r,n}$, $K_n = \sum_{i=1}^n \mathbf{1}[Y_{n,i} \ge u_{r,n}]$. Furthermore, let $W_{n,i}$, $i = 1, \ldots, K_n$ be a random permutation of $\{Y_{n:n-i+1} - u_{r,n}\}_{i=1}^{K_n}$. Then under $\mathrm{Pa}_{\xi_0}^n$, and conditional on $K_n$, $W_{n,i}$ is iid with c.d.f. $F_{W,n}(w) = (H_{n,r}(u_{r,n} + w) - H_{n,r}(u_{r,n}))/(1 - H_{n,r}(u_{r,n})) = 1 - (1 + w/u_{r,n})^{-1/\xi_0}$ for $w \ge 0$ as in Smith (1987), so that $Z_{n,i} = W_{n,i}/u_{r,n} + 1$ is iid with c.d.f. $1 - z^{-1/\xi_0}$ for $z \ge 1$. Furthermore, under $\mathrm{Pa}_{\xi_0}^n$, by Theorem 2.1.1 in Leadbetter (1983), $K_n \Rightarrow K$, where $K$ is Poisson with parameter $r$. Thus, under $\mathrm{Pa}_{\xi_0}^n$,

$$\log(dH_{n,r}^n / d\mathrm{Pa}_{\xi_0}^n) \Rightarrow \log(L) = \sum_{i=1}^K \big((1/\xi_0 - 1/\xi_1) \log Z_i + \log(\xi_0/\xi_1)\big),$$

where $Z_i$ is iid with c.d.f. $1 - z^{-1/\xi_0}$ independent of $K$ (and $\log(L) = 0$ if $K = 0$). Also

$$\begin{aligned} E[L] &= E\left[\exp\left(\sum_{i=1}^K \big((1/\xi_0 - 1/\xi_1) \log Z_i + \log(\xi_0/\xi_1)\big)\right)\right] \\ &= \sum_{s=0}^\infty P(K = s) \prod_{i=1}^s E\left[\frac{\xi_0}{\xi_1} Z_i^{1/\xi_0 - 1/\xi_1}\right] \\ &= \sum_{s=0}^\infty P(K = s) = 1, \end{aligned}$$

where the third equality uses $E[Z_i^{1/\xi_0 - 1/\xi_1}] = \xi_1/\xi_0$ from a direct calculation. But the convergence $dH_{n,r}^n / d\mathrm{Pa}_{\xi_0}^n \Rightarrow L$

under $\mathrm{Pa}_{\xi_0}^n$ and $E[L] = 1$ imply contiguity via LeCam's first lemma (see Lemma 6.4 in van der Vaart (1998), for instance).

By definition of contiguity, $\hat\xi_n \to_p \xi_0$ under $\mathrm{Pa}_{\xi_0}^n$ implies $\hat\xi_n \to_p \xi_0$ also under $H_{n,r}^n$, for any fixed $r$, where we write '$\to_p$' for convergence in probability. Now for any $r > 0$, let $n_r$ be the smallest $n^* \ge 1$ such that $\sup_{n \ge n^*} P(|\hat\xi_n - \xi_0| > r^{-1}) < r^{-1}$ under $H_{n,r}^n$. Note that $\hat\xi_n \to_p \xi_0$ under $H_{n,r}^n$ for any fixed $r$ implies $n_r < \infty$. Construct the inverse function $r_n$ via $r_n = \sup\{r > 0 : n_r \le n\}$. Since $n_r < \infty$ for all $r$, $r_n \to \infty$. Thus, under $H_{n,r_n}^n$, $P(|\hat\xi_n - \xi_0| > r_n^{-1})$

Cattaneo, M. D., Crump, R. K., and Jansson, M. (2010), “Robust Data-Driven Inference for Density-Weighted Average Derivatives,” Journal of the American Statistical Association, 105, 1070–1083. [1334]
Cattaneo, M. D., Crump, R. K., and Jansson, M. (2014), “Small Bandwidth Asymptotics for Density-Weighted Average Derivatives,” Econometric Theory, 30, 176–200. [1334]
Coles, S. (2001), An Introduction to Statistical Modeling of Extreme Values, London: Springer. [1334,1335,1338]
Davison, A. C., and Smith, R. L. (1990), “Models for Exceedances over High Thresholds,” Journal of the Royal Statistical Society, Series B, 393–442. [1338]
de Haan, L., and Ferreira, A. (2007), Extreme Value Theory: An Introduction, New York: Springer Science and Business Media. [1334,1336,1338]
de Haan, L., and Rootzén, H. (1993), “On the Estimation of High Quantiles,”