Multivariate Tail Estimation with Application to Analysis of Covar

MULTIVARIATE TAIL ESTIMATION WITH APPLICATION TO ANALYSIS OF COVAR

TILO NGUYEN AND GENNADY SAMORODNITSKY

Abstract. The quality of estimation of multivariate tails depends sig- niﬁcantly on the portion of the sample included in the estimation. A simple approach involving sequential statistical testing is proposed in order to select which observations should be used for estimation of the tail and spectral measures. We prove that the estimator is consistent. We test the proposed method on simulated data, and subsequently apply it to analyze CoVar for stock and index returns.

1. Introduction It is one of the stylized facts about ﬁnancial returns that instruments related by industry, or by geography, tend to show highly dependent extreme movements. Quantifying this extreme dependence is, clearly, important, but statistical estimation is tricky.

1991 Mathematics Subject Classiﬁcation. Primary 62G32, 60G70. Key words and phrases. extremes, tail estimation, tail measure, spectral measure, Co- Var, tail region . This research was partially supported by the ARO grants W911NF-07-1-0078 and W911NF-12-10385, NSF grant DMS-1005903 and NSA grant H98230-11-1-0154 at Cornell University.

0.4 0.2

0.3 0.15

0.2 0.1

0.1 0.05

0 0 CAC 40

JP Morgan −0.1 −0.05

−0.2 −0.1

−0.3 −0.15

−0.4 −0.2 −0.4 −0.2 0 0.2 0.4 −0.2 −0.1 0 0.1 0.2 Bank of America FTSE 100

Figure 1. Daily log return 1 2 T. NGUYEN AND G. SAMORODNITSKY

Figure 1 shows a scatter plot of the daily log returns on the stock of Bank of America and JP Morgan over a period of 4362 days from January 1996 to June 2012, next to a scatter plot of daily long returns on the British FTSE 100 index and the French CAC 40 index, from January 1995 to April 2012. It is obvious to the “naked eye” that the two indices have higher dependence in the extreme movements than the two stocks do. The diﬃculty with making this precise via statistical estimation is two-fold. First of all, there are simply not that many extreme observations. Secondly, it is not easy to decide, in a given sample, what observations “belong to the tail region”, and should be used to estimate the dependence of the extremes. The main tools in describing such dependence are the tail and spectral measures of the returns. Theit statistical estimation has been studied in detail, see e.g. de Haan (1985), de Haan and Resnick (1993), Einmahl et al. (2001), Einmahl and Segers (2009). All known methods are subject to the diﬃculties described above. We propose a method that allows a user to systematically decide what part of the sample to use in tail estimation. The method is fast and easy to automate. The rest of the paper is organized as follows. In Section 2, we summarize the notions of regular variation, both univariate and multivariate. In Section 3, we describe how one can use the method of ranks to estimate the tail and spectral measures. Our method for deciding on the “tail part of the sample” is introduced in Section 4, where we also prove that our approach leads to a consistent estimator. In Section 5 we test our approach of estimating the spectral measure on simulated data, and compare it with the “rule of thumb” approach that has been commonly used. Finally, in Section 6 we analyze the recently introduced conditional risk measure, CoVar, for stock and index returns.

2. Regular variation: A summary

Regular variation provides us with a framework that enables us to model multivariate extreme events. We now proceed with a quick summary of the background theory. MULTIVARIATE TAIL ESTIMATION 3

Let F be a univariate distribution function. F is said to have a regularly varying right tail of index α > 0 if the tail function F¯ = 1 − F satisfies F¯(tx) (2.1) lim = t−α x→∞ F¯(x) for all t > 0. The index α measures the heaviness of the tail. The smaller is α, the heavier is the right tail of the distribution. An encyclopedic treatment of regular variation is given by Resnick (1987, 2007). The concept of regular variation can be extended to the multivariate case. Let Z be a d-dimensional random vector with nonnegative components and d F be the distribution of Z. Denote E = [0, ∞] \{0}. We say that F has a regularly varying tail if there exists a function b(t) → ∞ and a nontrivial (i.e. nonzero) Radon measure ν on E, vanishing on the set of infinite points, such that Z tP ∈ · →v ν, t → ∞ , b(t) vaguely in the space of nonnegative Radon measures on E; see Resnick (2007). The index α of the regular variation of the tail in this definition is hidden in the index of regular variation of the function b. It also appears in the scaling property the limit measure ν (the so-called tail measure) must possess: ν(c·) = c−αν(·) for every c > 0. Equivalently, Z has a regularly varying tail with index α > 0 if there exists d−1 d a probability measure S on S+ = S ∩ [0, ∞) such that for all x > 0 Z t ||Z|| > b(t)x, ∈ · ⇒ cx−αS(·), t → ∞ , P ||Z|| d−1 weakly on the unit sphere S . The probability measure S is called the spectral measure of the distribution. One can choose an arbitrary norm on d R ; this will determine the unit sphere and, thus, the spectral measure. The constant c depends on the choice of the normalizing function b. Clearly, if the same normalizing function b is used in both definitions, then the tail and spectral measures are related by Z (2.2) ν ||Z|| > x, ∈ · = cx−αS(·), x > 0 . ||Z|| If the tail measure (or the spectral measure) is concentrated on the axes, it is common to speak of asymptotic (or tail) independence between the 4 T. NGUYEN AND G. SAMORODNITSKY components of the vector Z, in the sense of very low likelihood that more than one component takes a large value at the same time. Otherwise, the components of the vector Z are said to be tail dependent. We should note that these definitions are really useful if all the marginal tails are comparable, which implies that they are regularly varying with the same index α. In practice this is often an unreasonable assumption. There exists a framework of multivariate regular variation that allows different marginal tail indices. Specifically, we assume that there exist d normalizing functions b(i)(t) → ∞, i = 1 . . . , d, such that both

" (i) ! # Z v (2.3) t P ∈ · → ναi , t → ∞ b(i)(t) vaguely on [0, ∞) for each i = 1, . . . , d and

" (1) (2) (d) ! # Z Z Z v (2.4) t P , ,..., ∈ · → ν, t → ∞ b(1)(t) b(2)(t) b(d)(t) vaguely on E, where ναi is a measure on the Borel sets [0, ∞) with the density −(1+α ) ciαix i , x > 0, αi, ci > 0, i = 1, . . . , d with respect to the Lebesgue measure. Moreover, ν is a nontrivial Radon measure on E, vanishing on the set of infinite points; it is also called the tail measure. The ith normalizing function bi is regularly varying with index 1/αi, i = 1, . . . , d. Since the indices of regular variation may be different in different directions, the normalizing functions may be different as well; this the nonstandard version of regular variation. It can be converted into the standard version of regular variation, with equivalent tails, as follows. Let Fi, i = 1, . . . , d be the marginal distribution functions. Define 1 ← (2.5) ui(x) = (x), i = 1, . . . , d . 1 − Fi Then " ! # u←Z(i) (2.6) t i , i = 1, . . . , d ∈ · →v ν (·) P t ∗ vaguely on E, while for each i = 1, . . . , d, " # u←Z(i) (2.7) t i > x → x−1, x > 0 . P t MULTIVARIATE TAIL ESTIMATION 5

Here ν∗ is a nontrivial Radon measure on E, vanishing on the set of inﬁnite points, with the scaling property

−1 (2.8) ν∗(c·) = c ν∗(·), c > 0 .

The measures ν and ν∗ are related by

1/α c c ν([0, x ] ) = ν∗([0, x] ) ,

1/α 1/αi where x is deﬁned as the vector with coordinates xi , i = 1, . . . , d. Therefore, the transformation (2.5) achieves both the standard global regular variation and the standard Pareto index 1 marginal distributions.

3. Estimation of the tail measure

Estimating the tail measure ν (or ν∗) is of crucial importance in many applications of stochastic models, and a number of estimators have been de- signed for that purpose. The most popular method is the ranks method that automatically standardizes the problem without requiring one to estimate first the marginal quantile functions u in (2.5). Suppose that conditions (2.3) and (2.4) for global and marginal regular variation hold. Given a d-dimensional sample of size n, for each j = 1, . . . , d, we define n j X ri = 1 (j) (j) , 1 ≤ i ≤ n , [Zm ≥Zi ] m=1 which is the sequence of ranks. If k = k(n) is an intermediate sequence, i.e. k if k → ∞ and n → 0 as n → ∞ then weak consistency of the empirical estimator n 1 X (3.1)   ⇒ ν k k ∗ i=1 ,j=1,...,d  (j)  ri in the vague topology on the space of nonnegative Radon measures on E k holds; see pp. 311-312 in Resnick (2007). Note that the condition n → 0 ensures that only “tail observations” affect the estimator. The second order behavior of this estimator is difficult; in the two-dimensional case, and under additional regularity assumptions, asymptotic normality was established in Einmahl et al. (2001). 6 T. NGUYEN AND G. SAMORODNITSKY

To estimate the spectral measure, we ﬁrst applying the polar transformation on the rank data, by  k  , j = 1, . . . , d  k r(j)  (R , θ ) =  , j = 1, . . . , d , i  , i = 1, . . . , n . i,k i,k  (j) k   ri , j = 1, . . . , d  (j) ri The continuity of the transformation and (2.2) and (3.1) imply n 1 X 1 ⇒ cS(·) k (Ri,k>1, θi,k∈(·)) i=1 weakly on the unit sphere, which gives us the following consistent estimator for S: Pn 1 i=1 (Ri,k>1, θi,k∈(·)) (3.2) S (·) := ⇒ S(·) . k,n Pn 1 i=1 (Ri,k>1, θi,k∈[0,π/2])

4. The choice of k

In practice, the estimator Sk,n in (3.2) shares with other “tail related” estimators sensitivity to the choice of k. A decision on what k to use can be viewed as a decision on what part of the sample corresponds to “tail observations”. For estimators of the index of univariate regular variation, k is the number of upper order statistics to use in the estimator, and several visual techniques have been suggested to pick out that number, ncluding the most popular choice, the Hill plot; see Section 4.4 in Resnick (2007). Other methods to optimize the asymptotic behaviour of the tail estimators have been suggested in e.g. Danielsson et al. (2001) and Drees and Kaufmann (1998). In our previous work (Nguyen and Samorodnitsky, 2012), we proposed a sequential testing method for deciding “where the tail begins” in which the signiﬁcance level increases with the sample size. Our approach is based on a simple idea. Let (Xj) be a nonnegative i.i.d. sequence with a distribution F . It is well known that, under the assumption (2.1) of regular variation, we have the following weak convergence n X Nn = δXi/an ⇒ N∗ i=1 MULTIVARIATE TAIL ESTIMATION 7 in the vague topology on the space of nonnegative Radon measures on (0, ∞]

(see e.g. Proposition 4.26 in Resnick (1987)). Here δx is the point mass at x, and (an) a positive sequence satisfying F¯(an) ∼ 1/n as n → ∞.

Further, N∗ is a Poisson random measure on (0, ∞] with mean measure −α µ∗(x, ∞] = x , x > 0. Accordingly, our procedure for deciding on the number k of the upper order statistics to use in the tail estimation consists of sequentially testing the samples log(Xn−i,n/Xn−k,n), i = 0, 1 . . . k − 1 , with increasing value of k, for the null hypothesis of exponential distribution.

Our choice of k used in tail estimation is then Nn − 1, where Nn is the smallest k such that the test described above rejects the null hypothesis of exponentiality. To test for exponentiality we used the statistic  2  √ 1 Pk−1 log Xn−i,n k k i=0 Xn−k,n Qk,n =  − 2 ; 2  X 2  1 Pk−1 log n−i,n k i=0 Xn−k,n under the null hypothesis it converges to the standard normal distribution (Dahiya and Gurland (1972)). Our choice of the signiﬁcance level of the test resulted in ( r ) θ N := inf k : 1 ≤ k ≤ n, |Q | ≥ ω n , n k,n k where ω > 0 and (θn) an increasing to inﬁnity sequence such that θn = 2|ρ| o n 1+2|ρ| . Here ρ is the parameter of the assumed second order regular

Nn variation. Under these assumptions, ⇒ τω , where τω is the first time a θn standard Brownian motion hits ±ω. For estimating the spectral measure in the multivariate regular variation context, one can adapt the one-dimensional rule of thumb and use that part of the sample whose radial component is in the top 5% among all the observations. The only existing systematic method to choose k appears to be the “Stărică plots”, introduced in Stărică (1999). It is a visual method, based on the fact that the tail measure ν∗ has the scaling property (2.8). This property is checked empirically for a range of scales, and a “good” choice of k (of the upper radial order statistics) is the one for which the estimated tail measure seems to be close to satisfying the scaling property. This checking is performed graphically, and a “good” choice of k is the one where the plot 8 T. NGUYEN AND G. SAMORODNITSKY stays as much as possible around the value equal to 1; see p. 314 in Resnick (2007). One would then use the observations of the norm larger than the kth radial order statistic in estimating the spectral measure. In this work we extend our sequential testing procedure to the multivariate framework of estimating the spectral measure. Specifically, we start by applying the testing procedure to each set of marginal observations. For the jth marginal, j = 1, . . . , d we calculate the first rejection time, N (j), via  s  (j) (j)  θn  (4.1) Nn := inf k : 1 ≤ k ≤ n, |Qk,n| ≥ ωj ,  k 

(j) using the data for that marginal, with some ωj > 0 and some sequence (θn ). Then we set our choice of k to be

d ^ (j) (4.2) Nn = Nn . j=1 Taking the minimum in (4.2) means that our procedure is, generally, con- servative about deciding on the “tail part” of the data. Other choices of k are Wd (j) possible, with the largest rejection point, Nn = j=1 Nn , being an obvious choice. When tested on simulated data, the latter choice of k often appeared to be “too generous” with deciding which observations are “in the tails”. The consistency of the resulting estimator of the tail measure is proven in the following theorem.

Theorem 1. Assume that the marginal and the joint regular variation conditions (2.3) and (2.4) hold. Assume, further, that the jth marginal distribution satisfies the second order regular variation condition with exponent (j) ρj < 0, j = 1, . . . , d. Let ωj > 0 and (θn ) be an increasing to infinity 2|ρj | ! (j) 1+2|ρ | sequence such that θn = o n j , j = 1, . . . , d, and let Nn be defined by (4.2). Then n 1 X   ⇒ ν∗ N Nn n i=1 ,j=1,...,d  (j)  ri weakly in the vague topology. MULTIVARIATE TAIL ESTIMATION 9

Proof. It is enough to show that for each j = 1, . . . , d and t > 0, (j) XdN te (4.3) n → t−1/αj in probability, b(j)( n ) Nn because then the argument on pp. 311-312 in Resnick (2007) can be used to establish the claim of the theorem. We start with checking that for every j, m = 1, . . . , d X(j) dN (m)te (4.4) n → t−1/αj in probability. (j) n b ( (m) ) Nn Fix j, m and let 0 < a < A < ∞. We verify ﬁrst that   X(j) dθ(m)ute (4.5)  n , a ≤ u ≤ A → t−1/αj e(u), a ≤ u ≤ A  (j) n  b ( (m) ) θn u in probability in the Skorohod J1 topology on D[a, A]. Here e is a function on [a, A] equal identically to 1. Indeed, let 0 < ε < t−1/αj , and choose K so large that

ui < (1 + εt1/αj )αj for each i = 1, . . . , k, ui−1 with ui = a + (A − a)i/K, i = 0, 1,...,K. Then   X(j) dθ(m)ute P  sup n − t−1/αj > ε  (j) n  a≤u≤A b ( (m) ) θn u

(j) n ≤ P X > t−1/αj + εb(j) for some a ≤ u ≤ A dθ(m)ute (m) n θn u (j) n +P X < t−1/αj − εb(j) for some a ≤ u ≤ A dθ(m)ute (m) n θn u := an + bn . By monotonicity, K n X (j) −1/αj (j) an ≤ P X (m) > t + ε b , dθn ui−1te (m) i=1 θn ui and the ith term in the finite sum can be written as  (j)  X b(j)( n ) dθ(m)u te (m) P  n i−1 > t−1/αj + ε θn ui  → 0  (j) n (j) n  b ( (m) ) b ( (m) ) θn ui−1 θn ui−1 10 T. NGUYEN AND G. SAMORODNITSKY as n → ∞, because by (4.18) in Resnick (2007), the random variable in the left hand side of the inequality converges in probability to t−1/αj , while the expression in the right hand side of the inequality converges, by the regular variation, to 1/αj ui−1 t−1/αj + ε > t−1/αj ui by the choice of K. Hence an → 0 and, similarly, bn → 0 as n → ∞, which establishes (4.5). Let ϕ : (0, ∞) → [a, A] be defined by   x, a ≤ x ≤ A ϕ(x) = a, 0 < x < a  A, x > A. (m) (m) Since ϕ is a continuous function, and Nn /θn ⇒ τωm by Theorem 1 in Nguyen and Samorodnitsky (2012) (recall that τωm is the first time a (m) (m) standard Brownian motion hits ±ωm), we conclude that ϕ Nn /θn ⇒ ϕ τωm . Since the convergence in (4.5) is to a deterministic limit, it follows by Theorem 4.4 in Billingsley (1968) that  (j)   X (m)  dθn ute , a ≤ u ≤ A , ϕN (m)/θ(m)  (j) n  n n  b ( (m) ) θn u h i −1/αj ⇒ t e(u), a ≤ u ≤ A , ϕ τωm weakly in D[a, A] × [a, A]. Since the map (x, t) → x(t) on D[a, A] × [a, A] is continuous at every point of its domain with a continuous first coordinate, we conclude that (j) X (m) (m) (m) dθn ϕ Nn /θn ϕ τω te m → t−1/αj (j) n b ( (m) (m) (m)) θn ϕ Nn /θn in probability. Since  (j) X(j)  X (m) (m) (m) (m) (m) ! dθn ϕ Nn /θn ϕ τω te N P  dNn te 6= m  ≤ P n 6∈ [a, A]  (j) n (j) n  (m) b ( (m) ) b ( (m) (m) (m)) θn Nn θn ϕ Nn /θn → P τωm 6∈ [a, A] , which can be made arbitrarily small by taking a small and A large, (4.4) follows. MULTIVARIATE TAIL ESTIMATION 11

The connection, through subsequences, between convergence in probability and a.s. convergence shows that (4.4) implies (4.3), so the proof of the theorem is complete.

The following is an immediate corollary of Theorem 1. It provides the estimator of the spectral measure we will use in the rest of the paper.

Corollary 2. Under the assumptions of Theorem 1, Pn 1 i=1 (Ri,N >1, θi,N ∈(·)) (4.6) S (·) = ⇒ S(·) , N,n Pn 1 i=1 (Ri,N >1, θi,N ∈[0,π/2]) weakly on the unit sphere.

5. Testing the estimator on simulated data

In this section, we evaluate our estimator (4.6) of the spectral measure on simulated data. We consider 3 two-dimensional examples. The ﬁrst example looks at two scenarios with asymptotic independence (so that the true spectral measure is concentrated at the extreme points of the arc), the second example looks at a case of a complete tail dependence (so that the true spectral measure is concentrated at a single point in the interior of the arc), and the last example considers a situation where the true spectral measure is absolutely continuous with respect to the Lebesgue measure on the arc. In each case we ran simulations with sample sizes n = 1000, n = 5000 and n = 20000, and each simulation was performed 500 times. The results we report are the averages obtained from the 500 simulations. In all cases, (j) (j) 2 we choose the numbers N , j = 1, 2 by (4.1) with θn = (log n) , as recommended in Nguyen and Samorodnitsky (2012).

Example 5.1. The generic model is X(i) = |Z + Y (i)|, i = 1, 2 where Z ∼ N(0, 1) and Y (1) and Y (2) are independent identically distributed random variables independent of Z, with regularly varying tails. (a) Y (i), i = 1, 2 have a t distribution with 3 degree of freedom; (b) Y (i), i = 1, 2 have a Generalized Pareto distribution with parameters µ = 1, α = 1, σ = 2. Recall that the distribution function of such a 12 T. NGUYEN AND G. SAMORODNITSKY

distribution is given by

1 (y − µ)−α (5.1) G (y) = 1 − 1 + , y ≥ µ . α,σ α σ Our decision on the tail part of the sample plays a role not only in the estimation of the spectral measure, but also in estimation of the parameters of the distribution. Estimating the latter is not necessary for the estimation of the spectral measure, which is our main goal. However, the interplay between estimation of the parameters and the choice of the part of the data to use for it, is instructive, and we include a short discussion of the resulting insights. In Example 5.1 (a) we estimate the marginal tail index (coinciding with the number of the degrees of freedom) by the Hill estimator, which is deﬁned as follows. If X1,n ≤ X2,n ≤ ... ≤ Xn,n are the order statistics from a positive sample, X1,...,Xn, then the Hill estimator based on k upper order statistics is

k−1 1 X Xn−i,n (5.2) H := log . k,n k X i=0 n−k,n If the observations are i.i.d., with a regularly varying right tail with tail k index α, then, with n → ∞, k → ∞ and n → 0, the Hill estimator Hk,n 1 converges in probability to γ = α (see Mason (1982)). We follow Nguyen and Samorodnitsky (2012) in our choice of k in estimating the jth marginal (j) tail index. We choose it to be Nn as above, j = 1, 2. In the case of the generalized Pareto distribution of Example 5.1 (b) we (j) use the Nn th largest order statistic of Xj as an estimate of the location parameter µ(j), j = 1, 2. The point of this estimator is that a generalized Pareto distribution is, really, a tail model, and the tail it describes is assumed to begin at the location parameter. Once the location parameter is estimated, the remaining parameters α and σ are obtained via maximum likelihood estimation. We note that in both parts of Example 5.1 the tails are asymptotic independent, and the spectral measure splits its mass equally between the two extreme points, at the angles 0 and π/2. MULTIVARIATE TAIL ESTIMATION 13

n α(1) α(2) N (1) N (2) N MISD 5% 1000 2.9968 2.9802 108.614 109.5900 80.2120 .0330 .0223 5000 3.1180 3.1072 218.0160 212.9620 137.3120 .0122 .0220 20000 3.1194 3.1051 325.08 294.9 187.4540 .0042 .0219 Table 1. Simulation results for Example 5.1 (a)

n N (1) N (2) N MISD 5% 1000 144.9940 153.9840 95.966 .0336 .0191 5000 277.8820 237.3660 141.5240 .0111 .0190 20000 318.12 313.644 194.2820 .0040 .0190 Table 2. Simulation results for Example 5.1 (b)

Our general format for reporting the results of estimation is as follows. We (j) report the numbers Nn , j = 1, 2, averaged over all the runs, as well as their minimum, Nn, also averaged over all the runs. To measure how close the estimated spectral measure is to the true one, we calculate and report the R π/2 ˆ 2 mean integrated squared diﬀerence (MISD), given by 0 S(θ) − S(θ) dθ, where S(θ) is the mass assigned by the true spectral measure S to the part of the L∞ unit sphere whose angle, in the polar coordinates, is in the interval [0, θ], 0 ≤ θ ≤ π/2. Similarly, Sˆ(θ) is the mass assigned by the estimated spectral measure to the same set; these are the “cumulative distribution functions” indexed by the angle. The reported MISD is the average over all the runs. We compare the calculated MISD obtained using our approach to the “rule of thumb” choice of the part of the sample whose radial component is in the top 5% among all the observations. The alternative “Stărică plot” is not useful in this situation; it cannot distinguish between diﬀerent k in the relevant range. The estimation results for the t distribution of Example 5.1 (a) are reported in Table 1. We present estimates of the spectral measure from 20 simulation in Figure 2. The estimation results for the Generalized Pareto distribution of Example 5.1 (b) are similarly reported in Table 2, Table 3 and Figure 3. Observe that, while the “5% rule of thumb” results in a lower MISD than our estimator for the smallest sample size n = 1000, the situation is reversed 14 T. NGUYEN AND G. SAMORODNITSKY

1 1

0.8 0.8

0.6 0.6 ) ) θ θ S( S( 0.4 0.4

0.2 0.2 theoretical CDF theoretical CDF estimated CDF estimated CDF 0 0 0 0.5 1 1.5 0 0.5 1 1.5 θ θ

Figure 2. Sample spectral measures from 20 simulations of Example 5.1 (a) with n = 5000 and n = 20000

n α(1) α(2) σ(1) σ(2) µ(1) µ(2) 1000 1.0113 .9897 21.0712 20.4578 21.4896 20.6536 5000 1.0258 1.0021 73.2986 67.2985 83.0037 70.9820 20000 1.0170 1.0062 213.0583 222.8202 231.3540 239.1394 Table 3. Simulation results for Example 5.1 (b)

1 1

0.8 0.8

0.6 0.6 ) ) θ θ S( S( 0.4 0.4

0.2 0.2 theoretical CDF theoretical CDF estimated CDF estimated CDF 0 0 0 0.5 1 1.5 0 0.5 1 1.5 θ θ

Figure 3. Estimated spectral measures from 20 simulations of Example 5.1 (b) with n = 5000 and n = 20000 for larger sample sizes. We interpret this as showing that, for small sample sizes, there is very little information in the sample about the tails, so neither approach estimates the spectral measure well. Our approach quickly picks MULTIVARIATE TAIL ESTIMATION 15

n α(1) α(2) N (1) N (2) N MISD 5% 1000 2.5296 1.0812 121.7260 145.7480 97.1420 .0410 .0267 5000 2.2689 1.0270 198.8200 235.0340 158.1500 .0192 .0262 20000 2.1667 1.0180 307.0820 324.1760 243.0800 .0108 .0262 Table 4. Simulation results for Example 5.2

up the emerging tails as the sample size increases, while the precision of the “rule of thumb” approach does not become any better. As far as the estimation of other parameters is concerned, we see that we are estimating the tail exponent α well even for small sample sizes. How- ever, we are missing the true values of the location parameter µ and the scale parameter σ of the Generalized Pareto distribution of Example 5.1 (b) by a wide margin. The reason is that our approach views the model as a tail model, and so estimates µ by the value of the order statistic where, according to our procedure, the tails “begin”. The actual Generalized Pareto model in this example is not a tail model, so we are overestimating the location parameter. Recall that, in a Generalized Pareto distributed with parameters (µ, α, σ) given that X > µ + δ, for some δ > 0, X has again a Generalized Pareto distribution, this time with parameters (µ + δ, α, σ + δ/α). Hence overestimating the location parameter leads to overestimating the scale parameter as well. Note, however, that the tail exponent α is still adequately estimated.

Example 5.2. This is an example of a bivariate law with a complete tail dependence. Let Y be Pareto(2) distributed, independent of two i.i.d. standard normal random variables Z(1) and Z(2). Let X(1) = Z(1) + Y and X(2) = Z(2) + Y 2. The true spectral measure in this case has mass only at a single point, π/4. We present the results in Table 4 and Figure 4. The lessons we learn here are similar to the lessons learned in Example 5.1; for larger sample sizes our algorithm estimates the spectral measure more precisely than the 5% rule does. 16 T. NGUYEN AND G. SAMORODNITSKY

1 1

0.8 0.8

0.6 0.6 ) ) θ θ S( S( 0.4 0.4

0.2 0.2 theoretical CDF theoretical CDF estimated CDF estimated CDF 0 0 0 0.5 1 1.5 0 0.5 1 1.5 θ θ

Figure 4. Estimated spectral measures from 20 simulations of Example 5.2 with n = 5000 and n = 20000

n α(1) α(2) N¯(1) N¯(2) N¯ MISD 5% 1000 1.0070 1.0072 149.2900 145.0900 105.9720 .0017 .0021 5000 1.0133 1.0112 229.0200 240.4080 156.61 .0010 .0004 20000 1.0178 1.0161 307.8380 307.7380 203.3380 .0009 .0001 Table 5. Simulation results for Example 5.3 (a)

Example 5.3. In this example the true spectral measure is absolutely continuous with respect to the Lebesgue measure on the L∞ unit sphere. The example is based on the bivariate Cauchy distribution, with density (2π)−1(1+ 2 2 −3/2 x + y ) for x, y ∈ R. We consider two cases. (a) X(i) = |Y (i)|, i = 1, 2 with Y (1),Y (2) having the bivariate Cauchy distribution. (b) X(i) = |Y (i) + Z(i)|, i = 1, 2, with Y (1),Y (2) bivariate Cauchy, independent of i.i.d. Pareto(2) random variables Z(1) and Z(2). Note that, in both cases, the “c.d.f.” of the true spectral measure is : S(θ) = 1 R θ √ ||sin(ψ), cos(ψ)||∞dψ, 0 ≤ θ ≤ π/2. 2 0 The results for part (a) of the example are reported in Table 5 and Fig- ure 5, and for part (b) of the example in Table 6 and Figure 6. We see that, as expected, it is harder to estimate the spectral measure in the “contaminated” case (b). Nevertheless, our approach works reasonably well. It is instructive to observe that in the pure bivariate Cauchy case of part MULTIVARIATE TAIL ESTIMATION 17

1 1

0.8 0.8

0.6 0.6 ) ) θ θ S( S( 0.4 0.4

0.2 0.2 theoretical CDF theoretical CDF estimated CDF estimated CDF 0 0 0 0.5 1 1.5 0 0.5 1 1.5 θ θ

Figure 5. Estimated spectral measures from 20 simulations of Example 5.3 (a) with n = 5000 and n = 20000

n α(1) α(2) N¯(1) N¯(2) N¯ MISD 5% 1000 1.1315 1.1315 138.58 142.89 98.17 .0041 .0044 5000 1.0671 1.0636 216.4560 214.2900 148.6300 .0024 .0029 20000 1.0263 1.0297 279.3420 309.5700 197.5380 .0011 .0026 Table 6. Simulation results for Example 5.3 (b)

1 1

0.8 0.8

0.6 0.6 ) ) θ θ S( S( 0.4 0.4

0.2 0.2 theoretical CDF theoretical CDF estimated CDF estimated CDF 0 0 0 0.5 1 1.5 0 0.5 1 1.5 θ θ

Figure 6. Estimated spectral measures from 20 simulations of Example 5.3 (b) with n = 5000

(a) the 5% rule works very well in all situations. This is because the entire distribution is rotationally invariant and, hence, in this special case all parts of the sample provide exactly the same information, and one does not need 18 T. NGUYEN AND G. SAMORODNITSKY to detect the tail part of the sample. In contrast, in the “contaminated” case of part (b), it is important to know where the tails begin, and our approach, again, is superior to the 5% rule, apart from the smallest sample size situation.

It is sometimes of interest to estimate the actual density of the spectral measure, as opposed to the CDF. We use a kernel density estimation procedure with the values of θi,N in (4.6) such that Ri,N > 1 as the input. Since the spectral measure is concentrated on the interval [0, π/2], we use the beta kernel of Chen (1999) (with bandwidth b = 04) that takes into account the proximity of an observation to the endpoints of the interval. The results of estimating the spectral density for the contaminated Cauchy example are presented in Figure 7. Estimating the density is always hard, however the true spectral density is estimated much better by our procedure, than using the 5% rule.

6. Analyzing CoVaR

In this section we apply our approach of estimating multivariate tails to the problem of evaluating the conditional risk measure CoVaR, proposed by Adrian and Brunnermeier (2011). It is based on the widely used risk measure of Value-at-Risk. (j) Recall that for 0 < q < 1, and close to 1, V aRq is defined as the (1 − q)- quantile of the return on an instrument associated with institution j. In light of the recent financial crisis, efforts have been made to capture better the spillover effects between financial institutions. This gave rise to the conditional risk measure CoVaR of Adrian and Brunnermeier (2011), defined by

(j) j|i (i) P X ≤ CoV aRq C(X ) = 1 − q .

Here C(X(i)) is some event describing a measure of distress of institution i (i) i. In this section we will consider events of the type X ≤ V aRr for some 0 < r < 1, close to 1. We are also interested in estimating the natural versions of CoVaR in the case of more than one institution being in distress. MULTIVARIATE TAIL ESTIMATION 19

1.5 1 θ 0.5 theoretical density estimated density

0 1 0

1.5 0.5 ) s( θ

1.5 1 θ 0.5 theoretical density estimated density

0 1 0

1.5 0.5 ) s( θ

1.5 1 θ 0.5 theoretical density estimated density

0 1 0

20 kernel estimations for the density function of contaminated Cauchy tail with (1) n=5000, (2)

1.5 0.5 ) s( θ n=20000, (3) n=20000 using 5% rule Figure 7. 20 T. NGUYEN AND G. SAMORODNITSKY

In order to keep the notation closer to what we used in the earlier sections, which dealt with the right tails, we will switch from returns to losses and work with Y (i) = −X(i), i = 1, . . . , d. The problem then becomes that of studying for large values of t the probabilities of the type

(j) (i) (i) (6.1) P Y > t Y > V aR1−r, i = 1, . . . , d, i 6= j . As long as we keep r close to 1, we may (and will) work with positive losses only. (i) We approximate the marginal quantiles V aR1−r, i = 1, . . . , d using the Generalized Pareto model (5.1) of Example 5.1 (b). As before, we estimate (i) the location parameter by the Nn th order statistic in (4.1), and the remaining two parameters by the maximum likelihood estimation. Consequently, we estimate, using (2.6), the probabilities in (6.1) for t of the order uj(1/r) as follows. (6.2)

(j) (i) (i) P Y > t Y > V aR1−r, i = 1, . . . , d, i 6= j 1 − r 1 − r 1 − r P > , > 1 , i = 1, . . . , d, i 6= j (1 − F (j)(Y j)) (1 − F (j)(t)) (1 − F (i)(Y i)) ∼ 1 − r P > 1 , i = 1, . . . , d, i 6= j) (1 − F (i)(Y i)) 1 − r µ Y (j) > ,Y (i) > 1 , i = 1, . . . , d, i 6= j ∗ 1 − F (j)(t) ∼ . (i) µ∗ Y > 1, i = 1, . . . , d, i 6= j When using the estimator (6.2) in practice, one needs to deal with the following issue. Since we would like to estimate high conditional quantiles, we will need to use the estimator for t such that (1 − r)/1 − F (j)(t) is very large. In such cases the estimate of µ∗ in the numerator of the ratio in the right hand side of (6.2), described in Theorem 1, may be based on a very small number of observations. We have chosen to resolve this diﬃculty by using the scaling property (2.8) of µ∗: for every L > 0 we can rewrite the ratio in the right hand side of (6.2) as 1 − r 1 1 µ Y (j) > ,Y (i) > , i = 1, . . . , d, i 6= j ∗ 1 − F (j)(t) L L (6.3) . (i) Lµ∗ Y > 1, i = 1, . . . , d, i 6= j MULTIVARIATE TAIL ESTIMATION 21

By choosing an appropriately large L, the diﬃculty described above is re- duced. Another diﬃculty arises, however. Notice that an appropriate choice of L is, by the nature of the problem, t-dependent. We are estimating a function of t that is, clearly, monotone, and so is the expression in (6.3).

However, we are using empirical estimates of µ∗ both in the numerator and the denominator of (6.3). With a t-dependent L, the monotonicity may be occasionally violated. We have found that a good choice of L is 1 − r L = , 1,..., 1 . t (j) 1 − F (t) 2 We demonstrate our procedure on two examples, that show markedly different behavior of CoVaR: daily returns of shares of 4 major US ﬁnancial institutions and daily returns of 4 major European indices.

Example 6.1. Daily returns of 4 major US financial companies We analyze the daily returns on the stock of Bank of America, JP Morgan, Morgan Stanley and Citigroup, from January 2, 1996 to June 30, 2012. The data set contains 4169 observations. As described above, we concentrate on the respective losses Y (i) = −X(i), i = 1,..., 4 of these 4 stocks. We use as the distress events Ci the exceedance (i) (i) events Y > V aR1−r, i = 2, 3, 4 with r = .95 and r = .99, and estimate the conditional quantiles of the losses on the stock of Bank of America. We specifically concentrate on the change in these conditional quantiles as we condition on more and more distress events for other companies. That is, (1) (1) we will estimate the conditional probabilities P Y > t|C2 , P Y > (1) t|C2,C3 and P Y > t|C2,C3,C4 , as well as the resulting conditional quantiles. For comparison, we estimate unconditional quantiles as well. The results are presented in Figure 8. Note that including every additional distress event makes the conditional tail of the loss distribution on the stock of Bank of America observably heavier and affects, correspondingly, the resulting value of CoVaR. Some estimated quantiles for losses of Bank of America are reported in Table 7.

Example 6.2. Daily returns of 4 major European indices 22 T. NGUYEN AND G. SAMORODNITSKY

Quantile Unconditional Given C2 Given C2,C3 Given C2,C3,C4 90% .027 .114 .152 .179 95% .041 .159 .213 .253 99% .089 .319 .420 .488 Table 7. Estimated quantiles for losses of Bank of Amer- ica unconditioned and conditioned on Ci where Ci’s are the exceedance events with r = .95

0.2 0.2

0.1 0.1

0 0 0 0.1 0.2 0.3 0.4 0.5 0 0.1 0.2 0.3 0.4 0.5 t t

Figure 8. US stocks: Unconditional and conditional P (Y (1) > t) for r = .95 and r = .99

We analyze the daily returns of the following indices: British FTSE 100, French CAC 40, Deutsche Börse AG and Spain’s IBEX 35, from January 2, 1995 to April 27, 2012. The data set contains 4402 observations. Once again, we concentrate on the respective losses Y (i) = −X(i), i = 1,..., 4. Using the same type of distress events as in the previous examples, we study the conditional loss tail of the return on British FTSE 100 and the corresponding CoVaR. The results are displayed in Figure 9 and Table 8. An obvious difference between the results obtained in this example and the previous example is that no significant movement in the conditional loss probabilities is noticeable after the first conditioning. This is due to the fact that there is strong tail dependence between the losses on French CAC 40, Deutsche Börse AG and Spain’s IBEX 35. This results in the tail measure that concentrated in a small part of the four-dimensional space, and in the conditional loss curves that barely move. MULTIVARIATE TAIL ESTIMATION 23

Quantile Unconditional Given C2 Given C2,C3 Given C2,C3,C4 90% .013 .039 .045 .048 95% .019 .048 .055 .059 99% .033 .073 .084 .089 Table 8. Estimated quantiles for losses of FTSE 100 unconditioned and conditioned on Ci where Ci’s are the exceedance events with r = .95

P 0.15 P 0.15

0.1 0.1

0.05 0.05

0 0 0 0.05 0.1 0.15 0 0.05 0.1 0.15 t t

Figure 9. European indexes: Unconditional and conditional P (Y (1) > t) for r = .95 and r = .99

Acknowledgments

We are grateful to Zari Rachev for the suggestion of using CoVaR to describe the tails of high dimensional ﬁnancial data and for providing us with the data on the European indices.

References

T. Adrian and M. Brunnermeier (2011): CoVaR. Federal Reserve Bank of New York Staﬀ Technical report No. 348. P. Billingsley (1968): Convergence of Probability Measures. Wiley, New York. S. Chen (1999): Beta kernel estimators for density functions. Computational Statistics & Data Analysis 31:131–145. 24 T. NGUYEN AND G. SAMORODNITSKY

R. Dahiya and J. Gurland (1972): Goodness of ﬁt tests for the gamma and exponential distributions. Technometrics 14:791–801. J. Danielsson, L. de Haan, L. Peng and C. G. de Vries (2001): Using a bootstrap method to choose the sample fraction in tail index estimation. Journal of Multivariate Analysis 76:226–248. L. de Haan (1985): Extremes in higher dimensions: The model and some statistics. In Bulletin of the International Statistical Institute, Amster- dam, 1985. Proceedings of the 45th Session of the International Statistical Institute. L. de Haan and S. Resnick (1993): Estimating the limit distribution of multivariate extremes. Stochastic Models 9:275–309. H. Drees and E. Kaufmann (1998): Selecting the optimal sample fraction in univariate extreme value estimation. Stochastic Processes and their Applications 75:149–172. J. Einmahl, L. de Haan and V. Piterbarg (2001): Nonparametric estimation of the spectral measure of an extreme value distribution. Annals of Statistics 9:1401–1423. J. Einmahl and J. Segers (2009): Maximum empirical likelihood estimation of the spectral measure of an extreme value distribution. Annals of Statistics 37:2953–2989. D. Mason (1982): Laws of large numbers for sums of extreme values. Annals of Probability 10:756–764. T. Nguyen and G. Samorodnitsky (2012): Tail Inference: where does the tail begin? Extremes, to appear. S. Resnick (1987): Extreme Values, Regular Variation and Point Processes. Springer-Verlag, New York. S. Resnick (2007): Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. Springer, New York. C. Stărică (1999): Multivariate extremes for models with constant conditional correlations. Journal of Empirical Finance 6:515–553. MULTIVARIATE TAIL ESTIMATION 25

Center for Applied Math, Cornell University, Ithaca, NY 14853 E-mail address: [email protected]

School of Operations Research and Information Engineering, and Depart- ment of Statistical Science, Cornell University, Ithaca, NY 14853 E-mail address: [email protected]