Inference for vast dimensional elliptical distributions

Yves Dominicy1, Hiroaki Ogata2 and David Veredas3

Abstract

We propose a quantile–based method to estimate the parameters (i.e. locations, dispersions, co–dispersions and the tail index) of an elliptical distribution, and a battery of tests for model adequacy. The method is suitable for vast dimensions since the estimators of the location vector and the dispersion matrix have closed form expressions, while estimation of the tail index boils down to univariate optimizations. The tests for model adequacy are for the null hypothesis of correct specification of one or several level contours. A Monte Carlo study for three distributions (Gaussian, Student–t and elliptical stable) and dimensions 20, 200 and 2000 reveals the goodness of the method, both in terms of computational time and in finite samples. An empirical application to financial data illustrates the usefulness of the approach.

Keywords: Quantiles, elliptical family, simulations, heavy tails.

JEL classification: C13, C15, G11

1ECARES, Solvay Brussels School of Economics and Management, Université libre de Bruxelles; email: [email protected]. 2Tokyo Metropolitan University and Waseda University; email: [email protected]. 3ECARES, Solvay Brussels School of Economics and Management, Université libre de Bruxelles; email: [email protected]. Corresponding address: David Veredas, ECARES, Université libre de Bruxelles, 50 Av F.D. Roosevelt CP114, B1050 Brussels, Belgium. Phone: +3226504218. Fax: +3226504475. This work started during the visit of Hiroaki Ogata to the Université libre de Bruxelles, as holder of the Chaire Waseda, in March 2010. Yves Dominicy acknowledges financial support from a F.R.I.A. grant. Hiroaki Ogata acknowledges financial support from the Japanese Grant–in–Aid for Young Scientists (B), 22700291, and the International Relations Department of the Université libre de Bruxelles. David Veredas acknowledges financial support from the Belgian National Bank and the IAP P6/07 contract, from the IAP program (Belgian Scientific Policy), 'Economic policy and finance in the global economy'. Yves Dominicy and David Veredas are members of ECORE, the recently created association between CORE and ECARES. Any errors and inaccuracies are ours.

1 Introduction

The elliptical family of distributions is commonly used as it nests, among others, the Gaussian, Student–t, elliptical stable (ESD henceforth), Cauchy, Laplace and Kotz laws. These distributions are defined by three sets of parameters: a vector of locations, a dispersion matrix that produces the ellipticity, and (possibly) a tail index that generates tail thickness. While the statistical properties of the elliptical family of distributions are well known (see, among others, Kelker, 1970, Cambanis et al., 1981, Fang et al., 1990, and Frahm, 2004), inference for vast dimensions is still an almost unexplored area, in particular for heavy–tailed distributions. For moderate dimensions and thin–tailed distributions, such as the Gaussian, standard estimation methods –namely maximum likelihood (ML) and the generalized method of moments (GMM)– are straightforward. For heavy–tailed distributions they may fail because of intractability of the probability density function and/or lack of existence of moments.1 This is the case of, for instance, the ESD, the Cauchy and the Student–t. Alternative methods, such as Indirect Inference (Lombardi and Veredas, 2009) or projections (Nolan, 2010), can be used but, as the authors acknowledge, they do not apply to vast dimensions. Another branch of the literature has focused on the estimation of the shape matrix, i.e. the dispersion matrix up to a positive scalar factor. Tyler (1987) introduces an affine-equivariant estimator, and Hallin et al. (2006) propose an R-estimator of the shape matrix.

In this article we propose inference (i.e. estimation and testing) for vast dimensional elliptical distributions. Estimation is based on quantiles, which always exist regardless of the thickness of the tails, and testing is based on the geometry of the elliptical family. More precisely, the contribution of this article is threefold. First, we introduce a quantile–based function that is informative about the co–dispersions. Second, we propose a fast method for the estimation of the parameters that i) does not require tractability of the density function or existence of moments, and ii) does not suffer from the curse of dimensionality. Last, we introduce simple testing procedures for the null hypothesis of correct specification of one or several level contours.

Estimation is based on an enhanced version of the Method of Simulated Quantiles (MSQ) of Dominicy and Veredas (2012). MSQ is based on a vector of functions of quantiles that can be computed either from data (the sample functions) or from the distribution (the theoretical functions). The estimated parameters are those that minimize a quadratic distance between both. Since the theoretical functions of quantiles may not have a closed form expression, we rely on simulations. One of the centerpieces of the method is that the functions of quantiles have to be informative about the parameters of interest. The first contribution of this article is to propose a function of quantiles for the co–dispersion that is based on the following simple idea: if two centered and scaled random variables co–move, most of the times the pairs of standardized observations have the same sign, so a new random variable equal to their projection onto the 45–degree line has large dispersion. If, by contrast, they anti–move, most of the times the pairs of standardized observations have opposite signs, and their projection onto the 45–degree line has small dispersion. Therefore the interquantile range of the projection is an informative function about the co–dispersion.

1There exists a tail–trimmed version of GMM (Hill and Renault, 2012) that does not require existence of moments.

Furthermore, and this is the second contribution of the article, due to the properties of the elliptical family, we find a way to circumvent almost all the optimizations of the original MSQ. Only univariate quadratic minimizations are used for the tail index, if there is any, while the other parameters are obtained straightforwardly without any optimization procedure. This makes the method fast and applicable to vast dimensions. To assess the finite sample properties of the estimators we carry out a Monte Carlo study for the Gaussian, Student–t and ESD distributions of dimensions 20, 200 and 2000. For a wide range of sample sizes and tail indexes, we find that the estimated parameters are essentially unbiased. Moreover, the procedure is computationally fast, even for vast dimensions.

The third contribution is a battery of tests for correct specification of one or several level contours. This is different from testing for the correct distributional assumption, which can be done with standard goodness–of–fit tests (Cramér–von–Mises, Anderson–Darling or Kolmogorov–Smirnov, to name a few). In many applications the researcher is not interested in fitting correctly the whole distribution but a small set of level contours. In finance, for instance, a risk manager is concerned with extreme level contours. In the univariate case, this motivation has led to back–tests for Value–at–Risk that are based on the failure rate, that is, the percentage of times that the observations are below a given quantile. If the theoretical quantile is well specified and estimated, the empirical failure rate should not be statistically different from the nominal rate. A similar approach is followed in this article, where the failure rate is defined as the percentage of times that the observations are inside an estimated level contour.2 Moreover, due to the geometry of the elliptical family, the tests can be easily performed without an increase in complexity with the dimension. This concept can be extended to testing the adequacy of several level contours, which leads to a vector of failure rates. We also consider the case where there is a level contour of main interest while we take into account the information in the neighboring contours. This leads to a Bonferroni type of test where for the level contour of interest we are very intolerant (i.e. low size), with increasing tolerance as we move further away. In all testing scenarios, the distribution of the vector of failure rates is multinomial and a simple statistic can be used. We derive the asymptotic distribution of the tests, which incorporates the uncertainty of the estimated parameters.

The rest of the article is laid out as follows. Section 2 briefly reviews the elliptical family of distributions. Section 3 covers the estimation method for the locations, dispersions and tail index as an extension of MSQ (Section 3.1), explaining at length the function of quantiles for the co–dispersions (Section 3.2), and the asymptotic properties of the estimators (Section 3.3). Then, Section 3.4 presents the fast procedure, which overcomes almost all the optimizations. A comprehensive Monte Carlo study for a variety of distributions, tail thicknesses and sample sizes is presented in Section 4. The tests for level contours are described in Section 5. Section 6 illustrates the theory with an application to 22 de–volatilized daily return series of financial market indexes. Lastly, Section 7 presents the conclusions and directions for further research.

Finally, a word on notation.
Unless otherwise indicated, vectors are treated as column vectors. Random vectors are in capital and bold (e.g. X) and the corresponding realizations in small caps and bold (e.g. x). Elements of the random vector and of the realizations are not in bold (e.g. Xl and xl). Vectors of parameters are denoted in small caps and bold (e.g.

2Gonzalez–Rivera et al. (2011) and Gonzalez–Rivera and Yoldas (2011) propose tests in the same spirit and in a more general context. But while these tests require numerical integration, ours, because of the properties of the elliptical family, do not.

$\mu$) while matrices are denoted in capitals and bold (e.g. $\Sigma$). Single parameters and elements of the vectors and matrices are in small caps (e.g. $\alpha$, $\mu_i$ and $\sigma_{i\,i}^2$). The letter $J$ is reserved for the dimension and $N$ for the size of the sequence of random vectors and their realizations (with $l$–th element). The letters $i$, $j$, $r$ and $g$ are used for elements in vectors and matrices of parameters.

2 Elliptical Distributions

Let X be a random vector of size J. We assume the following

A1 $X_1, \ldots, X_N$ is a sequence of i.i.d. random vectors distributed according to a known distribution that belongs to the elliptical family, i.e. the $l$–th element has the stochastic representation $X_l \stackrel{d}{=} \mu + R_{\alpha\,l}\Lambda U_l$.

$U_1, \ldots, U_N$ is a sequence of $J$–dimensional i.i.d. random vectors uniformly distributed, and therefore radial, on the unit sphere with $J-1$ dimensions $S^{J-1} = \{u_l \in \mathbb{R}^J : \|u_l\|_2 = 1\}$. The $J \times J$ scaling matrix $\Lambda$ produces the ellipticity and is a matrix such that $\Sigma = \Lambda\Lambda'$, a positive definite symmetric dispersion matrix –also called the shape matrix– of rank $J$ and with $(i\,i)$–th and $(i\,j)$–th elements $\sigma_{i\,i}^2$ and $\sigma_{i\,j}$.3 If $\Lambda$ equals the identity matrix, the density of $X_l$ remains radial. The non–negative random variable $R_{\alpha\,l}$ –often called the generating variate– generates the tail thickness, its sequence is i.i.d., and it is stochastically independent of $U_l$. The $J$–dimensional vector $\mu$ re–allocates the center of the distribution. Let $\theta = (\mu, \Sigma, \alpha) \in \Theta$ denote the vector of unknown parameters. We make the following assumption

A2 (a) The parameter space $\Theta$ is a non–empty and compact set on $\mathbb{R}^{J + J(J+1)/2 + 1}$. (b) The true parameter value $\theta_0$ belongs to the interior of $\Theta$.

The family of elliptical distributions possesses a number of useful properties, among which its closedness under affine transformations and aggregation (Fang et al., 1990), and the fact that the conditional and marginal distributions are also elliptical. Numerous distributions that are relevant for theoretical and practical work belong to this family: Gaussian, Laplace, Student–t, ESD (and hence Cauchy with $\alpha = 1$), and Kotz among others.4 Gaussian $\mathcal{N}(\mu, \Sigma)$ and Laplace $\mathcal{L}(\mu, \lambda, \Sigma)$ distributions are obtained if $R_\alpha = \sqrt{\chi_J^2}$ and $R_\alpha = \sqrt{\chi_J^2}\sqrt{E(\lambda)}$ respectively –where $E(\lambda)$ is an exponential random variable with parameter $\lambda$ and stochastically independent of $\chi_J^2$. The generating variate of these distributions does not depend on a tail index: $R_\alpha = R$. Student–t $St_\alpha(\mu, \Sigma)$ and ESD $S_\alpha(\mu, \Sigma)$ are obtained if $R_\alpha = \sqrt{\chi_J^2}\,(\sqrt{\chi_\alpha^2/\alpha})^{-1}$ and $R_\alpha = \sqrt{\chi_J^2}\sqrt{S_{\alpha/2}}$ respectively –where $S_{\alpha/2}$ is a positive $\alpha/2$ stable distributed random variable, and $\chi_\alpha^2$ and $S_{\alpha/2}$ are stochastically independent of $\chi_J^2$. The Cauchy $\mathcal{C}(\mu, \Sigma)$ is the ESD with $\alpha = 1$. The Kotz $\mathcal{K}(\mu, \Sigma, \alpha)$ distribution is obtained if $R_\alpha = \sqrt{G(\kappa, \xi)}$ –where $G(\kappa, \xi)$ is a gamma random variable with $\kappa = J/((J+2)\alpha + 2)$ and $\xi = (J+2)\alpha + 2$ being the shape and scale parameters

3Σ is the covariance matrix, up to a scale, if the second moments exist. 4Hereafter we skip the term multivariate but the reader should always keep in mind that $U$ and $X$ are random vectors and $R_\alpha$ is a random variable.

respectively. For more complicated choices of $R_\alpha$ we can obtain the Discrete Scale Mixture and the Polynomial Expansions. The probability density function of $X$ is

$$f_X(x) = \sqrt{|\Sigma^{-1}|}\; g_{R_\alpha}\!\left((x - \mu)'\Sigma^{-1}(x - \mu)\right),$$
where
$$g_{R_\alpha}(x) = \frac{\Gamma\!\left(\frac{J}{2}\right)}{2\pi^{J/2}}\, \sqrt{x}^{\,-(J-1)} f_{R_\alpha}(\sqrt{x})$$
is the density generator, and $f_{R_\alpha}$ is the probability density function of the generating variate. Likewise, the characteristic function can be expressed as a function of the cumulative distribution function of the generating variate:
$$\varphi_X(\xi) = \exp(i\xi'\mu) \int_0^\infty \varphi_U(r^2 \xi'\Sigma\xi)\, dF_{R_\alpha}(r),$$
where $\xi \in \mathbb{R}^J$ is a $J \times 1$ vector of frequencies, $\varphi_U(\cdot)$ is the characteristic function of $U$, and $F_{R_\alpha}$ is the cumulative distribution function of the generating variate. For instance, the characteristic function of the ESD is

$$\varphi_X(\xi) = \exp(i\xi'\mu)\exp\!\left(-(\xi'\Sigma\xi)^{\alpha/2}\right),$$
which reduces to the characteristic function of a Gaussian if $\alpha = 2$, or a Cauchy if $\alpha = 1$. It is worth recalling the characteristic function of an affine transformation, as it is used extensively in the sequel. Let $Z = b + AX$; then
$$\varphi_Z(\xi) = \exp(i\xi'(b + A\mu)) \int_0^\infty \varphi_U(r^2 \xi' A\Sigma A'\xi)\, dF_{R_\alpha}(r) \qquad (1)$$
is the characteristic function of an elliptical random vector with location vector $b + A\mu$, dispersion matrix $A\Sigma A'$ and tail index $\alpha$.
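The stochastic representation in A1 doubles as a simulation recipe, which the simulated quantiles used below rely on. The following is a minimal sketch under that representation, assuming numpy; the names `sample_elliptical` and `student_R` are ours, purely for illustration.

```python
import numpy as np

def sample_elliptical(N, mu, Lam, gen_variate, rng):
    """Draw N vectors via X = mu + R * Lambda * U (assumption A1)."""
    J = len(mu)
    G = rng.standard_normal((N, J))
    U = G / np.linalg.norm(G, axis=1, keepdims=True)  # uniform on the sphere S^{J-1}
    R = gen_variate(N, rng)                           # generating variate
    return mu + (R[:, None] * U) @ Lam.T

def student_R(alpha, J):
    """Student-t generating variate: sqrt(chi2_J) / sqrt(chi2_alpha / alpha)."""
    def gen(N, rng):
        return np.sqrt(rng.chisquare(J, N)) / np.sqrt(rng.chisquare(alpha, N) / alpha)
    return gen

rng = np.random.default_rng(0)
J = 5
Sigma = 0.5 * np.eye(J) + 0.5        # an arbitrary positive definite dispersion matrix
Lam = np.linalg.cholesky(Sigma)      # Sigma = Lam Lam'
x = sample_elliptical(10_000, np.zeros(J), Lam, student_R(3, J), rng)
```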

3 Quantile–based Inference

In this section we first introduce the functions of quantiles for the locations, dispersions and tail index, which are drawn from Dominicy and Veredas (2012). Next, we explain the quantile–based measure for the co–dispersions and we derive its properties. In the third part we discuss the minimization problem and the asymptotic properties of the estimators. Once the general theory is discussed, in the last part we show how to avoid almost all the optimizations, so that the method is applicable to vast dimensions.

3.1 Locations, dispersions and tail index

Let $x_{l\,j}$ be the $l$–th realization of the $j$–th random variable $X_j$ and let $x_j = (x_{1\,j}, \ldots, x_{N\,j})$ be the vector of $N$ realizations. Denote by $\hat{q}_{j\,N} = (\hat{q}_{j\,\tau_1\,N}, \ldots, \hat{q}_{j\,\tau_{s_j}\,N}) \in \mathbb{R}^{s_j}$ an $s_j \times 1$ vector of sample quantiles of $x_j$; that is, $\hat{q}_{j\,\tau_k\,N}$ denotes the $\tau_k$–th sample quantile of $x_j$. Gather all the observations into the $N \times J$ matrix $x$, and the sample quantiles into the vector $\hat{q}_N = (\hat{q}_{1\,N}, \ldots, \hat{q}_{J\,N}) \in \mathbb{R}^{s}$. Let $h(\hat{q}_N)$ be an $M \times 1$ vector $\mathbb{R}^{s} \to \mathbb{R}^{M}$ of measurable functions of $x$. Likewise, denote by $q_{\theta_j} = (q_{\tau_1\,\theta_j}, \ldots, q_{\tau_{s_j}\,\theta_j}) \in \mathbb{R}^{s_j}$ an $s_j \times 1$ vector of theoretical quantiles; that is, $q_{\tau_k\,\theta_j}$ denotes the $\tau_k$–th theoretical quantile of $x_j$. These quantiles may not be available analytically but they can be computed through simulation. All theoretical quantiles are gathered into the vector $q_\theta = (q_{\theta_1}, \ldots, q_{\theta_J}) \in \mathbb{R}^{s}$, and let $h(q_\theta)$ be an $M \times 1$ vector $\mathbb{R}^{s} \to \mathbb{R}^{M}$ of functions.

The vectors of functions of quantiles $h(\hat{q}_N)$ and $h(q_\theta)$ should be informative about the parameters of interest. As for the location parameters $\mu$, the $J \times 1$ vector of medians are the best candidates. For the sample quantiles:

$$h_\mu(\hat{q}_N) = (\hat{q}_{1\,0.50\,N}, \ldots, \hat{q}_{J\,0.50\,N}).$$

And similarly for the theoretical quantiles $h_\mu(q_\theta)$. Regarding the tail index $\alpha$, if there is any, Dominicy and Veredas (2012), following Fama and Roll (1971) and McCulloch (1986), propose the $J \times 1$ vector of functions of quantiles
$$h_\alpha(\hat{q}_N) = \left(\frac{\hat{q}_{1\,0.95\,N} - \hat{q}_{1\,0.05\,N}}{\hat{q}_{1\,0.75\,N} - \hat{q}_{1\,0.25\,N}}, \ldots, \frac{\hat{q}_{J\,0.95\,N} - \hat{q}_{J\,0.05\,N}}{\hat{q}_{J\,0.75\,N} - \hat{q}_{J\,0.25\,N}}\right),$$
and likewise for the vector of theoretical quantiles $h_\alpha(q_\theta)$. As far as the dispersion matrix is concerned, the interquantile ranges $\hat{q}_{j\,\tau\,N} - \hat{q}_{j\,1-\tau\,N}$, which we denote by $\widehat{IQR}_{j\,N}$, are very informative for the diagonal elements, i.e. the dispersions $\sigma_{j\,j}$, that we gather in the $J \times 1$ vector
$$h_{diag\Sigma}(\hat{q}_N) = \left(\widehat{IQR}_{1\,N}, \ldots, \widehat{IQR}_{J\,N}\right),$$
where typically $\tau = 0.75$ (and hence $1 - \tau = 0.25$) but not necessarily. Similarly for the vector of theoretical quantiles $h_{diag\Sigma}(q_\theta)$.
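In code, the sample versions of these functions of quantiles are one–liners; their theoretical counterparts are obtained by applying the same functions to simulated observations. A minimal sketch with our own naming, assuming the data are stored in an $N \times J$ numpy array:

```python
import numpy as np

def h_mu(x):
    """Location functions: the J marginal medians."""
    return np.quantile(x, 0.50, axis=0)

def h_alpha(x):
    """Tail functions: (q_.95 - q_.05) / (q_.75 - q_.25), one per marginal."""
    q05, q25, q75, q95 = np.quantile(x, [0.05, 0.25, 0.75, 0.95], axis=0)
    return (q95 - q05) / (q75 - q25)

def h_diag(x, tau=0.75):
    """Dispersion functions: the marginal interquantile ranges."""
    return np.quantile(x, tau, axis=0) - np.quantile(x, 1 - tau, axis=0)
```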

3.2 Co–dispersions

The construction of a function of quantiles that is informative about the co–dispersion $\sigma_{j\,k}$ is the first contribution of this article. In a nutshell, it is a pairwise function equal to the interquantile range of the projection of $X_j$ and $X_k$ onto the 45–degree line. Figure 1 shows the intuition: the left panel shows a scatter plot, along with the 45–degree line, where $X_j$ and $X_k$ are positively co–dispersed (the pairs are represented by the circles). Projecting the observations onto the 45–degree line produces a new random variable $Z_{(j\,k)}$ (represented by the squares) that is dispersed –since most of the times the pairs have the same sign– and therefore the interquantile range of the projection is large (in a sense to be defined below). By contrast, the right panel shows the case where $X_j$ and $X_k$ are negatively co–dispersed. Projecting them onto the 45–degree line produces a new random variable $Z_{(j\,k)}$ that is concentrated around the origin –since very frequently the pairs have the opposite sign– and therefore the interquantile range of the projection is small (in a sense to be defined below).

[FIGURE 1 ABOUT HERE]

The construction of this function is as follows: let the random variable $Y_j$ be a standardization of $X_j$ by means of the median and the interquantile range:
$$Y_j = \frac{X_j - q_{0.50\,\theta_j}}{IQR_{\theta_j}}.$$

The next step is to project the pair of standardized random variables $(Y_j, Y_k)$ onto the 45–degree line. By standard trigonometric arguments:
$$Z_{(j\,k)} = \frac{1}{\sqrt{2}}(Y_j + Y_k),$$
which can be written as
$$Z_{(j\,k)} = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} Y_j \\ Y_k \end{pmatrix} = a_{\sqrt{2}}\, Y_{(j\,k)}. \qquad (2)$$

Let $\hat{q}_{(j\,k)\,\tau\,N}$ and $q_{(j\,k)\,\tau\,\theta}$ be the $\tau$–th sample and theoretical quantiles of $Z_{(j\,k)}$ respectively. The function that we use as informative for the co–dispersion between $X_j$ and $X_k$ is the interquantile range of $Z_{(j\,k)}$: $\widehat{IQR}_{(j\,k)\,N} = \hat{q}_{(j\,k)\,\tau\,N} - \hat{q}_{(j\,k)\,1-\tau\,N}$ for the sample values and $IQR_{\theta(j\,k)} = q_{(j\,k)\,\tau\,\theta} - q_{(j\,k)\,1-\tau\,\theta}$ for the theoretical counterparts. As shown in the following Lemma, $IQR_{\theta(j\,k)}$ has interesting and intuitive properties.5

Lemma 1 $IQR_{\theta(j\,k)}$ is bounded above and below by $\sqrt{2}$ and 0 respectively (which corresponds to maximal and minimal co–dispersion), it takes value 1 when the co–dispersion is zero, and it does not depend on the tail index $\alpha$.

Proof The vector $a_{\sqrt{2}}$ in (2) is a scale shift. Let $A_{IQR\,\theta\,(j\,k)} = \mathrm{diag}(IQR_{\theta_j}^{-1}, IQR_{\theta_k}^{-1})$, $b_{0.5\,\theta\,(j\,k)} = (q_{0.50\,\theta_j}, q_{0.50\,\theta_k})$, $\mu_{(j\,k)} = (\mu_j, \mu_k)$, and $\Sigma_{(j\,k)}$ a $2 \times 2$ matrix with diagonal elements $\sigma_{j\,j}^2$ and $\sigma_{k\,k}^2$ and off–diagonal element $\sigma_{j\,k}$. Then
$$\varphi_{Z_{(j\,k)}}(\xi) = \exp\!\left(i\xi a_{\sqrt{2}} A_{IQR\,\theta\,(j\,k)}(\mu_{(j\,k)} - b_{0.5\,\theta\,(j\,k)})\right) \int_0^\infty \varphi_U\!\left(r^2 \xi\, a_{\sqrt{2}} A_{IQR\,\theta\,(j\,k)} \Sigma_{(j\,k)} A_{IQR\,\theta\,(j\,k)}' a_{\sqrt{2}}'\, \xi\right) dF_{R_\alpha}(r)$$
is the characteristic function of a univariate elliptical distribution with location and scale
$$\mu_{z\,(j\,k)} = a_{\sqrt{2}} A_{IQR\,\theta\,(j\,k)}(\mu_{(j\,k)} - b_{0.5\,\theta\,(j\,k)}), \qquad \sigma_{z\,(j\,k)}^2 = a_{\sqrt{2}} A_{IQR\,\theta\,(j\,k)} \Sigma_{(j\,k)} A_{IQR\,\theta\,(j\,k)}' a_{\sqrt{2}}'.$$

5The idea of using $X \pm Y$ as a way to glean information about dependence is embedded in the concept of co–difference (Samorodnitsky and Taqqu, 1994). Co–difference and our proposal are however substantially different in several respects. First, our measure is not the sum but the projection onto the 45–degree line. Second, co–difference, as defined in Samorodnitsky and Taqqu (1994), applies to strictly stable random variables and does not generalize to the elliptical family. Third, co–difference is a measure of dependence between two strictly stable random variables, i.e. the equivalent of $\sigma_{i\,j}$ in the elliptical family. Our projection is a way to estimate the latter.

The latter takes the explicit form
$$\sigma_{z\,(j\,k)}^2 = \frac{1}{2}\left(\frac{\sigma_{j\,j}^2}{IQR_{\theta_j}^2} + 2\frac{\sigma_{j\,k}}{IQR_{\theta_j} IQR_{\theta_k}} + \frac{\sigma_{k\,k}^2}{IQR_{\theta_k}^2}\right). \qquad (3)$$

There is a relation between $IQR_{\theta_j}$ and $\sigma_{j\,j}$: $IQR_{\theta_j} = c(\alpha)\sigma_{j\,j}$. Substituting in (3)
$$\sigma_{z\,(j\,k)}^2 = \frac{1}{c(\alpha)^2}\left(1 + \frac{\sigma_{j\,k}}{\sigma_{j\,j}\sigma_{k\,k}}\right),$$
and, since $\sigma_{z\,(j\,k)}^2 = c(\alpha)^{-2} IQR_{\theta(j\,k)}^2$,
$$IQR_{\theta(j\,k)} = \sqrt{1 + \frac{\sigma_{j\,k}}{\sigma_{j\,j}\sigma_{k\,k}}}.$$

So if the co–dispersion is negative, zero or positive, $IQR_{\theta(j\,k)}$ is smaller than, equal to or larger than one. It is bounded below by 0 and above by $\sqrt{2}$ since, by positive definiteness of $\Sigma$, $-\sigma_{j\,j}\sigma_{k\,k} < \sigma_{j\,k} < \sigma_{j\,j}\sigma_{k\,k}$. Last, $IQR_{\theta(j\,k)}$ does not depend on $c(\alpha)$ and therefore neither on the tail index. Q.E.D.

This result was for the co–dispersion between the $j$–th and the $k$–th random variables. Note that $IQR_{\theta(j\,k)}$ uses two quantiles: $q_{(j\,k)\,\tau\,\theta}$ and $q_{(j\,k)\,1-\tau\,\theta}$. We now gather all the quantiles for the $J(J-1)/2$ combinations between the $J$ random variables. Let

$$\hat{q}_{z\,N} = (\hat{q}_{(1\,2)\,\tau\,N}, \hat{q}_{(1\,2)\,1-\tau\,N}, \ldots, \hat{q}_{(J-1\,J)\,\tau\,N}, \hat{q}_{(J-1\,J)\,1-\tau\,N})$$
and
$$q_{z\,\theta} = (q_{(1\,2)\,\tau\,\theta}, q_{(1\,2)\,1-\tau\,\theta}, \ldots, q_{(J-1\,J)\,\tau\,\theta}, q_{(J-1\,J)\,1-\tau\,\theta})$$
be the $J(J-1) \times 1$ vectors of sample and theoretical quantiles of $Z_{(1\,2)}, \ldots, Z_{(J-1\,J)}$. The $J(J-1)/2 \times 1$ vector that is informative about the co–dispersions is
$$h_{off\Sigma}(\hat{q}_{z\,N}) = \left(\widehat{IQR}_{(1\,2)\,N}, \ldots, \widehat{IQR}_{(J-1\,J)\,N}\right),$$
and equivalently for the theoretical quantiles $h_{off\Sigma}(q_{z\,\theta})$.
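A compact sketch of the sample co–dispersion functions, together with a numerical check of Lemma 1 on a bivariate Gaussian (where $\sigma_{j\,j} = \sigma_{k\,k} = 1$ and $\sigma_{j\,k} = \rho$, so the population value is $\sqrt{1+\rho}$, free of the tail index); the name `h_off` is ours:

```python
import numpy as np

def h_off(x, tau=0.75):
    """IQR of the 45-degree projection of the standardized pairs, for each (j,k)."""
    med = np.quantile(x, 0.50, axis=0)
    iqr = np.quantile(x, tau, axis=0) - np.quantile(x, 1 - tau, axis=0)
    y = (x - med) / iqr                              # median/IQR standardization
    out = {}
    J = x.shape[1]
    for j in range(J):
        for k in range(j + 1, J):
            z = (y[:, j] + y[:, k]) / np.sqrt(2)     # projection, equation (2)
            out[(j, k)] = np.quantile(z, tau) - np.quantile(z, 1 - tau)
    return out

rng = np.random.default_rng(1)
rho = 0.6
x = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=200_000)
print(h_off(x)[(0, 1)], np.sqrt(1 + rho))            # both close to 1.2649
```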

3.3 The optimization and asymptotic properties

In this section we make use of standard results on a vector of sample quantiles of a random variable (see for instance Cramér, 1946), and of Babu and Rao (1988), who derive the joint asymptotic distribution of marginal quantiles in samples from a random vector (for instance, the asymptotic distribution of the medians of the elements of a random vector).

Recall that $\hat{q}_{j\,N}$ is a vector of sample quantiles from the same marginal distribution. Cramér (1946) shows that $\sqrt{N}(\hat{q}_{j\,N} - q_\theta) \xrightarrow{d} \mathcal{N}(0, \eta_\theta^*)$, where the $(j\,g)$–th element of $\eta_\theta^*$ is
$$\eta_{\theta\,j\,g}^* = \frac{\tau_j \wedge \tau_g - \tau_j\tau_g}{f(F^{-1}(\tau_j))\, f(F^{-1}(\tau_g))}.$$
On the other hand, Babu and Rao (1988) show that the $J \times 1$ vector of sample quantiles $\hat{q}_N = (\hat{q}_{1\,\tau_1\,N}, \ldots, \hat{q}_{J\,\tau_J\,N})'$, where each element is computed from a different marginal distribution, has the following asymptotic distribution: $\sqrt{N}(\hat{q}_N - q_\theta) \xrightarrow{d} \mathcal{N}(0, \eta_\theta^{**})$, where the $(j\,g)$–th element of $\eta_\theta^{**}$ is
$$\eta_{\theta\,j\,g}^{**} = \frac{F_{j\,g}(F_j^{-1}(\tau_j), F_g^{-1}(\tau_g)) - \tau_j\tau_g}{f_j(F_j^{-1}(\tau_j))\, f_g(F_g^{-1}(\tau_g))}$$
and $F_{j\,g}(\cdot)$ is the bivariate cumulative distribution function between the $j$–th and the $g$–th random variables.

We gather $\eta_\theta^*$ and $\eta_\theta^{**}$ into the $J(J+4) \times J(J+4)$ matrix $\eta_\theta$ that is composed of blocks. The diagonal blocks are due to Cramér (1946). Let $\eta_{\theta\,k\,g}^{(r\,r)}$ be the $(k\,g)$–th element of the $r$–th diagonal block. Then
$$\eta_{\theta\,k\,g}^{(r\,r)} = \frac{\tau_k \wedge \tau_g - \tau_k\tau_g}{f_r(F_r^{-1}(\tau_k))\, f_r(F_r^{-1}(\tau_g))}.$$
The off–diagonal blocks are due to Babu and Rao (1988):
$$\eta_{\theta\,k\,g}^{(r\,j)} = \frac{F_{r\,j}(F_r^{-1}(\tau_k), F_j^{-1}(\tau_g)) - \tau_k\tau_g}{f_r(F_r^{-1}(\tau_k))\, f_j(F_j^{-1}(\tau_g))}$$
for $r \neq j$. Let $\hat{\eta}$ be the estimator of $\eta_\theta$.

We gather $h_\mu(\hat{q}_N)$, $h_{diag\Sigma}(\hat{q}_N)$, $h_{off\Sigma}(\hat{q}_{z\,N})$ and $h_\alpha(\hat{q}_N)$ into the $\frac{J(J+5)}{2} \times 1$ vector $h(\hat{q}_N, \hat{q}_{z\,N})$ of functions of quantiles:
$$h(\hat{q}_N, \hat{q}_{z\,N}) = (h_\mu(\hat{q}_N), h_{diag\Sigma}(\hat{q}_N), h_{off\Sigma}(\hat{q}_{z\,N}), h_\alpha(\hat{q}_N)).$$

Note that while $h(\hat{q}_N, \hat{q}_{z\,N})$ is of dimension $J(J+5)/2$, the dimensions of the vectors $\hat{q}_N$ and $\hat{q}_{z\,N}$ are $5J$ and $J(J-1)$ respectively; i.e. $(\hat{q}_N, \hat{q}_{z\,N})$ is of dimension $J(J+4)$. Denote by $\hat{\Omega}$ the $\frac{J(J+5)}{2} \times \frac{J(J+5)}{2}$ sample variance–covariance matrix of $h(\hat{q}_N, \hat{q}_{z\,N})$. Equivalently, we gather the theoretical quantiles into $h(q_\theta, q_{z\,\theta})$. We need assumptions A3, A4 and A5 below, followed by Lemma 2, which shows the asymptotic properties of $h(\hat{q}_N, \hat{q}_{z\,N})$.

A3 (a) $h(q_\theta, q_{z\,\theta})$ is continuously differentiable with respect to $\theta$ for all $x$, and measurable in $x$ for all $\theta \in \Theta$. (b) $\frac{\partial h(q_{\theta_0}, q_{z\,\theta_0})}{\partial (q_{\theta_0}, q_{z\,\theta_0})'}$ is of full column rank. (c) $h(q_\theta, q_{z\,\theta})$ is injective with respect to $\theta$.

A4 (a) $\lim_{N\to\infty} \hat{\eta} = \eta_{\theta_0}$. (b) $\lim_{N\to\infty} \hat{\Omega} = \Omega_{\theta_0}$.

A5 There exists a unique true value $\theta_0$ such that the probabilistic limit of the sample functions of quantiles equals the theoretical ones.

Lemma 2 Under A1–A5,
$$\sqrt{N}\left(h(\hat{q}_N, \hat{q}_{z\,N}) - h(q_{\theta_0}, q_{z\,\theta_0})\right) \xrightarrow{d} \mathcal{N}(0, \Omega_{\theta_0}),$$
where $\Omega_{\theta_0} = \frac{\partial h(q_{\theta_0}, q_{z\,\theta_0})}{\partial (q_{\theta_0}, q_{z\,\theta_0})'}\, \eta_{\theta_0}\, \frac{\partial h(q_{\theta_0}, q_{z\,\theta_0})'}{\partial (q_{\theta_0}, q_{z\,\theta_0})}$ is a symmetric positive definite variance–covariance matrix.

The proof is immediate and follows by applying the delta method. The vector $h(q_\theta, q_{z\,\theta})$ does not have an explicit relation with $\theta$ but it is obtained from the sample quantiles of simulated observations, which we denote by $\tilde{h}(q_\theta, q_{z\,\theta})$. Moreover, we draw $R$ simulated paths and compute the average vector $\tilde{h}^R(q_\theta, q_{z\,\theta}) = \frac{1}{R}\sum_{r=1}^{R} \tilde{h}^r(q_\theta, q_{z\,\theta})$.

The estimation principle is to find the value of the parameters that best matches the sample and theoretical functions of quantiles. This is done by minimizing the quadratic distance between $h(\hat{q}_N, \hat{q}_{z\,N})$ and $\tilde{h}^R(q_\theta, q_{z\,\theta})$:
$$\hat{\theta}_N = \operatorname*{argmin}_{\theta\in\Theta}\; (h(\hat{q}_N, \hat{q}_{z\,N}) - \tilde{h}^R(q_\theta, q_{z\,\theta}))'\, W_\theta\, (h(\hat{q}_N, \hat{q}_{z\,N}) - \tilde{h}^R(q_\theta, q_{z\,\theta})). \qquad (4)$$

The $J(J+5)/2 \times J(J+5)/2$ weighting matrix $W_\theta$ defines the metric and complies with the following assumption.

A6 $W_\theta$ is full rank and symmetric positive definite.

Since it depends on the parameters, we proceed similarly to GMM and Indirect Inference: we optimize (4) with $W_\theta = I$, the identity matrix. The estimated parameters, $\check{\theta}_N$, albeit inefficient, are consistent. Then we replace $\theta$ by $\check{\theta}_N$ in $W_\theta$ and solve
$$\hat{\theta}_N = \operatorname*{argmin}_{\theta\in\Theta}\; (h(\hat{q}_N, \hat{q}_{z\,N}) - \tilde{h}^R(q_\theta, q_{z\,\theta}))'\, W^*_{\check{\theta}_N}\, (h(\hat{q}_N, \hat{q}_{z\,N}) - \tilde{h}^R(q_\theta, q_{z\,\theta})). \qquad (5)$$
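The two–step procedure can be sketched generically as follows. This is a schematic implementation with our own naming: `h_sample` stands for $h(\hat{q}_N, \hat{q}_{z\,N})$, `h_model` maps $\theta$ to $\tilde{h}^R(q_\theta, q_{z\,\theta})$ (simulated with common random numbers across evaluations, so that the objective is smooth), and `estimate_Omega` is a user–supplied estimator of $\Omega$ (e.g. by bootstrap); scipy is assumed.

```python
import numpy as np
from scipy.optimize import minimize

def msq_two_step(h_sample, h_model, theta0, estimate_Omega):
    """Two-step MSQ: first W = I (consistent but inefficient),
    then W* = Omega^{-1} evaluated at the first-step estimate."""
    def objective(theta, W):
        d = h_sample - h_model(theta)
        return d @ W @ d

    m = len(h_sample)
    step1 = minimize(objective, theta0, args=(np.eye(m),), method="Nelder-Mead")
    W_star = np.linalg.inv(estimate_Omega(step1.x))
    step2 = minimize(objective, step1.x, args=(W_star,), method="Nelder-Mead")
    return step2.x
```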

Under Lemma 2 and A6, the estimator $\hat{\theta}_N$ is consistent and asymptotically Gaussian, as shown in the following Theorem.

Theorem Given Lemma 2 and under A6,
$$\sqrt{N}(\hat{\theta}_N - \theta_0) \xrightarrow{d} \mathcal{N}\!\left(0, \left(1 + \frac{1}{R}\right) D_{\theta_0} W_{\theta_0} \Omega_{\theta_0} W_{\theta_0}' D_{\theta_0}'\right),$$
where $D_{\theta_0} = \left(\frac{\partial h(q_{\theta_0}, q_{z\,\theta_0})'}{\partial\theta}\, W_{\theta_0}\, \frac{\partial h(q_{\theta_0}, q_{z\,\theta_0})}{\partial\theta'}\right)^{-1} \frac{\partial h(q_{\theta_0}, q_{z\,\theta_0})'}{\partial\theta}$.

The multilayer sandwich form of the variance–covariance matrix has an intuitive explanation: $\Omega_{\theta_0}$ is the variance–covariance matrix of $h(\hat{q}_N, \hat{q}_{z\,N})$. The first layer $W_{\theta_0} \cdot W_{\theta_0}'$ is the effect of the weighting matrix. The second layer, $D_{\theta_0} \cdot D_{\theta_0}'$, captures the mapping of $h(\hat{q}_N, \hat{q}_{z\,N})$ on $\hat{\theta}_N$. The proof is not shown here as it follows the same lines as in Dominicy and Veredas (2012) and Gouriéroux, Monfort and Renault (1993). The calculation of the asymptotic variance–covariance matrix of $\hat{\theta}_N$ needs an estimator of $\Omega_{\theta_0}$, which is a function of the grid of $\tau$'s and the sparsity function, and is obtained via simulations. The vector of derivatives $\frac{\partial h(q_{\theta_0}, q_{z\,\theta_0})}{\partial\theta'}$ can be computed numerically.

The inverse of the variance–covariance matrix of the quantile functions is the optimal weighting matrix, in the sense that it minimizes the variance–covariance matrix of $\sqrt{N}(\hat{\theta}_N - \theta_0)$. Thus, under $W^*_{\theta_0} = \Omega_{\theta_0}^{-1}$, the estimator has the following asymptotic distribution:
$$\sqrt{N}(\hat{\theta}_N - \theta_0) \xrightarrow{d} \mathcal{N}\!\left(0, \left(1 + \frac{1}{R}\right)\left(\frac{\partial h(q_{\theta_0}, q_{z\,\theta_0})'}{\partial\theta}\, \Omega_{\theta_0}^{-1}\, \frac{\partial h(q_{\theta_0}, q_{z\,\theta_0})}{\partial\theta'}\right)^{-1}\right).$$

3.4 Going fast

For moderate dimensions, the minimization (5) can be expensive in terms of computational time, and for vast dimensions (i.e. hundreds or thousands) it is infeasible. However, there is a way to circumvent almost all the optimizations.

A closer look at the functions of quantiles for the tail index $h_\alpha(\hat{q}_N)$ reveals that they are location and scale free. We can therefore optimize them with respect to $\alpha$ independently of $h_\mu(\hat{q}_N)$, $h_{diag\Sigma}(\hat{q}_N)$ and $h_{off\Sigma}(\hat{q}_{z\,N})$:
$$\hat{\alpha}_N = \operatorname*{argmin}_{\alpha\in\Theta}\; (h_\alpha(\hat{q}_N) - \ddot{h}_\alpha^R(q_\theta))'\, \hat{W}_{\check{\alpha}_N}\, (h_\alpha(\hat{q}_N) - \ddot{h}_\alpha^R(q_\theta)), \qquad (6)$$
where the double dot indicates that the theoretical functions are estimated from simulated observations from a standardized distribution, and so they only depend on $\alpha$. The optimization (6) involves $J$ equations, which is still infeasible for vast dimensions. However, by the properties of the elliptical family, the marginal distributions are also elliptical with the same tail index, meaning that optimization with respect to just one function is enough to obtain a consistent estimator for $\alpha$. Let $h_\alpha(\hat{q}_{j\,N}) = \frac{\hat{q}_{j\,0.95\,N} - \hat{q}_{j\,0.05\,N}}{\hat{q}_{j\,0.75\,N} - \hat{q}_{j\,0.25\,N}}$ and similarly for $\ddot{h}_\alpha^R(q_{\theta_j})$; then

$$\hat{\alpha}_{j\,N} = \operatorname*{argmin}_{\alpha\in\Theta}\; (h_\alpha(\hat{q}_{j\,N}) - \ddot{h}_\alpha^R(q_{\theta_j}))'\, \hat{W}_{\check{\alpha}_N}\, (h_\alpha(\hat{q}_{j\,N}) - \ddot{h}_\alpha^R(q_{\theta_j})).$$

While we know that $\hat{\alpha}_{j\,N}$ converges in probability to $\alpha$, in finite samples the use of the realizations of just one element of the random vector may introduce undesirable biases. To sweep them out, we compute the pooled estimator $\hat{\alpha}_N^{(J)} = \frac{1}{J}\sum_{j=1}^{J} \hat{\alpha}_{j\,N}$. Furthermore, for very large dimensions, the average over $J^* \ll J$ marginals suffices. A sensible choice, but not the unique one, for $J^*$ is $\lfloor \log(J) + 100 \rfloor$; i.e. for dimensions smaller than 100, $J^* = J$, while $J^*$ increases logarithmically for higher dimensions. The Monte Carlo study in the next section confirms the goodness of this methodology.
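A sketch of this fast tail–index step for the Student–t case (the ESD case only changes the simulator). Matching the quantile ratio marginal by marginal requires only a bounded scalar optimization; fixing the seed across evaluations plays the role of common random numbers and keeps the objective smooth in $\alpha$. The search bounds are our own choice:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def quantile_ratio(v):
    q05, q25, q75, q95 = np.quantile(v, [0.05, 0.25, 0.75, 0.95])
    return (q95 - q05) / (q75 - q25)

def ratio_student(alpha, R=20, N=100_000, seed=42):
    """Average simulated ratio for a standardized Student-t (the double-dot h)."""
    rng = np.random.default_rng(seed)
    return np.mean([quantile_ratio(rng.standard_t(alpha, N)) for _ in range(R)])

def alpha_pooled(x, J_star):
    """Univariate estimates on J* marginals, then the pooled average."""
    estimates = []
    for j in range(J_star):
        target = quantile_ratio(x[:, j])
        res = minimize_scalar(lambda a: (target - ratio_student(a)) ** 2,
                              bounds=(2.1, 40.0), method="bounded")
        estimates.append(res.x)
    return np.mean(estimates)
```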

Once the tail index is estimated, the dispersions $\sigma_{j\,j}$ are straightforwardly estimated, following McCulloch (1986):
$$\hat{\sigma}_{j\,j\,N} = \frac{\hat{q}_{j\,\tau\,N} - \hat{q}_{j\,1-\tau\,N}}{\ddot{q}_{j\,\tau\,\hat{\alpha}_N} - \ddot{q}_{j\,1-\tau\,\hat{\alpha}_N}},$$
where $\ddot{q}_{\tau\,\hat{\alpha}_N}$ is the theoretical $\tau$–th quantile of a standardized distribution, similarly to (6). Thus, no optimizations are needed for the dispersion parameters. Likewise, no optimizations are needed to estimate the co–dispersions $\sigma_{j\,k}$. Recall that $Z_{(j\,k)}$ is the projection of the $j$–th and the $k$–th standardized random variables onto the 45–degree line. The dispersion of $Z_{(j\,k)}$ can be computed similarly to above:

$$\hat{\sigma}_{z\,(j\,k)\,N} = \frac{\hat{q}_{(j\,k)\,\tau\,N} - \hat{q}_{(j\,k)\,1-\tau\,N}}{\ddot{q}_{(j\,k)\,\tau\,\hat{\alpha}_N} - \ddot{q}_{(j\,k)\,1-\tau\,\hat{\alpha}_N}},$$
where $\ddot{q}_{(j\,k)\,\tau\,\hat{\alpha}_N}$ is the theoretical $\tau$–th quantile of the standardized distribution of $Z_{(j\,k)}$ and so it only depends on the tail index. The object of interest is, however, the co–dispersion $\hat{\sigma}_{j\,k\,N}$ between $X_j$ and $X_k$. By re–arranging (3) it can be shown that
$$\hat{\sigma}_{j\,k\,N} = \hat{\sigma}_{z\,(j\,k)\,N}^2\, IQR_{\theta_j} IQR_{\theta_k} - \frac{1}{2}\left(\hat{\sigma}_{j\,j\,N}^2 \frac{IQR_{\theta_k}}{IQR_{\theta_j}} + \hat{\sigma}_{k\,k\,N}^2 \frac{IQR_{\theta_j}}{IQR_{\theta_k}}\right),$$
where $IQR_{\theta_j}$ is the theoretical interquantile range for the $j$–th random variable evaluated at $\hat{\sigma}_{j\,j\,N}$ and $\hat{\alpha}_N$, and equivalently for $IQR_{\theta_k}$.
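Both closed forms translate directly into code. A sketch with our own naming, where the standardized quantiles $\ddot{q}$ are assumed to have been simulated beforehand at $\hat{\alpha}_N$:

```python
import numpy as np

def dispersion_hat(xj, q_std_tau, q_std_1mtau, tau=0.75):
    """sigma_jj: sample IQR divided by the standardized-distribution IQR."""
    return ((np.quantile(xj, tau) - np.quantile(xj, 1 - tau))
            / (q_std_tau - q_std_1mtau))

def codispersion_hat(sig2_z, iqr_j, iqr_k, sig_jj, sig_kk):
    """sigma_jk by inverting equation (3)."""
    return (sig2_z * iqr_j * iqr_k
            - 0.5 * (sig_jj**2 * iqr_k / iqr_j + sig_kk**2 * iqr_j / iqr_k))
```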

Last, the locations are also estimated without optimizations: $\hat{\mu}_{j\,N} = \hat{q}_{j\,0.50\,N}$.

To sum up, we only optimize $J^*$ univariate problems for the tail index $\alpha$, while the estimators for the locations, dispersions and co–dispersions are available analytically. This makes the method feasible for vast dimensions and it provides accurate estimators, as shown in the Monte Carlo study.

Since dispersions and co–dispersions are estimated separately, in finite samples it is likely that the estimated dispersion matrix $\hat{\Sigma}_N$ is not positive definite. Though the estimates can be very close to the true values, the small finite sample biases in the elements of the matrix accumulate and, as a result, it may not have full rank. To circumvent this problem one can average the simulated functions of quantiles over $R$ draws, but it is time consuming and for large dimensions it does not ensure positive definiteness.6 An alternative is regularizing by means of the eigenvalue cleaning technique (see Laloux, Cizeau, Bouchaud and Potters, 1999, Tola, Lillo, Gallegati and Mantegna, 2008, and Hautsch, Kyj and Oomen, 2011). Let $\hat{\Gamma}_N = \mathrm{diag}(\hat{\Sigma}_N)^{-1/2}\, \hat{\Sigma}_N\, \mathrm{diag}(\hat{\Sigma}_N)^{-1/2}$ be the estimated standardized dispersion matrix with spectral decomposition $\hat{\Gamma}_N = \hat{Q}_N \hat{\Lambda}_N \hat{Q}_N'$, where $\hat{Q}_N$ is the orthonormal matrix of estimated eigenvectors and $\hat{\Lambda}_N$ is the diagonal matrix of estimated eigenvalues. Let $\hat{\lambda}_{(1)\,N} \geq \ldots \geq \hat{\lambda}_{(J)\,N}$ be the ordered eigenvalues (i.e. $\hat{\lambda}_{(1)\,N}$ is the largest and $\hat{\lambda}_{(J)\,N}$ is the smallest). Eigenvalue cleaning is based on replacing the eigenvalues less than a threshold $\lambda_{max}$ by the average of the positive eigenvalues below $\lambda_{max}$:
$$\tilde{\lambda}_N = \frac{\sum_{l=0}^{L} \max(0, \hat{\lambda}_{(J-l)\,N})}{L+1},$$
where $L+1$ corresponds to the position, from the right, of the largest eigenvalue smaller than $\lambda_{max}$. The resulting estimated standardized dispersion matrix is $\tilde{\Gamma}_N = \hat{Q}_N \tilde{\Lambda}_N \hat{Q}_N'$ and the positive definite estimated dispersion matrix is obtained by un–standardizing $\tilde{\Gamma}_N$: $\tilde{\Sigma}_N = \mathrm{diag}(\hat{\Sigma}_N)^{1/2}\, \tilde{\Gamma}_N\, \mathrm{diag}(\hat{\Sigma}_N)^{1/2}$.

The threshold is given by $\lambda_{max} = \left(1 - \sum_{l=1}^{L^*} \hat{\lambda}_{(l)\,N}/J\right)\left(1 + J/N + 2\sqrt{J/N}\right)$, i.e. it is a function of the $L^*$ largest eigenvalues. The smaller $L^*$, the larger the difference between $\hat{\Sigma}_N$ and $\tilde{\Sigma}_N$. The above mentioned references consider $L^* = 1$ on the grounds that $\hat{\lambda}_{(1)\,N}$ represents the common dispersion. There is however no reason in our case to consider this value. After calibration we have found that $L^* = 10$ is the best compromise.7
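The eigenvalue cleaning step, as described above, is a few lines of linear algebra. A sketch assuming numpy:

```python
import numpy as np

def eigenvalue_clean(Sigma_hat, N, L_star=10):
    """Regularize an estimated dispersion matrix by eigenvalue cleaning."""
    d = np.sqrt(np.diag(Sigma_hat))
    Gamma = Sigma_hat / np.outer(d, d)            # standardized dispersion matrix
    lam, Q = np.linalg.eigh(Gamma)                # eigenvalues in ascending order
    J = len(lam)
    lam_desc = lam[::-1]
    lam_max = ((1.0 - lam_desc[:L_star].sum() / J)
               * (1.0 + J / N + 2.0 * np.sqrt(J / N)))
    small = lam < lam_max                         # the L+1 eigenvalues to replace
    if small.any():
        lam[small] = np.maximum(lam[small], 0.0).mean()
    Gamma_tilde = (Q * lam) @ Q.T                 # Q diag(lam) Q'
    return Gamma_tilde * np.outer(d, d)           # un-standardize
```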

6Detailed results are available upon request. 7Results for the Monte Carlo study and the empirical application with other values of $L^*$ are available upon request.

4 Assessing the finite sample properties

We analyze the finite sample properties of the estimators for the Gaussian, Student–t and ESD.8 We consider dimensions 20, 200 and 2000. For dimension 20 the parameters are estimated with both the optimization (4) and the fast method, while for the 200 and 2000 dimensions we only use the fast method. Locations are set to zero. Tail indexes are set to 3, 10 and 20 for the Student–t, and to 1.5, 1.7 and 1.9 for the ESD. Regarding the dispersion matrices, the top left plots of Figures 3, 4 and 5 display the true ones.

[FIGURE 2 ABOUT HERE]

For each dimension we simulate 10, 100 and 500 draws of 100, 1000 and 5000 observations. Due to the high number of parameters (231, 20301 and 2003001 for 20, 200 and 2000 dimensions respectively) we only present the medians of the estimators for the locations (in the form of line plots) and dispersion matrices (in the form of heat maps). Because of space considerations, we present results just for the fast method, 500 draws and 1000 observations. Results for other configurations, which do not change qualitatively, as well as detailed results (such as the root mean square error and the mean absolute deviation) for all configurations, are available upon request.

[FIGURE 3 ABOUT HERE]

[FIGURE 4 ABOUT HERE]

[FIGURE 5 ABOUT HERE]

Figure 2 shows the results for the location parameters, Figures 3, 4 and 5 show the estimated dispersion matrices, and Table 1 displays the estimated tail indexes and the computational time per draw. The estimated location parameters are all around zero, with a variance that increases with the dimension. This is due to the fact that in this set up the dispersions increase with the dimension (as can be easily understood by looking at the y–axis of the heat maps). The estimated dispersion matrices are in general very close to the true ones, with a slight deterioration as the dimension grows.9 The distributional assumption and the degree of tail thickness do not affect the estimation, as the estimated dispersion matrices are alike.

[TABLE 1 ABOUT HERE]

8Simulating from an ESD requires simulating from a Gaussian and from a totally right skewed standardized univariate stable distribution:
$$A \sim S_{\alpha/2}\!\left(\left(\cos\frac{\pi\alpha}{4}\right)^{2/\alpha}, 1, 0\right),$$
for which we use Chambers et al. (1976). 9The plots of the estimated dispersion matrices in Figure 5 should be taken cautiously since the scale is not the same as in the true dispersion matrix.

The estimated tail indexes are very close to the true ones for all dimensions, with a marginal upper bias of 1.5–2.5% that deserves further research. The table also shows the estimation time per draw (in seconds).10 The maximum computational times are 10 seconds for dimension 20, 1 minute and 15 seconds for the 200 dimensional case, and 70 minutes for dimension 2000. These times show that the method is indeed fast. Substantial computational gains can be obtained if the code is written in C and run in a parallel processing environment.
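For completeness, the ESD simulation of footnote 8 can be sketched as follows. Instead of the general Chambers et al. (1976) algorithm, this sketch swaps in Kanter's representation of a positive $a$–stable variate (valid for $0 < a < 1$, here $a = \alpha/2$), which coincides with the $S_{\alpha/2}((\cos\frac{\pi\alpha}{4})^{2/\alpha}, 1, 0)$ law of footnote 8; the $\sqrt{2}$ scaling is our convention so that the characteristic function matches the one in Section 2.

```python
import numpy as np

def positive_stable(a, size, rng):
    """Kanter's representation: Laplace transform exp(-s**a), 0 < a < 1."""
    U = rng.uniform(0.0, np.pi, size)
    E = rng.exponential(1.0, size)
    return (np.sin(a * U) / np.sin(U) ** (1.0 / a)
            * (np.sin((1.0 - a) * U) / E) ** ((1.0 - a) / a))

def sample_esd(N, alpha, mu, Lam, rng):
    """ESD draws X = mu + sqrt(2A) * Lam * G with G ~ N(0, I), so that
    the cf is exp(i xi' mu) exp(-(xi' Sigma xi)^(alpha/2))."""
    J = len(mu)
    A = positive_stable(alpha / 2.0, N, rng)
    G = rng.standard_normal((N, J))
    return mu + np.sqrt(2.0 * A)[:, None] * (G @ Lam.T)
```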

5 Testing for level contours fit

In many applications the researcher is interested in fitting one, or several, level contours rather than the whole distribution. In financial applications, for instance, the interest lies in the tails. The tests that we propose in this section are based on the failure rate: the percentage of times that the observations are inside one, or several, estimated level contours. If these contours are well specified and estimated, the empirical failure rates should not be statistically different from the nominal rates. To test this hypothesis, we rely on the fact that a vector of empirical failure rates is multinomial, and therefore simple Wald test statistics can be used.

The stochastic representation of the elliptical family of distributions can be re–written as $(X - \mu)'\Sigma^{-1}(X - \mu) = R_\alpha^2$. Testing for the correct distributional assumption boils down to testing for the distributional assumption of $R_\alpha$, which can be done, once we substitute $\mu$ and $\Sigma$ by estimators, with standard goodness–of–fit test statistics (Cramér–von–Mises, Anderson–Darling, and Kolmogorov–Smirnov among others). A different issue is testing for the choice of $R_\alpha$ that best fits certain level contours, in particular those in the tails. For the same $\mu$ and $\Sigma$, what makes one elliptical distribution different from another is the generating variate $R_\alpha$. Let $\hat{\mu}_N$, $\hat{\Sigma}_N$ and $\hat{\alpha}_N$ be the estimators; then:
$$(X - \hat{\mu}_N)'\hat{\Sigma}_N^{-1}(X - \hat{\mu}_N) = R_{\hat{\alpha}_N}^2.$$
For a given value of $R_{\hat{\alpha}_N}^2 = c$, this is the equation of a $J$–dimensional ellipsoid. By varying $c$ we obtain ellipsoids of different sizes. But as they are ellipsoids of a distribution, they have the interpretation of level contours: the constant $c$ is the $\tau$–quantile of $R_{\hat{\alpha}_N}^2$ (which we denote by $q_{\tau,R}$) or the $\tau$–level contour of $X$:
$$(X - \hat{\mu}_N)'\hat{\Sigma}_N^{-1}(X - \hat{\mu}_N) = q_{\tau,R}.$$

Note that $q_{\tau,R}$ can be easily computed with simulations. If the distributional assumption on $R_\alpha$ fits correctly the $\tau$–level contour, then $\tau_0$% (the hypothesized value) of the observations should lie inside the corresponding $\tau_0$–level contour. Let
$$I_l = \begin{cases} 1 & \text{if } (x_l - \hat{\mu}_N)'\hat{\Sigma}_N^{-1}(x_l - \hat{\mu}_N) < q_{\tau_0,R} \\ 0 & \text{otherwise,} \end{cases}$$
for $l = 1, \ldots, N$, and where $x_l$ is a $J \times 1$ vector of observations. Then
$$\hat{\tau}_N = \frac{1}{N}\sum_{l=1}^{N} I_l$$

10All the simulation study and the empirical illustration below were performed on a Sony Vaio with an Intel Core Duo processor of 2.10GHz and 4GB of SDRAM.

is the empirical proportion of observations inside the $\tau$–level contour. We can proceed similarly for a grid $\tau = (\tau_1, \ldots, \tau_b)$ with $\tau_0 = (\tau_{1\,0}, \ldots, \tau_{b\,0})$ hypothesized levels and empirical counterparts $\hat{\tau}_N = (\hat{\tau}_{1\,N}, \ldots, \hat{\tau}_{b\,N})$. Equipped with these vectors, several hypothesis testing scenarios are possible, which are shown in Figure 6 and which we exemplify in terms of financial risk management. The top left plot in Figure 6 shows scenario 1, where the risk manager is interested in fitting correctly just one $\tau$–level contour, say the 95%. The top right plot shows scenario 2, where the risk manager is, instead, interested in fitting correctly several $\tau$–level contours, e.g. $\tau = (0.99, 0.975, 0.95, 0.925, 0.90)$. The bottom plots show two situations where the main interest is one contour but the risk manager would also like to fit reasonably well the neighbouring ones (the precise meaning of "reasonably well" to be explained below). The bottom left shows scenario 3, where the middle $\tau$–level contour is the main concern, and two $\tau$–level contours that are equally spaced from the one of interest have the same importance. The bottom right plot shows a similar case but the relation is asymmetric: outer neighbouring $\tau$–level contours are more important than inner $\tau$–level contours. This is scenario 4.

[FIGURE 6 ABOUT HERE]

Whatever the scenario is, they can all be framed in the following null hypothesis:
$$H_0 : R(\tau - \tau_0) = 0,$$
where $R$ is the identity matrix of size $b \times b$ for scenarios 2, 3 and 4, or a $1 \times b$ vector $R = (0, \ldots, 1, \ldots, 0)$ for scenario 1. The test statistic is a Wald–type that, following Laurent and Veredas (2011), incorporates the parameter uncertainty in $\hat{\mu}_N$, $\hat{\Sigma}_N$ and $\hat{\alpha}_N$:
$$\xi_N = N\, (R(\hat{\tau}_N - \tau_0))'\, \left(R(\Pi_{\theta_0} - B\Psi_{\theta_0}B')R'\right)^{-1} (R(\hat{\tau}_N - \tau_0)) \sim \chi_b^2,$$
where $\Pi_{\theta_0}$ is the variance–covariance matrix of a multinomial (replaced by its estimator $\hat{\Pi}_{i\,i} = \hat{\tau}_{i\,N}(1 - \hat{\tau}_{i\,N})$ and $\hat{\Pi}_{i\,j} = \hat{\tau}_{i\,N} \wedge \hat{\tau}_{j\,N} - \hat{\tau}_{i\,N}\hat{\tau}_{j\,N}$), $\Psi_{\theta_0}$ is the variance–covariance matrix of $\sqrt{N}(\hat{\theta}_N - \theta_0)$, replaced by its estimator $\hat{\Psi}_{\hat{\theta}_N}$, and $B = \lim_{N\to\infty} \frac{\partial \hat{\tau}_N}{\partial\theta'}$. For vast dimensional problems, $\Psi_{\theta_0}$ is very large and the computation of $B$ can be time consuming. An alternative is to bootstrap from $I_l$ and substitute $\hat{\tau}_{i\,N}$ and $\Pi_{\theta_0} - B\Psi_{\theta_0}B'$ by the mean and variance of the bootstrapped $\tau$'s.

Scenarios 3 and 4 are special, as the main interest is one $\tau$–level contour while the neighbouring ones are of minor concern, but they are taken into account. In other words, the main interest is fitting one $\tau$–level contour with a high degree of precision, or intolerance, while the fit of neighbouring contours can be less precise (with increasing tolerance as they are further from the one of interest). From a statistical viewpoint, increasing tolerance translates into increasing size, or probability of type I error. Let $\zeta_0$ be the size for the $\tau$–level contour of interest, and let $\zeta = \zeta_0 w$ be the vector of sizes for all the considered contours, where $w = (w_{-(b-1)/2}, \ldots, w_0, \ldots, w_{(b-1)/2})$ is a vector of weights (larger than one except $w_0 = 1$) that governs the increase in the sizes of the neighbouring contours. A sensible pattern for the weights is
$$w_k = \begin{cases} 1 + \left(\frac{c_r}{\zeta_0} - 1\right)(1 - \exp(-k/s_r)) & \text{for } k > 0 \\[4pt] 1 + \left(\frac{c_l}{\zeta_0} - 1\right)(1 - \exp(k/s_l)) & \text{for } k \leq 0, \end{cases}$$
where $(c_r, s_r)$ and $(c_l, s_l)$ are tuning parameters that control (to the right and left side of $\zeta_0$ respectively) the degree of asymmetry and the maximum tolerance. Their bounds, for $k \leq 0$, are

$$\zeta_0 \leq c_l \leq \frac{1 - \zeta_0 \exp(k/s_l)}{1 - \exp(k/s_l)}, \qquad 0 < s_l < +\infty.$$

And equivalently for $c_r$ and $s_r$ (but for $k > 0$ and substituting $k$ by $-k$). This is therefore a Bonferroni–type of multiple testing problem for $b$ hypotheses with different sizes. Since the critical regions of the hypotheses may not be independent, it is difficult to compute the level of the test statistic $\xi_N$ under the null (denoted by $\zeta$).11

[TABLE 2 ABOUT HERE]

We know however that the upper bound is given by $\zeta \leq \sum_{i=1}^{b} \zeta_i = \zeta_0 \sum_{i=1}^{b} w_i$, and that the difference with the case of independence is small for reasonable values of $b$. Table 2 shows the values of the sizes in the cases of independence and dependence among the hypotheses. Note that, for instance, $b = 9$ means that there are 4 neighbouring contours on each side. Though differences increase with $\zeta_0$, for small and moderate numbers of neighbouring contours they are minor.
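A sketch of the test for scenario 2 ($R = I$), using simulation–based quantiles of $R^2_{\hat{\alpha}_N}$ and the multinomial covariance; for brevity the parameter–uncertainty correction $B\Psi_{\theta_0}B'$ is omitted, which corresponds to treating the parameters as known. The naming is ours; numpy and scipy are assumed.

```python
import numpy as np
from scipy.stats import chi2

def contour_test(x, mu_hat, Sigma_hat, q_R, tau0):
    """Wald test of H0: tau_hat = tau0 for the level contours whose
    R^2-quantiles q_R (one per element of tau0) come from simulation."""
    N = x.shape[0]
    xc = x - mu_hat
    d2 = np.einsum('ij,jk,ik->i', xc, np.linalg.inv(Sigma_hat), xc)  # Mahalanobis^2
    tau_hat = np.array([(d2 < q).mean() for q in q_R])
    # Multinomial-type covariance: Pi_ij = min(tau_i, tau_j) - tau_i * tau_j
    Pi = np.minimum.outer(tau_hat, tau_hat) - np.outer(tau_hat, tau_hat)
    diff = tau_hat - np.asarray(tau0)
    xi = N * diff @ np.linalg.solve(Pi, diff)
    return xi, chi2.sf(xi, df=len(tau0))
```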

6 Illustration

We illustrate the method with 9 years of daily returns of 22 major worldwide market indexes that represent three geographical areas: America (S&P500, NASDAQ, TSX, Merval, Bovespa and IPC), Europe and Middle East (AEX, ATX, FTSE, DAX, CAC40, SMI, MIB and TA100), and East Asia and Oceania (HgSg, Nikkei, StrTim, SSEC, BSE, KLSE, KOSPI and AllOrd). This is the same data as used in Dominicy and Veredas (2012). We refer to Table 4 of this article for further details. The sample consists of 2536 observations, from January 4, 2000 to September 22, 2009.

[FIGURE 7 ABOUT HERE]

The volatility behavior is very heterogeneous. Some countries display strong clustering while others present large deviations but no clusters. It is known (de Vries, 1991, Ghose and Kroner, 1995) that heavy tails generated by volatility clustering can be mistakenly interpreted as evidence in favor of unconditional stable distributions. To safeguard against conditional volatility (and possible mean reversion), and following standard practice, we adjust each return series with an AR(2)–GARCH(1,1) model such that the remaining dispersion is not conditional.12

Admittedly, the illustration is subject to certain criticisms. First, the tail index is the same for all countries. Dominicy and Veredas (2012) show that indeed this is not the case, ranging (for

11If the hypotheses were independent, $\zeta = 1 - \prod_{i=1}^{b}(1 - \zeta_0 w_i)$. 12If we adjust jointly with a VAR(2)–CCC model, the dispersion matrix of the adjusted returns is the identity matrix.

the ESD) between 1.54 and 1.93, but when plotted with the 5% confidence bands, the constraint of a single tail index does not seem unreasonable. Second, the data are skewed, yet we do not allow for asymmetries. Third, we assume that the AR(2)–GARCH(1,1) adjustment gets rid of all the dependence. This is standard in empirical finance and there is no evidence that dependencies beyond the location and dispersion are a stylized fact. Fourth, we consider constant co–dispersion. To check this, Figure 7 shows the sample correlations for a rolling window of five years: 2000-2004, 2001-2005, 2002-2006, 2003-2007, 2004-2008, 2005-2009. Markets are ordered by geographical areas and from top left to bottom right as explained in the first paragraph of this section. The American and European markets are strongly related, which we qualify as the Atlantic block, and the North American and European markets are more related within than between each other (except the Austrian market ATX). By contrast, the East Asia and Oceania markets are not very related, neither within each other –except Hang Seng and Nikkei– nor with the Atlantic block. Though there is an evolution in the correlations due to the integration of markets worldwide, the assumption of constant co–dispersion is plausible. In any case, this estimation exercise is purely illustrative and an application in a dynamic and asymmetric context is beyond the scope of this paper, though it is a research avenue worth investigating, as explained in the conclusions.
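The de–volatilization step can be sketched with a plain Gaussian QML fit of a GARCH(1,1) on demeaned returns; the AR(2) mean equation is omitted for brevity, and the starting values are our own choice.

```python
import numpy as np
from scipy.optimize import minimize

def garch11_variance(r, omega, a, b):
    """Conditional variance recursion of a GARCH(1,1)."""
    h = np.empty_like(r)
    h[0] = r.var()
    for t in range(1, len(r)):
        h[t] = omega + a * r[t - 1] ** 2 + b * h[t - 1]
    return h

def devolatilize(r):
    """Gaussian QML estimation and standardized residuals r_t / sqrt(h_t)."""
    r = r - r.mean()
    def nll(p):
        omega, a, b = p
        if omega <= 0 or a < 0 or b < 0 or a + b >= 1:
            return np.inf
        h = garch11_variance(r, omega, a, b)
        return 0.5 * np.sum(np.log(h) + r**2 / h)
    p = minimize(nll, [0.05 * r.var(), 0.05, 0.90], method="Nelder-Mead").x
    return r / np.sqrt(garch11_variance(r, *p))
```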

[FIGURE 8 ABOUT HERE]

The parameter space is 275–dimensional for the Gaussian distribution and 276–dimensional for the Student–t and ESD. We do not show all the results (available upon request). Instead we focus on the dispersion matrices and the tail indexes. Figure 8 shows, in the form of heat maps, the estimated standardized dispersion matrices (i.e. main diagonal equal to one) under Gaussianity (top), Student–t (middle) and ESD (bottom), prior and posterior to the eigenvalue correction (left and right columns respectively). The dispersions present important clustering. Yet, there are differences between the three distributions. For the Student–t and ESD they are lower than for the Gaussian. This is due to the fact that the tail indexes explain the extreme dispersion. Indeed, the estimated tail indexes are 9.51 and 1.76 for the Student–t and the ESD respectively. The eigenvalue cleaning has an effect on the estimates, decreasing slightly their values.

[TABLE 3 ABOUT HERE]

Table 3 shows the test statistics for individual testing of the 0.90, 0.95 and 0.99 $\tau$–level contours (gathered under the legend Individual), as well as the statistics for the null hypothesis of $\tau$ equal to 0.95 and 0.99 when information on the neighbouring level contours is taken into account. The tuning parameters are $c_r = c_l = 0.06$ and $s_r = s_l = 20$. Top and bottom panels show the results of the test statistics when $\Pi_{\theta_0} - B\Psi_{\theta_0}B'$ is computed exactly and with bootstrap respectively. The sub–panels Plain and EV (for eigenvalue) are for results prior and posterior to the eigenvalue cleaning. Numbers in regular and small fonts are the test statistics $\xi_N$ and their p–values respectively (p–values in bold are those larger than 0.01). Results show that the Gaussian distribution never fits the level contours correctly, while the Student–t and the ESD do it reasonably well except for $\tau = 0.99$, and all the distributions struggle in the weighted tests. The eigenvalue cleaning affects the results since it modifies the shape of the ellipsoid, as seen above. A further investigation of this issue is worth exploring.

7 Conclusions

In this article we propose a fast way to do inference in vast dimensional elliptical distributions. Since it is based on quantiles, there is no need for existence of moments or an analytic form of the density function. The contributions of this paper are three. The first one is a function of quantiles that is informative about the co–dispersions and that is based on the following idea: if two centered and scaled random variables co–move, most of the times the pairs of observations have the same sign, so their projection onto the 45–degree line has large dispersion, and vice–versa if they anti–move. Therefore the interquantile range of the projection is informative about the co–dispersion between the two random variables. The second contribution is that we are able to overcome almost all the optimizations, making the estimation method suitable for vast dimensional distributions. The last contribution is a battery of tests for the correct specification of level contours. Several extensions for future research are possible, such as time–varying dispersion matrices and time dependencies.

References

[1] Babu, G.J. and Rao, C.R. (1988) Joint asymptotic distribution of marginal quantiles and quantile functions in samples from a multivariate population. Journal of Multivariate Analysis 27, 15-23.

[2] Bonato, M. (2012) Modeling fat tails in stock returns: a multivariate Stable–GARCH approach. Computational Statistics, forthcoming.

[3] Cambanis, S., Huang, S. and Simons, G. (1981) On the theory of elliptically contoured distributions. Journal of Multivariate Analysis 11, 368-385.

[4] Chambers, J.M., Mallows, C.L. and Stuck, B.W. (1976) A method for simulating stable random variables. Journal of the American Statistical Association 71, 340-344. Corrections 82 (1987): 704, 83 (1988): 581.

[5] Cramér, H. (1946) Mathematical methods of statistics. Princeton, N.J.: Princeton University Press.

[6] de Vries, C.G. (1991) On the relation between GARCH and stable processes. Journal of Econometrics 48, 313-324.

[7] Dominicy, Y. and Veredas, D. (2012) The method of simulated quantiles. Journal of Econometrics forth- coming.

[8] Fama, E. and Roll, R. (1968) Some properties of symmetric stable distributions. Journal of the American Statistical Association 63, 817-836.

[9] Fama, E. and Roll, R. (1971) Parameter estimates for symmetric stable distributions. Journal of the American Statistical Association 66, 331-338.

[10] Fang K.T., Kotz, S. and Ng, K.W. (1990) Symmetric multivariate and related distributions. New York: Chapman and Hall.

[11] Frahm, G. (2004) Generalized elliptical distributions. PhD thesis, University of Cologne.

[12] Ghose, D. and Kroner K.F. (1995) The relationship between GARCH and symmetric stable processes: Finding the source of fat tails in financial data, Journal of Empirical Finance 2, 225-251.

[13] Gonzalez–Rivera, G., Senyuz, Z. and Yoldas E. (2011) Autocontours: Dynamic Specification Testing, Journal of Business and Economic Statistics 29, 186-200.

[14] Gonzalez–Rivera, G. and Yoldas, E. (2011) Autocontour–based Evaluation of Multivariate Predictive Densities, International Journal of Forecasting, forthcoming.

[15] Gouriéroux, C., Monfort, A. and Renault, E. (1993) Indirect Inference. Journal of Applied Econometrics 8, 85-118.

[16] Hallin, M., Paindaveine, D. and Oja, H. (2006) Semiparametrically efficient rank-based inference for shape. II. Optimal R-estimation of shape. Annals of Statistics 34, 2757-2789.

[17] Hautsch, N., Kyj, L.M. and Oomen, R.C. (2011) A blocking and regularization approach to high dimensional realized covariance estimation, Journal of Applied Econometrics, forthcoming.

[18] Hill, J.B. and Renault, E. (2012) Generalized Method of Moments with Tail Trimming. University of North Carolina at Chapel Hill, mimeo.

[19] Kelker, D. (1970) Distribution theory of spherical distributions and a location-scale parameter generalization. Sankhya A 32, 419-430.

[20] Laloux, L., Cizeau, P., Bouchaud, J-P. and Potters, M. (1999) Noise dressing of financial correlation matrices, Physical Review Letters 83, 1467-1470.

[21] Laurent S. and Veredas, D. (2011) Testing Conditional Asymmetry. A Residual-Based Approach, Journal of Economic Dynamics and Control, forthcoming.

[22] Lombardi, M.J. and Veredas, D. (2009) Indirect inference of elliptical fat tailed distributions. Computa- tional Statistics and Data Analysis 53, 2309-2324.

[23] McCulloch, J.H. (1986) Simple consistent estimators of stable distribution parameters. Communications in Statistics - Simulation and Computation 15(4), 1109-1136.

[24] Nolan, J.P. (2010) Multivariate elliptically contoured stable distributions: theory and estimation. Mimeo.

[25] Samorodnitsky, G. and Taqqu, M.S. (1994) Stable non-gaussian random processes: Stochastic models with infinite variance. Chapman & Hall/CRC, Stochastic Modeling.

[26] Tola, V., Lillo, F., Gallegati, M. and Mantegna, R. (2008) Cluster analysis for portfolio optimization, Journal of Economic Dynamics and Control 32, 235-258.

[27] Tyler, D.E. (1987) A distribution-free M-estimator of multivariate scatter. Annals of Statistics 15, 234-251.

Tables and Figures

Figure 1: A diagrammatic representation of the function of quantiles for the co–dispersion

[Scatter plots: (a) Positive co–dispersion, (b) Negative co–dispersion; horizontal axis $X_j$, vertical axis $X_i$, with the projections $Z_{(i\,j)}$ marked on the 45–degree line]

The left plot shows a situation where $X_i$ and $X_j$ (the pairs are represented by the circles) are positively related. This results in a random variable $Z_{(i\,j)}$ (represented by the squares), their projection onto the 45–degree line, that is very dispersed. By contrast, the right plot shows the situation where $X_i$ and $X_j$ are negatively related, which results in a projection $Z_{(i\,j)}$ that is little dispersed.

Table 1: Estimated tail indexes and computational time

                        Dim. 20            Dim. 200           Dim. 2000
                    α̂N       Time     α̂N^(J*)    Time     α̂N^(J*)    Time
Gaussian             –        1.18       –        47.77      –        2774
ESD α = 1.5        1.525      7.540    1.517      63.64    1.529      3080
ESD α = 1.7        1.731      5.713    1.723      70.92    1.736      3263
ESD α = 1.9        1.918      5.197    1.928      71.27    1.936      3279
Student–t α = 3    3.074     10.05     3.075      72.34    3.073      4209
Student–t α = 10  10.26       7.084   10.25       75.23   10.24       4016
Student–t α = 20  20.51       6.580   20.59       72.17   20.49       4510

Computational time (denoted by Time) measured in seconds per draw.

Figure 2: Locations

[Line plots of the estimated locations: (a) Dim. 20, (b) Dim. 200, (c) Dim. 2000]

Estimated location parameters for the Gaussian (solid line), ESD with α = 1.7 (dotted line) and Student–t with α = 10 (dashed line) for dimensions 20, 200 and 2000.

Table 2: Sizes and upper bounds of ξN

        ζ0 = 0.01          ζ0 = 0.05
  b   Indep.   Bound    Indep.   Bound
  3    0.03     0.03     0.15     0.16
  5    0.07     0.07     0.25     0.27
  9    0.14     0.16     0.42     0.54
 13    0.23     0.26     0.57     0.82
 17    0.33     0.39     0.69     1.12
 21    0.42     0.53     0.77     1.45

Indep. denotes the sizes for ξN in the case of independence, while Bound denotes the upper bounds when the b hypotheses are not independent. The table considers two sizes for the hypothesis of interest: (i) ζ0 = 0.01 with cr = cl = 0.02 and sr = sl = 10, and (ii) ζ0 = 0.05 with cr = cl = 0.06 and sr = sl = 10. The first column denotes b; the number of neighbouring contours is b − 1.

Figure 3: Dispersion matrices: Dimension 20

[Heat maps of the true and estimated dispersion matrices: (a) True, (b) Gaussian, (c) ESD α = 1.5, (d) ESD α = 1.7, (e) ESD α = 1.9, (f) Student–t α = 3, (g) Student–t α = 10, (h) Student–t α = 20]

Figure 4: Dispersion matrices: Dimension 200

[Heat maps of the true and estimated dispersion matrices: (a) True, (b) Gaussian, (c) ESD α = 1.5, (d) ESD α = 1.7, (e) ESD α = 1.9, (f) Student–t α = 3, (g) Student–t α = 10, (h) Student–t α = 20]

Figure 5: Dispersion matrices: Dimension 2000

[Heat maps of the true and estimated dispersion matrices: (a) True, (b) Gaussian, (c) ESD α = 1.5, (d) ESD α = 1.7, (e) ESD α = 1.9, (f) Student–t α = 3, (g) Student–t α = 10, (h) Student–t α = 20]

Figure 6: Hypothesis testing scenarios

[Level contour plots in the (Xj, Xk) plane: (a) Scenario 1, (b) Scenario 2, (c) Scenario 3, (d) Scenario 4]

(a) Scenario 1: the interest lies in fitting correctly just one τ–level contour. (b) Scenario 2: the interest lies in fitting correctly several τ–level contours. (c) Scenario 3: the main interest is one contour while the neighbouring ones are of minor interest. At the same time, the adjacent contours lose relevance as they get further from the one of interest. (d) Scenario 4: similar case to (c) but the relation is asymmetric, i.e. outer neighbouring τ–level contours are more important than inner τ–level contours.

Figure 7: Sample correlations

[Heat maps of rolling-window correlations: (a) 2000-2004, (b) 2001-2005, (c) 2002-2006, (d) 2003-2007, (e) 2004-2008, (f) 2005-2009]

Heat maps of the sample correlations for the rolling windows 2000-2004, 2001-2005, 2002-2006, 2003-2007, 2004-2008, 2005-2009. Markets are ordered by geographical areas and from top left to bottom right: America (S&P500, NASDAQ, TSX, Merval, Bovespa and IPC), Europe and Middle East (AEX, ATX, FTSE, DAX, CAC40, SMI, MIB and TA100), and East Asia and Oceania (HgSg, Nikkei, StrTim, SSEC, BSE, KLSE, KOSPI and AllOrd).

Figure 8: Estimated standardized dispersion matrices

[Heat maps: (a) Gaussian, (b) Gaussian – Eigenvalue cleaning, (c) Student–t, (d) Student–t – Eigenvalue cleaning, (e) ESD, (f) ESD – Eigenvalue cleaning]

The top, middle and bottom plots show the heat maps of the standardized dispersion matrices under Gaussianity, Student–t and ESD respectively, prior and posterior to the eigenvalue cleaning (left and right columns respectively). Markets are ordered by geographical areas and from top left to bottom right: America (S&P500, NASDAQ, TSX, Merval, Bovespa and IPC), Europe and Middle East (AEX, ATX, FTSE, DAX, CAC40, SMI, MIB and TA100), and East Asia and Oceania (HgSg, Nikkei, StrTim, SSEC, BSE, KLSE, KOSPI and AllOrd).

Table 3: Testing results

                              Individual                   Weighted
                       0.90      0.95      0.99      0.95      0.99
Exact
  Plain
    Gaussian          65.94     64.64     110.1     114.6     105.4
                      0.000     0.000     0.000     0.000     0.000
    Student–t         37.82     13.62     5.492     39.43     2.425
                      0.000     0.000     0.193     0.000     0.279
    ESD               1.185     1.754     166.7     167.3     187.5
                      0.447     0.386     0.000     0.000     0.000
  EV
    Gaussian          75.40     114.8     149.9     151.8     144.8
                      0.000     0.000     0.000     0.000     0.000
    Student–t         19.01     21.22     0.005     31.59     5.071
                      0.000     0.000     0.943     0.000     0.167
    ESD               1.893     0.973     193.2     229.6     166.2
                      0.269     0.724     0.000     0.000     0.000
Bootstrap
  Plain
    Gaussian          72.06     71.60     67.69     130.1     224.9
                      0.000     0.000     0.000     0.000     0.000
    Student–t         8.671     0.218     9.987     40.23     11.09
                      0.029     0.641     0.015     0.000     0.006
    ESD               9.201     1.617     275.4     129.8     139.7
                      0.019     0.403     0.000     0.000     0.000
  EV
    Gaussian          1.7e5     1.8e5     1.3e5     4.7e5     3.6e5
                      0.000     0.000     0.000     0.000     0.000
    Student–t         315.2     428.6     186.5     931.2     544.8
                      0.000     0.000     0.000     0.000     0.000
    ESD               1537      4.822     292.5     1834      344.6
                      0.000     0.147     0.000     0.000     0.000

Top and bottom panels show the results of the test statistics when Πθ0 − BΨθ0 B′ is computed exactly and with bootstrap respectively. The sub–panels Plain and EV (for eigenvalue) are for results prior and posterior to the eigenvalue cleaning. For each distribution, the first row is the test statistic ξN and the second row its p–value (p–values in bold in the original are those larger than 0.01).
