
Variance Targeting for Heavy Tailed ...
June 2012 Conference "Advances in ...", Yale SOM

Jonathan Hill* and Eric Renault**
* University of North Carolina at Chapel Hill
** Brown University

OUTLINE
1. Variance targeting in GARCH models
2. Sample variance with tail trimming
3. QMLE with (tail-trimmed) variance targeting = solution to the infinite unconditional kurtosis problem
4. Trimming orthogonality conditions for GMM = solution to the infinite conditional kurtosis problem
5. Monte Carlo study

1. Variance Targeting in GARCH Models

• Example 1: GARCH(1,1)

1/ 2 yt1  ht1   t1 2 ht1   (yt1)  ht ,  0,   0,      1 Var(y )     t 1     new parameterization :   ( , ),  (,) Variance Targeting: • Idea = direct estimation of unconditional variance   common practice: 1 T ˆ 2 (i)  = estimated by sample variance:  T   yt T t1 (ii) (Q)MLE = applied to remaining parameters after plugging in sample variance: 2 2 ht1   yt1  ht   yt1  (1 )ht 2 2 2  ˆt1,T ( )  ˆT yt1  (1 )ˆt,T ( ),   (,). Cost-Benefit of variance targeting • Benefits: (i) Better finite sample performance (especially for estimation of ). (ii) Robustness to misspecification of variance equation. • Costs: (i) loss at least if QMLE = MLE (ii) For asymptotic normality, requires a much more restrictive assumption than QMLE Not only E  4   (and Eh     )  t1  t1 4 2 4 But also Eyt1  Eht1 t1    Var(ht1)   Does unconditional kurtosis exist?

Does unconditional kurtosis exist?
• Existence of the unconditional fourth moment of the stochastic process generating financial returns = a maintained interest for researchers:
• He and Teräsvirta (1999, ET), "Fourth Moment Structure of the GARCH(p,q) Process"
R1. Existence "would enable one to see how well the kurtosis and autocorrelations (of squared returns) implied by the estimated model match the estimates obtained directly from the data"
R2. Existence = far from certain in case of volatility persistence
GARCH(1,1) case (see He and Teräsvirta for the general GARCH(p,q)):

1/ 2 yt1  ht1   t1 2 4 Et ( t1)  0, Et ( t1) 1, Et ( t1)  4 , 2 ht1   (yt1)  ht . 4 2 2 E(yt1)    4  2   1. Conditionally normal case:

4 2 2 E(yt1)    2  (  ) 1. Pareto Tail Index  • Basrak, Davis and Mikosch, SPA, 2002 “Regular variation of GARCH processes”

1/ 2 yt1  ht1   t1

2 a ht1   (yt1)  ht ,  0,   0,  2  / 2 P yt  c c ,c    Et    1 p    1   2, Eyt      p     1 Var(y )  t 1   R3. Hill’s tail index estimator for daily log- returns over the period 2001-2011: 90% confidence band: SP 500 and NASDAQ : [2,3] DAX: [2, 3.5] NIKKEI : [1.8, 2.8] Conclusion: (i) Variance should be finite ( not compelling for Nikkei!) (ii)(Unconditional) moment of order 3 may exist (iii)Hard to find a series that appears to have finite unconditional moment of order 4

[Figure: Hill plots for daily log-returns on the S&P 500 (2513 days from Jan 1st, 2001 to Jan 1st, 2011)]

Multivariate examples:
• Example 1: DCC-GARCH(1,1): univariate GARCH for each asset + dynamic conditional correlations:

$$y_{i,t} = h_{ii,t}^{1/2}\,\varepsilon_{i,t}$$
⟹ conditional correlation matrix of the standardized returns:

Qt  conditional variance matrix of  t  ( i,t )1in  correlation targeting in the DCC equation :

Qt  (1   )Q  t t 'Qt1  no additional problem under 4 maintained assumption E i,t   ,i. Variance targeting and parsimony • DCC example  Once the unconditional individual and correlation matrix is estimated : only individual GARCH + two more parameters  and   saves N = n(n+1)/2 parameters. • Example 2: VEC-GARCH(1,1)

1/ 2 yt  Ht  t ,ht  vech(Ht ), t  vech( t t ')

h  c  A  Gh , t t1 t1  vech Var(y )  [Id  A  G]1c.  t  N  Variance targeting still requires unconditional finite (matricial) kurtosis 2. Sample variance with tail trimming • Univariate case: 1 T ˆ 2 T  0  T   yt   Var(yt )   T t1 0 4 BUT : T ˆT   asymptotically normal if E(yt )  . Key idea: Tail-Trimming of returns before computing sample variance ˆ( y) ( y) IT ,t  0  yt is one of the kT largest

observations among y , 1,...,T. 1 T ˆ( y) ˆ(tr) 2 ˆ( y) IT ,t 1 otherwise, T   yt IT ,t T t1 ( y) kT   promotes Gaussian asymptotics ( y) (tr) 0 kT /T  0 ensures unbiasedness : ˆT    Var(yt ) Asymptotic distribution of estimator of unconditional variance in GARCH: 1st case: Without trimming (with finite kurtosis): Horvath, Kokoszka,Zitikis, JFEC,2006: 1 T 1 T 1 T ˆ 2 2  T   yt  ht ( t 1)  ht T t1 T t1 T t1 0 T 0 1  1 2    0 ht ( t 1) oP (1/ T )  T t1 2  T  1  0   v2  Var Tˆ  Var y 2  T  E h2 Var  2 T  T   t   0   t   t   t1     2nd case: With trimming:
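A minimal Python sketch of this tail-trimmed sample variance (our own illustration; the trimming rule is the one just defined):

```python
# Tail-trimmed sample variance: zero out the k observations of largest |y_t|.
import numpy as np

def trimmed_variance(y, k):
    """Sample variance of mean-zero returns y with the k largest |y_t|
    removed from the sum; note the divisor stays T, not T - k."""
    T = len(y)
    keep = np.ones(T, dtype=bool)
    keep[np.argsort(np.abs(y))[-k:]] = False   # drop the k largest |y_t|
    return np.sum(y[keep] ** 2) / T

# k_T grows with T but k_T / T -> 0, e.g. k_T = int(T ** 0.35) as in the
# Monte Carlo study below (about 11 observations when T = 1000).
```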

 T  2 ˆ(tr) 2 ( y) vT  VarT T  Var yt I yt  cT   t1  2 v2 1  0  Lim T    E h2 Var  2 if E y 4   T   0   t   t   t  T    v2 Lim T   if Ey 4  , BUT v2  o(T 2 ) T  T t T

⟹ Same asymptotic distribution if finite kurtosis
⟹ More involved in the general case because, by trimming, we lose the mds property.
Pareto tails:
$$\frac{k_T^{(y)}}{T} \approx P\big(|y_t| > c_T^{(y)}\big) = d\,(c_T^{(y)})^{-\kappa}, \qquad 2 < \kappa \le 4$$
$$c_T^{(y)} = \Big(d\,\frac{T}{k_T^{(y)}}\Big)^{1/\kappa} \to \infty \ \text{ as } \ \frac{k_T^{(y)}}{T} \to 0$$
Feller (1971) (regularly varying functions):

$$E\big[y_t^4\,\hat I_{T,t}^{(y)}\big] = (c_T^{(y)})^{4-\kappa}\,L(c_T^{(y)}), \qquad L(x) = o(x^{\epsilon})\ \ \forall\,\epsilon > 0$$
Corollary 1: $\kappa > 2$ and $k_T^{(y)} \to \infty$ ⟹ $E\big[y_t^4\,\hat I_{T,t}^{(y)}\big] = o(T)$
Corollary 2 (with geometric β-mixing): long-term variance matrix = o(T):
$$v_T^2 = \operatorname{Var}\Big(\sum_{t=1}^T y_t^2\,\hat I_{T,t}^{(y)}\Big) = o(T^2), \qquad v_T = o(T).$$
BUT:
$$E(y_t^4) < \infty \ \Rightarrow\ \frac{v_T^2}{T} \to b\,\big(E(\varepsilon_t^4) - 1\big), \qquad b = E(h_t^2)$$
R. Geometric β-mixing for y = implied by α + β < 1 when ε has an absolutely continuous distribution.
Asymptotic distribution of tail-trimmed estimators of the unconditional variance:
Long-term variance matrix = o(T) ⟹ promotes asymptotic normality of the tail-trimmed sample variance:
$$\frac{T}{v_T}\big(\hat\sigma_T^{(tr)} - \sigma_0^2\big) \to_d N(0, 1)$$

insofar as:
$$\frac{T}{v_T}\,E\big[y_t^2\,(1 - \hat I_{T,t}^{(y)})\big] \to 0.$$
Always true if: $\sqrt T\,E\big[y_t^2\,(1 - \hat I_{T,t}^{(y)})\big] \to 0$.
⟹ Lighter trimming for heavier tails (feasible from an estimate of the tail index)

1st improvement of the tail-trimmed sample variance
• Peng (2001): "Estimating the mean of a heavy tailed distribution", Statistics & Probability Letters
⟹ Characterizes the systematic finite sample bias due to trimming:
$$\hat\sigma_T^{(tr)} = \frac{1}{T}\sum_{t=1}^T y_t^2\,\hat I_{T,t}^{(y)} \quad \text{biased by:}$$
$$E\big[y_t^2\,(1 - \hat I_{T,t}^{(y)})\big] \approx \frac{\kappa}{\kappa - 2}\,(c_T^{(y)})^2\,P\big(|y_t| > c_T^{(y)}\big) \approx \frac{\kappa}{\kappa - 2}\,(c_T^{(y)})^2\,\frac{k_T^{(y)}}{T}$$
This bias can be estimated by:
$$\hat R_T^* = \frac{k_T^{(y)}}{T}\cdot\frac{\hat\kappa}{\hat\kappa - 2}\cdot\big(y_{k_T^{(y)}}^{(a)}\big)^2$$
$\hat\kappa$ = Hill estimator based on the $k_T^{(y)}$ largest absolute values:
$$\frac{1}{\hat\kappa} = \frac{1}{k_T^{(y)}}\sum_{i=1}^{k_T^{(y)}} \log\Big(\frac{y_i^{(a)}}{y_{k_T^{(y)}}^{(a)}}\Big)$$
$y_{k_T^{(y)}}^{(a)}$ = $k_T^{(y)}$-th component in the order statistics of the absolute values:
$$y_1^{(a)} \ge y_2^{(a)} \ge \dots \ge y_T^{(a)}.$$
⟹ Improved estimator of the unconditional variance based on Peng (2001):
$$\hat\sigma_T^* = \hat\sigma_T^{(tr)} + \hat R_T^*$$
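A minimal Python sketch of the Hill estimator and the implied Peng-type correction (our own illustration; it presupposes a tail index above 2):

```python
# Hill tail-index estimate from the k largest |y_t| and the Peng (2001)
# bias correction for the squared-return mass removed by trimming.
import numpy as np

def hill_and_peng(y, k):
    T = len(y)
    a = np.sort(np.abs(y))[::-1]                       # |y| in decreasing order
    kappa_hat = 1.0 / np.mean(np.log(a[:k] / a[k - 1]))
    correction = (k / T) * kappa_hat / (kappa_hat - 2.0) * a[k - 1] ** 2
    return kappa_hat, correction

# bias-corrected estimator: trimmed_variance(y, k) + hill_and_peng(y, k)[1]
```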

Pros and cons of Peng's improvement
• Pro 1: Avoids a systematic finite sample under-estimation of the unconditional variance
• Pro 2: Provides asymptotic normality without the problematic maintained assumption:

T 2 ˆ( y) 0 Eyt IT   0. vT Con: The correction term is not asymptotically independent from the main term * (tr) ˆ * ˆT  ˆT  RT (tr)  asymptotic distribution of ˆT  We propose a new estimator: by under-correction for the gap, we obtain an asymptotically normal estimator with the same distribution limit as the initial one: ~( y) ˆ 2 ˆ *  kT (a) R  y~( y ) T  k  ˆ  2 T T ~( y) ( y) (a) (a) T  k / k    y~( y ) / y ( y ) 0 T T k k T T  ( y) ˆ  Hill estimator based on even more kT largest absolute values. Price to pay for this simpler bias- corrected estimator: Need to maintain the assumption:

$$\frac{T}{v_T}\,E\big[y_t^2\,(\hat I_{T,t}^{(y)} - \tilde I_{T,t}^{(y)})\big] \to 0.$$
⟹ we will work under this maintained assumption, or
⟹ no bias correction + the corresponding stronger assumption:
$$\frac{T}{v_T}\,E\big[y_t^2\,(1 - \hat I_{T,t}^{(y)})\big] \to 0.$$

3. QMLE with (tail-trimmed) variance targeting
$$y_{t+1} = h_{t+1}^{1/2}\,\varepsilon_{t+1}, \qquad h_{t+1} = \omega + \alpha y_t^2 + \beta h_t,$$
$$\sigma^2 = \frac{\omega}{1 - \alpha - \beta}, \qquad \psi = (\alpha, \beta)', \qquad \theta = (\sigma^2, \psi')'$$

T 1/ 2 0 d T ˆT  (0, Id3 ) vT

NW of T 1 1 SW of T  J K  E( 4 ) 1 SE of    t J 1  J 1KK' J 1 T  2   vT /T   1 ht ht   1 ht ht  J  E 2 , K  E 2  ht  ' ht    R1. Trimming has no impact on asymptotic distribution of QML-VT when finite kurtosis (Francq,Horvath,Zakoian,JFEC, 2011) R2. In case of infinite kurtosis, the asymptotic variance of GARCH parameters (α,) (SE block) is smaller:

 1  1 * T   1 1  K' J    J K R3. Price to pay for variance targeting = now in terms of rate of convergence BUT Two directions in the parameter space with root-T rate of convergence: 1 2 ' J K *   , ()      ()'  ()  0   

$$\frac{T}{v_T}\,\lambda(\mu)'\big(\hat\theta_T - \theta^0\big) \to 0,$$
$$\sqrt T\,\lambda(\mu)'\big(\hat\theta_T - \theta^0\big) \to_d N\Big(0,\ \big(E(\varepsilon_t^4) - 1\big)\,\mu'J^{-1}\mu\Big)$$
$$\mu = (1, 0)'\ \Rightarrow\ \lambda(\mu)' = \big((J^{-1}K)_1,\ 1,\ 0\big)$$
$$\mu = (1, 1)'\ \Rightarrow\ \lambda(\mu)' = \big((J^{-1}K)_1 + (J^{-1}K)_2,\ 1,\ 1\big)$$
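A small numerical sanity check (our own illustration with placeholder matrices) that such directions indeed annihilate the dominant variance term:

```python
# Sigma* is proportional to u u' with u = (1, -(J^{-1}K)')', so any lambda(mu)
# orthogonal to u has lambda' Sigma* lambda = 0.
import numpy as np

J = np.array([[2.0, 0.3], [0.3, 1.5]])        # placeholder positive definite J
K = np.array([1.0, 0.5])                       # placeholder K
JinvK = np.linalg.solve(J, K)

u = np.concatenate(([1.0], -JinvK))
Sigma_star = np.outer(u, u)                    # dominant term up to v_T^2 / T

for mu in (np.array([1.0, 0.0]), np.array([1.0, 1.0])):
    lam = np.concatenate(([mu @ JinvK], mu))   # lambda(mu) = (mu'J^{-1}K, mu')'
    print(lam @ Sigma_star @ lam)              # 0 up to rounding
```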

4. Trimming orthogonality conditions for GMM

QML score (with re-parameterization for variance targeting):
$$m_t(\theta) = e_t(\theta)\,s_t(\theta), \qquad e_t(\theta) = \frac{y_t^2}{h_t(\theta)} - 1, \qquad s_t(\theta) = \frac{1}{h_t(\theta)}\frac{\partial h_t(\theta)}{\partial\theta}$$

⟹ May need tail-trimming if $E(\varepsilon_t^4) = \infty$, since then Gaussian QML (Hall and Yao, 2003) is not asymptotically normal and has a slow rate of convergence.
R1. General results of Hill and Renault (2010, "GMM with Tail Trimming"): a well-suited tail trimming of the estimating equations ensures asymptotic normality with a faster rate of convergence (albeit slower than root-T).
⟹ Asymptotic distribution with standard formulas (but Jacobian and variance matrices built from the trimmed estimating equations); rate of convergence slower than root-T.
R2. Tail trimming of estimating equations ⟹ no reason anymore to consider only just-identified moment conditions (QML) ⟹ more lags of the variance score s(θ) for more estimating equations.
General notations:
• Tail trimming of moment conditions:
$$\hat m_{t,T}(\theta) = \big(\hat m_{i,t,T}(\theta)\big)_{1\le i\le H}, \qquad \hat m_{i,t,T}(\theta) = m_{i,t}(\theta)\,\hat I_{i,t,T}(\theta), \qquad \hat I_{i,t,T}(\theta) \in \{0, 1\}$$
$$\hat I_{i,t,T}(\theta) = 0 \iff m_{i,t}(\theta) \text{ is one of the } k_{iT} \text{ largest among } |m_{i,\tau}(\theta)|,\ \tau = 1, \dots, T$$
$$k_{iT} \to \infty, \qquad k_{iT}/T \to 0$$

(≠ fixed quantile trimming, $k_{iT} = cT$)
Trimming and local identification
• Jacobian of moment conditions J(θ):
R1. Problem of non-existence of the expected Jacobian matrix, related to non-existence of moments:
$$m_t(\theta) = (y_t - \theta y_{t-1})\,y_{t-1} \ \Rightarrow\ \frac{\partial}{\partial\theta}\,m_t(\theta) = -y_{t-1}^2$$
⟹ $J_T(\theta^0)$ and $S_T(\theta^0)$ may both diverge when $T \to \infty$.
R2. Standard assumption of full column rank with asymptotic non-degeneracy (after trimming).

Jacobian formula:
• Thanks to asymptotically vanishing trimming:

0  *    * 0  J ( )  E[m ()] 0  E m () I ( ) T ' t,T    ' i,t  i,t,T      0    1iH • While the perverse term:  0 * Em ( )I () 0  ' i,t i,t,T  

is asymptotically negligible (would not work with fixed quantile trimming).

Linearization of the FOC:
$$J_T(\theta^0)'\,\sqrt T\,\hat m_T^*(\theta^0) + J_T(\theta^0)'J_T(\theta^0)\,\sqrt T\,\big(\hat\theta_T - \theta^0\big) = o_P(1)$$
R. In Cizek (2009), "Generalized Method of Trimmed Moments", W.P. Tilburg: with fixed quantile trimming, the "perverse" term is no longer negligible in the linearization ⟹ must deal with an asymmetric matrix J'J* instead of J'J (no perverse term in the finite-sample left Jacobian term).

Assuming $J_T(\theta^0)$ asymptotically of full column rank:
$$V_T(\theta) = T\,\big[J_T(\theta)'J_T(\theta)\big]\,\big[J_T(\theta)'S_T(\theta)J_T(\theta)\big]^{-1}\,\big[J_T(\theta)'J_T(\theta)\big]$$
$$V_T^{1/2}(\theta^0)\,\big[\hat\theta_T - \theta^0\big] \to_d N(0, \mathrm{Id}_p)$$
An "efficient" weighting matrix:

$$\Omega_T = \big[S_T(\theta^0)\big]^{-1}$$
$$\sqrt T\,\big[J_T(\theta^0)'\,S_T^{-1}(\theta^0)\,J_T(\theta^0)\big]^{1/2}\,\big(\hat\theta_T - \theta^0\big) \to_d N(0, \mathrm{Id}_p)$$
R1. An efficient weighting matrix exists thanks to the symmetric occurrence of the Jacobian matrices, because the perverse terms are negligible (would not work with fixed quantile trimming, Cizek, 2009).

R2. The rate of convergence may depend upon the choice of the trimming fractiles ⟹ questions the concept of "efficiency".

R3. We recover root-T asymptotic normality (with the efficient asymptotic variance) if the matrices $S_T(\theta^0)$ and $J_T(\theta^0)$ have well-defined limits ⟹ tail trimming is always a safe practice.

R4. Asymptotic normality ⟹ allows standard inference (J-test, Wald) with standard formulas in spite of non-standard rates (see Antoine and Renault (2009)).
R5. Fractile selection for sufficiently fast identification:
1st solution (Horowitz): compare trimmed GMM and a naïve estimator with identity weighting matrix (averaged over subsamples).
2nd solution: iterative estimation of the tail indices.

R. HR (2010): trim estimating equations in a general environment. BUT here we can focus the trimming on the source of the extremes:
⟹ Different trimming for the standardized innovation e(θ) and the variance score s(θ)
⟹ Better control of the bias artificially introduced by trimming and, even more importantly, maintains (asymptotically) the martingale difference structure (no need of HAC estimators for the GMM weighting matrix) by re-centering the tail-trimmed innovation.
Trimmed estimating equation:

$$\hat m_{T,t}^*(\theta) = \Big(\hat e_{T,t}^*(\theta) - \frac{1}{T}\sum_{\tau=1}^T \hat e_{T,\tau}^*(\theta)\Big)\,\hat s_{T,t-j}^*(\theta)$$
$$\hat e_{T,t}^*(\theta) = e_t(\theta)\,\hat I_{T,t}^{(e)}, \qquad \hat s_{T,t-j}^*(\theta) = s_{t-j}(\theta)\,\hat I_{T,t-j}^{(s)}$$
⟹ Asymptotic theory of GMTTM (HR, 2010), taking into account that it is a two-step GMM, but with no need of HAC estimation (while in HR 2010 trimming induces perverse serial correlation).
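A minimal Python sketch of these separately trimmed, re-centered estimating equations (our own illustration; scalar components for simplicity):

```python
# Separately trim the innovation e_t and each lagged variance score s_{t-j},
# re-centering the trimmed innovation to preserve the mds structure.
import numpy as np

def trim_indicator(x, k):
    """1 everywhere except for the k observations of largest |x|."""
    keep = np.ones(len(x), dtype=bool)
    keep[np.argsort(np.abs(x))[-k:]] = False
    return keep.astype(float)

def trimmed_moments(e, s_lags, k_e, k_s):
    """e: standardized innovations e_t(theta); s_lags: list of arrays holding
    lagged variance scores s_{t-j}(theta), aligned with e."""
    e_star = e * trim_indicator(e, k_e)
    e_star = e_star - e_star.mean()            # the re-centering step
    return np.column_stack(
        [e_star * s * trim_indicator(s, k_s) for s in s_lags]
    )
```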

Additional simplification (Francq and Zakoian, 2004):
If $s_{T,t-j}(\theta)$ = lagged value of $\frac{1}{h_t(\theta)}\frac{\partial h_t(\theta)}{\partial\theta}$ ⟹ uniformly square integrable

⟹ no need of trimming the score when $\operatorname{Var}(y_t) < \infty$: the GARCH polynomial in the denominator promotes finite variance.

5. Monte Carlo study
5.1. 1st case: QML-GARCH without variance targeting (and global trimming of the moment conditions): 1000 samples of size T = 1000.
Strong impact of a few trimmed observations (breakdown point: Sakata and White (1998)):
$$k_T = [T^{\lambda}], \qquad \lambda \in \{.01, .02, \dots, .99\}$$
• Tail trimming always delivers an approximately normal estimator.
• In the cases of substantially heavy tails, standard GMM and QMLE strongly fail the KS test for normality.
• Only a few tail observations need to be trimmed to ensure approximate normality.
Example: GARCH with Pareto errors (κ = 2.5 for the errors, κ = 1.5 for y):
• Select λ minimizing the KS statistic (test for normality): λ = .35, $k_T = [1000^{.35}] = 11$ (see the sketch after this list).
• Always safe to trim, even when the variance is not infinite.
• Asymmetric trimming for asymmetrically distributed equations is always optimal, where fewer observations are trimmed from the heavier tail.
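A minimal sketch of this fractile-selection rule (our own illustration; kstest is from scipy):

```python
# Choose lambda in k_T = [T^lambda] by minimizing the Kolmogorov-Smirnov
# distance to N(0,1) across Monte Carlo replications.
import numpy as np
from scipy.stats import kstest

def select_lambda(estimates_by_lambda):
    """estimates_by_lambda: dict mapping lambda to the array of centered and
    studentized parameter estimates over the Monte Carlo samples."""
    ks = {lam: kstest(z, "norm").statistic
          for lam, z in estimates_by_lambda.items()}
    return min(ks, key=ks.get)        # best match to normality

# e.g. lambda = .35 at T = 1000 gives k_T = int(1000 ** .35) = 11
```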

( 1,2 )  (.26,.11)  k1T  6,k2T  2 •Two-step GMTTME with a QML plug-in dominates one-step GMTTME (and two-step GMTTME with a GMTTME plug-in 2nd case: Variance Targeting • Three distributions for standardized innovations: N(0,1), Pareto with  = 4.1 and 2.5 .Two GARCH(1,1) models: (,,) = (1,0.3,0.4) and (1,0.3,0.6) 6 different DGPs: 2 with infinite kurtosis for  4 with infinite kurtosis for y  Focus on inference on  (Normality, MSE, Tests) Main Monte-Carlo results

Main Monte Carlo results:
1. VT = improves upon QML when y has finite but large kurtosis.
2. TTVT = needed to restore normality when y has infinite kurtosis (and no cost in terms of MSE).
3. G-TTVT = needed to restore normality when ε has infinite kurtosis (and does not hurt if ε has finite kurtosis).
4. Peng's bias correction matters (no significant need of under-correction in finite samples).

CONCLUSION
Variance targeting in GARCH-QML estimation:
(i) Cost in terms of efficiency, as already documented by Francq, Horvath and Zakoian (2011) in the finite kurtosis case.
(ii) Advantages in finite sample performance and robustness to misspecification.
(iii) Inference based on asymptotic normality = no longer valid under infinite kurtosis of asset returns.
(iv) Tail trimming = hedge against infinite kurtosis (unconditional and/or conditional).