The Generalised Hyperbolic distribution: some practical issues∗

Javier Mencía (Bank of Spain) and Enrique Sentana (CEMFI)

March 2009

∗Address for correspondence: Casado del Alisal 5, E-28014 Madrid, Spain, tel: +34 91 429 05 51, fax: +34 91 429 1056.

1 The density function

Consider an N-dimensional random vector u, which can be expressed in terms of the following Location-Scale Mixture of Normals (LSMN):

$$u = \alpha + \xi^{-1}\Upsilon\beta + \xi^{-1/2}\Upsilon^{1/2}r, \tag{1}$$

where α and β are N-dimensional vectors, Υ is a positive definite matrix of order N, r ∼ N(0, I_N), and ξ is an independent positive mixing variable. If the mixing variable follows a Generalised Inverse Gaussian distribution (GIG), then the distribution of u will be the Generalised Hyperbolic distribution (GH) introduced by Barndorff-Nielsen (1977). More explicitly, if ξ ∼ GIG(−ν, γ, δ), then the density of the N × 1 GH random vector u will be given by

$$f_{GH}(u) = \frac{(\gamma/\delta)^{\nu}}{(2\pi)^{\frac{N}{2}}\left(\beta'\Upsilon\beta+\gamma^{2}\right)^{\nu-\frac{N}{2}}|\Upsilon|^{\frac{1}{2}}K_{\nu}(\delta\gamma)}\left\{\sqrt{\beta'\Upsilon\beta+\gamma^{2}}\;\delta q\left[\delta^{-1}(u-\alpha)\right]\right\}^{\nu-\frac{N}{2}}$$
$$\times\, K_{\nu-\frac{N}{2}}\left\{\sqrt{\beta'\Upsilon\beta+\gamma^{2}}\;\delta q\left[\delta^{-1}(u-\alpha)\right]\right\}\exp\left[\beta'(u-\alpha)\right],$$

where $-\infty < \nu < \infty$, $\gamma > 0$, $q[\delta^{-1}(u-\alpha)] = \sqrt{1+\delta^{-2}(u-\alpha)'\Upsilon^{-1}(u-\alpha)}$, and $K_\nu(\cdot)$ is the modified Bessel function of the third kind (see Abramowitz and Stegun, 1965, p. 374, as well as Section 4). Given that δ and Υ are not separately identified, Barndorff-Nielsen and Shephard (2001) set the determinant of Υ equal to 1. However, it is more convenient to set δ = 1 instead in order to reparametrise the GH distribution so that it has mean vector 0 and covariance matrix $I_N$. Hence, if ξ ∼ GIG(−ν, γ, 1), then $\tau = (\nu,\gamma)'$, $\pi_1(\tau) = \sqrt{R_\nu(\gamma)/\gamma}$ and $c_v(\tau) = D_{\nu+1}(\gamma)-1$, where $R_\nu(\gamma) = K_{\nu+1}(\gamma)/K_\nu(\gamma)$ and $D_{\nu+1}(\gamma) = K_{\nu+2}(\gamma)K_\nu(\gamma)/K_{\nu+1}^2(\gamma)$. It is then straightforward to use Proposition 1 in Mencía and Sentana (2009) to obtain a standardised GH distribution. Specifically, we set $\alpha = -c(\beta,\nu,\gamma)\beta$ and

$$\Upsilon = \frac{\gamma}{R_\nu(\gamma)}\left[I_N + \frac{c(\beta,\nu,\gamma)-1}{\beta'\beta}\,\beta\beta'\right], \tag{2}$$

where

$$c(\beta,\nu,\gamma) = \frac{-1+\sqrt{1+4[D_{\nu+1}(\gamma)-1]\beta'\beta}}{2[D_{\nu+1}(\gamma)-1]\beta'\beta}. \tag{3}$$

Thus, the distribution of ε*_t depends on two shape parameters, ν and γ, and a vector of N skewness parameters, denoted by β.

One of the most attractive properties of the GH distribution is that it contains as particular cases several of the most important multivariate distributions already used in the literature. The best known examples are:

• Normal, which can be achieved in three different ways: (i) when ν → −∞ or (ii) ν → +∞, regardless of the values of γ and β; and (iii) when γ → ∞, irrespective of the values of ν and β.

• Symmetric Student t, obtained when −∞ < ν < −2, γ = 0 and β = 0.

• Asymmetric Student t, which is like its symmetric counterpart except that the vector β of skewness parameters is no longer zero.

• Asymmetric Normal-Gamma, which is obtained when γ = 0 and 0 < ν < ∞ (see Madan and Milne, 1991).

• Normal Inverse Gaussian, for ν = −0.5 (see Aas, Dimakos, and Haff, 2005).

• Hyperbolic, for ν = 1 (see Chen, Härdle, and Jeong, 2008).

• Asymmetric Laplace, for ν = 1 and γ = 0 (see Cajigas and Urga, 2007).
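As a quick numerical sanity check on the standardisation in (2)–(3), the following Python sketch (our own illustration, not part of the original text) simulates the location-scale mixture (1) and confirms by Monte Carlo that the resulting vector has mean 0 and identity covariance matrix. The mapping of ξ ∼ GIG(−ν, γ, 1) into `scipy.stats.geninvgauss` is our own translation, spelled out in the comments.

```python
# Monte Carlo check of the standardised GH construction in (1)-(3).
# Illustrative sketch only: names (nu, gamma_, beta) mirror the paper's
# notation; the mapping to scipy.stats.geninvgauss is our own assumption.
import numpy as np
from scipy.special import kv
from scipy.stats import geninvgauss

def c_factor(bb, nu, gamma_):
    """c(beta, nu, gamma) in (3); bb = beta'beta."""
    R = kv(nu + 1, gamma_) / kv(nu, gamma_)                       # R_nu(gamma)
    D = kv(nu + 2, gamma_) * kv(nu, gamma_) / kv(nu + 1, gamma_) ** 2
    return (-1 + np.sqrt(1 + 4 * (D - 1) * bb)) / (2 * (D - 1) * bb), R

def simulate_gh(n, beta, nu, gamma_, rng):
    """Draws from (1) with alpha and Upsilon set as in (2)-(3)."""
    N = beta.size
    bb = beta @ beta
    c, R = c_factor(bb, nu, gamma_)
    Ups = (gamma_ / R) * (np.eye(N) + (c - 1) / bb * np.outer(beta, beta))  # (2)
    alpha = -c * beta
    # xi ~ GIG(-nu, gamma, 1) in the paper's notation.  SciPy's
    # geninvgauss(p, b) has density ~ x^(p-1) exp(-b(x + 1/x)/2), so
    # xi = gamma * geninvgauss(p=-nu, b=gamma)   (our own mapping).
    xi = gamma_ * geninvgauss.rvs(-nu, gamma_, size=n, random_state=rng)
    r = rng.standard_normal((n, N))
    L = np.linalg.cholesky(Ups)                                   # Upsilon^(1/2)
    return alpha + np.outer(1 / xi, Ups @ beta) + (r @ L.T) / np.sqrt(xi)[:, None]

rng = np.random.default_rng(0)
u = simulate_gh(1_000_000, np.array([0.5, -0.2]), nu=-4.0, gamma_=1.5, rng=rng)
print(u.mean(axis=0))   # ~ (0, 0)
print(np.cov(u.T))      # ~ identity matrix
```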

2 The score function

Let yt be a vector of N observed variables. To accommodate flexible specifications, we assume the following conditionally heteroskedastic dynamic regression model:

$$\left.\begin{array}{l} y_t = \mu_t(\theta) + \Sigma_t^{1/2}(\theta)\,\varepsilon_t^*, \\ \mu_t(\theta) = \mu\left(I_{t-1};\theta\right), \\ \Sigma_t(\theta) = \Sigma\left(I_{t-1};\theta\right), \end{array}\right\} \tag{4}$$

where μ() and vech[Σ()] are N and N(N + 1)/2-dimensional vectors of functions known up to the p × 1 vector of true parameter values, θ₀; I_{t−1} denotes the information set available at t − 1, which contains past values of y_t and possibly other variables; $\Sigma_t^{1/2}(\theta)$ is some N × N "square root" matrix such that $\Sigma_t^{1/2}(\theta)\Sigma_t^{1/2\prime}(\theta) = \Sigma_t(\theta)$; and ε*_t is a standardised GH vector martingale difference sequence satisfying E(ε*_t|I_{t−1}; θ₀) = 0 and V(ε*_t|I_{t−1}; θ₀) = I_N. As a consequence, E(y_t|I_{t−1}; θ₀) = μ_t(θ₀) and V(y_t|I_{t−1}; θ₀) = Σ_t(θ₀).

Importantly, given that ε*_t is not generally observable, the choice of "square root" matrix is not irrelevant except in univariate GH models, or in multivariate GH models in which either Σ_t(θ) is time-invariant or ε*_t is spherical (i.e. β = 0). But if we parametrise β as a function of past information and a new vector of parameters b in the following way:

$$\beta_t(\theta, b) = \Sigma_t^{1/2\prime}(\theta)\,b, \tag{5}$$

then it is straightforward to see that the resulting distribution of y_t conditional on I_{t−1} will not depend on the choice of $\Sigma_t^{1/2}(\theta)$. Finally, it is analytically convenient to replace ν and γ by η and ψ, where η = −0.5ν⁻¹ and ψ = (1 + γ)⁻¹, although we continue to use ν and γ in some equations for notational simplicity.

If the mean vector and covariance matrix specifications in (4) were constant, it would be potentially advantageous to use the EM algorithm for estimation purposes. In general dynamic models, though, the EM is not as useful because it typically requires numerical maximisation procedures at each M step. However, the EM principle can still be useful to derive the score function of the GH distribution. In this context, the procedure that we follow is divided in two parts. In the first step, we differentiate l(y_t|ξ_t, I_{t−1}; φ) and l(ξ_t|I_{t−1}; φ) with respect to φ. Then, in the second step, we take the expected value of these derivatives given I_T = {y₁, y₂, ···, y_T} and the parameter values.

Conditional on ξ_t, y_t follows a multivariate normal distribution:

$$y_t|\xi_t, I_{t-1} \sim N\left(\mu_t(\theta) + \left[\frac{\gamma}{R_\nu(\gamma)}\frac{1}{\xi_t}-1\right]c_t(\phi)\,\Sigma_t(\theta)b,\;\; \frac{\gamma}{R_\nu(\gamma)}\frac{1}{\xi_t}\,\Sigma_t^*(\phi)\right),$$

where $c_t(\phi) = c[\Sigma_t^{1/2\prime}(\theta)b, \nu, \gamma]$ and

$$\Sigma_t^*(\phi) = \Sigma_t(\theta) + \frac{c_t(\phi)-1}{b'\Sigma_t(\theta)b}\,\Sigma_t(\theta)bb'\Sigma_t(\theta).$$

If we define $p_t = y_t - \mu_t(\theta) + c_t(\phi)\Sigma_t(\theta)b$, then we have the following log-density:

$$l(y_t|\xi_t, I_{t-1};\phi) = \frac{N}{2}\log\left[\frac{\xi_t R_\nu(\gamma)}{2\pi\gamma}\right] - \frac{1}{2}\log|\Sigma_t^*(\phi)| - \frac{\xi_t R_\nu(\gamma)}{2\gamma}\,p_t'\Sigma_t^{*-1}(\phi)p_t + b'p_t - \frac{b'\Sigma_t(\theta)b}{2\xi_t}\frac{\gamma c_t(\phi)}{R_\nu(\gamma)}.$$

Similarly, ξ_t is conditionally distributed as a GIG, ξ_t|I_{t−1} ∼ GIG(−ν, γ, 1), with log-likelihood given by

$$l(\xi_t|I_{t-1};\phi) = \nu\log\gamma - \log 2 - \log K_\nu(\gamma) - (\nu+1)\log\xi_t - \frac{1}{2}\left(\xi_t + \gamma^2\frac{1}{\xi_t}\right).$$

In order to determine the distribution of ξ_t given all the observable information I_T, we can exploit the serial independence of ξ_t given I_{t−1} to show that

$$f(\xi_t|I_T;\phi) = \frac{f(y_t,\xi_t|I_{t-1};\phi)}{f(y_t|I_{t-1};\phi)} \propto f(y_t|\xi_t,I_{t-1};\phi)\,f(\xi_t|I_{t-1};\phi)$$
$$\propto \xi_t^{\frac{N}{2}-\nu-1}\times\exp\left\{-\frac{1}{2}\left[\frac{R_\nu(\gamma)}{\gamma}\,p_t'\Sigma_t^{*-1}(\phi)p_t + 1\right]\xi_t - \frac{1}{2}\left[\frac{\gamma c_t(\phi)}{R_\nu(\gamma)}\,b'\Sigma_t(\theta)b + \gamma^2\right]\frac{1}{\xi_t}\right\},$$

which implies that

$$\xi_t|I_T;\phi \sim GIG\left(\frac{N}{2}-\nu,\;\; \sqrt{\frac{\gamma c_t(\phi)}{R_\nu(\gamma)}\,b'\Sigma_t(\theta)b + \gamma^2},\;\; \sqrt{\frac{R_\nu(\gamma)}{\gamma}\,p_t'\Sigma_t^{*-1}(\phi)p_t + 1}\right).$$

From here, we can use (9) and (10) to obtain the required moments. Specifically,

$$E(\xi_t|I_T;\phi) = \frac{\sqrt{\dfrac{\gamma c_t(\phi)}{R_\nu(\gamma)}\,b'\Sigma_t(\theta)b + \gamma^2}}{\sqrt{\dfrac{R_\nu(\gamma)}{\gamma}\,p_t'\Sigma_t^{*-1}p_t + 1}}\times R_{\frac{N}{2}-\nu}\left[\sqrt{\frac{\gamma c_t(\phi)}{R_\nu(\gamma)}\,b'\Sigma_t(\theta)b + \gamma^2}\,\sqrt{\frac{R_\nu(\gamma)}{\gamma}\,p_t'\Sigma_t^{*-1}p_t + 1}\right],$$

$$E\left(\left.\frac{1}{\xi_t}\right|I_T;\phi\right) = \frac{\sqrt{\dfrac{R_\nu(\gamma)}{\gamma}\,p_t'\Sigma_t^{*-1}p_t + 1}}{\sqrt{\dfrac{\gamma c_t(\phi)}{R_\nu(\gamma)}\,b'\Sigma_t(\theta)b + \gamma^2}}\times\frac{1}{R_{\frac{N}{2}-\nu-1}\left[\sqrt{\dfrac{\gamma c_t(\phi)}{R_\nu(\gamma)}\,b'\Sigma_t(\theta)b + \gamma^2}\,\sqrt{\dfrac{R_\nu(\gamma)}{\gamma}\,p_t'\Sigma_t^{*-1}p_t + 1}\right]},$$

$$E(\log\xi_t|I_T;\phi) = \log\sqrt{\frac{\gamma c_t(\phi)}{R_\nu(\gamma)}\,b'\Sigma_t(\theta)b + \gamma^2} - \log\sqrt{\frac{R_\nu(\gamma)}{\gamma}\,p_t'\Sigma_t^{*-1}p_t + 1}$$
$$+ \left.\frac{\partial}{\partial x}\log K_x\left[\sqrt{\frac{\gamma c_t(\phi)}{R_\nu(\gamma)}\,b'\Sigma_t(\theta)b + \gamma^2}\,\sqrt{\frac{R_\nu(\gamma)}{\gamma}\,p_t'\Sigma_t^{*-1}p_t + 1}\right]\right|_{x=\frac{N}{2}-\nu}.$$

If we put all the pieces together, we will finally have that

$$\frac{\partial l(y_t|I_{t-1};\phi)}{\partial\theta'} = -\frac{1}{2}\mathrm{vec}'[\Sigma_t^{-1}(\theta)]\frac{\partial\mathrm{vec}[\Sigma_t(\theta)]}{\partial\theta'} - f(I_T,\phi)\,p_t'\Sigma_t^{*-1}(\phi)\frac{\partial p_t}{\partial\theta'}$$
$$- \frac{1}{2}\frac{c_t(\phi)-1}{c_t(\phi)\,b'\Sigma_t(\theta)b\,\sqrt{1+4[D_{\nu+1}(\gamma)-1]b'\Sigma_t(\theta)b}}\,\mathrm{vec}'(bb')\frac{\partial\mathrm{vec}[\Sigma_t(\theta)]}{\partial\theta'} + b'\frac{\partial p_t}{\partial\theta'}$$
$$+ \frac{1}{2}f(I_T,\phi)\left[p_t'\Sigma_t^{*-1}(\phi)\otimes p_t'\Sigma_t^{*-1}(\phi)\right]\frac{\partial\mathrm{vec}[\Sigma_t^*(\phi)]}{\partial\theta'}$$
$$- \frac{1}{2}\frac{g(I_T,\phi)}{\sqrt{1+4[D_{\nu+1}(\gamma)-1]b'\Sigma_t(\theta)b}}\,\mathrm{vec}'(bb')\frac{\partial\mathrm{vec}[\Sigma_t(\theta)]}{\partial\theta'},$$
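All the conditional expectations that enter this score, and those below, do so only through E(ξ_t|I_T; φ), E(ξ_t⁻¹|I_T; φ) and E(log ξ_t|I_T; φ). The following Python sketch (our own illustration, not part of the original derivation; the helper name `e_step_moments` is hypothetical) evaluates these posterior GIG moments with SciPy, approximating the order derivative of log K_ν by a central finite difference, whose reliability in extreme regions is discussed in Section 4.

```python
# Posterior GIG moments of xi_t given I_T, as in the formulas above.
import numpy as np
from scipy.special import kv

def dlogK_dorder(order, x, h=1e-5):
    """Central finite-difference approximation to d log K_nu(x) / d nu."""
    return (np.log(kv(order + h, x)) - np.log(kv(order - h, x))) / (2 * h)

def e_step_moments(N, nu, gamma_, ct, bSb, quad):
    """Posterior moments of xi_t | I_T.
    N: dimension; ct = c_t(phi); bSb = b'Sigma_t(theta)b;
    quad = p_t' Sigma*_t^{-1}(phi) p_t."""
    R = kv(nu + 1, gamma_) / kv(nu, gamma_)              # R_nu(gamma)
    delta = np.sqrt(gamma_ * ct / R * bSb + gamma_**2)   # posterior GIG delta
    gam = np.sqrt(R / gamma_ * quad + 1.0)               # posterior GIG gamma
    order, arg = N / 2 - nu, delta * gam
    E_xi = (delta / gam) * kv(order + 1, arg) / kv(order, arg)   # (9), k = 1
    E_inv = (gam / delta) * kv(order - 1, arg) / kv(order, arg)  # (9), k = -1
    E_log = np.log(delta / gam) + dlogK_dorder(order, arg)       # (10)
    return E_xi, E_inv, E_log

# The weights defined below then follow directly:
# f(I_T, phi) = (R / gamma) * E_xi   and   g(I_T, phi) = (gamma / R) * E_inv.
```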

Similarly, the score with respect to b is

$$\frac{\partial l(y_t|I_{t-1};\phi)}{\partial b'} = -\frac{c_t(\phi)-1}{c_t(\phi)\,b'\Sigma_t(\theta)b\,\sqrt{1+4[D_{\nu+1}(\gamma)-1]b'\Sigma_t(\theta)b}}\,b'\Sigma_t(\theta) - f(I_T,\phi)\left[c_t(\phi)p_t + \varepsilon_t\right]'$$
$$+ f(I_T,\phi)\,\frac{c_t(\phi)-1}{b'\Sigma_t(\theta)b}\,(b'p_t)\times\left\{\frac{[c_t(\phi)-1](b'p_t)}{c_t^2(\phi)\,b'\Sigma_t(\theta)b\,\sqrt{1+4[D_{\nu+1}(\gamma)-1]b'\Sigma_t(\theta)b}}\,b'\Sigma_t(\theta)\right.$$
$$\left. + \frac{p_t'}{c_t(\phi)} - \frac{1}{\sqrt{1+4[D_{\nu+1}(\gamma)-1]b'\Sigma_t(\theta)b}}\,b'\Sigma_t(\theta)\right\} + \frac{2-g(I_T,\phi)}{\sqrt{1+4[D_{\nu+1}(\gamma)-1]b'\Sigma_t(\theta)b}}\,b'\Sigma_t(\theta),$$

where ε_t = y_t − μ_t(θ). In turn, the scores with respect to the shape parameters η and ψ are

$$\frac{\partial l(y_t|I_{t-1};\phi)}{\partial\eta} = \frac{N}{2}\frac{\partial\log R_\nu(\gamma)}{\partial\eta} + \left[b'\Sigma_t(\theta)b - \frac{1}{2c_t(\phi)}\right]\frac{\partial c_t(\phi)}{\partial\eta} + \frac{\log\gamma}{2\eta^2} - \frac{\partial\log K_\nu(\gamma)}{\partial\eta} - \frac{1}{2\eta^2}E(\log\xi_t|I_T;\phi)$$
$$- \frac{f(I_T,\phi)}{2}\left\{\frac{\partial\log R_\nu(\gamma)}{\partial\eta}\,p_t'\Sigma_t^{*-1}(\phi)p_t + \frac{\partial c_t(\phi)}{\partial\eta}\left[b'\Sigma_t(\theta)b - \frac{(b'\varepsilon_t)^2}{c_t^2(\phi)\,b'\Sigma_t(\theta)b}\right]\right\}$$
$$- \frac{b'\Sigma_t(\theta)b}{2}\,g(I_T,\phi)\left[\frac{\partial c_t(\phi)}{\partial\eta} - c_t(\phi)\frac{\partial\log R_\nu(\gamma)}{\partial\eta}\right],$$

and

$$\frac{\partial l(y_t|I_{t-1};\phi)}{\partial\psi} = \frac{N}{2}\frac{\partial\log R_\nu(\gamma)}{\partial\psi} + \frac{N}{2\psi(1-\psi)} + \left[b'\Sigma_t(\theta)b - \frac{1}{2c_t(\phi)}\right]\frac{\partial c_t(\phi)}{\partial\psi} + \frac{1}{2\eta\psi(1-\psi)} - \frac{\partial\log K_\nu(\gamma)}{\partial\psi}$$
$$- \frac{f(I_T,\phi)}{2}\left\{\left[\frac{\partial\log R_\nu(\gamma)}{\partial\psi} + \frac{1}{\psi(1-\psi)}\right]p_t'\Sigma_t^{*-1}(\phi)p_t + \frac{\partial c_t(\phi)}{\partial\psi}\left[b'\Sigma_t(\theta)b - \frac{(b'\varepsilon_t)^2}{c_t^2(\phi)\,b'\Sigma_t(\theta)b}\right]\right\}$$
$$- \frac{b'\Sigma_t(\theta)b}{2}\,g(I_T,\phi)\left[-\frac{c_t(\phi)}{\psi(1-\psi)} + \frac{\partial c_t(\phi)}{\partial\psi} - c_t(\phi)\frac{\partial\log R_\nu(\gamma)}{\partial\psi}\right] + g(I_T,\phi)\frac{R_\nu(\gamma)}{\psi^2},$$

where

$$f(I_T,\phi) = \gamma^{-1}R_\nu(\gamma)\,E(\xi_t|I_T;\phi),$$

$$g(I_T,\phi) = \gamma R_\nu^{-1}(\gamma)\,E(\xi_t^{-1}|I_T;\phi),$$

$$\frac{\partial\mathrm{vec}[\Sigma_t^*(\phi)]}{\partial\theta'} = \frac{\partial\mathrm{vec}[\Sigma_t(\theta)]}{\partial\theta'} + \frac{c_t(\phi)-1}{b'\Sigma_t(\theta)b}\left\{[\Sigma_t(\theta)bb'\otimes I_N] + [I_N\otimes\Sigma_t(\theta)bb']\right\}\frac{\partial\mathrm{vec}[\Sigma_t(\theta)]}{\partial\theta'}$$
$$+ \frac{c_t(\phi)-1}{[b'\Sigma_t(\theta)b]^2}\left\{\frac{1}{\sqrt{1+4[D_{\nu+1}(\gamma)-1]b'\Sigma_t(\theta)b}} - 1\right\}\mathrm{vec}[\Sigma_t(\theta)bb'\Sigma_t(\theta)]\,\mathrm{vec}'(bb')\frac{\partial\mathrm{vec}[\Sigma_t(\theta)]}{\partial\theta'},$$

$$\frac{\partial p_t}{\partial\theta'} = -\frac{\partial\mu_t(\theta)}{\partial\theta'} + c_t(\phi)\left[b'\otimes I_N\right]\frac{\partial\mathrm{vec}[\Sigma_t(\theta)]}{\partial\theta'} + \frac{c_t(\phi)-1}{b'\Sigma_t(\theta)b\,\sqrt{1+4[D_{\nu+1}(\gamma)-1]b'\Sigma_t(\theta)b}}\,\Sigma_t(\theta)b\,\mathrm{vec}'(bb')\frac{\partial\mathrm{vec}[\Sigma_t(\theta)]}{\partial\theta'},$$

$$\frac{\partial c_t(\phi)}{\partial(b'\Sigma_t(\theta)b)} = \frac{c_t(\phi)-1}{b'\Sigma_t(\theta)b\,\sqrt{1+4[D_{\nu+1}(\gamma)-1]b'\Sigma_t(\theta)b}},$$

$$\frac{\partial c_t(\phi)}{\partial\eta} = \frac{c_t(\phi)-1}{[D_{\nu+1}(\gamma)-1]\sqrt{1+4[D_{\nu+1}(\gamma)-1]b'\Sigma_t(\theta)b}}\,\frac{\partial D_{\nu+1}(\gamma)}{\partial\eta},$$

and

$$\frac{\partial c_t(\phi)}{\partial\psi} = \frac{c_t(\phi)-1}{[D_{\nu+1}(\gamma)-1]\sqrt{1+4[D_{\nu+1}(\gamma)-1]b'\Sigma_t(\theta)b}}\,\frac{\partial D_{\nu+1}(\gamma)}{\partial\psi}.$$

3 Skewness and kurtosis of GH distributions

After some tedious algebra, we can show that

$$E[\mathrm{vec}(\varepsilon^*\varepsilon^{*\prime})\,\varepsilon^{*\prime}] = E[(\varepsilon^*\otimes\varepsilon^*)\,\varepsilon^{*\prime}] = c^3(\beta,\nu,\gamma)\left[\frac{K_{\nu+3}(\gamma)K_\nu^2(\gamma)}{K_{\nu+1}^3(\gamma)} - 3D_{\nu+1}(\gamma) + 2\right]\mathrm{vec}(\beta\beta')\,\beta'$$
$$+ c(\beta,\nu,\gamma)\left[D_{\nu+1}(\gamma)-1\right]\left[(K_{NN}+I_{N^2})(\beta\otimes A)A' + \mathrm{vec}(AA')\,\beta'\right],$$

and

$$E[\mathrm{vec}(\varepsilon^*\varepsilon^{*\prime})\,\mathrm{vec}'(\varepsilon^*\varepsilon^{*\prime})] = E[\varepsilon^*\varepsilon^{*\prime}\otimes\varepsilon^*\varepsilon^{*\prime}]$$
$$= c^4(\beta,\nu,\gamma)\left[\frac{K_{\nu+4}(\gamma)K_\nu^3(\gamma)}{K_{\nu+1}^4(\gamma)} - 4\frac{K_{\nu+3}(\gamma)K_\nu^2(\gamma)}{K_{\nu+1}^3(\gamma)} + 6D_{\nu+1}(\gamma) - 3\right]\mathrm{vec}(\beta\beta')\,\mathrm{vec}'(\beta\beta')$$
$$+ c^2(\beta,\nu,\gamma)\left[\frac{K_{\nu+3}(\gamma)K_\nu^2(\gamma)}{K_{\nu+1}^3(\gamma)} - 2D_{\nu+1}(\gamma) + 1\right]$$
$$\times\left\{\mathrm{vec}(\beta\beta')\,\mathrm{vec}'(AA') + \mathrm{vec}(AA')\,\mathrm{vec}'(\beta\beta') + (K_{NN}+I_{N^2})\left[\beta\beta'\otimes AA'\right](K_{NN}+I_{N^2})\right\}$$
$$+ D_{\nu+1}(\gamma)\left\{\left[AA'\otimes AA'\right](K_{NN}+I_{N^2}) + \mathrm{vec}(AA')\,\mathrm{vec}'(AA')\right\},$$

where

$$A = \left[I_N + \frac{c(\beta,\nu,\gamma)-1}{\beta'\beta}\,\beta\beta'\right]^{\frac{1}{2}},$$

and K_NN is the commutation matrix (see Magnus and Neudecker, 1988). In this respect, note that Mardia's (1970) coefficient of multivariate excess kurtosis will be −1 plus the trace of the fourth moment above divided by N(N + 2).

Under symmetry, the distribution of the standardised residuals ε* is clearly elliptical, as it can be written as $\varepsilon^* = \sqrt{\zeta/\xi}\,\sqrt{\gamma/R_\nu(\gamma)}\,u$, where ζ ∼ χ²_N, ξ⁻¹ ∼ GIG(ν, 1, γ), and u is uniformly distributed on the unit sphere. This is confirmed by the fact that the third moment becomes 0, while

$$E[\varepsilon^*\varepsilon^{*\prime}\otimes\varepsilon^*\varepsilon^{*\prime}] = D_{\nu+1}(\gamma)\left\{[I_N\otimes I_N](K_{NN}+I_{N^2}) + \mathrm{vec}(I_N)\,\mathrm{vec}'(I_N)\right\}.$$

In the symmetric case, therefore, the coefficient of multivariate excess kurtosis is simply D_{ν+1}(γ) − 1, which is always non-negative, but monotonically decreasing in γ and |ν|.

4 Modified Bessel function of the third kind

The modified Bessel function of the third kind with order ν, which we denote as K_ν(·), is closely related to the modified Bessel function of the first kind I_ν(·):

$$K_\nu(x) = \frac{\pi}{2}\,\frac{I_{-\nu}(x)-I_\nu(x)}{\sin(\pi\nu)}. \tag{6}$$
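For non-integer orders, relation (6) can be checked directly against SciPy's implementations of both Bessel functions. A two-line illustration (ours, not part of the original text):

```python
# Numerical check of (6): K_nu(x) = (pi/2) [I_{-nu}(x) - I_nu(x)] / sin(pi nu).
# Valid away from integer nu, where the right-hand side becomes 0/0.
import numpy as np
from scipy.special import iv, kv

nu, x = 0.7, 2.3
print(kv(nu, x))
print(0.5 * np.pi * (iv(-nu, x) - iv(nu, x)) / np.sin(np.pi * nu))
```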

Some basic properties of K_ν(·), taken from Abramowitz and Stegun (1965), are $K_\nu(x) = K_{-\nu}(x)$, $K_{\nu+1}(x) = 2\nu x^{-1}K_\nu(x) + K_{\nu-1}(x)$, and $\partial K_\nu(x)/\partial x = -\nu x^{-1}K_\nu(x) - K_{\nu-1}(x)$. For small values of the argument x, and ν fixed, it holds that

$$K_\nu(x) \simeq \frac{1}{2}\Gamma(\nu)\left(\frac{1}{2}x\right)^{-\nu}.$$

Similarly, for ν fixed, |x| large and m = 4ν², the following asymptotic expansion is valid:

$$K_\nu(x) \simeq \sqrt{\frac{\pi}{2x}}\,e^{-x}\left[1 + \frac{m-1}{8x} + \frac{(m-1)(m-9)}{2!\,(8x)^2} + \frac{(m-1)(m-9)(m-25)}{3!\,(8x)^3} + \cdots\right]. \tag{7}$$

Finally, for large values of both x and ν we have that

$$K_\nu(x) \simeq \sqrt{\frac{\pi}{2\nu}}\,\frac{\exp(-\nu l^{-1})}{l^{-1/2}}\left(\frac{x/\nu}{1+l^{-1}}\right)^{-\nu}\left[1 - \frac{3l-5l^3}{24\nu} + \frac{81l^2-462l^4+385l^6}{1152\nu^2} + \cdots\right], \tag{8}$$

where ν > 0 and $l = \left[1+(x/\nu)^2\right]^{-1/2}$.

Although the existing literature does not discuss how to obtain numerically reliable derivatives of K_ν(x) with respect to its order, our experience suggests the following conclusions (see the sketch after this list for a numerical illustration):

• For ν ≤ 10 and |x| > 12, the derivative of (7) with respect to ν gives a better approximation than the direct derivative of K_ν(x), which is in fact very unstable in that region.

• For ν > 10, the derivative of (8) with respect to ν works better than the direct derivative of K_ν(x).

• Otherwise, the direct derivative of the original function works well.
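The following Python sketch (our own illustration, not from the paper) contrasts the direct finite-difference derivative of K_ν(x) with the analytic ν-derivative of the expansion (7) truncated after three correction terms, which is the strategy recommended above for ν ≤ 10 and |x| > 12.

```python
# Order derivative of K_nu(x): direct finite difference vs. the analytic
# derivative of the truncated asymptotic expansion (7).
import numpy as np
from scipy.special import kv

def dkv_dnu_direct(nu, x, h=1e-6):
    """Central finite difference of K_nu(x) in the order."""
    return (kv(nu + h, x) - kv(nu - h, x)) / (2 * h)

def dkv_dnu_expansion(nu, x):
    """Differentiate (7) with respect to nu, using m = 4 nu^2, dm/dnu = 8 nu."""
    m = 4.0 * nu**2
    a1, a2, a3 = 8 * x, 2 * (8 * x) ** 2, 6 * (8 * x) ** 3
    # dS/dm by the product rule on each truncated term of the series in (7)
    dS_dm = (1 / a1
             + ((m - 9) + (m - 1)) / a2
             + ((m - 9) * (m - 25) + (m - 1) * (m - 25) + (m - 1) * (m - 9)) / a3)
    return np.sqrt(np.pi / (2 * x)) * np.exp(-x) * 8 * nu * dS_dm

nu, x = 2.5, 20.0      # region where the paper recommends relying on (7)
print(dkv_dnu_direct(nu, x))
print(dkv_dnu_expansion(nu, x))
```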

We can express such a derivative as a function of I_ν(x) by using (6):

$$\frac{\partial K_\nu(x)}{\partial\nu} = \frac{\pi}{2\sin(\pi\nu)}\left[\frac{\partial I_{-\nu}(x)}{\partial\nu} - \frac{\partial I_\nu(x)}{\partial\nu}\right] - \pi\cot(\pi\nu)\,K_\nu(x).$$

However, this formula becomes numerically unstable when ν is near any non-negative integer n = 0, 1, 2, ··· due to the sine that appears in the denominator. In our experience, it is much better to use the following Taylor expansion for small |ν − n|:

$$\frac{\partial K_\nu(x)}{\partial\nu} \simeq \left.\frac{\partial K_\nu(x)}{\partial\nu}\right|_{\nu=n} + \left.\frac{\partial^2 K_\nu(x)}{\partial\nu^2}\right|_{\nu=n}(\nu-n) + \frac{1}{2}\left.\frac{\partial^3 K_\nu(x)}{\partial\nu^3}\right|_{\nu=n}(\nu-n)^2 + \frac{1}{6}\left.\frac{\partial^4 K_\nu(x)}{\partial\nu^4}\right|_{\nu=n}(\nu-n)^3,$$

where, for ν equal to a non-negative integer n:

$$\left.\frac{\partial K_\nu(x)}{\partial\nu}\right|_{\nu=n} = \frac{1}{4\cos(\pi n)}\left\{\frac{\partial^2 I_{-\nu}(x)}{\partial\nu^2} - \frac{\partial^2 I_\nu(x)}{\partial\nu^2} + \pi^2\left[I_{-\nu}(x) - I_\nu(x)\right]\right\},$$

$$\left.\frac{\partial^2 K_\nu(x)}{\partial\nu^2}\right|_{\nu=n} = \frac{1}{6\cos(\pi n)}\left[\frac{\partial^3 I_{-\nu}(x)}{\partial\nu^3} - \frac{\partial^3 I_\nu(x)}{\partial\nu^3}\right] + \frac{\pi^2}{3\cos(\pi n)}\left[\frac{\partial I_{-\nu}(x)}{\partial\nu} - \frac{\partial I_\nu(x)}{\partial\nu}\right] - \frac{\pi^2}{3}K_n(x),$$

$$\left.\frac{\partial^3 K_\nu(x)}{\partial\nu^3}\right|_{\nu=n} = \frac{1}{8\cos(\pi n)}\left\{\frac{\partial^4 I_{-\nu}(x)}{\partial\nu^4} - \frac{\partial^4 I_\nu(x)}{\partial\nu^4} - 4\pi^2\left[\frac{\partial^2 I_{-\nu}(x)}{\partial\nu^2} - \frac{\partial^2 I_\nu(x)}{\partial\nu^2}\right] - 12\pi^4\left[I_{-\nu}(x) - I_\nu(x)\right]\right\} + 3\pi^2\frac{\partial K_n(x)}{\partial\nu},$$

and

$$\left.\frac{\partial^4 K_\nu(x)}{\partial\nu^4}\right|_{\nu=n} = \frac{1}{15\cos(\pi n)}\left\{\frac{3}{2}\left[\frac{\partial^5 I_{-\nu}(x)}{\partial\nu^5} - \frac{\partial^5 I_\nu(x)}{\partial\nu^5}\right] - 10\pi^2\left[\frac{\partial^3 I_{-\nu}(x)}{\partial\nu^3} - \frac{\partial^3 I_\nu(x)}{\partial\nu^3}\right] - 4\pi^4\left[\frac{\partial I_{-\nu}(x)}{\partial\nu} - \frac{\partial I_\nu(x)}{\partial\nu}\right]\right\} + 6\pi^2\frac{\partial^2 K_n(x)}{\partial\nu^2} - \pi^4 K_n(x),$$

with all the derivatives of I_{±ν}(x) on the right-hand sides evaluated at ν = n. Let ψ^{(i)}(·) denote the polygamma function (see Abramowitz and Stegun, 1965). The first five derivatives of I_ν(x) for any real ν are as follows:

$$\frac{\partial I_\nu(x)}{\partial\nu} = I_\nu(x)\log\left(\frac{x}{2}\right) - \left(\frac{x}{2}\right)^\nu\sum_{k=0}^{\infty}\frac{Q_1(\nu+k+1)}{k!}\left(\frac{1}{4}x^2\right)^k,$$

where

$$Q_1(z) = \begin{cases}\psi(z)/\Gamma(z) & \text{if } z > 0,\\[4pt] \pi^{-1}\Gamma(1-z)\left[\psi(1-z)\sin(\pi z) - \pi\cos(\pi z)\right] & \text{if } z \leq 0.\end{cases}$$
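As a minimal numerical check of this first-order formula, the sketch below (our own illustration) evaluates the series via Q₁ and compares it with a central finite difference of `scipy.special.iv`. It assumes ν ≥ 0, so that only the z > 0 branch of Q₁ is needed.

```python
# Series evaluation of dI_nu(x)/dnu via Q1, checked against a central
# finite difference of scipy.special.iv.  Sketch for nu >= 0 only.
import numpy as np
from scipy.special import iv, gamma, digamma, factorial

def div_dnu_series(nu, x, terms=60):
    """dI_nu(x)/dnu from the series above; valid when nu + k + 1 > 0."""
    k = np.arange(terms)
    Q1 = digamma(nu + k + 1) / gamma(nu + k + 1)          # z > 0 branch
    series = np.sum(Q1 / factorial(k) * (0.25 * x**2) ** k)
    return iv(nu, x) * np.log(0.5 * x) - (0.5 * x) ** nu * series

def div_dnu_fd(nu, x, h=1e-6):
    """Central finite-difference benchmark."""
    return (iv(nu + h, x) - iv(nu - h, x)) / (2 * h)

print(div_dnu_series(0.3, 1.5))   # both values should agree closely
print(div_dnu_fd(0.3, 1.5))
```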

$$\frac{\partial^2 I_\nu(x)}{\partial\nu^2} = 2\log\left(\frac{x}{2}\right)\frac{\partial I_\nu(x)}{\partial\nu} - \left[\log\left(\frac{x}{2}\right)\right]^2 I_\nu(x) - \left(\frac{x}{2}\right)^\nu\sum_{k=0}^{\infty}\frac{Q_2(\nu+k+1)}{k!}\left(\frac{1}{4}x^2\right)^k,$$

where

$$Q_2(z) = \begin{cases}\left[\psi'(z) - \psi^2(z)\right]/\Gamma(z) & \text{if } z > 0,\\[4pt] \pi^{-1}\Gamma(1-z)\left\{\pi^2 - \psi'(1-z) - \left[\psi(1-z)\right]^2\right\}\sin(\pi z) + 2\Gamma(1-z)\psi(1-z)\cos(\pi z) & \text{if } z \leq 0,\end{cases}$$

$$\frac{\partial^3 I_\nu(x)}{\partial\nu^3} = 3\log\left(\frac{x}{2}\right)\frac{\partial^2 I_\nu(x)}{\partial\nu^2} - 3\left[\log\left(\frac{x}{2}\right)\right]^2\frac{\partial I_\nu(x)}{\partial\nu} + \left[\log\left(\frac{x}{2}\right)\right]^3 I_\nu(x) - \left(\frac{x}{2}\right)^\nu\sum_{k=0}^{\infty}\frac{Q_3(\nu+k+1)}{k!}\left(\frac{1}{4}x^2\right)^k,$$

where

$$Q_3(z) = \begin{cases}\left[\psi^3(z) - 3\psi(z)\psi'(z) + \psi''(z)\right]/\Gamma(z) & \text{if } z > 0,\\[4pt] \pi^{-1}\Gamma(1-z)\left\{\psi^3(1-z) - 3\psi(1-z)\left[\pi^2 - \psi'(1-z)\right] + \psi''(1-z)\right\}\sin(\pi z)\\ \quad + \Gamma(1-z)\left\{\pi^2 - 3\left[\psi^2(1-z) + \psi'(1-z)\right]\right\}\cos(\pi z) & \text{if } z \leq 0,\end{cases}$$

$$\frac{\partial^4 I_\nu(x)}{\partial\nu^4} = 4\log\left(\frac{x}{2}\right)\frac{\partial^3 I_\nu(x)}{\partial\nu^3} - 6\left[\log\left(\frac{x}{2}\right)\right]^2\frac{\partial^2 I_\nu(x)}{\partial\nu^2} + 4\left[\log\left(\frac{x}{2}\right)\right]^3\frac{\partial I_\nu(x)}{\partial\nu} - \left[\log\left(\frac{x}{2}\right)\right]^4 I_\nu(x) - \left(\frac{x}{2}\right)^\nu\sum_{k=0}^{\infty}\frac{Q_4(\nu+k+1)}{k!}\left(\frac{1}{4}x^2\right)^k,$$

where

$$Q_4(z) = \begin{cases}\left\{-\psi^4(z) + 6\psi^2(z)\psi'(z) - 4\psi(z)\psi''(z) - 3\left[\psi'(z)\right]^2 + \psi'''(z)\right\}/\Gamma(z) & \text{if } z > 0,\\[4pt] \pi^{-1}\Gamma(1-z)\left\{-\psi^4(1-z) + 6\pi^2\psi^2(1-z) - 6\psi^2(1-z)\psi'(1-z) - 4\psi''(1-z)\psi(1-z)\right.\\ \quad\left. - 3\left[\psi'(1-z)\right]^2 + 6\pi^2\psi'(1-z) - \psi'''(1-z) - \pi^4\right\}\sin(\pi z)\\ \quad + \Gamma(1-z)\left\{4\psi^3(1-z) - 4\pi^2\psi(1-z) + 12\psi(1-z)\psi'(1-z) + 4\psi''(1-z)\right\}\cos(\pi z) & \text{if } z \leq 0,\end{cases}$$

and finally,

$$\frac{\partial^5 I_\nu(x)}{\partial\nu^5} = 5\log\left(\frac{x}{2}\right)\frac{\partial^4 I_\nu(x)}{\partial\nu^4} - 10\left[\log\left(\frac{x}{2}\right)\right]^2\frac{\partial^3 I_\nu(x)}{\partial\nu^3} + 10\left[\log\left(\frac{x}{2}\right)\right]^3\frac{\partial^2 I_\nu(x)}{\partial\nu^2}$$
$$- 5\left[\log\left(\frac{x}{2}\right)\right]^4\frac{\partial I_\nu(x)}{\partial\nu} + \left[\log\left(\frac{x}{2}\right)\right]^5 I_\nu(x) - \left(\frac{x}{2}\right)^\nu\sum_{k=0}^{\infty}\frac{Q_5(\nu+k+1)}{k!}\left(\frac{1}{4}x^2\right)^k,$$

where

$$Q_5(z) = \begin{cases}\left\{\psi^5(z) - 10\psi^3(z)\psi'(z) + 10\psi^2(z)\psi''(z) + 15\psi(z)\left[\psi'(z)\right]^2\right.\\ \quad\left. - 5\psi'''(z)\psi(z) - 10\psi'(z)\psi''(z) + \psi^{(iv)}(z)\right\}/\Gamma(z) & \text{if } z > 0,\\[4pt] \pi^{-1}\Gamma(1-z)f_a(z)\sin(\pi z) + \Gamma(1-z)f_b(z)\cos(\pi z) & \text{if } z \leq 0,\end{cases}$$

with

$$f_a(z) = \psi^5(1-z) - 10\pi^2\psi^3(1-z) + 10\psi^3(1-z)\psi'(1-z) + 10\psi^2(1-z)\psi''(1-z)$$
$$+ 15\psi(1-z)\left[\psi'(1-z)\right]^2 + 5\psi(1-z)\psi'''(1-z) + 5\pi^4\psi(1-z)$$
$$- 30\pi^2\psi(1-z)\psi'(1-z) + 10\psi'(1-z)\psi''(1-z) - 10\pi^2\psi''(1-z) + \psi^{(iv)}(1-z),$$

and

$$f_b(z) = -5\psi^4(1-z) + 10\pi^2\psi^2(1-z) - 30\psi^2(1-z)\psi'(1-z)$$
$$- 20\psi(1-z)\psi''(1-z) - 15\left[\psi'(1-z)\right]^2 + 10\pi^2\psi'(1-z) - 5\psi'''(1-z) - \pi^4.$$

5 Moments of the GIG distribution

If X ∼ GIG (ν, δ, γ), its density function will be

$$f_{GIG}(x) = \frac{(\gamma/\delta)^\nu}{2K_\nu(\delta\gamma)}\,x^{\nu-1}\exp\left[-\frac{1}{2}\left(\frac{\delta^2}{x} + \gamma^2 x\right)\right],$$

where K_ν(·) is the modified Bessel function of the third kind and δ, γ ≥ 0, ν ∈ R, x > 0. Two important properties of this distribution are X⁻¹ ∼ GIG(−ν, γ, δ) and $(\gamma/\delta)X \sim GIG(\nu, \sqrt{\gamma\delta}, \sqrt{\gamma\delta})$. For our purposes, the most useful moments of X when δγ > 0 are

$$E\left(X^k\right) = \left(\frac{\delta}{\gamma}\right)^k\frac{K_{\nu+k}(\delta\gamma)}{K_\nu(\delta\gamma)}, \tag{9}$$

$$E(\log X) = \log\left(\frac{\delta}{\gamma}\right) + \frac{\partial}{\partial\nu}\log K_\nu(\delta\gamma). \tag{10}$$
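Because translations between GIG parametrisations are error-prone, it is worth verifying (9) and (10) numerically. The sketch below (our own illustration; the mapping GIG(ν, δ, γ) = (δ/γ) · geninvgauss(p = ν, b = δγ) is our assumption, checked here by simulation) uses SciPy:

```python
# Monte Carlo check of the GIG moments (9)-(10).
import numpy as np
from scipy.special import kv
from scipy.stats import geninvgauss

nu, delta, gamma_ = -1.5, 2.0, 0.8
rng = np.random.default_rng(1)
# Our mapping of GIG(nu, delta, gamma) into SciPy's parametrisation:
x = (delta / gamma_) * geninvgauss.rvs(nu, delta * gamma_, size=500_000,
                                       random_state=rng)

for k in (1, -1, 2):                               # (9) for a few values of k
    theory = (delta / gamma_) ** k * kv(nu + k, delta * gamma_) / kv(nu, delta * gamma_)
    print(k, theory, np.mean(x ** k))

h = 1e-5                                           # (10) via a finite difference
theory_log = np.log(delta / gamma_) + (np.log(kv(nu + h, delta * gamma_))
                                       - np.log(kv(nu - h, delta * gamma_))) / (2 * h)
print(theory_log, np.mean(np.log(x)))
```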

The GIG nests some well-known important distributions, such as the gamma (ν > 0, δ = 0), the reciprocal gamma (ν < 0, γ = 0) and the inverse Gaussian (ν = −1/2). Importantly, all the moments of this distribution are finite, except in the reciprocal gamma case, in which (9) becomes infinite for k ≥ |ν|. A complete discussion of this distribution can be found in Jørgensen (1982), who also presents several useful Gaussian approximations based on the following limits:

$$\sqrt{\delta\gamma}\left(\frac{\gamma x}{\delta}-1\right) \xrightarrow{\;\delta\gamma\to\infty\;} N(0,1),$$

$$\sqrt{\delta\gamma}\,\log\left(\frac{\gamma x}{\delta}\right) \xrightarrow{\;\delta\gamma\to\infty\;} N(0,1),$$

$$\frac{\gamma^2}{2\sqrt{\nu}}\left(x - \frac{2\nu}{\gamma^2}\right) \xrightarrow{\;\nu\to+\infty\;} N(0,1),$$

$$\frac{2(-\nu)^{3/2}}{\delta^2}\left(x + \frac{\delta^2}{2\nu}\right) \xrightarrow{\;\nu\to-\infty\;} N(0,1).$$

References

Aas, K., X. Dimakos, and I. Haff (2005). Risk estimation using the multivariate normal inverse Gaussian distribution. Journal of Risk 8, 39–60.

Abramowitz, M. and A. Stegun (1965). Handbook of Mathematical Functions. New York: Dover Publications.

Barndorff-Nielsen, O. (1977). Exponentially decreasing distributions for the logarithm of particle size. Proc. R. Soc. 353, 401–419.

Barndorff-Nielsen, O. and N. Shephard (2001). Normal modified stable processes. Theory of Probability and Mathematical Statistics 65, 1–19.

Cajigas, J. and G. Urga (2007). A risk management analysis using the AGDCC model with asymmetric multivariate Laplace distribution of innovations. Mimeo, Cass Business School.

Chen, Y., W. Härdle, and S. Jeong (2008). Nonparametric risk management with Generalized Hyperbolic distributions. Journal of the American Statistical Association 103, 910–923.

Jørgensen, B. (1982). Statistical Properties of the Generalized Inverse Gaussian Distribution. New York: Springer-Verlag.

Madan, D. B. and F. Milne (1991). Option pricing with V.G. martingale components. Mathematical Finance 1, 39–55.

Magnus, J. R. and H. Neudecker (1988). Matrix Differential Calculus with Applications in Statistics and Econometrics. Chichester: John Wiley and Sons.

Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika 57, 519–530.

Mencía, J. and E. Sentana (2009). Multivariate location-scale mixtures of normals and mean-variance-skewness portfolio allocation. Mimeo.
