The Copula Information Criterion
Dept. of Math., University of Oslo. Statistical Research Report No. 7, ISSN 0806–3842. July 2008.

THE COPULA INFORMATION CRITERION

STEFFEN GRØNNEBERG AND NILS LID HJORT

Abstract. The arguments leading to the classical AIC do not hold for parametric copula models fitted by the pseudo-maximum-likelihood procedure. We derive a proper correction and name it the Copula Information Criterion (CIC). The CIC is, like the copula itself, invariant to the underlying marginal distributions, but it is fundamentally different from the AIC formula erroneously used by practitioners. We further observe that bias-correction-based information criteria do not exist for a large class of commonly used copulae, such as the Gumbel or Joe copulae. This motivates the investigation of median unbiased information criteria in the fully parametric setting, which could then be ported over to the current, more complex, semi-parametric setting.

Date: 25th July 2008.
Key words and phrases. AIC, copulae, model selection.

Contents
1. Introduction and summary
2. Approximation theorems for the Kullback–Leibler divergence
 2.1. Infinite expectation of $q_n$ and $r_n$ for certain copula models
 2.2. AIC-like formula
 2.3. TIC-like formula
3. Illustration
Appendix A. Mathematical proofs
References

1. Introduction and summary

Let $X_i = (X_{i,(1)}, X_{i,(2)}, \ldots, X_{i,(d)})$ be i.i.d. from some joint distribution, for which we intend to employ one or more parametric copulae. Letting $C_\theta(v)$ be such a copula, with density $c_\theta(v)$ on $[0,1]^d$, in terms of some parameter $\theta$ of length $p$, the pseudo-log-likelihood function is
\[
\ell_n(\theta) = \sum_{i=1}^{n} \log c_\theta(\hat V_i)
\]
in terms of the estimated and transformed marginals, say
\[
\hat V_i = (\hat V_{i,(1)}, \hat V_{i,(2)}, \ldots, \hat V_{i,(d)})
  = \bigl(F_n^{(1)}(X_{i,(1)}), F_n^{(2)}(X_{i,(2)}), \ldots, F_n^{(d)}(X_{i,(d)})\bigr),
\]
where the traditional choices are
\[
F_n^{(k)}(x) = \frac{1}{n+1} \sum_{i=1}^{n} I\{X_{i,(k)} \le x\}.
\]
In particular, the $\hat V_i$ lie strictly inside the unit hypercube. The maximizer $\hat\theta$ of $\ell_n(\cdot)$ is called the maximum pseudo-likelihood estimator. Often in copula applications one uses
\[
\mathrm{AIC}^\star = 2\ell_{n,\max} - 2\,\mathrm{length}(\theta)
\tag{1}
\]
as a model selection criterion, with $\ell_{n,\max} = \ell_n(\hat\theta)$ being the maximum pseudo-likelihood, in analogy with the traditional Akaike information criterion $\mathrm{AIC} = 2\ell^{\#}_{n,\max} - 2\,\mathrm{length}(\theta)$, where $\ell^{\#}_{n,\max}$ is the usual maximized log-likelihood of a fully parametric model. One computes this $\mathrm{AIC}^\star$ score for each candidate model and in the end chooses the model with the highest score.

This cannot be quite correct, however, as the arguments underlying the derivation of the traditional AIC do not apply here, since the $\ell_n(\cdot)$ at work here is not a proper log-likelihood function for a model. In other words, the $\mathrm{AIC}^\star$ formula above ignores the noise inherent in the transformation step that takes $X_i$ to $\hat V_i$. The $\mathrm{AIC}^\star$ formula would be appropriate only if we could use $U^\circ_{i,(k)} = F^\circ_{(k)}(X_{i,(k)})$, with known marginals $F^\circ_{(k)}$.

So this paper is about the appropriate modifications necessary to provide the correct penalisation for complexity. We shall see that the result is the Copula Information Criterion
\[
\mathrm{CIC} = 2\ell_{n,\max} - 2(\hat p^* + \hat q^* + \hat r^*)
\tag{2}
\]
with a formula for $\hat p^* + \hat q^* + \hat r^*$ different from (and more complicated than) merely $\mathrm{length}(\theta)$. These quantities even vary non-trivially with the model parameter, in clear contrast with $\mathrm{length}(\theta)$, which is invariant to the actual value of $\theta$. Surprisingly, we will see that there does not exist any (first order) bias-correction-based information criterion for the maximum pseudo-likelihood estimator for a large class of copulae. This motivates the investigation of a median unbiased information criterion in parametric and semi-parametric estimation; see the further discussion in Section 2.1.
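To make the pseudo-likelihood setup above concrete, here is a minimal numerical sketch, not from the paper itself: it computes the pseudo-observations $\hat V_i$ via rescaled ranks and the naive $\mathrm{AIC}^\star$ of (1) for a bivariate Gaussian copula fitted by maximum pseudo-likelihood. All function names (`pseudo_observations`, `gaussian_copula_loglik`) are our own illustrative choices.

```python
# A minimal sketch (illustrative, not the authors' code): pseudo-observations
# and the naive AIC* of display (1) for a bivariate Gaussian copula fitted
# by maximum pseudo-likelihood.
import numpy as np
from scipy.stats import norm, rankdata
from scipy.optimize import minimize_scalar

def pseudo_observations(X):
    """Rescaled empirical ranks F_n(x) = rank / (n + 1), columnwise."""
    n = X.shape[0]
    return np.apply_along_axis(rankdata, 0, X) / (n + 1)

def gaussian_copula_loglik(rho, V):
    """Pseudo-log-likelihood ell_n(rho) of the bivariate Gaussian copula."""
    x, y = norm.ppf(V[:, 0]), norm.ppf(V[:, 1])
    return np.sum(-0.5 * np.log(1 - rho**2)
                  + (2 * rho * x * y - rho**2 * (x**2 + y**2))
                  / (2 * (1 - rho**2)))

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1, .6], [.6, 1]], size=500)
V = pseudo_observations(X)            # the hat-V_i, strictly inside (0,1)^2
fit = minimize_scalar(lambda r: -gaussian_copula_loglik(r, V),
                      bounds=(-0.99, 0.99), method="bounded")
ell_max = -fit.fun                    # maximum pseudo-likelihood ell_{n,max}
aic_star = 2 * ell_max - 2 * 1        # AIC* of (1); length(theta) = 1 here
```

Note that dividing ranks by $n+1$ rather than $n$ keeps every $\hat V_i$ strictly inside the unit hypercube, so `norm.ppf` never returns $\pm\infty$.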
We derive the theoretical basis for the Copula Information Criterion in Section 2. As in the fully parametric case, we get different results depending on whether or not we assume that the parametric model is in fact correct. Subsection 2.2 assumes this and results in an AIC-like formula, while Subsection 2.3 does not and results in a TIC-like formula; see Claeskens & Hjort (2008) for the definitions and derivations of the AIC and TIC formulae in the fully parametric case. Finally, we give a small numerical illustration of the CIC in Section 3 and conclude with the needed mathematical results in Appendix A.

2. Approximation theorems for the Kullback–Leibler divergence

We start out considering
\[
A_n(\theta) = n^{-1}\ell_n(\theta) = n^{-1}\sum_{i=1}^{n} \log c_\theta(\hat V_i),
\]
the pseudo-log-likelihood normalized by sample size. We postulate that the true (but unknown) data-generating distribution has a density $f^\circ$. We then have the unique decomposition
\[
f^\circ(x_1, x_2, \ldots, x_d)
  = c^\circ\bigl(F_1^\circ(x_1), F_2^\circ(x_2), \ldots, F_d^\circ(x_d)\bigr)
    \prod_{i=1}^{d} f_i^\circ(x_i).
\]
This motivates naming $c^\circ$ the true data-generating copula and, more traditionally, naming the $f_i^\circ$ the true data-generating marginal densities. There is convergence almost surely
\[
A_n(\theta) \xrightarrow[n\to\infty]{\mathrm{a.s.}} A(\theta)
  = \int_{[0,1]^d} c^\circ(v)\log c_\theta(v)\,dv
\]
for each $\theta$. Let $\theta_0$ be the least false value for the family, i.e. the maximizer of $A(\cdot)$, assumed to be unique and an inner point of the parameter space. This is also the minimizer of the Kullback–Leibler distance
\[
\mathrm{KL}(c^\circ, c_\theta) = \int_{[0,1]^d} c^\circ \log\frac{c^\circ}{c_\theta}\,dv
\]
from the true to the parametric copula. For the following result, which is proved in Genest et al. (1995), let
\[
U_n = \frac{\partial A_n(\theta_0)}{\partial\theta}
    = n^{-1}\frac{\partial \ell_n(\theta_0)}{\partial\theta},
\]
the normalized pseudo-score function, evaluated at $\theta_0$. Also, let
\[
\Sigma = \mathcal I + \mathrm{Var}\Biggl(\sum_{i=1}^{d}\int_{[0,1]^d}
  \frac{\partial\phi(v,\theta_0)}{\partial v_i}\,
  \bigl(I\{\xi_i \le v_i\} - v_i\bigr)\,dC^\circ(v)\Biggr),
\]
where $\mathcal I$ is the information matrix
\[
\mathcal I = \mathrm E\,\phi(\xi,\theta_0)\phi(\xi,\theta_0)^t,
\]
and where $\phi(\cdot,\theta) = (\partial/\partial\theta)\log c(\cdot,\theta)$ and $\xi \sim C^\circ$.

Lemma 1. Given the regularity assumptions on $\{c_\theta : \theta\in\Theta\}$ stated in Genest et al. (1995), and somewhat more explicitly in Tsukahara (2005), we have
\[
\sqrt n\,U_n \xrightarrow[n\to\infty]{W} U \sim N_p(0,\Sigma).
\]

We shall also need the symmetric matrix
\[
J = -A''(\theta_0) = -\int_{[0,1]^d} c^\circ(v)\,
    \frac{\partial^2\log c_{\theta_0}(v)}{\partial\theta\,\partial\theta^t}\,dv,
\]
assumed to be of full rank. A useful random process is now
\[
H_n(s) = n\bigl\{A_n(\theta_0 + s/\sqrt n) - A_n(\theta_0)\bigr\}.
\]
It is defined for those $s\in\mathbb R^p$ for which $\theta_0 + s/\sqrt n$ is inside the parameter region; in particular, for any $s\in\mathbb R^p$, $H_n(s)$ is defined for all large $n$.

Lemma 2. We have
\[
H_n(s) \xrightarrow[n\to\infty]{W} H(s) = s^t U - \tfrac12\, s^t J s,
\]
where $U$ is the limit of $\sqrt n\,U_n$ and where the convergence takes place inside each Skorokhod space $D([-a,a]^p)$.

Proof. Essentially, a Taylor expansion demonstrates
\[
H_n(s) = s^t \sqrt n\,U_n - \tfrac12\, s^t J_n s + o_P(1),
\]
where $J_n \xrightarrow[n\to\infty]{\mathrm{a.s.}} J$ is the empirical version of $J$. The result then follows from Lemma 1. □

The first consequence of note is the limiting distribution of the maximum pseudo-likelihood estimator. From continuity of the argmax functional,
\[
M_n = \operatorname{argmax}(H_n) \xrightarrow[n\to\infty]{W} M = \operatorname{argmax}(H).
\]
But this is the same as
\[
\sqrt n(\hat\theta - \theta_0) \xrightarrow[n\to\infty]{W}
  J^{-1}U \sim N_p(0,\, J^{-1}\Sigma J^{-1}).
\]
This result duplicates that of Genest et al. (1995) and the slight extension given by Chen & Fan (2005) allowing for model mis-specification.
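As a quick plausibility check of this sandwich limit, not part of the paper, one can refit the model over many simulated samples and inspect the spread of $\sqrt n(\hat\rho - \rho_0)$. The sketch below reuses the hypothetical helpers `pseudo_observations` and `gaussian_copula_loglik` from the earlier sketch.

```python
# A Monte Carlo sketch (illustrative only) of the sandwich limit
# sqrt(n)(theta_hat - theta_0) -> N(0, J^{-1} Sigma J^{-1}).
# Assumes pseudo_observations and gaussian_copula_loglik from the
# earlier sketch are in scope.
import numpy as np
from scipy.optimize import minimize_scalar

def fit_rho(X):
    """Maximum pseudo-likelihood estimate of the Gaussian-copula parameter."""
    V = pseudo_observations(X)
    res = minimize_scalar(lambda r: -gaussian_copula_loglik(r, V),
                          bounds=(-0.99, 0.99), method="bounded")
    return res.x

rng = np.random.default_rng(1)
n, rho0 = 400, 0.6
cov = [[1.0, rho0], [rho0, 1.0]]
rho_hats = np.array([fit_rho(rng.multivariate_normal([0, 0], cov, size=n))
                     for _ in range(200)])
# Empirical stand-ins for the sandwich s.d. (J^{-1} Sigma J^{-1})^{1/2}
# and for the vanishing bias:
print(np.sqrt(n) * rho_hats.std(), np.sqrt(n) * (rho_hats.mean() - rho0))
```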
Secondly, we investigate the actual Kullback–Leibler divergence from the true model to the one obtained from fitting the parametric family. However, we model the marginal distributions fully non-parametrically through the empirical distribution functions, which are strongly consistent under any possible set of true marginal distributions. As long as we confine ourselves to models estimated through the maximum pseudo-likelihood paradigm, we are only interested in the Kullback–Leibler divergence $\mathrm{KL}(c^\circ, c(\cdot,\hat\theta))$ between the true data-generating copula $c^\circ$ and the fitted parametric part of our model $c(\cdot,\hat\theta)$, given by
\[
\mathrm{KL}(c^\circ, c(\cdot,\hat\theta))
 = \int_{[0,1]^d} c^\circ \log c^\circ\,dv
 - \int_{[0,1]^d} c^\circ(v)\log c(v,\hat\theta)\,dv.
\]
It is rather difficult (but possible) to estimate the first term from data, but we may ignore it, since it is common to all parametric families. For the purposes of model selection it therefore suffices to estimate the second term, which is $A(\hat\theta)$. We now examine the estimator $A_n(\hat\theta) = n^{-1}\ell_{n,\max}$ vis-à-vis the target $A(\hat\theta)$. Let
\[
Z_n = n\{A_n(\hat\theta) - A_n(\theta_0)\} - n\{A(\hat\theta) - A(\theta_0)\}.
\]
Some re-arrangement shows that
\[
A_n(\hat\theta) - A(\hat\theta) = n^{-1}Z_n + A_n(\theta_0) - A(\theta_0).
\tag{3}
\]
Also,
\[
Z_n = H_n(M_n) + \tfrac12\, n(\hat\theta - \theta_0)^t J (\hat\theta - \theta_0) + o_P(1),
\]
in which we define the stochastically significant part as $p_n$, giving rise to
\[
p_n := H_n(M_n) + \tfrac12\, n(\hat\theta - \theta_0)^t J (\hat\theta - \theta_0)
\xrightarrow[n\to\infty]{W} H(M) + \tfrac12\, U^t J^{-1} U = U^t J^{-1} U =: p.
\]
We have
\[
p^* = \mathrm E\,p = \mathrm E\,U^t J^{-1} U = \operatorname{Tr}(J^{-1}\Sigma)
 = \operatorname{Tr}\bigl(J^{-1}\mathcal I\bigr)
 + \operatorname{Tr}\Biggl[J^{-1}\,\mathrm{Var}\Biggl(\sum_{i=1}^{d}\int_{[0,1]^d}
   \frac{\partial\phi(v,\theta_0)}{\partial v_i}\,
   \bigl(I\{\xi_i \le v_i\} - v_i\bigr)\,dC^\circ(v)\Biggr)\Biggr].
\]

Remark 3. Note that $p^* \ge 0$, as in the standard parametric case. In the purely parametric case we have $\mathrm E\,A_n(\theta_0) - A(\theta_0) = 0$, and so the standard AIC argument simply ends here (modulo stating a reasonable empirical estimate $\hat p^*$ of $p^*$), giving
\[
\mathrm{AIC} = 2\ell_{n,\max} - 2\hat p^*
\]
as an asymptotic bias correction for the estimated (relevant part of the) Kullback–Leibler divergence.
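For intuition about the first trace term, here is a rough numerical sketch, again not from the paper: it plugs the fitted Gaussian copula from the earlier sketches into empirical versions of $\mathcal I$ and $J$ obtained by numerical differentiation, giving the TIC-style term $\operatorname{Tr}(J^{-1}\mathcal I)$. The second, rank-induced variance term in $\Sigma$ is deliberately omitted, so this is not the full $p^*$.

```python
# Rough numerical sketch (illustrative only) of the first trace term
# Tr(J^{-1} I) in p*, for the one-parameter Gaussian copula of the earlier
# sketches; the rank-induced variance term of Sigma is omitted here.
# Assumes pseudo_observations, fit_rho and the simulated data X exist.
import numpy as np
from scipy.stats import norm

def per_obs_loglik(r, V):
    """Per-observation log-density log c_r(V_i) of the Gaussian copula."""
    x, y = norm.ppf(V[:, 0]), norm.ppf(V[:, 1])
    return (-0.5 * np.log(1 - r**2)
            + (2 * r * x * y - r**2 * (x**2 + y**2)) / (2 * (1 - r**2)))

def per_obs_score(r, V, eps=1e-5):
    """Numerical score phi(V_i, r) = d/dr log c_r(V_i)."""
    return (per_obs_loglik(r + eps, V) - per_obs_loglik(r - eps, V)) / (2 * eps)

rho_hat = fit_rho(X)
V = pseudo_observations(X)
phi = per_obs_score(rho_hat, V)
I_hat = np.mean(phi**2)               # empirical information matrix I
h = 1e-4                              # J = -d/dr of the mean score
J_hat = -(per_obs_score(rho_hat + h, V).mean()
          - per_obs_score(rho_hat - h, V).mean()) / (2 * h)
print(I_hat / J_hat)                  # Tr(J^{-1} I) in one dimension
```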