Statistical Model Evaluation by Generalized Information Criteria

Bias and Variance Reduction Techniques

Genshiro Kitagawa
The Institute of Statistical Mathematics
4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan
[email protected]

Sadanori Konishi
Kyushu University, Graduate School of Mathematics
6-10-1 Hakozaki, Higashi-ku, Fukuoka 812-8581, Japan
[email protected]

1. Introduction

The problem of evaluating the goodness of statistical models is crucially important in various fields of statistical science. Akaike's (1973) information criterion provides a useful tool for evaluating models estimated by the method of maximum likelihood. In recent years, advances in the performance of computers have enabled us to construct complicated models for analyzing data with complex structure, and consequently more efficient criteria are required for model evaluation and selection problems. By extending Akaike's basic idea, several attempts have been made to relax the assumptions imposed on AIC and to obtain information criteria which may be applied to various types of statistical models.

The purpose of the present paper is to propose information criteria which yield more refined results than previously proposed criteria. The use of the bootstrap in model evaluation problems is also investigated from theoretical and practical points of view.

2. Bias Correction for the Log-Likelihood and Information Criteria

Assume that the observations are generated from an unknown "true" distribution function $G(x)$ and that the model is characterized by a density function $f(x)$. In the derivation of AIC (Akaike 1973), the expected log-likelihood $E_Y \log f(Y) = \int \log f(y)\, dG(y)$ is used as the basic measure to evaluate the similarity between two distributions, which is equivalent to the Kullback-Leibler information. In actual situations, $G(x)$ is unknown and only a sample $X = \{X_1, \ldots, X_n\}$ is given.
We then use the log-likelihood $\ell = n \int \log f(x)\, d\hat{G}_n(x) = \sum_{i=1}^{n} \log f(X_i)$ as a natural estimator of ($n$ times) the expected log-likelihood. Here $\hat{G}_n(x)$ is the empirical distribution function.

For a statistical model $f(x|\hat\theta)$ fitted to observed data, the log-likelihood $n^{-1}\ell(\hat\theta) = n^{-1}\sum_{i=1}^{n} \log f(X_i|\hat\theta) \equiv n^{-1}\log f(X|\hat\theta)$ has a positive bias as an estimator of the expected log-likelihood, $E_Y \log f(Y|\hat\theta)$, and it cannot be directly used for model selection. By correcting the bias

$$ b(G) = n E_X \left\{ \frac{1}{n} \log f(X|\hat\theta) - E_Y \log f(Y|\hat\theta) \right\}, \tag{1} $$

an estimator of the expected log-likelihood is obtained in the form

$$ \mathrm{IC} = -2n \left\{ \frac{1}{n} \log f(X|\hat\theta) - \frac{1}{n} \hat{b}(\hat{G}_n) \right\} = -2 \log f(X|\hat\theta) + 2\hat{b}(\hat{G}_n), \tag{2} $$

where $\hat{b}(\hat{G}_n)$ is a bias estimate.

In a general setting, it is extremely difficult to obtain the bias $b(G)$ in closed form. Hence it is usually approximated by an asymptotic bias. Akaike (1973) approximated $b(G)$ by the number of parameters, $b_{\mathrm{AIC}} = m$, and proposed the AIC criterion, $\mathrm{AIC} = -2 \log f(X|\hat\theta_{\mathrm{ML}}) + 2m$, where $\hat\theta_{\mathrm{ML}}$ is the maximum likelihood estimate. On the other hand, Takeuchi (1976) proposed $b_{\mathrm{TIC}} = \mathrm{tr}\{ I(\hat{G}_n) J(\hat{G}_n)^{-1} \}$ as the asymptotic bias of $b(G)$, where $I(\hat{G}_n)$ and $J(\hat{G}_n)$ are respectively the estimates of the Fisher information and expected Hessian matrices.

This method of bias correction for the log-likelihood can be extended to a general model constructed by using an $m$-dimensional statistical functional such as $\hat\theta = T(\hat{G}_n)$. For such a general statistical model, Konishi and Kitagawa (1996) derived the asymptotic bias

$$ b_{\mathrm{GIC}}(G) = \mathrm{tr}\, E_Y \left[ T^{(1)}(Y; G)\, \frac{\partial \log f(Y|T(G))}{\partial \theta'} \right], \tag{3} $$

and proposed GIC (Generalized Information Criterion). Here $T^{(1)}(Y; G)$ is the first derivative of the statistical functional $T$ at $G$, which is usually called the influence function.

The bootstrap method provides an alternative way of bias correction of the log-likelihood, for which the bias $b(G)$ in (1) is estimated by

$$ b_B(\hat{G}_n) = E_{X^*} \left\{ \log f(X^*|\hat\theta(X^*)) - \log f(X|\hat\theta(X^*)) \right\}, \tag{4} $$

where $X^*$ is a bootstrap sample and $\hat\theta(X^*)$ is the estimate computed from it. The resulting criterion is called EIC (Extended Information Criterion; Ishiguro et al. 1997).
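As an illustration of (2) and (4), the following Python sketch compares the AIC penalty $m$ with a Monte Carlo approximation of the bootstrap bias estimate for a normal model fitted by maximum likelihood. The normal model, the sample size, the number of resamples and all function names (`normal_loglik`, `fit_normal_ml`, `bootstrap_bias`) are assumptions chosen for illustration; they do not come from the paper.

```python
import numpy as np


def normal_loglik(x, mu, sigma2):
    """Sum of log N(mu, sigma2) densities over the sample x."""
    n = x.size
    return -0.5 * n * np.log(2.0 * np.pi * sigma2) - 0.5 * np.sum((x - mu) ** 2) / sigma2


def fit_normal_ml(x):
    """Maximum likelihood estimates: sample mean and variance with divisor n."""
    return x.mean(), x.var()


def bootstrap_bias(x, n_boot, rng):
    """Monte Carlo approximation of the bootstrap bias estimate in (4)."""
    n = x.size
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(x, size=n, replace=True)   # bootstrap sample X*
        mu_b, s2_b = fit_normal_ml(xb)             # estimate computed from X*
        # log f(X* | theta(X*)) - log f(X | theta(X*))
        diffs[b] = normal_loglik(xb, mu_b, s2_b) - normal_loglik(x, mu_b, s2_b)
    return diffs.mean()


rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100)       # simulated data (assumption)

mu_hat, s2_hat = fit_normal_ml(x)
loglik = normal_loglik(x, mu_hat, s2_hat)

m = 2                                              # parameters: mean and variance
aic = -2.0 * loglik + 2.0 * m
b_boot = bootstrap_bias(x, n_boot=2000, rng=rng)
eic = -2.0 * loglik + 2.0 * b_boot

print(f"AIC penalty m = {m}, bootstrap bias estimate = {b_boot:.3f}")
print(f"AIC = {aic:.3f}, EIC = {eic:.3f}")
```

When the model is correctly specified, as in this simulation, the bootstrap bias estimate should be close to the number of parameters, so AIC and EIC should roughly agree.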
In actual computation, the bootstrap bias correction term $b_B(\hat{G}_n)$ is approximated by bootstrap resampling.

3. Improvement of Information Criteria

In particular situations of distributional and structural assumptions for models, Sugiura (1978) and Hurvich & Tsai (1989) demonstrated the effectiveness of the bias reduction of AIC. We investigate this bias reduction procedure in a general situation.

The first order bias-corrected information criterion can be generally denoted by $\log f(X|\hat\theta) - b_1(\hat{G}_n)$, where $b_1(\hat{G}_n)$ is the first order bias correction term such as (3). We consider the higher order bias correction for the information criteria. The bias of the first order bias-corrected log-likelihood as an estimate of the expected log-likelihood is given by

$$ E_X\left[ \log f(X|\hat\theta) - b_1(\hat{G}_n) - n E_Y \log f(Y|\hat\theta) \right] = E_X\left[ \log f(X|\hat\theta) - n E_Y \log f(Y|\hat\theta) \right] - E_X\left[ b_1(\hat{G}_n) \right]. \tag{5} $$

The first term on the right-hand side is the bias of the log-likelihood and may be expanded as

$$ b(G) = E_X\left[ \log f(X|\hat\theta) - n E_Y \log f(Y|\hat\theta) \right] = b_1(G) + \frac{1}{n} b_2(G) + O(n^{-2}), \tag{6} $$

where $b_1(G)$ is the first order bias correction term. The expected value of $b_1(\hat{G}_n)$ can also be expanded as

$$ E_X\left[ b_1(\hat{G}_n) \right] = b_1(G) + \frac{1}{n} \Delta b_1(G) + O(n^{-2}). \tag{7} $$

Hence, the bias of the first order bias-corrected log-likelihood is

$$ E_X\left[ \log f(X|\hat\theta) - b_1(\hat{G}_n) - n E_Y \log f(Y|\hat\theta) \right] = \frac{1}{n}\left\{ b_2(G) - \Delta b_1(G) \right\} + O(n^{-2}), \tag{8} $$

where $b_2(G) - \Delta b_1(G)$ is given by

$$
\begin{aligned}
b_2(G) - \Delta b_1(G) = \frac{1}{2} \Bigg\{ & \sum_{p=1}^{m} \int T_p^{(2)}(x, x; G)\, dG(x) \int \frac{\partial \log f(x|T(G))}{\partial \theta_p}\, dG(x) \\
& + \sum_{p=1}^{m} \sum_{q=1}^{m} \int T_p^{(1)}(x; G)\, T_q^{(1)}(x; G)\, dG(x) \int \frac{\partial^2 \log f(x|T(G))}{\partial \theta_p \partial \theta_q}\, dG(x) \\
& - \sum_{p=1}^{m} \sum_{q=1}^{m} \int \frac{\partial^2 \log f(x|T(G))}{\partial \theta_p \partial \theta_q}\, T_p^{(1)}(x; G)\, T_q^{(1)}(x; G)\, dG(x) \\
& - \sum_{p=1}^{m} \int \frac{\partial \log f(x|T(G))}{\partial \theta_p} \left\{ T_p^{(2)}(x, x; G) - 2 T_p^{(1)}(x; G) \right\} dG(x) \Bigg\}.
\end{aligned}
\tag{9}
$$

The second order bias-corrected information criterion is then defined by

$$ \mathrm{GIC}_2 = -2 \log f(X|\hat\theta) + 2 \left\{ b_1(\hat{G}_n) + \frac{1}{n} \left( b_2(\hat{G}_n) - \Delta b_1(\hat{G}_n) \right) \right\}. \tag{10} $$

[Example] A Bayesian predictive distribution is defined by $h(z|X) = \int f(z|\theta)\, \pi(\theta|X)\, d\theta$, where $\pi(\theta)$ and $\pi(\theta|X)$ are the prior and the posterior distribution, respectively.
By using the Laplace method, the first order bias $b_1(\hat{G}_n)$ is given by $\mathrm{tr}\{ I(\hat{G}_n) J(\hat{G}_n)^{-1} \}$. Therefore the second order bias term for the Bayesian model is obtained from

$$ \frac{1}{n} b_2(G) = E_X\left[ \log h(X|X) - \mathrm{tr}\{ I(\hat{G}_n) J(\hat{G}_n)^{-1} \} - n \int \log h(z|X)\, dG(z) \right]. \tag{11} $$

For a model with complex structure, it is more practical to obtain the second order bias correction term by bootstrapping

$$ E_{X^*}\left[ \log h(X^*|X^*) - \mathrm{tr}\{ I(\hat{G}_n^*) J(\hat{G}_n^*)^{-1} \} - \log h(X|X^*) \right], \tag{12} $$

where $X^*$ is the bootstrap sample and $\hat{G}_n^*$ is the empirical distribution function based on $X^*$.

4. Automatic Variance Reduction Technique in Bootstrap Simulation

A practically important problem with the bootstrap method for model selection is the reduction of the variance of the bias estimate. If the variance in the bootstrap simulation is large, a large number of bootstrap resamples is required. The variance of the bootstrap estimate of the bias defined in (4) can be reduced automatically without any analytical arguments (Konishi and Kitagawa 1996, Ishiguro et al. 1997). We give the theoretical justification for statistical models estimated by the functional approach.

Let $D(X; G) = \log f(X|\hat\theta) - n E_Y[\log f(Y|\hat\theta)]$. Then $D(X; G)$ can be decomposed into

$$ D(X; G) = D_1(X; G) + D_2(X; G) + D_3(X; G), \tag{13} $$

where

$$
\begin{aligned}
D_1(X; G) &= \log f(X|\hat\theta) - \log f(X|T(G)), \\
D_2(X; G) &= \log f(X|T(G)) - n E_Y[\log f(Y|T(G))], \\
D_3(X; G) &= n E_Y[\log f(Y|T(G))] - n E_Y[\log f(Y|\hat\theta)].
\end{aligned}
$$

For a general estimator defined by a statistical functional $\hat\theta = T(\hat{G}_n)$, it can be shown that the bootstrap estimate of $E_X[D_1 + D_3]$ is the same as that of $E_X[D]$, but $\mathrm{Var}\{D\} = O(n)$ and $\mathrm{Var}\{D_1 + D_3\} = O(1)$. Therefore by estimating the bias by

$$ b_B(\hat{G}_n) = E_{X^*}[D_1 + D_3], \tag{14} $$

a significant reduction of the variance can be achieved for any estimator defined by a statistical functional, especially for large $n$.
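The effect of dropping $D_2$ can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses a normal model fitted by maximum likelihood and the bootstrap analogue of the decomposition (13), in which $G$ is replaced by $\hat{G}_n$, $T(G)$ by $\hat\theta$, $X$ by $X^*$, and $n E_Y[\log f(Y|\theta)]$ by $\log f(X|\theta)$. The function name `loglik`, the sample size and the number of resamples are hypothetical choices.

```python
import numpy as np


def loglik(x, mu, s2):
    """Sum of log N(mu, s2) densities over the sample x."""
    return -0.5 * x.size * np.log(2.0 * np.pi * s2) - 0.5 * np.sum((x - mu) ** 2) / s2


rng = np.random.default_rng(1)
x = rng.normal(size=100)                  # simulated data (assumption)
mu_hat, s2_hat = x.mean(), x.var()        # ML estimates from the original sample

n_boot = 2000
d_full = np.empty(n_boot)                 # bootstrap replicates of D = D1 + D2 + D3
d_13 = np.empty(n_boot)                   # bootstrap replicates of D1 + D3

for b in range(n_boot):
    xb = rng.choice(x, size=x.size, replace=True)   # bootstrap sample X*
    mu_b, s2_b = xb.mean(), xb.var()                # ML estimates from X*
    d1 = loglik(xb, mu_b, s2_b) - loglik(xb, mu_hat, s2_hat)
    d2 = loglik(xb, mu_hat, s2_hat) - loglik(x, mu_hat, s2_hat)
    d3 = loglik(x, mu_hat, s2_hat) - loglik(x, mu_b, s2_b)
    d_full[b] = d1 + d2 + d3
    d_13[b] = d1 + d3

print(f"mean of D       : {d_full.mean():.3f}, variance: {d_full.var():.3f}")
print(f"mean of D1 + D3 : {d_13.mean():.3f}, variance: {d_13.var():.3f}")
```

Both averages estimate the same bias, because the bootstrap expectation of the $D_2$ term is zero, but the replicates of $D_1 + D_3$ should show a much smaller variance, so fewer resamples are needed for a given accuracy.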
