AIMS: AVERAGE INFORMATION MATRIX SPLITTING

Shengxin Zhu*
Laboratory for Intelligent Computing and Financial Technology, Department of Mathematics, Xi'an Jiaotong-Liverpool University, Suzhou, 215123, P.R. China

Tongxiang Gu and Xingping Liu
Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Beijing 100088, P.R. China

Abstract. For linear mixed models whose covariance matrices are not linearly dependent on the variance component parameters, we prove that the average of the observed information and the Fisher information can be split into two parts. The essential part enjoys a simple and computationally friendly formula, while the other part, which accounts for most of the computation, is a random zero matrix and is therefore negligible.

1. Introduction. Many statistical methods require the estimation of unknown (co)variance parameters. The estimate is usually obtained by maximizing a log-likelihood function. In principle, one requires the observed information matrix (the negative Hessian matrix of the log-likelihood) to obtain a maximum likelihood estimator via the Newton-Raphson method [11]. The expected value of the observed information matrix is usually referred to as the Fisher information matrix, or simply the information matrix. It keeps the essential information about the unknown parameters and enjoys a simpler formula, and is therefore widely used in many applications [3]. The resulting algorithm is called the Fisher scoring algorithm, which is widely used in informetrics [13] and is now a standard procedure in computational statistics [7, p.30]. The Fisher scoring algorithm succeeds in simplifying the approximation of the Hessian matrix of the log-likelihood.
Still, evaluating the elements of the Fisher information matrix remains one of the bottlenecks in a log-likelihood maximization procedure, which prohibits the use of the Fisher scoring algorithm for large data sets. In particular, high-throughput technologies in biological science and engineering mean that the size of data sets and of the corresponding statistical models has suddenly increased by several orders of magnitude. Further simplification of the computational procedure is much needed for applications of large-scale statistical models, such as genome-wide association studies, which involve many thousands of parameters to be estimated [22].

(arXiv:1605.07646v4 [stat.ME], 8 May 2020.)
2010 Mathematics Subject Classification. Primary: 65J12, 62P10; Secondary: 65C60, 65F99.
Key words and phrases. Observed information matrix, Fisher information matrix, Newton method, linear mixed model, variance parameter estimation, average information.
This research is supported by the Foundation of LCP (6142A05180501), the Jiangsu Science and Technology Basic Research Program (BK20171237), the Key Program Special Fund of XJTLU (KSF-E-21, KSF-P-02), the Research Development Fund of XJTLU (RDF-2017-02-23), and partially supported by NSFC (Nos. 11771002, 11571047, 11671049, 11671051, 6162003, and 11871339).
* Corresponding author: Shengxin Zhu.

The aim of this short note is to provide a concise mathematical result: the average of the observed information and the Fisher information can be split into two parts; the essential part enjoys a simple formula and is easy to compute, while the other part, which accounts for most of the computation, is a random zero matrix and is thus negligible. Such a splitting and approximation provides a significant reduction in computation. We should mention that the average information idea was proposed in [12] for (co)variance matrices which depend linearly on the variance parameters. It resulted in an efficient breeding algorithm in [6], and was followed by [14].
However, previous results assume that the variance-covariance matrix depends linearly on the underlying variance parameters. Here we prove that similar results can still be obtained even when the variance-covariance matrix does not depend linearly on the underlying variance parameters.

2. Preliminaries. Consider the following widely used linear mixed model [1, 2, 5, 28, 21]:

\[ y = X\tau + Zu + \epsilon, \tag{1} \]

where $y \in \mathbb{R}^{n\times 1}$ is the vector of observations, $\tau \in \mathbb{R}^{p\times 1}$ is the vector of fixed effects, $X \in \mathbb{R}^{n\times p}$ is the design matrix corresponding to the fixed effects, $u \in \mathbb{R}^{b\times 1}$ is the vector of unobserved random effects, $Z \in \mathbb{R}^{n\times b}$ is the design matrix corresponding to the random effects, and $\epsilon \in \mathbb{R}^{n\times 1}$ is the vector of residual errors. The random effects $u$ and the residual errors $\epsilon$ follow multivariate normal distributions such that $E(u) = 0$, $E(\epsilon) = 0$, $u \sim N(0, \sigma^2 G)$, $\epsilon \sim N(0, \sigma^2 R)$, and

\[ \operatorname{var}\begin{pmatrix} u \\ \epsilon \end{pmatrix} = \sigma^2 \begin{pmatrix} G & 0 \\ 0 & R \end{pmatrix}, \tag{2} \]

where $G \in \mathbb{R}^{b\times b}$ and $R \in \mathbb{R}^{n\times n}$. Typically $G$ and $R$ are parameterized matrices with unknown parameters to be estimated. Precisely, suppose that $G = G(\gamma)$, $R = R(\phi)$, and denote $\kappa = (\gamma, \phi)$, $\theta = (\sigma^2, \kappa)$. Estimating our main concern, the variance parameters $\theta$, requires a conceptually simple nonlinear iterative procedure: one has to maximize a residual log-likelihood function of the form [18, p.252]

\[ \ell_R = \mathrm{const} - \frac{1}{2}\left\{ (n-\nu)\log\sigma^2 + \log\det(H) + \log\det(X^T H^{-1} X) + \frac{y^T P y}{\sigma^2} \right\}, \tag{3} \]

where $H = R(\phi) + Z G(\gamma) Z^T$, $\nu = \operatorname{rank}(X)$, and

\[ P = H^{-1} - H^{-1} X (X^T H^{-1} X)^{-1} X^T H^{-1}. \]

Here we suppose $X$ has full rank, so $\nu = p$. The first derivatives of $\ell_R$ are referred to as the scores for the variance parameters $\theta = (\sigma^2, \kappa)^T$ [18, p.252]:

\[ s(\sigma^2) = \frac{\partial \ell_R}{\partial \sigma^2} = -\frac{1}{2}\left\{ \frac{n-\nu}{\sigma^2} - \frac{y^T P y}{\sigma^4} \right\}, \tag{4} \]

\[ s(\kappa_i) = \frac{\partial \ell_R}{\partial \kappa_i} = -\frac{1}{2}\left\{ \operatorname{tr}\left(P \frac{\partial H}{\partial \kappa_i}\right) - \frac{1}{\sigma^2}\, y^T P \frac{\partial H}{\partial \kappa_i} P y \right\}, \tag{5} \]

where $\kappa = (\gamma^T, \phi^T)^T$. We shall denote

\[ S(\theta) = (s(\sigma^2), s(\kappa_1), \ldots, s(\kappa_m))^T. \]

Algorithm 1 (Newton-Raphson method to solve $S(\theta) = 0$):
1: Give an initial guess $\theta_0$
2: for $k = 0, 1, 2, \ldots$ until convergence do
3:   Solve $I_O(\theta_k)\,\delta_k = S(\theta_k)$
4:   $\theta_{k+1} = \theta_k + \delta_k$
5: end for

The negative Hessian of the log-likelihood function is referred to as the observed information matrix, which we denote by $I_O$:

\[ I_O = -\begin{pmatrix}
\frac{\partial^2 \ell_R}{\partial\sigma^2\,\partial\sigma^2} & \frac{\partial^2 \ell_R}{\partial\sigma^2\,\partial\kappa_1} & \cdots & \frac{\partial^2 \ell_R}{\partial\sigma^2\,\partial\kappa_m} \\
\frac{\partial^2 \ell_R}{\partial\kappa_1\,\partial\sigma^2} & \frac{\partial^2 \ell_R}{\partial\kappa_1\,\partial\kappa_1} & \cdots & \frac{\partial^2 \ell_R}{\partial\kappa_1\,\partial\kappa_m} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 \ell_R}{\partial\kappa_m\,\partial\sigma^2} & \frac{\partial^2 \ell_R}{\partial\kappa_m\,\partial\kappa_1} & \cdots & \frac{\partial^2 \ell_R}{\partial\kappa_m\,\partial\kappa_m}
\end{pmatrix}. \tag{6} \]

Given an initial guess $\theta_0$ of the variance parameter, a standard approach to maximizing $\ell_R$, or finding the root of the score equation $S(\theta) = 0$, is the Newton-Raphson method (Algorithm 1), which requires the elements of the observed information matrix. In particular,

\[ I_O(\kappa_i, \kappa_j) = \frac{\operatorname{tr}(P\ddot{H}_{ij}) - \operatorname{tr}(P\dot{H}_i P\dot{H}_j)}{2} + \frac{2\, y^T P\dot{H}_i P\dot{H}_j P y - y^T P\ddot{H}_{ij} P y}{2\sigma^2}, \tag{7} \]

where $\dot{H}_i = \frac{\partial H}{\partial \kappa_i}$ and $\ddot{H}_{ij} = \frac{\partial^2 H}{\partial \kappa_i \partial \kappa_j}$. Each element $I_O(\kappa_i, \kappa_j)$ involves two computationally intensive trace terms, which prohibits the practical use of the (exact) Newton-Raphson method for large data sets.

In practice, the Fisher information matrix, $I = E(I_O)$, is preferred. The elements of the Fisher information matrix have simpler forms than those of the observed information matrix, for example

\[ I(\kappa_i, \kappa_j) = \frac{\operatorname{tr}(P\dot{H}_i P\dot{H}_j)}{2}. \tag{8} \]

The corresponding algorithm is referred to as the Fisher scoring algorithm [13]. Still, the element $I(\kappa_i, \kappa_j)$ of the Fisher information matrix involves the computationally expensive trace term.

When the variance-covariance matrix $H$ depends linearly on the variance parameters, so that $\ddot{H}_{ij} = 0$, researchers noticed that the average information matrix

\[ \frac{I(\kappa_i, \kappa_j) + I_O(\kappa_i, \kappa_j)}{2} = \frac{y^T P\dot{H}_i P\dot{H}_j P y}{2\sigma^2} \tag{9} \]

enjoys a simpler and more computationally friendly formula [6][12]:

\[ I_A = \frac{y^T P\dot{H}_i P\dot{H}_j P y}{2\sigma^2}. \tag{10} \]

For more general covariance matrices with $\ddot{H}_{ij} \neq 0$, we still have the following nice property by the classical matrix splitting [20, p.94].
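The identity (9) for the linear case can be spot-checked numerically. The sketch below is only an illustration, not the paper's code: it builds a toy one-variance-parameter model with $H(\kappa) = I + \kappa Z Z^T$ (so $\dot{H} = ZZ^T$ and $\ddot{H} = 0$); all sizes, names, and the simulated data are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy one-parameter model with H(kappa) = I + kappa * Z Z^T, so that
# Hdot = Z Z^T and Hddot = 0; the construction and names are illustrative.
n, p, b = 25, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, b))
kappa, sigma2 = 0.3, 1.2

H = np.eye(n) + kappa * Z @ Z.T
Hdot = Z @ Z.T
y = X @ np.array([0.5, 1.0]) + np.linalg.cholesky(sigma2 * H) @ rng.normal(size=n)

# P = H^{-1} - H^{-1} X (X^T H^{-1} X)^{-1} X^T H^{-1}
Hinv = np.linalg.inv(H)
HiX = Hinv @ X
P = Hinv - HiX @ np.linalg.solve(X.T @ HiX, HiX.T)

# Fisher element (8), observed element (7) with Hddot = 0, and average (10)
T = np.trace(P @ Hdot @ P @ Hdot)          # expensive trace term
Q = y @ P @ Hdot @ P @ Hdot @ P @ y        # quadratic form, matrix-vector work only
I_fisher = T / 2
I_obs = -T / 2 + Q / sigma2
I_avg = Q / (2 * sigma2)

# Identity (9): averaging cancels the trace terms exactly
print(np.isclose((I_fisher + I_obs) / 2, I_avg))   # True
```

The point of the identity is visible in the code: the average element needs only the quadratic form `Q`, which can be assembled from matrix-vector products with $Py$, while both the Fisher and the observed elements require the trace of a product of large matrices.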
The average information matrix can be split into two parts as follows [25]:

\[ \frac{I(\kappa_i, \kappa_j) + I_O(\kappa_i, \kappa_j)}{2} = \underbrace{\frac{y^T P\dot{H}_i P\dot{H}_j P y}{2\sigma^2}}_{I_A(\kappa_i,\kappa_j)} + \underbrace{\frac{\operatorname{tr}(P\ddot{H}_{ij}) - y^T P\ddot{H}_{ij} P y/\sigma^2}{4}}_{I_Z(\kappa_i,\kappa_j)}. \tag{11} \]

Such a splitting enjoys a nice property, which is presented as our main result.

3. Main result.

Theorem 3.1. Let $I_O$ and $I$ be the observed information matrix and the Fisher information matrix for the residual log-likelihood of the linear mixed model, respectively. Then the average of the observed information matrix and the Fisher information matrix can be split as $\frac{I_O + I}{2} = I_A + I_Z$, such that the expectation of $I_A$ is the Fisher information matrix and $E(I_Z) = 0$.

Such a splitting aims to remove computationally expensive yet negligible terms, so that a Newton-like method is applicable to large data sets involving thousands of fixed and random effects. It keeps the essential information in the observed information matrix. In this sense, $I_A$ is a good data-dependent (through $y$) approximation to the data-independent Fisher information. A quasi-Newton iterative procedure is obtained by replacing $I_O$ with $I_A$ in Algorithm 1.

Proof of the main result.

3.1. Basic lemma.

Lemma 3.2. Let $y \sim N(X\tau, \sigma^2 H)$ be a random variable, where $H$ is a symmetric positive definite matrix. Then $P = H^{-1} - H^{-1}X(X^T H^{-1}X)^{-1}X^T H^{-1}$ is a weighted projection matrix such that
1. $PX = 0$;
2. $PHP = P$;
3. $\operatorname{tr}(PH) = n - \nu$, where $\operatorname{rank}(X) = \nu$;
4. $P\,E(yy^T) = \sigma^2 PH$.

Proof. The first two properties can be verified by direct computation.
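The four properties of Lemma 3.2 are also easy to confirm numerically on a toy instance before proving them. In the sketch below, the sizes and the randomly generated $X$ and $H$ are illustrative assumptions; property 4 is checked through the closed form $E(yy^T) = \sigma^2 H + X\tau\tau^T X^T$, which follows from $y \sim N(X\tau, \sigma^2 H)$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy instances; any full-rank X and symmetric positive definite H will do
# (the sizes and construction below are illustrative assumptions).
n, p, sigma2 = 20, 3, 2.0
X = rng.normal(size=(n, p))
A = rng.normal(size=(n, n))
H = A @ A.T + n * np.eye(n)        # symmetric positive definite

Hinv = np.linalg.inv(H)
HiX = Hinv @ X
P = Hinv - HiX @ np.linalg.solve(X.T @ HiX, HiX.T)

# Property 4 uses E(y y^T) = sigma^2 H + X tau tau^T X^T for y ~ N(X tau, sigma^2 H)
tau = rng.normal(size=p)
Eyy = sigma2 * H + X @ np.outer(tau, tau) @ X.T

print(np.allclose(P @ X, 0))                   # 1. PX = 0
print(np.allclose(P @ H @ P, P))               # 2. PHP = P
print(np.isclose(np.trace(P @ H), n - p))      # 3. tr(PH) = n - nu, rank(X) = p
print(np.allclose(P @ Eyy, sigma2 * P @ H))    # 4. P E(yy^T) = sigma^2 PH
```

Note that property 4 reduces to property 1: the fixed-effect contribution $X\tau\tau^T X^T$ is annihilated because $PX = 0$.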