9. Linear Models and Regression


9. Linear models and regression

A.F.M. Smith

Objective: To illustrate the Bayesian approach to fitting normal and generalized linear models.

Recommended reading

• Lindley, D.V. and Smith, A.F.M. (1972). Bayes estimates for the linear model (with discussion). Journal of the Royal Statistical Society B, 34, 1–41.
• Broemeling, L.D. (1985). Bayesian Analysis of Linear Models. Marcel Dekker.
• Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2003). Bayesian Data Analysis, Chapter 8.
• Wiper, M.P., Pettit, L.I. and Young, K.D.S. (2000). Bayesian inference for a Lanchester type combat model. Naval Research Logistics, 42, 609–633.

Introduction: the multivariate normal distribution

Definition 22. A random variable $X = (X_1, \ldots, X_k)^T$ is said to have a multivariate normal distribution with mean $\mu$ and variance-covariance matrix $\Sigma$ if
$$f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right) \quad \text{for } x \in \mathbb{R}^k.$$
In this case, we write $X \mid \mu, \Sigma \sim \mathcal{N}(\mu, \Sigma)$.

The following properties of the multivariate normal distribution are well known.

i. Any subset of $X$ has a (multivariate) normal distribution.
ii. Any linear combination $\sum_{i=1}^k \alpha_i X_i$ is normally distributed.
iii. If $Y = a + BX$ is a linear transformation of $X$, then $Y \mid \mu, \Sigma \sim \mathcal{N}(a + B\mu, B \Sigma B^T)$.
iv. If
$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \Big| \, \mu, \Sigma \sim \mathcal{N}\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right),$$
then the conditional density of $X_1$ given $X_2 = x_2$ is
$$X_1 \mid x_2, \mu, \Sigma \sim \mathcal{N}\left( \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2), \; \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right).$$

The multivariate normal likelihood function

Suppose that we observe a sample $x = (x_1, \ldots, x_n)$ of data from $\mathcal{N}(\mu, \Sigma)$. Then the likelihood function is given by
$$l(\mu, \Sigma \mid x) = \frac{1}{(2\pi)^{nk/2} |\Sigma|^{n/2}} \exp\left( -\frac{1}{2} \sum_{i=1}^n (x_i - \mu)^T \Sigma^{-1} (x_i - \mu) \right)$$
$$\propto \frac{1}{|\Sigma|^{n/2}} \exp\left( -\frac{1}{2} \left[ \sum_{i=1}^n (x_i - \bar{x})^T \Sigma^{-1} (x_i - \bar{x}) + n (\mu - \bar{x})^T \Sigma^{-1} (\mu - \bar{x}) \right] \right)$$
$$\propto \frac{1}{|\Sigma|^{n/2}} \exp\left( -\frac{1}{2} \left[ \mathrm{tr}\left( S \Sigma^{-1} \right) + n (\mu - \bar{x})^T \Sigma^{-1} (\mu - \bar{x}) \right] \right),$$
where $\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$, $S = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T$ and $\mathrm{tr}(M)$ represents the trace of the matrix $M$.

It is possible to carry out Bayesian inference with conjugate priors for $\mu, \Sigma$. We shall consider two cases which reflect different levels of knowledge about the variance-covariance matrix $\Sigma$.
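The factorization above shows that the likelihood depends on the data only through $\bar{x}$ and $S$. As a computational aside, the following minimal sketch (in Python with NumPy; the data and parameter values are invented purely for illustration) computes these sufficient statistics and evaluates the log-likelihood in the form given above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: n observations from a k-dimensional normal (illustrative values only)
n, k = 50, 3
mu_true = np.array([1.0, -2.0, 0.5])
Sigma_true = np.array([[2.0, 0.3, 0.0],
                       [0.3, 1.0, 0.2],
                       [0.0, 0.2, 0.5]])
x = rng.multivariate_normal(mu_true, Sigma_true, size=n)

# Sufficient statistics: sample mean x_bar and matrix of sums of squares S
x_bar = x.mean(axis=0)
S = (x - x_bar).T @ (x - x_bar)

def mvn_loglik(mu, Sigma, x_bar, S, n):
    """Log-likelihood l(mu, Sigma | x) written in terms of (x_bar, S),
    following the factorization above."""
    k = len(mu)
    Sigma_inv = np.linalg.inv(Sigma)
    quad = n * (mu - x_bar) @ Sigma_inv @ (mu - x_bar)
    return (-0.5 * n * k * np.log(2 * np.pi)
            - 0.5 * n * np.log(np.linalg.det(Sigma))
            - 0.5 * (np.trace(S @ Sigma_inv) + quad))

print(mvn_loglik(mu_true, Sigma_true, x_bar, S, n))
```

The same two statistics, $\bar{x}$ and $S$, drive all of the conjugate updates that follow.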
Conjugate Bayesian inference for the multivariate normal distribution I: $\Sigma = \frac{1}{\phi} C$

Firstly, consider the case where the variance-covariance matrix is known up to a constant, i.e. $\Sigma = \frac{1}{\phi} C$ where $C$ is a known matrix. Then we have $X \mid \mu, \phi \sim \mathcal{N}\left( \mu, \frac{1}{\phi} C \right)$ and the likelihood function is
$$l(\mu, \phi \mid x) \propto \phi^{nk/2} \exp\left( -\frac{\phi}{2} \left[ \mathrm{tr}\left( S C^{-1} \right) + n (\mu - \bar{x})^T C^{-1} (\mu - \bar{x}) \right] \right).$$
Analogous to the univariate case, it can be seen that a multivariate normal-gamma prior distribution is conjugate.

The multivariate normal-gamma distribution

We say that $\mu, \phi$ have a multivariate normal-gamma prior with parameters $m, V^{-1}, \frac{a}{2}, \frac{b}{2}$ if
$$\mu \mid \phi \sim \mathcal{N}\left( m, \frac{1}{\phi} V \right), \qquad \phi \sim \mathcal{G}\left( \frac{a}{2}, \frac{b}{2} \right).$$
In this case, we write $\mu, \phi \sim \mathcal{NG}\left( m, V^{-1}, \frac{a}{2}, \frac{b}{2} \right)$.

The marginal distribution of $\mu$

In this case, the marginal distribution of $\mu$ is a multivariate, non-central t distribution.

Definition 23. A ($k$-dimensional) random variable $T = (T_1, \ldots, T_k)$ has a multivariate t distribution with parameters $d, \mu_T, \Sigma_T$ if
$$f(t) = \frac{\Gamma\left( \frac{d+k}{2} \right)}{(\pi d)^{k/2} |\Sigma_T|^{1/2} \Gamma\left( \frac{d}{2} \right)} \left[ 1 + \frac{1}{d} (t - \mu_T)^T \Sigma_T^{-1} (t - \mu_T) \right]^{-\frac{d+k}{2}}.$$
In this case, we write $T \sim \mathcal{T}(\mu_T, \Sigma_T, d)$.

The following theorem gives the density of $\mu$.

Theorem 34. Let $\mu, \phi \sim \mathcal{NG}\left( m, V^{-1}, \frac{a}{2}, \frac{b}{2} \right)$. Then the marginal density of $\mu$ is $\mu \sim \mathcal{T}\left( m, \frac{b}{a} V, a \right)$.

Proof.
$$p(\mu) = \int_0^\infty p(\mu, \phi) \, d\phi = \int_0^\infty p(\mu \mid \phi) \, p(\phi) \, d\phi$$
$$= \int_0^\infty \frac{\phi^{k/2}}{(2\pi)^{k/2} |V|^{1/2}} \exp\left( -\frac{\phi}{2} (\mu - m)^T V^{-1} (\mu - m) \right) \frac{\left( \frac{b}{2} \right)^{a/2}}{\Gamma\left( \frac{a}{2} \right)} \phi^{\frac{a}{2} - 1} \exp\left( -\frac{b}{2} \phi \right) d\phi$$
$$= \frac{\left( \frac{b}{2} \right)^{a/2}}{(2\pi)^{k/2} |V|^{1/2} \Gamma\left( \frac{a}{2} \right)} \int_0^\infty \phi^{\frac{a+k}{2} - 1} \exp\left( -\frac{\phi}{2} \left[ b + (\mu - m)^T V^{-1} (\mu - m) \right] \right) d\phi$$
$$= \frac{\left( \frac{b}{2} \right)^{a/2}}{(2\pi)^{k/2} |V|^{1/2} \Gamma\left( \frac{a}{2} \right)} \, \Gamma\left( \frac{a+k}{2} \right) \left( \frac{b + (\mu - m)^T V^{-1} (\mu - m)}{2} \right)^{-\frac{a+k}{2}},$$
and reordering terms proves the result.

The posterior distribution of $\mu, \phi$

Theorem 35. Let $X \mid \mu, \phi \sim \mathcal{N}\left( \mu, \frac{1}{\phi} C \right)$ and assume the prior distributions $\mu \mid \phi \sim \mathcal{N}\left( m, \frac{1}{\phi} V \right)$ and $\phi \sim \mathcal{G}\left( \frac{a}{2}, \frac{b}{2} \right)$. Then, given sample data $x$, we have
$$\mu \mid x, \phi \sim \mathcal{N}\left( m^*, \frac{1}{\phi} V^* \right), \qquad \phi \mid x \sim \mathcal{G}\left( \frac{a^*}{2}, \frac{b^*}{2} \right),$$
where
$$V^* = \left( V^{-1} + n C^{-1} \right)^{-1}, \qquad m^* = V^* \left( V^{-1} m + n C^{-1} \bar{x} \right),$$
$$a^* = a + nk, \qquad b^* = b + \mathrm{tr}\left( S C^{-1} \right) + m^T V^{-1} m + n \bar{x}^T C^{-1} \bar{x} - (m^*)^T (V^*)^{-1} m^*.$$

Proof. Exercise. The proof is analogous to the univariate case.

A simplification

In the case where $V \propto C$, we have a simplified result, similar to that for the univariate case.

Theorem 36. Let $\mu, \phi \sim \mathcal{NG}\left( m, \alpha C^{-1}, \frac{a}{2}, \frac{b}{2} \right)$. Then
$$\mu \mid \phi, x \sim \mathcal{N}\left( \frac{\alpha m + n \bar{x}}{\alpha + n}, \frac{1}{(\alpha + n)\phi} C \right), \qquad \phi \mid x \sim \mathcal{G}\left( \frac{a + nk}{2}, \; \frac{b + \mathrm{tr}\left( \left[ S + \frac{\alpha n}{\alpha + n} (m - \bar{x})(m - \bar{x})^T \right] C^{-1} \right)}{2} \right).$$

Proof. This follows from the previous theorem substituting $V = \frac{1}{\alpha} C$.

Results with the reference prior

Theorem 37. Given the prior $p(\mu, \phi) \propto \frac{1}{\phi}$, the posterior distribution is
$$p(\mu, \phi \mid x) \propto \phi^{\frac{nk}{2} - 1} \exp\left( -\frac{\phi}{2} \left[ \mathrm{tr}\left( S C^{-1} \right) + n (\mu - \bar{x})^T C^{-1} (\mu - \bar{x}) \right] \right)$$
$$\mu \mid x, \phi \sim \mathcal{N}\left( \bar{x}, \frac{1}{n\phi} C \right), \qquad \phi \mid x \sim \mathcal{G}\left( \frac{(n-1)k}{2}, \frac{\mathrm{tr}\left( S C^{-1} \right)}{2} \right), \qquad \mu \mid x \sim \mathcal{T}\left( \bar{x}, \frac{\mathrm{tr}\left( S C^{-1} \right)}{n(n-1)k} C, (n-1)k \right).$$

Proof. $\mu, \phi \mid x \sim \mathcal{NG}\left( \bar{x}, n C^{-1}, \frac{(n-1)k}{2}, \frac{\mathrm{tr}\left( S C^{-1} \right)}{2} \right)$ and the rest follows.

Conjugate inference for the multivariate normal distribution II: $\Sigma$ unknown

In this case, it is useful to reparameterize the normal distribution in terms of the precision matrix $\Phi = \Sigma^{-1}$, when the normal likelihood function becomes
$$l(\mu, \Phi \mid x) \propto |\Phi|^{n/2} \exp\left( -\frac{1}{2} \left[ \mathrm{tr}(S \Phi) + n (\mu - \bar{x})^T \Phi (\mu - \bar{x}) \right] \right).$$
It is clear that a conjugate prior for $\mu$ and $\Phi$ must take a similar form to the likelihood. This is a normal-Wishart distribution.

The normal-Wishart distribution

Definition 24. A $k \times k$ dimensional symmetric, positive definite random variable $W$ is said to have a Wishart distribution with parameters $d$ and $V$ if
$$f(W) = \frac{|W|^{\frac{d-k-1}{2}}}{2^{\frac{dk}{2}} |V|^{\frac{d}{2}} \pi^{\frac{k(k-1)}{4}} \prod_{i=1}^k \Gamma\left( \frac{d+1-i}{2} \right)} \exp\left( -\frac{1}{2} \mathrm{tr}\left( V^{-1} W \right) \right),$$
where $d > k - 1$. In this case, $E[W] = dV$ and we write $W \sim \mathcal{W}(d, V)$.

If $W \sim \mathcal{W}(d, V)$, then the distribution of $W^{-1}$ is said to be an inverse Wishart distribution, $W^{-1} \sim \mathcal{IW}\left( d, V^{-1} \right)$, with mean $E\left[ W^{-1} \right] = \frac{1}{d-k-1} V^{-1}$.

Theorem 38. Suppose that $X \mid \mu, \Phi \sim \mathcal{N}\left( \mu, \Phi^{-1} \right)$ and let $\mu \mid \Phi \sim \mathcal{N}\left( m, (\alpha \Phi)^{-1} \right)$ and $\Phi \sim \mathcal{W}(d, W)$. Then:
$$\mu \mid \Phi, x \sim \mathcal{N}\left( \frac{\alpha m + n \bar{x}}{\alpha + n}, \frac{1}{\alpha + n} \Phi^{-1} \right), \qquad \Phi \mid x \sim \mathcal{W}\left( d + n, \left[ W^{-1} + S + \frac{\alpha n}{\alpha + n} (m - \bar{x})(m - \bar{x})^T \right]^{-1} \right).$$

Proof. Exercise.

We can also derive a limiting prior distribution by letting $d \to 0$, when $p(\Phi) \propto |\Phi|^{-\frac{k+1}{2}}$ and the posterior distribution is
$$\mu \mid \Phi, x \sim \mathcal{N}\left( \bar{x}, \frac{1}{n} \Phi^{-1} \right), \qquad \Phi \mid x \sim \mathcal{W}\left( n - 1, S^{-1} \right).$$
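As an illustration of the conjugate update in Theorem 38, the sketch below (Python with NumPy/SciPy; the data and prior hyperparameters are invented, and scipy.stats.wishart is assumed to use the same parameterization as Definition 24, with mean $dV$) computes the posterior parameters and draws once from the joint posterior of $(\mu, \Phi)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated k-dimensional normal data (values invented for illustration)
n, k = 50, 3
x = rng.multivariate_normal(np.array([1.0, -2.0, 0.5]), np.eye(k), size=n)
x_bar = x.mean(axis=0)
S = (x - x_bar).T @ (x - x_bar)

# Normal-Wishart prior hyperparameters (illustrative values only)
alpha, m = 1.0, np.zeros(k)          # mu | Phi ~ N(m, (alpha Phi)^(-1))
d, W = k + 2, np.eye(k)              # Phi ~ W(d, W)

# Conjugate update (Theorem 38)
m_star = (alpha * m + n * x_bar) / (alpha + n)
d_star = d + n
V_star = np.linalg.inv(np.linalg.inv(W) + S
                       + (alpha * n / (alpha + n)) * np.outer(m - x_bar, m - x_bar))

E_Phi = d_star * V_star                                             # E[Phi | x] = d* V*
Phi = stats.wishart(df=d_star, scale=V_star).rvs(random_state=2)    # one posterior draw of Phi
mu = rng.multivariate_normal(m_star, np.linalg.inv((alpha + n) * Phi))

print(np.linalg.inv(E_Phi))   # plug-in estimate of Sigma based on E[Phi | x]
```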
Semi-conjugate inference via Gibbs sampling

The conjugacy assumption that the prior precision of $\mu$ is proportional to the model precision $\Phi$ is very strong in many cases. Often, we may simply wish to use a prior distribution of the form $\mu \sim \mathcal{N}(m, V)$, where $m$ and $V$ are known, and a Wishart prior for $\Phi$, say $\Phi \sim \mathcal{W}(d, W)$ as earlier. In this case, the conditional posterior distributions are
$$\mu \mid \Phi, x \sim \mathcal{N}\left( \left( V^{-1} + n \Phi \right)^{-1} \left( V^{-1} m + n \Phi \bar{x} \right), \left( V^{-1} + n \Phi \right)^{-1} \right)$$
$$\Phi \mid \mu, x \sim \mathcal{W}\left( d + n, \left[ W^{-1} + S + n (\mu - \bar{x})(\mu - \bar{x})^T \right]^{-1} \right)$$
and therefore it is straightforward to set up a Gibbs sampling algorithm to sample the joint posterior, as in the univariate case.

Aside: sampling the multivariate normal, multivariate t and Wishart distributions

Samplers for the multivariate normal distribution (usually based on the Cholesky decomposition) are available in most statistical packages such as R or Matlab. Sampling the multivariate t distribution is only slightly more complicated. Assume that we wish to sample from $T \sim \mathcal{T}(\mu, \Sigma, d)$. Then, from Theorem 34, the distribution of $T$ is the same as the marginal distribution of $T$ in the two-stage model
$$T \mid \phi \sim \mathcal{N}\left( \mu, \frac{1}{\phi} \Sigma \right), \qquad \phi \sim \mathcal{G}\left( \frac{d}{2}, \frac{d}{2} \right).$$
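A minimal sketch of this two-stage sampler (Python with NumPy; the values of $\mu$, $\Sigma$ and $d$ are invented for illustration):

```python
import numpy as np

def rmvt(mu, Sigma, d, size, rng):
    """Draw from T(mu, Sigma, d) via the two-stage model:
    phi ~ G(d/2, d/2), then T | phi ~ N(mu, Sigma / phi)."""
    phi = rng.gamma(shape=d / 2.0, scale=2.0 / d, size=size)   # G(d/2, d/2): rate d/2 = scale 2/d
    z = rng.multivariate_normal(np.zeros(len(mu)), Sigma, size=size)
    return mu + z / np.sqrt(phi)[:, None]

rng = np.random.default_rng(0)
draws = rmvt(np.array([0.0, 1.0]), np.array([[1.0, 0.3], [0.3, 2.0]]),
             d=5, size=10_000, rng=rng)
print(draws.mean(axis=0))   # approximately mu, since the mean exists for d > 1
```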
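Combining such samplers with the conditional posterior distributions given above for the semi-conjugate model, a Gibbs sampler is easy to write. The following is only a minimal sketch: the data, prior hyperparameters and number of iterations are invented, and scipy.stats.wishart is assumed to match the parameterization of Definition 24.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Simulated data (invented values, purely for illustration)
n, k = 50, 3
x = rng.multivariate_normal(np.array([1.0, -2.0, 0.5]), np.eye(k), size=n)
x_bar = x.mean(axis=0)
S = (x - x_bar).T @ (x - x_bar)

# Semi-conjugate prior: mu ~ N(m, V) independently of Phi ~ W(d, W)
m, V = np.zeros(k), 10.0 * np.eye(k)
d, W = k + 2, np.eye(k)
V_inv, W_inv = np.linalg.inv(V), np.linalg.inv(W)

n_iter = 2000
mu = x_bar.copy()                       # initial value
mu_draws, Phi_draws = [], []
for _ in range(n_iter):
    # Phi | mu, x ~ W(d + n, [W^-1 + S + n (mu - x_bar)(mu - x_bar)^T]^-1)
    scale = np.linalg.inv(W_inv + S + n * np.outer(mu - x_bar, mu - x_bar))
    Phi = stats.wishart(df=d + n, scale=scale).rvs(random_state=rng)
    # mu | Phi, x ~ N((V^-1 + n Phi)^-1 (V^-1 m + n Phi x_bar), (V^-1 + n Phi)^-1)
    post_cov = np.linalg.inv(V_inv + n * Phi)
    post_mean = post_cov @ (V_inv @ m + n * Phi @ x_bar)
    mu = rng.multivariate_normal(post_mean, post_cov)
    mu_draws.append(mu)
    Phi_draws.append(Phi)

print(np.mean(mu_draws[500:], axis=0))   # posterior mean of mu after burn-in
```

Monitoring the draws of $\mu$ and $\Phi$ after a burn-in period gives Monte Carlo estimates of any posterior summary of interest, as in the univariate case.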