General Linear Models: an Integrated Approach to Statistics


General Linear Models: An Integrated Approach to Statistics

Tutorial in Quantitative Methods for Psychology, 2008, Vol. 4(2), p. 65-78.

Sylvain Chartier and Andrew Faulkner
University of Ottawa

Abstract. Generally, in psychology, the various statistical analyses are taught independently from each other. As a consequence, students struggle to learn new statistical analyses in contexts that differ from their textbooks. This paper gives a short introduction to the general linear model (GLM), in which it is shown that ANOVA (one-way, factorial, repeated measure and analysis of covariance) is simply a multiple correlation/regression analysis (MCRA). Generalizations to other cases, such as multivariate and nonlinear analysis, are also discussed. It can easily be shown that every popular linear analysis can be derived from understanding MCRA.

The most commonly used statistical analyses are ANOVA, t-tests, and regressions (Cousineau, 2005). These are taught as independent modules during the training of psychology students, which sadly fragments their knowledge into pieces that seem different and disconnected from one another. Multivariate statistics books are no stranger to this practice (e.g. Howell, 2002; Shavelson, 1996; Stevens, 1992; Tabachnick & Fidell, 2001) – the vast majority give only a very brief introduction to the subject. This is even worse in univariate books, in which even such cursory treatment is lacking. When this is combined with the fact that mathematical reasoning is not a priority in most psychology curricula (Giguère, Hélie, & Cousineau, 2004), the net result is a general misunderstanding of statistics among many students of social science.

In addition, more time is devoted to using how-to-do-it computer tools (due to their accessibility and user-friendly interfaces), and less time is spent on understanding the concepts – which is absolutely necessary to do any work in research. According to Tatsuoka (1988):

"… much is to be gained by the student's going through the calculations by hand … Students who have undergone this sort of learning experience will be more likely to develop a thorough understanding of the major steps involved in a sequence of computation than will those who, from the outset, leave all the 'busy work' to the computer."

Consequently, after their mandatory courses, students still usually have difficulties choosing the correct statistical test for their data. A student who masters software does not thereby demonstrate comprehension – all he/she shows is a bit of technical competence.

Although most psychology teachers know that ANOVA and regression are linked through the general linear model (GLM), few actually teach it in their courses. GLM offers a unique pedagogical perspective to provide such a unified view of statistical testing. To provide a more detailed explanation, it will be shown that ANOVA – as well as the t-test – are simply special cases of multiple correlation/regression analysis (MCRA). Table 1 shows the different analyses that can be derived from MCRA given the number of independent variables (IV) and the number of dependent variables (DV).

(Address correspondence to Sylvain Chartier, University of Ottawa, School of Psychology, 125 University, Ottawa, Ontario, K1N 6N5. E-mail: [email protected]. The authors are grateful to an anonymous reviewer and Jean-François Ferry for their help in reviewing the manuscript, and to Jean-François Allaire and Isabelle Smith for their comments in reviewing a previous version of the manuscript.)
Therefore, if the IV is nominal and there is only one continuous DV, the ANOVA is a special case of MCRA. Thus, a general framework can be taught in which statistical methods are viewed as a whole, which would facilitate their comprehension and application. However, to really understand statistical analysis, linear algebra must be used, which, paradoxically, is not part of the mandatory psychology curriculum in many universities. This once again reinforces the disadvantage that psychology students (unwittingly) endure in their training, compared to other sciences (Giguère, Hélie, & Cousineau, 2004). While it should be noted that linear algebra is briefly introduced in some multivariate statistics books (e.g. Tabachnick & Fidell, 2001; Tatsuoka, 1988), far stronger material is found in books about linear algebra itself (e.g. Lipschutz & Lipson, 2001; Strang, 1988).

Table 1. Univariate and multivariate representations of the GLM.

| GLM          | DV Form                      | IV Form                      | Type of analysis                                           |
|--------------|------------------------------|------------------------------|------------------------------------------------------------|
| Univariate   | 1 nominal                    | 1 nominal                    | Phi coefficient / Chi-square                               |
| Univariate   | 1 continuous                 | 1 nominal                    | t-test                                                     |
| Univariate   | 1 nominal                    | ≥1 continuous and/or nominal | Logistic regression / Discriminant function                |
| Univariate   | 1 continuous                 | 1 continuous                 | Simple correlation / regression                            |
| Univariate   | 1 continuous                 | ≥2 nominal                   | ANOVA (one-way, factorial, repeated measure)               |
| Univariate   | 1 continuous                 | ≥2 continuous and nominal    | ANCOVA                                                     |
| Univariate   | 1 continuous                 | ≥2 continuous                | Multiple correlation / regression                          |
| Multivariate | ≥2 nominal                   | ≥2 nominal                   | Correspondence                                             |
| Multivariate | ≥2 nominal                   | ≥2 continuous and/or nominal | Multivariate logistic regression / Discriminant functions  |
| Multivariate | ≥2 continuous                | ≥2 nominal                   | MANOVA                                                     |
| Multivariate | ≥2 continuous                | ≥1 latent                    | Principal component / Factor                               |
| Multivariate | ≥2 continuous and/or nominal | ≥1 latent                    | Multidimensional scaling                                   |
| Multivariate | ≥2 continuous and/or latent  | ≥1 continuous and/or latent  | Structural equation modeling                               |
| Multivariate | ≥2 continuous                | ≥2 continuous                | Canonical correlation                                      |

Note: DV, dependent variable; IV, independent variable.

The information in this paper is primarily a synthesis of knowledge first presented in Cohen & Cohen (1983), Tatsuoka (1988), Kutner, Nachtsheim, Neter, & Li (2005), and Morrison (1976), and it is mainly divided into three parts. The first shows that when using MCRA or any kind of ANOVA (repeated, one-way, factorial, covariance), the same three steps are always involved: compute the appropriate coding matrix, create the SSCP matrix, and calculate the R-squared. More precisely, a description and review of multiple correlation analysis (MCRA) and simple correlation/regression analysis (SCRA) is provided. Subsequently, ANOVA, factorial ANOVA, and repeated measures are presented as special cases of MCRA, in that order. The second part of the paper asserts the various links between MCRA and ANOVA with some numerical examples using one-way ANOVA and repeated-measures ANOVA. The final section discusses the multivariate case and nonlinear analysis (the generalized linear model).

Multiple Correlation/Regression Analysis

The purpose of MCRA is to determine the strength of correlation between a criterion (the dependent variable) and multiple predictors (the independent variables). If a functional relationship is desired, then multiple regression analysis can be performed.
Sum of squares and cross product (SSCP) matrix and the R-squared

Different variables can be expressed, using standard matrix notation, as follows. The predictor variables are defined as:

$$\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} \quad (1)$$

and the criterion variable is defined as:

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \quad (2)$$

where X is a matrix of dimension n×p, y a matrix (vector) of dimension n×1, and n and p are the number of participants and predictors, respectively. These two matrices can be put into a single matrix:

$$\mathbf{M} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} & y_1 \\ x_{21} & x_{22} & \cdots & x_{2p} & y_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} & y_n \end{bmatrix} \quad (3)$$

where $x_{ij}$ represents the jth predictor of the ith participant and $y_i$ the criterion of the ith participant. From the M matrix, the sum of squares and cross product (SSCP) matrix can be computed. Useful information such as the variance, covariance, and R-squared can also be extracted from the SSCP matrix, which is obtained by:

$$\begin{aligned} \mathbf{SSCP} &= (\mathbf{M}-\overline{\mathbf{M}})^{\mathrm T}(\mathbf{M}-\overline{\mathbf{M}}) \\ &= \mathbf{M}^{\mathrm T}\mathbf{M} - \overline{\mathbf{M}}^{\mathrm T}\overline{\mathbf{M}} \\ &= \mathbf{M}^{\mathrm T}\mathbf{M} - (\mathbf{1}^{\mathrm T}\mathbf{M})^{\mathrm T}(\mathbf{1}^{\mathrm T}\mathbf{M})/n \end{aligned} \quad (4)$$

where 1 is defined as a vector (of dimension n) in which all elements are equal to 1, $\overline{\mathbf{M}}$ is a means-score matrix (of dimension n × (p+1)) where the mean of each column is repeated over n lines, and T denotes the matrix transpose operation. If the SSCP is divided by the corresponding degrees of freedom (n−1), then the variance/covariance matrix is obtained. Thus, the SSCP is a convenient way to represent a lot of information about variability in a single matrix.

Naturally, the same can be found for SCRA using Equation 4. In that case, the matrix will be reduced to a 2 by 2 format, and the element at the junction of the second row and second column will be used to estimate the criterion variance:

$$\mathbf{SSCP}_{SCRA} = \begin{bmatrix} \sum_{i=1}^{n}(x_i-\bar{x})^2 & \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) \\ \sum_{i=1}^{n}(y_i-\bar{y})(x_i-\bar{x}) & \sum_{i=1}^{n}(y_i-\bar{y})^2 \end{bmatrix} \quad (5)$$

where $\bar{x}$ and $\bar{y}$ represent the mean of the predictors and of the criterion, respectively.

By partitioning the SSCP matrix correctly, the coefficient of determination – R-squared (R²) – can be obtained. This is done by dividing the SSCP into four sectors, which we name $S_{pp}$, $S_{pc}$, $S_{cp}$, $S_{cc}$. These are (in order) the sum of squares of the predictors alone, the sum of cross-products between the predictors and the criterion, the sum of cross-products between the criterion and the predictors (note that $S_{cp} = S_{pc}^{\mathrm T}$), and finally the sum of squares of the criterion alone:

$$\mathbf{SSCP} = \begin{bmatrix} S_{pp} & S_{pc} \\ S_{cp} & S_{cc} \end{bmatrix} \quad (6)$$

Once this is drawn up, the coefficient of determination (0 ≤ R² ≤ 1) can be obtained with the following matrix multiplication:

$$R^2 = S_{pc}^{\mathrm T} S_{pp}^{-1} S_{pc} S_{cc}^{-1} \quad (7)$$

In the particular case of SCRA, each element of the SSCP matrix is a scalar, and thus the R² will also be a scalar. It is given by the following:

$$\begin{aligned} R^2_{SCRA} &= \sum_{i=1}^{n}(y_i-\bar{y})(x_i-\bar{x}) \Big(\sum_{i=1}^{n}(x_i-\bar{x})^2\Big)^{-1} \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) \Big(\sum_{i=1}^{n}(y_i-\bar{y})^2\Big)^{-1} \\ &= \frac{\Big(\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\Big)^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2} \end{aligned} \quad (8)$$

Equation 8 is the standard way to obtain the coefficient of determination. If the standard bivariate correlation ($R_{SCRA}$) is desired, then one must find the square root of $R^2_{SCRA}$, as found by Equation 8, which must then be multiplied by the sign (+ or −) of $S_{pc}$ (direction of the covariance).

For an unbiased estimate of R² ($\tilde{R}^2$), the following correction must be applied:

$$\tilde{R}^2 = 1 - \frac{(1-R^2)(n-1)}{n-p-1} \quad (9)$$

This is called the shrunken R-squared, or the adjusted R-squared.
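The three quantities above (the SSCP matrix, R², and the shrunken R²) are straightforward to verify numerically. The following is a minimal sketch in Python/NumPy, assuming a small made-up data set; the variable names are illustrative and not part of the paper:

```python
import numpy as np

# Made-up data: n = 6 participants, p = 2 predictors, 1 criterion.
X = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 6.0],
              [4.0, 5.0], [5.0, 8.0], [6.0, 7.0]])
y = np.array([[2.0], [3.0], [5.0], [4.0], [7.0], [8.0]])

M = np.hstack([X, y])                      # Equation 3
n, cols = M.shape
p = cols - 1

# Equation 4: SSCP = M'M - (1'M)'(1'M)/n
col_sums = M.sum(axis=0, keepdims=True)    # this is 1'M
SSCP = M.T @ M - col_sums.T @ col_sums / n

# Equation 6: partition into Spp, Spc, Scc (Scp = Spc')
Spp = SSCP[:p, :p]
Spc = SSCP[:p, p:]
Scc = SSCP[p:, p:]

# Equation 7: R^2 = Spc' Spp^{-1} Spc Scc^{-1}
R2 = float(Spc.T @ np.linalg.inv(Spp) @ Spc @ np.linalg.inv(Scc))

# Equation 9: shrunken (adjusted) R^2
R2_adj = 1 - (1 - R2) * (n - 1) / (n - p - 1)

print(round(R2, 4), round(R2_adj, 4))
```

Because Equation 7 only involves the four blocks of the SSCP matrix, the same code carries over unchanged to the SCRA case (p = 1), where every block reduces to a scalar.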
Recommended publications
  • Nonpar MANOVA via Independence Testing (arXiv:1910.08883v3 [stat.ML], 2 Apr 2021)
    Nonpar MANOVA via Independence Testing. Sambit Panda, Cencheng Shen, Ronan Perry, Jelle Zorn, Antoine Lutz, Carey E. Priebe and Joshua T. Vogelstein.

    Abstract. The k-sample testing problem tests whether or not k groups of data points are sampled from the same distribution. Multivariate analysis of variance (Manova) is currently the gold standard for k-sample testing but makes strong, often inappropriate, parametric assumptions. Moreover, independence testing and k-sample testing are tightly related, and there are many nonparametric multivariate independence tests with strong theoretical and empirical properties, including distance correlation (Dcorr) and the Hilbert-Schmidt Independence Criterion (Hsic). We prove that universally consistent independence tests achieve universally consistent k-sample testing, and that k-sample statistics like Energy and Maximum Mean Discrepancy (MMD) are exactly equivalent to Dcorr. Empirically evaluating these tests for k-sample scenarios demonstrates that these nonparametric independence tests typically outperform Manova, even for Gaussian distributed settings. Finally, we extend these nonparametric k-sample-testing procedures to perform multiway and multilevel tests. Thus, we illustrate the existence of many theoretically motivated and empirically performant k-sample tests. A Python package with all independence and k-sample tests called hyppo is available from https://hyppo.neurodata.io/.

    1 Introduction. A fundamental problem in statistics is the k-sample testing problem. Consider the two-sample problem: we obtain two datasets u_i ∈ ℝ^p for i = 1, …, n and v_j ∈ ℝ^p for j = 1, …, m. Assume each u_i is sampled independently and identically (i.i.d.) from F_U and that each v_j is sampled i.i.d. from F_V.
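    To make the connection between k-sample testing and these statistics concrete, here is a sketch of the Energy statistic with a permutation test, written in plain NumPy rather than with the authors' hyppo package (whose API is not reproduced here); the data and parameters are made up:

```python
import numpy as np

def energy_stat(u, v):
    """Energy distance between two samples (rows = observations)."""
    duv = np.linalg.norm(u[:, None, :] - v[None, :, :], axis=-1).mean()
    duu = np.linalg.norm(u[:, None, :] - u[None, :, :], axis=-1).mean()
    dvv = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1).mean()
    return 2 * duv - duu - dvv

def permutation_pvalue(u, v, n_perms=1000, seed=0):
    """Two-sample p-value by permuting the pooled group labels."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([u, v])
    observed = energy_stat(u, v)
    count = 0
    for _ in range(n_perms):
        perm = rng.permutation(len(pooled))
        pu, pv = pooled[perm[:len(u)]], pooled[perm[len(u):]]
        count += energy_stat(pu, pv) >= observed
    return (count + 1) / (n_perms + 1)

rng = np.random.default_rng(1)
u = rng.normal(0.0, 1.0, size=(50, 3))
v = rng.normal(0.5, 1.0, size=(50, 3))   # shifted mean: should reject
print(permutation_pvalue(u, v))
```

    The paper's equivalence result says that statistics of exactly this form match Dcorr applied to the pooled data against the group labels.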
  • On the Meaning and Use of Kurtosis
    Psychological Methods, 1997, Vol. 2, No. 3, 292-307. Copyright 1997 by the American Psychological Association, Inc. 1082-989X/97/$3.00

    On the Meaning and Use of Kurtosis. Lawrence T. DeCarlo, Fordham University.

    Abstract. For symmetric unimodal distributions, positive kurtosis indicates heavy tails and peakedness relative to the normal distribution, whereas negative kurtosis indicates light tails and flatness. Many textbooks, however, describe or illustrate kurtosis incompletely or incorrectly. In this article, kurtosis is illustrated with well-known distributions, and aspects of its interpretation and misinterpretation are discussed. The role of kurtosis in testing univariate and multivariate normality; as a measure of departures from normality; in issues of robustness, outliers, and bimodality; in generalized tests and estimators; as well as limitations of and alternatives to the kurtosis measure β₂, are discussed.

    It is typically noted in introductory statistics courses that distributions can be characterized in terms of central tendency, variability, and shape. With respect to shape, virtually every textbook defines and illustrates skewness. On the other hand, another aspect of shape, which is kurtosis, is either not discussed or, worse yet, is often described or illustrated incorrectly. Kurtosis is also frequently not reported in research articles, in spite of the fact that virtually every statistical package provides a measure of kurtosis.

    … standard deviation. The normal distribution has a kurtosis of 3, and β₂ − 3 is often used so that the reference normal distribution has a kurtosis of zero (β₂ − 3 is sometimes denoted as γ₂). A sample counterpart to β₂ can be obtained by replacing the population moments with the sample moments, which gives

$$b_2 = \frac{\sum (X_i - \bar{X})^4 / n}{\left(\sum (X_i - \bar{X})^2 / n\right)^2}$$
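    As a quick illustration of the sample counterpart b₂, the following NumPy sketch (the data are simulated here, not from the article) computes b₂ and the excess kurtosis b₂ − 3 for a normal and a heavy-tailed sample:

```python
import numpy as np

def sample_kurtosis(x):
    """b2 = (sum (X_i - Xbar)^4 / n) / (sum (X_i - Xbar)^2 / n)^2."""
    d = x - x.mean()
    return (d**4).mean() / (d**2).mean()**2

rng = np.random.default_rng(0)
normal = rng.normal(size=100_000)
heavy = rng.standard_t(df=5, size=100_000)   # heavier tails than normal

print(sample_kurtosis(normal) - 3)   # excess kurtosis near 0
print(sample_kurtosis(heavy) - 3)    # clearly positive
```

    A t distribution with 5 degrees of freedom has population excess kurtosis of 6, so the second value should come out clearly positive, consistent with the heavy-tails interpretation above.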
  • Generalized Linear Models
    Generalized Linear Models

    A generalized linear model (GLM) consists of three parts. (i) The first part is a random variable giving the conditional distribution of a response Yi given the values of a set of covariates Xij. In the original work on GLMs by Nelder and Wedderburn (1972) this random variable was a member of an exponential family, but later work has extended beyond this class of random variables. (ii) The second part is a linear predictor, η_i = α + β_1 X_i1 + β_2 X_i2 + … + β_k X_ik. (iii) The third part is a smooth and invertible link function g(·) which transforms the expected value of the response variable, µ_i = E(Y_i), and is equal to the linear predictor: g(µ_i) = η_i = α + β_1 X_i1 + β_2 X_i2 + … + β_k X_ik.

    As shown in Tables 15.1 and 15.2, both the general linear model that we have studied extensively and the logistic regression model from Chapter 14 are special cases of this model. One property of members of the exponential family of distributions is that the conditional variance of the response is a function of its mean, v(µ), and possibly a dispersion parameter φ. The expressions for the variance functions for common members of the exponential family are shown in Table 15.2. Also, for each distribution there is a so-called canonical link function, which simplifies some of the GLM calculations, and which is also shown in Table 15.2.

    Estimation and Testing for GLMs. Parameter estimation in GLMs is conducted by the method of maximum likelihood. As with the logistic regression models from the last chapter, the generalization of the residual sums of squares from the general linear model is the residual deviance, D_m ≡ 2(log L_s − log L_m), where L_m is the maximized likelihood for the model of interest, and L_s is the maximized likelihood for a saturated model, which has one parameter per observation and fits the data as well as possible.
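    A small sketch of these three parts and of the residual deviance for the Bernoulli/logit case, in Python; the data and coefficients are invented for illustration (for Bernoulli data the saturated model fits each point exactly, so log Ls = 0):

```python
import numpy as np

def logistic_deviance(y, p_hat):
    """Residual deviance for a Bernoulli GLM: Dm = 2*(log Ls - log Lm).
    With log Ls = 0 for binary data, this reduces to -2 * log Lm."""
    eps = 1e-12
    p_hat = np.clip(p_hat, eps, 1 - eps)
    return -2 * np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

# (i) response distribution: Y_i ~ Bernoulli(mu_i)
# (ii) linear predictor, with an intercept column and one covariate
X = np.array([[1, 0.5], [1, 1.5], [1, 2.5], [1, 3.5]], dtype=float)
beta = np.array([-2.0, 1.0])      # illustrative coefficients
eta = X @ beta
# (iii) the inverse of the canonical logit link maps eta to mu = E(Y)
mu = 1 / (1 + np.exp(-eta))

y = np.array([0.0, 0.0, 1.0, 1.0])
print(logistic_deviance(y, mu))
```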
  • Univariate and Multivariate Skewness and Kurtosis 1
    Univariate and Multivariate Skewness and Kurtosis for Measuring Nonnormality: Prevalence, Influence and Estimation. Meghan K. Cain, Zhiyong Zhang, and Ke-Hai Yuan, University of Notre Dame.

    Author Note: This research is supported by a grant from the U.S. Department of Education (R305D140037). However, the contents of the paper do not necessarily represent the policy of the Department of Education, and you should not assume endorsement by the Federal Government. Correspondence concerning this article can be addressed to Meghan Cain ([email protected]), Ke-Hai Yuan ([email protected]), or Zhiyong Zhang ([email protected]), Department of Psychology, University of Notre Dame, 118 Haggar Hall, Notre Dame, IN 46556.

    Abstract. Nonnormality of univariate data has been extensively examined previously (Blanca et al., 2013; Micceri, 1989). However, less is known of the potential nonnormality of multivariate data, although multivariate analysis is commonly used in psychological and educational research. Using univariate and multivariate skewness and kurtosis as measures of nonnormality, this study examined 1,567 univariate distributions and 254 multivariate distributions collected from authors of articles published in Psychological Science and the American Education Research Journal. We found that 74% of univariate distributions and 68% of multivariate distributions deviated from normal distributions. In a simulation study using typical values of skewness and kurtosis that we collected, we found that the resulting Type I error rates were 17% in a t-test and 30% in a factor analysis under some conditions. Hence, we argue that it is time to routinely report skewness and kurtosis along with other summary statistics such as means and variances.
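    One standard choice of such multivariate measures is Mardia's (1970) moment-based multivariate skewness and kurtosis; a NumPy sketch on simulated data (illustrative only, not the study's data or code):

```python
import numpy as np

def mardia(X):
    """Mardia's multivariate skewness b1p and kurtosis b2p.
    X: n x p data matrix."""
    n, p = X.shape
    d = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False, bias=True))
    G = d @ S_inv @ d.T              # Mahalanobis cross-products g_ij
    b1p = (G**3).sum() / n**2        # skewness: mean of g_ij^3
    b2p = (np.diag(G)**2).mean()     # kurtosis: mean of g_ii^2
    return b1p, b2p

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
print(mardia(X))   # under normality, b2p should be near p(p+2) = 15
```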
  • Multivariate Chemometrics As a Strategy to Predict the Allergenic Nature of Food Proteins
    Symmetry (article): Multivariate Chemometrics as a Strategy to Predict the Allergenic Nature of Food Proteins. Miroslava Nedyalkova (Department of Inorganic Chemistry, Faculty of Chemistry and Pharmacy, University of Sofia, 1 James Bourchier Blvd., 1164 Sofia, Bulgaria) and Vasil Simeonov (Department of Analytical Chemistry, Faculty of Chemistry and Pharmacy, University of Sofia; correspondence: [email protected]). Received: 3 September 2020; Accepted: 21 September 2020; Published: 29 September 2020.

    Abstract: The purpose of the present study is to develop a simple method for the classification of food proteins with respect to their allergenicity. The methods applied to solve the problem are well-known multivariate statistical approaches (hierarchical and non-hierarchical cluster analysis, two-way clustering, principal components and factor analysis) being a substantial part of modern exploratory data analysis (chemometrics). The methods were applied to a data set consisting of 18 food proteins (allergenic and non-allergenic). The results obtained convincingly showed that a successful separation of the two types of food proteins could be easily achieved with the selection of simple and accessible physicochemical and structural descriptors. The results from the present study could be of significant importance for distinguishing allergenic from non-allergenic food proteins without engaging complicated software methods and resources. The present study corresponds entirely to the concept of the journal and of the Special Issue for searching of advanced chemometric strategies in solving structural problems of biomolecules.

    Keywords: food proteins; allergenicity; multivariate statistics; structural and physicochemical descriptors; classification
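    As a loose illustration of the kind of pipeline described (autoscale the descriptors, project with principal components, then cluster), here is a scikit-learn sketch on a randomly generated stand-in for the 18-protein descriptor matrix; the actual descriptors and groupings of the study are not reproduced:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical descriptor matrix: 18 proteins x 6 physicochemical
# descriptors (placeholder values, not the paper's data).
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(18, 6))

Z = StandardScaler().fit_transform(descriptors)   # autoscale descriptors
scores = PCA(n_components=2).fit_transform(Z)     # project to 2 components
labels = KMeans(n_clusters=2, n_init=10,
                random_state=0).fit_predict(scores)
print(labels)   # candidate allergenic / non-allergenic grouping
```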
  • Robust Bayesian General Linear Models
    NeuroImage 36 (2007) 661–671 (www.elsevier.com/locate/ynimg)

    Robust Bayesian general linear models. W.D. Penny, J. Kilner, and F. Blankenburg, Wellcome Department of Imaging Neuroscience, University College London, 12 Queen Square, London WC1N 3BG, UK. Received 22 September 2006; revised 20 November 2006; accepted 25 January 2007. Available online 7 May 2007.

    Abstract: We describe a Bayesian learning algorithm for Robust General Linear Models (RGLMs). The noise is modeled as a Mixture of Gaussians rather than the usual single Gaussian. This allows different data points to be associated with different noise levels and effectively provides a robust estimation of regression coefficients. A variational inference framework is used to prevent overfitting and provides a model order selection criterion for noise model order. This allows the RGLM to default to the usual GLM when robustness is not required. The method is compared to other robust regression methods and applied to synthetic data and fMRI. © 2007 Elsevier Inc. All rights reserved.

    … them from the data (Jung et al., 1999). This is, however, a non-automatic process and will typically require user intervention to disambiguate the discovered components. In fMRI, autoregressive (AR) modeling can be used to downweight the impact of periodic respiratory or cardiac noise sources (Penny et al., 2003). More recently, a number of approaches based on robust regression have been applied to imaging data (Wager et al., 2005; Diedrichsen and Shadmehr, 2005). These approaches relax the assumption underlying ordinary regression that the errors be normally (Wager et al., 2005) or identically (Diedrichsen and Shadmehr, 2005) distributed. In Wager et al. …
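    For context, a classical robust-regression baseline of the kind the RGLM is compared against can be written in a few lines. This is Huber-type iteratively reweighted least squares in NumPy on made-up data, not the paper's variational mixture-of-Gaussians algorithm:

```python
import numpy as np

def huber_irls(X, y, k=1.345, n_iter=50):
    """IRLS with Huber weights: points with large residuals are
    downweighted instead of dominating the fit, as they would
    under a single-Gaussian noise model."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]    # OLS start
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12  # robust scale (MAD)
        u = np.abs(r) / s
        w = np.minimum(1.0, k / np.maximum(u, 1e-12))  # Huber weights
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(0, 10, 50)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.5, 50)
y[:5] += 20                      # gross outliers
print(huber_irls(X, y))          # close to [1, 2] despite outliers
```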
  • Generalized Linear Models Outline for Today
    Generalized linear models. Outline for today:
    • What is a generalized linear model
    • Linear predictors and link functions
    • Example: estimate a proportion
    • Analysis of deviance
    • Example: fit dose-response data using logistic regression
    • Example: fit count data using a log-linear model
    • Advantages and assumptions of glm
    • Quasi-likelihood models when there is excessive variance

    Review: what is a (general) linear model? A model of the following form:

    Y = β0 + β1X1 + β2X2 + … + error

    where Y is the response variable, the X's are the explanatory variables, the β's are the parameters of the linear equation, and the errors are normally distributed with equal variance at all values of the X variables. It uses least squares to fit the model to data and estimate the parameters (lm in R). The predicted Y, symbolized here by µ, is modeled as

    µ = β0 + β1X1 + β2X2 + …

    What is a generalized linear model? A model whose predicted values are of the form

    g(µ) = β0 + β1X1 + β2X2 + …

    The model still includes a linear predictor (to the right of "="), but g(µ) is the "link function". A wide diversity of link functions is accommodated. Non-normal distributions of errors are OK (specified by the link function), and unequal error variances are OK (specified by the link function). It uses maximum likelihood to estimate parameters and log-likelihood ratio tests to test parameters (glm in R).

    The two most common link functions:
    • Log, used for count data: η = log µ. The inverse function is µ = e^η.
    • Logistic or logit, used for binary data: η = log(µ/(1 − µ)). The inverse function is µ = e^η/(1 + e^η).

    In all cases log refers to the natural logarithm (base e).
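    The notes fit these models with glm in R; an equivalent sketch in Python's statsmodels (the library choice and the simulated dose-response data are assumptions of this write-up, not part of the notes):

```python
import numpy as np
import statsmodels.api as sm

# Made-up dose-response data: binary outcome vs. dose
rng = np.random.default_rng(0)
dose = rng.uniform(0, 4, 200)
p_true = 1 / (1 + np.exp(-(-3 + 1.5 * dose)))
y = rng.binomial(1, p_true)

X = sm.add_constant(dose)                            # b0 + b1*dose
model = sm.GLM(y, X, family=sm.families.Binomial())  # logit is the default link
result = model.fit()                                 # maximum likelihood
print(result.params)                                 # estimates near (-3, 1.5)
print(result.deviance)                               # for log-likelihood ratio tests
```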
  • Multivariate Analyses
    Newsom, Psy 522/622 Multiple Regression and Multivariate Quantitative Methods, Winter 2021

    Multivariate Analyses. The term "multivariate" in the term multivariate analysis has been defined variously by different authors and has no single definition. Most statistics books on multivariate statistics define multivariate statistics as tests that involve multiple dependent (or response) variables together (Pituch & Stevens, 2015; Tabachnick & Fidell, 2013; Tatsuoka, 1988). But common usage by some researchers and authors also includes analyses with multiple independent variables, such as multiple regression, in the definition of multivariate statistics (e.g., Huberty, 1994). Some analyses, such as principal components analysis or canonical correlation, really have no independent or dependent variable as such, but could be conceived of as analyses of multiple responses. A more strict statistical definition would define multivariate analysis in terms of two or more random variables and their multivariate distributions (e.g., Timm, 2002), in which joint distributions must be considered to understand the statistical tests. I suspect the statistical definition is what underlies the selection of analyses that are included in most multivariate statistics books. Multivariate statistics texts nearly always focus on continuous dependent variables, but continuous variables would not be required to consider an analysis multivariate. Huberty (1994) describes the goals of multivariate statistical tests as including prediction (of continuous outcomes or …
  • Nonparametric Multivariate Kurtosis and Tailweight Measures
    Nonparametric Multivariate Kurtosis and Tailweight Measures. Jin Wang (Department of Mathematics and Statistics, Northern Arizona University, Flagstaff, Arizona 86011-5717, USA; email: [email protected]) and Robert Serfling (Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas 75083-0688, USA; email: [email protected]; website: www.utdallas.edu/~serfling). November 2004 – final preprint version, to appear in Journal of Nonparametric Statistics, 2005. Support by NSF Grant DMS-0103698 is gratefully acknowledged.

    Abstract: For nonparametric exploration or description of a distribution, the treatment of location, spread, symmetry and skewness is followed by characterization of kurtosis. Classical moment-based kurtosis measures the dispersion of a distribution about its "shoulders". Here we consider quantile-based kurtosis measures. These are robust, are defined more widely, and discriminate better among shapes. A univariate quantile-based kurtosis measure of Groeneveld and Meeden (1984) is extended to the multivariate case by representing it as a transform of a dispersion functional. A family of such kurtosis measures defined for a given distribution and taken together comprises a real-valued "kurtosis functional", which has intuitive appeal as a convenient two-dimensional curve for description of the kurtosis of the distribution. Several multivariate distributions in any dimension may thus be compared with respect to their kurtosis in a single two-dimensional plot. Important properties of the new multivariate kurtosis measures are established. For example, for elliptically symmetric distributions, this measure determines the distribution within affine equivalence. Related tailweight measures, influence curves, and asymptotic behavior of sample versions are also discussed.
  • Stat 714 Linear Statistical Models
    STAT 714 Linear Statistical Models. Fall 2010 Lecture Notes. Joshua M. Tebbs, Department of Statistics, The University of South Carolina.

    Contents:
    1 Examples of the General Linear Model
    2 The Linear Least Squares Problem
      2.1 Least squares estimation
      2.2 Geometric considerations
      2.3 Reparameterization
    3 Estimability and Least Squares Estimators
      3.1 Introduction
      3.2 Estimability
        3.2.1 One-way ANOVA
        3.2.2 Two-way crossed ANOVA with no interaction
        3.2.3 Two-way crossed ANOVA with interaction
      3.3 Reparameterization
      3.4 Forcing least squares solutions using linear constraints
    4 The Gauss-Markov Model
      4.1 Introduction
      4.2 The Gauss-Markov Theorem
      4.3 Estimation of σ² in the GM model
      4.4 Implications of model selection
        4.4.1 Underfitting (Misspecification)
        4.4.2 Overfitting
      4.5 The Aitken model and generalized least squares
    5 Distributional Theory
      5.1 Introduction
      5.2 Multivariate normal distribution
        5.2.1 Probability density function
        5.2.2 Moment generating functions
        5.2.3 Properties
        5.2.4 Less-than-full-rank normal distributions
        5.2.5 Independence results
        5.2.6 Conditional distributions
      5.3 Noncentral χ² distribution
      5.4 Noncentral F distribution
      5.5 Distributions of quadratic forms
      5.6 Independence of quadratic forms
      5.7 Cochran's Theorem
  • Iterative Approaches to Handling Heteroscedasticity with Partially Known Error Variances
    International Journal of Statistics and Probability; Vol. 8, No. 2; March 2019. ISSN 1927-7032, E-ISSN 1927-7040. Published by Canadian Center of Science and Education.

    Iterative Approaches to Handling Heteroscedasticity With Partially Known Error Variances. Morteza Marzjarani. Correspondence: Morteza Marzjarani, National Marine Fisheries Service, Southeast Fisheries Science Center, Galveston Laboratory, 4700 Avenue U, Galveston, Texas 77551, USA. Received: January 30, 2019; Accepted: February 20, 2019; Online Published: February 22, 2019. doi:10.5539/ijsp.v8n2p159; URL: https://doi.org/10.5539/ijsp.v8n2p159

    Abstract: Heteroscedasticity plays an important role in data analysis. In this article, this issue along with a few different approaches for handling heteroscedasticity are presented. First, an iterative weighted least squares (IRLS) and an iterative feasible generalized least squares (IFGLS) approach are deployed, and proper weights for reducing heteroscedasticity are determined. Next, a new approach for handling heteroscedasticity is introduced. In this approach, through fitting a multiple linear regression (MLR) model or a general linear model (GLM) to a sufficiently large data set, the data is divided into two parts through the inspection of the residuals, based on the results of testing for heteroscedasticity or via simulations. The first part contains the records where the absolute values of the residuals could be assumed small enough that heteroscedasticity would be ignorable. Under this assumption, the error variances are small and close to their neighboring points. Such error variances could be assumed known (but not necessarily equal). The second, or remaining, portion of the said data is categorized as heteroscedastic. Through real data sets, it is concluded that this approach reduces the number of unusual (such as influential) data points suggested for further inspection and, more importantly, it lowers the root MSE (RMSE), resulting in a more robust set of parameter estimates.
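    A minimal one-step version of the feasible generalized least squares idea (fit OLS, model the residual variances, then reweight) can be sketched as follows; the variance model and data are illustrative, not the author's full iterative IRLS/IFGLS procedure:

```python
import numpy as np

def fgls_one_step(X, y):
    """One feasible GLS step: fit OLS, model the squared residuals
    to estimate per-observation variances, then solve the
    weighted normal equations."""
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    r2 = (y - X @ beta_ols) ** 2
    # Regress log(residual^2) on X so fitted variances stay positive
    gamma = np.linalg.lstsq(X, np.log(r2 + 1e-8), rcond=None)[0]
    var_hat = np.exp(X @ gamma)
    W = np.diag(1.0 / var_hat)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 200)
X = np.column_stack([np.ones_like(x), x])
y = X @ np.array([2.0, 0.5]) + rng.normal(0, 0.2 * x)  # variance grows with x
print(fgls_one_step(X, y))   # estimates near (2.0, 0.5)
```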
  • Bayes Estimates for the Linear Model
    Bayes Estimates for the Linear Model. By D. V. Lindley and A. F. M. Smith, University College, London. [Read before the Royal Statistical Society at a meeting organized by the Research Section on Wednesday, December 8th, 1971, Mr M. J. R. Healy in the Chair]

    Summary: The usual linear statistical model is reanalyzed using Bayesian methods and the concept of exchangeability. The general method is illustrated by applications to two-factor experimental designs and multiple regression. Keywords: linear model; least squares; Bayes estimates; exchangeability; admissibility; two-factor experimental design; multiple regression; ridge regression; matrix inversion.

    Introduction: Attention is confined in this paper to the linear model, E(y) = Aθ, where y is a vector of observations, A a known design matrix and θ a vector of unknown parameters. The usual estimate of θ employed in this situation is that derived by the method of least squares. We argue that it is typically true that there is available prior information about the parameters and that this may be exploited to find improved, and sometimes substantially improved, estimates. In this paper we explore a particular form of prior information based on de Finetti's (1964) important idea of exchangeability. The argument is entirely within the Bayesian framework. Recently there has been much discussion of the respective merits of Bayesian and non-Bayesian approaches to statistics: we cite, for example, the paper by Cornfield (1969) and its ensuing discussion. We do not feel that it is necessary or desirable to add to this type of literature, and since we know of no reasoned argument against the Bayesian position we have adopted it here.
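    In one simplified special case (parameters exchangeable around a known mean of zero with variance τ², normal errors with variance σ²), the Bayes posterior mean has the closed form the paper connects to ridge regression; a NumPy sketch with invented values (Lindley and Smith treat richer exchangeable structures with unknown hyperparameters):

```python
import numpy as np

def bayes_linear(A, y, sigma2=1.0, tau2=10.0):
    """Posterior mean for E(y) = A @ theta with prior theta_i ~ N(0, tau2)
    and noise y ~ N(A @ theta, sigma2 * I). With k = sigma2 / tau2 this
    is the ridge-regression estimator."""
    k = sigma2 / tau2
    p = A.shape[1]
    return np.linalg.solve(A.T @ A + k * np.eye(p), A.T @ y)

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 5))
theta = rng.normal(0, 3, 5)
y = A @ theta + rng.normal(0, 1, 30)
print(bayes_linear(A, y))   # shrunk toward zero relative to least squares
```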