Lecture 4: Relaxing the Assumptions of the Linear Model


R.G. Pierse

1 Overview

1.1 The Assumptions of the Classical Model

$$E(u_i) = 0, \quad i = 1, \dots, n \tag{A1}$$

$$E(u_i^2) = \sigma^2, \quad i = 1, \dots, n \tag{A2}$$

$$E(u_i u_j) = 0, \quad i, j = 1, \dots, n,\ j \neq i \tag{A3}$$

$X$ values are fixed in repeated sampling. (A4)

The variables $X_j$ are not perfectly collinear. (A4$'$)

$u_i$ is distributed with the normal distribution. (A5)

In this lecture we look at the implications of relaxing two of these assumptions: A2 and A3.

Assumption A2 is the assumption of homoscedasticity: the error variance is constant over all observations. If this assumption does not hold then the errors are heteroscedastic and

$$E(u_i^2) = \sigma_i^2, \quad i = 1, \dots, n$$

where the subscript $i$ on $\sigma_i^2$ indicates that the error variance can be different for each observation.

Assumption A3 is the assumption that the errors are serially uncorrelated. If this assumption does not hold then we say that the errors are serially correlated or, equivalently, that they exhibit autocorrelation. Symbolically,

$$E(u_i u_j) \neq 0, \quad j \neq i.$$

2 Autocorrelation

Here we consider relaxing Assumption A3 and allowing the errors to exhibit autocorrelation. Symbolically,

$$E(u_i u_j) \neq 0, \quad j \neq i. \tag{2.1}$$

Autocorrelation really only makes sense in time-series data, where there is a natural ordering for the observations. Hence we assume for the rest of this section that we are dealing with time-series data and use the suffices $t$ and $s$, rather than $i$ and $j$, to denote observations.

2.1 The First Order Autoregressive Model

The assumption (2.1) is too general to deal with as it stands, and we need a more precise model of the form that the autocorrelation takes. Specifically, we consider the hypothesis that the errors follow a first-order autoregressive or AR(1) scheme

$$u_t = \rho u_{t-1} + \varepsilon_t, \quad t = 1, \dots, T, \quad -1 < \rho < 1 \tag{2.2}$$

where $u_t$ and $\varepsilon_t$ are assumed to be independent error processes and $\varepsilon_t$ has the standard properties:

$$E(\varepsilon_t) = 0, \quad E(\varepsilon_t^2) = \sigma_\varepsilon^2, \quad t = 1, \dots, T$$

and

$$E(\varepsilon_t \varepsilon_s) = 0, \quad t, s = 1, \dots, T,\ s \neq t.$$

By successive substitution in (2.2) we can write

$$u_t = \varepsilon_t + \rho \varepsilon_{t-1} + \rho^2 \varepsilon_{t-2} + \dots + \rho^{t-1} \varepsilon_1 + \rho^t u_0$$

where the initial value $u_0$ is taken as fixed with $u_0 = 0$. Hence it follows that $E(u_t) = 0$.

Consider the variance of $u_t$ when $u_t$ follows a first-order autoregressive process:

$$E(u_t^2) = E\big[(\rho u_{t-1} + \varepsilon_t)^2\big] = \rho^2 E(u_{t-1}^2) + E(\varepsilon_t^2) + 2\rho\, E(u_{t-1} \varepsilon_t)$$

but the final term is zero since $u_t$ and $\varepsilon_t$ are assumed independent and, in the first term, $E(u_{t-1}^2) = E(u_t^2)$ since assumption A2 is still assumed to hold, so that

$$E(u_t^2) = \frac{\sigma_\varepsilon^2}{1 - \rho^2}.$$

Similarly, the first-order covariance is

$$E(u_t u_{t-1}) = E\big[(\rho u_{t-1} + \varepsilon_t) u_{t-1}\big] = \rho E(u_{t-1}^2) + E(u_{t-1} \varepsilon_t) = \frac{\sigma_\varepsilon^2 \rho}{1 - \rho^2}$$

and, more generally,

$$E(u_t u_{t-s}) = \frac{\sigma_\varepsilon^2 \rho^s}{1 - \rho^2}, \quad s \geq 0.$$

The autocorrelation between $u_t$ and $u_{t-s}$ decreases as the distance between the observations, $s$, increases. If $\rho$ is negative, then this autocorrelation alternates in sign.

2.2 Generalised Least Squares

Consider the multiple regression model

$$Y_t = \beta_1 + \beta_2 X_{2t} + \dots + \beta_k X_{kt} + u_t, \quad t = 1, \dots, T. \tag{2.3}$$

Lagging this equation and multiplying by $\rho$ gives

$$\rho Y_{t-1} = \rho\beta_1 + \rho\beta_2 X_{2,t-1} + \dots + \rho\beta_k X_{k,t-1} + \rho u_{t-1} \tag{2.4}$$

and, subtracting from the original equation,

$$Y_t - \rho Y_{t-1} = (1-\rho)\beta_1 + \beta_2(X_{2t} - \rho X_{2,t-1}) + \dots + \beta_k(X_{kt} - \rho X_{k,t-1}) + u_t - \rho u_{t-1}$$

or

$$Y_t^* = \beta_1^* + \beta_2 X_{2t}^* + \dots + \beta_k X_{kt}^* + \varepsilon_t, \quad t = 2, \dots, T \tag{2.5}$$

where $Y_t^* = Y_t - \rho Y_{t-1}$ and $X_{jt}^* = X_{jt} - \rho X_{j,t-1}$ are transformed variables known as quasi-differences.
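Since (2.5) is just a regression on quasi-differenced data, the GLS transformation is easy to carry out directly when $\rho$ is treated as known. Below is a minimal Python/NumPy sketch of this idea; it is illustrative only (the function names, the simulated data, and the use of `numpy.linalg.lstsq` for the transformed regression are assumptions of the example, not part of the lecture).

```python
import numpy as np

def quasi_difference(z, rho):
    """Return z_t - rho * z_{t-1} for t = 2, ..., T (the first observation is lost)."""
    z = np.asarray(z, dtype=float)
    return z[1:] - rho * z[:-1]

def gls_known_rho(y, X, rho):
    """GLS for Y_t = b1 + b2*X_2t + ... + bk*X_kt + u_t with AR(1) errors and known rho.

    X should contain a leading column of ones. Quasi-differencing that column turns it
    into a constant (1 - rho), so the coefficient recovered on it is b1 itself."""
    y_star = quasi_difference(y, rho)
    X_star = quasi_difference(X, rho)
    beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)  # OLS on transformed data = GLS
    return beta

# Illustrative usage with simulated data (assumed, not from the lecture):
rng = np.random.default_rng(0)
T, rho = 200, 0.6
eps = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + eps[t]      # AR(1) errors as in (2.2)
x2 = rng.normal(size=T)
y = 1.0 + 2.0 * x2 + u                  # "true" model with b1 = 1, b2 = 2
X = np.column_stack([np.ones(T), x2])
beta_gls = gls_known_rho(y, X, rho)     # should be close to (1.0, 2.0)
```

When $\rho$ is unknown, the same transformation is applied with an estimate $\hat{\rho}$ in its place, which is the Cochrane-Orcutt procedure discussed below.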
By transforming the equation, the error process has been transformed into one that obeys all the classical assumptions so that, if the value of $\rho$ is known, estimation of (2.5) gives the BLUE of the model. This estimator is known as the Generalised Least Squares or GLS estimator. Note that by quasi-differencing we lose one observation from the sample, because the first observation cannot be quasi-differenced. As long as $T$ is large enough, we do not need to worry about this.

For the simple regression model with only one explanatory variable,

$$\tilde{b} = \frac{\sum_{t=2}^{T} (x_t - \rho x_{t-1})(y_t - \rho y_{t-1})}{\sum_{t=2}^{T} (x_t - \rho x_{t-1})^2}$$

is the GLS estimator and

$$\mathrm{Var}(\tilde{b}) = \frac{\sigma_\varepsilon^2}{\sum_{t=2}^{T} (x_t - \rho x_{t-1})^2}.$$

2.3 Estimating $\rho$: The Cochrane-Orcutt Procedure

In practice the autoregressive coefficient $\rho$ is unknown and so has to be estimated. The Cochrane-Orcutt (1949) method is an iterative procedure that implements a feasible GLS estimator, estimating both $\rho$ and the regression parameters $\beta$ efficiently.

Step 1: Estimate the regression equation

$$Y_t = \beta_1 + \beta_2 X_{2t} + \dots + \beta_k X_{kt} + u_t, \quad t = 1, \dots, T$$

by OLS and form the OLS residuals $e_t$.

Step 2: Run the regression

$$e_t = \rho e_{t-1} + v_t$$

to give the OLS estimator

$$\hat{\rho} = \frac{\sum_{t=2}^{T} e_t e_{t-1}}{\sum_{t=2}^{T} e_t^2}.$$

Step 3: Form the quasi-differenced variables $Y_t^+ = Y_t - \hat{\rho}\, Y_{t-1}$ and $X_{jt}^+ = X_{jt} - \hat{\rho}\, X_{j,t-1}$, $j = 2, \dots, k$, and run the regression

$$Y_t^+ = \beta_1^+ + \beta_2 X_{2t}^+ + \dots + \beta_k X_{kt}^+ + \varepsilon_t^+, \quad t = 2, \dots, T.$$

This gives a new set of estimates of the $\beta$ parameters and a new set of residuals, $e_t^+$.

Steps 2 and 3 are then executed repeatedly to derive new estimates of $\beta$ and $\rho$, and this process continues until there is no change in the estimate of $\rho$ obtained from successive iterations. This is the full Cochrane-Orcutt iterative procedure. A simplified version is the two-step procedure, which does not iterate: once the OLS estimator of $\rho$ has been obtained in Step 2, the quasi-differenced regression in Step 3 is run only once.

2.4 Properties of OLS

In the presence of autocorrelation, OLS is no longer BLUE. However, it remains unbiased since

$$E(\hat{b}) = E\left(\frac{\sum_{t=1}^{T} x_t y_t}{\sum_{t=1}^{T} x_t^2}\right) = b + E\left(\frac{\sum_{t=1}^{T} x_t u_t}{\sum_{t=1}^{T} x_t^2}\right) = b$$

where the last equality uses the result that $E(u_t) = 0$. Because OLS is no longer BLUE, its variance must be larger than that of the BLUE GLS estimator, so OLS is inefficient. Moreover, the conventional formula for the OLS standard errors is invalid under autocorrelation: with positive autocorrelation it typically understates the true sampling variability, so coefficients appear more significant than they really are. Either way, inference based on the standard OLS output will be incorrect.

2.5 Testing for Autocorrelation

In practice we do not know whether or not we have autocorrelation. If autocorrelation is ignored, then although parameter estimates remain unbiased, inference based on those estimates will be incorrect. It is thus important to be able to test for autocorrelation in the OLS model. If autocorrelation is detected, then we can re-estimate the model by GLS.

2.5.1 The Durbin-Watson Test

This is the most famous test for autocorrelation. It is an exact test, in that the distribution of the test statistic holds for all sample sizes. However, it is only valid when (a) the regressors $X$ are fixed in repeated samples and (b) an intercept is included in the regression. Note in particular that the first assumption is violated if the regressors include any lagged dependent variables. The Durbin-Watson (1951) statistic is given by the formula

$$d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=2}^{T} e_t^2}. \tag{2.6}$$

Expanding the numerator gives

$$d = \frac{\sum_{t=2}^{T} e_t^2 + \sum_{t=2}^{T} e_{t-1}^2 - 2\sum_{t=2}^{T} e_t e_{t-1}}{\sum_{t=2}^{T} e_t^2} \simeq 2\left(1 - \frac{\sum_{t=2}^{T} e_t e_{t-1}}{\sum_{t=2}^{T} e_t^2}\right) = 2(1 - \hat{\rho})$$

since $\sum_{t=2}^{T} e_{t-1}^2 \equiv \sum_{t=1}^{T-1} e_t^2 \simeq \sum_{t=2}^{T} e_t^2$.
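The statistic (2.6) and the approximation $d \simeq 2(1-\hat{\rho})$ are straightforward to compute from a vector of OLS residuals. A minimal sketch follows (assuming a NumPy array `e` of residuals; the function names are invented for the example and are not part of the lecture):

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic (2.6): sum of squared changes in the residuals
    divided by the sum of squared residuals (summing from t = 2, as in the notes)."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e[1:] ** 2)

def rho_from_residuals(e):
    """OLS estimate of rho from the residual regression e_t = rho * e_{t-1} + v_t."""
    e = np.asarray(e, dtype=float)
    return np.sum(e[1:] * e[:-1]) / np.sum(e[1:] ** 2)

# For a given residual series, durbin_watson(e) is approximately
# 2 * (1 - rho_from_residuals(e)), matching the expansion above.
```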
Note that $0 \leq d \leq 4$, with $E(d) \simeq 2$ under the null hypothesis of no autocorrelation; values of $d$ below 2 are indicative of positive autocorrelation and values above 2 of negative autocorrelation. In general, the distribution of $d$ is a function of the explanatory variables $X$, so computing the exact distribution of the statistic is very complicated. However, Durbin and Watson were able to show that the distribution of $d$ is bounded by the distributions of two other statistics, $d_L$ and $d_U$, which do not depend on $X$. Critical values of the distributions of $d_L$ and $d_U$ were tabulated by Durbin and Watson.

The procedure for applying the Durbin-Watson test for positive autocorrelation is as follows:

$d < d_L$: reject the null hypothesis of no positive autocorrelation;
$d_L \leq d \leq d_U$: inconclusive region;
$d > d_U$: do not reject the null hypothesis of no autocorrelation.

Similarly, to test for negative autocorrelation:

$d > 4 - d_L$: reject the null hypothesis of no negative autocorrelation;
$4 - d_U \leq d \leq 4 - d_L$: inconclusive region;
$d < 4 - d_U$: do not reject the null hypothesis of no autocorrelation.

In both cases there is an inconclusive region where the test does not allow us to say whether or not there is autocorrelation. For example, with $T = 25$ and $k = 2$, the 5% critical values are $d_L = 1.29$ and $d_U = 1.45$, so that for any value of $d$ between these bounds the result of the test is indeterminate.

The inconclusive region in the Durbin-Watson test is clearly a nuisance. One solution is to adopt the modified $d$ test, where test values in the inconclusive region are treated as rejections of the null hypothesis.
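To pull Sections 2.2-2.5 together, the following sketch simulates a regression with AR(1) errors, computes the Durbin-Watson statistic from the OLS residuals, and then applies the iterative Cochrane-Orcutt procedure. It is a minimal illustration under the lecture's assumptions, not a reference implementation; the function and variable names, the simulated data, and the convergence tolerance are choices made for the example rather than part of the notes.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients and residuals for y = X @ beta + u (X includes a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def cochrane_orcutt(y, X, tol=1e-8, max_iter=100):
    """Iterative Cochrane-Orcutt feasible GLS for AR(1) errors.

    Step 1: OLS on the original equation; Steps 2 and 3 are repeated until the
    estimate of rho settles down. Setting max_iter=1 gives the two-step version."""
    beta, e = ols(y, X)                                         # Step 1
    rho = 0.0
    for _ in range(max_iter):
        rho_new = np.sum(e[1:] * e[:-1]) / np.sum(e[1:] ** 2)   # Step 2
        # Step 3: quasi-difference everything (including the constant column,
        # whose coefficient then remains beta_1) and re-estimate by OLS.
        y_plus = y[1:] - rho_new * y[:-1]
        X_plus = X[1:] - rho_new * X[:-1]
        beta, _ = ols(y_plus, X_plus)
        e = y - X @ beta                                        # residuals in the original equation
        if abs(rho_new - rho) < tol:
            rho = rho_new
            break
        rho = rho_new
    return beta, rho

# Illustrative usage: Y_t = 1 + 2*X_t + u_t with u_t = 0.7*u_{t-1} + eps_t (assumed data).
rng = np.random.default_rng(1)
T = 200
x = rng.normal(size=T)
u = np.zeros(T)
eps = rng.normal(size=T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + eps[t]
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(T), x])

beta_ols, e = ols(y, X)
d = np.sum(np.diff(e) ** 2) / np.sum(e[1:] ** 2)   # Durbin-Watson statistic (2.6)
beta_co, rho_hat = cochrane_orcutt(y, X)           # feasible GLS estimates of (b1, b2) and rho
```

With $\rho = 0.7$ in the simulated errors, the Durbin-Watson statistic should come out well below 2, consistent with the decision rule for positive autocorrelation above.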