Multiple Linear Regression (MLR) Handouts

Yibi Huang

• Data and Models
• Least Squares Estimate, Fitted Values, Residuals
• Sum of Squares
• How to Do Regression in R?
• Interpretation of Regression Coefficients
• t-Tests on Individual Regression Coefficients
• F-Tests for Comparing Nested Models

You may skip this lecture if you have taken STAT 224 or 245. However, you are encouraged to at least read through the slides if you skip the video lecture.

Data for Multiple Linear Regression Models

Multiple linear regression generalizes simple linear regression to the setting with multiple explanatory variables.

For SLR, the data are pairs $(x_i, y_i)$:

case 1:  x_1   y_1
case 2:  x_2   y_2
...
case n:  x_n   y_n

For MLR, the data are rows $(x_{i1}, x_{i2}, \ldots, x_{ip}, y_i)$:

case 1:  x_11  x_12  ...  x_1p  y_1
case 2:  x_21  x_22  ...  x_2p  y_2
...
case n:  x_n1  x_n2  ...  x_np  y_n

• For SLR, we observe pairs of variables; for MLR, we observe rows of variables. Each row (or pair) is called a case, a record, or a data point.
• $y_i$ is the response (or dependent variable) of the $i$th case.
• There are $p$ explanatory variables (or covariates, predictors, independent variables), and $x_{ik}$ is the value of the $k$th explanatory variable for the $i$th case.

Multiple Linear Regression Models

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i$$

In the model above,
• the $\varepsilon_i$'s (errors, or noise) are i.i.d. $N(0, \sigma^2)$;
• the parameters are: $\beta_0$ = intercept; $\beta_k$ = regression coefficient (slope) for the $k$th explanatory variable, $k = 1, \ldots, p$; and $\sigma^2 = \operatorname{Var}(\varepsilon_i)$ = the variance of the errors;
• observed (known): $y_i, x_{i1}, x_{i2}, \ldots, x_{ip}$; unknown: $\beta_0, \beta_1, \ldots, \beta_p$, $\sigma^2$, and the $\varepsilon_i$'s;
• random variables: the $\varepsilon_i$'s and the $y_i$'s; constants (nonrandom): the $\beta_k$'s, $\sigma^2$, and the $x_{ik}$'s.

Fitting the Model — Least Squares Method

Recall that for SLR, the least squares estimate $(\hat\beta_0, \hat\beta_1)$ of $(\beta_0, \beta_1)$ gives the intercept and slope of the straight line with the minimum sum of squared vertical distances to the data points,
$$\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2.$$

[Figure: scatterplot of X = % HS grad vs. Y = % in poverty, with the fitted least squares line.]

MLR is just like SLR. The least squares estimate $(\hat\beta_0, \ldots, \hat\beta_p)$ of $(\beta_0, \ldots, \beta_p)$ gives the intercept and slopes of the (hyper)plane with the minimum sum of squared vertical distances to the data points,
$$\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip})^2.$$

The "Hat" Notation

From now on, we use the "hat" notation to differentiate the estimated coefficient $\hat\beta_j$ from the actual unknown coefficient $\beta_j$.
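Before turning to how the estimates are computed, here is a minimal R sketch (an illustration added for concreteness, not part of the original handout; the data and variable names are made up) that simulates from an MLR model with $p = 2$ predictors and fits it by least squares with lm():

```r
# Simulate n cases from y = 1 + 2*x1 - 0.5*x2 + e, with e ~ N(0, 1)
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 1)

fit <- lm(y ~ x1 + x2)  # least squares fit of the MLR model
coef(fit)               # estimates of beta_0, beta_1, beta_2
```

The printed coefficients should be close to, but not exactly equal to, the true values $(1, 2, -0.5)$: in general $\hat\beta_j \neq \beta_j$.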
Solving the Least Squares Problem (1)

To find the $(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p)$ that minimize
$$L(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p) = \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip})^2,$$
one can take the partial derivatives of $L$ with respect to each $\hat\beta_j$,
$$\frac{\partial L}{\partial \hat\beta_0} = -2 \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip}),$$
$$\frac{\partial L}{\partial \hat\beta_k} = -2 \sum_{i=1}^n x_{ik}(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip}), \quad k = 1, 2, \ldots, p,$$
and equate them to 0. This yields the system of $(p+1)$ equations in $(p+1)$ unknowns below.

Solving the Least Squares Problem (2)

$$
\begin{aligned}
\hat\beta_0 \cdot n \;+\; \hat\beta_1 \sum_{i=1}^n x_{i1} \;+\; \cdots \;+\; \hat\beta_p \sum_{i=1}^n x_{ip} &= \sum_{i=1}^n y_i \\
\hat\beta_0 \sum_{i=1}^n x_{i1} \;+\; \hat\beta_1 \sum_{i=1}^n x_{i1}^2 \;+\; \cdots \;+\; \hat\beta_p \sum_{i=1}^n x_{i1} x_{ip} &= \sum_{i=1}^n x_{i1} y_i \\
&\;\;\vdots \\
\hat\beta_0 \sum_{i=1}^n x_{ik} \;+\; \hat\beta_1 \sum_{i=1}^n x_{ik} x_{i1} \;+\; \cdots \;+\; \hat\beta_p \sum_{i=1}^n x_{ik} x_{ip} &= \sum_{i=1}^n x_{ik} y_i \\
&\;\;\vdots \\
\hat\beta_0 \sum_{i=1}^n x_{ip} \;+\; \hat\beta_1 \sum_{i=1}^n x_{ip} x_{i1} \;+\; \cdots \;+\; \hat\beta_p \sum_{i=1}^n x_{ip}^2 &= \sum_{i=1}^n x_{ip} y_i
\end{aligned}
$$

Every sum in these equations is known; only the $\hat\beta_j$'s are unknown.

• Don't worry about solving the equations by hand. R and many other software packages can do the computation for us.
• In general $\hat\beta_j \neq \beta_j$, but they will be close under some conditions.

Fitted Values

The fitted value or predicted value is
$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_p x_{ip}.$$
Again, the "hat" symbol differentiates the fitted value $\hat y_i$ from the actual observed value $y_i$.

Errors and Residuals

• One cannot directly compute the errors
$$\varepsilon_i = y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip}$$
since the coefficients $\beta_0, \beta_1, \ldots, \beta_p$ are unknown.
• The errors $\varepsilon_i$ can be estimated by the residuals $e_i$, defined as
$$e_i = \text{observed } y_i - \text{predicted } y_i = y_i - \hat y_i = y_i - (\hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_p x_{ip}).$$
• In general $e_i \neq \varepsilon_i$, since $\hat\beta_j \neq \beta_j$.

Properties of Residuals

Recall that the least squares estimate $(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p)$ satisfies the equations
$$\sum_{i=1}^n \underbrace{(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip})}_{=\, y_i - \hat y_i \,=\, e_i} = 0 \quad\text{and}\quad \sum_{i=1}^n x_{ik}(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip}) = 0, \quad k = 1, 2, \ldots, p.$$
Thus the residuals $e_i$ have the properties
$$\sum_{i=1}^n e_i = 0 \qquad\text{and}\qquad \sum_{i=1}^n x_{ik} e_i = 0, \quad k = 1, 2, \ldots, p.$$
Residuals add up to 0, and residuals are orthogonal to the covariates.

Sum of Squares

Observe that $y_i - \bar y = (\hat y_i - \bar y) + (y_i - \hat y_i)$. Squaring both sides using the identity $(a+b)^2 = a^2 + b^2 + 2ab$, we get
$$(y_i - \bar y)^2 = (\hat y_i - \bar y)^2 + (y_i - \hat y_i)^2 + 2(\hat y_i - \bar y)(y_i - \hat y_i).$$
Summing over all cases $i = 1, 2, \ldots, n$, we get
$$\underbrace{\sum_{i=1}^n (y_i - \bar y)^2}_{SST} = \underbrace{\sum_{i=1}^n (\hat y_i - \bar y)^2}_{SSR} + \underbrace{\sum_{i=1}^n (y_i - \hat y_i)^2}_{SSE} + 2\sum_{i=1}^n (\hat y_i - \bar y)\underbrace{(y_i - \hat y_i)}_{=\, e_i},$$
and the cross term is 0, as shown next.

Why is $\sum_{i=1}^n (\hat y_i - \bar y)(y_i - \hat y_i) = 0$?

$$
\begin{aligned}
\sum_{i=1}^n (\hat y_i - \bar y)\underbrace{(y_i - \hat y_i)}_{=\, e_i}
&= \sum_{i=1}^n \hat y_i e_i - \sum_{i=1}^n \bar y e_i \\
&= \sum_{i=1}^n (\hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_p x_{ip}) e_i - \bar y \sum_{i=1}^n e_i \\
&= \hat\beta_0 \underbrace{\sum_{i=1}^n e_i}_{=0} + \hat\beta_1 \underbrace{\sum_{i=1}^n x_{i1} e_i}_{=0} + \cdots + \hat\beta_p \underbrace{\sum_{i=1}^n x_{ip} e_i}_{=0} - \bar y \underbrace{\sum_{i=1}^n e_i}_{=0} = 0,
\end{aligned}
$$
in which we used the properties of the residuals: $\sum_{i=1}^n e_i = 0$ and $\sum_{i=1}^n x_{ik} e_i = 0$ for all $k = 1, \ldots, p$.

Interpretation of Sum of Squares

$$\underbrace{\sum_{i=1}^n (y_i - \bar y)^2}_{SST} = \underbrace{\sum_{i=1}^n (\hat y_i - \bar y)^2}_{SSR} + \underbrace{\sum_{i=1}^n (y_i - \hat y_i)^2}_{SSE}$$

• SST = total sum of squares: the total variability of $y$; it depends on the response $y$ only, not on the form of the model.
• SSR = regression sum of squares: the variability of $y$ explained by $x_1, \ldots, x_p$.
• SSE = error (residual) sum of squares: $\min_{\beta_0, \beta_1, \ldots, \beta_p} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip})^2$; the variability of $y$ not explained by the $x$'s.
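The residual properties and the decomposition SST = SSR + SSE are easy to verify numerically. Continuing the simulated example above (again an illustrative sketch, not from the handout):

```r
e    <- residuals(fit)  # e_i = y_i - yhat_i
yhat <- fitted(fit)     # fitted values yhat_i

sum(e)        # ~ 0: residuals add up to 0
sum(x1 * e)   # ~ 0: residuals are orthogonal to each covariate
sum(x2 * e)   # ~ 0

SST <- sum((y - mean(y))^2)
SSR <- sum((yhat - mean(y))^2)
SSE <- sum(e^2)
SST - (SSR + SSE)  # ~ 0 (up to floating-point error): SST = SSR + SSE
```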
Degrees of Freedom

If the MLR model $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i$, with the $\varepsilon_i$'s i.i.d. $N(0, \sigma^2)$, is true, it can be shown that
$$\frac{SSE}{\sigma^2} \sim \chi^2_{n-p-1}.$$
If we further assume that $\beta_1 = \beta_2 = \cdots = \beta_p = 0$, then
$$\frac{SST}{\sigma^2} \sim \chi^2_{n-1}, \qquad \frac{SSR}{\sigma^2} \sim \chi^2_{p},$$
and SSR is independent of SSE. Note that the degrees of freedom of the three chi-square distributions,
$$df_T = n - 1, \qquad df_R = p, \qquad df_E = n - p - 1,$$
break down similarly, $df_T = df_R + df_E$, just like SST = SSR + SSE.

Why Does SSE Have $n - p - 1$ Degrees of Freedom?

The $n$ residuals $e_1, \ldots, e_n$ cannot all vary freely. There are $p + 1$ constraints:
$$\sum_{i=1}^n e_i = 0 \qquad\text{and}\qquad \sum_{i=1}^n x_{ik} e_i = 0 \ \text{ for } k = 1, \ldots, p,$$
so only $n - (p + 1)$ of them can vary freely. The $p + 1$ constraints come from the $p + 1$ coefficients $\beta_0, \ldots, \beta_p$ in the model; each contributes one constraint $\partial L / \partial \hat\beta_k = 0$.

Mean Square Error (MSE) — Estimate of $\sigma^2$

A mean square is a sum of squares divided by its degrees of freedom:
$$MST = \frac{SST}{df_T} = \frac{SST}{n-1} = \text{sample variance of } Y,$$
$$MSR = \frac{SSR}{df_R} = \frac{SSR}{p},$$
$$MSE = \frac{SSE}{df_E} = \frac{SSE}{n-p-1} = \hat\sigma^2.$$

• From the fact that $SSE/\sigma^2 \sim \chi^2_{n-p-1}$ and that the mean of a $\chi^2_k$ distribution is $k$, we know that MSE is an unbiased estimator of $\sigma^2$.

Example: Housing Price

Price  BDR  FLR   FP  RMS  ST  LOT  BTH  CON  GAR  LOC
   53    2   967   0    5   0   39  1.5    1  0.0    0
   55    2   815   1    5   0   33  1.0    1  2.0    0
   56    3   900   0    5   1   35  1.5    1  1.0    0
   58    3  1007   0    6   1   24  1.5    0  2.0    0

Price = Selling price in $1000
BDR = Number of bedrooms
FLR = Floor space in sq.
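Returning to the MSE formula above: as a final check on the simulated example (an illustrative sketch, not part of the handout), the same estimate of $\sigma^2$ can be computed by hand or read off the fitted lm object:

```r
p   <- 2                                    # number of explanatory variables
MSE <- sum(residuals(fit)^2) / (n - p - 1)  # SSE / df_E
MSE

summary(fit)$sigma^2  # lm's residual variance estimate; equals MSE
anova(fit)            # df, sums of squares, and mean squares by term
```

Since the simulated errors have standard deviation 1, MSE should come out near 1.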