4. Statistical Inference for Regression

4. Statistical Inference for Regression

Sociology 740 John Fox Lecture Notes 4. Statistical Inference for Regression Copyright © 2014 by John Fox Statistical Inference for Regression 1 1. Goals: I To introduce the standard statistical models and assumptions for simple and multiple linear regression. I To describe properties of the least-squares coefficients as estimators of the parameters of the regression model. I To introduce flexible and general procedures for statistical inference based on least-squares estimators. I To explore further the interpretation of regression equations. I To show some fundamental results in matrix form (time permitting). c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 2 2. Simple Regression 2.1 The Simple-Regression Model Standard statistical inference in simple regression is based upon a statistical ‘model’: \l = + [l + %l I The coefficients and are the population regression parameters to be estimated. I The error %l represents the aggregated, omitted causes of \ : Other explanatory variables that could have been included. • Measurement error in \ . • Whatever component of \ is inherently random. • c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 3 I The key assumptions of the simple-regression model concern the behavior of the errors — or, equivalently, of the distribution of \ conditional on [: Linearity. H(%l) H(% {l)=0. Equivalently, the average value of \ • is a linear function of [:| H(\l) H(\ {l)=H( + {l + %l) l | = + {l + H(%l) = + {l 2 Constant Variance. Y (% {l)=%. Equivalently, the variance of \ • around the regression line| is constant: 2 2 2 2 Y (\ {l)=H[(\l ) ]=H[(\l {l) ]=H(% )= | l l % 2 Normality. %l Q(0>%). Equivalently, the conditional distribution of • 2 \ given { is normal: \l Q( + {l>%). The assumptions of linearity, constant variance, and normality are illustrated in Figure 1. c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 4 p(Y|x) Y E(Y) = DEx P5 P4 P3 P2 P1 X x1 x2 x3 x4 x5 Figure 1. The assumptions of linearity, normality, and constant variance in the simple-regression model. c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 5 Independence. The observations are sampled independently: Any • pair of errors %l and %m (or, equivalently, of conditional response- variable values, \l and \m) are independent for l = m. The assumption of independence needs to be justified by the procedures6 of data collection. Fixed [ or [ independent of the error. Depending upon the design • of a study, the values of the explanatory variable may be fixed in advance of data collection or they may be sampled along with the response variable. – Fixed [ corresponds almost exclusively to experimental research. – When, as is more common, [ is sampled along with \ , we assume that the explanatory variable and the error are independent in the population from which the sample is drawn: That is, the error has the 2 same distribution [Q(0>%)] for every value of [ in the population. c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 6 2.2 Properties of the Least-Squares Estimator Under the strong assumptions of the simple-regression model, the sample least-squares coefficients D and E have several desirable properties as estimators of the population regression coefficients and : I The least-squares intercept and slope are linear estimators,inthesense that they are linear functions of the observations \l. For example, q E = pl\l l=1 where X {l { pl = q 2 ({m {) m=1 P This result is not important in itself, but it makes the distributions of the • least-squares coefficients simple. c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 7 I Under the assumption of linearity, D and E are unbiased estimators of and : H(D)= H(E)= I Under the assumptions of linearity, constant variance, and indepen- dence, D and E have simple sampling variances: 2 2 % {l Y (D)= 2 q ({l {) P2 % Y (E)= P 2 ({l {) P It is instructive to rewrite the formula for Y (E): • 2 Y (E)= % (q 1)V2 [ c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 8 I The Gauss-Markov theorem: Of all linear unbiased estimators, the least-squares estimators are most efficient. Under normality, the least-squares estimators are most efficient among • all unbiased estimators, not just among linear estimators. This is a much more compelling result. I Under the full suite of assumptions, the least-squares coefficients D and E are the maximum-likelihood estimators of and . c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 9 I Under the assumption of normality, the least-squares coefficients are themselves normally distributed: 2 2 % {l D Q > 2 q ({l {) P2 ¸ % E Q > P 2 ({l {) ¸ P Even if the errors are not normally distributed, the distributions of D • and E are approximately normal, with the approximation improving as the sample size grows (the central limit theorem). c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 10 2.3 Confidence Intervals and Hypothesis Tests I The distributions of D and E cannot be directly employed for statistical 2 inference since % is never known in practice. 2 I The variance of the residuals provides an unbiased estimator of %> H2 V2 = l H q 2 and a basis for estimating the variancesP of D and E: V2 {2 Y (D)= H l q ({ {)2 Pl V2 b H Y (E)= P 2 ({l {) The added uncertainty induced by estimating the error variance I b P is reflectedintheuseofthew-distribution, in place of the normal distribution, for confidence intervals and hypothesis tests. c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 11 To construct a 100(1 d)% confidence interval for the slope, we take • = E w SE(E) ± d@2 where w is the critical value of w with q 2 degrees of freedom and d@2 a probability of d@2 to the right, and SE(E) isthesquarerootofY (E). (Thisisjustlikeaconfidence interval for a population mean.) b Similarly, to test the hypothesis K0: = 0 (most commonly, K0: =0), • calculate the test statistic E w = 0 0 SE(E) which is distributed as w with q 2 degrees of freedom under K0. Confidence intervals and hypothesis tests for follow the same • pattern. c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 12 I For Davis’s regression of measured on reported weight, for example: 418=87 V = =2=0569 H 101 2 r 2=0569 s329> 731 SE(D)= × =1=7444 s101 4539=3 × 2=0569 SE(E)= =0=030529 s4539=3 Since w=025 for 101 2=99degrees of freedom is 1=984, 95-percent • confidence intervals for and are =1=778 1=984 1=744 = 1=778 3=460 ± × ± =0=9772 1=984 0=03053 = 0=9772 0=06057 ± × ± c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 13 3. Multiple Regression Most of the results for multiple-regression analysis parallel those for simple regression. 3.1 The Multiple-Regression Model I The statistical model for multiple regression is \l = + [l1 + [l2 + + [ln + %l 1 2 ··· n I The assumptions underlying the model concern the errors, %l,andare identical to the assumptions in simple regression: Linearity. H(%l)=0. • 2 Constant Variance. Y (%l)= . • % 2 Normality. %l Q(0> ). • % Independence. %l>%m independent for l = m. • 6 Fixed [’s or [’s independent of %= • c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 14 I Under these assumptions (or particular subsets of them), the least- squares estimators D> E1> ===> En of > 1>===>n are linear functions of the data, and hence relatively simple; • unbiased; • maximally efficient among unbiased estimators; • maximum-likelihood estimators; • normally distributed. • I The slope coefficient Em in multiple regression has sampling variance 1 2 Y (E )= % m 2 q 2 1 Um × l=1([lm [m) 2 where Um is the squared multiple correlation from the regression of [m on all of the other [’s. P The second factor is essentially the sampling variance of the slope • 2 in simple regression, although the error variance % is smaller than before. c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 15 The first factor — called the variance-inflation factor — is large • when the explanatory variable [m is strongly correlated with other explanatory variables (the problem of collinearity). c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 16 3.2 Confidence Intervals and Hypothesis Tests 3.2.1 Individual Slope Coefficients I Confidence intervals and hypothesis tests for individual coefficients closely follow the pattern of simple-regression analysis: 2 The variance of the residuals provides an unbiased estimator of %: • H2 V2 = l H q n 1 2 P Using V , we can calculate the standard error of Em: • H 1 VH SE(Em)= 2 × 2 1 Um ([lm [m) Confidence intervals andq tests, basedq onP the w-distribution with q n 1 • degrees of freedom, follow straightforwardly. c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 17 I For example, for Duncan’s regression of occupational prestige on education and income: 7506=7 V2 = = 178=73 H 45 2 1 u12 = =72451 1 s178=73 SE(E1)= =0=098252 s1 =724512 × s38> 971 1 s178=73 SE(E2)= =0=11967 s1 =724512 × s26> 271 With only two explanatory variables, U2 = U2 = u2 . • 1 2 12 c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 18 To construct 95-percent confidence intervals for the slope coefficients, • we use w=025 =2=018 from the w-distribution with 45 2 1=42degrees of freedom: Education: 1 =0=5459 2=018 0=09825 = 0=5459 0=1983 Income: =0=5987 ± 2=018 × 0=1197 = 0=5987 ±0=2415 2 ± × ± c 2014 by John Fox Sociology 740 ° Statistical Inference for Regression 19 3.2.2 All Slopes I We can also test the global or ‘omnibus’ null hypothesis that all of the regression slopes are zero: K0: = = = =0 1 2 ··· n which is not quite the same as testing the separate hypotheses (1) (2) (n) K0 : 1 =0; K0 : 2 =0;===; K0 : n =0 An I -test for the omnibus null hypothesis is given by • RegSS I = n 0 RSS q n 1 q n 1 U2 = n × 1 U2 Under the null hypothesis, this test statistic follows an I -distribution • with n and q n 1 degrees of freedom.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    20 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us