4. Statistical Inference for Regression
Sociology 740, John Fox, Lecture Notes
Copyright © 2014 by John Fox

1. Goals

• To introduce the standard statistical models and assumptions for simple and multiple linear regression.
• To describe properties of the least-squares coefficients as estimators of the parameters of the regression model.
• To introduce flexible and general procedures for statistical inference based on least-squares estimators.
• To explore further the interpretation of regression equations.
• To show some fundamental results in matrix form (time permitting).

2. Simple Regression

2.1 The Simple-Regression Model

Standard statistical inference in simple regression is based upon a statistical 'model':

    Y_i = α + βx_i + ε_i

• The coefficients α and β are the population regression parameters to be estimated.
• The error ε_i represents the aggregated, omitted causes of Y:
  – other explanatory variables that could have been included;
  – measurement error in Y;
  – whatever component of Y is inherently random.
• The key assumptions of the simple-regression model concern the behavior of the errors or, equivalently, the distribution of Y conditional on X:
  – Linearity. E(ε_i) ≡ E(ε|x_i) = 0. Equivalently, the average value of Y is a linear function of x:

        E(Y_i) ≡ E(Y|x_i) = E(α + βx_i + ε_i) = α + βx_i + E(ε_i) = α + βx_i

  – Constant variance. V(ε|x_i) = σ_ε². Equivalently, the variance of Y around the regression line is constant:

        V(Y|x_i) = E[(Y_i − μ_i)²] = E[(Y_i − α − βx_i)²] = E(ε_i²) = σ_ε²

  – Normality. ε_i ~ N(0, σ_ε²). Equivalently, the conditional distribution of Y given x is normal: Y_i ~ N(α + βx_i, σ_ε²).

  The assumptions of linearity, constant variance, and normality are illustrated in Figure 1.

  [Figure 1. The assumptions of linearity, normality, and constant variance in the simple-regression model: normal conditional distributions p(Y|x), with means μ_1, ..., μ_5 on the line E(Y) = α + βx at x_1, ..., x_5 and equal spreads.]

  – Independence. The observations are sampled independently: any pair of errors ε_i and ε_j (or, equivalently, of conditional response-variable values, Y_i and Y_j) are independent for i ≠ j. The assumption of independence needs to be justified by the procedures of data collection.
  – Fixed X, or X independent of the error. Depending upon the design of a study, the values of the explanatory variable may be fixed in advance of data collection, or they may be sampled along with the response variable.
    · Fixed X corresponds almost exclusively to experimental research.
    · When, as is more common, X is sampled along with Y, we assume that the explanatory variable and the error are independent in the population from which the sample is drawn: that is, the error has the same distribution, N(0, σ_ε²), for every value of X in the population.
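To make the model and its assumptions concrete, here is a minimal simulation sketch (not part of the original notes; the values α = 2, β = 3, σ_ε = 1.5, and n = 100 are chosen purely for illustration). It generates data satisfying linearity, constant variance, normality, and independence, then recovers the parameters with the least-squares formulas discussed in the next section.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative population parameters (assumed values, not from the notes)
alpha, beta, sigma_eps = 2.0, 3.0, 1.5
n = 100

x = rng.uniform(0, 10, size=n)            # X sampled independently of the error
eps = rng.normal(0.0, sigma_eps, size=n)  # normal errors: mean 0, constant variance, independent
y = alpha + beta * x + eps                # linearity: E(Y|x) = alpha + beta*x

# Least-squares estimates A and B of alpha and beta
B = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
A = y.mean() - B * x.mean()
print(f"A = {A:.3f} (alpha = {alpha}),  B = {B:.3f} (beta = {beta})")
```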
2.2 Properties of the Least-Squares Estimator

Under the strong assumptions of the simple-regression model, the sample least-squares coefficients A and B have several desirable properties as estimators of the population regression coefficients α and β:

• The least-squares intercept and slope are linear estimators, in the sense that they are linear functions of the observations Y_i. For example,

      B = Σ_{i=1}^n m_i Y_i   where   m_i = (x_i − x̄) / Σ_{j=1}^n (x_j − x̄)²

  – This result is not important in itself, but it makes the distributions of the least-squares coefficients simple.
• Under the assumption of linearity, A and B are unbiased estimators of α and β:

      E(A) = α
      E(B) = β

• Under the assumptions of linearity, constant variance, and independence, A and B have simple sampling variances:

      V(A) = σ_ε² Σ x_i² / [n Σ (x_i − x̄)²]
      V(B) = σ_ε² / Σ (x_i − x̄)²

  – It is instructive to rewrite the formula for V(B) as

        V(B) = σ_ε² / [(n − 1) S_X²]

    where S_X² is the sample variance of X.
• The Gauss-Markov theorem: of all linear unbiased estimators, the least-squares estimators are the most efficient.
  – Under normality, the least-squares estimators are the most efficient among all unbiased estimators, not just among linear estimators. This is a much more compelling result.
• Under the full suite of assumptions, the least-squares coefficients A and B are the maximum-likelihood estimators of α and β.
• Under the assumption of normality, the least-squares coefficients are themselves normally distributed:

      A ~ N( α, σ_ε² Σ x_i² / [n Σ (x_i − x̄)²] )
      B ~ N( β, σ_ε² / Σ (x_i − x̄)² )

  – Even if the errors are not normally distributed, the distributions of A and B are approximately normal, with the approximation improving as the sample size grows (the central limit theorem).

2.3 Confidence Intervals and Hypothesis Tests

• The distributions of A and B cannot be directly employed for statistical inference, since σ_ε² is never known in practice.
• The variance of the residuals provides an unbiased estimator of σ_ε²,

      S_E² = Σ E_i² / (n − 2)

  and a basis for estimating the variances of A and B:

      V̂(A) = S_E² Σ x_i² / [n Σ (x_i − x̄)²]
      V̂(B) = S_E² / Σ (x_i − x̄)²

• The added uncertainty induced by estimating the error variance is reflected in the use of the t-distribution, in place of the normal distribution, for confidence intervals and hypothesis tests.
  – To construct a 100(1 − a)% confidence interval for the slope, we take

        β = B ± t_{a/2} SE(B)

    where t_{a/2} is the critical value of t with n − 2 degrees of freedom and a probability of a/2 to the right, and SE(B) is the square root of V̂(B). (This is just like a confidence interval for a population mean.)
  – Similarly, to test the hypothesis H_0: β = β_0 (most commonly, H_0: β = 0), calculate the test statistic

        t_0 = (B − β_0) / SE(B)

    which is distributed as t with n − 2 degrees of freedom under H_0.
  – Confidence intervals and hypothesis tests for α follow the same pattern.
• For Davis's regression of measured on reported weight, for example:

      S_E = √(418.87 / (101 − 2)) = 2.0569
      SE(A) = 2.0569 √329,731 / √(101 × 4539.3) = 1.7444
      SE(B) = 2.0569 / √4539.3 = 0.030529

  – Since t_{.025} for 101 − 2 = 99 degrees of freedom is 1.984, 95-percent confidence intervals for α and β are

        α = 1.778 ± 1.984 × 1.744 = 1.778 ± 3.460
        β = 0.9772 ± 1.984 × 0.03053 = 0.9772 ± 0.06057

    (These calculations are reproduced in the sketch below.)
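As a check on the arithmetic, this sketch recomputes the Davis intervals from the summary statistics quoted above. It is illustrative rather than part of the notes; SciPy is assumed to be available for the t quantile, and every numeric input comes from the example.

```python
import math
from scipy import stats

# Summary statistics for Davis's regression of measured on reported weight
n = 101
rss = 418.87            # residual sum of squares, sum of E_i**2
sum_x2 = 329_731.0      # sum of x_i**2
sxx = 4539.3            # sum of (x_i - x_bar)**2
A, B = 1.778, 0.9772    # least-squares intercept and slope

s_e = math.sqrt(rss / (n - 2))               # standard error of the regression: 2.0569
se_A = s_e * math.sqrt(sum_x2 / (n * sxx))   # SE(A) = 1.7444
se_B = s_e / math.sqrt(sxx)                  # SE(B) = 0.030529

t = stats.t.ppf(0.975, df=n - 2)             # t_{.025} with 99 df: 1.984
print(f"alpha: {A} ± {t * se_A:.3f}")        # 1.778 ± 3.460
print(f"beta : {B} ± {t * se_B:.5f}")        # 0.9772 ± 0.06057
print(f"t0 for H0: beta = 0 is {B / se_B:.1f}")  # about 32 with 99 df
```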
3. Multiple Regression

Most of the results for multiple-regression analysis parallel those for simple regression.

3.1 The Multiple-Regression Model

• The statistical model for multiple regression is

      Y_i = α + β_1 X_{i1} + β_2 X_{i2} + ··· + β_k X_{ik} + ε_i

• The assumptions underlying the model concern the errors, ε_i, and are identical to the assumptions in simple regression:
  – Linearity. E(ε_i) = 0.
  – Constant variance. V(ε_i) = σ_ε².
  – Normality. ε_i ~ N(0, σ_ε²).
  – Independence. ε_i and ε_j are independent for i ≠ j.
  – Fixed X's, or X's independent of ε.
• Under these assumptions (or particular subsets of them), the least-squares estimators A, B_1, ..., B_k of α, β_1, ..., β_k are
  – linear functions of the data, and hence relatively simple;
  – unbiased;
  – maximally efficient among unbiased estimators;
  – maximum-likelihood estimators;
  – normally distributed.
• The slope coefficient B_j in multiple regression has sampling variance

      V(B_j) = [1 / (1 − R_j²)] × [σ_ε² / Σ_{i=1}^n (X_{ij} − X̄_j)²]

  where R_j² is the squared multiple correlation from the regression of X_j on all of the other X's.
  – The second factor is essentially the sampling variance of the slope in simple regression, although the error variance σ_ε² is smaller than before.
  – The first factor, called the variance-inflation factor, is large when the explanatory variable X_j is strongly correlated with other explanatory variables (the problem of collinearity).

3.2 Confidence Intervals and Hypothesis Tests

3.2.1 Individual Slope Coefficients

• Confidence intervals and hypothesis tests for individual coefficients closely follow the pattern of simple-regression analysis:
  – The variance of the residuals provides an unbiased estimator of σ_ε²:

        S_E² = Σ E_i² / (n − k − 1)

  – Using S_E², we can calculate the standard error of B_j:

        SE(B_j) = [1 / √(1 − R_j²)] × [S_E / √Σ (X_{ij} − X̄_j)²]

  – Confidence intervals and tests, based on the t-distribution with n − k − 1 degrees of freedom, follow straightforwardly.
• For example, for Duncan's regression of occupational prestige on education and income:

      S_E² = 7506.7 / (45 − 2 − 1) = 178.73
      r_12 = 0.72451
      SE(B_1) = [1 / √(1 − 0.72451²)] × [√178.73 / √38,971] = 0.098252
      SE(B_2) = [1 / √(1 − 0.72451²)] × [√178.73 / √26,271] = 0.11967

  – With only two explanatory variables, R_1² = R_2² = r_12².
  – To construct 95-percent confidence intervals for the slope coefficients, we use t_{.025} = 2.018 from the t-distribution with 45 − 2 − 1 = 42 degrees of freedom:

        Education: β_1 = 0.5459 ± 2.018 × 0.09825 = 0.5459 ± 0.1983
        Income:    β_2 = 0.5987 ± 2.018 × 0.1197  = 0.5987 ± 0.2415

3.2.2 All Slopes

• We can also test the global or 'omnibus' null hypothesis that all of the regression slopes are zero,

      H_0: β_1 = β_2 = ··· = β_k = 0

  which is not quite the same as testing the separate hypotheses

      H_0^(1): β_1 = 0;  H_0^(2): β_2 = 0;  ...;  H_0^(k): β_k = 0

  – An F-test of the omnibus null hypothesis is given by

        F_0 = (RegSS / k) / (RSS / (n − k − 1)) = [(n − k − 1) / k] × [R² / (1 − R²)]

  – Under the null hypothesis, this test statistic follows an F-distribution with k and n − k − 1 degrees of freedom.
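As with the Davis example, the Duncan standard errors and intervals can be reproduced from the summary statistics reported above. This is an illustrative sketch (SciPy is assumed for the t quantile; all numeric inputs come from the example), and it also makes the variance-inflation factor explicit.

```python
import math
from scipy import stats

# Summary statistics for Duncan's regression of prestige on education and income
n, k = 45, 2
rss = 7506.7              # residual sum of squares
r12 = 0.72451             # correlation between the two explanatory variables
sxx1 = 38_971.0           # sum of (X_i1 - X_bar_1)**2, education
sxx2 = 26_271.0           # sum of (X_i2 - X_bar_2)**2, income
B1, B2 = 0.5459, 0.5987   # least-squares slopes

s_e = math.sqrt(rss / (n - k - 1))               # sqrt(178.73) = 13.369
vif = 1.0 / (1.0 - r12**2)                       # variance-inflation factor, about 2.10

se_B1 = math.sqrt(vif) * s_e / math.sqrt(sxx1)   # 0.098252
se_B2 = math.sqrt(vif) * s_e / math.sqrt(sxx2)   # 0.11967

t = stats.t.ppf(0.975, df=n - k - 1)             # t_{.025} with 42 df: 2.018
print(f"education: {B1} ± {t * se_B1:.4f}")      # 0.5459 ± 0.1983
print(f"income   : {B2} ± {t * se_B2:.4f}")      # 0.5987 ± 0.2415
```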
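The omnibus F-statistic is easy to compute from R² alone. Below is a small helper sketch; the function name and the R² value in the usage line are hypothetical, chosen only to show the mechanics, and SciPy supplies the F-distribution tail probability.

```python
from scipy import stats

def omnibus_f(r2: float, n: int, k: int) -> tuple[float, float]:
    """F0 = [(n - k - 1)/k] * R^2/(1 - R^2) for H0: beta_1 = ... = beta_k = 0,
    with its p-value from the F-distribution on k and n - k - 1 df."""
    f0 = ((n - k - 1) / k) * r2 / (1.0 - r2)
    p = stats.f.sf(f0, k, n - k - 1)   # right-tail probability
    return f0, p

# Hypothetical usage: R^2 = 0.50 with n = 45 observations and k = 2 regressors
f0, p = omnibus_f(0.50, 45, 2)
print(f"F0 = {f0:.1f} on 2 and 42 df, p = {p:.2g}")   # F0 = 21.0, p well below .001
```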