Chapter 14. Linear Least Squares

Serik Sagitov, Chalmers and GU, March 9, 2016

1 Simple linear regression model

A linear model for the random response $Y = Y(x)$ to an independent variable $X = x$. For a given set of values $(x_1, \ldots, x_n)$ of the independent variable put
$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, \ldots, n,$$
assuming that the noise vector $(\epsilon_1, \ldots, \epsilon_n)$ has independent $N(0, \sigma^2)$ random components. Given the data $(y_1, \ldots, y_n)$, the model is characterised by the likelihood function of three parameters $\beta_0$, $\beta_1$, $\sigma^2$:
$$L(\beta_0, \beta_1, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\sigma} \exp\Big\{-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\Big\} = (2\pi)^{-n/2} \sigma^{-n} e^{-\frac{S(\beta_0, \beta_1)}{2\sigma^2}},$$
where $S(\beta_0, \beta_1) = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$. Observe that
$$n^{-1} S(\beta_0, \beta_1) = n^{-1} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 = \beta_0^2 + 2\beta_0\beta_1\bar{x} - 2\beta_0\bar{y} - 2\beta_1\overline{xy} + \beta_1^2\,\overline{x^2} + \overline{y^2}.$$

Least squares estimates

Regression lines: true $y = \beta_0 + \beta_1 x$ and fitted $y = b_0 + b_1 x$. We want to find $(b_0, b_1)$ such that the observed responses $y_i$ are approximated by the predicted responses $\hat{y}_i = b_0 + b_1 x_i$ in an optimal way.

Least squares method: find $(b_0, b_1)$ minimising the sum of squares $S(b_0, b_1) = \sum (y_i - \hat{y}_i)^2$. From $\partial S/\partial b_0 = 0$ and $\partial S/\partial b_1 = 0$ we get the so-called normal equations
$$\begin{cases} b_0 + b_1\bar{x} = \bar{y} \\ b_0\bar{x} + b_1\overline{x^2} = \overline{xy} \end{cases} \quad\text{implying}\quad b_1 = \frac{\overline{xy} - \bar{x}\bar{y}}{\overline{x^2} - \bar{x}^2} = \frac{r s_y}{s_x}, \qquad b_0 = \bar{y} - b_1\bar{x}.$$
The least squares regression line $y = b_0 + b_1 x$ takes the form $y = \bar{y} + r\frac{s_y}{s_x}(x - \bar{x})$. Here we use the
sample variances $s_x^2 = \frac{1}{n-1}\sum(x_i - \bar{x})^2$, $s_y^2 = \frac{1}{n-1}\sum(y_i - \bar{y})^2$,
sample covariance $s_{xy} = \frac{1}{n-1}\sum(x_i - \bar{x})(y_i - \bar{y})$,
sample correlation coefficient $r = \frac{s_{xy}}{s_x s_y}$.

The least squares estimates $(b_0, b_1)$ are the maximum likelihood estimates of $(\beta_0, \beta_1)$. The least squares estimates $(b_0, b_1)$ are not robust: outliers exert leverage on the fitted line.

2 Residuals

The estimated regression line predicts the responses to the values of the explanatory variable by $\hat{y}_i = \bar{y} + r\frac{s_y}{s_x}(x_i - \bar{x})$.
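The estimates $b_0$, $b_1$ and the predicted responses can be computed directly from the sample statistics above. A minimal sketch in Python with NumPy, using small made-up data (the numbers are illustrative, not taken from the notes):

```python
import numpy as np

# Hypothetical data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 2.3, 2.8, 4.2, 4.9, 6.1])
n = len(x)

# Sample statistics as defined in the text (all normalised by n - 1).
sx = x.std(ddof=1)                 # s_x
sy = y.std(ddof=1)                 # s_y
sxy = np.cov(x, y)[0, 1]           # s_xy (np.cov divides by n - 1 by default)
r = sxy / (sx * sy)                # sample correlation coefficient

# Least squares estimates from the normal equations.
b1 = r * sy / sx                   # slope: b1 = r * s_y / s_x
b0 = y.mean() - b1 * x.mean()      # intercept: b0 = y-bar - b1 * x-bar
yhat = b0 + b1 * x                 # predicted responses

# Cross-check against NumPy's own least squares fit.
b1_np, b0_np = np.polyfit(x, y, 1)
```

Both routes solve the same minimisation problem, so the two pairs of estimates agree up to rounding error.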
The noise in the observed responses $y_i$ is represented by the residuals
$$e_i = y_i - \hat{y}_i = y_i - \bar{y} - r\frac{s_y}{s_x}(x_i - \bar{x}),$$
which satisfy $e_1 + \ldots + e_n = 0$, $x_1 e_1 + \ldots + x_n e_n = 0$, $\hat{y}_1 e_1 + \ldots + \hat{y}_n e_n = 0$.

The residuals $e_i$ have normal distributions with zero mean and
$$\mathrm{Var}(e_i) = \sigma^2\Big(1 - \frac{\sum_k (x_k - x_i)^2}{n(n-1)s_x^2}\Big), \qquad \mathrm{Cov}(e_i, e_j) = -\sigma^2 \cdot \frac{\sum_k (x_k - x_i)(x_k - x_j)}{n(n-1)s_x^2}.$$

Error sum of squares
$$\mathrm{SSE} = \sum_i e_i^2 = \sum_i (y_i - \bar{y})^2 - 2r\frac{s_y}{s_x}\, n(\overline{xy} - \bar{y}\bar{x}) + r^2\frac{s_y^2}{s_x^2}\sum_i (x_i - \bar{x})^2 = (n-1)s_y^2(1 - r^2).$$

Corrected maximum likelihood estimate of $\sigma^2$: $s^2 = \frac{\mathrm{SSE}}{n-2} = \frac{n-1}{n-2}s_y^2(1 - r^2)$.

Using $y_i - \bar{y} = \hat{y}_i - \bar{y} + e_i$ we obtain SST = SSR + SSE, where
$\mathrm{SST} = \sum_i (y_i - \bar{y})^2 = (n-1)s_y^2$ is the total sum of squares,
$\mathrm{SSR} = \sum_i (\hat{y}_i - \bar{y})^2 = (n-1)b_1^2 s_x^2$ is the regression sum of squares.

Coefficient of determination: $r^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$. The coefficient of determination is the proportion of variation in $Y$ explained by the main factor $X$. Thus $r^2$ has a more transparent meaning than the correlation coefficient $r$.

To test the normality assumption, use the normal distribution plot for the standardised residuals $\frac{e_i}{s_i}$, where $s_i = s\sqrt{1 - \frac{\sum_k (x_k - x_i)^2}{n(n-1)s_x^2}}$ are the estimated standard deviations of $e_i$. The expected plot of the standardised residuals versus $x_i$ is a horizontal blur: this indicates linearity, with a variance that does not depend on $x$ (homoscedasticity).

Example (flow rate vs stream depth). For this example with $n = 10$, the scatter plot looks slightly non-linear. The residual plot gives a clearer picture, having a U-shape. After the log-log transformation, the scatter plot is closer to linear and the residual plot has a horizontal profile.

3 Confidence intervals and hypothesis testing

The least squares estimators $(b_0, b_1)$ are unbiased and consistent.
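The residual identities and the sum-of-squares decomposition can be checked numerically. A sketch under the same kind of made-up data assumption as before:

```python
import numpy as np

# Hypothetical data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 2.3, 2.8, 4.2, 4.9, 6.1])
n = len(x)

sx, sy = x.std(ddof=1), y.std(ddof=1)
r = np.corrcoef(x, y)[0, 1]
b1 = r * sy / sx
b0 = y.mean() - b1 * x.mean()

yhat = b0 + b1 * x                     # predicted responses
e = y - yhat                           # residuals

SSE = np.sum(e**2)                     # error sum of squares
SST = np.sum((y - y.mean())**2)        # total sum of squares
SST_alt = (n - 1) * sy**2              # same thing via the sample variance
SSR = np.sum((yhat - y.mean())**2)     # regression sum of squares

s2 = SSE / (n - 2)                     # corrected estimate of sigma^2
```

The assertions below verify that the residuals are orthogonal to the constant and to $x$, that SST = SSR + SSE, and that SSE equals $(n-1)s_y^2(1-r^2)$.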
Due to the normality assumption we have the following exact distributions:
$$b_0 \sim N(\beta_0, \sigma_0^2), \quad \sigma_0^2 = \frac{\sigma^2 \sum x_i^2}{n(n-1)s_x^2}, \qquad \frac{b_0 - \beta_0}{s_{b_0}} \sim t_{n-2}, \quad s_{b_0} = \frac{s\sqrt{\sum x_i^2}}{s_x\sqrt{n(n-1)}},$$
$$b_1 \sim N(\beta_1, \sigma_1^2), \quad \sigma_1^2 = \frac{\sigma^2}{(n-1)s_x^2}, \qquad \frac{b_1 - \beta_1}{s_{b_1}} \sim t_{n-2}, \quad s_{b_1} = \frac{s}{s_x\sqrt{n-1}}.$$
Weak dependence between the two estimators: $\mathrm{Cov}(b_0, b_1) = -\frac{\sigma^2\bar{x}}{(n-1)s_x^2}$.

Exact $100(1-\alpha)\%$ confidence interval for $\beta_i$: $b_i \pm t_{n-2}(\frac{\alpha}{2}) \cdot s_{b_i}$.

Hypothesis testing $H_0$: $\beta_i = \beta_{i0}$: test statistic $T = \frac{b_i - \beta_{i0}}{s_{b_i}}$, exact null distribution $T \sim t_{n-2}$.

Model utility test and zero-intercept test

$H_0$: $\beta_1 = 0$ (no relationship between $X$ and $Y$), test statistic $T = b_1/s_{b_1}$, null distribution $T \sim t_{n-2}$.
$H_0$: $\beta_0 = 0$, test statistic $T = b_0/s_{b_0}$, null distribution $T \sim t_{n-2}$.

Intervals for individual observations

Given $x$, predict the value $y$ for the random variable $Y = \beta_0 + \beta_1 x + \epsilon$. Its expected value $\mu = \beta_0 + \beta_1 x$ has the least squares estimate $\hat{\mu} = b_0 + b_1 x$. The standard error of $\hat{\mu}$ is computed as the square root of
$$\mathrm{Var}(\hat{\mu}) = \frac{\sigma^2}{n} + \frac{\sigma^2}{n-1}\Big(\frac{x - \bar{x}}{s_x}\Big)^2.$$

Exact $100(1-\alpha)\%$ confidence interval for the mean $\mu$: $b_0 + b_1 x \pm t_{n-2}(\frac{\alpha}{2}) \cdot s\sqrt{\frac{1}{n} + \frac{1}{n-1}\big(\frac{x-\bar{x}}{s_x}\big)^2}$.

Exact $100(1-\alpha)\%$ prediction interval for $y$: $b_0 + b_1 x \pm t_{n-2}(\frac{\alpha}{2}) \cdot s\sqrt{1 + \frac{1}{n} + \frac{1}{n-1}\big(\frac{x-\bar{x}}{s_x}\big)^2}$.

The prediction interval has wider limits, since
$$\mathrm{Var}(Y - \hat{\mu}) = \sigma^2 + \mathrm{Var}(\hat{\mu}) = \sigma^2\Big(1 + \frac{1}{n} + \frac{1}{n-1}\Big(\frac{x - \bar{x}}{s_x}\Big)^2\Big)$$
contains the uncertainty due to the noise factor $\epsilon$. Compare these two formulas by drawing the confidence bands around the regression line, both for the individual observation $y$ and for the mean $\mu$.

4 Linear regression and ANOVA

Recall the two independent samples case from Chapter 11:
first sample $\mu_1 + \epsilon_1, \ldots, \mu_1 + \epsilon_n$,
second sample $\mu_2 + \epsilon_{n+1}, \ldots, \mu_2 + \epsilon_{n+m}$,
where the noise variables are independent and identically distributed, $\epsilon_i \sim N(0, \sigma^2)$. This setting is equivalent to the simple linear regression model
$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad x_1 = \ldots = x_n = 0, \quad x_{n+1} = \ldots = x_{n+m} = 1,$$
with $\mu_1 = \beta_0$, $\mu_2 = \beta_0 + \beta_1$. The model utility test $H_0 : \beta_1 = 0$ is equivalent to the equality test $H_0 : \mu_1 = \mu_2$.
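The two-sample equivalence can be checked numerically: regressing on a 0/1 variable recovers the two group means, and the model utility statistic $T = b_1/s_{b_1}$ coincides with the pooled two-sample $t$ statistic. A sketch with invented samples:

```python
import numpy as np

# Hypothetical two-sample data: five observations at x = 0, five at x = 1.
y1 = np.array([4.8, 5.1, 5.3, 4.9, 5.2])   # first sample, mean mu_1
y2 = np.array([5.9, 6.2, 5.8, 6.1, 6.3])   # second sample, mean mu_2
y = np.concatenate([y1, y2])
x = np.concatenate([np.zeros(len(y1)), np.ones(len(y2))])
n = len(y)

# Simple linear regression on the 0/1 explanatory variable.
sx, sy = x.std(ddof=1), y.std(ddof=1)
r = np.corrcoef(x, y)[0, 1]
b1 = r * sy / sx                   # estimates mu_2 - mu_1
b0 = y.mean() - b1 * x.mean()      # estimates mu_1

e = y - (b0 + b1 * x)
s = np.sqrt(np.sum(e**2) / (n - 2))
T = b1 / (s / (sx * np.sqrt(n - 1)))   # model utility test statistic
```

The test below confirms $b_0 = \bar{y}_1$, $b_0 + b_1 = \bar{y}_2$, and that $T$ equals the pooled two-sample $t$ statistic.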
More generally, for the one-way ANOVA setting with $I = p$ levels of the main factor and $n = pJ$ observations
$$\beta_0 + \epsilon_i, \quad i = 1, \ldots, J,$$
$$\beta_0 + \beta_1 + \epsilon_i, \quad i = J+1, \ldots, 2J,$$
$$\ldots$$
$$\beta_0 + \beta_{p-1} + \epsilon_i, \quad i = (p-1)J+1, \ldots, n,$$
we need a multiple linear regression model
$$Y_i = \beta_0 + \beta_1 x_{i,1} + \ldots + \beta_{p-1} x_{i,p-1} + \epsilon_i, \quad i = 1, \ldots, n,$$
with dummy variables $x_{i,j}$ taking values 0 and 1 so that
$x_{i,1} = 1$ only for $i = J+1, \ldots, 2J$,
$x_{i,2} = 1$ only for $i = 2J+1, \ldots, 3J$,
$\ldots$
$x_{i,p-1} = 1$ only for $i = (p-1)J+1, \ldots, n$.

5 Multiple linear regression

Consider a linear regression model
$$Y = \beta_0 + \beta_1 x_1 + \ldots + \beta_{p-1} x_{p-1} + \epsilon, \qquad \epsilon \sim N(0, \sigma^2),$$
with $p - 1$ explanatory variables and homoscedastic noise. This is an extension of the simple linear regression model, which corresponds to $p = 2$. The data set consists of observations $(y_1, \ldots, y_n)$ with $n > p$, which are realisations of $n$ independent random variables
$$Y_1 = \beta_0 + \beta_1 x_{1,1} + \ldots + \beta_{p-1} x_{1,p-1} + \epsilon_1,$$
$$\ldots$$
$$Y_n = \beta_0 + \beta_1 x_{n,1} + \ldots + \beta_{p-1} x_{n,p-1} + \epsilon_n.$$
In matrix notation the column vector $\mathbf{y} = (y_1, \ldots, y_n)^T$ is a realisation of $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where
$$\mathbf{Y} = (Y_1, \ldots, Y_n)^T, \quad \boldsymbol{\beta} = (\beta_0, \ldots, \beta_{p-1})^T, \quad \boldsymbol{\epsilon} = (\epsilon_1, \ldots, \epsilon_n)^T$$
are column vectors, and $\mathbf{X}$ is the so-called design matrix
$$\mathbf{X} = \begin{pmatrix} 1 & x_{1,1} & \ldots & x_{1,p-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n,1} & \ldots & x_{n,p-1} \end{pmatrix},$$
assumed to have rank $p$. The least squares estimates $\mathbf{b} = (b_0, \ldots, b_{p-1})^T$ minimise $S(\mathbf{b}) = \|\mathbf{y} - \mathbf{X}\mathbf{b}\|^2$, where $\|\mathbf{a}\|$ is the length of a vector $\mathbf{a}$. Solving the normal equations $\mathbf{X}^T\mathbf{X}\mathbf{b} = \mathbf{X}^T\mathbf{y}$ we find the least squares estimates
$$\mathbf{b} = \mathbf{M}\mathbf{X}^T\mathbf{y}, \qquad \mathbf{M} = (\mathbf{X}^T\mathbf{X})^{-1}.$$
Least squares multiple regression: predicted responses $\hat{\mathbf{y}} = \mathbf{X}\mathbf{b} = \mathbf{P}\mathbf{y}$, where $\mathbf{P} = \mathbf{X}\mathbf{M}\mathbf{X}^T$.

The covariance matrix of the least squares estimates, $\Sigma_{bb} = \sigma^2\mathbf{M}$, is a $p \times p$ matrix with elements $\mathrm{Cov}(b_i, b_j)$. The vector of residuals $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I} - \mathbf{P})\mathbf{y}$ has covariance matrix $\Sigma_{ee} = \sigma^2(\mathbf{I} - \mathbf{P})$. An unbiased estimate of $\sigma^2$ is given by $s^2 = \frac{\mathrm{SSE}}{n-p}$, where $\mathrm{SSE} = \|\mathbf{e}\|^2$. The standard error of $b_j$ is computed as $s_{b_j} = s\sqrt{m_{jj}}$, where $m_{jj}$ is a diagonal element of $\mathbf{M}$.
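The matrix formulas translate almost line by line into code. A minimal sketch with an invented data set for $p = 3$ (design matrix, responses, and dimensions are all made up for illustration):

```python
import numpy as np

# Hypothetical data with p - 1 = 2 explanatory variables and n = 8.
X = np.array([[1.0, 0.5, 1.2],
              [1.0, 1.0, 0.8],
              [1.0, 1.5, 1.9],
              [1.0, 2.0, 1.1],
              [1.0, 2.5, 2.4],
              [1.0, 3.0, 1.7],
              [1.0, 3.5, 2.9],
              [1.0, 4.0, 2.2]])        # design matrix, first column of ones
y = np.array([1.1, 1.8, 3.2, 3.0, 4.8, 4.1, 6.0, 5.2])
n, p = X.shape

M = np.linalg.inv(X.T @ X)     # M = (X^T X)^{-1}
b = M @ X.T @ y                # least squares estimates b = M X^T y
P = X @ M @ X.T                # projection matrix, yhat = P y

yhat = X @ b                   # predicted responses
e = y - yhat                   # residuals
SSE = e @ e
s2 = SSE / (n - p)             # unbiased estimate of sigma^2
se = np.sqrt(s2 * np.diag(M))  # standard errors s_{b_j} = s * sqrt(m_jj)
```

The explicit inverse is used here only to mirror the notation of the notes; in practice a solver such as `np.linalg.lstsq` is numerically preferable.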
Exact sampling distributions:
$$\frac{b_j - \beta_j}{s_{b_j}} \sim t_{n-p}, \quad j = 0, 1, \ldots, p-1.$$
Inspect the normal probability plot for the standardised residuals $\frac{y_i - \hat{y}_i}{s\sqrt{1 - p_{ii}}}$, where $p_{ii}$ are the diagonal elements of $\mathbf{P}$.

The coefficient of multiple determination can be computed similarly to the simple linear regression model as $R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$, where $\mathrm{SST} = (n-1)s_y^2$. The problem with $R^2$ is that it increases even if irrelevant variables are added to the model. To punish for irrelevant variables it is better to use the adjusted coefficient of multiple determination
$$R_a^2 = 1 - \frac{n-1}{n-p} \cdot \frac{\mathrm{SSE}}{\mathrm{SST}} = 1 - \frac{s^2}{s_y^2}.$$
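The contrast between $R^2$ and $R_a^2$ can be illustrated with simulated data: adding a column to the design matrix can never increase SSE, so $R^2$ never decreases, while the $\frac{n-1}{n-p}$ factor in $R_a^2$ penalises the extra parameter. The seed, sample size, and noise level below are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
x1 = np.linspace(0, 4, n)
y = 1.0 + 2.0 * x1 + rng.normal(0, 0.5, n)   # simulated responses
junk = rng.normal(size=n)                    # irrelevant explanatory variable

def r2_pair(X, y):
    """Return (R^2, adjusted R^2) for a design matrix X."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    SSE = np.sum((y - X @ b)**2)
    SST = np.sum((y - y.mean())**2)
    return 1 - SSE / SST, 1 - (n - 1) / (n - p) * SSE / SST

X_small = np.column_stack([np.ones(n), x1])          # true model
X_big = np.column_stack([np.ones(n), x1, junk])      # plus an irrelevant column

R2_small, R2a_small = r2_pair(X_small, y)
R2_big, R2a_big = r2_pair(X_big, y)
# R^2 cannot decrease when a column is added; adjusted R^2 may.
```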
