3. Linear Least-Squares Regression


Sociology 740, John Fox. Lecture Notes. Copyright © 2014 by John Fox.

1. Goals

▶ To review/introduce the calculation and interpretation of the least-squares regression coefficients in simple and multiple regression.
▶ To review/introduce the calculation and interpretation of the regression standard error and the simple and multiple correlation coefficients.
▶ To introduce and criticize the use of standardized regression coefficients.
▶ Time and interest permitting: to introduce matrix arithmetic and least-squares regression in matrix form.

2. Introduction

▶ Despite its limitations, linear least squares lies at the very heart of applied statistics:
  • Some data are adequately summarized by linear least-squares regression.
  • The effective application of linear regression is expanded by data transformations and diagnostics.
  • The general linear model — an extension of least-squares linear regression — is able to accommodate a very broad class of specifications.
  • Linear least squares provides a computational basis for a variety of generalizations (such as generalized linear models).
▶ This lecture describes the mechanics and descriptive interpretation of linear least-squares regression.

3. Simple Regression

3.1 Least-Squares Fit

▶ Figure 1 shows Davis's data on the measured and reported weight, in kilograms, of 101 women who were engaged in regular exercise.
  • The relationship between measured and reported weight appears to be linear, so it is reasonable to fit a line to the plot.
▶ Denoting measured weight by $Y$ and reported weight by $X$, a line relating the two variables has the equation $Y = A + BX$.
  • No line can pass perfectly through all of the data points. A residual, $E$, reflects this fact.
  • The regression equation for the $i$th of the $n = 101$ observations is
      $Y_i = A + BX_i + E_i = \hat{Y}_i + E_i$

[Figure 1 (scatterplot of Measured Weight (kg) against Reported Weight (kg)). A least-squares line fit to Davis's data on reported and measured weight. The broken line is the line $Y = X$. Some points are over-plotted.]

▶ The residual
      $E_i = Y_i - \hat{Y}_i = Y_i - (A + BX_i)$
  is the signed vertical distance between the point and the line, as shown in Figure 2.
▶ A line that fits the data well makes the residuals small.
  • Simply requiring that the sum of residuals, $\sum_{i=1}^{n} E_i$, be small is futile, since large negative residuals can offset large positive ones.
  • Indeed, any line through the point of means $(\bar{X}, \bar{Y})$ has $\sum E_i = 0$.
▶ Two possibilities immediately present themselves:
  • Find $A$ and $B$ to minimize the sum of absolute residuals, $\sum |E_i|$, which leads to least-absolute-values (LAV) regression.
  • Find $A$ and $B$ to minimize the sum of squared residuals, $\sum E_i^2$, which leads to least-squares (LS) regression.

[Figure 2 (diagram of the point $(X_i, Y_i)$ and the fitted line $\hat{Y} = A + BX$). The residual $E_i$ is the signed vertical distance between the point and the line.]
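The contrast between these criteria is easy to see numerically. Below is a minimal sketch (not part of Fox's notes) using synthetic data in place of Davis's weights; the variable names and generating values are illustrative assumptions.

```python
# Sketch: why minimizing the plain sum of residuals is futile, and how the
# LAV and LS criteria behave.  Synthetic data stand in for Davis's weights
# (the real data are not reproduced here).
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(40, 75, size=101)            # "reported weight" (kg)
Y = 1.5 + 0.97 * X + rng.normal(0, 2, 101)   # "measured weight" (kg) with noise

Xbar, Ybar = X.mean(), Y.mean()

# Any line through the point of means (Xbar, Ybar) has residuals summing to ~0,
# regardless of its slope -- so sum(E_i) cannot choose among such lines.
for B in (0.5, 1.0, 2.0):
    A = Ybar - B * Xbar                      # forces the line through the means
    E = Y - (A + B * X)
    print(f"B={B:4.1f}  sum(E)={E.sum():9.2e}  "
          f"sum|E|={np.abs(E).sum():8.1f}  sum(E^2)={np.sum(E**2):9.1f}")

# The sum of residuals is essentially zero for every slope, while the LAV
# criterion sum|E| and the LS criterion sum(E^2) both clearly prefer B near 1.
```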
▶ In least-squares regression, we seek the values of $A$ and $B$ that minimize
      $S(A, B) = \sum_{i=1}^{n} E_i^2 = \sum (Y_i - A - BX_i)^2$
  • For those with calculus:
    – The most direct approach is to take the partial derivatives of the sum-of-squares function with respect to the coefficients:
        $\frac{\partial S(A, B)}{\partial A} = \sum (-1)(2)(Y_i - A - BX_i)$
        $\frac{\partial S(A, B)}{\partial B} = \sum (-X_i)(2)(Y_i - A - BX_i)$
  • Setting these partial derivatives to zero yields simultaneous linear equations for $A$ and $B$, the normal equations for simple regression:
        $An + B \sum X_i = \sum Y_i$
        $A \sum X_i + B \sum X_i^2 = \sum X_i Y_i$
  • Solving the normal equations produces the least-squares coefficients:
        $A = \bar{Y} - B\bar{X}$
        $B = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
    – The formula for $A$ implies that the least-squares line passes through the point of means of the two variables. The least-squares residuals therefore sum to zero.
    – The second normal equation implies that $\sum X_i E_i = 0$; similarly, $\sum \hat{Y}_i E_i = 0$. These properties imply that the residuals are uncorrelated with both the $X$'s and the $\hat{Y}$'s.
▶ For Davis's data on measured weight ($Y$) and reported weight ($X$):
      $n = 101$
      $\bar{Y} = 5780/101 = 57.23$
      $\bar{X} = 5731/101 = 56.74$
      $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = 4435$
      $\sum (X_i - \bar{X})^2 = 4539$
      $B = 4435/4539 = 0.9771$
      $A = 57.23 - 0.9771 \times 56.74 = 1.789$
  • The least-squares regression equation is
      $\widehat{\text{Measured Weight}} = 1.79 + 0.977 \times \text{Reported Weight}$
▶ Interpretation of the least-squares coefficients:
  • $B = 0.977$: A one-kilogram increase in reported weight is associated, on average, with just under a one-kilogram increase in measured weight.
    – Since the data are not longitudinal, the phrase "a unit increase" here implies not a literal change over time, but rather a static comparison between two individuals who differ by one kilogram in their reported weights.
  • Ordinarily, we may interpret the intercept $A$ as the fitted value associated with $X = 0$, but it is impossible for an individual to have a reported weight equal to zero.
    – The intercept $A$ is usually of little direct interest, since the fitted value at $X = 0$ is rarely important.
    – Here, however, if individuals' reports are unbiased predictions of their actual weights, then we should have $\hat{Y} = X$ — i.e., $A = 0$ and $B = 1$. The intercept $A = 1.79$ is close to zero, and the slope $B = 0.977$ is close to one.

3.2 Simple Correlation

▶ It is of interest to determine how closely the line fits the scatter of points.
▶ The standard deviation of the residuals, $S_E$, called the standard error of the regression, provides one index of fit.
  • Because of estimation considerations, the variance of the residuals is defined using $n - 2$ degrees of freedom:
      $S_E^2 = \frac{\sum E_i^2}{n - 2}$
  • The standard error is therefore
      $S_E = \sqrt{\frac{\sum E_i^2}{n - 2}}$
  • Since it is measured in the units of the response variable, the standard error represents a type of 'average' residual.
  • For Davis's regression of measured on reported weight, the sum of squared residuals is $\sum E_i^2 = 418.9$, and the standard error is
      $S_E = \sqrt{\frac{418.9}{101 - 2}} = 2.05 \text{ kg}$
  • I believe that social scientists overemphasize correlation and pay insufficient attention to the standard error of the regression.
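A short sketch of these formulas, again on synthetic stand-in data (so the numbers will not match Davis's): it computes $A$ and $B$ from the closed-form solution, checks the residual properties noted above, and reports the standard error of the regression.

```python
# Sketch: least-squares coefficients from the normal-equation solution,
# the residual properties, and the standard error S_E.
# Synthetic data; the generating values are illustrative, not Fox's.
import numpy as np

rng = np.random.default_rng(1)
n = 101
X = rng.uniform(40, 75, size=n)              # reported weight (kg)
Y = 1.8 + 0.98 * X + rng.normal(0, 2, n)     # measured weight (kg)

Xbar, Ybar = X.mean(), Y.mean()
B = np.sum((X - Xbar) * (Y - Ybar)) / np.sum((X - Xbar) ** 2)
A = Ybar - B * Xbar                          # line passes through (Xbar, Ybar)

Yhat = A + B * X
E = Y - Yhat                                 # least-squares residuals

print(f"A = {A:.3f}, B = {B:.4f}")
print(f"sum(E_i)        = {E.sum():.2e}")           # residuals sum to zero
print(f"sum(X_i E_i)    = {np.sum(X * E):.2e}")     # uncorrelated with the X's
print(f"sum(Yhat_i E_i) = {np.sum(Yhat * E):.2e}")  # uncorrelated with fitted values

S_E = np.sqrt(np.sum(E ** 2) / (n - 2))      # standard error of the regression
print(f"S_E = {S_E:.2f} kg")
```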
▶ The correlation coefficient provides a relative measure of fit: to what degree do our predictions of $Y$ improve when we base that prediction on the linear relationship between $Y$ and $X$?
  • A relative index of fit requires a baseline — how well can $Y$ be predicted if $X$ is disregarded?
    – To disregard the explanatory variable is implicitly to fit the equation
        $Y_i = A' + E_i'$
    – We can find the best-fitting constant $A'$ by least squares, minimizing
        $S(A') = \sum E_i'^2 = \sum (Y_i - A')^2$
    – The value of $A'$ that minimizes this sum of squares is the response-variable mean, $\bar{Y}$.
  • The residuals $E_i = Y_i - \hat{Y}_i$ from the linear regression of $Y$ on $X$ will generally be smaller than the residuals $E_i' = Y_i - \bar{Y}$, and it is necessarily the case that
        $\sum (Y_i - \hat{Y}_i)^2 \le \sum (Y_i - \bar{Y})^2$
    – This inequality holds because the 'null model' $Y_i = A' + E_i'$ is a special case of the more general linear-regression 'model' $Y_i = A + BX_i + E_i$, setting $B = 0$.
  • We call
        $\sum E_i'^2 = \sum (Y_i - \bar{Y})^2$
    the total sum of squares for $Y$, abbreviated TSS, while
        $\sum E_i^2 = \sum (Y_i - \hat{Y}_i)^2$
    is called the residual sum of squares, and is abbreviated RSS.
  • The difference between the two, termed the regression sum of squares,
        $\text{RegSS} \equiv \text{TSS} - \text{RSS}$
    gives the reduction in squared error due to the linear regression.
  • The ratio of RegSS to TSS, the proportional reduction in squared error, defines the square of the correlation coefficient:
        $r^2 \equiv \frac{\text{RegSS}}{\text{TSS}}$
  • To find the correlation coefficient $r$, we take the positive square root of $r^2$ when the simple-regression slope $B$ is positive, and the negative square root when $B$ is negative.
  • If there is a perfect positive linear relationship between $Y$ and $X$, then $r = 1$.
  • A perfect negative linear relationship corresponds to $r = -1$.
  • If there is no linear relationship between $Y$ and $X$, then $\text{RSS} = \text{TSS}$, $\text{RegSS} = 0$, and $r = 0$.
  • Between these extremes, $r$ gives the direction of the linear relationship between the two variables, and $r^2$ may be interpreted as the proportion of the total variation of $Y$ that is 'captured' by its linear regression on $X$.
  • Figure 3 depicts several different levels of correlation.
▶ The decomposition of total variation into 'explained' and 'unexplained' components, paralleling the decomposition of each observation into a fitted value and a residual, is typical of linear models.
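The sums-of-squares decomposition and $r^2$ can be verified directly. The sketch below continues the synthetic-data example (not Fox's data or code) and checks that $\pm\sqrt{\text{RegSS}/\text{TSS}}$, signed by the slope, matches the usual Pearson correlation.

```python
# Sketch: TSS/RSS/RegSS decomposition and the correlation coefficient.
# Continues the synthetic-data example used above.
import numpy as np

rng = np.random.default_rng(2)
n = 101
X = rng.uniform(40, 75, size=n)
Y = 1.8 + 0.98 * X + rng.normal(0, 2, n)

B = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
A = Y.mean() - B * X.mean()
Yhat = A + B * X

TSS = np.sum((Y - Y.mean()) ** 2)    # residuals from the null model Y_i = A' + E_i'
RSS = np.sum((Y - Yhat) ** 2)        # residuals from the regression of Y on X
RegSS = TSS - RSS                    # reduction in squared error due to the regression

r2 = RegSS / TSS
r = np.sign(B) * np.sqrt(r2)         # sign of r follows the sign of the slope

print(f"TSS={TSS:.1f}  RSS={RSS:.1f}  RegSS={RegSS:.1f}  r^2={r2:.3f}  r={r:.3f}")
print("matches Pearson's r:", np.isclose(r, np.corrcoef(X, Y)[0, 1]))
```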