Linear Regression Using Ordinary Least Squares
Simple Linear Regression Using Ordinary Least Squares

Purpose: To approximate a linear relationship with a line.
Reason: We want to be able to predict Y using X.
Definition: The Least Squares Regression (LSR) line is the line whose sum of squared residuals is smaller than that of any other line. That is, we want to measure how close the line is to the points. The LSR line uses the vertical distance from the points to the line. A residual is the vertical distance from a point to the line.

Equations:
The true line for the population: y = α + βx + ε
Note: We are trying to estimate this equation. We obtain a sample and estimate it with the line ŷ = a + bx.
ŷ = the predicted (estimated) value of y
a = the sample estimate of α, the y-intercept
b = the sample estimate of β, the slope
x = the independent variable
ε = epsilon, the error term; these are random errors of measurement.

Check the Assumptions:
1. No outliers.
2. Residuals follow a normal distribution with a mean of 0.
3. Residuals should be randomly scattered.
Note: For #2, produce a histogram of the standardized residuals to see if they are normal.

Hypotheses for the Correlation Coefficient (R), a measure of how closely the points cluster around the regression line:
H0: ρ = 0
H1: ρ ≠ 0 (or ρ < 0, or ρ > 0)

Coefficient of Determination = R²
Range: 0 ≤ R² ≤ 1
R² is an effect size measure and gives the percentage of the variation in the Y values explained by X.
Adjusted R²: Adjusts for R²'s upward bias and is a variance-accounted-for effect size measure. For example, Adj. R² = .40 means 40% of the variation in Y is explained by the regression line and depends on X.

R² = SSR / SST, i.e., what is explained by the regression line (SS Regression) divided by the total variability in the y values, SST = Σ(y − ȳ)².

SEE: The square root of the Mean Square Residual is the Standard Error of the Estimate, the amount of error in the model measured in DV units. The higher the Adj. R² value, the smaller the amount of error in the model (i.e., the smaller the SEE) and the more stability the model will have upon replication.

F Test Hypotheses:
H0: The regression does not explain a significant proportion of the variance in Y.
H1: The regression does explain a significant proportion of the variance in Y.

ANOVA Results (how close the points are to the line):
1. SSE = Sum of Squared Residuals/Errors: Variation attributed to factors other than the relationship between X and Y.
2. SST = Sum of Squares Total: A measure of the total variability of the Y values around their mean (i.e., how much the Y values vary).
3. SSR = Sum of Squares Regression: The explained variation attributed to the relationship between X and Y.
Note: We want a large SS Regression and a small SSE. That is, the points are close to the line and the line does a good job of predicting.

Hypotheses for the Slope:
H0: β = 0 (no linear relationship)
H1: β ≠ 0 (or β < 0, or β > 0) (linear relationship)
Test statistic: based on the sampling distribution of the slope estimate b:
1. The distribution of b is normal.
2. Its mean is β.
3. Its standard deviation is S_b = Se / √(Σ(x − x̄)²), where Se = √(SSE / (n − 2)) is a measure of the variation of the points around the line, and the 2 in n − 2 is the number of parameters (the things you are estimating: a and b).
t = b / S_b

Hypotheses for the Intercept:
H0: α (y-intercept) = 0
H1: α ≠ 0 (or α < 0, or α > 0)
t = a / S_a

Confidence Interval for b (Unstandardized Coefficient):
b ± (1.645, 1.96, or 2.58) × S_b, using 1.645 for 90%, 1.96 for 95%, and 2.58 for 99% confidence.
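To make the formulas above concrete, here is a minimal Python sketch (assuming NumPy and SciPy are available; the function name simple_ols and all variable names are illustrative, not from the handout). It fits the least squares line and computes R², adjusted R², the SEE, the F test, the t test for the slope, and a confidence interval for b. It uses the exact t critical value rather than the handout's z approximations (1.645, 1.96, 2.58).

    # A minimal sketch of the quantities defined above (illustrative, not the
    # handout's own code). Assumes NumPy and SciPy are installed.
    import numpy as np
    from scipy import stats

    def simple_ols(x, y, conf=0.95):
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = len(x)
        x_bar, y_bar = x.mean(), y.mean()

        # Least squares slope and intercept: y-hat = a + b*x
        b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
        a = y_bar - b * x_bar
        resid = y - (a + b * x)

        # Sums of squares: SST = SSR + SSE
        sse = np.sum(resid ** 2)            # unexplained (error) variation
        sst = np.sum((y - y_bar) ** 2)      # total variation in Y
        ssr = sst - sse                     # variation explained by the line

        r2 = ssr / sst                      # coefficient of determination
        adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
        see = np.sqrt(sse / (n - 2))        # standard error of the estimate

        # F test for the regression (1 and n - 2 degrees of freedom)
        f_stat = ssr / (sse / (n - 2))
        p_f = stats.f.sf(f_stat, 1, n - 2)

        # t test for the slope: t = b / S_b
        s_b = see / np.sqrt(np.sum((x - x_bar) ** 2))
        t_b = b / s_b
        p_b = 2 * stats.t.sf(abs(t_b), n - 2)

        # Confidence interval for b, using the t critical value
        t_crit = stats.t.ppf(1 - (1 - conf) / 2, n - 2)
        ci_b = (b - t_crit * s_b, b + t_crit * s_b)

        return {"a": a, "b": b, "R2": r2, "adj_R2": adj_r2, "SEE": see,
                "F": f_stat, "p_F": p_f, "t": t_b, "p_t": p_b, "CI_b": ci_b}

On the height/weight data in Example 1 below, this should reproduce approximately a = -316.86, b = 6.97, and R² = .88.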
Simple Linear Regression: Example 1

The linear model assumes that the relations between two variables can be summarized by a straight line. The X variable is often called the predictor and Y is often called the criterion. We often talk about the regression of Y on X, so that if we were predicting GPA from SAT we would talk about the regression of GPA on SAT. The regression problems that we deal with will use a line to transform values of X to predict values of Y. In general, not all of the points will fall on the line, but we will choose our regression line so as to best summarize the relations between X and Y.

Suppose we measured the height and weight of a random sample of 10 adults in DeKalb. We want to predict weight from height in the population.

                  Ht      Wt
                  61      105
                  62      120
                  63      120
                  65      160
                  65      120
                  68      145
                  69      175
                  70      160
                  72      185
                  75      210
    N             10      10
    Mean          67      150
    Variance (s²) 20.89   1155.5
    SD (s)        4.57    33.99
    Correlation (r) = .94

For the regression of weight on height, we found Ŷ = -316.86 + 6.97(X), where -316.86 is the intercept (estimating α) and 6.97 is the slope (estimating β). We could also write that weight is -316.86 + 6.97(height). The slope value means that for each inch we increase in height, we expect weight to increase by approximately 7 pounds. The intercept is the value of Y that we expect when X is zero. So if we had a person 0 inches tall, they should weigh -316.86 pounds (i.e., 6.97 × 0 = 0; -316.86 + 0 = -316.86). Of course, we do not find people who are zero inches tall, and we do not find people with negative weight. Sometimes, in educational research, the value of the intercept will have no meaningful interpretation.

Simple Linear Regression: Example 2

A. Unstandardized Regression Coefficients
Predicted Self-destructiveness = -108.92 + 22.33(Alcohol)
22.33 (Alcohol): We predict a 22.33-point increase in self-destructiveness for a one-point increase in Alcohol when all other variables are held constant.

B. Standardized Regression Coefficients
Predicted Self-destructiveness = 0.49(Alcohol)
0.49 (Alcohol): We predict a 0.49 standard deviation increase in self-destructiveness for a one standard deviation increase in Alcohol when all other variables are held constant.

Simple Linear Regression: Example 3

The linear model tells us that each observed Y is composed of two parts: (1) a linear function of X, and (2) an error. We can use the regression line to predict values of Y given values of X. For any given value of X, we go straight up to the line and then move horizontally to the left to find the value of Y. The value of Y on the line is called the predicted value of Y and is denoted Y'. The difference between the observed Y and the predicted Y (Y - Y') is called a residual. The predicted Y part is the linear part; the residual is the error.

    N        Ht     Wt       Y'        Residual
    1        61     105      108.19     -3.19
    2        62     120      115.16      4.84
    3        63     120      122.13     -2.13
    4        65     160      136.06     23.94
    5        65     120      136.06    -16.06
    6        68     145      156.97    -11.97
    7        69     175      163.94     11.06
    8        70     160      170.91    -10.91
    9        72     185      184.84      0.16
    10       75     210      205.75      4.25
    Mean     67     150      150.00      0.00
    SD       4.57   33.99    31.85      11.89
    Variance 20.89  1155.56  1014.37    141.32

Compare the numbers in the table for person 5 (height = 65, weight = 120) to the same person on the graph. The value of the regression line at X = 65 is 136.06. The difference between the mean of Y and 136.06 is the part of Y due to the linear function of X. The difference between the line and Y is -16.06. This is the error part of Y, the residual.
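The Y' and Residual columns in the Example 3 table are simply the fitted line applied to each person. Here is a short sketch of that arithmetic (again assuming NumPy; the array names ht and wt are ours, not the handout's):

    # Reproducing the Example 3 table from the raw data (illustrative sketch).
    import numpy as np

    ht = np.array([61, 62, 63, 65, 65, 68, 69, 70, 72, 75], dtype=float)
    wt = np.array([105, 120, 120, 160, 120, 145, 175, 160, 185, 210], dtype=float)

    # Unrounded least squares coefficients (about -316.86 and 6.97)
    b = np.sum((ht - ht.mean()) * (wt - wt.mean())) / np.sum((ht - ht.mean()) ** 2)
    a = wt.mean() - b * ht.mean()

    y_pred = a + b * ht        # Y', the linear part of each observed weight
    resid = wt - y_pred        # Y - Y', the error part (the residual)

    for i, (h, w, yp, e) in enumerate(zip(ht, wt, y_pred, resid), start=1):
        print(f"{i:2d}  Ht={h:.0f}  Wt={w:.0f}  Y'={yp:7.2f}  Residual={e:7.2f}")

    print(f"Mean residual: {resid.mean():.2f}")   # ~0, as in the table

For person 5 (Ht = 65) this prints Y' = 136.06 and Residual = -16.06, matching the table, and the residuals average to essentially zero.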