
Unit 8 Chapter 9 Correlation and Regression
Scatter Diagram and Linear Correlation
A scatter diagram is a graph in which data points (x, y) are plotted as individual points on a grid with horizontal axis x and vertical axis y. The x variable is called the explanatory variable, and the y variable is called the response variable.
By examining the scatter diagram, we can see whether there may be a linear relationship between the x and y values. Correlation gives us tools to determine whether such a relationship exists and, if it does, how strong it is. The relationship we are looking for is a linear one.
A veterinary science study was conducted on the weight of Shetland ponies. The question posed was “How much should a healthy Shetland pony weigh?” The following data were observed and extended with computed columns to calculate the correlation for the situation. A line of best fit was then to be constructed for the data.
Weight of Shetland Ponies
x = age of the pony (in months), y = average weight of the pony (in kilograms), n = 5

        x     y     x^2    y^2     xy
        3     60    9      3600    180
        6     95    36     9025    570
        12    140   144    19600   1680
        18    170   324    28900   3060
        24    185   576    34225   4440
Totals  63    650   1089   95350   9930

A scatter diagram shows the points observed in the application. The points follow a nearly linear pattern, with y increasing as x increases.
[Figure: Ages & Average Weights of Shetland Ponies. Scatter diagram of the points (3, 60), (6, 95), (12, 140), (18, 170), (24, 185); horizontal axis: Age (months); vertical axis: Average Weight (kilograms).]
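If Excel is not at hand, the same working table can be built in a few lines of code. The sketch below is one possible approach in Python, not part of the original method; the names `ages`, `weights` and `build_table` are hypothetical. It computes the x^2, y^2 and xy columns for each pony and sums every column to produce the Totals row.

```python
# Data from the Shetland pony study: x = age (months), y = average weight (kg)
ages = [3, 6, 12, 18, 24]
weights = [60, 95, 140, 170, 185]


def build_table(xs, ys):
    """Return one row (x, y, x^2, y^2, xy) per data point, plus the column totals."""
    rows = [(x, y, x * x, y * y, x * y) for x, y in zip(xs, ys)]
    # Sum each of the five columns to form the "Totals" row
    totals = tuple(sum(column) for column in zip(*rows))
    return rows, totals


rows, totals = build_table(ages, weights)
for row in rows:
    print(row)
print("Totals:", totals)  # Totals: (63, 650, 1089, 95350, 9930)
```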
The sample correlation coefficient r can be calculated to measure the strength of the linear association between the two variables.
1) The value of r is always between -1 and 1.
2) If r = -1, there is a perfect negative correlation: the points lie exactly on a line, and as the x variable increases, the y variable decreases.
3) If r = 1, there is a perfect positive correlation: the points lie exactly on a line, and as the x variable increases, the y variable increases.
4) If r = 0, there is no linear correlation.
5) The closer r is to -1 or 1, the stronger the linear relationship.
Correlation Coefficient
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$
Use Excel to construct a table that calculates these totals. For the pony data (x = age of the pony in months, y = average weight of the pony in kilograms), the totals are those in the table above:
$n = 5$, $\sum xy = 9930$, $\sum x = 63$, $\sum y = 650$, $\sum x^2 = 1089$, $\sum y^2 = 95350$
$$r = \frac{5(9930) - (63)(650)}{\sqrt{5(1089) - 63^2}\,\sqrt{5(95350) - 650^2}} = \frac{8700}{\sqrt{1476}\,\sqrt{54250}} = \frac{8700}{8948.34} \approx 0.972$$
Since r = 0.972 is close to 1, there is a very high positive linear correlation.
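The formula for r uses only the six totals, so the arithmetic is easy to check in code. The following is a minimal Python sketch that assumes the totals have already been computed; the function name `correlation_from_sums` and its parameter names are illustrative, not from the source.

```python
import math


def correlation_from_sums(n, sx, sy, sxx, syy, sxy):
    """Sample correlation coefficient r computed from the column totals."""
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator


# Totals from the Shetland pony table
r = correlation_from_sums(n=5, sx=63, sy=650, sxx=1089, syy=95350, sxy=9930)
print(round(r, 3))  # 0.972
```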
Strength of Correlation
Size of r       Interpretation
0.90 to 1.00    very high
0.70 to 0.89    high
0.50 to 0.69    moderate
0.30 to 0.49    low
0.00 to 0.29    little, if any
Note: r can be positive or negative. The table lists only the size of r (its absolute value), so only positive numbers are shown.
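The strength table can be expressed as a small lookup on the size of r. This Python sketch is an illustration only; the function name `correlation_strength` is hypothetical, and it assumes that values falling in the gaps between table rows (for example |r| = 0.895) belong to the lower row.

```python
def correlation_strength(r):
    """Interpret r using the strength table; only the size |r| matters, not the sign."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    # Assumption: values between table rows (e.g. 0.895) fall into the lower category
    if size >= 0.90:
        return "very high"
    if size >= 0.70:
        return "high"
    if size >= 0.50:
        return "moderate"
    if size >= 0.30:
        return "low"
    return "little, if any"


print(correlation_strength(0.972))  # very high
```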
Linear Regression and the Coefficient of Determination
The scatter diagram below has a least-squares line overlaid on the grid. Excel uses the Trendline option to produce the line, but you should calculate the equation of the line with the formulas given.
[Figure: Ages & Average Weights of Shetland Ponies, with the Excel least-squares trendline y = 5.89x + 55.73. Horizontal axis: Age (months); vertical axis: Average Weight (kilograms).]
Least-squares line: $\hat{y} = a + bx$, where a is the intercept and b is the slope. The symbol $\hat{y}$ is pronounced "y-hat".
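Using the totals from the Excel sheet:
First find the sample mean for x: $\bar{x} = \frac{\sum x}{n} = \frac{63}{5} = 12.6$, and the sample mean for y: $\bar{y} = \frac{\sum y}{n} = \frac{650}{5} = 130$.
Slope: $b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{5(9930) - (63)(650)}{5(1089) - 63^2} = \frac{8700}{1476} \approx 5.89$
Intercept: $a = \bar{y} - b\bar{x} = 130 - 5.89(12.6) \approx 55.79$
Therefore the regression line is $\hat{y} = 55.79 + 5.89x$. (Note that the intercept in the Excel trendline, 55.73, differs slightly because the slope was rounded to 5.89 before computing a.)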
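The slope and intercept calculation, including the rounding effect mentioned in the note, can be reproduced as follows. This is a sketch in Python under the assumption that the raw data are available; `least_squares_line` is a hypothetical helper name. It uses the same computational formula for b and a = ȳ - b·x̄.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
    a = sy / n - b * (sx / n)                      # intercept: y-bar minus b times x-bar
    return a, b


ages = [3, 6, 12, 18, 24]
weights = [60, 95, 140, 170, 185]
a, b = least_squares_line(ages, weights)
print(round(a, 2), round(b, 2))  # 55.73 5.89 (unrounded slope, matching Excel)

# Rounding the slope to 5.89 first reproduces the hand calculation
x_bar = sum(ages) / len(ages)
y_bar = sum(weights) / len(weights)
a_hand = y_bar - round(b, 2) * x_bar
print(round(a_hand, 2))  # 55.79
```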
Using the least-squares line for prediction: Making predictions is the main application of linear regression. The least-squares line can be used to predict $\hat{y}$ values for given x values. There are two types of predictions.
1) Interpolation: predicting $\hat{y}$ for x values that lie between the observed x values in the data set (here, 3 to 24 months). For example, find $\hat{y}$ for a 10-month-old pony:
$\hat{y} = 55.79 + 5.89(10) = 114.69$ kg
2) Extrapolation: predicting $\hat{y}$ for x values beyond the observed x values in the data set. Extrapolating far beyond the observed x values may give unreasonable predictions. For example, find $\hat{y}$ for a 30-month-old pony:
$\hat{y} = 55.79 + 5.89(30) = 232.49$ kg
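A prediction helper can also report whether a given x is an interpolation or an extrapolation. The Python sketch below is illustrative; the function `predict` is hypothetical, and its defaults assume the rounded line $\hat{y} = 55.79 + 5.89x$ and the observed age range of 3 to 24 months from the pony data.

```python
def predict(x, a=55.79, b=5.89, x_min=3, x_max=24):
    """Return (y_hat, kind) where kind is 'interpolation' or 'extrapolation'.

    Defaults assume the pony regression line and the observed age range (months).
    """
    y_hat = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y_hat, kind


for age in (10, 30):
    y_hat, kind = predict(age)
    print(f"{age} months: {y_hat:.2f} kg ({kind})")
# 10 months: 114.69 kg (interpolation)
# 30 months: 232.49 kg (extrapolation)
```

Flagging extrapolation does not make the prediction wrong by itself; it only marks that the line is being used outside the range where it was fitted.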
Coefficient of Determination
The coefficient of determination $r^2$ is formed by squaring the correlation coefficient r: with $r = 0.972$, $r^2 \approx 0.945$.
The coefficient of determination measures the proportion of the variation in y that is explained by the regression line, using x as the explanatory variable.
For $r^2 = 0.945$, about 94.5% of the variation in y can be explained by x using the regression line. The remaining 5.5% of the variation is unexplained and is due to random chance or possibly a lurking variable.
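The "proportion of variation explained" interpretation can be checked numerically. This Python sketch takes a different route from squaring r: it computes the total variation of y about its mean and the variation left unexplained by the least-squares line, then forms 1 - unexplained/total. For a least-squares line with an intercept this equals $r^2$. The variable names are illustrative, and the slope uses the deviation form, which is algebraically equivalent to the computational formula above.

```python
ages = [3, 6, 12, 18, 24]
weights = [60, 95, 140, 170, 185]
n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(weights) / n

# Least-squares slope and intercept (unrounded), deviation form
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, weights)) / sum(
    (x - x_bar) ** 2 for x in ages
)
a = y_bar - b * x_bar

# Total variation of y about its mean, and the part the line leaves unexplained
ss_total = sum((y - y_bar) ** 2 for y in weights)
ss_unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, weights))

r_squared = 1 - ss_unexplained / ss_total  # proportion of variation explained
print(round(r_squared, 3))  # 0.945
```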