
<p>Unit 8, Chapter 9: Correlation and Regression</p><p>Scatter Diagram and Linear Correlation</p><p>A scatter diagram is a graph in which data pairs (x, y) are plotted as individual points on a grid with horizontal axis x and vertical axis y. The x variable is called the explanatory variable, and the y variable is called the response variable.</p><p>A scatter diagram lets us see whether there may be a linear relationship between the x and y values. Correlation gives us tools to determine whether such a linear relationship exists and, if it does, how strong it is.</p><p>A veterinary science study was conducted on the weight of Shetland ponies. The question posed was, "How much should a healthy Shetland pony weigh?" The following data were observed, and columns for x², y², and xy were added to compute the correlation. A line of best fit was then constructed for the data.</p><p>Weight of Shetland Ponies: x = age of the pony (in months), y = average weight of the pony (in kilograms), n = 5.</p><table><tr><th>x</th><th>y</th><th>x²</th><th>y²</th><th>xy</th></tr><tr><td>3</td><td>60</td><td>9</td><td>3600</td><td>180</td></tr><tr><td>6</td><td>95</td><td>36</td><td>9025</td><td>570</td></tr><tr><td>12</td><td>140</td><td>144</td><td>19600</td><td>1680</td></tr><tr><td>18</td><td>170</td><td>324</td><td>28900</td><td>3060</td></tr><tr><td>24</td><td>185</td><td>576</td><td>34225</td><td>4440</td></tr><tr><td>Σx = 63</td><td>Σy = 650</td><td>Σx² = 1089</td><td>Σy² = 95350</td><td>Σxy = 9930</td></tr></table><p>The scatter diagram below shows the observed points. They follow a nearly linear pattern, with y increasing as x increases.</p><p>[Figure: Ages &amp; Average Weights of Shetland Ponies. Scatter diagram of average weight (kilograms) against age (months), showing the points (3, 60), (6, 95), (12, 140), (18, 170), and (24, 185).]</p><p>The sample correlation coefficient r measures the strength of the linear association between the two variables. 1) The value of r always lies between -1 and 1. 2) If r = -1, there is a perfect negative linear correlation: the points lie exactly on a line, and as x increases, y decreases. 
3) If r = 1, there is a perfect positive linear correlation: the points lie exactly on a line, and as x increases, y increases. 4) If r = 0, there is no linear correlation. 5) The closer r is to -1 or 1, the stronger the linear relationship.</p><p>Correlation Coefficient</p><p>\[ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} \]</p><p>Use Excel to construct a table like the one above to calculate these totals. For the pony data (x = age in months, y = average weight in kilograms):</p><p>n = 5, Σx = 63, Σy = 650, Σx² = 1089, Σy² = 95350, Σxy = 9930</p><p>\[ r = \frac{5(9930) - (63)(650)}{\sqrt{5(1089) - 63^2}\,\sqrt{5(95350) - 650^2}} = \frac{8700}{\sqrt{1476}\,\sqrt{54250}} \approx \frac{8700}{8948.35} \approx 0.972 \]</p><p>Since r = 0.972 is close to 1, there is a very high positive linear correlation between a pony's age and its average weight.</p><p>Strength of Correlation</p><p>Note: r can be positive or negative; the table lists only the size (absolute value) of r.</p><table><tr><th>Size of |r|</th><th>Interpretation</th></tr><tr><td>0.90 to 1.00</td><td>very high</td></tr><tr><td>0.70 to 0.89</td><td>high</td></tr><tr><td>0.50 to 0.69</td><td>moderate</td></tr><tr><td>0.30 to 0.49</td><td>low</td></tr><tr><td>0.00 to 0.29</td><td>little, if any</td></tr></table><p>Linear Regression and the Coefficient of Determination</p><p>The scatter diagram below has the least-squares line overlaid on the points. Excel's Trendline option can draw this line, but you should calculate its equation using the formulas given here.</p><p>[Figure: Ages &amp; Average Weights of Shetland Ponies, with Excel trendline y = 5.89x + 55.73. Scatter diagram of average weight (kilograms) against age (months) for the same five points, with the least-squares line drawn through them.]</p><p>Least-squares line: \( \hat{y} = a + bx \), where a is the intercept and b is the slope. The symbol \( \hat{y} \) is pronounced "y-hat".</p><p>Using the totals from the Excel sheet, first find the sample means:</p><p>\[ \bar{x} = \frac{\sum x}{n} = \frac{63}{5} = 12.6, \qquad \bar{y} = \frac{\sum y}{n} = \frac{650}{5} = 130 \]</p><p>Slope:</p><p>\[ b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} = \frac{5(9930) - (63)(650)}{5(1089) - 63^2} = \frac{8700}{1476} \approx 5.89 \]</p><p>Intercept: \( a = \bar{y} - b\bar{x} = 130 - 5.89(12.6) \approx 55.79 \). Therefore the regression line is \( \hat{y} = 55.79 + 5.89x \). 
(Note that the intercept in Excel's trendline equation, 55.73, differs slightly because Excel does not round the slope before computing the intercept.)</p><p>Using the least-squares line for prediction: Making predictions is the main application of linear regression. The least-squares line can be used to predict \( \hat{y} \) values for given x values. There are two types of prediction. 1) Interpolation: predicting \( \hat{y} \) for x values that lie within the range of the observed x values. For example, for a 10-month-old pony, \( \hat{y} = 55.79 + 5.89(10) = 114.69 \) kg. 2) Extrapolation: predicting \( \hat{y} \) for x values beyond the range of the observed x values. Extrapolating far beyond the observed data may give unreasonable results, because the linear pattern may not continue. For example, for a 30-month-old pony, \( \hat{y} = 55.79 + 5.89(30) = 232.49 \) kg.</p><p>Coefficient of Determination</p><p>The coefficient of determination, r², is found by squaring the correlation coefficient r. Here r ≈ 0.972, so r² ≈ 0.945.</p><p>The coefficient of determination measures the proportion of the variation in y that is explained by the regression line, using x as the explanatory variable.</p><p>Since r² ≈ 0.945, about 94.5% of the variation in y can be explained by x using the regression line. The remaining 5.5% is not explained by the line and may be due to random variation or possibly a lurking variable.</p>
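<p>The correlation calculation in this section can be checked with a short script. The sketch below is written in Python (our choice; the notes themselves use Excel) and uses only the standard library. It rebuilds the Totals row of the pony table and applies the computational formula for r. The helper names <code>column_totals</code>, <code>correlation</code>, and <code>strength</code> are hypothetical. Because the strength-of-correlation table leaves small gaps (for example, between 0.89 and 0.90), <code>strength</code> assumes each lower bound acts as a cutoff.</p>

```python
import math

# Shetland pony data from the notes:
# x = age (months), y = average weight (kilograms)
ages = [3, 6, 12, 18, 24]
weights = [60, 95, 140, 170, 185]


def column_totals(xs, ys):
    """Return n and the sums of x, y, x^2, y^2 and xy (the Totals row)."""
    return {
        "n": len(xs),
        "sum_x": sum(xs),
        "sum_y": sum(ys),
        "sum_x2": sum(x * x for x in xs),
        "sum_y2": sum(y * y for y in ys),
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),
    }


def correlation(xs, ys):
    """Sample correlation coefficient r via the computational (sums) formula."""
    t = column_totals(xs, ys)
    n = t["n"]
    numerator = n * t["sum_xy"] - t["sum_x"] * t["sum_y"]
    denominator = math.sqrt(n * t["sum_x2"] - t["sum_x"] ** 2) * math.sqrt(
        n * t["sum_y2"] - t["sum_y"] ** 2
    )
    return numerator / denominator


def strength(r):
    """Label |r| using the strength-of-correlation table (lower bounds as cutoffs)."""
    size = abs(r)
    if size >= 0.90:
        return "very high"
    if size >= 0.70:
        return "high"
    if size >= 0.50:
        return "moderate"
    if size >= 0.30:
        return "low"
    return "little, if any"


totals = column_totals(ages, weights)
r = correlation(ages, weights)
print(totals)
print(f"r = {r:.3f} ({strength(r)})")
```

<p>Running it should print the same totals as the table (n = 5, Σx = 63, Σy = 650, Σx² = 1089, Σy² = 95350, Σxy = 9930) and r = 0.972, labeled "very high".</p>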
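<p>The slope, intercept, and predictions worked by hand in this section can be reproduced with the following minimal Python sketch (standard library only). The functions <code>least_squares_line</code> and <code>predict</code> are hypothetical helpers. <code>predict</code> labels a prediction as interpolation only when x lies within the observed range of ages, which is our reading of the definitions in the notes.</p>

```python
# Shetland pony data: x = age (months), y = average weight (kilograms)
ages = [3, 6, 12, 18, 24]
weights = [60, 95, 140, 170, 185]


def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x using the sums formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)  # a = y-bar - b * x-bar
    return a, b


def predict(a, b, x, xs):
    """Return y-hat at x and whether x is inside the observed range (interpolation)."""
    y_hat = a + b * x
    inside = min(xs) <= x <= max(xs)
    return y_hat, inside


a, b = least_squares_line(ages, weights)
print(f"y-hat = {a:.2f} + {b:.2f}x")  # unrounded slope gives a of about 55.73, as in Excel
for age in (10, 30):
    y_hat, inside = predict(a, b, age, ages)
    kind = "interpolation" if inside else "extrapolation"
    print(f"age {age} months: {y_hat:.2f} kg ({kind})")
```

<p>Because the script does not round b before computing a, it reports an intercept of about 55.73 (matching Excel's trendline) and predictions of about 114.67 kg for a 10-month-old pony and 232.56 kg for a 30-month-old pony. Small differences from hand calculations that use the rounded values a = 55.79 and b = 5.89 come from rounding.</p>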
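<p>The interpretation of r² as the proportion of variation in y explained by the regression line can be checked numerically. This Python sketch (standard library only) writes the sums in deviation form, which is algebraically equivalent to the formulas in the notes. It then computes the total variation SST = Σ(y − ȳ)² and the unexplained variation SSE = Σ(y − ŷ)², and compares 1 − SSE/SST with r². The names <code>sst</code>, <code>sse</code>, and <code>explained</code> are our own.</p>

```python
# Shetland pony data: x = age (months), y = average weight (kilograms)
ages = [3, 6, 12, 18, 24]
weights = [60, 95, 140, 170, 185]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(weights) / n

# Sums of squares and cross-products in deviation form
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, weights))
sxx = sum((x - x_bar) ** 2 for x in ages)
syy = sum((y - y_bar) ** 2 for y in weights)

# Least-squares slope and intercept (same line as the sums formulas)
b = sxy / sxx
a = y_bar - b * x_bar

# Total variation in y, and the variation left over after fitting the line
sst = syy
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, weights))

r = sxy / (sxx * syy) ** 0.5
explained = 1 - sse / sst  # proportion of variation in y explained by the line

print(f"r^2 = {r ** 2:.3f}, explained = {explained:.3f}")
```

<p>Both printed values come out to about 0.945, so the line explains roughly 94.5% of the variation in average weight, and SSE/SST (about 0.055) is the unexplained share.</p>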