Correlation and Regression s1

<p>Unit 8 Chapter 9 Correlation and Regression</p>
<p>Scatter Diagram and Linear Correlation</p>
<p>A scatter diagram is a graph in which data pairs (x, y) are plotted as individual points on a grid with horizontal axis x and vertical axis y. The x variable is called the explanatory variable, and the y variable is called the response variable.</p>
<p>A scatter diagram shows at a glance whether there may be a linear relationship between the x and y values. Correlation gives us tools to determine whether a linear relationship exists and, if it does, how strong it is.</p>
<p>A veterinary science study was conducted on the weight of Shetland ponies. The question posed was “How much should a healthy Shetland pony weigh?” The following data were observed and then expanded with the extra columns needed to compute the correlation. A line of best fit was then constructed for the data.</p>
<p>Weight of Shetland Ponies: x = age of the pony (in months), y = average weight of the pony (in kilograms), n = 5</p>
<table>
<tr><th></th><th>x</th><th>y</th><th>x^2</th><th>y^2</th><th>xy</th></tr>
<tr><td></td><td>3</td><td>60</td><td>9</td><td>3600</td><td>180</td></tr>
<tr><td></td><td>6</td><td>95</td><td>36</td><td>9025</td><td>570</td></tr>
<tr><td></td><td>12</td><td>140</td><td>144</td><td>19600</td><td>1680</td></tr>
<tr><td></td><td>18</td><td>170</td><td>324</td><td>28900</td><td>3060</td></tr>
<tr><td></td><td>24</td><td>185</td><td>576</td><td>34225</td><td>4440</td></tr>
<tr><th>Totals</th><td>63</td><td>650</td><td>1089</td><td>95350</td><td>9930</td></tr>
</table>
<p>A scatter diagram shows the observed points. The points follow a nearly linear pattern, with y increasing as x increases.</p>
<p>[Figure: scatter diagram, Ages & Average Weights of Shetland Ponies. Horizontal axis: Age (months). Vertical axis: Average Weight (kilograms). Plotted points: (3, 60), (6, 95), (12, 140), (18, 170), (24, 185).]</p>
<p>The sample correlation coefficient r measures the strength of the linear association between the two variables. 1) The calculated r is always between -1 and 1. 2) If r = -1, there is a perfect negative correlation, which means that as the x variable increases, the y variable decreases. 
3) If r = 1, there is a perfect positive correlation, which means that as the x variable increases, the y variable increases. 4) If r = 0, there is no linear correlation. 5) The closer r is to -1 or 1, the stronger the linear relationship.</p>
<p>Correlation Coefficient</p>
<p>\[ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} \]</p>
<p>Use Excel to construct a table with columns x, y, x^2, y^2, and xy to calculate these totals. For the pony data, with x = age of the pony (in months) and y = average weight of the pony (in kilograms):</p>
<p>n = 5, \(\sum xy = 9930\), \(\sum x = 63\), \(\sum y = 650\), \(\sum x^2 = 1089\), \(\sum y^2 = 95350\)</p>
<p>\[ r = \frac{5(9930) - (63)(650)}{\sqrt{5(1089) - 63^2}\,\sqrt{5(95350) - 650^2}} \approx \frac{8700}{8948.35} \approx 0.972 \]</p>
<p>Since r = 0.972 is close to 1, there is a very high positive linear correlation.</p>
<p>Strength of Correlation</p>
<table>
<tr><th>Size of r</th><th>Interpretation</th></tr>
<tr><td>0.90 to 1.00</td><td>very high</td></tr>
<tr><td>0.70 to 0.89</td><td>high</td></tr>
<tr><td>0.50 to 0.69</td><td>moderate</td></tr>
<tr><td>0.30 to 0.49</td><td>low</td></tr>
<tr><td>0.00 to 0.29</td><td>little, if any</td></tr>
</table>
<p>Note: These values could be positive or negative. Only positive values are shown; the same interpretation applies to negative r of the same size.</p>
<p>Linear Regression and the Coefficient of Determination</p>
<p>The scatter diagram below has a least-squares line overlaid on the grid. Excel's Trendline option produces this line automatically, but you should use the formulas given here to calculate the equation of the line.</p>
<p>[Figure: scatter diagram with least-squares line, Ages & Average Weights of Shetland Ponies. Horizontal axis: Age (months). Vertical axis: Average Weight (kilograms). Plotted points: (3, 60), (6, 95), (12, 140), (18, 170), (24, 185). Trendline equation shown on the chart: y = 5.89x + 55.73.]</p>
<p>Least-squares line: \(\hat{y} = a + bx\), where a is the intercept and b is the slope. 
\(\hat{y}\) is pronounced "y-hat."</p>
<p>Using the totals from the Excel sheet, first find the sample means for x and y:</p>
<p>\[ \bar{x} = \frac{\sum x}{n} = \frac{63}{5} = 12.6 \qquad \bar{y} = \frac{\sum y}{n} = \frac{650}{5} = 130 \]</p>
<p>Slope:</p>
<p>\[ b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} = \frac{5(9930) - (63)(650)}{5(1089) - 63^2} = \frac{8700}{1476} \approx 5.89 \]</p>
<p>Intercept:</p>
<p>\[ a = \bar{y} - b\,\bar{x} = 130 - 5.89(12.6) \approx 55.79 \]</p>
<p>Therefore the regression line is \(\hat{y} = 55.79 + 5.89x\). (Note that the intercept in the Excel trendline, 55.73, differs slightly because Excel uses the unrounded slope when computing the intercept.)</p>
<p>Using the least-squares line for prediction: Making predictions is the main application of linear regression. The least-squares line can be used to predict \(\hat{y}\) values for corresponding x values. There are two types of predictions.</p>
<p>1) Interpolation: predicting \(\hat{y}\) for x values that lie between observed x values in the data set. For example, find \(\hat{y}\) for a 10-month-old pony: \(\hat{y} = 55.79 + 5.89(10) = 114.69\) kg.</p>
<p>2) Extrapolation: predicting \(\hat{y}\) for x values that lie beyond the observed x values in the data set. Extrapolating too far beyond the observed x values may give unreasonable results. For example, find \(\hat{y}\) for a 30-month-old pony: \(\hat{y} = 55.79 + 5.89(30) = 232.49\) kg.</p>
<p>Coefficient of Determination</p>
<p>The coefficient of determination \(r^2\) is formed by squaring the correlation coefficient r. Here \(r \approx 0.972\), so \(r^2 \approx 0.945\).</p>
<p>The coefficient of determination measures the proportion of the variation in y that is explained by the regression line, using x as the explanatory variable.</p>
<p>For \(r^2 \approx 0.945\), about 94.5% of the variation in y can be explained by x if we use the regression line. The remaining 5.5% of the variation is due to random chance or possibly a lurking variable.</p>
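<p>As a cross-check on the hand calculations, the same formulas can be sketched in a few lines of Python instead of Excel. This is only an illustrative sketch: the function names (column_totals, correlation, least_squares, predict) are hypothetical helpers introduced here, not part of the original worksheet, and the code keeps the slope unrounded, so its intercept and predictions differ slightly from the hand-rounded values.</p>

```python
import math

# Data from the pony table: x = age in months, y = average weight in kilograms.
ages = [3, 6, 12, 18, 24]
weights = [60, 95, 140, 170, 185]


def column_totals(x, y):
    """Return n and the totals of x, y, x^2, y^2, and xy (hypothetical helper)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return n, sum_x, sum_y, sum_x2, sum_y2, sum_xy


def correlation(x, y):
    """Sample correlation coefficient r from the computational formula."""
    n, sx, sy, sx2, sy2, sxy = column_totals(x, y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sx2 - sx ** 2) * math.sqrt(n * sy2 - sy ** 2)
    return numerator / denominator


def least_squares(x, y):
    """Return (intercept a, slope b) of the least-squares line y-hat = a + b*x."""
    n, sx, sy, sx2, _, sxy = column_totals(x, y)
    slope = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    intercept = sy / n - slope * (sx / n)  # a = y-bar - b * x-bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted y-hat for a given x value."""
    return intercept + slope * x_new


r = correlation(ages, weights)
a, b = least_squares(ages, weights)

print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"Interpolation at x = 10 months: {predict(a, b, 10):.2f} kg")
print(f"Extrapolation at x = 30 months: {predict(a, b, 30):.2f} kg")
```

<p>Because the slope is not rounded before the intercept is computed, this sketch gives an intercept of about 55.73, which agrees with the Excel trendline rather than with the hand-rounded 55.79.</p>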
