Unit 1 Review Items
Test #3 – Even More Review Name ______
Explanatory vs. Response Variables… How do you determine which variable is explanatory and which is response? Basically, it depends on which variable is being PREDICTED. The variable being predicted goes on the y-axis (it is the response), and the variable we base our predictions on is the x-variable (the explanatory variable). For example, if I want to predict your grade from the amount of homework you did, then grade is the y-variable because it’s the one being predicted. If you aren’t making a prediction, then it doesn’t matter which is x and which is y. For instance, if (for the fun of it) I decided to graph your height vs. your armspan just to see if a relationship exists, then either one could go on the x-axis.
Correlation Coefficient (r)… The correlation value (r) will either be given to you (in some form) or you will be given data and will use your calculator (LinReg) to find it. As for the interpretation, let’s say there is an r value of 0.75 between the grams of fat and number of calories in a fast food restaurant’s menu offerings. This means that there is a positive, moderately strong linear relationship between fat grams and calories. Correlation measures the strength and direction of a linear relationship. We know strength from the size of the number, ignoring the sign (close to 1 or -1 is strong, close to 0 is weak), and we know direction from the sign (positive or negative). NOTE: In order to interpret the correlation, we really need to know whether the data have a linear pattern. If the data are non-linear, then the interpretation may not make sense!
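If you want to see what the calculator is doing when it reports r, here is a minimal Python sketch. The menu numbers are made up for illustration (they are not from any real restaurant), and the helper name correlation is just a label chosen for this example.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r for paired data (same length lists)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical menu data, for illustration only
fat_grams = [9, 13, 21, 30, 31, 34]
calories = [260, 320, 420, 530, 560, 590]

r = correlation(fat_grams, calories)
direction = "positive" if r > 0 else "negative"
print(f"r = {r:.3f}, direction: {direction}")
```

The sign of r gives the direction, and how close it is to 1 or -1 gives the strength. Remember to check a scatterplot for a linear pattern before trusting the number.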
Formulas for slope and intercept…
These are on the formula sheet, in the first column, about halfway down. The y-intercept is the “b0” and the slope is the “b1”.
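As a sketch of how those two formulas get used, the Python below assumes your formula sheet lists the usual least-squares versions: b1 = (sum of (x - x-bar)(y - y-bar)) / (sum of (x - x-bar) squared) and b0 = y-bar - b1(x-bar). The data values and the function name lsrl are invented for this example.

```python
import statistics

def lsrl(xs, ys):
    """Return (b0, b1): the y-intercept and slope of the least-squares line."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    # Slope: sum of cross-deviations over sum of squared x-deviations
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    # Intercept: the line always passes through (x-bar, y-bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = lsrl(xs, ys)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

Your calculator’s LinReg should agree with this if you type in the same lists.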
LSRL… If you are told to find the “LSRL” (least-squares regression line), that just means write the equation of the regression line.
Non-resistance to outliers… Both the correlation and the regression line (LSRL) are easily affected by outliers. You can tell if a point is influential by removing the point from the dataset and seeing whether the correlation and the slope change noticeably.
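Here is a rough sketch of that “remove it and see” check in Python. The data are made up so that the last point is an obvious outlier, and slope_and_r is a helper name invented for this example.

```python
import math

def slope_and_r(xs, ys):
    """Return (slope, r) of the least-squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Hypothetical data: the first five points lie on a line,
# the last point (10, 2) sits far away from that pattern.
xs = [1, 2, 3, 4, 5, 10]
ys = [2, 4, 6, 8, 10, 2]

slope_all, r_all = slope_and_r(xs, ys)
slope_without, r_without = slope_and_r(xs[:-1], ys[:-1])

print(f"with the point:    slope = {slope_all:.3f}, r = {r_all:.3f}")
print(f"without the point: slope = {slope_without:.3f}, r = {r_without:.3f}")
```

If the two lines of output look very different, the point you removed is influential.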
r²… To find r², either use your calculator (LinReg) or, if you are given the r value, square it. You may also be given r² in computer output. The interpretation is where most students struggle. In a nutshell, r² tells us how much better we do using the least-squares line to predict values of y than we would if we simply guessed the average y-value (y-bar) every time: it is the fraction by which the line reduces the total squared prediction error. Some of you may just need to memorize “___% of the variability in the y-variable can be explained by the variability of the x-variable.” Here’s another example: Height explains weight. Not totally, but roughly. Suppose r² is 75% for a dataset of heights and weights. We know that other things also affect weight, including genetics, diet and exercise. So we say that 75% of the variability in a person’s weight can be explained by the variability in a person’s height, but that 25% of the variation in weight is due to other factors.
Computer Output Tables… The computer output table below shows the relationship between hourly wage of employees and their “quit rate” – that is, the number of employees per 100 that quit that job. The key to the computer output is to look at the COEF (coefficient) column, which tells you the slope and y-intercept. The slope is in the row labeled with the x-variable; the y-intercept is in the row that is NOT labeled with the x-variable (usually labeled “Constant” or “Intercept”).
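Going back to the r² interpretation, the “how much better than guessing y-bar” idea can be checked directly. The sketch below uses made-up data and compares the squared errors from the regression line with the squared errors from always guessing the mean; the variable names are chosen just for this example.

```python
import math

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

b1 = sxy / sxx            # slope of the least-squares line
b0 = y_bar - b1 * x_bar   # y-intercept
r = sxy / math.sqrt(sxx * syy)

# Squared error if we always guess y-bar
sst = syy
# Squared error if we use the regression line instead
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse / sst
print(f"r^2 from errors = {r_squared:.3f}, r squared directly = {r ** 2:.3f}")
```

Both numbers should match, which is why squaring r from LinReg gives you the same r² the computer output reports.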
I don’t get how to find and interpret a single residual. Let’s say that we decided to measure how thick the pages in our textbooks were. Since we can’t measure a single page, we measured stacks of multiple pages instead. Let’s say our regression line ended up being: Thickness = 0.068(Pages) + 0.2857, with thickness in mm. One of the data points was for 100 pages and was measured to be 7 mm. The 7 mm is the ACTUAL thickness that was measured. Our regression line allows us to make a PREDICTION for 100 pages by plugging 100 into the equation: 0.068(100) + 0.2857, which equals about 7.086 mm. The residual is the difference between these two numbers (Resid = Actual – Predicted). We simply take 7 (the Actual thickness) and subtract 7.086 (the Predicted thickness) to get a residual of -0.086 mm. Since this residual is negative, we know this actual point on the graph will fall below the regression line; in other words, the actual value is below the predicted value.
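The same residual calculation, written out as a short Python sketch. The equation and the 100-page, 7 mm data point come from the example above; the function name predict_thickness is just a label for this illustration.

```python
def predict_thickness(pages):
    """Predicted thickness (mm) from the regression line in the example."""
    return 0.068 * pages + 0.2857

actual = 7.0                        # measured thickness for 100 pages (mm)
predicted = predict_thickness(100)  # what the line predicts for 100 pages
residual = actual - predicted       # Resid = Actual - Predicted

position = "below" if residual < 0 else "above"
print(f"predicted = {predicted:.3f} mm, residual = {residual:.3f} mm")
print(f"The actual point falls {position} the regression line.")
```

A negative residual means the line over-predicted; a positive one means it under-predicted.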