AP Statistics Unit 3: Examining Relationships 3.2B Residual Plots Name______Date______Hour______

1. In what way is a regression line a mathematical model?

2. What is extrapolation and why is it dangerous?

3. What is a least-squares regression line?

4. What is the formula for the equation of the least-squares regression line?

5. The least-squares regression line always passes through the point …?

6. What is a residual?

7. How can you calculate residuals on your calculator and use this to produce a residual plot?

8. If a least-squares regression line fits the data well, what characteristics should the residual plot exhibit?

9. The table below gives data on the number of powerboats registered in Florida and the number of manatees killed by boats in the years from 1977 to 1990.

Year    Powerboat registrations (in thousands)    Manatees killed
1977    447                                       13
1978    460                                       21
1979    481                                       24
1980    498                                       16
1981    513                                       24
1982    512                                       20
1983    526                                       15
1984    559                                       34
1985    585                                       33
1986    614                                       33
1987    645                                       39
1988    675                                       43
1989    711                                       50
1990    719                                       47

a) Use your calculator to make a scatterplot of these data. Sketch the graph below:

b) Find the equation of the least-squares regression line and overlay that line on your scatterplot.

c) Predict the number of manatees that will be killed in a year when 716,000 powerboats are registered.

d) Here are four more years of manatee data.

1991    716    53
1992    716    38
1993    716    35
1994    535    49

Add these points to your scatterplot. Florida took stronger measures to protect manatees during these years. Do you see evidence that these measures succeeded?

e) In part (c) you predicted the number of manatee deaths in a year with 716,000 powerboat registrations. In fact, powerboat registrations were 716,000 for three years. Compare the mean manatee deaths in these three years with your prediction from part (c). How accurate was your prediction?
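The calculator work in parts (b), (c), and (e) can be cross-checked with a short script. What follows is a minimal Python sketch, assuming the 1977 to 1990 table above is typed in exactly as printed; the helper name `least_squares` and the variable names are illustrative choices, not part of the worksheet.

```python
# Manatee data from the table in question 9 (1977-1990).
boats = [447, 460, 481, 498, 513, 512, 526, 559, 585, 614, 645, 675, 711, 719]  # thousands
killed = [13, 21, 24, 16, 24, 20, 15, 34, 33, 33, 39, 43, 50, 47]


def least_squares(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    # The line passes through (x_bar, y_bar), which fixes the intercept.
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares(boats, killed)

# Part (c): prediction for 716 thousand registrations.
predicted = a + b * 716

# Part (e): mean of the three observed counts for 1991-1993.
observed_mean = (53 + 38 + 35) / 3

print(f"y-hat = {a:.2f} + {b:.4f} x")
print(f"predicted at x = 716: {predicted:.1f}")
print(f"observed mean, 1991-1993: {observed_mean:.1f}")
```

Comparing `predicted` with `observed_mean` gives the size of the prediction error asked about in part (e).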

10. The table below gives data on the number of new birds y and percent of returning birds x for 13 sparrowhawk colonies. Enter the data into your calculator.

Percent returning (x)    New adults (y)
74                       5
66                       6
81                       8
52                       11
73                       12
62                       15
52                       16
45                       17
62                       18
46                       18
60                       19
46                       20
38                       20

a) Use your calculator’s regression function to find the equation of the least-squares regression line.

b) Use your calculator to find the mean and standard deviation of both x and y and their correlation r.

c) Calculate the slope b and y-intercept a of the regression line following the method of Example 3.13 on page 208 in your book. Verify that your equation is the same as the one you obtained in part (a), except for slight rounding errors.

d) Explain in words what the slope and y-intercept of the regression line tell us.

e) An ecologist uses the line to predict how many birds will join another colony of sparrowhawks, to which 60% of the adults return from the previous year. What is the prediction?
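Parts (b), (c), and (e) can be checked away from the calculator. The sketch below assumes that the method of Example 3.13 is the standard summary-statistics route, slope b = r·s_y/s_x and intercept a = ȳ − b·x̄; that is an assumption, since the example itself is not reproduced here. All variable names are illustrative.

```python
from statistics import mean, stdev

# Sparrowhawk data from the table in question 10.
pct_return = [74, 66, 81, 52, 73, 62, 52, 45, 62, 46, 60, 46, 38]  # x
new_adults = [5, 6, 8, 11, 12, 15, 16, 17, 18, 18, 19, 20, 20]      # y

n = len(pct_return)
x_bar, y_bar = mean(pct_return), mean(new_adults)
s_x, s_y = stdev(pct_return), stdev(new_adults)  # sample standard deviations

# Correlation r: average product of standardized values (divisor n - 1).
r = sum(
    ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
    for x, y in zip(pct_return, new_adults)
) / (n - 1)

# Assumed Example 3.13 method: slope and intercept from summary statistics.
b = r * s_y / s_x
a = y_bar - b * x_bar

# Cross-check: slope computed directly from the deviations.
b_direct = sum(
    (x - x_bar) * (y - y_bar) for x, y in zip(pct_return, new_adults)
) / sum((x - x_bar) ** 2 for x in pct_return)

# Part (e): predicted number of new adults when 60% of adults return.
prediction = a + b * 60

print(f"r = {r:.3f}")
print(f"y-hat = {a:.2f} + ({b:.4f}) x")
print(f"prediction at x = 60: {prediction:.2f}")
```

The two slope calculations should agree up to floating-point error, which mirrors the "except for slight rounding errors" check in part (c).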

11. A study recorded data on number of beers consumed and blood alcohol content (BAC) for 16 students. Page 212 in your book, problem 3.35, gives partial computer output from Minitab relating to these data.

a) Use the computer output to write the equation of the least-squares line.

b) Interpret the slope and y-intercept of the equation in this setting.

c) What blood alcohol level would your equation predict for a student who consumed 6 beers?

d) The one student in the study who consumed 6 beers had a BAC of 0.10. What is your prediction error in part (c)?

e) Problem 3.37 in your book on page 213 gives a screen shot from a TI-84 produced using the data. Use the method described in Example 3.13 on page 208 in your book to calculate the slope and y-intercept of the least-squares regression line. Compare with your results from part (a).

12. The Trans-Alaska Oil Pipeline is a tube formed from ½-inch-thick steel that carries oil across 800 miles of sensitive arctic and sub-arctic terrain. The pipe and the welds that join the segments were carefully examined before installation. How accurate are field measurements of the depth of small defects? The scatterplot below compares the results of measurements on 100 defects made in the field with measurements of the same defects made in the laboratory. The line y = x is drawn on the scatterplot. The second plot is a residual plot for these data.

a) Describe the overall pattern you see in the scatterplot, as well as any deviations from that pattern.

b) If field and laboratory measurements all agree, then the points should fall on the y=x line drawn on the scatterplot, except for small variations in the measurements. But this is not the case. Explain.

c) The line drawn on the scatterplot (y=x) is not the least-squares regression line. How would the slope and y-intercept of the least-squares line compare? Justify your answer.

d) Discuss what the residual plot tells you about how well the least-squares regression line fits the data.

13. Data on the fuel consumption y of a car at various speeds x are given on page 182 in your book. Fuel consumption is measured in liters of gasoline per 100 kilometers driven, and speed is measured in kilometers per hour. A statistical software package gives the least-squares regression line and also the residuals. The regression line is:

The residuals in the same order as the observations are:

10.09 2.24 -0.62 -2.47 -3.33 -4.28 -3.73 -2.94 -2.17 -1.32 -0.42 0.57 1.64 2.76 3.97

a) Make a scatterplot of the observations and draw the least-squares regression line on your plot.

b) Would you use the regression line to predict y from x? Justify your answer.

c) Check that the residuals have sum zero (up to roundoff error).

d) Make a plot of residuals against the values of x. Draw a horizontal line at height zero on your plot. Notice that the residuals show the same pattern about this line as the data points show about the regression line in the scatterplot in (a). What do you conclude about the residual plot?
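The sum check in part (c) is quick to confirm by machine. The following is a minimal Python sketch using only the fifteen residuals printed above; the x values are on page 182 of the book and are not reproduced here, so the sketch looks at the order of the signs rather than drawing the plot. The variable names are illustrative.

```python
# Residuals in the same order as the observations (from the list above).
residuals = [10.09, 2.24, -0.62, -2.47, -3.33, -4.28, -3.73, -2.94,
             -2.17, -1.32, -0.42, 0.57, 1.64, 2.76, 3.97]

# Part (c): least-squares residuals sum to zero, up to roundoff in the
# printed two-decimal values.
total = sum(residuals)

# Part (d): count how often consecutive residuals switch sign. Random
# scatter about zero switches often; a curved pattern switches rarely.
signs = [r > 0 for r in residuals]
sign_changes = sum(1 for s, t in zip(signs, signs[1:]) if s != t)

print(f"sum of residuals = {total:.2f}")
print(f"number of sign changes = {sign_changes}")
```

A small number of sign changes across 15 ordered residuals is one numerical hint of the systematic pattern that part (d) asks you to describe from the plot.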

15. Below are four sets of data prepared by the statistician Frank Anscombe to illustrate the dangers of calculating without first plotting the data.

a) Without making scatterplots, find the correlation and the least-squares regression line for all four data sets. What do you notice? Use the regression line to predict y for x = 10 for all four data sets.

b) Make a scatterplot for each of the data sets and add the regression line to each plot.

c) Now make a sketch of the residual plot for each of the four data sets.

d) In which of the four cases would you be willing to use the regression line to describe the dependence of y on x? Explain your answer in each case.