Math 1020, Mathematics for the Liberal Arts, Fall 2015


Activity #26: Time-series data; tracking change over time
Evidence of improvement? How much? How convincing?
Friday, November 21(/Monday, November 24?)

Learning Outcomes:
• Introduction to Time Series graphs
• Recognizing and interpreting regression analysis in public policy-related literature
• More interpretation of slope as rate of change
• More on prediction with a model and relative error

Name(s)______

Let’s see how the Bush Lake Use Attainability Analysis (U.A.A.) included an interpretation of a regression line without explicitly saying so.

On p. iv of the reading packet “Bush Lake Use Attainability Analysis Final Report” handed out a few weeks ago, we read the following:

“The results of the regression analyses indicate total phosphorus concentration in the epilimnion (which means “top layer” of the lake) has been decreasing at the rate of 0.5 μg/L per year;” (italics not in the original) and similar statements about chlorophyll a concentration and Secchi disc transparency. They do not, however, explain where they got those figures (encouraging as they are; together, they paint a picture of improving water quality). We are going to find out.

A scatterplot whose explanatory (“x”) variable is time is called a time series. Time series are important when you’re trying to watch how a variable is changing over a period of… well, time. (In fact, we were looking at time series back in the beginning of the course, when we looked at graphs of existing ecological uses.) We can often see long-term trends of increase, decrease, or some of each when the data are plotted as a time series.

In particular we’re going to look at time series for some water quality variables in Bush Lake, using data from the U.A.A. to see how the authors arrived at their conclusion mentioned above.

On the Y: drive, find the file TimeSeriesData.xls in the Nov. 21 folder. Since p. iv of the U.A.A. refers to “trend analysis…for the period 1982 to 2000,” only those years are included. The data values come from the table of summer averages on p. 5 of the reading handout. Save the file to your H: drive or a flash drive, then:

Question 1. Make a scatterplot with “year” on the x-axis and “mean phosphorus” on the y-axis. Label the axes, and include the trendline, with its equation and R² value. Print the finished graph.

Question 2. Write the regression equation and R² value:

Equation: y = ______x + ______

R² = ______.
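If you are curious what the spreadsheet is doing when it draws a linear trendline, here is a minimal Python sketch of the usual least-squares calculation. The function name `fit_trendline` and the sample numbers are made up for illustration; they are not the Bush Lake values, and the results should agree with the spreadsheet's trendline only up to rounding.

```python
from math import fsum

def fit_trendline(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_mean = fsum(xs) / n
    y_mean = fsum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxx = fsum((x - x_mean) ** 2 for x in xs)
    syy = fsum((y - y_mean) ** 2 for y in ys)
    sxy = fsum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each take more than one distinct value")
    slope = sxy / sxx                     # m in y = m*x + b
    intercept = y_mean - slope * x_mean   # b in y = m*x + b
    r_squared = sxy ** 2 / (sxx * syy)    # square of the correlation coefficient
    return slope, intercept, r_squared

# Placeholder numbers invented for this sketch; swap in the spreadsheet columns.
years = [2001, 2002, 2003, 2004, 2005]
values = [12.0, 11.0, 11.5, 10.0, 9.5]
m, b, r2 = fit_trendline(years, values)
print(f"y = {m:.3f}x + {b:.3f}, R^2 = {r2:.3f}")
```

A negative value of `m` here would mean the plotted variable tends to go down as the year goes up.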

You may or may not recall the “slope – intercept form” of a straight-line equation from algebra. This is “y = mx + b.” In this equation, x and y are the variables whose relationship we’re exploring. m and b are fixed numbers, or “constants,” whose value is found from data. The number m is called the slope of the line. This number has various interpretations, which we’ve discussed off and on when looking at regression equations.

Question 3. What value does m have for the equation you found in question 2?

Here we get to the claim about decreasing phosphorus concentrations mentioned earlier. (Remember the claim: “The results of the regression analyses indicate total phosphorus concentration in the epilimnion has been decreasing at the rate of 0.5 μg/L per year.”)

As we’ve discussed before, one interpretation of a slope is as a rate of change. Thus this claim was based on the slope of the regression line made from the variables “year” (x) and “mean phosphorus concentration” (y). If the slope is a negative number, then the y variable is getting smaller, or decreasing, as the x variable increases. In this case, it means that the overall trend is for the mean phosphorus concentration to decrease with each passing year. Note the language: Sometimes the phosphorus concentration did go up, but the overall trend was for it to go down.

Question 4. Using R² from question 2, and whether the relationship between the year and the mean phosphorus concentration is positive or negative, find the R-value for this relationship, and use it to classify the relationship as strong, moderate, or weak.

R = ______

Relationship is (circle one): Strong Moderate Weak

Now, one way to potentially use the slope as a rate of change is as a predictive tool. This means to use current data and the rate of change to predict future values. Whether this gives reliable predictions depends on the strength of the relationship; the stronger the relationship, the more reliable these predictions are likely to be. Here’s how such a prediction works; we’ll use phosphorus concentration in 1982 to predict phosphorus concentration in 1983 using the regression slope. This allows us to check the reliability of the prediction, since we have actual data from 1983. Note that this is not the same thing we did when we were just plugging in numbers for “x” in the regression equation.
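To make the difference between the two kinds of prediction concrete, here is a small Python sketch. The function names and every number in it are hypothetical placeholders, not values from the Bush Lake data; the first function is the "previous actual value plus slope" method used in the questions that follow, and the second is the earlier "plug x into the equation" method.

```python
def predict_next_year(previous_actual, slope):
    """Step prediction: last year's actual value plus one year's worth of slope."""
    return previous_actual + slope

def predict_from_equation(year, slope, intercept):
    """Plug-in prediction: evaluate y = m*x + b at the given year."""
    return slope * year + intercept

# Hypothetical placeholder values, chosen only to show the two calculations.
slope = -1.5            # predicted change per year
intercept = 3050.0      # b from a made-up regression equation
actual_last_year = 48.0

print(predict_next_year(actual_last_year, slope))
print(predict_from_equation(2003, slope, intercept))
```

The two results generally differ, because the step prediction starts from an actual data value while the plug-in prediction starts from a point on the line.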

Question 5. What was the mean phosphorus concentration in 1982? (It’s in the table in the Excel file; include units.)

Question 6. What was the slope of your regression line (from question 3)?

This slope is the regression line’s predicted change in phosphorus concentration in 1 year’s time. So:

Question 7. Starting with the concentration level from question 5, if the phosphorus concentration changes by the amount in question 6, what would the phosphorus concentration become?

This is your predicted mean phosphorus concentration in 1983, based on the regression equation’s slope and the actual concentration in the previous year.

Question 8. From the data table, what was the actual mean phosphorus concentration in 1983? Include units.

Question 9. Recall that, when using a model to predict the value of some variable, we defined:

error = (actual value) – (predicted value), and

relative error = error / (actual value).

We’ll call a prediction “close” if it is within 5% of the actual value from the data; that is, if its relative error is less than 5%. Is the regression prediction from question 7 close? Show the work that led to your answer.
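The error and relative error calculations can be sketched in Python as follows. The function names are made up for this illustration, the numbers are hypothetical, and the use of the absolute value in `is_close` is an assumption about what "within 5%" means when a prediction overshoots (so the relative error comes out negative).

```python
def relative_error(actual, predicted):
    """Relative error = (actual - predicted) / actual."""
    if actual == 0:
        raise ValueError("relative error is undefined when the actual value is 0")
    return (actual - predicted) / actual

def is_close(actual, predicted, tolerance=0.05):
    """True when the size of the relative error is under the tolerance (5% by default).

    Assumption: overshooting and undershooting are treated the same way,
    so the sign of the relative error is ignored.
    """
    return abs(relative_error(actual, predicted)) < tolerance

# Hypothetical numbers, not taken from the data table.
actual_value = 50.0
predicted_value = 48.0
print(f"relative error: {relative_error(actual_value, predicted_value):.1%}")
print("close?", is_close(actual_value, predicted_value))
```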

We’ll look at a couple more predictions, using the regression slope, which was your answer to Question 6. Starting from the actual phosphorus concentration in 1983, we’ll predict the phosphorus concentration for 1984 and compare the prediction with the actual value from the data.

Question 10. What was the mean phosphorus concentration in 1983? (It’s in the table in the Excel file; include units.)

Question 11. Starting with the concentration level from question 10, if the phosphorus concentration changes by the amount in question 6, what would the phosphorus concentration become?

This is your predicted mean phosphorus concentration in 1984, based on the regression slope and the actual concentration in the previous year.

Question 12. From the data table, what was the actual mean phosphorus concentration in 1984? Include units.

Question 13. Is the regression prediction from question 11 close? (Use the same criterion as was used in Question 9, and show the work that led to your answer.)

So far we’ve been using the slope of the regression line to make predictions one year ahead. But the data have some gaps; for example, there are no phosphorus readings for any years between 1988 and 1993. Here’s where we think about the slope as a rate of change.

Question 14. How many years pass from 1988 until 1993?

Question 15. Here’s the kicker: Your answer to Question 6 gives the predicted change in phosphorus concentration in one year. By how much, then, would phosphorus concentration change in the number of years you found in Question 14 above? Include units.

Question 16. From the table in the Excel file, what was the epilimneal phosphorus concentration on Bush Lake in 1988? Include units.

Question 17. Starting with this concentration, use your answer to Question 15 to give the regression slope’s prediction of the phosphorus concentration in 1993.

Question 18. From the data table, what was the actual mean phosphorus concentration in 1993? Include units.

Question 19. Is the regression prediction from question 17 close? (Use the same criterion as was used in Questions 9 and 13.)
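The multi-year version of the prediction in Questions 14 through 17 amounts to multiplying the per-year slope by the number of years that pass. A minimal Python sketch, with a made-up function name and hypothetical numbers rather than the Bush Lake values:

```python
def predict_after_gap(start_value, slope, start_year, end_year):
    """Add (slope x years elapsed) to the starting value."""
    years_elapsed = end_year - start_year
    total_change = slope * years_elapsed   # change per year times number of years
    return start_value + total_change

# Hypothetical placeholder values: a start of 30.0 units in 2010, changing by -2.0 units per year.
print(predict_after_gap(30.0, -2.0, 2010, 2014))
```

The longer the gap, the more the year-to-year ups and downs can pile up, so a prediction over several years may land farther from the actual value than a one-year prediction does.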

OK. The slope of the regression line is best thought of as the average change per year in the lake’s phosphorus concentration. This means, of course, that some years the change is more than the regression slope and some years it’s less, so that the slope isn’t necessarily appropriate for year-to-year predictions, but is rather a measure of the long-term trend. This trend was referred to in the quote cited at the beginning of this activity:

“…total phosphorus concentration…has been decreasing at the rate of 0.5 μg/L per year.”

Question 20. Is the rate in the quote the same as the rate we’ve been using from the slope of the regression line? Briefly explain.

Question 21. Looking at the time series graph, is there a particular data point that stands out, as lying especially outside the overall pattern of the data, as indicated by the trendline? If so, go back and circle that point on the graph. (Remember from earlier in the course that such a point is called an outlier.)

Question 22. Find the data for the outlier in the table from the Excel file. What year does the outlier come from, and what was the mean phosphorus concentration that year?

Year: ______ Phosphorus concentration: ______ μg/L

As you’ve done before, copy and paste the data column for the year and for phosphorus concentration, side-by-side, with the year column on the left (so that the year will be on the horizontal, or “x,” axis of the scatterplot you’ll make). Then delete, from the phosphorus column, the entry which contains the outlier value. (Don’t delete the year from its column.)

Question 23. Now make a new scatterplot, with the trendline and its equation and R² value included. Print the graph.



Question 24. Is the relationship between the year and the phosphorus concentration stronger now, with the outlier removed? Explain, using the correlation coefficient value for each relationship as your justification.
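For comparing the two relationships, recall that the correlation coefficient has the size of the square root of R² and the same sign as the slope. The Python sketch below shows that calculation; the function names and the R² and slope values are hypothetical, and the sketch deliberately does not label anything strong, moderate, or weak, since those cutoffs are the ones from class.

```python
from math import copysign, sqrt

def correlation_from_r_squared(r_squared, slope):
    """Recover r: its size is sqrt(R^2) and its sign matches the sign of the slope."""
    if not 0 <= r_squared <= 1:
        raise ValueError("R^2 must lie between 0 and 1")
    return copysign(sqrt(r_squared), slope)

def is_stronger(r_new, r_old):
    """A relationship is stronger when r is farther from 0, whatever its sign."""
    return abs(r_new) > abs(r_old)

# Hypothetical R^2 and slope values for a "with outlier" and a "without outlier" fit.
r_with = correlation_from_r_squared(0.49, -2.0)
r_without = correlation_from_r_squared(0.81, -1.0)
print(f"r with outlier: {r_with:.2f}, r without outlier: {r_without:.2f}")
print("stronger without the outlier?", is_stronger(r_without, r_with))
```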

Question 25. What is the slope of the regression line of the new graph (with the outlier removed)? Round to just one place after the decimal.

Question 26. According to the new graph and trendline, was the phosphorus concentration in Bush Lake increasing or decreasing from 1982 to 2000? (Again, we’re talking overall trend, even though the trend might not hold for every single year.)

Question 27. What do your answers to the two preceding questions give as a rate of change, according to the regression line model, from the data with the outlier removed? (Answer using the “form” provided below.)

Answer: The phosphorus concentration was (circle one) increasing decreasing by

______per year.

(The blank above should be filled in not just with a number, but should include the “units” that go with that number. For an example, see the quote on the first page of this activity.)

Question 28. Is this rate the same as the rate given in the Bush Lake U.A.A., as quoted on the opening page of this activity? Briefly explain.

Question 29. Write the three different rates of phosphorus concentration decrease we have so far below: one from the U.A.A., one from the data with the outlier included (Questions 3 and 6), and one from the data with the outlier removed (Question 27 above). Don’t write numbers only, but include the units that the number is measuring. Write only one place after the decimal for each rate.

Rate from the U.A.A.:

Rate from data with outlier included (Question 6):

Rate from data with outlier removed (Question 27):

(My hunch, and that’s all it is, is that the number reported in the U.A.A. was obtained by calculating the rate with the outlier, calculating the rate without the outlier, then averaging the two. Certainly by doing this they would have arrived at the rate they reported.)
