Exploring Regression
ADVANCED ALGEBRA
DATA-DRIVEN MATHEMATICS

Exploring Least-Squares Linear Regression

Gail F. Burrill, Jack C. Burrill, Patrick W. Hopfensperger, and James M. Landwehr

Dale Seymour Publications®
White Plains, New York

This material was produced as a part of the American Statistical Association's Project "A Data-Driven Curriculum Strand for High School" with funding through the National Science Foundation, Grant #MDR-9054648. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Managing Editors: Catherine Anderson, Alan MacDonell
Editorial Manager: John Nelson
Senior Mathematics Editor: Nancy R. Anderson
Project Editor: John Sullivan
Production/Manufacturing Director: Janet Yearian
Production/Manufacturing Manager: Karen Edmonds
Production Coordinator: Roxanne Knoll
Design Manager: Jeff Kelly
Cover and Text Design: Christy Butterfield
Cover Photo: Romilly Lockyer, Image Bank

This book is published by Dale Seymour Publications®, an imprint of Addison Wesley Longman, Inc.

Dale Seymour Publications
10 Bank Street
White Plains, NY 10602
Customer Service: 800-872-1100

Copyright © 1999 by Addison Wesley Longman, Inc. All rights reserved. No part of this publication may be reproduced in any form or by any means without the prior written permission of the publisher.

Printed in the United States of America.
Order number DS21182
ISBN 1-57232-245-4
This book is printed on recycled paper.

Authors

Gail F. Burrill
Mathematics Science Education Board
Washington, D.C.

Jack C. Burrill
National Center for Mathematics Sciences Education
University of Wisconsin-Madison
Madison, Wisconsin

Patrick W. Hopfensperger
Homestead High School
Mequon, Wisconsin

James M. Landwehr
Bell Laboratories
Lucent Technologies
Murray Hill, New Jersey

Consultants

Emily Errthum
Homestead High School
Mequon, Wisconsin

Henry Kranendonk
Rufus King High School
Milwaukee, Wisconsin

Maria Mastromatteo
Brown Middle School
Ravenna, Ohio

Vince O'Connor
Milwaukee Public Schools
Milwaukee, Wisconsin

Jeffrey Witmer
Oberlin College
Oberlin, Ohio

Data-Driven Mathematics Leadership Team

Gail F. Burrill
Mathematics Science Education Board
Washington, D.C.

Miriam Clifford
Nicolet High School
Glendale, Wisconsin

James M. Landwehr
Bell Laboratories
Lucent Technologies
Murray Hill, New Jersey

Richard Scheaffer
University of Florida
Gainesville, Florida

Kenneth Sherrick
Berlin High School
Berlin, Connecticut

Acknowledgments

The authors thank the following people for their assistance during the preparation of this module:
• The many teachers who reviewed drafts and participated in the field tests of the manuscripts
• The members of the Data-Driven Mathematics leadership team, the consultants, and the writers
• Robert Johnson and Bill Yager for their field testing and evaluation of the original manuscript
• Kathryn Rowe and Wayne Jones for their help in organizing the field-test process and leadership workshops
• Jean Moon for her advice on how to improve the field-test process
• Barbara Shannon for many hours of word processing and secretarial services
• Beth and Bryan Cole for writing the answers for the Teacher's Edition
• The many students at Homestead and Whitnall High Schools who helped shape the ideas as they were being developed

Table of Contents

About Data-Driven Mathematics
Using This Module
Introductory Lesson: Why Draw a Line Through Data?
Lesson 1: What Is a Residual?
Lesson 2: Finding a Measure of Fit
Lesson 3: Squaring or Absolute Value?
Lesson 4: Finding the Best Slope
Lesson 5: Finding the Best Intercept
Lesson 6: The Best Slope and Intercept
Lesson 7: Quadratic Functions and Their Graphs
Lesson 8: The Least-Squares Line
Lesson 9: Using the Least-Squares Linear-Regression Line
Lesson 10: Correlation
Lesson 11: Which Model When?

About Data-Driven Mathematics

Historically, the purposes of secondary-school mathematics have been to provide students with opportunities to acquire the mathematical knowledge needed for daily life and effective citizenship, to prepare students for the workforce, and to prepare students for postsecondary education. In order to accomplish these purposes today, students must be able to analyze, interpret, and communicate information from data.

Data-Driven Mathematics is a series of modules meant to complement a mathematics curriculum in the process of reform. The modules offer materials that integrate data analysis with secondary mathematics courses. Using these materials will help teachers motivate, develop, and reinforce concepts taught in current texts. The materials incorporate major concepts from data analysis to provide realistic situations for the development of mathematical knowledge and realistic opportunities for practice. The extensive use of real data provides opportunities for students to engage in meaningful mathematics. The use of real-world examples increases student motivation and provides opportunities to apply the mathematics taught in secondary school.

The project, funded by the National Science Foundation, included writing and field testing the modules, and holding conferences for teachers to introduce them to the materials and to seek their input on the form and direction of the modules.
The modules are the result of a collaboration between statisticians and teachers who have agreed on statistical concepts most important for students to know and the relationship of these concepts to the secondary mathematics curriculum.

Using This Module

Why the Content Is Important

Studying mathematics involving data brings with it the notion of fitting a line to a data set. The desire to find a best line gives rise to a need to understand least-squares regression and correlation. Most calculators and computer software today create the least-squares regression line and with it often display the correlation coefficient. It is because of this widespread availability and the misconceptions that can accompany these topics that this module came to be written.

In this module, you will explore the development of the least-squares regression line and its application. Why it works, when it is appropriate to use it, and how it should be interpreted are at the heart of the module. While investigating the relationship between data and the line and when the least-squares line is the best line, you will become aware of the dependence of the least-squares line upon both residuals and a minimum point determined by plotting the sum of the squared residuals against the slope and intercept of that line. You will also learn to appreciate the effect of outliers upon the line.

Knowing how to find and interpret the correlation coefficient and understanding the expression "the strength of a linear relationship between two variables" are two of the desired outcomes of this module.

Throughout the module, you will find many real-world applications of these two important topics: the least-squares regression line and the correlation coefficient.

Content

Mathematics content: You will be able to:
• Represent linear functions symbolically and graphically.
• Determine and interpret slope and intercepts for linear functions.
• Represent quadratic functions symbolically and graphically.
• Determine the minimum point of a quadratic function.
• Graph the sum of quadratic functions.
• Represent absolute-value functions symbolically and graphically.
• Determine the minimum point of an absolute-value function when possible.
• Graph the sum of absolute-value functions.
• Use summation notation and perform summation arithmetic.
• Use variable notation, including subscripts and superscripts.

Statistics content: You will be able to:
• Calculate residuals.
• Find the sum of squared residuals.
• Find the absolute mean squared error.
• Work with the correlation coefficients r and r².
• Describe the linear relationship between two variables.
• Find least-squares regression lines.

INTRODUCTORY LESSON

Why Draw a Line Through Data?

OBJECTIVE
Discover relationships in a scatter plot by drawing lines through the data points.

INVESTIGATE

Estimating Calories

The Food and Drug Administration (FDA) requires nutrition labels on food packages. Below is an example of a label from a box of Lucky Charms breakfast cereal.

Nutrition Facts
Serving Size: 1 cup (30 g)
Servings per Container: about 13

Amount per Serving           Cereal    With 1/2 cup skim milk
Calories                     120       160
Calories from fat            10        15

% Daily Values
Total Fat 1 g                2%        2%
  Saturated Fat 0 g          0%        0%
Cholesterol 0 mg             0%        1%
Sodium 210 mg                9%        11%
Potassium 55 mg              2%        7%
Total Carbohydrates 25 g     8%        10%
  Dietary Fiber 1 g          6%        6%
  Sugars 13 g
  Other Carbohydrates 11 g
Protein 2 g

Discussion and Practice

Without looking at these labels, how well can you estimate the calories of some selected food items?

1. The table below lists some food items and their serving sizes. Copy the table. After each item write your estimate for how many calories are in one serving. Use the information above as a guide.
Item                                    Serving Size    Estimated Calories
Chicken McNuggets                       6
French Fries                            Regular size
Ben & Jerry's Cookie Dough Ice Cream    1/2 cup
Saltine Crackers                        5
Beef Ravioli                            1 cup
Tomato Soup                             1/2 cup
Skittles                                1 1/2 oz
Raisins                                 1/4 cup
Parmesan Cheese                         1 Tbsp
Rice-a-Roni                             2 1/2 oz
Rice Krispies Cereal                    1 1/2 cup
Cap'n Crunch Cereal                     3/4 cup

2. How well were you able to estimate the number of calories in one serving of these food items? To help answer this question, use a nutrition book to find the actual number of calories for each item. Then make a scatter plot with your estimate of calories on the horizontal axis and the actual number of calories on the vertical axis.
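
One possible way to summarize the comparison in question 2 numerically is sketched below in Python. It is only an illustration under stated assumptions: the item names and calorie numbers are hypothetical placeholders, not values from this module or from a nutrition book, and the reference line actual = estimate (on which a perfect estimate would fall) is introduced here for convenience. The function names `deviations` and `mean_absolute_deviation` are likewise made up for this sketch.

```python
# Hypothetical (estimate, actual) calorie pairs. These numbers are
# placeholders for illustration only, not real nutrition data.
pairs = {
    "item A": (150, 200),
    "item B": (300, 250),
    "item C": (100, 100),
}


def deviations(pairs):
    """Return actual minus estimate for each item.

    On a scatter plot with estimates on the horizontal axis and actual
    values on the vertical axis, a positive value means the point lies
    above the line actual = estimate (an underestimate), a negative
    value means it lies below (an overestimate), and zero means the
    point is on the line.
    """
    return {name: actual - estimate
            for name, (estimate, actual) in pairs.items()}


def mean_absolute_deviation(pairs):
    """Average vertical distance of the points from the line actual = estimate."""
    diffs = list(deviations(pairs).values())
    return sum(abs(d) for d in diffs) / len(diffs)


if __name__ == "__main__":
    for name, diff in deviations(pairs).items():
        print(f"{name}: {diff:+d} calories")
    print("mean absolute deviation:", mean_absolute_deviation(pairs))
```

To use the sketch with your own work, replace the placeholder pairs with your estimates and the actual values you looked up. The smaller the average distance, the closer your points sit to the line actual = estimate.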