STATISTICS WITHIN DEPARTMENTS OF MATHEMATICS AT LIBERAL ARTS COLLEGES

THOMAS L. MOORE
GRINNELL COLLEGE

and

JEFFREY WITMER
OBERLIN COLLEGE

TECHNICAL REPORT NO. 91-007

SLAW is supported by a grant from the Exxon Corporation.

The Statistics in the Liberal Arts Workshop (SLAW) is a group of educators concerned with the teaching of statistics. The workshop was initially funded by the Sloan Foundation. Continuing support has been provided by the Exxon Corporation.

Participants in the summer of 1991 are:

Donald L. Bentley, Pomona College
David S. Moore, Purdue University
George W. Cobb, Mt. Holyoke College
Thomas L. Moore, Grinnell College
Janice Gifford, Mt. Holyoke College
Norean Radke-Sharpe
Katherine T. Halvorsen
Rosemary Roberts, Bowdoin College
Homer T. Hayslett, Jr., Colby College
Dexter Whittinghill III
Gudmund Iversen, Swarthmore College
Jeffrey Witmer, Oberlin College
Robin H. Lock, St. Lawrence University

THE AMERICAN MATHEMATICAL MONTHLY

(ISSN 0002-9890)

ARTICLES The Cauchy Problem for the Wave Equation with Distribution Data: an Elementary Approach ...... CALVIN H. WILCOX 401

Continuous Nowhere-Differentiable Functions-an Application of Contraction Mappings ...... HIDEFUMI KATSUURA 411

LETTERS TO THE EDITOR 417

NOTES A Note on Finding a Strict Saddlepoint ...... DANIEL BIENSTOCK, FAN CHUNG, MICHAEL FREDMAN, ALEJANDRO A. SCHAFFER, PETER W. SHOR, AND SUBHASH SURI 418

Circumscribed Circles ...... ROBERT OSSERMAN 419

Stirling's Series and Bernoulli Numbers ...... ELIAS Y. DEEBA AND DENNIS M. RODRIGUEZ 423

THE TEACHING OF MATHEMATICS A New Scheme for Multiple-Choice Tests in Lower-Division Mathematics ...... BRUCE A. JOHNSON 427

A Simple Proof of the Weierstrass Approximation Theorem ...... J. H. LINDSEY II 429

Statistics Within Departments of Mathematics at Liberal Arts Colleges ...... THOMAS L. MOORE AND JEFFREY A. WITMER 431

PROBLEMS AND SOLUTIONS

Elementary Problems and Solutions ...... 437

Advanced Problems and Solutions ...... 445

REVIEWS

Representations and Characters of Finite Groups. By M. J. Collins ...... MARK STEVEN MAZUR 453

Theorist. By Prescience Corporation ...... FRANK WATTENBERG 455

TELEGRAPHIC REVIEWS ...... 461

Statistics Within Departments of Mathematics at Liberal Arts Colleges

THOMAS L. MOORE Department of Mathematics, Grinnell College, Grinnell, IA 50112

JEFFREY A. WITMER Department of Mathematics, Oberlin College, Oberlin, OH 44074

In his important article "The Science of Patterns" [9], former MAA president Lynn Steen provides us with a state-of-the-discipline report on modern mathematics. Steen divides the mathematical sciences into three parts "of roughly comparable size": statistical science, core mathematics, and applied mathematics. Currently, core mathematics dominates both the faculty and of departments of mathematics at liberal arts colleges. Although Steen claims that distinctions between his three divisions may be less "intrinsic differences" than "differences in style, purpose, and ," we feel recognition of these differences can be an important benefit to the community of liberal arts colleges, where practicalities dictate that all three divisions of the mathematical sciences be housed in a single department. In particular, a broadened view of the mathematical sciences can energize a mathematics department by increasing enrollments, the number of majors, the number of students going on to , and the general level of enthusiasm for studying the mathematical sciences.

Below we outline some of the essential differences between statistics and core mathematics and discuss some practical curricular implications that these differences have for a mathematics department at a liberal arts college. We concentrate on statistics rather than applied mathematics because we are statisticians.

Statistics is underrepresented among mathematics faculty at liberal arts colleges in the . A recent survey [8] of mathematics departments at liberal arts colleges suggests that approximately half of all such departments have no one with an advanced degree in statistics and only 12.5% have more than one such person. And, unlike the situation at many universities, if statisticians are employed at liberal arts colleges they will generally be housed in the department of mathematics, since mathematics is the traditional liberal arts discipline most closely aligned with statistics.
Because the responsibility of statistics education at most liberal arts colleges rests with the mathematics department, it is imperative that the department recognize the fundamental differences between statistics and core mathematics and ensure that their statistics curriculum reflects these differences.

Core mathematics and statistics differ in two fundamental ways. Both fields look for structure and patterns, but core mathematics looks in the abstract arena of space and number while statistics looks at data from other, nonmathematical subject areas. Hence core mathematics and statistics differ in their objects of study. They also differ in their methods.

The emphasis in mathematical thinking is on deduction. For example, the axioms of classical group theory form a framework from which theorems can be derived. Naturally, in the discovery of new truths inductive reasoning and exploration are essential tools. In developing a new theorem in group theory one may observe patterns in known groups with certain characteristics. If all such groups examined exhibit a certain property then a conjecture may emerge and a proof be sought. Nevertheless, in this endeavor the emphasis is on the deduction that establishes the theorem.

432 THOMAS L. MOORE AND JEFFREY WITMER [May

Statistical thinking, on the other hand, focuses on a dialog between models and data. For example, suppose one wants to develop a predictive relationship between a student's SAT score and his or her college GPA, using linear regression. One can postulate a simple linear relationship with independent, normally distributed deviations from the line and from this model infer many features of the population from which the data came. For example, one could construct an interval estimate of the GPA of a student with an 1150 SAT. However, the linear regression model is only a model of reality. All statistical models are tentative. If diagnostic plots show that the model does not fit the data, then a more complicated model will have to be considered, the inferences redrawn, and the new model scrutinized for appropriateness in its own right. This dialog between model and data is a fundamental feature of statistical thinking.

A critical aspect of this dialog between model and data is the quality of the data. For example, by properly choosing the sample or by designing a good experiment the statistician can usually simplify analysis and strengthen the interpretability of the model. It is therefore very important to present these ideas when teaching statistics. Even an introductory course should give attention to ideas such as sampling and experimental bias, the role of randomization, the idea of pairing observations, etc. Unfortunately many introductory courses and textbooks ignore these concepts.

Current trends in statistical research reflect the fundamental difference between statistics and core mathematics. Consider first the area of regression diagnostics.
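To make the dialog between model and data concrete, here is a minimal sketch in Python of the SAT/GPA example, under stated assumptions. The data are simulated rather than real, and the function names (`fit_line`, `residual_sd`, `prediction_interval`, `large_residuals`) are ours, chosen only for illustration. A normal quantile also stands in for the exact t quantile. The sketch fits the line, forms an approximate interval estimate for a student with an 1150 SAT, and then flags points with unusually large residuals, the kind of check that sends the analyst back to reconsider the model.

```python
import math
import random
from statistics import NormalDist, mean


def fit_line(x, y):
    """Return the least-squares intercept a and slope b for y = a + b*x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def residual_sd(x, y):
    """Estimate the standard deviation of the deviations from the fitted line."""
    a, b = fit_line(x, y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))


def prediction_interval(x, y, x_new, level=0.95):
    """Approximate interval for the response of one new individual at x_new.

    Assumption of this sketch: a normal quantile stands in for the exact
    t quantile, which is adequate only when the sample is fairly large.
    """
    n = len(x)
    a, b = fit_line(x, y)
    xbar = mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * residual_sd(x, y) * math.sqrt(1 + 1 / n + (x_new - xbar) ** 2 / sxx)
    center = a + b * x_new
    return center - half, center + half


def large_residuals(x, y, cutoff=3.0):
    """Indices of points whose residual exceeds cutoff residual SDs (a crude check)."""
    a, b = fit_line(x, y)
    s = residual_sd(x, y)
    return [i for i, (xi, yi) in enumerate(zip(x, y))
            if abs(yi - (a + b * xi)) > cutoff * s]


# Simulated data, NOT real student records: the intercept, slope, and noise
# level below are invented purely so the example runs.
rng = random.Random(1991)
sat = [rng.uniform(800, 1500) for _ in range(200)]
gpa = [0.5 + 0.002 * s + rng.gauss(0, 0.4) for s in sat]

low, high = prediction_interval(sat, gpa, 1150)
print(f"Approximate 95% interval for GPA at SAT 1150: {low:.2f} to {high:.2f}")
print("Points to examine before trusting the model:", large_residuals(sat, gpa))
```

With real data the interesting part is what happens after the last line: any flagged points, or a pattern in a residual plot, would prompt a revised model and a fresh set of inferences. The residual check here ignores leverage, so it is only a rough stand-in for the diagnostic procedures surveyed in [3].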
Often in linear regression one or more points may be anomalous. The fitted model may not fit these points well (i.e., they may be outliers) or these points may have enormous influence on the fitting process itself. Much recent work has gone into developing diagnostic procedures for identifying outliers and influential points in complicated regression problems (see, e.g., [3]). These procedures play a key role in the iterative cycle of data analysis in which a statistician fits a model, then checks the fit using diagnostic methods, then if necessary fits a modified model, and so on.

A typical regression analysis using diagnostics will depend heavily on the computer. Nevertheless, other areas of modern statistical research make even greater computing demands. Classical statistical inference rests upon a precise model specification. For example, in the SAT/GPA example above we need both the underlying straight-line and the normally distributed errors to create the interval estimate. The bootstrap is a method for making a statistical inference without standard model specifications. Using the bootstrap one repeatedly draws pseudo-samples from the data at hand and uses these pseudo-samples as the basis of, for example, an interval estimate of a parameter. The bootstrap is particularly useful in situations where the model-based theory is poorly developed. A good survey of this technique is given by Efron and Tibshirani [4].

The area of dynamic graphical data analysis provides a third example of statistics research that is quite distinct from core mathematics. The thrust of this research is using modern computer graphics to develop methods that allow the data analyst to discover the structure in complex, multidimensional data sets. Becker, Cleveland, and Wilks [1] give a good survey of this area.

1991] THE TEACHING OF MATHEMATICS 433

For each of the past few summers, a small group of statisticians from liberal arts colleges has met to discuss the nature of statistics within liberal arts colleges. These Statistics in the Liberal Arts Workshops (SLAW) have considered three aspects of statistics within the liberal arts setting: the teaching of statistics, the role of statistics within the mathematics major, and the role of a statistician as a general campus resource (see [8]).

A key conclusion was that a vibrant statistics curriculum includes real data and applied statistics and that every mathematics department should offer at least one data-driven statistics course that counts for mathematics major credit and that preferably can be taken early in a student's career. Several models are possible for such a course:

(1) Make an existing introductory course data-driven and count it for the major.
(2) Add a (one- or two-credit) supplement to the traditional mathematical statistics course.
(3) Teach new and different applied and data-driven statistics courses at the introductory level for mathematics majors.

Model 1 may be the easiest to implement. Most liberal arts college mathematics departments teach a course in introductory statistics. Why not allow credit for the mathematics major for such a course? The traditional response to this question is that such a course is not mathematical enough, where "mathematical" is taken to mean "core mathematical." However, once statistics is given equal status as a mathematical science, this objection will vanish if such introductory courses are taught with statistical rigor, that is, if they are data-driven explorations of the discipline of statistics, as opposed to the dry presentations of formulas that students often see.
Model 1 together with the traditional mathematical statistics course then becomes similar to the model most of us now use for teaching calculus: an introduction that teaches basic techniques and the flavor of the subject followed later by a theoretical real analysis course.

It is natural for mathematicians teaching statistics to emphasize the mathematical (i.e., theoretical, deductive) side of the discipline. Such an approach makes things easier for the teacher, but gives only a partial picture of statistics. For within such a curriculum students who might be drawn to statistical thinking see little of it and potential mathematical scientists (and potential mathematics majors!) are lost.

A good data-driven course uses real data to teach statistical thinking. Experience suggests that real data are a far more powerful motivating force than are artificial data. Statisticians, and their students, are interested in understanding and solving real problems. The analysis of data should be the focus of an introductory course. Students should discuss the real-world problems the data were collected to solve, the quality of the data, and their effective analysis. Exercises should be built around real data sets and students might even design projects that require them to produce and analyze their own data. The emphasis of such a course is on understanding the broad concepts that apply to most statistical problem solving.

At least some portion of the course should deal with larger data sets and their analysis using a statistical computer package. It is important to include large data sets and computers for several reasons. First, a good interactive package is a basic tool for the modern data analyst and it is good for the student to learn about one. Second, most real data sets are too large for hand calculations. Third, larger data sets can better motivate some techniques than smaller ones. For example, finding outliers in a small data set can be done by scanning a listing of the data, but with a larger data set one turns naturally to statistical graphics.

The widespread availability of computers and easy-to-use statistics packages (e.g., Minitab, as opposed to SAS or SPSS-X, which are powerful but not user-friendly) makes it possible to avoid tedious calculation when analyzing real data and, instead, to concentrate on the relevant statistical concepts. When the computer is used, students can spend less time on calculation and more time on the important task of learning what the numbers mean. This in turn gives the teacher more room to test understanding of concepts. Dealing with statistical concepts can be difficult, but students should not be able to avoid thinking by hiding behind formulas (and neither should their teachers). We have found, after a period of adjustment, that de-emphasizing formulas helps us think more clearly about the concepts we are trying to teach.

Teaching a data-driven course can be a challenge. Such a course will require a different style of teaching for most faculty, with more discussion and less lecture. More time will be spent on the computer with a statistical package and, possibly, simulations. Office time will be spent helping students work out the very nonmathematical aspects of the design and analysis of their own projects. Teaching a data-driven course will be difficult the first time, but the incorporation of the preceding ideas can be approached in steps and can result in a very rewarding teaching experience.

A good textbook that teaches statistical thinking can be of great help. Three good examples, aimed at progressively more sophisticated audiences, are [6], [5], and [7].
A recent article by Cobb [2] provides excellent guidelines on what to look for in a textbook and also reviews several. We present a short list of resources and readings in the appendix to aid those who want to learn more about incorporating real examples and applications into statistics courses. A longer bibliography can be found in [8].

Model 2, the supplement to the mathematical statistics course, is currently being used at Oberlin College, where the two-semester, six-credit sequence in probability and mathematical statistics is augmented with a one-credit supplement offered during the second semester. The course, called Data Analysis, meets once per week and covers various topics from applied statistics, including the tools of exploratory data analysis, normal probability plots and transformations of data, control charts, applied linear regression, and analysis of variance. All of the topics are introduced with computer applications to real data. In the spring of 1989 the course centered on an open-ended class project of analyzing a set of over 200 variables from a survey of college librarians at 97 liberal arts colleges, so that the students could gain first-hand experience as statisticians working on a real problem.

Several implementations of model 3 (new, applied, and introductory level courses for mathematics majors) currently exist and surely others are possible. Mt. Holyoke College teaches an applied regression course that carries a calculus and linear algebra prerequisite. The students are required to do individual projects in which they find or produce their own data. They must write a final report and present their project findings to the class. St. Lawrence University offers an applied time series and forecasting course with a year of calculus as a prerequisite. This course uses much real data that are analyzed using a computer package. Both and Swarthmore College offer courses in applied multivariate analysis that require linear algebra as a prerequisite. Although few theorems, if any, are proved in these courses, they all require some mathematical sophistication of their audiences so that proper understanding of rather elaborate statistical models is possible.

When statistics secures a greater role in the mathematics department of a liberal arts college, everyone gains. The department becomes more properly a department of mathematical sciences as described by Steen. Students more inclined toward statistics than core mathematics have an additional point of entry into the mathematics major and the college may send more students on to careers in the mathematical sciences. At the same time, these benefits require that the differences between statistics and core mathematics be appreciated and allowed for. Ideally, a department will hire a statistician (or two) to develop a solid statistics curriculum. Short of that, mathematicians presently teaching statistics should develop a data component in their statistics curriculum along the lines outlined above. A data-driven, concept-oriented approach can breathe new life into your statistics courses.

We gratefully acknowledge the help of our fellow SLAW participants in formulating the ideas in this paper: Donald L. Bentley of Pomona, George W. Cobb of Mt. Holyoke, Homer T. (Pete) Hayslett, Jr. of Colby, Gudmund Iversen of Swarthmore, Anju Joglekar of Smith, Robin H. Lock of St. Lawrence, David S. Moore of Purdue, Rosemary Roberts of Bowdoin, and Frank Wolf of Carleton. We also thank the referee for helpful comments.

Appendix

The following are introductory textbooks with a special strength or focus. Even experienced teachers should find them interesting.

Items 1-6 list textbooks at the elementary level.

1. J. Devore and R. Peck, The Exploration and Analysis of Data, 1986, West Publishing, St. Paul. This introductory textbook is loaded with real data from all areas of application.
2. L. Koopmans, Introduction to Contemporary Statistical Methods, 2nd ed., 1987, Duxbury, Boston. Koopmans was among the first authors to thoroughly integrate the modern topics of exploratory data analysis, graphics, and robust methods into a coherent textbook.
3. B. F. Ryan, B. L. Joiner, and T. A. Ryan, Jr., Minitab Handbook, 2nd ed., 1985, PWS-Kent, Boston. The Handbook is an excellent companion textbook to the highly popular statistical package Minitab. The book discusses the several larger data sets that come with the package and includes many good exercises about them.
4. D. S. Moore, Statistics: Concepts and Controversies, 1989, W. H. Freeman, New York.
5. D. Freedman, R. Pisani, and R. Purves, Statistics, 1978, W. W. Norton, New York.
6. D. S. Moore and George McCabe, Introduction to the Practice of Statistics, 1989, W. H. Freeman, New York.

We referred to these three in the article.

Items 7 and 8 are textbooks for more advanced students.

7. J. A. Rice, Mathematical Statistics and Data Analysis, 1988, Wadsworth, Belmont, Calif. This introductory textbook aimed at advanced undergraduates provides that audience with a true appreciation of the range of statistics and features the use of real data, exploratory data analysis and graphics, and a concern for model assumptions.
8. G. E. P. Box, W. G. Hunter, and J. S. Hunter, Statistics for Experimenters, 1978, Wiley, New York. This textbook provides a lucid, conceptual introduction to statistics and experimentation for an audience of engineers, chemists, and other industrial scientists.

Items 9 and 10 are excellent supplemental reading for students and teachers.

9. W. S. Peters, Counting for Something: Statistical Principles and Personalities, 1987, Springer-Verlag, New York. This charming and casual introduction to elementary statistical concepts gives the reader an excellent sense of the history of the discipline.
10. J. M. Tanur, F. Mosteller, W. H. Kruskal, E. L. Lehmann, R. F. Link, R. S. Pieters, and G. R. Rising, Statistics: A Guide to the Unknown, 3rd ed., 1989, Wadsworth, Belmont, Calif. This is an anthology of short, nontechnical articles on a wide variety of statistical applications.

REFERENCES

1. Richard A. Becker, William S. Cleveland, and Allan R. Wilks, Dynamic graphics for data analysis, Statistical Science, 1 (1986) 355-395.
2. George W. Cobb, Introductory textbooks: a framework for evaluation, Journal of the American Statistical Association, 82 (1987) 321-339.
3. R. Dennis Cook and Sanford Weisberg, Residuals and Influence in Regression, 1982, Chapman and Hall.
4. B. Efron and R. Tibshirani, Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy, Statistical Science, 1 (1986) 54-77.
5. David Freedman, Robert Pisani, and Roger Purves, Statistics, 1978, W. W. Norton, New York.
6. David S. Moore, Statistics: Concepts and Controversies, 2nd ed., 1985, W. H. Freeman, New York.
7. David S. Moore and George P. McCabe, Introduction to the Practice of Statistics, 1989, W. H. Freeman, New York.
8. Thomas L. Moore and Rosemary A. Roberts, Statistics at liberal arts colleges, American Statistician, 43 (1989) 80-85.
9. Lynn A. Steen, The science of patterns, Science, 240 (1988) 611-616.