Stat Labs: Mathematical Statistics Through Applications

Stat Labs: Mathematical Statistics Through Applications Deborah Nolan Terry Speed Springer To Ben and Sammy —D.N. To Sally —T.P.S This page intentionally left blank Preface This book uses a model we have developed for teaching mathematical statistics through in-depth case studies. Traditional statistics texts have many small numer- ical examples in each chapter to illustrate a topic in statistical theory. Here, we instead make a case study the centerpiece of each chapter. The case studies, which we call labs, raise interesting scientific questions, and figuring out how to answer a question is the starting point for developing statistical theory. The labs are substan- tial exercises; they have nontrivial solutions that leave room for different analyses of the data. In addition to providing the framework and motivation for studying topics in mathematical statistics, the labs help students develop statistical thinking. We feel that this approach integrates theoretical and applied statistics in a way not commonly encountered in an undergraduate text. The Student The book is intended for a course in mathematical statistics for juniors and seniors. We assume that students have had one year of calculus, including Taylor series, and a course in probability. We do not assume students have experience with statistical software so we incorporate lessons into our course on how to use the software. Theoretical Content The topics common to most mathematical statistics texts can be found in this book, including: descriptive statistics, experimental design, sampling, estimation, viii Preface testing, contingency tables, regression, simple linear least squares, analysis of variance, and multiple linear least squares. Also found here are many selected topics, such as quantile plots, the bootstrap, replicate measurements, inverse regression, ecological regression, unbalanced designs, and response surface analysis. This book differs from a standard mathematical statistics text in three essential ways. The first way is in how the topic of testing is covered. Although we address testing in several chapters and discuss the z, t, and F tests as well as chi-square tests of independence and homogeneity, Fisher’s exact test, Mantel-Haenszel test, and the chi-square goodness of fit test, we do not cover all of the special cases of t tests that are typically found in mathematical statistics texts. We also do not cover nonparametric tests. Instead we cover more topics related to linear models. The second main difference is the depth of the coverage. We are purposefully brief in the treatment of most of the theoretical topics. The essential material is there, but details of derivations are often left to the exercises. Finally this book differs from a traditional mathematical statistics text in its layout. The first four sections of each chapter provide the lab’s introduction, data description, background, and suggestions for investigating the problem. The theoretical material comes last, after the problem has been fully developed. Because of this novel approach, we have included an Instructor’s Guide to Stat Labs, where we describe the layout of the chapters, the statistical content of each chapter, and ideas for how to use the book in a course. The design of Stat Labs is versatile enough to be used as the main text for a course, or as a supplement to a more theoretical text. In a typical semester, we cover about 10 chapters. The core chapters that we usually cover are Chapter 1 on descriptive statistics, Chapter 2 on simple random sampling, Chapter 4 on estimation and testing, and Chapter 7 on regression. Other chapters are chosen according to the interests of students. We give examples of semester courses for engineering, social science, and life science students in the Instructor’s Guide. Acknowledgments This book has been many years in the making, and has changed shape dramatically in its development. We are indebted to those who participated in this project. It began with the creation of a modest set of instruction sheets for using the computer for simulation and data analysis. Ed Chow helped us develop these first labs. Chad Heilig helped reorganize them to look more like the labs found here. In the final stages of preparation, Gill Ward helped clarify and improve the presentation of the material. Others have assisted us in the preparation of the manuscript: Yoram Gat prepared many of the figures; Liza Levina wrote the probability appendix; Chyng- Lan Liang prepared the news articles and Gang Liang made countless corrections to the manuscript. Thanks also go to Frank McGuckin, the production editor at Springer, for turning the many idiosyncrasies of our manuscript into a book. Preface ix Several undergraduates assisted in the preparation of the labs. Christine Cheng researched the topics of AIDS and hemophilia for Chapter 6, Tiffany Chiang pilot- tested several experiments for Chapter 5, Cheryl Gay prepared the background material for Chapter 11, and Jean-Paul Sursock helped formulate the investigations of the data for many of the labs. We are very grateful to those who shared with us their data and subject-matter knowledge for the labs. Without them, this book would not exist. Thanks go to David Azuma, Brenda Eskanazi, David Freedman, James Goedert, Bill Kahn, Stephen Klein, David Lein, Ming-Ying Leung, Mike Mohr, Tony Nero, Phillip Price, Roger Purves, Philip Rosenberg, Eddie Rubin, and Desmond Smith. We also thank those who have tried out some of our labs in their courses: Geir Eide, Marjorie Hahn and Pham Quan, and our colleagues Ani Adhikari, Ching- Shui Cheng, Kjell Doksum, John Rice, Philip Stark, Mark van der Laan, and Bin Yu. Similar thanks go to all the graduate students who taught from the ever evolving Stat Labs, including Ed Chow, Francois Collin, Imola Fodor, Bill Forrest, Yoram Gat, Ben Hansen, Chad Heilig, Hank Ibser, Jiming Jiang, Ann Kalinowski, Namhyun Kim, Max Lainer, Vlada Limic, Mike Moser, Jason Schweinsberg, and Duncan Temple Lang. Apologies to anyone left off this list of acknowledgments. This project was partially supported by an educational mini-grant from the University of California, and by a POWRE grant from the National Science Foundation. Finally, we are indebted to Joe Hodges, who allowed us to adopt the title from his book with Krech and Crutchfield, which was an inspiration to us. We also thank John Kimmel, our editor. Throughout this long evolution John patiently waited and supported our project. Berkeley, California Deborah Nolan Berkeley, California Terry Speed March 2000 This page intentionally left blank Instructor’s Guide to Stat Labs The labs you find here in this text are case studies that serve to integrate the practice and theory of statistics. The instructor and students are expected to analyze the data provided with each lab in order to answer a scientific question posed by the original researchers who collected the data. To answer a question, statistical methods are introduced, and the mathematical statistics underlying these methods are developed. The Design of a Chapter Each chapter is organized into five sections: Introduction, Data, Background, In- vestigations, and Theory. Sometimes we include a section called Extensions for more advanced topics. Introduction Here a clear scientific question is stated, and motivation for answering it is given. The question is presented in the context of the scientific problem, and not as a request to perform a particular statistical method. We avoid questions suggested by the data, and attempt to orient the lab around the original questions raised by the researchers who collected the data. The excerpt found at the beginning of a chapter relates the subject under investigation to a current news story, which helps convey the relevance of the question at hand. xii Instructor’s Guide to Stat Labs Data Documentation for the data collected to address the question is provided in the Data section. Also, this section includes a description of the study protocol. The data can be found at the Stat Labs website: www.stat.berkeley.edu/users/statlabs/ Background The Background section contains scientific material that helps put the problem in context. The information comes from a variety of sources, and is presented in nontechnical language. Investigations Suggestions for answering the question posed in the Introduction appear in the Investigations section. These suggestions are written in the language of the lab’s subject matter, using very little statistical terminology. They can be used as an assignment for students to work on outside of the classroom, or as a guide for the instructor for discussing and presenting analyses to answer the question in class. The suggestions vary in difficulty, and are grouped to enable the assignment of subsets of investigations. Also included are suggestions on how to write up the results. Appendix A gives tips on how to write a good lab report. Theory The theoretical development appears at the end of the chapter in the Theory section. It includes both material on general statistical topics, such as hypothesis testing and parameter estimation, and on topics specific to the lab, such as goodness-of- fit tests for the Poisson distribution and parameter estimation for the log-normal distribution. The exercises at the end of the Theory section are designed to give practice with the theoretical material introduced in the section. Some also extend ideas introduced in the section. The exercises can be used for paper-and-pencil homework assignments. Statistical Topics The table below lists the main statistical topics covered in each chapter. All of the basic topics found in most mathematical statistics texts are included here: descriptive statistics, experimental design, sampling, estimation, testing, contingency tables, regression, simple linear least squares, analysis of variance, and multiple linear least squares. We also list some of the additional specialized topics covered in each chapter.

Stat Labs: Mathematical Statistics Through Applications

Using Epidemiological Evidence in Tort Law: a Practical Guide Aleksandra Kobyasheva

Analysis of Scientific Research on Eyewitness Identification

Probability, Explanation, and Inference: a Reply

Patents & Legal Expenditures

Jurimetrics--The Exn T Step Forward Lee Loevinger

Problems with the Use of Computers for Selecting Jury Panels Author(S): George Marsaglia Source: Jurimetrics, Vol

Teaching Statistics to Law Students

Assessing Randomness in Case Assignment: the Case Study of the Brazilian Supreme Court

Improving Ecological Inference by Predicting Individual Ethnicity from Voter Registration Records

Evidence-Based Crime Investigation. a Bayesian Approach

Auditable Blockchain Randomization Tool †

Statistics in Litigation: a Selective Bibliography