<<

A Report on the Future of Author(s): Bruce G. Lindsay, Jon Kettenring, David O. Siegmund Source: Statistical Science, Vol. 19, No. 3, (Aug., 2004), pp. 387-407 Published by: Institute of Mathematical Statistics Stable URL: http://www.jstor.org/stable/4144386 Accessed: 23/06/2008 09:31

Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use.

Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/action/showPublisher?publisherCode=ims.

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission.

JSTOR is a not-for-profit organization founded in 1995 to build trusted digital archives for scholarship. We work with the scholarly community to preserve their work and the materials they rely upon, and to build a common research platform that promotes the discovery and use of these resources. For more information about JSTOR, please contact [email protected].

http://www.jstor.org Statistical Science 2004, Vol. 19, No. 3, 387-413 DOI 10.1214/088342304000000404 0 Instituteof MathematicalStatistics, 2004 A Report on the Future of Statistics Bruce G. Lindsay, Jon Kettenring and David O. Siegmund

Abstract. In May 2002 a workshopwas held at the National Science Foun- dation to discuss the future challenges and opportunitiesfor the statistics community. After the workshop the scientific committee produced an ex- tensive reportthat described the general consensus of the community.This articleis an abridgmentof the full report. Key words and phrases: Research funding, National Science Foundation, challenges, opportunities,statistical education.

1. INTRODUCTION on the future of statistics that was held at the National Science Foundation(NSF) in May of 2002. Evidence is all about us for the currentunique op- portunities for statistics. Consider, for example, the 1.1 The Workshop three pillars of the Mathematical Sciences Priority The was held at the of the Area of the National Science Foundation: handling workshop request Foun- dation and was a scientific massive data, modeling complex systems and dealing organized by committee with uncertainty.All three are primaryinterests of the of nine members. That same committee prepared a discipline of statistics. Never before has statistical final report with the guidance and assistance of the knowledge been more important-nor as widely workshopparticipants and many others.The full report useful-to the scientific enterprise. (83 pages) is available online at http://www.stat.psu. Massive amounts of data are collected nowadays in edu/~bgl/nsf_report.pdfThis articleis an abridgedand many scientific fields, but unless there are properdata updatedversion of the report. collection plans, this will almost inevitably lead to There were about 50 participantsin the workshop, massive amountsof useless data.Without scientifically chosen to representthe breadthof the statistical pro- justified methods and efficient tools for the collection, fession. There were a significantnumber of non-U.S. explorationand analysis of data sets, regardlessof their participants,but for the most partthe workshopand the size, we will fail to learnmore aboutthe often complex full reportwere focused on statistical sciences within and poorly or partlyunderstood processes thatyield the the boundariesof the United States. data. The scientific committeedecided that, for maximum To master this enormous opportunity,the statistics impact, the report should be directed to a wide range communitymust addressthe many challenges that are of audiences.Thus our targetwas not just statistics re- arising. Some of these are intellectualchallenges; oth- searchers,but also such importantsupporting players ers are infrastructural,arising from the changing tides as collaborators,department heads, college deans and of external forces. This document records an attempt funding agencies. by the statistics community to identify and address The workshop was designed to focus on aspects of these challenges and forces. It is based on a workshop the statistics field that were particularlyrelevant to the NSE As a consequence, was not in- Bruce G. is cluded. It is a Lindsay Distinguished Professor,Depart- large and thriving subdiscipline, with ment of Statistics, Penn State University, University many departmentsof biostatisticsassociated with med- Park,PA 16802-2111, USA(e-mail: [email protected]).Jon ical schools throughoutthe United States. Even though Kettenringis AdjunctProfessor Drew University,Sum- it has been excluded, we should point out that research mit, NJ 07901, USA (e-mail: [email protected]). in biostatistics constitutes a major component of the David O. Siegmund is Professor Department of Sta- total researcheffort in statistics. tistics, StanfordUniversity, Stanford, CA 94305-4065, We should also note that the National Science Foun- USA (e-mail: [email protected]). dation (1998) report98-95, widely known as the Odom 387 388 B. G. LINDSAY,J. KETTENRINGAND D. O. SIEGMUND

Report (and so cited here), was an importantdocu- of cautious principles for drawing scientific conclu- ment for our report because of its role in generating sions from data.This principledapproach distinguishes Foundationpolicy. It was writtento providean external statistics from a larger venue of data manipulation, assessment of the needs of the mathematicalsciences organization and analysis. An overarching principle including statistics. dictates that one should provide a measure of the un- Since it is well written and important,and provides certaintyfor scientific statementsbased on data. Such supportfor many of our main points, it is cited often statisticaltools as confidence coefficients, significance in the full report.For example, the reportidentifies the levels and credible regions were designed to provide three activities of mathematiciansas follows: primary easily interpretedmeasures of validity.When used ap- (1) Generatingconcepts in fundamentalmathematics; propriately,these tools help to curb drawingfalse con- (2) Interactingwith areas that use mathematics,such clusions from data. as science, engineering, technology, finance, and Of course, statisticiansdo not own the tools of statis- nationalsecurity; and tics any more than mathematiciansown mathematics. (3) Attractingand developing the next generation of Certainlymost statisticalapplications and much statis- mathematicians. tical researchis carriedout by scientistsin othersubject matterareas. The essential role of statisticalresearch is After substitutingstatistics for mathematics,this tri- to new tools for use at the frontiersof science. chotomy serves well to describe the activities develop primary we will demonstrate of statisticians as well. One fundamentaldistinction In the later sections of this report between mathematicsand statistics lies in the balance the very exciting statistical research possibilities that in recent In the between items (1) and (2), as will be discussed later. have arisen years. particular, possibil- ities for data collection and storage have opened the 1.2 What Is Statistics? need for whole new approachesto data analysis prob- An importantdriving force behindthe workshopand lems. the was the that the role of resultingreport perception 1.3 Our Scientific Domains the statisticsprofession is often only poorly understood by the rest of the scientific community.Much of the in- The scientific domains of statisticalwork are nearly tellectual excitement of the core of the subject comes as wide as all scientific endeavor.In the workshopwe from the development and use of sophisticatedmath- focused on two main areas:the centralpart of our sub- ematical and computationaltools, and so falls beyond ject, which we called the core, came first. The keynote the ken of all but a few scientists. One of our was goals speakerwas lain Johnstoneof StanfordUniversity. to this improve situation. Then came five domains of application: biologi- To fulfill this the first at the work- need, speaker cal science (WarrenEwens, University of Pennsylva- the eminentD. R. Cox of Oxford was shop, University, nia), engineering and industrialstatistics (Vijay Nair, asked to startwith the basics and "Whatis sta- identify University of Michigan), geological and environmen- tistics?"This was to be addressed question repeatedly tal sciences (RichardSmith, University of North Car- the course of the for the sake of throughout workshop information (Werner Steutzle, its wider scientific audience. We summarize some of olina), technology University of Washington),and social and economic the key points here. science (Joel Horowitz, Northwestern University). Statistics is the discipline concerned with the study These categorieswere chosen to correspondroughly to of variability,with the study of uncertaintyand with the differentdirectorates of the NationalScience Foun- the study of decision-making in the face of uncertainty. dation in which the researchis (In writing Whereas these are issues that are crucial throughout supported. the sciences and engineering, statistics is inherently an the report, a sixth domain-physical sciences-was interdisciplinary science. Even though statistics does added due to the increasing role of statistics in this not have its own concrete scientific domain (like rocks, area.) clouds, stars or DNA), it is united through a common In addition, there were talks by Chris Heyde (Aus- body of knowledge and a common intellectual her- tralian National University and Columbia University) itage. and James Berger (Duke University) on "Statisticsin A distinguishing feature of the statistics profession, the InternationalScene" and "Institutes:The Role and and the methodology it develops, is the focus on a set Contributionto Statistics,"respectively. THEFUTURE OF STATISTICS 389

Note that the subject of statistics does not have an However, one of the key conclusions of the partici- agreed-upon division of its heritage into distinct ar- pants of the workshop was that statistics has become eas of research, such as "algebra,analysis, topology more and more distinct from the other mathematical and geometry"in mathematicsor "inorganic,organic, areas. The scientific goals of statisticians and the di- physical and biophysical"in chemistry.There are in- rections of modern science point to a world where least as stead a centralportion of the research,which we have computerand informationscience tools are at of chosen to call the core, and a variety of applications- importantto statistics as those probabilitytheory. of the academicstatistics com- oriented research that we have divided by scientific A substantialfraction field. munity works in departmentsother than statistics.This with statistics In a later section of this article, each of these areas, can occur even in universities depart- where can be found in business schools, save one, will be given an overview in terms of re- ments, they and science across the searchactivity. Unfortunately, social and economic sci- departments spec- trum. In schools without statistics departments,as for ence was excluded from the full report.The difficulty in colleges, there are often statisti- the editorsfaced is thatthis areais not only rathersepa- example four-year cians within the mathematicsdepartment, where they ratedfrom the rest, but is also quite complex. Research needed for education. there workersin the field are most often housed not in sta- are undergraduate Finally, are also many statisticianswho work in biostatisticsde- tistics departments,but instead in such departmentsas partmentsand in medical schools. economics, psychology or sociology. It includes sev- Going beyond the academic community, but well eral domains that have their own mature, specialized connected to it, are many more statisticiansemployed statisticsliteratures, such as and econo- psychometrics in and business, as well as many users of metrics. We felt that an review of this rich government adequate statistics. Regarding the field of statistics, the Odom and area was our time frame and sophisticated beyond Reportstated: resources. com- As a supplementto this report,it is worthnoting that The interactionbetween the academic in and the book Statistics in the 21st Century(Raftery, Tanner munity and users industry govern- and hence there and Wells, 2002) contains 70 papers written by many ment is highly developed, is a disseminationof theoreticalideas of the leading scholars of today. It is recommendedto rapid and of from statisticiansas a valuablecompendium of information, challenging problems applica- as well as a traditionof covering the currentstatus and future directionsof re- tions, interdiscipli- work. searchin a wide variety of statisticaltopic areas. nary Statisticiansare found in governmentagencies from 1.4 The Statistical Community the Census Bureau to the National Institute of Stan- By the nature of their work, statisticianswork in a dards and Technology to the National Institutes of wide arrayof environments.In the United States there Health. They are employed across a wide range of in- are many statisticianswho work in departmentsof sta- dustries, often for quality control work. In particular, tistics. Such departmentsare found at most of the major the pharmaceuticalindustry has been a leading em- researchuniversities. There are now 86 Ph.D. programs ployer of statisticians, who carry out the design and in statistics, biostatistics and . They have analysis of experimentsrequired for drugdevelopment. tendedto focus on graduateresearch, including collab- 2. CURRENTSTATUS oration with other disciplines, and education, as well as undergraduateservice courses. In the full report, the second chapter provides an These departments largely arose by splitting off overview of the history of statistics. Since the main from mathematicsdepartments in the second half of reasonfor its inclusion was to informthe generalscien- the twentieth century. Based on this long term re- tific audience about the developmentof statistics over lationship, statistics is often viewed as a branch of the course of the twentiethcentury, it has been omitted mathematics.This structuralview is evidenced in the from this article, and we proceed to the currentstatus National Science Foundation itself, in which proba- of the profession. For the materialin this section, the bility and statistics is one programof the Division of editors are extremely grateful for the contributionsof MathematicalSciences, placed side by side with such Iain Johnstone, who developed most of the data and "pure"branches as topology and algebra. content as partof his workshopaddress. 390 B. G. LINDSAY,J. KETTENRINGAND D. O. SIEGMUND

2.1 The Quality of the Profession TABLE 3 The Odom Reportprovided a strongendorsement of American StatisticalAssociation (ASA) 16,000 Instituteof MathematicalStatistics 3,500 the of the U.S. effort in statistics, that (IMS) quality stating Biometric (ENAR/WNAR) 3,500 "the statistical Society sciences are very healthy across all sub- AmericanMathematical Society (AMS) 30,000 areas in the United States, which is the clear world MathematicalAssociation of America (MAA) 33,000 leader." Society for Industrialand Applied Mathematics(SIAM) 9,000 At the workshop lain Johnstone presented an in- formal survey of four leading statistics journals (two of which are based in the United Kingdom) to pro- A better measure might be the annual number of vide evidence for this statement. Table 1 shows the statistics Ph.D.'s. However, these counts suffer from affiliation of the U.S.-based authorsin these journals. many of the usual data collection challenges: defini- Approximatelyone-half of the authorshad U.S. affil- tion of population,quality of data and census nonre- iations. Most of these authorsare in academic institu- sponse. Table4 presentsthree rather different numbers tions. Moreover,the vast majoritycome from statistics for statistics, as well as two estimates for the rest of or biostatisticsdepartments, with less than 1 in 10 com- mathematics. The AMS with non- ing from a departmentof mathematicsor mathematical survey acknowledgesproblems from statistics The NSF of sciences. Table2 shows the reportedsources of funding response programs. Survey Earned Doctorates number is derived for this publishedresearch. by aggregating "statisticalsubfields" from the 300 fine cate- Clearlythe National Science Foundationand the Na- nearly which fields of doctorate are categorized in tional Institutesof Health are the majorrole players in gories by this Foundation-wide funding research in statistics. However, as will come survey. If we consider the numberof doctoratesin math ex- up later, this split in our funding is a key factor in di- cluding statistics, there is greater coherence between minishing our presence in either scientific agency. the AMS and NSF surveys, again suggesting problems 2.2 The Size of the Profession with the identificationand collection of data for statis- tics in particular. One way to gauge the size of the statistics profes- The NSF survey does provide data back in time sion is to compare it with the rest of mathematics. that is useful for understandinghow the relationship In Table 3 we the give approximatenumber of mem- between statistics and the rest of mathematics has bers in the statisticsand mathematicssocieties. leading changed over the last 35 years. Figure 1 shows that the These numbersare somewhatdifficult to due compare annualnumber of statisticsPh.D.'s (perNSF definition) to lists and overlappingmembership possible reporting startedat 200, less than 1/3 the numberof mathematics biases, but do thatthe numberof statistics they suggest degrees, but has grown more or less linearly ever since be somewherebetween one-fourth professionalsmight to 800, staying roughly equal with mathematicsin the to one-half the numberof mathematicians. 1980s and falling behind mathematicsslightly since. The American MathematicalSociety annual survey The number of research doctorates is a noisy sur- for 2001 indicates that there are 86 doctoralprograms rogate for the level of research activity. Regardless, in statistics, biostatistics and biometrics (Group IV). one might find it surprisingthat there are three pro- This can be comparedwith 196 programsin other ar- gram directors in the Division of MathematicalSci- eas of mathematics(Groups I, II, III and V). Again, the ences (DMS) for Statistics and Probabilityas opposed numbersare not easy to compare,but do provide some to 19 for all other mathematicalareas. This does not idea of the scale. TABLE4 TABLE 1 TABLE2 AMS Survey 2000 (excluding probability) 310 Statistics 49% NIH 40% Amstat Online 2000 (self reports) 457 Biostatistics 23% NSF 38% NSF Surveyof EarnedDoctorates 2000 822 Industry 6% NSA 9% (accumulatedover statisticalsubfields) Math. Science 5% ARO/ONR/EPA 4% For reference,math excluding statistics Mathematics 4% Other 9% AMS Survey 2000 809 Other 13% NSF Survey of EarnedDoctorates 925 THE FUTURE OF STATISTICS 391

Number of Ph.D. recipients (NSF Survey Earned Doctorates) With the advent of high-speed computers 1400 and sensors, some experimental sciences can now generate enormous volumes of 1200 data-the human genome is an example- 1000 and the new tools needed to organize this dataand extract significant information from it will dependon the mathematicalsciences.

600 Of all the mathematical sciences, the statistical sci- 400 ences are uniquelyfocused on the collection and analy- sis of scientific data. senior statisticianhas felt Subfields" Every 200--+-"Statistical -*-"Other Mathematics" the impact of this startlinggrowth in the scale of data

Year 1968 1972 1976 1980 1984 1988 1992 1996 in recent years. Year 1968 1972 1976 1980 1984 1988 1992 1996 2.3.2 Increased opportunitiesfor scientific collab- FIG. 1. NSF survey on the numberof doctorates by subject mat- oration. A second theme of this is that ter. major report concurrent with the increased demand for statistical knowledge in the sciences, comes an increased pres- seem proportionateto the size of the discipline or its sure for statisticiansto make majortime commitments potential importancein building connections between to gaining knowledge and providingexpertise in a di- DMS and other sciences and engineering. This issue verse set of scientific areas. As noted in the Odom Re- has been discussed with the Their re- DMS leadership. port: sponse is thatthe numberof programofficers in an area is stronglyrelated to the numberof researchproposals Both in applications and in multidiscipli- received in that area, and at this time statistics is ac- naryprojects... thereexist seriousproblems tually overrepresentedrelative to this number,the so- in the misuse of statisticalmodels and in the called proposalpressure. quality of the education of scientists, engi- Thus we are, in a sense, victims of our own suc- neers, social scientists, and other users of cess at being interdisciplinaryand obtaining funding statistical methods. As observationsgener- from other sources, particularlythe National Institutes ate more data, it will be essential to resolve of Health (NIH). This plus the low funding rates in this problem, perhaps by routinely includ- the DMS are the sources of reduced proposal pres- ing statisticianson researchteams. sure. One is that we have a smaller role, consequence The Odom Report furthernoted the scientific prob- and less influence, in this division of the NSF. Another lems of the future will be extremely complex and consequence, to be developed later, is that there is not require collaborative efforts. It states that it will be enough encouragementto do core research,as DMS is virtuallyimpossible for a single researcherto maintain virtuallythe only source of funding. sufficient expertise in both mathematics/computersci- 2.3 The Odom Report: Issues in Mathematics ence and a scientific discipline to model complex prob- and Statistics lems alone. We wholeheartedlyagree with this finding. The Odom Report provided some broad statements 2.3.3 The next generation. In several ways the fu- about the most importantissues in mathematicsas a ture challenges to statistics differ from those of mathe- whole. In this section we discuss them in the context matics. For example, the Odom Report identifies three of the currentstatus of statistics. We will later revisit key issues: these themes in the designatedsubareas. ...the mathematics community in the 2.3.1 Data collection. A majortheme of our report United States shares with other nations sig- is that the statistics profession is experiencing a dra- nificant disciplinary challenges including a matic growth in its scientific value and its scientific condition of isolation from other fields of workload due to changes in science and, in particular, science and engineering, a decline in the data collection. The Odom Reportstated that: number of young people entering the field, 392 B. G. LINDSAY,J. KETTENRINGAND D. O. SIEGMUND

and a low level of interactionwith nonacad- is introspectiveactivity, as noted above, a centralphi- emic fields, especially in the privatesector. losophy of the core is that the importanceof a problem [Emphasisis ours.] is not dictatedby its intrinsicbeauty (as, say, in abstract Rather,its is dictated its It is clear thatthe middle concernis of mathematics). importance by greatimportance for wide or, for its to us. It is our observationthat the numberof U.S. resi- potential application alternatively, value to of the scientific dents the statisticsfield has shrunkenover the expandunderstanding validity entering of our methods. years and the growthin Ph.D. degrees has come largely Through this combination of looking inward and from foreign recruitment. looking outward,the core serves very much as an in- On the other hand, in the opinion of the scientific formationhub. It is defined by its connectivity to, and committee,the Odom Report'sconcern aboutisolation simultaneoususe in, virtuallyall other sciences. That from other fields, scientific and nonscientific, applies core statistical and can be used less and less to the currentstatistics scene. concepts methodology simultaneouslyin a vast range of sciences and applica- tions is a source of efficiency in statistics and, as 3. THECORE OF STATISTICS great a consequence,provides high value to all of science. Outside the collaborativedomain, the core activity Core research might be contrastedto "application- of statisticiansis the constructionof the mathematical, specific statistical research,"which is more closely conceptual and computationaltools that can be used drivenby the need to analyzedata so as to answerques- for informationextraction. Much of the research has tions in a particularscientific field. Of necessity,this re- as its mathematicalbasis probability theory, but the searchdraws on core knowledge for tools as well as for end goal is always to provide results useful in empir- an understandingof the limitationsof the tools. It also ical work. This distinguishes the theoretical research provides raw materialfor futurecore researchthrough efforts of statisticiansfrom most areas of mathematics its unmet needs. in which abstractresults are pursuedpurely for theirin- 3.1 Understanding Core Interactivity trinsic significance.As was statedin the Odom Report: As Johnstone noted in his workshop address, one Statistics has always been tied to applica- way to demonstratethe amazing way that the core ac- tions, and the significance of results, even tivities of statisticsprovide widespread value to the sci- in theoretical statistics, is strongly depen- entific community is to consider data on the citations dent on the class of applications to which of statistical literature.He did offer a strong caution the results are relevant. In this aspect it that citation data should not be overinterpreted,be- stronglydiffers from all other disciplines of cause high citations for individual articles can reflect the mathematicalsciences except computa- things other than quality or intrinsic importance.Just tional mathematics.[Our emphasis.] the same, it is offeredhere because it provides a simple The terminology "core of statistics"is not routinely and accessible measureof the widespreadinfluence of used by statisticiansand so it is useful to describe more statisticalresearch on scientific fields outside of statis- precisely its intended meaning. We define the core of tics. statistics as that subset of statisticalactivity that is fo- The Instituteof Scientific Information(ISI), which cused inward,on the subject itself, ratherthan toward produces the Science Citation Index and its relatives, the needs of statistics in particularscientific domains. created several lists of the most cited scientists in the As a synonym for "core,"the word "inreach"might be 1990s. Based on data provided by Jennifer Minnick, offered. This would reflect the fact that this core ac- ISI (Oct. 11, 2000), 18 of the 25 most cited mathemat- tivity is the opposite of outreach.As such, almost all ical scientists of the period 1991-2001 were statisti- statisticiansare active in both inreachand outreachac- cians or biostatisticians.Citation counts per authorare tivities. given in Figure 2. In addition,the Journal of the Amer- Researchin the core area is focused on the develop- ican StatisticalAssociation was far and away the most ment of statisticalmodels, methods and relatedtheory cited mathematical science journal. (Editors' note: based on the general principlesof the field. The objec- Since the full report the data on citations has shown tives are to createunifying philosophies, concepts, sta- an even more remarkableshift towardstatisticians; see tistical methodsand computationaltools. Althoughthis the website http://in-cites.com/top/2003/index.htmlfor THE FUTURE OF STATISTICS 393

Citationsof MathScientists in 1990s RecentCitations of Efron'sBootstrap Paper

CompSci & Eng (34, 9.8%) Stat Geo & EnviroSci (50, 14.4%) Other Math o & Co Biology Ag (66, 19.0%)

90 ......

0 o - c iiiiiiii Medicine(71, 20.4%)

:irB~:iaiiiiil~~:igiii: il~i18818118881iia-iii i : ...... Social Sci (93, 26.7%) Phys Sci & Others(34, 9.8%) ...... CO ...... FIG. 3. 500 recent citations Efron's paper, 152 were in ...... Of of ...... statistics. The distributionamong sciences of the others is shown ...... above......

specific areas, finds importantideas, and creates the 0 O :::.::1::"8:i::i: 0r-. --~ 0:aar that widen As 0 I:ii 11111188 ir?~: I-::-~- 81188888~ii:: 8188 11118 881118181 ~~~~ 01 necessary generalizations applicability. an example, consider the developmentof methods that 0";: had their origins in age-specific death rates in actuar- O ial work. In 1972 and 1975 the ideas of proportional hazardsregression and partiallikelihood analyses were introduced,which greatly enriched the tools available for the analysis of lifetime data when one has cen- sored data along with covariateinformation. Since that FIG. 2. Citationcounts of the most cited mathematicalscientists. time, these ideas and this methodologyhave grown and spread throughoutthe sciences to all settings where data that are censored or partiallyobserved occur. This a list of the top 10 names; all were statisticiansat the includes astronomy,for example, where a star visible time of this writing.) with one measurementtool might be invisible due to Thereis evidence thatthis high rate of citationof sta- inadequatesignal with a second measurementtool. tistical articles, relative to mathematicsas a whole, is relatedto its wide scientific influence.For example, the 3.2 A Detailed Example of Interplay Hall and which consid- paper by Titterington(1987), The following recent example illustratesin more de- ers the thornyproblem of choosing a smoothing para- tail the theme that the core researchin statistics feeds meter in nonparametricfunction estimation,has about off and interacts with outreach efforts. Since at least of its citations outside definition of the core 2/3 any some of the work is NSF funded,it indicatesin partthe of the IEEE Journal statistics, including journals, of kind of interactionsthat should be kept in mind when Biomedical and Journal de Microscopy, Engineering supportingcore research. in Physique. This is despite its appearance a core re- In 2001 three astrophysicistspublished in Science searchjournal, and its theoreticalcast. (Miller, Nichol and Batuski, 2001) a confirmationof One of the most importantarticles that leapt directly the Big Bang theory of the creation of the universe. from core researchinto the mainstreamof many scien- They studied the imprintof so-called acoustic oscilla- tific areas is the one by Efron (1979) that introduced tions on the distributionof matterin the universetoday bootstrapmethods. An examination of 500 recent ci- and showed it was in concordance with the distribu- tations of this paper shows that only 152 of these ci- tion of cosmic microwave backgroundradiation from tations appeared in the statistics literature.Figure 3 the early universe. It not only providedsupport for the shows the wide dispersal of this innovation that was Big Bang theory, it also provided an understandingof generatedin the core of statistics. the physics of the early universe that enabled predic- Of course, the core also arrives at meaningful and tions of the distributionof matterfrom the microwave useful methods for science because it reaches out to backgroundradiation forward and backwardin time. 394 B. G. LINDSAY,J. KETTENRINGAND D. O. SIEGMUND

The discovery was made using a new statistical The estimationproposal caught the attentionof oth- method, the false discovery rate (known as the FDR), ers because of its potential for threshold selection to detect the oscillations. At false discovery rate 1/4, in wavelet shrinkage methods for statistical signal eight oscillations were flagged as possibly inconsistent processing. Statisticians at Carnegie Mellon Univer- with a smooth, featureless power spectrum.This and sity (CMU) began work on FDR, both as a core statis- furtheranalyses led the authorsto conclude that the os- tics topic and also in collaborationwith astrophysicists cillations were statisticallysignificant departures from Miller and Nichol, two of the above-mentionedastro- a featurelessmatter-density power spectrum. physicists. Initially, they considered signal detection The method was developed through collaboration problemsin huge pixel arrays.Later in their collabora- with two statisticiansand in TheAstronomi- published tion, the physicists recognizedthat this approachwould cal Journal et this the (Miller al., 2001). Using method, apply to the acoustic oscillation signatures,which led authorswere able to make their discovery and publish to the Science article. it in Science Nichol and while (Miller, Batuski, 2001) Miller and Nichol report that when they give talks other were still the competing groups plowing through to the physics community on this work there is great plethoraof data. interestin the FDR CMU It is to trace the of this approach. physics professor interesting history success, Bob Nichol writes, in "I would like to because it illustrates well how the "information part, personally quite the that has hub" 4 illustratesthe route emphasize symbiotic relationship grown operates. Figure migration between the statisticians and here at of the statisticalidea. astrophysicists CMU. It is now becoming clear thatthere are core com- When one tests many hypotheses on the same data mon both sets of domain researchersfind in- set, one must the levels of the tests problems adjust significance of FDR to to avoid spurious rejection of true null terestinge.g. application astrophysicalprob- hypotheses. lems." This "simultaneousinference" problem has perhapsre- In the the mathemat- ceived the most attentionin medical statistics-at least, fact, astrophysicistsappreciate ical of the statistics want to be all of the referencescited as motivationappeared in the beauty (and involved), medical literature.Indeed, the main statistical contri- while the statisticiansclearly relish their role in help- to understandthe cosmos. In additionto these bution here was not to propose the sequential P-value ing joint procedurethat was used in this example per se, which projects,this collaborationalso is drivingseparate new actually went back to Simes in the 1980s (and maybe research in the individual domains. In summary,this earlier)(Simes, 1986), but ratherto establisha convinc- multiway collaborationhas simulated both new joint ing theoreticaljustification. This theoreticaljustifica- research, as well as new separateresearch in the do- tion, the FDR control, led other researchersto propose main sciences. Therefore,it is a perfect marriage! a version for estimation. 3.3 A Set of Research Challenges From Medicine to The Big Bang via FDR What are the researchchallenges facing the core of statistics?Identifying such challenges in statisticsis in- Acoustic Physics-signal Oscillation detection herently different than in other sciences. Whereas in mathematics,for example, much focus has been given to famous lists of problems whose challenge is endur- in the evolve relative Genovese ing, statistics problems always wasserman to the development of new data structuresand new tools. In unlike the Estimatioarys, computational addition, laboratory sciences, statistics does not have big expensive prob- ' o99s lems with multiple labs competing-or cooperating- on majorfrontiers. It is perhapsmore true in statistical Signal7Processi - science than in other sciences that the most important advances will be unpredictable. edical Statisti For this reason we need to maintain an 70's-80's underlying philosophy that is flexible enough to adapt to change. FIG. 4. Illustraton of the migration of statistical ideas into the At the same time it is importantthat this future re- corefrom outside, their generalizationin the core,followed by their search not degenerate into a disparate collection of export to new areas. techniques. THE FUTURE OF STATISTICS 395

One can identify some general themes drivingmod- 3.3.3 Data analysis outside statistics. Many meth- em core arearesearch. The challenges are based on the ods and computational strategies-such as machine developmentof conceptual frameworksand appropri- learningand neuralnetworks-have developedoutside ate asymptoticapproximation theories for dealing with the field of statistics. For the most part these meth- (possibly) large numbers of observations with many ods are not informed by the broader understanding parameters,many scales and complex dependencies. they might gain from integrationinto mainstreamsta- The following subsectionsidentify these issues in more tistics. Thus futureresearch should involve coherently detail. integrating the many methods of analysis for large and complex data sets being developed by the ma- 3.3.1 Scales of data. It has become commonplace chine learningcommunity and elsewhere into the core to remarkon the explosion in data being gathered.It knowledge of statistics. is trite but true that the growth in data has been expo- Following our tradition,this researchcould presum- nential, in data analysts quadraticand in statisticians ably be based on building models and structuresthat linear.Huber's 1994 taxonomy of data sizes, allow descriptionof risk as well as its data-basedas- sessment. It would then include developing principled Small Medium 106 Tiny 102, 104, tools for guided adaptationin the model building ex- Large 108, Huge 1010 ercise. Another possibility was expressed by Breiman (2001), who stated that "If our goal as a field is to use already looks quaint (Wegman, 1995). For example, data to solve problems, then we need to move away a single data base for a single particle physics exper- from exclusive dependence on data models and adopt iment using the BaBaR detectorat the StanfordLinear a more diverse set of tools." AcceleratorCenter has 5 x 1015bytes. 3.3.4 Multivariateanalysis for large p, small n. In There will continue to be research issues at every statisticalapplications there are many scale-we have not solved all for data sets many important problems more variables (p) than there are units being mea- under 100. However,a new of the to sta- part challenge sured (n). Examples include analysis of curve data, tistics is that the mix of issues, such as generalizabil- spectra,images and DNA microarrays.A recent work- and as well as the of ity, scalability, robustness, depth shop titled "High Dimensional Data: p > n in Math- scientific of the will with understanding data, change ematical Statistics and in Biomedical Applications," scale and context. Moreover, it is clear that our re- held in Leiden, Netherlands,highlighted the currentre- search and graduatetraining has yet to fully recognize searchimportance of this subject across many areas of the computationaland other issues associated with the statistics. largerscales. The following more specific example can be offered to illustratehow innovationsin otherfields might prove 3.3.2 Data reduction and compression. We need useful in this problem, thereby reinforcing the idea new "reductionprinciples." R. A. Fisher gave us many that the core continuallylooks outwardfor ideas. Ran- key ideas for data reduction,such as sufficiency,ancil- dom matrixtheory describes a collection of models and larity, conditional arguments,transformations, pivotal methods that have developed over the last 40 years in methods and asymptotic optimality. Invariancecame mathematicalphysics, beginning with the study of en- along later.However, there is a clear need for new ideas ergy levels in complex nuclei. In recent years these to us in areas such as model guide selection, prediction ideas have created much interest in probability and and classification. combinatorics. One such idea is the use of as a "compression" guid- The time now seems ripe to apply and develop these for data The basic idea is that ing paradigm analysis. methods in high dimensional problems in statistics and structural of data is related to our good understanding data analysis. Scientists in many fields work with large to ability compactly store it without losing our abil- data matrices [many observations (n) and many vari- ity to "decompress"it and recover nearly the original ables (p)] and there is little current statistical theory to information.For example, in the domain of signal and support and understand heuristic methods used for di- image data,wavelets are actuallynot optimalfor repre- mensionality reduction in principal components analy- senting and compressingcurved edges in images. This ses, canonical correlation analyses and so forth. suggests the need for new representationalsystems for Early results suggest that "large n-large p" theory bettercompression. can in some cases yield more useful and insightful 396 B. G. LINDSAY,J. KETTENRINGAND D. O. SIEGMUND approximationsthan the classical "large n-fixed p" subjectmatter knowledge and greaterspecialization in asymptotics. For example, the Tracy-Widom distrib- applications.This in turn has created a challenge for ution for "Gaussianorthogonal ensembles" provides statistics by putting the core of the subject under a single distribution,which with appropriatecentering stresses that could with time diminish its currentef- and scaling provides really quite remarkablyaccurate fectiveness in the consolidation of statistical knowl- descriptions of the distributionsof extreme principal edge and its transferback to the scientific frontiers.In componentsand canonical correlationsin null hypoth- essence, the multidisciplinaryactivities are becoming esis situations. sufficientlylarge and diverse that they threatenprofes- sional cohesiveness. 3.3.5 Bayes and biased estimation. The decade of If there is exponential growth in data collected and the nineties broughtthe computationaltechniques and in the need for data analysis, why is core researchrele- the power to make Bayesian methods fully imple- vant?Because unifying ideas can tame this growth,and mentablein a wide range of model types. A challenge the core area of statistics is the one place where these for the coming decades is to fully develop and ex- ideas can and be communicatedthroughout sci- ploit the links between Bayesian methods and those happen ence. That is, core statistics researchis ac- of modem nonparametricand semiparametricstatis- promoting an infrastructure for science from tics, includingresearch on the possible combinationof tually important goal the of view of efficient and commu- Bayesian and frequentistmethodology. point organization nication of advancesin data One clear issue is that for models with huge data analysis. A core of statistics a con- problemswith large numbersof variables,the ideas of healthy (through lively nection with is the best for efficient unbiasednessor "near"unbiasedness (as for the maxi- applications) hope and between do- mum likelihoodestimator) become less useful, because assimilation,development portability of the of data methods that is the idea of data summarizationimplicit in statistical mains explosion analytic As it is a infrastructurefor science methodology becomes lost in the complexity and vari- occurring. such, key ability of any unbiasedmethod. This points to the need generally. In 4 of the full we identifiedand elab- for a more extensive "biased estimation theory" and Chapter report oratedon the and needs for the new theories for huge data problems with large num- following opportunities bers of variables. core: Given their use in all ever-increasing kinds of model- * Adapting to data analysis outside the core. The it is also clear thatthere is a need for buildingexercises, growth in data needs provides a distinct challenge further of Monte Carlo methodsfor inference. analysis for statisticiansto provide, in adequatetime, intel- methods 3.3.6 Middle ground between proof and computa- lectual structurefor the many data analytic tional experiment. Yet another challenge for being developed in other arenas. is theoretical work in the coming decades is to develop * Fragmentationof core research.Outreach activity an agreed-upon middle ground between the pace of high and increasingfor all sorts of good reasons.We proof (too slow), and the swamp of unfetteredcom- think there has been an unintendedconsequence of putational experimentation(too arbitraryand uncon- this growth-a relativeneglect of basic researchand vincing). There are many problems in which rigorous an attendantdanger of our field fragmenting. mathematicalverifications might be left behind in the * Manpowerproblem. There is an ever-shrinkingset developmentof methodologyboth because they are too of researchworkers in the United States who workin hard and because they seem of secondaryimportance. core area research.This manpowerproblem is des- For example, despite many years of work, there are im- tined to grow worse, partlyfrom the generalshortage portant families of statistical models, such as mixture of recruitsinto statisticsand partlybecause outreach models, in which identifiability questions are largely areas are pulling statisticians away from core re- ignored because of the difficult analysis that is involved search. and the ever-widening variety of model structures that * Increasedprofessional demands. The core research must be investigated. of statistics is multidisciplinaryin its tools: It bor- rows from (at least) informationtheory, computer 3.4 Building and Maintaining the Core Activities science and physics as well as from probabilityand Exploitation of the current manifold opportunities traditionalmath areas. As statisticianshave become in science has led to an increased demand for greater more and more data-focused(in the sense of solving 'HE FUTURE OF STATISTICS 397

real problems of modern size and scope), the skills goals include improvedtailoring of medical treatments needed in core areas have gone up. This need for to the individual (e.g., by devising treatmentsuited to ever-increasingtechnical skills provides a challenge the individual'sgenetic makeup),alleviation of malnu- to keeping the core vital as a place for integrationof trition and starvationby improving agriculturallyim- statisticalideas. portantplant species and domestic animals, improved * Research funding. It seems clear that funding for public health, and betterdefense againstbioterrorism. core researchhas not kept pace with the growth of At the risk of oversimplifyingthe many new devel- the subject. Investigators,rather than beating their opments in biological research,it is useful to consider heads against difficult funding walls, turn their ef- four areas where statistical and computationalmeth- forts towardbetter funded outreachactivities or con- ods have played and will continueto play an important sulting. The most basic needs remain the same: to role: nurturetalent, to give senior people time and space * Biomolecular analysis and ge- to think, and to to into sequence functional encouragejunior people buy nomics refers to science based on of DNA this line of research. analysis blocks of and amino * New New ideas for re- sequences (the building genes) funding paths. supporting blocks of and search statistics meet its chal- acid sequences (the building proteins), funding might help of RNA and in various cel- The full contained an extended global profiles proteins lenges. report lular as used to discover the structure and of a that enable states, example funding strategy might evolution of and and their functions statisticiansto enrich their basic statisticalresearch genes proteins, in normal and abnormal An is with without them processes. example interdisciplinaryactivity pulling the identificationof control imbeddedin the too far out of core research. regions genome that govern the amountof proteinproduced and the conditions underwhich it is 4. STATISTICSIN SCIENCEAND INDUSTRY produced. * Genetic epidemiology. The goal of genetic epi- A distinguishingfeature of statistics as a discipline demiology is to understandthe relative importance is its interaction with the entire spectrum of natural of environmentand genetics in human disease. For and social sciences and with technology. This section example, gene mappinginvolves the use of maps of is concernedwith elucidationof the role of statisticsin molecularmarkers throughout the genome of a par- gatheringknowledge across a wide spectrumof possi- ticular plant or animal to locate the genes that con- bilities. Rather than give a broad, and necessarily in- tributeto phenotypes of interest.It is frequentlythe complete, survey of areas in which statistics has had, first step towardbetter understanding and treatment and will continue to have, an impact, this section fo- of those diseases in plants and animals where inher- cuses on topics that illustrateimportant aspects of the itance plays an importantrole. interplaybetween statistics and other scientific disci- * Evolution, population genetics and ecology study plines. the changes that occur at the population level in plants and animalsin responseto randommutational 4.1 Biological Sciences changes in the population's gene pool and changes Building on the foundationsof agriculturaland ge- in their environment.Although originally oriented netic statisticsdeveloped in the first half of the twenti- towardthe study of evolutionaryrelationships (e.g., eth century,biostatistics, statistical epidemiology and the evidence supportingthe hypothesisof a common randomized clinical trials have been cornerstones of African origin of modernhumans), the ideas of pop- the systematic attack on human disease that has dra- ulation genetics are increasingly used to understand matically increased life expectancy in advanced soci- the evolution of bacteria and viruses (to provide ap- eties during the past half century. propriate vaccines and drugs) and the evolution of Recent progress in molecular biology and genet- proteins in different species of plants and animals (to ics has opened entirely new areas of investigation, understand protein structure and function by iden- where for the foreseeable future there will be rapid ad- tifying parts of related proteins in different species vances in understanding fundamental life processes at that have been conserved by evolution). the molecular level. The long term goals of this re- * Computational neuroscience uses modern methods search are application of the knowledge of molecular of neuroimaging (PET, fMRI) to gain understanding processes to entire organisms and populations. These of the functioning of nervous systems. This raises 398 B. G. LINDSAY,J. KETTENRINGAND D. O. SIEGMUND

questions both at the level of small numbersof in- humanresponse to medical interventionproduce an in- teractingneurons and at the level of the entirebrain: creasing demand for statisticians who can communi- Which parts of the brain are activatedunder which cate with biologists and devise new methods to guide conditions? How do the brains of normal and psy- experimentaldesign and biological data analysis. chotic individuals differ in their structure and/or 4.2 Engineering and Industry function? How can we use this knowledge for di- agnosis and treatment? Statisticalconcepts and methods have played a key role in industrial over the last 4.1.1 Statistical and computationalmethods. As a development century. in and in turn,have consequence of this enormous diversity of scientific Applications engineering industry, been for research in statistical problems, an expansive set of statistical, probabilis- major catalysts theory and The richness and of these tic and computationalmethods has proved to be very methodology. variety have influenced the of useful. Some methods have proved themselves in a problems greatly development as a number of areas, while others have more specialized statistics discipline. applications. Much of the early work was driven by the needs of Stochastic processes, from finite Markov chains to the agricultural,manufacturing and defense industries. point processes and Gaussian random fields, are use- In recent years, the scope has expanded substantially ful across the entire spectrumof problems.Because of into business and finance, software engineering, and the large amountof dataproduced [e.g., expressionlev- service and health industries. Applications in these els on a microarrayfor tens of thousandsof genes in a areas include credit scoring, customer profiling, de- sample of individuals, or data from up to a thousand sign of intelligenthighways and vehicles, e-commerce, markers(in the future perhapsone hundredthousand) fraud detection, network monitoring, and software distributedacross the genome of thousandsof individ- quality and reliability. uals], challenging issues of multiple comparisonsfre Global competition and increasingcustomer expec- quently arise in these areas. tations are transforming the environment in which Hidden Markov models and Markov chain Monte companies operate.These changes have importantim- Carlo provide important computational algorithms plications for researchdirections in statistics. Follow- for calculating and maximizing likelihood functions. ing are brief descriptionsof four general examples. Some of these statistical methods are classical (e.g., 4.2.1 Massive data sets with complex structure. principal components, likelihood analysis), but even This cuts across all of business and indus- they may require adaptation(principal curves, likeli- topic parts well as other areas discussed in this report). hood analysis of stochastic processes) to deal with the try (as Business and manufacturingprocesses are becoming large amounts of data producedby modern biological and experiments.Other methods (hidden Markov models, increasingly complex. Consequently, engineers are in need of relevantdata to Markovchain Monte Carlo) have developed relatively managers greater guide than ever before. recently in parallel with the modern computing tech- decision-making At the same advances in and data nology necessary to implementthem. time, sensing A common featureof all the efforts describedabove capture technologies have made it possible to col- amounts of data. These data often have is the amount, complexity and variability of data, lect extensive in the form of time with the result that computation(frequently including complex structure series, spatial dimensions with graphics)is an importantaspect of the implementation processes, texts, images, very high of every idea. In view of the diverse mathematicaland hierarchicalstructure and so on. Collection, modeling computationalbackgrounds of scientists engaged in bi- and analysis of these data present a wide range of dif- ological research,it is importantthat computationalal- ficult researchchallenges. gorithms be made as "user-friendly"as possible. This For example, monitoring, diagnosis and improve- may requiresupport for specialiststo providethe "front ment of advanced manufacturingprocesses require end" and documentationnecessary so that laboratory new methods for data compression and feature ex- scientists can use tools developed by statisticianseas- traction, development of intelligent diagnostics, and ily and correctly. real-timeprocess control. These problemsalso involve In summary,the large amountsof data producedby issues of a generalnature such as selection biases, com- modern biological experiments and the variability in puting, scalabilityof algorithmsand visualization. THE FUTURE OF STATISTICS 399

4.2.2 Large-scale computational models. Compu- 4.2.4 Softwareengineering. This is still a relatively tational models and simulation are being used more new field comparedwith traditionalbranches of engi- and more frequently in many areas of application.In neering. Its importanceto the nation is underscoredby manufacturingindustries, competitive market forces the increasing reliance of the U.S. economy and na- and the concomitantpressure to reduce productdevel- tional defense on high quality,mission criticalsoftware opment cycle times have led to less physical testing (National ResearchCouncil, 1996). and greateruse of computer-aideddesign and engineer- Statistics has a significant role to play in software ing methods. Finite-element analysis and other tech- engineering because data are central to managing the niques are used extensively in the automobileindustry software developmentprocess, and statisticalmethods for productdesign and optimization. have proven to be valuable in dealing with several There are similar trends in semiconductormanufac- aspects of it. To mention a few examples, statistical turing,aircraft, defense and other industries.The com- considerations are essential for the construction and putationalmodels are very high dimensional,involving utilization of effective software metrics, and experi- hundredsand even thousandsof parametersand design mental design ideas are the backboneof technology for variables.A single function evaluationcan take several reducing the number of cases needed to test software days on high-end computingplatforms. efficiently (but not exhaustively). Furthermore,statis- Experimentation,analysis, visualization and vali- tical quality control provides the basis for quantitative dation using large-scale computational models raise analysis of various parts of the software process and a varietyof statisticalchallenges. These include (a) de- for continuousprocess improvement. velopment of experimentaldesigns for approximating 4.3 Geophysical and EnvironmentalSciences and exploring response surfaces in very high dimen- sions; (b) incorporatingrandomness and uncertaintyin The term "geophysicaland environmentalsciences" the design parametersand materialcharacteristics into covers many specific fields of study,particularly if en- the computationalmodel; (c) modeling, screening,pre- vironmentalsciences are taken to include the study of diction and optimization. ecological phenomenaand processes. This broad area of statistical does not have an summa- 4.2.3 and The activity easily Reliability safety. design, develop- rized nor a of In- ment and fabricationof reliable that history simple pattern development. highly products the of statisticalwork in the also meet and environmental an- deed, history geophysical safety goals represent and environmentalsciences is intertwinedwith fields other area of major challenge faced by industry.The as diverse as agriculture,biology, civil engineering,at- traditionalfocus in reliability has been on the collec- mosphericchemistry and ecology, among others. tion and analysis of "time-to-failure"data. This poses The collection and processing of large amounts of difficulties in high-reliability applications with few data are features in many of the major components of failures and high degrees of censoring. geophysical and environmentalsciences such as mete- Fortunately,advances in sensing technologies are orology, oceanography,seismology, the detection and makingit possible to collect extensive amountsof data attributionof climate change, andthe dispersionof pol- on degradationand performance-relatedmeasures as- lutantsthrough the atmosphere. sociated with systems and components. While these Statisticians have been actively involved in all of dataare a rich source of reliabilityinformation, there is these fields, but as statistical methodology has ad- a paucity of models and methods for analyzing degra- vanced to include, for example, complex models for dation data and for combining them with physics-of- spatiotemporal data and the associated methods of failure mechanisms for efficient reliability estimation, computation, the potential for direct interactions be- prediction and maintenance. Degradation analysis and tween statisticians and geophysical and environmental device-level failure prediction are integral parts of pre- scientists has increased enormously. We offer a few ex- dictive maintenance for expensive and high-reliability amples of the types of challenges being addressed. systems. There are also vast amounts of field-performance 4.3.1 Deterministic process models and stochastic data available from warranty and maintenance data models. A substantial amount of emphasis is now be- bases. Mining these data for signals and process prob- ing placed on the tandem use of deterministic process lems and using them for process improvement should models and statistical models. Process models have be a major area of focus. typically taken fundamental scientific concepts such 400 B. G. LINDSAY,J. KETTENRINGAND D. O. SIEGMUND as mass balance in chemical constituents as a foun- 4.3.3 Statistical modeling and scientific conceptu- dation and built up more elegant mathematicalstruc- alization. It is common for changes in environmental tures by overlaying equations that representphysical data recordsto be conceptualizedwithin the statistical and chemical interactions,often in the form of sets of frameworkof signal plus noise. Indeed,this is the case differentialequations. Statistical models, on the other for many of the models discussed above, in which var- hand, typically rely on the description of observed ious forms are given to the signal (or systematic) and data patterns as a fundamentalmotivation for model noise (or error)components of the models to betterrep- there development. Increasingly, is recognition that resent the processes understudy. of and environ- one's understanding many geophysical An example in which statistics has aided in the de- mental can be advanced ideas processes by combining velopment of scientific thinking is the analysis of cy- from these two modeling approaches. cles in the populationsof Canadianlynx and snowshoe One method that has been used to combine process hare, and a series of papers dealing with this has ap- and statistical models is to use the output of deter- peared in the Proceedings of the National Academyof ministic models as input information to a stochastic Sciences (Stenseth et al., 1997, 1998, 2004a, b) and formulation.The full NSF report contains a detailed Science. Here, collaborationbetween statisticiansand illustrationof this process that arose in the analysis of ecological scientists has resulted in a strengtheningof bivariatetime series representingnorthern and south- scientific A numberof concepts have been de- ern hemispherictemperature averages. theory. veloped through this work, including the relationship 4.3.2 Correlated data and environmental trends. between autoregressionorder of a statisticalmodel and environmental Many problems involve the detection the complexity of feedback systems between species and estimation of over time. For an changes example, (i.e., lynx and hare), and the idea that population cy- environmental such as the Envi- monitoring agency cles may exhibit spatialsynchrony. ronmentalProtection Agency uses trend estimates to assess the success of pollution control programsand 4.4 InformationTechnology to areas where more controls are identify stringent The rise of and data needed. In climate a rapid computing large-scale modeling, major preoccupation has human some- is to determinewhether there is an overall trend in the storage impacted many endeavors, times in ways. There has never been a more data, not only for widely studied variablessuch as the profound time for statisticians in areas related global mean temperature,but also for numerousother exciting working variableswhere the conclusions are less clear-cut. to informationtechnology. The of the web and the For statisticians, estimation of trend components development exponen- of have with correlated errors is a problem with a long his- tially increasingcapabilities computersystems tory, and much of this work involved substantialin- opened up possibilities previously undreamedfor the teraction between statisticians and geophysical and exchange of information, the ability to collect and environmental scientists. For example, Sir Gilbert analyze extremely large data sets of diverse nature Walker,known to statisticiansthrough his many con- from diverse sources, and to communicatethe results. tributions to time series analysis and, in particular, The development of open source software magnifies the Yule-Walkerequations, was also a distinguished the ability of researchersto leverage their talents and meteorologist who worked extensively on the El ideas. New challenges in statisticalmodel-building and Nifio-Southern Oscillation phenomenon, and these learningfrom data abound.The remainderof this sec- contributionswere largely the result of the same re- tion highlights a selected set of high-impactareas. search. 4.4.1 Communications. A wealth of communica- A long collaborationbetween statisticiansand geo- physicists has resulted in a series of papers on the tions records is generated every minute of every day. detection of change in stratosphericozone in which Each wireless and wireline call produces a record that a large number of models with correlated errors are reports who placed the call, who received the call, considered.This research,consisting largely of papers when and where it was placed, how long it lasted, and with a statisticianas lead authorbut appearingin jour- how it was paid for. Each user request to download a nals outside of the mainstream statistical outlets, is file from an Internet site is recorded in a log file. Each an excellent illustrationof the outreachof statistics to post to an on-line chat session in a public forum is other scientific fields. recorded. THE FUTURE OF STATISTICS 401

Such communicationsrecords are of interestto net- to be done in a manner that addresses a serious sta- work engineers, who must design networks and de- tistical problem of goodness of fit. Classical statistical velop new services, to sociologists, who are concerned approachesand techniques are typically renderedim- with how people communicateand form social groups, practical by the appearanceat many points of heavy to service providers, who need to ferret out fraud as tailed distributions (often leaving even such stan- quickly as possible, and to law enforcementand secu- dard tools as variance and correlation useless) and rity agencies looking for criminal and terroristactivi- long-range dependence and nonstationarity(stepping ties. beyond the most basic assumptions of classical time There is a host of challenging statistical problems series). that need to be met before the wealth of data is con- Network topology presents different types of sta- verted into a wealth of information. tistical problems. Here the goal is to understandthe structureof the internet. theoretic 4.4.2 Machine learning and data mining. The line connectivity Graph notions, combined with variation over time and also between researchin machinelearning and datamining, combinedwith issues, areneeded to make as carried out primarilyin computer sciences depart- sampling se- rious in this ments, and research in nonparametricestimation, as headway area. carried out primarilyin statistics departments,has in- Network tomographyis about inferring structureof creasingly become blurred.In fact the labels "machine the Internet,based only on the behaviorof signals sent it. learning"and "datamining" are now often used by sta- through Proper understanding,analysis and mod- tisticians. Primaryareas of very active researchwithin eling of the complex uncertainties involved in this statistics departmentsinclude new methods for classi- process are importantto make headwayin this area. and fication, clustering predictivemodel-building. Sta- 4.4.4 Data streams. Statistical analyses of large tisticians have been classification developing tools for data sets are often performed in what is essentially a but the in long time, explosion computationalabil- batchmode. Such data sets may requireyears to collect with the fruits of recent researchhave led to ity along and prepare,and the correspondingstatistical analyses some new advances. important may extend over a similar period of time. However, One such new advancein classificationthat takes ad- real-time data mining offers a rapidly growing niche of these facts is vector machines. The vantage support for statisticians.Such situations arise, for example, in methodis science ma- highly popularamong computer remote sensing, where limited bandwidthbetween an chine communities, but has benefited learning greatly orbitingsatellite and its groundstation precludes trans- from input statisticians, who have contributedin by mission of all raw data. A second example is commer- important to the of the ways understanding properties cial web sites such as an airline reservation system, method.However, there are for importantopportunities where detailed keystroke sequence data leading to ac- further of understanding both the theoreticalproperties tual or abortivereservations is not saved. of this tool, and the most and efficient appropriate way The challenge is to create statisticaltools that run in to use this tool to recover informationfrom in a data, almost linear time, that is, to design tools that can run broadvariety of contexts. in parallelwith the real-timestream of data.For simple 4.4.3 Networks. The study of internettraffic can be statistics such as sample moments, there are no diffi- roughly divided into traffic measurementand model- culties. However, these tools must be able to adapt in ing, network topology, and network tomography.All real time. Furthermore,data mining makes use of vir- of these areaspresent large-scale statisticalchallenges. tually every modern statisticaltool (e.g., clusteringal- Furtherresearch in measurementand modeling is gorithms,trees, logistic regression).Transforming and motivated by the need to jointly improve quality of recasting the statisticaltoolbox into this new and fun- service and efficiency. The currentapproach to qual- damentallyimportant setting will requireimagination, ity of service is based on massive overprovisioningof cleverness and collaborationwith algorithmicexperts resources, which is both wasteful and not completely in other areas of the mathematicalsciences. effective, because of bursts in the traffic caused partly 4.5 Physical Sciences by improperprotocol and routingprocedures. Because many ideas for addressingthese problems have been Historically,astronomy is one of the first and most proposed, there is a major need for comparison,now importantsources of inspirationfor, and applicationof, done typically by simulation. This requires modeling statisticalideas. In the eighteenthcentury astronomers 402 B. G. LINDSAY,J. KETTENRINGAND D. O. SIEGMUND were using averages of a number of measurements elusive particles shielded out yields a "background" of the same quantitymade under identical conditions. count of y events. What is an upper confidence limit This led, at the beginning of the nineteenthcentury, to for the true rate of the particles of interest?Statistical the method of least squares. issues become particularlysensitive if y exceeds x, so Astronomy has expanded enormously, both in the thatthe unbiasedrate estimateis actuallynegative. The size and complexity of its data sets, in recent years so question then is whether the upperconfidence limit is as to estimate the Big Bang cosmological parameters sufficiently positive to encourage furtherdetection ef- from the anisotropicclustering of galaxies, the fluctu- forts. ation spectrum of the cosmic microwave background Even in its simplest form-actual situationscan in- radiation,and so forth. A host of other basic statistical volve much more elaboratebackground corrections- problems arises from the Virtual Observatory,a fed- this problem has attractedwidespread interest in the erationof multiterabytemultiwavelength astronomical physics community. A much quoted reference is survey databases. Feldman and Cousins (1998). Louis Lyons, profes- Despite the common origins of statistics and astron- sor of physics at Oxford, organizeda September2003 omy, and our mutualinterest in dataanalysis, only very conference at the StanfordLinear AcceleratorCenter recently have there been substantialcollaborations be- devoted to statistical problems in particle physics, as- tween statisticiansand astronomers.(One example of trophysics and cosmology (www-confslac.stanford. this type was presentedearlier, in our discussion of the edu/phystat2003/). core.) 4.5.2 Comparative experiments in chemical spec- This between the fields of statis- long-standinggap troscopy. Richard Zare of the Stanford chemistry tics and a common in astronomy exemplifies pattern faculty developedan advancedclass of mass spectrom- the physical sciences. Statistics works by the efficient eters able to simultaneously time the flights of large accrualof evidence from noisy individualinformation volumes of massive particles. This permits compar- sources. In large partthe historicalspread of statistical isons between collections of particles obtained under methodology can be described as "noisy fields first": different conditions, for example, complex molecules vital statistics, economics, agriculture,education, psy- grown in differentchemical environments. medical chology, science, genetics and biology. The A typical spectrum consists of particle counts in "hardsciences" earnedtheir name from the almost per- binned units of time, perhaps 15,000 bins in a typ- fect signal-to-noiseratios attainable in classical experi- ical run. Comparing two such spectra, that is, look- mentation,so it is understandablethat they have proved ing for bins with significantcount differencesbetween to be the most resistantto statisticalmethodology. the two conditions, is an exercise in simultaneoushy- However, recent trends are softening the hard sci- pothesis testing. With 15,000 bins, the simultaneity ences, and so there is an increasing need for statisti- is massive. Statistical methodology originally devel- cal principles and methods. Technology now enables oped for microarrayanalysis can be brought to bear bigger and more ambitious data-gatheringprojects on spectroscopycomparisons, but the relationshipbe- such as those of the Sudbury neutrino observatory tween time bins is unlike that between genes, suggest- and the Wilkinsonmicrowave anisotropy probe. These ing that new methodology will be needed. projects must extract crucial nuggets of information 4.5.3 Survival analysis and astronomy. In a spec- from mountainsof noisy data. (The signal-to-noisera- tacular of parallel development, astronomy tio at Sudburyis less than one in a million.) Unsurpris- example and biostatistics invented closely related theories to ingly, statisticalmethods play a big, sometimes crucial the field called survival role these deal with missing data, analy- in projects. sis in the statistics literature.The reasons for the miss- To illustratethe promisingfuture role of statistics in were different:astronomers are earth-boundso the physical sciences, we offer three brief statistics- ingness they cannot observe events too dim or too far away, intensive examples, from particle physics, chemical leading to data "truncation."Data "censoring"occurs spectroscopyand astronomy. in medical trials when subjects fail to record a key 4.5.1 Confidence intervals in particle detection. event, such as relapse or death, before the end of the The following situation arises in the search for elu- trial. Lynden-Bell's method and the Kaplan-Meieres- sive particles:a detectorruns for a long period of time, timate, the astronomy and statistics solutions to the recording x interesting events; a similar run with the missing dataproblem, are essentially the same. THE FUTURE OF STATISTICS 403

Mutual awarenessdid not occur until the 1980s. An * The large amountsof dataproduced in almost all ar- influential series of joint astronomy-statisticsconfer- eas are producing an increasing demand for statis- ences organized at Penn State by Babu and Feigelson ticians who can communicatewith nonstatisticians, led to collaborations and progress in the statistical while devising new methods to guide experimental analysis of astronomicaldata. For example, the extra- design and data analysis. galactic origin of gamma-raybursts was demonstrated * As noted more extensively in the full report, there via survivalanalysis before the burstscould be identi- exists a software challenge that touches deeply in fied with specific physical sources. a number of areas. It correspondsto a wide need for statisticalmethods to be incorporatedinto open 4.6 Enhancing Collaborative Activities source softwareproducts, and a parallellack of sup- A distinguishing feature of the intellectual orga- port for this infrastructureneed. nization of statistics is the value placed on individ- * There exists a need for coordinated long term ual participationboth in the developmentof statistical funding for interdisciplinaryprojects so that the methodology and on multidisciplinaryactivities (e.g., statistician can afford to develop the scientific un- in applications of statistics in biology, medicine, so- derstandingvital to true collaboration. cial science, astronomy,engineering, governmentpol- icy and national security), which, in turn, becomes an 5. STATISTICALEDUCATION important stimulus for new methodology. Although We have identified a growing need for statistics and different people strike different balances between statisticians from wide areas of science and industry. methodological research and subject matter applica- As noted the Odom Report,"There is ample profes- tions, and the same people strike differentbalances at by sional for people in statistics, both different times in their careers, essentially all statisti- opportunity young in academia, industry,and government."At the same cians participatein both activities. the cannot meet demandwith domes- Statistics, throughthese interactions,develops tools time, profession tic candidates. from the Odom Report, "A very that are critical for enabling discoveries in other sci- Again, of students are foreign-born ences and engineering.Statisticians are also instrumen- high proportion graduate and remain in the United States upon gradua- tal in unearthing commonalities between seemingly many tion." unrelated problems in different disciplines, thereby At the same time that the demandson the profession contributingto or creating synergistic interactionsbe- therehas been a star- tween differentscientific fields. have grown in the researcharena, in demandin statisticaleducation at lower However, as noted by the Odom Report, our reach tling growth has not been wide or far enough. As they said: levels: * The statistics has startedto feel the im- Both in applications and in multidiscipli- profession of the wide of statistical training in nary projects, however, there exist serious pact growth led the of an Ad- problems in the misuse of statistical mod- grades K-12, by implementation Placement course in statistics. This els and in the quality of educationof scien- vanced (AP) students are to with tists, engineers, social scientists, and other means many coming college users of statisticalmethods. As observations substantialknowledge about statistics. enrollments in statistics were generate more data, it will be essential to * Undergraduate up 1990 and 2000. resolve this problem, perhaps by routinely sharply(45%) between including statisticianson researchteams. These circumstancespoint to the need for the pro- how to handle this One problemis that statisticianswho attemptto par- fession as a whole to consider ticipate widely in these activities face several steep growth and how to build a in- challenges, includingthe need to stay currentin all rel- frastructurethat can meet the changing and growing evant fields and the need to provide the relevant soft- needs. The NSF workshop was not designed to give ware to carry out statistical analyses. In addition, in detailed considerationto the problemsof statisticaled- the spite of a culturethat encouragesmultidisciplinary ac- ucation. However, it was clear from the outset of tivities, evaluatingthese activities has proven difficult meeting that many of our currentand futurechallenges and sometimes controversial. are arising in the domain of education. As a result, In summary,we note that: a chapterof the full NSF reportwas dedicated to this 404 B. G. LINDSAY,J. KETTENRINGAND D. O. SIEGMUND topic. The goal of the chapterwas to describe the cur- * Thereis a growingneed for postdoctoraltraining po- rent issues, as we saw them, and to invite furtheraction sitions and other continuingeducation opportunities by the community.Here is a summaryof our sugges- to help newly minted graduatesdevelop their pro- tions. fessional skills and help older statisticiansstay up to 5.1 Educational Reform date. Postdoctoraltraining should receive higherpri- ority in the statisticscommunity as a sensible way to Clearly, the long-range solution to the shortage of launch research careers and broaden interests. The statisticians must lie in improvementsto the educa- NSF's VIGRE plus NISS and SAMSI postdoctoral tional system that will attract,train, retain and reward programsare signals of growthin this area. a new generation of talented students. Improvements In all of the educationalreform we have de- will be requiredacross the spectrumfrom elementary aspects it is that decisions are made school throughcontinuing education of the workforce. scribed, important using sound and our own The pool of K-16 teachers who are qualified to teach experimentation good data, special statisticsneeds to be increased. forte. We have identified the following key issues and 5.2 Recruitment needs in the area of educationalimprovements: The recruitmentof the next generationdefines a fur- " Adequately trainedteachers are needed for the sta- ther set of challenges to our community.The numberof tistics AP as well as courses, statistically literate people with trainingin statistics is not growing nearly instructorsin other in subjects grades K-12. Con- fast enough for the exponentialgrowth in the demands siderable effort continues to be given to training for statistical expertise. This trend must be changed teachersto take on AP statistics. K-12 Still, the need dramaticallyto meet the high demands for statistical exceeds the with the col- supply.Dealing K-12 (and expertise for today's complex, large, interdisciplinary of trainedteachers well be lege level) shortage may researchproblems in science and engineering. the numberone priorityin statisticseducation. There is no doubt that recruitmentat the lowest lev- * There is a need for an K-16 curriculum integrated els has been helped by the AP course. At the same that accounts for the in school improvement high time, programsfor enhanced recruitmentthat are fo- statistics note that statistics training.(We improving cused on the mathematicsprofession as a whole, such is one of a to literacy just piece largerchallenge pro- as VIGRE, are very promising, but have many times vide quantitativepreparation in this country.) lacked to the needs of statistics. * sensitivity special Colleges and universities need expanded statistics Much of what needs to be done in educationand re- minor and in both and majoroptions undergraduate cruitmentwill requirelarge investments.It is naturalto With the demandfor graduateprograms. undergrad- look to the NSF for leadershipand support.The time uate statisticscourses on the rise, it is naturalto only for action is now, as evidenced by such problems as increased on a expect emphasis providing meaning- the of statisticiansin industriesand many ful of shortage key package knowledge. Special pressure points concerns about na- can be seen in the demandfrom the government agencies, heightened pharmaceutical, tional security,and the increasedreliance at the college biomedical and governmentstatistical communities level on adjunctsand instructorswho are not members (including those involved in homeland security and of the regularfaculty. nationaldefense matters). * We need to encourage and enable students, at all 6. A SUMMARYWITH RECOMMENDATIONS levels, to acquire deeper and broader subject mat- ter knowledge in an area or areas of application.It The field of statistics has made profound contri- has become increasingly difficult to be a successful butions to society over the past century. Its impact statistician without involvement in a focused area of is felt across virtually all branches of science, medi- science and technology. cine, industry and government. The strong growth * At the graduate level, there is a large challenge of the field-stimulated in large part by advances in to build training programs that can offer sufficient computing technology-has also caused strains and depth over the wide breadth of tools that the modern frustrationsas the opportunitieshave increasedand the statistician is now using. This includes large chunks supportinginfrastructure has struggled to keep pace. of mathematics, computer science and basic science. This reportattempts to highlight many of the contribu- Unfortunately, the needs are expanding in all these tions that statistics has made and to identify priorities areas. for continuedprogress. THE FUTURE OF STATISTICS 405

Statistics is itself a science-the science of learn- that, wherever it is housed, statistics can flourish ing from data. It is grounded in a still growing core without unproductiveconstraints. of knowledge that reflects its roots in probabilityand " Increase support for, and the autonomy of, the NSF mathematics, and also the more recent influence of statistics program. To avoid stifling the momentum computer science. Statistics both draws from these evident today in statistics (and partiallydocumented roots and feeds back to them new mathematicaland in this report) and to reap the benefits of the mul- computationalquestions. Statistics is also an unusu- titude of opportunitiespresenting themselves, there ally interdisciplinaryfield. Indeed, applications are are compelling reasons for providing a substantial its lifeblood: they stimulate research on new theo- boost in resourcesthat supportstatistics at the NSE below for some In we ries and methods while providing valuable outlets (See specific needs.) addition, thatthe NSF the DMS statistics for established techniques. Among the highest pri- suggest provide pro- with increased within its currentor- orities for statistics today is adapting to meet the gram autonomy structure.This would be a needs of data sets that are so large and complex that ganizational logical step toward full division status that feel is new ideas are required,not only to analyze the data, many already overdue. but also to design the experiments and interpretthe * more flexible funding models. The creation experimental results. These problems are often the Develop of the new Statistics and Applied MathematicsIn- source of the widespread interdisciplinarycollabora- stituteis an excellent example of creativenew fund- tions-from astronomyto public policy to zoology- ing needed by the statistics profession. The needs, which statisticiansengage in today. however, are not institutional. Increasingly, A substantial of the describes con- solely proportion report individual researchers are becoming involved in tributionsthat statistics has made to other fields, in- complex cross-disciplinaryprojects or in activities and social as cluding physical, biological sciences, that are more akin to runninga laboratorythan doing well as and sciences. The mo- engineering computer individual research. One implication of this move- tivation for this is to the often emphasis help clarify ment is the need for learningadvanced programming misunderstoodrole of statistical science and to illus- techniques and development of sophisticated user- trateits in a of impact variety ways. Special attentionis friendly software.We propose that the NSF develop also to given statisticseducation because thereare sub- novel funding arrangementsthat would encourage stantial across-the-boardopportunities for advancing these new ventures while being careful not simply the way the subjectis taughtand studentsare trained. to extractthese monies from the individualresearch The sense of the workshop and the feeling of the grantpool. leadership in the profession is that enormous oppor- * Strengthen the core of statistics research. The risk tunities lie ahead for statistics. The realizationof this of fragmentationof the core of statistics has in- potential,however, will not come easily. Resources are creased substantiallyas the field has diversifiedand too limited, the pipeline of studentsis too small and the expanded. More attentionmust be given to consol- infrastructuresupporting the field is too constrained. idation of knowledge and the development of new To deal with these and other challenges, the following theories and methods with broad applicability.We recommendationsare put forth: urge the NSF to take responsibilityfor providingthe level of to the statistics * supportnecessary strengthen Promote understanding of statistical science. Statis- core. tics is hardto At the it falls pigeonhole. NSF, largely * Improve the support of multidisciplinary research under the mathematical sciences, yet most statis- activities. Much of the excitement in science to- ticians would agree that statistics is not a branch day stems from research that involves multiple dis- of mathematics. Modern statistics is also close to ciplines. While statistics comes by this naturally,it computer science, especially machine learning, yet often suffers from inclusion as an afterthought or most statisticians would agree that statistics is not exclusion as a minority player without a significant a branch of computer science. Statistics is a sci- role. We encourage the NSF to experiment with new ence in itself, and attemptsto group it here or there vehicles for funding this type of research and-when ultimately exacerbate misunderstandings about the appropriate-to assure a role for statistics. For ex- field. Statisticians need to take responsibility for ample, in many cases statisticians should be part- more effectively articulating the unique capabilities ners in projects with complex design and data analy- of their discipline. The NSF can help by assuring sis components. For such collaborations to succeed, 406 B. G. LINDSAY,J. KETTENRINGAND D. O. SIEGMUND

statisticianswill need to have the time and support start with, we are very indebted to the National Sci- to understandthe subject area. ence Foundationfor supportof the workshop.Particu- * Develop new models for statistics education. The lar thanksgo to MarianthiMarkatou and John Stufken growth of AP statistics courses in high schools, the for their tremendousassistance at the time of the work- burgeoning enrollments in undergraduatestatistics shop and, more importantly,their strong enthusiasm courses and majorimprovements in computingtech- and encouragementfor its development. nology for data analysis underscore the need for The startingpoint for this reportwere the talks and reevaluationof the entire K-16 approachto statistics writtendocuments that the theme speakersat the work- education.Graduate training is also due for reassess- shop provided. After these presentations, the work- ment: Keeping the right balance between trainingin shop participantsdivided into small groups according the directionof statisticsin the the core parts of the science, preparingstudents for to theme and discussed The was based on a com- cross-disciplinarywork and incorporatingrelevant theme areas. report prepared written material on the various that parts of computer science into the curriculumare pilation of topics was the editors. The editors also created among the contributingfactors to the awkwardbal- organizedby materialthat would a wider view of the field as ancing act that departmentsface today. The role of provide well as between the areas. postdoctoraltraining and continuingeducation more linkage topic 3 and 4 of the full which are Sec- generally should also be part of the updatedvision. Chapters report, tions 2 and 3 of this article, dealt with the current To help the statisticscommunity develop appropriate status of statistics and the core of statistics. new models for education, and to do it both holis- They were constructedbased on speakingnotes preparedby tically and systematically,we suggest that the NSF D. R. Cox and lain Johnstone.Special thanksgo to sponsor or supporta series of focused, coordinated Iain Johnstonefor the many figures in these chapters. workshopson statisticseducation with the aim of de- Membersof the Scientific Committeewho were des- veloping concrete plans for reform on these various ignated as leaders for the various topic areas were in fronts. It would be naturalto carry out this under- charge of preparingindividual reports based on their takingin collaborationwith the scientific and educa- committee's In Section 5.1 of the full tional that share for and input. particular, organizations responsibility was on sciences and was directed concern about statisticseducation. report biological by David Siegmund; Section 5.2, engineering and in- * Accelerate the recruitmentof the next generation. dustry,by Sam Hedayat;Section 5.3, geophysical and Workshopparticipants to short- pointed repeatedly environmental sciences, by Larry Brown; and Sec- ages in the of students and unmet demand pipeline tion 5.4, informationtechnology, by Grace Wahba.In from key industriesand laboratoriesand government addition,Bradley Efron and Eric Feigelson contributed The solution to this agencies. long-range problem materialto Section 5.5 on the physical sciences. These must lie in improvementsto the education system, sections are numbered4.1-4.5 in this article. even in school and starting elementary continuing Jon Kettenringprepared Chapter 6, found here as into high school and undergraduateprograms. How- Section 5. Since education was not one of the themes of this will take much time and in- ever,changes type of the workshop itself, material for this chapter was vestment. the Meanwhile, shortagemay prove quite derived from secondary sources. Special thanks go to damaging to the nation's infrastructure,especially David Moore and Dick Schaeffer for their assistance in this period of heightenedconcerns about national in this area. Jon was also responsible for the material defense and security-areas to which statistics has in the summarysection of this article. (It was the exec- much to offer. Novel special programsdesigned to utive summaryin the full report.) spur interestin undergraduateand graduatetraining In addition,many helpful suggestions have been re- in statistics should be considered.We encouragethe ceived since the materialwas first put into draftform, NSF to join forces with leaders of the statisticspro- both from workshopparticipants and others.The report fession to help solve the pipeline problem. has been greatly improvedas a result of their efforts.

ACKNOWLEDGMENTS REFERENCES The editors wish to thank the many people who BREIMAN, L. (2001). Statisticalmodeling: The two cultures(with contributedto the production of the NSF report. To discussion). Statist. Sci. 16 199-231. THE FUTURE OF STATISTICS 407

EFRON,B. (1979). Bootstrapmethods: Another look at the jack- RAFTERY,A. E., TANNER, M. A. and WELLS, M. T., eds. (2002). knife. Ann. Statist. 7 1-26. Statistics in the 21st Century.Chapman and Hall/CRC Press, FELDMAN, G. J. and COUSINs, R. D. (1998). Unified approach London. to the classical statisticalanalysis of small signals. Phys. Rev. SIMES, R. J. (1986). An improvedBonferroni procedure for mul- D 57 3873-3889. tiple tests of significance. 73 751-754. N. BJORNSTAD, N. and HALL, P. and TITTERINGTON,D. M. (1987). Common structure STENSETH, C., FALCK, W., O. KREBS, C. J. in snowshoe hare of techniquesfor choosing smoothingparameters in regression (1997). Population regulation and problems.J. Roy. Statist. Soc. Ser. B 49 184-198. Canadianlynx: Asymmetricfood web configurationsbetween hare and lynx. Proc. Nati. Acad. Sci. USA94 5147-5152. MILLER,C. J., NICHOL,R. C. and BATUSKI,D. J. (2001). STENSETH,N. C. et al. From to Phase Acoustic oscillations in the universe and Science (1998). patterns processes: early today. and in the Canadian 292 2302-2303. density dependencies lynx cycle. Proc. Natl. Acad. Sci. USA 95 15430-15435. MILLER,C. J. et al. (2001). the rate in Controlling false-discovery STENSETH,N. C. et al. (2004a). Snow conditionsmay createan in- data Astronomical 122 3492-3505. astrophysical analysis. J. visible barrierfor lynx. Proc. Natl. Acad. Sci. USA 101 10632- NATIONALRESEARCH COUNCIL (1996). Statistical SoftwareEn- 10634. gineering. National Academies Press, Washington. STENSETH,N. C. et al. (2004b). The effect of climatic forcing on NATIONAL SCIENCE FOUNDATION (1998). Report of the senior populationsynchrony and genetic structuringof the Canadian assessment panel of the internationalassessment of the U.S. lynx. Proc. Natl. Acad. Sci. USA 101 6056-6061. mathematicalscience. Report 98-95, National Science Foun- WEGMAN, E. J. (1995). Huge data sets and the frontiersof com- dation, Arlington,VA. putationalfeasibility. J. Comput.Graph. Statist. 4 281-295.

Comment Peter Bickel

As a member of the committee charged with ini- are playing, and I expect will continueto play, a major tiating this report, but who, for various reasons, was role in security concerns as well as presenting major unable to do much, I know how much effort went sources of novel problems.I do not know how it could into the workshop that led to this report and its sub- have been done, given space limitations, but I would sequent putting together.The profession owes a great have liked to see more detailed connections drawnbe- debt to Jon, Bruce and David, as well as to Marianthi tween the core and the various subject matter areas, Markatouand John Stufken, who proposed this work- and between the subject matterareas themselves such shop. Having said this, I have to admit that I am un- as the one above or between truncationand censoring comfortablewith the abridgedreport as a presentation in survival analysis, reliability theory, astronomy and of the scope of statistics, as opposed to a well-justified other examples. I also believe the connections to the appeal for a larger role for statistics at the NSE The other mathematicalsciences are understated.For in- focus on the lattergoal led to the eliminationof an ex- stance, compression is an idea that, I believe, comes tensive discussion of biostatistics,which, as the authors from and is highly developed in information theory. is admit, perhapsthe biggest field of employment and The tools needed to analyze graphs have been stud- of much research for statisticians, as well as discus- ied by graphtheorists and specialists in discretemathe- sions about the very importantfields connected with matics for some time; optimization,efficient algorithm public policy such as sampling and econometricmod- construction, data base construction, not to mention These fields are eling. being just as overwhelmed by both practicaland theoreticalaspects of machinelearn- the data flood as the rest of society. That questions ing, are all the province of computerscientists as well. from these areas will not a arising play major role in I would agree that what has made statistics so excit- our futureseems doubtfulto me. For instance,statistics ing is the growing ubiquity of stochastic models in of social networks-questions of samplingin graphs- almost all fields of human endeavor.As inheritorsof the traditionsof formulatingand analyzing such mod- PeterBickel is Professor Departmentof Statistics, Uni- els, in fact of thinking in these ways, we have a lot versity of California, Berkeley, CA 94720-3860, USA to bring to the table and an importanteducational re- (e-mail: [email protected]). sponsibility.However, to really participate,we need to