The Present and Future of Statistics Challenges and Opportunities
Marie Davidian
Department of Statistics
North Carolina State University
http://www4.stat.ncsu.edu/~davidian

New York Times, August 6, 2009
“I keep saying that the sexy job in the next 10 years will be statisticians” – Hal Varian, Chief Economist, Google

McKinsey Report, 2011
2011 McKinsey Global Institute report: Big data: The next frontier for innovation, competition, and productivity
“A significant constraint ... will be a shortage of ... people with deep expertise in statistics and data mining ... a talent gap of 140K - 190K positions in 2018 (in the US)”
http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation

The Wall Street Journal, March 1, 2013

Advanced placement statistics
[Figure: AP Statistics Exam Participation, 1997−2013. Number of Students (thousands) by Year.]

Rock star
Nate Silver attracted ∼4,000 statisticians to his President’s Invited Address at the 2013 Joint Statistical Meetings

Bestsellers

The Wall Street Journal, November 15, 2013
“Statistics is cool” – Ron Wasserstein, ASA Executive Director

Rock stars
2013 MacArthur Fellow Susan Murphy
2013 Prime Minister’s Prize for Science (Australia) recipient Terry Speed
Sir David Spiegelhalter (aka “Professor Risk”)

Essential
“Statistical rigor is necessary to justify the inferential leap from data to knowledge”

Big opportunities
The opportunities for statistics to have major impact are endless

Big challenge
Big Data and data science

Big hype

Big hype, receding
“We are now past the ‘peak of inflated expectations’ of the hype cycle” (http://en.wikipedia.org/wiki/Hype_cycle)

Investment

What is data science?

Big challenge remains
• Perception that other disciplines are more relevant to data-driven science and business
• Computer science, machine learning, mathematics, engineering, physics, analytics, ...
• Perception that statistics is old-fashioned and rigid
• Institutes, centers, degree and certificate programs; in many cases statistics is MIA

For example
“Statistics departments and journals still strongly emphasize a very narrow range of topics and methods and techniques, all driven by a tiny handful of results, many dating from the 1930s ... the older zombie methods persist in the statistics literature and teaching.”

For example
“My impression is that scientists view statistics not so much as a science but as a ‘bag of tools.’ You have a visibility problem in Science and AAAS.” – Alan Leshner, Chief Executive Officer of the American Association for the Advancement of Science (AAAS), to representatives of the ASA in 2011
Science, February 11, 2011

Weird juxtaposition
• We continue to have daily impact in our critical collaborative roles
• We have gotten lots of great press
• Enrollments in undergraduate programs and applications to graduate programs are skyrocketing
• But statistics is still often absent from the discourse on Big Data and data science
• And the importance of statistical rigor is sometimes lost on fellow scientists
• Statistics and statistical principles continue to be misunderstood

ASA impact
The American Statistical Association has undertaken numerous initiatives to
• Promote the role of statistics in all the sciences
• Highlight the unique perspectives statistics brings to business and policy
• Increase awareness of opportunities for statisticians among students
and thereby enhance our impact on science and society

Key ASA Staff
Ron Wasserstein, Steve Pierson, Jeff Myers

AAAS and Science
AAAS – A focal point for science
• World’s largest general scientific society, ∼120K members
• 261 affiliated scientific societies (including ASA)
• Publisher of Science
• 24 AAAS sections, including Section U on Statistics
2013 presidential initiative
• Raise the profile of statistics within AAAS

AAAS and Science
• Button campaigns encouraging statisticians to join AAAS
• Section U membership increased 14% from 2012 to 2013
• Section U invited session proposals for AAAS Annual Meetings
• Nominations for AAAS Fellow

Impacting AAAS and Science
September 26, 2013
• A group of ASA representatives met with Alan Leshner
• Very positive reception!
• We also met with new Science editor-in-chief Marcia McNutt and several senior editors
• Great interest in enhancing the role of statistics and statisticians

Impacting Science
Science Editor-in-Chief Marcia McNutt

Science, January 17, 2014
EDITORIAL
Reproducibility
Marcia McNutt is Editor-in-Chief of Science.
Science advances on a foundation of trusted discoveries. Reproducing an experiment is one important approach that scientists use to gain confidence in their conclusions.
Recently, the scientific community was shaken by reports that a troubling proportion of peer-reviewed preclinical studies are not reproducible. Because confidence in results is of paramount importance to the broad scientific community, we are announcing new initiatives to increase confidence in the studies published in Science.
For preclinical studies (one of the targets of recent concern), we will be adopting recommendations of the U.S. National Institute of Neurological Disorders and Stroke (NINDS) for increasing transparency.* Authors will indicate whether there was a pre-experimental plan for data handling (such as how to deal with outliers), whether they conducted a sample size estimation to ensure a sufficient signal-to-noise ratio, whether samples were treated randomly, and whether the experimenter was blind to the conduct of the experiment. These criteria will be included in our author guidelines.
There are a number of reasons why peer-reviewed preclinical studies may not be reproducible. The system under investigation may be more complex than previously thought, so that the experimenter is not actually controlling all independent variables. Authors may not have divulged all of the details of a complicated experiment, making it irreproducible by another lab. It is also expected that through random chance, a certain number of studies will produce false positives. If researchers are not alert to this possibility and have not set appropriately stringent significance tests for their results, the outcome is a study with irreproducible results. Although there is always the possibility that an occasional study is fraudulent, the number of preclinical studies that cannot be reproduced is inconsistent with the idea that all irreproducibility results from misconduct in such research.
It is unlikely that the issues with irreproducibility are confined to preclinical studies (social science has been equally noted, for example). Unfortunately, there are no equivalents to the NINDS recommendations for other disciplines that provide a basis for requiring transparency across all fields. For the next 6 months, we will be asking reviewers and editors to identify papers submitted to Science that demonstrate excellence in transparency and instill confidence in the results. This will inform the next steps in implementing reproducibility guidelines. Science Translational Medicine, a sister journal of Science, already enforces the NINDS guidelines for preclinical studies. Both journals also are open to improving on the NINDS recommendations for preclinical studies.
There is also a wide range of sophistication in the application of statistics displayed in research analysis, ranging from practically no statistics, to the routine use of generic software packages, to the application of advanced methods that extract subtle signals from noise. Because reviewers who are chosen for their expertise in the subject matter of a study may not be authorities in statistics as well, statistical errors in manuscripts may slip through undetected. For that reason, with the advice of the American Statistical Association and others, we are adding new members to our Board of Reviewing Editors from the statistics community to ensure that manuscripts receive appropriate scrutiny in their methods of data analysis.
Science’s standards have always been high, and these measures add to steps we have already taken to increase transparency, such as requiring data accessibility. Nevertheless, journals can only do so much to assure readers of the validity of the studies they publish. The ultimate responsibility lies with authors to be completely open with their