The Compass for Navigating a Data-Centric World

Statistics: The Compass for Navigating a Data-Centric World
Marie Davidian
Department of Statistics
North Carolina State University
January 11, 2013

Statistics2013 Video
Available at http://statistics2013.org

Triumph of the geeks
Nate Silver predicted the outcome of the 2012 US presidential election in all 50 states using ... Statistics
http://fivethirtyeight.blogs.nytimes.com/
Silver used a statistical model to combine the results of state-by-state polls, weighting them according to their previous accuracy, and to simulate many elections and estimate probabilities of the outcome

Triumph of the geeks
Others did, too.
“Dynamic Bayesian forecasting of presidential elections in the states,” by Drew A. Linzer, Journal of the American Statistical Association, in press
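The approach attributed to Silver on the slide above (combine state polls with accuracy-based weights, then simulate many elections) can be illustrated with a minimal sketch. This is not Silver's actual model: the state names, poll margins, weights, electoral votes, and the independent Gaussian error with `error_sd` are all hypothetical assumptions chosen only to show the mechanics.

```python
import random

# Hypothetical data: state -> list of (candidate A's margin in points, pollster weight).
# The weights stand in for "previous accuracy"; every number here is invented.
POLLS = {
    "State X": [(2.0, 1.0), (4.0, 0.5), (-1.0, 0.25)],
    "State Y": [(-3.0, 1.0), (-1.0, 1.0)],
    "State Z": [(0.5, 0.8), (1.5, 0.4)],
}
ELECTORAL_VOTES = {"State X": 20, "State Y": 15, "State Z": 10}  # hypothetical


def weighted_margin(polls):
    """Return the accuracy-weighted average of the poll margins."""
    total_weight = sum(w for _, w in polls)
    return sum(m * w for m, w in polls) / total_weight


def simulate(n_sims=10_000, error_sd=3.0, seed=0):
    """Estimate the probability that candidate A wins most electoral votes.

    Each simulated election perturbs every state's weighted margin with
    independent Gaussian noise (an assumed, deliberately simple error model).
    """
    rng = random.Random(seed)
    needed = sum(ELECTORAL_VOTES.values()) / 2
    means = {state: weighted_margin(p) for state, p in POLLS.items()}
    wins = 0
    for _ in range(n_sims):
        votes = sum(
            ELECTORAL_VOTES[state]
            for state, mu in means.items()
            if rng.gauss(mu, error_sd) > 0  # A carries the state if the margin is positive
        )
        if votes > needed:
            wins += 1
    return wins / n_sims


if __name__ == "__main__":
    print("Estimated P(A wins):", simulate())
```

The output is a probability rather than a single called winner, which is the point of simulating many elections: the uncertainty in the polls carries through to the forecast.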
Triumph of the geeks
“Nate Silver-led statistics men crush pundits in election” – Bloomberg Businessweek
“Nate Silver has made statistics sexy again” – Associated Press
“Drew Linzer: The stats man who predicted Obama’s win” – BBC News Magazine
“The allure of the statistics field grows” – Boston Globe
But the interest in statistics didn’t start with the US elections.

Statistics in the news
New York Times, August 6, 2009
“I keep saying that the sexy job in the next 10 years will be statisticians” – Hal Varian, Chief Economist, Google

Statistics in the news
New York Times, January 26, 2012
“I went to parties and heard a little groan when people heard what I did. Now they’re all excited to meet me” – Rob Tibshirani, Department of Statistics, Stanford University

Statistics in the news
New York Times, February 11, 2012
“Statistics are interesting and fun. It’s cool now” – Andrew Gelman, Department of Statistics, Columbia University

Statistics in the news
The Wall Street Journal, December 28, 2012
Carl Bialik, The Numbers Guy

Data, data, and more data
Why is there so much talk of statistics and statisticians?
Data
• Administrative (e.g., tax records), government surveys
• Genomic, meteorological, air quality, seismic, ...
• Electronic medical records, health care databases
• Credit card transactions, point-of-sale, mobile phone
• Online search, social networks
• Polls, voter registration records
A veritable tsunami/deluge/avalanche of data

Demand
2011 McKinsey Global Institute report: Big data: The next frontier for innovation, competition, and productivity
“A significant constraint ... will be a shortage of ... people with deep expertise in statistics and data mining ... a talent gap of 140K - 190K positions in 2018 (in the US)”
http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation
Opportunities and challenges
• Our ability to collect, store, access, and manipulate vast and complex data is ever-improving
• The potential benefits to science and society of learning from these data are enormous
• However, Big Data does not automatically mean Big Information
• Science, decision-making, and policy formulation require not only prediction and finding associations and patterns, but uncovering causal relationships
• Which, as we’ll discuss later, is not so easy.

Perils
From “The Age of Big Data”
With huge data sets and fine-grained measurement, ... there is increased risk of “false discoveries.” The trouble with seeking a meaningful needle in massive haystacks of data, says Trevor Hastie, a statistics professor at Stanford, is that “many bits of straw look like needles.”
Big Data also supplies more raw material for statistical shenanigans and biased fact-finding excursions. It offers a high-tech twist on an old trick: I know the facts, now let’s find ’em. That is, says Rebecca Goldin, a mathematician at George Mason University, “one of the most pernicious uses of data.”
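The “false discoveries” peril quoted from “The Age of Big Data” can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not something from the talk: it generates an outcome and many candidate features that are all pure, independent noise, then counts how many features pass an approximate 5% significance cutoff for correlation. The function names, sample sizes, and the large-sample cutoff `1.96 / sqrt(n)` are choices made for this example.

```python
import math
import random


def pearson(x, y):
    """Return the sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_false_discoveries(n_features=1000, n_samples=100, seed=0):
    """Count noise features that look 'significantly' correlated with a noise outcome.

    No feature is truly related to the outcome, so every hit is a false discovery.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    # Approximate two-sided 5% cutoff for |r| when there is no real association.
    cutoff = 1.96 / math.sqrt(n_samples)
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(feature, outcome)) > cutoff:
            hits += 1
    return hits


if __name__ == "__main__":
    print("Straws that look like needles:", count_false_discoveries())
```

With a 5% cutoff, roughly 5% of the noise features are expected to pass, so screening more features yields more spurious “needles” even though none are real.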
Critical need
Sound, objective methods for modeling, analysis, and interpretation
Statistics
While Big Data have inspired considerable current interest in statistics, statistics has been fundamental in numerous areas of science, business, and government for decades

Roadmap
• A brief history
• Statistical stories
• Our data-rich future

What is statistics?
Statistics: The science of learning from data and of measuring, controlling, and communicating uncertainty
The path to what is now the formal discipline of statistical science is long and winding.
Origins – pre-1700
• Sporadic accounts of measurement and data collection and interpretation date back as early as 5 B.C.
• But it was not until the mid-1600s that the mathematical notions of probability began to be developed by (mainly) mathematicians and physicists (e.g., Blaise Pascal), often inspired by games of chance
• The first formal attempt to summarize and learn from data was by John Graunt, who created a precursor to modern life tables used in demography
• Christiaan Huygens was among the first to connect such data analysis to probability
