Science Magazine
SPECIAL SECTION: PREDICTION

THE PULSE OF THE PEOPLE

Can internet data outdo costly and unreliable polls in predicting election outcomes?

By John Bohannon

SCIENCE sciencemag.org 3 FEBRUARY 2017 • VOL 355 ISSUE 6324 Published by AAAS

In an apartment on New York City’s Upper West Side on 8 November 2016, Hernan Makse and several friends cooked branzino and sipped Chablis as they watched the U.S. presidential election unfold. They hopscotched between MSNBC and Fox News while keeping an eye on The New York Times website on a laptop. The Times was streaming live updates of its “presidential election forecast.” It was still early, and results from key states had not yet come in. On a chart labeled “Chance of Winning Presidency” that reflected the polling data rolling in, Hillary Clinton bounced above 80%, leaving Donald Trump mired below 20%.

[Photo caption] Both polling and an analysis of pre-election night tweets failed to flush out Trump’s hidden voters. PHOTO: EDUARDO MUNOZ ALVAREZ/STRINGER/GETTY IMAGES

Makse, a statistical physicist at City University of New York, had placed a scientific bet on the outcome. The day before, his lab group had posted a research paper to arXiv, the online preprint repository. They had feverishly revised it to make the 4 p.m. deadline and publish on Election Day. Like the gauge chart on the Times website, they predicted who would become president. But whereas the Times used data from state-by-state polling, Makse’s prediction was based entirely on data gathered from Twitter in the months leading up to the election.

If Makse’s group nailed the election forecast, they would have reason to brag. Polling, whether done by phone or door-to-door, is extremely labor intensive and expensive: It fuels an $18 billion industry. And it has problems. Not only have response rates fallen to single digits, leaving pollsters to rely on a thin and biased sample of people, but also an analysis last year of more than 1000 polls found evidence of widespread data fabrication (Science, 4 March 2016, p. 1014). By contrast, Makse’s group tracked the political opinions of millions of people directly, second by second, for months, and they got those data for free.

Twitter isn’t the only online data stream that scientists are funneling into predictive models of everything from elections to street protests. The largest tech companies such as Facebook and Google generate data that are free for researchers to use, though with varying degrees of inconvenience. So Makse and many other social scientists are asking: Could online data enhance polling as a forecasting tool, or even replace it?

The election night verdict: not yet. As the evening wore on, Makse’s forecast based on freely harvested tweets continued to match the pricey polling data, predicting a win for Clinton with 55.5% of the vote. But both forecasts got it wrong. Before their dinner was done, Makse watched as the projections on the Times’s data-driven blog, The Upshot, caught up with reality. “It was funny to see how at around 8 p.m.,” he says, “they switched from 20% to 95% for Trump.”

The internet, it seems, can’t yet reliably take the pulse of the people. But Makse and many other social scientists are convinced that it eventually will, if only they can figure out how to translate terabytes of data into human intentions.

FORECASTING WHAT PEOPLE WILL DO, and why, is the essence of social science. Considering how hard it is to divine even a single person’s behavior, scaling up prediction to a community or society seems like a nonstarter. “But in some ways that is an easier problem,” says Taha Yasseri, a computational social scientist at the University of Oxford Internet Institute in the United Kingdom. He offers an analogy from physics: Although the movement of a single particle looks random, “the behavior of a gas made up of millions of particles is very predictable.”

The idea that society can be treated like a physics problem has deep roots. In the 1950s, science fiction author Isaac Asimov conjured up a branch of science called psychohistory. With powerful computers and gargantuan data sets, he imagined, researchers would forecast not just elections, but the rise and fall of empires.

A lifetime later, the computers and the data Asimov envisioned are becoming reality. But for now, polling, costly and inefficient as it is, remains the tool of choice for predicting group behavior such as elections. And a study of electoral races around the world on p. 515 suggests that polls are still reliable, despite last November’s surprise.

Ryan Kennedy, a social scientist at the University of Houston in Texas, and colleagues focused on a data set of presidential elections. They avoided the complexity of comparing different government systems by limiting the study to elections in which voters chose a national leader directly, rather than, for example, through a party-based parliamentary system like the United Kingdom’s. That filter left plenty of data: The final tally came to more than 500 elections from 86 different countries going back to World War II. To predict winners, Kennedy, David Lazer, a social scientist at Northeastern University in Boston, and his Ph.D. student Stefan Wojcik statistically modeled the elections using voter polling data as well as data on other factors that can tip elections: the country’s economic prosperity, democratic freedoms (using a third-party measurement called a Polity score), and whether an incumbent was running.

They trained their models on data up to 2007 and then tested them on the most recent 8 years, totaling 128 elections. Overall, they correctly predicted the winner 80% to 90% of the time. And of all the indicators, polling proved the most powerful, by far. “We predict that reports of the death of quantitative electoral forecasts are greatly exaggerated,” the authors quip. Others agree that for the time being, polling reigns. “If you’re trying to predict a decision people will make, there’s just no substitute for asking them directly,” says Andrew Gelman, a statistician at Columbia University.

Yet Lazer, for one, believes our reliance on polling may not last much longer. “Canonical polling methods are in crisis,” he says. One factor is people’s growing impatience with pollsters; another is the death of the landline. You can’t survey people if you don’t know how to find them. Could a fire hose of data from the internet plug the gap? That holds “great promise,” says Lazer, “but a lot of work has to be done before those approaches are validated.”

One challenge is that it is hard to decipher people’s motivations from their internet habits: that is, their web searches and social media posts. If millions of people tweet sentiments supportive of a candidate or critical of an opponent, can it be deduced, reliably, how they will vote? Predicting people’s behavior is tough, Yasseri says, “if you don’t know what motivates them.”

A promising test ground for probing motivation is Wikipedia, a website used by a remarkably broad swath of humanity as a one-stop shop for basic information on almost any topic. To see what Wikipedia’s traffic might reveal about electoral outcomes, Yasseri and fellow Oxford researcher Jonathan Bright have been tracking the number of daily visitors to the Wikipedia pages devoted to political parties competing in the European Union’s parliamentary elections [...]

[...] motivates someone to visit a website, tweet, and, ultimately, vote one way or another. Once they solve the anonymity problem, he says, the team hopes to start predicting outcomes such as elections within a few years.

MAKSE HIMSELF IS TRYING to improve his Twitter-based model. The morning after Trump’s election, he met his graduate students and postdocs in the lab. The mood was grim. “Most of them are foreigners,” he says, and the anti-immigrant rhetoric of Trump’s campaign had been bruising. [...]

[...] amplifying a point of view. Deploying such bots is like planting people in an audience to laugh at your jokes.

To use Twitter as a voter opinion poll, Makse’s team had to detect all those bots and filter them out first. They did that before election night by analyzing not only the content and timing of the tweets, but also information about the accounts behind them. A telltale sign of a bot is an account that does not use one of the standard Twitter software clients and relentlessly retweets content from other accounts verbatim. As they debotted their data, a stark pattern emerged: Whereas the pro-Trump tweeters were riddled with bots, those supporting Clinton seemed almost exclusively human. The ef[...]

[Figure] Conventional wisdom
During the party conventions last summer, the number of individuals who tweeted in favor of Hillary Clinton (blue) spiked, far outstripping those who backed Donald Trump (red), in line with polls that had Clinton in the lead.