Forecasting the 2016 EU Referendum with Big Data: Remain to win, in spite of Cameron

Ronald MacDonald, University of Glasgow, and Xuxin Mao, UCL

This report summarises predictions about the outcome of the upcoming EU referendum using what we call the ‘Big Data’ framework. The approach supplements daily polling data with information about the relevant online searches that voters make in the run-up to an election or referendum. Our starting point is that what people search for online may be a better indication of voting intention than what they actually tell pollsters, or at least adds value to that information. Evidence from our two previous studies has borne this out. The underlying statistical model has the added advantage that it can pick up momentum effects and momentum reversals in the data; it can also identify potential factors that influence tactical voting.

The predictions contained here are based on text mining Google searches up to 18th June 2016, a period that includes the tragic death of Jo Cox and the controversial UKIP immigration poster. For the polling data we were able to use daily voting intention information between 15th April, when the campaign officially started, and 18th June. The polling information came from ORB, Survation and YouGov, the only three mainstream pollsters that provide regular poll updates.

Key findings

• Momentum for a Remain victory began before the Jo Cox tragedy and UKIP’s controversial immigration poster.
• Voter decision making is not being determined by issues such as security, sovereignty or the cost of EU membership.
• The Leave campaign benefits - and only benefits - from the immigration issue.
• Interventions by David Cameron are having a negative effect on the Remain campaign and a positive impact on Leave.

The momentum is now clearly in favour of Remain - and did not originate with the Jo Cox tragedy or UKIP poster

In Figure 1, below, we plot voting intention data based on YouGov, Survation and ORB polls. After the campaign officially started on 15th April, Remain enjoyed a comfortable lead of 4-7 percentage points until late May 2016, when Leave momentum kicked in. That momentum stalled around 12th June and has since reversed. Together with our finding that fewer undecided voters are now swinging towards Leave, this means the momentum is clearly in favour of Remain. Our statistical analysis detects a shift in favour of Remain before 16th June, so, contrary to opinions expressed by some pollsters and journalists, the change in momentum did not originate with either the Jo Cox tragedy or the Nigel Farage poster row.

Figure 1: Voting intention data based on YouGov, Survation and ORB (series: Remain, Leave, Undecided; units are percent)
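The momentum and reversal described above can be made concrete with a small sketch. The report does not spell out how its statistical model detects momentum, so the code below is a deliberately simple stand-in, not the authors' method: it fits a least-squares slope to the Remain-minus-Leave lead over a rolling window and flags the days on which that slope changes sign. The function names, the window length and the lead series are all hypothetical illustrations, not the ORB, Survation or YouGov data.

```python
from typing import List, Sequence


def rolling_slope(series: Sequence[float], window: int) -> List[float]:
    """Least-squares slope of each run of `window` consecutive points.

    Element k of the result covers series[k : k + window].
    """
    if window < 2:
        raise ValueError("window must be at least 2 to define a slope")
    xs = range(window)
    x_mean = (window - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slopes = []
    for start in range(len(series) - window + 1):
        ys = series[start:start + window]
        y_mean = sum(ys) / window
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        slopes.append(sxy / sxx)
    return slopes


def momentum_reversals(lead: Sequence[float], window: int = 5) -> List[int]:
    """Indices into `lead` at which the rolling trend flips direction.

    Flat windows (slope exactly zero) are skipped, so a move from falling
    through flat to rising still counts as a single reversal.
    """
    reversals = []
    prev_sign = 0
    for k, slope in enumerate(rolling_slope(lead, window)):
        sign = (slope > 0) - (slope < 0)
        if sign != 0 and prev_sign != 0 and sign != prev_sign:
            # Window k ends at lead[k + window - 1], the first day the flip is visible.
            reversals.append(k + window - 1)
        if sign != 0:
            prev_sign = sign
    return reversals


# Hypothetical Remain-minus-Leave lead (points): Remain's lead erodes, then recovers.
lead = [6, 5, 4, 3, 2, 1, 0, -1, 0, 1, 2, 3, 4]
print(momentum_reversals(lead, window=3))  # [9]
```

The reversal is flagged a couple of days after the trough, because a short window needs a few points on the new side before its slope turns positive. A longer window smooths out daily poll noise at the cost of a longer lag.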

The first part of the TRUST approach relies on text mining a very large database of print newspapers and their web-based counterparts, using sophisticated algorithms to identify the topics that motivate voters (this is discussed in some detail in our previous work; see the panel below). The results are summarised in Table 1 for successive periods of the campaign, with the key themes summarised in the last row.
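The report does not describe its text-mining algorithms here, so the sketch below substitutes plain term-frequency counting for whatever topic model TRUST actually uses. It only illustrates the idea of summarising each campaign period by its most prominent terms, as in Table 1. The stop-word list, the function names and the headlines are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a far larger one.
STOP_WORDS = {"the", "a", "an", "and", "of", "on", "in", "to", "for", "over", "is"}


def top_terms(texts: List[str], k: int = 5) -> List[str]:
    """Return the k most frequent non-stop-word tokens across `texts`."""
    counts: Counter = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return [term for term, _ in counts.most_common(k)]


def topics_by_period(corpus: Dict[str, List[str]], k: int = 5) -> Dict[str, List[str]]:
    """Map each campaign-period label to its k most frequent terms."""
    return {period: top_terms(texts, k) for period, texts in corpus.items()}


# Invented headlines, for illustration only.
corpus = {
    "22-28 May 2016": [
        "Treasury claim on trade and the economy",
        "Immigration row over economy and trade",
        "Cameron and Johnson clash on immigration",
    ],
}
# trade, economy and immigration each appear twice, so they head the list.
print(topics_by_period(corpus, k=3))
```

Raw frequency favours words that are merely common. A genuine topic model instead groups co-occurring words into themes, which is closer to what a row of Table 1 represents.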

Table 1: Text-Mined Topics on the EU Referendum during the Official Campaign Period

15 April-14 May 2016: economy, market, trade, Cameron, Osborne, Obama
15-21 May 2016: market, business, economic price, bill, Cameron, Johnson
22-28 May 2016: market, trade, economy, treasury, claim, immigration, , Cameron, Johnson
29 May-4 June 2016: trade, market, economy, immigration, Cameron, Labour
5-11 June 2016: market, trade, economy, immigration, Cameron, Johnson, Labour
12-18 June 2016: trade, work, market, bank, price, Pound, business, Cameron, Labour
Key themes: UK economy, EU trade, Single Market, EU immigration, David Cameron, Boris Johnson, Labour Party

There are several noteworthy points here. Firstly, keywords and names such as Corbyn, UKIP/Farage or the SNP rarely show up in our algorithmic searches of the newspapers during the campaign period. By contrast, Cameron and Johnson constantly come up as motivational keywords, which suggests that the EU referendum is more of an internal Conservative matter. Secondly, economic issues (trade, the economy, the Single Market, etc.) dominate the referendum themes. Thirdly, immigration emerges as an issue only from 22 May to 11 June, the same period in which Leave was gaining momentum and Remain was trailing in the polls.

Fourthly, issues such as security, the constitution, sovereignty, the NHS, the cost of EU membership and the potential for an EU army, which many claim are important, are not directly related to people's decision making. Despite the coverage given to the killing of Jo Cox and to the controversy over the UKIP poster of a queue of refugees, launched by Nigel Farage on 16th June, we found no evidence in our data that either was a motivational factor, although we cannot comment on whether they affected voting intentions after 18th June.

Issues such as security, sovereignty, the NHS and the cost of EU membership are not directly related to voters’ decision making

Using the text-mined topics in Table 1, we then construct Big Data volume indicators for the key EU themes. Combining these with daily voting intention information, we conducted a statistical analysis that is used to predict the outcome of the referendum. Before using it in that way, however, we note that some of the determinants of voting behaviour generated by the analysis are interesting in their own right; these are summarised in Table 2. The entries are the statistically significant coefficients, or weights, on the relevant term; for example, the UK economy has a statistically significant positive effect of 0.01 per cent on Remain voting intention.
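The step from volume indicators to the weights in Table 2 is, in essence, a regression of voting intention on those indicators. The report does not spell out its model specification, so the sketch below assumes plain ordinary least squares on synthetic data: it plants coefficients of the same size as Table 2 and checks that the fit recovers them. The variable names, the indicator scales and the data are hypothetical.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least-squares coefficients for y ~ X (X should include an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n_days = 65  # 15 April to 18 June 2016, inclusive

# Hypothetical daily regressors: undecided share (%) and two topic-volume indicators.
undecided = rng.uniform(8, 16, n_days)
economy_volume = rng.uniform(0, 100, n_days)
immigration_volume = rng.uniform(0, 100, n_days)

# Synthetic Remain series with planted weights of the same size as Table 2, plus noise.
remain = (
    55.0
    - 0.50 * undecided
    + 0.01 * economy_volume
    - 0.06 * immigration_volume
    + rng.normal(0, 0.3, n_days)
)

X = np.column_stack([np.ones(n_days), undecided, economy_volume, immigration_volume])
coef = fit_ols(X, remain)
print(np.round(coef, 3))  # approximately [55, -0.50, 0.01, -0.06]
```

A real analysis would also report standard errors so that only significant coefficients are kept, which is how Table 2 is described.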

From Table 2, the first noteworthy finding is a bigger swing tendency towards Remain than towards Leave: when the share of undecided voters falls by one percent, the Remain vote rises by 0.5% and the Leave vote by a smaller 0.43%. This is perhaps due to a status quo bias, reflecting people’s dislike of change and uncertainty.

Second, while general economy-related arguments help the Remain side and show no effect on Leave, potential voters do not appear to be interested in the specifics, such as EU trade issues or the Single Market. Third, Table 2 shows that the Leave camp benefits, and only benefits, from immigration issues, while the Remain camp did badly on this topic throughout the campaign.

Fourth, it is striking that Boris Johnson does not appear with statistical significance on either side of the debate, while David Cameron’s interventions appear to have a negative effect on the Remain vote and a positive impact on Leave. Among other political figures, the Labour leadership has been accused of a lacklustre performance throughout the referendum campaign, and that is borne out here: the Labour Party has no significant loadings in the statistical model.

What is the ‘Big Data’ framework?

• The ‘Big Data’ Topic Retrieved, Uncovered and Structurally Tested (TRUST) framework was previously used for predicting the outcomes of the Scottish referendum and the 2015 General Election
• Supplements daily polling data with information on relevant online searches by voters
• Uses a statistical model that identifies momentum effects and reversals, and potential factors that influence tactical voting

David Cameron’s interventions appear to have a negative effect on the Remain vote and a positive impact on Leave

Table 2: Factors Influencing Remain and Leave Voters, 15 April-18 June 2016

Factor               Remain voting intention   Leave voting intention
Undecided voter      -0.50                     -0.43
UK economy            0.01                     n.s.
EU trade             n.s.                      n.s.
Single Market        n.s.                      n.s.
EU immigration       -0.06                      0.07
David Cameron        -0.07                      0.05
Boris Johnson        n.s.                      n.s.
Labour Party         n.s.                      n.s.

Note: n.s. = not statistically significant.
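The swing arithmetic discussed above follows directly from these coefficients if, as an assumption, the model is treated as linear and additive, so that each coefficient can be read as a marginal effect. The sketch below applies the significant Table 2 coefficients to a hypothetical change in one or more factors. The factor labels and the function name are our own, and the report does not state the units of the topic-volume factors.

```python
from typing import Dict

# Statistically significant coefficients from Table 2; non-significant factors are omitted (treated as 0).
# The factor keys are hypothetical labels for the Table 2 rows.
TABLE2_COEFS: Dict[str, Dict[str, float]] = {
    "Remain": {"undecided": -0.50, "uk_economy": 0.01, "eu_immigration": -0.06, "david_cameron": -0.07},
    "Leave": {"undecided": -0.43, "eu_immigration": 0.07, "david_cameron": 0.05},
}


def predicted_change(changes: Dict[str, float]) -> Dict[str, float]:
    """Linear (additive) predicted change in each side's voting intention."""
    return {
        side: sum(coefs.get(factor, 0.0) * delta for factor, delta in changes.items())
        for side, coefs in TABLE2_COEFS.items()
    }


# The undecided share falls by one point.
print(predicted_change({"undecided": -1.0}))  # {'Remain': 0.5, 'Leave': 0.43}
```

Because the two undecided coefficients differ, any fall in the undecided share moves the balance towards Remain, which is the status quo bias the text describes.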

Finally, we use our statistical model to calculate the predicted outcomes for the referendum, reported in Table 3. They show that Remain will win clearly, with a mean voting intention of 48.4% against Leave’s 45%. Allowing for our calculated swing ranges, shown in the second row of the table, confirms that Leave cannot win even once undecided voters are taken into account: as the Final Rate Range and Final Mean Rate rows show, the top of Leave’s final range (49.7%) remains below the bottom of Remain’s (50.1%).

Table 3: Projecting Referendum Voting Results

                             Remain          Leave
Mean Voting Intention Rate   48.4%           45%
Swing Votes Range            0-3.4%          0-2.9%
Final Rate Range             50.1%-53.6%     46.4%-49.7%
Final Mean Rate              51.9%           48.1%
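Table 3’s Final Mean Rate row sums to 100%, which suggests that the final figures are shares of the two-sided vote once undecided voters are excluded. The sketch below makes that an explicit assumption. It renormalises Remain and Leave to a two-party share and takes the extreme swing allocations from the second row of the table. It is not presented as the authors’ calculation, and its figures are close to but not identical with Table 3 (a central Remain share of about 51.8% rather than 51.9%, for example); the differences may reflect unrounded inputs or a different allocation rule. The function names are hypothetical.

```python
from typing import Tuple


def two_party_share(remain: float, leave: float) -> Tuple[float, float]:
    """Rescale Remain and Leave so that they sum to 100% (undecided excluded)."""
    total = remain + leave
    return 100 * remain / total, 100 * leave / total


def projection_ranges(
    remain_mean: float, leave_mean: float, remain_swing_max: float, leave_swing_max: float
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Return ((remain_low, remain_high), (leave_low, leave_high)) in two-party percent."""
    # Worst case for Remain: no swing to Remain, the maximum swing to Leave.
    remain_low, leave_high = two_party_share(remain_mean, leave_mean + leave_swing_max)
    # Best case for Remain: the maximum swing to Remain, none to Leave.
    remain_high, leave_low = two_party_share(remain_mean + remain_swing_max, leave_mean)
    return (remain_low, remain_high), (leave_low, leave_high)


# Inputs taken from the first two rows of Table 3.
remain_range, leave_range = projection_ranges(48.4, 45.0, 3.4, 2.9)
print(two_party_share(48.4, 45.0))
print(remain_range, leave_range)
print("Leave can reach 50%:", leave_range[1] >= 50)
```

The conclusion in the text depends only on the worst case for Remain. Even if every swing vote favourable to Leave goes to Leave and none goes to Remain, Leave’s share under this assumption stays just below 50%.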