The Promise and Pitfalls of Conflict Prediction: Evidence from Colombia and Indonesia
NBER WORKING PAPER SERIES

THE PROMISE AND PITFALLS OF CONFLICT PREDICTION: EVIDENCE FROM COLOMBIA AND INDONESIA

Samuel Bazzi
Robert A. Blair
Christopher Blattman
Oeindrila Dube
Matthew Gudgeon
Richard Merton Peck

Working Paper 25980
http://www.nber.org/papers/w25980

NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
June 2019

We thank seminar participants at ESOC, MWIEDC, ISF, NBER Economics of National Security, and NEUDC for helpful feedback. Miguel Morales-Mosquera provided excellent research assistance. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.

NBER working papers are circulated for discussion and comment purposes. They have not been peer-reviewed or been subject to the review by the NBER Board of Directors that accompanies official NBER publications.

© 2019 by Samuel Bazzi, Robert A. Blair, Christopher Blattman, Oeindrila Dube, Matthew Gudgeon, and Richard Merton Peck. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.

The Promise and Pitfalls of Conflict Prediction: Evidence from Colombia and Indonesia
Samuel Bazzi, Robert A. Blair, Christopher Blattman, Oeindrila Dube, Matthew Gudgeon, and Richard Merton Peck
NBER Working Paper No. 25980
June 2019
JEL No. C52, C53, D74

ABSTRACT

Policymakers can take actions to prevent local conflict before it begins, if such violence can be accurately predicted. We examine the two countries with the richest available sub-national data: Colombia and Indonesia. We assemble two decades of fine-grained violence data by type, alongside hundreds of annual risk factors. We predict violence one year ahead with a range of machine learning techniques. Models reliably identify persistent, high-violence hot spots.
Violence is not simply autoregressive, as detailed histories of disaggregated violence perform best. Rich socio-economic data also substitute well for these histories. Even with such unusually rich data, however, the models poorly predict new outbreaks or escalations of violence. "Best case" scenarios with panel data fall short of workable early-warning systems.

Samuel Bazzi
Department of Economics
Boston University
270 Bay State Road
Boston, MA 02215
and CEPR
and also NBER

Robert A. Blair
Brown University
Department of Political Science
Watson Institute for International and Public Affairs
Providence, RI 02912

Christopher Blattman
Harris School of Public Policy
The University of Chicago
1307 E 60th St
Chicago, IL 60637
and NBER

Oeindrila Dube
University of Chicago
Harris School of Public Policy
1307 E 60th St
Chicago, IL 60637
and NBER

Matthew Gudgeon
OEMA & Department of Social Sciences
607 Cullum Road
West Point, NY 10966

Richard Merton Peck
Northwestern University
2211 Campus Dr
Evanston, IL 60208

1 Introduction

Advances in data and computing techniques have kindled hopes that civil society, police, or peacekeepers could predict costly violence ahead of time. Such early-warning systems could be used to target scarce security personnel and resources, and prevent violence from occurring or escalating. Until recently, prediction focused on large-scale, country-level events, including coups, civil wars, and terror attacks.1 These macro-level efforts have informed policy, the science of prediction, and our understanding of violence. But such high-level predictions are not easy to act on. Scholars such as Cederman and Weidmann (2017) argue that country-level conflict predictions are unlikely to improve much in the future: there is simply too much complexity and randomness, they argue, to develop reliable forecasts over such wide spans of time and space.
Sub-national data or higher-frequency predictions could prove more fruitful. The past decade has seen the study of conflict push down to the micro-level causes, processes, and consequences, and we can avail ourselves of these data to investigate prediction. If governments, police, or peacekeepers can reliably predict which places will see escalations of violence, for instance, they may be able to act to prevent it. Policy options to prevent an ethnic riot or local unrest are likely better than policy options to prevent a civil war. The feasibility of these early warning systems is unknown, however. Now is a good moment to take stock of what existing methods and the richest available micro data can deliver.

1 The Political Instability Task Force's prediction efforts are likely the best known (Goldstone et al., 2010). For other examples of cross-national prediction studies, see Beck et al. (2000); Brandt et al. (2011); Celiku and Kraay (2017); Gleditsch and Ward (2013); Gurr and Lichbach (1986); Harff (2003); Hegre et al. (2013, 2016); Perry (2013); Ward et al. (2013). For an early exception, see Schrodt (2006), who studies violence in the Balkans.

This paper takes advantage of high-quality and extensive data in two countries, Colombia and Indonesia. Both countries have been ravaged by violence for decades, a situation that typically does not bode well for data availability. But both countries are also wealthy enough (and have strong enough states and research communities) to produce some of the highest-quality local data in the developing world. This includes a trove of information on local socioeconomic conditions and other characteristics, plus at least a decade of subdistrict- or municipal-level data on local violence. We chose these two cases because they are among the current "best case" scenarios in terms of both micro-level data on violent events and a wide range of predictors of violence, in panel form.
If conflict prediction proves fruitful in these two cases, they could be models for other prediction efforts. If not, then we must ask where or with what additional data we can expect early warning systems to bear fruit. Finally, both countries have suffered recurring episodes of local violence during transitions to national peace. Anticipating and preventing these episodes is of both substantive and practical importance.

We identified, collected, and merged dozens of subnational datasets in each country. This gives us an unusually rich array of hundreds of covariates per locality, including covariates that the empirical and theoretical literatures commonly associate with conflict onset and escalation (Blattman and Miguel, 2010). This data gathering also gives us multiple measures of the outcome we are trying to predict. For instance, using data from 1998 to 2014 in Indonesia, we are able to study conflicts related to interethnic or religious tensions, as well as electoral and resource disputes. In Colombia, our data span 1988 to 2005. We predict clashes between state, guerrilla, and paramilitary forces during a period of protracted civil conflict.

We then deploy several machine learning methods to generate predictions of local violent incidents at the annual level. In our main year-ahead predictions, we train the algorithms on six to fourteen years of data, and forecast local conflict during the following year. We also examine predictive power across space as well as time, and with new outbreaks of violence as well as escalations.

Our results illustrate both the promise and pitfalls of local violence forecasting. An ensemble of machine learning models effectively identifies locations at risk of having a violent incident. We are particularly effective at identifying "hot spots" with high concentrations of violence, defined as five or more incidents in a single year.
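The year-ahead design and the hot-spot definition described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the incident counts are invented toy values, and the locality names and all function names are hypothetical. The only details taken from the text are the five-incident threshold and the idea of training on past years and predicting the following year. The last function is a naive rule based on the previous year's outcome, included only as a simple stand-in for a baseline.

```python
# Toy illustration only: counts, locality names, and function names are hypothetical.
HOT_SPOT_THRESHOLD = 5  # "five or more incidents in a single year" (from the text)

# {(locality, year): number of violent incidents}; invented values
counts = {
    ("A", 2001): 7, ("A", 2002): 6, ("A", 2003): 8,
    ("B", 2001): 0, ("B", 2002): 1, ("B", 2003): 5,
    ("C", 2001): 5, ("C", 2002): 0, ("C", 2003): 0,
}


def year_ahead_split(data, target_year):
    """Train on every year before target_year; test on target_year only."""
    train = {k: v for k, v in data.items() if k[1] < target_year}
    test = {k: v for k, v in data.items() if k[1] == target_year}
    return train, test


def label_hot_spots(data, threshold=HOT_SPOT_THRESHOLD):
    """Flag each locality-year whose incident count reaches the threshold."""
    return {key: n >= threshold for key, n in data.items()}


def lagged_rule(data, target_year, threshold=HOT_SPOT_THRESHOLD):
    """Naive rule: predict a hot spot if the locality was one the year before."""
    localities = {loc for (loc, year) in data if year == target_year}
    return {
        loc: data.get((loc, target_year - 1), 0) >= threshold
        for loc in localities
    }


train, test = year_ahead_split(counts, 2003)
actual = {loc: hot for (loc, _), hot in label_hot_spots(test).items()}
predicted = lagged_rule(counts, 2003)

# Hot spots in the target year that the naive rule failed to flag.
missed = sorted(loc for loc in actual if actual[loc] and not predicted[loc])
print(missed)  # prints ['B']
```

In this toy panel the lagged rule flags the persistent hot spot A but misses B, which escalates only in the target year. This is a small-scale analogue of the distinction the paper draws between persistent hot spots and new escalations.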
Indeed, our ensemble model, which leverages the best new methods, performs better than previous sub-national attempts (Blair et al., 2017; Colaresi et al., 2016; Weidmann and Ward, 2010; Witmer et al., 2017). We view these results as especially important given that such local hot spots can pose an especially serious risk of regional or national escalation.

We find that local violence is not merely autoregressive, as a model consisting of the lagged dependent variable alone performs consistently poorly. Rather, our algorithms' strong performance is mainly driven by forecasts of where, but not when, violence is likely to occur. While a simple lagged dependent variable model yields disappointing results, more nuanced histories of violence tend to be the best predictors of hot spots in particular. In other words, to predict future violence, it is not enough to know where violence occurred in the past. But detailed and disaggregated histories of violence, including the severity of particular incidents (e.g., number of deaths, property damage) and the identity of the actors involved, perform very well.

Even without such detailed and disaggregated histories, however, the available covariates also predict hot spots almost equally well. This suggests that much of the information contained in these violence histories is representative of observable characteristics of the units in our two samples. The most predictive risk factors tend to be slow-moving or time-invariant. In Colombia, for example, one of the most reliable predictors is terrain ruggedness. In Indonesia, robust predictors include religious and ethnic diversity as well as sectoral shares of the local economy.

Importantly, predictive accuracy improves little when we add time-varying factors, including natural disasters, elections, and fluctuations