Measuring Democracy: From Texts to Data

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Thiago Veiga Marzagão, M.A.

Graduate Program in Political Science

The Ohio State University

2014

Dissertation committee:

Sarah Brooks, Advisor

Irfan Nooruddin

Marcus Kurtz

Janet Box-Steffensmeier

Copyright by

Thiago Veiga Marzagão

2014

Abstract

In this dissertation I use the text-as-data approach to create a new democracy index, which I call Automated Democracy Scores (ADS). Unlike other indices, the ADS are replicable, have standard errors small enough to actually distinguish between cases, and avoid contamination by human coders’ ideological biases.

Dedication

Dedicated to the taxpayers who paid for this research.

Acknowledgements

Sarah Brooks, Irfan Nooruddin, Marcus Kurtz, and Janet Box-Steffensmeier provided invaluable advice and mentorship. They pushed me to be methodologically rigorous and to fully explore the substantive implications of my research. Also, they allowed me to take a big risk: creating an automated democracy index has never been attempted before and a different committee might have found the idea too ambitious for dissertation research. I thank my committee members for their trust and open-mindedness. I am also indebted to those who took the time to read and comment on earlier drafts and/or discuss my research idea: Philipp Rehm, Paul DeBell, Margaret Hanson, Carolyn Morgan, Peter Tunkis, Vittorio Merola, Raphael Cunha, and Marina Duque. I am also grateful to the institutions and people that provided material assistance. The Fulbright (grantee ID 15101786) and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES (BEX 2821/09-5) paid my tuition, university fees, airline tickets, and part of my health insurance. The Ministério do Planejamento, Orçamento e Gestão - MPOG (process no. 03080.000769/2010-53) granted me four years of paid leave. The Ohio Supercomputer Center allocated computing time. Courtney Sanders patiently answered my endless questions about graduation, forms, and procedures. Finally, I am indebted to my loved ones, all of whom supported me even from a great distance. All errors are mine.

Vita

July 2003 ...... B.S. International Relations, University of Brasília
June 2007 ...... M.A. International Relations, University of Brasília
December 2012 ... M.A. Political Science, Ohio State University

Publications

A dimensão geográfica das eleições brasileiras ("The spatial dimension of Brazilian elections"). Opinião Pública (Public Opinion), 19(2), 270-290, 2013.

Lobby e protecionismo no Brasil contemporâneo ("Lobby and protectionism in Brazil"). Revista Brasileira de Economia (Brazilian Review of Economics), 62(3), 263-178, 2008.

Fields of Study

Major Field: Political Science

Table of Contents

Abstract
Dedication
Acknowledgements
Vita
List of Tables
List of Figures
Introduction
Paper 1: Ideological Bias in Democracy Measures
Paper 2: Automated Democracy Scores
Paper 3: Measuring Democracy From Texts: Can We Do Better Than Wordscores?
Conclusion
References
Appendix A: SEM Estimation
Appendix B: Replication
Appendix C: HMT
Appendix D: HBB
Appendix E: Multiple Decision Trees

List of Tables

Table 1. Bollen and Paxton's regressions for 1980
Table 2. Replication of Bollen and Paxton's regressions for 1980
Table 3. Simulation results for Marxism-Leninism and Catholicism
Table 4. Simulation results for Protestantism and monarchy
Table 5. ADS summary statistics, by year
Table 6. Correlation between ADS and other indices, by year
Table 7. Largest discrepancies between ADS and UDS
Table 8. Overlaps for the year 2008
Table 9. Correlations with UDS (using 50 topics)
Table 10. Correlations with UDS (using 100 topics)
Table 11. Correlations with UDS (using 150 topics)
Table 12. Correlations with UDS (using 200 topics)
Table 13. Correlations with UDS (using 300 topics)
Table 14. Top 5 topics extracted with LSA
Table 15. First 5 topics extracted with LDA

List of Figures

Figure 1. Bollen and Paxton's model
Figure 2. Bollen and Paxton's fit statistics
Figure 3. Fit statistics from my replication of Bollen and Paxton
Figure 4. Automated Democracy Scores, 2012
Figure 5. Automated Democracy Scores, 1993-2012
Figure 6. ADS range and press coverage
Figure 7. Example of wordsXtopics table generated with LSA
Figure 8. Example of topicsXdocuments table generated with LSA
Figure 9. Example of decision tree

Introduction

In this dissertation I investigate the flaws of current democracy indices and propose a new, improved one. This dissertation consists of three papers. In the first paper I show that, unlike what previous research has led us to believe, we cannot make any claims about the nature of the ideological biases that contaminate existing democracy measures. For instance, I show that the Freedom House data, often believed to have a conservative bias, may actually have a liberal bias instead. I do that by replicating previous research on the subject (Bollen and Paxton 2000) but replacing real-world data by simulated data in which I manipulate democracy levels and the ideological biases of hypothetical raters. The results of these Monte Carlos show that even though we can confidently assert the existence of bias in some democracy measures we cannot say anything about which measures are biased or in what ways. That means we currently have no way to circumvent the circularity problem: if we find that democracy is associated with some variable X, is that a genuine association or an artifact of our democracy measure being biased toward X? In the second paper I use automated text analysis to create the first machine-coded democracy index, which I call Automated Democracy Scores (ADS). I produce the ADS using the well-known "Wordscores" algorithm and 42 million news articles from 6,043 different sources. The ADS cover all independent countries in the 1993-2012 period. Unlike the democracy indices we have today, the ADS are replicable, have standard errors small enough to actually distinguish between cases, and avoid contamination by human coders' ideological biases; and a simple (though computationally demanding) extension of the method would yield daily data and real-time data.

I create a website where anyone can replicate and tweak the data-generating process by changing the parameters of the underlying model (no coding required): www.democracy-scores.org. In the third paper I explore other ways to create an automated democracy index from news articles. More specifically, I use the same news articles used in the second paper but I replace Wordscores with other algorithms - namely, a combination of topic extraction methods (Latent Semantic Analysis and Latent Dirichlet Allocation) and decision trees. The goal is to address the issue of construct validity more directly.

Ideological Bias in Democracy Measures

Abstract

In this paper I show that, unlike what previous research has led us to believe, we cannot make any claims about the nature of the ideological biases that contaminate existing democracy measures. For instance, I show that the Freedom House data, often believed to have a conservative bias, may actually have a liberal bias instead. I do that by replicating previous research on the subject (Bollen and Paxton 2000) but replacing real-world data by simulated data in which I manipulate democracy levels and the ideological biases of hypothetical raters. The results of these Monte Carlos show that even though we can confidently assert the existence of bias in some democracy measures we cannot say anything about which measures are biased or in what ways. That means we currently have no way to circumvent the circularity problem: if we find that democracy is associated with some variable X is that a genuine association or an artifact of our democracy measure being biased toward X?

1. Introduction

What do we know about ideological bias in democracy measures? Bollen and Paxton (2000), using structural equation modeling, find that several indicators from the Freedom House dataset (Sussman 1982; Gastil 1988) and from Arthur Banks’ Cross-National Time Series Archive - CNTS (Banks [1971], updated through 1988) are compromised by ideological bias: the coding is sensitive to a number of variables that (conceptually) have nothing to do with democracy, such as economic policy (whether the polity is Marxist-Leninist), religion (whether the polity is predominantly Roman Catholic or predominantly Protestant), and form of government (whether the polity is a monarchy or a republic).

Bollen and Paxton's article has been highly influential. Fourteen years after its publication it is still the only comprehensive, systematic attempt to uncover ideological bias in democracy measures. As such, it appears in nearly every discussion of democracy measurement (e.g.: Munck and Verkuilen [2002], Treier and Jackman [2008], Pemstein, Meserve and Melton [2010]). And it has influenced researchers' choices - most notably, it has become commonplace to avoid the Freedom House democracy data on the grounds that Bollen and Paxton have found them to have a conservative bias. Although influential, Bollen and Paxton's findings have never been subjected to scrutiny; they are usually taken at face value. In this paper I perform the first reassessment of Bollen and Paxton's findings. I do that with simulated data in which I manipulate the countries' levels of democracy and the measures' ideological biases. I find that Bollen and Paxton's method: a) yields incorrect results about which democracy measures are biased; b) yields incorrect results about the nature of those biases; and c) fails to find bias when different measures are biased in similar ways. In sum, I show that for the past fourteen years political scientists have allowed flawed results to influence their choices of democracy data. Those choices, in turn, may have affected what we think we know today about democracy. For instance, are democracy and economic freedom associated, as some economists argue,1 or is that apparent association an artifact of our democracy data being responsive to economic policy in the first place? Bollen and Paxton's results suggest that the Freedom House data have a conservative bias and that the CNTS data do not. Hence we might feel inclined to address the bias problem by avoiding the Freedom House data and using the CNTS data instead. But in this paper I show that Bollen and Paxton's results do not really tell us anything about which indices are biased, or in what ways.

1. See for instance Lawson and Clark (2010).

In other words, whichever democracy measure we choose, we cannot discard the possibility that our empirical tests are circular. Section 2 details Bollen and Paxton's work. Section 3 explains the Monte Carlos and presents the results. Section 4 concludes.

2. Bollen and Paxton’s analysis

In this section I explain in detail Bollen and Paxton's methodology and results. The gist of it is that Bollen and Paxton treat ideological bias as a latent variable and use structural equation modeling (SEM) to extract it from Freedom House indicators and from CNTS indicators. Bollen and Paxton then regress the extracted biases on a number of polity characteristics (economic, social, and political variables) and use the estimated coefficients (signs and statistical significance) to draw conclusions about how exactly the Freedom House and the CNTS are biased. Bollen and Paxton's analysis is based on eight indicators, four from the Freedom House dataset and four from the CNTS dataset. The Freedom House indicators are: "freedom of broadcast media", "freedom of print media", "civil liberties", and "political rights". The CNTS indicators are: "freedom of group opposition", "competitiveness of the nomination process", "chief executive elected", and "effectiveness of the legislative body". Bollen and Paxton start by using SEM to extract five latent variables from those eight indicators. Two latent variables are assumed to be democracy features ("political liberties" and "democratic rule") and three are assumed to be coder-specific ideological biases (Raymond Gastil's and Leonard Sussman's, who were Freedom House coders, and Arthur Banks', who was the CNTS coder)2.

2. Raymond Gastil was responsible for the "civil liberties" and "political rights" indicators. Leonard Sussman was responsible for the "freedom of broadcast media" and "freedom of print media" indicators. Arthur Banks was responsible for the "freedom of group opposition", "competitiveness of the nomination process", "chief executive elected", and "effectiveness of the legislative body" indicators.

Each of the eight indicators is modeled as being determined by a traits factor, a methods factor, and random measurement error. For instance, "freedom of broadcast media" is modeled as being determined by the traits factor "political liberties", by the methods factor "Leonard Sussman's bias" (since Leonard Sussman was the researcher responsible for the "freedom of broadcast media" indicator), and by random measurement error. More generally, each indicator is modeled as

indicator_kp = λ_t × trait_tp + λ_m × method_mp + δ_kp

where indicator k for polity p is a linear combination of traits factor t for polity p, methods factor m for polity p, and indicator k’s random measurement error for polity p. The complete picture of which indicators load on which factors is provided in Figure 1 below, extracted from Bollen and Paxton (65).3

3. I thank Prof. Bollen for helping me understand some aspects of the model specification.

Figure 1. Bollen and Paxton's model. Source: Bollen and Paxton (2000, p. 65).

Figure 1 follows the standard SEM notation, with square boxes representing indicators (i.e., observed variables) and circles representing factors (i.e., latent variables). The factor-to-indicator arrows show which indicators load on which factors.4 The factor-to-factor arrows show which factors correlate.5 The "E" arrows show which indicators have random measurement error.6 Bollen and Paxton estimate, for each indicator, the factor loadings (i.e., the lambdas) and the random measurement error for each year in the 1972-1988 interval (see Appendix A). They find the fit statistics shown in Figure 2 below, extracted from their article (67).

4. Based on previous work (Bollen 1993), Bollen and Paxton model the "freedom of group opposition" as being free from Banks' ideological bias.
5. The "political liberties" and "democratic rule" factors correlate because they are close concepts. "Sussman" and "Gastil" correlate because they both worked at the Freedom House.
6. Based on previous work (Bollen 1993), Bollen and Paxton model the "political rights" and "competitiveness of the nomination process" indicators as having no random measurement error, i.e., δ = 0.


Figure 2. Bollen and Paxton's fit statistics. Source: Bollen and Paxton (2000, p. 67). IFI stands for Incremental Fit Index; 1-RMSEA stands for 1 minus the Root Mean Square Error of Approximation.

Figure 2 compares two fit statistics - the Incremental Fit Index (IFI) and 1 minus the Root Mean Square Error of Approximation (1-RMSEA) - for each year from 1972 to 1988. In both cases (IFI and 1-RMSEA) the larger the statistic, the better the model fit. As we see, the model fit improves considerably when we include both traits and methods factors, as compared to when we include only traits factors.7 The improved fit shows that each of the eight indicators is the product not only of the underlying trait ("political liberties" or "democratic rule", according to the case) and random measurement error, but also of a systematic component - the rater's ideological bias.
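For reference, the standard textbook definitions of these two statistics (my notation, not formulas reproduced from Bollen and Paxton) are, in LaTeX:

\mathrm{IFI} = \frac{\chi^2_{b} - \chi^2_{m}}{\chi^2_{b} - df_{m}},
\qquad
\mathrm{RMSEA} = \sqrt{\frac{\max\left(\chi^2_{m} - df_{m},\; 0\right)}{df_{m}\,(N-1)}}

where the subscript b refers to the baseline (independence) model, the subscript m to the fitted model, and N is the sample size. A better-fitting model pushes χ²_m toward df_m, driving IFI toward 1 and RMSEA toward 0 - hence the use of 1-RMSEA so that, for both curves, higher means better fit.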

7. Not all factors are included in every year: Gastil's and Banks' indicators are available for the entire 1972-1988 interval, but Sussman's are only available for the 1979-1981 and 1983-1987 intervals. Hence for 1972-1978, 1982, and 1988 the estimated model is actually a restricted version of the model depicted in Figure 1 above: everything is the same except that the Sussman factor and the corresponding indicators are not included.

I replicated Bollen and Paxton's analysis, just to make sure I was following the same procedures, and obtained almost exactly the same fit statistics:

Figure 3. Fit statistics from my replication of Bollen and Paxton. Source: my own estimations. These estimates are essentially identical to those in Bollen and Paxton (2000, p. 67). IFI stands for Incremental Fit Index; 1-RMSEA stands for 1 minus the Root Mean Square Error of Approximation.

Bollen and Paxton then use those estimates to produce three sets of factor scores, one for each rater (see Appendix A). Bollen and Paxton regress these factor scores on a number of country-level variables, for two sets of years: 1972, 1975, 1980, 1984, and 1988 (Gastil's factor scores and Banks' factor scores); and 1980 and 1984 (Sussman's factor scores). Table 1 below reproduces the estimates they obtained using 1980 data.

Table 1. Bollen and Paxton’s regressions for 1980a

                                        Gastil      Sussman     Banks
Marxist-Leninist                        -0.976***   -0.504*      1.366**
                                        (0.294)     (0.3)       (0.53)
Protestant                               0.357       0.459       0.670**
                                        (0.324)     (0.295)     (0.264)
Roman Catholic                           0.835***    1.239****   0.676**
                                        (0.312)     (0.331)     (0.286)
monarchy                                 0.343       0.131      -1.441****
                                        (0.263)     (0.282)     (0.268)
ln(energy per capita)                   -0.122      -0.063       0.124
                                        (0.126)     (0.123)     (0.177)
ln(years since independence)             0.152       0.084      -0.515
                                        (0.153)     (0.153)     (0.16)
coups                                    0.349       0.380      -0.615***
                                        (0.268)     (0.287)     (0.202)
internal or interstate war in 1980?     -0.360       0.022      -0.234
                                        (0.302)     (0.328)     (0.438)
ln(protests)                             0.098       0.068      -0.174
                                        (0.121)     (0.139)     (0.109)
ln(political strikes)                   -0.074      -0.080       0.211
                                        (0.137)     (0.158)     (0.135)
ln(riots)                               -0.200      -0.023       0.075
                                        (0.142)     (0.148)     (0.148)
media coverage                           0.019       0.002      -0.034
                                        (0.036)     (0.037)     (0.032)
ln(population)                           0.155       0.029       0.279**
                                        (0.117)     (0.139)     (0.122)
ln(area in km2)                         -0.050      -0.025      -0.024
                                        (0.058)     (0.061)     (0.076)
ln(radio sets + TV sets per capita)      0.067       0.022      -0.105
                                        (0.075)     (0.067)     (0.102)
intercept                               -0.817      -0.494       1.922***
                                        (0.646)     (0.614)     (0.632)
adjusted R2                              0.24        0.26        0.29
N                                        81          81          81

a OLS estimates. Heteroskedastic-consistent standard errors in parentheses. * p <0.10; ** p <0.05; *** p <0.01; **** p <0.001. Data sources: The New York Times, CBS News Index, Facts on File, The World Almanac and Encyclopedia, United Nations Statistical Yearbook, others.

As we observe, Leonard Sussman and Raymond Gastil seem to be biased against Marxist-Leninist countries and in favor of Roman Catholic countries, whereas Arthur Banks seems to be biased in favor of Marxist-Leninist countries, Protestant countries, and Roman Catholic countries, and against monarchic countries. In regressions using data from other years, Bollen and Paxton also find positive, statistically significant coefficients for the Protestant variable in the Gastil and Sussman regressions (Bollen and Paxton 2000, 76). As the next section will show, none of these conclusions is warranted. As before, here too I replicated Bollen and Paxton, just to make sure I was following the same procedures. Here my replication was less successful, with several discrepancies (see Appendix B for details):

Table 2. Replication of Bollen and Paxton's regressions for 1980a

                                        Gastil      Sussman     Banks
Marxist-Leninist                        -0.874***   -0.846**     0.587**
                                        (0.257)     (0.358)     (0.293)
Protestant                              -0.103      -0.081       0.147
                                        (0.280)     (0.390)     (0.320)
Roman Catholic                           0.494**     0.872**     0.372
                                        (0.238)     (0.332)     (0.272)
monarchy                                 0.024      -0.473      -0.166
                                        (0.276)     (0.385)     (0.315)
ln(energy per capita)                   -0.170      -0.088       0.368**
                                        (0.136)     (0.190)     (0.156)
ln(years since independence)             0.294*      0.279      -0.314**
                                        (0.129)     (0.179)     (0.147)
coups 1976-1980                         -0.018       0.054      -0.620**
                                        (0.159)     (0.222)     (0.182)
internal or interstate war in 1980?     -0.677**    -0.617*      0.352
                                        (0.263)     (0.366)     (0.300)
ln(protests in 1975-1980)                0.066       0.118*      0.004
                                        (0.043)     (0.060)     (0.049)
ln(strikes in 1975-1980)                -0.027       0.011       0.031
                                        (0.038)     (0.053)     (0.043)
ln(riots in 1975-1980)                  -0.029      -0.020       0.017
                                        (0.041)     (0.058)     (0.047)
ln(media coverage)                       0.120       0.069      -0.355***
                                        (0.117)     (0.163)     (0.133)
ln(population)                          -0.096      -0.240*      0.296**
                                        (0.101)     (0.141)     (0.116)
ln(area in km2)                          0.031       0.082      -0.076
                                        (0.064)     (0.089)     (0.073)
ln(radio sets + TV sets per capita)      0.094      -0.014      -0.188
                                        (0.137)     (0.192)     (0.157)
intercept                               -0.986       0.429       1.004
                                        (1.132)     (1.578)     (1.292)
N                                        112         112         112
F                                        3.98***     3.27***     3.29***
adjusted R-squared                       0.2871      0.2344      0.2364

a OLS estimates. Heteroskedastic-consistent standard errors in parentheses. * p <0.10; ** p <0.05; *** p <0.01.

As in Bollen and Paxton, here too Marxism-Leninism has a negative, statistically significant coefficient in the Gastil and Sussman regressions and a positive, statistically significant coefficient in the Banks regression. Also as in Bollen and Paxton, we find here that Roman Catholic has a positive, statistically significant coefficient in the Gastil and Sussman regressions. The similarities stop there. In Bollen and Paxton Protestant, Roman Catholic, and monarchy all turn out statistically significant in the Banks regression, but in the replication they do not. (The other variables have little to do with ideological bias so they are of no interest here.) Because the point of this paper is the simulations, not the replication, I leave the details for Appendix B.

3. Monte Carlos

3.1 Basic idea

How solid are the results obtained in the previous section? In this section I show that they are indeterminate; they tell us nothing about which democracy measures are biased or in what ways. The estimates from my replication of Bollen and Paxton suggest, for instance, that Gastil and Sussman are biased against Marxism-Leninism and that Banks is biased in favor of Marxist-Leninist countries. But what if all three raters are biased in the same direction, only to different degrees? If all three raters are biased in favor of Marxism-Leninism but Banks more so than Gastil and Sussman, couldn't that produce the opposite coefficient signs we observe? Or, alternatively, if all three are biased against Marxist-Leninist countries but Gastil and Sussman much more so than Banks, couldn't that produce opposite coefficient signs as well? The same applies to the other three variables of interest - Protestant, Roman Catholic, and monarchy.

Table 2 suggests that none of the raters are biased against or in favor of Protestant or monarchic countries. But maybe they all are, and to similar degrees - so the bias becomes "invisible" and the SEM estimates simply cannot capture it. Table 2 also suggests that Gastil and Sussman are biased in favor of Roman Catholic countries whereas Banks is not. But what if Banks is biased in favor of Roman Catholic countries as well, just less so than Gastil and Sussman? How can we verify all that? We cannot observe a country's "true" level of democracy or a rater's ideological bias - these are latent variables. But we can simulate them. In other words, we can make up some democracy levels and some raters' ideological biases. We can then redo Bollen and Paxton's analysis, but using the simulated democracy data rather than the actual democracy data. Because we will know the "true" (i.e., simulated) democracy levels and ideological biases, we will be able to know how reliable Bollen and Paxton's results are. I start by producing simulated data in which I fix the level of democracy and the direction and magnitude of each rater's ideological bias. I then make these simulated factors load on a number of simulated indicators, estimate the structural model, and extract the factor scores. I then regress the extracted factor scores on the same country-level variables Bollen and Paxton used (Marxism-Leninism, Protestantism, etc.) and check whether the coefficients are "telling the truth" - for instance, whether the coefficient of Marxism-Leninism is negative and statistically significant when the simulated rater is biased against Marxism-Leninism. I repeat the process thousands of times, each time drawing a new batch of simulated factors, and count how often we obtain misleading coefficients (for instance, how often Marxism-Leninism is not negative and significant even though the simulated rater is biased against Marxism-Leninism). That should give us an idea of how reliable the findings in Table 2 - and, by extension, those in Bollen and Paxton - are.


3.2 Model specification

I begin by simulating three factors: each country's level of democracy; the idiosyncrasies (i.e., the systematic measurement error) of a hypothetical rater we are going to call Rater #1; and the idiosyncrasies of a hypothetical rater we are going to call Rater #2. The level of democracy is generated as a uniform random variable ranging from 0 to 20.8 Rater #1's factor is generated as a normal random variable with mean 5 and standard deviation 5. And Rater #2's factor is generated as a normal random variable with mean 5 and standard deviation 15.9 For each factor I generate 112 observations (the number of countries in the dataset). The second step is to introduce ideological bias into the factors. I do that by making the raters' factors alternately respond to Marxism-Leninism, Protestantism, Roman Catholicism, or monarchy. The nature of the simulated bias is different across these four variables. In the case of Marxism-Leninism Table 2 suggests opposite biases. So I test whether the same result might be obtained even if Rater #1 and Rater #2 were biased in the same direction, but to different degrees. Hence for Marxist-Leninist countries I boost Rater #1's factor by p1 points and Rater #2's factor by p2 points, with p2 always fixed at 0.025 and p1 taking the following values: 0.5, 1, 3, 5, 7, 10, 15, and 20. In the case of Protestantism Table 2 would have us believe that none of the raters are biased. But what if all raters are biased in the same direction and to similar degrees?

8. That seems to be the distribution of actual measures of democracy (e.g., the "political rights" index of the Freedom House).
9. The normal distribution is chosen because structural equation models rely on the assumption that the factors follow a multivariate normal distribution (thus we could not have all three factors follow a uniform distribution).

Could that not make the bias become "invisible" in the estimation? To check that, for Protestant countries I boost Rater #1's factor by c1 percent and Rater #2's factor by c2 percent, with the c1-c2 pairs being: 30%-35%, 50%-55%, 70%-75%, 130%-135%, 150%-155%, 170%-175%, 190%-195%, and 230%-235%.10 In the case of Roman Catholicism Table 2 suggests that Gastil and Sussman are positively biased and that Banks is not biased in any direction. We want to know whether that result might be obtained even if all three raters were positively biased, only with Gastil and Sussman more so than Banks. So here I do the same as in the Marxism-Leninism case: for Roman Catholic countries I boost Rater #2's factor by p2 = 0.025 points and Rater #1's factor by p1 points, with p1 taking the following values: 0.5, 1, 3, 5, 7, 10, 15, and 20. Finally, in the case of monarchy Table 2 suggests no one is biased, but - as in the case of Protestantism - perhaps Gastil, Sussman, and Banks are all biased in the same direction and to similar degrees, which could make the bias "disappear" in the SEM estimations. So for monarchies, as for Protestant countries, I boost Rater #1's factor by c1 percent and Rater #2's factor by c2 percent, with the c1-c2 pairs being, again, 30%-35%, 50%-55%, 70%-75%, 130%-135%, 150%-155%, 170%-175%, 190%-195%, and 230%-235%. The third step is to use those simulated factors to generate simulated indicators (i.e., the variables we do observe in SEM estimation). I model them as follows:

indicator1 = 14.12 × rater1 + 69.87 × democracy + δ1/m

indicator2 = 06.71 × rater1 + 68.57 × democracy + δ2/m

indicator3 = 18.31 × rater1 + 53.45 × democracy + δ3/m

indicator4 = 31.81 × rater1 + 28.21 × democracy + δ4/m

10. Thus here the bias is multiplicative - unlike in the Marxism-Leninism case, where the bias is additive.

indicator5 = 95.69 × rater1 + 40.63 × democracy + δ5/m

indicator6 = 38.13 × rater1 + 97.69 × democracy + δ6/m

indicator7 = 70.70 × rater2 + 21.17 × democracy + δ7/m

indicator8 = 31.51 × rater2 + 51.63 × democracy + δ8/m

indicator9 = 90.83 × rater2 + 26.09 × democracy + δ9/m

indicator10 = 12.99 × rater2 + 55.01 × democracy + δ10/m

indicator11 = 53.06 × rater2 + 63.13 × democracy + δ11/m

indicator12 = 52.19 × rater2 + 15.67 × democracy + δ12/m

As we see, democracy loads on all twelve indicators; Rater #1 loads on the first six indicators; and Rater #2 loads on the last six indicators. There is also a random measurement error, δ, specific to each indicator. In one third of the simulations the m parameter (that divides δ) is simply 1, so the error term does not suffer any transformation. In another third of the simulations the m parameter is 0.001, so we can see what happens to the estimates when the random errors are magnified. And in another third of the simulations the m parameter is 1,000, so we can see what happens to the estimates when the random errors shrink. Each δ is a combination of a normal random variable and a beta random variable, as follows:

δ1 = N(µ = 0, σ = 827) + Beta(α = 0.68, β = 0.78)

δ2 = N(µ = 0, σ = 4188) + Beta(α = 0.10, β = 0.72)

δ3 = N(µ = 0, σ = 228) + Beta(α = 0.58, β = 0.67)

δ4 = N(µ = 0, σ = 3237) + Beta(α = 0.50, β = 0.60)

δ5 = N(µ = 0, σ = 1965) + Beta(α = 0.06, β = 0.83)

δ6 = N(µ = 0, σ = 734) + Beta(α = 0.15, β = 0.86)

δ7 = N(µ = 0, σ = 1439) + Beta(α = 0.51, β = 0.46)

δ8 = N(µ = 0, σ = 2983) + Beta(α = 0.23, β = 0.40)

δ9 = N(µ = 0, σ = 1190) + Beta(α = 0.73, β = 0.21)

δ10 = N(µ = 0, σ = 112) + Beta(α = 0.26, β = 0.63)

δ11 = N(µ = 0, σ = 806) + Beta(α = 0.54, β = 0.37)

δ12 = N(µ = 0, σ = 4299) + Beta(α = 0.13, β = 0.46)
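Concretely, one draw of the simulated data can be produced in a few lines of R. This is only a minimal sketch under stated assumptions, not the replication code itself: the marxist dummy is a placeholder for the real-world country-level variable, and only the first two of the twelve indicators are written out.

set.seed(42)
n <- 112                              # number of countries
m <- 1                                # error scaling: 1, 0.001, or 1,000

democracy <- runif(n, 0, 20)          # "true" democracy levels
rater1    <- rnorm(n, 5, 5)           # Rater #1's idiosyncrasies
rater2    <- rnorm(n, 5, 15)          # Rater #2's idiosyncrasies

marxist <- rbinom(n, 1, 0.15)         # placeholder Marxist-Leninist dummy
rater1  <- rater1 + 5     * marxist   # additive bias: p1 = 5 points
rater2  <- rater2 + 0.025 * marxist   # additive bias: p2 = 0.025 points
# (for Protestantism and monarchy the bias is multiplicative instead, e.g.
#  rater1[prot == 1] <- rater1[prot == 1] * 1.30, where prot is the Protestant dummy)

delta1 <- rnorm(n, 0, 827)  + rbeta(n, 0.68, 0.78)   # normal + beta error
delta2 <- rnorm(n, 0, 4188) + rbeta(n, 0.10, 0.72)

indicator1 <- 14.12 * rater1 + 69.87 * democracy + delta1 / m
indicator2 <-  6.71 * rater1 + 68.57 * democracy + delta2 / m
# ... indicators 3-6 load on rater1 and indicators 7-12 on rater2,
#     with the loadings and error parameters listed above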

These modeling choices need justification. The number of factors - three - is the minimum we need to be able to fix each country's level of democracy and to evaluate Bollen and Paxton's assertions about bias direction. The number of indicators (twelve) is somewhat arbitrary; it could have been eight or fourteen, for instance. What matters is that for each factor there are at least three or four indicators, so that there are enough data to estimate the model. The loading coefficients (14.12, 69.87, etc.) are entirely arbitrary, except that they are always positive; otherwise the direction of the bias would change between the factors and the indicators.11,12 The parameters shown above - the distributional parameters of the factors, the factor loadings of each indicator, and the distributional parameters of the error terms - remain the same across all simulations. But at each simulation the factors and the random errors are redrawn, so the indicators (which are functions of both) change as well. Also, the m parameter, as explained above, assumes three different values (1, 0.001, and 1,000). The simulations are done separately for each of the four variables of interest, i.e., in any given simulation the hypothetical raters are biased toward only one of the four variables.

11. E.g., if Rater #1 is biased in favor of Marxism-Leninism, a negative factor loading would make the corresponding indicator be biased against Marxism-Leninism.
12. Initially I generated the random errors (the δs) as purely normal variables. But that resulted in excessive correlations between the errors, even when varying the standard deviations. That resulted in highly correlated indicators, which results in non-invertible matrices and makes SEM estimation impossible. That is why I add the beta component.

The basic procedure is: I generate the three factors (democracy, Rater #1, Rater #2), bias Rater #1 and Rater #2, generate the twelve random errors, generate the twelve indicators, estimate the structural equation model, save the two sets of factor scores assumed to represent ideological bias (Rater #1's and Rater #2's), regress each set on country-level variables (the same ones used to produce Table 2)13, and check whether the outcome of interest (i.e., the outcome analogous to that of Table 2) obtains. I repeat this process 1,000 times for each of the four variables of interest and for each of the p1-p2 pairs and c1-c2 pairs discussed above. I also repeat the process 1,000 times using the original, unbiased simulated factors, just to have a baseline. Finally, I repeat the whole process for each of the three values of m discussed before (1, 0.001, and 1,000). Thus in total there are 108 different specifications with 1,000 repetitions each.14 In SEM estimation identification is usually achieved by fixing some of the parameters. I do that by fixing the variances of the errors and the variances and covariances of the factors. Thus what changes from one simulation to the next are the estimated factor loadings, and consequently the factor scores and the coefficients obtained from regressing these factor scores on country-level variables. All 108,000 estimations converge, so I do not discard any of them (Paxton et al. 2001, 301-302).15

13. These country-level variables are real-world data, not simulated data.
14. (4 variables of interest) × (1 baseline + 8 values of p or c) × (3 values of m) × (1,000 repetitions) = 108,000 simulations.
15. The estimations take about five hours to run on a CPU with 2.4 GHz and 4 GB of memory, using the 'sem' package in R. On multi-core machines that time can be drastically reduced by parallelizing the simulations across the multiple cores (though that requires rewriting the code).
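The following R sketch shows one cell of this design (Marxism-Leninism, p1 = 5, p2 = 0.025, m = 1). To keep it self-contained and quick to run, the SEM step is replaced by a crude stand-in - the mean of each rater's six standardized indicators - and the error terms are collapsed into a single normal term; the actual simulations instead estimate the full structural equation model with the 'sem' package and use its factor scores. The loop structure, the bias injection, and the final regressions are the same; the marxist dummy is again a placeholder.

set.seed(1)
n <- 112; reps <- 1000; misleading <- 0
marxist <- rbinom(n, 1, 0.15)                       # placeholder country-level dummy
l1 <- c(14.12, 6.71, 18.31, 31.81, 95.69, 38.13)    # Rater #1 loadings
l2 <- c(70.70, 31.51, 90.83, 12.99, 53.06, 52.19)   # Rater #2 loadings
ld <- c(69.87, 68.57, 53.45, 28.21, 40.63, 97.69,
        21.17, 51.63, 26.09, 55.01, 63.13, 15.67)   # democracy loadings

for (r in 1:reps) {
  democracy <- runif(n, 0, 20)
  rater1 <- rnorm(n, 5, 5)  + 5     * marxist       # p1 = 5
  rater2 <- rnorm(n, 5, 15) + 0.025 * marxist       # p2 = 0.025
  X1 <- sapply(1:6, function(k) l1[k] * rater1 + ld[k]     * democracy + rnorm(n, 0, 500))
  X2 <- sapply(1:6, function(k) l2[k] * rater2 + ld[k + 6] * democracy + rnorm(n, 0, 500))
  f1 <- rowMeans(scale(X1))                         # stand-in for Rater #1's factor scores
  f2 <- rowMeans(scale(X2))                         # stand-in for Rater #2's factor scores
  b1 <- summary(lm(f1 ~ marxist))$coefficients["marxist", ]
  b2 <- summary(lm(f2 ~ marxist))$coefficients["marxist", ]
  # misleading outcome: significant coefficients with opposite signs (cf. Table 2)
  if (b1[4] < 0.10 && b2[4] < 0.10 && sign(b1[1]) != sign(b2[1]))
    misleading <- misleading + 1
}
misleading / reps                                   # share of misleading repetitions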

3.3 Results

The results are summarized in Tables 3 and 4 below.


Table 3. Simulation results for Marxism-Leninism and Catholicism - frequency of misleading resultsa

                        Marxism-Leninism             Roman Catholicism
                        m=1    m=0.001  m=1,000      m=1    m=0.001  m=1,000
p1=20;  p2=0.025        189    212      216          816    821      833
p1=15;  p2=0.025        156    167      167          796    786      786
p1=10;  p2=0.025         94     86       98          605    569      573
p1=7;   p2=0.025         50     47       67          355    341      357
p1=5;   p2=0.025         15     31       31          207    211      250
p1=3;   p2=0.025         15     18       11          122    115      127
p1=1;   p2=0.025          9      3        7           68     76       73
p1=0.5; p2=0.025          6      4        2           57     47       68
no bias                   5      3        2           41     48       46

a For Marxism-Leninism the misleading result is a positive, statistically significant coefficient for Rater #1 combined with a negative, statistically significant coefficient for Rater #2. For Catholicism the misleading result is a positive, statistically significant coefficient for Rater #1 combined with a non-significant coefficient for Rater #2. Statistical significance is defined based on a p-value lower than 0.10.

Table 4. Simulation results for Protestantism and monarchy - frequency of misleading results (except for the "no bias" row, which shows correct results)a

                          Protestantism                monarchy
                          m=1    m=0.001  m=1,000      m=1    m=0.001  m=1,000
no bias                   791    816      816          794    795      791
c1=30%;  c2=35%           737    697      706          661    655      635
c1=50%;  c2=55%           617    587      589          523    503      532
c1=70%;  c2=75%           542    499      502          400    354      390
c1=130%; c2=135%          265    279      288          135    110      135
c1=150%; c2=155%          214    207      205          103     94       79
c1=170%; c2=175%          164    177      138           51     55       39
c1=190%; c2=195%          119    122      114           40     35       38
c1=230%; c2=235%           71     82       74           22     19        9

a For both variables the misleading result is a combination of non-significant coefficients for both Rater #1 and Rater #2. Statistical significance is defined based on a p-value lower than 0.10. Unlike the other rows, the "no bias" row does not show misleading results: it simply shows how often we obtain no evidence of bias when there is indeed no bias.

The results corroborate the suspicions raised before. For Roman Catholicism and Marxism-Leninism, if the two raters are biased in the same direction but to different degrees we often obtain the same results we saw in Table 2. If Rater #1's factor gets a 3-point boost when it comes to Roman Catholic countries and Rater #2's factor gets only a 0.025-point boost, we obtain misleading results 12.2% of the time (122 simulations out of 1,000). As the difference becomes larger, so does the frequency of misleading results: 20.7% if Rater #1's bonus is 5 points, and 81.6% if Rater #1's bonus is 20 points. Granted, a bonus of 20 points is unrealistic. Rater #1's factor is generated as a normal distribution with mean 5 and standard deviation 5, which means that about 95% of it lies in the [-4.8, 14.8] interval. A bonus of 20 would thus imply a rather passionate rater - one that is willing to rate a North Korea as a Sweden merely on the grounds of that (hypothetical) North Korea being Roman Catholic. But a bonus of 3 or 5 points is perfectly imaginable - and it is enough to yield misleading results too often (more than 10% and more than 20% of the time, respectively).

For Marxism-Leninism the difference in the magnitude of the bias must be somewhat extreme: a "bonus" of at least 15 points from Rater #1 and of only 0.025 points from Rater #2. Below 15 points we obtain misleading results less than 10% of the time. A bonus of 15 points sounds unrealistic given that Rater #1's factor follows a normal distribution with mean 5 and standard deviation 5. For Protestantism we find that when the two hypothetical raters are biased in the same direction and to similar degrees, we are bound to find no bias whatsoever in our estimations. When the bias is in the vicinity of 30% we find non-significant coefficients 73.7% of the time - which would mislead the researcher into thinking that neither rater is biased. Even when the bias is as extreme as 170% we still obtain non-significant coefficients 16.4% of the time. For monarchy the bias only becomes "visible" when it reaches 170% or more. In other words, the bias must be 170% or higher for us to obtain misleading results less than 10% of the time. It is clear, on the other hand, that we do obtain the correct result when none of the hypothetical raters are biased. Here the misleading result would be evidence of bias when there is in fact no bias. For Marxism-Leninism that happens less than 1% of the time. For Roman Catholicism that happens less than 5% of the time. For Protestantism that happens less than 10% of the time. And for monarchy that happens less than 3% of the time. That provides little solace though - with real-world data we cannot know whether the lack of statistical significance means unbiasedness or whether it means that all raters are biased in similar ways.

3.4 Summary

What do all these results tell us about Bollen and Paxton's findings? They tell us two things. First, when Bollen and Paxton find bias, all we can assert is that at least one of the raters is biased, but we cannot know which one(s) or in which direction(s). Consider Protestantism, for instance. Bollen and Paxton claim that Banks is biased in favor of Protestant countries while Gastil and Sussman are unbiased (with 1980 data). But it may be the case that Gastil and Sussman are biased against Protestant countries while Banks is unbiased. Or perhaps all three are biased in favor of Protestant countries, only Banks more so than Gastil and Sussman. Or, still, perhaps all three are biased against Protestant countries, only Banks less so than Gastil and Sussman. Our simulations show that in any of these scenarios we might obtain the same results that Bollen and Paxton did. Second, when Bollen and Paxton do not find bias, there is a good chance that there is bias. We know that because when our simulated raters are biased in the same direction and to similar degrees, our results often suggest no bias; depending on the magnitude of the biases, we get wrong results over 80% of the time. In other words, when all raters are biased in a similar way the bias becomes "invisible". These same warnings apply to other studies that claim to have uncovered bias in existing measures of democracy. Consider Steiner (2012), for instance. He regresses Freedom House data on other democracy measures and then checks for correlations between the residuals and a number of foreign policy indicators (voting behavior in the UN, alliances, rivalries, foreign assistance, and trade). He finds the expected correlations and concludes that the Freedom House "rates countries that have closer political ties and affinities with the U.S. [...] as more democratic" (4).

But what if the Freedom House is unbiased but all other measures are biased against US-friendly countries? Or what if all measures of democracy are biased against US-friendly countries, only the Freedom House less so than the others? In all these scenarios Steiner might observe exactly the same result, so his conclusions are completely unwarranted.16 Unless we have an unbiased measure of democracy, any statistical attempt to uncover the direction of ideological biases is futile.

4. Conclusion

We cannot uncover ideological bias a posteriori. All we can assert today is that at least some of our democracy measures are contaminated by ideological bias. We cannot say which ones and we cannot say anything about the direction or magnitude of the biases (these are "known unknowns", to use Donald Rumsfeld's famous expression); and it is possible that other biases exist that we have not uncovered yet ("unknown unknowns"). In sum, we are in the dark. Because we are in the dark many democracy-related arguments rest on shaky ground. Are economic freedom and democracy associated or is that apparent association an artifact of our democracy measures being biased in favor of economic freedom? Are parliamentary democracies more stable than presidential ones or do our democracy measures favor parliamentary systems? If we do not know which democracy measures are biased, or in what ways, how can we address these questions? The implication is that we need better democracy measures. In particular, we need a democracy measure whose data-generating process is transparent and replicable, so that at the very least we can know something about how exactly the measure is biased.

16. It is perfectly plausible that the Freedom House has an anti-market (and, as a consequence, perhaps an anti-US) bias, since it includes "socioeconomic rights" and "freedom from gross socioeconomic inequalities" among its subcomponents (Munck and Verkuilen 2002, 9). Alternatively, it is perfectly plausible that Steiner's results simply reveal that countries with closer ties to the US are more democratic, for whatever reasons.

The crux of the matter is that all existing indices we have today - be it the Freedom House, the CNTS, or the Polity (Marshall, Jaggers, and Gurr 2013) - rely on country experts checking boxes on questionnaires. We do not observe what boxes those experts check, or why. The process is opaque, which makes it easy for country experts to boost the scores of countries that adopt the "correct" policies. Coding rules help, but still leave too much open for interpretation. Consider this excerpt from the Polity IV handbook: "If the regime bans all major rival parties but allows minor political parties to operate, it is coded here. However, these parties must have some degree of autonomy from the ruling party/faction and must represent a moderate ideological/philosophical, although not political, challenge to the incumbent regime." (p. 73). How do we measure autonomy? Can we always observe it? What is "moderate"? Clearly it is not that hard to smuggle ideological contraband into democracy scores. This is not to say that there have not been innovations in the field of democracy measurement. The Varieties of Democracy project, begun in 2010, is a group effort that seeks to build a new, fine-grained measure consisting of 33 subcomponents, each subdivided into dozens of more specific indicators.17 Pemstein, Meserve and Melton (2010), in turn, treat democracy as a latent variable and use a multirater ordinal probit model to extract that latent variable from twelve different measures (including the Polity and the Freedom House); they call the resulting measure the Unified Democracy Scores (UDS). These are interesting developments, but both fall short of addressing the issue of bias. The Varieties of Democracy project may give us fine-grained democracy indicators but just like existing measures these indicators will be produced by country experts opaquely checking boxes in a questionnaire. The UDS, in turn, does a great job at mitigating random error, but as the authors themselves acknowledge, the UDS cannot fix systematic error.

17. See Coppedge et al. (2001) and https://v-dem.net/

Hence neither initiative addresses the problem of bias. One possible solution would be to embrace the text-as-data approach (see Grimmer and Stewart [2013] for an overview) - for instance, by using some automated algorithm to extract regime-related information from news articles. That would make the data-generating process transparent and replicable (and also much cheaper, as we would be dispensing with country experts).

Automated Democracy Scores

Abstract

In this paper I use automated text analysis to create the first machine-coded democracy index, which I call Automated Democracy Scores (ADS). I produce the ADS using the well-known Wordscores algorithm (created by Laver, Benoit, and Garry [2003]) and 42 million news articles from 6,043 different sources. The ADS cover all independent countries in the 1993-2012 period. Unlike the democracy indices we have today, the ADS are replicable, have standard errors small enough to actually distinguish between cases, and avoid contamination by human coders' ideological biases; and a simple (though computationally demanding) extension of the method would yield daily data and real-time data. I create a website where anyone can replicate and tweak the data-generating process by changing the parameters of the underlying model (no coding required): www.democracy-scores.org.

1. Introduction

In this paper I use automated text analysis to create the first machine-coded democracy index, which I call Automated Democracy Scores (ADS). The basic idea behind the ADS is simple. News articles on, say, North Korea or Cuba contain words like "censorship" and "repression" more often than news articles on Belgium or Australia. Hence news articles contain regime-related information (even if we disregard word order and treat each article as a "bag of words"). We can quantify that information to build a democracy index. I produce the ADS using the Wordscores algorithm, developed in Laver, Benoit, and Garry (2003), and 42 million news articles from 6,043 different sources.

The ADS cover all independent countries in the 1993-2012 period. Unlike the democracy indices we have today, the ADS are replicable, have standard errors small enough to actually distinguish between cases, and avoid contamination by human coders' ideological biases; and a simple (though computationally demanding) extension of the method would yield daily data and real-time data. The next section explains why we need a new democracy index in the first place. The remaining sections explain the method in detail; show the results and how they compare to existing democracy data; and discuss some future extensions.

2. Why do we need yet another democracy index?

There are at least twelve democracy indices today (Pemstein, Meserve, and Melton 2010). They all draw to some extent from Dahl's (1972) conceptualization: democracy as a mixture of competition and participation. But they differ markedly in how they operationalize the concept - i.e., they differ in what empirical phenomena they pick as democracy manifestations; in how they aggregate these different empirical phenomena to produce a democracy scale; and in whether they model democracy as a categorical or continuous variable (Munck and Verkuilen 2002). In light of such diversity, do we really need yet another democracy index? I argue that we do, for three reasons. First, because the democracy indices we have today do not provide adequate measures of uncertainty. Without a good uncertainty measure we cannot know whether two countries are equally democratic or not, or whether a given country has become more (or less) democratic over time. That is, we cannot do descriptive inference. Moreover, without a good uncertainty measure we cannot do causal inference when democracy is one of the regressors. As Treier and Jackman (2008) warn, "whenever democracy appears as an explanatory variable in empirical work, there is an (almost always ignored) errors-in-variables problem, potentially invalidating the substantive conclusions of these studies" (203).

Yet the two most popular indices - the Polity (Marshall, Gurr, and Jaggers 2013) and the Freedom House (Freedom House 2013) - only give us point estimates, without any measure of uncertainty. That prevents us from knowing, say, whether Uruguay (Polity score = 10) is really more democratic than Argentina (Polity score = 8) or whether the uncertainty of the measurement process is sufficient to make them statistically indistinguishable. Only two indices come with uncertainty measures: Treier and Jackman's (2008) and Pemstein, Meserve, and Melton's (2010). Treier and Jackman (2008) treat democracy as a latent variable and use an item-response model to extract it from Polity indicators. Treier and Jackman (2008) provide both the point estimates (the means of the marginal posterior distributions of the latent democracy variable) and confidence intervals (quantiles of the marginal posterior distributions).18 Pemstein, Meserve, and Melton (2010) also treat democracy as a latent variable. They use a multirater ordinal probit model to extract that latent variable from twelve different measures (including the Polity and the Freedom House). And, like Treier and Jackman (2008), they also provide both point estimates (posterior means) and confidence intervals (posterior quantiles). They call their index Unified Democracy Scores (UDS). Both indices are big improvements over the Polity and Freedom House.

18. Armstrong (2011) does something similar, but using Freedom House indicators instead. His goal is different though - he extracts latent variables not to create a new democracy index but to investigate some properties of the Freedom House indicators. Also, he only reports the results for the 50 most populous countries and the full set of results is not available online.

It is hard to understand, in particular, why the UDS have not become the default democracy index in political science: the UDS summarize almost all pre-existing indices, have broad time and country coverage, and are freely available online.19 Even if one is not interested in standard errors, why arbitrarily pick this or that individual democracy index when we can rely on the collective wisdom of all indices, condensed in the UDS? That political scientists continue to use the Polity and Freedom House is probably due to inertia.20 If we do want standard errors though (and we should, for the sake of good descriptive and causal inference), we have a problem: both Treier and Jackman's (2008) index and the UDS offer standard errors that are too large to be useful. In Treier and Jackman's (2008) data 70 of the 153 countries are statistically indistinguishable from the United States (in the year 2000 - the only year they report). In the UDS data 70% of the countries are all statistically indistinguishable from each other (in the year 2008 - the last year in the UDS dataset); pairs as diverse (regime-wise) as Denmark and Suriname, Poland and Mali, or New Zealand and Mexico have overlapping confidence intervals. In short, existing democracy indices either have no standard errors or they have standard errors too large to be useful. That prevents us from doing descriptive inference and from knowing the effect of democracy on other variables. The second reason why we need a new democracy index is bias. Using structural equation modeling, Bollen and Paxton (2000) evaluate two popular sources of democracy data - the Freedom House and Arthur Banks' Cross-National Time Series Data Archive (Banks [1971], updated through 1988) - and show that they are contaminated by ideological bias. Raters have policy preferences and boost the democracy scores of countries that adopt the "correct" policies.

19. http://www.unified-democracy-scores.org/
20. A less noble possibility is cherry-picking: perhaps researchers try different indices and pick the one that yields the "correct" results.

Hence we cannot reliably use existing democracy measures to estimate the impact of democracy on policy (or vice-versa). For instance, how can we assess the impact of democracy on welfare spending when our measure of democracy is partly based on welfare spending? The democracy measures we have today make our empirical tests circular. When we regress welfare spending on democracy we usually assume that we are regressing y on x, but in reality we may be regressing y on z = f(x, y). Bollen and Paxton (2000) only evaluate two democracy datasets, but all democracy data rely on country experts and for any given country there are only so many experts. As Munck and Verkuilen (2002) warn, "for all the differences that go into the construction of these indices, they have relied, in some cases quite heavily, on the same sources and even the same precoded data." (29). Hence for all we know the Polity data are no less biased than the Freedom House data or the CNTS data. True, these democracy measures correlate highly, but that may merely indicate that they are all biased in similar ways (Munck and Verkuilen 2002). Unfortunately, whatever biases exist in those indices carry over to Treier and Jackman's (2008) and to the UDS. The latent-variable approach mitigates coders' random errors, but not coders' systematic errors. As Pemstein, Meserve, and Melton (2010) put it, the UDS rely on the assumption that "raters perceive democracy levels in a noisy but unbiased fashion" (10). The third reason why we need a new democracy index is replicability. Human-coded indices like the Polity and the Freedom House (and indices based on them, like Treier and Jackman's [2008] and the UDS) rely on country experts checking boxes on questionnaires. We cannot see what boxes they are checking, or why; all we observe are the final scores. The process is opaque and at odds with the increasingly demanding standards of openness and replicability of the field. Clearly, any of these three reasons alone justifies the creation of a new democracy index.


3. News articles

Our first task is to select the news articles to be used. Picking this or that news source - say, The New York Times or The Wall Street Journal - would not do. The reason is that there would not be enough text. Countries like the United States and Russia are on the news all the time, but countries like Uruguay and Cambodia only get occasional coverage and countries like Tuvalu and Kiribati are almost never mentioned. A single newspaper or magazine, or even a handful thereof, would not provide the amount of text we need to produce reliable democracy scores for all 196 independent countries in the world. Hence I use a total of 6,043 news sources. These are all the news sources in English available on LexisNexis Academic, which is an online repository of journalistic content. The list includes American newspapers like The New York Times, USA Today, and The Washington Post; foreign newspapers like The Guardian and The Daily Telegraph; news agencies like Reuters, Agence France Presse (English edition), and Associated Press; and online sources like blogs and TV stations' websites. I use LexisNexis's internal taxonomy to identify and select articles that contain regime-related news. In particular, I choose all articles with one or more of the following tags: "human rights violations" (a subtag of "crime, law enforcement and corrections"); "elections and politics" (a subtag of "government and public administration"); "human rights" (a subtag of "international relations and national security"); "human rights and civil liberties law" (a subtag of "law and legal system"); and "censorship" (a subtag of "society, social assistance and lifestyle").21

21It would be interesting to know how the results change if we select different topic tags, but unfortunately that is no longer possible: on 12/23/2013 LexisNexis changed its user interface and the dozens of political tags and subtags that existed before are now collapsed into a single "Government & Politics" tag, which is too broad for our purposes here.

LexisNexis' news database covers the period 1980-present,22 so in theory the ADS could cover that period as well. LexisNexis provides search codes for all countries that exist today - e.g., #GC508# for Afghanistan and #GC342# for Mexico. That way we can search for news articles on a specific country and be sure that all results will turn up - even the ones that do not mention the name of the country (for instance, many articles use the name of the country's capital when they mean the country's government - as in "Moscow retaliated by canceling the summit"). Unfortunately, however, LexisNexis provides no search codes for countries that no longer exist. To search for articles on the Soviet Union, for instance, we would need to search for the name(s) of the country (Soviet Union, USSR), its derivatives (Soviet), the name of the capital (Moscow), etc - anything that might tell us that the article refers to the Soviet Union. Clearly that would not work. What if the article does not mention any of those terms? And what if the country has a name that is also a proper noun (like Turkey)? We would have unreliable results. Thus we can only reliably search the 1992-2012 period. Other than Yugoslavia, no country has ceased to exist since 1992, so we have search codes for basically everything. (Naturally, many countries were created in that period, but that is not a problem - the dataset will simply start in 2008 for Kosovo, in 2002 for East Timor, and so on). That selection - i.e., regime-related news, all countries that exist today, 1992-2012 - results in a total of about 42 million articles (around 4 billion words total), which I then organize by country-year.23 To help reduce spurious associations I remove proper nouns24 (in a probabilistic way)25.

22Actual coverage varies by news source. 23A small proportion of the articles (about 0.05%) is left out. When a search produces more than 3,000 results LexisNexis only returns the first 1,000. Whenever possible I overcome that problem by searching for smaller periods of time. But in a few cases even searches for a single day produce more than 3,000 results. I could not figure out what criteria LexisNexis uses to select the 1,000 results it returns (I asked them by email but they never replied). Thus to avoid any selection biases I just leave all results out in those cases. (The cases are: Pakistan 5/2/2011; Pakistan 5/3/2011; Afghanistan 10/8/2001; Afghanistan 10/9/2001; United Kingdom 7/8/2005; and United States, several dates between 2003 and 2012.) I realize that doing this may introduce selection biases of its own, but at least I am creating the selection biases myself, whereas I have no idea how LexisNexis selects those 1,000 results. In any case, it is doubtful that excluding 0.05% of the news articles will have any noticeable impact on the results.

For each country-year I merge all the corresponding news articles into a single document and transform the document into a term-frequency vector - i.e., a vector that contains the absolute frequency of each word. I then merge all term-frequency vectors into one big term-frequency matrix. Rows represent words and columns represent country-years, so each cell gives us the absolute frequency of a given word in the news articles corresponding to a given country-year.
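To make the construction of the term-frequency matrix concrete, here is a minimal Python sketch. It assumes the articles for each country-year have already been merged into a single lowercased string; the two toy documents are invented for illustration (the real pipeline processes hundreds of gigabytes of text in chunks, as described in the footnotes).

```python
from collections import Counter

# Illustrative input: one merged (lowercased, proper-noun-free) document per country-year.
documents = {
    ("Belgium", 2012): "the election was free and the press reported freely",
    ("North Korea", 2012): "the censorship continued and repression of the press increased",
}

# Absolute word frequencies for each country-year (the term-frequency vectors).
counts = {key: Counter(text.split()) for key, text in documents.items()}

# Term-frequency matrix: rows are words, columns are country-years.
columns = sorted(counts, key=str)
vocabulary = sorted({word for c in counts.values() for word in c})
matrix = [[counts[col][word] for col in columns] for word in vocabulary]

for word, row in zip(vocabulary, matrix):
    print(word, row)
```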

4. Algorithm

There are several automated ways to extract data from text (see Grimmer and Stewart [2013] for an overview). The particular method I use is the Wordscores algorithm, created by Laver, Benoit, and Garry (2003) - henceforth LBG -, from which this section draws heavily. The next paragraphs explain the algorithm in detail, but here is the gist of it: we manually score some documents - called "reference" documents or "training" documents (Manning, Raghavan, and Schütze 2008); the algorithm then "learns" from the reference documents and uses that knowledge to score all other documents - called "virgin" documents.

24As we will see later the scoring algorithm works by associating certain words with certain qualities. But we want those words to refer to general phenomena like torture, repression, and censorship, not to specific people or places. We do not want, for instance, "Washington" being associated with high levels of democracy just because the word appears frequently on news stories featuring a highly democratic country. Removing proper nouns helps avoid that. 25I cannot possibly read 42 million articles. And we cannot simply remove all capitalized words, as that would eliminate the first word of every sentence, even if it is not a proper noun. Hence I apply the following rule: if all occurrences of the word are capitalized then that is probably a proper noun and therefore it is removed. (For each country-year I merge all the corresponding news articles into a single document and process each document in chunks of 10MB - to reduce memory usage -, so the check is restricted to the same 10MB chunk.)
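The capitalization rule described in footnote 25 can be sketched as follows; this simplified illustration skips the 10MB chunking and uses a deliberately naive tokenizer.

```python
import re

def drop_probable_proper_nouns(text):
    """Remove words whose every occurrence in the text is capitalized."""
    tokens = re.findall(r"[A-Za-z]+", text)
    seen_lowercase = {t.lower() for t in tokens if t[0].islower()}
    kept = [t for t in tokens if t[0].islower() or t.lower() in seen_lowercase]
    return " ".join(kept)

sample = "Censorship in Washington is rare, but the censorship laws changed."
print(drop_probable_proper_nouns(sample))
# "Washington" never appears uncapitalized, so it is dropped; "Censorship"
# survives because a lowercase occurrence ("censorship") exists elsewhere.
```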

So far Wordscores has only been used to measure party ideology (from party manifestos and legislative speeches). To the best of my knowledge, this is the first time Wordscores - or any other method of automated text analysis - is used to measure democracy. The first step is to select the reference cases. In other words, we need to pick some of the 4,067 country-years we have here to serve as the baseline from which the machine will "learn". Ideally the reference set must span the entire regime scale. If we only feed the algorithm, say, highly democratic cases, then the machine will not learn what words are associated with middle-of-the-road cases or authoritarian cases. To ensure that the reference set is broad enough I pick all country-years from 1992, the first year for which we have news articles (see previous section). Thus the ADS only cover the 1993-2012 period even though we have news articles from 1992 as well. The vast majority of multivariate analyses that use some measure of democracy (Polity, Freedom House, etc) use pretty recent data, rarely going farther back in time than the 1970s, so the ADS should serve most applied research well. The second step is to give each reference case a score. For us that means assigning a democracy score to each country-year from 1992. I follow LBG and extract these reference scores from an existing index.26 In particular, I choose Pemstein, Meserve, and Melton's (2010) UDS, which I mentioned before. The UDS have data on 184 countries for the year 1992. Hence we have 184 reference documents and 3,883 (4,067 - 184) virgin documents.

The third step is to compute the word scores. Let $F_{wr}$ be the relative frequency of word $w$ in reference document $r$. The probability that we are reading document $r$ given that we see word $w$ is then $P(r|w) = F_{wr} / \sum_r F_{wr}$. We let $A_r$ be the a priori position of reference document $r$ and compute each word score as $S_w = \sum_r P(r|w) \cdot A_r$.

26Just to be clear, LBG were measuring party ideology, not democracy, so obviously the indices they use have nothing to do with the one I use here.

The fourth step is to use the word scores to compute the scores of the remaining documents - the "virgin" documents. Let $F_{wv}$ be the relative frequency of word $w$ in virgin document $v$. The score of virgin document $v$ is then $S_v = \sum_w F_{wv} \cdot S_w$. Intuitively, the algorithm uses the training documents to learn how word usage differs across the reference cases - for instance, it learns that the word "censorship" is more frequent the lower the democracy score of the document. The algorithm then uses that information to produce word scores (hence the name of the method), and later uses the word scores to score the virgin documents. A concrete example may help. Suppose that we choose North Korea 2012 and Belgium 2012 as our reference cases and assign them democracy scores of 0 and 10 respectively. We merge all news articles on North Korea in 2012 into a single document and merge all news articles on Belgium in 2012 into another document. Suppose now that the word "censorship" accounts for 15% of all the words in the North Korea document and for 1% of the words in the Belgium document. If we see the word "censorship" the probability that we are reading the North Korea document is 0.15/(0.15 + 0.01) = 0.9375 and the probability that we are reading the Belgium document is 0.01/(0.15 + 0.01) = 0.0625. The score of the word "censorship" is thus (0.9375 · 0) + (0.0625 · 10) = 0.625. To score a virgin document we simply multiply each word score by its relative frequency and sum across.
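The third and fourth steps can be written down in a few lines of Python. This is a minimal sketch of the LBG formulas, not the implementation used to produce the ADS; the reference frequencies and scores are the ones from the North Korea/Belgium example above, plus an invented "election" frequency.

```python
# Relative word frequencies in each reference document (illustrative numbers only).
ref_freqs = {
    "north_korea_2012": {"censorship": 0.15, "election": 0.01},
    "belgium_2012": {"censorship": 0.01, "election": 0.10},
}
ref_scores = {"north_korea_2012": 0.0, "belgium_2012": 10.0}  # a priori positions A_r

# Step 3: word scores S_w = sum over r of P(r|w) * A_r, with P(r|w) = F_wr / sum_r F_wr.
vocabulary = {w for freqs in ref_freqs.values() for w in freqs}
word_scores = {}
for w in vocabulary:
    total = sum(freqs.get(w, 0.0) for freqs in ref_freqs.values())
    word_scores[w] = sum((freqs.get(w, 0.0) / total) * ref_scores[r]
                         for r, freqs in ref_freqs.items())

# Step 4: virgin score S_v = sum over w of F_wv * S_w (unscored words are skipped).
def score_virgin(virgin_freqs):
    return sum(f * word_scores[w] for w, f in virgin_freqs.items() if w in word_scores)

print(word_scores)                                     # "censorship" scores 0.625
print(score_virgin({"censorship": 0.05, "election": 0.06}))
```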

The fifth step is the computation of uncertainty measures for the point estimates. LBG propose the following measure of uncertainty: $\sqrt{V_v} / \sqrt{N^v}$, where $V_v = \sum_w F_{wv}(S_w - S_v)^2$ and $N^v$ is the total number of virgin words. The $V_v$ term captures the dispersion of the word scores around the score of the document. Its square root divided by the square root of $N^v$ gives us a standard error, which we can use to assess whether two cases are statistically different from each other.
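A sketch of the fifth step, reusing the word scores from the previous snippet; the value of n_virgin_words (the total number of scored words in the virgin document) is invented here.

```python
import math

def virgin_standard_error(virgin_freqs, word_scores, n_virgin_words):
    """LBG uncertainty: sqrt(V_v) / sqrt(N_v), where V_v is the frequency-weighted
    dispersion of the word scores around the document score."""
    s_v = sum(f * word_scores[w] for w, f in virgin_freqs.items() if w in word_scores)
    v_v = sum(f * (word_scores[w] - s_v) ** 2
              for w, f in virgin_freqs.items() if w in word_scores)
    return math.sqrt(v_v) / math.sqrt(n_virgin_words)

word_scores = {"censorship": 0.625, "election": 9.09}  # carried over from the sketch above
print(virgin_standard_error({"censorship": 0.05, "election": 0.06}, word_scores, 5000))
```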

The sixth and final step is the re-scaling of the virgin scores. In any given text the most frequent words are "the", "of", "and", etc, which are usually of no interest. Because these words have similar relative frequencies across all reference texts they will have centrist scores. For instance, if "the" accounts for 10% of our (hypothetical) North Korea document (whose manually assigned score is 0) and for 10% of the (also hypothetical) Belgium document (whose manually assigned score is 10), the score of "the" will be 5, exactly in the middle of the scale. That makes the scores of the virgin documents "bunch" together around the middle of the scale; their dispersion is just not in the same metric as that of the reference texts. In LBG's estimations of party ideology in Britain, the scores of the reference documents range from 8.21 to 17.21, but the scores of the virgin documents range from 10.21 to 10.73. That is not a problem per se, as the scores of the virgin documents are perfectly comparable to each other. But they are not comparable to the scores of the reference documents, whose dispersion is higher, and that may be a problem depending on the intended goals.27

To correct for the "bunching" of virgin scores, LBG propose re-scaling these as follows: $S_v^* = (S_v - \bar{S}_v)(\sigma_r / \sigma_v) + \bar{S}_v$, where $S_v$ is the raw score of virgin document $v$, $\bar{S}_v$ is the average raw score of all virgin texts, $\sigma_r$ is the standard deviation of the reference scores, and $\sigma_v$ is the standard deviation of the virgin scores. This transformation expands the raw virgin scores by making them have the same standard deviation as the reference scores.
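And a sketch of the sixth step; the virgin and reference scores below are invented, loosely echoing the British example mentioned above.

```python
import statistics

def rescale(virgin_scores, reference_scores):
    """LBG re-scaling: S_v* = (S_v - mean(S_v)) * (sd_ref / sd_virgin) + mean(S_v)."""
    mean_v = statistics.mean(virgin_scores)
    ratio = statistics.pstdev(reference_scores) / statistics.pstdev(virgin_scores)
    return [(s - mean_v) * ratio + mean_v for s in virgin_scores]

# Bunched virgin scores are stretched to the dispersion of the reference scores.
print(rescale([10.21, 10.40, 10.73], [8.21, 12.50, 17.21]))
```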

27We could of course remove irrelevant words, but identifying relevant and irrelevant words is not always so clear-cut. For instance, as I mention later, Monroe, Colaresi and Quinn (2008) find several non-obvious partisan words - like “baby” (Republican) and “bankruptcy” (Democrat) - in their analysis of legislative speeches in the US Senate. Thus if we exclude words a priori we risk throwing away important information. Moreover, removing any words would require knowledge of the language in which the text is written. That would defeat one of the biggest advantages of the method: the fact that it is language-blind (all we need to know are the positions of the reference documents).

Martin and Vanberg (2008) propose an alternative re-scaling formula, but Benoit and Laver (2008) show that the original formula is more appropriate when there are many virgin cases and few reference cases, which is the case here. The final output is a dataset comprising all independent countries from 1993 to 2012, which makes for a total of 3,883 country-years. For each country-year three statistics are provided: the ADS point estimate, the ADS 95% lower bound, and the ADS 95% upper bound. Initially I considered having not only a democracy scale but also subcomponents, à la Polity. But of the 805 JSTOR-indexed articles that cite the Polity data over the last ten years, only a handful mention (and even fewer use) the Polity subcomponents (Pemstein, Meserve, and Melton [2010] also note this point). Hence there is simply not enough demand to justify breaking down the ADS into more specific items.28 And, rich in regime-related information as our news articles may be, they nonetheless become progressively less informative as we move from "democracy" down to, say, "turnover percentage in the legislature". The more specific we get, the higher the noise-to-signal ratio. Wordscores is the best-known text-scaling method in political science. It has been subject to extensive scrutiny over the years and generally found to perform well, as long as the texts are not too short29 and share enough vocabulary30. Klemmensen, Hobolt, and Hansen (2007), for instance, use Wordscores to measure party ideology, with Danish manifestos and speeches, and find that the method yields scores that correlate highly with those produced independently by human coders.

28Thus the ADS are bound to displease, for instance, Coppedge et al. (2011), who call for “thicker” measures of democracy. 29If the texts are too short then there is simply not enough data to produce meaningful results. How short is too short is unclear though: all else equal more is better, but a 5,000-word text may contain more informative words than a 10,000-word text. 30In the extreme case where the vocabulary of the reference texts and the vocabulary of the virgin texts are disjoint, we cannot even produce any estimates.

Beauchamp (2010) applies Wordscores to US Senate speeches and, like Klemmensen, Hobolt, and Hansen (2007), finds that the estimates correlate highly with human-coded ones. Lowe (2008) notes that Wordscores lacks an explicit model for the data-generating process of the word frequencies. But he argues that, as long as the word frequencies follow an ideal point structure,31 Wordscores should produce good estimates - and he notes that "The empirical success of the method suggests that these assumptions may be reasonable." (370).

5. Advantages over existing measures

The ADS are intended to address the three issues discussed earlier: standard errors, ideological bias, and replicability.

Small standard errors

As shown above, with Wordscores the total number of virgin words goes in the denominator of the formula for the standard errors. Hence the more texts we have, the smaller the standard errors will be. Here we have 42 million news articles, so we should have standard errors small enough to distinguish even between very similar cases - say, between Sweden and Norway. As we will see later, that is indeed what happens. The ADS are the first democracy index whose uncertainty measure captures such fine-grained distinctions.

31Lowe (2008) proposes that we interpret Wordscores as an approximation to correspondence analysis - which relies on the assumption of ideal point structure.


Less ideological bias

The ADS are not immune to contamination by ideological bias. First, the journalists and editors behind news articles have their own policy preferences. And second, at least in the case of supervised learning algorithms (like Wordscores), someone must choose and score the reference cases. But the scope for manipulation is more restricted in the ADS. Journalists and editors have their policy preferences but there is a lot more ideological diversity among journalists (contrast The New York Times and The Wall Street Journal, for instance) than among political scientists, the vast majority of whom are somewhere on the left of the ideology spectrum (Klein and Stern 2005; Maranto, Hess, and Redding 2009; Maranto and Woessner 2012). Combining 6,043 different news sources, as we do here, surely goes a long way toward mitigating ideological bias. The reference scores do offer a backdoor for manipulation but, unlike the anonymous country experts who fill out the Polity and Freedom House questionnaires, the researcher who assigns reference scores does so in the open and thus bears reputational costs in case of mischief. The transparency of the process creates an incentive structure that rewards honesty. As Schedler (2012) puts it, "The key to accountable expert measurement [...] is publicity. Rather than treating experts the same way as we treat survey subjects, whom we grant full anonymity, experts need to assume public responsibility for their measurement decisions." True, we are using the UDS for the reference scores, and the UDS themselves must be contaminated by ideological bias, as discussed before. But the ADS do not inherit that bias. We are using regime-related news, so the vast majority of policy and economic discussions is left out.

Hence the biases of the UDS become, by and large, random noise in the ADS. Intuitively, imagine that the UDS are biased in favor of countries with generous welfare, like Sweden. The UDS of these countries will be "boosted" somewhat. But to the extent that the news articles we selected are focused on political regime and not on welfare policy, the algorithm will not associate those boosted scores with welfare-related words and hence the word scores will not be biased. They will be less efficient, as (ideally) no particular words will be associated with those boosted scores, but that is it.

Replicability

The process behind the ADS is fully transparent. All the choices (reference cases and scores) are visible to the public and every part of the process can be replicated exactly. Anyone with access to LexisNexis can download the same articles, apply the same algorithm, and verify the results. There are practical obstacles though: downloading 42 million articles is time-consuming and the computations require powerful machines and non-trivial programming.32 Therefore I created a website that facilitates the process: www.democracy-scores.org. No coding is required: there is a table with empty cells corresponding to each country-year between 1992 and 2012 and you simply enter the scores for the reference cases you choose. The results are sent by email. That way anyone can change the reference set and produce their own ADS, regardless of computational resources or programming skills.33

32Wordscores has long been implemented in Stata and R, but these implementations load all the data into memory at once. That would not work here, as there are 200GB of data, so I had to write my own implementation of Wordscores (in Python). That implementation splits the data into chunks and processes each chunk individually, which reduces memory requirements (though not to the point where the script could be run on personal computers - there is a trade off between memory requirements and speed).
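Footnote 32 mentions splitting the data into chunks to keep memory usage manageable. A minimal sketch of that idea (the file name and chunk size are illustrative, and this is not the actual implementation):

```python
from collections import Counter

def count_words_in_chunks(path, chunk_size=10 * 1024 * 1024):
    """Accumulate word counts without loading the whole file into memory.
    (Words split across chunk boundaries are ignored here for simplicity.)"""
    counts = Counter()
    with open(path, encoding="utf-8", errors="ignore") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            counts.update(chunk.split())
    return counts

# counts = count_words_in_chunks("united_states_2012.txt")  # hypothetical file name
```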


6. Justifying some choices

Why not use unsupervised learning instead?

Wordscores is a type of supervised learning algorithm, by which I mean that the machine "learns" from an initial human input (the reference cases and their scores). But there are also unsupervised learning algorithms, which do not require an initial input. In these, the machine learns by itself not only how to measure but also what to measure. (See Manning, Raghavan, and Schütze [2008] for an introduction to both supervised and unsupervised learning in the context of text analysis.) In political science, a concrete example of an unsupervised learning algorithm is the one developed by Slapin and Proksch (2008), popularly known as Wordfish. Like Wordscores, Wordfish is most commonly used to measure party ideology, using party manifestos or legislative speeches. The Wordfish method does not require the user to specify or score any reference texts. It will create a scale based on whatever underlying dimension has the most impact on word frequencies.34 If we are talking about party manifestos, that dimension may be, say, the left-right dimension.

33The operation uses Amazon Web Services and to keep costs down for now I need to pre-authorize the user's email address. I intend to obtain funding and lift that restriction in the future. 34Slapin and Proksch explicitly model the data-generating process (DGP) behind word frequency. That DGP is assumed to follow a Poisson distribution (hence the name of the method): $y_{ijt} \sim \mathrm{Poisson}(\lambda_{ijt})$, where $y_{ijt}$ is the frequency of word $j$ in document $i$ at time $t$. The parameter $\lambda_{ijt}$ is modeled as $\lambda_{ijt} = \exp(\alpha_{it} + \psi_j + \beta_j \cdot \omega_{it})$, where $\alpha_{it}$ is the fixed-effect of document $i$ at time $t$, $\psi_j$ is the fixed-effect of word $j$, $\beta_j$ captures the relevance of word $j$ in capturing the underlying concept (say, party ideology), and $\omega_{it}$ is the estimated position of party $i$ at time $t$. The model is estimated using an expectation-maximization (EM) algorithm (see McLachlan and Krishnan [2007] for details on EM).
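To make the Wordfish data-generating process in footnote 34 concrete, here is a small simulation of word counts under that model, dropping the time index and using invented parameter values; it is only a sketch of the DGP, not of the EM estimation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_words = 4, 6

alpha = rng.normal(0.0, 0.5, n_docs)    # document fixed effects
psi = rng.normal(0.0, 1.0, n_words)     # word fixed effects
beta = rng.normal(0.0, 1.0, n_words)    # word discrimination parameters
omega = np.linspace(-1.0, 1.0, n_docs)  # latent document positions

# lambda_ij = exp(alpha_i + psi_j + beta_j * omega_i); y_ij ~ Poisson(lambda_ij)
lam = np.exp(alpha[:, None] + psi[None, :] + beta[None, :] * omega[:, None])
word_counts = rng.poisson(lam)
print(word_counts)
```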

But it may not. And if the scores turn out to be capturing something else there is no way to fix that; it may be hard to even know what is being captured. Supervised learning, on the other hand, allows us to calibrate the scale by explicitly showing the machine what a democratic country looks like or what a left-wing party looks like (depending on what we are trying to measure). That way we can have greater confidence in the construct validity of the resulting measure.35

Why not use event data instead?

An alternative approach would be to machine-code democracy based not on words but on events. Applications like Knowledge Manager36 and TABARI (Schrodt 2001) can use dictionaries of actors and verbs to extract meaning from sentences. For instance, TABARI can correctly classify the sentence "North Korean state media have called on the United States to forge 'ties of confidence' with Pyongyang" into the category "Appeal for diplomatic cooperation" (category #022 of the Conflict and Mediation Event Observations Codebook). There are voluminous event data available for free37 and King and Lowe (2003) show that in some cases automated event coding can be as accurate as human coding. So why not use event data to produce the ADS? The reason is that although the coding itself is automated, it relies on dictionaries of actors and verbs that are produced manually, entry by entry. In other words, we must know the relevant actors and verbs a priori. With Wordscores, however, we let the data speak. As Hopkins and King (2007) note, automated text analysis allows us to discover relevant features a posteriori.

35That said, Wordscores and Wordfish are not antithetical methods. Lowe (2008) and Benoit and Nulty (2013) argue that Wordscores is also model-based in a sense, only the model is implicit. 36http://vranet.com/ 37Most notably the Global Data on Events, Location, and Tone (GDELT), which contains over 200 million geolocated events from 1979 to 2012. See Leetaru and Schrodt (2013).

For instance, Monroe, Colaresi and Quinn (2008) find several non-obvious "partisan" words, like "baby" and "bankruptcy", which a hand-coded dictionary might have missed. As a consequence, event data can be of limited usefulness. Consider, for instance, the latest version of the World Handbook of Politics (WHP), machine-coded by the Knowledge Manager application.38 It reports three recent coups in Canada (one in 1996, one in 1998, and one in 1999), 15 recent coups in the US (three of which took place in 1994 alone), and none in 2002 Venezuela (even though there was one).39 Similarly nonsensical statistics are reported for other political indicators, such as censorship measures, curfews, and political arrests. That is not a very promising output, especially given the time and effort put in the creation of event data dictionaries (around 4,000 hours each)40. Hence I chose not to work with event data, at least for now.

7. Overview of results

The full 1993-2012 dataset is available for download.41 Figure 4 below gives an idea of the ADS distribution in 2012.

38The WHP can be downloaded from https://sociology.osu.edu/worldhandbook 39I checked the WHP definition of coup, to make sure it is not peculiar, but that does not seem to explain the nonsensical results (the WHP defines a coup as an “Irregular seizure of executive power, and rebellion by armed forces”). 40http://eventdata.psu.edu/faq.html 41https://s3.amazonaws.com/thiagomarzagao/ADS.csv

Figure 4. Automated Democracy Scores, 2012. Note: range limits are Jenks natural breaks.

As expected, democracy is highest in Western Europe and in the developed portion of the English-speaking world, and lowest in Africa and in the Middle East.

Figure 5 below shows that the ADS follow a normal distribution.

Figure 5. Automated Democracy Scores, 1993-2012 (with normal distribution)

Table 5 below shows the ADS summary statistics by year.

Table 5. ADS summary statistics, by year

N mean std. dev. min. max. 1993 193 0.0061666 1.40437 -3.20916 3.81217 1994 193 0.0939503 1.36697 -2.6985 4.14979 1995 193 0.1005004 1.073329 -2.99738 2.36396 1996 193 -0.1076104 1.128553 -3.22593 2.26484 1997 193 -0.0159435 1.25768 -2.93822 3.03361 1998 193 0.0088406 1.150099 -2.54625 2.7043 1999 193 -0.0999732 1.134464 -2.9453 2.63257 2000 193 0.2312175 0.7445582 -1.31987 2.66054 2001 193 0.2222522 0.7182253 -1.29777 1.92263 2002 194 0.2400814 0.735135 -1.18534 2.33285 2003 194 0.2121506 0.7185639 -1.3477 2.50623 2004 194 0.2213473 0.645 -1.69878 2.03608 2005 194 0.3315942 0.6461306 -1.08297 2.19639 2006 195 0.2869473 0.6760403 -1.28804 2.18348 2007 195 0.3678394 0.7192703 -1.11441 2.4193 2008 196 0.3860345 0.7002583 -1.11659 2.58216 2009 196 0.3212706 0.6923328 -1.487 2.34994 2010 196 0.4233154 0.6748002 -1.08075 2.29522 2011 196 0.4015369 0.7163083 -1.15564 2.38172 2012 196 0.4958635 0.7909505 -1.16859 2.38636 all 3883 0.2073097 0.9338698 -3.22593 4.14979

As expected, the average ADS increases over time, from 0.006 in 1993 to 0.495 in 2012. That reflects the several democratization processes that happened over that period. We observe the same change in other democracy indices as well (between 1993 and 2012 the average Polity score42 increased from 2.24 to 4.06 and the average Freedom House score43 decreased from 7.46 to 6.63;44 the average UDS score increased from 0.21 to 0.41 between 1993 and 2008, the last year in the UDS dataset).

42polity2 43civil liberties + political rights 44Freedom House scores decrease with democracy.

Also as expected, the standard errors decrease with press coverage. The larger the document with the country-year's news articles, the narrower the corresponding confidence interval. As Figure 6 shows, that relationship is not linear though: after 500KB or so the confidence intervals shrink dramatically and do not change much afterwards, not even when the document has 15MB or more.

Figure 6. ADS range and press coverage. Note: ADS range = 95% upper bound minus 95% lower bound.

8. The ADS vs other indices - point estimates

The ADS point estimates correlate 0.7439 with the UDS’ (posterior means), 0.6693 with the Polity’s (polity2), and -0.7380 with the Freedom House’s (civil liberties + political rights).45 Table 6 below breaks down these correlations by year.

Table 6. Correlation between ADS and other indices, by year

        UDS     Polity(a)   FH(b)            UDS     Polity(a)   FH(b)
1993    0.8021  0.7279     -0.7677    2003   0.7470  0.6610     -0.7445
1994    0.7921  0.6947     -0.7574    2004   0.7493  0.6635     -0.7553
1995    0.7797  0.7221     -0.7650    2005   0.7702  0.6833     -0.7632
1996    0.7783  0.7457     -0.7812    2006   0.7140  0.6458     -0.7596
1997    0.8059  0.7647     -0.8001    2007   0.6982  0.6207     -0.7413
1998    0.8052  0.7355     -0.7864    2008   0.7377  0.6363     -0.7506
1999    0.7729  0.7260     -0.7714    2009   n/a(c)  0.6353     -0.7627
2000    0.7491  0.6794     -0.7579    2010   n/a(c)  0.6467     -0.7791
2001    0.7641  0.6881     -0.7948    2011   n/a(c)  0.6472     -0.7661
2002    0.7668  0.6793     -0.7875    2012   n/a(c)  0.6155     -0.7603
(a) polity2 (see Marshall, Gurr, and Jaggers 2013, p. 17)
(b) civil liberties + political rights (see Freedom House 2013)
(c) The UDS do not cover the 2009-2012 period.

As we see, the correlations do not vary much over time. This is a good sign: it means that the ADS are not overly influenced by the idiosyncrasies of the year 1992, from which we extract the reference cases. Otherwise we would see the correlations decline sharply after 1993. The correlations do not vary much across indices either, other than being somewhat weaker for the Polity data. This is also a good sign: it means that the ADS are not overly influenced by the idiosyncrasies of the UDS, from which we extract the reference scores.46 45Pearson correlation. 46Though we must remember that the UDS are partly based on the Polity and the Freedom House, so by extension the ADS also are.

I also ran the algorithm using other years (rather than 1992) for the reference set, using the UDS as well. I also ran the algorithm using multiple years (up to all years but one) for the reference set, again using the UDS. Finally, I also ran the algorithm using not the UDS but the Polity and Freedom House indices for the reference set. In all these scenarios the correlations remained in the vicinity of 0.70.47 This corroborates Klemmensen, Hobolt, and Hansen's (2007) finding that Wordscores' results are robust to the choice of reference texts. Country-wise, what are the most notable differences between the ADS and the UDS? Table 7 below shows the largest discrepancies.

Table 7. Largest discrepancies between ADS and UDS

largest positive differences                 largest negative differences
                     ADS    UDS    ∆                          ADS    UDS    ∆
Swaziland 2007       1.53  -1.13   2.66      Israel 1994     -1.71   0.97  -2.69
Liechtenstein 1994   4.14   1.56   2.58      Israel 1993     -1.70   0.97  -2.67
Liechtenstein 1993   3.81   1.57   2.23      Israel 1999     -1.20   1.44  -2.65
Ireland 1994         3.08   1.17   1.90      Israel 1997     -1.36   1.07  -2.44
Andorra 1993         2.41   0.60   1.80      Benin 1993      -1.79   0.49  -2.28
Luxembourg 1994      3.29   1.51   1.77      Israel 1998     -1.17   1.08  -2.26
Bhutan 1996         -0.21  -1.97   1.75      Yemen 1993      -2.60  -0.40  -2.20
Ireland 1993         2.90   1.16   1.73      Israel 1996     -1.08   1.08  -2.16
Finland 1994         3.67   2.00   1.67      Tunisia 1993    -2.71  -0.55  -2.15
China 2008           0.68  -0.97   1.65      Oman 1996       -3.18  -1.12  -2.05

The largest positive differences - i.e., the cases where the ADS are higher than the UDS - are mostly found in small countries with little press coverage. That is as expected: the less press attention, the fewer news articles we have to go by, and the harder it is to pinpoint the country's "true" democracy level.

47The cases we use for the reference set cannot be used for the virgin set. For instance, in one scenario I used every other year for the reference set, starting with 1992. In that scenario the reference set was thus [1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012] and the virgin set was [1993 1995 1997 1999 2001 2003 2005 2007 2009 2011]. To compute the correlations with other indices I only used the virgin set.

The largest negative differences, however, tell a different story. It seems as if either the ADS repeatedly underestimate Israel's democracy score or the UDS repeatedly overestimate it (and not only for the years shown in Table 7). We do not observe a country's "true" level of democracy, so we cannot know for sure whether the ADS or the UDS are biased,48 but as discussed before the ADS should be unbiased to the extent that we managed to filter out news articles not related to political regime. And whatever biases exist in the UDS should become, by and large, random noise in the ADS. The UDS, on the other hand, rely on the assumption that "raters perceive democracy levels in a noisy but unbiased fashion" (Pemstein, Meserve, and Melton 2010, 10), which as Bollen and Paxton (2000) have shown is simply not true. Hence whatever biases exist in the Polity, Freedom House, etc, wind up in the UDS as well. The data-generating process behind the UDS does not mitigate bias in any way. In other words, it seems more likely that the UDS are overestimating Israel's democracy scores than that the ADS are underestimating them. This pro-Israel bias is interesting in itself, but it also raises the more general question of whether the UDS might have an overall conservative bias. To investigate that possibility I performed a difference-of-means test, splitting the data in two groups: country-years with left-wing governments and country-years with right-wing governments (I used Keefer's [2012] Dataset of Political Institutions for data on government ideological orientation.)49 The test rejected the null hypothesis that the mean ADS-UDS difference is the same for the two groups: the mean ADS-UDS difference for left-wing country-years (-0.127, std. error = 0.024, n = 802) is statistically smaller than the mean ADS-UDS difference for right-wing country-years (-0.328, std. error = 0.025, n = 603), with p < 0.00001.

48Though of course these two possibilities are not mutually exclusive. 49I used the EXECLRC variable.

As both means are negative, it seems that the UDS tend to reward right-wing governments. I also checked whether the UDS may be biased toward economic policy specifically. I split the country-years in the Index of Economic Freedom (Heritage Foundation 2014) dataset into two groups: statist (IEF score below the median) and non-statist (IEF score above the median). The difference-of-means test shows that the mean ADS-UDS difference for statists (-0.132, std. error = 0.0196, n = 1057) is statistically lower than that of non-statists (-0.215, std. error = 0.015, n = 1977), with p < 0.0006. Both means are negative here as well, so it seems that the UDS somehow reward free market policies. These findings are surprising. Political scientists are overwhelmingly on the left side of the political spectrum (Klein and Stern 2005; Maranto, Hess, and Redding 2009; Maranto and Woessner 2012), so if anything we would expect their democracy measures to be biased in favor of left-wing governments and policies, not against them. Perhaps the country experts who code the democracy indices behind the UDS are not political scientists for the most part. It is hard to know for sure, as the codebooks usually do not mention the coders' backgrounds.50 We cannot conclusively indict the UDS or its constituent indices though. Perhaps democracy and right-wing government are positively associated and the ADS are somehow less efficient at capturing that association. This is consistent with the Hayek-Friedman hypothesis that left-wing governments are detrimental to democracy because economic activism expands the state's coercive resources (Hayek 1944; Friedman 1962).

50Marshall, Gurr, and Jaggers (2013), for instance, only say that “at least four coders” (6) coded each Polity case, without giving any further information.

As we do not observe a country's true level of democracy, it is hard to know for sure what is going on here. At least until we know whether the UDS are biased or the ADS are inefficient, the ADS are the conservative choice. Say we regress economic policy on the UDS and find that more democratic countries tend to have less regulation. Is that relationship genuine or is it an artifact of the UDS being biased in favor of free market policies? With biased measures our tests become circular: we cannot know the effect of x on y when our measure of x is partly based on y. Inefficiency, on the other hand, merely makes our tests more conservative.
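For readers who want to see the mechanics of the difference-of-means tests reported above, here is a minimal sketch using simulated ADS-UDS gaps; the group sizes and means echo the reported figures, but the draws are invented, and Welch's two-sample t-test stands in for whatever exact test variant was used (the real grouping uses the EXECLRC variable from the Dataset of Political Institutions).

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

# ADS-UDS gaps for the two groups (simulated draws, not the real data).
diff_left = rng.normal(-0.127, 0.6, 802)   # left-wing country-years
diff_right = rng.normal(-0.328, 0.6, 603)  # right-wing country-years

# Welch's two-sample t-test on the mean ADS-UDS gap.
t_stat, p_value = ttest_ind(diff_left, diff_right, equal_var=False)
print(round(t_stat, 2), p_value)
```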

9. The ADS vs other indices - standard errors

As mentioned before the UDS data and also Treier and Jackman’s (2008) data have not only point estimates but also standard errors. That is a big improvement over data like the Polity and the Freedom House, which provide point estimates only. But the UDS and Treier and Jackman’s confidence intervals are too wide to be useful. Too many cases are statistically indistinguishable. The ADS, on the other hand, have smaller confidence intervals. These confidence intervals tend to be larger the less press coverage the country gets, but in all cases they are smaller than the corresponding UDS ones. Table 8 below shows, for each country in 2008 (the last year for which there are UDS data), how many other countries have overlapping confidence intervals in each dataset (UDS and ADS).
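The overlap counts reported in Table 8 can be reproduced mechanically from each country's lower and upper bounds; here is a minimal sketch with invented intervals.

```python
def count_overlaps(intervals):
    """For each country, count how many other countries' confidence intervals overlap its own."""
    overlaps = {}
    for country, (lo, hi) in intervals.items():
        overlaps[country] = sum(1 for other, (lo2, hi2) in intervals.items()
                                if other != country and lo <= hi2 and lo2 <= hi)
    return overlaps

intervals = {"Sweden": (1.8, 2.1), "Norway": (1.9, 2.2), "North Korea": (-3.1, -2.6)}
print(count_overlaps(intervals))  # Sweden and Norway overlap; North Korea overlaps neither
```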

Table 8. Overlaps for the year 2008

UDS ADS UDS ADS Afghanistan 107 4 Libya 51 2 Albania 119 6 Liechtenstein 112 3 Algeria 100 2 Lithuania 88 3 Andorra 130 4 Luxembourg 106 0 Angola 76 4 Macedonia 120 4 Antigua & Barbuda 130 5 118 8 Argentina 119 7 Malawi 119 4 Armenia 118 1 Malaysia 117 3 Australia 90 1 Maldives 133 2 Austria 63 2 Mali 121 8 Azerbaijan 70 1 Malta 68 1 Bahamas 118 7 Mauritania 70 2 Bahrain 65 1 Mauritius 118 11 Bangladesh 102 6 Mexico 119 5 Barbados 120 0 Micronesia 88 10 Belarus 65 3 Moldova 126 6 Belgium 99 2 Mongolia 127 4 Belize 130 10 Montenegro 118 9 Benin 126 3 Morocco 76 4 Bhutan 111 10 Mozambique 113 4 Bolivia 125 6 Myanmar 41 2 Bosnia-Herzegovina 129 5 Namibia 111 1 117 3 Nauru 116 11 Brazil 118 0 Nepal 119 3 Brunei 73 11 Netherlands 66 3 Bulgaria 124 6 New Zealand 80 0 Burkina Faso 104 8 Nicaragua 120 7 Burundi 117 2 Niger 115 2 Cambodia 103 2 Nigeria 115 1 Cameroon 75 3 North Korea 31 4 Canada 90 2 Norway 66 8 Cape Verde 125 5 Oman 59 0 Central African Rep. 100 2 Pakistan 117 5 Chad 79 0 Palau 116 7 Chile 110 0 Panama 124 6 China 58 5 Papua New Guinea 123 4 Colombia 122 0 Paraguay 124 8 Comoros 122 10 Peru 119 8



Congo Brazzaville 71 5 Philippines 127 2 Congo Kinshasa 103 1 Poland 99 6 Costa Rica 109 11 Portugal 90 4 Croatia 127 7 Qatar 43 0 Cuba 58 2 Romania 123 3 Cyprus 66 4 Russia 103 1 Czech Rep. 110 25 Rwanda 85 5 Denmark 63 3 St. Kitts & Nevis 120 10 Djibouti 98 1 St. Lucia 124 13 Dominica 126 11 St. Vin. & the Gren. 125 12 Dominican Rep. 119 8 Samoa 126 12 East Timor 121 3 San Marino 113 18 Ecuador 121 3 S. Tome & Principe 135 9 Egypt 72 0 Saudi Arabia 29 5 El Salvador 126 8 Senegal 127 8 Equatorial Guinea 59 4 Serbia 126 6 Eritrea 58 0 Seychelles 132 4 Estonia 110 4 Sierra Leone 120 1 Ethiopia 104 1 Singapore 102 6 Fiji 76 5 Slovakia 100 1 Finland 66 4 Slovenia 88 2 France 112 4 Solomon Is. 128 13 Gabon 89 4 Somalia 62 1 Gambia 80 2 South Africa 117 0 Georgia 119 2 South Korea 123 2 Germany 66 4 Spain 80 4 Ghana 119 3 Sri Lanka 121 1 Greece 82 7 Sudan 59 1 Grenada 118 9 Suriname 130 13 Guatemala 122 9 Swaziland 50 3 Guinea 72 1 Sweden 63 3 Guinea-Bissau 117 4 Switzerland 63 8 Guyana 110 12 Syria 62 0 Haiti 111 4 Taiwan 110 5 Honduras 123 10 Tajikistan 93 3 Hungary 89 1 Tanzania 105 5 Iceland 67 1 Thailand 114 3 India 123 2 Togo 102 5 Indonesia 119 2 Tonga 109 8



Iran 62 1 Trinidad & Tobago 110 4 Iraq 62 1 Tunisia 73 2 Ireland 90 0 Turkey 124 1 Israel 97 1 Turkmenistan 52 7 Italy 97 2 Tuvalu 116 16 Ivory Coast 80 4 Uganda 104 5 Jamaica 122 9 Ukraine 119 4 Japan 118 9 United Arab Emirates 59 3 Jordan 80 0 United Kingdom 90 1 Kazakhstan 65 6 United States 80 2 Kenya 119 2 Uruguay 80 1 Kiribati 131 13 Uzbekistan 52 5 Kuwait 73 1 Vanuatu 124 6 Kyrgyzstan 111 6 Venezuela 116 5 Laos 58 4 Vietnam 65 1 Latvia 126 1 Yemen 98 1 Lebanon 117 0 Zambia 117 4 Lesotho 113 4 69 5 Liberia 121 2

As we see, there is much less overlapping in the ADS than in the UDS. For instance, in the UDS the United States is statistically indistinguishable from 80 other countries, whereas in the ADS the United States is statistically indistinguishable from only one other country (Solomon Islands, which rarely appears in the news and thus has a wide confidence interval). The country with most overlaps in the UDS data is Sao Tome and Principe, which is statistically indistinguishable from 135 other countries. That makes 70% of the UDS scores (for 2008) statistically the same. The worst case in the ADS is Czech Republic, which overlaps with 25 other countries (in the UDS Czech Republic overlaps with 110 other countries).51

51Unfortunately we cannot compare the ADS and Treier and Jackman's data exactly, as these are not available online, but the plot in their article shows that the confidence intervals are even wider than the UDS ones.

The reason why the ADS standard errors are much smaller than the UDS ones is the sheer size of the data. We have 42 million news articles, which give us about 4 billion words in total. Because the total number of virgin words goes in the denominator of the formula for the standard errors, those 4 billion words shrink the confidence intervals dramatically. The ADS standard errors also tell us something about the nature of democracy. The large standard errors of the UDS and of Treier and Jackman's data might lead us to believe that democracy is better modeled as a categorical variable, like the one in Alvarez et al (1996). Gugiu and Centellas (2013) claim that that is indeed the case: they use hierarchical cluster analysis to extract the latent democracy variable behind five existing indices (among which the Polity and Freedom House indices) and find that that latent variable is categorical, not continuous. That conclusion is unwarranted though. If the constituent measures (Polity, Freedom House, etc) are too coarse to capture fine-grained regime differences then it is not surprising that their latent variable will also be too coarse to capture fine-grained regime differences. But just because a given measure fails to capture subtle distinctions does not mean that these distinctions do not exist. As the ADS standard errors suggest, these subtle distinctions do seem to exist.

10. A word on conceptualization

For expositional convenience, so far I have neglected the issue of conceptualization, which normally should precede any measurement efforts (Sartori 1970). The concept behind any data produced with supervised learning methods is the concept behind the reference scores. Here the reference scores are taken from the UDS, so the concept of democracy behind the ADS and the UDS is the same.

The UDS, in turn, is a latent variable extracted from twelve different democracy indices. Thus the democracy concept underlying the UDS is the common, shared portion of those twelve democracy concepts. And what is that shared portion? As Munck and Verkuilen (2002) note, "the decision to draw, if to different degrees, on Dahl's (1972, 4-6) influential insight that democracy consists of two attributes - contestation or competition and participation or inclusion - has done much to ensure that these measures of democracy are squarely focused on theoretically relevant issues" (9). Hence the UDS - and by extension the ADS - reflect the core, Dahlsian view of democracy shared by the twelve democracy indices used in their construction.52 Granted, it is hard to map ADS (or UDS) variation to particular democracy levels. For instance, what exactly does a score of -1.5 mean? If the score changes from -1.5 to -1.0, what does that tell us about the institutional changes that took place? Unlike human-coded measures, machine-coded ones tend to be more data-driven, so we often need to interpret the results a posteriori, by examining concrete cases. But mapping score variation to concrete phenomena is no easier with the human-coded measures of democracy we have today. These are based on subcomponents that, as Munck and Verkuilen (2002) note, are often arbitrarily aggregated. In the Polity data two countries may change from, say, 2 to 3 for completely different reasons (Gleditsch and Ward 1997). It is important to keep these issues in mind when evaluating machine-coded data. These should be compared to actual alternatives, not to imaginary ones.

52As noted before, I also ran the algorithm replacing the UDS with Polity and Freedom House scores, to check how the correlations changed. Hence in these cases the underlying concept behind the ADS is the Polity one or the Freedom House one, according to the case.

11. Conclusion and future extensions

The ADS address important limitations of the democracy indices we have today. The ADS are replicable, have standard errors narrow enough to distinguish cases, and mitigate contamination by human coders' ideological biases. The ADS are also cost-effective: all we need are texts and reference scores, both of which already exist; there is no need to hire dozens of country experts and spend months collecting and reviewing their work. This paper is intended merely as a starting point though. Wordscores is the best-known text-scaling method in political science, so it allows me to introduce a new idea (automating democracy measurement) while using a familiar algorithm. But the goal here is simply to initiate a discussion in that direction, hopefully encouraging others to make their own contributions. There are many interesting alternatives that we should try next. First, we could try adjusting the algorithm to take into account "disproportionate" press reactions. Minor setbacks in highly democratic countries often attract a great deal of press coverage (e.g., the recent blocking of explicit internet content in Britain). Similarly, minor regime liberalizations in highly authoritarian countries often attract a great deal of press coverage as well (e.g., the recent decision of the Cuban government to allow Yoani Sánchez - a local journalist opposed to the Castro regime - to travel abroad for the first time). Newspapers and magazines often use words like "repression" and "democratization" liberally when referring to such events and that may cause unwarranted score fluctuations.

To correct for that perhaps we could use the scores for year t0 as priors in the estimation of the scores for year t1 (instead of starting anew every year, as I do here). That might make the scores more robust to minor fluctuations and to the idiosyncrasies of press coverage.

We would need to re-think the math and possibly drop Wordscores altogether in favor of a more explicit Bayesian approach, but that might prevent nonsensical results - of which there are a few in the current dataset. Second, we could address conceptualization more directly. Here we are using all 6.3 million unique words in the dataset, so the impact of each individual word on the virgin scores is negligible. That makes it hard to know exactly what concrete phenomena are driving our estimates: is it elections? Censorship? Power alternation? We know, from the UDS and their constituent indices, that the underlying concept is the core Dahlsian view of democracy - contestation and participation. But contestation and participation are abstractions, with multiple concrete manifestations each; which of these manifestations impact the ADS most? One way to find the answer would be to drop Wordscores and use instead a combination of Latent Semantic Analysis (LSA) and regression. LSA is a method that allows us to extract the latent components behind all the words in the dataset.53 We can then pick a set of reference cases and scores, regress these scores on the LSA-extracted components, and use the estimated coefficients to compute the scores of the virgin cases. That way rather than assessing the impact of (say) the word "election" on the estimates we assess the impact of the latent component "election" on the estimates. That latent component should encompass not only the word "election" but all election-related words, which makes it easier for us to see what concrete phenomena are behind the virgin scores. Third, we could ensure construct validity in more rigorous ways. Here we used topic tags to select regime-related news, but that is necessarily imperfect. Some contamination certainly exists. In fact, there are some oddities in the dataset that cannot be explained otherwise.

53See Landauer, Foltz, and Laham (1998) for an introduction to LSA.

For instance, the score of Equatorial Guinea jumps from -0.26 in 2011 to 1.16 in 2012 and becomes statistically indistinguishable from Norway's, even though there was no regime liberalization whatsoever. That jump is probably related to Equatorial Guinea hosting the 2012 Africa Cup of Nations. We could of course browse the full list of words and remove the ones that are not regime-related, but that is hard to do with 6.3 million words. With LSA, however, we reduce those 6.3 million words to a few hundred latent components, which are of course much easier to inspect (and drop, when necessary). In addition to that we could also, as a preliminary step, use a topic classification algorithm - like the one in Quinn et al (2010) - to discard news articles that are unrelated to political regime. With this two-step approach - discarding extraneous articles and discarding extraneous latent components - we should eliminate most, if not all, of the oddities in the dataset. Fourth, once we have settled on a particular algorithm it would be interesting to replicate existing (substantive) work on democracy but using the ADS instead, to see how the results change. As the ADS come with standard errors, we could incorporate these in the regressions, perhaps using errors-in-variables models (Fuller 1987). Fifth, we could produce a daily or real-time democracy index. Existing indices are year-based and outdated by 1-12 months, so we do not know how democratic a country is today or how democratic it was, say, on 11/16/2006. Automated text analysis can help us overcome those limitations. We cannot score the news articles from only one or two days, as there would not be enough data to produce meaningful results, but we can pick, say, the 12-month period immediately preceding a certain date - for instance, 11/17/2005-11/16/2006 if we want democracy scores for 11/16/2006. In other words, we can apply the method to any arbitrary time interval, not just Jan/1-Dec/31. The idea is similar to moving averages, except that here we would have "moving scores" instead.

We would use the interval 11/17/2005-11/16/2006 to produce scores for 11/16/2006, the interval 11/18/2005-11/17/2006 to produce scores for 11/17/2006, and so on. That would give us a daily measure of democracy, which to the best of my knowledge does not exist yet. If we continuously update the dataset then we have not only daily data, but also real-time data. Applying the method to arbitrary time intervals could be a big improvement, for two reasons. First, because daily democracy data might open the door for interesting research avenues. For instance, do governments become more authoritarian in the face of bank runs and capital flight? Do riots and demonstrations elicit repression or liberalization? Year-based democracy measures miss short-term fluctuations and thus preclude us from answering those types of questions. Second, because we would be able to measure regime change before and after arbitrary events. Say, the 12 months before a given presidential election and the 12 months after it. As both scores would have standard errors we would be able to tell whether the polity became more or less democratic after the new president took over.54

54There are technical and legal difficulties to be overcome though. To produce daily data for the 1993-2012 period we would need to run the algorithm 7,305 times (365 days times 20 years plus extra day in leap years). That would take over 20,000 CPU-hours to run, which far exceeds the computational resources currently at my disposal. To produce real-time data, in turn, we would need to retrieve articles from LexisNexis on a daily basis. But that requires performing 196 different searches every day (one search for each country) and downloading the corresponding material. It would be too time-consuming to do manually, so we would need to automate the process. But LexisNexis’ license terms explicitly prohibit any sort of automated use (see item 2.2 in http://www.lexisnexis.com/terms/general.aspx). These are logistical problems though and can probably be overcome in the future.

Measuring Democracy From News Articles: Can We Do Better Than Wordscores?

Abstract

In this paper I explore different ways to measure democracy from news articles. In an earlier paper I used the popular Wordscores algorithm (created by Laver, Benoit, and Garry [2003]) for that. Here I explore some alternatives - namely, a combination of topic extraction methods (Latent Semantic Analysis and Latent Dirichlet Allocation) and decision trees - and compare the results to the ones obtained using Wordscores.

1. Introduction

In this paper I explore different ways to measure democracy from news articles. The basic idea is simple. News articles on, say, North Korea or Cuba contain words like "censorship" and "repression" more often than news articles on Belgium or Australia. And if news articles contain regime-related information we can quantify that information to build a democracy index. I adopt a supervised learning approach. In supervised learning we feed the machine a number of pre-scored cases - the reference set. The machine then "learns" from the reference set. In text analysis that means learning how the frequency of each word or topic varies according to the document scores. For instance, the algorithm may learn that the word "censorship" is more frequent the lower the democracy score of the country-year. Finally, the algorithm uses that knowledge to assign scores to all other cases - i.e., to the virgin set.

There are several algorithms for supervised learning. In an earlier paper55 I used Wordscores (Laver, Benoit, and Garry 2003). The outcome was a machine-coded democracy index, which I called Automated Democracy Scores (ADS) and which covers all country-years in the 1993-2012 period. Unlike other democracy indices the ADS are replicable, have standard errors small enough to actually distinguish between cases, and avoid contamination by human coders' ideological biases.56 Wordscores has a few limitations though. First and foremost, Wordscores makes it difficult to address construct validity. Wordscores "learns" how word frequency changes with the variable of interest (be it democracy, party ideology, or anything else) and then uses that information to score new documents (see Laver, Benoit, and Garry [2003] for details). That works well, but there is an inefficiency: Wordscores uses the frequency of each individual word in isolation, ignoring co-occurrence. That certain words - say, "censorship" and "torture" - tend to co-occur could suggest that they belong to the same topic - say, "repression". More formally, the words "censorship" and "torture" may be observed manifestations of the unobserved (latent) variable "repression". Wordscores completely ignores that sort of information. Without knowing what topics drive our results it is hard to assess construct validity. We know that the ADS are capturing the two Dahlsian pillars of polyarchy - contestation and participation (see earlier paper). But contestation and participation are abstract concepts, with many possible concrete manifestations. With Wordscores it is hard to know what those concrete manifestations are. We could in principle assess the influence of each individual word.57 But here we have 6.3 million unique words, so the impact of any individual word is negligible.

55Available from http://ssrn.com/abstract=2412325 Currently under journal review. 56The ADS can be downloaded from https://s3.amazonaws.com/thiagomarzagao/ADS.csv There is also a website where anyone can replicate and tweak the data-generating process (no coding required): http://democracy-scores.org/ 57For instance, by computing each word’s TF-IDF (more on TF-IDF later).

And in any case the results would be based on a method that ignores co-occurrence and thus discards useful information. What we need is a way to extract the topics behind our texts and then use these topics - rather than individual words - to produce our democracy scores. That way we should be able to know exactly what concrete phenomena are driving our democracy scores. And we should also be able to drop extraneous topics if needed (to reduce noise or bias). That is the goal of this paper. To extract the topics I try two competing algorithms: Latent Semantic Analysis and Latent Dirichlet Allocation. Both are explained in the following sections, but the gist of it is that they reduce millions of words to a few hundred topics. It is unclear ex ante which algorithm is superior (Anaya 2011), so I try both and then compare the results. Once I have the topics I use decision trees to produce democracy scores.58 Then I should be able to inspect which topics are most influential, thereby learning what concrete manifestations of democracy are driving our democracy scores. Also, I should be able to discard the topics that are both influential and extraneous, and then produce a new, improved batch of scores. There are costs in moving from Wordscores to these other methods. First, we lose the simplicity of Wordscores. Second, we move from a single algorithm (Wordscores does everything itself) to a combination of algorithms (one algorithm extracts the topics and another algorithm produces the democracy scores). Third, computing time increases from 2-3 hours to 24-168 hours for each batch of democracy scores (depending on the exact algorithms used). But hopefully the benefits will outweigh the costs. 58As explained later I use decision trees rather than OLS because I have more variables (topics) than observations (country-years), which does not leave us any degrees of freedom to run OLS.

The remainder of this paper provides the details. Section 2 explains how the news articles were selected and processed. Section 3 explains Latent Semantic Analysis. Section 4 explains Latent Dirichlet Allocation. Section 5 explains decision trees. Section 6 presents the results. Section 7 concludes.

2. News articles

In this paper I use the same texts I used in my earlier paper: 42 million regime-related news articles from 6,043 different sources, all published between 1992 and 2012, and which I organize by country-year. These are all the news sources in English available on LexisNexis Academic, which is an online repository of journalistic content. I used LexisNexis' internal taxonomy to identify regime-related content.59 I start in 1992 because LexisNexis has no search codes for countries that have ceased to exist, which prevents us from reliably retrieving news articles on, say, the Soviet Union or East Germany.60 Further details on how the articles were selected and processed are provided in my earlier paper. The period 1992-2012 gives us a total of 4,067 country-years. As in my earlier paper, I choose the year 1992 for the reference set and extract the corresponding scores from the Unified Democracy Scores - UDS (Pemstein, Meserve, and Melton 2010). The UDS have data on 184 countries for the year 1992. Hence we have 184 reference cases and 3,883 (4,067 - 184) virgin cases. I select the year 1992 simply

59I chose all articles with one or more of the following tags: "human rights violations" (a subtag of "crime, law enforcement and corrections"); "elections and politics" (a subtag of "government and public administration"); "human rights" (a subtag of "international relations and national security"); "human rights and civil liberties law" (a subtag of "law and legal system"); and "censorship" (a subtag of "society, social assistance and lifestyle"). It would be interesting to know how the results change if we select different topic tags, but unfortunately that is no longer possible: on 12/23/2013 LexisNexis changed its user interface and the dozens of political tags and subtags that existed before are now collapsed into a single "Government & Politics" tag, which is too broad for our purposes here.
60Other than Yugoslavia, no country has ceased to exist since 1992.

because it is the first year in our dataset. I select the UDS mainly because they are an amalgamation of several other democracy scores, which reduces measurement noise (though not measurement bias). Further details on why I selected the UDS are provided in my earlier paper. I try three variations of the same collection of news articles. The first version - which I call corpora A - is exactly the same as the one used in my earlier paper. It contains about 6.3 million unique words after proper nouns are removed probabilistically (see earlier paper for details on this point). The other two versions - corpora B and corpora C - are reduced versions of corpora A. To produce corpora B and corpora C I started with corpora A and then dropped the 100 most frequent words in the English language.61 To produce corpora B I also dropped any word that does not appear more than once in any of the documents.62 The outcome is that corpora B has only about 2.3 million unique words - some 4 million fewer than corpora A. To produce corpora C I went one step further and, from each document, I dropped any words that only appeared once. Hence corpora C ⊂ corpora B ⊂ corpora A.
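The filtering just described is simple enough to sketch in code. The snippet below is a minimal illustration only, assuming a hypothetical doc_counts dictionary that maps each country-year to a {word: count} dictionary and a hypothetical top100 set holding the 100 most frequent English words; it is not the exact code used to build the corpora.

def build_corpora(doc_counts, top100):
    # corpora B: drop the 100 most frequent English words and any word
    # whose count never exceeds 1 in any single document
    max_count = {}
    for counts in doc_counts.values():
        for word, count in counts.items():
            max_count[word] = max(max_count.get(word, 0), count)
    keep = {w for w, c in max_count.items() if c > 1 and w not in top100}
    corpora_b = {doc: {w: c for w, c in counts.items() if w in keep}
                 for doc, counts in doc_counts.items()}
    # corpora C: additionally drop, from each document, the words that
    # appear only once in that document
    corpora_c = {doc: {w: c for w, c in counts.items() if c > 1}
                 for doc, counts in corpora_b.items()}
    return corpora_b, corpora_c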

3. Latent Semantic Analysis

In this section I explain what Latent Semantic Analysis is (3.1) and how the math behind it works (3.2).

61As measured by the Oxford English Corpus (which covers all regional varieties of English - American, British, etc). These are mostly words like "the", "of", "have", etc. Combined these stopwords account for about 25% of English texts (http://www.oxforddictionaries.com/words/the-oec-facts-about-the-language) but do not contain any regime-related information, so dropping them should reduce noise.
62This removed extremely rare words, which is desirable because sometimes rare words can be unduly influential. See Manning, Raghavan, and Schütze (2008).

3.1 Intuition

Latent Semantic Analysis (LSA) is a method for extracting topics from texts (Landauer, Foltz, and Laham 1998). More concretely, LSA tells us two things: a) which words "matter" more for each topic; and b) which topics appear more in each document. In this subsection I explain the intuition behind LSA and what outputs it produces, but I leave the math for subsection 3.2. Let us begin with the words. Say that we have a total of m unique words in our corpora and that we want to extract the k most prevalent topics from that corpora. LSA will produce an m × k matrix, which we will call Ũ,63 where each entry ũij gives us the "salience" of word i for topic j. To give a concrete (though fictional) example, suppose that we have run LSA on a corpora of medical articles, extracted the top three topics, and sorted each column of Ũ in decreasing order. The outcome might look somewhat like this:

Figure 7. Example of wordsXtopics table generated with LSA

As we observe in this example, the largest word weights (in absolute values) help

63The reason for the tilde will be explained in subsection 3.2.

us see what topics underlie the set of texts - in this case, diabetes (topic #1), heart diseases (topic #2), and cancer (topic #3). Importantly, each topic contains weights for all words that appear in the entire corpora. For instance, topic #3 contains not only cancer-related words but also all other words: insulin, mellitus, heart, etc. Hence all topics have exactly the same length (m). What changes is the weight each topic assigns to each word. E.g., in topic #3 cancer-related words have the largest weights - which is why we label topic #3 "cancer". In real life applications the topics are usually not so clear-cut. Innocuous words like "the", "of", "did", etc (usually called stopwords) often have large weights in at least some of the topics. Also, it is common for the top 20 or 50 words to be very similar across two or more topics. Finally, in real life applications with large corpora we usually extract a few hundred topics, not just three. (More on these points later.) LSA also tells us which topics are more "salient" in each document. That happens by means of dimensionality reduction. Say that we have a total of n documents in the corpora. LSA will reduce the huge, m × n, wordsXdocuments matrix to a more manageable, k × n, topicsXdocuments matrix. Let us call this topicsXdocuments matrix S̃. Each entry s̃ij gives us the "salience" of topic i for document j. In our fictional example above, if we sort the columns of S̃ in descending order the outcome might look like this:

Figure 8. Example of topicsXdocuments table generated with LSA

Here I expect that LSA will generate regime-related topics - say, "elections", "repression", and so on - and also extraneous topics. I will then use decision trees to create democracy scores, using all topics. Afterwards I should be able to inspect which topics are influencing the scores the most, drop the rows of S̃ corresponding to topics that are both extraneous and influential, and generate a new, improved set of democracy scores.

3.2 Math

Before we run LSA we need to transform our texts into data. We begin by transforming each document into a vector of word counts and then merging all vectors. The outcome is a term-frequency matrix, where rows represent terms, columns represent documents, and each entry is the frequency of term i on document j (i.e., the term-frequency, TFij). Next we apply the TF-IDF transformation. The TF-IDF of each entry is given by its term-frequency (TFij) multiplied by ln(n/dfi), where n is the total number

of documents and dfi is the number of documents in which word i appears (i.e.,

the word’s document frequency – DF; the ln(n/dfi) ratio thus gives us the inverse document frequency – IDF). What the TF-IDF transformation does is increase the importance of the word the

more it appears in the document but the less it appears in the whole corpora. Hence it helps us reduce the weights of inane words like "the", "of", etc and increase the weights of discriminant words (i.e., words that appear a lot but only in a few documents). For more details on TF-IDF, see Manning, Raghavan, and Schütze (2008). The next step is normalization. Here we have documents of widely different sizes, ranging from a few kilobytes (documents corresponding to small countries, like Andorra or San Marino, which rarely appear in the news) to 15 megabytes (documents corresponding to the United States, Russia, etc - countries that appear in the news all the time). Longer documents contain more unique words and have larger TF values, which may skew the results (Manning, Raghavan, and Schütze 2008). To avoid that we normalize the columns of the TF-IDF matrix, transforming them into unit vectors. We will call our normalized TF-IDF matrix A. Its dimensions are m × n (m is the number of unique words in all documents and n is the number of documents). Now we are finally ready to run LSA. In broad strokes, LSA is the use of a particular matrix factorization algorithm - singular value decomposition (SVD) - to decompose a term-frequency matrix or some transformation thereof (like TF-IDF or normalized TF-IDF) and extract the word weights and topic scores. Let us break down LSA into each step. We start by deciding how many topics we want to extract - call it k. There is no principled way to choose k, but for large corpora the rule of thumb is something between 100 and 300 (Martin and Berry 2011). To extract the desired k topics we use SVD to decompose A, as follows:

\[
\underset{m \times n}{A} \;=\; \underset{m \times m}{U}\; \underset{m \times n}{\Sigma}\; \underset{n \times n}{V^{*}}
\]

where U is an orthogonal matrix (i.e., U′U = I) whose columns are the left-singular vectors of A; Σ is a diagonal matrix whose non-zero entries are the singular

values of A; and V* is an orthogonal matrix whose columns are the right-singular vectors of A (Martin and Berry 2011). (Notation: throughout this paper for any matrix M, M′ is its transpose and M* is its conjugate transpose). There are several algorithms for computing SVD and the one I use here is the one created by Halko, Martinsson, and Tropp (2011) - henceforth HMT. I choose HMT because it is specifically designed to handle large matrices, like the ones we have here.64 See Appendix C for details on HMT. The next step in LSA is to truncate U, Σ, and V*. We do that by keeping only the first k columns of U, the first k rows and first k columns of Σ, and the first k rows of V*. Let us call these truncated matrices Ũ, Σ̃, and Ṽ*.65 The truncated matrices give us what we want. Ũ maps words into topics: each entry ũij gives us the weight of word i on topic j. The product Σ̃Ṽ* = S̃, in turn, maps topics into documents: each entry s̃ij gives us the weight of topic i on document j. Importantly, the topics extracted with LSA are ordered: the first topic - i.e., the first column of S̃ - captures more variation than the second column, the second column captures more variation than the third column, and so on. Thus if we run LSA with (say) k = 200 and then with k = 300, the first 200 topics will be the same in both S̃ matrices. In other words, each topic is independent of all topics extracted after it. I set k = 300 or k = 150, depending on the corpora (with corpora A we cannot set k = 300 because the resulting wordsXtopics matrix has dimensions 6.3 million

64Depending on the specific corpora we are using (see section 2) we have up to 6.3 million words, which multiplied by 4,067 documents yields 25 billion entries.
65If we multiply out ŨΣ̃Ṽ* the outcome, Ã, is the best k-rank approximation to A. I.e., Ã is the k-rank matrix B that minimizes the Frobenius distance \( \lVert A - B \rVert_{F} \), where \( \lVert X \rVert_{F} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij}^{2}} \) (see Stewart [1993] for proof). A and Ã both have dimensions m × n (since Ũ (m × k) times Σ̃ (k × k) times Ṽ* (k × n) gives Ã (m × n)), but Ã is "smaller" in the sense that it has a lower rank: Ã only contains the top k latent components (topics) of A - i.e., the k latent components that best summarize A; all other latent components are discarded.

by 300, which is too large even for high-memory computers; see section 2 above for details on each corpora). That is all there is to LSA: we use SVD to decompose A, truncate the resulting matrices, and extract from them the word weights and topic weights.66
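A minimal sketch of the whole LSA stage is shown below, assuming the term-frequency matrix is already available as an m × n scipy sparse matrix (rows are words, columns are country-year documents). The TF-IDF and normalization steps follow the formulas above, and the truncated SVD uses scikit-learn's randomized_svd, an implementation of the Halko, Martinsson, and Tropp algorithm (with oversampling p and power iterations q as described in Appendix C). This is an illustration of the procedure, not the exact code used here.

import numpy as np
from scipy import sparse
from sklearn.utils.extmath import randomized_svd

def lsa(counts, k, p=100, q=2):
    """counts: m x n sparse term-frequency matrix (words x documents)."""
    counts = sparse.csc_matrix(counts, dtype=float)
    n = counts.shape[1]
    df = np.asarray((counts > 0).sum(axis=1)).ravel()      # document frequency of each word
    tfidf = counts.multiply(np.log(n / df)[:, None])       # TF * ln(n/df)
    norms = np.sqrt(np.asarray(tfidf.multiply(tfidf).sum(axis=0))).ravel()
    norms[norms == 0] = 1.0                                # guard against empty documents
    A = sparse.csc_matrix(tfidf.multiply(1.0 / norms))     # unit-length document columns
    U, sigma, Vt = randomized_svd(A, n_components=k, n_oversamples=p, n_iter=q)
    S = np.diag(sigma) @ Vt    # k x n topicsXdocuments matrix (Sigma~ V~*)
    return U, S                # U: m x k wordsXtopics matrix (U~)

In this sketch the first return value plays the role of Ũ and the second plays the role of S̃ = Σ̃Ṽ*, matching the notation above.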

4. Latent Dirichlet Allocation

In this section I explain what Latent Dirichlet Allocation is (4.1) and how the math behind it works (4.2).

4.1 Intuition

LSA, discussed in the previous section, is a flexible technique: it does not assume anything about how the words are generated. LSA is at bottom a data reduction technique; its core math - truncated SVD - works as well for mapping documents onto topics as it does for compressing image files. But that flexibility comes at the cost of interpretability. The word weights and topic weights we get from LSA do not have a natural interpretation. We know that the word weights represent the “salience” of each word for each topic, and that the topic weights represent the “salience” of each topic on each document, but beyond that we cannot say much. “Salience” does not have a natural interpretation in LSA. That is the motivation for Latent Dirichlet Allocation (LDA), which was created by Blei, Ng, and Jordan (2003). Whereas LSA is model-free, LDA models every aspect of the data-generating process of the texts. We lose generality (the results are only

66LSA is a popular algorithm also in the field of information retrieval (in which it is known as Latent Semantic Indexing - LSI). Search engines often use LSA to reduce search terms to the underlying topics. The search engine then looks for documents that score high on those topics - whether or not they contain the search terms the user entered. That is how search engines can return pertinent results even though the specific search terms we used may be missing in those results. See Manning, Raghavan, and Schütze (2008) for details on how LSA is used in information retrieval.

as good as the assumed model), but we gain interpretability: with LDA we also get word weights and topic weights, but they have a clear meaning (more on this later). It is unclear under what conditions LSA or LDA tends to produce superior results (Anaya 2011). The topics extracted by LDA are generally believed to be more clear-cut than those extracted by LSA, but on the other hand they are also believed to be broader (Crain et al 2012). Hence in this paper I try both LSA and LDA and compare the results.

4.2 Math

Just as we did in LSA, here too we start by transforming our texts into data, i.e., into a term-frequency matrix, where each entry is the frequency of term i on document j. Unlike what we did in LSA here we will use the term frequencies directly, without any TF-IDF or normalization (LDA models the data-generating process of term frequencies, not of TF-IDF values or any other transformations). LDA assumes the following data-generating process for each document.67 We begin by choosing the number of words in the document, N. We draw N from a Poisson distribution: N ∼ Poisson(ξ). Next we create a k-dimensional vector, θ, that contains the topic proportions in the document. For instance, if θ = [0.3, 0.2, 0.5] then 30% of the words will be assigned to the first topic, 20% to the second topic, and 50% to the third topic (to continue the example in the previous section these topics may be, say, "diabetes", "heart diseases", and "cancer"). We draw θ from a Dirichlet distribution: θ ∼ Dir(α). I set k alternately to 50, 100, 150, 200, and 300.68

67Here I draw heavily from Blei, Ng, and Jordan (2003). 68Like LSA, here too we cannot set k > 150 when using corpora A, for memory reasons.

are not ordered. If we run LDA with, say, k = 200 and k = 300, the first 200 topics will not be the same. Hence we cannot simply run LDA once, with k = 300, and then just drop columns later as desired. We need to run LDA again for every k. Now that we have the number of words (N) and the topic distribution (θ) we are ready to choose each word, w. First we draw its topic, z, from the k topics in θ, as follows: z ∼ Multinomial(θ). We then draw w from p(w|z, β), where β is a k × m

matrix whose entry βij is the probability of word j being selected if we randomly draw a word from topic i (m is the total number of unique words in the corpora). For instance, the word "insulin" may have probability 0.05 of being selected from the topic "diabetes" and 0.001 of being selected from the topic "heart diseases". There are three levels in the model: α and β are the same for all documents, θ is specific to each document, and z and w are specific to each word in each document.69 That is all there is to the data-generating model behind LDA. To estimate the model we need to find the α and β that maximize the probability of observing the ws:70

\[
p(\mathbf{w} \mid \alpha, \beta) = \frac{\Gamma\!\left(\sum_{i} \alpha_{i}\right)}{\prod_{i} \Gamma(\alpha_{i})} \int \left( \prod_{i=1}^{k} \theta_{i}^{\alpha_{i}-1} \right) \left( \prod_{n=1}^{N} \sum_{i=1}^{k} \prod_{j=1}^{V} (\theta_{i}\beta_{ij})^{w_{n}^{j}} \right) d\theta
\]

That function is intractable, which precludes exact inference. We need to use some approximative algorithm. I use the one created by Hoffman, Blei, and Bach (2010) - henceforth HBB. I choose HBB because it handles large matrices and runs relatively fast. See Appendix D for details. Like LSA, LDA also yields an m × k matrix of wordsXtopics weights and a k × n matrix of topicsXdocuments weights. Unlike LSA though, here these estimates have a natural interpretation. Each word weight is the probability of word i being selected

69We disregard ξ because, as Blei, Ng, and Jordan (2003) note, N is an ancillary variable, independent of everything else in the model, so we can ignore its randomness.
70V is the total number of unique words (i.e., the size of the vocabulary) and the other terms are defined as before.

if we randomly draw a word from topic j. And each topic weight is the proportion of words in document j drawn from topic i.
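The online variational Bayes algorithm of Hoffman, Blei, and Bach is implemented in, among other places, the gensim library. The sketch below shows how the LDA stage could be run with gensim; the tokenized_docs list is only a placeholder so the snippet runs, and the parameter values mirror the choices described in Appendix D. It is an illustration, not the exact code used here.

from gensim import corpora, models

# placeholder documents; the real input is one tokenized text per country-year
tokenized_docs = [["election", "vote", "opposition"],
                  ["censorship", "torture", "opposition"]]

dictionary = corpora.Dictionary(tokenized_docs)              # word <-> id mapping
bow = [dictionary.doc2bow(doc) for doc in tokenized_docs]    # term-frequency vectors

lda = models.LdaModel(bow,
                      num_topics=2,         # 50-300 in the real application
                      id2word=dictionary,
                      alpha='asymmetric',   # fixed normalized asymmetric prior
                      chunksize=20,         # documents per online-VB update
                      iterations=1000)      # cap on per-document inference iterations

topic_weights = lda.get_document_topics(bow[0])   # topic proportions for one document
word_weights = lda.get_topics()                   # k x m matrix of word probabilities (beta)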

5. Decision trees

In this section I motivate the use of decision trees (5.1) and explain the underlying math (5.2).

5.1 Motivation

LSA and LDA give us word weights, which map words onto topics, and topic weights, which map topics onto documents. But how do we go from that to democracy scores? In principle we could use OLS. As explained before, our reference set is the 184 independent countries in the year 1992, whose reference scores we extract from the UDS, and our virgin set is the 3,883 country-years in the 1993-2012 period. Thus we could, in principle: a) regress the UDS scores of the 184 reference cases on their respective topic scores; and b) use the estimated coefficients to compute ("predict") the democracy scores of the 3,883 virgin cases. But that would not work here. We have between 50 and 300 topics and only 184 reference cases, so with OLS we would quickly run out of degrees of freedom. Even with only 50 topics we would still be violating the "10 observations per variable" rule of thumb. And there may be all sorts of interactions and other non-linearities and to model these we would need additional terms, which would require even more degrees of freedom. Thus I use decision trees instead. With decision trees we split the observations recursively, until they are all allocated into homogeneous "leaves". Each split - and thus

the path to each leaf - is based on certain values of the regressors (if x1 > 10 then follow this branch, if x1 ≤ 10 then follow this other branch, etc). Decision trees are non-parametric: we are not estimating any parameters, we are simply trying to find the best splitting points (i.e., the splitting points that make the leaves as homogeneous as possible). Because decision trees are non-parametric we can use them even if we have more variables than observations (Grömping 2009). Moreover, decision trees handle non-linearities well: the inter-relations between variables are captured in the very hierarchical structure of the tree; we do not need to specify a priori what variables interact or in what ways. Hence my choice of tree-based regression techniques over alternatives like LASSO, Ridge, or forward stepwise regression, all of which can handle a large number of variables but at the cost of ignoring non-linearities. The next subsection explains the math behind decision trees. In what follows I draw heavily from Hastie, Tibshirani, and Friedman (2008).

5.2 Math

Decision trees were first introduced by Breiman et al (1984). Say that we have a dependent variable y, k independent variables, and n observations, and that all variables are continuous. We want to split the observations into two subsets (let us call them r1 and r2) based on some independent variable j and some value s. More specifically, we want to put all observations for which variable j takes a value ≤ s in one subset and all observations for which variable j takes a value > s in another subset. But we do not want to choose j and s arbitrarily: we want to choose the j and s that make each subset as homogeneous as possible when it comes to y. To do that we find the j and s that minimize the sum of the mean squared errors of the subsets:

\[
\min_{j,s} \left[ \frac{\sum_{y_i \in r_1} (y_i - \bar{y}_{r_1})^2}{n_{r_1}} + \frac{\sum_{y_i \in r_2} (y_i - \bar{y}_{r_2})^2}{n_{r_2}} \right]
\]

We find j and s iteratively. We then repeat the operation for the resulting subsets, partitioning each into two, and we keep doing so recursively until the subsets have fewer than l observations. The lower the l the better the model fits the reference set but the worse it generalizes to the virgin set. There is no rigorous way to choose l. Here I just follow two popular choices of l (l = 2 and l = 5) and compare the results. The outcome is a decision tree that relates the k independent variables to y. For instance, if we tried to predict individual income based on socioeconomic variables our decision tree might look like this:

years of schooling <= 8
    parental income <= $50k
        years of schooling <= 4: $24.7k | years of schooling > 4: $37.2k
    parental income > $50k
        age <= 25: $41.8k | age > 25: $49.9k
years of schooling > 8
    years of schooling <= 12
        height <= 5ft: $68.4k | height > 5ft: $72.6k
    years of schooling > 12
        age <= 25: $77.6k | age > 25: $85.9k

Figure 9. Example of decision tree

This is of course an extremely contrived example,71 but it gives us a concrete idea of what a decision tree looks like. Each node splits the observations in two groups according to some independent variable j and some splitting point s, chosen so as to

71In real life applications the trees are usually much bigger, with hundreds or thousands of nodes and leaves.

minimize the sum of the mean squared errors of the two subsets immediately below. We stop growing the tree when the subsets become small enough - i.e., when the subsets contain fewer than l observations. The very last subsets are the leaves of the tree.72 The average y of each leaf gives us the predicted y for new observations. Here, for instance, someone with 5 years of schooling, parental income over $50k, and aged 30 would have a predicted income of $49.9k. We can see how the tree captures non-linearities. Parental income only matters when the individual has eight years of schooling or less. Height only matters when the individual has more than eight but twelve or fewer years of schooling. Age only matters for two groups of people: those with eight years of schooling or less and parental income over $50k; and those with more than twelve years of schooling. And so on. With conventional regression we would need to model all such non-linearities explicitly, positing a priori what depends on what. And we would need several interactive and higher-order terms, which would take up degrees of freedom. With decision trees, however, the non-linearities are learned from the data, no matter how many or how complex they are (assuming of course that we have enough observations)73. Instead of creating a single decision tree we can create different decision trees and average out their predictions. The three most popular ways to do that are called random forests, extreme random forests, and AdaBoost. The conditions under which a single decision tree or multiple decision trees should be chosen are not clear. The conditions under which random forests, extreme random forests, or AdaBoost should be chosen are not clear either. Therefore I try all four techniques and compare their

72The process can be tweaked in a number of ways (we can replace the mean squared error by other criteria, we can split the subsets in more than two, we can “prune” the tree, etc), but here I stick to the basics. 73One difficulty with non-parametric algorithms is that “enough observations” is harder to define than with parametric algorithms.

results. I leave the explanation of random forests, extreme random forests, and AdaBoost to Appendix E.
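To make the prediction step concrete, the sketch below fits a single tree and a random forest with scikit-learn. The arrays X_ref (topic weights of the 1992 reference cases), y_ref (their UDS scores), and X_virgin (topic weights of the virgin cases) are filled with random placeholder values here, and the hyperparameter values shown (including the number of trees in the forest) are illustrative rather than the exact specification used in this paper.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# placeholder data so the snippet runs; the real X matrices come from LSA/LDA
rng = np.random.default_rng(0)
k = 50
X_ref, y_ref = rng.random((184, k)), rng.random(184)
X_virgin = rng.random((3883, k))

tree = DecisionTreeRegressor(min_samples_leaf=5).fit(X_ref, y_ref)   # l = 5
forest = RandomForestRegressor(n_estimators=500,
                               max_features='sqrt',                  # c = sqrt(k)
                               min_samples_leaf=5).fit(X_ref, y_ref)

scores_tree = tree.predict(X_virgin)       # machine-coded democracy scores
scores_forest = forest.predict(X_virgin)

# which topics drive the predictions the most
most_influential = np.argsort(forest.feature_importances_)[::-1][:10]

The last line is what makes the tree-based approach attractive for construct validity: the fitted ensemble reports how much each topic contributes to the predictions, which is exactly the kind of inspection described above.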

6. Results

I produced a total of 234 batches of democracy scores, each covering the same country-years, i.e., all 3,883 country-years in the 1993-2012 period (same as in my earlier paper). The batches vary by corpora (A, B, or C), topic-extraction method (LSA or LDA), prior α if the topic-extraction method is LDA (symmetric α or asymmetric normalized α), number of topics (50, 100, 150, 200, or 300), tree method (decision trees, random forests, extreme random forests, or AdaBoost), size of the subset of the k variables considered when splitting the tree nodes (c = k/3, c = √k, or c = k),74 and minimum node size (l = 2 or l = 5). Tables 9-13 below show the correlation between each batch and the UDS (there is one table for each number of topics - 50, 100, 150, 200, and 300)75. If the correlations are too low that means the batches are measuring democracy poorly. The benchmark is the correlation obtained in my earlier paper (i.e., using Wordscores): 0.74. The computations took a total of 1,512 CPU-hours, using high-performance servers.

74Only applicable for random forests, extreme random forests, and AdaBoost. See Appendix E.
75Tables 12 and 13 do not have corpora A results because corpora A has 6.3 million unique words, which multiplied by 200 or 300 topics yields a matrix too large to handle, even using high-memory computers.

Table 9. Correlations with UDS (using 50 topics)

Columns, left to right: corpora A (LSA, LDA sym. α, LDA asym. α), corpora B (LSA, LDA sym. α, LDA asym. α), corpora C (LSA, LDA sym. α, LDA asym. α).

c = k/3; l = 5
tree: -0.08  0.15  -0.07  -0.13  -0.10  -0.09  -0.05  -0.07  0.11
random forest: -0.27  -0.04  -0.20  -0.25  -0.25  -0.24  -0.21  -0.17  -0.05
ext. ran. for.: -0.24  -0.11  -0.01  -0.23  -0.20  -0.28  -0.18  -0.09  0.06
AdaBoost: -0.24  -0.06  -0.15  -0.24  0.17  -0.24  -0.21  -0.09  -0.003

c = √k; l = 5
tree: 0.02  -0.08  -0.05  -0.14  0.03  -0.009  -0.13  -0.01  -0.12
random forest: -0.28  -0.04  -0.20  -0.24  -0.26  -0.27  -0.25  -0.18  -0.10
ext. ran. for.: -0.24  -0.11  -0.02  -0.24  -0.22  -0.27  -0.06  -0.12  0.01
AdaBoost: -0.24  -0.04  -0.16  -0.23  -0.20  -0.27  -0.24  -0.11  -0.05

c = k; l = 5
tree: -0.10  0.09  -0.19  -0.14  -0.06  -0.04  -0.01  0.006  -0.04
random forest: -0.27  -0.05  -0.19  -0.24  -0.23  -0.21  -0.21  -0.15  -0.003
ext. ran. for.: -0.25  -0.09  0.03  -0.24  -0.19  -0.27  -0.16  -0.08  0.08
AdaBoost: -0.24  -0.06  -0.16  -0.23  -0.14  -0.23  -0.21  -0.08  0.03

c = k/3; l = 2
tree: -0.05  0.11  -0.001  -0.07  0.01  -0.04  0.02  0.12  0.07
random forest: -0.27  -0.03  -0.18  -0.25  -0.24  -0.26  -0.21  -0.16  -0.04
ext. ran. for.: -0.23  -0.09  0.005  -0.21  -0.19  -0.25  -0.17  -0.08  0.07
AdaBoost: -0.24  -0.05  -0.15  -0.23  -0.19  -0.26  -0.21  -0.10  -0.03

c = √k; l = 2
tree: -0.09  0.01  -0.03  -0.03  -0.14  -0.11  -0.03  0.05  -0.006
random forest: -0.27  -0.04  -0.18  -0.25  -0.25  -0.26  -0.22  -0.18  -0.08
ext. ran. for.: -0.22  -0.10  -0.004  -0.21  -0.19  -0.24  -0.17  -0.11  0.02
AdaBoost: -0.24  -0.05  -0.15  -0.23  -0.19  -0.26  -0.21  -0.11  -0.08

c = k; l = 2
tree: -0.09  0.07  -0.20  -0.15  -0.02  -0.02  -0.05  0.03  -0.002
random forest: -0.26  -0.06  -0.18  -0.24  -0.23  -0.21  -0.21  -0.14  0.001
ext. ran. for.: -0.24  -0.08  0.03  -0.23  -0.17  -0.24  -0.16  -0.07  0.09
AdaBoost: -0.24  -0.07  -0.07  -0.23  -0.14  -0.23  -0.22  -0.08  0.02

Table 10. Correlations with UDS (using 100 topics)

Columns, left to right: corpora A (LSA, LDA sym. α, LDA asym. α), corpora B (LSA, LDA sym. α, LDA asym. α), corpora C (LSA, LDA sym. α, LDA asym. α).

c = k/3; l = 5
tree: -0.06  0.14  0.06  -0.0009  0.12  -0.02  -0.10  -0.15  0.02
random forest: -0.23  -0.24  -0.04  -0.18  -0.13  -0.31  -0.18  -0.20  -0.18
ext. ran. for.: -0.19  -0.09  0.02  -0.18  -0.05  -0.19  -0.15  -0.09  -0.16
AdaBoost: -0.19  -0.18  -0.002  -0.15  -0.09  -0.20  -0.15  -0.13  -0.11

c = √k; l = 5
tree: -0.02  -0.11  -0.01  0.001  -0.05  -0.07  -0.03  0.001  0.003
random forest: -0.23  -0.24  -0.06  -0.20  -0.16  -0.29  -0.20  -0.22  -0.20
ext. ran. for.: -0.18  -0.09  0.008  -0.18  -0.09  -0.21  -0.16  -0.17  -0.17
AdaBoost: -0.19  -0.19  -0.01  -0.16  -0.09  -0.20  -0.16  -0.16  -0.13

c = k; l = 5
tree: -0.11  -0.07  0.09  -0.03  0.05  -0.11  0.10  0.05  -0.17
random forest: -0.22  -0.23  -0.02  -0.17  -0.12  -0.32  -0.16  -0.20  -0.16
ext. ran. for.: -0.19  -0.10  0.03  -0.17  -0.02  -0.18  -0.14  -0.03  -0.15
AdaBoost: -0.20  -0.17  0.02  -0.14  -0.06  -0.21  -0.13  -0.13  -0.10

c = k/3; l = 2
tree: -0.12  -0.02  -0.07  -0.17  0.05  0.15  -0.07  0.003  0.02
random forest: -0.23  -0.23  -0.04  -0.19  -0.13  -0.29  -0.18  -0.20  -0.16
ext. ran. for.: -0.18  -0.09  0.02  -0.18  -0.08  -0.20  -0.15  -0.07  -0.15
AdaBoost: -0.19  -0.17  -0.01  -0.15  -0.11  -0.19  -0.14  -0.13  -0.12

c = √k; l = 2
tree: -0.06  -0.18  0.06  -0.06  -0.10  -0.15  0.01  -0.01  -0.01
random forest: -0.22  -0.25  -0.03  -0.19  -0.14  -0.28  -0.19  -0.21  -0.18
ext. ran. for.: -0.17  -0.09  0.01  -0.17  -0.08  -0.20  -0.15  -0.14  -0.17
AdaBoost: -0.18  -0.18  -0.02  -0.15  -0.10  -0.18  -0.15  -0.15  -0.14

c = k; l = 2
tree: -0.10  -0.08  0.09  -0.04  0.05  -0.10  0.12  -0.03  -0.15
random forest: -0.22  -0.23  -0.01  -0.17  -0.12  -0.31  -0.16  -0.19  -0.16
ext. ran. for.: -0.19  -0.10  0.04  -0.17  -0.03  -0.17  -0.14  -0.02  -0.15
AdaBoost: -0.19  -0.17  0.007  -0.14  -0.07  -0.20  -0.13  -0.14  -0.11

Table 11. Correlations with UDS (using 150 topics)

Columns, left to right: corpora A (LSA, LDA sym. α, LDA asym. α), corpora B (LSA, LDA sym. α, LDA asym. α), corpora C (LSA, LDA sym. α, LDA asym. α).

c = k/3; l = 5
tree: 0.002  -0.13  0.07  -0.07  -0.14  -0.09  0.008  -0.14  0.03
random forest: -0.20  -0.20  -0.10  -0.21  -0.25  -0.27  -0.20  -0.19  -0.24
ext. ran. for.: -0.17  -0.11  -0.07  -0.16  -0.14  -0.16  -0.15  -0.04  -0.17
AdaBoost: -0.16  -0.16  -0.03  -0.17  -0.27  -0.16  -0.17  -0.16  -0.18

c = √k; l = 5
tree: -0.04  -0.08  -0.06  -0.12  -0.05  0.04  -0.09  -0.02  -0.09
random forest: -0.20  -0.19  -0.11  -0.21  -0.28  -0.31  -0.21  -0.21  -0.26
ext. ran. for.: -0.16  -0.12  -0.08  -0.17  -0.16  -0.14  -0.16  -0.06  -0.21
AdaBoost: -0.16  -0.17  -0.04  -0.17  -0.29  -0.21  -0.17  -0.19  -0.20

c = k; l = 5
tree: -0.02  -0.16  0.04  -0.02  -0.06  0.04  -0.03  0.03  0.05
random forest: -0.19  -0.19  -0.07  -0.20  -0.25  -0.24  -0.19  -0.18  -0.24
ext. ran. for.: -0.18  -0.11  -0.07  -0.16  -0.15  -0.18  -0.14  -0.04  -0.15
AdaBoost: -0.15  -0.12  -0.01  -0.17  -0.24  -0.12  -0.17  -0.14  -0.18

c = k/3; l = 2
tree: -0.01  -0.11  -0.05  -0.04  -0.0003  -0.10  -0.08  -0.03  -0.13
random forest: -0.20  -0.20  -0.10  -0.21  -0.27  -0.30  -0.20  -0.18  -0.24
ext. ran. for.: -0.17  -0.11  -0.05  -0.16  -0.15  -0.15  -0.15  -0.03  -0.17
AdaBoost: -0.15  -0.15  -0.02  -0.17  -0.29  -0.19  -0.17  -0.16  -0.19

c = √k; l = 2
tree: 0.03  -0.21  0.03  -0.02  -0.01  -0.02  -0.05  -0.08  0.06
random forest: -0.20  -0.19  -0.10  -0.21  -0.27  -0.30  -0.21  -0.20  -0.24
ext. ran. for.: -0.16  -0.11  -0.05  -0.16  -0.15  -0.14  -0.15  -0.06  -0.20
AdaBoost: -0.15  -0.14  -0.03  -0.16  -0.29  -0.19  -0.17  -0.20  -0.19

c = k; l = 2
tree: -0.01  -0.18  0.04  -0.01  -0.04  -0.07  -0.02  0.06  0.08
random forest: -0.19  -0.18  -0.08  -0.19  -0.24  -0.22  -0.19  -0.18  -0.23
ext. ran. for.: -0.17  -0.11  -0.07  -0.16  -0.13  -0.18  -0.14  -0.03  -0.15
AdaBoost: -0.15  -0.11  -0.01  -0.16  -0.23  -0.13  -0.17  -0.14  -0.18

Table 12. Correlations with UDS (using 200 topics)

Columns, left to right: corpora B (LSA, LDA sym. α, LDA asym. α), corpora C (LSA, LDA sym. α, LDA asym. α).

c = k/3; l = 5
tree: 0.01  -0.04  0.02  -0.02  0.04  0.26
random forest: -0.14  -0.09  -0.24  -0.14  -0.10  -0.20
ext. ran. for.: -0.14  0.05  -0.19  -0.13  -0.05  -0.24
AdaBoost: -0.13  0.007  -0.15  -0.12  -0.11  -0.19

c = √k; l = 5
tree: -0.09  0.008  0.08  -0.12  -0.02  -0.05
random forest: -0.17  -0.12  -0.25  -0.15  -0.13  -0.22
ext. ran. for.: -0.14  0.03  -0.20  -0.13  -0.11  -0.22
AdaBoost: -0.13  -0.01  -0.18  -0.13  -0.13  -0.21

c = k; l = 5
tree: -0.08  0.03  0.009  -0.02  -0.08  -0.01
random forest: -0.13  -0.06  -0.22  -0.13  -0.09  -0.18
ext. ran. for.: -0.15  0.06  -0.14  -0.14  -0.02  -0.21
AdaBoost: -0.12  0.03  -0.11  -0.10  -0.10  -0.17

c = k/3; l = 2
tree: -0.04  0.06  -0.20  -0.10  -0.08  -0.05
random forest: -0.16  -0.11  -0.25  -0.14  -0.10  -0.19
ext. ran. for.: -0.14  0.03  -0.20  -0.13  -0.03  -0.23
AdaBoost: -0.13  -0.01  -0.18  -0.11  -0.11  -0.20

c = √k; l = 2
tree: -0.04  0.07  -0.02  -0.08  0.04  -0.05
random forest: -0.16  -0.11  -0.25  -0.16  -0.13  -0.22
ext. ran. for.: -0.14  0.03  -0.20  -0.14  -0.08  -0.21
AdaBoost: -0.13  -0.01  -0.18  -0.12  -0.14  -0.23

c = k; l = 2
tree: -0.08  0.05  0.004  -0.02  -0.08  -0.03
random forest: -0.14  -0.06  -0.22  -0.13  -0.09  -0.17
ext. ran. for.: -0.14  0.06  -0.14  -0.14  -0.009  -0.20
AdaBoost: -0.12  0.02  -0.13  -0.11  0.10  -0.18

Table 13. Correlations with UDS (using 300 topics)

Columns, left to right: corpora B (LSA, LDA sym. α, LDA asym. α), corpora C (LSA, LDA sym. α, LDA asym. α).

c = k/3; l = 5
tree: -0.02  0.007  0.06  -0.05  0.02  -0.07
random forest: -0.11  -0.21  -0.15  -0.11  -0.17  -0.24
ext. ran. for.: -0.10  -0.12  0.02  -0.12  -0.13  -0.02
AdaBoost: -0.09  -0.20  -0.17  -0.10  -0.16  -0.23

c = √k; l = 5
tree: -0.01  -0.07  0.01  -0.01  -0.008  -0.13
random forest: -0.13  -0.27  -0.17  -0.13  -0.19  -0.25
ext. ran. for.: -0.11  -0.18  -0.01  -0.11  -0.18  -0.06
AdaBoost: -0.11  -0.25  -0.19  -0.11  -0.18  -0.24

c = k; l = 5
tree: -0.01  0.03  -0.10  -0.0002  -0.08  0.04
random forest: -0.10  -0.17  -0.13  -0.10  -0.16  -0.24
ext. ran. for.: -0.10  -0.09  0.04  -0.12  -0.07  -0.004
AdaBoost: -0.10  -0.17  -0.17  -0.10  -0.13  -0.22

c = k/3; l = 2
tree: -0.04  -0.006  -0.07  0.01  -0.06  -0.05
random forest: -0.11  -0.25  -0.15  -0.11  -0.17  -0.24
ext. ran. for.: -0.10  -0.16  0.02  -0.12  -0.11  -0.02
AdaBoost: -0.10  -0.24  -0.19  -0.10  -0.17  -0.23

c = √k; l = 2
tree: -0.01  -0.02  0.06  -0.00009  -0.13  0.02
random forest: -0.13  -0.25  -0.17  -0.13  -0.20  -0.24
ext. ran. for.: -0.11  -0.16  -0.01  -0.11  -0.16  -0.07
AdaBoost: -0.11  -0.24  -0.21  -0.11  -0.18  -0.24

c = k; l = 2
tree: -0.03  0.005  -0.07  0.0008  -0.09  0.02
random forest: -0.10  -0.17  -0.13  -0.10  -0.15  -0.23
ext. ran. for.: -0.10  -0.08  0.04  -0.12  -0.07  -0.006
AdaBoost: -0.09  -0.18  -0.16  -0.10  -0.15  -0.22

As we observe, the results are disappointing. All batches perform poorly, with the correlations (in absolute value) usually in the 0.10-0.20 range - way below the correlation of 0.74 obtained with Wordscores. The results are the same regardless of corpora, topic extraction method (LSA or LDA), α, c, or l. The only parameters that seem to make some small difference are the number of topics (with LSA, the fewer the topics the better) and the prediction method (random forests outperform AdaBoost, which outperforms extreme random forests, which outperform decision trees). But even the highest correlations are in the 0.30 vicinity, which still implies an unacceptably high noise-to-signal ratio.76 One possibility is that the poor results are due to extraneous topics. Let us inspect the top twenty words of the top five topics extracted with LSA (using corpora A):

76Nearly all the correlations are negative, which implies that what little regime-related information is being captured has to do with autocracy.

Table 14. Top 5 topics extracted with LSA

topic #1 | topic #2 | topic #3 | topic #4 | topic #5
0.415 copyright | -0.251 blogs | -0.248 blogs | 0.227 african | 0.226 blogs
0.388 u | 0.215 arab | -0.238 obama | -0.19 kosovo | 0.226 kosovo
0.241 i | 0.211 israeli | -0.181 arab | 0.187 kabila | -0.219 iraq
0.151 mr | 0.209 palestinian | 0.181 kosovo | 0.175 rwanda | -0.202 boucher
0.15 blogs | 0.196 israel | -0.18 re-distributors | 0.159 hutu | 0.174 obama
0.137 iraq | 0.189 arafat | -0.156 israel | 0.154 liberia | 0.165 re-distributors
0.12 pg | -0.184 mln | -0.152 palestinian | 0.149 africa | -0.157 bush
0.11 obama | -0.181 re-distributors | -0.142 israeli | -0.144 soviet | 0.157 serb
0.109 newswire | -0.18 obama | -0.138 newstex | 0.14 zaire | 0.153 mln
0.106 re-distributors | -0.144 newswire | 0.13 milosevic | -0.13 milosevic | 0.151 newstex
0.102 bush | 0.142 rabin | 0.129 serb | -0.128 serb | -0.135 copyright
0.091 european | 0.14 palestinians | 0.121 european | 0.125 rwandan | 0.133 milosevic
0.086 mln | -0.138 revs | -0.114 syria | 0.124 leone | 0.131 bosnian
0.084 boucher | 0.138 iraq | 0.112 yeltsin | 0.121 zimbabwe | 0.13 revs
0.078 israel | -0.138 newstext | 0.111 soviet | 0.12 angola | -0.126 i
0.075 arab | 0.126 islamic | 0.108 bosnian | 0.12 sierra | 0.124 serbs
0.075 eu | 0.118 saudi | -0.104 mln | -0.117 yeltsin | -0.12 fleischer
0.073 reserved | 0.117 netanyahu | -0.102 iraq | 0.115 congo | 0.107 serbia
0.071 soviet | 0.109 lebanon | 0.1 serbs | -0.113 european | 0.099 yugoslav
0.071 q | -0.108 copyright | -0.099 saudi | 0.107 uganda | -0.098 ereli
0.068 palestinian | 0.107 syria | 0.096 hke | 0.101 burundi | 0.096 european
...

As we see, topic #1 is incoherent, topic #2 is "Middle East", topic #3 is incoherent, topic #4 is "Africa & former Yugoslavia", and topic #5 is "former Yugoslavia". These topics are not useful here. What we need are narrow, regime-related topics like "censorship" or "torture", not place-related topics like "Middle East" or "former Yugoslavia". Also, place-specific topics tell us that LSA gives a lot of weight to words that survived our probabilistic removal of proper nouns (see earlier paper for details on this point). Let us now inspect some topics extracted with LDA. Here are the first five topics obtained using LDA, corpora B, 100 topics, and asymmetric α, which is the LDA model that performed the best (correlation of 0.32 with the UDS when using random forests, c = k, and l = 5):77

77LDA topics are not ordered, so the first five topics are not necessarily the top five topics.

Table 15. First 5 topics extracted with LDA (using 100 topics, corpora B, and asymmetric α)

topic #1 | topic #2 | topic #3 | topic #4 | topic #5
0.012 s | 0.049 russia | 0.015 is | 0.021 is | 0.039 vietnam
0.009 is | 0.034 ukraine | 0.013 s | 0.14 was | 0.018 s
0.008 ncube | 0.018 russian | 0.010 was | 0.010 party | 0.016 thailand
0.006 said | 0.014 soviet | 0.007 has | 0.009 s | 0.014 thai
0.005 has | 0.013 s | 0.007 eu | 0.008 are | 0.011 said
0.004 was | 0.013 uzbekistan | 0.007 are | 0.008 said | 0.008 is
0.004 are | 0.012 moscow | 0.007 uk | 0.008 has | 0.008 bangkok
0.003 reserved | 0.012 is | 0.006 said | 0.007 government | 0.007 was
0.003 rights | 0.011 president | 0.005 minister | 0.007 political | 0.007 vietnamese
0.003 burges | 0.009 said | 0.004 been | 0.005 had | 0.006 asian
0.003 gono | 0.007 former | 0.004 london | 0.005 election | 0.006 asean
0.002 were | 0.007 has | 0.004 union | 0.005 been | 0.006 news
0.002 words | 0.006 yeltsin | 0.004 were | 0.005 elections | 0.006 indonesia
0.002 been | 0.006 are | 0.004 had | 0.004 were | 0.005 has
0.002 milkis | 0.006 was | 0.004 labour | 0.004 rights | 0.005 singapore
0.002 feha | 0.006 georgia | 0.004 government | 0.004 should | 0.005 are
0.002 completecampaigns | 0.005 presidential | 0.003 english | 0.004 words | 0.004 cambodia
0.002 had | 0.005 news | 0.003 more | 0.003 english | 0.004 foreign
0.002 news | 0.004 cis | 0.003 words | 0.003 documents | 0.004 minister
0.002 length | 0.004 central | 0.003 countries | 0.003 load-date | 0.004 hanoi
0.002 times | 0.004 interfax | 0.003 prime | 0.003 language | 0.004 words
...

The result is very similar to what we found with LSA. Topic #1 is all over the place, topic #2 is "former USSR republics", topic #3 is "UK", topic #4 is "politics", and topic #5 is "East/Southeast Asia". Again I obtained place-specific topics (or excessively broad topics like "politics") and not the narrow, regime-related topics I expected to find. I inspected every topic of every specification and they are all equally disappointing: either the words are disparate and do not form a coherent topic or the topic is too broad or not regime-related. My initial idea was to inspect the most influential topics so I could know exactly what aspects of democracy are driving the results. But that is just not feasible: the topics do not correspond to aspects of democracy. In a sense, all topics are extraneous. Hence the democracy scores produced here are little more than noise, which is why they correlate so weakly with the UDS.

7. Conclusion

The goal of this paper was to improve on the democracy index produced in the earlier paper, which was based on the Wordscores algorithm. More specifically, I expected to produce a democracy index that could let us easily know what concrete phenomena - i.e., what democracy manifestations - are being captured. I tried to achieve that by dropping Wordscores and replacing it by a combination of topic-extraction and decision trees. I failed to achieve the desired goal. Instead I produced democracy indices that do not even capture democracy in the first place; they are only slightly better than random guesses. Wordscores, in all its simplicity (it is based on very simple math - high school algebra and Bayes' theorem), outperforms the more sophisticated methods used here. This echoes works like Klemmensen, Hobolt, and Hansen (2007) and Beauchamp

(2010), who also find that their Wordscores-based measures correlate highly with other indices. As Lowe (2008) notes, despite Wordscores' limitations - it is atheoretical and its assumptions are not made explicit - "The empirical success of the method suggests that these assumptions may be reasonable." (370). The results suggest that the road ahead does not pass through fancier algorithms, but perhaps through minor changes in Wordscores itself. For instance, we could try adjusting Wordscores to take into account "disproportionate" press reactions. Minor setbacks in highly democratic countries often attract a great deal of press coverage (e.g., the recent blocking of explicit internet content in Britain). Similarly, minor regime liberalizations in highly authoritarian countries often attract a great deal of press coverage as well (e.g., the recent decision of the Cuban government to allow Yoani Sánchez - a local journalist opposed to the Castro regime - to travel abroad for the first time). Newspapers and magazines often use words like "repression" and "democratization" liberally when referring to such events and that may cause unwarranted score fluctuations. To correct for that perhaps we could use the scores for year t0 as priors in the estimation of the scores for year t1 (instead of starting anew every year, as I do here). That might make the scores more robust to minor fluctuations and to the idiosyncrasies of press coverage.

Conclusion

In this dissertation I investigated the flaws of current democracy indices and proposed a new, improved one, which I called Automated Democracy Scores (ADS). This new index has limitations of its own, but it is replicable, it has standard errors small enough to actually distinguish between cases, and it avoids contamination by human coders' ideological biases. It is also cost-effective: indices like the Polity and the Freedom House require a legion of human coders to update the country scores every year, while the ADS can be updated by a single person; all that is needed is access to news articles and some familiarity with webscraping. The next step is to put the ADS to use. It would be interesting to re-assess what we know about democracy and natural resources, or about democracy and social policy, by replicating existing works while substituting the ADS for whatever other indices were used originally. I also plan to break down the time series into daily scores and, perhaps, real-time scores. We have increasingly more social data available in real time - most prominently, data from social networks - but our democracy indices are still yearly, which precludes us from asking many interesting questions. Finally, I intend to extend the ADS to the pre-1993 period; that should be possible in a few years, as more and more newspapers digitize their archives.

All data and code used in the three papers are available from thiagomarzagao.com

References

Alvarez, Mike, Jos´eCheibub, Fernando Limongi, and Adam Przeworski. 1996. “Classifying Political Regimes.” Studies in Comparative International Development, 31 (2): 3-36.

Anaya, Leticia. 2011. Comparing Latent Dirichlet Allocation and Latent Semantic Analysis as Classifiers. PhD dissertation, Department of Information and Decision Sciences, University of North Texas.

Banks, Arthur. 1971. Cross-polity time-series data (electronic dataset updated to 1988). Cambridge, MA: MIT Press.

Beauchamp, Nick. 2010. Text-Based Scaling of Legislatures: a Comparison of Methods with Applications to the US Senate and UK House of Commons. Unpublished manuscript.

Benoit, Kenneth, and Michael Laver. 2008. “Compared to What? A Comment on ‘A Robust Transformation Procedure for Interpreting Political Text’ by Martin and Vanberg.” Political Analysis, 16 (1): 101-111.

Blei, David, Andrew Ng, and Michael Jordan. 2003. “Latent Dirichlet Allocation.” Journal of Machine Learning Research, 3: 993-1022.

Bollen, Kenneth. 1989. Structural equations with latent variables. New York, NY: Wiley.

Bollen, Kenneth. 1993. "Liberal Democracy: Validity and Methods Factors in Cross-National Measures." American Journal of Political Science, 37 (4): 1207-1230.

Bollen, Kenneth and Pamela Paxton. 2000. "Subjective Measures of Liberal Democracy." Comparative Political Studies, 33 (1): 58-86.

Breiman, Leo. 2001. “Random Forests.” Machine Learning , 45 (1): 5-32.

Breiman, Leo, Jerome Friedman, Charles Stone, and R. A. Olshen. 1984. Classification and regression trees, Wadsworth.

Coppedge, Michael, John Gerring, David Altman, Michael Bernhard, Steven Fish, Allen Hicken, Matthew Kroenig, Staffan I. Lindberg, Kelly McMann, Pamela Paxton, Holli Semetko, Svend-Erik Skaaning, Jeffrey Staton, Jan Teorell. 2011. "Conceptualizing and Measuring Democracy: a New Approach." Perspectives on Politics, 9 (2): 247-267.

Crain, Steven, Ke Zhou, Shuang-Hong Yang, and Hongyuan Zha. 2012. "Dimensionality Reduction and Topic Modeling: from Latent Semantic Indexing to Latent Dirichlet Allocation and Beyond." In Mining Text Data, eds. Charu Aggarwal and ChengXiang Zhai, 129-161, Springer.

Dahl, Robert. 1972. Polyarchy. Yale University Press.

Drucker, Harris. 1997. “Improving Regressors Using Boosting Techniques.” ICML, 97: 107-115.

Freedom House. 2013. Freedom In the World. Freedom House.

Freund, Yoav, and Robert Schapire. 1997. “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting.” Journal of Computer and System Sciences, 55 (1): 119-139.

Friedman, Milton. 1962. Capitalism and Freedom. University of Chicago Press.

Fuller, Wayne. 1987. Measurement Error Models. John Wiley & Sons.

Gastil, Raymond D. 1988. Freedom in the World: Political Rights and Civil Liberties 1987-1988. Washington, DC: Freedom House.

Gleditsch, Kristian, and Michael Ward. 1997. “Double Take: A Reexamination of Democracy and Autocracy in Modern Polities.” Journal of Conflict Resolution, 41 (3): 361-383.

Grimmer, Justin, and Brandon Stewart. 2013. "Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts." Political Analysis, 21 (3): 267-297.

Grömping, Ulrike. 2009. "Variable Importance Assessment in Regression: Linear Regression versus Random Forest." The American Statistician, 63 (4): 309-319.

Gu, Ming, James Demmel, and Inderjit Dhillon. 1994. Efficient Computation of the Singular Value Decomposition with Applications to Least Squares Problems. Technical Report CS-94-257, Department of Computer Science, University of Tennessee.

Gugiu, Mihaiela, and Miguel Centellas. 2013. "The Democracy Cluster Classification Index." Political Analysis, 21 (3): 334-349.

Halko, Nathan, Per-Gunnar Martinsson, and Joel Tropp. 2011. “Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions.” SIAM Review, 53 (2): 217-288.

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2008. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.

Hayek, Friedrich von. 1944. The Road to Serfdom. University of Chicago Press.

Heritage Foundation. 2014. Index of Economic Freedom. Available at http://www.heritage.org/index/

Hoffman, Matthew, David Blei, and Francis Bach. 2010. “Online Learning for Latent Dirichlet Allocation.” NIPS, 2 (3): 1-9.

Hopkins, Daniel, and Gary King. 2007. Extracting Systematic Social Science Meaning from Text. Unpublished manuscript.

Keefer, Philip. 2002. DPI2000 Database of Political Institutions: Changes and Variable Definitions. Development Research Group, The World Bank.

King, Gary, and Will Lowe. 2003. “An Automated Information Extraction Tool for International Conflict Data with Performance as Good as Human Coders: A Rare Events Evaluation Design.” International Organization, 57 (3): 617-642.

Klein, Daniel B., and Charlotta Stern. 2005. “Professors and Their Politics: The Policy Views of Social Scientists.” Critical Review, 17 (3-4): 257-303.

Klemmensen, Robert, Sara Binzer Hobolt, and Martin Ejnar Hansen. 2007. "Estimating Policy Positions Using Political Texts: An Evaluation of the Wordscores Approach." Electoral Studies, 26 (4): 746-755.

Kolenikov, Stas. 2009. "Confirmatory Factor Analysis Using 'confa'." Stata Journal, 9 (3): 329-373.

Landauer, Thomas, Peter Foltz, and Darrell Laham. 1998. “An Introduction to Latent Semantic Analysis.” Discourse Processes, 25 (2-3): 259-284.

Laver, Michael, Kenneth Benoit, and John Garry. 2003. "Extracting Policy Positions from Political Texts Using Words as Data." American Political Science Review, 97 (2): 311-331.

Lawson, Robert, and J.R. Clark. 2010. Examining the Hayek-Friedman hypothesis on economic and political freedom. Journal of Economic Behavior and Organization, 74: 230-239.

Leetaru, Kalev, and Philip A. Schrodt. 2013. “GDELT: Global Data on Events, Location, and Tone, 1979-2012.” Paper presented at the ISA Annual Convention.

Lowe, Will. 2008. “Understanding Wordscores.” Political Analysis, 16 (4): 356-371.

Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. 1st ed. Cambridge University Press.

Maranto, Robert, Fredrick Hess, and Richard Redding. 2009. The Politically Correct University: Problems, Scope, and Reforms. AEI Press.

Maranto, Robert, and Matthew Woessner. 2012. “Diversifying the Academy: How Conservative Academics Can Thrive in Liberal Academia.” PS: Political Science & Politics, 45 (3): 469-474.

Marshall, Monty, Ted Gurr, and Keith Jaggers. 2013. Polity IV Project: Political Regime Characteristics and Transitions, 1800-2012, Dataset Users' Manual. Vienna, VA: Center for Systemic Peace.

Martin, Dian, and Michael Berry. 2011. "Mathematical Foundations Behind Latent Semantic Analysis." In Handbook of Latent Semantic Analysis, eds. Thomas Landauer, Danielle McNamara, Simon Dennis, and Walter Kintsch, 35-55, Routledge.

Martin, Lanny W., and Georg Vanberg. 2008. "A Robust Transformation Procedure for Interpreting Political Text." Political Analysis, 16 (1): 93-100.

McLachlan, Geoffrey, and Thriyambakam Krishnan. 2007. The EM Algorithm and Extensions. John Wiley & Sons.

Monroe, Burt L., Michael P. Colaresi, and Kevin M. Quinn. 2008. "Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict." Political Analysis, 16 (4): 372-403.

Munck, Gerardo L., and Jay Verkuilen. 2002. "Conceptualizing and Measuring Democracy: Evaluating Alternative Indices." Comparative Political Studies, 35 (1): 5-34.

Paxton, Pamela et al. 2001. "Monte Carlo Experiments: Design and Implementation." Structural Equation Modeling, 8 (2): 287-312.

Pemstein, Daniel, Stephen A. Meserve, and James Melton. 2010. “Democratic Compromise: A Latent Variable Analysis of Ten Measures of Regime Type.” Political Analysis, 18 (4): 426-449.

Polikar, Robi. 2006. “Ensemble-based systems in decision-making.” Circuits and systems magazine, 6 (3): 21-45.

Quinn, Kevin M., Burt L. Monroe, Michael Colaresi, Michael H. Crespin, and Dragomir R. Radev. 2010. "How to Analyze Political Attention with Minimal Assumptions and Costs." American Journal of Political Science, 54 (1): 209-228.

Řehůřek, Radim. 2010. Fast and Faster: a Comparison of Two Streamed Matrix Decomposition Algorithms. Unpublished manuscript.

Sartori, Giovanni. 1970. "Concept Misformation in Comparative Politics." American Political Science Review, 64 (4): 1033-1053.

Schedler, Andreas. 2012. "Judgement and Measurement in Political Science." Perspectives on Politics, 10 (1): 21-36.

Schrodt, Philip. 2001. Automated Coding of International Event Data Using Sparse Parsing Techniques. Unpublished manuscript.

Slapin, Jonathan B., and Sven-Oliver Proksch. 2008. "A Scaling Model for Estimating Time-series Party Positions from Texts." American Journal of Political Science, 52 (3): 705-722.

Steiner, Nils. 2012. "Testing for a Political Bias in Freedom House Democracy Scores: Are US Friendly States Judged to Be More Democratic?" Available at SSRN 1919870.

Stewart, Gilbert. 1993. “On the early history of the singular value decomposition.” SIAM Review, 35 (4): 551-566.

Sussman, Leonard R. 1982. "The Continuing Struggle for Freedom of Information." In Freedom in the World, edited by Raymond D. Gastil, 101-119. Santa Barbara, CA: Greenwood.

Taylor, John. 1997. An Introduction to Error Analysis: the Study of Uncertainties in Physical Measurements. 2nd ed. University Science Books.

Treier, Shawn, and Simon Jackman. 2008. “Democracy as a Latent Variable.” American Journal of Political Science, 52 (1): 201-217.

Appendix A: SEM Estimation

In SEM estimation we must find the estimates that minimize the difference between the observed variance-covariance matrix (S, in SEM notation) and the implied variance-covariance matrix (Σ, in SEM notation). The S is simply the observed variance-covariance matrix of the indicators. The Σ is the theoretical variance-covariance matrix derived from the model, i.e.

\[
\Sigma = E[xx'] = E[(\Lambda_x \xi + \delta)(\xi' \Lambda_x' + \delta')] = \Lambda_x E(\xi\xi') \Lambda_x' + \Theta_\delta = \Lambda_x \Phi \Lambda_x' + \Theta_\delta
\]

where x is the matrix of indicators, Λx is the matrix of factor loadings, ξ is the matrix of factors, Φ is the variance-covariance matrix of factors, and Θδ is the variance-covariance matrix of random measurement errors (Bollen 1989, 236-237). The estimates that minimize the difference between S and Σ are found by minimizing the following function via maximum likelihood:

\[
F = \log|\Sigma| + \mathrm{trace}[S\Sigma^{-1}] - \log|S| - k
\]

where log|Σ| is the natural logarithm of the determinant of Σ, trace[SΣ⁻¹] is the trace of the product of S and the inverse of Σ, log|S| is the natural logarithm of the determinant of S, and k is the number of indicators (Bollen 1989, 254). The factor scores, in turn, are produced using the formula L̂ = Σ̂LL Λ̂′ Σ̂xx⁻¹ x, where L̂ is the matrix of estimated factor scores, Σ̂LL is the estimated variance-covariance matrix of factors, Λ̂ is the matrix of estimated factor loadings, Σ̂xx is the estimated variance-covariance matrix of indicators, and x is the mean-centered matrix of observed indicators. In Bollen and Paxton the factor scores are based on the average estimated SEM parameters of 1979 and 1980.78

78Bollen and Paxton also check whether the parameters are stable over time.
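For concreteness, the fitting function F can be evaluated in a few lines of numpy. The sketch below assumes S and Σ are supplied as numpy arrays; it only computes the discrepancy for given matrices, not the full SEM estimation.

import numpy as np

def ml_discrepancy(S, Sigma):
    """F = log|Sigma| + trace(S Sigma^-1) - log|S| - k."""
    k = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - k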

Appendix B: Replication

I collected the four Freedom House measures and the four CNTS measures and modeled the latent factors exactly as in Bollen and Paxton. I estimated the model for the same years Bollen and Paxton did (1972-1988) and obtained the same fit statistics they did (see Figure 3 in Section 3, above).79 The second part of the replication - i.e., regressing factor scores on country-level variables - was less successful, with several discrepancies in terms of coefficient signs and statistical significance (see Table 2 in Section 3). It is hard to know why. Prof. Bollen and Prof. Paxton no longer have the data, so I had to tabulate everything from scratch, using the same sources. Many of these were printed sources, so typing errors are a possibility. Some of the data were also available from secondary, electronic sources. I tried these as well, but still could not replicate the original estimates exactly. I also tried coding the media coverage variable in different ways. Bollen and Paxton report that they built that variable based on how often the country appeared on the New York Times, on the CBS News Index, and on the Facts on File almanac, but they do not report how exactly: was it a mere sum of each country's mentions in each of those three sources? Was it a weighted sum? Was it logged? Were those three sources combined into a single principal components value? I tried each of these possibilities, but still could not obtain the exact same results Bollen and Paxton did. Finally, I tried dropping different combinations of countries. Bollen and Paxton's 1980 dataset has 81 countries whereas my own 1980 dataset has 112 countries. Bollen and Paxton do not specify what those 81 countries are, so I tried dropping several combinations of 31 countries (112 - 81 = 31) to see if I could obtain the same results, but that did not work either. Table 2 (see Section 3, above) reports the regression estimates I obtained for the year 1980, using electronic sources and all the 112 countries, and coding media coverage as the natural logarithm of how often the country was mentioned on the New York Times, CBS News Index, or Facts on File in that year.

79As is often the case with SEM estimation, making the models converge takes some doing. I follow Kolenikov's (2009) three-step approach: first I estimate the "traits-only" model, save the estimates, and produce residuals of the eight indicators; second I use those residuals to estimate the "methods-only" model and save the estimates; third I combine both sets of estimates and use them as starting values for the full model (i.e., the model as depicted in Figure 1, with both traits and methods factors). In other words, I use estimates of restricted models as informative starting values when estimating the full model. However, even with informative starting values the models often do not converge (after hundreds of iterations). Thus some of the fit statistics in Figure 3 are based on models that did not converge. But all fit statistics are virtually identical to the fit statistics reported by Bollen and Paxton themselves.

Appendix C: HMT

With HMT we do not decompose A directly. First we use random sampling to find a matrix that captures most of the action in A. Then we use that matrix to build a compressed version of A and use QR factorization to decompose the compressed version into U, Σ, and V∗. The reason for this indirect approach is the sheer size of A, which makes its direct decomposition unwieldy. The details of HMT are beyond the scope of this paper, but for reproducibility purposes I will briefly describe each step.

We start by generating a Gaussian random matrix Ω with dimensions n × (k + p), where p is an integer parameter that affects the probability of finding a solution.80 Here I set p = 100, based on the results in Řehůřek (2010).81 Next we form Y = (AA∗)^q AΩ, where q is a power that affects accuracy.82 Here I set q = 2, again based on Řehůřek (2010). We now perform the QR factorization Y = QR to find Q, which captures most of the action in A.83 HMT do not propose a particular QR algorithm. The one I use is in Gu, Demmel, and Dhillon (1994); it is a "divide-and-conquer" approach suitable for large matrices (it splits the matrix recursively into smaller and smaller chunks; see article for details). We then form the matrix B = Q∗A, which is an upper bidiagonal matrix and a compressed version of A. Finally, we use QR factorization (again) to decompose B into ÛΣV∗ and we form the matrix U = QÛ. That is how we obtain U, Σ, and V∗.

80 The parameter p is the number of extra random samples (beyond k) we want to draw (thus p is called the oversampling parameter). It can be zero (in which case Ω will have dimensions n × k), but a positive p helps ensure that we capture the required amount of information from A to find a solution. See HMT for details. The higher the p the better, but there is a computational cost.
81 Řehůřek (2010) investigates the informational gain vs. computational cost of tweaking each HMT parameter.
82 See HMT for details. The higher the q the higher the accuracy but, as with p, there is a computational cost.
83 More formally, Q has orthonormal columns and A ≈ QQ∗A.
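To make these steps concrete, here is a minimal NumPy sketch of the procedure. The function and variable names are my own illustration: the small factorization of B uses a standard SVD routine rather than the Gu, Demmel, and Dhillon algorithm, A is assumed to be real (so A∗ = Aᵀ), and the re-orthonormalization that HMT recommend between power iterations is omitted for brevity.

import numpy as np

def randomized_svd(A, k, p=100, q=2, seed=0):
    # Approximate the top-k singular triplets of A following the HMT steps.
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    Omega = rng.standard_normal((n, k + p))   # Gaussian random matrix, n x (k + p)
    Y = A @ Omega
    for _ in range(q):                        # power iterations: Y = (A A')^q A Omega
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                    # Q captures most of the action in A
    B = Q.T @ A                               # compressed version of A
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat                             # map back to the original space
    return U[:, :k], s[:k], Vt[:k, :]

# Example call on a small placeholder matrix:
U, s, Vt = randomized_svd(np.random.default_rng(1).random((500, 300)), k=100)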

Appendix D: HBB

HBB is a variational Bayes (VB) algorithm. It factorizes the intractable function (over different sets of parameters) and then uses variational calculus to approximate the posterior distribution of the unobserved parameters. I choose HBB because, unlike other VB algorithms, it processes the data in chunks, which makes it suitable for handling large matrices. HBB loads one chunk of data, updates the model, loads another chunk, updates the model, and so on, until all chunks have been processed. For that reason HBB is called an online VB algorithm (online here has nothing to do with the internet or connectivity, but with the streamed way in which the data are processed). That is in contrast to batch VB algorithms, which load the whole dataset, update the model, load the whole dataset again, update the model again, and so on. Batch VB algorithms require much more computer memory (since the dataset is fully loaded) and take much longer to run (updating x times in one pass over the data is faster than updating once after each of x passes).84

The details of VB and of HBB in particular are beyond the scope of this paper, but for reproducibility purposes I note that: a) I use chunks of 20 documents each; b) I alternately initialize α to two different priors: a symmetric 1/k prior85 and a fixed normalized asymmetric prior86 (α controls the sparsity of the words×topics and topics×documents distributions); c) I cap the number of iterations at 1,000 (i.e., if convergence is not achieved in 1,000 iterations we move on to the next chunk anyway); and d) I set the convergence threshold at 0.001. HBB is much more intricate than HMT (the algorithm I use for LSA); describing each step would take pages, so instead I refer the reader to Hoffman, Blei, and Bach (2010).

84 Speed is important here because we are talking about days, not minutes or seconds.
85 E.g., if k = 3 then α = [1/3, 1/3, 1/3].
86 E.g., if k = 3 then α = [1/(0 + √3), 1/(1 + √3), 1/(2 + √3)].
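For concreteness, here is a minimal sketch of how settings (a)-(d) might be passed to gensim, Řehůřek's library, whose LdaModel implements Hoffman, Blei, and Bach's online VB. Assuming gensim here is my own illustration (this appendix does not name the implementation), and the toy documents are placeholders.

from gensim import corpora, models

# Placeholder tokenized documents standing in for the constitution texts.
documents = [["freedom", "press", "election"], ["party", "single", "rule"]]
dictionary = corpora.Dictionary(documents)
corpus = [dictionary.doc2bow(doc) for doc in documents]

lda = models.LdaModel(
    corpus,
    id2word=dictionary,
    num_topics=100,          # k; the paper tries 50, 100, 150, 200, and 300 topics
    chunksize=20,            # (a) chunks of 20 documents each
    alpha="symmetric",       # (b) symmetric 1/k prior; "asymmetric" for the other run
    iterations=1000,         # (c) cap on iterations per chunk
    gamma_threshold=0.001,   # (d) convergence threshold
)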

Appendix E: Multiple Decision Trees

Random forests

Random forests were first introduced in Breiman (2001). As the name suggests, random forests are an extension of decision trees. The idea is simple. We treat the reference set as a population, draw multiple bootstrap samples from it (with each sample having the same size as the population), and use each sample to grow a decision tree. To predict y for new observations we simply average out the predicted ys from the different trees.

The idea of random forests is to reduce noise. With a conventional decision tree small perturbations of the data can drastically impact the choice of j and s. By averaging out the predictions of multiple bootstrapped trees we reduce that noise. Each bootstrapped tree yields poor predictions - only slightly better than random guesses - but their average predictions often outperform those of a conventional tree.

There is no rigorous way to choose the number of bootstrapped trees, N. Here I set N = 10,000. (I tried N = 1,000 but the results were less stable: with the same data and parameters, two different sets of random forests with N = 1,000 each would produce somewhat different results. With N = 10,000 the results remain the same.)

The more the bootstrapped trees differ from each other, the more they reduce noise in the end. Thus it is common to grow the trees in a slightly different

way: instead of picking j from all k independent variables, we pick j from a random subset of the k variables, with size c ≤ k (common choices are c = k/3, c = √k, and c = log k). The smaller the c, the more different the trees will be, and the more they will reduce noise in the end. But if the subset is too small (say, 1) each tree will perform so poorly that their combined performance will also be poor. Choosing the subset size is a matter of trial and error. I try c = k/3, c = √k, and also simply c = k, and compare the results. I try l = 2 and l = 5 for each individual tree, just as before.
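For concreteness, here is a minimal sketch of this setup using scikit-learn's RandomForestRegressor. The library choice, the placeholder data, and the variable names are my own illustration, not the dissertation's actual pipeline.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder data standing in for the real inputs: rows are country-years,
# columns are the k topic proportions, y holds the reference democracy scores.
rng = np.random.default_rng(0)
X_reference, y_reference = rng.random((200, 100)), rng.random(200)
X_new = rng.random((10, 100))

forest = RandomForestRegressor(
    n_estimators=10000,   # N = 10,000 bootstrapped trees
    max_features="sqrt",  # c = sqrt(k); use 1/3 for c = k/3 or None for c = k
    min_samples_leaf=2,   # l = 2 (l = 5 is also tried)
    n_jobs=-1,
)
forest.fit(X_reference, y_reference)
y_new = forest.predict(X_new)  # average of the individual trees' predictions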

Extreme random forests

This is essentially the same as random forests, except that here we also randomize the choice of s. Instead of finding the best s for each j we draw a random s for each j and then pick the best (j, s) combination. This usually reduces noise a bit more (at the cost of degrading the performance of each individual tree). As with random forests, N = 10,000; c ∈ (k/3, √k, k); and l ∈ (2, 5).
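A corresponding sketch, again assuming scikit-learn, whose ExtraTreesRegressor randomizes the split points in the way just described; the placeholder inputs from the random forest sketch above are reused.

from sklearn.ensemble import ExtraTreesRegressor

# bootstrap=True mirrors the bootstrapped samples described in the text
# (the library's default is to grow each tree on the full sample).
extra = ExtraTreesRegressor(
    n_estimators=10000, max_features="sqrt", min_samples_leaf=2,
    bootstrap=True, n_jobs=-1,
).fit(X_reference, y_reference)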

AdaBoost

AdaBoost (short for adaptive boosting) was first proposed by Freund and Schapire (1997). There are several variations thereof and the one I use here is Drucker's (1997), popularly known as AdaBoost.R2, which I choose for being suitable for continuous outcomes (most AdaBoost variations are designed for categorical outcomes).

Say that we have n observations. We start by choosing the number of models - in our case, trees - we want to create.87 We grow the first tree, t = 1, using all observations88 and weighting each observation by w_i^t = 1/n, and compute each absolute error |y_i − ŷ_i|. We then find the largest error, D_t = max_{j=1,...,n} |y_j − ŷ_j|, and use it to compute the adjusted error of every observation, e_i^t = |y_i − ŷ_i|/D_t. Next we calculate the adjusted error of the entire tree, ε_t = Σ_{i=1}^{n} e_i^t w_i^t. If ε_t ≥ 0.5 we discard the tree and stop the process (if this happens with the very first tree that means AdaBoost failed). Otherwise we compute β_t = ε_t/(1 − ε_t), update the observation weights w_i^(t+1) = w_i^t β_t^(1−e_i^t)/Z_t (Z_t is a normalizing constant), and grow the next tree, t = 2, using the updated weights. We repeat the process until all desired trees are grown or until ε_t ≥ 0.5. To predict y for new observations we collect the individual predictions and take their weighted median, using ln(1/β_t) as weights.

Intuitively, after each tree we increase the weights of the observations with the largest errors, then use the updated weights to grow the next tree. The goal is to force the learning process to concentrate on the hardest cases. Just as in random forests, here too we end up with multiple trees and we make predictions by aggregating the predictions of all trees. Unlike in random forests, however, here the trees are not independent and we aggregate their predictions by taking a weighted median rather than a simple mean.

As with random forests, N = 10,000 (less when ε_t ≥ 0.5 for one or more of the trees); c ∈ (k/3, √k, k); and l ∈ (2, 5).

87 Here I apply AdaBoost to trees but AdaBoost can be applied to any type of model - including parametric ones like OLS, MLE, etc.
88 This is in contrast with random forests, where we use bootstrapped samples.
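A corresponding sketch, again assuming scikit-learn, whose AdaBoostRegressor with loss="linear" follows Drucker's AdaBoost.R2; the placeholder inputs from the random forest sketch above are reused.

from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# loss="linear" uses the adjusted error e_i = |y_i - yhat_i| / D_t described
# above; boosting stops early if a tree's adjusted error reaches 0.5.
base_tree = DecisionTreeRegressor(max_features="sqrt", min_samples_leaf=2)
boosted = AdaBoostRegressor(base_tree, n_estimators=10000, loss="linear")
boosted.fit(X_reference, y_reference)
y_new = boosted.predict(X_new)  # weighted median of the individual predictions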
