Terrorism and the Rise of Right-Wing Content in Israeli Books
Supplementary Appendix

Contents

A Automated text analysis: Extracting data on political ideology in popular books
  A.1 Textual sources
  A.2 Selection of books into the Google N-gram corpus
  A.3 Textual pre-processing of Hebrew words
  A.4 Generating a comprehensive dataset of political phrases in Israeli books
  A.5 Mapping phrases to the ideology of political parties
  A.6 Measuring phrases’ yearly frequency in Google N-gram
  A.7 Summary statistics
B Empirical strategy and modeling choices
C Robustness tests
D Additional estimations
E Israeli civics textbook comparison
F Information on book publishers in Israel

List of Figures

A1 OCR quality of Hebrew books with right and left-wing phrases
A2 Phrases most strongly linked to the ideology of political blocs
A3 Stemming Hebrew Text
A4 Residuals over time
A5 Year-by-year correlation of the residuals
A6 Placebo: Terrorism casualties from other countries
A7 Topic Prevalence During the Intifadas vs. Not

List of Tables

A1 List of political parties by bloc
A2 Summary Statistics
A3 Placebo: Frequency of phrases by bloc and the number of casualties from other conflicts
A4 Frequency of phrases by bloc after the peak of terrorism in the First Intifada
A5 Frequency of phrases by bloc and the number of casualties from Palestinian violence (cumulative)
A6 Frequency of phrases by bloc and the number of casualties from Palestinian violence (other estimations)
A7 Civics book comparison: Book structure
A8 Civics book comparison: The roots of the Israeli-Palestinian conflict
A9 The Twenty Major Book Publishing Companies in Israel
A Automated text analysis: Extracting data on political ideology in popular books

A primary contribution of this study is to apply innovative data collection methods, specifically automated text analysis, to quantitatively measure the popularity of right-wing ideology in books in the wake of terrorist violence. I make use of Hebrew-language textual sources to generate a time-series, phrase-level dataset on the readership of right, center, and left-wing political ideas in books published in Israel. The text extraction method involves several steps. First, I obtain three sources of text: Israeli party platforms, the full text of the Hebrew edition of Wikipedia, and the Google N-gram corpus in Hebrew. Second, I pre-process these text corpora using an original algorithm designed to stem political texts in Hebrew. Finally, I generate a comprehensive vocabulary of political phrases extracted from party platforms and Wikipedia, and measure these phrases’ yearly frequency in Israeli books.

A.1 Textual sources

The first source I use is Israeli political party platforms from 1981 to 2013. I use these platforms to generate a “core vocabulary” that represents political issues on the right-center-left ideological space. I downloaded all available historical election platforms from the website of The Israel Democracy Institute.[1] I converted the platforms to text files using optical character recognition software, and supplemented this with manual correction of errors. The platforms corpus consists of 51 platforms from right-wing, center, and left-wing parties.

The second source I use is the full text of the Hebrew edition of Wikipedia, which consists of 156,531 articles. I divide each Wikipedia article into paragraphs, and calculate the paragraph-level association between “core” phrases coming from party platforms and “associated” phrases appearing in Wikipedia. More details on this process are described below.
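As a rough illustration of what a paragraph-level association between “core” and “associated” phrases could look like, the following Python sketch counts how many paragraphs each core phrase shares with every other two-word phrase. It is a minimal sketch under stated assumptions, not the procedure used in this appendix: the association measure is assumed to be a raw co-occurrence count, paragraphs are assumed to be separated by blank lines, tokens are assumed to be whitespace-separated, and no Hebrew stemming is applied. The function names and the English toy text are hypothetical.

```python
from collections import Counter


def bigrams(tokens):
    """Return the consecutive two-word phrases in a list of tokens."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def split_paragraphs(article_text):
    """Split an article into non-empty paragraphs (assumes blank-line separators)."""
    return [p.strip() for p in article_text.split("\n\n") if p.strip()]


def cooccurrence_counts(articles, core_phrases):
    """For each core phrase, count the paragraphs it shares with each non-core bigram.

    Assumption: raw paragraph co-occurrence stands in for the association
    measure, which this sketch does not attempt to reproduce.
    """
    core = set(core_phrases)
    counts = {phrase: Counter() for phrase in core}
    for article in articles:
        for paragraph in split_paragraphs(article):
            present = set(bigrams(paragraph.split()))
            for phrase in present & core:
                # Each candidate phrase is counted once per shared paragraph.
                counts[phrase].update(present - core)
    return counts


# Toy usage with English placeholder text (hypothetical data).
toy_articles = ["human rights require just peace\n\nstar of david flag"]
toy_counts = cooccurrence_counts(toy_articles, ["human rights"])
print(toy_counts["human rights"].most_common(3))
```

Counting each phrase once per paragraph, rather than once per occurrence, keeps long paragraphs that repeat a phrase from dominating the counts; a normalized measure such as pointwise mutual information could be substituted in the same loop.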
The third textual source I use is the Google N-gram corpus in Hebrew, which includes a sample of all books published in Israel from 1980 to 2008. The full Google N-gram dataset was created from a corpus of digitized books in eight languages drawn from several dozen universities around the world and direct contributions by publishers. The 2012 version of the dataset consists of more than 8 million books, which is about 6% of books ever published. The textual information was collected by scanning books and digitizing their text with optical character recognition (OCR). Metadata includes information on the year and place of publication (Michel et al., 2011; Lin et al., 2012). In this paper, I use the “2-gram” (i.e., two-word phrases) version of the Google N-gram dataset in Hebrew. The 2012 version of the Hebrew dataset consists of more than 70,000 books and over 8 billion words (Lin et al., 2012).

[1] The platforms can be found at: http://en.idi.org.il/tools-and-data/israeli-elections-and-parties/

A.2 Selection of books into the Google N-gram corpus

One might wonder how books are selected into the Google N-gram corpus, and whether such selection might lead to bias. For example, if only books with a certain type of content enter the corpus, such as purely academic publications, that might influence the analysis and the interpretation of results. Similarly, if the selection of books varies over time, we might worry about bias in the time-series analysis in this paper. To investigate these possibilities, I first examined how Google Books obtained and scanned Hebrew-language books: is there a reason to worry about selection of specific content? Second, since the Google N-gram corpus consists of a subset of Google Books, I evaluated the extent to which the selection of books into the N-gram corpus might render the analysis biased.
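One simple way to keep raw 2-gram counts from being driven by changes in the size of the corpus from year to year is to tabulate counts per publication year and divide them by a yearly total. The Python sketch below illustrates this under explicit assumptions: input lines are assumed to follow the tab-separated layout of the 2012 Google Books N-gram files (ngram, year, match_count, volume_count), the yearly totals are assumed to be supplied separately as a plain mapping from year to total count, and the function names are hypothetical. It is an illustration only and does not reproduce the frequency measure used in this appendix.

```python
from collections import defaultdict


def yearly_counts(lines, phrases, first_year=1980, last_year=2008):
    """Sum match counts per phrase and year from tab-separated n-gram lines.

    Assumed line layout: ngram <TAB> year <TAB> match_count <TAB> volume_count.
    Lines with a different number of fields are skipped.
    """
    wanted = set(phrases)
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue
        ngram, year, match_count, _volume_count = fields
        year = int(year)
        if ngram in wanted and first_year <= year <= last_year:
            counts[ngram][year] += int(match_count)
    return counts


def relative_frequency(counts, totals):
    """Divide each yearly count by that year's total (hypothetical `totals` mapping).

    Years with a missing or zero total are left out rather than guessed.
    """
    return {
        phrase: {year: n / totals[year] for year, n in by_year.items() if totals.get(year)}
        for phrase, by_year in counts.items()
    }


# Toy usage with English placeholder phrases (hypothetical data).
toy_lines = [
    "human rights\t1985\t10\t3",
    "human rights\t1979\t5\t2",   # outside 1980-2008, ignored
    "other phrase\t1985\t7\t1",   # not a requested phrase, ignored
    "human rights\t1985\t2\t1",
]
toy = yearly_counts(toy_lines, ["human rights"])
print(relative_frequency(toy, {1985: 120}))
```

Normalizing by a yearly total addresses only variation in how much text is available per year; it does not address whether the kinds of books entering the corpus differ, which is the question taken up in the rest of this section.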
Selection of books into Google Books. Some have criticized Google Books in English for heavily representing academic works, as a result of partnering with major American university libraries (Pechenick, Danforth and Dodds, 2015). The critique centers on the fact that university library collections over-represent academic publications and under-represent popular works. In Israel, however, the process of obtaining books has been different. Unlike Google Books in English, the Hebrew version is based on books obtained through contracts with Israeli popular book publishers. In 2010, the company signed contracts with several major Israeli publishers, including Keter and Hakibbutz Hameuchad, as well as smaller publishing houses (Hirschauge, 2010; Lilian, 2010; Kabir, 2010). The inclusion of Hebrew-language popular books allows making inferences about the content of books that go beyond purely academic publications.

Selection of books into the Google N-gram corpus. Since Google N-gram is a subset of Google Books, it is important to understand how books are selected into the N-gram corpus. While the process of selection is not completely transparent, Michel et al. (2011, p. 176) indicate that books entered the corpus on the basis of “the quality of their OCR and metadata.” This is not gold-standard random sampling, but selecting books with good-quality OCR is likely orthogonal to the ideological content of books. In other words, it is hard to imagine that books with right-wing or left-wing content would have systematically different OCR quality. In addition, one might worry that OCR quality varies over time, as books published in earlier years might be scanned with poorer quality. However, since the analysis in this paper is based on books published relatively recently, between 1980 and 2008, the quality of OCR is unlikely to vary over these years.
I visually inspected the OCR quality of books with right and left-wing phrases, as well as books published in various years analyzed in this paper. As Figure A1 shows, books with right and left-wing content, published in earlier and later years, have similar levels of OCR quality.

Figure A1: OCR quality of Hebrew books with right and left-wing phrases
(A) “People of Israel” (published in 2004)
(B) “Star of David” (published in 2000)
(C) “Human Rights” (published in 2002)
(D) “Just Peace” (published in 2001)
(E) “Jewish-Arab” (published in 1986)
(F) “Synagogue” (published in 1985)
Note: The figure presents screenshots of Hebrew language books that had good quality OCR, included phrases linked to right and left-wing parties, and were published at different points in time. Sources of text: (A) Gamliel (2003, p. 27); (B) Owan and Bruner (2000, p. 13); (C) Golan Agnon (2002, p. 52); (D) Ophir (2001, p. 66); (E) Sela-Shefi and Geretz (1986, p. 42); (F) Shiloach (1985, p. 39).

Table A1: List of political parties by bloc
Election year | Parties in left-wing bloc | Parties in center bloc | Parties in right-wing bloc
1981 | Maarach, Hadash, Ratz* | Telem*, Shinui | Likud, Mafdal, Agudat Israel*, Hatchiya*, Tami
1984 | Maarach, Hadash, Ratz*, Yachad*, Progressive List for Peace* | Shinui | Likud, Mafdal, Agudat Israel*, Hatchiya-Tzomet, Shas, Morasha*, Tami*, Kach*, Ometz*
1988 | Maarach, Hadash, Ratz*, Mapam*, Progressive List for Peace*, Mada* | Shinui | Likud, Mafdal, Agudat Israel*, Hatchiya*, Tzomet, Shas, Moledet*, Degel Hatorah*
1992 | Labor, Hadash, Meretz, Mada* | – | Likud, Mafdal, Tzomet, Shas, Yahadut Hatorah*, Moledet*
1996 | Labor, Hadash-Balad, Meretz, Mada-Raam | Third Way* | Likud, Mafdal, Shas, Yisrael BaAliyah*, Yahadut Hatorah*, Moledet*
1999 | Israel Achat, Meretz, Raam, Hadash, Balad, Am Echad* | Shinui, Center Party* | Likud, Mafdal, Shas, Yahadut Hatorah*, Yisrael BaAliyah*, National Union, Yisrael Beiteinu
2003 | Labor, Meretz, Hadash-Taal, Am Echad*, Balad, Raam | Shinui | Likud, Mafdal, Shas, National Union, Yahadut Hatorah*, Yisrael BaAliyah*
2006 | Labor, Meretz-Yachad, Raam-Taal, Hadash, Balad | Kadima, Gil* | Likud, Shas, Yisrael Beiteinu, National Union-Mafdal, Yahadut Hatorah*
2009 | Labor, Meretz, Meimad*, Hadash, Balad*, Raam, Daam* | Kadima | Likud, haBait haYehudi, haIchud haLeumi, Tzomet*, Yahadut haTorah*, Shas, Israel Beteinu, Israel haMitchadeshet*
2013 | Labor, Meretz, Hadash, Balad* | Kadima, Hatnua*, Yesh-Atid* | Likud-Beitenu, haBait haYehudi, Shas*, Yahadut haTorah*
Note: * Indicates parties whose platforms were missing.