A Test of Media Capture by Using Machine Learning Techniques: Evidence from Italian Television News, 2010-2014

Andrea De Angelis∗ Alessandro Vecchiato†

September 9, 2016

VERY PRELIMINARY. PLEASE DO NOT CIRCULATE.

Abstract

A central question in political communication concerns the existence and extent of media bias and how it could affect political stability and electoral outcomes. We address this question using novel methodologies based on machine learning techniques and an original dataset collecting the entire corpus of national TV news outlets (including the Rai, Mediaset, and “LA7” TV networks) from 2010 to 2014. Textual models can perform linguistic and substantive analysis of large corpora of text by exploiting variation in language use across and within authors, documents and time. In this paper we first estimate ideology scores for each TV outlet and analyze their change over the period under study. Secondly, we exploit the discontinuity in public TV ownership in our data to determine the existence of media bias in the news outlets as a way to make public TV a more “favorable” environment for the incumbent right-wing government. Finally, we identify key news topics and track their salience over time and across networks. This methodology allows us to test whether political leaning in the news is due to strategic issue selection in news coverage or to differing frames in the communication of news stories.

∗ Andrea De Angelis, European University Institute, Dpt. of Political and Social Sciences, via dei Roccettini 9, I-50014 San Domenico di Fiesole (Italy); [email protected]
† New York University, Wilf Family Dept. of Politics, 19 W 4th Street, Room 302, 10012 New York, NY; alessan- [email protected]

1 Introduction

Several papers find a remarkable correlation between the structure of the media landscape and political outcomes in the US and around the world (see e.g. Djankov et al. (2003); McMillan and Zoido (2004); Reinikka and Svensson (2005); Gentzkow and Shapiro (2010); Durante and Knight (2012); DellaVigna et al. (2016)). To understand the complex interaction between media outlets and political actors, both economists and political scientists have examined the specific factors that shape incentives to manipulate the quality and content of the news that reaches voters: media owners can exploit political connections to reach funding resources outside the advertisement market, while politicians benefit from a favorable media arena that reduces voters' monitoring ability and therefore their own accountability¹ (Besley and Prat, 2006; Prat and Strömberg, 2013). One crucial problem is how the ownership structure affects news selection and framing to favor a specific political party (Strömberg, 2015). This paper follows the literature on media and explores the role of news and issue framing as an alternative measure of bias. Research in economics suggests that media outlets show a strong liberal leaning. Groseclose and Milyo (2005) report that in the universe of US news media, only Fox News and the Washington Times received scores to the right of the center. Bias can be driven by owners' preferences or by audience demand². Media outlets can deliberately modify their language by including political slant in order to attract readers with similar political views (Gentzkow and Shapiro, 2010). Alternatively, they may follow the ideological sensibility of their ownership or of a political party they are trying to support. In this last case, bias emerges as a result of political capture of the media. We focus on this case and provide evidence of media capture by exploiting quasi-exogenous variation in media ownership. To investigate this relationship we face several challenges.
First, media ideological position is very difficult to estimate. A number of papers have resorted to various techniques that detect ideological correlates in newspaper language usage. Groseclose and Milyo (2005) consider a group of 200 prominent think tanks or policy groups and count the times a particular member of Congress cited one of them. They perform the same procedure for a number of newspapers and other media outlets and assign an ideological score to each of them on the basis of the frequency with which each think tank was mentioned. This procedure allows them to link the ideological bias of media outlets to that of other political actors and so derive an ideological measure of the media. A more recent paper by Gentzkow and Shapiro (2010) focuses in a similar fashion on media slant. To assign a particular ideological leaning to specific language they refer to the Congressional Record and identify those sets of phrases that are used much more frequently by one party than by the other. Secondly, they calculate the number of times each outlet resorts to particular language that may sway voters to the left or the right of the political spectrum and assign to each of them the corresponding ideological score. However, while these measures may be able to capture newspapers' ideological leaning, they focus on one specific way in which a media outlet may try to influence reporting. We overcome this difficulty by adopting a new unsupervised machine learning technique first introduced by Slapin and Proksch (2008). WORDFISH is a scaling algorithm that estimates policy positions based on word frequencies in texts. Following the naïve Bayes assumption

1 For a comprehensive review of the results on media and politics, see Strömberg (2015).
2 See below for more extensive results on theoretical models of bias.

prevalent in the text analysis literature (Eyheramendy et al., 2003), this algorithm represents a text as a vector of word counts. Individual words are assumed to be distributed at random, and word frequencies to be generated by a Poisson process. The procedure treats each piece of text as the expression of a separate ideological position and, based on word frequencies, estimates the relative weight of each word in discriminating among ideological positions, together with the ideology score of each document. All parameters are estimated simultaneously for the entire corpus of text. The model can be expressed as follows:

$$y_{ijt} \sim \mathrm{Poisson}(\lambda_{ijt}) \tag{1}$$

$$\lambda_{ijt} = \exp(\alpha_{it} + \psi_j + \beta_j \cdot \omega_{it}) \tag{2}$$

where $i$ indexes documents, $j$ the tokens (i.e. word stems when only unigrams are used, or n-grams), and $t$ time. The single Poisson parameter $\lambda_{ijt}$ is modelled as a function of four latent components: $\alpha_{it}$ are document-specific fixed effects, $\psi_j$ are word-specific fixed effects (capturing the relative frequency of the words), $\beta_j$ are word-specific discrimination weights, capturing the ability of words to discriminate between ideological positions, and finally $\omega_{it}$ is the latent position of the document. This process allows us not only to estimate relative scores across political actors based on relative word frequencies but also, by assuming independence across texts, to track them over time. The omega scores thus represent measures of ideology within the spectrum of the available texts and are allowed to change over time. A second challenge comes from the intrinsically unobservable nature of media capture practices. Politicians may influence the media through bribes, legal favoritism or by appointing sympathetic managers. McMillan and Zoido (2004), for instance, use a secret police account of government bribes to investigate corruption in Peru. They find that the media, and especially TV channels, were receiving the largest shares of bribes during the Fujimori regime. Tella and Franceschelli (2011) use government advertisement practices as a proxy for favoritism. They find that newspapers with government advertising are less likely to talk about government corruption, with a one standard deviation increase in government advertising being associated with a decrease in coverage of corruption scandals of 0.23 of a front page per month. Nevertheless, these measures do not allow for a systematic study of capture, given their rarity or their underestimation of the extent of corruption (advertisement may be only one of the strategies the government uses to favor specific actors). Our setup provides a number of solutions to these challenges.
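To make the WORDFISH specification in equations (1) and (2) concrete, the following is a minimal Python sketch of the Poisson rate and of the log-likelihood that an estimation routine would maximize. It is an illustration under simplifying assumptions, not the estimation code used in this paper: the time index is folded into the document index (each document stands for one outlet-period), the function names are hypothetical, and the parameter values in the usage lines are arbitrary toy numbers rather than estimates.

```python
import math


def poisson_rate(alpha_i, psi_j, beta_j, omega_i):
    """Expected count of token j in document i, as in equation (2)."""
    return math.exp(alpha_i + psi_j + beta_j * omega_i)


def log_likelihood(counts, alpha, psi, beta, omega):
    """Poisson log-likelihood of a document-by-token count matrix.

    counts[i][j] is the observed count of token j in document i;
    alpha and omega have one entry per document, psi and beta one per token.
    """
    total = 0.0
    for i, row in enumerate(counts):
        for j, y in enumerate(row):
            lam = poisson_rate(alpha[i], psi[j], beta[j], omega[i])
            # log of the Poisson pmf: y*log(lam) - lam - log(y!)
            total += y * math.log(lam) - lam - math.lgamma(y + 1)
    return total


# Toy usage (arbitrary illustrative values): two documents, two tokens.
counts = [[3, 0], [1, 2]]
ll = log_likelihood(counts,
                    alpha=[0.0, 0.0], psi=[0.0, 0.0],
                    beta=[1.0, -1.0], omega=[0.5, -0.5])
print(ll)
```

A token with a positive discrimination weight is expected more often in documents with a higher latent position, which is what lets the relative word frequencies identify the omega scores.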
The Italian media landscape has been widely criticized for its general lack of impartiality, to the point of blatant political bias. The public ownership of three media channels, Rai 1, Rai 2 and Rai 3, gives the government the prerogative of nominating their CEOs and newscast editors. Grasso (2004) describes in detail the historical process that led to the politicization of Italian State television. In 1975, the government reformed³ the national television system, shifting the prerogative of nominating the TV managers from the government to the parliament. This led to a practice called ‘lottizzazione’ (lotting) that resulted in the consistent nomination as directors and editors of figures who were politically affiliated with specific national parties. Thus, each channel had a well-defined (though not formally stated) political affiliation: Rai1

3 Law no. 103 of 14 April 1975 (Legge n. 103 del 14 aprile 1975) on national television broadcasting.

Figure 1: Patrick Chiappori on Berlusconi Resignation. Source: New York Times

typically supported the incumbent government (Democrazia Cristiana), Rai2 supported the Socialist Party, and Rai3 the Communist Party (or the far left). This practice evolved after 1993 toward a stronger focus on television newscasts. After each election, with a new parliament and government in place, the editors of the national channels (particularly Rai1 and Rai2) were typically replaced with journalists or political figures closer to the new establishment. This problem was exacerbated by the candidacy of Silvio Berlusconi, a media and real estate tycoon, who entered the political arena in the aftermath of the Mani Pulite scandal in 1993. When in power, Silvio Berlusconi never completely divested his ownership of three TV channels, Canale 5, Italia 1 and Rete 4, officially controlling either directly or indirectly 86% of the media market in Italy. DellaVigna et al. (2016) study the Italian case from the advertisement market perspective. Given that in Italy government officials are not required to divest business holdings, they are able to test whether during the years of Berlusconi's governments there was strategic advertising on his networks as a form of political support. They find that during Berlusconi's political tenure the profits of Mediaset (his TV company) increased by one billion euros. Our setup exploits a similar empirical strategy to identify government influence on television. In the aftermath of the financial crisis of 2008, Berlusconi's government was facing intense pressure from the international markets to reduce the increasing national debt. His failure to handle the situation and reduce the national bond spread over the Euro zone led to his resignation on November 12, 2011. We exploit this quasi-exogenous variation in government power due to his rapid decline to detect changes in media ideological leaning as a consequence of capture.
By comparing TV news ideological scores before and after his resignation, and each time a network director was replaced, we are able to provide significant evidence of capture in the Italian media.

Another advantage of our setup comes from the use of a novel dataset of TV news transcriptions collected by the Italian service Teche Rai. The data cover the universe of TV news in Italy in the period 2010-2014. That is, we are able to derive an ideological score for each national TV network news service⁴ that aired each day in the specified time frame. The extent of these data, matched with our methodology, gives us numerous advantages with respect to previous research. First, we do not have to resort to methods that hand-code network ideology and are thus sensitive to researcher discretion. Secondly, we do not have to select specific significant words as representative of ideological leaning from which to estimate media ideology. Instead, we simultaneously estimate words' ideological scores and are able to select ex post which are the most ideologically charged expressions. Finally, we are able to track ideological movements over time and with the highest degree of granularity⁵. This paper provides a number of new results. We find that Italian newscasts have significantly different political leanings that map broadly onto the Italian political spectrum, with Berlusconi's TV networks placed toward the right and historically leftist channels toward the left. This evidence is consistent with previous results by Gentzkow and Shapiro (2010) and Groseclose and Milyo (2005), who report significant political leaning for American newspapers. Differently from previous research, our work shows that this differentiation is not due to viewers' demand but to political capture. We support this conclusion with two different empirical strategies. First, we provide evidence of strategic reporting by the TV networks. More pro-government networks tend to substitute economic issues that were particularly problematic during the financial crisis with crime and entertainment reporting.
We thus do not find differences in issue framing across networks, but only in which issues receive more emphasis. Secondly, we identify specific structural breaks in news reporting and match them with the political agenda. We find that, consistently with the change in government in 2011, the average score of the TV newscasts shifted significantly to the left of the spectrum. We take these results as robust evidence of capture in the Italian television system. This paper is structured as follows. Section 2 describes the data and the methodology in detail. Section 3 provides the baseline results on ideology for both media and MPs. Section 4 shows the ideological change around the temporal discontinuity in 2011 and runs a battery of tests to confirm the graphical result. Section 5 concludes.

2 Data

2.1 Teche Rai News Database

The main source of data for this study was obtained from the Teche Rai service, the historical archive of Italian radio and television broadcasts maintained by the State broadcasting company RAI. These data were developed from the audio-video files with an Automatic Newscast Transcript System (ANTS) platform, specifically targeted at news programs. The resulting transcription quality is about 90% correct recognition. Also, since the text is synchronized with the multimedia signal, given a word the researcher can have immediate access

4 Italy has seven main national TV channels, three publicly owned (Rai1, Rai2, and Rai3) and four private (Canale5, Rete4, Italia1 and La7). The media landscape has not changed since then.
5 As we explain later, we proceed by grouping TV transcriptions by week.

to the segment where it is pronounced. In addition, a validator performs a segmentation of the signal based on the speech footprint of the speaker. The resulting transcripts are good enough to be used for text-based search and information discovery by, for instance, full-text search engines and artificial intelligence techniques. The project has now completed the transcription of the corpus of newscasts from 2010 to 2014. These data represent a unique, verbatim source for text analysis, with the most detailed level of granularity. In its original form, a text document is a segment on a particular topic within a newscast. Each of the seven Italian broadcasting networks typically runs three daily newscasts of about 30 minutes (morning, noon and evening service). The total amount of data collected in this database therefore amounts to almost 20,000 hours of broadcasting divided into 319,895 segments. The dataset presents a number of convenient features. In addition to the transcription of the newscast services, the dataset contains metadata that organize the transcripts and ease analysis, such as network, starting and ending time of each segment, and subject⁶. In order to make sure that our ideological scores come exclusively from journalistic reporting, we delete from the dataset all transcriptions referring to speech by politicians. That is, all politicians' interviews, remarks and otherwise recorded speech are excluded from our analysis. We keep instead any journalistic commentary on those parts. Overall, all the text we analyze consists of chronicles and opinions from journalists and reporters. In this sense, our study assumes a hypothetical perfect pluralism as regards the presence of politicians on TV, which is the main criterion adopted by the Italian Communication Authority (AGCOM)⁷.
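As a back-of-the-envelope check on the size of the archive described above, the figure of almost 20,000 broadcast hours can be reproduced from the numbers given in the text. The short Python sketch below assumes full calendar years 2010-2014 and one complete set of newscasts on every calendar day; both are simplifying assumptions made only for this illustration.

```python
from datetime import date

NETWORKS = 7              # national networks listed in the text
NEWSCASTS_PER_DAY = 3     # morning, noon and evening editions
HOURS_PER_NEWSCAST = 0.5  # "about 30 minutes"
SEGMENTS = 319_895        # segments reported in the text

# Assumption: full calendar years 2010-2014, newscasts aired every day.
days = (date(2015, 1, 1) - date(2010, 1, 1)).days
total_hours = NETWORKS * NEWSCASTS_PER_DAY * HOURS_PER_NEWSCAST * days
avg_segment_minutes = total_hours * 60 / SEGMENTS

print(days)         # 1826
print(total_hours)  # 19173.0
print(round(avg_segment_minutes, 1))
```

Under these assumptions the total is consistent with the order of magnitude reported above, and it implies an average segment length of a few minutes.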

2.2 Legislative Speeches

We preliminarily test our methodology with an application to unambiguously ideological actors: the Italian MPs. For the analysis of the legislative speeches we exploit an original dataset developed directly by the authors. We scraped from the official website of the Italian Chamber of Deputies⁸ the entire corpus of the debates of the XVI legislature (corresponding to the years 2008-2013) recorded in the Italian lower chamber (Camera dei Deputati). Yet, we limit our focus to the time frame in which the Berlusconi cabinet was in office (2008-2011). In fact, the end of the Berlusconi government led to the formation (on 16 November 2011) of the Monti cabinet. The latter period was characterized by highly exceptional political circumstances. The technocratic government's political backing changed twice during its term⁹. This implies that whenever MPs addressed the government in their speeches, the statistical algorithm could not possibly distinguish between earlier references to the Berlusconi government and later ones to the Monti government. Switching opposition and majority this

6 By subject we refer to the general journalistic categorization into politics, chronicles, economy, arts and sports.
7 More detailed information regarding the monitoring of political pluralism, the aims of the AGCOM authority, and Law n. 249/1997 (the “par condicio” law) regulating pluralism in the media is available (in Italian) at the following AGCOM link: https://www.agcom.it/ldisciplina-della-par-condicio-
8 The website is at the following link: http://leg16.camera.it/207. The web scraping was performed with the Python package Beautiful Soup.
9 Monti was initially supported by both of the two main Italian parties (the Democratic Party and Berlusconi's People's Freedom Party), which means that two previously opposing forces started a radically new political phase in which they supported the same government.

frequently would have significantly increased the noise in our estimates and compromised the validation strategy we are using. Our corpus includes all the official sessions of the legislature, including all the secondary discussions that concerned legislative activities (decree presentations, discussion of amendments, final debates and voting statements). We excluded the work of the various committees, Q&A sessions with members of the government, as well as parliamentary investigations. Overall, we count the transcripts of 739 parliamentary sessions, corresponding to 18,356 single interventions. All the interventions of the President of the Chamber (On. Gianfranco Fini) were removed from the corpus. For each transcript, we harvested information regarding the presenter's name, the date and the session of the intervention, and the party affiliation of the presenter. In this way we were able to group the legislative speeches by MP, leading to a final dataset of 518 documents (not all the MPs made at least one intervention; see Appendix A for a summary list of MPs by number of interventions).
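The grouping of single interventions into one document per MP described above can be sketched in a few lines of Python. This is an illustrative sketch, not our scraping pipeline: the record fields (`speaker`, `party`, `text`) and the function name are hypothetical, and the toy records are placeholders rather than real transcripts.

```python
from collections import defaultdict


def group_by_mp(interventions, excluded_speakers=("Gianfranco Fini",)):
    """Concatenate all interventions of each MP into a single document.

    `interventions` is a list of dicts with the hypothetical keys
    'speaker', 'party' and 'text'. Speakers listed in `excluded_speakers`
    (here the President of the Chamber) are dropped from the corpus.
    """
    texts_by_mp = defaultdict(list)
    for record in interventions:
        if record["speaker"] in excluded_speakers:
            continue
        texts_by_mp[(record["speaker"], record["party"])].append(record["text"])
    return {key: " ".join(texts) for key, texts in texts_by_mp.items()}


# Toy usage with placeholder records (not real data).
toy = [
    {"speaker": "MP A", "party": "PdL", "text": "primo intervento"},
    {"speaker": "Gianfranco Fini", "party": "FLI", "text": "la seduta è aperta"},
    {"speaker": "MP A", "party": "PdL", "text": "secondo intervento"},
    {"speaker": "MP B", "party": "PD", "text": "terzo intervento"},
]
documents = group_by_mp(toy)
print(len(documents))  # 2
```

Keying the documents on the (speaker, party) pair keeps the party affiliation attached to each MP-level document, which is what the group-level comparison of scores requires.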

3 Analysis and preliminary results

This section introduces the main findings of our analysis. It is organized in three subsections. First, we carry out a preliminary test of ideological scaling of non-policy text. This is motivated by the fact that the WORDFISH algorithm is typically applied to manifesto documents containing explicit and (questionably) exhaustive, or at least systematic, text regarding the main policy positions of political parties and movements. Our application deviates from these applications, as our text corpus does not contain policy positions. Subsection 3.1 shows that it is indeed possible to detect latent political positions even if the political text does not directly involve policy positions. Next, in subsection 3.2 we apply the text scaling algorithm to the corpus of TV news transcripts and report details on the estimation process as well as on the main results. We anticipate that the estimates reveal systematic differences between the main Italian TV channels that are compatible with our prior knowledge of the ideological leanings in the Italian media system. Finally, subsection 3.3 investigates the two potential mechanisms of transmission of the political signal: strategic issue selection works through systematic differences in the amount of time devoted to distinct issues across TV outlets; issue framing operates instead through the usage of systematically different words to present the same issue. Our findings seem compatible with the issue selection mechanism, while no supportive evidence is found for the framing mechanism.

3.1 A preliminary test of text scaling using parliamentary debates, Italy 2008-2011

We present a preliminary analysis of the Italian Chamber of Deputies' parliamentary debates. This scaling exercise has two purposes. First, it lets the reader become familiar with WORDFISH and the process of estimating latent scores. Second, it shows that it is possible to extract face-valid latent positions from non-policy text. The “parliamentary-text test” is a particularly harsh one for text scaling. In fact, parliamentary debates typically involve lengthy procedural discussions; debates involving references to legislative measures rather than explicitly to political issues and positions; the

Figure 2: Box plot of political groups based on scaling of MP interventions, Italy 2008-2011

language that is spoken in parliament does not privilege clarity and simplicity; rather, it inevitably involves articulated considerations, references to previous debates or events, ambiguous statements, and a generally allegorical and sophisticated style of communication¹⁰. Thus, our expectation is that if we observe systematic differences among parliamentary groups based on the language that individual MPs used during the debates, then it will be possible to extract political positions from other sources of non-policy text as well. We pretreat the corpus of legislative speeches with a procedure that will be detailed in subsection 3.2. This pretreatment involves the removal of all punctuation characters and numbers, the reduction of the words to stemmed tokens¹¹, the computation of a list of unigrams (e.g. ‘govern’, ‘berluscon’) and bigrams (e.g. ‘govern_berluscon’), and the removal of Italian stopwords¹². When all these text preparation steps are completed, we convert the documents of MPs' speeches into a document-feature matrix with all the identified tokens in rows and the documents in columns. The entries of the document-feature matrix are thus word frequencies (i.e. absolute word counts for each document). The results of our scaling exercise are presented by political group in the lower chamber. The box plot in Figure 2 represents the document-specific (i.e. MP-specific) omega scores that were computed with WORDFISH on the corpus of interventions at the Chamber of Deputies.
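The construction of the document-feature matrix from unigrams and bigrams described above can be illustrated with a short Python sketch. It is a simplified stand-in for the R tooling we actually use: the function names are hypothetical, the input is assumed to be already cleaned and stemmed, and the two toy documents only reuse the example stems quoted in the text.

```python
from collections import Counter


def ngram_tokens(stems):
    """Return the unigrams of a document plus its underscore-joined bigrams."""
    bigrams = [f"{a}_{b}" for a, b in zip(stems, stems[1:])]
    return list(stems) + bigrams


def document_feature_matrix(docs):
    """Build a matrix with tokens in rows and documents in columns.

    `docs` maps a document name to its list of stemmed tokens.
    Entries are absolute counts of each token in each document.
    """
    counts = {name: Counter(ngram_tokens(stems)) for name, stems in docs.items()}
    vocab = sorted(set().union(*counts.values()))
    doc_names = list(docs)
    matrix = [[counts[name][token] for name in doc_names] for token in vocab]
    return vocab, doc_names, matrix


# Toy usage: two already-stemmed placeholder documents.
vocab, doc_names, matrix = document_feature_matrix({
    "mp_1": ["govern", "berluscon"],
    "mp_2": ["govern", "govern"],
})
print(vocab)   # ['berluscon', 'govern', 'govern_berluscon', 'govern_govern']
print(matrix)  # [[1, 0], [1, 2], [1, 0], [0, 1]]
```

Each row of the resulting matrix is the count vector of one token across documents, which is the input the Poisson scaling model works on.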

Results range from an average estimated position of $\omega_{IdV} = -0.782$ for the group of the

10 The language employed by Italian politicians is known in press jargon as “politichese”.
11 Stemming is the process through which every word in the text (e.g. conjugated verbs, plural nouns, derived forms) is reduced to its root form. A related process is lemmatization, which reduces words to their morphological root. For instance, stemming the vector of Italian words {‘governo’, ‘governare’, ‘governativo’} we obtain the stemmed vector {‘govern’, ‘govern’, ‘govern’}.
12 Stopwords are those words that serve to build the syntactic structure of the text. They are omitted because they are unrelated to the content.

Italy of Values (IdV), to $\omega_{LN} = 0.324$ estimated for the group of the Northern League (LN). The value for the People's Freedom Party is very close to that of LN, at $\omega_{PdL} = 0.318$. This makes sense considering that the two parties were government partners. All the opposition political groups display negative average omega values, as could reasonably be expected. Also, the ranking of the opposition groups follows our prior expectation. The most left-wing political group appears to be the Italy of Values (with the value of $-0.782$ recalled above). The IdV used to be a centrist, anti-corruption movement. However, after the 2009 European Parliament elections, the party took a populist turn and strengthened its relationships with the parties of the radical left¹³. The party also led to the election of a left-wing mayor in Naples (Luigi De Magistris) in alliance with the Federation of the Left, and against the Democratic Party. Considering the historic absence of a Communist or Socialist political group in the XVI legislature, we can reasonably think of the IdV as the most left-wing political group, or at least the one most clearly opposed to Berlusconi and his government. Between the two poles of the IdV and the groups supporting the government, the algorithm ranks respectively Future and Freedom ($\omega_{FLI} = -0.259$), the Democratic Party ($\omega_{PD} = -0.233$), the mixed group of non-inscrits ($\omega_{Misto} = -0.230$), and the centrist Union of Center ($\omega_{UdC} = -0.085$). Future and Freedom was a liberal-conservative group and a negative score may seem unreasonable. Yet, we believe that a negative score is indeed a meaningful one. First, the high dispersion of MPs' scores seems to suggest substantial uncertainty around the position of FLI. Second, this is the party of President Gianfranco Fini's followers. FLI was created after the split from the People's Freedom Party due to its recurring critiques of the Berlusconi government. Famously, Fini defended more progressive stances on social issues such as immigration (defending the right to vote for resident immigrants in local elections¹⁴), demanding a stronger role for his faction within the PdL. It is thus likely that the latent scores computed from parliamentary debates better capture the degree of opposition towards the government than an overall ideology score. We read the results of this preliminary test as evidence that WORDFISH is able to estimate meaningful latent scores from non-programmatic political texts such as the transcripts of parliamentary debates. In subsection 3.2 we apply the scaling algorithm to our corpus of TV news transcriptions.

3.2 Scaling the fourth estate: extracting latent positions from TV news, Italy 2010-2014

The Italian TV news corpus consists of a total of 319,895 recorded news stories¹⁵ broadcast by the main Italian TV news programs: TG1, TG2, TG3, TG4, TG5, Studio Aperto and TG7. In this subsection we offer a detailed account of the data preparation and estimation process and discuss the main findings. In subsection 3.3 we investigate the mechanism behind TV programs' political leaning.

13 The more centrist faction of IdV split from the party in November 2009.
14 “Fini: ‘Sì al voto agli immigrati’”, Il Sole 24 Ore, 04 September 2008.
15 The initial total number of news stories is 412,039, but we remove a number of news stories for which the date is not available.

Data preparation

The analytical task requires a number of preliminary steps. First, we subset the news stories to include only politically relevant topics. To this end we exclude from the corpus the following news topics: sport (we thus also exclude football news¹⁶), music, shows, culture and science news, weather and natural disasters, and reported cases of death or illness of important people. This leaves us with 276,266 valid news transcripts to consider in the analysis. Secondly, we undertake a number of operations that are required for the estimation. In fact, text analysis with WORDFISH, as with other “bag of words” approaches, involves converting the corpus of raw documents into a document-feature matrix of discrete occurrences of all the tokens in all the documents. We take extreme care in the creation of the document-feature matrix, because it represents the core of the estimation process. For all these pre-estimation operations we rely on the tm R package¹⁷. The sequence of steps that led to the creation of the document-feature matrix is the following:
1. We merge all the news stories' text by week for each TV channel. This effectively shifts the unit of analysis from the single news story to the [TV channel × week] level. An alternative aggregation scheme could have been the single TV news edition (three per day) or the single day (merging the three editions). Yet the weekly solution is, in our opinion, more efficient in that it does not imply a dramatic loss of information, while it substantially shrinks the number of parameters to be estimated. This, in turn, considerably reduces the estimation time.
2. The Italian language makes heavy use of apostrophes, so conventional punctuation removal tools¹⁸ would strip the apostrophe and fuse the elided word with the word that follows it¹⁹. We thus preliminarily apply a regular expression that substitutes all punctuation characters not with an empty string but with a single space character.
3. We strip all extra spaces from the text documents.
4. We transform all text into lower case characters.
5. We remove stopwords from the text²⁰.

16 Although one may argue that the coverage of Berlusconi's football team, A.C. Milan, may not be considered politically irrelevant.
17 The quanteda package would have been a valid alternative. We opt for tm because it allows tokenization to follow stopword removal, which leads to faster preprocessing, while quanteda requires tokenization before the text can be preprocessed.
18 Such as the removePunctuation function from the tm R package that we use.
19 This works on English text corpora because apostrophes indicate possession and thus always follow the word, as in “the professor's hat”. Punctuation removal would result in “professors hat”, and once the words are stemmed the additional “s” character would be removed as well. In the Italian language apostrophes are very often associated with the elision of the article when the following word starts with a vowel. Thus, punctuation removal applied to “l'amica” [the female friend] and “un'amica” [a female friend] would wrongly result in two different tokens, “lamica” and “unamica”, and this would be unaffected by the subsequent stemming process.
20 We have created a custom Italian stopword list composed of the union of: 1) the snowball stopword list that is typically included in the most common R packages (as in tm and quanteda); 2) all the words included in the Ranks NL stopword list; 3) the following list of 34 additional stopwords chosen after visual inspection: {l, poi, far, quest, qual, tant, quel, dic, so, quell, avev, piu, fa, vorrebb, gia, puo, s, sar, d, nun, ce, n, foss, x, b, va, ogni, vuol, andar, propr, fatt, vann, www, fonte}.

6. We remove numbers from the text.
7. We stem the documents using Porter's stemming algorithm.
8. We create the document-feature matrix including all the unigrams and bigrams in the text corpus.
9. We remove the top 5% and the bottom 5% of tokens by frequency²¹. This shrinks the number of tokens from 29,040 to 26,172.
The final document-feature matrix thus has dimensions [26,172 × 881], with 26,172 distinct tokens (unigrams and bigrams) and 881 documents (one for each TV channel per week), and its entries are the absolute frequencies of the tokens' occurrences. The TV news covered ranges from the week starting on 26 July 2010 to the week starting on 30 September 2014.
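The text-cleaning steps in the list above (punctuation replaced by a space, whitespace stripping, lower-casing, stopword and number removal) can be sketched in Python as follows. This is a hedged illustration, not the tm-based R code we run: the function name is hypothetical, the steps are applied in a slightly different order that yields the same tokens for this example, the three-word stopword set is a tiny stand-in for the full custom list, and stemming is omitted because it would require an external library.

```python
import re
import string

# ASCII punctuation plus typographic quotes and apostrophes.
PUNCTUATION = re.compile("[" + re.escape(string.punctuation) + "’‘“”]")


def clean(text, stopwords):
    """Return the cleaned tokens of `text`, before stemming."""
    text = PUNCTUATION.sub(" ", text)   # punctuation -> single space, not ""
    text = text.lower()                 # lower-case everything
    text = re.sub(r"\d+", " ", text)    # drop numbers
    tokens = text.split()               # split() also strips extra spaces
    return [tok for tok in tokens if tok not in stopwords]


# Illustrative stand-in for the custom Italian stopword list.
toy_stopwords = {"l", "un", "e"}

tokens = clean("L'amica e un'amica, 2011", toy_stopwords)
print(tokens)  # ['amica', 'amica']

# Naive removal (apostrophe -> empty string) fuses article and noun instead.
naive = "l'amica un'amica".translate(str.maketrans("", "", string.punctuation))
print(naive)   # lamica unamica
```

Replacing the apostrophe with a space turns the elided article into a separate token that the stopword filter can remove, so both forms collapse to the same token, whereas the naive removal produces two distinct spurious tokens.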

Results

We run WORDFISH’s iterative EM algorithm22 until it reaches a tolerance threshold of 1e-7. The estimation results are reported in Figure 3. Every dot represents a weekly TV program’s omega score. The horizontal axis shows the date of the specific transcript, and the vertical axis shows the estimated omega score for each [TV program × week] document. We apply a LOESS smoother together with a 99% confidence band to emphasize the trend of the estimates for each TV program.

The plot points to three important results. First, we can identify differing central tendencies and trends in the series of omega scores, given that the confidence bands mostly do not overlap. This means that the scores indeed signal the presence of systematic differences in word usage.

Secondly, we observe a quite consistent and face-valid ranking of the omega scores, which ranges from Studio Aperto and Rete 4 on one pole to TG3 and TG7 on the opposite side of the identified latent space. This links the previous point to the political world. Even if the latent scores based on the reported news are not direct estimates of the ideological slant of TV channels (because our corpus does not include programmatic policy statements), the fact that the three Mediaset (i.e. Berlusconi-owned) news programs appear on the positive-omega pole, while the independent TG7 and the historically left-leaning TG3 appear on the negative-omega side, provides a strong signal that our TV omega scores are highly correlated with the underlying political slant. Indeed, the Italian Authority for Communications (AGCOM), using very different criteria23, warned TG4, Studio Aperto, and TG1 in October 2010 for excessive political imbalance and disproportionate visibility of right-wing political leaders24. While we will deal with TG4 and

21 This is a standard procedure in quantitative text analysis that aims at cutting non-informative long tails in the distribution of words.
22 To scale up WORDFISH to estimate latent positions in a large data setting we run the model on the EUI HPC cluster. We run the model using the wordfish function in R. quanteda’s textmodel function could have been a viable alternative. Total estimation time is 3.93 hours.
23 Their judgement was based only on the direct presence of politicians on the various news programs. The reader should thus notice that, since all politicians’ interviews have been removed, our corpus would represent a case of ‘perfect balance’ in the information under the standards of AGCOM.
24 ‘Telegiornali e pluralismo, ecco i dati’, Corriere della Sera, 21 October 2010 [link here]; ‘Dall’AGCOM arriva diffida al TG1: “Forte squilibrio a favore del governo”. Richiamo al TG4 e a Studio Aperto’, Corriere della Sera, 21 October 2010 [link here].

Figure 3: Wordfish scores of Italian TV news programs, 2010-2014

Studio Aperto thoroughly in the following sections, the case of TG1 will be analyzed in Subsection 3.4.

Finally, we notice a trend of growing differentiation over the years. We argue that this growing differentiation can be traced to the fact that “hard news” programs devoted more coverage to the economic crisis, as will be shown in Subsection 3.3.

The TV-specific trends in the omega scores seem to indicate the presence of a differentiated media content supply in Italy. TV news outlets in Italy appear to use systematically different words, which leads us to think that TV channels in fact discuss different topics, or different aspects of the same topics. We provide more detail on these two potential explanations of the divergence in WORDFISH scores in the next Subsection (3.3).

To better understand the content of the longitudinal shifts that we observe for all the news programs, it is useful to inspect the word-specific parameters (i.e. the βs) associated with the two poles of the latent space identified in the estimation. Table 1 lists the 25 words most tightly linked to the pole of positive latent scores and the 25 words most tightly linked to the opposite pole of negative omega values. The full distribution of words can be inspected visually in Appendix B.

We notice that words associated with positive and negative beta scores refer to very different kinds of news content. Positive scores are associated with crime stories (francesc_mort, mort_yar) or accident reports (foll_veloc), while negative scores are associated with hard news regarding political (e.g. ricandidatur, impegn_europe) and economic (e.g. stagnazion, miliard_men) issues. This result may lead to the conclusion that more right-wing networks (that is, those with the same leaning as the government) supply softer and less politically charged topics. We conjecture that this could serve to downplay the saliency of thornier or more difficult issues. Subsection 3.3 also addresses this point.
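The selection of pole words described above amounts to sorting the tokens by their estimated β. A minimal sketch in Python follows. The function name `pole_words` and the dictionary layout are hypothetical, and the six entries are a small excerpt of the values reported in Table 1 rather than the full set of estimates.

```python
# Hypothetical excerpt of word parameters: token -> (beta, psi), values from Table 1.
word_params = {
    "benven_stud": (1.55, -3.52),
    "francesc_mort": (0.92, -3.21),
    "foll_veloc": (0.55, -2.87),
    "stagnazion": (-1.52, -2.41),
    "ricandidatur": (-1.51, -2.42),
    "govern_grec": (-1.53, -2.33),
}

def pole_words(params, n=25):
    """Return the n tokens with the highest beta and the n with the lowest beta."""
    ranked = sorted(params, key=lambda tok: params[tok][0])  # ascending beta
    positive = ranked[-n:][::-1]  # largest beta first
    negative = ranked[:n]         # most negative beta first
    return positive, negative

positive_pole, negative_pole = pole_words(word_params, n=2)
```

In this excerpt the positive pole is led by `benven_stud` and `francesc_mort`, the negative pole by `govern_grec` and `stagnazion`.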

Table 1: List of top 25 words associated respectively with negative (left) and positive (right) beta scores

Tokens                   b     psi    Tokens                   b     psi
esperient_govern     -1.50   -2.11    benven_stud           1.55   -3.52
ex_vertic            -1.50   -2.46    angel_machiavell      1.18   -3.11
econom_grill         -1.50   -2.05    gabriell_simon        1.02   -2.12
crosett              -1.50   -2.27    ser_grad              1.02   -3.16
merc_unic            -1.50   -2.73    machiavell            1.01   -2.60
interess_deb         -1.51   -2.65    rem_croc              0.98   -3.01
vot_test             -1.51   -2.63    ser_retequattr        0.94   -2.85
risors_destin        -1.51   -2.80    francesc_mort         0.92   -3.21
resping_mittent      -1.51   -2.42    yar_stat              0.83   -2.32
intes_riform         -1.51   -2.42    apert_sent            0.83   -3.04
ricandidatur         -1.51   -2.42    luc_pesant            0.79   -2.53
posit_arriv          -1.52   -2.77    carmin_martin         0.79   -2.94
impegn_europe        -1.52   -2.68    andiam_rom            0.79   -3.03
segretar_carrocc     -1.52   -2.49    lecces                0.76   -2.40
tropp_alti           -1.52   -2.42    prim_scompars         0.74   -2.79
confin_turc          -1.52   -2.44    ser_stud              0.74   -2.70
commission_barros    -1.52   -2.42    massim_canin          0.72   -2.57
intes_part           -1.52   -2.34    massimil_dio          0.67   -2.56
acquist_ben          -1.52   -2.13    mort_yar              0.62   -2.10
miliard_men          -1.52   -2.32    giorn_stud            0.61   -2.83
azion_maggior        -1.52   -2.57    luis_ross             0.58   -2.20
stagnazion           -1.52   -2.41    feder_gatt            0.58   -2.28
ex_capogrupp         -1.53   -2.39    bepp_gandolf          0.58   -2.34
govern_grec          -1.53   -2.33    foll_veloc            0.55   -2.87
men_previst          -1.53   -2.74    marc_graz             0.53   -2.56

3.3 Strategic issue selection or framing: assessing the mechanism of media political signal’s transmission

The previous section showed the presence of consistent and systematic differences in the language used by the Italian news programs and connected these differences to their political coordinates. Yet, it remains unclear whether the political leaning of the news is associated with differences in content (i.e. what is being told to the audience) or whether the scores are driven by differences in news framing (i.e. how the news is presented).

To address this point, we run WORDFISH on two different subsets of the original TV corpus. In both cases, we focus on the time period in which the Berlusconi IV cabinet was in office (thus from the summer of 2010 until October 2011). In the first subset we only consider economic news, while in the second we only consider reports of news related to law and order (including a set of crime news regarding immigration, murders, terrorism, and other generic news stories unrelated to politics and the economy). The idea at the center of this design is that, if we are still able to identify systematic differences among news programs when the same issue is covered, then the political signal is being channelled through the news frames rather than through issue saliency.

3.3.1 The economy in the news

In order to investigate issue framing we thus focus on two central subjects and exploit text analysis to detect the use of different language, and especially of qualifiers. The results of this second analysis are reported in Figure 4. We notice a remarkably different pattern in the scaling of economic news compared with the entire corpus. In fact, we can immediately notice that the variation in the omega scores does not occur between programs, but rather over time. We observe a radical movement toward more negative omega values for all the news programs considered, and this corresponds to the worsening of the financial crisis in the second half of 2011.

Figure 4: Wordfish scores of Italian news programs (economic news only), 2010-2011

As we did before, to understand the substantive content of the latent “economic news” scores, Appendix D reports the list of words associated with the highest and lowest β discrimination values. Upon inspection, we notice that the “left” pole of negative values and the “right” pole of positive scores deal, respectively, with problems and issues in the financial crisis and with more standard and traditional industrial policy issues.

Having found no systematic differences in the economic news’ frames of reference, we turn our attention towards the coverage of economic news. The idea is that Mediaset news programs may strategically downplay the importance of economic issues, given the bad news and the fact that Berlusconi is the incumbent in our time frame. If this is observed, then we may conclude that it is not how the news is reported, but rather what kind of news is covered, that explains the political signals.

This view is indeed corroborated by the evidence provided in Figure 5. This graph represents the cumulative airtime devoted to economic issues by network. We can observe that economic issues are covered more intensely on the news programs previously shown to be associated with “left-leaning” omega scores in the full model. By contrast, Mediaset channels

appear to have silenced economic issues during the Berlusconi government, as they consistently underreport the financial crisis compared to other TV channels.

Figure 5: Cumulative airtime (in days) devoted to economic news

This provides suggestive evidence of strategic selection of news topics by political affiliation, where networks ideologically closer to the right-wing government were strategically attempting to decrease the saliency of ‘uncomfortable’ issues related to the financial crisis.
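The quantity plotted in Figure 5 is a running total of airtime per network. As a rough illustration only, the Python sketch below accumulates weekly airtime and converts it to days. The record layout, the function name `cumulative_airtime_days` and the numbers are hypothetical, since the text does not describe how airtime was recorded.

```python
from collections import defaultdict
from itertools import accumulate

SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical records: (week index, network, seconds of economic news aired that week).
segments = [
    (1, "TG3", 43200), (1, "TG5", 21600),
    (2, "TG3", 43200), (2, "TG5", 0),
    (3, "TG3", 86400), (3, "TG5", 21600),
]

def cumulative_airtime_days(records):
    """Return, per network, the running total of airtime in days, ordered by week."""
    weekly = defaultdict(lambda: defaultdict(int))
    for week, network, seconds in records:
        weekly[network][week] += seconds
    result = {}
    for network, by_week in weekly.items():
        weeks = sorted(by_week)
        totals = accumulate(by_week[w] for w in weeks)
        result[network] = [(w, t / SECONDS_PER_DAY) for w, t in zip(weeks, totals)]
    return result

airtime = cumulative_airtime_days(segments)
```

In this toy input the cumulative curve of TG3 reaches 2.0 days by week 3, against 0.5 days for TG5, which is the kind of gap between networks that the figure is meant to display.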

3.3.2 Crime stories in the news

If the conjecture corroborated by the findings in the previous subsection is valid, then we should also observe the same “strategic issue selection” mechanism at work for political issues that are owned by the Italian right (i.e. favourable to it). Thus, in this section we provide similar evidence for crime news. If our argument is correct, we expect Mediaset channels to privilege news stories associated with the traditionally ‘right-wing’ issue of law and order: crimes, murders, immigration, and terrorism news reports.

First, we again investigate whether the TV frames of reference for such crime stories are presented in a similar or diversified fashion. Figure 6 shows that there are no detectable framing effects for crime news either. Then, we again turn our attention toward the cumulative airtime, this time for crime news. It is striking that the patterns of coverage presented in Figure 7 are the mirror image of those previously presented for the coverage of the economy.

We find this evidence consistent with our previous argument. While there are no significant differences in the way (how) different newscasts report either economic news (see Appendices C and D for evidence on omega scores and representative words) or crime news and current affairs (see Appendices E and F), we find a complementary pattern with respect to coverage.

Figure 6: Wordfish scores of Italian news programs (crime news and current affairs only), 2010-2011

Figure 7: Cumulative airtime (in days) devoted to crime news and current affairs

To recap, in this section we have investigated two potential mechanisms of transmission of the political signal through the news. Strategic issue selection involves downplaying the issues that are more politically uncomfortable for the referent political actors and simultaneously overemphasizing those that could potentially strengthen the electoral support of the

political referent group. Indeed we show this empirically, and our results point to the validity of Agenda Setting theories and the relevance of issue saliency for media and political actors. By contrast, we find no empirical traces of a systematic framing power of the media, as we find that, for a given topic, the language used is essentially undifferentiated. In the final empirical subsection 3.4 we leverage the longitudinal variation of our omega scores to strengthen the internal validity of our estimates.

3.4 Structural breaks

In this subsection we investigate the WORDFISH scores from a time-series perspective, aiming at identifying meaningful structural changes in the omega scores that are related to real-world events. We have already related variation in the language used by the news programs to report crime and economic news, but this largely relied on anecdotal evidence. In this subsection, we more systematically link structural changes in the omega time series to documented real-world events. Again, we consider the cases of economic and current affairs news. Finally, we shift our focus from all the TVs to the major (and first) Italian public TV channel: RAI 1. Our structural break analysis shows that a break in the language adopted by the TG1 news program occurred when Director Augusto Minzolini, notoriously a Berlusconi supporter25, was led to leave the direction of TG1. We use the changepoint R package to automatically identify the optimal position and number of breakpoints in the series.
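To give an intuition of what automatic breakpoint detection does, the sketch below implements a simple binary segmentation for shifts in the mean, in Python. It is a stand-in for illustration only: the text does not state which method or penalty of the changepoint package is used, and the function names, the penalty value and the toy series are all hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the segment mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(values, min_size=2):
    """Return (gain, index) of the split that most reduces the SSE, or None."""
    total = sse(values)
    best = None
    for i in range(min_size, len(values) - min_size + 1):
        gain = total - sse(values[:i]) - sse(values[i:])
        if best is None or gain > best[0]:
            best = (gain, i)
    return best

def binary_segmentation(values, penalty, min_size=2, offset=0):
    """Recursively split the series while the SSE reduction exceeds `penalty`."""
    split = best_split(values, min_size)
    if split is None or split[0] <= penalty:
        return []
    _, i = split
    left = binary_segmentation(values[:i], penalty, min_size, offset)
    right = binary_segmentation(values[i:], penalty, min_size, offset + i)
    return left + [offset + i] + right

# Hypothetical weekly omega scores with a level shift after the sixth week.
omega = [0.9, 1.1, 1.0, 0.95, 1.05, 1.0, -0.5, -0.45, -0.55, -0.5, -0.6, -0.4]
breaks = binary_segmentation(omega, penalty=1.0)
```

Here `breaks` contains the index of the first observation of the new regime. The penalty controls how large a shift must be before a break is reported, which is why both the number and the position of the breaks are outputs of the procedure rather than inputs.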

3.4.1 Structural breaks in the economic news

Once we run the structural change detection algorithm on the series of omega scores estimated for the subset of economic news, we identify one break, occurring in the week starting on 4 July 2011. Figure 8 graphically represents this break.

Indeed, the financial crisis hit Italy26 on 9 July 2011, generating a sudden increase in the ratio between German 10-year Bund yields and Italian 10-year BTP yields. This corroborates our previous understanding of the economic news latent scores as ranging between ‘physiological’ economic times, when news reports center on industrial policy, and the economic crisis, when financial issues become prevalent in the media. Figure 9 further reinforces this finding by showing the front-page headlines of the main Italian newspaper, the Corriere della Sera, before and after the identified structural break.

3.4.2 Structural breaks in the crime and current affairs news

The same exercise is now repeated for the subset of omega scores produced for all the TV news programs with respect to crime news and current affairs. Once we run the breakpoint analysis, we are able to identify two breakpoints, in the weeks starting respectively on the

25 The nomination of Augusto Minzolini was not supported by the center-left members of the RAI board, who left the board room at the moment of the vote and later called a press conference describing the appointment as “unacceptable” [supporting information]. In April 2010 the Italian Communication Authority also warned Augusto Minzolini for excessive imbalance and lack of pluralism [supporting information]. Finally, the AUDITEL data documented a decrease in the audience share of TG1. See also footnote 24.
26 ‘Us Hedge Funds bet against Italian bonds’, Financial Times, 10 July 2011 [link here].

Figure 8: Structural breaks in the omega series (all TVs, economic news, 2010-2011)

Figure 9: Cover page of Corriere della Sera across the structural break in economic news’ scores

(a) CdS before the structural break (08-07-2011) (b) CdS after the structural break (11-07-2011)

7th of February 2011 and the 4th of April of the same year. Figure 10 presents these two structural breaks (all the TV news programs are considered for the period in which the Berlusconi government is incumbent). We note that the two breaks indeed delimit the peak of the Egyptian Revolution of 2011,

Figure 10: Structural breaks in the omega series (all TVs, crime news, 2010-2011)

as Hosni Mubarak resigned on 11 February 2011. Figure 11 provides further evidence for this claim, presenting the front-page headlines of Corriere della Sera on the corresponding dates.

Figure 11: Cover page of Corriere della Sera across the structural break in current affairs news’ scores

(a) CdS before the structural break (10-02-2011) (b) CdS after the structural break (11-02-2011)

3.4.3 The case of TG1: structural breaks and director’s turnover

Having become more confident about the ability of WORDFISH to identify real changes in the underlying political and economic conditions, we run a final test to check whether we are also able to identify a “structural change” in the direction of the news programs. In particular, we have recalled the role played by Augusto Minzolini as a government supporter at the time he was directing the main Italian news program, TG1. In this subsection we thus run the structural break detection algorithm on the time series of omega scores computed for TG1 in the main model, which considered all the news covered. As shown in Figure 12, we identify one structural break in the series, in the week starting on 17 January 2011.

Figure 12: Structural breaks in the omega series (TG1 only, 2010-2011)

Once we superimpose on the plot the dates on which the three TG1 directors considered in the timespan of the analysis took office, the association between the structural break in the overall omega scores of the news program and the departure of Augusto Minzolini becomes evident. We show this in the dedicated Figure 13. The timing of the events lends support to the ability of the algorithm to detect changes in the political leaning of a news outlet. As already pointed out in Section 3.2, in October 2010 TG1 received a warning from the Communication Authority for strong imbalance in the airtime presence of politicians, favouring the right-wing government. In December 2010, Minzolini is dismissed from the direction of TG1 and shortly after a new director (Alberto Maccari) is appointed. The red vertical line in Figure 13 signals the official start of the Maccari direction, but the moment Minzolini was dismissed actually shortly precedes the structural change we identify.

Figure 13: Structural breaks in the omega series (TG1 only, 2010-2011)

4 Conclusions

This paper provides a number of new results. The content of Italian television networks is consistently differentiated across networks. Historically more left-leaning networks tend to focus on serious news content, like the state of the economy and political affairs. On the other hand, networks historically associated with the conservative parties place more emphasis on chronicles, gossip and crime news. We therefore find robust evidence of strategic issue selection throughout our dataset, but scant evidence of differential issue framing. Our results are consistent with previous work on newspaper ideology. Our study differs in that, given our setting, we are able to determine media bias as a consequence of media capture. These results raise the welfare concerns that are typically associated with media bias. While a degree of differentiation across networks in news content is desirable, as it may be a mechanism to ensure pluralism, in this case it may be problematic because deviations in the ideological scores are driven not by viewers’ demand but by political capture. As a result, we might anticipate a number of negative political outcomes, like growing ideological segmentation and increasing polarization in the audience.

References

Baron, David P. 2006. “Persistent media bias.” Journal of Public Economics 90(1):1–36.

Besley, Timothy and Andrea Prat. 2006. “Handcuffs for the Grabbing Hand? Media Capture and Government Accountability.” American Economic Review 96(3):720–736. URL: https://www.aeaweb.org/articles?id=10.1257/aer.96.3.720

DellaVigna, Stefano, Ruben Durante, Brian Knight and Eliana La Ferrara. 2016. “Market-Based Lobbying: Evidence from Advertising Spending in Italy.” American Economic Journal: Applied Economics 8(1):224–256. URL: http://pubs.aeaweb.org/doi/10.1257/app.20150042

Djankov, Simeon, Caralee McLiesh, Tatiana Nenova and Andrei Shleifer. 2003. “Who Owns the Media.” Journal of Law and Economics XLVI(October):341–381. URL: http://scholar.harvard.edu/files/shleifer/files/media.pdf

Duggan, J. and C. Martinelli. 2011. “A spatial theory of media slant and voter choice.” Review of Economic Studies 78(2):640–666.

Durante, Ruben and Brian Knight. 2012. “Partisan Control, Media Bias, and Viewer Responses: Evidence From Berlusconi’s Italy.” Journal of the European Economic Association 10(3):451–481. URL: http://doi.wiley.com/10.1111/j.1542-4774.2011.01060.x

Eyheramendy, Susana, David D. Lewis and David Madigan. 2003. “On the Naive Bayes Model for Text Categorization.” URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.1365

Gentzkow, Matthew and Jesse M. Shapiro. 2010. “What Drives Media Slant? Evidence From U.S. Daily Newspapers.” Econometrica 78(1):35–71. URL: http://doi.wiley.com/10.3982/ECTA7195

Grasso, Aldo. 2004. Storia della televisione italiana. Milano, Italia: Garzanti.

Groseclose, T. and J. Milyo. 2005. “A Measure of Media Bias.” The Quarterly Journal of Economics 120(4):1191–1237. URL: http://qje.oxfordjournals.org/cgi/doi/10.1162/003355305775097542

McMillan, John and Pablo Zoido. 2004. “How to Subvert Democracy: Montesinos in Peru.” Journal of Economic Perspectives 18(4):69–92. URL: https://www.aeaweb.org/articles?id=10.1257/0895330042632690

Mullainathan, Sendhil and Andrei Shleifer. 2005. “The Market for News.” American Economic Review 95(4):1031–1053. URL: http://pubs.aeaweb.org/doi/abs/10.1257/0002828054825619

Prat, Andrea. 2015. “Media Capture and Media Power.” Handbook of Media Economics Vol. 1B(2002):669–686.

Prat, Andrea and David Stromberg. 2013. “The Political Economy of Mass Media.” Advances in Economics and Econometrics Tenth World Congress, Volume 2: Applied Economics pp. 135–187.

Reinikka, Ritva and Jakob Svensson. 2005. “Fighting Corruption to Improve Schooling: Evidence from a Newspaper Campaign in Uganda.” Journal of the European Economic Association 3(2-3):259–267. URL: http://doi.wiley.com/10.1162/jeea.2005.3.2-3.259

Slapin, Jonathan B. and Sven-Oliver Proksch. 2008. “A Scaling Model for Estimating Time-Series Party Positions from Texts.” American Journal of Political Science 52(3):705–722.

Strömberg, David. 2015. “Media and Politics.” Annual Review of Economics 7(1):173–205.

Tella, Rafael Di and Ignacio Franceschelli. 2011. “Government Advertising and Media Coverage of Corruption Scandals.” American Economic Journal: Applied Economics 3(4):119–151. URL: https://www.aeaweb.org/articles?id=10.1257/app.3.4.119

Zaller, John. 1999. A Theory of Media Politics. University of Chicago Press. URL: http://www.sscnet.ucla.edu/polisci/faculty/zaller/media politics book .pdf

A Appendix - List of top 25 and bottom 5 MPs included in the scaling of parliamentary speeches by number of interventions

     MP name                     Political group                     Number of interventions
1    Roberto Giachetti           Partito Democratico                 340
2    Antonio Borghesi            Italia Dei Valori                   277
3    Fabio Evangelisti           Italia Dei Valori                   261
4    Erminio Angelo Quartiani    Partito Democratico                 225
5    Mario Tassone               Unione Di Centro Per Il Terzo Polo  215
6    Simone Baldelli             Popolo Della Liberta’               207
7    Angelo Compagnon            Unione Di Centro Per Il Terzo Polo  170
8    Federico Palomba            Italia Dei Valori                   152
9    Francesco Barbato           Italia Dei Valori                   150
10   Renato Cambursano           Misto                               148
11   Arturo Iannaccone           Misto                               141
12   Furio Colombo               Partito Democratico                 132
13   Pierluigi Mantini           Unione Di Centro Per Il Terzo Polo  128
14   Marco Giovanni Reguzzoni    Lega Nord Padania                   126
15   Massimo Polledri            Lega Nord Padania                   123
16   Pier Ferdinando Casini      Unione Di Centro Per Il Terzo Polo  122
17   Sergio Michele Piffari      Misto                               121
18   Massimo Donadi              Misto                               120
19   Carlo Monai                 Italia Dei Valori                   119
20   Ivano Strizzolo             Partito Democratico                 119
21   Donatella Ferranti          Partito Democratico                 113
22   David Favia                 Misto                               111
23   Amedeo Ciccanti             Unione Di Centro Per Il Terzo Polo  110
24   Rita Bernardini             Partito Democratico                 108
25   Teresio Delfino             Unione Di Centro Per Il Terzo Polo  108
...
514  Michela Vittoria Brambilla  Popolo Della Liberta’               1
515  Piero Testoni               Popolo Della Liberta’               1
516  Pietro Lunardi              Popolo Della Liberta’               1
517  Sandro Oliveri              Misto                               1
518  Vincenzo Barba              Popolo Della Liberta’               1

B Appendix - Eiffel plot of the distribution of tokens for the Italian news programs (2010-2014)

The following figure represents the word discrimination parameters computed by the WORDFISH estimation performed on the entire corpus of the Italian TV news programs. The horizontal axis reports the β parameters, which estimate the weight of each word in discriminating between the news programs’ positions, while the ψ scores, reported on the vertical axis, are the word fixed effects, which capture the frequency of the words (with less frequent words typically having greater discrimination weight).
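For reference, the role of the two word parameters can be read off the WORDFISH model of Slapin and Proksch (2008), restated below in LaTeX as a sketch. Here y_{ij} is the count of word j in document i; the symbol α_i for the document fixed effect is a labelling assumption on our part, while ω_i, β_j and ψ_j follow the usage in this paper.

```latex
y_{ij} \sim \mathrm{Poisson}(\lambda_{ij}), \qquad
\lambda_{ij} = \exp\left(\alpha_i + \psi_j + \beta_j\,\omega_i\right)
```

A word with β_j close to zero is used at a similar rate regardless of a document's position ω_i, whereas a large |β_j| means that the expected count changes sharply along the latent dimension.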

Figure 14: Word discrimination parameters for the corpus of Italian news programs

C Appendix - Eiffel plot of the distribution of tokens for the subset of economic news (2010-2011)

The following figure represents the word discrimination parameters computed by the WORDFISH estimation performed on the subset of economic news reported while the Berlusconi government was in office. Interpretation is equivalent to that of the Eiffel plot in Appendix B.

Figure 15: Word discrimination parameters for economic news on the Italian news programs

D Appendix - List of top 25 words associated with economic news’ poles of the latent space

Table 3: List of top 25 words associated respectively with negative (left) and positive (right) beta scores

Tokens                   b     psi    Tokens                   b     psi
bors_stat            -1.50   -4.83    quattord_gennai       5.45   -8.52
tagl_bilanc          -1.50   -4.65    mirafior_referendum   5.39   -9.05
capovolg             -1.50   -4.73    referendum_accord     5.19   -8.72
ital_aggiung         -1.50   -4.83    regol_rappresent      4.92   -9.14
rating_unit          -1.50   -5.19    fatt_accord           4.81   -8.72
mil_tutt             -1.50   -4.49    accord_mirafior       4.80   -7.61
tempest_perfett      -1.50   -5.34    referendum_futur      4.79   -8.47
arriv_guadagn        -1.50   -4.19    fiom_cobas            4.75   -8.64
cattedr              -1.51   -5.35    accord_separ          4.74   -7.88
chiud_perd           -1.51   -4.94    cgil_fiom             4.71   -8.19
arriv_miliard        -1.51   -5.06    conten_accord         4.60   -8.74
aggiorn_colleg       -1.51   -4.43    nuov_fiat             4.53   -7.45
de_bortol            -1.51   -5.06    esclusion_fiom        4.51   -8.63
voragin              -1.51   -4.00    gennai_referendum     4.39   -8.33
tedesc_super         -1.51   -5.06    accord_import         4.28   -8.05
pers_valor           -1.51   -5.06    fia                   4.20   -7.14
tremont_paregg       -1.51   -5.35    fim_cisl              4.08   -7.94
percentual_iva       -1.51   -4.84    debutt_bors           4.05   -6.96
moment_econom        -1.51   -4.84    alluvion_venet        4.02   -8.02
rest_ben             -1.51   -4.94    fiat_mirafior         3.94   -5.76
europ_scont          -1.51   -5.35    ivec                  3.83   -7.39
fin_prim             -1.51   -4.58    alto_dicembr          3.78   -7.22
part_union           -1.51   -4.10    ventott_gennai        3.77   -6.87
temporal             -1.51   -4.58    videogam              3.73   -6.76
andat_bors           -1.51   -5.07    fiom_accord           3.70   -7.64

E Appendix - Eiffel plot of the distribution of tokens for the subset of crime news (2010-2011)

The following figure represents the word discrimination parameters computed by the WORDFISH estimation performed on the subset of crime news reported while the Berlusconi government was in office. Interpretation is equivalent to that of the Eiffel plot in Appendix B.

Figure 16: Word discrimination parameters for crime news on the Italian news programs

F Appendix - List of top 25 words associated with the crime news’ poles of the latent space

Table 4: List of top 25 words associated respectively with negative (left) and positive (right) beta scores

Tokens                   b     psi    Tokens                   b     psi
trascin_div          -2.00   -5.67    grad_accogl           6.58  -14.69
famos_german         -2.00   -5.96    profug_dov            6.22  -13.33
fium_giorn           -2.01   -5.97    emergent_accogl       6.19  -13.56
lasc_entrar          -2.01   -5.68    accogl_cinquant       5.88  -12.46
anni_esatt           -2.01   -5.56    situazion_igien       5.86  -12.88
purific              -2.01   -5.46    dunqu_orma            5.80  -12.83
mai_screz            -2.01   -5.82    trasfer_emigr         5.43  -11.88
iniz_vit             -2.01   -6.16    marin_sammarc         5.42  -11.94
immens_dolor         -2.01   -5.37    destin_migrant        5.34  -12.02
propr_cont           -2.02   -5.98    arriv_region          5.33  -11.45
vib_valenz           -2.02   -5.07    marcator              5.24  -11.43
cos_vuot             -2.03   -5.83    equipagg_asso         5.06  -10.84
dov_spegn            -2.03   -5.48    termin_consigl        5.05  -11.50
disag_viaggiator     -2.03   -5.48    govern_sicil          5.02   -9.55
pegn_rivendic        -2.03   -5.70    tripol_equipagg       4.96  -10.90
rivendic_mulin       -2.03   -5.70    attracc_nav           4.95  -10.67
doman_sempr          -2.03   -5.70    maron_region          4.94  -10.57
salv_gett            -2.03   -5.84    vers_piattaform       4.92  -10.17
motovedett_riusc     -2.03   -5.48    mezz_sbarc            4.90  -11.01
temperatur_picc      -2.03   -5.48    immigr_dic            4.87  -10.95
gest_protest         -2.03   -5.30    quarant_immigr        4.82  -10.83
parol_parol          -2.03   -5.02    isol_sempr            4.77  -10.46
scors_infatt         -2.03   -5.71    don_dio               4.74  -10.29
impiant_riscald      -2.04   -5.59    maron_frattin         4.70   -9.47
propr_stess          -2.04   -5.85    central_sci           4.69  -10.00
