
Assessing political news quality: An automated comparison of political news quality indicators across German newspapers with different modalities and reach

Nicolas Mattis Student number: 12283177

Research Master’s Thesis Graduate School of Communication University of Amsterdam Research Master in Communication Science Supervised by Dr. Anne Kroon Word count: 7,497 June 26th, 2020

Abstract

In order to best perform their societal functions, news media must adhere to certain normative standards of news quality – especially when reporting about events with political significance. While various past studies have examined (political) news quality, they often differ in the indicators and operationalisations that they use, making it difficult to compare findings across studies. Hence, this thesis proposes a comprehensive framework for automatically measuring political news quality that is easily scalable and can be applied in various contexts as well as over longer timespans. It combines existing measures with newly developed classifiers that assess impartiality, thereby highlighting the potential of supervised machine learning for journalism studies and providing a means for future studies to assess impartiality in an automated manner. Furthermore, this thesis generates new insights into differences in political news quality across German newspapers that differ in their reach (national vs. regional) and modality (online vs. offline). The results indicate that both modality and reach appear to affect newspapers' performance on political news quality indicators, even though these differences tend not to be particularly pronounced. While online newspapers in particular performed comparably worse on indicators such as actor diversity, impartiality, and emotionality, the results suggest that modality and reach alone are not sufficient to explain differences across news outlets. On the whole, this thesis highlights the potential of automated research methods for future research into (political) news quality and urges scholars to employ and advance existing measures to provide a fuller picture of (political) news quality across countries, outlets and, maybe most importantly, over time.

Keywords: Automated content analysis, News quality, Impartiality, Diversity, Supervised machine learning


Introduction

Often referred to as the fourth estate, news media are widely regarded as crucial for well-functioning democracies (Jacobi, Kleinen-von Königslöw, & Ruigrok, 2016). Building on Locke (1967), Strömbäck (2005) argues that one can describe the relationship between news media and democracy as a social contract: Democracy creates the necessary conditions for news media to operate in, while news media contribute to democracy by providing relevant, high-quality information to both the public and the government, as well as by serving as a watchdog of a country's institutions. To live up to those standards and inform the public both accurately and fairly, news outlets need to adhere to certain normative news quality standards such as diversity and impartiality (Urban & Schweiger, 2014).

Naturally, this raises the question of how well newspapers in a given media market adhere to such standards. While there is ample (comparative) research on different news quality indicators such as diversity, negativity, and objectivity (e.g. Burggraaff & Trilling, 2017; Humprecht & Esser, 2018; Jacobi et al., 2016; Masini et al., 2018), studies often differ in their choice and operationalisation of these indicators. Hence, this thesis proposes a comprehensive framework for assessing news quality through an automated content analysis (ACA) by combining existing measures with newly developed classifiers that assess three key indicators of impartiality on the article level.

ACA constitutes an efficient and affordable research methodology for the analysis of large bodies of data (Grimmer & Stewart, 2013) that can be applied to journalistic content in both an inductive and a deductive manner (Boumans & Trilling, 2016). Given that the field of journalism studies tends to largely neglect automated research methods (Boumans & Trilling, 2016), this thesis hopes to a) drive the field methodologically forward – by illustrating the potential of supervised machine learning (SML) and moving beyond mere case studies – and b) facilitate future comparative research by providing a means to assess and monitor important news quality indicators in an easily scalable and resource-efficient manner.

On a theoretical level, this thesis addresses concerns over an overall decrease in journalistic news quality that a number of scholars have voiced since new technological affordances and increased economic pressures have begun transforming traditional newspaper markets (e.g. Burggraaff & Trilling, 2017; Humprecht & Esser, 2018; Jacobi et al., 2016; Jungnickel, 2011; McManus, 2009; Plasser, 2005). The underlying argument behind those concerns is that the current transformation of the newspaper market results in fierce competition for advertising revenue. In order to cope, newspapers attempt to boost their reach to attract advertisers, often at the expense of journalistic quality (McManus, 2009) – a process that scholars refer to as commercialisation (Jacobi et al., 2016) or tabloidization (Esser, 1999).

Commercialisation is often assumed to be especially pronounced in online news content (e.g. Burggraaff & Trilling, 2017). However, existing research on the effects of modality is inconclusive: some researchers have found evidence for lower news quality online (e.g. Burggraaff & Trilling, 2017; Welbers, Van Atteveldt, Kleinnijenhuis, & Ruigrok, 2018), whereas others have found no notable differences (Ghersetti, 2014) or even contradictory ones (e.g. Humprecht & Esser, 2018). Other important factors that might affect news quality are the structure of a given media market (Esser & Umbricht, 2013) and the size of a newspaper (Masini et al., 2018). For example, Masini et al. (2018) argue that local newspapers can allocate fewer resources to quality reporting, especially about events on the national level. In light of these considerations, this thesis compares political news quality across German newspapers with different modalities (online vs. offline) and reach (national vs. regional). By unravelling the effects of those factors, this thesis hopes to add to existing research by providing a clearer picture of political news quality in Germany.

In the following, this thesis will a) lay out the theoretical underpinnings of an automated news quality measurement framework, b) apply it to a sample of German newspapers, c) present the differences across newspapers with different reach and modalities, and d) close with the implications of ACA and suggestions for future research.

Theoretical Framework

Political news quality and its indicators

What constitutes good political news? The answer to this question will likely depend on who answers it. As Urban and Schweiger (2014) argue, a journalist might judge an article's quality by the effort that it took to produce, whereas a reader might simply judge it by how enjoyable it is to read. This study builds on McQuail's (1992) notion of the 'marketplace of ideas' and accordingly takes a normative perspective on the quality of news. Following Urban and Schweiger (2014), it posits that high-quality political news should provide accurate and impartial information that gives room to a wide variety of relevant actors and their positions in order to enhance the public's understanding of important political matters as well as broader societal debates. This perspective builds on Strömbäck's (2005) idea of a "participatory democracy": the notion that citizens should (be able to) participate in all aspects of political life. Naturally, to do so effectively, citizens need to have access to high-quality political information – not only during and before elections, but all year round.

Over time, various media and journalism scholars have spelled out the elements that constitute (political) quality news. For example, Jungnickel (2011) identified seven quality criteria, namely lawfulness, accuracy, relevance, comprehensibility, transparency, impartiality, and diversity, with various sub-dimensions. Urban and Schweiger (2014) propose a somewhat similar, yet slightly more parsimonious model with six quality criteria: diversity, impartiality, relevance, comprehensibility, accuracy, and ethics. Although many analyses of news quality indicators have relied on manual content analyses (e.g. Esser & Umbricht, 2013; Masini et al., 2018; Ramírez de la Piscina, Gonzalez Gorosarri, Aiestaran, Zabalondo, & Agirre, 2015), several of those indicators can be assessed through ACA. In fact, a few studies have already done so (Burggraaff & Trilling, 2017; Jacobi et al., 2016). ACA constitutes a valuable research methodology in journalism studies as it a) significantly reduces the cost of traditional content analysis, b) provides a means to test hypotheses on a larger scale, and c) potentially might even reveal insights that more traditional methods have missed (Boumans & Trilling, 2016). It also allows researchers to explore over-time developments with comparable ease. In the following, the four core dimensions of the automated measurement approach taken in this study, namely diversity, impartiality, emotionality, and comprehensibility, are discussed.

Diversity

“[D]iversity in public affairs coverage is crucial because the news media are expected to create a mediated public sphere that reflects the diversity of interests, voices, and views in society” (McQuail 1992, as cited in Humprecht & Esser, 2018, p. 1825). However, despite a sharp increase in literature on the topic of news diversity, the concept’s exact definition remains contested (Humprecht & Esser, 2018). Furthermore, diversity can be assessed at different levels of analysis, such as on the article- or newspaper-level (Masini et al., 2018).

Despite these issues, most studies agree on two core dimensions: viewpoint diversity and actor (or source) diversity (e.g. Masini et al., 2018; Urban & Schweiger, 2014; Voakes, Kapfer, Kurpius, & Chern, 1996). While these dimensions are undoubtedly intertwined (Masini et al., 2018), they differ in the granularity of their operationalisations. Viewpoint diversity is a multidimensional and context-dependent concept that often refers specifically to frames (e.g. Benson, 2009). Automatically measuring viewpoint diversity is therefore a considerable challenge that exceeds the scope of this project (for an attempt see Czymara & van Klingeren, 2019). Actor diversity, in contrast, is a more straightforward concept in that it merely measures the quantity and range of different sources. Often, a differentiation is made between elite and non-elite sources (e.g. Humprecht & Esser, 2018). Other studies examine the proportional representation of governing and opposition parties (e.g. van Hoof et al., 2014). The logic underlying both approaches is that elite actors such as governing parties or their representatives are inherently more newsworthy and are therefore covered more frequently than opposition parties or laypeople (Castells, 2009).

While the assumption that a greater variety of actors equals a greater variety of news does not necessarily hold true (Carpenter, 2010), actor diversity can have important implications for viewpoint diversity (Bennett, 1996), as it reveals to what extent different actors are given the space to shape public debates (Benson & Wood, 2015). In fact, Masini and Van Aelst (2017) showed that actor and viewpoint diversity are strongly intertwined. Hence, actor diversity can be considered a necessary precondition for viewpoint diversity.

Existing research into actor diversity points towards several medium-specific differences. For example, Masini et al. (2018) found that, overall, national newspapers exhibit greater actor diversity than local newspapers – supposedly due to differences in capital, staff, and resources (for contradicting findings see Voakes et al., 1996). Regarding differences between modalities, Burggraaff and Trilling (2017) argue that commercialisation affects online news outlets more strongly, as they a) face a higher degree of competition within a dynamic and distraction-provoking environment, b) exhibit a slightly different understanding of their journalistic roles, and c) profit from detailed insights into which types of articles generate the most attention, allowing them to fine-tune news accordingly. Accordingly, they found that online newspapers amplify differences between elite and popular news outlets. Lastly, Jacobi et al. (2016) demonstrated that online news articles are more likely to focus on leaders and reference elites. Together, these insights motivate the following hypotheses:

H1: National newspapers exhibit greater degrees of actor diversity than regional newspapers.

H2: The positive effect of national (vs. regional) newspaper types on actor diversity will be more pronounced for print than online news.

Impartiality

The notion of impartiality emerged as a journalistic norm in the early 20th century (Boudana, 2016) and has been prominent among both media scholars and journalists ever since (Maras, 2013). It is often equated with objectivity (Boudana, 2016; Maras, 2013) and remains one of the core principles that news editors and journalists around the world operate by (Maras, 2013). However, despite its popularity, impartiality still lacks a clear and agreed-upon definition and operationalisation (Cushion & Thomas, 2019). Prior examinations of impartiality have either examined journalists' and editors' selection processes (e.g. Cushion, Kilby, Thomas, Morani, & Sambrook, 2018) or zoomed in on specific indicators such as the proportion of different sources or the use of and elaboration on statistics (Cushion, Lewis, & Callaghan, 2017; Wahl-Jorgensen, Berry, Garcia-Blanco, Bennett, & Cable, 2017).

This thesis focuses on impartiality on the content level. It builds on Urban and Schweiger's (2014) definition of impartiality as "a neutral and balanced coverage of all facts, demands and positions" (p. 823). Accordingly, it employs three key indicators to assess impartiality: neutrality, balance of viewpoints, and balance of sources. These dimensions are taken from Urban and Schweiger (2014) and lend themselves rather well to content analysis, as articles can be coded according to the presence or absence of each dimension.

In its purest form, balance is defined as "the allocation of equal space to opposing views" (Cox, 2007, as cited in Wahl-Jorgensen et al., 2017, p. 783). However, systematically balancing sources and viewpoints might still distort reality and introduce artificial balance (Boudana, 2016). A good illustration of this is climate change journalism: balancing believers and deniers, as has frequently been done in news media (Hiles & Hinnant, 2014), creates a false image of an open debate that is arguably worse than, for example, a "'weight of evidence' approach" (Cushion & Thomas, 2019, p. 395). For this reason, this thesis operationalises balance in terms of whether or not an article gives room to challengers of the central actor. An article that does so arguably depicts at least a limited range of sources and viewpoints that a) constitutes an attempt by the journalist to create a certain degree of balance, and b) exposes citizens to a certain range of views. Neutrality refers to the absence of evaluation by the author, which relates directly to the notion of an objective reporting style (Maras, 2013).

Given the various operationalisations of impartiality, specific insights into differences in impartiality among German newspapers are still missing. Hence, a research question is formulated:

RQ1: Does impartiality differ depending on a) the modality (online vs. print) and b) the type (national vs. regional) of newspaper outlets?

Emotionality & negativity

Various scholars have argued that an increased use of emotions might be one of the ways in which news media react to the economic pressures they are facing (e.g. Burggraaff & Trilling, 2017; Jacobi et al., 2016), as emotional news is more likely to grab people's attention, thereby maximising readership and advertising revenue (Burggraaff & Trilling, 2017; McManus, 2009). This thesis conceptualises emotionality as a bipolar concept with positivity on one side of the spectrum and negativity on the other. Arguably, negativity has received considerably more scholarly attention than positivity. Negative information has repeatedly been shown to attract more attention and to be better remembered (Knobloch-Westerwick, Mothes, & Polavin, 2020; Soroka & McAdams, 2015). This so-called negativity bias provides a strong incentive for journalists to use negativity strategically in order to attract attention. While research also suggests a certain demand for it (Shoemaker & Cohen, 2006, as cited by Burggraaff & Trilling, 2017), similar effects cannot be claimed for positive news.

However, it can be argued that strongly positive news still deviates from the ideal of neutrality that is traditionally valued in political news (Jacobi et al., 2016).

Existing research into emotionality has shown that a) regional newspapers employ comparably much negativity (Boukes & Vliegenthart, 2020), b) print news tends to be more positive than online news (Burggraaff & Trilling, 2017), and c) emotionality is less pronounced in online news (Burggraaff & Trilling, 2017; Jacobi et al., 2016) – potentially due to a higher reliance on agency material among online newspapers (Jacobi et al., 2016; Welbers et al., 2018). Taken together, these findings motivate the following hypotheses as well as an explorative research question that addresses emotionality across news outlets with a different reach:

H3a: Online news will feature more negativity than print news.

H3b: Regional news will feature more negativity than national news.

H4: Online news will feature less emotionality than print news.

RQ2: (To what extent) does emotionality differ between national and regional newspapers?

Comprehensibility

Especially in light of the vast literature suggesting that the average citizen lacks a detailed understanding of politics (Lau & Redlawsk, 2001), it is easy to argue that, in order to live up to their ideal societal role, news media need to convey information in an understandable fashion. Although comprehensibility is determined by several factors such as coherence, conciseness, or the use of additional stimuli (see Urban & Schweiger, 2014), this thesis focuses exclusively on readability. Readability refers to how easy or difficult it is to read a given text, thereby capturing quite closely what Urban and Schweiger (2014) term simplicity. Readability has been linked to newspaper circulation in Germany in the past (Schoenbach & Lauf, 2002), as it constitutes not only a normative ideal to evaluate news by, but also a factor that affects audience evaluation and readership (Humprecht & Esser, 2018). Thus, readability constitutes an important aspect of comprehensibility that can be measured reliably. The readability of German dailies appears to be comparably high (Björnsson, 1983), but the literature does not yet reveal generalisable differences between various types of outlets. Hence, potential differences are explored by means of the following research question:

RQ3: (To what extent) do German news media differ in their readability depending on a) reach (national vs. regional) and b) modality (online vs. print)?

The Framework

Taken together, these quality dimensions result in a comprehensive framework for automatically assessing news quality (see Figure 1). The framework combines various quality criteria that are largely laid out by Urban and Schweiger (2014) (see Appendix A) and, despite being incomplete, allows for establishing a benchmark for assessing news quality in a resource-effective manner.

Figure 1. Framework for automatically assessing news quality.

Methodology

This thesis combined several computational methods in order to tap into four indicators of news quality. Due to a lack of automated measurements for impartiality, a new measurement approach was developed through manual content analysis (MCA) and SML. For the other three quality indicators, this thesis built on and partly adapted previous work (e.g. Burggraaff & Trilling, 2017; Jacobi et al., 2016; Masini et al., 2018).

Sample

The final sample consisted of 11,491 political news articles that were gathered from six German newspapers' online and print editions over a seven-week period between the 20th of April 2020 and the 8th of June 2020. 8,077 duplicated or incorrectly scraped articles were deleted from the initial dataset (N= 19,568). The newspapers had either a national ("Die Welt", "Die Süddeutsche", "Der Tagesspiegel") or a regional scope ("Aachener Zeitung", "Rheinische Post", "Stuttgarter Zeitung"). The national newspapers are usually referred to as elite newspapers (e.g. Masini et al., 2018). For the regional newspapers, a distinction between elite and popular is more difficult (Boukes & Vliegenthart, 2017), if applicable at all. Importantly, the sampling was conducted during the height of the Covid-19 crisis. As a result, the article content might differ in unique ways from comparable samples.

The German media market

Furthermore, it is important to consider two particularities of the German media market that might affect the results and their comparability to other studies. First, German newspapers perform comparably well in terms of news quality, as they profit from strong levels of professionalisation and institutionalised self-regulation (Hallin & Mancini, 2004), a media culture that values the notion of a marketplace of ideas (Esser & Brüggemann, 2010), and a strong public broadcasting sector that appears to have spillover effects on other media (Humprecht & Esser, 2018). Moreover, the challenges brought about by declining readership, increased competition, and the internet are less pronounced in Germany than they are in many other countries (Brüggemann, Engesser, Büchel, Humprecht, & Castro, 2016).

Second, German regional newspapers are not per se localised, but often cover a wide range of topics and reach comparably high levels of readership (Humprecht & Esser, 2018). In fact, regional newspapers constitute about 75% of the total market, and even quality papers such as the Süddeutsche "draw a large chunk of their readership from their […] area" (Esser & Brüggemann, 2010, p. 40f). Hence, differences that have been found in other European countries might be less pronounced in the German market.

Data collection

All online content was gathered by means of RSS scrapers within the inca infrastructure for automated content analysis (Trilling et al., 2018). All scrapers were written by the author prior to the data collection. The scrapers accessed each newspaper's RSS feed on an hourly basis and checked whether new articles were available. If so, the key elements of each article (date, title, teaser, text, category, author) were downloaded, parsed, and stored in a database. However, due to server issues during the sampling period, only very few articles were scraped in the first month (see Figure 2). For the final sample of political online news articles (N= 1,072), only articles that were published in the politics section of a given newspaper were retained. The scrapers are available in a public GitHub repository together with the rest of the code that was run for this thesis (https://github.com/nickma101/Thesis).
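The scrapers themselves live in the linked repository and the inca infrastructure; as a rough, stand-alone illustration of the general logic (feed URL, CSS selector, and field names are placeholders rather than the actual scraper code), such a check-and-download step could look as follows:

```python
import feedparser               # pip install feedparser
import requests
from bs4 import BeautifulSoup   # pip install beautifulsoup4

# Hypothetical feed URL and selector; the actual scrapers define newspaper-specific rules.
FEED_URL = "https://www.example-zeitung.de/politik/rss"

def fetch_new_articles(feed_url, seen_urls):
    """Check an RSS feed and download key elements of articles that are not stored yet."""
    feed = feedparser.parse(feed_url)
    articles = []
    for entry in feed.entries:
        if entry.link in seen_urls:
            continue                                   # skip articles that were already stored
        html = requests.get(entry.link, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        paragraphs = [p.get_text(" ", strip=True) for p in soup.select("article p")]
        articles.append({
            "date": entry.get("published", ""),
            "title": entry.get("title", ""),
            "teaser": entry.get("summary", ""),
            "text": " ".join(paragraphs),
            "url": entry.link,
        })
    return articles
```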

Figure 2. Number of sampled online and print political news by publication date

Print articles were accessed through NexisUni, downloaded manually in sets of 100 articles at a time, and parsed with the LexisNexisTools package (Gruber, 2020) in R. Articles were selected if at least one of the following terms was present in Nexis Uni's classification section: politik, politische, politisch, partei, parteien, , , regierung, wahl, wahlen. Arguably, this sampling procedure resulted in a broader scope of articles than the category-based sampling for the online articles. Together with the server issues, this might explain the stark difference in the number of print (N= 10,419) and online news articles (N= 1,072) in the sample. For an overview of the final sample distribution see Appendix B.

Data pre-processing

In order to remove unnecessary noise in the data, several data cleaning steps were performed prior to the hypothesis testing. All article texts were processed with the Python packages SpaCy (Honnibal & Montani, 2017) and NLTK (Bird, Klein, & Loper, 2009) in order to remove duplicates, formatting errors, and articles that had not been scraped correctly (e.g. because they were behind a paywall). In addition, a second version of each article text was created by removing stop words (words that are very frequent but not important for the meaning of a sentence) and reducing all words to their stems. This step was necessary to improve the accuracy of the emotionality analyses as well as the overall performance of some of the impartiality classifiers.
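A minimal sketch of how this reduced text version could be produced with NLTK (assuming its German stop word list and Snowball stemmer; the repository code may differ in detail):

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer

nltk.download("stopwords", quiet=True)
nltk.download("punkt", quiet=True)

GERMAN_STOPWORDS = set(stopwords.words("german"))
STEMMER = SnowballStemmer("german")

def clean_text(text):
    """Tokenise, drop German stop words and punctuation, and stem the remaining words."""
    tokens = nltk.word_tokenize(text, language="german")
    kept = [t for t in tokens if t.isalpha() and t.lower() not in GERMAN_STOPWORDS]
    return " ".join(STEMMER.stem(t) for t in kept)

print(clean_text("Die Regierung verteidigte ihre Entscheidung gegen scharfe Kritik."))
```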

Independent variables

The two independent variables under study were the reach and the modality of a given news article. An article's reach (M= .56, SD= .50) was determined by the newspaper that published it and assessed by means of a dummy variable that was coded as one for national and zero for regional newspapers. Similarly, an article's modality (M= .91, SD= .29) was assessed by means of a dummy variable that was coded as one for print and zero for online articles.

Dependent variables

Following Masini et al. (2018), actor diversity was assessed as a count variable on the article level. This thesis differentiated four actor types, namely political elite actors, political opposition actors, persons, and organisations. It thereby accounted for the frequency of not only different types of political actors, but also laypeople and non-political organisations. All actors were detected through SpaCy's named entity recognition (NER) feature. If an entity that SpaCy had classified as a person was present in one of the manually created political actor lists (see https://github.com/nickma101/Thesis), it was coded accordingly. If not, it was coded as a generic person with no particular political significance. For each actor group, the overall number of references to its respective actors was calculated. Next, a dummy variable was created for each entity group, with the value one if at least one actor from this group was named and zero if not. Lastly, the four dummy variables were added together into an index that ranged from zero (no actor groups mentioned) to four (all actor groups mentioned) (M= 2.55, SD= .80).
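As an illustration of this coding logic, a simplified sketch using spaCy's German model is shown below; the actor lists are tiny stand-ins for the full lists in the repository, and the label names follow spaCy's German NER scheme:

```python
import spacy

nlp = spacy.load("de_core_news_sm")   # German model; NER labels include PER and ORG

# Tiny illustrative stand-ins for the manually created actor lists in the repository.
ELITE_ACTORS = {"Angela Merkel", "Olaf Scholz"}
OPPOSITION_ACTORS = {"Christian Lindner", "Dietmar Bartsch"}

def actor_diversity(text):
    """Return a 0-4 index counting how many of the four actor groups appear at least once."""
    groups = {"elite": 0, "opposition": 0, "person": 0, "organisation": 0}
    for ent in nlp(text).ents:
        if ent.label_ == "PER":
            if ent.text in ELITE_ACTORS:
                groups["elite"] = 1
            elif ent.text in OPPOSITION_ACTORS:
                groups["opposition"] = 1
            else:
                groups["person"] = 1          # generic person without a list entry
        elif ent.label_ == "ORG":
            groups["organisation"] = 1
    return sum(groups.values())
```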

Impartiality was defined as a balanced coverage of relevant sources and viewpoints in combination with an author who refrains from personal evaluation. As laid out in the theoretical framework, balance was operationalised in terms of whether or not an article gave room to challengers of the central actor. By using a definition that expects journalists to provide more than just a single view and source for a particular topic or standpoint, rather than to achieve a (near-)perfect balance, this thesis hoped to avoid the earlier-mentioned fallacies of assessing balance.

Impartiality was assessed through three indicators: 1) the presence/absence of balanced viewpoints ("Is the standpoint of the central political actor challenged by another actor in the text?"), 2) the presence/absence of balanced sources ("Does the article quote two or more different types of political actors – e.g. a national elite and a national opposition actor?"), and 3) the presence/absence of evaluation by the author ("Does the author personally evaluate anything within the article?"). Added together, these indicators amount to a four-point impartiality index that ranges from a minimum of zero (not impartial) to a maximum of three (very impartial) (M= 1.39, SD= .80).
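Assuming the three binary indicators are available as columns of a pandas DataFrame (the column names below are illustrative), the index is simply their row-wise sum:

```python
import pandas as pd

# Illustrative values; in the thesis these columns are predicted by the trained classifiers.
articles = pd.DataFrame({
    "balanced_viewpoints": [1, 0, 1],   # 1 = the central actor's standpoint is challenged
    "balanced_sources":    [1, 0, 0],   # 1 = two or more types of political actors quoted
    "neutrality":          [1, 1, 0],   # 1 = no personal evaluation by the author
})

# 0 = not impartial, 3 = very impartial
articles["impartiality"] = articles[
    ["balanced_viewpoints", "balanced_sources", "neutrality"]
].sum(axis=1)
```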

Manual content analysis

Given the nuance that was necessary for assessing these indicators, dictionary-based measures did not suffice to accurately determine to what extent an article was impartial. Hence, impartiality was assessed through an SML approach, where binary classifiers were trained on manually coded training material. The manual content analysis was performed by a set of four student coders, the researcher being one of them. Before the final coding, all coders received training, and the codebook (see Appendix C) was amended in accordance with the problems that had emerged during this training. Overall, a total of 487 articles were coded into three binary categories. See Table 1 in Appendix E for their distribution.

Intercoder reliability

Several intercoder reliability tests were performed to ensure a sufficient level of reliability. Overall, three different sets of articles (Datasets A, B, and C) were checked for intercoder reliability: a) a subsample (N= 25) of the initial print data (N= 250) for all coders, b) a subsample (N= 15) of the online data (N= 150) for two coders, and c) a subsample (N= 10) of a second set of print articles (N= 98) for another two coders. The indicator "neutrality" proved to be reliable across all datasets and coders, with a Krippendorff's alpha of .79 or higher and a Cohen's Kappa of .60 or higher. However, the other two indicators were less reliable. For balance of actors, dataset A (α = .63) and dataset B (α = .61) failed to meet the recommended intercoder reliability threshold of .667 (Neuendorf, 2002). For balance of viewpoints, the same was true for dataset A (α = .53). For a detailed overview of all results see Appendix D.

Given that SML relies on highly reliable training data, coders who did not achieve acceptable results in a dataset were excluded from the training data. Specifically, coder 4 was excluded from the training data for balance of both actors and viewpoints, due to comparably low Cohen's Kappa results (see Appendix D, Table 1). For the same reason, coder 3 was excluded from the online training data for balance of viewpoints (see Appendix D, Table 3).

Classifier training & prediction

The classifiers were trained in Python using the sklearn package. To do so, the training data for each variable were split into a training (80%) and a validation set (20%). The article text was represented in the form of vectors. Specifically, four types of vectors were created and compared: 1) count vectors, 2) Term Frequency-Inverse Document Frequency (TF-IDF) vectors with unigrams, 3) TF-IDF vectors with bigrams, and 4) TF-IDF vectors with both uni- and bigrams. For each indicator, four different types of classifiers were tested: 1) a stochastic gradient descent classifier, 2) a naïve Bayes classifier, 3) a support vector machine classifier, and 4) a k-nearest neighbour classifier. All classifiers were cross-validated, and their hyperparameters were tuned using either grid search or randomised search. Furthermore, all classifiers were trained on both the original and the clean text to compare their performance.
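A condensed sketch of one such configuration (TF-IDF vectors with uni- and bigrams combined with a stochastic gradient descent classifier tuned via grid search) is shown below; the full comparison of vectorisers, classifiers, and search strategies is in the repository, and the parameter grid here is illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

def train_classifier(texts, labels):
    """Train and evaluate a binary classifier for one impartiality indicator."""
    X_train, X_val, y_train, y_val = train_test_split(
        texts, labels, test_size=0.2, random_state=8)

    pipeline = Pipeline([
        ("vec", TfidfVectorizer(ngram_range=(1, 2))),        # TF-IDF with uni- and bigrams
        ("clf", SGDClassifier(loss="hinge", random_state=8)),
    ])

    # Illustrative grid; the thesis compared several vectorisers and classifier types.
    grid = GridSearchCV(pipeline,
                        {"clf__alpha": [1e-4, 1e-3], "clf__max_iter": [200, 1000]},
                        cv=5)
    grid.fit(X_train, y_train)

    print(classification_report(y_val, grid.predict(X_val)))  # precision, recall, F1 per class
    return grid.best_estimator_
```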

Table 1. Best text classification results for impartiality indicators

Indicator              Classifier                   Text      Vector type                  Category     Precision  Recall  F1
Balance of viewpoints  Stochastic gradient descent  Original  Count                        0 (N= 204)   .82        .68     .75
                                                                                           1 (N= 99)    .52        .70     .60
                                                                                           Overall                         .69
Balance of actors      K-nearest neighbour          Original  TF-IDF with uni- & bigrams   0 (N= 297)   .77        .73     .75
                                                                                           1 (N= 152)   .43        .48     .46
                                                                                           Overall                         .66
Neutrality             Support vector machine       Clean     Count                        0 (N= 204)   .68        .60     .63
                                                                                           1 (N= 283)   .72        .79     .75
                                                                                           Overall                         .70

Classifier parameters as follows: 1) Balance of viewpoints: loss="hinge", alpha = .0001, max_iter=200, random_state=8; 2) Balance of actors: default settings; 3) Neutrality: default settings.

The final classifier evaluation was based on precision, recall, and F1-score. Preference was given to balanced results, as both categories were equally important for all indicators. Overall, the distribution of the predicted categories mirrored the distribution of the manually coded categories, except for balance of actors, where the trained classifier reversed the two categories' distribution (see Table 5 in Appendix E). Table 1 provides an overview of the best classification results per indicator as well as the text versions and vector representations that they worked best on. For additional information see Tables 2 through 4 in Appendix E.

Emotionality was defined as "the presence of positivity and/or negativity as opposed to the absence of both" (Burggraaff & Trilling, 2017, p. 6) in a given news article. Emotionality was assessed on the article level through dictionary-based counting of positive and negative words. All analyses were performed with the Rauh sentiment dictionary (Rauh, 2018), which was specifically developed for application to political texts. It augments two more general sentiment dictionaries, namely SentiWS (Remus, Quasthoff, & Heyer, 2010) and GPC (Waltinger, 2010), and allows for a better and more valid measurement of sentiment in political texts (Rauh, 2018). To account for article length, the number of emotional words was divided by the number of words in a text, resulting in a final emotionality ratio that was used for the hypothesis testing (M= .11, SD= .03).

In addition to emotionality, this study also measured negativity. Negativity was assessed through the same dictionary-based procedure as emotionality, where all negative words in a text were counted based on Rauh's (2018) sentiment dictionary. Dividing the sum of negative words by the number of words in an article resulted in a final negativity ratio (M= .48, SD= .02) that was used for the hypothesis testing.
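A simplified sketch of these two ratios is given below; it assumes the positive and negative entries of the Rauh (2018) dictionary have already been loaded into two sets and it ignores the dictionary's negation handling:

```python
def sentiment_ratios(clean_text, positive_terms, negative_terms):
    """Return (emotionality, negativity) as proportions of all words in a text.

    positive_terms / negative_terms: sets of (stemmed) dictionary entries, e.g. loaded
    from the Rauh (2018) sentiment dictionary; negation handling is omitted here.
    """
    tokens = clean_text.split()
    if not tokens:
        return 0.0, 0.0
    n_pos = sum(token in positive_terms for token in tokens)
    n_neg = sum(token in negative_terms for token in tokens)
    return (n_pos + n_neg) / len(tokens), n_neg / len(tokens)
```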

Readability (M= 40.68, SD= 10.67) was used as a single indicator of a news article's comprehensibility. It was measured with the Flesch reading ease score (FRE), which assigns different weights to a text's average sentence length (ASL) and the average number of syllables per word (ASW). It was computed with the textstat Python package (https://github.com/shivam5992/textstat). For German texts, the package relies on Amstad's (1978) adapted formula:

FRE_German = 180 − ASL − (58.5 × ASW)
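In the thesis the score was computed with textstat; purely as an illustration of the formula, a manual sketch with a crude vowel-group syllable count (an assumption, not textstat's implementation) could look like this:

```python
import re

def count_syllables_de(word):
    """Very rough German syllable estimate: count groups of vowels (incl. umlauts)."""
    return max(1, len(re.findall(r"[aeiouäöüy]+", word.lower())))

def flesch_reading_ease_de(text):
    """Amstad's German adaptation of the Flesch reading ease: 180 - ASL - 58.5 * ASW."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-zÄÖÜäöüß]+", text)
    if not sentences or not words:
        return None
    asl = len(words) / len(sentences)                              # average sentence length
    asw = sum(count_syllables_de(w) for w in words) / len(words)   # avg. syllables per word
    return 180 - asl - 58.5 * asw
```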


The FRE has been shown to be almost identical to other, similar readability measures (Štajner, Evans, Orasan, & Mitkov, 2012) and has been applied to news articles in various countries before (e.g. Amstad, 1978; Dalecki, Lasorsa, & Lewis, 2009; Plavén-Sigray, Matheson, Schiffler, & Thompson, 2017). It ranges from a minimum of zero (very difficult) to a maximum of 100 (very easy).

Data analysis & storage

All hypothesis tests were performed in either Python or SPSS. The code for both the data preparation and the analyses is available in a public GitHub repository (https://github.com/nickma101/Thesis). The raw data on which the code was run, as well as all relevant SPSS output, can be accessed on an OSF server (https://osf.io/rdw9z/).

Results

This thesis explored four automatically measured news quality indicators as well as negativity. Table 2 provides an overview with means and standard deviations for the overall sample and the four subsamples under study (see Appendix F for newspaper comparisons).

Since the dependent variables were not normally distributed (see Appendix G), all following analyses relied on statistical approaches that do not require normally distributed data.

Actor diversity

The first political news quality indicator under study was the diversity of actors. H1 assumed that national newspapers exhibit greater degrees of actor diversity than regional newspapers, and H2 assumed that modality moderates this effect in such a way that online news exhibits greater differences than print news. The two hypotheses were tested through an ordinal regression in SPSS, with the reach and modality dummies as predictors, article length as a covariate, and the diversity index as the dependent variable. The results in Table 3 supported H2 but not H1: contrary to what H1 had expected, the odds of a regional article exhibiting a higher degree of actor diversity were 1.167 (95% CI [1.082, 1.258]) times those of a national article. H2 was supported, as an interaction effect showed that the odds of an online article by a regional newspaper exhibiting higher levels of actor diversity were .71 (95% CI [.599, .905]).1

Overall, H2 was thus supported, whereas H1 was rejected, as regional newspapers displayed higher actor diversity when controlling for article length and comparable actor diversity when article length was not taken into account.
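The ordinal regressions were estimated in SPSS. For readers working in Python, a roughly equivalent ordinal logit can be fit with statsmodels (version 0.12 or later); the snippet below uses synthetic stand-in data and is not the exact model reported in Table 3:

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Synthetic stand-in for the article-level data (the real analysis used N = 11,491 articles).
rng = np.random.default_rng(8)
df = pd.DataFrame({
    "diversity": rng.integers(0, 5, 500),      # 0-4 actor diversity index
    "regional":  rng.integers(0, 2, 500),      # 1 = regional, 0 = national
    "online":    rng.integers(0, 2, 500),      # 1 = online, 0 = print
    "length":    rng.integers(100, 1500, 500)  # article length in words
})
df["regional_x_online"] = df["regional"] * df["online"]

model = OrderedModel(df["diversity"],
                     df[["regional", "online", "length", "regional_x_online"]],
                     distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```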

Table 2. Overall means and standard deviations of dependent variables.

Group     N        Diversity    Impartiality   Emotionality   Negativity   Readability     Length
total     11,491   2.52 (.80)   1.39 (.80)     .11 (.03)      .05 (.02)    40.68 (10.67)   495.96 (709.94)
online    1,072    2.27 (.84)   1.38 (.81)     .12 (.03)      .06 (.03)    40.66 (10.10)   460.12 (310.21)
print     10,419   2.55 (.79)   1.39 (.80)     .11 (.03)      .05 (.02)    40.68 (10.73)   499.65 (738.81)
regional  5,038    2.52 (.84)   1.49 (.77)     .11 (.04)      .04 (.02)    40.66 (10.90)   399.33 (976.15)
national  6,453    2.52 (.76)   1.30 (.81)     .12 (.03)      .05 (.02)    40.70 (10.90)   571.40 (375.08)

Means with standard deviations in brackets. Length is calculated as the average number of words in a text.

1 The results must be interpreted with caution due to poor model fit and a violation of the assumption of proportional odds (see Appendix H). Independent Kruskal-Wallis H tests showed a significant mean difference between online (M= 2.27, SD= .84) and print newspapers (M= 2.55, SD= .79), H(1)= 112.70, p < .001, but a non-significant one between national (M= 2.52, SD= .76) and regional newspapers (M= 2.52, SD= .84), H(1)= .69, p= .407.


Table 3. Ordinal regression results for the effects of reach, modality, and article length on actor diversity and impartiality.

Parameter estimates
                         Actor Diversity                            Impartiality
Parameters               B (SE)            OR (95% CI)              B (SE)            OR (95% CI)
Diversity index = 0      -4.686 (.12)***   .009 (.007 – .012)       –                 –
Diversity index = 1      -2.236 (.05)***   .107 (.097 – .118)       –                 –
Diversity index = 2      .252 (.04)***     1.286 (1.189 – 1.391)    –                 –
Diversity index = 3      2.490 (.04)***    12.061 (1.082 – 1.258)   –                 –
Impartiality = 0         –                 –                        -3.457 (.06)***   .032 (1.189 – 1.391)
Impartiality = 1         –                 –                        -.890 (.04)***    .378 (.378 – .446)
Impartiality = 2         –                 –                        1.494 (.05)***    4.046 (4.046 – 4.905)
Reach = regional         .154 (.04)***     1.167 (1.082 – 1.258)    -.012 (.04)       .988 (.916 – 1.066)
Modality = online        -.514 (.08)***    .598 (.511 – .700)       -.224 (.08)**     .799 (.682 – .937)
Length                   .001 (<.01)***    1.001 (1.000 – 1.001)    -.002 (<.01)***   .998 (1.048 – 1.699)
Reach x Modality         -.341 (.12)**     .711 (.559 – .905)       .288 (.12)*       1.334 (1.048 – 1.699)
R2                       .022                                       .177

N = 11,491. OR (95% CI) = odds ratios with 95% confidence intervals. Function = logit. The diversity and impartiality indexes are the intercepts. *p< .05, **p<.01, ***p<.001

Impartiality

RQ1 asked whether impartiality differs depending on a) the modality (online vs. print) and b) the type (national vs. regional) of newspaper outlets. This RQ was answered through an ordinal regression with the modality and reach dummies as well as article length as predictors of the dependent variable impartiality. The results in Table 3 showed that the odds of an online article exhibiting a higher degree of impartiality were .80 (95% CI [.682, .937]) times those of a print article. This effect was statistically significant, X2(1)= 7.65, p= .006. In contrast, the reach of an article had a statistically non-significant effect, X2(1)= .09, p= .760. Article length had a very minor but significant effect on impartiality, with an odds ratio of 1.00 (95% CI [1.048, 1.699]), X2(1)= -.002, p < .001. Overall, the model explained 17.7% of the variance in the dependent variable and fit the data significantly better than an intercept-only model, X2(4)= 2007.99, p < .001. While the model failed to pass the test of parallel lines (X2(8)= 192.04, p < .001), its outcomes still strongly suggest that online news articles are on average less impartial than print articles, whereas the reach of a newspaper does not appear to make a difference when length is controlled for (see Appendix H for model fit).

Emotionality & negativity

This thesis assumed that the use of emotionality and negativity is driven by the amount of competition that newspapers face and the resources that they have at their disposal. Specifically, it argued that online news articles (H3a) and regional news articles (H3b) tend to exhibit significantly more negativity than print and national news respectively, as they try to catch readers' attention in a way that does not necessarily rely on resource-intensive quality reporting. Furthermore, online news articles were expected to feature less emotionality than print news articles (H4), as they tend to rely more on agency copy.

All hypotheses were tested by means of two linear ordinary least squares regressions in SPSS – one with emotionality and a second one with negativity as the dependent variable. The three predictors, namely modality, reach, and article length, were added stepwise, thereby allowing for a comparison of model fit with different predictor variables. For both regression models, model fit increased significantly with each added predictor. Overall, the regression model for negativity explained 4.1% and the regression model for emotionality explained 2.1% of the overall variance. See Appendix I for an overview of model fit measures.
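These OLS models were estimated in SPSS; an equivalent specification with statsmodels' formula API (again on synthetic stand-in data, with illustrative variable names) would look like this:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data; in the thesis the predictors were added stepwise in SPSS.
rng = np.random.default_rng(8)
df = pd.DataFrame({
    "emotionality":  rng.normal(0.11, 0.03, 500),
    "negativity":    rng.normal(0.05, 0.02, 500),
    "print_edition": rng.integers(0, 2, 500),   # 1 = print, 0 = online
    "national":      rng.integers(0, 2, 500),   # 1 = national, 0 = regional
    "length":        rng.integers(100, 1500, 500),
})

emotionality_model = smf.ols("emotionality ~ print_edition + national + length", data=df).fit()
negativity_model = smf.ols("negativity ~ print_edition + national + length", data=df).fit()
print(emotionality_model.summary())
```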

Table 4. Ordinary least squares (OLS) regressions for emotionality, negativity, and readability

                    Emotionality                 Negativity                  Readability
                    b (SE)            β          b (SE)           β          b (SE)    β
Constant            .114 (.001)                  .054 (.001)                 40.64
Modality = print    -.006 (.001)      -.053***   -.011            -.136***   .019      .001
Reach = national    .008 (.001)       .116***    .007             .140***    .033      .002
Length              .000003 (.000)    .056***    .000001 (.000)   .041***
R2                  .021                         .041                        <.001

SE: standard error. N = 11,491. *p < .05 **p < .01 ***p < .001.

While the difference in negativity between online (M= .06, SD= .03) and print newspapers (M= .05, SD= .02) was rather small overall, the results of the regression analysis in Table 4 show that this difference was indeed significant, even when controlling for an article's length. Thus, H3a was supported. In contrast, the regression results in Table 4 did not support H3b, as national newspapers exhibited slightly more negativity (M= .05, SD= .02) than regional newspapers (M= .04, SD= .02) when controlling for article length. Regarding H4, the results of the regression analysis in Table 4 showed that emotionality was higher in online news articles (M= .12, SD= .03) than it was in print news articles (M= .11, SD= .03). While this result contradicts the initial hypothesis, it aligns with the negativity results and indicates that online news articles might generally employ more emotional words in order to attract readers' attention.

RQ2 examined differences in emotionality between national and regional newspapers. Mirroring the results for negativity, the regression results in Table 4 showed that news articles published by national newspapers (M= .12, SD= .03) featured a higher proportion of emotional words than articles published by regional newspapers (M= .11, SD= .04).

Figure 3. Emotionality ratio by modality. Figure 4. Emotionality ratio by reach.

Figure 5. Negativity ratio by modality. Figure 6. Negativity ratio by reach.

Figure 7. Readability score by modality. Figure 8. Readability score by reach.


This could indicate that regional newspapers rely more on agency copy in order to make up for a lack of resources.

However, in light of a) the low percentage of variance that the two models explained and b) the fact that in large samples even small differences can become significant, it is important to note that the differences between newspapers with different reach and modalities were rather small on the aggregate and do not explain the variation in emotionality or negativity very well – especially given the large variation in individual news articles' scores (see Figures 3 to 8) and the fact that paid-for online content was missing from the sample.

Readability

RQ3 explored differences in readability across modalities (online vs. print) and reach (national vs. regional). It was answered through a linear ordinary least squares regression with the readability score as the dependent and the modality and reach dummies as the independent variables. The independent variables were added step by step. Adding reach to the base model with modality as the only predictor led to a significantly better model. The regression results in Table 4 showed that, on the aggregate, there was no significant difference in readability between online (M= 40.66, SD= 10.10) and print (M= 40.68, SD= 10.73) newspapers, nor was there a significant difference between national (M= 40.70, SD= 10.90) and regional (M= 40.66, SD= 10.90) newspapers. Despite interesting and statistically significant differences on the newspaper level2, the overall readability scores for each newspaper were somewhat close to the value of 40, indicating that the articles were somewhat difficult to read but still understandable for a large part of the population (see Appendix J for newspaper comparisons).

2 For example, Die Aachener Zeitung appeared to be comparably difficult to read (see Appendix J).

Conclusion & Discussion

In an ideal democracy, news media should provide citizens with high-quality political information so that they can best perform their civic duties (Strömbäck, 2005). This thesis automatically measured four important news quality indicators to explore if and to what extent different modalities and types of German newspapers adhere to this ideal. The results revealed that both the modality and the reach of a newspaper play a role in determining its performance in terms of news quality indicators.

Modality affected news quality insofar as print news exhibited a more diverse set of actors and a higher degree of impartiality, as well as a less emotional and less negative reporting style. This largely aligns with Burggraaff and Trilling's (2017) assumption that the comparably high online competition leads to content with lower news quality. However, the results for emotionality and negativity deviate from the outcomes one would expect if online news relied more on agency copy, as suggested by Welbers et al. (2018). That said, the different levels of news quality across modalities might also be (partly) due to only freely available articles being scraped. Possibly, German newspapers offer a certain extent of free but lower-quality content online, whereas high-quality content must be paid for. Future research should try to find ways to overcome the difficulties of sampling paid-for articles in order to investigate whether this assumption is true, especially as it might have important societal implications for news consumers who are not willing to pay for online subscriptions.

For newspapers with different levels of reach, fewer differences emerged, although regional newspapers did appear to report in a less emotional and negative manner. This might either be due to a reliance on agency copy as a result of comparably limited resources (Welbers et al., 2018) or attest to the comparably high quality of regional German newspapers (Humprecht & Esser, 2018) – a notion that was supported by comparably high levels of actor diversity and impartiality among the regional sample. A third explanation might be that this thesis' differentiation of reach a) constitutes an oversimplification, as two of the three sampled national newspapers also cater to local audiences, and b) underestimates the high levels of readership that regional newspapers have (Esser & Brüggemann, 2010), which might translate into considerable financial resources that can be invested in quality reporting.

Thus, future research should explore different newspaper classifications, for example differentiating them by their number of subscribers in order to account for differences in their available resources.

Apart from the theoretical contributions of this thesis, its arguably biggest value lies in proposing a scalable framework for the assessment of news quality and, especially, in the creation of classifiers that assess impartiality. While the classifiers did not reach optimal performance in terms of their precision and recall, and while the balance of actors classifier seemed to overstate the prevalence of a balanced set of actors, they still show the potential of SML approaches for journalism studies, especially since such methodological approaches are still under-utilised (Boumans & Trilling, 2016). Future research should build on this work either by advancing the existing classifiers or by developing new and more comprehensive measurements for impartiality and other high-level constructs. Furthermore, scholars should use automated approaches such as the one taken in this study in order to assess the development of news quality indicators over time. In contrast to the comparison of newspaper categories, research into the over-time development of news quality could better address the arguments that have been put forward about the implications of commercialisation (e.g. Burggraaff & Trilling, 2017; Humprecht & Esser, 2018; Jacobi et al., 2016).

Naturally, the results of this thesis must be interpreted in light of several important limitations. First, the server issues led to a comparably small number of online political news articles, which might somewhat impede their comparability with the considerably larger print sample. Second, the classifiers, especially the one for balance of actors, did not reach optimal performance. Third, the large sample size (N= 11,489) might have rendered even small differences statistically significant. Hence, it is important to stress that most mean differences in the sample were rather small – at least across the different modalities and levels of reach. This relates directly to a fourth limitation, namely the classification of newspapers. The regression models with modality and reach as predictors mostly accounted for only a small amount of variance in the dependent variables. Given the at times considerable differences between individual newspapers, this suggests that other factors such as available resources (Masini et al., 2018), journalistic style, and reporting culture might be more important.

Lastly, it is important to stress that dictionary-based approaches cannot substitute for the analytical depth and contextualisation that manual content analyses (MCA) provide (Boyd & Crawford, 2012), as they rely on language models that are at best an approximation of the real phenomenon (Grimmer & Stewart, 2013).

Nonetheless, by combining MCA with ACA through SML, this thesis has shown that automated research can not only provide basic insights into news quality, but that it can in fact also be used to capture high-level constructs in a resource-efficient and scalable manner.

Since doing so holds great potential for comparative studies, this thesis hopes to inspire future applications of SML that draw on and extend the framework employed by this study.


References

Amstad, T. (1978). Wie verständlich sind unsere Zeitungen?[How understandable are our

newspapers?]. Unpublished doctoral dissertation, University of Zürich, Switzerland.

Bennett, W. L. (1996). An introduction to journalism norms and representations of politics.

Political Communication 13(4), 373–384.

https://doi.org/10.1080/10584609.1996.9963126

Benson, R. (2009). What makes news more multiperspectival? A field analysis. Poetics, 37(5-

6), 402-418. https://doi.org/10.1016/j.poetic.2009.09.002

Benson, R., & Wood, T. (2015). Who says what or nothing at all? Speakers, frames, and

frameless quotes in unauthorized immigration news in the United States, Norway, and

France. American Behavioral Scientist, 59(7), 802-821.

https://doi.org/10.1177/0002764215573257

Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing

text with the natural language toolkit. O'Reilly Media, Inc..

Björnsson, C. H. (1983). Readability of newspapers in 11 languages. Reading Research

Quarterly, 480-497. https://doi.org/10.2307/747382

Boudana, S. (2016). Impartiality is not fair: Toward an alternative approach to the evaluation

of content bias in news stories. Journalism, 17(5), 600-618.

https://doi.org/10.1177/1464884915571295

Boukes, M., & Vliegenthart, R. (2020). A general pattern in the construction of economic

newsworthiness? Analyzing news factors in popular, quality, regional, and financial

newspapers. Journalism, 21(2), 279-300. https://doi.org/10.1177/1464884917725989

Boumans, J. W., & Trilling, D. (2016). Taking stock of the toolkit: An overview of relevant

automated content analysis approaches and techniques for digital journalism scholars.

Digital Journalism, 4(1), 8–23. https://doi.org/10.1080/21670811.2015.1096598

Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662-679. https://doi.org/10.1080/1369118X.2012.678878
Brüggemann, M., Engesser, S., Büchel, F., Humprecht, E., & Castro, L. (2016). Framing the newspaper crisis. Journalism Studies, 17(5), 533-551. http://dx.doi.org/10.1080/1461670X.2015.1006871
Burggraaff, C., & Trilling, D. (2017). Through a different gate: An automated content analysis of how online news and print news differ. Journalism. https://doi.org/10.1177/1464884917716699
Carpenter, S. (2010). A study of content diversity in online citizen journalism and online newspaper articles. New Media & Society, 12(7), 1064-1084. https://doi.org/10.1177/1461444809348772
Carpenter, S., Boehmer, J., & Fico, F. (2016). The measurement of journalistic role enactments: A study of organizational constraints and support in for-profit and nonprofit journalism. Journalism & Mass Communication Quarterly, 93(3), 587-608. https://doi.org/10.1177/1077699015607335
Castells, M. (2013). Communication power. Oxford University Press.
Cushion, S., Lewis, J., & Callaghan, R. (2017). Data journalism, impartiality and statistical claims: Towards more independent scrutiny in news reporting. Journalism Practice, 11(10), 1198-1215. https://doi.org/10.1080/17512786.2016.1256789
Cushion, S., Kilby, A., Thomas, R., Morani, M., & Sambrook, R. (2018). Newspapers, impartiality and television news: Intermedia agenda-setting during the 2015 UK general election campaign. Journalism Studies, 19(2), 162-181. https://doi.org/10.1080/1461670X.2016.1171163
Cushion, S., & Thomas, R. (2019). From quantitative precision to qualitative judgements: Professional perspectives about the impartiality of television news during the 2015 UK General Election. Journalism, 20(3), 392-409. https://doi.org/10.1177/1464884916685909
Czymara, C. S., & van Klingeren, M. (2019). New perspective? Comparing frame occurrence in online and traditional news media reporting on Europe's "Migration Crisis". https://doi.org/10.31235/osf.io/h3tpy
Dalecki, L., Lasorsa, D. L., & Lewis, S. C. (2009). The news readability problem. Journalism Practice, 3(1), 1-12. https://doi.org/10.1080/17512780802560708
Esser, F. (1999). 'Tabloidization' of news: A comparative analysis of Anglo-American and German press journalism. European Journal of Communication, 14(3), 291-324. https://doi.org/10.1177/0267323199014003001
Esser, F., & Brüggemann, M. (2010). The strategic crisis of German newspapers. In The changing business of journalism and its implications for democracy (pp. 39-54).
Esser, F., & Umbricht, A. (2013). Competing models of journalism? Political affairs coverage in US, British, German, Swiss, French and Italian newspapers. Journalism, 14(8), 989-1007. https://doi.org/10.1177/1464884913482551
Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267-297. https://doi.org/10.1093/pan/mps028
Gruber, J. (2020). LexisNexisTools: An R package for working with newspaper data from 'LexisNexis'. Retrieved from https://github.com/JBGruber/LexisNexisTools
Hallin, D. C., & Mancini, P. (2004). Comparing media systems: Three models of media and politics. Cambridge University Press. https://doi.org/10.1017/CBO9780511790867
Hiles, S. S., & Hinnant, A. (2014). Climate change in the newsroom: Journalists' evolving standards of objectivity when covering global warming. Science Communication, 36(4), 428-453. https://doi.org/10.1177/1075547014534077
Honnibal, M., & Montani, I. (2017). spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing.
Humprecht, E., & Esser, F. (2018). Diversity in online news: On the importance of ownership types and media system types. Journalism Studies, 19(12), 1825-1847. https://doi.org/10.1080/1461670X.2017.1308229
Jacobi, C., Kleinen-von Königslöw, K., & Ruigrok, N. (2016). Political news in online and print newspapers: Are online editions better by electoral democratic standards? Digital Journalism, 4(6), 723-742. https://doi.org/10.1080/21670811.2015.1087810
Jungnickel, K. (2011). Nachrichtenqualität aus Nutzersicht. Ein Vergleich zwischen Leserurteilen und wissenschaftlich-normativen Qualitätsansprüchen. M&K Medien & Kommunikationswissenschaft, 59(3), 360-378. https://doi.org/10.5771/1615-634x-2011-3-360
Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 2053951714528481. https://doi.org/10.1177/2053951714528481
Knobloch-Westerwick, S., Mothes, C., & Polavin, N. (2020). Confirmation bias, ingroup bias, and negativity bias in selective exposure to political information. Communication Research, 47(1), 104-124. https://doi.org/10.1177/0093650217719596
Lau, R. R., & Redlawsk, D. P. (2001). Advantages and disadvantages of cognitive heuristics in political decision making. American Journal of Political Science, 951-971. https://doi.org/10.2307/2669334
Leung, D. K., & Lee, F. L. (2015). How journalists value positive news: The influence of professional beliefs, market considerations, and political attitudes. Journalism Studies, 16(2), 289-304. https://doi.org/10.1080/1461670X.2013.869062
Locke, J. (1967). Locke: Two treatises of government. Cambridge University Press.
Maras, S. (2013). Objectivity in journalism. John Wiley & Sons.
Masini, A., Van Aelst, P., Zerback, T., Reinemann, C., Mancini, P., Mazzoni, M., ... & Coen, S. (2018). Measuring and explaining the diversity of voices and viewpoints in the news: A comparative study on the determinants of content diversity of immigration news. Journalism Studies, 19(15), 2324-2343. https://doi.org/10.1080/1461670X.2017.1343650
Masini, A., & Van Aelst, P. (2017). Actor diversity and viewpoint diversity: Two of a kind? Communications, 42(2), 107-126. https://doi.org/10.1515/commun-2017-0017
McManus, J. H. (2009). The commercialization of news. In The handbook of journalism studies (pp. 238-254). Routledge.
McQuail, D. (1992). Media performance: Mass communication and the public interest (Vol. 144). London: Sage.
Plasser, F. (2005). From hard to soft news standards? How political journalists in different media systems evaluate the shifting quality of news. Harvard International Journal of Press/Politics, 10(2), 47-68. https://doi.org/10.1177/1081180X05277746
Plavén-Sigray, P., Matheson, G. J., Schiffler, B. C., & Thompson, W. H. (2017). The readability of scientific texts is decreasing over time. eLife, 6, e27725. https://doi.org/10.7554/eLife.27725.029
Ramírez de la Piscina, T., Gonzalez Gorosarri, M., Aiestaran, A., Zabalondo, B., & Agirre, A. (2015). Differences between the quality of the printed version and online editions of the European reference press. Journalism, 16(6), 768-790. https://doi.org/10.1177/1464884914540432
Rauh, C. (2018). Validating a sentiment dictionary for German political language—a workbench note. Journal of Information Technology & Politics, 15(4), 319-343. https://doi.org/10.1080/19331681.2018.1485608
Remus, R., Quasthoff, U., & Heyer, G. (2010). SentiWS – A publicly available German-language resource for sentiment analysis. In Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC'10) (pp. 1168-1171).
Schoenbach, K., & Lauf, E. (2002). Content or design? Factors influencing the circulation of American and German newspapers. Communications, 27(1), 1-14. https://doi.org/10.1515/comm.27.1.1
Soroka, S., & McAdams, S. (2015). News, politics, and negativity. Political Communication, 32(1), 1-22. https://doi.org/10.1080/10584609.2014.881942
Štajner, S., Evans, R., Orasan, C., & Mitkov, R. (2012). What can readability measures really tell us about text complexity. In Proceedings of the Workshop on Natural Language Processing for Improving Textual Accessibility (NLP4ITA) (pp. 14-21).
Strömbäck, J. (2005). In search of a standard: Four models of democracy and their normative implications for journalism. Journalism Studies, 6(3), 331-345. https://doi.org/10.1080/14616700500131950
Trilling, D., Van De Velde, B., Kroon, A. C., Löcherbach, F., Araujo, T., Strycharz, J., ... & Jonkman, J. G. (2018, October). INCA: Infrastructure for content analysis. In 2018 IEEE 14th International Conference on e-Science (e-Science) (pp. 329-330). IEEE. https://doi.org/10.1109/eScience.2018.00078
Urban, J., & Schweiger, W. (2014). News quality from the recipients' perspective: Investigating recipients' ability to judge the normative quality of news. Journalism Studies, 15(6), 821-840. https://doi.org/10.1080/1461670X.2013.856670
Van Hoof, A. M., Jacobi, C., Ruigrok, N., & Van Atteveldt, W. (2014). Diverse politics, diverse news coverage? A longitudinal study of diversity in Dutch political news during two decades of election campaigns. European Journal of Communication, 29(6), 668-686. https://doi.org/10.1177/0267323114545712
Voakes, P. S., Kapfer, J., Kurpius, D., & Chern, D. S. Y. (1996). Diversity in the news: A conceptual and methodological framework. Journalism & Mass Communication Quarterly, 73(3), 582-593. https://doi.org/10.1177/107769909607300306
Wahl-Jorgensen, K., Berry, M., Garcia-Blanco, I., Bennett, L., & Cable, J. (2017). Rethinking balance and impartiality in journalism? How the BBC attempted and failed to change the paradigm. Journalism, 18(7), 781-800. https://doi.org/10.1177/1464884916648094
Waltinger, U. (2010, May). GermanPolarityClues: A lexical resource for German sentiment analysis. In LREC (pp. 1638-1642).
Welbers, K., Van Atteveldt, W., Kleinnijenhuis, J., & Ruigrok, N. (2018). A gatekeeper among gatekeepers: News agency influence in print and online newspapers in the Netherlands. Journalism Studies, 19(3), 315-333. https://doi.org/10.1080/1461670X.2016.1190663

Appendix A Quality criteria for news as proposed by Urban and Schweiger (2014)


Appendix B

Sample distribution

Table 1. Sample distribution across news outlets and modalities.

Newspaper Print articles Online articles Total articles

Der Tagesspiegel (national) 1,286 264 1,550

Die Süddeutsche (national) 3,720 175 3,895

Die Welt (national) 831 177 1,008

Aachener Zeitung (regional) 970 168 1,138

Rheinische Post (regional) 2,375 173 2,548

Stuttgarter Zeitung (regional) 1,237 115 1,350

Total 10,419 1,072 11,491


Appendix C Codebook for manual content analysis of impartiality training material

V1. Coder ID
01 Coder 1
02 Coder 2
03 Coder 3
04 Coder 4

V2. Article identification number

V3. News outlet
1 Aachener Zeitung
2 Stuttgarter Zeitung
3 Rheinische Post
4 Der Tagesspiegel
5 Die Welt
6 Die Süddeutsche

V4. Who is the central political actor in the story? (if in doubt, see list of actors below)
1 A governing party or a member of it on the national level
2 An opposition party or a member of it on the national level
3 A governing party or a member of it on the regional level
4 An opposition party or a member of it on the regional level
5 A foreign/international politician, party, or organisation
6 No political actor mentioned

Indicators of importance are…
… duration, space of information about the actor
… frequency of being mentioned
… quotes, statements of the actor
… mentioned in the headline or teaser

Notes:
➢ If two actors are equally prominent in the article with regard to the above criteria, then count the number of references to each actor and choose the one who is most often referred to. However, this rule only applies if two actors are really exactly evenly prominent with regard to the above criteria.
➢ Everything that happens on the federal state level or below is considered regional.
➢ If there are two equally central actors of opposing categories, code for the one that is mentioned first (headline included).
➢ Foreign/international actors are all political actors that are not working in German politics. This includes foreign countries, heads of states or other foreign politicians, foreign parties, international political organisations (e.g. NATO, EU) and also German politicians that work on the EU level.
➢ It doesn't matter if political actors are not very prominent in an article. As long as the article mentions at least a single political actor once or more, that is enough to code for a central political actor.
➢ See Appendix A for a list of relevant politicians and parties per category.

V5. Balance of political viewpoints: "Is the standpoint of the central political actor challenged by another actor in the text?"
1 Yes
2 No

Notes:
➢ A challenge has to be expressed in the form of a quote (either direct or indirect).
➢ The challenging actor can either be …
… another political actor (of the same or a different party), or
… another actor such as an expert, a journalist, or anyone else who is relevant in the context of the article's topic, or
… the author.
➢ Challenging a viewpoint means critically engaging with it. Therefore, it encompasses not only contrary statements, but also criticisms of particular aspects.


Example (for code 1):
➢ Tübingens Oberbürgermeister Boris Palmer (Grüne) hat Forderungen nach einem Parteiaustritt zurückgewiesen. „Selbstverständlich trete ich nicht aus meiner Partei aus“, sagte Palmer am Freitag der „Bild“-Zeitung. „Ich bleibe weiterhin aus ökologischer Überzeugung Mitglied der Grünen. Da die Vorwürfe gegen mich von meinen Gegnern erfunden beziehungsweise konstruiert worden sind, gibt es überhaupt keinen Grund, darüber nachzudenken.“ Der Landesvorstand der Grünen in Baden-Württemberg hatte den umstrittenen Kommunalpolitiker zuvor zum Parteiaustritt aufgefordert. Mit seinen Äußerungen stelle sich Palmer gegen politische Werte und Grundsätze der Partei und agiere „systematisch“ gegen sie, erklärte der Landesvorstand nach einer Sitzung am Freitagabend. Mit seinem Auftreten diene der Politiker „nicht der politischen oder innerparteilichen Debatte, sondern der persönlichen Profilierung“.

V6. Balance of political sources: "Does the article quote two or more different types of actors (V4)?"
1 Yes
2 No

Notes:
➢ A quote can be either direct, indirect, or a mix of the two; e.g.:
o „Selbstverständlich trete ich nicht aus meiner Partei aus“, sagte Palmer am Freitag der „Bild“-Zeitung
o Flynn's Eingeständnis, dass er im Dezember 2016, also vor der Amtseinführung Trumps, den russischen Botschafter bei einem zunächst bestrittenen Geheimtelefonat um eine zurückhaltende Reaktion auf die vom amtierenden Präsidenten Barack Obama verhängten Sanktionen bat, war ein wichtiger Beleg für die Zusammenarbeit der Trump-Kampagne mit Moskau
o Mit seinen Äußerungen stelle sich Palmer gegen politische Werte und Grundsätze der Partei und agiere „systematisch“ gegen sie, erklärte der Landesvorstand nach einer Sitzung am Freitagabend.
➢ The actor types are explained under V4 as well as in the Appendix.


V7. Neutrality: "Does the author personally evaluate anything within the article?"
1 Yes
2 No

Notes:
➢ A personal evaluation is made when the article's author comments on an actor, a thing, or a topic in a way that renders them either positive or negative. Importantly, this evaluation must be made by the author. Citing other sources' evaluations does not count as a personal evaluation (here, it is important to consider the context; often, an article will make a statement in one sentence and then provide a source in the next one).
➢ Personal evaluations need to be made explicit (not just a feeling).
➢ An evaluation can also take the form of a word or phrase that describes something in a positive or negative way.
➢ If an article concerns different topics and actors and only a part of it is evaluated by the author, the article still needs to be coded with a 1.

Examples for code 1:
➢ „Doch es kam bekanntlich anders. An jenem 5. Februar begannen vier Chaos-Wochen, in denen die Presse, die im Erfurter Landtag dauercampte, mit ihren Eilmeldungen kaum hinterherkam. … Aber kaum war die schwerste Krise in der jüngeren Geschichte Thüringens überstanden, begann die noch schwerere, globale Krise.“
➢ „Eva Högl ist neue Wehrbeauftragte, die erste Sozialdemokratin in dem Amt. Aber statt Stolz löst das Frust bei der SPD aus.“
➢ „Das könnte man als klassische Meckerei der Opposition abtun.“
➢ „In den USA ist die Pandemie alles andere als überstanden, doch Donald Trump denkt nur an seine Wiederwahl. Er weiß nicht, was er tut – und gefährdet damit Menschenleben.“
➢ „Flynn's Eingeständnis, dass er im Dezember 2016, also vor der Amtseinführung Trumps, den russischen Botschafter bei einem zunächst bestrittenen Geheimtelefonat um eine zurückhaltende Reaktion auf die vom amtierenden Präsidenten Barack Obama verhängten Sanktionen bat, war ein wichtiger Beleg für die Zusammenarbeit der Trump-Kampagne mit Moskau“


List of relevant political parties and actors per category
The following list displays all relevant politicians on the national and regional level. If an actor is mentioned in one of the following lists, they should be coded as either a national or a regional governing actor. If both apply, code for national. If a political actor is not present in this list, code as an opposition actor (either national or regional, depending on the article).

Governing parties
National government: CDU, CSU, SPD
Baden-Württemberg: Grüne, CDU
Bayern: CSU, Freie Wähler
Berlin: SPD, Die Linke, Grüne
Brandenburg: SPD, CDU, Grüne
Bremen: SPD, Grüne, Die Linke
Hamburg: SPD, Grüne
Hessen: CDU, Grüne
Mecklenburg-Vorpommern: SPD, CDU
Niedersachsen: SPD, CDU
Nordrhein-Westfalen: CDU, FDP
Rheinland-Pfalz: SPD, FDP, Grüne
Saarland: CDU, SPD
Sachsen: CDU, Grüne, SPD
Sachsen-Anhalt: CDU, SPD, Grüne
Schleswig-Holstein: CDU, Grüne, FDP
Thüringen: Die Linke, SPD, Grüne
Note: The states in bold print are the ones from which the regional newspapers are sampled.

Members of parliament (from governing parties) National Government CDU Elisabeth Winkelmeier- Eva Högl Becker Nikolas Löbel Jan-Marco Luczak Johannes Kahrs Thomas Bareiß CSU Thomas de Maizière Dorothee Bär André Berghegger Hans-Georg von der Bärbel Kofler Marwitz Hansjörg Durz 42

Peter Beyer Hans-Peter Friedrich Christian Lange Alexander Hoffmann Michael Brand Karsten Möring Kirsten Lühmann Axel Müller Carsten Müller Michael Kießling Isabel Mackensen Sepp Müller Ulrich Lange Marie-Luise Dött Hermann Färber Siemtje Möller Gerd Müller Bettina Müller Christoph Ploß Stefan Müller Detlef Müller Hans-Joachim Fuchtel Georg Nüßlein Michelle Müntefering Ingo Gädechens Florian Oßner Rolf Mützenich Alois Rainer Ursula Groden-Kranich Johannes Röring Mahmut Özdemir Hermann Gröhe Norbert Röttgen Aydan Özoğuz Klaus-Dieter Gröhler Christian Schmidt Michael Grosse-Brömer Erwin Rüddel Astrid Grotelüschen Markus Grübel Anita Schäfer Wolfgang Schäuble Monika Grütters Volker Ullrich Fritz Güntzler Nadine Schön Jürgen Hardt SPD Klaus-Peter Schulze Sönke Rix Ingrid Arndt-Brauer René Röspel Michael Roth Mark Helfrich Susann Rüthrich Bernd Rützel Björn Simon Sören Bartol Bärbel Bas Axel Schäfer Karl-Heinz Brunner Hans-Jürgen Irmer 43

Andreas Jung Torbjörn Kartes Stefan Kaufmann Hermann-Josef Tebroke Hans-Jürgen Thies ] Rita Schwarzelühr- Sutter Carsten Körber Alexander Krauß Angelika Glöckner Martina Stamm-Fibich Günter Krings Rüdiger Kruse Michael Groß Roy Kühne Uli Grötsch Karl A. Lamers Andreas Lämmel Rita Hagl-Kehl Markus Töns Carsten Träger Peter Weiß Marja-Liisa Völlers Sabine Weiss Dirk Vöpel Barbara Hendricks Annette Widmann-Mauz Gabriele Hiller-Ohm Gülistan Yüksel Klaus-Peter Willsch Jens Zimmermann

Baden Württemberg Grüne Manfred Lucha CDU Joachim Kößler Alexander Maier Norbert Beck Sabine Kurtz Theresia Bauer Thomas Marwein Alexander Becker Siegfried Lorek Susanne Bay Bärbl Mielich Thomas Blenke Winfried Mack Hans-Peter Behrens Bernd Murschel Klaus Burger Claudia Martin Andrea Bogner-Unden Jutta Niemann Andreas Deuschle Paul Nemeth Sandra Boser Reinhold Pix Thomas Dörflinger Christine Neumann Martina Braun Thomas Poreski Konrad Epple Claus Paal Nese Erikli Daniel Renkonen Arnulf Freiherr von Eyb Julia Philippi Jürgen Filius Markus Rösler Marion Gentges Patrick Rapp Josha Frey Barbara Saebel Fabian Gramling Nicole Razavi Martin Grath Alexander Salomon Friedlinde Gurr-Hirsch Wolfgang Reinhart Petra Häffner Alexander Schoch Manuel Hagel Karl-Wilhelm Röhm Martin Hahn Andrea Schwarz Sabine Hartmann- Karl Rombach Willi Halder Andreas Schwarz Müller Volker Schebesta Thomas Hentschel Uli Sckerl Raimund Haser Stefan Scheffold Stefanie Seemann Peter Hauk August Schuler Hermann Katzenstein Ulli Hockenberger Albrecht Schütte 44

Manfred Kern Nicole Hoffmeister- Willi Stächele Petra Krebs Thekla Walker Kraut Stefan Teufel Jürgen Walter Isabell Huber Tobias Wald Daniel Lede Abal Dorothea Wehinger Karl Klein Guido Wolf Ute Leidig Elke Zimmer Wilfried Klenk Karl Zimmermann Andrea Lindlohr Brigitte Lösch

Nordrhein Westphalen CDU Kirstin Korte Marco Schmitz Lorenz Deutsch Günther Bergmann Wilhelm Korth Thomas Schnelle Markus Diekhoff Peter Biesenbach Oliver Krauß Rüdiger Scholz Angela Freimuth Jörg Blöming Bernd Krückel Fabian Schrumpf Jörn Freynick Marc Blondin André Kuper Christina Schulze Yvonne Gebauer Frank Boss Föcking Marcel Hafke Florian Braun Olaf Lehne Daniel Sieveke Martina Hannen Rainer Deppe Lutz Lienenkämper Martin Sträßer Stephan Haupt Guido Déus Bodo Löttgen Andrea Stullich Henning Höne Angela Erwin Arne Moritz Raphael Tigges Stefan Lenzen Björn Franken Stefan Nacke Heike Troles Marc Lürbke Heinrich Frieling Jens-Peter Nettekoven Christian Untrieser Christian Mangen -Dreisbach Ralf Nolten Marco Voge Rainer Matheisen Katharina Gebauer Britta Oellers Petra Vogt Bodo Middeldorf Jörg Geerlings Marcus Optendrenk Margret Voßeler Franziska Müller-Rech Matthias Goeken Dietmar Panske Klaus Voussem Thomas Nückel Gregor Golland Patricia Peill Simone Wendland Stephen Paul Daniel Hagemeier Bernd Petelkau Heike Wermer Werner Pfeil Wilhelm Hausmann Romina Plonsker Bianca Winkelmann Christof Rasche Bernhard Hoppe-Biermeyer Peter Preuß Hendrik Wüst Ulrich Reuter Josef Hovenjürgen Charlotte Quik Susanne Schneider Klaus Kaiser Henning Rehbaum FDP Joachim Stamp Jens Kamieth Jochen Ritter Daniela Beihl Andreas Terhaag Christos Georg Katzidis Frank Rock Ralph Bombis Ralf Witzel Oliver Kehrl Thorsten Schick Dietmar Brockes Matthias Kerkhoff Claudia Schlottmann Alexander Brockmeier Jochen Klenner Hendrik Schmitz


Appendix D Details about the Manual Content Analysis including procedure, payment and an overview of reliability measures

Procedure & payment
In total, four different student coders worked on the manual content analysis. One of them was the researcher; the other three were recruited through the researcher's personal network. All coders were enrolled in university education at the time of the project. The coding was preceded by a short coder training in which seven specifically selected news articles were coded individually and then discussed in an online Zoom meeting. Following this discussion, the codebook as well as the Qualtrics coding form were adjusted in order to minimise confusion and ambiguity. After that, the researcher sent out individual coding material to each student coder on a regular basis. The coding was done online through a Qualtrics form. The three student coders were paid €12.50 for every hour of work. The payment was processed through the University of Amsterdam under the Digital Communication Lab's research master thesis grant.
Intercoder reliability
In total, three different sets of articles were coded by two or more of the four student coders, because a) the student coders varied in the number of hours they had available and b) some data was not available at the beginning of the coding procedure. In total, 499 news articles were coded. See Table 1 for an overview of the datasets and the coders involved.

Table 1. Datasets for MCA.
Dataset | Source | N | Coders
Dataset A | Print | 250 | All coders
Dataset B | Online | 150 | Coders 1 & 2
Dataset C | Print | 99 | Coders 1 & 3

Intercoder reliability was assessed by means of Krippendorff's alpha and Cohen's Kappa. Overall, the results of the two intercoder-reliability tests were good for neutrality, but only borderline acceptable for balance of actors and balance of viewpoints. In the following, I elaborate on the outcomes of the intercoder-reliability tests per dataset.
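As a sketch of how these two coefficients can be computed, the snippet below uses the Python krippendorff package and scikit-learn; the coder ratings shown are hypothetical placeholders rather than the actual coding data.

```python
# Minimal sketch (hypothetical ratings): Krippendorff's alpha and Cohen's kappa
# for two coders on a nominal variable such as neutrality (1 = yes, 2 = no).
import krippendorff
from sklearn.metrics import cohen_kappa_score

coder_1 = [1, 2, 2, 1, 1, 2, 1, 2, 2, 1]
coder_2 = [1, 2, 2, 1, 2, 2, 1, 2, 1, 1]

# krippendorff.alpha expects one row of codes per coder.
alpha = krippendorff.alpha(reliability_data=[coder_1, coder_2],
                           level_of_measurement="nominal")

# Cohen's kappa for the same pair of coders.
kappa = cohen_kappa_score(coder_1, coder_2)

print(f"Krippendorff's alpha: {alpha:.3f}; Cohen's kappa: {kappa:.3f}")
```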

Dataset A
Intercoder reliability for the initial print dataset was assessed on a sample of 10% of the overall dataset (N = 25). For the variable neutrality, Krippendorff's alpha (α = .79) was well above Neuendorf's (2002) threshold for satisfactory reliability and close to the .8 threshold for good reliability. However, for the variable balance of viewpoints, Krippendorff's alpha (α = .53) was below Neuendorf's (2002) threshold for satisfactory reliability, as was the alpha for balance of actors (α = .63). It has to be noted, however, that Neuendorf (2002) suggests a sample size of at least 50 in order to obtain reliable results. In light of this, additional intercoder reliability tests between each pair of coders were performed with Cohen's Kappa. Despite still being far from ideal, all tests revealed close to moderate, moderate, or better intercoder reliability. See Tables 2, 3, and 4 for an overview of the exact results.

Table 2. Cohen's kappa for balance of actors.
 | Coder 1 | Coder 2 | Coder 3 | Coder 4
Coder 1 | - | .593 | .715 | .453
Coder 2 | | - | .675 | .434
Coder 3 | | | - | .733
Coder 4 | | | | -

Table 3. Cohen's kappa for balance of viewpoints.
 | Coder 1 | Coder 2 | Coder 3 | Coder 4
Coder 1 | - | .630 | .593 | .757
Coder 2 | | - | .412 | .394
Coder 3 | | | - | .453
Coder 4 | | | | -

Table 4. Cohen's kappa for neutrality.
 | Coder 1 | Coder 2 | Coder 3 | Coder 4
Coder 1 | - | .760 | .920 | .840
Coder 2 | | - | .682 | .601
Coder 3 | | | - | .920
Coder 4 | | | | -

Dataset B
Intercoder reliability for the online dataset was assessed on a sample of 10% of the overall dataset (N = 15). The results for the variables neutrality (α = .87) and balance of viewpoints (k = .86) were good. However, the result for the variable balance of actors (α = .61) was below the advised threshold of .67 (Neuendorf, 2002). Still, with a Cohen's Kappa of .62 there appeared to be what the literature refers to as substantial agreement. See Table 5 for an overview.

Table 5. Cohen's kappa for Dataset B.
 | Coder 2 BOA | Coder 2 BOV | Coder 2 NEU
Coder 1 BOA | .615 | - | -
Coder 1 BOV | - | .857 | -
Coder 1 NEU | - | - | .865

Dataset C
For the second print dataset, intercoder reliability was assessed on a subsample of 10 articles (10%). Krippendorff's alpha for neutrality (α = .79) and balance of viewpoints (α = .75) were both well above the recommended threshold of .67 (Neuendorf, 2002). For balance of actors, no statistic could be computed as there was no variation in the subsample. That said, both coders agreed in 100% of the cases. These outcomes are further reflected in the Cohen's Kappa values reported in Table 6.

Table 6. Cohen's kappa for Dataset C.
 | Coder 3 BOA | Coder 3 BOV | Coder 3 NEU
Coder 1 BOA | - | - | -
Coder 1 BOV | - | .737 | -
Coder 1 NEU | - | - | .783

Coding hours and articles coded
Table 7 presents an overview of the number of hours that the three external student coders worked, as well as the number of articles that they and the thesis author coded.

Table 7. Coding hours and articles coded.
Coder | Hours worked | Number of articles coded
Coder 1 | - | 128
Coder 2 | 16 | 166
Coder 3 | 10 | 155
Coder 4 | 14 | 38
Note: Reliability material only included once, by the coder who coded it first.


Appendix E Distribution of training data for the SML and final classification results

Table 1. Labelled datasets indicating the distribution of different classes.
Outlet | N | Balance of Actors (Yes) | Balance of Actors (No) | Balance of Viewpoints (Yes) | Balance of Viewpoints (No) | Neutrality (Yes) | Neutrality (No)
Die Welt | 53 | 25 (47%) | 28 (53%) | 29 (55%) | 24 (45%) | 23 (43%) | 30 (57%)
Die Süddeutsche | 142 | 49 (35%) | 93 (65%) | 69 (49%) | 73 (51%) | 74 (52%) | 68 (48%)
Der Tagesspiegel | 81 | 24 (30%) | 57 (70%) | 41 (51%) | 40 (49%) | 41 (51%) | 40 (49%)
Aachener Zeitung | 51 | 20 (39%) | 31 (61%) | 16 (31%) | 35 (69%) | 36 (71%) | 15 (29%)
Rheinische Post | 99 | 34 (34%) | 65 (66%) | 45 (46%) | 54 (54%) | 65 (66%) | 34 (34%)
Stuttgarter Zeitung | 61 | 15 (25%) | 46 (75%) | 24 (40%) | 37 (60%) | 44 (72%) | 17 (28%)
Total | 487 | 167 (34%) | 320 (66%) | 224 (46%) | 263 (54%) | 283 (58%) | 204 (42%)

For further information on the coding process consult Appendix C.

Table 2. Classification results for neutrality.
Model | Data source | Vector type | Category | Precision | Recall | F1-score | Support
Support Vector Machines | Cleaned text | Count | 0 | .68 | .60 | .63 | 42
Support Vector Machines | Cleaned text | Count | 1 | .72 | .79 | .75 | 56
Support Vector Machines | Cleaned text | Count | Overall | | | .70 | 98
K-nearest neighbour | Original text | TF-IDF with uni- & bigrams | 0 | .60 | .67 | .63 | 42
K-nearest neighbour | Original text | TF-IDF with uni- & bigrams | 1 | .73 | .66 | .69 | 56
K-nearest neighbour | Original text | TF-IDF with uni- & bigrams | Overall | | | .66 | 98
Naïve Bayes | Cleaned text | Count | 0 | .55 | .71 | .62 | 42
Naïve Bayes | Cleaned text | Count | 1 | .72 | .55 | .63 | 56
Naïve Bayes | Cleaned text | Count | Overall | | | .62 | 98
Note: Only the three best results are reported. Balanced classifiers were prioritised. Distribution of categories: category 0: N = 204; category 1: N = 283.


Table 3. Classification results for Balance of Actors.
Model | Data source | Vector type | Category | Precision | Recall | F1-score | Support
K-nearest neighbour | Original text | TF-IDF with uni- & bigrams | 0 | .77 | .73 | .75 | 63
K-nearest neighbour | Original text | TF-IDF with uni- & bigrams | 1 | .43 | .48 | .46 | 27
K-nearest neighbour | Original text | TF-IDF with uni- & bigrams | Overall | | | .66 | 90
Support Vector Machines | Cleaned text | Count | 0 | .74 | .78 | .76 | 63
Support Vector Machines | Cleaned text | Count | 1 | .42 | .37 | .39 | 27
Support Vector Machines | Cleaned text | Count | Overall | | | .66 | 90
Naïve Bayes | Original text | Count | 0 | .73 | .89 | .80 | 63
Naïve Bayes | Original text | Count | 1 | .46 | .22 | .30 | 27
Naïve Bayes | Original text | Count | Overall | | | .69 | 90
Note: Only the three best results are reported. Balanced classifiers were prioritised. Distribution of categories: category 0: N = 297; category 1: N = 152.

Table 4. Classification results for Balance of Viewpoints.
Model | Data source | Vector type | Category | Precision | Recall | F1-score | Support
Stochastic Gradient Descent | Original text | Count | 0 | .82 | .68 | .75 | 41
Stochastic Gradient Descent | Original text | Count | 1 | .52 | .70 | .60 | 20
Stochastic Gradient Descent | Original text | Count | Overall | | | .69 | 61
K-nearest neighbour | Cleaned text | TF-IDF with unigrams | 0 | .74 | .85 | .80 | 41
K-nearest neighbour | Cleaned text | TF-IDF with unigrams | 1 | .57 | .40 | .47 | 20
K-nearest neighbour | Cleaned text | TF-IDF with unigrams | Overall | | | .70 | 61
Support Vector Machines | Original text | TF-IDF with unigrams | 0 | .76 | .71 | .73 | 41
Support Vector Machines | Original text | TF-IDF with unigrams | 1 | .48 | .55 | .51 | 20
Support Vector Machines | Original text | TF-IDF with unigrams | Overall | | | .66 | 61
Note: Only the three best results are reported. Balanced classifiers were prioritised. Distribution of categories: category 0: N = 212; category 1: N = 89.

Table 5. Impartiality indicator distribution for the manual content analysis sample and the total sample.
Variable | Category | MCA sample distribution | Total sample distribution
Balance of actors | 0 | 297 (66%) | 4,347 (38%)
Balance of actors | 1 | 152 (34%) | 7,144 (62%)
Balance of viewpoints | 0 | 212 (70%) | 8,410 (73%)
Balance of viewpoints | 1 | 89 (30%) | 3,081 (27%)
Neutrality | 0 | 204 (42%) | 5,779 (50%)
Neutrality | 1 | 283 (58%) | 5,712 (50%)
Note: Online articles were more prevalent in the MCA data than in the overall sample.
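To make the classification setup behind Tables 2 to 4 more concrete, the sketch below shows one way such a classifier can be trained and evaluated with scikit-learn; the file name and column names are hypothetical, and the actual preprocessing and hyperparameters used in this thesis may differ.

```python
# Minimal sketch (assumed file and column names): training and evaluating one
# impartiality classifier (here: neutrality) on manually labelled articles.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

labelled = pd.read_csv("labelled_articles.csv")  # hypothetical labelled data

X_train, X_test, y_train, y_test = train_test_split(
    labelled["text"], labelled["neutrality"],
    test_size=0.2, stratify=labelled["neutrality"], random_state=42)

# TF-IDF with uni- and bigrams, as in one of the reported configurations.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("svm", LinearSVC(class_weight="balanced")),
])
clf.fit(X_train, y_train)

# Per-class precision, recall, F1 and support, as reported in Tables 2 to 4.
print(classification_report(y_test, clf.predict(X_test)))
```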


Appendix F Means and standard deviations for all dependent variables per newspaper

Table 1. Overall means and standard deviations of dependent variables.

Outlet modality N Diversity Impartiality Emotionality Negativity Readability Length

AZ Total 1,138 2.51 (.85) 1.48 (.78) .109 (.037) .047 (.025) 38.27 (11.40) 471.83 (2003.44)

Print 970 2.53 (.84) 1.49 (.77) .108 (.038) .045 (.024) 38.61 (11.83) 484.01 (2168.39)

Online 168 2.43 (.87) 1.46 (.80) .119 (.009) .057 (.052) 36.29 (8.30) 401.48 (197.70)

RP Total 2,548 2.51 (.83) 1.52 (.77) .107 (.034) .041 (.023) 41.68 (10.77) 364.83 (238.36)

Print 2,375 2.54 (.83) 1.51 (.77) .106 (.034) .040 (.022) 41.85 (10.82) 377.46 (240.33)

Online 173 2.25 (.82) 1.61 (.73) .117 (.034) .058 (.030) 39.34 (9.89) 338.70 (208.06)

STZ Total 1,350 2.53 (.84) 1.46 (.77) .114 (.036) .048 (.025) 40.77 (10.40) 384.50 (248.59)

Print 1,237 2.60 (.79) 1.45 (.78) .114 (.036) .048 (.025) 40.65 (10.45) 394.24 (246.07)

Online 115 1.78 (.97) 1.57 (.74) .117 (.037) .056 (.031) 42.02 (9.88) 279.68 (252.44)

Welt Total 1,008 2.61 (.79) 1.12 (.78) .125 (.028) .056 (.019) 41.24 (10.05) 740.08 (393.01)

Print 831 2.63 (.79) 1.08 (.76) .126 (.028) .056 (.019) 41.60 (9.87) 774.44 (390.82)

Online 177 2.53 (.81) 1.30 (.82) .118 (.030) .057 (.023) 39.52 (10.74) 578.75 (362.89)

TS Total 1,550 2.60 (.72) 1.32 (.81) .118 (.031) .052 (.022) 40.57 (11.32) 574.82 (358.22)

Print 1,286 2.66 (.71) 1.36 (.81) .117 (.031) .051 (.021) 40.18 (11.63) 574.95 (360.53)

Online 264 2.35 (.72) 1.13 (.80) .124 (.032) .061 (.053) 43.03 (9.30) 574.19 (347.39)

SZ Total 3,895 2.47 (.77) 1.35 (.81) .116 (.032) .049 (.023) 40.57 (10.24) 526.38 (364.30)

Print 3,720 2.48 (.79) 1.34 (.81) .116 (.032) .051 (.023) 40.46 (10.20) 529.37 (367.38)

Online 175 2.10 (.76) 1.38 (.83) .119 (.032) .057 (.026) 42.87 (10.86) 462.93 (284.73)

Note: Means with standard deviations in brackets. AZ = Aachener Zeitung; RP = Rheinische Post; STZ = Stuttgarter Zeitung; Welt = Die Welt; TS = Der Tagesspiegel; SZ = Die Süddeutsche Zeitung.


Appendix G Assessment of normality for all dependent variables under study

Table 1. Normality tests for actor diversity.

Group | N | Kolmogorov-Smirnov test statistic | p-value
total | 11,491 | .238 | < .001
online | 1,072 | .264 | < .001
print | 10,419 | .238 | < .001
regional | 5,038 | .232 | < .001
national | 6,453 | .255 | < .001

Table 2. Normality tests for impartiality.

Group | N | Kolmogorov-Smirnov test statistic | p-value
total | 11,491 | .254 | < .001
online | 1,072 | .257 | < .001
print | 10,419 | .254 | < .001
regional | 5,038 | .245 | < .001
national | 6,453 | .262 | < .001

Table 3. Normality tests for emotionality.

Group | N | Kolmogorov-Smirnov test statistic | p-value
total | 11,491 | .028 | < .001
online | 1,072 | .031 | .017
print | 10,419 | .030 | < .001
regional | 5,038 | .033 | < .001
national | 6,453 | .025 | < .001

Table 4. Normality tests for negativity.

Group | N | Kolmogorov-Smirnov test statistic | p-value
total | 11,491 | .039 | < .001
online | 1,072 | .060 | < .001
print | 10,419 | .036 | < .001
regional | 5,038 | .046 | < .001
national | 6,453 | .038 | < .001

Table 5. Normality tests for readability.

Group | N | Kolmogorov-Smirnov test statistic | p-value
total | 11,491 | .045 | < .001
online | 1,072 | .039 | .001
print | 10,419 | .046 | < .001
regional | 5,038 | .046 | < .001
national | 6,453 | .045 | < .001
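For reference, such Kolmogorov-Smirnov checks can be run in Python with SciPy as sketched below; the data frame and column names are hypothetical, and estimating the distribution parameters from the same sample makes the resulting p-values approximate (a Lilliefors-type correction would be stricter).

```python
# Minimal sketch (assumed file and column names): Kolmogorov-Smirnov tests
# against a normal distribution for several groups, as in Tables 1-5.
import pandas as pd
from scipy import stats

articles = pd.read_csv("article_scores.csv")  # hypothetical per-article scores

def ks_normality(series):
    """KS test against a normal distribution with sample-estimated parameters."""
    return stats.kstest(series, "norm", args=(series.mean(), series.std()))

groups = {
    "total": articles,
    "online": articles[articles["modality"] == "online"],
    "print": articles[articles["modality"] == "print"],
}
for name, subset in groups.items():
    statistic, p_value = ks_normality(subset["readability"])
    print(f"{name}: D = {statistic:.3f}, p = {p_value:.3f}, N = {len(subset)}")
```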


Appendix H Kruskal-Wallis H tests for group differences in impartiality between different outlets & model fit for ordinal regression models

Table 1. Kruskal-Wallis H tests for mean differences in impartiality across article modality and reach.
Comparison | M (SD) | N (total) | Test statistic | p-value
Regional vs. national | 1.49 (.77) vs. 1.30 (.81) | 11,491 | .288 | < .001
Online vs. print | 1.38 (.81) vs. 1.39 (.80) | 11,491 | 166.297 | < .001
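A Kruskal-Wallis H test for such group differences can be computed with SciPy, as in the brief sketch below; the data frame and column names are hypothetical.

```python
# Minimal sketch (assumed column names): Kruskal-Wallis H test for differences
# in impartiality between regional and national articles.
import pandas as pd
from scipy import stats

articles = pd.read_csv("article_scores.csv")  # hypothetical per-article scores

regional = articles.loc[articles["reach"] == "regional", "impartiality"]
national = articles.loc[articles["reach"] == "national", "impartiality"]

h_statistic, p_value = stats.kruskal(regional, national)
print(f"H(1) = {h_statistic:.3f}, p = {p_value:.3f}")
```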

Model fit for ordinal regression model with actor diversity as the DV

The model fit significantly better than an intercept-only model; X2(4) = 235.85, p < .001. However, the model fit according to the Pearson chi-square was poor (X2(12292) = 897070271841, p < .001), which might in part be due to the large sample size. Furthermore, the model failed the test of parallel lines; X2(12) = 627.72, p < .001. This suggests that the assumption of proportional odds was violated, casting doubt on the overall regression results. In total, the model explained only 2.2% of the variance in the data, as indicated by Nagelkerke's pseudo R2.

As a robustness check, the analysis was re-run with a linear OLS regression model that treated the dependent variable as continuous. While the stepwise addition of predictors each significantly improved the model fit, the final model still only explained 1.6% of the variance in the dependent variable. However, the results did align better with the independent Kruskal-Wallis H test, which showed no significant mean difference between national (M = 2.52, SD = .76) and regional newspapers (M = 2.52, SD = .84); H(1) = .69, p = .407, as the model showed no significant effect for an article's reach (b = -.01, p = .326).

Model fit for ordinal regression model with impartiality as the DV

The model fit significantly better than an intercept-only model; X2(4) = 2007.99, p < .001. However, the goodness-of-fit chi-square values and their significance could not be calculated due to floating point overflow. Furthermore, the model failed the test of parallel lines; X2(8) = 192.04, p < .001. This suggests that the assumption of proportional odds was violated, casting doubt on the overall regression results. In total, the model explained 17.7% of the variance in the data, as indicated by Nagelkerke's pseudo R2.


Appendix I Model fit and assumption checks for OLS regressions

Emotionality

Adding the reach dummy to a model with only modality as a predictor significantly increased the model fit; X2(1) = 252.29, p < .001. Adding article length as a covariate improved the model fit further; X2(1) = 20.07, p < .001. In total, the model explained 4.1% of the variance in the dependent variable.
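The stepwise model comparisons can be illustrated with statsmodels, as sketched below; the file and column names are hypothetical, and the likelihood-ratio chi-square shown here is one way to obtain model-comparison statistics of the kind reported above.

```python
# Minimal sketch (assumed column names): hierarchical OLS models for
# emotionality, comparing fit as predictors are added.
import pandas as pd
import statsmodels.formula.api as smf

articles = pd.read_csv("article_scores.csv")  # hypothetical per-article data

m1 = smf.ols("emotionality ~ C(modality)", data=articles).fit()
m2 = smf.ols("emotionality ~ C(modality) + C(reach)", data=articles).fit()
m3 = smf.ols("emotionality ~ C(modality) + C(reach) + length", data=articles).fit()

# Likelihood-ratio chi-square for each added predictor.
lr_reach, p_reach, df_reach = m2.compare_lr_test(m1)
lr_length, p_length, df_length = m3.compare_lr_test(m2)

print(f"Adding reach: X2({int(df_reach)}) = {lr_reach:.2f}, p = {p_reach:.3f}")
print(f"Adding length: X2({int(df_length)}) = {lr_length:.2f}, p = {p_length:.3f}")
print(f"R2 of the full model: {m3.rsquared:.3f}")
```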


Negativity

Adding the reach dummy to a model with only modality as a predictor significantly increased the model fit; X2(1) = 175.29, p < .001. Adding article length as a covariate improved the model fit further; X2(1) = 36.82, p < .001. In total, the model explained 2.1% of the variance in the dependent variable.


Readability

Adding reach to a model with only modality as a predictor did not significantly increase model fit; X2(1) = .027, p = .871. Furthermore, the model explained less than .1% of the variance in the dependent variable, suggesting that neither an article's reach nor its modality affects its readability.


Appendix J Kruskal-Wallis H tests for group differences in readability scores between different outlets

Table 1. Descriptive statistics and Kruskal-Wallis test results for group differences between all outlets and national outlets

Outlet | N | AZ | STZ | RP | TS | DW | SZ
Aachener Zeitung (AZ) | 1,138 | - | -734.86*** | -1081.96*** | -684.53*** | -854.97*** | -656.58***
Stuttgarter Zeitung (STZ) | 1,352 | | - | -347.10* | -50.34 | -120.11 | -78.28
Rheinische Post (RP) | 2,548 | | | - | -397.43** | -226.99 | -425.38***
Der Tagesspiegel (TS) | 1,550 | | | | - | -170.44 | -27.95
Die Welt (DW) | 1,008 | | | | | - | -198.39
Die Süddeutsche (SZ) | 3,895 | | | | | | -
Note: *p < .05, **p < .01, ***p < .001. Cells show pairwise Kruskal-Wallis H test results for mean differences.
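One way to approximate such pairwise comparisons in Python is sketched below, using Mann-Whitney U tests with a Bonferroni correction; the data frame and column names are hypothetical, and the post-hoc procedure actually used for the table may differ.

```python
# Minimal sketch (assumed column names): Bonferroni-corrected pairwise
# Mann-Whitney U tests for readability differences between outlets.
from itertools import combinations

import pandas as pd
from scipy import stats

articles = pd.read_csv("article_scores.csv")  # hypothetical per-article data

pairs = list(combinations(articles["outlet"].unique(), 2))
for outlet_a, outlet_b in pairs:
    u_statistic, p_raw = stats.mannwhitneyu(
        articles.loc[articles["outlet"] == outlet_a, "readability"],
        articles.loc[articles["outlet"] == outlet_b, "readability"])
    p_adjusted = min(p_raw * len(pairs), 1.0)  # Bonferroni adjustment
    print(f"{outlet_a} vs. {outlet_b}: U = {u_statistic:.1f}, p(adj.) = {p_adjusted:.4f}")
```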
