Media Bias, the Social Sciences, and NLP: Automating Frame Analyses to Identify Bias by Word Choice and Labeling
Total Page:16
File Type:pdf, Size:1020Kb
Media Bias, the Social Sciences, and NLP: Automating Frame Analyses to Identify Bias by Word Choice and Labeling Felix Hamborg Dept. of Computer and Information Science University of Konstanz, Germany [email protected] Abstract Even subtle changes in the words used in a news text can strongly impact readers’ opinions (Pa- Media bias can strongly impact the public per- pacharissi and de Fatima Oliveira, 2008; Price et al., ception of topics reported in the news. A dif- 2005; Rugg, 1941; Schuldt et al., 2011). When re- ficult to detect, yet powerful form of slanted news coverage is called bias by word choice ferring to a semantic concept, such as a politician and labeling (WCL). WCL bias can occur, for or other named entities (NEs), authors can label example, when journalists refer to the same the concept, e.g., “illegal aliens,” and choose from semantic concept by using different terms various words to refer to it, e.g., “immigrants” or that frame the concept differently and conse- “aliens.” Instances of bias by word choice and label- quently may lead to different assessments by ing (WCL) frame the referred concept differently readers, such as the terms “freedom fighters” (Entman, 1993, 2007), whereby a broad spectrum and “terrorists,” or “gun rights” and “gun con- trol.” In this research project, I aim to devise of effects occurs. For example, the frame may methods that identify instances of WCL bias change the polarity of the concept, i.e., positively and estimate the frames they induce, e.g., not or negatively, or the frame may emphasize specific only is “terrorists” of negative polarity but also parts of an issue, such as the economic or cultural ascribes to aggression and fear. To achieve effects of immigration (Entman, 1993). this, I plan to research methods using natural In the social sciences, research over the past language processing and deep learning while decades has resulted in comprehensive models to employing models and using analysis concepts from the social sciences, where researchers describe media bias as well as effective methods have studied media bias for decades. The first for the analysis of media bias, such as content anal- results indicate the effectiveness of this inter- ysis (McCarthy et al., 2008) and frame analysis disciplinary research approach. My vision is (Entman, 1993). Because researchers need to con- to devise a system that helps news readers to duct these analyses mostly manually, the analyses become aware of the differences in media cov- do not scale with the vast amount of news that is erage caused by bias. published nowadays (Hamborg et al., 2019a). In turn, such studies are always conducted for topics 1 Introduction in the past and do not deliver insights for the cur- Media bias describes differences in the content or rent day (McCarthy et al., 2008; Oliver and Maney, presentation of news (Hamborg et al., 2018). It is a 2000); this would, however, be of primary interest ubiquitous phenomenon in news coverage that can to people reading the news. Revealing media bias have severely negative effects on individuals and so- to news consumers would also help to mitigate bias ciety, e.g., when slanted news coverage influences effects and, for example, support them in making voters and, in turn, also election outcomes (Alsem more informed choices (Baumer et al., 2017). et al., 2008; DellaVigna and Kaplan, 2007). Po- In contrast, in computational linguistics and com- tential issues of biased coverage, whether through puter science, fewer approaches systematically an- the selection of topics or how they are covered, are alyze media bias (Hamborg et al., 2019a). The compounded by the fact that in many countries only models used to analyze media bias tend to be sim- a few corporations control large parts of the media pler (Hamborg et al., 2018; Park et al., 2009) com- landscape–in the US, for example, six corporations pared to previously mentioned models. Many ap- control 90% of the media (Business Insider, 2014). proaches analyze media bias from the perspective 79 Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 79–87 July 5 - July 10, 2020. c 2020 Association for Computational Linguistics of news consumers while neglecting both the es- well as (2) target-dependent sentiment classifica- tablished approaches and the comprehensive mod- tion (TSC) including “sentiment shift” and iden- els that have already been developed in the social tification of framing effects and causes (see Sec- sciences (Evans et al., 2004; Mehler et al., 2006; tion 4.2). I plan to embed both techniques into Munson et al., 2013, 2009; Oelke et al., 2012; Park an approach that is inspired by the procedure of et al., 2009; Smith et al., 2014). Correspondingly, manually conducted, inductive frame analyses (cf. their results are often inconclusive or superficial, Section 3.1). despite the approaches being technically superior. For the first technical contribution, a sieve-based CDCR approach was already devised that addresses 2 Research Question, Tasks, and characteristics of coreferences as they often occur Contributions in bias by WCL. The examples in the Abstract To address the issues described in Section1, I de- show that even phrases that are usually considered fine the following research question for my Ph.D. contrary may be coreferential in a set of articles research: How can an automated approach identify reporting on a specific event. For the second tech- instances of bias by word choice and labeling in a nical contribution, i.e., to estimate how a semantic set of English news articles reporting on the same concept may be perceived by people when read- event? To address this research question, I derive ing a news article, I primarily plan to devise and the following research tasks: test neural models that I will design specifically for the task. I also plan to implement a prototype T1. Identify the strengths and weaknesses of man- that includes visualizations to reveal the identified ual and of automated methods used to identify instances of bias by WCL to users of the system. media bias. In the remainder of this document, I will give a T2. Research NLP techniques and required brief overview of manual techniques for the anal- datasets to address these weaknesses. To do ysis of bias by WCL and exemplary results from so, use established bias models and (semi-) the social sciences as well as related, automated automate currently manual analysis methods. approaches (Section3). Section3 concludes with the current research gap, which motivates my Ph.D. T3. Implement a prototype of a media bias iden- research. Section4 describes the tasks that I have tification system that employs the developed already conducted as well as current and future methods to demonstrate the applicability of tasks to complete my Ph.D. research. Section5 the approach in real-world news article collec- describes a preliminary evaluation, which I already tions. The target group of the prototype are completed, as well as remaining tasks. non-expert people. T4. Evaluate the effectiveness of the bias identifi- 3 Related Work cation methods with a test corpus and evaluate The following summarizes an interdisciplinary lit- the effectiveness of using the prototype in a erature review that I conducted as part of my Ph.D. user study. research (T1) (Hamborg et al., 2019a). Combining the expertise of the social sciences and computational linguistics appears beneficial for 3.1 Manual Approaches research on media bias. Thus, the main contribu- In the social sciences, the news production pro- tion of my Ph.D. research will be an approach that cess is an established model that defines nine forms combines models and methods from multiple disci- of media bias and describes where these forms plines. On the one hand, it will leverage established originate from (Baker et al., 1994; Hamborg et al., models from the social sciences to describe media 2019a, 2018; Park et al., 2009). For example, bias and will follow currently manual methods to journalists select events, sources, and from these analyze media bias. On the other hand, it will take sources the information they want to publish in a advantage of scalable methods for text analysis de- news article. While these initial selections are nec- veloped and used in computational linguistics. I essary due to the multitude of real-world events, need to employ and extend the state-of-the-art in they may also introduce bias to the resulting story. two closely related NLP fields (cf. Section4): (1) While writing an article, authors can affect readers’ cross-document coreference resolution (CDCR) as perception of a topic through word choice (cf. Sec- 80 tion1, Baker et al., 1994; Gentzkow and Shapiro, example, Lim et al.(2018); Spinde et al.(2020b) 2006; Oelke et al., 2012). Lastly, for example, the propose to investigate words with a low document placement and size of an article on a website deter- frequency in a set of news articles reporting on the mine how much attention the article will receive. same event, to find potentially biasing words that Researchers in the social sciences primarily con- are characteristic for a single article. NewsCube duct frame analyses or, more generally, content 2.0 employs crowdsourcing to estimate the bias of analyses to identify instances of bias by WCL (Mc- articles reporting on a topic. The system allows Carthy et al., 2008; Oliver and Maney, 2000). In users to annotate WCL in news articles collabora- content analysis, researchers first define one or tively (Park et al., 2011a). more analysis questions or hypotheses. Then, they The most related, fully automated field of meth- gather the relevant news data, and coders system- ods is TSC, which aims to find the connotation of atically read the texts, annotating parts of the texts a phrase regarding a given target.