MSC ARTIFICIAL INTELLIGENCE MASTER THESIS

Modeling the Language of Populist Rhetoric

by PERE-LLUÍS HUGUET CABOT 12345466

March 1, 2021

48 ECTS Nov 19 - Jun 20

Supervisor: Dr. Ekaterina SHUTOVA
Co-supervisor: Dr. David ABADI
Assessor: Dr. Giovanni COLAVIZZA

INSTITUTE FOR LOGIC, LANGUAGE AND COMPUTATION


UNIVERSITY OF AMSTERDAM

Abstract

Institute for Logic, Language and Computation

Master of Science

Modeling the Language of Populist Rhetoric

by Pere-Lluís HUGUET CABOT

In recent years, populism has taken the spotlight with its growth and media presence across various countries worldwide. While socio-economic factors have been considered key to populist attitudes, the interaction between emotions and social identity has lately been scrutinized as crucial to explaining populist attitudes and their rhetoric. At the same time, Natural Language Processing (NLP) has recently provided computational models that tackle more ambitious tasks, enabling the in-depth study of political discourse and populist rhetoric. In this thesis, we provide one of the first computational approaches to populist attitudes and political discourse through the use of deep learning architectures. We incorporate Multi-task Learning (MTL) with auxiliary tasks that act symbiotically with political discourse. We create a new populism-centered dataset (PopulismVsReddit) that enables us to model social identity in social media comments (Reddit) and the influence of biased news. In our work, we observe that metaphors and emotions play an important role when addressing political discourse. Moreover, we find evidence that emotions interact with the attitude different social groups receive online, and we provide significant improvements in identifying out-group sentiment in Reddit comments. Overall, we highlight the importance of emotions in political discourse and the use of multi-task approaches that incorporate them to assess social identity and populist rhetoric.


Acknowledgements

First of all, I want to acknowledge and express my gratitude to both my supervisor Ekaterina Shutova and co-supervisor David Abadi for trusting in me and offering me a project that would spark my interests, academically and personally. Thank you for going the extra mile, especially as your help and commitment were unaffected by the circumstances of the COVID-19 pandemic.

Katia, thank you for the patience and dedication; you helped me keep on track while encouraging my freedom and curiosity within the research conducted. In challenging times, as someone with many responsibilities, you have always given a human factor to your supervision and I really appreciate that. Thank you for your commitment to teaching NLP; it is because of your enthusiasm that I am invested in this field.

David, I will remember our lengthy meetings fondly. You have provided me with a crucial perspective from different fields (psychology and communication science), while being invested in learning about Natural Language Processing and Artificial Intelligence. Thank you for pushing the collaboration between our fields, which has made this thesis possible.

I want to particularly thank Verna Dankers, who laid the groundwork for this thesis, first by helping to spark my interest in NLP as a TA, and afterwards with her invaluable feedback, ideas and collaboration. I want to thank anyone else who has provided information, perspectives or ideas that have contributed in any way to this work, consciously or not, through conversation, sharing a beer or an email. Thank you to my friends back home (Àdel, Gerard, Xavi, etc.) for not holding a grudge about my going abroad and for being remote moral support.

I must also thank my colleagues within our MSc program. Our course has been a great example of how collaboration can be a boost for anyone who takes part in it, and I enjoyed sharing ideas and discussions in our Slack channel. Thank you Christina for making life easier and happier; those are perhaps the most important aspects of a successful thesis.

Finally, thank you to my mother, my father and my sister, because even if family isn't chosen, I would choose them anyway if I had the chance, and I feel immensely lucky to have their support.


Contents

1 Introduction
  1.1 Motivation and Research Questions
  1.2 Methodology and Contributions
  1.3 Thesis Structure

2 Related Work
  2.1 Neural Network Models in NLP
  2.2 Populism
  2.3 Political Bias
  2.4 Emotions
  2.5 Metaphors

3 Modeling Metaphor and Emotion in Political Discourse
  3.1 Related Work
  3.2 Tasks and Datasets
  3.3 Methods
  3.4 Experiments and Results
  3.5 Discussion
  3.6 Conclusion

4 PopulismVsReddit: A Dataset on Emotions, News Bias and Social Identity
  4.1 Dataset Creation
  4.2 Data Analysis
  4.3 Conclusion

5 Modeling the Out-Group through Multi-task Learning
  5.1 Tasks
  5.2 Methods
  5.3 Experiments
  5.4 Results
  5.5 Discussion
  5.6 Conclusion

6 Conclusion
  6.1 Future Work
  6.2 Social Impact and Responsible Use
  6.3 Publications
  6.4 Funding Statement

Bibliography

A Extra Material
  A.1 System
  A.2 Lists
  A.3 Additional Tables
  A.4 Figures

Chapter 1

Introduction

Populism has taken the spotlight in political communication in recent years. Various countries around the globe have experienced a surge of populist rhetoric (Inglehart and Norris, 2016) in both the public and political space. Populism, when understood as a communication strategy, employs political discourse as a channel through different types of media, such as news and social networks (Jagers and Walgrave, 2007). Across these platforms, populism uses a rhetoric that revolves around social identity (Hogg, 2016; Abadi, 2017) and the Us vs. Them argumentation (Mudde, 2004). Social psychological and emotional perspectives have described populist communication strategies (Rico, Guinjoan, and Anduiza, 2017) and demonstrated through experimental research (Wirz et al., 2018) that their operationalization is successful in inducing emotions. Moreover, emotions have been shown to be crucial in shaping public opinion (Demertzis, 2006; Marcus, 2002; Marcus, 2003). At the same time, metaphors also serve as a mechanism to influence public opinion within political discourse (Lakoff, 1991; Musolff, 2004).

Natural Language Processing (NLP) has a wide range of applications, aiming to understand text and perform language processing tasks computationally, which often involve the comprehension of complex language, including political discourse. Previous work has been successful in determining the political affiliation of politicians using parliamentary data (Iyyer et al., 2014) and the political bias of news sources (Li and Goldwasser, 2019; Kiesel et al., 2019). However, there is a lack of computational modeling approaches for populist rhetoric and populist attitudes, the closest approach being hate speech detection (Silva et al., 2016).

1.1 Motivation and Research Questions

This thesis focuses on populism through the lens of Natural Language Processing (NLP). Since populism is an umbrella term usually interpreted from different perspectives, such as ideology, political discourse or rhetoric, exploring populism leads to several questions regarding how to model populist rhetoric computationally. Our main research question is:

1. How can we capture populist rhetoric using Deep Learning models within Natural Language Processing?

In previous research, emotions have been shown to be tied to populist rhetoric (Demertzis, 2006). Recent examples of using emotions in multi-task learning models have shown how they can improve performance on other tasks, such as metaphor detection (Dankers et al., 2019). Certain metaphors (implicit vs. explicit) are used within populist rhetoric to evoke certain emotional reactions. Since emotions have a strong relation to populist attitudes, we aim to explore whether they computationally offer any advantage in MTL setups with populist rhetoric, leading to the research question:

2. How do emotions interact with populist rhetoric and can they contribute to modeling it?

Metaphors are used as mechanisms to engage and convince (Flusberg, Matlock, and Thibodeau, 2018), and are ubiquitous within political discourse (Beigman Klebanov, Diermeier, and Beigman, 2008). We expect them to be beneficial in the context of populist rhetoric, and we also intend to explore the role of metaphor in multi-task learning setups.

3. How do metaphors interact with populist rhetoric and can they contribute to modeling it?

Populist rhetoric adapts to the communication channel through which it is deployed. Biased news sources and fake news are ways of circulating populist attitudes (Schulz, Wirth, and Müller, 2020). Social media interactions show both reactions to and the spread of populist rhetoric (Engesser et al., 2017; Mazzoleni and Bracciale, 2018). Political speeches include populist rhetoric and provide an opportunity to identify populist actors (Hawkins et al., 2019). Due to the lack of available data explicitly labeled for populist rhetoric, we intend to explore discourse that uses populist rhetoric, and to model political discourse in different media.

4. How can we model political discourse or populist rhetoric as used in different media?

Social media constitute a crucial platform for the spread of populist rhetoric (Postill, 2018; Alonso-Muñoz, 2019). Biased news sources have been shown to incite populist rhetoric and are often spread through social media, which act as their amplifier (Speed and Mannion, 2017). Reddit is an online social news platform that allows for such phenomena through threads started by posting a news article URL. We intend to explore how populist rhetoric spreads in social media as a reaction to biased news sources, in order to model such communication behavior.

1.2 Methodology and Contributions

New Transformer-based models, beginning with BERT (Bidirectional Encoder Representations from Transformers) (Devlin et al., 2018), allow us to utilize large-scale language models pretrained in an unsupervised fashion and to fine-tune them on new settings and downstream tasks. We base our models on RoBERTa (Liu et al., 2019b), fine-tuning its pre-trained base model for the different setups and tasks.

Multi-task Learning (MTL) (Caruana, 1993) provides a paradigm within Deep Learning in which models are trained simultaneously on multiple tasks. MTL can provide a novel way to incorporate new information into modeling political discourse and populist rhetoric. Therefore, we aim to utilize data from different sources and MTL approaches with emotions and metaphors as auxiliary tasks. We explore political bias in political and mainstream discourse (politicians vs. newspapers) as well as framing in news sources.

We also collect and annotate a new dataset of Reddit comments posted in response to shared news articles, to study the relation between populist rhetoric and social identity. We call this new dataset PopulismVsReddit and employ it in an MTL setting, in order to exploit the interactions between populist rhetoric, social identity and emotions. We present the first models to jointly learn political discourse related tasks with emotion or metaphor detection. Finally, we provide a model trained simultaneously on emotions and group identification, which recognizes the attitude towards a social group, providing a valuable tool to assess the Us vs. Them rhetoric in social media.

We find that MTL provides a successful setup to boost performance on political discourse tasks as well as on populist attitudes. Moreover, we show the interaction between the auxiliary tasks and the way information is shaped within the network, and provide a qualitative analysis of the results. These insights demonstrate the importance of emotions in populist rhetoric. Furthermore, a data analysis of the annotation results reveals significant differences in the attitudes towards social groups online, as well as the influence of news source bias on shaping those attitudes.

1.3 Thesis Structure

Chapter 2. Provides an overview of background work, which serves to introduce work related to the main body of the thesis and useful context on the topics it revolves around.

Chapter 3. Focuses on political discourse and its use in different media as posed by question 4. We explore how MTL can aid in its comprehension by showing to what extent emotions and metaphors play an important role in tasks such as political bias, framing in news or political affiliation (questions 2 and 3).

Chapter 4. Discusses the creation of a new dataset through crowd-sourced annotation, addressing the lack of available data on populist rhetoric suitable for training Deep Learning models. We describe the data gathering process, the nature of the data and the design of the annotation task. We analyze the annotation results, which contain valuable information on how populist rhetoric spreads within social media. Our annotation procedure includes emotions closely related to populist attitudes. To the best of our knowledge, this dataset is the first of its kind. It will allow us to answer question 1 and to further explore the relationship between populist rhetoric and emotions (question 2).

Chapter 5. We deploy the dataset created in the previous chapter to train several Neural Networks to model the out-group attitude and tackle question 1, and we explore the use of MTL setups using emotions (question 2) and group identification.


Chapter 2

Related Work

2.1 Neural Network Models in NLP

The recent explosion of Deep Learning approaches to NLP, triggered by word embeddings and the Skipgram model (Mikolov et al., 2013), has brought a diverse spectrum of new tasks and models. Convolutional Neural Networks (CNNs) are ubiquitous within Computer Vision, as they are fairly lightweight due to their shared-weight architecture and are translation invariant. As early as 2008, Collobert and Weston (2008) used CNNs in NLP in a Multi-Task Neural Network that encoded sentences with a CNN to predict six different tasks for a given word. CNNs in NLP are mostly applied to classification tasks because of their fixed-size output, but they have also been used in other applications, as discussed in Moreno Lopez and Kalita (2017).

For a long time, Recurrent Neural Networks (RNNs) were the most widely used networks in NLP. Their ability to capture long-term dependencies in text, a property that CNNs cannot achieve to the same degree, is key in several NLP tasks. Long Short-Term Memory (LSTM) cells (Hochreiter and Schmidhuber, 1997) and Gated Recurrent Units (GRU) (Cho et al., 2014) extend the idea behind RNNs to avoid common problems such as the vanishing gradient. ELMo (Embeddings from Language Models) (Peters et al., 2018) made use of Bidirectional LSTMs trained on large corpora to obtain context-aware pre-trained embeddings, which can be used in other tasks through Transfer Learning. However, RNNs by themselves may fall short in capturing very long dependencies, since all the information is "bottle-necked" in the hidden output of the previous time step. This leads to a decline in performance for longer sentences.

Attention is a mechanism that computes a linear transformation of hidden representations, learned from previous states in the network, providing an explicit way to encode information from previous steps. It was first introduced by Bahdanau, Cho, and Bengio (2015) for an encoder/decoder architecture. Attention allowed the decoder to use a linear transformation of the encoder's hidden representations as context vectors, computed for each new word generated, letting the decoder pay attention to different units of the encoder at each generation step. Since then, different versions of Attention have been implemented with significant success. Yang et al. (2016) made use of Hierarchical Attention LSTMs to classify longer texts. In their work, words are first encoded using a Bidirectional LSTM (BiLSTM), and a sentence representation is learned through Attention over the different words; the sentence representations are in turn fed to another BiLSTM and encoded into a document representation by using Attention once again.

Finally, Transformer models have taken advantage of the self-attention mechanism. Vaswani et al. (2017) introduced this type of Attention module in a seq2seq model for translation. The model comprises a stack of multi-head self-attention mechanisms as an encoder, followed by the same number of decoder heads, allowing Attention over the output of the encoder. The model achieved state-of-the-art results and inspired a series of models called Transformers.
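To make the mechanism concrete, the sketch below implements an additive (Bahdanau-style) attention module in PyTorch: each encoder state is scored against the current decoder state, and the scores are normalized into weights that produce a context vector. This is an illustrative sketch only; the class and parameter names are our own and not taken from any of the cited works.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention: scores encoder states against the
    current decoder state and returns a weighted context vector."""
    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states: torch.Tensor, dec_state: torch.Tensor):
        # enc_states: (batch, src_len, enc_dim); dec_state: (batch, dec_dim)
        scores = self.v(torch.tanh(
            self.W_enc(enc_states) + self.W_dec(dec_state).unsqueeze(1)
        )).squeeze(-1)                                   # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)          # attention distribution
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
        return context, weights                          # (batch, enc_dim), (batch, src_len)
```

At each decoding step, the context vector is recomputed from the new decoder state, which is what allows the decoder to attend to different encoder positions over time.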
By training on a masked language modeling objective using the Transformer encoder's multi-head self-attention mechanism, the authors of BERT (Devlin et al., 2018) translated the success of Vaswani et al. (2017) into word- and sentence-level embeddings, achieving state-of-the-art performance on several NLP tasks and providing a Transfer Learning model as ELMo did. The model was trained on a massive generic dataset, but it has since been used on several other tasks by fine-tuning, without the need to train from scratch.

2.1.1 Multi-task Learning

Multi-task Learning (MTL) is the area of Machine Learning where a model is trained jointly on more than one task, in order to improve its performance on one or more of those tasks. MTL is based on the principle that the information behind learning multiple tasks can benefit the understanding of those tasks more than if they had been learned independently from each other. The assumption is usually that related tasks can provide regularization and generalization. When we use MTL to improve performance on a main task by using related tasks, we refer to it as auxiliary learning, where the related tasks are considered auxiliary tasks.

One of the first to point at the relevance of MTL within Machine Learning was Caruana (1993). By examining the use of MTL in different tasks, he describes in which ways MTL may be helpful. Citing his article, "MTL can provide a data amplification effect (1), can allow tasks to eavesdrop on patterns discovered for other tasks (2), and can bias the network towards representations that might be overlooked if the tasks are learned separately (3)." In a later paper, Caruana (1997) sums these up as the auxiliary tasks providing an inductive bias that improves generalization, since the model will be biased to prefer features that are useful across multiple tasks. As he also points out, it is hard to know where and how MTL helps the main task, even after its benefits are proven. Nevertheless, MTL is shown to be a useful approach, in particular for tasks where data may be scarce or where complex tasks are prone to overfitting.

It is also worth mentioning the similarity between Transfer Learning and MTL. Transfer Learning learns a task whose information is then used on other ones, while MTL learns those tasks simultaneously. Their benefits rest on very similar principles of information sharing. The success of Transfer Learning in the recent NLP models discussed in the previous section is at the same time an argument in favor of MTL. Successful Transfer Learning models such as BERT (Devlin et al., 2018) use an MTL approach at pre-training to learn different levels of language tasks, which helps the model generalize better once it is fine-tuned on higher-level tasks like Question Answering or lower-level ones like sentiment classification (SST-2).

In Deep Learning, MTL models can have two types of sharing mechanisms between the tasks. In hard-parameter sharing, the tasks share the hidden network and only the last layers are task-specific, producing each task's output. This setup can help overcome overfitting issues, since the same hidden representations are shared through the network for all tasks. It assumes that the tasks are similar enough that the shared hidden layers can encode meaningful information for all of them, benefiting from domain information shared between tasks. The task-specific layers, known as critical layers, then perform the task-specific transformation. In some situations, the difference between the tasks may call for hierarchical sharing of the layers, where the first layers perform a lower-level shared task and other tasks have more depth in the network before their critical layer (Søgaard and Goldberg, 2016). This could be the case where one task is performed at the word level, and therefore has its critical layer earlier in the network, while the other is sentence classification, which needs to encode a sentence-level representation before performing any inference in its last layer.
In soft-parameter sharing, instead of sharing the network, information is shared between tasks through sharing mechanisms at different points of the network. This can happen through an exchange of information encoded in their hidden representations, or through a shared layer between the tasks. The difference from hard-parameter sharing is that both tasks still have their task-specific networks, and the shared layer adds shared information from the different tasks. For instance, in Liu, Qiu, and Huang (2016), a Coupled-Layer Architecture is used, with task-specific LSTMs that can share information between tasks.

Within NLP, Collobert and Weston (2008) trained a model using hard-parameter MTL on six different tasks, which led to state-of-the-art results at the time. In Dankers et al. (2019), an MTL approach is used to model both emotions and metaphors at different levels. Several MTL approaches are used, from hard-parameter sharing to soft sharing using a Cross-Stitch Network (Misra et al., 2016). In Liu et al. (2019a), a hard-parameter sharing method across the GLUE tasks (Wang et al., 2018) achieved state-of-the-art results by training on all tasks at the same time while fine-tuning BERT.

MTL models can be hard to tune and train successfully. Chen et al. (2018) proposed the GradNorm algorithm, which automatically balances training in deep MTL models by dynamically tuning gradient magnitudes. This can be helpful in both soft- and hard-parameter sharing networks.

Here, we are interested in the interplay between populism, political bias, emotions and metaphors. Since the focus is populism and political bias, we consider metaphors and emotions as auxiliary tasks. This is known as auxiliary learning, where the main task, here political bias or populism, is improved by the use of the auxiliary tasks, metaphor and emotion detection.
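As an illustration of the hard-parameter sharing setup, the sketch below pairs a single shared encoder with one task-specific linear head ("critical layer") per task, using PyTorch and the Huggingface Transformers library. The class name and constructor arguments are our own illustrative choices, not code from a published implementation.

```python
import torch.nn as nn
from transformers import RobertaModel

class HardSharingMTL(nn.Module):
    """Hard-parameter-sharing MTL: one shared encoder, one
    task-specific head per task (illustrative sketch)."""
    def __init__(self, task_label_counts: dict):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-base")  # shared layers
        hidden = self.encoder.config.hidden_size
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden, n_labels)
            for task, n_labels in task_label_counts.items()
        })

    def forward(self, task: str, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]    # first-token encoding as sequence representation
        return self.heads[task](cls)         # task-specific critical layer
```

During auxiliary learning, batches from the main and auxiliary tasks would alternate: every batch updates the shared encoder, but only the head of the task it belongs to.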

2.2 Populism

Populism can be described from different perspectives. Essentially, it is described not as a fully developed political ideology, but as a series of background beliefs and techniques, traditionally centered around the Us vs. Them dichotomy. In one of the first attempts to fully define populism, Mudde (2004) describes it as a thin ideology built around the distinction between "the people", which includes the Us, and the elites, which include the Them, with politics being a tool for the people to achieve the common good or will.

Over time, these descriptions have evolved and the understanding of the Us vs. Them has adapted across countries and ideologies. But the general framework in which populist actors operate can be understood as a populist rhetoric: the use of language to elicit (emotional) responses and gain support. Previous work on outlining populist rhetoric focused on a general description of populism to determine whether a certain text, such as a party manifesto or a political speech, contains what is understood as populist rhetoric or attitudes (Hawkins, 2009; Rooduijn and Pauwels, 2011; Manucci and Weber, 2017). Manual annotation was necessary to perform this analysis, often by experts, which also limited the scope and amount of data used. While this work has been crucial to describe and determine how politicians or the media have used populist rhetoric, the data gathered is too limited to train machine learning models to capture what populist rhetoric contains. Even further, the sole description of what constitutes populist rhetoric is still diffuse and covers many different aspects.

Works like Hawkins et al. (2019) attempt to use holistic grading to assess whether a text is populist or not, in order to later determine the degree of populism of certain political leaders. This work resulted in the creation of a Global Populist Database, which contains political discourses from different countries labeled by their content of populist rhetoric. It is one of the most ambitious projects to systematically approach the use of populist rhetoric across the globe. However, the issue still lies in the nature of such a project: labeling the data requires lots of time and expertise to assess extensive texts, resulting in a rather specific and multilingual dataset for a few populist actors, hence too limited for the purpose of training Deep Learning models.

2.2.1 The Us vs. Them rhetoric

Social identity explores the relations of individuals to social groups. Turner and Reynolds (2010) study the evolution of research into social identity, explaining the Us vs. Them as an inter-group phenomenon and exposing its relation to social identity: the self is hierarchically organized, and it is possible to shift from the intra-group perspective (we) to the inter-group one (us versus them) and vice versa. Within the Us vs. Them concept, the inter-group level has two aspects: in-group assimilation, where one identifies with a group through a shared experience or sense of belonging; and the interaction with the out-group, where outside groups are seen as antagonistic or contrary to the in-group by identifying them as threats. These aspects are mainly explored within social identity theory; however, here we pragmatically refer to them as the Us vs. Them rhetoric. In our work we focus on the out-group interaction, assuming the in-group to be implicit by using data from online communities (i.e., sub-Reddits). While there is no explicit work on out-group sentiment identification within the context of populism and Deep Learning, there is some work on abusive language detection and hate speech (Silva et al., 2016), which is closely related.

2.3 Political Bias

Populism is not exclusive to a certain political perspective or side. There is right-wing populism as well as left-wing populism; however, the main aim of populist rhetoric is to shift public opinion towards a certain agenda (Alonso-Muñoz, 2019; Schroeder, 2019; Hameleers and Vliegenthart, 2020). The recent rise of biased media, through the use of hyperpartisan news (i.e., news exhibiting blind, prejudiced, or unreasoning allegiance to one party, faction, cause, or person), misinformation and fake news, is tied to populism. Moreover, the great majority of fake news is hyperpartisan (Potthast et al., 2018) and tied to populist communication (Speed and Mannion, 2017).

That is why in this work we also explore the political bias behind textual content, such as news sources or political speeches. This is not a new task within NLP and has been previously explored. In most cases, data for the task is obtained through Distant Supervision, where labels are obtained indirectly from the political bias of the authors or publishers rather than by labeling the textual data itself. Political speech can be assigned a certain bias via the party membership of the politician, and news media publishers often have an established political bias. AllSides (AllSides Ratings) and Media Bias/Fact Check (Search and Learn the Bias of News Media) are common sources to learn about the bias of various news sources.

The Convote dataset (Thomas, Pang, and Lee, 2006) is one of the first examples of established text datasets to include political bias. It consists of US congressional-speech data, where each speech expresses a party spokesman's support for or opposition to a bill. Together with RNNs and word2vec embeddings, it was used in the work of Iyyer et al. (2014) to detect party membership based on political speeches. In the same paper, data from the Ideological Books Corpus (Sim et al., 2013) is used, which contains a collection of books and magazine articles by authors with a publicly known political ideology.

The same issue has been tackled for news media. Recently, SemEval-2019 (Task 4: Hyperpartisan News Detection) (Kiesel et al., 2019) revolved around the challenge of predicting the hyperpartisan argumentation of a given news article. The task involved a dataset containing 645 manually annotated articles, of which 238 were labeled as hyperpartisan, to create a classification model. The first team (Jiang et al., 2019) constructed a sentence representation as an average of pre-trained ELMo embeddings of its words and then predicted the label with a five-layer CNN with different filter sizes. The second team (Srivastava et al., 2019) used a set of handcrafted features, such as polarity and bias from lexicons and the use of superlatives and comparatives, as well as semantic features from word and sentence encoders: GloVe, Doc2Vec and the Universal Sentence Encoder (Pennington, Socher, and Manning, 2014; Le and Mikolov, 2014; Cer et al., 2018).

Li and Goldwasser (2019) use Twitter social information encoded with a Graph Neural Network combined with a Hierarchical LSTM (HLSTM) to predict the bias of news articles, given by distant supervision from the publisher's known bias. To encode the Twitter social information, they used the known bias of public figures and of the users who shared each article to predict the bias of the article itself, which, combined with the embedding from the HLSTM, led to improved results.

2.4 Emotions

2.4.1 Emotions and Populism

Emotions constitute part of the populist rhetoric and have been essential for information processing as well as the formation of (public) opinion among citizens (Marcus, 2002; Götz et al., 2005; Demertzis, 2006). While social identity and economic factors have been considered the main indicators of populist parties' growth (Rooduijn and Burgoon, 2018), emotional factors have lately become a focus within empirical studies, in particular regarding the reactions to and spread of populist views.

The latest attempts to scrutinize populism from the social psychological and emotional perspective have described populist communication strategies (Abadi et al., 2016; Rico, Guinjoan, and Anduiza, 2017) and demonstrated through experimental research (Wirz et al., 2018) that their operationalization is successful in inducing emotions. According to the concept of media populism (Krämer, 2014; Mazzoleni and Bracciale, 2018), media effects can further evoke hostility toward elites and (ethnic/religious) minorities, because they contribute to the construction of social identities, such as in-groups and out-groups (i.e., Us vs. Them). Emotions have been characterized by certain appraisal patterns, e.g. a negative event for which one blames another is felt as anger. Such a pattern of appraisals is referred to as a Core Relational Theme (Lazarus, 2001): the central (therefore core) harm or benefit that underlies each of the negative and positive emotions (Smith and Lazarus, 1993). It is important to distinguish between different types of emotions through these Core Relational Themes, in particular in their relation to populism. For instance, contempt is more strongly associated with illegal and violent actions, while anger is present in legal protests (Tausch et al., 2011). Furthermore, different emotions may play different roles within left- or right-wing populism, as proposed by Salmela and Scheve (2017), and negative emotions appear to play a bigger role in right-wing populism (Nguyen, 2019).

2.4.2 Emotion Classification

Emotion classification is closely related to one of the most prominent fields within NLP, sentiment analysis. While sentiment analysis focuses on assessing whether a piece of text is positive or negative, emotion classification focuses on classifying a text based on the emotions it contains. Emotions have been described with different scales, the most common being Ekman's Basic Emotions (Ekman, 1992), which encompass six categories: Anger, Disgust, Fear, Happiness, Sadness and Surprise.

Other scales translate emotions to a 3-dimensional space, such as the Valence-Arousal-Dominance (VAD) model (Russell and Mehrabian, 1977). The VAD approach states that all emotions can be mapped to a point in a three-dimensional space composed of three independent dimensions: Valence, which encompasses positive or negative sentiment; Arousal, which shows the degree of engagement; and Dominance, which indicates the control or dominance over emotions.

These models have been translated into different datasets, in order to assess emotions in text. The ISEAR dataset (Scherer and Wallbott, 1994) used Ekman's model, with a single discrete label assigned to each text sample (obtained from cross-cultural studies in 37 countries). Each emotion is roughly equally represented, with 1,093-1,096 samples each. EmoBank (Buechel and Hahn, 2017) is a recent dataset, which includes more than ten thousand sentences manually annotated on each dimension of the VAD model. This dataset contains text from a variety of sources such as headlines, blogs and books.

One of the most common approaches to identifying emotions in text is, similarly to sentiment analysis, to use a lexicon. There are various General-purpose Emotion Lexicons (GPELs), as explored in Bandhakavi et al. (2017). For example, EmoLex (Mohammad and Turney, 2013) has achieved a certain degree of success in identifying emotions within textual data with the use of a lexicon. Similarly, software products such as LIWC (Linguistic Inquiry and Word Count) (Tausczik and Pennebaker, 2010) use word frequency and a proprietary dictionary to categorize text, including its emotional content.

The SemEval (Semantic Evaluation) conference has hosted emotion-related tasks, such as SemEval-2007 (Task 14: Affective Text) (Strapparava and Mihalcea, 2007), which revolved around the classification of emotions in news headlines using Ekman's Basic Emotions scale. In Danisman and Alpkocak (2008), a vector space model improved results over a Naive Bayes classifier and a Support Vector Machine, with an F1 of 33.22%. SemEval-2018 (Task 1: Affect in Tweets) (Mohammad et al., 2018) included another emotion-related task, in which a dataset of tweets was labeled for eleven emotions through crowdsourced annotation. Each tweet could be annotated with more than one emotion, including intensity and valence, therefore comprising more than one task within the same dataset. It included Arabic, English and Spanish tweets, with the English portion containing 10,097 tweets.

Recently, Zhang et al. (2018) have approached emotion classification with a Multi-task Convolutional Network by introducing the task of Emotion Distribution Learning, which consists of mapping each sentence to an emotion vector, where each dimension represents the intensity of one emotion. This approach achieved state-of-the-art results on some of the previously mentioned datasets, like EmoBank and ISEAR.

2.5 Metaphors

2.5.1 Metaphors and Political Language

Metaphors are often considered to be prominent in the political domain (Beigman Klebanov, Diermeier, and Beigman, 2008) and known to reflect or reinforce a particular viewpoint. For instance, war metaphors are commonly used in political language, particularly in populist rhetoric. Flusberg, Matlock, and Thibodeau (2018) discuss the use of such metaphors with historical examples such as the "War on Drugs" of former US president Ronald Reagan. Moreover, the "War on Christmas" is presented as an assault on public displays of Christianity by the political Left and decried by right-wing pundits. This exemplifies how out-group threat is commonly viewed in the Us vs. Them dichotomy and to what extent populist rhetoric is connected to social identity. The multivocal sense conveyed through metaphors reinforces these perceptions, as they can also be used as in-group signals (Albertson, 2014).

"Dog Whistle Politics" refers to the act of using coded language with a hidden intended meaning, aimed at a particular audience to stir support. During his presidency, Ronald Reagan used the cryptic term "Welfare Queens", considered a dog-whistle to middle-class white Americans, to gain support and antagonize minorities (Lopez, 2013). Dog whistling has lately gained traction in social media, where such coded messages are used to appeal and signal to a certain group, such as the Alt-Right. These dog-whistles can be understood as metaphorical language by the in-group. Therefore, it can be hypothesized that tasks like hyperpartisan news detection could benefit from being combined with metaphor detection in an MTL setup.

2.5.2 Computational Modeling of Metaphors

Metaphors can be interpreted at different levels of language. Whether a word has a metaphorical meaning and whether a sentence is metaphorical are different tasks, and various computational methods have used specific approaches according to the task. At the same time, identifying when a word or a sentence has a metaphorical meaning is distinct from assessing the intended meaning behind the metaphor. The latter is similar to word-sense disambiguation, where the system has to identify which sense of a word is used, and is a much more complex task. In this thesis, we will focus on metaphor identification as an auxiliary task at the word level. However, all metaphor-related tasks have the relevance of context in common, in order to assess the metaphoricity (i.e., the quality of being metaphorical) of language even when it occurs at the word level.

Early approaches used hand-engineered features derived from annotated texts, in order to take advantage of patterns or properties in which metaphors behave differently, before training machine learning models on those features. Mohler et al. (2013) used semantic signatures based on domain concepts extracted from Wikipedia and then trained a Random Forest classifier. Several computational approaches have been successful in capturing metaphors in different domains, as discussed in Shutova (2015) and Veale, Shutova, and Klebanov (2016).

Deep Learning systems improved upon handcrafted features and have been the basis of most recent computational metaphor detection approaches. Rei et al. (2017) presented the first deep learning architecture designed to capture metaphorical composition. Their Supervised Similarity Network was inspired by Shutova, Kiela, and Maillard (2016): learned word embeddings are compared using cosine similarity, with a gating mechanism that modulates each word's representation by incorporating that of the preceding word. Gao et al. (2018) made use of Bidirectional LSTMs to capture context and predict metaphors at word and sentence levels. In recent work by Dankers et al. (2019), a Multi-Task Neural Network achieved state-of-the-art results on metaphor classification by using emotions as an auxiliary task.

Chapter 3

Modeling Metaphor and Emotion in Political Discourse

While the core definition of populism is still discussed, Mudde (2004) provides a formal one. In his discussion on populism, he poses the question: is populism "an ideology, a syndrome, a political movement or a political style"? Moreover, he defines populism as an ideology that considers society to be ultimately separated into two homogeneous and antagonistic groups, the pure people versus the corrupt elite, and which argues that politics should be an expression of the volonté générale (general will) of the people. In that definition there is clearly a rhetorical component of populism, which uses political discourse as a mechanism (Jagers and Walgrave, 2007). Therefore, in this chapter we focus on different aspects and channels of political discourse where populist rhetoric takes place, and also take into account several prominent tasks in this area.

The role of metaphor and emotion in political discourse has been investigated in fields such as communication studies (Weeks, 2015; Mourão and Robertson, 2019), political science (Charteris-Black, 2009; Ferrari, 2007) and psychology (Bougher, 2012; Edwards, 1999). Political rhetoric may rely on metaphorical framing to shape public opinion (Lakoff, 1991; Musolff, 2004). Framing selectively emphasizes certain aspects of an issue that promote a particular perspective (Entman, 1993). For instance, government spending on the wealthy can be portrayed as a partnership or bailout, spending on the middle class as simply spending or a stimulus to the economy, and spending on the poor as a giveaway or a moral duty, the former corresponding to the conservative and the latter to the liberal point of view (Peters, 1988). Metaphor is an apt framing device, with different metaphors used across communities with distinct political views (Kövecses, 2002; Lakoff and Wehling, 2012). At the same time, metaphorical language has been shown to express and elicit stronger emotion than literal language (Citron and Goldberg, 2014; Mohammad, Shutova, and Turney, 2016) and to provoke emotional responses in the context of political discourse covered by mainstream newspapers (Figar, 2014). For instance, the phrase "immigrants are strangling the welfare system" aims to promote fear of immigration. On the other hand, the experienced emotions may influence the effects of news framing on public opinion (Lecheler, Bos, and Vliegenthart, 2015), and individual variations in emotion regulation styles can predict different political orientations and support for conservative policies (Lee Cunningham, Sohn, and Fowler, 2013). Metaphor and emotion thus represent crucial tools in political communication.

At the same time, computational modeling of political discourse and its specific aspects, such as political bias in news sources (Kiesel et al., 2019), framing of societal issues (Card et al., 2015), or prediction of political affiliation from text (Iyyer et al., 2014), has received a great deal of attention in the NLP community. Yet, none of
We experiment with three tasks from the political realm, predicting (1) political perspective of a news article; (2) party affiliation of politi- cians from their social media posts; and (3) framing dimensions of policy issues. We use metaphor and emotion detection as auxiliary tasks, and investigate whether in- corporating metaphor or emotion-related features enhances the models of political discourse. Our results show that incorporating metaphor or emotion significantly improves performance across all tasks, emphasizing the prominent role they play in political rhetoric.

3.1 Related work

Modeling political discourse encompasses a broad spectrum of tasks, including estimating policy positions from political texts (Thomas, Pang, and Lee, 2006; Lowe et al., 2011), identifying features that differentiate the political rhetoric of opposing parties (Monroe, Colaresi, and Quinn, 2008) or predicting the political affiliation of Twitter users (Conover et al., 2011; Pennacchiotti and Popescu, 2011; Preoţiuc-Pietro et al., 2017; Rajamohan, Romanella, and Ramesh, 2019). Deep neural networks have been widely used to model political perspective, bias or affiliation at the document level: Iyyer et al. (2014) used a Recurrent Neural Network (RNN) to predict political affiliation from US congressional speeches. Li and Goldwasser (2019) identified the political perspective of news articles using a hierarchical Long Short-Term Memory (LSTM) network and social media user data modeled with Graph Convolutional Networks (GCN). Lastly, a recent shared task presented a multitude of Deep Learning methods to detect political bias in articles (Kiesel et al., 2019).

Framing in political discourse is a relatively unexplored task. Hartmann et al. (2019) classified frames at the sentence level using bidirectional LSTMs and GRUs. Ji and Smith (2017) trained Tree-RNNs to classify the framing of policy issues in news articles.

Approaches predicting emotions for a given text typically adopt a categorical model of discrete, prototypical emotions, e.g. the six basic emotions of Ekman (1992). Early computational approaches employed vector space models (Danisman and Alpkocak, 2008) or shallow machine learning classifiers (Alm, Roth, and Sproat, 2005; Yang, Lin, and Chen, 2007). Examples of deep neural methods are the recurrent model of Abdul-Mageed and Ungar (2017), who classified 24 fine-grained emotions, and the transformer-based SentiBERT architecture of Yin, Meng, and Chang (2020).

Computational research on metaphor has mainly focused on detecting metaphorical language in text. Early research performed supervised classification with hand-engineered lexical, syntactic and psycholinguistic features (Tsvetkov et al., 2014; Beigman Klebanov et al., 2016; Turney et al., 2011; Strzalkowski et al., 2013; Bulat, Clark, and Shutova, 2017). Alternative approaches perform metaphor detection from distributional properties of words (Shutova, Sun, and Korhonen, 2010; Gutiérrez et al., 2016) or by training deep neural models (Rei et al., 2017; Gao et al., 2018). Dankers et al. (2019) developed a joint model of metaphor and emotion by fine-tuning BERT in an MTL setting.

3.2 Tasks and Datasets

Political Perspective in News Political news can be biased towards the left or right wing of the political spectrum. To model such biased perspectives computationally, we classify articles as left, right or center using data from Li and Goldwasser (2019).1 The articles are from the website AllSides2 and are annotated based on their source's bias. The training and test sets contain 2008 and 5761 articles, respectively. We use 30% of the training data for validation. The splits are stratified based on the bias; yet, they do not take the news source into account. This may lead to articles of the same source being contained in different sets, which can cause some degree of data contamination, since the labels are based on sources.

Political Affiliation For this task, we use the dataset of Voigt et al. (2018)3, which was created to explore gender bias in online communication. The data comes from different sources: politicians' and public figures' Facebook posts, responses to TED speakers in TED talks, responses to Fitocracy fitness posts, and Reddit comments from selected sub-Reddits, the latter being user-created areas of interest. In our case, we use the portion of the dataset that contains public Facebook posts from 412 US politicians. The training, validation and test sets contain 9792, 2356 and 2458 posts, respectively. The task is to predict republican or democrat for posts of unseen politicians. The classes are balanced in each set. We split the data such that no set includes posts by politicians present in the other sets.

Framing The Media Frames Corpus4 (Card et al., 2015) contains news articles discussing five policy issues: tobacco, immigration, same-sex marriage, gun control and the death penalty. There are 15 possible framing dimensions, e.g. economic, political, etc. (see Appendix A, A.2). We use the article-level annotation to predict the framing dimension. Out of 23,580 articles, we use 15% as the test set and 15% of the training data for validation. Our use of framing is based on the definition provided by Entman (1993); articles are labeled according to how they frame a topic.

Metaphor For metaphor detection we use the VU Amsterdam dataset (Steen et al., 2010), which is a subset of the British National Corpus (Leech, 1992). The dataset contains 9,017 sentences with binary labels (literal or metaphorical) per word. We use the data split of Gao et al. (2018), which includes 25% of the sentences in the test set.

Emotion For emotion classification, we use a dataset from SemEval-2018 Task 1 (Mohammad et al., 2018), in which tweets were labeled for eleven emotion classes or as neutral (see Appendix A, A.2). We use the English portion of the dataset (10,097 tweets) and the shared task splits.

3.3 Methods

We employ the Robustly Optimized BERT Pretraining Approach (RoBERTa-base) pre- sented by Liu et al. (2019b) through the library provided by Wolf et al. (2019)5.

1 https://github.com/BillMcGrady/NewsBiasPrediction
2 https://www.allsides.com/unbiased-balanced-news
3 https://nlp.stanford.edu/robvoigt/rtgender/
4 https://github.com/dallascard/media_frames_corpus
5 https://huggingface.co/transformers

Political Perspective in News:
  Left      3931
  Center    4164
  Right     2290

Political Affiliation:
  Republican Party   8132
  Democratic Party   8132

Framing:
  Legality, constitutionality and jurisprudence   4920
  Political                                       4762
  Crime and punishment                            2187
  Policy prescription and evaluation              2116
  Cultural identity                               1633
  Economic                                        1536
  Health and safety                               1380
  Public opinion                                  1122
  Quality of life                                 1084
  Fairness and equality                            849
  Morality                                         811
  Security and defense                             609
  External regulation and reputation               312
  Capacity and resources                           249
  Other                                             10

VUA Metaphors: 200K words; metaphorical: 11.6%

SemEval Emotions: 10,097 tweets
  Anger 36.1%, Anticipation 13.9%, Disgust 36.6%, Fear 16.8%, Joy 39.3%, Love 12.3%, Optimism 31.3%, Pessimism 11.6%, Sadness 29.4%, Surprise 5.2%, Trust 5%, Neutral 2.7%

TABLE 3.1: Dataset contents.

RoBERTa contains twelve stacked transformer layers and assumes the input sequence to be tokenized into subword units called Byte-Pair Encodings (BPE). A special token is inserted at the beginning of the input sequence, whose encoding is used to compute a contextualized sequence representation.

Our tasks are defined at three levels of the linguistic hierarchy. The auxiliary tasks of metaphor detection and emotion prediction are defined at the word and sentence level, respectively, while the main political tasks are defined at the document level. For word-level metaphor identification, the subword encodings from RoBERTa's last layer are processed by a linear classification layer. A word is considered metaphorical provided that any of its BPEs was labeled as metaphorical; we assume that BPEs from inflections are unlikely to cause a word to be incorrectly labeled as metaphorical. Figure 3.1 visualizes metaphor detection on its right side.

For the sentence-level emotion prediction task and the document-level tasks of political affiliation and framing, the special token encoding serves as the sequence representation and is fed to a linear classification layer. For political perspective in news, the average document length exceeds the maximum input size of RoBERTa. Therefore, we split each document into sentences and collect them in a maximum of 5 subdocuments with up to 256 subwords each. After applying RoBERTa to the subdocuments, their encodings are fed to an attention layer yielding a document representation to be classified. Figure 3.1 depicts the classification of sentences or short documents on the right, and the processing of longer documents on the left.

All task models use the cross-entropy loss with a sigmoid activation function. For political perspective detection, the loss function includes class weights to account for class imbalance.
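A minimal sketch of the attention layer that pools subdocument encodings into a single document representation could look as follows; the module and variable names are our own illustration, and the exact implementation in the thesis code may differ.

```python
import torch
import torch.nn as nn

class DocumentAttentionPool(nn.Module):
    """Pools up to 5 subdocument encodings into one document
    representation via learned attention, then classifies it."""
    def __init__(self, hidden_size: int = 768, num_classes: int = 3):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)          # one score per subdocument
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, sub_encodings: torch.Tensor, mask: torch.Tensor):
        # sub_encodings: (batch, n_subdocs, hidden) - sequence encodings from RoBERTa
        # mask: (batch, n_subdocs) - 1 for real subdocuments, 0 for padding
        scores = self.scorer(sub_encodings).squeeze(-1)          # (batch, n_subdocs)
        scores = scores.masked_fill(mask == 0, float("-inf"))    # ignore padded slots
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)    # (batch, n_subdocs, 1)
        doc_repr = (weights * sub_encodings).sum(dim=1)          # (batch, hidden)
        return self.classifier(doc_repr)
```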

3.3.1 Multi-task Learning

The MTL architecture uses hard parameter sharing for the first eleven transformer layers. The last layer of RoBERTa and the classification and attention layers are task-specific, to allow for specialization, similar to the approach of Dankers et al. (2019). The main political tasks are paired with the metaphor and emotion tasks one by one. The task losses are weighted with α for the main task and 1 − α for the auxiliary task. For some tasks, we include an auxiliary warm-up period, during which α = 0.01. This allows the model to initially learn the (lower-level) auxiliary task while focusing mostly on the main task afterwards. This approach is similar to Kiperwasser and Ballesteros (2018).
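As a sketch, the weighting scheme with auxiliary warm-up can be expressed as follows. This is illustrative only, under the assumptions stated above; the thesis does not publish this exact function.

```python
def mtl_loss(main_loss, aux_loss, epoch, alpha=0.9, warmup_epochs=5):
    """Weighted MTL objective with an auxiliary warm-up period.

    During warm-up alpha is fixed to 0.01, so training is dominated
    by the auxiliary task; afterwards the main task takes over."""
    a = 0.01 if epoch < warmup_epochs else alpha
    return a * main_loss + (1 - a) * aux_loss
```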

3.4 Experiments and Results

3.4.1 Experimental Setup

For α, values were tried at intervals of 0.1. To set the length of the auxiliary warm-up period, 3, 4 or 5 epochs were tried. For the political affiliation task, dropout probabilities of 0.1, 0.2 and 0.3 were tested. The hyperparameters were chosen by manual tuning based on the accuracy score on the validation sets. Hyperparameters shared between MTL and STL for the same main task were selected based on the STL performance.

The models are trained with the AdamW optimizer, a learning rate of 1e-5 and a batch size of 32. The learning rate is annealed through a cosine-based schedule with warm-up ratios of 0.2, 0.3 and 0.15 for the political perspective in news, the political affiliation and the framing tasks, respectively.

FIGURE 3.1: Schematics of the MTL model. The left side shows the path for longer documents from the Political Perspective in News dataset, while the right side is the path for the rest of the datasets and the auxiliary tasks.

                                          Framing   Affiliation   Perspective
Li and Goldwasser (2019)
  - HLSTM (text-based)                       -           -           .746
  - GCN-HLSTM (using social information)     -           -           .917
STL                                        .707        .794          .848
MTL, Metaphor                              .716        .805          .854
MTL, Emotion                               .708        .802          .860

TABLE 3.2: Accuracy scores for the main political tasks. Significance compared to STL is bolded (p < 0.05).

                  Perspective   Affiliation   Framing
STL                  .832          .804        .699
MTL, Metaphor        .835          .804        .703
MTL, Emotion         .838          .811        .704

TABLE 3.3: Validation accuracy scores for the main political tasks.

Dropout is applied per layer with a probability of 0.3 for political affiliation and 0.1 otherwise. The auxiliary warm-up period and α values are estimated per main task, for metaphor (αM) and emotion (αE) separately. For political perspective in news, αM = 0.7 and αE = 0.8, and models were trained for 20 epochs with early stopping. For political affiliation prediction, αM = αE = 0.9, the first 5 epochs are used for auxiliary warm-up, and the models were trained for 20 epochs in total. For the framing task, αM = αE = 0.5, with 5 epochs of auxiliary warm-up for metaphor; training lasted 10 epochs at most, with early stopping.

We average results over 10 random seeds. We perform significance testing using an approximate permutation test with 10,000 permutations. Our work used PyTorch and the Huggingface library to load the pretrained models and train the MTL setups. Some code was adapted from the utils_nlp6 library for training. Data splits and code are attached with the submission.
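The significance test can be sketched as an approximate paired permutation test over per-example correctness, as below; the function is our own illustration rather than the thesis' exact evaluation script.

```python
import numpy as np

def permutation_test(preds_a, preds_b, gold, n_perm=10_000, seed=0):
    """Approximate paired permutation test on accuracy differences:
    per example, the two systems' predictions are randomly swapped."""
    rng = np.random.default_rng(seed)
    a = (np.asarray(preds_a) == np.asarray(gold)).astype(float)
    b = (np.asarray(preds_b) == np.asarray(gold)).astype(float)
    observed = abs(a.mean() - b.mean())
    count = 0
    for _ in range(n_perm):
        swap = rng.random(len(a)) < 0.5       # swap each pair with p = 0.5
        a_p = np.where(swap, b, a)
        b_p = np.where(swap, a, b)
        if abs(a_p.mean() - b_p.mean()) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)         # p-value with add-one smoothing
```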

3.4.2 Results

Table 3.2 summarizes our results. For political perspective in news, the STL model improves over the text-based method of Li and Goldwasser (2019). This illustrates that RoBERTa provides an enhanced document encoding for predicting political perspective. Both MTL setups significantly improved over the STL model. Joint learning with emotion proved most beneficial and significantly outperformed the metaphor detection setup. While neither outperformed the GCN-HLSTM model, that model relies on social information, which by itself already outperformed the text-only models in the original work, and it requires such information to be available for the articles.

For political affiliation prediction, both MTL setups improve significantly over STL, although there is no significant difference between them. Although there is no previous work on this dataset for political affiliation, the performance is on par with previous work on the task.

6 https://github.com/microsoft/nlp-recipes/tree/master/utils_nlp

In the case of the framing task, joint learning with metaphor significantly outperformed STL. MTL using emotion, on the other hand, yielded results on par with STL.

n  Document Piece                                                    Gold Label      MTL, Metaphor   STL
1  . . . the anger simmering just below the surface in the U.S.      Right           Right           Centre
   is beginning to boil over.
2  . . . and DNA evidence does not match. What once was considered   Fairness and    Fairness and    Leg., Constit.,
   an airtight case, Devine said, has evaporated into nothing        Equality        Equality        Jurisdiction
3  . . . border security long have been sticking points in the       Security        Security        Political
   immigration debate. Bowing to those concerns, Presidents          & Defense       & Defense
   Bush . . .

TABLE 3.4: Political perspective (1) and framing (2, 3) examples of metaphor-MTL improving over STL. Underlined are words predicted as metaphorical.

3.5 Discussion

Political Perspective in News

For the political perspective task, the performance improvements of the MTL models stem mostly from improved predictions for the right-wing class. Example 1 of Table 3.4 presents an emotive article snippet containing the metaphors "boil over" and "simmering anger", for which joint learning with metaphor corrected the STL prediction. Table 3.6 presents a breakdown of the performance per class. In both MTL models, there was an improvement for right-wing articles. It is worth mentioning that this was the least represented class at training time, which may indicate that MTL helped overcome issues of data imbalance without hurting performance on the other classes. For instance, Wu, Wu, and Liu (2018) successfully used an MTL approach to improve performance on unbalanced sentiment analysis tasks.

Political Affiliation

Improvements from the auxiliary tasks are due to a more accurate identification of the Democrat class. According to Pliskin et al. (2014), liberals are more susceptible to emotions, which could in part explain this result. Figure 3.2 visualizes the performance across the political spectrum, from which we infer that politicians at the center are harder to distinguish, while those on the left are better identified by our MTL models. We explored the emotions predicted by the MTL model in politicians' posts, as shown in Table 3.5. We found that emotions typically associated with conservative rhetoric, e.g. anger, disgust or fear (Jost et al., 2003), were more frequent in Republicans' posts. Conversely, emotions associated with liberals, e.g. love (Lakoff, 2002) or sadness (Steiger et al., 2019), are more often predicted for Democrats' posts. Table 3.7 contains example posts where joint learning with emotion corrected the STL setup, and where the predicted emotions align with those usually associated with each affiliation.

FIGURE 3.2: Average performance across the political spectrum for the Political Affiliation task. Dimension taken from Voteview.

            Anger   Anticipation   Disgust   Fear    Joy     Love    Optimism   Pessimism   Sadness   Surprise   Trust
Democrat    34.0%   42.9%          42.2%     23.1%   61.9%   73.6%   54.0%      82.5%       76.4%     75.4%      41.6%
Republican  66.0%   57.1%          57.8%     76.9%   38.1%   26.4%   46.0%      17.5%       23.6%     24.6%      58.4%

TABLE 3.5: Proportion of posts predicted for each emotion, using the best-performing emotion-MTL model.

                                    STL     Metaphor   Emotion
Political Perspective
- Center                           .874      .879       .885
- Left                             .860      .863       .871
- Right                            .774      .784       .798
Political Affiliation
- Democrat                         .788      .806       .799
- Republican                       .802      .805       .800
Framing
- Economic                         .747      .759       .758
- Capacity and Resources           .601      .604       .602
- Morality                         .646      .662       .648
- Fairness and Equality            .502      .527       .511
- Crime and Punishment             .719      .721       .717
- Security and Defense             .554      .577       .560
- Health and Safety                .683      .694       .684
- Quality of Life                  .572      .554       .556
- Cultural Identity                .690      .703       .695
- Public Sentiment                 .670      .678       .675
- Political                        .808      .815       .812
- Legality, Constitutionality
  and Jurisdiction                 .787      .795       .784
- Policy Prescription
  and Evaluation                   .525      .538       .530
- External Regulation
  and Reputation                   .695      .675       .681

TABLE 3.6: Average F1 for each class and task.

Framing

In the case of the framing task, MTL with metaphor prediction yielded the largest improvements for the frames of security and defense, morality, and fairness and equality, particularly in articles on the metaphor-rich topics of immigration, gun control and the death penalty.

Example 1. Emotions: Anger, Disgust and Fear. Gold Label: Republican. Emo. MTL: Republican. STL: Democrat.
Facebook post: Last week, I held a Congress on Your Corner event in Frankfort. Monica was upset by the recent deal between the United States, our global partners, and Iran. The deal provides $7 billion in sanction relief in exchange for Iran limiting, but not halting, its nuclear activities. I am skeptical of this deal. In the words of my friend Eric Cantor, I believe we must distrust and verify in this case. I believe it is imperative that we stand with Israel against the very dangerous threat posed by Iran's nuclear activities. I do not believe that Iran has given us any reason to trust that it will not continue pursuing nuclear weapons.

Example 2. Emotions: Love, Joy and Optimism. Gold Label: Democrat. Emo. MTL: Democrat. STL: Republican.
Facebook post: I'll be spending most of my day tomorrow opposing Paul Ryan's cuts-only budget in committee. In the name of deficit reduction, Mr. Ryan is once again proposing to eliminate one of the few pieces of good news we have in reducing healthcare costs that are driving the deficits: Obamacare (aka, the Affordable Care Act). We should be expanding its reforms, not trying to repeal them. For example, the CBO estimates that adding a public plan option to the health insurance exchanges would save another $88 billion and that the plan would have premiums 5-7% lower than private plans, which would increase competition in the marketplace and result in substantial savings for individuals, families, and employers purchasing health insurance through an exchange.

TABLE 3.7: Examples where emotion-MTL improved the predictions over STL.

We excluded the Other category from Table 3.6, as it was misclassified by all models, having only 2 samples in the test set. We automatically annotated metaphorical expressions in these articles to conduct a qualitative analysis. We observe that correct identification of linguistic metaphors often accompanies correct frame classification by the MTL model; examples of such cases are shown in Table 3.4. In Example 2, metaphors such as "airtight case" and "evaporated" aided the model in identifying the fairness and equality framing within the topic of the death penalty. Similarly, presenting border security in Example 3 as a "sticking point in the immigration debate" improved the classification of the security and defense framing of an article on the topic of immigration. Table 3.8 presents detailed results per policy issue. While emotions did not improve results significantly overall, emotion-MTL showed its biggest improvement on gun-control-related articles, which can be emotionally charged. Metaphor-MTL showed an improvement on all policies except for articles related to same-sex marriage, indicating how closely related metaphors and framing are, with metaphors often referred to as a framing mechanism.

                     Single   Metaphors   Emotions
Immigration          0.689    0.700       0.686
Tobacco              0.718    0.721       0.717
Death Penalty        0.690    0.704       0.690
Gun Control          0.704    0.717       0.711
Same-Sex Marriage    0.744    0.741       0.745

TABLE 3.8: Average accuracy values across different policies for Framing.

3.6 Conclusion

In this chapter, we introduced the first joint models of metaphor, emotion and political rhetoric. We considered predicting the political perspective of news, the party affiliation of politicians and the framing dimensions of policy issues. MTL using metaphor detection resulted in significant performance improvements across all three tasks. This finding emphasizes the prevalence of metaphor in political discourse and its importance for the identification of framing strategies. Joint learning with emotion yielded significant performance improvements for the political perspective and affiliation tasks, which suggests that emotion is an important political tool, used to influence public opinion. Future research may explore further tasks such as emotion and misinformation detection, which social scientists have found to be interrelated, and deploy more advanced MTL techniques, such as soft parameter sharing. Our code and trained models will be made publicly available.

Chapter 4

PopulismVsReddit: A Dataset on Emotions, News Bias and Social Identity

In general, there is a lack of datasets intended to tackle populist rhetoric. As seen in Chapter 2, the few existing datasets, such as the Global Populist Database (GPD)¹, are not fit to train a Deep Learning model. Populist rhetoric includes a range of properties and argumentations that are used across the political spectrum. Manichean Outlook, Anti-Elitism and People-Centrism are discussed in Castanho Silva et al., 2019, where populism is measured as an attitude. Modeling these aspects computationally can be challenging from a Natural Language Processing (NLP) perspective, where the text is the subject rather than a respondent of a survey or an individual in an experimental setting. When these sub-scales of populist rhetoric are measured in texts such as political discourses or party manifestos (Hawkins et al., 2019; Hawkins and Silva, 2015), they require complex annotation procedures as well as expert annotators, which limits the scope of the data to annotate, or requires assessing extensive pieces of text to obtain a single metric. Instead of attempting to capture the whole complexity of populist rhetoric, we choose to focus on a specific aspect of its nature. As discussed in the literature review, populism has traditionally pivoted around social identity and the Us vs. Them dichotomy. While other aspects are relevant, the in-group/out-group aspect of populism is common across its different attitudes. In Anti-Elitism, the Them concept encompasses the so-called Elites as the out-group. In People-Centrism, the people become the focus as the Us, the in-group. Finally, for Manichean Outlook, it is present behind the moral simplification of both groups as good (Us) or bad (Them), and the rivalry between the two. Nevertheless, addressing the Us vs. Them conceptualization through NLP and Deep Learning involves some legitimate questions. Since we decide to tackle the task with an annotation procedure, we need to find a methodical, systematic approach and reduce it to a single question; similar approaches have been successful in addressing emotional content in political discourse (Redlawsk et al., 2018). We also need to decide on the nature of the data to annotate and how to ensure those comments refer to specific groups. To address the connection between emotions and populism, we also include emotions in the annotation procedure, with a special focus on those usually associated with populist attitudes. At the same time, given the relevance of news in the spread of such attitudes, we tackle news bias and its relation to the Us vs. Them rhetoric through distant supervision.

1 https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/LFTQEZ

In this chapter, we will describe all the steps taken to create this dataset, presented as the PopulismVsReddit dataset. Next, we will analyze its contents using statistical methods, in order to assess them and draw conclusions.

4.1 Dataset Creation

4.1.1 How to annotate populist rhetoric?

We decided to use social media data for our task. This decision is motivated by the lack of existing datasets focusing on social media data and by its relevance for the spread of populist rhetoric. At the same time, comments in social media are usually of short to medium length, which makes them fit for crowd-sourced annotation. As discussed in Chapter 3, political discourse, such as parliamentary data, has already been explored to detect political bias. For similar reasons, we discarded the idea of annotating news articles; however, we do consider them indirectly in our approach. There are different precedents on how to tackle populist rhetoric - from expert annotation of specific items appearing in a text to holistic grading schemes that compare different texts. The first method yields a general interpretation of a text in the context of populist rhetoric, but not a single label or value tied to populist rhetoric on which to train a model. Holistic grading provides a grade or score on the amount of populist rhetoric a text contains (Hawkins and Silva, 2015). Holistic grading of a complex concept like populist rhetoric can be expensive to annotate and very susceptible to the interpretation of the annotators. We focus on the Us vs. Them aspect of populist rhetoric. The identification of an out-group as a threat is easy to spot and commonly spread in online media. While hate speech or abusive language can be mechanisms for such Us vs. Them rhetoric, they do not fully cover its use from the social identity perspective. It is difficult to identify the in-group aspect from a single online comment. Online social media have clusters of online communities, often leading to bubbles of perception. This situation is sometimes described as an echo chamber (Barberá et al., 2015), where social media users seek out information that confirms their preexisting beliefs, hence increasing political polarization and extremism. By annotating comments that refer to an out-group, we can monitor how groups are targeted in online discussions and whether the text shows a positive or negative attitude towards that social group, ranging from support to discrimination. This does not guarantee capturing the full complexity behind the Us vs. Them rhetoric; rather, we provide a tool to detect comments directed at certain groups (out-groups) within an online community (in-group) and the attitude towards them. We restrict this to specific groups that populist rhetoric has targeted as out-groups:

1. Immigrants
2. Refugees
3. Muslims
4. Jews
5. Liberals
6. Conservatives

While these groups have often been targets of populist rhetoric, we emphasize that this is far from a complete list². At the same time, some of these labels are reductions or simplifications, such as Liberals and Conservatives, by which we refer to the political spectrum: the left, often referred to as Liberals in the US, and the right as Conservatives.

2 Our list was decided upon in December 2019, hence before the murder of George Floyd gave rise to the Black Lives Matter protest movement.

               News Title                                Comment
Refugees       refugee, asylum seeker                    refugee, asylum seeker, undocumented, colonization
Immigration    -migra-, undocumented, colonization       -migra-, undocumented, colonization
Muslims        muslim, arab, muhammad, muhammed,         muslim, arab, muhammad, muhammed,
               islam, hijab, sharia                      islam, hijab, sharia
Jews           -jew(i/s)-, heeb-, sikey-,                -jew(i/s)-, heeb-, sikey-,
               -zionis-, -semit-                         -zionis-, -semit-
Liberals       antifa, libtard, communist, socialist,    antifa, libtard, communist, socialist,
               leftist, liberal, democrat                leftist, liberal, democrat
Conservatives  altright, alt-right, cuckservative,       altright, alt-right, cuckservative,
               trumpster, conservative, republican       trumpster, conservative, republican

TABLE 4.1: Keywords used in our data filtering process. More loaded terms are included despite their low occurrence compared to common terms, in order to ensure a more diverse dataset.

4.1.2 Reddit Data

Reddit is an online discussion forum consisting of user-created thematic sub-forums called sub-Reddits. Users create submissions by sharing any type of media, such as photos, videos, a written post or an online news URL, often launching a conversation or thread. We used Reddit data from Baumgartner et al., 2020 through the Google BigQuery service, and gathered the data by following these steps:

News articles. We searched for submissions started by sharing an online news article from sources available at the AllSides Media Bias Ratings, in order to identify each article's bias. To refine the filtering, we used a keyword approach, matching article titles against the keywords associated with each group. The keywords can be found in Table 4.1.

Keywords. We then obtained all first-level comments, i.e. direct replies to the submission, provided that they matched any of the keywords for the corresponding group. Only comments that match keywords from the same group were kept, both at the level of article titles and comments.³

Comment length. We ensured that selected comments have a minimum of 30 words and a maximum of 250.

Time periods. The steps above yielded 199,713 comments. To reduce this number and make sure the selected comments were relevant, we sampled from specific periods during which each group was actively discussed on Reddit. See Table 4.2 for details.

3 The keywords presented here were used for the final process. Further keywords did not provide any relevant results in our context.

               Time ranges                Events
Conservatives  2016/09/15 - 2016/12/15    Election periods
               2018/09/15 - 2018/12/15
Liberals       2016/09/15 - 2016/12/15    Election periods
               2018/09/15 - 2018/12/15
Muslims        2016/11/01 - 2017/11/30    Trump Muslim ban,
               2018/04/01 - 2018/05/01    Mosque attacks
               2019/03/01 - 2019/06/01
Immigrants     2016/11/01 - 2017/11/30    Migrant caravans,
               2017/01/15 - 2017/03/15    Children at the US
               2018/06/17 - 2018/07/01    border
               2018/10/01 - 2019/02/01
Jews           2018/10/20 - 2018/11/25    Christchurch shooting

TABLE 4.2: Events and periods used for each group. If comments were not sufficient, they were sampled randomly from other time ranges. Refugees did not have enough overall comments to be filtered by time range.

Co-occurrence. We ensured that keywords used for each group do not appear in the comments retrieved for other related groups, to avoid confusion. For instance, we removed comments that contain any of the Liberals keywords from the Conservatives comments. In the case of Immigrants, we removed all comments that include references to Refugees.

News article retrieval. We made use of the newspaper3k library to retrieve the news articles shared at the submission level. If an article failed to be retrieved, we removed the corresponding comments from that Reddit submission.
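As an illustration of the filtering steps above, the following is a minimal sketch of the keyword and length filters. The pattern conventions (treating hyphen-wrapped terms from Table 4.1, such as "-migra-", as bare substrings) and all names are our assumptions, not the exact pipeline code.

```python
import re

# Two example groups from Table 4.1; terms wrapped in hyphens are assumed
# to match anywhere inside a word, plain terms match at a word start.
KEYWORDS = {
    "Muslims": ["muslim", "arab", "muhammad", "muhammed",
                "islam", "hijab", "sharia"],
    "Immigration": ["-migra-", "undocumented", "colonization"],
}

def matches_group(text: str, group: str) -> bool:
    """True if the lower-cased text contains any keyword of the group."""
    lowered = text.lower()
    for kw in KEYWORDS[group]:
        if kw.startswith("-"):
            pattern = re.escape(kw.strip("-"))   # substring match
        else:
            pattern = r"\b" + re.escape(kw)      # word-start match
        if re.search(pattern, lowered):
            return True
    return False

def length_ok(text: str, lo: int = 30, hi: int = 250) -> bool:
    """Keep comments between 30 and 250 words, as in the filtering step."""
    return lo <= len(text.split()) <= hi
```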

Our budget enabled us to annotate up to 9,000 comments, each by 8 annotators. We randomly sampled 1,500 comments from each group and stratified them by the AllSides bias. This yields a final dataset with 300 comments per bias (left, left-center, center, right-center, right) and group (Immigrants, Refugees, Jews, Muslims, Liberals and Conservatives).

4.1.3 Annotation

Below, we describe the two questions annotators had to answer. The annotation framework can be accessed at: https://littlepea13.github.io/mturk_annotation/.

Us vs. Them question

Annotators were presented with both the Reddit comment and the corresponding headline shared in the submission. To capture the Us vs. Them rhetoric, we asked annotators: "What kind of language does this comment contain towards [group]?", where [group] is the specific group the comment refers to. Respondents had four options:

Discriminatory or Alienating. Annotators were asked to mark this option in case the comment was (A) alienating or portraying a social group as negative, (B) presenting the group as a threat, danger or peril to society, or (C) trying to ridicule the group and attack it as lesser or worthless.

Critical but not Discriminatory. In case the comment was critical, but not to the extent of the first option, annotators were asked to mark this option.

Supportive or Favorable. This answer refers to comments expressing support to- wards that group, by defending it or praising it.

Neutral. This option was offered in case none of the above applied, either because the group was only mentioned but the comment was not addressed at them, or because no opinion whatsoever was expressed towards the group, such as purely factual information.

Comments may express more than one option; therefore, we asked annotators to mark the most prominent one in the comment. Disagreement between annotators does not necessarily indicate a lower-quality annotation, but rather certain nuances - a continuous rather than discrete nature of the sentiment towards a group. This topic is further discussed in the data processing subsection.

Emotions

Since one of our research questions (RQ2) involves the relation between populist rhetoric and emotions, we also annotate the comments with primary emotions, which in turn allows us to check for correlations between the two. Moreover, it enables a multi-task learning approach to improve the performance of our models. In Section 2.4, we reviewed previous approaches to training emotion classifiers using DL models. Since we use crowd-sourced annotation from non-expert annotators, we decided on a discrete emotion scale such as Ekman's Basic Emotions, since models like VAD involve a more complex task than choosing which emotion a text contains. We extended Ekman's six Basic Emotions to a 12-emotion model, in order to increase the scope of options for exploring Reddit comments. We selected a balanced set of positive and negative emotions, prioritizing those most relevant to the study of political communication and populist attitudes (Demertzis, 2006). We also provided a brief description of each emotion, inspired by the concept of Core Relational Themes (Smith and Lazarus, 1990) and adapted to our task context. Annotators were first asked to select whether the comment showed a "Positive", "Negative" or "Neutral" sentiment towards the specified group. With this approach, we intended to simplify the task and guide annotators, who were then offered a choice from six positive or six negative emotions, according to the sentiment they initially chose. In case annotators selected Neutral, no further options were provided.

Negative emotions:

Anger. Someone is causing harm or a negative/undeserved outcome, while this could have been avoided. Someone is acting in an unjustified manner towards people. Someone is blocking the goals of people.
Anxiety/Fear. Something negative might/could happen (sooner or later), which threatens the well-being of people.
Contempt. Someone is inferior (for example, immoral, lazy or greedy). Someone is incompetent (for example, weak or stupid).
Sadness. Something bad or sad has happened. Someone has experienced a loss (for example, death or loss of possessions).
Moral Disgust.⁴ Someone behaves in an offensive way (for example, corrupt, dishonest, ruthless, or unscrupulous behavior).
Guilt/Shame.⁵ Someone sees him-/herself as responsible for causing a harmful/immoral/shameful/embarrassing outcome to people.

Positive emotions:

Gratitude. Someone is doing/causing something good or lovely.
Happiness/Joy.⁶ Something good is happening. Something amusing or funny is happening.
Hope. Something good/better might happen (sooner or later).
Pride. Someone is taking credit for a good achievement.
Relief. Something bad has changed for the better.
Sympathy. Someone shows support or devotion.

After our first test annotation batch, we realized that allowing annotators to select only a single emotion led to high disagreement, in particular because some emotions can co-occur in a comment, and distinguishing the primary one can be highly subjective. To overcome this issue, we allowed annotators to choose up to two emotions. Nevertheless, the options were always constrained according to the answer provided for the first-level question (general sentiment).

4.1.4 Inter-rater Correlation and Score Aggregation

An important metric in data annotation is inter-rater reliability. There are several metrics, such as Cohen's Kappa or Krippendorff's Alpha, but these make assumptions that are not met by crowd-sourced annotation, such as an equal number of annotators per sample or all raters annotating the same set of samples. Following the same procedure as Demszky et al., 2020, we instead compute the Spearman correlation between each annotator and the mean of the other annotations, and average it per dimension, producing the plots in Figure 4.1. While some items show a low value, the lowest agreement, 0.13 for Relief, is a reasonable value given our set of 12 emotions. Moreover, it is close to the values reported in Demszky et al., 2020, whose lowest correlation agreement was 0.16, with 0.17 for Relief.
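A minimal sketch of this leave-one-out correlation, assuming the annotations for one dimension are collected in an (annotators x items) matrix with NaN marking items a rater did not see; all names are illustrative.

```python
import numpy as np
from scipy.stats import spearmanr

def interrater_correlation(ratings: np.ndarray) -> float:
    """Average Spearman correlation between each rater and the mean of
    the remaining raters, for one dimension (e.g. one emotion)."""
    scores = []
    for i in range(ratings.shape[0]):
        others = np.delete(ratings, i, axis=0)
        mean_others = np.nanmean(others, axis=0)   # mean of the other raters
        mask = ~np.isnan(ratings[i]) & ~np.isnan(mean_others)
        rho, _ = spearmanr(ratings[i][mask], mean_others[mask])
        scores.append(rho)
    return float(np.mean(scores))
```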

4 Referred to as Disgust for simplicity.
5 Referred to as Guilt for simplicity.
6 Referred to as Happiness for simplicity.

For the social identity question, agreement is not as relevant. In particular, if we consider the scale as a continuous spectrum, disagreement is expected. Nevertheless, when using the same inter-rater correlation metric, we found all values to be higher than 0.22, with Discriminatory as high as 0.51.

FIGURE 4.1: Number of annotations per emotion and the inter-rater correlation.

Once the annotation was completed, we deployed the CrowdTruth 2.0 toolset (Dumitrache et al., 2018). CrowdTruth includes a set of metrics to analyze and obtain probabilistic scores from crowd-sourced annotations. It captures the ambiguity present in annotations, which is particularly relevant for a subjective task like ours. It uses three units that revolve around disagreement: Workers (annotators), Media units (our Reddit data) and Annotations (the answers provided by workers). By computing worker and annotation scores, a final Media Unit - Annotation Score (UAS) is given for each comment and possible answer, i.e. the probability of each answer being the gold label:

\[
\mathrm{UAS}(u, a) = \frac{\sum_{i \in \mathrm{workers}(u)} \mathrm{WorkVec}(i, u)(a)\,\mathrm{WQS}(i)}{\sum_{i \in \mathrm{workers}(u)} \mathrm{WQS}(i)} \tag{4.1}
\]

where u is the media unit and a the possible answer. WQS stands for Worker Quality Score, another CrowdTruth metric that measures each worker's performance, and WorkVec(i, u) is the binary vector of annotations of worker i on media unit u. Furthermore, the Media Unit Quality Score (UQS) describes the quality of the annotation for each comment. It is based on the Worker Quality Score (WQS) and the Annotation Quality Score (AQS), a score that measures agreement on each possible answer across all comments.⁷ These metrics are co-dependent; hence, an iterative dynamic programming approach is used to compute them. They are also computed independently for each question in our task. We removed annotations by workers with a low WQS, as well as those with high disagreement, after manually checking their responses, and recomputed the metrics. Comments left with only one annotator were removed. This resulted in 5,823 comments with five or more annotators and 3,003 comments with fewer.
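For illustration, Equation 4.1 amounts to a WQS-weighted vote over workers. A minimal sketch, assuming the WQS values have already been computed by the iterative CrowdTruth procedure; the function name is ours.

```python
import numpy as np

def unit_annotation_score(work_vecs: np.ndarray, wqs: np.ndarray) -> np.ndarray:
    """Media Unit - Annotation Score (Equation 4.1) for one media unit.

    work_vecs: (n_workers, n_answers) binary matrix, row i = WorkVec(i, u).
    wqs:       (n_workers,) Worker Quality Scores.
    Returns one UAS value per possible answer.
    """
    weighted = wqs[:, None] * work_vecs        # weight each worker's votes
    return weighted.sum(axis=0) / wqs.sum()    # normalize by total quality
```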

FIGURE 4.2: Distribution for the UsVsThem scale. Values closer to 0 are more supportive towards the target group, while higher values indicate a higher degree of criticism or eventually discrimination.

Predictor       Sum of Squares   df     Mean Square   F          p      partial η²
(Intercept)     2582.47          1      2582.47       46 × 10³   .000
Groups          22.05            5      4.41          78.73      .000   .04
Bias            4.82             4      1.21          21.52      .000   .01
Groups × Bias   13.82            20     0.69          12.33      .000   .03
Error           492.63           8794   0.06

TABLE 4.3: Two-way ANOVA test.

4.2 Data Analysis

4.2.1 UsVsThem scale

For the Us vs. Them question, we aggregate the answers into a continuous scale, where 0 is Supportive, 1/3 Neutral, 2/3 Critical and 1 Discriminatory, weighting each answer by its CrowdTruth UAS (a sketch of this aggregation follows below). The distribution of this scale can be seen in Figure 4.2. From here on, we refer to it as the UsVsThem scale, to differentiate it from the Us vs. Them concept. Figure 4.2 shows that the scale is skewed, with an overall mean of .542 ± .246. Although our data selection was random across the selected news sources and groups, due to its nature and our keyword selection there are more comments with negative attitudes towards the selected groups than positive or neutral ones.
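A minimal sketch of this aggregation, assuming each comment comes with a UAS vector over the four answers (which sums to one when every worker picks a single answer); the names are ours.

```python
import numpy as np

# Scale values in the order Supportive, Neutral, Critical, Discriminatory.
ANSWER_VALUES = np.array([0.0, 1.0 / 3, 2.0 / 3, 1.0])

def usvsthem_score(uas: np.ndarray) -> float:
    """UAS-weighted average of the answer values on the [0, 1] scale."""
    return float((uas * ANSWER_VALUES).sum() / uas.sum())
```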

7 We encourage the reader to consult the equations and justifications for these metrics as provided in the original paper.

We perform a two-way ANOVA (Analysis of Variance) test (Fujikoshi, 1993) on news bias and social groups, in order to explore further differences. This test checks three null hypotheses:

1. There is no difference in group means at any level of the first independent variable.

2. There is no difference in group means at any level of the second independent variable.

3. The effect of one independent variable does not depend on the effect of the other independent variable.

In our case, the two independent variables are social group and news bias, and the dependent variable is the UsVsThem scale. Although ANOVA in principle assumes normality of the data, it is quite robust to deviations such as ours. We used an R (R Core Team, 2020) package (Stanley, 2018) to obtain Table 4.3 with the results of the test. One-way tests show significance, although the effect sizes differ; for bias, η² is only .01. Interestingly, there was a statistically significant interaction between the effects of social groups and bias on the UsVsThem scale, F(20, 8794) = 12.33, p < .001. This motivates us to explore simple effects between bias and social groups, since there are significant interactions to the effect that bias influences how each group is perceived.
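The thesis ran the test in R; an equivalent two-way ANOVA in Python, assuming a data frame with one row per comment and (hypothetical) columns scale, group and bias, might look as follows.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

def two_way_anova(df: pd.DataFrame) -> pd.DataFrame:
    """Two-way ANOVA of the UsVsThem scale on group, bias and their
    interaction (Type II sums of squares)."""
    model = ols("scale ~ C(group) * C(bias)", data=df).fit()
    return sm.stats.anova_lm(model, typ=2)
```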

Social Groups

There are differences between groups in terms of the UsVsThem scale. Looking at the distributions in Figure 4.3, each group shows a distinct pattern. For Refugees, the distribution is quite flat: they receive a similar amount of positive and negative attitude comments, with more supportive comments than other groups, such as Muslims or the political spectrum groups. Immigrants show a similar distribution, but with fewer comments at the higher end, i.e. they received less discrimination than Refugees. Although the two groups share many inherent similarities, these differences may be explained by negative media coverage portraying Refugees as a threat, attracting further negative attitudes. Muslims have a mean close to that of Conservatives, but their distribution is driven by a higher amount of discriminatory comments than any other group. Conservatives, by contrast, show a very similar mean due to a very high amount of comments around the critical area. Liberals also received a relatively high amount of critical comments, but not as many. Both political groups have rather thin tails, meaning that they receive less support but also less discrimination. This again makes sense, since discrimination is more commonly directed at minority groups than at political views. Finally, Jews as a community showed lower critical and discriminatory values, with most values around Neutral, slightly skewed towards critical, and the lowest mean of all social groups. These variations translate into significant differences between group means. Tukey HSD tests showed significant pairwise differences between all groups (p < 0.05), except for Conservatives and Muslims (p = .74) and Liberals and Refugees (p = .90).
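These pairwise comparisons can be reproduced with a standard Tukey HSD implementation; a short sketch with illustrative names:

```python
from statsmodels.stats.multicomp import pairwise_tukeyhsd

def tukey_pairwise(scores, groups, alpha=0.05):
    """Pairwise mean differences on the UsVsThem scale with
    Tukey-adjusted p-values."""
    return pairwise_tukeyhsd(endog=scores, groups=groups, alpha=alpha)
```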

FIGURE 4.3: Distribution for the UsVsThem scale per social group. The mean for the scale is shown on the x axis.

News Source Bias

As discussed in the previous section, we obtain the news source bias associated with the shared article to which the Reddit comment replies. This is a form of distant supervision: the bias is obtained from the news source, not the article itself, and serves as a reference. In Chapter 2, we mentioned examples of datasets using this approach for the task of political perspective in news sources. In this case, the bias is not directly associated with the comment; however, differences in the distribution of comments and the out-group attitude based on the news bias can be observed, as shown in Figure 4.4. Moreover, the mean is lowest for the left-center bias and increases progressively towards the more "ideological" biases on either side. Interestingly, there is no symmetry at the center bias, contrary to the Horseshoe Theory (Choat, 2017), which argues that both ends of the political spectrum closely resemble one another. We argue that this may be partly explained by our selection of social groups, which is explored in the next subsection. In terms of significant differences, we find that according to Tukey HSD, all biases are significantly distinct from the right bias (p ≤ 0.001), and there is a significant difference between left-center and right-center (p = .006). The remaining pairs show no significant difference.

FIGURE 4.4: Distribution for the UsVsThem scale per news source bias. The mean for the scale is shown on the x axis.

Groups and News Source Bias

To explore the interaction between groups and bias, we performed Tukey HSD tests across the two independent variables. Note that we do not report all values for these tests beyond mentioning statistical significance. Results are shown in Table 4.4, with each mean value per group and bias for comparison. We only tested for differences between biases, since we consider those more relevant to compare than significance between social groups. In line with the above-mentioned bias effect, there is almost always a significant difference between right and right-center bias and the rest for each group. The exceptions are Jews as a community, with no differences across biases, and Refugees, where the difference was only significant for right bias against all others. These results indicate that center, left-center and left share a similar attitude towards Conservatives, Liberals and Muslims: a more negative view of Conservatives, a somewhat positive attitude towards Liberals, and less criticism of Muslims compared to right-center and right, which hold the opposite view of these groups. Interestingly, for Immigrants, right-center bias shows a lower value than center, with no significant difference from the biases to its left. Only right bias shows a distinctly high value and a negative attitude towards Immigrants, even exceeding that towards Refugees, due to a high amount of discriminatory comments, as seen in Figure 4.5. With the exception of the attitude towards Conservatives, center, left-center and left show lower degrees of negative attitude towards all the social groups, and while they do not necessarily show higher levels of supportive comments, they also never show a high level of discriminatory ones. In contrast, right-center and in particular right show much higher amounts of discriminatory comments, particularly towards Muslims.

              Conservatives   Liberals        Immigrants         Refugees        Jews    Muslims
left          .651 (rc,r)     .532 (rc,r)     .477 (r)           .510 (r)        .447    .544 (rc,r)
left-center   .646 (rc,r)     .504 (rc,r)     .463 (r)           .513 (r)        .446    .543 (rc,r)
center        .636 (rc,r)     .489 (rc,r)     .510 (r)           .528 (r)        .458    .557 (rc,r)
right-center  .551 (c,lc,l)   .590 (c,lc,l)   .496 (r)           .549            .447    .651 (c,lc,l)
right         .527 (c,lc,l)   .622 (c,lc,l)   .621 (rc,c,lc,l)   .601 (c,lc,l)   .472    .664 (c,lc,l)

TABLE 4.4: Mean UsVsThem scale for each group and bias. Statistical significance is shown in parentheses where the mean is statistically different from other biases for that group: l left, lc left-center, c center, rc right-center, r right. Tested using a Tukey HSD test.

4.2.2 Emotions

Instead of using CrowdTruth for emotions, we consider an emotion to be present in a comment if at least 1/4 of annotators selected it. If more than half of the annotators marked the comment as Neutral, it was labeled Neutral. This way, a comment can contain more than one emotion, excluding Neutral. Unless specified otherwise, in this subsection we refer to Neutral as emotion-wise Neutral. We follow a procedure similar to Demszky et al., 2020, based on the Leave-One-Rater-Out PPCA algorithm (Cowen et al., 2019) with Bonferroni correction on p-values. Principal Preserved Component Analysis (PPCA) finds principal components that preserve, instead of the variance within a single dataset as in PCA, the cross-covariance between two datasets - in our case, the annotations of one rater and those of a random set of other raters. In this manner, we can assess the degree of agreement and whether all component dimensions are significant, indicating significant emotion dimensions to preserve. In our setup, the largest p-value for a dimension was 1.2e−03, with all other dimensions showing much smaller values. This supports keeping all of our emotion dimensions as significant. In Figure 4.6, we present the correlations between emotions in the same fashion as Demszky et al., 2020, with a hierarchical relation at the top showing which emotions interact more strongly with each other. Some expected correlations include Disgust and Contempt, both of which are also tied to Anger. On the other hand, Guilt and Sadness correlate with each other but much less with the other negative emotions, and also behave differently with respect to positive emotions. All other negative emotions show a negative correlation with Sympathy. Moreover, Hope and Pride, as well as Happiness and Gratitude, show similar correlation patterns as pairs, which is reflected in the hierarchical relations. Perhaps the most distinct positive emotion is Relief, which shows some degree of correlation with Guilt. In total, our annotation results include 4,848 comments with exactly one emotion, 2,317 with two, 1,274 with three and 392 with four or more. The distribution of emotions is as follows: Anger 2,264; Contempt 3,329; Disgust 2,335; Fear 1,466; Gratitude 94; Guilt 242; Happiness 83; Hope 438; Pride 225; Relief 57; Sadness 172; Sympathy 1,593; Neutral 2,634.
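A minimal sketch of the threshold rule described at the start of this subsection; the representation of annotations as per-rater label sets is our assumption.

```python
def aggregate_emotions(annotations, emotions):
    """Threshold-based emotion aggregation.

    annotations: one set of chosen labels per rater ('Neutral' included).
    emotions:    the 12 emotion labels.
    A comment is Neutral when more than half of the raters marked it so;
    otherwise every emotion chosen by at least a quarter of raters is kept.
    """
    n = len(annotations)
    if sum("Neutral" in a for a in annotations) > n / 2:
        return {"Neutral"}
    return {e for e in emotions
            if sum(e in a for a in annotations) >= n / 4}
```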

FIGURE 4.5: Distribution for the UsVsThem scale per social group and news source bias. The mean for the scale is shown on the x axis.

As suggested by these correlations, certain emotions co-occur more often. For instance, Anger, Contempt and Disgust co-occur in 553 comments, 127 of which also include Fear.

Emotions and the UsVsThem scale

We are interested in the interaction between emotions and social identity, exploring how the UsVsThem scale is shaped for each emotion. Not surprisingly, comments with negative emotions show higher values on the UsVsThem scale, indicating a clear relationship between negative emotions and negative attitudes towards a group. Contempt shows the strongest correlation with the UsVsThem scale, which is high for all negative emotions except Guilt and Sadness. This aligns with the idea that these two emotions are usually more introspective, targeting the subject of the comment rather than the out-group. In line with this, the UsVsThem scale correlates negatively with Sympathy and with Neutral comments. These results are visualized in Figure 4.6, which includes the UsVsThem scale.

FIGURE 4.6: Correlation heatmap over the different emotions.

Again, we used Tukey HSD for significance testing, with a significance level of 0.05, to compare the means. Despite other negative emotions showing a higher correlation, Fear had the highest average UsVsThem value, .725, with proportionally more comments in the Discriminatory range. Anger and Disgust also had means higher than .7, although the latter shows a higher concentration of values in the Critical range, similarly to Contempt. The emotion with the lowest average was Gratitude, .239, although it has a low frequency and no statistically significant difference from Sympathy, .267, or Pride, .252. Our results clearly illustrate how certain (groups of) emotions align with various aspects of social identity. Anger and Fear show a higher proportion of Discriminatory comments and, together with Contempt and Disgust, are the most frequent emotions in comments that display a more negative attitude towards social groups. On the other hand, while Sympathy is the most frequent positive emotion, and the one most present in comments showing some degree of support, the other positive emotions display a very similar relation to the UsVsThem scale. Finally, Guilt and Sadness behaved differently from the rest of the negative emotions, with a low amount of comments in the Discriminatory range and a concentration around Critical comments. These distributions are summarized in Figure 4.7.

FIGURE 4.7: UsVsThem scale over each emotion.

               Anger    Contempt   Disgust   Fear    Gratitude   Guilt   Happiness   Hope    Pride   Relief   Sadness   Sympathy   Neutral
Conservatives  28.6%    47.53%     41.87%    9.47%   1.33%       4.8%    1.33%       5.2%    4.27%   0.73%    2.33%     8.27%      23.93%
Liberals       22.99%   39.82%     29.15%    9.02%   1.35%       3.31%   0.9%        8.64%   3.38%   0.53%    1.8%      7.14%      32.23%
Jews           20.67%   26.27%     18.0%     9.47%   0.87%       2.2%    0.73%       2.73%   3.07%   0.87%    1.73%     24.8%      40.87%
Muslims        30.27%   39.13%     28.2%     25.0%   0.87%       2.4%    0.93%       2.53%   2.0%    0.27%    2.07%     17.47%     26.87%
Immigrants     24.4%    34.93%     20.27%    20.4%   1.2%        1.6%    1.07%       4.8%    1.87%   0.53%    1.73%     24.6%      30.0%
Refugees       26.6%    38.73%     21.47%    25.4%   0.8%        2.2%    0.67%       6.27%   0.8%    0.93%    2.0%      24.73%     25.33%

TABLE 4.5: Percentages of comments within each social group per emotion.

Emotions and Groups

There are significant differences between the emotions that social groups received on Reddit. Since all variables are discrete, we use a two-sided proportion z-test to check for significant differences. Fear was more prominent towards Muslims and Refugees, at 25% of comments with no significant difference between the two, followed by Immigrants at 20%; for the other social groups, Fear was expressed in less than 10% of comments. Another noticeable finding is that Contempt (47.5%) and Disgust (41.9%) were significantly higher towards Conservatives, particularly the latter, for which no other group exceeded 30% of comments. Similar to how support was distributed on the UsVsThem scale, Sympathy for Liberals (7.1%) and Conservatives (8.3%) was significantly low compared to other groups, followed by Muslims at 17.5%. Hope was present in a significantly higher number of comments for Liberals (8.6%). Furthermore, we observe other patterns that align with previous research on emotions and social identity, such as Pride being present more often in comments towards Conservatives (Salmela and Scheve, 2017). The specific values for all proportions can be found in Table 4.5.
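A minimal sketch of one such comparison using statsmodels, e.g. for the proportion of Fear comments in two groups; the function name and count arguments are illustrative placeholders.

```python
from statsmodels.stats.proportion import proportions_ztest

def compare_emotion_rates(count_a, n_a, count_b, n_b):
    """Two-sided z-test on the difference between two proportions."""
    stat, p_value = proportions_ztest(
        count=[count_a, count_b],   # comments with the emotion per group
        nobs=[n_a, n_b],            # total comments per group
        alternative="two-sided",
    )
    return stat, p_value
```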

Emotions and Bias

Similarly to the differences across social groups, we can explore how the annotated emotions vary with the bias of the news source that prompted the comment. We performed the same significance tests; all proportion values can be found in Appendix A, Table A.1. However, fewer differences were found between biases than between groups. The most salient was the right bias, which shows a higher value for all negative emotions, significantly so for Anger (31.9%), Contempt (43.6%) and Fear (21.1%).

4.3 Conclusion

In this chapter, we described the process of creating a dataset that tackles populist attitudes and their interaction with emotions, social identity and news source bias. We introduced the UsVsThem scale and, through the annotation results, found significant differences between social groups as well as biases on this scale. Moreover, we explored how emotions within social media comments interrelate with the attitude towards different groups. The PopulismVsReddit dataset enables us to implement Deep Learning models, as introduced in Chapter 3, revolving around the UsVsThem scale. It also allows us to use auxiliary tasks based on emotions and social group identification, which we address in Chapter 5.

Chapter 5

Modeling the Out-Group through Multi-task Learning

In Chapter 4, we presented a dataset with different levels of information that tackle the spread of Us vs. Them rhetoric, and discussed how it relates to social media, news source bias and emotions. The tasks this dataset involves, such as emotion identification, group identification or sentiment towards groups, play an important role within populist rhetoric and have been explored computationally independently from each other (Silva et al., 2016; Zhang et al., 2018). The information generated by the annotation procedure provided us with important insights into how attitudes towards social groups spread online, and into their relation to emotions. Populist rhetoric has been shown to be tied to different emotions (Tausch et al., 2011), whose presence differs between various types of populism (Salmela and Scheve, 2017). This corresponds with the nature of our Reddit dataset, where emotions towards each group showed statistically significant differences. For instance, Fear was more prominent towards Refugees or Muslims, while Pride showed higher values for Conservatives or Liberals. Moreover, there are differences in how emotions and groups relate to social identity and our UsVsThem scale. We find that putting these tasks together in a cohesive dataset is an important research step towards understanding and modeling populist rhetoric using modern NLP methods. Supervision for data annotation on populist-rhetoric-related tasks is an expensive and slow scientific procedure, while the spread of populist attitudes online has become cheaper and faster (Engesser et al., 2017). Hence, we design a set of models that use the information this new dataset provides to computationally tackle the task and, to a certain degree, help automate the process. In this chapter, we will train models on our new dataset, as presented in Chapter 4, both on the UsVsThem scale and as a binary classification task, to assess populist rhetoric towards different social groups. We will use the information from emotions and social groups as auxiliary tasks in a Multi-task Learning (MTL) setup, and show whether the bias they introduce in the neural network leads to an improved understanding of the tasks. We argue that a Transformer architecture, RoBERTa (Liu et al., 2019b), provides a satisfactory baseline on these tasks, and that incorporating emotion detection and group identification through MTL leads to a significant improvement, highlighting their relevance in populist rhetoric.

5.1 Tasks

5.1.1 Main Tasks

The main focus of our models is to assess the degree to which a social group is viewed as an out-group, in a negative or discriminatory manner. Our annotation procedure provided a scale from Supportive to Discriminatory for each comment. While this scale is artificial and highly dependent on the context of our task, it provides a good indication of how strongly a social group is targeted in social media comments.

Regression UsVsThem. In our models, we explore two different main tasks. The first is to predict the UsVsThem scale in a regression model, using the values previously created for this scale but excluding comments with a Media Unit Quality Score (UQS) lower than 0.2. The scale provides a score for each comment, illustrating the degree of attitude towards the social group mentioned in the comment, ranging from Supportive (closer to 0) to Discriminatory (closer to 1). Values in between depict an intermediate attitude: Neutral lies at 1/3 and Critical at 2/3. By predicting the score, we model the out-group attitude of each comment. We use 33% of the data as test set and 13.4% as validation set.

Classification UsVsThem. The second task is to classify each comment in a binary fashion by whether it shows negative sentiment towards a group, i.e. Critical or Discriminatory, or not, i.e. Neutral or Supportive. This task provides us with a model to distinguish when a comment shows a negative out-group attitude. Here, as explained in Chapter 4, we remove comments with a UQS lower than 0.45, resulting in a relatively balanced dataset with 56% of comments labeled Critical or Discriminatory. We use the same splits as before.

5.1.2 Auxiliary Tasks

Emotion detection. Emotion detection has previously been used as an auxiliary task, as discussed in the related work, and was employed in this thesis in Chapter 3. In previous work, emotion-specific datasets are used to learn auxiliary tasks that differ in nature from the dataset the main task is trained on. In this case, however, annotations for all tasks come from the same dataset of Reddit comments. In Chapter 3, we employed emotion detection in the context of political discourse, and populist rhetoric is closely related to political language. Beyond that, specific interactions between populist rhetoric and emotions further motivate their relevance to populist rhetoric and social identity. These have been explored in political psychology and the social sciences through surveys and behavioral experiments (Fischer and Roseman, 2007; Tausch et al., 2011; Salmela and Scheve, 2017; Wirz et al., 2018; Nguyen, 2019; Abadi et al., 2020), yet there is no precedent from a computational point of view. In the work of Alorainy et al., 2018, a computational approach is used to identify the emotions behind hateful tweets. While emotions were not considered as an auxiliary task there, they found that emotions such as Anger and Disgust were more present in hateful tweets. This is consistent with our findings in Chapter 4 and further motivates the use of emotions in the context of populist rhetoric. For each comment, emotions are annotated as a Boolean vector. In our task, some emotions were rarely annotated or were only present alongside more frequent emotions; they increased the difficulty of the task while not providing relevant information. To simplify the auxiliary task, we consider 8 labels: Anger, Contempt, Disgust, Fear, Hope, Pride, Sympathy and Neutral.

Group Identification. Our identification of the social groups within Reddit comments was keyword-based, and it may be a trivial task for a Transformer model. Nevertheless, we hypothesize that it may help the model in its main task, not by providing additional information, but through other aspects introduced by Caruana, 1993, such as representation bias. Taking social groups into account when modeling tasks such as hate speech or abusive language is not new. Burnap and Williams, 2016 differentiate types of hate speech by whether they are based on race, religion, etc., and train models specifically on those categories. ElSherief et al., 2018 present a data-driven analysis of online hate speech that explores in depth the differences between Directed and Generalized hate speech, and Silva et al., 2016 analyze the different targets of hate online. In our case, the Us vs. Them rhetoric metric shows significant differences for each group, as seen in the previous chapter. We therefore hypothesize that the information bias the groups provide will help the model understand the Us vs. Them rhetoric aimed at the different social groups, motivating group identification as an auxiliary task.

5.2 Methods

The approaches taken in this chapter closely resemble those of Chapter 3. We base our models on RoBERTa as the encoder. For our STL approach, we fine-tune the pretrained weights of the RoBERTa-base model provided by HuggingFace (Wolf et al., 2019) and train from scratch a classification head consisting of a single linear layer preceded by dropout. In this setup, we employ the Pytorch-Lightning library (Falcon, 2019) for our training procedure. While the training process mirrors the previous work, this time we use the logging, hyper-parameter search and early stopping methods included in the library, instead of coding them manually.
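A minimal sketch of such an STL model, not the exact thesis code; the class name and the use of the first-token representation are our assumptions.

```python
import torch.nn as nn
from transformers import RobertaModel

class STLClassifier(nn.Module):
    """RoBERTa-base encoder with a dropout + linear head trained from scratch."""

    def __init__(self, n_outputs: int, extra_dropout: float = 0.1):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-base")
        self.dropout = nn.Dropout(extra_dropout)
        self.head = nn.Linear(self.encoder.config.hidden_size, n_outputs)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]   # representation of the <s> token
        return self.head(self.dropout(cls))
```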

5.2.1 Multi-task Learning

For our MTL approach, we use hard parameter sharing, as in Chapter 3. In all setups, the first eleven Transformer layers of RoBERTa are shared across tasks, followed by a task-specific 12th layer for each task and a classification layer that uses the hidden representation of the <s> token to output a prediction. We use scheduled learning, where we weight the losses of each task and change the weights during training. We also experiment with a three-task model, where the two auxiliary tasks are learned simultaneously. We assign a loss weight to each task: λm for the main task (regression or binary classification), λe for emotion detection and λg for group identification. For MTL with one auxiliary task, λm + λe = λm + λg = 2, while for the three-task MTL, λm + λe + λg = 3.
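A minimal sketch of the hard-sharing layout, under the assumption that the task-specific 12th layers are initialized as copies of the pretrained 12th layer; the thesis code may differ in detail.

```python
import copy
import torch.nn as nn
from transformers import RobertaModel

class HardSharedMTL(nn.Module):
    """Eleven shared RoBERTa layers, one task-specific layer and head per task."""

    def __init__(self, task_outputs: dict):
        super().__init__()
        base = RobertaModel.from_pretrained("roberta-base")
        self.embeddings = base.embeddings
        self.shared = base.encoder.layer[:11]                 # shared layers
        self.task_layers = nn.ModuleDict(
            {t: copy.deepcopy(base.encoder.layer[11]) for t in task_outputs})
        self.heads = nn.ModuleDict(
            {t: nn.Linear(base.config.hidden_size, n)
             for t, n in task_outputs.items()})

    def forward(self, input_ids, task: str):
        x = self.embeddings(input_ids)
        for layer in self.shared:
            x = layer(x)[0]                    # shared Transformer blocks
        x = self.task_layers[task](x)[0]       # task-specific 12th layer
        return self.heads[task](x[:, 0])       # predict from the <s> token
```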

Regression UsVsThem. We used Mean Squared Error loss with a Sigmoid activation function for the main task. For emotion identification as the auxiliary task, we used Binary Cross-Entropy loss, and for group identification, Cross-Entropy loss, both with Sigmoid activation. For all MTL models, there was a warm-up period of ω epochs, after which the weights were changed to λg = 10⁻² and λe = 10⁻⁵, and to λe = λg = 10⁻⁵ for the three-task model.

Classification UsVsThem. We used Cross-Entropy loss with a Sigmoid activation function for the main task. The remaining tasks were kept the same as in the regression case above. For all MTL models, there was a warm-up period of ω epochs, after which the weights were changed to λg = 10⁻² and λe = 10⁻², and to λe = λg = 10⁻⁵ for the three-task model.
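A minimal sketch of this loss scheduling for the three-task classification setup. The main-task weight (kept fixed here) and the exact switching logic are our assumptions, since only the auxiliary weights are specified above.

```python
def combined_loss(losses, epoch, omega, warmup_weights, post_weights):
    """Weighted sum of task losses; auxiliary weights are reduced to
    marginal values once the warm-up period of `omega` epochs ends."""
    weights = warmup_weights if epoch < omega else post_weights
    return sum(weights[t] * losses[t] for t in losses)

# Three-task classification example: lambda_e = 0.25, lambda_g = 0.75 during
# warm-up (so lambda_m = 3 - 0.25 - 0.75 = 2.0), auxiliaries at 1e-5 afterwards.
warmup = {"main": 2.0, "emotion": 0.25, "group": 0.75}
after_warmup = {"main": 2.0, "emotion": 1e-5, "group": 1e-5}
```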

5.3 Experiments

Regression UsVsThem. We ran a grid search to find the optimal hyperparameters. For the learning rate, we tested {5e−05, 3e−05, 1e−05}. RoBERTa has a default dropout of 0.1 between layers, to which we add an extra dropout probability, trying {0, 0.05, 0.1, 0.15}. We used cosine learning rate decay with a linear warm-up period, testing warm-up periods of {0, 2, 4} epochs. To validate results, we used the Pearson r correlation with the gold scores of each set. The STL hyperparameters were chosen by performance on the validation set: a learning rate of 3e−05, a warm-up of 2 epochs and an extra dropout of 0.05. The batch size was 64, but we accumulated gradients over two steps, which is virtually equivalent to a batch size of 128 before the weights are updated. These hyperparameters were kept constant across our experiments for the Regression UsVsThem task and set the baseline for the STL setup. For the MTL approaches, the additional parameters were also determined through grid search while keeping the previously mentioned parameters fixed. The MTL-exclusive parameters are the λ weights of the losses before the auxiliary tasks are reduced to a marginal value (to give preference to the main task), as well as the number of warm-up epochs ω. For the emotion detection MTL setup, we tried λe = {0.2, 0.15, 0.05, 0.01} and ω = {3, 5, 8, 10}, settling on λe = 0.15 and ω = 8. For the groups MTL, we tried the same ranges, resulting in λg = 0.15 and ω = 5. For the three-task MTL model, we obtained favourable results by setting ω = 8 and λg = λe = 0.073, equivalent to λg = λe = 0.05 in the two-task MTL. Since this setting already improved results considerably on the validation set, no further hyperparameter tuning was performed for it. For all experiments, we averaged results over 10 random seeds. We performed significance testing using the Williams test (Williams, 1959), which evaluates the significance of the difference between correlations. This test takes into account the difference between the STL baseline predictions and the MTL model predictions, while considering the correlation between the two systems and not only their performance: the higher the correlation between the STL and MTL predictions, the greater the statistical power of the test.
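For reference, a sketch of the Williams test statistic in the form commonly used to compare system-gold correlations; here r12 is the correlation between the two systems' predictions, r13 and r23 their respective correlations with the gold scores, and n the number of test examples. Names and interface are ours.

```python
import numpy as np
from scipy.stats import t as t_dist

def williams_test(r12: float, r13: float, r23: float, n: int) -> float:
    """Two-sided p-value for the difference between two dependent
    correlations (r13 vs. r23) that share the gold variable."""
    k = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    num = (r13 - r23) * np.sqrt((n - 1) * (1 + r12))
    den = np.sqrt(2 * k * (n - 1) / (n - 3)
                  + ((r13 + r23) ** 2 / 4) * (1 - r12) ** 3)
    return 2 * t_dist.sf(abs(num / den), df=n - 3)
```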

Classification UsVsThem. Similarly, we ran a grid search to find the best hyperparameters for the classification setup. For the STL model, we tried the same parameters and, as a result, obtained a learning rate of 5e−05, a warm-up of 2 epochs and an extra dropout of 0.2.

              STL           MTL, Emotion   MTL, Group    MTL, Emotion & Group
Pearson R     .545 ± .005   .553 ± .009    .557 ± .012   .570 ± .009
Accuracy      .730 ± .008   .738 ± .005    .741 ± .005   .746 ± .007

TABLE 5.1: Results for the Us vs. Them rhetoric as regression and classification tasks. Significance compared to STL is bolded (p < 0.05). Significance compared to two-task MTL is underlined (p < 0.05). Average over 10 different seeds.

For the emotion-MTL, we tried λe = {0.25, 0.15, 0.05, 0.01} and ω = {3, 5, 8, 10}, and settled on λe = 0.25 and ω = 5. For the groups-related MTL experiments, we found that higher loss weights gave better performance on the validation set; hence we tried λg = {1, 0.75, 0.5, 0.25, 0.15} and ω = {3, 5, 8, 10}. The parameters selected were λg = 0.75 and ω = 5. For the three-task MTL, we kept the parameters that performed best for the two-task MTLs: λe = 0.25, λg = 0.75, and ω = 5.
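All of the searches above follow the same exhaustive pattern. A hypothetical skeleton is sketched below, where train_and_validate is a placeholder that fine-tunes one model with the given weights and returns its validation metric (Pearson R or accuracy).

    from itertools import product

    def grid_search(train_and_validate, lambdas=(0.25, 0.15, 0.05, 0.01),
                    omegas=(3, 5, 8, 10)):
        # Train one model per (lambda, omega) pair; keep the best on validation.
        scores = {(lam, om): train_and_validate(lam, om)
                  for lam, om in product(lambdas, omegas)}
        return max(scores, key=scores.get)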

5.4 Results

Results are presented in Table 5.1. We found that MTL is successful in both versions of our task.

Regression. The STL baseline showed a 0.545 Pearson R correlation with the gold scores on the regression task. When emotion identification was used as an auxiliary task, performance increased by almost one point, to 0.553. The groups MTL setup showed a larger increase, up to 0.557. Both of these improvements were significant compared to the STL model according to the Williams test. Perhaps more interesting is that the three-task MTL model achieved the highest performance of all setups, even though its hyperparameters were not tuned as extensively as those of the other setups. It reached a Pearson R of 0.570, an increase of more than 2 points, and displayed a significant improvement not only over the STL approach but also over the two remaining MTL setups.

Classification. Although not shown in the table, the accuracy baseline of a majority-class classifier would be 0.561. All models surpassed that by a wide margin, with the actual baseline, the STL setup, achieving an accuracy of 0.730. Results for the MTL approaches were similar to what we observed on the regression task. Emotion-MTL increased performance by almost one point, to 0.738, while group-MTL performed better, at 0.741, both significant according to the permutation test. Once again, the best-performing model was the three-task MTL, at 0.746.

5.5 Discussion

One of the issues with MTL, already discussed in Caruana, 1997, is determining how and why MTL works. In that article, Caruana shows how a data amplification effect, eavesdropping on other tasks, or an information bias, as discussed earlier in Caruana, 1993, play a role in MTL. Moreover, it is the information in the training signals of related tasks that helps in the sample tasks he proposes.

n   Reddit Comment

1   Does anyone else think it’s absurd that we have Muslims serving in our congress? Literal foreign agents acting as if they hold the interests of the country and its people at heart? They never talk about the will of the people. It’s always some bullshit about how white men (who founded and built this country) are evil and we need to let in more 3rd worlders who want to bomb and kill us. This is literal clown world tier nonsense.
    Label 1.000 | MTL, E. & G. .872 | MTL, Emo. .870 | MTL, Groups .847 | STL .759
    Group: Muslims | Emotions: Anger, Contempt, Disgust & Fear

2   I can’t believe this bullshit. It’s literally come down to picking between letting refugees sleep in your bed and fuck your wife and daughter or you’re a racist hate monger. Literally no point on the spectrum exists between the two ends.
    Label .920 | MTL, E. & G. .646 | MTL, Emo. .752 | MTL, Groups .376 | STL .655
    Group: Refugees | Emotions: Anger & Fear

3   As a legal immigrant, the newfound term "undocumented immigrant" annoys the heck out of me. They’re illegal aliens. Stop trying to sugarcoat it. It took me years to move here legally, and I resent those who chose to do it illegally. The process is long but it is fair. Come in through the front door, not the backdoor.
    Label .746 | MTL, E. & G. .661 | MTL, Emo. .530 | MTL, Groups .577 | STL .436
    Group: Immigrants | Emotions: Anger & Disgust

TABLE 5.2: Examples of predictions for each model for comments with high values on the UsVsThem scale where MTL models showed an improvement over STL. Predictions are averages over all 10 seeds for each model.

In this section, we have seen how MTL can help in complex tasks, with our metric being performance in identifying the Us vs. Them rhetoric, a task that poses a challenge even to human annotators. We also saw, by combining quantitative and qualitative analysis, that it helped in other political discourse tasks. While we do not aim to fully prove or identify why MTL works in these particular cases, we will examine the gained performance of each MTL model qualitatively, while also looking at how the information flow changed within the networks once auxiliary tasks were used.

5.5.1 Qualitative Analysis

Here, we focused on the regression task, as it expresses the complexity of each comment in more detail. We selected comments where MTL improved over the predictions of the STL baseline for comments with higher values on the scale. In Table 5.2, we show three cases with different degrees of a negative attitude towards different groups.

The first example shows a strongly emotional comment with all 4 negative emotions present. Its gold score is the highest value on the UsVsThem scale, indicating a discriminatory comment. All MTL models show better results than STL, which points to the relevance of both auxiliary tasks. Emotion detection seems to play an important role, as the two models using it show values closer to the gold score. In this case, both tasks seem to add information that leads to an improvement.

In the second comment, emotion identification again seems to show a higher relevance, to the point where the information or bias provided by the group task hurts performance on this patently discriminatory comment. The emotion-MTL, on the other hand, produced the prediction closest to the gold score.

Finally, in the third comment, we see that group-MTL plays an important role in the main task, where the identification of Immigrants as the target group leads to a better prediction than STL or emotion-MTL. Perhaps more relevant is that the combination of both tasks translates into a larger improvement, with the value closest to the gold score. The comment shows group-specific rhetoric with references to terms such as "illegal aliens", which may not be negative by itself, yet appears so for a comment scored between Critical and Discriminatory. At the same time, terms like "annoys" or "sugarcoat", and references to fairness, illustrate the emotional aspect of the comment, labeled as Anger and Disgust.

5.5.2 Error analysis

Based on our experiments, MTL helped to improve predictions; in most cases, predictions were slightly corrected, although in only a few cases did the correction come from a large change in the predicted value. The standard deviation of the difference between the STL and the three-task predictions was just .055. This means that MTL helped capture nuanced information that improved predictions; however, comments with high squared errors for STL still showed similar behavior for the MTL models. This is shown in Figure 5.1 as a correlation between the squared error measures of the two models. All models' squared errors showed a pairwise Pearson correlation higher than 0.92. Nonetheless, the mean squared error is still higher for the STL model, as demonstrated by the points further up from the y = x line in the figure (a sketch of these diagnostics is given below).

FIGURE 5.1: Squared error for STL and three-task MTL

This observation prompted us to investigate comments with a high squared error. We identified three different sources.
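As referenced above, the two diagnostics can be computed directly from the per-comment predictions; the array names below are illustrative placeholders.

    import numpy as np

    def error_diagnostics(stl_preds, mtl_preds, gold):
        """Spread of prediction differences and pairwise correlation of the
        per-comment squared errors, as discussed above."""
        diff_std = np.std(stl_preds - mtl_preds)
        sq_err_stl = (stl_preds - gold) ** 2
        sq_err_mtl = (mtl_preds - gold) ** 2
        err_corr = np.corrcoef(sq_err_stl, sq_err_mtl)[0, 1]
        return diff_std, err_corr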

Ambiguous and Challenging Comments. This error source can be expected in any task, particularly one this subjective and complex. Here, we found comments with emotionally charged language, slurs, or insults, which may often be associated with a more negative attitude towards a group, although they might be used ironically or satirically. In other cases there are negative or positive attitudes, but not towards the target group. In such situations, our model struggles to make a correct prediction, as it reads the wrong signals. This challenge seems to be most common among comments scored with values closer to 0 (Supportive or Neutral) but predicted to be closer to 1 (Discriminatory or Critical).

n   Reddit Comment

1   You proud of yourselves, making 3 year olds represent themselves in immigration court? You fucking proud of that insanity? All for the sake of keeping out a gang that has already been in America for a long time, meanwhile regular home grown white kids are murdering dozens of their own classmates but goddam, at least they we’re legal, amirite
    Label .02 | MTL, E. & G. .774 | MTL, Emo. .874 | MTL, Groups .740 | STL .834
    Group: Immigrants | Emotions: Sympathy

2   By every moral or ethical standard, it is your duty to refuse orders to defend the US from these migrants. History will look kindly upon you if you do. There are thousands, if not millions, of us who will support your decision to lay your weapons down.
    Label .17 | MTL, E. & G. .923 | MTL, Emo. .856 | MTL, Groups .884 | STL .83
    Group: Immigrants | Emotions: Sympathy & Hope

3   I was about to be shocked, until i thought about the god damn state of the world, the western world is at the moment at almost the same state, where at least a large minority wish the same thing of the Muslims. That and god damn people THERE IS MILLIONS OF MUSLIMS NOT EVERYONE THINKS THIS WAY!
    Label .099 | MTL, E. & G. .847 | MTL, Emo. .833 | MTL, Groups .882 | STL .815
    Group: Muslims | Emotions: Sympathy

TABLE 5.3: Examples of predictions for each model for ambiguous and challenging comments with low gold values on the UsVsThem scale, where all models over-predicted. Predictions are averages over all 10 seeds for each model.

Table 5.3 includes examples of such cases. The first example has emotive language and uses terms such as "gang" that can be negatively associated with Immigrants in discriminatory comments towards them. In this case, however, the comment is in support of Immigrants and criticizes the attitude of those supporting practices against them. Furthermore, the comment discusses school shootings using violent terms, such as "murdering". The third example shows a similar case, this time towards Muslims. The Reddit commenter shows some degree of support, by arguing that within any community there is only a small minority of hateful individuals. There is also missing context, which the models cannot account for: the comment was a reply to the headline Arabic Translator: Muslim Migrants Secretly Hate Christians, Seek to Outbreed Them.

Reference to multiple groups. We removed comments that included keywords from similar groups, but it is impossible to account for all the terms that may refer to other groups, or to groups that were not part of our annotation procedure. Hence, there are comments for which the prediction seems to target a group different from the one assigned at annotation time. The examples in Table 5.4 show how comments are prone to such errors. Liberals and Conservatives are found most often within this error source, due to their frequent co-occurrence as opposing sides in political discussions. Example 1 is critical of Conservatives, but the comment mentions Democrats explicitly, and the annotators were asked to label the comment with respect to Liberals. Thus, the low-valued label (.059), showing a supportive stance towards Liberals while being critical of Conservatives, seems to drive the models' predictions. Similarly, example 2 shows quite an emotionally charged comment towards Liberals, through terms such as "Commiefornia"; however, it includes a supportive mention of Conservatives. This contrast leads to a disparity between the gold score (.071) and the model predictions.

n   Reddit Comment

1   The Democrats are the ones preventing people? That’s funny. Who are the lawmakers in the state legislatures that are constantly scheming up roundabout ways to defund planned parenthood and completely outlaw abortion access, despite a large majority of Americans supporting at least some degree of abortion? Hint: they’re not Dems.
    Label .059 | MTL, E. & G. .78 | MTL, Emo. .75 | MTL, Groups .766 | STL .734
    Group: Liberals | Emotions: Sympathy

2   Conservatives have every right to revolt. If we don’t get our way we will destroy the country. I hope the left keeps pushing us to provoke a civil war. Or maybe Commiefornia should secede. Maybe that’s the best thing that can happen, a complete break up. That way we can have our ethnostate, and the left can have their degenerate cesspool without us paying taxes for it. The US is dead anyway. It’s time to burn this diverse shithole to the ground. It will be the ultimate proof that diversity doesn’t work.
    Label .071 | MTL, E. & G. .729 | MTL, Emo. .773 | MTL, Groups .747 | STL .8
    Group: Conservatives | Emotions: Hope & Pride

TABLE 5.4: Examples of predictions for each model on different target group errors. Predictions are averages over all 10 seeds for each model.

Annotation Error. In any crowd-sourced annotation, there is room for error and mislabeling. Removing annotations from unreliable or fraudulent annotators results in some comments having few annotations and thus becoming more error-prone. In other cases, some comments may simply have been hard to annotate, either because the annotator labeled them with another group in mind, or because comments were paraphrasing or meant sarcastically. While this error can have a similar nature to the previous ones (ambiguous and challenging comments, references to multiple groups), here the source of the error is the annotations themselves. While such errors were not frequent enough to pose a problem during the training process, they did surface in the form of seemingly incorrect model predictions that can hurt measured performance.

Others. Finally, it is important to point out that neural networks such as Transformer models are difficult to interpret, and identifying the source of their errors is not always feasible, or requires much more than qualitative analysis.

5.5.3 Information Flow

Our qualitative analysis showed that different auxiliary tasks had a positive effect on predictions, in particular for comments related to those auxiliary tasks. Still, qualitative analysis cannot explain how the model changes its underlying structure and its encoding of Reddit comments. In order to better understand why and how the MTL models changed their predictions, we explore how the auxiliary tasks affected the way the network encodes information through its layers, whether this is related to the auxiliary tasks in any way, or whether the improvement is not necessarily due to information or bias introduced by the auxiliary tasks.

There are different ways of exploring the hidden representations that the network generates at each layer. Probing mechanisms are a successful way of assessing the transformations of information occurring inside a network and how they affect the tasks at hand. For instance, Tenney, Das, and Pavlick, 2019 use a set of classifiers trained on the representations learned by BERT at each layer in order to check performance on a set of tasks, while also learning a set of layer weights, similar to attention, to establish which layers deserve more attention for each task.

In our case, we did not train any probing classifiers; instead, we used t-Distributed Stochastic Neighbor Embedding (t-SNE) (Maaten and Hinton, 2008), a stochastic dimensionality reduction technique focused on visualizing high-dimensional data in a smaller number of dimensions. We used it to visualize the hidden representations of the test set comments in between Transformer layers across the network. t-SNE reduces the hidden representations of the <s> token, which encode the information of the whole Reddit comment to be classified at the end of the network, to 2 dimensions. This way we can visualize how the information in each comment is encoded and how it relates to the main and auxiliary task scores (a sketch of this procedure is given at the end of this subsection).

In Figures 5.2, 5.3, A.1, A.2, 5.4 and 5.5, the results of this dimensionality reduction are shown as scatter plots. Colors represent the intensity of the UsVsThem scale, with red closer to 1 and blue closer to 0. Different shapes represent each group, while in Figures A.2 and 5.4 color represents emotions. Since each comment could be labeled with more than one emotion, we decided to average the colors of the emotions present.

These figures provide a better understanding of how the network processes information. For instance, in both the STL and the three-task MTL, the first layers show some structure not related to the tasks at hand. As we are using pre-trained weights from RoBERTa, this could be explained by the first layers modeling lower-level language characteristics. This was empirically shown in Tenney, Das, and Pavlick, 2019, where probing mechanisms indicate that early layers are more relevant for tasks such as POS tagging, which is consistent with the first plots in Figures 5.2 and 5.3. It is around layer 6 that the network seems to become shaped by the tasks being trained on. In the STL case, the UsVsThem scale is perceived as a continuous gradient along the y-axis. This is demonstrated very clearly at layers 10, 11 and 12, with lower y values showing higher values for the UsVsThem regression task, closer to 1 (red), and values continuously approaching 0 (blue) as y increases.
Also interesting is how, in the last layer, there seems to be an "arm", or separate cluster, at the right side with comments mainly about Liberals and Conservatives, despite the fact that this model was not trained on group identification. This points at differences between the minority social groups (Muslims, Jews, Immigrants and Refugees) and the political groups (Liberals and Conservatives) concerning the type of comments they receive online, captured here to some degree along the x-axis.

It is not surprising, then, to see how these differences are exacerbated once we introduce the auxiliary tasks of group identification and emotion classification. In Figure 5.3 there are very distinct clusters in layers 11 and 12 for each social group. In layer 11 we see how similar groups lie closer together, such as Refugees and Immigrants, or Liberals and Conservatives. The social group structure is preserved in the main task-specific layer. Additionally, there is a pattern regarding the UsVsThem scale, but this time it is not as clear or continuous as with STL. In Figure 5.5, the UsVsThem scale shows a central and radial progression, where both dimensions are relevant, while it seems to be influenced by the social group clustering. The closer to the center of the plot, the less discriminatory the comments. But instead of having a continuous scale from 1 (red) to 0 (blue), within each group cluster Discriminatory (closer to 1) and Supportive (closer to 0) comments are shown further away from the middle, while comments closer to the center (i.e., neutral attitudes towards groups) have rather pale red or blue shades.

Our understanding of this phenomenon is that comments closer to a neutral attitude show less strong emotion or valence towards a group, whereas distinctly supportive or discriminatory comments show a stronger intensity. While this is not learned in the same way by the STL model, the three-task MTL model learns to identify this phenomenon, thanks to distinguishing between groups and emotions. This idea is supported by Figure 5.4, where emotionally neutral comments lie closer to the center of the plot, while more emotionally laden comments appear radially further away. In Appendix A, Figures A.2 and A.1 present the emotion distributions for the group- and emotion-specific layers, respectively.
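As a reference for the visualization procedure described above, the following sketch reduces the <s> token representations of one layer to two dimensions and colors them by the UsVsThem scale. It assumes the hidden states have been collected per layer; the names are illustrative.

    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    def plot_layer(s_token_states, usvsthem_scores, layer):
        """s_token_states: array of shape (num_comments, hidden_size) holding
        the <s> token representation after the given Transformer layer."""
        coords = TSNE(n_components=2, random_state=0).fit_transform(s_token_states)
        plt.scatter(coords[:, 0], coords[:, 1], c=usvsthem_scores,
                    cmap="coolwarm", s=8)  # red close to 1, blue close to 0
        plt.colorbar(label="UsVsThem scale")
        plt.title(f"Layer {layer}")
        plt.show()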

5.6 Conclusion

In this final experimental chapter, we presented a set of models to tackle populist rhetoric and social identity jointly with emotions and group identification. Through Multi-task Learning (MTL) we obtained a significant improvement in assessing attitudes towards social groups expressed in the context of online news and social media. Our findings demonstrate that emotions play an important role, in particular when the model learns to identify the targeted social group. While the task and dataset we introduced are related to previous research, such as hate speech detection, we find it crucial to contextualize the attitude towards particular groups online by means of auxiliary tasks, such as emotion detection. Our work may not be the first to introduce emotions in the context of populist rhetoric and social identity; however, it is one of the first to do so with a computational and data-focused approach, using modern NLP models such as RoBERTa, and providing a dataset with different levels of information on how populist rhetoric spreads online.

FIGURE 5.2: Hidden representations at each layer of the Transformer model for the single task model. Red represents a value closer to 1 in the UsVsThem scale and blue closer to 0.

FIGURE 5.3: Hidden representations at each layer of the Transformer model for the three-task MTL. The last plots show the task specific Transformer layer output. Red represents a value closer to 1 in the UsVsThem scale and blue closer to 0.

FIGURE 5.4: Hidden representations for the three-task MTL main task specific layer.

FIGURE 5.5: Hidden representations for the three-task MTL main task specific layer.


Chapter 6

Conclusion

This thesis focused on the use of computational methods from Natural Language Processing (NLP) to model populist rhetoric. As explained in Chapter 2, populism can be considered a thin ideology, as well as a communication strategy, or rhetoric. Due to the lack of existing approaches to tackle populism from a computational standpoint, we first focused on political discourse. We explored how political discourse is used in social media and news sources through various tasks, such as political perspective detection.

An important keystone of our work is the use of Multi-task Learning (MTL). In Chapter 3, we examined the tasks of political affiliation, political perspective and framing by applying emotion and metaphor identification as auxiliary tasks. Both provided a significant improvement on the main tasks. However, this does not only serve as a practical mechanism to improve performance. Our models showcase the relation of emotions and metaphorical meaning to political discourse, and through qualitative analysis of the predictions we hint at how political discourse aims at creating an engaging response in the communication receiver, often by inducing emotions or using metaphorical senses.

A focal point of populist rhetoric is social identity, an aspect exemplified by the Us vs. Them notion and often explored from the political psychology and social science perspectives. In Chapter 4, we presented the PopulismVsReddit dataset, one of the first datasets to target the out-group concept present in social identity, in order to understand populist rhetoric and its online spread. The results of the crowd-sourced annotation showed significant differences in how social groups are perceived in social media and the emotions they evoke in response to online news. Moreover, we noted significant differences in how the bias of the news source elicits the attitude towards certain groups, especially in the case of right-wing biased news sources.

All preceding chapters converge in Chapter 5, where we use the PopulismVsReddit dataset to model what we refer to as the UsVsThem scale. Moreover, we employed the same MTL techniques as in Chapter 3, using the PopulismVsReddit dataset for all tasks. Using emotion classification or group identification as auxiliary tasks showed a significant improvement in performance. The most relevant result is that using both auxiliary tasks simultaneously significantly improved over all previous results.

While MTL has been used extensively in NLP, it is less frequently employed in high-level tasks. In our work, we showed how it helps to capture complex tasks by learning to identify emotions or social groups. To better understand how MTL affected the networks and why it leads to an improvement, we presented a series of analyses of the predictions, the errors and the representations learned throughout the network. While these insights do not exhaustively cover all aspects of how MTL introduces an information bias from the auxiliary tasks, we empirically showed how the information encoded by the network was tied to the auxiliary tasks and why it may lead to a better understanding of Us vs. Them rhetoric.

6.1 Future Work

A new dataset opens up new approaches and tasks based on the information it provides. In this thesis, we focused on the UsVsThem scale and its relation to groups and emotions; however, there is much more information that can be utilized. Our dataset includes Reddit comments, their corresponding news sources, as well as the bias of those sources. As we have shown in our work, current systems can model the information within a news article in terms of political language. News articles could be used in combination with the UsVsThem scale to provide context for exploring the interaction between news sources and the attitude towards social groups. Moreover, there is further unexplored Reddit data in our work, such as comments across threads, users and sub-Reddits. Graph Neural Networks have shown promising results in encoding social information and could be used to explore how populist rhetoric spreads online.

In this thesis, we have employed several MTL approaches, which were successful in providing new information to the networks in order to model tasks related to political discourse and populist rhetoric. While we explored the reasons behind these improvements, our work only scratched the surface in this respect. Explainability for DL and Transformer-based models is still lacking, but new and existing approaches have been successful in providing insights. Socially relevant tasks such as those presented in this work demand answers on how these models work and make predictions.

MTL can also benefit from a better understanding of its training process, especially when aggregating loss signals. In our work, we found scheduled learning to be successful, while other setups we tried, such as GradNorm (Chen et al., 2018), did not yield positive results. Understanding why and how these methods work, and exploring their relation to the training data and tasks, is a research path yet to be fully explored.

6.2 Social Impact and Responsible Use

The dataset described in Chapter 4 and used in Chapter 5 has been created with the intention of monitoring and assessing social identity within social media and its interactions with news sources and their corresponding bias. This dataset contains different levels of information, such as the text of news articles, comments submitted to sub-Reddits, submitted news articles, Reddit users who posted news articles or wrote comments, as well as the results of the annotation procedure. All this information can be used in creative ways, and we encourage others to do so. However, we point out the responsibility that comes with the use of our dataset and its social data.

The UsVsThem scale is an artificial construct, hence a simplification of the concept of Us vs. Them rhetoric, and therefore sensitive to biases in the data, the annotation procedure, and the respondents themselves. We find this construct useful to model the degree to which a social media comment supports or antagonizes a social group, a concept closely tied to populist rhetoric through the Us vs. Them notion and social identity theory.

We provide a transparent explanation of how the data was obtained, so that its nature and pitfalls can be understood. Please be aware that it is just a sample of negative and positive attitudes towards certain social groups, resulting from a very specific annotation procedure; these social groups do not only receive the attitudes described in this dataset. Our data analyses as conducted in Chapter 4 are useful and necessary; however, they do not paint the overall picture of reality, especially with regard to populist rhetoric and hatred towards minorities. The models trained in Chapter 5 using this data provide a systematic way of identifying the attitudes described in the dataset, but they alone do not solve the issue of monitoring populist rhetoric online.

This work was conducted mostly from an Artificial Intelligence and Natural Language Processing perspective; however, in order to understand the issues at hand, various research fields are essential. Ultimately, our dataset embodies a scientific tool meant for the entire research community, and we encourage multidisciplinary approaches to utilizing it.

6.3 Publications

The work in this thesis has been published in two different conference papers, at Findings of EMNLP 2020 (Huguet Cabot et al., 2020) and EACL 2021 (Huguet Cabot et al., 2021).

6.4 Funding statement

This research was funded by the European Union's H2020 project Democratic Efficacy and the Varieties of Populism in Europe (DEMOS) under H2020-EU.3.6.1.1. and H2020-EU.3.6.1.2. (grant agreement ID: 822590).


List of Figures

3.1 Schematics of the MTL model. The left side shows the path for longer documents from the Political Perspective in News dataset, while the right side is the path for the rest of the datasets and the auxiliary tasks. ...... 17
3.2 Average performance across the political spectrum for the Political Affiliation task. Dimension taken from Voteview. ...... 19

4.1 Number of annotations per emotions and the inter-rater correlation. . 29 4.2 Distribution for the UsVsThem scale. Values closer to 0 are more sup- portive towards the target group, while higher values indicate a higher degree of criticism or eventually discrimination...... 30 4.3 Distribution for the UsVsThem scale per social group. The mean for the scale is shown at the x axis...... 32 4.4 Distribution for the UsVsThem scale per social group. The mean for the scale is shown at the x axis...... 33 4.5 Distribution for the UsVsThem scale per social group. The mean for the scale is shown at the x axis...... 35 4.6 Correlation heatmap over the different emotions...... 36 4.7 UsVsThem scale over each emotion...... 37

5.1 Squared error for STL and three-task MTL ...... 45 5.2 Hidden representations at each layer of the Transformer model for the single task model. Red represents a value closer to 1 in the UsVsThem scale and blue closer to 0...... 50 5.3 Hidden representations at each layer of the Transformer model for the three-task MTL. The last plots show the task specific Transformer layer output. Red represents a value closer to 1 in the UsVsThem scale and blue closer to 0...... 51 5.4 Hidden representations for the three-task MTL main task specific layer. 52 5.5 Hidden representations for the three-task MTL main task specific layer. 53

A.1 Hidden representations for the three-way MTL emotion specific layer. 77 A.2 Hidden representations for the three-way MTL group identification specific layer...... 78


List of Tables

3.1 Dataset contents...... 16 3.2 Accuracy scores for the main political tasks. Significance compared to STL is bolded (p < 0.05)...... 18 3.3 Accuracy validation scores for the main political tasks...... 18 3.4 Political perspective (1) and framing (2, 3) examples of metaphor- MTL improving over STL. Underlined are words predicted as metaphor- ical...... 19 3.5 Proportion of posts predicted for each emotion, using the best-performing emotion-MTL model...... 20 3.6 Average F1 for each class and task...... 20 3.7 Examples where emotion-MTL improved the predictions over STL. .. 21 3.8 Average accuracy values across different policies for Framing...... 21

4.1 Keywords used in our data filtering process. The use of more loaded terms is justified by their low occurrence compared to more common terms just to ensure a more diverse dataset...... 25 4.2 Events and periods used for each group. If comments were not suffi- cient, they were sampled randomly from other time ranges. Refugees did not have enough overall comments to be filtered by time range. .. 26 4.3 Two-way ANOVA test...... 30 4.4 Mean UsVsThem Regression scale for each group and bias. Statistical significance is shown as superindexes, in case the mean is statistically different with other biases for that group. l left, lc left-center, c center, rc right-center, r right. Tested using Tukey HSD test...... 34 4.5 Percentages of comments within each social group per emotion. .... 37

5.1 Results for the Us vs. Them rhetoric as regression and classification tasks. Significance compared to STL is bolded (p < 0.05). Significance compared to two-task MTL is underlined (p < 0.05). Average over 10 different seeds. ...... 43
5.2 Examples of predictions for each model for comments with high values on the UsVsThem scale where MTL models showed an improvement over STL. Predictions are averages over all 10 seeds for each model. ...... 44
5.3 Examples of predictions for each model for ambiguous and challenging comments with low gold values on the UsVsThem scale, where all models over-predicted. Predictions are averages over all 10 seeds for each model. ...... 46
5.4 Examples of predictions for each model on different target group errors. Predictions are averages over all 10 seeds for each model. ...... 47

A.1 Percentages of comments within bias in the news source per emotion. 76


Bibliography

Abadi, D. (2017). Negotiating Group Identities in Multicultural Germany: The Role of Mainstream Media, Discourse Relations, and Political Alliances. Communication, Glob- alization, and Cultural Identity. Lexington Books. ISBN: 9781498557016. URL: https: //rowman.com/ISBN/9781498557009/Negotiating- Group- Identities- in- Multicultural - Germany - The - Role - of - Mainstream - Media - Discourse - Relations-and-Political-Alliances. Abadi, David et al. (2016). “Leitkultur and discourse hegemonies: German main- stream media coverage on the integration debate between 2009 and 2014”. In: In- ternational Communication Gazette 78.6, pp. 557–584. DOI: 10.1177/1748048516640214. eprint: https://doi.org/10.1177/1748048516640214. URL: https://doi.org/ 10.1177/1748048516640214. Abadi, David et al. (2020). Socio-Economic or Emotional Predictors of Populist Attitudes across Europe. DOI: 10.31234/osf.io/gtm65. URL: psyarxiv.com/gtm65. Abdul-Mageed, Muhammad and Lyle Ungar (July 2017). “EmoNet: Fine-Grained Emotion Detection with Gated Recurrent Neural Networks”. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL) (Volume 1: Long Papers), pp. 718–728. DOI: 10.18653/v1/P17-1067. URL: https://www. aclweb.org/anthology/P17-1067. Albertson, Bethany (Mar. 2014). “Dog-Whistle Politics: Multivocal Communication and Religious Appeals”. In: Political Behavior 37. DOI: 10.1007/s11109- 013- 9265-x. URL: https://link.springer.com/article/10.1007/s11109-013- 9265-x. AllSides Media Bias Ratings. URL: https://www.allsides.com/media-bias/media- bias-ratings. Alm, Cecilia Ovesdotter, Dan Roth, and Richard Sproat (2005). “Emotions from Text: Machine Learning for Text-Based Emotion Prediction”. In: Proceedings of the Con- ference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT-EMNLP). Vancouver, British Columbia, Canada, pp. 579586. DOI: 10 . 3115 / 1220575 . 1220648. URL: https : / / doi . org / 10 . 3115 / 1220575 . 1220648. Alonso-Muñoz, Laura (2019). “The More is more effect: a comparative analysis of the political agenda and the strategy on Twitter of the European populist parties”. In: European Politics and Society 0.0, pp. 1–15. DOI: 10. 1080 / 23745118. 2019 . 1672921. eprint: https://doi.org/10.1080/23745118.2019.1672921. URL: https://doi.org/10.1080/23745118.2019.1672921. Alorainy, Wafa et al. (2018). “Suspended Accounts: A Source of Tweets with Disgust and Anger Emotions for Augmenting Hate Speech Data Sample”. In: 2018 Inter- national Conference on Machine Learning and Cybernetics (ICMLC). Vol. 2, pp. 581– 586. URL: https://ieeexplore.ieee.org/document/8527001. Bahdanau, Dzmitry, Kyung Hyun Cho, and Yoshua Bengio (2015). “Neural machine translation by jointly learning to align and translate”. In: 3rd International Confer- ence on Learning Representations, ICLR 2015 - Conference Track Proceedings. arXiv: 1409.0473. 64 Bibliography

Bandhakavi, Anil et al. (2017). “Lexicon based feature extraction for emotion text classification”. In: Pattern Recognition Letters 93, pp. 133–142. ISSN: 01678655. DOI: 10.1016/j.patrec.2016.12.009. Barberá, Pablo et al. (2015). “Tweeting From Left to Right: Is Online Political Com- munication More Than an Echo Chamber?” In: Psychological Science 26.10. PMID: 26297377, pp. 1531–1542. DOI: 10.1177/0956797615594620. eprint: https:// doi . org / 10 . 1177 / 0956797615594620. URL: https : / / doi . org / 10 . 1177 / 0956797615594620. Baumgartner, Jason et al. (2020). The Pushshift Reddit Dataset. arXiv: 2001 . 08435 [cs.SI]. Beigman Klebanov, Beata, Daniel Diermeier, and Eyal Beigman (2008). “Lexical co- hesion analysis of political speech”. In: Political Analysis 16.4 SPEC. ISS. Pp. 447– 463. ISSN: 10471987. DOI: 10.1093/pan/mpn007. URL: http://www.jstor.org/ stable/25791949. Beigman Klebanov, Beata et al. (2016). “Semantic classifications for detection of verb metaphors”. In: Proceedings of the 54th Annual Meeting of the Association for Com- putational Linguistics (ACL) (Volume 2: Short Papers), pp. 101–106. DOI: 10.18653/ v1/P16-2017. URL: https://www.aclweb.org/anthology/P16-2017. Bougher, Lori D. (2012). “The Case for Metaphor in Political Reasoning and Cog- nition”. In: Political Psychology 33.1, pp. 145–163. ISSN: 0162895X, 14679221. URL: http://www.jstor.org/stable/41407025. Buechel, Sven and Udo Hahn (Apr. 2017). “EmoBank: Studying the Impact of Anno- tation Perspective and Representation Format on Dimensional Emotion Analy- sis”. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. Valencia, Spain: Association for Computational Linguistics, pp. 578–585. URL: https://www.aclweb.org/ anthology/E17-2092. Bulat, Luana, Stephen Clark, and Ekaterina Shutova (2017). “Modelling metaphor with attribute-based semantics”. In: Proceedings of the 15th Conference of the Euro- pean Chapter of the Association for Computational Linguistics (EACL) (Volume 2: Short Papers), pp. 523–528. URL: https://www.aclweb.org/anthology/E17-2084.pdf. Burnap, Peter and Matthew Leighton Williams (2016). “Us and them: identifying cy- ber hate on Twitter across multiple protected characteristics”. In: EPJ Data Science 5.1, p. 11. URL: http://orca-mwe.cf.ac.uk/88072/. Card, Dallas et al. (2015). “The Media Frames Corpus: Annotations of Frames Across Issues”. In: Proceedings of the 53rd Annual Meeting of the Association for Computa- tional Linguistics and the 7th International Joint Conference on Natural Language Pro- cessing (ACL-IJCNLP) (Volume 2: Short Papers), pp. 438–444. DOI: 10.3115/v1/ P15-2072. URL: https://www.aclweb.org/anthology/P15-2072. Caruana, Rich (1997). “Multitask Learning”. In: Machine Learning 28.1, pp. 41–75. DOI: 10.1023/A:1007379606734. URL: https://doi.org/10.1023/A:1007379606734. Caruana, Richard (1993). “Multitask Learning: A Knowledge-Based Source of Induc- tive Bias”. In: Proceedings of the Tenth International Conference on Machine Learning. Morgan Kaufmann, pp. 41–48. URL: http://citeseerx.ist.psu.edu/viewdoc/ summary?doi=10.1.1.57.3196. Castanho Silva, Bruno et al. (Mar. 2019). “An Empirical Comparison of Seven Pop- ulist Attitudes Scales”. In: Political Research Quarterly, p. 106591291983317. DOI: 10.1177/1065912919833176. URL: https://doi.org/10.1177/1065912919833176. Cer, Daniel et al. (2018). “Universal Sentence Encoder”. In: CoRR abs/1803.11175. arXiv: 1803.11175. URL: http://arxiv.org/abs/1803.11175. 

Charteris-Black, Jonathan (2009). “Metaphor and Political Communication”. In: Metaphor and Discourse. Ed. by Andreas Musolff and Jörg Zinken. London: Palgrave Macmil- lan UK, pp. 97–115. ISBN: 978-0-230-59464-7. DOI: 10.1057/9780230594647_7. URL: https://doi.org/10.1057/9780230594647_7. Chen, Zhao et al. (2018). “GradNorm: Gradient normalization for adaptive loss bal- ancing in deep multitask networks”. In: 35th International Conference on Machine Learning, ICML 2018. ISBN: 9781510867963. arXiv: 1711.02257. Cho, Kyunghyun et al. (2014). “Learning phrase representations using RNN encoder- decoder for statistical machine translation”. In: EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. ISBN: 9781937284961. DOI: 10.3115/v1/d14-1179. arXiv: 1406.1078. Choat, Simon (2017). “Horseshoe theoryis nonsenseThe far right and far left have little in common”. In: The Conversation 12. URL: https://theconversation.com/ horseshoe - theory - is - nonsense - the - far - right - and - far - left - have - little-in-common-77588. Citron, Francesca MM and Adele E Goldberg (2014). “Metaphorical sentences are more emotionally engaging than their literal counterparts”. In: Journal of cognitive neuroscience 26.11, pp. 2585–2595. URL: https://www.mitpressjournals.org/ doi/pdf/10.1162/jocn_a_00654. Collobert, Ronan and Jason Weston (2008). “A unified architecture for natural lan- guage processing: Deep neural networks with multitask learning”. In: Proceed- ings of the 25th International Conference on Machine Learning. ISBN: 9781605582054. URL: https://dl.acm.org/doi/10.1145/1390156.1390177. Conover, Michael D. et al. (2011). “Predicting the Political Alignment of Twitter Users”. In: 2011 IEEE Third Int’l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int’l Conference on Social Computing, pp. 192–199. URL: https: //ieeexplore.ieee.org/document/6113114. Cowen, Alan S. et al. (2019). “The primacy of categories in the recognition of 12 emotions in speech prosody across two cultures”. In: Nature Human Behaviour 3.4, pp. 369–382. DOI: 10.1038/s41562-019-0533-6. URL: https://doi.org/ 10.1038/s41562-019-0533-6. Danisman, Taner and Adil Alpkocak (2008). “Feeler: Emotion classification of text using vector space model”. In: AISB 2008 Convention: Communication, Interaction and Social Intelligence - Proceedings of the AISB 2008 Symposium on Affective Lan- guage in Human and Machine. ISBN: 1902956613. URL: https://api.semanticscholar. org/CorpusID:17478953. Dankers, Verna et al. (Nov. 2019). “Modelling the interplay of metaphor and emotion through multitask learning”. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics, pp. 2218–2229. DOI: 10 . 18653 / v1 / D19 - 1227. URL: https://www.aclweb.org/anthology/D19-1227. Demertzis, Nicolas (2006). “Emotions and populism”. In: Emotion, politics and society. Springer, pp. 103–122. URL: https://link.springer.com/chapter/10.1057/ 9780230627895_7. Demszky, Dorottya et al. (2020). GoEmotions: A Dataset of Fine-Grained Emotions. arXiv: 2005.00547 [cs.CL]. Devlin, Jacob et al. (2018). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. In: CoRR abs/1810.04805. arXiv: 1810 . 04805. URL: http://arxiv.org/abs/1810.04805. 66 Bibliography

Dumitrache, Anca et al. (2018). CrowdTruth 2.0: Quality Metrics for Crowdsourcing with Disagreement. arXiv: 1808.06080 [cs.HC]. Edwards, Derek (1999). “Emotion Discourse”. In: Culture & Psychology 5.3, pp. 271– 291. DOI: 10 . 1177 / 1354067X9953001. eprint: https : / / doi . org / 10 . 1177 / 1354067X9953001. URL: https://doi.org/10.1177/1354067X9953001. Ekman, Paul (1992). “An Argument for Basic Emotions”. In: Cognition and Emotion. ISSN: 14640600. DOI: 10.1080/02699939208411068. ElSherief, Mai et al. (2018). “Hate lingo: A target-based linguistic analysis of hate speech in social media”. In: Twelfth International AAAI Conference on Web and Social Media. URL: https://www.aaai.org/ocs/index.php/ICWSM/ICWSM18/paper/ viewPaper/17910. Engesser, Sven et al. (2017). “Populism and social media: how politicians spread a fragmented ideology”. In: Information, Communication & Society 20.8, pp. 1109– 1126. DOI: 10.1080/1369118X.2016.1207697. eprint: https://doi.org/10. 1080/1369118X.2016.1207697. URL: https://doi.org/10.1080/1369118X. 2016.1207697. Entman, Robert M. (1993). “Framing: Toward Clarification of a Fractured Paradigm”. In: Journal of Communication 43.4, pp. 51–58. DOI: 10.1111/j.1460-2466.1993. tb01304.x. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/j. 1460-2466.1993.tb01304.x. URL: https://onlinelibrary.wiley.com/doi/ abs/10.1111/j.1460-2466.1993.tb01304.x. Falcon, WA (2019). “PyTorch Lightning”. In: GitHub. URL: https://github.com/ PyTorchLightning/pytorch-lightning. Ferrari, Federica (2007). “Metaphor at work in the analysis of political discourse: investigating a ‘preventive war’ persuasion strategy”. In: Discourse & Society 18.5, pp. 603–625. DOI: 10.1177/0957926507079737. eprint: https://doi.org/10. 1177/0957926507079737. URL: https://doi.org/10.1177/0957926507079737. Figar, Vladimir (Jan. 2014). “Emotional Appeal of Conceptual Metaphors of Conflict in the Political Discourse of Daily Newspapers”. In: Facta Universitatis: Series Lin- guistics and LIterature 12, pp. 43–61. URL: https://papers.ssrn.com/sol3/ papers.cfm?abstract_id=2496007. Fischer, Agneta H. and Ira J. Roseman (2007). “Beat them or ban them: the char- acteristics and social functions of anger and contempt.” In: Journal of personality and social psychology 93 1, pp. 103–15. URL: https : / / dare . uva . nl / search ? identifier=ff65a107-a21b-491c-931b-754d3cbb4234. Flusberg, Stephen J., Teenie Matlock, and Paul H. Thibodeau (2018). “War metaphors in public discourse”. In: Metaphor and Symbol 33.1, pp. 1–18. URL: https://www. tandfonline.com/doi/full/10.1080/10926488.2018.1407992. Fujikoshi, Yasunori (1993). “Two-way ANOVA models with unbalanced data”. In: Discrete Mathematics 116.1, pp. 315 –334. ISSN: 0012-365X. DOI: https://doi. org/10.1016/0012-365X(93)90410-U. URL: http://www.sciencedirect.com/ science/article/pii/0012365X9390410U. Gao, Ge et al. (2018). “Neural Metaphor Detection in Context”. In: CoRR abs/1808.09653. arXiv: 1808.09653. URL: http://arxiv.org/abs/1808.09653. Gutiérrez, E. Dario et al. (2016). “Literal and Metaphorical Senses in Compositional Distributional Semantic Models”. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL) (Volume 1: Long Papers). URL: https://www.aclweb.org/anthology/P16-1018.pdf. Götz, Thomas et al. (Jan. 2005). “Emotional intelligence in the context of learning and achievement”. In: Emotional intelligence: An international handbook. Ed. by Schulze Bibliography 67

& R. D. Roberts. Cambridge, MA: Hogrefe & Huber Publishers, pp. 233–253. URL: https://psycnet.apa.org/record/2005-06828-000. Hameleers, Michael and Rens Vliegenthart (2020). “The Rise of a Populist Zeitgeist? A Content Analysis of Populist Media Coverage in Newspapers Published be- tween 1990 and 2017”. In: Journalism Studies 21.1, pp. 19–36. DOI: 10 . 1080 / 1461670X.2019.1620114. eprint: https://doi.org/10.1080/1461670X.2019. 1620114. URL: https://doi.org/10.1080/1461670X.2019.1620114. Hartmann, Mareike et al. (2019). “Issue Framing in Online Discussion Fora”. In: CoRR abs/1904.03969. arXiv: 1904.03969. URL: http://arxiv.org/abs/1904. 03969. Hawkins, Kirk A (2009). “Is Chávez Populist?: Measuring Populist Discourse in Comparative Perspective”. In: Comparative Political Studies 42.8, pp. 1040–1067. DOI: 10.1177/0010414009331721. URL: https://doi.org/10.1177/0010414009331721. Hawkins, Kirk A. and Bruno Castanho Silva (2015). “Mapping Populist Parties in Europe and the Americas”. In: URL: https://populism.byu.edu/App_Data/ Publications/Hawkins_Silva_Provo_January.25.pdf. Hawkins, Kirk A. et al. (2019). “Measuring Populist Discourse: The Global Populism Database”. In: EPSA Annual Conference in Belfast. URL: https://populism.byu. edu/App_Data/Publications/Global%20Populism%20Database%20Paper.pdf. Hochreiter, Sepp and Jürgen Schmidhuber (1997). “Long Short-Term Memory”. In: Neural Computation. ISSN: 08997667. DOI: 10.1162/neco.1997.9.8.1735. Hogg, Michael A. (2016). “Social Identity Theory”. In: Understanding Peace and Con- flict Through Social Identity Theory: Contemporary Global Perspectives. Ed. by Shelley McKeown, Reeshma Haji, and Neil Ferguson. Cham: Springer International Pub- lishing, pp. 3–17. ISBN: 978-3-319-29869-6. DOI: 10.1007/978-3-319-29869-6_1. URL: https://doi.org/10.1007/978-3-319-29869-6_1. Huguet Cabot, Pere-Lluís et al. (Nov. 2020). “The Pragmatics behind Politics: Mod- elling Metaphor, Framing and Emotion in Political Discourse”. In: Findings of the Association for Computational Linguistics: EMNLP 2020. Online: Association for Computational Linguistics, pp. 4479–4488. DOI: 10.18653/v1/2020.findings- emnlp . 402. URL: https : / / www . aclweb . org / anthology / 2020 . findings - emnlp.402. Huguet Cabot, Pere-Lluís et al. (2021). Us vs. Them: A Dataset of Populist Attitudes, News Bias and Emotions. arXiv: 2101.11956 [cs.CL]. Inglehart, Ronald F. and Pippa Norris (2016). “Trump, Brexit, and the Rise of Pop- ulism: Economic Have-Nots and Cultural Backlash”. In: Social Science Research Network. URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id= 2818659. Iyyer, Mohit et al. (2014). “Political Ideology Detection Using Recursive Neural Net- works”. In: Proceedings of the 52nd Annual Meeting of the Association for Computa- tional Linguistics (ACL) (Volume 1: Long Papers), pp. 1113–1122. DOI: 10.3115/v1/ P14-1105. URL: https://www.aclweb.org/anthology/P14-1105. Jagers, Jan and Stefaan Walgrave (2007). “Populism as political communication style: An empirical study of political parties discourse in Belgium”. In: European Journal of Political Research 46.3, pp. 319–345. DOI: 10.1111/j.1475-6765.2006.00690.x. URL: https://doi.org/10.1111%2Fj.1475-6765.2006.00690.x. Ji, Yangfeng and Noah A. Smith (2017). “Neural Discourse Structure for Text Cate- gorization”. In: CoRR abs/1702.01829. arXiv: 1702.01829. URL: http://arxiv. org/abs/1702.01829. Jiang, Ye et al. (2019). 
“Team Bertha von Suttner at SemEval-2019 Task 4: Hyperpartisan News Detection using ELMo Sentence Representation Convolutional

Network”. In: Proceedings of the 13th International Workshop on Semantic Evalua- tion. Minneapolis, Minnesota, USA: Association for Computational Linguistics, pp. 840–844. DOI: 10.18653/v1/s19- 2146. URL: https://www.aclweb.org/ anthology/S19-2146. Jost, John et al. (June 2003). “Political Conservatism as Motivated Social Cognition”. In: Psychological bulletin 129, pp. 339–75. DOI: 10.1037/0033-2909.129.3.339. Kiesel, Johannes et al. (2019). “{S}em{E}val-2019 Task 4: Hyperpartisan News Detec- tion”. In: Proceedings of the 13th International Workshop on Semantic Evaluation. Min- neapolis, Minnesota, USA: Association for Computational Linguistics, pp. 829– 839. DOI: 10.18653/v1/S19-2145. URL: https://www.aclweb.org/anthology/ S19-2145. Kiperwasser, Eliyahu and Miguel Ballesteros (2018). “Scheduled Multi-Task Learn- ing: From Syntax to Translation”. In: CoRR abs/1804.08915. arXiv: 1804.08915. URL: http://arxiv.org/abs/1804.08915. Krämer, Benjamin (2014). “Media Populism: A Conceptual Clarification and Some Theses on its Effects”. In: Communication Theory 24.1, pp. 42–60. DOI: 10.1111/ comt.12029. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/ comt.12029. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/ comt.12029. Kövecses, Zoltán (Jan. 2002). Metaphor: A Practical Introduction. Oxford: Oxford Uni- versity Press. ISBN: 9780195374940. URL: https://global.oup.com/academic/ product/metaphor-9780195374940?cc=nl&lang=en&. Lakoff, G. (2002). Moral Politics: How Liberals and Conservatives Think, Second Edition. University of Chicago Press. ISBN: 9780226467719. URL: https://books.google. nl/books?id=R-4YBCYx6YsC. Lakoff, George (1991). “Metaphor and War: The Metaphor System Used to Justify War in the Gulf”. In: Peace Research 23.2/3, pp. 25–32. ISSN: 00084697. URL: http: //www.jstor.org/stable/23609916. Lakoff, George and Elisabeth Wehling (2012). The Little Blue Book: The Essential Guide to Thinking and Talking Democratic. New York: Free Press. Lazarus, Richard S. (2001). “Relational meaning and discrete emotions.” In: Appraisal processes in emotion: Theory, methods, research. Series in affective science. March. New York, NY, US: Oxford University Press, pp. 37–67. ISBN: 0-19-513007-3 (Hard- cover). Le, Quoc V. and Tomas Mikolov (2014). “Distributed Representations of Sentences and Documents”. In: CoRR abs/1405.4053. arXiv: 1405 . 4053. URL: http : / / arxiv.org/abs/1405.4053. Lecheler, Sophie, Linda Bos, and Rens Vliegenthart (2015). “The Mediating Role of Emotions: News Framing Effects on Opinions About Immigration”. In: Journal- ism & Mass Communication Quarterly 92.4, pp. 812–838. DOI: 10.1177/1077699015596338. eprint: https://doi.org/10.1177/1077699015596338. URL: https://doi.org/ 10.1177/1077699015596338. Lee Cunningham, Julia, Yunkyu Sohn, and James Fowler (Dec. 2013). “Emotion Reg- ulation as the Foundation of Political Attitudes: Does Reappraisal Decrease Sup- port for Conservative Policies?” In: PloS one 8, e83143. DOI: 10.1371/journal. pone.0083143. Leech, Geoffrey (1992). “100 Million Words of English: The British National Corpus (BNC)”. In: URL: http://s-space.snu.ac.kr/handle/10371/85926. Li, Chang and Dan Goldwasser (July 2019). “Encoding Social Information with Graph Convolutional Networks forPolitical Perspective Detection in News Media”. In: Bibliography 69

Proceedings of the 57th Annual Meeting of the Association for Computational Linguis- tics. Florence, Italy: Association for Computational Linguistics, pp. 2594–2604. DOI: 10.18653/v1/P19-1247. URL: https://www.aclweb.org/anthology/P19- 1247. Liu, Pengfei, Xipeng Qiu, and Xuanjing Huang (2016). “Recurrent Neural Network for Text Classification with Multi-Task Learning”. In: Proceedings of the Twenty- Fifth International Joint Conference on Artificial Intelligence. IJCAI16. New York, New York, USA: AAAI Press, pp. 28732879. ISBN: 9781577357704. Liu, Xiaodong et al. (July 2019a). “Multi-Task Deep Neural Networks for Natural Language Understanding”. In: Proceedings of the 57th Annual Meeting of the As- sociation for Computational Linguistics. Florence, Italy: Association for Computa- tional Linguistics, pp. 4487–4496. DOI: 10.18653/v1/P19- 1441. URL: https: //www.aclweb.org/anthology/P19-1441. Liu, Yinhan et al. (2019b). “RoBERTa: A Robustly Optimized BERT Pretraining Ap- proach”. In: CoRR abs/1907.11692. arXiv: 1907.11692. URL: http://arxiv.org/ abs/1907.11692. Lopez, Ian Haney (2013). Dog Whistle Politics: How Coded Racial Appeals Have Rein- vented Racism and Wrecked the Middle Class. Lowe, Will et al. (2011). “Scaling policy preferences from coded political texts”. In: Legislative Studies Quarterly 36.1, pp. 123–155. URL: https://onlinelibrary. wiley.com/doi/full/10.1111/j.1939-9162.2010.00006.x. Maaten, Laurens van der and Geoffrey Hinton (2008). “Visualizing Data using t- SNE”. In: Journal of Machine Learning Research 9, pp. 2579–2605. URL: http:// www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf. Manucci, Luca and Edward Weber (2017). “Why The Big Picture˘aMatters:Political and Media Populism in Western Europe since the 1970s.” In: Swiss Political Science Review. ISSN: 16626370. DOI: 10.1111/spsr.12267. Marcus, George (2002). “The sentimental citizen: emotion in democratic politics”. In: Choice Reviews Online 40.05, pp. 40–3068–40–3068. ISSN: 0009-4978. DOI: 10.5860/ choice.40-3068. — (Jan. 2003). “The Psychology of Emotion”. In: Oxford handbook of political psychol- ogy. Ed. by & R. Jervis D. O. Sears L. Huddy. Oxford University Press, pp. 182– 221. URL: https://psycnet.apa.org/record/2003-88243-006. Mazzoleni, Gianpietro and Roberta Bracciale (2018). “Socially mediated populism: the communicative strategies of political leaders on Facebook”. In: Palgrave Com- munications 4.1, p. 50. ISSN: 2055-1045. DOI: 10.1057/s41599-018-0104-x. URL: https://doi.org/10.1057/s41599-018-0104-x. Mikolov, Tomas et al. (2013). “Efficient estimation of word representations in vec- tor space”. In: 1st International Conference on Learning Representations, ICLR 2013 - Workshop Track Proceedings. International Conference on Learning Representa- tions, ICLR. arXiv: 1301.3781. Misra, Ishan et al. (2016). “Cross-Stitch Networks for Multi-task Learning”. In: Pro- ceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. ISBN: 9781467388504. DOI: 10.1109/CVPR.2016.433. arXiv: 1604. 03539. Mohammad, Saif, Ekaterina Shutova, and Peter Turney (2016). “Metaphor as a Medium for Emotion: An Empirical Study”. In: Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics, pp. 23–33. DOI: 10 . 18653 / v1 / S16 - 2003. URL: https://www.aclweb.org/anthology/S16-2003. 70 Bibliography

Mohammad, Saif et al. (2018). “SemEval-2018 Task 1: Affect in Tweets”. In: Proceedings of The 12th International Workshop on Semantic Evaluation, pp. 1–17. DOI: 10.18653/v1/S18-1001. URL: https://www.aclweb.org/anthology/S18-1001.
Mohammad, Saif M. and Peter D. Turney (2013). “Crowdsourcing a word-emotion association lexicon”. In: Computational Intelligence. DOI: 10.1111/j.1467-8640.2012.00460.x. arXiv: 1308.6297.
Mohler, Michael et al. (2013). “Semantic Signatures for Example-Based Linguistic Metaphor Detection”. In: Proceedings of the First Workshop on Metaphor in NLP, pp. 27–35.
Monroe, Burt L., Michael P. Colaresi, and Kevin M. Quinn (2008). “Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict”. In: Political Analysis 16.4, pp. 372–403. URL: https://www.cambridge.org/core/journals/political-analysis/article/fightin-words-lexical-feature-selection-and-evaluation-for-identifying-the-content-of-political-conflict/81B3703230D21620B81EB6E2266C7A66.
Moreno Lopez, Marc and Jugal Kalita (2017). “Deep Learning applied to NLP”. In: arXiv e-prints, arXiv:1703.03091. arXiv: 1703.03091.
Mourão, Rachel R. and Craig T. Robertson (2019). “Fake News as Discursive Integration: An Analysis of Sites That Publish False, Misleading, Hyperpartisan and Sensational Information”. In: Journalism Studies 20.14, pp. 2077–2095. DOI: 10.1080/1461670X.2019.1566871.
Mudde, Cas (2004). “The Populist Zeitgeist”. In: Government and Opposition. ISSN: 0017-257X. DOI: 10.1111/j.1477-7053.2004.00135.x.
Musolff, Andreas (2004). Metaphor and Political Discourse. London: Palgrave Macmillan.
Nguyen, Christoph G (2019). Emotions and Populist Support. DOI: 10.31235/osf.io/e2wm6. URL: osf.io/preprints/socarxiv/e2wm6.
Pennacchiotti, Marco and Ana-Maria Popescu (2011). “Democrats, republicans and starbucks afficionados: user classification in twitter”. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 430–438. URL: https://dl.acm.org/doi/10.1145/2020408.2020477.
Pennington, Jeffrey, Richard Socher, and Christopher D. Manning (2014). “GloVe: Global Vectors for Word Representation”. In: Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543. URL: http://www.aclweb.org/anthology/D14-1162.
Peters, B. Guy (1988). Policy Paradox and Political Reason. Scott Foresman Co. URL: https://openlibrary.org/books/OL2393808M/Policy_paradox_and_political_reason.
Peters, Matthew et al. (2018). “Deep Contextualized Word Representations”. DOI: 10.18653/v1/n18-1202. arXiv: 1802.05365.
Pliskin, Ruthie et al. (2014). “Are Leftists More Emotion-Driven Than Rightists? The Interactive Influence of Ideology and Emotions on Support for Policies”. In: Personality and Social Psychology Bulletin 40.12. PMID: 25381287, pp. 1681–1697. DOI: 10.1177/0146167214554589.
Postill, John (2018). “Populism and social media: a global perspective”. In: Media, Culture & Society 40.5, pp. 754–765. DOI: 10.1177/0163443718772186.

Potthast, Martin et al. (2018). “A stylometric inquiry into hyperpartisan and fake news”. In: ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). ISBN: 9781948087322. DOI: 10.18653/v1/p18-1022. arXiv: 1702.05638.
Preoţiuc-Pietro, Daniel et al. (2017). “Beyond Binary Labels: Political Ideology Prediction of Twitter Users”. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 729–740. DOI: 10.18653/v1/P17-1068. URL: https://www.aclweb.org/anthology/P17-1068.
R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria. URL: https://www.R-project.org/.
Rajamohan, Srijith, Alana Romanella, and Amit Ramesh (2019). “A Weakly-Supervised Attention-based Visualization Tool for Assessing Political Affiliation”. In: CoRR abs/1908.02282. arXiv: 1908.02282. URL: http://arxiv.org/abs/1908.02282.
Redlawsk, David P. et al. (2018). “Donald Trump, contempt, and the 2016 GOP Iowa Caucuses”. In: Journal of Elections, Public Opinion and Parties 28.2, pp. 173–189. DOI: 10.1080/17457289.2018.1441848.
Rei, Marek et al. (Sept. 2017). “Grasping the Finer Point: A Supervised Similarity Network for Metaphor Detection”. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark: Association for Computational Linguistics, pp. 1537–1546. DOI: 10.18653/v1/D17-1162. URL: https://www.aclweb.org/anthology/D17-1162.
Rico, Guillem, Marc Guinjoan, and Eva Anduiza (2017). “The Emotional Underpinnings of Populism: How Anger and Fear Affect Populist Attitudes”. In: Swiss Political Science Review 23.4, pp. 444–461. DOI: 10.1111/spsr.12261. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/spsr.12261.
Rooduijn, Matthijs and Brian Burgoon (2018). “The Paradox of Well-being: Do Unfavorable Socioeconomic and Sociocultural Contexts Deepen or Dampen Radical Left and Right Voting Among the Less Well-Off?” In: Comparative Political Studies 51.13, pp. 1720–1753. DOI: 10.1177/0010414017720707.
Rooduijn, Matthijs and Teun Pauwels (2011). “Measuring populism: Comparing two methods of content analysis”. In: West European Politics. ISSN: 01402382. DOI: 10.1080/01402382.2011.616665.
Russell, James A and Albert Mehrabian (1977). “Evidence for a three-factor theory of emotions”. In: Journal of Research in Personality 11.3, pp. 273–294. ISSN: 0092-6566. DOI: 10.1016/0092-6566(77)90037-X. URL: http://www.sciencedirect.com/science/article/pii/009265667790037X.
Salmela, Mikko and Christian von Scheve (2017). “Emotional roots of right-wing political populism”. In: Social Science Information 56.4, pp. 567–595. DOI: 10.1177/0539018417734419.
Scherer, Klaus R. and Harald G. Wallbott (1994). “Evidence for Universality and Cultural Variation of Differential Emotion Response Patterning”. In: Journal of Personality and Social Psychology. ISSN: 00223514. DOI: 10.1037/0022-3514.66.2.310.
Schroeder, Ralph (2019). “Digital Media and the Entrenchment of Right-Wing Populist Agendas”. In: Social Media + Society 5.4. DOI: 10.1177/2056305119885328.
Schulz, Anne, Werner Wirth, and Philipp Müller (2020). “We Are the People and You Are Fake News: A Social Identity Approach to Populist Citizens’ False Consensus and Hostile Media Perceptions”. In: Communication Research 47.2, pp. 201–226. DOI: 10.1177/0093650218794854.
Search and Learn the Bias of News Media. URL: https://mediabiasfactcheck.com/.
Shutova, E., L. Sun, and A. Korhonen (2010). “Metaphor Identification Using Verb and Noun Clustering”. In: Coling. URL: https://www.aclweb.org/anthology/C10-1113/.
Shutova, Ekaterina (Dec. 2015). “Design and Evaluation of Metaphor Processing Systems”. In: Computational Linguistics 41.4, pp. 579–623. DOI: 10.1162/COLI_a_00233. URL: https://www.aclweb.org/anthology/J15-4002.
Shutova, Ekaterina, Douwe Kiela, and Jean Maillard (June 2016). “Black Holes and White Rabbits: Metaphor Identification with Visual Features”. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego, California: Association for Computational Linguistics, pp. 160–170. DOI: 10.18653/v1/N16-1020. URL: https://www.aclweb.org/anthology/N16-1020.
Silva, Leandro et al. (2016). Analyzing the Targets of Hate in Online Social Media. URL: https://www.aaai.org/ocs/index.php/ICWSM/ICWSM16/paper/view/13147.
Sim, Yanchuan et al. (2013). “Measuring ideological proportions in political speeches”. In: EMNLP 2013 - 2013 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. ISBN: 9781937284978.
Smith, Craig and Richard Lazarus (Jan. 1990). “Emotion and Adaptation”. In: vol. 21, pp. 609–637. URL: https://www.researchgate.net/publication/232438867_Emotion_and_Adaptation.
— (1993). “Appraisal Components, Core Relational Themes, and the Emotions”. In: Cognition & Emotion 7, pp. 233–269. DOI: 10.1080/02699939308409189.
Søgaard, Anders and Yoav Goldberg (Aug. 2016). “Deep multi-task learning with low level tasks supervised at lower layers”. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Berlin, Germany: Association for Computational Linguistics, pp. 231–235. DOI: 10.18653/v1/P16-2038. URL: https://www.aclweb.org/anthology/P16-2038.
Speed, Ewen and Russell Mannion (2017). The rise of post-truth populism in pluralist liberal democracies: Challenges for health policy. DOI: 10.15171/ijhpm.2017.19.
Srivastava, Vertika et al. (June 2019). “Vernon-fenwick at SemEval-2019 Task 4: Hyperpartisan News Detection using Lexical and Semantic Features”. In: Proceedings of the 13th International Workshop on Semantic Evaluation. Minneapolis, Minnesota, USA: Association for Computational Linguistics, pp. 1078–1082. DOI: 10.18653/v1/S19-2189. URL: https://www.aclweb.org/anthology/S19-2189.
Stanley, David (2018). apaTables: Create American Psychological Association (APA) Style Tables. R package version 2.0.5. URL: https://CRAN.R-project.org/package=apaTables.
Steen, G.J. et al. (2010). A Method for Linguistic Metaphor Identification: From MIP to MIPVU. Converging Evidence in Language and Communication Research 14. John Benjamins. ISBN: 9789027239037. URL: https://research.vu.nl/en/publications/a-method-for-linguistic-metaphor-identification-from-mip-to-mipvu.

Steiger, Russell L. et al. (2019). “Contempt of Congress: Do Liberals and Conservatives Harbor Equivalent Negative Emotional Biases Towards Ideologically Congruent vs. Incongruent Politicians at the Level of Individual Emotions?” In: Journal of Social and Political Psychology 7, pp. 100–123. URL: https://jspp.psychopen.eu/article/view/822.
Strapparava, Carlo and Rada Mihalcea (2007). “SemEval-2007 task 14: Affective text”. In: SemEval 2007 - Proceedings of the 4th International Workshop on Semantic Evaluations.
Strzalkowski, Tomek et al. (2013). “Robust Extraction of Metaphor from Novel Data”. In: Proceedings of the First Workshop on Metaphor in NLP, pp. 67–76. URL: https://www.aclweb.org/anthology/W13-0909/.
Tausch, Nicole et al. (July 2011). “Explaining Radical Group Behavior: Developing Emotion and Efficacy Routes to Normative and Nonnormative Collective Action”. In: Journal of Personality and Social Psychology 101, pp. 129–148. DOI: 10.1037/a0022728. URL: https://pubmed.ncbi.nlm.nih.gov/21500925/.
Tausczik, Yla R. and James W. Pennebaker (2010). “The psychological meaning of words: LIWC and computerized text analysis methods”. In: Journal of Language and Social Psychology 29.1, pp. 24–54. URL: https://journals.sagepub.com/doi/pdf/10.1177/0261927x09351676.
Tenney, Ian, Dipanjan Das, and Ellie Pavlick (2019). “BERT Rediscovers the Classical NLP Pipeline”. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4593–4601. DOI: 10.18653/v1/P19-1452. URL: https://arxiv.org/abs/1905.05950.
Thomas, Matt, Bo Pang, and Lillian Lee (2006). “Get out the vote: Determining support or opposition from Congressional floor-debate transcripts”. In: COLING/ACL 2006 - EMNLP 2006: 2006 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. ISBN: 1932432736. arXiv: 0607062 [cs].
Tsvetkov, Yulia et al. (2014). “Metaphor Detection with Cross-Lingual Model Transfer”. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 248–258. DOI: 10.3115/v1/P14-1024. URL: https://www.aclweb.org/anthology/P14-1024.
Turner, John C and Katherine J Reynolds (2010). “The story of social identity”. In: Rediscovering Social Identity: Key Readings. New York: Psychology Press, Taylor & Francis, pp. 13–32. URL: https://openresearch-repository.anu.edu.au/handle/1885/24530.
Turney, P. D. et al. (2011). “Literal and metaphorical sense identification through concrete and abstract context”. In: EMNLP. Edinburgh, UK. URL: https://www.aclweb.org/anthology/D11-1063/.
Vaswani, Ashish et al. (2017). “Attention is all you need”. In: Advances in Neural Information Processing Systems. arXiv: 1706.03762.
Veale, T., E. Shutova, and B. B. Klebanov (2016). Metaphor: A Computational Perspective. Morgan & Claypool. URL: https://ieeexplore.ieee.org/document/7423916.
Voigt, Rob et al. (2018). “RtGender: A Corpus for Studying Differential Responses to Gender”. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC). URL: https://www.aclweb.org/anthology/L18-1445.
Wang, Alex et al. (Nov. 2018). “GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding”. In: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Brussels, Belgium: Association for Computational Linguistics, pp. 353–355. DOI: 10.18653/v1/W18-5446. URL: https://www.aclweb.org/anthology/W18-5446.

Weeks, Brian E. (2015). “Emotions, Partisanship, and Misperceptions: How Anger and Anxiety Moderate the Effect of Partisan Bias on Susceptibility to Political Misinformation”. In: Journal of Communication 65.4, pp. 699–719. DOI: 10.1111/jcom.12164. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/jcom.12164.
Williams, E. J. (1959). “The Comparison of Regression Variables”. In: Journal of the Royal Statistical Society: Series B (Methodological) 21.2, pp. 396–399. DOI: 10.1111/j.2517-6161.1959.tb00346.x. URL: https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.2517-6161.1959.tb00346.x.
Wirz, Dominique S et al. (2018). “The Effects of Right-Wing Populist Communication on Emotions and Cognitions toward Immigrants”. In: The International Journal of Press/Politics 23.4, pp. 496–516. DOI: 10.1177/1940161218788956.
Wolf, Thomas et al. (2019). HuggingFace’s Transformers: State-of-the-art Natural Language Processing. arXiv: 1910.03771 [cs.CL].
Wu, Fangzhao, Chuhan Wu, and Junxin Liu (2018). “Imbalanced Sentiment Classification with Multi-Task Learning”. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management. CIKM ’18. Torino, Italy: Association for Computing Machinery, pp. 1631–1634. ISBN: 9781450360142. DOI: 10.1145/3269206.3269325.
Yang, C., K. H. Lin, and H. Chen (2007). “Emotion Classification Using Web Blog Corpora”. In: IEEE/WIC/ACM International Conference on Web Intelligence (WI’07), pp. 275–278. URL: https://ieeexplore.ieee.org/document/4427100.
Yang, Zichao et al. (2016). “Hierarchical attention networks for document classification”. In: 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference. ISBN: 9781941643914. DOI: 10.18653/v1/n16-1174.
Yin, Da, Tao Meng, and Kai-Wei Chang (2020). “SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics”. In: CoRR abs/2005.04114. URL: https://ui.adsabs.harvard.edu/abs/2020arXiv200504114Y.
Zhang, Yuxiang et al. (2018). “Text Emotion Distribution Learning via Multi-Task Convolutional Neural Network”. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). International Joint Conferences on Artificial Intelligence Organization, pp. 4595–4601. DOI: 10.24963/ijcai.2018/639.

Appendix A

Extra Material

A.1 System

To train the models described in this thesis, we used a cluster with 4 × Titan RTX 24 GB GDDR6 GPUs and an Intel® Xeon® 2.30 GHz CPU. RoBERTa itself has 125M parameters and our task-specific layers add around 4.20M parameters, with some variance per task, for a total of roughly 130M parameters.
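As an illustration of these figures, the sketch below shows how such parameter counts can be checked with the HuggingFace transformers library (Wolf et al., 2019). The two-layer classification head is a hypothetical stand-in for our task-specific layers, not their exact architecture.

# A minimal sketch for verifying the parameter counts quoted above;
# the task head is an illustrative stand-in, not the exact architecture.
import torch.nn as nn
from transformers import RobertaModel

encoder = RobertaModel.from_pretrained("roberta-base")  # ~125M parameters

task_head = nn.Sequential(  # hypothetical task-specific layers
    nn.Linear(768, 768),
    nn.Tanh(),
    nn.Linear(768, 2),
)

def count_params(module: nn.Module) -> int:
    """Count all trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

print(f"Encoder parameters:   {count_params(encoder):,}")
print(f"Task-head parameters: {count_params(task_head):,}")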

A.2 Lists

List of frames in the Card et al. (2015) dataset:

– Economic: costs, benefits, or other financial implications
– Capacity and resources: availability of physical, human or financial resources, and capacity of current systems
– Morality: religious or ethical implications
– Fairness and equality: balance or distribution of rights, responsibilities, and resources
– Legality, constitutionality and jurisprudence: rights, freedoms, and authority of individuals, corporations, and government
– Policy prescription and evaluation: discussion of specific policies aimed at addressing problems
– Crime and punishment: effectiveness and implications of laws and their enforcement
– Security and defense: threats to the welfare of the individual, community, or nation
– Health and safety: health care, sanitation, public safety
– Quality of life: threats and opportunities for the individual’s wealth, happiness, and well-being
– Cultural identity: traditions, customs, or values of a social group in relation to a policy issue
– Public opinion: attitudes and opinions of the general public, including polling and demographics
– Political: considerations related to politics and politicians, including lobbying, elections, and attempts to sway voters
– External regulation and reputation: international reputation or foreign policy of the U.S.

– Other: any coherent group of frames not covered by the above categories

List of emotions in the Mohammad et al. (2018) dataset (a short encoding sketch for this label set follows the list):

– Anger (also includes annoyance, rage)
– Anticipation (also includes interest, vigilance)
– Disgust (also includes disinterest, dislike, loathing)
– Fear (also includes apprehension, anxiety, terror)
– Joy (also includes serenity, ecstasy)
– Love (also includes affection)
– Optimism (also includes hopefulness, confidence)
– Pessimism (also includes cynicism, no confidence)
– Sadness (also includes pensiveness, grief)
– Surprise (also includes distraction, amazement)
– Trust (also includes acceptance, liking, admiration)
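Since a single tweet can express several of these emotions at once, the task is multi-label, and the label set is typically encoded as a multi-hot vector. A minimal sketch follows; the helper function and example labels are our own illustration, not code from the dataset.

# The eleven emotion labels of the Mohammad et al. (2018) dataset, in a fixed order.
EMOTIONS = ["anger", "anticipation", "disgust", "fear", "joy", "love",
            "optimism", "pessimism", "sadness", "surprise", "trust"]

def multi_hot(labels):
    """Encode a set of emotion labels as a 0/1 vector over EMOTIONS."""
    return [1 if emotion in labels else 0 for emotion in EMOTIONS]

# e.g., a tweet annotated with both anger and disgust:
print(multi_hot({"anger", "disgust"}))  # [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]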

A.3 Additional Tables

Emotion      left     left-center  center   right-center  right
Anger        23.06%   22.34%       24.63%   26.23%        31.87%
Contempt     36.66%   34.69%       36.80%   36.69%        43.60%
Disgust      26.63%   26.53%       25.43%   25.95%        27.69%
Fear         15.47%   14.97%       15.53%   15.88%        21.09%
Gratitude     0.91%    1.08%        1.31%    1.24%         0.79%
Guilt         2.66%    2.66%        2.79%    2.54%         2.99%
Happiness     0.68%    0.96%        1.02%    0.96%         1.07%
Hope          5.38%    5.27%        5.29%    4.47%         4.40%
Pride         2.44%    2.10%        2.67%    3.00%         2.54%
Relief        0.68%    0.45%        0.74%    0.79%         0.56%
Sadness       2.04%    1.87%        1.93%    1.47%         2.37%
Sympathy     20.34%   18.99%       17.01%   18.65%        15.23%
Neutral      30.59%   31.80%       31.00%   29.68%        26.11%

TABLE A.1: Percentage of comments expressing each emotion, per political bias of the news source. Since a comment can carry multiple emotion labels, the columns do not sum to 100%.
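The percentages in Table A.1 are, in essence, per-group means of binary emotion indicators. As a sketch of the computation, assuming the annotated comments live in a pandas DataFrame with a bias column and one 0/1 column per emotion (the column names and toy data below are our own, not the actual dataset):

import pandas as pd

# Toy stand-in: one row per comment, a news-source bias label, and binary
# emotion indicators (multi-label, so several columns can be 1 at once).
df = pd.DataFrame({
    "bias":  ["left", "left", "right", "right", "center"],
    "anger": [1, 0, 1, 1, 0],
    "hope":  [0, 1, 0, 0, 1],
})

# The mean of a 0/1 indicator within each bias group is the fraction of
# comments in that group carrying the emotion; scale to percentages.
percentages = df.groupby("bias")[["anger", "hope"]].mean() * 100
print(percentages.round(2))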

A.4 Figures

[Figure not reproduced] FIGURE A.1: Hidden representations for the three-way MTL emotion-specific layer.

[Figure not reproduced] FIGURE A.2: Hidden representations for the three-way MTL group-identification-specific layer.
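Figures A.1 and A.2 project high-dimensional task-specific hidden states into two dimensions. As a hedged illustration, assuming the projections were produced with t-SNE (Maaten and Hinton, 2008) as implemented in scikit-learn, a plot of this kind can be generated roughly as follows; the array shapes and labels here are synthetic stand-ins, not our actual model outputs.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Synthetic stand-ins for extracted hidden states and task labels.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(200, 768))   # e.g., 200 comments x 768-dim layer output
labels = rng.integers(0, 3, size=200)         # e.g., three-way task labels

# Project the representations to 2D with t-SNE and color points by label.
coords = TSNE(n_components=2, random_state=0).fit_transform(hidden_states)
plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="viridis", s=10)
plt.title("t-SNE projection of task-specific hidden representations")
plt.savefig("hidden_representations.png")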