
Recognising Moral Foundations in Online Extremist Discourse

A Cross-Domain Classification Study

Anne Fleur van Luenen

Uppsala University
Department of Linguistics and Philology
Master Programme in Language Technology
Master's Thesis in Language Technology, 30 ECTS credits
December 2, 2020

Supervisors:
Dr. Mats Dahllöf, Uppsala University
Anne Merel Sternheim, TNO
Dr. Tom Powell, TNO

Abstract

So far, studies seeking to recognise moral foundations in texts have been relatively successful (Araque et al., 2019; Lin et al., 2018; Mooijman et al., 2017; Rezapour et al., 2019). There are, however, two issues with these studies: firstly, it is an extensive process to gather and annotate sufficient material for training. Secondly, models are only trained and tested within the same domain. It is yet unexplored how these models for moral foundation prediction perform when tested in other domains, but from their experience with annotation, Hoover et al. (2017) describe how moral sentiments on one topic might be completely different from moral sentiments on another (e.g. presidential elections). This study attempts to explore to what extent models generalise to other domains. More specifically, we focus on training on Twitter data from non-extremist sources, and testing on data from an extremist (white nationalist) forum. We conducted two experiments. In our first experiment we test whether it is possible to do cross-domain classification of moral foundations. Additionally, we compare the performance of a model using the Word2Vec embeddings used in previous studies to a model using the newer BERT embeddings. We find that although the performance drops significantly on the extremist out-domain test sets, out-domain classification is not impossible. Furthermore, we find that the BERT model generalises marginally better to the out-domain test set than the Word2Vec model. In our second experiment we attempt to improve the generalisation to extremist test data by providing contextual knowledge. Although this does not improve the model, it does show the model's robustness against noise. Finally we suggest an alternative approach for accounting for contextual knowledge.

Contents

Preface 4

1. Introduction 5

2. Previous Work 7
   2.1. Moral foundations theory 7
   2.2. Technical background 8
        2.2.1. Word2Vec 8
        2.2.2. BERT 9
        2.2.3. Bidirectional LSTMs 11
   2.3. Computational analysis of moral foundations 11
        2.3.1. Computational tools for psychological analysis 12
        2.3.2. Word embeddings 13
        2.3.3. Feature-based models 14
        2.3.4. Incorporating Background Knowledge 15

3. Data 17
   3.1. The Moral Foundations Twitter Corpus 17
   3.2. The Stormfront Corpus 18
   3.3. The Reddit Corpus 19
   3.4. Annotation procedure 19

4. Method 21
   4.1. Study 1: Comparing in-domain and out-domain performance using Word2Vec and BERT 22
   4.2. Study 2: Improving the set-up for out-domain purposes 23

5. Results 26
   5.1. Results study 1 26
   5.2. Results study 2 26

6. Discussion 28
   6.1. Qualitative analysis of extremist data classification 28
   6.2. Comparison between the BERT and Word2Vec models 30
   6.3. Suggestions for further work 32
   6.4. The subjective nature of moral foundations 33

7. Conclusion 34

A. Inter-annotator agreement per batch 35

B. Precision and recall study 1 37

C. Precision and recall study 2 38

D. Confusion matrices for study 1 39

Preface

This thesis has been written at TNO, the Dutch research and technology organisation. TNO mainly focuses on applied research, in cooperation with Dutch companies and the Dutch government. More specifically, this thesis was written as part of a project called Opponent Modelling. Within Opponent Modelling, the data science team aims to extract information from online news sources and forums, in order to gain insight into different opposing groups. More information on the work done in this project can be found in the paper by Powell et al. (2020).

At TNO I would specifically like to thank Anne Merel Sternheim for the many Monday morning teas we spent talking about life, thesis work and job applications. The start of the week was always very motivating! I also want to thank Tom Powell for his enthusiasm about the work and the results obtained. Finally, thanks to my colleagues and fellow interns, who made this first work experience a lot of fun, even if it was mostly remote. To Mats Dahllöf, thank you for your encouraging words and advice. To my annotators, thank you so much for dedicating time to help me out. Without you this work would not have been possible. I'd also like to thank my friends, specifically Adam Moss, Camilla Casula and Abeer Salah Ahmad. The Uppsala experience would not have been the same without you. Finally, I'd like to thank my family and Niek Mereu, for being my biggest support, even across large distances.

1. Introduction

Jesse Graham and Jonathan Haidt sought to find universal components in moral ideas all over the world. They postulate five 'moral foundations': five basic moral ideas that we all agree on, although we differ in which ones we find most important, and how we apply them to specific cases (Graham et al., 2013, 2009; Haidt, 2012; Haidt and Graham, 2009). One of the examples given is the one of taxes. People tend to agree that life should be fair; however, the liberal worldview prescribes that rich people should be "paying their fair share", while the conservative worldview typically entails that "people should be rewarded in proportion to what they contribute". In the debate on taxation, both liberals and conservatives appeal to the moral foundation of fairness, but they disagree about what exactly is fair. These kinds of understandings allow us to be more aware of how and why we think differently (Graham et al., 2009).

In the context of this paper, we are specifically interested in the analysis of moral foundations among extremist groups, such as white nationalists. If we know which moral foundations are regarded as important within these groups, we also know which moral foundations can be used to provide a counter narrative. Additionally, Mooijman et al. (2017) describe how an increase in moral rhetoric on social media predicts violent protests. Therefore, reliable information on the use of moral rhetoric on social media can be useful in the security domain, as it provides insight into the psychological motivations of the group studied.

In order to train a model to recognise moral foundations we need annotated data. Annotating data for moral foundations – like with any latent semantic task – is very time-consuming. It would therefore be convenient if a model trained on one corpus could generalise to other types of data. At the time of writing, the only annotated data available for this task is the Moral Foundations Twitter Corpus (MFTC) (Hoover et al., 2017). This corpus consists of roughly 35k tweets on various topics that have been annotated for moral foundations. Most previous studies on the automatic recognition of moral foundations have worked with a subset of this corpus. Although they are quite successful, most of these studies train and test on a specific set of tweets related to one topic. For instance, Mooijman et al. (2017) train and test on a subset of the corpus related to the Baltimore riots in the United States.

Hoover et al. (2017) acknowledge that the sentiments expressed in discussions on one topic might contain completely different moral foundations from the sentiments expressed in discussions on another topic. They take this into account when creating their MFTC, purposefully using tweets from the political left (about the Black Lives Matter movement), the political right (about the All Lives Matter movement), both sides (about the presidential elections) and non-ideologically related tweets (about hurricane Sandy). Although this heterogeneity likely improves the generalisation of the model to out-domain data, it might still be difficult to generalise to completely unrelated topics.

A similar difficulty arises if we want to generalise training data like the MFTC to extremist data. Even in discussions that revolve around roughly the same topics as on the non-extremist forum, extremist groups have their own common theories, associations and jokes.
This means that if we search for the MFTC topics on Stormfront (a white nationalist discussion forum), we might find a completely different set of arguments, and thus moral foundations. The difference between the extremist and the

non-extremist data could mean that each perspective uses a completely different set of moral foundations even when they are discussing the same topic.

Therefore, the purpose of this study is two-fold: firstly, we want to test to what extent a model trained on non-extremist tweets generalises to another ideology, namely extremist forum data. In order to compare the performance of our model on our test sets (a subset of the non-extremist training data and a set of extremist test data), we simultaneously test on another non-extremist data set. The reason for this is that our non-extremist training set consists of tweets, while our extremist test set consists of forum messages. By adding a non-extremist test set that consists of forum messages as well, we can estimate what part of the difference in performance is most likely caused by the difference between tweets and forum messages, and which part is caused by the difference in ideology.

In our first study we train two bidirectional LSTMs to recognise moral foundations in our non-extremist training data: one using Word2Vec embeddings, and another using BERT embeddings. We test on the aforementioned test data to see how well our moral foundations classification model performs on out-domain extremist data compared to in-domain non-extremist data. Additionally, this study allows us to compare the Word2Vec embeddings used in a previous study (Lin et al., 2018) to the newer BERT embeddings.

Secondly, we aim to improve the generalisation to the out-domain extremist data. We create another bidirectional LSTM that takes embedded background information from Wikipedia abstracts as a secondary input. Once again the model was trained on the non-extremist training data, and tested on the three test sets to see whether background information improves our model. Although we are first and foremost interested in the performance of this model on the extremist test data, the other two test sets can give valuable insight into whether our model could potentially generalise to other types of extremist data.

In the next section (Section 2), we will discuss Graham et al. (2009)'s Moral Foundations Theory in more detail. We will discuss the technicalities of the word embedding methods and the neural network we used in our experiments. This will also allow us to understand the previous work that has been done in the field of automatic classification of moral foundations. In Section 3, we describe the data sets used in our study and the annotation procedure we used for the sets that had not yet been annotated. In Section 4, we discuss our model architecture, of which the results are presented in Section 5. We will discuss our results in Section 6 and address our main conclusions in Section 7.

2. Previous Work

2.1. Moral foundations theory

In a series of papers, Haidt and Graham have developed the idea of moral foundations (MFs) (Graham et al., 2013, 2009; Haidt, 2012; Haidt and Graham, 2009). The main tenet of this theory is that all our ideas about good and bad can be traced back to ten basic ideas of morality, upon which all humans agree. We might differ with respect to which moral foundation we find more important, and how we interpret certain moral foundations, but we all agree that some things are inherently good (justice, loyalty) and other things are inherently bad (hurting someone or cheating in a game). The moral foundations consist of five dichotomies, each comprising a virtue and a vice. The specific names of these moral foundations have changed over time, but as of 2013, they are as follows:

• Care/harm: stems from our ability to be empathetic. We find it important to take care of people, but also animals and other living things, and condemn it when harm is done (purposefully), or when no care is provided when necessary.

• Fairness/cheating: the burden and the fruits of labor should be shared equally. We appreciate equality and proportionality, and we disapprove of people taking advantage of others. For example, we believe it is important that people who do the same work get the same rewards. This moral foundation also includes seeking (legal) justice, and valuing proportionate punishment for moral transgressions.

• Loyalty/betrayal: focuses on the importance of being a cohesive group that defends itself against threats. This can manifest itself in nationalism, but also loyalty to a sports club, a school, a religion, or a company. An important part of loyalty is the willingness to behave according to the interests of this group even if it’s not favourable for the self (e.g. going to war for your country). Acting against the interest of other groups shows loyalty to your own group, while acts that disadvantage your group (treason) are condemned.

• Authority/subversion: recognises that it is beneficial to divide a group between leaders and followers, as leaders can divide tasks, and followers can execute them. This division of labour can lead to successful cooperation. Therefore we appreciate good leadership, and people who behave according to their position in the hierarchy. This includes high regard for rules, religion and people who hold some form of authority (scholars, religious authorities, political authorities, team captains, etc.). Undermining the hierarchy is condemned, for example by not following orders, violating the law or overstepping accepted procedures.

• Sanctity/degradation: stems from the urge to avoid dirty and rotten things, because they make us sick. As diseases can be contagious, it also tells us to avoid people who are sick or dirty. Over time, this has extended to the rest of our behaviour, and it has evolved into an appreciation of self-control. It includes striving for a pure, noble life, trying not to 'give in' to impure things, such as lust, hunger,

and material greed. Things like eating organic food, going on a pilgrimage (spiritual purification), and fasting are also part of this moral foundation.

In his book 'The Righteous Mind', Haidt (2012) proposes a sixth moral foundation, which is liberty/oppression. This MF comprises the legitimacy of seeking freedom when oppressed, even if it interferes with the moral foundation of authority. This moral foundation is especially relevant with regard to extremist organisations, as many of them aim for liberty and see themselves as an oppressed group. As the research on this moral foundation is still limited, none of the previous studies on computational modeling of moral foundations have taken it into account. This means that it is not annotated in the corpus we will use for training our model.

The different moral foundations (MFs) are weighted differently between social groups, according to a study on the difference in valuing moral foundations between American liberals and conservatives (Graham et al., 2009). As described above, their different opinions, for example on taxation, are based on different interpretations of moral foundations. Additionally, liberals tend to find care/harm and fairness/cheating most important, while conservative Americans typically balance between all five moral foundations. This difference in weight on the various moral foundations is also reflected in the ideologies of terrorist organisations, as Hahn et al. (2019) find.

Knowing the moral foundations on which an extremist group bases their argumentation helps us understand their reasoning. It also allows us to understand which types of reasoning they perceive as valid. This is useful when attempting to provide a counter narrative, for example in a campaign aimed to prevent radicalisation. Furthermore, Mooijman et al. (2017)'s findings on the increased occurrence of moral foundations on social media just before violent protests illustrate how potentially relevant real-time information on moral foundations could be.

2.2. Technical background

As word embedding techniques and neural networks are widely used in the field of moral foundation recognition, we will spend some time explaining their technicalities in this section. Word embeddings are numerical representations of the meaning of words. Essentially, they are high-dimensional vectors in which words that occur in similar contexts (often words with similar meanings) are close to each other in the vector space. Word embeddings are used for many latent semantics tasks: tasks looking for meaning that is not explicitly referenced in the text.

There are many types of word embedding techniques available. In this study we used Word2Vec, as it has been used in previous studies concerning moral foundations, and BERT, a newer word embedding technique that obtained 'state-of-the-art' results upon its release. We will discuss the technicalities of Word2Vec and BERT below. Previous studies also used GloVe, which we will not discuss in depth here. Regarding GloVe, it is sufficient to know that although GloVe is technically different from BERT and Word2Vec, it is used in the same way: word embeddings are typically used as input to a neural network, which attempts to find the distinctive differences between the input samples from different classes. We will also discuss the type of neural network that we used later in this section.

2.2.1. Word2Vec

Word2Vec word embeddings are built according to the skip-gram model (Mikolov et al., 2013). The skip-gram model learns embeddings by predicting the words that occur in the context of a target word. Given a sentence, the model slides a window (usually size 10) over the sentence. The middle word is the target word, and is the only word that can

be seen by the model. Based on this word, the model needs to predict the other words in the window. The predictions are compared to the actual words in the window, and the weights are updated. Next, the window slides to the next word, and when the sentence is finished, moves on to the next sentence. When the model finishes training, the final hidden states for each word are saved; these are the Word2Vec embeddings.

Mikolov et al. demonstrate that simple algebraic operations expose semantic relations between words. A frequently used example is that if you take the embedding for king, subtract from it the embedding for male, and add the embedding for female, the closest embedding is that of queen. Similarly, the relations between countries and their capital cities, and male/female kinship terms are captured. Even syntactic relations are observed. In order to find the comparative form of small, one can subtract big from bigger, and add the remainder to small. The same applies to opposites and various verb tenses.

It is possible to use a set of pretrained embeddings. This means that the aforementioned process has already been completed, and all that is left to do is retrieve the pre-made vectors for each word featured in the input data. In order to do so, the data needs to be tokenized. The tokens that are present in the Word2Vec vocabulary are converted to their respective embeddings, and for each sample, the embeddings are averaged, with a sample embedding as a result. This sample embedding can then be used as input to a neural network.
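To make the retrieval and averaging of pretrained vectors concrete, the sketch below uses the gensim library and the publicly released Google News vectors; both the library choice and the file name are assumptions for illustration, not details taken from this thesis.

    import numpy as np
    from gensim.models import KeyedVectors

    # Load the pretrained Google News vectors (the file must be downloaded separately).
    w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin.gz", binary=True)

    # Algebraic relation of the kind described above: king - man + woman is close to queen.
    print(w2v.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

    def sample_embedding(tokens, kv):
        """Average the pretrained vectors of the in-vocabulary tokens of one sample."""
        vectors = [kv[t] for t in tokens if t in kv]
        if not vectors:                           # no known word at all: fall back to a zero vector
            return np.zeros(kv.vector_size)
        return np.mean(vectors, axis=0)

    print(sample_embedding("the rock fell on his toe".split(), w2v).shape)   # (300,)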

2.2.2. BERT

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a model published by Google in 2019 (Devlin et al., 2019). It has produced 'state-of-the-art' results on the General Language Understanding Evaluation (GLUE) benchmark, which consists of nine NLP tasks including recognising textual entailment and sentiment analysis (Wang et al., 2018). BERT has also been applied in other latent semantics tasks, such as the recognition of fallacies in propaganda (Da San Martino, Barrón-Cedeño, et al., 2019). As moral foundation recognition is a similar type of application, we expect BERT to be useful also for our purposes.

BERT is a pre-trained language model that can be used for a variety of tasks. In order to train BERT, two tasks are performed. The first task is the masked sentence task. In every sentence, Devlin et al. (2019) select 15% of the tokens, 80% of which are replaced with the [MASK] token, 10% with a random other word, and in the remaining 10% of cases the word is left unchanged. In case of a mask, the model needs to predict the masked word. In order to do so, the model needs to keep a complete distributional representation of each input token. As replacing the target token with a random word only happens in 1.5% of the cases, it doesn't harm the model, but it does help generalisation. The second task ensures that BERT learns inter-sentence relations: provided with two sentences from the data, the model is trained to predict whether the second sentence followed the first in the original text, or not.

BERT has been trained on the Books corpus (Zhu et al., 2015) and the entire English Wikipedia, 3,300 million words in total. There are two models: BERT base and BERT large. BERT base has 12 hidden layers, 768 nodes and 12 attention heads; BERT large has 24 hidden layers, 1024 nodes and 16 attention heads. In this paper we will discuss our results obtained with BERT base, as the experiments with BERT large turned out less successful than those with BERT base.

BERT has a vocabulary of 30,000 words. This is significantly less than Word2Vec, mainly because BERT uses Word Pieces (Wu et al., 2016). In Word2Vec and other embedding systems, unknown words are either ignored, or converted to a dummy vector. Devlin et al. (2019) solve this by using word piece embeddings (Wu et al., 2016):

any unknown word is divided into its largest known subwords, allowing it to process hashtags such as #blacklivesmatter by splitting it into the separate words and affixes 'black', '##li', '##ves', '##mat', '##ter', and using their respective embeddings. Thanks to separate embeddings for common suffixes and affixes, the vocabulary size is reduced by a factor 100 compared to the Word2Vec vocabulary. If no larger subwords are available, BERT uses character embeddings. For example, an abbreviation like NYPD (New York Police Department) gets split up into 'ny' (a common abbreviation for New York), 'p' and 'd'.

BERT also solves the issue of homonymy that is encountered by Word2Vec and other word embedding techniques. Word2Vec assigns the same embedding to both senses of a homonym. That means that in the sentence "The rock fell on his toe" and in the sentence "She plays in a rock band", rock receives the same embedding. This embedding will lie in between the embeddings for related words, such as stone, basalt, jazz, and punk. This is not only problematic for the embedding of the word in itself, but also for the creation of the embedding space in the first place. As a word like rock appears both in the context of geography and in the context of music, this might lead to the words basalt and jazz being closer to each other in the vector space than they should be. This would mean that a word having two word senses reshapes the vector space in a detrimental way. This is called the meaning conflation deficiency (Camacho-Collados and Pilehvar, 2018). BERT solves this by creating different vectors for different senses of a word.

Furthermore, BERT is better at capturing long-term dependencies within the sentence. BERT uses the entire sentence to create its embeddings, as attention (Vaswani et al., 2017) allows it to take into account the words that should be important for the embedding. In a sentence like 'The quick brown fox jumped over the lazy dog', the target word 'fox' will most likely pay attention to 'the' because it indicates that fox is a noun. It will also probably pay attention to the words 'quick', 'brown' and 'jumped', as they show something about what a fox can be and do. This is an improvement over previous word embedding models like Word2Vec, as they were limited by window size. During the creation of the Word2Vec embeddings, the target embedding is based on the words that occur within a certain window. Any words that occur outside of this window (e.g. for syntactic reasons) are excluded from the word embedding. This could be resolved by enlarging the window size, but that may make the generation of embeddings unfeasibly computationally expensive. Consequently, the Word2Vec embeddings contain less context than BERT embeddings.

When generating word embeddings from input samples, Word2Vec turns all words in the sentence into separate embeddings, and then takes the average of all embeddings. BERT also generates separate embeddings for all words in the sample, but has another procedure to create some type of sentence embedding. A [CLS] token is added at the beginning of each sample before embeddings are generated. Once embeddings are generated for all respective tokens, the [CLS] token is the result of pooling all embeddings in the sample. The type of pooling used is unknown1. Nevertheless, pooling does allow BERT to capture the context such that the order of the words is taken into account (Saha et al., 2020), while the Word2Vec embedding does not reflect this.
Another difference between BERT and Word2Vec is the embedding size. BERT's embedding has 768 or 1024 dimensions whereas Word2Vec has 300 dimensions. This could mean that BERT is able to capture semantic meaning in a more nuanced way. In order to use BERT, samples are tokenized and indexed according to BERT's vocabulary. They receive a start and end token ([CLS] and [SEP] respectively) and are

1https://github.com/google-research/bert/issues/164#issuecomment-441324222

padded in order to make sure all samples have equal length. In practice, this means that if the longest sentence in the data set contains 50 words, every shorter sentence gets filler tokens added at the end, so that all sequences consist of 50 tokens. This is necessary because machine learning algorithms typically require input of a uniform size. Another vector is created to show BERT which tokens are actually tokens, and which tokens are just padding. Both the index vector and the padding vector are used to convert the samples into BERT embeddings.
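As an illustration, the snippet below performs this tokenisation, padding and embedding extraction with the Hugging Face transformers library; the library choice and the way the [CLS] vector is read out are assumptions for illustration, not necessarily the exact pipeline used in this thesis.

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    samples = ["all lives matter", "#blacklivesmatter protest in Baltimore"]

    # Inspect the word-piece segmentation of an out-of-vocabulary hashtag.
    print(tokenizer.tokenize("blacklivesmatter"))

    # Tokenize, add [CLS]/[SEP], pad to the longest sample and build the attention mask.
    encoded = tokenizer(samples, padding=True, truncation=True, return_tensors="pt")

    with torch.no_grad():
        output = model(**encoded)

    cls_vectors = output.last_hidden_state[:, 0, :]   # one [CLS] vector per sample
    print(cls_vectors.shape)                          # torch.Size([2, 768]) for BERT base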

2.2.3. Bidirectional LSTMs

When the embeddings have been generated, they are fed into a neural network. In our case, we used bidirectional LSTMs. LSTMs have proven their utility for NLP tasks numerous times (Lin et al., 2018; Mooijman et al., 2017; Rezapour et al., 2019). LSTMs are a type of Recurrent Neural Network (RNN); however, the LSTM cell is slightly different from the previously favored RNN cell. An RNN cell has a vertical input and an output of the current time step, and a horizontal input and output from the former time step to the next. The LSTM has two horizontal inputs and outputs. We can think of the first input and output as the long-term memory and of the second as the short-term memory. The long-term memory gets passed immediately to a forget gate, which dictates which of the stored memory should be forgotten. It drops some memories and gains memories that are selected by the (vertical) input gate. This current memory is passed on to the next time step as long-term memory. The same information is passed through a tanh function and filtered by the output gate to create the short-term memory that will be passed on to the next time step. This system allows the model to distinguish between relevant short-term information (e.g. remembering a preceding article) and relevant long-term information (e.g. capturing what preceded a subordinate clause) (Géron, 2017).

A bidirectional LSTM does not only pass information from the previous time step to the next, but also in the opposite direction. This is relevant in language data, because a unidirectional LSTM would only have access to the preceding time steps (i.e. the preceding tokens in the sentence). Bidirectionality ensures that each time step has information from the preceding tokens and the tokens that follow, thereby accessing a complete picture of the sentence.
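A minimal Keras sketch of such a bidirectional LSTM classifier over precomputed token embeddings is shown below; the sequence length, layer sizes and number of classes are illustrative assumptions, not the hyperparameters used in this thesis.

    from tensorflow.keras import layers, models

    SEQ_LEN, EMB_DIM, N_CLASSES = 50, 300, 11   # illustrative: 10 MF classes plus non-moral

    model = models.Sequential([
        layers.Input(shape=(SEQ_LEN, EMB_DIM)),      # one embedding vector per token
        layers.Bidirectional(layers.LSTM(64)),       # reads the sequence in both directions
        layers.Dense(N_CLASSES, activation="softmax")
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    model.summary()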

2.3. Computational analysis of moral foundations

Graham et al. (2009) designed a dictionary of words belonging to moral foundations in order to analyse extremely liberal and extremely conservative sermons. In a team of six, they collected words that related to the moral foundations, and categorised them per moral foundation, separated by each virtue and vice. The last category they distinguish consists of words that discuss morality in general, such as 'righteous' and 'offensive'. The collected words were then manually checked by the researchers to assure quality. This dictionary was used as part of the LIWC (Pennebaker et al., 2001), a popular word count programme used for psychological research, under the assumption that the presence of words related to moral foundations (i.e. the words collected by Graham et al.) reflects the presence of the respective moral foundation.

One of the problems Graham et al. encountered was that, based on word counts, one cannot estimate whether the moral foundations-related words are used to support or reject that specific moral foundation. Consequently, the word count method resulted in an unexpected finding, namely that ingroup terms (related to the moral foundation of loyalty) were used more often by liberals than by conservatives. However, when inspecting the data manually, Graham

et al. found that these words were in fact used to reject the moral foundation of ingroup loyalty.

As a result of this inherent limitation of the LIWC, others took different approaches to analyse moral foundations in text. Often, the words collected in this study are used as a starting point, referred to as the Moral Foundations Dictionary (MFD). In the following sections we will discuss the various approaches taken in previous research to study the occurrence of moral foundations in texts. The first studies were conducted from a psychological perspective, paying not so much attention to the method, but more to the conclusions that could be drawn from the results (Dehghani et al., 2014; Kaur and Sasahara, 2016; Sagi and Dehghani, 2014a,b; Sagi et al., 2015). Later on, more studies appeared that focus predominantly on the Natural Language Processing (NLP) perspective. In Section 2.3.1 we will discuss the more psychologically focused studies, followed by the studies focused on NLP. We will first discuss some theoretical background on word embedding methods and neural networks, before we move on to discuss studies using word embeddings in Section 2.3.2, studies using feature-based models in Section 2.3.3, and studies that incorporate background knowledge in Section 2.3.4.

2.3.1. Computational tools for psychological analysis

Dehghani et al. (2014) used a collection of political weblogs about the building of a mosque near Ground Zero as their data set. These weblogs were labelled as being either liberal or conservative. They applied semi-supervised Latent Dirichlet Allocation (LDA, a method for topic modelling), seeding 3 words from each virtue and vice of each moral foundation (as found in the MFD). This stimulates the LDA to find topics related to the seed words. The result was a collection of 'moral foundation topics' for both liberals and conservatives. Interestingly, the same moral foundation topic consisted of different words between the two political ideologies, indicating a difference in perception of those moral foundations. For example, for the conservatives, contradicting pairs were very prevalent in the topic of Fairness (liberty vs conservatism, individual vs society, freedom vs rules). In contrast, liberals spoke about politics, elections, leaders and campaigns.

In order to quantify the presence of moral foundations in a text, Sagi and Dehghani (2014a) used word embeddings. Word embeddings are vectors that represent the meaning of a word as a point in a vector space. Words that occur in similar contexts (these are often also words with similar meaning) are close to each other in the vector space. For a more extensive explanation of word embeddings, see Sections 2.2.1 and 2.2.2. Specifically, Sagi and Dehghani created two types of embeddings. Firstly, moral foundation embeddings, where all words in the moral foundations dictionary are collected and turned into an embedding per word. Secondly, they create keyword embeddings, where keywords are collected for each sample, and turned into embeddings. For each input text, the embeddings of the keywords present in that text and the moral foundation embeddings are randomly sampled and the cosine similarity between them is calculated. This process is repeated until one of the sets runs out, after which the cosine distances are averaged. The whole procedure is repeated 1000 times, in order to obtain a general average distance between the embeddings of the keywords in the input text, and the embeddings of the moral foundations. The average distance between the keyword embeddings of a sample and the embeddings for all moral foundations mixed is called the 'moral load' of that sample. The average distance between the keyword embeddings of a sample and the moral foundation embeddings that belong to one specific moral foundation is called the 'moral load' of that specific moral foundation in that sample.
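A schematic numpy sketch of this sampling procedure is given below; the function and array names are hypothetical, and the exact sampling scheme of Sagi and Dehghani (2014a) may differ in detail.

    import numpy as np

    def moral_load(keyword_vecs, mf_vecs, n_rounds=1000, seed=0):
        """Average cosine similarity between randomly paired keyword and MF embeddings."""
        rng = np.random.default_rng(seed)
        cosine = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        round_means = []
        for _ in range(n_rounds):
            k = rng.permutation(len(keyword_vecs))
            m = rng.permutation(len(mf_vecs))
            n_pairs = min(len(k), len(m))            # stop when one of the sets runs out
            sims = [cosine(keyword_vecs[k[i]], mf_vecs[m[i]]) for i in range(n_pairs)]
            round_means.append(np.mean(sims))
        return float(np.mean(round_means))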

Similar methods to Sagi and Dehghani (2014a)'s have been used in many more studies. Firstly, Sagi and Dehghani (2014b) scraped tweets on the US federal shutdown in 2013 and combined the above method with a community detection algorithm to analyse the tweeting and retweeting of moral sentiment by American liberals and conservatives. Sagi et al. (2015) use it to analyse how progressive and conservative news outlets speak about climate change and global warming, and Dehghani et al. (2016) applied this technique to analyse how ideas about purity predict connections in social networks.

Kaur and Sasahara (2016) collected tweets on topics related to 'moral concern', such as abortion, homosexuality and religion. They were inspired by Sagi and Dehghani (2014a)'s method, but only used the vices of the Moral Foundations Dictionary to create a concept embedding: one embedding per moral foundation that is the average embedding of all words classified as belonging to the vice part of that moral foundation in the MFD. They argued that using only the vice words would be more effective, as they were looking specifically at topics that certain people condemn on moral grounds. Kaur and Sasahara illustrate that this concept embedding method is able to classify tweets where the moral foundation is not visible on the surface. They describe how one tweet has a high correlation with fairness, even though it does not contain any word from the MFD category of fairness. This is because there were some other words in the tweet that have high cosine similarity with the embedding of the word fairness.

The studies in this section were mainly focused on the psychological perspective. The computational methods were not emphasised, and the results were the main topic of interest. In the following sections, we will see that this attention shifts: although some of the researchers do still focus on the results of their study, the methods themselves gain more interest. For readability purposes, the theoretical background behind word embedding methods and neural networks was discussed first (Section 2.2); we need this information to understand the studies that have been previously conducted. These studies will be discussed by method.

2.3.2. Word embeddings

In this section we will discuss the previous moral foundation recognition studies that used word embeddings in their research. As mentioned in Section 2.2, word embeddings are numerical representations of word meaning that are frequently used to find meanings in text that are not explicit.

Garten et al. (2016) collected tweets on Hurricane Sandy and manually annotated 3000 of them. They used a Word2Vec model trained on Google News and a GloVe model trained on a Twitter corpus to make concept embeddings for each moral foundation. They also experimented with creating concept embeddings out of "representative subsets of the complete MFD categories (four words from each category)". For each input tweet, they calculate the cosine similarity between the tweet embedding and each concept embedding, which was used as input for a binary logistic regression classifier with 10-fold cross-validation. For each moral foundation, a separate model was created. Due to an imbalanced data set, they used oversampling to account for the lower-frequency classes. All their models performed significantly better than the word count method (see Section 2.3). The best-performing model was the one where the embeddings were trained on the Google News corpus2, and where the concept embeddings were based on the subset of the moral foundations dictionary. This model had a precision of 0.372, recall of 0.840 and F1 score of 0.496.

2https://code.google.com/archive/p/word2vec/
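The sketch below illustrates this kind of similarity-feature set-up with scikit-learn; the random arrays are placeholders for real tweet and concept embeddings, not data from Garten et al. (2016).

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    tweet_vecs = rng.normal(size=(200, 300))      # placeholder averaged tweet embeddings
    concept_vecs = rng.normal(size=(10, 300))     # placeholder concept embeddings (one per virtue/vice)
    labels = rng.integers(0, 2, size=200)         # placeholder binary labels for one moral foundation

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # One feature per concept embedding: the tweet's cosine similarity to it.
    features = np.array([[cosine(t, c) for c in concept_vecs] for t in tweet_vecs])

    clf = LogisticRegression(max_iter=1000)
    print(cross_val_score(clf, features, labels, cv=10, scoring="f1").mean())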

Mooijman et al. (2017) manually labelled 5000 tweets on the Baltimore Protests. These tweets are converted into embeddings in an embedding layer, which are passed on to a Long Short Term Memory network (LSTM). The LSTM output vector is concatenated with other dense features, such as the percentage of words that match each category in the MFD. This vector is given as input to a dense layer with a Softmax activation function, which provides a binary output. Models were trained for all moral foundations and the results were combined. This resulted in an accuracy of 0.890 and an F1-score of 0.880. This is a much better performance than Garten et al. (2016), possibly because using embeddings as input allows the model to find clusters, while using the cosine distances from concept embeddings might result in confusing clusters that lie equally far from the concept embedding.

Rezapour et al. (2019) extended the MFD with grammatical inflections, and used WordNet (Miller, 1995) to retrieve synonyms, hyponyms and antonyms. These were all checked by human annotators, to confirm that they fit in the category they had been assigned to. Words that occurred in two classes in the original MFD were sorted into the class they fit best, although there was low inter-annotator agreement about how to do so. The baseline model was an LSTM containing the embedding of the input data. For the additional models, Rezapour et al. (2019) created embedding layers for the original MFD, the extended MFD and the extended MFD with POS-tag correction. POS-tag correction means that words are POS-tagged in order to make sure homonyms are not counted if their meaning is unrelated to moral foundations. For example, safe (adjective) is included, but safe (noun) is not. The last hidden state of the baseline model is concatenated with the MFD embedding layers of the other models. Rezapour et al.'s best performing model is the extended MFD with POS-tag correction, with an accuracy of 0.866.

2.3.3. Feature-based models

Teernstra et al. (2016) generated keywords from a data set of tweets concerning the possibility of Greece leaving the EU during the euro crisis. Using these as input to a Naive Bayes classifier, they obtained a 0.65 accuracy. Despite its simplicity, their classifier performs better than some of the more complex models, with an F1-score between 0.58 and 0.73 for each class. Interestingly, removing stop words or adding MFD words to the feature set did not improve the model. This leads the authors to question the completeness and usefulness of the MFD. Teernstra et al.'s suggestion to extend the MFD has been adopted by Rezapour et al. (2019) (see Section 2.3.2) and Araque et al. (2019) (below).

Araque et al. (2019) extended the MFD using WordNet, and also annotated their extended MFD (which they call MoralStrength) with moral valence. Moral valence is a number that expresses how strongly a word communicates moral sentiments. They create four feature sets to compare their extended lexicon to the original: 1) a unigram set; 2) a frequency count where the number of words with a moral valence higher than a certain threshold is counted for each virtue and vice of each moral foundation; 3) a model that summarises the moral valence by calculating the mean, standard deviation, median and highest maximum of the moral valence in the input text; and 4) the similarity between the embedding of the input sample and the embedding of the MFD. These feature sets are evaluated separately, but also combined, as input to a logistic regression algorithm. Araque et al.'s main conclusion was that the unigram model performed well, but that the addition of statistics on moral valence did improve the model. The best performing model varied strongly between the subcorpora of the

Moral Foundations Twitter Corpus (Hoover et al., 2019); however, all of them obtained an F1-score between 0.814 and 0.896.

2.3.4. Incorporating Background Knowledge

Using movie reviews and summaries, Boghrati et al. (2015) show that both humans and computers perform better at text classification tasks if they have some background knowledge on the topic they are working on. Lin et al. (2018) accommodate this by using entity linking. For each input tweet, they use not only Word2Vec embeddings of the samples, but also a background knowledge vector. In order to create the latter, they extract entities from the input tweet. The abstracts of the Wikipedia pages for these entities are retrieved, in addition to some metadata. All retrieved information is merged into one document. For each word in that document, Lin et al. (2018) calculate the Pointwise Mutual Information with corpus level significance (cPMId) (Damani, 2013) between that word and the target foundation. Words are ranked according to their cPMId, and the top 100 words are selected. Those words are encoded in a 100-dimensional vector, which represents the term frequency of each of those words in the document. The word embeddings and the background vectors are loaded into separate LSTMs. The two final hidden states of each LSTM are concatenated, and this vector is passed on to a Softmax layer (a schematic sketch of this kind of two-input architecture is given at the end of this section). The Softmax layer gives a binary output, so different models are created for the different moral foundations. The F1-score fluctuates between the moral foundations, with the lowest score for sanctity (0.374) and the highest score for care (0.823). This model performed better than the model with only tweet embeddings. The addition of the MFD to the feature set improves the model in two out of six moral foundations, but only marginally so.

In another attempt to include background knowledge when classifying moral foundations, Johnson and Goldwasser (2018) include a wide range of features in their Probabilistic Soft Logic model. They used the 20 most frequent bi- and trigrams for each political party on each issue. Besides word unigrams as present in the MFD, the tweets were also annotated with the political party the writer was affiliated to, the issue the tweet discusses, the frame in which it is presented, and the so-called 'abstract phrase'. The issue is the topic of the tweet. The frame shows how the issue is presented. For example, a new health care bill can be presented as something that contributes to a longer and healthier life, or it can be presented as something that costs a lot of money. The frames used are standardised ones from the Congressional Tweets Dataset (Johnson et al., 2017). The abstract phrase is also related to the issue, but on a more abstract level. The abstractions that Johnson and Goldwasser (2018) distinguish are the following: "legislation or voting, rights and equality, emotion, sources of danger or harm, positive benefits or effects, solidarity, political maneuvering, protection and prevention, American values or traditions, religion, and promotion". In the example about the health care bill, regardless of the framing, the abstract phrase would be 'legislation or voting'. These features were collected and used as input to a Probabilistic Soft Logic model. The best performing model had an F1-score of 0.725, averaged over all moral foundations.

Based on the assumption that people from different demographics have different moral preferences, Garten et al. (2019) incorporate demographic information in their model. They train a model that judges to what extent someone has moral objections to a given situation, given that situation and that person's demographic.
Situations either clearly belonged to one of the moral foundations, were moral but vague in terms of the MF framework, or were not moral. Although the addition of demographic vectors improved the model over the addition of raw demographic features, the model

explains between 8.5% and 24.7% of the variation in the data (i.e. the R² measure was still between 0.085 and 0.247 for the different moral foundations).

In conclusion, we have seen a wide variety of approaches to do in-domain classification of moral foundations in text. We will proceed to discuss our data and methods to do out-domain classification of moral foundations.
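The sketch below gives a schematic Keras version of such a two-input set-up (a sequence of word embeddings plus a background knowledge vector). The layer types and sizes are illustrative assumptions; in particular, the background branch is shown here as a dense layer rather than the second LSTM used by Lin et al. (2018).

    from tensorflow.keras import layers, models

    SEQ_LEN, EMB_DIM, BG_DIM = 50, 300, 100          # illustrative sizes

    # Branch 1: sequence of word embeddings for the tweet itself.
    tweet_in = layers.Input(shape=(SEQ_LEN, EMB_DIM), name="tweet_embeddings")
    tweet_vec = layers.LSTM(64)(tweet_in)

    # Branch 2: term-frequency vector of background knowledge terms (e.g. from Wikipedia abstracts).
    bg_in = layers.Input(shape=(BG_DIM,), name="background_vector")
    bg_vec = layers.Dense(32, activation="relu")(bg_in)

    # Concatenate both representations and make a binary decision per moral foundation.
    merged = layers.concatenate([tweet_vec, bg_vec])
    output = layers.Dense(1, activation="sigmoid")(merged)

    model = models.Model(inputs=[tweet_in, bg_in], outputs=output)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])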

3. Data

The aim of this study is to recognise moral foundations in extremist data. As the training data is non-extremist, this is an out-domain task. However, as discussed in Section 1, our extremist data differs from the training data in two ways: besides containing extremist content, the extremist test data is collected from a different medium than the non-extremist training data. Namely, the training data consists of tweets while the test data consists of forum messages. Tweets and forum messages are both forms of social media, but there are some important differences. Firstly, tweets are bound by a strict limit of 280 characters, sometimes resulting in formulations resembling news headlines. Although it is possible to reply to tweets, compared to forum messages tweets are presented much less as an interactive discussion. This means that training on tweets and testing on forum data is an out-domain task in itself. To make up for this difference, we will process the forum messages to make them resemble tweets as much as possible.

In order to accurately estimate how much of the difference in performance is caused by the test data being extremist, we have three test sets: 1) a subset of the MFTC, consisting of non-extremist tweets, 2) Stormfront data, consisting of extremist forum messages, and 3) Reddit data, consisting of non-extremist forum messages. For an overview of the properties of our test data, please refer to Table 3.1. In this section, we will describe each data set separately.

Data          Topic           Medium
MFTC (test)   non-extremist   Twitter
Reddit        non-extremist   forum
Stormfront    extremist       forum

Table 3.1.: The test sets and their properties

3.1. The Moral Foundations Twitter Corpus

There is one publicly available corpus that has been annotated for moral foundations: the Moral Foundations Twitter Corpus (MFTC) (Hoover et al., 2019). Its authors selected six morally relevant topics: the All Lives Matter movement, the Black Lives Matter movement, the Baltimore Protests, the 2016 presidential elections, hurricane Sandy and #MeToo. The corpus consists of tweets discussing these topics. These tweets were partially randomly sampled, and partially selected under semi-supervision. The main reason for the semi-supervision is that morally loaded tweets would otherwise be too rare in the corpus, increasing training difficulty. As publishing tweets is prohibited according to the Twitter guidelines, Hoover et al. published the tweet IDs and their annotations, so that we could scrape the tweet messages ourselves. Out of all 35,108 tweets in the MFTC, we managed to scrape 19,022 for the current study. The other tweets had been removed from Twitter.

Hoover et al. required every sample to be annotated by three or four annotators. For each tweet, all annotations are published. In order to show how sure they are of the correctness of their annotations, they report inter-annotator agreement: the degree to which annotators agree on the annotations for the same samples. Inter-annotator

agreement was measured in Fleiss Kappa (Kappa) (Fleiss, 1971), which is calculated as follows:

Kappa = ((no. agreed / no. total) − P(agreement)) / (1 − P(agreement))

Hoover et al. have reported a Kappa score of 0.315. As Kappa has been criticised for resulting in lower scores when there are more classes, Prevalence and Bias Adjusted Kappa (PABAK) (Sim and Wright, 2005) is also reported. It is calculated as follows:

PABAK = (no. agreed − no. disagreed) / no. total

Hoover et al. report a PABAK score of 0.366 (a short computational sketch of both agreement measures is given at the end of this subsection). Landis and Koch (1977) suggest that a Kappa between 0.20 and 0.40 indicates fair agreement, although they provided no evidence to support their thresholds.

In their annotation manual, Hoover et al. explain this low inter-annotator agreement. When they invited their annotators to discuss their annotations, the annotators could often understand why the other annotator had annotated as they did, once it was explained to them. However, they would still find their own annotation the more important one. This corresponds to Weber et al. (2018), who argue that moral foundations are not entirely objective. They find that people with similar religious and political affiliations reach higher inter-annotator agreement. This effect was still visible when people received more training. Therefore, they argue that there might not be one ground truth, but something like 'crowd truth': an inherently subjective range of possible truths. This means that by consulting multiple annotators, one should obtain a sufficient sample of the range of possible answers. As Hoover et al. (2017) purposefully reported all annotated moral foundations, one could argue that the crowd truth has been accounted for. We considered creating a multi-label classifier, but when looking at the corpus annotations, some of the samples had such a wide variety of annotations that using them all seemed unfeasible. Therefore we chose the label voted for by the majority of annotators as the 'ground truth'. Nevertheless, further research could experiment with multi-label classifiers.

As mentioned in Section 2.1, one drawback of the MFTC is that it is not annotated for the moral foundation of liberty, which is particularly relevant with regard to extremist groups. Unfortunately, the MFTC is the only annotated corpus that is currently available, so we have no possibility to include the MF liberty in our experiments.
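The small function below illustrates both measures for the two-annotator case, following the formulas above; the chance-agreement term is passed in as a parameter, whereas Fleiss' Kappa proper estimates it from the observed label distribution.

    def agreement_scores(label_pairs, p_chance):
        """Kappa and PABAK for a list of (annotator_1, annotator_2) label pairs."""
        n_total = len(label_pairs)
        n_agreed = sum(1 for a, b in label_pairs if a == b)
        n_disagreed = n_total - n_agreed

        p_observed = n_agreed / n_total
        kappa = (p_observed - p_chance) / (1 - p_chance)
        pabak = (n_agreed - n_disagreed) / n_total    # equals 2 * p_observed - 1
        return kappa, pabak

    # Toy example with three annotated samples and a uniform chance level over 11 classes.
    print(agreement_scores([("care", "care"), ("harm", "care"), ("non-moral", "non-moral")], 1 / 11))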

3.2. The Stormfront Corpus

In order to gather test data, we selected parts of the Stormfront forum that were scraped by TNO1, in cooperation with whom this research was carried out. TNO scraped the Stormfront forum using Tor2 over the course of 5 years, starting in 2015. Stormfront3 is an extreme-right forum that has existed since 1995. It was created by Don Black, previously a member of the Ku Klux Klan. The forum gained popularity by providing "an environment friendly to the various branches of white nationalism". The website has been called the first ever hate website (Caren et al., 2012).

We selected forum posts that contained words like hurricane, #allivesmatter, #blacklivesmatter, #bluelivesmatter, Baltimore, Freddie (Gray, the person whose murder sparked the protests), and #metoo, aiming for posts that discuss topics approximately similar to those included in the MFTC. These posts were stripped of their HTML markup,

1 See Preface
2 https://en.wikipedia.org/wiki/Tor_(anonymity_network)
3 https://www.stormfront.org/forum/ Note: the content of this forum may be perceived as offensive.

empty lines were deleted, and the posts were split per paragraph. We deleted samples that contained only a link, or meta text such as "Originally posted by:". We then selected paragraphs with a length similar to Twitter messages (<300 characters). As these paragraphs are parts of forum messages that might have been longer, the paragraphs might lack some context. Nevertheless, we would not use much longer paragraphs, as these might be very different from tweets in their style of writing. Additionally, annotating for moral foundations in a longer text is more difficult, as it is more likely that a variety of moral foundations occur. Next, we randomly selected 5000 items, which were filtered to only keep the posts written in English. Unfortunately, a number of the sentences got cut off in the extraction process and contained only a few words. We are unsure what exactly caused this. These incomplete sentences were manually deleted, as well as lines that consisted of only a name and function; only a date and location; only a location and a statistic; lines that only contained a book or movie title; and names of organisations. The remaining data was divided into batches of 100 samples, which could be handed out separately for annotation.
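A rough sketch of this cleaning and filtering pipeline is given below; the thesis does not specify its exact tooling, so the regular expressions, thresholds and batching shown here are illustrative.

    import re

    def clean_posts(raw_html_posts, max_chars=300, batch_size=100):
        """Strip markup, drop unusable lines and batch the remaining short paragraphs."""
        samples = []
        for html in raw_html_posts:
            text = re.sub(r"<[^>]+>", " ", html)                 # remove HTML markup
            for paragraph in text.split("\n"):
                paragraph = paragraph.strip()
                if not paragraph:
                    continue                                      # skip empty lines
                if re.fullmatch(r"https?://\S+", paragraph):
                    continue                                      # skip link-only samples
                if paragraph.startswith("Originally posted by"):
                    continue                                      # skip quote meta text
                if len(paragraph) < max_chars:                    # keep tweet-length paragraphs
                    samples.append(paragraph)
        # Divide the retained samples into batches that can be handed out for annotation.
        return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]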

3.3. The Reddit Corpus

Reddit4 is the nineteenth most visited website globally5. It is a forum which contains a wide variety of subforums, which concern topics like sports, hobbies, politics and academia. As it is a generic forum, a diverse range of opinions is represented on Reddit. Although one can find extremist opinions on Reddit, the vast majority is non-extremist, making it an interesting counterpart for the Stormfront corpus.

We used the Python Reddit API Wrapper (PRAW)6 to scrape the first 1000 posts with certain keywords from all subreddits. Once again, we used the keywords related to the topics in the Moral Foundations Twitter Corpus, namely: baltimore riots; all lives matter; black lives matter; blue lives matter; ; harvey weinstein; #metoo; sexual harrassment; hurricane sandy; sandy. These keywords are slightly different from those used to select relevant items from the Stormfront corpus, as PRAW allowed for bigram search as well. Consequently, these posts were more 'on topic' than the posts retrieved from Stormfront: a Stormfront post that mentions Baltimore is not guaranteed to concern the Baltimore riots, while a Reddit post that mentions Baltimore riots is. Therefore, we also retrieved the first 1000 posts in which the keyword 'not' occurred, thereby allowing for a more equal distribution of on-topic and off-topic fragments compared to the Stormfront data. This is important because on-topic fragments are more likely to be classified correctly. Included paragraphs were selected in a similar way as the Stormfront paragraphs. Like the Stormfront samples, the Reddit samples too were divided into batches of 100 samples for annotation.
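The snippet below sketches this kind of keyword search with PRAW; the credentials are placeholders that require a registered Reddit API application, and only a subset of the keywords listed above is shown.

    import praw

    reddit = praw.Reddit(client_id="YOUR_ID", client_secret="YOUR_SECRET",
                         user_agent="mf-corpus-scraper")

    queries = ["baltimore riots", "black lives matter", "hurricane sandy", "not"]
    posts = []
    for query in queries:
        # Search across all subreddits; Reddit listings are capped at roughly 1000 results.
        for submission in reddit.subreddit("all").search(query, limit=1000):
            posts.append({"id": submission.id,
                          "title": submission.title,
                          "text": submission.selftext})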

3.4. Annotation procedure

The annotation procedure was as follows: annotators were recruited among friends and colleagues. They received training on Moral Foundation Theory, in the same way Weber et al. (2018) trained their annotators7. The training consisted of two parts: 1) the first 8 minutes of Jonathan Haidt's TED talk on moral foundations (TED-ed, 2012); and

4 https://www.reddit.com/
5 https://www.alexa.com/siteinfo/reddit.com#section_traffic
6 https://praw.readthedocs.io/en/latest/
7 Many thanks to Weber et al. (2018) for allowing us to use their training materials

2) a text file that contained an explanation of the purpose of the study and a detailed explanation of each moral foundation, including detailed examples of situations where the foundation is upheld or violated. The technical explanation of how to annotate data was modified for the purpose of this study, as Weber et al. required their participants to mark passages in news articles, while our participants were asked to label a sample. In addition, we included some advice given in Hoover et al. (2017) for labelling samples: first decide whether the content of the sample is moral or not, and if it is, decide which moral foundations are present. We also included their advice to be aware of one's inferences. Many samples require some context for understanding, and we are likely to 'fill in the blanks' with our own ideas. It is very likely that our own biases affect the way we annotate, which is why we instructed the participants to be cautious. Participants were told to first watch the video and read the instructions, and then begin annotating.

Each text sample was annotated by 2 annotators. There were eleven annotators in total; some annotators annotated multiple batches. This resulted in a Kappa of 0.25 and a PABAK of 0.40 for the Stormfront data (429 annotated samples), and a Kappa of 0.29 and a PABAK of 0.51 for the Reddit data (392 annotated samples). For the Kappa and PABAK per batch, per class, please refer to Appendix A. The most significant difference between Kappa and PABAK is caused by the low prevalence of the different moral foundations in the test data (Byrt et al., 1993). In roughly one third of the cases, the annotators did not agree on the annotation. In these cases, we made the final decision between the classifications suggested by the annotators. The division of the data over all annotation classes can be found in Table 3.2.

MF                   MFTC     R     SF
Non-moral (%)          25    58     47
Care (%)                6     4      3
Harm (%)                9    12      9
Fairness (%)            5     6      5
Cheating (%)            7     6      7
Authority (%)           3     3      3
Subversion (%)          4     3      4
Loyalty (%)             6     4     16
Betrayal (%)            2     4      5
Sanctity (%)            2     1      1
Degradation (%)         2     0      0
Total # of samples  19 022   392    492

Table 3.2.: Percentage of each Moral Foundation (MF) in the MFTC, and in the data we annotated for this project: the Reddit data (R) and the Stormfront data (SF). Classes that contain fewer than 1000 items in the MFTC were oversampled (sampled twice) in the training data.

4. Method

We conducted two studies. In the first study, we trained a moral foundations recognition model on non-extremist data, and tested it on extremist data. We also compared the performance of the often-used Word2Vec embeddings with that of the newer BERT embeddings. This gave us insight into how moral foundation recognition models generalise to the out-domain. Table 4.1 visualises the first study. In the second study, we attempted to improve the out-domain performance, specifically on extremist data, by adding contextual information on a variety of topics. The experiments in the second study are visualised in Table 4.2.

Set-up     Test sets
BERT       MFTC (test), Reddit, Stormfront
Word2Vec   MFTC (test), Reddit, Stormfront

Table 4.1.: Study 1: we compare the performance of two set-ups on three different data sets.

Set-up                                  Test sets
Baseline                                MFTC (test), Reddit, Stormfront
Set-ups incl. contextual information    MFTC (test), Reddit, Stormfront

Table 4.2.: Study 2: we compare the performance of different types of contextual information being added to the input of our set-up. We are most interested in the performance on the Stormfront data, but for comparison we also look at the results on the other two data sets.

For each model, we preprocessed the data following Rezapour et al. (2019)’s preprocessing procedure. We removed URLs, usernames, punctuation and numbers. Preceding # signs were removed from hashtags, but the following characters were kept (e.g. ‘#BLM’ was turned into ‘BLM’). Contractions were expanded using the contractions package1 (e.g. I’ve to I have) and all text was lowercased. Additionally, we replaced the HTML code &amp; with ‘and’. This occurred in the forum data but not in the Twitter data, which is why we felt it was appropriate to add this step to Rezapour et al. (2019)’s preprocessing method. The MFTC data was split into 70% training data, 15% development data and 15% test data (for comparison with the Reddit and Stormfront data). As can be seen in Table 3.2, some classes occur more often than others. The non-moral class accounts for 25% of the MFTC, while the other ten classes together cover the rest of the data. Therefore, we used oversampling in the training data to ensure a more balanced training set. We experimented with oversampling all moral classes, as they are rare compared to the non-moral class, but we ultimately obtained the best results on the development set by double sampling all classes that occurred fewer than 1000 times in the training data. These classes can be identified in Table 3.2 by their low frequencies. All other classes remained the same.

1 https://pypi.org/project/contractions/
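As an illustration, a minimal sketch of this preprocessing pipeline is given below. It reflects our reading of the steps listed above; the regular expressions and the example string are our own.

```python
import re
import contractions  # https://pypi.org/project/contractions/

def preprocess(text: str) -> str:
    text = text.replace("&amp;", " and ")           # HTML code for '&' in the forum data
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)               # remove usernames
    text = re.sub(r"#(\w+)", r"\1", text)           # '#BLM' -> 'BLM'
    text = contractions.fix(text)                   # "I've" -> "I have"
    text = re.sub(r"[0-9]+", " ", text)             # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)            # remove punctuation
    return re.sub(r"\s+", " ", text).strip().lower()

print(preprocess("I've seen this! #BLM https://t.co/xyz @user 2015 &amp; more"))
# -> "i have seen this blm and more"
```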

4.1. Study 1: Comparing in-domain and out-domain performance using Word2Vec and BERT

In the first study we created two set-ups: one that took Word2Vec embeddings as input, and one that took BERT embeddings as input. The primary aim of this study is to see how the performance on the in-domain data compares to the performance on the out-domain data. The comparison between a Word2Vec model and the BERT model also allows us to see how ‘state-of-the-art’ BERT performs compared to Word2Vec, which has proven its utility in the past. For Word2Vec, we used a set2 of embeddings pretrained on the Google News corpus (1 billion words), which has a vocabulary of 3 million words. The embeddings contain 300 dimensions. As the Word2Vec vocabulary is large, but not infinite, there were Out Of Vocabulary (OOV) words in the data. If there were OOV words in a sample, but also words that do occur in the Word2Vec vocabulary (i.e. that do have a pretrained embedding), the OOV words were ignored. In our case, there were 220 samples that did not contain any words present in the vocabulary, which means that no embedding could be created for them. These samples were represented by a dummy vector consisting entirely of zeros, in order to keep them in the data set and ensure a fair comparison with the other models. For our experiments using BERT, we used the BERT base uncased pretrained model from Transformers3. BERT base embeddings are 768-dimensional. We also experimented with BERT large, which contains 1024 dimensions, but found poorer performance and decided to use the BERT base embeddings instead. Thanks to WordPiece embeddings, BERT is capable of handling OOV words without issue (see Section 2.2.2). A sketch of how these input embeddings can be constructed is given after the list below. We used Keras to build our set-ups. In order to find the most suitable model, we started with a simple set-up, consisting of a dense layer and an output layer. We then experimented with adding layers and varying types of layers (Dense, LSTM, Bidirectional LSTM). We applied Talos hyperparameter tuning4 to the models that performed best on our validation set (15% of the MFTC data), varying the following:

• nodes per Bidirectional LSTM layer (options depending on input size)

• nodes per dense layer (128, 256, 512, 777, 1024)

• dropout (0.0, 0.1, 0.2, 0.3, 0.4, 0.5)

• learning rate (0.1, 0.01, 0.001, 0.0001)

• activation function (Elu, Relu, Sigmoid, Softmax, Tanh)

• optimizer (Adam, Nadam, Adagrad, SGD)
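The sketch below illustrates one way to construct the input embeddings, under our reading of the set-up: a sequence of pretrained 300-dimensional Word2Vec vectors with OOV tokens dropped and a zero dummy input for fully-OOV samples, plus token vectors from the last hidden layer of BERT base uncased. The maximum sequence length and the helper names are assumptions, not taken from the thesis.

```python
import numpy as np
import torch
from gensim.models import KeyedVectors
from transformers import BertModel, BertTokenizer

DIM, MAX_LEN = 300, 50  # MAX_LEN is an assumption
w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

def word2vec_sequence(tokens):
    """Sequence of Word2Vec vectors; OOV tokens are ignored, fully-OOV samples become zeros."""
    vectors = [w2v[t] for t in tokens if t in w2v]
    seq = np.zeros((MAX_LEN, DIM), dtype=np.float32)
    if vectors:
        seq[:min(len(vectors), MAX_LEN)] = np.stack(vectors)[:MAX_LEN]
    return seq

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

def bert_sequence(text):
    """768-dimensional token vectors from BERT base; WordPiece handles OOV words."""
    with torch.no_grad():
        out = bert(**tokenizer(text, return_tensors="pt", truncation=True))
    return out.last_hidden_state[0].numpy()
```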

The differences in performance during hyperparameter tuning were marginal. Eventually, the best performing set-up for the Word2Vec embeddings was the following: the embeddings were used as input to a Bidirectional LSTM of size 200, followed by a hidden layer of 777 nodes with Elu activation, and a softmax output layer of size 11. We used categorical crossentropy loss, and Nadam optimization with a learning rate of 0.1. This set-up is visualised in Figure 4.1. The best performing BERT set-up was slightly different: the BERT embeddings were used as input to a bidirectional LSTM layer of 250 nodes, followed by a dropout layer of size 0.2, a dense layer of 256 nodes with Elu activation, another dropout layer of size 0.4, and an output layer of size 11 with softmax activation.

2 https://code.google.com/archive/p/word2vec/
3 https://huggingface.co/transformers/model_doc/bert.html#bertmodel
4 https://autonomio.github.io/docs_talos/#introduction

Figure 4.1.: Set-up Word2Vec
Figure 4.2.: Set-up BERT
For the BERT set-up we again used categorical crossentropy loss and the Nadam optimizer with a learning rate of 0.1. The BERT set-up is visualised in Figure 4.2. Both set-ups were trained for 20 epochs.
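A minimal Keras sketch of the Word2Vec set-up is given below. The input shape and the commented-out training call are our own assumptions; only the layer sizes, activations, loss and optimizer follow the description above.

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES, MAX_LEN, DIM = 11, 50, 300   # MAX_LEN is an assumption

model = keras.Sequential([
    layers.Bidirectional(layers.LSTM(200), input_shape=(MAX_LEN, DIM)),  # BiLSTM of size 200
    layers.Dense(777, activation="elu"),                                 # hidden layer, Elu activation
    layers.Dense(NUM_CLASSES, activation="softmax"),                     # one unit per moral foundation class
])
model.compile(loss="categorical_crossentropy",
              optimizer=keras.optimizers.Nadam(learning_rate=0.1),
              metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_dev, y_dev), epochs=20)
```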

4.2. Study 2: Improving the set-up for out-domain purposes

In the second study, we attempt to improve the generalisation of our model to extremist test data. Our extended set-up is largely inspired by Rezapour et al. (2019)’s extended set-up. Rezapour et al. created a multi-input set-up, taking both a sample embedding and an embedding generated from Moral Foundations Dictionary (MFD) words as input. The sample embedding goes through a bidirectional LSTM before being concatenated with the MFD embedding. The concatenation then goes through a dense layer with sigmoid activation function and a softmax output layer. The results varied slightly between the different MFD embeddings they created (see Section 2.3.2), but generally improved over the baseline set-up, which contained just the sample embedding. Rezapour et al. show that it is effective to ‘teach’ a set-up about moral foundations, which raises the following question: what other information could be useful? As we aim to improve the performance on the extremist test set specifically, it seemed useful for the set-up to have more information on extremism and related topics, especially given that some of the annotators mentioned that their interpretation of a sample was influenced by the knowledge that they were annotating a text from a white nationalist forum. This is particularly relevant for the present work, as its purpose is to create a model that generalises to all extremist texts, not limited to those found on Stormfront. This means that the additional information should be specific enough to be of value, but generic enough to generalise to other types of extremist data. For example, providing information about white nationalism might be useful for this use case, but would require training a new model when using an Islamic extremist dataset. Therefore, providing information on supremacism might be more suitable. We used Wikipedia5 to retrieve additional information for our model.

5https://en.wikipedia.org/wiki/Main_Page

Wikipedia has become the largest online encyclopedia, including 6,113,318 articles in English6. At the time of writing, it is the thirteenth most visited website in the world7. Wikipedia has previously been used by Lin et al. (2018) to extract context embeddings for moral foundations classification. Its vast number of articles on a wide range of topics, and the fact that Wikipedia is generally considered a reliable source of information, make it a suitable website from which to retrieve background information. Nevertheless, the reliability of the information included on Wikipedia has been criticised by the politically far right, who characterise the website as leftist and politically biased8. Given that our test data comes from far-right sources, this bias might have negative consequences for the performance of our model. Therefore, we also conducted some experiments with Metapedia9, which describes itself as “The alternative encyclopedia”, focussing on European culture, art, science, philosophy and politics. Wikipedia, however, describes it as containing “authoritarian far-right, white nationalist, white supremacist, anti-feminist, homophobic, Islamophobic, antisemitic, Holocaust-denying, and neo-Nazi points of view”10. As the results with Metapedia abstracts were similar to the ones obtained with Wikipedia abstracts, we discarded our worries about left-wing bias and decided to use the more extensive Wikipedia. This is fortunate, both because Metapedia is incomplete on many topics relevant to the present work, and because it would be less suitable when working with other types of extremist data (religious, left-wing, separatist), which would make it necessary to train a different model for each type of data. In order to incorporate this background knowledge, we built a set-up similar to that of Rezapour et al. (2019), which had two inputs: one for the input sample, and one for the background information. These two inputs are processed separately before they are concatenated and processed together. For accurate comparison, we conducted a number of experiments. In the first, both inputs of the model receive the sample embedding; this is our baseline model for this experiment. Secondly, we conducted an experiment where the context embedding is created out of two paragraphs of Lorem Ipsum11, roughly the same length as the average Wikipedia abstract. As this is a ‘fake Latin’ text that looks like normal text while actually being nonsensical, this context embedding should contain random noise. We also tested a variety of context embeddings, on both relevant and irrelevant topics (discrimination, hate groups, supremacy, typewriter, giraffe); the most relevant results are discussed in Section 5.2. The abstracts of the Wikipedia pages were copied to a text file, and stopwords were removed. The text was converted into BERT embeddings. From now on, we will refer to these embeddings as context embeddings. After an experimentation process similar to the one we had previously conducted (see Section 4.1), the best model we found was the following: we created a multi-input model in Keras. One input consisted of the sample embeddings, the other of the context embeddings. The sample embeddings were fed into a bidirectional LSTM layer of size 1024, while the context embeddings were fed into a bidirectional LSTM layer of size 777.
The two outputs were then concatenated and the concatenated vector was forwarded to a dense layer of size 256 with sigmoid activation function, followed by a softmax output layer. This set-up is visualised in Figure 4.3 and sketched in code below. Like the other models, this model was trained for 20 epochs.

6 https://en.wikipedia.org/wiki/Wikipedia:About
7 https://www.alexa.com/siteinfo/wikipedia.org
8 https://en.metapedia.org/wiki/Wikipedia#Leftist.2C_politically_correct.2C_and_anti-White_bias
9 https://en.metapedia.org/wiki/Main_Page
10 https://en.wikipedia.org/wiki/Metapedia
11 https://en.wikipedia.org/wiki/Lorem_ipsum

Figure 4.3.: Set-up Word2Vec
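The following Keras sketch shows this multi-input architecture. The input shapes, the loss and the optimizer are our own assumptions; the layer sizes and activations follow the description above.

```python
from tensorflow import keras
from tensorflow.keras import layers

MAX_LEN, DIM, NUM_CLASSES = 50, 768, 11   # shapes are assumptions

sample_in = keras.Input(shape=(MAX_LEN, DIM), name="sample_embedding")
context_in = keras.Input(shape=(MAX_LEN, DIM), name="context_embedding")

sample_feat = layers.Bidirectional(layers.LSTM(1024))(sample_in)   # BiLSTM of size 1024
context_feat = layers.Bidirectional(layers.LSTM(777))(context_in)  # BiLSTM of size 777

merged = layers.Concatenate()([sample_feat, context_feat])
hidden = layers.Dense(256, activation="sigmoid")(merged)
output = layers.Dense(NUM_CLASSES, activation="softmax")(hidden)

model = keras.Model(inputs=[sample_in, context_in], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="nadam", metrics=["accuracy"])
# model.fit([X_samples, X_context], y_train, epochs=20)
```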

5. Results

5.1. Results study 1

All experiments were run five times and scores were averaged, to compensate for the fact that the parameters of a neural network are initialised randomly and might therefore converge to a local optimum, resulting in different scores. The accuracy and weighted average F1-score of study 1 (for the description, see Section 4.1) are presented in Table 5.1 and Table 5.2. For precision and recall, please refer to Appendix B. We can see that there is a relatively small difference between the performance on the MFTC test set and the Reddit set, while the difference between the Reddit set and the Stormfront set is much larger. Although the performance of the BERT model is worse than that of the Word2Vec model on the test set of the MFTC, it generalises slightly better to the Reddit and Stormfront data in terms of accuracy. In terms of weighted average F1 score, Word2Vec generalises slightly better to the Reddit data, while the generalisation to the Stormfront data is equal for both conditions.

Data         BERT          Word2Vec
MFTC (test)  0.56 (±0.02)  0.59 (±0.01)
Reddit       0.52 (±0.03)  0.51 (±0.03)
Stormfront   0.43 (±0.02)  0.42 (±0.02)

Table 5.1.: Accuracy and standard deviation (between brackets) for each test corpus on the different models. Accuracy is averaged over 5 runs to compensate for random initialisation.

Data         BERT  Word2Vec
MFTC (test)  0.55  0.60
Reddit       0.47  0.48
Stormfront   0.38  0.38

Table 5.2.: Weighted average F1 for each test corpus on the different models. F1 score is averaged over 5 runs to compensate for random initialisation.

5.2. Results study 2

The accuracy and F1-score for the extended set-up are presented in Table 5.3 and 5.4. For the precision and recall, please refer to Appendix C. The most striking feature of these results is that our baseline set-up, which takes the input samples twice, performs slightly better (2%) than our BERT model in Table 5.1 on the Reddit and Stormfront data. This justifies using a different baseline set-up in this section: the different architecture used in the second study leads to a different performance compared to the previous study. If we replace one of the inputs with random noise (a BERT embedding of Lorem Ipsum), the accuracy drops by between 3% (on the MFTC test data) and 5% (on the Reddit data). The model also becomes less stable, as we can see in the standard deviation. If we look at the irrelevant context embedding (giraffe) and the relevant embeddings (hate group, discrimination), we see that their performance on the Reddit and Stormfront data is equal to, or slightly worse than, that of the baseline set-up.

Data  2x            LI            G             HG            D
MFTC  0.56 (±0.02)  0.53 (±0.05)  0.55 (±0.02)  0.57 (±0.01)  0.54 (±0.05)
R     0.54 (±0.02)  0.49 (±0.06)  0.54 (±0.01)  0.53 (±0.02)  0.51 (±0.05)
SF    0.45 (±0.01)  0.41 (±0.03)  0.45 (±0.02)  0.44 (±0.01)  0.43 (±0.04)

Table 5.3.: Accuracy and standard deviation (between brackets) for each test corpus (Moral Foundations Twitter Corpus; Reddit; Stormfront) on the multi-input model that takes the input sample twice (2x); the multi-input model with Lorem Ipsum as secondary input (LI); the multi-input model with the giraffe context embedding (G); the multi-input model with the hate group context embedding (HG); and the multi-input model with the discrimination context embedding (D). Accuracy is averaged over 5 runs to compensate for random initialisation.

Data  2x    LI    G     HG    D
MFTC  0.56  0.52  0.56  0.56  0.54
R     0.48  0.46  0.48  0.47  0.46
SF    0.39  0.36  0.38  0.38  0.38

Table 5.4.: Weighted average F1 for each test corpus (Moral Foundations Twitter Corpus; Reddit; Stormfront) on the multi-input model that takes the input sample twice (2x); the multi-input model with Lorem Ipsum as secondary input (LI); the multi-input model with the giraffe context embedding (G); the multi-input model with the hate group context embedding (HG); and the multi-input model with the discrimination context embedding (D). F1 score is averaged over 5 runs to compensate for random initialisation.

6. Discussion

This discussion consists of four parts. In Section 6.1 we discuss the results of both studies regarding the classification of extremist data in a qualitative manner. In Section 6.2 we attempt to explain the difference in performance between the Word2Vec set-up and the BERT set-up in study 1 (see Section 4.1 and 5.1). We then discuss the limitations of our second study (see Section 4.2 and 5.2), and how this could be improved upon in further work, in Section 6.3. Finally, in Section 6.4 we discuss a common problem in latent semantic tasks, namely the low inter-annotator agreement on the ground truth.

6.1. Qualitative analysis of extremist data classification

Cross-domain classification between Twitter and forum data, as reflected by the difference in results between the MFTC test data and the Reddit data, is handled more effectively than cross-domain classification between non-extremist and extremist data, as reflected by the difference in performance on the Reddit and Stormfront data. This is especially the case because, in our setting, we were able to adjust the formatting of our forum posts to resemble tweets. Although it results in a steep decline in accuracy, it does seem possible to conduct out-domain moral foundations classification. We will now identify some of the challenges for the models. When we look at the confusion matrices for experiment 1 (Tables D.1, D.2, D.3, D.4, D.5, D.6, see Appendix D), the unbalanced nature of the data is clearly reflected in the performance of the classifier. The errors are distributed relative to the frequency of the classes in the training data. The non-moral class is predicted particularly frequently, as it is the majority class. Two classes were remarkable in the way they were misclassified in the Stormfront data, namely loyalty and harm. We discuss those below. Strikingly, in the Stormfront data, the loyalty class was predicted much less often by both the BERT and the Word2Vec model than it occurred in the data. Generally, loyalty occurs much more often in the Stormfront data than in the other data sets. These samples are misclassified as a variety of MFs, most commonly non-moral and harm. If we look into the examples, we gain some understanding of why1. Many samples that refer to the moral foundation of loyalty also contain an element of violence. As one would expect from a white nationalist forum, the Stormfront data contains a lot of racist remarks (hence the high number of samples labelled as loyalty). Many samples mention violence being done by the outgroup to the ingroup, or vice versa. One example is a forum message in which a user writes that many women of the ingroup are raped by people of the outgroup, and thus that diversity is not beneficial to society. Although violence plays a role in this sample, it is secondary to what is really expressed: loyalty to the ingroup and/or hatred towards the outgroup. As it is easier to pick up words about violence, these samples are likely to be classified as harm, instead of as loyalty, which is more latent. As race is a central theme in extremist contexts, it is no surprise that loyalty is a frequent moral foundation in the Stormfront data. Evidently, right-wing groups in

1 For privacy reasons, we describe the examples instead of reporting the samples themselves

general frequently use the moral foundation of loyalty, according to Hahn et al. (2019). Nevertheless, the absence of similar samples in the training data makes it much more difficult for the model to classify these samples correctly. This highlights the difficulty of training a moral foundations recognition model on a non-extremist data set and testing it on an extremist data set. It is also remarkable that harm is relatively often falsely predicted in the Stormfront dataset. The actual labels of these misclassifications are quite diverse (non-moral, care, fairness, cheating, loyalty, betrayal). This is not caused by over-representation of these classes in the training data, as they occur roughly equally over the data sets. If we look at examples of misclassified samples that express the MFs of harm and cheating, we see one sample that describes the situation of the city of Dresden just before it was bombed in World War 2. Although it contains many references to violence (e.g. guns, artillery, refugees, etc.), it is labelled as non-moral, because it contains no moral judgement. Unsurprisingly, the model classified it as harm. A similar example was the one described above, where a sample was classified as harm even though the ground truth was loyalty, because it contained references to violence. This implies that the classifier picks up morally related topics more than actual moral judgement. Especially interesting is the third example, a sample that speaks of the two-state solution in the Israeli-Palestinian conflict. The sample does not refer to any kind of violence, but is still classified as harm. This might be because the Israel-Palestine conflict is generally associated with violence, and thus terms related to the conflict are embedded relatively close to other terms associated with violence. Otherwise, the model found a pattern that is not directly intuitive to human beings. When we tried to improve the set-up for generalisation to the Stormfront data, we saw that the model one would expect to perform poorly, namely the one taking Lorem Ipsum as extra input, only resulted in a 3% to 5% drop in accuracy. One might have expected a much steeper drop in accuracy, given the fact that half of the input is random noise. Possibly the model weighted most of the Lorem Ipsum embedding as almost zero and focused on the sample embedding instead. Although the relevant and irrelevant context embeddings performed better than the random context embedding, they did not improve over the baseline model. Moreover, the relevant context embeddings (created by embedding the abstracts of the Wikipedia pages about hate groups and discrimination) do not outperform the irrelevant context embeddings (created by embedding the abstract of the Wikipedia page about giraffes). When testing a variety of other relevant and irrelevant context embeddings, whether the Wikipedia abstracts for typewriter or supremacism, or the combined abstracts that describe the topics of the data (black lives matter, all lives matter, #metoo, hurricane sandy), we found similarly unpredictable results. The context embedding of hate groups was the only model that increased the performance on the MFTC test data; all other context embeddings performed similarly or slightly less accurately. When we add information to improve the performance on the Stormfront data, the performance on the MFTC test data and the Reddit data does not decrease. This suggests that this model should generalise to any other type of extremist data as well.
Although adding information to the input did not lead to an increase in performance on the Stormfront data, the small decrease in performance shows the robustness of the model against noise. This was also reflected in the low variance of the performance during hyperparameter tuning. In study 1 it was apparent how often loyalty was misclassified in the Stormfront data, which was caused by the high prevalence of the MF loyalty in that data. Although the general accuracy of the model on the Stormfront data was not improved by adding information on discrimination, it would be intuitively possible that

the performance did improve for the samples labelled as loyalty. However, if we compare the performance of the model that takes the input sample twice with the model with background information on discrimination, we see no improvement in performance on the samples labelled as loyalty. Based on the analysis of our results, we identify two areas of difficulty for out-domain moral foundations classification on extremist data. Firstly, extremist data contains interpretations of moral foundations, specifically of the moral foundation of loyalty, that do not occur in the training data. This cannot be counteracted by adding contextual information to the model, as we attempted in our second study. Secondly, the model seems to pick up morally related language more than morality itself, as illustrated by the frequent misclassification of the moral foundation of harm.

6.2. Comparison between the BERT and Word2Vec models

A surprising finding of this study was that our Word2Vec model outperforms our BERT model on the classification of MFTC test data. As BERT achieved ‘state-of-the-art’ results on many standard tasks (Devlin et al., 2019), we expected the same for our task. Although the results were in favour of the BERT model on the other two test sets, this was only marginally so, and only in terms of accuracy. In this section we inspect the theoretical differences between the models, to see whether they can explain the difference in performance.

Hashtag            BERT  Word2Vec
#alllivesmatter    0.91  0.77
#blacklivesmatter  0.29  0.35
#bluelivesmatter   0.06  0.05
#blm               0.04  0.05
#nypd              0.02  0.02

Table 6.1.: Relative frequency of each hashtag among the wrongly classified samples for each model, i.e. the proportion of wrongly classified samples by each model in which each hashtag occurs.

In Section 2.2.2, we discuss various issues with Word2Vec that were solved by BERT. The first issue with pretrained Word2Vec models is that they have no way of handling Out of Vocabulary (OOV) words; BERT solves this with WordPiece embeddings. For example, #alllivesmatter is processed by BERT as ’all’, ’##li’, ’##ves’, ’##mat’, ’##ter’, whereas it is an unknown token to Word2Vec. If we look into the wrongly classified samples of the MFTC test data for both the Word2Vec and the BERT model, we can find many hashtags for which the BERT model would have had to use WordPiece embeddings, while the Word2Vec model would have discarded them as OOV words. To assess whether this influenced the performance, we examined whether there was a difference in the number of times certain hashtags occurred in the samples misclassified by Word2Vec and BERT (divided by the absolute number of misclassified samples). For the results, see Table 6.1. The hashtags that occurred often enough to be informative were: #alllivesmatter, #blacklivesmatter, #bluelivesmatter, #blm (abbreviation of black lives matter), and #nypd (abbreviation of New York Police Department). #alllivesmatter occurred in 91% of all the incorrectly classified tweets by the BERT model, while it occurred in 77% of the incorrectly classified tweets by the Word2Vec model. This is unexpected; however, this pattern does not occur with the other hashtags. #blacklivesmatter occurs in 29% of the incorrectly classified tweets by the BERT model, as opposed to 35% of the incorrectly classified tweets by the

Word2Vec model. For #bluelivesmatter it is 6% versus 5%, for #blm it is 4% versus 5%, and for #nypd it is 2% for both models. It is unexpected that #alllivesmatter occurs much more frequently among the incorrectly classified items by the BERT model, as this is the complete opposite of what we would expect given that BERT has access to WordPiece embeddings. This difference may be caused by some external factor. Apart from #alllivesmatter and #blacklivesmatter, the other hashtags all occur in relatively equal amounts in both sets of incorrectly classified samples. One reason that the WordPiece embeddings seem to be insignificant in these samples may be that the hashtags are not a vital part of the message of a tweet. Often, hashtags are placed at the end of the tweet, to put the message in context or to raise the tweet’s profile by using a popular hashtag. It is frequently the text that contains the moral message, not necessarily the hashtag. Thus, being able to decipher the hashtags might not make a big difference in the MFTC test data. Another explanation might be that the hashtags confuse BERT: as they are processed as different words, “all lives matter” might not be recognised as the name of a movement, but as a message itself. As the hashtag embedding is pooled with the rest of the tweet embeddings, it might bias the final embedding, leading to poorer recognition of the actual message. As these hashtags occur rarely in the Reddit and Stormfront data and it is hard to find common words in these datasets that are OOV, it is much harder to compare the incorrectly classified samples for those datasets. The OOV words in the Reddit and Stormfront datasets are probably more important to the moral message of the text, as they are not hashtags but actually form the body of the text. This could be an explanation as to why BERT outperforms Word2Vec on the Reddit and Stormfront data. Nevertheless, OOV words are relatively rare, which may mean that the effect is too small to notice with this means of analysis. Additionally, it is difficult to determine whether any difference we find this way is actually caused by the WordPiece embeddings, and not by something else.
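The relative frequencies in Table 6.1 can be computed with a few lines of code, as sketched below; misclassified_bert and misclassified_w2v are hypothetical lists of the raw texts each model got wrong.

```python
def hashtag_rates(misclassified, hashtags):
    """Share of misclassified samples that contain each hashtag."""
    return {tag: sum(tag in text.lower() for text in misclassified) / len(misclassified)
            for tag in hashtags}

HASHTAGS = ["#alllivesmatter", "#blacklivesmatter", "#bluelivesmatter", "#blm", "#nypd"]
# print(hashtag_rates(misclassified_bert, HASHTAGS))
# print(hashtag_rates(misclassified_w2v, HASHTAGS))
```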

Test set     Homonym  BERT  Word2Vec
MFTC (test)  right    17    10
             point    5     2
             left     2     0.5
             safe     2     0
Reddit       right    33    39
             point    16    26
             left     11    0
             safe     5     4
Stormfront   right    25    22
             point    14    12
             left     7     9
             safe     7     6

Table 6.2.: We calculated the frequency of each homonym among the falsely classified instances by each model by taking the raw number of occurrences of that homonym, dividing it by the total number of falsely classified instances by that model, and multiplying it by 1000.

As described in Section 2.2.2, another theoretical difference between Word2Vec and BERT is that BERT handles homonymy by creating separate embeddings for

each meaning of the word. In order to judge whether the two systems’ ability to handle homonymy affected the performance of our models, we examined four English homonyms that could reasonably be expected to occur in these types of data: right, point, safe, and left. Just like before, we compared the number of instances of these homonyms against the number of wrongly classified instances, for each test data set (see Table 6.2). This could clarify whether BERT also handles homonymy better than Word2Vec in practice. However, the numbers we find are conflicting: in some cases, homonyms are actually less frequent in the samples that were misclassified by the BERT model, but in other cases it is the reverse. There are two possible explanations for this. Firstly, homonymy might not occur frequently enough to make a difference; the examined words occurred between 0 and 40 times among the falsely classified samples per dataset. Secondly, the BERT or Word2Vec sense of the homonym might not influence the embedding of the entire tweet significantly. BERT’s ability to cover sentence relations more accurately is difficult to test in such an empirical way. Based on the differences between BERT and Word2Vec that we were able to test, it remains unclear why Word2Vec still performs better on the MFTC test data, while BERT generalises better to the Reddit and Stormfront data.

6.3. Suggestions for further work

If we compare our results on the MFTC test set to the results obtained on the same data by previous studies (Araque et al., 2019; Hoover et al., 2017), it is clear that our experiments are less successful in terms of accuracy. It should be noted that these studies trained and tested on each sub-corpus separately, while we combine these sub-corpora in one data set. This illustrates how moral foundation recognition models struggle with diversity, even in-domain: it is easier for a model to capture the aspects of each moral foundation related to a certain context than it is to generalise to the essence of each moral foundation. Possibly, more training data would be required to create a model that performs well on a wider range of topics. The troubled generalisation to other domains is central to this paper, as we encounter difficulty with diverse in-domain data and difficulty with diverse out-domain data. One possible improvement is of course to train highly specific models for each situation, but this is unfeasible from a practical perspective: we are aiming to train one model that could classify moral foundations in all types of extremist social media content. Training one model on data that covers all of these types of content would be a potential solution, but is unfeasible given the massive amount of data that would require annotation. A possible way forward is to extend Lin et al. (2018)’s entity linking approach. This would involve applying Named Entity Recognition (NER) to each sample, looking up the entities in the sample on Wikipedia, extracting the Wikipedia abstract of each entity, and creating an embedding of the concatenated abstracts. This embedding could then be used as the context embedding in the model (a sketch of this idea is given at the end of this section). There are two reasons why this could potentially work better than our own extended model. Firstly, in our own extended model, the context embedding was the same for every sample. Neural networks typically try to find patterns in the input that distinguish the different labels; when the same context embedding is fed in for every sample, the model is likely to discard it by setting its weights close to zero. Creating context embeddings specific to each sample should alleviate this. Secondly, some background information on the entities mentioned in a sample could probably help the model interpret the sample correctly. Annotators mentioned that they had

difficulty annotating some of the Stormfront data, as it often discussed people or events they were not familiar with. Once they looked up these people or events, they would understand what moral foundation was present in the sample. As computational models know nothing about the world except that which has been included as input, this kind of background information can be useful, not only when classifying extremist data, but also when classifying the training data. A problem that will remain despite the entity linking approach is the situation where the same issue has different interpretations in the non-extremist and the extremist context. One example we found in our data was about Hitler. In non-extremist data, Hitler is described in a negative way. Our extremist data, however, contained a sample in which he was discussed rather positively. Obviously, this kind of scenario was already hard to classify correctly beforehand, but by adding information about Hitler into the model, we would teach the model that he (and people like him) should be classified as one thing (e.g. harm), while this would not be the right classification in the extremist data.
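As a minimal sketch of the entity-linking idea, the snippet below uses spaCy for NER and the wikipedia package to fetch abstracts; both tool choices, and the three-sentence summary length, are our own assumptions rather than the method of Lin et al. (2018).

```python
import spacy
import wikipedia

nlp = spacy.load("en_core_web_sm")  # small English NER model

def context_text(sample: str) -> str:
    """Concatenate Wikipedia abstracts for the entities found in one sample."""
    abstracts = []
    for ent in nlp(sample).ents:
        try:
            abstracts.append(wikipedia.summary(ent.text, sentences=3))
        except wikipedia.exceptions.WikipediaException:
            continue  # ambiguous or missing page: skip this entity
    return " ".join(abstracts)

# The returned text could then be embedded in the same way as the Wikipedia
# abstracts in study 2, giving a context embedding that differs per sample.
```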

6.4. The subjective nature of moral foundations

What is difficult about moral foundations classification studies is that the ground truth is not so clear. As discussed in Section 3.1, low inter-annotator agreements are obtained when annotating moral foundations. We experienced this ourselves when annotating our own data (see Section 3.4). If annotation is so difficult for humans already, is the ground truth even reliable? Does the performance of the model really reflect what the models learned? These are not uncommon problems in latent semantic tasks, though. In their study on propaganda analysis, Da San Martino, Yu, et al. (2019) report an inter-annotator agreement of γ = 0.31 (Mathet et al., 2015). γ is a different metric from our own, which makes it impossible to compare it to our Kappa and PABAK scores, and Mathet et al. do not provide thresholds for the interpretation of their score. Nevertheless, for a score between 0 and 1 it is not high, showing that moral foundations are not the only latent semantic task with annotation difficulties. In order to compare the performance of our model to the performance of annotators, we calculated the inter-annotator agreement between the BERT model from experiment 1 and our labelled data. For our Stormfront data, this resulted in a Kappa (Fleiss, 1971) of 0.12, and a PABAK (Sim and Wright, 2005) of −0.15. The reason that the PABAK is below zero is that PABAK is calculated by subtracting the number of disagreed samples from the number of agreed samples and dividing by the total; as the accuracy of the model was below 50%, this results in a negative number. For the Reddit data, a Kappa of 0.17 and a PABAK of 0.05 were obtained. This is still significantly lower than our inter-annotator agreement, which had a Kappa of 0.25 and 0.29 for the Stormfront data and Reddit data respectively, and a PABAK of 0.40 and 0.51 for the Stormfront and Reddit data. All in all we can conclude that our model still performs significantly worse than a human annotator.

7. Conclusion

The aim of this thesis was to explore to what extent a moral foundations recognition model can generalise to other domains. We were mainly interested in generalisation from non-extremist to extremist data, but also had some findings on generalisation from Twitter data to forum data. We created two models: one that used BERT embeddings and one that used Word2Vec embeddings as input to a bidirectional LSTM. We trained each model on 70% of the MFTC. We then tested these models on 15% of the non-extremist MFTC (Twitter) data, the non-extremist Reddit (forum) data and the extremist Stormfront (forum) data, and compared the results. Our best generalising model, taking BERT embeddings as input, obtained 56% accuracy on the MFTC test set, 52% accuracy on the Reddit set, and 43% accuracy on the Stormfront set. Our attempts to improve generalisation to the Stormfront set by adding background information were not successful. The key findings of this work are the following. Firstly, we learn that moral foundation classification is complicated by any type of diversity, whether it is a diverse training set, a test set in a different format from the training set (tweets or forum messages), or a test set with a different social perspective (extremist, non-extremist) compared to the training set. However, these types of diversity have different impacts. We learned that cross-domain classification from Twitter data to forum data is less problematic than cross-domain classification from non-extremist data to extremist data. Unfortunately we have not yet managed to improve the cross-domain classification from non-extremist data to extremist data. Surprisingly, Word2Vec outperformed BERT on the in-domain test data. This result is reversed (but only marginally) on the out-domain test data. We find that adding background information to the model does not improve it. The model is, however, very robust: noise or changes in the number of nodes, activation function and other architectural features barely make a difference to its performance. We tested our models on one type of extremist data, namely samples of posts on the Stormfront forum. Nevertheless, there are many different types of extremist groups – left wing, right wing, religious, separatist, etc. – that communicate via various types of social media. The performance of our model on Stormfront forum data might not be representative of the communications of any other extremist group on any other type of social media. Our study could therefore be expanded by testing our models on a wider variety of extremist data, which would require access to the (social media) communications of an extremist group, and being able to annotate it. A further way of expanding this study would be to use more oversampling strategies.

A. Inter-annotator agreement per batch

MF           Measure  SF1    SF2    SF3    SF4    SF5
Non-moral    Kappa    0.46   0.22   0.23   0.17   0.38
             PABAK    0.46   0.12   0.05   0.18   0.35
Care         Kappa   −0.02   0.42   0.56   0.26  −0.01
             PABAK    0.89   0.90   0.94   0.82   0.92
Harm         Kappa    0.23   0.15   0.17   0.46   0.19
             PABAK    0.53   0.72   0.83   0.70   0.87
Fairness     Kappa   −0.01   0.03   0.23   0.12   0.23
             PABAK    0.95   0.56   0.77   0.78   0.73
Cheating     Kappa    0.37   0.16   0.11   0.42   0.54
             PABAK    0.69   0.84   0.75   0.90   0.89
Authority    Kappa    0.26   0.14   0.00  −0.06  −0.01
             PABAK    0.89   0.82   0.96   0.72   0.77
Subversion   Kappa   −0.03  −0.04   0.41  −0.03  −0.06
             PABAK    0.87   0.80   0.89   0.76   0.77
Loyalty      Kappa    0.00   0.24   0.23   0.37   0.35
             PABAK    0.17   0.58   0.62   0.52   0.54
Betrayal     Kappa    0.27  −0.04  −0.03  −0.02   0.20
             PABAK    0.89   0.84   0.81   0.82   0.87
Sanctity     Kappa    0.66   0.00   0.00  −0.01   0.39
             PABAK    0.98   0.94   0.96   0.94   0.94
Degradation  Kappa    0.15  −0.02  −0.02   0.00  −0.02
             PABAK    0.82   0.90   0.79   0.98   0.87

MF           Measure  R1     R2     R3     R4
Non-moral    Kappa    0.51   0.40   0.51   0.37
             PABAK    0.54   0.40   0.50   0.34
Care         Kappa    0.17   0.05   0.26   0.10
             PABAK    0.82   0.69   0.90   0.78
Harm         Kappa    0.29   0.33  −0.01   0.30
             PABAK    0.72   0.47   0.64   0.78
Fairness     Kappa    0.05   0.29  −0.02  −0.05
             PABAK    0.72   0.82   0.82   0.78
Cheating     Kappa    0.28   0.20   0.26  −0.01
             PABAK    0.76   0.76   0.82   0.96
Authority    Kappa    0.00  −0.03   0.36  −0.03
             PABAK    0.98   0.89   0.82   0.88
Subversion   Kappa    0.66   1.00   0.30  −0.01
             PABAK    0.98   1.00   0.84   0.94
Loyalty      Kappa   −0.02   0.26   0.33   0.16
             PABAK    0.92   0.80   0.80   0.84
Betrayal     Kappa   −0.01  −0.02   0.26   0.02
             PABAK    0.94   0.82   0.90   0.86
Sanctity     Kappa    n.a.   0.49   0.00   0.00
             PABAK    n.a.   0.95   0.98   0.88
Degradation  Kappa   −0.01  −0.03  −0.01   0.00
             PABAK    0.94   0.84   0.94   0.86

Table A.1.: Inter-annotator agreement in Fleiss Kappa and Prevalence and Bias Adjusted Kappa (PABAK) for each moral foundation in each batch. N.a. is shown when there were no items classified as this specific moral foundation. MF stands for Moral Foundation; SF1 to SF5 stand for the batches of the Stormfront data; R1 to R4 stand for the batches of the Reddit data.

B. Precision and recall study 1

Data         BERT  Word2Vec
MFTC (test)  0.57  0.62
Reddit       0.47  0.44
Stormfront   0.39  0.38

Table B.1.: Weighted average precision for each test corpus on the different models. Precision is averaged over 5 runs to compensate for random initialisation.

Data         BERT  Word2Vec
MFTC (test)  0.55  0.59
Reddit       0.52  0.51
Stormfront   0.43  0.42

Table B.2.: Weighted average recall for each test corpus on the different models. Recall is averaged over 5 runs to compensate for random initialisation.

C. Precision and recall study 2

Data  2x    LI    G     HG    D
MFTC  0.57  0.57  0.57  0.57  0.57
R     0.47  0.46  0.46  0.45  0.45
SF    0.43  0.40  0.41  0.41  0.42

Table C.1.: Weighted average precision for each test corpus (Moral Foundations Twitter Corpus; Reddit; Stormfront) on the multi-input model that takes the input sample twice (2x); the multi-input model with Lorem Ipsum as secondary input (LI); the multi-input model with the giraffe context embedding (G); the multi-input model with the hate group context embedding (HG); and the multi-input model with the discrimination context embedding (D). Precision is averaged over 5 runs to compensate for random initialisation.

Data  2x    LI    G     HG    D
MFTC  0.54  0.53  0.56  0.57  0.54
R     0.54  0.49  0.54  0.53  0.51
SF    0.45  0.41  0.45  0.44  0.43

Table C.2.: Weighted average recall for each test corpus (Moral Foundations Twitter Corpus; Reddit; Stormfront) on the multi-input model that takes the input sample twice (2x); the multi-input model with Lorem Ipsum as secondary input (LI); the multi-input model with the giraffe context embedding (G); the multi-input model with the hate group context embedding (HG); and the multi-input model with the discrimination context embedding (D). Recall is averaged over 5 runs to compensate for random initialisation.

D. Confusion matrices for study 1

NM - Non-Moral; C - Care; H - Harm; F - Fairness; Ch - Cheating; A - Authority; S - Subversion; L - Loyalty; B - Betrayal; Sa - Sanctity; D - Degradation

Labels \ Predictions   NM     C      H      F      Ch     A      S      L      B      Sa     D
Non-Moral              3537   164    367    190    266    101    110    163    72     125    25
Care                   159    588    146    82     29     25     6      120    1      32     2
Harm                   276    74     941    66     217    34     101    31     33     10     7
Fairness               135    38     30     766    137    50     28     39     10     23     4
Cheating               199    16     148    97     701    19     56     26     68     2      13
Authority              69     34     23     55     24     252    37     45     7      4      0
Subversion             103    7      82     36     112    23     221    29     39     2      6
Loyalty                156    124    47     86     22     77     29     545    16     17     1
Betrayal               116    9      87     20     120    2      55     27     62     1      11
Sanctity               88     32     16     40     10     6      9      15     1      138    5
Degradation            44     7      74     6      38     1      15     3      10     12     145
Total                  4882   1093   1961   1444   1676   590    667    1043   319    366    219

Table D.1.: Confusion Matrix for MFTC test data trained on the BERT model. Tests were run 5 times to compensate for random initialisation, confusion matrices for all runs were added. For clarity, correct predictions are presented in bold.

Labels \ Predictions   NM     C      H      F      Ch     A      S      L      B      Sa     D
Non-Moral              908    16     59     34     92     2      9      7      9      4      0
Care                   39     12     10     1      13     3      0      1      0      1      0
Harm                   130    1      42     1      41     0      6      1      7      0      0
Fairness               56     1      13     15     19     1      1      2      2      0      0
Cheating               86     3      5      3      26     0      1      0      1      0      0
Authority              29     0      7      2      5      0      3      1      2      0      0
Subversion             33     0      11     0      8      0      6      1      1      0      0
Loyalty                63     2      3      7      4      1      1      1      3      0      0
Betrayal               55     3      2      0      5      0      0      0      5      0      0
Sanctity               10     0      0      0      0      0      0      0      0      0      0
Degradation            0      0      0      0      0      0      0      0      0      0      0
Total                  1409   38     152    63     213    7      27     14     30     5      0

Table D.2.: Confusion Matrix for Reddit data trained on the BERT model. Tests were run 5 times to compensate for random initialisation, confusion matrices for all runs were added up. For clarity, correct predictions are presented in bold.

Labels \ Predictions   NM     C      H      F      Ch     A      S      L      B      Sa     D
Non-moral              849    13     89     34     104    5      11     17     10     8      5
Care                   27     11     12     7      4      0      1      3      4      0      1
Harm                   67     4      101    8      19     0      1      8      7      0      0
Fairness               67     1      23     2      30     0      1      1      5      0      0
Cheating               78     1      30     3      58     0      5      0      0      0      0
Authority              52     1      5      7      5      1      1      2      1      5      0
Subversion             64     1      14     2      12     3      10     1      3      0      0
Loyalty                189    9      55     38     45     6      11     16     13     3      0
Betrayal               57     3      22     4      20     1      1      1      10     0      1
Sanctity               22     0      0      4      1      0      0      1      0      1      1
Degradation            0      0      0      0      0      0      0      0      0      0      0
Total                  1472   44     351    109    298    16     42     50     53     17     8

Table D.3.: Confusion Matrix for Stormfront data trained on the BERT model. Tests were run 5 times to compensate for random initialisation, confusion matrices for all runs were added up. For clarity, correct predictions are presented in bold.

Labels \ Predictions   NM     C      H      F      Ch     A      S      L      B      Sa     D
Non-moral              3559   139    469    179    186    180    172    145    130    150    61
Care                   155    638    118    26     2      48     10     130    13     37     3
Harm                   210    35     1184   32     79     47     61     37     65     2      28
Fairness               60     30     46     832    45     20     17     42     10     3      0
Cheating               208    11     197    99     570    33     70     13     114    5      10
Authority              52     23     8      17     3      390    30     61     10     6      0
Subversion             86     4      80     14     35     78     335    9      43     5      1
Loyalty                124    117    48     45     6      118    28     566    29     9      0
Betrayal               129    18     134    16     25     33     54     23     100    4      4
Sanctity               88     17     10     8      4      15     0      5      0      133    5
Degradation            39     1      40     8      4      1      31     0      4      4      173
Total                  4710   1033   2334   1276   959    963    808    1031   518    358    285

Table D.4.: Confusion Matrix for MFTC test data trained on the Word2Vec model. Tests were run 5 times to compensate for random initialisation, confusion matrices for all runs were added up. For clarity, correct predictions are presented in bold.

Labels \ Predictions   NM     C      H      F      Ch     A      S      L      B      Sa     D
Non-moral              921    4      59     22     26     23     19     10     47     9      0
Care                   48     1      15     4      3      7      0      2      0      0      0
Harm                   142    0      56     9      10     2      5      0      6      0      0
Fairness               61     0      28     6      4      5      1      1      4      0      0
Cheating               80     6      14     2      10     2      4      0      7      0      0
Authority              24     0      10     0      2      4      4      0      5      1      0
Subversion             35     2      5      1      6      3      0      5      3      0      0
Loyalty                54     4      3      1      6      5      2      5      4      0      0
Betrayal               42     1      15     1      1      2      2      0      6      0      0
Sanctity               10     0      0      0      0      0      0      0      0      0      0
Degradation            0      0      0      0      0      0      0      0      0      0      0
Total                  1417   18     205    46     68     53     37     23     82     10     0

Table D.5.: Confusion Matrix for Reddit test data trained on the Word2Vec model. Tests were run 5 times to compensate for random initialisation, confusion matrices for all runs were added up. For clarity, correct predictions are presented in bold.

Labels \ Predictions   NM     C      H      F      Ch     A      S      L      B      Sa     D
Non-moral              849    12     85     26     33     23     30     28     51     8      0
Care                   43     1      13     3      2      1      0      0      7      0      0
Harm                   77     1      96     3      6      4      2      4      19     0      3
Fairness               67     0      26     6      16     6      7      0      2      0      0
Cheating               86     0      30     9      25     6      5      1      13     0      0
Authority              60     0      6      1      1      4      0      0      5      3      0
Subversion             50     0      8      1      8      11     12     0      20     0      0
Loyalty                188    4      69     19     31     12     10     28     24     0      0
Betrayal               53     0      28     6      4      3      8      8      10     0      0
Sanctity               21     1      3      2      0      0      0      0      0      3      0
Degradation            0      0      0      0      0      0      0      0      0      0      0
Total                  1494   19     364    76     126    70     74     69     151    14     3

Table D.6.: Confusion Matrix for Stormfront data trained on the Word2Vec model. Tests were run 5 times to compensate for random initialisation, confusion matrices for all runs were added up. For clarity, correct predictions are presented in bold.

Bibliography

Araque, Oscar, Lorenzo Gatti, and Kyriaki Kalimeri (2019). “MoralStrength: Exploiting a moral lexicon and embedding similarity for moral foundations prediction”. Knowledge-based systems, p. 105184.
Boghrati, Reihane, Justin Garten, Aleksandra Litvinova, and Morteza Dehghani (2015). “Incorporating Background Knowledge into Text Classification.” In: CogSci.
Byrt, Ted, Janet Bishop, and John B Carlin (1993). “Bias, prevalence and kappa”. Journal of clinical epidemiology 46.5, pp. 423–429.
Camacho-Collados, Jose and Mohammad Taher Pilehvar (2018). “From word to sense embeddings: A survey on vector representations of meaning”. Journal of Artificial Intelligence Research 63, pp. 743–788.
Caren, Neal, Kay Jowers, and Sarah Gaby (2012). “A social movement online community: Stormfront and the white nationalist movement”. Research in Social Movements, Conflicts and Change 33, pp. 163–193.
Da San Martino, Giovanni, Alberto Barrón-Cedeño, and Preslav Nakov (2019). “Findings of the NLP4IF-2019 Shared Task on Fine-Grained Propaganda Detection”. In: Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda. Hong Kong, China: Association for Computational Linguistics, Nov. 2019, pp. 162–170.
Da San Martino, Giovanni, Seunghak Yu, Alberto Barrón-Cedeño, Rostislav Petrov, and Preslav Nakov (2019). “Fine-Grained Analysis of Propaganda in News Articles”. arXiv preprint arXiv:1910.02517.
Damani, Om (2013). “Improving Pointwise Mutual Information (PMI) by Incorporating Significant Co-occurrence”. In: Proceedings of the Seventeenth Conference on Computational Natural Language Learning. Sofia, Bulgaria: Association for Computational Linguistics, pp. 20–28.
Dehghani, Morteza, Kate Johnson, Joe Hoover, Eyal Sagi, Justin Garten, Niki Jitendra Parmar, Stephen Vaisey, Rumen Iliev, and Jesse Graham (2016). “Purity homophily in social networks.” Journal of Experimental Psychology: General 145.3, p. 366.
Dehghani, Morteza, Kenji Sagae, Sonya Sachdeva, and Jonathan Gratch (2014). “Analyzing political rhetoric in conservative and liberal weblogs related to the construction of the “Ground Zero Mosque””. Journal of Information Technology & Politics 11.1, pp. 1–14.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2019). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics, June 2019, pp. 4171–4186.
Fleiss, Joseph L (1971). “Measuring nominal scale agreement among many raters.” Psychological bulletin 76.5, p. 378.
Garten, Justin, Reihane Boghrati, Joe Hoover, Kate M. Johnson, and Morteza Dehghani (2016). “Morality Between the Lines: Detecting Moral Sentiment In Text”. In: Proceedings of IJCAI 2016 workshop on Computational Modeling of Attitudes.

Garten, Justin, Brendan Kennedy, Joe Hoover, Kenji Sagae, and Morteza Dehghani (2019). “Incorporating demographic embeddings into language understanding”. Cognitive science 43.1, e12701.
Graham, Jesse, Jonathan Haidt, Sena Koleva, Matt Motyl, Ravi Iyer, Sean P Wojcik, and Peter H Ditto (2013). “Moral foundations theory: The pragmatic validity of moral pluralism”. In: Advances in experimental social psychology. Vol. 47. Elsevier, pp. 55–130.
Graham, Jesse, Jonathan Haidt, and Brian A Nosek (2009). “Liberals and conservatives rely on different sets of moral foundations.” Journal of personality and social psychology 96.5, p. 1029.
Géron, Aurélien (2017). Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. 1st. O’Reilly Media, Inc. isbn: 1491962291.
Hahn, Lindsay, Ron Tamborini, Eric Novotny, Clare Grall, and Brian Klebig (2019). “Applying Moral Foundations Theory to Identify Terrorist Group Motivations”. Political Psychology.
Haidt, Jonathan (2012). The righteous mind: Why good people are divided by politics and religion. Vintage.
Haidt, Jonathan and Jesse Graham (2009). “Planet of the Durkheimians, where community, authority, and sacredness are foundations of morality”. Social and psychological bases of ideology and system justification, pp. 371–401.
Hoover, Joe, Gwenyth Portillo-Wightman, Leigh Yeh, Shreya Havaldar, Aida Mostafazadeh Davani, Ying Lin, Brendan Kennedy, Mohammad Atari, Zahra Kamel, Madelyn Mendlen, et al. (2019). “Moral Foundations Twitter Corpus: A collection of 35k tweets annotated for moral sentiment”. Social Psychological and Personality Science.
Hoover, Joseph, Kate Johnson-Grey, Morteza Dehghani, and Jesse Graham (2017). “Moral values coding guide”.
Johnson, Kristen and Dan Goldwasser (2018). “Classification of moral foundations in microblog political discourse”. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 720–730.
Johnson, Kristen, Di Jin, and Dan Goldwasser (2017). “Leveraging behavioral and social information for weakly supervised collective classification of political discourse on Twitter”. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 741–752.
Kaur, Rishemjit and Kazutoshi Sasahara (2016). “Quantifying moral foundations from various topics on Twitter conversations”. In: 2016 IEEE International Conference on Big Data (Big Data). IEEE, pp. 2505–2512.
Landis, J Richard and Gary G Koch (1977). “The measurement of observer agreement for categorical data”. Biometrics, pp. 159–174.
Lin, Ying, Joe Hoover, Gwenyth Portillo-Wightman, Christina Park, Morteza Dehghani, and Heng Ji (2018). “Acquiring background knowledge to improve moral value prediction”. In: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, pp. 552–559.
Mathet, Yann, Antoine Widlöcher, and Jean-Philippe Métivier (2015). “The Unified and Holistic Method Gamma (γ) for Inter-Annotator Agreement Measure and Alignment”. Computational Linguistics 41.3 (Sept. 2015), pp. 437–479.
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean (2013). “Efficient Estimation of Word Representations in Vector Space”. In: 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings. Ed. by Yoshua Bengio and Yann LeCun.
Miller, George A (1995). “WordNet: a lexical database for English”. Communications of the ACM 38.11, pp. 39–41.

Mooijman, Marlon, Joseph Hoover, Ying Lin, Heng Ji, and Morteza Dehghani (2017). “When protests turn violent: The roles of moralization and moral convergence”.
Pennebaker, James W, Martha E Francis, and Roger J Booth (2001). “Linguistic inquiry and word count: LIWC 2001”. Mahway: Lawrence Erlbaum Associates 71.2001, p. 2001.
Powell, Tom, Anne Merel Sternheim, Ioannis Tolios, Anne Fleur van Luenen, Tamar Schaap, and Freek Bomhof (2020). “TNO Communications Analysis Toolkit: A suite of tools for automated analysis of media and communications to improve real-time situational understanding”. NATO Operations Research and Analysis Conference.
Rezapour, Rezvaneh, Saumil H Shah, and Jana Diesner (2019). “Enhancing the measurement of social effects by capturing morality”. In: Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 35–45.
Sagi, Eyal and Morteza Dehghani (2014a). “Measuring moral rhetoric in text”. Social science computer review 32.2, pp. 132–144.
Sagi, Eyal and Morteza Dehghani (2014b). “Moral rhetoric in Twitter: A case study of the US Federal Shutdown of 2013”. In: Proceedings of the Annual Meeting of the Cognitive Science Society. Vol. 36. 36.
Sagi, Eyal, Timothy M Gann, and Teenie Matlock (2015). “The Moral Rhetoric of Climate Change.” In: CogSci.
Saha, Budhaditya, Sanal Lisboa, and Shameek Ghosh (2020). “Understanding patient complaint characteristics using contextual clinical BERT embeddings”. arXiv preprint arXiv:2002.05902.
Sim, Julius and Chris C Wright (2005). “The kappa statistic in reliability studies: use, interpretation, and sample size requirements”. Physical therapy 85.3, pp. 257–268.
TED-ed (2012). The moral roots of liberals and conservatives - Jonathan Haidt. Youtube. url: https://www.youtube.com/watch?v=8SOQduoLgRw&t=15s.
Teernstra, Livia, Peter van der Putten, Liesbeth Noordegraaf-Eelens, and Fons Verbeek (2016). “The morality machine: tracking moral values in Tweets”. In: International Symposium on Intelligent Data Analysis. Springer, pp. 26–37.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin (2017). “Attention is all you need”. In: Advances in neural information processing systems, pp. 5998–6008.
Wang, Alex, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman (2018). “GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding”. In: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Brussels, Belgium: Association for Computational Linguistics, Nov. 2018, pp. 353–355.
Weber, René, J Michael Mangus, Richard Huskey, Frederic R Hopp, Ori Amir, Reid Swanson, Andrew Gordon, Peter Khooshabeh, Lindsay Hahn, and Ron Tamborini (2018). “Extracting latent moral information from text narratives: Relevance, challenges, and solutions”. Communication Methods and Measures 12.2-3, pp. 119–139.
Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean (2016). “Google’s neural machine translation system: Bridging the gap between human and machine translation”. arXiv preprint arXiv:1609.08144.
Zhu, Yukun, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler (2015). “Aligning books and movies: Towards story-like visual explanations by watching movies and reading books”. In: Proceedings of the IEEE international conference on computer vision, pp. 19–27.
