A Bootstrapped Approach for Abusive Intent Detection in Social Media Content

by

Benjemin Simons

A thesis submitted to the School of Computing in conformity with the requirements for the degree of Master of Science

Queen’s University Kingston, Ontario, Canada September 2020

Copyright © Benjemin Simons, 2020

Abstract

The proliferation of Internet connected devices continues to result in the creation of massive collections of human generated content from websites such as social media. Unfortunately, some of these sites are used by criminal or terrorist organizations for recruitment or to spread rhetoric. By analyzing this content, it is possible to gain insights into the future actions of the writers. This information can support organizations in taking proactive measures to modify or stop said actions from taking place. The textual feature of interest is the expression of abusive intent, which can be thought of as a plan to carry out a malicious action. The proposed approach independently detects abuse and intent in documents, then computes a joint prediction for the document. Abusive language detection is a well-studied problem, which enabled a model to be trained using supervised learning. The intent detection model requires a semi-supervised technique since no labelled datasets exist. To do this, an initial collection of labels was generated using a linguistic model. These labels were then used to co-train a statistical and deep learning model. Using crowd-sourced labels, the abuse and intent models were found to have accuracies of 95% and 80%, respectively. The joint predictions were then used to prioritize documents for manual assessment.

Acknowledgments

I would like to thank my supervisor, Professor David Skillicorn, for all his guidance and patience over the past year and a half. His advice was always available and invaluable to the completion of this research. Thank you to all the professors and students in the School of Computing who made my time memorable and helped me work through problems and bugs. Finally, I would like to thank my family and friends for their continuous support. Their encouragement and company enabled me to finish school successfully and enjoyably.

Contents

Abstract
Acknowledgments
Contents
List of Tables
List of Figures

Chapter 1: Introduction

Chapter 2: Related Work
    2.1 Linguistics
    2.2 Social science
    2.3 Computing
        2.3.1 Intent detection
        2.3.2 Abusive language detection
    2.4 Tools and techniques
        2.4.1 Data handling and manipulation
        2.4.2 Word embeddings
        2.4.3 Deep learning
        2.4.4 Interpretation and visualization

Chapter 3: Methodology
    3.1 Data preparation
        3.1.1 Source selection and data collection
        3.1.2 Data cleaning
        3.1.3 Document partitioning
        3.1.4 Context-sequence matrix
        3.1.5 Word embeddings
    3.2 Label generation
        3.2.1 Linguistic template
        3.2.2 Rough label generation
        3.2.3 Label refinement
    3.3 Extrapolation
        3.3.1 Rate limiter
        3.3.2 Sequence learner
        3.3.3 Deep learner
        3.3.4 Consensus
    3.4 Abusive language
        3.4.1 Training dataset
        3.4.2 Network architecture
        3.4.3 Model training
    3.5 Abusive intent
        3.5.1 Prediction generation
        3.5.2 Distribution normalization
        3.5.3 Vector norm
        3.5.4 Product
    3.6 Validation
        3.6.1 Data generation
        3.6.2 Label collection
        3.6.3 Label validation
        3.6.4 Prediction validation
    3.7 Document aggregation
        3.7.1 Average
        3.7.2 Maximum
        3.7.3 Windowed maximum

Chapter 4: Results
    4.1 Data preparation
        4.1.1 Data cleaning
        4.1.2 Document partitioning
        4.1.3 Context-sequence matrix
        4.1.4 Word embeddings
    4.2 Label generation
        4.2.1 Rough label generation
        4.2.2 Label refinement
    4.3 Extrapolation
        4.3.1 Sequence learner
        4.3.2 Deep learner
        4.3.3 Consensus
    4.4 Abusive language
        4.4.1 Model training
        4.4.2 Abuse predictions
    4.5 Abusive intent
        4.5.1 Distribution normalization
        4.5.2 Vector norm
        4.5.3 Product
        4.5.4 Predictions
    4.6 Validation
        4.6.1 Volunteers
        4.6.2 Collected labels
        4.6.3 Prediction validation
    4.7 Document aggregation
        4.7.1 Averaged
        4.7.2 Maximum
        4.7.3 Windowed

Chapter 5: Discussion
    5.1 Limitations
        5.1.1 Imperfect initial labels
        5.1.2 Lack of support to detect implicit intent
        5.1.3 Adversarial input
        5.1.4 Accepting statements as truth
        5.1.5 Unrepresentative validation data
        5.1.6 Reliance and non-uniform definitions of abuse

Bibliography

Appendix A: Datasets
    A.1 Storm-Front (intent)
    A.2 Wikipedia (intent)
    A.3 Iron March
    A.4 Manifesto
    A.5 Hate speech ensemble

Appendix B: Computational resources

Appendix C: Data labelling interface
    C.1 Architecture
    C.2 Ethical clearance
    C.3 Labelling instructions

List of Tables

3.1 Table of dependencies defining short and long form intent templates
3.2 Intent model architecture
3.3 Abusive language dataset composition
3.4 Abusive language model architecture
3.5 Qualifying contexts

4.1 Dataset character lengths before and after processing
4.2 Words closest to “liberal” in custom and default fastText embeddings
4.3 15 sequences with the highest positive rates after 1 vs. 20 epochs
4.4 Examples of contexts with high abusive intent when normalizing distribution
4.5 Examples of contexts with high abusive intent calculated using the infinite norm
4.6 Examples of contexts with high abuse or intent identified using the infinite norm
4.7 Examples of contexts with high abusive intent when using the one norm
4.8 Examples of contexts with high abusive intent when using the two norm
4.9 Examples of contexts with high abusive intent when using the product calculation
4.10 Examples of Storm-Front contexts with low abusive intent when using the product calculation
4.11 Examples of contexts with high abusive intent from abusive training dataset
4.12 Examples of contexts with high abusive intent from Manifesto
4.13 Examples of contexts with high abusive intent from Iron March
4.14 False negative examples
4.15 False positive examples
4.16 Aggregated document with high abusive intent using the windowed method
4.17 Aggregated document with high abusive intent using the windowed method
4.18 Aggregated document with high abusive intent using the windowed method
4.19 Aggregated document with high abusive intent using the windowed method

B.1 Machine descriptions

List of Figures

2.1 Visualization of CBOW and Skip-gram architectures
2.2 Visualization of an RNN neuron
2.3 Visualization of an LSTM block

3.1 Visualization of the pre-processing and computing performed before training
3.2 Architecture of the real-time embedding mechanism
3.3 Visualization of the linguistics model pipeline
3.4 Template for short and long form of intent
3.5 Visualization of tree refinement
3.6 2D visualization of spatial refinement techniques
3.7 Visualization of double-bootstrap learning phase
3.8 Abusive-intent network model
3.9 Visualization of the normalized one norm
3.10 Visualization of the normalized two norm
3.11 Visualization of the normalized infinity norm
3.12 Visualization of the normalized product transform
3.13 Sample weight based on effective label

viii 4.1 Cumulative distribution of unique tokens within processed Storm-Front dataset ...... 69 4.2 Histogram of Storm-Front document lengths before and after being pre-processed ...... 70 4.3 Histogram of the number of contexts per document in the intent corpus 71 4.4 Cumulative distribution of sequences in the intent corpus ...... 73 4.5 Example of a dependency graph for a context ...... 76 4.6 Class distribution of the rough labels ...... 76 4.7 Minimum spanning sub-tree for desire verbs ...... 78 4.8 Histogram of the hyper-cube widths for desire verbs ...... 79 4.9 Number of non-unknown contexts available for training in each epoch 81 4.10 Positive and negative sequence rates throughout training ...... 82 4.11 Accuracy and loss after each epoch ...... 84 4.12 Visualization of the consensus labels after each training epoch . . . . 85 4.13 Validation accuracy and loss after each epoch ...... 86 4.14 Confusion matrix of abuse predictions on validation data ...... 87 4.15 Histogram of abuse predictions on two corpora ...... 88 4.16 Example SHAP values for Storm-Front contexts ...... 88 4.17 Cumulative distribution of abuse and intent predictions ...... 90 4.18 Joint histogram of abuse and intent predictions ...... 91 4.19 Estimated cumulative distribution of abuse and intent ...... 92 4.20 Histogram of abusive intent predictions computed by normalizing the distribution ...... 93 4.21 Abusive intent histograms computed using different norms ...... 100

ix 4.22 Example context with SHAP values for abuse and intent models . . . 102 4.23 Example context with SHAP values for abuse and intent models . . . 103 4.24 Histogram of the number of labels submitted by each volunteer . . . . 107 4.25 Histogram of effective labels ...... 108 4.26 Confusion matrix for intent model using validation labels ...... 109 4.27 Document level Abusive intent histograms computed using different aggregation methods ...... 113

C.1 Overview of the web application used to collect data labelling
C.2 PM2 web application status readout


Chapter 1

Introduction

The digitization of information now enables humans to have access to more information than at any other time in recorded history. However, much of the available data is in the form of unstructured text from sources such as social media. These collections of data contain insights into the past, present, and future thoughts and actions of the writer. Although past and present information about a person is valuable, future actions are those of interest to most organizations. Leveraging the available data, insights into someone’s future actions can be gained by analyzing documents (or posts) they author for expressions of intent. So, if a person were to share that they intend to do something, then an organization could use this information to attempt to modify or stop the action before it occurs. Within this thesis, intent will be defined as “the state of mind of one who aims to bring about a particular consequence” [23]. This technology can be used for applications such as online marketing, customer service, or defense. Many people and organizations now use the Internet as a way to communicate, organize, and conduct business. Unfortunately, the same adoption of technology has

occurred with criminal organizations and terrorist groups. A key mission of terrorist organizations is the ongoing recruitment of followers and spread of propaganda [19, 27]. The forum for this rhetoric can either be public social media (for example, Twitter) or purpose-specific social media (for example, Storm-Front, a white-supremacist social media site). Such expressions also provide a window into a group’s thoughts, thereby giving value to this text. If analyzed effectively, this text could allow law enforcement to proactively respond to problematic accounts and reach out with support to individuals being targeted. With the continuing rise in activity of terrorist organizations, especially domestic terrorism, there is a need for a proactive response or strategy [37]. This need, in conjunction with the computational resources, techniques, and data currently available, means the problem is now one that can be approached in a meaningful way. Within the context of supporting law enforcement, this translates to helping rank or prioritize documents to be manually assessed. To do this, a model is required that can accurately detect expressions of abusive intent. The detection of abusive intent can be thought of as having two component parts: the detection of abuse and the detection of intent. The former is a well-studied problem with established techniques and existing datasets for training and evaluating models. Using these techniques and resources, the resulting model had a validation accuracy of 95%. However, the latter is an unsolved problem within computing, with previous works focusing largely on intent classification and intent detection on topic-specific corpora. There is, though, a corpus of work in social science focused on threat detection and assessment. To bridge the gap, the objective of this work is to be able to accurately identify strong and explicit expressions of abusive intent in social media

text. To accomplish this, a method was devised for predicting the presence of intent without the use of manually labelled data. To develop a model to detect intent in social media content, cues can be taken from the existing work on abusive language detection. However, a large amount of this content has irregularities in writing style and vocabulary, and contains spelling mistakes. As a result, there is a need to develop a method that is tolerant to noise while accurately predicting the presence of intentful content. Unfortunately, due to the immaturity of the field, there are no labelled datasets publicly available. This prevents the problem from being tackled using supervised learning techniques, that is, inferring the definition of intent by observing documents said to contain it (i.e. labelled as containing it). To address this problem, a method is required that produces a robust intent detection model based on an explicit linguistic definition of intent. While a model explicitly defining intent is required, previous work in abusive language detection has shown that rule-based approaches encounter difficulty with messy data. To address this, a semi-supervised training technique was adapted for intent as it relates to law enforcement. The approach introduces an explicit model that generates initial labels, which are then used to co-train two models. These are a deep learning model and a statistical analysis technique that identifies relevant word sequences. The training process results in a deep learning model that is able to identify instances of intent with an accuracy of 80%. Since the intent model is developed without labels, its performance was quantified using a set of crowd-sourced labels. This model, alongside an abusive language model, can be used to rank documents by the amount of abusive intent they contain. The ranking could then support organizations in the timely identification of problematic documents and users requiring

manual assessment. The thesis is broken into five chapters. Chapter 2 will cover the previous work in linguistics, social science, and computing related to abusive intent detection. Chapter 3 will cover the new and adapted techniques used in this work. Chapter 4 will cover the results of applying these techniques. Finally, Chapter 5 will summarize the work and its limitations.

Chapter 2

Related Work

2.1 Linguistics

To cover the detection of abusive intent in text, it is necessary to consider the work in linguistics that defines it. The first work that should be discussed is that of Grice, specifically Logic and Conversation [24]. In this, Grice explains that someone’s utterance is not necessarily equivalent to their implicatum. By this, he means that what someone says is not necessarily the same as what they mean. Due to the complexity of this relationship in natural language, it is not something that can be modeled explicitly. Instead, Grice introduces maxims that help structure the relationship in typical (i.e. cooperative) conversation. He also introduces ways in which these maxims can be infringed on or when they do not have to be satisfied. Within intent detection, this framework expresses the complexity of natural language and stresses the need for advanced tools to assess it. The differentiation of utterance and implicatum was also studied by Austin when defining speech acts, which he broadly describes as interactions between people [4].

Most significantly in the context of this work, he splits an interaction into three components: the locution, illocution, and perlocution. Austin’s definitions of locution and illocution correspond to Grice’s use of utterance and intention, respectively (i.e. what is said by the speaker and what is meant to be understood by the listener). However, Austin introduces a third component, the perlocution, which is what happens as a result of the implication. For example, the speaker expresses the locution (exact phrasing) to the listener, who interprets the illocution (meaning), which (potentially) causes the listener to perform the perlocution (desired outcome). Within linguistics, much of the work on intent detection focuses on analysis of the locution. A more specific definition for intent can be derived from work by Leech, where he describes and compares the use of the going-to and will methods of expressing future intentions [41]. He starts by defining such verb usages as references to event(s) that have not happened. As such, the expression should be thought of as a comment on the author’s prediction and/or intention for the future (in the context of the statement). For example, the locution “I will go to the store tomorrow” explains the speaker’s intent to follow through with the perlocution (of going to the store). The statement is also altered by the pronoun (i.e. first person vs. third) and the external relationship between the speaker and listener (for example, parent to child vs. child to parent). Leech continues, explaining that the expression of a future intent using the will or going-to forms is semantically very similar. He also describes their use as a statement on the “future as outcome of present circumstances” [39]. Less formally, he suggests they be thought of as a “future outcome of present intention” or “future outcome of present cause” [39]. For abusive intent detection, this can be thought of as the writer’s plan for the future as a result of their current (mental, physical, etc.)

state. Expanding on the work of Leech, Mair introduces a more current picture of how the forms of intent are used in written text [41]. He found that the going-to (and similar) expressions of future intent are significantly more common and important (than will) in modern English. From the corpus assessed, Mair observed that the increase in the use of going-to is largely a result of it being the informal method of expressing future intent. Specifically, with changes in the use of written language, people are increasingly using the informal expression of future statements. As a result, when considering written intent, the going-to form should be the primary focus, with will having secondary importance. There is also work using linguistics to better interpret the illocution from the locution (i.e. understanding the meaning or intention from what is said). Most significant is the work of Pennebaker and Chung, which assesses the relationship between the locution and illocution by observing whether the perlocution occurs [13]. They consider the rhetoric of four extremist groups and how this translated to their actions taken over a period. It was found that function words, or filler words (for example, the, a, he, she, if, then, etc.) play a significant role in expressing the illocution of a statement (i.e. the group’s actual intent) [48]. This was reflected in changes in function word usage during the lead-up to violent action by the groups in question. A similar result was found by Walker, who noted that there is a shift in communication patterns and verb usage in the lead-up period [65]. These results show that despite the complexity of natural language, locutions can be assessed to attempt to understand the illocution.

2.2 Social science

In addition to expressions of intent being defined within formal language, it has also been studied within social science. However, unlike linguistic approaches, much of this work is interested in how the locution and illocution relate to the perlocution (i.e. how what is said or meant translates to future action). Most directly relevant is frame analysis, which studies the effect of communications on their audience [51]. Specifically, Sanfilippo has published several papers investigating the relationship between frames and violent action [52]. This work helps structure the real-world relationship between the utterance (locution), meaning (illocution), and resulting action (perlocution). Sanfilippo et al. also equate frames to speech acts, where a promoter speaks the locution to convey an illocution, which contains an issue and specifies a target for the perlocution [51]. By defining these relationships, Sanfilippo et al. are able to better identify the sources, effectiveness, and targets of malicious communications. Using this approach, they were able to model the communication method and issues targeted by the same groups as studied by Pennebaker and Chung (see Section 2.1) [13, 52]. By framing the communications, Sanfilippo et al. were able to better understand and characterize the promoters based on their communicated content and methods. Intergroup Threat Theory (ITT) is a method introduced by Stephan and Stephan for understanding and characterizing threats [60]. ITT does this by defining the presence of an in-group (source) and an out-group (target) in communications and threats. ITT also classifies threats as realistic, symbolic, intergroup anxiety, or negative stereotypes. These classes can be used to help formally differentiate threats and help frame analysis in explaining promoter and audience attitudes. It can then be

used with frame analysis to further define the issue (as expressed by the promoter) and help determine whether an illocution is genuine. By better understanding the illocution, more insight into the specifics or probability of the resulting action can be gained. This insight is leveraged by work such as that of Faragó et al., showing the relationship between intergroup attitudes and resulting violence [20]. Egnoto et al. furthered this by using ITT to predict the membership of the author as part of the in- or out-group by using the content of communications during an event [18]. There is also a body of research around threat assessment, which aims to determine whether an attack will occur [42]. Threat assessment is a well-developed field, which is primarily used by military, law enforcement, and security personnel to quantify and respond to risks. The methods generally focus on evaluating a specific threat made against a target and general threats/vulnerability analysis of a target [42]. This can be used alongside frame analysis and ITT to identify the source, target, and reason behind a potential event and estimate the probability of it occurring. Part of this literature has shown that the presence of abuse alongside intent can increase the chance of the source acting (discussed further in Section 2.3) [42]. There has also been work by Ramón and Hamm, and Asal and Vitek in assessing lone-wolf extremists [3, 58, 59]. It was also found that the capability of the source (to carry out what they say) is a key factor in the probability of action (i.e. if a group couldn’t do what they said, it didn’t happen) [42]. Schuurman and Eijkman developed a framework to guide the assessment of a source’s intent and capability [57].

2.3 Computing

From a computational perspective, there are two significant groups of work to consider: intent detection and abusive language detection. The former directly relates to this work in its attempt to identify expressions of intent in text. The latter supports abusive intent detection by providing methods for identifying abuse or malicious sentiments.

2.3.1 Intent detection

Over the past decade, intent detection in text has been approached from a computational perspective. However, most of this work is similar to that of Kim et al. in finding approaches to classify intent for applications such as chatbots [35]. Similarly, there has been work such as that by Dai et al. in commercial intent classification in search queries. The goal of this work is to determine if what someone is searching for implies that they are going to buy something [14]. More relevant is the work of Vedula et al. in developing a technique called open intent detection, which identifies expressions of query intent and its subject in forum text [64]. This aims to detect any post containing an actionable verb and subject (for example, “How can I keep my phone from just falling over when watching videos?” → keep phone from falling) [64]. There is also work such as that of Chen et al., Wang et al., Gupta et al., and Hollerit et al. on commercial intent detection and classification from social media text [11, 66, 25, 28]. Subramani et al. have similar work, performing word-based intent classification within a topic-specific social media forum [61]. However, all these papers used a wide definition of intent and relied heavily on bag-of-words based approaches and/or a topic-specific corpus. These approaches resulted in either

inaccurate predictions or accurate predictions for an ambiguous (and large) target. This can be partially attributed to an observation from Agarwal and Sureka, who found that keyword-based approaches don’t carry enough information to accurately identify intent [2]. Most notable is the work of Purohit et al., who developed a relatively accurate (approx. 80% accuracy) approach for intent detection from social media sources [49]. This was accomplished through the use of two complementary approaches: bag-of-words and declarative knowledge-guided (DK) patterns. Unlike previous bag-of-words approaches, theirs used word n-grams (sequences of words) to better capture the usage and context of words. They then used a small set of labelled data to extract the significant features from this set. DK patterns are a set of regex functions that check for the presence of specific word patterns (for example, (I).*(want).*(to).*(give)) [49]. However, the problem was also set up as a multi-class classification problem where the classes and patterns were chosen based on the source content. Similar to the other papers, the definition of intent used was also broad. There has also been work in automating some of the social science approaches (as discussed in Section 2.2) to perform intent analysis and estimate the probability of action for specific targets. Most prominent is the Violent Intent Modelling (VIM) framework by Sanfilippo et al. for automating and organizing information related to frame analysis [54, 53, 6]. This research is supported by previous work, such as the other work of Sanfilippo et al. in developing frame analysis (see Section 2.2) [51]. However, the main purpose of this work is to enable more thorough manual analysis of individuals and groups, such as those mentioned in Section 2.2 and The International Handbook for Threat Assessment [42].

2.3.2 Abusive language detection

The handbook noted a correlation between the presence of abusive language in threats and violent action [42]. Based on this finding, it is hypothesized that should documents contain both intentful and abusive language, their author has a higher chance of committing a violent act. Abusive language or hate speech also helps to identify malicious sentiment directed towards a person or group. Abusive language detection has gained popularity thanks in large part to the dramatic increase in the use of social media [16, 46]. Past work, such as that of LeBlanc, has shown that hate speech on social media can be predicted with high accuracy [38]. Specifically, LeBlanc showed abuse can be detected with high accuracy based on the presence of word or character sequences, using recurrent neural networks. Other papers have also shown high-accuracy results for insult and hate speech detection in social media content [16, 15, 67]. Gao et al. introduced a technique called double bootstrapping for predicting abusive language from messy unlabelled datasets [22]. The approach starts by using a dictionary of hate speech terms to generate initial labels for the data. The labels are then fed into two models, which then proceed to co-train, thereby learning from the labels, making new predictions, then passing the predictions to the other model. This enables accurate extrapolation on the initial labels and reinforces the learning process on a large set of text data [9].

2.4 Tools and techniques

The research also relies on previous work developing the tools and techniques for tasks such as deep learning or data visualization. The works can be split into data

handling and manipulation, word embeddings, deep learning, and data visualization.

2.4.1 Data handling and manipulation

To facilitate easy and efficient manipulation of data, the Numpy, Scipy, SKLearn, and Pandas open-source packages were used [47, 32]. The work used a statistical part-of-speech tagger to model the use of and relationships between words. Finally, the work also uses Empath, which is a statistically derived thesaurus tool [21]. It is constructed by taking a large amount of digital text and deriving relationships between words. These relationships were then reviewed and evaluated manually, giving the derived categories high semantic accuracy. This clustering into categories was done by using word embeddings, as discussed below.

2.4.2 Word embeddings

When performing deep learning or analysis with natural language data, it is often beneficial to use word embeddings. These are continuous real-valued vectors that represent words based on where they are located in the vector space. Over the past decade, a significant amount of work has gone into developing new ways to train and interpret these vectors. A significant work is that of Mikolov et al., who introduced the word2vec method [43]. This is an unsupervised learning method where the word vectors are trained based on their usage in a given corpus. Specifically, the work introduced the CBOW and Skip-gram network architectures for training word embeddings (see Figure 2.1). The methods were developed to avoid the use of a traditional neural network, allowing the models to train faster. This allows the models to be trained over more data over the same period. Additionally, by avoiding neural networks, the

relationships between the words are increasingly linear [43].

Figure 2.1: Visualization of the CBOW vs. Skip-gram architectures [43]

Although similar in inspiration, the CBOW and Skip-gram architectures take opposite approaches to training the vectors. Using CBOW, the $t$-th word vector, $w_t$, is predicted based on the word vectors in a neighborhood around it, or

$$(w_{t'} : 0 < |t - t'| < \tilde{n},\ t' \in \mathbb{Z}),$$

where $\tilde{n}$ is the size of the neighborhood. In Figure 2.1, this neighborhood can be seen to be 2. Conversely, Skip-gram fits the vectors by estimating the other words in the neighborhood based on the current word. Both models were found to have high semantic and syntactic accuracies. However, Skip-gram reported a higher semantic accuracy, so it was used in this work [43].

The computational complexity of word2vec’s training process is dependent on the number of words in its dictionary [43]. Since the data source of this work is social media, and thus messy, the number of unique words is extremely high. Additionally, in a real-world scenario, the words that could be encountered by the model would not be known ahead of time. As such, it is important to be able to produce word vectors for new terms as needed. This problem was solved by the work of Bojanowski et al. with the fastText model [10]. FastText is an extension of word2vec that instead trains vectors for the sub-components of words. It does this by splitting a word into its component character n-grams (for example, “where” becomes { “<wh”, “whe”, “her”, “ere”, “re>” }, where “<” and “>” mark word boundaries) [10]. This allows a word vector to be computed by taking the sum of its component n-gram vectors. This is given by,

$$w = \sum_{g=0}^{|w|} z_g,$$

where $z_g$ is the $g$-th of the $|w|$ character n-grams that make up the word. The composite word vectors can then be trained in the same way as introduced by word2vec. By making word vectors compositions of their sub-components, fastText also requires less data to train [10]. This is because although entire words may not repeat often in the corpus, conjugations, for example, likely do. For example, “becoming” may not appear often, but its collective frequency with similar words such as “become” is much higher. Since the component vectors are used to generate the composite word vector, becoming’s vector is trained by occurrences of other words that share n-grams with it, for example, “become”, “became”, or even “going”. Unlike approaches based on the semantic meaning of the words (for example, a derivational parser), this also enables the effective handling of misspelt and corpus-specific words.
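As a small illustration (not fastText’s internal code), the decomposition and summation described above can be sketched in Python; here, subword_vectors is a hypothetical lookup from character n-grams to trained vectors.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word, including '<' and '>' boundary markers."""
    marked = f"<{word}>"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]

def compose_word_vector(word, subword_vectors, dim=200):
    """Sum the vectors of the word's n-grams (unseen n-grams contribute nothing)."""
    vec = np.zeros(dim, dtype=np.float32)
    for gram in char_ngrams(word):
        vec += subword_vectors.get(gram, 0.0)
    return vec

print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
```

Because misspelt or novel words still share many n-grams with known words, a vector composed this way remains meaningful even for tokens never seen during training.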

2.4.3 Deep learning

For the deep learning component of this work, recurrent neural networks (RNNs) are crucial. These are layers that are designed to model and make predictions around ordered data. Natural language, or text, is an example of ordered data, where each word’s meaning is influenced and/or altered by those around it. A basic RNN node works similarly to a traditional neuron, but is additionally able to pass its output to the following neuron (see Figure 2.2). Specifically, given an input $x_t$, a traditional neuron would simply output a scaled version of the input, $h_t$. An RNN neuron would additionally have the output of the preceding neuron, $h_{t-1}$, as an input and pass its output to the following neuron. However, this architecture is not able to effectively model relationships between non-neighboring neurons [8].

Figure 2.2: Visualization of an RNN neuron

This problem was solved by Hochreiter and Schmidhuber with the introduction of the Long Short-Term Memory (LSTM) block [26]. The LSTM is an extension of the traditional RNN with the addition of memory and gates to control how the memory is altered. Specifically, it has the same two inputs as an RNN, $x_t$ and $h_{t-1}$, as well as the memory input, $C_{t-1}$. The memory is modified by three mechanisms within the block, a, b, and c, as indicated in Figure 2.3. Mechanism a is what decides how much of the previous memory, $C_{t-1}$, will be kept. Mechanism b is what decides how much of the current input will be added to the memory for the following blocks, denoted $C_t$. Mechanism c is what decides how much of the current input is used to compute the current output, $h_t$. Using these controls, an LSTM layer can effectively model relationships over long text sequences [45].

Figure 2.3: Visualization of an LSTM block
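For reference, the memory update of an LSTM block can be written in the now-standard gate notation (a hedged restatement of the common formulation; the mechanisms a, b, and c above correspond roughly to the forget, input, and output gates, though the figure may use different symbols):

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{(forget gate, mechanism a)}\\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{(input gate, mechanism b)}\\
\tilde{C}_t &= \tanh(W_C [h_{t-1}, x_t] + b_C) && \text{(candidate memory)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(updated memory)}\\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{(output gate, mechanism c)}\\
h_t &= o_t \odot \tanh(C_t) && \text{(block output)}
\end{aligned}
$$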

RNNs are meant to model sequential relationships in a single direction. However, words stated later can modify the meaning of previous ones. Since this work deals with complete documents (vs. predicting the next word as someone types), the performance of the LSTM can be enhanced by making it bidirectional. In this case, it has been seen that modelling the input both forward and backward improves the network’s ability to understand it [56].

While LSTMs were designed to help mitigate the vanishing gradient problem (i.e. modelling relationships in sequential data), some limitations persist for long input sequences. This problem was addressed by Bahdanau et al. with the introduction of attention layers for machine translation [5]. Yang et al. then developed hierarchical attention, which is meant for document classification and/or summarization [71]. The proposed architecture was built to (roughly) use the sequential modelling characteristics of RNNs to produce content-aware embeddings. The attention layer then models the input sequence while having a more holistic view of the input data. That is, it can interpret the RNN output (i.e. memory-capable sequential encoding of the data) with the additional ability to view the remainder of the RNN nodes. This allows the network to understand a word based on its usage, but then interpret its meaning considering the entire input. To train the network, the Adam optimization algorithm was chosen. Adam was designed to optimize stochastic objectives with minimal processing and memory requirements [36]. A stochastic objective is necessary because real-world data likely contains noise; as previously discussed, this is especially true for social media data. The low processing and memory requirements mean that the algorithm can be executed faster given equivalent hardware, reducing training time. Finally, the algorithm also supports non-stationary objectives. This is important because of the proposed extrapolation process, which allows changes to the labels between epochs (discussed in Section 3.3). The implementation of the previously discussed components and algorithms (within Section 2.4.3), with the exception of attention, was taken from Keras [12]. Keras is a deep learning library built on top of TensorFlow, an open-source deep learning library by

Google [1]. The hierarchical attention layer implementation was done by Sankesara [55].
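For concreteness, a minimal sketch (not the architecture used in Chapter 3) of a Keras model combining the pieces above, a bidirectional LSTM over pre-embedded tokens trained with the Adam optimizer, might look like the following; the layer sizes are illustrative placeholders.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

max_tokens, embedding_dim = 200, 200  # matches the embedding setup described in Section 3.1

model = models.Sequential([
    # Bidirectional LSTM reads the embedded token sequence forward and backward.
    layers.Bidirectional(layers.LSTM(64), input_shape=(max_tokens, embedding_dim)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability-like score for a context
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```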

2.4.4 Interpretation and visualization

To visualize and interpret the results, the open-source Python library Matplotlib was used. The previously mentioned libraries (see Section 2.4.1) were used to prepare the data to be fed to Matplotlib. To visualize feature significance for deep learning models, Shapley additive explanations (SHAP) are used. Although deep learning models perform extremely well for text classification tasks, their results are often hard to interpret. To address this, a set of techniques called additive feature attribution methods have been developed. These seek to explain how a model makes a given prediction using an explanation model. The explanation model used by these methods is a linear model given by,

$$g(x') = \phi_0 + \sum_{t=1}^{T_i} \phi_t x'_t,$$

where $g$ is the explanation model, $x' \in \{0, 1\}^{T_i}$ are the (simplified) inputs, and $\phi$ are the feature significance values [40]. By using a linear model, the contribution of feature $t$ is just $\phi_t$. In other words, the magnitude of $\phi_t$ can be thought of as the significance of the feature and the polarity shows whether it contributes to a positive or negative prediction. However, for $\phi_t$ to be meaningful, the explanation model, $g$, must be chosen so it obeys three properties: local accuracy, missingness, and consistency [40]. Local accuracy requires that for a given input, $x$, the explanation model gives the same prediction as the model, or $f(x) = g(x')$. The relationship between $x$ and $x'$ is defined using a simplification function, $h_x$, such that $x = h_x(x')$.

Missingness requires that if a feature isn’t present, it cannot have a contribution to the model’s prediction, or $x'_t = 0 \implies \phi_t = 0$. Consistency requires that a feature’s significance, $\phi_t$, is non-decreasing with respect to the contribution of the input $x'_t$ (should the model change). In other words, if the model is trained further, increasing the impact of $x'_t$ on the prediction’s value, its significance, $\phi_t$, cannot decrease. This can be expressed explicitly as,

$$f'_x(x') - f'_x(x' \setminus t) \geq f_x(x') - f_x(x' \setminus t),$$

where $f'_x$ is the modified model, and $(x' \setminus t)$ is the input vector with the feature $t$ removed. Lundberg and Lee showed that there is a unique solution (i.e. SHAP) for the feature significance values when adhering to all three properties [40]. This solution is given as,

$$\phi_t(f, x) = \frac{1}{T} \sum_{z' \in P(x'_S)} \binom{T - 1}{|z'|}^{-1} \left( f_x(z') - f_x(z' \setminus t) \right), \tag{2.1}$$

where $P(x'_S)$ is the powerset of the non-zero components of $x'$, $|z'|$ is the number of non-zero components, and $T$ is the number of input features [40]. However, to derive feature significance values this way would require the training of $T \cdot |P(x'_S)| = T \cdot 2^{|x'|}$ models, giving SHAP an exponential computational complexity. Lundberg and Lee solved this problem by developing estimations using conditional probability. This is accomplished by estimating the model’s output for a given set of input features, or

$$f_x(x') = E[f_x(x) \mid x_S],$$

where $x_S$ are the non-zero components of $x'$. This result allows SHAP values to be computed in $O(T^2)$ time [40]. Note that this also relies on assuming feature independence.
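As an orientation-only sketch (not the thesis’s code), the open-source shap package can produce these values for a Keras model roughly as follows; the tiny untrained model and random inputs are stand-ins, and DeepExplainer is assumed to be used with a shap release compatible with the installed TensorFlow version.

```python
import numpy as np
import shap
from tensorflow.keras import layers, models

# Toy stand-ins, purely to show the call pattern.
max_tokens, embedding_dim = 20, 8
model = models.Sequential([
    layers.Flatten(input_shape=(max_tokens, embedding_dim)),
    layers.Dense(1, activation="sigmoid"),
])
embedded_contexts = np.random.rand(200, max_tokens, embedding_dim).astype(np.float32)

# Background sample used to approximate the expected model output E[f(x)].
background = embedded_contexts[:100]
explainer = shap.DeepExplainer(model, background)
phi = explainer.shap_values(embedded_contexts[100:110])

# Each entry of phi is a feature-significance value: its sign shows whether the
# feature pushed the prediction up or down, its magnitude shows how strongly.
```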

Chapter 3

Methodology

In order to detect intent in text, a model-based approach is required. This means that a model must be developed then evaluated rather than simply being derived from labelled data. The devised process can be split into data preparation, initial label generation, label extrapolation, and validation. Data preparation encompasses how the raw text is cleaned and data is precomputed before being used by the model. After being prepared, an initial collection of labels is computed to indicate the presence of explicit expressions of intent. These labels are then passed to a sequence and deep learning model, which perform co-training to iteratively derive additional labels for the dataset, thereby extrapolating on the meaning of intent. Finally, the deep learning model’s performance can be validated using crowd-sourced labels. In order to detect abusive intent, an abuse model must also be trained, then have its predictions combined with those of the intent model. The model is trained using established techniques and existing datasets for abusive language detection. The model’s predictions are then combined with those of the intent model to compute abusive intent predictions for the data. Finally, predictions can be aggregated to provide a more holistic view.

3.1 Data preparation

Before using the data for training and quantification, it must be collected, cleaned, and used to pre-compute additional data objects. Data collection comprises not only acquiring the data, but also selecting which data to collect. Once the raw data is made available, it is cleaned to eliminate characters and elements that would impede its use. The cleaned data can be used to pre-compute information also used in the training and prediction process. The information pre-computed includes document contexts, a context-sequence matrix, and word embeddings (covered in Sections 3.1.3 to 3.1.5). See Figure 3.1 for an overview of the data preparation and pre-computing pipeline.

Figure 3.1: Visualization of the pre-processing and computing performed before training

3.1.1 Source selection and data collection

The purpose of the work is to enable the robust prediction of abusive intent from social media content. As such, the text used to develop the model should likewise be from an online forum to increase the likelihood of similar word and grammar usage. In order to maximize the chances of success, the data selected should contain a high density of intentful statements. This is significant since, as previously discussed, social media text is often very noisy. Thus, the more training examples that are available to the models, the higher the chance of success. Based on the above criteria, a sensible target for data collection is the white-supremacist forum, Storm-Front, given its highly volatile nature and large data size. Unrelated to this research, a dataset containing 22,757,468 posts from the Storm-Front site was collected by Professor Richard Frank at Simon Fraser University [70]. A random subset of 250,000 posts was selected to be used in the training process. More information about the source and ethics clearance for the dataset can be found in Appendix A.1.

3.1.2 Data cleaning

Once collected, the text is processed to remove undesirable structures such as hyperlinks or HTML tags. This processing is applied in two stages: the primary stage, then runtime processing. After the primary stage is complete, the text contains only lowercase characters and punctuation. As depicted in Figure 3.1, this text is used to compute document contexts and initial labels. The functions that form the primary stage of processing are as follows:

1. Remove quotes by checking for HTML formatting as used by Storm-Front 3.1. DATA PREPARATION 25

2. Convert Unicode to ascii (for example, removes accents from characters) for ease of use and because English does not require them

3. Remove URLs from the text

4. Remove HTML tags

5. Remove embedded images using HTML tags

6. Remove optional brackets (i.e. word(s) becomes words)

7. Remove emojis when present as tags instead of Unicode (i.e. when they’re encoded using ascii characters)

8. Remove Twitter user handles (where applicable)

9. Remove hashtags and split the text into words when in camel case (for example, “#WordWord” becomes “word word”)

10. Convert uppercase characters to lowercase

11. Remove periods from acronyms (for example, “y.y.z.” becomes “yyz”)

12. Remove digits

13. Remove characters when repeated more than twice (for example, “wooord” becomes “word”)

14. Remove all characters identified by the regex [^a-zA-Z,.!?’";:- ]

The processed documents are then split into their component contexts (discussed further in Section 3.1.3). The following final transforms are then applied before using the data for analysis or training the word embeddings; a sketch of the combined cleaning pipeline follows the list.

1. Remove all non-alphabetic characters identified by the regex function [^a-zA-Z ]

2. Remove extra spaces between words (for example, “word  word” becomes “word word”)
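As a hedged illustration only, the two stages above can be approximated with simple regular expressions as follows; apart from the regexes quoted in the lists, the specific patterns here are guesses at the behaviour described, not the thesis’s actual implementation (site-specific quote removal, step 1, is omitted).

```python
import re
import unicodedata

def primary_clean(text: str) -> str:
    """Approximation of the primary cleaning stage (steps 2-14 above)."""
    # 2. Convert Unicode to ASCII (e.g. strip accents).
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"https?://\S+", " ", text)                        # 3. URLs
    text = re.sub(r"<[^>]+>", " ", text)                             # 4-5. HTML tags and embedded images
    text = re.sub(r"\(s\)", "s", text)                               # 6. optional brackets: word(s) -> words
    text = re.sub(r":\w+:", " ", text)                               # 7. emojis encoded as ASCII tags
    text = re.sub(r"@\w+", " ", text)                                # 8. Twitter user handles
    text = re.sub(r"#(\w+)",                                         # 9. hashtags, splitting camel case
                  lambda m: re.sub(r"(?<=[a-z])(?=[A-Z])", " ", m.group(1)), text)
    text = text.lower()                                              # 10. lowercase
    text = re.sub(r"\b((?:\w\.){2,})",                               # 11. periods inside acronyms
                  lambda m: m.group(1).replace(".", ""), text)
    text = re.sub(r"\d+", " ", text)                                 # 12. digits
    text = re.sub(r"(.)\1{2,}", r"\1", text)                         # 13. characters repeated more than twice
    return re.sub(r"[^a-zA-Z,.!?'\";:\- ]", " ", text)               # 14. remaining disallowed characters

def final_clean(context: str) -> str:
    """Approximation of the final transforms applied to each context."""
    context = re.sub(r"[^a-zA-Z ]", " ", context)                    # keep alphabetic characters and spaces
    return re.sub(r" +", " ", context).strip()                       # collapse extra spaces
```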

3.1.3 Document partitioning

Before passing the data to the model, it is important that the documents are broken down into statements. This is done since an expression of intent or abuse is made within a sentence. Though other sentences or paragraphs can support the statement, it is the statement itself that should be identified. Unlike other social media sites such as Twitter (which has a post size limit), many Storm-Front posts are long. In a long document, it is also possible for multiple statements to be present, some of which could express non-intent. Thus, when training, documents with both intent and non-intent present could affect the model’s understanding of intent. To prevent this, documents are split at sentence terminators (for example, “.”) and semi-colons. By splitting documents into their component parts, or statements, the additional granularity provides the model the best chance of learning to identify expressions of intent. Since an expression of intent resides within a single statement, the splitting does not remove or alter it. Similarly, semi-colons divide related clauses, so splitting there would likewise not affect the expression of intent in either clause. The documents are split using the regex pattern [.?!;]+, forming what will be referred to as contexts.
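A minimal illustration of this splitting step, using the regex pattern quoted above:

```python
import re

def split_into_contexts(document: str) -> list:
    """Split a document into contexts at sentence terminators and semi-colons."""
    parts = re.split(r"[.?!;]+", document)
    return [part.strip() for part in parts if part.strip()]

print(split_into_contexts("i am going to the store tomorrow; it opens at nine. see you there!"))
# -> ['i am going to the store tomorrow', 'it opens at nine', 'see you there']
```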

3.1.4 Context-sequence matrix

Before starting the training process, a context-sequence matrix should be computed. This is done because it is memory-intensive and time-consuming to compute, but cheap to store and query once complete. The $N \times M$ matrix stores the number of occurrences of a given sequence of words, $s_j$, in each context; the number of times sequence $j$ appears in context $i$ is denoted $\sigma_{i,j}$. The sequences, or word n-grams, are defined as ordered collections of three to six words. A minimum number of three words was selected since it is the minimum required to express explicit intent (discussed further in Section 3.2). This results in a sparse matrix computed using the SKLearn CountVectorizer class with a custom token pattern of \b\w+\b (which also considers single-character words). The number of columns, or $M$, in the matrix is chosen so the $M$ most common sequences account for roughly 90% of those in the corpus. Specifically, $M$ is defined such that

$$\sum_{i=0}^{N} \sum_{j=0}^{M} \sigma_{i,j} \approx 0.9 \cdot \sum_{i=0}^{N} \sum_{j=0}^{M'} \sigma_{i,j}, \tag{3.1}$$

where $M'$ is the total number of unique sequences in the corpus. The value of $M$ is limited because when used by the term-learner (discussed later in Section 3.3.2), rare terms are not of interest. The matrix is used to find patterns in the data, but if a sequence is only used in one location, then it is not helpful for this application.
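A hedged sketch of how such a matrix can be built with the class mentioned above; the toy corpus is a placeholder, and in practice the column count would be truncated to the $M$ derived from Equation 3.1.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Counts of word 3- to 6-grams per context, using the quoted token pattern.
vectorizer = CountVectorizer(
    ngram_range=(3, 6),
    token_pattern=r"\b\w+\b",   # also treats single-character words as tokens
)

contexts = [
    "i am going to the store tomorrow",
    "i will go to the store",
]
context_sequence_matrix = vectorizer.fit_transform(contexts)  # sparse N x M' matrix
print(context_sequence_matrix.shape)
```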

3.1.5 Word embeddings

Before training the abuse or intent models, word embeddings are computed that reflect the target corpus. Although the default fastText model can generate a vector for any token it is given, its location within the vector space would not reflect the word usage of the target corpus. This is significant since the location of a word vector (relative to the other words) gives it meaning. Locally trained embeddings perform better on tasks with the same word usage [17]. This difference in location can be especially pronounced for corpora that have unconventional word usage or spelling errors. To give the model the best

chance of learning intent from the corpus, custom embeddings should be used.

Training

To generate the embeddings, the data was pre-processed as described in Section 3.1.2. This is done so that when the word embeddings are trained, they are trained to recognize the words as they will appear when used by the model. The cleaned data is then fed to fastText 0.9.2 to train 200-dimensional vectors using the Skip-gram technique with default training parameters. This embedding size was chosen so as to include sufficient information while providing memory and computational savings during training and prediction. Skip-gram was chosen instead of CBOW because of its superior semantic accuracy, as reported in the original word2vec and fastText papers [43, 10]. The embeddings were trained for 5 epochs over the full Storm-Front dataset (i.e. the full dataset, not only the random subset used to train the intent model). The resulting fastText model can then be used to generate a word embedding for any token (as discussed in Section 2.4.2).
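A minimal sketch of this training step using the fasttext Python package; the input path is a placeholder for the cleaned corpus file (one context per line), and unspecified parameters are left at their defaults.

```python
import fasttext

# Train 200-dimensional Skip-gram embeddings for 5 epochs over the cleaned corpus.
model = fasttext.train_unsupervised(
    "stormfront_cleaned.txt",   # placeholder path to the cleaned text
    model="skipgram",
    dim=200,
    epoch=5,
)

# fastText can produce a vector for any token, including unseen or misspelt ones.
vector = model.get_word_vector("becomming")
print(vector.shape)  # (200,)
```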

Realtime-Embeddings

Once the fastText model was trained using a topic-specific corpus, the model could be used to generate word embeddings as needed. In order to facilitate ease of use while maintaining adequate performance, a mechanism was built to compute document embeddings in real-time. Additional logic was built around fastText’s Python wrapper to enable caching and label association. Given the size of the datasets, it was also important for the model’s input to be supplied as a generator rather than an allocated array. An overview of the mechanism’s architecture can be seen in Figure 3.2.

Figure 3.2: Architecture of the real-time embedding mechanism

Using a generator allows the documents to be embedded in batches, which were chosen to be 512 documents in size. This produces a matrix of size 512 × 200 × 200, or documents by the maximum number of tokens by the embedding dimension, to represent the embedded batch. Since the deep learning code is executed on machine B, which uses an Nvidia graphics card, the matrix is composed of 32-bit floats. This means that each embedded batch requires 98 MB (approx.) to store in memory. Without a generator, and with each document using 0.19 MB (approx.), 391 GB (approx.) would be needed to embed the entire dataset, which was not available. Using the converse strategy of generating each word embedding as needed would be computationally inefficient given the rate of word repetition. By comparison, a dumb cache that simply remembers every unique token while never removing tokens would only require 443 MB (approx.). This is because a single word embedding requires just 0.98 KB (approx.) to store. Using a Python dictionary, retrieving a word embedding is an O(1) operation since it is built using a hash table. As such, the mechanism can be seen to have an O(NM) computational complexity (where N is the number of documents and M is the maximum number of tokens per document) and O(|W|) memory complexity (where |W| is the number of unique tokens in the corpus) when generating the embeddings for the dataset.

Another approach could have been to pre-generate all required embeddings for the dataset being assessed. As discussed previously, this is a relatively cheap memory operation and only has to be performed once. However, it suffers from ease-of-use considerations since each dataset requires its own collection of word embeddings. Additionally, should a dataset be cleaned in a different way, new tokens could be added, requiring the re-generation of the collection. The real-time embedding mechanism also manages the association between the labels and contexts. When training, the model requires the embedded documents along with the document label and loss weight. The mechanism was designed such that the appropriate labels and weights were supplied when queried by the model. The specific use and calculation of these weights will be discussed further in Section 3.3.3.
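The following is a hedged sketch of how such a mechanism might be structured (batching, a never-evicted token cache, and label/weight association); it is illustrative only, and the thesis’s implementation may differ in detail.

```python
import numpy as np

class RealTimeEmbedder:
    """Batch generator that caches per-token vectors and yields
    (embeddings, labels, weights) tuples for training."""

    def __init__(self, ft_model, max_tokens=200, dim=200, batch_size=512):
        self.ft_model = ft_model          # trained fastText model
        self.max_tokens = max_tokens
        self.dim = dim
        self.batch_size = batch_size
        self.cache = {}                   # token -> vector, never evicted

    def _token_vector(self, token):
        if token not in self.cache:       # O(1) dictionary lookup
            self.cache[token] = self.ft_model.get_word_vector(token)
        return self.cache[token]

    def batches(self, contexts, labels, weights):
        """Yield embedded batches alongside their labels and sample weights."""
        for start in range(0, len(contexts), self.batch_size):
            chunk = contexts[start:start + self.batch_size]
            batch = np.zeros((len(chunk), self.max_tokens, self.dim), dtype=np.float32)
            for i, context in enumerate(chunk):
                for j, token in enumerate(context.split()[:self.max_tokens]):
                    batch[i, j] = self._token_vector(token)
            yield (batch,
                   labels[start:start + self.batch_size],
                   weights[start:start + self.batch_size])
```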

3.2 Label generation

Before computing an initial collection of labels, a template describing what intent looks like must first be created. The labels can then be used to train a deep-learning model and make predictions about intent in other contexts. To compute the initial set of labels, the following steps are taken: the template is defined, rough labels are computed using the template, then the labels are refined. The creation of a template for explicit intent relies on existing work in linguistics and computer science (as discussed in Chapter 2). Computing the rough labels is done by checking whether a context contains all the components specified by the template.

While computing the rough labels, the significant tokens are also collected to enable the labels to be refined. The rough labels are refined to better target contexts with strong expressions of intent. An overview of this process can be seen in Figure 3.3.

Figure 3.3: Visualization of the linguistics model processing pipeline and initial label generation

Explicit expressions of intent can be found using a technique similar to DK patterns, as used by Purohit et al. (discussed in Section 2.3.1) [49]. However, the proposed template relies on Part of Speech (PoS) tags and word dependency relations rather than regex functions (as used by Purohit et al.). This enables greater flexibility and accuracy, especially when dealing with messy text, as is usually the case with content from social media. However, it is a much more computationally expensive operation, especially compared with regex. This cost is acceptable since it is only

necessary to run it once before training the model.

3.2.1 Linguistic template

Before being able to compute the rough labels for explicit intent, a template must be defined. The template is built to identify a wide set of contexts that contain highly generalized forms of explicit intent. This way, as many contexts as possible can be identified immediately. The template is defined as a directed graph with word classes for the nodes and word relations for the edges. The remainder of this subsection contains the derivation and definition of the template. As discussed in Section 2.1, the expression of a future intention can be broken into two basic categories. They will be referred to as short and long forms of intent, which correspond to the will and going-to forms, respectively. However, the long and short forms use a relaxed definition of intent compared to the will and going-to forms. Using PoS tags and word dependencies, both forms can be modelled computationally. A PoS tag is something that indicates the role of a given word in a sentence (for example, noun, adverb, etc.). Word dependencies model the relationships between words, that is, how they relate or modify the meaning of each other (for example, indicating the subject of a verb). Modelling words and their interactions enables greater flexibility and accuracy when dealing with inconsistent spelling, sentence structure, and word use. This model is then searched for within contexts, enabling the computation of an initial set of labels. Using the PoS tags and word dependencies to model the short and long forms of intent results in two templates (see Figures 3.4a and 3.4b). The short and long forms, though different, share common features. The first criterion is the use of a

(a) Short form of explicit intent

(b) Long form of explicit intent

Figure 3.4: Template for short and long form of intent

The first criterion is the use of a first-person pronoun (“I” or “we”). By making an expression in the first person, the speaker indicates an intention, rather than a hope or an effort to get someone else to do something (for example, “you’re going to...”). The next components are the desire and action verbs. The desire verb indicates the commitment or veracity of the expression. For example, “I am planning to...” does not convey the same commitment as “I am going to...”. The action verb indicates what the author is planning to do (for example, “...going to fight”). Finally, the expression can also specify the target and/or timing of the intent, for example, “...going to fight them today”.

Table 3.1: Table of dependencies defining short and long form intent templates

(a) Components of the short form intent template

Role                 Parent        Relationship to parent
Pronoun              Action verb   Nominal subject
Desire verb          Action verb   Auxiliary
Action verb          None          N/A
Target (Optional)    Action verb   Direct object
Timing (Optional)    Action verb   Noun phrase as adverbial modifier

(b) Components of the long form intent template

Role                 Parent        Relationship to parent
Pronoun              Action verb   Nominal subject
Desire verb          None          N/A
To                   Action verb   Auxiliary
Action verb          Desire verb   Open clausal complement
Target (Optional)    Action verb   Direct object
Timing (Optional)    Action verb   Noun phrase as adverbial modifier

As explained above, the template specifies a first-person pronoun, desire and action verbs, and, optionally, a target and timing. These elements additionally have to be related to one another by specific dependency relations. The dependency relations represent the effect one element has on another. For example, the nominal subject of the desire verb is the pronoun, specifying the source or speaker of the desire. A visualization and list of the word dependencies for both the short and long forms can be found in Figure 3.4 and Table 3.1. PoS tags and word dependencies also model the words that occur between and around those specified by the templates. In some cases, the additional words allow instances of non-intent to be identified. These are expressions that indicate that the speaker is not going to do something (for example, “I am not going to fight today”). These expressions are not of interest since they indicate a plan of inaction.

The expression could also be preceded by a conditional statement, making the expression of intent weak and not of interest (for example, “if I am going to fight today...”).

3.2.2 Rough label generation

After the template has been defined, it can be used to compute a rough collection of labels for contexts containing explicit intent. Before checking whether the context contains intent, the PoS and dependency tags must be computed for the context. This is accomplished using the spaCy statistical parser, implemented as a Python library [29]. Unlike deterministic (i.e. rule-based) parsers, spaCy is better able to handle messy content. This is enabled by the model’s use of word embeddings to represent the words when predicting the tags. This way, if something is misspelt or previously unseen, it can still be represented in the embedding space such that the parser can intelligently handle it. Once the content is parsed, the context can be checked for the presence of the linguistic template, indicating intent. If the template is found within the context, it is given a positive label, or 1. If a negation, question, or a non-first-person pronoun is present, the context is given a negative label, or 0. Otherwise it is initially labelled as unknown, or 0.5. In addition to assigning labels to each context, if intent or non-intent is present, the intent frame, or the words required by the template (i.e. pronoun, desire and action verbs, and target and timing), is saved. Capturing these core tokens enables the refinement of the labels before they are used. In addition to the subset of posts from the Storm-Front site, a set of contexts from Wikipedia is added. Based on Wikipedia’s style guidelines, there should be no instances of first-person expressions of intent [69].

As such, each context can be assumed to have a negative label, or 0. These contexts were added to teach the model that objective statements should also be labelled as having negative intent. This should result in the model labelling conditional and negated intent, as well as objective statements, as negative. After adding the objective statements, the rough labels can be refined.
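The following is a simplified, hedged sketch of the kind of check described above, assuming spaCy and a small English model; it only covers the short form of intent, and omits the long form and question handling that the full template performs. Exact behaviour depends on the parser model used.

```python
# A simplified sketch (not the author's full template) of checking a context for the
# short form of explicit intent using spaCy's PoS and dependency tags.
# Requires: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def rough_label(context: str) -> float:
    doc = nlp(context)
    for tok in doc:
        if tok.pos_ != "VERB":                        # candidate action verb
            continue
        deps = {c.dep_: c for c in tok.children}
        subj, aux = deps.get("nsubj"), deps.get("aux")
        if subj is None or aux is None:
            continue
        if subj.lower_ not in {"i", "we"}:            # non-first-person pronoun -> negative
            return 0.0
        if "neg" in deps or any(c.dep_ == "neg" for c in aux.children):
            return 0.0                                # negated intent -> negative
        return 1.0                                    # e.g. "i will fight them today"
    return 0.5                                        # unknown

print(rough_label("i will fight them today"))         # expected: 1.0
print(rough_label("i will not fight them today"))     # expected: 0.0
```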

3.2.3 Label refinement

The template uses a wide definition of intent to enable a large variety of expressions to be collected. However, to provide the model with the best chance of learning to identify strong expressions of intent, the rough labels should first be refined. The goal of the refinement is to remove instances of weaker intent and non-intent that may have been accidentally flagged. Since intent is predicted independently of abuse, the refinement will only use the desire verb (i.e. not the action verb). The hypothesis is that by not considering the action verbs, what the person(s) intend to do is not in question, but rather whether they are expressing a plan to do it. The label refinement is performed by filtering the rough labels based on the desire verbs from the intent frames previously collected (see Section 3.2.1). The labels associated with strong desire verbs will retain their positive label, while those with weak desire verbs will be re-labelled as unknown (i.e. 0.5). The set of strong verbs will be constructed by expanding a small set of verbs known to be strong desire verbs. This expanded set can then be used to filter out the weak verbs. For example, a positive label with the desire verb “going” would be upheld as strong, whereas one with “dream” may be re-labelled as unknown. The difference in the strength of the expression can be seen by considering “I’m going to see...” vs. “I dream to see...”.

The initial set of terms that are known to be strong is {“want”, “need”, “going”, “have”, “about”, “plan”}. The set was chosen by considering a sample of the captured desire verbs (along with the contexts in which they were used) and selecting those which appear to reliably express strong intent. This set is then expanded before being used to refine the rough labels previously generated. The expansion can be done using dictionary, tree-based, or spatial expansion techniques. Dictionary expansion uses defined word relationships to get the synonyms of each term in the seed set. Tree-based expansion constructs a tree using the word vectors of the verbs, then finds a sub-tree whose verbs become the expanded set. Spatial expansion takes all verbs within a region of the word embedding space.

Dictionary refinement

The dictionary approach has a simple implementation and intuitive results. The expanded set of verbs is formed by taking the union of the initial verbs and all of their synonyms. The implementation requires a thesaurus database or service to query for the synonyms. The approach also produces predictable results since all of the expanded terms are defined based on formal semantic relationships. However, this also limits the expansion to formal word usage and provides no support for misspellings. Some of the downsides of the approach can be partially addressed by using a library such as Empath. Empath is a statistically defined dictionary of terms that have been manually validated for accuracy [21]. The dictionary was formed by taking the 10,000 most common terms from the Google News corpus, then clustering them using word embeddings. The clusters, or categories, were then manually modified to better represent intuitive word usage.

These embeddings can also be used to perform lexical expansion with cosine similarity (see Equation 3.2). However, this approach has similar downsides to a lookup dictionary, with its limited dictionary size, no support for different word usage, and no support for misspellings. As a result, it was decided that dictionary-based approaches were not appropriate for this application.
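As an illustration of dictionary expansion, the sketch below expands the seed set of desire verbs using WordNet synonyms. This is a stand-in for a thesaurus service such as Empath (whose exact API is not reproduced here), and shares the limitations discussed above.

```python
# A sketch of dictionary-based expansion using WordNet synonyms as a stand-in for a
# thesaurus service. Requires: nltk.download("wordnet")
from nltk.corpus import wordnet as wn

seed_verbs = {"want", "need", "going", "have", "about", "plan"}

def dictionary_expand(seeds):
    expanded = set(seeds)
    for verb in seeds:
        for synset in wn.synsets(verb, pos=wn.VERB):          # verb senses only
            expanded.update(lemma.name().lower() for lemma in synset.lemmas())
    return expanded

print(sorted(dictionary_expand(seed_verbs)))
```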

Tree refinement

To address the shortcomings of the dictionary approach, a tree-based method can be used. Tree refinement is a technique that uses word embeddings to construct a tree representing the similarities between all the desire verbs. Using the locally trained fastText model, this approach accommodates corpus-specific word usage, misspellings, and a dictionary that fully covers the verbs present. These vectors are then clustered to generate the tree, representing the relationships between words. Specifically, the tree is formed using agglomerative clustering with complete linkage and cosine distance. This means that the tree is formed by making each word vector a sub-tree, then repeatedly connecting the two closest sub-trees. Taking the seed set of verbs (listed in Section 3.2.3), the refined set of terms can be computed by finding the smallest sub-tree that contains all of the seed terms. This can also be thought of as finding the parent node with the fewest leaf nodes that includes all the seed verbs. When forming the tree, complete linkage is used because it compares sub-trees using the most different verbs in each sub-tree being compared. This (roughly) translates to words being grouped based on the largest difference between the words in each set. The result is a larger number of smaller groups, which are then joined.

This is more favorable for the application as it allows more granularity in finding the minimal sub-tree covering the seed set of terms. This should result in the most relevant set of desire verbs to use when refining the rough labels. When comparing sub-trees, the difference is calculated using the cosine distance between the two vectors, one in each sub-tree, that are farthest from each other.

Cosine distance between two vectors, v1 and v2, is given by,

D_C(v_1, v_2) = \left| \frac{v_1 \cdot v_2}{\|v_1\|_2 \, \|v_2\|_2} \right|.    (3.2)

Thus, using complete linkage, the distance between two subtrees, ta and tb, is defined as,

D_T(t_a, t_b) = \max_{i,\, j} \left| \frac{w_i \cdot w_j}{\|w_i\|_2 \, \|w_j\|_2} \right|, \quad \{i : w_i \in t_a\}, \; \{j : w_j \in t_b\},

where w_i is the word vector for the i-th word. As an example, let the seed set of terms be {A, C}, with all possible terms being in the set {A, B, C, D, E}. Then, using the example from Figure 3.5, the refined set of terms would be {A, B, C}.

Figure 3.5: Visualization of tree refinement
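A hedged sketch of this procedure is shown below, using SciPy's hierarchical clustering with complete linkage and the conventional cosine distance (the thesis's Equation 3.2 uses the absolute cosine value instead). The inputs `verbs` (a list of desire verbs) and `word_vectors` (a matching array of fastText vectors) are assumptions.

```python
# A sketch of tree refinement: build an agglomerative tree over desire-verb vectors
# (complete linkage, cosine distance) and return the leaves of the smallest sub-tree
# that contains every seed verb.
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree

def tree_refine(verbs, word_vectors, seeds):
    Z = linkage(word_vectors, method="complete", metric="cosine")
    root = to_tree(Z)
    seed_ids = {verbs.index(v) for v in seeds}

    def leaves(node):
        return {node.id} if node.is_leaf() else leaves(node.left) | leaves(node.right)

    best = None
    def search(node):                       # smallest node whose leaves cover all seeds
        nonlocal best
        node_leaves = leaves(node)
        if seed_ids <= node_leaves:
            if best is None or len(node_leaves) < len(best):
                best = node_leaves
            if not node.is_leaf():
                search(node.left)
                search(node.right)

    search(root)
    return {verbs[i] for i in best}
```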

However, based on the verbs present, the computed tree can result in an undesirable expansion. This can occur if the minimum spanning sub-tree captures or misses significant terms because of the other terms present. Thus, the expanded set of verbs is dependent on the candidates rather than just the initial set. This means that, with the same initial set of verbs, the expanded set could be significantly degraded depending on the set of verbs used to construct the tree. For example, given the same setup as the example above, the inclusion of another term F could cause C to be excluded from the expanded set. This variability in the resulting set of verbs makes the tree refinement approach undesirable in practice.

Spatial refinement

To address the shortcomings of tree-based expansion, spatial refinement can be performed. Similar to tree refinement, spatial refinement is performed using the word vectors associated with the desire verbs in the intent frames. However, spatial refinement instead bounds the initial verbs in the embedding space, then identifies the other verbs present in that region. This way, the defined region is based solely on the word embeddings and seed verbs. Two methods were used to bound the points: a hyper rectangle and a hyper cone (see Figure 3.6). The hyper rectangle is constructed by computing the min and max values in each dimension for the word vectors in the seed set. A desire verb is then within the hyper rectangle if it is within the computed bounds for each dimension.

Specifically, dimension j is bounded by the min and max values, b_{j,min} and b_{j,max}, respectively, such that

Figure 3.6: 2D visualization of spatial refinement techniques

b_{j,\min} = \min_i w_{i,j} \quad \text{and} \quad b_{j,\max} = \max_i w_{i,j}, \quad \forall\, w_i \in W_0,    (3.3)

where W0 is the seed set of word vectors, wi is the vector representing word i, and

w_{i,j} is the j-th component of the vector. Thus, the refined set of word vectors, W_{cube}, is given by,

W_{\text{cube}} = \{\, w_i : w_{i,j} \in [b_{j,\min}, b_{j,\max}], \; \forall j \in \mathbb{N}_{\le D} \,\},

where D is the number of dimensions in the word embeddings. A 2D example of this can be seen in Figure 3.6 as the green box bounding the points.

However, cosine similarity is the de facto standard and has been seen to most closely relate to semantic difference [10]. As such, spatial refinement should be performed using this measure, resulting in the bounded area being a hyper cone. This is accomplished by computing the average of the seed vectors, denoted w_avg. Next, the maximum cosine distance, d_max, of the vectors in the seed set is computed. This can then be used to identify all of the verbs that lie within β · d_max of w_avg, where β = 2 is the tolerance. Inspired by the hypercube, the magnitude of the vectors should also be taken into account, resulting in a truncated hyper cone (as seen in Figure 3.6). Thus, the minimum and maximum vector magnitudes, m_min and m_max, are also computed. The refined set of word vectors is then given by,

W_{\text{cone}} = \{\, w_i : D_C(w_{\text{avg}}, w_i) < \beta \cdot d_{\max}, \; \|w_i\|_2 \in [m_{\min}, m_{\max}] \,\},

where D_C is cosine distance (see Equation 3.2).
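The sketch below illustrates both bounding strategies with NumPy and SciPy. It is an assumption-laden sketch: it uses the conventional cosine distance (1 − cosine similarity) rather than Equation 3.2's absolute value, and it takes d_max to be the largest seed-to-average distance; `seed_vecs` and `candidate_vecs` are placeholder arrays of word vectors.

```python
# A sketch of the spatial refinement bounds: a hyper rectangle over per-dimension
# seed ranges, and a magnitude-truncated hyper cone around the seed average.
import numpy as np
from scipy.spatial.distance import cosine  # conventional cosine distance: 1 - similarity

def hypercube(seed_vecs, candidate_vecs):
    lo, hi = seed_vecs.min(axis=0), seed_vecs.max(axis=0)       # b_{j,min} and b_{j,max}
    return np.all((candidate_vecs >= lo) & (candidate_vecs <= hi), axis=1)

def hypercone(seed_vecs, candidate_vecs, beta=2.0):
    w_avg = seed_vecs.mean(axis=0)
    d_max = max(cosine(w_avg, w) for w in seed_vecs)             # widest seed-to-centre distance
    mags = np.linalg.norm(seed_vecs, axis=1)
    m_min, m_max = mags.min(), mags.max()
    inside = []
    for w in candidate_vecs:
        ok = cosine(w_avg, w) < beta * d_max and m_min <= np.linalg.norm(w) <= m_max
        inside.append(ok)
    return np.array(inside)
```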

3.3 Extrapolation

Once initial labels have been computed and refined, they can be extrapolated from. In this stage, the goal is to infer the definition of intent from the positively labelled contexts, then expand it. This is done using a co-training approach, where two complementary models learn at the same time, while maintaining each other's accuracy [9]. The two models are a deep learner and a sequence learner, brought together by a consensus mechanism. The deep learner is built around fastText word embeddings and a biLSTM-Attention architecture. The sequence learner is based on the relative frequency of sequences of words. As each learns, their predictions are fed back into each other to help ensure that, as the meaning of intent is expanded, it remains correct. This process then repeats itself to perform the extrapolation.

The approach is based on the work of Gao et al., who used a slur learner and an LSTM for abusive language detection (discussed in Section 2.3.2) [22]. An overview of the implemented process can be seen in Figure 3.7.

Figure 3.7: Visualization of double-bootstrap learning phase

3.3.1 Rate limiter

At the start of the extrapolation process, the initial labels identify a set of contexts that contain intent and non-intent. The two models then gradually train themselves while moving unknown context labels towards either 0 or 1. However, especially at the beginning of the training process, the models may try to pull a large number of contexts to one or both extremes. At this point it is possible that the models have only partially learned what intent looks like, making this movement a mistake. To prevent this from happening, a rate-limiting mechanism was added to the training pipeline. The purpose of the mechanism is to slow down the modification of labels. The rate limiter works by receiving each model's predictions, then limiting the number of modifications it can propose to the consensus mechanism. The limit is based on the number of labels that are currently 1 or 0. Each model can then only propose as many label changes in each direction as there are labels of 1 or 0, respectively.

For example, if there were Np positive labels (i.e. equal to 1), then each model could only propose its Np strongest predictions for contexts containing intent. It should be noted that each model may also propose fewer than this number of label changes if it is not sufficiently sure about a context.

The limits were chosen using the following hypothesis: should Np positively labelled contexts exist in the dataset, it is conceivable that Np additional ones also exist.

However, if the model were allowed to propose 10 · Np label changes, it is possible that some of those predictions would be made in error. As such, the model is asked to only propose label changes for the contexts that it believes most strongly contain intent. This has the added benefit of training the models using the strongest available examples of intent while they are still unsure. Later in the training process, should a model become more confident about another context, it can then propose that its label be changed. This is likely what would happen to a context containing a weaker or more abstract expression of intent.
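The sketch below illustrates the rate-limiting rule under stated assumptions: each proposal is a (context index, confidence, direction) triple, and only the strongest proposals survive, capped by the current counts of 1 and 0 labels.

```python
# A sketch of the rate limiter: cap each model's proposals at the number of existing
# positive (1) and negative (0) labels, keeping only the most confident ones.
import numpy as np

def rate_limit(proposals, labels):
    """proposals: list of (index, confidence, direction) with direction +1 or -1."""
    n_pos = int(np.sum(labels == 1.0))
    n_neg = int(np.sum(labels == 0.0))
    by_conf = sorted(proposals, key=lambda p: p[1], reverse=True)
    pos = [p for p in by_conf if p[2] > 0][:n_pos]    # at most Np positive proposals
    neg = [p for p in by_conf if p[2] < 0][:n_neg]    # at most Nn negative proposals
    return pos + neg
```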

3.3.2 Sequence learner

The sequence learner identifies the sequences of words significant to positive and negative expressions of intent in order to predict context labels. This technique is similar to the slur learner from the work of Gao et al. [22]. However, abusive language is a simpler construct, where a single term can cause a given context to be considered abusive. Intent is a more dependent concept, requiring multiple words in the right order to express it [2]. As such, the sequence learner uses word n-grams, and thus requires a context-sequence matrix rather than a context-term matrix (discussed in Section 3.1.4).

A sequence’s positive and negative label frequencies are computed by comparing its occurrences in positively and negatively labelled contexts. That is, for context i, each occurrence of a sequence j is summed to get the total number of occurrences within the context, denoted σi,j. Using the label for context i, ρi, the number of times a sequence appears in positive and negatively labelled contexts can be computed. The number of occurrences of sequence j in positive and negatively labelled contexts is denoted,

\pi^{+}_{j} = \sum_{i=0}^{N} \sigma_{i,j} \, \mathbb{1}_{\{\rho_i > 0.5\}}, \quad \text{and} \quad \pi^{-}_{j} = \sum_{i=0}^{N} \sigma_{i,j} \, \mathbb{1}_{\{\rho_i < 0.5\}},

respectively. The total number of sequence occurrences in positively and negatively labelled contexts is given by

\pi^{+} = \sum_{j=0}^{M} \pi^{+}_{j}, \quad \text{and} \quad \pi^{-} = \sum_{j=0}^{M} \pi^{-}_{j},

respectively. Together, these values can be used to compute a sequence's ratio between positive and negative frequencies. This can also be thought of as the sequence's normalized positive and negative rates, or

\alpha^{+}_{j} = \frac{\log(\pi^{+})}{\log(\pi^{-})} \cdot \frac{\pi^{+}_{j}}{\pi^{-}_{j}}, \quad \text{and} \quad \alpha^{-}_{j} = (\alpha^{+}_{j})^{-1},

respectively. The normalizing constant for each class (i.e. π+ and π−) is logged to reduce its sensitivity to class changes. Once computed, a sequence's rate indicates its significance for predicting a given label. To make a prediction using the sequence learner, the most important sequences must first be identified. For positive labels, the first step is removing all sequences where α+_j < 1.

This is done because a rate less than one indicates that the sequence is more prevalent in negatively labelled contexts.

The remaining rates are then used to compute a threshold to find the most significant sequences. The threshold, δ+, is calculated by taking the 99.9th percentile of the rates that are greater than one. The set of sequences predicted to be significant to intent is then given by,

S^{+} = \{\, j : \alpha^{+}_{j} > \delta^{+} \,\}.

The predictions for negative labels are made using the corresponding values for negative labels (i.e. α−_j and δ− instead of α+_j and δ+, respectively). Once the sequences significant to positively and negatively labelled contexts, S+ and S−, are identified, the learner can make predictions. To do this, the contexts containing a sequence in the positive or negative set are identified and labelled accordingly. However, if there is a collision, no label is applied. A collision occurs when a context contains sequence(s) related to both positive and negative labels. The contexts that the learner predicts to be positive are given by,

\{\, i : \sum_{j \in S^{+}} \sigma_{i,j} > 0, \; \sum_{j \in S^{-}} \sigma_{i,j} = 0 \,\}.
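A hedged sketch of the positive-sequence selection is shown below. It follows the rate definition as reconstructed in the equations above, with a sparse (or dense) context-sequence matrix `X` and current labels `rho` as assumed inputs.

```python
# A sketch of the sequence learner's rate computation and threshold selection for S+.
import numpy as np

def significant_sequences(X, rho, percentile=99.9):
    """X: context-sequence matrix (contexts x sequences); rho: current labels in [0, 1]."""
    pos_rows, neg_rows = rho > 0.5, rho < 0.5
    pi_pos_j = np.asarray(X[pos_rows].sum(axis=0)).ravel()       # occurrences in positive contexts
    pi_neg_j = np.asarray(X[neg_rows].sum(axis=0)).ravel()
    pi_pos, pi_neg = pi_pos_j.sum(), pi_neg_j.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        alpha_pos = (np.log(pi_pos) / np.log(pi_neg)) * (pi_pos_j / pi_neg_j)
    alpha_pos = np.nan_to_num(alpha_pos, nan=0.0, posinf=1e12)   # sequences never seen negatively
    over_one = alpha_pos[alpha_pos > 1.0]
    delta_pos = np.percentile(over_one, percentile) if over_one.size else np.inf
    return np.where(alpha_pos > delta_pos)[0]                    # indices of sequences in S+
```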

3.3.3 Deep learner

The other model that is used to make predictions about intent is the deep learner. As previously stated, it is built around fastText embeddings and a biLSTM-Attention architecture (see Table 3.2). The use of fastText embeddings allows significant amounts of semantic and syntactic information about the tokens to be passed to the deep learner. The network is then able to learn intent based on both the presence and the order of the tokens. Together, these technologies enable the learner to effectively

Table 3.2: Intent model architecture

Layer       Units   Input dimension   Output dimension
biLSTM      200     200 × 200         200 × 400
Attention   400     200 × 400         400
Dense       50      400               50
Dense       1       50                1

learn and make predictions about the presence of intent in contexts.

Training process

The deep learner’s training process has two phases, incremental learning and rein- forcement. In the incremental learning phase, the deep learner uses the current labels, ρ, to train its weights. Once this is complete, the reinforcement phase attempts to extrapolate on the current definition. Before starting the incremental learning phase, the model discards all the labels that are equal to 0.5 (i.e. unknown). The remaining labels will be used for training. Since the deep learner’s goal is to make boolean predictions, the provided labels must also be boolean. To achieve this, the remaining labels are rounded to 0 or 1. The deep learner also computes label weights, η, which represent the model’s confidence in the labels used for training. A given label weight is given by,

\eta_i = 2 \cdot \left| \tfrac{1}{2} - \rho_i \right|,    (3.4)

where ρi ∈ [0, 1] is context i’s current label. When training, the loss contributed by

context i is devalued based on the value of η_i. Using the boolean labels and computed weights, the model then trains over N_T randomly selected contexts, given by,

N_T = \min\!\left(250000, \; \sum_{i=0}^{N} \mathbb{1}_{\{\rho_i \neq 0.5\}}\right).

After completing the incremental learning, the deep learner enters the extrapolation phase. In this phase, the model proposes changes to the current labels to be used in the next round of training. The model starts by taking all the contexts (including those with unknown labels) and predicts whether they contain intent.

A context whose prediction is less than 0.02 or greater than 0.99 has its label (i.e. ρ_i) decremented or incremented by 0.1, respectively. By proposing changes to the current labels, the model extrapolates on the initial definition of intent. The proposed changes are then passed to the consensus mechanism.
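The two steps just described are small array operations; the following sketch, with assumed NumPy arrays for the labels and predictions, shows the weight calculation of Equation 3.4 and the thresholded 0.1 label proposals.

```python
# A sketch of the deep learner's label weighting (Eq. 3.4) and label-change proposals.
import numpy as np

def label_weights(rho):
    return 2.0 * np.abs(0.5 - rho)                      # confidence in each current label

def propose_changes(rho, predictions, low=0.02, high=0.99, step=0.1):
    proposed = rho.copy()
    proposed[predictions < low] = np.clip(proposed[predictions < low] - step, 0.0, 1.0)
    proposed[predictions > high] = np.clip(proposed[predictions > high] + step, 0.0, 1.0)
    return proposed
```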

3.3.4 Consensus

The consensus mechanism operates similarly to the sequence learner's collision checker when modifying labels. Any context whose label is not modified by either learner remains the same. If only one of the learners proposes a modification to a context's label, the change is accepted. If both learners propose a modification to a label, it is only accepted if both propose the same change. In other words, if the learners propose different modifications, both proposals are rejected. The resulting labels are then passed back to the sequence and deep learners to perform additional training rounds. The consensus mechanism also performs label locking, whereby it prevents a label from changing once it is assigned either 1 or 0. This is done since 1 and 0 effectively signify certainty that the context contains intent or non-intent, respectively. Thus, should a context's label be modified enough to reach what amounts to certain intent or

non-intent, it is taken to be correct and should no longer change. The locking's primary purpose is to prevent accidental modifications of the initial labels at the start of training.
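A minimal sketch of this rule, assuming the current labels and both proposals are NumPy arrays of the same length, is shown below.

```python
# A sketch of the consensus rule: accept a change proposed by one learner, accept
# agreeing changes from both, reject conflicts, and keep locked (0 or 1) labels fixed.
import numpy as np

def consensus(current, proposal_a, proposal_b):
    result = current.copy()
    locked = (current == 0.0) | (current == 1.0)
    changed_a = proposal_a != current
    changed_b = proposal_b != current
    agree = changed_a & changed_b & (proposal_a == proposal_b)
    only_a = changed_a & ~changed_b
    only_b = changed_b & ~changed_a
    result[only_a] = proposal_a[only_a]
    result[only_b] = proposal_b[only_b]
    result[agree] = proposal_a[agree]
    result[locked] = current[locked]        # label locking
    return result
```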

3.4 Abusive language

In order to make abusive language predictions, which enable abusive intent predictions, a model must be trained. However, unlike intent detection, there are existing labelled datasets for abusive language. As such, it suffices to construct a deep learning model and train it.

3.4.1 Training dataset

Since abusive language detection is a well-studied problem, there are several labelled datasets available. For the purposes of this work, the flavour of abusive language being used is hate speech. This contrasts with some datasets, which label a document as abusive simply if it contains profanity. To train a model with such a definition, three datasets for hate speech were combined (see Table 3.3). The first dataset is made up of a small set of documents, also from Storm-Front, making it ideal since the samples have the same word usage [16]. The second dataset is from a competition run by Impermium for detecting insults, which is also relevant to hate speech [30]. The final dataset is from a competition by Conversation AI to identify multiple types of abusive language in Wikipedia comments [31]. This is relevant because Wikipedia comments are styled similarly to social media but are not limited in length like some other data sources (for example, Twitter). Additional information regarding the datasets can be found in Section A.5. The datasets were then randomly mixed to ensure no sequential structure was present, creating a dataset with 240,846 samples (see Table 3.3).

Table 3.3: Abusive language dataset composition

Dataset            Size      Abusive Fraction
Storm-Front [16]   10,703    11.2%
Insults [30]       6,594     26.4%
Wikipedia [31]     223,549   9.9%
Ensemble           240,846   10.4%

To enable the model to work effectively on the primary dataset (i.e. Storm-Front), the word embeddings used are those trained on Storm-Front. These are the same ones used to train and make predictions with the intent model, as described in Section 3.1.5.

3.4.2 Network architecture

The model design takes cues from Leblanc and Yang et al., using word embeddings and a biLSTM-Attention architecture (see Table 3.4 for the specific layers) [38, 71]. Similar to the intent model, the abuse model uses word embeddings as its input. The model also truncates the cleaned documents at 200 tokens. The embeddings are then fed into a biLSTM, which is used for its ability to interpret the sequential relationships between tokens. The biLSTM uses 0.5 standard and recurrent dropout to help prevent the model from developing a dependence on any one token or dimension during training. An attention layer is then able to interpret the biLSTM's output while considering the entire input sequence. Finally, two dense layers reduce the size of the output dimension and make the final prediction.

Table 3.4: Abusive language model architecture

Layer       Units   Input dimension   Output dimension
biLSTM      200     200 × 200         200 × 400
Attention   400     200 × 400         400
Dense       50      400               50
Dense       1       50                1
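A minimal Keras sketch of this architecture is given below. The thesis does not specify the exact attention variant or the hidden activation, so the simple feed-forward attention layer and the ReLU activation here are assumptions; dimensions follow Table 3.4 (200 tokens by 200-dimensional fastText vectors).

```python
# A sketch of the biLSTM-Attention abuse classifier described above (assumed details
# are noted in the comments; this is not the author's exact implementation).
import tensorflow as tf
from tensorflow.keras import layers, Model

class SimpleAttention(layers.Layer):
    def build(self, input_shape):
        self.w = self.add_weight(name="w", shape=(input_shape[-1], 1),
                                 initializer="glorot_uniform")
    def call(self, inputs):
        # Score each timestep, softmax over the sequence, then take the weighted sum.
        scores = tf.nn.softmax(tf.squeeze(tf.matmul(inputs, self.w), axis=-1), axis=1)
        return tf.reduce_sum(inputs * tf.expand_dims(scores, -1), axis=1)

inputs = layers.Input(shape=(200, 200))        # 200 tokens x 200-d fastText vectors
x = layers.Bidirectional(layers.LSTM(200, return_sequences=True,
                                     dropout=0.5, recurrent_dropout=0.5))(inputs)
x = SimpleAttention()(x)                        # (200 x 400) -> 400
x = layers.Dense(50, activation="relu")(x)      # activation is an assumption
output = layers.Dense(1, activation="sigmoid")(x)

model = Model(inputs, output)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy", metrics=["accuracy"])
```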

3.4.3 Model training

To train the model, an 85-15 train-test split was used, or 204,719 documents for training and 36,127 for testing. The test set was also used during training to compute the validation loss (i.e. it did not directly influence the weights). The validation loss was computed at the end of each epoch to determine whether the training process should stop. This is called an early stopping condition and was done with a patience of three epochs, up to a maximum of 50 epochs. A patience of three means the model will continue training until the validation loss increases for three consecutive epochs, at which point it stops. If the training stops from the early stopping condition, the weights from the epoch with the best validation loss are used. Using patience allows the model to continue training even if the weights slightly degrade, with the hope that a more favorable set of weights can be reached. Early stopping is used because it enables the model to stop training if it begins overfitting to the training data provided. The Adam optimization method was used with the default parameters, or a learning rate of 0.001, beta one and two of 0.9 and 0.999, respectively, and an epsilon of 1 × 10^-7 [34].
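Continuing the model sketch from Section 3.4.2, the training configuration might look like the following. Note that Keras's EarlyStopping patience counts epochs without improvement, which is close to but not identical to the "three consecutive increases" description above; the batch size and the data variables are placeholder assumptions.

```python
# A sketch of the training setup: early stopping on validation loss with patience 3,
# best-epoch weight restoration, and a maximum of 50 epochs.
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                              restore_best_weights=True)
model.fit(X_train, y_train,
          validation_data=(X_test, y_test),
          epochs=50, batch_size=128,            # batch size is an assumption
          callbacks=[early_stop])
```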

3.5 Abusive intent

Since the abuse and intent models independently predict abuse and intent, once each prediction is made, a joint prediction for abusive intent can be calculated. There are several ways this can be done to best represent problematic contexts. These methods include using the cumulative distribution of the predictions, a norm, or a product. The distribution normalization approach is designed to best rank the documents as problematic when models with significantly different prediction distributions are used. A vector norm is best for identifying documents that rank highly in any or most of the categories. Finally, a product can be used to rank documents with all categories being valued. Regardless of the method used, the computed value will be normalized to lie in [0, 1].

3.5.1 Prediction generation

In order to compute the abuse and intent predictions, their respective models are used. The abuse model's predictions are made by using the deep learner and real-time embedding mechanism. Similarly, the intent learner's predictions are made using the deep learner and the real-time embedding mechanism. Since both models rely on the same embedded data, their networks can also be combined to improve performance and ease of use. This is accomplished by generating the embedded representation once then passing it to the abuse and intent networks at the same time. The composite network's architecture can be seen in Figure 3.8.

Figure 3.8: Abusive-intent network model

3.5.2 Distribution normalization

This approach to computing abusive intent is valuable if the component models have significantly different distributions. For example, if one binary classification model and one regression model were being used, then it could be beneficial to normalize the predicted values between the models. This is addressed by treating the model predictions as distributions and normalizing them using an approach inspired by probability theory. The proposed approach can be thought of as a discrete version of the probability integral transform. The transform takes any probability distribution with a valid cumulative distribution and converts it to a standard uniform distribution. To perform the transform, the model must have a probability mass function (pmf), f. Then, for a finite set of source documents, D, an empirical distribution function (EDF), F̂, can be derived.

This is an estimation of a cumulative distribution function (CDF), F, derived from experimental data. Specifically, the EDF is given by,

\hat{F}(\hat{d}) = \frac{1}{|D|} \sum_{d \in D} \mathbb{1}_{\{f(d) < f(\hat{d})\}}.

Assuming the abuse and intent predictions are independent, the joint EDF for abusive intent can then be estimated as the product of the component EDFs,

\hat{F}_{AI}(\hat{d}) = \hat{F}_{A}(\hat{d}) \cdot \hat{F}_{I}(\hat{d}),    (3.5)

where F̂_A and F̂_I are the estimated cumulative distributions for abuse and intent, respectively. The independence assumption is reasonable since, linguistically, both abuse and intent can be expressed with or without the other. However, since F̂_AI is the estimated joint cumulative distribution, the data source could be such that this is not the case. Another advantage of the normalization method is that it is order preserving among the documents, or

f(d_a) < f(d_b) \implies \hat{f}(d_a) < \hat{f}(d_b), \quad a, b \in \mathbb{N}_{\le |D|},

where f̂ is the transformed version of the model. This means that a document's abusive intent value may be altered, but not its rank within the component predictions (i.e. abuse and intent). In practice, the transform could be applied to a subset of the composite predictions if the model's predicted values have no significant meaning other than their ranking. However, since the transform is dependent on the set of source documents, the resulting values are also based on the data present.

As such, if all the documents are ranked highly before the transform is applied, it could result in some documents being demoted and ignored, which is not desirable for the application.
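A small sketch of the discrete transform is given below; the input arrays of abuse and intent predictions are placeholders, and the fraction counted strictly below each prediction follows the EDF as reconstructed above.

```python
# A sketch of the distribution normalization: replace each prediction by the fraction
# of documents with a smaller prediction, then multiply the two transformed components.
import numpy as np

def edf_transform(preds):
    preds = np.asarray(preds)
    return np.array([np.mean(preds < p) for p in preds])   # fraction strictly below

def joint_abusive_intent(abuse_preds, intent_preds):
    return edf_transform(abuse_preds) * edf_transform(intent_preds)   # Eq. 3.5
```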

3.5.3 Vector norm

The predictions associated with a given document can also be thought of as independent dimensions of the document. Within a vector space, the magnitude of the resulting vector can be computed using a norm. The abusive intent score can thus be computed by taking the norm of the abuse and intent predictions. Specifically, given a document's component predictions, p_i, its norm is denoted ||p_i||. However, there are multiple norms that can be used to perform this calculation. Namely, for this application, there are the one, two, and infinity norms. Note that the normalized forms of these norms are given by,

\frac{\|p_i\|}{\|\mathbf{1}_v\|},    (3.6)

where \mathbf{1}_v is the all-ones vector of the same dimension.

One norm

The one norm, or Manhattan norm, given by,

\|v\|_1 = \sum_{v_i \in v} |v_i|,    (3.7)

is simply the sum of the component predictions (since each prediction is strictly positive). Once normalized (as given by Equation 3.6), the resulting value lies in [0, 1], making the one norm equivalent to an average (see Figure 3.9 for a visualization). This norm is beneficial when the documents being searched for contain a reasonable amount of both intent and abuse.

Figure 3.9: Visualization of the normalized one norm

Two norm

The two norm, or Euclidean norm, given by,

\|v\|_2 = \sqrt{\sum_{v_i \in v} v_i^{2}},

is the length of the vector in a Cartesian coordinate system. Like the one norm, the two norm also identifies documents that contain a large amount of both intent and abuse. However, as the component values grow, the increase to the combined value is reduced (see Figure 3.10 for a visualization). This translates to the norm scoring a document with a high score in one component and a moderate score in the other closer to the top, as compared to the one norm. This placement is valuable since not every potential document of interest will be predicted to contain very high levels of both. Thus, by using such a transform, those documents would be less likely to be missed, as compared to the one norm.

Figure 3.10: Visualization of the normalized two norm

Infinity norm

The infinity norm, given by,

\|v\|_{\infty} = \max_{i} v_i, \quad \forall\, i \in \{i : v_i \in v\},

simply takes the highest component prediction. This also means that no normalization is required since the range of the function is that of the domain, which is [0, 1]. Unlike the other two norms, this one will rank a document highly if just one of its component predictions is high. This has the advantage of potentially identifying more problematic documents. However, it would also result in a significantly larger set of documents being flagged, producing a large number of false positives. This also undermines the goal of being able to rank documents in order of their likelihood of containing abusive intent (since it does not actually consider the components as one prediction). As discussed in Section 3.5.2, the presence of abuse and intent are independent, which is not taken into account by this metric.

Figure 3.11: Visualization of the normalized infinity norm

3.5.4 Product

Though there are advantages to using a norm, taking the product of the component predictions is also desirable. The transform is denoted,

\|v\|_{\times} = \prod_{v_i \in v} v_i,

which multiplies all of the component predictions together. Note that this is not a valid norm, since ||v||_× = 0 does not imply v = 0; the notation is used purely for continuity. This transform is valuable since, for a document to score highly, both the abuse and intent predictions must be high. The advantage of this result is that only the documents containing large amounts of abuse and intent are ranked highly (see Figure 3.12). For the application, this is desirable compared to the vector norms since it provides a more refined set of documents to review. Additionally, the ranking of the documents is done in a more intuitive manner. For example, a document containing significant amounts of intent but not abuse would not be flagged. This would correspond to someone expressing an intent to do something benign, which is not of concern given the application. As such, this is the method that was used for computing abusive intent from its component predictions.

Figure 3.12: Visualization of the normalized product transform
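For concreteness, the sketch below computes the combination methods discussed in this section for a single abuse/intent prediction pair; the values are illustrative only.

```python
# A small comparison of the normalized one, two, and infinity norms and the product
# transform for one [abuse, intent] prediction pair.
import numpy as np

p = np.array([0.9, 0.3])                               # [abuse, intent] predictions

one_norm = np.sum(np.abs(p)) / len(p)                  # normalized one norm (average)
two_norm = np.sqrt(np.sum(p ** 2)) / np.sqrt(len(p))   # normalized two norm
inf_norm = np.max(p)                                   # infinity norm (already in [0, 1])
product  = np.prod(p)                                  # chosen abusive-intent score

print(one_norm, two_norm, inf_norm, product)           # 0.6, ~0.671, 0.9, 0.27
```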

3.6 Validation

Once abuse and intent predictions have been made, their accuracy must be validated. This is to quantify the degree to which the model can be trusted. Since abusive intent predictions are the product of abuse and intent predictions, the composite model's accuracy can be estimated using the accuracies of the component models. Since the abusive language model was trained using labelled data, a validated accuracy and loss have already been calculated for it. The intent model, however, requires labels to be collected in order to determine its accuracy. Once collected, the crowd-sourced labels can be refined and used to score the accuracy of the model. To facilitate the collection, refinement, and use of labels, several steps took place. Before collecting labels, a dataset was selected that would best quantify the accuracy of the model. To collect the labels, a tool was constructed that presents volunteers with a bias-free interface for performing the labelling. To help ensure the accuracy of the labels, validation questions were presented to the volunteers, in addition to performing an annotator agreement calculation. Finally, the validated labels can be used to estimate the accuracy of the intent model.

3.6.1 Data generation

Given the previous justifications for using the Storm-Front dataset for training the intent model, it was also used as the source for the validation data. A random subset of 250,000 documents from Storm-Front was taken, cleaned, split into their component contexts, then used to train an early version of the intent model. Once the model was trained, it was used to make predictions about the presence of intent in the same contexts. The contexts were then partitioned into zones based on the value of their intent predictions.

Table 3.5: Qualifying contexts

Intent label   Context
True           we will go get answers from them
True           got a new lawn mower im going to build a shed to keep it in next week
True           my friend thought it would be funny to throw my phone across the room so now im have to go get it repaired
False          what
False          why dont you just go back to twitter

The zones were 2.5% wide and contiguous across [0, 0.4] ∪ [0.6, 1]. The hole in the middle was used because the predictions of interest are those which tend towards positive or negative. From these zones, 5000 total contexts, or 200 contexts from each zone, were selected and randomly shuffled. This was done primarily to help ensure the validation data has a more even class distribution. In addition to the shuffled contexts selected from the dataset, validation contexts were added. These were written so they specifically demonstrated intent or non-intent (see Table 3.5 for the qualifying contexts). They were intended to be easy to label and served to ensure that a volunteer was not simply clicking through the contexts.

3.6.2 Label collection

Label collection was done using a web application that allows volunteers to anonymously label data. It was decided that the system should allow anonymous labelling to help prevent bias. Volunteer anonymity was also important for ethical considerations and clearance [50]. A web application enables volunteers to perform the labelling anonymously from anywhere (for example, compared to doing it in person). Additionally, since it was publicly accessible, people from a wider geographic pool were able to participate in the study.

The labelling site used a ReactJS frontend, a JavaScript backend, and an SQLite data layer. The frontend was written to be as lightweight as possible while presenting a clear and clean interface to volunteers. The backend was similarly built to be as lightweight as possible, enabling it to quickly and smoothly serve as many volunteers as possible. Finally, SQLite was used because of its small memory and storage requirements, enabling regular and simple backups of the labelling data. Implementation details can be found in Section C. After acknowledging the ethics statement, volunteers' computers were transparently assigned an anonymous ID by the site. This enabled the site to ensure that a volunteer was not given the same context to label more than once and that they were not given more than 30 contexts to label in a 12-hour period. This was done since multiple labels on the same context by the same person would not reinforce confidence in the label but could reinforce errors (should they exist). Volunteers were also limited in the number of contexts they could label to help ensure a focused assessment of the contexts was performed. When labelling, the site transparently sent contexts in sets of six. In each set, five of the six were contexts requiring labels, with the last being a qualifying question. The order in which these were presented was randomized to mask the identity of the qualifying question (in addition to not informing volunteers about the presence of qualifying questions). Once returned to the server, the labels were related to the qualifying label to enable checking of the volunteers' capacity to label properly. The site also ensured that each context received an adequate number of labels. Specifically, each context was labelled using a first-to-three-out-of-five voting system.

3.6.3 Label validation

Once the labelling period concluded, the labels were assessed. Any labels submitted in a set whose qualifying context was answered incorrectly were discarded. This was done because, if someone could not label a simple example, it was reasoned that they could not be expected to properly label a (potentially) complex context. Additionally, volunteers could skip contexts if they wished, with the “SKIP” labels also being discarded. Validated labels were then formed through a first-to-three voting system using the remaining labels. This allows boolean labels to be derived using the voting method. The votes can also be used to compute an effective label for a given context.

For the i-th context with v+_i and v−_i positive and negative votes, respectively, the effective label is given by,

l_i = \frac{v^{+}_{i}}{v^{+}_{i} + v^{-}_{i}}.    (3.8)

The computation of this label allows the uncertainty of the markers to be communicated. For example, if all the votes were positive or all were negative, then it is likely that the context clearly contains intent or not. However, should a context receive three positive votes and two negative, for example, then the expression of intent may not have been clear. The effective label, l_i, quantifies this by giving such a context a label of 3/5 = 0.6 rather than 1.

Using this strategy, a weighting can be assigned to each sample to compute a weighted accuracy. This weighting is done by using the effective label as a probability representing the confidence in the label. Specifically, a label's weight is given by,

\chi_i = \begin{cases} 1 - l_i & l_i \in [0, 0.5) \\ l_i & l_i \in [0.5, 1] \end{cases} \;=\; \frac{\max\{v^{+}_{i}, v^{-}_{i}\}}{v^{+}_{i} + v^{-}_{i}}.    (3.9)

The visualization of this weighting is also shown in Figure 3.13.

Figure 3.13: Sample weight based on effective label

3.6.4 Prediction validation

Once labels have been computed, the accuracy of the model can be calculated. The model’s accuracy is computed by taking the number of correctly predicted context labels divided by the total number of contexts. Specifically, this is given by,

\gamma = \frac{1}{N} \sum_{i=0}^{N} \mathbb{1}_{\{\lfloor l_i \rceil = \lfloor p_i \rceil\}},    (3.10)

where the rounding notation is defined as

\lfloor x \rceil := \lfloor x \rfloor + \mathbb{1}_{\{x \ge 0.5\}}.

As noted above, some of the labels do not have the same degree of confidence. This reduction in confidence around those labels can be expressed using the weighted accuracy metric,

\bar{\gamma} = \frac{1}{\|\chi\|_1} \sum_{i=0}^{N} \chi_i \cdot \mathbb{1}_{\{\lfloor l_i \rceil = \lfloor p_i \rceil\}}.    (3.11)

It should be noted that this metric upholds many of the properties of a standard accuracy value; namely, the result also resides in [0, 1]. However, it differs by devaluing less certain labels, including those the model predicted correctly. This is important because it does not strictly improve the accuracy of the results, but rather emphasizes the contribution of the ones with confident labels.

For example, using traditional accuracy, if there are N questions with N/2 answered correctly, the accuracy would be 0.5 or 50%. However, if the N/2 correct predictions correspond to labels with confidences of 0.6 (with the remainder having confidences of 1), then the weighted accuracy would be 0.375 or 37.5%. Conversely, if the N/2 correct predictions correspond to labels with confidences of 1 (and the remainder having confidences of 0.6), then the weighted accuracy would be 0.625 or 62.5%.

3.7 Document aggregation

Once the abusive intent predictions are computed for contexts, they can be aggregated to get a document-level prediction.

This is important for the application because, while a single context may contain the expression of abuse and/or intent, the document should still be considered as a whole. For example, one context may contain a statement of abusive intent while the surrounding ones contain the person's justification for the intended action. Conversely, someone may make a single expression of abusive intent in a large document, potentially diluting its potency. As such, it is important to identify contexts that contain high levels of abusive intent, but also to relate those scores back to their parent documents appropriately. There are several ways this can be done, namely computing the average, the maximum, or a windowed maximum.

3.7.1 Average

Taking the average abusive intent prediction across the child contexts would account for dilution across a document. However, this could allow such a system to be easily attacked. For example, if an adversary (i.e. someone expressing abusive intent) wanted to hide the statement, they could simply paste benign text after it. As such, for this application an average across contexts would make the system too vulnerable to adversarial inputs and is not beneficial.

3.7.2 Maximum

The maximum would remedy this by simply scoring a document based on its highest-ranking context. This means that any document contains as much abusive intent as its most powerful context. This makes sense in relation to the application, where the current threat from a person is whatever their most extreme statement is. Since a product is used to compute abusive intent, using a maximum is beneficial.

This is because the use of a product results in a lower-valued abusive intent score than the other norms presented. As such, if there is an instance where the product has a high value, it should be identified, since the text is likely to be of interest. However, should there not be high levels of both abuse and intent in a single context, the document would not be identified.

3.7.3 Windowed maximum

In order to identify documents that contain abuse and intent, but not necessarily in the same context, a windowed approach was developed. Within a document or paragraph, sentences can refer to and/or continue the thought of another. As such, if someone were expressing abusive intent, there could be two sentences that each contain one of abuse and intent, but not both. To identify such cases, a windowed maximum approach can be used. This starts by finding the maximum abuse and intent scores between a context and its neighbors (i.e. a window of 3). The product of these maximum scores is then taken, yielding the same benefits as discussed in Section 3.5.4. A document's windowed score can then be computed by taking the maximum of the windowed products. In addition, this enables documents containing abusive intent across sentences to be identified. It also helps reduce the product transform's zeroing property (as briefly discussed in Section 3.5.4). As a result of the transform's numerous benefits, it was chosen as the primary method for aggregating documents.
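A minimal sketch of this aggregation is shown below; the per-context abuse and intent scores are placeholder lists, and the window of three follows the description above.

```python
# A sketch of windowed-maximum aggregation: take the running maxima of abuse and
# intent over a three-context window, multiply them, then take the document maximum.
import numpy as np

def windowed_abusive_intent(abuse, intent, window=3):
    abuse, intent = np.asarray(abuse), np.asarray(intent)
    half = window // 2
    scores = []
    for i in range(len(abuse)):
        lo, hi = max(0, i - half), min(len(abuse), i + half + 1)
        scores.append(abuse[lo:hi].max() * intent[lo:hi].max())   # product of windowed maxima
    return max(scores)

# Abuse and intent expressed in neighbouring contexts still score highly.
print(windowed_abusive_intent([0.1, 0.9, 0.2], [0.8, 0.1, 0.1]))  # 0.72
```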

Chapter 4

Results

To reliably detect expressions of abusive intent in social media text, the strategies described in Chapter 3 were implemented and executed. The results from the execution of those techniques are presented in sections corresponding to those of Chapter 3. Note to the reader: this chapter contains examples taken directly from the datasets previously mentioned. These datasets contain hate speech and explicit language. If this is an issue, then skip to Chapter 5 on Page 116 now.

4.1 Data preparation

Before being able to train the models, the data required to do so must be prepared. Specifically, the data must be pre-processed, and specific structures must be pre-computed. The pre-processing required includes the splitting of documents, the generation of a context-sequence matrix, and the training of the word embeddings.

4.1.1 Data cleaning

As discussed in Section 3.1, several datasets are required in order to successfully train the entire abusive intent model.

These include Storm-Front and Wikipedia for the intent model, the composite dataset for the abuse model, and additional datasets for analysis. Once selected and collected, the pre-processing steps (outlined in Section 3.1.2) were applied to the data. The resulting data contains only lowercase characters and punctuation. The amount removed from each dataset can be seen in Table 4.1.

Table 4.1: Dataset character lengths before and after processing

Dataset                          Unprocessed length   Processed length   Removed
Full Storm-Front (intent) [70]   252,968,165          141,192,445        44.1%
Wikipedia (intent) [68]          63,228,684           60,372,420         4.5%
Abuse ensemble                   91,742,800           87,291,993         4.8%
Iron March [63]                  9,668,405            7,827,782          19.0%
Manifesto [62]                   98,642               96,782             1.9%

It can be seen that two of the datasets had significantly more content removed compared to the others. The first is the Storm-Front dataset used for training the intent detection model. The second is the Iron March dataset, which is also used for performing further qualitative assessment of the abusive intent model. These two had significantly more removed since they were both collected from the Internet with little to no pre-processing originally applied and contained significantly more formatting-related content (for example, HTML tags). This contrasts with the Manifesto, which was distributed as a text document. Another benefit of processing the documents is the reduction in the number of unique tokens. For example, the tokens “you're” and “youre” would both become the latter after the processing is applied. In cases like this one, instead of having two unique tokens, only a single token will be present when the data is fed to the model. The cumulative distribution of unique tokens can be seen in Figure 4.1, which can be

seen to roughly correspond to Zipf’s Law.

Figure 4.1: Cumulative distribution of unique tokens within processed Storm-Front dataset

Once processed, a random subset of 250,000 documents was selected from the Storm-Front dataset. Similarly, a random subset of 25,000 articles was selected from the Wikipedia dataset. Moving forward, when the “Storm-Front” or “Wikipedia” datasets are referenced, it is these random subsets that are meant. A histogram of the document lengths before and after processing can be seen in Figure 4.2.

4.1.2 Document partitioning

As discussed in Section 3.1.3, when training the intent model, contexts were used instead of whole documents. Once cleaned, the Storm-Front and Wikipedia datasets

(a) Histogram of the number of characters in the unprocessed Storm-Front documents

(b) Histogram of the number of characters in the processed Storm-Front documents

Figure 4.2: Histogram of Storm-Front document lengths before and after being pre-processed

(as seen in Table 4.1) were combined and split into their component contexts at sentence terminators. Specifically, the combined dataset's 268,486 documents resulted in 2,112,719 contexts. A histogram showing the number of contexts per document can be seen in Figure 4.3. An example of the component contexts of a Storm-Front document is:

1. why would i want to

2. i ve heard it s the boil on the armpit of america

3. everyone i ve met who came from ohio left me with a strange and uncomfortable feeling

4. their not very friendly

where the parent document is

Figure 4.3: Histogram of the number of contexts per document in the intent corpus

why would i want to? i’ve heard it’s the boil on the armpit of amer- ica. everyone i’ve met who came from ohio left me with a strange and uncomfortable feeling. their not very friendly.

An example from a Wikipedia article is,

1. the korean native pig is a breed of domestic pig indigenous to korea

2. the meat is preferred to that of imported breeds and is of a darker red colour

3. characteristics the korean native pig was reported to have glossy black hair a dished face a greatly protruded mouth big eyes straightly upright ears round shoulders a narrow rear back a wide chest long hips well balanced short legs and teats

4. its major characteristics are high propagating power superior meat quality and strong adaption ability

5. its growth rate and feed conversion ratio are lower than in imported breeds but it yields meat of higher quality and adapts better to extensive management

where the source document is,

the korean native pig is a breed of domestic pig indigenous to korea. the meat is preferred to that of imported breeds and is of a darker red colour. characteristics the korean native pig was reported to have glossy black hair, a dished face, a greatly protruded mouth, big eyes, straightly upright ears, round shoulders, a narrow rear back, a wide chest, long hips, well-balanced short legs and - teats . its major characteristics are high- propagating power, superior meat quality and strong adaption ability. . its growth rate and feed conversion ratio are lower than in imported breeds, but it yields meat of higher quality and adapts better to extensive management.

4.1.3 Context-sequence matrix

As described in Section 3.1.4, a context-sequence matrix is computed before starting the training process. Before computing the matrix, the number of unique sequences to retain, M, was determined to be 500,000 based on the cumulative distribution of sequences (see Figure 4.4). This was determined by computing the full matrix (i.e. one with all M′ sequences) using machine A (see Appendix B). The computation of the full matrix used a maximum of 88 GB of RAM and took approximately 23 minutes to complete.

Stored as a sparse matrix, the full matrix has a sparsity, or percentage of empty values, of 99.99994%. When only the M, or 500,000, most common sequences are kept, the sparsity decreases to 99.99887%.

Figure 4.4: Cumulative distribution of sequences in the intent corpus

4.1.4 Word embeddings

The word embeddings were trained over the complete pre-processed Storm-Front dataset using machine C (see Appendix B for details about machine C). The training process took roughly 4.5 days to train 200-dimensional word vectors using the Skip-gram architecture. By training custom word embeddings, their placement better represents how the words are used in the corpus.

For example, Table 4.2 shows the 25 word vectors with the smallest cosine distance to “liberal”. As would be expected from an extremist right-wing site, some of the similar words are insults. The token formatting in Table 4.2a also reflects the difference in pre-processing between the standard and custom embeddings (and how the model will see the tokens). These collections were computed using fastText's command line interface for finding nearest neighbors.
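An equivalent query can be made from Python with the fastText bindings, as sketched below; the model path is a placeholder, and note that this call reports cosine similarity rather than the distance values listed in Table 4.2.

```python
# A sketch of reproducing the neighbour lists in Table 4.2 with the fastText Python
# bindings instead of the command line interface.
import fasttext

model = fasttext.load_model("custom.bin")   # placeholder path to the trained embeddings
for similarity, word in model.get_nearest_neighbors("liberal", k=25):
    print(f"{word}\t{similarity:.3f}")
```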

4.2 Label generation

Before training the intent detection model, a collection of initial labels was required (as detailed in Section 3.2). The rough labels were computed by applying the linguistic model to the contexts previously prepared. The rough labels were then refined to produce the initial labels that are presented to the intent models.

4.2.1 Rough label generation

The rough labels were computed using machine A (see Appendix B for details). This was completed in 24 minutes when run over the 2,112,719 prepared contexts. During the computation, the spaCy parser was used to generate the PoS and dependency tags for a given sentence. This creates a directed graph that is then traversed to check for the conditions specified in the linguistic template. An example of a graph for a context that was marked as containing intent can be seen in Figure 4.5. In the example, the model identified the words “I'll find”, which is an example of the short form of intent, or “I will X”. The rough labels classified 644,144 of the 2,112,719 contexts as either positive or negative. Overall, 1.4% of the contexts were marked as containing intent, while 29.1% were marked as containing non-intent or objective text.

Table 4.2: Words closest to “liberal” in custom and default fastText embeddings

(a) 25 words closest to "liberal" in custom embeddings    (b) 25 words closest to "liberal" in default fastText embeddings [44]

Word                  Cosine Distance    Word                  Cosine Distance
justaliberal          0.734              super-liberal         0.705
liiberal              0.734              pseudo-liberal        0.705
muticulturalist       0.734              liberal-progressive   0.712
egalitarian           0.735              pro-liberal           0.713
liberalminded         0.737              centrist              0.714
ozliberal             0.738              liberal-minded        0.715
ultraliberals         0.739              rightwing             0.721
liberall              0.739              ultraliberal          0.725
milticulturalist      0.741              liberal-              0.738
conservativeliberal   0.756              left-leaning          0.742
socioliberal          0.757              libertarian           0.745
iliberal              0.757              moderate-liberal      0.748
liberals              0.758              liberalism            0.750
liberalistic          0.760              liberalist            0.750
lefty                 0.762              conservatives         0.751
lberal                0.763              non-liberal           0.751
libtarded             0.773              left-wing             0.752
ultraliberal          0.784              leftwing              0.753
leaningliberal        0.791              right-wing            0.754
libtard               0.791              hyper-liberal         0.762
multiculturalist      0.792              left-liberal          0.763
lliberal              0.793              ultra-liberal         0.768
liberalist            0.803              leftist               0.792
leftist               0.814              liberals              0.799
gliberal              0.828              conservative          0.822

All of the contexts with expressions of intent are from Storm-Front text. Objective contexts, or those from Wikipedia, account for 25.4% of the contexts and comprise 87.2% of the negative rough labels (i.e. those with a label of 0). The class distribution of the rough labels is also expressed in Figure 4.6.

Figure 4.5: Example of a dependency graph for a context

Figure 4.6: Class distribution of the rough labels

4.2.2 Label refinement

After the rough labels are computed, it is important that they are refined so that the weaker expressions of intent are removed. This is done to give the model the best chance to extrapolate and learn to identify intentful statements. As discussed in Section 3.2.3, there are several methods that can be used to accomplish the refinement; those of interest here are dictionary, tree, and spatial refinement. Recall that the objective of the refinement is to identify a subset of the desire verbs that indicate strong intent.

Contexts whose rough label is positive but whose desire verb is not part of this set are re-labelled as unknown. Each refinement method was primed with the same initial set of seed desire verbs: {"want", "need", "going", "have", "about", "plan", "will"}.

Dictionary refinement

Dictionary refinement was performed using the Empath library to expand the set of acceptable desire verbs. Starting from the seed set, Empath expanded it to 55 tokens using its default parameters. When used to perform the refinement, 17.5% of the positive rough labels remained labelled positive. However, some of the verbs in the expanded set could be considered weak. For example, the terms "might" and "maybe" could be part of statements where the author is not committed to the action. An example of a context with "might" as the desire verb is,

it is a day to day guide to acting legally in whatever small ways we might to resist the system s genocidal program against whites.

As could be expected, this does not appear to express intent and as such should likely be removed from the refined labels.
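A minimal sketch of the expansion step, assuming the Empath library's create_category interface with its default parameters (the category name is illustrative), is:

```python
from empath import Empath

lexicon = Empath()
seed_verbs = ["want", "need", "going", "have", "about", "plan", "will"]

# Build a new lexical category from the seed desire verbs; with Empath's
# default parameters this expansion produced 55 tokens in this work.
lexicon.create_category("desire_verbs", seed_verbs)
expanded_verbs = lexicon.cats["desire_verbs"]
print(len(expanded_verbs), expanded_verbs[:10])
```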

Tree refinement

Tree refinement was also performed on the rough labels. Using the word embeddings for the seed set of verbs, the smallest sub-tree containing all of them was identified. This sub-tree can be seen in Figure 4.7 and contains a refined set of 83 terms. If used to refine the rough labels, this results in 83.6% of the positive labels being retained.

Some of the sub-trees produce interesting groupings of terms. For example, one of the turquoise sub-trees has "going" and "expect" closest to each other, which intuitively makes sense (since "I am going to..." is similar to "I expect to..."). The tree then individually adds "make" and "look", which seem to decrease in strength of commitment moving away from the central sub-component of the sub-tree (i.e. "going" ≈ "expect" > "make" > "look").

Figure 4.7: Minimum spanning sub-tree for desire verbs
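One way the sub-tree extraction could be realized, assuming a hierarchical clustering tree built over the candidate verbs' embedding vectors, is sketched below; the exact tree construction used in Section 3.2.3 may differ, so this is illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import pdist

def smallest_subtree_terms(words, vectors, seed_verbs):
    """Terms under the smallest cluster containing every seed desire verb."""
    tree = to_tree(linkage(pdist(vectors, metric="cosine"), method="average"))
    seed_ids = {words.index(verb) for verb in seed_verbs}

    def leaves(node):
        if node.is_leaf():
            return {node.id}
        return leaves(node.get_left()) | leaves(node.get_right())

    # Descend from the root while a child still contains every seed verb.
    node = tree
    while not node.is_leaf():
        if seed_ids <= leaves(node.get_left()):
            node = node.get_left()
        elif seed_ids <= leaves(node.get_right()):
            node = node.get_right()
        else:
            break
    return [words[i] for i in leaves(node)]
```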

Spatial refinement

The final proposed method for refining the rough labels is spatial refinement using a hyper-cube or hyper-cone. As the hyper-cone is the selected strategy for the refinement, the cube method will be compared against it. When using a hyper-cone with a tolerance of β = 2, a total of 596 desire verbs were kept, corresponding to 95.3% of the positive labels. The hyper-cube then had its tolerance adjusted to identify (roughly) the same number of verbs as the hyper-cone technique.

It was found that a tolerance of 6 accomplished this, retaining 587 tokens, also corresponding to 95.3% of the positive labels. However, there are desire tokens that are missed by cube refinement (as compared to the cone method). For example, cone refinement identifies "seek", "aiming", and "offering", while cube refinement does not. When the cube is formed, it computes the farthest distance between the seed vectors in each dimension (as given in Equation 3.3). Interestingly, when the dimension widths are plotted in a histogram they appear to have a bimodal distribution (see Figure 4.8). This could indicate that one set of dimensions is responsible for desire-related themes (i.e. those with low variation), with the remaining dimensions corresponding to other attributes (for example, tense). If accurate, this could enable the cube method to refine the desire verbs while taking such information into account.

Figure 4.8: Histogram of the hyper-cube widths for desire verbs

When the hyper-cone is used, the maximum cosine distance, 2 · max, is 0.81, with minimum and maximum magnitudes of 4.5 and 19.9, respectively. These distances are similar to the cosine distances in Table 4.2, showing that the cone method identifies other vectors within the range where synonyms would normally be found. This supports the validity of the proposed method of finding verbs based on their distance from an averaged word vector.
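A rough sketch of how such a cone could be applied is given below; it assumes the cone is defined by the averaged seed vector, an angular tolerance of β times the widest seed's cosine distance from that average, and the range of seed magnitudes. The exact formulation in Section 3.2.3 may differ, so this is illustrative only.

```python
import numpy as np

def cone_refine(words, vectors, seed_vectors, beta=2.0):
    """Keep the words whose vectors fall inside the hyper-cone."""
    axis = seed_vectors.mean(axis=0)
    axis_unit = axis / np.linalg.norm(axis)

    def cosine_distance(vector):
        return 1.0 - (vector @ axis_unit) / np.linalg.norm(vector)

    # Angular opening: beta times the widest seed's distance from the axis.
    angular_limit = beta * max(cosine_distance(v) for v in seed_vectors)

    # Magnitude bounds taken from the seed vectors themselves (assumed).
    magnitudes = np.linalg.norm(seed_vectors, axis=1)
    low, high = magnitudes.min(), magnitudes.max()

    return [
        word
        for word, vector in zip(words, vectors)
        if cosine_distance(vector) <= angular_limit
        and low <= np.linalg.norm(vector) <= high
    ]
```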

4.3 Extrapolation

Once the rough labels were generated and refined, they could be used to train the models. After each epoch, each model is able to learn and then expand its definition of intent, thereby increasing the number of labels it can train on during the next epoch. The number of training examples available in each epoch can be seen in Figure 4.9. However, when training the deep learner, the number of training documents is limited to 250,000, randomly chosen in each epoch to prevent overfitting.

4.3.1 Sequence learner

The first model in the extrapolation process is the sequence learner (see Section 3.3.2). Throughout the training process, the significant sequences change according to the current set of agreed-upon labels, ρ. By considering the high-rate sequences, it can be seen that the results from the sequence learner should be relatively stable and reliable: throughout the training process, the highest, and potentially most volatile, rates remained largely constant regardless of changes to the labels. The positive and negative rates can be seen in Figure 4.10.

Figure 4.9: Number of non-unknown contexts available for training in each epoch

The sequences, in addition to their rates, can be assessed to help ensure the model is working properly. The 15 highest-rated sequences at the end of each epoch can be seen in Table 4.3. As would be expected based on the stability of the rates, many of the sequences in this table are likely constant between the start and end of training. However, it can be seen that some sequences change ranking (for example, "I must say"), while others disappear from the table completely (for example, "must say that"). Interestingly, versions of the going-to form (for example, "i want to know", "i fail to", etc.) are present in this list; however, the standard form (i.e. "I am going") is not, as it has a rate of 55. This is likely because there are documents containing sequences such as "if i am going to", which reduces the rate of the sequence "i am going to". A similar effect does not occur as often for the sequences in Table 4.3, since many of them would not be part of a conditional statement (for example, "if i will give" likely does not occur often).

(a) Change in significant positive sequence rates during training, grouped by percentile    (b) Change in significant negative sequence rates during training, grouped by percentile

Figure 4.10: Positive and negative sequence rates throughout training

4.3.2 Deep learner

Similar to the sequence learner, the deep learner's predictions, and hence its objective, change throughout the training process. This reflects the changes in the agreed-upon labels and the network weights. The model's confidence in its predictions, reflected as loss and accuracy at the end of each epoch, can be seen in Figure 4.11. The loss remains quite low throughout the training process. This is achieved because of the number of training examples in each epoch and the use of non-uniform loss weightings as the model is introduced to new examples. It should also be noted that these values are not validation accuracy and loss, but rather the accuracy and loss calculated using the current labels, ρ.

Table 4.3: 15 sequences with the highest positive rates after 1 vs. 20 epochs

(a) 15 sequences with the highest positive rates after 1 epoch    (b) 15 sequences with the highest positive rates after 20 epochs

Sequence           Positive Rate      Sequence           Positive Rate
i want to know     211.80             i must say         315.94
i fail to          158.52             i ll just          230.48
i must say         157.87             i want to know     211.06
i will give        153.32             i ll tell you      170.92
i must admit       153.32             that we need to    169.63
that we need to    150.73             i fail to          157.97
and we need to     144.23             and we need to     157.97
i ll go            136.43             i will try         154.09
i fail to see      129.94             i will say         153.44
i ll post          120.84             i will give        152.79
i must say that    113.04             i must admit       152.79
must say that      113.04             i ll go            135.96
i ll just          105.25             and i want to      134.66
i ll keep          105.25             but i want         130.78
i will make        105.25             i fail to see      129.49

4.3.3 Consensus

Each epoch, after the two models have learned and proposed changes to the labels, the consensus mechanism combines them. This results in a new collection of agreed-upon labels, whose movement between epochs can be seen in Figure 4.12. The stacked graph shows the percentage of contexts with a given label at the end of each epoch (i.e. 0, 0.1, 0.2, ..., 1). It can be seen that the change to the positive labels is largely imperceptible due to the scale of the graph. However, the proportion of negative labels can be seen to increase over the training period. This is important because, as the model extrapolates, it is better able to make informed predictions about the contexts. Since the goal is to rank documents based on their content, extrapolation on the meaning of non-intent is equally as important as that of intent. If the models could not reliably identify non-intent, this would result in the misidentification of intent. Thus, by accurately identifying and expanding the models' understanding of non-intent, the value and accuracy of intentful predictions likewise increase.

Figure 4.11: Accuracy and loss after each epoch
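The agreement step itself can be sketched as follows, under the simplifying assumption that each learner proposes a binary label (or no opinion) per context and that a label only moves when both learners agree; the exact consensus rule in Section 3.3.4 may differ.

```python
import numpy as np

UNKNOWN = -1  # placeholder for contexts neither learner commits to

def consensus(current_labels, sequence_proposals, deep_proposals):
    """Update the agreed-upon labels only where both learners concur."""
    updated = current_labels.copy()
    agree = (sequence_proposals == deep_proposals) & (sequence_proposals != UNKNOWN)
    updated[agree] = sequence_proposals[agree]
    return updated

# Example: three contexts, the learners agree on the first only.
labels = np.array([UNKNOWN, UNKNOWN, 1])
print(consensus(labels, np.array([1, 0, UNKNOWN]), np.array([1, 1, UNKNOWN])))
# -> [ 1 -1  1]
```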

4.4 Abusive language

As discussed in Section 3.4, the training process for the abuse model is different from that of the intent model. This is a result of it being trained using a labelled dataset (see Table 3.3 for its composition).

Figure 4.12: Visualization of the consensus labels after each training epoch

4.4.1 Model training

To train the abuse model, a dataset of 240,846 documents, 10.4% of which were labelled abusive, was used. As specified in Section 3.4.3, the maximum number of training epochs was chosen to be 50; however, it was expected that the early-stopping conditions would end the training process before this. In practice, training stopped after 12 epochs with a final validation accuracy of 95.2% and loss of 0.114. The validation accuracy and loss of the model at the end of each epoch can be seen in Figure 4.13. As expected, the loss decreases, then increases as the model overfits the training data. Once the early-stopping conditions are triggered, the model weights from the epoch with the lowest loss are restored.
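The early-stopping behaviour described above corresponds to a standard Keras callback; a minimal sketch, with an illustrative patience value and illustrative variable names, is:

```python
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor="val_loss",          # watch validation loss each epoch
    patience=3,                  # stop after this many epochs without improvement
    restore_best_weights=True,   # roll back to the epoch with the lowest loss
)

history = abuse_model.fit(
    train_vectors, train_labels,
    validation_data=(val_vectors, val_labels),
    epochs=50,
    callbacks=[early_stop],
)
```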

Figure 4.13: Validation accuracy and loss after each epoch

The model's validation accuracy is also expressed as a confusion matrix to show where the false negatives and positives lie (see Figure 4.14). The confusion matrix shows a higher number of false negatives than false positives, which is not desirable for this application. This is likely caused in part by the differing definitions of abuse across the composite datasets; as a result, the model was trained on mixed examples of abuse.

4.4.2 Abuse predictions

Once the model finished training, it could be used to make predictions on the contexts. These predictions were made across the contexts, roughly 25% of which are from Wikipedia. The predictions for those from Wikipedia largely rest around 0, confirming that Wikipedia's style guide is adhered to [69]. It should be noted that while some of the training data for abuse is from Wikipedia, those documents are Wikipedia comments (i.e. between users), while those in the intent dataset (being discussed now) are from articles. A histogram of the predictions on the Wikipedia dataset can be seen in Figure 4.15a. More significantly, when evaluated on the contexts from Storm-Front, 2.7% were predicted to be abusive. A histogram showing the distribution of the predictions can be seen in Figure 4.15b.

To gain additional insight into why a given context was predicted to contain abuse, SHAP values can be computed for the input features. As discussed in Section 2.4.4, these values represent the relative contribution of a given feature to the model's prediction. Some examples showing these contributions are displayed in Figure 4.16. The token values are calculated by summing the SHAP values for every dimension of the token's vector.

(a) Histogram of abuse predictions on a set of Wikipedia pages    (b) Histogram of abuse predictions on a set of Storm-Front posts

Figure 4.15: Histogram of abuse predictions on two corpora

Figure 4.16: Example SHAP values for Storm-Front contexts
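A minimal sketch of how these per-token values could be obtained, assuming a Keras model over token-by-embedding-dimension inputs and the shap library's DeepExplainer (the variable names are illustrative), is:

```python
import numpy as np
import shap

# A background sample drawn from the training data anchors the explanations.
explainer = shap.DeepExplainer(abuse_model, background_vectors)
shap_values = explainer.shap_values(context_vectors)

# shap_values mirrors the input shape (contexts x tokens x dimensions);
# summing over the embedding dimension gives one value per token.
token_contributions = np.sum(shap_values[0], axis=-1)
```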

4.5 Abusive intent

Once the abuse and intent networks have been trained, their outputs can be used together to calculate abusive intent predictions for documents. As discussed in Section 3.5, there are several ways this calculation can be performed: distribution normalization, vector norm, or product. While it was determined that taking the product is best for the use case (i.e. ranking high-scoring documents), the other methods can be valuable for other use cases. As such, the abusive intent predictions computed using each method will be presented, with the product predictions given additional attention.

4.5.1 Distribution normalization

Since the values produced by different models are not on the same scale, it can be important to equate them. For example, a prediction of 0.8 by the abuse model may indicate a high probability of abuse, while a prediction of 0.8 by the intent model may be moderate. This equating can be done by mapping a prediction through the cumulative distribution function of the model's predictions. To perform this, the cumulative distributions are estimated for both the abuse and intent models, as shown in Figure 4.17. Once each of the component cumulative distributions is estimated, they can be used to estimate a joint cumulative distribution. Since, linguistically, abuse and intent are independent (i.e. each can be expressed without the other), it was assumed that this also holds for their presence in real-world data. In the case of the Storm-Front data, this appears to be approximately true, with no distinct patterns present in the joint histogram (see Figure 4.18). The joint cumulative distribution can then be computed by multiplying the normalized abuse and intent predictions (as given by Equation 3.5).

(a) Estimated cumulative distribution of abuse predictions    (b) Estimated cumulative distribution of intent predictions

Figure 4.17: Cumulative distribution of abuse and intent predictions

The resulting joint cumulative distribution can be seen in Figure 4.19. Using the estimated distribution, the abusive intent predictions can then be computed; the resulting histogram of these predictions can be seen in Figure 4.20. Examples of contexts that are ranked highly using this approach are shown in Table 4.4. The examples in the table qualitatively show that the approach is reasonably accurate at identifying documents with abusive intent. However, some weak examples do appear (for example, "we must look like...").
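A sketch of the normalization itself, using empirical cumulative distributions estimated from the model predictions (the function and variable names are illustrative), is:

```python
import numpy as np

def empirical_cdf(sample):
    """Return a function mapping a prediction to its empirical percentile."""
    ordered = np.sort(sample)
    return lambda x: np.searchsorted(ordered, x, side="right") / len(ordered)

abuse_cdf = empirical_cdf(abuse_predictions)
intent_cdf = empirical_cdf(intent_predictions)

# Joint score under the independence assumption discussed above.
def abusive_intent_normalized(abuse_score, intent_score):
    return abuse_cdf(abuse_score) * intent_cdf(intent_score)
```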

4.5.2 Vector norm

The abusive intent score for a context can also be computed using a vector norm. This method treats each context as having a prediction vector with an abuse component and an intent component. The norm computes the magnitude of this vector, which can be taken to represent the strength of the context's abusive intent. However, the result produced varies significantly based on the choice of norm.

Figure 4.18: Joint histogram of abuse and intent predictions

Infinite norm

The infinite norm is a very relaxed metric, simply returning the largest component. This translates to a context's abusive intent score being the maximum of its abuse and intent predictions. As can be seen in Figure 4.21a, this results in a significant number of highly ranked contexts. While this in itself is not a problem, one of the goals of the work is to reduce the number of documents requiring manual assessment. On the other hand, the infinite norm has the benefit of identifying contexts that may have been missed by one of the models (and may not otherwise be considered). Additionally, some of the contexts identified using the infinite norm do not indicate abusive intent (see Table 4.6). These are contexts that contain high abuse and low intent, or high intent and low abuse. The former are statements that are solely meant to insult someone, while the latter are expressions of benign intent. In both cases, such statements are not of interest to this work, showing the critical downside of the infinite norm.

Figure 4.19: Estimated cumulative distribution of abuse and intent

One norm

To refine the number of documents being considered more than the infinite norm, the one norm can be used. Once normalized to remain within [0, 1] (and since the values are strictly positive), this norm simply becomes the average of the abuse and intent predictions. This results in logical rankings with interpretable values. Examples of contexts with high abusive intent predictions using the one norm are presented in Table 4.7.

Figure 4.20: Histogram of abusive intent predictions computed by normalizing the distribution

Two norm

Sometimes, however, the one norm devalues a context too heavily if one of the models predicts a lower value. Unlike the one norm, the two norm rewards high-value components (i.e. predictions), resulting in behaviour between that of the infinite and one norms. This is reflected in the histogram of the predictions on the Storm-Front training set (see Figure 4.21c). The advantage of this norm is that it ranks documents that score highly in one component and moderately in the other higher than the one norm does. For example, a context with an abuse prediction of 0.5 but an intent prediction of 1 would have a two norm of 0.87 instead of 0.75 (as it would with the one norm). Some examples of highly rated contexts using this method can be seen in Table 4.8.
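For reference, the norm-based scores discussed in this section (and the product covered in the next subsection) can be sketched as below; the normalizing constants assumed here may differ from the ones used in Section 3.5.3.

```python
import numpy as np

def abusive_intent_scores(abuse, intent):
    """Combine an (abuse, intent) prediction pair under each scheme."""
    vector = np.array([abuse, intent])
    return {
        "infinite_norm": float(np.max(vector)),                  # largest component
        "one_norm": float(np.sum(vector) / 2.0),                 # average of the two
        "two_norm": float(np.linalg.norm(vector) / np.sqrt(2)),  # rewards high components
        "product": float(abuse * intent),                        # strictest combination
    }

print(abusive_intent_scores(0.5, 1.0))
```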

Table 4.4: Examples of contexts with high abusive intent when normalizing the distribution

Abuse   Intent   Abusive Intent   Context
0.931   1.000    1.000            i ll ignore the troll are you bnp or anal
0.825   1.000    0.999            we must look like total idiots hanging on to that lie
0.988   0.999    0.999            obama isn t a leftist you ing nazi pig incestuous ing clown i ll rip your ing intestines out and feed them to dogs
0.904   0.999    0.999            don t refer to us as a bunch of hillbillies or we ll kick your ass
0.901   0.999    0.999            we need to stop being soft hiding behind a wall of tolerance and start kicking some black and muslim ass
0.901   0.999    0.999            if you come to me and threaten my life i will kill you
0.889   0.999    0.998            now go away or we ll kick your ass
0.889   0.999    0.998            i ll wait you lying retard
0.784   1.000    0.998            we must protect our women from filth like this
0.863   0.999    0.998            we ll beat your ass
0.862   0.999    0.998            kill yourself and i will piss on your grave
0.772   1.000    0.998            i like to watch bestiality porn
0.765   1.000    0.998            i ll see your mouth shut
0.814   0.999    0.998            i ll take lying black faggots for a alex
0.810   0.999    0.998            i ll bite how about filthy traitors

4.5.3 Product

The product refines the set of documents being considered much more strictly than the vector norms. This has the added benefit of targeting only the strongest expressions of abusive intent. The effect of this approach to computing the abusive intent score can be seen in Figure 4.21d. As discussed, this results in many of the highly ranked contexts containing significant amounts of both abuse and intent. This can be seen in Table 4.9, where many of the contexts display strong examples of abusive intent.

Table 4.5: Examples of contexts with high abusive intent calculated using the infinite norm

Abusive Abuse Intent Context Intent 0.413 1.000 1.000 i hate to say this 0.372 1.000 1.000 i hate to say it but i agree 0.325 1.000 1.000 i hate to ask the question but 0.383 1.000 1.000 i hate to say it skyfirezz but i think you are right 0.004 1.000 1.000 i love to see that 0.310 1.000 1.000 i hate to say this but this non white is actually very very smart 0.276 1.000 1.000 i hate to say this but this is a good thing 0.569 1.000 1.000 i hate to break it to them 0.125 1.000 1.000 quote originally posted by jcksonian url i hate to say it but fading light is right 0.527 1.000 1.000 i hate to tell you this but wn will not explode 0.539 1.000 1.000 i hate to see white people killed but i can t blame iraqis for defending themselves 0.226 1.000 1.000 quote originally posted by proud caucasian i hate to sound defeatist but this country looks doomed 0.360 1.000 1.000 i hate to tell you this but that isn t what the song is about 0.229 1.000 1.000 i hate to post this here this is solely directed at proud caucasian i saw your user info location 0.287 1.000 1.000 i hate to say it but we could learn a lesson from this

However, there are also contexts that, while containing abuse, have more passive forms of intent. For example, one context in Table 4.9 contains "we need segregation from...". This could be considered a less active expression of intent because it uses the impersonal form of "we". As such, the statement could be thought of as rhetoric containing abusive intent rather than a statement of future personal intention. The impersonal form of we, while being a first-person pronoun, is used more like a second or third-person pronoun. For example, if the Prime Minister said "we will make it through this", it would mean the same as "Canadians will make it through this" (though the former may convey a personal tone). Critically, an expression of intent with an impersonal we is closer to a statement asking or telling people to do something, rather than the author expressing that they will do it. It can also be seen that the contexts scored low do not contain explicit examples of intent (see Table 4.10). Note that contexts written in other languages were removed by hand. In the contexts shown, none contain explicit first-person intent, as desired.

Table 4.6: Examples of contexts with high abuse or intent identified using the infinite norm

Abuse   Intent   Abusive Intent   Context
0.005   0.998    0.998            its gone now we need to start anew to reject the american culture that has superseded our own and move forward
0.005   0.991    0.991            is not our righteousness of character a testament to that which we intend to preserve
0.004   0.996    0.996            gender roles need to be flexible because we do not know in advance what kind of state we will inherit
0.004   0.999    0.999            i definitely want to move
0.003   1.000    1.000            true we must stay within certain confines for now
0.001   0.999    0.999            we must unite to show the government that we demand to be heard
0.003   0.997    0.997            i may not vote at all or if the new american freedom party comes up with a good candidate i will vote that direction
0.980   0.001    0.980            you are such an idiot
0.988   0.002    0.988            you re an idiot
0.990   0.000    0.990            antis are idiot dumbass
0.983   0.000    0.983            sanchez you are either a troll or an idiot most likely both
0.981   0.006    0.981            sorry once you swap out your vagina for a penis you should lose your right to mother or father children
0.988   0.007    0.988            f ck you you motherf cking sshole
0.981   0.004    0.981            you re not white you bloody idiot

Table 4.7: Examples of contexts with high abusive intent when using the one norm

Abusive Abuse Intent Context Intent 0.988 0.999 0.994 obama isn t a leftist you ing nazi pig incestuous ing clown i ll rip your ing intestines out and feed them to dogs 0.963 0.996 0.979 first of all i want to address the fact that you are an idiot 0.931 1.000 0.965 i ll ignore the troll are you bnp or anal 0.959 0.970 0.964 we need segregation from these stupid filthy diseased savages 0.904 0.999 0.951 don t refer to us as a bunch of hillbillies or we ll kick your ass 0.901 0.999 0.950 we need to stop being soft hiding behind a wall of tolerance and start kicking some black and muslim ass 0.901 0.999 0.950 if you come to me and threaten my life i will kill you 0.904 0.995 0.950 those white idiots are begging her not to kill black babies i want to buy her a beer honestly if that represents christianity then i want no part of it 0.898 0.999 0.949 if you tell us to pray quieter we ll kill you 0.898 0.999 0.948 we ll rape your wife pretoria give me your gun or i ll rape your wife 0.901 0.995 0.948 about seven racists were outside attempting to kick in the door shouting ing paki im going to kill you black bastard 0.896 0.998 0.947 i know for a fact that if i had a kid and he was wearing those kind of clothing my reaction would be this one i ll kick your butt you little bastard 0.904 0.988 0.946 i would recommend to say hey don t act a negro who are they for you 0.897 0.993 0.945 but if you don t i will look for you i will find you and i will kill you 0.897 0.993 0.945 one night when i disagreed with him he d grab me by the throat and said if you don t do what i say i will kill you 4.5. ABUSIVE INTENT 99

Table 4.8: Examples of contexts with high abusive intent when using the two norm

Abusive Abuse Intent Context Intent 0.965 0.993 0.989 a i m not going to clean that shit up and merge your posts every time you fucking do it if you persist i ll suspend you for a week go reintroduce yourself you faggot joined less than posts who the fuck even remembers you at this point 0.953 0.998 0.988 he is a fucking retard and i want to organize a mas- sive troll on him 0.910 0.999 0.978 if you suicide i ll kill you 0.899 0.996 0.974 a i ll end that here because being british sounds snarky as fuck 0.918 0.977 0.974 once i m done here i m going to spam the fuck out of the forums as a final fuck you 0.945 0.945 0.972 a anyway i d love to get the fuck away from this cesspit 0.895 0.991 0.972 if they don t i ll fuck them off 0.891 0.994 0.971 hahah you re a retard like me then we ll make good company 0.882 0.999 0.971 i ll leave you to wonder which i am talking about while i blush like an idiot 0.906 0.960 0.966 a also they pretty much know i ll fuck off to the motherland aftera 0.862 0.997 0.966 i like to make it clear i m not a fucking dick sucker that s all but that s how young guys are they idolise anyone who is what they don t dare be 0.858 0.996 0.964 what the literal fuck sometimes i just want to delete tumblr 0.851 0.997 0.963 clean cut but i ll still throw down with niggers 0.851 0.997 0.963 clean cut but i ll still throw down with niggers 0.841 0.996 0.960 a carrying on with being a british cuck i want to leave the eu so i don t see mongol fuckers like you stealing my money 4.5. ABUSIVE INTENT 100

Table 4.9: Examples of contexts with high abusive intent when using the product calculation

Abuse   Intent   Abusive Intent   Context
0.988   0.999    0.988            obama isn t a leftist you ing nazi pig incestuous ing clown i ll rip your ing intestines out and feed them to dogs
0.963   0.996    0.959            first of all i want to address the fact that you are an idiot
0.931   1.000    0.931            i ll ignore the troll are you bnp or anal
0.959   0.970    0.930            we need segregation from these stupid filthy diseased savages
0.904   0.999    0.903            don t refer to us as a bunch of hillbillies or we ll kick your ass
0.901   0.999    0.901            we need to stop being soft hiding behind a wall of tolerance and start kicking some black and muslim ass
0.901   0.999    0.900            if you come to me and threaten my life i will kill you
0.904   0.995    0.899            those white idiots are begging her not to kill black babies i want to buy her a beer honestly if that represents christianity then i want no part of it
0.898   0.999    0.897            if you tell us to pray quieter we ll kill you
0.898   0.999    0.897            we ll rape your wife pretoria give me your gun or i ll rape your wife
0.901   0.995    0.897            about seven racists were outside attempting to kick in the door shouting ing paki im going to kill you black bastard
0.896   0.998    0.894            i know for a fact that if i had a kid and he was wearing those kind of clothing my reaction would be this one i ll kick your butt you little bastard
0.904   0.988    0.893            i would recommend to say hey don t act a negro who are they for you
0.897   0.993    0.890            but if you don t i will look for you i will find you and i will kill you
0.897   0.993    0.890            one night when i disagreed with him he d grab me by the throat and said if you don t do what i say i will kill you

(a) Histogram of abusive intent predictions computed using the infinite norm    (b) Histogram of abusive intent predictions computed using the one norm

(c) Histogram of abusive intent predictions computed using the two norm    (d) Histogram of abusive intent predictions computed using the product

Figure 4.21: Abusive intent histograms computed using different norms

4.5.4 Predictions

As in Section 4.4, a model's reasoning for making a certain prediction can be understood using SHAP values. For contexts with high abusive intent, SHAP values can help explain why each model made the prediction it did.

Table 4.10: Examples of Storm-Front contexts with low abusive intent when using the product calculation

Abusive Abuse Intent Context Intent 0.006 0.000 0.000 se quemaron al sol 0.000 0.000 0.000 b nh ch ng is made from glutinous rice mung bean pork and other ingredients 0.011 0.000 0.000 however as i dig deeper into the jewish question and learn a little more it makes them seem the most worthy of that saying 0.040 0.000 0.000 talk about making fun of rape 0.006 0.000 0.000 wouldn t it be possible to do the same thing to the pimps 0.000 0.000 0.000 he was buried in bagneux cemetery just south of paris 0.000 0.000 0.000 multiculturalism sucks doesn t it 0.004 0.000 0.000 no problem b 0.001 0.000 0.000 also tom s daughter isnt there to celebrate 0.016 0.000 0.000 mayes lucas v 0.004 0.000 0.000 besides york was chosen for its strategic purpose in the first place 0.000 0.000 0.000 directorships fisher was president and executive di- rector of the writers guild of america east and in- corporated the wgae foundation established to assist professional screen and television writers 0.000 0.000 0.000 he will keep the borders open and civil liberties in retreat 0.074 0.000 0.000 quote it is crucial to always be aware of the law and know what your rights are indeed 0.012 0.000 0.000 chaparral cars was a pioneering american automo- bile racing team and race car developer that engi- neered built and raced cars from through 0.002 0.000 0.000 they came here without any real skills and little or no leadership 0.001 0.000 0.000 sebastian roch as a music journalist for and singer for catholic discipline who becomes an avid supporter of the germs 0.010 0.000 0.000 kamahuk warszawa greatest hits prod 0.006 0.000 0.000 with the internet we see a pattern 4.5. ABUSIVE INTENT 103

For example, Figure 4.22 shows that the abuse model placed high value on othering terms (i.e. "I" and "you") and aggressive actions (i.e. "kill" vs. "saying") when making its predictions. This is in contrast to the intent model, which relied most heavily on the first-person pronoun connected to a future verb (i.e. "will"). This example also suggests that the intent model succeeded in being trained to detect strong (but generic) forms of intent regardless of the action: in Figure 4.22, "I" and "will" have high value, while "kill" and "you" have significantly lower values.

(a) SHAP values for abuse model (b) SHAP values for intent model

Figure 4.22: Example context with SHAP values for abuse and intent models

Similarly, in Figure 4.23 the abuse model identifies "kill you", while the intent model identifies the three instances of "I will". Again, the intent model uses the action verb, but the main component of its prediction is the pronoun and desire verb.

(a) SHAP values for abuse model (b) SHAP values for intent model

Figure 4.23: Example context with SHAP values for abuse and intent models

Some examples of documents with high abusive intent scores can be seen in Tables 4.11, 4.12, and 4.13. These examples qualitatively show that the abuse and intent models were constructed such that they can be applied to data from various sources.

4.6 Validation

Once the intent model was developed and the predictions were made, they had to be validated to quantify the accuracy of the model. As described in Section 3.6, this was done by enlisting volunteers to label a set of contexts. These user labels, or votes, were then used to derive a reliable set of labels, which in turn were used to calculate the model's accuracy at predicting instances of intent. This success was quantified using accuracy and weighted accuracy (recall Equations 3.10 and 3.11).

Table 4.11: Examples of contexts with high abusive intent from abusive training dataset

Abusive Abuse Intent Context Intent 0.998 0.999 0.996 you are a fucking cunt and i am going to track you down fuck head piece of shit 0.999 0.997 0.996 fuck you go you cocker cocksuer mohterfuck unlce suckefing you go fuck your mom you dirty little ass fuck bitch i will kill you i konw where you live i will rape you yoru fucking ass 0.998 0.998 0.996 fuck you fuck you keep blocking me i ll keep coming back are so fucking bored that you have to sit around and fuck with me on the goddamn computer 0.996 0.999 0.996 fuck you you little shit i hope to see you end your fucking worthless existence 0.996 0.999 0.995 fuck you nerd fuck you nercd get a life freak i will burn in hell now byy 0.996 0.999 0.995 fuck you bitch i will find your ip adress n kill you with my gun bitch ass motherfucker 0.996 0.998 0.994 i will fuck you up you irish pansy fuck your mother 0.995 0.999 0.994 im going to take the whole fucking article to pieces you fucking ignorant insulting pigs 0.996 0.997 0.993 hey gwernol go suck a dick you fuckin piece of shit i told you that you cant stop me whether i m eoganan pan ethnic whoever the fuck i will bring this piece of shit thing down and put faggot users like wobble in their place 0.999 0.994 0.993 she is a motherfucker i will block you motherfucker shit bitch punch kill swear do nudity so shut up frickin around give you drugs call you names like shit fuck bum poo damn 0.993 0.999 0.992 fuck you sarcasticidealist i will kill you with my gun as well bitch ass motherfucker 0.993 0.999 0.992 shut up you stupid f ing retard or i will kick you to death 0.992 0.999 0.992 i will kill you you fucking twat i will shove a knife through your heart 4.6. VALIDATION 106

Table 4.12: Examples of contexts with high abusive intent from Manifesto

Abuse   Intent   Abusive Intent   Context
0.968   0.999    0.967            we will kill you and drive you roaches from our lands
0.922   0.996    0.918            if you are released we will find you and kill you if you are in prison we will reach you there if you try to hide these rapist scum we will kill you as well
0.894   0.998    0.892            i will wipe you the fuck out with precision the likes of which has never been seen before on this earth mark my fucking words
0.857   0.975    0.836            not only am i extensively trained in unarmed combat but i have access to the entire arsenal of the united states marine corps and i will use it to its full extent to wipe your miserable ass off the face of the continent you little shit
0.751   1.000    0.751            we must crush immigration and deport those invaders already living on our soil
0.586   0.967    0.566            i will shit fury all over you and you will drown in it

4.6.1 Volunteers

Once ethics approval was received, the labelling site was made live and the website was shared with potential volunteers by email and social media. The contexts were labelled between March 19th and April 18th, with 96 volunteers taking part in the labelling process. While labelling, volunteers were (transparently) passed contexts to label in batches of five, plus one qualifying question per batch. The number of contexts labelled by each volunteer can be seen in Figure 4.24. Note that volunteers were limited to 30 labels (within a 12-hour period) in an attempt to ensure attention to detail and accuracy of the labels.

Table 4.13: Examples of contexts with high abusive intent from Iron March

Abusive Abuse Intent Context Intent 0.965 0.993 0.958 a i m not going to clean that shit up and merge your posts every time you fucking do it if you persist i ll suspend you for a week go reintroduce yourself you faggot joined less than posts who the fuck even remembers you at this point 0.953 0.998 0.951 he is a fucking retard and i want to organize a mas- sive troll on him 0.910 0.999 0.909 if you suicide i ll kill you 0.918 0.977 0.897 once i m done here i m going to spam the fuck out of the forums as a final fuck you 0.899 0.996 0.895 a i ll end that here because being british sounds snarky as fuck 0.945 0.945 0.894 a anyway i d love to get the fuck away from this cesspit 0.895 0.991 0.887 if they don t i ll fuck them off 0.891 0.994 0.885 hahah you re a retard like me then we ll make good company 0.882 0.999 0.881 i ll leave you to wonder which i am talking about while i blush like an idiot 0.906 0.960 0.870 a also they pretty much know i ll fuck off to the motherland aftera 0.862 0.997 0.860 i like to make it clear i m not a fucking dick sucker that s all but that s how young guys are they idolise anyone who is what they don t dare be 0.858 0.996 0.854 what the literal fuck sometimes i just want to delete tumblr 0.851 0.997 0.848 clean cut but i ll still throw down with niggers 0.873 0.967 0.844 maybe i will you lazy faggot 0.841 0.996 0.838 a carrying on with being a british cuck i want to leave the eu so i don t see mongol fuckers like you stealing my money 4.6. VALIDATION 108

Figure 4.24: Histogram of the number of labels submitted by each volunteer

4.6.2 Collected labels

Of the 2352 votes made by the volunteers, one qualifying vote accompanied every five genuine votes, giving 392 qualifying and 1960 genuine votes. As discussed in Section 3.6.3, a vote, or a user's label, is only valid if the qualifying question from the same batch was answered correctly. Of the genuine votes, 1665 were submitted alongside correct qualifying votes. This resulted in 347 valid labels, of which 19% were labelled as containing intent. The resulting effective labels can be seen as a histogram in Figure 4.25, with the red line separating positive and negative labels.

Figure 4.25: Histogram of effective labels

4.6.3 Prediction validation

Once the labels were computed, they could be used to evaluate the performance of the intent detection model. This was done using both accuracy and weighted accuracy. The accuracy of the model was calculated to be 79.8%, and its weighted accuracy to be 81.4%. This indicates that some of the errors made by the model correspond to contexts that volunteers were collectively unsure about. As previously discussed, the weighted accuracy provides a more reasonable metric for the performance of the model given the collective uncertainty present. The confusion matrix detailing the errors made by the model is shown in Figure 4.26; the values listed are standard accuracy (i.e. not weighted accuracy). Though this shows that there were a significant number of false negative examples, this is partially due to the volunteer labels.

Figure 4.26: Confusion matrix for intent model using validation labels

It appears that during the labelling process some of the respondents had a different definition of intent from the one introduced in Chapter 1. As a result, there are some false negatives (i.e. contexts with a positive label and a negative prediction) where the context does not actually contain intent. Table 4.14 contains some examples of contexts marked as false negatives. For example, the first context in the table has "I decided", which refers to an event that has already happened, and as such cannot contain intent. Similarly, the second context has "old barry might want to...", which is a third-person statement of intent, or really a suggestion for "old barry". There are also some false positive examples where the definition of intent differs from the presented one.

Table 4.14: False negative examples

Predicted User Context Value Label 0.000 1.00 well i decided to join the group to educate the morons and counter their ridiculous posts with facts 0.016 0.75 old barry might want to think about keeping his ass at home 0.313 1.00 the street but its the closest thing we can get to without going through all the legalilties of kicking someones ass in the street 0.047 1.00 those words just make me want to break things on you people like you make me sick 0.000 0.75 gusts of popular feeling predators and sex objects media por- trayals of foreign male and female teachers i wanted to feel her big breasts mr 0.119 1.00 quote when you can produce a valid reason for polish immi- gration into my country rather than your pathetic attempt to justifing it by repeated abuse of other posters then i may well respond to your future posts in the meantime i shall just ignore you 0.003 1.00 if one of those filthy rats lays a hand on me it going to be splat time 0.009 0.75 make them get off their asses like everyone else has to and work for it 0.004 0.75 quote originally posted by maiden america here is what all the gay love and tolerance is leading us to 0.003 1.00 all the straight fine young men will have to be exposed to and will be required to put up with the fags 0.069 1.00 you and your big stories don t make me come over there and slap you 0.001 1.00 now they need to track down and kill those scum 0.103 1.00 let your seed grow keep our race alive love your white kin keep them close they will try to sepperate us to destroy us fight for what your beleave stand up for your race and let the jew know we will be on top forever hail victory 0.036 0.60 quote originally posted by steel core send them one dollar and tell them that you d like to send more but the dammed jews are sucking your wallet dry 0.014 0.60 pay attention to what the adl lady says basicly if you dare say anything about a jew your anti smite or racist 4.7. DOCUMENT AGGREGATION 112

Examples of false positive contexts can be found in Table 4.15. However, there are also labels that highlight some of the model's shortcomings, namely how it interprets we. For example, the first context contains "we need to unite...", which the volunteers did not interpret as intent. As discussed in Section 4.5.3, this could be because the statement uses the impersonal form of we.

4.7 Document aggregation

Once the predictions have been made for contexts, they can be aggregated to the document level. This enables a more holistic scoring of the document to be performed. There are several ways the aggregation can be performed; those considered are the averaged, maximum, and windowed maximum methods.

4.7.1 Averaged

The document-level score can be computed using the averaging method. In practice, this favours shorter documents by virtue of being an average. As a result, it can miss potentially problematic documents that contain many contexts that were not scored highly, even when the few contexts that were identified contain significant amounts of abusive intent. The effect of this can be seen in Figure 4.27, where the distribution produced using the average (see Figure 4.27b) sits significantly farther left than the ones produced by the other two methods. On the other hand, this method also identifies documents with moderate, but consistent, abuse and intent predictions across their contexts. For this application, such a result is less desirable than a single context, or small subset of contexts, expressing strong intent.

Table 4.15: False positive examples

Predicted User Context Value Label 0.994 0.25 ie in the case of rhodesia harold wilson the commie bas- tard linden b johnson the kaffir bottie and verwoed the back stabbing traitor in sa margarate thatcher the slut and the ja stemmers you cowardly kaffir screwing bastards we as mem- ber of this forum cannot allow our country s leaders or our nationality divide us we need to unite and watch each other backs for we or the last hope for the homo sapian race 0.999 0.00 we need to return to medieval era capital punishments for these violent savage animal muds 0.985 0.00 we just want to be left alone by races that history has shown can t wipe their own arse without instructions and a tonne of aid money 0.806 0.00 we want a world where we can have white heritage month just like the blacks have thier month 0.928 0.40 i want the moth r f cker arrested 0.995 0.25 my ancestors kick ass and we continue to kick ass 0.818 0.00 i know i am single 0.899 0.00 we are going to be labeled hatred eaten bigots no matter what we do because there are large well funded jew organi- zations that watch every move we make 0.937 0.40 it s seriously starting to grind my nerves and i truly want to know who or what the f is being retarded and bothering me 0.993 0.00 jew watch jewish atrocities slave trade jewish slave ship own- ers quote for decades the white people of america have been subjected to a continual barrage from blacks and others that you and i are somehow responsible for the african slave trade and that we need to atone for our guilt 1.000 0.25 i hate to tell you but 1.000 0.25 i love to see commie scum attack other commie scum for not being quite left enough 0.521 0.00 i especially hate spics 0.980 0.00 we need to be in the middle east like we need a second asshole 0.697 0.00 thank ing god for impartial wikipedia 4.7. DOCUMENT AGGREGATION 114

(a) Histogram of document-level abusive intent predictions aggregated by taking the maximum    (b) Histogram of document-level abusive intent predictions aggregated by taking the average

(c) Histogram of document-level abusive intent predictions aggregated by taking the windowed maximum

Figure 4.27: Document-level abusive intent histograms computed using different aggregation methods

4.7.2 Maximum

The maximum method of computing a document's abusive intent score simply takes the highest component prediction. As such, the documents identified are simply those containing the highly scored contexts (see Table 4.9). As seen in Section 4.5.3, these contexts showed expressions of abusive intent, some of which may be of interest. However, this practically means that this method of aggregation adds no additional value beyond the context-level scores.

Table 4.16: Document with abusive intent of 0.89 using the windowed product calculation

Abuse   Intent   Abusive Intent   Context
0.901   0.987    0.890            we need to stop being soft hiding behind a wall of tolerance and start kicking some black and muslim ass
0.028   0.004    0.000            make them uncomfortable in europe by being more open with our nationalistic pride

4.7.3 Windowed

The windowed approach addresses some of the issues with the averaging method. It is also able to identify documents with abuse and intent spread between contexts, and it accomplishes this while still using the product to combine the predictions. Similar to the maximum method, some of the documents identified are driven by a single context containing abusive intent (see Table 4.16). However, it is also able to identify documents where the abuse and intent are distributed across contexts (see Tables 4.17, 4.18, and 4.19).
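A sketch of the three aggregation methods is given below; the windowed variant assumes that, within each sliding window of contexts, the strongest abuse and strongest intent predictions are multiplied (Section 3.7.3), with an illustrative window size.

```python
import numpy as np

def aggregate_average(abusive_intent):
    """Mean abusive intent over all contexts in the document."""
    return float(np.mean(abusive_intent))

def aggregate_maximum(abusive_intent):
    """Highest single-context abusive intent in the document."""
    return float(np.max(abusive_intent))

def aggregate_windowed(abuse, intent, window=2):
    """Best product of window-level maximum abuse and maximum intent."""
    best = 0.0
    for start in range(max(1, len(abuse) - window + 1)):
        stop = start + window
        best = max(best, float(np.max(abuse[start:stop]) * np.max(intent[start:stop])))
    return best

# Example resembling Table 4.17: abuse and intent in adjacent contexts.
print(aggregate_windowed(np.array([0.905, 0.028]), np.array([0.014, 0.981])))
```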

Table 4.17: Document with abusive intent of 0.89 using the windowed product calculation

Abuse   Intent   Abusive Intent   Context
0.905   0.015    0.014            i cant even make it past one or two of these idiots emma west has more guts than all these idiots
0.028   0.981    0.028            to think that she is sitting in jail right now for speaking the truth makes me really mad i want to find her and give her a hug and thank her for being awesome

Table 4.18: Document with abusive intent of 0.91 using the windowed product calculation

Abuse   Intent   Abusive Intent   Context
0.947   0.002    0.002            you are all totally stupid
0.000   0.959    0.000            i was submitting a satirical post analogous to all of those which i continue to read on this website
0.234   0.007    0.002            non sensical pretentious etc etc
0.022   0.000    0.000            erm read the title of the thread a typical wn thread
0.018   0.002    0.000            yes you are all terribly bright peoplerolleyes
0.005   0.000    0.000            the post was called posticus b s icus

Table 4.19: Document with abusive intent of 0.89 using the windowed product calculation

Abuse   Intent   Abusive Intent   Context
0.923   0.001    0.001            sean taylor is a stupid ape
0.001   0.960    0.001            that i will remember him that way

Chapter 5

Discussion

The proliferation of Internet-connected devices continues to enable massive collections of unstructured data to be created and collected. Much of this data comes from websites such as social media or blogs. Unfortunately, some of these sites are used by criminal or terrorist organizations for recruitment or to spread rhetoric. By analyzing this content, however, it is possible to gain insights into the future actions of the writer(s). This information can enable organizations to take proactive measures to modify or stop said actions from taking place. To support law enforcement efforts towards such a goal, the detection of abusive intent can be automated. This task was split into its component pieces: detecting abuse and detecting intent. Abusive language detection is a well-studied problem with existing techniques and datasets available (as discussed in Section 2.3.2). Intent detection, however, is a less developed field, with much of the previous work focusing on intent classification or intent detection on a topic-specific corpus (as discussed in Section 2.3.1). To fill this gap, a method for training a deep learning model for intent detection was adapted.

The proposed method trained the intent detection model by generating an initial set of labels, then co-training a deep learning model and a statistical model. The linguistic model developed was used to generate the rough labels by parsing contexts, then checking for one of the forms of intent. These labels were then refined based on the desire verb present in the expression of intent. Once produced, the initial labels were passed to the deep learner and sequence learner, which proceeded to co-train. The statistical model was adapted to identify the more subtle signals of intent (as compared to abuse). During the process, each model would train itself, make new predictions, agree on whether and how to change the labels, then start this cycle again with the new labels. This training process resulted in a deep learning model which could be used alongside the abusive language model to detect abuse and intent in documents. These predictions could then be used to compute abusive intent predictions for any document.

5.1 Limitations

The abusive intent model produced has several limitations in its design and implementation.

5.1.1 Imperfect initial labels

When computing the initial labels, the linguistic model and refinement process lead to flaws in the labels. This causes issues because the co-training process extrapolates from these labels under the assumption that they are correct. Some of the initial positive labels are incorrect because of errors in the refinement process and variations in the use of first-person pronouns. The refinement of the initial labels is done by filtering based on the desire verb used in the expression of intent. Problems arise because the fact that a verb is semantically and/or syntactically similar to one of the seed verbs does not mean it necessarily indicates strong intent.

For example, "expecting" is located close to "going" in the vector space formed by the embeddings. However, "I'm expecting to fight" does not necessarily have the same meaning or strength of commitment as "I'm going to fight". As a result, there could be contexts that are initially labelled as containing strong intent but that instead express weaker forms of intent. Additionally, the forms of intent that the linguistic model searches for are non-exhaustive, leading to some contexts being missed.

There are multiple uses of certain first-person pronouns depending on the sentence, document, or conversation they appear in. Most notable for this work is the use of the personal and impersonal forms of we (as discussed in Section 4.5.3). Broadly, if a statement is made using we and the other person or people in the group can be identified, it is being used personally. For example, if two people are talking and one says "we should do X now", we is being used in a personal sense. Conversely, if someone were giving a speech and said "we will win this election", the other members of the group implied by we are not personally identifiable (making this an impersonal use of we). If a document were to contain the second example, it should potentially be labelled as negative intent (since we is being used more like a second or third-person pronoun); however, the current method would label it positively. The effect of this is seen in Table 4.15, where the model identified "...we need to..." as containing intent despite the impersonal use of we. However, given the application to law enforcement, it is likely more useful to have false positive documents such as this than false negative documents (that would otherwise be missed).

5.1.2 Lack of support to detect implicit intent

The current label generation and training processes provide no support for identifying expressions of implicit intent. While the objective is to automate the detection of explicit expressions of intent, it would be beneficial if the model could also identify implicit intent. This, however, is a more complex problem, since implicit expressions can be spread across multiple contexts and are more subtle than explicit ones. To detect such expressions using deep learning would likely require initial labels marking the presence of implicit intent. Appropriate labels could be crowd-sourced; however, the labelling would have to be done with a higher tolerance than those collected for validation (as discussed below). Alternatively, the labels could potentially be computed using an approach similar to the one detailed in Section 3.2. However, unlike explicit labels, defining a model to produce labels for implicit intent would be difficult since there are significantly more forms it can take.

5.1.3 Adversarial input

This model is constructed on the assumption that the text it is assessing does not contain deception, sarcasm, or similar devices. Additionally, it is assumed that the input has not been constructed to elicit a certain response or to avoid detection. To defend against such attacks, the model would have to be trained using adversarial training examples and/or designed to resist them. However, given the training method and the complexity of the signals indicating intent, defending would be difficult (at least without a labelled dataset).
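As one illustration of how adversarial training examples could be generated, the sketch below perturbs a document with the character-level obfuscations an evasive author might use; perturbed copies of positive examples could then be added to the training set. The substitution map and example text are assumptions for illustration and are not part of the pipeline described in this thesis.

    import random

    # Hypothetical character substitutions an evasive author might use.
    OBFUSCATIONS = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}

    def perturb(text, rate=0.3, seed=0):
        # Randomly obfuscate characters to create an adversarial variant of a document.
        rng = random.Random(seed)
        out = []
        for ch in text:
            if ch.lower() in OBFUSCATIONS and rng.random() < rate:
                out.append(OBFUSCATIONS[ch.lower()])
            else:
                out.append(ch)
        return "".join(out)

    print(perturb("I am going to fight"))  # prints an obfuscated variant of the input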

5.1.4 Accepting statements as truth

When assessing expressions of intent, the model takes them at face value. Specifically, it assumes that the content of the locution will occur rather than questioning the author. The mapping of expression to action is a problem studied in social science (as mentioned in Section 2.2). However, many of these techniques are designed to be applied to a specific person or group. Because of this, the model was designed to identify potentially problematic contexts, which can then be manually assessed with said techniques.

5.1.5 Unrepresentative validation data

As discussed briefly in Section 4.6, the volunteers voting on the labels had different understandings of the definition of intent. This resulted in some unwarranted false positives and negatives (and potentially incorrect true positives and negatives). Because of this uncertainty in the labels produced, it is possible that the reported accuracy is not indicative of the performance of the model. If the model were to be used in practice, it would be important to better understand and quantify this potential unreliability. Additionally, properly labelled examples could provide an estimate of the lowest prediction value that should be manually assessed.
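One way this unreliability could be quantified, if the model were used in practice, is by measuring inter-annotator agreement on the validation labels. The sketch below uses Cohen's kappa from scikit-learn on two hypothetical annotators' votes; the vote vectors are illustrative, not the labels actually collected for this work.

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical intent votes (1 = contains intent) from two annotators
    # on the same eight contexts.
    annotator_a = [1, 0, 1, 1, 0, 0, 1, 0]
    annotator_b = [1, 0, 0, 1, 0, 1, 1, 0]

    kappa = cohen_kappa_score(annotator_a, annotator_b)
    print(f"Cohen's kappa: {kappa:.2f}")  # values well below 1 suggest differing definitions of intent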

5.1.6 Reliance on non-uniform definitions of abuse

When training the abuse model, multiple datasets were combined to form the training set. Since each of the constituent datasets was produced as part of a distinct study, their definitions of abuse varied. This resulted in an abuse model being trained with some incorrect labels. These conflicting training samples could result in the model ignoring some types of abuse, which is undesirable. As in Section 5.1.5, if the abusive intent model were to be used in production it would be important to construct a better training dataset. This would help train a more robust model and better quantify its accuracy.

Bibliography

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org. URL: https://www.tensorflow.org/ [cited 2020-08-18].

[2] Swati Agarwal and Ashish Sureka. Characterizing linguistic attributes for automatic classification of intent based racist/radicalized posts on tumblr micro-blogging website. CoRR, abs/1701.04931, 2017. URL: http://arxiv.org/abs/1701.04931, arXiv:1701.04931.

[3] Victor Asal and Andrew Vitek. Sometimes they mean what they say: understanding violence among domestic extremists. Dynamics of Asymmetric Conflict, 11:1–15, 05 2018. doi:10.1080/17467586.2018.1470659.

[4] John Langshaw Austin. William James Lectures. In How to do Things with Words. Harvard University, Oxford at the Clarendon Press, 1955.

[5] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015. URL: http://arxiv.org/abs/1409.0473.

[6] Eric Bell, Antonio Sanfilippo, and Liam McGrath. International Handbook of Threat Assessment, chapter 14, pages 224–235. Oxford University Press, 12 2013.

[7] Bellingcat. Massive white supremacist message board leak: How to access and interpret the data. Online., November 2019. URL: https://www.bellingcat.com/resources/how-tos/2019/11/06/massive-white-supremacist-message-board-leak-how-to-access-and-interpret-the-data/ [cited 2020-08-19].

[8] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994.

[9] Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, COLT' 98, page 92–100, New York, NY, USA, 1998. Association for Computing Machinery. doi:10.1145/279943.279962.

[10] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 07 2016. doi:10.1162/tacl_a_00051.

[11] Zhiyuan Chen, Bing Liu, Meichun Hsu, Malu Castellanos, and Riddhiman Ghosh. Identifying intention posts in discussion forums. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1041–1050, Atlanta, Georgia, June 2013. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/N13-1124.

[12] François Chollet et al. Keras. Online., 2015. URL: https://github.com/fchollet/keras [cited 2020-08-18].

[13] Cindy Chung and James Pennebaker. Social communication. In Klaus Fiedler, editor, The Psychological Function of Function Words, chapter 12. Psychology Press, New York, 2007.

[14] Honghua (Kathy) Dai, Zaiqing Nie, Lee Wang, Lingzhi Zhao, Ji-Rong Wen, and Ying Li. Detecting online commercial intention (OCI). January 2006. URL: https://www.microsoft.com/en-us/research/publication/detecting-online-commercial-intention-oci/.

[15] Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. Automated hate speech detection and the problem of offensive language. In Proceedings of the 11th International AAAI Conference on Weblogs and Social Media, 2017. URL: https://data.world/thomasrdavidson/hate-speech-and-offensive-language.

[16] Ona de Gibert Bonet, Naiara Perez Miguel, Aitor García-Pablos, and Montse Cuadros. Hate speech dataset from a white supremacy forum. pages 11–20, 01 2018. URL: https://github.com/aitor-garcia-p/hate-speech-dataset, doi:10.18653/v1/W18-5102.

[17] Fernando Diaz, Bhaskar Mitra, and Nick Craswell. Query expansion with locally-trained word embeddings. pages 367–377, 05 2016. doi:10.18653/v1/P16-1035.

[18] Michael J. Egnoto, Darrin J. Griffin, and Fei Qiao. Grandstanding or foreshadowing: analysing the University of Alabama active shooter threats with intergroup threat theory. Dynamics of Asymmetric Conflict, 11(3):171–185, 2018. arXiv:https://doi.org/10.1080/17467586.2018.1432867, doi:10.1080/17467586.2018.1432867.

[19] James Eilbert, David Carmody, Daniel Fu, Tom Santarelli, Derek Wischusen, and Jason Donmoyer. Reasoning about adversarial intent in asymmetric situations. Technical report, American Association for Artificial Intelligence, 2002.

[20] Laura Faragó, Anna Kende, and Péter Krekó. Justification of intergroup violence – the role of right-wing authoritarianism and propensity for radical action. Dynamics of Asymmetric Conflict, 12(2):113–128, 2019. arXiv:https://doi.org/10.1080/17467586.2019.1576916, doi:10.1080/17467586.2019.1576916.

[21] Ethan Fast, Binbin Chen, and Michael S. Bernstein. Empath: Understanding topic signals in large-scale text. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI '16, page 4647–4657, New York, NY, USA, 2016. Association for Computing Machinery. doi:10.1145/2858036.2858535.

[22] Lei Gao, Alexis Kuppersmith, and Ruihong Huang. Recognizing explicit and implicit hate speech using a weakly supervised two-path bootstrapping approach. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 774–782, Taipei, Taiwan, November 2017. Asian Federation of Natural Language Processing. URL: https://www.aclweb.org/anthology/I17-1078.

[23] Graham Gooch and Michael Williams. A Dictionary of Law Enforcement. Oxford University Press, second edition, 2015. doi:10.1093/acref/9780191758256.001.0001.

[24] Paul Grice. Logic and conversation. In Peter Cole and Jerry L. Morgan, editors, Syntax and semantics: Speech acts, volume 3. Academic Press, 1975.

[25] V. Gupta, Devesh Varshney, Harsh Jhamtani, D. Kedia, and S. Karwa. Identifying purchase intent from social posts. pages 180–186, 05 2014.

[26] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Comput., 9(8):1735–1780, November 1997. doi:10.1162/neco.1997.9.8.1735.

[27] Donald Holbrook. A critical analysis of the role of the internet in the preparation and planning of acts of terrorism. Dynamics of Asymmetric Conflict, 8:121–133, 05 2015. doi:10.1080/17467586.2015.1065102.

[28] Bernd Hollerit, Mark Kröll, and Markus Strohmaier. Towards linking buyers and sellers: detecting commercial intent on twitter. pages 629–632, 05 2013. doi:10.1145/2487788.2488009.

[29] Matthew Honnibal and Ines Montani. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear, 2017. URL: https://github.com/explosion/spaCy [cited 2020-08-18].

[30] Impermium. Detecting insults in social commentary. Online., September 2012. URL: https://www.kaggle.com/c/detecting-insults-in-social-commentary/ [cited 2020-02-27].

[31] Jigsaw and Conversation AI. Toxic comment classification challenge. Online., December 2017. URL: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/ [cited 2019-06-17].

[32] Eric Jones, Travis Oliphant, Pearu Peterson, and others. SciPy: Open source scientific tools for Python, 2001–. URL: http://www.scipy.org/ [cited 2019-06-07].

[33] M. Jones, J. Bradley, Ping Identity, N. Sakimura, Microsoft, and NRI. JSON Web Token (JWT). Web standard, Internet Engineering Task Force, 2015. URL: https://tools.ietf.org/html/rfc7519 [cited 2020-08-18].

[34] Keras. Adam. Online. URL: https://keras.io/api/optimizers/adam/ [cited 2020-08-18].

[35] Joo-Kyung Kim, Gokhan Tur, Celikyilmaz Asli, Bin Cao, and Ye-Yi Wang. Intent detection using semantically enriched word embeddings. pages 414–419, 12 2016. doi:10.1109/SLT.2016.7846297.

[36] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015. URL: http://arxiv.org/abs/1412.6980.

[37] Anti-Defamation League. Hate beyond borders: The internationalization of white supremacy. Online, 2019. URL: https://www.adl.org/media/13538/download [cited 2020-08-18].

[38] Hannah LeBlanc. Model-driven abusive language detection. Master's thesis, Queen's University, School of Computing, May 2019.

[39] Geoffrey N. Leech. Meaning and the English Verb. Pearson Education Limited, 3 edition, 2004.

[40] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 4765–4774. Curran Associates, Inc., 2017. URL: http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf.

[41] Christian Mair. The spread of the going-to-future in written english: a corpus-based investigation into language change in progress. In Raymond Hickey and Stanislav Puppel, editors, Language history and linguistic modelling, pages 1537–1543. De Gruyter Mouton, Berlin, Boston, 12 2010. doi:https://doi.org/10.1515/9783110820751.

[42] J. Reid Meloy and Jens Hoffmann, editors. International handbook of threat assessment. Oxford University Press, December 2013.

[43] Tomas Mikolov, G.S. Corrado, Kai Chen, and Jeffrey Dean. Efficient estimation of word representations in vector space. In International Conference on Learning Representations, pages 1–12, 01 2013. URL: http://arxiv.org/abs/1301.3781.

[44] Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, and Armand Joulin. Advances in pre-training distributed word representations. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), 2018. URL: https://fasttext.cc/docs/en/english-vectors.html [cited 2020-05-21].

[45] Jakub Nowak, Ahmet Taspinar, and Rafal Scherer. LSTM recurrent neural networks for short text and sentiment classification. pages 553–562, 05 2017. doi:10.1007/978-3-319-59060-8_50.

[46] Office of the Director of National Intelligence. Global trends 2030: Alternative worlds. Technical report, National Intelligence Council, December 2012. URL: https://www.dni.gov/files/documents/GlobalTrends_2030.pdf [cited 2020-08-18].

[47] Travis Oliphant. NumPy: A guide to NumPy, 2006–. URL: http://www.numpy.org/ [cited 2019-06-07].

[48] James W. Pennebaker. Using computer analyses to identify language style and aggressive intent: The secret life of function words. Dynamics of Asymmetric Conflict, 4(2):92–102, 2011. arXiv:https://doi.org/10.1080/17467586.2011.627932, doi:10.1080/17467586.2011.627932.

[49] Hemant Purohit, Guozhu Dong, Valerie Shalin, Krishnaprasad Thirunarayan, and Amit Sheth. Intent classification of short-text on social media. pages 222–228, 12 2015. doi:10.1109/SmartCity.2015.75.

[50] Queen's University Research Services. General Research Ethics Board (GREB). Online. URL: https://www.queensu.ca/urs/ethics/general-research-ethics-board-greb [cited 2020-08-18].

[51] Antonio Sanfilippo, Lyndsey Franklin, Stephen Tratz, Gary Danielson, Nicholas Mileson, Rick Riensche, and Liam McGrath. Automating Frame Analysis, pages 239–248. 01 2008. doi:10.1007/978-0-387-77672-9_26.

[52] Antonio Sanfilippo, Liam McGrath, and Paul Whitney. Violent frames in action. Dynamics of Asymmetric Conflict, 4:103–112, 07 2011. doi:10.1080/17467586.2011.627933.

[53] Antonio Sanfilippo, Jack Schryver, Paul Whitney, Elsa Augustenborg, Gary Danielson, and Sandy Thompson. Vim: A platform for violent intent modeling. In Social Computing and Behavioral Modeling, pages 1–11. Springer US, 04 2009. doi:10.1007/978-1-4419-0056-2_24.

[54] FG Sanfilippo, AP Nibbs. Violent intent modeling: Incorporating cultural knowledge into the analytical process. Technical report, Pacific Northwest National Laboratory, August 2007. URL: https://www.pnnl.gov/main/publications/external/technical_reports/PNNL-16806.pdf [cited 2020-09-14].

[55] Heet Sankesara. Hierarchical attention network. Online., 2019. URL: https://github.com/Hsankesara/DeepResearch [cited 2020-08-18].

[56] M. Schuster and K.K. Paliwal. Bidirectional recurrent neural networks. Trans. Sig. Proc., 45(11):2673–2681, November 1997. doi:10.1109/78.650093.

[57] Bart Schuurman and Quirine Eijkman. Indicators of terrorist intent and capability: Tools for threat assessment. Dynamics of Asymmetric Conflict, 06 2015. doi:10.1080/17467586.2015.1040426.

[58] Ramón Spaaij. The enigma of lone wolf terrorism: An assessment. Studies in Conflict & Terrorism, 33(9):854–870, 2010. arXiv:https://doi.org/10.1080/1057610X.2010.501426, doi:10.1080/1057610X.2010.501426.

[59] Ramón Spaaij and Mark Hamm. Key issues and research agendas in lone wolf terrorism. Studies in Conflict and Terrorism, 38, 03 2015. doi:10.1080/1057610X.2014.986979.

[60] Walter G. Stephan and Cookie White Stephan. Intergroup threat theory. In The International Encyclopedia of Intercultural Communication, pages 1–12. American Cancer Society, 2017. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/9781118783665.ieicc0162, arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781118783665.ieicc0162, doi:10.1002/9781118783665.ieicc0162.

[61] Sudha Subramani, Huy Quan Vu, and Hua Wang. Intent classification using feature sets for domestic violence discourse on social media. CoRR, abs/1804.03497, 2018. URL: http://arxiv.org/abs/1804.03497, arXiv:1804.03497.

[62] Harrison Tarrant. The great replacement. Online., March 2019.

[63] Unknown. Iron march database. Online, 11 2019. URL: magnet:?xt=urn:btih:f745eb1b86eb55f638517654c015fcaaadc96919&dn=iron_march_201911&tr=http%3a%2f%2fbt1.archive.org%3a6969%2fannounce&tr=http%3a%2f%2fbt2.archive.org%3a6969%2fannounce&ws=http%3a%2f%2fia601401.us.archive.org%2f17%2fitems%2f&ws=https%3a%2f%2fia801401.us.archive.org%2f17%2fitems%2f [cited 2020-08-19].

[64] Nikhita Vedula, Nedim Lipka, Pranav Maneriker, and Srinivasan Parthasarathy. Towards open intent discovery for conversational text. ArXiv, abs/1904.08524, 2019. URL: https://arxiv.org/abs/1904.08524.

[65] Stephen Walker. Anticipating attacks from the operational codes of terrorist groups. Dynamics of Asymmetric Conflict, 4:1–9, 07 2011. doi:10.1080/17467586.2011.627936.

[66] Jinpeng Wang, Gao Cong, Wayne Xin Zhao, and Xiaoming Li. Mining user intents in twitter: A semi-supervised approach to inferring intent categories for tweets. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI'15, page 318–324. AAAI Press, 2015.

[67] Zeerak Waseem and Dirk Hovy. Hateful symbols or hateful people? predictive features for hate speech detection on twitter. In Proceedings of the NAACL Student Research Workshop, pages 88–93, San Diego, California, June 2016. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/N16-2013, doi:10.18653/v1/N16-2013.

[68] Wikimedia. Wikimedia downloads. Online., June 2020. URL: https://dumps.wikimedia.org/ [cited 2020-06-22].

[69] Wikipedia. Wikipedia:manual of style. Online., March 2020. URL: https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style [cited 2020-06-22].

[70] Meghan A. Wong, Richard Frank, and Russell Allsup. The supremacy of online white supremacists – an analysis of online discussions by white supremacists. Information & Communications Technology Law, 24(1):41–73, 2015. arXiv:https://doi.org/10.1080/13600834.2015.1011845, doi:10.1080/13600834.2015.1011845.

[71] Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. Hierarchical attention networks for document classification. pages 1480–1489, 01 2016. doi:10.18653/v1/N16-1174.

Appendix A

Datasets

In order to train abusive language and intent detection models, several datasets were used.

A.1 Storm-Front (intent)

The primary dataset was provided to Professor Skillicorn by Professor Richard Frank at Simon Fraser University. When it was collected, their research team was advised that no explicit ethics clearance was required since Storm-Front is a publicly accessible social media forum. As discussed in Section 3.1.1, this dataset was chosen because, as a white supremacist forum, it likely contains many examples of abusive intent. This is beneficial when training models since it provides as many examples as possible to learn from.

A.2 Wikipedia (intent)

In order to teach the intent model to recognize objective statements as non-intent, Wikipedia articles were introduced. This was done because the Wikipedia guidelines (theoretically) prevent expressions of first-person intent in articles [69]. The data used was a random selection of articles from the Wikipedia article backup performed on June 6th, 2020 [68].

A.3 Iron March

In 2019, the white supremacist forum Iron March suffered a leak in which the site's SQL database was posted to the Internet Archive [7]. This contained all the user data, posts, and messages from the site's six-year history [63]. For this work, the personal messages were used as a secondary data source to qualitatively assess the abuse and intent models. It was chosen to help (qualitatively) ensure that the abuse and intent models generalize to other white supremacist forums.

A.4 Manifesto

The manifesto (likely) posted by the 2019 Christchurch mosque shooter was also used to qualitatively check the models [62]. This was chosen since, unlike many of the users on Storm-Front and Iron March, this individual carried out the perlocution.

A.5 Hate speech ensemble

As listed in Table 3.3, the corpus used to train the abusive language model is a composition of several datasets: a labelled dataset from Storm-Front, a set of insults, and Wikipedia comments [16, 30, 31]. The abusive Storm-Front dataset was chosen because it comes from the same source as the primary corpus and has labels indicating hate speech. This means that the documents likely have very similar word usage to the corpus of interest, and the labels indicate hate speech rather than just profanity. The insults dataset provides labelled data for insults, again rather than profanity. Finally, the Wikipedia dataset has document lengths similar to Storm-Front and labels indicating any combination of toxic, severe-toxic, obscene, threat, insult, or identity-hate. Of these multi-class labels, all except obscene were used to train the abusive language model. As with the other datasets, this was done because profanity alone was not of interest.
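As an illustration of the label handling described above, the sketch below collapses the Wikipedia (Jigsaw) multi-class labels into a single binary abuse label, excluding obscene. The column names follow the public Kaggle release [31]; the file path is a placeholder and the exact preprocessing used in this work may differ.

    import pandas as pd

    wiki = pd.read_csv("jigsaw_train.csv")  # placeholder path to the Kaggle training file

    # Every class except 'obscene' counts toward abuse, since profanity alone
    # was not of interest.
    abuse_columns = ["toxic", "severe_toxic", "threat", "insult", "identity_hate"]
    wiki["abusive"] = (wiki[abuse_columns].sum(axis=1) > 0).astype(int)

    wiki_subset = wiki[["comment_text", "abusive"]]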

Appendix B

Computational resources

To compute the various portions of the work, three machines were used. The first, called machine A, is a Windows server meant for highly parallelizable jobs or those with large memory requirements. Machine B was used for deep-learning jobs. Finally, machine C was used for long-running jobs, including training the fastText word embedding models and hosting the data labelling site. Detailed specifications for all three machines can be found in Table B.1.

Table B.1: Machine descriptions

Specification       Machine A                 Machine B        Machine C
Processor           40-thread Xeon E7-2860    12-thread AMD    3-thread virtual CPU
Memory capacity     256 GB                    16 GB            12 GB
Graphics card       None                      Nvidia 2060      None
Storage medium      hard disk                 solid state      solid state
Operating system    Windows Server 2016       Windows 10       Ubuntu 18

Appendix C

Data labelling interface

Since a deductive approach was used to predict intent in documents, it was necessary to evaluate the accuracy of the model. This was accomplished by making predictions on a set of documents that were also labelled by human annotators. Using these labels, the accuracy of the model was determined (see Section 3.6 for these results). The labels were generated by anonymous volunteers using a custom web application (web-app). The web-app was a simple interface that displayed contexts and asked the user to label them. An overview of the architecture used can be seen in Figure C.1. The core requirements of the site were that it provide a lightweight and simple interface for users to label the data. It was also important that users be able to do the labelling anonymously, while ensuring the responses are accurate. Anonymity was a key factor for ethics approval and for mitigating potential conflicts of interest.

C.1 Architecture

The contexts, labels, and (anonymous) user information were stored in a SQLite3 database on the server. This database takes the form of a single file, which was write-locked by the application for the duration of the data collection process. The file was backed up using a cron job once an hour for the duration of the labelling period. The VM was also periodically backed up by the Queen's Department of Computing, which hosted it.

Figure C.1: Overview of the web application used to collect data labelling.

The backend was written in JavaScript and based around a web application framework called Express.js 4. Throughout the labelling period, the site experienced no downtime and no application crashes (see Figure C.2). This was monitored by PM2, a process manager built for running Node.js applications in production. Given the uptime and the write-lock held by a single-threaded process, no accidental modification or deletion of the collected labels occurred.

The client-side application was written in React, a JavaScript framework. The clients connected to the backend through HTTPS, which was configured to automatically renew its certificate. As seen in Figure C.1, the client can freely make signup and page requests. The signup endpoint creates an (anonymous) entry for the user in the database and issues the client a JSON Web Token (JWT). A JWT can then be used to securely authenticate future requests between the client and backend application [33]. This ensures that people who have not read and agreed to participate in labelling cannot accidentally access any content. The use of JWTs also ensures that all labels can be associated with a specific user (index), allowing incorrect user labels to be removed.
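The backend itself is written for Node.js/Express, but the token flow can be illustrated with a short Python sketch using the PyJWT library: the signup endpoint signs only an anonymous user index into a token, and every later labelling request is verified against the server-side secret. The secret and payload fields shown are illustrative assumptions, not the application's actual configuration.

    import jwt  # PyJWT

    SECRET = "server-side-secret"  # placeholder; kept private on the server

    def issue_token(user_index):
        # Signup: embed only the anonymous user index, no identifying information.
        return jwt.encode({"user": user_index}, SECRET, algorithm="HS256")

    def verify_token(token):
        # Labelling request: reject tampered or unsigned tokens and return the
        # user index so each label can be attributed to (or removed for) that user.
        payload = jwt.decode(token, SECRET, algorithms=["HS256"])
        return payload["user"]

    token = issue_token(42)
    assert verify_token(token) == 42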

Figure C.2: PM2 web application status readout

For additional technical information on the data labelling application, the code is openly available here.

C.2 Ethical clearance

The data collection application was reviewed and approved by the General Research Ethics Board (GREB) on March 18th, 2020. The data collection period finished on May 1, 2020. Additional information about the ethics approval process is available on the Queen’s GREB page. Further information about the ethics approval of this application can be obtained by contacting GREB about GCOMP-105-20, Labelling social media posts for presence of abusive intent. The file number of the application is 6029144 with primary investigator Professor David Skillicorn.

C.3 Labelling instructions

Before starting the labelling process, volunteers were presented with a set of instructions explaining the task. The key component of these was the definition of abuse and intent. Abuse was simply defined as “an insult or hate speech”. Intent was defined as “...the author's expression [of] an intention or desire to do something in the future”. Intent and abuse were further explained using examples (as shown below).

For example, if the document was “Damn, my TV broke today so I’m going to have to go buy a new one tomorrow”, then this would not be abusive (since the profanity is not directed at anyone or any group) but it would be considered to contain intent (i.e. the intention to buy a new TV). Some other examples of intentful language are:

• “I am going to #### ...” (since they’re going to do something)

• “I want to ##### ...” (since they want something to happen)

• “I’d love to see some ...” (since they want to see something)

Some examples of documents that don’t contain intent are:

• “I don’t want to go to the ...” (since they don’t want ...)

• “hello I have not received ...” (since the author is not expressing a desire/intention)

• “I felt like ####### ...” (since this is a statement on a past feeling)

NOTE: For the complete set of text displayed to volunteers please consult the repository (as given above).