
Recognizing Explicit and Implicit Hate Speech Using a Weakly Supervised Two-path Bootstrapping Approach

Lei Gao (Texas A&M University), Alexis Kuppersmith (Stanford University), Ruihong Huang (Texas A&M University)

arXiv:1710.07394v2 [cs.CL] 22 May 2018

Abstract

In the wake of a polarizing election, social media is laden with hateful content. To address various limitations of supervised hate speech classification methods, including corpus bias and the huge cost of annotation, we propose a weakly supervised two-path bootstrapping approach for an online hate speech detection model leveraging large-scale unlabeled data. This system significantly outperforms hate speech detection systems that are trained in a supervised manner using manually annotated data. Applying this model on a large quantity of tweets collected before, after, and on election day reveals motivations and patterns of inflammatory language.

1 Introduction

Following a turbulent election season, 2016's digital footprint is awash with hate speech. Apart from censorship, the goals of enabling computers to understand inflammatory language are many. Sensing increased proliferation of hate speech can elucidate public opinion surrounding polarizing events. Identifying hateful declarations can bolster security in revealing individuals harboring malicious intentions towards specific groups.

Recent studies on supervised methods for online hate speech detection (Waseem and Hovy, 2016; Nobata et al., 2016) have relied on manually annotated datasets, which are not only costly to create but also likely to be insufficient to obtain wide-coverage hate speech detection systems. This is mainly because online hate speech is relatively infrequent (among large amounts of online contents) and tends to transform rapidly following a new trigger event. Our pilot annotation experiment with 5,000 randomly selected tweets shows that around 0.6% (31 tweets) of tweets are hateful. The mass-scale (Yahoo! Finance online comments) hate speech annotation effort from Yahoo! (Nobata et al., 2016) revealed that only 5.9% of online comments contained hate speech. Therefore, large amounts of online texts need to be annotated to adequately identify hate speech. In recent studies (Waseem and Hovy, 2016; Kwok and Wang, 2013), the data selection methods and annotations are often biased towards a specific type of hate speech or hate speech generated in certain scenarios in order to increase the ratio of hate speech content in the annotated data sets, which however made the resulting annotations too distorted to reflect the true distribution of hate speech. Furthermore, inflammatory language changes dramatically following new hate "trigger" events, which will significantly devalue annotated data.

To address the various limitations of supervised hate speech detection methods, we present a weakly supervised two-path bootstrapping approach for online hate speech detection that requires minimal human supervision and can be easily retrained and adapted to capture new types of inflammatory language. Our two-path bootstrapping architecture consists of two learning components, an explicit slur term learner and a neural net classifier (LSTMs (Hochreiter and Schmidhuber, 1997)), that can capture both explicit and implicit phrasings of online hate speech.

Specifically, our bootstrapping system starts with automatically labeled online hateful content that is identified by matching a large collection of unlabeled online content with several hateful slur terms. Then two learning components will be initiated simultaneously. A slur term learner will learn additional hateful slur terms from the automatically identified hateful content. Meanwhile, a neural net classifier will be trained using the automatically labeled hateful content as positive instances and randomly sampled online content as negative instances. Next, both string matching with the newly learned slur terms and the trained neural net classifier will be used to recognize new hateful content from the large unlabeled collection of online contents. Then the newly identified hateful content by each of the two learning components will be used to augment the initially identified hateful content, which will be used to learn more slur terms and retrain the classifier. The whole process iterates.

The design of the two-path bootstrapping system is mainly motivated to capture both explicit and implicit inflammatory language. Explicit hate speech is easily identifiable by recognizing a clearly hateful word or phrase. For example:

(1) Don't talk to me from an anonymous account you faggot coward, whither up and die.
(2) And that's the kind of people who support Trump! Subhumans!

In contrast, implicit hate speech employs circumlocution, metaphor, or stereotypes to convey hatred of a particular group, in which hatefulness can be captured by understanding its overall compositional meanings. For example:

(3) Hillary's welfare army doesn't really want jobs. They want more freebies.
(4) Affirmative action means we get affirmatively second rate doctors and other professionals.

Furthermore, our learning architecture has a flavor of co-training (Blum and Mitchell, 1998) in maintaining two learning components that concentrate on different properties of inflammatory language. By modeling distinct aspects of online hate speech, such a learning system is better equipped to combat semantic drift, which often occurs in self-learning where the learned model drifts away from the intended track. Moreover, training two complementary models simultaneously and utilizing both models to identify hate speech of different properties in each iteration of the learning process is important to maintain the learning momentum and to generate models with wide coverage. Indeed, our experimental results have shown that the two-path bootstrapping system is able to jointly identify many more hate speech texts (214,997 vs. 52,958 vs. 112,535) with a significantly higher F-score (48.9% vs. 19.7% vs. 26.1%), when compared to the bootstrapping systems with only the slur term learner and only the neural net classifier. In addition, the evaluation shows that the two-path bootstrapping system identifies 4.4 times more hateful texts than hate speech detection systems that are trained using manually annotated data in a supervised manner.

2 Related Work

Previous studies on hate speech recognition mostly used supervised approaches. Due to the sparsity of hate speech overall in reality, the data selection methods and annotations are often biased towards a specific type of hate speech or hate speech generated in certain scenarios. For instance, Razavi et al. (2010) conducted their experiments on 1525 annotated sentences from a company's log file and a certain newsgroup. Warner and Hirschberg (2012) labeled around 9000 paragraphs from Yahoo!'s news group post and American Jewish Congress's website, and the labeling is restricted to anti-Semitic hate speech. Sood et al. (2012) studied use of profanity on a dataset of 6,500 labeled comments from Yahoo! Buzz. Kwok and Wang (2013) built a balanced corpus of 24582 tweets consisting of anti-black and non-anti black tweets. The tweets were manually selected from Twitter accounts that were believed to be racist based upon their reactions to anti-Obama articles. Burnap and Williams (2014) collected hateful tweets related to the murder of Drummer Lee Rigby in 2013. Waseem and Hovy (2016) collected tweets using hateful slurs, specific hashtags as well as suspicious user IDs. Consequently, all of the 1,972 racist tweets are by 9 users, and the majority of sexist tweets are related to an Australian TV show.

Djuric et al. (2015) were the first to study hate speech using a large-scale annotated data set. They annotated 951,736 online comments from Yahoo!Finance, with 56,280 comments labeled as hateful. Nobata et al. (2016) followed the work of Djuric et al. (2015). In addition to the Yahoo!Finance annotated comments, they also annotated 1,390,774 comments from Yahoo!News. Comments in both data sets were randomly sampled from their corresponding websites with a focus on comments by users who were reported to have posted hateful comments. We instead aim to detect hate speech w.r.t. its real distribution, using a weakly supervised method that does not rely on large amounts of annotations.

The commonly used classification methods in previous studies are logistic regression and Naive Bayes classifiers. Djuric et al. (2015) and Nobata et al. (2016) applied neural network models for training word embeddings, which were further used as features in a logistic regression model for classification. We will instead train a neural net classifier (Kim, 2014; Lai et al., 2015; Zhou et al., 2015) in a weakly supervised manner in order to capture implicit and compositional hate speech expressions.

Xiang et al. (2012) is related to our research because they also used a bootstrapping method to discover offensive language from a large-scale Twitter corpus. However, their bootstrapping model is driven by mining hateful Twitter users, instead of content analysis of tweets as in our approach. Furthermore, they recognize hateful Twitter users by detecting explicit hateful indicators (i.e., keywords) in their tweets while our bootstrapping system aims to detect both explicit and implicit expressions of online hate speech.

3 The Two-path Bootstrapping System for Online Hate Speech Detection

3.1 Overview

Figure 1: Diagram of co-training model

Figure 1 illustrates that our weakly supervised hate speech detection system starts with a few pre-identified slur terms as seeds and a large collection of unlabeled data instances. Specifically, we experiment with identifying hate speech from tweets. Hateful tweets will be automatically identified by matching the large collection of unlabeled tweets with slur term seeds. Tweets that contain one of the seed slur terms are labeled as hateful.

The two-path bootstrapping system consists of two learning components, an explicit slur term learner and a neural net classifier (LSTMs (Hochreiter and Schmidhuber, 1997)), that can capture both explicit and implicit descriptions of online hate speech. Using the initial seed slur term labeled hateful tweets, the two learning components will be initiated simultaneously. The slur term learner will continue to learn additional hateful slur terms. Meanwhile, the neural net classifier will be trained using the automatically labeled hateful tweets as positive instances and randomly sampled tweets as negative instances. Next, both the newly learned slur terms and the trained neural net classifier will be used to identify new hateful content from the unlabeled large collection of tweets. The newly labeled hateful tweets by each of the two learning components will be used to augment the initial slur term seed identified hateful tweet collection, which will be used to learn more slur terms and retrain the classifier in the next iteration. The whole process then iterates.

After each iteration, we have to determine if a stopping criterion is met and we should terminate the bootstrapping process. In general, a tuned threshold score is applied or a small annotated dataset is used to evaluate the learned classifiers. We adopt the latter method. Specifically, the bootstrapping system stops when the precision of the LSTM classifier is lower than 0.6 when evaluated using an existing small annotated tweet set (Waseem and Hovy, 2016).

3.2 Automatic Data Labeling of Initial Data

Seeing a hate slur term in a tweet strongly indicates that the tweet is hateful. Therefore, we use 20 manually selected slur terms to match with a large unlabeled tweet collection in order to quickly construct the initial small set of hateful tweets. Table 1 shows the 20 seed slurs we used.

bimbo     commie    coon      cunt
fag       faggot    feminazi  honky     islamist
libtard   muzzie    paki
skank     subhuman  tranny    twat      wanker

Table 1: Seed slurs

We obtained our initial list of slurs from Hatebase¹, the Racial Slurs Database², and a page of LGBT slang terms³. We ranked the slur terms by their frequencies in tweets, eliminating ambiguous and outdated terms. The slur "gypsy", for example, refers derogatorily to people of Roma descent, but currently in popular usage is an idealization of a trendy bohemian lifestyle. The word "bitch" is ambiguous, sometimes a sexist slur but other times innocuously self-referential or even friendly.

For these reasons, we only selected the top 20 terms we considered reliable (shown in Table 1). We use both the singular and the plural form for each of these seed slur terms.

¹ https://www.hatebase.org
² http://www.rsdb.org
³ https://en.wikipedia.org/wiki/List_of_LGBT_slang_terms

3.3 Slur Term Learner

The slur term learning component extracts individual words from a set of hateful tweets as new slurs. Intuitively, if a word occurs significantly more frequently in hateful tweets than in randomly selected tweets, this term is more likely to be a hateful slur term. Following this intuition, we assign a score to each unique unigram that appears 10 or more times in hateful tweets, and the score is calculated as the relative ratio of its frequency in the labeled hateful tweets over its frequency in the unlabeled set of tweets. Then the slur term learner recognizes a unigram with a score higher than a certain threshold as a new slur. Specifically, we use the threshold score of 100 in identifying individual word slur terms.

The newly identified slur terms will be used to match with unlabeled tweets in order to identify additional hateful tweets. A tweet that contains one of the slur terms is deemed to be a hateful tweet.

While we were aware of other more sophisticated machine learning models, one purpose of this research is to detect and learn new slur terms from constantly generated user data. Therefore, the simple and clean string matching based slur learner is designed to attentively look for specific words that alone can indicate hate speech. In addition, this is in contrast with the second learning component that uses a whole tweet and models its compositional meanings in order to recognize implicit hate speech. These two learners are complementary in the two-path bootstrapping system.

3.4 The LSTM Classifier

We aim to recognize implicit hate speech expressions and capture composite meanings of tweets using a sequence neural net classifier. Specifically, our LSTM classifier has a single layer of LSTM units. The output dimension size of the LSTM layer is 100. A sigmoid layer is built on the top of the LSTM layer to generate predictions. The input dropout rate and recurrent state dropout rate are both set to 0.2. In each iteration of the bootstrapping process, the training of the LSTM classifier runs for 10 epochs.

The input to our LSTM classifier is a sequence of words. We pre-process and normalize tokens in tweets following the steps suggested in (Pennington et al., 2014). In addition, we used the pre-processing of emoji and smiley described in a preprocess tool⁴. Then we retrieve word vector representations from the downloaded⁵ pre-trained word2vec embeddings (Mikolov et al., 2013).

The LSTM classifier is trained using the automatically labeled hateful tweets as positive instances and randomly sampled tweets as negative instances, with the ratio of POS:NEG as 1:10. Then the classifier is used to identify additional hateful tweets from the large set of unlabeled tweets. The LSTM classifier will deem a tweet as hateful if the tweet receives a confidence score of 0.9 or higher. Both the low POS:NEG ratio and the high confidence score are applied to increase the precision of the classifier in labeling hateful tweets and control semantic drift in the bootstrapping learning process. To further combat semantic drift, we applied weighted binary cross-entropy as the loss function in LSTM.

⁴ https://pypi.python.org/pypi/tweet-preprocessor/0.4.0
⁵ https://code.google.com/archive/p/word2vec/

3.5 One vs. Two Learning Paths

As shown in Figure 1, if we remove one of the two learning components, the two-path learning system will be reduced to a usual self-learning system with one single learning path. For instance, if we remove the LSTM classifier, the slur learner will learn new slur terms from initially seed labeled hateful tweets and then identify new hateful tweets by matching newly learned slurs with unlabeled tweets. The newly identified hateful tweets will be used to augment the initial hateful tweet collection and additional slur terms can be learned from the enlarged hateful tweet set. The process then iterates. However, as shown later in the evaluation section, single-path variants of the proposed two-path learning system are unable to receive additional fresh hateful tweets identified by the other learning component and lose learning momentum quickly.

3.6 Tackling Semantic Drifts

Semantic drift is the most challenging problem in distant supervision and bootstrapping. First of all, we argue that the proposed two-path bootstrapping system with two significantly different learning components is designed to reduce semantic drift. According to the co-training theory (Blum and Mitchell, 1998), the more different the two components are, the better. In evaluation, we will show that such a system outperforms single-path bootstrapping systems. Furthermore, we have applied several strategies in controlling noise and imbalance of automatically labeled data, e.g., the high frequency and the high relative frequency thresholds enforced in selecting hate slur terms, as well as the low POS:NEG training sample ratio and the high confidence score of 0.9 used in selecting new data instances for the LSTM classifier.

4 Evaluations

4.1 Tweets Collection

We randomly sampled 10 million tweets from 67 million tweets collected from Oct. 1st to Oct. 24th using Twitter API. These 10 million tweets were used as the unlabeled tweet set in bootstrapping learning. Then we continued to collect 62 million tweets spanning from Oct. 25th to Nov. 15th, essentially two weeks before the US election day and one week after the election. The 62 million tweets will be used to evaluate the performance of the bootstrapped slur term learner and LSTM classifier. The timestamps of all these tweets are converted into EST. By using Twitter API, the collected tweets were randomly sampled to prevent a bias in the data set.

4.2 Supervised Baselines

We trained two supervised models using the 16 thousand annotated tweets that have been used in a recent study (Waseem and Hovy, 2016). The annotations distinguish two types of hateful tweets, sexism and racism, but we merge both categories and only distinguish hateful from non-hateful tweets.

First, we train a traditional feature-based classification model using logistic regression (LR). We apply the same set of features as mentioned in (Waseem and Hovy, 2016). The features include character-level bigrams, trigrams, and four-grams.

In addition, for direct comparisons, we train an LSTM model using the 16 thousand annotated tweets, using exactly the same settings as we use for the LSTM classifier in our two-path bootstrapping system.

4.3 Evaluation Methods

We apply both supervised classifiers and our weakly supervised hate speech detection systems to the 62 million tweets in order to identify hateful tweets that were posted before and after the US election day. We evaluate both precision and recall for both types of systems. Ideally, we can easily measure precision as well as recall for each system if we have ground truth labels for each tweet. However, it is impossible to obtain annotations for such a large set of tweets. The actual distribution of hateful tweets in the 62 million tweets is unknown.

Instead, to evaluate each system, we randomly sampled 1,000 tweets from the whole set of tweets that had been tagged as hateful by the corresponding system. Then we annotate the sampled tweets and use them to estimate precision and recall of the system. In this case,

\[ \text{precision} = \frac{n}{1000} \]

\[ \text{recall} \propto \text{precision} \cdot N \]

Here, n refers to the number of hateful tweets that human annotators identified in the 1,000 sampled tweets, and N refers to the total number of hateful tweets the system tagged in the 62 million tweets. We further calculated system recall by normalizing the product, precision · N, with an estimated total number of hateful tweets that exist in the 62 million tweets, which was obtained by multiplying the estimated hateful tweet rate of 0.6%⁶ with the exact number of tweets in the test set. Finally, we calculate F-score using the calculated recall and precision.

Consistent across the statistical classifiers including both logistic regression classifiers and LSTM models, only tweets that receive a confidence score over 0.9 were tagged as hateful tweets.

⁶ We annotated 5,000 tweets that were randomly sampled during election time and 31 of them were labeled as hateful, therefore the estimated hateful tweet rate is 0.6% (31/5,000).

4.4 Human Annotations

When we annotate system predicted tweet samples, we essentially adopt the same definition of hate speech as used in (Waseem and Hovy, 2016), which considers tweets that explicitly or implicitly propagate stereotypes targeting a specific group whether it is the initial expression or a meta-expression discussing the hate speech itself (i.e. a paraphrase). In order to ensure our annotators have a complete understanding of online hate speech, we asked two annotators to first discuss over a very detailed annotation guideline of hate speech, then annotate separately. This went on for several iterations.

Then we asked the two annotators to annotate the 1,000 tweets that were randomly sampled from all the tweets tagged as hateful by the supervised LSTM classifier. The two annotators reached an inter-agreement Kappa (Cohen, 1960) score of 85.5%. Because one of the annotators became unavailable later in the project, the other annotator annotated the remaining sampled tweets.

4.5 Experimental Results

Supervised Baselines

The first section of Table 2 shows the performance of the two supervised models when applied to 62 million tweets collected around election time. We can see that the logistic regression model suffers from an extremely low precision, which is less than 10%. While this classifier aggressively labeled a large number of tweets as hateful, only 121,512 tweets are estimated to be truly hateful. In contrast, the supervised LSTM classifier has a high precision of around 79%; however, this classifier is too conservative and only labeled a small set of tweets as hateful.

The Two-path Bootstrapping System

Next, we evaluate our weakly supervised classifiers which were obtained using only 20 seed slur terms and a large set of unlabeled tweets. The two-path weakly supervised bootstrapping system ran for four iterations. The second section of Table 2 shows the results for the two-path weakly supervised system. The first two rows show the evaluation results for each of the two learning components in the two-path system, the LSTM classifier and the slur learner, respectively. The third row shows the results for the full system. We can see that the full system Union is significantly better than the supervised LSTM model in terms of recall and F-score. Furthermore, we can see that a significant portion of hateful tweets were identified by both components and the weakly supervised LSTM classifier is especially capable of identifying a large number of hateful tweets. The slur matching component obtains a precision of around 56.5% and can identify roughly 3 times as many hateful tweets as the supervised LSTM classifier. The last row of this section (Union*) shows the performance of our model on a collection of human annotated tweets as introduced in the previous work (Waseem and Hovy, 2016). The recall is rather low because the data we used to train our model is quite different from this dataset, which contains tweets related to a TV show (Waseem and Hovy, 2016). The precision is only slightly lower than previous supervised models that were trained using the same dataset.

Table 3 shows the number of hateful tweets our bootstrapping system identified in each iteration during training. Specifically, the columns Slur Match and LSTMs show the number of hateful tweets identified by the slur learning component and the weakly supervised LSTM classifier respectively. We can see that both learning components steadily label new hateful tweets in each iteration and the LSTM classifier often labels more tweets as hateful compared to slur matching.

Furthermore, we found that many tweets were labeled as hateful by both slur matching and the LSTM classifier. Table 4 shows the number of hateful tweets in each of the three segments: hateful tweets that have been labeled by both components as well as hateful tweets that were labeled by one component only. Note that the three segments of tweets are mutually exclusive. We can see that many tweets were labeled by both components and each component separately labeled some additional tweets as well. This demonstrates that hateful tweets often contain both explicit hate indicator phrases and implicit expressions. Therefore in our two-path bootstrapping system, the hateful tweets identified by slur matching are useful for improving the LSTM classifier, and vice versa. This also explains why our two-path bootstrapping system learns well to identify varieties of hate speech expressions in practice.

One-path Bootstrapping System Variants

Classifier             Precision  Recall  F1      # of Predicted Tweets  # of Estimated Hateful
Supervised Baselines
Logistic Regression    0.088      0.328   0.139   1,380,825              121,512
LSTMs                  0.791      0.132   0.228   62,226                 49,221
The Two-path Weakly Supervised Learning System
LSTMs                  0.419      0.546   0.474   483,298                202,521
Slur Matching          0.565      0.398   0.468   261,183                147,595
Union                  0.422      0.580   0.489   509,897                214,997
Union*                 0.626*     0.258*  0.365*  -                      -
Variations of the Two-path Weakly Supervised Learning System
Slur Matching Only     0.318      0.143   0.197   166,535                52,958
LSTMs Only             0.229      0.303   0.261   491,421                112,535

Table 2: Performance of Different Models
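
The estimation procedure of Section 4.3 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' evaluation script: the function name `estimate_metrics` and its parameters are hypothetical, the annotated count n = 422 is inferred from the reported precision of 0.422, and the test-set size is rounded to 62 million because the exact tweet count is not given in the text.

```python
from typing import NamedTuple


class Estimate(NamedTuple):
    precision: float
    recall: float
    f1: float
    estimated_hateful: float


def estimate_metrics(n_hateful_in_sample: int,
                     n_tagged: int,
                     corpus_size: int,
                     sample_size: int = 1000,
                     hate_rate: float = 0.006) -> Estimate:
    """Estimate precision, recall and F1 from an annotated sample.

    n_hateful_in_sample: n, sampled tweets judged hateful by annotators
    n_tagged:            N, tweets the system tagged as hateful
    corpus_size:         number of tweets in the test set (assumed value)
    hate_rate:           estimated share of hateful tweets (0.6% pilot rate)
    """
    precision = n_hateful_in_sample / sample_size
    estimated_hateful = precision * n_tagged      # precision * N
    total_hateful = hate_rate * corpus_size       # normalizer for recall
    recall = estimated_hateful / total_hateful
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return Estimate(precision, recall, f1, estimated_hateful)


# Union row of Table 2: N = 509,897 tagged tweets; n = 422 is an assumption.
union = estimate_metrics(422, 509_897, corpus_size=62_000_000)
print(f"P={union.precision:.3f} R={union.recall:.3f} F1={union.f1:.3f}")
```

Under these assumptions the Union row of Table 2 is approximately recovered; the remaining gap in recall and in the estimated number of hateful tweets is consistent with the rounded corpus size and the inferred value of n.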

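The contrast between the two-path system and its single-path variants (Section 3.5) can be made concrete with a small Python sketch. It is a simplification, not the authors' implementation, and rests on several assumptions: whitespace tokenization, a score computed as the ratio of relative frequencies, negatives sampled from tweets not yet labeled hateful, and a loop that stops when no new tweets are found (the paper instead stops when LSTM precision on an annotated set falls below 0.6). The `Trainer` interface, the `overlap_trainer` stand-in for the LSTM classifier, and the toy tweets with placeholder tokens `slur_a` and `slur_b` are all hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Set, Tuple

Scorer = Callable[[str], float]
Trainer = Callable[[List[str], List[str]], Scorer]  # hypothetical interface


def tokenize(tweet: str) -> List[str]:
    # Assumption: lowercase and split on whitespace.
    return tweet.lower().split()


def match(tweets: List[str], slurs: Set[str]) -> Set[str]:
    """Tweets containing at least one slur term (plain string matching)."""
    return {t for t in tweets if slurs & set(tokenize(t))}


def learn_slurs(hateful: Set[str], unlabeled: List[str],
                min_count: int, threshold: float) -> Set[str]:
    """Unigrams seen at least `min_count` times in hateful tweets whose
    relative frequency there exceeds `threshold` times their relative
    frequency in the unlabeled set."""
    hate = Counter(tok for t in hateful for tok in tokenize(t))
    everything = Counter(tok for t in unlabeled for tok in tokenize(t))
    hate_total, all_total = sum(hate.values()), sum(everything.values())
    learned = set()
    for tok, count in hate.items():
        if count < min_count or everything[tok] == 0:
            continue
        ratio = (count / hate_total) / (everything[tok] / all_total)
        if ratio > threshold:
            learned.add(tok)
    return learned


def bootstrap(unlabeled: List[str], seeds: Set[str],
              trainer: Optional[Trainer] = None, use_slur_path: bool = True,
              max_iters: int = 4, min_count: int = 10,
              threshold: float = 100.0, confidence: float = 0.9,
              neg_ratio: int = 10, seed: int = 0) -> Tuple[Set[str], Set[str]]:
    rng = random.Random(seed)
    slurs = set(seeds)
    hateful = match(unlabeled, slurs)          # seed-labeled tweets
    for _ in range(max_iters):
        new: Set[str] = set()
        if use_slur_path:                      # path 1: explicit slur terms
            slurs |= learn_slurs(hateful, unlabeled, min_count, threshold)
            new |= match(unlabeled, slurs)
        if trainer is not None:                # path 2: classifier
            pool = [t for t in unlabeled if t not in hateful]
            k = min(len(pool), neg_ratio * len(hateful))  # POS:NEG up to 1:10
            score = trainer(sorted(hateful), rng.sample(pool, k))
            new |= {t for t in unlabeled if score(t) >= confidence}
        if new <= hateful:                     # simplified stopping rule
            break
        hateful |= new                         # both paths feed one pool
    return hateful, slurs


def overlap_trainer(positives: List[str], negatives: List[str]) -> Scorer:
    # Crude stand-in for the LSTM: flags tweets sharing two or more tokens
    # with any positive tweet. It ignores the negatives entirely.
    pos_sets = [set(tokenize(t)) for t in positives]
    return lambda tweet: 1.0 if any(
        len(set(tokenize(tweet)) & p) >= 2 for p in pos_sets) else 0.0


TWEETS = [
    "slur_a slur_b go home", "slur_a slur_b again",
    "slur_b people everywhere", "they just want freebies",
    "slur_a crowd want freebies", "nice weather today",
    "lunch was great today", "want coffee today",
]
```

On this toy data, with `min_count=2` and `threshold=1.6`, each single-path variant misses a tweet that the other path finds, while the two-path run collects both. The thresholds are scaled down only because the toy corpus is tiny; the paper uses a minimum count of 10 and a threshold of 100.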
In order to understand how necessary it is to maintain two learning paths for online hate speech detection, we also ran two experiments with one learning component removed from the loop each time. Therefore, the reduced bootstrapping systems can only repeatedly learn explicit hate speech (with the slur learner) or implicit hateful expressions (with the LSTM classifier).

The third section of Table 2 shows the evaluation results of the two single-path variants of the weakly supervised system. We can see that the estimated precision, recall, and F-score, as well as the estimated number of truly hateful tweets, of the two systems are significantly lower than those of the complete two-path bootstrapping system, which suggests that our two-path learning system can effectively capture diverse descriptions of online hate speech, maintain learning momentum, and effectively combat noise in online texts.

Iteration  Prev     Slur Match  LSTMs
1          8,866    422         3,490
2          12,776   4,890       13,970
3          27,274   6,299       21,579
4          50,721   9,895       22,768

Table 3: Number of Labeled Tweets in Each Iteration

Intersection  LSTM Only  Slur Only
234,584       248,714    26,599

Table 4: Number of Hateful Tweets in Each Segment

5 Analysis

5.1 Analysis of the Learned Hate Indicators

We have learned 306 unigram phrases using the slur term learning component. Among them, only 45 phrases were seen in existing hate slur databases while the other terms, 261 phrases in total, were only identified in real-world tweets. Table 5 shows some of the newly discovered hate indicating phrases. Our analysis shows that 86 of the newly discovered hate indicators are strong hate slur terms and the remaining 175 indicators are related to discussions of identity and politics such as 'supremacist' and 'Zionism'.

berk      chavs     degenerates  douches
facist    hag       heretics     jihadists
lesbo     pendejo   paedo        pinche
retards   satanist  scum         scumbag
slutty    tards     unamerican   wench

Table 5: New slurs learned by our model

5.2 Analysis of LSTM Identified Hateful Tweets

The LSTM labeled 483,298 tweets as hateful, and 172,137 of them do not contain any of the original seed slurs or our learned indicator phrases. The following are example hateful tweets that have no explicit hate indicator phrase:

(1) @janh2h The issue is that internationalists keep telling outsiders that they're just as entitled to the privileges of the tribe as insiders.
(2) This is disgusting! Christians are very tolerant people but Muslims are looking to wipe us our and dominate us! Sen https://t.co/7DMTIrOLyw

We can see that the hatefulness of these tweets is determined by their overall compositional meanings rather than a hate-indicating slur.

5.3 Error Analysis

The error of our model comes from semantic drift in bootstrapping learning, which partially results from the complexity and dynamics of language. Specifically, we found dynamic word senses of slurs and natural drifting of word semantics. Many slur terms are ambiguous and have multiple word senses. For instance, "Chink", an anti-Asian epithet, can also refer to a patch of light from a small aperture. Similarly, "Negro" is a toponym in addition to a racial slur. Further, certain communities have reclaimed slur words. Though the word "dyke" is derogatory towards lesbians, for example, some use it self-referentially to destigmatize it, a phenomenon we sometimes encountered.

5.4 Temporal Distributions of Tagged Hateful Tweets

By applying our co-training model to the 62-million-tweet corpus, we found around 510 thousand tweets labeled as hateful in total.

Figure 2: Temporal Distribution of Hateful Tweets

Figure 2 displays the temporal distribution of hateful tweets. There is a spike in hateful tweets from Nov. 7th to Nov. 12th in terms of both number of hateful tweets and ratio of hateful tweets to total tweets.

5.5 Most Frequent Mentions and Hashtags of Tagged Hateful Tweets

Tables 6 and 7 show the top 30 most frequent mentions and hashtags in hateful tweets. They are ranked by frequency from left to right and from top to bottom. It is clear that the majority of mentions found in tweets tagged as hateful address polarizing political figures (e.g., @realDonaldTrump and @HillaryClinton), indicating that hate speech is often fueled by partisan warfare. Other common mentions include news sources, such as Politico and MSNBC, which further supports that "trigger" events in the news can generate inflammatory responses among Twitter users. Certain individual Twitter users also received a sizable number of mentions. @mitchellvii is a conservative activist whose tweets lend unyielding support to Donald Trump. Meanwhile, Twitter user @purplhaze42 is a self-proclaimed anti-racist and anti-Zionist. Both figured among the most popular recipients of inflammatory language.

Table 7 shows that the majority of hashtags also indicate the political impetus behind hate speech, with hashtags such as #Trump and #MAGA (Make America Great Again, Trump's campaign slogan) among the most frequent. The specific televised events also engender proportionally large amounts of hateful language as they can be commonly experienced by all television-owning Americans and are therefore a widely available target for hateful messages.

@realDonaldTrump  @HillaryClinton  @megynkelly
@CNN              @FoxNews         @newtgingrich
@nytimes          @YouTube         @POTUS
@KellyannePolls   @MSNBC           @seanhannity
@washingtonpost   @narendramodi    @CNNPolitics
@PrisonPlanet     @guardian        @JoyAnnReid
@BarackObama      @thehill         @BreitbartNews
@politico         @ABC             @AnnCoulter
@jaketapper       @ArvindKejriwal  @FBI
@mitchellvii      @purplhaze42     @SpeakerRyan

Table 6: List of Top 30 Mentions in Hateful Tweets During Election Days

#Trump           #ElectionNight  #Election2016
#MAGA            #trndnl         #photo
#nowplaying      #Vocab          #NotMyPresident
#ElectionDay     #trump          #ImWithHer
#halloween       #cdnpoli        #Latin
#Hillary         #WorldSeries    #1
#Brexit          #Spanish        #auspol
#notmypresident  #C51            #NeverTrump
#hiring          #bbcqt          #USElection2016
#tcot            #TrumpProtest   #XFactor

Table 7: List of Top 30 Hashtags in Hateful Tweets During Election Days

6 Conclusions

Our work focuses on the need to capture both explicit and implicit hate speech from an unbiased corpus. To address these issues, we proposed a weakly supervised two-path bootstrapping model to identify hateful language in randomly sampled tweets. Starting from 20 seed slur terms, we found 210 thousand hateful tweets from 62 million tweets collected during the election. Our analysis shows a strong correlation between temporal distributions of hateful tweets and the election time, as well as the partisan impetus behind large amounts of inflammatory language. In the future, we will look into linguistic phenomena that often occur in hate speech, such as sarcasm and humor, to further improve hate speech detection performance.

References

Avrim Blum and Tom Mitchell. 1998. Combining labeled and unlabeled data with co-training. In Proceedings of the eleventh annual conference on Computational learning theory. ACM, pages 92–100.

Peter Burnap and Matthew Leighton Williams. 2014. Hate speech, machine classification and statistical modelling of information flows on twitter: Interpretation and communication for policy decision making. In Proceedings of the Internet, Politics, and Policy conference.

Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and psychological measurement 20(1):37–46.

Nemanja Djuric, Jing Zhou, Robin Morris, Mihajlo Grbovic, Vladan Radosavljevic, and Narayan Bhamidipati. 2015. Hate speech detection with comment embeddings. In Proceedings of the 24th International Conference on World Wide Web. ACM, pages 29–30.

Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang. 2016. Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, pages 145–153.

Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. Glove: Global vectors for word representation. In EMNLP. volume 14, pages 1532–1543.

Amir H Razavi, Diana Inkpen, Sasha Uritsky, and Stan Matwin. 2010. Offensive language detection using multi-level classification. In Canadian Conference on Artificial Intelligence. Springer, pages 16–27.

Sara Sood, Judd Antin, and Elizabeth Churchill. 2012. Profanity use in online communities. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, pages 1481–1490.

William Warner and Julia Hirschberg. 2012. Detecting hate speech on the world wide web. In Proceedings of the Second Workshop on Language in Social Media. Association for Computational Linguistics, pages 19–26.

Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? predictive features for hate speech detection on twitter. In Proceedings of NAACL-HLT. pages 88–93.

Guang Xiang, Bin Fan, Ling Wang, Jason Hong, and Carolyn Rose. 2012. Detecting offensive tweets via topical feature discovery over a large scale twitter corpus. In Proceedings of the 21st ACM international conference on Information and knowledge management. ACM, pages 1980–1984.

Chunting Zhou, Chonglin Sun, Zhiyuan Liu, and Francis Lau. 2015. A c-lstm neural network for text classification. arXiv preprint arXiv:1511.08630.

Sepp Hochreiter and Jurgen¨ Schmidhuber. 1997. Long short-term memory. Neural computation 9(8):1735–1780.

Yoon Kim. 2014. Convolutional neural net- works for sentence classification. arXiv preprint arXiv:1408.5882 .

Irene Kwok and Yuzhou Wang. 2013. Locate the hate: Detecting tweets against blacks. In AAAI.

Siwei Lai, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Recurrent convolutional neural networks for text classification. In AAAI. volume 333, pages 2267– 2273.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Cor- rado, and Jeff Dean. 2013. Distributed representa- tions of words and phrases and their compositional- ity. In Advances in neural information processing systems. pages 3111–3119.