arXiv:1710.07394v2 [cs.CL] 22 May 2018
Recognizing Explicit and Implicit Hate Speech Using a Weakly Supervised Two-path Bootstrapping Approach

Lei Gao (Texas A&M University), Alexis Kuppersmith (Stanford University), Ruihong Huang (Texas A&M University)

Abstract

In the wake of a polarizing election, social media is laden with hateful content. To address various limitations of supervised hate speech classification methods, including corpus bias and the huge cost of annotation, we propose a weakly supervised two-path bootstrapping approach for an online hate speech detection model leveraging large-scale unlabeled data. This system significantly outperforms hate speech detection systems that are trained in a supervised manner using manually annotated data. Applying this model to a large quantity of tweets collected before, after, and on election day reveals motivations and patterns of inflammatory language.

1 Introduction

Following a turbulent election season, 2016's digital footprint is awash with hate speech. Apart from censorship, the goals of enabling computers to understand inflammatory language are many. Sensing increased proliferation of hate speech can elucidate public opinion surrounding polarizing events. Identifying hateful declarations can bolster security by revealing individuals harboring malicious intentions towards specific groups.

Recent studies on supervised methods for online hate speech detection (Waseem and Hovy, 2016; Nobata et al., 2016) have relied on manually annotated datasets, which are not only costly to create but also likely to be insufficient to obtain wide-coverage hate speech detection systems. This is mainly because online hate speech is relatively infrequent (among large amounts of online content) and tends to transform rapidly following a new trigger event. Our pilot annotation experiment with 5,000 randomly selected tweets shows that around 0.6% (31 tweets) of tweets are hateful. The mass-scale hate speech annotation effort from Yahoo! on Yahoo! Finance online comments (Nobata et al., 2016) revealed that only 5.9% of online comments contained hate speech. Therefore, large amounts of online text need to be annotated to adequately identify hate speech. In recent studies (Waseem and Hovy, 2016; Kwok and Wang, 2013), the data selection methods and annotations are often biased towards a specific type of hate speech, or hate speech generated in certain scenarios, in order to increase the ratio of hate speech content in the annotated data sets. This, however, made the resulting annotations too distorted to reflect the true distribution of hate speech. Furthermore, inflammatory language changes dramatically following new hate "trigger" events, which significantly devalues annotated data.

To address the various limitations of supervised hate speech detection methods, we present a weakly supervised two-path bootstrapping approach for online hate speech detection that requires minimal human supervision and can be easily retrained and adapted to capture new types of inflammatory language. Our two-path bootstrapping architecture consists of two learning components, an explicit slur term learner and a neural net classifier (LSTMs (Hochreiter and Schmidhuber, 1997)), that can capture both explicit and implicit phrasings of online hate speech.

Specifically, our bootstrapping system starts with automatically labeled online hateful content that is identified by matching a large collection of unlabeled online content against several hateful slur terms. Then the two learning components are initiated simultaneously. A slur term learner learns additional hateful slur terms from the automatically identified hateful content. Meanwhile, a neural net classifier is trained using the automatically labeled hateful content as positive instances and randomly sampled online content as negative instances. Next, both string matching with the newly learned slur terms and the trained neural net classifier are used to recognize new hateful content in the large unlabeled collection of online content. The hateful content newly identified by each of the two learning components is then used to augment the initially identified hateful content, which in turn is used to learn more slur terms and retrain the classifier. The whole process iterates.

The design of the two-path bootstrapping system is motivated mainly by the need to capture both explicit and implicit inflammatory language. Explicit hate speech is easily identifiable by recognizing a clearly hateful word or phrase. For example:

(1) Don't talk to me from an anonymous account you faggot coward, whither up and die.
(2) And that's the kind of people who support Trump! Subhumans!

In contrast, implicit hate speech employs circumlocution, metaphor, or stereotypes to convey hatred of a particular group, and its hatefulness can be captured only by understanding the overall compositional meaning. For example:

(3) Hillary's welfare army doesn't really want jobs. They want more freebies.
(4) Affirmative action means we get affirmatively second rate doctors and other professionals.

Furthermore, our learning architecture has a flavor of co-training (Blum and Mitchell, 1998) in maintaining two learning components that concentrate on different properties of inflammatory language. By modeling distinct aspects of online hate speech, such a learning system is better equipped to combat semantic drift, which often occurs in self-learning when the learned model drifts away from the intended track. Moreover, training two complementary models simultaneously, and using both models to identify hate speech of different properties in each iteration of the learning process, is important for maintaining the learning momentum and for generating models with wide coverage. Indeed, our experimental results show that the two-path bootstrapping system is able to jointly identify many more hate speech texts (214,997 vs. 52,958 vs. 112,535) with a significantly higher F-score (48.9% vs. 19.7% vs. 26.1%) when compared to the bootstrapping systems with only the slur term learner and only the neural net classifier, respectively. In addition, the evaluation shows that the two-path bootstrapping system identifies 4.4 times more hateful texts than hate speech detection systems that are trained using manually annotated data in a supervised manner.

2 Related Work

Previous studies on hate speech recognition mostly used supervised approaches. Because hate speech is sparse in real data, the data selection methods and annotations are often biased towards a specific type of hate speech or hate speech generated in certain scenarios. For instance, Razavi et al. (2010) conducted their experiments on 1,525 annotated sentences from a company's log file and a certain newsgroup. Warner and Hirschberg (2012) labeled around 9,000 paragraphs from Yahoo!'s news group posts and the American Jewish Congress's website, and the labeling is restricted to anti-Semitic hate speech. Sood et al. (2012) studied the use of profanity on a dataset of 6,500 labeled comments from Yahoo! Buzz. Kwok and Wang (2013) built a balanced corpus of 24,582 tweets consisting of anti-black and non-anti-black tweets. The tweets were manually selected from Twitter accounts that were believed to be racist based upon their reactions to anti-Obama articles. Burnap and Williams (2014) collected hateful tweets related to the murder of Drummer Lee Rigby in 2013. Waseem and Hovy (2016) collected tweets using hateful slurs, specific hashtags, and suspicious user IDs. Consequently, all of the 1,972 racist tweets are by 9 users, and the majority of sexist tweets are related to an Australian TV show.

Djuric et al. (2015) were the first to study hate speech using a large-scale annotated data set. They annotated 951,736 online comments from Yahoo! Finance, with 56,280 comments labeled as hateful. Nobata et al. (2016) followed the work of Djuric et al. (2015). In addition to the Yahoo! Finance annotated comments, they also annotated 1,390,774 comments from Yahoo! News. Comments in both data sets were randomly sampled from their corresponding websites, with a focus on comments by users who were reported to have posted hateful comments. We instead aim to detect hate speech with respect to its real distribution, using a weakly supervised method that does not rely on large amounts of annotations.

The commonly used classification methods in previous studies are logistic regression and Naive Bayes classifiers. Djuric et al. (2015) and Nobata et al. (2016) applied neural network models for training word embeddings, which were further used as features in a logistic regression model for classification. We instead train a neural net classifier (Kim, 2014; Lai et al., 2015; Zhou et al., 2015) in a weakly supervised manner in order to capture implicit and compositional hate speech expressions.

Xiang et al. (2012) is related to our research because they also used a bootstrapping method to discover offensive language from a large-scale [...]

[...] with slur term seeds. Tweets that contain one of the seed slur terms are labeled as hateful.

The two-path bootstrapping system consists of two learning components, an explicit slur term learner and a neural net classifier (LSTMs (Hochreiter and Schmidhuber, 1997)), that can capture both explicit and implicit descriptions of online hate speech. Using the hateful tweets initially labeled by the seed slur terms, the two learning components are initiated simultaneously. The slur term learner continues to learn additional hateful slur terms. Meanwhile, the neural net classifier is trained using the automatically labeled hateful tweets as positive instances and randomly sampled tweets as negative instances. Next, both the newly learned slur terms and the trained neural net classifier are used to identify new hateful content in the large unlabeled collection of tweets.
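
To make the control flow of the two paths concrete, the following is a minimal Python sketch of the loop described above, not the authors' implementation. Several parts are assumptions made only for illustration: the LSTM classifier is replaced by a small bag-of-words log-odds scorer so that the example runs with the standard library alone; the slur term learner uses a hypothetical frequency-and-ratio heuristic (`min_count`, `min_ratio`); negatives are sampled from tweets not yet labeled hateful; and `slur_a` and `slur_b` are neutral placeholder tokens. All function names and thresholds are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    return text.lower().split()


def match_terms(tweets, terms):
    """Indices of tweets that contain at least one of the given terms."""
    return {i for i, t in enumerate(tweets) if terms & set(tokenize(t))}


def learn_slur_terms(tweets, hateful, known_terms, min_count=2, min_ratio=0.6):
    """Hypothetical slur term learner: keep tokens that occur in at least
    `min_count` hateful tweets and mostly (>= `min_ratio`) in hateful tweets."""
    in_hate, overall = Counter(), Counter()
    for i, t in enumerate(tweets):
        toks = set(tokenize(t))
        overall.update(toks)
        if i in hateful:
            in_hate.update(toks)
    return {w for w, c in in_hate.items()
            if w not in known_terms and c >= min_count
            and c / overall[w] >= min_ratio}


def train_classifier(tweets, positives, negatives):
    """Stand-in for the neural net classifier (assumption: NOT an LSTM).
    Returns a scorer giving the mean smoothed log-odds of a tweet's tokens."""
    pos, neg = Counter(), Counter()
    for i in positives:
        pos.update(tokenize(tweets[i]))
    for i in negatives:
        neg.update(tokenize(tweets[i]))

    def score(text):
        toks = tokenize(text)
        total = sum(math.log((pos[w] + 1) / (neg[w] + 1)) for w in toks)
        return total / max(len(toks), 1)

    return score


def bootstrap(tweets, seed_terms, iterations=3, threshold=0.5, seed=0):
    rng = random.Random(seed)
    terms = set(seed_terms)
    hateful = match_terms(tweets, terms)  # seed labeling by string matching
    for _ in range(iterations):
        # Path 1: learn new slur terms, then match them against all tweets.
        terms |= learn_slur_terms(tweets, hateful, terms)
        by_terms = match_terms(tweets, terms)
        # Path 2: train on current positives and randomly sampled negatives.
        rest = [i for i in range(len(tweets)) if i not in hateful]
        negatives = rng.sample(rest, min(len(rest), len(hateful)))
        score = train_classifier(tweets, hateful, negatives)
        by_clf = {i for i in rest if score(tweets[i]) > threshold}
        # Augment the hateful pool with what either path found, then repeat.
        new = (by_terms | by_clf) - hateful
        if not new:
            break
        hateful |= new
    return hateful, terms


# Toy data with placeholder tokens standing in for slur terms.
TWEETS = [
    "slur_a slur_b get out",
    "slur_a and slur_b again",
    "those slur_b ruin everything",
    "nice weather today",
    "get out and enjoy the weather",
]
hateful, terms = bootstrap(TWEETS, {"slur_a"})
print(sorted(hateful), sorted(terms))
```

In this toy run the seed term labels the first two tweets, the term learner then picks up `slur_b` from them, and string matching with the enlarged term set adds the third tweet. Keeping the two paths separate and merging their outputs only when the hateful pool is augmented mirrors the co-training flavor discussed in the introduction.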