Demographic Word Embeddings for Racism Detection on Twitter
Mohammed Hasanuzzaman (ADAPT Centre, Dublin City University, Ireland)
Gaël Dias (GREYC UMR 6072, Université de Caen Normandie, France)
Andy Way (ADAPT Centre, Dublin City University, Ireland)

Abstract

Most social media platforms grant users freedom of speech by allowing them to freely express their thoughts, beliefs, and opinions. Although this represents incredible and unique communication opportunities, it also presents important challenges. Online racism is such an example. In this study, we present a supervised learning strategy to detect racist language on Twitter based on word embeddings that incorporate demographic (Age, Gender, and Location) information. Our methodology achieves reasonable classification accuracy over a gold standard dataset (F1=76.3%) and significantly improves over the classification performance of demographic-agnostic models.

1 Introduction

The advent of microblogging services has impacted the way people think, communicate, behave, learn, and conduct their daily activities. In particular, the lack of regulation has made social media an attractive tool for people to express online their thoughts, beliefs, emotions and opinions. However, this transformative potential goes with the challenge of maintaining a complex balance between freedom of expression and the defense of human dignity (Silva et al., 2016). Indeed, some users misuse the medium to promote offensive and hateful language, which mars the experience of regular users, affects the business of online companies, and may even have severe real-life consequences (Djuric et al., 2015). In the latter case, several studies (Priest et al., 2013; Tynes et al., 2008; Paradies, 2006b; Darity Jr., 2003) evidenced strong correlations between experiences of racial discrimination and negative mental health outcomes such as depression, anxiety, and emotional stress, as well as negative physical health outcomes such as high blood pressure and low infant birth weight.

As the information contained in social media often reflects the real-world experiences of its users, there is an increased expectation that the norms of society will also apply in social media settings. As such, there is an increasing demand for social media platforms to empower users with the tools to report offensive and hateful content (Oboler and Connelly, 2014).

Hateful content can be defined as "speech or expression which is capable of instilling or inciting hatred of, or prejudice towards, a person or group of people on a specified ground, including race, nationality, ethnicity, country of origin, ethno-religious identity, religion, sexuality, gender identity or gender" (Gelber and Stone, 2007). While there are many forms of hate speech, racism is the most general and prevalent form of hate speech on Twitter (Silva et al., 2016). Racist speech relates to a socially constructed idea about differences between social groups based on phenotype, ancestry, culture or religion (Paradies, 2006a) and covers the categories of race (e.g. black people), ethnicity (e.g. Chinese people), and religion (e.g. Jewish people) introduced in Silva et al. (2016).

Racism is often expressed through negative and inaccurate stereotypes with one-word epithets (e.g. tiny), phrases (e.g. big nose), concepts (e.g. head bangers), metaphors (e.g. coin slot), and juxtapositions (e.g. yellow cab) that convey hateful intents (Warner and Hirschberg, 2012). As such, its automatic identification is a challenging task. Moreover, racist language is not uniform. First, it highly depends on contextual features of the targeted community. For example, anti-African-American messages often refer to unemployment or single-parent upbringing, whereas anti-semitic language predominantly makes reference to money, banking, and media (Warner and Hirschberg, 2012). Second, the demographic context of the speaker may greatly influence word choices, syntax, and even semantics (Hovy, 2015). For instance, a young female rapper from Australia may not use the exact same language as an elderly male economist from South Africa to express her racist thoughts (Figure 1).

[Figure 1: (Top) Tweet from a 25-year-old Australian rapper. (Bottom) Tweet from a senior South African economist.]

To address these issues, we propose to focus on demographic features (age, gender, and location) of Twitter users to learn demographic word embeddings, following the ideas of Bamman et al. (2014) for geographically situated language. The distributed representations learned from a large corpus containing a great proportion of racist tweets are then used to represent tweets in a straightforward way and serve as input to build an accurate supervised classification model for racist tweet detection. Different evaluations over a gold-standard dataset show that the demographic (age, gender, location) word embeddings methodology achieves an F1 score of 76.3% and significantly improves over the classification performance of demographic-agnostic models.
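As an illustration of this modeling idea, the sketch below implements demographic word embeddings in the additive spirit of Bamman et al. (2014): each word receives a shared vector plus a deviation vector per demographic group, trained with skip-gram negative sampling. This is a minimal sketch, not the exact model used in this paper; the class name DemographicSkipGram, the toy dimensions, and the training loop are illustrative assumptions.

    import torch
    import torch.nn as nn

    class DemographicSkipGram(nn.Module):
        """Skip-gram with negative sampling where the vector of a word is the
        sum of a shared embedding and a deviation embedding specific to the
        speaker's demographic group (additive formulation in the spirit of
        Bamman et al., 2014)."""

        def __init__(self, vocab_size, n_groups, dim=100):
            super().__init__()
            self.vocab_size = vocab_size
            self.main = nn.Embedding(vocab_size, dim)               # shared word vectors
            self.delta = nn.Embedding(n_groups * vocab_size, dim)   # per-group deviations
            self.ctx = nn.Embedding(vocab_size, dim)                # output (context) vectors
            nn.init.zeros_(self.delta.weight)                       # deviations start at zero

        def word_vector(self, word, group):
            # Representation of `word` when written by a member of `group`.
            return self.main(word) + self.delta(group * self.vocab_size + word)

        def forward(self, center, context, negatives, group):
            v = self.word_vector(center, group)                     # (batch, dim)
            pos = (v * self.ctx(context)).sum(-1)                   # (batch,)
            neg = torch.bmm(self.ctx(negatives), v.unsqueeze(-1)).squeeze(-1)  # (batch, k)
            # Negative-sampling objective: real pairs scored high, sampled noise low.
            return -(torch.log(torch.sigmoid(pos) + 1e-9).mean()
                     + torch.log(torch.sigmoid(-neg) + 1e-9).mean())

    # Hypothetical setup: three age brackets as the demographic attribute.
    model = DemographicSkipGram(vocab_size=50000, n_groups=3)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # for center, context, negatives, group in batches:  # batches drawn from the tweet corpus
    #     optimizer.zero_grad()
    #     loss = model(center, context, negatives, group)
    #     loss.backward()
    #     optimizer.step()

At classification time, a tweet can then be represented, for example, by averaging the word vectors selected according to its author's age, gender, and location.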
2 Related Work

Despite the prevalence and large impact of online hateful speech, there has been a lack of published works addressing this problem. The first studies on hateful speech detection were proposed by Xu and Zhu (2010), Kwok and Wang (2013) and Warner and Hirschberg (2012), but with very limited scope and basic supervised machine learning techniques.

One of the very first attempts to propose a computational model that deals with offensive language in online communities is by Xu and Zhu (2010). The authors propose a sentence-level rule-based semantic filtering approach, which utilizes grammatical relations among words to remove offensive content from sentences in Youtube comments. Although this is valuable work, its scope deviates from our specific goal of automatically detecting racist online messages.

The first identified studies that directly match our objectives are those of Kwok and Wang (2013) and Warner and Hirschberg (2012). In Kwok and Wang (2013), the authors propose a naïve Bayes classifier based on unigram features to classify tweets as racist or non-racist. It is important to note that the standard data sets of racist tweets were selected from Twitter accounts that were self-classified as racist or deemed racist through reputable news sources with regard to anti-Barack-Obama articles. Although this is the very first attempt to deal with Twitter posts, the final model is of very limited scope as it mainly deals with anti-black community racism.
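For concreteness, such a unigram naïve Bayes baseline can be sketched in a few lines with scikit-learn; the toy tweets and labels below are placeholders, not the data of Kwok and Wang (2013):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Placeholder data; Kwok and Wang (2013) draw racist tweets from accounts
    # self-classified as racist or deemed racist by reputable news sources.
    tweets = ["some hateful tweet text here", "an ordinary tweet about football"]
    labels = [1, 0]  # 1 = racist, 0 = non-racist

    # Unigram counts feeding a multinomial naive Bayes classifier.
    classifier = make_pipeline(CountVectorizer(ngram_range=(1, 1)), MultinomialNB())
    classifier.fit(tweets, labels)
    print(classifier.predict(["a new tweet to classify"]))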
In Warner and Hirschberg (2012), the authors propose a similar idea, exclusively focusing on the detection of anti-semitic posts from Yahoo! and xenophobic URLs identified by the American Jewish Congress. In particular, they use a template-based strategy to generate features from the part-of-speech tagged corpus augmented with Brown clusters. Then, a model is generated for each type of feature template strategy, resulting in six SVM classifiers. Surprisingly, the smallest feature set, comprised of only positive unigrams, performed better than bigram and trigram features. Like the work of Kwok and Wang (2013), this study focuses on a specific community and is of limited scope; however, efforts were made to propose new feature sets, although with unsuccessful results.

Burnap and Williams (2016) is certainly the first study to propose a global understanding of hateful speech on Twitter, including a wide range of protected characteristics such as race, disability, sexual orientation and religion. First, they build individual SVM classifiers for each of the four hateful-speech categories using combinations of feature types: ngrams (1 to 5), slur words and typed dependencies. Overall, the highest F1 scores are achieved by combining ngrams and hateful terms, although the inclusion of typed dependencies reduces false positive rates. In a second step, the authors build a data-driven blended model where more than one protected characteristic is taken at a time and show that hateful speech is a domain-dependent problem. The important finding of this work is the relevance of the simple dictionary lookup feature of slur words.

In Waseem and Hovy (2016), the authors study the importance of demographic features (gender and location) for hateful speech (racism, sexism and others) classification on Twitter, and propose to deal with data sparseness by substituting word ngrams with character ngram (1 to 4) features. 10-fold cross-validation results with a logistic regression show that the demographic features (individually or combined) do not lead to any improvement, while character-based classification outperforms word-based classification by at least 5 F1 points.

[...] the automated expansion of the dictionary slightly boosts the performance, but with no statistical significance. From our point of view, this may be due to the non-specificity of the corpus used to produce the word embeddings.

Indeed, more successful results are obtained by Djuric et al. (2015), who propose to build specific paragraph embeddings for hateful speech. In particular, they present a two-step method. First, paragraph2vec (Le and Mikolov, 2014) is used to jointly model comments and words from Yahoo! comments with the bag-of-words neural language model. Then, these embeddings are used to train a binary classifier to distinguish between hateful and clean comments. The results of the logistic regression under 5-fold cross-validation show improved performance of the continuous representation when compared to discrete ngram features.

In this paper, we propose to study further the initial "unsuccessful" idea of Waseem and Hovy (2016).
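Their character-based setup can be sketched as follows; this is a minimal illustration with scikit-learn, in which the toy data and the TF-IDF weighting are our assumptions rather than a reproduction of their exact configuration:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    # Toy data, repeated so that 10-fold cross-validation can run.
    tweets = ["placeholder hateful tweet", "placeholder harmless tweet"] * 10
    labels = [1, 0] * 10

    # Character 1- to 4-grams mitigate the sparseness of word ngrams on tweets.
    classifier = make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(1, 4)),
        LogisticRegression(max_iter=1000),
    )
    # Waseem and Hovy (2016) report 10-fold cross-validation with logistic regression.
    print(cross_val_score(classifier, tweets, labels, cv=10, scoring="f1").mean())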