Arxiv:2108.10724V1 [Cs.CL] 19 Aug 2021

How Hateful are Movies? A Study and Prediction on Movie Subtitles Niklas von Boguszewski ∗ Sana Moin ∗ Anirban Bhowmick ∗ [email protected] [email protected] [email protected] Seid Muhie Yimam Chris Biemann [email protected] [email protected] Language Technology Group Universität Hamburg, Germany Abstract • Hope one of those bitches falls over and breaks her leg. In this research, we investigate techniques to detect hate speech in movies. We introduce a Several sensitive comments on social media new dataset collected from the subtitles of six platforms have led to crime against minorities movies, where each utterance is annotated ei- (Williams et al., 2020). Hate speech can be consid- ther as hate, offensive or normal. We apply transfer learning techniques of domain adapta- ered as an umbrella term that different authors have tion and fine-tuning on existing social media coined with different names. Xu et al.(2012); Hos- datasets, namely from Twitter and Fox News. seinmardi et al.(2015); Zhong et al.(2016) referred We evaluate different representations, i.e., Bag it by the term cyberbully-ing, while Davidson et al. of Words (BoW), Bi-directional Long short- (2017) used the term offensive language to some term memory (Bi-LSTM), and Bidirectional expressions that can be strongly impolite, rude or Encoder Representations from Transformers use of vulgar words towards an individual or group (BERT) on 11k movie subtitles. The BERT model obtained the best macro-averaged F1- that can even ignite fights or be hurtful. Use of score of 77%. Hence, we show that trans- words like f**k, n*gga, b*tch is common in social fer learning from the social media domain is media comments, song lyrics, etc. Although these efficacious in classifying hate and offensive terms can be treated as obscene and inappropriate, speech in movies through subtitles. some people also use them in non-hateful ways Cautionary Note: The paper contains exam- in different contexts (Davidson et al., 2017). This ples that many will find offensive or hateful; makes it challenging for all hate speech systems however, this cannot be avoided owing to the to distinguish between hate speech and offensive nature of the work. content. Davidson et al.(2017) tried to distinguish between the two classes in their Twitter dataset. 1 Introduction These days due to globalization and online me- Nowadays, hate speech is becoming a pressing is- dia streaming services, we are exposed to different sue and occurs in multiple domains, mostly in the cultures across the world through movies. Thus, major social media platforms or political speeches. an analysis of the amount of hate and offensive Hate speech is defined as verbal communication content in the media that we consume daily could that denigrates a person or a community on some be helpful. arXiv:2108.10724v1 [cs.CL] 19 Aug 2021 characteristics such as race, color, ethnicity, gender, Two research questions guided our research: sexual orientation, nationality, or religion (Nock- 1. RQ1 . What are the limitations of social me- leby et al., 2000; Davidson et al., 2017). Some dia hate speech detection models to detect hate examples given by Schmidt and Wiegand(2017) speech in movie subtitles? are: 2. RQ2 . How to build a hate and offensive • Go fucking kill yourself and die already a speech classification model for movie subti- useless ugly pile of shit scumbag. tles? • The Jew Faggot Behind The Financial Col- To address the problem of hate speech detection in lapse. movies, we chose three different models. We have ∗ Equal contribution used the BERT (Devlin et al., 2019) model, due to the recent success in other NLP-related fields, a Bi- racist and homophobic tweets were classified as LSTM (Hochreiter and Schmidhuber, 1997) model hate speech, whereas the ones involving sexist and to utilize the sequential nature of movie subtitles abusive words were classified as offensive. It is one and a classic Bag of Words (BoW) model as a of the datasets we have used in exploring transfer baseline system. learning and model fine-tuning in our study. The paper is structured as follows: Section2 Due to global events, hate speech also plagues gives an overview of the related work in this topic online news platforms. In the news domain, con- and Section3 describes the research methodology text knowledge is required to identify hate speech. and the annotation work, while in Section4 we dis- Lei and Ruihong(2017) conducted a study on a cuss the employed datasets and the pre-processing dataset prepared from user comments on news arti- steps. Furthermore, Section5 describes the imple- cles from the Fox News platform. It is the second mented models while Section6 presents the evalu- dataset we have used to explore transfer learning ation of the models, the qualitative analysis of the from the news domain to movie subtitles in our results and the annotation analysis followed by Sec- study. tion7, which covers the threats to the validity of Several other authors have collected the data our research. Finally, we end with the conclusion from different online platforms and labeled them in Section8 and propose further work directions in manually. Some of these data sources are: Twit- Section9. ter (Xiang et al., 2012; Xu et al., 2012), Instagram (Hosseinmardi et al., 2015; Zhong et al., 2016), 2 Related Work Yahoo! (Nobata et al., 2016; Djuric et al., 2015), YouTube (Dinakar et al., 2012) and Whisper (Silva Some of the existing hate speech detection models et al., 2021) to name a few. Most of the data sources classify comments targeted towards certain com- used in the previous studies are based on social me- monly attacked communities like gay, black, and dia, news, and micro-blogging platforms. However, Muslim, whereas in actuality, some comments did the notion of the existence of hate speech in movie not have the intention of targeting a community dialogues has been overlooked. Thus in our study, (Borkan et al., 2019; Dixon et al., 2018). Mathew we first explore how the different existing ML (Ma- et al.(2021) introduced a benchmark dataset con- chine Learning) models classify hate and offensive sisting of hate speech generated from two social speech in movie subtitles and propose a new dataset media platforms, Twitter and Gab. In the social compiled from six movie subtitles. media space, a key challenge is to separately identify hate speech from offensive text. Although 3 Research Methodology they might appear the same way semantically, they have subtle differences. Therefore they tried to To investigate the problem of detecting hate and solve the bias and interpretability aspect of hate offensive speech in movies, we used different ma- speech and did a three-class classification (i.e., chine learning models trained on social media con- hate, offensive, or normal). They reported the best tent such as tweets or discussion thread comments macro-averaged F1-score of 68.7% on their BERT- from news articles. Here, the models in our re- HateXplain model. It is also one of the models search were developed and evaluated on an in- that we use in our study, as it is one of the ‘off-the- domain 80% train and 20% test split data using shelf‘ hate speech detection models that can easily the same random state to ensure comparability. be employed for the topic at hand. We have developed six different models: two Bi- Lexicon-based detection methods have low pre- LSTM models, two BoW models, and two BERT cision because they classify the messages based on models. For each pair, one of them has been trained the presence of particular hate speech-related terms, on a dataset consisting of Twitter posts and the particularly those insulting, cursing, and bullying other on a dataset consisting of Fox News discus- words. Davidson et al.(2017) used a crowdsourced sion threads. The trained models have been used hate speech lexicon to identify tweets with the oc- to classify movie subtitles to evaluate their perfor- currence of hate speech keywords to filter tweets. mance by domain adaptation from social media They then used crowdsourcing to label these tweets content to movies. In addition, another state-of- into three classes: hate speech, offensive language, the-art BERT-based classification model called Ha- and neither. In their dataset, the more generic teXplain (Mathew et al., 2021) has been used to classify the movies out of the box. While it is also Dataset normal offensive hate # total possible to further fine-tune the HateXplain model, Twitter 0.17 0.78 0.05 24472 we are restricted in reporting the result of the ’off- Fox News 0.72 - 0.28 1513 the-shelf’ classification system to new domains, Movies 0.84 0.13 0.03 10688 such as movie subtitles. Furthermore, the movie dataset we have col- Table 1: Class distribution for the different datasets lected (see Section4) is used to train domain- specific BoW, Bi-LSTM, and BERT models using Name normal offensive hate #total 6-fold cross-validation, where each movie was se- Django Unchained 0.89 0.05 0.07 1747 lected as a fold and report the averaged results. Fi- BlacKkKlansman 0.89 0.06 0.05 1645 American History X 0.83 0.13 0.03 1565 nally, we have identified the best model trained on Pulp Fiction 0.82 0.16 0.01 1622 social media content based on macro-averaged F1- South Park 0.85 0.14 0.01 1046 score and fine-tuned it with the movie dataset using The Wolf of Wall Street 0.81 0.19 0.001 3063 6-fold cross-validation on that particular model, to Table 2: Class distribution on each movie investigate fine-tuning and transfer learning capa- bilities for hate speech on movie subtitles.

Load more