
Abusive Language on Social Media Through the Legal Looking Glass

Thales Bertaglia [1,3], Andreea Grigoriu [2], Michel Dumontier [1], and Gijs van Dijck [2]

[1] Institute of Data Science, Maastricht, The Netherlands
[2] Maastricht Law and Tech Lab, Maastricht, The Netherlands
[3] Studio Europa, Maastricht, The Netherlands

{t.costabertaglia,a.grigoriu,michel.dumontier,gijs.vandijck}@maastrichtuniversity.nl

Abstract

Abusive language is a growing phenomenon on social media platforms. Its effects can reach beyond the online context, contributing to mental or emotional stress on users. Automatic tools for detecting abuse can alleviate the issue. In practice, developing automated methods to detect abusive language relies on good quality data. However, there is currently a lack of standards for creating datasets in the field. These standards include the definitions of what is considered abusive language, annotation guidelines and reporting on the process. This paper introduces an annotation framework inspired by legal concepts to define abusive language in the context of online harassment. The framework uses a 7-point Likert scale for labelling instead of class labels. We also present ALYT – a dataset of Abusive Language on YouTube. ALYT includes YouTube comments in English extracted from videos on different controversial topics and labelled by Law students. The comments were sampled from the actual collected data, without artificial methods for increasing the abusive content. The paper describes the annotation process thoroughly, including all its guidelines and training steps.

1 Introduction

The increased use of social media can worsen the issue of online harassment. Nowadays, more than half of online harassment cases happen on social media platforms (Center, 2017). A specific popular form of online harassment is the use of abusive language. One abusive or toxic statement is sent every 30 seconds across the globe [1]. The use of abusive language on social media contributes to mental or emotional stress, with one in ten people developing such issues (Center, 2017).

Automatic tools for detecting abusive language are used to combat online harassment. These tools are mainly based on machine learning algorithms that rely on training data. Therefore, there is a need for good quality datasets to create high-performing algorithms to alleviate online harassment. There are various datasets in the field of online harassment research. However, there is a lack of standards for developing these resources. These standards include the definitions used to determine what content is abusive and the steps of the annotation process (including the annotators). The lack of standards leads to conflicting definitions, which ultimately results in disagreement within the field regarding which tasks to solve, how to create annotation guidelines, and terminology.

Our Contribution In this project, we introduce ALYT – a dataset of 20k YouTube comments in English labelled for abusive language. The dataset and its data statement are available online [2]. We manually selected videos focusing on a range of controversial topics and included different video types. Rather than artificially balancing the data for abusive content, we randomly sampled the collected data. We developed an annotation framework inspired by legal definitions, analysing various European provisions and case law ranging from insults and defamation to incitement to hatred. Instead of class labels, we use a 7-point Likert scale to encapsulate the complexity of the labelling decisions. We analyse the n-grams in our corpus to characterise its content and understand the nature of the abusive language. The results show that the dataset contains diverse topics, targets, and expressions of abuse.

[1] https://decoders.amnesty.org/projects/troll-patrol/findings. For all links, the content refers to the page version last accessed on 8 June 2021.
[2] https://github.com/thalesbertaglia/ALYT

2 Related Work

Creating datasets for online harassment research, including abusive language, has been a challenging task.


Vidgen and Derczynski (2020) review a wide range of datasets – and their creation process – within the field and identify many issues. Various approaches have been explored over the years, including different data collection strategies, labelling methodologies and employing different views and definitions of online harassment.

In terms of annotation methods, crowdsourcing is a popular option for labelling abusive language data (Burnap and Williams, 2015; Zhong et al., 2016; Chatzakou et al., 2017; Ribeiro et al., 2017; Zampieri et al., 2019). However, in some instances, a small group of non-experts in harassment research (Bretschneider et al., 2014; Mathew et al., 2018; van Rosendaal et al., 2020) or domain experts annotate the data (Golbeck et al., 2017; Waseem and Hovy, 2016). The definitions used to label the data can vary as well. At times, definitions are derived from literature on the topic (Chatzakou et al., 2017), or existing social media platform's guidelines (Ribeiro et al., 2017). In other instances, annotators decide by themselves when abuse is present in the text (Walker et al., 2012).

A recent direction in the field has been applying legal provisions to decide whether content should be removed, given provisions on hate speech or incitement to hatred. These approaches represent legal provisions as decision trees that guide the annotation process. Zufall et al. (2020) apply this methodology focusing on the German provision related to incitement to hatred. Two non-expert annotators label the data, guided by the created decision tree. The experiments show that there was little difference between using expert and non-expert annotators in this case.

3 Data Collection

We aimed to include a representative sample of abusive language on social media in our dataset. Therefore, we did not search directly for abusive content. Instead, we chose topics likely to contain abusive comments. We chose three different topics before the video selection: Gender Identity (GI), Veganism (VG), and Workplace Diversity (WD). The topics generate controversial videos on YouTube while not being limited to one type of controversy (e.g. gender identity, diet choices, socio-economical issues). The videos in GI focus on the disclosure of transgender identity and the impact of transgender people in sports. The videos in the VG category concentrate on describing the vegan movement and influencers deciding to become vegan. In WD, the videos illustrate the gender wage gap and its implications.

We searched for content in one language; therefore, the videos and majority of the comments are in English. We manually searched for videos using the topics as keywords. We selected popular videos (considering the number of views) made by well-known influencers posting controversial content. We included three types of videos: personal videos (posted by influencers on the topic), reaction videos (videos in which the author reacts to another video) and official videos (posted by news and media channels).

To create our dataset, we retrieved all comments from the selected videos, excluding replies. We removed comments containing URLs because these are often spam or make reference to external content. We also removed comments with fewer than three tokens. In total, we obtained 879,000 comments after these steps. Out of this sample, we selected 20,215 to annotate. We randomly sampled comments from the total distribution, not attempting to balance the data according to content. We aimed to balance video topics and types equally, but as the total number of comments was not even per video category, the final sample was not perfectly balanced. Table 1 shows the distribution per video category of the comments included in the dataset.

Category   %      #
VG         34.75  6967
GI         34.46  7024
WD         30.79  6224
Official   50.31  10171
Personal   31.38  6343
Reaction   18.31  3701

Table 1: Distribution of comments per video category

Collecting Abusive Content

Searching for keywords related to harassment is a common approach to increase the amount of abusive content in datasets. We do not employ any method to balance the data artificially – i.e., we do not try to search for abusive content directly. Instead, we randomly select comments from the total distribution of comments, resulting in a realistic data sample, similar to what is available on the platform.
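The filtering and sampling steps above are simple to reproduce. Below is a minimal sketch, assuming the comments have already been retrieved (e.g., via the YouTube Data API) as dictionaries with hypothetical `text` and `is_reply` fields; it illustrates the described pipeline and is not the authors' released code.

```python
import random
import re


def filter_comments(comments):
    """Apply the filtering steps described above: drop replies,
    comments containing URLs, and comments with fewer than three tokens."""
    url_pattern = re.compile(r"https?://\S+|www\.\S+")
    kept = []
    for comment in comments:
        if comment.get("is_reply"):                # keep top-level comments only
            continue
        if url_pattern.search(comment["text"]):    # URLs are often spam or external references
            continue
        if len(comment["text"].split()) < 3:       # very short comments carry little signal
            continue
        kept.append(comment)
    return kept


def sample_for_annotation(comments, n=20_215, seed=42):
    """Randomly sample from the full distribution, without oversampling abusive content."""
    random.seed(seed)
    return random.sample(comments, min(n, len(comments)))
```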

To compare our sampling approach to keyword search, we conduct two experiments comparing our dataset to others. First, we compare the final distribution of abusive content. Then, we compare the prevalence of hateful keywords. We use Hatebase [3] as a source of keywords, limiting it to the 500 most frequent terms (by the number of sightings).

[3] https://hatebase.org/

We compare our dataset to three others, all containing tweets: Davidson et al. (2017) (HSOL), Waseem and Hovy (2016) (HSHP), and Zampieri et al. (2019) (OLID). Twitter is the most popular social media platform for online harassment research, so most datasets contain tweets. HSHP is distributed as tweet ids, so all experiments refer to the distribution of the tweets we were able to retrieve in April 2021. These datasets use different definitions of abusive content. To harmonise the definitions and compare the data distributions, we consider that the following classes match our definition of abuse: tweets labelled as hateful on HSOL; sexist or racist on HSHP; and offensive and targeted on OLID. Table 2 presents the distribution of abusive content on each dataset.

Dataset  %      #
ALYT     11.42  2274
HSOL     5.77   1430
HSHP     25.78  2715
OLID     29.00  4089

Table 2: Distribution of abusive content in each dataset

The datasets that use hateful keyword search have a higher prevalence of hate. ALYT has a lower, but comparable, proportion of abusive content. Considering that we do not explicitly try to balance the data, our approach leads to a more representative sample of the actual scenario of social media while still having a significant amount of abusive comments. HSOL uses keywords from Hatebase to search for tweets. Davidson et al. (2017) conclude that Hatebase is imprecise and leads to a small amount of actual hate speech; therefore, this sampling approach is inefficient. HSHP and OLID use a few hateful keywords and others associated with abusive tweets, such as messages directed to political accounts and hashtags about TV shows. This approach allows increasing the amount of abusive content without biasing it to specific keywords. However, the content may still correlate to the hashtags or accounts being used to search for tweets. Our approach is similar in the sense that the video topics delimit the scope of the comments, but the comments are not filtered; thus, they provide a representative sample of the entire data.

To further investigate the prevalence of hateful keywords on the datasets, we analyse the proportion of content that contains at least one term from Hatebase. Table 3 presents the results.

Dataset  %      #
ALYT     8.81   246
HSOL     87.55  1252
HSHP     7.51   204
OLID     6.43   263

Table 3: Distribution of comments containing at least one term from Hatebase

ALYT has a low prevalence of Hatebase keywords, with a distribution similar to HSHP and OLID. This result shows that the abusive content in our dataset is not limited to frequent hateful words. Therefore, not searching for specific keywords leads to higher lexical diversity. The distribution of HSOL further confirms this observation: abusive content from the dataset predominantly contains terms from Hatebase. Although this experiment is limited to a single lexicon, it provides evidence that our sampling approach does not result in abusive content defined by specific keywords. Section 6 will discuss the content of the dataset in detail.
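As an illustration of the keyword-prevalence check above, the sketch below computes the proportion of comments containing at least one lexicon term. It assumes a locally stored list of the 500 most-sighted Hatebase terms; the simple substring matching is our simplification, not necessarily the matching strategy used in the paper.

```python
def lexicon_prevalence(texts, lexicon_terms):
    """Proportion of texts containing at least one lexicon term.

    `lexicon_terms` is assumed to hold the 500 most-sighted Hatebase entries,
    lowercased. Substring matching is a rough approximation: it handles
    multi-word terms but can also fire on partial words.
    """
    terms = [t.lower() for t in lexicon_terms]
    hits = sum(1 for text in texts if any(term in text.lower() for term in terms))
    return hits / len(texts) if texts else 0.0


# Hypothetical usage, restricted to the abusive class of a dataset:
# abusive_texts = [c["text"] for c in alyt_comments if c["label"] >= 5]
# print(f"{lexicon_prevalence(abusive_texts, hatebase_terms):.2%}")
```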

4 Annotation Framework

Datasets presented in state-of-the-art research mainly use several definitions for abusive language or focus on specific phenomena – such as hate speech, racism, and sexism. What constitutes abusive content is often not precisely defined. Dataset creators – in an attempt to make these concepts clear – rely on various definitions, ranging from platform guidelines to dictionary entries. Our goal is to develop an annotation framework inspired by legal definitions and to define abusive content concretely in light of these concepts. Using legal definitions as inspiration can provide a consistent and stable background for deciding which content is abusive, since most occurrences of abusive language are already covered in legal systems.

Definitions

We collected legislative resources (provisions and case law) in the context of abusive language expressed online. We focused on the European landscape by studying four countries: the Netherlands, France, Germany and the UK. These countries include both civil and common law systems, providing a comprehensive sample of legal traditions. The legislative sources focus both on offensive language towards an individual and towards a specific group/minority. In this project, we also focus on both types of offences.

For Germany, we selected the following provisions from the Criminal Code [4]: incitement to hatred (Article 130); insulting (Section 185); malicious gossip, defined as "degrading that person or negatively affecting public opinion about that person" (Section 186); and defamation (Section 187). Similarly, for the Netherlands, using the Criminal Code [5], we included Article 137, which focuses on incitement to hatred, and general defamation (Section 261), slander (Section 262), and insults (Section 266). For France, we used the Press Freedom Act of 29 July 1881 [6], focusing on actions such as discrimination, hate, violence (Article 24), defamation (Article 32) and insult (Article 33). In the UK, the Public Order Act 1986 [7] defines offensive messages and threats, specifically in Part 3 (focusing on racial grounds) and Part 3A (religious and sexual orientation grounds).

[4] https://www.gesetze-im-internet.de/stgb/
[5] https://www.legislationline.org/documents/section/criminal-codes/country/12/Netherlands/show
[6] https://www.legifrance.gouv.fr/loda/id/JORFTEXT000000877119/
[7] https://www.legislation.gov.uk/ukpga/1986/64

After selecting the sources, we harmonised the elements present in the provisions, such as the targets of the attack, protected attributes (grounds on which the targets are attacked, such as race, religion etc.) and the harmful acts specified to be performed (such as insult or defamation). Even though the countries might have elements in common which can be easy to harmonise, we also found elements specific to some countries only (for example, specifically mentioning the effect caused by the attack on the victim in the UK, such as distress and anxiety). The analysis resulted in three main abstract categories found in the provisions: incitement of hatred towards specific protected groups, acts which cause distress and anxiety, and acts involving public opinion such as degradement or belittlement.

We developed three questions comprising elements found in all the mentioned provisions to define abusive language:

• Does it incite hatred, violence or discrimination against a minority/group (on the basis of religion, race, sexual orientation, sex/gender, intellectual/physical disability)?

• Does it cause distress, anxiety to the target of the comment (e.g. threats, violent behaviour or offensive language)?

• Does it cause the degradement, belittlement of the/a person in public opinion? (by any cause, not necessarily belonging to a minority)

The annotators used these questions to determine whether a comment is abusive, as described in Figure 1: if the answer to any of the questions is yes, the comment is considered abusive and is assigned a level between 5 and 7; otherwise, it is not considered abusive and is assigned a level between 1 and 3. The full version of the annotation manual included examples and is available online [8].

[8] http://bit.ly/alyt-manual

Figure 1: Annotation process diagram
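To make the decision flow of Figure 1 concrete, the sketch below encodes the three questions and the level ranges they lead to. The class and function names are ours for illustration; they are not part of the released annotation framework.

```python
from dataclasses import dataclass


@dataclass
class FrameworkAnswers:
    """An annotator's yes/no answers to the three legally inspired questions."""
    incites_hatred: bool       # Q1: incitement against a minority/group on a protected ground
    causes_distress: bool      # Q2: distress/anxiety to the target (threats, offensive language)
    degrades_in_public: bool   # Q3: degradement/belittlement of a person in public opinion


def suggested_levels(answers: FrameworkAnswers) -> range:
    """Mirror of Figure 1: any 'yes' makes the comment abusive (levels 5-7);
    otherwise it is non-abusive (levels 1-3). Level 4 marks borderline or
    uncertain cases and is left to the annotator's judgement."""
    if answers.incites_hatred or answers.causes_distress or answers.degrades_in_public:
        return range(5, 8)   # abusive: 5, 6 or 7
    return range(1, 4)       # non-abusive: 1, 2 or 3
```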

Annotation Scale

For labelling, we used a 7-point Likert scale (Joshi et al., 2015) instead of class labels. The scale represents a mix of two features: the intensity of the abuse present in the comment and how confident the annotator is about the labelling decision. Specifically, numbers from 1 to 3 represent non-abusive content, with 1 describing comments with no abusive content at all. Comments labelled with 2 or 3 might contain sarcasm or jokes that could be considered abusive. Number 4 indicates comments that fall between abusive and non-abusive – so it also encodes labelling uncertainty. Numbers between 5 and 7 represent abusive comments, with 7 describing clearly abusive comments. Comments labelled with 5 or 6 might contain less obvious abuse.

5 Annotation Process

Throughout the data annotation process, annotators encode our framework and apply it to label the data. A proper annotation methodology, therefore, is fundamental to ensure data quality. Yet, most works presenting abusive language datasets fail to report this process in detail. We follow the corpus annotation pipeline proposed by Hovy and Lavid (2010) and thoroughly describe how we conducted its main steps when creating ALYT. To measure the reliability of the annotation, we use Krippendorff's alpha (α) with ordinal level of measurement (Krippendorff, 2011) and majority agreement to calculate the overall inter-annotator agreement. Antoine et al. (2014) highlight that α is a reliable metric for ordinal annotations, such as the 7-point scale from our framework.

Training

A team of six Law students enrolled in a European university labelled ALYT. Before the main annotation stage, we conducted a careful training phase, actively engaging in discussions with the annotators to improve the annotation framework and manual. We instructed the team to watch the videos included in the dataset before labelling the comments. We organised three training meetings with the annotators. Also, we evaluated their performance by giving an assignment sample of comments after each meeting. We annotated 50 comments together during the first meeting, aiming to familiarise the team with the task and annotation platform. The inter-annotator agreement for the first round of training was α = 45.0.

For the second meeting, we created a sample of the comments that had the most disagreements in the previous assignment. Then, we asked the annotators to specify which questions (from our annotation framework) they used to decide whether a comment was abusive. For the second assignment sample, we also required the team to mention the questions used for each comment. This round had α = 51.2.

In the third meeting, we decided to change the labelling process. We used a shared document for annotation, in which each annotator added their label for the comments. Then, we discussed the examples that had disagreements with the whole group. This discussion allowed us to answer a variety of questions and incorporate feedback into the annotation manual. This round achieved α = 65.8. After this meeting, we reached a satisfactory agreement and also noticed less confusion in the annotations.

The training phase showed that the interaction with annotators is fundamental to improve the annotation process. We received feedback about the framework, improved the manual, and clarified concepts related to the labelling decisions. The improvement in inter-annotator agreement, allied to our empirical observations, showed that the training phase led to higher data quality.

Main Annotation Phase

After the training phase, we proceeded to label the entire dataset. We randomly split the dataset into six samples, one for each annotator. A single annotator labelled each comment. Given the extensive training process, we consider that each annotator is qualified to apply the framework properly; thus, we opt not to have multiple annotations on the same comment to allow a more significant number of labelled samples in total. The annotation interface displayed the text of each comment (including emojis and special symbols) and the video id; therefore, annotators could check the video to understand the context. We sorted comments by their video ids, displaying comments belonging to the same videos in sequence to reduce context-switching during labelling. Annotators could also access a summary of the annotation framework within the platform.

We randomly selected 300 comments to be labelled by all annotators; we used this sample to calculate the inter-annotator agreement.

In addition to α and majority agreement, we also compute the majority agreement for the class labels (i.e., the scale numbers grouped into three classes). This metric is relevant to verify whether annotators agree on the polarity of a comment – represented by the extremes of the scale. It also illustrates the annotation reliability for applications that would use class labels instead of the full scale. Table 4 presents the inter-annotator agreement metrics for the whole dataset and per video category. # indicates the number of comments in a given category; Majority shows the percentage of comments with at least four matching annotations (out of a total of six); Grouped refers to majority agreement over class labels.

Topic     #    α     Majority  Grouped
All       300  73.8  64.3%     92.3%
VG        134  64.9  88.1%     98.5%
GI        135  55.2  44.4%     84.4%
WD        31   24.0  48.4%     100%
Official  76   60.0  50.0%     90.8%
Personal  179  77.9  76.5%     95.0%
Reaction  45   47.3  40.0%     84.4%

Table 4: Inter-annotator agreement metrics per category

The overall value of alpha indicates substantial agreement. The majority agreement was lower, which is expected given the 7-point scale. The grouped majority shows a high agreement about the polarity of comments, confirming that annotators agree whether a comment is abusive. There are significant differences between video categories. Disparities in sample size can lead to the difference in metrics: WD, for instance, had only 31 comments. For categories with similar size, a lower agreement can be attributed to controversial topics, confusing comments, or conflicting views. Section 6 further investigates the content of each category and analyses how abuse is expressed in each one. In general, the annotation achieved significant inter-annotator agreement – which indicates that the annotation process was consistent and the dataset is reliable.

6 Dataset

The labelled dataset includes 19,915 comments, excluding the comments used in the training phase and the sample used for calculating the inter-annotator agreement. Each comment has a label ranging from 1 to 7, corresponding to the scale used in the annotation framework. We also aggregate the labels into classes: values from 1 to 3 correspond to non-abusive content; 5 to 7, abusive; and 4, uncertain. In this section, we analyse the aggregated labels. Table 5 presents the class distribution per category of the dataset. The percentage refers to the distribution over the specific category (video topic or type).

Category  Abusive  Non-Abusive  Uncertain
Total     11.42%   85.98%       2.61%
VG        9.19%    38.02%       22.16%
GI        76.17%   28.55%       51.83%
WD        14.64%   33.44%       26.01%
Official  67.81%   47.99%       64.74%
Personal  17.46%   33.03%       21.39%
Reaction  14.73%   18.98%       13.87%

Table 5: Distribution of classes in the dataset per category

The annotators labelled 2274 comments as Abusive. This number represents 11.42% of the total distribution, showing a low prevalence of abusive content. Considering that we selected random samples and did not balance the data according to content, these comments potentially represent the actual distribution. However, since we balanced the number of comments per category, the dataset might misrepresent some video topics and types. The distribution of abusive content per category shows evidence of this imbalance. Videos about gender identity include 76.17% of the total amount of abusive comments, and videos from an official source, 67.81%. To investigate the difference in content between categories, we analyse the lexical distribution within each topic and type.

Lexical Analysis

We preprocess the comments by removing stopwords, punctuation marks, and character repetitions over three. First, we analyse the average length (in number of tokens) of comments in each class. Abusive comments have on average 31.67 tokens; non-abusive, 31.23; and uncertain, 41.25. Comments labelled as uncertain tend to be 30% longer than the other classes. However, sequences of short tokens, such as emojis, may impact the mean length. To avoid this issue, we also compute the average number of characters per comment, subtracting whitespaces. Abusive comments have on average 137.18 characters; non-abusive, 132.36; and uncertain, 179.02. Again, the uncertain class contains longer comments. These comments might be less readable and confusing, leading annotators to choose to label them as uncertain.
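The preprocessing and length statistics can be reproduced roughly as below. The tokeniser and stopword list are our assumptions (NLTK's TweetTokenizer and English stopwords); the paper does not specify the exact tools used for this step.

```python
import re
import string

from nltk.corpus import stopwords          # assumes NLTK stopword data is downloaded
from nltk.tokenize import TweetTokenizer

_tokenizer = TweetTokenizer()
_stopwords = set(stopwords.words("english"))


def preprocess(text):
    """Lowercase, squeeze character repetitions longer than three,
    then drop stopwords and punctuation-only tokens."""
    text = re.sub(r"(.)\1{3,}", r"\1\1\1", text.lower())   # e.g. "loooool" -> "loool"
    tokens = _tokenizer.tokenize(text)
    return [t for t in tokens
            if t not in _stopwords and not all(ch in string.punctuation for ch in t)]


def length_stats(texts):
    """Average token length (after preprocessing) and average character
    length with whitespace removed, over a list of raw comments."""
    token_lengths = [len(preprocess(t)) for t in texts]
    char_lengths = [len(re.sub(r"\s+", "", t)) for t in texts]
    return (sum(token_lengths) / len(token_lengths),
            sum(char_lengths) / len(char_lengths))
```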

To analyse the content of the comments in depth, we identify the most frequent unigrams in the abusive class for each video category. Table 6 presents the ten most frequent unigrams.

VG       GI           WD         Official  Personal  Reaction
vegan    girls        women      women     trans     trisha
freelee  like         men        men       like      like
meat     men          gap        girls     f*cking   trans
like     trans        work       boys      b*tch     people
eating   women        wage       like      f*ck      think
b*tch    people       less       compete   people    needs
video    boys         make       people    video     i'm
go       compete      feminists  unfair    get       b*tch
eat      transgender  get        male      trisha    video
fat      male         pay        get       i'm       even

Table 6: Ten most frequent unigrams on abusive comments per category

In general, slurs and hateful terms are not prevalent among the most frequent unigrams. Each topic contains words related to videos from that category, but there is some lexical overlap. Veganism includes neutral terms (meat, eat, vegan) and some derogatory words (fat, b*tch). The second most common unigram, Freelee, refers to a popular YouTuber – which shows that the abusive comments may target a specific person. Gender Identity and Workplace Diversity contain many gender-related words, which potentially occur in sexist comments.

For video types, Personal and Reaction have similar distributions. Personal has a higher prevalence of offensive words, and both include "Trisha" (a YouTuber) – indicating targeted comments. The dataset has both a video by Trisha and a reaction video to it, so mentions of the YouTuber are expected. Unigrams from Official videos are primarily about the video topics, following a pattern analogous to the topics of GI and WD.

Unigram distributions enable the identification of potentially relevant keywords related to abusive content. Understanding how abusive comments are expressed, however, requires more context. Therefore, we also identify the most frequent trigrams for each class to examine larger language constructs. We exclude trigrams consisting entirely of sequences of symbols or emojis. Many trigrams had the same frequency, so we highlight a sample of the total for each category. For the topic of Veganism, frequent trigrams include "freele shut f*ck", "b*tch going vegan", and "vegans hate asians". The first two phrases confirm that some abusive comments target content creators. Gender Identity contains "boys competing girls", "make trans league", and "natural born gender". The video with the most comments on this topic is about transgender athletes in sports – and these trigrams expose the prevalence of discriminatory comments against them. Workplace Diversity includes "gender wage gap", "work long hours", and "take care children". Interestingly, "work less hours" is also among the most frequent phrases, which indicates that the topic is controversial. Trigrams such as "take care children" show that comments about WD often express sexism.

Official videos, in general, combine trigrams from GI and WD. "compelling argument sides" and "men better women" are among the most frequent phrases; the former shows that comments contain civilised discussion; the latter, however, indicates the predominance of sexism. While the unigram distributions of Personal and Reaction videos are similar, their trigram frequencies exhibit different patterns. Personal includes "identify natural born", "b*tch going vegan", and "whole trans community", showing a combination of comments about GI and VG. Reaction displays a high prevalence of targeted comments with phrases such as "trisha looks like", "trisha mentally ill", and "needs mental help". Although these trigrams are the most frequent, their absolute number of occurrences is low. Therefore, lexical analysis indicates general trends about the content of comments but does not represent the entirety of abusive content in the dataset.
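A sketch of the n-gram counts behind Table 6 and the trigram samples is shown below, assuming comments have already been preprocessed into token lists and grouped by category (the variable `abusive_tokens_by_category` is hypothetical):

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(token_lists, n=1, k=10):
    """Top-k n-grams over a collection of token lists, skipping n-grams
    made up entirely of symbols or emojis (no word character at all)."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(g for g in ngrams(tokens, n) if re.search(r"\w", g))
    return counts.most_common(k)


# Hypothetical usage, per video category and restricted to the abusive class:
# for category, token_lists in abusive_tokens_by_category.items():
#     print(category, top_ngrams(token_lists, n=1, k=10))   # unigrams (Table 6)
#     print(category, top_ngrams(token_lists, n=3, k=10))   # trigram samples
```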

Classification Experiments

We perform classification experiments to explore ALYT as a source of training data for abusive language detection models. We frame the task as binary classification, using the grouped class labels Abusive and Not Abusive. We experiment with two models: logistic regression and a BERT-based classifier (Devlin et al., 2019).

The first baseline is a logistic regression classifier trained on word n-grams ranging from 1 to 3. We preprocessed all comments using the steps described in Section 6. We used the implementation from scikit-learn with default hyperparameters (Pedregosa et al., 2011). We trained the model using 5-fold cross-validation and report the metrics averaged over the folds, along with the standard deviation.

The second baseline is a BERT model pre-trained on English tweets (BERTweet) (Nguyen et al., 2020). In a preliminary experiment, BERTweet outperformed BERT-base by 3 points of macro F1. In addition to this result, we chose to use BERTweet because its vocabulary is more similar to ALYT's than BERT-base's. We tokenised comments using TweetTokenizer from NLTK and translated emojis into strings using the emoji [9] package. We fine-tuned BERTweet for classification by training it with ALYT. We used a learning rate of 2e-5, a batch size of 32, and trained the model for a single epoch to avoid overfitting. We trained the model using an 80/20 train and test split; the results are averaged over five runs.

[9] https://pypi.org/project/emoji/

Table 7 presents the classification results. We report Precision (P), Recall (R), and F1 for each model on all classes (Not Abusive (NOT) and Abusive (ABU)) and macro averages (AVG). Values after ± represent the standard deviation.

Model   Class  P            R            F1
LogReg  NOT    .914 ± .014  .976 ± .019  .944 ± .003
        ABU    .678 ± .081  .307 ± .132  .395 ± .102
        AVG    .796 ± .034  .641 ± .057  .670 ± .050
BERT    NOT    .944 ± .002  .952 ± .004  .948 ± .001
        ABU    .588 ± .013  .546 ± .017  .566 ± .006
        AVG    .766 ± .006  .749 ± .007  .757 ± .003

Table 7: Results for abusive language detection

The BERT-based model outperformed logistic regression by 8.6 points in macro F1 on average; the difference in the Abusive class was 17 points. Both models perform considerably worse when predicting abusive comments – which is expected given the data imbalance. Interestingly, logistic regression achieved higher precision but much lower recall than BERT. This result indicates that the classifier is making safe predictions based on surface-level patterns.

To further investigate this effect, we compute the ten most relevant n-grams for the logistic regression (based on the model coefficients summed over all folds) and analyse their distribution over both classes. The top ten n-grams are b*tch, dudes, femin*zis, d*ck, f*ck, idiot, drugs, f*cking, fair, and insane. We then identify all comments that contain at least one of these terms and check their class distribution: 50.89% belong to Not-Abusive and 49.11% to Abusive. Although this percentage shows that these n-grams discriminate abusive comments above their actual distribution (11.42%), they are still frequent in non-abusive contexts. Therefore, the logistic regression classifier relies on lexical clues and fails to capture context. In conclusion, the higher recall that BERT achieves shows it can capture higher-level features.
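For reference, a minimal sketch of the logistic-regression baseline is given below, assuming `texts` holds the preprocessed comments and `labels` the grouped binary classes; it follows the setup described above (word 1-3-grams, scikit-learn defaults, 5-fold cross-validation) but is not the authors' released code. The BERTweet baseline follows the standard Hugging Face fine-tuning recipe with the hyperparameters reported above and is omitted here for brevity.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline


def logreg_baseline(texts, labels):
    """Word 1-3-gram logistic regression evaluated with 5-fold cross-validation.
    Returns the mean and standard deviation of macro precision/recall/F1."""
    model = make_pipeline(
        CountVectorizer(ngram_range=(1, 3)),   # word n-grams from 1 to 3
        LogisticRegression(),                  # scikit-learn defaults, as in the paper
    )
    scores = cross_validate(
        model, texts, labels, cv=5,
        scoring=("precision_macro", "recall_macro", "f1_macro"),
    )
    return {name: (np.mean(vals), np.std(vals))
            for name, vals in scores.items() if name.startswith("test_")}
```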

7 Conclusion

This paper presented a dataset of YouTube comments in English, labelled as abusive by Law students, using a 7-point Likert scale. The comments were collected from videos on three controversial topics: Gender Identity, Veganism, and Workplace Diversity. The dataset includes a sample of the actual amount of extracted comments, without any artificial balancing of the abusive content distribution.

We developed an annotation framework that includes legally inspired labelling rules based on European provisions and case law. Our annotation process includes developing and refining guidelines through various training sessions with active discussions. Our data sampling analysis shows that not purposefully searching for abusive content still leads to a considerable amount of abusive comments, while maintaining the characteristics of the social media platform's data distribution.

The content analyses show that ALYT contains various expressions of abuse, ranging from different topics and targets. The abusive content is not limited to specific keywords or slurs associated with hateful content. Using a scale to label the content has the potential to capture multiple nuances of abusive language. However, we did not explore the implications of using a scale versus binary labels in this paper. This comparison might be a relevant research direction for future work.

We believe ALYT can be a valuable resource for training machine learning algorithms for abusive language detection and understanding online abuse on social media. Our annotation framework is a significant contribution toward the standardisation of practices in the field.

References

Jean-Yves Antoine, Jeanne Villaneau, and Anaïs Lefeuvre. 2014. Weighted Krippendorff's alpha is a more reliable metrics for multi-coders ordinal annotations: experimental studies on emotion, opinion and coreference annotation. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 550–559.

Uwe Bretschneider, Thomas Wöhner, and Ralf Peters. 2014. Detecting online harassment in social networks. In Proceedings of the ICIS 2014 Conference.

Pete Burnap and Matthew L Williams. 2015. Cyber hate speech on twitter: An application of machine classification and statistical modeling for policy and decision making. Policy & Internet, 7(2):223–242.

Pew Research Center. 2017. Online harassment 2017.

Despoina Chatzakou, Nicolas Kourtellis, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, and Athena Vakali. 2017. Mean birds: Detecting aggression and bullying on twitter. In Proceedings of the 2017 ACM on Web Science Conference, pages 13–22.

Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. In Eleventh International AAAI Conference on Web and Social Media.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Jennifer Golbeck, Zahra Ashktorab, Rashad O Banjo, Alexandra Berlinger, Siddharth Bhagwan, Cody Buntain, Paul Cheakalos, Alicia A Geller, Rajesh Kumar Gnanasekaran, Raja Rajan Gunasekaran, et al. 2017. A large labeled corpus for online harassment research. In Proceedings of the 2017 ACM on Web Science Conference, pages 229–233.

Eduard Hovy and Julia Lavid. 2010. Towards a 'science' of corpus annotation: a new methodological challenge for corpus linguistics. International Journal of Translation, 22(1):13–36.

Ankur Joshi, Saket Kale, Satish Chandel, and D Kumar Pal. 2015. Likert scale: Explored and explained. Current Journal of Applied Science and Technology, pages 396–403.

Klaus Krippendorff. 2011. Computing Krippendorff's alpha-reliability. Annenberg School for Communication Departmental Papers: Philadelphia.

Binny Mathew, Navish Kumar, Pawan Goyal, Animesh Mukherjee, et al. 2018. Analyzing the hate and counter speech accounts on twitter. arXiv preprint arXiv:1812.02712.

Dat Quoc Nguyen, Thanh Vu, and Anh Tuan Nguyen. 2020. BERTweet: A pre-trained language model for English tweets. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 9–14, Online. Association for Computational Linguistics.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Manoel Horta Ribeiro, Pedro H Calais, Yuri A Santos, Virgílio AF Almeida, and Wagner Meira Jr. 2017. "Like sheep among wolves": Characterizing hateful users on twitter. arXiv preprint arXiv:1801.00317.

Juliet van Rosendaal, Tommaso Caselli, and Malvina Nissim. 2020. Lower bias, higher density abusive language datasets: A recipe. In Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language, pages 14–19.

Bertie Vidgen and Leon Derczynski. 2020. Directions in abusive language training data, a systematic review: Garbage in, garbage out. PLoS ONE, 15(12):e0243300.

Marilyn A Walker, Jean E Fox Tree, Pranav Anand, Rob Abbott, and Joseph King. 2012. A corpus for research on deliberation and debate. In LREC, volume 812, page 817, Istanbul.

Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on twitter. In Proceedings of the NAACL Student Research Workshop, pages 88–93.

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar. 2019. Predicting the type and target of offensive posts in social media. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1415–1420.

Haoti Zhong, Hao Li, Anna Cinzia Squicciarini, Sarah Michele Rajtmajer, Christopher Griffin, David J Miller, and Cornelia Caragea. 2016. Content-driven detection of cyberbullying on the Instagram social network. In IJCAI, pages 3952–3958.

Frederike Zufall, Huangpan Zhang, Katharina Kloppenborg, and Torsten Zesch. 2020. Operationalizing the legal concept of 'incitement to hatred' as an NLP task. arXiv preprint arXiv:2004.03422.
