Detecting Abusive Language on Online Platforms: A Critical Analysis
Preslav Nakov1,2∗, Vibha Nayak1, Kyle Dent1, Ameya Bhatawdekar3, Sheikh Muhammad Sarwar1,4, Momchil Hardalov1,5, Yoan Dinkov1, Dimitrina Zlatkova1, Guillaume Bouchard1, Isabelle Augenstein1,6

1CheckStep Ltd., 2Qatar Computing Research Institute, HBKU, 3Microsoft, 4University of Massachusetts Amherst, 5Sofia University, 6University of Copenhagen

{preslav.nakov, vibha, kyle.dent, momchil, yoan.dinkov, didi, guillaume, isabelle}@checkstep.com, [email protected], [email protected]

∗Contact Author

Abstract

Abusive language on online platforms is a major societal problem, often leading to harms such as the marginalisation of underrepresented minorities. There are many different forms of abusive language, such as hate speech, profanity, and cyber-bullying, and online platforms seek to moderate it in order to limit societal harm, to comply with legislation, and to create a more inclusive environment for their users. Within the field of Natural Language Processing, researchers have developed different methods for automatically detecting abusive language, often focusing on specific subproblems or on narrow communities, as what is considered abusive language very much differs by context. We argue that there is currently a dichotomy between what types of abusive language online platforms seek to curb, and what research efforts there are to automatically detect abusive language. We thus survey existing methods as well as content moderation policies by online platforms in this light, and we suggest directions for future work.

1 Introduction

Online harm is not new. Groups and individuals who have been targets of abusive language have suffered real harms for many years. The problem has persisted over time and may well be growing. Thus, abusive language in social media is of particular concern to online communities, governments, and especially social media platforms.

While combating abuse is a high priority, preserving individuals' rights to free expression is also vital, making the task of moderating conversations particularly difficult. Some form of moderation is clearly required. Platform providers face very challenging technical and logistical problems in limiting abusive language, while at the same time allowing a level of free speech which leads to rich and productive online conversations. Negative interactions experienced by users pose a significant risk to these social platforms: they affect not only user engagement, but can also erode trust in the platform and hurt a company's brand.

Social platforms have to strike the right balance in terms of managing a community where users feel empowered to engage, while taking steps to effectively mitigate negative experiences. They need to ensure that their users feel safe, that their personal privacy and information are protected, and that they do not experience harassment or annoyances, while at the same time feeling empowered to share information, experiences, and views. Many social platforms institute guidelines and policies to specify what content is considered inappropriate. As manual filtering is hard to scale, and can even cause post-traumatic stress disorder-like symptoms in human annotators, there have been many research efforts to develop tools and technology to automate some of this effort.

The key feature of offensive language in online dialog is that it is harmful either to its target, to the online community where it occurs, or to the platform hosting the conversation. The degree of harm is a factor, and might range from hate speech and cyber-bullying with extreme deleterious effects, over slightly less damaging derogatory language and personal insults, to profanity and teasing, which might even be considered acceptable in some communities. This spectrum poses challenges for clear labeling of training data, as well as for computational modeling of the problem.

Several studies have considered the application of computational methods to deal with offensive language, particularly for English [Davidson et al., 2017; Basile et al., 2019; Fortuna and Nunes, 2018]. However, these efforts tend to constrain their scope to one medium, to a single or to just a few subtasks, or to only limited aspects of the problem. Prior work has studied offensive language on Twitter [Xu et al., 2012; Burnap and Williams, 2015; Davidson et al., 2017; Wiegand et al., 2018], in Wikipedia comments (cf. the Kaggle Toxic Comment Classification Challenge: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge), and in Facebook posts [Kumar et al., 2018]. The task is usually modeled as a supervised classification problem. The models are trained on posts annotated according to the presence of some type of abusive or offensive content. Examples of such types of content include hate speech [Davidson et al., 2017; Malmasi and Zampieri, 2017; Malmasi and Zampieri, 2018], abusive language [Founta et al., 2018], toxicity [Georgakopoulos et al., 2018], cyber-bullying [Dinakar et al., 2011], and aggression [Kumar et al., 2018].
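To make the supervised-classification formulation above concrete, here is a minimal sketch in Python, assuming a scikit-learn environment. The toy posts, labels, and the TF-IDF plus logistic-regression pipeline are illustrative choices of ours, not the setup of any of the cited systems.

```python
# Minimal sketch of the usual formulation: posts annotated for the
# presence of abusive content, and a text classifier trained on them.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical annotated posts: 1 = abusive, 0 = not abusive.
posts = [
    "you are a complete idiot",
    "thanks for sharing, great post",
    "people like you should not exist",
    "see you at the meetup tomorrow",
]
labels = [1, 0, 1, 0]

# Character n-grams are fairly robust to the misspellings and
# obfuscation (e.g., "1d10t") that are common in abusive posts.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)),
    LogisticRegression(),
)
model.fit(posts, labels)

print(model.predict(["what an idiot"]))  # likely [1] on this toy data
```

Real systems differ mainly in the annotation scheme (which of the content types above is labeled, and at what granularity) and in the model class, but the train-on-annotated-posts structure is the same.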
While there have been several surveys on offensive language, hate speech, and cyber-bullying, none of them have focused on what platforms need vs. what technology has to offer. For example, [Schmidt and Wiegand, 2017] and [Fortuna and Nunes, 2018] both surveyed automated hate speech detection, but focused primarily on the features which have been shown to be most effective in classification systems. [Salawu et al., 2020] provided an extensive survey of NLP for detecting cyber-bullying, and [Vidgen and Derczynski, 2021] did important work to catalog abusive language training data, which serves as a pre-step for system solutions. Here, we aim to bridge the gap between this work and platform solution requirements.

2 Requirements by Online Platforms

Below, we explore how abusive language differs from one online platform to another, and what content moderation policies have been put in place accordingly.

'Online platforms' is a broad term which covers various categories of providers, such as social media, online marketplaces, online video communities, dating websites and apps, support communities or forums, and online gaming communities, among others. Each of these platforms is governed by its own content moderation policies and has its own definition of what constitutes abusive language.

Table 1 shows a classification of online platforms by type. Facebook, Instagram, and Twitter represent big social media platforms, while Amazon, TikTok, Patient, and Twitch are examples of platforms focusing on specific products.

Platform Type      | Examples
Social Media       | Twitter, Facebook, Instagram
Online Marketplace | Amazon, PinDuoDuo, Depop
Dating             | Bumble, Tinder, Hinge
Video Community    | TikTok, Triller, Clash
Forum              | Patient, BG Mamma, Reddit
Gaming Community   | Twitch, DLive, Omlet Arcade

Table 1: Classification of online platforms.
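As a small illustration of how this taxonomy can be reused programmatically (for instance, to key the policy comparison in Section 2.1 by platform type), Table 1 can be encoded directly. The sketch below is ours: the snake_case category names are invented for illustration, while the platforms are those listed in the table.

```python
# The platform taxonomy of Table 1 as a plain mapping, so that later
# policy checks can be keyed by platform type.
from typing import Optional

PLATFORM_TYPES = {
    "social_media": ["Twitter", "Facebook", "Instagram"],
    "online_marketplace": ["Amazon", "PinDuoDuo", "Depop"],
    "dating": ["Bumble", "Tinder", "Hinge"],
    "video_community": ["TikTok", "Triller", "Clash"],
    "forum": ["Patient", "BG Mamma", "Reddit"],
    "gaming_community": ["Twitch", "DLive", "Omlet Arcade"],
}

def type_of(platform: str) -> Optional[str]:
    """Return the Table 1 category of a platform, or None if unlisted."""
    for category, platforms in PLATFORM_TYPES.items():
        if platform in platforms:
            return category
    return None

assert type_of("Reddit") == "forum"
```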
2.1 The Big Tech

The big social media platforms, also commonly known as 'Big Tech', have stringent content moderation policies and the most advanced technology to help detect abusive language. We summarize these policies in Table 2. We can see a lot of overlap, but also some differences in these policies.

For Facebook:
• Misinformation is always referenced as false news in Facebook's Community Guidelines.
• Even though Facebook covers most areas which could be subject to abusive language, the coverage of medical advice is only very broadly mentioned in its policies. It is unclear whether individuals are free to share medical advice with others in their posts.

For Twitter (https://help.twitter.com/en/rules-and-policies/twitter-rules):
• The term animal abuse is not explicitly mentioned in the policy. It is implicitly covered under Twitter's Sensitive media policy, where graphic and sexual content featuring animals is prohibited.
• Hate speech is referred to as hateful conduct in Twitter's content policy.
• Only COVID-19-related medical advice is prohibited on Twitter.

For Google (https://policies.google.com/terms?hl=en):
• Google's terms of service cover only basic guidelines on acceptable conduct on the platform. The more specific clauses against hate speech, bullying, and harassment are covered in service-specific policies, as indicated in Table 2.
• There are no specific clauses addressing incidents of revenge porn and sexual abuse (adults). It is also unclear whether they fall under the category of illegal activities.

Amazon and Apple offer very different services in comparison to Facebook, Twitter, and Google, which is reflected in the clauses covered in their terms of service.

For Apple:
• Apple's policies very broadly mention clauses in response to Dangerous people, Glorifying crime, Illegal goods, Child sexual abuse, Sexual abuse (Adults), Animal abuse, and Human trafficking under illegal acts. Violence falls under threatening and intimidating posts.
• There are no clauses in the policy which address Medical advice, Spam, Sexual solicitation, Revenge porn, Graphic content, and Self-harm.

For Amazon:
• Dangerous organizations and people, Glorifying crime, Illegal goods, Child sexual abuse, Sexual abuse (Adults), Animal abuse, and Human trafficking are broadly mentioned in clauses pertaining to illegal activities.
• Revenge porn, Graphic content, and Nudity and pornography are broadly mentioned in clauses pertaining to obscenity.
• There are no policies in place which directly address Medical advice, Misinformation, Sexual solicitation, and Self-harm.
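Taken together, these observations form a coverage matrix over platforms and abuse categories, which Table 2 summarizes. The sketch below shows one way such a comparison can be made machine-checkable; the three-level encoding ("explicit", "implicit", "absent") and the helper function are our own simplification, with the values transcribed from the bullets above.

```python
# A slice of the policy-coverage comparison above: for each platform,
# whether an abuse category is explicitly covered, only implicitly
# covered, or absent from its policies.
COVERAGE = {
    "Twitter": {
        "hate_speech": "explicit",     # named "hateful conduct"
        "animal_abuse": "implicit",    # via the Sensitive media policy
        "medical_advice": "implicit",  # only COVID-19-related advice is prohibited
    },
    "Apple": {
        "medical_advice": "absent",
        "spam": "absent",
        "sexual_solicitation": "absent",
        "revenge_porn": "absent",
        "graphic_content": "absent",
        "self_harm": "absent",
    },
    "Amazon": {
        "medical_advice": "absent",
        "misinformation": "absent",
        "sexual_solicitation": "absent",
        "self_harm": "absent",
    },
}

def coverage_gaps(platform: str) -> list:
    """Categories that a platform's policies do not address at all."""
    return [c for c, v in COVERAGE.get(platform, {}).items() if v == "absent"]

print(coverage_gaps("Amazon"))
# ['medical_advice', 'misinformation', 'sexual_solicitation', 'self_harm']
```

An explicit encoding of this kind makes the dichotomy this paper argues for easy to audit: one can compare a platform's policy coverage directly against the categories that detection systems actually support.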