Deeper Attention to Abusive User Content Moderation

John Pavlopoulos (Straintek, Athens, Greece)
Prodromos Malakasiotis (Straintek, Athens, Greece)
Ion Androutsopoulos (Athens University of Economics and Business, Greece)

Abstract

Experimenting with a new dataset of 1.6M user comments from a news portal and an existing dataset of 115K Wikipedia talk page comments, we show that an RNN operating on word embeddings outperforms the previous state of the art in moderation, which used logistic regression or an MLP classifier with character or word n-grams. We also compare against a CNN operating on word embeddings, and a word-list baseline. A novel, deep, classification-specific attention mechanism improves the performance of the RNN further, and can also highlight suspicious words for free, without including highlighted words in the training data. We consider both fully automatic and semi-automatic moderation.

1 Introduction

User comments play a central role in social media and online discussion fora. News portals and blogs often also allow their readers to comment, to get feedback, engage their readers, and build customer loyalty.[1] User comments, however, and more generally user content, can also be abusive (e.g., bullying, profanity, hate speech) (Cheng et al., 2015). Social media platforms are under pressure to combat abusive content, but so far rely mostly on user reports and tools that detect frequent words and phrases of reported posts.[2] Wulczyn et al. (2017) estimated that only 17.9% of personal attacks in Wikipedia discussions were followed by moderator actions. News portals also suffer from abusive user comments, which damage their reputations and make them liable to fines, e.g., when hosting comments encouraging illegal actions. They often employ moderators, who are frequently overwhelmed, however, by the volume and abusiveness of comments.[3] Readers are disappointed when non-abusive comments do not appear quickly online because of moderation delays. Smaller news portals may be unable to employ moderators, and some are forced to shut down their comments sections entirely.

We examine how deep learning (Goodfellow et al., 2016; Goldberg, 2016, 2017) can be employed to moderate user comments. We experiment with a new dataset of approx. 1.6M manually moderated (accepted or rejected) user comments from a Greek sports news portal (called Gazzetta), which we make publicly available.[4] This is one of the largest publicly available datasets of moderated user comments. We also provide word embeddings pre-trained on 5.2M comments from the same portal. Furthermore, we experiment on the 'attacks' dataset of Wulczyn et al. (2017), approx. 115K English Wikipedia talk page comments labeled as containing personal attacks or not.

In a fully automatic scenario, there is no moderator and a system accepts or rejects comments. Although this scenario may be the only available one, e.g., when news portals cannot afford moderators, it is unrealistic to expect fully automatic moderation to be perfect, because abusive comments may involve irony, sarcasm, harassment without profane phrases etc., which are particularly difficult for a machine to detect. When moderators are available, it is more realistic to develop semi-automatic systems aiming to assist, rather than replace, the moderators, a scenario that has not been considered in previous work.

[1] See, for example, http://niemanreports.org/articles/the-future-of-comments/.
[2] Consult, for example, https://www.facebook.com/help/131671940241729 and https://www.theguardian.com/technology/2017/feb/07/-abuse-harassment-crackdown.
[3] See, e.g., https://www.wired.com/2017/04/zerochaos-google-ads-quality-raters and https://goo.gl/89M2bI.
[4] The portal is http://www.gazzetta.gr/. Instructions to download the dataset will become available at http://nlp.cs.aueb.gr/software.html.

In this case, comments for which the system is uncertain (Fig. 1) are shown to a moderator to decide; all other comments are accepted or rejected by the system. We discuss how moderation systems can be tuned, depending on the availability and workload of the moderators. We also introduce additional evaluation measures for the semi-automatic scenario.

[Figure 1: Semi-automatic moderation.]

On both datasets (Gazzetta and Wikipedia comments) and for both scenarios (automatic, semi-automatic), we show that a recurrent neural network (RNN) outperforms the system of Wulczyn et al. (2017), the previous state of the art for comment moderation, which employed logistic regression or a multi-layer Perceptron (MLP), and represented each comment as a bag of (character or word) n-grams. We also propose an attention mechanism that improves the overall performance of the RNN. Our attention mechanism differs from most previous ones (Bahdanau et al., 2015; Luong et al., 2015) in that it is used in a classification setting, where there is no previously generated output subsequence to drive the attention, unlike sequence-to-sequence models (Sutskever et al., 2014). In that sense, our attention is similar to that of Yang et al. (2016), but our attention mechanism is a deeper MLP and it is only applied to words, whereas Yang et al. also have a second attention mechanism that assigns attention scores to entire sentences. In effect, our attention detects the words of a comment that affect most the classification decision (accept, reject), by examining them in the context of the particular comment.

Although our attention mechanism does not always improve the performance of the RNN, it has the additional advantage of allowing the RNN to highlight suspicious words that a moderator could consider to decide more quickly if a comment should be accepted or rejected. The highlighting comes for free, i.e., the training data do not contain highlighted words. We also show that words highlighted by the attention mechanism correlate well with words that moderators would highlight.

Our main contributions are: (i) We release a dataset of 1.6M moderated user comments. (ii) We introduce a novel, deep, classification-specific attention mechanism and we show that an RNN with our attention mechanism outperforms the previous state of the art in user comment moderation. (iii) Unlike previous work, we also consider a semi-automatic scenario, along with threshold tuning and evaluation measures for it. (iv) We show that the attention mechanism can automatically highlight suspicious words for free, without manually highlighting words in the training data.

2 Datasets

We first discuss the datasets we used, to help acquaint the reader with the problem.

2.1 Gazzetta comments

There are approx. 1.45M training comments (covering Jan. 1, 2015 to Oct. 6, 2016) in the Gazzetta dataset; we call them G-TRAIN-L (Table 1). Some experiments use only the first 100K comments of G-TRAIN-L, called G-TRAIN-S. An additional set of 60,900 comments (Oct. 7 to Nov. 11, 2016) was split into a development set (G-DEV, 29,700 comments), a large test set (G-TEST-L, 29,700), and a small test set (G-TEST-S, 1,500).

Dataset/Split   Accepted        Rejected       Total
G-TRAIN-L       960,378 (66%)   489,222 (34%)  1.45M
G-TRAIN-S        67,828 (68%)    32,172 (32%)  100,000
G-DEV            20,236 (68%)     9,464 (32%)  29,700
G-TEST-L         20,064 (68%)     9,636 (32%)  29,700
G-TEST-S          1,068 (71%)       432 (29%)  1,500
G-TEST-S-R        1,174 (78%)       326 (22%)  1,500
W-ATT-TRAIN      61,447 (88%)     8,079 (12%)  69,526
W-ATT-DEV        20,405 (88%)     2,755 (12%)  23,160
W-ATT-TEST       20,422 (88%)     2,756 (12%)  23,178

Table 1: Statistics of the datasets used.

Gazzetta's moderators (2 full-time, plus journalists occasionally helping) are occasionally instructed to be stricter (e.g., during violent events). To get a more accurate view of performance in normal situations, we manually re-moderated (labeled as 'accept' or 'reject') the comments of G-TEST-S, producing G-TEST-S-R. The reject ratio is approx. 30% in all subsets, except for G-TEST-S-R, where it drops to 22%, because there are no occasions where the moderators were instructed to be stricter in G-TEST-S-R.

Each G-TEST-S-R comment was re-moderated by five annotators. Krippendorff's (2004) alpha was 0.4762, close to the value (0.45) reported by Wulczyn et al. (2017) for the Wikipedia 'attacks' dataset. Using Cohen's Kappa (Cohen, 1960), the mean pairwise agreement was 0.4749. The mean pairwise percentage of agreement (% of comments each pair of annotators agreed on) was 81.33%. Cohen's Kappa and Krippendorff's alpha lead to lower scores, because they account for agreement by chance, which is high when there is class imbalance (22% reject, 78% accept in G-TEST-S-R).

During the re-moderation of G-TEST-S-R, the annotators were also asked to highlight snippets they considered suspicious, i.e., words or phrases that could lead a moderator to consider rejecting each comment.[5] We also asked the annotators to classify each snippet into one of the following categories: calumniation (e.g., false accusations), discrimination (e.g., racism), disrespect (e.g., looking down at a profession), hooliganism (e.g., calling for violence), insult (e.g., making fun of appearance), irony, swearing, threat, other. Figure 2 shows how many comments of G-TEST-S-R contained at least one snippet of each category, according to the majority of annotators; e.g., a comment counts as containing irony if at least 3 annotators annotated it with an irony snippet (not necessarily the same). The gold class of each comment (accept or reject) is determined by the majority of the annotators. Irony and disrespect are particularly frequent in both classes, followed by calumniation, swearing, hooliganism, insults. Notice that comments that contain irony, disrespect etc. are not necessarily rejected. They are, however, more likely in the rejected class, considering that the accepted comments are 2.5 times more than the rejected ones (78% vs. 22%).

[Figure 2: Re-moderated comments with at least one snippet of the corresponding category.]

We also provide 300-dimensional word embeddings, pre-trained on approx. 5.2M comments (268M tokens) from Gazzetta using WORD2VEC (Mikolov et al., 2013a,b).[6] This larger dataset cannot be used to directly train classifiers, because most of its comments are from a period (before 2015) when Gazzetta did not employ moderators.

[5] Treating snippet overlaps as agreements, the mean pairwise Dice coefficient for snippet highlighting was 50.03%.
[6] We used CBOW, window size 5, min. term freq. 5, negative sampling, obtaining a vocabulary size of approx. 478K.

2.2 Wikipedia comments

The Wikipedia 'attacks' dataset (Wulczyn et al., 2017) contains approx. 115K English Wikipedia talk page comments, which were labeled as containing personal attacks or not. Each comment was labeled by at least 10 annotators. Inter-annotator agreement, measured on a random sample of 1K comments using Krippendorff's (2004) alpha, was 0.45. The gold label of each comment is determined by the majority of annotators, leading to binary labels (accept, reject). Alternatively, the gold label is the percentage of annotators that labeled the comment as 'accept' (or 'reject'), leading to probabilistic labels.[7] The dataset is split in three parts (Table 1): training (W-ATT-TRAIN, 69,526 comments), development (W-ATT-DEV, 23,160), and test (W-ATT-TEST, 23,178). In all three parts, the rejected comments are 12%, but this is an artificial ratio (Wulczyn et al. oversampled comments posted by banned users). By contrast, the ratio of rejected comments in all the Gazzetta subsets is the truly observed one. The Wikipedia comments are also longer (median length 38 tokens) compared to Gazzetta's (median length 25 tokens).

Wulczyn et al. (2017) also provide two additional datasets of English Wikipedia talk page comments, which are not used in this paper. The first one, called the 'aggression' dataset, contains the same comments as the 'attacks' dataset, now labeled as 'aggressive' or not. The (probabilistic) labels of the 'attacks' and 'aggression' datasets are very highly correlated (0.8992 Spearman, 0.9718 Pearson) and we did not consider the 'aggression' dataset any further. The second additional dataset, called the 'toxicity' dataset, contains approx. 160K comments labeled as being toxic or not. Experiments we reported elsewhere (Pavlopoulos et al., 2017) show that results on the 'attacks' and 'toxicity' datasets are very similar; we do not include results on the latter in this paper to save space.

[7] We also construct probabilistic labels for G-TEST-S-R, where there are five annotators.
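Both datasets derive their gold labels from per-comment annotator votes (majority labels, and probabilistic labels where several annotators are available), and Section 2.1 reports mean pairwise Cohen's Kappa and percentage agreement. The following is a minimal sketch, not the authors' code, of these computations; the votes matrix is a hypothetical toy example (one row per comment, one column per annotator, 1 = accept, 0 = reject).

    import itertools
    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    votes = np.array([[1, 1, 1, 0, 1],   # toy example with five annotators
                      [0, 0, 1, 0, 0],
                      [1, 0, 1, 1, 1]])

    # Binary gold label: the majority of the annotators.
    binary_gold = (votes.mean(axis=1) >= 0.5).astype(int)

    # Probabilistic gold label: % of annotators that accepted the comment.
    probabilistic_gold = votes.mean(axis=1)

    # Mean pairwise Cohen's Kappa and mean pairwise % agreement (Section 2.1).
    kappas, agreements = [], []
    for i, j in itertools.combinations(range(votes.shape[1]), 2):
        kappas.append(cohen_kappa_score(votes[:, i], votes[:, j]))
        agreements.append((votes[:, i] == votes[:, j]).mean())
    print(np.mean(kappas), np.mean(agreements))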

3 Methods

We experimented with an RNN operating on word embeddings, the same RNN enhanced with our attention mechanism (a-RNN), a vanilla convolutional neural network (CNN) also operating on word embeddings, the DETOX system of Wulczyn et al. (2017), and a baseline that uses word lists.

3.1 DETOX

DETOX (Wulczyn et al., 2017) was the previous state of the art in comment moderation, in the sense that it had the best reported results on the Wikipedia datasets (Section 2.2), which were in turn the largest previously available public datasets of moderated user comments.[8] DETOX represents each comment as a bag of word n-grams (n <= 2, i.e., each comment becomes a bag containing its 1-grams and 2-grams) or a bag of character n-grams (n <= 5, i.e., each comment becomes a bag containing its character 1-grams, ..., 5-grams). DETOX can rely on a logistic regression (LR) or MLP classifier, and it can use binary or probabilistic gold labels (Section 2.2) during training.

We used the DETOX implementation provided by Wulczyn et al. and the same grid search (and code) to tune the hyper-parameters of DETOX that select word or character n-grams, classifier (LR or MLP), and gold labels (binary or probabilistic). For Gazzetta, only binary gold labels were possible, since G-TRAIN-L and G-TRAIN-S have a single gold label per comment. Unlike Wulczyn et al., we tuned the hyper-parameters by evaluating (computing AUC and Spearman, Section 4) on a random 2% of held-out comments of W-ATT-TRAIN or G-TRAIN-S, instead of the development subsets, to be able to obtain more realistic results from the development sets while developing the methods.[10] For both Wikipedia and Gazzetta, the tuning selected character n-grams, as in the work of Wulczyn et al. Also, for both Wikipedia and Gazzetta, it preferred LR to MLP, whereas Wulczyn et al. reported slightly higher performance for the MLP on W-ATT-DEV.[9] The tuning also selected probabilistic labels for Wikipedia, as in the work of Wulczyn et al.

[8] Two of the co-authors of Wulczyn et al. (2017) are with Jigsaw, who recently announced Perspective, a system to detect 'toxic' comments. Perspective is not the same as DETOX (personal communication), but we were unable to obtain scientific articles describing it. An API for Perspective is available at https://www.perspectiveapi.com/, but we did not have access to the API at the time the experiments of this paper were carried out.
[9] We repeated the tuning by evaluating on W-ATT-DEV, and again character n-grams with LR were selected.
[10] We tried replacing the LR layer by a deeper classification MLP, and the RNN chain by a bidirectional RNN (Schuster and Paliwal, 1997), but there were no improvements.
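As a rough illustration, and not the released DETOX implementation, the configuration selected by the tuning above (character n-grams with n <= 5 fed to an LR classifier) can be approximated with scikit-learn as follows; the comments and labels below are toy placeholders.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    train_texts = ["nice goal, great game", "you are an idiot"]   # hypothetical comments
    train_labels = [0, 1]                                         # 0 = accept, 1 = reject

    detox_like = make_pipeline(
        CountVectorizer(analyzer="char", ngram_range=(1, 5)),  # char 1- to 5-grams
        LogisticRegression(max_iter=1000),
    )
    detox_like.fit(train_texts, train_labels)
    print(detox_like.predict_proba(["what an idiot"])[:, 1])   # P(reject|c)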

3.2 RNN-based methods

RNN: The RNN method is a chain of GRU cells (Cho et al., 2014) that transforms the tokens w_1, ..., w_k of each comment to the hidden states h_1, ..., h_k, followed by an LR layer that uses h_k to classify the comment (accept, reject). Formally, given the vocabulary V, a matrix E \in R^{d x |V|} containing d-dimensional word embeddings, an initial h_0, and a comment c = <w_1, ..., w_k>, the RNN computes h_1, ..., h_k (h_t \in R^m) as follows:

    \tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)
    h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
    z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)
    r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)

where \tilde{h}_t \in R^m is the proposed hidden state at position t, obtained by considering the word embedding x_t of token w_t and the previous hidden state h_{t-1}; \odot denotes element-wise multiplication; r_t \in R^m is the reset gate (for r_t all zeros, it allows the RNN to forget the previous state h_{t-1}); z_t \in R^m is the update gate (for z_t all zeros, it allows the RNN to ignore the new proposed \tilde{h}_t, hence also x_t, and copy h_{t-1} as h_t); \sigma is the sigmoid function; W_h, W_z, W_r \in R^{m x d}; U_h, U_z, U_r \in R^{m x m}; b_h, b_z, b_r \in R^m. Once h_k has been computed, the LR layer estimates the probability that comment c should be rejected, with W_p \in R^{1 x m}, b_p \in R:

    P_{RNN}(reject | c) = \sigma(W_p h_k + b_p)
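Footnote 14 below states that the RNN-based methods were implemented with Keras and TensorFlow. The following is a compact sketch under our own assumptions, not the released code: an embedding layer (the matrix E), a GRU chain whose final state h_k feeds a sigmoid (LR) layer. The vocabulary size is a placeholder; d = 300 and m = 128 follow this section.

    import tensorflow as tf
    from tensorflow.keras import layers

    vocab_size, d, m = 50000, 300, 128   # vocab size is a placeholder

    rnn = tf.keras.Sequential([
        layers.Embedding(vocab_size, d),       # rows of E; updated during backpropagation
        layers.GRU(m),                         # GRU chain; returns the last hidden state h_k
        layers.Dense(1, activation="sigmoid"), # LR layer: sigmoid(W_p h_k + b_p)
    ])
    # The paper reports categorical cross-entropy; with two classes, a sigmoid
    # output trained with binary cross-entropy is the equivalent formulation.
    rnn.compile(optimizer="adam", loss="binary_crossentropy")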

a-RNN: When the attention mechanism is added, the LR layer considers the weighted sum h_{sum} of all the hidden states, instead of just h_k (Fig. 3):

    h_{sum} = \sum_{t=1}^{k} a_t h_t                        (1)

    P_{a-RNN}(reject | c) = \sigma(W_p h_{sum} + b_p)

The weights a_t are produced by an attention mechanism, which is an MLP with l layers:

    a_t^{(1)} = ReLU(W^{(1)} h_t + b^{(1)})                 (2)
    ...
    a_t^{(l-1)} = ReLU(W^{(l-1)} a_t^{(l-2)} + b^{(l-1)})
    a_t^{(l)} = W^{(l)} a_t^{(l-1)} + b^{(l)}
    a_t = softmax(a_t^{(l)}; a_1^{(l)}, ..., a_k^{(l)})     (3)

where a_t^{(1)}, ..., a_t^{(l-1)} \in R^r; a_t^{(l)}, a_t \in R; W^{(1)} \in R^{r x m}; W^{(2)}, ..., W^{(l-1)} \in R^{r x r}; W^{(l)} \in R^{1 x r}; b^{(1)}, ..., b^{(l-1)} \in R^r; b^{(l)} \in R. The softmax operates across the a_t^{(l)} (t = 1, ..., k), making the weights a_t sum to 1.

[Figure 3: Illustration of a-RNN.]
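Below is a sketch of our reading of Eqs. 1-3 in the TensorFlow 2.x functional API; again an illustration, not the released code. With l = 4 the attention MLP has three ReLU layers followed by a linear one; k, the padded comment length, is our assumption. Applying Dense layers to a (batch, k, m) tensor scores each position independently, matching the per-token MLP of Eq. 2; the da-CENT variant described below would feed the embeddings, rather than the GRU states, through the same stack.

    import tensorflow as tf
    from tensorflow.keras import layers

    vocab_size, d, m, r, k = 50000, 300, 128, 128, 200  # k is our assumption

    inp = layers.Input(shape=(k,))
    h = layers.GRU(m, return_sequences=True)(
        layers.Embedding(vocab_size, d)(inp))            # h_1 ... h_k
    a = layers.Dense(r, activation="relu")(h)            # Eq. 2, per position
    a = layers.Dense(r, activation="relu")(a)
    a = layers.Dense(r, activation="relu")(a)            # three ReLU layers ...
    a = layers.Dense(1)(a)                               # ... plus a linear a_t^(l)
    a = layers.Softmax(axis=1)(a)                        # Eq. 3: softmax over t = 1..k
    h_sum = tf.reduce_sum(a * h, axis=1)                 # Eq. 1: weighted sum of states
    out = layers.Dense(1, activation="sigmoid")(h_sum)   # P_a-RNN(reject|c)
    a_rnn = tf.keras.Model(inp, out)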

Our attention mechanism differs from most previous ones (Mnih et al., 2014; Bahdanau et al., 2015; Xu et al., 2015; Luong et al., 2015) in that it is used in a classification setting, where there is no previously generated output subsequence (e.g., a partly generated translation) to drive the attention (e.g., assign more weight to the source words to translate next), unlike seq2seq models (Sutskever et al., 2014). It assigns larger weights a_t to hidden states h_t corresponding to positions where there is more evidence that the comment should be accepted or rejected.

Yang et al. (2016) use a similar attention mechanism, but ours is deeper. In effect, they always set l = 2, whereas we allow l to be larger (tuning selects l = 4).[11] On the other hand, the attention mechanism of Yang et al. is part of a classification method for longer texts (e.g., product reviews). Their method uses two GRU RNNs, both bidirectional (Schuster and Paliwal, 1997), one turning the word embeddings of each sentence into a sentence embedding, and one turning the sentence embeddings into a document embedding, which is then fed to an LR layer. Yang et al. use their attention mechanism in both RNNs, to assign attention scores to words and sentences. We consider shorter texts (comments), we have a single RNN, and we assign attention scores to words only.[12]

da-CENT: We also experiment with a variant of a-RNN, called da-CENT, which does not use the hidden states of the RNN. The input to the first layer of the attention mechanism is now directly the embedding x_t instead of h_t (cf. Eq. 2), and h_{sum} is now the weighted sum (centroid) of the word embeddings, h_{sum} = \sum_{t=1}^{k} a_t x_t (cf. Eq. 1).[13]

We set l = 4, d = 300, r = m = 128, having tuned all hyper-parameters on the same 2% held-out comments of W-ATT-TRAIN or G-TRAIN-S that were used to tune DETOX. We use Glorot initialization (Glorot and Bengio, 2010), categorical cross-entropy loss, and Adam (Kingma and Ba, 2015).[14] Early stopping evaluates on the same held-out subsets. For Gazzetta, word embeddings are initialized to the WORD2VEC embeddings we provide (Section 2.1). For Wikipedia, they are initialized to GLOVE embeddings (Pennington et al., 2014).[15] In both cases, the embeddings are updated during backpropagation. Out of vocabulary (OOV) words, meaning words for which we have no initial embeddings, are mapped to a single randomly initialized embedding, also updated.

[11] Yang et al. use tanh instead of RELU in Eq. 2, which works worse in our case, and no bias b^{(l)} in the l-th layer.
[12] We tried a bidirectional instead of unidirectional GRU chain in our methods, also replacing the LR layer by a deeper classification MLP, but there were no improvements.
[13] For experiments with additional variants of a-RNN, consult Pavlopoulos et al. (2017).
[14] We implemented the methods of this sub-section using Keras (keras.io) and TensorFlow (tensorflow.org).
[15] See https://nlp.stanford.edu/projects/glove/. We use 'Common Crawl' (840B tokens).
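The pre-trained Gazzetta embeddings mentioned above (Section 2.1, footnote 6) can plausibly be reproduced with gensim as sketched below. The corpus file name is hypothetical, and the number of negative samples is our assumption, since footnote 6 only states that negative sampling was used.

    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    model = Word2Vec(
        sentences=LineSentence("comments.txt"),  # one tokenized comment per line
        vector_size=300,   # 'size=300' in older gensim versions
        sg=0,              # CBOW, as in footnote 6 (sg=1 would be skip-gram)
        window=5,
        min_count=5,       # min. term frequency 5
        negative=5,        # negative sampling; the sample count is our assumption
    )
    model.wv.save_word2vec_format("gazzetta300.vec")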
3.3 CNN

We also compare against a vanilla CNN operating on word embeddings. We describe the CNN only briefly, because it is very similar to that of Kim (2014); see also Goldberg (2016) for an introduction to CNNs, and Zhang and Wallace (2015).

For Wikipedia comments, we use a 'narrow' convolution layer, with kernels sliding (stride 1) over (entire) embeddings of word n-grams of sizes n = 1, ..., 4. We use 300 kernels for each n value, a total of 1,200 kernels. The outputs of each kernel, obtained by applying the kernel to the different n-grams of a comment c, are then max-pooled, leading to a single output per kernel. The resulting feature vector (1,200 max-pooled outputs) goes through a dropout layer (Hinton et al., 2012) (p = 0.5), and then to an LR layer, which provides P_CNN(reject | c). For Gazzetta, the CNN is the same, except that n = 1, ..., 5, leading to 1,500 features per comment. All hyper-parameters were tuned on the 2% held-out comments of W-ATT-TRAIN or G-TRAIN-S that were used to tune the other methods. Again, we use 300-dimensional embeddings, which are now randomly initialized, since tuning indicated this was better than initializing to pre-trained embeddings. OOV words are treated as in the RNN-based methods. All embeddings are updated during backpropagation. Early stopping evaluates on the held-out subsets. Again, we use Glorot initialization, categorical cross-entropy loss, and Adam.[16]

[16] We implemented the CNN directly in TensorFlow.
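A sketch of the CNN just described, under the same caveats as the previous snippets (illustrative, not the released TensorFlow code); the ReLU kernel activation is an assumption in the style of Kim (2014), as the text does not name one.

    import tensorflow as tf
    from tensorflow.keras import layers

    vocab_size, d, k = 50000, 300, 200   # placeholder sizes

    inp = layers.Input(shape=(k,))
    emb = layers.Embedding(vocab_size, d)(inp)   # randomly initialized, as tuning preferred
    pooled = [
        layers.GlobalMaxPooling1D()(             # one max-pooled output per kernel
            layers.Conv1D(300, n, strides=1, padding="valid", activation="relu")(emb))
        for n in range(1, 5)                     # n = 1..4 for Wikipedia; 1..5 for Gazzetta
    ]
    feats = layers.Dropout(0.5)(layers.Concatenate()(pooled))  # 1,200 features, p = 0.5
    out = layers.Dense(1, activation="sigmoid")(feats)         # LR layer: P_CNN(reject|c)
    cnn = tf.keras.Model(inp, out)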

3.4 LIST baseline

A baseline, called LIST, collects every word w that occurs in more than 10 (for W-ATT-TRAIN, G-TRAIN-S) or 100 comments (for G-TRAIN-L) in the training set, along with the precision of w, i.e., the number of rejected training comments containing w divided by the total number of training comments containing w. The resulting lists contain 10,423, 16,864, and 21,940 word types, when using W-ATT-TRAIN, G-TRAIN-S, G-TRAIN-L, respectively. For a comment c, LIST returns as P_LIST(reject | c) the maximum precision of all the words in c.
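A minimal sketch of the LIST baseline on toy data; the real frequency thresholds are 10 (W-ATT-TRAIN, G-TRAIN-S) or 100 comments (G-TRAIN-L).

    from collections import Counter

    train = [("you are an idiot", 1), ("nice goal", 0),
             ("what an idiot ref", 1), ("great match", 0)]  # (comment, 1 = rejected)

    seen, rejected = Counter(), Counter()
    for text, label in train:
        for w in set(text.split()):   # count comments containing w, not occurrences
            seen[w] += 1
            rejected[w] += label

    min_df = 1  # 10 for W-ATT-TRAIN / G-TRAIN-S, 100 for G-TRAIN-L
    precision = {w: rejected[w] / seen[w] for w in seen if seen[w] > min_df}

    def p_list_reject(comment):
        return max((precision.get(w, 0.0) for w in comment.split()), default=0.0)

    print(p_list_reject("that ref is an idiot"))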
3.5 Tuning thresholds

All methods produce a p = P(reject | c) per comment c. In semi-automatic moderation (Fig. 1), a comment is directly rejected if its p is above a rejection threshold t_r, it is directly accepted if p is below an acceptance threshold t_a, and it is shown to a moderator if t_a <= p <= t_r (gray zone of Fig. 4).

[Figure 4: Illustration of threshold tuning.]

In our experience, moderators (or their employers) can easily specify the approximate percentage of comments they can afford to check manually (e.g., 20% daily) or, equivalently, the approximate percentage of comments the system should handle automatically. We call the latter percentage coverage; hence, 1 - coverage is the approximate percentage of comments to be checked manually. By contrast, moderators are baffled when asked to tune t_r and t_a directly. Consequently, we ask them to specify the approximate desired coverage. We then sort the comments of the development set (G-DEV or W-ATT-DEV) by p, and slide t_a from 0.0 to 1.0 (Fig. 4). For each t_a value, we set t_r to the value that leaves a 1 - coverage percentage of development comments in the gray zone (t_a <= p <= t_r). We then select the t_a (and t_r) that maximizes the weighted harmonic mean F_\beta(P_reject, P_accept) on the development set:

    F_\beta(P_reject, P_accept) = (1 + \beta^2) \cdot P_reject \cdot P_accept / (\beta^2 \cdot P_reject + P_accept)

where P_reject is the rejection precision (correctly rejected comments divided by rejected comments) and P_accept is the acceptance precision (correctly accepted divided by accepted). Intuitively, coverage sets the width of the gray zone, whereas P_reject and P_accept show how certain we can be that the red (reject) and green (accept) zones are free of misclassified comments. We set beta = 2, emphasizing P_accept, because moderators are more worried about wrongly accepting abusive comments than wrongly rejecting non-abusive ones.[17] The selected t_a, t_r (tuned on development data) are then used in experiments on test data. In fully automatic moderation, coverage = 100% and t_a = t_r; otherwise, threshold tuning is identical.

[17] More precisely, when computing F_\beta, we reorder the development comments by time posted, and split them into batches of 100. For each t_a (and t_r) value, we compute F_\beta per batch and macro-average across batches. The resulting thresholds lead to F_\beta scores that are more stable over time.
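The tuning procedure can be sketched as follows. This is a simplification: footnote 17's macro-averaging of F_\beta over time-ordered batches of 100 comments is omitted, and the gray-zone bookkeeping is approximate.

    import numpy as np

    def tune_thresholds(p, gold, coverage=0.8, beta=2.0):
        """p: P(reject|c) per dev comment; gold: 1 = rejected; coverage in (0, 1]."""
        order = np.sort(p)
        n_gray = int(round((1 - coverage) * len(p)))  # comments left to the moderator
        best = (0.0, 0.0, 1.0)                        # (F_beta, t_a, t_r)
        for t_a in np.linspace(0.0, 1.0, 101):
            idx = int(np.searchsorted(order, t_a))    # comments with p < t_a: accepted
            t_r = order[min(idx + n_gray, len(p) - 1)]  # next n_gray form the gray zone
            rej, acc = p > t_r, p < t_a
            p_rej = gold[rej].mean() if rej.any() else 0.0        # rejection precision
            p_acc = (1 - gold[acc]).mean() if acc.any() else 0.0  # acceptance precision
            if p_rej + p_acc > 0:
                f = (1 + beta**2) * p_rej * p_acc / (beta**2 * p_rej + p_acc)
                if f > best[0]:
                    best = (f, t_a, t_r)
        return best  # e.g., f, t_a, t_r = tune_thresholds(p_dev, gold_dev, 0.5)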

4 Experimental results

4.1 Comment classification evaluation

Following Wulczyn et al. (2017), we report in Table 2 AUC scores (area under the ROC curve), along with Spearman correlations between system-generated probabilities P(accept | c) and human probabilistic gold labels (Section 2.2), when probabilistic gold labels are available.[18] Wulczyn et al. reported DETOX results only on W-ATT-DEV, shown in brackets.

[18] When computing AUC, the gold label is the majority label of the annotators. When computing Spearman, the gold label is probabilistic (% of annotators that accepted the comment). The decisions of the systems are always probabilistic.

Training      Evaluation   Score     RNN    a-RNN  da-CENT  CNN    DETOX          LIST
G-TRAIN-S     G-DEV        AUC       75.75  76.19  74.91    70.97  72.50          61.47
              G-TEST-L     AUC       75.10  76.15  74.72    71.34  72.06          61.59
              G-TEST-S     AUC       74.40  75.83  73.79    70.88  71.59          61.26
              G-TEST-S-R   AUC       80.27  80.41  78.82    76.03  75.67          64.19
              G-TEST-S-R   Spearman  51.89  52.51  49.22    42.88  43.80          24.33
G-TRAIN-L     G-DEV        AUC       79.50  79.64  78.73    77.57  -              67.04
              G-TEST-L     AUC       79.41  79.58  78.64    77.35  -              67.06
              G-TEST-S     AUC       79.23  79.67  78.62    78.16  -              66.17
              G-TEST-S-R   AUC       84.17  84.69  83.53    83.98  -              69.51
              G-TEST-S-R   Spearman  59.31  60.87  57.82    55.90  -              33.61
W-ATT-TRAIN   W-ATT-DEV    AUC       97.39  97.46  96.58    96.91  96.26 (96.59)  93.05
              W-ATT-DEV    Spearman  71.92  71.59  68.59    70.06  67.75 (68.17)  55.39
              W-ATT-TEST   AUC       97.71  97.68  96.83    97.07  96.71          92.91
              W-ATT-TEST   Spearman  72.79  72.32  68.86    70.21  68.09          54.55

Table 2: Comment classification results. Scores reported by Wulczyn et al. (2017) are shown in brackets.
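The two measures of Section 4.1 and footnote 18, sketched with standard library calls on hypothetical system outputs:

    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.metrics import roc_auc_score

    p_accept = np.array([0.9, 0.2, 0.7, 0.4])   # hypothetical system P(accept|c)
    gold_prob = np.array([1.0, 0.1, 0.8, 0.5])  # % of annotators accepting
    gold_majority = (gold_prob >= 0.5).astype(int)

    auc = roc_auc_score(gold_majority, p_accept)   # AUC vs. majority gold labels
    rho, _ = spearmanr(p_accept, gold_prob)        # Spearman vs. probabilistic labels
    print(auc, rho)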

Table 2 shows that RNN is always better than CNN and DETOX; there is no clear winner between CNN and DETOX. Furthermore, a-RNN is always better than RNN on Gazzetta comments, but not on Wikipedia comments, where RNN is overall slightly better according to Table 2. Also, da-CENT is always worse than a-RNN and RNN, confirming that the hidden states (intuitively, context-aware word embeddings) of the RNN chain are important, even with the attention mechanism. Increasing the size of the Gazzetta training set (G-TRAIN-S to G-TRAIN-L) significantly improves the performance of all methods. The implementation of DETOX could not handle the size of G-TRAIN-L, which is why we do not report DETOX results for G-TRAIN-L. Notice, also, that the Wikipedia dataset is easier than the Gazzetta one (all methods perform better on Wikipedia comments, compared to Gazzetta).

[Figure 5: F2 scores for varying coverage. Dotted lines were obtained using a larger training set.]

Figure 5 shows F2(P_reject, P_accept) on G-TEST-L and W-ATT-TEST, when t_a, t_r are tuned on G-DEV and W-ATT-DEV for varying coverage. For G-TEST-L, we show results training on G-TRAIN-S (solid lines) and G-TRAIN-L (dotted). The differences between RNN and a-RNN are again small, but it is now easier to see that a-RNN is overall better. Again, a-RNN and RNN are better than CNN and DETOX. All three deep learning methods benefit from the larger training set (dotted). In Wikipedia, a-RNN obtains P_accept, P_reject >= 0.94 for all coverages (Fig. 5, call-outs). On the more difficult Gazzetta dataset, a-RNN still obtains P_accept, P_reject >= 0.85 when tuned for 50% coverage. When tuned for 100% coverage, comments for which the system is uncertain (gray zone) cannot be avoided and there are inevitably more misclassifications; the use of F2 during threshold tuning places more emphasis on avoiding wrongly accepted comments, leading to high P_accept (0.82), at the expense of wrongly rejected comments, i.e., sacrificing P_reject (0.59). On the re-moderated G-TEST-S-R (similar diagrams, not shown), P_accept and P_reject become 0.96 and 0.88 for coverage 50%, and 0.92 and 0.48 for coverage 100%.

We also repeated the annotator ensemble experiment of Wulczyn et al. (2017) on 8K randomly chosen comments of W-ATT-TEST (4K comments from random users, 4K comments from banned users).[19]

The decisions of 10 randomly chosen annotators (possibly different per comment) were used to construct the gold label of each comment. The gold labels were then compared to the decisions of the systems and the decisions of an ensemble of k other annotators, k ranging from 1 to 10. Table 3 shows the mean AUC and Spearman scores, averaged over 25 runs of the experiment, along with standard errors (in brackets). We conclude that RNN and a-RNN are as good as an ensemble of 7 human annotators; CNN is as good as 4 annotators; DETOX is as good as 4 annotators in AUC and 3 annotators in Spearman correlation, which is consistent with the results of Wulczyn et al. (2017).

k        AUC           Spearman
1        84.34 (0.64)  53.82 (0.77)
2        92.14 (0.42)  61.61 (0.51)
3        95.05 (0.41)  65.20 (0.55)
4        96.43 (0.31)  67.25 (0.52)
5        97.17 (0.25)  68.46 (0.49)
6        97.68 (0.22)  69.45 (0.40)
7        97.99 (0.21)  70.16 (0.33)
8        98.21 (0.18)  70.67 (0.31)
9        98.39 (0.15)  71.12 (0.32)
10       98.51 (0.14)  71.50 (0.34)
RNN      98.03 (0.13)  70.58 (0.27)
a-RNN    98.00 (0.13)  70.19 (0.30)
CNN      97.29 (0.14)  67.92 (0.32)
DETOX    97.00 (0.14)  66.21 (0.32)

Table 3: Comparing to an ensemble of k humans.

[19] We used the protocol, code, and data of Wulczyn et al.

4.2 Snippet highlighting evaluation

To investigate if the attention scores of a-RNN can highlight suspicious words, we focused on G-TEST-S-R, the only dataset with suspicious snippets annotated by humans. We removed comments with no human-annotated snippets, leaving 841 comments (515 accepted, 326 rejected), a total of 40,572 tokens, of which 13,146 were inside a suspicious snippet of at least one annotator. In each remaining comment, each token was assigned a gold suspiciousness score, defined as the percentage of annotators that included it in their snippets.

We evaluated three methods that score each token w_t of a comment c for suspiciousness. The first one assigns to each w_t the attention score a_t (Eq. 3) of a-RNN (trained on G-TRAIN-L). The second method assigns to each w_t its precision, as computed by LIST (Section 3.4). The third method (RAND) assigns to each w_t a random (uniform distribution) score between 0 and 1. In the latter two methods, a softmax is applied to the scores of all the tokens per comment, as in a-RNN. Figure 6 shows three comments (from W-ATT-TEST) highlighted by a-RNN; heat corresponds to attention.[20]

[Figure 6: Word highlighting by a-RNN.]

We computed Pearson and Spearman correlations between the gold suspiciousness scores and the scores of the three methods on the 40,572 tokens. Figure 7 shows the correlations on comments that were accepted (left) and rejected (right) by the majority of moderators. In both cases, a-RNN performs better than LIST and RAND by both Pearson and Spearman correlations. The high Pearson correlations of a-RNN also show that its attention scores are to a large extent linearly related to the gold ones. By contrast, LIST performs reasonably well in terms of Spearman correlation, but much worse in terms of Pearson, indicating that its precision scores rank the tokens reasonably well from most to least suspicious, but are not linearly related to the gold scores.

[Figure 7: Suspicious snippet highlighting results.]

[20] In innocent comments, a-RNN spreads its attention to all tokens, leading to quasi-uniform low color intensity.
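The token-level comparison of Section 4.2 reduces to Pearson and Spearman correlations between two score vectors; a toy sketch follows, where the arrays are hypothetical stand-ins for the 40,572 tokens.

    import numpy as np
    from scipy.stats import pearsonr, spearmanr

    # gold: share of the 5 annotators whose snippets covered each token
    gold = np.array([0.0, 0.8, 1.0, 0.2, 0.0])
    # method scores, e.g., a-RNN attention weights a_t (sum to 1 per comment)
    attention = np.array([0.05, 0.30, 0.45, 0.15, 0.05])

    print(pearsonr(gold, attention)[0], spearmanr(gold, attention)[0])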

5 Related work

Djuric et al. (2015) experimented with 952K manually moderated comments from Yahoo Finance, but their dataset is not publicly available. They convert each comment to a comment embedding using DOC2VEC (Le and Mikolov, 2014), which is then fed to an LR classifier.

Nobata et al. (2016) experimented with approx. 3.3M manually moderated comments from Yahoo Finance and News; their data are also not available.[21] They used Vowpal Wabbit[22] with character n-grams (n = 3, ..., 5) and word n-grams (n = 1, 2), hand-crafted features (e.g., number of capitalized or black-listed words), features based on dependency trees, averages of WORD2VEC embeddings, and DOC2VEC-like embeddings. Character n-grams were the best, on their own outperforming Djuric et al. (2015). The best results, however, were obtained using all features. We use no hand-crafted features and parsers, making our methods more easily portable to other domains and languages.

[21] According to Nobata et al., their clean test dataset (2K comments) would be made available, but it is currently not.
[22] See http://hunch.net/~vw/.

Mehdad and Tetreault (2016) train a (token- or character-based) RNN language model per class (accept, reject), and use the probability ratio of the two models to accept or reject user comments. Experiments on the dataset of Djuric et al. (2015), however, showed that their method (RNNLMs) performed worse than a combination of SVM and Naive Bayes classifiers (NBSVM) that used character and token n-grams. An LR classifier operating on DOC2VEC-like comment embeddings (Le and Mikolov, 2014) also performed worse than NBSVM. To surpass NBSVM, Mehdad and Tetreault used an SVM to combine features from their three other methods (RNNLMs, LR with DOC2VEC, NBSVM).

Wulczyn et al. (2017) experimented with character and word n-grams. We included their dataset and moderation system (DETOX) in our experiments. Waseem and Hovy (2016) used approx. 17K tweets annotated for hate speech. Their best results were obtained using an LR classifier with character n-grams (n = 1, ..., 4), plus gender. Warner and Hirschberg (2012) aimed to detect anti-semitic speech, experimenting with 9K paragraphs and a linear SVM. Their features consider windows of at most 5 tokens, examining the tokens of each window, their order, POS tags, Brown clusters etc., following Yarowsky (1994).

Cheng et al. (2015) aimed to predict which users would be banned from on-line communities. Their best system used a random forest or LR classifier, with features examining readability, activity (e.g., number of posts daily), and community and moderator reactions (e.g., up-votes, number of deleted posts). Sood et al. (2012a; 2012b) experimented with 6.5K comments from Yahoo Buzz, moderated via crowdsourcing. They showed that a linear SVM, representing each comment as a bag of word bigrams and stems, performs better than word lists. Their best results were obtained by combining the SVM with a word list and edit distance.

Yin et al. (2009) used posts from chat rooms and discussion fora (<15K posts in total) to train an SVM to detect online harassment. They used TF-IDF, sentiment, and context features (e.g., similarity to other posts in a thread). Our methods might also benefit by considering threads, rather than individual comments. Yin et al. point out that, unlike other abusive content, spam in comments or discussion fora (Mishne et al., 2005; Niu et al., 2007) is off-topic and serves a commercial purpose. Spam is unlikely in Wikipedia discussions and not an issue in the Gazzetta dataset (Fig. 2). For a more extensive discussion of related work, consult Pavlopoulos et al. (2017).

6 Conclusions

We experimented with a new publicly available dataset of 1.6M moderated user comments from a Greek sports news portal and an existing dataset of 115K English Wikipedia talk page comments. We showed that a GRU RNN operating on word embeddings outperforms the previous state of the art, which used an LR or MLP classifier with character or word n-gram features, also outperforming a vanilla CNN operating on word embeddings, and a baseline that uses an automatically constructed word list with precision scores. A novel, deep, classification-specific attention mechanism further improves the overall results of the RNN, and can also highlight suspicious words for free, without including highlighted words in the training data. We considered both fully automatic and semi-automatic moderation, along with threshold tuning and evaluation measures for both.

We plan to consider user-specific information (e.g., ratio of comments rejected in the past) (Cheng et al., 2015; Waseem and Hovy, 2016) and to explore character-level RNNs or CNNs (Zhang et al., 2015), e.g., as a first layer to produce embeddings of unknown words from characters (dos Santos and Zadrozny, 2014; Ling et al., 2015), which would then be passed on to our current methods that operate on word embeddings.

Acknowledgments

This work was funded by Google's Digital News Initiative (project ML2P, contract 362826).[23] We are grateful to Gazzetta for the data they provided. We also thank Gazzetta's moderators for their feedback, insights, and advice.

[23] See https://digitalnewsinitiative.com/.

References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations. San Diego, CA, USA.

Justin Cheng, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. 2015. Antisocial behavior in online discussion communities. In Proceedings of the 9th International AAAI Conference on Web and Social Media. Oxford University, England, pages 61-70.

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Doha, Qatar, pages 1724-1734.

Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20(1):37-46.

Nemanja Djuric, Jing Zhou, Robin Morris, Mihajlo Grbovic, Vladan Radosavljevic, and Narayan Bhamidipati. 2015. Hate speech detection with comment embeddings. In Proceedings of the 24th International Conference on World Wide Web. Florence, Italy, pages 29-30.

Cícero Nogueira dos Santos and Bianca Zadrozny. 2014. Learning character-level representations for part-of-speech tagging. In Proceedings of the 31st International Conference on Machine Learning. Beijing, China, pages 1818-1826.

Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics. Sardinia, Italy, pages 249-256.

Yoav Goldberg. 2016. A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research 57:345-420.

Yoav Goldberg. 2017. Neural Network Methods in Natural Language Processing. Morgan and Claypool Publishers.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.

Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2012. Improving neural networks by preventing co-adaptation of feature detectors. CoRR abs/1207.0580.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Doha, Qatar, pages 1746-1751.

Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations. San Diego, CA, USA.

Klaus Krippendorff. 2004. Content Analysis: An Introduction to Its Methodology (2nd edition). Sage Publications.

Quoc V. Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning. Beijing, China, pages 1188-1196.

Wang Ling, Chris Dyer, Alan W. Black, Isabel Trancoso, Ramon Fermandez, Silvio Amir, Luís Marujo, and Tiago Luís. 2015. Finding function in form: Compositional character models for open vocabulary word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal, pages 1520-1530.

Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal, pages 1412-1421.

Yashar Mehdad and Joel Tetreault. 2016. Do characters abuse more than words? In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Los Angeles, CA, pages 299-303.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. In Proceedings of Workshop at the International Conference on Learning Representations. Scottsdale, AZ, USA.

Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013b. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Atlanta, GA, pages 746-751.

Gilad Mishne, David Carmel, and Ronny Lempel. 2005. Blocking spam with language model disagreement. In Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web. Chiba, Japan.

Volodymyr Mnih, Nicolas Heess, Alex Graves, et al. 2014. Recurrent models of visual attention. In Advances in Neural Information Processing Systems. Montreal, Canada, pages 2204-2212.

Yuan Niu, Yi-Min Wang, Hao Chen, Ming Ma, and Francis Hsu. 2007. A quantitative study of forum spamming using context-based analysis. In Proceedings of the 14th Annual Network and Distributed System Security Symposium. San Diego, CA, pages 79-92.

Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang. 2016. Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web. Montreal, Canada, pages 145-153.

John Pavlopoulos, Prodromos Malakasiotis, and Ion Androutsopoulos. 2017. Deep learning for user comment moderation. In Proceedings of the 1st ACL Workshop on Abusive Language Online. Vancouver, Canada.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Doha, Qatar, pages 1532-1543.

Mike Schuster and Kuldip K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45(11):2673-2681.

Sara Sood, Judd Antin, and Elizabeth F. Churchill. 2012a. Profanity use in online communities. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Austin, TX, USA, pages 1481-1490.

Sara Sood, Judd Antin, and Elizabeth F. Churchill. 2012b. Using crowdsourcing to improve profanity detection. In AAAI Spring Symposium: Wisdom of the Crowd. Stanford, CA, USA, pages 69-74.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems. Montreal, Canada, pages 3104-3112.

William Warner and Julia Hirschberg. 2012. Detecting hate speech on the World Wide Web. In Proceedings of the 2nd Workshop on Language in Social Media. Montreal, Canada, pages 19-26.

Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the NAACL Student Research Workshop. San Diego, CA, USA, pages 88-93.

Ellery Wulczyn, Nithum Thain, and Lucas Dixon. 2017. Ex machina: Personal attacks seen at scale. In Proceedings of the 26th International Conference on World Wide Web. Perth, Australia, pages 1391-1399.

Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In ICML. Lille, France, pages 2048-2057.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego, CA, USA, pages 1480-1489.

David Yarowsky. 1994. Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics. Las Cruces, NM, USA, pages 88-95.

Dawei Yin, Zhenzhen Xue, Liangjie Hong, Brian D. Davison, April Kontostathis, and Lynne Edwards. 2009. Detection of harassment on Web 2.0. In Proceedings of the WWW Workshop on Content Analysis in the Web 2.0. Madrid, Spain.

Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada, pages 649-657.

Ye Zhang and Byron C. Wallace. 2015. A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification. CoRR abs/1510.03820.