arXiv:2011.10280v1 [cs.CL] 20 Nov 2020 K a eevdassandfcs hl h edhsmd unden [ made attacks has adversarial field to the susceptibility and While [5] plat focus. classifiers media sustained social a various received in has sexual research gender, ethnicity, detection color, speech race, Hate of basis challeng the research on group widely-studied a detection, speech Hate detectors. speech hate thi for In examples -foc popular adversarial it? highly on to five pattern by general hosted any videos YouTube there 8,818 is discussio and chess happen benign that over does trip inci may this classifiers internet, speech the hate over speech filtering and monitoring otx fces othmteban the him cost chess, of context o enttdi 4hus n oTb intpoieaysp circumstances any political provide current didn’t YouTube the under and that hours, Grandma speculated 24 on c in podcast it reinstated chess because got blocked a got presenting Channel) Chess while (Agadmator’s handle 2020, 28, June On Introduction 1 https://www.cs.cmu.edu/~akhudabu/Chess.html eywords A 1 3 2 hsppri cetda AI22 suetasrc) h d The abstract). (student 2021 AAAI at accepted is paper This https://www.thesun.co.uk/news/12007002/chess-champ- https://www.bbc.com/news/world-us-canada-53273381 RE oTb hnes eaktefloigrsac question: research following the ask we channels, YouTube channel the and reason, specific further give not did YouTube ecnld iha nrgigaaoyrsl nrca bias racial on result analogy intriguing an with conclude We rae hleg fclrpolysemy. color of challenge broader nJn 8 00 hl rsnigacespdato Grandma on podcast chess a presenting while Radi 2020, nio 28, June On u f6195cmet,o ,1 oTb ieshse yfiv by hosted ban videos YouTube temporary 8,818 this on him comments, earned 681,995 chess, of of pus context the in albeit lsiest u-fdmi desra examples? ments adversarial out-of-domain to classifiers Radi ever, aeSpeech Hate C aln blKlmAa nvriyo Technology of University Azad Kalam Abul Maulana HESS 1 hr xsight pehcasfir icasfidbenign misclassified classifiers speech hate existing where ’ oTb adegtbokdbcuei otie “harmfu contained it because blocked got handle YouTube c’s ´ pcltdta ie h urn oiia iuto,ar a situation, political current the given that speculated c ´ [email protected] · D Chess ISCUSSIONS ua Sarkar Rupak · Racism 3 h wf orecreto yYuuenotwithstanding, YouTube by course-correction swift The . S PEECH oebr2,2020 23, November . A R P A BSTRACT ]aewell-documented. are 6] ACIST REPRINT 2 eerlt bakaantwie nacmltl differe completely a in white” against “black to referral a , D etrie nipratquestion: important an raises dent retto,ntoaiy eiin rohrcharacteris other or religion, nationality, orientation, ,sest eetcmuiaindsaaigapro ra or person a disparaging communication detect to seeks e, om uha aeok[] wte 3 n oTb [4] YouTube and [3] [2], Facebook as such forms s icasfigte sht speech? hate as them misclassifying ns, ae,vaasbtnilcru f6195cmet on comments 681,995 of corpus substantial a via paper, s ATA nand“amu n agru”cnet h channel The content. dangerous” and “harmful ontained erlaeadt e f100anttdcom- annotated 1,000 of set data a release We sdYuuecanl,w eotorogigresearch ongoing our report we channels, YouTube used cfi esnfrti eprr a.Hwvr Radi However, ban. temporary this for reason ecific -podcast-race-ban/ al rges h oansniiiyo aespeech hate of domain-sensitivity the progress, iable A ? trHkr aaua noi Radi Antonio Nakamura, Hikaru ster o outaeoftesefhate-speech off-the-shelf are robust how S ihorfidnspitn u othe to out pointing findings our with nti ae,vaasbtnilcor- substantial a via paper, this In . ET o enttdwti 4hus How- hours. 24 within reinstated got N hs icsin sht speech. hate as discussions chess siu .KhudaBukhsh R. Ashiqur A fra o“lc gis white”, against “black to eferral ihypplrchess-focused popular highly e angeMlo University Mellon Carnegie trHkr aaua Anto- Nakamura, Hikaru ster DVERSARIAL [email protected] n agru”content. dangerous” and l t e saalbeat available is set ata si osbeta current that possible it is nteaeo AI of age the in fys o often how yes, If H ’ YouTube c’s ´ ATE is[1]. tics nt c ´ Are Chess Discussions Racist? An Adversarial Hate Speech Data Set APREPRINT

In this paper, we explore the domain of discussions where white and black are always adversaries; kills , captures , threatens , and attacks each other’s pieces; and terms such as Indian defence, Marshall attack are common occurrences. Our primary contribution is an annotated data set of 1,000 comments verified as not hate speech by human annotators that are incorrectly flagged as hate speech by existing classifiers.

2 Data Set and Hate Speech Classifiers

We consider five chess-focused YouTube channels listed in Table 1. We consider all videos in these channels uploaded on or before July 5, 2020, and use the publicly available YouTube API to obtain comments from these 8,818 videos. Our data set consists of 681,995 comments (denoted by Dchess ) posted by 172,034 unique users.

YouTube handles #Subscribers #Videos #Comments Agadmator’s Chess Channel 772k 1,780 420,350 MatoJelic 143k 2,976 126,032 Chess.com 388k 2,189 61,472 John Bartholomew 140k 1,619 589,38 GM Benjamin Finegold 59.3k 254 15,203

Table 1: List of YouTube channels considered.

Mtwitter Mstormfront [3] BERT trained on [7] Fraction of positives 1.25% 0.43% Human evaluation 5% (true positive) 15% (true positive) on predicted positives 87% (false positive) 82% (false positive) 8% (indeterminate) 3% (indeterminate)

Table 2: Classifier performance on Dchess .

That is one of the most beautiful attacking sequences I have ever seen, black was always on the back foot. Thank you for sharing. Seeing your channel one day in my recommended got me playing chess again after 15 years. All the best. At 7:15 of the video Agadmator shows what happens when Black goes for the . While this may be the most interesting move, the strongest continuation for Black is Kg4. as Agadmator states, White is still winning. But Black can prolong the agony for quite a while. White’s attack on Black is brutal. White is stomping all over Black’s defenses. The Black King is gonna fall... That end games looks like a draw to me... its hard to see how it’s winning for white. I seems like black should be able to block whites advance. ...he can still put up a fight (i dont see any immediate threat black can give white as long as white can hold on to the e-rook)

Table 3: Random samples of misclassified hate speech.

We consider two hate speech classifiers: (1) an off-the-shelf hate speech classifier [3] trained on twitter data (denoted by Mtwitter ); anda (2) a BERT-based classifier trained on annotated data from a white supremacist forum [7] (denoted as Mstormfront ).

3 Results

We run both classifiers on Dchess . We observe that only a minuscule fraction of comments are flagged as hate speech by our classifiers (see, Table 2). We next manually annotate randomly sampled 1,000 such comments marked as hate speech by at least one or more classifiers. Overall, we obtain 82.4% false positives, 11.9% true positives, and 5.7% as indeterminate. High false positive rate indicates that without a human in the loop, relying on off-the-shelf classifiers’ predictions on chess discussions can be misleading. We next evaluate individual false positive performance by manually annotating 100 randomly sampled comments marked as hate speech by each of the classifiers. We find that Mtwitter has slightly higher false positive rate than Mstormfront . Also, Mstormfront flags substantially fewer comments as hate speech than Mtwitter . Since Mstormfront is trained on a white supremacist forum data set,

2 Are Chess Discussions Racist? An Adversarial Hate Speech Data Set APREPRINT

perhaps this classifier has seen hate speech targeted at the black community on a wider range of contexts. Hence, corroborating to the well-documented domain-sensitivity of hate speech classifiers, Mstormfront performs slightly better than Mtwitter trained on a more general hate speech twitter data set. Table 3 lists a random sample of false positives from Mstormfront and Mtwitter . We note that presence of words such as black, white, attack, threat possibly triggers the classifiers.

4 Black is to Slave as White is to?

We conclude our paper with a simple yet powerful observation. Word associations test (e.g., France:Paris::Italy:Rome) on Skip-gram embedding spaces [8] are well-studied. Social biases in word embedding spaces is a well-established phenomenon [9] with several recent lines of work channelised to debiasing efforts [10]. We perform the following analogy test: black:slave::white:?, on Dchess and two data sets containing user discussions on YouTube videos posted on official channels of Fox News (Dfox ) and CNN (Dcnn ) in 2020. While both Dfox and Dcnn predict slavemaster, Dchess predicts slave! Hence, the captures , fights , tortures and killings notwithstanding, over the 64 black and white squares, the two colors attain a rare equality the rest of the world is yet to see.

References

[1] John T. Nockleby. Hate speech. Encyclopedia of the American constitution, 3(2):1277–1279, 2000. [2] Fabio Del Vigna, Andrea Cimino, Felice Dell’Orletta, Marinella Petrocchi, and Maurizio Tesconi. Hate me, hate me not: Hate speech detection on facebook. In Proceedings of the First Italian Conference on Cybersecurity (ITASEC17), pages 86–95, 2017. [3] Thomas Davidson, Dana Warmsley, Michael W. Macy, and Ingmar Weber. Automated hate speech detection and the problem of offensive language. In ICWSM 2017, pages 512–515, 2017. [4] Karthik Dinakar, Birago Jones, Catherine Havasi, Henry Lieberman, and Rosalind Picard. Common sense rea- soning for detection, prevention, and mitigation of cyberbullying. ACM Transactions on Interactive Intelligent Systems (TiiS), 2(3):18, 2012. [5] Aymé Arango, Jorge Pérez, and Barbara Poblete. Hate speech detection is not as easy as you may think: A closer look at model validation. In SIGIR, pages 45–54, 2019. [6] Tommi Gröndahl, Luca Pajola, Mika Juuti, Mauro Conti, and N Asokan. All you need is" love" evading hate speech detection. In Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security, pages 2–12, 2018. [7] Ona de Gibert, Naiara Perez, Aitor García-Pablos, and Montse Cuadros. Hate Speech Dataset from a White Supremacy Forum. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pages 11–20, 2018. [8] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013. [9] Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16):E3635–E3644, 2018. [10] Thomas Manzini, Lim Yao Chong, Alan W Black, and Yulia Tsvetkov. Black is to criminal as caucasian is to police: Detecting and removing multiclass bias in word embeddings. In NAACL-HLT, pages 615–621, June 2019.

3