Detecting Anti-Jewish Messages on Social Media. Building an Annotated Corpus That Can Serve as a Preliminary Gold Standard

Gunther Jikeli,1 Deepika Awasthi,1 David Axelrod,1 Daniel Miehling,2 Pauravi Wagh,1 and Weejoeng Joeng1
1 Indiana University, 2 Technical University Berlin
{gjikeli, dawasthi, daaxelro, damieh, pwagh, weejeong}@iu.edu

Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Hate speech detection in online environments faces numerous challenges. One of them is that hate speech has fundamental target-specific elements. Although certain characteristics are common to many forms of hate speech, forms directed against one group, such as Jews, can be very different from forms directed against Muslims, Roma, members of the LGBTQ+ community, and victims of racism. Due to the heterogeneity of hate forms, we suggest approaching forms piecemeal and building labeled datasets that are specific to target groups. These datasets can then be combined into an aggregate dataset that increasingly reflects the variety of hate speech found for a given language. Another challenge is the subjectivity of annotators and heterogeneous labeling. We created a labeled dataset of 4,137 antisemitic and non-antisemitic tweets, using a detailed definition and a specially designed annotation portal. The annotation was done by expert annotators who discussed their disagreements on each tweet. The dataset is built on representative samples of tweets containing more common keywords (such as “Jews”) and keywords most likely to be used in antisemitic contexts (such as the term “kikes”). The dataset will be made available to the scientific community with the publication of this paper and will be updated with additional tweets and labels as the project continues. The paper describes the dataset, the labeling process, the infrastructure that was built for this project, some of the challenges that we faced, and an evaluation of the inter-coder reliability. The goal is to provide a detailed description of the labeled dataset to serve as a preliminary gold standard and a model for creating similar datasets.

Introduction

Hate speech has increasingly been the focus of public debate, and social media platforms have pledged to remove it from their platforms. Since the “Unite the Right” rally in Charlottesville, Virginia, in August 2017 and the public outcry following the killing of one counter-protestor, major platforms, such as Facebook and Twitter, have suspended a portion of accounts violating their updated terms of service. More systematic suspensions and deletions of accounts came after the 2019 attack at two mosques in Christchurch, New Zealand, where the terrorist killed 51 people and live-streamed the killings. Another push came after the violent riots on Capitol Hill in January 2021. These efforts to remove hateful content have been imperfect, and it has become evident that better mechanisms, improved algorithms, and more transparency need to be put in place to deal with harmful content on social media. Additionally, hate speech detection can be used in multi-dimensional context analysis to detect extremist narratives used by terrorists for preventative applications (Kursuncu et al. 2020).

Antisemitism is a core element of ideologies that are closely related to hate speech, such as white nationalism (Ward 2017) and jihadism (Rickenbacher 2021).

Related Work

Zannettou et al. (2020) present a framework for quantitative analyses of online antisemitism. Their quantitative measurement of slurs and antisemitic memes works well for fringe communities, such as 4chan and Gab, where users frequently use such explicit expressions of hate and where even the word “Jew” can be a strong indicator of hate speech. However, that is less useful for mainstream social media platforms, such as Twitter. For example, many users on mainstream platforms call out others’ use of antisemitic slurs, potentially leading to false-positive classifications.

Another approach to hate speech detection is training algorithms with labeled datasets (gold standards). More and more such gold standards on hate speech are publicly available. However, automated hate speech detection is still challenging. The largest publicly available labeled dataset on hate speech was created by Gomez et al. (2020) (https://gombru.github.io/2019/10/09/MMHS/). 150,000 tweets (each containing one of 51 terms from Hatebase.org) were labeled in six categories by three different workers from the crowdsourcing platform Amazon Mechanical Turk: no attacks against any community, racist, sexist, homophobic, religion-based attacks, or attacks against other communities. They used majority voting to determine the category of each tweet and experimented with hate scores for each tweet to account for the different votes by the three annotators instead of binary labels. However, they found that a major challenge for their models is the discrepancy between annotations due to subjective judgement. Their best performing model, using text and images, had a mean accuracy of 68.5 percent and an F-score of 0.702, not outperforming models based on text only.

Davidson et al. (2017) created another large dataset. 24,802 tweets were labeled by at least three human annotators and classified in three groups: hate speech, offensive but not hate speech, or neither offensive nor hate speech. The intercoder-agreement score provided by CrowdFlower was 92 percent. They used the majority decision for each tweet to assign a label. Unanimous decisions were considerably less frequent than 2/3 decisions. Malmasi and Zampieri (2017) applied a linear Support Vector Machine (SVM) classifier to 14,509 tweets from the same dataset that are publicly available via CrowdFlower (https://data.world/crowdflower/hate-speech-identification). Based on a 4-gram model, their classifier achieves 78 percent accuracy.

Waseem (2016) created another publicly available dataset of 6,909 labeled tweets. One expert annotator and three amateur annotators classified tweets in four groups: non-hate speech, racism, sexism, and both (i.e., racism and sexism). Inter-annotator agreement among the amateur annotators was k = 0.57. Gambäck and Sikdar (2017) used Waseem’s dataset to train four deep learning models (CNNs), but their best performing model, based on word2vec embeddings, does not substantially outperform the binary classification by Waseem and Hovy (2016), with higher precision than recall and a 78.3 percent F-score.

Our aim is to build a labeled dataset of high accuracy and high inter-annotator agreement, targeting a single form of hate speech (antisemitism) only. We hope that this might improve the performance of models based on this dataset.

Research on antisemitism is only at the beginning of the computational turn (Bruns 2020). However, another research group is working on a project that is similar to ours. Chandra et al. (2021) built a labeled dataset on antisemitism of 3,102 posts on Twitter and 3,509 posts on Gab (soon to be published; see https://github.com/mohit3011/Online-Antisemitism-Detection-Using-MultimodalDeep-Learning), focusing on posts that include both images and text and words related to Jews, such as ‘Jewish’, ‘Hasidic’, ‘Hebrew’, ‘Semitic’, ‘Judaistic’, ‘israeli’, ‘yahudi’, ‘yehudi’, and also slurs. Three annotators labeled posts as antisemitic or not and classified antisemitic posts in one of four categories: political, economic, religious, or racial, and also used the Working Definition of Antisemitism by the International Holocaust Remembrance Alliance (IHRA Definition) (Chandra et al. 2021). One of the main differences to our labeled dataset is that we use samples from a 2019 and 2020 dataset that includes ten percent of all tweets on a statistically relevant basis.

However, time-consuming manual annotation is the bottleneck for most supervised machine learning projects. Our project on antisemitic tweets is not different in principle from many other hate speech dataset projects that include defining a classification schema, creating labeling guidelines, gathering adequate data, pre-processing this data according to the task, training experts for labeling, and building a final corpus (Pustejovsky and Stubbs 2012).

Generating Our Corpus

Our dataset is an aggregate of samples taken from a Twitter database using a 10 percent stream of Twitter data that is managed by a university-based project [reference omitted to meet double blind requirements]. We can query this database, going back 36 months. The database is compliant with Twitter policy and removes deleted tweets on a regular basis. As such, we use live tweets only. The database allows us to build subsamples with keywords that are statistically representative of all tweets with these keywords. We then manually label the subsamples.

To build our preliminary gold standard, we would ideally use a completely randomized sample of tweets spanning all variations of antisemitic and non-antisemitic tweets. However, we are limited by time and the number of expert annotators who manually evaluate tweets. Labeling test samples showed that the evaluation of one tweet requires 1-2 minutes on average and that annotators can rarely do more than 100 tweets per day before quality suffers. As we wanted to have each tweet labeled by two experts, we aimed for a preliminary dataset of 5,000 tweets.

A completely randomized sample of 5,000 tweets would result in too few antisemitic messages and would thus have failed to include many varieties of antisemitic content. We therefore opted for keywords that would ensure gathering tweets that are thematically closer to discussions around Jews. We focused on two keywords that contain the greatest number of tweets related to Jews as a religious, ethnic, or political community: “Jews” and “Israel.” We then added a few samples with more targeted keywords likely to generate a high percentage of antisemitic tweets: “kikes” and “ZioNazi*”. Table 1 shows an overview of our queries and samples.

Sample   Keyword    Timespan         # Tweets in dataset
1        Jews       Jan.-Dec. 2019   1,230,801 (as of 12/26/20)
2        Jews       Jan.-Dec. 2019   1,230,801 (as of 12/26/20)
3        Jews       Jan.-Apr. 2020   238,965 (as of 05/15/20)
4        Jews       Jan.-Apr. 2020   238,965 (as of 05/15/20)
5        Jews       May-Aug. 2020    329,804 (as of 09/01/20)
6        Jews       May-Aug. 2020    329,804 (as of 09/01/20)
7        ZioNazi*   Jan.-Dec. 2019   2,079 (as of 05/20/20)
8        ZioNazi*   Jan.-Apr. 2020   342 (as of 05/20/20)
9        Israel     Jan.-Apr. 2020   834,349 (as of 05/24/20)
10       Israel     May-Aug. 2020    1,053,375 (as of 09/01/20)
11       kikes      Jan.-Dec. 2019   1,332 (as of 05/20/20)

Table 1: Samples and queries of raw data of corpus

From these query results, totaling 3,691,047 tweets across all queries, we generated randomized samples, aiming for 500 live tweets per sample. We first selected 2,000 tweets from each query (if the total number was sufficiently large) by applying randomized reservoir sampling. We then selected the live tweets from those 2,000 tweets. We used Tweepy and the tweets’ IDs to check if a tweet was live. This was necessary because tweets can be deleted and suspended at any time after our initial query. From the remaining tweets we created subsamples of 500 tweets, resulting in 11 samples (see Table 1). For the keyword “Jews” we generated two samples per timeframe. Reservoir sampling was applied each time. Ten samples had between 496 and 500 live tweets at the time of sampling and after we uploaded them to our “Annotation Portal.” The sample “ZioNazi*” (January to April 2020) had only 232 live tweets at the time of sampling due to the overall small number of tweets with that keyword. For the “kikes” query, we added a filter that deletes (some) non-English tweets, using Google’s language-detection library. Non-text tweets, such as those only containing URLs, were exempt from language detection.
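To make the sampling and filtering steps concrete, the following is a minimal sketch under stated assumptions: a Tweepy v3-style client (the paper does not specify the version) and the langdetect package standing in for Google's language-detection library. All names and parameters are illustrative, not the project's actual code.

```python
# Illustrative sketch of the sampling pipeline described above. Assumptions:
# a Tweepy v3-style client and langdetect as a stand-in language detector.
import random

import tweepy                  # assumed v3-style API; client setup omitted
from langdetect import detect  # stand-in for "Google's language-detection library"

def reservoir_sample(tweet_ids, k=2000, seed=42):
    """Algorithm R: a uniform random sample of k IDs from a query stream."""
    rng = random.Random(seed)
    reservoir = []
    for i, tid in enumerate(tweet_ids):
        if i < k:
            reservoir.append(tid)
        else:
            j = rng.randint(0, i)      # keeps each item with probability k/(i+1)
            if j < k:
                reservoir[j] = tid
    return reservoir

def is_live(api, tweet_id):
    """A tweet counts as live if Twitter still returns it (not deleted,
    account not suspended)."""
    try:
        api.get_status(tweet_id)
        return True
    except tweepy.TweepError:          # deleted tweet or suspended account
        return False

def looks_english(text):
    """English filter for the 'kikes' query; URL-only tweets are exempt."""
    words = [w for w in text.split() if not w.startswith("http")]
    if not words:
        return True                    # non-text tweet: exempt, per the paper
    try:
        return detect(" ".join(words)) == "en"
    except Exception:                  # detector can fail on emoji-only text etc.
        return True

# Per query: sample 2,000 IDs, keep live ones, then cut a 500-tweet subsample.
# candidates = [t for t in reservoir_sample(query_ids) if is_live(api, t)]
# subsample = random.sample(candidates, min(500, len(candidates)))
```

Reservoir sampling keeps the selection uniform without loading a full query result into memory; the subsequent liveness filter is what restricts the final 500-tweet subsamples to live tweets only.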

Annotation

We used a web interface, our “Annotation Portal,” for labeling our samples. This infrastructure was developed specifically for this project to improve the quality of annotation. The Portal shows the tweet and a clickable annotation form (Illustration 1).

Illustration 1: Annotation Portal with tweet example. (The illustration is a screenshot of our updated form; the question about the content type was not used for the annotation of this dataset.)

As the goal is to have consistent labeling across the annotators assigned to a sample, the annotation form is designed to help make that decision as simple as possible and to focus on the application of the definition of antisemitism that we gave to the annotators.

The Annotation Portal pulls up live tweets by referencing their ID number. Our annotators first look at the tweet and, if they are unsure about the meaning, they are asked to inspect the entire thread, replies, likes, and comments. The tweet is visualized above the annotation form, and annotators can click through all questions and then hit the “next” button. If they are still unsure of the meaning of the tweet, they can click on the tweet and see it in the normal platform interface. They can also click on the user to find out more information about them or use Twitter’s Advanced Search to see what other messages the user has tweeted.

The first question in the annotation form is if the tweet is still live. Although we filtered for live tweets when we generated the samples, some tweets had been deleted after sample generation. This explains some of the discrepancies between the annotators, as they did not annotate the samples at the same time. In some cases, messages from suspended users reappeared.

The second question is if the tweet is in English. Twitter is most prominent among English speakers and our keywords were all in English. However, in some samples we had non-English tweets that our annotators did not label.

Annotators had five options for the antisemitism rating according to a detailed definition of antisemitism: “confident not antisemitic; probably not antisemitic; I don’t know; probably antisemitic; and confident antisemitic.” We used the Working Definition of Antisemitism developed by the International Holocaust Remembrance Alliance along with a detailed description to make it usable for annotation (Jikeli et al. 2019). Annotators read this definition and explanation carefully, and they were trained on some samples before they started the annotation process. All annotators had taken at least one university course on antisemitism or had similar training.

If the annotators labeled the tweet as “probably antisemitic” or “confident antisemitic” according to the definition, they had to choose one of the twelve paragraphs of the definition that informed this decision. If none of the paragraphs applied, they were instructed not to label the tweet as antisemitic. They could, however, check a box indicating that they disagreed with the definition on that tweet and put an explanation in a comment box. This further helped the annotators stick to a standard rather than their personal interpretations.
Asking the annotators to choose between a “very negative, negative, neutral, positive, or very positive” sentiment for the tweet with regard to Jews, Judaism, or Israel further helped the annotators apply the definition because they could express that the tweet had negative sentiment even if they were not able to find a section of the definition that applied. Many tweets were related to anti-Jewish sentiment but were in fact calling out antisemitism. Annotators could also label tweets that they understood to be sarcastic. Lastly, we wanted to know if a tweet was related to Jews, Judaism, or Israel in some way.
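Taken together, the form amounts to a small, fixed schema per tweet and annotator. The following dataclass is a hypothetical reconstruction of that record from the questions described above; the field names and types are ours, not the Portal's actual schema.

```python
# Hypothetical reconstruction of one Annotation Portal record; fields are
# inferred from the questions described in the text, not the real schema.
from dataclasses import dataclass
from typing import Optional

RATINGS = (
    "confident not antisemitic",
    "probably not antisemitic",
    "I don't know",
    "probably antisemitic",
    "confident antisemitic",
)

SENTIMENTS = ("very negative", "negative", "neutral", "positive", "very positive")

@dataclass
class TweetAnnotation:
    tweet_id: int
    annotator: str                        # e.g., initials such as "jg" or "dm"
    is_live: bool                         # question 1: is the tweet still live?
    is_english: bool                      # question 2: is the tweet in English?
    rating: str                           # one of RATINGS
    ihra_paragraph: Optional[int] = None  # 1-12; required for antisemitic ratings
    disagrees_with_definition: bool = False
    comment: str = ""
    sentiment: Optional[str] = None       # one of SENTIMENTS, re: Jews/Judaism/Israel
    sarcastic: bool = False
    related_to_jews: bool = False         # final "related to ... in some way" question

    def __post_init__(self):
        # An antisemitic rating must cite one of the definition's twelve paragraphs.
        if self.rating in RATINGS[3:] and self.ihra_paragraph is None:
            raise ValueError("antisemitic ratings require an IHRA paragraph")
```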

Annotation Results

Five expert annotators of multiple faiths and genders went over eleven samples of tweets. Each sample was annotated by two annotators separately. After the annotation, we identified the tweets for which they disagreed in their antisemitism rating, that is, if one of the annotators had annotated a tweet as antisemitic (probably or confident) and the other had not. The raters then discussed their disagreements. In many cases, human error or an oversight of some aspect of the tweet could be identified quickly and the annotators corrected their annotation. In other cases, a detailed discussion about which paragraph of the definition could be applied (or not), or a detailed discussion about the meaning of a certain tweet, would clarify the matter and an agreement was reached. This discussion about meaning and context turned out to be particularly helpful for annotators who were initially unfamiliar with some political contexts or celebrity events in the U.S., Britain, India, or elsewhere. The annotators became increasingly familiar with the contexts as they often revolved around similar topics. The annotators went through three rounds of discussion and eventually came, in almost all cases, to an agreement. The tweets for which no agreement was found were removed from the final dataset.

Sample   Tweets annotated       Not annotated (deleted/        Annotated   Percentage
         (before discussion)    suspended/foreign language)    by          antisemitic
1        439                    61                             jg          6.2 %
1        455                    45                             dm          6.2 %
2        414                    86                             jg          7.5 %
2        428                    72                             dm          5.4 %
3        468                    32                             jg          12.2 %
3        466                    34                             js          9.9 %
4        429                    70                             jg          12.1 %
4        430                    70                             js          8.4 %
5        390                    106                            jg          12.1 %
5        405                    91                             sm          9.4 %
6        386                    112                            js          14.5 %
6        396                    102                            sm          10.9 %
7        348                    152                            js          88.2 %
7        391                    109                            dm          82.4 %
8        149                    83                             dm          83.9 %
8        140                    92                             sm          85.0 %
9        342                    158                            js          5.0 %
9        480                    19                             dm          0.4 %
10       431                    69                             js          13.2 %
10       412                    88                             ks          14.6 %
11       295                    205                            dm          35.3 %
11       283                    217                            sm          7.9 %

Table 2: Annotation results before comparison

Table 2 shows the annotation results of the antisemitism rating for each sample and each annotator before the annotators discussed their discrepancies. The annotators did not annotate the samples at exactly the same time. This explains the discrepancy in the number of tweets that were not annotated because they were deleted or suspended at a given time. In this table and in our labeled dataset we use binary categories and treat ratings of “confident not antisemitic,” “probably not antisemitic,” and “I don’t know” as not antisemitic, and “probably antisemitic” and “confident antisemitic” as antisemitic. For most samples, the annotators found a similar percentage of tweets to be antisemitic. However, the number of tweets that were rated as antisemitic by only one annotator and not the other was high for some samples, particularly sample 11, where 74 tweets were rated differently (see Table 3).

Sample   Keyword    Timespan         # of tweets in disagreement
1        Jews       Jan.-Dec. 2019    9
2        Jews       Jan.-Dec. 2019   20
3        Jews       Jan.-Apr. 2020   19
4        Jews       Jan.-Apr. 2020   36
5        Jews       May-Aug. 2020    43
6        Jews       May-Aug. 2020    34
7        ZioNazi*   Jan.-Dec. 2019   13
8        ZioNazi*   Jan.-Apr. 2020    4
9        Israel     Jan.-Apr. 2020   19
10       Israel     May-Aug. 2020    44
11       kikes      Jan.-Dec. 2019   74

Table 3: Number of tweets that annotators rated differently (antisemitic/not antisemitic)

The high level of disagreement in sample 11 before discussion is related to the fact that this sample of tweets from the “kikes” query includes many tweets by bots that are difficult to interpret.
Many of the tweets with that keyword were about a famous soccer player, Enrique García Martínez, and a famous baseball player, Enrique Javier Hernández, both nicknamed “Kiké.” However, most of those tweets were in Spanish and are not included in our Gold Standard. In some cases, this nickname might have been confused with the antisemitic slur. Some nonsensical tweets from bots and the nickname “Kiké” explain why the percentage of antisemitic tweets was relatively low for a sample queried with an antisemitic slur.

Inter-Rater Reliability Prior to Discussion

Inter-rater reliability shows how much homogeneity or consensus exists in the ratings, and it can be seen as an indication of how diligently our annotators applied the given definition. Two annotators evaluated each sample independently before discussing their disagreements in multiple rounds. We estimate inter-rater reliability prior to discussion with Cohen’s kappa (Cohen 1960) and Gwet’s AC1 (Gwet 2008). The level of measurement is nominal (1 = not antisemitic and 2 = antisemitic). Cohen’s kappa (k) measures the extent of agreement between two annotators on categorical variables. Commonly used criteria to interpret kappa coefficients are as follows: <0.00 as poor agreement, 0.00-0.20 as slight, 0.21-0.40 as fair, 0.41-0.60 as moderate, 0.61-0.80 as substantial, and >0.80 as almost perfect (Landis and Koch 1977). Kappa coefficients, however, have limitations because they are dependent on prevalence (the probability of a rater’s classification of a tweet into a category) and bias (the frequency of raters’ choice of a specific category). To overcome these issues, Gwet’s AC1 statistic is used, too. Both the k statistic and AC1 are derived from percent agreement corrected for chance agreement, employing different strategies, but the AC1 statistic is affected less by prevalence and bias. Interpretation of AC1 coefficients is similar to the k statistic. Percent agreement between annotators is used for the agreement statistic.

         Kappa statistics (k)                        Gwet’s AC1
Sample   % Agreement   k coefficient   95% CI             AC1 coefficient   95% CI
1        97.9          0.83            (0.71, 0.94)       0.98              (0.96, 0.99)
2        95.1          0.60            (0.45, 0.76)       0.94              (0.92, 0.97)
3        95.9          0.79            (0.70, 0.88)       0.95              (0.93, 0.97)
4        91.5          0.54            (0.40, 0.67)       0.90              (0.86, 0.93)
5        88.9          0.41            (0.26, 0.55)       0.86              (0.82, 0.91)
6        91.1          0.60            (0.47, 0.72)       0.89              (0.85, 0.93)
7        96.2          0.84            (0.75, 0.92)       0.95              (0.92, 0.98)
8        97.0          0.89            (0.79, 1)          0.95              (0.92, 1)
9        94.4          -0.01           (-0.02, 0.003)     0.94              (0.91, 0.97)
10       89.2          0.56            (0.44, 0.67)       0.85              (0.81, 0.90)
11       73.0          0.30            (0.20, 0.41)       0.58              (0.48, 0.68)

Table 4: Inter-annotator reliability before discussion

Table 4 presents percent agreement and inter-rater reliability for the pre-discussion ratings. Since two annotators are involved in annotating tweets, the k statistic and Gwet’s AC1 statistic with 95 percent confidence intervals (CI) are computed to examine inter-rater reliability for each sample. The overall percent agreement is over 80 percent for pre-discussion annotations for all samples except sample 11. The overall inter-rater reliability is substantial to almost perfect (k > 0.60/AC1 > 0.70) for most samples. Note that when the k coefficient is low due to a skewed distribution of the prevalence of the antisemitism rating, but Gwet’s AC1 and percent agreement are high, the inter-rater rating is considered reliable.
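To make the two statistics concrete, here is a minimal sketch for two raters and binary labels, implementing the published formulas (Cohen 1960; Gwet 2008) directly. It is our own illustration, not necessarily the software used to produce Table 4.

```python
# Minimal sketch of the agreement statistics for two raters and binary labels.
ANTISEMITIC = {"probably antisemitic", "confident antisemitic"}

def to_binary(rating):
    """Collapse the five-point rating to the binary label used in Tables 2-4."""
    return rating in ANTISEMITIC

def percent_agreement(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """kappa = (po - pe) / (1 - pe); pe is chance agreement from the margins."""
    n = len(a)
    po = percent_agreement(a, b)
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    return (po - pe) / (1 - pe) if pe < 1 else 1.0  # degenerate: total agreement

def gwets_ac1(a, b):
    """AC1 = (po - pe) / (1 - pe) with pe = 2*pi*(1 - pi), where pi is the
    mean prevalence of the positive class across both raters (binary case)."""
    n = len(a)
    po = percent_agreement(a, b)
    pi = (a.count(True) / n + b.count(True) / n) / 2
    pe = 2 * pi * (1 - pi)
    return (po - pe) / (1 - pe)

# Example on two annotators' collapsed ratings (hypothetical input lists):
# a = [to_binary(r) for r in ratings_annotator_1]
# b = [to_binary(r) for r in ratings_annotator_2]
# print(percent_agreement(a, b), cohens_kappa(a, b), gwets_ac1(a, b))
```

The sketch makes the Table 4 pattern easy to see: when nearly all tweets fall into one class, as in sample 9, kappa's chance term pe approaches the observed agreement po and kappa collapses toward zero, while AC1's chance term 2*pi*(1-pi) stays small, so AC1 tracks the raw agreement.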
Samples 1 and 2 (Jews2019-rep1 and Jews2019-rep2) have k coefficients of 0.83 and 0.60 and AC1 coefficients of 0.98 and 0.94, respectively, indicating substantial to almost perfect agreement. Sample 3 (Jews2020Jan-Apr-rep1) has a k coefficient of 0.79 and an AC1 of 0.95, demonstrating almost perfect agreement. For sample 4 (Jews2020Jan-Apr-rep2), the k coefficient of 0.54 indicates moderate agreement, but an AC1 of 0.90 and a percent agreement of 91.5 indicate almost perfect agreement. This suggests that the distribution of the prevalence of the antisemitism rating is skewed and the k coefficient is therefore low. Samples 5 and 6 (Jews2020May-Aug-rep1 and Jews2020May-Aug-rep2) present moderate to almost perfect agreement (k 0.40-0.60/AC1 > 0.80). Samples 7 and 8, with the keyword ZioNazi*, show almost perfect agreement (k > 0.80/AC1 > 0.90). Samples 9 and 10 (Israel2020Jan-Apr.rep1 and Israel2020May-Aug.rep3) have low k coefficients of -0.01 and 0.56, respectively, indicating a skewed distribution of the prevalence of the antisemitism rating, while AC1 coefficients of 0.94 and 0.85 suggest almost perfect agreement. For sample 11 (Kikes2019.rep1), a percent agreement of 73 and an AC1 of 0.58 indicate only moderate agreement, and the k coefficient of 0.3 is low due to the skewed distribution of the prevalence of the antisemitism rating. The main factor of disagreement was the different labelling of posts of seemingly random words put together by bots. However, the annotators found agreement when discussing their different labeling. Tweets by (presumed) bots that include the slur “kikes” were eventually labeled consistently as antisemitic if the reader would infer that the slur stands for a group of people, that is, Jews, and is thus used as a slur expressing hatred. If the text was nonsensical to a degree that it was not even clear that the slur denotes a group of people, then it was not labeled antisemitic.

The overall inter-rater agreement is high in almost all samples. In samples with a very low or very high percentage of antisemitic tweets (skewed distribution) this might be misleading. However, closer analysis of the k statistic and Gwet’s AC1 suggests that the inter-rater rating can be considered reliable. Our goal, however, was to create a univocal labeled dataset with no disagreement on the antisemitism rating. We achieved this by having the annotators discuss their disagreements in depth, including going back to the tweets and finding out more about the context. This led to an agreement of 100 percent in almost all samples. Annotators could not agree on the rating of two tweets because there was not sufficient context. These were not included in the labeled dataset (Gold Standard).

Corpus and Raw Data Description

Our data comes from three distinct time periods, the entire year 2019, January to April 2020, and May to August 2020, drawing on a large dataset composed of a ten percent sample of all tweets. The corpus of tweets was drawn from eight queries with four different keywords, “Jews, Israel, kikes, and ZioNazi*”, resulting in 3,691,047 tweets. We generated representative samples, two for each of the three queries with the keyword “Jews” and one for each of the other queries. This resulted in 11 samples with 5,224 tweets altogether. After the elimination of tweets that could not be annotated due to deletion, suspension, or because they were in a foreign language, 4,137 tweets remained in the dataset. 933 (22.55 percent) of these tweets were rated by the two annotators as antisemitic according to the IHRA Definition of antisemitism. Table 5 shows the number of tweets of each sample of the Gold Standard that were annotated by two expert annotators and the percentage of antisemitic tweets.

In some cases, the number of tweets is significantly lower than the original sample size of 500 tweets because tweets were in a foreign language or suspended or deleted when at least one of the annotators evaluated them. In some samples, such as sample 9 with the keyword “Israel,” suspended tweets reappeared during the discussion process and could then be annotated; antisemitic tweets went live again. This explains the higher percentage of antisemitic tweets after comparison. The annotators discussed all tweets that they rated differently. Only two tweets across all samples were removed from the dataset because the annotators could not agree whether the tweets were antisemitic, lacking sufficient context.

Sample   Keyword    Timespan               # Tweets in Gold      Percentage
                                           Standard corpus       antisemitic
1        Jews       Jan.-Dec. 2019         439                   6.2 %
2        Jews       Jan.-Dec. 2019         414                   7.5 %
3        Jews       Jan.-Apr. 2020         469                   11.9 %
4        Jews       Jan.-Apr. 2020         429                   11.4 %
5        Jews       May-Aug. 2020          394                   14.0 %
6        Jews       May-Aug. 2020          388                   16.2 %
7        ZioNazi*   Jan.-Dec. 2019         374                   88.8 %
8        ZioNazi*   Jan.-Apr. 2020         158                   85.4 %
9        Israel     Jan.-Apr. 2020         344                   10.2 %
10       Israel     May-Aug. 2020          431                   13.0 %
11       kikes      Jan.-Dec. 2019         297                   31.6 %
SUM                 Jan. 2019-Aug. 2020    4,137                 22.55 %

Table 5: Samples of preliminary Gold Standard corpus

As expected, our samples with the generic keywords “Jews” and “Israel” had relatively low percentages of antisemitic tweets. The “Jews” samples had between 6.2 and 16.2 percent antisemitic tweets, with an average of 11.1 percent (281 tweets) across all samples; the share was lower in 2019 and higher in 2020. The samples with the slur “ZioNazi*” also met expectations, with a high percentage of antisemitic tweets, between 85.4 and 88.8 percent. By contrast, the sample with the slur “kikes” had fewer antisemitic tweets than expected, at 31.7 percent. This is in large part because many tweets referred to two celebrities whose nickname is “Kiké,” as discussed above.

The majority of tweets in the dataset, 2,533 tweets (61.2 percent), were generated using the “Jews” query. We therefore focus on a description of these samples. Graphs 1-3 show the time series for the three periods of our queries. The three graphs show significant peaks that relate to specific events, such as a statement by President Trump about the alleged disloyalty of American Jews on August 20, 2019; the shooting in Jersey City on December 10, 2019; the stabbing at a Hanukkah celebration in Monsey, New York, on December 28, 2019; Holocaust Memorial Day on January 27, 2020; a statement about Jews by NYC Mayor de Blasio on Twitter on April 28, 2020; a viral video of Orthodox Jewish protesters cutting locks at a Brooklyn playground on June 15, 2020; and a statement by Nick Cannon about Jews (and his apology) on July 15, 2020. These seven narratives are thus prominent in our original query and also in the samples that were drawn from this raw data.

Graph 1: Timeline of tweets with the word Jews in 2019
Graph 2: Timeline of tweets with the word Jews from January to April 2020
Graph 3: Timeline of tweets with the word Jews from May to August 2020

The top 10 influencers, defined as the users most often mentioned in tweets with the word “Jews,” also provide some information about our dataset and prominent themes. In 2019, the top 10 influencers were: realDonaldTrump, IlhanMN, AuschwitzMuseum, _SJPeace_, AOC, Imamofpeace, RashidaTlaib, charliekirk11, jeremycorbyn, and LizaRosen101. “realDonaldTrump” had 32,173 mentions in the dataset, “IlhanMN” 21,967, and “AuschwitzMuseum” 15,999. The prominent role of “realDonaldTrump” in tweets about Jews in 2019 might be related to the highest peak that year, which coincided with a statement by Trump about the alleged dual loyalty of American Jews.

Between January and April 2020, the top 10 influencers were the users AuschwitzMuseum, LizaRosen101, Imamofpeace, IlhanMN, realDonaldTrump, NYCMayor, TheRaDR, DavidAstinWalsh, DineshDSouza, and _DavidAsher. “AuschwitzMuseum” was mentioned most often. This might be related to the second highest peak in that period, related to Holocaust Memorial Day. The highest peak in that period is related to a statement by NYC Mayor de Blasio. “NYCMayor” was number six in the ranking of most mentioned users.
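The "most often mentioned" metric behind these influencer rankings can be computed with a simple mention count over the raw query results. A minimal sketch follows; the regex and function names are ours, not the project's code, and whether retweet headers ("RT @user: ...") were counted is an assumption.

```python
# Minimal sketch (our own naming): rank "influencers," i.e., the users most
# often mentioned in a set of tweet texts.
import re
from collections import Counter

MENTION = re.compile(r"@(\w{1,15})")  # Twitter handles are 1-15 word characters

def top_influencers(tweet_texts, k=10):
    counts = Counter(h.lower() for t in tweet_texts for h in MENTION.findall(t))
    return counts.most_common(k)      # note: retweets ("RT @user: ...") count too
```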
Additionally, the dataset includes 775 tweets with the keyword “Israel” from the first eight months of 2020, 91 of Illustration 2: Popular retweet which were rated as antisemitic. This significantly increases the variety of topics that the corpus covers, however the sub- Between May and August 2020, the top 10 influencers set of 91 antisemitic tweets with the keyword “Israel” might were the users AuschwitzMuseum, realDonaldTrump, miss important topics of Israel-related forms of NYCMayor, SecPompeo, TheRaDR, jeremynewberger, antisemitism. This is supplemented by two samples with the DineshDSouza, CalebJHull, DonaldJTrumpJr, keyword “ZioNazi*” of 532 tweets altogether, with 467 marklevinshow. The peaks in this time frame were less being antisemitic. However, the variety of topics of this pronounced. However, the highest peak coincided with a subset is very limited, and this word is not used very often video of Orthodox Jewish protesters cutting locks at a by Twitter users as the query results with that keyword Brooklyn playground. Related tweets often mentioned show. Future use of the labeled dataset should keep in mind “NYCMayor,” the third most often mentioned user during that half of the antisemitic tweets of this dataset (50.1 that time period, as can be seen in illustration 2 showing the percent) come from samples with the keyword “ZioNazi*”, third most frequent retweet (1,606 times in our dataset from which is only marginally used, and which covers only a May to August 2020 with the keyword “Jews”). The second narrow range of topics. highest peak was related to statements by Nick Cannon. The The subset of tweets with the slur “kikes” might be user “NickCannon” was mentioned 1,122 times, the 15th particularly useful because it can help to distinguish most often mentioned user in that period. Most tweets in the antisemitic from non-antisemitic applicates of this string. samples with the keyword “Jews” were not antisemitic. The assumption that most messages that contain this string The original queries for the keyword “ZionNazi*” are antisemitic is wrong and depends on the context. In our returned only 2,421 tweets. This is used only by a fringe sample, “only” 31.7 percent of such tweets were antisemitic. group of Twitter users who are very hostile to Israel and Although the labeled dataset covers a large variety of . We consider this slur that conflates Zionism with topics, this preliminary gold standard needs to be updated as a strong indication for antisemitism according to going forward. Initial efforts will consist of adding samples the IHRA Working Definition. However, the annotators from recent timeframes to account for evolving political or rated 11.2 and 13.6 percent of samples 7 and 8 as not cultural situations and related discourse. Subsequent steps antisemitic, either because they were calling out the slur or include adding content in other languages. We are currently because the message was unclear. working on a dataset of Tweets in German. We can also add The original queries for the “Israel” samples included a key terms to expand topical coverage and help alleviate large number of tweets, totaling 1,887,724. Most tweets some of the class imbalance in our dataset. Lastly, class were on news about Israel but a significant percentage were imbalance and emerging terms can be addressed by looking antisemitic, 10.1 and 13.0 of samples 9 and 10 respectively. at content from known antisemites. 
As the dataset grows, our coverage of antisemitic users will help catch new slurs and coded language. We hope to be begin adding to the Discussion labeled dataset over the next month and publish these additions on GitHub (access upon request). We also provide Our labeled dataset (a preliminary Gold Standard) of 4,137 access to our Annotation Portal to the academic community tweets is built on representative samples of tweets including at https://annotationportal.com and invite scholars to the common keywords “Jews” and “Israel” and keywords annotate their own samples on that Portal. more likely to be used in antisemitic contexts, “kikes” and Our annotation process seems to be robust. The inter-rater Mathematical and Statistical Psychology 61 (1): 29–48. reliability was very good before the annotators discussed https://doi.org/10.1348/000711006X126600. their differences in rating. It was almost 100 percent after Jaki, Sylvia, and De Smedt, Tom. 2019. “Right-wing German Hate in-depth discussion and revisiting of the tweets in question. Speech on Twitter: Analysis and Automatic Detection.” We consider training qualified annotators and the discussion ArXiv:1910.07518 [Cs.CL] process to be essential to producing an accurate and univocal Jikeli, Gunther, Damir Cavar, and Daniel Miehling. 2019. gold standard. “Annotating Antisemitic Online Content. Towards an Applicable The gold standard now awaits testing and to facilitate Definition of Antisemitism.” ArXiv:1910.01214 [Cs.CY], arXiv this we will make the labeled dataset available upon preprint. https://doi.org/10.5967/3r3m-na89. request. Kursuncu, Ugur, Manas Gaur, Carlos Castillo, Amanuel Alambo, K. Thirunarayan, Valerie Shalin, Dilshod Achilov, I. Budak Arpinar, and Amit Sheth. 2020. “Modeling Islamist Extremist Acknowledgements Communications on Social Media Using Contextual Dimensions: This work used the Extreme Science and Engineering Religion, Ideology, and Hate.” ArXiv:1908.06520 [Cs], October. Discovery Environment (XSEDE), which is supported by http://arxiv.org/abs/1908.06520. National Science Foundation grant number ACI-1548562. Malmasi, Shervin, and Zampieri Marcos. 2017. “Detecting Hate We are grateful that we were able to use Indiana Speech in Social Media.” ArXiv:1712.06427 [cs.CL] University‘s Observatory on Social Media (OSoMe) tool https://arxiv.org/abs/1712.06427. and data (Davis et al. 2016). Landis, J. Richard, and Gary G. Koch. 1977. “The Measurement of Observer Agreement for Categorical Data.” Biometrics 33 (1): 159. https://doi.org/10.2307/2529310. References Rickenbacher, Daniel. 2021. “The Centrality of Antisemitism in the Islamic State’s Ideology and Its Connection to Anti-Shiism.” Bruns, Axel. 2020. “Big Social Data Approaches in Internet In The Return of ?, edited by Gunther Jikeli, Studies: The Case of Twitter.” In Second International Handbook 93–102. Basel: MDPI. https://doi.org/10.3390/books978-3-03943- of Internet Research, edited by Jeremy Hunsinger, Matthew M. 498-5. Allen, and Lisbeth Klastrup, 65–81. Dordrecht: Springer Ward, Eric. 2017. “Skin in the Game. How Antisemitism Animates Netherlands. https://doi.org/10.1007/978-94-024-1555-1_3. White Nationalism.” Political Research Associates. June 29, 2017. Cohen, Jacob. 1960. “A Coefficient of Agreement for Nominal http://www.politicalresearch.org/2017/06/29/skin-in-the-game- Scales.” Educational and Psychological Measurement 20 (1): 37– how-antisemitism-animates-white-nationalism. 46. https://doi.org/10.1177/001316446002000104. 
Pustejovsky, James, and Amber Stubbs. 2012. Natural Language Davidson, Thomas, Dana Warmskey, Michael Macy, and Ingmar Annotation for Machine Learning: A Guide to Corpus-Building for Weber. 2017. “Automated Hate Speech Detection and the Applications. O’Reilly Media, Inc. Problem of Offensive Language.” In Proceedings of the Eleventh Towns, John, Timothy Cockerill, Maytal Dahan, Ian Foster, Kelly International Conference on Web and Social Media 2017, Gaither, Andrew Grimshaw, Victor Hazlewood, et al. 2014. Montreal, Canada. “XSEDE: Accelerating Scientific Discovery.” Computing in http://sdl.soc.cornell.edu/img/publication_pdf/hatespeechdetectio Science & Engineering 16 (5): 62–74. n.pdf. https://doi.org/10.1109/MCSE.2014.80. Davis, Clayton A., Giovanni Luca Ciampaglia, Luca Maria Waseem, Zeerak. 2016. “Are You a Racist or Am I Seeing Things? Aiello, Keychul Chung, Michael D. Conover, Emilio Ferrara, Annotator Influence on Hate Speech Detection on Twitter.” In Alessandro Flammini, et al. 2016. “OSoMe: The IUNI Proceedings of the First Workshop on NLP and Computational Observatory on Social Media.” PeerJ Computer Science 2 Social Science, 138–142. Austin, Texas: Association for (October): e87. https://doi.org/10.7717/peerj-cs.87. Computational Linguistics. https://doi.org/10.18653/v1/W16- Gambäck, Björn, and Utpal Kumar Sikdar. 2017. “Using 5618. Convolutional Neural Networks to Classify Hate-Speech.” In Waseem, Zeerak, and Dirk Hovy. 2016. “Hateful Symbols or Proceedings of the First Workshop on Abusive Language Online, Hateful People? Predictive Features for Hate Speech Detection on 85–90. Vancouver, BC, Canada: Association for Computational Twitter.” In Proceedings of the NAACL Student Research Linguistics. https://doi.org/10.18653/v1/W17-3013. Workshop, 88–93. San Diego, California: Association for Gomez, Raul, Gibert Jaume, Gomez Lluis, Karatzas Dimosthenis. Computational Linguistics. https://doi.org/10.18653/v1/N16- 2020. “Exploring Hate Speech Detection in Multimodal 2013. Publications.” In Proceedings of the IEEE/CVF Winter Conference Zannettou, S., Finkelstein, J., Bradlyn, B., & Blackburn, J. (2020). on Applications of Computer Vision (WACV), pp. 1470-1478 A Quantitative Approach to Understanding Online Antisemitism. Gwet, Kilem Li. 2008. “Computing Inter-Rater Reliability and Its Proceedings of the International AAAI Conference on Web and Variance in the Presence of High Agreement.” British Journal of Social Media, 14(1), 786-797. Retrieved from https://ojs.aaai.org/index.php/ICWSM/article/view/7343.