
A Just and Comprehensive Strategy for Using NLP to Address Online Abuse

David Jurgens, School of Information, University of Michigan, [email protected]
Eshwar Chandrasekharan, School of Interactive Computing, Georgia Tech, [email protected]
Libby Hemphill, School of Information, University of Michigan, [email protected]

Abstract

Online abusive behavior affects millions, and the NLP community has attempted to mitigate this problem by developing technologies to detect abuse. However, current methods have largely focused on a narrow definition of abuse, to the detriment of victims who seek both validation and solutions. In this position paper, we argue that the community needs to make three substantive changes: (1) expanding our scope of problems to tackle both more subtle and more serious forms of abuse, (2) developing proactive technologies that counter or inhibit abuse before it harms, and (3) reframing our efforts within a framework of justice to promote healthy communities.

[Figure 1 shows example behaviors (microaggressions, condescension, hate speech, promoting self harm, physical threats, doxxing) arranged by their frequency and their risk of physical danger.]
Figure 1: Abusive behavior online falls along a spectrum, and current approaches focus only on a narrow range (shown in red text), ignoring nearby problems. Impact comes from both the frequency (on left) and real-world consequences (on right) of behaviors. This figure illustrates the spectrum of online abuse in a hypothetical manner, with its non-exhaustive examples inspired by prior surveys of online experiences (Duggan, 2017; Salminen et al., 2018).

1 Introduction

Online platforms have the potential to enable substantial, prolonged, and productive engagement for many people. Yet the lived reality on social media platforms falls far short of this potential (Papacharissi, 2004). In particular, the promise of social media has been hindered by antisocial, abusive behaviors such as harassment, hate speech, trolling, and the like. Recent surveys indicate that abuse happens much more frequently than many people suspect (40% of Internet users report being the subject of online abuse at some point), and members of underrepresented groups are targeted even more often (Herring et al., 2002; Drake, 2014; Anti-Defamation League, 2019).

The NLP community has responded by developing technologies to identify certain types of abuse and by facilitating automatic or computer-assisted content moderation. Current technology has primarily focused on overt forms of abusive language and hate speech, without considering (i) the success and failure of technology beyond getting the classification correct, and (ii) the myriad forms that abuse can take.

As Figure 1 shows, a large spectrum of abusive behavior exists—some with life-threatening consequences—much of which is currently unaddressed by language technologies. Explicitly hateful speech is just one tool of hate; related tactics such as rape threats, doxxing, First Amendment panic, and veiled insults are effectively employed both off- and online to silence, scare, and exclude participants from what should be inclusive, productive discussions (Filipovic, 2007).

In this position paper, we argue that three changes are needed to promote healthy online communities. First, the NLP community needs to rethink and expand what constitutes abuse. Second, current methods are almost entirely reactive to abuse, entailing that harm must occur before action is taken; instead, the community needs to develop proactive technologies that assist authors, moderators, and platform owners in preventing abuse before it occurs. Finally, we argue that both of these threads point to a need for a broad re-aligning of our community goals towards justice, rather than simply the elimination of abusive behavior. In arguing for these changes, we outline how each effort offers new and challenging NLP tasks with concrete benefits.
2 Rethinking What Constitutes Abuse

The classifications we adopt and computationally enforce have real and lasting consequences by defining both what is and what is not abuse (Bowker and Star, 2000). Abusive behavior is an omnibus term that often includes harassment, threats, racial slurs, sexism, unexpected pornographic content, and insults—all of which can be directed at other users or at whole communities (Davidson et al., 2017; Nobata et al., 2016). However, NLP has largely considered a far narrower scope of what constitutes abuse through its selection of which types of behavior to recognize (Waseem et al., 2017; Schmidt and Wiegand, 2017; Fortuna and Nunes, 2018). We argue that NLP needs to expand its computational efforts to recognize two additional general types of abuse: (a) infrequent but physically dangerous abuse, and (b) more common but subtle abuse. Additionally, we need to develop methods that respect community norms in classification decisions. These categories of abuse and the importance of community norms have been noted elsewhere (Liu et al., 2018; Guberman and Hemphill, 2017; Salminen et al., 2018; Blackwell et al., 2017) but have not yet received the same level of attention in NLP.

Who has a right to speak, and in what manner, are subjective decisions guided by social relationships (Foucault, 1972; Noble, 2018), and the specific choices our algorithms make about what speech to allow and what to silence have powerful effects. For instance, rejecting behavior as not abusive because it falls outside the scope of our classification can cause substantial harm to victims (Blackwell et al., 2017), tacitly involving the NLP community in algorithmic bias that sanctions certain forms of abuse. Categorization is therefore particularly thorny: an overly broad categorization is likely to be computationally impractical, yet a narrow categorization risks further marginalizing affected community members and can lead to lasting harm. In the following, we outline three key directions for the community to expand its definitions.

2.1 Physically Threatening Online Abuse

We outline three computational challenges related to infrequent but overt, physically-manifesting abuse that NLP could be applied to solve. First, such behaviors do not necessarily adopt the language of hate speech or other common forms of abuse, and may appear innocuous in some contexts while being clearly dangerous in others. For example, posting a phone number to call could be acceptable if one is encouraging others to call their political representative, yet would be a serious breach of privacy (doxxing) if posted as part of a public harassment campaign. Similarly, declarations of "keep up the weight loss!" may be positive in a dieting community, yet reinforce dangerous behavior in a pro-anorexia community. Speech that in isolation appears offensive, such as impoliteness or racial slurs, may serve pro-social functions such as promoting intimacy (Culpeper, 1996) or showing camaraderie (Allan, 2015).

Second, behaviors such as swatting, human trafficking, and pedophilia have all occurred on public social media platforms (Jaffe, 2016; Latonero, 2011; Holt et al., 2010). However, methods have yet to be developed for recognizing when users are engaging in these behaviors, which may involve coded language, and which therefore require recognizing these alternative forms. Current approaches for learning new explicitly-hateful symbols could be adapted to this task (e.g., Roy, 2016; Gao et al., 2017); a simple version of this idea is sketched at the end of this subsection.

Third, online platforms have been used to incite mobs of people to violence (Siegel, 2015). These efforts often use incendiary fake news that plays upon factional rivalries (Samory and Mitra, 2018). Abusive language detection methods can build upon recent advances in detecting fake news to identify content-sharing likely to lead to violence (McLaughlin, 2018; Oshikawa et al., 2018).
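As a concrete illustration of the second challenge, the sketch below shows one simple way a small seed lexicon of coded or community-specific abusive terms might be expanded into candidates for human review. It is a minimal, hedged example of lexicon expansion by embedding similarity, assuming an in-domain embedding dictionary; the function name, parameters, and thresholds are ours for illustration and do not reproduce the weakly supervised bootstrapping method of Gao et al. (2017).

import numpy as np

def expand_lexicon(seed_terms, embeddings, top_k=20, threshold=0.6):
    """Suggest candidate coded terms similar to known seeds, for human review.

    embeddings: dict mapping words to unit-normalized numpy vectors trained on
    in-domain text (an assumption of this sketch, not a released resource).
    """
    vocab = [w for w in embeddings if w not in seed_terms]
    matrix = np.stack([embeddings[w] for w in vocab])                # (V, d)
    seeds = np.stack([embeddings[w] for w in seed_terms if w in embeddings])
    # Cosine similarity of each vocabulary word to its closest seed term.
    sims = (matrix @ seeds.T).max(axis=1)
    ranked = sorted(zip(vocab, sims), key=lambda pair: -pair[1])
    return [(w, float(s)) for w, s in ranked[:top_k] if s >= threshold]

# The output is only a suggestion list; whether a term is actually being used
# as a coded abusive symbol in a given community remains a human judgment.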
2.2 Subtle Abuse

Many forms of abusive behavior are linguistically subtle and implicit. Behaviors such as condescension, minimization (e.g., "your situation isn't that bad"), benevolent stereotyping, and microaggressions are frequently experienced by members of minority social groups (Sue et al., 2007; Glick and Fiske, 2001). While subtle, such abuse can be as emotionally harmful as overt abuse to some individuals (Sue, 2010; Nadal et al., 2014). The NLP community has two clear paths for growth into this area.

First, although such behaviors are recognized within the larger NLP abuse typology (Waseem et al., 2017), only a handful of approaches have attempted these problems, such as identifying benevolent sexism (Jha and Mamidi, 2017), and new methods must be developed to identify the implicit signals. Successful approaches will likely require advances in natural language understanding, as the abuse requires reasoning about the implications of the propositions. A notable example of such an approach is Dinakar et al. (2012), who extract implicit assumptions in statements and use common sense reasoning to identify social norm violations that would be considered insults.

Second, new methods should identify disparity in the treatment of social groups. For example, in a study of the respectfulness of police language, Voigt et al. (2017) found that officers were consistently less likely to use respectful language with black community members than with white community members—a disparity in a positive social dimension. As NLP solutions have been developed for other social dimensions of language such as politeness (Danescu-Niculescu-Mizil et al., 2013; Munkova et al., 2013; Chhaya et al., 2018) and formality (Brooke et al., 2010; Sheikha and Inkpen, 2011; Pavlick and Tetreault, 2016), these methods could be readily adapted to identify such systematic bias for additional social categories and settings, as sketched below.
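This adaptation can be made concrete with a small sketch: given an existing politeness or respectfulness scorer, such as one trained on the data of Danescu-Niculescu-Mizil et al. (2013), measure whether a positive dimension of language is distributed unevenly across two social groups. The politeness_score function and the permutation test below are illustrative assumptions, not components of any cited study.

from statistics import mean
import random

def group_disparity(utterances, politeness_score, n_perm=10000, seed=0):
    """utterances: list of (text, group) pairs covering exactly two groups;
    politeness_score: assumed model returning a score for a single text."""
    scored = [(politeness_score(text), group) for text, group in utterances]
    groups = sorted({g for _, g in scored})
    by_group = {g: [s for s, gg in scored if gg == g] for g in groups}
    observed = mean(by_group[groups[0]]) - mean(by_group[groups[1]])

    # Permutation test: how often does a random relabeling of group membership
    # produce a gap at least as large as the one observed?
    rng = random.Random(seed)
    values = [s for s, _ in scored]
    n_first = len(by_group[groups[0]])
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        gap = mean(values[:n_first]) - mean(values[n_first:])
        if abs(gap) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm   # estimated disparity and its p-value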
2.3 Community Norms Need to be Respected

Social norms are rules and standards that are understood by members of a group and that guide and constrain social behavior without the force of laws (Triandis, 1994; Cialdini and Trost, 1998). Norms can be nested, in that they can be adopted from the general social context (e.g., use of pejorative adjectives is rude) and from more general internet comment etiquette (e.g., using all caps is equivalent to shouting). Yet norms for what is considered acceptable can vary significantly from one community to another, making it challenging to build one abuse detection system that works for all communities (Chandrasekharan et al., 2018). Current NLP methods are largely context- and norm-agnostic, which leads to situations where content is removed unnecessarily when deemed inappropriate (i.e., false positives), eroding community trust in the use of computational tools to assist in moderation. A common failure mode for sociotechnical interventions like automated moderation is failing to understand the online community where they are being deployed (Krishna, 2018). Such community-specific norms and context are important to take into account, as NLP researchers increasingly commit to context-sensitive approaches to define (e.g., Chandrasekharan and Gilbert, 2019) and detect abuse (e.g., Gao and Huang, 2017).

However, not all community norms are socially acceptable within the broader world. Behavior considered harmful in one community might even be celebrated in another, e.g., Reddit's r/fatpeoplehate (Chandrasekharan et al., 2017) and the Something Awful Forums (Pater et al., 2014). The existence of problematic normative behaviors within certain atypical online communities poses a challenge to abuse detection systems. Fraser (1990) notes that when a public space is governed by a dominant group, its norms about participation end up perpetuating inequalities. One approach to address this challenge would be to work closely with the different stakeholders involved in online governance, such as platform administrators, policy makers, users, and moderators. This will enable the development of solutions that cater to a wider range of expectations around moderating abusive behaviors on the platform, especially when dealing with deviant communities.

2.4 Challenges for Creating New NLP Shared Tasks on Abusive Behavior

Shared tasks have long been an NLP tradition for establishing evaluation metrics, defining data guidelines, and, more broadly, bringing together researchers. The broad nature of abusive behavior creates significant challenges for the shared task paradigm. Here, we outline three opportunities for new shared tasks in this area.

First, new NLP shared tasks should develop annotation guidelines that accurately define what constitutes abusive behavior in the target community. Recent works have begun to make progress in this area by modeling the context in which a comment is made through user and community-level features (Qian et al., 2018; Mishra et al., 2018; Ribeiro et al., 2018), yet often the norms in these settings are implicit, making it difficult to transfer the techniques and models to other settings. As one potential solution, Chandrasekharan et al. (2018) studied community norms on Reddit in a large-scale, data-driven manner, and released a dataset of over 40K removed comments from Reddit labeled according to the specific type of norm being violated (Chandrasekharan and Gilbert, 2019).

Second, new NLP shared tasks must address the data scarcity faced by abuse detection research while minimizing the harm caused by the data. Constant exposure to abusive content has been found to negatively and substantially affect the mental health of moderators and users (Roberts, 2014; Gillespie, 2018; Saha et al., 2019). However, labeled ground truth data for building and evaluating classifiers is hard to obtain because platforms typically do not share moderated content due to privacy, ethical, and public relations concerns. One possibility for significant progress is to work with platform administrators and stakeholders to make proprietary data available as private test sets on platforms like Codalab, thereby keeping annotations in line with community norms while still allowing researchers to evaluate on real behavior.

Third, tasks must clearly define who is the end-user of the classification labels. For example, will moderators use the system to triage abusive content, or is the goal to automatically remove abusive content? Current solutions are often trained and evaluated in a static manner, only using pre-existing data; whether these solutions are effective upon deployment remains relatively unexplored. Evaluation must go beyond traditional measures of performance like precision and recall, and instead begin optimizing for metrics like reduction in moderator effort, speed of response, targeted recall for severe types of abuse, moderator trust, and fairness in predictions; the sketch below illustrates a few such deployment-oriented metrics.
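The following is a hedged sketch of such a triage-oriented evaluation. It assumes each item carries a model score, a ground-truth abuse label, and a human-assigned severity level; the field names, the severity cutoff, and the budget-based framing are illustrative choices rather than an established benchmark.

def triage_metrics(items, budget):
    """items: list of dicts with keys 'score' (model confidence), 'abusive'
    (bool, ground truth), and 'severity' (int, e.g., 0-3, human label);
    budget: number of items a moderation team can manually review."""
    ranked = sorted(items, key=lambda item: -item["score"])
    reviewed = ranked[:budget]

    total_abusive = sum(item["abusive"] for item in items)
    caught = sum(item["abusive"] for item in reviewed)
    severe_total = sum(item["abusive"] and item["severity"] >= 2 for item in items)
    severe_caught = sum(item["abusive"] and item["severity"] >= 2 for item in reviewed)

    return {
        # Share of all abusive content surfaced within the review budget.
        "recall_at_budget": caught / max(total_abusive, 1),
        # Targeted recall for the most severe abuse, tracked separately so
        # that rare but dangerous cases are not hidden by an overall average.
        "severe_recall_at_budget": severe_caught / max(severe_total, 1),
        # Rough proxy for reduction in moderator effort: the share of the
        # queue that never needs manual review under this budget.
        "queue_reduction": 1 - budget / max(len(items), 1),
    }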
3 Proactive Approaches for Abuse

Existing computational approaches to handling abusive language are primarily reactive and intervene only after abuse has occurred. A complementary approach is to develop proactive technologies that prevent the harm from occurring in the first place, and we motivate three such proactive computational approaches here.

First, bystanders can have a profound effect on the course of an interaction by steering the direction of the conversation away from abuse (Markey, 2000; Dillon and Bushman, 2015). Prior work has used experimenter-based intervention, but a substantial opportunity exists to operationalize these interventions through computational means. Munger (2017) developed a simple but effective computational intervention for the use of toxic language (the n-word), where a human-looking bot account would reply with a fixed comment about the harm such language caused and an appeal to empathy, leading to long-term behavior change in the offenders. Identifying how best to respond to abusive behavior—or whether to respond at all—is an important computational next step for this NLP strategy, and one that likely needs to be pursued in collaboration with researchers from fields such as Psychology. Prior work has shown counter speech to be effective for limiting the effects of hate speech (Schieb and Preuss, 2016; Mathew et al., 2018; Stroud and Cox, 2018). Wright et al. (2017) note that real-world examples of bystanders intervening can be found online, thereby providing a potential source of training data, but methods are needed to reliably identify such counter speech examples.

Second, interventions that occur after a point of escalation may have little positive effect in some circumstances. For example, when two individuals have already begun insulting one another, both have already become upset and must lose face to reconcile (Rubin et al., 1994). At this point, de-escalation may prevent further abuse but does little to restore the situation to a constructive dialog (Gottman, 1999). However, interventions that occur before the point of abuse can serve to shift the conversation. Recent work has shown that it is possible to predict whether a conversation will become toxic on Wikipedia (Zhang et al., 2018) and whether hostility will occur on Instagram (Liu et al., 2018). These predictable abuse trajectories open the door to developing new models for preemptive interventions that directly mitigate harm.

Third, messages that are not intended as offensive create opportunities to nudge authors towards correcting their text if the offense is pointed out. This strategy builds upon recent work on explainable ML for identifying which parts of a message are offensive (Carton et al., 2018; Noever, 2018), and on paraphrase and style transfer for suggesting an appropriate inoffensive alternative (Santos et al., 2018; Prabhumoye et al., 2018). For example, parts of a message could be paraphrased to adjust the level of politeness in order to minimize any cumulative disparity towards one social group (Sennrich et al., 2016). A minimal sketch of such a nudging pipeline follows.
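The sketch below assumes a generic toxicity_score function standing in for any abuse classifier and an optional rewrite_suggestions function standing in for a paraphrase or style-transfer model; the leave-one-out attribution is a simple stand-in for the explainable-ML rationales cited above, and all names here are illustrative rather than components of any released system.

def nudge(draft, toxicity_score, rewrite_suggestions=None, threshold=0.8):
    """Warn an author before posting if a draft is likely to be perceived as
    abusive, and point to the spans that drive that prediction."""
    score = toxicity_score(draft)
    if score < threshold:
        return None  # nothing to flag; the draft can be posted as-is

    # Leave-one-out attribution: how much does removing each token lower the
    # predicted toxicity of the draft?
    tokens = draft.split()
    flagged = []
    for i, token in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        if score - toxicity_score(reduced) > 0.1:
            flagged.append(token)

    nudge_message = {
        "warning": "This draft may come across as abusive.",
        "flagged_spans": flagged,
    }
    if rewrite_suggestions is not None:
        # Optionally offer inoffensive paraphrases of the flagged spans.
        nudge_message["suggestions"] = rewrite_suggestions(draft, flagged)
    return nudge_message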
4 Justice Frameworks for NLP

Martin Luther King Jr. wrote that the biggest obstacle to Black freedom is the "white moderate, who is more devoted to 'order' than to justice, who prefers a negative peace which is the absence of tension to a positive peace which is the presence of justice" (King, 1963). Analogously, by focusing only on classifying individual unacceptable speech acts, NLP risks being the same kind of obstacle as the white moderate: instead of seeking the absence of certain types of speech, we should seek the presence of equitable participation. We argue that NLP should consider supporting three types of justice—social justice, restorative justice, and procedural justice—that describe (i) what actions are allowed and encouraged, (ii) how wrongdoing should be handled, and (iii) what procedures should be followed.

First, the capabilities approach to social justice focuses on what actions people can do within a social setting (Sen, 2011; Nussbaum, 2003) and provides a useful framework for thinking about what justice online could look like. Nussbaum (2003) provides a set of 10 fundamental capabilities for a just society, such as the ability to express emotion and to have an affiliation. These capabilities provide a blueprint for articulating the values and opportunities an online community provides: instead of a negative articulation—an ever-growing list of prohibited behaviors—we should use a positive phrasing (e.g., "you will be able to") of capabilities in an online community. Such an effort naturally extends our proposal for detecting community-specific abuse to one of promoting community norms. Accordingly, NLP technologies can be developed to identify positive behaviors and to ensure individuals are able to fulfill these capabilities.
Several recent works have made strides in this direction by examining positive behaviors such as how constructive conversations are (Kolhatkar and Taboada, 2017; Napoles et al., 2017), whether dialog on contentious topics can exist without devolving into squabbling (Tan et al., 2016), or the level of support given between community members (Wang and Jurgens, 2018).

Second, once we have adequately articulated what people in a community should be able to do, we must address how the community handles transgressions. The notion of restorative justice is a useful theoretical tool for thinking about how wrongdoing should be handled. Restorative justice theory emphasizes repair and uses a process in which stakeholders, including victims and transgressors, decide together on consequences. A restorative process may produce a punishment, such as banning, but can also include consequences such as apology and reconciliation (Braithwaite, 2002). Just responses consider the emotions of both perpetrators and victims in designing the right response (Sherman, 2003). A key problem here is identifying which community norm has been violated, and NLP technologies can be introduced to aid this process of elucidating violations through classification or the use of explainable ML techniques. Here, NLP can aid all parties (platforms, victims, and transgressors) in identifying appropriate avenues for restorative actions.

Third, just communities also require just means of addressing wrongdoing. The notion of procedural justice explains that people are more likely to comply with a community's rules if they believe the authorities are legitimate (Tyler and Huo, 2002; Sherman, 2003). For NLP, this means that our systems for detecting non-compliance must be transparent and fair. People will comply only if they accept the legitimacy of both the platform and the algorithms it employs. Therefore, abuse detection methods are needed that can justify why a particular act was a violation in order to build legitimacy; a natural starting point for NLP in building legitimacy is recent work from explainable ML (Ribeiro et al., 2016; Lei et al., 2016; Carton et al., 2018).

5 Conclusion

Abusive behavior online affects a substantial portion of the population. The NLP community has proposed computational methods to help mitigate this problem, yet it has also struggled to move beyond the most obvious tasks in abuse detection. Here, we propose a new strategy for NLP to tackle online abuse in three ways. First, we should expand our purview for abuse detection to include both extreme behaviors and the more subtle—but still offensive—behaviors like microaggressions and condescension. Second, NLP must develop methods that go beyond reactive identify-and-delete strategies to proactive approaches that intervene or nudge individuals to discourage harm before it occurs. Third, the community should contextualize its effort inside a broader framework of justice—explicit capabilities, restorative justice, and procedural justice—to directly support the end goal of productive online communities.

Acknowledgements

This material is based upon work supported by the Mozilla Research Grants program and by the National Science Foundation under Grant No. 1822228.

References

Keith Allan. 2015. When is a slur not a slur? The use of nigger in 'Pulp Fiction'. Language Sciences, 52:187–199.
Anti-Defamation League. 2019. Online hate and harassment: The American experience. https://www.adl.org/onlineharassment. Accessed: 2019-3-4.
Lindsay Blackwell, Jill Dimond, Sarita Schoenebeck, and Cliff Lampe. 2017. Classification and its consequences for online harassment: Design insights from HeartMob. Proceedings of the ACM on Human-Computer Interaction, 1(CSCW):24.
Geoffrey C. Bowker and Susan Leigh Star. 2000. Sorting Things Out: Classification and Its Consequences. MIT Press.
John Braithwaite. 2002. Restorative Justice & Responsive Regulation. Oxford University Press.
Julian Brooke, Tong Wang, and Graeme Hirst. 2010. Automatic acquisition of lexical formality. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING): Posters, pages 90–98.
Samuel Carton, Qiaozhu Mei, and Paul Resnick. 2018. Extractive adversarial networks: High-recall explanations for identifying personal attacks in social media posts. In Proceedings of EMNLP.
Eshwar Chandrasekharan and Eric Gilbert. 2019. Hybrid approaches to detect comments violating macro norms on Reddit. arXiv preprint arXiv:1904.03596.
Eshwar Chandrasekharan, Umashanthi Pavalanathan, Anirudh Srinivasan, Adam Glynn, Jacob Eisenstein, and Eric Gilbert. 2017. You can't stay here: The efficacy of Reddit's 2015 ban examined through hate speech. Proceedings of the ACM on Human-Computer Interaction, 1(CSCW):31.
Eshwar Chandrasekharan, Mattia Samory, Shagun Jhaver, Hunter Charvat, Amy Bruckman, Cliff Lampe, Jacob Eisenstein, and Eric Gilbert. 2018. The internet's hidden rules: An empirical study of Reddit norm violations at micro, meso, and macro scales. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW):32.
Niyati Chhaya, Kushal Chawla, Tanya Goyal, Projjal Chanda, and Jaya Singh. 2018. Frustrated, polite, or formal: Quantifying feelings and tone in email. In Proceedings of the Second Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media, pages 76–86.
Robert B. Cialdini and Melanie R. Trost. 1998. Social influence: Social norms, conformity and compliance. In D. T. Gilbert, S. T. Fiske, and G. Lindzey, editors, The Handbook of Social Psychology, pages 151–192. McGraw-Hill.
Jonathan Culpeper. 1996. Towards an anatomy of impoliteness. Journal of Pragmatics, 25(3):349–367.
Cristian Danescu-Niculescu-Mizil, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. A computational approach to politeness with application to social factors. In Proceedings of ACL.
Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. In Proceedings of the Eleventh International AAAI Conference on Web and Social Media (ICWSM).
Kelly P. Dillon and Brad J. Bushman. 2015. Unresponsive or un-noticed?: Cyberbystander intervention in an experimental cyberbullying context. Computers in Human Behavior, 45:144–150.
Karthik Dinakar, Birago Jones, Catherine Havasi, Henry Lieberman, and Rosalind Picard. 2012. Common sense reasoning for detection, prevention, and mitigation of cyberbullying. ACM Transactions on Interactive Intelligent Systems (TiiS), 2(3):18.
Bruce Drake. 2014. The darkest side of online harassment: Menacing behavior. Pew Research Center. http://www.pewresearch.org/fact-tank/2015/06/01/the-darkest-side-of-online-harassment-menacing-behavior/.
Maeve Duggan. 2017. Online harassment 2017.
Jill Filipovic. 2007. Blogging while female: How internet misogyny parallels "real-world" harassment. Yale Journal of Law and Feminism, 19(1).
Paula Fortuna and Sérgio Nunes. 2018. A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR), 51(4):85.
Michel Foucault. 1972. The Archaeology of Knowledge & The Discourse on Language. Pantheon Books, New York.
Nancy Fraser. 1990. Rethinking the public sphere: A contribution to the critique of actually existing democracy. Social Text, (25/26):56–80.
Lei Gao and Ruihong Huang. 2017. Detecting online hate speech using context aware models. In Proceedings of RANLP.
Lei Gao, Alexis Kuppersmith, and Ruihong Huang. 2017. Recognizing explicit and implicit hate speech using a weakly supervised two-path bootstrapping approach. In Proceedings of IJCNLP.
Tarleton Gillespie. 2018. Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media. Yale University Press.
Peter Glick and Susan T. Fiske. 2001. An ambivalent alliance: Hostile and benevolent sexism as complementary justifications for gender inequality. American Psychologist, 56(2):109.
John Mordechai Gottman. 1999. The Marriage Clinic: A Scientifically-Based Marital Therapy. W. W. Norton & Company.
Joshua Guberman and Libby Hemphill. 2017. Challenges in modifying existing scales for detecting harassment in individual tweets. In Proceedings of the 50th Hawaii International Conference on System Sciences.
Susan Herring, Kirk Job-Sluder, Rebecca Scheckler, and Sasha Barab. 2002. Searching for safety online: Managing "trolling" in a feminist forum. The Information Society, 18(5):371–384.
Thomas J. Holt, Kristie R. Blevins, and Natasha Burkert. 2010. Considering the pedophile subculture online. Sexual Abuse, 22(1):3–24.
Elizabeth M. Jaffe. 2016. Swatting: The new cyberbullying frontier after Elonis v. United States. Drake Law Review, 64:455.
Akshita Jha and Radhika Mamidi. 2017. When does a compliment become sexist? Analysis and classification of ambivalent sexism using Twitter data. In Proceedings of the Second Workshop on NLP and Computational Social Science, pages 7–16.
Martin Luther King. 1963. Letter from a Birmingham jail.
Varada Kolhatkar and Maite Taboada. 2017. Constructive language in news comments. In Proceedings of the First Workshop on Abusive Language Online, pages 11–17.
Rachael Krishna. 2018. Tumblr launched an algorithm to flag porn and so far it's just caused chaos. https://www.buzzfeednews.com/article/krishrach/tumblr-porn-algorithm-ban.
Mark Latonero. 2011. Human trafficking online: The role of social networking sites and online classifieds. Available at SSRN.
Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing neural predictions. In Proceedings of EMNLP.
Ping Liu, Joshua Guberman, Libby Hemphill, and Aron Culotta. 2018. Forecasting the presence and intensity of hostility on Instagram using linguistic and social features. In Proceedings of the Twelfth International AAAI Conference on Web and Social Media (ICWSM).
Patrick M. Markey. 2000. Bystander intervention in computer-mediated communication. Computers in Human Behavior, 16(2):183–188.
Binny Mathew, Hardik Tharad, Subham Rajgaria, Prajwal Singhania, Suman Kalyan Maity, Pawan Goyal, and Animesh Mukherjee. 2018. Thou shalt not hate: Countering online hate speech. arXiv preprint arXiv:1808.04409.
Timothy McLaughlin. 2018. How WhatsApp fuels fake news and violence in India. https://www.wired.com/story/how-whatsapp-fuels-fake-news-and-violence-in-india/.
Pushkar Mishra, Marco Del Tredici, Helen Yannakoudakis, and Ekaterina Shutova. 2018. Author profiling for abuse detection. In Proceedings of the 27th International Conference on Computational Linguistics (COLING), pages 1088–1098.
Kevin Munger. 2017. Tweetment effects on the tweeted: Experimentally reducing racist harassment. Political Behavior, 39(3):629–649.
Dasa Munkova, Michal Munk, and Zuzana Fraterová. 2013. Identifying social and expressive factors in request texts using transaction/sequence model. In Proceedings of RANLP, pages 496–503.
Kevin L. Nadal, Katie E. Griffin, Yinglee Wong, Sahran Hamit, and Morgan Rasmus. 2014. The impact of racial microaggressions on mental health: Counseling implications for clients of color. Journal of Counseling and Development, 92(1):57–66.
Courtney Napoles, Joel Tetreault, Aasish Pappu, Enrica Rosato, and Brian Provenzale. 2017. Finding good conversations online: The Yahoo News annotated comments corpus. In Proceedings of the 11th Linguistic Annotation Workshop, pages 13–23.
Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang. 2016. Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web (WWW), pages 145–153.
Safiya Umoja Noble. 2018. Algorithms of Oppression: How Search Engines Reinforce Racism. NYU Press.
David Noever. 2018. Machine learning suites for online toxicity detection. arXiv preprint arXiv:1810.01869.
Martha Nussbaum. 2003. Capabilities as fundamental entitlements: Sen and social justice. Feminist Economics, 9(2-3):33–59.
Ray Oshikawa, Jing Qian, and William Yang Wang. 2018. A survey on natural language processing for fake news detection. arXiv preprint arXiv:1811.00770.
Zizi Papacharissi. 2004. Democracy online: Civility, politeness, and the democratic potential of online political discussion groups. New Media & Society, 6(2):259–283.
Jessica Annette Pater, Yacin Nadji, Elizabeth D. Mynatt, and Amy S. Bruckman. 2014. Just awful enough: The functional dysfunction of the Something Awful forums. In Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems (CHI), pages 2407–2410.
Ellie Pavlick and Joel Tetreault. 2016. An empirical analysis of formality in online communication. Transactions of the Association for Computational Linguistics (TACL), 4(1):61–74.
Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, and Alan W. Black. 2018. Style transfer through back-translation. In Proceedings of ACL.
Jing Qian, Mai ElSherief, Elizabeth Belding, and William Yang Wang. 2018. Leveraging intra-user and inter-user representation learning for automated hate speech detection. In Proceedings of NAACL, pages 118–123.
Manoel Horta Ribeiro, Pedro H. Calais, Yuri A. Santos, Virgílio A. F. Almeida, and Wagner Meira Jr. 2018. Characterizing and detecting hateful users on Twitter. In Proceedings of the Twelfth International AAAI Conference on Web and Social Media (ICWSM).
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of KDD, pages 1135–1144.
Sarah T. Roberts. 2014. Behind the screen: The hidden digital labor of commercial content moderation. Ph.D. thesis, University of Illinois at Urbana-Champaign.
Jessica Roy. 2016. 'Cuck,' 'snowflake,' 'masculinist': A guide to the language of the 'alt-right'. Los Angeles Times. http://www.latimes.com/nation/la-na-pol-alt-right-terminology-20161115-story.html.
Jeffrey Z. Rubin, Dean G. Pruitt, and Sung Hee Kim. 1994. Social Conflict: Escalation, Stalemate, and Settlement. McGraw-Hill Book Company.
Koustuv Saha, Eshwar Chandrasekharan, and Munmun De Choudhury. 2019. Prevalence and psychological effects of hateful speech in online college communities. In Proceedings of WebSci.
Joni Salminen, Hind Almerekhi, Milica Milenković, Soon-gyo Jung, Jisun An, Haewoon Kwak, and Bernard J. Jansen. 2018. Anatomy of online hate: Developing a taxonomy and machine learning models for identifying and classifying hate in online news media. In Proceedings of the Twelfth International AAAI Conference on Web and Social Media (ICWSM).
Mattia Samory and Tanushree Mitra. 2018. Conspiracies online: User discussions in a conspiracy community following dramatic events. In Proceedings of the Twelfth International AAAI Conference on Web and Social Media (ICWSM).
Cicero Nogueira dos Santos, Igor Melnyk, and Inkit Padhi. 2018. Fighting offensive language on social media with unsupervised text style transfer. In Proceedings of ACL.
Carla Schieb and Mike Preuss. 2016. Governing hate speech by means of counterspeech on Facebook. In Proceedings of ICA, pages 1–23.
Anna Schmidt and Michael Wiegand. 2017. A survey on hate speech detection using natural language processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pages 1–10.
Amartya Sen. 2011. The Idea of Justice, reprint edition. Belknap Press: An Imprint of Harvard University Press.
Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Controlling politeness in neural machine translation via side constraints. In Proceedings of NAACL, pages 35–40.
Fadi Abu Sheikha and Diana Inkpen. 2011. Generation of formal and informal sentences. In Proceedings of the 13th European Workshop on Natural Language Generation, pages 187–193.
Lawrence W. Sherman. 2003. Reason for emotion: Reinventing justice with theories, innovations, and research—The American Society of Criminology 2002 presidential address. Criminology, 41(1):1–38.
Alexandra Siegel. 2015. Sectarian Twitter Wars: Sunni-Shia Conflict and Cooperation in the Digital Age, volume 20. Carnegie Endowment for International Peace.
Scott R. Stroud and William Cox. 2018. The varieties of feminist counterspeech in the misogynistic online world. In Mediating Misogyny, pages 293–310. Springer.
Derald Wing Sue. 2010. Microaggressions in Everyday Life: Race, Gender, and Sexual Orientation. Wiley, Hoboken, NJ.
Derald Wing Sue, Christina M. Capodilupo, Gina C. Torino, Jennifer M. Bucceri, Aisha M. B. Holder, Kevin L. Nadal, and Marta Esquilin. 2007. Racial microaggressions in everyday life: Implications for clinical practice. American Psychologist, 62(4):271–286.
Chenhao Tan, Vlad Niculae, Cristian Danescu-Niculescu-Mizil, and Lillian Lee. 2016. Winning arguments: Interaction dynamics and persuasion strategies in good-faith online discussions. In Proceedings of the 25th International Conference on World Wide Web (WWW), pages 613–624.
Harry Charalambos Triandis. 1994. Culture and Social Behavior. McGraw-Hill, New York.
Tom R. Tyler and Yuen Huo. 2002. Trust in the Law: Encouraging Public Cooperation with the Police and Courts. Russell Sage Foundation.
Rob Voigt, Nicholas P. Camp, Vinodkumar Prabhakaran, William L. Hamilton, Rebecca C. Hetey, Camilla M. Griffiths, David Jurgens, Dan Jurafsky, and Jennifer L. Eberhardt. 2017. Language from police body camera footage shows racial disparities in officer respect. Proceedings of the National Academy of Sciences, 114(25):6521–6526.
Zijian Wang and David Jurgens. 2018. It's going to be okay: Measuring access to support in online communities. In Proceedings of EMNLP, pages 33–45.
Zeerak Waseem, Thomas Davidson, Dana Warmsley, and Ingmar Weber. 2017. Understanding abuse: A typology of abusive language detection subtasks. In Proceedings of the First Workshop on Abusive Language Online.
Lucas Wright, Derek Ruths, Kelly P. Dillon, Haji Mohammad Saleem, and Susan Benesch. 2017. Vectors for counterspeech on Twitter. In Proceedings of the First Workshop on Abusive Language Online, pages 57–62.
Justine Zhang, Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil, Lucas Dixon, Yiqing Hua, Nithum Thain, and Dario Taraborelli. 2018. Conversations gone awry: Detecting early signs of conversational failure. In Proceedings of ACL.