Proceedings of the Fourteenth International AAAI Conference on Web and Social Media (ICWSM 2020)

Characterizing Variation in Toxic Language by Social Context

Bahar Radfar, Karthik Shivaram, Aron Culotta
Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60616
{bradfar, kshivara}@hawk.iit.edu, [email protected]

Abstract

How two people speak to one another depends heavily on the nature of their relationship. For example, the same phrase said to a friend in jest may be offensive to a stranger. In this paper, we apply this simple observation to study toxic comments in online social networks. We curate a collection of 6.7K tweets containing potentially toxic terms from users with different relationship types, as determined by the nature of their follower-friend connection. We find that such tweets between users with no connection are nearly three times as likely to be toxic as those between users who are mutual friends, and that taking into account this relationship type improves toxicity detection methods by about 5% on average.

… toxicity detection. Our main findings are that (i) of tweets containing potentially toxic terms, those sent between users with no social relationship are nearly three times as likely to be toxic as those sent between users with a mutual following relationship; (ii) including a feature indicating relationship type consistently improves classification accuracy by 4-5% across a variety of classification methods; (iii) linguistic choice in toxic tweets varies substantially across relationship type, with mutual friends using relatively more extreme language to express hostility than do users with no social connection.

Below, we first briefly summarize related work (§2), then