Using Sentiment Induction to Understand Variation in Gendered Online Communities Li Lucy Julia Mendelsohn Symbolic Systems Program Department of Linguistics Department of Computer Science Department of Computer Science Stanford University Stanford University
[email protected] [email protected] Abstract gator and discussion platforms, Reddit, contains thousands of unique communities, known as sub- We analyze gendered communities defined in reddits. These subreddits vary in topic, such as three different ways: text, users, and senti- r/sport and r/history, content type, such as r/pics ment. Differences across these representations reveal facets of communities’ distinctive iden- and r/videos, and format, such as the Q&A style of tities, such as social group, topic, and atti- r/IamA and the narratives on r/confession. Online tudes. Two communities may have high text communities such as subreddits are often char- similarity but not user similarity or vice versa, acterized by language use and user membership and word usage also does not vary accord- (Hamilton et al., 2016; Datta et al., 2017; Bamman ing to a clearcut, binary perspective of gen- et al., 2014; Martin, 2017). der. Community-specific sentiment lexicons demonstrate that sentiment can be a useful in- Sociolinguists have primarily analyzed phono- dicator of words’ social meaning and commu- logical and syntactic variables (Eckert, 2012), nity values, especially in the context of discus- though some have studied lexical variables (e.g. sion content and user demographics. Our re- Wong, 2005). Previous computational work also sults show that social platforms such as Reddit focuses on lexical variation (Bamman et al., 2014). are active settings for different constructions We approach variation from a new direction, of gender.