Computational Social Roles
Total Page:16
File Type:pdf, Size:1020Kb
COMPUTATIONAL SOCIAL ROLES Diyi Yang CMU-LTI-19-001 Language Technologies Institute School of Computer Science Carnegie Mellon University 5000 Forbes Ave., Pittsburgh, PA 15213 www.lti.cs.cmu.edu Thesis Committee: Robert E. Kraut, Co-Chair (Carnegie Mellon University) Eduard Hovy, Co-Chair (Carnegie Mellon University) Brandy L. Aven (Carnegie Mellon University) Dan Jurafsky (Stanford University) Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Language and Information Technologies Copyright c 2019 Diyi Yang Keywords: Computational social science, conversational acts, computer-supported cooperative work, generative model, human-computer interaction, machine learning, natural language processing, neural network, online health communities, recommender system, social network, social support, semantics, self-disclosure, social roles, social science, well-being, Wikipedia Abstract Millions of people participate in online communities, exchange expertise and ideas, and collaborate to produce complex artifacts. They often enact a variety of social roles in the process of helping their communities and the public at large, which strongly influence the amount and types of work they do, and how they coordinate their activities. Better understanding members’ roles benefits members by clarifying how they should behave to participate effectively and also benefits the community overall by encouraging members to contribute in ways that best use their skills and interests. Social sciences have provided rich theoretic taxonomies of social roles within groups, while natural language processing techniques enable us to automate the identification of social roles in online communities. However, most social science work has focused on generic roles without accommo- dating the activities associated with tasks in specific contexts or automat- ing the process of role identification. While there has been work to date about automatic role inference, identification of social roles has not had a corresponding strong emphasis in the language technologies community. A variety of methods were developed to extract specific “roles” or patterns in different contexts, lacking generalized definitions about what are roles and systematic methods to extract roles. Moreover, how roles change over time and how the awareness of roles influence role holders’ performance and the group production, have not been adequately researched in both fields. This thesis advocates for both theories of social science and models of text analysis to better define roles, develop ways to extract roles and optimally recommend roles to users. Concretely, this work defines what are social roles, introduces five measurable facets associated with social roles, and pro- posed a generic methodology for role identification. It also demonstrated how to computationally model social role and its facets in two socially im- portant contexts - Wikipedia and Cancer Survivor Network. Via combining theories of social roles and computational models for role identification on those two large-scale contexts, this research reveals details about emergent, behavioral, and functioning roles, and a set of computational techniques to identify such roles via fine-grained operationalization of role holders’ behaviors. This work fills the longstanding gap in role theory and empirical modeling about emergent roles in online communities, and lays the foun- dation for future work to identify and analyze roles that people enacted in group processes both online and offline. iv Acknowledgments I’m always grateful to my adviser Bob Kraut — the best adviser one could ever have. He is a great scholar and acts as a tremendous role model for me. Over the last several years we work together, I learned from him what is research and how to do rigorous research. I was trained in computer science, but Bob equipped me well with knowledge and mindset from social science. Although the journey of PhD is like taking a roller coaster, Bob provides as much support as I needed no matter there were ups or downs. These years of working with Bob is the most valuable part of my experiences at CMU. I’m very lucky to have Eduard Hovy as my co-adviser. He always en- courages me to go bigger, deeper and broader of my research direction. He is knowledgeable, visionary, and teaches me on how to think critically. Our advise meetings were not only about specific research projects, but also discussions around research visions, career plans, personality, leadership, etc. He believes in me, sets the right goals for me although I felt lost and not confident about myself most of the times. Without his encouragement and support, I would have still been the shy and timid myself. I’m deeply honored to have Brandy Aven and Dan Jurafsky on my thesis committee. I benefit a lot from Brandy’s expertise, who always push me to think of the big picture around my research. Dan is supportive, insightful, brilliant, and one of the great leaders in our field. I’m grateful that he hosted my three-month visit to Stanford NLP group. It was such a joy working with Dan and he provided me a lot of guidance and help over our meetings. I thank LTI and HCI faculty Alan Black, Jamie Callan, Chris Dyer, Alex Hauptmann, Alon Lavie, Louis-Philippe Morency, Graham Neubig, Carolyn Rosé, Yulia Tsvetkov and Laura Dabbish for their advice and help to my research or job-seeking. A special thanks to the administrative stuff: Stacey Young, Christy Melucci, Laura Alford, and JaRon Pitts. I would like to thank Aaron Halfaker from Wikimedia Foundation, Bin Gao, Tie-Yan Liu, Scott Counts from Microsoft Research, Ryan Ritter and Mark Handel from Facebook Research, for hosting my internships. I’m extremely grateful for numerous friends and collaborators, including David Adamson, Waleed Ammar, Danqi Chen, Jiaxin Chen, Vivian Yung- Nung Chen, Zihang Dai, Sauvik Das, Pradeep Dasigi, Yulun Du, Manaal Faruqui, Fang Fei, Kartik Goyal, Liangke Gui, Anhong Guo, Hyeju Jang, He He, Ting-Hao Kenneth Huang, Junjie Hu, Liang He, Zhiting Hu, Dirk Hovy, Haojian Jin, Yohan Jo, Guokun Lai, Huiying Li, Mu Li, Jiwei Li, Bin Liu, Hanxiao Liu, Zhengzhong Liu, Zack Lipton, Anna Kasunic, Xuezhe Ma, Elijah Mayfield, Will Monroe, Felicia Ng, Joseph Seering, Chenxi Song, Lu Sun, Di Wang, Haohan Wang, Ling Wang, Xu Wang, William Yang Wang, Yi- Chia Wang, Yu-Xiang Wang, Fan Wei, Miaomiao Wen, Yuexin Wu, Pengtao Xie, Qizhe Xie, Zhang Yang, Zhilin Yang, Zi Yang, Zichao Yang, Zheng Yao, Pengcheng Yin, Zhou Yu, Hao Zhang, Shikun Zhang, Yang Zhang, Guoqing Zheng, Haiyi Zhu, and many others. I want to specifically thank Chenxi, Jiaxin, Joseph, Shikun, Xu, and Zheng for being a listening ear and helping me overcome stress and hard times in life and work. I would like to thank my undergraduate adviser Yong Yu (and our ACM Honored Class), mentors Tianqi Chen and Weinan Zhang who started me on my first several research projects. I thank my parents for raising me up, for their unconditional support, and for guiding me to where I am today. I would like to thank my boyfriend Peter for his patience, encouragement, trust, and love along the way. Finally, I thank Facebook PhD Fellowship, CMU Presidential Fellowship and funds from Google and NIH for generously supporting this research. vi Contents 1 Introduction1 1.1 Thesis Overview.................................4 1.1.1 Role Theory................................4 1.1.2 Methodology for Identifying Roles...................4 1.1.3 Case Studies of Role Identification...................4 1.2 Research Impact..................................6 2 Role Theory9 2.1 Theoretical Modeling............................... 10 2.2 Computational Modeling............................ 12 2.3 Role Framework.................................. 14 2.3.1 Agent.................................... 14 2.3.2 Interaction................................. 16 2.3.3 Goal.................................... 17 2.3.4 Expectation................................ 18 2.3.5 Context................................... 20 2.4 Relevant Processes about Roles......................... 21 2.5 Summary...................................... 23 3 Methodology for Identifying Roles 25 3.1 Generic Methodology............................... 25 3.1.1 Role Postulation............................. 26 3.1.2 Role Definition.............................. 27 3.1.3 Role Identification............................ 30 3.1.4 Role Evaluation.............................. 34 3.2 Iterative Process of Identification........................ 36 3.3 Reflection...................................... 39 vii I Role Identification on Wikipedia 41 4 Identifying Roles of Editors 43 4.1 Introduction.................................... 44 4.2 Related Work................................... 46 4.3 Research Question and Data........................... 48 4.4 Predicting Edit Categories............................ 48 4.4.1 Edit Categories Construction...................... 49 4.4.2 Feature Space Design........................... 50 4.5 Modeling Editor Roles.............................. 52 4.5.1 Role Identification Method....................... 53 4.5.2 Derived Roles Exploration and Validation............... 54 4.6 Improving Article Quality............................ 57 4.6.1 Model Design............................... 58 4.6.2 Result Discussion............................. 62 4.7 Discussion and Conclusion........................... 63 4.8 Reflection...................................... 64 5 Identifying Semantic Edit Intention 65 5.1 Introduction...................................