Systems for Collective Human Curation of Online Discussion Amy
Total Page:16
File Type:pdf, Size:1020Kb
Systems for Collective Human Curation of Online Discussion by Amy Xian Zhang B.S., Rutgers University, New Brunswick (2011) M.Phil., University of Cambridge (2012) Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY September 2019 ○c Massachusetts Institute of Technology 2019. All rights reserved. Author................................................................ Department of Electrical Engineering and Computer Science August 30, 2019 Certified by. David R. Karger Professor of Electrical Engineering and Computer Science Thesis Supervisor Accepted by . Leslie A. Kolodziejski Professor of Electrical Engineering and Computer Science Chair, Department Committee on Graduate Students 2 Systems for Collective Human Curation of Online Discussion by Amy Xian Zhang Submitted to the Department of Electrical Engineering and Computer Science on August 30, 2019, in partial fulfillment of the requirements for the degree of Doctor of Philosophy Abstract The internet was supposed to democratize discussion, allowing people from all walks of life to communicate with each other at scale. However, this vision has not been fully realized—instead online discourse seems to be getting worse, as people are increasingly drowning in discussion, with much of it unwanted or unpleasant. In this thesis, I present new systems that empower discussion participants to work collectively to bring order to discussions through a range of curation tools that superimpose richer metadata structure on top of standard discussion formats. These systems enable the following new capabilities: 1) recursive summarization of threaded forums using Wikum, 2) teamsourced tagging and summarization of group chat using Tilda, 3) fine-grained customization of email delivery within mailing lists using Murmur, and 4) friendsourced moderation of messages against online harassment using Squadbox. In a world of abundant discussion and mass capabilities for amplification, the curation of a social space becomes as equally essential as content creation in defining the nature of that space. By putting more powerful techniques for curation in the hands of everyday people, I envision a future where end users are empowered to actively co-create every aspect of their online discussion environments, bringing in their nuanced and contextual insights. Thesis Supervisor: David R. Karger Title: Professor of Electrical Engineering and Computer Science 3 4 Acknowledgments This work would not have been possible without the support of so many wonderful people. Thank you for your time, your energy, and your care. I dedicate this thesis to you: ∙ my husband Johnny Hu, who is my biggest supporter and my best friend; ∙ my parents Randy and Susan and my sister Ally, who I can always count on to be there for me; ∙ my advisor David Karger, who unfailingly gives solid advice and has been the funnest and most creative co-conspirator; ∙ my dissertation committee members, who I can also now count on as life- long mentors—Mark Ackerman, Arvind Satyanarayan, and Ethan Zuckerman— thank you for assisting and encouraging me through this process; ∙ Mor Naaman, who encouraged me to pursue undergraduate research and then to apply to graduate school; ∙ Scott Counts, Ed Chi, Jilin Chen, Lichan Hong, Bryan Culbertson, Praveen Paritosh, and Justin Cranshaw, who took a chance on me as an intern and who I can still count on for advice and mentorship today; ∙ my collaborators in the Haystack Group: Lea Verou, Tarfah Alrashed, Ted Ben- son, Anant Bhardwaj, Soya Park, Luke Murray, Farnaz Jahanbakhsh, Jumana Almahmoud, and others who helped with feedback and testing; ∙ my mentees Kaitlin Mahar and Jane Im, who I have to especially thank for your invaluable contributions to this thesis; ∙ my other mentees Jessica Wang, Janet Sung, Jenny Fan, Sunny Tian, Prateek Kukreja, Joshua Blum, Oliver Dunkley, and others, who patiently put up with me as a newbie research mentor; 5 ∙ my collaborators outside MIT during my Ph.D., including Chris Schilling and Jonathan Morgan of the Wikimedia Foundation, Connie Moon Sehat, An Xiao Mina, Jenny 8. Lee, Ed Bice, Sandro Hawke, and many others at the Credi- bility Coalition, Krzysztof Gajos at Harvard University, and Marc Facciotti at UCDavis; ∙ my team at the Collective Intelligence for Democracy Conference, including Abhishek Srivastava, Daniel Curto-Milet, Pablo Aragon, Berenice Zambrano, Ramla Ayadi, Julio Reyes, and Alejandra Monroy—thank you for believing in Wikum; ∙ friends and mentors in the broader HCI community at MIT, including Stefanie Mueller and the HCIE group, the VIS group, Rob Miller and the rest of the Usable Programming (formerly UID) Group with whom I spent the first half of my Ph.D., including Juho Kim, Carrie Cai, Elena Glassman, and Philip Guo, and the Center for Civic Media at the MIT Media Lab; ∙ friends at the Berkman Klein Center for Internet & Society, who expanded my outlook during this last year of my Ph.D.; ∙ Miranda, Daniel, Joseph, Valerie, and Laszlo, who kindly agreed to be guinea pigs for several of my tools and experiments; ∙ everyone at Cabot House at Harvard, where I called home for four wonderful years; ∙ the Google Ph.D. Fellowship and the National Science Foundation Graduate Research Fellowship, which funded my Ph.D. research and study; ∙ and finally, to all the users who participated in my user studies and peoplewho sat down to be interviewed by me—thank you. 6 Contents 1 Introduction 25 1.1 What is Wrong with Online Discussion Today? . 26 1.2 Why Isn’t Online Discussion Getting Better? . 28 1.3 From Tools for Publishing to Collectively Curated Social Spaces . 30 1.4 Systems for Collective Curation of Discussion . 31 1.4.1 Collaborative Summarization of Threaded Forums . 32 1.4.2 Teamsourced Markup and Note-taking in Group Chat . 33 1.4.3 Fine-Grained Control over Delivery in Mailing Lists . 35 1.4.4 Friendsourced Moderation of Email to Combat Harassment . 36 1.5 Thesis Contributions . 37 1.5.1 Empirical Understanding of Desired Discussion Structures and Signals . 37 1.5.2 Novel Discussion Presentations and Interactions . 38 1.5.3 Techniques for User Expression and Motivations for Curating . 38 1.6 Thesis Overview . 39 2 Background and Related Work 41 2.1 Evolution of Online Discussion Systems and Their Lingering Problems 41 2.1.1 Email is Still Email . 42 2.1.2 Group Chat: From IRC to Slack . 44 2.1.3 A Plethora of Web Forum Systems . 45 2.1.4 User Publishing and the Rise of Social Media Platforms . 46 7 2.2 Understanding Why and How We Collaborate and Participate in On- line Discourse . 48 2.2.1 Motivation to Curate Online Discourse . 49 2.2.2 Designing Effective Collaboration Tools . 51 2.2.3 Discussion Curation Artifacts . 53 2.3 Existing Research on Techniques and Tools . 56 2.3.1 Personal Information Management . 56 2.3.2 Crowdsourcing and Collective Intelligence . 58 2.3.3 Novel Presentations and Interactive Visualizations . 60 2.3.4 Discourse Analysis and Natural Language Processing . 62 2.4 Conclusion . 65 3 Wikum: Bridging Wikis and Forums towards Summarizing Discus- sion Threads 67 3.1 Introduction . 67 3.1.1 Contribution . 68 3.1.2 Chapter Overview . 70 3.2 Related Work . 71 3.3 Wikum Design . 73 3.3.1 Summary Tree Design . 73 3.3.2 Workflow Design . 74 3.4 Wikum System . 77 3.4.1 Building the Summary Tree . 77 3.4.2 Creating High Quality Summaries Efficiently . 78 3.4.3 System Implementation . 79 3.5 Lab Evaluations of Wikum . 80 3.5.1 Study 1: Summarization . 80 3.5.2 Study 2: Reading and Exploration . 88 3.5.3 Design Implications . 93 3.6 Case Study: Deliberation and Resolution in Wikipedia . 94 8 3.6.1 Introduction . 95 3.6.2 Background on Wikipedia Governance and Deliberation . 97 3.6.3 Introduction of Requests for Comment . 99 3.6.4 Analyzing Seven Years of RfCs . 103 3.6.5 Why Do RfCs Not Get Closed? . 109 3.6.6 Predicting the Likelihood of an RfC Going Stale . 116 3.6.7 Design Implications . 122 3.7 Field Study of Wikum on Wikipedia . 125 3.7.1 Existing Workflow for Frequent RfC Closers . 126 3.7.2 Field Study . 129 3.8 Discussion . 140 3.8.1 A Tension Between Summary Goals . 140 3.8.2 Who Summarizes? . 141 3.8.3 Considering Community Values . 142 3.8.4 Production of a Public Summary Tree . 142 3.9 Future Work . 143 3.9.1 Summaries as Productive Outputs of Discussion . 143 3.9.2 Wikum for Summarizing Civic Discourse . 144 3.9.3 Summarizing While Conversing . 145 3.9.4 Automatic Summarization . 145 3.9.5 Authoring Tools that Expose Intermediate Work . 146 3.10 Conclusion . 147 4 Tilda: Making Sense of Group Chat through Collaborative Tagging and Summarization 149 4.1 Introduction . 150 4.2 Related Work . 152 4.3 Needfinding Interviews for Making Sense of Group Chat . 154 4.3.1 Methodology . 154 9 4.3.2 Current Experiences with and Strategies for Managing Group Chat . 155 4.3.3 Preferences for the Content and Presentation of Synthesized Chat Designs . 160 4.4 Tilda: A Tool for Collaborative Sensemaking of Chat . 164 4.4.1 Enriching Chat Conversations using Notes and Tags . 164 4.4.2 Synthesizing Chat Conversations using Structured Summaries 168 4.4.3 System Implementation and Considerations . 169 4.5 Evaluation . 170 4.5.1 Study 1: Marking Up Chat Conversations While Chatting . 171 4.5.2 Study 2: Using Structured Summaries to Catch Up on Missed Conversations . 176 4.5.3 Field Study . 178 4.6 Discussion . 185 4.6.1 Who Annotates and Why? . 186 4.6.2 Towards Automatic Summarization of Chat . 187 4.6.3 Integration with Knowledge Management Tools and Email .