The Production and Consumption of Quality Content in Peer Production Communities
A Dissertation
SUBMITTED TO THE FACULTY OF THE UNIVERSITY OF MINNESOTA
BY
Morten Warncke-Wang
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
Loren Terveen and Brent Hecht
December, 2016

Copyright © 2016 M. Warncke-Wang. All rights reserved. Portions copyright © Association for Computing Machinery (ACM) and is used in accordance with the ACM Author Agreement. Portions copyright © Association for the Advancement of Artificial Intelligence (AAAI) and is used in accordance with the AAAI Distribution License. Map tiles used in illustrations are from Stamen Design and published under a CC BY 3.0 license, and from OpenStreetMap and published under the ODbL.

Dedicated in memory of John T. Riedl, Ph.D.
professor, advisor, mentor, squash player, friend

Acknowledgements

This work would not be possible without the help and support of a great number of people. I would first like to thank my family, in particular my parents Kari and Hans, who have had to see yet another of their offspring fly off to the United States for a prolonged period of time. They have supported me in pursuing my dreams even when it meant I would be away from them. I also thank my brother Hans and his fiancée Anna for helping me get settled in Minnesota, solving everyday problems, and for the good company whenever I stopped by. Likewise, I thank my brother Carl and his wife Bente, whom I have only been able to see every so often, but those have always been moments I have cherished. Thanks as well to my nieces and nephews who brighten up the day whenever I get to visit, and help me remember that discovering the world is always a lot of fun!

John Riedl was a fantastic advisor and mentor to me, and Loren Terveen and Brent Hecht did an excellent job of helping me refocus and keep going after John’s unfortunate passing.
Since then, Loren and Brent have been wonderful advisors who have kept me on track with producing solid research and I am grateful for everything I have learned from them. I would also like to thank the other two members of my committee, Ching Ren and John Carlis, who have both shown me how to be a better researcher.

Thanks as well to everyone at GroupLens Research for creating and supporting an environment where science prospers. First of all, thank you to Joe Konstan, John Riedl, and Loren Terveen for seeing my potential when I did not myself and being instrumental in my move to Minnesota to pursue a Ph.D. Secondly, thank you to my senior students who led by excellent example and helped shape my work: Tony Lam, Aaron Halfaker, Katie Panciera, and Michael Ekstrand. Thanks also to the undergraduate students I got to work and publish papers with: Vlad Ayukaev, Vivek Ranjan, and Connor McMahon. I would also like to thank my fellow graduate students Jacob Thebault-Spieker and Isaac Johnson for helping me understand how geographic science works and helping me make sense of our findings. GroupLens has also had some great lab assistants and I thank them for helping out with everyday issues; in particular, I would like to thank Angela Brandt for welcoming me to GroupLens and helping me understand how things work in the Midwest. Lastly, as mentioned, GroupLens has a fantastic environment where science prospers, and I would have liked to name every single person for being awesome and helping that happen!

My fellow researchers also deserve many thanks. Dan Cosley helped make our Wikipedia machine learning work a reality and I have greatly benefited from his research experience and perspectives. He also created SuggestBot, which has now been serving recommendations to Wikipedia contributors for about ten years and keeps going strong. Zhenhua Dong was a wonderful collaborator on my first paper and his tireless efforts still inspire me.
Last, but very much not the least, thank you to Anuradha Uduwage for defying categorisation by being a great collaborator, graduate student, and friend.

I would also like to thank my fellow researchers at the Wikimedia Foundation for the opportunities they have given me as a Research Fellow there. Thanks also go to students and faculty at the University of Washington for welcoming me to the Pacific Northwest. In particular, I would like to thank professors Benjamin Mako Hill and David McDonald for providing me with places to contribute and study, as well as graduate students Amirah Majid and Amanda Menking for making the research lab a place that feels like home.

I am also sending thanks to my friends on both sides of the Atlantic Ocean who help keep my eyes open to fresh perspectives, have my back when I struggle, make sure I laugh, and bring a smile to my face whenever I think of them.

I cannot thank my girlfriend Cameo enough for her support over the past three and a half years as I have worked on completing this thesis. She has helped me find alternative perspectives, asked challenging questions, but first and foremost made sure that I do not go through a single day without feeling appreciated nor without at least one good laugh.

Finally, I would like to thank the National Science Foundation for financial support under a variety of grants: IIS 08-08692, IIS 08-45351, IIS 09-68483, and IIS 11-11201.

Abstract

Over the past 25 years, commons-based peer production [Ben02] has become a vital part of the information technology landscape. There are successful projects in different areas such as open source software (e.g. Apache and Firefox), encyclopedias (e.g. Wikipedia), and map data (e.g. OpenStreetMap). A common theme in all these communities is that they are mainly volunteer-driven and that contributors are able to self-select what they want to work on.
Studies on contributor motivation in peer production have found “fun” and “appropriate challenges” to be strong factors [Nov07; LW05], both associated with the sensation of vital engagement often referred to as “flow” [NC03]. Peer production contributors also often refer to altruism, the desire to be helpful to others, as a motivating factor [BH13]. To what extent does this bottom-up, interest-driven, volunteer-based content production paradigm meet the needs of consumers of this content?

This thesis presents our work on improving our understanding of how peer production communities produce quality content and whether said quality content is produced in areas where there is demand for it. We study this from three perspectives and make contributions as follows: we investigate what textual features describe content quality in Wikipedia and develop a high-performance prediction model solely based on features contributors can easily improve (so-called “actionable features”); we apply a coherent framework for describing and evaluating quality improvement projects in order to discover factors associated with the success and failure of these types of projects; we introduce an analytical framework that allows us to identify the misalignment between supply of and demand for quality content in peer production communities and measure the impact this has on a community’s audience.

The research presented in this thesis provides us with a deeper understanding of quality content in peer production communities. These communities have created software, encyclopedic content, and maps that in many ways improve our everyday lives as well as those of millions of others. At the same time we have identified areas where there is a shortage of quality content and discussed future work that can help reduce this problem. This thesis thus lays the foundation upon which we can build improved communities and positively impact a large part of the world’s population.
Contents

List of Tables
List of Figures
1 Introduction
2 Background
    2.1 Peer-produced Content Quality
    2.2 How is Content Quality Assessed?
    2.3 Can We Automate Quality Assessment?
    2.4 The Production of Quality Content
    2.5 Consumption of Quality Content
    2.6 Misalignment Between Production and Consumption
3 Predicting Content Quality Using Actionable Features
    3.1 An Actionable Quality Model
    3.2 Improving the Actionable Quality Model
    3.3 Usage and Impact
4 Understanding Quality Improvement Projects
    4.1 Unified Descriptive Framework
    4.2 The Quality Improvement Projects Studied
    4.3 Datasets
    4.4 Measuring Project Performance
    4.5 Results
    4.6 Limitations
    4.7 Conclusion
5 Misalignment Between Supply and Demand
    5.1 Research questions
    5.2 The Perfect Alignment Hypothesis
    5.3 Methods and Datasets
    5.4 Misalignment in Wikipedia
    5.5 Misalignment in OpenStreetMap
    5.6 Discussion
    5.7 Future Work and Limitations
    5.8 Conclusion
6 Conclusion
    6.1 Broader Implications
    6.2 Future Work
    6.3 In Conclusion
Bibliography

List of Tables

3.1 Feature list of all four models
3.2 Classification results of all four models
3.3 Overall gain ratio evaluation for all 17 features
3.4 Classification results for all classifiers on 2-class problem
3.5 Classification results for all seven assessment classes
3.6 Confusion matrix for classification of all seven assessment classes
3.7 Quality Improvement Project classifier confusion matrix
3.8 Quality Improvement Project classifier prediction error
3.9 Comparison between OLR models for reassessments and predictions