Information Foraging in Debugging
Total Page:16
File Type:pdf, Size:1020Kb
AN ABSTRACT OF THE DISSERTATION OF Joseph Lawrance for the degree of Doctor of Philosophy in Computer Science presented on June 12, 2009. Title: Information Foraging in Debugging. Abstract approved: __________________________________________ Margaret M. Burnett Programmers spend a substantial fraction of their debugging time by navigating through source code, yet little is known about how programmers navigate. With the continuing growth in size and complexity of software, this fraction of time is likely to increase, which presents challenges to those seeking both to understand and address the needs of programmers during debugging. Therefore, we investigated the applicability a theory from another domain, namely information foraging theory, to the problem of programmers’ navigation during software maintenance. The goal was to determine the theory’s ability to provide a foundational understanding that could inform future tool builders aiming to support programmer navigation. To perform this investigation, we first defined constructs and propositions for a new variant of information foraging theory for software maintenance. We then operationalized the constructs in different ways and built three executable models to allow for empirical investigation. We developed a simple information-scent-only model of navigation, a more advanced model of programmer navigation, named Programmer Flow by Information Scent (PFIS), which accounts for the topological structure of source code, and PFIS 2, a refinement of PFIS that maintains an up-to- date model of source code on the fly and models information scent even in the absence of explicit information about stated goals. We then used the models in three empirical studies to evaluate the applicability of information foraging theory to this domain. First, we conducted a lab study of 12 IBM programmers working on a bug report and feature request. Second, we conducted an analysis of issues and revisions collected from Sourceforge.net. Finally, we collected programmer navigation behavior, revisions and issues from a field study of programmers working in various groups at IBM. All three models predicted programmers’ navigation behavior, including where programmers allocated their time among patches, where programmers went, or where programmers made changes to fix defects. These results indicate that information foraging theory can predict and explain programmer navigation behavior, and imply that tools based on the principles of information foraging theory will be able to predict subsequent navigation behavior and potentially assist where programmers should go to make changes to fix bugs. ©Copyright by Joseph Lawrance June 12, 2009 All Rights Reserved Information Foraging in Debugging by Joseph Lawrance A DISSERTATION submitted to Oregon State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy Presented June 12, 2009 Commencement June 2010 Doctor of Philosophy dissertation of Joseph Lawrance presented on June 12, 2009 APPROVED: Major Professor, representing Computer Science Director of the School of Electrical Engineering and Computer Science Dean of the Graduate School I understand that my dissertation will become part of the permanent collection of Oregon State University libraries. My signature below authorizes release of my dissertation to any reader upon request. Joseph Lawrance, Author ACKNOWLEDGEMENTS So many people have helped me come to this day. I owe a debt of gratitude to those who helped make my graduate education and this dissertation possible through their presence, feedback, support and encouragement. This includes, but is not limited to, my fiancée (Seikyung Jung), my advisor (Margaret Burnett), my coauthors (Rachel Bellamy, Chris Bogart, Kyle Rector, Steven Clarke, Laura Beckwith, Robin Abraham, Martin Erwig, Christoph Neumann, Jason Dagit, Sam Adams, Ron Metoyer, T.J. Robertson, Corey Kissinger, Susan Wiedenbeck, Alan Blackwell, Curtis Cook, Gregg Rothermel, Andy Ko, Brad Myers, Mary Beth Rosson, Chris Scaffidi and Mary Shaw), my committee (Margaret Burnett, Timothy Budd, Martin Erwig, Xiaoli Fern, Maggie Niess), my coworkers at IBM Research (John Richards, Rachel Bellamy, Cal Swart, Annie Ying, Jonathan Brezin, Peter Malkin, Wendy Kellogg, Brent Hailpern, Josh Hailpern), Microsoft (Steven Clarke, Monty Hammontree), fellow students and faculty of Oregon State over the years (Paul Cull, Carlos Jensen, Christine Wallace, Surabh Setia, Valentina Grigoreanu, Stephen Kollmansberger, Rogan Creswick, Amit Phalgune, Shraddha Sorte, Neeraja Subrahmaniyan, Vidya Rajaram, Vaishnavi Narayanan, Joey Ruthruff, Marc Fisher, Todd Kulesza, Jill Cao, Neville Mehta, Ronald Bjarnason, Randy Rauwendaal, Andrew Stucky), my family (Kenneth, Renee, Michael, Steven, Chastity, Henry, Charles, Mary Jane, Cleo, Vi, Marci, Bob, William, Jacob), Bonnie John, Janice Singer, Rob Miller, Jeff Stylos, Thomas La Toza, and you, whomever you are, for reading my dissertation. This work was financed in part by IBM Research through several internships, by an IBM Ph.D. Scholarship, by an IBM International Faculty Award, and by the EUSES Consortium via NSF ITR- 0325273. This work benefited greatly from the comments of the reviewers and the meta-reviewer of ACM CHI 2009, IEEE VL/HCC 2008, ACM CHI 2008, and IEEE VL/HCC 2007. TABLE OF CONTENTS Page 1 Introduction...............................................................................................................1 2 Introduction to Information Foraging Theory.............................................................5 3 Literature Review......................................................................................................9 3.1 Psychology of programming and debugging ................................................... 9 3.2 Theories of debugging...................................................................................13 3.3 Tools for program navigation and search .......................................................15 4 Information foraging in debugging .......................................................................... 18 4.1 Theoretical objectives....................................................................................18 4.2 Theoretical constructs....................................................................................19 4.3 Theoretical scope...........................................................................................20 4.4 Theoretical propositions ................................................................................21 5. Model 1: Information Scent (Interword correlation)................................................ 22 6 Study 1: IBM think-aloud study and Model 1 .......................................................... 24 6.1 Participants....................................................................................................24 6.2 Materials .......................................................................................................24 6.3 Procedure ......................................................................................................26 6.4 Analysis methodology ...................................................................................26 6.5 Prerequisite Results: Information patches ......................................................28 6.6 Prerequisite Results: Hypothesis vs. scent processing ....................................30 6.7 Prerequisite Results: Scent “triggers” ............................................................31 TABLE OF CONTENTS (Continued) Page 6.8 Model 1 as predictor of where programmers went .........................................32 7 Study 2: Sourceforge.net field study and Model 1.................................................... 35 7.1 Materials .......................................................................................................35 7.2 Procedure ......................................................................................................36 7.3 Model 1 as predictor of where programmers should go..................................36 8 Model 2: Programmer Flow by Information Scent (PFIS)........................................ 43 8.1 WUFIS vs. PFIS............................................................................................44 8.2 The PFIS algorithm .......................................................................................46 9 Study 3: IBM think-aloud study and Model 2 .......................................................... 50 9.1 How well did PFIS predict where programmers went?...................................50 9.2 Comparison of Model 1, Model 2, Topology and others ................................53 9.3 Did PFIS account for class-to-class traversals? ..............................................54 10 Model 3: PFIS 2 .................................................................................................... 57 10.1 The PFIS 2 algorithm ..................................................................................58 10.2 PFIS vs. PFIS 2 example .............................................................................61 11 Study 4 Field Study ............................................................................................... 66 11.1 Design of study............................................................................................66 11.2 Programmer Flow Information Gatherer......................................................67 11.3 Measures .....................................................................................................69