Master's Thesis

Eindhoven University of Technology

MASTER

Bug fixing in action: activities, process, and knowledge

Lin, B.

Award date: 2016


Disclaimer This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain

Department of Mathematics and Computer Science
Software Engineering and Technology Group

Bug Fixing in Action: Activities, Process, and Knowledge

Master’s Thesis

Bin Lin

Supervisor: dr. Alexander Serebrenik
Co-supervisors: Alexey Zagalsky, prof. dr. Margaret-Anne Storey

Exam Committee: dr. Alexander Serebrenik, dr. Michel Westenberg, dr. Anton Wijs, Alexey Zagalsky (non-voting)

Eindhoven, August 2016

Abstract

Bug fixing plays a vital role in software development. Research has shown that debugging occupies around half of the programming time and consumes large financial budgets as well as substantial human resources. A case study of WordPress is conducted in this thesis to provide insights into modern bug fixing practices. This study investigates bug fixing from three perspectives: bug fixing activities, the bug fixing process, and knowledge flow in bug fixing. First, we examine the limitations of using the bug status to understand bug fixing activities and identify bug fixing activities by inspecting bug tickets and communication archives. Second, we propose an approach to extract a process model of bug fixing, analyze when the communication tools are used, and explore the task distribution patterns of community leads. Third, we study how knowledge flows between bug tickets and how external knowledge resources aggregate on the bug tracking system. As a result, our study proposes a new way to explore the bug fixing process and provides new insights into the bug fixing practice. Moreover, we give suggestions for improving the bug fixing tools and practice, and present typologies for bug fixing activities, reasons of bug references, and external resources on bug tracking systems. Our results can serve as the basis for future bug fixing studies.

Keywords: Bug Fixing, Process Mining, Knowledge Flow, WordPress

Acknowledgments

Foremost, I would like to express my deepest gratitude to my supervisor, Dr. Alexander Serebrenik, who is always there to provide timely and invaluable help. He always inspires me with his dedication and enthusiasm to research and steers me in the right direction. After taking his course “Software Evolution”, I realized that there were so many interesting things to explore in the field of software engineering and started to look into this area. He also offered me opportunities to work with different excellent researchers and attend premier conferences, which significantly broadened my horizon and enriched my experience. His assistance in my future career plan was also critical for me to figure out which way to go. I cannot thank him enough for his support and encouragement in both academic and social life. It is my great honor to have him as my supervisor.

Furthermore, I would also like to thank the CHISEL group, especially Prof. Dr. Margaret-Anne Storey and Alexey Zagalsky, for their continued support and patience during my internship at UVic and graduation project. Their experience, comments, suggestions and tips are a precious asset for not only my current study but my future academic life. I would also thank other CHISEL members for being so kind and friendly to me. They made my two-month stay in Victoria even more enjoyable.

Additionally, I would also like to acknowledge my exam committee members: Dr. Michel Westenberg and Dr. Anton Wijs for offering their time to review the thesis and attend the defense.

Moreover, I would like to thank all of my friends during my study in the Netherlands. Their companionship and support made this journey exciting and delightful.

Finally, I must express my profound gratitude to my parents. Without their unconditional love and support, I never would have been able to get this far. I have always been far away from home since my bachelor, but their care is always by my side.


Contents

Contents
List of Figures
List of Tables

1 Introduction
  1.1 Research Statement and Scope
    1.1.1 Bug Fixing Activities
    1.1.2 Bug Fixing Process
    1.1.3 Knowledge Flow in Bug Fixing
  1.2 Contributions
  1.3 Thesis Organization

2 Case Selection
  2.1 WordPress: World Leading Open-Source Software
    2.1.1 What WordPress Is and Why We Choose It
    2.1.2 WordPress in Academia
  2.2 How to Fix a Bug for WordPress
    2.2.1 Workflow and Tool Use during Bug Fixing in WordPress
    2.2.2 An Example of a Bug Ticket on Trac

3 Bug Fixing Activities
  3.1 Introduction
  3.2 Related Work
  3.3 Case Study Design
    3.3.1 Research Questions
    3.3.2 Data Collection
    3.3.3 Data Analysis Process
  3.4 Results
    3.4.1 What are the limitations of using the bug status to understand the bug fixing process?
    3.4.2 What activities are conducted during the bug fixing process?
  3.5 Conclusions and Future Work
    3.5.1 Conclusions
    3.5.2 Limitations
    3.5.3 Future Work

4 Bug Fixing Process
  4.1 Introduction
  4.2 Related Work
    4.2.1 Process Model Discovery
    4.2.2 Social Network Analysis
  4.3 Case Study Design
    4.3.1 Research Questions
    4.3.2 Data Collection and Analysis
  4.4 Results
    4.4.1 How can we identify the bug fixing process? What can we learn from the bug fixing process?
    4.4.2 What are communication tools mainly used for during bug fixing?
    4.4.3 How do community leads distribute tasks during bug fixing?
  4.5 Conclusions and Future Work
    4.5.1 Conclusions
    4.5.2 Limitations
    4.5.3 Future Work

5 Knowledge Flow in Bug Fixing
  5.1 Introduction
  5.2 Related Work
  5.3 Case Study Design
    5.3.1 Research Questions
    5.3.2 Data Collection and Analysis
  5.4 Results
    5.4.1 Why and how are bug tickets connected on Trac?
    5.4.2 What types of knowledge from external resources are shared on Trac?
  5.5 Conclusions and Future Work
    5.5.1 Conclusions
    5.5.2 Limitations
    5.5.3 Future Work

6 Conclusions

Bibliography

Appendix

A Process model based on automatically extracted activities

List of Figures

2.1 Bug fixing workflow based on the WordPress handbook
2.2 Bug status transitions
2.3 An example of a bug ticket on Trac

4.1 A fragment of the event log based on new bug statuses
4.2 Examples of community leads
4.3 Process model based on the bug status
4.4 Process model based on all automatically extracted activities
4.5 The transitions between “keyword:needs-patch” and “patch:add” extracted from the process model
4.6 The transitions between “status:new” and “patch:add” extracted from the process model
4.7 Partial statistics table of case duration of the trimmed process
4.8 Durations between bug opening and the first patch (only tickets with durations of less than one hour are presented)
4.9 The transitions between “commit:add” and “owner:self” extracted from the process model
4.10 Incoming and outgoing activities of the tools and their occurrences
4.11 Social network extracted by the “Mine for a Working-Together Social Network” plugin
4.12 Social network extracted by the “Mine for a Similar-Task Social Network” plugin

5.1 Examples of external resources linked to Trac
5.2 Distribution of different types of external resources

A.1 Process model based on automatically extracted activities

List of Tables

2.1 Fields of a bug report

3.1 Classification of contributors’ activities during bug fixing

4.1 Activities extracted from Trac tickets
4.2 Activities automatically extracted from Trac and the related activities observed from labeling Trac and communication tools

5.1 How many Trac tickets refer to those tickets which belong to the same component
5.2 Percentage distribution of each type of external resources

Chapter 1

Introduction

Bug fixing is a vital activity in software development projects. According to a study conducted by Britton et al. [1], debugging occupies 49.9% of the programming time, which results in an estimated global cost of $312 billion per year including wages and overheads. Good bug fixing practices can help companies or development groups save substantial financial budgets and human resources. To enhance bug fixing productivity, a comprehensive understanding of bug fixing is necessary. Researchers have conducted numerous studies to investigate specific activities of bug fixing, such as duplicate bug report detection [2, 3, 4, 5, 6] and bug assignment [7, 8, 9, 10, 11]. Existing research has also recognized the critical role played by tools for bug fixing, such as issue tracking and version control systems [12, 13]. All of these studies show that bug fixing is a critical part of modern software development.

1.1 Research Statement and Scope

The overarching goal of this research is to provide developers with deep insights into modern bug fixing practices. We also expect that our research can inspire readers to reflect on how developers can improve the bug fixing process. To achieve this goal, we conduct a case study of WordPress1, which is one of the most popular open-source web software projects. Bug fixing is a broad topic that covers many aspects. To scope this research, the study has been limited to three areas: bug fixing activities, the bug fixing process and knowledge flow in bug fixing. We choose these three areas because they are fundamental elements of bug fixing: activities constitute the process, and the process is enriched by software engineering knowledge.

1.1.1 Bug Fixing Activities

Bug fixing activities are the basic units of the bug fixing process. During bug fixing, developers use the bug status to announce what activities they are conducting and have conducted. However, the bug status might be incomplete and erroneous [14]. A better approach for identifying developers’ activities during bug fixing is necessary. Our research regarding bug fixing activities is guided by the following two questions.

1https://wordpress.org/

RQ 3.1 What are the limitations of using the bug status to understand the bug fixing activities?

RQ 3.2 What activities are conducted during the bug fixing process?

1.1.2 Bug Fixing Process

Since manually labeling bug fixing activities is time-consuming and unscalable, we need to propose a new way to automatically extract activities from bug tickets. By mining bug fixing activities we can obtain the process workflow, analyze how tools are used and how bug fixing tasks are distributed to developers. Our research regarding the bug fixing process is motivated by the following three questions.

RQ 4.1 How can we identify the bug fixing process? What can we learn from the bug fixing process?

RQ 4.2 What are communication tools mainly used for during bug fixing?

RQ 4.3 How do community leads distribute tasks during bug fixing?

1.1.3 Knowledge Flow in Bug Fixing

During the bug fixing process, knowledge from both inside and outside the bug tracking system is an invaluable asset. Knowledge is the information, understanding, or skill that you get from experience or education2. In our study, we mainly consider the knowledge about repositories and tools, and we are interested in the knowledge flow between bug tickets and the knowledge flow between the bug tracking system and external resources. The research regarding knowledge flow in bug fixing is motivated by the following two questions.

RQ 5.1 Why and how are bug tickets connected on Trac?

RQ 5.2 What types of knowledge from external resources are shared on Trac?

1.2 Contributions

The main contributions of this work are as follows.

A new way to explore the bug fixing process. Our findings suggest that several defects exist when using the bug status to analyze the bug fixing practice. We have proposed a new way to extract the bug fixing activities, yielding valid results. Our experience can serve as the basis for studies on the bug fixing process.

New insights into the bug fixing practice and suggestions for improvement. We have demonstrated how to use process mining techniques for investigating the bug fixing process from different perspectives, including process workflows, tool use, and developers’ collaboration. The results obtained from the process are good supplements for current bug fixing studies. We also provide several suggestions for improving the bug fixing practice and tools.

2http://www.merriam-webster.com/dictionary/knowledge

Typologies. We have proposed several typologies in our study, which include typologies for bug fixing activities, reasons of bug references, and external resources on bug tracking systems. These typology models are the basis for further software engineering studies.

1.3 Thesis Organization

The remainder of this thesis is structured as follows. In Chapter 2, we present our study case, which includes the background information of WordPress, the reasons for choosing WordPress and the official procedures of bug fixing. The research is divided into three topics: bug fixing activities, the bug fixing process, and knowledge flow in bug fixing. Each topic is presented in a separate chapter: bug fixing activities are studied in Chapter 3, the bug fixing process is studied in Chapter 4, and the knowledge flow in bug fixing is studied in Chapter 5. We structure the content of each of these chapters as five sections. The first section, which is an introduction, provides an overview of the study conducted in the chapter. The second section describes background information about the topic as well as related academic studies. In the third section, we present the research questions and the methodology regarding data collection and analysis. The results are explained in the fourth section. The fifth section draws the conclusion and discusses limitations and the future work of the study. At the end of this thesis, Chapter 6 summarizes the whole thesis and sheds light on potential extensions of this research.


Chapter 2

Case Selection

In this chapter, we mainly describe our study case. We begin with an introduction of WordPress, a world-leading open-source content management system (CMS), and why we choose it for the case study. We then elaborate on how to fix a bug for WordPress, including the tools the community uses and the formal bug fixing workflow.

2.1 WordPress: World Leading Open-Source Software

In this section, we first introduce the WordPress software and why we choose it as our study case. Then we list some studies on WordPress.

2.1.1 What WordPress Is and Why We Choose It

WordPress1, initially released as a blogging system in 2003, is open-source web software. It has evolved to a full content management system with a great number of themes, widgets and plugins. We choose WordPress as our study case for the following reasons.

• WordPress is world leading web software. According to a market report by W3Techs2, a company which collects information about the technology used for websites, by July 30, 2016, “WordPress is used by 59.6% of all the websites whose content management system we know. This is 26.5% of all websites.”3 Studying how such successful software works may provide useful lessons for other software projects.

• WordPress has a large community. In addition to a massive number of online users, there are also a large number of offline meetups organized by the WordPress Community. By July 30, 20164, WordPress had already gathered 346,988 active members and 1,073 meetups. The community is also very diverse, as WordPress has been translated to 162 languages and dialects5.

1 https://wordpress.org/about/
2 https://w3techs.com/
3 https://w3techs.com/technologies/details/cm-wordpress/all/all
4 http://www.meetup.com/topics/wordpress/
5 https://make.wordpress.org/polyglots/teams/

• The WordPress community adopts modern tools. Tools such as SVN6, Trac7 and Slack8, which are used for coding and communication, generate abundant information for our study.

• WordPress provides easy access to archives. All the archives, including bug reports, chatting logs and code change histories, are available online for free, which allows us to collect the data easily.

• WordPress is widely studied in academia. In recent years, there has been an increasing amount of literature on WordPress. Section 2.1.2 lists some relevant studies.

Our study focuses on the analysis of the bug reports and communication tools the WordPress community uses. We choose them because they are the main channels the WordPress community uses to provide support for bug fixing.

2.1.2 WordPress in Academia

WordPress has been extensively studied by the research community, and the research has investigated both technical and social aspects.

A number of researchers have studied the interference effects of different units in WordPress. Koskinen et al. [15] investigated the interference between the resources declared and used by different WordPress plugins, and proposed an approach to detect and locate such interference. Sayagh et al. [16] further examined the interference caused by multi-layer configuration options in WordPress, WordPress plugins and the PHP system.

Security issues have also been discussed. Koskinen et al. [17] studied the vulnerability of WordPress plugins against well-known security exploits and found no correlation between the plugin ratings and the number of vulnerabilities.

Norrie et al. [18] surveyed WordPress developers to understand modern web development practices and claimed that it is necessary to have a stronger focus on interface-driven development and better support for reusing elements of design and implementation from other projects.

Vasilescu et al. [19] investigated technical support networks around CMS including WordPress and revealed the presence and participation of women in online communities.

2.2 How to Fix a Bug for WordPress

In this section, we first introduce the bug fixing workflow based on the WordPress guidelines for developers and the tools the WordPress community uses for bug fixing. We then illustrate an example of a bug report.

2.2.1 Workflow and Tool Use during Bug Fixing in WordPress

The WordPress community maintains a Core Contributor Handbook 9, which serves as a guidebook for developers who want to contribute to WordPress.

6 https://wordpress.org/plugins/about/svn/
7 https://core.trac.wordpress.org/
8 https://make.wordpress.org/chat/
9 https://make.wordpress.org/core/handbook/

According to the handbook, a workflow is followed and several tools are involved to successfully fix a bug (Figure 2.1).

[Figure 2.1 shows the workflow as a sequence of steps: report the bug; find or be assigned to a bug; copy code from SVN to local computer; implement the patch; upload the patch; review the patch; commit the patch to SVN; close the ticket. Several steps are labeled with Trac, and discussion on mailing list/IRC/Slack accompanies the workflow.]

Figure 2.1: Bug fixing workflow based on the WordPress handbook

We can summarize the workflow as follows.

1. Report the bug on Trac. When WordPress developers or users find a bug, they are supposed to report it on Trac10, the issue tracking system WordPress adopted for bug management.

2. Determine which bug to fix. Developers need to determine which bug to work on. Besides finding a bug to fix themselves, developers can also be assigned to a bug by others.

3. Work for a patch. Patches, simple text files showing the code changes, are the only way that a regular developer can contribute to the source code of WordPress. In other words, patches present how developers want to fix the bug. To create a patch, developers need to get a working copy of WordPress from SVN11, the version control system that WordPress uses to maintain the source code. Developers are then supposed to upload the patch they implement to Trac as required in the handbook.

4. Commit to the repository. After the reviewing team reviews the patch, developers with commit rights commit the patch to the code repository and close the bug ticket.

During bug fixing, developers can use communication tools for discussion. The communication tools they use have evolved from mailing lists12, to IRC13 and then to Slack14. In the workflow, developers use the bug status to represent which bug fixing stage they are at. There are six bug statuses in total: “new”, “accepted”, “assigned”, “reviewing”, “closed”, and “reopened”. The transitions between these statuses can be seen in Figure 2.2.

10 https://trac.edgewall.org/
11 https://subversion.apache.org/
12 https://codex.wordpress.org/Mailing_Lists
13 https://codex.wordpress.org/IRC
14 https://make.wordpress.org/chat/


Figure 2.2: Bug status transitions.

2.2.2 An Example of a Bug Ticket on Trac

Since Trac is the main bug management system WordPress adopts, here we give an example of a bug ticket on Trac.

A bug ticket consists of two parts: the bug report and the comments. Bug reports contain basic information about a bug, an example of which can be seen in Figure 2.3a. In the bug report, information is organized into different fields as shown in Table 2.1. This information is important for getting a rough idea of the bug. However, not all the fields are compulsory. When a bug is resolved, we can expect that the following fields are filled: “title”, “description”, “type”, “resolution”, “status”, “reported by”, “opened” and “closed”.

Comments follow the bug report. In the comments, developers can attach files, upload patches, and discuss the bug. Meanwhile, several tools such as SVN and Slack are integrated into Trac and their activities are presented as comments as well. As can be seen from the example in Figure 2.3b, when someone mentions the bug on Slack, the link to the chat log will be displayed on Trac as a bot comment.


(a) An example of a bug report

(b) Comments to the example bug report

Figure 2.3: An example of a bug ticket on Trac


Table 2.1: Fields of a bug report

Field | Description
Title | The title summarizes the bug.
Type | On Trac, not all tickets are bugs. A ticket can be one of the following types: defect (bug), enhancement, feature request, and task (blessed).
Resolution | A ticket can be closed as fixed, duplicate, invalid, worksforme, wontfix or maybelater.
Status | Status indicates what stage the bug is in, which can be new, assigned, reviewing, closed, reopened.
Reported by | “Reported by” indicates who reported the issue.
Owned by | The owner is used by committers and trusted core contributors to accept and assign tickets among themselves.
Opened | “Opened” is followed by the time when the ticket is created.
Closed | “Closed” is followed by the time when the ticket is closed.
Milestone | The milestone indicates the release where the ticket will be addressed.
Priority | The priority describes how serious the ticket is from the perspective of the WordPress project.
Severity | The severity describes how serious the ticket is from the perspective of the reporter.
Version | The version indicates when the issue arises.
Component | Components are well-defined, functional areas of WordPress.
Keywords | The keywords from a pre-defined list describe the current status or required actions for a ticket.
Focuses | Focuses are specific aspects of components, such as UI, docs, performance, etc.
Description | Description explains the detail of the ticket, which might include how to reproduce, actual versus expected results, etc.
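The expectation stated in Section 2.2.2, that a resolved bug has the fields “title”, “description”, “type”, “resolution”, “status”, “reported by”, “opened” and “closed” filled, can be checked mechanically. The following is a minimal sketch, assuming a ticket is represented as a plain dictionary keyed by lower-case field names from Table 2.1. The dictionary layout, the helper name missing_required_fields and the example ticket are hypothetical and are not part of Trac.

```python
# Fields that Section 2.2.2 expects to be filled once a bug is resolved.
REQUIRED_WHEN_RESOLVED = (
    "title", "description", "type", "resolution",
    "status", "reported by", "opened", "closed",
)


def missing_required_fields(ticket):
    """Return the required fields that are absent or empty in a ticket dict."""
    return [name for name in REQUIRED_WHEN_RESOLVED if not ticket.get(name)]


# Hypothetical ticket, invented for illustration only.
example_ticket = {
    "title": "Example title",
    "description": "Steps to reproduce ...",
    "type": "defect (bug)",
    "resolution": "fixed",
    "status": "closed",
    "reported by": "some_reporter",
    "opened": "2016-01-01",
    "closed": "",      # empty, so it is reported as missing
    "milestone": "",   # optional field, never reported
}

print(missing_required_fields(example_ticket))
```

Optional fields such as “milestone” are deliberately ignored, since Table 2.1 does not make them compulsory.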

Chapter 3

Bug Fixing Activities

3.1 Introduction

Developers’ activities are basic elements of the bug fixing process. Studying developers’ activities during bug fixing can help us gain insights into modern bug fixing practice and lay a foundation for further studies of developers’ collaboration patterns. In the WordPress community, Trac serves as the main bug management system, integrating several tools such as SVN and Slack. Since the archives of these tools are publicly accessible, developers’ activities embedded in these tools are transparent to a certain extent. In this chapter, we apply a qualitative exploratory case study methodology to answer the following research questions:

RQ 3.1 What are the limitations of using the bug status to understand the bug fixing activities?

RQ 3.2 What activities are conducted during the bug fixing process?

By mining Trac tickets, we identify the following limitations of the bug status: 1) the bug status is not always completely recorded; 2) the bug status sometimes is not recorded in the correct order or in time; 3) the wrong bug status might be used in the bug report; 4) the meaning of the bug status is not always clear to people. We also analyze the potential impacts of these limitations. After analyzing archives of Trac and the communication tools, we identify activities conducted by developers during bug fixing and classify them into eight categories. We also compare our results with previous studies, and our results show a higher emphasis on developers’ awareness, social and organizational activities. At the end of this chapter, we summarize our results, state the limitations of our research and propose the corresponding future work. Our concerns mainly focus on the threats to validity regarding the impacts of research sample selection, changes over time and differences between communities.

3.2 Related Work

A number of studies have been conducted to understand developers’ activities during bug fixing.

A bug’s life cycle consists of transitions between bug statuses, including self-loops. The bug status is frequently used as one of the important information sources to study developers’ activities. For example, Zhang et al. [20] combined information of the bug status and developers’ actions (including responding to tickets and editing source code) to analyze delays when fixing bugs. Weiss et al. [21] predicted the time spent on fixing bugs by investigating the average lifespan (i.e., the duration between the first and the last bug statuses) of similar, older bugs. Moreover, a number of researchers have explored particular activities based on the bug status. For instance, Jeong et al. [22] focused on the reassignment activity (i.e., the transition of the bug status from “assigned” to “assigned”), and Jongyindee et al. [23] focused on the reopening activity (i.e., the transition of the bug status from “closed” to “reopened”).

However, the bug status might not always be reliable. Aranda et al. [14] found that the bug report fields were sometimes filled incorrectly. This finding indicates that bias can be introduced if we only use the bug status to study developers’ activities.

Researchers have also investigated developers’ activities at a more detailed level instead of checking the bug status. D’Ambros et al. [24] regarded developers’ activities as modifications of bug fields. They visualized activities with black bars positioned based on when activities happened, such that people could easily understand the relationship between the activities and the entire bug lifespan. Crowston and Scozzi [25] identified developers’ activities by examining the text of the bug report and comments and classified these activities into six categories: submit, assign, analyze, fix, test&post and close. They also discovered how these categories depend on each other according to the order of occurrence.
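The status-based measures mentioned above (the lifespan between the first and the last bug statuses, the “assigned” to “assigned” reassignment, and the “closed” to “reopened” reopening) can be illustrated with a small sketch. It assumes that a bug’s history is available as a list of (timestamp, status) pairs; this representation, the function names and the example history are our own illustrative assumptions and do not reproduce the methods of the cited studies.

```python
from datetime import datetime


def lifespan(events):
    """Duration between the first and the last recorded status of one bug."""
    times = [timestamp for timestamp, _ in events]
    return max(times) - min(times)


def count_transition(events, source, target):
    """Count consecutive status pairs (source -> target) in time order."""
    statuses = [status for _, status in sorted(events)]
    pairs = zip(statuses, statuses[1:])
    return sum(1 for pair in pairs if pair == (source, target))


# Hypothetical status history, invented for illustration only.
history = [
    (datetime(2016, 1, 1, 9, 0), "new"),
    (datetime(2016, 1, 2, 9, 0), "assigned"),
    (datetime(2016, 1, 3, 9, 0), "assigned"),  # reassignment (self-loop)
    (datetime(2016, 1, 5, 9, 0), "closed"),
    (datetime(2016, 1, 6, 9, 0), "reopened"),  # reopening
    (datetime(2016, 1, 8, 9, 0), "closed"),
]

print(lifespan(history).days)
print(count_transition(history, "assigned", "assigned"))
print(count_transition(history, "closed", "reopened"))
```

Because all three measures depend only on the recorded statuses, any incorrectly filled status field propagates directly into them, which is the bias discussed above.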

3.3 Case Study Design

3.3.1 Research Questions

To understand the bug fixing activities, bug reports are an important source of information since they record bug statuses, bug reporter and developers’ communication, as well as links to various software artifacts.

Researchers have used information from bug reports to mine the bug fixing process. However, Aranda et al. [14], who investigated bug reports of Microsoft projects, claimed that bug reports were not always correctly recorded. Since developers of open-source software might be located around the globe and the bug reports are accessible by the public, they might need more detailed bug reports to collaborate and to help newcomers get onboard. Therefore, we wonder if the bug reports for open-source software can precisely reflect the bug fixing process. Here, we propose our research question

RQ 3.1 What are the limitations of using the bug status to understand the bug fixing activities?

From the results of RQ 3.1 we know bug statuses have many limitations when used for understanding the bug fixing activities. Additionally, the statuses are limited in number and it is impossible to interpret developers’ activities during transitions between these statuses. Currently, most bug fixing process studies focus on the bug statuses, which provide very limited insights into human activities. Although Crowston et al. [26] have investigated the tasks involved in bug fixing, their study is only based on bug reports. Since some activities are conducted in other channels such as communication tools [14], we can assume that tasks extracted from only bug reports are incomplete and cannot comprehensively present the entire bug fixing process. Thus, we want to combine different information sources (i.e., Trac, mailing list, IRC and Slack) and study the process from a deeper perspective. We are interested in the following question:

RQ 3.2 What activities are conducted during the bug fixing process?

3.3.2 Data Collection

As mentioned, the main goal of this chapter is to offer important insights into bug fixing activities. Qualitative exploratory case study methodology [27] is employed to answer the questions presented in Section 3.3.1. We mined data from the public archives of Trac, mailing list, IRC and Slack of the WordPress Project. To obtain the data, we designed a crawler to scrape the whole database of the Trac system (until May 30, 2016) and downloaded mailing list archives1. We did not crawl the IRC and Slack chatting logs since crawling is prohibited according to their robots exclusion protocol, namely robots.txt. Therefore, we manually stored the related data from IRC and Slack when necessary. In our study, we used the bugs reported in the most recent 10 years (from May 30, 2006 to May 30, 2016). In total, there were 21,178 bug tickets which fell into this range.
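The two constraints described above, respecting robots.txt and restricting tickets to the study period, can be sketched as follows. This is a minimal illustration and not the crawler used in the study: the robots.txt content, the example.org URLs and the function names are hypothetical, and treating both boundary dates as inclusive is our assumption. The sketch uses the standard-library urllib.robotparser module and parses rules from memory, so no network access is needed.

```python
from datetime import date
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
EXAMPLE_ROBOTS = [
    "User-agent: *",
    "Disallow: /archives/",
]


def allowed_to_crawl(robots_lines, url, user_agent="*"):
    """Check a URL against robots.txt rules given as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Study period stated in the text; inclusive bounds are an assumption.
STUDY_START = date(2006, 5, 30)
STUDY_END = date(2016, 5, 30)


def in_study_period(reported_on):
    """Return True if a ticket's report date falls inside the study period."""
    return STUDY_START <= reported_on <= STUDY_END


print(allowed_to_crawl(EXAMPLE_ROBOTS, "https://example.org/archives/2016"))
print(allowed_to_crawl(EXAMPLE_ROBOTS, "https://example.org/ticket/1"))
print(in_study_period(date(2010, 1, 1)))
```

A crawler built along these lines would skip any URL for which allowed_to_crawl returns False, which corresponds to the decision above to store IRC and Slack data manually.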

3.3.3 Data Analysis Process

Limitation of the Bug Status

To answer RQ 3.1, we selected 150 bug tickets as our sample. To examine the limitations of the bug status for understanding bug fixing activities, we needed to ensure that each bug ticket contained enough information. Thus, we focused on bugs with abundant comments. Since the median number of comments over all closed bug tickets was 5.0, we decided to study only bugs with more than five comments. The selected bug tickets were also the latest ones as of the date when the database was collected (i.e., May 30, 2016). The data analysis process for RQ 3.1 consists of two phases.

1. For each of the 150 selected tickets, we carefully compared the ticket comments with the recorded status changes. Our goal was to examine whether the activities embedded in the bug tickets could be precisely represented by the bug status. To achieve this goal, we read all comments to understand the context surrounding the bug. When we found that the status should have changed at a certain point, we checked whether the change was correctly recorded in the ticket. If not, we wrote down the ticket number and the type of mistake. After examining all tickets, we summarized these mistakes.

2. We wrote a script to list all the status change paths of all 19,031 closed bug tickets. We then checked whether there were any abnormal patterns in the paths when compared with the official workflow (Figure 2.1). Our aim was to find problems with the bug status at a high level.
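As an illustration only, the sampling rule and the second phase could be sketched as follows. The ticket records, their field names and the set of allowed transitions are assumptions made for this example: the real allowed transitions are those of the official workflow in Figure 2.1, and this is not the script used in the study.

```python
from collections import Counter
from statistics import median

# Hypothetical ticket records; the field names are assumptions for this
# sketch and do not reflect Trac's actual schema.
TICKETS = [
    {"id": 1, "closed": True, "comments": 9, "created": "2016-05-01",
     "statuses": ["new", "assigned", "closed"]},
    {"id": 2, "closed": True, "comments": 3, "created": "2016-05-10",
     "statuses": ["new", "closed"]},
    {"id": 3, "closed": True, "comments": 7, "created": "2016-05-20",
     "statuses": ["new", "closed", "new"]},
    {"id": 4, "closed": False, "comments": 12, "created": "2016-05-25",
     "statuses": ["new", "accepted"]},
    {"id": 5, "closed": True, "comments": 8, "created": "2016-05-28",
     "statuses": ["new", "closed", "reopened", "closed"]},
]

# Assumed allowed transitions, for illustration only.
ALLOWED = {
    ("new", "assigned"), ("new", "accepted"), ("new", "closed"),
    ("assigned", "closed"), ("accepted", "closed"),
    ("closed", "reopened"), ("reopened", "closed"),
}


def select_sample(tickets, size):
    """Latest closed tickets with more comments than the median."""
    closed = [t for t in tickets if t["closed"]]
    threshold = median(t["comments"] for t in closed)
    rich = [t for t in closed if t["comments"] > threshold]
    # ISO-formatted dates sort correctly as plain strings.
    rich.sort(key=lambda t: t["created"], reverse=True)
    return rich[:size]


def status_paths(tickets):
    """Count how often each status change path occurs in closed tickets."""
    return Counter(" -> ".join(t["statuses"]) for t in tickets if t["closed"])


def abnormal_transitions(tickets, allowed):
    """Count transitions in closed tickets that the workflow does not allow."""
    found = Counter()
    for t in tickets:
        if not t["closed"]:
            continue
        statuses = t["statuses"]
        for pair in zip(statuses, statuses[1:]):
            if pair not in allowed:
                found[pair] += 1
    return found


sample = select_sample(TICKETS, 1)
paths = status_paths(TICKETS)
abnormal = abnormal_transitions(TICKETS, ALLOWED)
```

In this toy data, ticket 3 moves from "closed" back to "new" without passing through "reopened", which is the kind of abnormal pattern the second phase is meant to surface.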

¹ https://codex.wordpress.org/Mailing_Lists

Activities during Bug Fixing

Since activities may differ between channels, to get a comprehensive understanding of what developers do during bug fixing we need to consider not only the bug tracking system Trac, but also other tools for communication, including the mailing list, IRC and Slack. These tools are adopted by the WordPress community for bug fixing and their archives are publicly accessible. We divided Trac tickets into four categories: 1) tickets with links to Slack discussions; 2) tickets with links to IRC discussions; 3) tickets with links to mailing list discussions; and 4) tickets without links to any communication tools. To answer RQ 3.2, we selected 30 tickets from each category with the same selection criteria as for RQ 3.1. We adopted an inductive approach [27] to analyze the data. We read the comments on Trac and the messages on the communication channels, extracted the activities from the text and wrote these activities down when they first emerged. After collecting all activities, we checked whether any of them could be merged and obtained a consolidated list of activities. We then categorized them to obtain a clearer structure. Since Aranda et al. [14] already proposed a list of goals based on contributors’ activities, we reused these goals and classified the activities we observed into their categories. The categories include:

• Discovery. Detect the unexpected behavior and record it as a bug.
• Diagnosis. Understand the nature, cause and impact of the bug and plan the actions to take for fixing the bug.
• Assignment. Determine the person who will be responsible for the bug.
• Search. Find the related knowledge, resources, skills and experts for the bug.
• Correction. Correct code and change relevant artifacts such as documentation.
• Closure. Decide to close the ticket.
• Awareness. Communicate status to relevant participants.

During the analysis process, we classified activities that did not fall into any of these categories as “others”.

3.4 Results

3.4.1 What are the limitations of using the bug status to understand the bug fixing process?

To answer RQ 3.1, we sampled 150 bug tickets and compared the bug status with the ticket comments. We also checked the status change paths of all closed bug tickets. We summarized the issues with the bug status and categorized them into four main types: 1) Completeness, 2) Synchrony, 3) Correctness, and 4) Explicitness. Below we present our typology of issues regarding the bug status, together with some typical examples.

Completeness. We examined whether all statuses were recorded. We found that incomplete status records were common in practice. For example, in ticket #36637, an owner was assigned to the bug but the status never changed to “assigned”. Another example is ticket #36906: some bug fixing activities were conducted after the ticket status was set to “closed”; that is, a “reopened” status is missing. We also found that the statuses “accepted” and “reviewing” were not widely used, appearing in only 27 and 22 out of 150 tickets, respectively, although accepting and reviewing tickets were common actions during bug fixing.


Synchrony. We examined whether the status was recorded at the right time. We found two patterns that violate synchrony: 1) The status was not updated in time after the corresponding activity was executed. For example, ticket #36296 was not closed after the commit and stayed inactive for several weeks; it was finally closed after a reminder from another developer. 2) The status was set before the corresponding activity was finished. For example, the status of ticket #36055 was set to “assigned” while the developer was still asking whether another developer could take charge of it.

Correctness. We examined whether the recorded status was correct. We identified two patterns that violate correctness: 1) A status was recorded although the corresponding event did not happen. For example, the status of ticket #36660 was set to “assigned” although the ticket was not assigned to anyone. 2) The developer selected a wrong status instead of the correct one. For instance, the status of ticket #8570 was changed from “reopened” to “new” after an owner was assigned to the ticket.

Explicitness. We examined whether the recorded status was explicit and unambiguous. The results show that the same status might be used to record different activities. For example, in ticket #36096, the status “accepted” meant that developers acknowledged the validity of the ticket and agreed to implement a patch, while in ticket #35956, the status “accepted” meant that developers accepted the patch and agreed to commit it.

The issues presented above can undermine the validity of studying the bug fixing process through bug statuses. When trying to discover a bug fixing model, incomplete status records lead to an incomplete process workflow. Asynchronous bug statuses can introduce bias into time-based process analyses such as bottleneck detection. Incorrect and ambiguous statuses may also result in misinterpretation of the bug fixing process.

Moreover, the bug status can provide only very shallow insights into the bug fixing process, since merely six statuses are in use. These statuses do not cover all of the developers’ activities, and what happens between statuses remains unknown.

3.4.2 What activities are conducted during the bug fixing process?

To answer RQ 3.2, we sampled 120 bug tickets and identified eight main types of activities that contributors conduct during bug fixing: (1) Discovery, (2) Diagnosis, (3) Assignment, (4) Search, (5) Correction, (6) Closure, (7) Awareness, and (8) Others. Table 3.1 lists these activities and explains the classification criteria.

Table 3.1: Classification of contributors’ activities during bug fixing

Activity | Explanation

Discovery
present the issue | Use text/screenshots to describe the issue encountered as well as its environment.
request for further explanation of the issue | Request bug reporter to provide more details about the issue.
refine the issue description/summary | Modify the description or summary of the ticket.
discover new problems | Discover new bugs during the bug fix process.
discover remaining problems | Discover what problems still exist after applying the attempted fix.

Diagnosis
discuss the nature/cause | Discuss what causes the issue, when the issue starts to appear and what type (defect, enhancement, etc.) and component (template, multisite, etc.) the issue belongs to.
discuss how to duplicate the issue | Discuss how to reproduce the issue, including environment settings and duplication steps.
accept the ticket | Accept the ticket and prepare to work on it.
discuss the solution | Discuss how to solve the issue; the solution can also include planned actions, the impact of the solution and guidelines for the solution.
discuss the impact of the issue | Discuss how severe and important the issue is, how difficult it is to solve, and what problems will arise due to the issue.
discuss the scope of the new problems | Decide if the new issue found during the bug fixing is related to the current issue.
link to related/duplicated issues | Discover related/duplicated issues.
discuss when to fix | Discuss by which version or which date the issue should be fixed.
Q&A related technical details | Discuss, introduce, or learn related techniques and specific skills.
check the issue status | Check the current status of the issue.
reopen the ticket | Reopen closed tickets.

Assignment
request somebody to execute a task | Ask someone if he/she could deal with an issue or test/review the patch.
claim a task for themselves | A contributor claims he/she will do or is doing some task, or accepts a request.
assign a task to somebody | A contributor assigns a task to someone else.
discuss contributors responsibility | Discuss which contributor takes charge of which part of the issue.

Search
look for related information | Discover information which helps solve the problem, including online tutorials, documents, previous experience, etc.
look for/refer to a relevant person | Discover people who are familiar with the issue.

Correction
add a patch | Upload a patch to Trac.
explain a patch | Use text/screenshots to explain the patch (including what the patch does and its impact).
request a new/updated patch | Request patch author to update the patch.
commit to fix this bug | Commit the patch to the source code repository.
link to relevant commit | Refer to the commit which fixes a relevant bug.
decide which patch to commit | Discuss which patch to use for fixing the bug.
request for further explanation of the patch | Request patch author to give more information about the patch.
request for a review/test | Request contributors to review or test the patch.
review/test a patch | Give opinions/suggestions/feedback on a patch/commit.
update documentation | Update related documentation.

Closure
decide whether to close | Discuss if the issue should be closed, or suggest closing the ticket.
close the ticket | Close the ticket and set the resolution.
link to follow-up tickets | Refer to new tickets for remaining/newly discovered problems.

Awareness
announce the bug status | Announce what activities have been finished, e.g., announce a patch has been uploaded.
announce the contributor’s status | Announce if the contributor is available.
introduce the ticket into discussion group | A contributor posts the ticket number or the link of the issue to start a discussion, and a bot displays the summary of the ticket.
remind others to do something | Remind other contributors to execute some tasks.

Others
organize a talk/meeting | Organize a talk or a meeting to discuss relevant problems or make the plan for bug fixing.
express emotion/appreciation | Express personal feelings or thank others for their help/work.
welcome new members | Welcome people who open their first ticket and introduce Trac and the WordPress community.
select first bugs for new members | Select first bugs for new members to fix before they deal with complex issues.
skip the issue temporarily | Make a decision to deal with other tickets first and put the current ticket aside for a while.
make unrelated mistakes and correct them | Post irrelevant comments because of misunderstanding the ticket, and then clarify the mistake.

In our study, we used the categories from Aranda et al. [14] to classify developers’ activities during bug fixing. Our results indicate that their categories do not cover all the activities. The uncovered activities are mostly related to social and organizational aspects, e.g., “organize a talk/meeting” and “express emotion/appreciation”. Open-source software projects usually have a higher developer turnover rate than closed-source commercial software projects, and helping newcomers get on board is an important activity for keeping the community prosperous. Therefore, new activities related to onboarding newcomers, such as “select first bugs for new members” and “welcome new members”, are to be expected.

Compared to the typology proposed by Crowston and Scozzi [25], we considered activities related to developers’ awareness, which they did not. Maintaining group awareness is vital for smooth and effective collaboration [28]; thus, it is worthwhile to pay attention to awareness-related activities. Moreover, like the categories from Aranda et al., the typology of Crowston and Scozzi does not include social and organizational activities.

Furthermore, we found that several activities were conducted more often on the communication tools. For example, contributors were more likely to “look for/refer to a relevant person” and “request somebody to execute a task” on the communication tools. Communication tools also allow developers to “remind others to do something”, which did not happen on Trac.

3.5 Conclusions and Future Work

3.5.1 Conclusions

In this study, we first investigate the limitations of bug statuses for understanding developers’ activities and find four types of limitations, in terms of completeness, synchrony, correctness and explicitness. We then analyze what activities developers conduct during bug fixing and compare our results with other research. Our new typology can serve as a necessary supplement to previous studies.

3.5.2 Limitations

Although our findings are insightful, some limitations remain. First, the number of samples analyzed is not large, which might introduce bias into our results. Second, we did not consider how bug fixing practices evolve when studying developers’ activities, although the way developers collaborate to fix bugs today might differ from 10 years ago. Since we only studied the latest bugs that met our selection criteria, activities that were common only long ago might not have been revealed. Third, different communities might have different regulations and standard procedures for bug fixing, so the results would not be exactly the same if we applied our methods to other communities.


3.5.3 Future Work

Although we have identified the limitations of the bug status for studying developers’ activities, we did not propose a solution to address the problem. We expect future work on the automatic identification and correction of wrong statuses. We would also be interested in studying whether developers’ activities evolve over time and whether the adoption of new tools has changed their bug fixing practice. As a continuation of this study, we also anticipate applying our methods to other open-source software communities and examining the generalizability of our results.

Chapter 4

Bug Fixing Process

4.1 Introduction

Understanding the bug fixing process helps us learn which software development practices are followed in modern projects. Mining the process can help us validate the designed process model [29] and discover inefficiencies and inconsistencies during bug fixing [30]. In this chapter, we investigate the following research questions:

RQ 4.1 How can we identify the bug fixing process? What can we learn from the bug fixing process?

RQ 4.2 What are communication tools mainly used for during bug fixing?

RQ 4.3 How do community leads distribute tasks during bug fixing?

To identify the bug fixing process, we first extract activities automatically from bug tickets and convert these activities into an event log. We feed the event log to process mining tools and obtain the process model of bug fixing. We analyze the process model and identify the bottlenecks as well as other notable phenomena. To understand what communication tools are mainly used for, we analyze the incoming and outgoing activities of the tools and select the top five activities. We find that Slack and IRC are used in a similar way, but Slack outperforms IRC in bug assignment. To understand how community leads distribute tasks during bug fixing, we use social network analysis to examine whether community leads are divided into several small groups that fix the same bugs or execute similar tasks. The results indicate that community leads do not follow a clear task distribution policy regarding whom to work with and what tasks to execute. At the end of this chapter, we summarize our results, present the limitations of our research and describe potential future work.

4.2 Related Work

Process mining is a technique “to discover, monitor and improve real process by extracting knowledge from event logs readily available in today’s (information) systems” [31]. The discovered model of process mining can be either a process model (e.g., a Petri net [32], BPMN [33]) or models that describe other perspectives (e.g., a social network [34]).


In this section, we present the theory, related academic studies, and tools of two commonly used techniques in process mining: process model discovery and social network analysis.

4.2.1 Process Model Discovery

Theory

Process model discovery techniques extract a process model by automatically analyzing imported event logs. Each item in an event log is supposed to contain four elements: 1) case (i.e., a process instance), 2) activity (i.e., a well-defined step), 3) performer (i.e., the person who initiates or executes the activity), and 4) timestamp [35]. Several process model discovery algorithms have been proposed for mining event logs, including the α-algorithm [36], heuristics miner [37] and genetic miner [38]. However, since real-life processes are less structured and more complex, the discovered models are often “spaghetti-like”. Therefore, a new process mining approach, the fuzzy miner [39], was proposed to adaptively simplify the process.
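To make the four elements concrete, the following minimal sketch represents an event log and counts how often one activity directly follows another within the same case. Such directly-follows counts are a common starting point for discovery algorithms; the sketch is not an implementation of any of the miners named above, and the tickets, performers and timestamps are invented for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    case: str            # process instance, e.g. a ticket id
    activity: str        # well-defined step
    performer: str       # person who initiates or executes the activity
    timestamp: datetime  # when the activity happened


def directly_follows(events):
    """Count pairs (a, b) where b directly follows a within one case."""
    traces = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        traces[event.case].append(event.activity)
    counts = Counter()
    for activities in traces.values():
        counts.update(zip(activities, activities[1:]))
    return counts


# Invented example log with two cases.
log = [
    Event("#1", "status:new", "alice", datetime(2016, 5, 1, 9, 0)),
    Event("#1", "patch:add", "bob", datetime(2016, 5, 1, 10, 0)),
    Event("#1", "status:closed", "carol", datetime(2016, 5, 2, 9, 0)),
    Event("#2", "status:new", "dave", datetime(2016, 5, 1, 9, 30)),
    Event("#2", "status:closed", "carol", datetime(2016, 5, 1, 11, 0)),
]
df = directly_follows(log)
```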

Process Model Discovery in Software Engineering

Process mining can help software engineers better understand, analyze and optimize their software processes [40]. Poncin et al. [41] mined bug status changes to discover the bug life cycle. The results provided an initial view on how bugs were handled in practice and suggested the actual bug life cycle is different from the life cycle prescribed by an official guide. Gupta et al. [30] also employed process mining techniques to discover the bug fixing process. They compared the designed process with the real process and identified the inconsistencies between them. Moreover, they identified self-loops, back-forth transitions, and bottlenecks. They also applied process mining to the event logs extracted from multiple software repositories including bug tracking system, peer code review system and version control system [42]. Similarly, bottlenecks and some interesting bug fixing patterns were discovered.

Tools

Several tools have been developed to discover process models. One of the most popular process mining tools is ProM [43]. ProM is an open-source framework for integrating process mining techniques, which are implemented as plugins. It provides a standard environment in which users can easily install or update plugins, work with a clear GUI, import data, and export results. For process model discovery, ProM provides plugins for most state-of-the-art algorithms, including the α-algorithm, heuristics miner, genetic miner and fuzzy miner. Disco [44] is another popular process mining tool. Its core functionality is the automatic discovery of process models from imported event logs. Its process discovery algorithm is based on an optimized fuzzy miner. With Disco, users can easily check process statistics, filter event logs, analyze performance based on duration, and visualize the process flow over time.


4.2.2 Social Network Analysis

Theory

The first step of social network analysis is to extract the social network from event logs. There are several metrics for discovering the social network [45], for example, the reassignment metric (i.e., whether an activity is frequently reassigned from one individual to another), the working-together metric (i.e., whether performers frequently work on the same cases), and the similar-task metric (i.e., whether performers frequently execute the same activities). After obtaining the social network, many metrics are available to measure it [46], and different metrics serve different purposes. For example, degree, which is the number of vertices connected to a vertex, can be used to understand how popular a vertex is. Distance centrality, which measures the proximity of a vertex to the rest of the network, can be used to understand the ability of the vertex to influence other vertices.
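The sketch below illustrates these ideas on an invented event log, using only the Python standard library. It builds a simplified working-together network (edge weight = number of cases in which two performers both appear, which is cruder than the normalized metrics in the literature) and then computes degree and one common formulation of distance (closeness) centrality. All names and data are assumptions for illustration.

```python
from collections import defaultdict, deque
from itertools import combinations

# Invented event log as (case, activity, performer) tuples.
log = [
    ("#1", "patch:add", "alice"),
    ("#1", "commit:add", "bob"),
    ("#2", "patch:add", "alice"),
    ("#2", "commit:add", "bob"),
    ("#2", "status:closed", "carol"),
    ("#3", "patch:add", "dave"),
    ("#3", "status:closed", "carol"),
]


def working_together(events):
    """Edge weight = number of cases in which both performers took part."""
    members = defaultdict(set)
    for case, _activity, performer in events:
        members[case].add(performer)
    weights = defaultdict(int)
    for people in members.values():
        for a, b in combinations(sorted(people), 2):
            weights[(a, b)] += 1
    return dict(weights)


def neighbours(weights):
    """Undirected adjacency sets derived from the weighted edges."""
    adj = defaultdict(set)
    for a, b in weights:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def degree(adj, node):
    """Number of vertices directly connected to the node."""
    return len(adj[node])


def closeness(adj, node):
    """Reachable vertices divided by the sum of shortest-path lengths (BFS)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for nxt in adj[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


weights = working_together(log)
adj = neighbours(weights)
```

Here "carol" is connected to everyone and therefore has the highest degree and closeness, while "dave" only reaches the others through "carol".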

Social Network Analysis in Software Engineering

Software development can be modeled as self-organizing, collaborative social networks rather than random networks [47]. Social networks are commonly used to represent developers’ collaboration, which is often related to the success of the software. Meneely et al. [48] used social network analysis to understand the relationship between developers’ collaboration and software failures and found a strong correlation between social network measures and failures. Wolf et al. [49] converted team communication structures into a social network and predicted whether an integration would fail based on relevant network structure metrics. Sureka et al. [50] demonstrated how social network analysis could be used to discover potential risks and vulnerabilities in the software. Social networks are also used to analyze the structure of software projects. Crowston et al. [51] analyzed the social network of interactions represented in 62,110 bug reports and found a negative relationship between the level of centralization and the project size, which indicated that larger projects were more modular. Howison et al. [52] inquired into the social structure of development teams over time. They found that changes in the distribution of centralization are uncommon and that large project teams do not change more frequently.

Tools

Several tools have been developed to extract and analyze social networks. The process mining tool ProM [43] can also be used to extract the social network from event logs. Other tools are widely used for social network analysis as well. Unlike ProM, these tools cannot analyze event logs directly; users need to generate the social network themselves, and the tools then assist in visualizing and analyzing it. NetworkX [53] is a Python package for social network analysis, which provides data structures for representing networks and can be used to calculate network properties and structure measures such as shortest paths and betweenness centrality. Gephi [54] is another popular tool for social network analysis. Gephi provides abundant modules for social network analysis. Its most important advantage over NetworkX is that it provides a user-friendly GUI, which makes the analysis process more interactive and convenient.


4.3 Case Study Design

4.3.1 Research Questions

Process mining, a technique for discovering processes, aims to automatically construct models that explain the behavior observed in an event log [55]. Understanding the bug fixing process can help us discover abnormal behaviors, optimize the bug fixing workflow and improve productivity. Therefore, we propose the following questions:

RQ 4.1 How can we identify the bug fixing process? What can we learn from the bug fixing process?

Another aspect of the bug fixing process is the tools developers use. Vyas et al. [56] carried out fieldwork in a multinational software team to understand the use of a bug tracking system through interviews and in-situ observations. However, the use of other tools for bug fixing has received scant attention. There are many studies on tools for software development [57, 58, 59], but few of them have investigated tool use for bug fixing. Communication tools such as the mailing list, IRC and Slack are widely used for discussing bugs in the WordPress community. We are curious what role these communication tools play during bug fixing. Thus, we propose the question:

RQ 4.2 What are communication tools mainly used for during bug fixing?

The WordPress community is made up of two main groups: the community leads and the community volunteers. The community leads consist of the WordPress co-founder, lead developers and core developers with permanent commit access¹. These community leads are the driving force of the community, and their management ensures that the community runs smoothly and successfully. We are curious about their managerial coordination; that is, we have the following question:

RQ 4.3 How do community leads distribute tasks during bug fixing?

More specifically, we would like to investigate 1) whether they are divided into several small groups which deal with bugs together; and 2) whether they are assigned to specific tasks during the bug fixing process.

4.3.2 Data Collection and Analysis

To study the bug fixing process, we used the 10 years of Trac ticket data collected in Section 3.3.2. Our approaches to analyzing the data and answering the research questions are as follows.

¹ https://make.wordpress.org/core/handbook/about/organization/#the-wordpress-core-team

How can we identify the bug fixing process? What can we learn from the bug fixing process?

To answer RQ 4.1, we used process mining techniques to discover the process model of bug fixing. From RQ 3.1 we learned that using only bug statuses as activities to discover the bug fixing process has many limitations; thus, we intended to create new events that provide deeper insights into bug fixing and diminish the impact of bias in bug statuses. Since producing a generalized process model requires a large number of cases, it would be inefficient and not scalable to manually label all the tickets with activities. Therefore, we automatically extracted developers’ actions and changes of ticket fields and then converted them into activities.

For developers’ actions, we formatted the activity as “object:action”. For example, when a developer uploaded a patch, we extracted this activity and recorded it as “patch:add”. For ticket field changes, we formatted the activity as “field name:field content”. For example, when a developer changed the “resolution” field to “fixed”, we recorded this activity as “resolution:fixed”. However, for some fields, we did not care what content the field was set to, because we only considered developers’ activities. For example, when a user set the milestone for a Trac ticket, we cared more about the developer’s action than the exact version he/she set. Thus, we recorded this activity as “milestone:set”. Another reason for this abstraction was that the process model would be extremely complex if we created a new activity for each milestone. We also divided the changes of the “owner” field into “owner:self” and “owner:other” based on whether the developer assigned the Trac ticket to himself/herself. The complete list of activities extracted can be found in Table 4.1.
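The naming rules above can be sketched as a small mapping function. The shape of the change record (`kind`, `object`, `action`, `field`, `value`) and the list of value-agnostic fields are assumptions made for this illustration; they are read off the activity labels in Table 4.1 and do not describe the actual extraction script or Trac's data model.

```python
# Fields whose concrete value is abstracted to "<field>:set" (assumed list,
# based on the "*:set" activity labels in Table 4.1).
VALUE_AGNOSTIC = {"milestone", "component", "focus", "priority", "version"}


def to_activity(change, actor):
    """Map one hypothetical change record to an activity label."""
    if change["kind"] == "action":
        # Developer action, e.g. uploading a patch -> "patch:add".
        return f'{change["object"]}:{change["action"]}'
    field, value = change["field"], change["value"]
    if field in VALUE_AGNOSTIC:
        # Only the act of setting the field matters, not its content.
        return f"{field}:set"
    if field == "owner":
        # Distinguish self-assignment from assigning someone else.
        return "owner:self" if value == actor else "owner:other"
    if field == "summary":
        return "summary:change"
    # Default for field changes: "field name:field content".
    return f"{field}:{value}"


examples = [
    to_activity({"kind": "action", "object": "patch", "action": "add"}, "alice"),
    to_activity({"kind": "field", "field": "resolution", "value": "fixed"}, "alice"),
    to_activity({"kind": "field", "field": "milestone", "value": "4.6"}, "alice"),
    to_activity({"kind": "field", "field": "owner", "value": "alice"}, "alice"),
    to_activity({"kind": "field", "field": "owner", "value": "bob"}, "alice"),
]
```

Collapsing all milestone values into a single label keeps the number of distinct activities small, which is what keeps the mined process model readable.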

Table 4.1: Activities extracted from Trac tickets

Activity | Meaning | Frequency | Ratio
commit:add | A patch has been committed to the code repository | 22,756 | 11.71%
component:set | The component of the ticket has been set | 4,679 | 2.41%
focus:set | The focus of the ticket has been set | 1,111 | 0.57%
keyword:2nd-opinion | An opinion from another person is needed | 618 | 0.32%
keyword:close | The ticket is ready for closure | 1,109 | 0.57%
keyword:commit | The patch has been reviewed and tested by a trusted member and is ready to commit | 2,761 | 1.42%
keyword:dev-feedback | Feedback from a core developer or trusted members is needed | 997 | 0.51%
keyword:early | The ticket is a priority, and should be handled early in the next release cycle | 548 | 0.28%
keyword:good-first-bug | A good starting point for new contributors before tackling complicated bugs | 134 | 0.07%
keyword:has-patch | A patch has been uploaded to Trac | 7,730 | 3.98%
keyword:has-screenshots | Screenshots have been uploaded to Trac (corresponding to keyword:needs-screenshots) | 80 | 0.04%
keyword:has-unit-tests | Unit tests have been added to the patch (corresponding to keyword:needs-unit-tests) | 112 | 0.06%
keyword:i18n-change | A string has been changed and translators need to be notified | 50 | 0.03%
keyword:needs-codex | Documentation in the Codex needs updating or expanding | 67 | 0.03%
keyword:needs-docs | Inline documentation for the code is needed | 64 | 0.03%
keyword:needs-patch | A patch is needed | 2,468 | 1.27%
keyword:needs-refresh | The patch needs to be merged and re-submitted because it has become invalid due to changes in the source code | 371 | 0.19%
keyword:needs-screenshots | Screenshots are needed because the UI has been changed by a commit/patch | 25 | 0.01%
keyword:needs-testing | The patch needs to be tested | 1,274 | 0.66%
keyword:needs-unit-tests | The ticket has been reviewed and found to be desirable to solve, but needs more unit tests due to high risks of causing other issues | 418 | 0.22%
keyword:reporter-feedback | Feedback from the bug reporter is needed | 2,111 | 1.09%
keyword:ui-feedback | Feedback regarding user interface from the core team is needed | 136 | 0.07%
keyword:ux-feedback | Feedback regarding user experience from the core team is needed | 107 | 0.06%
milestone:set | The milestone of the ticket was set | 23,625 | 12.16%
owner:other | A contributor has assigned the ticket to others | 2,722 | 1.40%
owner:self | A contributor has assigned the ticket to himself/herself | 6,449 | 3.32%
patch:add | A patch has been uploaded by a contributor | 23,091 | 11.89%
priority:set | The priority of the ticket is set | 1,575 | 0.81%
resolution:duplicate | The resolution of the ticket is set to duplicate | 3,314 | 1.71%
resolution:fixed | The resolution of the ticket is set to fixed | 16,725 | 8.61%
resolution:invalid | The resolution of the ticket is set to invalid | 4,449 | 2.29%
resolution:maybelater | The resolution of the ticket is set to maybelater | 144 | 0.07%
resolution:wontfix | The resolution of the ticket is set to wontfix | 1,629 | 0.84%
resolution:worksforme | The resolution of the ticket is set to worksforme | 2,026 | 1.04%
status:accepted | The ticket is accepted | 909 | 0.47%
status:assigned | The ticket is assigned | 1,986 | 1.02%
status:closed | The ticket is closed | 24,647 | 12.69%
status:new | The ticket is created | 21,339 | 10.99%
status:reopened | The ticket is reopened | 3,469 | 1.79%
status:reviewing | The ticket is under review | 620 | 0.32%
summary:change | The summary of the ticket has been changed | 1,990 | 1.02%
version:set | The version of the ticket has been set | 3,820 | 1.97%

Meanwhile, we mapped the activities extracted from Trac to the activities we observed in Section 3.1, so that we could see which activities are revealed and which remain hidden when an automatic extraction method is used. The mapping scheme is shown in Table 4.2. Although the automatically extracted activities do not cover all the activities we labeled, they greatly enrich the set of activities represented by bug statuses alone. We extracted process models based on these automatically extracted activities.

Table 4.2: Activities automatically extracted from Trac and the related activities observed from labeling Trac and communication tools

Automatically generated activity | Corresponding manually labeled activity
commit:add | commit to fix this bug
component:set | discuss the nature/cause
focus:set | discuss the nature/cause
keyword:2nd-opinion | request for a review/test
keyword:close | decide whether to close
keyword:commit | decide which patch to commit
keyword:dev-feedback | request for a review/test
keyword:early | discuss when to fix
keyword:good-first-bug | select first bugs for new members
keyword:has-patch | add a patch
keyword:has-screenshots | explain a patch
keyword:has-unit-tests | review/test a patch
keyword:i18n-change | announce the bug status
keyword:needs-codex | update documentation
keyword:needs-docs | Inline documentation for the code is needed
keyword:needs-patch | request a new/updated patch
keyword:needs-refresh | request a new/updated patch
keyword:needs-screenshots | request for further explanation of the patch
keyword:needs-testing | request for a review/test
keyword:needs-unit-tests | request for a review/test
keyword:reporter-feedback | request for further explanation of the issue
keyword:ui-feedback | request for a review/test
keyword:ux-feedback | request for a review/test
milestone:set | discuss when to fix
owner:other | assign a task to somebody
owner:self | claim a task for themselves
patch:add | add a patch
priority:set | discuss the impact of the issue
resolution:duplicate | close the ticket
resolution:fixed | close the ticket
resolution:invalid | close the ticket
resolution:maybelater | close the ticket
resolution:wontfix | close the ticket
resolution:worksforme | close the ticket
status:accepted | accept the ticket
status:assigned | claim a task for themselves, assign a task to somebody
status:closed | close the ticket
status:new | present the issue
status:reopened | reopen the ticket
status:reviewing | review/test a patch
summary:change | refine the issue description/summary
version:set | discuss the nature/cause

In practice, several actions might be taken at the same time (e.g., the resolution is set at the same moment as the ticket is closed), and process mining techniques might place them in an arbitrary order. That is, we might obtain a process model in which setting the bug resolution follows ticket closure. Since we regard bug opening as the first phase of a bug’s lifecycle and bug closure as the last phase, we need to adjust the timestamps of bug opening and closure to ensure that they appear in the expected positions. To achieve this, we performed a pre-processing step before generating the event log: one second was subtracted from the time of ticket opening and one second was added to the time of ticket closure. Figure 4.1 shows a small fragment of the event log.

Figure 4.1: A fragment of the event log based on new bug statuses
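The pre-processing step described above can be illustrated with a minimal sketch. It assumes that each event is a tuple of ticket id, activity name, and timestamp; this representation and the function name `adjust_boundary_events` are hypothetical and are not taken from the scripts used in this study.

```python
from datetime import datetime, timedelta

ONE_SECOND = timedelta(seconds=1)


def adjust_boundary_events(events):
    """Shift ticket opening one second earlier and ticket closure one second later.

    `events` is a list of (ticket_id, activity, timestamp) tuples
    (an assumed representation of the event log).
    """
    adjusted = []
    for ticket_id, activity, timestamp in events:
        if activity == "status:new":
            timestamp -= ONE_SECOND   # opening must precede simultaneous events
        elif activity == "status:closed":
            timestamp += ONE_SECOND   # closure must follow simultaneous events
        adjusted.append((ticket_id, activity, timestamp))
    # Order the log by ticket and then by (adjusted) time.
    return sorted(adjusted, key=lambda event: (event[0], event[2]))
```

With this adjustment, a resolution that is set in the same second as the closure is always ordered before "status:closed", whatever order the two events had in the raw data.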

We used Disco (https://fluxicon.com/disco/) to discover the process model of bug fixing. We chose Disco rather than other process mining tools because it provides detailed statistics. It also provides powerful log filters, which allow users to tailor the log based on case performance, timeframe, variation, attributes, event relationships, or endpoints. Our process analysis consists of two steps. First, we prepared two event logs: one with only bug statuses, the other with all automatically extracted activities. We then fed them into Disco and generated the process models with the default settings. We compared these models with the official Trac workflow and recorded discrepancies. Second, we inspected the median durations between connected activities and assessed, based on our experience, whether they were efficient.

What are communication tools mainly used for during bug fixing?

To understand what communication tools are mainly used for during bug fixing, we collected the incoming activities (i.e., activities right before the tool use) and outgoing activities (i.e., activities right after the tool use) of each tool and counted their occurrences. We then selected the top five incoming and top five outgoing activities by number of occurrences, analyzed them, and formed hypotheses about the main use of each tool. We identified the use of a tool by the links to its chat archives. During data analysis, we found that mailing lists were connected to only 142 activities, while Slack and IRC were connected to 3,895 and 1,434 activities respectively. When the WordPress community used mailing lists as the main communication tool to discuss bugs, there was no integration for automatically posting the link to the mail archive on Trac. Therefore, we could identify the use of mailing lists only if developers manually posted the link on Trac, and much information from mailing lists might be missing. Moreover, the small number of mailing list occurrences makes them hard to compare with IRC and Slack. Thus, we only discuss Slack and IRC for this question.
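The counting of incoming and outgoing activities can be sketched as follows. The sketch assumes that each ticket is available as a time-ordered list of activity names in which a tool use appears as its own entry; the entry name `tool:slack` and both function names are hypothetical.

```python
from collections import Counter


def tool_neighbours(trace, tool_activity):
    """Count the activities right before and right after each tool use in one trace."""
    incoming, outgoing = Counter(), Counter()
    for i, activity in enumerate(trace):
        if activity != tool_activity:
            continue
        if i > 0:
            incoming[trace[i - 1]] += 1      # activity right before the tool use
        if i + 1 < len(trace):
            outgoing[trace[i + 1]] += 1      # activity right after the tool use
    return incoming, outgoing


def top_neighbours(traces, tool_activity, n=5):
    """Aggregate over all tickets and return the n most frequent neighbours."""
    incoming, outgoing = Counter(), Counter()
    for trace in traces:
        inc, out = tool_neighbours(trace, tool_activity)
        incoming.update(inc)
        outgoing.update(out)
    return incoming.most_common(n), outgoing.most_common(n)
```

Calling `top_neighbours(traces, "tool:slack")` would then return the two top-five lists for Slack. Note that, in this simplified form, two consecutive tool uses count each other as neighbours.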

How do community leads distribute tasks during bug fixing?

When a group of people executes tasks, two approaches to task distribution are often used: 1) people are divided into small groups, each of which executes the whole process of a task; 2) people are divided into groups that execute specialized sub-tasks, namely flow production [60]. To understand whether either approach applies to bug fixing, we decomposed research question RQ 4.3 into two sub-questions: 1) Are community leads divided into groups that fix the same bugs together? 2) Are community leads divided into groups that conduct similar bug fixing activities? We used social network analysis to answer these questions. In detail, we used the working-together metric to understand how frequently community leads perform activities on the same tickets, and the similar-task metric to understand whether different groups of community leads perform different clusters of activities. The first step is to identify community leads. On Trac, community leads are marked with a special label such as “Core Committer” or “4.6 Release Lead” (Figure 4.2). In the source code of any ticket page, the community leads are listed in the variable “wpTracContributorLabels”. In this way, we identified 40 community leads.


Figure 4.2: Examples of community leads

We first used ProM to generate the social networks. To study whether community leads were divided into several groups to work on bug fixing, we used the “Mine for a Working-Together Social Network” plugin, which constructs a social network based on whether performers frequently work on the same cases. To study whether community leads focused on specific tasks, we used the “Mine for a Similar-Task Social Network” plugin, which constructs a social network based on whether performers frequently perform the same activities. The plugins can also group nodes into clusters. The idea behind this step is that if community leads were divided into several groups or focused on specific tasks, the obtained social network would show clusters. For example, if the social network obtained from the “Mine for a Working-Together Social Network” plugin contained five clusters, this would indicate that developers were divided into five groups, and that members of a group were more closely connected and more likely to work with other members of the same group. Observing the social network diagram only yields an initial hypothesis, which needs to be verified. We therefore need a metric to quantitatively assess whether the network can be divided into clusters. We used the “modularity” metric to evaluate the social network decomposition [61]. “Modularity” can be either positive or negative. When “modularity” is negative or zero, the network cannot be meaningfully divided into clusters. Positive values indicate the possible presence of clusters. “Modularity” is always less than 1, and a high “modularity” value indicates a strong cluster structure. In practice, the “modularity” value of networks with a strong cluster structure is typically between 0.3 and 0.7 [62]. Since ProM neither supported exporting the social networks nor provided any mechanism to analyze their structure, we had to use another tool for the network analysis.
We wrote a script to construct the same networks as ProM and then fed them into Gephi. We chose Gephi because it provides a convenient and interactive interface with which we could easily perform calculations and read the results. Since Gephi already integrates a “modularity” module, we could obtain the modularity value by running this module.
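To make the two ingredients of this analysis concrete, the following sketch builds a simplified working-together network and scores a given partition with modularity. It is not the script used in this study: the edge weight (number of tickets two performers both acted on) is a simplification of ProM's metric, the modularity formula is the standard weighted definition, which we assume corresponds to [61], and all function names are hypothetical. Gephi additionally searches for a good partition, whereas this sketch only evaluates a partition that is supplied.

```python
from collections import defaultdict
from itertools import combinations


def working_together_network(events):
    """Build an undirected weighted network from (ticket_id, performer) pairs.

    The weight of edge (a, b) is the number of tickets on which both
    a and b performed at least one activity (a simplifying assumption).
    """
    performers_by_ticket = defaultdict(set)
    for ticket_id, performer in events:
        performers_by_ticket[ticket_id].add(performer)
    weights = defaultdict(int)
    for performers in performers_by_ticket.values():
        for a, b in combinations(sorted(performers), 2):
            weights[(a, b)] += 1
    return dict(weights)


def modularity(weights, community_of):
    """Modularity Q of a partition of a weighted undirected network.

    Q = sum over communities c of  w_in(c) / m  -  (deg(c) / (2 m))^2,
    where m is the total edge weight, w_in(c) the weight of edges inside c,
    and deg(c) the summed weighted degree of the nodes in c.
    """
    m = sum(weights.values())
    if m == 0:
        return 0.0
    internal = defaultdict(float)
    degree = defaultdict(float)
    for (a, b), w in weights.items():
        degree[community_of[a]] += w
        degree[community_of[b]] += w
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by a single edge, splitting the nodes into the two triangles gives a clearly positive value, whereas placing all nodes in one community gives exactly zero.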

4.4 Results

4.4.1 How can we identify the bug fixing process? What can we learn from the bug fixing process?

To answer RQ 4.1, we extracted activities from the bug status, contributors’ actions, and changes of Trac fields. We fed the event log with only bug statuses and the event log with all activities into Disco and obtained the corresponding process models. We compared these process models with the official Trac workflow (Figure 2.2).

Process model based on the bug status

Figure 4.3 shows the process model based on the bug status. In the diagram, the triangle symbol represents the start of the process and the stop symbol represents the end of the process. Events are represented by boxes, and the number in each box indicates how many times the event occurred. Arrows represent transitions from one event to another. The two values next to each arrow indicate how many times the transition occurred and how long it took (median value).

[Figure 4.3 diagram. Nodes and occurrence counts: status:new (21,339), status:closed (24,647), status:reopened (3,469), status:reviewing (620), status:accepted (909), status:assigned (1,986). Arrows carry transition counts and median durations; the most frequent transition, from status:new to status:closed, occurs 18,254 times with a median duration of 63.6 hours.]

Figure 4.3: Process model based on the bug status
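The two values shown on each arrow can be reproduced from an event log with a simple directly-follows count. The sketch below is not Disco's discovery algorithm; it only illustrates, under the assumption that events are (ticket id, activity, timestamp in seconds) tuples, how transition frequencies and median durations can be computed. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import median


def directly_follows(events):
    """Return {(activity_a, activity_b): (frequency, median_duration_seconds)}.

    `events` is an iterable of (ticket_id, activity, timestamp_seconds) tuples.
    A transition a -> b is counted whenever b immediately follows a
    within the same ticket.
    """
    by_ticket = defaultdict(list)
    for ticket_id, activity, timestamp in events:
        by_ticket[ticket_id].append((timestamp, activity))

    durations = defaultdict(list)
    for trace in by_ticket.values():
        trace.sort(key=lambda event: event[0])          # order events by time
        for (t1, a1), (t2, a2) in zip(trace, trace[1:]):
            durations[(a1, a2)].append(t2 - t1)

    return {pair: (len(values), median(values)) for pair, values in durations.items()}
```

Process mining tools typically hide infrequent transitions when drawing the model, so the frequencies shown on the arrows of a rendered diagram need not add up to the event counts in the boxes.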

When we compare this process model with the official Trac workflow, we observe the following phenomena.

• The status of most bug tickets (86.2%) changes to “closed” directly after “new”. However, this does not mean that no activities happen between the two statuses, as we discovered in Section 3.4.1 that the bug status is not completely recorded.

• Most of the time, the actual workflow follows the official workflow. However, some minor differences exist. For example, in the actual workflow model we could not find the loop pattern on “accepted”, although it exists in the official workflow.


• Mistakes occur when the bug status is changed. The status “new” does not always serve as the starting point: one Trac ticket changed to “new” from “new”, 85 tickets changed to “new” from “reopened”, and 75 changed to “new” from “assigned”. This backs our finding in Section 3.4.1 that the bug status is sometimes recorded incorrectly. However, these 161 incorrect transitions to the status “new” account for only 0.75% ((1+85+75)/21,339) of all occurrences of the status “new”.

Process model based on all automatically extracted activities

Although the process model based on the bug status gives us some insight into the lifecycle of a bug, it is still unclear what exactly happens between two statuses. Thus, to get a deeper understanding of contributors’ actions during bug fixing, we obtained a new process model based on the automatically extracted activities (Figure 4.4). We also attached the figure to the appendix as Figure A.1 for easier viewing. In the process model, the sum of the incoming arrows of a state is not necessarily equal to the number in the state or to the sum of its outgoing arrows. This is because the model is an abstraction of the real process: it only presents frequent transitions, and some uncommon transitions are omitted. If the diagram displayed all transitions, the process model would be “spaghetti-like” due to the huge number of cases and would be hard to read.

[Figure 4.4 diagram. Nodes are the automatically extracted activities listed in Table 4.1, annotated with their occurrence counts; arrows carry transition counts and median durations. A larger version is provided as Figure A.1 in the appendix.]

Figure 4.4: Process model based on all automatically extracted activities

What we can learn from the process based on all automatically extracted activities

By observing the process model carefully, we found some notable facts.

• When the need for a patch is announced, developers respond and create the patch quickly. As can be seen from Figure 4.5, the median duration between “keyword:needs-patch” and “patch:add” is 13.2 hours, which means that in more than half of these transitions a patch is uploaded less than one day after the need is announced.

Figure 4.5: The transitions between “keyword:needs-patch” and “patch:add” extracted from the process model

• For at least one third of the bugs, developers upload a patch right after they report the bug, without any other activity on Trac in between, and the median duration is less than one minute (Figure 4.6). This indicates that many developers have already created a patch before reporting the bug. To further verify this, we trimmed the event logs in Disco and obtained the process model between “status:new” and the first “patch:add”. We then inspected the case duration statistics of the trimmed process. A screenshot of part of the statistics table is shown in Figure 4.7.

Figure 4.6: The transitions between “status:new” and “patch:add” extracted from the process model

Figure 4.7: Partial statistics table of case duration of the trimmed process


We exported the table to Excel and computed simple statistics. We found that for 7,159 of the 11,722 bug tickets with at least one patch, the first patch was uploaded within one hour after the ticket was opened. We then produced a Pareto chart [63] of the durations of these 7,159 tickets, shown in Figure 4.8.

Figure 4.8: Durations between bug opening and the first patch (only tickets with a duration of less than one hour are presented)
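The statistic behind this analysis, the delay between ticket opening and the first patch, can be sketched as follows. The sketch assumes events given as (ticket id, activity, timestamp in seconds) tuples; both function names are hypothetical, and the actual analysis was carried out with Disco and Excel rather than with this code.

```python
def first_patch_delays(events):
    """Return {ticket_id: seconds between 'status:new' and the first 'patch:add'}.

    Tickets without any patch (or without an opening event) are left out.
    """
    opened, first_patch = {}, {}
    for ticket_id, activity, timestamp in events:
        if activity == "status:new":
            opened[ticket_id] = min(timestamp, opened.get(ticket_id, timestamp))
        elif activity == "patch:add":
            first_patch[ticket_id] = min(timestamp, first_patch.get(ticket_id, timestamp))
    return {t: first_patch[t] - opened[t] for t in first_patch if t in opened}


def share_within(delays, limit_seconds):
    """Fraction of patched tickets whose first patch arrived within the limit."""
    if not delays:
        return 0.0
    return sum(1 for d in delays.values() if d <= limit_seconds) / len(delays)
```

Applied to the figures reported above, 7,159 of 11,722 patched tickets corresponds to a share of roughly 61% for the one-hour limit.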

We can see that more than 5,000 tickets, 42.7% of the bug tickets with at least one patch, get their first patch within three minutes. The result suggests that a large part of a bug’s life remains hidden from the bug tickets.

• As can be seen in Figure 4.9, 4,719 of the 6,449 self-assignments occur at the same time as a commit to the code repository, which indicates that the timestamp of “owner:self” cannot be used as the time when developers accept the task and start working on the bug. Self-assignment is more like a remark about who committed the patch.

Figure 4.9: The transitions between “commit:add” and “owner:self” extracted from the process model

• The bottleneck of the bug fixing process is collecting feedback. The three longest median durations between connected activities are those between the activities “keyword:reporter-feedback”, “keyword:2nd-opinion”, and “keyword:dev-feedback” and their next activities: 12.5 days, 10.2 days, and 4.8 days respectively. One potential reason for this problem is that developers are not aware that it is their turn to take action. A better communication and task assignment mechanism might help remove the bottleneck.

4.4.2 What are communication tools mainly used for during bug fixing?

To answer RQ 4.2, we calculated the top five incoming and outgoing activities of the tool use. The results are shown in Figure 4.10.

(a) Incoming and outgoing activities of Slack and their occurrences
Incoming: patch:add (461), status:closed (236), keyword:has-patch (189), status:new (175), milestone:set (147)
Outgoing: patch:add (502), commit:add (467), milestone:set (275), owner:other (77), keyword:commit (76)

(b) Incoming and outgoing activities of IRC and their occurrences
Incoming: patch:add (181), milestone:set (72), status:new (69), commit:add (67), keyword:has-patch (61)
Outgoing: patch:add (227), commit:add (182), milestone:set (99), resolution:fixed (23), keyword:needs-patch (19)

Figure 4.10: Incoming and outgoing activities of the tools and their occurrences

From Figure 4.10a, we can see that Slack is often used before the patch is created; possible activities are diagnosing the cause and discussing the solution for the bug. Moreover, based on these incoming and outgoing activities we can infer that Slack is often used for discussing when to fix the bug, to whom the bug should be assigned, and which patch to commit. From Figure 4.10b, we can see that four of the five incoming activities and three of the five outgoing activities of IRC are the same as those of Slack. This suggests that Slack and IRC are used in a similar way. However, some differences exist. For example, Slack is widely used to discuss the bug before the bug is assigned. One potential reason for these differences is that it is easier to find people who can fix the bug on Slack. Because IRC is a synchronous communication tool, developers need to be online to discuss the bug. Otherwise, they have to access the separate site that stores the chat logs and search for the related discussion. Slack overcomes this limitation thanks to its asynchronous nature, and messages on Slack have a much better chance of reaching most of the developers who use the tool. Thus, we might assume that Slack makes bug assignment easier. This assumption could be further validated by surveying or interviewing developers from the WordPress community.

4.4.3 How do community leads distribute tasks during bug fixing?

To understand whether community leads are divided into several groups to work on bug fixing, we extracted the social network with the “working together” metric (Figure 4.11).

Figure 4.11: Social network extracted by the “Mine for a Working-Together Social Network” plugin

In the social network, no clusters are visible. Thus, we could infer that community leads are not divided into groups that fix bugs together; they are likely to fix bugs with any other community lead. To verify our assumption, we calculated “modularity” in Gephi. As a result, the network was divided into two clusters with a “modularity” value of 0.096, which suggests that the cluster structure is weak. This supports our assumption: there is no apparent evidence of clustering based on the working-together metric. To understand whether community leads are divided into groups to conduct similar activities, we extracted the social network with the “similar task” metric (Figure 4.12).

Figure 4.12: Social network extracted by the “Mine for a Similar-Task Social Network” plugin

Similarly, no clusters are visible in this network. Thus, we could infer that community leads are not divided into groups that conduct similar tasks; they do not follow the practice of flow production. To further verify this assumption, we also calculated “modularity” and found that the network was divided into two clusters with a “modularity” value of 0.006. As this is very close to 0, we can state that the network can hardly be divided based on the similar-task metric. In summary, community leads do not have a clear task distribution policy regarding whom to work with and what tasks to execute.


4.5 Conclusions and Future Work

4.5.1 Conclusions

In this chapter, we propose a new approach for extracting activities from bug tickets and retrieve the process model of bug fixing from these activities. We analyze the process model and identify notable phenomena, such as the fact that a large portion of developers have already created patches before they report the bugs. We also find that the process often gets stuck for a while when feedback from others is requested. We analyze the most frequent incoming and outgoing activities of the tool use and find that Slack and IRC are used in a similar way; however, Slack is widely used before bug assignment and might have advantages in finding developers to fix the bug. Finally, we inspect how community leads collaborate during the bug fixing process and find that they do not follow a specific task distribution policy regarding whom to work with and what tasks to execute.

4.5.2 Limitations

Our study has limitations. When extracting activities from the bug tickets, some of the activities we identified manually are not covered, although we tried to retrieve as many activities as possible. Furthermore, when analyzing the use of tools, we only formed assumptions and did not have a chance to verify them. However, our assumptions can serve as questions to be confirmed in further studies. When inspecting community leads’ task distribution patterns, we did not consider the impact of personnel changes in the lead team.

4.5.3 Future Work

As an extension of our work, future work includes:

• Surveying and interviewing WordPress community members to gain deeper insights. We would like to further validate our findings with surveys and interviews. We would also like to explore more aspects of bug fixing, such as the rationales for choosing and switching between tools.

• Improving activity extraction techniques. Text processing techniques might be used to automatically analyze the content of bug report comments and retrieve the relevant activities.


Chapter 5

Knowledge Flow in Bug Fixing

5.1 Introduction

In the field of software engineering, the main asset of a software organization is the knowledge held by its employees [64]. Development teams use different media channels, including emails and instant messaging (IM), to support the transfer of knowledge, and these channels constantly evolve over time [65]. Since knowledge is a broad concept, in our study we mainly focus on explicit individual knowledge, including private tools and repositories. In this chapter, we investigate the following research questions:

RQ 5.1 Why and how are bug tickets connected on Trac?

RQ 5.2 What types of knowledge from external resources are shared on Trac?

We identify reasons why developers refer to one ticket in another and categorize the reasons into seven types: 1) similar issues, 2) duplicate bugs, 3) beneficial patches, 4) follow-ups, 5) cause of the bug, 6) clustering, and 7) suggestions for investigation. We also check whether most tickets are linked to other tickets from the same component; the results are negative. We identify links to external resources other than WordPress Trac and categorize them into six types: 1) communication, 2) bug related demonstration, 3) community maintenance, 4) code maintenance, 5) bug related technical information, and 6) tool reference. Based on the distribution of external resource types, we propose two approaches regarding tools and onboarding newcomers to improve the bug fixing practice. At the end of this chapter, we summarize our results, present the limitations of our research regarding tool evolution, and describe potential future work.

5.2 Related Work

Knowledge can be either explicit (i.e., formal, systematic, and easily shared and communicated) or implicit (i.e., less formal, not well structured, and not easily articulated). Based on where the knowledge resides, knowledge can also be either individual or collective (i.e., shared). In the field of software engineering, knowledge can be classified into four categories [66]:

• explicit individual knowledge includes private tools and repositories.

• explicit collective knowledge includes development methodologies/techniques, IEEE standards, tailored inspection processes, and software quality models.

• implicit individual knowledge includes subjective software development experiences (e.g., debugging, design, testing).

• implicit collective knowledge includes success/failure stories of project teams and team culture.

Knowledge is commonly formed, shared, manipulated, and captured on media (i.e., development tools and communication channels) [67]. A case study of the R community indicates that the community efficiently uses multiple channels at the same time to share and curate knowledge, as each channel has its own advantages and disadvantages [68]. To understand how knowledge is shared during the software development process, the knowledge flow model was proposed to represent peer-to-peer knowledge sharing and management in cooperative teams [69]. Grasping how knowledge flows in a software organization helps to locate areas that can be improved and inspires ideas for improvements [70]. A few efforts have been made to extract knowledge flows in software engineering. Gendreau et al. [71] proposed a model to represent the knowledge flow in software project development, which links knowledge sources to five basic cognitive factors: acquisition, validation, synchronization, realization, and crystallization. Mitchell et al. [72] presented a knowledge mapping technique to model software project-level knowledge flows.

5.3 Case Study Design

In this section, we introduce the research questions regarding knowledge flow in bug fixing, and explain how we collected and analyzed the data.

5.3.1 Research Questions

Our main research goal is to understand how knowledge flows during the bug fixing process. However, since the whole bug fixing process involves many different types of knowledge, we keep our study focused by concentrating on “explicit individual knowledge”, which mainly involves private tools and repositories. We investigate the knowledge flow from two perspectives: 1) knowledge flow between Trac tickets, and 2) knowledge flow between Trac and external resources. Regarding knowledge flow between Trac tickets, we propose the following research question:

RQ 5.1 Why and how are bug tickets connected on Trac?

Bug tickets are not isolated islands: other tickets are frequently referred to during the bug fixing process. We would like to study for what purposes developers refer to other Trac tickets. Since components divide the tickets into several groups based on functionality, we might assume that tickets within the same component are more connected. We would like to examine whether most Trac tickets refer to tickets that belong to the same component.


Regarding knowledge flow between Trac and external resources, we propose the following research question:

RQ 5.2 What types of knowledge from external resources are shared on Trac?

Trac, which integrates all kinds of external information, is the main hub for bug fixing in the WordPress community. We would like to know what types of external knowledge artifacts are shared on Trac.

5.3.2 Data Collection and Analysis

To study the knowledge flow in bug fixing, we used the data of Trac tickets produced in the last 10 years and collected in Section 3.3.2. Our approaches to analyzing the data and answering the research questions are as follows.

Why and how are bug tickets connected on Trac?

To answer RQ 5.1, we conducted two studies. We first collected the latest tickets that contain links to other tickets. During the data collection, we removed duplicate ticket references caused by quotation. To analyze why developers refer to other tickets in Trac, we followed a qualitative analysis method: we started with 30 tickets, manually inspected the context of each ticket reference, wrote down the reason, and categorized it. We then analyzed more tickets and iteratively refined the emergent reasons until no new reasons emerged. Next, we used all the bug tickets that contained links to other bug tickets to examine whether most Trac tickets were referred to by tickets that belonged to the same component. This was done by comparing the components of the referring tickets with the components of the referred tickets.
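The component comparison can be illustrated with a short sketch. It assumes that ticket references are written in the `#1234` form and that quoted text consists of lines starting with `>`; the function names and the component names used in the usage note are illustrative and do not reproduce the scripts used in this study.

```python
import re

# Assumed reference syntax: '#' followed by the ticket number.
TICKET_REF = re.compile(r"#(\d+)")


def ticket_references(ticket_id, comments):
    """Collect the ids of other tickets referenced in a ticket's comments.

    Quoted lines (assumed to start with '>') are skipped so that a reference
    repeated through quotation is not counted twice; self-references are ignored.
    """
    refs = set()
    for comment in comments:
        for line in comment.splitlines():
            if line.lstrip().startswith(">"):
                continue
            for match in TICKET_REF.finditer(line):
                referred = int(match.group(1))
                if referred != ticket_id:
                    refs.add(referred)
    return refs


def same_component_share(references, component_of):
    """Count references whose two tickets share a component.

    `references` is an iterable of (referring_id, referred_id) pairs and
    `component_of` maps ticket ids to component names. Pairs with an
    unknown component are skipped.
    """
    same = different = 0
    for referring, referred in references:
        if referring not in component_of or referred not in component_of:
            continue
        if component_of[referring] == component_of[referred]:
            same += 1
        else:
            different += 1
    total = same + different
    return same, different, (same / total if total else 0.0)
```

For example, with references `[(10, 12), (10, 15)]` and components `{10: "Media", 12: "Media", 15: "Editor"}`, the function reports one reference within the same component and one across components.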

What types of knowledge from external resources are shared on Trac?

To answer RQ 5.2, we collected 100 tickets that contained links to external resources other than the WordPress Trac system. During the data collection, we removed duplicated external links caused by quotation. Moreover, links in error messages were discarded, since they usually contained server host information and bug reporters did not expect developers to access those links to obtain information. Furthermore, dummy links, which are made up to explain the bug (e.g., example.com), were also removed. We followed the same qualitative analysis method as for the previous research question: we manually inspected the content of the external resources and categorized the resources based on their functionality.
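The link filtering described above can be sketched as follows. The sketch rests on several assumptions that are not part of the original procedure: the Trac host name `core.trac.wordpress.org`, the list of dummy hosts, the convention that quoted lines start with `>`, and the function name. Links inside error messages were identified manually in this study and are not handled by the sketch.

```python
import re
from urllib.parse import urlparse

URL = re.compile(r"https?://[^\s<>)\]]+")
TRAC_HOST = "core.trac.wordpress.org"          # assumed host of WordPress Trac
DUMMY_HOSTS = {"example.com", "example.org", "example.net", "localhost"}  # assumed list


def external_links(comment):
    """Return the distinct external links of a comment, in order of appearance.

    Links on quoted lines, links to Trac itself, and dummy links are dropped.
    """
    links, seen = [], set()
    for line in comment.splitlines():
        if line.lstrip().startswith(">"):       # skip quoted text
            continue
        for url in URL.findall(line):
            url = url.rstrip(".,;")             # strip trailing punctuation
            host = (urlparse(url).hostname or "").lower()
            if host.startswith("www."):
                host = host[4:]
            if host == TRAC_HOST or host in DUMMY_HOSTS:
                continue
            if url not in seen:
                seen.add(url)
                links.append(url)
    return links
```

The remaining links still need to be inspected and categorized by hand, as described above; the sketch only narrows down the candidates.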

5.4 Results

To understand how knowledge flows in bug fixing, we first identified why developers referred to other Trac tickets, and examined whether most of the Trac tickets referred to other tickets which belonged to the same component. We then investigated and categorized what external resources other than Trac were presented by developers. In this section, we present our findings.


5.4.1 Why and how are bug tickets connected on Trac?

We identified several typical reasons for ticket references on Trac.

• Similar issues. The two tickets discuss similar but not exactly the same issues, or they share part of the problems.
• Duplicate bugs. One ticket is the duplicate of another ticket, i.e., they discuss the same issue.
• Beneficial patches. The patch of one bug can help fix the other bug to a certain extent.
• Follow-ups. One ticket is proposed as the follow-up of the other ticket because some remaining issues have not been resolved.
• Cause of the bug. The bug in one ticket was introduced when fixing the bug in the other ticket.
• Clustering. Sometimes, Trac tickets are created to discuss several similar or related existing tickets.
• Suggestions for investigation. Developers sometimes refer to tickets with a specific attribute and suggest that another developer inspect those tickets. For example, developers refer newcomers to tickets with the keyword “good-first-bug” so that newcomers can start with less complex tickets.

We identiﬁed 2165 ticket references and checked whether the two tickets (referring and being referred to) belonged to the same component.

Table 5.1: How many Trac tickets refer to tickets which belong to the same component

| Component | Number of cases | Percentage |
|-----------|-----------------|------------|
| Same      | 1108            | 51.2%      |
| Different | 1057            | 48.8%      |

Table 5.1 shows that around half of the ticket references link to tickets that belong to a different component. This result contradicts our original assumption and suggests that tickets from different components also play an important role in fixing bugs, even though those components cover different functionality.
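As a quick check, the percentages in Table 5.1 follow directly from the two counts and the total of 2165 references. The helper below is only an illustrative sketch; the function name `share_table` is hypothetical.

```python
def share_table(counts: dict) -> dict:
    """Convert absolute counts into percentages rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Counts taken from Table 5.1 (1108 + 1057 = 2165 ticket references).
table_5_1 = share_table({"Same": 1108, "Different": 1057})
print(table_5_1)  # {'Same': 51.2, 'Different': 48.8}
```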

5.4.2 What types of knowledge from external resources are shared on Trac?

We identified 262 external links in the selected 100 Trac tickets. Links to WordPress SVN are excluded because SVN has been completely integrated into the Trac system and has become a part of Trac. We accessed these external links, inspected the content of the webpages, and classified them into six categories based on their functionality.

• Communication. These links direct to chat logs of the communication tools. Since we used the latest tickets for analysis, Slack is the only tool the WordPress community used for discussing bug fixing. However, as we know, mailing lists and IRC were used before.


• Bug related demonstration. These links direct to resources used to demonstrate the bug before or after a patch. We identified three types of demonstration:

  – Screenshots. Developers upload screenshots to file or image sharing services (e.g., Dropbox and Imgur).
  – Screencasts. Developers record the running scenario of the software when the bug occurs and upload it to a video sharing website, such as YouTube.
  – Demo websites. Developers set up a running website of the software and demonstrate the bug.

• Community maintenance. These links direct to the community or the support forum. They are used in two cases: 1) developers post links to the core contributor guide to inform newcomers how to contribute to the community; 2) developers ask reporters to post on the support forum when they open an invalid ticket (i.e., the ticket is a technical support request instead of a bug).

• Code maintenance. These links direct to the code, commits or code change logs related to the bug.

• Bug related technical information. These links direct to tutorials, API references, software manuals, discussions (on forums, Q&A sites or other issue tracking sites) or articles (often posted on personal websites or blogs) which are useful for fixing the bug.

• Tool reference. These links direct to the software, WordPress plugins, WordPress themes, third-party libraries or services.

Figure 5.1 shows some of the external resources linked from Trac. Trac, which serves as a hub, connects different kinds of information.

Figure 5.1: Examples of external resources linked to Trac

We calculated the share of each type of external resource; the results are shown in Table 5.2. We also visualized the results as a sunburst chart [73] in Figure 5.2.


Table 5.2: Percentage distribution of each type of external resources

| First-level category              | Second-level category               | Percentage |
|-----------------------------------|-------------------------------------|------------|
| Communication                     |                                     | 24.3%      |
| Bug related demonstration         | Screenshots                         | 18.0%      |
|                                   | Screencasts                         | 0.4%       |
|                                   | Demo sites                          | 7.1%       |
| Community maintenance             | Core contributor handbook           | 6.7%       |
|                                   | Support forum                       | 4.7%       |
| Code maintenance                  |                                     | 10.6%      |
| Bug related technical information | Other issue tracking system         | 3.5%       |
|                                   | Forum                               | 3.9%       |
|                                   | Blogs and articles                  | 4.7%       |
|                                   | Q&A sites                           | 1.2%       |
|                                   | API references and software manuals | 6.3%       |
| Tool reference                    |                                     | 8.6%       |

Figure 5.2: Distribution of different types of external resources

Based on these results, we suggest two ways in which bug tracking systems and practices can be improved.


• Better tool support for screenshots. Although bug reporters can upload screenshots to Trac, many of them still prefer to upload screenshots to third-party image sharing services. One possible cause of this behavior is that Trac only allows users to upload one attachment at a time, which is inconvenient for users with multiple screenshots. If a third-party image sharing service goes down, the images linked from Trac will be broken. Thus, it is necessary to encourage users to upload screenshots to Trac directly.

• Better support for onboarding newcomers. As Table 5.2 shows, over 10 percent of the external links are for community maintenance. If Trac offered a mechanism that guides users in using the system and reporting valid bugs, developers might save a lot of energy otherwise spent on dealing with inexperienced newcomers.
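The "over 10 percent" figure for community maintenance is the sum of its two second-level categories in Table 5.2. The following sketch aggregates the table to the first level; the row values are copied from Table 5.2, while the data layout and the function name are assumptions made for illustration. Because the inputs are already rounded, the totals may deviate slightly from values computed on the raw link counts.

```python
from collections import defaultdict

# (first-level category, second-level category or None, percentage) from Table 5.2
TABLE_5_2 = [
    ("Communication", None, 24.3),
    ("Bug related demonstration", "Screenshots", 18.0),
    ("Bug related demonstration", "Screencasts", 0.4),
    ("Bug related demonstration", "Demo sites", 7.1),
    ("Community maintenance", "Core contributor handbook", 6.7),
    ("Community maintenance", "Support forum", 4.7),
    ("Code maintenance", None, 10.6),
    ("Bug related technical information", "Other issue tracking system", 3.5),
    ("Bug related technical information", "Forum", 3.9),
    ("Bug related technical information", "Blogs and articles", 4.7),
    ("Bug related technical information", "Q&A sites", 1.2),
    ("Bug related technical information", "API references and software manuals", 6.3),
    ("Tool reference", None, 8.6),
]


def first_level_totals(rows):
    """Sum the second-level percentages for each first-level category."""
    totals = defaultdict(float)
    for first_level, _second_level, percentage in rows:
        totals[first_level] += percentage
    return {category: round(value, 1) for category, value in totals.items()}


totals = first_level_totals(TABLE_5_2)
print(totals["Community maintenance"])  # 11.4
```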

5.5 Conclusions and Future Work

5.5.1 Conclusions

In this study, we focus on how knowledge flows between Trac tickets and what types of external knowledge aggregate on Trac. We first categorize the reasons why developers refer to other Trac tickets and then show that tickets from different components are also often connected. Moreover, we classify external resources based on their functionality and propose two suggestions for improving the bug tracking system and practice.

5.5.2 Limitations

As in any empirical study, sample selection might introduce bias into our results. In our case, when studying external resources, we selected the latest tickets as our study samples. Since tools and services evolve fast, external resources that are popular now might not even have existed several years ago. Nevertheless, the results we observed can reflect the modern bug fixing practice.

5.5.3 Future Work

Since our samples are only a small part of the whole bug database, future work on external resources could develop heuristic approaches for identifying the types of external resources. In our work, we mainly study explicit individual knowledge; however, knowledge in bug fixing covers much more than that. We would also like to study other types of knowledge, such as implicit collective knowledge.


Chapter 6

Conclusions

In this thesis, we investigate bug fixing from three perspectives: bug fixing activities, the bug fixing process, and knowledge flow in bug fixing.

Regarding bug fixing activities, we identify the limitations of using the bug status to understand bug fixing activities and categorize them into four types: 1) incomplete statuses, 2) asynchronous statuses, 3) incorrect statuses and 4) implicit statuses. The impacts of these limitations are presented. We also identify developers’ activities during bug fixing and classify them into eight categories: 1) discovery, 2) diagnosis, 3) assignment, 4) search, 5) correction, 6) closure, 7) awareness and 8) others. Compared to other studies, the activities we identify highlight the awareness category and also include activities from communication tools.

Regarding the bug fixing process, we first propose an approach to automatically extract activities from bug tickets. We then convert the activities into an event log and use Disco to retrieve the process model of bug fixing. By analyzing the process model, we identify the bottlenecks of the process and several notable facts about it. We also analyze what Slack and IRC are used for and compare their use. Finally, we explore the community leads’ collaboration patterns during the bug fixing process and find that they do not have a clear task distribution policy in terms of whom to work with and what tasks to execute.

Regarding knowledge flow in bug fixing, we first analyze the purposes for which developers refer to other Trac tickets and identify seven main categories: 1) similar issues, 2) duplicate bugs, 3) beneficial patches, 4) follow-ups, 5) cause of the bug, 6) clustering and 7) suggestions for investigation. We also show that only about half of the ticket references point to tickets from the same component. We then classify external resources based on their functionality and propose two suggestions regarding tool support and onboarding newcomers.

Our study can serve as the basis for future bug fixing research. Our results shed light on modern bug fixing practices and provide suggestions for improving the bug fixing process.


Bibliography

[1] Tom Britton, Lisa Jeng, Graham Carver, Paul Cheak, and Tomer Katzenellenbogen. Reversible debugging software. Technical report, Judge Bus. School, Univ. Cambridge, Cambridge, UK, 2013. 1

[2] Xiaoyin Wang, Lu Zhang, Tao Xie, John Anvik, and Jiasu Sun. An approach to detecting duplicate bug reports using natural language and execution information. In Proceedings of the 30th international conference on Software engineering, pages 461–470. ACM, 2008. 1

[3] Yuan Tian, Chengnian Sun, and David Lo. Improved duplicate bug report identification. In Software Maintenance and Reengineering (CSMR), 2012 16th European Conference on, pages 385–390. IEEE, 2012. 1

[4] Ashish Sureka and Pankaj Jalote. Detecting duplicate bug report using character n-gram-based features. In 2010 Asia Pacific Software Engineering Conference, pages 366–374. IEEE, 2010. 1

[5] Chengnian Sun, David Lo, Siau-Cheng Khoo, and Jing Jiang. Towards more accurate retrieval of duplicate bug reports. In Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering, pages 253–262. IEEE Computer Society, 2011. 1

[6] Nicholas Jalbert and Westley Weimer. Automated duplicate detection for bug tracking systems. In 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN), pages 52–61. IEEE, 2008. 1

[7] Liguo Chen, Xiaobo Wang, and Chao Liu. Improving bug assignment with bug tossing graphs and bug similarities. In 2010 International Conference on Biomedical Engineering and Computer Science, pages 1–5. IEEE, 2010. 1

[8] Pamela Bhattacharya, Iulian Neamtiu, and Christian R Shelton. Automated, highly-accurate, bug assignment using machine learning and tossing graphs. Journal of Systems and Software, 85(10):2275–2292, 2012. 1

[9] Olga Baysal, Michael W Godfrey, and Robin Cohen. A bug you like: A framework for automated assignment of bugs. In Program Comprehension, 2009. ICPC’09. IEEE 17th International Conference on, pages 297–298. IEEE, 2009. 1

[10] John Anvik, Lyndon Hiew, and Gail C Murphy. Who should fix this bug? In Proceedings of the 28th international conference on Software engineering, pages 361–370. ACM, 2006. 1


[11] John Anvik. Automating bug report assignment. In Proceedings of the 28th international conference on Software engineering, pages 937–940. ACM, 2006. 1

[12] Nicolas Serrano and Ismael Ciordia. Bugzilla, itracker, and other bug trackers. IEEE software, 22(2):11–13, 2005. 1

[13] Kamel Ayari, Peyman Meshkinfam, Giuliano Antoniol, and Massimiliano Di Penta. Threats on building models from cvs and bugzilla repositories: the mozilla case study. In Proceedings of the 2007 conference of the center for advanced studies on Collaborative research, pages 215–228. IBM Corp., 2007. 1

[14] Jorge Aranda and Gina Venolia. The secret life of bugs: Going past the errors and omissions in software repositories. In Proceedings of the 31st International Conference on Software Engineering, pages 298–308. IEEE Computer Society, 2009. 1, 12, 14, 17

[15] Laleh Mousavi Eshkevari, Giuliano Antoniol, James R. Cordy, and Massimiliano Di Penta. Identifying and locating interference issues in PHP applications: the case of wordpress. In Chanchal K. Roy, Andrew Begel, and Leon Moonen, editors, 22nd International Conference on Program Comprehension, ICPC 2014, Hyderabad, India, June 2-3, 2014, pages 157–167. ACM, 2014. 6

[16] Mohammed Sayagh and Bram Adams. Multi-layer software conﬁguration: Empirical study on wordpress. In Michael W. Godfrey, David Lo, and Foutse Khomh, editors, 15th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2015, Bremen, Germany, September 27-28, 2015, pages 31–40. IEEE Computer Society, 2015. 6

[17] Teemu Koskinen, Petri Ihantola, and Ville Karavirta. Quality of wordpress plug-ins: An overview of security and user ratings. In 2012 International Conference on Privacy, Security, Risk and Trust, PASSAT 2012, and 2012 International Conference on Social Computing, SocialCom 2012, Amsterdam, Netherlands, September 3-5, 2012, pages 834–837. IEEE, 2012. 6

[18] Moira C Norrie, Linda Di Geronimo, Alfonso Murolo, and Michael Nebeling. The forgotten many? a survey of modern web development practices. In International Conference on Web Engineering, pages 290–307. Springer, 2014. 6

[19] Bogdan Vasilescu, Andrea Capiluppi, and Alexander Serebrenik. Gender, representation and online participation: A quantitative study. Interacting with Computers, 26(5):488– 511, 2014. 6

[20] Feng Zhang, Foutse Khomh, Ying Zou, and Ahmed E Hassan. An empirical study on factors impacting bug ﬁxing time. In 2012 19th Working Conference on Reverse Engineering, pages 225–234. IEEE, 2012. 12

[21] Cathrin Weiss, Rahul Premraj, Thomas Zimmermann, and Andreas Zeller. How long will it take to ﬁx this bug? In Proceedings of the Fourth International Workshop on Mining Software Repositories, page 1. IEEE Computer Society, 2007. 12

[22] Gaeul Jeong, Sunghun Kim, and Thomas Zimmermann. Improving bug triage with bug tossing graphs. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ESEC/FSE ’09, pages 111–120, New York, NY, USA, 2009. ACM. 12

[23] Anakorn Jongyindee, Masao Ohira, Akinori Ihara, and Ken-ichi Matsumoto. A case study of committers activities on the bug fixing process in the eclipse project. In International Workshop on Empirical Software Engineering in Practice 2011 (IWESEP 2011), page 9. 12

[24] Marco D’Ambros, Michele Lanza, and Martin Pinzger. “A bug’s life” visualizing a bug database. In 2007 4th IEEE International Workshop on Visualizing Software for Understanding and Analysis, pages 113–120. IEEE, 2007. 12

[25] Kevin Crowston and Barbara Scozzi. Bug ﬁxing practices within free/libre open source software development teams. Journal of Database Management, 19(2):1–30, APR-JUN 2008. 12, 17

[26] Kevin Crowston, Kangning Wei, James Howison, and Andrea Wiggins. Free/libre open-source software development: What we know and what we do not know. ACM Comput. Surv., 44(2):7:1–7:35, March 2008. 12

[27] Per Runeson, Martin Host, Austen Rainer, and Bjorn Regnell. Case study research in software engineering: Guidelines and examples. John Wiley & Sons, 2012. 13, 14

[28] Carl Gutwin, Reagan Penner, and Kevin Schneider. Group awareness in distributed software development. In Proceedings of the 2004 ACM conference on Computer supported cooperative work, pages 72–81. ACM, 2004. 17

[29] Wikan Sunindyo, Thomas Moser, Dietmar Winkler, and Deepak Dhungana. Improving open source software process quality based on defect data mining. In International Conference on Software Quality, pages 84–102. Springer, 2012. 19

[30] Monika Gupta and Ashish Sureka. Nirikshan: Mining bug report history for discovering process maps, inefficiencies and inconsistencies. In Proceedings of the 7th India Software Engineering Conference, page 1. ACM, 2014. 19, 20

[31] Wil Van Der Aalst, Arya Adriansyah, Ana Karla Alves De Medeiros, Franco Arcieri, Thomas Baier, Tobias Blickle, Jagadeesh Chandra Bose, Peter van den Brand, Ronald Brandtjen, Joos Buijs, et al. Process mining manifesto. In International Conference on Business Process Management, pages 169–194. Springer, 2011. 19

[32] Khodakaram Salimifard and Mike Wright. Petri net-based modelling of workﬂow systems: An overview. European journal of operational research, 134(3):664–676, 2001. 19

[33] Stephen A White. BPMN modeling and reference guide: understanding and using BPMN. Future Strategies Inc., 2008. 19

[34] Stanley Wasserman and Katherine Faust. Social network analysis: Methods and applications, volume 8. Cambridge university press, 1994. 19


[35] Boudewijn F Van Dongen, Ana Karla A de Medeiros, HMW Verbeek, AJMM Weijters, and Wil MP Van Der Aalst. The prom framework: A new era in process mining tool support. In International Conference on Application and Theory of Petri Nets, pages 444–454. Springer, 2005. 20

[36] AK Alves de Medeiros, Boudewijn F van Dongen, Wil MP Van der Aalst, and AJMM Weijters. Process mining: Extending the α-algorithm to mine short loops. Technical report. 20

[37] AJMM Weijters, Wil MP van Der Aalst, and AK Alves De Medeiros. Process mining with the heuristics miner-algorithm. Technische Universiteit Eindhoven, Tech. Rep. WP, 166:1–34, 2006. 20

[38] Ana Karla A de Medeiros, Anton JMM Weijters, and Wil MP van der Aalst. Genetic process mining: an experimental evaluation. Data Mining and Knowledge Discovery, 14(2):245–304, 2007. 20

[39] Christian W Günther and Wil MP Van Der Aalst. Fuzzy mining–adaptive process simplification based on multi-perspective metrics. In International Conference on Business Process Management, pages 328–343. Springer, 2007. 20

[40] Vladimir Rubin, Christian W Günther, Wil MP Van Der Aalst, Ekkart Kindler, Boudewijn F Van Dongen, and Wilhelm Schäfer. Process mining framework for software processes. In International Conference on Software Process, pages 169–181. Springer, 2007. 20

[41] Wouter Poncin, Alexander Serebrenik, and Mark van den Brand. Process mining software repositories. In Software Maintenance and Reengineering (CSMR), 2011 15th European Conference on, pages 5–14. IEEE, 2011. 20

[42] Monika Gupta, Ashish Sureka, and Srinivas Padmanabhuni. Process mining multiple repositories for software defect resolution from control and organizational perspective. In Proceedings of the 11th Working Conference on Mining Software Repositories, pages 122–131. ACM, 2014. 20

[43] HMW Verbeek, JCAM Buijs, BF Van Dongen, and Wil MP van der Aalst. Prom 6: The process mining toolkit. Proc. of BPM Demonstration Track, 615:34–39, 2010. 20, 21

[44] Christian W Günther and Anne Rozinat. Disco: Discover your processes. BPM (Demos), 940:40–44, 2012. 20

[45] Wil MP Van Der Aalst, Hajo A Reijers, and Minseok Song. Discovering social networks from event logs. Computer Supported Cooperative Work (CSCW), 14(6):549–593, 2005. 21

[46] Luis López-Fernández, Gregorio Robles, Jesus M Gonzalez-Barahona, and Israel Herraiz. Applying social network analysis techniques to community-driven libre software projects. vol, 1:28–50, 2009. 21

[47] Gregory Madey, Vincent Freeh, and Renee Tynan. The open source software development phenomenon: An analysis based on social network theory. AMCIS 2002 Proceedings, page 247, 2002. 21


[48] Andrew Meneely, Laurie Williams, Will Snipes, and Jason Osborne. Predicting failures with developer networks and social network analysis. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering, pages 13–23. ACM, 2008. 21

[49] Timo Wolf, Adrian Schroter, Daniela Damian, and Thanh Nguyen. Predicting build failures using social network analysis on developer communication. In Proceedings of the 31st International Conference on Software Engineering, ICSE ’09, pages 1–11, Washington, DC, USA, 2009. IEEE Computer Society. 21

[50] Ashish Sureka, Atul Goyal, and Ayushi Rastogi. Using social network analysis for mining collaboration data in a defect tracking system for risk and vulnerability analysis. In Proceedings of the 4th india software engineering conference, pages 195–204. ACM, 2011. 21

[51] Kevin Crowston and James Howison. Hierarchy and centralization in free and open source software team communications. Knowledge, Technology & Policy, 18(4):65–85, 2006. 21

[52] James Howison, Keisuke Inoue, and Kevin Crowston. Social dynamics of free and open source team communications. In IFIP International Conference on Open Source Systems, pages 319–330. Springer, 2006. 21

[53] Daniel A Schult and P Swart. Exploring network structure, dynamics, and function using networkx. In Proceedings of the 7th Python in Science Conferences (SciPy 2008), volume 2008, pages 11–16, 2008. 21

[54] Mathieu Bastian, Sebastien Heymann, Mathieu Jacomy, et al. Gephi: an open source software for exploring and manipulating networks. ICWSM, 8:361–362, 2009. 21

[55] Wil MP van der Aalst, Hajo A Reijers, Anton JMM Weijters, Boudewijn F van Dongen, AK Alves De Medeiros, Minseok Song, and HMW Verbeek. Business process mining: An industrial application. Information Systems, 32(5):713–732, 2007. 22

[56] Dhaval Vyas, Tara Capel, Deven Tank, and David Shepherd. Understanding the use of a bug tracking system in a global software development setup. In Proceedings of the Annual Meeting of the Australian Special Interest Group for Computer Human Interaction, pages 222–226. ACM, 2015. 22

[57] Javier Portillo-Rodríguez, Aurora Vizcaíno, Mario Piattini, and Sarah Beecham. Tools used in global software engineering: A systematic mapping review. Information and Software Technology, 54(7):663–685, 2012. 22

[58] Anja Guzzi, Alberto Bacchelli, Michele Lanza, Martin Pinzger, and Arie van Deursen. Communication in open source software development mailing lists. In Proceedings of the 10th Working Conference on Mining Software Repositories, pages 277–286. IEEE Press, 2013. 22

[59] Margaret-Anne Storey, Christoph Treude, Arie van Deursen, and Li-Te Cheng. The impact of social media on software engineering practices and tools. In Proceedings of the FSE/SDP workshop on Future of software engineering research, pages 359–364. ACM, 2010. 22


[60] David Hounshell. From the American system to mass production, 1800-1932: The development of manufacturing technology in the United States. Number 4. JHU Press, 1985. 26

[61] Mark EJ Newman. Modularity and community structure in networks. Proceedings of the national academy of sciences, 103(23):8577–8582, 2006. 27

[62] Tejaswini Narayanan, Merril Gersten, Shankar Subramaniam, and Ananth Grama. Modularity detection in protein-protein interaction networks. BMC research notes, 4(1):1, 2011. 27

[63] Leland Wilkinson. Revising the pareto chart. The American Statistician, 60(4):332–334, 2006. 31

[64] Finn Olav Bjørnson and Torgeir Dingsøyr. Knowledge management in software engineering: A systematic review of studied concepts, ﬁndings and research methods used. Information and Software Technology, 50(11):1055–1068, 2008. 37

[65] Margaret-Anne Storey, Leif Singer, Brendan Cleary, Fernando Figueira Filho, and Alexey Zagalsky. The (r) evolution of social media in software engineering. In Proceedings of the on Future of Software Engineering, pages 100–116. ACM, 2014. 37

[66] Lesley Pek Wee Land, Aybüke Aurum, and Meliha Handzic. Capturing implicit software engineering knowledge. In Software Engineering Conference, 2001. Proceedings. 2001 Australian, pages 108–114. IEEE, 2001. 37

[67] M. A. Storey, A. Zagalsky, F. Filho, L. Singer, and D. German. How social and communication channels shape and challenge a participatory culture in software development. IEEE Transactions on Software Engineering, PP(99):1–1, 2016. 38

[68] Alexey Zagalsky, Carlos Gómez Teshima, Daniel M German, Margaret-Anne Storey, and Germán Poo-Caamaño. How the r community creates and curates knowledge: a comparative study of stack overflow and mailing lists. In Proceedings of the 13th International Workshop on Mining Software Repositories, pages 441–451. ACM, 2016. 38

[69] Hai Zhuge. A knowledge ﬂow model for peer-to-peer team knowledge sharing and management. Expert systems with applications, 23(1):23–30, 2002. 38

[70] Bo Hansen Hansen and Karlheinz Kautz. Knowledge mapping: A technique for identifying knowledge ﬂows in software organisations. In European Conference on Software Process Improvement, pages 126–137. Springer, 2004. 38

[71] Olivier Gendreau and Pierre N Robillard. Exploring knowledge ﬂow in software project development. In Information, Process, and Knowledge Management, 2009. eKNOW’09. International Conference on, pages 99–104. IEEE, 2009. 38

[72] Susan M Mitchell and Carolyn B Seaman. A knowledge mapping technique for project-level knowledge flow analysis. In 2011 international symposium on empirical software engineering and measurement, pages 347–350. IEEE, 2011. 38


[73] Alison Smith, Timothy Hawes, and Meredith Myers. Hiérarchie: Interactive visualization for hierarchical topic models. In Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces, pages 71–78, 2014. 41


Appendix A

Process model based on automatically extracted activities

Figure A.1: Process model based on automatically extracted activities

[Figure not reproduced: a process map whose nodes are extracted activities (e.g., status:new, keyword:has-patch, patch:add, commit:add, resolution:fixed, status:closed), annotated with frequencies and durations.]
