A Social Information Foraging Approach to Improving End-User

Developers’ Productivity

A dissertation submitted to the Graduate School of the University of Cincinnati in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in the Department of Electrical Engineering of the College of Engineering and Applied Science

by

Xiaoyu Jin, B.S.

University of Cincinnati, October 2017

Committee Chair: Nan Niu, Ph.D.


Abstract

Software engineering is the application of engineering to software development in a systematic way. Traditionally, professional software engineers draw on technologies and practices from a variety of fields to improve their productivity in creating software and the quality of the delivered product. These practices span software requirements, process, and other areas. Nowadays, more and more non-professional developers write programs not as their primary job function, but as a means to support a main goal that is something else, such as accounting, designing a webpage, doing office work, scientific research, or entertainment. The number of non-professional developers is already several times the number of professional developers, and even elementary school students are beginning to learn simple programming skills. However, because they develop software for varied purposes, the practices suitable for these non-professional developers can be quite different from those of professional developers. Non-professional developers tend to program opportunistically, without systematic guidelines, and the software they create often lacks adequate quality considerations. Support from the software engineering community is therefore needed to improve end-user programmers' productivity and the quality of the software they develop. In this thesis, we define these non-professional developers as end-user developers and identify the distinctions between end-user developers and professional developers, including their concepts and practices of requirements, specifications, documentation, reuse, testing and verification, and debugging. We then identify pragmatic software reuse as the main approach end-user developers adopt to fulfill their daily programming tasks. We conduct several rounds of observational experiments, inviting end-user developers to carry out software reuse tasks, to further analyze their programming behaviors. Our results can be summarized in four aspects: (1) we analyzed end-user developers' information needs, summarized them into five categories of architectural concerns, and validated the positive effect of social network information on end-user developers' overall productivity; (2) we found that diverse types of webpages represent diverse kinds of hints, which can impact productivity positively, and we used various metrics of diversity to see which best capture the relation between diverse hints and productivity; (3) we found that, depending on the foraging goal, different types of webpages serve the goal in different ways, affecting the ease and time cost of fulfilling it, and we characterize and categorize webpages into four types of foraging curve styles to support end-user developers' seeking and navigation behavior; (4) we identify end-user developers' constant revisit behavior during information seeking and reuse, and we design tool support to ease this behavior and thereby reduce its time cost. In summary, the overall contribution of this thesis is that it provides both principled guidelines and concrete tool support to end-user developers to improve their productivity, mainly by improving solution quality and reducing time cost.



Acknowledgements

First and foremost, I want to thank my advisor, Dr. Nan Niu. It has been an honor to be his Ph.D. student and to work with him. He cares deeply about his students and always spends a great amount of time and effort guiding us in research and looking after us in other aspects of life. I have learnt more things from him than I can enumerate; the most important include working in a serious and responsible manner, diligence, critical thinking, and attention to detail. I believe I will benefit from these skills for the rest of my life. I truly appreciate all his contributions of time, effort, ideas, trust, and funding that made my Ph.D. experience productive and stimulating.

I want to thank all the professors on my committee for their support and guidance.

Dr. Raj Bhatnagar gave me tremendous help, especially in the first two years of my Ph.D. life. He cares about me greatly and always explains things patiently. I want to thank Dr. Michael Wagner for the time, effort, and critical eye he devoted to improving the quality of my thesis; from him I learnt to be serious about and responsible for my research and work. I also want to thank Dr. Michael D. Sokoloff for his guidance and rigor in research, and for pointing out my weaknesses so that I could improve. Finally, I want to thank Dr. Carla Purdy for her approval of and encouragement for my work, which made me more confident in my work and in myself.


I want to thank all the members of the Software Engineering Research Lab, who have contributed immensely to both my life and my work, through their friendship, good advice, collaboration, constant encouragement, and the fun times we shared. I am especially grateful for the help of Wentao Wang, Charu Khatwani, and Arushi Gupta. I also want to thank all my friends, too many to enumerate, from my time at the University of Cincinnati; their help and support are everywhere in my everyday life.

Lastly, I would like to thank my family for all their love, support, and encouragement. I am grateful to my parents, who have sacrificed a lot to raise me, educate me, and support me in pursuing my goals. I want to thank my sister Xiaofang, also in the United States, for her help in my daily life and her advice on my work and career. Thank you.

Xiaoyu Jin University of Cincinnati November 2017


Table of Contents

1 Introduction 1
  1.1 Pragmatic software reuse 2
  1.2 Development social network 4
  1.3 Thesis contributions and organization 6

2 Background and related work 9
  2.1 Pragmatic software reuse 9
  2.2 Information foraging theory 11
    2.2.1 Foraging theory's applications in software engineering 11
    2.2.2 Social information foraging 13
    2.2.3 Patch model 16

3 Characteristics of end-user developers 19
  3.1 Concept of end-user developers 19
  3.2 End-user software engineering 22
    3.2.1 Software requirements 27
    3.2.2 Design specifications 28
    3.2.3 Reuse 29
    3.2.4 Testing and verification 33
    3.2.5 Debugging 34
  3.3 Summary 36

4 Quality and value improvement 37
  4.1 Definition of productivity 37
  4.2 How social network information supports information needs 39
    4.2.1 Study design 39
      4.2.1.1 Participants 40
      4.2.1.2 Tasks 41


      4.2.1.3 Procedures 46
    4.2.2 Information needs in pragmatic software reuse 47
    4.2.3 Usage of social network information 51
      4.2.3.1 Usage flow model of social network information 51
      4.2.3.2 Before reuse 53
      4.2.3.3 During reuse 53
      4.2.3.4 After reuse 54
    4.2.4 Supporting the needs with social network information 55
    4.2.5 Improve productivity with social network information diversity 62
  4.3 How the diversity of social network information impacts productivity 71
    4.3.1 Categorizing social network information 71
    4.3.2 Results and analysis 75
      4.3.2.1 Raw data extraction 76
      4.3.2.2 Data refinement 77
      4.3.2.3 Log-normal regression 81
      4.3.2.4 Interactions between categories 84
  4.4 Threats to validity 85
    4.4.1 Threats to internal validity 85
    4.4.2 Threats to external validity 86
    4.4.3 Threats to construct validity 87
  4.5 Summary 87

5 Cost estimation and reduction 89
  5.1 Estimating time cost through foraging curve styles 89
    5.1.1 Motivations 89
    5.1.2 Related work 92
      5.1.2.1 Cost evaluation & cost reduction 92
    5.1.3 Our approach 94
      5.1.3.1 Cost estimation 94


      5.1.3.2 Tool design 98
    5.1.4 Evaluation design 101
    5.1.5 Results and analysis 102
      5.1.5.1 Reduced time for task completion 102
      5.1.5.2 Shifted pattern for foraging curves 105
      5.1.5.3 Reduced number of unproductive cases of foraging 107
    5.1.6 Threats to validity 110
    5.1.7 Summary 111
  5.2 Reduce time cost of short-term revisit behavior 112
    5.2.1 Motivation 112
    5.2.2 Related work 114
      5.2.2.1 Web page revisitation 114
    5.2.3 Exploratory study of end-user developers' revisit behavior 117
      5.2.3.1 Data preprocessing 117
      5.2.3.2 How many visits are revisits? 118
      5.2.3.3 What support can we provide? 121
    5.2.4 Results and analysis 125
      5.2.4.1 Reduce revisit time with EasyRevisit 125
      5.2.4.2 Integration of information foraging model 128
    5.2.5 Threats to validity 133
    5.2.6 Summary 133

6 Conclusions 135

References 143

Appendix A: Detailed task description 159


List of Tables

3.1 Qualitative differences between professional and end-user software engineering 24
3.2 Detailed differences between professional and end-user software engineering 25
4.1 Social network information (SNI) provided for the ImageJ reuse task 44
4.2 Social network information (SNI) provided for the StochKit reuse task 45
4.3 Experimental block assignments 46
4.4 Comparing the number of links followed to fulfill the information needs 66
4.5 Comparing the time used to fulfill the information needs 67
4.6 Raw data extracted from experiment video 76
4.7 Calculating rate of gain 77
4.8 Categorizing SNI and assigning X-axis values according to SNI type 78
4.9 Categorizing SNI and assigning X-axis values according to contributor role 79
4.10 Lognormal regression parameters 83
4.11 Bivariate lognormal regression parameters 84
5.1 Comparison of task completion time 103
5.2 Distribution of webpages according to four categories of foraging curves 105
5.3 Distribution of webpages for unproductive and productive cases 108
5.4 Fragmented behavior from participant's video 118
5.5 Statistical results of visits and revisits 120
5.6 Statistical results for ImageJ 126
5.7 Statistical results for StochKit 127


List of Figures

2.1 Lognormal distribution of information foraging prediction 14
2.2 Charnov's marginal value theorem patch model 16
3.1 Programming activities along dimensions of experience and intent 22
4.1 Factors in multiple dimensions for productivity 38
4.2 ImageJ reuse task 42
4.3 StochKit reuse task 42
4.4 Categories of information needs in pragmatic software reuse 49
4.5 Usage flow model of social network information 52
4.6 SNI usage by tasks and information-need categories 56
4.7 Comparison of task completion time 57
4.8 Successful situation of task completion for StochKit task 59
4.9 Partially successful situation 60
4.10 Unsuccessful situation 60
4.11 Comparison of reuse solution success 61
4.12 An example of a webpage containing only source code 63
4.13 An example of the webpage of the ImageJ plot API 64
4.14 Usage explanation of function "addLegend" 65
4.15 Stack Overflow page showing a code example of function "addLegend" 65
4.16 Source code snippet from a blog-like webpage 69
4.17 Comparing SNI diversity by task completion groups 70
4.18 A sample SNI website 73
4.19 Fitting data points into a curve 79
4.20 Lognormal regression with Excel Solver 82
4.21 Fitting lognormal regression curve 82
5.1 Foraging curve: linear 96
5.2 Foraging curves: (a) conceptual foraging; (b) answer-seeking foraging 96
5.3 An example result with our tool support 100


5.4 A trend shifting to the Ranked Foraging category when having tool support 106
5.5 Categories of revisit behaviors 116
5.6 A of our tool support 123
5.7 An example showing within-patch time when reducing between-patch time 131


Chapter 1

Introduction

Software is a critical enabler of advances in our understanding and of innovative discoveries in many areas, such as physics, chemistry, biology, and business. In fact, the software engineering challenges have grown so immense that, in the United States, for example, the primary biomedical and health-related funding agency, the National Institutes of Health (NIH), began investigating ways to better discover software, namely to help the biomedical research community locate and reuse software [12]. These challenges in the biomedical research community motivate us to study not only biomedical researchers, but non-professional developers as a whole, whom we define as end-user developers.

End-user developers are people who write programs, but not as their primary job function. Instead, they write programs as a means to support their main goal, which is something else, such as accounting, designing a webpage, doing office work, scientific research, or entertainment. The number of end-user programmers in the United States falls somewhere between 12 million and 50 million people, several times the number of professional programmers [1]. As programming becomes easier and computer science education reaches even elementary schools, the number of end-user developers will only continue to grow. End-user developers tend to work opportunistically [2], and the software they create tends to lack quality considerations [1], because software quality is not their primary concern: they use software as a tool to achieve other goals [3]. To improve end-user developers' productivity and the quality of their software, a comprehensive analysis of their programming process is crucial.

The overall objective of this thesis is to support end-user developers in improving their productivity. The topics under study include what their programming process is, and how they rely on online resources and reuse existing code to fulfill their programming tasks. During programming, information and knowledge are essential for structuring their mental models and performing the actual implementation, so how they seek programming-related web information largely influences their efficiency and productivity. We therefore hypothesize that we can improve end-user developers' productivity by analyzing their information needs during programming and observing how they seek the required information. Specifically, we apply social information foraging theory [4] to model their programming process and identify important factors that could potentially be used to improve their productivity.


Our rationale is that foraging theory offers a coherent set of rules to explain software engineering phenomena and to make predictions about developers' behaviors at an appropriate level of abstraction [5]. The theory has already been extensively and successfully applied to website design and evaluation [6], [7] and to software engineering [8]-[11]. Our work builds on the theory and provides concrete tool support for improving end-user developers' productivity. The overall contribution of this thesis is that it provides both principled guidelines and concrete tool support to end-user developers, improving quality and reducing time cost, and thus productivity, during programming.

Our research design rests on two characteristics of end-user developers' programming: (1) their reuse practice is pragmatic software reuse; and (2) they depend heavily on social network information. We introduce these two characteristics next.

1.1 Pragmatic software reuse

The concept of reuse spans a spectrum of levels, ranging from use or modification of an existing package to development of new code that (opportunistically) incorporates snippets from existing packages or online websites. Reuse in this thesis is limited to reusing existing software or packages with modifications to achieve new features or functionality. For end-user software development, an essential challenge is pragmatic software reuse, which uses software artifacts that were not necessarily developed with reuse in mind [13]. In contrast to pre-planned software reuse such as product line engineering [14], pragmatic reuse recognizes the opportunistic and exploratory nature of reuse decisions, manifested in practices like copy-paste-modify.

A unique aspect of this domain is that the programmers are often researchers whose principal training is not in software engineering but in, say, biomedical fields. In their daily work, pre-planned software reuse may not be instrumented or enforced, leaving pragmatic reuse the only feasible option.

Current approaches to pragmatic reuse support the developer's explicit recording of a reuse plan and automatically enact certain steps of the plan [15], define metrics to indicate the effects of reuse on project performance [16], and assist in reusable component extraction by iteratively analyzing the structural complexity of code [17]. In addition, various code search and recommendation mechanisms have been proposed [18]-[22], focusing on white-box reuse, where existing code needs internal modification to fit the target system. These approaches provide helpful features for facilitating the reuse process, such as utilizing classic information retrieval techniques [18], treating the Web as a code component repository [19], and automating code search and reuse [20]. Despite this contemporary support, pragmatic software reuse remains a difficult endeavor for developers. Among the salient challenges are the dependencies surrounding the reusable code and the breakdowns experienced when the code is integrated into the target system [13].


Such issues can be regarded as instances of architectural mismatch [23], a persistent difficulty in software reuse [24]. Architectural mismatch stems from the incompatible assumptions that each reused part makes about its operating environment.

Although pragmatic reuse is often labeled ad hoc [15], we believe it should not be performed without explicit architectural considerations. Existing pragmatic reuse approaches, however, have not thoroughly examined the role of software architecture in pragmatic reuse.

1.2 Development social network

Social interactions of software engineers have been studied from various technical and organizational aspects. For example, given a particular organization or project, people can “be friends” with the work items that they share [25], optimal group size can be determined by social information foraging principles [26], and latent sub-communities can be identified based on email exchanges [27].

Today’s generation of software developers, especially end-user developers [1], [28],

[29], frequently make use of social network information (SNI) to solve their problems during software programming tasks [30]. What have recently emerged to shape software engineering practices are the online social networks where developers collaborate and exchange ideas and expertise. These technologies include community portals, Q&A sites, wikis, forums, and microblogs [31]. Some famous examples include Github, Reddit, Stack

Overflow, Twitter, , Hacker News, and TopCoder. Reviewing feeds,

5

watching projects, and following others are the most used social networking functionalities among today’s software developers [32]. Surprisingly, little is known about how software reuse can utilize and even strengthen the online social networks. This lack of knowledge is especially prominent in pragmatic, white-box reuse tasks, which means the sub-modules of a program can be viewed, accessed, and modified by programmers. If a can examine source code, weaknesses in an algorithm are much easier to discover. That makes white box reuse much more effective than black box reuse but considerably more difficult from the sophistication needed for the programmer to understand the sub-modules.

Exploiting online information to support software reuse, even before the social networking era, is not without problems. Hummel and Atkinson [19] pioneered the systematic investigation of the Web as a reuse repository; however, the web services that they deployed for white-box reuse experienced discontinuation and returned disappointing results. Happel et al. [18] found that most source code search engines focused on retrieving lines of code and often lacked the capability to help the re-user explore in-depth connected information. Zou and colleagues [33] proposed an automated approach to searching software reuse candidates and using the developer comments extracted from online social networks to perform sentiment analysis of the candidates. In sum, the support so far has been extensive on code search but not on the reuse per se. Understanding how social network information can help carry out the actual reuse task is one of the focuses of our research.

1.3 Thesis contributions and organization


Motivated by the two aspects of pragmatic software reuse and heavy reliance on social network information, we conduct observational experiments by inviting end-user developers to carry out software reuse tasks to further analyze their programming behaviors.

From the experiments, we performed in-depth analyses, and the observations and results are summarized in Chapters 3, 4, and 5. Chapter 2 describes the background and related work.

In Chapter 3, we investigate how the software programming practices of end-user developers are distinct from those of professional developers. We summarize the distinctions along five aspects: requirements, specifications, reuse, testing and verification, and debugging. Overall, professional software engineering practices are explicit, pre-planned, cautious, and systematic, while end-user software engineering practices are implicit, unplanned, overconfident, and opportunistic.

In Chapter 4, we first analyze end-user developers' information needs by extracting the questions they asked while performing software reuse tasks. Our results contain 31 specific questions, which we grouped into four categories according to architectural concerns. We also validate the positive effect that social network information brings to end-user developers and test the extent to which online social networks support the needs of software reuse tasks. During this analysis, we observed that diverse types of webpages represent diverse kinds of hints, which can impact productivity positively. We therefore use various metrics to differentiate the diversity and investigate which metrics best capture the relation between the diverse hints and productivity.

In Chapter 5, we draw inspiration from social information foraging theory. We found that, depending on the foraging goal, different types of webpages serve the goal differently in terms of ease and time cost. We characterize and categorize webpages into four types of foraging curve styles to support end-user developers' seeking and navigation behavior.

We then design tool features and perform experiments to test the effectiveness of the four foraging curve styles. We also observe the constant revisit behavior of end-user developers during their information seeking and reuse, and we design tool support to ease this behavior, thereby reducing its time cost.

The outcome and contribution of our work lie in three aspects: (1) we extend the theoretical insights of information foraging theory, adapting and applying them to empirical data analysis; (2) we perform observational experiments to study end-user developers' behavior in detail; (3) we develop principled guidelines as well as actual tools that support end-user developers in improving their productivity.


Chapter 2

Background and related work

2.1 Pragmatic software reuse

Software reuse attempts to improve software quality and developer productivity by leveraging existing code and knowledge [38]. Two approaches can be distinguished by how the reusable artifacts are created and used. Pre-planned approaches, such as object-oriented inheritance and product line engineering, explicitly build artifacts for reuse so that subsequent software product or system development can be carried out with reuse. In contrast, pragmatic approaches, such as code scavenging [38] and opportunistic programming [39], facilitate the reuse of software artifacts that were not necessarily designed for reuse [13]. While the distinction is not always clear cut, a key difference is that pre-planned approaches assume that a reusable part exists that either fits perfectly or that the target system can be adapted to fit, whereas pragmatic approaches treat the reusable artifact itself as a legitimate target for modification [15].


Maras et al. [13] identified three steps in pragmatic software reuse, based on their experience with Web application development: locating the source code of an individual feature, analyzing and modifying the code, and integrating the code into the target system. These steps are in line with the process model described by Holmes and Walker [15], who introduced a tool named Gilligan to support a Java developer's recording of a pragmatic reuse plan. Gilligan also helped automate simple cycles of the plan (e.g., copying a manually found element, pasting it in a manually determined location, and flagging syntactic warnings). Their experiments with 16 participants (2 undergraduates, 7 graduate students, and 7 industrial developers) using Gilligan showed that, compared to locating reusable code, much more difficulty occurred in analyzing the code, especially in resolving dangling dependencies to libraries, types, methods, and fields [15].

The difficult-to-resolve dependencies reflect incompatibilities not only at the source code level, but also at the software architecture level. Architecture-centric reuse approaches date back at least to the work of Garlan et al. [23], who argued that a main reason software architecture is important is that it lets designers exploit recurring architectural styles to reuse routine solutions for certain classes of problems. Drawing from their experience of failing to build a system from reusable parts, Garlan et al. [23] recognized a root cause in the conflicting assumptions among the parts and termed this phenomenon "architectural mismatch". Generally speaking, four categories of assumptions can lead to architectural mismatch: the nature of the components, the nature of the connectors, the global architectural structure, and the construction process.


Researchers have advanced architecture-centric reuse by trying to avoid or tolerate mismatch [40], to sustain evolutionary stability [41], and to catalog specialized solutions specific to a particular domain in a way that restricts the range of permissible components and their interactions [42]. Beyer et al. [43] reported a success story of introducing a product-line architecture to a small software team of 2 developers and 1 tester: over 4 iterations, an organization-specific software architecture was established, static architectural compliance checks were performed, and reduced development effort was observed. In [43], the main benefit of the product-line architecture was to help communicate and negotiate competing stakeholder concerns within the same organization. In our work in Chapter 4, one emphasis is on examining the role of general software architectural styles in pragmatic code reuse without any organizational boundary.

2.2 Information foraging theory

2.2.1 Foraging theory’s applications in software engineering

Humans seeking information adopt various strategies, sometimes with striking parallels to those of animal foragers. Animals adapt, among other reasons, to increase their rate of energy intake [43], and they evolve different methods to do so: a wolf hunts for prey, whereas a spider builds a web and lets the prey come to it. The wolf-prey strategy bears some resemblance to classic information retrieval [43], and the spider-web strategy to information filtering [44]. Information foraging theory assumes that humans are well adapted to the excess of information in the world around them and have evolved strategies to efficiently find information relevant to their needs. Pirolli [45] has successfully applied the core mathematics of optimal foraging theory to study human behavior during information-intensive tasks like Web navigation. The WUFIS (Web User Flow by Information Scent) algorithm [45] is one of the most rational models applying this mathematics to Web navigation. Computing the web user's "information diet" provides remarkable insights into issues like link selection and the decision to leave a webpage. As a result, information foraging theory has become extremely useful as a practical tool for website design and evaluation [6], [7].
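To make the notion of "information scent" concrete, the following minimal sketch scores candidate links by the similarity between a user's goal keywords and each link's cue words. This is only an illustrative proxy: WUFIS itself estimates scent with spreading activation over a word co-occurrence network, and all names, word lists, and weights below are our own assumptions, not part of WUFIS.

```python
import math
from collections import Counter

def scent(goal_words, link_words):
    """Cosine similarity between a goal's keywords and a link's cue words:
    a crude stand-in for the information scent a forager perceives."""
    g, l = Counter(goal_words), Counter(link_words)
    dot = sum(g[w] * l[w] for w in set(g) & set(l))
    norm = math.sqrt(sum(v * v for v in g.values())) * \
           math.sqrt(sum(v * v for v in l.values()))
    return dot / norm if norm else 0.0

# A forager tends to follow the link whose cues "smell" most like the goal.
goal = ["plot", "legend", "image"]
links = {
    "ImageJ Plot API": ["plot", "api", "image", "chart"],
    "Account login help": ["account", "password", "login"],
}
best = max(links, key=lambda name: scent(goal, links[name]))
```

Under such a model, a low scent across all visible links is what triggers the "decision to leave a webpage" mentioned above.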

Inspired by humans' adaptive interaction with information on the Web, researchers began to apply foraging theory in software engineering. Notably, Lawrance et al. [8]-[11] have made tremendous strides in understanding programmer navigation during debugging by viewing the programmer as a predator and the bug fix as prey. Building on Pirolli's work, Lawrance et al. [9] developed the PFIS (Programmer Flow by Information Scent) model. Extending beyond it, Lawrance et al. [11] presented the PFIS2 model, which incorporates the incremental changes in programmers' navigation goals during debugging. More recent work has focused on empirically assessing the predictive accuracy of programmer navigation models [46] and on optimally composing single factors (e.g., recency, spatial proximity) into a family of PFIS3 models [47].

In summary, foraging theory offers a coherent set of rules to explain software engineering phenomena and to make predictions about developers’ behaviors at an appropriate level of abstraction. The theory thus unifies many isolated approaches that would otherwise not be linked in a meaningful way. An example is Fleming and colleagues’ work unifying various tools in debugging, refactoring, and reuse under foraging theory’s common abstraction [48]. This allows software engineering research and tool building to be carried out in an integrated and principled manner [22].

2.2.2 Social information foraging

The applications of foraging theory in software engineering have so far mainly focused on the individual level. However, today’s software is rarely developed by soloists; it is the result of collective action. Pirolli has extended information foraging theory to the social level [4]. This multilevel model is derived from quantitative theories of cooperative problem solving [49] and of foraging in social groups [50]. The key assumption connecting solo-level and social-level foraging is a shared set of hints, which indicate likely locations of useful information that will yield some amount of utility for one or more foragers [4].

Hints, for example, can be in the form of tags in social tagging systems [4] or in software development environments [51]. Shared tags, contributed by individuals, provide navigation paths (hints) to available content that potentially improve information search.

Building on the “tags-as-hints” instantiation [4], we posit that many other types of individual contributions (e.g., blogs, wiki edits, question answers) can also be treated as hints, which, if shared, will offer varying amounts of utility to a developer’s information foraging.


Such an effect is quantitatively described in [4]. Figure 2.1 illustrates the theoretical prediction. It is the diversity of hints (H), rather than the hints themselves, that is predicted to impact an individual forager’s cumulative rate of gain (R). Hint diversity is important because hints may vary in the validity of the search information they convey, in how they are interpreted by the information forager who receives them, and in their effectiveness depending on when they are exchanged in the search process [4].

For instance, to the extent that hints may contain correlated or even redundant search information, the effectiveness of hints will depend on which hints have already been processed. The H in Figure 2.1 should therefore be interpreted as the number of distinct kinds of hints, i.e., hints with independent heuristic effectiveness.

Figure 2.1: Lognormal distribution of the information foraging prediction, adapted from [4]. The rate of gain for solo foraging is constant, while for social foraging the rate of gain first increases rapidly due to diverse hints, then decreases as the hints gradually overlap and become repetitive.


Because hints can overlap, their effect on foraging efficiency is not monotonically increasing. Drawing on the derivation by Huberman [49], Pirolli [4] showed that the probability density for finding valuable information can be cast as a lognormal function of the sample of hints contributed by individuals, as shown in Figure 2.1. The lognormal distribution makes interesting predictions about the productivity of a forager receiving hints with respect to high-utility search results. If one assumes that the various states of a search space have a binomial distribution of utilities, then the search performed by a solo forager with mildly effective but unshared hints will return the distribution of result values shown by the solo curve in Figure 2.1. Increasing hint diversity shifts that distribution toward a lognormal and especially increases the likelihood of search results at the higher end of the utility spectrum. As ever more kinds of hints are received, however, productivity begins to decrease due to their diminished value (e.g., redundant hints) and the non-negligible cost of processing them.
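As a toy numerical illustration of this qualitative prediction (not Pirolli’s actual derivation), one can model the net rate of gain as a saturating benefit from distinct hints minus a linear processing cost. The functions and parameter values below are invented purely for demonstration; what they reproduce is the qualitative shape of Figure 2.1, where the rate of gain peaks at an intermediate hint diversity.

```python
# Toy model (not Pirolli's derivation) of the Figure 2.1 prediction:
# distinct hints help at first, but overlap (saturating benefit) and a
# linear processing cost eventually dominate. Parameters are invented.
from math import exp

def rate_of_gain(H, solo_rate=1.0, benefit=2.0, overlap=0.5, cost=0.15):
    """Net rate of gain with H distinct kinds of shared hints.
    The benefit saturates as hints grow redundant; the cost of
    processing hints grows linearly with their number."""
    return solo_rate + benefit * (1.0 - exp(-overlap * H)) - cost * H

rates = [rate_of_gain(H) for H in range(0, 21)]
best_H = max(range(21), key=lambda H: rates[H])
print(f"peak rate {rates[best_H]:.2f} at H = {best_H}")
```

Running this shows the social rate of gain rising above the solo baseline (H = 0) before falling back below it, matching the rise-then-decline shape the theory predicts.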

In summary, social information foraging models like the one depicted in Figure 2.1 extend the theory’s explanatory and predictive power to the social-level phenomena of foraging with shared hints. How well does the theoretical prediction about hint diversity’s effect on productivity hold up against empirical results? This is precisely the question that drives our research in Chapter 4, where we analyze how the diversity of webpage types impacts overall productivity.


Figure 2.2: Charnov’s marginal value theorem from [52]: (a) the rate-maximizing time to spend in a patch, t*, occurs when the slope of the within-patch gain function g equals the average rate of gain, which is the slope of the tangent line R; (b) when the between-patch time cost decreases, the average rate of gain increases and the rate-maximizing time to spend decreases. To better distinguish tB and tW, they are depicted in opposite directions along the x axis for demonstration purposes only; this has the same effect as a traditional coordinate representation.

2.2.3 Patch model

Information Foraging Theory was originally inspired by appeals in the psychology literature for an ecological approach to understanding human information gathering and sense-making [45]. Pirolli [45] laid out the basic analogies between food foraging and information seeking: a predator (a human in need of information) forages for prey (the information itself) along patches of resources and decides on a diet (what information to consume and what to ignore). The patch model is a core component of information foraging theory, originating from food foraging theory. For instance, imagine a bird that forages for berries found in patches on berry bushes. The forager first needs to expend some between-patch time getting to the next food patch. Once in a patch, the forager spends within-patch time foraging for food and must also decide when to leave the patch for the next one [52].

In Information Foraging Theory, a patch structures information in a certain way and can take many forms, such as a book, a webpage, a source code file, a code section, or even a line of code. As shown in Figure 2.2-(a), Charnov’s marginal value theorem [1], [52] depicts that as the foraging time within the patch increases, the cumulative amount of useful information gained from the patch (represented as g(tw)) increases. The curve increases at a decreasing rate, based on the assumption that the valuable information gained in a patch diminishes as time progresses. This assumption rests on the observations that foragers generally prioritize the more valuable information first, and that later foraging may yield redundant information replicating what was encountered earlier [52]. Further, Charnov’s marginal value theorem predicts that a forager should remain in a patch as long as the slope of g(tw) is greater than the average rate of gain, R, for the environment. If the between-patch time tB is reduced by between-patch enrichment, the within-patch time should also be reduced, and the average rate of gain increases to achieve the optimal performance, as shown in Figure 2.2-(b). The patch model drives our study in Chapter 5 to evaluate the cost of foraging a webpage, thereby helping developers select and forage an optimal webpage link.
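The theorem’s prediction can be checked numerically. The sketch below assumes a diminishing-returns gain function g(t) = G(1 - exp(-k*t)), an invented but typically shaped curve, and grid-searches for the residence time t* that maximizes the average rate of gain R = g(t) / (tB + t):

```python
# Numerical sketch of Charnov's marginal value theorem with an assumed
# diminishing-returns gain function g(t) = G * (1 - exp(-k * t)).
# We grid-search the within-patch time t* maximizing R = g(t)/(t_B + t).
from math import exp

def optimal_residence(t_B, G=10.0, k=0.5, grid=20001, t_max=20.0):
    """Return (t*, R*) for between-patch travel time t_B."""
    g = lambda t: G * (1.0 - exp(-k * t))   # cumulative within-patch gain
    best_t, best_R = 0.0, 0.0
    for i in range(1, grid):
        t = t_max * i / (grid - 1)
        R = g(t) / (t_B + t)                # average rate of gain
        if R > best_R:
            best_t, best_R = t, R
    return best_t, best_R

t_far, R_far = optimal_residence(t_B=4.0)    # costly travel between patches
t_near, R_near = optimal_residence(t_B=1.0)  # between-patch enrichment
print(f"t* = {t_far:.2f} (R = {R_far:.2f}) vs t* = {t_near:.2f} (R = {R_near:.2f})")
```

Consistent with Figure 2.2-(b), the run with the smaller between-patch time yields both a higher average rate of gain and a shorter rate-maximizing residence time.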


Chapter 3

Characteristics of end-user developers

End-user developers differ from professional developers in several characteristics. In this chapter, we discuss the characteristics of end-user developers in detail, covering the concept itself, requirements, specifications, reuse, testing and verification, and debugging. These characteristics help us gain a deeper understanding and drive our further research in later chapters.

3.1 Concept of end-user developers

In order to define end-user developers, we first need to refer to end-user programming, a phrase popularized by Nardi in her investigations into spreadsheet use in office workplaces [34]. An end user is simply any computer user. End-user programming was later defined by Ko et al. [3] as programming to achieve the result of a program primarily for personal, rather than public, use. The key distinction is that the program is not primarily intended for use by a large number of users with varying needs. For instance, a teacher may write a grades spreadsheet to track students’ test scores, a photographer may write a Photoshop script to apply common filters to a hundred photos, or a caretaker may write a script to help a person with cognitive disabilities be more independent [53]. In these end-user programming situations, the program is a means to an end and only one of potentially many tools that could be used to accomplish a goal [3]. The definition also includes a skilled software developer writing “helper” code to support some primary task [3]. For example, a developer is engaging in end-user programming when writing code to visualize a data structure to help diagnose a bug. Here, the tool and its output are intended to support the developer’s particular task, not a broader group of users or use cases [3]. From this perspective, the definition of end-user developers depends not on a certain person but on the specific task the developer is doing. A professional developer can at times be an end-user developer.

Similarly, an end-user developer such as a bioinformatics researcher can also be a professional developer when he or she is involved in developing online technical services provided to the whole research group.

Ko et al. [3] also discussed the definitions of professional developers in their work.

In contrast to end-user programming, professional programming has the goal of producing code for others to use. The intention can be to make a profit for a company, to provide a public service, or even to write code for fun. Therefore, the moment novice web designers move from designing a web page for themselves to designing one for someone else, the nature of their activity has changed. The moment this shift in intent occurs, the developer must plan and design for a broader range of possible uses, increasing the importance of design and testing and the prevalence of potential bugs.

It is important to distinguish the concept of end-user developers from that of inexperienced developers. Professional developers with years of experience may also engage in end-user programming by writing code for their personal use, with no intent to share their program with others [3]. They can complete their end-user programming tasks more quickly and with fewer bugs. However, they will not approach the work with the same quality goals they apply when producing code for others to use. This distinction is summarized in Figure 3.1, adapted from [3], which portrays programming experience and intent as two separate dimensions.

Computer science students and professional developers may both code with the intent of creating software for others to use, but they vary in their experience. Similarly, end-user programming involves programming for personal use, but can span a wide range of programming expertise. For example, a bioinformatics researcher may have years of experience working in a software development company before deciding to pursue a Ph.D. degree in bioinformatics, becoming an end-user developer upon starting to code to produce research results. However, among end-user developers there are many more inexperienced programmers than experienced ones [3]. The key distinction of intent does not imply that the distributions of experience among professional and end-user developers are the same.


Figure 3.1: Programming activities along dimensions of experience and intent, adapted from [3]. Note that the upward slant of end-user programming indicates that people with more experience tend to plan for other uses of their code.

3.2 End-user software engineering

Having understood professional and end-user developers, we now discuss how software engineering activities differ between them. In the previous section, we established that the intent behind programming is the key distinction between end-user programming and other programming activities.

Programmers’ intents determine to what extent they consider concerns like reuse, reliability, robustness, and maintainability, and the extent to which they invest in activities that improve these qualities, like verification, documentation, testing, and debugging. Since software engineering is defined as systematic and disciplined activities that address software quality issues, the main difference between professional software engineering and end-user software engineering is the amount of attention given to software quality concerns [3].

In professional software engineering, the amount of attention given to software quality concerns is much greater. If a software program aims to be used by millions of users, each with unique usage scenarios and varying concerns, the programmers on the team must consider quality rigorously and regularly in order to succeed. Systematicity, discipline, and quantification are common characteristics required of professional software engineering. These characteristics all require significant time and attention, such that professional software developers generally spend more time testing and maintaining code than developing it [54], and their teams, communication, and tools are often structured around performing these activities [55].

Although end-user software engineering has different characteristics, this does not mean that it lacks systematic and disciplined activities addressing software quality issues. However, these activities are secondary to the goal that the program is helping to achieve, since the primary goal is to achieve results in the developers’ own domains of expertise. Due to this difference in priorities and the opportunistic nature of end-user programming, end-user developers rarely have the interest or the time for systematic and disciplined software engineering activities. For instance, previous work by Segal [56] reported on several teams of scientists engaging in end-user programming, finding that the software itself was not valued, that the process of developing software was highly iterative and unpredictable, and that testing was not considered important compared with other domain-specific risks. These differences are summarized by Ko et al. [3] in Table 3.1, showing that end-user software engineering can be characterized by its unplanned, implicit, opportunistic nature, due primarily to the priorities and intents of the programmer, but perhaps also due to inexperience.

Table 3.1: Qualitative differences between professional and end-user software engineering, adapted from [3].

Software Engineering Activity | Professional SE | End-User SE
Requirements | Explicit | Implicit
Specifications | Explicit | Implicit
Reuse | Planned | Unplanned
Testing and Verification | Cautious | Overconfident
Debugging | Systematic | Opportunistic

Recognizing these differences, one might think that we should change or improve end-user developers’ behavior so that they adopt professional software engineering practices. However, we consider this neither the right mentality nor the right approach to supporting end-user developers. As mentioned in [3], the real challenge of end-user software engineering research is to find ways to incorporate software engineering activities and tool support into end-user developers’ existing workflow, without requiring people to substantially change the nature of their work or their priorities, because their goals, which determine their priorities and behavior, cannot be changed by external influence. For example, rather than expecting spreadsheet users to incorporate a testing phase into their programming efforts, tools can simplify the tracking of successful and failing inputs incrementally, providing feedback about software quality as the user edits the spreadsheet program [3]. In our work, we apply the same mentality to supporting end-user developers, aiming to let users stay focused on their primary goals, such as teaching children, recording a television show, or making scientific discoveries, while still achieving software quality.
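The idea of incremental testedness feedback can be sketched roughly as follows. This is a deliberately minimal, invented illustration, far simpler than real systems such as WYSIWYT: each cell’s validated status is tracked, and editing a formula invalidates the cell and everything downstream of it, so a “testedness” percentage can be shown continuously as the user works.

```python
# Minimal, invented sketch of incremental testedness tracking for a
# spreadsheet (inspired by, but far simpler than, tools like WYSIWYT).
class Sheet:
    def __init__(self):
        self.deps = {}          # cell -> set of cells its formula reads
        self.validated = set()  # cells the user has marked as correct

    def edit(self, cell, reads=()):
        """(Re)define a cell's formula; invalidate it and its dependents."""
        self.deps[cell] = set(reads)
        for c in [cell] + self._dependents(cell):
            self.validated.discard(c)

    def mark_correct(self, cell):
        self.validated.add(cell)

    def testedness(self):
        return len(self.validated) / max(len(self.deps), 1)

    def _dependents(self, cell):
        """All cells that transitively read from `cell`."""
        out, frontier = [], [cell]
        while frontier:
            cur = frontier.pop()
            for c, reads in self.deps.items():
                if cur in reads and c not in out:
                    out.append(c)
                    frontier.append(c)
        return out

s = Sheet()
s.edit("A1"); s.edit("B1", reads=["A1"]); s.edit("C1", reads=["B1"])
s.mark_correct("A1"); s.mark_correct("B1"); s.mark_correct("C1")
s.edit("A1")  # editing A1 also invalidates B1 and C1
print(f"testedness: {s.testedness():.0%}")
```

The point of the sketch is that the feedback arrives as a side effect of ordinary editing: the user never runs a separate “testing phase,” which is exactly the kind of workflow-preserving support argued for above.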

Extending from Table 3.1, we summarize more characteristics of end-user software engineering as shown in Table 3.2. These characteristics still follow the five main categories of requirements, specifications, reuse, testing and verification, and debugging.

We will describe these characteristics in detail next.

Before that, we must bear in mind the key difference, namely the concept distinction: professional software is developed for public use, while end-user software is developed primarily for personal use (Table 3.2). This concept distinction is important because it is the root cause of many of the other characteristics discussed next.

Table 3.2: Detailed differences between professional and end-user software engineering.

Software Engineering Activity | Professional SE | End-User SE
Concept distinction | Public use | Primarily personal use
Requirements | Explicit | Implicit
Source of requirements | Customers | End user himself/herself
Planning of requirements | Pre-planned | Emerging
Stability of requirements | More stable | Tend to change
Specifications | Explicit | Implicit
Expected lifespan | Long term | Short term
Actual lifespan | Long term | Accidental long-term reuse
Reuse | Planned | Unplanned
Reuse strategy | Systematic | Opportunistic
Aim of reuse | Less time, maintainability | Less implementation time
Reusing challenges | Correctness, reliability | Programming environment and API
Reuse solution finding | Fastidious on quality, less fastidious on simplicity | Less fastidious on quality, fastidious on simplicity
Reuse adaptation | Process and comprehend thoroughly | Process and comprehend less thoroughly
Concept comprehensiveness | Understand before coding | Code without complete understanding
Adaptation strategy | Cautious; keep it concise | Copy-paste; tolerate redundancy
Testing and Verification | Cautious (as experience is gained) | Overconfident
Debugging | Systematic | Opportunistic; quick and dirty (could lead to additional errors)
Bug interpretation | Better grasp of results and failures | Lack accurate knowledge; difficulty conceiving of failure
Use of SE tools (IDE, debugging, etc.) | Pervasive, essential | Rare, sporadic


3.2.1 Software requirements

Software requirements are statements about how a program should behave in the world, as opposed to the internal behavior of a program, which is how it achieves these external concerns [3]. We have mentioned that requirements are explicit in professional software engineering but implicit in end-user software engineering. In professional software engineering, projects usually involve a requirements gathering phase, which results in requirements specifications. These specifications can be helpful for predicting the project’s resource needs and for negotiating with clients. In end-user software engineering, however, end-user developers rarely have an interest in expressing their requirements explicitly and clearly, and the requirements may only become clear in the process of implementation. By definition, the requirements in professional software engineering come from customers, while in end-user software engineering they come from the end users themselves. In professional software engineering, formal interviews and other methods are used to arrive at clearly defined requirements, and the challenge lies in understanding the context, needs, and priorities of other people or organizations. For end users, requirements are easily understood simply because they are the users’ own. Because the software is self-owned, the requirements are also more likely to change, since end users only need to negotiate changes with themselves. To summarize, requirements are explicit, pre-planned, and more stable in professional software engineering, while they are implicit, emerging, and tend to change in end-user software engineering.


Previous research has not attempted to explicitly support requirements capture in end-user software engineering, given that the requirements are implicit. However, some techniques could be applied to perform requirements elicitation. For example, the Whyline [37] allows users to ask questions about a program’s output, offering an implicit way to learn what the intended and unintended behaviors are. Another example is the goal-debugging work in the spreadsheet paradigm [57], which lets users inquire about incorrect values in spreadsheet output.

3.2.2 Design specifications

In software engineering, design specifications specify the internal behavior of a system, whereas requirements are external, in the world [3]. In professional software engineering, software designers translate ideas from requirements into design specifications, which help in formulating implementation strategies and in correctly prioritizing software quality concerns such as performance and reliability. This design process also ensures that all requirements have been accounted for.

In end-user software engineering, it might be unclear to end-user developers that having explicit specifications is beneficial in the long term and at large scale, because they may not expect to use their programs for long, although this expectation is often inaccurate in practice. For instance, a previous study found that end users are creating more and more complex spreadsheets, with typical corporate spreadsheets doubling in size and formula content every three years [3]. When end users create a spreadsheet, they may think they will not use it for the next task, but in fact they commonly reuse and extend previous spreadsheets to achieve new goals. To summarize, specifications are explicit in professional software engineering but implicit in end-user software engineering. The expected lifespan of software developed through a professional software engineering process is long term, while software by end-user developers is expected to be short-lived. In practice, however, professional software is indeed used long term, and software by end-user developers is often subject to accidental long-term reuse.

Since end-user developers’ designs and specifications, like their requirements, tend to be emergent, requirements and design specifications are rarely separate activities in end-user programming. This phenomenon is reflected in many design approaches targeted at end-user developers, which mainly aim to support evolutionary and exploratory prototyping rather than upfront design [3]. For example, Newman et al. developed DENIM [58], a sketching system for designing web sites that allows users to leave parts of the interface in a rough and ambiguous state. Green et al. [59] named this characteristic provisionality, meaning that the elements of a design can be partially, or even imprecisely, stated.

3.2.3 Reuse

Reuse generally refers to either a form of composition, such as “gluing” together components, APIs, or libraries, or a form of modification, such as changing existing code to suit a new context or problem [3]. In professional programming, developers commonly reuse code to fulfill their tasks, for example by copying code snippets, adapting example code, or using libraries and frameworks by invoking their APIs [60]. The motivations for these various types of reuse are usually to save time, to avoid the risk of writing erroneous new code, and to support maintainability [61].

These reuse practices also apply to end-user programming, but there remain considerable differences stemming from the difference in goals: software can be a product, or just a tool or a means to an end. Compared with professional developers, end-user developers reuse more to save time and less to achieve other software qualities like maintainability. Therefore, finding, reusing, and even sharing code is more opportunistic for end-user developers. Moreover, in end-user programming, reuse is often what makes a project possible at all, because without other code to reuse, an end-user developer may rather perform the task manually, or not at all, than write everything from scratch [62].

To reuse code, finding available code is the first step. It is challenging to find code and abstractions to reuse, or even to know whether available code exists at all [61]. For instance, a previous study found that students have difficulty finding relevant APIs when using Visual Basic.NET to implement user interfaces; in this situation, they tend to seek help from more experienced peers to find example code or APIs [37]. For both professional and end-user programming, example code and programs are a great source for discovering, understanding, and coordinating reusable abstractions and solutions [3]. In most cases, the examples found are fully functional, so that programmers can try them out and better understand how they work [63].

However, both professional and end-user developers can be fastidious in selecting a piece of code or an API for a certain function. The difference is that professional developers are more fastidious about quality and less fastidious about simplicity, whereas end-user developers are more fastidious about simplicity and less fastidious about quality. Simpler code better serves end-user developers’ goal of reducing development time.

After finding reusable code or abstractions, developers may have difficulty adopting code or abstractions provided by an API. The study of students using Visual Basic.NET found that most of the students’ difficulties related to determining how to use abstractions correctly, coordinating the use of multiple abstractions, and understanding why abstractions produced certain output [37]. In fact, the students made most of their errors in relation to the programming environment and the API. For example, many students had problems passing data from one window to another, and they ran into null pointer exceptions and other inappropriate program behavior. These errors were primarily caused by choosing the wrong API construct or violating usage rules in the coordination of multiple API constructs.

Compared with end-user developers, professional developers face a challenge of reuse from external sources, such as searched websites, that lies mainly in quality concerns. They want to make sure the code or API being reused is correct, trustworthy, reliable, and concise. The reason again comes from the definition: they must consider the quality of the software product.

Also due to the goal differences, the reuse adaptation behavior for code or APIs can differ. Sometimes the reuse process involves learning concepts such as event-driven architecture or plugin architecture. When handling these concepts, professional developers tend to build a good understanding before adopting the code, while end-user developers commonly adopt the code directly without investing time in understanding the concepts. Likewise, when adapting code, professional developers will likely comprehend and process the code thoroughly, for example refining it line by line, while end-user developers tend to comprehend and process the code less and adopt it directly; as long as the code runs successfully, they may skip over it without worrying about the details. When adopting code, professional developers are cautious and try to keep it concise, while end-user developers may simply copy-paste the code, sometimes without much editing, which can result in redundant code or redundant functions. They may not care about the redundancy as long as the code appears to work correctly.

To summarize, reuse is planned and systematic for professional developers, while it is unplanned and opportunistic for end-user developers. Both reuse to save time, but professional developers also aim to achieve qualities like maintainability, which is not much of a concern for end-user developers. Reuse is more essential for end-user developers, because coding can be much more difficult, or fail altogether, if there is no proper code to reuse. The challenges of reuse for end-user developers lie more in the programming environment and API, while professional developers may need to spend more time ensuring the quality attributes of correctness, reliability, trustworthiness, and conciseness. When finding reuse candidates, professional developers are more fastidious about quality and less fastidious about simplicity, while end-user developers are more fastidious about simplicity, to save time, and less fastidious about code quality.

When adopting code or APIs, professional developers understand the concepts and the code before use, process the code thoroughly, and adopt it cautiously while keeping it concise; end-user developers may reuse code directly without much upfront understanding, process the code less thoroughly, and copy-paste it just to make it run, tolerating redundant code or functions.

3.2.4 Testing and verification

The main goal of testing and verification techniques is to enable people to reach a more objective and accurate level of confidence than they would without such a process [3]. From this perspective, the difference between end-user software engineering and professional software engineering is that end-user programmers’ priority of treating software programs as tools often leads to overconfidence in the correctness of their programs.

A test oracle is a source of knowledge that can be used to decide whether a particular program behavior or output is correct. Oracles might be explicitly documented statements of correct and intended behavior, or they can be people who make more or less formal decisions about the correctness of program behavior or output. People are generally imperfect oracles. Professional developers are also known to be overconfident [64], but such overconfidence decreases as they gain experience [55]. In comparison, some end-user developers are notoriously overconfident; many previous studies about spreadsheets found that despite the high error rates in spreadsheets, spreadsheet developers are heedlessly confident about their correctness [65], [66]. In fact, previous studies of spreadsheets reported that between 5% and 23% of the value judgements made by end-user developers are incorrect [67]-[69]. All these studies showed that end-user developers were much more likely to judge an incorrect value to be right than a correct value to be wrong.

In summary, overconfidence is a key characteristic of end-user developers during testing and verification, while professional developers tend to become more cautious as they gradually gain experience. Moreover, professional developers follow formalized testing plans, including unit tests, integration tests, acceptance tests, system tests, and regression tests, whereas end-user developers mainly perform ad-hoc testing, in most situations based simply on the program’s output and the compiler’s error messages.

3.2.5 Debugging

Following the testing and verification step, which detects the presence of errors, debugging is the process of finding and removing them. Previous studies showed that debugging is actually one of the most time-consuming steps of both professional and end-user programming [55], [70], [71]. Although there are different strategies and techniques for debugging, studies across a range of areas have shown that debugging is fundamentally a hypothesis-driven diagnostic activity [36], [72], [73]. The general challenge of debugging is that developers typically start the investigation with a “why” question regarding their program’s behavior or output, but then need to translate this question into a series of queries and actions using low-level techniques and tools such as breakpoints and print statements [74].

For end-user developers, debugging can be more problematic, mainly due to the opportunistic mentality of valuing the feasibility of code over understanding it in the preceding steps. Many of them lack accurate knowledge and understanding of how their programs execute and, as a result, often have difficulty conjecturing possible explanations for a program's failure [36]. Moreover, because end users often prioritize their external goals over software quality and reliability, their debugging strategies often involve quick-and-dirty solutions, such as modifying their code until it appears to work. However, in the process of fixing existing errors, such strategies often introduce additional errors [35].

To summarize, the opportunistic reuse behavior in the previous steps leads to only a partial understanding of the code and program. This partial understanding makes interpreting a program failure more difficult and thus may make debugging more time-consuming, and the persistent use of quick-and-dirty strategies can introduce additional errors during debugging.

Across the five aspects of distinction, the available tool support also varies. Professional developers can draw on many open-source and commercial tools that support software development from the requirements phase all the way to testing and debugging. End-user developers, in contrast, lack specialized tool support; they tend to use plain, simple tools and find available tools opportunistically.

3.3 Summary

In this chapter, we have discussed the characteristics of end-user software engineering in detail. These discussions extend the main characteristics identified in [3], including implicit requirements and specifications, unplanned and opportunistic reuse behavior, overconfident testing and verification, and the opportunistic debugging approach. A comprehensive understanding of these characteristics not only acts as a basis for our further study, but also gives readers of this thesis a good grasp of end-user and professional software engineering. Our work in the next two chapters is mainly motivated by the characteristics of reuse and adoption behavior, because this is the dominant part of current end-user software engineering practice, and we consider it important to help and support end-user developers in these two aspects.


Chapter 4

Quality and value improvement

Having developed a comprehensive understanding of the characteristics of end-user software engineering practices, we now propose strategies to improve end-user developers' productivity. Software productivity can be roughly defined as the ratio of the functional value of the software produced to the labor and expense of producing it. To improve productivity, two approaches are therefore feasible: (1) increase the output, e.g., quantity and quality; (2) decrease the input, e.g., time cost and human labor cost.
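The rough definition of productivity above can be restated as a ratio; this restatement is ours, not a formula taken from the cited literature:

```latex
\text{Productivity} \;\approx\;
  \frac{\text{functional value of the software produced}}
       {\text{labor and expense of producing it}}
```

Either increasing the numerator or decreasing the denominator improves the ratio, which motivates the two approaches below.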

In this chapter, we start from the definition of productivity and then present our study on increasing the quality and quantity produced to improve productivity. In the next chapter, we present an additional study on reducing the time cost of programming, thereby improving productivity. Part of the material in this chapter appeared in [5] and [77].

4.1 Definition of productivity

The term productivity was coined in the context of the labor market, where it equals the ratio between a measure of output volume (gross domestic product or gross value added) and a measure of input use (the total number of hours worked or total employment) [75].

Productivity in the software engineering area is more complex and involves more factors.

The productivity of end-user developers differs from that of professional developers: the output of professional developers is the software they develop, whereas the output of end-user developers includes more than software programs, because they pursue external goals. Therefore, when defining productivity for end-user developers, we consider them a kind of knowledge worker.

Figure 4.1: Factors in multiple dimensions for productivity, summarized from previous literature and adapted from [76].

As shown in Figure 4.1, a previous study by Ramírez et al. [76] summarized the factors contributing to productivity mentioned in the literature and ranked them by their frequency of occurrence. In decreasing order of importance, the factors are quantity, cost and profitability, quality, effectiveness, efficiency, timeliness, autonomy, project success, customer satisfaction, innovation or creativity, responsibility level, knowledge worker's perception, and absenteeism. Our work will not touch upon all of these factors, but mainly focuses on three: quality, efficiency, and project success.

4.2 How social network information supports information needs

In the previous chapter, we discussed that end-user developers' reuse is opportunistic, more necessary, and relies more heavily on online resources compared with professional developers' reuse. Based on previous related work, we also introduced pragmatic software reuse, a clearly defined reuse concept, as the only feasible option for end-user developers to perform their programming tasks. The challenges of reuse mainly lie in the programming environment, which we further narrowed down to the architectural mismatch problem discussed in the related work. Therefore, we designed our study around software reuse tasks that consider architectural concerns and around online resources, which we call social network information (SNI) because end-user developers mainly seek information from social networks to fulfill their programming tasks. We start with an analysis of their information needs and then use information foraging theory to analyze the usage characteristics of the social network information.

4.2.1 Study design

We have recognized that end-user developers tend to apply pragmatic reuse in their daily work and that social network information plays an important role in facilitating them. We therefore conducted an observational experiment of two reuse tasks with social network information as the treatment. The purpose of the experiment is to address the first two research aims: what information is needed in pragmatic software reuse, and how social networks can help meet those needs. The rest of this section details our experiment design and execution.

4.2.1.1 Participants

The population that our study intends to impact is end-user developers. Due to the close location of, and collaboration with, researchers in the Bioinformatics department of a local research center, we selected bioinformatics researchers who develop biomedical software as the target group of end-user developers [77]. Twenty participants took part in our experiment (12 male and 8 female; 18 graduate students and 2 staff researchers). These participants were recruited from the Cincinnati local community (Cincinnati Children's Hospital Medical Center) via email invitations. To be eligible to participate, each individual had to consider writing software an essential (as opposed to accidental) part of their work and consider pragmatic code reuse (as opposed to pre-planned reuse) common in their practice. We did not impose any criteria regarding research area, software development experience, or programming language, as we attempted to select a sample representative of developers across the broad biomedical domains. Our participants had varied backgrounds: 13 had no professional software development experience, 1 had less than a year, 3 had 1-5 years, and 3 had more than 5 years. Note that we performed two pilot trials before the actual experiment to test the instrumentation and solicit feedback. The results from these two pilots are excluded from the data analysis.

4.2.1.2 Tasks

The participants were asked to perform pragmatic software reuse tasks with direct biomedical relevance (see the Appendix for more details). We explicitly considered software architecture when designing the tasks. Two architectural styles were chosen: plug-in architecture and event-driven architecture. For each architecture, an open-source system acted as the target where the actual reuse was expected to take place.

Next is a description of these two systems and their reuse tasks.

• ImageJ [78] is a Java image processing program. We downloaded the latest version of ImageJ (v1.49) and ran it as a standalone application on a Windows lab machine. ImageJ provides extensibility via Java plug-ins. Some plug-in examples are automatically installed and can be accessed as shown in Figure 4.2-(a). It is believed that user-written ImageJ plug-ins make it possible to solve almost any image processing or analysis problem [78]. The reuse task that we defined for our participants was inspired by protein quantification with ImageJ [78]. In particular, we pre-processed an image containing a variety of different proteins being separated on a gel. We stored the pre-processing results in 4 text files, which were provided as inputs to the reuse task: Protein.txt defining the values on the x-axis and each of Result1.txt, Result2.txt, and Result3.txt giving rise to a protein sample. For each sample, the participant was asked to reuse code so as to draw the gel plot and perform linear regression of that plot. Figure 4.2-(b) illustrates one sample curve and its linear regression result: R² = 0.917147. The implication is that the best protein fit is the sample with the greatest R² value.
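The regression portion of the ImageJ task is the kind of functionality a participant could pragmatically reuse rather than derive (SNI link #1 in Table 4.1 points to code of this sort). The following is our own minimal least-squares sketch for illustration; the class name and sample data are hypothetical and do not come from any participant's solution:

```java
// Minimal simple linear regression sketch (hypothetical, for illustration):
// fits y = slope * x + intercept and reports the coefficient of determination R².
public class LinearRegression {
    public final double slope, intercept, r2;

    public LinearRegression(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0;
        for (int i = 0; i < n; i++) { sx += x[i]; sy += y[i]; }
        double mx = sx / n, my = sy / n;           // means of x and y
        double sxx = 0, sxy = 0, syy = 0;          // centered sums of squares
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - mx) * (x[i] - mx);
            sxy += (x[i] - mx) * (y[i] - my);
            syy += (y[i] - my) * (y[i] - my);
        }
        slope = sxy / sxx;                          // least-squares slope
        intercept = my - slope * mx;                // least-squares intercept
        r2 = (sxy * sxy) / (sxx * syy);             // R² of the fit
    }

    public static void main(String[] args) {
        // Hypothetical sample values standing in for one protein sample's curve.
        double[] x = {1, 2, 3, 4};
        double[] y = {2.1, 3.9, 6.2, 7.8};
        LinearRegression lr = new LinearRegression(x, y);
        System.out.printf("slope=%f intercept=%f R2=%f%n", lr.slope, lr.intercept, lr.r2);
    }
}
```

In the task, the sample with the greatest R² among the three result files would be reported as the best protein fit.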


Figure 4.2: ImageJ reuse task: (a) example plug-ins after installation, (b) sample output.


Figure 4.3: StochKit task: (a) before reuse, (b) after reuse.


• StochKit [79] is a C++ biochemical reaction simulation program. We installed its latest version (StochKit v2.0.10) on the same lab machine as ImageJ. StochKit utilizes event-driven architecture to achieve fine-grained control of the reaction process and to simulate real-time response. Event triggers are discrete changes in the system state or parameter value typically used to mimic biological processes or to recreate experimental conditions [79]. Figure 4.3 simulates the reaction: Blue + Red → Green. The reuse task here was motivated by a mathematical model of an open monosubstrate enzyme reaction [80]. Specifically, we asked the participant to reuse code so that the enzyme reaction could be better controlled, namely, to follow [80] to increase Blue's volume under two conditions: (i) when its value drops below 5, and (ii) at time units 2 and 8. If the two conditions interact, (i) takes precedence over (ii). The StochKit task illustrates that human intervention is essential, especially when the amount of reactants needs to be strictly regulated to achieve a stable biochemical reaction environment.
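The two trigger conditions and their precedence can be expressed as simple guard logic. The sketch below is our own illustration, written in Java purely for readability (StochKit itself is C++); the refill amount BOOST is an assumed value that the task description does not specify:

```java
// Hypothetical sketch of the StochKit task's trigger logic: raise Blue's
// volume when it drops below 5 (condition i) and at time units 2 and 8
// (condition ii), with (i) taking precedence when both apply.
public class TriggerLogic {
    static final double THRESHOLD = 5.0;
    static final double BOOST = 10.0;   // assumed refill amount, not from the task text

    // Returns Blue's volume after applying the triggers at time t.
    static double applyTriggers(double blue, int t) {
        if (blue < THRESHOLD) {          // condition (i): low-volume trigger
            return blue + BOOST;         // checked first, so (i) has precedence
        }
        if (t == 2 || t == 8) {          // condition (ii): timed trigger
            return blue + BOOST;
        }
        return blue;                     // no trigger fires
    }

    public static void main(String[] args) {
        System.out.println(applyTriggers(3.0, 5)); // condition (i) fires
        System.out.println(applyTriggers(7.0, 2)); // condition (ii) fires
        System.out.println(applyTriggers(7.0, 3)); // nothing fires
    }
}
```

Ordering the guards encodes the precedence rule: when Blue is low at time unit 2 or 8, only the low-volume branch executes, mirroring the "(i) takes precedence over (ii)" requirement.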

The independent variable was the social network information that we wanted to test in a controlled manner. To instrument this treatment, two researchers manually searched for useful online resources and jointly finalized a set of links for each task. Tables 4.1 and 4.2 list these links, which point to portals (e.g., ImageJ #11), wikis (e.g., ImageJ #3), forums (e.g., StochKit #7), Q&A sites (e.g., StochKit #5), etc. These links are by no means complete. Our intention was to raise the participants' awareness of online social network information and to offer a set of specific links encouraging them to take advantage of the information during their reuse tasks. In this sense, the links in Tables 4.1 and 4.2 should be treated as hints that provide shortcuts to potentially useful information for carrying out the pragmatic reuse tasks. We grouped these "shortcuts" in the experimental computer's Web browser bookmarks, one bookmark folder per reuse task. To avoid the participants' unintentional inferences about the resources' importance, we ordered the links inside each bookmark folder alphabetically, as shown in Tables 4.1 and 4.2. For the remainder of this thesis, we use "pre-selected SNI" to refer to the social network information in Tables 4.1 and 4.2.

Table 4.1: Social network information (SNI) provided for the ImageJ reuse task.

#  Title  Link
1  Data Analysis - Linear Regression  http://introcs.cs.princeton.edu/java/97data/
2  Development - ImageJ  http://imagej.net/Develop
3  Gel electrophoresis - Wikipedia  https://en.wikipedia.org/wiki/Gel_electrophoresis
4  gel quantification analysis [ImageJ Documentation Wiki]  http://imagejdocu.tudor.lu/doku.php?id=video:analysis:gel_quantification_analysis
5  image - Live vertical profile plot in ImageJ - Stack Overflow  http://stackoverflow.com/questions/19016991/live-vertical-profile-plot-in-imagej
6  Java read file and store text in an array - Stack Overflow  http://stackoverflow.com/questions/19844649/java-read-file-and-store-text-in-an-array
7  Java Read Files With BufferedReader, FileReader  http://www.dotnetperls.com/bufferedreader
8  Linear Regression  http://stattrek.com/regression/linear-regression.aspx
9  Linear regression - Wikipedia  https://en.wikipedia.org/wiki/Linear_regression
10  Plot issues in Jython script for ImageJ  http://stackoverflow.com/questions/26400563/plot-issues-in-jython-script-for-imagej-reference-sources-welcome
11  Plugins (ImageJ)  http://rsb.info.nih.gov/ij/plugins/index.html
12  Protein Electrophoresis | Applications & Technologies  http://www.bio-rad.com/en-us/applications-technologies/introduction-protein-electrophoresis
13  Read Text file in string array Java  http://www.technical-recipes.com/2011/reading-text-files-into-string-arrays-in-java/

Table 4.2: Social network information (SNI) provided for the StochKit reuse task.

#  Title  Link
1  abs - C++ Reference  http://www.cplusplus.com/reference/cmath/abs/
2  C Program: Solving Simultaneous Equations in Two Variables  http://www.thelearningpoint.net/computer-science/c-program-solving-simultaneous-equations-in-two-variables
3  Difference between using .ipp extension and .cpp extension files - Stack Overflow  http://stackoverflow.com/questions/19147208/difference-between-using-ipp-extension-and-cpp-extension-files
4  Equations for 2 variable Linear Regression - Stack Overflow  http://stackoverflow.com/questions/459480/equations-for-2-variable-linear-regression
5  Event Driven Programming? - Programmers Stack Exchange  http://programmers.stackexchange.com/questions/230180/event-driven-programming
6  Global Variables - C++ Forum  http://www.cplusplus.com/forum/windows/115425/
7  How do you make C++ solve equations? - C++ Forum  http://www.cplusplus.com/forum/beginner/34039/
8  java - Creating a simple event driven architecture  http://stackoverflow.com/questions/13483048/creating-a-simple-event-driven-architecture
9  Solving a system of 2 Linear Equations using C++  http://stackoverflow.com/questions/14594240/solving-a-system-of-2-linear-equations-using-c
10  visual studio - How to declare a global variable in C++  http://stackoverflow.com/questions/9702053/how-to-declare-a-global-variable-in-c

4.2.1.3 Procedures

The participants worked individually in a lab and began by signing the consent form and completing a background survey. Each participant received a randomly assigned experimental ID and followed the corresponding block assignment to perform the two reuse tasks. Table 4.3 shows our block design, in which both SNI-treatment order and task order are counterbalanced. Thus, each participant performed one task with SNI and the other without the pre-instrumented SNI support. Similar to [15], our design is best understood as within-(participants plus SNI treatment) and between-(participants plus order).

Table 4.3: Experimental block assignments.

ID (Block Name) First Task Second Task

A ImageJ-without-SNI StochKit-with-SNI

B ImageJ-with-SNI StochKit-without-SNI

C StochKit-without-SNI ImageJ-with-SNI

D StochKit-with-SNI ImageJ-without-SNI


A researcher explained the first reuse task to the participant. The task description was printed on a hard copy that remained available throughout the task period for easy reference. The researcher then introduced the target system with an emphasis on its structural aspects. If the first task was with the SNI treatment, the participant was made aware of the task-specific bookmark folder that the researcher had pre-loaded onto the lab computer. For the instrumentation to be uniform, the researcher configured the bookmarks identically in all 3 browsers on the computer: Internet Explorer, Mozilla Firefox, and Google Chrome. If the first task was in the control group receiving no SNI treatment, the researcher made sure no task-related bookmarks existed in the browsers. The participant was then asked to perform the first reuse task and was encouraged to "think aloud" to verbalize the rationales, decisions, strategies, and tactics being employed. Note that the participant was allowed to access the entire internet to complete the reuse task, independent of whether pre-selected SNI was present. Informed by our pilot trials, we set the expected task completion time to 60 minutes and communicated this expectation to the participant before the task. The participant was reminded around 50 minutes into the task but was not forced to terminate until he or she signaled a natural stop point. The researcher then conducted an informal interview with the participant to collect feedback; if the first task was treated with SNI, the usefulness of the pre-selected SNI was also surveyed verbally. The participant was given a break if desired and then continued with the second reuse task in the same manner.

4.2.2 Information needs in pragmatic software reuse


Understanding the needs of software developers is a prerequisite for researchers and tool builders to better answer those needs. For software evolution tasks, Sillito et al. [81] identified 44 specific questions programmers ask and classified them into 4 groups: (1) finding focus points, (2) expanding focus points, (3) understanding a subgraph, and (4) understanding groups of subgraphs. Ko et al. [55] abstracted 21 types of information needs from 17 Microsoft developers' daily practices, emphasizing the communication and coordination demands in collocated software teams. The needs in pragmatic software reuse tasks, to the best of our knowledge, have not been thoroughly explored.

The participants in our study asked a variety of questions, which we grouped into 5 categories. The data extraction was done manually and jointly by two researchers. Figure 4.4 positions the categories along a spectrum from software-architecture-centric to problem-domain-centric. The specific questions are presented below, annotated with 'I' (relevant to ImageJ), 'S' (relevant to StochKit), or 'B' (relevant to both).

Reuse infrastructure (C1) touches upon the critical issues of the architectural style underpinning the target system; if not addressed properly, these issues will likely cause serious architectural mismatch [23], [24], thereby hampering pragmatic reuse.

1. Where is the starting point, similar to the "main function" in the C++ language? [B]

2. How does this software know that I am writing a plugin class? [I]

3. What is the control structure/flow in an event-driven architecture? [S]


Figure 4.4: Categories of information needs in pragmatic software reuse.

4. Where is an event triggered and/or captured? [S]

5. How to reuse the software to realize simple functions like ‘Hello World’? [B]

Components & connectors (C2) are at the heart of software architecture. Understanding the computation, the interface, the decomposition, and the interdependency is key to arriving at a successful reuse implementation.

6. How to name the plug-in class and what must be imported? [I]

7. Where should I save the plugin class in the file system? [I]

8. How do different events relate to each other? [S]

9. Where to specify precedence of multiple events? [S]

10. How can I customize the data to fit into the function being reused? [B]

11. How to initialize suitable variables to be applicable for a function? [B]

12. What is the linkage between the computation units (methods, procedures, etc.) and/or between the encapsulation units (classes, templates, etc.)? [B]
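Several of the questions above (#2, #6, #7) concern how ImageJ recognizes, names, and locates a plug-in class. The sketch below is a hypothetical illustration: the real entry point is the ij.plugin.PlugIn interface, for which we declare a local stand-in so the snippet compiles without the ImageJ jar, and the class name Gel_Analyzer is our invention:

```java
// Stand-in mirroring ij.plugin.PlugIn so this sketch is self-contained;
// a real plug-in would import and implement the interface from the ImageJ jar.
interface PlugIn { void run(String arg); }

// ImageJ discovers plug-ins by convention: the class name contains an
// underscore, and the compiled .class file is placed in ImageJ's plugins folder,
// after which the plug-in appears in the Plugins menu.
public class Gel_Analyzer implements PlugIn {
    String lastArg;                                   // recorded for illustration only

    @Override
    public void run(String arg) {
        // ImageJ calls run() when the user selects the plug-in's menu entry.
        lastArg = arg;
        System.out.println("Gel_Analyzer invoked with arg=" + arg);
    }

    public static void main(String[] args) {
        new Gel_Analyzer().run("demo");
    }
}
```

The underscore-in-the-name and plugins-folder conventions are precisely what questions #2, #6, and #7 probe.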


Reuse implementation (C3) is where the two ends of Figure 4.4 meet. Solving the needs in this category will facilitate the completion of the pragmatic reuse task in an architecturally compatible way.

13. Where can I see the compilation information? [B]

14. Is this software capable of printing things out to help me debug? [B]

15. Where are the input files located, and how to change the values related to the reuse task in the (input) files? [B]

16. How to resolve the dangling references of a reused code fragment? [B]

17. How to output string with a numeric value together? [I]

18. How is the for trajectories done? [S]

Problem solving (C4) shifts the information needs toward the functionalities that the reuse task dictates. Here the programmers search for reuse candidates written in the same programming language as the target system.

19. Is there an existing Java implementation that I can import (reuse) to calculate linear regression as well as the R² value? [I]

20. Can I find existing code to plot a point or add a legend in the coordinate system in ImageJ? [I]

21. Is there available C++ code online to solve the linear equations? [S]

22. What might be the existing implementation for a specific function (variable type conversion [I], absolute value calculation [S], etc.)? [B]

23. How to initialize member variables and vectors in Java or C++? [B]


24. Are there unit tests to be reused together with code? [B]

Problem understanding (C5) helps the developers clarify conceptual questions about the reuse task. Formulating an appropriate task context is important for searching for and evaluating reuse candidates.

25. Why use linear regression in this task? [I]

26. What is the original gel sample: human or other species? [I]

27. How is the protein separated from the gel, and what is the expected error rate of the given input files, as this can affect the kind of (linear) regression that I do? [I]

28. Do I have to plot three curves in a single figure or in 3 separate figures? [I]

29. What does species mean in a biochemical reaction? [S]

30. Are those values (time units 2 and 8, volume below 5) arbitrary or do they follow certain properties? [S]

31. What is the biomedical significance of the task? [B]

4.2.3 Usage of social network information

4.2.3.1 Usage flow model of social network information

Having elicited the specific questions and characterized them, we now analyze the data from the social network information perspective. We first build a flow model of how the social network information is used during the reuse process. Inspired by a previous study [82], which built a canonical social model of user activities before, during, and after a search act, we model the SNI usage activities before, during, and after the reuse process, as shown in Figure 4.5. The numerical data are calculated by averaging over the 20 participants.


Figure 4.5: Usage flow model of social network information before, during, and after the reuse process. Note that in sub-figure (c), we use a 1-5 scale to evaluate participants' responses, where 5 means agree the most and 1 means agree the least.


4.2.3.2 Before reuse

Since we prepared pre-selected SNI for our participants, they could access these links before reusing the software to complete the task, as shown in Figure 4.5-(a). We summarize the usage data of the before-reuse phase. The results show that 55.3% of the pre-selected SNI links were used during the reuse process, and 44.7% were unused. As shown in Figure 4.5-(b), among the unused SNI links, 58.2% relate to biology concepts or the specific software architecture; the other 41.8% were not used because they reflect strategies different from the participant's own strategy for completing the task. The concept-related links were not used much because the participants focused on completing the task in a short time and wanted only a general understanding of the task-related biology knowledge and architectural concepts without digging deep. From the fact that 55.3% of the pre-selected SNI links were used during reuse, we can make the preliminary judgement that these links were helpful to the participants.

4.2.3.3 During reuse

Information-seeking behavior is rooted in a "need" to find information or a motivation that drives the search process [82]. As shown in Figure 4.5-(a), during reuse an information need drives the participant either to seek the answer directly in the pre-selected SNI (31%) or to search Google (69%). The participants found the answer in the pre-selected SNI in 59.6% of those situations, since they went there purposefully; otherwise (40.4%), the SNI could still guide their Google search. Relying on Google, they obtained a positive result in 81.2% of the situations and no satisfying answer in 18.8%, which probably required the participants to figure out those questions themselves through further investigation. From Figure 4.5-(d), our pre-selected SNI mainly helped participants solve problems in the categories of components & connectors (C2), problem understanding (C5), and reuse infrastructure (C1). For reuse implementation (C3) and problem solving (C4), the pre-selected SNI provided guidance, but participants mainly depended on Google to search for solutions. This is because C1, C2, and C5 concern more general issues that most participants share, while each participant has specialized questions in C3 and C4 according to their different problem-solving strategies. Our informal post-task interview, which asked the participants how they felt about the SNI we provided, received positive feedback, as shown in Figure 4.5-(c), where responses are rated on a 1-5 scale.

4.2.3.4 After reuse

In the informal interview, we also collected information on how the participants would handle the SNI after reusing software to solve actual tasks in their research. We summarize the post-processing of SNI in the after-reuse phase. As shown in Figure 4.5-(a), 25.4% of the participants would do nothing with the SNI links they had accessed; the other 74.6% would take action. The actions include rating the SNI (22%), saving the SNI for oneself (78%), sharing the SNI with colleagues (27%), and sharing it with the public (5%). Note that these four numbers do not sum to 1, since a participant may take multiple kinds of actions. Generally, participants tend not to contribute back to the community publicly, for example by writing a blog post, because they are not paid to do so and prefer to spend time on their own work. Moreover, content needs to be well prepared and organized before being published, which costs additional time and energy. However, if participants found or worked out a solution to a very difficult problem that cost them a lot of effort, they would probably save the solution, rate it, or post their own solution publicly.

In summary, information needs drive the whole process, including the before-reuse, during-reuse, and after-reuse phases, so a deep understanding of information needs can guide further steps toward improving the efficiency of pragmatic software reuse. From the participants' subjective impressions and the objective data collected, we can roughly say that the provided SNI contributed positively to task completion. A more systematic analysis is discussed next.

4.2.4 Supporting the needs with social network information

Having elicited the specific questions and characterized them, we now examine how the SNI supports pragmatic reuse needs. The support is analyzed both qualitatively and quantitatively. Figure 4.6 presents our qualitative analysis result, in which the mappings between the SNI links and the information-need categories are established. For each category, we present three statistics per task: the support from the pre-selected SNI links (cf. Tables 4.1 and 4.2), the pre-selected links followed by the participants, and the additional online SNI that the participants accessed during pragmatic reuse.


Figure 4.6: SNI usage by tasks and information-need categories.

Two observations can be made from Figure 4.6. First, most pre-selected SNI links were actually followed. This indicates that the developers perceived the SNI as helpful hints which they were willing to spend time investigating. Second, for a category whose pre-selected SNI links were actively followed, more additional links were sought. This implies that the developers, once made aware of SNI support, were motivated to pursue more links, which in turn increased the likelihood of devising a reuse solution, as opposed to starting from scratch.

The usage data of Figure 4.6 increased our confidence in the magnitude of the impact that SNI had on pragmatic reuse; in other words, if few SNI links had been followed, the impact would have been trivial. To quantify this impact, we assessed 2 variables: time to task completion and success of the reuse solution. The comparisons were made between the control groups (participants who did not receive pre-selected SNI links) and the treatment groups (those who did).


Figure 4.7: Comparison of task completion time. The decrease in time is statistically significant only for the ImageJ task, not for the StochKit task.

Figure 4.7 compares the time required to complete the reuse task in the different settings. Generally speaking, developers spent less time on the ImageJ task than on the StochKit task. This indicates that ImageJ's architecture, namely the plug-in architecture, is more extensible. By conforming to basic architectural constraints and the construction process (e.g., importing the necessary libraries, storing the new class file in the plug-in folder, etc.), the developers were able to quickly extend the functionality of ImageJ. Comparing the median completion times, the pre-selected SNI links helped both tasks finish faster. However, the effect is statistically significant only for the ImageJ task (Wilcoxon signed rank test: p=0.0059, α=0.05) and not for the StochKit task (Wilcoxon test: p=0.1548, α=0.05). We speculate this may be caused by the greater effort required to understand StochKit's event-driven architecture and to locate the feature with which the pragmatic reuse would interact [13].

In general, a (pragmatic) reuse task can have multiple, equally valid solutions. We therefore assessed the participants' reuse solutions individually, without a pre-determined 'gold standard' answer. Two researchers jointly judged all the solutions and classified them into 3 categories: successful (fulfilled the functionality with the reuse conforming to the underlying software architecture), unsuccessful (unfulfilled functionality or a solution developed from scratch), and partially successful (everything in between). We use the StochKit task to illustrate how we judged the solutions. Figures 4.8-4.10 show example solutions that are successful, partially successful, and unsuccessful, respectively. The successful example in Figure 4.8 not only modified the reactant's amount according to the two conditions described previously, but also coded the priority for the situation when the two conditions conflicted. The solution in Figure 4.9 is partially successful: although the two conditions were handled, they were independent in the code, so conflicts were not considered when the two conditions occurred together. In the unsuccessful example in Figure 4.10, although the place to change the code was located and the logic was represented, the participant was not able to solve the system of two linear equations, and the result figure could not be plotted.


Figure 4.8: Successful: solved the conflicts as well as the two conditions.


Figure 4.9: Partially successful: completed the two conditions but did not solve the conflicts.

Figure 4.10: Unsuccessful: unable to complete the two conditions.


Figure 4.11: Comparison of reuse solution success.

Figure 4.11 shows the distributions. For the ImageJ task, when provided with SNI, the developers completed the reuse more successfully: the success rate increased from 30% to 50% and the unsuccessful rate decreased from 20% to 10%. For StochKit, SNI's help seemed rather limited, shifting some unsuccessful reuse solutions to partially, but not completely, successful ones.

Based on our observations and our interviews with the participants, it is evident that, for the two tasks, developers strongly preferred reuse over devising a solution from scratch. In addition, the support that the developers received from the SNI (pre-selected, additionally followed, or both) is indisputable. In fact, all the participants in our study relied so heavily on SNI that some went on using Google to confirm and even refine the pre-selected links prepared by us. In sum, our results suggest a positive impact of SNI on answering developers' needs and on completing the pragmatic reuse tasks with speed and quality.


4.2.5 Improving productivity with social network information diversity

Knowing that SNI can positively impact pragmatic reuse tasks' completion by improving speed and quality, we would like to further explore the factors that influence how efficiently SNI is used. Specifically, we want to identify the factors behind the three phenomena in our previous results. Participants given access to the pre-selected SNIs: (1) tend to seek more additional links to solve the task; (2) use less time to complete the task; and (3) tend to have a higher probability of finishing the task successfully.

We observe that different kinds of information needs require different kinds of SNI, and that the efficiency of using SNI varies by kind. For example, before reusing the software to solve the task, participants prefer to read concise textual descriptions to understand general task-related concepts, such as the R2 value, and software architecture knowledge, such as the event-driven architecture. In this phase, they do not want to see specific, detailed information like source code. The concepts they read should be concise and abstract, such as the introduction section of a document or the first several introductory paragraphs of a wiki page. The situation changes when participants actually concentrate on the source code to solve the task: their information needs shift significantly, and they are eager to see SNI containing source code that demonstrates the detailed implementation, such as the example in Figure 4.12. This format of information is well presented on Q&A websites such as Stack Overflow and in certain technical blogs.


Figure 4.12: An example of a webpage containing only source code.

Another concrete example from our experiment: one participant was trying to find a function to add a legend showing the R2 value to the generated figure toward the end of the task. He was presented with a pre-selected SNI link to the ImageJ plot API, which was directly related to his need of plotting a legend. However, when he clicked the SNI link, he found too much information: 38 pages briefly describing hundreds of functions, as shown in Figure 4.13. It was easy to get overwhelmed by information at this scale, making it difficult for the participant to find the "addLegend" function displayed in Figure 4.13. And even if he could find the function, it would cost him additional time to understand how to use it, since the description, shown in Figure 4.14, is not as practical to use as an actual code example. Indeed, the participant quickly decided to leave the SNI link even though he knew it contained the needed information, and then searched on Google for other SNIs about this function. Before long he found the Stack Overflow page shown in Figure 4.15. This page contains concise information and a directly applicable example of the function. By applying the code, he solved his problem quickly.

Figure 4.13: How the Web page of the ImageJ plot API looks.


Figure 4.14: Usage explanation of function “addLegend”.

Figure 4.15: Stack Overflow page showing a code example of using the function "addLegend".

Inspired by such observations, we note the importance of having the appropriate SNI to answer each information need. We speculate that the diversity of SNI can influence the productivity of applying it. Our hypothesis is that, for a specific information need, if a participant is provided with multiple types of SNIs addressing the need, he can choose the optimal one, i.e., the one that best helps him with the need at hand according to his own preference. We therefore want to test whether the diversity of SNI is a factor in the three phenomena of seeking more additional SNI links, lower completion time, and a higher success rate.

In order to compare the productivity between the control group and the treatment group, we first identified three information needs in the ImageJ task that all the participants share: (1) how to plot a point in a coordinate system; (2) how to calculate the R2 value; and (3) how to add the R2 as a legend or label to the coordinate system. These information needs correspond to questions 19 and 20 in the "problem solving (C4)" category defined previously in Section 4.2.3. That the three information needs are common means that every participant had to resolve all three in order to complete the task. For each information need of each participant, we counted the number of SNI links he followed to solve the need and recorded the total time spent solving it. We averaged the data within the control group and the treatment group and present them in Tables 4.4 and 4.5. Here, we use the number of SNIs to quantify SNI diversity, since we consider any two different SNIs to differ in certain aspects.

Table 4.4: Comparing the number of links followed to fulfill the information needs between the treatment group and the control group.

ID (Information need) | Avg # of SNIs - Treatment Group (Standard deviation) | Avg # of SNIs - Control Group (Standard deviation)
(1) | 2.6 (0.81) | 1.4 (0.32)
(2) | 6.3 (1.86) | 3.7 (1.12)
(3) | 3.4 (0.45) | 1.6 (0.65)


Table 4.5: Comparing the time used to fulfill the information needs between the treatment group and the control group.

ID (Information need) | Avg Time (s) - Treatment Group (Standard deviation) | Avg Time (s) - Control Group (Standard deviation)
(1) | 127.3 (22.3) | 147.5 (31.1)
(2) | 322.4 (43.6) | 450.3 (38.5)
(3) | 135.6 (24.4) | 170.2 (25.5)

In Table 4.4, we can see that for all three information needs, the number of SNIs followed in the treatment group is greater than in the control group. This further confirms that the pre-selected SNIs are likely to motivate participants to seek more relevant SNIs. It also suggests that participants tend to seek more diverse SNIs to find better solutions when provided with initial seeds. Relating back to the first of the three phenomena we want to explain, we conclude that it is the objective of achieving greater diversity of SNI hints that drives the participants to seek more additional links.

Table 4.5 summarizes the time used to solve the three common information needs. Although participants in the treatment group sought more SNI links per information need (Table 4.4), the time used to fulfill each need was less than in the control group. This indicates that greater diversity of SNI hints helps participants finish the task faster, explaining the second phenomenon we observed: participants accessing more diverse SNIs use less time to complete the task. Next, we offer an example to explain these seemingly contradictory findings: seeking a greater number of SNIs while completing the tasks faster.

When the participants were trying to find out how to calculate the R2 value, one kind of SNI link was very helpful: a link containing a complete piece of source code for the R2 calculation. Our pre-selected SNI encouraged the participants to explore more diverse SNIs so that they could find such a link, as shown in the black rectangle in Figure 4.16. However, finding such an SNI link was not so easy for participants in the control group. One of them first came across webpages explaining the concept of R2. By studying the concept of linear regression, he tried to understand these ideas and write code to fulfill the task. He eventually fulfilled this part of the task only after a long while, since it is time consuming to learn the underlying knowledge and convert it to code. Afterward, he happened to also find the SNI link in Figure 4.16, which could have saved him some time; but at that point the link no longer had value to him. From this example, we can see that participants who find more diverse SNIs about the R2 calculation tend to find better solutions and thus complete the subtask faster.


Figure 4.16: Source code snippet from a blog-like webpage.
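As an illustration of the kind of snippet participants foraged for, a minimal least-squares R2 computation might look like the sketch below. This is our own illustrative code, not the code from the webpage in Figure 4.16.

```python
# Sketch: R^2 of a simple linear least-squares fit (illustrative only).

def r_squared(xs, ys):
    """R^2 of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - SS_res / SS_tot
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

print(r_squared([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]))  # close to 1 for near-linear data
```

Ready-to-apply code like this is precisely what made the SNI link in Figure 4.16 valuable before, but not after, a participant implemented the calculation himself.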

To evaluate whether the diversity of SNI influences the success of task completion, we grouped the participants according to their task completion outcomes. From the previous results in Figure 4.11, for the ImageJ task, 8 participants were successful in completing the task, 9 were partially successful, and 3 were unsuccessful. For each participant, we counted the number of all the SNI links followed to complete the task and summarized the data for each group in the box plot shown in Figure 4.17. We can see that the participants who successfully completed the task tended to follow more SNIs, which is an indication of SNI diversity. Therefore, we infer that a higher diversity of SNI can increase the probability of finishing the task successfully.


Figure 4.17: Comparing SNI diversity across task completion groups: successful, partially successful, and unsuccessful.

In summary, our analysis shows the relations between SNI diversity and the three phenomena: (1) in order to gain SNI diversity, participants given access to the pre-selected SNIs tend to seek more additional links; (2) participants accessing more diverse SNI use less time to complete the task; and (3) participants accessing more diverse SNI tend to finish the task more successfully.


4.3 How the diversity of social network information impacts productivity

To further study how the diversity of SNI impacts productivity during pragmatic software reuse, we first needed to decide whether to perform a new experiment or reuse the data from the previous one. Since in the previous experiment the treatment of pre-selected SNI would itself influence productivity, we performed a new round of experiments in which we removed that treatment. We invited another 20 participants who had not taken part in our previous study. Since the study design largely repeats the previous design, we omit its description here and present our analysis and results in this section.

4.3.1 Categorizing social network information

According to our analysis of the participants' usage data, the diversity of SNI can be characterized from four perspectives. First, different types of SNI, with different organizations of information, can influence whether a participant will use an SNI, as well as the way and the efficiency of using it. Recall the example in Figures 4.13, 4.14, and 4.15: it is difficult and time consuming for participants to find information in a wiki-like API website, but the task can be fulfilled easily with a Stack Overflow page. Second, the quality and credibility of different webpages can vary. This quality diversity originates from the webpages' different authors: people have various levels of expertise or authority in certain areas. For instance, the developers of a software system are probably more authoritative than the general users of the software when giving instructions on how to use it. Third, inspired by our categories of information, diversity can come from the kind of information need that the SNI can answer. The information need can be described as a question: is this SNI about the general concept of R2 regression, about the software architecture, or about the actual implementation details? Last, diversity can lie in the timing at which a participant finds an SNI. The same SNI may be a hurdle when found around 5 minutes into the task but very valuable when found around 15 minutes in. Our previous example in Figure 4.16 shows that the source code for calculating the R2 value is very useful before a participant starts to implement the R2 calculation, but becomes useless if the participant has already finished programming it by the time he comes across the link. Our observations of SNI diversity are largely reflected in social information foraging theory, which will be introduced later. From the four perspectives of SNI diversity, we propose the following five specific diversity categories:

1) Social network type: Based on the different structures of webpages, a natural classification of SNI includes Q&A (e.g., Stack Overflow), blogs, wikis, content communities (e.g., YouTube), social networking sites (e.g., Facebook), etc. [83]. We have already discussed that various types of SNI differ in their ability and efficiency in assisting the participants. Extending this classification, we considered the programming-specific features of using SNI and refined the categories into Q&A, wiki, software API, developer tutorial, and software repository. According to its social network type, the webpage in Figure 4.18 is labelled as a "Q&A" hint.


Figure 4.18: A sample SNI website - some contents are omitted and rearranged. The numbers in black circle indicate the category type. (1): Q&A; (2): core author; (3): number of contributor role; (4): question of reuse implementation; (5): question of finding focus points.

2) Contributor role: The human factor is the root factor determining the validity of the content on an SNI page. From this perspective, we categorized SNI authors as core authors, main authors, and marginal authors to capture SNI diversity. For example, core authors can be the original developers of a software system, who should have the highest authority about the software, so their words should be highly valued when they describe it. Main authors probably know only part of the software, and thus have less authority about it. Marginal authors can be the general users of the software, who have the least authority since they know the software externally and may give speculative ideas or suggestions. Since in essence the contributor role is an indication of the content's authority, it can be reflected by features such as the number of users approving the content. In the example shown in Figure 4.18-(2), seven users voted for the answer, which reflects the value of the content.


Having a valuable answer indicates that the webpage's content has validity, so we label this SNI as a core author hint. Here, five up-votes is used as our threshold because it is the threshold for triggering a reputation change on Stack Overflow [84].

3) Number of contributor roles: The validity of the content on a webpage can also be reflected by the number of contributors, since various viewpoints from different people can provide more value than an individual statement. This observation is in line with the study by Singer et al. [84], which found that we cannot rely on the popularity of a single user's idea in a social network: for an idea to gain traction in an online social network, multiple users need to post about it. Therefore, a topic discussed by multiple people tends to have higher validity and importance. However, too many ideas may overwhelm the readers and lower the efficiency of finding valuable information. For instance, a topic in a forum may collect thousands of answers or posts, which is too much to digest. In addition, the posts in a forum are generally listed only in their original posting order, making it difficult for the reader to find posts with high quality and validity. Considering this, we categorize SNIs as single author, several authors (no more than 10), or many authors (more than 10). The threshold of 10 was chosen based on our observation from the experiment videos that participants would read at most 10 posts from an SNI link.
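The two threshold-based labels above can be sketched as follows. The function names and the binary core/marginal split are our simplification; the 5-vote and 10-post thresholds come from the text, and the "main author" case is omitted since no vote threshold is stated for it.

```python
# Sketch of the threshold-based labeling (simplified; names are ours).

def contributor_role(top_answer_votes):
    """Binary core/marginal proxy: 5 up-votes triggers a reputation
    change on Stack Overflow [84], so >= 5 signals a core-author hint."""
    return "core author" if top_answer_votes >= 5 else "marginal author"

def author_count_label(n_authors):
    """Label by number of contributors; the threshold 10 mirrors the
    most posts participants were observed to read from one SNI link."""
    if n_authors == 1:
        return "single author"
    if n_authors <= 10:
        return "several authors"
    return "many authors"

# The answer in Figure 4.18 received seven votes from a handful of users.
print(contributor_role(7), author_count_label(3))
```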

4) Information needs concerning software architecture: In our results in Section 4.2.2, we classified the information needs expressed by our participants during the reuse tasks into five categories: problem understanding, problem solving, reuse implementation, component and structure, and software infrastructure. Our rationale is that there are two ends of information needs: one is the domain concept concerning the problem to be solved, and the other is the architectural concern of the software to be reused. The other three categories lie between the two ends. The aim of this classification is to test whether the diversity of SNI hints can be captured by information needs driven by architectural concerns. Specifically, we categorized each SNI according to the category of questions it answers. As shown in Figure 4.18-(4), the information need is how to read a file with Java and store the text as an array. This was a sub-question raised while the participant was trying to implement a function. The searched SNI was triggered by implementation and used for implementation, thus we labeled this webpage as a "reuse implementation" hint.

5) Information needs organized by complexity: Sillito et al. [81] identified 44 questions programmers ask during software development. They further categorized the 44 questions into four groups: (1) finding focus points, (2) expanding focus points, (3) understanding a subgraph, and (4) understanding groups of subgraphs. They consider the categories in terms of the amount and type of information required to answer a question [81], which, in our opinion, indicates the complexity of answering it. In Figure 4.18, the question is a simple technical question with no external relation to other concerns, thus we labeled it as a "finding focus points" hint.

4.3.2 Results and analysis

Our previous observations and analysis fit well with social information foraging theory, in which the diversity of hints has a log-normal relationship with the overall productivity, as described previously in Figure 2.1 (Section 2.2). Next, we will use this social information foraging model to guide our analysis.

Table 4.6: Raw data extracted from experiment video.

Time step | Duration (m) | Gain | Action
IDE | 8.00 | 17% | Editing
SNI1 | 0.80 | 3% | Scanning
SNI2 | 0.53 | 2% | Scanning
SNI3 | 3.58 | 50% | Copying
IDE | 4.00 | 7% | Editing, debugging
SNI4 | 1.12 | 7% | Copying
IDE | 6.00 | 3% | Editing, debugging

4.3.2.1 Raw data extraction

In our experiment, the computer screen was recorded as a video to capture each participant's behavior. We generated the raw data shown in Table 4.6 by analyzing these videos. We split each participant's process of finishing the task into time steps based on the window environment they were working in. For example, "IDE" in Table 4.6 means the participant was focusing on the IDE window to program against the software source code; "SNI1" means focusing on the first social network information page searched and accessed. We saved the actual SNI webpage links for later analysis. We recorded the duration in minutes and the actions for each time step. For each time step, productivity was evaluated by the designer of the task, mainly based on the lines of code produced along with their complexity [85]. To reduce bias, two task designers independently estimated the productivity and then reached consensus. Gain was then calculated as the percentage that this time step contributed to the overall task completion.

4.3.2.2 Data refinement

Previously, the raw data were generated by analyzing the videos recorded for each participant. Since our study focused on the SNI used by the participants, we kept only the time steps in which SNI was utilized and removed the IDE time steps, as shown in Table 4.7. From the raw data, we calculated cumulated time and gain values. Then, we divided each cumulated gain value by the cumulated time to obtain the rate-of-gain values, which constitute the Y-axis values of Figure 2.1 according to the mathematical model of social information foraging theory. For instance, in Table 4.7, after using SNI2, the time spent on SNI was 1.33 minutes and the participant had achieved 8% of task completion, thus the rate of gain was 8%/1.33 = 0.060.
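The rate-of-gain computation can be sketched directly from the table. The sketch below is a minimal illustration using only the first two SNI steps of Table 4.7, expressed as (duration in minutes, gain increment) pairs.

```python
# Sketch: deriving Table 4.7's rate of gain from per-step durations
# and gain increments (SNI steps only).

def rates_of_gain(steps):
    """Cumulate (duration, gain) pairs; return the gain rate per step."""
    t_cum = g_cum = 0.0
    rates = []
    for duration, gain in steps:
        t_cum += duration
        g_cum += gain
        rates.append(round(g_cum / t_cum, 3))
    return rates

# First two SNI steps of Table 4.7: (minutes, gain increment).
print(rates_of_gain([(0.80, 0.04), (0.53, 0.04)]))  # [0.05, 0.06]
```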

Table 4.7: Calculating rate of gain.

Time step | Time accumulated (m) | Gain accumulated | Rate of gain
SNI1 | 0.80 | 4% | 0.050
SNI2 | 1.33 | 8% | 0.060
SNI3 | 4.92 | 70% | 0.142
SNI4 | 6.03 | 87% | 0.144
SNI5 | 6.75 | 94% | 0.139
SNI6 | 7.52 | 98% | 0.130
SNI1 | 8.07 | 100% | 0.124


Now that we had obtained the rate-of-gain values for the Y axis, we needed to formulate values for the X axis of Figure 2.1. In particular, we re-shaped the data by labeling each SNI as a hint according to the five diversity categories defined previously, as shown in Table 4.8. Then, we grouped SNIs of the same type and assigned the X-axis values according to the number of SNI categories the participant had already used. For each type, we assigned 1 unit of X-axis value, dividing the unit evenly if there were several SNIs in the type. For example, in Table 4.8, the first type contained two SNIs, so the X value for SNI1 is assigned as 1/2 = 0.5 and that for SNI2 as 1.0. This strategy can be imagined as shrinking the X space whenever several SNIs are grouped into one type. A comparative example analyzed by the second categorization, contributor role, is shown in Table 4.9.

Table 4.8: Categorizing SNI and assigning X-axis values according to category #1: SNI type.

# of types of SNI | SNI category | X | Y
One type | SNI1-software API | 0.5 | 0.050
 | SNI2-software API | 1.0 | 0.060
Two types | SNI3-source code tutorial | 1.5 | 0.142
 | SNI4-source code tutorial | 2.0 | 0.144
Three types | SNI5-wiki | 3.0 | 0.139
Four types | SNI6-developer tutorial | 3.5 | 0.130
 | SNI1-software API | 4.0 | 0.124


Table 4.9: Categorizing SNI and assigning X-axis values according to category #2: contributor role.

# of types of SNI | SNI category | X | Y
One type | SNI1-core author | 0.5 | 0.050
 | SNI2-core author | 1.0 | 0.060
Two types | SNI3-main author | 1.33 | 0.142
 | SNI4-main author | 1.67 | 0.144
 | SNI5-core author | 2.0 | 0.139
Three types | SNI6-marginal author | 2.5 | 0.130
 | SNI1-core author | 3.0 | 0.124
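One way to read the assignment rule is: SNIs are grouped by runs over which the count of distinct categories stays the same, and each group splits one X unit evenly. A sketch of that reading (the function name is ours) matches the X columns of Tables 4.8 and 4.9 for their category sequences:

```python
# Sketch: assigning X-axis values from a participant's sequence of
# SNI category labels (one reading of the rule described in the text).

def assign_x(categories):
    """Each new count of distinct categories opens a 1-unit X interval,
    divided evenly among the SNIs that fall into it."""
    seen = set()
    groups = []  # lists of SNI indices sharing a distinct-category count
    for i, c in enumerate(categories):
        if c not in seen:
            seen.add(c)
            groups.append([i])   # a new category opens a new group
        else:
            groups[-1].append(i)  # a repeat stays in the current group
    xs = [0.0] * len(categories)
    for g_idx, members in enumerate(groups):
        m = len(members)
        for j, i in enumerate(members, start=1):
            xs[i] = round(g_idx + j / m, 2)
    return xs

# The SNI-type sequence of Table 4.8:
print(assign_x(["API", "API", "tutorial", "tutorial", "wiki", "dev", "API"]))
```

Running this on the contributor-role sequence of Table 4.9 likewise yields 0.5, 1.0, 1.33, 1.67, 2.0, 2.5, 3.0.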

Figure 4.19: Fitting data points into a curve.


After obtaining the X and Y values for a diversity category, we drew the fitting curve (Figure 4.19) to get an intuitive feeling for its shape. The curves obtained with the other four categorization strategies are overall similar to Figure 4.19. According to the social information foraging model (Figure 2.1), the rate-of-gain versus hint-diversity curve should follow a log-normal distribution. However, in Figure 4.19, the first two points on the left do not resemble a log-normal distribution at all. We found that there was a "cold start" problem: the 20 participants quickly scanned one or two SNI links that were not directly useful to solving the task. The participants used these webpages to warm up and get into the task, so we call them warm-up SNI. Our participants scanned on average 1.8 warm-up SNI links before they got into the real task solving. This phenomenon is in line with Ying and Robillard's study [86], which investigated whether the nature of a reuse task has any relationship with when a programmer edits code during a programming session. Their results showed that an enhancement task was less likely to be associated with a high fraction of source code edit events at the beginning of the programming session, which could be explained by the fact that enhancement tasks might require exploration throughout the trace. Their conclusion matches our observation, since our reuse tasks were similar to enhancement tasks. These starting SNI webpages did not serve our study purpose, since they did not contribute to completing the reuse task; therefore, we removed them from the raw data before performing the analysis. As indicated by the portion to the right of the dotted line in Figure 4.19, the remaining curve looks much more like a log-normal distribution. This step cleared the way for the log-normal regression analysis described in the following section.

4.3.2.3 Log-normal regression

After the warm-up SNI links were removed, our next step was to fit the five categories of data to a log-normal regression curve to see how good the fit was and which category best fit the log-normal curve. Here, we used Microsoft Excel to perform the log-normal regression. For our analysis, we needed to adjust the values of μ and σ in the distribution formula to make the log-normal function best describe our data. At first, we manually set μ (mu) and σ (sigma) to 1, as shown in Figure 4.20 (top). The predicted rate-of-gain value was calculated according to the formula. Then the difference between the actual and predicted gain rates was calculated, and finally the sum of squared differences was obtained (green rectangle). A Solver tool in Excel was then used to search for the smallest sum of squared differences by adjusting the values of μ and σ. The results are shown in Figure 4.20 (bottom): the μ and σ values are updated to achieve the smallest sum of squared differences (purple rectangle). With the obtained μ and σ, we added more points and drew the curve in Figure 4.21, which gives an intuitive feeling for how well the data fit a log-normal distribution.


Figure 4.20: Log-normal regression with Excel Solver; top: initial setting, bottom: results after running Excel Solver.

Figure 4.21: Fitting log-normal regression curve.
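The Solver step can be reproduced outside Excel. The sketch below grid-searches μ and σ to minimize the sum of squared differences between the log-normal density and the rate-of-gain data; the grid bounds and step size are our own choices, and the x/y values echo the X and Y columns of Table 4.8 after warm-up removal.

```python
import math

def lognormal(x, mu, sigma):
    """Log-normal probability density at x."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (
        x * sigma * math.sqrt(2 * math.pi))

def fit_lognormal(xs, ys):
    """Grid-search (mu, sigma) minimizing the sum of squared differences,
    mirroring what the Excel Solver run does."""
    best = (0.0, 1.0, float("inf"))
    for i in range(10, 61):           # mu from 0.50 to 3.00 in steps of 0.05
        for j in range(6, 41):        # sigma from 0.30 to 2.00 in steps of 0.05
            mu, sigma = i * 0.05, j * 0.05
            sse = sum((y - lognormal(x, mu, sigma)) ** 2 for x, y in zip(xs, ys))
            if sse < best[2]:
                best = (mu, sigma, sse)
    return best

# Rate-of-gain data after warm-up removal (X and Y columns of Table 4.8).
xs = [1.5, 2.0, 3.0, 3.5, 4.0]
ys = [0.142, 0.144, 0.139, 0.130, 0.124]
mu, sigma, sse = fit_lognormal(xs, ys)
print(round(mu, 2), round(sigma, 2), round(sse, 6))
```

A finer grid or a gradient-based optimizer would refine the estimate, but this coarse search already lands in the same neighborhood as the values reported for the SNI-type category in Table 4.10.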


Table 4.10: Log-normal regression parameters.

Category type | µ | σ | Sum of squared differences
SNI type | 1.41 | 0.93 | 6.9E-4
Contributor role | 1.54 | 1.00 | 1.5E-4
# of contributor roles | 1.57 | 1.17 | 7.3E-4
Architecture concern | 1.47 | 0.90 | 9.3E-4
Development questions | 1.42 | 0.93 | 5.6E-4

Similarly, we plotted five regression curves for each participant according to the five diversity categories. For each curve, a pair of µ and σ values and the sum of squared differences (Table 4.10) were obtained to indicate the fit of the regression. We will take one participant as an example to show the results and then discuss the situation across all participants.

In Table 4.10, the contributor role category has the smallest sum of squared differences, meaning that contributor role best predicts the productivity according to the foraging-theoretic principle. This finding applies to 15 of the 20 participants. Of the five remaining end-user developers, three have the best regression fit with the information needs concerning software architecture, whereas the other two best fit the information needs organized by complexity. Further analysis shows that the SNIs used by these five participants involve only one or two kinds of contributor roles, so hint diversity based on contributor role hardly exists for them. In situations like this, other diversity categorizations show a better ability to predict the productivity. We therefore conclude that contributor role serves as the primary factor that best manifests foraging theory's prediction. Next, we analyze the interactions between the factors.

4.3.2.4 Interactions between categories

We further studied how the interactions between two categorizations affect the theoretic prediction. Since we concluded that contributor role fits foraging theory's prediction best, we only paired contributor role with each of the other diversity categories. Here, we used the bivariate log-normal distribution [88], which basically means that the log-normal shape is a joint effect of two variables. To fit the bivariate log-normal distribution, we used an approach similar to the log-normal regression process; the difference is that we now need to adjust two pairs of parameters to achieve the best fit. The results are shown in Table 4.11.

Table 4.11: Bivariate log-normal regression parameters (C2 = contributor role).

µ (C2) | σ (C2) | µ | σ | Sum of squared differences
6.41 | 8.97 | C1: 6.59 | C1: 10.20 | 1.5E-3
6.76 | 4.37 | C3: 7.64 | C3: 10.18 | 9.2E-4
5.58 | 4.46 | C4: 10.59 | C4: 11.59 | 1.5E-6
6.55 | 7.04 | C5: 15.26 | C5: 21.46 | 6.8E-6

Since the social information foraging model best predicts productivity via contributor role for 15 of the 20 participants, we analyze interactions only for these 15 participants. In Table 4.11, when contributor role is combined with the information needs concerning software architecture, the sum of squared differences, 1.5E-6, is the smallest. This value is also smaller than the 1.5E-4 obtained when contributor role is considered alone, meaning that the combined effect predicts the theory even better than contributor role by itself. For the 15 participants we analyzed, combining contributor role with the information needs concerning software architecture best predicted productivity for 12 of them; the remaining 3 were best predicted by combining contributor role with the information needs organized by complexity. Our intuition is that contributor role and the information needs concerning software architecture are two relatively orthogonal directions for categorizing SNI, while the information needs organized by complexity are similar to those concerning software architecture, hence the latter pairing can sometimes predict productivity better.

4.4 Threats to validity

In this section we present the potential threats to the validity of our study, including threats to internal validity, external validity, and construct validity.

4.4.1 Threats to internal validity

When quantifying the productivity for fitting the social information foraging model, the quantification standard is defined mainly by the experiment designer and involves subjective estimation of code complexity along with the lines of code generated. Such estimation may not capture productivity objectively, which may affect the accuracy of our results; a more objective and detailed definition of productivity is needed in our future work. Also, the metrics and thresholds used to categorize the SNI are based on our observation and subjective understanding. We need to perform detailed case studies of different types of SNIs to define the metrics and threshold values in a robust way.

4.4.2 Threats to external validity

Our results about software developers' information needs in pragmatic reuse are clearly influenced by the architectural styles and the particular systems adopted in our experiment. While the 31 specific questions may not generalize to other settings, we believe the external validity is relatively stronger for our result categories (cf. Figure 4.4). Another external validity threat relates to the representativeness of our study participants: while we tried to be inclusive, the participants were recruited from a local community and were primarily affiliated with a university's medical campus. Also, the tasks in our experiments are well defined, which might not represent all kinds of real-world tasks. Scaffidi et al. [90] proposed an approach to categorizing end-user developers according to their programming practices and the ways they represent abstractions; end-user programming tasks can be grouped into three types: programming by example, textual-language programming such as spreadsheet formulas, and visual programming languages. The tasks of bioinformatics researchers mainly fall into programming by example, so we argue that our results generalize to end-user developers who mainly perform programming-by-example tasks, but may not generalize to other types of tasks.


4.4.3 Threats to construct validity

For our analysis about SNI’s effect on pragmatic reuse, an important construct validity threat is the pre-selection of the SNI links for both tasks. We attempted to avoid repetitiveness in our SNI preparation; however, different researchers might select different online resources. Nevertheless, the usage data reported in Figure 4.6 show that any SNI preparation will be unavoidably incomplete. As a result, we argue that the main purpose of the SNI is not to present to the developers all the links, but to encourage them to engage in an active information seeking and knowledge acquisition process.

The removal of the “cold start” points, which helped us proceed with fitting the social information foraging theory, is not fully confirmed to be the right decision. Although we found one study [86] that discussed this phenomenon and backs up our decision, more careful analysis is needed to justify it.

4.5 Summary

For the biomedical community, reusing software in a pre-planned manner is often infeasible. Pragmatic software reuse, therefore, is key to programmers' success in practice.

This chapter makes four major contributions: (1) a classification of 31 information needs elicited during pragmatic reuse tasks; (2) a flow model of how SNI are used before, during, and after pragmatic software reuse; (3) an observational experiment revealing the positive impact that developers' online social networks have on meeting the information needs and on completing the reuse tasks; (4) a social information foraging approach identifying the contributor role, together with architectural consideration, as key factors in improving the predictability of end-user programmers' productivity.


Chapter 5

Cost estimation and reduction

Our main goal is to help end-user developers improve their productivity. In the previous chapter, we mainly took approaches that increase the quality and amount of their output. In this chapter, we discuss approaches to estimating and reducing the time cost of their programming tasks. Specifically, there are two parts of work: (1) we found that, depending on the foraging goal and the kind of webpage, there are different foraging styles; we characterize these styles to provide time-cost estimation, thereby optimizing developers' selection of webpages and reducing the time cost; (2) we identify that end-user developers frequently exhibit short-revisit behavior, and we developed tool support to reduce the time cost of this behavior. Part of the material in this chapter appeared in [89] and [90].

5.1 Estimating time cost through foraging curve styles

5.1.1 Motivations


Developers, novices and experts alike, frequently use web search engines as an opportunistic approach to programming, emphasizing speed and ease of development over code robustness and maintainability [91]. Search engines are of significant value since they not only filter oceans of information but also rank webpage links according to relevance.

However, a developer may be sure that a selected webpage contains the needed information, yet end up overwhelmed by the large amount of information on the page. This kind of situation suggests that relevance alone may not be enough to identify an optimal webpage link. The cost of foraging a webpage can provide supplementary information that prevents developers from dilemmas such as being overwhelmed or distracted. This observation is in line with Martos et al.'s study [92], which found that end-user developers use diverse types of cues and “scents” besides keywords and relevance when searching for programs and their variants. Their study raises a question about search engines: is keyword-based searching for programs and their variants enough? To answer this question, our study considers foraging cost as an important type of cue.

Recently, Piorkowski et al. [93] used Information Foraging Theory [94] to study developers' ability to predict the value and cost of their investigation decisions. They found that over 50% of developers' navigation choices produced less value than predicted and nearly 40% cost more than predicted [93]. Their study revealed open problems in predicting the value and cost of navigation decisions. Related to our study, they pointed out the cost estimation problem of how to enable developers to more accurately predict the foraging costs they will incur before incurring them [93]. They further performed a literature analysis showing that little research has been done to align predicted navigation value and cost with actual values and costs, and they called for action toward solving these open problems.

The study of Piorkowski et al. [93] provides important motivation for our study, which aims to make progress on the cost prediction problem. Specifically, we focus on analyzing the time cost of foraging a webpage. We identify two factors for a webpage, information accumulation and information amount, which could assist developers in selecting appropriate webpage links. Our overall hypothesis is that by comprehensively considering a webpage's relevance, information accumulation, and information amount, developers can find useful information more easily and quickly.

For end-user developers, utilizing and developing software is not their goal but a means to achieve it. They often lack the time or motivation to learn contemporary programming techniques, and they frequently refer to knowledge from the internet, even for basic programming knowledge. Therefore, we study end-user developers' web usage behavior in this chapter. The scope of this chapter is limited to web searches related to coding because, compared to general web searches, the web searches in programming tasks are more information-intensive and cognitively complex.

To tackle information-intensive tasks in software engineering, Pirolli's Information Foraging Theory has attracted much attention lately. The theory leverages our animal ancestors' “built-in” food-foraging mechanisms [95] to understand humans' rational information seeking and gathering behaviors [96], and it provides operationalizable constructs for us to estimate the time cost of foraging a webpage.

In this chapter, we develop tool support following our hypothesis to automatically extract two features for a webpage: information accumulation and information amount. To evaluate our tool support, we invited 20 end-user developers to perform a lab experiment of two software reuse tasks. Three results supported our hypothesis: (1) participants needed less task completion time when having tool support; (2) participants with our tool support ran into fewer unproductive cases than without it; (3) participants with tool support tended to visit more easy-to-forage webpages due to the ranked contents. The key contribution of our work lies in the identification of two novel hints, information accumulation and information amount, which could facilitate end-user developers' web search process.

5.1.2 Related work

5.1.2.1 Cost evaluation & cost reduction

Holmes et al. [97] described an approach to finding source code examples in which the structure of the source code that the developer is writing is matched heuristically against a repository of source code. Ten examples are returned to the user and ranked according to structural similarity based on four heuristics: inherits, calls, uses (a relaxation of the calls heuristic), and references. Each example consists of three parts: a graphical overview illustrating the structural similarity to the queried fragment, a textual description of why the example was selected, and source code with the structural details similar to the queried fragment highlighted. The ranking, the graphical overview, and the textual rationale can be used by developers to quickly decide whether a recommended example is worth examining more closely, thereby achieving cost evaluation and reduction.

Several studies have considered humans as a valuable resource. Minto and Murphy [98] introduced the Emergent Expertise Locator (EEL), which uses emergent team information to propose experts to a developer within the development environment as the developer works. Based on the history of how files have changed and who has participated in the changes, the tool recommends a ranked list of members with whom to communicate. In another study, DeLine et al. [99] introduced Team Tracks, which relies further on humans by using the combined data across all team members. Rather than recommending members, it uses the team's navigation data to filter the typical hierarchical information about the program. Team Tracks provides a visualization called Related Items according to two principles: (1) the more often a part of the code is visited, the more important it is for someone new to the code; (2) the more often two parts of the code are visited in succession, the more related they are. Overall, humans can provide valuable information directly or indirectly, and relying on team members can reduce cost by going directly to important code sections.

Toomim et al. [100] presented a lightweight editor-based technique named Linked Editing for managing duplicated source code. With this technique, two or more code clones are identified as being similar and are persistently linked together. The differences and similarities are then analyzed, visualized, and recorded, allowing users to work with all linked elements simultaneously or with particular elements individually. It avoids the problems of the traditional solution to duplicated code, whose inherent cognitive costs lead programmers to chronically copy and paste code instead. The visualization and the linked editing technique effectively reduce the cost of handling duplicated code. To summarize, previous studies mainly focus on cost estimation and reduction for source-code-level activities. These efforts vary in the approaches used, including similarity measures, human factors, abstraction, and visualization. Our study complements these efforts by focusing on estimating and reducing the cost of foraging a webpage. Moreover, most of the previous techniques are not applicable to foraging a webpage, because webpage contents are enormous and not limited to source code, whereas most previous tools worked on a relatively small set of source code artifacts.

5.1.3 Our approach

Our overall hypothesis is that by comprehensively considering a webpage's relevance, information accumulation, and information amount, developers can find useful information more easily and quickly. Since relevance is already provided by search engines, our approach focuses on achieving cost estimation by utilizing the remaining two features: information accumulation and information amount.

5.1.3.1 Cost estimation


In Figure 2.2, we discussed the patch model of information foraging theory: as the foraging time within a patch increases, the cumulative amount of useful information gained from the patch increases. The curve increases at a decreasing speed, based on the assumption that the valuable information gained in a patch diminishes as time progresses. This assumption rests on the observations that a forager generally prioritizes the more valuable information and may later encounter redundant information that replicates what was found earlier [52]. The patch model can be constructed differently for different purposes. Our study aims to investigate how end-user developers gain information from webpages to complete their programming tasks; generally, they use a search engine to find relevant webpages.

As a result, we treat each webpage as a patch to construct the patch model. However, when applying the patch model to webpages, we realized that the foraging curves for webpages are not always shaped like Figure 2.2, increasing at a decreasing speed, because the structures and organizations of webpages vary significantly. The patch model carries a hidden assumption that the forms of patches are more or less similar to each other, which is not the case for such varied webpages. Thus we revisited Pirolli and Card's paper [52] and found an example which supports our rationale.

In [52], they described an example whose foraging curve increases linearly, as shown in Figure 5.1: an information forager collects relevant citations from a finite list of citations returned by a search engine, where the relevant items occur randomly in the list. This example hints that the foraging curve can differ according to the characteristics of an information patch.

Figure 5.1. Foraging curve adopted from [52]: A linear, finite cumulative within-patch gain function.

Figure 5.2: Foraging curves: (a) Conceptual Foraging; (b) Answer Seeking Foraging. Compared with Figure 1 and Figure 2, we exclude between-patch time and include only within-patch time, since only the within-patch curve is used in the rest of our analysis.


Inspired by these two shapes of foraging curve, we further analyzed more webpages, found two additional shapes, and summarized the foraging curves into four categories. The first category was depicted in Figure 2.2, meaning that the contents of a webpage are ranked according to a certain standard. Q&A websites such as Stack Overflow, Quora, Answers.com, and Yahoo Answers are typical examples, because these webpages rank the answers according to users' votes. We name this category Ranked Foraging. The second category was depicted in Figure 5.1, indicating that the forager gains information from a webpage linearly. Webpages containing listed items belong to this category; for instance, an API website for a software system lists the functions that can be used, and the number of functions learned is roughly linear in the time spent. We name this category Linear Foraging. The third and fourth categories are shown in Figure 5.2-(a) and Figure 5.2-(b), respectively. The third category represents the situation in which the forager's goal is not to seek information but to study a certain concept. The knowledge learned accumulates gradually and exponentially, and the forager may reach a tipping point of in-depth understanding. Wiki and blog websites are examples for such a foraging goal because they aim to convey knowledge comprehensively. We name this category Conceptual Foraging. The fourth category represents the situation in which the forager seeks a specific piece of information from a webpage; before the answer is found, the gain stays near zero. For instance, on a forum webpage with posts from many people discussing a topic, the forager scans the page, skips the discussions unrelated to his needs, and, once he finds the right answer and solves his problem, leaves the webpage without reading the remaining contents. We name this category Answer Seeking Foraging. We summarize the four categories of foraging curves as a feature named information accumulation. Besides information accumulation, information amount is another important factor for cost estimation, which we discuss next in the Tool Design section.
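The four idealized within-patch gain shapes can be sketched as simple functions of foraging time. The functional forms and parameter values below are illustrative assumptions of ours, not models fitted to our experimental data:

```python
import math

def ranked_gain(t, total=100.0, rate=0.5):
    """Ranked Foraging: diminishing returns -- the best answers come first."""
    return total * (1.0 - math.exp(-rate * t))

def linear_gain(t, rate=10.0, total=100.0):
    """Linear Foraging: listed items yield information at a constant rate."""
    return min(rate * t, total)

def conceptual_gain(t, total=100.0, midpoint=5.0, steepness=1.0):
    """Conceptual Foraging: understanding accumulates slowly, then tips."""
    return total / (1.0 + math.exp(-steepness * (t - midpoint)))

def answer_seeking_gain(t, answer_time=8.0, total=100.0):
    """Answer Seeking Foraging: near-zero gain until the answer is found."""
    return total if t >= answer_time else 0.0
```

For example, `ranked_gain` gains more in its first minute than in its second, while `linear_gain` gains the same amount in both, matching the contrast between Figure 2.2 and Figure 5.1.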

5.1.3.2 Tool design

Having identified the two factors that contribute to the foraging cost, information accumulation and information amount, we designed and developed corresponding tool support. The ideal solution would automatically depict the shape of the foraging curve by analyzing the characteristics of a webpage, but it is difficult to find a straightforward approach to this automation, since the curve depends on the user's query and goal and on the time sequence of the user's information foraging. Therefore, we adopt a compromise: we judge a webpage's foraging curve according to its webpage type, such as wiki, forum, API, or Q&A. Specifically, we identify the webpage type according to keywords contained in the URL of the webpage. We first build a dictionary for each of the four categories of foraging curves; example keywords include blog, wikipedia, wiki, courses, stackexchange, stackoverflow, khanacademy, lecture, .java, api, tutorial, forum, and so on. According to these keywords, we first judge the type of a webpage and then assign one of the four categories of foraging curves accordingly. It is unavoidable that some URLs do not contain any keyword in our dictionary; we put these webpages into the Linear Foraging category since it is neutral and influences foragers' selection the least. We tested our approach on 200 randomly selected webpages: 72.5% of the webpages were distinguishable according to the keywords, while the remaining 27.5% were not, so we put them into the Linear Foraging category. The accuracy is 78.5%, meaning that 78.5% of the webpages were assigned to foraging curves correctly.

Although we sacrificed some accuracy, we achieved a feasible implementation of our hypothesis. Finally, our tool returns an image of the corresponding curve for a webpage link, as shown above the black underline in Figure 5.3.
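The URL-keyword heuristic described above can be sketched as follows. The keyword lists here are an illustrative subset, not CostEstimator's actual dictionaries; unrecognized URLs default to the neutral Linear Foraging category:

```python
# Hypothetical subset of the keyword dictionaries; category checked in order.
CURVE_KEYWORDS = {
    "Ranked Foraging": ["stackoverflow", "stackexchange", "quora", "answers"],
    "Linear Foraging": [".java", "api", "reference"],
    "Conceptual Foraging": ["wikipedia", "wiki", "blog", "tutorial",
                            "courses", "lecture", "khanacademy"],
    "Answer Seeking Foraging": ["forum", "discuss"],
}

def classify_url(url):
    """Assign a foraging-curve category from keywords found in the URL."""
    lowered = url.lower()
    for category, keywords in CURVE_KEYWORDS.items():
        if any(kw in lowered for kw in keywords):
            return category
    # Neutral default: influences the forager's selection the least.
    return "Linear Foraging"

print(classify_url("https://stackoverflow.com/questions/5766318"))
# Ranked Foraging
```

A substring match is deliberately crude: it mirrors the accuracy trade-off discussed above, where simplicity buys feasibility at the cost of occasional misclassification.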

Another tool feature quantifies the amount of information on a webpage. We use the number of words on a webpage to estimate the amount of information, since words are the most important carrier of information. We noticed that some webpages contain non-primary content, such as advertisements, which readers generally ignore. Our approach nevertheless counts these words, based on three considerations: (1) the extra count does not influence the overall estimation much, since the non-primary content is not dominant; (2) the word counts of webpages vary significantly, from several hundred to hundreds of thousands, so the webpages remain distinguishable; (3) it is difficult to separate the non-primary content automatically.


Figure 5.3: An example result with our tool support, showing the first five result links when searching “java double to string” in Google. Our tool adds information accumulation and information amount hints for each webpage link, as shown above the black underline. The first curve represents Ranked Foraging; the remaining curves belong to Linear Foraging.


A previous study showed that people can generally read about 300 words per minute [101]. Dividing the word count by this speed, we obtain the approximate time needed to forage a webpage. As a result, our tool returns the time information as well as the word count for each webpage, as shown above the black underline in Figure 5.3.
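The word-count and reading-time estimate amounts to a single division; a minimal sketch, assuming whitespace tokenization as a simplification of extracting words from a page:

```python
import re

WORDS_PER_MINUTE = 300  # approximate reading speed reported in [101]

def estimate_foraging_time(page_text):
    """Return (word count, approximate minutes to forage) for a page's text."""
    word_count = len(re.findall(r"\S+", page_text))
    minutes = word_count / WORDS_PER_MINUTE
    return word_count, round(minutes, 1)
```

A 150-word page, for instance, yields an estimate of half a minute, matching the “0.5 minutes” hint shown for the third link in Figure 5.3.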

Figure 5.3 comes from a situation in which we want to know how to convert a double-type variable to a string. It displays the first five result links returned by the Google search engine for the query “java double to string”. From the information accumulation and information amount hints in Figure 5.3, one could judge that the first and third links are probably good options to forage, because the first link contains ranked information and the third link contains the least information, requiring only 0.5 minutes to forage. This example reflects the value of our tool support: another layer of information filtering besides relevance.

In summary, our tool provides two novel hints of information accumulation and information amount to developers, which could facilitate their web search process according to our hypothesis.

5.1.4 Evaluation design

To evaluate our tool support and hypothesis, we performed a lab experiment. Our independent variable was the designed tool support, named CostEstimator, which we wanted to test in a controlled manner. The population that our study intends to impact is end-user developers; again, we selected bioinformatics researchers who develop biomedical software as the target group. Because we used the same tasks as in our previous experiment in Chapter 4, we invited twenty new participants, recruited from the local community via email invitations. To be eligible, each individual had to consider writing software an essential (as opposed to accidental) part of their work. Our participants had varied backgrounds: 12 had no professional software development experience, 2 had less than a year, 5 had 1-5 years, and 1 had more than 5 years. For the tasks, we used the same two tasks as in Chapter 4: the ImageJ task and the StochKit task. The participants worked individually in a lab and began by signing the consent form and completing a background survey. This time, the treatment was our tool CostEstimator instead of social network information (SNI). Each participant performed one task with CostEstimator and the other without it. We counterbalanced both the CostEstimator treatment order and the task order.

5.1.5 Results and analysis

5.1.5.1 Reduced time for task completion

When performing the software reuse tasks, each participant's computer screen was recorded as a video, and our results are generated by analyzing these videos. We first want to know whether our tool support can reduce the overall task completion time. We excluded one participant's data from the analysis since the participant did not finish the StochKit task: he struggled with the C language because he had a biology background and had only programmed in Matlab and Python before. The task completion times for the remaining 19 participants are presented in Table 5.1.

Generally speaking, developers spent less time on the ImageJ task than on the StochKit task. Comparing the median completion times, the tool support of CostEstimator helped both tasks finish faster: for the ImageJ task, a 20.8% reduction from 60 to 47.5 minutes, and for the StochKit task, an 8.7% reduction from 74.8 to 68.3 minutes. However, the effect is statistically significant only on the ImageJ task (Wilcoxon signed rank test: p=0.0020, α=0.05), not on the StochKit task (Wilcoxon test: p=0.1235, α=0.05). We speculate this is because the complexity of the StochKit task lies more in comprehending the original source code, and the function the participants needed to write is relatively simpler than in the ImageJ task; for the StochKit task, the participants needed less external information from websites.
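For reference, the paired test above can be computed as in the following pure-Python sketch of the Wilcoxon signed-rank statistic with a normal-approximation two-sided p-value. This is the textbook formulation without continuity or tie-variance corrections, not necessarily the exact procedure of the statistical package used in our analysis, and the data in any usage would be hypothetical, not our participants' times:

```python
import math

def wilcoxon_signed_rank(x, y):
    """Wilcoxon signed-rank statistic T and a normal-approximation
    two-sided p-value for paired samples x and y (zero differences dropped;
    assumes at least one nonzero difference)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # Rank absolute differences (1-based), averaging ranks over ties.
    ordered = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]]):
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    t = min(w_plus, w_minus)
    # Normal approximation: T has mean n(n+1)/4 and variance n(n+1)(2n+1)/24.
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (t - mean) / sd
    p = 2.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # two-sided
    return t, p
```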

Table 5.1: Comparison of task completion time: “Median” means the median task completion time, “SD” represents standard deviation.

Task        Without CostEstimator    With CostEstimator
            Median (SD)              Median (SD)
ImageJ      60 min (3.4)             47.5 min (2.7)
StochKit    74.8 min (4.5)           68.3 min (4.7)

We also noticed that the extra overhead of using CostEstimator can impact task completion time. The overall mechanism of CostEstimator is to parse the original webpage source code, add the estimation information to it, and finally return the webpage. This process of parsing and adding information prolongs the response time when accessing a webpage. To measure the overhead, we first analyzed the original query response time of Google; we analyzed only Google since all participants in our study used it to search for information, and it holds a dominant position among search engines. From both our experiment and a previous study [9], we found that Google's average response time to queries is 2 seconds. We then analyzed the response time with CostEstimator support, which ranges from 5 to 8 seconds. A 3-to-5-second increase per query is not a small amount at first glance; however, the total extra overhead is no more than 2 minutes for a task, since no more than 20 queries were used. We consider the 2-minute overhead acceptable, since CostEstimator achieved a reduction of more than 10 minutes for each task. To achieve better performance, we will seek strategies to reduce this overhead in the future. The extra overhead also makes us think that a more advanced solution involving complex techniques such as natural language processing may have an even bigger per-webpage processing overhead; from this perspective, our simple approach may be preferred although it suffers from accuracy problems. Further study is needed to examine this speculation.

Overall, the reduced task completion time provides preliminary evidence that our tool is effective, but further analysis is needed to examine how the tool impacts developers' behavior.


5.1.5.2 Shifted pattern for foraging curves

To study what changes our tool brought about, we calculated the distribution of webpages over the four categories of foraging curves. Specifically, for each of the 19 participants, we analyzed all the webpages that were accessed. We partitioned the webpages into two groups by whether a webpage was found with or without tool support, classified them according to the four categories of foraging curves, counted the number of webpages in each category per participant, and then calculated the average for each category. Table 5.2 summarizes the results: the distribution of the number of webpages over the four categories of foraging curves. We can see a general trend that, when performing tasks with CostEstimator, the number of visited webpages is reduced; our further analysis found that this is because, with tool support, participants went to fewer webpages that were less useful to their task.

Table 5.2: Distribution of webpages according to the four categories of foraging curves. The data are averaged from 19 participants.

Categories        ImageJ-without   ImageJ-with   StochKit-without   StochKit-with
Total             32.6             25.8          20.6               16.7
Ranked            10.7 (33%)       10.9 (42%)    6.9 (33%)          7.4 (44%)
Linear            9.2 (28%)        5.6 (22%)     5.3 (26%)          3.8 (23%)
Answer Seeking    8.8 (27%)        6.2 (24%)     6.1 (30%)          3.4 (20%)
Conceptual        3.9 (12%)        3.1 (12%)     2.3 (17%)          2.1 (13%)


To compare the data generated with and without tool support, we transformed the data into percentages (Table 5.2) and generated Figure 5.4. From Figure 5.4, for both the ImageJ and StochKit tasks, when having tool support from CostEstimator, participants used more webpages from the Ranked Foraging category and fewer webpages from the Linear Foraging and Answer Seeking Foraging categories. Our further analysis reveals the reason: participants had more questions about implementation details than conceptual-level questions, which is also reflected in Figure 5.4 by the few webpages in the Conceptual Foraging category. Therefore, the tool support led participants to access more webpages from Ranked Foraging, where answers can be easily found. This observation explains, from one perspective, how our tool helps improve efficiency.

Figure 5.4: A trend shifting to the Ranked Foraging category when having tool support.


Since CostEstimator itself is not fully accurate when categorizing webpages into the four types of information accumulation hint, we also analyzed the categorization accuracy in our experiment. Two researchers first analyzed and agreed on 30% of the webpages, and then partitioned and analyzed the remaining 70% separately. For the ImageJ task with CostEstimator, on average 3.8 of the 25.8 visited webpages were misclassified, an accuracy of 85.3%; for the StochKit task, on average 2.7 of 16.7 webpages were misclassified, an accuracy of 83.8%. For both tasks, the accuracy is higher than the 78.5% previously measured on 200 random webpages. The reason is that the keyword dictionary we built focuses more on programming-related webpages, so CostEstimator classifies the webpages in our tasks more accurately than general webpages. Also, roughly 70% of the misclassified webpages were categorized into Linear Foraging from the other three categories, owing to our strategy of categorizing unrecognizable webpages as Linear Foraging. This accuracy loss could impact our finding that participants tend to forage more webpages in the Ranked Foraging category. However, we consider the impact not a significant threat because: (1) only a small portion of webpages were misclassified; (2) webpages were mainly misclassified into the Linear Foraging category from the other three categories, so the pattern would be even more pronounced if the classification were accurate.

5.1.5.3 Reduced number of unproductive cases of foraging

During the programming tasks, participants frequently accessed webpages that turned out not to be directly useful for their tasks; we call this an unproductive case of foraging. According to our observation, there are mainly two situations for unproductive cases: (1) the webpage does not contain the information needed by the participant; (2) the webpage does contain the needed information, but the participant cannot locate it. For the first situation, our tool support cannot help much, since the tool provides cost estimation but does not reflect the relevance of a webpage. For the second situation, we expect our tool to help, since cost estimation information can remind participants not to go to webpages with large amounts of information that are difficult to forage. To test this, we partitioned webpages according to unproductive or productive cases and again calculated the average values. We summarize the results in Table 5.3. For the ImageJ task, we do see a decrease of unproductive cases from 40.5% to 29.5% with tool support. For the StochKit task, there is only a small decrease, from 20.9% to 18.6%; we speculate this is again because participants needed less external information, so the effect is not obvious.

Table 5.3: Distribution of webpages for unproductive and productive cases of foraging webpages. The data are averaged from 19 participants.

Categories      ImageJ-without   ImageJ-with    StochKit-without   StochKit-with
Total           32.6             25.8           20.6               16.7
Unproductive    13.2 (40.5%)     7.6 (29.5%)    4.3 (20.9%)        3.1 (18.6%)
Productive      19.4 (59.5%)     18.2 (70.5%)   16.3 (79.1%)       13.6 (81.4%)


For the ImageJ task, we further compared the unproductive cases with and without tool support. For the unproductive cases without tool support, a webpage had on average 8691.3 words, whereas with tool support a webpage had only 1409.6 words. We also compared the information accumulation feature, which showed no pattern or significant difference. This indicates that the decrease in unproductive cases is mainly caused by the information amount feature: with tool support, participants went to fewer webpages containing a large amount of information.

In addition, we received positive feedback from our participants. Regarding the feedback on our tool, 15 participants felt that the tool was overall helpful in selecting a good webpage link, although not always. Some of them noted that the two features do not generally function together: depending on the situation, sometimes the information accumulation hint helps, sometimes the word count helps, and in other situations neither of the two helps.

Three of the participants said they felt the tool had some influence on their selection, but they were not sure whether it improved their efficiency. Our analysis of their data shows that they finished the task faster with tool support. Two of the participants said the word count was actually more useful because it helped prioritize their selections, whereas the foraging curves could not accurately depict the webpage content. Their experience reveals a drawback of our tool: it cannot always predict information accumulation accurately.

In the future, we will seek more accurate ways to summarize the information accumulation, which will probably require a deeper analysis of webpage structure based on the HTML source code.

5.1.6 Threats to validity

One of the external validity threats to our study is the representativeness of the study participants. Our experiment involved a limited number of participants, and these bioinformatics researchers may not represent other kinds of end-user developers. Scaffidi et al. [102] proposed an approach to categorizing end-user programmers according to their programming practices and the ways they represent abstractions.

Following their approach, we summarize the key programming practice of bioinformatics researchers as reusing and changing software, ranging from reusing a complete software system to importing certain existing library packages. The two tasks in our experiments reflected these key practices; thus we expect our study to be applicable to end-user developers whose main programming practices are reusing and changing software.

The four categories of foraging curves may not be complete in describing webpages, which have significant diversity. However, we argue that this incompleteness does not undermine our results and conclusions, for two reasons: (1) we used a consistent standard to classify the webpages; (2) we aim to show the difference between patterns, not to enumerate all possible patterns.


In our experimental design, each participant performs one task with and the other task without tool support. This design may bias the participants toward returning positive feedback. To mitigate this bias, we explicitly told participants before collecting their feedback that we were neutral toward the tool features and willing to accept any suggestions or critiques. In our tool design, we oversimplified the complex situations in webpages when calculating the word count, and special situations can threaten this simplifying assumption. For example, a code snippet can contain a large amount of information in relatively few words, so it is probably not fair to equate the information carried by a word in a code snippet with that in a general paragraph. Another example is that our strategy ignored the information contained in images on a webpage. A possible future solution is to treat an image as a certain number of words to reduce the amount of neglected information.

Another key threat comes from the unpredictability of human behavior. Humans are good at information seeking; they may find useful information quickly even when a webpage is not well organized or contains a large amount of information. Different people vary in their information seeking and foraging abilities, and even the same individual may perform differently depending on various factors. These variabilities may work against our hypothesis and tool design. However, we argue that their impact is largely reduced by averaging and cross-comparing the performance of 20 participants.

5.1.7 Summary

In this part of the work, we started from the patch model in information foraging theory, aiming to analyze end-user developers' web search behavior. We discovered four categories of foraging curves for webpages, which we summarized as a hint of information accumulation. We hypothesized that the two hints, information accumulation and information amount, would facilitate end-user developers by estimating the cost of foraging a webpage. We then developed tool support to provide the two hints. With the tool support as the independent variable, we performed a lab experiment to evaluate our hypothesis. Three results supported it: (1) participants required less task completion time with tool support; (2) participants with tool support ran into fewer unproductive cases than without it; (3) participants with tool support tended to visit more easy-to-forage webpages due to the ranked contents. In addition, the participants' positive feedback showed that both hints could help them finish their tasks. The key contribution of this part of the work lies in the identification of two novel hints for a webpage, information accumulation and information amount, which can facilitate end-user developers' web search process.

5.2 Reducing time cost of short-term revisit behavior

5.2.1 Motivation

Programming is an information-intensive process: programmers constantly need to gather various kinds of information to write correct code, including programming task requirements, new programming concepts, clarification of existing knowledge, and reminders about forgotten details [91]. Since programmers cannot always keep all the detailed information in mind, they constantly need to go back and revisit various kinds of documents. Some revisit behavior is straightforward and fast, while in other situations revisiting can be inefficient and thus hurt productivity [103]. In order to improve programmers' efficiency regarding revisit behavior, studying the characteristics of this behavior during programming is a prerequisite.

Previous efforts have mainly focused on revisit behavior on general webpages, covering the high share of revisits in overall browsing history, various revisit patterns, the reasons behind revisit behavior, and revisit prediction. Previous studies consistently found a high revisit rate, ranging from 45% to 81%, during web browsing and usage [104]-[106], indicating that revisit behavior is an important research topic. Unlike the studies focusing on general Web pages, Sawadsky et al. [103] focused on revisits of code-related Web pages and developed a tool called Reverb which correctly predicts 51% of code-related webpage revisits. These studies provide valuable insights into different perspectives on revisit behavior. However, they are all based on log analysis over a long time span, generally from days to months. We would like to zoom in and study short-term revisit behavior, within several hours, during end-user developers' everyday programming tasks. Specifically, this section presents a study of end-user developers' revisit behavior during software reuse tasks.

End-user developers care more about generating results by rapidly making a software tool work than about developing a well-engineered tool. They often lack the time or motivation to learn contemporary programming techniques. Therefore, they frequently visit and revisit Web information, even for basic programming knowledge, during their software reuse tasks. Based on these observations, we started to think about how to study end-user developers' behavior during software reuse tasks. We use Pirolli's information foraging theory as the basis for studying and modeling their revisit behavior.

In this section, we first perform an exploratory experiment to observe the revisit behavior and develop tool support. We then apply information foraging theory to model the revisit behavior and generate an initial hypothesis. Finally, to test the hypothesis, we perform controlled experiments with the tool support as a treatment, involving 20 biomedical software developers performing software reuse tasks. Our aim is to study short-term revisit behavior and provide principled tool support that helps end-user developers improve their productivity. The contributions lie in three aspects. First, this work identifies the novel research topic of short-term revisit behavior, which can potentially improve programmers' efficiency. Second, it utilizes information foraging theory to quantitatively model revisit behavior, which helps us understand this phenomenon from a novel perspective. Finally, our preliminary tool support helped programmers reduce the time to finish their reuse tasks by 19.7%.

5.2.2 Related work

5.2.2.1 Web page revisitation

The study of revisit behavior started with the question of how large a share revisits occupy in searching and browsing behavior [105], [107]-[109], [111]-[114]. An early study in 1997 by Tauscher and Greenberg found an average revisit rate of 58% by analyzing 6 weeks of Web usage data collected from 23 users [107]. Later, in 2001, Cockburn and McKenzie reported a significantly higher revisit rate of 81% in their Web logs [105]. Baldi et al. speculated that the reason for this higher revisit rate is that Web usage may have evolved from a more exploratory mode to a more utilitarian mode [105]. However, Herder found that the actual reason for the reported increase in page revisits is differences in the preprocessing step [106]. Herder's study also reported a revisit rate of 51% after careful data collection and preparation. Although the concrete revisit rates reported by previous studies differ, the conclusion is consistent: revisiting is a constant and repetitive behavior worth research attention.

Having established that revisiting is a non-negligible behavior, the next research question concerns the actions or patterns of revisit behavior. Adar et al. [108] studied revisit patterns based on the time span and frequency of revisits. They performed a large-scale analysis of 612,000 users' Web interaction logs over five weeks and identified four categories of revisit patterns: rapid repeat revisits, slower repeat visits, a mix of fast and slow repeats, and variable time between repeats; each pattern mainly corresponds to certain kinds of Web pages [108]. They further analyzed the reasons behind these behaviors and discussed how the patterns can help predict future revisit behavior more accurately.

The study by Obendorf et al. [109] also categorized revisit behavior according to the time span. As shown in Figure 5.5, they distinguished short-term, medium-term, and long-term revisits, along with the proportion of each category, and identified different user strategies for revisiting Web pages in each category. For short-term revisits, instead of navigating back and forth, people switch between windows or tabs, which results in fewer page requests and fewer revisits. For medium-term revisits, ranging from one hour to a week, direct-access strategies (URL entry, bookmark selection) were most frequently used. Long-term revisits generally aim to rediscover content accessed earlier, and hyperlinks initiated the most long-term revisitations. Rediscovery is problematic because, after a long time, users often have trouble remembering the original query, and the result pages of global search engines change rapidly. They concluded that although long-term revisits accounted for only 7.6% of all page revisits, users encountered severe problems in this category [109].

Figure 5.5. Categories of revisit behaviors (adapted from [109]).

Although previous studies concluded that long-term revisits are the most demanding research area, we study short-term revisits in this section. Our hypothesis is that although people have no problem re-finding previously visited information after a short while, the time cost of revisiting can still be reduced, thereby improving overall efficiency, especially when people are focusing on a programming task, which requires digesting a large volume of information and doing highly intellectual work.


5.2.3 Exploratory study of end-user developers' revisit behavior

Our first study objective is to investigate what revisit behavior looks like during end-user developers' software reuse tasks and whether it is a meaningful topic to study. To address these questions, we analyze the videos recorded in the experiment described in Section 5.1, which consists of the ImageJ task and the StochKit task with CostEstimator as treatment. We do not use the experiment videos with SNI as treatment, because we consider that the pre-selected SNI provided by us has more influence on the participants' revisit behavior.

5.2.3.1 Data preprocessing

The computer screen was recorded as a video to capture each participant's behavior. To analyze the behavior in detail, we carefully examined each video and fragmented the participant's behavior whenever the participant switched to another window. The fragmented information was recorded in a spreadsheet. Table 5.4, extracted from the spreadsheet, shows a brief example of how the video was segmented for part of a participant's session. We call each window the participant visited an entity and assign a distinct entity number to each entity. We also analyzed and recorded other information, including the time spent on an entity, the goal of visiting the entity, the actual behavior within the entity, and what the result looked like. Since such analysis is subjective, two researchers analyzed the videos to construct the spreadsheets, then compared them and resolved conflicts together.


Table 5.4: Fragmented behavior from a participant's video.

Table 5.4 shows an example of revisit behavior in bold. In the software reuse task, the participant needs to calculate the R2 value for a set of floating-point numbers. By exploring Google search results, he finally found a useful Web page, "Webpage-2: LinearRegression.java" [110], a piece of source code consisting of 48 lines that performs a linear regression calculation. He copied the whole function and decided to paste and call it in his source code, "Entity 2: HelloWorld.java" in Table 5.4. He switched back to the source code file for 5 seconds, then realized that it was difficult to call the whole function directly. So he changed his mind and revisited Webpage-2 to integrate the code little by little into his own source code. There are other situations of revisit behavior with various reasons behind them; we do not include them here due to space limitations.
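For readers unfamiliar with the reused computation, the coefficient of determination (R2) the participant needed can be sketched as follows. This is a generic least-squares formulation in Python, written by us for illustration; it is not the actual LinearRegression.java code the participant found.

```python
def r_squared(xs, ys):
    """Coefficient of determination R^2 for a simple least-squares line fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope and intercept of the least-squares regression line
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - SS_res / SS_tot
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# a perfectly linear data set gives R^2 = 1
print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```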

5.2.3.2 How many visits are revisits?

To answer the question of how many visits are revisits, we first need to define what a revisit is in our situation. Compared to the pure browsing behavior heavily studied previously, revisit behavior in a software reuse task differs in two ways. First, revisits during a software reuse task include not only the webpages searched and browsed, but also other kinds of material, such as the source code file currently being worked on, software documentation, the task description, etc. Therefore, our study of revisits has a broader scope that includes all revisited materials. Second, during a software reuse task, participants not only search, gather, and understand information from all kinds of sources, but also digest the information to generate lines of source code. All the information gathering and understanding revolve around the core source code file to be changed; participants constantly need to go back and revisit the source code file to understand or edit it. Considering this, we do not count the constant returning to the source code as revisit behavior.

By analyzing the data in the 5 spreadsheets generated previously, we summarize the results in Table 5.5. The results are the averages and standard deviations of the 5 participants' raw data. The numbers of visits and revisits are calculated in two ways: non-redundant and redundant. Non-redundant means we count only the distinct entities a participant visited, while redundant means we count all visits, including repeated visits to the same entity.
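The counting scheme above can be made concrete with a small sketch. The function and the example session log are ours for illustration (not our actual analysis scripts); following our definition, returns to the source code file are excluded from the revisit counts.

```python
def visit_stats(entities, source_entity):
    """From an ordered log of visited entity ids, compute
    (non-redundant visits, redundant visits,
     non-redundant revisits, redundant revisits).
    Returns to source_entity are not counted as revisits."""
    seen = set()
    revisit_events = []  # one entry per redundant revisit
    for e in entities:
        if e in seen and e != source_entity:
            revisit_events.append(e)
        seen.add(e)
    return (len(set(entities)), len(entities),
            len(set(revisit_events)), len(revisit_events))

# hypothetical session: S is the source code file, W1/W2 webpages, D a document
log = ["S", "W1", "S", "W2", "W1", "S", "D", "W1"]
print(visit_stats(log, "S"))  # (4, 8, 1, 2)
```

Here W1 is visited three times, so it contributes one distinct revisited entity and two redundant revisits, while the repeated returns to S are ignored.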

On average, the participants visited 31.3 distinct entities in 49.3 minutes. Counting repeated visits, there are 125.6 visits in total. Note that the difference between the two numbers is not the number of revisits under our definition, because participants constantly return to the source code they are editing, which we do not treat as a revisit. The participants on average made 31.3 revisits (counting repeats), covering 7.2 distinct entities. The average time spent on revisits is 10.2 minutes, which is 20.7% of the total task completion time. Twenty percent is not a small number, and we believe more research is needed to reduce the time spent on revisit behavior. The revisit rate is 23.0% for the non-redundant calculation and 24.9% for the redundant calculation, which is overall consistent with the time spent on revisits. The redundant revisit rate of 24.9% in our case is much smaller than the 45% to 81% reported by previous studies of general Web page revisits [104]-[106]. In our understanding, the reason is that our study method and scope are quite different from previous studies': they generally perform Web log analysis over several months, whereas we observe end-user developers' activities for about one hour. Therefore, a direct comparison is probably not meaningful.

Table 5.5. Statistical results of visits and revisits.

                              Average # (std)   Average time (std)

Total visits (non-redundant)  31.3 (6.4)        49.3 min (8.4 min)
Total visits (redundant)      125.6 (28.7)
Revisits (non-redundant)      7.2 (3.1)         10.2 min (4.3 min)
Revisits (redundant)          31.3 (8.2)
Revisit rate (non-redundant)  23.0% (5.2%)      20.7% (5.8%)
Revisit rate (redundant)      24.9% (4.1%)


5.2.3.3 What support can we provide?

In order to provide support that reduces revisit costs, we need to identify the key characteristics of revisit behavior during end-user developers' reuse tasks. From the results above and our observations, we identified three characteristics. First, if a participant revisited an entity, he will probably revisit it more times in the future. This phenomenon is also discussed in [103], which studied revisits of code-related Web pages and found that more than 13% of code-related page visits are revisits, with some participants revisiting more than three times per hour. In our experimental data, a single revisit to an entity does not cost much time, but multiple revisits make the cumulative time cost non-negligible. Second, we found that even though participants have no difficulty finding entity information previously visited, the process is sometimes costly. Since many windows, tabs, and pages are open while performing the reuse task, participants may repeatedly go to the wrong places to find information and sometimes even get lost and forget what they initially wanted to do. This phenomenon was also observed in [115], which found that developers spend significant time navigating code fragments across multiple locations. In this regard, we cannot reduce the number of revisits, since revisit behavior is driven by the participant's inner goal, which is difficult to change; however, we can provide tool support to largely reduce the complexity of finding previously visited entities. Third, the revisit behavior itself is essential to finishing the task, but the context switch can interrupt the participant's flow. The participant needs time to refocus on the source code file he is working on and figure out which step he is on, and the longer he explores other entities, the longer he needs to re-concentrate on the source code file, which is the core of finishing the task.

This interruption and refocus problem is also discussed in previous studies [116], [117].

Based on the above observations, we designed tool support as shown in Figure 5.6. The general idea is that the developer can capture the required information anytime and anywhere on the screen; the captured information then floats on the screen so that the developer does not need to go back and forth when foraging for that piece of information. In this way, time and energy are saved by reducing switch and revisit behavior. The tool combines two features. First, it utilizes a screenshot tool with a hot key to access the function conveniently. Second, the image generated by the screenshot tool automatically stays always on top of the screen. This feature is adapted from a software tool named FileBox Extender [118], which can pin a window to always be on top. Our tool also supports multiple windows floating on top simultaneously. Figure 5.6 shows an example in which a programmer needs to read input data from a text file as well as preprocess the data for later use. The user can work on the source code file while always referring to the two small floating windows. This design reduces redundant behavior in programming and keeps the programmer's flow smooth. Although the design sounds very simple, we expect it to be effective in reducing the time cost of participants' revisit behavior, because it targets the complexity of revisits discussed earlier.


Figure 5.6: A use case of our tool support.

Different from other tasks, programming requires absolute correctness of code to make it run. Therefore, even a few lines of reused code require the programmer to go back and check several times to adapt them to his own code. Generally, there are two approaches to reusing code from online resources. One is to adapt the code little by little, integrating it into one's own code; in this case, our tool can effectively improve the programmer's efficiency. The other is to copy a large chunk of code and change it little by little to fit one's own code. However, this method has a shortcoming: in many situations the programmer may fall into chaos when trying to make the code right, and may even ruin the originally correct code. This phenomenon applies especially to end-user programmers, who have many opportunistic habits when programming. In this situation, our tool can still help, because the programmer can always refer to the small windows showing the code section as it looked originally, before any changes were made.

Our tool support has some similarities with Code Bubbles [115], developed by Bragdon et al., which aims to help developers in two ways. First, it provides lightweight editable information fragments called code bubbles to help developers concentrate on a small piece of closely related information. Second, the code bubbles are displayed concurrently within an IDE window so that developers can refer to this information conveniently and reduce navigation interactions. Their tool design is based on the similar observation that developers spend significant time reading and navigating code fragments spread across multiple locations. Compared with Code Bubbles, our tool is not limited to a specific IDE and can show small fragments of information above any software window. Also, our tool can capture any kind of information needed by the programmer, not only code fragments. A disadvantage of our tool is that it lacks the automatic layout of multiple window fragments implemented in Code Bubbles. The research objective of developing this tool support is to use it as a treatment in the experiment described next.

Having gained initial insights from the exploratory experiment, we developed tool support that we expect to improve the bioinformatics researchers' programming efficiency. In this section, we redesign our experiment to integrate our tool as a treatment. The aim is to test whether our idea and tool are effective in actual software reuse tasks. Again, we used the ImageJ and StochKit tasks, and we invited 20 bioinformatics researchers who had not performed the two tasks before.

5.2.4 Results and analysis

5.2.4.1 Reducing revisit time with EasyRevisit

Having the experiment video data, we first want to know whether our tool support reduces revisit time during the tasks. We excluded one participant's data from this analysis since the participant did not finish the StochKit task; he struggled with the C language in the StochKit task because he had a biology background and had only programmed in Matlab and Python before. The results for the remaining 19 participants are presented in Table 5.6 for the ImageJ task and Table 5.7 for the StochKit task. We used the same strategy as in Table 5.5 to summarize the data. Differently from Table 5.5, the data in brackets are from the experiments without the EasyRevisit treatment, while the data without brackets are from the experiments with EasyRevisit.

For the ImageJ task, we can first compare the data without EasyRevisit against the data in Table 5.5; they are comparable since both come from experiments without EasyRevisit support. Comparing the bracketed data in Table 5.6 with the data in Table 5.5, we find that the data are overall consistent, which validates the reliability of our data to some extent. We can then compare the data with and without the treatment, that is, the data without and with brackets in Table 5.6. Even though the number of distinct revisited entities increased slightly from 6.8 to 7.5 when using EasyRevisit, the total number of revisits decreased significantly from 33.2 to 13.8, a 58.4% reduction. Correspondingly, the revisit time was reduced by 59.2%, from 10.3 minutes to 4.2 minutes. The total number of visits including redundant ones decreased from 128.3 to 73.2, and the total task completion time was reduced from 48.8 minutes to 39.2 minutes.

These reductions when performing the task with EasyRevisit clearly demonstrate the effectiveness of our tool support. We also noticed that the total task completion time was reduced by 9.6 minutes while the revisit time was reduced by only 6.1 minutes. We hypothesize that the reduced revisit behavior further improves the efficiency of other kinds of behavior, since participants can concentrate better with less distracting information and behavior. This hypothesis needs further study in the future.

Table 5.6: Statistical results for ImageJ.

                              Average # with (without)   Average time (min) with (without)
                              EasyRevisit                EasyRevisit

Total visits (non-redundant)  32.4 (29.8)                39.2 (48.4)
Total visits (redundant)      73.2 (128.3)
Revisits (non-redundant)      7.5 (6.8)                  4.2 (10.3)
Revisits (redundant)          13.8 (33.2)
Revisit rate (non-redundant)  23.1% (22.8%)              10.7% (21.1%)
Revisit rate (redundant)      18.9% (25.9%)


Table 5.7: Statistical results for StochKit.

                              Average # with (without)   Average time (min) with (without)
                              EasyRevisit                EasyRevisit

Total visits (non-redundant)  17.3 (18.5)                54.8 (59.6)
Total visits (redundant)      66.4 (86.7)
Revisits (non-redundant)      5.8 (5.5)                  7.2 (10.3)
Revisits (redundant)          16.8 (28.2)
Revisit rate (non-redundant)  23.1% (29.7%)              13.1% (17.3%)
Revisit rate (redundant)      25.3% (32.5%)

For the StochKit task, however, the effect is not as significant as for the ImageJ task. The average revisit time was reduced from 10.3 minutes to 7.2 minutes, and the average task completion time was reduced by only 8%, from 59.6 minutes to 54.8 minutes. This is probably because the complexity of the StochKit task lies more in comprehending the original source code, and the function participants need to write is relatively simpler than in the ImageJ task. Participants need more external information to solve the ImageJ task, whereas for the StochKit task they mainly need to focus on the source code. We categorize the two kinds of tasks as external-information-intensive and internal-information-intensive tasks, and we expect our tool to facilitate the former kind more effectively by reducing the time cost of revisit behavior.

From the feedback regarding our tool collected from the participants, 17 of them felt that they had a better experience and were more efficient using the tool to revisit key information items. Moreover, five of them stated that they could be more focused on the task when using the tool, since they felt less distracted by information located in different places. One professional participant said he was still more accustomed to his own way of finding information, because it gave him more freedom to use various approaches to find or store information; nevertheless, the result data show that he spent relatively less time on the task with tool support. Two participants were not sure whether the tool could help them finish the task faster, because using the tool incurs additional time overhead.

In summary, judging from the reduced revisit time and the participants' experience, our tool support is effective in improving efficiency, especially when the task requires referring to a large amount of external information besides the source code to be changed.

5.2.4.2 Integration of information foraging model

Although the reduced revisit time and reduced task completion time show that our tool support is effective, we still do not know where and how the time is saved. To answer this question, we use information foraging theory to model the participant's process of navigating and foraging for information during the software reuse task. The benefit of using the information foraging model is that it helps us quantitatively analyze the sub-processes of the foraging process.

To apply the information foraging model, the concept of a patch needs to be defined in our situation. In the original theory [52], the task environment of an information forager often has a "patchy" structure. Information relevant to a person's needs may reside in piles of documents, file drawers, office bookshelves, libraries, or various online collections. Information patches can be relatively static online collections, such as Web sites, or temporary collections constructed by a Web search engine in response to user queries. The information forager often needs to navigate from one information patch to another and is faced with a decision: how should time be allocated between between-patch foraging and within-patch foraging [52]?

In our case, we define a patch at the level of a window, a tab, or a document, such as each source code file, each Web page opened, and each document used (e.g., the task description). Whenever a participant switches from one window to another looking for information, we say the participant navigates from one information patch to another. For each visit to a patch, the time spent finding the patch is the between-patch time, and the time spent understanding and using the information in the patch is the within-patch time. The same holds for revisits: each revisit has a between-patch time and a within-patch time. The distinction is that a visit finds a new patch, while a revisit navigates to a previously visited patch.
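To make these definitions concrete, a minimal sketch (our own illustration, not our actual analysis scripts) of splitting a timestamped window-switch log into between-patch and within-patch time might look like this; the field names and the example session are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PatchVisit:
    patch: str        # entity id, e.g. a window, tab, or document
    t_arrive: float   # seconds: navigation done, patch in focus
    t_leave: float    # seconds: switched away from the patch

def patch_times(visits):
    """For each visit, compute the between-patch time (navigation gap
    since leaving the previous patch) and the within-patch time
    (duration the patch was in focus). Returns a list of
    (patch, between_s, within_s, is_revisit) tuples."""
    seen, rows = set(), []
    prev_leave = None
    for v in visits:
        between = 0.0 if prev_leave is None else v.t_arrive - prev_leave
        within = v.t_leave - v.t_arrive
        rows.append((v.patch, between, within, v.patch in seen))
        seen.add(v.patch)
        prev_leave = v.t_leave
    return rows

session = [PatchVisit("code.c", 0, 60),
           PatchVisit("docs.html", 65, 95),
           PatchVisit("docs.html", 100, 110)]
for row in patch_times(session):
    print(row)
```

In the example, the second visit to `docs.html` is a revisit with 5 seconds of between-patch (re-finding) time and 10 seconds of within-patch (re-handling) time.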

Having the between-patch time and within-patch time defined, Charnov’s marginal value theorem in Figure 2.2 described the quantitative relations between the two variables.

From Figure 2.2-(a), if we know the between-patch time tB and the within-patch gain function g(tW), we can predict the optimal within-patch time t* that should be spent foraging the patch. The optimum t* occurs when the slope of the within-patch gain g equals the average rate of gain, which is the slope of the tangent line R. In Figure 2.2-(b), if we reduce the between-patch time from tB1 to tB2, the optimal within-patch time should correspondingly be reduced from t1* to t2*. In our case, our tool support reduced the between-patch time for the revisited patches: it fetched chunks of information and placed them around the source code to be changed, so a patch could be revisited in one fixed place instead of being re-found from different locations. Accordingly, we analyze only the revisited patches in the ImageJ task to see whether the between-patch time and within-patch time follow the theoretical prediction. We also check whether the time saved on revisited patches equals the total reduction in task completion time; if they are equal, the entire reduction in task completion time comes from the reduced revisit behavior.
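The theorem's prediction can be evaluated numerically: the optimal stay time t* is the one maximizing the average rate of gain R(t) = g(t) / (tB + t). The sketch below uses a hypothetical diminishing-returns gain function for illustration, not one fitted to our data:

```python
import math

def optimal_stay_time(t_between, gain, t_max=10.0, step=1e-4):
    """Charnov's marginal value theorem, solved numerically: the optimal
    within-patch time t* maximizes the average rate of gain
    R(t) = gain(t) / (t_between + t)."""
    best_t, best_rate = step, float("-inf")
    for i in range(1, int(t_max / step) + 1):
        t = i * step
        rate = gain(t) / (t_between + t)
        if rate > best_rate:
            best_t, best_rate = t, rate
    return best_t

# Hypothetical diminishing-returns gain curve (illustrative only).
g = lambda t: 10.0 * (1.0 - math.exp(-0.8 * t))

t1 = optimal_stay_time(t_between=1.61, gain=g)  # longer re-finding time
t2 = optimal_stay_time(t_between=1.18, gain=g)  # shorter re-finding time
print(round(t1, 2), round(t2, 2))
```

As the theorem predicts, the optimal stay time shrinks when the between-patch time shrinks (t2 < t1 here).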

Since we cannot have the same participant perform the same task both with and without tool support, we use the averaged data across all participants for the analysis.

Specifically, for the same patch revisited by the participants, we group the data into two groups: with and without tool support. We first sum the total between-patch time for revisits for each participant, and then average these totals across participants; the within-patch time is calculated in the same way. Based on each chunk of information's value to the overall task completion, the designer of the task estimated the gain obtained on each revisit and drew the gain function curve. Finally, we summarized the data and drew curves similar to Figure 5.7, which shows one revisited patch as an example. Each point represents one revisit to the patch. For each revisit, we recorded the patch re-finding time and the patch re-handling time, and the task designer estimated how much this revisit contributed to the overall task completion. After collecting the information for each point, we plotted these points along the positive side of the x-axis and fitted a regression curve to them. The total patch re-finding time, the sum of the individual re-finding times, was assigned to the negative side of the x-axis. Finally, the tangent line could be drawn and the predicted within-patch time could be estimated. This kind of gain rate curve was also discussed in our previous study [5], which gives a more detailed description of how such curves are drawn.

Figure 5.7. An example showing what the within-patch time should be when the between-patch time is reduced.


By analyzing all such figures and data, we have two findings. First, the total time saved on revisit behavior is 9.4 minutes, which constitutes 79.7% of the total time saved. Considering that the remaining difference can be caused by other random factors, we conclude that, overall, the reduced task completion time comes from the time saved on revisit behavior with our tool support. Second, as shown in Figure 5.7, when the between-patch time is 1.61 minutes, the predicted within-patch time is 2.73 minutes, which is close to the actual within-patch time of 2.87 minutes used by the participants; the theoretical prediction is accurate. When the between-patch time is reduced from 1.61 minutes to 1.18 minutes, the predicted within-patch time should be reduced from 2.73 minutes to 1.95 minutes. However, the actual within-patch time in our experiment was reduced to 1.54 minutes, significantly less than the predicted 1.95 minutes.

This phenomenon was also observed for most of the other revisited patches. The extra reduction in within-patch time is possibly because the reduced revisit effort further reduces the time needed to refocus on the main task and the main source code file. Our initial estimates showed that these two events were positively correlated: generally, the more time a participant spent on revisit behavior, the more time he would spend gathering his thoughts and refocusing on the task. We therefore hypothesize that reduced between-patch time also reduces within-patch time. In sum, half of the reduction in within-patch time follows the principle of information foraging theory, which implies that later visits to a patch yield redundant or less valuable information; the other half comes from the reduced refocus time that accompanies faster re-finding of a patch. This hypothesis still needs more detailed study in the future.


5.2.5 Threats to validity

One external validity threat to our study is the representativeness of our participants. Our experiment involved a limited number of participants. While we tried to be inclusive, the participants were recruited from a local community and were primarily affiliated with a university's medical campus; these bioinformatics researchers may not represent other kinds of end-user developers. When quantitatively analyzing the data with the information foraging model, subjective estimation is involved, which may influence the accuracy of the results. However, since we averaged data from multiple datasets, this threat is largely reduced.

5.2.6 Summary

In this part of the work, we first performed a preliminary experiment to study the characteristics of end-user developers' revisit behavior during software reuse tasks. Based on the experiment data, we proposed a tool support that can reduce the time spent on revisit behavior and thus improve efficiency. With the tool support as the treatment, we performed an experiment to validate our hypothesis. We found that participants reduced task completion time by 19.7% on one of the reuse tasks when using our tool support, indicating that the tool support is effective. By applying information foraging theory to divide the revisit time into between-patch time and within-patch time, we found that when the between-patch time was reduced in revisitation, the within-patch time was also reduced as the theory indicates, but by a larger amount. We hypothesized that half of the reduction was as predicted by the theory, and the other half was due to the reduced refocus time caused by the reduced revisit time. We also found that the reduced revisit time constituted 79.7% of the total time saved, which shows that, overall, our tool reduced the task completion time by reducing the cost of revisit behavior.


Chapter 6

Conclusions

In this thesis, we studied end-user developers from a special angle: we not only used software engineering principles to study end-user behavior, such as pragmatic reuse practices, but also used information foraging theory to guide our analysis and to model the information-intensive process of end-user programming. The implications can be discussed from several aspects, including pragmatic software reuse, social network information (SNI), end-user developers, and SNI designers.

We first studied end-user developers' information needs in an experimental setting of pragmatic software reuse, and we summarized the information needs according to architectural considerations. This part of the study has several direct implications for tool building. First of all, architecture decisions should be central to the identification and evaluation of reuse candidates. Moreover, when the code is reused and integrated into the target system [13], architectural conformance should be checked and violations should be managed. Last, but certainly not least, social network information should be seamlessly incorporated into the entire process of pragmatic software reuse, ranging from understanding the reuse infrastructure to implementing a successful reuse solution. It is hoped that our work illuminates a systematic way to tackle architectural mismatch [24] in biomedical software reuse.

We then drew inspiration from information foraging theory, which suggests that hint diversity impacts productivity in the form of a log-normal curve, and confronted the theoretical model with empirical data from our experiments. We identified contributor role and information needs derived from the architectural concern for SNI as two key factors that could potentially improve end-user developers' productivity. In addition, our strategies and metrics for labeling SNI hints can provide guidelines for developing tool support that automatically categorizes SNIs. This part of the work has implications for both end-user developers and SNI web designers.

End-user developers may not have realized the importance of knowing who wrote a piece of content, since they care more about the content itself. However, our results identified contributor role as an important factor for improving productivity, indicating that the chance of finding useful information from a user with high reputation or more expertise is statistically higher than from general users. If end-user developers start to think about this point, they can possibly improve their efficiency by considering where information comes from, especially when the webpage provides a clear indication of its source.


SNI web designers, in turn, can design algorithms or metrics to assess users' professional level regarding a topic. The evaluation could draw on multiple perspectives, such as working background and social status; for each user, different scores can be assigned to different areas by calculating the closeness between the topic and the user's specialized area. The importance of contributor role is also discussed in previous literature: studies have shown that users with high social status or high reputation tend to be more active members of a forum, sit at the core of the community, and contribute content of high quality [119], [120]. The user's level can then be shown on the webpage to help web information foragers quickly locate information from expert users.
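As one minimal illustration of such a closeness metric (the tag sets below are hypothetical, not drawn from any real platform), the overlap between a topic's tags and a user's specialty areas could be scored with a Jaccard index:

```python
def expertise_score(topic_tags, user_areas):
    """Jaccard closeness between a topic's tags and a user's specialty areas.
    Returns a value in [0, 1]; higher means the user is closer to the topic."""
    topic, areas = set(topic_tags), set(user_areas)
    union = topic | areas
    return len(topic & areas) / len(union) if union else 0.0

# Hypothetical user profile and topic tags.
user = {"java", "imagej", "image-processing"}
print(expertise_score({"imagej", "plugin", "java"}, user))
```

A production metric would likely weight areas by activity or reputation rather than treating all tags equally; this sketch only shows the closeness idea.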

To determine what information needs a webpage can satisfy for end-user developers, strategies such as word frequency analysis could be used to summarize the main topic of a webpage.
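A minimal sketch of such an analysis (the stopword list and example page text are illustrative assumptions) counts the most frequent content words:

```python
import re
from collections import Counter

# A small illustrative stopword list; a real system would use a larger one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "for", "on", "with", "this", "that", "be", "as", "are"}

def main_topics(page_text, k=5):
    """Summarize a webpage's main topic by its k most frequent content words."""
    words = re.findall(r"[a-z]+", page_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [w for w, _ in counts.most_common(k)]

page = ("ImageJ plugin tutorial: how to write an ImageJ plugin in Java. "
        "The plugin implements the PlugIn interface in Java.")
print(main_topics(page, k=3))
```

On this toy page the dominant words are "plugin", "imagej", and "java", which is a reasonable one-glance topic summary.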

However, the main limitation of this part of the work is that we only analyzed the SNI webpages at the category level, while the actual contents of the webpages can have a significant impact on how foragers use the SNI. For example, the most voted answer on Stack Overflow is placed at the top, which can largely reduce the time developers need to find a good solution; this indicates that the structure and ordering of contents on a webpage matter. Other important characteristics of an SNI webpage include the size of its contents, whether it contains a source code example and whether that code is directly runnable, its understandability, whether it contains illustrative figures, etc. All of these factors demonstrate the complexity and difficulty of analyzing SNI webpages comprehensively.

Our later study performed a more in-depth analysis, and we developed four types of foraging curve styles to model end-user developers' information seeking and foraging process. In this study, we were motivated by the rationale that foraging cost is an important supplement to relevancy when selecting webpages. However, this observation is not limited to webpages: other kinds of information patches, such as source code files in an IDE, tutorial documents, books, etc., may also require cost estimation. When selecting an information patch from a set of patches, the relevancy of the patch is certainly important, but the cost is another factor that should not be neglected. There are multiple ways to model and estimate the cost of foraging an information patch; our approach of foraging curves and information amount does not solve the problem completely, but is one attempt to respond to the call for action in [93]. We expect our study to inspire other approaches to predicting the cost of foraging an information patch. We also found that end-user developers prefer to access webpages belonging to the Ranked Foraging category.

From this viewpoint, we can jointly consider the information needs [5], [77] and the information accumulation to provide better suggestions to developers.

From the search engine perspective, our study suggests an idea for an advanced search engine: (1) the search engine should not treat accuracy as the only ranking factor, but should consider other factors such as cost, authors, ratings, and recency [92]; and (2) the user should have the power to select which ranking strategy to use according to their specific foraging goal. We speculate that different foraging goals alter the forager's strategies and behavior: a forager wanting the best answer probably values ratings; a forager wanting only a working solution probably values cost; a forager wanting an authoritative answer may value who wrote the content.
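A user-selectable ranking strategy could be sketched as a weighting over these factors. The field names, scores, and weights below are illustrative assumptions, not taken from any real search engine:

```python
# Hypothetical result metadata; scales and fields are illustrative only.
results = [
    {"url": "so/answer1", "relevance": 0.9, "rating": 4.8, "cost": 3.0, "authority": 0.4},
    {"url": "blog/post",  "relevance": 0.8, "rating": 3.5, "cost": 1.0, "authority": 0.2},
    {"url": "docs/api",   "relevance": 0.7, "rating": 4.0, "cost": 2.0, "authority": 0.9},
]

# Each strategy is a weighting over factors; cost is penalized, not rewarded.
STRATEGIES = {
    "best_answer":   {"relevance": 0.4, "rating": 0.6},
    "quick_fix":     {"relevance": 0.5, "cost": -0.5},
    "authoritative": {"relevance": 0.3, "authority": 0.7},
}

def rank(results, strategy):
    """Order results by the weighted score of the chosen strategy."""
    weights = STRATEGIES[strategy]
    score = lambda r: sum(w * r.get(f, 0.0) for f, w in weights.items())
    return sorted(results, key=score, reverse=True)

for s in STRATEGIES:
    print(s, [r["url"] for r in rank(results, s)])
```

Note how the same three results order differently per goal: the cost-penalizing "quick_fix" strategy promotes the cheap result, while "authoritative" promotes the high-authority one.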

For this study, the key limitation is that we only studied foraging styles by analyzing the webpage types themselves. However, foraging style is also affected by the foraging goals that end-user developers are pursuing. For example, foraging for the name of a function in an unfamiliar language is very different from foraging for how to write code that works with a plug-in interface. In the first case, the programmer likely just wants a page that delivers the answer quickly (like the answer-seeking foraging in our study); in the second case, there are likely a number of questions that foragers would ask during such a task [121], [122]. A programmer can visit API documentation and find answers for either goal, but two different kinds of foraging curves could apply. Therefore, we expect that taking the foraging goal into consideration will be the key to improving this part of our work.

In terms of tool design, our tool currently has the limitation that it only returns two hints, information accumulation and information amount, to support end-user developers' selection and navigation process. Two potential improvements can further facilitate the webpage seeking process. First, we can add a third hint that predicts the optimal time to stay on a webpage by applying Charnov's marginal value theorem, introduced in the related work section. This hint would predict the optimal stay time considering both information accumulation and information amount; knowing it can guide the forager's decision of when to leave a webpage. Second, if the current two hints can also contribute to the ranking of these webpages, they may further help developers find an optimal webpage link in a shorter time.

Lastly, we observed end-user developers' short-term revisit behavior during programming, and we believe this topic demands more studies that provide support to end-user developers. We expect such support to reduce some kinds of redundant behavior, thereby improving end-user developers' productivity. In addition, we expect the support to have not only an engineering effect, such as improved efficiency, but also a psychological effect: it can help developers save energy and stay in a better flow when programming, because they would be less likely to be overwhelmed by a large amount of information.

By applying information foraging theory, we found that if the time to re-find a patch decreases, the corresponding time needed to handle the patch also decreases, because a shorter revisit helps end-user programmers refocus more easily on their information needs for the task. A similar observation is captured by the memory-for-goals model of Altmann and Trafton [96], who found that the time to resume task goals after an interruption varied with the duration and cognitive demand of the interruption. The study by Monk et al. [117] then showed that longer and more demanding interruptions led to longer resumption times in a hierarchical, interactive task. With such knowledge, we can study revisits from a novel perspective and provide corresponding support to end-user developers.

Is a task completion time decrease of 19.7% significant enough to provide value to developers? For a one-hour task, the reduction may not sound significant. However, considering that real-world tasks for these bioinformatics researchers often take hours to days, the time reduction is valuable for saving their time and thus improving productivity.

Moreover, two potential improvements can further enhance the tool's ability to facilitate end-user developers. First, our tool EasyRevisit can only statically display the small chunks of fetched information. If the fetched information could be displayed dynamically, with the information windows automatically readjusting their positions according to the user's current focus, the display would better satisfy the user's requirements and efficiency could be further improved. Second, the fetched information is read-only, in a picture format, so the user can only read it and type by hand while programming. If we could automatically transform the information in the picture into editable text, the user would have more flexibility, such as copying and pasting parts of the information, which reduces the cost of typing by hand.

Our future work lies in several directions. First, when studying the relation between SNI diversity and productivity, we only analyzed SNI at the meta-data level, while the actual contents of the webpages have a significant impact on how foragers use the SNI; therefore, we would like to further analyze the chunks of contents on the SNI webpages. Second, the cost estimation analysis lacks consideration of the foraging goal, so we plan to take the foraging goal into account to perform a more comprehensive analysis. For our tool CostEstimator, we plan to investigate three options to strengthen it and further validate our hypothesis: (1) seek a more accurate and efficient way of abstracting a webpage's information accumulation; (2) integrate the tool with Charnov's marginal value theorem so as to guide the forager on when to leave a webpage; and (3) empower the tool so that it can also contribute to the ranking of the searched webpage links. Finally, for the study of short-term revisits, our future work includes optimizing the tool EasyRevisit to be easier to use and more flexible by making the displayed information editable and dynamic. We also plan to invite more participants to install and use our tool support on their computers and to perform a long-term study collecting feedback about the tool, which could generate more research questions and directions.


References

[1] M. M. Burnett and B. A. Myers, "Future of end-user software engineering: Beyond the silos," in Proceedings of the Future of Software Engineering (FOSE), pp. 201-211. 2014, May. ACM.

[2] J. Brandt, P. J. Guo, J. Lewenstein and S. R. Klemmer, "Opportunistic programming: How rapid ideation and prototyping occur in practice," in Proceedings of the 4th International Workshop on End-User Software Engineering, pp. 1-5. 2008, May. ACM.

[3] A. J. Ko, R. Abraham, A. Beckwith, M. Burnett, M. Erwig, C. Scaffidi, J. Lawrance, H. Lieberman, B. Myers and M. B. Rosson, "The state of the art in end-user software engineering," ACM Computing Surveys (CSUR), vol. 43, (3), pp. 21, 2011.

[4] P. Pirolli, "An elementary social information foraging model," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 605-614. 2009, April. ACM.

[5] X. Jin, N. Niu and M. Wagner, "On the impact of social network information diversity on end-user programming productivity: a foraging-theoretic study," in Proceedings of the 8th International Workshop on Social Software Engineering, pp. 15-21. 2016, November. ACM.

[6] E. H. Chi, A. Rosien, G. Supattanasiri, A. Williams, C. Royer, C. Chow, E. Robles, B. Dalal, J. Chen and S. Cousins, "The bloodhound project: Automating discovery of web usability issues using the InfoScentπ simulator," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 505-512. 2003, April. ACM.

[7] J. M. Spool, C. Perfetti and D. Brittan, Designing for the Scent of Information. User Interface Engineering, 2004.

[8] J. Lawrance, R. Bellamy and M. Burnett, "Scents in programs: Does information foraging theory apply to program maintenance?" in Proceedings of IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 15-22. 2007, September. IEEE.

[9] J. Lawrance, R. Bellamy, M. Burnett and K. Rector, "Using information scent to model the dynamic foraging behavior of programmers in maintenance tasks," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1323-1332. 2008, April. ACM.

[10] J. Lawrance, C. Bogart, M. Burnett, R. Bellamy, K. Rector and S. D. Fleming, "How programmers debug, revisited: An information foraging theory perspective," IEEE Transactions on Software Engineering, vol. 39, (2), pp. 197-215, 2013.

[11] J. Lawrance, M. Burnett, R. Bellamy, C. Bogart and C. Swart, "Reactive information foraging for evolving goals," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 25-34. 2010, April. ACM.

[12] Software Discovery Meeting Report. Last accessed: October 2017. Available: https://nciphub.org/resources/885/supportingdocs.

[13] J. Maras, M. Štula and I. Crnković, "Towards specifying pragmatic software reuse," in Proceedings of the 2015 European Conference on Software Architecture Workshops, pp. 54. 2015, September. ACM.

[14] N. Niu, J. Savolainen, Z. Niu, M. Jin and J. R. C. Cheng, "A systems approach to product line requirements reuse," IEEE Systems Journal, vol. 8, (3), pp. 827-836, 2014.

[15] R. Holmes and R. J. Walker, "Systematizing pragmatic software reuse," ACM Transactions on Software Engineering and Methodology, vol. 21, (4), pp. 20, 2012.

[16] G. Kakarontzas, E. Constantinou, A. Ampatzoglou and I. Stamelos, "Layer assessment of object-oriented software: A metric facilitating white-box reuse," Journal of Systems and Software, vol. 86, (2), pp. 349-366, 2013.

[17] E. Constantinou, A. Naskos, G. Kakarontzas and I. Stamelos, "Extracting reusable components: A semi-automated approach for complex structures," Information Processing Letters, vol. 115, (3), pp. 414-417, 2015.

[18] H. Happel, T. Schuster and P. Szulman, "Leveraging source code search for reuse," in High Confidence Software Reuse in Large Systems, pp. 360-371, 2008.

[19] O. Hummel and C. Atkinson, "Using the web as a reuse repository," in International Conference on Software Reuse, pp. 298-311. 2006, June. Springer.

[20] O. A. L. Lemos, S. Bajracharya, J. Ossher, P. C. Masiero and C. Lopes, "A test-driven approach to code search and its application to the reuse of auxiliary functionality," Information and Software Technology, vol. 53, (4), pp. 294-306, 2011.

[21] N. Niu, X. Jin, Z. Niu, J. R. C. Cheng, L. Li and M. Y. Kataev, "A clustering-based approach to enriching code foraging environment," IEEE Transactions on Cybernetics, vol. 46, (9), pp. 1962-1973, 2016.

[22] N. Niu, A. Mahmoud and G. Bradshaw, "Information foraging as a foundation for code navigation (NIER track)," in Proceedings of the 33rd International Conference on Software Engineering, pp. 816-819. 2011, May. ACM.

[23] D. Garlan, R. Allen and J. Ockerbloom, "Architectural mismatch: Why reuse is so hard," IEEE Software, vol. 12, (6), pp. 17-26, 1995.

[24] D. Garlan, R. Allen and J. Ockerbloom, "Architectural mismatch: Why reuse is still so hard," IEEE Software, vol. 26, (4), 2009.

[25] A. Begel, Y. P. Khoo and T. Zimmermann, "Codebook: Discovering and exploiting relationships in software repositories," in Proceedings of 32nd International Conference on Software Engineering, pp. 125-134. 2010, May. IEEE.

[26] T. Bhowmik, N. Niu, W. Wang, J. R. C. Cheng and X. Cao, "Optimal group size for software change tasks: a social information foraging perspective," IEEE Transactions on Cybernetics, vol. 46, (8), pp. 1784-1795, 2016.

[27] C. Bird, D. Pattison, R. D'Souza, V. Filkov and P. Devanbu, "Latent social structure in open source projects," in Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 24-35. 2008, November. ACM.

[28] G. Engels, "Model-driven development for end-users, too!?" in Dagstuhl Seminar Proceedings, 2007. Schloss Dagstuhl-Leibniz-Zentrum für Informatik.

[29] K. T. Stolee, S. Elbaum and A. Sarma, "Discovering how end-user programmers and their communities use public repositories: A study on yahoo! pipes," Information and Software Technology, vol. 55, (7), pp. 1289-1303, 2013.

[30] M. A. Storey, C. Treude, A. van Deursen and L. T. Cheng, "The impact of social media on software engineering practices and tools," in Proceedings of the FSE/SDP Workshop on Future of Software Engineering Research, pp. 359-364. 2010, November. ACM.

[31] A. Begel, J. Bosch and M. Storey, "Bridging software communities through social networking," IEEE Software, vol. 30, (1), pp. 26-28, 2013.

[32] L. Dabbish, C. Stuart, J. Tsay and J. Herbsleb, "Leveraging transparency," IEEE Software, vol. 30, (1), pp. 37-43, 2013.

[33] Y. Zou, C. Liu, Y. Jin and B. Xie, "Assessing software quality through web comment search and analysis," in International Conference on Software Reuse, pp. 208-223. 2013, June. Springer.

[34] B. A. Nardi, A Small Matter of Programming: Perspectives on End User Computing. MIT Press. 1993.

[35] A. J. Ko and B. A. Myers, "Development and evaluation of a model of programming errors," in Proceedings of 2003 IEEE Symposium on Human Centric Computing Languages and Environments, pp. 7-14. 2003, October. IEEE.

[36] A. J. Ko, B. A. Myers and H. H. Aung, "Six learning barriers in end-user programming systems," in Proceedings of 2004 IEEE Symposium on Visual Languages and Human Centric Computing, pp. 199-206. 2004, September. IEEE.

[37] A. J. Ko and B. A. Myers, "Designing the Whyline, a debugging interface for asking why and why not questions about runtime failures," in Proceedings of 2004 Human Factors in Computing Systems, pp. 151-158. 2004, April. ACM.

[38] C. W. Krueger, "Software reuse," ACM Computing Surveys (CSUR), vol. 24, (2), pp. 131-183, 1992.

[39] J. Brandt, P. J. Guo, J. Lewenstein, M. Dontcheva and S. R. Klemmer, "Writing code to prototype, ideate, and discover," IEEE Software, vol. 26, (5), 2009.

[40] N. Niu, F. Yang, J. R. C. Cheng and S. Reddivari, "Conflict resolution support for parallel software development," IET Software, vol. 7, (1), pp. 1-11, 2013.

[41] J. Savolainen, N. Niu, T. Mikkonen and T. Fogdal, "Long-term product line sustainability with planned staged investments," IEEE Software, vol. 30, (6), pp. 63-69, 2013.

[42] N. Niu and S. Easterbrook, "Exploiting COTS-based RE methods: An experience report," in International Conference on Software Reuse, pp. 212-216. 2008, May. Springer.

[43] H. Schütze, "Introduction to information retrieval," in Proceedings of the International Communication of Association for Computing Machinery Conference, 2008, June.

[44] R. DeLine, A. Khella, M. Czerwinski and G. Robertson, "Towards understanding programs through wear-based filtering," in Proceedings of the 2005 ACM Symposium on Software Visualization, pp. 183-192. 2005, May. ACM.

[45] P. Pirolli, Information Foraging Theory: Adaptive Interaction with Information. Oxford University Press. 2007.

[46] D. Piorkowski, S. Fleming, C. Scaffidi, C. Bogart, M. Burnett, B. John, R. Bellamy and C. Swart, "Reactive information foraging: An empirical investigation of theory-based recommender systems for programmers," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1471-1480. 2012, May. ACM.

[47] D. Piorkowski, S. D. Fleming, C. Scaffidi, L. John, C. Bogart, B. E. John, M. Burnett and R. Bellamy, "Modeling programmer navigation: A head-to-head empirical evaluation of predictive models," in Proceedings of 2011 IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 109-116. 2011, September. IEEE.

[48] S. D. Fleming, C. Scaffidi, D. Piorkowski, M. Burnett, R. Bellamy, J. Lawrance and I. Kwan, "An information foraging theory perspective on tools for debugging, refactoring, and reuse tasks," ACM Transactions on Software Engineering and Methodology, vol. 22, (2), pp. 14, 2013.

[49] B. A. Huberman, "The performance of cooperative processes," Physica D, vol. 42, (1-3), pp. 38-47, 1990.

[50] L. Giraldeau and T. Caraco, Social Foraging Theory. Princeton University Press. 2000.

[51] C. Treude and M. Storey, "Work item tagging: Communicating concerns in collaborative software development," IEEE Transactions on Software Engineering, vol. 38, (1), pp. 19-34, 2012.

[52] P. Pirolli and S. Card, "Information foraging," Psychological Review, vol. 106, (4), pp. 643, 1999.

[53] S. P. Carmien and G. Fischer, "Design, adoption, and assessment of a socio-technical environment supporting independence for persons with cognitive disabilities," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 597-606. 2008, April. ACM.

[54] S. Planning, "The economic impacts of inadequate infrastructure for software testing," National Institute of Standards and Technology, 2002.

[55] A. J. Ko, R. DeLine and G. Venolia, "Information needs in collocated software development teams," in Proceedings of 29th International Conference on Software Engineering, pp. 344-353. 2007, May. IEEE.

[56] J. Segal, "Some problems of professional end user developers," in Proceedings of IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 111-118. 2007, September. IEEE.

[57] R. Abraham, M. Erwig and S. Andrew, "A type system based on end-user vocabulary," in Proceedings of IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 215-222. 2007, September. IEEE.

[58] M. W. Newman, J. Lin, J. I. Hong and J. A. Landay, "DENIM: An informal web site design tool inspired by observations of practice," Human-Computer Interaction, vol. 18, (3), pp. 259-324, 2003.

[59] T. R. Green, A. E. Blandford, L. Church, C. R. Roast and S. Clarke, "Cognitive dimensions: Achievements, new directions, and open questions," Journal of Visual Languages & Computing, vol. 17, (4), pp. 328-365, 2006.

[60] S. Bellon, R. Koschke, G. Antoniol, J. Krinke and E. Merlo, "Comparison and evaluation of clone detection tools," IEEE Transactions on Software Engineering, vol. 33, (9), 2007.

[61] Y. Ye and G. Fischer, "Reuse-conducive development environments," Automated Software Engineering, vol. 12, (2), pp. 199-235, 2005.

[62] A. F. Blackwell, "First steps in programming: A rationale for attention investment models," in Proceedings of IEEE 2002 Symposia on Human Centric Computing Languages and Environments, pp. 2-10. 2002, September. IEEE.

[63] R. A. Walpole and M. M. Burnett, "Supporting reuse of evolving visual code," in Proceedings of 1997 IEEE Symposium on Visual Languages, pp. 68-75. 1997, September. IEEE.

[64] J. Lawrence, S. Clarke, M. Burnett and G. Rothermel, "How well do professional developers test with visualizations? an empirical study," in Proceedings of 2005 IEEE Symposium on Visual

Languages and Human-Centric Computing, pp. 53-60. 2005, September. IEEE.

[65] R. R. Panko, "What we know about spreadsheet errors," Journal of Organizational and End User Computing, vol. 10, (2), pp. 15-21, 1998.

[66] R. R. Panko, "Spreadsheet errors: What we know. What we think we can do," arXiv Preprint arXiv:0802.3457, 2008.

[67] J. R. Ruthruff, M. Burnett and G. Rothermel, "An empirical study of fault localization for end-user programmers," in Proceedings of the 27th International Conference on Software Engineering, pp. 352-361. 2005, May. ACM.

[68] J. R. Ruthruff, S. Prabhakararao, J. Reichwein, C. Cook, E. Creswick and M. Burnett, "Interactive, visual fault localization support for end-user programmers," Journal of Visual Languages & Computing, vol. 16, (1), pp. 3-40, 2005.

[69] A. Phalgune, C. Kissinger, M. Burnett, C. Cook, L. Beckwith and J. R. Ruthruff, "Garbage in, garbage out? an empirical look at oracle mistakes by end-user programmers," in Proceedings of 2005 IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 45-52. 2005, September. IEEE.

[70] T. D. LaToza, G. Venolia and R. DeLine, "Maintaining mental models: A study of developer work habits," in Proceedings of the 28th International Conference on Software Engineering, pp. 492-501. 2006, May. ACM.


[71] A. J. Ko and B. A. Myers, "A framework and methodology for studying the causes of software errors in programming systems," Journal of Visual Languages & Computing, vol. 16, (1), pp. 41-84, 2005.

[72] M. P. Robillard, W. Coelho and G. C. Murphy, "How effective developers investigate source code: An exploratory study," IEEE Transactions on Software Engineering, vol. 30, (12), pp. 889-903, 2004.

[73] S. Wiedenbeck and A. Engebretson, "Comprehension strategies of end-user programmers in an event-driven application," in Proceedings of 2004 IEEE Symposium on Visual Languages and Human Centric Computing, pp. 207-214. 2004, September. IEEE.

[74] A. Ko and B. Myers, "Debugging reinvented," in Proceedings of ACM/IEEE 30th International Conference on Software Engineering, pp. 301-310. 2008, May. IEEE.

[75] D. K. Datta, J. P. Guthrie and P. M. Wright, "Human resource management and labor productivity: does industry matter?" Academy of Management Journal, vol. 48, (1), pp. 135-145, 2005.

[76] Y. W. Ramírez and D. A. Nembhard, "Measuring knowledge worker productivity: A taxonomy," Journal of Intellectual Capital, vol. 5, (4), pp. 602-628, 2004.

[77] X. Jin, C. Khatwani, N. Niu, M. Wagner and J. Savolainen, "Pragmatic software reuse in bioinformatics: How can social network information help?" in International Conference on Software Reuse, pp. 247-264. 2016, June. Springer.

[78] C. A. Schneider, W. S. Rasband and K. W. Eliceiri, "NIH Image to ImageJ: 25 years of image analysis," Nature Methods, vol. 9, (7), pp. 671-675, 2012.


[79] K. R. Sanft, S. Wu, M. Roh, J. Fu, R. K. Lim and L. R. Petzold, "StochKit2: software for discrete stochastic simulation of biochemical systems with events," Bioinformatics, vol. 27, (17), pp. 2457-2458, 2011.

[80] E. E. Sel'kov, "Self-Oscillations in Glycolysis," The FEBS Journal, vol. 4, (1), pp. 79-86, 1968.

[81] J. Sillito, G. C. Murphy and K. De Volder, "Asking and answering questions during a programming change task," IEEE Transactions on Software Engineering, vol. 34, (4), pp. 434-451, 2008.

[82] B. M. Evans and E. H. Chi, "Towards a model of understanding social search," in Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work, pp. 485-494. 2008, November. ACM.

[83] A. M. Kaplan and M. Haenlein, "Users of the world, unite! The challenges and opportunities of Social Media," Business Horizons, vol. 53, (1), pp. 59-68, 2010.

[84] D. Movshovitz-Attias, Y. Movshovitz-Attias, P. Steenkiste and C. Faloutsos, "Analysis of the reputation system and user contributions on a question answering website: Stackoverflow," in Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 886-893. 2013, August. ACM.

[85] K. Petersen, "Measuring and predicting software productivity: A systematic map and review," Information and Software Technology, vol. 53, (4), pp. 317-343, 2011.

[86] A. T. Ying and M. P. Robillard, "The influence of the task on programmer behaviour," in Proceedings of 2011 IEEE 19th International Conference on Program Comprehension, pp. 31-40. 2011, June. IEEE.

[87] E. Limpert, W. A. Stahel and M. Abbt, "Log-normal Distributions across the Sciences: Keys and Clues: On the charms of statistics, and how mechanical models resembling gambling machines offer a link to a handy way to characterize log-normal distributions, which can provide deeper insight into variability and probability—normal or log-normal: That is the question," AIBS Bulletin, vol. 51, (5), pp. 341-352, 2001.

[88] S. Yue, "The bivariate lognormal distribution for describing joint statistical properties of a multivariate storm event," Environmetrics, vol. 13, (8), pp. 811-819, 2002.

[89] X. Jin, N. Niu and M. Wagner, "Facilitating end-user developers by estimating time cost of foraging a webpage," in Proceedings of IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 31-35, 2017, October. IEEE.

[90] X. Jin and N. Niu, "Short-term revisit during programming tasks," in Proceedings of International Conference on Software Engineering Companion, pp. 322-324. 2017, May. IEEE Press.

[91] J. Brandt, P. J. Guo, J. Lewenstein, M. Dontcheva and S. R. Klemmer, "Two studies of opportunistic programming: Interleaving web foraging, learning, and writing code," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1589-1598. 2009, April. ACM.

[92] C. Martos, S. Y. Kim and S. K. Kuttal, "Reuse of variants in online repositories: Foraging for the fittest," in Proceedings of IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 124-128. 2016, September. IEEE.

[93] D. Piorkowski, A. Z. Henley, T. Nabi, S. D. Fleming, C. Scaffidi and M. Burnett, "Foraging and navigations, fundamentally: Developers' predictions of value and cost," in Proceedings of 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 97-108. 2016, November. ACM.


[94] P. Pirolli and S. Card, "Information foraging in information access environments," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 51-58. 1995, May. ACM Press/Addison-Wesley Publishing Co.

[95] D. W. Stephens and J. R. Krebs, Foraging Theory. Princeton University Press. 1986.

[96] E. M. Altmann and J. G. Trafton, "Memory for goals: An activation-based model," Cognitive Science, vol. 26, (1), pp. 39-83, 2002.

[97] R. Holmes, R. J. Walker and G. C. Murphy, "Approximate structural context matching: An approach to recommend relevant examples," IEEE Transactions on Software Engineering, vol. 32, (12), 2006.

[98] S. Minto and G. C. Murphy, "Recommending emergent teams," in Proceedings of Fourth International Workshop on Mining Software Repositories, pp. 5. 2007, May. IEEE.

[99] R. DeLine, M. Czerwinski and G. Robertson, "Easing program comprehension by sharing navigation data," in Proceedings of 2005 IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 241-248. 2005, September. IEEE.

[100] M. Toomim, A. Begel and S. L. Graham, "Managing duplicated code with linked editing," in Proceedings of 2004 IEEE Symposium on Visual Languages and Human Centric Computing, pp. 173-180. 2004, September. IEEE.

[101] T. Bell, "Extensive reading: Speed and comprehension," The Reading Matrix, vol. 1, (1), 2001.

[102] C. Scaffidi, M. Shaw and B. Myers, "An approach for categorizing end user programmers to guide software engineering research," in ACM SIGSOFT Software Engineering Notes, vol. 40, (4), pp. 1-5, 2005, May. ACM.


[103] N. Sawadsky, G. C. Murphy and R. Jiresal, "Reverb: Recommending code-related web pages," in Proceedings of the 2013 International Conference on Software Engineering, pp. 812-821. 2013, May. IEEE Press.

[104] P. Baldi, P. Frasconi and P. Smyth, Modeling the Internet and the Web: Probabilistic Methods and Algorithms. Wiley, 2003.

[105] A. Cockburn and B. McKenzie, "What do web users do? An empirical analysis of web use," International Journal of Human-Computer Studies, vol. 54, (6), pp. 903-922, 2001.

[106] E. Herder, "Characterizations of user Web revisit behavior," in Proceedings of the Workshop on Adaptivity and User Modeling in Interactive Systems, pp. 32-37, 2005, July. Springer.

[107] L. Tauscher and S. Greenberg, "How people revisit web pages: Empirical findings and implications for the design of history systems," International Journal of Human-Computer Studies, vol. 47, (1), pp. 97-137, 1997.

[108] E. Adar, J. Teevan and S. T. Dumais, "Large scale analysis of web revisitation patterns," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1197-1206. 2008, April. ACM.

[109] H. Obendorf, H. Weinreich, E. Herder and M. Mayer, "Web page revisitation revisited: Implications of a long-term click-stream study of browser usage," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 597-606. 2007, April. ACM.

[110] R. Sedgewick and K. Wayne, LinearRegression.java. Last accessed: October, 2017. Available: https://introcs.cs.princeton.edu/java/97data/LinearRegression.java.html.


[111] E. Adar, J. Teevan and S. T. Dumais, "Resonance on the web: Web dynamics and revisitation patterns," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1381-1390. 2009, April. ACM.

[112] R. Kumar and A. Tomkins, "A characterization of online browsing behavior," in Proceedings of the 19th International Conference on World Wide Web, pp. 561-570. 2010, April. ACM.

[113] B. McKenzie and A. Cockburn, "An empirical analysis of web page revisitation," in Proceedings of the 34th Annual Hawaii International Conference on System Sciences, pp. 9. 2001, January. IEEE.

[114] K. Nakasai, M. Tsunoda and H. Hata, "Web search behaviors for software development," in Proceedings of the 9th International Workshop on Cooperative and Human Aspects of Software Engineering, pp. 125-128. 2016, May. ACM.

[115] A. Bragdon, R. Zeleznik, S. P. Reiss, S. Karumuri, W. Cheung, "Code bubbles: A working set-based interface for code understanding and maintenance," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 2503-2512. 2010, April. ACM.

[116] L. Dabbish, G. Mark and V. M. González, "Why do I keep interrupting myself?: Environment, habit and self-interruption," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 3127-3130. 2011, May. ACM.

[117] C. A. Monk, J. G. Trafton and D. A. Boehm-Davis, "The effect of interruption duration and demand on resuming suspended goals." Journal of Experimental Psychology: Applied, vol. 14, (4), pp. 299, 2008.

[118] C. Nicora, FileBox eXtender. Last accessed: October, 2017. Available: http://www.hyperionics.com/index.asp.


[119] A. Bosu, C. S. Corley, D. Heaton, D. Chatterji, J. C. Carver and N. A. Kraft, "Building reputation in stackoverflow: An empirical investigation," in Proceedings of the 10th Working Conference on Mining Software Repositories, pp. 89-92, 2013, May. IEEE Press.

[120] K. Hart and A. Sarma, "Perceptions of answer quality in an online technical question and answer forum," in Proceedings of the 7th International Workshop on Cooperative and Human Aspects of Software Engineering, pp. 103-106. 2014, June. ACM.

[121] J. Sillito, G. C. Murphy and K. De Volder, "Questions programmers ask during software evolution tasks," in Proceedings of the 14th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 23-34. 2006, November. ACM.

[122] D. J. Piorkowski, S. D. Fleming, I. Kwan, M. M. Burnett, C. Scaffidi, R. K. Bellamy and J. Jordahl, "The whats and hows of programmers' foraging diets," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 3063-3072. 2013, April. ACM.


Appendix A

Detailed task description

A.1 ImageJ task

1. INPUT

Known Protein concentration – Protein.txt

Three Result files (Y-axis):

i. Result1.txt

ii. Result2.txt

iii. Result3.txt

2. PROCESS –

i. Treat Protein.txt as the X-axis

ii. Iterate through each Result file and plot it on the Y-axis

iii. Generate 3 plots, one for each Result file

iv. Perform linear regression for each curve and calculate the R2 value for each

3. OUTPUT–

i. Plot the 3 curves along with their R2 values

ii. Select the one with the highest R2.
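The regression step above can be sketched in plain Java, independent of ImageJ. The class and method names below are illustrative, not part of ImageJ's API; in a plugin, the x values would come from Protein.txt and the y values from a Result file.

```java
public class LinReg {
    // Ordinary least-squares fit of y = slope * x + intercept.
    // Returns {slope, intercept, R2}.
    public static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxx += x[i] * x[i];
            sxy += x[i] * y[i];
        }
        double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double intercept = (sy - slope * sx) / n;
        // R2 = 1 - (residual sum of squares) / (total sum of squares)
        double yBar = sy / n, ssTot = 0, ssRes = 0;
        for (int i = 0; i < n; i++) {
            double predicted = slope * x[i] + intercept;
            ssTot += (y[i] - yBar) * (y[i] - yBar);
            ssRes += (y[i] - predicted) * (y[i] - predicted);
        }
        return new double[]{slope, intercept, 1 - ssRes / ssTot};
    }
}
```

Running fit once per Result file gives the three R2 values; the OUTPUT step then selects the file with the highest R2.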

Environment – ImageJ

ImageJ is public domain open source software. An ImageJ user has the four essential freedoms defined by Richard Stallman in 1986:

1. The freedom to run the program, for any purpose.


2. The freedom to study how the program works, and change it to make it do what you wish.

3. The freedom to redistribute copies so you can help your neighbor.

4. The freedom to improve the program, and release your improvements to the public, so that the whole community benefits.

Plugins

ImageJ’s functionality can be expanded through the use of plugins written in Java.

Plugins can add support for new file formats or they can filter or analyze images. Plugins located in ImageJ’s “plugins” folder are automatically installed in the Plugins menu or they can be installed in other menus using Plugins/Hot Keys/Install Plugin. Plugins can be created or modified using Plugins/Edit.

HelloWorld.java

import ij.*;
import ij.gui.*;
import ij.plugin.PlugIn;
import java.awt.*;

public class HelloWorld implements PlugIn {
    public void run(String arg) {
        IJ.showMessage("My_Plugin", "Hello world!");
    }
}

Steps to create a plugin –

1. Create a plugin file with the .java extension in any text editor, such as Notepad++

2. Save it at an appropriate place

3. Go to Plugins – Compile and Run and select your .java file

4. The file will be compiled and the output will be displayed

REUSE SEED

1. You can access some of the examples by going to Plugins – Examples

2. You can also use one of these as an existing running example and edit it to perform your task. To do so, open the .java file of the example and save it with a different name.


REUSE SEED:

import ij.*;
import ij.gui.*;
import ij.plugin.PlugIn;
import java.awt.*;

public class Plot implements PlugIn {
    public void run(String arg) {
        if (IJ.versionLessThan("1.27t")) return;
        float[] x = {0.375f, 0.75f, 1.5f, 2.250f, 3.00f, 3.75f, 4.50f, 4.75f, 5.0f}; // x-coordinates
        float[] y = {1231.00f, 156.00f, 3678.00f, 2567.00f, 5678.00f, 4345.00f, 4563.00f, 7345.0f, 8236.00f}; // y-coordinates
        float[] e = {.8f, .6f, .5f, .4f, .3f, .5f, .6f, .7f, .8f}; // error bars
        PlotWindow plot = new PlotWindow("Example Plot", "x-axis", "y-axis", x, y);
        plot.setLimits(0.000, 5.000, 800, 8800);
        plot.addErrorBars(e);
        // add a second curve
        float x2[] = {.4f, .5f, .6f, .7f, .8f};
        float y2[] = {4, 3, 3, 4, 5};
        plot.setColor(Color.red);
        plot.addPoints(x2, y2, PlotWindow.X);
        plot.addPoints(x2, y2, PlotWindow.LINE);
        plot.setColor(Color.blue);
        plot.draw();
    }
}

3. Save it as a .java file and place it at an appropriate location.
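To adapt the reuse seed to this task, the hard-coded x and y arrays must instead come from Protein.txt and the Result files. A minimal sketch of that loading step is below; it assumes the files contain whitespace-separated numbers (e.g., one per line), and the class name is illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DataLoader {
    // Parses whitespace-separated numbers into a float array.
    public static float[] parse(String text) {
        String[] tokens = text.trim().split("\\s+");
        float[] result = new float[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            result[i] = Float.parseFloat(tokens[i]);
        }
        return result;
    }

    // Convenience wrapper to read a whole file, e.g., Protein.txt.
    public static float[] load(String path) throws IOException {
        return parse(new String(Files.readAllBytes(Paths.get(path))));
    }
}
```

The resulting arrays can be passed directly to the PlotWindow constructor in place of the literal x and y values.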

A.2 Stochkit task

Existing software

StochKit2 is software used to simulate biochemical reactions. Because the simulation is stochastic, there is some randomness in the output.

Biochemical reaction formula:

Blue + Red → Green


Demo

- Open the Visual Studio Command Prompt (Start -> All Programs -> 2010 Express -> Visual Studio Command Prompt (2010)).

- cd C:\Users\jinxu\Documents\StochKit2.0.10_WINDOWS

- Run the command:
  .\bin\ssa_direct_events -m models\events.xml -t 10 -r 1 -i 5 --keep-trajectories -f

- The input file is events.xml in \models.

- The output files are in \models\events_output\trajectories. You can check the time generated.

- Double click StochKitGUI.fig in \tools\MATLAB to run it in Matlab.

- Click "Plot Trajectory", choose the trajectory file in \models\events_output\trajectories, and run it to plot the figure.

Your task:

Change the Blue value under the following two conditions:

Condition #1 – change according to time:

Time step:            2     4     6     8     10
Change Blue value:   +20  None  None  +15  None


Condition #2 – if the Blue value <= 5, then add 50 * (|X1| + |X2|) to the Blue value. X1 and X2 are calculated by solving two linear equations in two variables:

(1) Blue * X1 – 2* Red *X2 +3 = 0

(2) 3 * Blue * X1 + Red * X2 -5 = 0

If conditions #1 and #2 happen together, condition #2 always takes precedence.

For example, at time 6 with Blue = 1 and Red = 50:

(1) 1*X1 - 2*50*X2 + 3 = 0, i.e., X1 - 100*X2 + 3 = 0

(2) 3*1*X1 + 50*X2 - 5 = 0, i.e., 3*X1 + 50*X2 - 5 = 0
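The example above can be checked with a small sketch that solves the two equations by Cramer's rule and applies condition #2. It is written in plain Java for illustration with illustrative class and method names; an actual solution would live in the StochKit C++ files listed in the hints, but the arithmetic is identical.

```java
public class BlueUpdate {
    // Solves a1*X1 + b1*X2 + c1 = 0 and a2*X1 + b2*X2 + c2 = 0
    // by Cramer's rule; returns {X1, X2}.
    public static double[] solve(double a1, double b1, double c1,
                                 double a2, double b2, double c2) {
        double det = a1 * b2 - a2 * b1;
        double x1 = (b1 * c2 - b2 * c1) / det;
        double x2 = (c1 * a2 - c2 * a1) / det;
        return new double[]{x1, x2};
    }

    // Condition #2: if Blue <= 5, add 50 * (|X1| + |X2|) to Blue.
    public static double update(double blue, double red) {
        if (blue > 5) return blue;
        // (1) Blue*X1 - 2*Red*X2 + 3 = 0
        // (2) 3*Blue*X1 + Red*X2 - 5 = 0
        double[] s = solve(blue, -2 * red, 3, 3 * blue, red, -5);
        return blue + 50 * (Math.abs(s[0]) + Math.abs(s[1]));
    }
}
```

For Blue = 1 and Red = 50 this yields X1 = 1 and X2 = 0.04, so the Blue value becomes 1 + 50 * 1.04 = 53.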

Output plot: the figure above shows the original curve; the figure below shows the curve after performing the task above.


Hints:

Demo the change of code: after adding

std::cout << "HelloWorld!\n";

to Input_events.ipp, you can check the output in \models\events_output\log.txt

One sample solution requires a good understanding of the following 3 files:

\src\model_parser\Input_events.ipp

\src\solvers\SSA_Direct.ipp

\src\solvers\SSA_Direct_Events.ipp

Bookmarks

There are some bookmarks that have social network information in the search engine, which might help you solve the task.
