THE EFFECTIVENESS OF TEST-DRIVEN DEVELOPMENT AND REFACTORING

TECHNIQUES IN COMPUTATIONAL SCIENCE AND ENGINEERING

DEVELOPMENT

by

AZIZ NANTHAAMORNPHONG

JEFFREY CARVER, COMMITTEE CHAIR
JEFF GRAY
KARLA MORRIS
RANDY SMITH
XIAOYAN HONG

A DISSERTATION

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of in the Graduate School of The University of Alabama

TUSCALOOSA, ALABAMA

2014

Copyright Aziz Nanthaamornphong 2014
ALL RIGHTS RESERVED

ABSTRACT

Many computational science and engineering (CSE) software developers have applied software engineering practices to their work in recent years. CSE has unique characteristics that differ from those of traditional software development. Some existing software engineering practices may not be suitable for CSE software development. Agile methods are gaining more interest in both industry and research fields, including the CSE application domain. Agile methods focus on an incremental and iterative development process in which the requirements, specifications, design, implementation, and testing evolve throughout the project lifecycle. Test-driven development (TDD) and refactoring practices are critical for the success of Agile methods. Although many CSE projects employ Agile practices, the effect of TDD on CSE software development remains unknown and should thus be investigated.

The empirical study is the primary research method by which the choice of a given software development technique should be justified. The empirical evidence obtained from research can be used by practitioners who are working to improve software development. The overall goal of this work is to provide empirical evidence of the effects of TDD on CSE software development. Thus, I will perform a series of case studies and other empirical studies. I also propose a reverse engineering tool to assist developers in performing refactoring activities during project development. The details and results of these studies will be described in this dissertation.

DEDICATION

To the memory of my beloved father, whose love and encouragement are always with me.

ACKNOWLEDGMENTS

I would like to thank my advisor, Dr. Jeffrey Carver, for his overwhelming support, direction, and feedback throughout my Ph.D. program. His guidance provided me with the opportunity to learn from a recognized expert in the field of empirical software engineering. I would also like to thank committee members Dr. Jeff Gray, Dr. Karla Morris, Dr. Randy Smith, and Dr. Xiaoyan Hong for their guidance in helping me complete the dissertation. I also gratefully thank Dr. Damian Rouson at Stanford University; Hope Michelsen, a member of the Combustion Chemistry Department at Sandia National Laboratories; and Dr. David Hudak at the Ohio Supercomputer Center. Their comments and helpful discussions were extremely valuable. I would like to thank the staff and developers at the Combustion Research Facility, Sandia National Laboratories, and the Ohio Supercomputer Center for their contributions to this research.

My completion of the doctoral program also required the assistance of my classmates and school colleagues. I would like to thank the members of the Software Engineering Research Group at the University of Alabama for providing valuable feedback and suggestions and for offering me plentiful opportunities to present my research. I would especially like to thank my friends in the Empirical Software Engineering research group for their support and good friendship; we had great and brilliant times together. Finally, I thank the Royal Thai Government Scholarship Program for its generous financial support toward my Ph.D. program.

CONTENTS

ABSTRACT
DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES

1 INTRODUCTION
1.1 Problem Statement
1.2 Study Rationale
1.3 Research Questions
1.4 Outline of Dissertation

2 BACKGROUND RESEARCH
2.1 Empirical Study
2.2 eXtreme Programming (XP)
2.3 Test-Driven Development (TDD)
2.4 Refactoring
2.5 Design Patterns
2.6 Object-Oriented Fortran

3 METHODOLOGY
3.1 Building the software tools for use in the case study
3.1.1 The automated data collection tool
3.1.2 Reverse engineering tool for object-oriented Fortran: ForUML
3.2 Case Study: Test-Driven Development in the Community Laser-Induced Incandescence Modeling Environment (CLiiME)
3.2.1 Study Rationale
3.2.2 Research Questions
3.2.3 CLiiME Description
3.2.4 Procedure
3.2.5 Data Collection
3.2.6 Data Analysis
3.3 Case Study: TDD in the Microscopy Imaging Processing project
3.3.1 Study Rationale
3.3.2 Research Questions
3.3.3 Data Collection
3.3.4 Data Analysis
3.4 The survey of the effectiveness of TDD in the CSE community
3.4.1 Study Rationale
3.4.2 Research Questions
3.4.3 Participants
3.4.4 Procedure
3.4.5 Data Collection
3.4.6 Data Analysis

4 RESULTS
4.1 ForUML
4.1.1 Evaluation Results
4.1.2 Limitations
4.1.3 Lessons Learned
4.2 A Case Study: Agile Development in the Community Laser-Induced Incandescence Modeling Environment (CLiiME)
4.2.1 Findings and Results
4.2.2 Lessons Learned
4.3 The survey of the effectiveness of TDD in the CSE community
4.3.1 Demographics
4.3.2 Test-Driven Development
4.3.3 Characteristics
4.3.4 Difficulty of employing TDD
4.3.5 Testing
4.3.6 Refactoring
4.3.7 Benefits and Challenges of TDD
4.3.8 Summary
4.4 Case Study: TDD in the Microscopy Imaging Processing project
4.4.1 The effectiveness of TDD
4.4.2 Testing
4.4.3 Refactoring
4.4.4 Benefits and Disadvantages of TDD
4.4.5 Summary

5 FINDINGS

6 SUMMARY
6.1 Summary
6.2 Contribution
6.3 Future studies
6.4 Publications

REFERENCES

APPENDICES
A FORUML - SCREENSHOTS
B WEEKLY SURVEY
C MONTHLY SURVEY
D BACKGROUND QUESTIONNAIRE
E SURVEY: TEST-DRIVEN DEVELOPMENT IN COMPUTATIONAL SCIENCE AND ENGINEERING
F INSTITUTIONAL REVIEW BOARD CERTIFICATIONS

LIST OF TABLES

2.1 Empirical study methods (adapted from [Pfleeger, 1995])
2.2 The characteristics of Agile development ([Miller, 2001]) and CSE software development
2.3 The 29 rules of XP ([Don, 2009b])
2.4 Object-Oriented Fortran terms (adapted from [Rouson, Adalsteinsson, and Xia, 2010])
3.1 Fortran to XMI Conversion Rules
3.2 The list of weekly questions
3.3 The list of monthly questions
3.4 The Survey Questions
4.1 Evaluation of ForUML: recall (extracted data / actual data)
4.2 A brief comparison between UML tools (A - Automatically adjusted and M - Manually adjusted)
4.3 Software quality characteristics definitions
4.4 Testing definitions
4.5 Automated testing tools
4.6 Mapping testing problems and solutions
4.7 Mapping refactoring problems and solutions
4.8 Common problems in refactoring and testing
4.9 Summary of survey results

LIST OF FIGURES

1.1 Overview of Computational Science and Engineering Definition
1.2 The visualization of a tornado. The tornado is represented by spheres that are colored according to pressure (source: NCSA [Trish, 2004])
2.1 Lifecycle of the XP process (adapted from [Abrahamsson, Salo, Ronkainen, and Warsta, 2002])
2.2 Overall process of TDD
2.3 Sample code snippet of object-oriented constructs supported by Fortran 2003
3.1 Overall Research Process
3.2 The Wrapper Script’s Procedure
3.3 The Fortran metamodel
3.4 An Overview of the Transformation Process
3.5 Mapping extracted objects into the class diagram
3.6 Overall Structure for CLiiME
3.7 The GUI of CLiiME
3.8 The procedure of automated data collection
3.9 The procedure of the survey
4.1 The Class Diagram (partial): MPFlows
4.2 The development process
4.3 Sample code snippet of code
4.4 The class diagram of CLiiME using the Unified Modeling Language (UML) notation: boxes indicate classes; panels within boxes indicate the class name, attributes (not shown), and methods; lines connect related classes; solid diamonds indicate that one class aggregates an instance of another class; and open triangles indicate that one class extends another class
4.5 Experience of employing TDD
4.6 Learning TDD methods
4.7 Respondent locations by country
4.8 Type of organization worked for
4.9 Type of projects worked for
4.10 Highest level of education
4.11 Work experience
4.12 Experience year of CSE software development
4.13 Programming languages used
4.14 The importance of quality characteristics
4.15 The importance of quality characteristics based on the TDD experience
4.16 The effectiveness of employing TDD on software quality
4.17 The effectiveness of refactoring on software quality
4.18 The difficulty of TDD
4.19 Design when coding
4.20 Automated testing tools
4.21 Testing methods
4.22 Problems about making the test
4.23 Solutions of writing test problems
4.24 Additional methods to improve the code
4.25 The methods of identifying poor code
4.26 Problems about refactoring
4.27 Solutions of refactoring
4.28 Benefits of employing TDD
4.29 Disadvantage of employing TDD
4.30 Sample code snippet of Matlab
A.1 Selection of the Fortran Code
A.2 Generating the XMI
A.3 View the UML class diagram

Chapter 1

INTRODUCTION

Approximately 400 years ago, Galileo observed a massive exploding star. However, the mechanism of the explosion following the core collapse of a supernova remains largely unknown. Today, scientists in many fields are working with computational scientists and engineers to understand the nature of the universe and this phenomenon. Scientists and engineers cannot create physical variants of the current universe or observe its future evolution, so the only feasible way to investigate such questions is through model-based simulations produced with computational science and engineering (CSE) software. Scientists and engineers have been trained in their application areas and understand how the most powerful computer systems can be used to solve their problems. CSE offers powerful advantages over other research methods, enabling rapid calculations on a large volume of data that would be impossible otherwise. Computational research has been referred to as the third pillar of the scientific enterprise and engineering research, along with theory and physical experimentation [National Science Foundation, 2012]. Computational research focuses on developing software to simulate natural phenomena and real-world problems that cannot be studied experimentally.

CSE software development has received increased attention from the software engineering community. CSE software supports a number of important application domains across various organizations (e.g., military, scientific, and commercial). CSE software has a significant impact on society due to the criticality of the problems addressed by CSE [Carver, 2011]. CSE has a role in and is impacted by broad multidisciplinary areas, including science, engineering, mathematics, and computer science (Figure 1.1). For example, understanding the environmental basis of respiratory disease requires a highly complex interdisciplinary modeling effort that considers social science, public health data, and fluid dynamics models of airflow.

Figure 1.1: Overview of Computational Science and Engineering Definition

CSE has made major contributions to the design of aircraft, ships, and chemical and nuclear plants. It has played a role in tornado forecasting and the prediction of global climate change, and it has applications in genomics, proteomics, human health, and drug discovery. For example, a CSE application can assist scientists in simulating a long-track tornado that spends 40-50 min on the ground with a pressure of at least 50 mb [Patterson and Cox, 2005]. Figure 1.2 presents a visualization of a tornado that was generated from simulation software designed by the National Center for Supercomputing Applications (NCSA) [Trish, 2004]. From a business perspective, CSE provides a competitive edge by transforming business and engineering practices. For example, the Boeing Company integrated modeling and simulation techniques to minimize wind tunnel testing as part of its wing design process. As a result, the company could reduce cost and time to market [Douglas, 2003]. Additionally, the survey of the Council on Competitiveness reported that CSE was not only beneficial but also essential to the company’s survival [Joseph and Willard, 2004].

Figure 1.2: The visualization of a tornado. The tornado is represented by spheres that are colored according to pressure (source: NCSA [Trish, 2004]).

1.1 Problem Statement

Several unique characteristics of CSE software development have inspired me to study this topic. These characteristics can be used to describe the nature of CSE software development and its constraints.

First, academic and national laboratory researchers depend on an unpredictable stream of research grants and contracts. This factor imposes time constraints on scientists’ work and often necessitates iterative work. Scientists work with limited time and a defined schedule for each iteration of a simulation because simulations must be conducted as quickly as possible. Therefore, a process-heavy software development methodology would conflict with their need to minimize the development time required prior to publication and program review deadlines.

Second, CSE represents an important domain in which the software requirements must change with changes in the level of understanding or sophistication of the science or engineering discipline itself. CSE software typically evolves with new experimental results, new modeling developments, and the continual augmentation of the software’s functionality. Thus, its requirements and changes are extremely dynamic. Similarly, the successful testing of CSE software is problematic due to the lack of elicitation and specification of requirements in its early phases. In this context, developers cannot possibly predict all of the requirements at the project’s inception. Rather than focusing significant effort on the up-front requirements and design analysis, CSE developers repeatedly implement small increments of functional code, considering changes in the requirements or context. Although the developers do not know the details of all requirements in advance, they attempt to reduce the risk by writing code based on known necessary features. They add new features into the system iteratively until the system meets the customers’ expectations. Scientists or developers may modify the model and code when they receive unexpected results. They can also repeatedly, in several iterations, (1) develop the model, (2) perform the experiment, and (3) compare the result.

Third, CSE developers have a strong background in the theoretical sciences, but often do not have formal training in software engineering techniques. More specifically, the complexity of CSE problems generally requires an expert in the underlying domain (e.g., physics or biology) to even understand the problem. Thus, these domain experts learn how to develop software [Carver, 2009].

Fourth, some software engineering tools are difficult to use in a CSE development environment [Carver, Kendall, Squires, and Post, 2007]. CSE applications are generally developed with software tools that are crude compared to those used today in the commercial sector. The researchers and scientists seek easy-to-use software that enables analysis of complex data and visualization of complicated interactions. Consequently, CSE developers often have trouble identifying and using the most appropriate software engineering techniques for their work, particularly in the case of maintenance tasks. Scientists often spend more of their time on software tools than they do on their own research. The limited interoperability of the tools and their complexity are major obstructions to further progress.

Fifth, CSE software typically lacks adequate development-oriented documentation [Segal, 2004]. In fact, the documentation of CSE software often exists only in the form of documentation. This documentation is typically clear and sufficient for library users, who treat the library as a black box, but not sufficient for developers, who must understand the library in sufficient detail to maintain it. The lack of design documentation leads to multiple problems, particularly in the maintenance tasks. The software engineering community typically uses reverse engineering to address this problem. Reverse engineering is a method that transforms the source code into a model. Although a number of reverse engineering tools have been developed, those tools that can be applied to Fortran, which is one of the major programming languages in CSE applications, do not provide the full set of documentation required by developers. Additionally, CSE software is developed and maintained by a disparate assortment of universities, national laboratories, and software vendors. These groups rarely have the human resources to support the software tools required for these maintenance tasks (e.g., program comprehension, writing documentation).

The first three characteristics imply that the CSE development team must work under a flexible, rather than rigid, software development process. A heavy process, such as the Waterfall model, is not appropriate for CSE software development [Nanthaamornphong, Morris, Rouson, and Michelsen, 2013]. The primary drawback to using the Waterfall model, particularly in a research environment, is its inability to adapt to changing requirements, as evidenced by the model’s expectation that developers define all (or almost all) of the requirements at the beginning of the project. Under the Waterfall model, requirements can change after the “requirements” phase but only with a high implementation cost [Laplante and Neill, 2004].

Currently, developers in multiple CSE software projects have employed Agile methods, with claims that Agile methods are better than the Waterfall model in terms of productivity. Agile methods focus on an incremental and iterative development process in which the requirements, specifications, design, implementation, and testing evolve throughout the project lifecycle. Agile methods promote an iterative mechanism for developing CSE software and further increase the iterative nature of the software lifecycle. Agile software development includes various methodologies (e.g., XP [Don, 2009a], Scrum [Scrum, 2009], Lean [Ebert, Abrahamsson, and Oza, 2012], Crystal [Cockburn, 2004]) that share the same philosophy of emphasizing good people and practices over heavy processes. Nevertheless, each methodology has its own terminology, procedures, and strategies. Among the Agile methodologies, eXtreme Programming (XP) is likely the most prominent [Beck and Fowler, 2000]. In this dissertation, I focus on two key practices of XP: Test-Driven Development (TDD) and refactoring [Astels, 2003; Koskela, 2007]. According to McBreen [McBreen, 2002], TDD is the most powerful of the XP practices. TDD is a programming approach in which developers “only write enough code to make a failing test pass.” An important characteristic of TDD is that the tests are written before the production code.
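The test-first cycle described above can be sketched as follows. This is a minimal illustration in Python using the standard unittest module, not code from the studies in this dissertation; the function celsius_to_kelvin and its tests are hypothetical names chosen only for the example. The tests are assumed to have been written first and to have failed until the function was added.

```python
import unittest


def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to kelvin (hypothetical example)."""
    # Step 2 ("green"): only enough production code to make the tests below pass.
    if temp_c < -273.15:
        raise ValueError("temperature is below absolute zero")
    return temp_c + 273.15


class TestCelsiusToKelvin(unittest.TestCase):
    # Step 1 ("red"): in TDD these tests are written before celsius_to_kelvin
    # exists, so the first run fails.
    def test_freezing_point_of_water(self):
        self.assertAlmostEqual(celsius_to_kelvin(0.0), 273.15)

    def test_rejects_values_below_absolute_zero(self):
        with self.assertRaises(ValueError):
            celsius_to_kelvin(-300.0)
```

Saved to a file, the tests can be run with `python -m unittest <file>`. Each new requirement would start with another failing test, followed by the smallest change that makes it pass.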

TDD helps software developers ensure that their software meets all of the user requirements in terms of the scope of the software features. However, this approach, unchecked, is likely to lead to unmaintainable and possibly poorly performing code. Therefore, refactoring is critical for the success of TDD. During refactoring, developers reorganize their code without changing its functionality to improve software quality, such as readability, redundancy, , and extensibility [Opdyke, 1992]. Martin Fowler [Fowler, 1999] identified the benefits of refactoring as follows: 1) it improves the design of the software, 2) it makes the software easier to understand, 3) it assists developers in finding bugs, and 4) it makes the application faster. A review of the literature indicates that although many studies have been conducted [Bhat and Nagappan, 2006; Damm and Lundberg, 2006, 2007; Maximilien and Williams, 2003; Nagappan, Maximilien, Bhat, and Williams, 2008; Sanchez, Williams, and Maximilien, 2007; Williams, Maximilien, and Vouk, 2003] about the impact of TDD in software development, there is little empirical evidence of how TDD supports CSE software development.
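To make the idea of reorganizing code without changing its functionality concrete, the following Python sketch shows one small refactoring (extracting a helper function). The function names and the kinetic-energy calculation are assumptions made for illustration; they do not come from the projects studied here.

```python
def kinetic_energies_before(masses, velocities):
    # Before refactoring: index-based loop with the vector arithmetic inlined.
    result = []
    for i in range(len(masses)):
        vx, vy = velocities[i]
        result.append(0.5 * masses[i] * (vx * vx + vy * vy))
    return result


def speed_squared(velocity):
    """Squared magnitude of a 2-D velocity vector (helper extracted during refactoring)."""
    vx, vy = velocity
    return vx * vx + vy * vy


def kinetic_energies_after(masses, velocities):
    # After refactoring: same arithmetic in the same order, but the helper is
    # named and reusable, and the loop iterates over (mass, velocity) pairs.
    return [0.5 * m * speed_squared(v) for m, v in zip(masses, velocities)]
```

Under TDD, an existing test suite acts as the safety net for such a change: the tests are run before and after the refactoring, and both versions are expected to return the same values for the same inputs.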

The remaining characteristics indicate the need for new software engineering tools that are designed for a specific purpose in CSE software development. As this research focuses on TDD, some activities in a refactoring process require a special tool to assist Fortran developers in accomplishing this task. Additionally, a tool is needed to gather the necessary information during the evolution of a system implemented with Fortran.

1.2 Study Rationale

This study is important for several reasons. First, the effective development of CSE applications requires many software engineering practices (e.g., testing, program comprehension, refactoring); however, some characteristics of CSE software development might preclude certain existing software engineering practices. I believe that TDD is suitable for the unique characteristics of CSE such that the CSE developers would benefit from the use of TDD. Investigating the effectiveness of employing TDD will provide insight into the outcome of TDD in the CSE domain.

Additionally, I would like to investigate whether a common refactoring technique used in the software engineering community could be applied to CSE applications. For example, design patterns are commonly used in traditional software development but are relatively novel in the CSE world. Thus, the second objective of this study is to identify the benefits and barriers of adopting refactoring techniques in CSE development.

Empirical study can help researchers understand the software development processes. We can consider empirical study as a series of actions designed to obtain evidence and a better understanding of some aspects of software development. Traditional software development practices, tools, and technologies have been empirically evaluated to better understand them in their appropriate contexts [Easterbrook, 2007]. Similarly, higher quality and productivity in CSE software development would not be possible without well-understood and sufficient empirical evidence.

Third, this study focuses on qualitative findings to gain an in-depth understanding of how TDD is experienced by CSE developers. The reasons for using qualitative research are as follows:

• It is able to capture the experiences of participants.

• It allows for the study of software development in a naturalistic setting.

• It mitigates any effects that my personal self-awareness and self-reflection might have on the study outcome.

Finally, this study will provide information for CSE developers who are working to improve CSE software development.

1.3 Research Questions

The main research question that this study aims to answer is as follows:

What is the empirical evidence of the effect of TDD and refactoring techniques on the improvement of CSE software development?

Sub-questions related to this question include the following:

1. What is the effectiveness of TDD?

2. What refactoring techniques do developers use to improve their code?

3. What are the difficulties of using TDD?

4. How does the TDD method support the CSE software development process?

1.4 Outline of Dissertation

This dissertation is divided into six chapters. The second chapter presents the background and related work. The third chapter describes each study in detail along with the study objectives and research methods, including the procedures, data collection, and data analysis. Chapter four presents the results of the studies. Chapter five highlights the findings of this dissertation. The last chapter summarizes the dissertation chapters and describes areas of future work.

Chapter 2

BACKGROUND RESEARCH

This chapter presents the background concepts and previous work that are related to this dissertation. It is divided into six sections. Section 2.1 describes the objective, definition, and process of each methodology used in this research. Section 2.2 describes eXtreme Programming, one of the Agile methodologies used in this research. Section 2.3 details the TDD process. Section 2.4 further describes refactoring, including its definition and some refactoring techniques. Section 2.5 emphasizes design patterns, which are a relatively novel refactoring technique in the CSE domain. Finally, Section 2.6 presents the concepts of the object-oriented Fortran language. Because Fortran is still widely used in the CSE community, a better understanding of the new features of Fortran will help developers better implement CSE software.

2.1 Empirical Study

Empirical studies in software engineering are employed to explore, describe, and explain certain phenomena. The empirical evidence derived from such studies allows researchers to validate their theories, identify important factors, and build models [Basili, 2007]. Because there are various types of empirical studies, researchers must choose the proper method to accomplish their research goal. The two empirical strategies employed in this research are the case study and survey, as described below.

1. Case Study. The use of case studies in software engineering research has emerged over the last two decades and has been gradually accepted by the software engineering research community. A case study investigates a phenomenon in its real-life context, particularly when the boundaries between the phenomenon and its context are not clearly evident [Yin, 2002]. The control level in a case study is lower than in a controlled experiment. Researchers typically use case studies when studying an existing . Researchers can use a case study both to identify the key factors that could affect the outcome of an activity and to document the activities. In a case study, researchers typically collect both qualitative and quantitative data. A case study is suitable for answering particular types of how and why research questions. In some situations, researchers conduct a case study rather than performing a controlled experiment because the experiment requires an excessive time commitment of months or even years.

Case studies provide a deeper understanding of the phenomena under consideration than can be provided by a controlled experiment. In general, there are five major steps in a case study:

1. Case study design. The researchers must first identify the objectives and plan of the study.

2. Identify data collection methods. The researchers must define the protocol and procedures.

3. Collect evidence. The researchers execute the procedures. Lethbridge et al. [Lethbridge, Sim, and Singer, 2005] described three degrees of data collection:

• First degree. This level involves direct methods in which the researchers directly interact with the subjects in real time, e.g., interviews, focus groups, and observations of subjects who “think aloud.”

• Second degree. In this degree, the researchers collect data in real time but do not interact with the subjects directly. This type of data collection can use sensor-based software tools and video recording.

• Third degree. Here, the researchers analyze already-available materials, such as requirement specifications, source code, and bug reports.

4. Data analysis. The data analysis procedures are applied to the data.

5. Write a report. The researchers report the findings of their study. Other than the findings themselves, the researchers should also include:

• The case study protocol and data collection instruments.

• The case study artifacts collected during the case study.

• Experience and lessons learned.

2. Survey. A survey is the collection of standardized information from a specific population or other sample through some form of interviews or questionnaires designed to describe, compare, or explain their knowledge, attitudes, and behavior [Robson, 2002]. The researchers then analyze the results of the survey to make generalizations. Survey research consists of the following steps [Kasunic, 2005]:

1. Identify the research objectives. The researchers must identify the problem statement and how the survey will answer the problem.

2. Identify the target population. The researchers must identify the target population, as defined by demography, geography, occupation, and/or time. Another approach is to select a sample that can be representative of the population.

3. Design the sampling plan. In this step, the researchers should determine the required sample size and select the respondents.

4. Design the questionnaire. The researchers must develop a questionnaire that can provide the answers to the research questions.

5. Conduct a pilot study. Once the questionnaire has been developed, the researchers should perform a pilot study to validate the questions. The researchers may distribute the survey on a small scale to members of the target population.

6. Distribute the questionnaire. Once the researchers are satisfied that the questionnaire is ready, they can distribute it to the respondents. The researchers send out the questionnaire along with instructions on how to fill it out. The questionnaire can be either in paper or electronic form. The researchers may send a preliminary notification to respondents before sending them the questionnaire. Additionally, the researchers might send a reminder notification about the appropriate time for sending back the responses.

7. Analyze the results and write a report. After the study is over, the researchers must analyze the responses and write their report. The results from the survey can then be analyzed to derive descriptive and explanatory conclusions. The researchers may apply statistical methods to the results.
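Step 3 above asks for a required sample size but does not say how it is determined. One common choice, assumed here purely for illustration and not taken from this dissertation, is Cochran's formula with a finite-population correction. The function name, the default 95% confidence level (z = 1.96), the 5% margin of error, and the conservative proportion of 0.5 are all assumptions of this sketch.

```python
import math


def required_sample_size(population, margin_of_error=0.05, z=1.96, proportion=0.5):
    """Estimate a survey sample size (hypothetical helper, Cochran's formula).

    population      -- size of the target population
    margin_of_error -- acceptable error, e.g. 0.05 for +/- 5%
    z               -- z-score for the confidence level (1.96 is about 95%)
    proportion      -- expected proportion; 0.5 gives the largest sample
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * proportion * (1.0 - proportion) / margin_of_error ** 2
    # Finite-population correction shrinks the sample for small populations.
    n = n0 / (1.0 + (n0 - 1.0) / population)
    return math.ceil(n)


print(required_sample_size(1000))  # prints 278
```

With these defaults, a target population of 1,000 developers would call for about 278 responses, and the estimate levels off at 385 as the population grows very large.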

Table 2.1 presents different key characteristics of empirical strategies. Execution control describes the extent to which the researcher can control the study. Measurement control is the degree to which the researcher can decide to include or exclude data in the study.

2.2 eXtreme Programming (XP)

CSE software typically evolves with new experimental results, new modeling developments, and the continual addition of requirements. In this context, developers cannot possibly know all of the requirements in advance. As a result, developers frequently develop small increments while considering possible changes in the requirements or contexts. Agile methods seem to fit these development characteristics well. Miller identified nine main characteristics of Agile methods [Miller, 2001]. Table 2.2 maps each of those nine characteristics to the characteristics of CSE software development to illustrate how Agile methods support both developers and scientists.

Table 2.1: Empirical study methods (adapted from [Pfleeger, 1995])

Characteristics            Case Study   Survey
Execution control          No           No
Measurement control        Yes          No
Investigation cost         Medium       Low
Ease of replication        Low          High
Qualitative/quantitative   Both         Both

Over the years, Agile methods have become a powerful approach to the software development process. Agile methods focus on incremental and iterative development, in which the requirements, specifications, design, implementation, and testing continue throughout the project lifecycle. This new methodology has been used to solve some of the problems in traditional software development processes, such as the Waterfall model [Gilbert, 1986]. One limitation of the Waterfall model is that all user requirements must be specified at the beginning of the project. In practice, stakeholders can rarely identify all requirements in the initial stages of the project. In the Waterfall model, requirements can be changed in the later stages of the software development cycle, but this process can be expensive. Agile methods promote an iterative mechanism for developing software, and they further increase the iterative nature of the software lifecycle by introducing a test-code-refactor process in each iteration of the cycle. By implementing a number of software engineering practices, Agile ideas can be used to reduce the cost associated with changing requirements. Rather than focusing significant effort on the up-front design analysis, small increments of functional code are implemented according to customer needs.

Table 2.2: The characteristics of Agile development ([Miller, 2001]) and CSE software development

Agile Development: CSE Software Development

1. Modularity of development activities: CSE developers engage in self-contained activities, e.g., performing an experiment, examining the result, and comparing the result.

2. Iterative with cycles: Scientists or developers might modify both the model and code when they receive unexpected results. They repeatedly develop the model, perform the experiment, and compare the result in several iterations [Sanders and Kelly, 2008].

3. Time-bound with iterations: Publication cycles and funding cycles impose time bounds on scientists’ work and necessitate iterative work.

4. Parsimony: Scientists’ chief aim is illuminating nature. They generally view the minimum amount of software that accomplishes the goal to be the optimal amount of software.

5. Adaptive: CSE software tends to evolve due to new experimental results, model development and continual augmentation of functionality [Sanders and Kelly, 2008]. Thus, requirements and changes are extremely dynamic. Likewise, successful testing of CSE software is problematic due to lack of elicitation and specification of requirements in the early phases [Sletholt, Hannay, Pfahl, and Langtangen, 2012].

6. Incremental: Developers add new features that come from new findings to the system. They build the system in small steps.

7. Convergent: Although the developers do not know the details of all requirements in advance, they try to reduce the risk by writing code based on necessary features. They add new features into the system iteratively until the system meets the customers’ satisfaction.

8. People-Oriented: A process-heavy software development methodology would conflict with scientists’ desire to minimize development time to meet publication and program review deadlines [Carver et al., 2007].

9. Collaborative: Developers might not thoroughly understand the problem in the scientific domain. Therefore, developers and scientists need to work constantly together with close communication.

One of the well-known Agile methodologies is XP, which is a discipline of software development

based on the values of simplicity, communication, feedback, and courage [Beck, 2000b].

XP works by bringing the entire team together in the presence of simple practices, with sufficient

feedback to enable the team to view their progress and tune their practices to the unique situation

at hand. XP requires control at all levels: “project planning, personnel, architecture and design,

verification and validation, and integration” [Beck, 2000a]. The philosophy of XP is to make the

customer satisfied with the software product. Bringing the customer onto the team and being able

to receive frequent feedback are critical for achieving the goals of XP. Customer feedback should

be considered throughout the software development process.

Figure 2.1 presents the lifecycle of an XP project. The XP process is divided into six

phases: exploration, planning, iterations to release, production, maintenance, and death [Beck,

2000b]. In the exploration phase, the customer writes the stories included in the system. The set of

written stories is prioritized, and a schedule for the first release is developed during the planning

phase. The architecture of the system is created in the next phase, iterations to release, which includes the and code integration. Before releasing the system, developers may need to perform additional testing in the production phase. In the maintenance phase, any

suggestions about the system are documented for later implementation in the updated release.

Finally, the death phase occurs when there are no more stories or changes from the customer.

In general, development with XP practices is performed iteratively, and the phases occasionally

overlap; thus, it is not easy to provide a broad step-by-step description of the XP practices. Table

2.3 lists the 29 rules of XP.

Table 2.3: The 29 rules of XP ([Don, 2009b])

No.  Practices
1.   User stories are written
2.   Release planning creates the release schedule
3.   Make frequent small releases
4.   The project is divided into iterations
5.   Iteration planning starts each iteration
6.   Give the team a dedicated open work space
7.   Set a sustainable pace
8.   A stand up meeting starts each day
9.   The Project Velocity is measured
10.  Move people around
11.  Fix XP when it breaks
12.  Simplicity
13.  Choose a system metaphor
14.  Use CRC cards for design sessions
15.  Create spike solutions to reduce risk
16.  No functionality is added early
17.  Refactor whenever and wherever possible
18.  The customer is always available
19.  Code must be written to agreed standards
20.  Code the unit test first
21.  All production code is pair programmed
22.  Only one pair integrates code at a time
23.  Integrate often
24.  Set up a dedicated integration computer
25.  Use collective ownership
26.  Acceptance tests are run often and the score is published
27.  All code must pass all unit tests before it can be released
28.  When a bug is found tests are created
29.  All code must have unit tests

Figure 2.1: Lifecycle of the XP process (adapted from [Abrahamsson et al., 2002])

2.3 Test-Driven Development (TDD)

TDD is one of the XP practices introduced to minimize developer effort [Beck and

Andres, 2004]. TDD is a quality-oriented approach in which the tests, called unit tests, are

written prior to the functional code itself. In particular, TDD helps software developers ensure that

the software is built according to the user requirements. It also helps the developers ensure that the

scope of the software features is addressed.

Figure 2.2 presents the process of TDD. First, the developers write the test cases for a particular

functionality. All of the test cases fail because the code has not yet been written. Next, the

developers implement the functions based on the requirements to make the tests pass. Developers

only refactor the code after all of the tests pass successfully. Consequently, TDD can be thought of

as a programming technique that is based on a simple rule: “Only ever write code to fix a failing

test” [Koskela, 2007]. The TDD cycle is shorter than the traditional software-development cycle

(design-code-test-debug) because it focuses on building exactly what is needed and uses immediate feedback to reduce regression errors.

Figure 2.2: Overall process of TDD
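The cycle in Figure 2.2 can be sketched in a few lines of Fortran. This is a minimal illustration under stated assumptions, not code from the case studies: the module stats_mod, the function mean, and the helper assert_close are hypothetical names. The test program is written first, and the build fails until mean exists (the failing step). The simplest implementation then makes the checks pass, after which the code can be refactored while the same checks are rerun.

```fortran
! Hypothetical example of one TDD iteration; all names are illustrative.
module stats_mod
  implicit none
  private
  public :: mean
contains
  ! Step 2: the simplest code that makes the checks below pass.
  pure function mean(x) result(m)
    real, intent(in) :: x(:)
    real :: m
    m = sum(x) / real(size(x))
  end function mean
end module stats_mod

! Step 1: these checks are written before mean is implemented.
program test_mean
  use stats_mod, only: mean
  implicit none
  integer :: failures

  failures = 0
  call assert_close(mean([2.0, 4.0, 6.0]), 4.0, 'mean of three values', failures)
  call assert_close(mean([5.0]), 5.0, 'mean of one value', failures)

  if (failures > 0) then
    print '(i0,a)', failures, ' test(s) failed'
    error stop 1
  end if
  print '(a)', 'all tests passed'

contains

  ! Compares a computed value with the expected one within a tolerance.
  subroutine assert_close(actual, expected, label, nfail)
    real, intent(in) :: actual, expected
    character(len=*), intent(in) :: label
    integer, intent(inout) :: nfail
    if (abs(actual - expected) > 1.0e-6) then
      print '(a,a)', 'FAIL: ', label
      nfail = nfail + 1
    end if
  end subroutine assert_close
end program test_mean
```

Step 3 (refactoring) would change the body of mean or the structure of the module while the two checks continue to pass unchanged.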

Currently, no studies have investigated the impact of TDD on the CSE domain [Desai,

Janzen, and Savage, 2008; Madeyski, 2010], and thus, there is a great need to study the effects of

TDD in a CSE domain. In his work, Kollanus [Kollanus, 2010] performed a systematic literature review to analyze the empirical evidence of the benefits of TDD. He concluded that the empirical evidence of the external and internal software quality that results from using TDD is not sufficient.

He also suggested that still more empirical evidence is needed to better understand the effects of

TDD.

2.4 Refactoring

One of the key aspects of TDD is refactoring, which was originally introduced by William

Opdyke in his PhD dissertation [Opdyke, 1992]. Refactoring has many definitions, but the most

widely cited definition is "a change made to the internal structure of software to make it easier

to understand and cheaper to modify without changing its observable behavior" [Fowler, 1999].

Refactoring neither changes nor improves software from a functional perspective, as the software’s

functionality is the same before and after refactoring. The goal of refactoring is to improve the

software design and code structure. This objective stems from the fact that most programming

languages enable even simple problems to be solved in many different ways. Some examples of the most common motivations for refactoring are as follows:

• To make it easier to add new code to the system. When developers need to add a new feature to a

system, they can quickly implement the feature without worrying about how well it fits with

an existing system. The developers can refactor the code later, after a new feature has been

written.

• To improve the design of existing code. By continuously refactoring the design of the code,

refactoring improves the quality of the code. As a result, developers can easily extend the

maintained code.

• To help developers avoid errors. Refactoring is a method to clean up code, which minimizes

the chances of introducing defects.

• To gain a better understanding of the code. Unclear code impedes code comprehension

and must therefore be clarified by refactoring.

The process of refactoring involves the removal of duplications, the simplification of complex logic, and the clarification of unclear code. Opdyke described 26 ‘low-level’ C++ refactorings. Some examples include:

• Create a member of the variable/function/class. This refactoring introduces a new variable

or function to a class or creates a new class.

• Delete an unused member of the variable/function/class. This refactoring removes variables,

functions, and classes that are not referenced.

• Rename a member. This refactoring is used to rename variables, functions, and classes.

• Move a member variable. This refactoring is used to redistribute a variable to a sub- or

super-class.

Generally, the refactoring process consists of several distinct activities:

1. Identify where the code should be refactored;

2. Determine which refactoring(s) should be applied to those identified places;

3. Guarantee that the applied refactoring preserves behavior;

4. Evaluate the effects of the refactoring on the software quality; and

5. Maintain the consistency between the refactored program code and other software artifacts.
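Activities 1 to 3 can be illustrated with a small before/after pair. The sketch below is a hypothetical example, not code from the studied projects: the names energy_mod, total_energy_before, total_energy_after, and kinetic_energy are assumptions made for illustration. The duplicated formula is the identified location, extracting a function is the chosen refactoring, and the final comparison checks that the observable result is unchanged.

```fortran
! Hypothetical before/after pair for an "extract function" refactoring.
module energy_mod
  implicit none
contains
  ! Before: the kinetic-energy formula is duplicated for each particle.
  function total_energy_before(m1, v1, m2, v2) result(e)
    real, intent(in) :: m1, v1, m2, v2
    real :: e
    e = 0.5 * m1 * v1**2 + 0.5 * m2 * v2**2
  end function total_energy_before

  ! After: the duplicated expression lives in one named function.
  function total_energy_after(m1, v1, m2, v2) result(e)
    real, intent(in) :: m1, v1, m2, v2
    real :: e
    e = kinetic_energy(m1, v1) + kinetic_energy(m2, v2)
  end function total_energy_after

  pure function kinetic_energy(m, v) result(e)
    real, intent(in) :: m, v
    real :: e
    e = 0.5 * m * v**2
  end function kinetic_energy
end module energy_mod

program check_refactoring
  use energy_mod
  implicit none
  real :: before, after

  before = total_energy_before(2.0, 3.0, 1.0, 4.0)
  after  = total_energy_after(2.0, 3.0, 1.0, 4.0)
  ! Activity 3: the observable result must be unchanged by the refactoring.
  if (abs(before - after) > 1.0e-6) error stop 'behavior changed'
  print '(a)', 'refactoring preserved behavior'
end program check_refactoring
```

In a TDD setting the old version would not be kept. The existing unit tests play the role of the comparison in check_refactoring.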

Some refactoring techniques have been proposed for Fortran, which is widely used in the CSE domain [Orchard and Rice, 2013; Overbey, Xanthos, Johnson, and Foote, 2005; Overbey, Negara, and Johnson, 2009]. The primary goal of these techniques is to improve the performance of the system being developed. Additionally, the automated refactoring tool Photran [Eclipse, 2013] was developed to help developers perform the refactoring process.

Photran is an IDE based on Eclipse, and it includes 39 refactoring methods, such as replacing common block and block data subprograms with module variables, removing computed goto, and requiring explicit interface blocks.
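To make the first of these refactorings concrete, the following hand-written sketch shows the kind of change involved in replacing a common block with module variables. It is an illustration under assumptions and not output produced by Photran: the names params, params_mod, advance_old, and advance_new are hypothetical. Both versions are kept side by side so that the program can confirm they compute the same value.

```fortran
! Before: shared state lives in a COMMON block that every routine must
! redeclare with matching types and order.
subroutine advance_old(t)
  implicit none
  real, intent(inout) :: t
  real :: dt
  integer :: nsteps
  common /params/ dt, nsteps
  t = t + dt * real(nsteps)
end subroutine advance_old

! After: the same state is declared once as module variables.
module params_mod
  implicit none
  real :: dt = 0.0
  integer :: nsteps = 0
contains
  subroutine advance_new(t)
    real, intent(inout) :: t
    t = t + dt * real(nsteps)
  end subroutine advance_new
end module params_mod

program compare_versions
  use params_mod, only: dt_new => dt, nsteps_new => nsteps, advance_new
  implicit none
  real :: dt
  integer :: nsteps
  real :: t_old, t_new
  common /params/ dt, nsteps

  ! Give both versions the same inputs.
  dt = 0.5
  nsteps = 4
  dt_new = 0.5
  nsteps_new = 4
  t_old = 0.0
  t_new = 0.0

  call advance_old(t_old)
  call advance_new(t_new)
  if (abs(t_old - t_new) > 1.0e-6) error stop 'results differ'
  print '(a,f4.1)', 'both versions give t = ', t_new
end program compare_versions
```

The module version lets the compiler check types at every use site, whereas a mismatch between two declarations of the same common block is not diagnosed in general.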

2.5 Design Patterns

Refactoring at the code level is not always adequate because some problems can only be identified at a higher level of abstraction. One possible method is to apply design patterns to the

system being developed. A design pattern is a generic solution to a common software design problem that can be reused in similar situations. Design patterns are made of the best practices drawn from various sources, such as building software applications, developer experiences, and empirical studies. Generally, we can classify the design patterns of the software into classical and novel design patterns. The 23 classical design patterns were introduced by the "Gang of

Four" (GoF) [Gamma, Helm, Johnson, and Vlissides, 1995]. Subsequently, software developers and researchers have proposed a number of novel design patterns targeted at particular domains.

For example, design patterns have been proposed specifically for parallel programming [Mattson,

Sanders, and Massingill, 2004; Ortega-Arjona and Luis, 2010].

Unfortunately, many developers misunderstand the difference between design patterns and refactoring. They consider design patterns to be related to the design itself and not to the code.

In fact, design patterns are related to both the code and design. Adding a pattern often requires changing the associated code. Ralph Johnson explained that developers generally use design patterns in two ways: before they write any code and after a significant amount of code has been written [Kerievsky, 2004]. The latter approach is refactoring, as they are changing the software design without adding any new features or changing the external behavior. Additionally, Martin Fowler noted that there is a relationship between design patterns and refactoring as follows:

“Design patterns are where you want to be; refactorings are ways to get there from somewhere else” [Fowler, 1999].

In general, a design pattern includes a section known as intent. Intent is “a short statement that answers the following questions: What does the design pattern do? What is its rationale and intent? What particular design issues or problem does it address?” [Gamma et al., 1995]. For example, the intent of the Template Method pattern is to define the skeleton of an algorithm in an operation, deferring some steps to subclasses. Template Method lets subclasses redefine certain steps of an algorithm without changing the algorithm’s structure. When using the design patterns, developers have to understand the intent of each design pattern to determine whether the design pattern could provide a good solution to a given problem.
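The Template Method intent described above can be expressed with the abstract types and deferred bindings of Fortran 2003. The following is a minimal sketch under stated assumptions: the names integrator_mod, integrator, euler, run, and step are hypothetical and do not come from the cited works. The procedure run is the fixed skeleton, and step is the part deferred to child types.

```fortran
module integrator_mod
  implicit none
  private
  public :: integrator, euler

  ! Abstract parent: run is the template method, step is deferred.
  type, abstract :: integrator
  contains
    procedure :: run
    procedure(step_interface), deferred :: step
  end type integrator

  abstract interface
    function step_interface(this, y, dt) result(y_next)
      import :: integrator
      class(integrator), intent(in) :: this
      real, intent(in) :: y, dt
      real :: y_next
    end function step_interface
  end interface

  ! Concrete child: supplies only the step that varies.
  type, extends(integrator) :: euler
  contains
    procedure :: step => euler_step
  end type euler

contains

  ! The fixed skeleton of the algorithm: apply step n times.
  function run(this, y0, dt, n) result(y)
    class(integrator), intent(in) :: this
    real, intent(in) :: y0, dt
    integer, intent(in) :: n
    real :: y
    integer :: i
    y = y0
    do i = 1, n
      y = this%step(y, dt)
    end do
  end function run

  ! Explicit Euler update for the test equation dy/dt = -y.
  function euler_step(this, y, dt) result(y_next)
    class(euler), intent(in) :: this
    real, intent(in) :: y, dt
    real :: y_next
    y_next = y + dt * (-y)
  end function euler_step
end module integrator_mod

program demo_template_method
  use integrator_mod, only: euler
  implicit none
  type(euler) :: solver
  ! Two Euler steps of size 0.5 from y = 1 give 0.5 and then 0.25.
  print '(f6.3)', solver%run(1.0, 0.5, 2)
end program demo_template_method
```

A second child type could provide a different step (for example, a higher-order update) without any change to run, which is the property the pattern's intent describes.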

A number of studies provide empirical evidence on the impact of design patterns on software quality [Ampatzoglou, Charalampidou, and Stamelos, 2013; Zhang and Budgen, 2012]. However, most studies focus only on traditional software, including both commercial and open source software. The fact that CSE software differs from traditional software means that little research has focused on design patterns in CSE software. Several researchers have proposed design patterns for computational software implemented with Fortran. For example, Wiedmann et al. [Weidmann, 1997] demonstrated how design patterns were implemented in an interface of sparse matrix computations on NVIDIA GPUs. They then evaluated the benefits of the implementation and reported that the design patterns provided a high level of maintainability and performance. Based on the work of Rouson et al. [Rouson et al., 2010], they proposed three new design patterns, called Multiphysics design patterns, to implement the differential equations, which are integrated into Multiphysics and numerical software. These new design patterns include the Semi-Discrete, Surrogate, and Template Class patterns. Based on the works of Markus [Markus, 2006, 2008], these works demonstrated how some well-known design patterns could be implemented in Fortran 90, 95, and 2003. Similarly, Decyk et al. [Decyk and Gardner, 2007] proposed the Factory pattern in Fortran 95 based on CSE software. These researchers presented the proposed pattern implementation in their Particle-in-Cell (PIC) methods [Neunzert, Klar, and Struckmeier, 1995] in plasma simulation software. Decyk and Gardner [Decyk and Gardner, 2008] also described a way to implement the Strategy, Template, Abstract Factory, and Facade patterns in Fortran 90/95.

2.6 Object-Oriented Fortran

In CSE software, Fortran is still a very widely used programming language. Due to the growing complexity of the problems being addressed through CSE, the procedural programming style available in a language like Fortran 77 is no longer sufficient. Many developers have applied the object-oriented paradigm to effectively implement the complex data structures used within CSE software. A number of studies discuss approaches for expressing object-oriented principles in Fortran 90/95. For example, Decyk et al. described how to express the concepts of data encapsulation, function overloading, classes, objects, and inheritance in Fortran 90 ([Decyk,

Norton, and Szymanski, 1997a,b, 1998]). Moreover, several authors have described the use and syntax of object-oriented features in Fortran 2003 ([Brainerd, 2009; Metcalf, Reid, and Cohen,

2011; Rouson, Xia, and Xu, 2011]).

The Fortran 2003 language standard includes full support of object-oriented constructs, and as such influenced the advent of several CSE packages [Barbieri, Cardellini, Filippone, and Rouson, 2012; Filippone and Buttari, 2012; Morris, Rouson, Lemaster, and Filippone, 2012; Rouson, Xia, and Xu, 2010; Rouson et al., 2010]. Currently, a number of Fortran vendors support all (or almost all) of the object-oriented features included in the Fortran 2003 standard. These include: NAG [Numerical Algorithm Group, 1970], GNU Fortran (gfortran) [GNU, 2012], IBM XL Fortran [IBM, 2014], Cray [National Energy Research Scientific Center, 2014], and Intel Fortran [Intel, 2014] compilers [Chivers and Sleightholme, 2012].

Table 2.4 lists the Fortran-specific terms along with their object-oriented programming equivalents and their Java counterparts.

Table 2.4: Object-Oriented Fortran terms (adapted from [Rouson et al., 2010])

Fortran                      Object-Oriented Equivalent   Java Counterpart
Derived type                 Abstract data type (ADT)     Class
Component                    Attribute                    Property
Type-bound procedure         Method                       Method
Parent type                  Parent class                 Base Class
Extended type                Child class                  Subclass
Module                       Package                      Package
Generic interface            Static polymorphism          Overloading
Deferred procedure binding   Abstract method              Abstract
Intrinsic type               Primitive type               Primitive type

Fortran 2003 supports procedure overriding, which is included in most object-oriented languages. Procedure overriding allows a child type to override a procedure inherited from its parent type. In Fortran 2003, developers can specify a type-bound procedure in a child type that has the same binding-name as a type-bound procedure in the parent type. When the child overrides a particular type-bound procedure, the child’s type-bound procedure will be invoked instead of the type-bound procedure in the parent type. Fortran 2003 also supports user-defined constructors that can be implemented by overloading the intrinsic constructors provided by the compiler. The user-defined constructor is created by defining a generic interface with the same name as the derived type.
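The overriding rule described above can be sketched as follows. The example is hypothetical: the module overriding_mod, the child type square, the component length, and the binding area are names assumed for illustration only. The child declares a binding with the same binding-name as the parent, and the procedure that runs depends on the dynamic type of the object.

```fortran
! Hypothetical types illustrating procedure overriding in Fortran 2003.
module overriding_mod
  implicit none
  private
  public :: shape_, square

  type :: shape_
    real :: length = 1.0
  contains
    procedure :: area => shape_area
  end type shape_

  type, extends(shape_) :: square
  contains
    ! Same binding name as in the parent type, so this binding overrides it.
    procedure :: area => square_area
  end type square

contains

  ! Parent version: a generic shape has no known area here.
  function shape_area(this) result(a)
    class(shape_), intent(in) :: this
    real :: a
    a = 0.0
  end function shape_area

  ! Child version: invoked for any object whose dynamic type is square.
  function square_area(this) result(a)
    class(square), intent(in) :: this
    real :: a
    a = this%length**2
  end function square_area
end module overriding_mod

program demo_overriding
  use overriding_mod, only: shape_, square
  implicit none
  class(shape_), allocatable :: s

  allocate(shape_ :: s)
  s%length = 3.0
  print '(a,f5.1)', 'shape_ area: ', s%area()   ! parent procedure
  deallocate(s)

  allocate(square :: s)
  s%length = 3.0
  print '(a,f5.1)', 'square area: ', s%area()   ! child procedure
end program demo_overriding
```

The call site s%area() is identical in both cases. Only the dynamic type of s decides whether shape_area or square_area is executed.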

Figure 2.3 illustrates a snippet of Fortran 2003 code in which the parent type shape_ is extended by the type circle. This new type inherits all of the properties assigned to shape_, but adds a component (radius) and type-bound procedures (set_radius and add). The type-bound procedure add is invoked whenever a "+" operator (with the specified argument type) is used in the client code. This behavior conforms to polymorphism, which allows a type or procedure to take many object or procedural forms. Data abstraction is the separation between the interface and implementation of the program. It allows developers to provide only essential information about the program to the outside world. In Fortran, the private and public keywords control access to members of the type. Members defined with public are accessible to any part of the program. Conversely, members defined with private are not accessible to code outside the module in which the type is defined. Thus, Fortran 2003 has an encapsulation mechanism for protecting data from use or access outside the module. In the example in Figure 2.3, the component radius cannot be accessed directly by code outside the module. Rather, the caller must invoke the type-bound procedure set_radius.

    module example
      type shape_
        real :: area
      end type
      ! Inheritance
      type, extends(shape_) :: circle
        ! Data abstraction
        private
        ! Encapsulation
        real :: radius
      contains
        procedure :: set_radius
        procedure :: add
        ! Polymorphism
        generic :: operator(+) => add
      end type
      ! Overloads intrinsic constructor
      interface circle
        module procedure new_circle
      end interface
      ! ....
    end module

Figure 2.3: Sample code snippet of object-oriented constructs supported by Fortran 2003
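The listing in Figure 2.3 omits the procedure bodies. The following sketch fills them in so that the example can be compiled and run. Everything beyond the declarations in Figure 2.3 is an assumption made for illustration: the bodies of new_circle, set_radius, and add, the added accessor get_radius, and the choice that "+" yields a circle whose radius is the sum of both radii.

```fortran
module example
  implicit none
  private
  public :: shape_, circle

  type :: shape_
    real :: area
  end type shape_

  type, extends(shape_) :: circle
    private
    real :: radius                   ! hidden from code outside this module
  contains
    procedure :: set_radius
    procedure :: get_radius          ! accessor added for this sketch
    procedure :: add
    generic :: operator(+) => add
  end type circle

  ! User-defined constructor: a generic interface named like the type.
  interface circle
    module procedure new_circle
  end interface circle

contains

  function new_circle(radius) result(c)
    real, intent(in) :: radius
    type(circle) :: c
    call c%set_radius(radius)
  end function new_circle

  ! Sets the radius and keeps the inherited area component consistent.
  subroutine set_radius(this, radius)
    class(circle), intent(inout) :: this
    real, intent(in) :: radius
    real, parameter :: pi = 3.1415927
    this%radius = radius
    this%area = pi * radius**2
  end subroutine set_radius

  function get_radius(this) result(radius)
    class(circle), intent(in) :: this
    real :: radius
    radius = this%radius
  end function get_radius

  ! Assumed meaning of "+": a circle whose radius is the sum of both radii.
  function add(this, other) result(total)
    class(circle), intent(in) :: this, other
    type(circle) :: total
    call total%set_radius(this%radius + other%radius)
  end function add
end module example

program demo_example
  use example
  implicit none
  type(circle) :: a, b, c

  a = circle(1.0)          ! calls new_circle through the generic interface
  b = circle(2.0)
  c = a + b                ! resolved to the type-bound procedure add
  print '(a,f4.1)', 'radius of a + b: ', c%get_radius()
  print '(a,f7.3)', 'area of a + b:   ', c%area
  ! An assignment such as c%radius = 5.0 is rejected by the compiler here,
  ! because radius is private to the module.
end program demo_example
```

The inherited component area remains public, so client code can read it directly, while radius can only be changed through set_radius.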

Chapter 3

METHODOLOGY

This chapter details the methodologies used to answer the research questions. In particular, the objective of this research study was to determine whether TDD supports CSE software development. I use case studies and a survey to investigate the effect of TDD on CSE software development. Overall, this research consists of the three stages illustrated in Figure 3.1. The first stage includes development of tools to use during the case studies. The objectives of these tools are: 1) to collect necessary data during the case study and 2) to help Fortran developers perform program comprehension tasks. The second stage consists of two case studies and a survey, as follows:

1. Case Study: TDD in the Community Laser-Induced Incandescence Modeling Environment

(CLiiME) project.

2. Case Study: TDD in the Microscopy Imaging Processing project.

3. A survey of the effectiveness of TDD in the CSE community.

More specifically, the subject of the first case study is a CSE project that does not utilize a high-performance computing system. The second case study assesses a CSE project in a high-performance computing environment. One challenge of massively parallel computer systems is that they are difficult to program and manage. The development of parallel software is interesting because developers might use different refactoring techniques to improve their code. Furthermore, the difficulty of employing SE techniques to develop parallel software is higher than for serial code development. Although these issues are important for practicing parallel programming, they are also important for TDD. The survey gathered additional data from scientists and engineers about these topics. The survey research method is a suitable way to cover broader CSE software development contexts. In the last stage, I analyzed the results from all studies and presented the findings of the studies as well as the conclusion and recommendations for further study.

The following sections describe the details of each study. These sections include the assumptions and rationale for each study, the setting, the participant selection procedures, the data collection procedures, and the data analysis methods.

Figure 3.1: Overall Research Process

3.1 Building the software tools for use in the case study

This section provides details about the work in the first stage. I have developed two software tools with different purposes, as follows:

• The automated data collection tool. This tool is used to collect necessary data while performing

the second case study.

• Reverse engineering tool for object-oriented Fortran: ForUML. ForUML helped me assess

the evolution of source code and software designs during the first case study. Additionally,

ForUML was designed to help developers perform program comprehension tasks.

The following subsections provide details about these tools.

3.1.1 The automated data collection tool

To collect data during software development in the case study, I developed and customized an automated data collection infrastructure. The main goal of collecting data automatically is to identify useful heuristics for evaluating a developer’s workflow. Similar automated data collection approaches have been used in previous work in this area (e.g., [Zhang and Hochstein, 2009]). The tools automatically collect data about two activities: compiling code and executing the program. More

specifically, the instrumentation contains the following tools:

Wrapper Scripts. The primary goal of wrapper scripts is to automatically capture information

about the development process and code evolution. Rather than invoking commands directly,

developers invoke a wrapper script. Figure 3.2 illustrates how the scripts work. When a developer

executes a script, first it captures the source code file passed in as an argument and checks it into

a Subversion repository. Second, it captures the text of the executed command and input parameters.

Third, it calls the underlying command with the parameters that were passed into the script.

Fourth, it captures any return message from the compiler or execution environments. Fifth, it adds

the command, parameter list, and return message to a log file. Finally, it checks the log file into the

Subversion repository. From the developer’s perspective there is no additional overhead because

the recording and storing of data are invisible.
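The middle steps of this procedure (capturing the command, calling it, capturing its messages, and appending everything to a log) can be sketched as below. This is a hypothetical illustration, not the instrumentation used in the case study, which was presumably written as scripts. The file names wrapper.log and wrapper.out are assumptions, a POSIX shell is assumed for the output redirection, and the two Subversion check-ins are left out because they depend on the repository setup. The sketch relies on the Fortran 2008 intrinsic execute_command_line.

```fortran
! Hypothetical wrapper: forwards its arguments to an underlying command and
! appends the command, its exit status, and its output to a log file.
program wrapper
  implicit none
  character(len=*), parameter :: log_file = 'wrapper.log'   ! assumed name
  character(len=*), parameter :: out_file = 'wrapper.out'   ! assumed name
  character(len=1024) :: arg, line
  character(len=4096) :: command
  integer :: i, exit_status, cmd_status, log_unit, out_unit, ios

  if (command_argument_count() < 1) then
    print '(a)', 'usage: wrapper <command> [arguments...]'
    stop
  end if

  ! Capture the text of the command and its parameters.
  ! Arguments containing blanks are not re-quoted in this sketch.
  command = ''
  do i = 1, command_argument_count()
    call get_command_argument(i, arg)
    command = trim(command) // ' ' // trim(arg)
  end do
  command = adjustl(command)

  ! Call the underlying command and save its messages in a scratch file.
  exit_status = 0
  cmd_status = 0
  call execute_command_line(trim(command) // ' > ' // out_file // ' 2>&1', &
                            exitstat=exit_status, cmdstat=cmd_status)

  ! Append the command, its status, and the returned messages to the log.
  open(newunit=log_unit, file=log_file, position='append', action='write')
  write(log_unit, '(2a)') 'command: ', trim(command)
  write(log_unit, '(a,i0)') 'exit status: ', exit_status
  open(newunit=out_unit, file=out_file, status='old', action='read', iostat=ios)
  if (ios == 0) then
    do
      read(out_unit, '(a)', iostat=ios) line
      if (ios /= 0) exit
      write(log_unit, '(a)') trim(line)
      print '(a)', trim(line)        ! the developer still sees the messages
    end do
    close(out_unit)
  end if
  close(log_unit)
end program wrapper
```

If the executable is built as wrapper, a developer would type, for example, wrapper gfortran -c solver.f90 instead of the compiler command itself, where solver.f90 stands for any source file. The messages appear as usual and a copy is appended to the log.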

Figure 3.2: The Wrapper Script’s Procedure

3.1.2 Reverse engineering tool for object-oriented Fortran: ForUML

The first project was developed using object-oriented Fortran. I needed a way to understand

the evolution of software designs while developers were employing TDD. To gain this understanding, the software engineering community typically uses reverse engineering, transforming source code into a model or documentation [Jacobson, Booch, and Rumbaugh, 1999]. Unfortunately, although a number of reverse engineering tools exist [Müller, Jahnke, Smith, Storey, Tilley, and Wong, 2000], the tools that can be applied to object-oriented Fortran (e.g., Doxygen, http://www.doxygen.org) do not provide the full set of documentation required by the developers. Therefore, I developed ForUML, a tool that automatically reverse engineers the necessary UML design documentation. The primary goal of ForUML is to extract UML class diagrams from the Fortran code. This tool helped me to understand the evolution of source code and software designs in the context of the refactoring process during the case study. The following subsections provide details about the motivation for building ForUML, existing work upon which ForUML builds and expands, mechanisms of ForUML, and evaluation methods.

3.1.2.1 Rationale for building ForUML

In addition to building ForUML for assessing source code and software designs during the

first case study, I also have other reasons for developing this tool. This section describes why

ForUML is necessary for CSE software development.

First, existing software engineering tools are difficult to use in a CSE development environment [Carver et al., 2007]. Consequently, CSE developers often have trouble identifying and using the most appropriate software engineering tools for their work, particularly in the case of reverse engineering tasks. For example, Storey noted that CSE developers who lack formal software engineering training need help with program comprehension when developing complex applications [Storey, 2006]. To address this problem, the software engineering community must develop tools that satisfy the needs of CSE developers. These tools must allow the developers to perform important reverse engineering tasks with simplicity. More specifically, a visualization-based tool is appropriate for program comprehension in complex object-oriented applications [Pacione, 2004].

Second, CSE software typically lacks adequate development-oriented documentation [Segal, 2004]. The lack of documentation makes refactoring and maintenance difficult and prone to errors. CSE software typically evolves over many years and involves multiple developers [Sletholt et al., 2012], as additional functionality and capabilities are added or extended [Britcher, 1990]. The developers need to be able to determine whether the evolved software deviates from the original design intent. To facilitate this process, the developers need tools to help them identify changes that may affect the design and determine whether those changes have undesired effects on the design integrity. The availability of appropriate design documentation can reduce the likelihood of poor choices during the maintenance process.

These issues indicate that CSE developers could benefit from a tool that requires little effort to create documentation of the system as it evolves. In addition to benefitting this research, this tool provides the following benefits to the entire CSE community:

1. The extracted UML class diagrams should support throughout the

development process. During software evolution, maintainers also use UML diagrams to

ensure that the original design intentions are maintained.

2. Developers can use the UML diagrams to illustrate software design concepts to their team

members. In addition, UML diagrams can help developers visually examine the relationships

among objects to identify code smells [Fowler, 1999] in the software under development.

3. Software engineers realize that tools affect productivity. ForUML encourages CSE developers

to adopt software engineering practices in CSE software development. For instance, ForUML

can assist developers in performing refactoring by evaluating the refactoring results

with UML diagrams rather than with a manual code inspection.

3.1.2.2 Background

ForUML builds upon and expands existing work. Lethbridge et al. [Lethbridge, Tichelaar, and Ploedereder, 2004] provide the schema for the static structure of source code, called The

Dagstuhl Middle Metamodel (DMM). This schema is used to represent models extracted from source code written in most common object-oriented programming languages to support reverse engineering tasks. I applied the idea of DMM to object-oriented Fortran.

The transformation process in ForUML is based on the XML Metadata Interchange (XMI) format, which provides a standard method of mapping the object model into XML. XMI is an open standard that developers and software vendors can use to build tools that create, read, manage, and generate XMI. Transforming the model (Fortran code) to XMI requires the Model Driven Architecture (MDA) technology [Object Management Group, 1997b], which is a modeling-based standard issued by the Object Management Group (OMG) [Object Management Group, 1997a]. MDA aims to increase productivity and reuse through separation of concerns and abstraction. A Platform Independent Model (PIM) is an abstract model that contains the information to drive one or more Platform Specific Models (PSMs), including source code, Data Definition Language (DDL), XML, and other outputs specific to the target platform. MDA defines transformations that map from PIMs to PSMs.

Developers can use the information in an XMI file to build their own applications and to create and exchange models among a set of tools. The basic idea of using an XMI file to maintain the metadata for UML diagrams was drawn from four reverse engineering tools. Alalfi et al. developed two tools that use XMI to maintain the metadata for the UML diagrams: a tool that generates UML sequence diagrams for web application code [Alalfi, Cordy, and Dean, 2009] and a tool to create UML-Entity

Relationship diagrams for the Structured Query Language (SQL) [Alalfi, Cordy, and Dean, 2008].

Similarly, Korshunova et al. [Korshunova, Petkovic, van den Brand, and Mousavi, 2006] developed CPP2XMI to extract various UML diagrams from C++ source code. CPP2XMI generates an XMI document that describes the UML diagram, which is then displayed graphically by DOT (part of the Graphviz framework) [Gansner, Koutsofios, North, and Vo, 1993]. Duffy and Malloy [Duffy and Malloy, 2005] created libthorin, a tool to convert C++ source code into UML diagrams. Prior to converting an XMI document into a UML diagram, libthorin requires developers to use a third-party compiler to compile code into DWARF (a file format used to support source-level debugging [Eager, 2007]). In terms of Fortran, DWARF only supports Fortran 90, which

does not include object-oriented features. This limitation may cause compatibility problems with

different Fortran compilers. Conversely, ForUML is compiler-independent and able to generate

UML for all object-oriented Fortran code.

Doxygen is a documentation tool that can use Fortran code to generate either a simple,

textual representation with procedural interface information or a graphical representation. The

only object-oriented class relationship Doxygen supports is inheritance. Doxygen has two primary

limitations. First, it does not support all object-oriented features within Fortran (e.g., type-bound

procedure, component). Second, the diagrams generated by Doxygen only include class names

and class relationships, but do not contain other important information typically included in UML

class diagrams (e.g., methods, properties). ForUML expands upon Doxygen by adding support

for object-oriented Fortran and by generating UML diagrams that include all relevant information

about the included classes (e.g., properties, methods, and signatures).

There are a number of available tools (both open source and commercial) that claim to

transform object-oriented code into UML diagrams (e.g., Altova UModel®, Enterprise Architect®,

StarUML, and ArgoUML). However, none of these tools supports object-oriented Fortran. Although they cannot directly create UML diagrams from object-oriented Fortran code, most of these tools are able to import the metadata describing UML diagrams (e.g., the XMI

file) and generate the corresponding UML diagrams. ForUML can take advantage of this feature to display the UML diagrams described by the XMI files it generates separately from object-oriented

Fortran code.

This previous work has contributed significantly to the reverse engineering tools of traditional software. ForUML specifically offers a method to reverse engineer code implemented with different Fortran versions, including the 2008 standard. Moreover, it was deliberately designed to support important features of Fortran, such as coarrays and operator overloading.

Figure 3.3: The Fortran metamodel

3.1.2.3 Transformation Process

The primary goal of ForUML is to reverse engineer UML class diagrams from Fortran code. By extracting a set of source files, it builds a set of objects associated with syntactic entities and relations. Object-based features were first introduced in the Fortran 90 language standard.

Accordingly, ForUML supports Fortran 90 and all later versions, which encompasses most platforms and compiler vendors. ForUML was implemented using Java Platform SE 6 so that it could run on any client computing system.

The UML object diagram in Figure 3.3 expresses the metamodel of the Fortran language.

The Module object corresponds to Fortran modules, i.e., containers holding Type and Procedure objects. The Type-bound procedure and Component objects are modeled with a composition association to instances of Type. Both the Procedure and Type-bound procedure objects are composed of Argument and Statement objects. The generalization relation with the Base Type object leads to the parents in the inheritance hierarchy. When generating the class diagram in ForUML, I consider only the objects inside the dashed line that separates object-oriented entities from the module-related entities.

Figure 3.4: An Overview of the Transformation Process
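As a rough illustration of the metamodel in Figure 3.3, the objects and their composition links might be sketched as plain data classes. This is a hypothetical Python sketch, not ForUML's Java implementation; every class and field name below is an assumption chosen to mirror the object names used in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Argument:
    name: str
    type_name: str


@dataclass
class Statement:
    text: str


@dataclass
class Procedure:
    # A module-level procedure, composed of arguments and statements.
    name: str
    arguments: List[Argument] = field(default_factory=list)
    statements: List[Statement] = field(default_factory=list)


@dataclass
class TypeBoundProcedure:
    # A procedure bound to a derived type, with the same composition.
    name: str
    arguments: List[Argument] = field(default_factory=list)
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Component:
    name: str
    type_name: str


@dataclass
class DerivedType:
    name: str
    base_type: Optional[str] = None  # parent in the inheritance hierarchy
    components: List[Component] = field(default_factory=list)
    bound_procedures: List[TypeBoundProcedure] = field(default_factory=list)


@dataclass
class Module:
    # A container holding Type and Procedure objects.
    name: str
    types: List[DerivedType] = field(default_factory=list)
    procedures: List[Procedure] = field(default_factory=list)


def class_diagram_entities(module: Module) -> List[DerivedType]:
    """Return only the object-oriented entities (the derived types)."""
    return list(module.types)
```

In this sketch, module-level procedures are kept in the model but left out of the class diagram, which corresponds to considering only the objects inside the dashed line.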

Figure 3.4 provides an overview of the transformation process embodied in ForUML, comprising the following steps: Parsing, Extraction, Generating, and Importing. The following subsections discuss each step in more detail.

1. Parsing - The Fortran code is parsed by the Open Fortran Parser (OFP)². OFP provides ANTLR-based parsing tools [Parr and Quong, 1995], including Fortran grammars and libraries for performing translation actions. ANTLR is a parser generator that reads a language specification in an EBNF-like syntax and generates a library to parse the specified language. ANTLR distinguishes three compilation phases: lexical analysis, parsing, and tree walking.

I customized the ANTLR libraries to translate particular AST nodes (i.e., Type, Component, and Type-bound procedure) into objects. These AST nodes are only the basic elements of UML class diagrams. In fact, a UML class diagram includes Classes, Attributes, Methods, and Relations.

2 http://fortran-parser.sourceforge.net

The parsing actions include two steps. The first step verifies the syntax in the source file and eliminates source files that have syntax problems. It also eliminates source files that do not contain any instances of Type and Module. For example, ForUML will eliminate modules that contain only subroutines or functions. After this step, ForUML reports the results to the user via a GUI. In the second step, the parser manipulates all AST nodes, relying on the model described earlier. Note that ForUML only manipulates the selected input source files. Any associated Type objects that exist in files not selected by the user are not included in the class diagram.
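The elimination of files without type definitions can be pictured as a filter over the selected files. The sketch below is only an approximation under stated assumptions: it uses a regular expression where ForUML relies on the OFP parser, it performs no syntax verification, and the names `TYPE_DEF`, `has_type_definition`, and `select_files` are hypothetical.

```python
import re

# Matches a derived-type definition such as "type :: shape",
# "type, extends(shape) :: circle", or "type shape",
# but not a declaration such as "type(shape) :: s" or "end type shape".
TYPE_DEF = re.compile(
    r"\s*type\b\s*(,[^:]*)?(::)?\s*[a-z_]\w*\s*$",
    re.IGNORECASE,
)


def has_type_definition(source: str) -> bool:
    """Return True if any line of the source defines a derived type."""
    for line in source.splitlines():
        code = line.split("!", 1)[0]  # drop a trailing comment, if any
        if TYPE_DEF.match(code):
            return True
    return False


def select_files(sources: dict) -> list:
    """Keep the names of the files that define at least one derived type."""
    return [name for name, text in sources.items() if has_type_definition(text)]
```

A module that contains only subroutines or functions has no matching line and is therefore dropped by `select_files`.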

2. Extraction - Next, ForUML extracts the objects to identify the relationships among them. During extraction, ForUML determines the type of each extracted relationship and maps each relationship to an object of the corresponding relationship type. For example, in the code in Figure 2.3, the type Circle inherits from the type Shape, so the extraction process creates a Generalization object. ForUML supports two relationship types: Composition and Generalization.

• Composition represents the whole-part relationship. The lifetime of the part classifier depends upon the lifetime of the whole classifier. In other words, a composition describes a relationship in which one class is composed of many other classes. In our case, a Composition association is produced when a component of a Type object refers to another Type object. However, an association that refers to a Type that was not provided by the user does not appear in the class diagram. In the UML diagram, a composition relationship appears as a solid line with a filled diamond at the association end that is connected to the whole class.

• Generalization represents an is-a relationship between a general object and its derived specific objects, commonly known as an Inheritance relation. Similar to the composition association, the generalization association is not shown in the class diagram if the source file of the base type is not provided by the user. This relationship is represented by a solid line with a hollow arrowhead that points from the child class to the parent class.

Note that the current ForUML version does not support the aggregation and dependency relations.
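The two relationship rules can be sketched as follows. This is a minimal illustration, assuming each extracted type is described by a small dictionary; the data layout and the function name `extract_relationships` are hypothetical and do not reflect ForUML's internal Java classes.

```python
def extract_relationships(types):
    """Derive Generalization and Composition relations.

    `types` maps a type name to a dictionary with two keys:
    "extends" (the parent type name or None) and
    "components" (component name -> declared type name).
    A relation is emitted only when both ends were provided by the user.
    """
    relations = []
    for name, info in types.items():
        parent = info.get("extends")
        if parent in types:
            # Arrow points from the child type to the parent type.
            relations.append(("Generalization", name, parent))
        for component_type in info.get("components", {}).values():
            if component_type in types:
                # The whole (name) is composed of the part (component_type).
                relations.append(("Composition", name, component_type))
    return relations
```

Components of intrinsic types and parents defined in files that were not selected are absent from `types`, so no relation is produced for them.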

3. Generating - The XMI generator module was developed to convert the extracted objects into the XMI notation. After relationship objects are created, the XMI Generator transforms those objects into an XMI Version 1.2 document³. To ensure that the XMI document conformed to the standard, I followed the UML specification for maintaining UML models in a standardized XMI. The XMI document is specified with a Document Type Definition (DTD), which defines how UML models are mapped into the XML file. The rules for mapping the extracted objects to the XMI document are specified in Table 3.1. In addition to these rules, I needed to create new stereotype notations for the constructor and coarray constructs. Those features are notated in the UML class diagram with <<constructor>> and <<coarray>>, respectively. Regarding overloading, I used the stereotype to specify the name of the calling procedure, followed by the referred procedure. For example, procedure :: x => y is shown in the class diagram as <<x>> y(). In the case of operator overloading, I used the symbol of the operator as the calling procedure name, such as <<+>> add(). At the completion of this step, all necessary objects are mapped into the XMI schema. Figure 3.5 illustrates the mapping of extracted objects into the UML class diagram.
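Part of the mapping in Table 3.1 and the overloading notation can be sketched as a lookup table and a labeling function. This is a simplified illustration that produces (XMI element, label) pairs rather than a real XMI 1.2 document. It assumes that stereotypes are written between double angle brackets, and the names `FORTRAN_TO_XMI`, `operation_label`, and `map_type` are hypothetical.

```python
# Mapping rules copied from Table 3.1 (Fortran construct -> XMI element).
FORTRAN_TO_XMI = {
    "Derived Type": "UML:Class",
    "Type-bound Procedure": "UML:Operation",
    "Dummy Argument": "UML:Parameter",
    "Component": "UML:Attribute",
    "Intrinsic type": "UML:DataType",
}


def operation_label(binding, target=None):
    """Build the label shown for a type-bound procedure.

    A plain binding keeps its own name. A binding of the form
    "procedure :: x => y" is labeled with the binding name as a
    stereotype in front of the referred procedure.
    """
    if target is None:
        return f"{binding}()"
    return f"<<{binding}>> {target}()"


def map_type(type_name, components, bindings):
    """Return (XMI element, label) pairs for one derived type.

    `components` is a list of component names and `bindings` is a list
    of (binding name, referred procedure or None) pairs.
    """
    rows = [(FORTRAN_TO_XMI["Derived Type"], type_name)]
    rows += [(FORTRAN_TO_XMI["Component"], name) for name in components]
    rows += [
        (FORTRAN_TO_XMI["Type-bound Procedure"], operation_label(b, t))
        for b, t in bindings
    ]
    return rows
```

For instance, `operation_label("x", "y")` yields the label for the binding `procedure :: x => y` under the bracket assumption above.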

4. Importing - Finally, the generated XMI document must be imported into a UML modeling tool to display the resulting class diagram. ForUML currently uses ArgoUML for displaying the class diagram, which allows users to view UML diagrams without installing a separate UML modeling tool. I added methods in the main class of the ArgoUML code to allow it to automatically import the XMI document. These methods build the class elements in the diagram after ArgoUML is invoked. From the user’s view, this process is transparent, i.e., the user does not need to manually import the XMI file. Of course, a user can later choose to view the UML class diagram by manually importing an XMI file into another tool.

3 I chose Version 1.2 because at the time of development ArgoUML supported that version.

Table 3.1: Fortran to XMI Conversion Rules

Fortran                 XMI elements
Derived Type            UML:Class
Type-bound Procedure    UML:Operation
Dummy Argument          UML:Parameter
Component               UML:Attribute
Intrinsic type          UML:DataType
Parent Type             UML:Generalization.parent
Extended Type           UML:Generalization.child
Composite               UML:Association (the aggregation property as ‘composite’)

After importing the XMI file, ArgoUML’s default view of the class diagram does not show any entities in the editing pane. Like the WYSIWYG⁴ concept, the user needs to drag the target entity from a hierarchical view to the editing pane. To help with this problem, I added features so that ArgoUML will show all entities in the editing pane immediately after successfully importing the XMI document. Note that the XMI document does not specify how to present the elements graphically, so ArgoUML automatically adjusts the diagram when rendering the graphics. Each graphical tool may have its own method for generating the graphical layout of diagrams. The key reasons why I chose to integrate ArgoUML into ForUML are: 1) it is open source and implemented in Java, making its integration seamless; 2) it has sufficient documentation; and 3) it provides sufficient basic functions required by the users (e.g., Export graphics, Import/Export XMI, Critique, Zooming).

ForUML provides a Java-based user interface for executing the commands. To create a UML class diagram, the user performs these steps: 1) Select the Fortran source code; 2) Select the location to save the output; and 3) Open the UML diagram. Appendix A shows screenshots from the ForUML tool along with explanations.

4 WYSIWYG is an acronym for "what you see is what you get."

Figure 3.5: Mapping extracted objects into the class diagram

3.1.2.4 Evaluation Methods

Note that this section describes the methods for evaluation. Section 4.1 details the evaluation results. I evaluated the accuracy of ForUML on five object-oriented Fortran software packages by adopting the definitions of recall and precision given by Tonella and Potrich [Tonella and Potrich, 2001]:

• Recall measures the percentage of the various objects in the source code, i.e., Type, Components, Type-bound procedure, and Associations, that are correctly identified by ForUML.

• Precision measures the percentage of the objects identified by ForUML that are correct when

compared with the source code.

I performed the evaluations as follows.

1. First, I manually inspected the source code to document the number of relevant objects in each package. Note: I performed this step multiple times to ensure that the numbers were not biased by human error.

2. Second, I executed ForUML on each software package and documented the number of relevant objects included in the generated class diagram.

3. Third, to compute recall, I compared the number of objects manually identified in the source code (Step 1) with the number identified by ForUML (Step 2).

4. Finally, to compute precision, I determined whether there were any ForUML objects (Step 2) that were not in the code (Step 1).
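The two measures can be computed as in the following sketch. It assumes that the manually identified objects and the objects reported by the tool are available as sets of names; the function name `recall_precision` and the convention of returning 1.0 for an empty set are assumptions made for illustration.

```python
def recall_precision(in_code, by_tool):
    """Compute recall and precision from two sets of object names.

    in_code: objects found by manual inspection of the source (Step 1).
    by_tool: objects present in the generated class diagram (Step 2).
    """
    correct = in_code & by_tool  # objects the tool identified correctly
    recall = len(correct) / len(in_code) if in_code else 1.0
    precision = len(correct) / len(by_tool) if by_tool else 1.0
    return recall, precision


# Illustrative placeholder names, not data from the evaluation.
manual = {"shape", "circle", "area", "radius"}
generated = {"shape", "circle", "area", "ghost"}
recall, precision = recall_precision(manual, generated)
```

In this placeholder example, three of the four manually identified objects are found, and three of the four reported objects are correct, so both values are 0.75.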

The five software packages I used in the experiments were 1) ForTrilinos⁵; 2) CLiiME; 3) PSBLAS⁶; 4) MLD2P4⁷; and 5) MPFlows. I selected these software packages because they were developed for the CSE domain. Two of the software packages (CLiiME and MPFlows) are not yet publicly available. A description of each software package follows.

1. ForTrilinos - ForTrilinos consists of an object-oriented Fortran interface to expand the use of Trilinos into communities that predominantly write Fortran. Trilinos is a collection of parallel numerical solver libraries for CSE applications in the HPC environment. To provide portability, ForTrilinos extensively exploits the Fortran 2003 standard’s support for interoperability with C. ForTrilinos includes 4 sub-packages (epetra, aztecoo, amesos, and fortrilinos), 36 files, and 36 modules.

5 http://trilinos.sandia.gov/packages/fortrilinos/
6 http://www.ce.uniroma2.it/psblas/
7 http://www.mld2p4.it

2. CLiiME - The Community Laser-Induced Incandescence Modeling Environment is a dynamic simulation model that predicts the temporal response of laser-induced incandescence from carbonaceous particles. CLiiME is implemented in Fortran 2003. It contains 2 sub-packages (model and utilities), 30 files, and 29 modules.

3. PSBLAS - PSBLAS 3.0 is a library for parallel sparse matrix computations, mostly dealing with the iterative solution of sparse linear systems via a distributed memory paradigm. The library assumes a data distribution consistent with a domain decomposition approach, where all variables and equations related to a given portion of the computation domain are assigned to a process. The data distribution can be specified in multiple ways, allowing easy interfacing with many graph partitioning procedures. The library design also provides data management tools allowing easy interfacing with data assembly procedures typical of finite element and finite volume discretizations. Versions of the library have been successfully used to solve linear systems with millions of unknowns arising in complex simulations in various application domains, mostly in fluid dynamics and structural analysis. The PSBLAS library version 3.0 is implemented in Fortran 2003. PSBLAS contains 10 sub-packages (prec, psblas, util, impl, krylov, tools, serial, internals, comm, and modules), 476 files, and 135 modules.

4. MLD2P4 - Multi-Level Domain Decomposition Parallel Preconditioners Package based on PSBLAS (MLD2P4 Version 1.2) is a package of parallel algebraic multi-level preconditioners. This package provides a variety of high-performance preconditioners for the Krylov methods of PSBLAS. A preconditioner is an operator capable of reducing the number of iterations needed to achieve convergence to the solution of a linear system; multilevel preconditioners are very powerful tools especially suited for problems derived from elliptic partial differential equations (PDEs). This package is implemented with object-based Fortran 95. MLD2P4 contains only one package (miprec), 117 files, and 9 modules.

5. MPFlows - Multiphase flows (MPFlows) is a package developed for computational modeling

of spray applications. MPFlows is implemented in Fortran 2003/2008. The use of coarrays

within this application enables HPC software to work without requiring external parallel

libraries. MPFlows contains 2 sub-packages (spray and utilities), 12 files, and 12 modules.

3.2 Case Study: Test-Driven Development in the Community Laser-Induced Incandescence Modeling Environment (CLiiME)

This case study investigates the development of the CSE project CLiiME, which was developed in the Combustion Research Facility (CRF) of Sandia National Laboratories. This section provides an overview of the case study design. Section 4.2 describes the results and lessons from this case study.

3.2.1 Study Rationale

I undertook this study in response to questions raised before I became involved in the development team as a visiting researcher at the CRF of Sandia National Laboratories. The CRF intended to develop CLiiME, an open-source software system that facilitates the analysis of experimental data from the soot-detection technique commonly used in combustion diagnostics. The CRF developers were very interested in Agile development but remained unsure as to whether it was appropriate for this project. The development team eventually decided to employ the XP and TDD methods for this project. This opportunity prompted me to study the benefits and drawbacks of using TDD in the context of CSE software development.

3.2.2 Research Questions

This case study investigates the level of support provided by TDD for CSE software development. Here, the research question is: “How does the test-driven development method support the CSE software development process?”

3.2.3 CLiiME Description

One of the interesting facts about this case study is that Sandia will distribute the CLiiME application to scientists, including CSE software developers and non-developers. The project customer is the Division of Chemical Sciences, Geosciences, and Biosciences in the United States Department of Energy Office of Science. The deliverables for this customer are new scientific insights summarized in refereed scientific publications. For purposes of this project, the team’s domain expert served as a proxy for the customer. Numerous research groups have developed energy- and mass-balance models to describe particle heating by laser absorption, cooling by conduction, radiation, and sublimation, and mass loss by sublimation.

The CLiiME project is an attempt to achieve a consensus in the understanding of the technique. To obtain consistency in the analysis of results from different research groups, Sandia has selected a laser-induced incandescence (LII) model [Michelsen, 2003] validated against a range of experimental data as the base case. Once Sandia releases the base model to the community, different research groups should be able to extend the functionality of CLiiME by adding physical models based on newly available experimental data. The LII model accounts for particle heating by light absorption from a pulsed laser and cooling by sublimation, conduction, and radiation. This model also includes mechanisms for oxidation and annealing of the particles and non-thermal photodesorption of carbon clusters from the particle surfaces. Sandia has established a community-based modeling environment designed to promote collaborative model development.

Figure 3.6: Overall Structure for CLiiME

CLiiME intends to predict the temporal response of LII from carbonaceous particles and allows users to customize the model with their own equations. Additionally, CLiiME provides tools to visually compare the results of the simulation with experimental results. Figure 3.6 illustrates the overall structure of CLiiME.

The developers implemented CLiiME with two object-oriented programming languages, Fortran and Java. They used object-oriented Fortran for the model portion (LII-Model), which is responsible for all of the calculations within the model. Conversely, developers built LII-UI, the graphical user interface (GUI) of the system, with Java. Developers designed the model to support an LII simulation of a particle affected by the energy mechanisms explained by Michelsen [Michelsen, 2003], but users can add other energy mechanisms in the future. LII-UI allows users to customize the energy mechanisms for their own models. Based on the current version of the software, the model supports the following mechanisms for a particular simulation: 1) Absorption; 2) Annealing; 3) Conduction; 4) Radiation; 5) Oxidation; 6) Sublimation; 7) Scattering; 8) Extinction; 9) Sublimation extension; and 10) Thermionic.

In its most recent release, the model included the following general capabilities:

• Track the evolution of a single particle. Developers assume that the evolution of a collection

of particles is a multiple of the contributions from a single particle.

• Allow the user to modify input parameters, such as the absorption coefficient, accommodation coefficient, and density.

• Allow for an extension of the model to address a collection of particles of different sizes.

• Enable capabilities for curve-fitting of simulation data.

• Generate output data in a simple text file format.

• Accept user-provided flags to set the energy mechanisms included in the model used for a

particular simulation.

• Accept user-provided laser wavelength, fluence value, detection wavelength, and detector

wavelength response functions and any other parameters required by the specific model.

• Accept a user-provided input file as the laser temporal profile for the simulation.

In the GUI, the current version of CLiiME supports the following features:

• Allow users to modify the energy mechanisms or write their own energy mechanisms.

• Allow users to build scripts for either compiling the code or running their own modified

model.

• Allow users to use the GUI to view the generated output.

Capabilities related to the GUI were the result of close customer and software developer interaction. The requirements established at the beginning of the project specified model expansion through the implementation of new source code in the form of modules. In the course of development, the customer added requirements to enable the use of CLiiME by scientists with no programming experience. The GUI allows scientists to create new energy models without having to write source code. It has the capability to generate Fortran code (.F90) according to the information that the user provides through the GUI, including the energy’s name, calculation methods, variables, and descriptions. Once the code is generated, the GUI also generates the scripts for compiling the generated source code. Figure 3.7 presents an example of the GUI generating the code.

In addition to the incorporation of new requirements, developers could not clearly define some requirements at the beginning of the project. Due to time constraints, the software was under development while the customer was simultaneously validating model parameters, comparing parameters to others used within the community, validating the correctness of numerical equations against previous simulations using similar models, analyzing new experimental results to determine certain model parameters, and performing other associated activities. To accommodate the dynamic nature of all these requirements, the development team decided to use an Agile development process for this project.

3.2.4 Procedure

The CLiiME project had three developers, including me, and one domain expert. All three developers had 5+ years of software development experience, but none had used TDD. To ensure that the team understood TDD and to gain project agreement, the team allowed everyone to discuss and share ideas about Agile and TDD practices.

Figure 3.7: The GUI of CLiiME

The development team had one server machine to control the source code and build the system. Developers needed to pull the latest version from this server. To control the release versions in this project, they used Git⁸ as the version control system. Git allows team members to work on the same set of files without interrupting one another. The team chose the CTest and CMake frameworks⁹ to perform automated unit testing. CTest is part of CMake, an open source system that manages the build process of a software application using a compiler-independent method. CTest helps developers define the unit-test code, control the execution of tests, run the tests, and report the results of tests through a single command.

8 http://git-scm.com
9 http://www.cmake.org

3.2.5 Data Collection

I collected materials and had frequent discussions with the stakeholders throughout the project. The collected materials consist of the project plan, software design, collaborative emails, source code (including the production and testing code), and other project materials. The team had weekly meetings in which relevant topics, such as their progress and new requirements from customers, were discussed. The weekly meetings were typically attended by both the developers and customers. These discussions focused not only on the software requirements and specifications, but also on the impact of using TDD. I recorded the important issues and problems raised during these meetings. Thus, the meeting minutes have two major components: an update of the progress of the project and a report on the discussion that occurred during the meeting. The meeting minutes were complemented by other sources of data, such as emails and informal discussions (e.g., discussion in the break room, lunch-time conversations, and phone calls).

I was able to access all source files for this project. The development team used Git to keep track of the changes made to the source code. They committed their code changes to the central repository after testing them on their local machines so that I could examine the changes along with the log records. The team developed a coding standard, a set of rules and recommendations for coding in Fortran. The coding standard guides the developers in adding comments to their code so that team members can understand the code easily. In this project, the comments include the necessary constructs for the generation of Doxygen¹⁰ documentation. Otherwise, developers should avoid providing unnecessary comments, which could confuse other developers. I also included source code comments as part of the data for this study.

10 www.doxygen.org/

3.2.6 Data Analysis

I transcribed all documents and project materials before analyzing the data. The transcribing process allowed me to become acquainted with the data. I used the meaning of the analysis context as the unit of analysis for coding. Thus, the data were not coded sentence-by-sentence or paragraph-by-paragraph but were instead coded for meaning. The variety of data collected during the case study is reflected in the varying approaches used to organize the data and analyze them.

The main activities of the analyses were as follows:

1. Identification and organization of the information related to the project plan, such as schedules, milestones, and customer requirements for each release. This information was collected from meeting minutes, e-mails, and project documentation. This activity required the careful reading of a significant quantity of documentation relating to the project. I then recorded the summarized data in word-processing documents.

2. Identification and classification of decisions and actions by the development team to attempt

to solve software design and development process problems. Rather than reading the various

project documentation notes, this activity required an actual comprehension of the program

itself. I reviewed the source code and inspected the software design. To capture the design

form of the written code, I used ForUML. I recorded the summarized data in word-processing

documents and UML diagrams.

3.3 Case Study: TDD in the Microscopy Image Processing Project

The goal of this study is to investigate the effects of using TDD to develop CSE software in a high-performance-computing (HPC) environment. As part of this project, I gathered data about the microscopy image processing application developed by the Ohio Supercomputer Center. Microscopy image processing is a technique for manipulating an image captured through a microscope. Microscopy image processing is of increasing interest to the CSE community. Recent developments in cellular-, molecular-, and nanometer-level technologies have led to rapid discoveries and have greatly advanced many fields (e.g., biology, medicine, and pharmacology).

3.3.1 Study Rationale

Computational scientists and engineers have often used HPC to process large amounts of data and solve complex calculations. The HPC-CSE community has increasingly been adopting software engineering practices. Some existing techniques have been proposed specifically to help developers build HPC applications. For example, the use of design patterns in parallel computing, which is one type of HPC application, differs from its use in traditional software development [Mattson et al., 2004]. Since the subject of the first case study (Section 3.2) did not utilize an HPC system, I needed to conduct a second case study in an HPC environment.

I believe that the benefits and barriers of employing TDD in HPC-CSE software development are different from those of traditional CSE software development. Therefore, this study was undertaken to better understand how TDD can support CSE software development and to investigate the techniques that developers use to refactor their code. Additionally, having results from multiple case studies strengthens the conclusions I can draw.

3.3.2 Research Questions

The three research questions are as follows:

1. What is the effectiveness of TDD from the developers’ point of view?

2. Which refactoring techniques do developers use to improve the code?

3. What are the difficulties and barriers in using TDD?

3.3.3 Data Collection

The developers used an environment designed to collect various types of data during each of their activities, including compiling and executing source code to test its correctness. In addition, the developers completed an experience questionnaire, weekly surveys, and monthly surveys, via an online system. More specifically, the data collection methods are described below.

3.3.3.1 Automated Data

The main goal of automated data collection is to identify useful heuristics for evaluating a developer’s workflow. The developers were asked to install the wrapper scripts (described in Section 3.1.1) in their home directory, which is located on the cluster system. Figure 3.8 illustrates the procedure of automated data collection. When the developer calls the compile or execute commands, the wrapper automatically captures the code file and saves it in the Subversion repository located at the University of Alabama. It also captures and saves the return messages from the compiler. This process is transparent to the developer.

Therefore, the Subversion repository contains all of the code and log reports produced during the project. The log report contains information describing the problems encountered at compilation and execution time.
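The capture step performed by such a wrapper can be sketched as follows. This is a minimal illustration and not the wrapper scripts used in the study: a local archive directory stands in for the Subversion repository, and the function name `run_and_capture` and the file naming scheme are assumptions.

```python
import shutil
import subprocess
import time
from pathlib import Path


def run_and_capture(command, source_file, archive_dir):
    """Run a compile or execute command and archive what it produced.

    A copy of the source file and a log holding the exit code and the
    messages returned by the command are written to archive_dir.
    """
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    source = Path(source_file)

    # Run the real command and collect its standard output and error.
    result = subprocess.run(command, capture_output=True, text=True)

    # Snapshot of the code exactly as it was when the command ran.
    shutil.copy2(source, archive / f"{stamp}_{source.name}")

    # Log report with the messages returned by the command.
    log_path = archive / f"{stamp}_{source.name}.log"
    log_path.write_text(
        f"command: {' '.join(command)}\n"
        f"exit code: {result.returncode}\n"
        f"{result.stdout}{result.stderr}"
    )
    return result.returncode
```

Because the wrapper returns the exit code of the wrapped command unchanged, the developer sees the same outcome as when calling the compiler directly.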

Figure 3.8: The procedure of automated data collection

3.3.3.2 Surveys

I asked developers to answer weekly and monthly surveys, as well as a background questionnaire. The developers spent approximately 10 min on each weekly survey and 15 min on each monthly survey. The main goal of the weekly survey is to observe the progress and the developer’s TDD-related activities during each week. Table 3.2 provides the weekly survey questions (Appendix B presents the survey posted on the web site). The weekly survey questions are divided into three parts:

1. Progress: This part consists of questions about the progress of the work and the ease of using TDD practices in software development.

2. Technical: This part consists of questions about the refactoring techniques that the developer used during the software development.

3. General: This part consists of questions about the problems that might arise in using the data collection tools. In addition, there is a question intended to obtain any additional comments from the developer.

The monthly survey aims to assess the developer’s experiences with using TDD and refactoring strategies. Table 3.3 presents the monthly survey questions. (Appendix C presents the survey posted on the web site.) The questions ask for the developer’s opinion about using TDD and the refactoring techniques during that month. Additionally, the survey asks the developer to identify the benefits and drawbacks of TDD and the refactoring techniques.

The background questionnaire includes questions about the demographics and experiences of the software developers (e.g., their programming skill and TDD experience). Appendix D provides details on the background questionnaire.

3.3.4 Data Analysis

To analyze the survey responses, I performed a qualitative analysis process that includes: (a) coding categories for each question, (b) identifying and coding each answer carefully, (c) organizing each answer into categories, and (d) comparing each new answer to the existing categories to determine whether the new data fit into an existing category.

The data analysis process begins with the coding and organization of the data in search of patterns, critical themes, and meanings. The goal of coding is to learn from the data and keep revisiting them until certain patterns and explanations emerge. This type of coding is reminiscent of the filing techniques by which we sort information and thereby ensure access to everything that is known about a topic. When codes are analyzed, they can be treated in a similar manner as other nominal or ordered categorical data. The frequencies of different types of responses can be counted or cross-tabulated. The final piece of data analysis involves representing and reporting the results.
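Once answers have been coded, counting and cross-tabulating them is mechanical. The following Python sketch shows the idea on made-up data; the developer labels and category names are hypothetical and are not taken from the study.

```python
from collections import Counter

# Hypothetical coded answers: (developer, category assigned during coding).
coded_answers = [
    ("dev1", "writing tests is hard"),
    ("dev2", "better design"),
    ("dev1", "better design"),
    ("dev3", "writing tests is hard"),
    ("dev2", "better design"),
]


def category_frequencies(answers):
    """Count how often each category was assigned."""
    return Counter(category for _, category in answers)


def cross_tabulate(answers):
    """Count category occurrences per developer as {developer: Counter}."""
    table = {}
    for developer, category in answers:
        table.setdefault(developer, Counter())[category] += 1
    return table
```

Treating the codes as nominal data in this way keeps the analysis limited to frequencies and cross-tabulations, which is all the coding scheme supports.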

Table 3.2: The list of weekly questions

Progress Questions
1. How much of your planned work did you complete this week?
   ☐ 0%  ☐ 1-25%  ☐ 26-50%  ☐ 51-75%  ☐ 76-100%
2. Did you add any new functionality this week?
   ☐ Yes  ☐ No
3. What percent of the code that you wrote last week followed the TDD process?
   ☐ 0%  ☐ 1-25%  ☐ 26-50%  ☐ 51-75%  ☐ 76-100%
4. How many new test cases did you write this week? ......
5. Did you have any problems with TDD? If yes, please describe.
   ☐ Yes  ☐ No
6. Order these steps in terms of difficulty relative to TDD.
   Writing Test Cases:  ☐ Least Difficult  ☐ More Difficult  ☐ Most Difficult
   Make Test Pass:      ☐ Least Difficult  ☐ More Difficult  ☐ Most Difficult
   Refactoring Code:    ☐ Least Difficult  ☐ More Difficult  ☐ Most Difficult
7. Has the schedule changed? If yes, please explain.
   ☐ Yes  ☐ No

Technical Questions
1. Did you refactor code this week?
   ☐ Yes  ☐ No
2. Which refactor techniques did you use this week?
   (Refactoring Techniques):  ☐ Very Frequently  ☐ Frequently  ☐ Occasionally  ☐ Rarely  ☐ Never
3. Did you evaluate the result of the refactoring?
   ☐ Yes  ☐ No
4. When did you refactor the code?
   ☐ After all test pass  ☐ After added a new feature  ☐ Review the code  ☐ Fix a bug  ☐ Whenever possible  ☐ Other
5. What is your motivation for refactoring?
   ☐ Fix problems  ☐ Improve performance  ☐ Improve maintainability  ☐ Improve security  ☐ Follow TDD  ☐ Other
6. Did you rewrite the existing test case after refactoring?
   ☐ Yes  ☐ No
7. Did you add the new test case after refactoring?
   ☐ Yes  ☐ No
8. Did you remove the existing test case after refactoring?
   ☐ Yes  ☐ No

General Questions
1. Did you have any problems with the wrapper scripts or VIM sensor? If yes, please specify.
   ☐ Yes  ☐ No
2. Did you use any other software development tools? If yes, please specify.
   ☐ Yes  ☐ No
3. Other comments......

Table 3.3: The list of monthly questions

Questions
1. Based on your experience, what are the benefits and disadvantages of TDD? ......
2. Based on your experience, how effective are the following activities during TDD?
   Writing Test Cases:  ☐ Very Effective  ☐ Effective  ☐ Neutral  ☐ Ineffective  ☐ Very Ineffective  ☐ No Opinion
   Make Test Pass:      ☐ Very Effective  ☐ Effective  ☐ Neutral  ☐ Ineffective  ☐ Very Ineffective  ☐ No Opinion
   Refactoring Code:    ☐ Very Effective  ☐ Effective  ☐ Neutral  ☐ Ineffective  ☐ Very Ineffective  ☐ No Opinion
3. Did you perform any software design activities during code development?
   ☐ Yes  ☐ No
4. When did you create the software design?
   ☐ Before writing Test Cases  ☐ Before Writing Code to make the Test Cases Pass  ☐ After the Test Cases Pass  ☐ Refactoring Code  ☐ Other (Please specify)
5. How often did you change the software design?
   ☐ Very Frequently  ☐ Frequently  ☐ Occasionally  ☐ Rarely  ☐ Never
6. Did you change the software design while refactoring the code?
   ☐ Yes  ☐ No
7. In your opinion, did refactoring the code improve its quality?
   ☐ Yes  ☐ No
8. Besides refactoring, did you use other techniques or approaches to improve the code?
   ☐ Yes  ☐ No
9. Overall, how did you identify code smells (poor code or poor design)? ......
10. In your opinion, which refactoring technique(s) are very helpful for your work? Why? ......
11. What did you learn from refactoring code? ......

In addition to analyzing the survey responses, I analyzed the code in the Subversion repository. The main objectives of this analysis are: 1) to investigate whether the refactoring techniques used to improve the design and code conform to the survey responses, and 2) to measure the progress of the software projects.

To accomplish the first goal, I analyzed the code of each commit and compared any observed changes to the previous version. These changes involved adding, removing, or modifying the code.

For the second goal, I measured the project size, including the lines of code (LOC) and number of functions.
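A size measurement of this kind can be approximated with a few lines of Python. The sketch below is an assumption about how such counts might be obtained, not the tool used in the study: it counts non-blank, non-comment lines of free-form Fortran and matches procedure headers with a simple regular expression, so procedure declarations inside interface blocks would also be counted.

```python
import re

# Matches the start of a Fortran procedure definition, e.g.
# "subroutine foo(x)" or "pure real function bar(y)". Lines that begin
# with "end" are excluded by the negative lookahead.
PROCEDURE_START = re.compile(
    r"^\s*(?!end\b)(?:[\w()=*,]+\s+)*?(subroutine|function)\s+\w+",
    re.IGNORECASE,
)


def measure_size(source_text):
    """Return (lines_of_code, procedure_count) for free-form Fortran source.

    Blank lines and full-line '!' comments are not counted as code.
    """
    loc = 0
    procedures = 0
    for line in source_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("!"):
            continue
        loc += 1
        if PROCEDURE_START.match(stripped):
            procedures += 1
    return loc, procedures
```

Applying the same function to every commit in the repository yields a time series of project size that can be compared with the progress reported in the weekly surveys.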

3.4 The survey of the effectiveness of TDD in the CSE community

To gain a more complete understanding of the community’s perspective on TDD, I conducted a survey within the CSE community regarding the effect of TDD on CSE software development.

3.4.1 Study Rationale

Based on the results of the case study at Sandia National Laboratories (Section 3.2), I wanted to better understand how TDD supports other CSE projects. To answer this question, I sent a survey to the CSE community. The goals of this study are threefold: 1) to determine the opinions of CSE developers with regard to employing TDD in their projects, 2) to better understand the benefits developers gained by using TDD in CSE projects, and 3) to identify any barriers to using TDD in CSE projects.

3.4.2 Research Questions

The purpose of this study is to conduct a survey of CSE software developers from various organizations. The three research questions are as follows:

1. What is the effectiveness of TDD from the developers’ point of view?

2. Which refactoring techniques do developers use to improve the code?

3. What are the difficulties and barriers in using TDD?

Note: I conducted this study with the same objective as the second case study (Section 3.3.2), so these research questions are the same questions as used in the second case study.

3.4.3 Participants

I solicited participation in the survey via emails to the following groups:

1. The emails of people who attend conferences or workshops that are related to CSE software or other HPC software (126 e-mails).

2. The emails of authors who have publications related to CSE software development (134 e-mails).

3. Other related email lists in the CSE software domain (40 e-mails). For example, divisions or departments in Oak Ridge National Laboratory, including Computational Chemical and Material Sciences¹¹, Computational Earth Sciences¹², Computational Mathematics¹³, and Computer Science Research¹⁴. Note: I only chose the groups that provide public e-mails.

3.4.4 Procedure

This section describes the activities that I followed to conduct the survey research. Figure 3.9 illustrates the process of this study.

1. I designed the survey questions.

2. Experts evaluated the questions, and I refined them based on their feedback.

3. To ensure that the survey questions are comprehensible and valid with respect to the study construct, I conducted a pilot study to observe all stages of the survey process, including the administration of the questionnaire. The pilot study duplicated the final survey design on a small scale from beginning to end, including the data processing and analysis steps. The pilot study allowed me to see how well the questionnaire performed during all steps in the survey. I randomly selected 5% of the participants (15 e-mails) from the target lists. These participants were excluded from the subsequent major study.

4. I evaluated the preliminary results of the pilot study in terms of whether the questions were understood and the answers were sufficient for analysis. The questionnaire was revised based on the results of the pilot study.

5. After the verification process, I made the survey available on the web, where it could be accessed through the URL (http://universityofalabama.qualtrics.com). One advantage of web-based surveys is that participants’ responses are stored automatically and can be transferred easily into other formats (e.g., spreadsheets, SPSS¹⁵). The first page of the survey presented an informed consent form where the participants could express their willingness to participate.

6. The web link was distributed to participants.

7. To increase the response rate, (1) I sent an e-mail reminder one week after sending out the surveys, and (2) I sent a second e-mail reminder reiterating the importance of responding after two weeks.

8. I collected the completed surveys from the respondents.

9. Finally, I analyzed the collected data and summarized the results.

11 http://www.csm.ornl.gov/newsite/comp_materials.html
12 http://www.csm.ornl.gov/newsite/climate_dynamics.html
13 http://www.csm.ornl.gov/newsite/comp_math.htm
14 http://www.csm.ornl.gov/newsite/network_cluster.html
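The pilot selection in step 3 amounts to drawing a random 5% sample from the 126 + 134 + 40 = 300 collected addresses and removing it from the main mailing. A minimal Python sketch follows; the placeholder addresses, the function name `split_pilot`, and the fixed seed are assumptions for illustration only.

```python
import random


def split_pilot(addresses, pilot_fraction=0.05, seed=None):
    """Randomly split a contact list into (pilot, main) groups."""
    rng = random.Random(seed)
    pilot_size = round(len(addresses) * pilot_fraction)
    pilot = rng.sample(addresses, pilot_size)
    pilot_set = set(pilot)
    # Everyone not drawn for the pilot stays in the main study.
    main = [address for address in addresses if address not in pilot_set]
    return pilot, main


# Placeholder addresses standing in for the 126 + 134 + 40 = 300 contacts.
contacts = [f"person{i}@example.org" for i in range(126 + 134 + 40)]
pilot, main = split_pilot(contacts, seed=2014)
```

With 300 contacts the split gives 15 pilot addresses and 285 addresses for the main study, matching the figures reported in step 3.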

3.4.5 Data Collection

Table 3.4 presents the mapping between research questions and a list of the survey questions. The survey consists of two sections: general (Questions 1-9) and specific (Questions 10-23). In the first section, I ask participants about their demographic information. These questions provide basic information about the participants, such as their experience, educational background, and programming skills. The questions in the second section ask participants to assess the effectiveness of employing TDD and the refactoring techniques. Additionally, I ask specific questions about the TDD and refactoring activities.

15 A software package used for statistical analysis (http://www-01.ibm.com/software/analytics/spss/)

Figure 3.9: The procedure of the survey

In this study, I used a self-developed questionnaire with several question formats: multiple choice (asking either for one option or for all that apply), dichotomous Yes/No answers, self-assessment items (using a five-point scale), and open-ended questions. Some questions in the survey have an open-ended “Other (specify)” option so that every respondent can give an accurate answer. Appendix E presents the survey posted on the web site.

3.4.6 Data Analysis

To analyze the response data from the survey, I used the same qualitative analysis method described for the second case study (Section 3.3.4).

Table 3.4: The Survey Questions

Demographic Questions
1. For which type of organization do you currently work?
2. What type of projects do you typically work on?
3. Please describe any other significant work experience in fields other than your educational background.
4. Please describe your educational background (i.e. list degrees and Majors - B.S. in Chemistry; M.S in Chemistry, etc..)
5. How many years have you been developing real CSE software projects?
6. Please rate your programming language skills.
7. Do you know Test-Driven Development (TDD)? (If No, go to 7.1)
   7.1 Do you have any plan to learn or employ TDD for your CSE project? (if Yes, go to 7.1.1)
       7.1.1 Why you will use TDD for your CSE project?
   7.2 Do you currently use any specific software development process in your CSE project? (e.g., Agile, Waterfall) (If Yes, go to 7.2.1)
       7.2.1 Please describe.
8. What is your previous experience with TDD in a CSE project? (Check the most item that applies)?
9. How have you obtained your Test-Driven Development skill?

RQ#1 What is the effectiveness of TDD?
10. Please rank the software quality based on what is important to your software (Compatibility, Functional suitability, Maintainability, Operability, Performance, Reliability, Security, Transferability)
11. Based on Question 10, how was the effectiveness of employing TDD on the most important software quality?
12. Based on Question 10, how was the effectiveness of refactoring on the most important software quality?
13. Based on your experience, what are the benefits and disadvantages of TDD?
14. Have you ever employed TDD in a parallel computing project? (If Yes, go to 14.1)
    14.1 How was the effectiveness of employing TDD in the parallel project on the most important software quality (Question 10)?

RQ#2 Which refactoring techniques does the developer use to improve the code?
15. Which techniques did you use to refactor the code?
16. When using TDD, how often do you design the software before writing the code?
17. When using TDD, how often do you perform any software design activities during code development?
18. Besides refactoring, did you use other techniques or approaches to improve the code?
19. Overall, how did you identify poor code or poor design?

RQ#3 What are the difficulties of using TDD?
20. Did you use any automated testing tools? (CMake, CTest, GTest, etc...)
21. Please rank these activities in terms of difficulty relative to TDD
22. What did you learn about the problem of writing tests in your project? how did you solve such problems?
23. What did you learn about the problem of refactoring the code in your project? how did you solve such problems?

Chapter 4

RESULTS

This chapter describes the results and findings of the studies described in Chapter 3. Section 4.1 describes the evaluation of ForUML. Section 4.2 reports the results of the first case study on CLiiME. Section 4.3 reports the results of the survey study. Section 4.4 provides the results of the second case study on the microscopy imaging processing project.

4.1 ForUML

This section provides the details of the evaluation results along with the limitations of ForUML and some lessons learned from the experiments.

4.1.1 Evaluation Results

Table 4.1 shows the results of the experiments (described in Section 3.1.2.4) that evaluated the precision and recall of ForUML compared with manual analysis. Each cell represents the recall as a ratio between extracted data and actual data. The results show that the recall reaches 100% for all sub-packages. Overall, there was only one error in precision in the ForTrilinos sub-package of ForTrilinos. The analysis of the code identified a conditional preprocessor statement (specified by the #if statement) as the source of the problem. ForUML currently does not handle preprocessor directives. During the experiments, only 6 files were not parsed (0.89% of all files). The notification messages informed the users which files were not processed and specifically why each file could not be processed. Based on code inspection, I found four files that do not conform to the Fortran metamodel described earlier (Figure 3.3). Those files do not have the module keyword that is the starting point for the transformation process. The other file exceptions were due to ambiguous syntax, such as Fortran keywords used as part of a procedure name (e.g., print, allocate).
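The ratios in Table 4.1 can be aggregated as shown in the following Python sketch, which follows the definition given above (extracted data / actual data) and uses the Type column of the table as input. The function names are hypothetical, and returning `None` for a 0/0 cell is an assumption about how empty cells should be treated.

```python
def recall(extracted, actual):
    """Recall as defined in the text: extracted elements / actual elements.

    Returns None when there is nothing to extract (a 0/0 cell).
    """
    if actual == 0:
        return None
    return extracted / actual


def column_total(cells):
    """Sum (extracted, actual) pairs into one overall pair."""
    extracted = sum(e for e, _ in cells)
    actual = sum(a for _, a in cells)
    return extracted, actual


# "Type" column of Table 4.1, one (extracted, actual) pair per sub-package.
type_cells = [(16, 16), (1, 1), (1, 1), (48, 48), (23, 23),
              (50, 50), (20, 20), (11, 11), (10, 10)]

overall = column_total(type_cells)
print(overall, recall(*overall))  # prints: (180, 180) 1.0
```

The total reproduces the 180/180 entry in the Overall row for the Type column.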

Figure 4.1 shows an excerpt from a class diagram produced by ForUML for the MPFlows project. In Fortran, each dummy argument can have one of three intent attributes: IN, OUT, or INOUT. Therefore, each parameter passed to an operation in the diagram needs to be specified with its intent. In the class diagram, the keyword IN is omitted because ArgoUML assumes that a parameter has the IN intent by default.

4.1.2 Limitations

Based on the experimental results, ForUML produced precise output and automatically transformed the source code into correct UML diagrams. To illustrate the contributions of ForUML, Table 4.2 compares ForUML with other visualization-based tools [Storey, 2006] that have features to support program comprehension tasks.

Based on this table, one of the unique contributions of ForUML is its ability to reverse engineer OO Fortran code. ForUML integrates the capabilities of ArgoUML to visually display the class diagram. However, ForUML has a few limitations that must be addressed in the future:

• Provide more relationship types. One example of other relationship types in UML is dependency. In practice, dependency is most commonly used between elements (e.g., packages, folders) that contain other elements located in different packages.

• Incorporate other UML CASE (Computer-aided Software Engineering) tools. Currently, ForUML integrates ArgoUML as the CASE tool. I plan to build different interfaces to integrate with other UML tools, so users can select their tool of preference. Although many UML CASE tools support the use of XMI documents, there are several XMI versions defined by the Object Management Group (OMG), and different tools support different versions.

I also plan to develop a plugin for Photran [Eclipse, 2013; Overbey et al., 2005], to allow users to automatically generate UML diagrams within the IDE.

Table 4.1: Evaluation of ForUML: recall (extracted data / actual data)

Package      Sub-package  Type            Procedure         Component       Inheritance     Composition
ForTrilinos  Epetra       16/16           304/304           17/17           12/12           2/2
             Aztecoo      1/1             12/12             1/1             0/0             0/0
             Amesos       1/1             7/7               1/1             0/0             0/0
             ForTrilinos  48/48           11/11             139/139         4/4             4/4
CLiiME       model        23/23           167/167           61/61           32/32           32/32
PSBLAS       modules      50/50           1309/1309         160/160         34/34           28/28
             prec         20/20           208/208           28/28           24/24           12/12
MLDP4        miprec       11/11           0/0               67/66           0/0             10/10
MPFlows      spray        10/10           55/55             29/29           2/2             3/3
Overall                   180/180 (100%)  2073/2073 (100%)  503/503 (100%)  108/108 (100%)  91/91 (100%)

Figure 4.1: The Class Diagram (partial): MPFlows

Table 4.2: A brief comparison between UML tools (A - Automatically adjusted and M - Manually adjusted)

Features                       Rose Enterprise¹⁹  Doxygen  Libthorin  ForUML+ArgoUML  Rigi²⁰
Visualization                  UML                Graph    UML        UML             Graph
Reverse Engineering (Fortran)  No                 No       Ver.90     Yes             No
Hide/Show Detail               Yes                No       Yes        No              No
Inheritance                    Yes                No       Yes        Yes             No
Layout                         A/M                A        A          A/M             A

19 http://www-03.ibm.com/software/products/us/en/enterprise/
20 http://www.rigi.csc.uvic.ca

4.1.3 Lessons Learned

To support program comprehension, however, the UML diagrams must be properly arranged. A large class diagram that contains several classes and relationships requires more effort from users than a smaller one. Unfortunately, the built-in layout function in ArgoUML does not refine the layout as I expected when the diagram contains many elements. Although ArgoUML provides the ability to zoom in or zoom out, the diagram is still difficult to view. To increase the diagram’s understandability, one possible solution is to divide the classes into smaller packages. Another option is to provide different settings for the information included in the class diagrams, allowing a user to create diagrams with the level of detail required for a particular task. This option can ease the development and/or maintenance process. Therefore, providing options for eliminating irrelevant details is helpful.

4.2 A Case Study: Agile Development in the Community Laser-Induced Incandescence Modeling Environment (CLiiME)

This section reports results from the case study and lessons learned that might be useful to the CSE community.

4.2.1 Findings and Results

At the beginning of the project, the development team had a kickoff meeting to introduce XP and TDD to the development team. The customer provided an overview of the project and I discussed the TDD methodologies. Because the development team intended to release the CLiiME application as open source software, building an extendable system is an important project goal. The development team also decided to implement the application using Fortran 2003, an object-oriented language. In Fortran 2003, the object-oriented concepts of classes, class methods, and class attributes correspond to extensible derived types, type-bound procedures, and components, respectively. The Open/Closed principle is another guiding principle of this project. This principle states that “classes should be open for extension, but closed for modification” [Meyer, 1997]. In practice, this principle means that developers should be able to extend the existing classes easily, without needing to modify the base class. The application should allow for the addition of new functionalities and new classes without requiring modifications to the existing code. Figure 4.2 presents the development process; the results of each step are described in the following subsections.

4.2.1.1 Write a test

When developers write a test in TDD, they are actually making design decisions. In this project, developers focused on how they could implement each functionality and then developed a unit test. Unit tests verify that small elements of the system are working as expected. A unit test will not compile at this point because the functionality it calls does not yet exist. Therefore, developers must create a concrete implementation for the functionality. The strategy developers used for writing a test was to create each unit test as an object. They called this unit test a testing class. Each testing class must have all related testing functions. For example, developers created the testing class absorptionTest for dealing with the absorption energy. This test class consists of testing functions that compute the absorption energy for the particle. Based on the requirements, the users can configure some parameters, so the testing class has functions for testing configurable inputs and default inputs. Figure 4.3 displays an example of unit testing code for the absorption energy.

Figure 4.2: The development process

In this step, the challenge is to write a good unit test. A unit test should test only a very specific functionality. If the unit tests are coarse-grained, the cause of a test failure is more difficult to locate. In contrast, if the unit test examines a small amount of code, developers can spot the test failure quickly.

    type absorptionTest
    contains
      procedure :: absorption_energy_input
      procedure :: absorption_energy_default
    end type

    contains

      ! Test the absorption energy computed by the first constructor form.
      subroutine absorption_energy_input(this)
        class(absorptionTest), intent(in) :: this
        type(absorptionEnergy) :: absorption
        real(rkind) :: Qabs
        ! Result from running the Igor code
        real(rkind) :: expect = 2.441e-10
        absorption = absorptionEnergy()
        Qabs = absorption%Energy()
        call assert(error_within_tolerance(expect, Qabs), &
                    error_message("Qabs= 2.441e-10 is expected"))
      end subroutine

      ! Test the absorption energy computed from the 'default.txt' input file.
      subroutine absorption_energy_default(this)
        class(absorptionTest), intent(in) :: this
        type(absorptionEnergy) :: absorption
        real(rkind) :: Qabs
        ! Result from running the Igor code
        real(rkind) :: expect = 1.441E-10
        character(len=*), parameter :: filename = 'default.txt'
        absorption = absorptionEnergy(filename=filename)
        Qabs = absorption%Energy(properties, laser)
        call assert(error_within_tolerance(expect, Qabs), &
                    error_message("Qabs= 1.441E-10 is expected"))
      end subroutine

Figure 4.3: Sample code snippet of unit testing code.

To test the partial differential equations (PDEs) modeled in this application, developers tested the output of different functions involved in the PDE (e.g., the Runge-Kutta method [Hairer, Roche, and Lubich, 1989]). Developers compared the obtained results with expected results computed by another commercial software application, called Igor¹⁶, which provided some built-in functions involved in the PDE calculation. More specifically, developers compared the results for a small number of time steps (e.g., 10 steps). In this test case, developers compared the results of each time step obtained from CLiiME with the expected results produced by Igor. Developers used the debugging capability of Igor to acquire the expected result at each time step. Developers then created other test cases to verify the results of the entire time evolution.

16 http://www.wavemetrics.com

Throughout this project, the team set an error tolerance value (the difference must be less than 10%) to avoid the round-off error problem when verifying the computed values. To ensure consistent testing, all developers had to use a shared function that applies the tolerance value to compare two numerical values. Although these testing strategies allow developers to validate the output with confidence, in general, the development of unit tests in CSE software projects requires additional techniques to identify the appropriate testing task for a given development environment.
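A tolerance-based comparison of this kind might look like the following Python sketch. The name `error_within_tolerance` is borrowed from Figure 4.3, but the body is an assumption: the project's Fortran routine is not shown, so the relative-error formula and the fallback for a zero reference value are illustrative choices only.

```python
def error_within_tolerance(expected, actual, tolerance=0.10):
    """Return True when `actual` is within a relative tolerance of `expected`.

    The 10% default follows the value quoted in the text; the formula itself
    is an assumption, since the project's own routine is not shown.
    """
    if expected == 0.0:
        # Relative error is undefined for a zero reference value, so fall
        # back to an absolute comparison.
        return abs(actual) < tolerance
    return abs(actual - expected) / abs(expected) < tolerance
```

Routing every numerical assertion through one function of this kind keeps the acceptance criterion identical across developers and test classes.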

Unit test dependency is another problem encountered when writing a unit test. Issues arise when one unit test depends on another unit test. To solve such problems, the developers tried to minimize the dependency between unit tests. Thus, the developers followed these general rules when creating the unit tests:

• Tests should be isolated and order-independent (atomic).

• Tests should run quickly.

• Tests should not require manual setup.

In addition to writing a test for a particular functionality, the developers also created a test when they found a bug.
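The rules above can be illustrated with Python's standard unittest module. The class `ParticleState` and its behavior are hypothetical and exist only to give the tests something to exercise; the point of the sketch is that each test builds its own fixture, so no test depends on another.

```python
import unittest


class ParticleState:
    """Hypothetical object under test: a particle with a temperature in K."""

    def __init__(self, temperature=300.0):
        self.temperature = temperature

    def heat(self, delta):
        if delta < 0:
            raise ValueError("delta must be non-negative")
        self.temperature += delta


class ParticleStateTest(unittest.TestCase):
    def setUp(self):
        # A fresh fixture per test: no test sees another test's changes,
        # so the tests pass in any order and need no manual setup.
        self.particle = ParticleState()

    def test_default_temperature(self):
        self.assertEqual(self.particle.temperature, 300.0)

    def test_heat_raises_temperature(self):
        self.particle.heat(50.0)
        self.assertEqual(self.particle.temperature, 350.0)

    def test_negative_heat_is_rejected(self):
        # Regression test of the kind written after a bug is found.
        with self.assertRaises(ValueError):
            self.particle.heat(-1.0)


def run_suite():
    """Run the tests programmatically and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParticleStateTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The last test shows how a discovered bug becomes a permanent test case, so the same defect cannot return unnoticed.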

4.2.1.2 Write the code to pass the test

The next step is to write just enough code to pass the test. The primary goal of this step is to pass the test with the least development time possible. Developers wrote only enough code to meet the current requirements and did not try to predict what the customer would need in the future. To ensure that developers ran all unit tests after writing the code, everyone in the team used the same scripts, generated by the CTest and CMake frameworks, to define the unit test code, to control the execution of the tests, to run the tests, and to report the test results through a single command. When developers changed the code, they ran all of the tests to identify the effects of the change. If any tests failed, they corrected the code until all of the tests passed again.

Developers also deployed these frameworks on an integration server when building the system to ensure that none of the releases had errors.

4.2.1.3 Refactoring

The main purpose of refactoring is to improve the maintainability of the software. As CLiiME grew, the number of dependencies among the classes became unwieldy, which violated the Open/Closed principle. For example, developers needed to modify existing classes when they wanted to add a new energy class. They solved this problem through the use of design patterns.

In CLiiME, developers used three design patterns (defined in the subsequent paragraphs): the Strategy [Gamma et al., 1995], Factory Method [Gamma et al., 1995], and Surrogate [Rouson et al., 2011] patterns. Figure 4.4 illustrates the class diagram of CLiiME, which includes these three design patterns. The colored boxes highlight classes that relate to each pattern: Strategy (blue), Surrogate (green), and Factory Method (red).

The Strategy pattern is useful when a client object needs to be able to dynamically select one algorithm from a set of related algorithms. Consider the following example: a developer wants to defer until runtime the decision about which algorithm to use to sort a set of given numbers. The developer implements each algorithm in a separate class to isolate it from the other code. In this way, the Strategy pattern inverts the dependencies of the generic algorithm and their detailed implementation. For CLiiME, the Strategy pattern allows users to select a specific algorithm for advancing the differential equations in time. At runtime, users can choose from among the provided algorithms (Runge-Kutta second- and fourth-order methods) or create new ones.

To include the Strategy pattern in CLiiME, developers created an interface class strategy that defines only the time-integration method, deferring to subclasses the implementation of the actual quadrature schemes. The concrete strategy classes runge_kutta_2nd and runge_kutta_4th provide the algorithm that represents a part of the time_advance method declared by the strategy interface. Because all of the strategy classes share the same interface, a client object can seamlessly access the algorithm offered by different strategy objects. This approach supports the TDD aim of writing minimally sufficient code. The Strategy pattern alleviates the need to embed conditional logic in the source code for branching to the appropriate strategy. Instead, the developer builds a type system in the form of an inheritance hierarchy and thereby gives the compiler the ability to dynamically dispatch the appropriate strategy at runtime based on the subclass of strategy chosen.

Figure 4.4: The class diagram of CLiiME using the Unified Modeling Language (UML) notation: boxes indicate classes; panels within boxes indicate the class name, attributes (not shown), and methods; lines connect related classes; solid diamonds indicate one class aggregates an instance of another class; and open triangles indicate one class extends another class.

The classes enclosed in the blue box in Figure 4.4 implement the Strategy pattern for time advancement, including one composition relationship and two inheritance (type extension) relationships. In object-oriented programming, developers often refer to composition relationships as “has a” relationships. Inheritance relationships are often described as “is a” relationships. Using this terminology, we can read these relationships in the pattern as follows: “The gaussian_elimination class has a strategy. The runge_kutta_4th class is a strategy, and the runge_kutta_2nd class is a strategy also.” At runtime, the application can substitute either of the two Runge-Kutta strategies for the other without any impact on the Gaussian elimination code. The compiler generates all related conditional branching.
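The structure described above can be sketched in Python as follows. This is an illustration of the Strategy pattern, not the CLiiME Fortran code: the class names mirror those in the text, while the `Integrand` client, the midpoint form of the second-order method, and the test equation dy/dt = -y are assumptions chosen to keep the example short.

```python
from abc import ABC, abstractmethod


class Strategy(ABC):
    """Interface that declares only the time-integration method."""

    @abstractmethod
    def time_advance(self, f, t, y, dt):
        """Advance y from t to t + dt for the equation dy/dt = f(t, y)."""


class RungeKutta2nd(Strategy):
    def time_advance(self, f, t, y, dt):
        # Midpoint variant of the second-order Runge-Kutta method.
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt / 2 * k1)
        return y + dt * k2


class RungeKutta4th(Strategy):
    def time_advance(self, f, t, y, dt):
        # Classical fourth-order Runge-Kutta method.
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt / 2 * k1)
        k3 = f(t + dt / 2, y + dt / 2 * k2)
        k4 = f(t + dt, y + dt * k3)
        return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


class Integrand:
    """Client that "has a" strategy and never branches on its concrete type."""

    def __init__(self, f, y0, strategy):
        self.f, self.t, self.y, self.strategy = f, 0.0, y0, strategy

    def step(self, dt):
        self.y = self.strategy.time_advance(self.f, self.t, self.y, dt)
        self.t += dt
        return self.y


def decay(t, y):
    """Right-hand side of dy/dt = -y, whose exact solution is exp(-t)."""
    return -y


for strategy in (RungeKutta2nd(), RungeKutta4th()):
    model = Integrand(decay, 1.0, strategy)
    print(type(strategy).__name__, model.step(0.1))
```

Adding a new integration scheme means adding one more `Strategy` subclass; `Integrand` itself contains no conditional logic and does not change.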

The Surrogate pattern is very similar in concept to an Automated Teller Machine (ATM). An ATM holds a surrogate database for bank information that exists in another place. The bank’s customer can perform transactions through the ATM and circumvent a visit to the bank. The Surrogate pattern works around Fortran’s prohibition against circular references, in which modules reference one another via use statements. Similarly, C++ developers avoid circular dependency by using forward references. In CLiiME, the implementation of the Surrogate pattern introduces the surrogate abstract class (virtual class in C++). Even though integrand has a component of class strategy (parent class of runge_kutta_4th and runge_kutta_2nd), surrogate allows developers to pass an integrand child class dummy argument to the type-bound procedures implemented in runge_kutta_4th and runge_kutta_2nd.

The Factory Method pattern is useful when a client object wants to create an instance of a parent class, but does not know, at design-time, which subclass to instantiate. The Factory Method pattern hides the details of object construction from the client, which facilitates customization. This approach supports the Open/Closed principle by allowing developers to define a new class without having to modify the original code. For example, imagine an application that can use either a local database or a remote database. During execution, the user can choose which mode to use. When a user selects the local mode, the Factory Method constructs the objects required to work with a local database. When a user selects the remote mode, the Factory Method constructs the objects required to work with a remote database, which entails additional functionality (e.g., connecting to the remote database, uploading information).

The Factory Method pattern suggests encapsulating the subclass selection and object construction processes into one class. The application exposes only the interface class, freeing the application from any direct ties to the subclasses. The interface superclass specifies all standard and generic behaviors and delegates the object-creation details to the subclasses. In Figure 4.4, the abstract class ienergy publishes the factory methods Energy() and Dmdt(). The concrete *Energy classes (i.e., oxidationEnergy, coatingEnergy, etc.) implement the ienergy class and provide the concrete methods Energy() and Dmdt(). These methods construct the Energy object. The Factory Method allows developers to implement new types of energy objects without having to modify the existing energy objects.

The development team frequently evaluated the refactored source code to ensure that the changes did not adversely affect the expected design. By using ForUML to extract UML diagrams from the existing Fortran source code, developers could determine whether the code conformed to the design. Developers compared the class diagram obtained from ForUML with the UML class diagram created manually during the refactoring process to determine whether the implementation of design patterns matched the plans. Instead of manually inspecting the source code, which requires a large amount of effort, the developers made the comparison graphically, which required less effort.

The developers also used ForUML to identify existing design patterns or candidates for new design patterns, bottom-up, from the code (i.e., patterns not intentionally inserted). After identifying a design pattern, the team discussed whether the pattern affected maintainability or extensibility. If the identified pattern did not improve the system, the developers refactored the code with another approach. For example, the development team found that one developer had used the Adapter pattern [Gamma et al., 1995] to accommodate the various possible algorithms. The Adapter pattern hides the complexities of the system and provides a channel to the client for accessing the subsystems. The client can access different algorithms through the interface. Based on the discussion, the developers decided to replace this pattern with the Strategy pattern because the Adapter pattern would require developers to modify the interface in the future if the users added more algorithms.

In addition to helping the developers verify the design, ForUML also helped the developers to identify code that might introduce defects in the future. Specifically, ForUML helped the developers identify problems that compromise the software’s maintainability and extensibility. The dependency problem can make it difficult to maintain or extend a class without breaking the client’s code. The generated UML class diagrams helped the developers visually identify the potential relationship exhibiting the dependency. Once the developers identified the dependency, they redesigned the code structure and evaluated the outcome.

4.2.2 Lessons Learned

Based on the results of this study, this section reports six lessons learned along with recommendations for other CSE developers to consider.

4.2.2.1 Agile development and TDD awareness

It is difficult to successfully introduce Agile practices to a team. All CLiiME developers had little prior knowledge of Agile and TDD. In the early development phase, the developers were not fully aware of the benefit of the Agile method and had some doubts about Agile practices. As a result, the developers did not immediately adopt the practices. For example, TDD requires that developers write approximately the same amount of test code as production code. The developers realized that TDD required additional work, but they did not see the benefits of TDD at the beginning of the project. Fortunately, the developers gained an understanding of the benefits of TDD while working on the project.

Here, I provide two specific examples. First, one developer, who had little knowledge of the scientific domain, reported that he could write the scientific production code more easily because the testing code provided the expected results. Second, another developer reported that refactoring the code helped reduce the number of types and the number of type-bound procedures in many modules, subsequently reducing the effort required to document those modules.

Therefore, I make the following recommendations to CSE developers:

1. Before the project starts, provide training sessions on Agile and TDD for developers to explain the process and the benefits.

2. Engage a mentor who knows Agile and TDD well; such a mentor can help developers come up to speed significantly faster.

3. Encourage team members to add tests every time they find a problem, thereby increasing the speed of finding the cause.

4.2.2.2 Tools for TDD

The TDD method relies on a set of tests that developers continually execute during development. It is helpful to be able to run the test suite automatically, without much effort. In this project, the team chose the CTest and CMake frameworks to perform unit testing. The test tools helped the developers to manage and run all tests easily. Currently, there are some useful tools for performing TDD in different environments.

With ForUML, the generated diagrams could help developers perform the refactoring tasks throughout the project. However, the development team believed that a refactoring tool designed specifically for Fortran would be useful. Such a tool would allow developers to carry out extremely difficult refactoring tasks safely and easily. Therefore, I make the following recommendations for CSE developers:

1. The team should carefully consider which tools will be most useful for the current project.

2. Although the team did not use other tools for handling the requirements or bug reports, I believe that when the software becomes more complex and the project involves other parties, additional tools will be necessary (e.g., Bugzilla (footnote 17), iConcur (footnote 18)).

3. Developers should identify good refactoring tools (e.g., Photran). A single refactoring tool may not provide an adequate way to improve the code.

4.2.2.3 Tailored Agile practices

Although Agile practices seem to fit CSE software development better than plan-driven ones, there is still a need to tailor the Agile practices according to the nature of the project. For example, the team had to establish a tolerance value and strategy for testing the computed results in PDEs. As TDD does not provide specific guidelines for evaluating the testing results, each team must determine the strategy for validating the results within the organization’s acceptable tolerance levels. Here, I suggest the following:

1. Other projects might use different testing methods for PDEs. However, the team must consistently employ such a method throughout the project.

2. Software developers need to examine themselves and their organization’s culture and then properly tailor their practices.
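As an illustration of a consistently applied tolerance strategy, the following minimal Python sketch compares a computed value with an expected value under one project-wide relative and absolute tolerance. The tolerance values, the helper names `within_tolerance` and `second_derivative`, and the finite-difference example are assumptions made for this sketch; they are not the values or the method used in CLiiME.

```python
import math

REL_TOL = 1e-6  # assumed project-wide relative tolerance
ABS_TOL = 1e-9  # assumed absolute tolerance for expected values near zero


def within_tolerance(computed, expected, rel_tol=REL_TOL, abs_tol=ABS_TOL):
    """Single comparison rule that every numerical test in the project reuses."""
    return math.isclose(computed, expected, rel_tol=rel_tol, abs_tol=abs_tol)


def second_derivative(u, x, h=1e-3):
    """Central-difference approximation of u''(x), as used when discretizing PDEs."""
    return (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)


# For u(x) = sin(x), the exact second derivative is -sin(x).
computed = second_derivative(math.sin, 1.0)
expected = -math.sin(1.0)
ok = within_tolerance(computed, expected)
```

Keeping the tolerances in one place means that a decision to tighten or relax them is made once for the whole team rather than test by test.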

4.2.2.4 Design Patterns

17 http://www.bugzilla.org
18 http://www.iconcur-software.com

In the refactoring process, the development team used design patterns and subsequently assessed improvements in the design based on those design patterns. The use of design patterns helped improve CLiiME’s quality, particularly its extensibility. With design patterns, the team can effectively control potential changes and can make extensions much easier. These patterns could help developers avoid having to revise the code when changes occur. Additionally, I observed in practice that the design patterns served as a good team communication method. When the developers discussed different possible solutions to a software design problem, they used design pattern names and their intents as a concise way to effectively communicate concepts. Regarding the use of design patterns, I make these recommendations to CSE developers:

1. Be aware of problems that arise when using design patterns: increased code complexity and unnecessary design patterns. A good understanding of the purpose of each design pattern will minimize such problems.

2. When using design patterns for refactoring, developers may have to create new tests or remove old ones. Refactoring in small steps helped the team avoid breaking unit tests. During each refactoring step, developers might need to test the function manually rather than using automated testing. Of course, developers need to run all tests after completing the refactoring to confirm that the code is behaving properly.

4.2.2.5 Documentation of the system

Proper use of documentation with Agile methods has drawn a lot of attention. One common question is “how much documentation is enough?” In general, customers demand more documentation than needed, and developers do not produce that documentation well. Although the Agile methodology does not stress the need for documentation, I found that maintenance tasks require documentation to preserve critical information over time. Therefore, I make the following recommendations to CSE developers:

1. Developers need to document code that might cause a problem in the future. At a minimum, developers need to produce maintenance documentation that provides the necessary information for new developers. The guideline the team used in CLiiME was to provide a description of every class, every field, and every function. Each description must include the objective, input, output, numerical equation (optional), and citation (optional). Additionally, rather than writing a comment, developers should remove or refactor unclear code. Leaving a comment on unclear code is not a good practice because it will confuse other developers.

2. Developers should use a documentation tool to automatically convert code comments into written documentation. In this project, the development team used Doxygen to generate documentation.

4.2.2.6 Challenges in Agile and TDD practices

Because I did not measure defects and maintenance cost for this project, I cannot conclude whether the Agile practices incur higher maintenance costs than previous CSE projects developed at CRF. Currently, CLiiME contains 3,378 Lines of Code (LOC) of production code and 2,160 LOC of testing code. This implies that roughly 40% of the total code (2,160 of 5,538 LOC) is additional code that must be maintained. These results cover only a 4-month project life span. Most CSE projects have a life span of several years, implying that the amount of testing code will also increase. Another important issue related to TDD is the quality of the test cases. In this project, the team did not explicitly investigate the quality of its test cases. For example, the team did not perform code coverage analysis, which shows what code has been executed while the automated tests are running. Code coverage can also be used as a metric of how much testing has been performed.

A second problem was that sometimes developers skipped the refactoring process when a deadline was approaching. Requiring developers to include comments in the code implicitly forces them to at least refactor the code to make it more readable.

Therefore, I make the following recommendations to CSE developers and researchers:

1. Reducing the amount of testing code and maintenance costs provides an opportunity for fruitful research.

2. Developers should periodically examine test case quality through scientific measurements, such as code coverage and number of defects/bugs during development.

4.3 The survey of the effectiveness of TDD in the CSE community

This section reports the results from the survey described in Section 3.4. The survey was conducted online using the Qualtrics survey system (footnote 19). I first performed a pilot study with 15 randomly chosen participants to test the survey and to make it as easy as possible for respondents to complete. Only 4 of these participants responded. Based on these responses, I refined some questions according to the comments provided (e.g., regarding typos and ambiguous questions). I then distributed the survey to a total of 285 participants. After I sent the email to the target list, some of those participants informed me that they would forward the survey link to potential participants, and some posted the link on CSE community web sites, such as http://www.software-carpentry.org. The survey was conducted from December 2013 to January 2014.

A total of 77 people responded to the survey. A majority of the respondents (64 of 77, or 83%) reported that they had experience with TDD. Of the respondents who did not have experience with TDD, three respondents planned to learn or employ TDD in the future. The 64 respondents who did have experience with TDD were asked about their experience of employing TDD in their projects. Figure 4.5 shows the distribution of the respondents’ experience with using TDD. The most popular response was that they had used it on multiple real projects.

19 http://universityofalabama.qualtrics.com

Figure 4.5: Experience of employing TDD

Also of interest are the methods respondents used to learn TDD. As shown in Figure 4.6, the most common methods for learning TDD were online resources (49), reading from books (37), learning from co-workers (28), and taking a course (13).

Figure 4.6: Learning TDD methods

Furthermore, the respondents were asked whether they had employed TDD on a parallel computing project. Of the 64 respondents, 30 respondents (47%) reported that they had employed TDD on a parallel computing project, whereas the other 34 respondents had never employed TDD on a parallel computing project.

The following sub-sections present the results of the survey based on the 64 participants who had experience with TDD.

4.3.1 Demographics

The survey respondents were located worldwide. I used the physical location generated by the survey system, which maps each survey participant’s IP address to a physical location. There were contributions from North America (40), Europe (20), South America (3), and Australia (1). Figure 4.7 provides a more detailed breakdown. The countries in Europe included Finland, Cyprus, France, Italy, and the Netherlands.

Figure 4.7: Respondent locations by country

The respondents were also asked to provide their organizations, type of projects, and educational background. Figure 4.8 provides a summary of the types of organizations for which they work. The types of organizations are universities, including all educational and teaching institutions; industry and other companies; government laboratories; and non-profit organizations.

Figure 4.8: Type of organization worked for

Figure 4.9 summarizes the types of projects on which the participants worked. The types of projects include the following:

• Research - the main goal is to publish papers.

• Production - the main goal is to produce software for real users.

Each respondent could indicate that they worked on either or both project types. More than 50% of the responses indicated work on research projects.

Figure 4.10 presents the highest level of education obtained and the fields in which that education was completed for the survey respondents. All 77 respondents provided information for their highest level of education; however, one respondent indicated that he/she does not have any degree. The majority of respondents have a Ph.D. degree and Ph.D. degrees in engineering are the most common.

Figure 4.9: Type of projects worked for

Figure 4.10: Highest level of education

The respondents were also asked to report their significant work experience in fields other than their educational backgrounds. Figure 4.11 shows the results from the 39 respondents who answered this question.

Figure 4.11: Work experience

The CSE software development experience of the survey respondents ranges from one to forty-five years with a median of 10 years. Figure 4.12 provides a breakdown of this type of experience in years.

Figure 4.13 shows the respondents’ experiences with different programming languages.

The five programming languages most commonly used in multiple real projects are Python (38), C (35), C++ (31), Fortran (27), and MATLAB (15). In contrast, the respondents almost never used Smalltalk (61), C# (50), C# (47), Haskell (42), or VB (35).

4.3.2 Test-Driven Development

The following sub-sections describe the responses to questions regarding TDD, including software quality, testing, refactoring, benefits, and challenges.

Figure 4.12: Experience year of CSE software development

Figure 4.13: Programming languages used

Table 4.3: Software quality characteristics definitions

Quality | Definition
Compatibility | The ability of two or more software components to exchange information and/or to perform their required functions while sharing the same hardware or software environment.
Functionality | The degree to which the software product provides functions that meet stated and implied needs when the software is used under specified conditions.
Maintainability | The degree to which the software product can be modified. Modifications may include corrections, improvements or adaptation of the software to changes in environment, and in requirements and functional specifications.
Operability | The degree to which the software product can be understood, learned, used and attractive to the user, when used under specified conditions.
Performance | The degree to which the software product provides appropriate performance, relative to the amount of resources used, under stated conditions.
Reliability | The degree to which the software product can maintain a specified level of performance when used under specified conditions.
Security | The protection of system items from accidental or malicious access, use, modification, destruction, or disclosure.
Transferability | The degree to which the software product can be transferred from one environment to another.

4.3.3 Software Quality Characteristics

The respondents were asked to rank the importance of software quality characteristics as those characteristics apply when working on their projects. The following software quality characteristics described by the ISO/IEC 25010:2011 standard model were used: 1) Compatibility, 2) Functionality, 3) Maintainability, 4) Operability, 5) Performance, 6) Reliability, 7) Security, and 8) Transferability. To ensure that the respondents understood the meaning of each quality characteristic, the survey provided descriptions of each quality according to the ISO/IEC 25010:2011 standard model [ISO/IEC, 2011] (Table 4.3).

All quality characteristics were rated by the 64 respondents. Each respondent mapped each software quality characteristic to a ranking number. For example, a respondent might map Functionality to the #1 ranking, Performance to the #2 ranking, and so on. Figure 4.14 presents the ranking order chosen for these software quality characteristics by the respondents. The size of each circle represents the number of respondents.

Figure 4.14: The importance of quality characteristics

Thus, based on the figure, the most important software quality characteristic at each ranking number is the quality with the largest circle. The rankings are as follows:

1. Functionality (43)

2. Reliability (20)

3. Performance (14)

4. Maintainability (19)

5. Compatibility (17)

6. Operability (16)

7. Transferability (18)

8. Security (34)

Figure 4.15 presents the ranking order of the most important software quality characteristics based on the TDD experience of respondents. Note that each ranking represents the number of votes for that ranking.

Figure 4.15: The importance of quality characteristics based on the TDD experience

Regarding the effectiveness of employing TDD in the respondents’ projects, the survey asked the respondents to rate the overall effectiveness of TDD regarding the most important quality characteristics. After analyzing the answers in detail, I categorized the responses as follows: 1) Very Effective, 2) Effective, 3) Neutral, 4) Ineffective, and 5) Unable to evaluate. Figure 4.16 shows that, overall, the use of TDD was effective or very effective regarding the most important software quality characteristics.

Figure 4.16: The effectiveness of employing TDD on software quality

More specifically, the survey also asked respondents to evaluate the effectiveness of refactoring on the most important software quality characteristics mentioned in the previous paragraph. As shown in Figure 4.17, around 50% of the respondents (24) reported that refactoring was effective (i.e., Effective or Very Effective) regarding the most important software quality characteristic.

Figure 4.17: The effectiveness of refactoring on software quality

4.3.4 Difficulty of employing TDD

The survey asked the respondents to evaluate the difficulty of TDD activities, including writing a test, implementing the code, and refactoring. Figure 4.18 presents the difficulty ranking of each activity. The size of each circle represents the number of respondents who selected the response. The results show that ‘writing a test’ was the most difficult activity, followed by ‘implementing the code’, and finally ‘refactoring’. Note that TDD experience had no effect on this result: all experience groups ranked the activities in the same order.

Figure 4.18: The difficulty of TDD

Theoretically, the TDD process does not require software design before the code is implemented because the developers have to write the code corresponding to the test. In practice, however, the developers may need to design the system before testing or implementing actual code. In this context, software design is defined as the process of defining software methods, functions, objects, and the overall structure and interaction of the code. To better understand the software design activity in TDD, the survey asked the respondents how often they design software before coding and while coding. As shown in Figure 4.19, most respondents often performed design activities both before and while implementing the code. This result indicates that respondents consistently break the TDD principle, which does not require a design stage before implementing the code.

Figure 4.19: Design when coding

4.3.5 Testing

This section reports the responses regarding the testing aspects, including understanding of testing, testing tools, and testing problems and their solutions. The following sub-sections detail the responses.

4.3.5.1 Understanding of testing

To determine whether users, especially CSE developers or scientists, understood testing in the same manner as software engineers, the survey asked whether the respondents agreed with the descriptions given for various testing methods. Table 4.4 presents the description of each testing method and the responses provided by the 64 survey participants.

Table 4.4: Testing definitions

Testing Method | Definition | Agree (%) | Disagree (%)
Unit testing | Testing the smallest testable units (e.g., class, module, function) in a software in isolation. Usually done with a specialized unit testing framework. | 95.3 | 4.7
Integration testing | Testing that occurs after Unit Testing and is intended to ensure that the units interact properly. | 89.1 | 10.9
Regression testing | Rerunning test cases (which were successful in the past) to ensure that changes to the code have not introduced bugs. | 85.9 | 14.1

Most respondents (more than 85%) agreed with the definitions provided for each testing method. Based on the respondents’ explanations, the disagreements are summarized as follows:

• Unit testing - Only 3 respondents disagreed with the given definition. One disagreed with the second sentence, stating that the framework should be changed to ‘general’ rather than ‘specialized’. Two respondents thought that ‘smallest’ was difficult and too restrictive.

• Integration testing - Seven respondents disagreed with the given definition. Five respondents did not think that the integration testing was in addition to unit testing or necessarily performed after unit testing. One respondent thought that ‘properly’ had no meaning, and another one did not think that integration testing was important in CSE software development.

• Regression testing - Nine respondents disagreed with the given definition. All of those respondents said regression testing does not ensure that ‘bugs are not introduced’. Instead, it only ensures that the tested invariants are preserved. Furthermore, regression testing does not always fix everything.
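The respondents’ point that regression testing only preserves the tested invariants can be illustrated with a small sketch. The following Python example is hypothetical and was not part of the survey: the function `trapezoid`, the case names, and the tolerance are assumptions. It records a baseline of outputs from a trusted version and later reports only those tracked cases whose values have drifted.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def run_cases():
    """The invariants this suite tracks; behavior not listed here stays untested."""
    return {
        "sin_0_pi": trapezoid(math.sin, 0.0, math.pi, 200),
        "exp_0_1": trapezoid(math.exp, 0.0, 1.0, 200),
    }


def regressions(baseline, current, rel_tol=1e-12):
    """Return the names of tracked cases whose values drifted from the baseline."""
    return [
        name
        for name, value in baseline.items()
        if not math.isclose(current[name], value, rel_tol=rel_tol)
    ]


baseline = run_cases()                         # recorded once from a trusted version
drifted = regressions(baseline, run_cases())   # rerun after each code change
```

An empty `drifted` list says only that the two tracked integrals are unchanged; a defect in any code path outside `run_cases` would pass unnoticed, which matches the objection raised by the respondents.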

4.3.5.2 Testing tools and techniques

Regarding the testing tools, 51 out of the 64 respondents reported that they used automated testing tools in their projects. The responses indicate that there are 24 different automated testing tools in use. A list of those tools is provided in Table 4.5. Figure 4.20 presents only the 5 most commonly used automated testing tools, including:

• CMake (http://www.cmake.org) - A tool designed to build, test and package software. It is used to control the software compilation process. CMake is invoked on the project’s source directory, parses the text files describing the build process, and generates a native build chain for the desired platform and compiler. CMake provides options to the user with which the build process can be customized.

• CTest (http://www.cmake.org) - CTest is an automated testing tool distributed with CMake. CTest can perform several operations, including configuring, building, and running a list of predefined runtime tests. It also includes advanced features for testing, such as code coverage and memory checking.

• JUnit (http://www.junit.org) - JUnit quickly became the de facto standard framework for writing and running automated tests in the Java programming language.

• Google Test or GTest (https://code.google.com/p/googletest/) - It is a framework for writing C++/C tests on a variety of platforms, including Linux, Windows, and OS X. GTest provides various options for running the tests.

• Python Nose (https://nose.readthedocs.org) - Nose is a Python package that provides an alternate test discovery and running process for unit tests.

• Python unittest (http://docs.python.org/3/library/unittest.html) - Python unittest is a testing framework that is a Python-language version of JUnit.

• CxxTest (http://www.cxxtest.com) - CxxTest is a unit testing framework for C++ that is similar to JUnit. CxxTest supports a very flexible form of test discovery.
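To illustrate the JUnit-like style shared by several of the frameworks listed above, the following minimal sketch uses Python’s unittest module. The function under test, `celsius_to_kelvin`, and the test names are hypothetical and were chosen only for illustration.

```python
import unittest


def celsius_to_kelvin(celsius):
    """Hypothetical function under test."""
    if celsius < -273.15:
        raise ValueError("temperature below absolute zero")
    return celsius + 273.15


class CelsiusToKelvinTest(unittest.TestCase):
    """Each method whose name starts with 'test' is collected as one test case."""

    def test_freezing_point(self):
        self.assertAlmostEqual(celsius_to_kelvin(0.0), 273.15)

    def test_rejects_unphysical_input(self):
        with self.assertRaises(ValueError):
            celsius_to_kelvin(-300.0)


# Load and run the tests programmatically; 'python -m unittest' would instead
# discover them from the command line.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CelsiusToKelvinTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The same structure (a test class, one method per behavior, and assertion helpers) carries over to JUnit and CxxTest with language-specific syntax.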

The respondents were asked to provide the testing methods that they used in their projects. Figure 4.21 presents the testing methods employed by the respondents. Based on the responses, I have classified the testing methods into two categories:

• Multiple testing methods - The respondents employed many testing approaches with different objectives during software development. This category includes 1) Using Unit, Regression, and Integration testing together (15), 2) Using Unit, Regression, Integration, and System testing together (7), and 3) Using Unit and Regression testing together (4).

• Single method - The respondents used only one approach, including 1) Unit testing (13), 2) Performance testing (5), 3) Comparing known values (4), 4) Verification testing (3), 5) Validation testing (3), 6) Evaluating the code coverage (3), 7) Automated testing tool (3), 8) White box (1), 9) Smoke test (1), 10) Pre-release testing (1), 11) Positive testing (1), 12) Negative testing (1), 13) Black box testing (1), and 14) Ad-hoc testing (1).

Figure 4.20: Automated testing tools

4.3.5.3 Testing problems

The survey also asked the respondents about the problems they encountered when performing the testing process and the solutions they used.

Table 4.5: Automated testing tools

Tool | Languages | URL
Boost | C++ | http://www.boost.org
Buildbot | Python | http://buildbot.net
CDash | Global | http://www.cdash.org
Clover | Java | https://www.atlassian.com/software/clover/overview
CMake | Global | http://www.cmake.org
CppUnit | C++ | http://cppunit.sourceforge.net
CTest | Global | http://www.cmake.org/cmake/help/v2.8.8/ctest.html
CxxTest | C++ | http://cxxtest.com
GCov | Global | http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
GTest | C++ | https://code.google.com/p/googletest/
Jenkins | Java | http://jenkins-ci.org
Igloo | C++ | http://igloo-testing.org
JUnit | Java | http://www.junit.org
Lcov | Global | http://ltp.sourceforge.net/coverage/lcov.php
Marathon | Java | http://marathontesting.com
Nose | Python | http://nose.readthedocs.org
pFUnit | Fortran | http://pfunit.sourceforge.net
Pytest | Python | http://pytest.org
Python unit test | Python | http://docs.python.org
Qtest | Global | http://www.qasymphony.com/qtest.html
Sikuli | Global | http://www.sikuli.org
Silk TestPartner | Global | http://www.borland.com/products/silktestpartner/
Travis | Global | https://travis-ci.org
XUnit | .NET Framework | http://xunit.codeplex.com

Figure 4.21: Testing methods

Figure 4.22 provides a breakdown of problems when creating the test. The respondents noted a range of testing issues when employing TDD. I analyzed the responses based on a qualitative process (described in Chapter 3 Section 3.3.4). I coded each response and organized the data into categories. I repeated these steps to find the most salient theme within each category. Based on the qualitative analysis, the problems can be divided into two sectors: Effort and Technique. I summarized the problems described by the survey participants for each sector as follows:

1. Effort - The problems in this sector are related to the time and cost to developers of writing the test. The ‘Unclear requirements’ problem is categorized in this group because the developers require a greater amount of time to understand the requirements. The problems include the following:

Problem #1: Time consuming (3)

It takes more time to write the test, which affects budgets and planning. Adequate testing is a full-time job, especially when a new change is introduced by the customers or users on a tight schedule.

Figure 4.22: Problems about making the test

Problem #2: Unclear requirements (2)

The respondents indicated that the unit tests are difficult to create if the requirements are not well specified. One respondent stated that “TDD is only as good as your understanding of the requirements”.

Problem #3: Amount of test code (1)

A respondent indicated that the amount of test code is larger than the actual program code. The testing code becomes a major barrier to improving the code. For example, a change in the program that might involve 500 LOC could require changing 100 LOC of testing code. This amount of code makes the developers less inclined to make changes in the production code that require test modification. In particular, the research environment often requires developers to change their code in ways that require test modifications as well.

2. Technique - This sector consists of problems related to the code, environment, software design, and knowledge about testing. The problems include the following:

Problem #1: Difficulty of writing a good test (19)

The main problem is writing a good test or creating good test cases. Many respondents explained that writing a good test can be more difficult than implementing the actual functionality, particularly for tests that do not have to be changed if the functional code is changed. Furthermore, during refactoring, developers realize that the interface of a function requires changes, which requires test modifications and up-to-date information. The respondents also explained that testing without a tool or framework made it more difficult to write tests; thus, developers were less likely to do a good job. From the perspective of senior developers, the experience of each new developer is different, so it is difficult to expect their tests to be written well.

Problem #2: Code coverage (9)

Code coverage was the second most reported problem in writing the test. Theoretically, the test must cover most of the code, but respondents explained that the code coverage is not always 100%. In particular, it is difficult to provide comprehensive coverage for CSE code with many dynamic parts. One respondent reported that the code coverage analysis tool does not work for their C++ template code.

Problem #3: Complex function or code (6)

In CSE software, there are many parts in the code that consist of complex algorithms. The complex code requires complicated tests. Many problems with complex code occur at production runtime, such as deadlocks and repetitive connection drops; it is thus difficult to test these problems in advance. Another problem is that it is difficult to write tests for functions that produce a considerable amount of text data.

Problem #4: Numerical computations (5)

The respondents explained that tests for numerical computations are difficult because the developers do not know the right answer with full confidence. Although automated testing tools are available to help developers write tests, some of those tools are difficult to use for numerical computations.

Problem #5: Code or requirement is changed (4)

The survey participants indicated that the test must be changed when the requirement or API is changed. For example, if the algorithm changes, the test may no longer be sufficient because the random number generator is used in a different manner.

Problem #6: Unfamiliarity with software engineering practices and tools (3)

For this problem, the respondents thought that writing the tests required them to understand software engineering practices. Additionally, the existing automated testing tools are not user-friendly for scientists or CSE developers.

Problem #7: Parallel computing (3)

The respondents explained that writing tests to examine concurrency issues in parallel computing was difficult. In particular, the ‘right answer’ for a complete simulation is generally not known a priori.

Problem #8: Difficult to validate the test (3)

In CSE software development, it is difficult to correlate verification tests and issues with validation tests, so developers focus on testing based on code functionality.

Problem #9: Awareness of writing tests (2)

One problem of writing tests is introducing unit testing when not all developers have experience with it. Developers from a research environment tend to be less disciplined in building and maintaining tests than required.

Problem #10: The tolerance is much larger (1)

In some cases, the developers must assume a tolerance for the computed output, but there is no standard for creating the tolerance value. Therefore, the tolerance may be considerably larger than would be ideal.
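One way to make a tolerance less arbitrary is to derive it from a known error bound of the numerical method being tested. The following is a minimal sketch under stated assumptions, not a practice reported by the respondents: `trapezoid` and `trapezoid_error_bound` are hypothetical helpers, and the bound used is the standard composite trapezoidal-rule bound (b - a)^3 / (12 n^2) * max|f''|.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal sub-intervals (hypothetical helper)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def trapezoid_error_bound(a, b, n, max_abs_second_derivative):
    """Standard error bound of the composite trapezoidal rule."""
    return (b - a) ** 3 / (12 * n ** 2) * max_abs_second_derivative


# The exact integral of sin(x) on [0, pi] is 2, and |sin''(x)| <= 1.
n = 1000
approx = trapezoid(math.sin, 0.0, math.pi, n)
tolerance = trapezoid_error_bound(0.0, math.pi, n, 1.0)

# The tolerance follows from the method rather than from trial and error.
assert abs(approx - 2.0) <= tolerance
```

When no analytical bound is available, the tolerance remains a judgment call, which is the situation the respondent described.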

Problem #11: Knowledge about the problem domain (1)

One respondent explained that knowledge of the problem being solved is required before writing the unit test.

Problem #12: Different platforms (1)

It is difficult to write a test for all expected working cases, especially when executing the software on different platforms (e.g., OS, compilers).

The respondents were asked to provide the solutions to the problems they identified. The solutions given by respondents are summarized in Figure 4.23 based on a qualitative analysis.

The solutions provided by the respondents can be classified into two sectors: High-level and Low-level. The solutions are summarized by sector as follows:

Figure 4.23: Solutions of writing test problems

1. High-level - These solutions are not related to the code or software engineering technique, but to the policy or management plan of the project team or culture. These solutions include the following:

Solution #1: Use a suitable tool (11)

The respondents believed that a good testing framework or automated testing tool would facilitate the test writing considerably. The respondents also suggested that the tool would save time compared to writing the test manually. Additionally, they recommended choosing appropriate tools for the projects.

Solution #2: Learn more about writing tests (6)

The developers may need to learn the basics of why software is tested and what constitutes a good test. In particular, developers in the research environment require mentoring or coaching on writing the test. In addition to learning about the testing, knowledge of best practices or examples would increase the testing skill.

Solution #3: Experience (5)

Many respondents indicated that writing a good test requires experience, particularly when testing the complex code or algorithms. Furthermore, experienced developers could confidently provide the team with the solutions to the testing problem.

Solution #4: Plan better (1)

In terms of project management, a project manager may need to reestimate the required time and budget.

2. Low-level - These solutions are directly related to the code and testing strategy, including the following:

Solution #1: Understand the requirements (4)

A clear understanding of the requirements is necessary to write the test, particularly when the developers are working on complex problems, such as numerical or multi-physics problems. An in-depth understanding could improve the test coverage and reduce the required amount of time.

Solution #2: Gain confidence (3)

When developers work on simulation software involving floating point calculations and parallelism, they must check simpler output to gain confidence. For example, to ensure that the testing of a stochastic model is correct, scientists need to run multiple simulations and analyze the output.
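The idea of running several simulations and analyzing the output can be sketched as a statistical test. The example below is illustrative only and assumes a toy stochastic model (a Monte Carlo estimate of pi); the function names, the seed, and the six-standard-error acceptance band are assumptions and do not come from the survey responses.

```python
import math
import random


def estimate_pi(num_samples, rng):
    """Monte Carlo estimate of pi from points drawn in the unit square."""
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples


def mean_of_runs(num_runs, num_samples, seed):
    """Average several independent runs; a fixed seed keeps the test repeatable."""
    rng = random.Random(seed)
    estimates = [estimate_pi(num_samples, rng) for _ in range(num_runs)]
    return sum(estimates) / num_runs


num_runs, num_samples = 20, 20_000
mean = mean_of_runs(num_runs, num_samples, seed=12345)

# Standard error of the mean: each sample is a Bernoulli trial with p = pi/4.
p = math.pi / 4.0
std_error = 4.0 * math.sqrt(p * (1.0 - p) / (num_samples * num_runs))

# Accept the result if it lies within six standard errors of the known value.
assert abs(mean - math.pi) < 6.0 * std_error
```

Because the test asserts a statistical property rather than an exact random stream, it is less likely to break when the algorithm starts using the random number generator in a different manner.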

Solution #3: Redesign the test (3)

When the developers feel that the code coverage is not good, they need to redesign the unit test. This process requires another developer to review the test in addition to the code review process.

Solution #4: Combine the unit testing and system testing (1)

One respondent recommended using a combination of unit testing and system testing to solve the code coverage problem.

Solution #5: Include bugs as testing failures (1)

One solution that can help developers write test cases is to count the bug reports as failed tests. Thus, the developers can create more test cases.
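Counting a bug report as a failed test can be sketched as follows. The bug report, the function `safe_mean`, and its fix are all hypothetical and serve only to show the shape of such a regression test.

```python
def safe_mean(values):
    """Return the arithmetic mean, or 0.0 for an empty sequence.

    Hypothetical fix: the assumed bug report said that an empty
    input raised ZeroDivisionError.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)


def test_bug_report_empty_input():
    """Regression test written directly from the (hypothetical) bug report."""
    assert safe_mean([]) == 0.0


def test_existing_behavior_unchanged():
    """The fix must not alter results for ordinary input."""
    assert safe_mean([1.0, 2.0, 3.0]) == 2.0


test_bug_report_empty_input()
test_existing_behavior_unchanged()
```

The first test fails against the unfixed code and passes afterwards, so the bug report stays in the test suite as a permanent test case.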

I mapped these solutions to the 5 most common problems reported by the respondents (ordered by the number of responses). Table 4.6 presents the mapping between problems and solutions. The check mark (shown as X) represents the solutions provided by survey participants. The respondents believe that suitable tools and experience would help them to solve the problems involved with writing tests. Some problems did not have a solution. However, based on my own experience, I suggested additional solutions to some problems (represented by exclamation marks in the table). The last column presents the number of solutions for each problem, including respondents’ solutions and my suggestions.

4.3.6 Refactoring

This section reports the responses regarding refactoring, including methods, tools, problems, and solutions.

4.3.6.1 Refactoring methods

Table 4.6: Mapping testing problems and solutions

| Problems | Use a suitable tool | Learn more about writing tests | Experience | Plan better | Understand requirements | Gain confidence | Redesign the test | Combine the unit testing and system testing | Include bugs as testing failures | Total (Respondent, Suggestion) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Difficulty of writing a good test | X | X | ! | - | ! | X | - | - | ! | 3,3 |
| Code coverage | X | - | ! | ! | - | - | ! | X | X | 3,2 |
| Complex function | - | - | ! | - | - | ! | - | - | - | 0,2 |
| Numerical computations | - | - | X | - | - | ! | - | - | - | 1,1 |
| Code or requirement is changed | - | - | - | - | - | - | X | - | - | 1,0 |

The survey asked the respondents to provide the refactoring methods that they applied to their projects. The survey offers a list of well-known refactoring methods along with the ‘Other’ option. The survey participants could select several choices. The well-known refactoring methods are as follows:

• Breaking large methods up into smaller methods

• Renaming methods, class variables

• Simplifying control structure (e.g., series of ‘if’ statements or nested loops)

• Creating an encapsulated field (e.g., using setter methods to make public member data private)

• Splitting large classes (e.g., move parts of the code from an existing class into a new class)

• Adding or removing parameters from a method

• Moving methods or fields of a class to a super class

• Moving methods or fields of a class to a sub-class

• Applying the design patterns
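The first method in the list, breaking a large method up into smaller methods, can be illustrated with a short sketch. The functions below are hypothetical and were not taken from any respondent's project; the point is only that the refactored version must produce the same output as the original.

```python
import math


def report_before(samples):
    """Original version: one function computes and formats everything."""
    mean = sum(samples) / len(samples)
    variance = sum((x - mean) ** 2 for x in samples) / len(samples)
    std = math.sqrt(variance)
    return f"mean={mean:.3f} std={std:.3f}"


def mean_of(samples):
    """Extracted step 1: arithmetic mean."""
    return sum(samples) / len(samples)


def std_of(samples):
    """Extracted step 2: population standard deviation."""
    mean = mean_of(samples)
    return math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))


def report_after(samples):
    """Refactored version: same output, built from small named steps."""
    return f"mean={mean_of(samples):.3f} std={std_of(samples):.3f}"


data = [1.0, 2.0, 4.0, 8.0]
# Refactoring must preserve behavior, so both versions have to agree.
assert report_before(data) == report_after(data)
```

The extracted helpers can also be unit tested on their own, which is harder to do with the single large function.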

The most common refactoring technique is Breaking large methods up into smaller methods (56). Interestingly, 20 out of the 64 respondents employed design patterns in their systems. The most common design pattern that respondents used is the Factory Method pattern (5). In addition, 15 other design patterns were reported by the respondents. Thirteen of them are GoF patterns, namely, Strategy, Visitor, Decorator, Facade, Template Method, Adapter, Singleton, Observer, Command, Abstract Factory, Mediator, Prototype, and Proxy. The remaining two, Surrogate and Object, are not GoF patterns.

The respondents were also asked whether they applied other techniques to improve the code. Figure 4.24 shows the responses from the 36 respondents who did use other techniques.

Surprisingly, only two respondents noted that they used a tool (e.g., Fortify) to improve the code in terms of security vulnerabilities. This evidence is consistent with the responses regarding software quality (Section 4.3.3), in which security is the least important attribute in CSE software development.

To understand how the respondents identified poor code in need of refactoring, the survey asked them to describe the methods that they used. Based on the qualitative analysis (as shown in Figure 4.25), I classified the methods into two main levels, Runtime and Static.

The methods that respondents used to identify poor code are summarized by level as follows:

Figure 4.24: Additional methods to improve the code

1. Runtime - The approaches in the runtime level identify poor code when the program is executing or running. The following approaches are included in this level:

Approach #1: Poor performance (16)

The respondents use performance to indicate whether the system includes poor code. The major method used to measure the performance is to run the profiling tool. In addition to employing the tool, the respondents also observed the system during execution, noting such problems as unreasonable execution times.
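The respondents did not name the profiling tools they used. As one illustration, the sketch below uses Python's standard `cProfile` and `pstats` modules; `slow_sum_of_squares` and `profile_call` are hypothetical names introduced for the example.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop standing in for a slow routine."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_call(slow_sum_of_squares, 100_000)
print(report)
```

Routines that dominate the cumulative time in such a report are natural candidates for a closer look and possible refactoring.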

Approach #2: Number of bugs or defects (8)

When the program returned incorrect results or displayed unexpected behaviors (e.g., sporadic shutdown), the developers examined the code related to that result. Similarly, the respondents tracked which code tended to cause crashes or errors most often. Bug reports from colleagues and users were also used to identify poor code.

Figure 4.25: The methods of identifying poor code

2. Static - The static method is the primary analysis of the code or document during development, including the following:

Approach #1: Code or peer review (47)

Code review is the most common method that the respondents used to identify poor code. The peer (colleagues or team leader) review method is also used to review the code. The respondents reviewed the code daily or periodically. Rather than reviewing the code through normal editors (e.g., Vim), some respondents reviewed the code in an IDE with syntax highlighting to make certain issues stand out, such as repetitive copy-and-paste code or routines that declare large numbers of poorly named variables.

While reviewing the code, the respondents used the following methods:

• Comparison with the guidelines or best practices. The respondents compared the written code with the available guidelines. Guidelines typically apply either to programming languages in general or to a specific language. Violations of common good software engineering practices were also used to identify poor code, e.g., lack of modularity, lack of separation of concerns, large argument lists in procedure calls, and lack of encapsulation.

• Finding duplicated code. The respondents looked for multiple code segments serving nearly the same purpose that could be refactored into a single routine or method.

• Finding complex code. Complex code is not a positive sign when a system is critical to the software performance.

• Finding code that is difficult to understand. If the code is hard to understand within a reasonable amount of time, it is considered poor code.

Approach #2: Code that is difficult to modify (7)

Here, a common principle that I found is that if a developer must modify several portions of code to make a single change, then the code is considered poor. Another symptom of poor code is limitations in the existing code that surface when adding new features to the system or extending it. These limitations will hinder or prevent further changes.

Approach #3: Lack of documentation (4)

A system that lacks design documentation tends to have poor code because the developers do not understand the existing system well, and thus, modifications will result in many problems.

Approach #4: Software design review (3)

Rather than reviewing the code, reviewing the design would help developers identify poor code earlier in the process. The software design is reviewed together with the requirements during requirements inspection. Good design should conform to the given requirements. One respondent explained that design reviews with users could help developers to identify poor design quickly.

Approach #5: Code that is difficult to test (3)

The respondents stated that non-testable code was often identified as poor code.

Approach #6: Code that is difficult to maintain (2)

The respondents assume that the code is not written well if it requires a considerable amount of time to maintain.

Additionally, some respondents used code analysis tools (2) to identify the poor code or potential areas that might be poor code, but they did not specify the tools.

4.3.6.2 Refactoring problems

The respondents were asked to identify problems and their solutions in performing the refactoring process. Figure 4.26 provides a breakdown of the problems faced when refactoring the code.

Figure 4.26: Problems about refactoring

These problems can be divided into two sectors: Effort and Technique. The problems of each sector are summarized as follows:

1. Effort - The problems in this sector are related to the time and cost when the developers were refactoring the code. This sector includes the following problems:

Problem #1: Time consuming (7)

The respondents explained that refactoring code continually, and doing it well, requires considerable time. One reason provided by respondents is that it is difficult to know when to stop refactoring. For example, one respondent reported that he/she took over one day to successfully run a large group of functions and classes that shared considerable functionality or depended on each other.

Problem #2: Little incentive for academic projects (4)

There is little incentive for refactoring code in academic projects under pressure to publish papers. When implementing CSE software, it is often difficult to design it in advance unless the developers are implementing known methods and processes. Scientists or researchers in CSE fields rarely revise previous code after the paper is published.

Problem #3: Requirement to update the test (3)

In some cases, developers must update the test and related script files after refactoring the code. This problem breaks the rule of TDD that the test should not be changed once passed.

Problem #4: Refactoring requires coordination across the team (2)

The development team must share the same concept of code refactoring. If each developer does not follow the same concept, the refactoring returns unsatisfactory results to the other developers.

2. Technique - The problems in this sector consist of problems related to the code, environment, software design, and knowledge about refactoring. This sector includes the following problems:

Problem #1: Dependence on unit tests (8)

The respondents indicated that the process of refactoring is difficult if the unit tests are not well designed because the code implementation is driven by the test rather than by the requirements or specifications. However, software is nearly impossible to refactor without thorough unit tests that are readily available. Similarly, it is also hard to perform refactoring if the code coverage of the testing is poor.

Problem #2: Dependence on the software design (7)

Although testing is essential to refactoring, poor design makes refactoring difficult or impossible. Furthermore, the respondents indicated that if the initial design is poor, it is occasionally necessary to redesign the software and revisit large portions of it. Based on the respondents’ experience, poor design resulted in refactoring in a number of different projects.

Problem #3: Dependence on the development environment (4)

Refactoring depends on the environment in which the developers are working, including the platform, programming languages, and interacting components. For example, one respondent reported that refactoring an application implemented with C is more difficult than refactoring an application implemented with Python.

Problem #4: Lack of appropriate tools (3)

The respondents indicated that it is difficult to refactor without tools. For example, providing code to a large number of users requires a certain level of API compatibility with older versions. Refactoring is impossible without the appropriate tools.
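The API compatibility concern can be illustrated with a small sketch. One common way to rename a function during refactoring without breaking existing users is to keep the old name as a thin wrapper that issues a deprecation warning. The function names below are hypothetical, and this approach is offered as an assumption about how such compatibility might be kept, not as something the respondents described.

```python
import warnings


def integrate_field(values, step):
    """New API name introduced by the (hypothetical) refactoring."""
    return sum(values) * step


def integ(values, step):
    """Old API name kept as a thin wrapper so existing callers still work."""
    warnings.warn(
        "integ() is deprecated; use integrate_field() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return integrate_field(values, step)


# Old and new entry points must return the same result.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_result = integ([1.0, 2.0, 3.0], 0.5)

assert old_result == integrate_field([1.0, 2.0, 3.0], 0.5)
assert caught[0].category is DeprecationWarning
```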

Problem #5: Legacy code (3)

Refactoring is difficult when the respondents are working with a diverse legacy code base. More specifically, for legacy code, the developers may not actually have adequate tests to prove that the system is behaving correctly.

Problem #6: Lack of knowledge regarding how to refactor (3)

One common problem is a failure to understand refactoring and view it as beneficial. There is a lack of knowledge of how to refactor (at all and/or efficiently) or why refactoring is conducted (because it does not provide functionality). Additionally, refactoring may require knowledge of advanced programming techniques, e.g., template code and functional programming.

Problem #7: Difficult to know when to start refactoring (3)

The respondents found that it is difficult to decide when to start refactoring. Problems are exacerbated when refactoring is delayed. If the refactoring affects many developers’ code, it is difficult to mitigate the problem within a short period of time.

The respondents were asked to provide the solutions to the identified problems. The solutions given by the respondents are summarized in Figure 4.27.

Figure 4.27: Solutions of refactoring

The solutions provided by the respondents can be classified into two sectors: High-level and Low-level. The solutions in each sector are summarized as follows:

1. High-level - These solutions are not related to the code or software engineering techniques but are part of the policy or management plan of the project team or culture.

Solution #1: Coaching developers (4)

Education about refactoring is necessary. In the context of a research environment, the team leader may need to convince the members that refactoring saves time and improves the software quality. Experienced developers should provide guidance and help the other developers when they find problems.

Solution #2: Communication with each other (3)

Developers in the team should communicate with each other to discuss refactoring before it occurs. Before starting the projects, the developers should have a common vision for refactoring the code. One respondent also indicated that version control software can help developers to understand the changes made by others before refactoring.

Solution #3: Set up a suitable environment (1)

Setting up the development environment to support the changes in the project would increase the speed of the refactoring process. For example, the Agile methodology allows each team member to modify others’ code without permission.

2. Low-level - These solutions are directly related to the code and testing strategy, including the following:

Solution #1: Use refactoring tools (7)

The respondents explained that automated refactoring tools would be useful, particularly when working on large and complex applications. Additionally, the refactoring tools should be able to integrate with the IDEs.

Solution #2: Redesign the software (4)

In some cases, it is very difficult to refactor poor code. One solution is to redesign the software. The respondents indicated that redesigning the software might save time compared to refactoring code in some cases.

Solution #3: Appropriate time for refactoring (3)

Generally, refactoring can be performed at any time during the development process. There is no rule regarding the time for refactoring. The respondents’ suggestions indicate there are three options for refactoring:

1. Multiple-round refactoring - Performing several rounds of refactoring would provide better results than would performing only a few rounds.

2. Refactoring when the code begins to become disorganized - One suggestion is to start refactoring when the code begins to become disorganized.

3. Refactoring shortly after release - One respondent typically does most refactoring shortly after release, before he/she becomes tangled in new features for the next release.

Solution #4: Redesign the unit test (2)

Revisiting the unit test would help developers reduce the time involved in refactoring, especially redesigning some of the worst tests.

Solution #5: Use simple techniques (1)

To reduce the effect of differing experience among developers, each developer should try to use simple refactoring methods initially.

I mapped the 5 most common problems and solutions provided by the respondents in Table 4.7. The check mark (shown as X) represents the solutions provided by survey participants. I also suggested additional solutions to some problems based on my experience (represented by exclamation marks in the table). For each problem, the last column presents the number of solutions, including respondents’ solutions and my suggestions.

Table 4.7: Mapping refactoring problems and solutions

| Problems | Coaching developers | Communication with each other | Set up a suitable environment | Use refactoring tools | Redesign the software | Appropriate time for refactoring | Redesign the unit test | Use simple techniques | Total (Respondent, Suggestion) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dependence on unit tests | - | - | - | - | - | - | X | - | 1,0 |
| Time consuming | ! | - | - | X | - | - | - | - | 1,1 |
| Dependence on software design | - | - | - | - | X | - | - | - | 1,0 |
| Little incentive for academic projects | ! | - | ! | - | - | - | - | - | 1,1 |
| Dependence on the development environment | - | - | X | ! | - | - | - | - | 1,1 |

Thus far, we can identify some problems common to both refactoring and the testing process. Table 4.8 highlights some of those problems and the number of responses. The common problems include 1) Time and Effort, 2) SE practices, and 3) Awareness.

Table 4.8: Common problems in refactoring and testing

| Common Problems | Refactoring | Testing |
| --- | --- | --- |
| Time and Effort | Time consuming; Requirement to update the test; Coordination across the team | Time consuming; Unclear requirements; Amount of test code |
| SE practices | Lack of knowledge; Lack of appropriate tools | Code coverage; Unfamiliarity with SE practices and tools |
| Awareness | Little incentive for academic projects | Awareness of writing tests |

4.3.7 Benefits and Challenges of TDD

The survey asked the respondents to describe the benefits and challenges of employing TDD in their projects.

4.3.7.1 Benefits

Figure 4.28 presents the common benefits of employing TDD. The benefits are classified into two perspectives: Project management and Software quality. Each perspective is summarized as follows:

Figure 4.28: Benefits of employing TDD

1. Project management - This perspective includes benefits that impact the team or the progress of the project rather than the system code itself. These benefits include the following:

Benefit #1: Increase the speed of software development (3)

Because the developers do not need to gather all requirements at the beginning, the project can be started quickly. The team leader could estimate the extent to which a piece of code is completed to plan the schedule appropriately. The ability to identify failing tests based on the given requirements would help the team to rapidly solve the problem. One survey participant, for example, made checklists to indicate progress.

Benefit #2: Reduce the testing team (1)

TDD could reduce the need for an independent testing team when the organization has a small number of professionals on a project team. This benefit also applies to the researchers who lack a team of testing experts or the budget to build a large development team.

2. Software quality - This perspective includes benefits that impact the quality of the system, including the following:

Benefit #1: Ensure the quality of code (22)

The respondents stated that while implementing the system, they gain confidence by being able to test very specific code behavior. Tests are written in conjunction with the development of functionality, providing valuable regression notifications when combined with continuous integration testing. For example, one respondent said, “The main benefit is that I have confidence in my software. Quite literally, I sleep better at night”.

TDD enforces the discipline of performing systematic testing during software development and writing testable code. Many respondents also underscored that having unit tests before writing code is better than writing tests at the end of the process, when it is more difficult to trust the code. TDD allows developers to safely change the code and let it evolve without the worry of breaking more parts of the code with each new change. For example, one respondent explained that TDD is superior to the other approaches with which he/she is familiar because, “The code I wrote using TDD was much more thoroughly tested than other code I have written, which is a definite advantage”.

Benefit #2: Improve software reliability (13)

Developing the software under TDD increased confidence that new capabilities would not break existing capabilities, results, or interfaces. Over the long term, the testing artifacts that are produced will be checked to ensure that the modification does not break the original function. A software package with thorough and easy-to-use tests can be refactored aggressively and with confidence over time, thereby improving the longevity of the product. In contrast, software without a test suite can become extremely fragile and cannot be refactored with confidence.

Benefit #3: Improve software maintainability (11)

Several respondents believed that having unit tests greatly improves maintainability. Refactoring makes it easy to add features to the software, aids in debugging, and facilitates the fixing of future problems. The tests also provide documentation describing the test driver, the input/output, and examples of executing a program.

Benefit #4: Identify problems early (10)

The respondents reported that they could find problems/bugs earlier in the development process. With TDD, the respondents could remove the bug or adjust the correctness at the beginning of the project, reducing the overall number of bugs. One respondent reported that he/she could also find performance bottlenecks quickly.

Benefit #5: Better understanding of software requirements (10)

TDD requires the developers to understand the requirements clearly before writing the code. The developers gained a deeper understanding of the actual problem before coding. They were also automatically forced to think through edge cases before writing any code and thus encountered fewer surprises later in the process.

Benefit #6: Improve software compatibility (3)

The test suite helps developers quickly check that a piece of code serves the correct purpose in different environments (e.g., the function is not broken by changes in a linked library, compiler or compiler options, or little vs. big endian).
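As a small illustration of the byte-order case, the sketch below uses Python's standard `struct` module. The helper names are hypothetical; the format prefixes `<` (little endian) and `>` (big endian) are part of the module's documented format syntax. A test of this kind gives the same result on any host because the byte order is stated explicitly rather than inherited from the platform.

```python
import struct


def encode_samples(samples):
    """Serialize floats as little-endian doubles, whatever the host byte order."""
    return struct.pack(f"<{len(samples)}d", *samples)


def decode_samples(payload):
    """Inverse of encode_samples; the '<' prefix fixes the byte order."""
    count = len(payload) // 8
    return list(struct.unpack(f"<{count}d", payload))


samples = [0.5, -1.25, 3.0]
# Round trip: decoding the encoded data must return the original values.
assert decode_samples(encode_samples(samples)) == samples

# The same integer has a different byte layout under each convention.
assert struct.pack("<i", 1) == b"\x01\x00\x00\x00"
assert struct.pack(">i", 1) == b"\x00\x00\x00\x01"
```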

Benefit #7: Cleaner design (2)

During the refactoring process, some respondents reported that they had the opportunity to redesign the software to remove potential problems.

Benefit #8: Better support for OOP than procedural languages (1)

A respondent mentioned that it is easier to apply TDD to a system being implemented with an object-oriented programming language (e.g., C++, Java) than one implemented with a procedural programming language. The respondent further explained that testable code is more natural with OO than with non-OO. The benefits are further affected by the widespread availability of automated testing tools and refactoring tools. For example, Java appears to lead in unit testing with JUnit, which is well integrated with many IDEs (e.g., Eclipse, NetBeans).

4.3.7.2 Challenges

Figure 4.29 presents the challenges that were frequently encountered when employing TDD. Based on these responses, I classified the challenges into two perspectives: Project management and Technique. The challenges for each perspective are summarized as follows:

Figure 4.29: Disadvantage of employing TDD

1. Project Management - Project management includes challenges that impact the team or the progress of the project rather than the system code itself. The challenges in this perspective include:

Challenge #1: Spending an excessive amount of effort (30)

Generally, several survey respondents reported excessive time spent writing the test, and thus, the developers often skipped testing in the final stages. Additionally, adding TDD to an existing project requires enormous effort. One respondent noted that TDD can only be performed by developers who understand what the code does in detail and are willing to invest time in code quality. TDD works well if the developers know exactly what the code should produce, which requires the developers to spend an excessive amount of time on the requirements or specifications of the project. In terms of project management, it might be difficult to demonstrate overall progress to the customers or users.

In an academic environment, writing a test case prior to writing actual code can conflict with the time and resource constraints on obtaining and publishing research results in the short term. Similarly, there is not a sufficient amount of research funding to write the amount of testing code that real TDD would require. There is a trade-off between the rigorous adoption of TDD and idea exploration for the development of functions with a significant research component to them. Furthermore, CSE-based simulation software development groups consist of domain experts/researchers and computer scientists/programmers. TDD does not come naturally to many researchers. The adoption of TDD requires substantial effort to promote the vision and maintain the necessary discipline.

Challenge #2: Learning curve (11)

TDD appears to be a new concept in the CSE domain, and therefore, there is a learning curve. Managing and creating tests before writing the code, for example, is a new experience for many scientists and CSE developers. Few scientists and researchers understand the benefits of applying proper TDD, and thus, there is a need to learn about TDD (e.g., how to perform it and its benefits). It is also difficult to convince large organizations of its financial benefits, especially organizations that are not very interested in software engineering practices, which might be the most pervasive problem with software engineering (not only TDD) in CSE projects.

Adequate investment in TDD training could shorten the learning curve. User training may be necessary to facilitate the transition of scientists/researchers to TDD.

Challenge #3: Need to set up a new environment (6)

Some survey participants reported that organizations must invest in creating an adequate TDD environment and maintaining a consistent level of testing across all developers. TDD cannot be successful as a part of project management without support from management and developers.

Although some of the benefits of TDD tend to accrue to a development team or research group as a whole over time, an individual developer might view it as significant additional work with only a modest short-term payoff. Additionally, some organizations (e.g., laboratories) employ sub-contractors to implement a system. TDD works when both parties operate from a common set of rules for engagement. Therefore, the goal, process, and protocols must be established for collaboration between the employers and sub-contractors.

Challenge #4: Amount of code (2)

The amount of code could be two- or three-fold greater than the amount of code that the developers would write without TDD; furthermore, the developers have to maintain the test code. This larger code base is more costly to maintain.

Challenge #5: Lack of flexibility (1)

A respondent explained that developers need to follow specific steps (e.g., writing the test, writing the code, refactoring), and TDD requires greater discipline to write the test. Thus, TDD is not convenient for developers compared to other software development approaches.

Challenge #6: Lack of software design (1)

TDD does not motivate the developers and designers to write design and technical documentation, and therefore, in some cases, the test is poorly documented, which creates problems.

2. Technique - This perspective consists of challenges related to the code, environment, software design, and knowledge of refactoring, which include the following:

Challenge #1: Difficult to implement (16)

Most examples of this type of challenge involve testing, as described below:

• It is difficult to write the unit test early in the software development process.

• It is difficult to write the test in the integration step of large numerical computations or complex functions. In CSE software, for example, the output is often computationally expensive; there are issues with numerical precision, and in some cases, it is impossible to compute the expected output. Additionally, the context can change between the tests and code, especially in the development of complex code. Several respondents noted that it is extremely difficult to test around concurrency issues.

• It is difficult to test research code because the results are typically not previously known.

• It is difficult to write tests that capture all aspects of complex capabilities (code coverage).
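When the exact output of research code is not known in advance, one possible approach is to test properties that any correct result must satisfy. The following sketch is illustrative only and is not taken from the surveyed projects: `diffuse_step` and `check_invariants` are hypothetical names, the update is a simple explicit diffusion step on a periodic 1-D grid, and the tolerances are assumptions. The test checks that the total is conserved and that the peak does not grow, rather than comparing against specific values.

```python
import math


def diffuse_step(u, alpha=0.25):
    """One explicit diffusion step on a periodic 1-D grid (hypothetical example)."""
    n = len(u)
    return [
        u[i] + alpha * (u[(i - 1) % n] - 2.0 * u[i] + u[(i + 1) % n])
        for i in range(n)
    ]


def check_invariants(u_before, u_after, rel_tol=1e-12):
    """Return True if the step conserved the total and did not raise the peak."""
    conserved = math.isclose(math.fsum(u_before), math.fsum(u_after), rel_tol=rel_tol)
    bounded = max(u_after) <= max(u_before) + 1e-12
    return conserved and bounded


# Start from a single spike and verify the invariants after every step.
initial = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
state = initial
for _ in range(50):
    new_state = diffuse_step(state)
    assert check_invariants(state, new_state)
    state = new_state
```

Such checks do not prove that the result is correct, but they can catch many implementation errors without requiring a known reference solution.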

Challenge #2: Lack of appropriate tools for numerical software (2)

In the CSE software development environment, the developers lack appropriate tools to readily perform TDD. The currently available tools are typically not ideal for working with numerical software with regard to tolerance issues, which are not handled adequately. In Fortran development, for example, only a few unit testing frameworks are currently maintained, and few Fortran developers actually have and use any of these frameworks.
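The tolerance issue can be illustrated with a small sketch that uses only the Python standard library, chosen here as a stand-in for whatever framework a project adopts. The helper `approx_equal` and its default tolerances are hypothetical choices for this example; suitable tolerances depend on the algorithm and the data.

```python
import math


def approx_equal(actual, expected, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two floats with a relative and an absolute tolerance."""
    return math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol)


def harmonic_sum(n):
    """Sum 1/k for k = 1..n, accumulating in increasing k."""
    total = 0.0
    for k in range(1, n + 1):
        total += 1.0 / k
    return total


# Exact equality fails on a textbook case, while a tolerance-based check passes.
exact_match = (0.1 + 0.2 == 0.3)
close_match = approx_equal(0.1 + 0.2, 0.3)

# The same sum accumulated in two different orders may differ in the last
# bits, so the comparison uses a tolerance instead of exact equality.
forward = harmonic_sum(100000)
backward = sum(1.0 / k for k in range(100000, 0, -1))
orders_agree = approx_equal(forward, backward)
```

A test that asserts `forward == backward` would be fragile, because it depends on the accumulation order, which can change with compiler settings or parallel reductions.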

4.3.8 Summary

The results of this survey provide empirical evidence of the effectiveness of TDD in CSE software development. Of the 77 respondents who participated in the survey, 64 respondents indicated that they had experience with TDD. The effectiveness of TDD on software quality was also evaluated by respondents. Table 4.9 summarizes the main findings in various aspects of TDD based on the responses.

Table 4.9: Summary of survey results

Effectiveness:

• Functionality is the most important software quality characteristic.

• TDD is effective in improving the most important software quality characteristic.

Difficulty:

• Writing the test is the most difficult activity in TDD.

• Many participants violated the TDD principle that a test should not be modified after it has passed. Another violation is that participants always made the software design either before or while implementing the code.

Testing:

• In addition to creating unit tests, participants adopted several testing methods. Three testing methods are used concurrently: 1) unit testing, 2) regression testing, and 3) integration testing.

• While testing, participants used automated testing tools, such as CMake, CTest, Google Test, and JUnit.

• The testing problems are classified into two categories: 1) Effort and 2) Technique. The five most common problems found in the testing process are 1) Difficulty of writing a good test, 2) Code coverage, 3) Complex function or code, 4) Numerical computations, and 5) Code or requirement is changed. All of these problems fall into the Technique category. I had expected Time consuming to be the most common testing problem, but only a few respondents mentioned it.

Refactoring:

• Most participants used simple techniques to refactor the code. The most common technique is Breaking large methods up into smaller methods.

• To identify poor code before refactoring, participants reported many techniques, which fall into two groups: Runtime and Static. The results show that Code or peer review is the method most commonly used by respondents.

• The five most common refactoring problems are 1) Dependence on unit tests, 2) Time consuming, 3) Dependence on the software design, 4) Little incentive for academic projects, and 5) Dependence on the development environment.

Benefits:

• The five most common benefits of TDD are 1) Ensure the quality of code, 2) Improve software reliability, 3) Improve software maintainability, 4) Identify problems early, and 5) Better understand software requirements.

Challenges:

• The five most common challenges of TDD are 1) Spending an excessive amount of effort, 2) Difficult to implement, 3) Learning curve, 4) Need to set up a new environment, and 5) Amount of code.

Regarding the challenges of employing TDD in CSE software development, some challenges were not faced by the traditional software community. Causevic et al. [Causevic, Sundmark, and Punnekkat, 2011] performed a systematic review to identify the factors limiting the adoption of TDD in the software industry. The review includes 48 primary studies concerning TDD. The researchers identified the factors limiting the adoption of TDD as follows:

1. Increased development time

2. Insufficient TDD experience/knowledge

3. Insufficient design

4. Insufficient developer testing skills

5. Insufficient adherence to the TDD protocol

6. Domain- and tool-specific limitations

7. Legacy code

Not all of the problems observed from the survey are included in this list; there are also additional problems that might apply specifically to the context of CSE software development, including the following:

• Lack of SE practices - In traditional software development, developers might have more SE-related background than scientists or CSE developers. Therefore, they are not limited by this problem.

• Implementing in a parallel computing environment - This problem occurs in the testing process in a CSE environment, where parallel computing is often utilized. This problem should be carefully studied by software engineering researchers because existing testing methods may not be suitable for this environment.

• Awareness of writing tests - The goal of CSE software is often to publish a paper or to present a prototype, and thus, the researchers or scientists ignore the task of writing tests. In contrast, traditional software products are intended for release to customers for use over a long period, and thus, testing is a critical activity in the software development process.

• Coordination across the team - In CSE software development, the project might involve both scientists who know about programming and others who have never developed software. The implementation of TDD in CSE projects requires everyone to understand the process and benefits of TDD.

• Little incentive for refactoring in academic projects - Researchers in the academic environment must work under tight schedule and funding constraints, and thus, the refactoring stage of the TDD process is always overlooked when the deadline approaches. Additionally, the quality of the code in the software is not a priority because the software is only the means to produce the results of their research. Conversely, the quality of traditional software is extremely important, particularly for commercial products.

• The requirements are not clear - This problem might appear especially in the CSE environment because scientists or developers cannot know all of the system features in advance. More specifically, the requirements are based on the results of experiments. The requirements can also change often. In contrast, in traditional software development, the requirements are most often available at the beginning of the project. Although the requirements may be incomplete, commercial developers might predict the needs of customers more easily than CSE software developers.

These problems indicate that the adoption of TDD in a CSE environment requires more attention from CSE developers than from traditional software developers. To minimize problems, both the technical techniques and the management strategies mentioned in the previous sections will be necessary.

4.4 Case Study: TDD in the Microscopy Imaging Processing project.

This section describes the results of the case study defined in Section 3.3. There were two active developers on this project, neither of whom had previously used TDD. One had developed software as a part of a team during a course. He had written programs for classwork in Java, C, and Ruby on Rails and had been working to gain scientific software development experience for 4 months. Both of the developers learned TDD immediately before starting this project. Overall, each developer worked a total of four weeks.

4.4.1 The effectiveness of TDD

On average, the first and second developers completed only 50% and 35% of the planned work in each week, respectively. In terms of new actual code (not testing code) that the developers wrote during each week, the first and second developers added an average of 10% and 20%, respectively. On average, they added only one new test case each week. Based on the survey responses, the developers did not report any problems with the TDD process. However, the developers rated TDD as ineffective with regard to productivity.

4.4.2 Testing

Overall, the developers reported that they did not rewrite existing test cases or add a new test case after refactoring. In this project, the developers did not use automated testing frameworks. They built their own scripts to test the code. These scripts had to be changed manually whenever a new test case was created.
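One way to avoid editing a driver script for every new test case is to let a framework collect tests by naming convention. The sketch below uses Python's standard `unittest` module purely as an illustration of that idea; the project itself used hand-written scripts, and `scale_image` is a hypothetical function, not code from the project repositories.

```python
import io
import unittest


def scale_image(pixels, factor):
    """Multiply every pixel value by a factor (hypothetical function under test)."""
    return [[value * factor for value in row] for row in pixels]


class ScaleImageTests(unittest.TestCase):
    # Any method whose name starts with "test" is collected automatically,
    # so adding a case does not require touching the runner below.
    def test_identity_factor_keeps_pixels(self):
        self.assertEqual(scale_image([[1, 2], [3, 4]], 1), [[1, 2], [3, 4]])

    def test_doubling(self):
        self.assertEqual(scale_image([[1, 2]], 2), [[2, 4]])

    def test_empty_image(self):
        self.assertEqual(scale_image([], 3), [])


def run_suite():
    """Collect and run all tests in ScaleImageTests; return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ScaleImageTests)
    stream = io.StringIO()  # keep the runner's report out of the console
    return unittest.TextTestRunner(stream=stream, verbosity=0).run(suite)


result = run_suite()
```

With this arrangement, a fourth test is added by writing one more `test_` method; the runner itself stays unchanged.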

I have analyzed the files in the code repositories and found that the developers did create a few test cases. Based on these observations, I found that the code, even the modules implemented in C++, was not implemented in the object-oriented style. The developers did not create a test for every function or program. Many parts of the code use third-party libraries. This observation confirmed the survey responses that the developers did not know how to write the test cases for complex code.

Based on the survey responses, the developers described the problems they encountered when performing the testing as follows:

• It was difficult to write the unit test when considering floating point calculations and parallelism. Additionally, it was difficult to determine the right answer for a complete simulation.

• Writing a good test was often more difficult than implementing the actual functionality, especially when it involved resources that are difficult to simulate (e.g., message passing in a parallel environment). It was necessary to invest some time in implementing a collection of reusable simulations and harnesses.

• If the functionality changes to any non-trivial degree, then many tests must be re-written.
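The kind of reusable harness mentioned in the second item can be as small as an in-memory stand-in for the communicator. The following is a minimal sketch under stated assumptions: `FakeComm`, its `send`/`recv` methods, and `exchange_halo` are hypothetical names invented for this illustration and do not reproduce the API of any real message-passing library or any code from the project.

```python
class FakeComm:
    """In-memory stand-in for a message-passing communicator (hypothetical API)."""

    def __init__(self, rank, size, incoming):
        self.rank = rank
        self.size = size
        self._incoming = dict(incoming)  # source rank -> value to be received
        self.sent = []                   # (destination rank, value) pairs, in order

    def send(self, value, dest):
        self.sent.append((dest, value))

    def recv(self, source):
        return self._incoming[source]


def exchange_halo(comm, local):
    """Pad a local 1-D block with one ghost cell from each ring neighbour."""
    left = (comm.rank - 1) % comm.size
    right = (comm.rank + 1) % comm.size
    comm.send(local[0], dest=left)
    comm.send(local[-1], dest=right)
    ghost_left = comm.recv(source=left)
    ghost_right = comm.recv(source=right)
    return [ghost_left] + list(local) + [ghost_right]


# Pretend to be rank 1 of 3; ranks 0 and 2 "send" 10.0 and 40.0.
comm = FakeComm(rank=1, size=3, incoming={0: 10.0, 2: 40.0})
padded = exchange_halo(comm, [20.0, 25.0, 30.0])
```

A fake of this kind lets the indexing and bookkeeping logic be unit tested on a single process. It does not exercise real concurrency, so deadlocks and message-ordering problems still require tests in the actual parallel environment.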

4.4.3 Refactoring

Based on the responses, the primary refactoring methods that the developers used are as follows:

• Simplifying control structures

• Renaming methods, variables, or classes

However, they never evaluated the refactoring results. As I observed the changes to the repositories, the developers consistently inserted comments into the code that provide information about each function (e.g., input, output, and purpose). Figure 4.30 provides an example of the source code. This practice also makes the code easier to read and understand.

    function STORM_image_mix(fdir, finalList, imsize, over_sampl_co, constrain, precision)
    % PURPOSE:
    %   Function to overlay the STORM image and original TIRF image
    %------
    % USAGE:
    %   STORM_image_mix(fdir, finalList, imsize, over_sampl_co, constrain, precision)
    %------
    % INPUTS:
    %   fdir= h5 file directory
    %   finalList= all the fitted list
    %   imsize= original image size
    %   over_sampl_co= over sampling coefficient
    %   constrain= constrain or not
    %   precision= if constrain, save the image with the precision
    %------
    % OUTPUTS:
    %   no output is provided, the image will be saved
    %------
    % REFERENCES:
    %------
    % REMARKS:
    %------

Figure 4.30: Sample code snippet of Matlab

Both developers performed refactoring for two reasons: 1) to fix problems (bugs) and 2) to improve performance. The developers rated refactoring as effective with regard to productivity. They provided the following opinions about the benefits of refactoring activities:

• Refactoring became easier when the test was in place, and it was a useful tool for thinking about an improved design, implementing it, and obtaining testing feedback.

• Refactoring changed the design process to include early testing and refactoring.

• Refactoring helped the developers to identify where problems occur as development progresses.

However, they encountered the following challenges in performing refactoring:

• Refactoring required considerable practice. It was not well understood. There was a lack of knowledge regarding how to refactor. Teaching or education was needed regarding the refactoring techniques.

• Any refactoring resulted in a complete rewrite of the tests that were affected, and thus increased the amount of time spent on the code.

• The most tedious part of refactoring was updating the test scripts.

• Good refactoring was time consuming and difficult, especially without a robust tool. Furthermore, the refactoring tool should be integrated with the IDEs.

• Refactoring was difficult when working with non-modular code, especially in the case of non-modular data.

• One challenge was arbitration regarding the ownership of data. In some portions of the code, it was difficult to design the interfaces to ensure simultaneous encapsulation and flexibility.

• Poor software design made refactoring very difficult or impossible.

• It was difficult to refactor the code in MATLAB.

4.4.4 Benefits and Disadvantages of TDD

Based on the survey responses, I summarized the benefits of TDD as follows:

• TDD requires developers to think about the quality of software.

• TDD provides a good opportunity to write software that achieves what users want it to achieve.

However, the developers indicated the following disadvantages of employing TDD in this project:

• TDD was difficult when there was a significant change in the intentions or expectations during development.

• TDD required more time to be spent up front.

• TDD required discipline during development. Additionally, it required the developers to have experience in software development and software engineering practices.

• It was difficult to use TDD in the project because the developers lacked appropriate tools.

4.4.5 Summary

The goal of this project was to develop a CSE application utilizing parallel computing. The study indicates that it was difficult to implement TDD in the parallel computing environment, especially the testing. The results agree with the survey results, with the following conclusions:

1. Problems in writing tests when the developers are using parallel computing - In this project, the developers often violated the rules of TDD by not writing the test case before writing the actual code. The problem of writing test cases for a parallel environment was the main reason for this violation.

2. Time consuming - The developers did not want to spend time writing test cases. Additionally, one of the two developers is a graduate student, so he often skipped the problems to save time during software development.

3. Lack of SE practices - Both developers indicated that they need to learn about software engineering practices to adopt the TDD process.

Chapter 5

FINDINGS

This chapter presents the findings and recommendations drawn from the analysis of the data detailed in Chapter 4. Based on the studies, I report the following five primary findings along with recommendations for the CSE community.

Finding 1: The effectiveness of TDD on CSE software development depends on the type of CSE project.

In terms of software quality characteristics, TDD is primarily effective for functionality (described in the survey study, Section 4.3.3). TDD helps developers build software with strong confidence that the software performs all of its intended functions correctly. Based on the studies, TDD is more effective for general CSE projects than for parallel computing projects. Using the TDD process, developers might produce two- to three-fold more test code than actual code and may generate thousands of unit tests. Moreover, testing and refactoring are difficult for software of a large size, particularly when testing around concurrency issues. For example, a mathematical software framework can provide such a wide variety of components for the user to choose from that it is nearly impossible to achieve full coverage with test cases. Therefore, TDD is not suitable for large CSE projects, except in the presence of robust tools and experienced developers.

Recommendation: Best practices or guidelines might help developers successfully employ TDD in parallel computing projects. In large projects, the developers should perform refactoring often. The use of multiple testing methods would also help developers ensure that nearly all of the code was tested.

Finding 2: Although many developers applied refactoring techniques to their project, most of the techniques were very simple.

Because complex refactoring techniques require advanced software engineering practices, CSE developers do not use difficult techniques (described in the survey study, Section 4.3.6). More specifically, developers in the academic environment do not consider refactoring essential.

Recommendation: Additional TDD training is needed for scientists and researchers. Furthermore, it is important to convince the development team of the benefits of refactoring. The developers benefit from the coupling of their training with hands-on experience.

Finding 3: Writing the test is the most difficult activity in the TDD process.

It is difficult to write a good test, especially when floating point calculations, numerical methods, and parallelism are involved (see the survey study, Section 4.3.5, and the microscopy image processing project, Section 4.4). Writing a good test requires a thorough understanding of the requirements. Furthermore, when there is pressure from the users to provide new features in a compressed schedule, it can be difficult to spend the additional time required to define and implement adequate unit tests. In TDD, passed tests should not be changed. Unfortunately, in many cases, it is not practical to leave passed tests unchanged, especially after the code is refactored.

Recommendation: At a minimum, a unit test must 1) be built fast, 2) run fast, and 3) isolate errors. Additionally, automated testing tools protect the software going forward better than manual testing does.

Finding 4: Refactoring is difficult for legacy code.

Here, I define legacy code as any code without a unit test. In the legacy code, developers may not actually have adequate tests to demonstrate correct behavior (the survey study, Section 4.3.5). Generally, refactoring efforts cannot proceed incrementally without testing in the form of easily checked unit tests. Furthermore, running validation tests for incremental changes is often a large and time-consuming task, thereby reducing the efficiency of the refactoring process. Moreover, a considerable amount of existing legacy code is implemented with procedural programming languages (e.g., C, COBOL, Pascal, and Basic). Procedural languages often lack the classes of OO programming languages. One method used to improve the legacy code is managing the code dependencies carefully. However, it is difficult to break dependencies in procedural code.

Recommendation: Developers must write tests for existing code before they begin refactoring. I recommend the refactoring strategy for legacy code explained by Feathers [Feathers, 2004]. The following steps must be taken before refactoring legacy code:

1. Identify change points - Identify the code that must be changed depending on the expected design.

2. Break dependencies - The goal of this step is to break fundamental dependencies to allow the code being changed to be inserted into a unit test.

3. TDD steps - Write a failing test, make the test pass, and refactor.

Feathers further explained that developers should establish the habit of refactoring into tests. Refactoring into tests is similar to an initial test design. For example, developers should create a test for each small function and then create small functions in the actual code (using the Extract Method refactoring) to make the test pass.
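The three steps can be sketched on a deliberately small example. This is an illustration under stated assumptions, not code from Feathers or from the studied projects: all function names are hypothetical, and the "legacy" routine is invented to show file access tangled with arithmetic.

```python
import io


def legacy_mean_from_file(path):
    """Legacy-style routine: file access and arithmetic are tangled together."""
    with open(path) as handle:
        values = [float(line) for line in handle if line.strip()]
    return sum(values) / len(values)


# Step 1 (identify change points): the parsing and averaging inside the
# routine above are what we want to change, but they can only be reached
# through a real file.

# Step 2 (break dependencies): accept any iterable of text lines so that a
# test can pass an in-memory buffer instead of opening a file.
def mean_from_lines(lines):
    values = [float(line) for line in lines if line.strip()]
    return mean_of(values)


# Step 3 (TDD steps): a small pure function whose behaviour, including the
# empty-input case, can be pinned down by unit tests.
def mean_of(values):
    if not values:
        raise ValueError("mean_of requires at least one value")
    return sum(values) / len(values)


def mean_from_file(path):
    """Thin wrapper that preserves the original call signature."""
    with open(path) as handle:
        return mean_from_lines(handle)


# Exercise the refactored code without touching the file system.
characterized = mean_from_lines(io.StringIO("1.0\n\n2.0\n6.0\n"))
```

Keeping `mean_from_file` as a thin wrapper means existing callers are unaffected while the extracted functions become testable in isolation.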

Finding 5: CSE developers need refactoring tools.

The evidence indicates that many CSE developers view refactoring tools as essential during the refactoring process (described in two case studies, Section 4.2 and Section 4.4). Automated refactoring tools would help CSE developers save time and effort. Unfortunately, some of the existing tools are limited to a few programming languages (e.g., Java, C++).

Recommendation: Software engineers should develop an automated refactoring tool specifically designed for CSE development. Issues such as version control, automated testing tools, and other software engineering tools should be considered before starting a CSE project.

Chapter 6

SUMMARY

This chapter presents a summary of the study followed by the contributions and future studies.

6.1 Summary

The purpose of this study was to determine whether TDD supports CSE software development. Qualitative case studies and survey designs were selected to gather data from focus group developers, survey responses, and current development projects. One case study was conducted at the Sandia National Laboratories, where I have been involved in the development team. The survey study included 77 respondents, including researchers, scientists, and respondents from private companies. Another case study was conducted at the Ohio Supercomputer Center, where the data were primarily collected through surveys and source code.

The results of those studies have been presented as they relate to the research questions. The main research question of this dissertation is as follows: What is the empirical evidence of the effect of TDD and refactoring techniques on the improvement of CSE software development? The answers are summarized by sub-question as follows:

1. What is the effectiveness of TDD?

TDD requires developers to thoroughly understand requirements before writing a test. As the name Test-Driven Development suggests, developers are driven by tests rather than specifications. This approach guides developers in writing only the code needed to pass the tests and thus meet the requirement.
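A minimal sketch of this idea, using a hypothetical `clamp` function that does not come from the studied projects: the test is written first and states the requirement, and the implementation contains only what the test demands.

```python
def test_clamp():
    """Written first: states the required behaviour before the code exists."""
    assert clamp(5, 0, 10) == 5      # a value inside the range is unchanged
    assert clamp(-3, 0, 10) == 0     # a value below the range maps to the lower bound
    assert clamp(42, 0, 10) == 10    # a value above the range maps to the upper bound


def clamp(value, low, high):
    """Written second: just enough code to make test_clamp pass."""
    return max(low, min(value, high))


test_clamp()
```

Any further behavior, such as handling an inverted range, would be added only after a new failing test asks for it.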

More specifically, it seems that TDD is effective for particular software quality characteristics, including Functionality, Reliability, Performance, and Maintainability. When software is developed with testability in mind, the result is software that is extensible, flexible, and maintainable. For software to evolve with new demands, it should support new or augmented functionality at little cost. The ability to support such changes is very important, especially in a maintainable manner. Refactoring for maintainability should work to simplify the code wherever possible. Additionally, refactoring, which is an essential process in TDD, helps developers remove poor code that could introduce performance problems in the future.

2. What refactoring technique does the developer use to improve the code?

Many CSE developers used simple techniques to refactor the code. CSE developers also applied additional techniques to improve the code, such as Profiling, Static code analysis, Code or peer review, Inserting comments, and Refactoring tools. Based on the survey study, the most common technique to improve the code was Code or peer review.

Design Patterns, likely a new methodology in the CSE world, are also applied to the system being developed. Not only were GoF design patterns used, but new design patterns were also developed for specific programming languages, such as Fortran.

3. What are the difficulties of using TDD?

Writing the test is the most difficult task in the TDD process. The main problem in writing a test is to write a good test or create test cases. Writing a good test can be more difficult than implementing the actual code, particularly for tests that do not have to be changed if the functional code is changed. Additionally, testing without a tool or framework made it more difficult to write tests. However, the existing automated testing tools are not user-friendly for scientists or CSE developers. Other factors related to the testing are Code coverage, Complex function or code, Numerical computations, and Code or requirement is changed. Furthermore, writing tests to examine concurrency issues in parallel computing is difficult.

I also found that CSE developers faced challenges regarding refactoring. The main challenge is that the process of refactoring is difficult if the unit tests are not well implemented. Regardless of the benefits of refactoring, there is little motivation for refactoring code in an academic environment. When implementing a CSE application, it is often difficult to design it in advance unless the developers are implementing known methods and processes. Scientists or researchers in the CSE domain rarely revise previous code after the paper is published.

4. How does the TDD method support the CSE software development process?

TDD can help the development team start the project quickly, because the developers do not need to gather all requirements at the beginning. TDD enforces the discipline of performing systematic testing during software development and writing testable code. In addition to improving the software quality (e.g., reliability, maintainability, compatibility), TDD could reduce the need for an independent testing team when the organization has a small number of professionals on a project team.

Tests are written in conjunction with the development of functionality, providing valuable regression notifications when combined with testing. Additionally, TDD helps developers remove bugs and correct errors early in the project, reducing the overall number of bugs.

6.2 Contribution

The development of CSE software has received increased attention in the past few years. Even so, there is still room for improvement in this domain. The main contribution of this work is to provide empirical information about adopting TDD in CSE projects. The results of this study will prove beneficial to the CSE community of scientists and developers when selecting software development methodologies for their CSE projects. The lessons learned from the case studies presented here will provide practical suggestions to other practitioners and developers who are using TDD in their projects. More evidence will enable developers to make better decisions relative to TDD in the future. From the researcher's perspective, the empirical evidence will help researchers understand the benefits and drawbacks of TDD in CSE software development. This study may help researchers invent new techniques or improve existing software engineering practices for CSE characteristics. The evidence of this work should be of interest to researchers, because adopting TDD in CSE projects is not always positive. The problems that survey participants provided highlight the need for additional empirical evaluations of adopting TDD in various contexts.

As a secondary contribution, this work is related to a new software tool that has been developed to support CSE software development. The developed tool can be used to extract the software design from Fortran code. The extracted designs should support the maintenance of the software throughout the software development process. Because Fortran 2003 provides all of the concepts of OOP, SE tools such as ForUML have emerged to place Fortran on an equal footing with other OOP languages.

6.3 Future studies

Because I have only conducted case studies on small projects (in terms of the number of developers and size of software), the findings can only be generalized to a limited extent. Various case studies would provide more generalizable results. In the future, I plan to conduct case studies on different CSE projects, especially projects that have more developers and involve other parties (e.g., subcontractors or other organizations). In addition to investigating the effectiveness of the TDD process on broader CSE software development aspects, I plan to investigate the effect of employing TDD on particular software quality characteristics (e.g., functionality and maintainability). Another interesting direction of further research would be an empirical human-based study of the impact of TDD on CSE software development.

Regarding ForUML, I plan to address the limitations. I also plan to conduct human-based studies to evaluate the effectiveness and usability of ForUML for other members of the CSE software developer community. To encourage the wider adoption and use of ForUML, I am investigating the possibility of releasing it as open source software. This direction should help me obtain additional feedback about the usability and correctness of the tool. Demonstrating that ForUML is a realistic tool for large-scale computational software will make it an even more valuable contribution to both the SE and CSE communities.

Furthermore, I also plan to study other related areas, including the following:

and quality assurance in parallel computing

• The impact of design patterns on CSE applications

• Agile software development approaches in the CSE environment

• Automated refactoring tools for CSE and HPC software development

6.4 Publications

Thus far, I have published one refereed journal paper:

• Aziz Nanthaamornphong, Jeffrey C. Carver, Karla Morris, Hope Michelsen, and Damian Rouson. 2014. Building CLiiME via Test-Driven Development: A Case Study. Computing in Science & Engineering. (Accepted for Publication)

I have also published four refereed workshop papers:

• Aziz Nanthaamornphong, Karla Morris, and Salvatore Filippone. 2013. Extracting UML class diagrams from object-oriented Fortran: ForUML. In Proceedings of the International Workshop on Software Engineering for High Performance Computing in Computational Science and Engineering. Denver, Colorado, USA, pp. 9-16.

• Aziz Nanthaamornphong. 2013. A pilot study: design patterns in parallel program development. In Proceedings of the International Workshop on Software Engineering for High Performance Computing in Computational Science and Engineering. Denver, Colorado, USA, pp. 17-20.

• Aziz Nanthaamornphong, Karla Morris, Damian Rouson, and Hope Michelsen. 2013. A case study: Agile development in the community laser-induced incandescence modeling environment (CLiiME). In Proceedings of the 5th International Workshop on Software Engineering for Computational Science and Engineering. San Francisco, California, USA, pp. 9-18.

• Aziz Nanthaamornphong and Jeffrey C. Carver. 2011. Design Patterns in Software Maintenance: An Experiment Replication at University of Alabama. In Proceedings of the 2nd International Workshop on Replication in Empirical Software Engineering Research, Alberta, Canada, pp. 15-24.

147 REFERENCES

Abrahamsson, P., O. Salo, J. Ronkainen, and J. Warsta (2002). Agile software de- velopment methods - Review and analysis. Technical Report 478, VTT PUBLICA- TIONS, Espoo, Finland: Technical Research Centre of Finland, Available online: http://www.inf.vtt.fi/pdf/publications/2002/P478.pdf.

Alalfi, M., J. Cordy, and T. Dean (2008, October). SQL2XMI: Reverse engineering of UML-ER Diagrams from relational database schemas. In Proceedings of the 15th Working Conference on Reverse Engineering, Antwerp, Belgium, pp. 187–191.

Alalfi, M., J. Cordy, and T. Dean (2009, April). Automated reverse engineering of UML Se- quence Diagrams for dynamic web applications. In Proceedings of the International Confer- ence on Software Testing, Verification and Validation Workshops, Denver, Colorado, USA, pp. 287–294.

Ampatzoglou, A., S. Charalampidou, and I. Stamelos (2013). Research state of the art on GoF design patterns: A mapping study. Journal of Systems and Software 86(7), 1945 – 1964.

Astels, D. (2003). Test Driven development: A Practical Guide. Prentice Hall Professional Tech- nical Reference.

Barbieri, D., V. Cardellini, S. Filippone, and D. Rouson (2012, September). Design Patterns for Scientific Computations on Sparse Matrices. In Proceedings of the International Conference on Parallel Processing, Bordeaux, France, pp. 367–376.

Basili, V. R. (2007, June). The role of controlled experiments in software engineering research. In Proceedings of the International Conference on Empirical software Engineering Issues: Critical Assessment and FutureDirections, Dagstuhl Castle, Germany, pp. 33–37.

Beck, K. (2000a). Emergent control in extreme programming. Cutter IT Journal 13(11), 22–25.

Beck, K. (2000b). Extreme Programming Explained: Embrace Change. Boston, Massachusetts, USA: Addison-Wesley Longman Publishing Co., Inc.

Beck, K. and C. Andres (2004). Extreme Programming Explained: Embrace Change (2nd Edi- tion). Addison-Wesley Professional.

148 Beck, K. and M. Fowler (2000). Planning Extreme Programming (1st ed.). Boston, Mas- sachusetts, USA: Addison-Wesley Longman Publishing Co., Inc.

Bhat, T. and N. Nagappan (2006, September). Evaluating the efficacy of test-driven develop- ment: industrial case studies. In Proceedings of the International Symposium on Empirical Software Engineering, Rio de Janerio, Brazil, pp. 356–363.

Brainerd, W. S. (2009). Guide to Fortran 2003 Programming (1st ed.). Springer Publishing Company, Incorporated.

Britcher, R. N. (1990, October). Re-engineering Software: A Case Study. IBM Systyem Jour- nal 29(4), 551–567.

Carver, J. (2011, October). Development of a mesh generation code with a graphical front-end: A case study. Journal of End User Computing 23(4), 1–16.

Carver, J. C. (2009). Report: The second international workshop on software engineering for CSE. Computing in Science and Engineering 11(6), 14–19.

Carver, J. C., R. P. Kendall, S. E. Squires, and D. E. Post (2007, May). Software development en- vironments for scientific and engineering software: A series of case studies. In Proceedings of the 29th International Conference on Software Engineering, Minneapolis, Minnesota, USA, pp. 550–559.

Causevic, A., D. Sundmark, and S. Punnekkat (2011, March). Factors limiting industrial adop- tion of test driven development: A systematic review. In Proceedings of the 4th International Conference on Software Testing, Verification and Validation, Berlin, Germany, pp. 337–346.

Chivers, I. D. and J. Sleightholme (2012, December). Compiler Support for the Fortran 2003 and 2008 Standards Revision 11. SIGPLAN Fortran Forum 31(3), 17–28.

Cockburn, A. (2004). Crystal Clear a Human-powered Methodology for Small Teams (First ed.). Addison-Wesley Professional.

Damm, L.-O. and L. Lundberg (2006, July). Results from introducing component-level and test-driven development. Journal of Systems and Software 79(7), 1001– 1014.

Damm, L.-O. and L. Lundberg (2007). Quality impact of introducing component-level test au- tomation and test-driven development. In P. Abrahamsson, N. Baddoo, T. Margaria, and R. Messnarz (Eds.), Software Process Improvement, Volume 4764 of Lecture Notes in Com- puter Science, pp. 187–199. Springer Berlin-Heidelberg.

149 Decyk, V. and H. Gardner (2007). A factory pattern in Fortran 95. In Y. Shi, G. Albada, J. Don- garra, and P. Sloot (Eds.), Computational Science, Volume 4487 of Lecture Notes in Com- puter Science, pp. 583–590. Springer Berlin-Heidelberg.

Decyk, V. K. and H. J. Gardner (2008, April). Object-oriented design patterns in Fortran 90/95: mazev1, mazev2 and mazev3. Computer Physics Communications 178(8), 611–620.

Decyk, V. K., C. D. Norton, and B. K. Szymanski (1997a, April). Expressing Object-Oriented Concepts in Fortran 90. SIGPLAN Fortran Forum 16(1), 13–18.

Decyk, V. K., C. D. Norton, and B. K. Szymanski (1997b, October). How to Express C++ Concepts in Fortran 90. Scientific Programming 6(4), 363–390.

Decyk, V. K., C. D. Norton, and B. K. Szymanski (1998). How to Support Inheritance and Run-time Polymorphism in Fortran 90. Computer Physics Communications 115, 9–17.

Desai, C., D. Janzen, and K. Savage (2008, June). A survey of evidence for test-driven develop- ment in academia. SIGCSE Bull. 40(2), 97–101.

Don, W. (2009a). Extreme programming: A gentle introduction. http://www. extremeprogramming.org. Accessed February 2014.

Don, W. (2009b). The rules of extreme programming. http://www. extremeprogramming.org/rules.html. Accessed February 2014.

Douglas, B. (2003). Supercomputing at Boeing commercial airplanes past successess and fu- ture challenges. http://www.nitrd.gov/nitrdgroups/images/c/cd/SC03_ hecrtf_dball.pdf. Accessed January 30, 2014.

Duffy, E. B. and B. A. Malloy (2005, August). A Language and Platform-Independent Ap- proach for Reverse Engineering. In Proceedings of the 3rd ACIS International Conference on Software Engineering Research, Management and Applications, Mt. Pleasant, Michigan, USA, pp. 415–423.

Eager, C. (2007). The DWARF debuggin standard. http://www.dwarfstd.org. Accessed Decem- ber 2013.

Easterbrook, S. (2007, November). Empirical research methods for software engineering. In Proceedings of the 22nd International Conference on Automated Software Engineering, At- lanta, Georgia, USA, pp. 574–574.

Ebert, C., P. Abrahamsson, and N. Oza (2012, September). Lean software development. IEEE Software 29(5), 22–25.

150 Eclipse (2013). Photran - an integrated development environment and refactoring tool for for- tran. http://www.eclipse.org/photran/. Accessed December 2013.

Feathers, M. (2004). Working Effectively with Legacy Code. Upper Saddle River, New Jersey, USA: Prentice Hall PTR.

Filippone, S. and A. Buttari (2012, August). Object-Oriented Techniques for Sparse Matrix, Computations in Fortran 2003. ACM Transactions on Mathematical Software 38(4), 1–20.

Fowler, M. (1999). Refactoring: Improving the Design of Existing Code. Boston, Mas- sachusetts, USA: Addison-Wesley Longman Publishing Co., Inc.

Gamma, E., R. Helm, R. Johnson, and J. Vlissides (1995). Design Patterns: Elements of Reusable Object-Oriented Software. Boston, Massachusetts, USA: Addison-Wesley Long- man Publishing Co., Inc.

Gansner, E., E. Koutsofios, S. North, and K.-P. Vo (1993, March). A Technique for Drawing Directed Graphs. IEEE Transactions on Software Engineering 19(3), 214–230.

Gilbert, P. (1986). Software Design and Development. USA: SRA School Group.

GNU (2012). Gnu fortran. http://gcc.gnu.org/fortran/. Accessed February 2014.

Hairer, E., M. Roche, and C. Lubich (1989). Runge-kutta methods for differential-algebraic equations. In The Numerical Solution of Differential-Algebraic Systems by Runge-Kutta Methods, Volume 1409 of Lecture Notes in Mathematics, pp. 14–22. Springer Berlin Hei- delberg.

IBM (2014). Ibm Fortran compiler family. http://www-03.ibm.com/software/ products/en/fortcompfami/. Accessed February 2014.

Intel (2014). Intel Fortran compiler. http://software.intel.com/en-us/ fortran-compilers. Accessed Bebruary 2014.

ISO/IEC (2011). Systems and Software Engineering – System and Software Quality Require- ments and Evaluation (SQuaRE) – System and Software Quality Models. Technical Report ISO/IEC 25010:2011.

Jacobson, I., G. Booch, and J. Rumbaugh (1999). The Unified Software Development Process. Boston, Massachusetts, USA: Addison-Wesley Longman Publishing Co., Inc.

Joseph, E., A. S. and C. G. Willard (2004). Council on competitiveness study of U.S. In- dustrial HPC Users. http://www.compete.org/images/uploads/File/PDF% 20Files/HPC_Users_Survey%202004.pdf. Accessed January 30, 2014.

151 Kasunic, M. (2005). Designing an Effective Survey. Technical report, Handbook CMU/SEI- 2005-HB-004, Software Engineering Institute, Carnegie Mellon University, Available on- line: http://www.sei.cmu.edu/reports/05hb004.pdf.

Kerievsky, J. (2004). Refactoring to Patterns. Pearson Higher Education.

Kollanus, S. (2010, September). Test-driven development - still a promising approach? In Pro- ceedings of the 7th International Conference on the Quality of Information and Communi- cations Technology, Porto, Portugal, pp. 403–408.

Korshunova, E., M. Petkovic, M. van den Brand, and M. Mousavi (2006, October). CPP2XMI: Reverse Engineering of UML Class, Sequence, and Activity Diagrams from C++ Source Code. In Proceedings of the 13th Working Conference on Reverse Engineering, Benevento, Italy, pp. 297 –298.

Koskela, L. (2007). Test Driven: Practical TDD and Acceptance TDD for Java Developers. Greenwich, Connecticut, USA: Manning Publications Co.

Laplante, P. A. and C. J. Neill (2004, February). The demise of the waterfall model is imminent. Queue 1(10), 10–15.

Lethbridge, T. C., S. E. Sim, and J. Singer (2005, July). Studying software engineers: Data collection techniques for software field studies. Empirical Software Engineering 10(3), 311– 341.

Lethbridge, T. C., S. Tichelaar, and E. Ploedereder (2004). The dagstuhl middle metamodel: A schema for reverse engineering. Electronic Notes in Theoretical Computer Science 94(0), 7 – 18.

Madeyski, L. (2010). Test-Driven Development: An Empirical Evaluation of Agile Practice (1st ed.). Springer Publishing Company, Incorporated.

Markus, A. (2006, April). Design patterns and fortran 90/95. SIGPLAN Fortran Forum 25(1), 13–29.

Markus, A. (2008, November). Design patterns and Fortran 2003. SIGPLAN Fortran Fo- rum 27(3), 2–15.

Mattson, T., B. Sanders, and B. Massingill (2004). Patterns for Parallel Programming (First ed.). Addison-Wesley Professional.

Maximilien, E. M. and L. Williams (2003, May). Assessing test-driven development at IBM. In Proceedings of the 25th International Conference on Software Engineering, Portland, Oregon, USA, pp. 564–569.

152 McBreen, P. (2002). Questioning Extreme Programming. Boston, Massachusetts, USA: Addison-Wesley Longman Publishing Co., Inc.

Metcalf, M., J. Reid, and M. Cohen (2011). Modern Fortran Explained (4th ed.). NYC, New York, USA: Oxford University Press, Inc.

Meyer, B. (1997). Object-oriented Software Construction (2nd Ed.). Upper Saddle River, New Jersey, USA: Prentice-Hall, Inc.

Michelsen, H. A. (2003). Understanding and predicting the temporal response of laser-induced incandescence from carbonaceous particles. The Journal of Chemical Physics 118(15), 7012–7045.

Miller, G. G. (2001, July). The characteristics of agile software processes. In Proceedings of the 39th International Conference and Exhibition on Technology of Object-Oriented Languages and Systems, Santa Barbara, California, USA, pp. 385–387.

Morris, K., D. W. Rouson, M. N. Lemaster, and S. Filippone (2012, October). Exploring Capabilities within ForTrilinos by Solving the 3D Burgers Equation. Scientific Program- ming 20(3), 275–292.

Müller, H. A., J. H. Jahnke, D. B. Smith, M.-A. Storey, S. R. Tilley, and K. Wong (2000, June). Reverse Engineering: A Roadmap. In Proceedings of the Conference on The Future of Software Engineering, Limerick, Ireland, pp. 47–60.

Nagappan, N., E. M. Maximilien, T. Bhat, and L. Williams (2008, June). Realizing quality improvement through test driven development: results and experiences of four industrial teams. Empirical Software Engineering 13(3), 289–302.

Nanthaamornphong, A., K. Morris, D. Rouson, and H. Michelsen (2013, May). A case study: Agile development in the community laser-induced incandescence modeling environment (CLiiME). In Proceedings of the International Workshop on Software Engineering for Com- putational Science and Engineering, San Francisco, California, USA, pp. 9–18.

National Energy Research Scientific Computing Center (2014). Cray compiler. http://www. nersc.gov/users/software/compilers/cray-compilers. Accessed Febru- ary 2014.

National Science Foundation (2012). Cyberinfrastructure for 21st Century Science and Engi- neering Advanced Computing Infrastructure (Vision and Strategies Plan). http://www. nsf.gov/pubs/2012/nsf12051/nsf12051.pdf. Accessed February 2014.

Neunzert, H., A. Klar, and J. Struckmeier (1995). Particle methods: Theory and applications. Technical Report 95-113, Fachbereich Mathematik, Universität Kaiserslautern, Germany.

153 Numerical Algorithm Group (1970). Nag compiler. http://www.nag.com. Accessed February 2014.

Object Management Group, I. (1997a). Object Management Group (OMG). http://www.omg.org. Accessed December 2013.

Object Management Group, I. (1997b). OMG Model Driven Architecture (MDA). http://www.omg.org/mda/. Accessed December 2013.

Opdyke, W. F. (1992). Refactoring object-oriented frameworks. Ph. D. thesis, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA.

Orchard, D. and A. Rice (2013, October). Upgrading fortran source code using automatic refac- toring. In Proceedings of the International Workshop on Refactoring Tools, Indianapolis, Indiana, USA, pp. 29–32.

Ortega-Arjona and J. Luis (2010). Patterns for Parallel Software Design (1st ed.). Wiley Pub- lishing.

Overbey, J., S. Xanthos, R. Johnson, and B. Foote (2005). Refactorings for fortran and high- performance computing. In Proceedings of the 2nd International Workshop on Software Engineering for High Performance Computing System Applications, St. Louis, Missouri, USA, pp. 37–39.

Overbey, J. L., S. Negara, and R. E. Johnson (2009, May). Refactoring and the evolution of fortran. In Proceedings of the International Workshop on Software Engineering for Compu- tational Science and Engineering, Vancouver, British Columbia, Canada, pp. 28–34.

Pacione, M. (2004, May). Software Visualization for Object-Oriented Program Comprehension. In Proceedings of the 26th International Conference on Software Engineering, Scotland, UK, pp. 63–65.

Parr, T. J. and R. W. Quong (1995, July). ANTLR: A Predicated-LL(k) Parser Generator. Soft- ware: Practice and Experience 25(7), 789–810.

Patterson, R. and D. Cox (2005). Visualization of an f3 tornado within a simulated supercell thunderstorm. In ACM SIGGRAPH 2005 Electronic Art and Animation Catalog, NYC, New York, USA, pp. 248–249.

Pfleeger, S. (1995). Experimental design and analysis in software engineering. Annals of Soft- ware Engineering 1(1), 219–253.

Robson, C. (2002, March). Real World Research: A Resource for Social Scientists and Practitioner-Researchers (Regional surveys of the world) (2 ed.). Wiley-Blackwell.

154 Rouson, D., J. Xia, and X. Xu (2011). Scientific Software Design: The Object-Oriented Way (1st ed.). NYC, NY, USA: Cambridge University Press.

Rouson, D. W., J. Xia, and X. Xu (2010). Object Construction and Destruction Design Patterns in Fortran 2003. Procedia Computer Science 1(1), 1495 – 1504.

Rouson, D. W. I., H. Adalsteinsson, and J. Xia (2010, January). Design Patterns for Multiphysics Modeling in Fortran 2003 and C++. ACM Transactions on Mathematical Software 37(1), 3:1–3:30.

Sanchez, J. C., L. Williams, and E. M. Maximilien (2007, August). On the sustained use of a test-driven development practice at IBM. In Proceedings of the AGILE, Washington, D.C., USA, pp. 5–14.

Sanders, R. and D. Kelly (2008, July-August). Dealing with risk in scientific software develop- ment. IEEE Software 25(4), 21–28.

Scrum (2009). Scrum software development. https://www.scrum.org. Accessed Febru- ary 2014.

Segal, J. (2004). Professional End User Developers and Software Development Knowledge. Technical Report 2004/25, Open University UK.

Sletholt, M., J. Hannay, D. Pfahl, and H. Langtangen (2012, March-April). What Do We Know about Scientific Software Development’s Agile Practices? Computing in Science Engineer- ing 14(2), 24 –37.

Storey, M.-A. (2006, September). Theories, Tools and Research Methods in Program Compre- hension: Past, Present and Future. Software Quality Control 14(3), 187–208.

Tonella, P. and A. Potrich (2001, November). Reverse Engineering of the UML Class Diagram from C++ Code in Presence of Weakly Typed Containers. In Proceedings of the Interna- tional Conference on Software Maintenance, Florence, Italy, pp. 376–385.

Trish, B. (2004). Hunt for the supertwister. http://access.ncsa.illinois.edu/ Stories/supertwister/index.html. Accessed January 30, 2014.

Weidmann, M. (1997). Design and performance improvement of a real-world, object-oriented C++ solver with STL. In Y. Ishikawa, R. Oldehoeft, J. Reynders, and M. Tholburn (Eds.), Scientific Computing in Object-Oriented Parallel Environments, Volume 1343 of Lecture Notes in Computer Science, pp. 25–32. Springer Berlin-Heidelberg.

155 Williams, L., E. M. Maximilien, and M. Vouk (2003, November). Test-driven development as a defect-reduction practice. In Proceedings of the 14th International Symposium on Software Reliability Engineering, Denver, Colorado, USA, pp. 34–46.

Yin, R. K. (2002, December). Case Study Research: Design and Methods, 3rd Edition (Applied Social Research Methods, Vol. 5) (3rd ed.). SAGE Publications, Inc.

Zhang, C. and D. Budgen (2012, September-October). What Do We Know About the Effective- ness of Software Design Patterns. IEEE Transaction on Software Engineering 38(5), 1213 –1231.

Zhang, M. and L. Hochstein (2009, October). Fitting a workflow model to captured development data. In Proceedings of the 3rd International Symposium on Empirical Software Engineering and Measurement, Lake Buena, Florida, USA, pp. 179–190.

Appendices

Appendix A FORUML - SCREENSHOTS

This appendix presents screenshots from the ForUML tool. Figure A.1 illustrates how a user can select multiple Fortran source files as input to ForUML. The Add button opens a new window for selecting the target file(s). The user can remove selected file(s) with the Remove button. The Reset button clears all selected files.

Figure A.1: Selection of the Fortran Code

After selecting the source files, the user chooses the location in which to save the generated XMI document (.xmi file). The Generate button starts the transformation process. During the process, the user can see whether each given source file is successfully parsed (Figure A.2). Once the XMI document has been successfully generated, the user can view the class diagram by selecting the View button. Figure A.3 illustrates the UML class diagram that is automatically displayed in the editing pane of ArgoUML, which allows the user to refine the diagram and then either save the project or export the XMI document, which contains all of the modified information.

Figure A.2: Generating the XMI

Figure A.3: View the UML class diagram
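
As a rough illustration of the kind of document the Generate step produces, the following Python sketch writes a simplified, XMI-like XML description of one class and its operations. It is not ForUML's implementation and does not reproduce its actual output format: the function name build_xmi_like, the element names (XMI, Model, Class, Operation), the model name, and the example class vector_t are all assumptions made for this example. Real XMI documents use namespaced UML elements and carry many more attributes.

```python
import xml.etree.ElementTree as ET


def build_xmi_like(classes):
    """Serialize {class_name: [operation_name, ...]} as a simplified XMI-like string."""
    # Hypothetical, simplified element names; real XMI uses namespaced UML elements.
    root = ET.Element("XMI")
    model = ET.SubElement(root, "Model", {"name": "FortranModel"})
    for class_name, operations in classes.items():
        class_element = ET.SubElement(model, "Class", {"name": class_name})
        for operation_name in operations:
            ET.SubElement(class_element, "Operation", {"name": operation_name})
    return ET.tostring(root, encoding="unicode")


# Example input: one derived type with two type-bound procedures (made-up names).
xmi_text = build_xmi_like({"vector_t": ["norm", "scale"]})
print(xmi_text)
```

A modeling tool that imports such a file would read the Class and Operation elements back to draw the diagram; the sketch only shows the direction from extracted class information to an XML document.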

Appendix B WEEKLY SURVEY

Weekly Survey - Test-Driven Development (TDD)

Respondent's name

Have you worked on the project last week? ( ) Yes ( ) No
If No is selected, then skip to end of survey

Part I - Progress

1. How much of your planned work did you complete last week? ______ Completed Work (%)

2. Did you add any new functionality last week? ( ) Yes ( ) No

3. What percent of the code you wrote last week followed the TDD process? ______ Code (%)

4. How many new test cases did you write last week?

5. Did you have any problems with TDD? ( ) Yes ( ) No

Appear when Yes is selected: Please describe

6. Order these items in terms of difficulty relative to TDD (Drag & Drop to rearrange the difficulty)
______ Writing a test case
______ Writing code to make a test pass
______ Refactoring

7. Has the schedule changed? ( ) Yes ( ) No

Appear when Yes is selected: Why? Please explain.

Part II - Technical Questions

8. Did you refactor code last week? ( ) Yes ( ) No

9. Which techniques did you use to refactor the code last week? (Drag refactoring techniques in the left side and drop to the right side)

Frequency categories: Very Frequently, Frequently, Occasionally, Rarely, Never

Refactoring techniques:
• Breaking large methods up into smaller methods
• Renaming Methods, Variables or Classes
• Simplifying control structures (e.g. series of if statements, or nested loops, etc.)
• Creating Encapsulated Field (e.g. using getter and setter methods to make public member data private)
• Splitting large classes (Move part of the code from existing class into a new class)
• Adding or Removing parameters from a method
• Moving methods or fields of a class to a super class
• Moving methods or fields of a class to a sub-class
• Applying a design pattern
• Other**

If Other, Please specify the method you used

10. Did you evaluate the result of the refactoring? ( ) Yes ( ) No

Appear when Yes is selected: Please describe the evaluation process

11. When did you refactor the code? (Select all that apply)
[ ] After all tests pass
[ ] After adding a new feature
[ ] Review the code
[ ] Fix a bug
[ ] Whenever possible
[ ] Other*

Appear when Other is selected: Please specify

12. What is your motivation for refactoring? (Select all that apply)
[ ] Fix problems
[ ] Improve performance
[ ] Improve maintainability
[ ] Improve security
[ ] Follow the Test-Driven Development approach
[ ] Other*

Appear when Other is selected: Please describe

13. Did you rewrite any existing test cases after refactoring? ( ) Yes ( ) No

14. Did you add any new test cases after refactoring? ( ) Yes ( ) No

15. Did you remove any existing test cases after refactoring? ( ) Yes ( ) No

Part III - General Questions

16. Did you have any problems with the wrapper script? ( ) Yes ( ) No

Appear when Yes is selected: Please explain.

17. Did you use any other software development tools? (e.g., Visual Studio, new compiler, new unit test framework) ( ) Yes ( ) No

Appear when Yes is selected: Please specify

Do you have any general comments about TDD?

Appendix C MONTHLY SURVEY

Monthly Survey

1. How was the effectiveness of employing TDD on the productivity?
( ) Very Effective ( ) Effective ( ) Somewhat Effective ( ) Neither Effective nor Ineffective ( ) Somewhat Ineffective ( ) Ineffective ( ) Very Ineffective

2. How was the effectiveness of refactoring code on the productivity?
( ) Very Effective ( ) Effective ( ) Somewhat Effective ( ) Neither Effective nor Ineffective ( ) Somewhat Ineffective ( ) Ineffective ( ) Very Ineffective

3. Based on your experience, what are the benefits and disadvantages of TDD?

4. When using TDD, how often do you design the software before writing the code? (Software design is the process of defining software methods, functions, objects, and the overall structure and interaction of your code, e.g., create flow charts, class diagrams)
( ) Very Frequently ( ) Frequently ( ) Occasionally ( ) Rarely ( ) Very Rarely ( ) Never

5. When using TDD, how often do you perform any software design activities during code development? (For example, modify an original design, create a new design, compare with original design)
( ) Very Frequently ( ) Frequently ( ) Occasionally ( ) Rarely ( ) Very Rarely ( ) Never

6. Besides refactoring, did you use other techniques or approaches to improve the code? ( ) Yes ( ) No

Appear when Yes is selected: Please specify

7. Overall, how did you identify poor code or poor design?

8. In your opinion, which refactoring technique(s) are very helpful for your work, and why?
[ ] Breaking large methods up into smaller methods
[ ] Renaming Methods, Variables or Classes
[ ] Simplifying control structures (e.g. series of if statements, or nested loops, etc.)
[ ] Creating Encapsulated Field (e.g. using getter and setter methods to make public member data private)
[ ] Splitting large classes (Move part of the code from existing class into a new class)
[ ] Adding or Removing parameters from a method
[ ] Moving methods or fields of a class to a super class
[ ] Moving methods or fields of a class to a sub-class
[ ] Applying a design pattern
[ ] Other

Appear when Other is selected: Please specify

9. Based on Question 8, why do you think those refactoring techniques are helpful? Please explain.

10. What did you learn about the problem of writing tests in your project? How did you solve such problems?

11. What did you learn about the problem of refactoring the code in your project? How did you solve such problems?

Appendix D BACKGROUND QUESTIONNAIRE

Experience Questionnaire

What is your previous experience with software development in practice? (Check the bottom-most item that applies.)
__ I have never developed software.
__ I have developed software on my own.
__ I have developed software as a part of a team, as part of a course.
__ I have developed software as a part of a team, in industry one time.
__ I have worked on multiple projects in industry.

Please explain your answer. Include the number of semesters or number of years of relevant experience. (E.g. “I worked for 10 years as a programmer in industry”; “I worked on one large project in industry”; “I developed software as part of a class project”; etc…) ______

Education and Training

Please provide your educational background (i.e. list degrees and Majors – B.S. in Chemistry; M.S. in Chemistry, etc.) ______

Please describe any other significant work experience in fields other than your educational background. ______

Work Experience
• Years of programming work experience: ______
• Years of scientific software development experience: ______
• Years of parallel software development experience: ______

Programming and Test-Driven Development (TDD) Experience

Please rate your experience in this section with respect to the following 5-point scale: (Please include any relevant comments below each section) 1 = No experience; 2 = learned in class or from book; 3 = used on a class project; 4 = used on one project in industry; 5 = used on multiple projects in industry

• Please rate your C++ programming skills: 1 2 3 4 5
• Please rate your Matlab programming skills: 1 2 3 4 5
• Please rate your parallel programming skills: 1 2 3 4 5
• Please rate your TDD experience: 1 2 3 4 5
• Approximately how many lines of C++ code have you written: ______
• Approximately how many lines of Matlab code have you written: ______

Comments / other experiences: ______

Appendix E SURVEY: TEST-DRIVEN DEVELOPMENT IN COMPUTATIONAL SCIENCE AND ENGINEERING

University of Alabama
Informed Consent for a Research Study

Dr. Jeffrey Carver and Aziz Nanthaamornphong from the University of Alabama are conducting a research study called “Survey of Test-Driven Development in Computational Science and Engineering”. They wish to understand the usability of different development approaches for computational science and engineering software.

Taking part in this study involves completing a web survey that will take about 20 minutes. This survey contains questions about your project and your experience in software development.

All data gathered will be anonymous from the time of collection. No identifying information will be associated with the data. There is no risk associated with lack of privacy. Only Dr. Carver and his graduate students will have access to the data. The data will be stored on Dr. Carver’s password protected computer and secured survey hosting site (Qualtrics.com). Only summarized data will be presented at meetings or in publications.

There will be no direct benefits to you. This study will help the researchers understand current computational science and engineering software development practices and areas of future need.

If you have questions about the study, please call Dr. Carver at 205-348-9829. If you have any questions, concerns, or complaints about your rights as a research participant you may contact Ms. Tanta Myles, The University of Alabama Research Compliance Officer, at (205)-348-8461 or toll-free at 877-820-3066. You may also ask questions, make suggestions, or file complaints and concerns through the IRB Outreach website at http://osp.ua.edu/site/PRCO_Welcome.html or email us at [email protected].

After you participate, you are encouraged to complete the survey for research participants that is online at the outreach website or you may ask the investigator for a copy of it and mail it to the UA Office for Research Compliance, Box 870127, 358 Rose Administration Building, Tuscaloosa, AL 35487-0127.

YOUR PARTICIPATION IS COMPLETELY VOLUNTARY. You are free not to participate or stop participating any time before you submit your answers.

If you understand the statements above, are at least 19 years old, and freely consent to be in this study, choose "Agree". Otherwise, choose "Disagree".
○ Agree
○ Disagree
If Disagree is selected, then skip to the end of the survey.

1. For which type of organization do you currently work?
○ University
○ Government Laboratory
○ Private Company
○ Other

Appears when Other is selected:
Please specify.

2. What type of projects do you typically work on?
□ Research (main goal is to publish papers)
□ Production (main goal is to produce software for real users)
□ Other

Appears when Other is selected:
Please describe.

3. Please describe any other significant work experience in fields other than your educational background.

4. Please describe your educational background (i.e., list degrees and majors, e.g., B.S. in Chemistry, M.S. in Chemistry).

5. How many years have you been developing real computational science and engineering software projects?

6. Please rate your programming language skills.

Language       No experience   Learned in a class or book,   Used on one    Used on multiple
                               but never used on a real      real project   real projects
                               project
C++            ○               ○                             ○              ○
Fortran        ○               ○                             ○              ○
C              ○               ○                             ○              ○
Matlab         ○               ○                             ○              ○
Python         ○               ○                             ○              ○
Java           ○               ○                             ○              ○
Visual Basic   ○               ○                             ○              ○
C#             ○               ○                             ○              ○
Perl           ○               ○                             ○              ○
Smalltalk      ○               ○                             ○              ○
Haskell        ○               ○                             ○              ○
Mathematica    ○               ○                             ○              ○

7. Do you know Test-Driven Development (TDD)?
○ Yes
○ No
If No is selected, then skip to "8. Do you have any plan to learn or employ TDD for your computational science and engineering project?"

8. What is your previous experience with TDD in a computational science and engineering project? (Check the item that applies the most.)
○ I have only learned TDD without implementing it in any real project.
○ I have employed TDD as part of a course, but never used it on any real project.
○ I have employed TDD on only one real project.
○ I have employed TDD on many real projects.

9. How have you obtained your Test-Driven Development skill?
□ Reading Books
□ Training Course
□ Co-workers
□ Learning on my own from online resources
□ Other

Appears when Other is selected:
Please describe.

10. Please rank the software qualities based on their importance to your software [1 - Most Important] (Drag & Drop to rearrange the quality concerns)
______ Compatibility (The ability of two or more software components to exchange information and/or to perform their required functions while sharing the same hardware or software environment.)
______ Functional suitability (The degree to which the software product provides functions that meet stated and implied needs when the software is used under specified conditions.)
______ Maintainability (The degree to which the software product can be modified. Modifications may include corrections, improvements or adaptation of the software to changes in environment, and in requirements and functional specifications.)
______ Operability (The degree to which the software product can be understood, learned, used and attractive to the user, when used under specified conditions.)
______ Performance efficiency (The degree to which the software product provides appropriate performance, relative to the amount of resources used, under stated conditions.)
______ Reliability (The degree to which the software product can maintain a specified level of performance when used under specified conditions.)
______ Security (The protection of system items from accidental or malicious access, use, modification, destruction, or disclosure.)
______ Transferability (The degree to which the software product can be transferred from one environment to another.)

11. Based on Question 10, how effective was employing TDD with respect to the most important software quality?

12. Based on Question 10, how effective was refactoring with respect to the most important software quality?

13. Based on your experience, what are the benefits and disadvantages of TDD?

14. Have you ever employed TDD in a parallel computing project?
○ Yes
○ No

Appears when Yes is selected:
How effective was employing TDD in the parallel project with respect to the most important software quality (Question 10)?

15. Which techniques did you use to refactor the code? (Code refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior. Refactoring is undertaken in order to improve some of the nonfunctional attributes of the software.)
□ Breaking large methods up into smaller methods
□ Renaming Methods, Variables or Classes
□ Simplifying control structures (e.g., series of if statements, nested loops)
□ Creating Encapsulated Field (e.g., using getter and setter methods to make public member data private)
□ Splitting large classes (moving part of the code from an existing class into a new class)
□ Adding or Removing parameters from a method
□ Moving methods or fields of a class to a super class
□ Moving methods or fields of a class to a sub-class
□ Applying the design pattern(s)
□ Other

Appears when Applying the design pattern(s) is selected:
Please provide the names of the design patterns used.

Appears when Other is selected:
Please describe the refactoring technique that you used.

16. When using TDD, how often do you design the software before writing the code? (Software design is the process of defining software methods, functions, objects, and the overall structure and interaction of your code, e.g., creating flow charts or class diagrams.)
○ Very frequently
○ Frequently
○ Occasionally
○ Rarely
○ Very Rarely
○ Never

17. When using TDD, how often do you perform any software design activities during code development? (For example, modifying an original design, creating a new design, or comparing with the original design.)
○ Very frequently
○ Frequently
○ Occasionally
○ Rarely
○ Very Rarely
○ Never

18. Besides refactoring, did you use other techniques or approaches to improve the code?
○ Yes
○ No

Appears when Yes is selected:
Please describe.

19. Overall, how did you identify poor code or poor design?

20. Did you use any automated testing tools (e.g., CMake, CTest, GTest)?
○ Yes
○ No

Appears when Yes is selected:
Please specify the tool you used.

21. Please rank these activities in terms of difficulty relative to TDD. [1 - Most difficult] (Drag & Drop to rearrange the activities)
______ Write a test
______ Write code to make the test pass
______ Refactoring

22. Please explain the testing method(s) that you used in the project.

23. Do you agree with this given definition of 'Unit Testing'?
Definition: Testing the smallest testable units (e.g., class, module, function) in a software system in isolation. Usually done with a specialized unit testing framework.
○ Yes
○ No

Appears when No is selected:
If No, please explain.

24. Do you agree with this given definition of 'Integration Testing'?
Definition: Testing that occurs after Unit Testing and is intended to ensure that the units interact properly.
○ Yes
○ No

Appears when No is selected:
If No, please explain.

25. Do you agree with this given definition of 'Regression Testing'?
Definition: Rerunning test cases (which were successful in the past) to ensure that changes to the code have not introduced bugs.
○ Yes
○ No

Appears when No is selected:
If No, please explain.

26. What did you learn about the problems of writing tests in your project? How did you solve such problems?

27. What did you learn about the problems of refactoring the code in your project? How did you solve such problems?
Go to "Comments about questions / what else would you like to tell us about TDD?"

8. Do you have any plan to learn or employ TDD for your computational science and engineering project?
○ Yes
○ No

Appears when Yes is selected:
Why will you use TDD for your computational science and engineering project?

9. Do you currently use any specific software development process in your computational science and engineering project (e.g., Agile, RUP, Waterfall)?
○ Yes
○ No

Appears when Yes is selected:
Please specify the method you used.

Appears when No is selected:
Please briefly describe the software development method for your computational science and engineering project.

10. Please explain the testing method(s) that you used in the project.

11. Do you agree with this given definition of 'Unit Testing'?
Definition: Testing the smallest testable units (e.g., class, module, function) in a software system in isolation. Usually done with a specialized unit testing framework.
○ Yes
○ No

Appears when No is selected:
If No, please explain.

12. Do you agree with this given definition of 'Integration Testing'?
Definition: Testing that occurs after Unit Testing and is intended to ensure that the units interact properly.
○ Yes
○ No

Appears when No is selected:
If No, please explain.

13. Do you agree with this given definition of 'Regression Testing'?
Definition: Rerunning test cases (which were successful in the past) to ensure that changes to the code have not introduced bugs.
○ Yes
○ No

Appears when No is selected:
If No, please explain.

Comments about questions / what else would you like to tell us about TDD?

Appendix F INSTITUTIONAL REVIEW BOARD CERTIFICATIONS
