
SIGCSE: G: Empirical Assessment of Software Documentation Strategies: A Randomized Controlled Trial

Scott Kolodziej, Texas A&M University, College Station, Texas. [email protected]

ABSTRACT

Software documentation is an important part of teaching students how to be effective programmers. But what evidence do we have to support what good documentation looks like? This study utilizes a randomized controlled trial to experimentally compare several different types of documentation, including traditional comments, self-documenting naming, and an automatic documentation generator. The results of this experiment show that the relationship between documentation and source code understanding is more complex than simply "more is better," and poorly documented code may even lead to a more correct understanding of the source code.

CCS CONCEPTS

• General and reference → Empirical studies; • Software and its engineering → Documentation; Software development techniques; Automatic programming;

KEYWORDS

Software documentation, empirical study, randomized controlled trial, automatic documentation generation, self-documenting code

ACM Reference Format:
Scott Kolodziej. 2019. SIGCSE: G: Empirical Assessment of Software Documentation Strategies: A Randomized Controlled Trial. In Proceedings of ACM Student Research Competition Grand Finals. ACM, New York, NY, USA, 5 pages.

1 PROBLEM AND MOTIVATION

When discussing source code documentation, a question invariably arises: What does good documentation look like? High-quality comments and good variable names are standard recommendations, but what experimental evidence exists to support this standard?
Historically, technical limitations in programming languages such as line length limits have restricted the amount and type of in-source documentation available. However, as these limitations have been removed, the available options for documentation have rapidly increased and diversified. Coding standards frequently describe how source code should be documented [2, 5–7], and popular reference books provide detailed arguments for and against certain forms of documentation [9, 10].

Anecdote, case studies, and thought experiments are often provided as evidence in these standards and references, and recent advances in data mining source code repositories can provide hints to good programming practices. However, well-designed experiments may provide significant insight and further test these hypotheses and practices [3, 19]. This randomized controlled trial compares several different documentation styles under laboratory conditions to determine their effect on a programmer's ability to understand what a program does.

2 BACKGROUND AND RELATED WORK

Several previous studies have examined the effects of software documentation on program comprehension, but few have focused specifically on source code comments and variable names rather than external documentation formats. Fewer still can be classified as controlled experiments [15]. One experiment by Sheppard et al. found that descriptive variable names had no statistically significant impact on the ability of a programmer to recreate functionally equivalent code from memory [14]. However, the same study noted that as participants wrote their functionally equivalent code, they generally used meaningful variable names, even when rewriting source code that had less meaningful names.

Another study by Woodfield et al. demonstrated the efficacy of comments when interpreting source code written in Fortran 77 [22], with similar studies conducted by Tenny using PL/I [18] and Takang et al. using Modula-2 [17] showing similar trends. While these studies lend strong evidence to the claim that comments improve comprehension of source code, their findings may not generalize to today's modern programming languages.

More recently, Endrikat et al. conducted a controlled experiment to study the effect that static typing and external (i.e., not in-source) documentation of an existing API had on development time [4]. While the beneficial effect of static typing over dynamic typing was statistically significant and substantial, the effect of additional API documentation was not statistically significant. A similar study on software maintenance by Tryggeseth showed that external documentation of a large codebase with no existing comments decreased the time programmers took to modify the system by 21.5%, supporting the efficacy of external documentation in the absence of comments [20].

Developers tend to be more willing to change code than to change accompanying comments, which can lead to misleading documentation and future bugs [8]. This may imply that self-documenting code is a better documentation strategy than comments alone. An observational study based on open-source software repositories indicated that more verbose comments may result in more change-prone code, again challenging the efficacy of comments [1].

Automatic documentation generation is a recent development that uses machine learning to document code based on established examples of well-documented code [16]. The resulting documentation often uses a mixture of variable renaming and additional comments. One such generator, JSNice, uses this approach to generate documentation for source code written in JavaScript [12].

In this work, we compare the effect of comments, self-documenting naming, and automatic documentation generation in a modern programming language (C++). Specifically, we measure the effects of these documentation styles on programmer response time and accuracy when determining what a short code snippet does. Additionally, this study is an instance of a randomized controlled trial, meaning that the experimental categories are randomly assigned to avoid bias and a control category (representing no documentation) is present.

3 APPROACH AND UNIQUENESS

In this experiment, three forms of documentation were compared:

• Commenting, non-functional descriptive text in the source code (see Listing 1).
• Self-documenting naming, descriptively naming variables and functions so as to imply that further documentation is unnecessary (see Listing 2).
• Automatic generation, using a machine learning-based tool to generate both comments and descriptive names (see Listing 3).

// Sum all elements in the array a
double s = 0;
int n = a.size(); // length of the array
for (int i = 0; i < n; i++) {
    s += a[i];
}

// Compute the arithmetic mean
double x = s / n;

Listing 1: Example of Commented Code

double arraySum = 0;
int numElements = array.size();
for (int index = 0; index < numElements; index++) {
    arraySum += array[index];
}

double mean = arraySum / numElements;

Listing 2: Example of Self-Documenting Code

/** @type {number} */
double s = 0;
int sl = a.size();
/** @type {number} */
int i = 0;
for (; i < sl; i++) {
    s = s + a[i];
}
/** @type {number} */
double v1 = s / sl;

Listing 3: Example of Generated Documentation

Because commenting and self-documenting naming are not mutually exclusive, five experimental categories were created:

(A) No comments with poor naming. This category served as the control. No comments were provided, and all variable and function names were random alphabetic characters.
(B) No comments with self-documenting naming. No comments were provided, but variables and functions were descriptively named.
(C) Descriptive comments with poor naming. Comments were included in the source code, but variables and functions were named with random alphabetic characters.
(D) Descriptive comments with self-documenting naming. Both comments and descriptive names were included, otherwise referred to as full documentation.
(E) Automatic generation. Comments and names generated using JSNice.
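For completeness, the two categories without a corresponding listing above, (A) and (D), are illustrated below using the same running example. These are hypothetical reconstructions for illustration only, not listings from the study; the arbitrary names (q, rz, kx, wb, vn) are invented.

// Category (A): no comments with poor naming (illustrative reconstruction).
double q = 0;
int rz = kx.size();
for (int wb = 0; wb < rz; wb++) {
    q += kx[wb];
}
double vn = q / rz;

// Category (D): descriptive comments with self-documenting naming,
// i.e. full documentation (illustrative reconstruction).

// Sum all elements in the array
double arraySum = 0;
int numElements = array.size(); // length of the array
for (int index = 0; index < numElements; index++) {
    arraySum += array[index];
}

// Compute the arithmetic mean
double mean = arraySum / numElements;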
3.1 Experimental Design and Methods

Five code snippets, ranging between 23 and 96 lines of code, were written in C++ and documented in each of the five styles. The snippets were created with the goal of being representative samples of software source code, with each snippet including 1–2 concepts selected from object-oriented programming, algorithms, data structures, and recursion.

As no C++-based documentation generator currently exists, the JavaScript-based JSNice was used [12]. Thus, the code snippets were translated from C++ to equivalent JavaScript, stripped of all forms of documentation, submitted to JSNice, and the resulting source code was translated back to C++.

Twenty undergraduate students majoring in computer science or computer engineering who had completed an introductory course in C++ were recruited. After informed consent, each participant was given a brief demographic survey (to account for programming experience and year in major), a practice question to familiarize themselves with the format of the test, and a test to assess their understanding of the documented code snippets. Participants were asked to respond to three multiple-choice or fill-in-the-blank questions per snippet, with the code still visible.

To avoid experimental bias, the test contained all five code snippets in a randomized order and randomized documentation style such that each participant was presented with all five snippets and all five documentation styles. This allowed each participant to serve as their own control. This randomization was accomplished using a 5 × 5 Latin square experimental design (see Figure 1).

The experiment was conducted under controlled conditions with privacy dividers and no outside assistance. Pen and paper were provided for all participants, and the test was administered on a laptop with a wireless mouse. While no time limit was imposed, all participants completed the experiment within one hour.

Figure 1: Latin Square Experimental Design. Four 5 × 5 Latin squares (two shown) were used to randomize the assignment of documentation/snippet permutations to participants. Each participant was presented with one of each of the five snippets, and one of each of the five documentation styles. The order of the snippets was randomized every five participants.
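To make this assignment concrete, the following sketch generates a cyclic 5 × 5 Latin square and prints the documentation style assigned to each participant/snippet pair. It is a minimal illustration of the design, not code from the study (the study used four Latin squares and reshuffled snippet order every five participants); makeLatinSquare and all names here are our own.

#include <array>
#include <cstdio>

// Cyclic 5x5 Latin square: entry (row, col) = (row + col) mod 5.
// Each style 0..4 (A..E) appears exactly once in every row and column.
std::array<std::array<int, 5>, 5> makeLatinSquare() {
    std::array<std::array<int, 5>, 5> square{};
    for (int row = 0; row < 5; row++) {
        for (int col = 0; col < 5; col++) {
            square[row][col] = (row + col) % 5;
        }
    }
    return square;
}

int main() {
    auto square = makeLatinSquare();
    // Row = participant (within a group of five), column = snippet:
    // every participant sees all five snippets and all five styles.
    for (int participant = 0; participant < 5; participant++) {
        for (int snippet = 0; snippet < 5; snippet++) {
            std::printf("participant %d, snippet %d -> style %c\n",
                        participant, snippet,
                        'A' + square[participant][snippet]);
        }
    }
    return 0;
}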

4 RESULTS AND CONTRIBUTIONS

The results of the experiment demonstrate that several trade-offs may exist in choosing the most effective documentation strategy. Figure 2 shows the effect of the different documentation types on the response time of the participant, while Figure 3 shows the effect of the documentation types on the correctness of the participant's response. Generally, the addition of comments or self-documenting naming decreases response time compared to the control group, but also decreases response accuracy. Automatically generated documentation yields results similar to no documentation at all.

Figure 2: Effect of Documentation Style on Response Time. Raw data box plots showing median, interquartile range, and outliers of the response times for each documentation style.

Figure 3: Effect of Documentation Style on Response Accuracy. Raw data box plots showing median, interquartile range, and outliers of the response accuracy for each documentation style.

4.1 Statistical Analyses

Additional statistical analysis using generalized linear models allowed us to control for several factors:

• Code snippet. Our analysis controlled for certain snippets being inherently more or less difficult to understand.
• Participant. We controlled for variability in each participant's inherent ability, including their average response time and accuracy.
• Time. When analyzing response accuracy, responses were binned into two-minute intervals, allowing us to consider correctness given an approximately constant response time.
• Accuracy. When analyzing response time, responses were binned by the number of questions correctly answered, allowing us to consider response time given a constant response accuracy.
• Snippet and documentation style ordering. While these factors were thoroughly randomized, we also determined that there were no substantial learning effects or attrition over the duration of the test.

To determine which documentation styles are statistically different from one another, we used Tukey's honest significant difference (HSD) test to compare all pairs of means [21]. For the response time model, we applied a square root transformation to response time, as residual variability increased at higher values of time (i.e., most participants finished in approximately the same amount of time, but a sizable fraction took significantly longer to complete the test). For the response accuracy model, we used a generalized linear model with a binomial link function to account for the discrete response variable (correct or incorrect for each question). These results are reported as an odds ratio, e.g., documentation style A has 1.3 times higher odds of eliciting a correct response than documentation style B. A threshold of α = 0.05 was used for determining statistical significance.

As the documentation styles form two orthogonal factors when neglecting the automatic generation case, the categories can be grouped as follows:

• All cases without comments (A ∪ B) against all cases with comments (C ∪ D).
• All cases without self-documenting naming (A ∪ C) against all cases with self-documenting naming (B ∪ D).

This grouping allows us to increase our statistical power, as we are only comparing two cases with approximately twice as many data points in each category.

An additional analysis was used to determine which documentation styles are statistically indistinguishable, in contrast to the previous analyses, which determined which documentation styles are statistically different.
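To make the reported statistics concrete, the sketch below computes an odds ratio from hypothetical counts of correct and incorrect responses and applies the square root transformation to a response time. The counts are invented for illustration; the actual analysis fit a generalized linear model rather than computing raw odds directly.

#include <cmath>
#include <cstdio>

// Odds of a correct response: correct / incorrect.
double odds(int correct, int incorrect) {
    return static_cast<double>(correct) / incorrect;
}

int main() {
    // Hypothetical counts (NOT the study's data) for styles A and C.
    int correctA = 40, incorrectA = 20; // odds = 2.0
    int correctC = 10, incorrectC = 50; // odds = 0.2

    // An odds ratio > 1 means the first style has higher odds of
    // eliciting a correct response than the second.
    double oddsRatio = odds(correctA, incorrectA) / odds(correctC, incorrectC);
    std::printf("odds ratio (A vs. C) = %.1f\n", oddsRatio);

    // Response times were compared on a square root scale to
    // stabilize the larger variance of slower participants.
    double timeSeconds = 284.6; // the minimum mean time from Table 1
    std::printf("sqrt-transformed time = %.2f\n", std::sqrt(timeSeconds));
    return 0;
}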

Table 1: Statistics for Response Time Analysis. The leftmost column describes the two documentation styles being compared, with the last two rows comparing two groups of documentation styles. The first category listed in each pair corresponds to a longer response time. The difference between means is reported in square root time, along with the 95% confidence interval for the difference between the square root means. An approximate untransformed value based on the minimum mean time of 284.6 seconds is listed, along with a p-value for the square root means being the same (H0: μA = μB), so a statistically significantly small p-value of p < 0.05 implies that the means are different.

Category Pair        Δ√t    95% Conf. Interval   Approx. Δt   p-value
A − B                1.43   [-1.29, 4.16]         50s         0.582
A − C                2.09   [-0.60, 4.78]         75s         0.201
A − D                3.17   [ 0.52, 5.83]        117s         0.011*
A − E                0.14   [-2.51, 2.80]          5s         1.000
B − C                0.65   [-2.07, 3.38]         22s         0.962
B − D                1.74   [-0.95, 4.43]         62s         0.376
E − B                1.29   [-1.40, 3.98]         45s         0.665
C − D                1.09   [-1.57, 3.74]         38s         0.781
E − C                1.94   [-0.71, 4.60]         69s         0.253
E − D                3.03   [ 0.41, 5.65]        111s         0.015*
(A ∪ B) − (C ∪ D)    0.22   [1.54, 0.15]           7s         0.987
(A ∪ C) − (B ∪ D)    3.17   [1.43, 2.22]         117s         0.059

Table 2: Statistics for Response Accuracy. The leftmost column describes the two documentation styles being compared, with the last two rows comparing two groups of documentation styles. The first category listed in each pair has better odds of eliciting a correct answer, and the odds ratio is reported in the middle column. The p-value for equal odds (H0: μA = μB) is given in the last column; a small p-value of p < 0.05 implies that the odds of eliciting a correct answer are statistically significantly different.

Category Pair        Odds Ratio   p-value
A / B                 1.30        0.995
A / C                10.03        0.011*
A / D                 4.09        0.164
A / E                 3.14        0.468
B / C                 7.71        0.005**
B / D                 3.14        0.257
B / E                 2.41        0.561
C / D                 2.45        0.435
C / E                 3.19        0.215
E / D                 1.30        0.990
(A ∪ B) / (C ∪ D)    31.52        0.0004***
(B ∪ D) / (A ∪ C)     1.88        0.669

When considering response time, a threshold value of ε = 20 seconds was used (i.e., controlled response time averages within 20 seconds of each other are considered the same). However, the results of this analysis indicated that none of the documentation styles were statistically indistinguishable at this threshold.

4.2 Conclusions

The numerical results of these analyses for both response time and response accuracy are shown in Tables 1 and 2, respectively. The key conclusions and associated p-values from the analysis are listed as follows:

(1) The no documentation (control) style (A) has 10 times higher odds of eliciting a correct answer than comments alone (C) (p = 0.011), even when controlling for response time. When grouped together, all cases without comments (A ∪ B) have 31 times higher odds of eliciting a correct answer than all cases with comments (C ∪ D), with a p-value of p = 0.0004.

One interpretation of this result is that documentation can provide a false sense of comprehension on the part of the programmer, leading them to believe that they understand the code through the documentation without taking time to truly determine how the program functions. It should also be noted that the documentation was not meant to be misleading in any way, and that this phenomenon was visible across all code snippets.

This result calls into question traditional wisdom regarding the benefits of comments as documentation [11], but it cannot be ignored given both its magnitude and strong statistical significance. Further study to determine under what conditions comments are most beneficial (or detrimental) is warranted. If substantiated, this trend could potentially be exploited in development tools: an editor could hide comments to help a programmer focus entirely on the mechanics of the code and improve their comprehension of it.

(2) Descriptive naming alone (B) has almost 8 times higher odds of eliciting a correct answer than comments alone (C) after controlling for response time (p = 0.005).

This result most directly answers the question of whether descriptive, "self-documenting" naming is more or less effective than comments: indeed, naming was the more beneficial factor in our analysis, at least with respect to correctness of understanding.

(3) The no documentation (A) and automatically generated (E) documentation cases take substantially more time (mean times >100 seconds apart) to interpret than full documentation (D) after controlling for response accuracy (p = 0.011 and p = 0.015, respectively).

While these two categories of documentation yielded more correct answers, the time participants took to read and interpret the code was disproportionately greater. This suggests that participants resort to deciphering and mentally executing the code itself in the absence of documentation.

(4) Documentation styles that elicit more correct responses generally take longer to interpret; the inverse is also true. In other words, there is a weak inverse relationship between response time and accuracy.

This is a surprising result, as one would expect that documentation styles that are difficult to interpret or unhelpful would lead to both increased response times and fewer correct responses, with the inverse true for more helpful documentation styles. Instead, our data suggest that a trade-off exists, and that a lack of good in-source documentation may lead to a better understanding of the functionality of the code at the cost of time.

4.3 Threats to External Validity

As with any empirical study, we list below several reasons the results of this study may not generalize to other contexts without further study or verification:

• Student developers versus professionals. The experiment sampled exclusively from a student population, which may or may not generalize well to professional developers [13].
• Programming language-specific effects. This experiment was conducted in C++, as undergraduate computer science and computer engineering majors at Texas A&M University are required to complete an introductory course in C++. It is unknown whether these results will generalize to other programming languages. It is interesting to note that our results are substantially different from previous studies done in Fortran and PL/I [18, 22], but the precise cause for this difference warrants further investigation.
• Code snippets versus large codebases. Each code snippet in this experiment was an isolated and self-contained program, but whether these results will generalize to larger codebases is unknown.

4.4 Future Directions

Moving forward, replication of this experiment via a second trial of similar size could help to greatly increase the statistical significance of the results and further define the effects of each documentation strategy relative to one another. Additionally, further studies are planned to better understand the detrimental effect that comments had on participant accuracy and to distinguish it from scenarios where comments are more beneficial. For example, previous studies have suggested that comments may be beneficial for software maintenance tasks [14], but also that they risk becoming outdated in that same context [1, 8]. The location and type of comments can also vary greatly, with several broad categories including header comments (describing what an entire class or function does), multi-line block comments, and single-line comments. In our study and in the majority of the existing literature, no firm distinction is made regarding the type of comments used, so future work to differentiate the efficacy of each would be beneficial.

REFERENCES

[1] Hirohisa Aman, Sousuke Amasaki, Tomoyuki Yokogawa, and Minoru Kawahara. 2017. Empirical Analysis of Words in Comments Written for Java Methods. IEEE, 375–379. https://doi.org/10.1109/SEAA.2017.23
[2] Apple, Inc. 2018. WebKit Code Style Guidelines. Retrieved from https://webkit.org/code-style-guidelines/. Accessed: 2019-04-14.
[3] Victor R. Basili. 2007. The role of controlled experiments in software engineering research. In Empirical Software Engineering Issues. Critical Assessment and Future Directions. Springer, 33–37.
[4] Stefan Endrikat, Stefan Hanenberg, Romain Robbes, and Andreas Stefik. 2014. How do API documentation and static typing affect API usability?. In Proceedings of the 36th International Conference on Software Engineering. ACM, 632–642.
[5] Free Software Foundation, Inc. 1992. GNU Coding Standards. Retrieved from https://www.gnu.org/prep/standards/standards.html. Accessed: 2019-04-14.
[6] Google. 2019. Google C++ Style Guide. Retrieved from http://google.github.io/styleguide/cppguide.html. Accessed: 2019-04-14.
[7] Google. 2019. Google Developer Documentation Style Guide. Retrieved from https://developers.google.com/style/. Accessed: 2019-04-14.
[8] Walid M. Ibrahim, Nicolas Bettenburg, Bram Adams, and Ahmed E. Hassan. 2012. On the relationship between comment update practices and software bugs. Journal of Systems and Software 85, 10 (2012), 2293–2304.
[9] Robert C. Martin. 2009. Clean Code: A Handbook of Agile Software Craftsmanship. Pearson Education.
[10] Steve McConnell. 2004. Code Complete. Pearson Education.
[11] Jef Raskin. 2005. Comments are more important than code. Queue 3, 2 (2005), 64–65.
[12] Veselin Raychev, Martin Vechev, and Andreas Krause. 2015. Predicting program properties from big code. In ACM SIGPLAN Notices, Vol. 50. ACM, 111–124.
[13] Iflaah Salman, Ayse Tosun Misirli, and Natalia Juristo. 2015. Are students representatives of professionals in software engineering experiments?. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 1. IEEE, 666–676.
[14] S. B. Sheppard, B. Curtis, P. Milliman, and T. Love. 1979. Modern Coding Practices and Programmer Performance. Computer 12, 12 (1979), 41–49. https://doi.org/10.1109/MC.1979.1658575
[15] D. I. K. Sjoeberg, J. E. Hannay, O. Hansen, V. B. Kampenes, A. Karahasanovic, N.-K. Liborg, and A. C. Rekdal. 2005. A survey of controlled experiments in software engineering. IEEE Transactions on Software Engineering 31, 9 (Sept. 2005), 733–753. https://doi.org/10.1109/TSE.2005.97
[16] Giriprasad Sridhara, Emily Hill, Divya Muppaneni, Lori Pollock, and K. Vijay-Shanker. 2010. Towards Automatically Generating Summary Comments for Java Methods. In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering (ASE '10). ACM, New York, NY, USA, 43–52. https://doi.org/10.1145/1858996.1859006
[17] Armstrong A. Takang, Penny A. Grubb, and Robert D. Macredie. 1996. The effects of comments and identifier names on program comprehensibility: an experimental investigation. J. Prog. Lang. 4, 3 (1996), 143–167. http://compscinet.dcs.kcl.ac.uk/JP/jp040302.abs.html
[18] Ted Tenny. 1988. Program readability: Procedures versus comments. IEEE Transactions on Software Engineering 14, 9 (1988), 1271–1279.
[19] Walter F. Tichy. 1998. Should computer scientists experiment more? Computer 31, 5 (1998), 32–40.
[20] Eirik Tryggeseth. 1997. Report from an experiment: Impact of documentation on maintenance. Empirical Software Engineering 2, 2 (1997), 201–207.
[21] John W. Tukey. 1949. Comparing Individual Means in the Analysis of Variance. Biometrics 5, 2 (1949), 99–114. http://www.jstor.org/stable/3001913
[22] Scott N. Woodfield, Hubert E. Dunsmore, and Vincent Yun Shen. 1981. The effect of modularization and comments on program comprehension. In Proceedings of the 5th International Conference on Software Engineering. IEEE Press, 215–223.