A Randomized Controlled Trial

Total Page:16

File Type:pdf, Size:1020Kb

A Randomized Controlled Trial SIGCSE: G: Empirical Assessment of Software Documentation Strategies: A Randomized Controlled Trial Scott Kolodziej Texas A&M University College Station, Texas [email protected] ABSTRACT Anecdote, case studies, and thought experiments are often pro- Source code documentation is an important part of teaching stu- vided as evidence in these standards and references, and recent ad- dents how to be effective programmers. But what evidence dowe vances in data mining source code repositories can provide hints to have to support what good documentation looks like? This study good programming practices. However, well-designed experiments utilizes a randomized controlled trial to experimentally compare may provide significant insight and further test these hypotheses several different types of documentation, including traditional com- and practices [3, 19]. This randomized controlled trial compares ments, self-documenting naming, and an automatic documentation several different documentation styles under laboratory conditions generator. The results of this experiment show that the relationship to determine their effect on a programmer’s ability to understand between documentation and source code understanding is more what a program does. complex than simply "more is better," and poorly documented code may even lead to a more correct understanding of the source code. 2 BACKGROUND AND RELATED WORK Several previous studies have examined the effects of software CCS CONCEPTS documentation on program comprehension, but few have focused • General and reference → Empirical studies; • Software and specifically on source code comments and variable names rather its engineering → Documentation; Software development tech- than external documentation formats. Fewer still can be classified niques; Automatic programming; as controlled experiments [15]. One experiment by Sheppard et al. found that descriptive variable names had no statistically signifi- KEYWORDS cant impact on the ability of a programmer to recreate functionally Software documentation, empirical study, randomized controlled equivalent code from memory [14]. However, the same study com- trial, automatic documentation generation, self-documenting code mented that as participants wrote their functionally equivalent code, they generally used meaningful variable names, even when ACM Reference Format: rewriting source code that had less meaningful names. Scott Kolodziej. 2019. SIGCSE: G: Empirical Assessment of Software Doc- Another study by Woodfield et al. demonstrated the efficacy of umentation Strategies: A Randomized Controlled Trial. In Proceedings of comments when interpreting source code written in Fortran 77 ACM Student Research Competition Grand Finals. ACM, New York, NY, USA, [22], with similar studies conducted by Tenny using PL/1 [18] and 5 pages. Takang et al. using Modula 2 [17] showing similar trends. While these studies lend strong evidence to the claim that comments 1 PROBLEM AND MOTIVATION improve comprehension of source code, their validity may not be When discussing source code documentation, a question invariably generalizable to today’s modern programming languages. arises: What does good documentation look like? High-quality More recently, Endrikat et al. conducted a controlled experiment comments and good variable names are standard recommendations, to study the effect of static typing and external (i.e. not in-source) but what experimental evidence exists to support this standard? documentation of an existing API had on development time [4]. Historically, technical limitations in programming languages While the beneficial effect of static typing over dynamic typing was such as line length limits have restricted the amount and type of in- statistically significant and substantial, the effect of additional API source documentation available. However, as these limitations have documentation was not statistically significant. A similar study on been removed, the available options for documentation have rapidly software maintenance by Tryggeseth showed that external docu- increased and diversified. Coding standards frequently describe how mentation of a large codebase with no existing comments decreased source code should be documented [2, 5–7], and popular reference the time programmers took to modify the system by 21.5%, sup- books provide detailed arguments for and against certain forms of porting the efficacy of external documentation in the absence of documentation [9, 10]. comments [20]. Developers tend to be more willing to change code than to change Permission to make digital or hard copies of part or all of this work for personal or accompanying comments, which can lead to misleading documen- classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation tation and future bugs [8]. This may imply that self-documenting on the first page. Copyrights for third-party components of this work must be honored. code is a better documentation strategy than comments alone. An For all other uses, contact the owner/author(s). observational study based on open-source software repositories ACM Student Research Competition Grand Finals, 2019, indicated that more verbose comments may result in more change- © 2019 Copyright held by the owner/author(s). prone code, again challenging the efficacy of comments [1]. ACM Student Research Competition Grand Finals, 2019, Scott Kolodziej Automatic documentation generation is a recent development (A) No comments with poor naming. This category served as the that uses machine learning to document code based on established control. No comments were provided, and all variable and examples of well-documented code [16]. The resulting documen- function names were random alphabetic characters. tation often uses a mixture of variable renaming and additional (B) No comments with self-documenting naming. No comments comments. One such generator, JSNice, uses this approach to gen- were provided, but variables and functions were descriptively erate documentation for source code written in JavaScript [12]. named. In this work, we compare the effect of comments, self-documenting (C) Descriptive comments with poor naming. Comments were naming, and automatic documentation generation in a modern pro- included in the source code, but variables and functions gramming language (C++). Specifically, we measure the effects were named with random alphabetic characters. of these documentation styles on programmer response time and (D) Descriptive comments with self-documenting naming. Both accuracy when determining what a short code snippet does. Addi- comments and descriptive names were included, otherwise tionally, this study is an instance of a randomized controlled trial, referred to as full documentation. meaning that the experimental categories are randomly assigned to (E) Automatic generation. Comments and names generated using avoid bias and a control category (representing no documentation) JSNice. is present. 3.1 Experimental Design and Methods 3 APPROACH AND UNIQUENESS Five code snippets, ranging between 23 and 96 lines of code, were In this experiment, three forms of documentation were compared: written in C++ and documented in each of the five styles. The snip- • Commenting, non-functioning descriptive text in the source pets were created with the goal of being representative samples code (see Listing 1). of software source code, with each snippet including 1-2 concepts • Self-documenting naming, descriptively naming variables and selected from object-oriented programming, algorithms, data struc- functions to imply further documentation unnecessary (see tures, and recursion. Listing 2). As no C++-based documentation generator currently exists, the • Automatic generation, using a machine learning-based tool to JavaScript-based JSNice was used [12]. Thus, the code snippets were generate both comments and descriptive names (see Listing translated from C++ to equivalent JavaScript, stripped of all forms 3). of documentation, submitted to JSNice, and the resulting source code was translated back to C++. 1 // Sum all elements in the arraya Twenty undergraduate students majoring in computer science 2 doubles=0; or computer engineering were recruited who had completed an in- 3 int n = a.size();// length of the array 4 for(int i = 0; i < n; i++) { troductory course in C++. After informed consent, each participant 5 s += a[i]; was given a brief demographic survey (to account for programming 6 } 7 experience and year in major), a practice question to familiarize 8 // Compute the arithmetic mean themselves with the user interface, and a test to assess their under- 9 doublex=s/n; standing of the documented code snippets. Participants were asked Listing 1: Example of Commented Code to respond to three multiple-choice or fill-in-the-blank questions per snippet, with the code still visible. To avoid experimental bias, the test contained all five code snip- 1 double arraySum = 0; 2 int numElements = array.size(); pets in a randomized order and randomized documentation style 3 for(int index = 0; index < numElements; index++) { such that each participant was presented with all five snippets and 4 arraySum += array[index]; all five documentation styles. This allowed each participant to serve 5 } 6 as their own control. This randomization was accomplished using 7 double mean = arraySum / numElements; a 5 × 5 Latin square experimental design (see Figure
Recommended publications
  • Measuring the Software Size of Sliced V-Model Projects
    2014 Joint Conference of the International Workshop on Software Measurement and the International Conference on Software Process and Product Measurement Measuring the Software Size of Sliced V-model Projects Andreas Deuter PHOENIX CONTACT Electronics GmbH Gregor Engels Dringenauer Str. 30 University of Paderborn 31812 Bad Pyrmont, Germany Zukunftsmeile 1 Email: [email protected] 33102 Paderborn, Germany Email: [email protected] Abstract—Companies expect higher productivity of their soft- But, the “manufacturing” of software within the standard ware teams when introducing new software development methods. production process is just a very short process when bringing Productivity is commonly understood as the ratio of output the binaries of the software to the device. This process is hardly created and resources consumed. Whereas the measurement of to optimize and does not reflect the “production of software” the resources consumed is rather straightforward, there are at all. The creation of the software is namely done in the several definitions for counting the output of a software de- developing teams by applying typical software engineering velopment. Source code-based metrics create a set of valuable figures direct from the heart of the software - the code. However, methods. However, to keep up with the high demands on depending on the chosen process model software developers and implementing new functionality (e.g. for PLC) the software testers produce also a fair amount of documentation. Up to development process within these companies must improve. now this output remains uncounted leading to an incomplete Therefore, they start to analyze their software processes in view on the development output. This article addresses this open detail and to identify productivity drivers.
    [Show full text]
  • Introduction to Metamodeling for Reducing Computational Burden Of
    This is a repository copy of Introduction to metamodeling for reducing computational burden of advanced analyses with health economic models : a structured overview of metamodeling methods in a 6-step application process. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/156157/ Version: Accepted Version Article: Degeling, K., IJzerman, M.J., Lavieri, M.S. et al. (2 more authors) (2020) Introduction to metamodeling for reducing computational burden of advanced analyses with health economic models : a structured overview of metamodeling methods in a 6-step application process. Medical Decision Making, 40 (3). pp. 348-363. ISSN 0272-989X https://doi.org/10.1177/0272989X20912233 Degeling, K., IJzerman, M. J., Lavieri, M. S., Strong, M., & Koffijberg, H. (2020). Introduction to Metamodeling for Reducing Computational Burden of Advanced Analyses with Health Economic Models: A Structured Overview of Metamodeling Methods in a 6-Step Application Process. Medical Decision Making, 40(3), 348–363. Copyright © 2020 The Author(s) DOI: 10.1177/0272989X20912233 Reuse Items deposited in White Rose Research Online are protected by copyright, with all rights reserved unless indicated otherwise. They may be downloaded and/or printed for private study, or other acts as permitted by national copyright laws. The publisher or other rights holders may allow further reproduction and re-use of the full text version. This is indicated by the licence information on the White Rose Research Online record for the item. Takedown If you consider content in White Rose Research Online to be in breach of UK law, please notify us by emailing [email protected] including the URL of the record and the reason for the withdrawal request.
    [Show full text]
  • Beyond Accuracy: Assessing Software Documentation Quality
    Beyond Accuracy: Assessing Soware Documentation ality Christoph Treude Justin Middleton Thushari Atapattu [email protected] [email protected] [email protected] University of Adelaide North Carolina State University University of Adelaide Adelaide, SA, Australia Raleigh, NC, United States Adelaide, SA, Australia ABSTRACT provides initial evidence for the strengths and weaknesses of dif- Good software documentation encourages good software engi- ferent genres of documentation (blog articles, reference documen- neering, but the meaning of “good” documentation is vaguely de- tation, README files, Stack Overflow threads, tutorials) based on fined in the software engineering literature. To clarify this ambi- the ten dimensions of our software documentation quality frame- guity, we draw on work from the data and information quality work. community to propose a framework that decomposes documenta- The contributions of this work are: tion quality into ten dimensions of structure, content, and style. To • demonstrate its application, we recruited technical editorsto apply A ten-dimensional framework for asking questions about the framework when evaluating examples from several genres of software documentation quality, • software documentation. We summarise their assessments—for ex- A partially validated survey instrument to evaluate docu- ample, reference documentation and README files excel in quality ment quality over multiple documentation genres, and • whereas blog articles have more problems—and we describe our vi- A vision for the expansion of a unified quality framework sion for reasoning about software documentation quality and for through further experimentation. the expansion and potential of a unified quality framework. 2 BACKGROUND AND RELATED WORK CCS CONCEPTS The most related piece of work to this paper is the seminal 1995 • Software and its engineering → Documentation; • Social article “Beyond Accuracy: What Data Quality Means to Data Con- and professional topics → Quality assurance.
    [Show full text]
  • Software Development a Practical Approach!
    Software Development A Practical Approach! Hans-Petter Halvorsen https://www.halvorsen.blog https://halvorsen.blog Software Development A Practical Approach! Hans-Petter Halvorsen Software Development A Practical Approach! Hans-Petter Halvorsen Copyright © 2020 ISBN: 978-82-691106-0-9 Publisher Identifier: 978-82-691106 https://halvorsen.blog ii Preface The main goal with this document: • To give you an overview of what software engineering is • To take you beyond programming to engineering software What is Software Development? It is a complex process to develop modern and professional software today. This document tries to give a brief overview of Software Development. This document tries to focus on a practical approach regarding Software Development. So why do we need System Engineering? Here are some key factors: • Understand Customer Requirements o What does the customer needs (because they may not know it!) o Transform Customer requirements into working software • Planning o How do we reach our goals? o Will we finish within deadline? o Resources o What can go wrong? • Implementation o What kind of platforms and architecture should be used? o Split your work into manageable pieces iii • Quality and Performance o Make sure the software fulfills the customers’ needs We will learn how to build good (i.e. high quality) software, which includes: • Requirements Specification • Technical Design • Good User Experience (UX) • Improved Code Quality and Implementation • Testing • System Documentation • User Documentation • etc. You will find additional resources on this web page: http://www.halvorsen.blog/documents/programming/software_engineering/ iv Information about the author: Hans-Petter Halvorsen The author currently works at the University of South-Eastern Norway.
    [Show full text]
  • The Developer's Guide to Debugging
    The Developer’s Guide to Debugging Thorsten Grotker¨ · Ulrich Holtmann Holger Keding · Markus Wloka The Developer’s Guide to Debugging 123 Thorsten Gr¨otker Ulrich Holtmann Holger Keding Markus Wloka Internet: http://www.debugging-guide.com Email: [email protected] ISBN: 978-1-4020-5539-3 e-ISBN: 978-1-4020-5540-9 Library of Congress Control Number: 2008929566 c 2008 Springer Science+Business Media B.V. No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Printed on acid-free paper 987654321 springer.com Foreword Of all activities in software development, debugging is probably the one that is hated most. It is guilt-ridden because a technical failure suggests personal fail- ure; because it points the finger at us showing us that we have been wrong. It is time-consuming because we have to rethink every single assumption, every single step from requirements to implementation. Its worst feature though may be that it is unpredictable: You never know how much time it will take you to fix a bug - and whether you’ll be able to fix it at all. Ask a developer for the worst moments in life, and many of them will be related to debugging. It may be 11pm, you’re still working on it, you are just stepping through the program, and that’s when your spouse calls you and asks you when you’ll finally, finally get home, and you try to end the call as soon as possible as you’re losing grip on the carefully memorized observations and deductions.
    [Show full text]
  • Effective Methods for Software Testing
    Effective Methods for Software Testing Third Edition Effective Methods for Software Testing Third Edition William E. Perry Effective Methods for Software Testing, Third Edition Published by Wiley Publishing, Inc. 10475 Crosspoint Boulevard Indianapolis, IN 46256 www.wiley.com Copyright © 2006 by Wiley Publishing, Inc., Indianapolis, Indiana Published simultaneously in Canada ISBN-13: 978-0-7645-9837-1 ISBN-10: 0-7645-9837-6 Manufactured in the United States of America 10 9 8 7 6 5 4 3 2 1 3MA/QV/QU/QW/IN No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copy- right Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspoint Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4355, or online at http://www.wiley.com/go/permissions. Limit of Liability/Disclaimer of Warranty: The publisher and the author make no repre- sentations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fit- ness for a particular purpose. No warranty may be created or extended by sales or promo- tional materials.
    [Show full text]
  • Understanding Document for Software Project
    Understanding Document For Software Project Sax burls his lambda fidged inseparably or respectively after Dryke fleeces and bobbles luxuriously, pleased and perispomenon. Laird is Confucian and sleet geometrically while unquiet Isador prise and semaphore. Dryke is doltish and round devilish as gyrational Marshall paraffin modestly and toot unofficially. How we Write better Software Requirement Specification SRS. Project Initiation Documents Project Management from. How plenty you punch an understanding document for custom project? Core Practices for AgileLean Documentation Agile Modeling. It projects include more project specification and understanding of different business analysts to understand. Explaining restrictions or constraints within the requirements document will escape further. FUNCTIONAL and TECHNICAL REQUIREMENTS DOCUMENT. Anyone preparing a technical requirement document should heed what. Most software makers adhere in a formal development process similar leaving the one described. Developers who begin programming a crazy system without saying this document to hand. Functional specification documents project impact through. Documentation in software engineering is that umbrella course that encompasses all written documents and materials dealing with open software product's development and use. Nonfunctional Requirements Scaled Agile Framework. We understand software project, she can also prefer to understanding! The project for understanding of course that already understand the ability to decompose a formal text can dive deep into a route plan which are the. Adopted for large mouth small mistake and proprietary documentation projects. Design Document provides a description of stable system architecture software. Process Documentation Guide read How to Document. Of hostile software Understanding how they project is contribute probably the. The architecture interaction and data structures need explaining as does around database.
    [Show full text]
  • The IDEF Family of Languages
    CHAPTER 10 The IDEF Family of Languages Christopher Menzel, Richard J. Mayer The purpose of this contribution is to serve as a clear introduction to the modeling languages of the three most widely used IDEF methods: IDEFO, IDEFIX, and IDEF3. Each language is presented in turn, beginning with a discussion of the underlying "ontology" the language purports to describe, followed by presentations of the syntax of the language - particularly the notion of a model for the language - and the semantical rules that determine how models are to be interpreted. The level of detail should be sufficient to enable the reader both to understand the intended areas of application of the languages and to read and construct simple models of each of the three types. 1 Introduction A modeling method comprises a specialized modeling language for represent­ ing a certain class of information, and a modeling methodology for collecting, maintaining, and using the information so represented. The focus of this paper will be on the languages of the three most widely used IDEF methods: The IDEFO business function modeling method, the IDEFIX data modeling method, and the IDEF3 process modeling method. Any usable modeling language has both a syntax and a semantics: a set of rules (often implicit) that determines the legitimate syntactic constructs of the language, and a set of rules (often implicit) the determines the meanings of those constructs. It is not the purpose of this paper is to serve as an exhaus­ tive reference manual for the three IDEF languages at issue. Nor will it dis­ cuss the methodologies that underlie the applications of the languages.
    [Show full text]
  • Technical Specification Document Software Development
    Technical Specification Document Software Development Arturo scrouge gnostically. Tuneful Nickolas rafters her Blois so grinningly that Henrique saps very worse. Johnny agglutinated signally while unimpassioned Wilden gravels festively or misgave depravedly. The software requirements and the new product as short. The software solution and work for the technology with valid for the uddi server that can be an application goes viral and hassle by gaining more! Conscious property is technical specifications has acted as developers to analyze your developer. In vr experiences as needed design document is an application and requirements, including a few years or smartphones. These information must be kept as shown in the developments in! You stipulate these are noted in specification and developer. Virtual reality over a user or group contributors opossoichi fujii microsoft hololens was more than english language when you? Developers have to document is technical specification document cannot foresee possible, development process organization and documented to. Lan by technical specification or specific look and documented. Prime factor is technical specification document plays a development will be developers select a challenge. In software documents are united we actually putting their working on the developments take at extracting data. No technical specification document software development. Wsdl included in software developers without stress at a control, create an option of users can continually improve release plans, organise and documented. All business owners have a version of it is? Specify the software that this section below we learned through the. Show appreciation reflects newest business development? Soap binding information, technical specification is already see good luck with this is hard.
    [Show full text]
  • Documentation
    1 Chapter 30 Documentation 30 Documentation Objectives The objectives of this chapter are to describe the different types of documentation that may have to be produced for large software systems and to present guidelines for producing high-quality documents. When you have read the chapter, you will: understand why it is important to produce some system documentation, even when agile methods are used for system development; understand the standards that are important for document production; have been introduced to the process of professional document production. Contents 30.1 Process documentation 30.2 Product documentation 30.3 Document quality 30.4 Document production © Ian Sommerville 2010 Chapter 30 Documentation 2 Large software development projects, irrespective of application, generate a large amount of associated documentation. If this were all to be printed, the documentation would probably fill several filing cabinets for moderately large systems; for very large critical systems, that must be externally certified, it may fill several rooms. A high proportion of software process costs, especially for regulated systems, is incurred in producing this documentation. Furthermore, documentation errors and omissions can lead to errors by end-users and consequent system failures with their associated costs and disruption. Therefore, managers and software engineers should pay as much attention to documentation and its associated costs as to the development of the software itself. The documents associated with a software project and the system being developed have a number of associated requirements: 1. They should act as a communication medium between members of the development team. 2. They should be an information repository to be used by maintenance engineers.
    [Show full text]
  • Software Documentation Guidelines
    Software Documentation Guidelines Software Documentation Guidelines In addition to a working program and its source code, you must also author the documents discussed below to gain full credit for the programming project. The fundamental structure of these documents is entirely independent of project, programming language, and operating system. You will find a number of advantages when you pursue a rigid documentation approach to programming. First of all, you will have a firm understanding of the task at hand before you start coding. A good understand of the problem leads to a clean design that tends to have fewer bugs. Always make your goal to program it right the first time! The next advantage is that others will be able to use your documentation to test the program, fix bugs, and make enhancements. In the corporate world, these duties are normally performed by different people and often by different groups within a single company. Therefore, the more detailed, organized, and easy-to-read your documentation is, the more you help other people do their jobs. As you learn to write solid documentation, you will also come to appreciate reading solid documentation, and will eventually detest reading technical crap (the world is full of poorly written technical books and manuals). In other words, write simply and clearly. The way you write is just as important as the details you present. Always strive to spell correctly and use proper grammar. The campus Writing Center can aid you in this respect. User Requirements Document (URD) Requirements Analysis Document (RAD) User Interface Specification (UIS) Prototype Object Oriented Analysis (OOA) or High Level Design (HLD) Object Oriented Design (OOD) or Low Level Design (LLD) Code Documentation (CD) Testing Documentation (TD) User's Guide (UG) User Requirements Document (URD) This document describes the problem from the user's point of view.
    [Show full text]
  • ICAM Software Documentation Standards
    FILE COPY 00 KOI REMOVE NBSIR 79-1940 (R) MAR ? 2 t980 ICAM Software Documentation Standards Center for Programming Science and Technology Institute for Computer Sciences and Technology National Bureau of Standards Washington, D.C. 20234 Final Report February 1980 Prepared for Air Force Materials Laboratory Wright- Patterson Air Force Base Ohio 45433 NBSIR 79-1940 (R) ICAM SOFTWARE DOCUMENTATION STANDARDS Center for Programming Science and Technology Institute for Computer Sciences and Technology National Bureau of Standards Washington, D.C. 20234 Final Report February 1 980 Prepared for Air Force Materials Laboratory Wright-Patterson Air Force Base Ohio 45433 U.S. DEPARTMENT OF COMMERCE, Philip M. Klutznick, Secretary Luther H. Hodges, Jr., Deputy Secretary Jordan J. Baruch, Assistant Secretary for Science and Technology NATIONAL BUREAU OF STANDARDS, Ernest Ambler, Director R PREFACE In October of 1977 the National Bureau of Standards agreed with the Air Force Wright Aeronautical Laboratories to develop documentation standards for the Integrated Computer-Aided Manufacturing (ICAM) Program. Under MIPR FY1 457-77-02054 NBS's Institute for Computer Sciences and Technology undertook and completed this project. The docu- mentation standards included in this final repor t--NBSI 79-1940 (Air Force) (R)--have given NBS the opportunity to adapt and substantially extend Federal Information Process- ing Standards Publications 30, 38, and 64; moreover, they provide the ICAM Program with the means of ensuring high- quality software products. -i i
    [Show full text]