Beyond Accuracy: Assessing Software Documentation Quality

Total Page:16

File Type:pdf, Size:1020Kb

Beyond Accuracy: Assessing Software Documentation Quality Beyond Accuracy: Assessing Soware Documentation ality Christoph Treude Justin Middleton Thushari Atapattu [email protected] [email protected] [email protected] University of Adelaide North Carolina State University University of Adelaide Adelaide, SA, Australia Raleigh, NC, United States Adelaide, SA, Australia ABSTRACT provides initial evidence for the strengths and weaknesses of dif- Good software documentation encourages good software engi- ferent genres of documentation (blog articles, reference documen- neering, but the meaning of “good” documentation is vaguely de- tation, README files, Stack Overflow threads, tutorials) based on fined in the software engineering literature. To clarify this ambi- the ten dimensions of our software documentation quality frame- guity, we draw on work from the data and information quality work. community to propose a framework that decomposes documenta- The contributions of this work are: tion quality into ten dimensions of structure, content, and style. To • demonstrate its application, we recruited technical editorsto apply A ten-dimensional framework for asking questions about the framework when evaluating examples from several genres of software documentation quality, • software documentation. We summarise their assessments—for ex- A partially validated survey instrument to evaluate docu- ample, reference documentation and README files excel in quality ment quality over multiple documentation genres, and • whereas blog articles have more problems—and we describe our vi- A vision for the expansion of a unified quality framework sion for reasoning about software documentation quality and for through further experimentation. the expansion and potential of a unified quality framework. 2 BACKGROUND AND RELATED WORK CCS CONCEPTS The most related piece of work to this paper is the seminal 1995 • Software and its engineering → Documentation; • Social article “Beyond Accuracy: What Data Quality Means to Data Con- and professional topics → Quality assurance. sumers” by Wang and Strong [17]. We follow the same beyond- accuracy approach for the domain of software documentation. KEYWORDS software documentation, quality Defective Software Documentation. Defect detection tools have been widely investigated at the code level, but very few studies fo- 1 INTRODUCTION AND MOTIVATION cus on defects at the document level [22]. The existing approaches High-quality software documentation is crucial for software de- in the documentation space investigate inconsistencies between velopment, comprehension, and maintenance, but the ways that code and documentation. In one of the first such attempts, Tan et documentation can suffer poor quality are numerous. For exam- al. [16] presented @tcomment for testing Javadoc comments re- ple, Aghajani et al. [1] have designed a taxonomy of 162 doc- lated to null values and exceptions. DocRef by Zhong and Su [19] umentation issue types, covering information content, presenta- detects API documentation errors by seeking out mismatches be- tion, process-related matters, and tool-related matters. Addition- tween code names in natural-language documentation and code. ally, because software documentation is written in informal nat- AdDoc by Dagenais and Robillard [3] automatically discovers doc- ural language which is inherently ambiguous, imprecise, unstruc- umentation patterns which are defined as coherent sets of code tured, and complex in syntax and semantics, its quality can often elements that are documented together. Also aimed at inconsisten- arXiv:2007.10744v3 [cs.SE] 8 Sep 2020 only be evaluated manually [6]. The assessment depends on con- cies between code and documentation, Ratol and Robillard [15] pre- text for some quality attributes (e.g., task orientation, usefulness), sented Fraco, a tool to detect source code comments that are fragile it mixes content and medium for others (e.g., visual effectiveness, with respect to identifier renaming. retrievability), and sometimes it requires looking beyond the docu- Wen et al. [18] presented a large-scale empirical study of code- mentation itself (e.g., accuracy, completeness). While most related comment inconsistencies, revealing causes such as deprecation work has focused on documentation accuracy and completeness and refactoring. Zhou et al. [21, 22] contributed a line of work on (e.g., for API documentation [3, 19]), the fact that many developers detecting defects of API documents with techniques from program are reluctant to carefully read documentation [20] suggests that comprehension and natural language processing. They presented documentation suffers from issues beyond accuracy. As such, we DRONE to automatically detect directive defects and recommend lack a comprehensive framework and instruments for assessing solutions to fix them. And in recent work, Panthaplackel et al.[8] software documentation quality beyond these characteristics. proposed an approach that learns to correlate changes across two In this work we adapt quality frameworks from the data and in- distinct language representations, to generate a sequence of edits formation quality community to the software engineering domain. that are applied to existing comments to reflect code modifications. We design a survey instrument to assess software documentation To the best of our knowledge, little of the existing work tar- quality from different sources. A pilot study with four technical gets the assessment or improvement of the quality of software editors and 41 documents related to the R programming language documentation beyond accuracy and completeness; one example ESEC/FSE ’20, November 8–13, 2020, Virtual Event, USA Christoph Treude, Justin Middleton, and Thushari Atapau is Plösch and colleagues’ survey on what developers value in doc- 4 PILOT STUDY umentation [13]. We turn to related work from the data and infor- To gather preliminary evidence on whether experts would be able mation quality community to further fill this gap. to use the proposed framework for assessing software documenta- tion, we conducted a pilot study with technical editors and differ- ent genres in software documentation. Recruitment. To recruit technical editors, we posted an adver- Information Quality. In their seminal 1995 work “Beyond Accu- 1 racy: What Data Quality Means to Data Consumers”, Wang and tisement on Upwork, a website for freelance recruitment and em- Strong [17] conducted a survey to generate a list of data quality at- ployment. Our posting explained our goal of evaluating software tributes that capture data consumers’ perspectives. These perspec- documentation on ten dimensions and identifying weaknesses in tives were grouped into accuracy, relevancy, representation, and their design. Therefore, we required applicants to be qualified as accessibility, and they contained a total of 15 items such as believ- technical editors with programming experience, and accordingly, ability, reputation, and ease of understanding. A few years later, we would compensate them hourly according to their established Eppler [4] published a similar list of dimensions which contained rates. From this, we recruited the first four qualified applicants 16 dimensions such as clarity, conciseness, and consistency. with experience in technical writing and development. We refer These two sets of dimensions built the starting point into our to our participants henceforth as E1 through E4. investigation of dimensions from the data and information qual- Data Collection. Because developers write software instructions ity community which might be applicable to software documenta- in a variety of contexts, we use a broad definition of documen- tion. Owing to the fact that software documentation tends to be tation, accepting both official, formal documentation and implicit disseminated over the Internet, we further included the work of documentation that emerges from online discussion. We focus on Knight and Burn [7], who discussed the development of a frame- the genres of software documentation that have been the sub- work for assessing information quality on the World Wide Web ject of investigation in previous work on software documentation: and compiled a list of 20 common dimensions for information and reference documentation [5], README files [14], tutorials [11], data quality from previous work, including accuracy, consistency, blogs [10], and Stack Overflow threads [2]. To better control for and timeliness. variation between language, we focused exclusively on resources documenting the R programming language or projects built with it. All documentation was then randomly sampled from the following sources: 3 SOFTWARE DOCUMENTATION QUALITY • Reference Documentation (RD): from the 79 subsections of the R language manual.2 Inspired by the dimensions of information and data quality iden- • README files (R): from the first 159 items in a list3 of tified in related work, we designed the software documentation open-source R projects curated by GitHub users. quality framework summarised in Table 1. The first and last author • Tutorials (T): from the 46 R tutorials on tutorials- of this paper collaboratively went through the dimensions from re- point.com.4 lated work cited in Section 2 to select those that (i) could apply to • Articles (A): from 1,208 articles posted on R-bloggers.com,5 software documentation, (ii) do not depend on the logistics of ac- a blog aggregation website, within the last six months. cessing the documentation (e.g., security), and (iii) can be checked • Stack Overflow
Recommended publications
  • Measuring the Software Size of Sliced V-Model Projects
    2014 Joint Conference of the International Workshop on Software Measurement and the International Conference on Software Process and Product Measurement Measuring the Software Size of Sliced V-model Projects Andreas Deuter PHOENIX CONTACT Electronics GmbH Gregor Engels Dringenauer Str. 30 University of Paderborn 31812 Bad Pyrmont, Germany Zukunftsmeile 1 Email: [email protected] 33102 Paderborn, Germany Email: [email protected] Abstract—Companies expect higher productivity of their soft- But, the “manufacturing” of software within the standard ware teams when introducing new software development methods. production process is just a very short process when bringing Productivity is commonly understood as the ratio of output the binaries of the software to the device. This process is hardly created and resources consumed. Whereas the measurement of to optimize and does not reflect the “production of software” the resources consumed is rather straightforward, there are at all. The creation of the software is namely done in the several definitions for counting the output of a software de- developing teams by applying typical software engineering velopment. Source code-based metrics create a set of valuable figures direct from the heart of the software - the code. However, methods. However, to keep up with the high demands on depending on the chosen process model software developers and implementing new functionality (e.g. for PLC) the software testers produce also a fair amount of documentation. Up to development process within these companies must improve. now this output remains uncounted leading to an incomplete Therefore, they start to analyze their software processes in view on the development output. This article addresses this open detail and to identify productivity drivers.
    [Show full text]
  • Introduction to Metamodeling for Reducing Computational Burden Of
    This is a repository copy of Introduction to metamodeling for reducing computational burden of advanced analyses with health economic models : a structured overview of metamodeling methods in a 6-step application process. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/156157/ Version: Accepted Version Article: Degeling, K., IJzerman, M.J., Lavieri, M.S. et al. (2 more authors) (2020) Introduction to metamodeling for reducing computational burden of advanced analyses with health economic models : a structured overview of metamodeling methods in a 6-step application process. Medical Decision Making, 40 (3). pp. 348-363. ISSN 0272-989X https://doi.org/10.1177/0272989X20912233 Degeling, K., IJzerman, M. J., Lavieri, M. S., Strong, M., & Koffijberg, H. (2020). Introduction to Metamodeling for Reducing Computational Burden of Advanced Analyses with Health Economic Models: A Structured Overview of Metamodeling Methods in a 6-Step Application Process. Medical Decision Making, 40(3), 348–363. Copyright © 2020 The Author(s) DOI: 10.1177/0272989X20912233 Reuse Items deposited in White Rose Research Online are protected by copyright, with all rights reserved unless indicated otherwise. They may be downloaded and/or printed for private study, or other acts as permitted by national copyright laws. The publisher or other rights holders may allow further reproduction and re-use of the full text version. This is indicated by the licence information on the White Rose Research Online record for the item. Takedown If you consider content in White Rose Research Online to be in breach of UK law, please notify us by emailing [email protected] including the URL of the record and the reason for the withdrawal request.
    [Show full text]
  • Software Development a Practical Approach!
    Software Development A Practical Approach! Hans-Petter Halvorsen https://www.halvorsen.blog https://halvorsen.blog Software Development A Practical Approach! Hans-Petter Halvorsen Software Development A Practical Approach! Hans-Petter Halvorsen Copyright © 2020 ISBN: 978-82-691106-0-9 Publisher Identifier: 978-82-691106 https://halvorsen.blog ii Preface The main goal with this document: • To give you an overview of what software engineering is • To take you beyond programming to engineering software What is Software Development? It is a complex process to develop modern and professional software today. This document tries to give a brief overview of Software Development. This document tries to focus on a practical approach regarding Software Development. So why do we need System Engineering? Here are some key factors: • Understand Customer Requirements o What does the customer needs (because they may not know it!) o Transform Customer requirements into working software • Planning o How do we reach our goals? o Will we finish within deadline? o Resources o What can go wrong? • Implementation o What kind of platforms and architecture should be used? o Split your work into manageable pieces iii • Quality and Performance o Make sure the software fulfills the customers’ needs We will learn how to build good (i.e. high quality) software, which includes: • Requirements Specification • Technical Design • Good User Experience (UX) • Improved Code Quality and Implementation • Testing • System Documentation • User Documentation • etc. You will find additional resources on this web page: http://www.halvorsen.blog/documents/programming/software_engineering/ iv Information about the author: Hans-Petter Halvorsen The author currently works at the University of South-Eastern Norway.
    [Show full text]
  • The Developer's Guide to Debugging
    The Developer’s Guide to Debugging Thorsten Grotker¨ · Ulrich Holtmann Holger Keding · Markus Wloka The Developer’s Guide to Debugging 123 Thorsten Gr¨otker Ulrich Holtmann Holger Keding Markus Wloka Internet: http://www.debugging-guide.com Email: [email protected] ISBN: 978-1-4020-5539-3 e-ISBN: 978-1-4020-5540-9 Library of Congress Control Number: 2008929566 c 2008 Springer Science+Business Media B.V. No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Printed on acid-free paper 987654321 springer.com Foreword Of all activities in software development, debugging is probably the one that is hated most. It is guilt-ridden because a technical failure suggests personal fail- ure; because it points the finger at us showing us that we have been wrong. It is time-consuming because we have to rethink every single assumption, every single step from requirements to implementation. Its worst feature though may be that it is unpredictable: You never know how much time it will take you to fix a bug - and whether you’ll be able to fix it at all. Ask a developer for the worst moments in life, and many of them will be related to debugging. It may be 11pm, you’re still working on it, you are just stepping through the program, and that’s when your spouse calls you and asks you when you’ll finally, finally get home, and you try to end the call as soon as possible as you’re losing grip on the carefully memorized observations and deductions.
    [Show full text]
  • Effective Methods for Software Testing
    Effective Methods for Software Testing Third Edition Effective Methods for Software Testing Third Edition William E. Perry Effective Methods for Software Testing, Third Edition Published by Wiley Publishing, Inc. 10475 Crosspoint Boulevard Indianapolis, IN 46256 www.wiley.com Copyright © 2006 by Wiley Publishing, Inc., Indianapolis, Indiana Published simultaneously in Canada ISBN-13: 978-0-7645-9837-1 ISBN-10: 0-7645-9837-6 Manufactured in the United States of America 10 9 8 7 6 5 4 3 2 1 3MA/QV/QU/QW/IN No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copy- right Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspoint Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4355, or online at http://www.wiley.com/go/permissions. Limit of Liability/Disclaimer of Warranty: The publisher and the author make no repre- sentations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fit- ness for a particular purpose. No warranty may be created or extended by sales or promo- tional materials.
    [Show full text]
  • Understanding Document for Software Project
    Understanding Document For Software Project Sax burls his lambda fidged inseparably or respectively after Dryke fleeces and bobbles luxuriously, pleased and perispomenon. Laird is Confucian and sleet geometrically while unquiet Isador prise and semaphore. Dryke is doltish and round devilish as gyrational Marshall paraffin modestly and toot unofficially. How we Write better Software Requirement Specification SRS. Project Initiation Documents Project Management from. How plenty you punch an understanding document for custom project? Core Practices for AgileLean Documentation Agile Modeling. It projects include more project specification and understanding of different business analysts to understand. Explaining restrictions or constraints within the requirements document will escape further. FUNCTIONAL and TECHNICAL REQUIREMENTS DOCUMENT. Anyone preparing a technical requirement document should heed what. Most software makers adhere in a formal development process similar leaving the one described. Developers who begin programming a crazy system without saying this document to hand. Functional specification documents project impact through. Documentation in software engineering is that umbrella course that encompasses all written documents and materials dealing with open software product's development and use. Nonfunctional Requirements Scaled Agile Framework. We understand software project, she can also prefer to understanding! The project for understanding of course that already understand the ability to decompose a formal text can dive deep into a route plan which are the. Adopted for large mouth small mistake and proprietary documentation projects. Design Document provides a description of stable system architecture software. Process Documentation Guide read How to Document. Of hostile software Understanding how they project is contribute probably the. The architecture interaction and data structures need explaining as does around database.
    [Show full text]
  • The IDEF Family of Languages
    CHAPTER 10 The IDEF Family of Languages Christopher Menzel, Richard J. Mayer The purpose of this contribution is to serve as a clear introduction to the modeling languages of the three most widely used IDEF methods: IDEFO, IDEFIX, and IDEF3. Each language is presented in turn, beginning with a discussion of the underlying "ontology" the language purports to describe, followed by presentations of the syntax of the language - particularly the notion of a model for the language - and the semantical rules that determine how models are to be interpreted. The level of detail should be sufficient to enable the reader both to understand the intended areas of application of the languages and to read and construct simple models of each of the three types. 1 Introduction A modeling method comprises a specialized modeling language for represent­ ing a certain class of information, and a modeling methodology for collecting, maintaining, and using the information so represented. The focus of this paper will be on the languages of the three most widely used IDEF methods: The IDEFO business function modeling method, the IDEFIX data modeling method, and the IDEF3 process modeling method. Any usable modeling language has both a syntax and a semantics: a set of rules (often implicit) that determines the legitimate syntactic constructs of the language, and a set of rules (often implicit) the determines the meanings of those constructs. It is not the purpose of this paper is to serve as an exhaus­ tive reference manual for the three IDEF languages at issue. Nor will it dis­ cuss the methodologies that underlie the applications of the languages.
    [Show full text]
  • Technical Specification Document Software Development
    Technical Specification Document Software Development Arturo scrouge gnostically. Tuneful Nickolas rafters her Blois so grinningly that Henrique saps very worse. Johnny agglutinated signally while unimpassioned Wilden gravels festively or misgave depravedly. The software requirements and the new product as short. The software solution and work for the technology with valid for the uddi server that can be an application goes viral and hassle by gaining more! Conscious property is technical specifications has acted as developers to analyze your developer. In vr experiences as needed design document is an application and requirements, including a few years or smartphones. These information must be kept as shown in the developments in! You stipulate these are noted in specification and developer. Virtual reality over a user or group contributors opossoichi fujii microsoft hololens was more than english language when you? Developers have to document is technical specification document cannot foresee possible, development process organization and documented to. Lan by technical specification or specific look and documented. Prime factor is technical specification document plays a development will be developers select a challenge. In software documents are united we actually putting their working on the developments take at extracting data. No technical specification document software development. Wsdl included in software developers without stress at a control, create an option of users can continually improve release plans, organise and documented. All business owners have a version of it is? Specify the software that this section below we learned through the. Show appreciation reflects newest business development? Soap binding information, technical specification is already see good luck with this is hard.
    [Show full text]
  • Documentation
    1 Chapter 30 Documentation 30 Documentation Objectives The objectives of this chapter are to describe the different types of documentation that may have to be produced for large software systems and to present guidelines for producing high-quality documents. When you have read the chapter, you will: understand why it is important to produce some system documentation, even when agile methods are used for system development; understand the standards that are important for document production; have been introduced to the process of professional document production. Contents 30.1 Process documentation 30.2 Product documentation 30.3 Document quality 30.4 Document production © Ian Sommerville 2010 Chapter 30 Documentation 2 Large software development projects, irrespective of application, generate a large amount of associated documentation. If this were all to be printed, the documentation would probably fill several filing cabinets for moderately large systems; for very large critical systems, that must be externally certified, it may fill several rooms. A high proportion of software process costs, especially for regulated systems, is incurred in producing this documentation. Furthermore, documentation errors and omissions can lead to errors by end-users and consequent system failures with their associated costs and disruption. Therefore, managers and software engineers should pay as much attention to documentation and its associated costs as to the development of the software itself. The documents associated with a software project and the system being developed have a number of associated requirements: 1. They should act as a communication medium between members of the development team. 2. They should be an information repository to be used by maintenance engineers.
    [Show full text]
  • Software Documentation Guidelines
    Software Documentation Guidelines Software Documentation Guidelines In addition to a working program and its source code, you must also author the documents discussed below to gain full credit for the programming project. The fundamental structure of these documents is entirely independent of project, programming language, and operating system. You will find a number of advantages when you pursue a rigid documentation approach to programming. First of all, you will have a firm understanding of the task at hand before you start coding. A good understand of the problem leads to a clean design that tends to have fewer bugs. Always make your goal to program it right the first time! The next advantage is that others will be able to use your documentation to test the program, fix bugs, and make enhancements. In the corporate world, these duties are normally performed by different people and often by different groups within a single company. Therefore, the more detailed, organized, and easy-to-read your documentation is, the more you help other people do their jobs. As you learn to write solid documentation, you will also come to appreciate reading solid documentation, and will eventually detest reading technical crap (the world is full of poorly written technical books and manuals). In other words, write simply and clearly. The way you write is just as important as the details you present. Always strive to spell correctly and use proper grammar. The campus Writing Center can aid you in this respect. User Requirements Document (URD) Requirements Analysis Document (RAD) User Interface Specification (UIS) Prototype Object Oriented Analysis (OOA) or High Level Design (HLD) Object Oriented Design (OOD) or Low Level Design (LLD) Code Documentation (CD) Testing Documentation (TD) User's Guide (UG) User Requirements Document (URD) This document describes the problem from the user's point of view.
    [Show full text]
  • A Randomized Controlled Trial
    SIGCSE: G: Empirical Assessment of Software Documentation Strategies: A Randomized Controlled Trial Scott Kolodziej Texas A&M University College Station, Texas [email protected] ABSTRACT Anecdote, case studies, and thought experiments are often pro- Source code documentation is an important part of teaching stu- vided as evidence in these standards and references, and recent ad- dents how to be effective programmers. But what evidence dowe vances in data mining source code repositories can provide hints to have to support what good documentation looks like? This study good programming practices. However, well-designed experiments utilizes a randomized controlled trial to experimentally compare may provide significant insight and further test these hypotheses several different types of documentation, including traditional com- and practices [3, 19]. This randomized controlled trial compares ments, self-documenting naming, and an automatic documentation several different documentation styles under laboratory conditions generator. The results of this experiment show that the relationship to determine their effect on a programmer’s ability to understand between documentation and source code understanding is more what a program does. complex than simply "more is better," and poorly documented code may even lead to a more correct understanding of the source code. 2 BACKGROUND AND RELATED WORK Several previous studies have examined the effects of software CCS CONCEPTS documentation on program comprehension, but few have focused • General and reference → Empirical studies; • Software and specifically on source code comments and variable names rather its engineering → Documentation; Software development tech- than external documentation formats. Fewer still can be classified niques; Automatic programming; as controlled experiments [15]. One experiment by Sheppard et al.
    [Show full text]
  • ICAM Software Documentation Standards
    FILE COPY 00 KOI REMOVE NBSIR 79-1940 (R) MAR ? 2 t980 ICAM Software Documentation Standards Center for Programming Science and Technology Institute for Computer Sciences and Technology National Bureau of Standards Washington, D.C. 20234 Final Report February 1980 Prepared for Air Force Materials Laboratory Wright- Patterson Air Force Base Ohio 45433 NBSIR 79-1940 (R) ICAM SOFTWARE DOCUMENTATION STANDARDS Center for Programming Science and Technology Institute for Computer Sciences and Technology National Bureau of Standards Washington, D.C. 20234 Final Report February 1 980 Prepared for Air Force Materials Laboratory Wright-Patterson Air Force Base Ohio 45433 U.S. DEPARTMENT OF COMMERCE, Philip M. Klutznick, Secretary Luther H. Hodges, Jr., Deputy Secretary Jordan J. Baruch, Assistant Secretary for Science and Technology NATIONAL BUREAU OF STANDARDS, Ernest Ambler, Director R PREFACE In October of 1977 the National Bureau of Standards agreed with the Air Force Wright Aeronautical Laboratories to develop documentation standards for the Integrated Computer-Aided Manufacturing (ICAM) Program. Under MIPR FY1 457-77-02054 NBS's Institute for Computer Sciences and Technology undertook and completed this project. The docu- mentation standards included in this final repor t--NBSI 79-1940 (Air Force) (R)--have given NBS the opportunity to adapt and substantially extend Federal Information Process- ing Standards Publications 30, 38, and 64; moreover, they provide the ICAM Program with the means of ensuring high- quality software products. -i i
    [Show full text]