
arXiv:2007.10744v3 [cs.SE] 8 Sep 2020

Beyond Accuracy: Assessing Software Documentation Quality

Christoph Treude, University of Adelaide, Adelaide, SA, Australia, [email protected]
Justin Middleton, North Carolina State University, Raleigh, NC, United States, [email protected]
Thushari Atapattu, University of Adelaide, Adelaide, SA, Australia, [email protected]

ABSTRACT
Good software documentation encourages good software engineering, but the meaning of "good" documentation is vaguely defined in the software engineering literature. To clarify this ambiguity, we draw on work from the data and information quality community to propose a framework that decomposes documentation quality into ten dimensions of structure, content, and style. To demonstrate its application, we recruited technical editors to apply the framework when evaluating examples from several genres of software documentation. We summarise their assessments—for example, reference documentation and README files excel in quality whereas blog articles have more problems—and we describe our vision for reasoning about software documentation quality and the potential expansion of a unified quality framework.

CCS CONCEPTS
• Software and its engineering → Documentation; • Social and professional topics → Quality assurance.

KEYWORDS
software documentation, quality

1 INTRODUCTION AND MOTIVATION
High-quality software documentation is crucial for software development, comprehension, and maintenance, but the ways that software documentation can suffer poor quality are numerous. For example, Aghajani et al. [1] have designed a taxonomy of 162 documentation issue types, covering information content, presentation, process-related matters, and tool-related matters. While most work has focused on documentation accuracy and completeness (e.g., for API documentation [19]), the fact that many developers are reluctant to carefully read documentation [20] suggests that documentation suffers from issues beyond accuracy. As such, we lack a comprehensive framework and instruments for assessing software documentation quality beyond these characteristics.

Additionally, because software documentation is written in natural language, which is inherently ambiguous, imprecise, unstructured, and complex in syntax and semantics, its quality can often only be evaluated manually [3, 6]. The assessment depends on context for some quality attributes (e.g., task orientation, usefulness), it mixes content and medium for others (e.g., retrievability, visual effectiveness), and sometimes it requires looking beyond the documentation itself (e.g., accuracy, completeness).

In this work we adapt quality frameworks from the data and information quality community to the software engineering domain. We design a survey instrument to assess software documentation quality from different sources. A pilot study with four technical editors and 41 documents related to the programming language R provides initial evidence of the strengths and weaknesses of different genres of documentation (blog articles, reference documentation, README files, Stack Overflow threads, tutorials) based on the ten dimensions of our software documentation quality framework.
The contributions of this work are:

• A ten-dimensional framework for asking questions about software documentation quality,
• A partially validated survey instrument to evaluate document quality over multiple documentation genres, and
• A vision for the expansion of a unified quality framework through further experimentation.

The most related piece of work to this paper is the seminal 1995 article "Beyond Accuracy: What Data Quality Means to Data Consumers" by Wang and Strong [17]. We follow the same approach for the domain of software documentation.

2 BACKGROUND AND RELATED WORK
Defective Software Documentation. Defect detection tools have been widely investigated at the code level, but very few studies focus on defects at the document level [22]. The existing approaches in the documentation space investigate code and documentation inconsistencies. In one of the first such attempts, Tan et al. [16] presented @tcomment for testing Javadoc comments related to null values and exceptions. DocRef by Zhong and Su [19] detects API documentation errors by seeking out mismatches between code names in natural-language documentation and code. AdDoc by Dagenais and Robillard [3] automatically discovers documentation patterns, which are defined as coherent sets of code elements that are documented together. Also aimed at inconsistencies between code and documentation, Ratol and Robillard [15] presented Fraco, a tool to detect source code comments that are fragile with respect to identifier renaming. Wen et al. [18] presented a large-scale empirical study of code-comment inconsistencies, revealing causes such as deprecation and refactoring. Zhou et al. [21] contributed a line of work on detecting defects of API documents with techniques from program comprehension and natural language processing. They presented DRONE to automatically detect directive defects and recommend solutions to fix them [22]. And in recent work, the approach proposed by Panthaplackel et al. [8] learns to correlate changes across two distinct language representations to generate a sequence of edits that are applied to existing comments to reflect code modifications.

To the best of our knowledge, little of the existing work targets the assessment or improvement of the quality of software documentation beyond accuracy and completeness; one example is Plösch and colleagues' survey on what developers value in documentation [13]. We turn to related work from the data and information quality community to further fill this gap.
Information Quality. In their seminal 1995 work "Beyond Accuracy: What Data Quality Means to Data Consumers", Wang and Strong [17] conducted a survey to generate a list of data quality attributes that capture data consumers' perspectives. These perspectives were grouped into accuracy, relevancy, representation, and accessibility, and they contained a total of 15 items such as believability, reputation, and ease of understanding. A few years later, Eppler [4] published a similar list of dimensions which contained 16 dimensions such as clarity, conciseness, and consistency.

These two sets of dimensions formed the starting point of our investigation of dimensions from the data and information quality community which might be applicable to software documentation. Owing to the fact that software documentation tends to be disseminated over the Internet, we further included the work of Knight and Burn [7], who discussed the development of a framework for assessing information quality on the World Wide Web and compiled a list of 20 common dimensions for information and data quality from previous work, including accuracy, consistency, and timeliness.

3 SOFTWARE DOCUMENTATION QUALITY
Inspired by the dimensions of information and data quality identified in related work, we designed the software documentation quality framework summarised in Table 1. The first and last author of this paper collaboratively went through the dimensions from related work cited in Section 2 to select those that (i) could apply to software documentation, (ii) do not depend on the logistics of accessing the documentation (e.g., security), and (iii) can be checked independently of other artefacts (i.e., not accuracy or completeness). As a result, we omitted dimensions such as relevancy and timeliness from Wang and Strong's work [17] since they require context beyond the documentation itself (e.g., relevancy cannot be assessed without a specific task in mind); and we omitted dimensions such as interactivity and speed from Eppler's work [4] which are characteristics of the medium rather than the content. Table 1 indicates the sources from related work for each dimension.

In addition, we formulated questions that an assessor can answer about a piece of software documentation to indicate its quality in each dimension. Our pilot study in Section 4 provides initial evidence that technical editors are able to answer these questions and shows interesting trends across genres of documentation. The questions were inspired by work on readability by Pitler and Nenkova [12], who collected readability ratings using four questions: How well-written is this article? How well does the text fit together? How easy was it to understand? How interesting is this article? Our framework contains the same questions, and complements them with one question each for the additional six dimensions.
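As an illustration, the ten dimensions and their questions can be encoded directly as a small survey instrument. The following Python sketch is illustrative only: the names are hypothetical, and it simply pairs each dimension of Table 1 with its question and validates ratings on the 1 (low) to 10 (high) scale used in the pilot study (Section 4).

```python
# Illustrative sketch only: the ten dimensions and questions of the framework
# (Table 1), encoded as a minimal survey instrument. Names are hypothetical.

QUESTIONS = {
    "Quality": "How well-written is this document (e.g., spelling, grammar)?",
    "Appeal": "How interesting is it?",
    "Readability": "How easy was it to read?",
    "Understandability": "How easy was it to understand?",
    "Structure": "How well-structured is the document?",
    "Cohesion": "How well does the text fit together?",
    "Conciseness": "How succinct is the information provided?",
    "Effectiveness": "Does the document make effective use of technical vocabulary?",
    "Consistency": "How consistent is the use of terminology?",
    "Clarity": "Does the document contain ambiguity?",
}


def record_assessment(ratings: dict) -> dict:
    """Validate one editor's assessment of one document.

    `ratings` maps each dimension to an integer on the 1 (low) to 10 (high)
    scale; all ten dimensions must be present.
    """
    missing = set(QUESTIONS) - set(ratings)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for dimension, value in ratings.items():
        if dimension not in QUESTIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        if not (1 <= int(value) <= 10):
            raise ValueError(f"{dimension}: rating {value} outside 1-10 scale")
    return {dim: int(ratings[dim]) for dim in QUESTIONS}
```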
4 PILOT STUDY
To gather preliminary evidence on whether experts would be able to use the proposed framework for assessing software documentation, we conducted a pilot study with technical editors and different genres in software documentation.

Recruitment. To recruit technical editors, we posted an advertisement on Upwork (https://www.upwork.com/), a website for freelance recruitment and employment. Our posting explained our goal of evaluating software documentation on ten dimensions and identifying weaknesses in their design. Therefore, we required applicants to be qualified as technical editors with programming experience, and accordingly, we would compensate them hourly according to their established rates. From this, we recruited the first four qualified applicants with experience in technical writing and development. We refer to our participants henceforth as E1 through E4.

Data Collection. Because developers write software instructions in a variety of contexts, we use a broad definition of documentation, accepting both official, formal documentation and implicit documentation that emerges from online discussion. We focus on the genres of software documentation that have been the subject of investigation in previous work on software documentation: reference documentation [5], README files [14], tutorials [11], blogs [10], and Stack Overflow threads [2]. To better control for variation between language, we focused exclusively on resources documenting the R programming language or projects built with it. All documentation was then randomly sampled from the following sources:

• Reference Documentation (RD): from the 79 subsections of the R language manual (https://cran.r-project.org/doc/manuals/r-release/R-intro.html).
• README files (R): from the first 159 items in a list of open-source R projects curated by GitHub users (https://github.com/qinwf/awesome-R; the remaining items were not software projects).
• Tutorials (T): from the 46 R tutorials on tutorialspoint.com (https://www.tutorialspoint.com/r/index.htm).
• Articles (A): from 1,208 articles posted on R-bloggers.com (https://www.r-bloggers.com/), a blog aggregation website, within the last six months.
• Stack Overflow threads (SO): from questions tagged "R".

We asked the editors to read the documents, suggest edits, and answer the ten questions listed in Table 1 using a scale from 1 (low) to 10 (high). From a methodological standpoint, we asked for edits in addition to the ten assessments in order to encourage and evidence reflection on the specific ways the documentation failed, beyond the general impressions of effectiveness. These edits were collected through Word's Review interface. We assigned the quantity and selection of documentation according to the amount of time editors had available while trying to balance distribution across genres. Furthermore, we tried to give each individual document to at least two editors. We show the distribution in Table 2. Overall, the editors worked on 41 documents. Seven documents were assessed by only one editor, but the rest were by two; therefore, we had 75 records of document evaluations.
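The sampling and assignment procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the candidate document lists, sample sizes, and identifiers are hypothetical, and the real assignment additionally accounted for each editor's available time rather than a fixed rotation.

```python
import random

# Illustrative sketch only: random sampling per genre and assignment of each
# sampled document to two editors. Candidate lists and sizes are hypothetical.
CANDIDATES = {
    "RD": ["manual-section-01", "manual-section-02"],  # 79 subsections in the study
    "R":  ["readme-project-a", "readme-project-b"],    # 159 curated R projects
    "T":  ["tutorial-01", "tutorial-02"],               # 46 tutorialspoint tutorials
    "A":  ["blog-post-001", "blog-post-002"],           # 1,208 R-bloggers articles
    "SO": ["so-question-1", "so-question-2"],           # questions tagged "R"
}

EDITORS = ["E1", "E2", "E3", "E4"]


def sample_and_assign(per_genre: int, seed: int = 0) -> dict:
    """Sample documents from each genre and give each document to two editors."""
    rng = random.Random(seed)
    assignments = {}  # document -> list of editors
    slot = 0
    for genre, pool in CANDIDATES.items():
        for doc in rng.sample(pool, min(per_genre, len(pool))):
            first = EDITORS[slot % len(EDITORS)]
            second = EDITORS[(slot + 1) % len(EDITORS)]
            assignments[f"{genre}/{doc}"] = [first, second]
            slot += 1
    return assignments


if __name__ == "__main__":
    for doc, editors in sample_and_assign(per_genre=2).items():
        print(doc, "->", editors)
```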

Table 1: Dimensions of software documentation quality

Dimension         | Question                                                       | Source
Quality           | How well-written is this document (e.g., spelling, grammar)?  | question from [12]
Appeal            | How interesting is it?                                         | question from [12]
Readability       | How easy was it to read?                                       | 'accessibility' [4, 7]
Understandability | How easy was it to understand?                                 | 'ease of understanding' [17]; question from [12]
Structure         | How well-structured is the document?                           | 'navigation' from [7]
Cohesion          | How well does the text fit together?                           | question from [12]
Conciseness       | How succinct is the information provided?                      | 'concise', 'amount of data' [7, 17]; 'conciseness' [4]
Effectiveness     | Does the document make effective use of technical vocabulary?  | 'vocabulary' [12]
Consistency       | How consistent is the use of terminology?                      | 'consistency' [4, 7]
Clarity           | Does the document contain ambiguity?                           | 'clarity' [4]; 'understandability' [7]

Table 2: Distribution of documentation genres to editors

Editor | RD | R  | T  | A  | SO | Sum
E1     | 7  | 7  | 7  | 8  | 7  | 36
E2     | 6  | 6  | 5  | 6  | 6  | 29
E3     | 1  | 1  | 1  | 1  | 1  | 5
E4     | 1  | 1  | 1  | 1  | 1  | 5
Sum    | 15 | 15 | 14 | 16 | 15 | 75
Union  | 8  | 8  | 8  | 9  | 8  | 41

Table 3: Results of document rating by technical editors

Dimension         | RD  | R   | T   | A   | SO
Quality           | 8.1 | 7.6 | 7.5 | 6.1 | 6.7
Appeal            | 5.7 | 6.5 | 6.4 | 5.3 | 5.8
Readability       | 6.6 | 6.4 | 6.5 | 4.6 | 6.1
Understandability | 6.9 | 7.0 | 6.9 | 5.3 | 6.1
Structure         | 6.3 | 7.6 | 7.1 | 3.8 | 5.8
Cohesion          | 6.7 | 6.7 | 7.0 | 4.9 | 6.1
Conciseness       | 6.7 | 6.7 | 6.8 | 5.3 | 6.3
Effectiveness     | 8.3 | 7.9 | 8.1 | 7.1 | 7.5
Consistency       | 9.3 | 8.8 | 9.0 | 8.5 | 8.6
Clarity           | 8.6 | 8.7 | 7.6 | 6.3 | 7.7

Results. We display the average dimension rating per genre of documentation in Table 3; for each row, the highest value is in bold, and the lowest is italicised. For example, this table shows that the quality of the reference documentation, or how well-written it is according to the definition in Table 1, is rated 8.1 out of 10, on average higher than all the other genres. Meanwhile, the average blog article was rated 6.1 out of 10, the lowest. Within dimensions, the highest ratings are distributed primarily among reference documentation and README files; tutorials received the highest in cohesion and conciseness, but only by a difference of at most 0.3 of a rating point with the two aforementioned genres. Nevertheless, blog articles received the global lowest rating across every dimension, especially so in readability and structure.

Different dimensions do not seem to have equal distributions. For example, reference documentation has a 9.3 in consistency, but the lowest score is an 8.5 for blog articles. These are nevertheless high scores on average. Appeal also has a small spread, from 6.5 in README files to 5.3 in blog articles, but a lower average overall across dimensions. Structure, on the other hand, varies from 7.6 (README files) to 3.8 (blog articles), demonstrating a larger interaction with genres.
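The per-genre averages reported in Table 3 can be reproduced mechanically from the 75 evaluation records. The sketch below assumes the records live in a hypothetical CSV file (ratings.csv) with one row per editor–document assessment and one 1–10 rating column per dimension; the file and column names are not part of the study materials.

```python
import pandas as pd

# Illustrative sketch only: reproduce Table 3-style averages from the raw
# evaluation records. File name and column names are hypothetical.
DIMENSIONS = [
    "Quality", "Appeal", "Readability", "Understandability", "Structure",
    "Cohesion", "Conciseness", "Effectiveness", "Consistency", "Clarity",
]

# Expected columns: editor, document, genre (RD/R/T/A/SO), plus one 1-10
# rating column per dimension.
records = pd.read_csv("ratings.csv")

# Mean rating per genre and dimension, rounded to one decimal as in Table 3.
table3 = records.groupby("genre")[DIMENSIONS].mean().round(1).T

print(table3)                  # rows: dimensions, columns: genres
print(table3.idxmax(axis=1))   # genre with the highest average per dimension
```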
Reflection. Our findings suggest a trend in rating samples from certain genres highly while disapproving others. The relatively high rating of reference documentation and README files may be explained by their typical origins within the software product and team, whereas blog articles and Stack Overflow posts can be published more freely [9]. Nevertheless, the result that blog articles received the lowest in every category did surprise us, especially as we designed the framework to reduce the impact of context. There may be many factors involved in these results. For one, the corpora from which we randomly sampled may have different underlying distributions on our dimensions. Our reference documentation was sampled from larger related products, whereas manual inspection of the randomly sampled blog articles did evidence a broad variety of R-related topics, such as discussions of R package updates or write-ups and reflections on personal projects. Furthermore, as with any participant study, the soundness of our results depends on the accuracy with which we communicated our ideas and the participants understood and enacted them. Editors may approach different genres with different preconceptions; an 8 rating for a README file may not be the same as an 8 rating for a Stack Overflow thread.

Technical Challenges. As noted previously, we asked editors to make edits on document copies in Microsoft Word as well, hoping to gain insight into technical editing strategies for software documentation. We obtained over 4,000 edit events across all 75 documents, ranging from reformattings and single character insertions to deep paragraph revisions. However, we encountered several obstacles when attempting to analyse this data. First, when copying online documents to Microsoft Word, interactions and images in hypertext documents were distractingly reformatted for a linear document. As a result, several of the edits addressed changes to structure that did not exist when viewing the document in the browser instead. Furthermore, many edits were not recorded as intended—for example, replacing words with synonyms that do not make sense in a programming context. Although we can recreate changes to the document text through before-and-after comparison, small edits blend together and large edits confound the differencing algorithm. Because the editor's original intent is lost, any coding scheme applied over it becomes less secure. Due to these circumstances, we decided against drawing conclusions from this data.
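The before-and-after comparison mentioned above can be illustrated with Python's standard difflib; the example sentences below are invented. The sketch shows how two conceptually separate edits that happen to sit next to each other are reported as a single replacement block, which is one way the editor's individual intentions get lost in differencing.

```python
import difflib

# Illustrative sketch only (invented sentences): word-level diff between an
# original sentence and its edited copy. The two adjacent fixes (verb
# agreement and casing plus an added article) come back as one replace block.
original = "the call return NULL value on failure".split()
edited = "the call returns a null value on failure".split()

matcher = difflib.SequenceMatcher(a=original, b=edited)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print(tag, original[i1:i2], "->", edited[j1:j2])
# Prints a single block: replace ['return', 'NULL'] -> ['returns', 'a', 'null']
```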
5 IMPACT AND FUTURE WORK
Accuracy of software documentation is a necessary but not sufficient condition for its success. To empower software developers to make effective use of software documentation, it must be carefully designed in appearance, content, and more. Our vision, then, is to provide a more precise definition to better explore and transform what it means for software documentation to be of high quality, along with a research agenda enabled by such a quality framework. To that end, we have drawn from seminal work in the information and data quality communities to design a framework consisting of ten dimensions for software documentation quality.

Furthermore, our research agenda proposes future work to evaluate the impact of each dimension with end-users and improve the framework. For one, we can verify the assessments of technical editors by introducing end-users to different versions of the documents (e.g., edited by technical editors based on quality criteria and original version) and observing their use. Another direction is to explore trade-offs between quality attributes, such as whether readability outweighs structure in terms of document usability, and to further disambiguate similar dimensions (e.g., structure and cohesion). We will further revisit the edits from the technical editors to extract surface-level (e.g., word count) and deep-level (e.g., cohesion of adjacent sentences) lexical, syntactic, and discourse features to build classification models for predicting documentation quality. Such classification models can be used to assess and rank the quality of software documentation.
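As a sketch of the kind of classification model envisioned above, the following Python example trains a simple regressor on surface-level features of a document (word count, average sentence length, vocabulary size) to predict an overall quality rating. It is a minimal illustration under stated assumptions rather than our implementation: the training file (documents.csv), its columns, and the feature set are hypothetical, and the deep-level discourse features discussed above would be added analogously.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Illustrative sketch only: predict a 1-10 quality rating from surface-level
# features. File name, columns, and feature set are hypothetical.

def surface_features(text: str) -> dict:
    words = text.split()
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    return {
        "word_count": len(words),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "vocabulary_size": len({w.lower() for w in words}),
    }

data = pd.read_csv("documents.csv")  # expected columns: text, rating (1-10)
features = pd.DataFrame([surface_features(t) for t in data["text"]])

model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(model, features, data["rating"], cv=5,
                         scoring="neg_mean_absolute_error")
print("MAE:", -scores.mean())

# Once fitted, the same model can score and rank unseen documents by
# predicted quality.
model.fit(features, data["rating"])
```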
We believe these efforts are important because of the volume of software documentation on the web. A simple Google search related to a programming task will return documentation that matches the query without explicitly considering the quality of the material. For example, a query on 'reading CSV file in r' returns blog articles and tutorials as the top results, yet our preliminary results demonstrate that on average, blog articles are ranked worst across all ten quality dimensions. This is not to say that blog articles are essentially faulty and should be abandoned moving forward; rather, we hope to spur reflection on how end-users interact with blog articles and how each of our dimensions manifests uniquely under the genre's constraints, while nevertheless using what we currently know to emphasise more useful results. Therefore, applying the framework can influence guidelines and recommendation systems for documentation improvement as well as automatic assessing and ranking systems for navigating the large volumes of documentation and emphasising high-quality documentation.

ACKNOWLEDGEMENTS
The authors thank Emerson Murphy-Hill and Kathryn T. Stolee for their contributions to this work. This research was undertaken, in part, thanks to funding from a University of Adelaide – NC State Starter Grant and the Australian Research Council's Discovery Early Career Researcher Award (DECRA) funding scheme (DE180100153). This work was inspired by the International Workshop series on Dynamic Software Documentation, held at McGill's Bellairs Research Institute.

REFERENCES
[1] Emad Aghajani, Csaba Nagy, Olga Lucero Vega-Márquez, Mario Linares-Vásquez, Laura Moreno, Gabriele Bavota, and Michele Lanza. 2019. Software documentation issues unveiled. In Int'l. Conf. on Software Engineering. 1199–1210.
[2] Anton Barua, Stephen W Thomas, and Ahmed E Hassan. 2014. What are developers talking about? An analysis of topics and trends in Stack Overflow. Empirical Software Engineering 19, 3 (2014), 619–654.
[3] Barthélémy Dagenais and Martin P Robillard. 2014. Using traceability links to recommend adaptive changes for documentation evolution. IEEE Trans. on Software Engineering 40, 11 (2014), 1126–1146.
[4] Martin J Eppler. 2001. A generic framework for information quality in knowledge-intensive processes. In Int'l. Conf. on Information Quality.
[5] Davide Fucci, Alireza Mollaalizadehbahnemiri, and Walid Maalej. 2019. On using machine learning to identify knowledge in API reference documentation. In Joint Meeting on European Software Engineering Conference and Symp. on the Foundations of Software Engineering. 109–119.
[6] Ninus Khamis. 2013. Applying Computational Linguistics on API Documentation: The JavadocMiner. LAP LAMBERT Academic Publishing.
[7] Shirlee-ann Knight and Janice Burn. 2005. Developing a framework for assessing information quality on the World Wide Web. Informing Science 8 (2005).
[8] Sheena Panthaplackel, Pengyu Nie, Milos Gligoric, Junyi Jessy Li, and Raymond J Mooney. 2020. Learning to Update Natural Language Comments Based on Code Changes. arXiv preprint arXiv:2004.12169 (2020).
[9] Chris Parnin, Christoph Treude, Lars Grammel, and Margaret-Anne Storey. 2012. Crowd documentation: Exploring the coverage and the dynamics of API discussions on Stack Overflow. Technical Report. Georgia Institute of Technology.
[10] Chris Parnin, Christoph Treude, and Margaret-Anne Storey. 2013. Blogging developer knowledge: Motivations, challenges, and future directions. In Int'l. Conf. on Program Comprehension. 211–214.
[11] Gayane Petrosyan, Martin P Robillard, and Renato De Mori. 2015. Discovering information explaining API types using text classification. In Int'l. Conf. on Software Engineering. 869–879.
[12] Emily Pitler and Ani Nenkova. 2008. Revisiting readability: A unified framework for predicting text quality. In Conference on Empirical Methods in Natural Language Processing. 186–195.
[13] Reinhold Plösch, Andreas Dautovic, and Matthias Saft. 2014. The value of software documentation quality. In Int'l. Conf. on Quality Software. 333–342.
[14] Gede Artha Azriadi Prana, Christoph Treude, Ferdian Thung, Thushari Atapattu, and David Lo. 2019. Categorizing the content of GitHub README files. Empirical Software Engineering 24, 3 (2019), 1296–1327.
[15] Inderjot Kaur Ratol and Martin P Robillard. 2017. Detecting fragile comments. In Int'l. Conf. on Automated Software Engineering. 112–122.
[16] Shin Hwei Tan, Darko Marinov, Lin Tan, and Gary T Leavens. 2012. @tcomment: Testing Javadoc comments to detect comment-code inconsistencies. In Int'l. Conf. on Software Testing, Verification and Validation. 260–269.
[17] Richard Y Wang and Diane M Strong. 1996. Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems 12, 4 (1996), 5–33.
[18] Fengcai Wen, Csaba Nagy, Gabriele Bavota, and Michele Lanza. 2019. A large-scale empirical study on code-comment inconsistencies. In Int'l. Conf. on Program Comprehension. 53–64.
[19] Hao Zhong and Zhendong Su. 2013. Detecting API documentation errors. In Int'l. Conf. on Object-Oriented Programming, Systems, Languages, and Applications. 803–816.
[20] Hao Zhong, Lu Zhang, Tao Xie, and Hong Mei. 2009. Inferring resource specifications from natural language API documentation. In Int'l. Conf. on Automated Software Engineering. 307–318.
[21] Yu Zhou, Changzhi Wang, Xin Yan, Taolue Chen, Sebastiano Panichella, and Harald C Gall. 2018. Automatic detection and repair recommendation of directive defects in Java API documentation. IEEE Trans. on Software Engineering (2018).
[22] Yu Zhou, Xin Yan, Taolue Chen, Sebastiano Panichella, and Harald Gall. 2019. DRONE: a tool to detect and repair directive defects in Java APIs documentation. In Int'l. Conf. on Software Engineering: Companion Proceedings. 115–118.