Documentation Reuse: Hot Or Not? an Empirical Study Mohamed Oumaziz, Alan Charpentier, Jean-Rémy Falleri, Xavier Blanc
Total Page:16
File Type:pdf, Size:1020Kb
Documentation Reuse: Hot or Not? An Empirical Study Mohamed Oumaziz, Alan Charpentier, Jean-Rémy Falleri, Xavier Blanc To cite this version: Mohamed Oumaziz, Alan Charpentier, Jean-Rémy Falleri, Xavier Blanc. Documentation Reuse: Hot or Not? An Empirical Study. 16th International Conference on Software Reuse (ICSR), May 2017, Salvador, Brazil. pp.12-27, 10.1007/978-3-319-56856-0_2. hal-02182142 HAL Id: hal-02182142 https://hal.archives-ouvertes.fr/hal-02182142 Submitted on 6 Jan 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Documentation Reuse: Hot or Not? An Empirical Study Mohamed A. Oumaziz, Alan Charpentier, Jean-R´emy Falleri, Xavier Blanc CNRS, Bordeaux INP, Univ. Bordeaux LaBRI, UMR 5800 F-33400, Talence, France {moumaziz,acharpen,falleri,xblanc}@labri.fr Abstract. Having available a high quality documentation is critical for software projects. This is why documentation tools such as Javadoc are so popular. As for code, documentation should be reused when possible to increase developer productivity and simplify maintenance. In this paper, we perform an empirical study of duplications in JavaDoc documentation on a corpus of seven famous Java APIs. Our results show that copy- pastes of JavaDoc documentation tags are abundant in our corpus. We also show that these copy-pastes are caused by four different kinds of relations in the underlying source code. In addition, we show that popular documentation tools do not provide any reuse mechanism to cope with these relations. Finally, we make a proposal for a simple but efficient automatic reuse mechanism. Keywords: documentation, reuse, empirical study 1 Introduction Code documentation is a crucial part of software development as it helps devel- opers understand someone else's code without reading it [13,25]. It is even more critical in the context of APIs, where the code is developed with the main intent to be used by other developers (the users of the API) that do not want to read it [12, 16, 17]. In this context, having a high quality reference documentation is critical [5]. Further, it has been shown that the documentation has to be close to its cor- responding code [8,9,14]. Developers prefer to write the documentation directly in comments within the code files rather than in external artifacts [22]. Popular documentation tools, such as JavaDoc or Doxygen, all share the same principle which is to parse source code files to extract tags from documentation comments and to generate readable web pages [20, 24]. Writing documentation and code are highly coupled tasks. Ideally, developers should write and update the documentation together with the code. However, it has been shown that the documentation is rarely up-to-date with the code [8,9, 14] and is perceived as very expensive to maintain [4, 5]. 2 Mohamed A. Oumaziz, Alan Charpentier, Jean-R´emy Falleri, Xavier Blanc We think that one possible reason for this maintenance burden is that docu- mentation tools lack reuse mechanisms whereas there are plenty of such mecha- nisms in programming languages. Developers that write documentation therefore copy-paste many documentation tags, which is suspected to increase the main- tenance effort [11]. As an example, let us consider a case of delegation as shown in the Figure 1. In this example, the right method is just returning directly a value computed from the left method. As expected, some documentation tags from the left method are copy-pasted in the right method: the common parameters and the return value. As a consequence, if the documentation of the callee method is updated, an update of the caller documentation will have to be carried out manually, which is well known to be error-prone [11]. /** /** * @param a the first collection, must * @param a the first collection, must not be null not be null * @param b the second collection, must * @param b the second collection, must not be null not be null * @return true iff the collections * @param equator the Equator contain the same elements with the used for testing equality same cardinalities. * @return true iff the collections */ contain the same elements with the public static boolean same cardinalities. isEqualCollection(final */ Collection a, final public static boolean Collection b) { isEqualCollection(final ... Collection a, final return true; Collection b, final Equator } equator ) { ... return isEqualCollection(collect(a, transformer), collect(b, transformer)); } Fig. 1. Extract of a documentation duplication due to method delegation (in the Apache Commons Collections project). Duplicated tags are displayed in bold. In this paper, we investigate this hypothesis and more formally answer the two following research questions: { RQ1: Do developers often resort to copy-paste documentation tags? { RQ2: What are the causes of documentation tags copy-paste and could they be avoided by a proper usage of documentation tools? We answer our research questions by providing an empirical study performed on a corpus of seven popular Java APIs where the need of documentation is crit- ical (see Section 2.1). We answer the first research question by showing how big is the phenomenon of documentation tags copy-pasting (see Section 3). To that extent, we automatically identify what we call documentation tags duplications (Section 2.2), count them, and manually check if they are intended copy-pastes Documentation Reuse: Hot or Not? An Empirical Study 3 or just created by coincidence. We answer the second research question by in- vestigating the intended copy-pastes we observed with the objective to find out their causes. Then we analyze whether existing documentation tools can cope with them (see Section 4). We further extend our second research question by providing a proposal for a simple but useful documentation reuse mechanism. Our results show that copy-pastes of documentation tags are abundant in our corpus. We also show that these copy-pastes are caused by four kinds of relations that take place in the underlying source code. In addition, we show that popular documentation tools do not provide any reuse mechanism to cope with these relations. The structure of this paper is as follows. First, Section 2 presents our corpus and the tool we create to automatically identify documentation duplications. Then, Section 3 and Section 4 respectively investigate our two research questions. Finally, Section 5 describes the related works about software documentation and Section 6 concludes and describes future work. 2 Experimental Setup In this section, we first explain how we create our corpus (Section 2.1), and give general statistics about it. Then, we describe how we extract documentation duplications contained in our corpus (Section 2.2). 2.1 Corpus The corpus of our study is composed of seven Java APIs that use JavaDoc, arbi- trary selected from the top 30 most used Java libraries on GitHub as computed in a previous work of Teyton et al. [23]. We just considered the source code used to generate the documentation displayed on their websites. We also choose to focus only on methods' documentation, as this is where there is most of the documentation. In the remainder of this paper, we therefore only discuss about the documentation of Java methods written in JavaDoc. Table 1 presents these seven APIs. All the data gathered for this study is available on our website1. As we can see in the General section of this table, the projects are medium to large sized (from 33 to 1,203 classes). As expected, they contain a fair amount of documentation: from about 28% to 97% of the methods are documented. The Tags section of this table gives some descriptive statistics of the JavaDoc tags used. As we can see, the most frequent tags are usually, in order: @description, @param, @return and @throw. Finally, the inheritDoc section of this table shows that there are few @inheritDoc tags. Such tags are used to express a documentation reuse between two methods but the method that reuses the documentation must override or implement the method that contains the reused documentation. 1 http://se.labri.fr/a/ICSR17-oumaziz 4 Mohamed A. Oumaziz, Alan Charpentier, Jean-R´emy Falleri, Xavier Blanc Table 1. Statistics computed from our corpus and the documentation it contains. acc1 acio2ggson3Guava JUnit Mockito SLF4J General # of classes 466 119 72 1,203 205 375 33 # of methods 4,078 1,173 569 9,928 1,319 1,716 433 % of documented methods 61.53 97.27 52.55 36.37 43.44 28.15 36.49 Tags # of @description 1,939 922 265 3,073 448 436 128 # of @param 1,199 734 106 749 178 237 51 # of @throw 438 209 65 462 12 11 5 # of @return 892 322 92 414 90 131 42 inheritDoc # of usage 85 18 0 112 2 0 0 1 Apache Commons Collections 2 Apache Commons IO 3 google-gson 2.2 Documentation Duplication Detector A documentation duplication is a set of JavaDoc tags that are duplicated among a set of Java methods. If it is intended then it was created by a copy-paste, if not then it was created by coincidence. We propose a documentation duplication detector that inputs a set of Java source code files and outputs the so-called documentation duplications2. The detector first parses the Java files and identifies all the documentation tags they contain by using the GumTree tool [7].