

Tests from Traces: Automated Unit Test Extraction for R

Filip Křikava, Czech Technical University, Czech Republic
Jan Vitek, Northeastern University and CTU, USA

ABSTRACT

Unit tests are labor-intensive to write and maintain. This paper looks into how well unit tests for a target software package can be extracted from the execution traces of client code. Our objective is to reduce the effort involved in creating test suites while minimizing the number and size of individual tests and maximizing coverage. To evaluate the viability of our approach, we select a challenging target for automated test extraction, namely R, a programming language that is popular for data science applications. The challenges presented by R are its extreme dynamism, coerciveness, and lack of types. This combination decreases the efficacy of traditional test extraction techniques. We present Genthat, a tool developed over the last couple of years to non-invasively record execution traces of R programs and extract unit tests from those traces. We have carried out an evaluation on 1,545 packages comprising 1.7M lines of code. The tests extracted by Genthat improved code coverage from the original rather low value of 267,496 lines to 700,918 lines. The running time of the generated tests is 1.9 times faster than the code they came from.

CCS CONCEPTS

• Software and its engineering → Software testing and debugging;

KEYWORDS

Test extraction, Program tracing, R

ACM Reference Format:
Filip Křikava and Jan Vitek. 2018. Tests from Traces: Automated Unit Test Extraction for R. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA'18), July 16–21, 2018, Amsterdam, Netherlands. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3213846.3213863
1 INTRODUCTION

Testing is an integral part of good software development practice. Test-driven development is routinely taught in Computer Science programs, yet a cursory inspection of projects on GitHub suggests that the presence of test suites cannot be taken for granted, and even when tests are available they do not always provide the coverage or granularity needed to easily pinpoint the source of errors.

This paper explores a rather simple idea, namely: can we effectively and efficiently extract test cases from program execution traces? Our motivation is that if developers do not write comprehensive unit test suites, then it may be possible for a tool to extract those for them, especially when the software to test is widely used in other projects. Our approach is as follows: for each project and its reverse dependencies, gather all runnable artifacts, be they test cases or examples, that may exercise the target; run the artifacts in an environment that records execution traces; and from those traces produce unit tests and, if possible, minimize them, keeping only the ones that increase code coverage.

The key question we aim to answer is: how well can automated trace-based unit test extraction actually work in practice? The metrics of interest are related to the quality of the extracted tests and their coverage of the target project, as well as the costs of the whole process. To answer this we have to pick an actual programming language and its associated software ecosystem. Any combination of language and ecosystem is likely to have its quirks; we structure the paper so as to identify those language-specific features.

Concretely, this paper reports on our implementation and empirical evaluation of Genthat, a tool for automated extraction of unit tests from traces for the R statistical programming language [11]. The R software ecosystem is organized around a curated open source software repository named CRAN. For our purposes, we randomly select about 12% of the packages hosted on CRAN, or 1,545 software libraries. These packages amount to approximately 1.7M lines of code stripped of comments and empty lines. The maintainers of CRAN enforce the presence of so-called vignettes, documentation with runnable examples, for all hosted libraries. Some libraries come equipped with their own test cases. These are typically coarse-grained scripts whose output is compared for textual equality. Our aim with Genthat is to help R developers extract unit tests that can easily pinpoint the source of problems, are reasonably small, provide good coverage, and execute quickly. Furthermore, we want to help developers automatically capture common usage patterns of their code in the wild. In the R code hosted on CRAN, most of the code coverage comes from examples and vignettes, and very little is already in the form of tests. In the corpus of 1,545 packages we selected for this work, tests provide only an average of 19% coverage, whereas when examples and vignettes are executed coverage is boosted to 68%.

As of this writing, we are not aware of any other tool for automatically extracting test cases for R. This is likely due to the limited interest that data analysis languages have garnered in our community, and also due to features of R that make it a challenging target for tooling. The language is extremely dynamic: it has no type annotations to structure code, each and every operation can be redefined during execution, values are automatically coerced from type to type, arguments of functions are lazily evaluated and can be coerced back to source expressions, values are modified using a copy-on-write discipline most of the time, and reflective operations allow programs to manipulate most aspects of a program's execution state [9]. This combination of features has allowed developers to build complex constructions, such as support for object-oriented programming, on top of the core language. Furthermore, large swaths of the system are written in C code and may break any reasonable invariants one may hope for.

The contributions of this paper are the description of a tool, Genthat, for automatically extracting unit tests for R, as well as an empirical evaluation of that tool that demonstrates that for a large corpus, 1,545 packages, it is possible to significantly improve code coverage. On average, the default tests that come with the packages cover only 19%. After deploying Genthat we are able to increase the coverage to 53%. This increase mostly comes from extracting test cases from all the available executable artifacts in the package and the artifacts from packages that depend on this package. Genthat is surprisingly accurate: it can reproduce 80% of the calls executed by the scripts, and it is also able to greatly reduce the number and size of test cases that are retained in the extracted suite, running 1.9 times faster than package examples, tests and vignettes combined with only 15% less code coverage (53% vs 68%). The reduction in coverage comes from limitations in what code can be turned into tests.

Artifact. The code of our tools, the analysis scripts used to generate the reports, and sample data are available in the validated artifact accompanying this paper (https://github.com/fikovnik/ISSTA18-Artifact).

2 BACKGROUND AND RELATED WORK

This section starts with a brief overview of the relevant characteristics of the R programming language, as well as of its ecosystem, before discussing previous work on automated test extraction and its application to R.

2.1 The Language

The R programming language is a challenging language for tooling. The relevant features are the following (a short snippet illustrating some of these behaviors follows the list):

− R does not have type annotations or a static type system. This means there is nothing to suggest what the expected arguments or return values of a function could be, and thus there is little to guide test generation.
− Symbols such as + and ( can be redefined during execution. This means that every operation performed in a program depends on the state of the system, and no function call has a fixed semantics. This holds even for control flow operations such as loops and conditionals. While redefinitions are not frequent, they do occur and tools must handle them.
− Built-in types are automatically and silently coerced from more specific types to more general types when deemed appropriate.
− R is fully reflective: it is possible to inspect any part of a computation (e.g. the code, the stack or the heap) programmatically; moreover, almost all aspects of the state of a computation can be modified (e.g. variables can be deleted from an environment and injected into another).
− All expressions are evaluated by-need, thus the call f(a+b) contains three delayed sub-expressions, one for each variable and one for the call to plus. This means that R does not pass values to functions but rather passes unevaluated promises (the order of evaluation of promises is part of the semantics as they can have side effects). These promises can also be turned back into code by reflection.
− Most values are vectors or lists. Values can be annotated by key-value pairs. These annotations, coupled with reflection, are the basic building blocks for many advanced features of R. An example of this are the four different object systems that use annotations to express classes and other attributes.
− R has a copy-on-write semantics for shared values. A value is shared if it is accessible from more than one variable. This means that side effects that change shared values are rare. This gives a functional flavor to large parts of R.

R is a surprisingly rich language with a rather intricate semantics; we can't do it justice in the space at hand. The above summary should suffice for the remainder of this paper.
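The following snippet (ours, for illustration only; it does not appear in the paper) shows two of these behaviors in a stock R session: rebinding a built-in operator, and recovering an argument as code rather than as a value.

    # Any symbol, including `+`, can be rebound at run time:
    `+` <- function(a, b) a * b
    1 + 2                    # now evaluates to 2
    rm(`+`)                  # remove the binding; the built-in `+` applies again

    # Arguments are promises and can be recovered as unevaluated code:
    f <- function(x) substitute(x)
    f(a + b)                 # returns the expression a + b, even though
                             # neither a nor b is defined

    # Silent coercion: the number is coerced to a character for the comparison
    "1" == 1                 # TRUE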
2.2 The Ecosystem

The largest repository of R code is the Comprehensive R Archive Network (CRAN, http://cran.r-project.org). With over 12,000 packages, CRAN is a rapidly growing repository of statistical software (it receives about 6 new packages a day [8]). Unlike sites like GitHub, CRAN is a curated repository. Each program deposited in the archive must come with documentation and abide by a number of well-formedness rules that are automatically checked at each submission. Most relevant for our purpose, packages must have documentation that comes in the form of examples and vignettes and, optionally, tests. All executable artifacts are run at each commit and for each new release of the language. Vignettes can be written using a variety of document processors. For instance, Figure 1 contains a (simplified) vignette for a package named A3. That particular vignette combines text formatting comments and bits of documentation with code. For our purposes, the key part is the section tagged examples. That section contains code which is a runnable example intended to showcase usage of the A3 package.

    % Generated by roxygen2
    \name{a3}
    \alias{a3}
    \title{A3 Results for Arbitrary Model}

    \usage{a3(formula, data, modelfn, model.args = list(), ...)}

    \arguments{
      \item{formula}{the regression formula}
      \item{data}{a data frame containing data for model fit}
      \item{modelfn}{the function that builds the model}
    }

    \description{Calculates A3 for an arbitrary model.}

    \examples{
      summary(lm(rating ~ ., attitude))
      a3(rating ~ ., attitude, lm, p.acc = 0.1)
      require(randomForest)
      a3(rating ~ .+0, attitude, randomForest, p.acc = 0.1)
    }

Figure 1: Sample vignette.

While it is remarkable that every package comes with data and at least one vignette, this nevertheless does not necessarily make for good tests. There are two issues with R's approach: vignettes are coarse grained, they typically only exercise top-level functions in a package, and they do not specify their expected output—sometimes that output is graphics, often it has some textual output. As shown above, vignettes are long-form guides to packages: they describe the problem that the package is designed to solve, and then show how to solve it. Their primary audience is developers. A vignette should divide functions into useful categories, and demonstrate how to coordinate multiple functions to solve problems. Code can be extracted from a vignette and run automatically. One can therefore only assert whether code extracted from examples or vignettes ran without throwing any exceptions and whether the output is similar to the last time it was run; nothing can be said about the correctness of the output and, if there is a difference in output, whether this is a substantial departure or merely an accident of the way numbers are printed.
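Pulling the runnable code out of these artifacts is mechanical; the sketch below is ours (file names are hypothetical and this is not Genthat's actual implementation) and shows how the examples of a help page and the code chunks of a vignette can be turned into plain R scripts with standard tooling.

    # Extract the \examples{} section of a help page into an R script
    tools::Rd2ex("man/a3.Rd", out = "a3-examples.R")

    # Extract the code chunks of a knitr vignette into an R script
    knitr::purl("vignettes/a3.Rmd", output = "a3-vignette.R")

    # The resulting scripts are ordinary R files that can simply be executed
    source("a3-examples.R")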


In our experience, most R developers are domain experts rather than trained software engineers; this is a possible explanation for the relatively low proportion of unit tests in packages hosted in CRAN. Most tests are simply R scripts that are coupled with a text file capturing the expected textual output from the interpreter running the given test [12]. The most popular unit testing framework, called testthat, is used by only 3,017 packages, about 23% of the total number of packages in CRAN (https://cran.r-project.org/web/packages/testthat). In our experience, even when testthat is used, only a few tests are defined, covering only a subset of the interface of the package.
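For contrast with output-diffing scripts, a testthat test pairs a call with an explicit expectation; the following minimal example is ours (not taken from any CRAN package):

    library(testthat)

    test_that("subsetting keeps the rows matching the predicate", {
      cars <- data.frame(name = c("Mazda", "Fiat", "Honda"), hp = c(110, 66, 55))
      expect_equal(nrow(subset(cars, hp >= 60)), 2)     # explicit expected value
      expect_error(subset(cars, missing_column >= 60))  # failure is also asserted
    })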
2.3 Automating Test Extraction

Our goal is to extract black-box unit tests for packages from a large body of client code. The benefit we are aiming for is to reduce the manual effort involved in the construction of such regression testing suites, not necessarily to find new bugs. To the best of our knowledge there are no tools that address this issue for R, and the previous work requires substantial adaptation to translate to our context.

Test generation is a topic with an extensive body of work. We will only touch on the research directly relevant to our development. We focus on automated techniques that do not require users to write specifications or to provide additional information. The literature can be split into work based on static techniques, dynamic techniques, and a combination of both.

Static techniques use the program text as a starting point for driving test generation [1, 5, 16]. The difficulty we face with R is its extreme dynamism. Consider the expression f(x/(y+1)). The semantics of R entails that the definitions of f, /, +, and even ( have to be looked up in the current environment. Given that x and y may have class attributes, any of those functions could have to dispatch on those. At any time, any of these symbols can be redefined (and in practice they are, sometimes for good reasons [9]). It is also possible for code to reflectively inject or remove variable bindings in any environment, and to turn any expression back into text that can be manipulated programmatically and then turned back into executable code.

One approach that would be worth exploring, given that sound static analysis for R appears to be a non-starter, is some combination of static analysis and machine learning. For instance, MSeqGen uses data obtained by mining large code bases to drive test generation [19]. So far we have not gone down that road.

Dynamic approaches for generating unit tests often rely on some form of record and replay. Record and replay has been used to generate reproducible benchmarks from real-world workloads [13] and, for example, to capture objects that can be used as inputs of tests [6]. Joshi and Orso describe the issues of capturing state for Java programs [7]. Test carving [4] and factoring [15] have very similar goals to ours, namely to extract focused unit tests from larger system tests. These works mostly focus on capturing and mocking up a sufficiently large part of an object graph so that a test can be replayed. Rooney used a similar approach to extract tests during actual use of an application through instrumentation [14]. While R has objects, their use is somewhat limited, they are typically self-contained, and capturing them in their entirety has worked well enough for now.

In terms of pushing our work towards bug finding, we pine for the sanity of languages such as Java or even JavaScript, as in these languages it is not too hard to get an errant program to throw an exception—null pointers or out of bound errors are reliable oracles that something went wrong [3]. In R, most data type mismatches cause silent conversions, out of bound accesses merely extend the target array, and missing values are generated when nothing else fits; none of this causes the computation to stop or is an obvious sign that something went wrong.

Finally, there are few tools dealing with languages that support call-by-need. The most popular tool in that space is QuickCheck [2] for Haskell. It leverages types as well as user-defined annotations to drive random test generation.

3 GENTHAT: DESIGN & IMPLEMENTATION

An execution trace is the sequence of operations performed by a program for a given set of input values. We propose to record sets of execution traces of clients of a target package and, from those traces, extract unit tests for functions in either the public interface or the internal implementation of that target package. To create those unit tests, only the data needed to execute individual function calls is required. This boils down to capturing function arguments and any global state the function may access or rely on. To validate that the function ran successfully, outputs must be recorded so that the value observed in the trace can be compared with the result of running the generated test.

Consider Figure 2, which illustrates the generation of a test for a function subset in some target package. It shows code from a client of that target package (a), the target package itself (b), and the corresponding generated test (c). The client code creates a data frame with information about vehicles and then filters out cars with horse power less than 60. The subset function subsets the given data frame (cars$hp >= 60 returns a vector of booleans, c(TRUE,TRUE,FALSE), that is used to filter the data frame rows in xs[p, ]). The generated test first recreates the argument of the call, then calls the subset function and checks the return value.

    (a) Client code:
        > a <- c("Mazda","Fiat","Honda")
        > b <- c(110, 66, 55)
        > cars <- data.frame(name=a, hp=b)
        > subset(cars, cars$hp >= 60)
           name  hp
        1 Mazda 110
        2  Fiat  66

    (b) Package:
        subset <- function(xs, p) {
          # filter the data frame rows
          xs[p, ]
        }

    (c) Extracted test:
        test_that("subset", {
          cars <- data.frame(
            name=c("Mazda","Fiat","Honda"), hp=c(110, 66, 55)
          )
          expect_equal(
            subset(xs=cars, p=cars$hp >= 60),
            data.frame(name=c("Mazda","Fiat"), hp=c(110, 66))
          )
        })

Figure 2: Extracting a test case for subset.

In a simpler language, all one would need to do would be to record argument and return values as shown in the example of Figure 2. Not so in R. Lazy evaluation complicates matters, as arguments are passed by promise; a promise is a closure that is evaluated at most once to yield a result. Thus, at the time of a function call one cannot record values as they are not yet evaluated. The example in Figure 3 shows an improved version of the subset function which allows programmers to reference the columns directly, so that the sub-setting predicate in the client code (d) can be written more directly as hp>=60 instead of cars$hp>=60. But in (d) there is no definition for hp, so the code would be incorrect if the argument was evaluated eagerly. Instead, in (e) reflection is used to reinterpret the argument. The expression substitute(p) does not force the promise; it simply gets the source code of the expression passed as argument p. This is subsequently used in the with function, which evaluates its argument in an environment that has been enriched by the content of the first argument. Thus hp>=60 is evaluated in the context of cars, which has a column named hp. Now, the problem for our tool is that we cannot simply record the values of the passed arguments, because doing so would force promises that cannot be evaluated in the current frame. In our example, trying to record p would force the evaluation of hp>=60, which would fail since there is no variable binding called hp in the scope of the call. In general, manually forcing a promise is dangerous. It may alter the behavior of the program since promises can have side effects. Moreover, the reflective capabilities of R dictate that the generated expression must be as close to the original call as possible. Any syntactic change may be observable by user code. The generated test thus attempts to retain the structure of the client source code as much as possible.

    (d) Client code:
        > a <- c("Mazda","Fiat","Honda")
        > b <- c(110, 66, 55)
        > cars <- data.frame(name=a, hp=b)
        > # hp is only defined in the cars data frame
        > subset(cars, hp >= 60)
           name  hp
        1 Mazda 110
        2  Fiat  66

    (e) Package:
        subset <- function(xs, p) {
          # capture the expression without forcing the promise
          expr <- substitute(p)
          with(xs, xs[eval(expr), ])
        }

    (f) Extracted test:
        test_that("subset", {
          cars <- data.frame(
            name=c("Mazda","Fiat","Honda"), hp=c(110, 66, 55)
          )
          expect_equal(
            subset(xs=cars, p=hp >= 60),
            data.frame(name=c("Mazda","Fiat"), hp=c(110, 66))
          )
        })

Figure 3: Tracing lazy evaluation and promises.

3.1 Genthat Overview

Figure 4 presents an overview of the main phases of Genthat:

(a) Install: given a target package, the package is downloaded from CRAN and installed in the current R environment. Furthermore, any package in CRAN that transitively depends on the target package is also acquired.
(b) Extract: executable code is extracted from the examples and vignettes in the installed packages. Target package functions that are to be tested are instrumented. Depending on which option is selected, either the public API functions or the private ones are decorated. All executable artifacts are turned into scripts. Each script is a self-contained runnable file.
(c) Trace: for each script, run that script and record trace information. From these traces, generate unit tests for all calls to target functions.
(d) Check: all unit tests are checked for validity and correctness. Valid tests are those that execute without error; correct ones are those that return the expected result. Invalid and incorrect tests are discarded. Code coverage data is recorded for the tests that are retained.
(e) Minimize: optionally, minimize the test suite. Minimization uses simple heuristics to discard tests that do not increase code coverage. Coverage being equal, tests that are textually smaller are preferred.


Figure 4: Overview. Genthat installs packages from CRAN, extracts scripts, traces their execution, and generates unit tests. Unit tests are checked for correctness and validity, and finally tests are minimized.

Test extraction can fail. Section 4.4 details the reasons, but at a high level, the failures can occur during (a) tracing, because the instrumentation perturbs the behavior of the program; (b) generation, because some value could not be serialized; (c) validation, because deserialization fails; and (d) correctness checking, because the test was non-deterministic or relied on external state that was not properly captured. These failures account for the difference in coverage between the original traces and the extracted tests.

We will now describe salient points of our implementation. R is an interpreted language. A straightforward solution would therefore be to modify the R interpreter. However, one of our design goals was to make Genthat available as a package in the CRAN repository so that developers can use it without having to modify their distribution. This limits the implementation of Genthat to publicly available APIs of the R environment. The implementation is mostly written in R. The only exception is the serializer, which is written in C++ for performance reasons; a serializer written in R turned out to be an order of magnitude slower. The whole system consists of 4,500 lines of R code and 760 lines of C++ code. The software is released in open source (https://github.com/PRL-PRG/genthat).

3.2 Tracing

Tracing involves capturing function invocations and recording the inputs and outputs, including any additional details needed to reproduce the function call in isolation. Consider the following example of a function that filters a vector xs by a predicate p:

    filter <- function(xs, p) xs[sapply(xs, p)]

Tracing is realized by instrumentation of R functions. Code is injected into target function bodies to record argument values, the return value, and the state of the random number generator (in statistics, random number generators are omnipresent). After instrumentation, the above function body is rewritten to:

    `_captured_seed` <- get(".Random.seed", env=globalenv())
    on.exit({
      if (tracing_enabled()) {
        disable_tracing()
        retv <- returnValue(default = `_deflt_retv`)
        if (normal_ret(retv)) {
          record_trace(name = "filter", pkg = NULL,
                       args = as.list(match.call())[-1],
                       retv = retv, seed = `_captured_seed`,
                       env = parent.frame())
        }
        enable_tracing()
      }
    })
    xs[sapply(xs, p)]  # original function body

The inserted code has a prologue that runs before any of the original function's code and an epilogue that runs right before the function exits. The latter relies on the on.exit hook provided by R. The prologue records the random seed. Due to R's lazy nature, recording of the argument values is done at the end of the function, to avoid forcing a promise before it is needed—that might cause a side effect and change the result of the program, or cause an error. With the exception of environments, R values are immutable with a copy-on-write semantics, so any modification of function arguments in the function's body will trigger their duplication.

There are two things that need to be done before recording a call. First, tracing must be temporarily disabled, as it could otherwise recurse. Next, we need to check whether the function terminated normally or threw an exception. A call is only recorded in the case the function did not fail. While most values can be easily recorded, three types of values must be handled specially: symbols, expressions, and closures. Consider this example:

    m <- 0.5; n <- 1
    filter(runif(10) + m, function(x) x > n)

The code has a call to filter with a nested, anonymous function definition which captures the variable n from the environment at the call site. Captured variables need to be resolved to their actual values. R has a built-in code analysis toolkit (the codetools package, https://cran.r-project.org/web/packages/codetools/) that can report the list of free variables used by a closure. In the above call the set of free variables is `{`, runif, `+`, m, n, `>`. They are resolved to their values using the hierarchy of environments, starting with the enclosing environment of the traced function. Extra care is needed for symbols coming from other packages. For those, we do not store values but instead a call to get their values at runtime. In the example, runif is a function from the stats package, so the call stats::runif will be kept. Since it is possible to redefine any symbols, including `+` and `>`, we have to resolve them as well. However, from the base library we only keep the symbols that were modified. The complete trace for the above call will therefore be:
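As an aside (our illustration, not from the paper), the free-variable analysis mentioned above can be reproduced directly with the codetools package:

    m <- 0.5; n <- 1
    codetools::findGlobals(function(x) x > n)
    # reports ">" and "n" as the globals used by the closure
    codetools::findGlobals(function() runif(10) + m)
    # reports "+", "runif" and "m"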


    $ :List of 6
     ..$ fun    : chr "filter"
     ..$ pkg    : NULL
     ..$ args   :List of 2
     .. ..$ xs : language runif(10) + m
     .. ..$ fun: language function(x) x > n
     ..$ globals:List of 3
     .. ..$ runif: language stats::runif
     .. ..$ m    : num 0.5
     .. ..$ n    : num 1
     ..$ seed   : int [1:626] 403 20 -1577024373 1699409082 ...
     ..$ retv   : num [1:3] 0.5374310 1.4735399 0.9317512

The very same has to be done for global variables that are closures. For example, if the filter function is called as follows:

    gt <- function(x) x > n
    filter(runif(10) + m, gt)

the tracer needs to capture the free variable n and store it in the environment of the gt function. The args and globals of this call become:

    ..$ args   :List of 2
     .. ..$ xs : language runif(10) + m
     .. ..$ fun: symbol gt
    ..$ globals:List of 3
     .. ..$ runif: language stats::runif
     .. ..$ m    : num 0.5
     .. ..$ gt   : language function(x) x > n

The value of n is stored in the enclosing environment of the gt function.

3.3 Generating Tests

To generate unit tests out of the recorded traces, we need to write out arbitrary R values into a source code file and fill a template representing a unit test with the generated snippets of code. We first describe the general mechanism for serializing values to source code, also known as deparsing (in the R world, serialization refers to binary serialization, while deparsing denotes turning values into their source code form). The deparse function turns some values into a character string. Unfortunately, it handles only a subset of the 25 different R data types [10]. Since any of those data types can show up as an argument to a function call, we had to provide our own deparsing mechanism to support them all. In general we strive to output textual forms of arguments, because they can be inspected and modified by developers. But there are values for which it is either impractical (e.g. large vectors or matrices) or not possible (cf. below) to turn them back into source code; for those we fall back to built-in binary serialization. The data types can be divided into the following groups (a short illustration of the basic round-trip and its limits follows the list):

Basic values: There are 6 basic vector types (logical, integer, numeric, character, complex, and raw). Together with the types representing internal functions (e.g., if, +) as well as the type representing the NULL value, they can be turned into source code using deparse. The subtleties of floating-point numbers and Unicode characters are therefore handled directly by R.

Lists and symbols: Lists are polymorphic, they can contain any value, and therefore we cannot use deparse. For symbols, deparse does not handle empty symbol names and does not properly quote non-syntactic names.

Environments: Environments are mutable hash maps with links to their parent environments. They are not supported by deparse. Serialization is similar to lists; however, we need to keep track of possible cycles. It is also important to properly handle named environments and exclude the ones that were imported from packages.

Language objects: A language object contains the name of the function and a list of its argument values. In addition to ordinary functions, there are also unary functions, infix functions, (), [], [[]] and {}, all of which need to be properly handled.

Closures: Closures are triples that contain a list of formal parameters, code, and an enclosing environment. To serialize them properly, we need to resolve all their dependencies, which is a transitive closure over their global symbols. We use the same mechanism as for recording closures.

S4: S4 is one of the built-in R object systems. S4 classes and objects are internally represented by a special data type. They are complex and are currently serialized in binary form.

Promises: If a promise is encountered, the serialization tries to force it while hoping for the best. They are infrequent because, by the time we serialize, all the arguments that the function has used have already been forced.

External pointers: External pointers are used for system objects such as file handles or in packages that expose native objects. They are linked to the internal state of the interpreter and there is no general mechanism for persisting them across R sessions. They are currently not supported.

R internal values: These values are used within R's implementation. There is no reason these should be exposed in a unit test and thus they are not supported.
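The following snippet (ours) illustrates the baseline behavior that motivates this grouping: basic vectors survive a deparse round-trip, while reference types such as environments do not.

    v <- c(1.5, 2, 3)
    deparse(v)                        # "c(1.5, 2, 3)"
    eval(parse(text = deparse(v)))    # yields a vector identical to v

    e <- new.env()
    deparse(e)                        # "<environment>" -- this string cannot be
                                      # parsed back into an equivalent environment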
Some values might be large in their serialized form. For example, the only way to capture the random number generator is to write out its entire state, which is a vector of over 600 floating-point numbers. If we were to add that to the generated test, it would create ungainly clutter; 126 lines of the test would be full of floating-point values that are of no obvious advantage to the reader. Such values are therefore kept separately, in an environment that is saved in binary form next to the test file.

With all the trace values serialized, generating a unit test is merely a matter of filling a string template and reformatting the resulting code for good readability. For the latter we use the formatR package (https://cran.r-project.org/web/packages/formatR/). The unit test extracted from the second example call, filter(runif(10) + m, gt), is listed in Figure 5. The .ext.seed references an external value which will be loaded into the test running environment at runtime.

    library(testthat)

    .Random.seed <<- .ext.seed

    test_that("filter", {
      m <- 0.5
      gt <- genthat::with_env(function(x) x > y, list(y=5))
      expect_equal(
        filter(xs=stats::runif(10) + m, fun=gt),
        c(0.5374310, 1.4735399, 0.9317512)
      )
    })

Figure 5: Extracted test for filter(runif(10)+m,gt).

3.4 Minimization

Once tests have been created, Genthat checks that they pass and discards the failing ones. At the same time we record their code coverage. Given the code coverage we can minimize the extracted test suite, eliminating redundant test cases [21]. Our minimization strategy is a very simple heuristic that aims to decrease the number of tests and to keep the size of individual tests small. In future work we will investigate alternative goal functions, such as minimizing the combination of test size, test number, and execution time.

The current heuristic works as follows. Tests are sorted by increasing size (in bytes), with the intuition that, all things being equal, we prefer to retain smaller tests over larger ones. We use the covr package to compute code coverage during minimization. The approach is to take each test, compute its coverage, and compare that coverage to the union of the coverage of all previously retained tests. If the new test's coverage is a subset of the coverage so far, the test can be discarded. If the test increases coverage, it is retained. In practice, only a small fraction of tests are retained.

It is technically possible to compute the code coverage during tracing, discarding redundant traces. However, running R code with code coverage is slow and, since there are many duplicate calls (only a third of the calls in our experiment were unique), this would impose a significant overhead. Instead we discard duplicate calls during tracing and then run the minimization at the end.
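A condensed sketch of this greedy selection (ours; Genthat's real implementation uses the covr package and additional bookkeeping, and coverage_of is an assumed helper returning the set of covered source lines for one test):

    minimize <- function(tests, coverage_of) {
      tests <- tests[order(file.size(tests))]      # visit smaller tests first
      covered <- character(0)
      kept <- character(0)
      for (t in tests) {
        cov <- coverage_of(t)
        if (length(setdiff(cov, covered)) > 0) {   # keep only tests that add coverage
          kept <- c(kept, t)
          covered <- union(covered, cov)
        }
      }
      kept
    }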

4 EVALUATION

To evaluate Genthat, we were interested in finding out how much we can improve test coverage by extracting tests from documentation and reverse dependencies, and how efficient such an extraction can be. We were also interested in finding what proportion of the function calls can be turned into test cases and how large the resulting test suites would become.

For these experiments, we selected 1,700 packages from CRAN. We picked the 100 most downloaded packages from the RStudio CRAN mirror (using download data for June 2017 from http://cran-logs.rstudio.com), including their dependencies, and then 1,000 randomly selected CRAN packages, again including their dependencies. This added up to some 1,726 packages. The motivation for this choice is to have some well established and popular packages along with a representative sample of CRAN. From the 1,726, some 1,545 ran successfully. The remaining 181 failed either because of a timeout (5 hours during tracing), a runtime error, or a failure to compute code coverage. The packages amounted to 1.7M lines of R (stripped of comments and empty lines). There were 158,208 lines of examples (on average 102 lines per package), 32,524 lines of code extracted from vignettes (on average 21 lines per package) and 163,835 lines of tests (on average 106 lines per package).

The experiments were carried out on two virtual nodes (each with 60GB of RAM, 16 CPUs at 2.2GHz and a 1TB virtual drive) using GNU parallel [17] for parallel execution. We used GNU R version 3.4.3 (released in November 2017) from the Debian distribution.

4.1 Scale

Table 1 presents an overview of the scale of the experiment in terms of the number of function calls and the number of extracted tests. Out of 5.3M function calls, Genthat traced 1.6M unique calls, i.e. calls with distinct arguments and return values. Out of the recorded traces, 93.6% were saved as unit tests. On average, 86.2% of the generated tests were valid and correct. From the total of 1.3M correct tests, 97.9% were redundant, i.e. not increasing code coverage. Finally, some 26,838 tests were retained.

Table 1: Function calls & extracted tests for 1,545 packages (s=standard deviation, m=median). Reproducibility is the ratio between the number of valid and correct tests and the number of traced unique calls.

                            Overall      Average per package
    Total number of calls   5,274,108    3,413 (s=13,375 m=141)
    Traced unique calls     1,615,151    1,045 (s=3,145 m=82)
    Extracted tests         1,512,555    979 (s=3,067 m=68)
    Valid & correct tests   1,303,114    843 (s=2,823 m=50)
    Non-redundant tests     26,838       17.4 (s=33 m=9)
    Reproducibility         0.8          0.75 (s=0.3 m=0.9)

4.2 Coverage

Figure 6 compares the code coverage obtained with the default tests and the coverage obtained using Genthat with code from examples, vignettes and tests. The default tests provide very little coverage for many packages. Genthat is able to increase the coverage from an average of 19% to 53% per package. This is a significant improvement.

Figure 6: Code coverage from package tests and Genthat-extracted unit tests. Each bar shows the ratio of packages that have at least that code coverage. (Chart not reproduced; x-axis: >= code coverage, y-axis: % of packages, series: genthat, default.)
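Per-package coverage numbers of this kind can be obtained with the covr package; the sketch below is ours (the package path is a placeholder), not the scripts used in the paper.

    library(covr)

    # Coverage obtained from the package's own tests only
    cov_tests <- package_coverage("path/to/pkg", type = "tests")
    percent_coverage(cov_tests)

    # Coverage when examples and vignettes are executed as well
    cov_all <- package_coverage("path/to/pkg", type = "all")
    percent_coverage(cov_all)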


4.3 Performance

Table 2 presents the average time of tracing and generating tests for our 1,545 packages (tracing), running the generated tests (testing w. Genthat), and running all of the code that comes with the package (default tests, vignettes and examples); s is the standard deviation and m the median. In our testbed, running 16 packages in parallel on each node, the tracing took 1d 18h, running the Genthat tests 19m, and running the package code 1h 09m.

Table 2: Average running time for 1,545 packages.

    Task                 Average per package
    Tracing              420s (s=1500, m=60)
    Testing w. Genthat   15s (s=29, m=8)
    Testing (default)    28s (s=60, m=10)

Tracing clearly has a significant overhead. The median time is 60 seconds. Given that test generation is a design-time activity, this is likely acceptable. The main portion of the time is spent in running the generated tests, which are numerous: on average, each package yields 1,000 tests. They all have to be run to determine validity and correctness, as well as for test minimization to determine whether they increase code coverage. On the other hand, testing with the minimized tests runs quickly. We are able to process all the packages in 19 minutes. Thanks to test minimization, this is faster than the time it would take to run all of the original examples, tests and vignettes. Concretely, running the Genthat-extracted tests for the 1,545 packages is 1.9 times faster. The resulting coverage is 15% smaller, but what we provide are actual unit tests, while the code in examples and vignettes (and often in R tests as well) is just plain R code with no assertion of expected behavior other than that it runs.

4.4 Accuracy

In terms of accuracy, Genthat managed to recreate 80% of all the unique function calls. Here we discuss the remaining 20%. Problems can be divided into three groups: tracing, generation, and replay. They are summarized in Table 3.

Table 3: Issues in 312,037 failed test extractions.

                                             Overall    %
    Tracing                                  70,020     4.3%
      - skipped traces (size > 512kB)        60,360
      - ... used in an incorrect context     2,150
      - unable to resolve symbol             1,419
      - unclassified                         6,091
    Test generation                          32,576     2.0%
      - environments with cycles             24,200
      - other serialization errors           1,655
      - unclassified                         6,721
    Test replay                              209,441    12.9%
      - incorrect tests (value mismatch)     77,850
      - invalid tests (execute with error)   131,591

The tracing inaccuracies happen while recording function arguments. They account for 4.3% of the failures. Most are due to trace size; we intentionally discard values larger than 512KB in the R binary data format. Our reasoning is that tests which are too large are awkward to work with. The size limit was chosen empirically such that no single package loses more than 10% of its tests. The largest test was 887 MB (on average, skipped tests were 3.5 MB). Among the other tracing inaccuracies are the triple-dots (...) used in an incorrect context and missing symbols. Both problems are related to non-standard evaluation [20], in which function arguments are not meant to be resolved in the usual scope. In most cases, Genthat records arguments correctly, but sometimes (0.2%) it fails to resolve them properly. The remaining problems are difficult to classify; many issues are related to corner cases and issues with Genthat itself. They contribute very little to the overall failure rate.

The test generation inaccuracies are related to environment serialization in the presence of cycles. This happens, for example, if there is a variable in an environment that references the same environment or any of its parents. This is a known issue that shall be fixed in a future version of Genthat. Other serialization problems are related to unsupported values such as weak references or byte code. Similarly to the unclassified tracing errors, the rest of the problems do not have a common cause. Some problems come from the deparse function. Again, they occur very rarely.

The final category is responsible for most of the inaccuracies. These failures are the most difficult to reason about since they require knowledge of the actual package code. However, there are some common causes:

− External pointers (pointers to native code) are not supported, so any functions that use them explicitly or transitively will not work.
− We do not keep any state, so if there is an expected sequencing of function calls Genthat will not work (for example, close must follow open).
− Non-standard evaluation and serialization do not always play well together. Currently, Genthat does not have any heuristics to guess which evaluation mode a function uses, and thus the serialization might fail.
− Out of the 4 popular object systems used in R, we only cover the built-in S3 and S4. Reference classes and R6 (https://cran.r-project.org/web/packages/R6/) are currently not supported.

There is one last source of inaccuracy. The function call tracing happens in the on.exit hook that is injected into method bodies. While this mostly works, the hook can be overwritten, in which case the entire call is missed. To quantify the frequency of such events, we instrumented the code with a counter to count the number of function calls. In the 1,545 packages, the counter registered 5,274,108 function calls, on average 3,413 per package. The on.exit hook registered 245,815 fewer calls (5% less). Currently, the on.exit hook is the only way to inject code into an R function without changing the R interpreter. We have raised the question of how to avoid the hooks being overridden with the Core R team, and a solution will hopefully be included in an upcoming release.
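The overwriting problem follows from R's own semantics for on.exit: unless add = TRUE is given, a later call replaces any previously registered handler. A small illustration (ours):

    f <- function() {
      on.exit(cat("tracer hook\n"))    # handler injected by an instrumentation tool
      on.exit(cat("package hook\n"))   # package code installs its own handler,
                                       # silently replacing the previous one
      42
    }
    f()   # prints only "package hook"; the tracer's handler never runs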

4.5 Reverse Dependencies

One of the ideas we wanted to explore is to use the code found in clients of a package to generate additional unit tests. As computing tests for the transitive closure of a package's reverse dependencies can be quite expensive, we randomly selected 65 packages among all packages that had at least 20 reverse dependencies. For each of these packages, we downloaded all of the packages that depend on it, extracted scripts from those packages' tests, vignettes and examples, and used those scripts to extract test cases. In 42 cases, using reverse dependencies improved code coverage. On average, the coverage increased from 52% to 60%. Figure 7 is a histogram of the cumulative test coverage obtained from only the default tests, the Genthat results for the package only, and the Genthat results including all reverse dependencies. The improvements are significant in some cases.

Figure 7: Code coverage obtained with reverse dependencies. (Per-package bar chart not reproduced; y-axis: code coverage, series: Tests, Vignettes, Reverse deps.)
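The selection of target packages described above can be reproduced from the CRAN metadata; a sketch (ours, not the paper's analysis scripts):

    db <- available.packages()
    revdeps <- tools::package_dependencies(rownames(db), db = db,
                                           which = c("Depends", "Imports", "Suggests"),
                                           reverse = TRUE)
    n_revdeps <- lengths(revdeps)
    candidates <- names(n_revdeps)[n_revdeps >= 20]   # packages with >= 20 reverse deps
    targets <- sample(candidates, 65)                 # 65 randomly chosen targets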

5 DISCUSSION

Following the evaluation, we present the threats to the validity of the selected approach, propose some of its applications, and discuss a possible generalization.

5.1 Threats to Validity

There are certain limitations to our work. Some can be mitigated by more engineering work; some are intrinsic to the approach.

Large tests. Programs in R naturally tend to use large vectors and matrices. Serializing them into source code results in rather large, unreadable chunks of tests which have little value for the reader. On average the generated tests are 18.58 kB (median 490.00 B). The largest test that was generated was 1.55 MB of R code. Genthat can already be tuned to discard traces that exceed a certain size at runtime (cf. Section 4.4). Further, large values can be separated into the .ext file, keeping only a reference in the unit test code.

Non-determinism. If a function call includes non-determinism which is not captured by the value of the calling arguments, the generated tests will most likely fail.

Over-specific testing oracle. The resulting unit tests directly represent the code from which they were extracted. They are only as good as that source code. If a package does not include many runnable artifacts or does not have many clients, Genthat will not do much.

Brittleness. Extracting tests from existing code can in general produce false positives (extracted tests pass while the original code from which they were generated fails) or false negatives (extracted tests fail while the original code runs) [14]. False positives do not occur in Genthat since, if a function call fails, Genthat does not keep the trace (cf. Section 3.2). False negatives, apart from the reasons discussed in Section 4.4, also occur when the tests are run on a different version of a package than the one that was used to extract the test suite. If the code base changes, the extracted tests naturally become out of date. The same happens for manually written tests. The advantage of the automatically extracted tests is that the process can simply be repeated. On the other hand, Genthat does not currently have any support for simply regenerating tests that are out of date.

Tracing time. Tracing comes with some overhead. On average, tracing is 21.5 (s=97.1 m=4.9) times slower than simply running the package code artifacts. For most packages, the running time is reasonably short; 75% of them run in under 4 minutes. But there are packages for which the overhead makes the run very long, with some exceeding the set timeout of 5 hours. Commonly, this happens in packages that contain iterative algorithms running on sufficiently large matrices. Since most of these calls come from very similar execution paths, it should be possible to mitigate this problem by using stochastic tracing: instead of tracing each and every function call, the tracing would be guarded by some probability with a long-tail distribution.

5.2 Utility

There are multiple applications for the extracted tests. First, they help regression testing, especially when it is possible to extract code from package clients. Genthat can also help to bootstrap manual tests by first generating an initial test suite which package authors then update by hand. This approach can also be useful for Core R developers and package repositories where packages need to be tested often (for example, with every change in the R interpreter). Since the extracted tests run faster than the package artifacts, it might be beneficial to use them. They might also help to narrow down the root cause of a failure since they are actual unit tests.

5.3 Generalization

Test extraction from runtime program traces is naturally going to work better in languages and programming models that limit mutable global state and where arguments to functions tend to be self-contained. We would expect scientific programming languages like MATLAB or Julia, and functional languages like Haskell or OCaml, to behave in similar ways to R. Imperative languages are likely to be less conducive to this approach. Whether our approach would yield reasonable results in C or C++ is an open question.

6 CONCLUSION

We have presented the design, implementation and evaluation of Genthat, a tool for the automated extraction of unit tests for the R language. Our evaluation of this tool on a large corpus of packages suggests that it can significantly improve coverage of the code being tested.

In a way our results are surprising, as the limitations of our tool are quite severe. Genthat does not keep any global or external state, so any test that changes values in a lexically scoped environment is bound to fail. Any code that manipulates pointers to C data structures is likely to fail. And lastly, any code that is not deterministic will surely fail. Our intuition is that Genthat works well because of the functional nature of R: mutation is rarely used, and functions access the values passed in and produce new ones.
For future work, we aim to explore techniques geared towards finding new bugs. This can be achieved by fuzzing the inputs to a function. We will also look for changes that reduce the inaccuracies of Genthat and speed up the process of handling reverse dependencies.

ACKNOWLEDGMENT

The authors thank Petr Maj, Roman Tsegelskyi, Filippo Ghibellini, Michal Vácha and Lei Zhao for their work on previous versions of Genthat. We also thank Tomáš Kalibera from the R Core Team for his feedback. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 695412) and the NSF Darwin Award. The experiment presented in this paper was conducted using GNU parallel [18].

REFERENCES
[1] Chandrasekhar Boyapati, Sarfraz Khurshid, and Darko Marinov. 2002. Korat: Automated Testing Based on Java Predicates. In International Symposium on Software Testing and Analysis (ISSTA). https://doi.org/10.1145/566171.566191
[2] Koen Claessen and John Hughes. 2000. QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. In International Conference on Functional Programming (ICFP). https://doi.org/10.1145/351240.351266
[3] Christoph Csallner and Yannis Smaragdakis. 2004. JCrasher: An Automatic Robustness Tester for Java. Softw. Pract. Exper. 34, 11 (2004). https://doi.org/10.1002/spe.602
[4] Sebastian G. Elbaum, Hui Nee Chin, Matthew B. Dwyer, and Jonathan Dokulil. 2006. Carving Differential Unit Test Cases from System Test Cases. In International Symposium on Foundations of Software Engineering (FSE). https://doi.org/10.1145/1181775.1181806
[5] Michael Ernst, Sai Zhang, David Saff, and Yingyi Bu. 2011. Combined Static and Dynamic Automated Test Generation. In International Symposium on Software Testing and Analysis (ISSTA). https://doi.org/10.1145/2001420.2001463
[6] Hojun Jaygarl, Sunghun Kim, Tao Xie, and Carl Chang. 2010. OCAT: Object Capture-Based Automated Testing. In International Symposium on Software Testing and Analysis (ISSTA). https://doi.org/10.1145/1831708.1831729
[7] Shrinivas Joshi and Alessandro Orso. 2007. SCARPE: A Technique and Tool for Selective Capture and Replay of Program Executions. In International Conference on Software Maintenance (ICSM). https://doi.org/10.1109/ICSM.2007.4362636
[8] Uwe Ligges. [n. d.]. 20 Years of CRAN (video on Channel 9). Keynote at UseR!
[9] Floréal Morandat, Brandon Hill, Leo Osvald, and Jan Vitek. 2012. Evaluating the Design of the R Language: Objects and Functions for Data Analysis. In European Conference on Object-Oriented Programming (ECOOP). https://doi.org/10.1007/978-3-642-31057-7_6
[10] R Core Team. 2017. R Internals 3.4.3.
[11] R Core Team. 2017. R Language 3.4.3.
[12] R Core Team. 2017. Writing R Extensions 3.4.3.
[13] Gregor Richards, Andreas Gal, Brendan Eich, and Jan Vitek. 2011. Automated Construction of JavaScript Benchmarks. In Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA). https://doi.org/10.1145/2048066.2048119
[14] Thomas Rooney. 2015. Automated Test Generation through Runtime Execution Trace Analysis. Master's thesis. Imperial College London.
[15] David Saff, Shay Artzi, Jeff H. Perkins, and Michael D. Ernst. 2005. Automatic Test Factoring for Java. In International Conference on Automated Software Engineering (ASE). https://doi.org/10.1145/1101908.1101927
[16] Koushik Sen, Darko Marinov, and Gul Agha. 2005. CUTE: A Concolic Unit Testing Engine for C. In European Software Engineering Conference and International Symposium on Foundations of Software Engineering (ESEC/FSE). https://doi.org/10.1145/1081706.1081750
[17] O. Tange. 2011. GNU Parallel—The Command-Line Power Tool. ;login: The USENIX Magazine 42, 47 (2011).
[18] O. Tange. 2018. GNU Parallel 2018. https://doi.org/10.5281/zenodo.1146014
[19] Suresh Thummalapenta, Tao Xie, Nikolai Tillmann, Jonathan de Halleux, and Wolfram Schulte. 2009. MSeqGen: Object-Oriented Unit-Test Generation via Mining Source Code. In European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE). https://doi.org/10.1145/1595696.1595725
[20] Hadley Wickham. 2014. Advanced R (1st ed.). Chapman and Hall/CRC.
[21] Shin Yoo and Mark Harman. 2012. Regression Testing Minimization, Selection and Prioritization: A Survey. Software Testing, Verification and Reliability 22, 2 (2012), 67–120. https://doi.org/10.1002/stv.430