Efficient Dependency Detection for Safe Java Test Acceleration

Efficient Dependency Detection for Safe Java Test Acceleration Jonathan Bell, Gail Kaiser Eric Melski, Mohan Dattatreya Columbia University Electric Cloud, Inc 500 West 120th St 35 S Market Street New York, NY USA San Jose, CA USA {jbell,kaiser}@cs.columbia.edu {ericm,mohan}@electric-cloud.com ABSTRACT Our study of popular open source Java programs echoes Slow builds remain a plague for software developers. The fre- these results, finding projects that take hours to build, with quency with which code can be built (compiled, tested and most of that time spent testing. Even in cases of projects packaged) directly impacts the productivity of developers: that build in a more manageable amount of time | for ex- longer build times mean a longer wait before determining if ample, five to ten minutes | faster builds can result in a a change to the application being built was successful. We significant increase in productivity due to less lag-time for have discovered that in the case of some languages, such as test results. Java, the majority of build time is spent running tests, where To make testing faster, developers may turn to techniques dependencies between individual tests are complicated to such as Test Suite Minimization (which reduce the size of discover, making many existing test acceleration techniques a test suite, for instance by removing tests that duplicate unsound to deploy in practice. Without knowledge of which others) [11, 12,23,24,27,28,37,39], Test Suite Prioritization tests are dependent on others, we cannot safely parallelize (which reorders tests to run those most relevant to recent the execution of the tests, nor can we perform incremen- changes first) [14, 15, 35, 36, 38], or Test Selection [17, 25, 32] tal testing (i.e., execute only a subset of an application's (which selects tests to execute that are impacted by recent tests for each build). The previous techniques for detecting changes). Alternatively, given a sufficient quantity of cheap these dependencies did not scale to large test suites: given a computational resources (e.g. Amazon's EC2), we might test suite that normally ran in two hours, the best-case run- hope that we could reduce the amount of wall time needed ning scenario for the previous tool would have taken over to run a given test suite even further by parallelizing it. 422 CPU days to find dependencies between all test meth- All of these techniques involve executing tests out of or- ods (and would not soundly find all dependencies) | on the der (compared to their typical execution | which may be same project the exhaustive technique (to find all depen- random but is almost always alphabetically), making the dencies) would have taken over 10300 years. We present a assumption that individual test cases are independent. If novel approach to detecting all dependencies between test some test case t1 writes to some persistent state, and t2 de- cases in large projects that can enable safe exploitation of pends on that state to execute properly, we would be unable parallelism and test selection with a modest analysis cost. to safely apply previous work in test parallelization, selection, minimization, or prioritization without knowledge of Categories and Subject Descriptors this dependency. Previous work by Zhang et al. has found that these dependencies often come as a surprise and can D.2.5 [Software Engineering]: Testing and Debugging cause unpredictable results when using common test priori- Keywords tization algorithms [40]. Test dependence, detection algorithms, empirical studies This assumption is part of the controlled regression testing assumption: given a program P and new version P 0, when 1 Introduction P 0 is tested with test case t, all factors that may influence 0 Slow builds remain a hinderance to continuous integration the outcome of this test (except for the modified code in P ) and deployment processes, with the majority of building remain constant [34]. This assumption is key to maintaining time often spent in the testing phase. Our industry part- the soundness of techniques that reorder or remove tests ners confirm previous results reported in literature [35]: test from a suite. In the case of test dependence, we specifically suites frequently take several hours to run | often over a day assume that by executing only some tests, or executing them | making it hard to run them with the desired frequency. in a different order, we are not effecting their outcome (i.e., that they are independent). One simple approach to accelerating these test suites is to ignore these dependencies, or hope that developers specify Permission to make digital or hard copies of part or all of this work for them manually. However, previous work has shown that personal or classroom use is granted without fee provided that copies are inadvertently dependent tests exist in real projects, can take not made or distributed for profit or commercial advantage and that copies significant time to identify, and pose a threat to test suite bear this notice and the full citation on the first page. Copyrights for third- correctness when applying test acceleration techniques [29, party components of this work must be honored. For all other uses, contact 40]. Zhang et al. show that dependent tests are a serious the Owner/Author. Copyright is held by the owner/author(s). ESEC/FSE ’15, August 31 - September 04, 2015, Bergamo, Italy problem, finding in a study of five open source applications ACM 978-1-4503-3675-8/15/08. 96 tests that depend on other tests [40]. In our own study we nique can allow for sound test acceleration. While we im- found many large test suites in popular open source software plement ElectricTest in Java, we believe that we present a do not isolate their tests, and hence, may potentially have sufficiently general approach that should be applicable to dependencies. other memory-managed languages. Our new approach and tool, ElectricTest, detects dependencies between test cases in both small and large, real- 2 Motivation world test suites.ElectricTest monitors test execution, de- To motivate our work, we set out to answer three motivating tecting dependencies between tests, adding on average a 20x questions to ground our approach: slowdown to test execution when soundly detecting depen- MQ1: For those projects that take a long time to build, dencies. In comparison, we found that the previous state of what component of the build dominates that time? the art approach applied to these same projects showed an MQ2: Are existing test acceleration approaches safe to average slowdown of 2,276x (using an unsound heuristic not apply to real world, long running test suites? guaranteed to find all dependencies), often requiring more MQ3: Can the state of the art in test dependency detec- than 10308 times the amount of time needed to run the test tion be practically used to safely apply test accelera- suite normally in order to exhaustively find all dependencies. tion to these long running test suites? Moreover, the existing technique does not point developers 2.1 A Study of Java Build Times to the specific code causing dependencies, making inspection In our previous work [8], we studied 20 open source Java and analysis of these dependencies costly. applications to determine the relative amount of build time With ElectricTest, it becomes feasible to soundly perform spent testing, finding testing to consume on average 78% of test parallelization and selection on large test suites. Rather build time. The longest of these projects took approximately than detect manifest dependencies (i.e., a dependency that 40 minutes to build, while the shortest completed in under changes the outcome of a test case, the definition in previous one minute. Given a desire to target projects with very long work by Zhang et al, DTDetector [40]), ElectricTest detects build times, we wanted to make sure that those very long simple data dependencies and anti-dependencies (i.e., read- running builds were also spending most of their time in tests. over-write and write-over-read). Since not all data depen- If we are sure that most of the time spent building these dencies will result in manifest dependencies, our approach is projects is in the testing phase, then we can be confident inherently less precise than DTDetector at reporting \true" that a reduction in testing time will have a strong impact in dependencies between tests, though it will never miss a de- reducing overall build time. pendency that DTDetector would have detected. However, For this study, we downloaded the 1,966 largest and most in the case of long running test suites (e.g. over one hour), popular Java projects from the open source repository site, the DTDetector approach is not feasible. On popular open GitHub (those 1,000 with the most forks and stars overall, source software, we found that the number and type of de- and those 1,000 with the most forks over 300 MB, as of pendencies reported by ElectricTest allow for up to 16X December 23rd, 2014). From these projects, we searched for speedups in test parallelization. only those with tests (i.e., had files that had the word \test" Our key insight is that, for memory-managed languages, in their name), bringing our list to 921 projects. we can efficiently detect data dependencies between tests by Next, we looked at the different build management sys- leveraging existing efficient heap traversal mechanisms like tems used by each project: there are several popular build those used by garbage collectors, combined with filesystem systems for Java, such as ant, maven, and gradle. To mea- and network monitoring. For ElectricTest, test T depends 2 sure the per-step timing of building each of these projects, on test T if T reads some data that was last written by T .

Efficient Dependency Detection for Safe Java Test Acceleration

Java™ Technology Test Suite Development Guide

Corner Cases and Possible Improvements of Automated Testing

Javatest Harness Architect’S Guide, Javatest Harness 4.4.1 for the Java Platform E20663-02

Java™ Technology Compatibility Kit User's Guide Template

2007 Javaonesm Conference Word “BENEFIT” Is in Green Instead of Orange

Joseph D. Darcy Oracle #Include "Std Orcl Disclaimer.H"

Javatest™ Harness 4.5

Making Software More Reliable by Uncovering Hidden Dependencies

Javatest™ Harness 4.3

Javatest™ Harness (The Harness) to Run Tests of the Test Suite, Browse Results, and Write Reports Test Results

Automatizzazione E Controllo Delle Soluzioni Per I Test Dello Strumento TCK Di JSR331

Java Technology Compatibility Kit (TCK) Project Planning And