Programs That Test Themselves

COVER FEATURE PROGRAMS THAT TEST THEMSELVES Bertrand Meyer, ETH Zurich and Eiffel Software Arno Fiva, Ilinca Ciupa, Andreas Leitner, and Yi Wei, ETH Zurich Emmanuel Stapf, Eiffel Software The AutoTest framework automates the and gauges that perform continuous testing and gather software testing process by relying on data for maintenance. programs that contain the instruments of While software does not physically degrade during op- their own verification, in the form of con- eration, its development requires extensive testing (and tract-oriented specifications of classes and other forms of verification); yet software design usually pays little attention to testing needs. It is as if we had not their individual routines. learned the lessons of other industries: Software construc- tion and software verification are essentially separate activities, each conducted without much consideration of odern engineering products—from planes, the other’s needs. A consequence is that testing, in spite of cars, and industrial plants down to refrig- improved tools, remains a labor-intensive activity. erators and coffee machines—routinely test themselves while they operate. The AUTOTEST goal is to detect possible deficiencies and AutoTest is a collection of tools that automate the Mto avoid incidents by warning the users of needed main- testing process by relying on programs that contain the tenance actions. This self-testing capability is an integral instruments of their own verification, in the form of part of the design of such artifacts. contracts—specifications of classes and their individual The lesson that their builders have learned is to design routines (methods). The three main components of Auto- for testability. This concept was not always understood: Test address complementary aspects: With cars, for example, we used to have no clue (save for the oil gauge) that major mechanical trouble might be im- • Test Generation: automatically creates and runs test minent; if we wanted to know more, we would take our cases, without any human input such as manually car to a mechanic who would check every component prepared test cases and test oracles. from scratch, not knowing what actually happened during • Test Extraction: automatically produces test cases operation. Today’s cars, in contrast, are filled with sensors from execution failures. The observation behind Test 46 COMPUTER Published by the IEEE Computer Society 0018-9162/09/$26.00 © 2009 IEEE Extraction is that some of the most important test in later versions, indicating that the software has partly cases are not devised as such: They occur when a “regressed.” A project should retain any test that failed developer tries the program informally during de- at any stage of its history, then passed after the fault was velopment, but then it’s execution fails. The failure is corrected; test campaigns should run all such tests to spot interesting, in particular for future regression testing, cases of regression. but usually it is not remembered: The developer fixes Automated tools should provide resilience. A large test the problem and moves on. From such failures, Test suite is likely to contain some test cases that, in a particular Extraction automatically creates test cases, which can execution, crash the program. Resilience means that the be replayed in subsequent test campaigns. process may continue anyway with the remaining cases. • Integration of Manual Tests: supports the development One of the most tedious aspects of testing is test case and management of manually produced tests. Unlike generation. With modern computers we can run very Test Generation and Test Extraction, this functionality large numbers of test cases. Usually, developers or testers relies on state-of-the-art techniques and includes no have to devise them; this approach, limited by people’s major innovation, but it ensures a smooth interaction time, does not scale up. The AutoTest tools complement of the automatic mechanisms with existing practices such manual test cases with automatic tests exercising by ensuring all tests are managed in the same way the software with values generated by algorithms. Object- regardless of their origin—generated, extracted, or oriented programming increases the difficulty because it manual. requires not only elementary values such as integers but also objects. These mechanisms, initially developed for research purposes at ETH Zurich, have now been integrated into the EiffelStudio environment and are available both as an AutoTest helps provoke failures open source download (http://eiffelstudio.origo.ethz.ch) and manage information about the and commercially. Research continues on the underlying corresponding faults. theory and methods (http://se.ethz.ch/research/autotest). Our working definition of testing focuses on one essen- tial aspect: To test a program is to try to make it fail.1 Other definitions include more lofty goals, such as “provid[ing] Test oracles represent another challenge. A test run is information about the quality of the product or service” only useful if we know whether it passed or failed; an oracle (http://en.wikipedia.org/wiki/Software_testing). But in is a mechanism to determine this. Here too a manual pro- practice, the crucial task is to uncover failures of execu- cess does not scale up. Approaches such as JUnit include tion, which in IEEE-standard terminology2 reflect faults oracles in test cases through such instructions as “assert in the program, themselves the result of mistakes in the (success_criterion),” where “assert” is a general mechanism developer’s thinking. AutoTest helps provoke failures and that reports failure if the success_criterion does not hold. manage information about the corresponding faults. This automates the application of oracles, but not their preparation: The tester must still devise an assert for every ‘AutoMatED testING’ test. AutoTest’s approach removes this requirement by rely- “Automated testing” is a widely used phrase. To under- ing on contracts already present in the code. stand what it entails, it is necessary to distinguish several Another candidate for automation is minimization. It is increasingly ambitious levels of automation. desirable to retain and replay any test that ever failed. The What is best automated today is test execution. In a proj- failure may, however, have happened after a long execu- ect that has generated thousands of test cases, running tion exercising many instructions that are irrelevant to the them manually would be tedious, especially as testing failures. Retaining them would make regression testing too campaigns occur repeatedly—for example, it is customary slow. Minimization means replacing a test case, whenever to run extensive tests before every release. Traditionally, possible, with a simplified one producing a failure that testers wrote scripts to run the tests. The novelty is the evidences the same fault. spread of frameworks such as JUnit (www.junit.org) that Commonly used frameworks mostly address the first avoid project-specific scripts. This widely influential de- three goals: test execution, regression testing, and re- velopment has markedly improved testing practice, but it silience. They do not address the most labor-intensive only automates a specific task. tasks: preparing test cases, possibly in a minimized form, A related goal, also addressed by some of today’s tools, and interpreting test results. Without progress on these is regression testing. It is a common phenomenon of soft- issues, testing confronts a paradox: While the growth of ware development that some corrected faults reappear computing power should enable us to perform ever more SEPTEMBER 2009 47 cover FEATURE DESIGN BY CONTRACT Contract” sidebar describes the use of contracts in more detail. In the traditional Eiffel process, developers write pro- esign by Contract1 is a mechanism pioneered by Eiffel that D characterizes every software element by answering three grams annotated with contracts, then manually run these questions: programs, relying on the contracts to check the executions’ correctness. AutoTest’s Test Generation component • What does it expect? adds many more such executions by generating test cases • What does it guarantee? automatically. • What does it maintain? Execution will, on entry to a routine r, evaluate r’s precondition and the class invariant; on exit, it evaluates r’s Answers take the form of preconditions, postconditions, and invariants. For example, starting a car has the precondition that postcondition and the invariant. For correct software, such the ignition is turned on and the postcondition that the engine is evaluations always yield true, with no other consequence; running. The invariant, applying to all operations of the class but an evaluation to false, known as a contract violation, CAR, includes such properties as “dashboard controls are illumi- signals a flaw:3 nated if and only if ignition is on.” With Design by Contract, such properties are not expressed in separate requirements or design documents but become part of • A precondition violation signals a possible fault in the the software; languages such as Eiffel and Spec#, and language client (the routine that called r). extensions such as JML, include syntax—keywords such as • A postcondition or invariant violation signals a pos- require, ensure, and invariant—to state contracts. sible fault in the supplier (r itself). Applications cover many software tasks: analysis, to make sure requirements are precise

Programs That Test Themselves

From Eiffel to the Java Virtual Machine

Eiffelstudio: a Guided Tour

Spest - a Tool for Specification-Based Testing

Automatic Verification of Eiffel Programs

List of Compilers 1 List of Compilers

PART VI: Appendices

Invitation to Eiffel (Bertrand Meyer)

Ada-Europe 2010 That Will Take Place June 2010 in Valencia, Spain

Eiffel Analysis, Design and Programming Language ECMA-367

Advancing Compilers

Eiffel: Analysis, Design and Programming Language ECMA-367

Data Persistence in Eiffel