Fully Automated Testing of the Linux Kernel
Total Page:16
File Type:pdf, Size:1020Kb
Fully Automated Testing of the Linux Kernel Martin Bligh Andy P. Whitcroft Google Inc. IBM Corp. [email protected] [email protected] Abstract formance regressions and trends, adding sta- tistical analysis. A broad spectrum of tests are necessary—boot testing, regression, func- Some changes in the 2.6 development process tion, performance, and stress testing; from disk have made fully automated testing vital to the intensive to compute intensive to network in- ongoing stability of Linux R . The pace of de- tensive loads. A fully automated test harness velopment is constantly increasing, with a rate also empowers other other techniques that are of change that dwarfs most projects. The lack impractical when testing manually, in order of a separate 2.7 development kernel means that to make debugging and problem identification we are feeding change more quickly and di- easier. These include automated binary chop rectly into the main stable tree. Moreover, the search amongst thousands of patches to weed breadth of hardware types that people are run- out dysfunctional changes. ning Linux on is staggering. Therefore it is vi- tal that we catch at least a subset of introduced In order to run all of these tests, and collate the bugs earlier on in the development cycle, and results from multiple contributors, we need an keep up the quality of the 2.6 kernel tree. open-source client test harness to enable shar- Given a fully automated test system, we can ing of tests. We also need a consistent output run a broad spectrum of tests with high fre- format in order to allow the results to be col- quency, and find problems soon after they are lated, analysed and fed back to the community introduced; this means that the issue is still effectively, and we need the ability to “pass” fresh in the developers mind, and the offending the reproduction of issues from test harness to patch is much more easily removed (not buried the developer. This paper will describe the re- under thousands of dependant changes). This quirements for such a test client, and the new paper will present an overview of the current open-source test harness, Autotest, that we be- early testing publication system used on the lieve will address these requirements. http://test.kernel.org website. We then use our experiences with that system to de- fine requirements for a second generation fully automated testing system. 1 Introduction Such a system will allow us to compile hun- dreds of different configuration files on ev- It is critical for any project to maintain a high ery release, cross-compiling for multiple dif- level of software quality, and consistent inter- ferent architectures. We can also identify per- faces to other software that it uses or uses it. 114 • Fully Automated Testing of the Linux Kernel There are several methods for increasing qual- • it prevents replication of the bad code into ity, but none of these works in isolation, we other code bases, need a combination of: • fewer users are exposed to the bug, • skilled developers carefully developing • the code is still fresh in the authors mind, high quality code, • the change is less likely to interact with • static code analysis, subsequent changes, and • regular and rigorous code review, • the code is easy to remove should that be required. • functional tests for new features, • regression testing, In a perfect world all contributions would be widely tested before being applied; however, as • performance testing, and most developers do not have access to a large range of hardware this is impractical. More rea- • stress testing. sonably we want to ensure that any code change is tested before being introduced into the main- Whilst testing will never catch all bugs, it will line tree, and fixed or removed before most peo- improve the overall quality of the finished prod- ple will ever see it. In the case of Linux, An- uct. Improved code quality results in a better drew Morton’s -mm tree (the de facto develop- experience not only for users, but also for de- ment tree) and other subsystem specific trees velopers, allowing them to focus on their own are good testing grounds for this purpose. code. Even simple compile errors hinder devel- Test early, test often! opers. The open source development model and Linux In this paper we will look at the problem of au- in particular introduces some particular chal- tomated testing, the current state of it, and our lenges. Open-source projects generally suffer views for its future. Then we will take a case from the lack of a mandate to test submissions study of the test.kernel.org automated test sys- and the fact that there is no easy funding model tem. We will examine a key test component, for regular testing. Linux is particularly hard the client harness, in more detail, and describe hit as it has a constantly high rate of change, the Autotest test harness project. Finally we compounded with the staggering diversity of will conclude with our vision of the future and the hardware on which it runs. It is completely a summary. infeasible to do this kind of testing without ex- tensive automation. 2 Automated Testing There is hope; machine-power is significantly cheaper than man-power in the general case. Given a large quantity of testers with diverse It is obvious that testing is critical, what is per- hardware it should be possible to cover a use- haps not so obvious is the utility of regular test- ful subset of the possible combinations. Linux ing at all stages of development. It is important as a project has plenty of people and hardware; to catch bugs as soon as possible after they are what is needed is a framework to coordinate created as: this effort. 2006 Linux Symposium, Volume One • 115 Patches Patches Patches Patches range of systems. As a result, there is an op- portunity for others to run a variety of tests on incoming changes before they are widely dis- Test tree Test tree Test tree tributed. Where problems are identified and flagged, the community has been effective at Mainline Prerelease getting the change rejected or corrected. Mainline By making it easier to test code, we can en- courage developers to run the tests before ever submitting the patch; currently such early test- Distro Distro Distro Distro ing is often not extensive or rigorous, where it is performed at all. Much developer effort is Users Users Users Users being wasted on bugs that are found later in the cycle when it is significantly less efficient to fix them. Figure 1: Linux Kernel Change Flow 2.1 The Testing Problem 2.2 The State of the Union As we can see from the diagram in figure 1 It is clear that a significant amount of testing Linux’s development model forms an hour- resource is being applied by a variety of par- glass starting highly distributed, with contri- ties, however most of the current testing effort butions being concentrated in maintainer trees goes on after the code has forked from main- before merging into the development releases line. The distribution vendors test the code that (the -mm tree) and then into mainline itself. they integrate into their releases, hardware ven- It is vital to catch problems here in the neck dors are testing alpha or beta releases of those of the hourglass, before they spread out to the distros with their hardware. Independent Soft- distros—even once a contribution hits mainline ware Vendors (ISVs) are often even later in the it is has not yet reached the general user popu- cycle, first testing beta or even after distro re- lation, most of whom are running distro kernels lease. Whilst integration testing is always valu- which often lag mainline by many months. able, this is far too late to be doing primary test- ing, and makes it extremely difficult and ineffi- In the Linux development model, each actual cient to fix problems that are found. Moreover, change is usually small and attribution for each neither the tests that are run, nor the results of change is known making it easy to track the au- this testing are easily shared and communicated thor once a problem is identified. It is clear that to the wider community. the earlier in the process we can identify there is a problem, the less the impact the change will There is currently a large delay between a have, and the more targeted we can be in report- mainline kernel releasing and that kernel being ing and fixing the problem. accepted and released by the distros, embed- ded product companies and other derivatives Whilst contributing untested code is discour- of Linux. If we can improve the code qual- aged we cannot expect lone developers to be ity of the mainline tree by putting more effort able to do much more than basic functional test- into testing mainline earlier, it seems reason- ing, they are unlikely to have access to a wide able to assume that those “customers” of Linux 116 • Fully Automated Testing of the Linux Kernel would update from the mainline tree more of- it against. Moreover, performance tests are not ten. This will result in less time being wasted 100% consistent, so taking a single sample is porting changes backwards and forwards be- not sufficient, we need to capture a number of tween releases, and a more efficient and tightly runs and do simple statistical analysis on the integrated Linux community. results in order to determine if any differences are statistically significant or not.