Improving Test Coverage of GNU Coreutils
MASARYK UNIVERSITY
FACULTY OF INFORMATICS

Improving test coverage of GNU coreutils

BACHELOR THESIS

Andrej Antaš

Brno, 2012

Declaration

I hereby declare that this paper is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during the elaboration of this work are properly cited and listed in complete reference to the due source.

Andrej Antaš

Advisor: RNDr. Petr Ročkai

Acknowledgement

I would like to thank my supervisor and Ondřej Vašík, who had the patience to walk me through the process of creating this work. My thanks also belong to my beloved family, friends and my loving girlfriend, who all stood by my side in need and never let me down.

Abstract

The aim of this bachelor thesis is to improve the test coverage of the GNU coreutils project in the Fedora Linux distribution. The output of my work is a patch file with changes made to the tests, together with bugs reported in Red Hat Bugzilla that describe the problems discovered by the new and altered tests. These changes also improved the robustness of the coreutils source code and brought the test coverage of these utilities close to the level of the upstream coverage.

Keywords

coreutils, testing, coverage, gcov, fedora, patch, multi-byte

Contents

1 Introduction
2 GNU Core Utilities
  2.1 Brief history
  2.2 GNU File Utilities
  2.3 GNU Shell utilities
  2.4 GNU Text utilities
3 Testing
  3.1 Black box testing
  3.2 White box testing
  3.3 Coreutils test suite
4 Linux
  4.1 Operating system
  4.2 RPM Packaging system
  4.3 Fedora Project
  4.4 Upstream
  4.5 Downstream
5 Test coverage
  5.1 Statement Coverage
  5.2 Method Coverage
  5.3 Condition Coverage
  5.4 Branch Coverage
  5.5 Path coverage
6 Coverage tools
  6.1 Gcov
    6.1.1 Data format
    6.1.2 Usage
    6.1.3 Why gcov?
  6.2 Other used tools
    6.2.1 LCOV
    6.2.2 Genhtml
    6.2.3 Git
  6.3 Other coverage tools
    6.3.1 Trucov
    6.3.2 Testwell CTC++
    6.3.3 Bullseye Coverage
    6.3.4 KLEE
7 Test coverage retrieval
  7.1 Prerequisites
  7.2 Upstream test coverage
  7.3 Downstream test coverage
8 Comparison
  8.1 Cause of differences
    8.1.1 Patches
    8.1.2 coreutils-i18n.patch
    8.1.3 Affected utilities
  8.2 Multi-byte test patch
    8.2.1 Making patch
    8.2.2 Differences after multi-byte test patch
9 Conclusion
10 Attachment

1 Introduction

Many developers and companies use UNIX-based operating systems for development and a range of other activities. I want to talk about Linux, which belongs to this group and is distributed under the GNU General Public License [3]. This basically means that it is free to use, even for companies, and that it comes with its source code. It is therefore easy to alter in any way, so that the system works for us rather than us working for the system.

Linux as an operating system is built mainly on packaging, which means that every single program or utility in the system is distributed as a package that can be downloaded from official and unofficial repositories.
These are easily reachable places on the internet where packages are stored. This approach gives the user more opportunity to customize the system.

There are obviously some packages that are common to all systems, such as the kernel (the core of the system) and coreutils. These core components include utilities for working correctly with the hardware and provide the basic software needed to use the computer.

As I mentioned, one of these basic packages is the coreutils package, which is the one I focused on in my bachelor thesis. Coreutils is a collection of utilities that provide the basic capabilities for working with the file system, text and the shell. This package is part of every Linux distribution, with some small divergences.

The availability of the source code means that a user who finds a mistake in the software he is using can easily download the package with the source code and, if he is able to, correct the mistake and submit the fix to the maintainers of the package, who can review the change and possibly include it in the next update. Distribution packages therefore receive many updates, but one problem arises with them.

The problem is that the policy for submitting new changes downstream is not as strict as the upstream one and does not care as much about testing the changes (definitions of terms unknown to the reader can be found in chapter 4). Exactly this situation occurred in the coreutils package. A few bigger updates were introduced that dragged the test coverage of the source code sharply down and also added potential problems. This is what led me to the project, because what we as developers always want is to avoid situations in which harmful code is pushed into production.

Mistakes in source code are also called "bugs", and the best example that comes to my mind, and probably the most expensive one in human history, is the explosion of the European rocket Ariane 5.
The explosion occurred forty seconds after the launch and, most interestingly, was caused by a single failing test that was said to be harmless. I call it the most expensive one because the damage was quantified at 370 million US dollars.[2]

I mention this particular situation because it is the most extreme demonstration of what a bug can cause, and also because the exact mistake was a software exception raised during data conversion from a 64-bit floating point value to a 16-bit signed integer value.[2] This is not quite the same as untested changes in the coreutils project, but it strongly resembles the situation, because my focus is the i18n patch, which added multi-byte functionality to certain utilities in the package.

The thesis starts with a brief introduction to the coreutils project in chapter two. After this first acquaintance with the project, chapter three goes deeper into testing and its principles and also gives information about the tests in the coreutils project.

The fourth chapter is about the Linux operating system. It covers the basics we need, explains the terms used and narrows our view to the Fedora Linux distribution. This chapter, unrelated to testing but still essential, is followed by the fifth chapter, which deals with test coverage and explains the most used and most important kinds of coverage.

The coverage part is logically followed by the tools that can retrieve these statistics for us. This is the sixth chapter, which is divided into a section describing the tools I actually used while working on the thesis and a section with a selection of alternatives that can be used for coverage retrieval.

The seventh chapter describes the preparations before coverage retrieval and then my approach to retrieving the coverage from the upstream source code and from the downstream one, and it also shows the results of these procedures.
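To make the Ariane 5 conversion failure mentioned earlier in this chapter more concrete, the following is a minimal sketch in Python. It is not the original flight software; the function names and the sample values are hypothetical and were chosen only for illustration. It contrasts a conversion that checks the 16-bit range and raises an exception with an unchecked one that silently keeps only the low 16 bits.

```python
# Illustrative sketch only: function names and values are assumptions,
# not taken from the Ariane 5 software or from coreutils.

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value: float) -> int:
    """Convert a float to a 16-bit signed integer, refusing out-of-range input."""
    truncated = int(value)  # int() truncates toward zero
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return truncated


def to_int16_wrapping(value: float) -> int:
    """Mimic an unchecked conversion that keeps only the low 16 bits."""
    low_bits = int(value) & 0xFFFF
    # Reinterpret the 16-bit pattern as a two's complement signed number.
    return low_bits - 0x10000 if low_bits >= 0x8000 else low_bits
```

For example, to_int16_checked(40000.0) raises OverflowError, while to_int16_wrapping(40000.0) returns -25536, a plausible-looking but wrong value. Either outcome is harmful if no test exercises the out-of-range case, which is the kind of gap that test coverage measurement is meant to expose.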
In the final, eighth chapter, the main cause of the differences in coverage is described, and the coverage of the most affected utilities is shown and discussed. The creation of the patch that raises the test coverage of the distribution package follows, and finally the bugs uncovered by the new test changes are listed, together with a comparison with the upstream coverage of these utilities.

The test changes were implemented in the Perl programming language; the utilities used for coverage retrieval are listed and described in chapter six, as mentioned before. The output of my bachelor thesis is a patch with all test changes, and I also reported the bugs found after running these altered tests.

2 GNU Core Utilities

The GNU Core Utilities are the basic file, shell and text manipulation utilities of the GNU operating system. These are the core utilities which are expected to exist on every UNIX-based operating system.[4]

2.1 Brief history

The coreutils project originates from UNIX utilities. Its development started in the early nineties and it has remained active until now, which is a very good sign for an open source project. Over the whole development process, more than a hundred programmers have contributed to the project, 28 of whom are still active today.

The coreutils package is the combination of and replacement for the fileutils, sh-utils and textutils packages. It began as the union of these revisions: fileutils-4.1.11, textutils-2.1, sh-utils-2.0.15 in August 2002.[10] The first major stable release (coreutils-5.0) was published in April 2003, and the project has since progressed to version 8.15.

The maintainers of the project are Jim Meyering, Pádraig Brady, Eric Blake and Paul Eggert, with Jim Meyering having contributed more than 87% of the total of 110 000 lines of code now contained in the project.[5]

2.2 GNU File Utilities

A collection of basic file manipulation utilities such as cp, chmod, mkdir, mv, rm, touch and others.
Along with two other packages of a similar basic purpose, it was combined into coreutils, and since then only the GNU Core Utilities project has been maintained.[6]