Partha Pratim Ray Automated bug localization in embedded softwares A new paradigm through holistic approaches

i -f n5.26 'LAMBERT % P12-CL Academic Publishing 39231

_T.

Partha Pratim Ray

Automated bug localization in embedded softwares

A new paradigm through holistic approaches

ONIVf-

'/VtU-s iw)" •

LAP LAMBERT Academic Publishing Impressum/Imprint (nur fur Deutschland/only for Germany) Bibliografische Information der Deutschen Nationalbibliothek: Die Deutsche Nationalbibliothekverzeichnet diese Publlkatlon in der Deutschen Natlonalblbliografie; detaillierte bibliografische Daten sind im Internet iiber http://dnb.d-nb.de abrufbar. Aile in diesem Buch genannten Marken und Produktnamen unterliegen warenzeichen-, marken- oderpatentrechtlichem Schutz bzw. sind Warenzeichen oder eingetragene Warenzeichen derjeweiligen Inhaber. Die Wiedergabe von Marken, Produktnamen, Gebrauchsnamen, Handelsnamen, Warenbezeichnungen u.s.w. in diesem Werk berechtigt auch ohne besondere Kennzeichnung nicht zu der Annahme, dass solche Namen im Sinne der Warenzeichen- und Markenschutzgesetzgebung als frei zu betrachten waren und daher von jedermann benutzt werden diirften.

Coverbild: www.ingimage.com

Verlag: LAP LAMBERT Academic Publishing GmbH&Co. KG Heinrich-Bocking-Str. 6-8,66121 Saaibriicken, Deutschland Telefon +49 681 3720-310. Telefax +49 681 3720-3109 Email: info@lap-pub!ishing.com Approved by: Haldia,West Bengal Universityof Technology, Thesis, 2011

Hersteilung in Deutschland: Schaltungsdienst Lange o.H.G., Berlin Books on Demand GmbH, Norderstedt Reha GmbH, Saarbrucken Amazon Distribution GmbH, Leipzig ISBN: 978-3-8484-4439-7

Imprint (only for USA, CB) Bibliographic information published by the Deutsche Nationalbibliothek: The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Intemet at httpV/dnb.d-nb.de. Any brand names and product names mentioned in this book are subject to trademark, brand or patent protection and are trademarks or registered trademarks of their respective holders. The use of brand names, product names, common names, trade names, product _descriptions etc. even without a particular marking in this works is in no way to be construed to mean that such names may be regarded as unrestricted in respect of trademark and brand protection legislation and could thus be used by anyone.

Coverimage: www.ingimage.com

Publisher: LAP LAMBERT Academic Publishing GmbH 8i Co. KG Heinrich-Bocking-Str. 6-8, 66121 Saarbrucken, Germany Phone +49 681 3720-310, Fax+49 681 3720-3109 Email: [email protected]

Printed in the U.S.A. Printed in the U.K. by (see last page) ISBN: 978-3-8484-4439-7 Copyright © 2012 by the author and LAP LAMBERT Academic Publishing GmbH 8.Co. KG and licensors /• n t All rights reserved. Saarbrucken 2012 0 0^' Call —.1 •*" A»N..33..aS.l 0-1^^I ^

Additionally, I want to thank the faculties and staffofElectronics and Communication Engineering department for all their help to complete my degree and prepare for a career as a enthusiastic science-seeker. This includes (but certainly is not limited to) the following individuals:

Dr. Sunandan Bhunia

He made it possible for me to have many wonderful experiences I enjoyed as a student, including the opportunity to get in touch with the external sources of knowledge base such as ISI, Kolkata. His believe and immense support on me and my work encouraged to move ahead in research work. 1 want to greet a veiy special thank to him.

Mr. Soutav Kumar Das

His constant help and faith in Embedded Systems Laboratory during last few months provided me great support in completing my work.

And finally, I must thank my dear friends, father and mother, well wishers and obviously Poulami Majumder, for putting up with me during the development of this work with continuing, loving support and no complaint. I do not have the words to express all my feelings here, only that 1 love you, all!

Place: Haldia Date: 29"" April, 2011 Partha Pratim Ray -^•Iditionaiiy^ I Abstract

individuals. ' "'^"'^•seeker, n,: . *'°'np'etc my degree amJ prepare fof " Debugging denotes the process of detecting root causes of unexpected observable behaviors in prograins,- such as a program crash, an unexpected output value being produced or an assertion violation. Debugging of program errors is a difficult task and often takes "lade i(' „Possible for tn a significant amount of time in the software development life cycle. In the context of embedded software, the probability of bugs is quite high. Due to requirements of low "PPortunity "'nnderful experiences Ienjoyed as a code size and less resource consumption, embedded softwares typically do away with a """ ">y '^orJt Koikata" lot of sanity checks during development time. This leads to high chance of errors being special th (q believe and immense support on ree uncovered in the production code at ran time. OB-' Debugging embedded softwares is a common problem. Several techniques and tools do exist for solving this problem. However, various kinds of bugs remain untouched regarding '^natanthj,- unauthorized and inefficient memory related issues and few bugs may also reside inside the states of the software code. No feasible algorithm or technique has ever been developed ^"d tinallv t '"PPort [„ ^a'anis Laboratory during; last fe^ that will solve all the particular cases ofthis problem accurately. It turns out, though, that the width of the debugging metrics are quite important in a "m^°'' "" •"h"!'"'" •""' """P'. WP" wi.hcr, .nd »l>«°yi}\y large number of the embedded systems having various buggy issues. This becomes readily '"•"IP.h.,,, "•=

Partha Pratim Ray indeed promising. Therefore, the goal ofthis thesis is to answer the following question:

Table of Contents "y ated technique or methodology be developedfor the general prob- - "f'^^bus^ngembedded.ofr.are. .ith narrow interval parameters? Page coefficiems ' Problem of debugging embedded softwares with narrow interval i Acknowledgements i that win 1 . '' Mnsider it possible to develop a feasible technique Abstract ill --"-ive an partieular eases Of this problem automatically. Table ofContents v Chapter

1 Introduction 1

2 Literature Review 3 2.1 Software Bug 3 2.1.1 Types ofBugs 3 2.1.2 Bug Handling 5 2.2 Debugging 5 2.3 Embedded System 6

Sb. 2.3.1 Embedded Processors 7 2.3.2 Features ofEmbedded Systems 7

2.4 Embedded Software 8

2.4.1 Characteristics of Embedded Software 9

2.4.2 Embedded Software Architecture 10 2.4.3 Models ofComputation 11 2.5 BusyBox 12 2.5.1 Overall View of BusyBox 13 2.5.2 Usage 13 2.5.3 Bugs in BusyBox 13

3 Related Work 17 4 Memory Error Detection 20 4.1 Valgrind 20

4.1.1 Introduction 20

V 4.1.2 Interpreting Memchecks Output 21 4.2 Bug Analysis 24 4.2.1 Analyzing the Aip Bug in BusyBox 24 4.2.2 Analyzing the Top Bug in BusyBox 24 5 Invariant Analysis 32 5.1 Introduction _ 32 5.1.1 Invariant 32 5.1.2 Program Point . 32 5.1.3 Nonsensical 33 5.2 Daikon _ _ 33 5.2.1 Introduction 33 5.2.2 Executing Daikon 34 5.3 Methodology 35 5.4 Concept of Aip Bug Behind Our Approach 38 5.5 Bug Analysis _ 3g 5.5.1 Trace File Generation _ _ 4] 5.5.2 Invariant Detection 4I 5.5.3 Invariant Comparison . 43 5.5.4 Output Analysis 43 6 Object State Incorporated Debugging-OSiD 45 6.1 Introduction 6.2 Class Dependence Graph 4g 6.3 Our Approach 6.4 Bug Localization 6.4.1 Test OOP and Other Metrics jq 6.4.2 State Chart ^ 6.4.3 Stale Transition Table 6.4.4 CIdg Representation of the Test Program 53 6.4.5 Test Suite Deployment 6.4.6 State Information Comparison jg

vi 6-4.7 Slate Comparison Matrix -"%.-- 57 4 2 BugAnalysis 21 6.4.8 Source Level Bug Localization 57 24 7 Conclusion and Future Work 59 24 List of Publications 60 24 References 61 5-1 Introduction 32 5-I'l Invariant.. 32 ^-'-2 ProgramPoint 32 5-1-3Nonsensical 32

5-2 Daikon.. 33

5-2-1Introduction 33 ^-2-2ExecutingDaikon 33 Methodology 34 35

38 38 ^•-2T"'"variantDetection. 41 • '"variantComparison 41 6 Obi , 43 43 46 46 E"8Locallaation 46 47 49 50 52 52 '"'""nationCom^""ipatisoQ""••••-. 53 53 56 Chapter 1

Introduction

Embedded softwares are the heart ofembedded systems. The development ofembedded softwares need to undergo the process ofgeneral software development cycle, resulting in code development and debugging as important activities. Embedded softwares typically do away with lot of sanity checks during development time. This leads to high chance of errors being uncovered in the production code at nm time. In this thesis we propose a methodology for debugging errors in embedded softwares. As a case study we have used BusyBox. de-facto standard for embedded Linux in embedded systems and a few pseudo codes tescmbiing a few blocks of embedded BusyBox. Our first methodology works on top ofValgrind, a popular memory error detector. Our second work deals with invariant analysis ofsome parts ofBusyBox by Daikon, an invariant analyzer. As out final contribution we developed a technique which incorporates object state information into class dependence ^aph (cldg) to detect bugs in object oriented programs for embedded

softwares.

Problem Statement ; Different kinds ofdebugging techniques are available for debug ging embedded softwares. Specific types of errors that result from memory mishandling, unauthorized memory access etc. are very much crucial to embedded softwares in respect of bug intrusion, that must be detected well at the time ofdevelopment. Bugs can penetrate into softwares through other ways also. But the existing approaches to detect the above said bugs, are unable to execute its job satisfactorily. Hence the need for developing new debugging methodologies aroused-

Vltl

Organization : This thesis ' regarding various aspects f follows. Chapter 2presents literature review Chapter 2 'heir features. Chapter 3 systems and embedded software and raemoiy enor detection Cha t ^0'^: on this matter. Chapter 4presents the state incorporated debugging te '' invariant analysis. Chapter 6presents object Literature Review ^wks, ®technique. Chapter 7 presents future work and concluding

2.1 Software Bug

A software bug [40] is an error, flaw, mistake, undocumented feature that prevents it from behaving as intended (e.g. producing an incorrect result). Most bugs arise due to mistakesand errors made by the programmer, either in the program source code or its design, and a few are caused by compilers producing incorrect code. A program that contains a large number ofbugs, and orbugs that seriously interfere with its functionality, is said to be buggy. Reports detailing bugs in a program are commonly known as bug reports, fault reports or problem reports. " s l:>"i 2.1.1 Types of Bugs 1. Type I error [39], also known as an error of the first kind, an a error, ora fiilse positive; the error of rejecting a null hypothesis when it is actually true. It occurs when we are observing a difference when intruththere isnone, thus indicating a test of poor speciflction. Type I error can be viewed as the error ofexcessive credulity [46] or

even hallucination.

2. Type II error [39], also known as an error of the second kind, a p error, ora ftlse negative; the error of failing to reject a null hypothesis when in fact we should have rejected it. In other words, this is the error offailing to observe a difference when in truth there is one, thus indicating a test of poor sensitivity. Type II error can be viewed as the error of excessive skepticism [47]or myopia.

•i! 'he proerammer can be ama' «ft-are debugging varies ^ SCFTWARE fPQAJ MEMCflY on .he '^PP'Pie-ity of the system, and also depends, to -^^buggers. Defaugg,^ J® '^"euage(s) used and the available tools, snch as fiJO ccr^SaoN -hAcmjATcre which enable the programmer .o monitor the CONS^RSCN .. r ii, re-starf;. .... ^ [41]. high.,eve, pro " change values in tlernoty- ALMUARV SYSTEMS tbey have features ' ®"oh as Java, make debugs HUMAN OtAQNOSTIC

».pto 1! ' ' «" «Wm, Z """ " '="•»» PPP o, . few 2.3.1 Embedded Processors compm Tdembedded as part oft Embedded processors can be broken into two broad categories: ordinary microprocessors today [27] ^ Embedded system "®*'b'e and to meet a (gP) and microcontrollers (pC). which have many more peripherals on chip, reducing cost taracteristics Of embedded svst'""'™' in common use and size. There are Von Neumann as well as various degrees ofHarvard architectures. •^-bedded systems are de. ^re [243: RISC as well as non-RISC and VLIW types of processors are available in market. Few of these are: 65816. 65C02. 68HC0S. 68HC11. 68k. 78KOR/78KO, 8051. ARM. AYR, AVR32. -"Zrr^" be ageneral- Blackfin, C167, Coldfire. C0P8. Coitus APS3, eZ8, eZ80. FR-V. H8. HT48. M16C, M32C, MIPS. MSP430, PIC. PowerPC. R8C. RL78. SHARC, SPARC. ST6, SupcrH, TLCS-47. TLCS-870, TLCS-900, Tricore, V850. . XE8000, Z80, AsAP etc.

2.3.2 Features of Embedded Systems memory chips. firmware. The figure 2.1 illustrates the peripheral environment of an embedded processor as well as a construction of abasic embedded system. Embedded systems are very much accustomed

ormational systems, which simply take abody of input 2.4.3 Models of Computation data and transform it mio abody of output data. There are many models ofcomputation, each dealing with concurrency and time in diRcrent ways. Here we outline some of the most useful models for embedded software [37], ^•4.2 Embedded software contain, I. Dataflow ; in dataflow models, actors are atomic (indivisible) computations that feature of important measure, such as (24): are triggered by the availability of input data. Connections between actors represent • Simple control loop ;i„ ,i,i, • the flow ofdata from a producer actor to a consumer actor, subroutines, ejei, ^''8". the software simply has a loop. The loop calls 2. Time Triggered :Some systems with timed events are driven by clocks, which are . , ., " f**" ®f<" 'hethe hardware or software.sol '"♦^ntpt conctoued system c signals with events that are repeated indefinitely with a fixedperiod. rupt conitollei ibis means 'tnhedded systems are predominantly inter- 3. Synchronous/Reactive :In the synchronous/reactive (SR) model of computation, by differentkindsofevents, Performed by the system are trise=«d connections between components represent data values that are aligned with global ^««ve mnititaskin, „ clock ticks, as with time-triggered approaches [37]. ^simple control loop multitasking system is very similat 4, Rendezvous :In synchronous message passing, the components are processes, and Preemptive multitagkinutasl^j^g .j processes communicate in atomic, instantaneous actions called rendezvous. If two between tme-n- . . ® . I or threads based onl ^ "low-,eve, piece of code S.iitches processes are to communicate, and one reaches the point first at which it is ready to communicate, then it stalls until the other process is ready to communicate. >^-~andMicroand "o-kerneiExo-kerrtel-^p,.Exr, i.^ -a >.• " ^""necteri ®u interrupt). TheThp ,„..,iusual arrangement. •^®«:toV.~-.. «aio. switches the CPU to d

11 10

J 2.5 BusyBox

In this text we have experimented on BusyBox [22], a de-facto standard for Linux in emebedded systems. BusyBox is a fairly comprehensive set of programs needed to run a Linux system. Moreover it is the de-facto standard for embedded Linux systems, providing many standard Linux utilities, but having small code size (size of executables), than the GNU Core Utilities. BusyBox provides compact replacements for many traditional full blown utilities found on most desktop and embedded Linux distributions. Examples include the le utilities such as Is, cat, cp, dir, head, and tail. BusyBox also providessupport for more complex operations, such as ifcong, netstat, route, and other network utilities. BusyBox is remarkably easy to configure, compile, and use, and it has the potential to significantly reduce the overall system resources required to support a wide collection of common Linux utilities. BusyBox in general case can be built on any architecture supported by gcc. BusyBox is modular and highly configurable, and can be tailored to suit particular requirements. The package includes a configuration utility similar to that used to configure the Linux kernel. The commands in BusyBox are generally simpler implementations than their full-blown counterparts. In some cases, only a subset of the usual command line options is supported. In practice,however, the BusyBox subset of command functionality is more than sufficient for most general embedded requirements. ° The BusyBox bundle functions as a single executable where the different utilities are actually passed on at the command line for separate invocation. It is not possible to build the individual utilities separately and run them stand alone. For example, for nmning the arp utility, we need to invoke BusyBox as arp-Ainet and record the execution trace. Since we work on the binary level, the buggy implementation for us is the BusyBox binary, which has a large code base (about 121000 lines of code).

12

[. [[. acpid, addgroup, adduser, adjtimex, ar, arp, arping, ash, awk, basenatnc, beep, blkid, brctl, bunzip2, bzcat, bzip2, cal, cat, catv, chat, chattr, chgrp, chmod, chown, chpasswd, chpst, chroot, chrt, chvt, cksuin, clear, ctnp, conn, cp, cpio, crond, crontab, cryptpw, cttyhack, cut, date, dc, dd, deallocvt, delgroup, deluser, depmod, devmen, df, dhcprelay, diff, dimame, dnesg, dnsd, dnsdomainname, dos2unix, du, dunpkmap, dunpleases, echo, ed, egrep, eject, env,.envdir, envuidgid, ether-wake, expand, expr, fakeidentd, false, fbset, fbsplash, fdflush, fdformat, fdisk, fgrep, find, findfs, flash.eraseall, flash_lock, halt, hd, hdpam, head, hexdump, hostid, hostname, httpd, hush, hwclock, id, ifconfig, ifdown, ifenslave, ifplugd, ifup, inetd, init, inotifyd, logread, losetup, Ipd, Ipq, Ipr, Is, Isattr, Ismod, Ispci, Isusb, Izmacat, Izop, Izopcat, makedevs, makemime, man, mdSsum, mdev, patch, pgrep, pidof, ping, ping6, pipe_progress, pivot_root, pkill, popmaildir, poweroff, printenv, printf, ps, pscan, pwd, raidautorun, rdate, rdev, readahead, readlink, readprofile, realpath, reboot, refomime, renice, reset, resize, im, rmdir, nmnod, route, rpm, rpm2cpio, rtcwake, run-parts, runlevel, runsv, • runsvdir, rx, script, scriptreplay, sed, sendnail, seq, setarch, setconsole, setfont, setkeycodes, setlogcons, setsid, setuidgid, sh, shalsum, sha2S6sun, sha512sum, showkey, slattach, sleep, softlimit, sort, split, ostart-stop- daemon, stat, strings, stty, su, sulogin, sun, sv, svlogd, swapoff, swapon, switch_root, sync, sysctl, syslogd, tac, tail, tar, taskset, tcpsvd, tee, telnet, telnetd, test, tftp, tftpd, time, uudecode, uuencode, vconfig, vi, vlock, volname, wall, watch, watchdog, wc, wget, which, who, whoami, xargs, yes, zcat, zcip, msh, mt, nv etc.

Figure 2.2: The commands incorporated in BusyBox version-1.16.0.

14 • buildroot--A configurable means for building users' own busybox/ uClibc based system systems, maintained by the uClibc developers • OpenWrt--A for embedded devices, based on

buildroot • Tiny Core Linux—A very small minimal Linux GUI Desktop • PTXdist—Another configurable means for building your own busybox based systems • installer (boot floppies) project

• Red Hat installer

Installer

install/boot CDs

• The Mandriva installer • Linux Embedded Appliance Firewall—The sucessor of the Linux Router Project, supporting all sorts of embedded Linux gateways, routers, wireless routers, and firewalls

• Build Your Linux Disk • AdTran - VPN/firewall VPN Linux Distribution

• mkCDrec - make CD-ROM recovery

• Partition Image • —A Hnux distribution for handheld computers

" Netstation • GNU/Fiwix • TimcSys real-time Linux • MoviX—Boots from CD and automatically plays every video file on the

CD • Salvare—More Linux than tomsrtbt but less than , aims to provide a useful workstation as well as a rescue disk and many more.

Figure 2.3: List of projects that use the BusyBox.

15 * Galaxy MGB4500 Raid Pro NAS * StoryBox Ultimate uses BusyBox vl.1.3 1 *Western Digital's ShareSpace network attached storage device i| * RD129 embedded board from ELPA ! * EMTEC MovieCube R700 uses BusyBox 1.1.3 i * The Kerbango Internet Radio < * LinuxMagic VPN Firewall * Cyclades-TS and other Cyclades products * Linksys WRT54G - Wireless-G Broadband Router

* Dell TnieMobile 1184

* NetGear WG602 wireless router with sources here * ASUS WL-300g Wireless LAN Access Point with source here * Belkin S4g Wireless DSL/Cable Gateway Router with source here * Acronis PartitionExpert 2003 * U.S. Robotics Sureconnect 4-port ADSL router with source here * ActionTec GT701-WG Wireless Gateway/DSL Modem with source here * DLink—Model GSL-G604T, DSL-300T

* Siemens SESIS DSL router

* Free Remote Windows Terminal * ZyXEL Routers etc.

Figure 2.4: List of products that use the BusyBox.

16