Testing Error Handling Code in Device Drivers Using Characteristic Fault Injection
Jia-Ju Bai, Yu-Ping Wang, Jie Yin, and Shi-Min Hu, Tsinghua University
https://www.usenix.org/conference/atc16/technical-sessions/presentation/bai
This paper is included in the Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC ’16). June 22–24, 2016 • Denver, CO, USA 978-1-931971-30-0
Open access to the Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC ’16) is sponsored by USENIX.
Jia-Ju Bai, Yu-Ping Wang, Jie Yin, and Shi-Min Hu Department of Computer Science and Technology, Tsinghua University
Abstract

Device drivers may encounter errors when communicating with the OS kernel and hardware. However, error handling code often gets insufficient attention in driver development and testing, because these errors rarely occur in real execution. For this reason, many bugs are hidden in error handling code. Previous approaches for testing error handling code often neglect the characteristics of device drivers, so their efficiency and accuracy are limited. In this paper, we first study the source code of Linux drivers to find useful characteristics of error handling code. Then we use these characteristics in fault injection testing, and propose a novel approach named EH-Test, which can efficiently test error handling code in drivers. To improve the representativeness of injected faults, we design a pattern-based extraction strategy to automatically and accurately extract target functions which can actually fail and trigger error handling code. During execution, we use a monitor to record runtime information and pair checkers to check resource usage. We have evaluated EH-Test on 15 real Linux device drivers and found 50 new bugs in Linux 3.17.2. The code coverage is also effectively increased. Comparison experiments with previous related approaches also show the effectiveness of EH-Test.

1. Introduction

As important components of the operating system, device drivers control hardware and provide fundamental support for high-level programs. During driver execution, different kinds of occasional errors may occur, such as kernel exceptions and hardware malfunctions [31]. Therefore, device drivers need error handling code to assure reliability. But in some drivers, error handling code is incorrect or even missing. In these drivers, serious problems like system crashes and hangs may occur when occasional errors are triggered. According to our study on Linux driver patches, more than 40% of accepted patches add or update corresponding error handling code. This shows that error handling code in device drivers is not reliable enough, so testing error handling code and detecting the bugs inside it are necessary.

A challenge of testing error handling code is that occasional errors rarely happen in real execution [34]. For example, “bad address” (EFAULT) is a common error that should be handled, but it happens only when the memory or I/O address is invalid. Another example is a hardware error, which happens only when the hardware malfunctions. Triggering these errors in a real environment is very hard and uncontrollable.

To simulate software and hardware errors at runtime, software fault injection (SFI) is often used in driver testing. This technique mutates the code to inject specific errors into the program, and forces error handling code to be executed at runtime. Linux Fault Injection Capabilities Infrastructure (LFICI) [43] is a well-known project integrated in the Linux kernel. It can simulate common errors, such as memory-allocation failures and bad data requests. Inspired by LFICI, other fault injection approaches [7, 13, 23, 32] have been proposed in recent years, and they have shown promising results in driver testing and bug detection. However, these approaches still have some limitations in practical use.

• The representativeness of injected faults is often neglected, and most injected faults are random or manually selected. Random faults cannot reflect real errors well. Manually selected faults often omit representative injected faults.
• Numerous redundant test cases are generated. In fact, many generated test cases may cover the same error handling code, but they all need to be actually tested at runtime. For this reason, these approaches often spend much time in runtime testing.
• Only several kinds of faults can be injected, such as memory-allocation failures. But these faults cannot cover most error handling code in drivers.
• Much manual effort is needed. The kinds and places of injected faults are often manually decided.

In fact, previous fault injection approaches aim to support general software, but they neglect the characteristics of target programs. To relieve their limitations, we should consider the key driver characteristics in SFI. For example, because drivers are often written in C, built-in error handling mechanisms (such as “try-catch”) are not supported. For this reason, developers often use an if check to decide whether the error handling code should be triggered in device drivers. This characteristic can help to decide which functions can actually fail and should be fault-injected.
In this paper, we first study Linux driver code, and find three useful characteristics in error handling code: function return value trigger, few branches and check decision. Then, based on these characteristics and SFI, we propose a practical approach named EH-Test to efficiently test error handling code and detect bugs inside. Firstly, EH-Test uses a pattern-based extraction strategy to extract target functions which can fail from the captured runtime traces of normal execution. This strategy can automatically and accurately extract real target functions to improve the representativeness of injected faults. Then we generate test cases by corrupting the return values of target functions. Next, we run each test case on the real hardware, and use a monitor to record runtime information and pair checkers to check resource-usage violations. These pair checkers contain the basic information of resource-acquiring and resource-releasing functions, which can be obtained from specification-mining techniques and user configuration. During driver execution, system crashes and hangs can be easily identified through kernel crash logs or user observation. After driver execution, EH-Test can report resource-release omissions. We have implemented EH-Test using […] and evaluated it on […] Linux drivers of three classes. The results show that EH-Test can accurately find real bugs in error handling code, and improve code coverage in runtime testing. Comparison experiments with previous approaches also show its effectiveness.

Compared to previous approaches for testing drivers, our approach has four advantages:

1) Representative injected faults. We design a pattern-based extraction strategy to automatically and accurately extract real target functions as representative injected faults. It uses code patterns to decide whether a function can actually fail in driver execution. This strategy can largely improve the effectiveness of SFI.

2) Efficient test cases. According to our study, many drivers have few branches in error handling code, so injecting a single fault in each test case is enough to cover most error handling code. Moreover, our pattern-based extraction strategy can filter many unrepresentative injected faults. Therefore, the test cases generated by EH-Test are efficient, and the time usage of runtime testing can be largely shortened.

3) Accurate bug detection. By injecting representative faults, EH-Test can realistically simulate different kinds of occasional errors to cover error handling code. Moreover, EH-Test runs on the real hardware and uses exact execution information to perform analysis. These points assure the accuracy of bug detection.

4) High automation and scalability. Most of the working procedure of EH-Test is automated, including target function extraction, fault injection and test case execution. It can also support many kinds of existing drivers.

In this paper, we make the following contributions:

• We study the source code of Linux device drivers, and find three useful characteristics in error handling code.
• Based on the patterns of error handling code in drivers, we design a pattern-based extraction strategy to automatically and accurately extract real target functions as representative injected faults.
• Based on the characteristics, we design a practical approach named EH-Test, which can efficiently test error handling code in device drivers and detect bugs inside.
• We evaluate EH-Test on […] device drivers in Linux […] and […], and respectively find […] and […] bugs. All the detected bugs in […] have been confirmed by developers. The code coverage is also effectively increased in runtime testing. We also perform comparison experiments with previous approaches, and find that EH-Test can detect the bugs which are missed by them. The experimental results show that EH-Test can efficiently perform driver testing and accurately find real bugs.

The rest of this paper is organized as follows. Section 2 introduces the motivation. The following sections then present, in order, the three characteristics of device drivers found by our study on Linux driver code, our pattern-based extraction strategy, EH-Test in detail, our evaluation on Linux device drivers with comparison experiments against previous approaches, and the related work. The final section concludes this paper.

1 The EH-Test program can be downloaded from the link http://oslab.cs.tsinghua.edu.cn/EHTest/index.html

2. Motivation

To ensure the reliability of device drivers, error handling code should be correctly implemented to handle different kinds of occasional errors. But in fact, error handling code is incorrect or even missing in some drivers, so hard-to-find bugs may occur during execution. In this section, we first reveal this problem using a concrete example and our study on Linux driver patches, and then we sketch the software fault injection technique used in this paper.

2.1 Motivating Example

We first motivate our work using a real Linux driver, bnx2. This driver manages the Broadcom NetXtreme II Ethernet Controller. Figure […] shows a part of its source code in Linux […]. The function bnx2_init_board calls pci_request_regions (line […]) to request I/O and memory resources when initializing the hardware, an