Robustness Evaluation of Operating Systems

Robustness Evaluation of Operating Systems Vom Fachbereich Informatik der Technischen Universität Darmstadt genehmigte Dissertation zur Erlangung des akademischen Grades eines Doktor-Ingenieur (Dr.-Ing.) vorgelegt von Andréas Johansson aus Falkenberg, Schweden Referenten: Prof. Neeraj Suri, Ph.D. Prof. Christof Fetzer, Ph.D. Datum der Einreichung: 19.11.2007 Datum der mündlichen Prüfung: 14.01.2008 Darmstadt 2008 D17 ii Summary The premise behind this thesis is the observation that Operating Systems (OS), being the foundation behind operations of computing systems, are complex entities and also subject to failures. Consequently, when they do fail, the impact is the loss of system service and the applications running thereon. While a multitude of sources for OS failures exist, device drivers are often identified as a prominent cause behind failures. In order to characterize the impact of driver failures, at both the OS and application levels, this thesis develops a framework for error propagation-based robustness profiling for an OS. The framework is first developed conceptually and then experimentally validated on a real OS, namely Windows CE .Net. The choice of Windows CE is driven by its representativeness for a multitude of OS’s, as well as the ability to customize the OS components for particular needs. For experimental validation, fault injection is a prominent technique that can be used to simulate faults (or errors) in the system by inserting synthetic ones and study their effect. Three key questions with such a technique are where, what and when to inject faults. This thesis shows how injecting errors at the interface between drivers and the OS can be very effective in evaluating the effects driver faults can have. To quantify the OS’s robustness, this thesis defines a series of error propagation measures, specifically tailored for device drivers. These measures allow for quantifying and comparing both individual services and device drivers on their susceptibility and diffusing abilities. This thesis compares three contemporary error models on their suitability for robustness evaluation. The classical bit-flip model is found to identify a higher number of severe failures in the system. It also identifies failures for more services than both other models, data type and fuzzing. However, its main drawback is that it requires substantially more injections than the other two. Fuzzing, even though not giving rise to as many failures is able to find new additional services with severe failures. A careful study of the injections performed with the bit-flip model shows that only a few bits are generally useful for identifying new services with robustness weaknesses. Consequently, a new composite model is proposed, combining the most effective bits of the bit-flip model with the fuzzing model’s ability to identify new services, giving rise to new model without loss of important information and at the same time incurring a moderate number of injections. To answer the question of when to inject an error this thesis proposes a novel model of a driver’s usage profile, focusing on high-level operations being carried out. It guides the injection of errors to instances when the driver is carrying out specific operations. Results from extensive fault injection experiments show that more service vulnerabilities can be discovered. Furthermore, a priori profiling of the drivers can show how effective the proposed approach will be. iii iv Kurzfassung Der Hintergrund dieser Dissertation beruht auf der Beobachtung, dass das Betriebssystem, welches die Grundlage für den Betrieb von Rechnersystemen darstellt, eine sehr komplexe Struktur aufweist, was häufig zu Fehlern im Be- triebssystem führen kann. Wenn diese betriebssysteminternen Fehler Ausfälle von Diensten zur Folge haben, sind auch die im Rahmen des Betriebssystems laufenden Applikationen gefährdet. Auch wenn es im allgemeinen viele Fehlerquellen gibt, werden oft fehlerhafte Treiber als die häufigste Ursache angegeben. Um die Auswirkungen von Treiberdefekten auf der Betriebssystem- und App- likationsebene zu charakterisieren, wird in dieser Dissertation ein auf der Ausbre- itung von Fehlern basierendes Framework für Robustheitsauswertung entwickelt. Das Framework wird sowohl konzeptionell entwickelt als auch auf einem echten Be- triebssystem experimentell validiert. Das gewählte Betriebssystem, Windows CE .Net, ist repräsentativ für viele andere Betriebssysteme. Es ist modular aufgebaut, was die Anpassung der Betriebssystemkomponenten an verschiedene Bedürfnisse erheblich vereinfacht. Fehlerinjektion ist eine bedeutende Technik für die experimentelle Validierung, wobei Fehler simuliert werden indem man sie in das System injiziert und ihre Fol- gen beobachtet. Drei wichtige Aspekte, die hierbei berücksichtigt werden müssen, sind: Welche Fehler sollen wo und wann injiziert werden. In dieser Dissertation wird gezeigt, dass Fehlerinjektion in die Schnittstelle zwischen dem Betriebssystem und den Treibern eine effektive Vorgehensweise darstellt, die Folgen von Treiber- fehlern abzuschätzen. Um die Robustheit eines Betriebssystems zu quantifizieren, werden eine Reihe von Fehlerausbreitungsmetriken definiert, die speziell auf Treiberfehler zugeschnit- ten sind. Anhand dieser Metriken können Dienste und Treiber hinsichtlich Empfindlichkeit und Ausbreitungsvermögen verglichen werden. Diese Dissertation vergleicht drei zeitgemäe Fehlermodelle in Bezug auf ihre Tauglichkeit zur Robustheitsbewertung. Das klassische Bit-Flip-Modell ermittelt am häufigsten schwere Ausfälle im System. Mehr als die beiden anderen Modelle, Data Type und Fuzzing, ermittelt dieses Modell auch die meisten Dienste, die zu Ausfällen führen könnten. Der gröte Nachteil dieses Modells ist allerdings, dass es sehr viele Injektionen erfordert. Fuzzing ermittelt weniger Dienste, dafür aber neue fehlerhafte, von Bit-Flip nicht erkannte Dienste. Eine sorgfältige Untersuchung der Ergebnisse des Bit-Flip-Modells zeigt, dass schon eine Teilmenge der Bits ausreichend ist, um neue Dienste, die zu Robus- theitsausfällen führen, zu ermitteln. Daraufhin wird ein neues, zusammengesetztes Modell vorgeschlagen, das die guten Eigenschaften des Bit-Flip-Modells und das Vermögen des Fuzzing-Modells neue Dienste zu identifizieren miteinander kom- biniert. Das neue Modell verliert keine wichtige Information, und erfordert insge- samt deutlich weniger Injektionen. Um die Frage zu beantworten wann es sinnvoll ist Fehler zu injizieren, wird ein neues, an das Benutzerprofil des Treibers angelehntes Timingmodell vorgeschla- v gen. Das neue Modell basiert auf der Ausführung von Befehlen in einer höheren Schicht. Bestimmte Fehlerinjektionen werden zum Zeitpunkt der Ausführung bes- timmter Befehle getätigt. Die Ergebnisse der Fehlerinjektionen zeigen, dass ein Vielfaches an störungsanfälligen Diensten gefunden werden kann. Auerdem gibt das Benutzerprofil des Treibers im Voraus Aufschlussuber ¨ die Effektivität der neuen Methode. vi Acknowledgements The path to a Ph.D. is a long, winding one. At the beginning it is wide and spacious, only to become narrower and narrower. Sometimes it goes steeply upwards, sometimes downwards. Sometimes you think you see an opening and light after the next turn, only to find a dead end. There are crossings, where one has to choose which path to pursue. Some paths look more promising than others but you quickly learn that the easy path is not always the shortest. I am now at the end of the path, only to realize that it is the beginning of a new. I would not have gotten here without the assistance and encouragement of several people. First of all I would like to thank my guide and mentor, Prof. Neeraj Suri. Thanks for your guidance and support. I would also like to thank all present and former members of the DEEDS group: Martin, Vilgot, Orjan,¨ Arshad, Robert, Adina, Dinu, Marco, Dan, Ripon, Brahim, Faisal, Majid and Matthias. Many thanks also to Birgit, Ute, Sabine and Boyan. A great many thanks also to Prof. Chrstof Fetzer for accepting to be my opponent. I am also grateful for funding for conducting my research coming from the projects EC IP DECOS, EC NoE ReSIST and by research grants from Microsoft Research. A part of the research validation was accomplished over an internship at Microsoft Research, Cambridge. A special thanks to Brendan Murphy at MSR for hosting my internship and for all discussions and help writing papers. Finally to my beautiful and supporting wife, Mia. Thank you for every- thing! vii viii Contents 1 Introduction 1 1.1 Dependability: The Basic Concepts ............... 3 1.1.1 Dependability Attributes ................. 4 1.1.2 Dependability Threats .................. 5 1.1.3 Dependability Means ................... 6 1.1.4 Alternate Terminology: Software Engineering ..... 7 1.1.5 Bohrbugs and Heisenbugs ................ 7 1.2 Robustness Evaluation ...................... 8 1.3 Thesis Research Questions & Contributions .......... 9 1.4 Thesis Structure .......................... 14 2 Background and Context 15 2.1 A Short Operating System History ............... 16 2.1.1 OS Design ......................... 16 2.1.2 Device Drivers ....................... 17 2.1.3 What is the Problem? .................. 18 2.2 Sources of Failures of Operating Systems ............ 18 2.2.1 Hardware Related ..................... 19 2.2.2 Software Related ..................... 20 2.2.3 User Related ....................... 20 2.2.4 Device Drivers ....................... 21 2.3 Fault Injection .........................

Robustness Evaluation of Operating Systems

EEMBC and the Purposes of Embedded Processor Benchmarking Markus Levy, President

Cloud Workbench a Web-Based Framework for Benchmarking Cloud Services

Automatic Benchmark Profiling Through Advanced Trace Analysis Alexis Martin, Vania Marangozova-Martin

Automatic Benchmark Profiling Through Advanced Trace Analysis

Introduction to Digital Signal Processors

BOOM): an Industry- Competitive, Synthesizable, Parameterized RISC-V Processor

A Free, Commercially Representative Embedded Benchmark Suite

Low-Power High Performance Computing

Exploring Coremark™ – a Benchmark Maximizing Simplicity and Efficacy by Shay Gal-On and Markus Levy

CPU and FFT Benchmarks of ARM Processors

DSP Benchmark Results of the GR740 Rad-Hard Quad-Core LEON4FT

Scalable Vector Media-Processors for Embedded Systems