Runtime Monitoring for Open Real-Time Systems

Dissertation submitted in fulfillment of the requirements for the academic degree of Doktoringenieur (Dr.-Ing.)

Presented to the Technische Universität Dresden, Faculty of Computer Science

Submitted by Dipl.-Inf. Martin Pohlack, born on 15 June 1978 in Löbau

Supervising professor: Prof. Dr. rer. nat. Hermann Härtig, Technische Universität Dresden
Reviewer: Prof. Dipl.-Ing. Dr. Gerhard Fohler, Technische Universität Kaiserslautern

Date of the defense: 1 July 2010

Dresden, August 2010

Acknowledgements

First, I would like to thank my supervisor, Prof. Hermann Härtig, for his support and for providing a work environment at the Operating Systems Chair of the TU Dresden that made this work possible. Many members of the operating-systems group have helped to make this work a reality. I would especially like to thank Björn Döbel and Michael Roitzsch, with whom it was a joy to work. Björn Döbel and Torvald Riegel contributed to this thesis by working on runtime-monitoring matters for their master's theses. I also want to thank the members of the Magpie research project at Microsoft Research Cambridge, UK, for providing me with an extremely interesting research-internship experience. Several people proofread versions of this thesis and gave very valuable feedback, for which I am thankful: Prof. Gerhard Fohler, Prof. Hermann Härtig, Dr.-Ing. Michael Hohmuth, Juliane Pohlack, Sebastian Pohlack, and Michael Roitzsch. My parents always supported me and believed in me, for which I am grateful. Last but not least, I want to thank my wonderful wife Juliane and my children for supporting me and for their patience.

Contents

1 Introduction ...... 1
1.1 Contributions ...... 2
1.2 The scope of my work and organization ...... 4
1.3 List of Publications ...... 5

2 Foundations and related work ...... 7
2.1 Instrumentation techniques ...... 7
2.1.1 Static instrumentation ...... 8
2.1.2 Hybrid approaches ...... 9
2.1.3 Dynamic instrumentation ...... 10
2.2 Transport techniques ...... 12
2.3 Evaluation approaches ...... 16
2.3.1 Internal vs. external request specification ...... 16
2.3.2 Specification languages ...... 17
2.3.3 Other projects ...... 19
2.4 Miscellanea ...... 22
2.4.1 Sampling techniques ...... 22
2.4.2 Distributed systems ...... 23
2.4.3 Stable event ABIs ...... 23
2.4.4 Event size and timestamp size ...... 24
2.4.5 Probe effect, intrusiveness, and monitoring overhead ...... 24
2.5 Terminology ...... 25
2.6 Summary ...... 27

3 Design ...... 31
3.1 Target systems and requirements ...... 31
3.1.1 Target systems ...... 31
3.1.2 Properties and requirements ...... 32
3.1.2.1 High-level requirements ...... 32
3.1.2.2 Low-level requirements ...... 33
3.1.2.3 Requirements to hardware and … ...... 35
3.2 Design decisions and architecture ...... 36
3.2.1 Roles and their interaction: Ferret’s architecture ...... 36
3.2.1.1 Discussion of specific design alternatives ...... 38
3.2.2 Online evaluation, offline evaluation, and external schemata ...... 40
3.2.2.1 Online evaluation ...... 40
3.2.2.2 Offline evaluation and schemata ...... 41


3.2.2.3 Controlling the monitoring ...... 41
3.2.3 Sensors ...... 42
3.2.3.1 A variety of sensor types ...... 42
3.2.3.2 Timestamps ...... 44
3.2.4 Sensor synchronization approaches ...... 50
3.2.4.1 Common approaches ...... 50
3.2.4.2 An evaluation of the event sensors from: A Generalized Approach to Runtime Monitoring for Real-Time Systems ...... 52
3.2.4.3 Atomic sections in user-space code for sensor synchronization ...... 57
3.3 Conclusion ...... 60

4 Implementation ...... 63
4.1 Atomic sections in user-space code ...... 63
4.1.1 Common concepts ...... 64
4.1.2 The rollback approach ...... 65
4.1.2.1 Limitations of the rollback approach ...... 65
4.1.3 The rollforward approach ...... 66
4.1.3.1 Alternative implementation approaches ...... 66
4.1.3.2 Implementation details ...... 67
4.1.4 The combined approach ...... 68
4.1.5 The implementation for Fiasco ...... 69
4.1.6 Using atomic sequences in sensors ...... 70
4.1.6.1 ALists ...... 70
4.1.6.2 VLists ...... 73
4.2 General-purpose instrumentation placement ...... 76
4.2.1 Instrumentation of Drops on L4 ...... 77
4.2.2 Implementing Event Tracing for Singularity ...... 78
4.3 Portability and tracing special environments ...... 79
4.3.1 Portability and architecture ...... 79
4.3.2 Portability and versions ...... 80

5 Evaluation ...... 81
5.1 Use cases ...... 81
5.1.1 Model building ...... 82
5.1.1.1 DOpE resource demand modeling ...... 82
5.1.1.2 DOpE requests ...... 85
5.1.1.3 Hard-disk–request modeling ...... 92
5.1.1.4 Verner ...... 94
5.1.2 Behavior checking ...... 96
5.1.2.1 Idle-switch optimization ...... 96
5.1.2.2 Taming L4Linux ...... 97
5.1.2.3 Constraint checking in drivers ...... 100
5.1.2.4 Comparison of native and virtualized operating system kernel (fork) ...... 100


5.1.3 Other use cases ...... 102
5.1.3.1 Dynamic function-call tracing ...... 102
5.1.3.2 Event Tracing for Singularity ...... 103
5.2 Qualitative evaluation ...... 105
5.3 Quantitative evaluation ...... 107
5.3.1 Hardware description ...... 107
5.3.2 System Management Mode ...... 107
5.3.3 Sensor microbenchmarks ...... 108
5.3.3.1 Throughput survey ...... 109
5.3.3.2 AList ...... 109
5.3.3.3 VList ...... 110
5.3.4 Macrobenchmarks ...... 111
5.3.4.1 Quantifying intrusiveness ...... 112
5.3.4.2 A non–real-time workload ...... 114
5.3.4.3 A real-time workload ...... 115
5.4 Summary ...... 116

6 Conclusion ...... 117
6.1 Suggestions for future work ...... 119

Bibliography 121

Index 137


List of Figures

3.1 Architecture overview of Ferret ...... 37
3.2 Exemplary data layout description for events ...... 45
3.3 Mismatch of event-timestamp order and visibility order ...... 53

4.1 Schema for atomic sequences following the combined approach ...... 68
4.2 Assembler rollback implementation for AList event production code ...... 71
4.3 Assembler rollforward implementation for AList event production code ...... 72
4.4 AList post-routine wrapper with two 32-bit words payload ...... 73
4.5 Simplified C code for VList sensor code ...... 74
4.6 Generated assembler code for C implementation of VList post routine ...... 75
4.7 Inline-assembler fragment with a stack-prefaulting calling convention ...... 75
4.8 Inline-assembler fragment with a page-aligned–stack calling convention ...... 76
4.9 C wrapper that moves all data into a single stack page ...... 77
4.10 Instrumentation template for Singularity’s CIL-bytecode ...... 79

5.1 DOpE’s inner copy routine ...... 82
5.2 Diagram of pixel copy times to graphic-card memory ...... 84
5.3 Excerpt from DOpE request schema: prehandlers ...... 88
5.4 Excerpt from DOpE request schema: handlers ...... 89
5.5 Histogram over the processor-time demand for a single DOpE request ...... 90
5.6 A single DOpE request ...... 91
5.7 Screenshot of a typical session with active runtime monitoring ...... 95
5.8 State machine for idle_switch_mon ...... 96
5.9 State machine for the tamer monitor ...... 98
5.10 Timeline visualization of the atomicity problem with the tamer thread ...... 99
5.11 Timeline view of a vfork system call ...... 101
5.12 Dynamic call graph for the E1000 interrupt thread ...... 103
5.13 Histogram showing how SMIs affect execution time ...... 108
5.14 Distributions for AList event posting (with cache flooder) ...... 111
5.15 Throughput: processor time vs. event size for the VList sensor ...... 111
5.16 Worst-case execution times ...... 112
5.17 Re-AIM 7 throughput ...... 115
5.18 Execution-time distributions for the video-decoding loop in Verner ...... 116


List of Tables

5.1 Qualitative properties of the different sensors compared ...... 106
5.2 Throughput results for different processors and sensor configurations ...... 110
5.3 Key properties of the noninstrumented and instrumented runs compared ...... 115


Chapter 1 Introduction

The trend towards higher integration in the computer industry has continued over the last years, and software systems have kept up with this general growth, leading to systems of ever-increasing complexity, at least under the hood. Monitoring of runtime properties in such complex systems becomes increasingly important.

We see not only a quantitative growth, in that more and more functionality, storage space, and computing power is integrated into devices, but also a qualitative growth, where functionality that was formerly only available in dedicated real-time systems or embedded devices is now requested for standard desktop systems or even mobile devices. For example, a state-of-the-art smart phone usually comprises not only phone capabilities, contact management, and a personal organizer, but also video and audio playback support, navigation-system facilities, and a range of games. Here, typical desktop applications have found their way into a mobile device, and everyone expects that today's desktop systems are capable of handling these tasks too, including the real-time aspects.

Another phenomenon is the trend towards openness for device classes with real-time purposes (e. g., appliances and mobile phones). Customers expect to install third-party applications (e. g., games, VoIP software, navigation software, extensions) on their devices today. These grown requirements demand increased capabilities of the underlying platform, for example in the form of memory-protection units or full-grown file systems, both of which were previously only found in larger device classes, such as notebooks and desktops. In fact, current mobile platforms run variants of full-fledged operating systems: Linux is part of Google's Android platform, and a variant of Mac OS X is used in Apple's iPhones.

Obviously, this increased complexity in the form of more complex hardware, additional software, larger software packages, and additional system layers will lead to new bugs, sources of instability, and unpredictability, making the need for a common diagnostic infrastructure more pressing than before. Not only is a single common diagnostic approach desirable for the whole system, but a whole new range of diagnosis targets opens up with the integration of real-time and non–real-time tasks. In addition to typical debugging and profiling needs, runtime monitoring of real-time properties, for example whether deadlines for disk requests were met or whether single video frames were computed and displayed in time, is now desirable.

From the above trends, I derive a set of abstract requirements for such a monitoring facility. These requirements define the frame for my thesis and for my prototypical implementation named Ferret:


• The monitoring facility shall be usable uniformly in all software layers. It must therefore have only very few functional requirements of its own. Furthermore, because of its pervasive deployment, it must not introduce new problems itself, for example in the form of instability or new information channels in the system.

• To be usable in real-time environments, low intrusiveness on the observed system is required, in order not to compromise real-time properties. In case of resource shortage, original system execution shall be prioritized above monitoring.

• To be usable in the typically more complex non–real-time environments, low intrusiveness is also important, but in a different form. Here, overall system performance (e. g., throughput) may be deemed more important than deadlines of specific tasks.

• The concrete monitoring targets will never be fully static but will evolve over time. Therefore, the monitoring facility will have to be flexible enough to follow this evolution.

I will provide a more detailed analysis and breakdown of these abstract requirements in Section 3.1.2.

1.1 Contributions

This work contains three main contributions and two auxiliary contributions.

Main contributions

Runtime-monitoring architecture for open real-time systems
I developed an architecture for runtime monitoring of open real-time systems, with the following properties:

• A single monitoring approach can be used for systems comprising real-time and non–real-time parts.

• Open systems with software from independent vendors are supported. Protection domains exist, and several independent event producers and consumers can coexist in the system.

• The approach can be used for all system layers (kernel, libraries, system servers, and applications), as no restrictions are imposed on threading models.

• The evaluation of properties works across component boundaries.

• The approach is noninvasive through a set of design aspects:
  – Monitoring information is only transported over shared memory; no other API of the target system is used, in particular no potentially blocking kernel primitives.


  – There is no synchronous notification for event creation.
  – Shared memory between monitored processes and monitors forms one-way channels, as monitors only have read access. This prevents information flow back to monitored processes.
  – Monitoring itself has a low overhead in terms of processor cycles.

• The architecture allows reusing different instrumentation techniques that are established for a target environment, as instrumentation and event transportation are separated.

Adaptation of fast, low-overhead user-space atomic sections for monitoring in a microkernel environment
I adapted a technique for user-space atomic sections for monitoring in a microkernel real-time environment.

• This technique provides both a very low average-case execution time for throughput-driven non–real-time load and low worst-case–execution-time guarantees for deadline-driven real-time load.

• Due to the restriction to the monitoring use case, I could significantly simplify the implementation compared to generic atomic sections for user-space code (cf. Section 3.2.4.3).

• I devised a new, low-overhead method for ensuring the return of control to the system from within atomic sections that I call static cloning.

• I adapted a calling convention for reusing compiler-generated code within atomic sections that simplifies sensor-access-code development significantly compared to writing assembly code by hand.

• The separation of the mechanism, in the form of atomic sections for user-space code, from its usage, in the form of concrete sensor-access code, makes it possible to put only the small mechanism into the microkernel.
  – I demonstrated the usage of this mechanism in rollback and rollforward mode by means of different sensor implementations in user space.
  – The mechanism may be useful for other problem domains as well.

• In contrast to lock-free approaches, the usage of atomic sections allows efficient online evaluation, as the monitors' view on published events is always current (cf. Section 3.2.4.2 on page 54).

Verified and evaluated design on the basis of an experimental implementation
I evaluated my design by means of a prototypical implementation named Ferret.

• I implemented different sensor types to verify the broad suitability of the basic mechanisms.


• Ferret’s suitability for several problem domains was evaluated on the basis of various scenarios, as described in Section 5.1.

• I provide quantitative evaluations for macro- and microbenchmarks, a non–real-time workload, and a real-time workload in Section 5.3 to verify my design approach.

• Section 5.2 contains an evaluation of several qualitative properties for different sensor approaches, which allows gauging the suitability of the sensor approaches for specific problem domains (real-time and non–real-time). I conclude that Ferret’s rollforward sensors provide a good solution for mixed real-time and non–real-time systems.

• I discuss the portability of the monitoring design to other systems in Section 4.3.

Auxiliary contributions

Singularity, instrumentation of generated communication code
A predecessor of Ferret was implemented for the Singularity research operating system, was integrated into the build process for automatic instrumentation of generated communication code, and was used for collecting communication patterns in Singularity (cf. Section 5.1.3.2).

Magpie, magpy, and Magpyvis
I further developed magpy, the Python variant of the Magpie software used for post-processing event traces, to work with Event Tracing for Singularity and Ferret. I also adapted Magpyvis, a visualization tool for manually inspecting captured event traces, which I used extensively for visualizing experimental results for this work (cf. Figures 5.5, 5.10, and 5.11).

1.2 The scope of my work and organization

In this work, I describe the design, implementation, and evaluation of a runtime monitoring system for open real-time systems.

After giving a brief motivation here, I start out this work by providing a detailed survey of related work in the following Chapter 2. There I give an overview of instrumentation techniques, techniques for transporting monitoring data, different approaches for evaluating monitoring data, and a collection of smaller but still interesting topics. I close that chapter by comparing the terminology used in the literature and by defining which terms I will use throughout this work.

After introducing the foundations by providing an overview of the state of the art and identifying shortcomings and possibilities for reuse for my envisioned target systems, I describe these target systems in more detail in Chapter 3 and refine the requirements analysis. The chapter continues with a detailed discussion of design decisions and the architecture of Ferret, wherein I touch on the topics of roles in the system, online vs. offline evaluation and external schemata, sensor variants and timestamps, and sensor synchronization approaches. Chapter 3 closes with the description of an idea for atomic sections in user-space code that I adapted for sensor synchronization.

The succeeding Chapter 4 contains detailed explanations of implementation aspects, most prominently how these atomic sections are implemented with rollback and rollforward guarantees in a microkernel-based real-time system, and how these mechanisms are put to use for specific sensor variants. In this chapter, I additionally discuss the placement of instrumentation points and the portability of Ferret.

I provide a detailed evaluation of my work in Chapter 5. The chapter starts out with a description of ten different use cases for which I applied or ported Ferret or its predecessors. In the second part of the chapter, I provide a qualitative evaluation of different sensor types for specific problem domains. The chapter concludes with a quantitative evaluation comprised of low-level microbenchmarks and macrobenchmarks for gauging Ferret's suitability for real-time and non–real-time environments.

This thesis concludes with a summary and suggestions for future work in Chapter 6.

1.3 List of Publications

[1] Michael Roitzsch and Martin Pohlack. Video quality and system resources: Scheduling two opponents. Journal of Visual Communication and Image Representation, 19(8):473–488, 2008.

[2] Hermann Härtig, Steffen Zschaler, Martin Pohlack, Ronald Aigner, Steffen Göbel, Christoph Pohl, and Simone Röttger. Enforceable Component-Based Realtime Contracts — Supporting Realtime Properties from Software Development to Execution. Springer Real-Time Systems Journal, 35(1), January 2007.

[3] Hermann Härtig, Steffen Zschaler, Martin Pohlack, Ronald Aigner, Steffen Göbel, Christoph Pohl, and Simone Röttger. Enforceable component-based realtime contracts: Supporting realtime properties from software development to execution. Real-Time Systems, 35(1):1–31, January 2007.

[4] Michael Roitzsch and Martin Pohlack. Principles for the Prediction of Video Decoding Times applied to MPEG-1/2 and MPEG-4 Part 2 Video. In Proceedings of the 27th IEEE Real-Time Systems Symposium (RTSS 06), Rio de Janeiro, Brazil, December 2006. IEEE.

[5] Martin Pohlack, Björn Döbel, and Adam Lackorzynski. Towards Runtime Monitoring in Real-Time Systems. In Proceedings of the Eighth Real-Time Linux Workshop, Lanzhou, China, 2006.

[6] Ronald Aigner, Christoph Pohl, Martin Pohlack, and Steffen Zschaler. Tailor-Made Containers: Modeling Non-functional Middleware Service. In Workshop on Models for Non-Functional Aspects of Component-Based Software (NFC’04), colocated with UML 2004, Lisbon, Portugal, October 2004.


[7] Steffen Göbel, Christoph Pohl, Ronald Aigner, Martin Pohlack, Simone Röttger, and Steffen Zschaler. The COMQUAD Component Container Architecture. In WICSA ’04: Proceedings of the Fourth Working IEEE/IFIP Conference on Software Architecture, page 315, Washington, DC, USA, 2004. IEEE Computer Society.

[8] Steffen Göbel, Christoph Pohl, Martin Pohlack, and Steffen Zschaler. The COMQUAD Component Container Architecture and Contract Negotiation. Technical Report TUD-FI04-04-April-2004, TU Dresden, 2004. Available from URL: http://os.inf.tu-dresden.de/drops/doc.html#tr-comqarch.

[9] Martin Pohlack, Ronald Aigner, and Hermann Härtig. Connecting Real-Time and Non-Real-Time Components. Technical Report TUD-FI04-01-Februar-2004, TU Dresden, 2004. Available from URL: http://os.inf.tu-dresden.de/papers_ps/tr-rtnonrtcomp.pdf.

[10] Ronald Aigner, Martin Pohlack, Simone Röttger, and Steffen Zschaler. Towards Pervasive Treatment of Non-Functional Properties at Design and Run-Time. In Proceedings of the 16th International Conference "Software & Systems Engineering and their Applications" (ICSSEA 2003), Paris, France, December 2003.

[11] Lars Reuther and Martin Pohlack. Rotational-Position-Aware Real-Time Disk Scheduling Using a Dynamic Active Subset (DAS). In 24th IEEE Real-Time Systems Symposium (RTSS), pages 374–385, Cancun, Mexico, December 2003.

[12] H. Härtig, J. Löser, F. Mehnert, L. Reuther, M. Pohlack, and A. Warg. An I/O Architecture for Microkernel-Based Operating Systems. Technical Report TUD-FI03-08-Juli-2003, Dresden University of Technology, Dresden, Germany, July 2003.

[13] Lars Reuther and Martin Pohlack. Work-in-Progress Report: Using SATF Schedul- ing in Real-Time Systems. Work-in-Progress Report: 2nd USENIX Conference on File and Storage Technologies (FAST-2003), San Francisco, CA, USA, March 2003.

Chapter 2 Foundations and related work

There exists a plethora of related work in the area of tracing and monitoring that covers different aspects of the problem space. Some projects focus primarily on a single aspect (e. g., event transportation [ZYW+03] or instrumentation [Kri05, Moo01]) while others cover an entire workflow (e. g., [CSL04, BDS07b, OPr08, DD06]). I distinguish three main aspects of the topic in this chapter and will discuss related work in the context of each of those. In addition to providing an introduction into the area of runtime monitoring, the overview given here also demonstrates that there is no solution in the literature that combines all relevant properties for runtime monitoring for open real-time systems (cf. Section 3.1.2).

This chapter starts with a section about instrumentation and is followed by a discussion of trace-data transport approaches in Section 2.2. Section 2.3 attends to the evaluation, that is, the actual use of trace data. The following Section 2.4 deals with topics not fitting any of the three larger aspects. Section 2.5 introduces the terminology used throughout this work and relates it to the diversified terminology used in the literature whenever there are conflicts. I conclude this chapter with a summary of related approaches and derive requirements from that in Section 2.6.

2.1 Instrumentation techniques

Instrumentation techniques are a necessary means for this work to transport the monitoring functionality into the target system. In the following, I will discuss different approaches and their effect on monitoring in real-time systems.

In a broader sense there are two approaches to instrumentation, the static and the dynamic one. The static approach comprises all techniques applied at build time (compiling and linking). Dynamic techniques work on binaries on disk, on suspended programs, or even on fully running systems. The distinction between both approaches is sometimes blurred by dynamic recompilation and combined approaches (e. g., Linux Kernel Markers [Des08]). Several papers [BDS07b, CSL04, LCM+05] discuss static and dynamic instrumentation in their respective scope and may provide a good introduction.

Static instrumentation can be done simply by modifying source code and requires no runtime infrastructure. Due to its simplicity, nearly every programmer has used it in the form of the infamous printf debug statements. Static source-code instrumentation enjoys the full compiler-optimization potential (it is treated as regular source code) and is applicable in all otherwise problematic environments (such as interrupt-handler routines). Additionally, static source-code instrumentation typically has access to all local variables and intermediate states that might be inaccessible or even nonexistent in optimized binaries. Maintaining instrumentation code is simplified with static instrumentation, as the instrumentation code is directly interwoven with regular source code. Instrumentation overhead is also reduced with static instrumentation, as no additional runtime penalty (such as expensive trapping instructions) has to be paid.

Of course, this simplicity also has its drawbacks, most prominently in the form of inflexibility. Changed requirements in the form of new or modified events (contents or location) require a full build-and-run cycle, whereas dynamic instrumentation could just adapt a linked binary or a running system. Static-instrumentation probe-handler cost always has to be paid, whereas dynamic instrumentation can be activated selectively, that is, at a certain time or for certain system parts only. Dynamic instrumentation, however, might not have full access to all local variables and states. On the other hand, it also does not prevent compilers from completely optimizing them away.

2.1.1 Static instrumentation

In his PhD thesis Monitoring, Testing and Debugging of Distributed Real-Time Systems [Tha00], Thane discusses the problem of intrusiveness for systems with embedded software sensors. He argues to generally leave them in the system, even for the final deployment. After all testing has been done with the sensors in the system, removing them for the final deployment would be too big a change. Also, by making them part of the system early in the design process, there is no additional overhead involved in having runtime monitoring, so the intrusiveness is (defined to be) zero.

While this approach might work and even be required for embedded systems, applying it to a typical desktop machine is problematic. The software environment is much richer and more dynamic. It is very hard to predict all potentially required sensors beforehand. While single sensors have a negligible overhead, statically instrumenting all potentially interesting places in, for example, a Linux kernel would slow down a system considerably. Thane's approach is applicable for simple, hard–real-time parts of a system, where functionality and monitoring requirements are known beforehand and are static. There is a realistic chance that, for example, for instrumenting a microkernel, the type, number, and functionality of sensors can be determined statically with only rare revisions.

K42 [AAD+02] is a microkernel-based operating system, designed from the ground up with correctness debugging, performance debugging, and performance monitoring in mind — online and offline. Tracing code is statically anchored in components on all layers (kernel, libraries, servers, and applications), and adding, removing, or changing events requires a full build cycle. Event logging can be controlled only in larger groups (major event identifier), and there is only one central event buffer per processor in the system, used by both kernel code and user-space code.

The Linux Trace Toolkit (LTT) [YD00] includes a patch to the Linux kernel with statically defined tracepoints. Its successor, the Linux Trace Toolkit Next Generation (LTTng) [DD06], also uses a static instrumentation approach, as Desnoyers and Dagenais consider dynamic approaches to be too intrusive for their designated target systems (high-performance and embedded real-time) "as it implies using a costly breakpoint interrupt". This statement probably reflects CISC reality, in particular x86 with its variable-length instruction set architecture (ISA). For RISC architectures, there are better suited alternatives, as discussed in Section 2.1.3 below.

In [BDS07b], Bligh et al. describe the merging of LTTng and Google's Ktrace and propose to use static markers for most cases. The authors justify this as follows: static trace points are significantly faster than dynamic probes with Kprobes (slow INT3 mechanisms), static probe points can be used anywhere while dynamic probes are limited in scope, static probes give easy access to local variables or registers, trace points are easy to maintain, and static trace points have a practically negligible overhead if deactivated. Single trace points can be activated dynamically, which shifts this approach into the hybrid direction.

While the previous approaches work directly on the source-code level by manually inserting tracing code, tools provide more sophisticated support. Aspect-oriented programming (AOP) tools allow placing instrumentation support in source code [MSSP02], usually allowing fine-grained control over source-code manipulation. There are production-quality AOP tools for C++ and Java, but not for C.

Compilers, such as gcc, support tracing at the function level. Normal functions are wrapped by inserting calls to user-defined entry and exit functions. Instrumentation can be activated per compile unit. While with older versions of gcc (e. g., 3.4.6), -finstrument-functions prevented function inlining [GC304], current versions do not have this limitation [GC405]. However, there might still be a negative effect on code size, as an addressable copy of the inline function must be kept in the binary, which could have been optimized away otherwise.

The Kernel Function Trace project [KFT08] uses this gcc capability to instrument the Linux kernel. Once the kernel is instrumented, tracing can be controlled with triggers and filters. The tracing configuration can also be altered in the running Linux system.
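To make the gcc-based approach just described concrete, the following minimal user-space sketch uses -finstrument-functions; the two hook names are the ones gcc generates calls to, while the fprintf bodies merely stand in for real sensor code.

/* Build with:  gcc -finstrument-functions trace_demo.c -o trace_demo
 * gcc inserts calls to the two hooks below at every function entry and exit. */
#include <stdio.h>

/* The hooks themselves must not be instrumented, or they would recurse. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *func, void *call_site)
{
    /* A real sensor would post a compact event here instead of printing. */
    fprintf(stderr, "enter %p (called from %p)\n", func, call_site);
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *func, void *call_site)
{
    fprintf(stderr, "exit  %p (called from %p)\n", func, call_site);
}

static int work(int x)
{
    return x * x;       /* entry and exit of work() are traced automatically */
}

int main(void)
{
    printf("result: %d\n", work(7));
    return 0;
}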

2.1.2 Hybrid approaches

Linux Kernel Markers [Des08] are best described by citing Mathieu Desnoyers' email to the Linux kernel mailing list (from 2007-02-15, 17:30:37 -0500, subject: "Re: [PATCH 00/05] Linux Kernel Markers - kernel 2.6.20"):

I think the final agreement was the need for some kind of code marking system, which I tried to implement as best as I could. It gives very good performances while tracing (advantage of static tracing), has a very very minimal performance and binary size impact when disabled (advantage of dynamic tracing) and it can be activated dynamically (advantage of dynamic tracing).

Linux Kernel Markers are compiled in statically, but their only built-in functionality is to be configurable dynamically.

The Feather-Trace event tracing toolkit [BA07] targets a very similar area. Its markers can be activated and deactivated atomically and are multi-processor safe. They contain a call to a probe function and a relative jump over this call in the deactivated state.


The marker is activated by changing a single byte in the jump instruction to not skip the call. Feather-Trace supports Linux and FreeBSD and is relatively portable as only little kernel infrastructure is used. A somewhat similar marker technique is described in [Stö05].
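To illustrate the hybrid idea in isolation (this is a generic sketch, not the actual Linux Kernel Markers or Feather-Trace implementation), the following marker is compiled in statically but guarded by a flag that can be flipped at run time; the mechanisms described above go further and patch the jump or call instruction itself, so that a disabled marker costs no data access at all. All names are illustrative.

#include <stdio.h>

/* One enable flag per marker; a controller could flip it at run time.
 * 'volatile' keeps the compiler from caching the flag across checks. */
static volatile int marker_sched_switch_enabled;

static void probe_sched_switch(int prev, int next)
{
    /* Placeholder probe handler; a real one would post an event. */
    fprintf(stderr, "sched_switch: %d -> %d\n", prev, next);
}

/* Statically compiled-in marker: near-zero cost when disabled,
 * full access to local variables when enabled. */
#define MARK_SCHED_SWITCH(prev, next)               \
    do {                                            \
        if (marker_sched_switch_enabled)            \
            probe_sched_switch((prev), (next));     \
    } while (0)

static void schedule(int prev, int next)
{
    MARK_SCHED_SWITCH(prev, next);   /* marker site */
    /* ... actual scheduling work ... */
}

int main(void)
{
    schedule(1, 2);                     /* disabled: only a flag test */
    marker_sched_switch_enabled = 1;    /* activate the marker */
    schedule(2, 3);                     /* enabled: probe handler runs */
    return 0;
}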

2.1.3 Dynamic instrumentation

The Paradyn project [Uni08] hosts several dynamic instrumentation approaches, three of which shall be detailed in the following.

Dyninst [BH00] allows the instrumentation of running user-space programs. Dyninst's application programming interface (API) has two primary abstractions — points and snippets. Points define where to instrument (similar to join points in AOP) and snippets define what code to insert (the concrete aspects in AOP). The intended applications of Dyninst's dynamic technique are debugging, performance monitoring, and composition of applications out of packages. Snippet code is placed into trampoline pages and wrapped in glue code for preserving machine state. Jumps to these trampolines are patched into the actual programs. Dyninst supports x86, SPARC, Alpha, and PowerPC platforms.

With KernInst [TM99], Tamches et al. present a fine-grained instrumentation framework for Sun's Solaris. KernInst runs on an unmodified kernel. The key concept of KernInst is safe instruction splicing in hot code: Probing code can be inserted into the instruction stream by overwriting the target instruction with a branching or trapping instruction. Springboard heaps are used as trampoline locations if spliced instructions have a limited jump range. [TM99] proposes to use single-byte trapping instructions for safe splicing on variable-length ISAs, such as x86.

CrossWalk [MM03] essentially combines Dyninst and KernInst to allow kernel-boundary–crossing instrumentation. CrossWalk dynamically follows hotspots starting within user-space code and drills down into the kernel to find the bottlenecks. Mirgorodskiy et al. argue that trap-based approaches have too much overhead for performance monitoring. They demonstrate CrossWalk by means of a case study of the Squid proxy server on Solaris / SPARC1.

GILK [PKFH02] is an attempt to transfer KernInst techniques efficiently to x86 with its variable-length ISA. GILK allows runtime code splicing in a running Linux kernel. To determine safe splicing locations, GILK generates a control flow graph of the whole kernel. However, it remains unclear how indirect jumps, dynamic jumps, and self-modifying code are handled. Pearce et al. describe an interesting technique called local bouncing for cases where the normal 5-byte-long jump instructions cannot be used. 2-byte-long short-range jumps are used in that case to redirect control flow to locations where longer jumps can be used safely. GILK even splices in dummy instrumentation to make room for such trampoline points (this technique is similar to the springboards described in [TM99]).

DTrace [CSL04] is a tracing framework originally developed for Sun's Solaris. Meanwhile, it has also been ported to FreeBSD and Mac OS X. DTrace's kernel instrumentation is dynamic for the most part and, in that case, does not incur any probe effect if unused.

1 SPARC is a reduced-instruction-set-computer (RISC) architecture and therefore has a fixed instruction size. Dyninst and KernInst work without traps on these architectures.


Also, some static probes are used, with a practically unmeasurable overhead. User-space programs as well as (most of) the kernel can be traced; thereto, probe points must be declared by providers. Dynamic instrumentation uses trapping instructions for the x86 architecture and jump instructions on SPARC. User-space programs are always instrumented with traps, and probe code is then executed in the kernel. However, DTrace is not especially suited for real-time systems, as probe code is executed synchronously with interrupts turned off in the kernel. Furthermore, more than one probe function may be active per probe point.

ATOM is a binary rewriting tool based on OM [SW94, SW92]. It originates from the MIPS architecture but was ported to Alpha early on (parts were later also transferred to the x86 architecture). ATOM provides an infrastructure and plain API that makes writing tools simple. Typical application scenarios include basic block counting, profiling, dynamic memory recording, instruction and data cache simulation, pipeline simulation, evaluating branch prediction, and instruction scheduling [SE94]. One of ATOM's principles is to modify application address spaces as little as possible on instrumentation.

Vulcan [SEV01] is ATOM's successor, improved in several areas. Vulcan supports three architectures: x86, IA-64, and CIL (Common Intermediate Language, also known as Microsoft Intermediate Language, MSIL), but stores programs internally in an abstract representation. Tools may work on this abstract representation, but the mapping to the original byte representation is also kept. Finally, binaries are exported to a native format again (format conversion is possible). In addition to the static mode described above, Vulcan also supports dynamically instrumenting running programs. Remote instrumentation is also possible with the help of a proxy library.

Dynamic Probes (DProbes) [Moo01] were derived from Dynamic Trace on OS/2 and allow dynamic instrumentation of arbitrary locations within applications and the Linux kernel. DProbes aim at minimal interference with kernel infrastructure and use INT3 as a trapping instruction to mark probe points. That way, no overhead is incurred if probe points are inactive; otherwise, trapping costs have to be paid. Whenever a probe point is hit, the framework executes user-defined probe handlers. They have to be written in a specialized language and are executed by an interpreter.

Kernel Probes (kProbes) [Kri05] have similarities to DProbes but are targeted solely at dynamic instrumentation of the Linux kernel. They are dynamically inserted into a running kernel by loading a kernel module. Any location inside the kernel can be instrumented by replacing instructions with a trapping instruction. On x86 architectures, INT3 is used for this purpose. When such an instruction is hit, the kProbes framework performs all necessary tasks for event generation. It executes handlers, single-steps the original instruction, and finally returns control to the code that raised the exception. In addition to arbitrary kProbes, function-entry probes (jProbes) and function-return probes (kRetProbes) are available.

Direct Jump Probes (Djprobe) [HO07] do not use an expensive trapping instruction as kProbes do but directly insert jump instructions, which is possible in many situations. Djprobe verifies this on instrumentation and uses kProbes as a fallback solution.
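The kProbes usage described above can be sketched with a minimal kernel module in the style of the standard kProbes samples: it registers a probe on a kernel symbol and logs every hit from its pre-handler. The probed symbol name is only an example and differs between kernel versions.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kprobes.h>

/* Probe point: entry of an arbitrary kernel function (example symbol only). */
static struct kprobe kp = {
    .symbol_name = "do_sys_open",
};

/* Called by the kProbes framework just before the probed instruction runs. */
static int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
    printk(KERN_INFO "kprobe hit at %p\n", p->addr);
    return 0;
}

static int __init kprobe_demo_init(void)
{
    int ret;

    kp.pre_handler = handler_pre;
    ret = register_kprobe(&kp);
    if (ret < 0) {
        printk(KERN_ERR "register_kprobe failed: %d\n", ret);
        return ret;
    }
    printk(KERN_INFO "kprobe registered at %p\n", kp.addr);
    return 0;
}

static void __exit kprobe_demo_exit(void)
{
    unregister_kprobe(&kp);
    printk(KERN_INFO "kprobe unregistered\n");
}

module_init(kprobe_demo_init);
module_exit(kprobe_demo_exit);
MODULE_LICENSE("GPL");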
Systemtap [EPC+05] uses kProbes for placing event handlers in certain kernel locations. In contrast to DProbes' interpreted probe handlers, here a script language (resembling DTrace's D) is used to express handler functions. These handler functions are translated to C and compiled as kernel modules that are then loaded into the kernel and hooked onto events. [EPC+05] discusses how different security problems are tackled with varying approaches. However, Systemtap is only meant to be used by root, not by ordinary users.

Intel's Pin [LCM+05] takes a different route compared to the previously discussed approaches. Probe handler code is not spliced into running programs by overwriting instructions and creating trampoline code. Instead, basic blocks are completely recreated with dynamic recompilation. In the course of this, probe code can often be completely inlined into the original code. Dynamic recompilation can work very efficiently, also for variable-length ISAs, such as Intel's x86 or AMD's x86-64. Standard compiler optimization techniques (e. g., constant propagation, constant folding, register liveness analysis, . . . ) can be applied, which may result in more efficient instrumentation than with static approaches. Pin works for Linux user-space programs and attaches itself as debugger via the ptrace interface.

JIT Instrumentation [OMCB07] is in many aspects modeled after Pin, but works in the Linux kernel. In this environment, available memory is more constrained and some optimizations done by Pin cannot be applied (e. g., function cloning).

Valgrind [NS07] is a framework for heavy-weight dynamic instrumentation of Linux and AIX user-space programs. The paper distinguishes two approaches: disassemble-and-resynthesize (D&R) and copy-and-annotate (C&A). Valgrind employs full dynamic recompilation (D&R) and thereto parses the original binary representation for conversion into an intermediate representation (IR). Valgrind applies several optimization steps onto this IR and also passes it to plugins that may augment it with their instrumentation. The IR is finally resynthesized to a native representation for execution. Valgrind is much more than an instrumentation engine. It tracks a shadow value for each memory location and each register, which allows plugins to perform various tasks. [NS07] establishes nine requirements for dynamic binary instrumentation frameworks and evaluates related approaches with respect to these requirements. With its heavy-weight approach, Valgrind represents the opposite end of the overhead spectrum from what I target.

2.2 Transport techniques

In the previous section I gave an overview of related work that instruments target systems. Once this instrumentation is in place, event data accrues. In the following, I will describe how this data is managed and transported in related work. Solutions range from instant evaluation and processing of events (in the extreme case, no data needs to be transported or stored at all), through online aggregation and transformation, to full logging of all events on all processors.

The fundamental problem is similar to memory management for several independent clients. Some clients allocate new memory all the time (event producers) and some always have to free it (event consumers). These roles have to be synchronized. Furthermore, in real-time systems event producers must not block or depend on consumers.
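The constraints just stated (producers must never block on or wait for consumers) can be made concrete with a small single-producer, single-consumer ring buffer in shared memory. The sketch below is a simplified illustration of this general pattern, not the code of any of the systems discussed in this section: the producer overwrites the oldest entries when the consumer lags, and the consumer detects loss from the monotonically increasing head counter.

#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 1024                 /* number of event slots (power of two) */

struct event {
    uint64_t timestamp;
    uint32_t id;
    uint32_t payload;
};

/* Lives in a shared-memory segment mapped writable by the producer
 * and read-only by the monitor (consumer). */
struct ring {
    _Atomic uint64_t head;             /* total number of events ever posted */
    struct event slots[RING_SIZE];
};

/* Producer side: never blocks, never waits for the consumer; old events
 * are simply overwritten if the consumer cannot keep up. */
static void ring_post(struct ring *r, const struct event *ev)
{
    uint64_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    r->slots[h % RING_SIZE] = *ev;
    /* Publish the slot contents before making them visible via head. */
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
}

/* Consumer side: 'tail' is the consumer's private position; gaps between
 * tail and head indicate lost (overwritten) events.  Returns 1 on success. */
static int ring_poll(struct ring *r, uint64_t *tail, struct event *out)
{
    uint64_t h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (*tail == h)
        return 0;                      /* nothing new */
    if (h - *tail > RING_SIZE)
        *tail = h - RING_SIZE;         /* fell behind: skip overwritten events */
    *out = r->slots[*tail % RING_SIZE];
    /* A production-quality reader would re-check head here to detect that
     * the slot was overwritten while being copied (torn event). */
    (*tail)++;
    return 1;
}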


Depending on the memory-management approach, fragmentation is also a relevant criterion. Due to these specific requirements, traditional memory-management libraries, but also real-time–specific solutions such as [MRCR04], are typically not sufficient. Often, custom solutions are employed.

[AAD+02] provides a good motivation and a detailed discussion of K42's tracing infrastructure and design decisions. [WR03] provides more details on the implementation. K42 uses per-processor tracebuffers to be scalable, but there is only one such per-processor buffer for the whole system. The kernel, libraries, servers, and regular applications all have full access to this buffer if tracing is enabled for them. Therefore, K42 cannot guarantee separation between traced parties (this problem is also discussed in [DD06]). Buffer-access code uses lock-free atomic operations, which leads to good scalability. It may, however, be problematic for hard real-time systems, as retry loops have no upper bound (cf. [BA07]). Furthermore, even if all traced parties fully cooperate, data integrity in traces cannot be guaranteed. Event-producing programs that are temporarily suspended may resume writing to buffers much later, when these buffers have in fact been reused for other events due to buffer wrap-around. This may corrupt event traces. [AAD+02, WR03] argue that this problem only occurs in "[. . . ] the most extreme conditions [. . . ]" and that analysis tools should be written such as to be able to deal with incomplete or corrupted logs.

Relay [ZYW+03] (formerly known as relayfs) is the project that probably deals most explicitly with the transportation problem. Relay is a subsystem for the Linux kernel for highly efficient data transportation from kernel to user space. Relay knows different operation modes (locked vs. lock-less, per-processor buffers vs. global buffers, block or packet delivery). Buffers (called channels) are modeled after K42's tracebuffer, but relay supports several channels at the same time. In fact, channels are meant to be local to components or subsystems. That way, K42's separation problem is approached. Channels can be accessed independently in user space, allowing finer control over who sees which events. Relay still suffers from the same corruption problems as K42 in lock-less mode with temporarily suspended writers (see above; incomplete or corrupted buffers may be delivered to consumers).

Relay's channels are usually split into sub-buffers, which are the typical unit of delivery to consumers. Relay allows online buffer resizing to adapt to various event rates. It also supports variably sized events. Relay users can register for five types of callback events (buffer start, buffer end, event delivery, buffers full, buffer resize). These callbacks allow, for instance, defining a policy for buffer-full conditions (overwrite, drop, or suspend). This also implies two other things: First, there is information flow from consumers to producers, at least in the form of the consumers' buffer position. Second, only one consumer is supported per channel.

As single events do not span across sub-buffer borders, channel memory can be fragmented internally. To that end, sub-buffers are filled up with place-holder events. Relay supports a slow path (sub-buffer boundary crossing) and a fast path (no boundary crossing) for event creation. Consumers, however, have a continuous view on the event stream.
This implies active copying and realigning of event payload by relay on delivery to consumers.


Relay's interface for creating events is very lean; only a channel identifier and a pointer to event data with length information is used. Timestamping is also handled by relay on demand, that is, not every event needs to be timestamped. Relay supports both the efficient processor timestamp counter (TSC on x86) and the slow but globally consistent gettimeofday() timestamp source. Today, relay enjoys increasing usage in Linux-kernel–related projects [BDS07b, DD06, DTI08].

LTTng [DD06] uses relay for in-kernel tracing. User-space programs can either use a slow path (kernel entry for each event) or be traced with a user-space–only shared-memory approach (fast path in libltt-usertrace-fast). Therefore, for each thread a companion process is created that dumps events to disk.

In [Stö05], Stöß describes how to use event tracing for efficient communication with user-level multiprocessor schedulers. Instrumentation points can be activated dynamically, and per-processor shared-memory buffers are used for event transportation from both the kernel and user-space applications. The scheduling problem targeted in this work is very specific, hence the tracing approach is tailored to it. Only a short history of events is needed for scheduling decisions, therefore small, cyclic memory buffers are used. Lock-free synchronization is used on these memory buffers. Stöß defines the concept of accounting domains. This allows reducing the number of events created, as often only inter–accounting-domain scheduling events are required and not intra–accounting-domain events. Furthermore, accounting domains are assigned independent memory buffers for easy event routing and robustness (memory-buffer corruptions are restricted to single accounting domains). Buffers used within the kernel are exported writable to user-level schedulers. Consequently, corruption of these buffers may represent a safety threat to the whole system. This problem is addressed by additional check code in the kernel.

In Event Tracing for Windows (ETW) [Micb, Mica, Mye05, PB07], communication with clients uses a set of processor-local buffers that are handed to consumers once they are full. This mechanism reduces communication overhead, as not each single event is delivered but entire batches. However, there is no upper latency bound (e. g., if buffers are filled slowly). In [BDIM04], Barham and colleagues mention an approximate processing delay of one second for the online version of Magpie (cf. Section 2.3.2 on page 19). It remains unclear how this bound is determined and if it is sufficient in all cases. Additionally, ETW synchronously notifies event consumers with call-back functions that introduce additional causal relationships between the observed and the observing entity. Different sources state different numbers for ETW's overhead: [Mye05] estimates 1 500–2 000 cycles and [Nar05] about 1 000 cycles. For posting ETW events from user space, at least one kernel entry is involved. ETW has the concept of sessions to allow several consumers at a time. Sessions can be configured independently, but events required for more than one session are copied to several buffers. Events can be filtered based on severity and the sub-component from which they originate.


DTrace [CSL04] uses a double-buffering technique for the communication with event consumers, who are responsible for timely extraction of data. DTrace guarantees data integrity (unlike K42's tracing and LTTng), that is, all received data is guaranteed to be correct and potential data loss can be detected.

In [Mcd77], Mcdaniel describes METRIC: a kernel instrumentation system for distributed environments. The different roles in METRIC (accountant, analyst, probes in object systems) reside on distinct physical machines to be independent from each other and to influence each other as little as possible (only the actual probe code must be debugged very well to preserve system stability). Events are transported over Ethernet and are not acknowledged; there is no information flow back from analysts or accountants to probes. Single events are assumed to be of little value, hence transport over a lossy medium without protection is acceptable. However, event duplication and loss can be detected. Mcdaniel targets mostly statistical evaluation, therefore this restriction is acceptable for him. Although this work is quite old, its basic principles and rationales are still valid and desirable for event transportation today: separation of the roles and true one-way communication. Both help to minimize the probe effect: the separation by reducing the amount of code that is executed within the context of the observed part, and the one-way communication by preventing influence from parts executed outside of the observed party.

Active Harmony [HK99] uses an extensible metric interface to transport performance information system-wide. Applications publish their resource requirements and the system publishes its capacity and utilization information. The adaptation controller monitors this information online to adapt the system and applications via their tuning controls. Applications register tunable parameters with the system. They also periodically update a notion of how well they can work with given resources. The adaptation controller can use this feedback information for refining resource allocations.

Feather-Trace [BA07] uses a single wait-free memory buffer for all events in a system. The buffer is partitioned into equal-sized slots and supports arbitrarily many writers at once. Each slot can carry one event and is either empty, full, or busy (currently being filled). Events can be lost if the buffer runs full (no empty slots). In that case, an error counter is incremented. Feather-Trace only targets offline evaluation of traces, therefore only a single reader (event consumer) is supported, which is meant to dump events to a trace file. The buffer contains a read index that is actively incremented by the reader, that is, the reader needs write access to the buffer. The reader also marks read entries as empty again. This construction is somewhat similar to the list sensor in Ferret's predecessor RT_Mon [Poh08] and suffers from the same problems. Although the algorithm is wait-free, in that no party needs to block, this property is achieved by taking the error path in the code. A slow writer, a long-interrupted writer, or a terminated writer can block a single slot for long durations or permanently by leaving it in the busy state. The reader will only inspect the slot to which the reader index points. It will never skip this slot if it remains busy. That way, the whole buffer will quickly run full and all further events will be lost.
DProbes [Moo01] uses per-processor log buffers for staging trace records. The special probe-handler language has primitives for pushing data from registers or system memory into these log buffers. Complete trace records are handed to external infrastructure (the paper mentions LTT as an example).

2.3 Evaluation approaches

In the previous two sections I discussed instrumentation and event transport. This section addresses how the evaluation of acquired traces is done. There are many projects in this area, and they are typically multifaceted. I will start by introducing the concept of internal and external request specification, followed by a discussion of work on specification languages. After that, I discuss various individual projects.

2.3.1 Internal vs. external request specification

A core abstraction for monitoring is the request. Requests are user-defined units of work, specific to a scenario. Typical requests are: everything that belongs to a single client's HTTP request (potentially crossing machine boundaries on the server side for accessing a database), or everything related to displaying a single video frame (e. g., data retrieval from a filesystem, decoding, display). In general, there are two ways of defining requests. They can be determined internally in the instrumented program, or they can be created externally, decoupled from the running system, just by using the event trace. I will discuss both approaches in the following:

• Internal request definition is popular in the literature [CAK+04, PB07, GEGW02, FPK+07] (Activity ID, Request ID, . . . ). Typically, problem-specific instrumentation is used and application interfaces are modified to carry along a request identifier if there is no natural identifier to be used. This modification may be very intrusive, as it changes the application structure. Instrumentation, request definition, and evaluation are tightly coupled. [FPK+07] advocates transporting request IDs in-band for large distributed systems spanning several administration domains, because IDs may be passed through systems that do not have to be adapted. Typical use cases are IP options, TCP options, and HTTP headers. Also, the restriction to one concrete problem usually simplifies the whole instrumentation process.
  This coupling is disadvantageous, however, if any modification or reuse is attempted. Additionally, instrumentation and request definition have to be created at the same time (and probably by the same group of persons). This restricts the approach's applicability.

• External request definitions are proposed in [BDIM04, BIMN03, CSL04, FPK+07]. Instrumentation and request definition are decoupled. There is no notion of requests in the monitored system; instead, requests are constructed purely from the event trace. To be more precise, external request definition is a superset of internal definition. Every technique from the internal approach can also be used with external requests, but external processing can be applied additionally. Program-internal request identifiers are not enforced but can be used if they are available.
  The advantage of external requests lies in the decoupling of instrumentation and request definition. Instrumentation may be done by the component developers, typically at times when concrete request requirements are not known yet. Request definition can be done at any later time, potentially by somebody else. Instrumentation may be reused for different request definitions in completely independent projects. Also, the estimated impact of instrumentation is smaller, as existing interfaces and implementations usually need not be adapted to carry request identifiers across components.
  External request definition can cope with situations where it is not the first event in a new request that decides whether it is a new request at all. Imagine, for example, that a new network packet arrives. The first event is created at the interrupt, but request assignment can only be done much later, in a higher layer in the application stack, after interpreting the packet's payload.

In practice, both approaches tend to blur together. Often, desired request definitions roughly follow the system structure and may use natural identifiers2. Small adaptations in applications tend to ease request definition. External requests' advantages come into play when stitching together request parts that cross component or system boundaries. Also, sometimes events simply do not carry enough information for defining external requests. Then a feedback loop is required for refining the instrumentation based on concrete request requirements.
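The practical difference between the two styles is visible in the shape of the event records themselves; the following structures are a generic illustration, not the format of any particular system discussed here.

#include <stdint.h>

/* Internal request definition: the instrumented code tags every event with
 * an explicit request identifier that is passed along through all interfaces
 * (e.g., as an extra argument, an HTTP header, or an IP/TCP option). */
struct event_internal {
    uint64_t timestamp;
    uint32_t event_id;
    uint32_t request_id;    /* assigned when the request enters the system */
};

/* External request definition: events only carry locally available context;
 * an evaluator later stitches them into requests, for example by joining on
 * thread ID, file descriptor, or packet address and ordering by timestamp. */
struct event_external {
    uint64_t timestamp;
    uint32_t event_id;
    uint32_t thread_id;     /* natural identifiers, no global request ID */
    uint32_t resource_id;   /* e.g., socket, file, or frame number */
};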

2.3.2 Specification languages

In [PW93] Perl and Weihl introduce the PSpec language, intended for automated checking of performance assertions of complex systems. PSpec is not meant for proving performance properties, but for regression testing, performance debugging, and clarification of expectations. A PSpec specification typically contains several PSpec assertions that are checked against log files. The specification writer is typically a human assisted by a solver tool, which helps in finding constants by processing concrete measurement results. The evaluator tool allows interactive viewing of log files and assists in formulating expectations. The checker tool applies specifications to concrete log files and determines offline whether they are fulfilled. PSpec supports four concepts: intervals, metrics, events, and assertions. Intervals are the typical unit of interest and are the abstraction of several related events (bounded by start and stop events).3 Metrics are properties of intervals (e. g., elapsed time, throughput, and utilization). Single events also have attributes, such as timestamp, processor ID, and thread ID. Assertions typically express expectations about aggregated interval attributes (e. g., the average duration of read requests).

2 Natural here means: already in use in the system before instrumentation was added.
3 Intervals resemble requests as discussed in Section 2.3.1.


PSpec was designed to be very compact and readable, and therefore its expressiveness is limited. Intervals have to be continuous in PSpec; for example, computing the processor-time demand of a disk read request, which typically has a very long idle period between submitting the request and delivering data, is not easily possible with PSpec.
Pip [RKW+06] is an infrastructure for monitoring distributed systems. Actual behavior is compared to assumed behavior. A behavior specification language is a core component of Pip. It allows efficient expression of sequential execution, parallelism, alternative execution locations (called thread patterns), and asynchronous execution (called futures). The language also contains dedicated primitives for matching path fragments and performance characteristics. In contrast to a generic language, Pip's expectation language is confined to its domain. No other post-processing is possible, and expressing complicated expectations is laborious to impossible (variables are not supported).
The article contains several interesting restrictions (interesting because they reflect experience) of the common monitoring case. Pip defines the following metrics: real time, processor time, number of context switches, message size, and latency. Events are grouped relatively strictly in Pip. It knows tasks (like a profiled procedure call, with beginning and end), messages (any communication between hosts or threads), and notices (opaque strings). Notices are typically only used by manual instrumentation. Automatic methods use the other event types.
Pip contains tools for verifying specified expectations against trace files, for visualization of expectations, and for automatic generation of initial expectations from collected traces.
The article contains an interesting discussion on the topic of model checking, which does not only apply to Pip but to many monitoring systems. Model checking is an exhaustive approach: all possibilities are covered. Monitoring can only cover execution runs actually seen in a system. On the other hand, instrumenting a system for monitoring is a much simpler task than transforming it into a to-be-checked model. Monitoring can be used to check actual implementations, in contrast to some form of specification that is model checked. The transformation step between the model and the implementation is eliminated. I describe the application of this approach with Ferret in Section 5.1.2.2.
Instrumentation for Pip is done by source-code annotations. Explicit request IDs (called Path IDs) are used. Pip collects trace files locally and merges them. Evaluation is done offline.
In [Jah95] Jahanian uses real-time logic (RTL) to specify system properties. He distinguishes two types of events: task events (describing changes inside a task) and run-time system events (scheduling decisions, (de-)blocking of tasks, etc.). Instead of creating new event instances in tracebuffers, here each event has a history comprised of time–value pairs. Monitors must be able to access this history fast and with bounded blocking. A bound on event-history length can be derived from the to-be-monitored RTL formulae.
Jahanian furthermore distinguishes between synchronous and asynchronous monitoring. Synchronous monitoring is embedded into the executing task and can directly manipulate event histories. Asynchronous monitors are external and receive event histories with messages. External monitors are scheduled as normal real-time tasks in the

system and can be included into the admission process. However, unbounded priority inversion on access to event histories must be prevented. Timestamps must be obtained atomically with the occurrence of events, otherwise comparing timestamps of different events is not meaningful. Clock synchronization, resolution, and reading overhead are also discussed.
In Towards Security Monitoring Patterns [SKA07] Spanoudakis et al. present patterns for expressing the three security properties confidentiality, integrity, and availability in event calculus [Sha99]. The authors also describe a framework gathering the required events on the basis of a case study. These and similar patterns could also be checked online in a monitor consuming Ferret events.
Magpie [BDIM04,BIMN03] is a toolchain that can extract requests, based on observed events of one or several connected systems. Requests are typically the unit of interest for human interpretation or performance evaluation. What comprises a request is highly problem specific and can be defined in external schemata in Magpie. This property is a key difference to the work discussed above, where request definition is often strongly interwoven with the instrumentation code itself. Other event post-processing, such as resource-usage determination, is also specified in schemata in Magpie.
Originally, Magpie was conceived only as a client for the ETW framework. In the context of this thesis, I adapted Magpie to work with both Event Tracing for Singularity (ETS) and Ferret for finding, building, and post-processing requests. Schemata are implemented within a Python framework.
Magpie variants can process traces online and offline, but do not give real-time guarantees. However, online usage was shown to be feasible in a typical business scenario [BDIM04]. Post-processed requests are used to generate behavior models and canonical requests using machine-learning techniques [CO94].

2.3.3 Other projects

In [YLW+06] Yuan et al. describe a system for Automated Known Problem Diagnosis with Event Traces. The high-level idea is to have a database of symptoms of known problems (together with a solution, if available) and of good situations. Symptoms of a current bad situation are compared to the database to infer cause and solution. Symptoms are post-processed event traces, which need not necessarily reflect actual execution sequences but should be useful as a stable key into the database.
Yuan et al. discuss the problem of finding the right abstraction level for events. High levels are to be preferred as they usually convey more semantic information. Yuan et al. chose to trace system-call–level events, also to allow easy transfer of their approach to other systems. Usually, raw traces are very redundant and contain irrelevant details (temporary file names, machine names, handle numbers, actual ordering of thread execution, etc.). Therefore, the authors describe how traces are filtered and canonicalized, and how events are segmented, reordered, and sorted to form stable symptoms.
Aguilera et al. discuss Performance Debugging for Distributed Systems of Black Boxes in [AMW+03]. All components are treated as black boxes and only messages between


components are observed (message content is not inspected either). All communication is modeled as a graph whose nodes may also be independent machines. Aguilera et al. want to support both RPC-style and message-based communication. The goal is to identify causes of latency in distributed systems. Causal paths are a key abstraction for identifying high-impact call sequences and nodes that add unusual delay to requests. Traces are collected in the online phase of the approach, whereas causality is inferred statistically, by observing the timing of messages in the system, in an offline phase.
Aguilera et al. discuss different tracing approaches for different target systems. At the network level they use either dedicated trace-capture hardware or packet sniffing at each host. In J2EE systems, middleware instrumentation is used. The authors also briefly discuss kernel- and application-level instrumentation, and conclude with an extensive evaluation of the accuracy and overhead of their approach.
The article demonstrates that event traces at different levels may be required, depending on the granularity of the analysis to be done. Furthermore, even without much semantic information (black-box approach), useful results can be obtained.
In [CAK+04] paths also are a central abstraction. Chen et al. target large distributed systems and treat single components as black boxes. Again, paths track user requests through a system. Therefore, paths are explicitly tagged with a request ID that travels through the system. In contrast to, for example, Magpie, many software packages need to be modified one way or the other to generate or transport this ID. The main goals of the work are failure management and evolution. Chen et al. describe how performance and correctness deviations can be found via statistical significance tests. Structure deviations as well as latency deviations may indicate partial failure or overload. Regression testing against newer versions of the system is also discussed.
Transactions play a central role in Whodunit: Transactional Profiling for Multi-Tier Applications [CCZ07]. They closely resemble requests as discussed in Section 2.3.1. Chanda et al. discuss how transactions can be followed through different layers in distributed systems for different communication types in order to enable performance debugging. The communication types are: shared memory, events, stages, and inter-process communication (IPC). Events and stages are handled by instrumenting the respective handler libraries. The profiler is implemented as a library that is linked to the to-be-observed programs. It uses statistical profiling from the csprof package. IPC is handled by wrapping send and receive operations for messages. The current transaction state's synopsis (4 bytes) is piggybacked on application data in the send wrapper and extracted on the receiving side.
For shared-memory communication, the authors make the following restricting assumptions. Whodunit recognizes critical sections only if they use locks or mutexes for protection. Instructions inside critical sections are emulated using the processor model from the QEMU project [Bel05] (without MMU). Also, a producer–consumer relationship is assumed. Chanda et al. define transaction crosstalk by observing blocking times in exclusive locks held by other transactions. They also briefly compare their work to Magpie [BDIM04] and conclude that their approach does not require expert knowledge of application structure, as is needed for writing schemata.
This simplification, however, stems at least in part from their restrictive assumptions on application types.


In [GKM82] Graham et al. describe Gprof: A call graph execution profiler. The paper defines three different types of call graphs: complete, static, and dynamic ones, where the latter two are subsets of the first one. The dynamic call graphs, targeted by Gprof, can either be obtained by sampling or by instrumenting function entries and exits to emit events. Both approaches are used by Gprof. The paper furthermore describes how time samples are merged into graphs and discusses result-display problems. In Section 2.1.1, I described how gcc's function-instrumentation feature can be used to obtain similar effects. Combined with an event-transportation mechanism that is available across different system layers, I can generate dynamic call graphs in all those layers. I describe such an experiment for a device driver in Section 5.1.3.1.
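As a reminder, the following minimal sketch shows the two hooks that gcc calls at every function entry and exit when code is compiled with -finstrument-functions; only the hook signatures are given by gcc, whereas the emit_call_event() helper is a hypothetical stand-in for whatever event-transport mechanism is used.

    /* Hypothetical event emitter; a real implementation would post an event
     * into a tracing sensor or buffer. */
    static void __attribute__((no_instrument_function))
    emit_call_event(int enter, void *fn, void *call_site)
    {
        (void)enter; (void)fn; (void)call_site;
    }

    /* gcc inserts calls to these two functions around every function body
     * when compiling with -finstrument-functions. */
    void __attribute__((no_instrument_function))
    __cyg_profile_func_enter(void *this_fn, void *call_site)
    {
        emit_call_event(1, this_fn, call_site);
    }

    void __attribute__((no_instrument_function))
    __cyg_profile_func_exit(void *this_fn, void *call_site)
    {
        emit_call_event(0, this_fn, call_site);
    }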
Feedback scheduling as described in [SGG+99] is also a typical user of a monitoring system. Steere et al. propose symbiotic interfaces, which expose a metric of progress and the roles of participating resource users to the kernel. In producer–consumer problems, for example, the shared buffer could implement such an interface. Its fill level determines whether the producer or the consumer needs more processor time. I already discussed the Active Harmony project [HK99,cCH02] in Section 2.2 as a typical representative of this class.
A kernel profiler written with KernInst [TM99] does not expose single system events but knows the following basic sensor types: basic counters, cycle timers, accumulators, average-over-time accumulators, and virtual timers (virtual timers can be used to determine the processor-time demand of tasks in multitasking environments).
In contrast to approaches such as [BDS07b], and similar to KernInst discussed above, events in DTrace are filtered first, before being logged. Therefore, probes are written in a dedicated language called D and are executed in a specialized virtual machine in the kernel, guaranteeing safe execution. That way, every (authorized) user on a system can trace the system without affecting system stability. Cantrill et al. did not solve the halting problem for user-supplied trace code, but restricted D not to be Turing complete (no backward branches and no loops are supported). D provides powerful aggregation and predicates to filter out as much tracing data as possible early on.
I introduced METRIC's [Mcd77] transport layer in Section 2.2. Evaluation of data is a two-step process in METRIC. Accountants typically reside on a physical machine distinct from the object systems. They record, filter, or drop events delivered to them individually or in large packets. Apart from filtering, they only store traces to log files. Analysts may again be located on separate physical machines and process these log files. Analysts themselves are layered again. The lowest, kernel level traverses log files in sequential order and tracks sequencing problems. Above that, a utility layer provides language- and operating-system–specific functionality. The analyst's highest layer is the application layer, which is completely problem specific. For METRIC, single events are assumed to be of little value and can therefore be lost without big problems. This stems from the fact that mostly statistical evaluation is targeted.
In [Cag06,Cag07] Cagney introduces Frysk, a user-space, real-time (online), distributed debugger. Frysk is designed to be always enabled and is interesting because it combines information and events from various sources (ptrace, utrace, proc file system, . . . ).
Knüpfer et al. describe The Vampir Performance Analysis Tool-Set in [KBD+08], a


tracing solution aimed at high-performance computing (HPC). The two main parts are the open-source VampirTrace for actually obtaining the traces and the visualization and evaluation part that is covered by the commercial Vampir. The main focus of the tool set is to help with the optimization of performance and throughput of highly parallel applications. Therefore, their dynamic runtime behavior is investigated. Instrumentation is realized at source-code level using four complementing approaches: a compiler wrapper is used for function-level instrumentation (also cf. to Section 2.1.1 on page 9 and Section 5.1.3.1), source-to-source instrumentation with a preprocessor targets OpenMP parts, pre-instrumented libraries cover MPI messages, and manual instrumentation directly using the VampirTrace API covers the rest (cf. to Section 2.3.1). Timestamps are either assumed to be globally synchronized, or a timestamp exchange is done at the start and end of the tracing period, with linear interpolation in-between. There is additional support for tracing hardware performance counters, memory usage, and Posix IO.
The single events are first captured in memory and are later flushed to disk, independently for each process. In Section 5.3.3.1, I briefly compare the overhead for tracing single events with Ferret and VampirTrace.
The Vampir tool set aims at a very different problem space than Ferret: highly parallel high-performance applications. Consequently, acquired traces may be huge and their evaluation needs to scale well. The graphical user interface (GUI) of Vampir itself is built as a client-server application. Most evaluation computation is done where the data lies: in the high-performance computing cluster itself. This allows interactive evaluation with large traces, including different timeline views and statistical evaluations.

2.4 Miscellanea

This section deals with various smaller aspects not fitting any of the previous three larger categories. I start with discussing sampling approaches, followed by problems specific to distributed systems (Subsection 2.4.2). A discussion of stable application binary interfaces (ABIs) of events follows in Subsection 2.4.3, succeeded by considerations about event and timestamp sizes (Subsection 2.4.4). This section concludes with a literature survey on probe effect, intrusiveness, and monitoring overhead (Subsection 2.4.5).

2.4.1 Sampling techniques

Sampling techniques are related to runtime monitoring and are typically used for system-wide profiling, that is, the identification of hotspots and optimization candidates. Correct resource accounting or tracing of causality flows is, however, not easily possible as only statistical results are obtained. I describe two projects in the following.
The projects discussed in the previous sections often rely on instrumenting relevant system parts, such that important events can be traced. If, however, only statistical guarantees are required, sampling approaches can show their strength. OProfile [OPr08] for Linux and its predecessor described in [ABD+97] do not require that system parts are instrumented; instead, the system is inspected at certain intervals. Intervals can be regular or random, or even based on resources other than time, by programming

processor performance counters to trigger interrupts (e. g., after a certain number of cache misses). By adjusting the inspection interval, one can trade statistical accuracy against overall overhead.
Basu et al. try to answer the question Why Did My PC Suddenly Slow Down? in [BDS07a]. They periodically sample Windows performance counters for each process and build models based on historical data. Only sampling the performance counters instead of recording full event traces greatly reduces the amount of data the system has to handle additionally. The paper contains a case study of Windows client machines. Users are given a simple feedback channel: they can flag points of frustration with a single keystroke. In case of a frustration event, current processes are shown ranked by their anomaly (their current behavior compared to their model). This often allows identifying the anomaly-causing process(es).

2.4.2 Distributed systems

Distributed systems pose special problems to tracing approaches. Following traces across machine and possibly administration-domain boundaries is one aspect; not having a globally synchronized time base is another. Also, the sheer amount of data to be evaluated may be problematic. The following short literature survey points out solutions to these problems.
In [PT04] Price et al. describe how relocating an originally distributed setup onto a set of virtual machines hosted on one physical machine solved the time-base problem. While moving an installation into a virtual machine surely can be seen as intrusive, the described approach still elegantly solves the time-base problem for some application classes.
Fonseca et al. discuss distributed requests in [FPK+07]. The authors target highly distributed applications across several administration domains and layered requests (tasks). All task IDs are transported inline inside application data, and at protocol crossing points next and down operations must be implemented for connecting sub-requests.
In WebMon: A Performance Profiler for Web Transactions [GEGW02] Gschwind et al. do not start instrumentation in client browsers and try to dig into servers. Instead, they deliver instrumented web pages to clients, which themselves contact dedicated sensor web servers. This information is correlated with normal instrumentation on the server side to allow an overall view. I already discussed the black-box approach from [CAK+04] in Section 2.3.3 above.
Interpreting the Data: Parallel Analysis with Sawzall [PDGQ05] deals with distributed evaluation of collected data. For the Sawzall language to be able to analyze data in a massively parallel way, query operations need to be commutative and associative.

2.4.3 Stable event ABIs

The stability of application binary interfaces (ABIs) of events is relevant for widely deployed systems for much the same reasons as stable ABIs are desired for operating-system interfaces or libraries. If they change, binary compatibility with the installed software base is broken.


To hide implementation details of traced components (e. g., the memory layout of structures in the kernel), DTrace uses the concept of translators. Translators can provide information structured in an implementation-neutral way and can be used to provide a stable ABI across internal changes in traced components.
In ETW this problem is solved by providing a versioned ABI, that is, each event contains not only its type identifier but also a version number. If updated components change the information that events provide or its internal layout, event consumers can detect this and use updated layout descriptions for unpacking event content. For Ferret, I follow the ETW approach as it incurs minimal runtime costs. Of course, arbitrary translations can additionally be applied offline to stored trace files.
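The following minimal C sketch illustrates the versioned-ABI idea; the structure and function names are hypothetical and do not reproduce the actual ETW or Ferret layouts.

    #include <stdint.h>

    /* Hypothetical event header: consumers always read type and version first
     * and then select the matching payload layout. */
    struct event_header {
        uint16_t type;       /* event class */
        uint16_t version;    /* layout version of the class-specific payload */
        uint64_t timestamp;
    };

    struct open_event_v1 { struct event_header hdr; uint32_t handle; };
    struct open_event_v2 { struct event_header hdr; uint32_t handle; uint32_t flags; };

    static void consume_open_event(const struct event_header *hdr)
    {
        switch (hdr->version) {
        case 1: /* unpack as struct open_event_v1 */ break;
        case 2: /* unpack as struct open_event_v2 */ break;
        default: /* unknown newer layout: skip, or translate offline */ break;
        }
    }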

2.4.4 Event size and timestamp size

Different projects make different trade-off decisions about event sizes, about what should be stored in single events, and with what precision.
Relay supports variably sized events and does not require each event to be timestamped. Only the beginning of each sub-buffer must be timestamped. LTTng, which now uses relay for event transport, uses its own timestamps based on processor timestamp counters (TSC). It timestamps each event and has support for wraparound protection on architectures that only support 32-bit timestamps.
In ETW each event header contains a 16-byte-long globally unique identifier (GUID) that denotes its class. Each header additionally contains the event type, the version, and a timestamp.
In [BDS07b] Bligh et al. aim at a typical event size of only 8 bytes, where the first 4 bytes are used for type and timestamp information. The remainder is application-specific payload. The timestamp is still based on a 64-bit processor TSC, but its 10 lowest and its 27 highest bits are discarded. This leaves 5 bits for type information and 27 bits for timestamps. To deal with rollovers and frequency changes, events with full timestamps and current frequency information are logged regularly.
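The bit budget described for [BDS07b] can be illustrated with the following small helper; the exact field placement in the original work may differ, the sketch only demonstrates how 5 type bits and 27 compressed timestamp bits fit into the first 4 bytes of an 8-byte event.

    #include <stdint.h>

    /* Pack a small 8-byte event: 5 type bits and 27 timestamp bits (TSC with
     * the 10 lowest and 27 highest bits discarded) in the low 4 bytes,
     * application-specific payload in the high 4 bytes. */
    static uint64_t pack_small_event(uint64_t tsc, uint32_t type, uint32_t payload)
    {
        uint32_t ts27 = (uint32_t)((tsc >> 10) & 0x07FFFFFFu); /* 27 timestamp bits */
        uint32_t head = ((type & 0x1Fu) << 27) | ts27;         /* 5 type bits on top */
        return ((uint64_t)payload << 32) | head;
    }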

2.4.5 Probe effect, intrusiveness, and monitoring overhead

A central problem for monitoring is that the target system is changed by attempting to measure its properties. In the literature this effect is called probe effect [Gai85,Sch91,Sch93] or Heisenberg uncertainty principle applied to software [MH89,LDSP85].
Schütz names four ways to approach this problem in [Sch91]: (a) ignoring it, (b) minimizing it by “sufficiently” efficient monitoring operations, (c) avoiding it ((c1) by hiding behind logical time [Lam78] or (c2) by leaving the monitoring infrastructure in production systems, that is, by making it part of the system [Tha00]), or (d) using dedicated hardware. He concludes that the first two approaches (a) and (b) are not suitable for real-time problems. Avoidance by using logical time (c1) only works for a restricted subset of problems where only the order of events matters but not their real-time distances. Dedicated hardware (d) is expensive and inflexible; I will therefore not pursue this approach in this work. The remaining approach is integrating the monitoring infrastructure with the real-time system also in productive operation (c2). For this work, I will additionally pay

special attention to the minimizing approach, as parts of my envisioned target system are non–real-time systems with performance requirements.
In [Sch91] Schütz furthermore distinguishes between intrusive [MLCS89] and interfering [TFCB90] monitoring.
[MLCS89] defines five conditions for intrusiveness: incorrect results, creation (or masking) of deadlocks by event-order changes, deadline misses for real-time programs, drastic execution-time increase, and the impossibility of debugging parallel programs. Intrusiveness is a gradual property whose acceptability depends on the application actually monitored and on the desired monitoring results; it is present if any of the above conditions is intolerable. Marinescu et al. propose to decrease the intrusion level by adding hardware support. They present the event-action paradigm upon which they formally define normal active processes and reactive processes (monitors). Based on this definition, systems can be partitioned into a target system 풯 and a monitoring system ℳ. Intrusiveness is based on the actions the monitoring system performs and on information flow. Three different communication types between the target system and the monitoring system are identified: unidirectional, unidirectional with acknowledgments (where the monitoring system may delay the target system), and truly bidirectional communication. Interestingly, Marinescu et al. state that “If the information flow between 풯 and ℳ is truly bidirectional, intrusive actions can occur.” Obviously, pure delaying, as seen with unidirectional communication with acknowledgments, may also change a target system’s behavior. The authors also present layered architecture models, one for an intrusive setup using software and another for a nonintrusive hardware setup.
In [TFCB90] Tsai et al. describe A Noninterference Monitoring and Replay Mechanism for Real-Time Software Testing and Debugging. The authors follow an extremely strict interpretation of noninterference, that is, the execution of the target system shall not be modified at all (not by a single processor cycle). They therefore describe a hardware solution (based on an MC68000 processor) to capture the target system’s execution trace. This trace is then evaluated offline.
In [WSG96] Wu et al. discuss On-line Avoidance of the Intrusive Effects of Monitoring on Runtime Scheduling Decisions. The key technique is modifying scheduler decisions to compensate for delays introduced by monitoring. Time as a resource is virtualized with a set of modules that wrap access to local time, the scheduler, and message handling. Other resource types (input-output devices, caches, etc.) or real time are not handled.
Gupta et al. target the special problem of message-pool ordering in Dynamic Techniques for Minimizing the Intrusive Effect of Monitoring Actions [GS95]. The authors assume a distributed system with message communication. Monitoring intrusion then manifests itself by altering the message order in receive queues compared to undisturbed systems. Gupta et al. use a globally synchronized clock and delay specific processes in the system to stabilize execution order.

2.5 Terminology

The diverse literature reviewed in this chapter uses an inconsistent terminology. In the following, I will introduce some key terms used throughout the remainder of this thesis

and will relate these terms to concepts from the literature, with a focus on [Bir08,Poh08,Rie05,MLCS89,Micb,CSL04,Mcd77,TIoEEE]. In a typical monitoring setup there are three roles in the system: the monitored role, the monitoring role, and a controlling role, which steers many monitoring aspects.

• The entity to be observed fulfills the monitored role. I will call this entity monitored process. It is typically instrumented to produce events and emit them through sensors. In the POSIX Trace standard 1003.1q (PTS) [TIoEEE] the term traced process is used, in [MLCS89] the entity is called target system, and in the Event Tracing for Windows context it is called event provider.

• The monitoring role is held by an observing entity that typically consumes the above-mentioned events. In the context of this work it is called monitor or event consumer. In PTS it is called analyser process, in ETW consumer. [Mcd77] actually splits this role into the accountant and the analyst.

• The controller role is the most diverse role in that its exact task varies much in the literature. For Ferret, the sensor directory fulfills this role. Its purpose is to provide a namespace for sensors and to determine sensors’ lifetimes. ETW uses the name controller. In PTS, this role is called trace controller process. It sets up streams (which roughly correspond to Ferret’s sensors), manages their life cycle, and installs filters.

Events are used almost universally in the literature as an indication of a state transition. They are typically assumed to occur instantaneously and have properties associated with them. Events need timestamps in one form or another, either directly, by making a timestamp part of the event, or implicitly, by deriving a (logical) timestamp from the event order in a buffer (e. g., relay only requires timestamps for the first events in a sub-buffer). Events often have further properties, such as identifiers for determining their type and origin (e. g., provider, type, and version number in ETW) and application-specific data. Therefore, event size requirements are often not constant.
Events need to be stored and transported to a consumer. Ferret uses shared-memory regions called sensors for that purpose. Relay calls them channels. Fiasco (the microkernel upon which I implemented Ferret) has a built-in tracebuffer [Wei03] that can be accessed as a special sensor with Ferret. The term tracebuffer is also used for Linux [Bir08]. ETW knows the session concept. It describes the currently activated event sources for a consumer. A persistent event stream, for example one stored on disk, is often called trace file, trace log, or event log.
Events are often created by monitoring code inserted at instrumentation points in the monitored processes. SystemTap calls these points probe points, [BDS07b,AAD+02] call them trace points. The Linux Kernel Marker project creates static markers meant as instrumentation locations. [BH00] calls them points, and in AOP they are referred to as join points.


2.6 Summary

After surveying the extensive related work with its individual highlights and shortcomings, I conclude that there is no solution that solves all relevant problems of runtime monitoring for open real-time systems, that is, systems that meld varying requirements (including real-time requirements) in a single system.
Of course, the approach must support the real-time case. In particular, this requirement implies that instrumentation, event transport, and evaluation must be real-time capable. A guiding principle should therefore be to prioritize the unhindered execution of the monitored system over monitoring itself.

• For the instrumentation to be real-time capable, two things must hold: First, the instrumentation mechanism itself must be real-time capable (upper bounds for its execution must be determinable). Static instrumentation and static markers lend themselves to this task due to their simplicity. Also, the trapping approach may be applicable if the trapping path through the handling kernel is predictable. Static or dynamic recompilation is problematic. Dynamic recompilation introduces an unknown dynamic overhead. Both approaches modify the binary in opaque ways that may or may not retain execution-time characteristics. Further research is required in this area. Second, to avoid the probe effect for hard–real-time systems, monitoring must be made part of the system and must also be included in the production version [Tha00]. Static instrumentation lends itself to these system parts. Other parts of the system can use arbitrary other, more appropriate instrumentation techniques (e. g., kProbes for the Linux kernel, cf. to Section 5.1.2.4). The actual probe code itself must, of course, also be real-time capable. For DTrace, for example, the language enforces eventual termination of probe functions (e. g., no loops). However, several probe functions are executed for a probe point with interrupts disabled inside a dedicated virtual machine in the kernel, which may increase the system's interrupt latency.

• Event transportation must also offer predictable execution and prevent blocking to be real-time capable. Additionally, events must become visible to consumers in bounded time (e. g., in contrast to ETW's block-delivery approach, where slow producers may delay event delivery arbitrarily). Transportation also should not use system mechanisms that influence scheduling (e. g., IPC). Relay, for example, has a slow path and a fast path for event creation and potentially invokes callback functions, making it unpredictable. A good example is LTTng's user-space–only event-transportation mechanism, which uses shared memory without any blocking synchronization.

• Real-time evaluation basically requires bounded access time from event delivery and some form of real-time scheduling, depending on the application-specific evaluation. Offline evaluation basically only requires “enough” memory to temporarily store events (until they are dumped to disk, over network, etc.) or a sufficiently fast transport channel to persistent storage.


For today’s desktop real-time systems a split system architecture is popular [HBB+98,YB97,MDP00,J. 00]. The time-sharing part runs side by side with the real-time part on top of a microkernel or similar mechanisms and is also a monitoring target. For these system parts, the monitoring overhead matters, and therefore an efficient monitoring solution is important. In contrast to approaches such as DTrace or ETW, where a majority of instrumentation points reside in-kernel, here the overwhelming majority is located outside of the (micro-)kernel. DTrace and ETW, for example, support user-space code with traps or explicit kernel entries only. This design is too expensive for the scenario described above. K42 allows direct memory access to the system-wide sensor without kernel entries. LTTng also supports tracing in user space without kernel entries.
I believe that data integrity for monitoring data is of utmost importance, that is, if not all events can be captured, this must be detectable. Also, data corruption must at least be constrained to the domains using single sensors. K42 compromises both properties for performance. DTrace, on the other hand, provides mechanisms for detecting lost data due to buffer overflow. It also guarantees the correctness of all received data.
Safety (i. e., system stability) for shared-memory–only systems is addressed by [Stö05]. Basically, all writable data must not be trusted and code must check bounds. Security (i. e., confidentiality) requires that no additional information-flow channels are established by monitoring beyond those explicitly required for monitoring (i. e., strictly from monitored processes to monitors). In K42, all tracing-enabled processes have full access to one global sensor. This basically connects everybody and is the opposite of what is required. Relay primarily addresses various producers in the Linux kernel, which are not separated from each other by address spaces anyway. However, on the consumer side, each channel is represented by a single file such that access for user-space consumers can be controlled with fine granularity. [Stö05] has the concept of accounting domains, which use independent memory regions.
Restricting information flow not only enhances security but also reduces the probe effect by design, as monitored processes cannot be influenced by monitors. It is, however, immediately clear that a software-only solution (which is preferable for cost and flexibility reasons) will always have a nonzero probe effect (unless monitoring is defined to be part of the system). The best one can achieve is reducing the effect. In ETW, for example, consumers are notified synchronously when event buffers are full, which directly couples event production and potential scheduling of consumers. Such a coupling is to be avoided.
I did not design Ferret with only a few use cases in mind; I wanted it to be flexible. More specifically, several monitors at the same time, potentially attached to the same sensors, shall be supported (cf. to relay and METRIC). Ferret shall be event-layout–agnostic, that is, its requirements on the event layout shall be minimal. Relay’s design points in a similar direction, as event creation in relay enforces only minimal structure. The architecture in [Stö05], on the other hand, is designed with a very specific type of consumer in mind, a user-space scheduler. For high–event-count setups, Ferret shall also support specialized sensors that aggregate data online (cf. to DTrace).
Dedicated specialized hardware for monitoring is inflexible and expensive; I want Ferret to be a software-only approach.


The environment that I target Ferret at is diverse and comprises different software layers (microkernel, basic resource servers, convenience environment, para-virtualized time-sharing kernel, time-sharing user-space programs). Ferret shall be usable similarly in all those layers; therefore, minimal API requirements and simplicity are a must. LTTng, for example, provides two distinct approaches for kernel and user-space tracing. Recent work [DD08] shows that LTTng also aims in this direction (Xen hypervisor, markers for user space). Minimality in the kernel is also a guiding principle of microkernel construction, which I try to follow with Ferret’s design.
In the next chapter I describe how I approach the requirements and problems discussed above.


Chapter 3 Design

Everything should be made as simple as possible, but not simpler. (A. Einstein)

In this chapter, I introduce the systems that I target with my monitoring approach and derive requirements that a monitoring system shall fulfill. I will broaden the discussion of requirements when necessary in the next sections. In Section 3.2, I present the architecture of my monitoring system and discuss design decisions.

3.1 Target systems and requirements

A description of the target system types for my work starts this section. Following that, in Section 3.1.2, I explain properties that Ferret shall possess, and requirements to underlying layers (operating system, hardware).

3.1.1 Target systems

In the following, I describe the intended target system, starting with high-level system properties and followed by a more detailed discussion.
There has been a trend in the last years to merge classical desktop operating systems and embedded operating systems. Desktop operating systems, on the one hand, are increasingly used in classical embedded and real-time domains (for example, in factory automation and in hand-held devices or mobile phones), and are enhanced to cope with the corresponding requirements. On the other hand, embedded systems, for example mobile devices, provide more and more desktop-like functionality [GS06,CRSBPC06,AAP06,XlRsT+06]. Consequently, many contradictory requirements have to be combined in one system.
In the following I describe a two-split system architecture approach that, I think, is common for such systems. The description details concentrate on the Dresden Real-Time Operating System (DROPS) that I used for my research, but other systems (RT-Linux, RTAI, J-Consortium Real-Time Core Extension (RTCore)) are similarly structured. DROPS is split into two parts [HBB+98,HZP+07a,APPZ04,GPPZ04b]:

• One legacy container that basically serves reuse purposes for software parts without special requirements (i. e., real-time or security requirements), and


• One small and specially constructed part, which runs important software components. Depending on the scenario, these might have specific real-time requirements or particular security or safety requirements.

Target systems summary I will now describe a typical computer system that I target Ferret at: I assume a desktop-class system that has memory protection and that may be used in real-time contexts. Ferret’s approach is not limited to such a system, but my implementation currently is. As described previously, real-time components are run in a dedicated system part but Ferret shall also be usable in pure real-time systems, that is, systems with no legacy parts. I furthermore presume an open system, that is, a system that is not fully under tight control of one software vendor and that is running software from independent (potentially user-defined) sources. Consequently, software components must be isolated from each other and monitoring must not interfere with the isolation. Monitoring in closed systems is more or less trivial as every part in the system can trust every other part. Currently, Ferret’s research prototype implementation is limited to x86 systems running Drops or Linux.

3.1.2 Properties and requirements

In the following, I will discuss properties that I want Ferret to possess. These properties are derived from three sources: first, the target-system discussion in the previous section; second, general requirements identified in the literature [AAD+02,EPC+05,BDS07b,ALR01] (also cf. to Chapter 2); and third, the requirements derived from my own practical experiments, some of which I discuss in the evaluation chapter in Section 5.1.
The target systems described in Section 3.1.1 typically consist of many heterogeneous software components, including virtual machines and the normal time-sharing software inside them. Also, I want to support scenarios where real-time and non–real-time components interact (e. g., as in Drops [HBB+98]). Therefore, monitoring all types of components and their interactions must be supported.
With Ferret, I aim at a software-only solution, that is, I use no additional hardware devices that snoop memory busses or processor cycles and no unusual high-precision timers, because I want Ferret to be easily usable on standard machines.

3.1.2.1 High-level requirements

I start with a high-level discussion and will refine the properties afterwards. Following that, I derive requirements to the hardware and the operating system.

• Ferret instrumentation and monitoring shall be minimally invasive, that is, it must not interfere with the real-time properties of real-time parts, it must not have a severe influence on the performance of non–real-time software, and system correctness must be preserved. More specifically:
– I want Ferret to provide monitoring of real-time parts while maintaining their real-time properties, that is, Ferret must never block on unbounded external conditions and its execution time must be bounded.


– Ferret shall also be usable for the non–real-time parts, potentially in high-throughput locations. Here, good best-effort performance is required.
– Ferret’s monitoring code must be robust in order not to introduce new bugs and monitoring itself must not interfere with observed causality in the system (correctness).

• The open-system nature requires several properties:
– First, isolation between monitoring components. Monitors shall not influence monitored system parts in undefined ways, and the monitoring infrastructure must not introduce new covert channels. If information flow is strictly limited to originate in monitored code and to target monitors, influencing monitored parts by monitoring parts is precluded by design.
– I cannot make assumptions about the structure of code to be instrumented in an open system; for example, I cannot exclude the usage of user-level threading, signals, and other asynchronous interruptions. Consequently, sensor code must be unboundedly reentrant.
– Naturally, several independent online evaluations must be supported by Ferret, as independent system parts may want to use monitoring.

3.1.2.2 Low-level requirements

After outlining the property categories above and sketching a high-level idea for each point, I provide a refined discussion in the following. I use the same hierarchy as in the foregoing text:

• Minimally invasive:
– Real-time properties:
∗ Event creation must be possible in real time; there must be an upper bound for it. This property is required for maintaining potential real-time properties of monitored system parts.
∗ I regard real-time properties of event consumers as secondary, compared to those of producers, as consumers do not belong to the observed system itself. Maintaining real-time properties is more important than being able to observe the system in real time in critical situations. Of course, depending on the actual scenario, additional and more stringent requirements may arise.
– Best-effort performance:
∗ Event creation, transport, and delivery should work very efficiently in terms of processor time and memory requirements. Monitoring code will potentially be used in many hot code paths; consequently, a high-performance implementation is considered important.


∗ This does not only apply to the monitoring code, but also to the event representation in memory: a compact binary representation is desirable.
∗ Furthermore, particular experiments may require customized sensors (e. g., histograms or counters) to further reduce the overhead.
– Correctness:
∗ Using monitoring in a system should not create new causal relationships in the system, as would the usage of system messages or the synchronous notification of monitors when events become available. It is clear that a pure software approach cannot be perfect in avoiding all causal relations between monitored and monitoring system parts, as common resources such as system memory, bus bandwidth, and processor cycles are used. My goal here is to reduce the causal interference to the level of an independent and small background service, basically background noise. I use two construction methods to achieve this: First, I try to minimize non-explicit communication in the form of resource sharing, by only using very low-level system resources, such as processor cycles and memory, but not typical higher-level operating-system resources, such as file systems, semaphores, or device-driver services, in Ferret. Second, I try to reduce explicit communication between monitored and monitoring system parts. For example, events are not consumed but are just evaluated by monitors; there is no channel for a monitor to signal event consumption to a monitored component. Event memory is finally reused by the monitored parts on buffer wraparound. Monitors detect new events by polling some time after they become available, that is, time triggered. I define these types of communication relations to only transport weak causality. In contrast to that, communication relations visible in the operating system, for example direct IPC messages or threads blocking on a shared lock, are regarded as strong causal relations in the context of this work.
∗ A correct and robust implementation of the monitoring code is important not only for ensuring the stability of the monitored system, but also for ensuring confidence in the obtained results. Furthermore, it simplifies post-processing, as plausibility checks can be kept simple. (In contrast, for K42, guaranteeing correct traces was not deemed very important, cf. to Section 2.2 on page 13.)
– The infrastructure used for monitoring should be as compact as possible, to allow monitoring all system parts and layers (kernel, libraries, system servers, and applications) as simply and consistently as possible. In particular, this means that no API of system parts that shall be observed can be used.


– Having few and small dependencies does not only allow widespread use but also early use after system start, when not all system services are yet available.
• Open-system nature:
– Isolation:
∗ Event consumers must not be able to influence event producers in an uncontrolled way. This can be achieved by allowing only read access to sensors for event consumers.
∗ Event consumers must not trust data from event producers and must be protected against corrupt metadata in sensors.
∗ Event producers must not be able to influence other event producers in an uncontrolled way. This can be achieved by using independent sensors.
∗ Event consumers must also be protected from other event consumers. Again, only allowing read access to sensors achieves this goal.
∗ Consequently, multiple independent sensors must be supported by Ferret (while maintaining global event order). In addition to above-mentioned safety and security reasons, this feature also eases the usage of Ferret by independent software components.
– Unboundedly reentrant event-creation paths and independent online evaluations, that is, several independent event producers and event consumers must be supported by Ferret:
∗ On the producer side, tasks can be multithreaded or use other means of concurrency.
∗ Sensors may also be used by several independent components in the system for event production.
∗ On the consumer side, several independent evaluations may access a sensor concurrently.

• Finally, Ferret must also provide useful results:
– Event order must be determinable for events from independent sensors. For cross-processor ordering of events, Ferret relies on timestamp synchronization between processors (cf. to Section 3.2.3.2 for details on timestamp sources).
– Timestamp accuracy and resolution must allow measuring durations.
– Ferret must be able to transport a sensible amount of payload to support various use cases. Purely using timestamps, events must be sortable within a single time-source domain (a sketch of such a timestamp-based merge follows after this list).
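As an illustration of this last requirement, the following sketch merges two per-sensor event streams, each already ordered by timestamp, into one globally ordered stream; the event type used here is a hypothetical placeholder, not Ferret's actual layout.

    #include <stddef.h>
    #include <stdint.h>

    struct simple_event { uint64_t timestamp; /* ... payload ... */ };

    /* Merge two timestamp-sorted event arrays into one sorted output array. */
    static size_t merge_by_timestamp(const struct simple_event *a, size_t na,
                                     const struct simple_event *b, size_t nb,
                                     struct simple_event *out)
    {
        size_t i = 0, j = 0, k = 0;
        while (i < na && j < nb)
            out[k++] = (a[i].timestamp <= b[j].timestamp) ? a[i++] : b[j++];
        while (i < na) out[k++] = a[i++];
        while (j < nb) out[k++] = b[j++];
        return k;
    }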

3.1.2.3 Requirements to hardware and operating system

In the following, I will briefly summarize the properties that an operating system and the underlying hardware have to possess to support Ferret:


• I require the hardware and the operating system to support shared memory between processes to establish sensors. I discuss the role of shared memory in more detail in Section 3.2.1.1 on page 39.

• Either the operating system or the hardware must support a form of efficient atomic sections to allow timely publishing of timestamped events in sensors. I provide a detailed discussion about the problem in Section 3.2.4.2.

• Either the operating system or the hardware needs to provide low-overhead, accurate timestamps. A discussion of alternatives is given in Section 3.2.3.2; a minimal sketch of one such timestamp source follows after this list.

• Storage space for offline evaluation of traces must be available, either in memory, on disk, or on a remote machine using a network.

• For online evaluation, monitors must be scheduled on system resources. If monitoring shall possess real-time capabilities, monitors’ execution must be planned to be part of the system schedule and the system must provide real-time support to monitors.
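For illustration, a minimal timestamp source on x86 could read the processor's timestamp counter as sketched below; whether the TSC is actually suitable (frequency changes, cross-processor synchronization) is discussed in Section 3.2.3.2.

    #include <stdint.h>

    /* Read the x86 timestamp counter (TSC) with gcc inline assembly. */
    static inline uint64_t read_tsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }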

3.2 Design decisions and architecture

After having laid the grounds for the design of Ferret in the preceding sections, I introduce its design and architecture in the following. To this end, this section contains three parts. I first describe the different roles of a monitored system and their interaction in the succeeding section. Following that, in Section 3.2.2, I discuss online and offline evaluation of monitoring data. I conclude this section with a detailed discussion of sensors, their types, timestamps, and different synchronization approaches in Sections 3.2.3 and 3.2.4.
While designing Ferret, I tried to reconcile the general requirements for runtime monitoring systems mentioned in Section 3.1.2 with the specific requirements derived from the real-time nature of the target system. If goals conflict, I prioritize the real-time requirements of the system to be monitored.

3.2.1 Roles and their interaction: Ferret’s architecture

Based on the related work discussed in the previous chapter and the requirements discussed in Section 3.1.2, I identified three roles for a monitoring system: monitored processes, the sensor directory, and monitors. I characterized their relation and naming with respect to similar roles in the literature in Section 2.5. In the following, I define each role’s tasks and behavior in more detail. A general overview of the interaction of the roles is shown in Figure 3.1.

Monitored processes With Ferret, the creation of sensors is triggered either by a monitored process or by a third party on behalf of the monitored process. In each case, the sensor itself is set up and configured in a shared-memory region by the sensor directory, which then maps the region to the monitored process.



Figure 3.1: Architecture overview of Ferret: the sensor directory manages single sensors; shown in the figure are a ringbuffer and a histogram sensor. Embedded Ferret code in programs and libraries registers sensors; monitors acquire access to sensors, process events or condensed information from them, and store, transfer, or display their computation results.

The actual tracing of events happens through memory accesses into the sensor area from within the monitored process. For this purpose, the monitored process uses small event-creation functions provided by Ferret. Ferret comes with a set of common, predefined event types. Customized types can easily be defined by the user as well.
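The following sketch illustrates the producer side; the function and type names are hypothetical placeholders and do not reproduce Ferret's actual API.

    #include <stdint.h>

    /* Hypothetical monitoring-library interface (illustrative only). */
    extern void    *sensor_attach(const char *name, int instance);
    extern void     sensor_post(void *sensor, const void *event, unsigned size);
    extern uint64_t sensor_timestamp(void);

    struct frame_event {
        uint64_t timestamp;
        uint16_t type;        /* application-defined event type */
        uint16_t pad;
        uint32_t frame_no;
    };

    static void *video_sensor;   /* attached once, e. g., at program start */

    void decoder_frame_start(uint32_t frame_no)
    {
        struct frame_event e = {
            .timestamp = sensor_timestamp(),
            .type      = 1,           /* FRAME_START */
            .pad       = 0,
            .frame_no  = frame_no,
        };
        sensor_post(video_sensor, &e, sizeof(e));
    }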

Sensor directory The Ferret sensor directory is responsible for creating and initializing new sensors, granting read-write access to secondary producers and read-only access to consumers, managing the sensor namespace, and handling instances. Ferret supports several instances of a monitored component running side by side independently, so, besides the sensor identifier, I use an additional instance identifier to address sensors. Closing sensors and starting or stopping the whole framework is also handled by the sensor directory.
After setting up a sensor and granting access to producers and consumers, the sensor directory is not involved in the event flow at all. Filtering of relevant events can be done either in the producer, by just not posting certain events, or in the consumer, by ignoring certain events. At a more coarse-grained level, filtering can also be realized by using specific sensors. Ferret allows for very fine-grained sensor usage, also across processes, such that events might be grouped thematically rather than by source.
The sensor directory is the natural place for implementing an access-control policy at sensor granularity. The current research prototype demonstrates this with very simple checks that allow only the creators of sensors to finally close them.

Monitors Monitors use the sensor directory as a directory service to find required sensors either via symbolic names (e. g., /ferret/l4linux-kernel/syscalls) or via well-known numeric constants. After looking up sensors, monitors register themselves as consumers. The sensor directory gives them read-only access to the


These two mechanisms (the read-only access and the indirection in the form of the sensor directory) prevent information flow back from the monitors to the monitored processes. After acquiring access to all required sensors, monitors can periodically check them for new events. Evaluation can happen either online or offline by storing event streams for later use. I further discuss this aspect in Section 3.2.2.

Monitored processes only communicate with the sensor directory to create or destroy sensors, never to post events. They never communicate directly with monitors (except by posting events to sensors, of course). Monitors only interact with the sensor directory to subscribe to sensors or to release access to them. They also never communicate directly with monitored processes.

3.2.1.1 Discussion of specific design alternatives

In the following, I review several common design decisions seen in the literature (kernel implementation, synchronous notification, and when and where to execute sensor code) and justify my divergence from the beaten paths. To conclude, I summarize my approach.

Kernel implementation Many systems [AAD+02,WR03,CSL04,Wei03] implement monitoring for user-space components with the help of system calls or kernel traps that take care of sensor-buffer management, taking timestamps, ordering, and atomic execution of monitoring code. I have not chosen this approach for the following reasons:

1. It requires the additional overhead of a system call (kernel entry) per event access.

2. Specific components in the target systems cannot and must not directly interact with the microkernel using system calls (e. g., user-space programs inside a virtual machine).

3. Using kernel primitives for monitoring may simply be too intrusive, as it could change the very system behavior that I want to observe (e. g., scheduling).

4. One fundamental principle for constructing microkernels is the minimality of the kernel — whatever can be implemented efficiently and securely in user space should be implemented there [Lie95,Lie96b].

5. Using a kernel implementation might be too inflexible in case a new special-purpose sensor has to be added to Ferret. Such an addition can be done much more easily in user space (as is also indicated by driver-development experience in user space [McK04,Kan07]).


Synchronous notification Alternative approaches notify parts of the monitoring infrastructure synchronously on the occurrence of events [CSL04,Micb,Wei03]. I refrain from synchronously notifying monitors of the availability of new data or single events. This would be costly in the event-creation path and would create new, causally strong communication relations that do not exist in the unmonitored system (cf. to Section 3.1.2.2 on page 34 for a discussion of causal relations). Also, the execution of the monitored program would become dependent on the acceptance of those notifications by monitors, which is a problematic approach for real-time components. A guiding principle for my design is that if there are not enough resources to do both monitoring and executing the actual system, then the monitoring task should be restrained instead of compromising the system’s original functioning. In case a non–real-time system is to be monitored and monitoring is of paramount importance there, the monitoring infrastructure itself can be run as a real-time task, ensuring its proper execution.

Sensor code For executing user-supplied probe code in a real-time scenario, not only does the halting problem have to be solved or circumvented (the code will terminate eventually, cf. to Section 2.3.3 on page 21), but an upper bound for the execution time has to be determined as well1. Therefore, writing probe code is split into two steps in Ferret. The actual probes are implemented by a system developer. Real-time properties of this code can be determined offline. Administrators and users can only use supplied probes. However, monitors can run arbitrarily complex code written in any language. They are not restricted to a specific language (e. g., D for DTrace) and no special support is required for them in the kernel (i. e., no virtual machine for monitoring code is required in the kernel, as it is for DTrace [CSL04]).

My approach The core of my approach is storing event data in shared-memory buffers. Shared memory is such a common and low-level feature that it is available in all environments that I target. This design choice mostly eliminates the problem of supporting different environments. Only in the monitoring setup phase is environment-specific code run to establish sensor buffers. Instrumented real-time and non–real-time applications in all environments are able to post events using only memory accesses. This approach, however, opens up a set of interesting problems that I address in the context of this work.

• Monitoring must be real-time capable. In more detail, this requirement means that involved memory pages must be pre-mapped and pinned to avoid page faults. Also, sensor access must not use any form of blocking locks for synchronization (otherwise, arbitrary new blocking dependencies may arise in a system with shared sensors). Furthermore, there must be an upper time bound for accessing sensors, that is, for executing sensor-access code.

1 The halting problem combined with a deadline can be solved in pseudo-polynomial time if the timing of the primitive operations is known. The supplied program snippet can be checked for all input-data combinations and all potential indeterminism possibilities in parallel. One can determine whether all, some, or none of the variants terminate before the given deadline.


• Monitoring should exert as little influence as possible on the monitored system. More concretely, this means that processor overhead for monitoring should be as small as possible for non–real-time parts of the system and predictable for real-time parts. Only the least amount of work required for monitoring should be done inside the system; consequently, the amount of code run inside the monitored system should be minimized. If something can be done outside the monitored system, as a post-processing step or in monitors, it should preferably be done there.

• For security reasons, information flow due to the monitoring system should only be directed from monitored programs to monitors. Monitors should not need to send any information back to monitored programs.

• Also for safety and robustness reasons, monitors should not be able to influence monitored programs in other ways than arbitrary background processes can. Furthermore, several monitored programs should not gain additional influence on each other through the monitoring system. This can be achieved by using several independent sensors. Still, multi-threaded programs must be able to access sensors safely.

3.2.2 Online evaluation, offline evaluation, and external schemata

Ferret is designed to support both online and offline evaluation. I will first describe several peculiarities of online evaluation and how Ferret handles them, followed by a discussion of offline evaluation and external event schemata. After that, I briefly discuss how the monitoring process is controlled.

3.2.2.1 Online evaluation

For online evaluation, monitors run concurrently with the observed system. While this may increase the load on a system, it also offers advantages: First, only online evaluation allows fast reaction to certain system states, such as error conditions. The system may be put into a special debug state to do potentially expensive inquiries for additional system-state information. Second, online evaluation may actually be less intrusive on the monitored system as data can be filtered early on and only relevant data need to be processed now or stored for later. Third, online processing may be the only option for long-running experiments, which generate large amounts of data that cannot be stored fast enough. Monitor processes can be run as low-priority background tasks (if there are enough spare resources available) or can be integrated into the system as regular real-time load. However, monitors will never be notified synchronously by event producers or Ferret. Instead, they have to poll sensors. That way, no new strong causal relationships are introduced between the monitored system and the monitoring system (cf. to Section 3.1.2.2 on page 34 for a discussion of causal relations). For real-time problems where event creation and event consumption can be modeled as jitter-constrained streams [Ham97], the required sensor buffer sizes can be calculated such that no events are lost [Rie05].
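A minimal polling monitor could be structured as in the following C sketch; ferret_attach and ferret_next_event are hypothetical placeholders for the consumer-side interface, and the polling period has to be chosen against the sensor’s buffer size as discussed above.

    /* Sketch of a polling monitor; ferret_attach/ferret_next_event are
     * hypothetical placeholders for the consumer-side interface. */
    #include <stdint.h>
    #include <unistd.h>

    typedef struct {
        uint64_t timestamp;
        uint16_t major, minor;
        uint32_t data[4];
    } event_t;

    extern int  ferret_attach(const char *name, unsigned instance); /* via directory */
    extern int  ferret_next_event(int handle, event_t *out);        /* 0 = none left */
    extern void evaluate(const event_t *e);                         /* scenario code */

    void monitor_loop(const char *sensor_name, useconds_t period_us)
    {
        int handle = ferret_attach(sensor_name, 0);
        event_t e;

        for (;;) {
            while (ferret_next_event(handle, &e))  /* drain visible events */
                evaluate(&e);
            usleep(period_us);                     /* poll again next period */
        }
    }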


In general, online evaluation needs to use resources sparingly and work fast. Ferret supports this by allowing arbitrary monitoring code and by guaranteeing in-order event delivery. Ferret also contains sensor types that already aggregate data online such that buffer-memory requirements are bounded (cf. to Section 3.2.3.1). Section 5.1 contains several example scenarios that use online evaluation.

3.2.2.2 Offline evaluation and schemata

In contrast to online evaluation, offline evaluation can use all the resources that a dedicated machine may offer. Additionally, offline processing does not need to run in real time. In particular, it allows interactive trace inspection by humans. Furthermore, for offline evaluation the concrete evaluation procedures need not be known at instrumentation time or at runtime of the system. They can be defined, refined, and adapted later, as traces are in persistent storage and can be reevaluated. A core abstraction for monitoring is the request. Requests are user-defined units of work, specific to a scenario. I discussed requests and their internal and external specification in detail in Section 2.3.1. I briefly describe the Magpie toolchain [BDIM04,BIMN03] in the following as it has capabilities for finding, building, and post-processing requests. Originally, Magpie was built as a consumer for the Event Tracing for Windows framework only (cf. to Section 2.3.2 on page 19). I generalized Magpie to work with the monitoring system that I built for Singularity (Event Tracing for Singularity) and Ferret for Drops. Magpie’s means of synthesizing requests from event traces are schemata. Schemata are descriptions of how events relate to each other, what request boundaries are, which type of communication in a system belongs to a request, how certain parts of requests are stitched together, and so on. Once requests are available, Magpie can determine their resource usage, detect anomalous requests, and in general show communication patterns. I discuss several application examples in Sections 5.1.1.2, 5.1.2.3, and 5.1.2.4.

3.2.2.3 Controlling the monitoring

As Ferret is designed to cover a broad range of application scenarios, controlling it at runtime must be flexible. Situations range from interactive scenarios where sensor data is to be used by humans, for example, for visual inspections (cf. to Section 5.1.1.4), over experimental setups where a system needs to be traced for a short time (cf. to Section 5.1.1.1), to fully automatic scenarios where Ferret runs in the background (cf. to Section 5.1.2.2). Ferret’s sensor directory does not need any controlling at runtime. It provides the namespaces and acts on sensor register–unregister and acquire–release requests. The namespace can be traversed online, but the sensor directory is not involved in transporting events once consumer-producer relations are set up. Controlling event production, for example, by dynamically instrumenting binaries, is not covered by Ferret itself but is an orthogonal problem. In Section 5.1.2.4, I describe how Ferret can be used in combination with kProbes [Kri05] in the Linux or L4Linux kernel.


Other dynamic instrumentation frameworks, such as Feathertrace [BA07], can also easily be combined with Ferret. What actually needs to be controlled are the monitors. I implemented several monitors for Ferret, ranging from general-purpose monitors with a GUI control (merge_mon, typically used for interactive debugging and recording traces) to very specialized monitors that only solve one particular problem and need little to no controlling, for example, idle_switch_mon (cf. to Section 5.1.2.1), l4lx_verify_tamed_mon (cf. to Section 5.1.2.2), and l4lx_histo_mon (a dedicated monitor for capturing Linux system-call histograms from within L4Linux). I use the following three methods for controlling monitors in my use cases:

1. Time triggered: Monitors are simply preconfigured for a certain experiment, run for a limited amount of time, and provide their results afterwards.

2. Content triggered: Monitors consume events from sensors and special content (e. g., violation of assertions) triggers actions (e. g., dumping of the current event history for offline analysis).

3. Triggered interactively: Monitors provide an interface for human operators to configure acquired sensors, to start and stop consuming events, and to dump trace files to the network or a serial console. This interface must be appropriate for the situation; for example, merge_mon provides a graphical interface for GUI setups but can also be controlled via special control events through any sensor. That way, it can be controlled in headless setups or from inside virtual machines as well.

These three methods combined provide sufficient means of control for all the scenarios described in Section 5.1.

3.2.3 Sensors

In the following, I motivate why different sensor types are required and introduce three forms. Additionally, I discuss timestamps, their purpose in the context of monitoring, variants, application areas, and sources.

3.2.3.1 A variety of sensor types

Although a one-size-fits-all sensor would be an attractive design goal, it is also an unrealistic one. The requirements regarding the type and amount of data to be monitored, the amount of post-processing required, and the degree of acceptable intrusiveness are simply too varied. More concretely, from my experimental experience, I determined the following three main reasons for specialized sensors: simplicity, reduced overhead, and robustness. Dedicated sensors may store their payload in a form close to the desired consumption format. This simplifies consumption, especially if online evaluation is required. A smaller overhead may be achieved by direct data aggregation, compared to raw event traces, if the aggregated form still contains enough information.

More specifically, memory requirements for monitoring can be reduced drastically by direct data aggregation. Direct data aggregation may also eliminate the possibility of data loss due to buffer overflows. If only aggregated data is required, direct aggregation into a histogram is the most robust solution with regard to loss of monitoring data. I discuss the usage of a two-dimensional histogram to reduce the load on the memory subsystem (which is a critical part of the to-be-observed system in the experiment) in Section 5.1.1.1. The experiment described in Section 5.1.1.4 illustrates various different sensor types and how they help in simplifying the usage of runtime monitoring. Furthermore, related work also supports powerful aggregation functions (DTrace, cf. to Section 2.3.3 on page 21). The selection of sensors discussed in the following is therefore a compromise between the different requirements encountered in several sub-projects of Drops over the last years and the urge for simplification and generalization: simple scalar counters, histograms, and general event sensors.

Scalar counters Scalars are almost the simplest form of a sensor (except for single bits). They basically only contain a single integer number as their data representation. Ferret allows directly writing into the scalar, adding to it, subtracting from it, and incrementing and decrementing it. The consumer side only uses a simple retrieve method. The scalar sensor is prepared to carry upper and lower limits and has room for storing the observed minimum and maximum, the write-access count, and the accumulated sum (this allows online computation of the average). Scalars are, for example, used to count the number of deadline misses in an adaptation period in Verner (see Section 5.1.1.4).
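A plausible data layout for such a scalar sensor and a producer-side update function are sketched below in C; the exact field names and widths are assumptions based on the description above, not the actual Ferret layout.

    /* Sketch of a scalar sensor's shared-memory layout and an update path;
     * field names and widths are assumptions, not Ferret's actual layout.
     * min/max are assumed to be initialized suitably at sensor creation. */
    #include <stdint.h>

    typedef struct {
        int64_t  value;        /* the scalar itself                        */
        int64_t  low, high;    /* optional limits configured at creation   */
        int64_t  min, max;     /* observed extremes                        */
        uint64_t count;        /* number of write accesses                 */
        int64_t  sum;          /* accumulated sum -> average = sum / count */
    } scalar_sensor_t;

    static inline void scalar_add(scalar_sensor_t *s, int64_t delta)
    {
        s->value += delta;
        if (s->value < s->min) s->min = s->value;
        if (s->value > s->max) s->max = s->value;
        s->count++;
        s->sum += s->value;    /* track the sum of observed values */
    }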

Histograms Histograms are an extension of scalars with regard to dimensions, but they still aggregate data. Histograms are useful in different flavors. The simplest form only contains a minimal amount of metadata (low and high bound, number of bins) and the bins themselves in the form of a linear array. For fast access to specific bins, a precomputed multiplier is used. Restrictions on the multiplier, the number, and the size of bins can also be used to speed up access to the histogram. Histograms can also be treated as an array of scalars by directly addressing the bins in the array. Counting the underflows and overflows of the histogram allows adaptation of histogram sizes for specific setups. Ferret histograms also allow tracking the count and the sum of inserted values (for fast computation of average values, even if underflows and overflows were observed). Additionally, storing the observed minimum and maximum helps in adapting the dimensioning of the sensor and provides a first rough impression of the value distribution. Amongst other things, histograms are used for acquiring the resource usage of various routines in the system, for example, the video and audio decoding steps in the video player Verner [Rie03]. The obtained numbers are directly used for processor-time reservation in Verner. The array mode of histograms has been used for determining calling frequencies of system calls in L4Linux in [Döb06].

I also used multi-layered and multi-dimensional histograms for specific problems (cf. to Section 5.1.1.1).
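The following C sketch shows one way the simple linear-bin histogram described above could be laid out and filled, including the precomputed multiplier and the underflow and overflow counters; field names are assumptions, not the actual Ferret layout.

    /* Sketch of a linear-bin histogram sensor; field names are assumptions.
     * multiplier is precomputed at creation as bins / (double)(high - low);
     * min/max are assumed to be initialized suitably at creation. */
    #include <stdint.h>

    typedef struct {
        int64_t  low, high;        /* value range covered by the bins      */
        unsigned bins;             /* number of bins                       */
        double   multiplier;       /* bins / (high - low), precomputed     */
        uint64_t underflows, overflows;
        uint64_t count;            /* number of inserted values            */
        int64_t  sum;              /* for fast average computation         */
        int64_t  min, max;         /* observed extremes                    */
        uint32_t bin[];            /* the bins as a linear array           */
    } histogram_sensor_t;

    static inline void histogram_insert(histogram_sensor_t *h, int64_t value)
    {
        h->count++;
        h->sum += value;
        if (value < h->min) h->min = value;
        if (value > h->max) h->max = value;

        if (value < h->low)        h->underflows++;
        else if (value >= h->high) h->overflows++;
        else h->bin[(unsigned)((value - h->low) * h->multiplier)]++;
    }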

Event sensors Event sensors are the most general form of sensors. Each event is stored and timestamped individually. Thus, system behavior over time is traceable. Using event sensors is useful when early aggregation is not possible, for example, if the way to aggregate is not yet known or if the time relations of events are important. Requirements for event sensors stem from different areas: It should be possible to establish a temporal order over events, also over events from different sensors. I will therefore discuss timestamps in more detail in Section 3.2.3.2. The event-buffer size of sensors should be configurable. It determines how many and how large events can be buffered before they have to be dropped. The buffer size is one determining factor for monitors’ poll frequency. There are different event types that carry different amounts of payload and have different payload-layout requirements. Therefore, variable event sizes should be supported and only minimal requirements on the payload layout should be imposed. One sensor may be used from within different components at the same time, that is, contained events may originate from different sources. More generally, event sensors may be used by several producers and consumers at the same time. I have defined a common event header denoting the origin and type of events, followed by custom data. I describe the data layout for the custom parts of all events in the Magpie toolchain (cf. to Figure 3.2). Producers typically write single events, not too close in succession. Producers strictly use an append-only access pattern. They never look at or modify old events. Monitors may have arbitrary access patterns in the general case. However, I only used sequential access in my monitors. Both my state machines discussed in Section 5.1 and storing traces for offline evaluation only require sequential access. Therefore, event sensors can be treated as FIFO (first in, first out) queues. There may be several writers on the producer side. On the consumer side, each participant has to track its own stream position. Also, consuming does not mean removing, as that would require monitor coordination in case of several monitors. It would also require write access for monitors. Instead, consuming basically means getting a copy of an event and incrementing one’s own stream position. Events become unavailable when they are overwritten with new events. Monitors therefore also need a way to detect event loss.
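One plausible consumption scheme matching these requirements uses a monotonically increasing element counter so that each monitor can track its own position and detect overwritten (lost) events, as sketched below in C. This is an illustrative assumption, not Ferret’s actual sensor layout; memory barriers and multi-writer synchronization are omitted for brevity.

    /* Sketch of lossy-FIFO consumption with per-consumer positions.
     * The producer writes event n into ring[n % SLOTS] and then sets
     * head = n + 1; barriers and multi-writer handling are omitted. */
    #include <stdint.h>

    #define SLOTS 256u   /* ring capacity */

    typedef struct { uint64_t timestamp; uint32_t data[6]; } event_t;

    typedef struct {
        volatile uint64_t head;     /* total number of events ever written */
        event_t           ring[SLOTS];
    } event_sensor_t;

    /* Returns 1 if an event was copied to *out, 0 if none is available.
     * *pos is the consumer's private stream position; *lost accumulates
     * the number of events that were overwritten before consumption. */
    int consume(event_sensor_t *s, uint64_t *pos, event_t *out, uint64_t *lost)
    {
        for (;;) {
            uint64_t head = s->head;
            if (*pos == head)
                return 0;                        /* nothing new              */
            if (head - *pos >= SLOTS) {          /* fell behind: skip ahead  */
                uint64_t oldest = head - SLOTS + 1;
                *lost += oldest - *pos;
                *pos   = oldest;
            }
            *out = s->ring[*pos % SLOTS];        /* copy ...                 */
            if (s->head - *pos < SLOTS) {        /* ... re-check: recycled?  */
                (*pos)++;
                return 1;
            }
            (*lost)++;                           /* slot was recycled; retry */
            (*pos)++;
        }
    }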

3.2.3.2 Timestamps

Timestamps serve several purposes in the context of runtime monitoring. First, Ferret itself uses timestamps to deliver events in creation order. Order between events of a single sensor could be derived from the order in which the events are put into the sensor, but if more than one sensor is involved (or if a sensor does not preserve insertion order, i. e., if it is implemented as a hash table and not as a list), timestamps are required. Second, timestamps can be used to deduce causal relations in traces. Third, for many monitoring purposes, real-time distances between events are required. This includes, for example, the verification of real-time timeouts and the computation of resource usage.


Figure 3.2: Exemplary data-layout descriptions for (a) microkernel context-switch events with detailed information and (b) simple begin–end events for atomic sections in the L4Linux kernel, containing just a thread ID. The first lines define the mapping from major event IDs to symbolic names. The #type lines establish mappings for minor event IDs.

I will discuss possible timestamp sources in the following; they can be grouped into physical and logical clocks. In the remainder of this work I will only use physical clocks. The discussion of logical clocks and derived approaches is meant for follow-on projects that want to transfer the results of this work to other domains with different requirements and conditions.

Physical clocks The main property of physical clocks is that they are related to the real world and consequently to real time. Their purpose is to deliver timestamps associated with events from the real world such that real-world durations can be tracked. Important properties are the resolution of timestamps, the overhead to obtain timestamps, and the accuracy (how well synchronized to other clocks, drift). Also important from a programming perspective is how timestamps can be obtained. Current desktop computers contain several hardware devices suitable for this purpose, which, however, have diverse qualities. [Mic02] gives a good overview of which time sources are suitable for current operating systems and of time-source limitations:

• The 8254 Programmable Interval Timer (PIT) has a resolution of 1 ms and is expensive to access.


• According to [Mic02], the Real-Time Clock (RTC) also has a resolution of 1 ms and is likewise expensive to access. A current RTC datasheet [DS] mentions possible frequencies from 2 Hz to 8192 Hz in periodic mode and alarm interrupts with a one-second granularity.

• The local APIC Timer has a poor resolution and is paused in certain power states. Furthermore, buggy hardware implementations are in use, and it is not universally available.

• The Power Management Clock (PM Clock or ACPI Timer) was designed to provide fast timestamps and nothing else.

• The HPET (High Precision Event Timer) [Int04d] has an excellent resolution and low access times; however, it still needs to be programmed by the kernel.

In addition to the external circuitry mentioned above, modern processors contain timekeeping registers. Intel-compatible processors starting with the Pentium contain a timestamp counter (TSC) register. The TSC can be read with the RDTSC instruction. Using the TSC has two tremendous advantages over the external circuitry discussed previously: It can be read extremely fast (down to 12 cycles on some architectures [Adv05]) and it has a very high resolution (currently in the range of the processor’s frequency). Reading is usually fast because the time source is in the processor (or close to it) and the RDTSC instruction can be executed from user code (unless the feature bit is deactivated in the CR4 register), which makes kernel entries superfluous. Notwithstanding, there are several problems associated with using the TSC:

• The RDTSC instruction is not serializing, that is, the processor can reorder it in the instruction stream. This problem can be fixed by wrapping it in serializing instructions (e. g., CPUID). Current AMD processors provide the RDTSCP instruction, which, besides being serializing, also atomically delivers the value of TSC_AUX together with the contents of the TSC. TSC_AUX can be set by the operating system to allow user code to distinguish on which processors it executes RDTSCP.

• Using the TSC as the only time source sometimes proved unreliable for operating systems on older hardware. Different power states and dynamic frequency scaling may lead to drift between the different TSCs in multi-processor systems and also between single-processor TSCs and wall-clock time (e. g., Linux kernel code contains reliability checks on whether TSCs can be used as a time source) [Pro,Bru05]. Current processors from both Intel and AMD have frequency-stable TSCs. A quote from “IA-32 Intel Architecture Software Developer’s Manual — Volume 3: System Programming Guide” [Int06], Chapter 18: Debugging and Performance Monitoring, Section 18.8: Timestamp Counter attests that (my emphasis): “For Pentium 4 processors, Intel Xeon processors (family [0FH], models [03H and higher]): the timestamp counter increments at a constant rate. [. . . ] Constant TSC behavior ensures that the duration of each clock tick is uniform and supports the use of the TSC as a wall clock timer even if the processor core changes frequency. This is the architectural behavior moving forward.”


AMD’s documentation [Adv07] contains similar assurances.

• Using the TSC may incur a significant runtime overhead because it may be virtualized (trap into the kernel or hypervisor) or it may be too expensive to read for a given application scenario (cf. to [Adv07], Table 14. System Instruction Latencies and [Int07], Table C-12a. General Purpose Instructions). Furthermore, a TSC may not be usable or available at all for the following reasons:
  – Other architectures may not have an equivalent register (e. g., ARM),
  – User-space access to the TSC may be disabled in the kernel (CR4 register),
  – Using future hardware support for memory transactions may prevent reading the TSC from within transactions [Adv09], and
  – The use of accurate physical timestamps may be undesirable in order to rate-limit potential timing covert channels.

In the context of this work, I will use the TSC as a time source. The above-mentioned restrictions do not represent a serious limitation for my practical research. I will, however, provide a discussion of how to adapt Ferret to other clock types in the following so as to allow others to deploy Ferret in other environments with different constraints.
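For reference, the following GCC-style inline-assembly sketch shows how a serialized TSC read and an RDTSCP-based read could look on x86; it is a minimal illustration of the instructions discussed above, not Ferret’s actual timestamp code.

    /* Sketch of reading the TSC on x86; CPUID serializes the instruction
     * stream so that RDTSC is not reordered. RDTSCP additionally delivers
     * TSC_AUX, on processors that support it. */
    #include <stdint.h>

    static inline uint64_t rdtsc_serialized(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__ (
            "xorl %%eax, %%eax\n\t"   /* CPUID leaf 0                   */
            "cpuid\n\t"               /* serializing instruction        */
            "rdtsc\n\t"               /* EDX:EAX <- timestamp counter   */
            : "=a" (lo), "=d" (hi)
            :
            : "ebx", "ecx");
        return ((uint64_t)hi << 32) | lo;
    }

    static inline uint64_t rdtscp_read(uint32_t *aux)
    {
        uint32_t lo, hi;
        __asm__ __volatile__ ("rdtscp" : "=a" (lo), "=d" (hi), "=c" (*aux));
        return ((uint64_t)hi << 32) | lo;
    }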

Virtual physical clocks As a special case, sometimes thread-local wall-clock time is required, that is, a real-time clock that ticks for each thread independently and only while that thread is active. In theory, this information can be computed from global-clock timestamps if all scheduling events with their timestamps are available. In practice, however, this approach may be too expensive (as events would have to be created on each context switch). Instead, the operating-system kernel must account for the time spent in each thread. I implemented this feature for Singularity. The current value can be requested via a regular system call. For Fiasco, this accounting is already available (the Finegrained CPU accounting option). For obtaining the value without an actual kernel entry, I modified Fiasco to publish the current accumulated thread time (T_AccThread) in the kernel info page (KIP) together with the current TSC value (T_TscSched) on each context switch. If a thread wants to obtain a timestamp for its virtual physical clock, it has to read both values from the KIP and get a current value from the TSC (T_TscCur). The sum of T_AccThread and the difference between both TSC timestamps (T_TscCur − T_TscSched) is the real thread time:

T_Thread = T_AccThread + (T_TscCur − T_TscSched)    (3.1)

To prevent a race condition, one has to reread the published TSC value from the KIP after obtaining T_TscCur. If both T_TscSched values are equal, there was no interruption by a scheduler; otherwise, the procedure has to be restarted.
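A minimal C sketch of this procedure, following Equation 3.1 and the re-read check, could look as follows; the KIP field names are assumptions.

    /* Sketch of computing the virtual physical clock (Equation 3.1) from
     * values published in the kernel info page; field names are assumptions. */
    #include <stdint.h>

    struct kip_times {                  /* published by the kernel on each   */
        volatile uint64_t acc_thread;   /* context switch: accumulated time  */
        volatile uint64_t tsc_sched;    /* and the TSC value at that switch  */
    };

    extern uint64_t rdtsc(void);        /* e.g., a serialized TSC read       */

    uint64_t thread_time(const struct kip_times *kip)
    {
        uint64_t acc, sched, now;
        do {
            sched = kip->tsc_sched;
            acc   = kip->acc_thread;
            now   = rdtsc();
        } while (kip->tsc_sched != sched);   /* re-read: restart if preempted */
        return acc + (now - sched);          /* T_Thread, Equation 3.1        */
    }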


Logical clocks In contrast to physical clocks, logical clocks offer no notion of duration. Timestamps can only be used for ordering events. Lamport’s clocks provide a happens-before relation [Lam78], which basically guarantees that if event A causes event B, then the timestamp of event A will be smaller than that of event B. The converse cannot be concluded in general. Vector clocks, independently developed by Mattern [Mat89] and Fidge [Fid88,Fid91], extend Lamport’s logical clocks by defining causality relations. Vector clocks establish a partial order over events such that all event pairs can either be compared (implying a causal relation between them) or not (implying that both events are independent), that is, vector clocks provide information that physical clocks cannot provide. After selecting the TSC as a timing source for Ferret in the previous section, there still remain advantages of logical clocks for specific application scenarios. Besides the causality relationship provided by vector clocks, logical clocks may also be cheaper to use in several situations, as discussed in Section 3.2.3.2. However, vector timestamps scale with the number of nodes in a domain, so they may need to be huge and may even change size if nodes leave or join the system. Directly supporting vector time within each event in Ferret would therefore be too costly and complex. Instead, each Ferret event has room for a single timestamp (currently 64 bit). This can be used either for TSC timestamps as mentioned above or for logical-clock timestamps. Lamport’s clocks are directly supported this way. On message arrival (according to the model of logical clocks), the local clock would have to be adjusted to the maximum of the local and the remote clock. Furthermore, vector-clock timestamps can be derived from event traces in case causal dependencies are required, provided that the corresponding send and receive events for messages can be identified and that there is node-local ordering available. In other words, a global observer can construct vector-clock timestamps if monotonically increasing local timestamps are used and if corresponding send and receive parts of remote messages can be matched2. In fact, creating this kind of structure is an essential part of the schemata creation described in Section 3.2.2.2. Furthermore, the custom area of each event can contain any type and number of timestamps (if space permits). This allows arbitrary custom solutions on top of Ferret, such as bringing events from several independent clocks into (partial) order if there are events stamped with several timestamps.
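For illustration, a Lamport clock fitting into the single 64-bit timestamp slot could be maintained as in the following C sketch; synchronization between threads is omitted for brevity.

    /* Sketch of Lamport's logical clock fitting the 64-bit timestamp slot
     * of a Ferret event; thread synchronization is omitted for brevity. */
    #include <stdint.h>

    static uint64_t local_clock;

    uint64_t lamport_tick(void)               /* local event or message send */
    {
        return ++local_clock;
    }

    uint64_t lamport_receive(uint64_t remote) /* message arrival             */
    {
        if (remote > local_clock)
            local_clock = remote;             /* max(local, remote)          */
        return ++local_clock;
    }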

Implementation problems I want to discuss several implementation-specific problems in the following. In its simplest form, a logical clock can be implemented in shared memory using atomic increment operations on a counter variable. The x86 architecture can increment 32-bit values atomically in memory using ADD. Using XADD, one can also atomically obtain the old value from memory. XADD does not work with 64-bit values, which are desirable to practically prevent overflows3. While CMPXCHG8B allows atomic 64-bit writes, its up-to-date check may fail if other components have updated the counter concurrently.

2 For shared memory communication, matching or even identifying corresponding send and receive parts of messages is not easily possible as every single memory access may constitute such an event. 3 One may consider this as a temporary limitation until 64-bit architectures with support for atomic 64-bit adds are common, for example, x86-64.

It has to be used inside a loop and gives no local progress guarantee without further procedures. Valois discusses a related problem in [Val96]. He proposes to use a SafeRead operation, which atomically dereferences a pointer and increments a reference counter on the corresponding object. Valois shows a lock-free implementation of such an operation for cooperating threads in shared-memory regions. However, even among cooperating threads indefinite looping might occur, which is undesirable for real-time systems. The described software implementations have the drawback that all participating components have to be trusted to modify the counter in memory only monotonically — they have to cooperate. This is a much harder requirement than requiring that components sharing a shared-memory sensor trust each other with regard to memory corruption. The latter requirement can be solved by simply using several sensors. The former cannot be fulfilled with simple means, as events from different domains may still need to be compared. I imagine two implementation versions for solving these problems: First, a kernel implementation for logical clocks that provides logical timestamps via explicit system calls. Second, one could imagine hardware support for logical clocks in two forms. First, hardware could support increment-on-read regions in memory4 (also cf. to [Wik]). Second, there could be an explicit RDLC (read logical clock) machine instruction. In contrast to RDTSC, logical counters would only have to be incremented on usage. Furthermore, more than one counter could be supported, which would make it possible to separate system parts from each other.
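For completeness, the simple shared-memory logical clock mentioned at the beginning of this subsection can be sketched in C as follows; the 32-bit counter reflects the XADD limitation discussed above, and all participants are assumed to cooperate.

    /* Sketch of a shared-memory logical clock using an atomic fetch-and-add
     * (LOCK XADD on x86); 32-bit only, per the limitation discussed above,
     * and all participants must cooperate. */
    #include <stdint.h>

    typedef struct { volatile uint32_t counter; } logical_clock_t;

    static inline uint32_t logical_timestamp(logical_clock_t *c)
    {
        /* atomically increments the counter and returns the old value */
        return __sync_fetch_and_add(&c->counter, 1);
    }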

Combinations of physical and logical clocks: hybrid clocks There are situations and platforms5 where a good (referring to the properties in Section 3.2.3.2) physical clock is not available but logical clocks are not good enough either (e. g., measuring real-time durations is required, but not with high precision). To be more precise, for all purposes except establishing event order, only coarse-grained timestamps are required; establishing event order, however, requires a much higher resolution, as events may occur in tight bursts. In summary, hybrid-clock timestamps are composed of a low-resolution real-time timestamp in the most-significant bits and a logical timestamp in the least-significant bits. The inefficient access to the real-time source is handled by the kernel or external drivers (e. g., via regular timer interrupts). The current real-time timestamp is then published to user code via shared memory, such that it can be queried efficiently, that is, without a kernel entry or context switches. The logical part of timestamps is taken from a second clock, which could be as simple as an integer in shared memory that is incremented on access by participating components.

4 A single memory page would be split up into several logical-clock counters, each 64 bit wide. 5 The ARM “RealView Emulation Baseboard HBI-0140 Rev C” has “Oscillator test registers, SYS_TEST_OSCx” that provide a high-precision timer [ARM07]. This timer, however, is only available on the developer’s board and can furthermore not easily be made available to user programs without exposing other platform state to the user code too.

I discussed several variants of logical clocks and their implications in Section 3.2.3.2 above. The logical clock has to be reset on each increment of the real-time clock. The exact partitioning of the timestamps’ bits depends, of course, on the target platform, the time-source properties, and the software to be run. For example, on a platform that has a time source similar to the PC’s RTC (1 ms resolution), one may pick the simplest partitioning scheme of 32 bits each for the real-time clock and the logical-clock part. This would allow a plentiful 4 294 967 296 events/ms (i. e., roughly 4 295 events/cycle on a 1 GHz processor). The real-time part covers roughly 49.7 days — enough for many purposes.
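A minimal C sketch of composing such a hybrid timestamp is shown below; the field names, the 32/32 partitioning, and the handling of a concurrent tick are simplifying assumptions.

    /* Sketch of composing a hybrid timestamp: 32 bits of coarse real time,
     * published by the kernel in shared memory, and 32 bits of a logical
     * counter that the tick handler resets. The race with a concurrent tick
     * (which resets the logical part) is ignored here for brevity. */
    #include <stdint.h>

    struct hybrid_clock {
        volatile uint32_t coarse;     /* e.g., milliseconds, kernel-updated */
        volatile uint32_t logical;    /* reset by the tick handler          */
    };

    static inline uint64_t hybrid_timestamp(struct hybrid_clock *c)
    {
        uint32_t coarse  = c->coarse;
        uint32_t logical = __sync_fetch_and_add(&c->logical, 1);
        return ((uint64_t)coarse << 32) | logical;
    }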

3.2.4 Sensor synchronization approaches

In the following sections, I discuss synchronization approaches for sensor data structures. In the general case, several readers and writers can operate on each sensor, and both sides may have real-time requirements. The next section briefly discusses common approaches in the literature and documents why they are unsuitable for my requirements. The succeeding Section 3.2.4.2 details this discussion for one of Ferret’s predecessors and shows the unsuitability of a class of algorithms in more detail. Finally, based upon these results, Section 3.2.4.3 explains the approach that I follow with Ferret.

3.2.4.1 Common approaches

The common core problem of the different approaches is the synchronization of access to sensor data structures. I briefly summarize common approaches in the following and point out their shortcomings with regard to my requirements:

Kernel A standard approach to provide atomicity of code sequences is to run them in the kernel with sufficient protection (e. g., with interrupts disabled). Drawbacks of this approach are manifold. First, one has to pay the extra overhead for kernel entries. Second, this approach is inflexible, as the kernel has to be modified for all changes to Ferret. Third, this approach runs counter to the minimality principle of microkernels (cf. to Section 3.2.1).

Locking Any type of locking that may lead to blocking in event producers (mutexes, semaphores, . . . ) is unsuitable as no temporal guarantees can be ensured.

Double buffering Approaches based on double-buffering techniques, where the sensor is split into subregions, such as used in Event Tracing for Windows (ETW) [Mica] and K42 [AAD+02], do not work if event creation is not atomic, as slow event producers can block sub-buffers arbitrarily long. In ETW, the problem is circumvented by creating events atomically in the kernel. In K42, event streams are not guaranteed to be correct: Slow event producers can essentially corrupt arbitrary parts of the stream.

Lock-free algorithms In [ARJ97] Anderson et al. consider using lock-free synchronization in real-time systems. By restricting the scheduling algorithms in the system to a set of real-time algorithms, they are able to compute upper bounds on the loops of lock-free algorithms.


This allows them to compute bounds for the execution time of those algorithms. These results are interesting but too restrictive for my approach: I want Ferret to work in the non–real-time parts of a hybrid real-time—non–real-time system too.

Delayed preemption as described in [L4K06] is a general low-overhead mechanism for executing atomic sequences in user-space code. User-space code informs the kernel about entry into an atomic section by switching on a bit in shared memory. The kernel, in turn, tries not to interrupt user-code execution inside this section by delaying preemptions. The underlying assumption is that user-space code typically runs through its critical section and turns off the signaling bit without actually being interrupted by the kernel. In the common case, the signaling bit is not evaluated; only upon kernel entry, for example caused by an interrupt, is it acted upon by the kernel. There is a set of open issues with delayed preemption, especially in the context of real-time systems: How are maximum durations for user-code critical sections determined and specified?6 What happens upon overrun? How does delayed preemption interact with time-slice donation and real-time time slices? How are page faults inside critical sections handled, and should they be supported at all? If not, how should they be prevented? Delayed preemption, as described in [L4K06], additionally has the concept of a sensitive priority, which defines a threshold above which tasks can interrupt critical sections. This approach may solve some of the problems discussed previously but further complicates an implementation. On the other hand, it is not powerful enough to support two completely independent critical sections: They would still interfere with each other.

Simple Delayed Preemption In [OS ], we tried to define a simplified version of delayed preemption as a starting point for an implementation. Simple Delayed Preemption does not have a sensitive priority. It is targeted at small critical sections whose full execution time is of the same order of magnitude as several context switches. We also tried to define a global maximum for the execution time of 10 µs to be able to incorporate this in a global real-time schedule. Still, the problem of page faults from within critical sections remains. Also, to be safely usable, the kernel has to set up a watchdog timer each time it detects the interruption of a delayed-preemption section. Furthermore, it remains completely unclear what mechanism to provide to the user in case of a deadline overrun inside a critical section. Ideally, a user handler function would have to be called synchronously to clean up state, which again would have to be bounded in its execution time. Therefore, the kernel again needs a watchdog timer. But what happens if this function overruns again?

6 Current desktop platforms may not be suitable for hard real-time problems, cf. to measurements in Section 5.3.2, which show regular System-Management–Mode interruptions of more than 1.68 ms.


It seems that delayed preemption tries to provide semantics that are too broad to be implementable safely in a real-time system.

Atomic rollforward sections Atomic rollforward sections as described by Mosberger et al. in [MDP96] target a similar area as delayed preemption, with simplified requirements. I will cover this approach in detail in Section 3.2.4.3. In contrast to Anderson et al.’s approach, rollforward sections do not rely on specific scheduling algorithms. They are therefore more general regarding that aspect. On the other hand, they are more restrictive (because they are simpler), as not all types of algorithms are supported on top (not all algorithms can easily be rolled forward or back).

I therefore chose the atomic–rollforward-sections approach for Ferret and discuss it in more detail in Section 3.2.4.3.

3.2.4.2 An evaluation of the event sensors from: A Generalized Approach to Runtime Monitoring for Real-Time Systems

In this section, I discuss the event sensors from A Generalized Approach to Runtime Monitoring for Real-Time Systems (GRTMon) [Rie05] in more detail. I start by describing how I model checked the implementation for a certain property. Following that, I present a problem description in the context of real-time systems. I then generalize this and show that a class of algorithms cannot be used in real-time systems.

Verification of mission-critical code GRTMon uses a lock-free approach for sensor synchronization. Lock-free algorithms are often intricate and hard to implement correctly. As sensor code will potentially be used all over the system, its correctness is crucial. Therefore, I applied the model checker Spin [Hol06] to the sensor access code to verify that memory of a single event is never written to by two threads simultaneously. Spin cannot directly verify C/C++ source code but instead uses its own language, Promela. Therefore, I translated GRTMon’s sensor code into Promela. I furthermore had to simplify the code to make model checking tractable. To this end, I reduced the problem space to two threads and six entries in the event sensor. I also tried to use minimal data types for all the structure members. With all these tricks applied, the state space was still larger than 4 GiB. One key problem with model checking here is that it requires a small state space to be tractable: every single bit counts, as the number of verification steps grows, in the worst case, exponentially with the state-space size. Lock-free data structures, on the other hand, typically use large monotonic counters to circumvent the ABA problem (cf. to [Tre86]). Also the event timestamps are usually very large. As both approaches conflict, the problem-minimization compromise is required to even check a limited number of algorithm steps. With the help of Spin, I checked the following condition in the model: There must never be two threads that can write to the same event-buffer location at the same time. Up to a depth of about 1 300 steps this did not occur (then the model checker ran out of memory); therefore, it can be assumed that the concurrent sensor of GRTMon is safe with regard to this property.


Figure 3.3: Depicted is a situation where event-timestamp order and visibility order do not match. A monitor checking sensors 1 and 2 cannot safely decide for how long to wait for a potential event A to become visible.

Timely visibility of events Although I did prove a safety property of GRTMon’s event sensors as described previously, there is a problem with event ordering when using more than one sensor of this type in a system simultaneously. I will discuss the problem in this section and solutions in Section 3.2.4.3. GRTMon’s Concurrently Invocable Sensors guarantee the ordering of events within each instance of a sensor. A monitor that listens to more than one sensor, however, cannot safely establish the ordering of events from those sensors in all cases in real time. The core problem is that an event producer can be interrupted between obtaining a timestamp and publishing the corresponding event in a sensor. A critical situation occurs if two events, A and B, are created and timestamped in that order in independent sensors (depicted in Figure 3.3), but the production of A is interrupted between timestamping and publishing it in the sensor, that is, B becomes visible in its sensor before A becomes visible in its sensor. If a monitor now checks both sensors in between the publishing of events B and A, it cannot tell whether an event A will become visible in the first sensor in the future. Furthermore, it also cannot determine a safe upper bound for waiting on this sensor for a potential A to become visible, as there are no real-time reservations for event creation in the general case. In conclusion, a monitor cannot decide how long to wait for new events to become visible if more than one sensor is used. No efficient online monitoring is therefore possible, as events typically have to be processed in order (ETW shows similar symptoms for related reasons, cf. to Section 2.2 on page 14). I will now briefly discuss obvious solution attempts and why they are unsuitable:

• Each element is timestamped not with one timestamp but with two, one denoting the beginning of the event production and one its finalization. Besides the increased complexity due to the semantic changes (an event cannot be considered instantaneous anymore but has a duration; other work, such as Event Calculus [Sha99], relies on the concept of instantaneous events), the problem of atomically obtaining the second timestamp and publishing the event is still there. However, at least for offline traces a partial order of events can be established and truly parallel events can be detected.


• One could mark whole sensors as active (e. g., by having a global writer counter or a global timestamp denoting when a writer starts event creation). Now monitors can at least decide whether waiting on a sensor would be pointless because no event is currently in production. There is still no way, however, to determine an upper bound on how long to wait on such an active sensor, as there is no knowledge of when the event producer will resume.

Proof that such an ordering cannot be guaranteed without atomic sections To motivate the atomic sections that I will introduce later, I will now show that, given a common type of timestamp source defined in the following, there cannot be a safe and bounded solution for posting events in the absence of atomic sections.

Definition 1 (swift time source): I call a time source swift if timestamps can be obtained and stored to arbitrary memory locations in one atomic operation.

For example, the timestamp counter (TSC) of current x86 processors is not swift, as timestamps obtained using RDTSC are delivered only in the processor registers EAX:EDX. Storing to memory has to be done in a second step. High Precision Event Timers (HPET, cf. to [Int04d]) may (with some critical restrictions, discussed in the following) be used as a swift time source if timestamps can be obtained using memory-to-memory copy operations in memory-mapped input-output regions.

Definition 2 (steps): An algorithm that publishes events to sensors contains at least the following three steps:

1. Obtaining a timestamp from a time source,

2. Writing of the timestamp to the event, and

3. Making the event visible in a sensor.

For swift time sources, steps 1 and 2 are executed atomically. These three steps form a critical section. If an event producer is interrupted in this section, a monitor cannot efficiently decide how long to wait for the producer to finish. For GRTMon’s Concurrently Invocable Sensors these steps are: (1) RDTSC (which stores timestamps in the registers EAX:EDX), (2) copying the timestamps from EAX:EDX to event memory, and (3) committing events by writing references in the index area. The following definition captures time-source properties of practical relevance for monitoring.

Definition 3 (usable time source): I call a time source usable if (a) it can safely be used from user code (the mechanism to use it should not unnecessarily export system-internal state) without overhead in the order of magnitude of a kernel entry, and (b) if the clock delivers timestamps of sufficient resolution to fully order any application-generated events on a processor.


The HPET’s timer can be accessed via memory-mapped input-output; however, whether this can securely be made available to user code is debatable (a): The memory page with the main counter also contains internal information about the HPET, such as interrupt routing, further timers, and timeouts, which should not be available to arbitrary user code. Access to this memory region can be restricted below page granularity only by using segments. Segments are x86-specific and are practically not supported within the x86-64 architecture. Furthermore, Fiasco does not currently support segmentation. It provides a flat space model (segment registers would have to be saved on context switches, which does not come for free, cf. to [Hof02]). Furthermore, 64-bit HPET timers cannot be read atomically; one would potentially have to loop (cf. to Section 2.4.7 Issues related to 64-bit Timers with 32-bit CPUs in [Int04d]). Memory-to-memory copy operations are only possible with REP MOVS on the x86 architecture and are not executed atomically. The ARM platform does not support them at all (strictly load-store architecture). Requirement (b) excludes periodic time sources driven by the kernel (e. g., the Real-Time Clock (RTC) and the 8254 Programmable Interval Timer (PIT)) as their resolution is limited by a sensible timer-interrupt frequency (e. g., 1 ms for both mentioned clocks). The following definition merges Definitions 1 and 3.

Definition 4 (swiftly usable time source): I call a time source swiftly usable if it is (a) swift according to Definition 1 and (b) a usable time source according to Definition 3.

Conjecture 1: Due to the discussion above, I will assume in the following that swiftly usable time sources are practically not available, especially not as a cross-platform concept.

Assumption 1: I assume that thread execution can be interrupted and suspended transparently by the underlying operating system, as I do not want to restrict in which environment monitoring is to be used.

I will now prove by contradiction that there cannot be a safe and bounded solution for posting events without some form of kernel support.

Proof 1: We require three properties from a solution: (a) it must not wait or loop unboundedly, (b) there must be no information flow from monitors to event producers, and (c) it must preserve event order. We now assume there is such an algorithm. This algorithm has to contain at least steps 1 and 2 from Definition 2 above. There are no swiftly usable time sources available (Conjecture 1), but one requires a usable time source for event ordering (Definition 3 and property (c)). Therefore, the time source will not be swift and, hence, cannot execute both steps atomically. Due to Assumption 1, this algorithm can be interrupted between the two steps for an arbitrary duration.


Assume a monitor is scheduled to run in-between steps 1 and 2 of an event producer. There are two cases to distinguish now:

• The monitor cannot decide whether an event producer is in a critical section. It can therefore not decide whether such a timestamp will potentially become visible in the future in one of its sensors or not. Consequently it will have to wait for arbitrary durations and, hence, cannot work efficiently online.

• Assuming there was a protocol to let the monitor know of the existence of such in-flight timestamps, it still cannot decide when they would become visible, as there is no guarantee when execution in the event-producing thread will proceed.

In both cases property (a) is violated. Also, the event producer cannot react to critical situations, for example by recreating timestamps, as it does not know of such situations due to property (b). ∎

For completeness, I will now informally discuss further design problems assuming that swiftly usable time sources were actually available. With swiftly usable time sources, steps 1 and 2 of Definition 2 would be executed atomically. However, for the whole algorithm to work efficiently, step 3 must also be executed atomically. In the absence of a working double compare-and-swap (DCAS) implementation for x86 and other widely used architectures, this can only happen by constructing the sensor memory layout such that step 2 makes events visible, that is, steps 2 and 3 are merged. GRTMon’s implementation of Concurrently Invocable Sensors circumvents memory fragmentation in sensor memory by indirection (sequential access to events is typically required for producers and consumers). Therefore, events are dequeued and committed. Dequeued events can be manipulated safely and arbitrarily. Even leaking single dequeued events by crashed threads does not restrict the sensor as a whole. The order in which monitors consume events and in which later producers recycle them in the sensor is the commit order. The indirection area in the sensor provides the required sequential view. As a consequence, however, dequeuing and committing happen independently of timestamping events. If this indirection were abandoned, sensor memory could become arbitrarily fragmented. Events in the middle might still be under modification while later events may already be completed. This prevents efficient sequential access. Monitors, and worse (cf. to Section 3.1.2), event producers, would have to search the whole sensor area to find the newest and oldest events. Consequently, for nontrivial (more than one sensor involved), safe (an upper bound exists for the time a monitor has to wait for sensors), and efficient (only shared memory is used, usually no kernel entry) monitoring, some form of kernel support for atomic execution is required. This mechanism should be more efficient than actual kernel entries. I will describe such a design in Section 3.2.4.3.


At first glance, there are several ways to circumvent the problems explained previously. I will now briefly describe these approaches and discuss why they cannot fully solve the problems.

• Inside the event-producing code, one could check whether an interruption occurred. This could either be realized by measuring the progression of time, or the kernel could expose interrupt and scheduling information to user space (e. g., via shared memory). If exposing detailed scheduling information is a security problem in a system, because other parts can then be observed, the kernel could just export whether scheduling occurred between certain points in time7. In case of interruption, the whole event production would have to be restarted. For safely applying this pattern, a DCAS instruction or a similar primitive is required.

• Alternatively, one could work without kernel support with the help of a globally visible memory area in which every timestamp obtained in the system has to be published. Publishing is done by writing the most recent timestamp to memory. After publishing its own timestamp, a producer prepares the event; on event publishing, the freshness of the own timestamp is verified in the global memory area. For this approach, too, a DCAS primitive or an equivalent is required. If the freshness check fails, the whole process has to be restarted. A further problem of this approach is that all parties in a system whose timestamps have to be ordered must cooperate for the protocol to work correctly. The shared memory region also opens up an undesirable global communication link between all entities in the system.

Both approaches require rolling back and restarting the event production in case of conflict. There is no upper bound for the number of rollbacks required and hence for the time for event production. In consequence, both approaches are not suitable for real-time systems.

After showing that safe and efficient sensors cannot be constructed without some form of kernel support, the question arises which form this support could have. I will answer this question in the succeeding section.

3.2.4.3 Atomic sections in user-space code for sensor synchronization

In [MDP96], Mosberger, Druschel, and Peterson give a good overview of the design space for user-space atomic sequences. In the following, I will discuss important items of this design space in the context of my envisioned target system (cf. to Section 3.1.1) and summarize thereafter:

7 Instead of using a global monotonic scheduling counter, which would allow any program to infer some knowledge about other parts of the system, the kernel could set a (sufficiently large) new random number in a publicly readable epoch field in shared memory each time an interrupt occurred or scheduling was done. The user program can then determine for any two points in time whether interruptions happened between them by sampling the epoch field: if both samples are equal, there was no interruption; if not, there was at least one interruption.
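To make the epoch-field idea from the footnote concrete, the user-space side could be sketched as follows. This is a minimal illustration under stated assumptions: the shared read-only field kernel_epoch and the two helper functions are hypothetical, and, as argued above, safely committing the finished event would still require a DCAS-like primitive.

#include <stdint.h>

/* Assumed: the kernel writes a fresh random value to this read-only shared
 * field on every interrupt or scheduling decision (cf. footnote 7). */
extern volatile uint64_t kernel_epoch;

static void post_event_with_epoch_check(void (*prepare_event)(void),
                                        void (*commit_event)(void))
{
    uint64_t before, after;

    do {
        before = kernel_epoch;   /* sample the epoch before building the event */
        prepare_event();         /* build the event in private memory          */
        after  = kernel_epoch;   /* sample the epoch again                     */
    } while (before != after);   /* interrupted in between: restart            */

    /* Note: making the event visible atomically at this point is exactly the
     * step that would still need a DCAS-like primitive, as discussed above.  */
    commit_event();
}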


Preemption granularity Mosberger et al. assume a Unix-like system where signals are available in user space and are presumed to be much rarer than hardware interrupts. Generally, the highest abstraction level for the preemption point is desirable. For simple user-space applications, preemption-point candidates are signal delivery or user-space scheduling decisions. For systems with shared memory across address spaces, kernel-level scheduling decisions are the relevant preemption points. If interrupt handlers are to use atomic sections, interrupts are the lowest relevant preemption source. Choosing interrupts results in the most general solution at the highest cost.

For Ferret, interrupts must be chosen as the preemption granularity, as interrupt-handler routines in drivers running in user space are typical targets for Ferret instrumentation.

Registering atomic sequences Mosberger et al. mention four possibilities for registering atomic sequences in [MDP96]: designated sequences (marker code around atomic sequences), static registration (executed at program start), dynamic registration (executed each time before an atomic sequence), and hybrid registration (a combination of designated sequences and dynamic registration). All of these approaches have disadvantages that make them unsuitable here. Atomic-sequence support for Ferret need not be general; only a set of event-posting routines must be supported, and ideally this set does not change at runtime. Therefore, for Ferret I found an even simpler solution, which I call designated area: in every address space I reserve a designated area that only contains atomic sequences. These sequences thus need not be registered online or at program start, and surrounding code does not have to be inspected for marker code to determine whether a sequence is atomic. Instead, they are identified by their location in the address space. To determine whether a program currently executes an atomic sequence, only the user-space instruction pointer must be compared to the reserved region. To prevent conflicts with applications requiring specific address-space layouts, I put the code in the kernel's address range8 but allow user code to execute it.

Return of control Rolling forward user-code sections, that is, executing them atomically, virtually without interruptions, raises the question of how control is returned safely back to the system. Mosberger et al. mention four approaches: code rewriting, cloning, computed jumps, and controlled faults. For code rewriting, callback code to the system is temporarily inserted directly after the atomic sequence. Cloning creates a copy of the original atomic section that ends with a system callback. Computed jumps require that each atomic sequence ends with an indirect jump, which can be rerouted by the system. Controlled faults refers to a technique where each atomic sequence ends with an instruction that normally behaves as a NOP (no operation), but the kernel can modify system state to generate a fault on this instruction.

8 The idea to identify atomic sequences based on their location is taken from the ARM port of Fiasco.


For Ferret, I use an approach based on cloning and controlled faults, which I call static cloning. As the address of the atomic sequences is fixed, it is simple to provide a second copy that is pre-instrumented. The memory layout of both versions is similar except for a small detail: the uninstrumented version contains a NOP opcode directly after the final return statement, whereas in the instrumented version the final return statement is moved to the NOP and is preceded by a trapping instruction. Switching from one version to the other is as simple as changing the page-number part of the user-space instruction pointer to point to the instrumented version. I describe the implementation in detail in Section 4.1.3.2.

Limiting the duration of a rollforward section is not a problem in the Ferret scenario, as the code executed with rollforward privilege is controlled by the kernel. It can be constructed to be bounded. Determining execution time, including worst-case execution time, for this code can be done offline.

In contrast to rollforward code, there is, in principle, no need to control code executed within a rollback section. Rollback code is simply restarted if interrupted. Any user-space–provided code section, even an endless loop, could be run with rollback semantics from a whole-system–stability point of view: the code will simply not make progress, but the remainder of the system is not delayed or blocked, as interrupts are not suppressed in the rollback case. Not all code can be run with rollback semantics, though. It must be designed to support arbitrary interruptions and restarts.

Page faults during the execution of atomic sequences will interrupt the atomicity. There are three sources of page faults with Ferret: code page faults in the sensor code, data page faults on the destination sensor, and data page faults on the source area of the data to be written to sensors. Code faults are prevented by construction, as sensor code is provided by the kernel in pinned memory. Sensor memory, too, is allocated by Ferret from pinned memory pools. Source data, however, may not be pinned in all cases (cf. to Section 4.1.4). Mosberger et al. propose, instead of preventing page faults altogether, to do so only at “inopportune places”. Proposed solutions are:

• A combined page for data and code. This does not work for Ferret for several reasons: First, the code page must not be writable; otherwise, the system loses control over the code actually executed (which is a prerequisite for the optimizations discussed above). Second, that way, atomic sequences could not be used independently on several processors at the same time. Third, modified code pages may be executed inefficiently on current processors, as trace caches have to be invalidated [Int04a].

• A single data page, which is paged in on the first fault. In contrast to the system Mosberger et al. envision, in my target architecture paging may also be done by user-space processes (pagers) that may themselves be instrumented with Ferret. Event creation in pagers would therefore interrupt event creation in paged processes. It may, however, be possible to reserve a pinned memory area for transferring event data to event-posting atomic sequences in environments that support this (Drops, L4Linux kernel).


• Prologue code before the actual rollforward section touches all required data pages and triggers page faults if pages are not present. The prologue code itself can be implemented as a rollback section, which can only be completed if all page faults are resolved. This approach is the most general solution (usable in L4Linux user code).

I will discuss several implementation versions that guarantee page-fault freedom in Section 4.1.6.

To conclude, in comparison to the general case of atomic sequences in user-space code, the following simplifications hold for the Ferret case:

• Only specific sensor access code must be supported.

• This code may be constructed to have bounded execution time; therefore, no watchdog timers or similar mechanisms are required to ensure termination of the atomic sequence.

• Sensor access code does not have to be modified at runtime and can thus be mapped read-only to user programs.

• As the code is fully controlled by the system, it can be trusted to have and keep the required properties from above.

• Atomic-sequence code does not need to be instrumented dynamically. Instead, a statically instrumented copy can be provided.

• There is no need to register the code sequences statically or dynamically to be executed with rollback or rollforward semantics. Registering happens implicitly by positioning the sequences in address spaces.

• Page faults in atomic sequences are precluded by construction.

• Support for (potentially scheduling) system calls from within atomic sections is not required for Ferret.

3.3 Conclusion

I started this chapter with a discussion of the envisioned target system and a requirements analysis in Section 3.1. Starting out with high-level requirements, I refined the discussion to low-level requirements and requirements to the hardware and to operating systems. Based on this, I discussed design decisions for Ferret in Section 3.2 and gave an architecture overview. I discussed the identified roles, design alternatives, online and offline evaluation, as well as external schemata. Section 3.2.3 treated different sensor types and timestamps.


In the last part of this chapter, Section 3.2.4, I discussed different approaches for synchronizing access to sensors. A significant part of this section is spent on introducing the concept of atomic sections in user-space code, a concept that I adapted for use in real-time and microkernel environments. The implementation of this concept for Ferret on top of the Fiasco microkernel and the implementation of specific sensor access code using this concept are discussed in the first part of the next chapter, in Section 4.1.


Chapter 4 Implementation

In theory, there is no difference between theory and practice. But, in practice, there is. (attributed to many)

In this chapter, I will explain several implementation specifics of the design presented in the previous chapter. The chapter starts with an overview of atomic sections in user space and discusses their implementation in Fiasco; the implementation of the actual user-space sensors is also covered. This builds upon the motivation for supporting atomic sequences in user-space code that I detailed in Section 3.2.4.2 and the discussion of the design space and the potential simplifications with respect to the general case identified in Section 3.2.4.3. Section 4.2 continues with a discussion of generic instrumentation approaches for different system architectures. This chapter concludes with several observations about portability and tracing in different environments in Section 4.3.

4.1 Atomic sections in user-space code

In the following, I will describe my current prototypical implementation for the microkernel Fiasco and the x86 architecture, for which I tried to adhere to the following three constraints:

1. The implementation should work efficiently in the Fiasco microkernel.

2. It should only require minimal changes to the system architecture. Fundamental concepts, such as synchronous IPC, should not be modified.

3. The implementation must not conflict with good IPC performance, that is, it must not slow down critical paths.

The remainder of this section is structured as follows: First, common concepts for atomic sections are introduced. Following that, the individual approaches are presented in more detail: the rollback, the rollforward, and a combined approach, in Sections 4.1.2, 4.1.3, and 4.1.4, respectively. I conclude with a short summary of the implementation for Fiasco in Section 4.1.5 and notes on using atomic sequences in specific sensor types in Section 4.1.6.


4.1.1 Common concepts

The two different concepts for atomic sections in user space (rollback and rollforward) described in Section 3.2.4.3 have the following common implementation requirements:

• In all relevant kernel-entry paths, the kernel must check whether an atomic section is currently being executed. For Fiasco, the following paths are relevant:
  – timer interrupt: entry_int_timer and entry_int_timer_slow,
  – other hardware interrupts: all_irqs,
  – traps and page faults: slow_traps and entry_vec0e_page_fault.
The IPC paths need not be modified, as there is no need to support system calls from within atomic sections for Ferret. If a system call is made from within an atomic sequence, the atomicity guarantee is lost.

• The actual code of the atomic sections must be executable (but not writable) from within all address spaces. For this purpose, the kernel maps memory pages with the code into each address space at a well-defined address inside the kernel’s address range. Placing the code inside the kernel’s address range prevents conflicts with arbitrary libraries linked to monitored programs — this approach works with all programs. As a consequence, the code implementing the actual atomic sequence cannot be inlined into instrumented programs but must be called.

• The target address range of the atomic sequences is chosen such that checking for inclusion of the user-space instruction pointer can be done with a few cheap machine instructions (a sketch of these checks follows after this list). Rolling back to the previous rollback point can be done by ANDing out the least-significant bits of the instruction pointer (therefore, rollback points have to be aligned accordingly). For switching from the normal to the instrumented version of the atomic sequence for rollforward code, only a single bit needs to be switched on in the instruction pointer, if the instrumented version is placed on the subsequent page and the whole block is aligned to multiples of two pages.

• Usually, both rollforward and rollback code are simply executed, similarly to normal functions. Only in the, hopefully rare, case of actual interruptions by hardware interrupts or page faults is special handling by the kernel needed. This lazy approach optimizes for the average case.
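As an illustration of the instruction-pointer checks mentioned in the list above, the kernel-side fixup on a kernel entry could look roughly as follows. This is a sketch under assumed constants and a simplified two-page layout; the actual Fiasco code differs in detail (cf. to Section 4.1.5).

#include <stdint.h>

/* Illustrative constants, not Fiasco's real layout. */
#define ATOMIC_AREA_BASE  0xbfff0000UL   /* assumed designated area in the kernel range */
#define ATOMIC_AREA_SIZE  0x4000UL       /* assumed size: a few pages of sensor code    */
#define ROLLBACK_ALIGN    64UL           /* rollback points are 64-byte aligned         */
#define PAGE_SIZE         4096UL

/* Called with the saved user-space instruction pointer on interrupt/trap entry;
 * returns the (possibly adjusted) instruction pointer to resume at.            */
static uintptr_t fixup_user_ip(uintptr_t user_ip, int is_rollforward_code)
{
    if (user_ip < ATOMIC_AREA_BASE ||
        user_ip >= ATOMIC_AREA_BASE + ATOMIC_AREA_SIZE)
        return user_ip;                          /* not inside an atomic section */

    if (is_rollforward_code)
        return user_ip | PAGE_SIZE;              /* continue in the instrumented copy
                                                    on the subsequent page       */
    else
        return user_ip & ~(ROLLBACK_ALIGN - 1);  /* reset to the previous rollback point */
}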

Atomic sections in user-space code do not guarantee atomicity across processors. In the literature, sensors are often per-processor data structures [Micb, AAD+02]. The same approach can be applied to Ferret. On current versions of the x86 architecture, the RDTSCP instruction can be used to atomically acquire a timestamp and a processor identifier (cf. to Section 3.2.3.2). This identifier can be used to index into processor-local data structures. Thread migration in the middle of event creation is prevented by the rollback and rollforward approaches.
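For illustration, acquiring both values with RDTSCP takes only a few lines of inline assembly; the per-processor sensor lookup hinted at in the comment is an assumption.

#include <stdint.h>

/* RDTSCP returns the TSC in EDX:EAX and the content of IA32_TSC_AUX
 * (set up by the kernel, typically the processor number) in ECX.     */
static inline uint64_t rdtscp(uint32_t *cpu_id)
{
    uint32_t lo, hi, aux;

    asm volatile ("rdtscp" : "=a" (lo), "=d" (hi), "=c" (aux));
    *cpu_id = aux;
    return ((uint64_t)hi << 32) | lo;
}

/* Usage sketch: uint32_t cpu; uint64_t ts = rdtscp(&cpu);
 * then post the event with timestamp ts into the per-processor sensor
 * selected by cpu (sensor layout assumed).                            */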


4.1.2 The rollback approach

Rollback sections must be written in such a way that they can always be reset to their most recent rollback point. Therefore, each rollback-capable function is partitioned by rollback points. In each partition, results are first prepared and precomputed. They are committed in one final, noninterruptible machine instruction. In practice, my sensor functions contain only two parts, of which only the first contains relevant code; the second contains just a return instruction (cf. to Figure 4.2).

In the common case, rollback sections are simply executed without any interruptions, similarly to normal functions. In case of interruption by an external interrupt or trap, they are reset to their most recent rollback point as described above. Traps might occur if a rollback section is called with pointers to paged-out memory. In that case, the page fault is handled by the operating system as usual and the atomic section is restarted after the page fault has been handled.

Note that this way even page faults on the last, committing instruction can be handled: the whole section is simply restarted. However, Ferret uses pinned memory for sensors, and the committing instruction writes to that memory, so no page fault will occur on this instruction.

4.1.2.1 Limitations of the rollback approach

Although the kernel support for rollback sections is relatively simple, writing user-space code for atomic sections is not. Besides the usual problem of writing correct code, the code has to be written in assembler to have control over the concrete machine instructions used. Also, code alignment to rollback points and custom calling conventions can only be realized that way. Solving this problem is the job of the implementer of the monitoring framework. A framework user likely does not care (as this code is already written and can be used) unless he wants to extend the framework.

Besides getting sensor code functionally right (including the rollback semantics), there is another problem with this approach in the context of real-time systems. There is no hard guarantee on the number of rollbacks per function call and, hence, no hard upper bound for the execution time. Rollback sections differ fundamentally from normal code in that there is no guarantee of progress if the scheduler quantizes time too finely or if there are too many other interruptions. I will provide a qualitative comparison of different sensor synchronization approaches in Section 5.2.

In the following, I provide a statistical argument, which might alleviate the problem for many practical considerations and some soft–real-time applications. A short rule-of-thumb computation shows that the inter–timer-interrupt distance, or more appropriately, the smallest possible timeslice available in Fiasco (1 ms), is four orders of magnitude larger than the processor-time demand for calling the rollback code once (1 ms vs. 100 ns, a factor of 10 000; cf. to row AList (rollback) in Table 5.2). So, with a very high probability, there should be at most one timer-interrupt–caused rollback per event.
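Stated as a rough estimate using the numbers from above (an informal argument, not a guaranteed bound):

\[
  P(\text{timer interrupt during one event production}) \;\approx\;
  \frac{t_{\text{event}}}{t_{\text{timeslice}}}
  \;=\; \frac{100\,\mathrm{ns}}{1\,\mathrm{ms}} \;=\; 10^{-4},
\]

so the expected number of timer-induced rollbacks per event is on the order of $10^{-4}$, and the chance of two or more consecutive rollbacks on the order of $10^{-8}$, assuming independent interruptions.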


Again, using 100 ns as an estimate for event production cost, interrupts would have to hit the processor at a rate higher than 10 MHz to indefinitely delay event production. Such a setup is extremely unlikely with current desktop machines. Still, one cannot easily determine a safe upper bound for the number of rollbacks. To also be on the safe side from a theoretical perspective, one has to use rollforward code, which I discuss in the next section.

4.1.3 The rollforward approach

Similarly to the rollback approach discussed above, rollforward code sections are simply executed in the common case. In the case of a hardware interrupt, the kernel only does minimal interrupt handling (enqueuing). Directly afterwards, the kernel resumes execution on the instrumented version of the rollforward code. Additionally, at that point interrupts are disabled system-wide by the kernel.

The instrumented version guarantees that control flow returns to the kernel eventually. As the code executed with rollforward privileges is under the kernel's control (cf. to Section 3.2.4.3), tight timing constraints can be guaranteed by constructing the code appropriately. For example, the AList (atomic sequence list, introduced in Section 4.1.6.1) sensor code in Ferret contains only one small bounded loop for copying the actual event content into the sensor (currently, up to 64 bytes).

Disabling interrupts effectively limits the number of disruptions seen within each rollforward code sequence to one. This results in fundamentally different progress guarantees than with the rollback approach discussed above, where progress can only be guaranteed statistically. Rollforward sections will terminate and will incur at most one interruption. I provide detailed throughput and worst-case measurements for sensors that use the rollforward approach in Section 5.3.3.

The implemented method uses a lazy approach for disabling interrupts, which, in the common case, does not require kernel activity at all for atomic sections. The general assumption here is that atomic sections are fast and that the probability of them being disrupted by hardware interrupts is extremely low. This assumption is supported by the rule-of-thumb estimate in the previous section.

Rollforward code is usually simpler than rollback code, as there is no need to reestablish an initial state. It may also use more complex data structures, as it is not limited to a single final atomic instruction. Naturally, the rollforward approach cannot cope with page faults from within the atomic sequence: page faults cannot be handled without losing atomicity.

4.1.3.1 Alternative implementation approaches

In alternative implementations, interrupts could also be disabled and enabled eagerly, that is, around each invocation of rollforward code. Different implementation variants of this approach have different drawbacks:

• The instrumented code could be given the privilege to control interrupts in user space. Consequently, it would have to be trusted ultimately.


Globally disabling and enabling interrupts is usually a kernel privilege, as the stability and timing guarantees of the whole system depend on it. Granting this privilege to arbitrary user code is not an acceptable option for open systems.

• The kernel could provide system calls for controlling interrupts and offer them only to trusted applications. This approach, too, has severe limitations. First, paying the overhead of two kernel entries for each event to be logged would be excessive. Second, not only trusted applications need to be monitored. In fact, quite often the opposite is true: buggy and misbehaving applications are precisely the ones that need to be monitored and debugged. Consequently, the whole system would have to be trusted again, which is not acceptable for open systems.

• Besides the stability and trust arguments from above, disabling and reenabling interrupts is also not a cheap operation on current processors (e. g., cf. to Table C-8 in [Int04a]). Thus reducing the frequency of these operations is beneficial and is achieved with the lazy approach.

4.1.3.2 Implementation details

The instrumentation for transferring control back to the kernel is currently implemented as a trapping instruction directly before the final return statement of the rollforward code in the instrumented version of the code (cf. to Section 3.2.4.3 for a discussion of the two code versions). The code path triggered by this trap handles the previously queued interrupt and reenables interrupts globally. I call this trapping instruction the interrupt-restoration trap (IRT). The IRT is one implementation method; one could also implement a dedicated system call for the end of a rollforward section. Trapping instructions, however, can be short. For the x86 implementation, I use the HLT instruction with an opcode size of one byte. In the noninstrumented version, I use a single-byte NOP as a placeholder after the corresponding RET, such that similar code blocks are aligned (RET is not necessarily the spatially last instruction of a function).

As the IRT is located directly before the final return instruction in the rollforward code, scheduling might technically still occur in the rollforward routine (the user-space instruction pointer is still in the rollforward page). It is important that the implementation distinguishes this case in all kernel-entry paths, as there is no safe IRT after the first IRT. In an early and erroneous implementation I did encounter this problem, which could result in a freeze of the whole machine: the kernel disabled interrupts again even in the case that the rollforward code had already executed the trap and was interrupted on the final return statement. Of course, after that return, there was no additional IRT that would be hit again, and execution returned to arbitrary user code with interrupts disabled. Furthermore, the return statement in the noninstrumented version needs to be aligned with the IRT in the instrumented version to catch interrupts that occur directly on the return.

An alternative to the preceding scheme works as follows: No NOP is inserted before the return statement. Instead, the return statement in the instrumented version is replaced by HLT. One additional memory page is required, which only contains a single return statement. The user-space instruction pointer is pointed to this return statement (i. e., outside of the actual rollforward area) directly before returning from the trap handler in the kernel.


That way, the final return statement cannot be misinterpreted as belonging to the rollforward section.

The price to pay for uninterrupted execution of user-space code is an increased interrupt latency by the time it takes to produce events. However, as long as this duration is not larger than the maximal duration for which the kernel itself runs with interrupts disabled, the maximal interrupt latency of the system is not increased. [MHH02, MHSH01] mention 85 µs and 58 µs, respectively, for older versions of Fiasco. I measured a worst-case execution time of 9.1 µs for event creation in AList sensors (cf. to Figure 5.14 in Section 5.3.3).

[Figure 4.1: Schema for atomic sequences following the combined approach: a rollback section starting at rollback_point, followed by a rollforward section starting at rollforward_point.]

4.1.4 The combined approach

In the following, I describe a combination of the previously detailed rollback and rollforward approaches ([MDP96] discusses using a rollback section for prefaulting memory pages for the succeeding rollforward section). The motivation here is twofold: First, the combined approach abstracts both former approaches into one general view, as shown in Figure 4.1. From this general picture, restrictions for specific requirements can be derived. Second, the combined approach allows sensor constructions that combine the advantages of both the rollback and the rollforward architecture.

In contrast to the freestanding rollback approach from Section 4.1.2, I distinguish two sources of interruptions that are handled differently here: asynchronous interrupts and page faults. Asynchronous interrupts are handled as in the freestanding rollforward section described previously, that is, the atomic section is not reset but resumed with disabled interrupts. This holds for both parts of the combined approach, the rollback section and the rollforward section. Only page faults in the rollback section cause a true rollback (interrupts are enabled again).

Now, the two typical classes of applications, namely real-time tasks and other programs that use pinned memory (e. g., the L4Linux kernel) on the one hand and best-effort tasks that usually do not pin all relevant memory pages (e. g., L4Linux user-space applications) on the other, can use the same sensor architecture. The rollback section is used to ensure the absence of page faults in the rollforward section. If execution reaches the rollforward point, all memory is guaranteed to be paged in and the rollforward section can proceed safely. This approach guarantees bounded execution time for real-time tasks, as no page faults will occur in the rollback section and at most one kernel entry is observed due to asynchronous interrupts.


To better understand the sources of page faults, I will now describe the three kinds of memory accesses that can occur during event production.

First, function-call parameters are transferred on the stack if the normal C calling convention is used. Simple rollforward sensors can use a register calling convention, such that no stack access is required in the rollforward code for function-parameter access. Rollback sensors need to transfer important parameters on the stack, as they must potentially recover their initial state; the x86 architecture does not provide enough registers to both store the initial state for a potential recovery and execute nontrivial event production code. The stack should not be grown inside event production code in order not to trigger page faults.

Second, the actual data to be written to sensors (the event payload) may reside on the stack or the heap. It is the responsibility of the caller to provide this data in pinned memory, ensuring that no page faults occur on accessing it. Real-time clients or the L4Linux kernel have no problem with this approach, as their stacks are pinned and the wrapper functions (e. g., for ALists) put data on the stack. Only normal Linux user-space programs and other paged clients have to be careful. Page faults could be prevented if the wrapper functions always explicitly used pinned memory, such as provided by the POSIX system call mlock().

Third, the sensor memory area itself is the target of write operations. As Ferret guarantees that this memory is always pinned, no page faults can occur here.

To summarize, page faults may occur in clients that directly try to submit data from paged memory into sensors (stack and heap) or that use complex rollforward sensors or any rollback sensors (parameters on the stack). This may be prevented either by turning all clients into semi–real-time clients using pinned memory (mlock(), mlockall()) or by prefaulting all necessary pages (in a rollback section as described previously, or by skillful placement in the stack as described in Section 4.1.6.2).
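For a paged client (e. g., an L4Linux user-space program), pinning the payload explicitly could look as follows; a minimal POSIX sketch in which the buffer name and size are chosen for illustration.

#include <stdint.h>
#include <sys/mman.h>

/* Illustrative payload buffer of a paged client. */
static uint32_t event_payload[16];

static int pin_event_payload(void)
{
    /* Pin the payload so that the sensor code cannot page-fault while
     * copying from it; mlockall(MCL_CURRENT | MCL_FUTURE) would pin the
     * whole address space instead.                                      */
    return mlock(event_payload, sizeof(event_payload));
}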

4.1.5 The implementation for Fiasco

Both the rollback and the rollforward implementations contain support for code pages in kernel space that hold the user-executable event production code, debug code, macros for detecting whether user code executes atomic sections, initialization code, and small adaptations to the build system.

The rollback implementation hooks into five kernel-entry paths. It contains about 140 lines of code (including comments, configuration options, and one assembler version of sensor access code). The rollforward implementation is an incremental patch to the rollback implementation and additionally contains code for handling delayed interrupts (especially the timer interrupt), disabling interrupts for user code, and an instrumentation trap handler. It hooks into three kernel-entry paths. The patch adds about 330 lines of code, this time including two versions of assembler sensor access code (the vanilla version and the statically instrumented version).

In summary, the implementation for rollback and rollforward sections is already compact and still has potential for optimization and shortening. I interpret this compactness also as a sign of easy portability of the mechanisms to other kernels.


Floating-point usage (this includes Intel's Multi Media Extension (MMX) and Streaming SIMD Extensions (SSE), see [Int04b]) from within atomic sections is currently not possible, amongst other things because of the way Fiasco implements floating-point–unit (FPU) multiplexing: the FPU is disabled with each thread switch and the FPU context is restored lazily on the first fault. Using the large MMX or SSE registers for event construction may seem beneficial at first sight. It would, however, produce three problems: First, it may trigger faults inside atomic sections if event production code is the first FPU user after a context switch. This problem could be solved either by implementing support for handling FPU faults inside atomic sections or by prefaulting the FPU similarly to memory pages (cf. to Section 4.1.4). As other systems (e. g., Linux) also handle the FPU lazily, relying on FPU-fault support for atomic sequences would restrict Ferret's portability. Second, FPU usage may prolong event production considerably when faults have to be resolved. Third, the FPU context would have to be saved more often by the kernel, that is, context switches would take longer if events were created; thus, Ferret's probe effect would be increased. These problems represent a considerable influence on the system, such that ruling out FPU usage from within atomic sections seems appropriate.

4.1.6 Using atomic sequences in sensors

Ferret provides implementations for a range of list sensors and some aggregating data structures. I experimented with various synchronization methods, both to evaluate the methods' feasibility and to get baseline performance numbers with methods otherwise clearly not suitable for real-time systems (e. g., blocking semaphores). I discuss two sensors in more detail in the following. AList (atomic sequence list) sensors support only fixed event sizes per sensor instance. AList variants exist for both rollforward and rollback sequences; their differences are highlighted in the next section. After that, in Section 4.1.6.2, I discuss the VList sensor, which supports variable event sizes. On the basis of this sensor, I demonstrate how a C-language implementation can be used despite the lack of control over stack accesses. I also describe variations for different execution environments. A detailed comparison of all list sensor types is given in Section 5.3.

4.1.6.1 ALists

I implemented event production code using both the rollback approach and the rollforward approach in assembler. I will describe both implementations in that order.

The rollback implementation, shown in Figure 4.2, uses a standard C calling convention with three parameters on the stack (pointer to sensor, pointer to event payload, and size of payload). It uses only one functional rollback part; the second part contains just a return statement. The first part starts by fetching the parameters from the stack into registers. It then sets up the memory-copy operation, followed by writing a timestamp into the event. The first part finishes with advancing the global head pointer in the sensor to the new element that was created before, by means of a single, atomic memory access.


; Parameters: list pointer ................ (esp+0x4) -> %eax
;             pointer to parameters to copy (esp+0x8) -> %esi
;             word count to copy .......... (esp+0xc) -> %ecx
        .p2align(12)                    ; rollback point
kern_lib_start:                         ; start at page boundary
        mov  0x4(%esp),%eax             ; get par. from stack: list ptr.
        mov  0x8(%esp),%esi             ; source area
        mov  0xc(%esp),%ecx             ; size of source area

        mov  0x1c(%eax),%ebx            ; prepare memcpy: l->count_mask
        mov  0x10(%eax),%edx            ; l->head
        and  %edx,%ebx                  ; &
        shl  $0x6,%ebx                  ; *64
        add  %eax,%ebx                  ; l + offset
        lea  0x48(%ebx),%edi            ; dest: &l.data[offset].data

        rep movsl %ds:(%esi),%es:(%edi) ; do memcpy

        incl %edx                       ; ++edx
        mov  %edx,%ecx                  ; save new head
        lea  0x10(%eax),%esi            ; &l->head

        rdtsc                           ; get timestamp
        mov  %eax,0x40(%ebx)            ; store in event: l.data[offset]
        mov  %edx,0x44(%ebx)
        jmp  1f                         ; mind the gap, compiler will pad here
1:      mov  %ecx,(%esi)                ; save new l->head
        .p2align(6)                     ; rollback point
        ret
        .p2align(12)                    ; prevent exporting other kernel memory

Figure 4.2: Assembler rollback implementation for AList event production code.

In contrast to the rollback implementation, the rollforward implementation does not copy parameters from the stack. It uses a register calling convention similar in concept to gcc's regparm(3) calling convention [GC304, GC405]. However, I have to use different registers, as I want to transfer parameters directly in those registers that are used in the atomic section. For instance, on the x86 architecture, the registers ESI and EDI must be used as pointers for string memory copy, but regparm uses EAX, EDX, and ECX. Figure 4.3 shows the noninstrumented version. For the instrumentation, the RET is replaced with a trapping opcode (HLT) and is relocated to the NOP. Please note that, for simplicity, I show here only code for 32-bit head pointers (i. e., without additional tricks in the consumer code, only 2^32 events can be supported). Support for 64-bit head pointers in rollforward code is straightforward: the new head pointer is written with two 32-bit MOVs, whereas rollback code would have to use the CMPXCHG8B instruction to execute the final write atomically [Int04c].


; Parameter:  list pointer .................. %eax
;             pointer to parameters to copy .. %esi
;             word count to copy ............. %ecx
        .p2align(12)
rollforward_start:
        mov  0x1c(%eax),%ebx            ; prepare memcpy: l->count_mask
        mov  0x10(%eax),%edx            ; l->head
        and  %edx,%ebx                  ; &
        shl  $0x6,%ebx                  ; *64
        add  %eax,%ebx                  ; l + offset
        lea  0x48(%ebx),%edi            ; dest: &l.data[offset].data

        rep movsl %ds:(%esi),%es:(%edi) ; do memcpy

        incl %edx                       ; ++edx
        mov  %edx,%ecx                  ; save new head
        lea  0x10(%eax),%esi            ; &l->head

        rdtsc                           ; get timestamp
        mov  %eax,0x40(%ebx)            ; store in event: l.data[offset]
        mov  %edx,0x44(%ebx)

        mov  %ecx,(%esi)                ; save new l->head
        ret                             ; "hlt" in the instrumented version
        nop                             ; "ret" in the instrumented version
        .p2align(12)                    ; prevent exporting other kernel memory

Figure 4.3: Assembler rollforward implementation for AList event production code.

I provide wrapper functions for diverse variations of event payload that use normal C calling conventions. These wrappers themselves can and probably will be inlined into instrumented programs.

Optimization experiments I noticed that, not surprisingly, the stack layout of the arguments for the wrapper function (e. g., major and minor type, instance ID, application-specific data, see Figure 4.4) closely resembles the event body's layout. The exception is that the two 2-byte parameters are 4-byte aligned on the stack (they are padded), as per the C calling convention [JR81]. Now, the optimization idea is to avoid realigning the parameters in a local data structure and instead directly use the function-call parameters to the wrapper function on the stack. I conducted a short experiment in which the wrapper function's signature was changed to receive the two 2-byte parameters merged into one 4-byte parameter to achieve the required data layout.


void ferret_alist_post_2w(ferret_alist_t * l, uint16_t maj, uint16_t min,
                          uint16_t instance, uint32_t d0, uint32_t d1)
{
    ferret_list_entry_common_t e;

    e.major     = maj;
    e.minor     = min;
    e.instance  = instance;
    e.data32[0] = d0;
    e.data32[1] = d1;

    asm volatile (
        "cld\n\t"
        "push %%ecx\n\t"
        "push %%esi\n\t"
        "push %%eax\n\t"
        "call __ferret_alist_post\n\t"
        "add  $0xc,%%esp\n\t"
        :
        : "S" ((&(e.timestamp)) + 1), "a" (l), "c" (4)
        : "memory", "edi", "ebx", "edx", "cc"
    );
}

Figure 4.4: AList post-routine wrapper with a payload of two 32-bit words.
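For illustration, the experimental variant could look roughly like the following sketch. The function name, the packing of the two 16-bit values, and the reuse of the call sequence from Figure 4.4 are assumptions (the thesis does not reproduce the experimental code); the sketch also relies on the cdecl convention placing the arguments contiguously on the stack.

/* Types and __ferret_alist_post as in Figure 4.4 (Ferret headers assumed).
 * maj and min arrive pre-merged in one 32-bit argument, so maj_min, instance
 * (padded to 4 bytes by the calling convention), d0, and d1 already lie on
 * the stack in the event-body layout and are copied from there directly.    */
void ferret_alist_post_2w_merged(ferret_alist_t * l, uint32_t maj_min,
                                 uint16_t instance, uint32_t d0, uint32_t d1)
{
    asm volatile (
        "cld\n\t"
        "push %%ecx\n\t"
        "push %%esi\n\t"
        "push %%eax\n\t"
        "call __ferret_alist_post\n\t"
        "add  $0xc,%%esp\n\t"
        :
        : "S" (&maj_min),       /* source: the argument area itself */
          "a" (l), "c" (4)      /* 4 words, as in Figure 4.4        */
        : "memory", "edi", "ebx", "edx", "cc"
    );
}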

Surprisingly, almost no cycles are saved (with small variations depending on the actual test machine, cf. to Table 5.2). The explanation is that the compiler anticipates this optimization itself: the normal wrapper functions are usually inlined into instrumented code, and the event structure is constructed only once, directly inside the instrumented function.

4.1.6.2 VLists

In contrast to the AList sensor, the VList sensor supports variable event sizes. Sensor access code is therefore more complex, and high-level language support (e. g., C) is desirable. I describe here a rollforward implementation for the VList sensor (a slightly simplified version is shown in Figure 4.5). In contrast to rollback points, no alignment restrictions apply here. However, no calls to other functions may be made, and page faults must not be triggered from within sensor access code. The first problem can be solved by enforcing the inlining of all used functions (e. g., memcpy). With C-language code, stack growth can only be controlled in a limited way, that is, in addition to the page-fault sources for the assembler implementation of AList sensors, the stack is a new source of page faults. Besides explicit stack usage through local variables in C, standard calling conventions define several callee-saved registers (cf. to [MHJM09, San97]). In addition to that, the compiler uses the stack as a spill area in case of register pressure.


void ferret_vlist_postf(ferret_vlist_t * l, size_t len, ferret_list_entry_t * ev)
{
    size_t * dest;
    size_t remaining;

restart:
    // increment tail until we have room for the current event
    while (((((l->tail.low & l->size_mask) + l->size - 1) - l->head)
            & l->size_mask) < len + sizeof(size_t))
    {
        size_t temp_size;
        temp_size = *((size_t *)((l->tail.low & l->size_mask) + &(l->data[0])));
        l->tail.value += temp_size + sizeof(size_t);  // forward one event
    }

    remaining = l->size - l->head;
    dest = (size_t *)(l->head + &(l->data[0]));       // address for next event

    // care for wrap-around: insert filler event
    if (remaining < len + sizeof(size_t))
    {
        *dest = remaining - sizeof(size_t);           // size prefix
        if (remaining >= sizeof(size_t) + sizeof(ferret_utime_t))
        {
            // a timestamp of 0 implies a fill event
            *((ferret_utime_t *)(dest + 1)) = 0ULL;
        }

        l->head = 0;
        goto restart;                                 // retry
    }

    *dest++ = len;                                    // write size prefix
    *((ferret_utime_t *)dest) = rdtsc();              // timestamp
    memcpy(dest + 2, &ev->data, len - sizeof(ferret_utime_t));  // payload

    // finally update head
    l->head = (l->head + sizeof(size_t) + len) & l->size_mask;
}

Figure 4.5: Simplified C code for the VList sensor post routine.

Although maximal stack usage cannot be determined for general functions, sensor access code tends to be compact and manageable in that regard. By inspecting the gcc-generated assembler code, I could easily determine the maximal stack usage for the VList sensor code (24 bytes together with the call-pushed return address, cf. to Figure 4.6). I adapted the calling convention slightly to prefault potentially new stack pages directly before the actual call, as shown in Figure 4.7.


ferret_vlist_write:
    pushl %ebp          ; callee-saved register 1
    pushl %edi          ; callee-saved register 2
    pushl %esi          ; callee-saved register 3
    pushl %ebx          ; callee-saved register 4
    subl  $4, %esp      ; room for local variables and register spill
    ...

Figure 4.6: Initial part of the generated assembler code for the C implementation of the VList post routine.

asm volatile (
    "pushl %2\n\t"              // standard parameters
    "pushl %1\n\t"              // ...
    "pushl %0\n\t"              // ...
    "or    $0,-24(%%esp)\n\t"   // prefault stack
    "call  __ferret_vlist_postf\n\t"
    "add   $0xc,%%esp\n\t"      // fixup stack
    : "=a" (dummy), "=c" (dummy), "=d" (dummy)
    : "r" (l), "r" (offsetof(ferret_list_entry_common_t, data32[2])), "r" (&e)
    : "memory"
);

Figure 4.7: Inline-assembler fragment with a stack-prefaulting calling convention for VList sensor code.

This approach has two limitations: First, the maximal stack usage of the generated code must be determinable, which is, however, no real limitation in this context, as more complex functions would not be suitable as sensor accessors anyway. Second, the calling convention used for invoking such sensor code must be handwritten, as the prefaulting must be done close to the actual call. Otherwise, the stack pointer might be modified by compiler-generated code, leading to wrong prefault addresses. Furthermore, for functions with large stack requirements (i. e., more than one memory page), several prefault instructions are required. As an alternative to a handwritten calling convention, binary instrumentation could also be used to insert the prefaulting instructions directly before the relevant calls.

The advantages of this approach are twofold: writing sensor code in a high-level language such as C saves much time and effort, and compiler optimizations can be fully exploited, unlike with handwritten assembler code.

Calling convention for clients without pinned stacks The previously described approach targets real-time programs with pinned but not prefaulted stacks (stacks stay paged in after the first fault). Normal applications in legacy containers may not be able to pin their stack pages, so memory may still be paged out between the prefaulting and the actual call.


asm volatile (
    "mov   %%esp,%%ebx\n\t"         // save old stack pointer
    "and   $0xfffff000,%%esp\n\t"   // page-align stack
    "pushl %%ebx\n\t"               // save old stack pointer pt. 2
    "pushl %2\n\t"                  // standard parameters
    "pushl %1\n\t"                  // ...
    "pushl %0\n\t"                  // ...
    "call  __ferret_vlist_postf\n\t"
    "movl  0xc(%%esp),%%esp\n\t"    // restore old stack pointer
    : "=a" (dummy), "=c" (dummy), "=d" (dummy)
    : "r" (l), "r" (offsetof(ferret_list_entry_common_t, data32[0])), "r" (&e)
    : "memory", "%ebx"
);

Figure 4.8: Inline-assembler fragment with a page-aligned–stack calling convention for VList sensor code.

By aligning the stack pointer to a new page boundary before pushing function parameters and before the actual call, every stack access in the target function will only touch a single memory page (provided that the function's own stack usage, including the function parameters and the call-pushed return address, stays below the memory-page size). The actual call is the last machine instruction executed before the rollforward section; it definitely prefaults the target stack page, as it pushes the return address onto the stack. The code in Figure 4.8 demonstrates this approach.

If the client also has no way of providing the actual event payload in pinned memory, this data, too, can be moved onto the single stack page by carefully choosing the address of the local event structure. This approach is shown in Figure 4.9.

To conclude, even complex sensor access code can be used efficiently in rollforward sections (by writing C code and reusing the compiler-generated assembler code). This works even for non–real-time clients that have no control over page faults whatsoever.

4.2 General-purpose instrumentation placement

In this section, I will discuss where general instrumentation should be placed, that is, instrumentation not aimed at a specific monitoring project or task but intended to be reusable for various purposes, for example, using external schemata. There are several general considerations to keep in mind for finding such instrumentation locations. The abstraction level should be chosen as high as possible to make maximal use of semantic information. Choosing a high abstraction level also results in lower instrumentation overhead, as event counts are lower. Event payload, on the other hand, is likely to be more complex and diverse on higher levels. High-level instrumentation may also lead to over-simplification, in that relevant details are not captured. The number of instrumentation points may be a relevant criterion as well. System-call–level instrumentation is a common compromise for monolithic operating systems (e. g., used in [YLW+06]), as some semantics is available at this level and existing instrumentation solutions can be reused (e. g., the ptrace interface).

This section is partitioned into two subsections in which I detail my experience with two specific microkernel systems: Drops on L4 and Singularity.


L4_INLINE void ferret_vlist_post_2wX(ferret_vlist_t * l, uint16_t maj,
                                     uint16_t min, uint16_t instance,
                                     uint32_t d0, uint32_t d1)
{
    // allocate on top of next stack page, 128 B reserve for local allocations
    ferret_list_entry_common_t * e = (ferret_list_entry_common_t *)
        (((((unsigned long)__builtin_frame_address(0)) - 128U) & L4_PAGEMASK)
         - sizeof(e));

    e->major     = maj;   // initialize e on new stack page
    e->minor     = min;
    e->instance  = instance;
    e->data32[0] = d0;
    e->data32[1] = d1;

    asm volatile (
        "movl  %%esp,%%ebx\n\t"     // save old stack pointer
        "movl  %2,%%esp\n\t"        // align to new stack frame
        "pushl %%ebx\n\t"
        "pushl %2\n\t"              // standard parameters
        "pushl %1\n\t"              // ...
        "pushl %0\n\t"              // ...
        "call  __ferret_vlist_postf\n\t"
        "movl  12(%%esp),%%esp\n\t" // restore old stack pointer
        : "=a" (dummy), "=c" (dummy), "=d" (dummy)
        : "0" (l), "1" (offsetof(ferret_list_entry_common_t, data32[2])), "2" (e)
        : "memory", "ebx"
    );
}

Figure 4.9: C wrapper that moves all data into a single stack page before calling the actual VList sensor code. This simplified implementation assumes that the local stack below the current stack pointer cannot be corrupted, for example, by asynchronous signal handlers. Otherwise, the protective setting of the stack pointer to the address of e and the initialization of e would have to be done in assembler, in that order.

4.2.1 Instrumentation of Drops on L4

L4 allows rich and untyped communication between processes; in fact, IPC is one of the core building blocks of L4. Hardware interrupts, exceptions, and other error conditions (e. g., scheduler time-slice overruns [Ste04]) are also delivered in the form of IPC. In addition, shared memory can be established and used for communication between processes. IPC payload is mostly opaque to the kernel (except for kernel-generated IPCs, of course), that is, the semantics of the message content depends on the receiver's interpretations1.

Fiasco already allows tracing of relevant IPC metadata in the kernel's tracebuffer [Wei03]. Interfaces for many components on Drops are specified with IDL (Interface Definition Language) and interface code is generated by the IDL compiler Dice [Aig07]. These interfaces provide much more semantics for instrumentation. Döbel implemented a Dice plugin for augmenting Dice-generated stub code with Ferret instrumentation [Döb06]. However, not all components use IDL for describing their communication interface. Furthermore, for not-rigidly componentized programs or the para-virtualized Linux kernel L4Linux and its programs, I had to use application-specific instrumentation locations.

There is no single location for Drops that allows easy, general-purpose instrumentation. In practice, I instrumented Drops in several places or used sensors from different levels for a complete system picture. Fiasco's tracebuffer forms the lowest level. IDL-embedded instrumentation and manual instrumentation of several L4Env services comprise the next level. The L4Linux kernel is instrumented itself as a regular L4Env program. It also acts as a proxy for its user-space programs in that it provides them with access to a dedicated Ferret sensor. A few problem-specific, high-level instrumentation points often have to be embedded into user-space programs.

4.2.2 Implementing Event Tracing for Singularity

At first sight, Singularity looks like a system made for instrumentation. Shared memory is never used between applications. Almost all cross-application communication happens through strongly typed and stateful interfaces. Already the interface and message names carry high-level semantics (e. g., R_CreateUdpSession in the NetStack interface). All interface code is compiled directly into CIL bytecode from the interface description. There is no intermediate representation to instrument at source-code level, as there is with Dice for Drops.

Instead of modifying the Sing# compiler, for a first prototypical implementation I chose the approach of instrumenting directly at the CIL-bytecode level. For this, assemblies first have to be disassembled to a textual form using ildasm.exe. A Perl script then inserts a CIL–source-code template shown in Figure 4.10. I then reassembled the instrumented textual representation to binary form using ilasm.exe. By now, the Sing# compiler can directly generate instrumented code.

For synthesizing requests for Singularity's web server Cassini, I unfortunately also needed further instrumentation in the kernel2. I also used a single manual instrumentation point in the web server to determine the start of new requests. For convenient orientation in trace files, I also captured interactive shell command lines.

1 This has changed slightly with the introduction of message tags for IPC in newer experimental Fiasco versions. However, I do not examine this in the context of this work.
2 Interrupt delivery; channel creation, transportation, and connection; low-level synchronization primitives: AutoResetEvent, ManualResetEvent, Mutex, WaitHandles; thread creation and blocking with timeouts.


IL_${name}: ldc.i4 ${curcontract}
IL_XYZ1:    ldc.i4 ${msgid}
IL_XYZ2:    ldc.i4 0
IL_XYZ3:    ldarg.0
IL_XYZ4:    call instance int32 [Singularity.V1]Microsoft. \
                Singularity.V1.Services.EndpointCore:: \
                get_ChannelID()
IL_XYZ5:    ldc.i4 0
IL_XYZ6:    ldc.i4 0
IL_XYZ7:    ldc.i4 0
IL_XYZ8:    ldc.i4 0
IL_XYZ9:    call void [${kernel}]Microsoft.Singularity \
                .Monitoring::Log(unsigned int16, \
                unsigned int16, unsigned int16, \
                unsigned int32, unsigned int32, \
                unsigned int32, unsigned int32, \
                unsigned int32)
IL_XYZa:    ret

Figure 4.10: One template used for instrumenting compiled Singularity code at CIL-bytecode level.

4.3 Portability and tracing special environments

In this section, I will briefly describe Ferret’s architecture taking portability into account. Good portability allows comparative studies of different systems as Ferret can provide easily comparable traces. I will also briefly discuss particularities of Ferret versions for plain L4, L4Env, the L4Linux kernel and its user-space programs, the Linux kernel, and Singularity in Section 4.3.2.

4.3.1 Portability and architecture

Ferret's core functionality is separated into several libraries. Usual sensor access code uses only shared memory and therefore is completely API independent and portable3. There are three roles for sensor access: setting up and destroying sensors, producing events, and consuming events. These roles are executed by three distinct parties: the sensor directory, monitored programs, and monitors. Therefore, the functionality is provided in three separate libraries.

Setting up access to sensors and releasing access is specific to monitored programs and monitors and their environment. Therefore, distinct communication libraries are used for this purpose. These libraries have to be adapted to the target environment of the respective entity.

Auxiliary functionality, like parsing event–format-string descriptions, format conversions, and compression, is encapsulated in a third library class.

3 Experimental sensors with blocking semaphores are an exception to this rule (cf. to Section 4.1.6).


4.3.2 Portability and versions

Ferret's L4Env version is the technically most mature version. All sensor types are available and the sensor directory is fully supported. Sensor memory management is integrated with L4Env's region mapper, and pinned memory is provided in the form of dm_phys dataspaces.

In Ferret's version for plain L4 programs (e. g., the name server names), specific sensor-memory placement has to be done. In addition, libraries for dataspace mapping have to be linked.

The L4Linux kernel is a regular L4Env program as far as Ferret is concerned; consequently, no special handling is required. L4Linux user-space programs have no access to the sensor directory for creating sensors. For that reason, I use the L4Linux kernel as a proxy. The kernel currently provides a sensor for its user-space programs simply by mapping it into a dedicated region of their address spaces. Of course, a more elaborate scheme could be implemented that provides the complete sensor-directory namespace to L4Linux user-space programs (similarly to relay [ZYW+03]) and that allows full access to single sensors using the mmap system call.

Ferret has also been ported to the Linux kernel. Monitors obtain access to sensors via mmap on special files. Unmodified event production and consumption code can be used.

For the implementation of the ETS (Event Tracing for Singularity) prototype, I chose to initially use only a single system-wide list sensor. As all user code in Singularity is managed code, access to sensor memory can be restricted without using system calls. Instead, I could provide safe access code without system-call overheads system wide (also cf. to Section 4.2.2).

For Ferret, I use a trace file format that is derived from ETW's trace files, such that the adapted Magpie version can directly import it. The format is basically a simple concatenation of events in their binary representation, prefixed with their respective length. The file format also supports meta-events preceding larger chunks of simple events. These meta-events carry information about the trace and the target machine. The binary layout of events is described in trace-message–format (TMF) files, which follow ETW's TMF event description in syntax (cf. to Figure 3.2 for an example).
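Reading back such a file can be sketched as follows; the 32-bit length prefix is an assumption for illustration, and meta-events as well as the TMF-described event layout are not interpreted here.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal sketch: iterate over a file of length-prefixed binary events. */
static int dump_event_sizes(const char *path)
{
    FILE *f = fopen(path, "rb");
    uint32_t len;

    if (!f)
        return -1;

    while (fread(&len, sizeof(len), 1, f) == 1) {
        unsigned char *event = malloc(len);

        if (!event || fread(event, 1, len, f) != len) {
            free(event);
            break;
        }
        printf("event of %u bytes\n", (unsigned)len);
        free(event);
    }

    fclose(f);
    return 0;
}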

Chapter 5 Evaluation

A man should look for what is, and not for what he thinks should be. (A. Einstein)

I start this chapter with a short recapitulation of the previous chapters’ contributions, followed by an overview of this chapter’s content. Chapter 2 contains a discussion of related work and shows that there is no solution in the literature that combines all relevant properties for runtime monitoring for open real-time systems (cf. to Section 3.1.2). Chapter 3 explains how these properties can be achieved from an architectural standpoint. The succeeding Chapter 4 explains the implementation of the previously presented design and highlights specific techniques. The current chapter answers three main questions:

1. It answers the question whether a broad set of problems can be addressed with Ferret using a single monitoring approach. In the following section, I therefore describe several groups of use cases (model building, behavior checking, and others) for which Ferret (or its predecessors) was used.

2. In the second part, in Section 5.2, I discuss qualitative properties of different sensor approaches and compare the approaches with regard to those properties to determine which approaches are suitable for which environments.

3. To compare the costs caused by using the different sensor approaches, the third part in Section 5.3 contains a detailed quantitative evaluation of Ferret running in different scenarios, using microbenchmarks as well as macrobenchmarks.

5.1 Use cases

In this section, I will describe various use cases for which Ferret (or its predecessors) has been used. I partition these cases into three groups: In Section 5.1.1, I describe cases for which establishing behavioral models using measurements taken with Ferret is the main objective. Section 5.1.2 explains cases for which the adherence or nonadherence to assumed system properties has to be assessed. A description of how to collect dynamic call-graph data from device drivers and a discussion of Event Tracing for Singularity comprise Section 5.1.3. I briefly summarize important peculiarities for each case.


for (j = dy + 1; j--; ) {
    d = (u32 *)dst;
    s = (u32 *)src;
    for (i = dx + 1; i--; )
        *(d++) = *(s++);    /* copy line */
    src += scr_width;
    dst += scr_linelength;
}

Figure 5.1: DOpE’s inner copy routine.

5.1.1 Model building

I use the term model building here for gathering information about real, existing software and hardware, and for using this information to craft behavioral models that allow predicting the timing behavior of these software–hardware combinations. I discuss four specific use cases in this section. The section starts with a description of modeling the resource demand of the inner copy routine of Drops’ Desktop Operating Environment (DOpE) [FH03], that is, the resource requirements of a small part of a real-time component are analyzed. Following that, in Section 5.1.1.2, I explain how larger requests, spanning several components, can be analyzed. Section 5.1.1.3 discusses model building for a real-time–capable disk scheduler. A description of how runtime monitoring was applied in a real-time video-player scenario follows in Section 5.1.1.4.

5.1.1.1 DOpE resource demand modeling

With DOpE, Drops has a real-time display component, which can guarantee refresh rates for a requested rectangular area. DOpE keeps track of average and worst-case times for copying pixels from a shared-memory representation to graphics memory. The time estimation for the copy routine is based on the size of the area to be copied (pixel count). In the following, I will describe how I used Ferret to verify this estimation.

The foundation of this estimation is DOpE’s inner copying routine (shown in Figure 5.1), which I instrumented to measure the time in processor cycles for copying rectangular areas. I compute the time per pixel in place and store the information in a two-dimensional histogram indexed by the width and height of the rectangle. The histogram has two layers as well, where the first layer contains the accumulated copy time and the second layer counts the number of occurrences for this width–height combination. Further processing of this data is done offline or, for online visualization, asynchronously in another task (monitor). The instrumentation code contains only cheap integer operations and no division or floating-point operations.

I directly aggregate the information online to minimize the memory load in this experiment, as I am taking a huge number of measurements. Redraw requests are created randomly with uniformly distributed width and height between 1 and 400 pixels using a small benchmark program. The benchmark runs until each point is measured at least 100 times (this usually takes several hours on the test machines described below). Then the average copy times are computed.
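A minimal sketch of this style of instrumentation is shown below. It assumes the x86 timestamp counter and a plain two-layer array as histogram; all names and the surrounding function are illustrative placeholders rather than DOpE’s or Ferret’s actual code, and per-pixel values are left to the offline processing.

#include <stdint.h>

#define HIST_MAX 400                       /* width/height range 1..400   */

static uint64_t hist_time[HIST_MAX][HIST_MAX];   /* accumulated cycles    */
static uint32_t hist_count[HIST_MAX][HIST_MAX];  /* number of samples     */

static inline uint64_t rdtsc(void)         /* processor timestamp counter */
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* dx and dy are the rectangle dimensions minus one (cf. Figure 5.1) and
 * are assumed to stay below HIST_MAX here.                               */
void copy_rect_instrumented(int dx, int dy /* , ... original arguments */)
{
    uint64_t t0 = rdtsc();

    /* ... original inner copy loop from Figure 5.1 runs here ...         */

    uint64_t t1 = rdtsc();

    /* Index computation and aggregation happen after the observed
     * section; only cheap integer operations, no division.               */
    hist_time [dx][dy] += t1 - t0;
    hist_count[dx][dy] += 1;
}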


I took measurements on the following two machines:

Machine A has an AMD Duron processor clocked at 1 200 MHz. First- and second-level caches have 64-B cache lines. The machine has a 64-KiB level-1 instruction cache (2-way associative), a 64-KiB level-1 data cache (2-way associative), and a 64-KiB level-2 unified cache (8-way associative).

Machine B has an older Intel Pentium Pro processor clocked at 200 MHz. First- and second-level caches have 32-B cache lines. The machine has an 8-KiB level-1 instruction cache (4-way associative), an 8-KiB level-1 data cache (2-way associative), and a 256-KiB level-2 unified cache (4-way associative).

In the experiments depicted in Figure 5.2, one can see that the fixed per-pixel-cost assumption is a viable approximation for a large range of rectangle sizes. However, deviations can be seen in several places:

1. The mountainous area on the left side results from copy operations on rectangles with a small width. The huge increase in per-pixel copy times for very short pixel rows probably stems from a combination of the overhead for computing the pixel row addresses for source and destination buffer (loop overhead in Figure 5.1) and memory-system access latencies.

2. There are small trenches parallel to the height axis, corresponding to the cache-line size. Copying whole-numbered multiples of the cache-line size reduces the overhead per pixel.

3. There is a valley in the front corresponding to the total processor cache size. The measurements depicted in Figures 5.2a, 5.2b, and 5.2c were taken with a horizontal screen resolution of 1 024 pixels, resulting in aliasing effects with the cache colors¹. The addresses of a pixel column contain only few index-bit combinations and therefore only few different cache lines can be used by a pixel row. This leads to thrashing its own cache set when copying areas with more than a certain height, independently of the width of the rectangle (therefore the valley has a roughly rectangular shape). Machine A has a data cache of 128 KiB (L1 and L2 added, as the cache hierarchy is exclusive). Running with a graphics mode of 1 024 pixel/line and 2 B/pixel results in a cache-thrashing height of 64 lines, computed as follows:

128 KiB/(2 B/pixel * 1 024 pixel/line) = 64 lines (5.1)

This computed cache-thrashing height can be seen in Figures 5.2a and 5.2b. The most prominent difference between Figures 5.2b and 5.2d is that the data for Figure 5.2d was taken with an 800x600 resolution, whereas the data for Figure 5.2b was measured with a 1024x768 resolution. In Figure 5.2b, the cache-thrashing line

1 Addresses belong to the same cache color if they compete for the same index in the cache, that is, if they have the same index bits, cf. to [Dil,Lie96a].


(a) Basic experiment with MTRRs disabled (Machine A). (b) MTRRs are used to enable caching for the framebuffer region, which results in an approximately three-fold speedup.

(c) Also with MTRRs enabled but on Machine B, the memory-access profile looks quite different. (d) The same setup as in Figure 5.2b but the resolution is reduced to 800x600, eliminating the cache-aliasing effect.

Figure 5.2: Depicted is the time for copying a single pixel to graphic-card memory, depending on the width and height of the rectangular area copied. The width and height axes range from 1 to 400 pixels each; the time axis shows relative times for each machine, so you can compare 5.2a, 5.2b, and 5.2d quantitatively (Machine A).

can be clearly seen (parallel to the width axis), whereas in Figure 5.2d it is gone, as there is no aliasing between cache colors and pixel columns. Machine B has an effective data-cache size of 256 KiB (the maximum of L1 and L2, as the cache hierarchy is inclusive). Using Equation 5.1 with otherwise equal settings, I compute a cache-thrashing height of 128 lines, which can also be seen in Figure 5.2c.

For comparison, I also ran the measurements with memory-type range registers (MTRRs) disabled, as this was the initial situation when DOpE was created in the year 2002. One can see that enabling MTRRs for the framebuffer area results in an approximately three-fold speedup (compare Figures 5.2a and 5.2b). However, the relative differences in the histogram have grown with enabled MTRRs as well (compare the mountainous area on the left with the height of the flat area in the middle and on the right side), making the fixed–per-pixel-cost assumption questionable. In the measurements taken on Machine B, the differences in the per-pixel copy times contrast even more strongly, as depicted in Figure 5.2c.


From the experiments one can see that a typical optimization for throughput (e. g., using MTRRs) hurts predictability and thus might create problems for real-time applications. However, an execution-time model based not only on the area size but also on the length of both rectangle sides seems feasible from the measurements shown. Model calibration to a target machine could happen at component deploy time or, if only a few measurements are required, at startup time. Also, refining the model online seems practical for soft–real-time problems.

Summary of experiment properties  The scenario described previously has very short sections to be timed accurately. I decided to use only online aggregation here, as other alternatives would be too intrusive or too expensive. Storing events in memory for later evaluation would require too much memory² and drastically saturate the memory subsystem (which is a central part of the to-be-observed system here). Online processing in an external monitor process would also have required touching the same amount of memory over the experiment and would require additional scheduling in the system, thereby also biasing the results. I therefore implemented an online-aggregating sensor in the form of a two-layered histogram.

The measurement and instrumentation is straightforward in this example. The critical block of code is wrapped within timestamping macros. After the second timestamp is taken, the offset into the histogram is computed and the measured time duration is inserted into the histogram. Consequently, the influence on the code to be measured is minimized in several ways: nothing is modified inside the code section to be observed, timestamps are taken with minimal overhead (using the processor timestamp counter), durations are aggregated online in order to minimize sensor size (cache pollution), and the small computation for indexing into the histogram is done after the observed section, almost directly before DOpE’s event handler finishes, after which typically a context switch takes place.

5.1.1.2 DOpE requests

This project started with the following three questions:

What exactly happens in the system when I move the mouse cursor in DOpE? Which parts of the system interact, and how? How expensive is this?

By means of the cross-component activities in a DOpE desktop setup, I want to demonstrate the request concept (cf. to Section 2.3.1). To recapitulate, requests are used to track chunks of work in a system, where typically more than one component is involved and chunks are not trivially small. Furthermore, requests are typically user defined and problem specific. Also, as they usually span more than one component, their development is not coupled tightly with the components’ development. Concrete request definitions can, for example, arise from debugging needs or from questions such as the ones posed above.

2 (400 pixel)² * 100 measurements/pixel² * 2 events/measurement * 64 B/event ≈ 1.907 GiB


The environment for this experiment consists of basic L4Env services and DOpE as a graphical user interface. One could easily make this setup more complex by adding L4Linux with an X server, as Ferret also supports monitoring L4Linux user-space programs. However, this would only complicate the following explanation without providing added value, so I refrain from doing so. The setup, as it is, already contains four types of causality flows: normal synchronous IPC, interrupt IPC, time-driven activation in DOpE threads (polling), and events in shared-memory buffers.

The first step to answering the questions posed above is determining which components are involved in the concrete interaction at all. This can usually be answered by a combination of domain-expert knowledge (I started with some initial idea which components would be involved) and basic communication monitoring. The setup in its simplest form comprises DOpE itself, the libraries input and omega0, the l4io server, and the Fiasco microkernel. This assumes that system setup is finished, so there will be, for instance, no page-fault interaction with memory managers visible in the requests.

Writing the schemata for later request extraction with Magpie is laborious and typically involves going through an instrument–run–extract–evaluate cycle several times (cf. to Section 3.2.2.2). Ideally, components are delivered with pre-marked instrumentation points at strategically relevant places, combined with documentation that allows schema writers to derive the relevant semantics. Several commercial operating systems have actually been following this path for some time now [CSL04, Micb], and also for Linux there are approaches in development that would allow doing so [Des08, DD06]. For this concrete scenario with the research system Drops, I instrumented most parts manually. The Fiasco kernel already has semantically relevant events for this scenario (generic context switches, IPC start and delivery).

Once all instrumentation is in place, the typical workflow involves the following steps: starting the test setup, activating the instrumentation, actually running the test setup, stopping the instrumentation or the recording of events, transferring the gathered trace for offline evaluation, and finally doing the offline evaluation using the schema within Magpie.

Section 2.3.1 motivated why external request definition is to be preferred over internal definition. I now complement this abstract discussion with a description of what would have to be changed for internal request definitions on the basis of the current example: First, the start of a new request must be marked within the first event of this request. From there on, the new request identifier must be propagated recursively through the system and all involved APIs. In this scenario, however, I do not know whether an event belongs to an old request, a new request, or none at all until several events later. Second, even if I could somehow manage to guess new request starts, the system would have to be modified extensively to propagate the corresponding request identifiers. I would have to change internal function-call signatures, IDL interface definitions, the kernel’s system-call ABI, and, for causality flow assisted by shared-memory buffers, the internal layout of structures. This approach could hardly be called minimally invasive. Instead, I simply post events in relevant places that may contain local identifiers. I use schemata to glue these small snippets of event sequences together externally.


In the following, I describe details about the instrumentation used here and several particularities of the schema. I use manual instrumentation for this project and modified the components identified previously as follows:

DOpE  I added sensor initialization code and inserted several events around drawing routines (esp. mouse updates) and inside input-event handling functions.

input  I wrapped function calls to omega0_request_timeout and __omega0_wait within events.

omega0  I wrapped function calls to signal_user_threads and handle_user_request within events.

l4io  I only added sensor startup code to be used by linked libraries (input, omega0). There is no instrumentation in l4io itself.

Fiasco  I activated context-switch events and IPC events in the Fiasco kernel debugger [GML06] (-jdb_cmd=JHO0+O2+I*IR+).

Please note that this set of instrumentation points is not a minimal set required for request extraction. Instead, it is the set of events I used to understand the causality flow in this setup for constructing the schema. Now that the request-schema definition is finished, the instrumentation could be reduced to the set of events actually used in the schema.

The schema, as shown in parts in Figures 5.3 and 5.4, basically contains two types of callback functions: prehandlers and handlers.

Prehandlers are called in stream order, as Magpie extracts requests from the trace file. They are used for demultiplexing the overloaded IPC-event semantics (kernel_IPC_prehandler), for splitting up compound events into single instances that better fit Magpie’s model (kernel_context_switch_prehandler), and for parsing and unpacking binary structures (kernel_IPC_res_prehandler). Prehandlers can, in particular, be used for injecting additional events into Magpie’s event stream as, for instance, done within kernel_context_switch_prehandler. In essence, prehandlers function as a kind of preprocessor.

Handlers are the actual core of schemata. Magpie calls handlers in timestamp order. To this end, Magpie uses a reorder buffer, which can be configured in size. In the current schema, I use handlers for several purposes: first, for simply making requests visible for later manual inspection (BIND_NONE); second, for seeding requests (for each request, Magpie’s flood-fill algorithm has to start with one root event); third, I use the special bind types BREAK_BEFORE and BREAK_AFTER for bounding requests, such that the flood-fill algorithm terminates eventually. Fourth, events can be bound to more than one timeline, thereby building a bridge for the request-finding algorithm. For example, in kernel_ACK_IRQ, events are bound to both their thread timeline and the interrupt timeline.

The schema shown in Figure 5.4 works mostly stateless, except that it tries to keep track of the first motion event within DOpE’s event loop (dope_B_event_loop). If there is such an event, it is used as request root (handler dope_Event_motion) and the flood-fill algorithm starts from there. This also illustrates why request definition cannot always be done at the beginning of a request (cf. to Section 2.3.1).


def kernel_IPC_prehandler(ev):
    # check for IRQ event
    if ev.getAttrValue("snd_desc") == 0 and \
       ev.getAttrValue("rcv_desc") == 0 and \
       ev.getAttrValue("dest") > 0 and ev.getAttrValue("dest") < 256:
        irqevent = event.Event(ev.guid, ev.version, 100, ev.ts)
        irqevent.name = "IRQ"
        irqevent.provider = "kernel"
        irqevent.setAttrValue("irq", ev.getAttrValue("dest") - 1)
        return [irqevent]
    # check for ACK_IRQ event
    elif ev.getAttrValue("snd_desc") == 0 and \
         ev.getAttrValue("rcv_desc") == 0xFFFFFFFF and \
         ev.getAttrValue("dest") > 0 and ev.getAttrValue("dest") < 256:
        irqevent = event.Event(ev.guid, ev.version, 101, ev.ts)
        irqevent.name = "ACK_IRQ"
        irqevent.provider = "kernel"
        irqevent.setAttrValue("irq", ev.getAttrValue("dest") - 1)
        context = long(ev.getAttrValue("context"))
        irqevent.tid = l4util.tidstr_from_context(context)
        return [irqevent]
    # normal IPC event
    else:
        context = long(ev.getAttrValue("context"))
        ev.tid = l4util.tidstr_from_context(context)
        dest = long(ev.getAttrValue("dest"))
        ev.setAttrValue("dest_tid",
            "%s" % (l4util.tidstr_from_l4_threadid_t(dest)))
        return [ev]

def kernel_IPC_res_prehandler(ev):
    # taskid and threadid from context
    ev.tid = l4util.tidstr_from_context(long(ev.getAttrValue("context")))
    ev.setAttrValue("rcv_tid",
        l4util.tidstr_from_l4_threadid_t(long(ev.getAttrValue("rcv_src"))))
    return [ev]

def kernel_context_switch_prehandler(ev):
    ev.tid = l4util.tidstr_from_context(ev.getAttrValue("context"))
    ev.setAttrValue("dest", l4util.tidstr_from_context(
        ev.getAttrValue("dest")))
    ev.setAttrValue("dest_orig", l4util.tidstr_from_context(
        ev.getAttrValue("dest_orig")))
    ev.name = "switchto"
    ev2 = ev.copy()
    ev2.name = "switchfrom"
    ev2.ts -= 1
    return [ev, ev2]

Figure 5.3: Excerpt from DOpE request schema: prehandlers.


def kernel_IPC(ev):
    key = ("tid", ev.tid)
    BIND(ev, key, ev.ts, schema.BIND_NORMAL)

    # if the snd descriptor of this event is 0xFFFFFFFF, then it is
    # an open_wait() and we must not rely on the dest_tid.
    snd_desc = int(ev.getAttrValue("snd_desc"))
    if (snd_desc == 0xFFFFFFFF):
        key = ("tid", "WAIT_ANY")
        BIND(ev, key, ev.ts, schema.BIND_NONE)
    elif (ev.getAttrValue("dest") > 0 and ev.getAttrValue("dest") < 256):
        key = ("tid", ev.getAttrValue("dest_tid"))
        BIND(ev, key, ev.ts, schema.BIND_NONE)
    else:
        key = ("tid", ev.getAttrValue("dest_tid"))
        BIND(ev, key, ev.ts, schema.BIND_BREAK_BEFORE)
    return [ev]

def kernel_ACK_IRQ(ev):
    key = ("tid", ev.tid)
    BIND(ev, key, ev.ts, schema.BIND_NORMAL)
    key = ("irq", ev.getAttrValue("irq"))
    BIND(ev, key, ev.ts, schema.BIND_NORMAL)
    return [ev]

def dope_B_event_loop(ev):
    global first_in_event_loop
    key = ("tid", ev.tid)
    BIND(ev, key, ev.ts, schema.BIND_BREAK_BEFORE)
    key = ("irq", 12)  # bind this to the mouse irq
    BIND(ev, key, ev.ts, schema.BIND_NORMAL)
    first_in_event_loop = True
    return [ev]

def dope_Event_motion(ev):
    global first_in_event_loop
    key = ("tid", ev.tid)
    BIND(ev, key, ev.ts, schema.BIND_NORMAL)
    key = ("irq", 12)  # bind this to the mouse irq
    BIND(ev, key, ev.ts, schema.BIND_NORMAL)
    if first_in_event_loop:
        ev.isrequest = True
        first_in_event_loop = False
    return [ev]

Figure 5.4: Excerpt from DOpE request schema: handlers.


Figure 5.5: Histogram of the processor-time demand for a single DOpE request (x-axis: processor time in 1 000 cycles, ranging from 0 to 1 400; y-axis: absolute frequency).

Figure 5.5 shows a histogram of the processor-time demand for DOpE requests. I adapted the resource-demand computation in Magpie to L4 as follows: thread timelines are treated in a special way, as they correspond directly to processor time when scheduled. Magpie basically adds up all segments of thread timelines that belong to a given request and that are bound to a processor (are scheduled). The premise here is that the schema correctly identifies request boundaries and scheduling. Given that, accurate cross-component resource accounting is possible. Figure 5.6 shows the Magpyvis visualization of a single DOpE request as identified by the schema.

In Figure 5.5 we see that most DOpE requests take approximately 200 000 cycles. However, there are also two small clusters at 900 000 and 1 100 000 cycles. Given this accounting facility, one can now generate resource-demand models for requests with both average-case and worst-case information. Identifying the requests with the largest execution time and inspecting them for anomalies and optimization opportunities also becomes possible.
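To make the accounting rule concrete, the following sketch (in C, with invented types) sums the scheduled segments attributed to one request. It only illustrates the idea; Magpie’s actual implementation operates on bound timeline segments inside its Python schema machinery.

#include <stdint.h>
#include <stddef.h>

/* One scheduling segment of a thread timeline: the thread was running on
 * a processor from 'start' to 'end' (timestamps in cycles).              */
struct segment {
    uint64_t start;
    uint64_t end;
    int      request_id;  /* request this segment is bound to, -1 if none */
};

/* Sum the processor time of all segments bound to a given request.       */
static uint64_t request_cpu_demand(const struct segment *segs, size_t n,
                                   int request_id)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        if (segs[i].request_id == request_id)
            total += segs[i].end - segs[i].start;
    return total;
}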

Summary of experiment properties  For the current scenario, only small events are required (typical payload of 1–2 integers and one thread ID). Accurate timestamps are required for a subset of the events, namely those that denote request boundaries, as resource accounting has to be done. For the other events, at least a global ordering is important so as to infer causality. Low-intrusiveness instrumentation is required (as performance evaluation is desired). Several system parts have to be instrumented: the microkernel, libraries, and programs. Several sensors are used in the system, and event producers can run concurrently. In the current, general instrumentation situation, approximately 90 events are created per request. This number can be reduced by dropping instrumentation points not relevant for this specific problem. An online monitor retrieves, orders, and stores these events for offline processing with a Magpie schema.


Figure 5.6: Shown is a single DOpE request containing two mouse interrupts, the resulting interactions in the system, and a single, delayed redraw operation in DOpE. The time axis shows relative time in ms and is stretched and compressed in parts to enhance readability. Horizontal lines are timelines and vertical lines represent events; the dots show where events are bound to timelines. [Figure: Magpyvis timeline view with kernel (IPC, IPC_res, IRQ, ACK_IRQ, switchto/switchfrom), DOpE, l4io, and input/omega0 events on processor (cpu,0), interrupt (irq,12), and thread timelines.]


5.1.1.3 Hard-disk–request modeling

The goal of this project is to build a real-time–capable hard-disk–request scheduler, which should not only be able to guarantee deadlines for single disk requests, but also be able to fully utilize the remaining disk bandwidth for background and best-effort load [RP03a]. I will describe the runtime-monitoring aspects thereof in two parts. The first part explains the construction of an execution-time model for disks and the tools that were used to obtain the necessary data. In the second part, I describe how this model was applied in the context of a real-time scheduler and how runtime monitoring supplemented it.

Model building  In [Poh03, Poh02] I described the construction and usage of an execution-time model for disk requests. The model’s purpose is to predict how much time a certain disk request will probably take (average case) and how much time it will take at most (worst case). Both predictions are required for online disk-request scheduling, for both real-time and best-effort requests. The model requires the following information about target disks:

• Rotational speed,

• Number of platter surfaces used,

• Disk zones (regions of similar track length in sectors),

• Angular length for the sectors of each zone,

• Angular offset for each track’s start to the first track’s start,

• Seek time as a function of traveled cylinders,

• Mapping type (how logical sectors are physically positioned on disk), and

• Statistical information about seeking accuracy.

For obtaining this data, I use Linux with an instrumented kernel and record important events in the disk drivers (e. g., request sent to disk, result ready). These events primarily contain two pieces of information: a location identifier and a timestamp (what happens when). A user-space model-building application sends probe requests to the disk to measure its temporal behavior. It obtains the in-kernel events through a dedicated character device, shortly after sending out the probe requests. Both the monitoring infrastructure and the model builder are tightly coupled in a loop, that is, the event consumer does not directly require real-time responses to its requests, but it often has to wait for event arrival before it can send out the next probe requests. Building a full disk model — including the whole physical layout — takes several hours. Even small delays in event delivery add up severely in this scenario. The monitoring infrastructure itself is relatively simple, as it only has to serve one dedicated purpose. Only one event consumer has to be supported at a time, and event memory is claimed back in the kernel directly after event delivery to user space.


To access the disk hardware as directly as possible, I disable the drives’ internal write caches and circumvent Linux’ buffer cache using raw devices (similar to the O_DIRECT flag for file access). As hardware read caches cannot be disabled reliably for all disks, I mostly use write requests for modeling. As Linux is not a real-time operating system, I have to apply statistical filtering on the data obtained to eliminate erroneous measurements. For instance, for extracting the real rotational speed of a disk, the modeling program repeatedly issues pairs of write requests to the same disk sector, measuring the distance between the pairs’ completion events. Some request pairs are not executed back-to-back by Linux; rather, the model-building program is interrupted between sending both requests. Consequently, the distances between such requests are much larger than what is to be expected for a typical disk drive. The real rotational speed is determined by eliminating these outliers and computing the median of the remaining data.

Timestamp requirements are high, contrary to what one might assume at first glance (disks are relatively slow peripheral devices). Timestamps must be accurate enough to measure the length of a single sector for a 15 000 rpm (rotations per minute) disk, with about 1 000 sectors/track [Poh03, Poh02] (probably even higher with today’s disks, as recording density increases). A 15 000 rpm disk revolves 250 times per second, that is, a single sector is under the disk head for only 1/(250 * 1 000) s. Assuming an (extremely optimistic) model-building requirement of only 10 % accuracy error for the length of a single sector, timestamps have to have an accuracy (and therefore resolution) of 1/(250 * 1 000 * 2 * 10) s, that is, 0.2 µs minimum. I therefore chose the processor’s timestamp counter as time source. On a 1 GHz processor, 0.2 µs correspond to 200 cycles.
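The statistical filtering for the rotational-speed extraction boils down to outlier elimination followed by taking the median. The following sketch illustrates this idea with invented names and an assumed 1.5x outlier threshold; it is not the actual model-building code.

#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Estimate the rotation period from measured gaps between the completion
 * events of request pairs issued to the same sector (n >= 1 assumed).
 * Gaps inflated by preemption are treated as outliers and dropped before
 * the median of the remaining data is taken.                             */
uint64_t rotation_period(uint64_t *gaps, size_t n)
{
    qsort(gaps, n, sizeof(*gaps), cmp_u64);

    uint64_t threshold = gaps[0] + gaps[0] / 2;   /* 1.5 * smallest gap   */
    size_t m = n;
    while (m > 1 && gaps[m - 1] > threshold)
        m--;                                      /* drop outliers        */

    return gaps[m / 2];                           /* median of the rest   */
}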

Summary of experiment properties  For model building, only one event consumer needs to be supported, whereas event creation can happen in different places in the Linux kernel (different drivers, different levels in interrupt handlers). Therefore, synchronization in the kernel is done with a central reader–writer lock. Synchronization against the consumer also happens in the kernel, as the consumer acquires events via system calls. Memory requirements are relatively low and buffer management is simple, as typically only few events need to be stored before they are consumed (about 2–10 per disk request, depending on the amount of instrumentation). However, I later used the same kernel module for collecting filesystem access patterns; for that, the buffer size had to be increased significantly. Events need to be quickly available for the consumer so as to minimize the turn-around time for model building (an interactive usage pattern). Finally, accurate timestamps are required for model generation.

Model usage We use the model extracted in the previous section for a real-time disk scheduler [Poh03,RP03a] for Drops. In summary, we timestamp each disk-request completion event. This timestamp is used for two purposes: First, it is used for accounting disk-usage times according to Quality-Assuring Scheduling [HLR+01]. Second, it is used for determining the current angular position of the disk platters. A current timestamp and the sector number of

the previous request, combined with the model of the physical disk layout, allows us to compute this angular position. With this position, with the list of outstanding requests (for which, too, the physical position on disk can be determined using the layout model), and with the execution-time–prediction model, the disk scheduler can choose the subsequent disk requests to be executed. In addition to minimizing the seek time — as traditional disk-scheduling algorithms do — the rotational delay is also minimized here.
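The angular-position computation itself is simple. The sketch below shows the idea under the assumptions that timestamps and the rotation period use the same unit (e. g., processor cycles) and that the layout model can map a sector number to an angle; all names are illustrative only.

#include <stdint.h>

#define ANGLE_UNITS 3600u  /* full rotation in tenths of a degree (assumed) */

/* Angle of a sector on its track, as provided by the disk-layout model.  */
extern uint32_t layout_sector_angle(uint64_t sector);

/* Current angular position of the platters, derived from the completion
 * timestamp and sector of the previous request and the rotation period.  */
uint32_t current_angle(uint64_t now, uint64_t prev_completion_ts,
                       uint64_t prev_sector, uint64_t rotation_period)
{
    uint64_t elapsed = now - prev_completion_ts;
    uint64_t turned  = (elapsed % rotation_period) * ANGLE_UNITS
                       / rotation_period;
    return (layout_sector_angle(prev_sector) + (uint32_t)turned)
           % ANGLE_UNITS;
}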

Summary of experiment properties  Compared to the model-building requirements described in Section 5.1.1.3, the requirements here are more relaxed. Only one timestamped event is used at a time. There is only one producer (the interrupt service routine) and one consumer (the disk scheduler), so there is no need for event-buffer management. However, timestamps need to be highly accurate here as well, and the event must be available effectively immediately. Any jitter in time measurement will lead to wrongly computed angular positions and will therefore result in wrongly predicted execution times. This, in turn, will lead to suboptimal scheduling decisions that reduce disk throughput and increase latency.

5.1.1.4 Verner

In [HZP+07a] we describe an extension to component-based software engineering that supports real-time properties through the whole lifecycle of software development. We demonstrate this by means of one example throughout the whole paper — a video player application (Verner, cf. to [Rie03]). We use RT_Mon, a predecessor of Ferret, for both online and offline monitoring and measuring in this context:

• The offline part basically comprises measuring the distributions of execution times of the different components involved, such that scheduling can be planned at component deploy time.

• We also use RT_Mon’s different sensor types within the running system for several purposes: to verify scheduling behavior (timeliness of component scheduling; event sensor), to monitor underflow and overflow of component-buffer fill levels (event sensor), to observe how processor-time demand changes with quality-level adaptations in the video player (histogram sensor), and to monitor processor-usage changes depending on the position in the video stream (both distribution and worst-case times; histogram sensor). We additionally monitor time-slice overruns in real-time mode with simple scalar counters, and we use a two-dimensional histogram for monitoring execution-time predictions.

To do this, we instrument Verner’s components with sensor code, basically around the work loops and, in the case of the video decoder component (by far the most resource-demanding and complex part), also inside the loop, before the optional work part. All sensors are registered in a sensor directory, using a hierarchical namespace. Online browsing of this namespace and online attaching to its sensors for visualization is possible. Figure 5.7 shows a typical session with visualization for different sensors.


Figure 5.7: Screenshot of a typical session with active runtime monitoring. The hierarchical sensor namespace is depicted in the lower left window. Five other windows visualize sensor content.

Summary of experiment properties  For this scenario, highly accurate timestamps are required to obtain accurate processor-time–demand measurements for components. Furthermore, thread-relative timestamps or thread-time accounting is needed in case the to-be-timed work loops in components are interrupted. Monitoring must induce only little overhead in terms of processor time, but it must also not affect the scheduling of the real-time components in the system, as scheduling is potentially to be monitored and debugged. The monitoring system must support several independent sensors (as there are several independent components) and tolerate multiple instances of components; it must be able to distinguish events from those instances. In addition to that, several different sensor types proved useful: event sensors, counters, and histograms (one- and two-dimensional). Interactive browsing of all available sensors and online visualization of important ones should also be supported without influencing the to-be-observed components significantly.


Figure 5.8: State machine for idle_switch_mon. [Figure: two states, “unknown” and “Idle”; transitions are taken on context-switch events depending on whether the destination is the idle thread (dest == IDLE) or not (dest != IDLE); when leaving the idle state, update_stats(same_thread, same_task, time_in_idle) is called.]

5.1.2 Behavior checking

The following section contains four use cases for which actual behavior is observed and compared to constraints. For the first use case, I describe how assumptions for a kernel optimization can be validated with Ferret. Following that, in Section 5.1.2.2, I explain how violations of scheduling constraints can be established. Section 5.1.2.3 details the application of Ferret to user-space driver development. The fourth use case, in Section 5.1.2.4, describes how the behavior of a para-virtualized version of Linux — L4Linux — can be compared to its native counterpart.

5.1.2.1 Idle-switch optimization

Kernel developers in our group had the following questions regarding kernel optimization:

How often is the thread that was blocked last before an idle period on a processor the one that is scheduled directly after the idle period? Does it make sense to support this special case separately?

The advantages of handling this case specifically are:

• Context switches could be saved on idling, as the idle thread would be scheduled within the address space of the current task, instead of in its own.

• The wakeup-latency could be reduced as no context switch would be necessary for this special case.3

In the following, I will describe how these questions can be answered with Ferret. The scenario is relatively simple, as only one sensor is involved — Fiasco’s tracebuffer. I configured the tracebuffer to only log context-switch events and wrote a simple monitor to consume those events. The monitor consists of a small state machine (see Figure 5.8) that, besides counting the relevant cases, also measures how long the system idles. This idle-time information could be used similarly to Intel’s powertop utility [Int08] for Linux. From running the monitor in parallel with a typical desktop setup with L4Linux, I obtained the following results: In 2–3 % of all cases where the idle thread was scheduled, the same thread was scheduled directly before and after it. In 10–13 % of all cases where the idle

3 Summary of a private email on the l4-intern mailing list.

thread was scheduled, the same address space was active directly before and after the idle thread. Short-term fluctuations, for example when starting up L4Linux, ranged up to 20 % for this number.
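A monitor implementing the state machine of Figure 5.8 essentially reduces to a small switch over consecutive context-switch events. The sketch below illustrates this with invented event and statistics structures; it is not the actual monitor code.

#include <stdint.h>
#include <stdbool.h>

struct cs_event { uint32_t task; uint32_t thread; uint64_t ts; };

#define IDLE_THREAD 0u                   /* placeholder identifier        */

static struct {
    uint64_t idle_periods, same_thread, same_task, time_in_idle;
} stats;

static bool            in_idle;
static struct cs_event before_idle;      /* last switch before idling     */
static uint64_t        idle_start;

/* Feed one kernel context-switch event (switch to 'ev') into the monitor. */
void idle_switch_mon(const struct cs_event *ev)
{
    if (!in_idle) {
        if (ev->thread == IDLE_THREAD) {         /* entering idle         */
            in_idle    = true;
            idle_start = ev->ts;
        } else {
            before_idle = *ev;                   /* remember last thread  */
        }
    } else if (ev->thread != IDLE_THREAD) {      /* leaving idle          */
        in_idle = false;
        stats.idle_periods++;
        stats.time_in_idle += ev->ts - idle_start;
        if (ev->task == before_idle.task && ev->thread == before_idle.thread)
            stats.same_thread++;
        if (ev->task == before_idle.task)
            stats.same_task++;
    }
}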

Summary of experiment properties  Only a single sensor with a single event type is required for the idle-switch–optimization setup. However, the kernel is the source of events in this case. For the pure frequency analysis, any order-preserving logical clock would be sufficient as timestamp source. If the idle duration is to be evaluated as well, the clock resolution used for the events must match the accuracy requirements of the evaluation.

5.1.2.2 Taming L4Linux

In this section, I will describe one specific problem I encountered with L4Linux, its cause, and a way to identify the problem now and in the future using Ferret. This shall demonstrate the utility of Ferret for finding timing bugs in complex components such as operating-system kernels.

Native Linux uses the CLI (Clear Interrupt Flag) and STI (Set Interrupt Flag) instructions to protect critical sections by disabling and enabling interrupts, respectively. L4Linux uses a replacement for CLI–STI, as it runs without kernel privileges in user space where these instructions are not available. CLI–STI is replaced with a mutex that is acquired using atomic instructions in the noncontention case. In the contention case, a blocking IPC is sent to one serializing tamer thread. The tamer has the highest thread priority inside the L4Linux kernel. Thereby, it is able to execute its code atomically with respect to all other L4Linux kernel threads. When a thread leaves a critical section and there was contention, the leaving thread also notifies the tamer which, in turn, wakes up one contended thread according to its queuing policy. Here again, priority is used to guarantee atomic code execution with respect to all other L4Linux kernel threads.

L4 kernels have a feature called donation, which optimizes a common communication case, where a client sends a short request to a server that immediately processes it and returns the result. However, the server in this scenario runs with the time slice and the priority of the client while processing the request. Some L4Linux driver stubs communicate with external servers via IPC, for example, for using external network-hardware drivers. The external server’s worker threads may run on the same priority as or even on a higher priority than the tamer thread inside L4Linux, as both are in separate subsystems. This situation was assumed to be a reason for L4Linux’ stability problems.

I wrote a monitor to verify this problem in a running system (its state machine is shown in Figure 5.9). I therefore wrapped the tamer’s atomic sequence with start–stop events. Additionally, I enabled logging of context-switch events in the microkernel. The monitor checks for the following constraint: There must never be a context switch to an L4Linux kernel thread (except the tamer thread itself) between a start event and a stop event of the atomic sequence. The monitor also keeps the recent event history for later visualization and debugging of the problem. Figure 5.10 shows one such captured constraint-violation situation.
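The constraint translates into a small state machine over the merged, ordered event stream (cf. Figure 5.9). The following sketch shows the core check with invented event encodings rather than the actual Ferret event types.

#include <stdint.h>
#include <stdbool.h>

enum ev_type { ATOMIC_BEGIN, ATOMIC_END, CONTEXT_SWITCH };

struct event {
    enum ev_type type;
    uint32_t     dest_task;     /* for CONTEXT_SWITCH events             */
    uint32_t     dest_thread;
};

#define L4LX_TASK  0x11u        /* placeholder task/thread identifiers   */
#define L4LX_TAMER 0x02u

static bool in_tamer;

/* Returns true if the event violates the atomicity constraint.           */
bool check_tamer_event(const struct event *ev)
{
    switch (ev->type) {
    case ATOMIC_BEGIN:
        in_tamer = true;
        return false;
    case ATOMIC_END:
        in_tamer = false;
        return false;
    case CONTEXT_SWITCH:
        /* No switch to an L4Linux kernel thread (except the tamer itself)
         * may happen while the tamer is inside its atomic sequence.      */
        return in_tamer &&
               ev->dest_task == L4LX_TASK &&
               ev->dest_thread != L4LX_TAMER;
    }
    return false;
}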


Figure 5.9: State machine for the tamer monitor l4lx_verify_tamed_mon. [Figure: states “out of tamer”, “in tamer”, and “violation of tamer condition”; FERRET_L4LX_ATOMIC_BEGIN enters the atomic section, FERRET_L4LX_ATOMIC_END1/END2 leave it, and an l4_ktrace_tbuf_context_switch event with (dest.task == L4LX_TASK) && (dest.thread != L4LX_TAMER) while inside the section leads to the violation state.]

In our case, one L4Linux-internal stub driver thread A (tid,11.9 in Figure 5.10) was notified by an external thread W (tid,a.6) running on the same priority as L4Linux’ tamer thread. By notifying A, W temporarily transferred its priority to A. As a consequence, although extremely rarely, the tamer was interrupted in its atomic sequence when A itself wanted to enter the CLI–STI critical section. I describe how we fixed the tamer problem with the help of model checking in [Poh06]. Now that the problem is fixed, this test can be run after any changes made to L4Linux.

The bug would be hard to identify with other means, as it requires a whole-system view: events from several threads and the kernel are required and their exact order matters. Also, the monitor checks the atomicity condition completely outside of the L4Linux kernel’s address space, which makes the checking immune to, for example, memory corruption by other potential bugs in L4Linux. I would like to note here that, after formulating the constraint, writing the monitor only took approximately one hour. After a few minutes of runtime, I could capture a violation case. In contrast, the transformation of the tamer algorithm into Promela code for model checking with the Spin model checker [Hol06] took considerably more time, namely more than a week.

Summary of experiment properties  Sensors at two system levels are used in the Taming-L4Linux setup: Fiasco’s tracebuffer for scheduling events and instrumentation in the tamer thread. Events from both sensors must be fully ordered; as such, a logical clock would be sufficient for event timestamps. For convenient visualization and to aid debugging, the monitor keeps a recent history of events and makes it available in the error case.


Figure 5.10: Timeline visualization of the atomicity problem with the tamer thread (tid,11.3). Time flows from left to right. Single events are typically associated with a processor timeline (e. g., cpu,0) and a thread timeline (e. g., tid,11.3). You see three atomic sections wrapped in l4lxk/atomic_begin and l4lxk/atomic_end[12] event pairs. The first of the three sections is interrupted by context switches (denoted by kernel/switchto and kernel/switchfrom events) to the threads a.5 and a.6 (both in the external driver) and to 11.9 (an L4Linux-internal worker thread). The context switch to the normally lower-prioritized thread 11.9 violates the atomicity condition, as the tamer thread is preempted. [Figure: timelines cpu,0, tid,11.3, tid,11.4, tid,11.5, tid,11.9, tid,a.5, and tid,a.6 with the kernel and l4lxk events named above.]


5.1.2.3 Constraint checking in drivers

Drops’ software network switch ORe [Döb08] once showed two problems: First, it sometimes crashed and persistently resisted debugging attempts. Second, it showed a performance regression after switching from DDELinux2.4 to DDELinux2.6 [Hel08]. We (ORe’s author Björn Döbel and I) used Ferret to exclude several assumed crash reasons by instrumenting ORe and by applying a constraint checker, in the form of a Magpie module, to recorded traces. We successfully checked for four constraints (a sketch of one such check follows the list):

1. There should never occur an RNR (receive no resource) interrupt,

2. The functions e100_intr() and e100_poll() must never be executed in parallel (top- and bottom-half interrupt handlers),

3. e100_intr() and e100_poll() each must always be executed in the same thread, and

4. e100_intr() is never called with NIC (network interface controller) interrupts disabled.

After that, a visual inspection of the recorded communication trace quickly hinted at the problem’s cause: The memory used as DMA (direct memory access) target for the driver was of the wrong type: it was not pinned and not necessarily physically contiguous.

Furthermore, we could observe communication with the memory server at runtime, not only in the setup phase. The memory backend used in DDELinux2.6 is based on a slab-cache implementation that is optimized for runtime growth and shrinkage of memory pools. This may cause high runtime overhead if unfavorable usage patterns result in frequent communication with the memory server. This problem was already mentioned in [Fri06] and can easily be verified by a runtime monitor that observes the communication frequency with the memory server.

Summary of experiment properties  For this scenario, again, events from several system layers are required. Context-switch events and IPC events from the Fiasco kernel denote communication patterns. Instrumentation events from within the network-driver code (including the interrupt handler) denote higher-level semantics and allow more fine-grained behavior control for visual trace inspection and constraint specification. Event representation should be compact, as a hot system path is instrumented heavily. The previously mentioned constraints only require event order. For visual inspection and the mentioned online checking of communication patterns, real-time timestamps for events are desirable. Visual event browsing played a key role in identifying the nature of the problems; a clear and intuitive representation of events and communication patterns in the form of the Magpie visualization program proved to be very helpful.

5.1.2.4 Comparison of native and virtualized operating system kernel (fork)

When (para-)virtualizing operating-system kernels, such as Linux, it is important to detect behavioral changes. This applies not only to the functional correctness of the


Figure 5.11: Depicted is a timeline view of a vfork system call, issued by pid 80 (tid,1b.0). After the L4Linux kernel (tid,13.4) returned from the fork routine (l4lxk/forkinst_retfork), the parent process (tid,1b.0) is scheduled again for a short time, before finally the child (tid,1b.1) is scheduled. [Figure: timelines cpu,0, pid,80, pid,139, and threads 13.2, 13.4, 1b.0, and 1b.1 with the events l4lxk/sys_vfork, l4lxk/forkinst_allocpid, l4lxk/forkinst_wakeup, l4lxk/forkinst_retfork, l4lxk/sys_waitpid, and l4lxk/sys_exit_group.]

virtualized version but also to non-functional properties, such as speed and memory requirements. In [PDL06] we compare L4Linux to native Linux running comparably configured kernels. We use kProbes [Kri05] to track control flow inside the Linux kernel. For L4Linux, we use Fiasco kernel-level instrumentation to track scheduling. In native Linux, an additional kProbe is used for this purpose. To reuse the Ferret instrumentation code from L4Linux also within native Linux, we implemented a Linux kernel module that provides parts of the Ferret framework.

One of our experiments showed a difference in scheduling behavior between L4Linux and native Linux within the vfork system call. vfork is a special version of fork that assumes that only little work is done between fork and exec (or exit). Therefore, parent and child can share the same address space (including the stack), which saves the overhead of copying (or marking read-only, for copy-on-write) the parent’s page tables. The parent process has to be blocked to prevent address-space corruption until the child calls exec or exit. In essence, parallelism is bypassed by blocking one process, as the anticipated parallelism is very short and the copying overhead would not pay off.

While observing the behavior of vfork in L4Linux, we found that after the system call has been handled by the L4Linux server, the Fiasco kernel switches back to the parent task before executing the child (see Figure 5.11). We further investigated this behavior, because this looked like a serious bug in either L4Linux or the Fiasco kernel. As it turned out, this behavior is only an optimization artifact in Fiasco, never visible to user space. Fiasco indeed switched to the parent process, but only to find out that it is blocked. The parent process is still in Fiasco’s ready queue but flagged as blocked. Fiasco uses an optimization called lazy queuing that does not always remove IPC senders from the ready queue but only marks them with a flag. Often, the IPC receiver answers fast and Fiasco can directly switch back to the sender, thereby saving two queue operations.


vfork triggers the contrary case, where the parent was flagged as blocked and must now be removed from the ready queue. Although this suspected bug turned out to be a false positive, it shows that Ferret can be used to find and debug behavioral differences between native and (para-)virtualized variants of Linux. Timing information from the events can also be used to point out performance differences.

Summary of experiment properties  For this comparison experiment, we use two sets of sensors. For the L4Linux case, these are the Fiasco tracebuffer and the L4Linux-internal sensor. The native Linux system uses a Ferret version implemented as a kernel module. The kProbes used in L4Linux caused additional context switches, as the trap handler runs in a separate thread. We filtered out those predictable context switches in a Magpie schema. In general, the usage of a trapping technology such as kProbes is not advantageous in the time-sensitive environments that Ferret ultimately targets. However, kProbes allowed us to place and refine the probes in running systems without having to go through complete system-rebuilding cycles. Furthermore, it demonstrates that orthogonal technologies can be used in conjunction with Ferret (but without removing their disadvantages, of course). For pure structural comparisons, a logical clock as timestamp source would be sufficient. For additional performance assessments, highly accurate timestamps are required. As with previous multi-sensor use cases, events from the different sensors have to be fully ordered. To actually compare L4Linux with Linux, we had to port parts of Ferret to the native Linux kernel.

5.1.3 Other use cases

I will describe two heterogeneous use cases in the following section. The first details how Ferret can be used for collecting dynamic call-graph data from device drivers and similar environments. In the second part, I discuss Event Tracing for Singularity. Ferret was also used by Hoffmann in [Hof08] in the context of an intrusion-detection system, but I will not further discuss this in the context of this work.

5.1.3.1 Dynamic function-call tracing

Dynamic call graphs as described in [GKM82] are used for obtaining an overview of a complex piece of software and for detecting hot spots. They are usually obtained by instrumentation of the target program. The gcc compiler supports this with function-level instrumentation (the -finstrument-functions command-line switch). One can define callback functions to be executed for each function entry and exit. For gcc, these functions have the following signatures:

void __cyg_profile_func_enter (void *this_fn, void *call_site);
void __cyg_profile_func_exit (void *this_fn, void *call_site);

On each call, the caller’s and the callee’s address are passed along for further processing.


Figure 5.12: Dynamic call graph for the E1000 interrupt thread. [Figure: call graph rooted at e1000_intr with edges (annotated with call counts) to e1000_clean_rx_irq, e1000_clean_tx_irq, and e1000_set_itr, and further edges to e1000_rx_checksum, e1000_alloc_rx_buffers, e1000_unmap_and_free_tx_resource, and e1000_check_64k_bound.]

I wrote a very small library (below 50 lines of C code) that basically just posts, inside both of these functions, an event containing the addresses and the current thread ID into a Ferret sensor. Additionally, I used a small constructor function to set up this sensor at program startup. Post-processing is done with Magpie using a simple schema that unpacks the binary trace and splits it up per thread. I then feed the per-thread text traces through a slightly modified version of pvtrace [Jon05] to generate per-thread dynamic call graphs. As a test case, I applied this approach to the E1000 ethernet driver inside the ORe software network switch [Döb08]. Figure 5.12 shows the dynamic call graph for the interrupt thread. The noteworthy part here is that all I had to do was globally activate the compiler flag and link my library to otherwise nontrivial Linux kernel code.
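Such a library might look roughly like the following sketch. The gcc hook signatures and attributes are real; the Ferret sensor calls and the thread-ID lookup are simplified placeholders and do not reproduce the exact sensor API.

#include <stdint.h>

/* Simplified placeholders for Ferret’s sensor API and thread-ID lookup.  */
extern void *callgraph_sensor;
extern void  ferret_post3(void *sensor, uint32_t tag,
                          uintptr_t a, uintptr_t b, uintptr_t c);
extern uint32_t my_thread_id(void);

enum { EV_ENTER = 1, EV_EXIT = 2 };

/* Set up the sensor once at program startup.                             */
static void __attribute__((constructor)) callgraph_init(void)
{
    /* attach or create the sensor here */
}

/* gcc calls these hooks around every instrumented function
 * (-finstrument-functions). They must not be instrumented themselves.    */
void __attribute__((no_instrument_function))
__cyg_profile_func_enter(void *this_fn, void *call_site)
{
    ferret_post3(callgraph_sensor, EV_ENTER,
                 (uintptr_t)this_fn, (uintptr_t)call_site, my_thread_id());
}

void __attribute__((no_instrument_function))
__cyg_profile_func_exit(void *this_fn, void *call_site)
{
    ferret_post3(callgraph_sensor, EV_EXIT,
                 (uintptr_t)this_fn, (uintptr_t)call_site, my_thread_id());
}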

Summary of experiment properties  For this experiment, Ferret is only one of several tools involved. Its role is only to store and transport calling information as unobtrusively as possible, as it is called twice per function call. For the pure call graph, event order is only used for the offline determination of calling sequences, hence a logical clock would be sufficient as timestamp source. It is, however, important in this scenario that Ferret uses only minimal system infrastructure for collecting events, as potentially every single function in the target program (including libraries) is instrumented and can therefore not be used by Ferret itself without causing recursive calls.

5.1.3.2 Event Tracing for Singularity

I designed and implemented a tracing infrastructure for Microsoft’s research operating system Singularity, which I called Event Tracing for Singularity (ETS). Magpie [BDIM04, BIMN03] was designated as the tool for trace post-processing. I first discuss several particularities of Singularity in the context of event tracing. Following that, I explain design goals and decisions for ETS. I conclude this description with a discussion of results and problems.

In Singularity [HLA+05, HL07], all normal processes and the kernel share one hardware address space. Isolation is provided at the language level by so-called Software Isolated


Processes. Nearly all system communication happens through channel contracts that are also explicitly supported at the language level. Channels are strongly typed and stateful. All applications are precompiled to native code and linked statically before being started in Singularity. Because of that, whole-program optimization can be applied (bartok compiler). There are no means for runtime patching or dynamic code generation in Singularity. A Singularity system consists of many small components connected with channel contracts. There is no support for shared memory for user-space programs.

From a high level, the design goals for ETS were facilities for fine-grained resource accounting at the request level and for observing channel protocols. Therefore, a compact and low-overhead tracing framework for fine-grained events was required. Instrumentation should, of course, perturb the system as little as possible. It must especially not interfere with the system’s causal control flow. I decided to use a single global event sensor for all applications and the kernel, based on the design from [Rie05]. The sensor is nonblocking, uses no locks, and is multiprocessor safe (Singularity did not fully support multiprocessors at that time). Sensor access is encapsulated in a library and happens from within unsafe code blocks (the C# language construct). Usual application code is not allowed to directly contain such code, because language safety can be circumvented that way. For tracing, this allows system-wide access to the global sensor without expensive context switches or kernel entries. Consequently, using a single sensor for the whole system is a safe approach: only system-controlled code (the library) has access to the sensor, and there is no way that a simple application can corrupt the sensor’s state or arbitrarily read from or write to its memory.

ETS is loosely modeled after Event Tracing for Windows (ETW) so as to ease Magpie’s adaptation (Magpie was an ETW-only client before). But unlike with ETW, event consumers are not notified synchronously by ETS. They have to use polling, similarly to Ferret. ETS events are usually 48 bytes large and have a common header format (28 bytes) containing timestamp, thread ID, process ID, processor ID, provider, type, version designation, and the instruction pointer. The remaining 20 bytes are application specific; for instance, channel IDs can be stored there. Event creation takes approximately 500 cycles. Larger events (e. g., containing strings) are also supported but are assumed to be rare.

Two instrumentation approaches lend themselves to establishing actual requests. They differ in the instrumentation level that is used. For the first approach, several low-level primitives need to be instrumented directly in the kernel. EventHandle and its derivatives AutoResetEvent, ManualResetEvent, and Mutex cover channel communication, interrupt delivery, timeouts, mutual exclusion, and custom synchronization primitives. They also play a role in the blocking and unblocking of the thread class. In addition to that, several methods for waiting and waking up are relevant (AcquireOrEnqueue and NotifyOne). Thread scheduling and de-scheduling on timer interrupts and yield also need to be covered. Instrumentation can be performed directly in the kernel’s source code and only the few central points mentioned previously have to be touched. This approach is simple to realize and covers an overwhelming majority of all cases.
However, the number of events generated is huge and each event carries very little semantics. Writing schemata at this event level is very cumbersome, if not impossible. In essence, schemata have to contain a model of the kernel that recreates a few semantically rich high-level events out of a flood of largely meaningless low-level ones.

The second approach targets the high-level contracts with their built-in valuable semantics. Executed contract code is not available in source-code form; instead, generated binaries have to be instrumented. I described this approach in more detail in Section 4.2.2. This high-level approach comfortably enables referring to communication partners and message types in schemata without any preprocessing. In particular, messages for initiating communication sessions can be handled specially. The disadvantage of this approach is that not all relevant system activities are covered (e. g., hardware interrupts and custom thread synchronization, as used in the NIC driver).

Consequently, I used a mixed approach, relying as much as possible on high-level events and using low-level instrumentation for the small remainder. I am confident that, given more time, large parts of the low-level instrumentation could be covered at higher layers in the system. This would eliminate many of the then superfluous but still numerous low-level events in traces.
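To make the event format described above concrete, the following sketch shows one possible representation of a 48-byte ETS event. ETS itself is written in C#; the C struct below is only an illustration, and the field widths, the field order, and all identifier names are assumptions; the text only fixes the overall sizes and the set of header fields.

```c
/* A sketch of a possible ETS event layout. Only the overall sizes (48-byte
 * events, 28-byte common header) and the header fields are taken from the
 * text above; widths, order, and names are assumptions for illustration. */
#include <stdint.h>

struct ets_event_header {            /* 28-byte common header (assumed layout) */
    uint64_t timestamp;              /* time-stamp counter value */
    uint32_t thread_id;
    uint32_t process_id;
    uint32_t instruction_pointer;    /* 32-bit x86 Singularity assumed */
    uint16_t provider;
    uint16_t type;
    uint16_t version;                /* version designation */
    uint16_t processor_id;
} __attribute__((packed));           /* GCC-specific; keeps sizeof() == 28 */

struct ets_event {
    struct ets_event_header header;
    uint8_t payload[20];             /* application-specific part, e. g., channel IDs */
} __attribute__((packed));           /* sizeof() == 48 */
```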

Summary of experiment properties To summarize, I designed and implemented ETS, a tracing framework for Singularity, and I extended and adapted Magpie to process ETS traces. As a first test case, I wrote a schema for requests of Singularity's web server Cassini. The schema covers five system components: the NIC driver, the network stack, the Cassini web server, Cassini extensions (e. g., BrowserWebApp), and the NameServer (for Singularity's namespace). The schema is intuitively readable due to the high-level semantics in contract event names. Components can easily be separated in traces at contract boundaries. In [BIMH06], Barham et al. use ETS for inferring state machines about a Singularity system and its components.

5.2 Qualitative evaluation

In this section, I summarize key qualitative properties of the different event-sensor types and their approaches, to allow assessing for which environments they are suited. I omit a discussion of specialized sensors such as histograms or scalars here because of the vast design space for such special solutions.

AList and VList sensors were introduced in detail in Sections 4.1.6.1 and 4.1.6.2. The GList sensor represents GRTMon's Concurrently Invocable Sensors, which I discussed in Section 3.2.4.2. SLists and ULists use blocking semaphores for synchronization. They serve purely as a performance reference here. SLists use L4Env user-level semaphores (which use IPC in the contention case). ULists use Fiasco's User Locks for synchronization (they block in the kernel in the contention case). DPLists use an experimental delayed-preemption implementation in Fiasco. Fiasco's tracebuffer [Wei03] is an in-kernel implementation used mostly for in-kernel events (e. g., context switches and interrupt occurrences).

I discuss the following properties (named P. 1–8 in Table 5.1):


Table 5.1: Qualitative properties P. 1–8 of the different sensors compared (cf. to Section 5.2 for a detailed explanation of the properties).

Sensor type      P. 1   P. 2    P. 3   P. 4   P. 5   P. 6           P. 7   P. 8
AList (rollf.)   RF     yes     yes    yes    no     ∞ (∞)          no^b   yes
AList (rollb.)   RB     no      yes    yes    no     ∞ (∞)          no^b   yes
VList            RF     yes     yes    yes    yes    ∞ (∞)          no^b   yes
GList            CAS    yes^a   yes    no     no     bounded (∞)    no     yes
SList            S      no      no     no     no     ∞^d (∞^d)      no^c   no
UList            U      no      no     no     no     ∞^d (∞^d)      no^c   no
DPList           DP     yes     yes    yes    no     ∞ (∞)          no^b   ?
Tracebuffer      K      yes     yes    N/A    no     ∞ (∞)          yes    no

a If the number of threads (activities) is bounded and known
b Not in the common case; a kernel entry is only required if a hardware interrupt hits
c A kernel entry is only required in case of contention
d Limited by the lock implementation

1. Which synchronization method is used: atomic execution in the kernel (K), rollforward (RF), rollback (RB), compare-and-swap (CAS), L4Env Semaphore (S), Fiasco's User Lock (U), or delayed preemption (DP)?

2. Can a safe upper bound be determined for the execution time of the sensor code?

3. Is the sensor code real-time capable from a system perspective (it does not block, it does not indefinitely delay the system or other sensor code)?

4. Can multiple sensors be used safely and efficiently in the system (a view on a sensor is always current or becomes current after bounded time, cf. to Section 3.2.4.2)?

5. Are events of variable size supported?

6. How many event producers (and consumers) are supported for a single sensor?

7. Is a kernel entry required?

8. Are different software environments supported (cf. to Sections 4.2.1 and 4.3)?

As can be seen in Table 5.1, several sensors are suitable for use in real-time situations: AList (RF), VList, GList, DPList, and the Tracebuffer. Each approach has restrictions:

• For the rollforward approach (AList and VList), kernel support for atomic sections is needed.


• The GList sensor is somewhat limited in its size^4, and several GList sensors cannot be used safely together for online evaluation. Only fixed event sizes are supported.

• The delayed-preemption approach (DPList) is complicated to realize and probably impractical (cf. to Section 3.2.4). Here too, kernel support is required.

• For the tracebuffer, kernel-entry costs have to be paid for each event (bad for throughput-oriented applications), the tracebuffer supports only fixed event sizes (overhead if smaller events are used, larger events are impossible), it is limited in size (currently to 32 768 entries), and it cannot be reached from all applications — in short, it is inflexible for anything but in-kernel events.

I think that both rollforward sensors (AList and VList) provide a good compromise for many scenarios if kernel support is available.

5.3 Quantitative evaluation

I will present and discuss quantitative results in the following, so as to allow comparing and estimating the monitoring costs of the different sensor approaches. In the context of this work, with its real-time focus, I restrict myself to execution time as the key cost. I start by describing the four test machines in the next section. The succeeding section is devoted to the System Management Mode and real time. Section 5.3.3 presents microbenchmark results; macrobenchmark findings are presented in Section 5.3.4.

5.3.1 Hardware description For the low-level measurements, I used these four different test machines:

Machine A Intel Celeron (Willamette), 1.7 GHz

Machine B Intel Pentium 4E (Prescott/Nocona), 3.2 GHz

Machine C AMD Opteron (Santa Rosa), 2.0 GHz

Machine D Intel Core 2 (Merom), 2.66 GHz

5.3.2 System Management Mode The System Management Mode (SMM) is a special processor mode of x86 processors designed to run below everything else in the system, including the operating system (cf. to Chapter 24 in [Int06] and Chapter 6 in [Adv06]). It is meant to be undetectable, uninterruptible, and transparent, so that it can operate below arbitrary legacy software. The SMM is invoked by SMIs (system management interrupts) and is, for example, used to emulate a legacy keyboard for old software if only a USB keyboard is available, and for power management (fan control, etc.).

4 The GList sensor uses CMPXCHG8B for atomically updating 8-byte structures. It therefore currently has the restriction that the bits for the current sensor index and the global event counter can only use 64 bit altogether (e. g., if a sensor size of 2^16 events is used, 2^48 events are supported over the sensor's lifetime; with a sensor size of 2^32, only 2^32 events can be submitted to the sensor over its lifetime).


However, this transparency does not cover real time and therefore represents a massive problem for real-time applications. For my test machines, I regularly encountered time holes in the range of several million cycles for a setup with a single active and highly prioritized thread. No context switches or system calls occurred in these time holes.

A software solution cannot mask a real-time insufficiency of the underlying hardware. Therefore, one has to either use suitable hardware (hard to obtain) or use highly system-specific methods for disabling the SMM (e. g., as discussed in [Mar06]). I use such platform-specific code to disable the sources of SMIs on Machine B. Figure 5.13 shows the execution time of a simple counting loop (to 1 000) for both enabled and disabled SMIs. With enabled SMIs, the maximal measured execution time is about 5.4 million cycles, and an SMM-caused anomaly at 35 000 cycles can be observed. Without SMIs, the observed maximum is 41 620 cycles, and the typical execution time shows a more deterministic behavior. The measurement was executed 4 billion times in a loop.

Figure 5.13: Depicted are different views (time scales) on a histogram of the execution times of a small counting loop for both enabled and disabled SMIs (maximum without SMIs: 41 620 cycles; with SMIs: 5 403 012 cycles). The overall histogram with the maximum execution times is shown at the top. The middle graph shows disturbances by hardware interrupts (timer interrupt, SMI-related PM interrupt). The bottom graph details the common behavior.
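A minimal sketch of such a measurement is shown below, assuming an x86 processor and GCC: a short counting loop is timed with the time-stamp counter, and a histogram of the observed durations is collected, so that SMI-induced time holes appear as rare, very long outliers. The bucket width and the number of repetitions are arbitrary choices, not the exact parameters used for Figure 5.13.

```c
/* Sketch of the SMI-detection experiment: time a small counting loop with
 * rdtsc and record a histogram of the observed durations. */
#include <stdint.h>
#include <stdio.h>

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

#define BUCKETS 4096                 /* one bucket per 16 cycles (assumed) */
static uint64_t histogram[BUCKETS];

int main(void)
{
    volatile int sink = 0;
    for (uint64_t run = 0; run < 100000000ULL; run++) {  /* the thesis used 4e9 runs */
        uint64_t start = rdtsc();
        for (int i = 0; i < 1000; i++)                    /* the small counting loop */
            sink += i;
        uint64_t cycles = rdtsc() - start;
        uint64_t bucket = cycles / 16;
        histogram[bucket < BUCKETS ? bucket : BUCKETS - 1]++;
    }
    for (int b = 0; b < BUCKETS; b++)
        if (histogram[b])
            printf("%d %llu\n", b * 16, (unsigned long long)histogram[b]);
    return 0;
}
```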

5.3.3 Sensor microbenchmarks This section on microbenchmarks is subdivided into three parts. I first show a throughput survey to enable a general comparison between the different sensor approaches and to depict the average case. Following that, I present a more detailed evaluation of the AList and the VList sensors, especially targeting the worst-case behavior that is relevant in the real-time domain.

5.3.3.1 Throughput survey Table 5.2 contains throughput measurements for several different sensor types. Throughput numbers are relevant for my work for three reasons: First, throughput determines large-scale intrusiveness for non–real-time parts of the system. Second, it is useful for roughly gauging different implementations. Third, it allows establishing baseline performance comparisons with methods that use otherwise unsuitable synchronization mechanisms (e. g., the locking in SList and UList sensors).

Desnoyers et al. discuss LTTng microbenchmarks in [DD06] for a 3 GHz uniprocessor Pentium 4. They measured the average time spent in kernel probes as 288.5 cycles; the maximum is 6 997 cycles. Fast user-space probes (without system calls) take 297 cycles on average, with a maximum of 88 913 cycles. Each probe writes 20 byte. These results can roughly be compared to the Machine-B column in Table 5.2 and show that Ferret is a competitive approach for best-effort system components. In [KBD+08], Knüpfer et al. briefly discuss the function-call overhead of VampirTrace for a 1.6 GHz Intel Itanium 2 processor. The minimal overhead mentioned (for filtered-out events) is 0.82 µs (1 312 cycles). Recorded events cost 0.92 µs (1 472 cycles). I interpret the difference of 160 cycles between both numbers as an indication of the actual event-storage time, which is roughly comparable to the Ferret results in Table 5.2.
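Such per-event costs can be obtained by timing a long back-to-back posting loop and dividing by the number of repetitions. The sketch below shows this structure; post_event() is only a stand-in, not Ferret's real posting API, and the 10 000 000 repetitions and 24-byte events mirror the configuration of Table 5.2.

```c
/* Sketch of a per-event throughput measurement: post N events back to back,
 * measure total cycles with rdtsc, and report cycles per event. */
#include <stdint.h>
#include <stdio.h>

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

static volatile uint64_t dummy_sink;

static void post_event(const void *data, unsigned len)
{
    /* Stand-in: a real benchmark would call the sensor's posting routine,
     * e. g., posting a 24-byte event to an AList or VList sensor. */
    dummy_sink += len + *(const uint8_t *)data;
}

int main(void)
{
    enum { REPETITIONS = 10000000 };
    uint8_t event[24] = { 0 };              /* 24-byte events as in Table 5.2 */

    uint64_t start = rdtsc();
    for (unsigned i = 0; i < REPETITIONS; i++)
        post_event(event, sizeof(event));
    uint64_t cycles = rdtsc() - start;

    printf("%.1f cycles/event\n", (double)cycles / REPETITIONS);
    return 0;
}
```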

5.3.3.2 AList In the following, I describe the typical and the worst-case behavior of the AList sensor. Figure 5.14 shows the distributions of execution times for posting events to an AList sensor.

For the first measurement, AList (outside), I execute event-posting code, wrapped in timestamp-taking code, in a loop. The two clusters labeled Hardware interrupts originate from two different points at which hardware interrupts hit the measurement. For the left cluster, interrupts hit before or after the actual rollforward section but between taking the two timestamps. Here, just the interrupt-handling overhead adds to the execution time. For the right cluster, interrupts hit inside the rollforward section. The interrupt is enqueued, interrupt delivery is disabled temporarily, and rollforward execution is resumed on the instrumented version (cf. to Section 4.1.3). The final instruction of the instrumented version traps into the kernel again, where the queued interrupt is handled and hardware interrupts are enabled again. Therefore, slightly more time is required (lazy interrupt disabling and interrupt handling). This measurement does not reflect the real rollforward costs, as it erroneously adds the interrupt-handling costs.

In the second measurement, AList (inside), I therefore move both timestamp-taking code snippets into the rollforward section^5. Consequently, hardware interrupts cannot disturb the measurement. Only the Lazy interrupt blocking costs can be observed. The measured maximum is 12 120 cycles.

5 I use the first experiment for factoring out the timestamping overhead.


For the third measurement, AList (inside + cache flooder), I execute cache-flooding code before each event posting to trigger the worst case. Of course, the observed maximum execution time increases (28 720 cycles). Interestingly, executions with Lazy interrupt blocking perform better than in the previous measurement.

Table 5.2: Shown are single-thread throughput results (in processor cycles per event) for different processor types, sensor types, and configurations (10 000 000 repetitions). The VList sensors were configured for a sensor size of 65 536 byte; all other sensors were configured to hold 1 024 events of size 64 byte. All events posted were of size 24 byte, except where noted otherwise. Numbers in parentheses were obtained with call-stack reuse, as described in Section 4.1.6.1. A qualitative discussion of the different sensor types is provided in Section 5.2. An introduction to the single sensor types is given in Section 4.1.6 for ALists and VLists, in Section 5.2 for SLists and ULists, and in Section 3.2.4.2 for GList sensors.

Sensor configuration      Machine A    Machine B    Machine C    Machine D
AList (rollforward)       200 (200)    172 (176)     43 (43)     110 (109)
AList (rollback)          212 (200)    184 (184)     50 (48)     139 (121)
VList                     244          228           78          154
VList (64-byte events)    296          259           95          180
VList^a                   246          232           81          158
VList^b                   251          232           84          160
GList                     657 (653)    273 (273)    167 (165)    215 (217)
SList                     108 (108)    100 (100)     33 (32)      87 (87)
UList                     112 (116)    100 (100)     41 (41)      91 (91)

a Page-aligned stack, as described in Section 4.1.6.2
b Page-aligned stack and event data on stack, as described in Section 4.1.6.2
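The difference between the AList (outside) and AList (inside) setups discussed before Table 5.2 is only where the two timestamps are taken relative to the atomic section. The sketch below illustrates this with stand-in functions; enter_atomic(), leave_atomic(), and post_body() are hypothetical names, not Ferret's real interface, and the kernel-supported rollforward mechanism itself is the one described in Section 4.1.3.

```c
/* Placement of the timestamps in the AList microbenchmark. All functions
 * below are stand-ins: enter_atomic()/leave_atomic() represent the kernel-
 * supported rollforward section, post_body() the event-copying code. */
#include <stdint.h>

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

static void enter_atomic(void) { }   /* stand-in: begin rollforward section */
static void leave_atomic(void) { }   /* stand-in: end rollforward section   */
static void post_body(void)    { }   /* stand-in: copy the event into the sensor */

/* AList (outside): an interrupt may hit between the two rdtsc() calls, so its
 * handling time is (erroneously) included in the measured duration. */
static uint64_t measure_outside(void)
{
    uint64_t t0 = rdtsc();
    enter_atomic();
    post_body();
    leave_atomic();
    return rdtsc() - t0;
}

/* AList (inside): both timestamps lie within the atomic section, so an
 * interrupt can no longer fall between them; only the lazy interrupt-blocking
 * overhead of the section itself remains visible. */
static uint64_t measure_inside(void)
{
    enter_atomic();
    uint64_t t0 = rdtsc();
    post_body();
    uint64_t t1 = rdtsc();
    leave_atomic();
    return t1 - t0;
}
```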

5.3.3.3 VList As the VList sensor supports variable event sizes, I measure its execution time depending on the event size. Again, the typical and the worst-case behavior are the discussion targets. Figure 5.15 shows throughput results for the test machines A–D. Execution time increases with event size. These results are relevant for non–real-time components.

The worst case for the VList sensor is constructed as follows: The sensor needs to be filled with the smallest possible events first, such that the first loop for advancing the tail pointer (cf. to Figure 4.5) needs maximal iterations. After that, the remaining space until the end of the sensor needs to be slightly smaller than the to-be-written event, such that a filler event is created and new room has to be made in the sensor again, this time at the beginning. Then the event itself needs to be copied to the sensor (this depends on the event size). In addition to that, a hardware interrupt has to hit the rollforward section, such that the lazy–interrupt-disabling case is triggered (I repeat the experiment 100 000 times to provoke this case).


Figure 5.14: Distributions for AList event posting with three time-measurement setups: first outside of the post routine (maximum: 41 716 cycles), second inside the rollforward section (maximum: 12 120 cycles), and third inside the rollforward section with a cache flooder called before each event (maximum: 28 720 cycles). The histograms show absolute frequency (log scale) over execution time in processor cycles; clusters are labeled Common case, Hardware interrupts, and Lazy interrupt blocking. All measurements were done on Machine B.


Figure 5.16 shows four experimental results: the average and the worst case for posting events, each with and without cache flooding performed before posting the events. All measurements were performed with the worst-case sensor setup described previously.

Figure 5.15: Throughput processor time vs. event size for the VList sensor on the test machines A–D (event sizes up to 1 024 byte).

5.3.4 Macrobenchmarks In the previous section, I investigated the behavior of single-event creation in isolation. In the following, I analyze the overall influence of monitoring on the basis of two concrete application scenarios — a non–real-time example in Section 5.3.4.2 and a real-time example in Section 5.3.4.3. Before that, I outline methods for quantifying intrusiveness in the next section.

Figure 5.16: Worst-case execution times on Machine B: average and worst case for posting events vs. event size (up to 1 024 byte), each with and without a cache flooder run before posting.

5.3.4.1 Quantifying intrusiveness I see two ways of quantifying intrusiveness in the context of monitoring, which I term the objective and the subjective approach. First, for the objective approach, the behavior of the instrumented version has to be compared to the behavior of the noninstrumented version using an objective metric. I assume that the behavior of each version is defined in the form of a distribution of execution times. The objective metric basically reduces the two distributions to a single scalar. Second, for the subjective approach, important properties of the target system influence how the two distributions are compared. The subjective approach does not necessarily yield scalar results.

Objective approach I discuss general requirements for an objective metric in the following. Then, after introducing a simple approach, I briefly review metrics used in related work. For an objective metric to be usable here, it has to fulfill the following four properties:

1. It should be defined over the whole relevant domain (in our case all positive real numbers, i. e., time durations, or at least the parts where one of the two distributions to be compared has nonzero values).

2. It should not require any specific properties of the two distributions (e. g., that the distributions are nonzero everywhere).

3. It should be sensitive to all kinds of differences between the two distributions (e. g., blurring, loss of compactness, or shifting).

4. Its result should be a scalar whose value corresponds to the amount of difference between the two distributions.


A very simple way to compare two distributions is to compare their mean values and their standard deviations. The rationale here is that the change in the mean value corresponds to the additional work required for monitoring, and that the change in the standard deviation corresponds to the additional jitter introduced by monitoring. Different approaches are used in related work:

• The Kolmogorov-Smirnov test [Wei06] allows comparing two distributions. It answers the question of whether two distributions are equal with a given certainty, but not by how much they diverge. Furthermore, the test statistic depends only on the maximum deviation between the two distributions. It is therefore not suitable to, for example, capture the amount of shift. The Chi-squared test [Wei09] is related in that it also targets certainty of equality, not amount of difference.

• In [SG07], Schroeder et al. rely on manual visual inspection (and several chi-squared tests) for gauging the similarity of distributions. This approach does not scale and does not deliver objective, scalar similarity results. Techniques such as Q-Q plots represent data in a form in which differences between distributions can easily be spotted visually (cf. to Figure 5.18).

• The Kullback-Leibler divergence [KL51] for discrete random variables and everything that builds on it (cross entropy and perplexity) do not fulfill the previously described properties. The Kullback-Leibler divergence is undefined when one of the two distributions has a zero value. Also, the amount of shifting does not necessarily influence the metric's results. The Kullback-Leibler divergence for continuous random variables uses the probability density functions (PDF) and has the same limitations.

• In [KSXW04], Kumar et al. use the Mean Relative Difference (MRD) and the Weighted Mean Relative Difference (WMRD) to compare estimations of flow distributions with the actual distributions. MRD has the (theoretical) problem of being undefined in areas where both distributions are zero (theoretical, because one could simply exclude these areas). Both methods, however, do not fulfill the second property from above (the distributions need to be "zipfian") and the third property (the metrics are not sensitive to shifting in all cases).

• The Earth mover’s distance [LO07] provides a robust metric that captures changes in shift and form of distributions.

All objective metrics have the limitation that they do not consider application-specific requirements. The subjective approach discussed in the following addresses this limitation.


Subjective approach Software-based monitoring techniques will always influence the system to be observed in one way or another. The decision whether an influence is negligible is case specific and cannot be made globally. Therefore, for the subjective approach I define intrusiveness with respect to an observer. If there is no observer in a system, intrusiveness is not perceived and consequently unimportant (however, there are no important systems without observers). The observer may be a human directly using the system, or it may be another system part that measures a quantity (e. g., the execution time of a function or the request throughput).

Different measures are relevant to different observers. For a server system, for example, throughput may be the relevant metric. Hence, intrusiveness is a relative quantity here that describes the fraction of throughput lost if monitoring is activated.

For a single-user desktop system, latency may be the critical measure. Here, everything that, for example, reduces GUI responsiveness is felt to be intrusive. As the observer is human in this scenario, small changes in latency are probably not noticed, whereas larger changes are perceived as highly irritating. So the felt intrusiveness does not correspond linearly to, for example, the measured average increase in latency.

For a soft–real-time system, intrusiveness may be defined as the change in the number of missed deadlines or as the change in position of the highest percentile (99 % quantile). For hard–real-time systems, intrusiveness is boolean: are all deadlines still met?

5.3.4.2 A non–real-time workload Following the previously described subjective approach, I use the Re-AIM benchmark [REA08] in order to measure Ferret's intrusiveness in a typical non–real-time scenario. Re-AIM is a modern successor of the famous AIM Multiuser Benchmark and simulates a multi-user workload. Its results are measured in jobs per minute. I run the benchmark on Machine B (cf. to Section 5.3.1), which is equipped with 512 MiB RAM and a Seagate Barracuda hard disk with 160 GB and 7 200 rpm (ST3160815AS). I freshly installed Ubuntu 8.04.1 (hardy) on the disk. I run Re-AIM with the following settings: reaim -c reaim.config -f workfile.alltests -s 50 -e 50 -r 200 (all tests, 50 users in parallel, 200 repetitions).

I carried out the measurements for both a normal L4Linux kernel and an instrumented version (for each L4Linux system call a Ferret event is posted, which contains the system-call number). Figure 5.17 shows the measurement results. The noninstrumented version has an average throughput of 2 764 Jobs/min (50 Jobs/min standard deviation) and the instrumented version performed at 2 779 Jobs/min (44 Jobs/min standard deviation). The instrumented version is slightly faster (15 Jobs/min, 0.55 %) and its distribution is slightly more compact. However, the difference between both versions is well within the measurement accuracy of this macrobenchmark. I conclude that Ferret has no significant intrusion effect in this setup.



Figure 5.17: Re-AIM 7 throughput (reaim -c reaim.config -f workfile.alltests -s 50 -e 50 -r 200) for both the noninstrumented version and the instrumented version of L4Linux (each Linux system call creates an AList event). The histogram shows absolute frequencies over throughput in Jobs/min. The whisker bars show lowest and highest percentiles, lowest and highest quartiles, and the median.

Table 5.3: Key properties of the noninstrumented and the instrumented run compared (all values are in µs).

Quantile or position     min.   0.01   0.25   0.5    0.75   0.99    max.
No instrumentation        500    569    717    787    854   1 540   2 727
Event instrumentation     506    570    718    788    855   1 516   2 700

5.3.4.3 A real-time workload I use the video-player application Verner [Rie03] as a soft–real-time workload. Again, Machine B is used for measuring here. Verner comprises several components (demultiplexer, video core, audio core, and synchronizer), of which the video core is the most resource-demanding part. It executes a loop that periodically decodes single packets delivered from the demultiplexer (frames). I wrap this loop with a histogram sensor to measure its execution time for each iteration. For the actual Ferret instrumentation, I create events inside the loop's body: at the beginning, before post-processing, and at the end (i. e., three events per loop iteration).

Figure 5.18 shows the results of this benchmark setup for both deactivated and activated Ferret event code inside the loop. I show the Q-Q plot to demonstrate the similarity between both distributions. Table 5.3 compares key properties of both distributions to allow gauging changes in real-time behavior. The Ferret instrumentation results in an estimated average slowdown of 1 µs. The 0.99 quantile and the observed maximum are slightly better with Ferret instrumentation, probably a measurement artifact.
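The sketch below illustrates the placement of the measurement points described above. All function names are hypothetical stand-ins rather than the real Verner or Ferret interfaces; only the structure (a histogram of per-iteration execution times plus three events per iteration) follows the text.

```c
/* Sketch of the Verner video-core instrumentation: a histogram sensor around
 * the decoding loop plus three events per iteration. All functions below are
 * stand-ins with hypothetical names. */
#include <stdint.h>

static uint64_t timestamp_us(void)          { return 0; } /* timestamp in µs     */
static void *get_packet_from_demuxer(void)  { return 0; } /* next frame packet   */
static void decode(void *pkt)               { (void)pkt; }
static void postprocess(void *pkt)          { (void)pkt; }
static void histogram_add(uint64_t us)      { (void)us; } /* histogram sensor    */
static void post_event(int type)            { (void)type; } /* list-sensor event */

enum { EV_LOOP_BEGIN, EV_BEFORE_POSTPROC, EV_LOOP_END };

static void video_core_loop(void)
{
    for (;;) {
        uint64_t start = timestamp_us();

        post_event(EV_LOOP_BEGIN);          /* 1st event: beginning of the body    */
        void *pkt = get_packet_from_demuxer();
        decode(pkt);

        post_event(EV_BEFORE_POSTPROC);     /* 2nd event: before post-processing   */
        postprocess(pkt);

        post_event(EV_LOOP_END);            /* 3rd event: end of the body          */
        histogram_add(timestamp_us() - start); /* per-iteration execution time     */
    }
}
```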



Figure 5.18: Execution-time distribution for both the noninstrumented and the instrumented video-decoding loop in Verner (the whisker bars show lowest and highest percentiles, lowest and highest quartiles, and the median). The left panel shows the decoding-time distribution (absolute frequency over decoding time in µs); the right panel shows a Q-Q plot of instrumented vs. noninstrumented decoding-time quantiles with the diagonal for reference.

5.4 Summary

I discussed evaluations of three types in this chapter, starting with a discussion of ten different use cases in Section 5.1. These use cases demonstrate Ferret's broad applicability and highlight what kind of scenarios drove Ferret's development.

Following that, in Section 5.2, I presented a qualitative evaluation of the properties of the different sensors with regard to their suitability for different scenarios. I showed that Ferret's rollforward sensors provide a very good compromise for many scenarios.

The last section of this chapter contained a quantitative evaluation, including a discussion of real time and the System Management Mode, throughput and worst-case microbenchmarks, and results of a real-time and a non–real-time macrobenchmark, together with an outline of how to quantify intrusiveness. From this quantitative evaluation, as well as the whole evaluation chapter combined, I conclude that Ferret is suitable for use in my envisioned target systems described in Section 3.1.1.

Chapter 6 Conclusion

I started my work on runtime monitoring for open real-time systems with the single use case of modeling hard-disk behavior in the context of real-time environments. I embedded the first measurement infrastructure into the Linux kernel. Low latency and high accuracy for measurements were driving factors for these experiments. Based on the results obtained in [Poh03], we built real-time disk-request scheduling for Drops [RP03b, RP03a]. Accurate timestamping and resource accounting were added as requirements. I continued with modeling DOpE, Drops's GUI component. I could verify DOpE's per-pixel model for larger regions, but also found deviations for smaller areas (cf. to Section 5.1.1.1). I adapted my monitoring infrastructure to do online aggregation for this experiment, as tracking each single request would have resulted in huge overheads.

I then targeted high-level components in Drops, for example a video player, and the interaction of real-time and non–real-time components [APRZ03, HZP+07b, GPPZ04a, APPZ04, GPA+04, PAH04]. Video playback shows dynamic runtime behavior, for example scene-specific or quality-profile–specific processor-time demand, for which I adapted my monitoring infrastructure. The communication interaction between real-time and non–real-time parts was also monitored. Based on these results, Riegel generalized a specific class of event sensors using lock-free techniques in his Master's thesis [Rie05].

I used the previously described experience for building event tracing for Microsoft's research operating system Singularity [HLA+05] and adapted the Magpie toolchain [BDIM04, BIMN03] for the evaluation of measurement results. There, I first used external schemata for event-trace post-processing.

With the creation of Ferret, I tried to synthesize all of the previously discussed experiences into one tool. I targeted Ferret at Drops (and in parts Linux) and adapted the Magpie toolchain for evaluating Ferret traces. Ferret's own sensors are based on atomic sections for user-space code, which allows for excellent throughput and very good worst-case behavior. I designed Ferret to be portable and to use only a very limited API of host systems in order to allow using it in very different situations and system layers. This allowed me to conduct several more experiments that I described in Section 5.1, for example cross-component resource accounting (Section 5.1.1.2) and the adaptation of Ferret for dynamic function-call tracing (Section 5.1.3.1).

Over the years, I evolved Ferret and its predecessors to match newly identified requirements, starting from an empty sheet several times. In the course of this process, I identified several lessons that are now integral parts of Ferret:


• A single monitoring approach that can be used in all system layers and different components, in both real-time and non–real-time parts of a system, is desirable. Otherwise, gaining a whole-system view is cumbersome. To this end, the monitoring infrastructure should be portable in the sense that it uses few API functions of specific systems. I designed Ferret to use a limited set of functions in the setup phase and to work solely on shared memory afterwards. Consequently, porting to a new environment mostly means adapting the small setup parts. The architecture is designed to require minimal support in the kernel, basically only the mechanism for the atomic sections. This conforms to the microkernel philosophy of only having the strictly necessary support in the kernel — usually the mechanisms, not the policy — and leads to easier adaptability to new requirements, as mostly user code will have to be touched.

• Due to its potentially wide deployment in systems, the monitoring infrastructure becomes a critical part of them. Therefore, additional crosstalk through the monitoring infrastructure must be prevented. In more detail, stability, intrusiveness, and security concerns play a role here. For stability, the monitoring infrastructure must be stable itself and must not allow bugs in monitored parts to spread through it into the system. For limiting intrusiveness, the system should be influenced as little as possible, especially its communication and scheduling behavior. For security, security domains must be separable, and several independent evaluations must be runnable in a system at the same time. To address these concerns, I designed Ferret such that no synchronous notification happens from monitored system parts to monitors, which prevents directly caused reactions in the monitoring system. No communication is possible from monitors back to monitored parts through Ferret, by design. Independent system parts can use completely independent sets of sensors, which allows separating them.

• The application of the monitoring infrastructure in potentially low system layers requires a low-overhead solution in terms of processor cycles and memory requirements, as many invocations may be seen. In Ferret, I address the memory-cost problem by using a very compact binary event representation, by providing sensors with statically configurable event sizes, and by providing sensors with variable event sizes. Filtering in monitors as well as online evaluation and aggregation are also possible with Ferret. The atomic sections for user-space code used by Ferret have a very low average time demand, which allows using them in throughput-oriented non–real-time components.

• For monitoring real-time components, good worst-case behavior is important. The rollforward variant of the atomic sections for user-space code shows good worst-case behavior, which makes it applicable for deadline-driven real-time components.


• Monitoring at higher system levels typically involves fewer invocations, but much richer data has to be stored per event. Ferret supports this scenario by providing sensors with variable event sizes.

• I developed wrapper functions for using the atomic sections for user code in limited environments where arguments on the stack or the heap cannot be pinned and where access to them cannot be protected against page faults. I aggregate the arguments on the call stack and prefault the active stack page atomically with entering the atomic section in Ferret, using a side effect of the function call.

• These special wrapper functions or calling conventions also allow reusing compiler-generated code that accesses the local stack frame for local variables and as a register-spilling area. Without this mechanism, Ferret sensors were limited to relatively simple, hand-written assembler sensor-access functions. With this mechanism, sensor-access code can be conceived and maintained in a higher-level language such as C, which greatly simplifies these tasks.

• The system-wide deployment of a monitoring infrastructure exposes it to different assumptions about threading models and reentrancy. Ferret's use of atomic sections for user-space code makes it compatible with these assumptions without unduly serializing or deferring monitored code. Ferret's access to Fiasco's in-kernel trace buffer adds in-kernel events to Ferret's view.

• Obtaining trace data alone has no value. I adapted the Magpie toolchain to evaluate and post-process Ferret's event traces and used an adapted and extended version of Magpyvis extensively for the interactive investigation of traces.

To conclude, the developed architecture and the prototypical implementation fulfill the requirements for runtime monitoring in modern open desktop systems that support classic time-sharing applications as well as real-time tasks.

6.1 Suggestions for future work

I see several directions in which research and development could be carried on. Ferret could be ported to other systems to allow further comparative studies. While I was finishing the work on this thesis, work was started in our research group to adapt Ferret to the current development version of our research operating system. Ferret can always be used in more experiments and studies and grow with them.

Porting Ferret to a 64-bit architecture may simplify sensor-access code, as larger memory regions can be manipulated atomically (e. g., using CMPXCHG16B on the x86-64 architecture). I proposed to use independent sensor buffers for each processor on multiprocessor systems; implementing this feature is future work.

I assume that my adaptation of atomic sections for user-space code may be applicable to other problem domains; for example, a library for low-level data-structure manipulation may prove useful.


Bibliography

[AAD+02] Jonathan Appavoo, Marc Auslander, Dilma DaSilva, David Edelsohn, Orran Krieger, Michal Ostrowski, Bryan Rosenburg, Robert W. Wisniewski, and Jimi Xenidis. K42's Performance Monitoring and Tracing Infrastructure, August 2002.

[AAP06] Fernando H. Ataide, Alan C. Assis, and Carlos E. Pereira. Automotive X-by-Wire System Based on Linux — An Open Source Project. In Proceedings of the Eighth Real-Time Linux Workshop, Lanzhou, China, 2006.

[ABD+97] Jennifer M. Anderson, Lance M. Berc, Jeffrey Dean, Sanjay Ghemawat, Monika R. Henzinger, Shun-Tak A. Leung, Richard L. Sites, Mark T. Vandevoorde, Carl A. Waldspurger, and William E. Weihl. Continuous profiling: where have all the cycles gone? ACM Trans. Comput. Syst., 15(4):357–390, 1997.

[Adv05] Advanced Micro Devices Inc. Software Optimization Guide for AMD64 Processors, September 2005. Revision 3.06.

[Adv06] Advanced Micro Devices Inc. BIOS and Kernel Developer's Guide for AMD Athlon™ 64 and AMD Opteron™ Processors, February 2006. Revision 3.30.

[Adv07] Advanced Micro Devices Inc. Software Optimization Guide for AMD Family 10h Processors, December 2007. Revision 3.05.

[Adv09] Advanced Micro Devices Inc. Advanced Synchronization Facility — Proposed Architectural Specification, March 2009. Revision 2.1.

[Aig07] Ronald Aigner. Dice Version 3.1.1 User’s Manual, 2007. Available at http://os.inf.tu-dresden.de/dice/manual.pdf.

[ALR01] A. Avizienis, J. Laprie, and B. Randell. Fundamental Concepts of Dependability, 2001.

[AMW+03] Marcos K. Aguilera, Jeffrey C. Mogul, Janet L. Wiener, Patrick Reynolds, and Athicha Muthitacharoen. Performance Debugging for Distributed Systems of Black Boxes. In SOSP ’03: Proceedings of the nineteenth ACM symposium on Operating systems principles, pages 74–89, New York, NY, USA, 2003. ACM.


[APPZ04] Ronald Aigner, Christoph Pohl, Martin Pohlack, and Steffen Zschaler. Tailor-Made Containers: Modeling Non-functional Middleware Service. In Workshop on Models for Non-Functional Aspects of Component-Based Software (NFC'04) colocated with UML 2004, Lisbon, Portugal, October 2004.

[APRZ03] Ronald Aigner, Martin Pohlack, Simone Röttger, and Steffen Zschaler. Towards Pervasive Treatment of Non-Functional Properties at Design and Run-Time. In Proceedings of 16th International Conference "Software & Systems Engineering and their Applications (ICSSEA 2003)", Paris, France, December 2003.

[ARJ97] James H. Anderson, Srikanth Ramamurthy, and Kevin Jeffay. Real-Time Computing with Lock-Free Shared Objects. ACM Trans. Comput. Syst., 15(2):134–165, 1997.

[ARM07] ARM Limited. RealView Emulation Baseboard HBI-0140 Rev C — User Guide, 2007.

[BA07] Björn B. Brandenburg and James H. Anderson. Feather-Trace: A Light-Weight Event Tracing Toolkit. In Third International Workshop on Operating Systems Platforms for Embedded Real-Time Applications, 2007.

[BDIM04] Paul T. Barham, Austin Donnelly, Rebecca Isaacs, and Richard Mortier. Using Magpie for Request Extraction and Workload Modelling. In OSDI'04: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, pages 259–272, Berkeley, CA, USA, 2004. USENIX Association.

[BDS07a] Sumit Basu, John Dunagan, and Greg Smith. Why Did My PC Suddenly Slow Down? In SYSML’07: Proceedings of the 2nd USENIX workshop on Tackling computer systems problems with machine learning techniques, pages 1–6, Berkeley, CA, USA, 2007. USENIX Association.

[BDS07b] Martin Bligh, Matthieu Desnoyers, and Rebecca Schultz. Linux Kernel Debugging on Google-sized clusters. In Proceedings of the Linux Symposium, Ottawa, Ontario, Canada, June 2007.

[Bel05] Fabrice Bellard. QEMU, a fast and portable dynamic translator. In ATEC'05: Proceedings of the USENIX Annual Technical Conference 2005, page 41, Berkeley, CA, USA, 2005. USENIX Association.

[BH00] Bryan Buck and Jeffrey K. Hollingsworth. An API for Runtime Code Patching. The International Journal of High Performance Computing Applications, 14(4):317–329, Winter 2000.

[BIMH06] Paul Barham, Rebecca Isaacs, Richard Mortier, and Tim Harris. Learning communication patterns in Singularity. In First Workshop on Tackling


Computer Systems Problems with Machine Learning Techniques (SysML), March 2006.

[BIMN03] Paul T. Barham, Rebecca Isaacs, Richard Mortier, and Dushyanth Narayanan. Magpie: online modelling and performance-aware systems. In Michael B. Jones, editor, HotOS, pages 85–90. USENIX, 2003.

[Bir08] Tim Bird et al. TracingCollaborationProject. http://tree.celinuxforum.org/CelfPubWiki/TracingCollaborationProject, 2008.

[Bru05] Rich Brunner. TSC and Power Management Events on AMD Processors. Usenet post, Nov 2005. comp.unix.solaris.

[Cag06] Andrew Cagney. The Frysk Execution Analysis Architecture. In Proceedings of the Linux Symposium, Ottawa, Canada, July 2006.

[Cag07] Andrew Cagney. Frysk 1, Kernel 0? In Proceedings of the Linux Symposium, Ottawa, Ontario, Canada, June 2007.

[CAK+04] Mike Y. Chen, Anthony Accardi, Emre Kiciman, Jim Lloyd, Dave Patterson, Armando Fox, and Eric Brewer. Path-Based Failure and Evolution Management. In NSDI'04: Proceedings of the 1st Conference on Symposium on Networked Systems Design and Implementation, Berkeley, CA, USA, 2004. USENIX Association.

[cCH02] Cristian Țăpuş, I-Hsin Chung, and Jeffrey K. Hollingsworth. Active Harmony: Towards Automated Performance Tuning. In Supercomputing '02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, pages 1–11, Los Alamitos, CA, USA, 2002. IEEE Computer Society Press.

[CCZ07] Anupam Chanda, Alan L. Cox, and Willy Zwaenepoel. Whodunit: Transactional Profiling for Multi-Tier Applications. SIGOPS Oper. Syst. Rev., 41(3):17–30, 2007.

[CO94] R. Carrasco and J. Oncina. Learning stochastic regular grammars by means of a state merging method. In Proc. 2nd International Colloquium on Grammatical Inference - ICGI '94, volume 862, pages 139–150. Springer-Verlag, 1994.

[CRSBPC06] D. W. Carr, R. Ruelas, H. Salcedo-Becerra, and G. A. Ponce-Castaneda. A Linux Based System to Monitor Train Speed and Doors for the Light Rail System in Guadalajara, Mexico. In Proceedings of the Eighth Real-Time Linux Workshop, Lanzhou, China, 2006.

[CSL04] Bryan Cantrill, Michael W. Shapiro, and Adam H. Leventhal. Dynamic Instrumentation of Production Systems. In USENIX Annual Technical Conference, General Track, 2004.


[DD06] Mathieu Desnoyers and Michel R. Dagenais. The LTTng tracer: A low impact performance and behavior monitor for GNU/Linux. In Proceedings of the Linux Symposium, Ottawa, Canada, July 2006.

[DD08] Mathieu Desnoyers and Michel R. Dagenais. LTTng: Tracing across execution layers, from the Hypervisor to user-space. In Proceedings of the Linux Symposium, Ottawa, Canada, July 2008.

[Des08] Mathieu Desnoyers. Using the Linux Kernel Markers, 2008. linux-2.6.25.5/Documentation/markers.txt.

[Dil] Matthew Dillon. Page Coloring. Design elements of the FreeBSD VM system. http://www.freebsd.org/doc/en_US.ISO8859-1/articles/vm-design/page-coloring-optimizations.html. Retrieved on 2009-01-06.

[Döb06] Björn Döbel. Request tracking in DROPS, 2006. Diplomarbeit (Master's thesis).

[DS] MAXIM Dallas Semiconductor. Datasheet: Real-Time Clock DS12885, DS12887, and DS12c887. http://www.maxim-ic.com/getds.cfm?qv_pk=2680. Rev 3; 2/07, Retrieved on 2009-05-27.

[DTI08] Linux Driver Tracing Interface. http://sourceforge.net/projects/dti, 2008.

[Döb08] Björn Döbel. ORe - a software network switch for L4. http://os.inf.tu-dresden.de/l4env/doc/html/ore/index.html, Aug 2008.

[EPC+05] F. Ch. Eigler, Vara Prasad, William Cohen, Hien Nguyen, Martin Hunt, Jim Keniston, and Brad Chen. Architecture of systemtap: a Linux trace/probe tool. http://sourceware.org/systemtap/archpaper.pdf, 2005.

[FH03] Norman Feske and Hermann Härtig. Demonstration of DOpE — a Window Server for Real-Time and Embedded Systems. In 24th IEEE Real-Time Systems Symposium (RTSS), pages 74–77, Cancun, Mexico, December 2003.

[Fid88] C. J. Fidge. Partial orders for parallel debugging. In PADD '88: Proceedings of the 1988 ACM SIGPLAN and SIGOPS workshop on Parallel and distributed debugging, pages 183–194, New York, NY, USA, 1988. ACM Press.

[Fid91] Colin Fidge. Logical Time in Distributed Computing Systems. Computer, 24(8):28–33, 1991. [FPK+07] Rodrigo Fonseca, George Porter, Randy H. Katz, Scott Shenker, and Ion Stoica. X-Trace: A Pervasive Network Tracing Framework. In 4th Symposium on Networked Systems Design and Implementation (NSDI), Cambridge, Massachusetts, USA, April 2007.


[Fri06] Thomas Friebel. Übertragung des Device-Driver-Environment-Ansatzes auf Subsysteme des BSD-Betriebssystemkerns, March 2006. Diplomarbeit (Master’s thesis), Technische Universität Dresden.

[Gai85] Jason Gait. A Debugger for Concurrent Programs. Softw. Pract. Exper., 15(6):539–554, 1985.

[GC304] GCC team. GCC 3.4.6 Manual, 2004. http://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/.

[GC405] GCC team. GCC 4.2.2 Manual, 2005. http://gcc.gnu.org/onlinedocs/gcc-4.2.2/gcc/.

[GEGW02] Thomas Gschwind, Kave Eshghi, Pankaj K. Garg, and Klaus Wurster. WebMon: A Performance Profiler for Web Transactions. In WECWIS ’02: Proceedings of the Fourth IEEE International Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems (WECWIS’02), page 171, Washington, DC, USA, 2002. IEEE Computer Society.

[GKM82] Susan L. Graham, Peter B. Kessler, and Marshall K. Mckusick. Gprof: A call graph execution profiler. SIGPLAN Not., 17(6):120–126, 1982.

[GML06] Jan Glauber, Frank Mehnert, and Jochen Liedtke. Fiasco Kernel Debugger Manual, November 2006. Version 1.1.6.

[GPA+04] Steffen Göbel, Christoph Pohl, Ronald Aigner, Martin Pohlack, Simone Röttger, and Steffen Zschaler. The COMQUAD Component Container Architecture. In WICSA '04: Proceedings of the Fourth Working IEEE/IFIP Conference on Software Architecture, page 315, Washington, DC, USA, 2004. IEEE Computer Society.

[GPPZ04a] Steffen Göbel, Christoph Pohl, Martin Pohlack, and Steffen Zschaler. The COMQUAD Component Container Architecture and Contract Negotiation. Technical Report TUD-FI04-04-April-2004, TU Dresden, 2004. Available from URL: http://os.inf.tu-dresden.de/papers_ps/tr-comqarch.pdf.

[GPPZ04b] Steffen Göbel, Christoph Pohl, Martin Pohlack, and Steffen Zschaler. The COMQUAD Component Container Architecture and Contract Negotiation. Technical Report TUD-FI04-04-April-2004, TU Dresden, 2004. Available from URL: http://os.inf.tu-dresden.de/drops/doc.html#tr-comqarch.

[GS95] Rajiv Gupta and Madalene Spezialetti. Dynamic Techniques for Minimizing the Intrusive Effect of Monitoring Actions. In ICDCS '95: Proceedings of the 15th International Conference on Distributed Computing Systems, pages 368–376, Washington, DC, USA, 1995. IEEE Computer Society.


[GS06] Rakesh Kumar Gupta and Jitendra Kumar Sasmal. Linux in Polyester Automation — A Case Study of Reliance Industries Ltd, Polyester Manufacturing. In Proceedings of the Eighth Real-Time Linux Workshop, Lanzhou, China, 2006.

[Ham97] Cl.-J. Hamann. On the quantitative specification of jitter constrained periodic streams. In MASCOTS, Haifa, Israel, January 1997.

[HBB+98] H. Härtig, R. Baumgartl, M. Borriss, Cl.-J. Hamann, M. Hohmuth, F. Mehnert, L. Reuther, S. Schönberg, and J. Wolter. DROPS: OS support for distributed multimedia applications. In Proceedings of the Eighth ACM SIGOPS European Workshop, Sintra, Portugal, September 1998.

[Hel08] Christian Helmuth. Linux Device Driver Environment Manual. http://os.inf.tu-dresden.de/l4env/doc/html/dde_linux/index.html, Aug 2008.

[HK99] Jeffrey K. Hollingsworth and Peter J. Keleher. Prediction and Adaptation in Active Harmony. Cluster Computing, 2(3):195–205, 1999.

[HL07] Galen C. Hunt and James R. Larus. Singularity: Rethinking the Software Stack. SIGOPS Oper. Syst. Rev., 41(2):37–49, 2007.

[HLA+05] G. Hunt, J. Larus, M. Abadi, M. Aiken, P. Barham, M. Fahndrich, C. Hawblitzel, O. Hodson, S. Levi, N. Murphy, B. Steensgaard, D. Tarditi, T. Wobber, and B. Zill. An Overview of the Singularity Project. Technical report, Microsoft Research, 2005.

[HLR+01] C.-J. Hamann, J. Löser, L. Reuther, S. Schönberg, J. Wolter, and H. Härtig. Quality Assuring Scheduling - Deploying Stochastic Behavior to Improve Resource Utilization. In 22nd IEEE Real-Time Systems Symposium (RTSS), London, UK, December 2001.

[HO07] Masami Hiramatsu and Satoshi Oshima. Djprobe — Kernel probing with the smallest overhead. In Proceedings of the Linux Symposium, Ottawa, Ontario, Canada, June 2007.

[Hof02] Sarah Hoffmann. Kleine Adressräume für Fiasco, 2002. Großer Beleg (Undergraduate thesis), Technische Universität Dresden.

[Hof08] Jörn Hoffmann. Intrusion Detection and Response for L4Linux, 2008. Diplomarbeit (Master's thesis), Universität Leipzig.

[Hol06] Gerard J. Holzmann. The SPIN MODEL CHECKER — Primer and Reference Manual. Addison-Wesley, December 2006. 3rd Printing.

[HZP+07a] Hermann Härtig, Steffen Zschaler, Martin Pohlack, Ronald Aigner, Steffen Göbel, Christoph Pohl, and Simone Röttger. Enforceable Component-Based Realtime Contracts — Supporting Realtime Properties from Software Development to Execution. Springer Real-Time Systems Journal, 35(1), January 2007.


[HZP+07b] Hermann Härtig, Steffen Zschaler, Martin Pohlack, Ronald Aigner, Steffen Göbel, Christoph Pohl, and Simone Röttger. Enforceable component-based realtime contracts: Supporting realtime properties from software development to execution. Real-Time Systems, 35(1):1–31, January 2007.

[Int04a] Intel Corporation. IA-32 Intel Architecture Optimization — Reference Manual, 2004.

[Int04b] Intel Corporation. IA-32 Intel Architecture Software Developer’s Manual — Volume 1: Basic architecture, 2004.

[Int04c] Intel Corporation. IA-32 Intel Architecture Software Developer's Manual — Volume 2A: Instruction Set Reference, A–M, 2004.

[Int04d] Intel Corporation. IA-PC HPET (High Precision Event Timers) Specification, October 2004. Revision 1.0a.

[Int06] Intel Corporation. IA-32 Intel Architecture Software Developer’s Manual — Volume 3B: System Programming Guide, Part 2, January 2006.

[Int07] Intel Corporation. Intel 64 and IA-32 Architectures Optimization Reference Manual, 2007.

[Int08] Open Source Technology Center, Intel Corporation. LessWatts.org - Saving Power on Intel systems with Linux. http://www.lesswatts.org/projects/powertop/powertop.php, 2008.

[J. 00] J. Consortium. Real-Time Core Extensions for the Java platform, September 2000. International J Consortium Specification, Revision 1.0.14.

[Jah95] Farnam Jahanian. Run-Time Monitoring of Real-Time Systems. pages 435–460, 1995.

[Jon05] M. Tim Jones. Visualize function calls with Graphviz. http://www.ibm.com/developerworks/library/l-graphvis/, June 2005.

[JR81] S. C. Johnson and D. M. Ritchie. Computing Science Technical Report No. 102: The C Language Calling Sequence. Technical report, Bell Laboratories, September 1981.

[Kan07] Antti Kantee. puffs - Pass-to-Userspace Framework File System. In Proceedings of the Asia BSD Conference (AsiaBSDCon), March 2007.

[KBD+08] Andreas Knüpfer, Holger Brunst, Jens Doleschal, Matthias Jurenz, Matthias Lieber, Holger Mickler, Matthias S. Müller, and Wolfgang E. Nagel. Tools for High Performance Computing: Proceedings of the 2nd International Workshop on Parallel Tools for High Performance Computing, July 2008, HLRS, Stuttgart, chapter The Vampir Performance Analysis Tool-Set, pages 139–155. Springer Berlin Heidelberg, 2008.


[KFT08] Kernel Function Trace. http://elinux.org/Kernel_Function_Trace, 2008.

[KL51] S. Kullback and R. A. Leibler. On information and sufficiency. In Annals of Mathematical Statistics 22, pages 79–86. 1951.

[Kri05] R. Krishnakumar. Kernel Korner: Kprobes—a Kernel Debugger. Linux J., 2005. [KSXW04] Abhishek Kumar, Minho Sung, Jun (Jim) Xu, and Jia Wang. Data streaming algorithms for efficient and accurate estimation of flow size distribution. In SIGMETRICS ’04/Performance ’04: Proceedings of the joint international conference on Measurement and modeling of computer systems, pages 177–188, New York, NY, USA, 2004. ACM. [L4K06] L4Ka Team. L4 eXperimental Kernel Reference Manual, Version x.2. Technical report, 2006. Revision 6.

[Lam78] Leslie Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Commun. ACM, 21(7):558–565, 1978.

[LCM+05] Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. In PLDI '05: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 190–200, New York, NY, USA, 2005. ACM.

[LDSP85] Carol H. LeDoux and D. Stott Parker, Jr. Saving traces for Ada debugging. In SIGAda '85: Proceedings of the 1985 Annual ACM SIGAda International Conference on Ada, pages 97–108, New York, NY, USA, 1985. Cambridge University Press.

[Lie95] J. Liedtke. On micro-kernel construction. SIGOPS Oper. Syst. Rev., 29(5):237–250, 1995.

[Lie96a] Jochen Liedtke. Colorable Memory. November 1996.

[Lie96b] Jochen Liedtke. Toward Real Microkernels. Communications of the ACM, 39(9):70–77, 1996.

[LO07] Haibin Ling and Kazunori Okada. An Efficient Earth Mover’s Distance Algorithm for Robust Histogram Comparison. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5):840–853, 2007. [Mar06] Larry Mart. disabling SMI, 2006. Usenet post on comp.arch.embedded, comp.realtime, comp.sys.intel, alt.comp.periphs.mainboard.aopen, Message-ID: <[email protected] com>.


[Mat89] Friedemann Mattern. Virtual Time and Global States of Distributed Systems. In Cosnard M. et al., editor, Proc. Workshop on Parallel and Distributed Algorithms, pages 215–226, North-Holland / Elsevier, 1989. (Reprinted in: Z. Yang, T.A. Marsland (Eds.), "Global States and Time in Distributed Systems", IEEE, 1994, pp. 123-133.).

[Mcd77] Gene Mcdaniel. METRIC: a kernel instrumentation system for distributed environments. In SOSP ’77: Proceedings of the sixth ACM symposium on Operating systems principles, pages 93–99, New York, NY, USA, 1977. ACM.

[McK04] Chris McKillop. User-Space Debugging Simplifies Driver Development. In Proceedings of the embedded world 2004 Conference. Caspar Grote and Renate Ester, February 2004.

[MDP96] David Mosberger, Peter Druschel, and Larry L. Peterson. Implementing Atomic Sequences on Uniprocessors Using Rollforward. Software—Practice and Experience, 26(1):1–23, 1996.

[MDP00] P. Mantegazza, E. L. Dozio, and S. Papacharalambous. RTAI: Real Time Application Interface. Linux J., page 10, 2000.

[MH89] Charles E. McDowell and David P. Helmbold. Debugging Concurrent Programs. ACM Comput. Surv., 21(4):593–622, 1989.

[MHH02] F. Mehnert, M. Hohmuth, and H. Härtig. Cost and benefit of separate address spaces in real-time operating systems. In Proceedings of the 23rd IEEE Real-Time Systems Symposium (RTSS), pages 124–133, Austin, Texas, USA, December 2002.

[MHJM09] Michael Matz, Jan Hubička, Andreas Jaeger, and Mark Mitchell. SYSTEM V APPLICATION BINARY INTERFACE, AMD64 Architecture Processor Supplement, Draft Version 0.99, May 2009.

[MHSH01] Frank Mehnert, Michael Hohmuth, Sebastian Schönberg, and Hermann Härtig. RTLinux with address spaces. In Proceedings of the Third Real-Time Linux Workshop, Milano, Italy, November 2001.

[Mica] Microsoft. Event Tracing. http://msdn2.microsoft.com/en-US/library/aa363787.aspx.

[Micb] Microsoft. How To Use Event Tracing For Windows For Performance Analysis. http://download.microsoft.com/download/f/0/5/f05a42ce-575b-4c60-82d6-208d3754b2d6/EventTrace_PerfAnalysis.ppt.

[Mic02] Microsoft. Guidelines For Providing Multimedia Timer Support. http://www.microsoft.com/whdc/system/sysinternals/mm-timer.mspx, September 2002.


[MLCS89] D. C. Marinescu, J. E. Lumpp, Jr., T. L. Casavant, and H. J. Siegel. A Model for Monitoring and Debugging Parallel and Distributed Software. In Proc. 13th Annual International Computer Software and Applications Conference (COMPSAC ’89), pages 81–88, October 1989.

[MM03] Alexander V. Mirgorodskiy and Barton P. Miller. CrossWalk: A Tool for Performance Profiling Across the User-Kernel Boundary. In Proceedings of the international Conference on Parallel Computing (ParCo), Dresden, Germany, 2003.

[Moo01] Richard J. Moore. A Universal Dynamic Trace for Linux and Other Operating Systems. In Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference, Berkeley, CA, USA, 2001.

[MRCR04] M. Masmano, I. Ripoll, A. Crespo, and J. Real. TLSF: A New Dynamic Memory Allocator for Real-Time Systems. In ECRTS ’04: Proceedings of the 16th Euromicro Conference on Real-Time Systems, pages 79–86, Washington, DC, USA, 2004. IEEE Computer Society.

[MSSP02] Daniel Mahrenholz, Olaf Spinczyk, and Wolfgang Schröder-Preikschat. Program Instrumentation for Debugging and Monitoring with AspectC++. In ISORC ’02: Proceedings of the Fifth IEEE International Symposium on Object-Oriented Real-Time Distributed Computing, page 249, Washington, DC, USA, 2002. IEEE Computer Society.

[Mye05] Ryan Myers. Funny, It Worked Last Time: Event Tracing for Windows (ETW). http://blogs.msdn.com/ryanmy/archive/2005/05/27/422772.aspx, May 2005.

[Nar05] Dushyanth Narayanan. End-to-end tracing considered essential. In Proceedings of High Performance Transaction Systems — Eleventh Biennial Workshop (HPTS ’05), Asilomar Conference Center, Pacific Grove, CA, September 2005.

[NS07] Nicholas Nethercote and Julian Seward. Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation. SIGPLAN Not., 42(6):89–100, 2007.

[OMCB07] Marek Olszewski, Keir Mierle, Adam Czajkowski, and Angela Demke Brown. JIT Instrumentation: A Novel Approach to Dynamically Instrument Operating Systems. SIGOPS Oper. Syst. Rev., 41(3):3–16, 2007.

[OPr08] OProfile. http://oprofile.sourceforge.net/, 2008.

[OS ] OS Group, TU Dresden. OS-Group–internal bugzilla database: Bug 95 — Delayed Preemption (kleine kritische Abschnitte). State: 2007-07-02.


[PAH04] Martin Pohlack, Ronald Aigner, and Hermann Härtig. Connecting Real-Time and Non-Real-Time Components. Technical Report TUD-FI04-01-Februar-2004, TU Dresden, 2004. Available from URL: http://os.inf.tu-dresden.de/papers_ps/tr-rtnonrtcomp.pdf.

[PB07] Insung Park and Ricky Buch. Event Tracing: Improve Debugging And Performance Tuning With ETW. In MSDN Magazine, April 2007.

[PDGQ05] Rob Pike, Sean Dorward, Robert Griesemer, and Sean Quinlan. Interpreting the data: Parallel analysis with Sawzall. Sci. Program., 13(4):277–298, 2005.

[PDL06] Martin Pohlack, Björn Döbel, and Adam Lackorzynski. Towards Runtime Monitoring in Real-Time Systems. In Proceedings of the Eighth Real-Time Linux Workshop, Lanzhou, China, 2006.

[PKFH02] David J. Pearce, Paul H. J. Kelly, Tony Field, and Uli Harder. GILK: A Dynamic Instrumentation Tool for the Linux Kernel. In TOOLS ’02: Proceedings of the 12th International Conference on Computer Performance Evaluation, Modelling Techniques and Tools, pages 220–226, London, UK, 2002. Springer-Verlag.

[Poh02] Martin Pohlack. Ermittlung von Festplatten-Echtzeiteigenschaften, 2002. Großer Beleg (Undergraduate thesis), Technische Universität Dresden.

[Poh03] Martin Pohlack. Plattenscheduling für Quality-Assuring-Scheduling, March 2003. Diplomarbeit (Master’s thesis), Technische Universität Dresden.

[Poh06] Martin Pohlack. Ein praktischer Erfahrungsbericht über Model Checking in L4Linux (A real-world experience talk about model checking L4Linux’s internals). http://os.inf.tu-dresden.de/~mp26/download/tamer.pdf, 2006.

[Poh08] Martin Pohlack. RT_Mon — runtime monitoring documentation. http://os.inf.tu-dresden.de/l4env/doc/html/rt_mon/index.html, March 2008.

[Pro] LTTng Project. Notes on Asynchronous TSC Architectures (and workarounds). http://ltt.polymtl.ca/svn/trunk/lttv/doc/developer/tsc.txt, Retrieved on 2009-05-21.

[PT04] Daniel Price and Andrew Tucker. Solaris Zones: Operating System Support for Consolidating Commercial Workloads. In LISA ’04: Proceedings of the 18th USENIX conference on System administration, pages 241–254, Berkeley, CA, USA, 2004. USENIX Association.

[PW93] Sharon E. Perl and William E. Weihl. Performance assertion checking. SIGOPS Oper. Syst. Rev., 27(5):134–145, 1993.


[REA08] Re-AIM_7. http://sourceforge.net/projects/re-aim-7, 2008.

[Rie03] Carsten Rietzschel. VERNER – ein Video EnkodeR uNd playER für DROPS, 2003. Diplomarbeit (Master’s thesis), Technische Universität Dresden.

[Rie05] Torvald Riegel. A generalized approach to runtime monitoring for real-time systems, 2005. Diplomarbeit (Master’s thesis), Technische Universität Dresden.

[RKW+06] Patrick Reynolds, Charles Killian, Janet L. Wiener, Jeffrey C. Mogul, Mehul A. Shah, and Amin Vahdat. Pip: Detecting the Unexpected in Distributed Systems. In NSDI’06: Proceedings of the 3rd conference on 3rd Symposium on Networked Systems Design & Implementation, pages 9–9, Berkeley, CA, USA, 2006. USENIX Association.

[RP03a] Lars Reuther and Martin Pohlack. Rotational-Position-Aware Real-Time Disk Scheduling Using a Dynamic Active Subset (DAS). In 24th IEEE Real-Time Systems Symposium (RTSS), pages 374–385, Cancun, Mexico, December 2003.

[RP03b] Lars Reuther and Martin Pohlack. Work-in-Progress Report: Using SATF Scheduling in Real-Time Systems. Work-in-Progress Report: 2nd USENIX Conference on File and Storage Technologies (FAST-2003), San Francisco, CA, USA, March 2003.

[San97] The Santa Cruz Operation, Inc. SYSTEM V APPLICATION BINARY INTERFACE, Intel386™ Architecture Processor Supplement (Fourth Edition, Draft Copy), March 1997.

[Sch91] Werner Schütz. On the Testability of Distributed Real-Time Systems. In SRDS, pages 52–61, 1991.

[Sch93] Werner Schütz. The Testability of Distributed Real-Time Systems. Kluwer Academic Publishers, Norwell, MA, USA, 1993.

[SE94] Amitabh Srivastava and Alan Eustace. ATOM: A System for Building Customized Program Analysis Tools. In PLDI ’94: Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, pages 196–205, New York, NY, USA, 1994. ACM.

[SEV01] Amitabh Srivastava, Andrew Edwards, and Hoi Vo. Vulcan: Binary Transformation in a Distributed Environment. Technical Report MSR-TR-2001-50, 2001.

[SG07] Bianca Schroeder and Garth A. Gibson. Understanding disk failure rates: What does an MTTF of 1,000,000 hours mean to you? Trans. Storage, 3(3):8, 2007.


[SGG+99] David C. Steere, Ashvin Goel, Joshua Gruenberg, Dylan McNamee, Calton Pu, and Jonathan Walpole. A Feedback-driven Proportion Allocator for Real-Rate Scheduling. In OSDI ’99: Proceedings of the third symposium on Operating systems design and implementation, pages 145–158, Berkeley, CA, USA, 1999. USENIX Association.

[Sha99] Murray Shanahan. The Event Calculus Explained. Lecture Notes in Computer Science, 1600, 1999.

[SKA07] George Spanoudakis, Christos Kloukinas, and Kelly Androutsopoulos. Towards Security Monitoring Patterns. In SAC ’07: Proceedings of the 2007 ACM symposium on Applied computing, pages 1518–1525, New York, NY, USA, 2007. ACM.

[Ste04] Udo Steinberg. Quality-Assuring Scheduling in the Fiasco Microkernel. Master’s thesis, TU Dresden, March 2004.

[Stö05] Jan Stöß. Using Operating System Instrumentation and Event Logging to Support User-level Multiprocessor Schedulers, March 24, 2005.

[SW92] Amitabh Srivastava and David W. Wall. A Practical System for Intermodule Code Optimization at Link-Time. Journal of Programming Languages, 1(1):1–18, December 1992.

[SW94] Amitabh Srivastava and David W. Wall. Link-Time Optimization of Address Calculation on a 64-bit Architecture. In PLDI ’94: Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, pages 49–60, New York, NY, USA, 1994. ACM.

[TFCB90] J. J. P. Tsai, K.-Y. Fang, H.-Y. Chen, and Y.-D. Bi. A Noninterference Monitoring and Replay Mechanism for Real-Time Software Testing and Debugging. IEEE Trans. Softw. Eng., 16(8):897–916, 1990.

[Tha00] Henrik Thane. Monitoring, Testing and Debugging of Distributed Real-Time Systems. PhD thesis, Royal Institute of Technology, Stockholm, May 2000. http://www.mrtc.mdh.se/index.php?choice=publications&id=0242.

[TIoEEE] The Institute of Electrical and Electronics Engineers. IEEE Standard for Information Technology — Portable Operating System Interface (POSIX) — Part 1: System Application Program Interface (API) — Amendment 6: Tracing [C Language]. Std 1003.1q-2000.

[TM99] Ariel Tamches and Barton P. Miller. Fine-grained dynamic instrumentation of commodity operating system kernels. In OSDI ’99: Proceedings of the third symposium on Operating systems design and implementation, pages 117–130, Berkeley, CA, USA, 1999. USENIX Association.


[Tre86] R. K. Treiber. Systems programming: Coping with parallelism. Technical Report RJ 5118, IBM Almaden Research Center, April 1986.

[Uni08] Computer Sciences Department — University of Wisconsin. Paradyn Parallel Performance Tools. http://www.cs.wisc.edu/paradyn/, 2008.

[Val96] John David Valois. Lock-free data structures. PhD thesis, Rensselaer Polytechnic Institute, Troy, NY, USA, 1996.

[Wei03] Andreas Weigand. Tracing unter L4/Fiasco, 2003. Großer Beleg (Undergraduate thesis), Technische Universität Dresden.

[Wei06] Eric W. Weisstein. Kolmogorov-Smirnov Test. http://mathworld.wolfram.com/Kolmogorov-SmirnovTest.html, July 2006.

[Wei09] Eric W. Weisstein. Chi-Squared Test. http://mathworld.wolfram.com/Chi-SquaredTest.html, December 2009.

[Wik] Wikipedia. Addressing mode: Memory indirect, autoincrement. http://en.wikipedia.org/w/index.php?title=Addressing_mode&oldid=282493599#Memory_indirect.2C_autoincrement. Retrieved on 2009-03-30.

[WR03] Robert W. Wisniewski and Bryan Rosenburg. Efficient, Unified, and Scalable Performance Monitoring for Multiprocessor Operating Systems. In SC ’03: Proceedings of the 2003 ACM/IEEE conference on Supercomputing, page 3, Washington, DC, USA, 2003. IEEE Computer Society.

[WSG96] Wanqing Wu, Madalene Spezialetti, and Rajiv Gupta. On-line Avoidance of the Intrusive Effects of Monitoring on Runtime Scheduling Decisions. In ICDCS ’96: Proceedings of the 16th International Conference on Distributed Computing Systems (ICDCS ’96), page 216, Washington, DC, USA, 1996. IEEE Computer Society.

[XlRsT+06] Fan Xiao-liang, Zhang Rui-sheng, Ning Ting, Du Jin, and Zhang Chun-yan. Investigating the Feasibility and Designing of Telecom Billing Systems Based on Real-Time Linux. In Proceedings of the Eighth Real-Time Linux Workshop, Lanzhou, China, 2006.

[YB97] Victor Yodaiken and Michael Barabanov. A Real-Time Linux. In Proceedings of the Linux Applications Development and Deployment Conference (USELINUX), Anaheim, CA, January 1997. The USENIX Association.

[YD00] Karim Yaghmour and Michel Dagenais. System Administration: The Linux Trace Toolkit. Linux J., page 22, 2000.

[YLW+06] Chun Yuan, Ni Lao, Ji-Rong Wen, Jiwei Li, Zheng Zhang, Yi-Min Wang, and Wei-Ying Ma. Automated known problem diagnosis with event traces. SIGOPS Oper. Syst. Rev., 40(4):375–388, 2006.


[ZYW+03] Tom Zanussi, Karim Yaghmour, Robert Wisniewski, Richard Moore, and Michel Dagenais. relayfs: An Efficient Unified Approach for Transmitting Data from Kernel to User Space. In Proceedings of the Linux Symposium, pages 519–531, Ottawa, Ontario, Canada, 2003.


Index

ABA problem, 52
ABI, see application binary interface
AOP, see aspect-oriented programming
API, see application programming interface
application binary interface, 22–24, 86
application programming interface, 10, 11, 22, 29, 34, 79, 117, 118
aspect-oriented programming, 9, 10, 26
atomic section, 3, 5, 36, 45, 51, 54, 57, 58, 60, 61, 63–66, 68–71, 99, 106, 117–119

benchmark, 82, 114, 115
    macrobenchmark, 4, 5, 81, 107, 111, 114, 116
    microbenchmark, 4, 5, 81, 107–109, 116

calling convention, 3, 73–75, 119
    C, 69, 70, 72
    custom, 65
    register, 69, 71
    stack-prefaulting, 75, 76
causality
    strong causality, 34, 39, 40
    weak causality, 34
CIL, see Common Intermediate Language
CISC, see Complex instruction set computer
CMPXCHG8B, 72
Common Intermediate Language, 11, 78, 79

DCAS, see double compare-and-swap
delayed preemption, 51, 52, 105–107
DOpE, 82, 85–91, 117
double compare-and-swap, 56, 57
Drops, 32, 41, 43, 60, 77, 78, 82, 86, 93, 100, 117

ETS, see Event Tracing for Singularity
ETW, see Event Tracing for Windows
Event Tracing for Singularity, 4, 19, 41, 78, 80, 81, 102–105
Event Tracing for Windows, 14, 19, 24, 26–28, 41, 50, 80, 104

Ferret, 1, 3–5, 15, 18, 19, 22, 24, 26, 28, 29, 31–44, 47, 48, 50–52, 58–61, 64–66, 69, 70, 78–82, 86, 94, 96, 97, 100–104, 109, 114–119
Fiasco, 26, 47, 55, 58, 61, 63–65, 68–70, 78, 86, 87, 96, 98, 100–102, 105, 106, 119
floating-point unit, 70
FPU, see floating-point unit

globally unique identifier, 24
graphical user interface, 22, 42, 86, 114, 117
GUI, see graphical user interface
GUID, see globally unique identifier

Heisenberg Uncertainty principle
    applied to Software, see probe effect
High-performance computing, 22
HLT, 67, 71
HPC, see High-performance computing

inter-process communication, 20, 27, 34, 63, 64, 77, 78, 86, 87, 97, 100, 101, 105
interrupt-restoration trap, 67
intrusiveness, 2, 8, 22, 25, 42, 90, 109, 112, 114, 118
IPC, see inter-process communication
IRT, see interrupt-restoration trap

kernel entry, 14, 38, 47, 49, 51, 54, 56, 64, 67–69, 106, 107

L4, 77, 79, 80, 90, 97
L4Env, 78–80, 86, 105, 106
L4Linux, 41, 42, 44, 45, 60, 68, 69, 78–80, 86, 96–99, 101, 102, 114, 115

Magpie, 4, 14, 19, 20, 41, 44, 80, 86, 87, 90, 100, 102–105
microkernel, 3, 5, 8, 28, 29, 38, 45, 61, 63, 77, 86, 90, 97, 118
Microsoft Intermediate Language, 11
MSIL, see Microsoft Intermediate Language

network interface controller, 100, 105
NIC, see network interface controller
non–real-time, 1–5, 32, 33, 39, 40, 51, 76, 109–111, 114, 116–118

probe effect, 10, 15, 22, 24, 27, 28, 70
probe point, 9, 11, 26, 27

real-time, 1–5, 7–9, 11–13, 18, 19, 21, 24, 25, 27, 28, 31–33, 36, 39, 40, 44, 47, 49–53, 57, 61, 65, 68–70, 75, 81, 82, 85, 92–95, 100, 106–109, 112, 114–118
Reduced instruction set computer, 10
RISC, see Reduced instruction set computer
rollback code, see rollback section
rollback point, 64, 65
rollback section, 59, 60, 65, 66, 68, 69, 72
rollforward code, see rollforward section
rollforward point, 68
rollforward section, 52, 58, 59, 60, 64, 66–69, 71, 76, 109, 111
RT_Mon, 15, 94

Singularity, 4, 41, 47, 77–80, 103–105, 117
SMM, see System Management Mode
SPARC, 10, 11
static cloning, 3, 59
system call, 19, 38, 44, 47, 49, 60, 64, 67, 76, 80, 86, 93, 101, 108, 109, 114, 115
System Management Mode, 107, 108, 116

timestamp counter, 14, 24, 46–48, 54, 85, 93
TMF, see Trace-Message Format
Trace-Message Format, 80
TSC, see timestamp counter

x86, 9–12, 14, 32, 48, 54–56, 63, 64, 67, 69, 71, 107
x86-64, 12, 48, 55, 119
