Automata-Based Formal Analysis and Verification of the Real-Time Linux Kernel
Total Page:16
File Type:pdf, Size:1020Kb
Universidade Federal de Santa Catarina Scuola Superiore Sant'Anna Red Hat, Inc. Ph.D. in Automation and Systems Engineering Ph.D. in Emerging Digital Technologies - Embedded and Real-Time Systems R&D Platform - Core-Kernel - Real-Time Ph.D. Candidate: Daniel Bristot de Oliveira Ph.D. Supervisors: Tommaso Cucinotta Rômulo Silva de Oliveira Automata-based Formal Analysis and Verification of the Real-Time Linux Kernel Pisa 2020 ACKNOWLEDGEMENTS In the first place, I would like to thank my friends and family for supporting me during the development of this research. In particular, I would like to thank Alessandra de Oliveira for motivating me to pursue a Ph.D. degree since I was a teenager. But also for allowing me to use her computer, which was “my” first computer. I also would like to thank Bianca Cartacci, for the unconditional support in the hardest days of these last years. I would like to thank Red Hat, Inc. for allowing me to continue my academic research in concurrency with my professional life. I would also like to thank the pro- fessionals that motivated me to remain in the academy and supported this research, including Nelson Campaner, Steven Rostedt, Juri Lelli and Arnaldo Carvalho de Mello. In special, I would like to thank Clark Williams, not only for his effort in supporting my research at Red Hat but, mainly, for his adivises and personal support. I would like to thank my Ph.D. colleagues from UFSC and Sant’Anna, in special Karila Palma Silva, Marco Pagani, Paolo Pazzaglia and Daniel Casini, all the members of the Retis Lab, and the administrave staff of the TeCIP institute at Sant’Anna and the PPGEAS at UFSC for the support in the cotutela agreement, in special for Enio Snoeijer. Finally, and most importantly, I dedicate this work to my advisors. Thanks, Pro- fessor Tommaso Cucinotta and Professor Rômulo Silva de Oliveira for supporting me and guiding me in this journey, not only as an academic but as a human as well. ABSTRACT Real-time systems are computing systems where the correct behavior does not depend only on the functional behavior, but also on the timing behavior. In the real-time schedul- ing theory, a system is an abstraction, modeled using a set of variables that describe the sole timing behavior of its components. Linux is an implementation of an operating system (OS), that nowadays supports some of the fundamental abstractions from the real-time scheduling theory. Despite all improvements of the last decade, classifying Linux as a real-time operating system (RTOS) is still a source of conflict between real- time Linux and scheduling communities. The central reasons for this conflict lie in the empirical analysis of the timing properties of Linux made by practitioners, as it is ex- pected that the properties of a real-time system derive from a mathematical description of the behavior of the system. Generally, a rigorous set of proofs is required for any con- clusion about the predictability of the runtime behavior of a real-time system. The gap between the real-time Linux and real-time theory roots in the Linux kernel complexity. The amount of effort required to understand all the constraints imposed on real-time tasks on Linux is not negligible. The challenge is then to describe such operations, using a level of abstraction that removes the complexity due to the in-kernel code. The description must use a formal format that facilitates the understanding of Linux dynam- ics for real-time researchers, without being too far from the way developers observe and improve Linux. Hence, to improve the real-time Linux runtime analysis and verification, this thesis presents a formal thread synchronization model for the PREEMPT_RT Linux kernel. This model is built upon the automata formalism, using a modular approach that enables the creation of a model based on a set of independent sub-systems and the specifications that define their synchronized behavior. The thesis also presents a viable modeling methodology, including the validation strategy and tooling that compares the model against the real execution of the system. This model is then used as the base for the creation of a runtime verification of the method for the Linux kernel. The runtime verification method uses automatic code generation from the model to facilitate the development of the monitoring system. Moreover, it uses the dynamic tracing features of Linux to enable on-the-fly verification of the system, at a low overhead. Finally, the formal modeling of the kernel behavior is used as an intermediary step, facilitating the understanding of the rules and properties that rule the timing behavior of Linux tasks. These properties are then used in the formal definition of the components and compo- sition of the main metric used by the real-time Linux developers, the scheduling latency, in the same level of abstraction used by real-time researchers. The values for these variables are then measured and analyzed by a tool proposed in this thesis. Keywords: Real-time Systems. Linux kernel. Formal methods. Automata. Runtime Verification. LIST OF FIGURES Figure 1 – Thesis approach and contributions. 18 Figure 2 – Response-time analysis abstractions in a timeline. 23 Figure 3 – ftrace output. 31 Figure 4 – Trace timeflow output example. 34 Figure 5 – Non-maskable interruption timeline. 35 Figure 6 – Maskable interruption timeline. 35 Figure 7 – Real-time thread. 36 Figure 8 – Forms of thread interference. 37 Figure 9 – Forms of thread blocking. 37 Figure 10 – System and Model. 40 Figure 11 – Example of automaton. 44 Figure 12 – Example of Petri net. 44 Figure 13 – Monolithic generator G of the washer and dryer machine. 49 Figure 14 – Monolithic specification model S of the washer and dryer machine. 50 Figure 15 – S/G of the washer and dryer machine. 50 Figure 16 – Generator: Gdoor . ............................ 50 Figure 17 – Generator: Gwash. ............................ 50 Figure 18 – Generator: Gdry .............................. 50 Figure 19 – Specification: S1.............................. 51 Figure 20 – Specification: S2.............................. 51 Figure 21 – Specification: S3.............................. 51 Figure 22 – Specification: S4.............................. 51 Figure 23 – Modular generator G of the washer and dryer machine. 51 Figure 24 – Modular specification S of the washer and dryer machine. 52 Figure 25 – Modular model S/G of the washer and dryer machine. 52 Figure 26 – Modeling approach. 67 Figure 27 – Examples of generators: G05 need resched (left) and G04 Schedul- ing context (right). 71 Figure 28 – Examples of generators: G01 sleepable and runnable. 71 Figure 29 – Sets of sequences of event. 72 Figure 30 – perf task_model report dynamic. 73 Figure 31 – perf_tool definition inside the task_model structure. 74 Figure 32 – task_model and perf_tool initialization. 74 Figure 33 – perf thread_model: Events to callback mapping. 75 Figure 34 – Handler for the irq_vectors:nmi_entry tracepoint. 75 Figure 35 – process_event: trying to run the automata. 76 Figure 36 – Example of the perf thread_model output: a thread activation. 76 Figure 37 – Kernel trace excerpt. 77 Figure 38 – S18 Scheduler call sufficient and necessary conditions. 78 Figure 39 – Missing kernel events: the output of perf thread_model. 80 Figure 40 – Missing kernel events: the output of kernel tracepoints. 80 Figure 41 – Pseudo-code of tracing recurrence. 80 Figure 42 – Trace excerpt with comments of where the IRQ context is identified in the trace. 81 Figure 43 – mutex_lock not permitted with interrupts disabled. 81 Figure 44 – S12 Events blocked in the IRQ context. 82 Figure 45 – S22 Lock while interruptible. 82 Figure 46 – Trace of mutex_lock taken in the timer interrupt handler. 82 Figure 47 – Function stack, from the timer IRQ to the mutex_lock, used in the report for the Linux kernel developers. 83 Figure 48 – Verification approach. 86 Figure 49 – Wake-up In Preemptive (WIP) Model. 86 Figure 50 – Auto-generated code from the automaton in Figure 49. 87 Figure 51 – Helper functions to get the next state. 88 Figure 52 – Sleeping While in Atomic (SWA) model. 89 Figure 53 – Example of output from the proposed verification module, as occur- ring when a problem is found. 89 Figure 54 – Phoronix Stress-NG Benchmark Results: as-is is the system without tracing nor verification; SWA is the system while verifying Sleeping While in Atomic automata in Figure 56 and with the code in Figure 50; and the trace is the system while tracing the same events used in the SWA verification. 90 Figure 55 – Need re-sched forces scheduling (NRS model). 92 Figure 56 – Latency evaluation, using the SWA model (top) and the NRS model (bottom). 93 Figure 57 – NMI generator (O1). 96 Figure 58 – IRQ disabled by software (O2). 97 Figure 59 – IRQs disabled by hardware (O3). 97 Figure 60 – Context switch generator (04). 97 Figure 61 – Context switch generator (05). 97 Figure 62 – Preempt disable (06). 97 Figure 63 – Scheduling context (07). 97 Figure 64 – Thread runnable/sleepable (08). 97 Figure 65 – Need re-schedule operation (09). 97 Figure 66 – NMI blocks all other operations (R2). 99 Figure 67 – Operations blocked in the IRQ context (R3). 99 Figure 68 – IRQ disabled by thread or IRQs (R4). 99 Figure 69 – The scheduler is called with interrupts enabled (R5). 99 Figure 70 – The scheduler is called with preemption disabled to call the scheduler (R6). 99 Figure 71 – The scheduler context does not enable the preemption (R7). 100 Figure 72 – The context switch occurs with interrupts and preempt disabled (R8). 100 Figure 73 – The context switch occurs in the scheduling context (R9). 100 Figure 74 – Wakeup and need resched requires IRQs and preemption disabled (R10 and R11). 100 Figure 75 – Disabling preemption to schedule always causes a call to the sched- uler (R12). 100 Figure 76 – Scheduling always causes context switch (R13). 100 Figure 77 – Setting need resched always causes a context switch (R14).