Mixed-Criticality Scheduling and Resource Sharing for High-Assurance Operating Systems

Anna Lyons

Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

School of Computer Science and Engineering

University of New South Wales

Sydney, Australia

September 2018

Abstract

Criticality of a software system refers to the severity of the impact of a failure. In a high-criticality system, failure risks significant loss of life or damage to the environment. In a low-criticality system, failure may risk a downgrade in user experience. As the criticality of a software system increases, so too does the cost and time to develop that software: raising the criticality also raises the assurance level, with the highest levels requiring extensive, expensive, independent certification. For modern cyber-physical systems, including autonomous aircraft and other vehicles, the traditional approach of isolating systems of different criticality on completely separate physical hardware is no longer practical, being both restrictive and inefficient. The result is mixed-criticality systems, where software applications with different criticalities execute on the same hardware. Mechanisms are required to ensure that software in mixed-criticality systems is sufficiently isolated; otherwise, all software on that hardware is promoted to the highest criticality level, driving up costs to impractical levels. For mixed-criticality systems to be viable, both spatial and temporal isolation are required.

Current aviation standards allow for mixed-criticality systems where temporal and spatial resources are strictly and statically partitioned in time and space, allowing some improvement over fully isolated hardware. However, further improvements are not only possible, but required for future innovation in cyber-physical systems. This thesis explores further operating-system mechanisms to allow mixed-criticality software to share resources in far less restrictive ways, opening further possibilities in cyber-physical system design without sacrificing assurance properties. Two key properties are required: first, time must be managed as a central resource of the system, while allowing for overbooking with asymmetric protection without increasing certification burdens. Second, components of different criticalities should be able to safely share resources without suffering undue utilisation penalties.

We present a model for capability-controlled access to processing time without incurring overhead-related capacity loss or restricting user policy, including processing time in shared resources. This is achieved by combining the idea of resource reservations, from resource kernels, with the concept of resource overbooking, which is central to policy freedom. The result is the core mechanisms of scheduling contexts, scheduling context donation over IPC, and timeout exceptions, which allow system designers to designate their own resource allocation policies. We follow with an implementation of the model in the seL4 microkernel, a high-assurance, high-performance platform.

Our final contribution is a thorough evaluation, including microbenchmarks, whole-system benchmarks, and isolation benchmarks. The micro- and system benchmarks show that our implementation is low overhead, while the user-level scheduling benchmark demonstrates that policy freedom in terms of scheduling is retained. Finally, our isolation benchmarks show that our processor temporal isolation mechanisms are effective, even in the case of shared resources.


Contents

Abstract

Contents

Acknowledgements

Publications

Acronyms

List of Figures

List of Tables

List of Listings

1 Introduction
   1.1 Motivation
   1.2 Definitions
   1.3 Objectives
   1.4 Approach
   1.5 Contributions
   1.6 Scope

2 Core concepts
   2.1 Real-time theory
   2.2 Scheduling
   2.3 Resource sharing
   2.4 Operating systems
   2.5 Summary

3 Temporal Isolation & Asymmetric Protection
   3.1 Scheduling
   3.2 Mixed-criticality schedulers
   3.3 Resource sharing
   3.4 Summary

4 Operating Systems
   4.1 Standards
   4.2 Existing operating systems
   4.3 Isolation mechanisms
   4.4 Summary

5 seL4 Basics
   5.1 Capabilities
   5.2 System calls and invocations
   5.3 Physical memory management
   5.4 Control capabilities
   5.5 Communication
   5.6 Scheduling
   5.7 Summary

6 Design & Model
   6.1 Scheduling
   6.2 Resource sharing
   6.3 Mechanisms
   6.4 Policies
   6.5 Summary

7 Implementation in seL4
   7.1 Objects
   7.2 MCS API
   7.3 Data Structures and Algorithms
   7.4 Summary

8 Evaluation
   8.1 Hardware
   8.2 Overheads
   8.3 Temporal Isolation
   8.4 Practicality
   8.5 Multiprocessor benchmarks
   8.6 Summary

9 Conclusion
   9.1 Contributions
   9.2 Future work
   9.3 Concluding Remarks

Bibliography

Appendices

Appendix A MCS API
   A.1 System calls
   A.2 Invocations

Appendix B Code
   B.1 Sporadic servers

Appendix C Fastpath Performance
   C.1 ARM
   C.2


Acknowledgements

It’s a common aphorism that research outcomes “stand on the shoulders of giants”, referring to all the research and hard work that the outcomes build upon. It’s definitely true; however, I’d like to personally thank those who are rarely recognised: the carers, the cleaners, the cooks, the friends and family, the physiotherapists, the detectives, therapists, and all of the people who form a support group around a person. It takes a village to raise a child, and the same goes for a PhD. In both cases, there’s a lot of thankless, gruelling work, which might sound similar, but you don’t get a certificate and floppy hat at the end. You’re all giants to me. I’d like to thank the following people for their help and support: my partner Kara Hatherly, Debbie & Geoff Fear, Biara Webster, Ryan Williams and his team, Patricia Durning and Gary Penhall. I would never have finished this project without you. I thank my supervisor, Gernot Heiser, for encouraging me, giving me the time I needed, and ultimately providing the support that allowed me to complete the project in the face of extreme external events. I thank my co-supervisor, Kevin Elphinstone, for being drawn into hours of whiteboard discussion after I waited earnestly at the coffee machine. Additionally, I thank all of the students, engineers and researchers who have helped at some point or another: David Greenaway, Thomas Sewell, Adrian Danis, Kent McLeod, Yanyan Shen, Luke Mondy, Hesham Almatary and Qian Ge. Further thanks to everyone in the Trustworthy Systems team, past and present; you’re all fantastic. Matthew Fernandez, Rafal Kolanski, and Joel Beeren get a special thanks not only for help with the work, but for supporting me in the lab, and the pub, and being my friends even at my worst; you were a huge factor in keeping me sane. Finally, I’d like to thank my veritable army of proof readers: Kara Hatherly, Kent McLeod, Peter Chubb, Tim Cerexhe, Alex Legg, Joel Beeren, Sebastion Holzapfel, Robert Sison, Sean Peters, Ihor Kuz and of course, Gernot Heiser.


Publications

This thesis is partially based on work described in the following publications:

• Anna Lyons, Kent Mcleod, Hesham Almatary, and Gernot Heiser. Scheduling-context capabilities: A principled, light-weight OS mechanism for managing time. In EuroSys Conference, Porto, Portugal, April 2018. ACM.

• Anna Lyons and Gernot Heiser. Mixed-criticality support in a high-assurance, general-purpose microkernel. In Workshop on Mixed Criticality Systems, pages 9–14, Rome, Italy, December 2014.


Acronyms

AAV autonomous aerial vehicle
AES advanced encryption standard
API application programming interface
APIC advanced PIC
ASID address space ID
BMK bare metal kernel
BWI bandwidth inheritance
CA certification authority
CBS constant-bandwidth server
CFS completely-fair scheduler
CPU central processing unit
DM deadline monotonic
DS deferrable server
EDF earliest deadline first
FIFO first-in first-out
FP fixed priority
FPRM fixed-priority rate monotonic
FPU floating point unit
GPS global positioning system
HLP highest lockers protocol
HRT hard real-time
IPC inter-process communication
IPCP immediate priority ceiling protocol
MCAR Mixed Criticality Architecture Requirements
MCP maximum controlled priority
MCS mixed criticality system
MMU memory management unit
MPCP multiprocessor priority ceiling protocol
MPU memory protection unit
MSI message-signalled interrupts
MSR model-specific register
MSRP multiprocessor stack resource policy
NCET nominal-case execution time
NCP non-preemptive critical sections protocol
OCET overloaded-case execution time
OPCP original priority ceiling protocol
OS operating system
PIC programmable interrupt controller
PIP priority inheritance protocol
POSIX portable operating system interface
QoS quality of service
RBED rate-based earliest deadline first
RM rate monotonic
RPC remote procedure call
RTOS real-time operating system
SC scheduling context
SCO scheduling context object
SMP symmetric multiprocessor
SoC system on chip
SRP stack resource policy
SRT soft real-time
SS sporadic server
SWaP size, weight and power
TCB thread control block
TLB translation lookaside buffer
TSC timestamp counter
UDP User Datagram Protocol
VM virtual machine
WCET worst-case execution time
YCSB Yahoo! Cloud Serving Benchmark
ZS zero slack


List of Figures

2.1 Diagram of the sporadic task model
2.2 An example FPRM schedule
2.3 An example EDF schedule
2.4 Comparison of RT locking protocols
2.5 Structure of a microkernel versus monolithic operating system
4.1 Thread execution during IPC
5.1 State diagram of an endpoint
5.2 Legend for diagrams in this section
5.3 IPC phases
5.4 Notification illustrations
5.5 State diagram of notification objects
5.6 Thread state transitions in seL4
6.1 Legend for diagrams in this chapter
6.2 IPC phases between an active client and passive server
6.3 IPC phases between active client and active server
6.4 Example of a timeout exception handler
6.5 IPC forwarding
6.6 Example of proxying requests via a server
6.7 Simplified AAV design
7.1 State diagram of resume objects
7.2 Reply stack example
7.3 The problem with seL4_SendRecv
7.4 New tickless kernel structure
8.1 System architecture of Redis benchmark
8.2 Results of Redis isolation benchmark
8.3 System architecture of ipbench benchmark
8.4 Results of ipbench isolation benchmark
8.5 Architecture of the AES case study
8.6 Results of AES server isolation benchmark
8.7 Kernel vs. user-level criticality switch
8.8 Results of seL4 user-level EDF versus LITMUSRT
8.9 Results of the multicore IPC throughput benchmark
8.10 Results of the AES shared server multicore case study


List of Tables

1.1 Criticality levels from DO-178C
1.2 Fictional example systems in a self-driving car
2.1 Parameters and notation for the sporadic task model
2.2 A sample task set
4.1 POSIX RT scheduling policies
4.2 POSIX mutex policies
4.3 Scheduling and isolation in OSes
5.1 Operations on capabilities
5.2 seL4 system call summary
5.3 Memory object types in seL4
5.4 Initial memory ranges at boot time on the SABRE platform
5.5 Summary of operations on TCBs
5.6 Slot layout in the initial cnode
5.7 System calls and their effects when used on endpoint capabilities
5.8 System calls and their effects when used on notification objects
5.9 Comparison of existing seL4 scheduling options
7.1 Fields in a resume object
7.2 Fields of a scheduling context object
7.3 Layout of a scheduling context object
7.4 Added and removed fields in a TCB
7.5 New seL4 system call summary
7.6 Scheduling context capability invocations
7.7 New and altered thread control block (TCB) capability invocations
8.1 Hardware platform details
8.2 Latency of timer operations
8.3 Fastpath IPC overhead (seL4_Call)
8.4 Fastpath IPC overhead (seL4_ReplyRecv)
8.5 Passive slowpath round-trip IPC overhead
8.6 Active slowpath round-trip IPC overhead
8.7 Passive fault round-trip IPC overhead
8.8 Active fault round-trip IPC overhead
8.9 Fastpath and non-fastpath IRQ overhead
8.10 Fastpath signal overhead
8.11 Yield to self scheduler costs
8.12 Slowpath scheduler costs
8.13 Results of Redis throughput benchmark
8.14 Cost of timeout handler operations
8.15 Comparison of kernel and user-level priority switches
8.16 Results of criticality switch benchmark
C.1 KZM seL4_Call fastpath
C.2 KZM seL4_ReplyRecv fastpath
C.3 SABRE seL4_Call fastpath
C.4 SABRE seL4_ReplyRecv fastpath
C.5 HIKEY32 seL4_Call fastpath
C.6 HIKEY32 seL4_ReplyRecv fastpath
C.7 HIKEY64 seL4_Call fastpath
C.8 HIKEY64 seL4_ReplyRecv fastpath
C.9 TX1 seL4_Call fastpath
C.10 TX1 seL4_ReplyRecv fastpath
C.11 IA32 seL4_Call fastpath
C.12 IA32 seL4_ReplyRecv fastpath
C.13 X64 seL4_Call fastpath
C.14 X64 seL4_ReplyRecv fastpath

List of Listings

2.1 Example of a basic sporadic real-time task
5.1 Scheduler bitmap algorithm
6.1 Example of a polling task on seL4
6.2 Example of a basic sporadic task on seL4
7.1 Refill new routine
7.2 Refill update routine
7.3 Schedule used routine
7.4 Check budget routine
7.5 Split check routine
7.6 Unblock check routine
7.7 Example initialiser, passive server, and client
8.1 Pseudocode for the EDF scheduler


1 Introduction

Criticality of a software system refers to the severity of the impact of a failure. In a high-criticality system, failure risks significant loss of life or damage to the environment. In a low-criticality system, failure may risk a downgrade in user experience. As the criticality of a software system increases, so too does the cost and time to develop that software: raising the criticality also raises the assurance level, with the highest levels requiring extensive, expensive, independent certification.

For modern cyber-physical systems, including autonomous aircraft and other vehicles, the traditional approach of isolating systems of different criticality by using completely separate physical hardware is no longer practical, being both restrictive and inefficient: restrictive in that physical isolation prevents the construction of modern cyber-physical systems; inefficient in terms of under-utilised hardware resources. The result is mixed-criticality systems, where software applications with different criticalities execute on the same hardware. Mechanisms are required to ensure that software in mixed-criticality systems is isolated. Otherwise, all software on shared hardware is promoted to the highest criticality level, driving up costs to impractical levels. For mixed-criticality systems to be viable, both spatial and temporal isolation are required.

However, mixed-criticality systems have conflicting requirements that challenge operating systems (OS) design: they require distrusting components of different criticalities to share resources and must degrade gracefully in the face of failure. For example, an autonomous aerial vehicle (AAV) has multiple inputs to its flight-control algorithm: object detection, to avoid flying into obstacles, and navigation, to get to the desired destination. Clearly the object detection is more critical than navigation, as failure of the former can be catastrophic, while the latter would only result in a non-ideal route. Yet the two subsystems must cooperate, accessing and modifying shared data, and thus cannot be fully isolated.

The AAV is an example of a mixed-criticality system, a notion that originates in avionics and its need to reduce size, weight and power (SWaP) by consolidating growing functionality onto a smaller number of physical processors. More recently the Mixed Criticality Architecture Requirements (MCAR) [Barhorst et al., 2009] program was launched, which recognises that in order to construct fully autonomous systems, critical and less critical

systems must be able to share resources. However, resource sharing between components of mixed criticality is heavily restricted by current standards. While mixed-criticality systems are becoming the norm in avionics, this is presently in a very restricted form: in the ARINC 653 standard [ARINC], the system is orthogonally partitioned spatially and temporally, and partitions are scheduled with fixed time slices. Shared resources are permitted but can only be accessed through fixed-time partitions; in other words, a critical component must not trust a lower-criticality one to behave correctly. This limits integration and cross-partition communication, and implies long interrupt latencies and poor resource utilisation.

These challenges are not unique to avionics: top-end cars exceeded 70 processors ten years ago [Broy et al., 2007]; with the robust packaging and wiring required for vehicle electronics, the SWaP problem is obvious, and will be magnified by the move to more autonomous operation. Other classes of cyber-physical systems, such as smart factories, will experience similar challenges.

One could reduce the number of separate processors without mixing criticalities by simply consolidating systems of the same criticality level; however, this does not lead to efficient use of resources. High-criticality systems that are subject to high levels of assurance frequently have very pessimistic worst-case execution-time (WCET) estimates, which are often orders of magnitude beyond the average execution time due to the limitations in analysis and the complexity of modern hardware [Wilhelm et al., 2008]. However, in current systems the full WCET must be statically reserved for the high-criticality tasks. By building mixed-criticality systems, low-criticality tasks, for which temporal interference can lead only to degradation in service, can leverage the unused capacity reserved for high-criticality tasks.

Fundamental to building mixed-criticality systems is the concept of asymmetric protection, which differs from strict temporal isolation in that only the highest criticality systems are guaranteed top levels of service. Less-critical software is not completely isolated, and can be affected by high-criticality software, but not vice versa.

Finally, mixed-criticality systems are about more than basic consolidation and leveraging excess resources; they can achieve more than traditional physically isolated systems by allowing actual resource sharing. Consider the driving software of a self-driving car: it may take inputs from various sensors which detect objects, input from a global positioning system (GPS) navigation unit, and traffic data from an internet-connected service providing alternate route information. All of these systems must provide input to the shared driving software, which actually makes decisions about what commands to issue to the car. Clearly, the object detection is more critical than the GPS, which in turn is more critical than the route information service. In fact, the route information service, being connected to the internet, is not even a trusted service: it is subject to attacks, such as denial-of-service. By allowing untrusted, less critical components to share hardware and communicate with more critical components, far more sophisticated software can be introduced to the

system. Examples include heuristics that are common in artificial intelligence algorithms, or internet-connected software which by its nature cannot be completely trusted. Both of these use cases are far too complex to certify to the highest criticality standard, but are essential for emerging cyber-physical systems like self-driving cars. The solution is to co-locate these services, and provide strong isolation and safety guarantees to assure correctness.

Level | Impact
Catastrophic | Failure may cause multiple fatalities, usually with loss of the airplane.
Hazardous | Failure has a large negative impact on safety or performance, or reduces the ability of the crew to operate the aircraft due to physical distress or a higher workload, or causes serious or fatal injuries among the passengers.
Major | Failure significantly reduces the safety margin or significantly increases crew workload. May result in passenger discomfort (or even minor injuries).
Minor | Failure slightly reduces the safety margin or slightly increases crew workload. Examples might include causing passenger inconvenience or a routine flight plan change.
No Effect | Failure has no impact on safety, aircraft operation or crew workload.

Table 1.1: Criticality levels from DO-178C, a safety standard for commercial aircraft.

Our goal is to develop trustworthy, high-assurance OS mechanisms that provide for the unique requirements of mixed-criticality systems, without requiring all software to be promoted to the highest criticality level in the system. The implementation platform will be the seL4 microkernel [Klein et al., 2009], which has been designed for the high-assurance, safety-critical domain. Concisely, the goals of this research are to provide:

G1 A principled approach to processor management, treating time as a fundamental kernel resource, while providing mechanisms for asymmetric protection, a key requirement of mixed-criticality systems;

G2 Safe resource sharing between applications of different criticalities, assurance levels and temporal requirements.

1.1 Motivation

As noted in the introduction, the criticality of a system reflects the severity of failure, where higher criticality implies higher severity. Table 1.1 shows criticality levels considered when designing software for commercial aircraft in the United States.

As the criticality level rises, so do the assurance levels: higher engineering and safety standards are required for higher criticality levels, up to certification by independent certification authorities (CAs) at the highest level, which is a time-consuming, expensive and restrictive process. Additionally, requirements specified by CAs are highly cautious and pessimistic, which, when combined with safety strategies such as redundant resources, leads to large amounts of excess capacity in processing power. Consequently, highly critical software is expensive to develop and tends to have low complexity in order to minimise costs. Any software that is not fully isolated from a critical component is promoted to that level of criticality, increasing the production cost, and imposing strict requirements. Traditionally, systems of different criticality levels were fully isolated with air gaps between physical components. However, given the increased amount of computing in every part of our daily lives, the practice of physical isolation has resulted in unscalable growth in the amount of computing hardware in embedded systems, with some modern cars containing over 100 processors [Hergenhan and Heiser, 2008]. The practice of physical separation is no longer viable for three reasons: SWaP, efficiency, and resource sharing.

1.1.1 SWaP

First, systems with air-gaps require increased physical resources as each extra processor comes with extra cabling and power requirements, increasing production costs and environmental impact. For vehicles, especially aircraft, this goes further to reduce their function; the heavier the system, the more fuel it requires, especially when considering the redundancy built into these systems for safety purposes: often each component is double- or triple-redundant.

1.1.2 Efficiency

However, SWaP alone could be reduced by consolidating systems of the same criticality. Ignoring the fact that this also consolidates points of failure (redundant systems, by definition, cannot be consolidated), this alone is insufficient to completely address SWaP challenges. The higher the assurance required of a system, the more over-provisioned the hardware is: not only in redundancy but in excess capacity. As mentioned previously, WCET estimates may be orders of magnitude larger than the typical execution time, and computation of safe WCET bounds for non-trivial software tends to be highly pessimistic [Wilhelm et al., 2008]. At the same time, the core integrity property of a high-criticality, real-time system is that deadlines must always be met, meaning that there must always be time to let such activities execute their full WCET. Consequently, consolidating high-criticality systems only ameliorates the problem slightly: the pessimism inherent in high-assurance systems is also consolidated, leaving unused, excess hardware resources.

System | Purpose | Criticality
Obstacle detection | Safety: failure can lead to collision, which could lead to significant damage and loss of life. | Catastrophic
GPS Navigation | Route planning: failure can lead to an incorrect turn and further time travelling. | Minor
Traffic service | Route planning: failure can lead to a slower trip, by failing to avoid congested paths. | Minor

Table 1.2: Fictional example systems in a self-driving car.

High-utilisation of hardware to address SWaP challenges is therefore only possible with the advent of mixed-criticality systems, which can leverage the excess capacity for less critical, less time-sensitive software.

1.1.3 Resource Sharing

Mixed-criticality systems also bring opportunities for new types of systems, and are indeed required for emerging cyber-physical systems like advanced driver assistance systems, autonomous vehicles and aircraft. Consider the systems required for a self-driving car to determine which commands to issue the vehicle, as outlined in Table 1.2. The object-detection is the most critical, and is generally comprised of a host of sensors: infra-red, LIDAR, cameras, and radar, which themselves are subject to different criticality levels. Object-detection, especially from image data, is a function of the amount and complexity of data available: when driving in desert, compared to driving in a city, far less input is available. It is also the most critical system in our example, with failure leading to potentially disastrous collisions. However, a GPS navigation service is also required to determine the route to take, and finally a traffic service which allows cars to avoid congestion. Although the same criticality, the GPS service is likely more highly-assured than the traffic service: the traffic service is connected to the internet, requiring complex network drivers, and subject to attacks from remote attackers. In fact, the fastest way to integrate a traffic service is to run a large, open source code base with existing network drivers like Linux, which has no assurance level. All three systems are important to the car manufacturer: a bad review due to poor navigation, or not avoiding a traffic jam, can have a marked impact on sales. The three systems must access the same shared resource (the driving logic), which requires the system to be mixed-criticality: the traffic service, and to a lesser extent the GPS, are not practical to develop at high assurance levels. In order to gain high utilisation, the resources must be overbooked: the traffic service may have a low-level guarantee to some hardware resources, but the object-detection service may infringe on that resource occasionally. If the traffic service is statically assigned its resource allocation, its utility may decrease, as it can run more frequently when the object-detection does not require that time. Similarly, when stuck

in traffic, the GPS service isn't much use: the traffic service is of far more utility in finding alternate routes. Conversely, when on a highway at high speed, the GPS service requires more processing time, and the traffic service can run less frequently. Consequently, mixed-criticality shared resources are integral to future cyber-physical systems, something that is not possible with traditional separation of physical hardware. At the same time, for high system utilisation, asymmetric protection and overbooking are required to address SWaP challenges. Finally, without sufficient mechanisms, none of this is possible, as developing such a system entirely to high assurance levels is not practical.

1.2 Definitions

Mixed-criticality In the real-time community, much work has been conducted around a theoretical model introduced by Vestal [2007], which we discuss in Section 3.2. The focus of this thesis is the broader definition of mixed criticality, such that software of different levels of criticality can safely share hardware, and not specifically the model presented by Vestal [2007] and beyond [Burns and Davis, 2017], although it is discussed in Chapter 3.

Temporal isolation For a system A to be temporally isolated from system B, B cannot cause temporal failures in A. This does not necessarily mean that B cannot affect A, but that any temporal effect B can have on A is deterministic and bounded, such that A is correct regardless of B.

Asymmetric Protection In order to leverage an overbooked system, high-criticality tasks must be able to cause failures in low-criticality tasks but not vice-versa. Another way to state this is that criticality inversion must be bounded, where a low-criticality task must not cause failures in a high-criticality task.

1.3 Objectives

Given their improvements to SWaP, function and efficiency, mixed-criticality systems offer great advantages over the traditional physical isolation approach. Ernst and Di Natale [2016] identify two sets of mechanisms that need to be provided in order to support true mixed-criticality systems:

1. operating system kernels and schedulers that guarantee resource management to provide independence in the functional and time domain; separation kernels are the most notable example;

2. mechanisms to detect timing faults and control them, including monitors, and the scheduling provisions for guaranteeing controllability in the presence of faults.

1.4 Approach

In this thesis we look to systems and real-time theory related to scheduling, resource allocation and sharing, criticality and trust to develop a set of mechanisms required in an OS to support mixed-criticality systems. We implement those mechanisms in seL4 and develop a set of microbenchmarks and case studies to evaluate the implementation. Chapter 2 establishes the basic terminology required to present the remaining ideas, including an introduction to real-time scheduling and resource-sharing theory. We also introduce the concept of a resource kernel, an existing approach to providing principled access to time as a first-class resource. In Chapter 3 we examine in detail the theoretical models and methods for achieving temporal isolation and safe resource sharing in traditional real-time systems and show that sufficient theory exists for resource overbooking and asymmetric protection. Chapter 4 investigates systems support for the same concepts; by surveying existing commercial, open-source and research operating systems we show that in the current state of the art, support for resource overbooking and asymmetric protection is insufficient for modern cyber-physical systems. Chapter 5 presents the final piece of background, presenting the existing concepts which make our implementation platform, seL4, an excellent target for high-assurance systems with its existing spatial-resource isolation mechanisms. However, we also show that, like the other operating systems considered in the preceding chapter, its current mechanisms are insufficient for temporal isolation, asymmetric protection, and resource overbooking. We then present in Chapter 6 a model for the mechanisms required to build mixed-criticality operating systems, including capability-based access control to processing time which allows for overbooking and does not limit user-level policy. Additionally, we provide new mechanisms for resources that can be shared safely between systems of different criticalities. Our model is designed to integrate with seL4; however, it has wider applications to OS design for mixed-criticality systems. Chapter 7 delves deeply into the implementation details, presenting a full, mature implementation of our model along with discussions of design trade-offs and limitations. Finally, Chapter 8 presents a detailed set of benchmarks and case studies on a variety of x86 and ARM platforms, showing the overheads of our implementation, demonstrating temporal isolation in systems with and without shared resources, and showing how resource overbooking and asymmetric protection can be achieved.

1.5 Contributions

We make the following specific contributions:

• a design and model for fine-grained access control to processing time, without imposing strict limitations on system policy or introducing performance overheads, and without requiring sacrifices to integrity or confidentiality;

• an implementation of the model in the high-assurance, non-preemptible seL4 microkernel, and an exploration of the simplicity of the model and its integration with the existing model, which provides strong isolation properties;

• an evaluation of the implementation, demonstrating low overheads and temporal isolation between systems sharing a processor and other resources;

• user-level implementations of dynamic schedulers and resource-sharing servers which demonstrate that a variety of policies are not only possible to implement, but low overhead.

1.6 Scope

We focus on uniprocessor and symmetric multiprocessor (SMP) systems with memory management units (MMUs) for ARM and x86, where our focus is on safety, through integrity and availability. We leave security concerns such as covert channels to future work. Additionally, we assume a constant processor speed, and leave variable-speed processors and energy-saving mechanisms to future work.

2 Core concepts

In this chapter we provide the background required to motivate and understand our research. We introduce real-time theory, including scheduling algorithms and resource-sharing protocols. In addition, we define operating systems, and introduce the concept of a resource kernel. This thesis draws on all of these fields in order to support real-time and mixed-criticality systems.

2.1 Real-time theory

Real-time systems have timing constraints, where the correctness of the system is dependent not only on the results of computations, but on the time at which those results arrive [Stankovic, 1988]. Essentially all software consumes time: if the software never gets access to processing time, no results will ever be obtained. However, for a system to be considered real-time it must be sensitive to timing behaviour—how much processing time is allocated and when it is allocated. For most software, time is fungible: it does not matter when the software is run, as long as it does get run. For real-time software this is not the case. Consider the case of software that deploys the brakes on a train: the routine must run in time for the train to stop at the platform. Or, in a more critical scenario, software that deploys the landing gear on a plane: it must run and the gear must be fully down before the plane hits the runway. There are many ways to model real-time systems, which we touch on in the coming chapters.

2.1.1 Types of real-time tasks

The term task in real-time theory is a high-level abstraction used to refer to a logical sequence of events, which in operating systems terms can be realised as a single thread. How tasks are realised by an operating system—with or without memory protection, for example—is specific to the implementation. Computations by tasks in real-time systems have deadlines which determine the correctness of the system. How those deadlines affect

correctness depends on the category of the system, which is generally referred to as hard real-time (HRT), soft real-time (SRT), or best-effort.

Hard Real-Time tasks have deadlines; if a deadline is missed, the task is considered incorrect. HRT tasks are most common in safety-critical systems, where deadline misses can lead to catastrophic results. Examples include airbag systems in cars and control systems for autonomous vehicles. In the former, missing a deadline could cause an airbag to be deployed too late. In the latter, the autonomous vehicle could crash. To guarantee that an HRT task can meet its deadlines, the code must be subject to WCET analysis. State-of-the-art WCET analysis is generally pessimistic by several orders of magnitude on modern hardware platforms with deep pipelines, complex cache hierarchies, branch predictors and memory-management units [Wilhelm et al., 2008], which means that allocated time will not generally be used completely, although it must be available. WCET is pessimistic for multiple reasons: firstly, much behaviour, like interrupt arrival times, cannot be predicted but only bounded, so the schedulability analysis incorporated into the WCET includes the worst possible execution case. Secondly, accurate models of modern hardware are few, so often the WCET must assume caches are not primed and possibly conflicting. Further detail about WCET analysis can be found in Lv et al. [2009].

Soft Real-Time tasks, as opposed to HRT, can be considered correct in the presence of some defined level of deadline misses. Although WCET can be known for SRT tasks, less precise but more optimistic time estimates are generally used for practicality. Examples of SRT tasks can be found in multimedia and video game platforms, where deadline misses may result in some non-critical effect such as performance degradation. SRT tasks can also be found in safety-critical systems where the result degrades if not enough time has been allocated by the deadline, but the result remains useful e.g. image recognition and object detection in autonomous vehicles. In such tasks, a minimum allocation of time to the task might be HRT, while further allocations are SRT.

Various models exist to quantify permissible deadline misses in SRT systems. One measure is to consider the upper bound of how much the deadline is missed, referred to as tardiness [Devi, 2006]. Another is to express an upper bound on the percentage of deadline misses permitted for the system to be considered correct, which is easier to measure but less meaningful. Some SRT systems allow deadlines to be skipped completely [Koren and Shasha, 1995] while others allow deadlines to be postponed. Systems that allow deadlines to be skipped are often referred to as firm real-time e.g. media processing, where some frames can be dropped.
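As a concrete illustration of the first measure (written here in common convention rather than quoted from Devi [2006]), the tardiness of a single job that completes at time f with absolute deadline d is

\mathrm{tardiness} = \max(0,\ f - d)

and an SRT task set may be considered correct if every job's tardiness is bounded by some constant, rather than always being zero as in the HRT case.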

Best-effort tasks do not have temporal requirements, and generally execute in the back- ground of a real-time system. Examples include logging services and standard applications, but may be far more complicated. Of course, any task must be scheduled at some point for system success, so there must be some non-starvation guarantee.

for (;;) {
    // job arrives
    doJob();
    // job completes before deadline
    // sleep until an event triggers the next job
    sleep();
}

Listing 2.1: Example of a basic sporadic real-time task.

2.1.2 Real-time models

A real-time system is a collection of tasks, and can be of different real-time models (SRT, HRT, best-effort). For analysis, each task in a real-time system is modelled as an infinite series of jobs. Each job represents a computation with a deadline. One practical model in common use is the sporadic task model [Mok, 1983], which can model both periodic and aperiodic tasks and maps well onto typical control systems.

The sporadic task model considers a real-time system as a set of tasks, τ1, τ2, ..., τn. Each τi refers to a specific task in the task set, usually each for a different functionality. The infinite series of jobs is per task, and represented by τi1, τi2, ..., τin. Each task has a WCET, Ci, and period, Ti, which represents the minimum inter-arrival time between jobs. When a Ti has passed and a job can run again, it is said to be released. When a job is ready to run, it is said to have arrived, and when a job finishes and blocks until the next arrival, it is said to be completed. For periodic tasks, which are commonly found in control systems, jobs release and arrive at the same time; they are always ready to run once their period has passed, subject to jitter, where the release time is greater than the arrival time. Importantly, a late job release does not alter the deadline of the job, which remains relative to the arrival time. Sporadic tasks are more useful in modelling interrupt-driven tasks, where arrival times are unknown.

Jobs must complete before their period has passed (Ci ≤ Ti), as the model does not support jobs that run simultaneously. Tasks with interleaving jobs must be modelled as separate tasks which do not overlap. Listing 2.1 shows pseudocode for a sporadic task and Table 2.1 summarises the notation which is used throughout this thesis. Sporadic tasks have deadlines relative to their arrival time, which may be constrained or implicit. An implicit deadline means the job must finish before the period elapses. Constrained deadlines are relative to the release time, but before the period elapses. Finally, arbitrary deadlines can be after the period elapses. Multiple sporadic tasks may be released, that is to say, ready to run, at the same time, but they must be processed sequentially.
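The parameters of Table 2.1 map directly onto a small data structure. The following C sketch is purely illustrative: the type and function names are hypothetical and are not part of seL4 or of the model developed later in this thesis.

#include <stdbool.h>
#include <stdint.h>

/* Static parameters of one sporadic task, mirroring Table 2.1.
 * All times are in integer time units (for example, microseconds). */
typedef struct {
    uint64_t C; /* worst-case execution time (WCET), C_i */
    uint64_t T; /* minimum inter-arrival time (period), T_i */
    uint64_t D; /* relative deadline, D_i */
} sporadic_task_t;

/* Deadline classifications described above. */
static bool deadline_implicit(const sporadic_task_t *t)    { return t->D == t->T; }
static bool deadline_constrained(const sporadic_task_t *t) { return t->D <= t->T; }

/* Model requirement: a job must be able to complete within its period. */
static bool task_well_formed(const sporadic_task_t *t) { return t->C <= t->T; }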

2.2 Scheduling

Simply put, scheduling is deciding which task to run at a specific time. In operating systems terms, this is deciding which thread or process to run on the processor next and, in the case of multiprocessors, which processor to use. More formally, scheduling is the act of assigning resources to activities or tasks [Baruah et al., 1996], generally discussed in the context of processing time, but is also required for other resources such as communication channels. A scheduling algorithm allocates time to tasks so that they all meet their deadlines, subject to conditions such as utilisation bounds. Real-time scheduling algorithms differ from their non-real-time equivalents in that they must guarantee that all tasks not only receive sufficient access to the processor, but that all tasks meet their timing requirements, i.e. deadlines.

Figure 2.1: Diagram of the sporadic task model notation described in Table 2.1, showing a task set A with a single task A0 where T0 = 4 and C0 = 1.

Scheduling can be either static or dynamic. In a static or offline schedule, the set of tasks is fixed and the schedule pre-calculated when the system is built, and does not change once the system is running. Dynamic or online scheduling occurs while the system is running,

Notation | Meaning
τ | A task set.
τi | A specific task in a task set.
τij | A specific job of a specific task.
Ti | The minimum period of a task: the minimum time between job releases.
Ci | The WCET of a task (Ci ≤ Ti).
Ui | The maximum utilisation of task i, which is Ci/Ti.
Di | The relative deadline of task i.
tij | The release time of the jth job of task i.
dij | The deadline for the jth job of task i.
aij | The arrival time of the jth job of task i (aij ≥ tij).
n | The number of tasks in a task set.

Table 2.1: Parameters and notation for the sporadic task model.

and the schedule is calculated whenever a scheduling decision is required. Most schedulers in common-use operating systems, like Linux, are dynamic, online schedulers.

Task | Ci | Ti | Ui
τ1 | 1 | 4 | 0.25
τ2 | 1 | 5 | 0.20
τ3 | 3 | 9 | 0.33
τ4 | 3 | 18 | 0.17
Usum(τ) | | | 0.95

Table 2.2: A sample task set, adapted from Brandenburg [2011].

Not all permutations of task parameters are schedulable. Thus, we refer to feasible and non-feasible task sets. A task set is feasible if it can be scheduled by at least one scheduling algorithm such that all temporal requirements are satisfied, and non-feasible if no scheduling algorithm can be found that satisfies the requirements. An optimal scheduling algorithm can schedule every feasible task set. To test if a task set is schedulable by a scheduling algorithm, a schedulability test is applied. The complexity of schedulability tests is important for both dynamic and static schedulers. For static schedulers, the test is conducted offline and not repeated, but the algorithm must complete in a reasonable time for the number of tasks in the task set. Real-time, dynamic schedulers often conduct a test each time a new task is submitted to the scheduler, or scheduling parameters are altered, in order to check that the modified task set is schedulable. This test is referred to as an admission test, and must be minimal in complexity to avoid undue overheads.

There is an absolute limit on task sets that are feasible, which can be derived from the total utilisation. The total utilisation of a task set is the sum of the utilisations of its tasks and must be less than the total processing capacity of the system for all deadlines to be met. For each processor in a system, this amounts to Equation (2.1).

\sum_{i=1}^{n} \frac{C_i}{T_i} \le 1 \qquad (2.1)

If the inequality does not hold, the system is considered overloaded. Overload can be constant, where the system is overloaded all the time, or transient, where the overload is temporary due to exceptional circumstances. Ideally, task sets are scheduled such that the total utilisation is equal to the number of processors. In practice, scheduling algorithms are subject to two different types of capacity loss which render 100% utilisation impossible: algorithmic and overhead-related. Algorithmic capacity loss refers to processing time that is wasted because a non-optimal scheduling algorithm is used. Overhead-related capacity loss refers to time spent due to hardware effects (such as cache misses, cache contention, and context switches) and computing scheduling decisions. Accurate schedulability tests should account for overhead-related capacity loss in addition to algorithmic capacity loss.

Tasks often do not use all of their execution requirement; any execution time remaining at job completion is referred to as slack. Slack can be a consequence of pessimistic WCET values, or a result of varying job lengths in general. Sporadic tasks, where the actual arrival time varies from the minimum inter-arrival time, are also sources of slack, as is jitter. Many scheduling algorithms attempt to gain performance by reclaiming or stealing slack.

Scheduling algorithms are classed as either dynamic or fixed priority. A scheduling algorithm is fixed priority if all the jobs in each task run at the same priority, which never changes. Dynamic-priority scheduling algorithms assign priorities to jobs, not tasks, based on some criteria. There are two definitive scheduling algorithms: fixed-priority rate monotonic (FPRM), which has fixed priorities, and earliest deadline first (EDF), which has dynamic priorities. Each is optimal for its respective scheduling class, and both algorithms are defined along with schedulability tests for the periodic task model in the seminal paper by Liu and Layland [1973]. We describe them briefly here, after covering the dominant static scheduling algorithm: cyclic executives.
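To make Equation (2.1) concrete, the sketch below implements a purely utilisation-based admission test for a uniprocessor. It is illustrative only: the function name is hypothetical, and a practical admission test would also budget for the algorithmic and overhead-related capacity loss discussed above.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Utilisation-based admission test for one processor, Equation (2.1):
 * the sum over all tasks of C_i/T_i must not exceed 1. */
static bool admit_task_set(const uint64_t *C, const uint64_t *T, size_t n)
{
    double utilisation = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (T[i] == 0 || C[i] > T[i]) {
            return false; /* malformed task parameters */
        }
        utilisation += (double)C[i] / (double)T[i];
    }
    return utilisation <= 1.0;
}

For the task set in Table 2.2 this test passes, since the total utilisation is 0.95; it says nothing about whether a particular scheduling algorithm will actually meet all deadlines.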

2.2.1 Cyclic executives

A cyclic executive is an offline, static scheduler that dispatches tasks according to a pre-computed schedule, and is only suitable for closed systems. Each task has one or more entries in the pre-computed schedule, referred to as frames, which specify a WCET. Frames are never preempted while running, resulting in a completely deterministic schedule. The cyclic executive completes in a hyperperiod, which is the least common multiple of all task periods. The minor cycle is the greatest common divisor of all the task periods. Cyclic executives can, in theory, schedule task sets where the total utilisation is 100%, as in Equation (2.1). However, this only holds in a task model where tasks can be split into infinite chunks, as tasks that do not fit into the minor cycle must be split. This assumption is unrealistic, as many tasks cannot be split, and task switching is not without overheads. Calculating a cyclic schedule is NP-hard, and must be done every time the task set changes. Because cyclic executives are non-preemptible and deterministic, they cannot take advantage of over-provisioned WCET estimates; therefore processor utilisation is low for cyclic executives on modern hardware. One way to ameliorate the limitations of cyclic executives is to use a two-level scheduler with a top-level table that is static, and a second level of preemptive scheduling. When a table entry is selected, the second-level scheduler is then activated.
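The hyperperiod and minor cycle can be computed directly from the task periods. The helper below is a self-contained sketch with hypothetical names; for the periods of Table 2.2 (4, 5, 9 and 18) it yields a hyperperiod of 180 and a minor cycle of 1.

#include <stddef.h>
#include <stdint.h>

/* Greatest common divisor and least common multiple of two periods. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

static uint64_t lcm_u64(uint64_t a, uint64_t b)
{
    return (a / gcd_u64(a, b)) * b;
}

/* Hyperperiod = LCM of all task periods; minor cycle = GCD of all task
 * periods. Assumes n >= 1 and that every period is non-zero. */
static void cyclic_executive_cycles(const uint64_t *T, size_t n,
                                    uint64_t *hyperperiod, uint64_t *minor_cycle)
{
    uint64_t lcm = T[0];
    uint64_t gcd = T[0];
    for (size_t i = 1; i < n; i++) {
        lcm = lcm_u64(lcm, T[i]);
        gcd = gcd_u64(gcd, T[i]);
    }
    *hyperperiod = lcm;
    *minor_cycle = gcd;
}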

2.2.2 Fixed priority scheduling

As the name implies, fixed priority (FP) scheduling involves assigning fixed priorities to each task. The scheduler is invoked when a job is released or a job ends, and the job with the

14 highest priority is always scheduled. Under real-time fixed-priority scheduling, priorities must be assigned such that all deadlines are met. Two well-established priority-assignment techniques are rate monotonic (RM) and deadline monotonic (DM). RM priority assignment [Liu and Layland, 1973] allocates higher priorities to tasks with higher rates—where rate is determined by the period, as shown in Equation (2.2).

\frac{1}{T_i} \qquad (2.2)

While RM is only optimal for task sets with implicit deadlines, DM priority assignment [Leung and Whitehead, 1982] allocates higher priorities to tasks with shorter deadlines and is optimal for implicit and constrained deadlines. In both cases, ties are broken arbitrarily. The FP scheduling technique itself is not optimal, as it results in algorithmic capacity loss and may leave up to 31% of the processor idle. Equation (2.3) shows the sufficient utilisation bound for task sets on a uniprocessor for FPRM, and Equation (2.4) shows the limit as the number of tasks in the task set (n) tends towards infinity. For specific task sets, more accurate schedulability tests can be conducted via response-time analysis [Audsley et al., 1993]. Figure 2.2 shows an example FPRM schedule.

\sum_{i=1}^{n} \frac{C_i}{T_i} \le n\left(2^{\frac{1}{n}} - 1\right) \qquad (2.3)

\lim_{n \to \infty} n\left(2^{\frac{1}{n}} - 1\right) = \ln 2 \approx 0.693147\ldots \qquad (2.4)
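The sketch below illustrates both RM priority assignment and the sufficient bound of Equation (2.3); the names are hypothetical. Note that the bound is sufficient but not necessary: the task set of Table 2.2 fails it, since its utilisation of 0.95 exceeds 4(2^(1/4) − 1) ≈ 0.757, which is exactly the situation where response-time analysis is needed for an exact answer.

#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sufficient (not necessary) FPRM schedulability test, Equation (2.3):
 * total utilisation <= n * (2^(1/n) - 1). */
static bool fprm_bound_holds(const uint64_t *C, const uint64_t *T, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++) {
        u += (double)C[i] / (double)T[i];
    }
    return u <= (double)n * (pow(2.0, 1.0 / (double)n) - 1.0);
}

/* Rate-monotonic priority assignment: tasks with shorter periods (higher
 * rates) receive numerically higher priorities in 1..n; ties broken by index. */
static void rm_assign_priorities(const uint64_t *T, size_t n, unsigned *prio)
{
    for (size_t i = 0; i < n; i++) {
        unsigned p = 1;
        for (size_t j = 0; j < n; j++) {
            /* count tasks with longer periods (lower rates) than task i */
            if (T[j] > T[i] || (T[j] == T[i] && j < i)) {
                p++;
            }
        }
        prio[i] = p;
    }
}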

2.2.3 Earliest Deadline First Scheduling

The EDF algorithm is theoretically optimal for preemptively scheduling a single resource, with no algorithmic capacity loss; that is 100% of processing time can be scheduled. This is because EDF uses dynamic priorities rather than fixed priorities. Priorities are assigned by examining the deadlines of each ready job; jobs with more immediate deadlines have higher priorities. Figure 2.3 illustrates how the task set in Table 2.2 is scheduled by EDF, highlighting the places where tasks are scheduled differently from FPRM. EDF is compatible with fixed-priority scheduling, as EDF can be mapped to priority bands in a fixed-priority system. Whenever an EDF priority is selected, a second-level EDF scheduler dispatches the next task. This “EDF in fixed-priority” approach has been analysed in detail [Harbour and Palencia, 2003] and is deployed in the Ada programming language [Burns and Wellings, 2007], often used to build real-time systems.
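A minimal illustration of the EDF dispatch rule follows: among the jobs that are ready, the one with the earliest absolute deadline is chosen. The types and names are hypothetical; a real implementation would typically keep released jobs in a deadline-ordered priority queue rather than scanning on every scheduling decision.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A released but incomplete job with its absolute deadline d_ij. */
typedef struct {
    uint64_t abs_deadline;
    bool     ready;
} edf_job_t;

/* EDF dispatch: return the index of the ready job with the earliest
 * absolute deadline, or -1 if no job is currently ready. */
static int edf_pick_next(const edf_job_t *jobs, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!jobs[i].ready) {
            continue;
        }
        if (best < 0 || jobs[i].abs_deadline < jobs[(size_t)best].abs_deadline) {
            best = (int)i;
        }
    }
    return best;
}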

2.2.4 Earliest Deadline First versus Fixed Priority Scheduling

EDF is less popular in commercial practice than FP for a number of reasons, some of which are misconceptions, such as complexity of implementation and algorithmic capacity loss, and others which represent concrete concerns, such as behaviour on overload.

Figure 2.2: An example FPRM schedule using the task set from Table 2.2.

Figure 2.3: An example EDF schedule using the task set from Table 2.2.

In terms of misconceptions, EDF is considered more complex to implement and to have higher overhead-related capacity loss. However, both of these points were debunked by Buttazzo [2005]. Although EDF is difficult and inefficient to implement on top of existing, priority-based OSes, both schedulers can be considered equally complex to implement from scratch. FP scheduling has higher overhead-related capacity loss due to an increase in the amount of preemption. This compounds the algorithmic capacity loss, rendering EDF a clear winner in from-scratch implementations in terms of both properties.

Both algorithms behave differently under constant overload. EDF allows progress for all jobs but at a lower rate, while FP will continue to meet deadlines for jobs with higher RM priorities, completely starving other jobs. Whether these behaviours are desirable is subject to context; under transient overload conditions both algorithms can cause deadline misses. FP overload behaviour is often preferred, as it will drop jobs from lower-priority tasks deterministically. In contrast, EDF will drop jobs from all tasks, with no prioritisation. However, note that the deterministic overload behaviour of FP scheduling comes with a price: optimal priority ordering is determined by the rate of jobs, not their importance to the system. Another comparison between EDF and FP is the complexity of their schedulability tests. EDF and FP scheduling both have pseudo-polynomial schedulability tests under the sporadic task model [Abdelzaher et al.], although EDF, like FP, has an O(n) schedulability test under the periodic task model [Baruah et al., 1990]. (The periodic task model is the same as the sporadic task model, with the restriction that deadlines must be equal to periods (d = p), while periods themselves are considered absolute, not minimum.) Like all pseudo-polynomial problems, approximations can be made to reduce the complexity, although this comes with an error factor which increases algorithmic capacity loss.

2.2.5 Multiprocessors

Both fixed- and dynamic-priority scheduling algorithms can be used on multiprocessor machines, either globally or partitioned. Global schedulers share a single scheduling data structure between all processors in the system, whereas partitioned schedulers have a scheduler per processor. Neither is perfect: global approaches suffer from scalability issues such as hardware contention, while partitioned schedulers require load balancing across cores. Partitioning itself is known to be an NP-hard bin-packing problem. On modern hardware, partitioned schedulers outperform global schedulers [Brandenburg, 2011]. For clustered multiprocessors a combination of global and partitioned scheduling can be used: global within a cluster, and partitioned across clusters.
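Because optimal partitioning is equivalent to bin packing, practical systems use heuristics. The sketch below shows a simple first-fit assignment against the uniprocessor utilisation bound of Equation (2.1); the names are hypothetical, and failure of this heuristic does not prove that the task set cannot be partitioned.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS 16

/* First-fit partitioning heuristic: place each task on the first processor
 * whose utilisation stays within the bound of Equation (2.1). assignment[i]
 * receives the processor index chosen for task i. */
static bool partition_first_fit(const uint64_t *C, const uint64_t *T, size_t n,
                                size_t num_cpus, size_t *assignment)
{
    double load[MAX_CPUS] = { 0.0 }; /* per-processor utilisation */
    if (num_cpus > MAX_CPUS) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        double u = (double)C[i] / (double)T[i];
        bool placed = false;
        for (size_t cpu = 0; cpu < num_cpus && !placed; cpu++) {
            if (load[cpu] + u <= 1.0) {
                load[cpu] += u;
                assignment[i] = cpu;
                placed = true;
            }
        }
        if (!placed) {
            return false; /* inconclusive: a different packing may still fit */
        }
    }
    return true;
}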

2.3 Resource sharing

In the discussion so far we have assumed all real-time tasks are separate, and do not share resources. Of course, any practical system involves shared resources. In this section we introduce the basics of resource sharing, and the complexities of doing so in a real-time system. Access to shared resources requires mutual exclusion, where only one task is permitted to access a resource at a time, to prevent system corruption. Code that must be accessed in a mutually exclusive fashion is called a critical section. Generally speaking, tasks lock access to resources, preventing other tasks from accessing that resource until it is unlocked. However, many variants on locking protocols exist, and not all require mutual exclusion.

¹ The periodic task model is the same as the sporadic task model, with the restriction that deadlines must be equal to periods (d = p), while periods themselves are considered absolute, not minimum.

Some variants permit n tasks to access a section, or provide locks that behave differently for read and write access. Resource sharing in a real-time context is more complicated than standard resource sharing and synchronisation, due to the problem of priority inversion, which threatens the temporal correctness of a system. Priority inversion occurs when a low-priority task prevents a high-priority task from running. Consider the following example: if a low-priority task locks a resource that a high-priority task requires, then the low-priority task can cause the high-priority task to miss its deadline. Consequently, all synchronised resource access in a real-time system must be bounded, and analysis of a system's ability to meet deadlines must account for those bounds. Bounded critical sections alone are not sufficient to guarantee correctness in a real-time system. Consider the scenario outlined earlier, where a low-priority thread holds a lock that a high-priority thread is blocked on. If other, medium-priority tasks exist in the system, then the low-priority task will never run and release the lock, leaving the high-priority task blocked for an unbounded period. This exact scenario occurred on the Mars Pathfinder, causing unexpected system resets [Jones, 1997]. In this section we provide a brief overview of real-time synchronisation protocols that avoid unbounded priority inversion, drawn from Sha et al. [1990]. First we consider uniprocessor protocols before canvassing multiprocessor resource sharing.

2.3.1 Non-preemptive critical sections

Using the non-preemptive critical sections protocol (NCP), preemption is totally disabled whilst in a critical section. This approach blocks all threads in the system while any client accesses a critical section. Consequently, the bound on any single priority inversion is the length of the longest critical section in the system. Although functional, this approach results in a lot of unnecessary blocking of higher priority threads. The maximum bound on priority inversion that a task can experience is the sum of the length of all critical sections accessed by that task, as these are the only places that specific task can be blocked while other tasks run.

2.3.2 Priority Inheritance Protocol

In the priority inheritance protocol (PIP), when a high-priority task encounters a locked resource, it donates its priority to the task holding the lock; when the lock is released, the holder's priority is restored. This approach avoids blocking any higher-priority threads that do not access the resource, and works for both fixed- and dynamic-priority scheduling. However, PIP results in large system overheads due to nesting, implementation complexity and additional preemptions, and as a result yields pessimistic schedulability analysis. To understand the additional preemptions inherent in PIP, consider a task set with n tasks, τ1,...,τn, where each task's priority corresponds to its index, such that the priority

of τi is i. The highest priority is n, the lowest is 1, and all tasks access the same resource.

If τ1 holds the lock to that resource, then the worst preemption overhead occurs if τ2 wakes up, elevating the priority of τ1 to 2. Subsequently, each task wakes up in increasing priority order, each preempting τ1, until its priority reaches n, resulting in n total preemptions. Another disadvantage of PIP is that deadlock can occur if resource ordering is not used.
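The book-keeping at the heart of PIP can be sketched as follows; this is a deliberately simplified, single-threaded model with hypothetical types, no waiter queue and no nesting, intended only to illustrate the donate-and-restore rule:

```c
#include <stddef.h>

/* Minimal model of PIP book-keeping: the lock holder's effective priority is
 * raised to that of the highest-priority contender, and restored to its base
 * priority on release. A real implementation also tracks the waiter queue and
 * handles nested and transitive inheritance. */
struct pip_task { int base_prio; int eff_prio; };
struct pip_lock { struct pip_task *holder; };

static void pip_acquire(struct pip_lock *l, struct pip_task *t)
{
    if (l->holder == NULL)
        l->holder = t;                          /* uncontended: take the lock */
    else if (t->eff_prio > l->holder->eff_prio)
        l->holder->eff_prio = t->eff_prio;      /* donate priority to holder  */
}

static void pip_release(struct pip_lock *l)
{
    l->holder->eff_prio = l->holder->base_prio; /* drop inherited priority    */
    l->holder = NULL;                           /* next contender takes over  */
}
```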

2.3.3 Immediate Priority Ceiling Protocol

Under the immediate priority ceiling protocol (IPCP), also known as the highest lockers' protocol, resources are assigned a ceiling priority: the highest priority of all tasks that access that resource + 1. When tasks lock that resource, they run at the ceiling priority, removing the preemption overhead of PIP. The disadvantage of IPCP is that the priorities of all tasks that access locked resources must be known a priori. Additionally, if priority ceilings are all set to the highest priority, then behaviour degrades to that of NCP. Finally, this protocol can cause intermediate-priority tasks that do not use the resource to be blocked unnecessarily.

2.3.4 Original Priority Ceiling Protocol

The original priority ceiling protocol (OPCP) combines the previous two approaches, and avoids deadlock, excessive blocking and excessive preemption. In addition to considering the priorities of tasks, OPCP introduces a dynamic global state referred to as the system ceiling, which is the highest priority ceiling of any currently locked resource. Like IPCP, OPCP requires that the priorities of all tasks that lock resources be known a priori. Under OPCP, when a task locks a resource, its priority is not changed until another task attempts to acquire that resource, at which point the resource holder's priority is boosted using priority inheritance. By delaying the priority boost, the excessive preemption of PIP is avoided. Additionally, tasks can only lock resources when their priority is higher than the system ceiling; otherwise they block until this condition is true, thus avoiding the risk of deadlock. OPCP results in less blocking overall than IPCP, however it requires global state (the system ceiling) to be tracked across all tasks, increasing the complexity of an implementation.

2.3.5 Stack Resource Protocol

Ceiling-based protocols are only appropriate for FP schedulers; however, the stack resource policy (SRP) [Baker, 1991] provides similar functionality for EDF. Under the SRP, all tasks are assigned preemption levels and all resources are assigned ceilings, which are derived from the maximum preemption level of the tasks accessing those resources. Similar to OPCP, a system ceiling is also maintained, which is the maximum ceiling of all resources currently locked in the system. SRP works by preventing preemption: a task is only allowed to preempt the system when two conditions are met: that task's absolute

deadline must be earlier than that of the currently executing task, and its preemption level must be higher than the current system ceiling.
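The SRP preemption test reduces to two comparisons; the sketch below (our illustration, with hypothetical field names) states the rule directly:

```c
#include <stdbool.h>

/* SRP: a newly released job may preempt only if its absolute deadline is
 * earlier than that of the running job AND its preemption level is higher
 * than the current system ceiling. */
struct srp_job {
    unsigned long abs_deadline;
    int preemption_level;
};

static bool srp_may_preempt(const struct srp_job *candidate,
                            const struct srp_job *running,
                            int system_ceiling)
{
    return candidate->abs_deadline < running->abs_deadline &&
           candidate->preemption_level > system_ceiling;
}
```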

2.3.6 Lock-free algorithms

Lock-free data structures can be used to avoid locks, and priority inversion, altogether. On a uniprocessor this is achieved with atomic operations such as compare-and-swap, which are either supported by hardware or provided by disabling preemption. Lock-free data structures allow concurrent access to the same memory; however, interleaved access can result in a given operation being unbounded due to repeated failure of atomic retries. Although this seems incompatible with real-time systems, Anderson et al. [1997] show that lock-free approaches can be bounded and incur less overhead than wait-free or lock-based schemes. However, given our approach uses formal verification, we do not consider lock-free options further, as interleaving of program execution results in a state-space explosion and remains a challenge for verification at scale.

2.3.7 Summary


Figure 2.4: Comparison of real-time locking protocols based on implementation complexity and priority inversion bound.

Figure 2.4 compares the different uniprocessor locking protocols, showing that OPCP provides the lowest bound on priority inversion, but is also the most complicated to implement. NCP, on the other hand, is the simplest to implement but exhibits the worst priority-inversion behaviour, with PIP and IPCP falling between the two. IPCP has low implementation complexity, but requires a policy on priority assignment to be in place in the system.

2.3.8 Multiprocessor locking protocols

Resource sharing on multiprocessors is far more complicated than the single-processor case and still a topic of active research [Davis and Burns, 2011]. Of course, uniprocessor techniques can be used for resources that are local to a processor, but further protocols are required for resources shared across cores (termed global resources). Protocols for multiprocessor locking are either spinning- or suspension-based: spinning protocols busy-wait on shared memory, while suspension protocols block the task until the resource is available, such that other tasks can use the processor during that time. Spin-lock protocols

are effective for short critical sections, but once the critical section exceeds the time taken to switch to another task and back, semaphore protocols are more efficient. Brandenburg et al. [2008] find that, in a real-time context with few cores, non-blocking algorithms are preferable for small, simple objects, while wait-free or spin-based approaches are better for large or complex objects. However, for systems with many cores, spin-lock approaches do not scale beyond a small number (~10) of cores [Clements et al., 2013]. Suspension-based approaches suffer extra preemptions in the worst case, while spin-locks do not; with respect to worst-case bounds, suspension-based protocols do not fare any better than spin-lock approaches in real-time systems. Multiprocessor locking protocols differ depending on the scheduling policy across cores; in partitioned approaches, priorities on different cores are not comparable, meaning existing uniprocessor protocols do not directly apply. While the protocols we have examined so far can be used under global scheduling, Brandenburg [2011] showed that partitioned approaches suffer far lower cache-related overheads than global scheduling. The multiprocessor priority ceiling protocol (MPCP) [Rajkumar, 1990] is a modified version of OPCP for multiprocessors. It is a suspension-based protocol that works by boosting task priorities. Tasks run at the highest priority of any task that may access a global resource, and waiting tasks block on a priority queue. Nested access to global resources is disallowed. The multiprocessor stack resource policy (MSRP) [Gai et al., 2003] is a spin-lock-based protocol which can be used for FP and EDF scheduling. MSRP uses the SRP locally, combined with first-in first-out (FIFO) spin-locks which guard global resources. Multiprocessor real-time locking protocols are an extensive field of research, and many more sophisticated locking protocols exist; however, we do not survey them here.
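A FIFO spin lock of the kind MSRP assumes for global resources can be sketched as a ticket lock; the version below (illustrative only, using C11 atomics, and not the MSRP algorithm itself) serves waiters strictly in arrival order, which is what makes the worst-case blocking analysable:

```c
#include <stdatomic.h>

/* FIFO (ticket) spin lock: initialise both counters to zero. Under MSRP the
 * spinning task would additionally be made non-preemptable for the duration. */
struct ticket_lock {
    atomic_uint next_ticket;   /* next ticket to hand out       */
    atomic_uint now_serving;   /* ticket currently being served */
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
    unsigned int my_ticket = atomic_fetch_add(&l->next_ticket, 1);
    while (atomic_load(&l->now_serving) != my_ticket)
        ;   /* busy-wait until it is our turn */
}

static void ticket_lock_release(struct ticket_lock *l)
{
    atomic_fetch_add(&l->now_serving, 1);   /* hand the lock to the next waiter */
}
```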

2.4 Operating systems

An OS is a software system that interfaces with hardware and devices in order to present a common interface to applications. The kernel is the part of the operating system that operates with privileged access to the processor(s) in order to safely perform tasks that allow applications to run independently of each other. Common OSes, such as Windows, MacOS and Linux, are monolithic operating systems, which means that many services required to run applications are inside the kernel. A microkernel attempts to minimise the amount of code running in the kernel in order to reduce the amount of necessarily trusted code. Figure 2.5 illustrates the difference between monolithic OSes and microkernels. Modern microkernel implementation is guided by the minimality principle [Liedtke, 1995], which aims to provide minimal mechanisms to allow resource servers to function, leaving the rest of the policy up to the software running outside of the kernel. According to the minimality principle, if a service does not need to be in the kernel to achieve its functionality, it should not be in the kernel.


Figure 2.5: Structure of a microkernel versus monolithic operating system.

Monolithic operating systems provide scheduling, inter-process communication (IPC), device drivers, memory allocation, file systems and other services in the kernel, resulting in a large trusted computing base. With respect to microkernels, interpretations of the minimality principle vary; consequently, which services and utilities are included in the privileged kernel also varies. Larger microkernels include thread scheduling, memory allocation and some device drivers in the kernel. For example, seL4 [Klein et al., 2009] contains scheduling and IPC; COMPOSITE [Parmer, 2009] does not provide a scheduler, or any blocking semantics, but does provide IPC. Microkernels are far more amenable to deployment in areas where security is a primary concern, due to their small trusted computing base. Services on top of the microkernel can be isolated and assigned different levels of trust, unlike the services in a monolithic OS, which all run at the same privilege level such that a fault in one service can compromise the entire system. Operating systems can run on each other in a process called virtualisation, where the underlying OS presents an interface imitating hardware. A hypervisor is an operating system that can run other operating systems on top of it, and operating systems running on the hypervisor are referred to as guests. Guest operating systems can be para- or fully-virtualised, where the former involves modifications to the source of the guest. Modern hardware has virtualisation extensions which improve virtualisation performance and reduce the need for para-virtualisation. Both microkernels and monolithic kernels can also be hypervisors, although monolithic hypervisors are often smaller than their full-OS counterparts, as they provide less functionality and rely on guest OSes to provide most subsystems.

2.4.1 IPC

IPC is the microkernel mechanism for synchronous transmission of data and capabilities between processes. Because the microkernel model provides services encapsulated into user-level servers, IPC is key to microkernel performance, as it is used more predominantly than

in monolithic OSes. Originally, microkernels were criticised as impractical due to the inefficient IPC implementations of first-generation microkernels. However, Härtig et al. [1997] demonstrated that this was an artefact of the high cache footprint and poor design of the original microkernels, rather than an inherent limitation. Second-generation microkernels were built much leaner, with fewer services in the kernel and fast, optimised code paths for IPC, referred to as fastpaths. Third-generation microkernels follow the pattern of minimality and speed, whilst also promoting security as a first-class concern, which resulted in the incorporation of capability systems.

2.4.2 Capabilities

Third-generation microkernels make use of capabilities [Dennis and Van Horn, 1966], an established mechanism for fine-grained access control to spatial resources which allows for spatial isolation. A capability is a unique, unforgeable token that gives the possessor permission to access an entity or object in the system. Capabilities can have different levels of access rights, e.g. read, write, execute, etc. By combining access rights with object identifiers, capabilities avoid the confused deputy problem, a form of privilege escalation where a deputy program acting on behalf of a client is tricked into using its own rights to manipulate a resource that the client would not normally have access to [Hardy, 1988].
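Conceptually, a capability is just an object reference bundled with rights, and every invocation is checked against those rights; the hypothetical sketch below illustrates the idea and is not the seL4 capability format:

```c
#include <stdbool.h>
#include <stdint.h>

/* A capability pairs an object reference with a set of access rights;
 * an operation is permitted only if the invoked capability carries all of
 * the rights that the operation requires. */
enum cap_rights {
    CAP_READ  = 1u << 0,
    CAP_WRITE = 1u << 1,
    CAP_EXEC  = 1u << 2,
};

struct capability {
    void    *object;   /* the kernel object this capability refers to */
    uint32_t rights;   /* bitmask of cap_rights                       */
};

static bool cap_allows(const struct capability *cap, uint32_t required)
{
    return (cap->rights & required) == required;
}
```

Because designation and authority travel together, a deputy acting for a client operates on the capability the client supplied, rather than on its own ambient authority.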

2.4.3 Open vs. Closed Systems

Operating systems can be built for open or closed systems. An open system is any system where code outside the control of the system designers can be executed. For example, modern smartphones are open systems, given that users can install third-party applications. A closed system is the opposite: the system designers have complete control over all code that will execute on the system. The majority of closed systems are embedded, including those found in cars, spacecraft and aircraft. In general, there is a trend toward systems becoming more open; early mobile phones, for instance, were closed systems. This trend can be observed from infotainment units in automobiles to televisions, where the option to install third-party applications is becoming more prevalent. Allowing third-party applications to run alongside critical applications on shared hardware increases the security requirements of the system: critical applications must be isolated from third-party applications and secure communications must be used between distributed components. This is currently not the general case, which has led to researchers demonstrating attacks on cars [Checkoway et al., 2011]. Open systems are generally dynamic, with resource allocations configured at run-time and able to change, as opposed to closed systems, which have fixed or static resource allocation patterns.

2.4.4 Real-Time Operating Systems

A real-time operating system (RTOS) is an OS that provides temporal guarantees, and can be microkernel-based or monolithic. Whilst some real-time systems run without an operating system at all, this approach is generally limited to small, closed systems and is both inflexible and difficult to maintain [Sha et al., 2004]. In a general-purpose OS, time is shared between applications with the aim of providing fairness, where applications share the processor equally. This fairness need not be an equal division: shares can be weighted, such that some applications are awarded more time than others in order to tune overall system performance. The OS itself is not directly aware of the timing needs of applications. In an RTOS, fairness is replaced by the need to meet deadlines. As a result, time is promoted to a first-class resource [Stankovic, 1988]. Time being an integral part of the system affects every other part of the OS. For example, in an RTOS, one application having exclusive access to a resource cannot be allowed to cause a deadline miss. Similarly, the RTOS itself cannot be allowed to cause a deadline miss. This means that all operations in the RTOS must either be bounded with known WCET, or the RTOS must be fully preemptible, with very small, bounded critical sections. However, it must be noted that a fully preemptible OS is highly non-deterministic, in that interrupts can cause unpredictable interleaving of operations, which leads to state-space explosions in current verification techniques, making correctness impractical to guarantee [Blackham et al., 2012b]. The overheads of RTOS operations like interrupt handling and context switching must also be considered when determining whether deadlines can be met. Traditional RTOSes, and the applications running on them, require extensive offline analysis to guarantee that all temporal requirements are met. This is done using scheduling algorithms, schedulability and response-time analysis, WCET analysis, and resource-sharing algorithms with known real-time properties.

2.4.5 Resource kernels

Resource kernels are a class of OS that treat time as a first-class resource, by providing timely, guaranteed access to system resources. In a resource kernel, a reservation represents a portion of a shared resource, such as processor time or disk bandwidth. Unlike traditional real-time operating systems, resource kernels do not trust all applications to stay within their specified resource bounds: resource kernels enforce them, preventing misbehaving applications from interfering with other applications and thus providing temporal isolation. In the seminal resource kernel paper, Rajkumar et al. [1998] outline four main goals that are integral to resource kernels:

G1: Timeliness of resource usage Applications must be able to specify resource requirements that the kernel will guarantee. Requirements should be dynamic: applications

must be able to change them at run-time; however, the kernel should ensure that the set of all requirements can be admitted.

G2: Efficient resource utilisation The mechanisms used by the resource kernel must utilise available resources efficiently and must not impose high utilisation penalties.

G3: Enforcement and protection The kernel must enforce resource access such that rogue applications cannot interrupt the resource use of other applications.

G4: Access to multiple resource types The kernel must provide access to multiple resource types, including processing cycles, disk bandwidth, network bandwidth and virtual memory.

In another paper, de Niz et al. [2001] outline the four main mechanisms that a resource kernel must provide in order to implement the above concepts.

Admission check that all resource requests can be scheduled (G1).

Scheduling implements the dynamic allocation of resources according to reservations (G1, G2).

Enforcement limits the consumption of resources to that specified by the reservation (G3).

Accounting of reservation use, to implement scheduling and enforcement (G1, G2, G3).

When sharing resources in a resource kernel, avoiding priority inversion becomes a more complicated problem. De Niz et al. [2001] outline three key policies that must be considered when handling resource sharing in reservation-based systems:

Prioritisation What (relative) priority is used by the task accessing the shared resource (and under what conditions)?

Charging Which reservation(s), if any, gets charged, and when?

Enforcement What happens when the reservations being charged by the charging policy expire?

Resource kernels of the past were implemented as monolithic operating systems, where all system services and drivers are provided by the kernel. However, nothing prevents the application of resource-kernel principles to microkernel-based systems, although not all of the mechanisms of a resource kernel are suitable for inclusion in the microkernel itself: some can be provided by user-level middleware. This is because core resource kernel concepts contain both policy and mechanism. We argue that the microkernel should provide resource kernel mechanisms such that a resource kernel can be built on a microkernel, but policy should be left up to the system designer, as long as this does not result in performance or security concessions.

2.5 Summary

In this chapter we have briefly covered the core real-time theory that this thesis draws upon. We have defined operating systems, and introduced the concepts that inform the design of resource kernels. In the next chapter we will survey how these can be combined to achieve isolation and asymmetric protection for mixed-criticality systems.

3 Temporal Isolation & Asymmetric Protection

Mixed-criticality systems at their core require isolation: isolation as strong as that provided by physically isolated systems, meaning if one sub-system fails it cannot affect other sub-systems. Isolation can be divided into two categories of resources: spatial and temporal. Spatial resources include devices and memory, where isolation can be achieved using the MMU and I/O MMUs. Temporal isolation of resources is more complicated, and forms the focus of this chapter, where we survey the relevant literature. A system is said to provide temporal isolation if the temporal behaviour of one task cannot cause temporal faults in another, independent task. What does isolation mean in a fully- or over-committed system, where there is no slack time to schedule? What if there simply is not enough time? One could argue that systems should be over-provisioned to avoid such a scenario. However, in the presence of SRT and best-effort tasks which may be low in criticality, this requirement is too strong. Instead, we must explore mechanisms for asymmetric protection, where high-criticality tasks can cause a failure in low-criticality tasks, but not vice versa. Much of the background examined in the previous chapter (Section 2.1) made the assumption that tasks would not exceed a declared WCET or critical section bound. Many existing real-time systems run either one application, or multiple applications of the same criticality, meaning each application that is running is certified to the same level. This means that all applications are trusted: trusted to not crash, and trusted to not overrun their deadlines. If one application does overrun its deadline or use more processing time than specified by its WCET, guarantees are no longer met. Tasks can be untrusted for many reasons, including:

• sporadic tasks whose inter-arrival times are event-driven will not necessarily have device drivers that can guarantee the minimum inter-arrival time;

• the task may have an unknown or unreliable WCET;

• the system may be open and the task from an untrusted source;

• the task may be low criticality and therefore not certified to a high level.

While large amounts of research have been devoted to addressing the challenges of scheduling mixed-criticality systems, insufficient research has been conducted into OSes that practically enable the construction of mixed-criticality systems. A major barrier is the issue of trust: real-time tasks are often assumed to perform correctly and safely, an assumption that no longer holds when tasks may have different levels of certification and criticality. However, much research has looked into the scheduling of aperiodic tasks, which by definition cannot be trusted to follow a specific schedule, or to abide by an estimated minimum inter-arrival time. Further applicable research examines the scheduling of SRT and best-effort tasks alongside HRT tasks. Consequently, we examine scheduling methods for these types of systems. Neither the FP nor the EDF scheduling approaches discussed so far provide temporal isolation, although both can be adapted to do so. In this chapter we examine the techniques used by the real-time community to achieve temporal isolation.

3.1 Scheduling

3.1.1 Proportional-share schedulers

Proportional share schedulers provide temporal isolation as long as the system is not overloaded, although this class of schedulers is based on achieving scheduling fairness between tasks, rather than running untrusted tasks which may exceed their execution requirement. Recall that fairness is not a central property of scheduling in a real-time operating system. However, one approach to real-time scheduling is to specify a set of constraints that attempt to provide fairness while also satisfying temporal constraints. These are referred to as proportional share algorithms, which allocate time to tasks in discrete-sized quanta. Tasks in proportional share schedulers are assigned weights according to their rate, and those weights determine the share of time for which each task has access to a resource. While proportional share algorithms are applied to many scheduling problems, they apply well to real-time scheduling on one or more processors. Unlike other approaches to real-time scheduling, proportional share schedulers have the explicit property of guaranteeing a rate of progress for all tasks in the system. Baruah et al. [1996] introduced proportionate fairness, or Pfair, as a strong fairness property for proportionate share scheduling algorithms. For a schedule to be Pfair, at every time t a task T with weight Tw must have been scheduled either ⌈Tw · t⌉ or ⌊Tw · t⌋ times. Early-release fair, or ERfair [Anderson and Srinivasan, 2004], is an extension of the Pfair property that allows tasks to execute before their Pfair window, which can allow for better response times. Pfair scheduling algorithms break jobs into sub-jobs that match the length of a quantum, which is a fixed, discrete length of time defined by the system. Real-time and non-real-time tasks are treated similarly. When overload conditions exist, the rate is slowed for all tasks.
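The Pfair condition is easy to state in code; the check below is our own illustration of the definition above, with weights in (0, 1] and time measured in quanta:

```c
#include <math.h>
#include <stdbool.h>

/* A schedule is Pfair for a task of weight w if, at every time t (in quanta),
 * the number of quanta the task has received is either floor(w*t) or ceil(w*t). */
static bool pfair_holds(double weight, unsigned long t, unsigned long quanta_received)
{
    double ideal = weight * (double)t;   /* the task's ideal (fluid) allocation */
    return quanta_received == (unsigned long)floor(ideal) ||
           quanta_received == (unsigned long)ceil(ideal);
}
```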

Pfair scheduling algorithms are good theoretically but do not perform well in practice; they incur large overhead-related capacity loss due to an increased number of context switches [Abeni and Buttazzo, 2004]. Additionally, since scheduling decisions can only be made at quantised intervals, scheduling precision in proportionate fair systems is limited by the size of the quantum.¹ One early uniprocessor Pfair scheduling algorithm is earliest-eligible deadline first, presented in Stoica et al. [1996]. PD2 [Srinivasan and Anderson, 2006] is a more recent Pfair/ERfair scheduling algorithm that is theoretically optimal for multiprocessors under HRT constraints. Recall that temporal isolation means that tasks should not be able to interfere with the temporal behaviour of other tasks in the system. Proportionate fair systems provide temporal isolation as part of their fairness property, but by definition do not support asymmetric protection.

3.1.2 Robust earliest deadline scheduling

Temporal isolation in EDF scheduling has been explored thoroughly in the real-time discipline. One early approach to temporal isolation with EDF scheduling extends the algorithm to allow overload conditions to be handled with respect to a value. Robust earliest deadline scheduling [Buttazzo and Stankovic, 1993] assigns a value to each task, and will drop jobs from low-value tasks under overload. If the system returns to non-overload conditions, those tasks are scheduled again. This is a very early form of asymmetric protection. The algorithm is optimal, however this only holds if scheduler overhead is excluded. Since the algorithm has O(n) complexity in the number of tasks, the authors recommend using a dedicated scheduling processor such that the overhead does not affect timing behaviour; this is not suitable for embedded systems, where the goal is to minimise the number of processors, not increase it.

3.1.3 Resource servers

Resource servers in this context provide a second, top level of scheduling for non-HRT tasks; they are not servers in the systems sense, but algorithms which limit the execution time of SRT tasks in order to provide temporal isolation. Resource servers effectively wrap the task, preserving specific execution parameters and feeding them to the lower-level scheduler (EDF or FP). We now provide a brief overview of several different types of servers, including their strengths and limitations. Approaches include polling servers, deferrable servers, sporadic servers, priority exchange servers and constant-bandwidth servers.

¹ Note that this limitation applies to any system that uses a scheduling tick rather than specific timeouts for scheduling.

Polling servers

Polling servers [Lehoczky et al., 1987] wake every period to check whether there are any pending jobs, and allow jobs to run until, at most, their budget is exhausted. If there is no task to run, the polling server goes back to sleep; that is, at time ti, if there are no tasks ready to execute, the server sleeps until ti+1. This has the limitation that the maximum latency for a task is just under the period T: if an event triggers a job just after the polling server wakes, finds there is no work and returns to sleep, the pending job is delayed until the polling server wakes again.

Deferrable Servers

Unlike polling servers, deferrable servers [Lehoczky et al., 1987; Strosnider et al., 1995] preserve any unused budget across periods, although the budget can never be exceeded. This removes the latency problem of polling servers, as deferrable servers can wake any time a job is available. Unfortunately, deferrable servers do not provide true temporal isolation, and can only guarantee that tasks will not exceed a maximum bandwidth under periodic task constraints, where jobs arrive at the start of the period. Under sporadic task constraints, where job arrival is only constrained by a minimum inter-arrival time, jobs can execute back-to-back, thus exceeding their allocated bandwidth over some window of the period's length. This occurs because deferrable servers replenish the budget to full at the start of each period, and the budget can be used at any point during a task's execution. We demonstrate the problem with deferrable servers using the notation introduced in

Table 2.1. Consider a sporadic task A1 with implicit deadlines and jobs A11, A12, ..., A1n. Each job has a deadline one period after its release: d1j = t1j + T1. The problem occurs if the first job arrives at a11 = d11 − C1, so that it completes exactly at its implicit deadline. A second job may then arrive at the release time d11 and run back-to-back with the first, from a11 to d11 + C1; over that small window of time the task has exceeded its bandwidth (C1/T1).

Sporadic servers

Sporadic servers [Sprunt et al., 1989] address the problems of deferrable servers by scheduling multiple replenishment times, in order to preserve the property that, over every possible window of time, the server's utilisation satisfies Ui ≤ Ci/Ti; this is known as the sliding-window constraint, and it is exactly the condition that deferrable servers violate. Each time a task is preempted, or blocks, a replenishment for the amount of budget consumed is scheduled for the time of arrival + Ti. When no replenishments are available, sporadic servers have their priority decreased below that of any real-time task; the priority is restored once a replenishment becomes available. While this approach addresses the problems of deferrable servers, the implementation is problematic, as the number of times a thread is preempted or blocked is potentially unbounded. It is also subject to capacity loss, as tasks that use very small chunks of budget at a time increase the interrupt load. Large numbers of

replenishments result in more accurately tracked budgets; however, more memory is used to track those replenishments, and fragmentation of the replenishments leads to more frequent preemptions.
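The essential book-keeping of a sporadic server, with a bounded number of replenishments, might look like the sketch below (our illustration with hypothetical names; a production implementation must also decide how and when to merge entries):

```c
#define MAX_REPLENISHMENTS 8

struct replenishment { unsigned long when; unsigned long amount; };

struct sporadic_server {
    unsigned long period;
    unsigned long budget;                        /* budget currently available */
    struct replenishment pending[MAX_REPLENISHMENTS];
    int npending;
};

/* Record that 'consumed' budget was used by execution that became eligible at
 * 'arrival': the same amount is replenished one period after that arrival,
 * which preserves the sliding-window constraint. */
static void ss_charge(struct sporadic_server *s, unsigned long arrival,
                      unsigned long consumed)
{
    s->budget -= consumed;
    if (s->npending < MAX_REPLENISHMENTS) {
        s->pending[s->npending].when = arrival + s->period;
        s->pending[s->npending].amount = consumed;
        s->npending++;
    } else {
        /* out of slots: merge into the latest entry, trading utilisation
         * (budget returns later than necessary) for bounded memory */
        s->pending[s->npending - 1].amount += consumed;
    }
}

/* At time 'now', move any due replenishments back into the usable budget. */
static void ss_replenish(struct sporadic_server *s, unsigned long now)
{
    int kept = 0;
    for (int i = 0; i < s->npending; i++) {
        if (s->pending[i].when <= now)
            s->budget += s->pending[i].amount;
        else
            s->pending[kept++] = s->pending[i];
    }
    s->npending = kept;
}
```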

Priority exchange servers

Priority exchange servers [Sprunt et al., 1989; Spuri and Buttazzo, 1994] allow inactive, high-priority tasks to swap their priorities with active, low-priority tasks, such that server capacity is not lost but used at a lower priority. Implementations of priority exchange require control and access to priorities across an entire system, rather than containing the server logic to a single task.

Constant-bandwidth servers

In the real-time community, the term server is used to describe virtual time sources: an intermediate algorithm monitors task execution and, using that information, prevents the task(s) guarded by the server from exceeding a specified temporal behaviour. These algorithms are integrated into the scheduler. The constant-bandwidth server (CBS) is a technique for scheduling HRT and SRT tasks while providing temporal isolation [Abeni and Buttazzo, 2004]. Under CBS, HRT tasks are scheduled using an EDF scheduler, while SRT tasks are treated differently in order to allow the HRT tasks to meet their deadlines: a CBS is assigned to each SRT task. In the presence of processor contention, CBS guarantees tasks a constant bandwidth and prevents them from interfering with HRT tasks; if no other tasks are competing for the processor, tasks using CBS may use a greater bandwidth. Each CBS has a bandwidth assigned to it, and breaks SRT jobs into sub-jobs such that the utilisation rate of the task does not exceed the assigned bandwidth. Any sub-job that would cause the bandwidth to be exceeded is postponed, but still executed. CBS stands out from previous server-based approaches like the total bandwidth server [Spuri and Buttazzo, 1994] in that it requires neither a WCET nor a minimum bound on job inter-arrival time, making it much more suitable for SRT tasks. Implementation-wise, CBS has lower system overheads than Pfair schedulers. Many extensions exist for CBS to improve functionality. Kato et al. [2011] extend CBS to implement slack donation, where any unused bandwidth is given to other jobs. Craciunas et al. [2012] extend CBS such that bandwidths are variable at run-time. Lamastra et al. [2001] introduce bandwidth inheritance across CBS servers applied to different resources, providing temporal isolation for resources other than processing time.
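A commonly described formulation of the CBS rules can be sketched as two updates, one on job arrival and one on budget exhaustion; this is our illustration with hypothetical field names, not the exact pseudocode of Abeni and Buttazzo:

```c
/* A CBS reserves budget Q every period T (bandwidth Q/T) and serves its task
 * under EDF using the server deadline. */
struct cbs {
    unsigned long Q, T;        /* reserved budget and period       */
    unsigned long budget;      /* remaining budget                 */
    unsigned long deadline;    /* current absolute server deadline */
};

/* On job arrival at time 'now': if the leftover budget could exceed the
 * reserved bandwidth relative to the current deadline, start a fresh period;
 * otherwise keep the current budget and deadline. */
static void cbs_job_arrival(struct cbs *s, unsigned long now)
{
    if (now >= s->deadline || s->budget * s->T > (s->deadline - now) * s->Q) {
        s->deadline = now + s->T;
        s->budget   = s->Q;
    }
}

/* On budget exhaustion: recharge and postpone the deadline by one period, so
 * the task keeps running but its long-run bandwidth never exceeds Q/T. */
static void cbs_budget_exhausted(struct cbs *s)
{
    s->budget    = s->Q;
    s->deadline += s->T;
}
```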

3.1.4 Resource server allocation

When using more than one resource server to temporally contain multiple non-HRT tasks, the question arises of how to allocate resources between those servers. This allocation problem is separate from the scheduling problem, and is treated separately in order to provide flexibility in the timeliness requirements supported by the scheduler.

Rate-based EDF

Rate-based earliest deadline first (RBED) [Brandt et al., 2003] is an algorithm that implements such a scheduler. In RBED, tasks are classified as HRT, SRT, best-effort or rate-based. Tasks are modelled using an extension of the periodic task model, allowing any job of a task to have a different period. If rate-based or HRT tasks cannot be scheduled at their desired rate, they are rejected. SRT tasks are given their rate if possible, with the option to provide a quality-of-service specification. Processor-time reservations can be used to make sure best-effort tasks are allowed some execution time; otherwise, they are allocated slack time unused by SRT and HRT tasks. Either way, best-effort tasks are scheduled by assigning them a rate that reflects how they would be scheduled in a standard, fair, quantum-based scheduler. Based on the rates used, RBED breaks tasks down and feeds them to an EDF scheduler to manage processing time. Rates are enforced using a one-shot timer to stop tasks that exceed their WCET. As tasks enter and leave the system, the rates of SRT tasks will change. Slack time that occurs as a result of tasks completing before their deadlines is only donated to best-effort tasks, although the authors note that extensions should be able to donate slack to SRT tasks as well. RBED is similar in concept to CBS, however it deals with the separate types of real-time tasks more explicitly.

CASH

Caccamo et al. [2000] provide another approach, termed capacity sharing (CASH), for resource allocation between CBS servers under an EDF scheduler, which allows unused time from CBS servers to be reclaimed. Under CASH, a global queue of spare capacities, ordered by deadline, is maintained and used to schedule residual capacities.

Slack stealing

Slack stealing [Ramos-Thuel and Lehoczky, 1993] is an approach that runs a scheduling task at the lowest priority and tracks the amount of slack per task in the system. As aperiodic tasks arrive, the slack stealer calculates whether they can be scheduled or not based on the slack in the system and current load of periodic tasks. This method does not provide guarantees at all for the aperiodic tasks, unless a certain bound is placed on the execution of periodic tasks.

3.2 Mixed-criticality schedulers

In the real-time community, much work has been conducted around a theoretical mixed-criticality task model introduced by Vestal [2007]. In this model, a system has a specific range of criticality levels, Lmin to Lmax, and a current criticality level Lnow. Each task in the system is assigned a criticality level, Li, and has a vector of execution times of size

Li − Lmin. When the system undergoes a mode change, should Li ≥ Lnow, the execution time required by that task is set to the value in the vector corresponding to the new Lnow. If a task's criticality is less than Lnow, it may be dropped, or scheduled at a much lower priority until a criticality mode-switch lowers Lnow again. Vestal [2007]'s model can be interpreted in several ways: as a form of graceful degradation, or as an optimisation for the case where a system designer and CA disagree on WCET estimates for a system, which results in tasks with two WCET estimates, a very pessimistic one from the CA and a less pessimistic one from the system designers or automated tools. Theoretically, if the system designer could show that the higher estimates of the CA can be met by such a mode change, they could schedule lower-criticality tasks while in a low-criticality mode. None of the scheduling algorithms discussed so far directly support mixed-criticality systems. RBED is the closest, although it assumes a direct relationship between criticality and real-time model, with the assumption that HRT tasks are more critical than SRT tasks, which in turn are more critical than best-effort tasks. As a result, a family of mixed-criticality schedulers exists that handles high-criticality tasks, each with two WCET estimates, alongside low-criticality tasks. The scheduling algorithm will always schedule the high-criticality tasks. If high-criticality tasks finish before their lower WCET estimate, lower-criticality tasks are also scheduled. Otherwise, tasks of lower criticality may not be scheduled at all.
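A two-level instance of this model (LO and HI) can be sketched as follows; this is our hypothetical illustration of the mode-switch book-keeping, with the WCET vector indexing simplified so that every task carries an estimate for every level:

```c
#define NLEVELS 2   /* e.g. LO = 0, HI = 1 */

struct mc_task {
    int level;                    /* criticality L_i                               */
    unsigned long wcet[NLEVELS];  /* C_i(L): one execution-time estimate per level */
    unsigned long budget;         /* budget enforced in the current mode           */
    int active;                   /* still scheduled in the current mode?          */
};

/* On a criticality mode switch to l_now: tasks of criticality >= l_now adopt
 * the (more pessimistic) estimate for l_now; lower-criticality tasks are
 * dropped, or would be demoted to background priority. */
static void criticality_mode_switch(struct mc_task *tasks, int n, int l_now)
{
    for (int i = 0; i < n; i++) {
        if (tasks[i].level >= l_now) {
            tasks[i].active = 1;
            tasks[i].budget = tasks[i].wcet[l_now];
        } else {
            tasks[i].active = 0;
        }
    }
}
```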

3.2.1 Static mixed-criticality

Static mixed-criticality schedulers are built for variants of Vestal [2007]'s model. Scheduling algorithms in this class are distinguished by a criticality mode-switch between two or more criticality levels, which may result in low-criticality tasks being dropped or de-prioritised in some way. Schedulers for this model of mixed criticality have been developed and extensively studied for FP [Pathan, 2012; Vestal, 2007] and EDF [Baruah et al., 2011a], among others. As this has been a very active topic, we refer the reader to Burns and Davis [2017] for an extensive background. However, while this model, and many variations upon it, have been subject to much research in the real-time community, questions have been raised as to its practicality in industry. Ernst and Di Natale [2016] claim that a CA would be unlikely to accept multiple WCET definitions, and state that the focus of mixed-criticality research should

be on providing systems for error handling and proof of isolation in order to make mixed-criticality systems practical.

3.2.2 MC2

MC2 [Herman et al., 2012] is a mixed-criticality, two-level hierarchical scheduler for multiprocessors with low core counts (2–8), which defines four levels of tasks that are scheduled differently. Level-A tasks are high-criticality tasks which are scheduled with a cyclic executive (recall Section 2.2.1), with parameters computed offline. If no level-A tasks are eligible to run, including in slack left in the cyclic level-A frames, level-B tasks are considered. Level-B tasks are trusted HRT tasks scheduled with partitioned EDF, where one EDF scheduler exists per processor; the authors note that the level-B scheduler could be swapped for FPRM. In the slack left by level-A and level-B tasks, level-C tasks can run. Level-C tasks are globally scheduled with EDF, using one scheduler queue for all cores. The intuition is that level-C tasks are SRT tasks which run in the slack time of the more critical level-A and level-B tasks. Level-D tasks are considered to be best-effort tasks, and are scheduled in the remaining slack from levels A, B and C. MC2 also has optional budget enforcement that can be applied to all levels of task, which prevents tasks from exceeding a specified bandwidth. The MC2 scheduler is an example of a combination of policies, where criticality and real-time model are aligned, in a way that may not be appropriate for all systems.

3.2.3 Zero Slack Scheduling

De Niz et al. [2009] propose a scheduling approach that can handle multiple levels of criticality, called zero-slack (ZS) scheduling. ZS scheduling is based on the fact that tasks rarely use their WCET, which means that resource reservation techniques like CBS without slack donation result in low effective utilisation. ZS scheduling takes the reverse approach: high-criticality tasks steal utilisation from lower-criticality tasks. This involves calculating a ZS instant: the last point at which a task can be scheduled without missing its deadline. Under overload, the ZS scheduler makes sure that high-criticality tasks are scheduled by their ZS instant, such that they cannot be preempted by lower-criticality tasks. Implementations of ZS scheduling can be built using any priority-based scheduling technique; in the initial work, FP with RM priority assignment is used. The ZSRM scheduler is proved to be able to schedule anything that standard RM scheduling can, whilst maintaining the asymmetric protection property. ZS scheduling can be combined with temporal isolation via bandwidth servers. ZS scheduling has been adapted to use a quality-of-service (QoS) based resource allocation model [de Niz et al., 2012], in the context of AAVs. Many models of real-time systems assume that WCETs for real-time tasks are stable and can be calculated. However, AAVs have complicated visual object-tracking algorithms where WCET is difficult to

calculate, and execution time varies with the number of objects to track. In practice, de Niz et al. [2012] found that ZSRM scheduling resulted in utility inversion, where lower-utility tasks prevent the execution of higher-utility tasks: although the criticality-based approach ensured that no criticality inversion occurred, under overload some tasks offer more utility than others as their execution time increases. As a result, the authors replace criticality in the algorithm with a utility function. Two execution-time estimates are used for real-time tasks, nominal-case execution time (NCET) and overloaded-case execution time (OCET), each having its own utility. The goal of the scheduler is to maximise utility, under both normal operation and overload. In practice, utility and criticality are system-specific values that allow for further prioritisation beyond simply the rate of the task, as required by FPRM.

3.3 Resource sharing

One of our goals is to allow tasks of different criticality to share resources. While the resource itself must be at the highest criticality of any task using it, this relationship is not necessarily symmetric: low-criticality systems should be able to use high-criticality resources, as long as their resource access is bounded. In this section we explore how resource reservations and real-time locking protocols interact, and assess their suitability for mixed-criticality systems. So far in this chapter we have looked at real-time theory for providing temporal isolation, usually in the form of isolation algorithms (termed servers in real-time theory) which ration out execution time, guaranteeing a maximum bandwidth or allowing aperiodic tasks to run in the slack time of other tasks. The resource-sharing locking protocols of priority inheritance and priority ceilings are likewise not designed to cope with misbehaving tasks: in all of the protocols, if a task does not voluntarily release a resource, all other tasks sharing that resource will be blocked. Nor can one blindly apply a technique like polling servers directly to resource sharing: if the bandwidth expires while the resource is held, no other task can access that resource. Resource kernels, as introduced in Section 2.4.5, outline the policy decisions that must be made when combining locking protocols and reservations: prioritisation, charging and enforcement. Prioritisation, or what priority a task uses while accessing a resource, can be decided by any of the existing protocols: OPCP, IPCP, PIP or SRP. Charging is more complex. Notionally, when a thread is temporally contained by an isolation algorithm, that algorithm corresponds to a reservation. Under normal execution, the reservation is charged when that thread executes. However, if several threads are vying for a resource, which thread's reservation should be charged for the time spent accessing that resource? De Niz et al. [2001] describe the possible mappings between reservations and the resources consuming those reservations, which come down to the following choices:

Bandwidth inheritance Tasks using the resource run on their own reservation. If that reservation expires and there are other pending tasks, the task runs on the reservations of the pending tasks.

Reservation for the resource Shared resources have their own reservation, which tasks use. This reservation must be enough for all tasks to complete their request. Once again, if tasks are untrusted no temporal isolation is provided.

Multi-reserve resources Shared resources have multiple reservations, and the resource actively switches between them depending on which task it is servicing.

Finally, and most relevant to mixed-criticality systems, is enforcement: if the reservation being charged for a resource access is fully consumed, what should happen? Without an enforcement mechanism, all tasks waiting to access that resource are blocked until more execution bandwidth is rationed out by the isolation algorithm. In the majority of cases, bandwidth inheritance is used, relying on the idea that any operation involving a shared resource in a real-time system is bounded. Then, depending on the prioritisation protocol, tasks must have enough budget to account for that bound. This imposes a major restriction on shared resources: even where only soft real-time access to those resources is required, bounds must be known.

3.3.1 MC-IPC

Brandenburg [2014] implements a multiprocessor, IPC-based protocol referred to as MC-IPC, where shared resources are placed in resource servers which are invoked via IPC. In this scheme the resources themselves must be at the ceiling criticality of any task accessing them, but the tasks themselves need not all be at that criticality level. The protocol works by channelling all IPC requests through a three-level, multi-ended queue in which high-criticality tasks are prioritised over best-effort tasks. The protocol relies on resource access being bounded, even if a task exhausts its allocation while using the resource, combined with correct queue ordering, such that high-criticality tasks are not blocked by low-criticality ones.

3.4 Summary

In traditional scheduling algorithms (EDF and FPRM), temporal isolation exists only as a convention: tasks are expected to specify their parameters a priori, and are trusted not to exceed those parameters. This approach is not suitable for mixed-criticality systems, where tasks with different levels of assurance need to share resources. In this chapter we have covered theoretical techniques for temporal isolation, mostly derived from approaches to containing aperiodic tasks, which are unpredictable workloads. One approach is to specify a processor share, or bandwidth, and use an algorithm

such as CBS or the sporadic server (SS) to temporally contain a task. The majority of these algorithms can be used to implement processor reservations. We also examined several mixed-criticality schedulers; however, all incorporated a great deal of policy: schedulers that provide a criticality switch assume multiple WCET estimates, and that low-criticality tasks can be de-prioritised or dropped; MC2 defines four levels of scheduling and assumes that the highest-criticality tasks are also hard real-time, and that best-effort tasks are less critical than SRT tasks; ZS scheduling is optimised for a particular measure of utility. The mixed-criticality schedulers studied all assume that the criticality of a task is directly related to its real-time strictness: for example, MC2 prioritises HRT over SRT. While in general critical tasks are HRT, it is possible to have critical tasks that are SRT, for instance object-tracking algorithms whose WCET depends on factors external to the software system. Therefore, although the scheduling policies discussed can solve specific problems, they are not appropriate for all systems, and should not be mandated. Finally, we looked at resource sharing and how this interacts with processor reservations. We saw that while there are existing policies for prioritisation and charging, enforcement mechanisms are based on bandwidth inheritance, where, should a task exceed its budget in a shared resource, it then inherits pending bandwidth from other tasks accessing that resource. In the next chapter, we survey existing operating systems and systems techniques with respect to temporal isolation capability, resource sharing, and asymmetric protection.


4 Operating Systems

In this chapter we provide a survey of existing operating systems and mechanisms for building mixed-criticality systems. We evaluate the scheduling and resource-sharing policies and mechanisms available, with a focus on temporal isolation, asymmetric protection, policy freedom, and resource sharing. First we present a number of industry standards, before examining Linux and other open-source operating systems, in addition to commercial offerings, in order to establish current industry practice for temporal isolation. Finally, we survey existing, relevant operating systems from systems research, deeply examining techniques that can be leveraged to build mixed-criticality systems, including isolation and resource-sharing techniques.

4.1 Standards

In order to establish standard industry practice, we first present three industry standards which provide real-time scheduling and resource sharing (POSIX, ARINC 653, and AUTOSAR) and examine their mechanisms for temporal isolation and resource sharing.

4.1.1 POSIX

First, we look at the portable operating system interface (POSIX) standard, which underlies many commercial and open-source operating systems. POSIX is a family of standards which includes specifications of RTOS interfaces [Harbour, 1993] for scheduling and resource sharing, and which influences much OS design. Scheduling policies specified by POSIX are shown in Table 4.1. Faggioli [2008] provides an implementation of SCHED_SPORADIC, which Stanovich et al. [2010] used to show that the POSIX definition of the sporadic server is incorrect and can allow tasks to exceed their utilisation bound. The authors provide a modified algorithm for merging and abandoning replenishments which fixes these problems; corrections to the pseudocode were later published by Danish et al. [2011]. In further work, Stanovich et al. [2011] show that while sporadic servers provide better response times than polling servers under average load, under high load the overhead of preemptions due to fine-grained replenishments causes worse response times than polling servers. Consequently, they evaluate an approach where servers alternate between sporadic and polling behaviour depending on load, where the transition involves reducing the maximum number of replenishments to one and merging the available refills.

SCHED_FIFO: Real-time tasks run at one of at least 32 fixed priorities until they are preempted or yield.

SCHED_RR: As per SCHED_FIFO, but with an added timeslice; when a thread's timeslice expires, it is added to the tail of the scheduling queue for its priority.

SCHED_SPORADIC: Specifies sporadic servers as described in Section 3.1.3, and can be used for temporal isolation. For practical reasons, the POSIX specification of SCHED_SPORADIC allows a maximum number of replenishments, which is implementation defined.

Table 4.1: POSIX real-time scheduling policies.

Resource sharing in the POSIX OS interface is permitted through mutexes, which can be used to build higher-level synchronisation protocols. Table 4.2 shows the specified protocols.

NO_PRIO_INHERIT: Standard mutexes that do not protect against priority inversion.

PRIO_INHERIT: Mutexes with PIP to prevent priority inversion (recall Section 2.3.2).

PRIO_PROTECT: Mutexes with the highest lockers protocol (HLP) to prevent priority inversion (recall Section 2.3.3).

Table 4.2: POSIX real-time mutex policies for resource sharing.
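For example, a POSIX mutex is given the PRIO_INHERIT or PRIO_PROTECT behaviour through its attributes; the fragment below uses the standard pthreads calls, with error handling omitted for brevity:

```c
#include <pthread.h>

static pthread_mutex_t resource_lock;

static void init_resource_lock(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);

    /* PRIO_INHERIT: the holder inherits the priority of the highest-priority
     * waiter (PIP, Section 2.3.2). */
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);

    /* Alternatively, PRIO_PROTECT (HLP/IPCP, Section 2.3.3): the holder runs
     * at a configured ceiling priority:
     *   pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_PROTECT);
     *   pthread_mutexattr_setprioceiling(&attr, 50);
     */

    pthread_mutex_init(&resource_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}
```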

Although POSIX provides SCHED_SPORADIC, which can be used for temporal isolation (however flawed), the intention of the policy is to contain aperiodic tasks. Temporal isolation of shared resources, however, is not possible with POSIX. This is because SCHED_SPORADIC allows threads to run at a lower priority once they have exhausted their sporadic allocation, meaning those threads can still access resources even when running at lower priorities. In fact, running at lower priorities ensures that threads contained by sporadic servers can eventually unlock locked resources; however, POSIX neither bounds locking nor provides a way to preempt locked resources. As a result, POSIX is insufficient for mixed-criticality systems where tasks of different criticalities share resources. POSIX is a family of standards, including core services, real-time extensions, and threads extensions, among others. Not all OSes implement the full POSIX standard; however, many, including Linux, implement subsets or incorporate features of those subsets.

4.1.2 ARINC 653

ARINC 653 (Avionics Application Standard Software Interface) is a software specification for avionics which allows for the construction of a limited form of mixed-criticality systems. Under ARINC 653, software of different criticality levels can share hardware under strict temporal and spatial partitioning, where CPU, memory and I/O are all partitioned. Software of different criticality levels is assigned to separate, non-preemptive partitions, which are scheduled according to fixed-time windows, in the fashion of a cyclic executive (recall Section 2.2.1). Within partitions, a second-level, preemptive, fixed-priority scheduler is used to dispatch threads, with FIFO ordering for equal priorities. At the end of each partition's window a partition switch occurs, at which point all CPU pipeline state and cache state is flushed to avoid data leakage between partitions [VanderLeest, 2010]. Temporal isolation under ARINC 653 is therefore completely fixed between partitions, and non-existent within partitions. The ARINC 653 specification does not consider multiprocessor hardware, although it is possible to schedule specific partitions on fixed processors. The standard does permit resource sharing between partitions: resources are either exclusive, and accessible to one partition only, or shared between two or more partitions. Synchronisation mechanisms are specified both inter- and intra-partition. Inter-partition sharing is provided by sending and receiving ports, configured as either sampling ports or queuing ports, depending on the access required [Kinnan and Wlad, 2004]. Importantly, ports have no impact on the scheduling order of partitions; all operations must complete within the duration of a partition's fixed-time window. Intra-partition synchronisation is via low-level events and semaphores, or higher-level blackboards and buffers. The latter two are uni-directional message-passing interfaces: buffers provide a statically-sized producer-consumer queue, while blackboards provide asynchronous multicast behaviour, where multiple tasks can read the latest message until it is cleared [Zuepke et al., 2015]. Finally, ARINC 653 specifies a health-monitoring system, which can detect deadline misses and run preconfigured exception handlers.

4.1.3 AUTOSAR

AUTOSAR (AUTomotive Open System ARchitecture) is a set of specifications for developing real-time software in the automotive domain, which, like ARINC 653, has a focus on safety. Unlike ARINC, AUTOSAR does not specifically provide for mixed criticality. Software in AUTOSAR is either trusted or untrusted, where trusted indicates that it can run in privileged mode. This is effectively an all-or-nothing mechanism for spatial isolation, derived from the fact that much of the hardware AUTOSAR runs on is embedded without an MMU, which would allow for more fine-grained spatial isolation. In terms of temporal isolation, AUTOSAR provides a mechanism referred to as timing protection,

where tasks' execution times, resource-access durations and inter-arrival times are specified and monitored, such that exceptions can be raised on temporal violations [Zuepke et al., 2015]. Scheduling in AUTOSAR is partitioned per processor, with a fixed-priority, preemptive scheduler per core and FIFO ordering for equal priorities. Unlike ARINC, AUTOSAR allows for inter-processor synchronisation, facilitated by spinlocks. For resource sharing, AUTOSAR specifies synchronisation through IPCP, supported by allowing tasks to raise their priority to the resource's configured priority; nesting of resources is also supported. For synchronisation, events are provided, which threads can block on.
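As an illustration of the IPCP-based protocol, a critical section in the OSEK-heritage AUTOSAR classic OS API is typically expressed as in the sketch below; SensorTask and SharedBuffer are hypothetical identifiers that would be declared in the system configuration, and the header name varies per vendor:

    #include "Os.h"   /* vendor-supplied AUTOSAR/OSEK OS header (name varies) */

    /* Sketch only: GetResource raises the task's priority to the ceiling
     * configured for SharedBuffer; ReleaseResource restores it. */
    TASK(SensorTask)
    {
        GetResource(SharedBuffer);
        /* ... access the shared data at the configured ceiling priority ... */
        ReleaseResource(SharedBuffer);
        TerminateTask();
    }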

4.1.4 Summary

We survey three representative software standards for developing real-time systems: POSIX, ARINC 653 and AUTOSAR. All mandate the use of fixed-priority schedulers and provide protocols and mechanisms for resource sharing. Under POSIX, the focus of the resource-sharing mechanisms is synchronisation of critical sections for correctness, not temporal isolation. AUTOSAR provides optional monitoring which can raise an exception if a task holds a resource for too long, although this requires exact scheduling details to be provided about the task and resource. ARINC 653 alone provides real temporal isolation for tasks sharing resources; however, the support is limited to fixed, static partitions with no flexibility.

4.2 Existing operating systems

There is a significant gap between real-time theory, as surveyed in the last chapter, and real-time practice, which we have partially highlighted in the previous section. Now we survey existing open-source and commercial operating systems, which are used in practice, to demonstrate the status quo, and the impact of the discussed specifications and standards on their development.

4.2.1 Open source

Many open-source OSes are used in real-time settings, although generally not in high-assurance, safety-critical systems. Regardless, temporal isolation and resource sharing mechanisms remain important to guarantee the function of the software.

Linux

Due to its collaborative development and a massive code base, Linux is not amenable to certification and cannot be considered an OS for high-criticality applications. However, Linux is frequently used for low-criticality applications with SRT demands, and can be used to provide low-criticality services in a mixed-criticality setting, as long as it is sufficiently

isolated. Additionally, Linux is often used as a platform for conducting real-time systems research.

Linux has a preemptive scheduler which is split into scheduling classes. Real-time threads can be scheduled with POSIX SCHED_FIFO and SCHED_RR. Best-effort threads are scheduled with the time-sharing completely fair scheduler (CFS), while real-time threads are scheduled either FIFO or round-robin and are prioritised over the best-effort tasks. Fixed-priority threads in Linux are completely trusted: apart from a bound on total execution time for real-time threads which guarantees that best-effort threads are scheduled (referred to as real-time throttling [Corbet, 2008]), individual temporal isolation is not possible. Linux version 3.14 saw the introduction of an EDF scheduling class [Corbet, 2009], which takes precedence over both the fair and the fixed-priority scheduling classes. The EDF implementation allows threads to be temporally isolated using CBS.

Scheduling in Linux promotes the false correlation we see in many systems: real-time tasks are automatically trusted (unless scheduled with EDF and CBS) and assumed to be more important, or more critical, than best-effort tasks. In reality, criticality and real-time strictness are orthogonal. Linux does not provide any mechanisms for asymmetric protection beyond priority. On the resource-sharing side, Linux provides real-time locking via the POSIX API as per Table 4.2, which is unsuitable for mixed-criticality shared resources.

Numerous projects attempt to retrofit more extensive real-time features onto Linux. We briefly summarise major and relevant works here.

One of the original works [Yodaiken and Barabanov, 1997] runs Linux as a fully-preemptable task via virtualisation and kernel modifications, and runs real-time threads in privileged mode. Interrupts are virtualised and sent to real-time threads, and only directed to Linux if required. Consequently, real-time tasks do not suffer from long interrupt latencies; however, it also means that device drivers need to be rewritten from scratch for real-time use. This approach is clearly untenable in a mixed-criticality system, given all real-time threads are trusted.

LITMUSRT [Calandrino et al., 2006] is an extension of Linux that allows pluggable real-time schedulers to be easily developed for testing multiprocessor schedulers which schedule kernel threads. Real-time schedulers run at a higher priority than best-effort threads, and schedulers can be dynamically switched at run time. LITMUSRT is not intended for practical use, but for developing and benchmarking scheduling and resource-sharing algorithms. Implementations of global and partitioned EDF and FP schedulers exist for LITMUSRT, in addition to PFair schedulers. An SS implementation exists, as well as various multicore real-time locking protocols. Both MC2 (Section 3.2.2) and MC-IPC (Section 3.3.1) are implemented and evaluated using LITMUSRT. As a core mechanism, MC-IPC contains too much policy to be included in a microkernel, however it is a candidate for implementation at user-level.
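As a concrete illustration of the EDF class mentioned above, a thread is admitted to SCHED_DEADLINE through the sched_setattr system call; the sketch below assumes a kernel with SCHED_DEADLINE support and defines the attribute structure locally, since glibc provides no wrapper:

    #define _GNU_SOURCE
    #include <linux/types.h>
    #include <sched.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #ifndef SCHED_DEADLINE
    #define SCHED_DEADLINE 6
    #endif

    /* Field layout follows the sched_setattr(2) man page. */
    struct sched_attr {
        __u32 size;
        __u32 sched_policy;
        __u64 sched_flags;
        __s32 sched_nice;
        __u32 sched_priority;
        __u64 sched_runtime;   /* CBS budget per period, nanoseconds */
        __u64 sched_deadline;  /* relative deadline, nanoseconds     */
        __u64 sched_period;    /* period, nanoseconds                */
    };

    /* Sketch only: ask the kernel to run the calling thread under EDF with
     * CBS-based temporal isolation; the kernel performs an admission test. */
    int become_deadline_task(__u64 runtime_ns, __u64 deadline_ns, __u64 period_ns)
    {
        struct sched_attr attr = {
            .size           = sizeof(attr),
            .sched_policy   = SCHED_DEADLINE,
            .sched_runtime  = runtime_ns,
            .sched_deadline = deadline_ns,
            .sched_period   = period_ns,
        };
        return syscall(SYS_sched_setattr, 0 /* calling thread */, &attr, 0);
    }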

Linux/RK [Oikawa and Rajkumar, 1998] is a resource-kernel implementation of Linux with scheduling, admission control, and enforcement in the kernel. Every resource, including memory, CPU time and devices, is time-multiplexed using a recurrence period, processing time and deadline. Reservations in Linux/RK can be hard, firm or soft, which alters resource scheduling after exhaustion: hard reservations are not scheduled again until replenishment; firm reservations are only scheduled if no other undepleted reserve or unreserved resource use is scheduled; soft allows resource usage to be scheduled at a background priority. Linux/RK is additionally often used to implement and test other schedulers, such as ZS scheduling [de Niz et al., 2009], which was presented in Section 3.2.3.

Whilst Linux implementations are suitable for implementing algorithms, being used as test-beds, and even being deployed for non-critical SRT applications, ultimately Linux is not a suitable RTOS for running safety-critical HRT applications. The large amount of source code results in a colossal trusted computing base, where it is impossible to guarantee correctness through formal verification or timeliness through WCET analysis. Major reasons for adapting Linux to real-time are the existing applications and wide array of device and platform support. For mixed-criticality systems these advantages can be leveraged by running Linux as a virtualised guest OS to run SRT and best-effort applications.

RTEMS

RTEMS is an open-source RTOS that operates with or without memory protection, although in either case it is statically configured. Although it is an open-source project, RTEMS is used widely in industry and research. The main scheduling policy is fixed-priority rate monotonic, however EDF is also available, with temporal isolation an option using CBS. No temporal isolation mechanisms are present for fixed-priority scheduling. RTEMS provides semaphores with PIP or HLP for resource sharing, as well as higher-level primitives built on these. However, RTEMS provides no mechanism for temporal isolation over shared resources: threads are trusted to complete critical sections within a determined WCET.

FreeRTOS

FreeRTOS is another open-source RTOS; however, it only supports systems with memory protection units (MPUs), not MMUs. The scheduler is preemptive FP, and PIP is provided to avoid priority inversion.

XtratuM

XtratuM [Masmano et al., 2009] is a statically-configured, non-preemptive hypervisor designed for safety-critical requirements that also supports ARINC 653 and AUTOSAR. XtratuM provides a fixed-priority, cyclic scheduler and uses para-virtualisation to host Linux and other guests.

4.2.2 Commercial RTOSes

Several widely deployed RTOSes are used commercially, the majority providing support for part or all of POSIX.

QNX Neutrino

QNX [2010] was one of the first commercial microkernels, and is widely used in the transport industry. QNX is a separation-based, first-generation microkernel that provides FP scheduling and resource sharing with POSIX semantics. QNX satisfies many industry certification standards, although in practice these do not require WCET analysis or formal verification of correctness.

VxWorks

VxWorks [Win, 2008] is a monolithic RTOS deployed most notably in aircraft and spacecraft. It supports FP scheduling with a native POSIX-compliant scheduler, and implements ARINC 653. VxWorks also has a pluggable scheduler framework, allowing developers to implement their own in-kernel scheduler.

PikeOS

PikeOS [SysGo, 2012] is a second-generation microkernel which implements ARINC 653 and runs RTOSes as paravirtualised guests in different partitions. Partitions are scheduled statically in a cyclic fashion, and each partition has its own scheduling structure supporting 256 priorities. An alternative design has been implemented for PikeOS [Vanga et al., 2017], where reservations are used to schedule low-latency, low-criticality tasks. This is achieved by using "EDF within fixed priorities" [Harbour and Palencia, 2003], which schedules using EDF at specific priority bands, combined with a pluggable interface for using a reservation algorithm (e.g. CBS, SS, deferrable server (DS)) to temporally contain threads. In order to achieve low latency, these tasks are run in a special partition of PikeOS, known as the system partition, which is scheduled at the same time as the currently active partition and provides essential system services. These tasks are intended to run without sharing resources or interfering with high-criticality tasks, which run in their own partitions.

Deos

Deos is another RTOS which provides fixed-priority scheduling, with the addition of slack scheduling, where threads can register to receive slack and are scheduled according to their priority when there is slack in the system. Like PikeOS, Deos also implements ARINC 653 and a defined subset of POSIX.

4.2.3 Summary

There are many other RTOSes used commercially, but the general pattern is POSIX-compliant, FP scheduling and resource sharing. This brief survey shows that FP scheduling is dominant in industry due to its predictable behaviour on overload, its specification in the POSIX standard, and its compatibility with existing, priority-based, best-effort systems. If EDF is incorporated, it is provided at priority bands within the fixed-priority system. Without a principled way to treat time as a first-class resource, the reliance on fixed priority conflates the importance of a task with its scheduling priority, often based on rate, resulting in low utilisation in these systems.

OS         Scheduler   Temporal isolation
Linux      FP + EDF    CBS
RTEMS      FP + EDF    CBS
FreeRTOS   FP
QNX        FP          ARINC 653, SS
VxWorks    FP          ARINC 653
PikeOS     FP          ARINC 653
Deos       FP          ARINC 653

Table 4.3: Summary of scheduling and temporal isolation mechanisms in surveyed open-source and commercial OSes.

Table 4.3 summarises the commercial and open-source OSes surveyed. Although temporal isolation is sometimes provided, with the possibility of bounded bandwidth via CBS or SS, support for temporal isolation in shared resources is non-existent beyond the strict partitioning of ARINC 653. Although many of these RTOSes are deployed in safety-critical systems, their support for mixed-criticality applications is limited to the ARINC 653 approach discussed in Section 4.1.2. Clearly, more flexible mechanisms for temporal isolation and resource sharing are required.

4.3 Isolation mechanisms

We now look to systems research and explore mechanisms for temporal isolation and resource sharing in research operating systems, exploring their history and the state of the art. First, we briefly introduce each operating system that is surveyed, before exploring in detail specific mechanisms that can be used to support mixed-criticality systems. We investigate how different OSes address the resource-kernel concepts required to treat time as a first-class resource: scheduling, accounting, enforcement and admission. Additionally, we look at how prioritisation, charging and enforcement are achieved, if at all, to provide temporal isolation across shared resources.

The majority of kernels surveyed here are microkernels, as introduced in Section 2.4, and are as follows:

• KeyKOS [Bomberger et al., 1992], a persistent, capability-based microkernel.
• Real-time Mach [Mercer et al., 1993, 1994], a first-generation microkernel.
• EROS [Shapiro et al., 1999], the first high-performance, capability-based kernel.
• Fiasco [Härtig et al., 1998; Hohmuth, 2002], a second-generation L4 microkernel with fixed-priority scheduling.
• MINIX 3 [Herder et al., 2006], a traditional microkernel with a focus on reliability rather than performance.
• Tiptoe [Craciunas et al., 2009], a now-defunct research microkernel that also aims at temporal isolation between user-level processes and the operating system.
• NOVA [Steinberg and Kauer, 2010], a third-generation microkernel and hypervisor.
• COMPOSITE [Parmer, 2009], a component-based OS microkernel, with a dominant focus on support for fine-grained components and massive scalability.
• Barrelfish [Peter et al., 2010], a capability-based multi-kernel OS, where a separate kernel runs on each processing core; kernels themselves share no memory and are essentially central processing unit (CPU) drivers.
• Fiasco.OC [Lackorzyński et al., 2012], a third-generation iteration of Fiasco, both microkernel and hypervisor.
• Quest-V [Danish et al., 2011], a separation kernel / hypervisor.
• seL4 [Klein et al., 2014], the first third-generation microkernel with hypervisor support, strong isolation properties and a proof of functional correctness. We present seL4 and its mechanisms in more detail in Chapter 5.
• AUTOBEST [Zuepke et al., 2015], a separation kernel where the authors demonstrate implementations of AUTOSAR and ARINC 653 in separate partitions.

We do not consider Nemesis [Leslie et al., 1996], as although it treated time as a first-class resource, the focus was on multimedia performance. Nemesis was a single-address-space operating system, an architecture designed for efficient sharing, as pointers are valid beyond process boundaries, at the cost of much larger virtual address spaces.

4.3.1 Scheduling

As in commercial and open-source OSes, fixed-priority scheduling dominates in research, with only Tiptoe and Barrelfish providing EDF schedulers, although MINIX 3 has been adapted for real-time [Mancina et al., 2009] by allowing the kernel's best-effort scheduler to be set to EDF on a per-process basis, with kernel scheduler activations that allow for a user-level CBS implementation.

COMPOSITE stands out, as it does not provide a scheduler or blocking semantics in the kernel at all, requiring user-level components to make scheduling decisions. HIRES [Parmer and West, 2011] is a hierarchical scheduling framework built on top of COMPOSITE. HIRES delivers timer interrupts to a root, user-level scheduling component, which are then forwarded through the hierarchy to child schedulers; consequently, scheduling overhead increases as the hierarchy deepens. Child schedulers with adequate permissions use a dedicated system call to tell the kernel to switch threads, while the kernel itself does not provide blocking semantics, which are also provided by user-level schedulers. This design offers total scheduling policy freedom, as user-level scheduling components can implement all the goals of a resource kernel according to their own policy. COMPOSITE recently added the concept of temporal capabilities [Gadepalli et al., 2017], which interact with the user-level scheduling model and are discussed further later in this chapter (Section 4.3.5).

4.3.2 Timeslices and meters

Meters were one of the first mechanisms used to treat time as a resource, originating in KeyKOS. A meter represented a defined length of processing time, and threads required meters to execute. Once a meter was depleted a higher authority was required to replenish it, requiring another thread to run.

L4 kernels [Elphinstone and Heiser, 2013] extended this concept with timeslices, which like meters are consumed as threads execute, but no authority is invoked to replenish the meter: the timeslice simply represents an amount of time that a thread could execute at a priority before preemption. On preemption, the thread is placed at the end of the FIFO scheduling queue for the appropriate priority, with a refilled timeslice, in a round-robin fashion.

Meters and timeslices provide a unit of time, but do not restrict when that time must be consumed. Although simpler, timeslices provide insufficient policy freedom in that they are recharged immediately, providing no mechanism for an upper bound on execution. Although meters allowed a user-level policy to define replenishment, this proved expensive on the systems and hardware of the time, as every preemption resulted in a switch to a user-level scheduler and back. Neither concept provides anything resembling a bandwidth or share of a CPU, and only threads at the highest priority level have any semblance of temporal isolation, as they cannot be preempted by other threads, sharing their time only with threads at the same priority.

As a result, in a system where time is treated as a first-class resource, the timeslice/meter is not an appropriate, policy-free mechanism alone for building mixed-criticality systems with temporal isolation guarantees.

4.3.3 Reservations

Many research OSes have provided mechanisms for processor reservations, a concept with roots in resource kernels, as introduced in Section 2.4.5. Like meters and timeslices, reservations are consumed when threads execute upon them; however, they represent a bandwidth or share of a processor, and provide a mechanism for temporal isolation. Reservation schemes differ in the algorithm used to ensure the specified bandwidth is not exceeded: Mach and EROS used DS, Tiptoe used CBS, while Barrelfish provides RBED. Quest-V [Li et al., 2014] provides reservations through SS, which addresses the back-to-back problem of DS but requires a more complex implementation; I/O and normal processes are distinguished statically, with I/O processes using polling servers and normal processes using sporadic servers.

All the research OSes that implement reservations have an admission test in the kernel which is run on new reservations and checks that the set of reservations is schedulable. Although admission testing is dynamic, providing the admission test in the kernel makes the scheduler used, and the schedulability test, an inflexible policy mandated by the kernel. Enforcement policy, which determines what occurs when a reserve is exhausted, varies between two extremes: either threads cannot be scheduled until the reservation is replenished, or they are scheduled in slack time. CBS approaches enforce an upper bound on execution, not permitting threads to run until replenishment. Real-time Mach, with EROS to follow, allowed threads with exhausted reservations to run in a second-level, time-sharing scheduler. For RBED-based approaches, enforcement is coded into the classification of the task: rate-based tasks are not scheduled until replenishment, but best-effort and SRT tasks can run in slack time. Fiasco allowed processes to reserve a higher priority for a certain number of cycles, before returning to a lower priority, essentially running expired reservations in slack time.

The importance of reservations also varies. In the case of RBED, the highest-criticality threads are trusted, and their execution is not monitored at all, while threads with reservations are considered second tier and best-effort threads the least important. This allows for a strict form of asymmetric protection where HRT threads can temporally affect SRT threads and best-effort threads, but not vice versa. The Mach approach, on the other hand, only guarantees time to threads with reservations, which are scheduled ahead of any threads in the time-sharing scheduler. We consider both models to be policy, as is the original resource kernel concept where all threads must have a reservation in order to execute at all. Some implementations allow for multiple threads to run on one reservation, which effectively creates a hierarchical scheduler, but allows for convenient temporal isolation of a set of tasks. In Quest-V, reservations are actually bound to virtual processors, not the threads scheduled by those processors.

Reservations are a mechanism that can be used to provide isolation; however, all the kernels surveyed combine this mechanism with a good deal of policy for enforcement, the importance of reservations, and admission.

To provide a small trusted computing base as a foundation for mixed-criticality systems, we need high-performance mechanisms that support these various policies with minimal restrictions on what policy can be used. To achieve this, systems must allow for overcommitting of processing resources: schedulability is a system policy, not a mechanism.

Figure 4.1: Thread execution during IPC between client and server (client: send message, wait for reply, continue processing; server: wait for messages, process message, send reply, wait for more).

4.3.4 Timeslice donation

Reservations alone can be used to provide temporal isolation, however they are insufficient without further mechanisms for resource sharing. Recall from Section 2.4.5 that prioritisation (the priority threads use when accessing resources), charging (which reservation to charge during resource access) and enforcement (what action to take when a reservation expires while accessing a resource) are key to avoiding priority inversion in a resource kernel. We first examine mechanisms for charging in the context of microkernels.

Recall from Section 2.4.1 that in a microkernel, OS services are implemented at user-level, where clients use IPC to communicate with servers using remote procedure calls (RPCs), as illustrated in Figure 4.1. These servers logically map to shared resources, where clients are the threads accessing those resources. Early implementations of IPC had clients send messages directly to servers by referencing the thread ID. Later, IPC message ports were introduced to provide isolation: clients send messages and wait for replies on ports, and servers receive messages on ports and reply to the client's message on that port, removing the need for threads to know details about each other. Servers effectively provide resources shared with multiple clients, via IPC through ports.

The heavy optimisation of second- and third-generation microkernels resulted in timeslice donation, an optimisation which avoids invoking the scheduler [Heiser and Elphinstone, 2016]. Timeslice donation works as follows: the client calls the server, and if the server is of higher or equal priority it is switched to directly, without invoking the scheduler at all, effectively a yield. The intuition is that in a fast IPC system, the request should be finished before the timeslice expires. In reality, longer requests do occur, so while the client's timeslice is used for the start of the request, the server's timeslice is used beyond that. This results in no proper accounting of the server's execution, and no temporal isolation. Other kernels, like Barrelfish, allow the sender to set a flag on a message specifying whether timeslice donation should occur. However, this approach still suffers the problem of undisciplined charging, rendering timeslice donation an inappropriate mechanism for temporal isolation over shared resources.

Scheduling contexts

The mechanism of scheduling contexts is a more principled extension of timeslice donation. Thread structures in the majority of kernels contain the execution context (registers) and scheduling information, such as priority and accounting information, in a single structure known as a TCB. Other kernels, including real-time Mach, NOVA, and Fiasco.OC, divide the TCB into an execution context and a scheduling context in order to allow the scheduling context to transfer between threads over IPC for accounting and/or priority-inheritance purposes, and for performance. Steinberg et al. [2005] first introduced the term scheduling context, although the approach had been used previously in real-time Mach as processor capacity reserves.

The contents of a scheduling context vary per implementation. Real-time Mach's scheduling contexts contained parameters for the deferrable server, while NOVA's contained a timeslice. In Fiasco.OC, scheduling contexts are realised as reservations for virtual processors, and contain a replenishment rule and a budget [Lackorzyński et al., 2012]. All three implementations include a priority in the scheduling context. Further experiments on Fiasco.OC added extra parameters, such as explicit deadlines, and bulk scheduling-context changes for explicitly supporting mixed-criticality schedulers [Völp et al., 2013].

Scheduling context donation refers to a scheduling context transferring between threads over IPC, and is implemented in both real-time Mach and NOVA, although the implementations are quite different. We look at both in terms of prioritisation, charging, and enforcement. In real-time Mach, scheduling-context donation would always occur and the scheduling context of the client was always charged. In terms of prioritisation, a flag on the message port indicated whether the server should run at the priority of the scheduling context, or at a statically set priority. A further flag indicated whether PIP should be implemented, where the server's priority would be increased to the priority of scheduling contexts from further pending requests [Kitayama et al., 1993]. Although this approach provided a fair amount of policy freedom, it introduced

performance costs on the critical IPC path, on a kernel already notorious for its poor IPC performance [Härtig et al., 1997]. The enforcement policy of real-time Mach derived directly from its two-level scheduler: if a scheduling context expired, the server would run in the second-level time-sharing scheduler, blocking any pending requests.

NOVA [Steinberg and Kauer, 2010] also provided scheduling contexts with donation over IPC, although the prioritisation policy was strictly PIP, which has high preemption overhead, as described in Section 2.3.2, and conflicts with the policy-freedom goal of a microkernel. Enforcement and charging in NOVA are both provided through the mechanism of helping [Steinberg et al., 2010]: a form of bandwidth inheritance, where pending clients not only boost the priority of the server, but the server can charge the current client's request to a pending client in order to finish the initial request.

In both implementations of scheduling-context donation, the charging mechanism becomes policy: the scheduling context of the client is always charged. Although Mach provided a mechanism to allow for system-specific prioritisation, this came at a considerable IPC cost. Finally, both provide different enforcement mechanisms, but both are hard-coded policy which rely on either charging the next client or running in slack time. While scheduling contexts are a good mechanism for passing reservations across IPC, and thereby implementing temporal isolation over shared resources, the state of the art is insufficient.
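To make the preceding discussion concrete, the sketch below shows a purely illustrative scheduling-context structure; none of the surveyed kernels uses exactly these fields, but each stores some subset of them (deferrable-server parameters in real-time Mach, a timeslice in NOVA, a budget and replenishment rule in Fiasco.OC):

    #include <stdint.h>

    struct tcb;   /* execution context (registers), defined elsewhere */

    /* Illustrative only: the time-related state that a kernel factors out of
     * the TCB so that it can be donated over IPC independently of the
     * execution context. */
    struct sched_context {
        uint64_t    budget;     /* execution allowance per period            */
        uint64_t    period;     /* replenishment interval                    */
        uint64_t    consumed;   /* budget used in the current period         */
        uint8_t     priority;   /* priority carried with the context         */
        struct tcb *bound_tcb;  /* thread currently executing on the context */
    };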

4.3.5 Capabilities to time

Recall from Section 2.4.2 that capabilities [Dennis and Van Horn, 1966] are an established mechanism for fine-grained access control to system resources. Third-generation microkernels use capabilities for principled access to system resources, including KeyKOS, EROS, Fiasco.OC, NOVA, seL4, COMPOSITE and Barrelfish. Of those, only KeyKOS, EROS and COMPOSITE apply the capability system to processing time; we explore these systems after differentiating temporal and spatial capabilities.

The major challenge of applying capabilities to time is the fact that time cannot be treated as fungible. This is very different to spatial resources like memory, which is rendered fungible by the flexibility of virtual memory systems: a page of memory can be swapped for another page, as long as it has the correct contents. One window of time is not replaceable by another window, even in the case of a best-effort task: a minor change in schedule can force a deadline miss somewhere else. Consider real estate: like time, it is (arbitrarily) divisible but not fungible: if a block is too small to build a house, then having a second, disconnected block of the same size is of no help (unlike spatial resources in a kernel, which can be mapped side-by-side). The implication is that capabilities for time have a different flavour from those for spatial resources: they cannot support hierarchical delegation without loss, and cannot be recursively virtualised. While delegation is an attractive property of spatial capabilities, this delegation is not their defining characteristic, which is actually prima facie evidence of access; in the case of time capabilities, the access is to processor time. Previous implementations all have caveats and limitations, which we now detail.

KeyKOS provided capabilities to meters, which granted the holder the right to execute for the unit of time held by the meter; however, as established in Section 4.3.2, meters are not a suitable mechanism for temporal isolation. EROS [Shapiro et al., 1999] combined processor capacity reserves with capabilities, rather than the meters of KeyKOS. Additionally, however, processor capacity reserves were optional: a two-level scheduler first scheduled the reserves with available capacity, then threads without reserves, or with exhausted reserves, were scheduled. Like any hierarchical scheduling model, this enforces a policy that reduces flexibility. Furthermore, hierarchical delegation has the significant disadvantage of algorithmic utilisation loss [Lackorzyński et al., 2012]; this is a direct result of the unfungible nature of time.

COMPOSITE recently introduced temporal capabilities [Gadepalli et al., 2017], reminiscent of the meters of KeyKOS in that they represent a scalar unit of time. Unlike KeyKOS, temporal capabilities integrate with user-level scheduling, decoupling access control from scheduling. An initial capability, Chronos, provides authority to all time, and is used to provide an initial allocation of time and for further replenishment. The kernel activates user-level schedulers, which must then provide not only a thread to run, but a temporal capability to drain time from. On expiry, the user-level scheduler is also invoked; however, this is a rare occasion, as a well-built scheduler will ensure threads have sufficient capabilities on each scheduling decision. A notion of time quality supports delegation across hierarchically-scheduled subsystems without explicitly mapping all local priorities onto a single, global priority space, although for performance reasons the number of supported delegations is statically limited. Additionally, time capabilities cannot be revoked unless empty, although the time they represent is non-replenishing, so once a time capability is drained it can be revoked. COMPOSITE's mechanism of temporal capabilities is the most promising, and decoupling access control from scheduling is surely key to providing mechanisms for temporal isolation without heavily restricting policy freedom. However, the static limit on delegations and the lack of revocation make the design incomplete for a production system, especially as revocation is generally the hardest part to implement in a capability system.

4.4 Summary

In this chapter we have reviewed standards and specifications for real-time operating systems, commercial and open-source real-time operating systems, and finally surveyed the state-of-the-art research into mechanisms for temporal isolation, resource sharing, and access control to processing time.

While the real-time literature is divided over which real-time scheduling algorithm should be deployed as the core of real-time systems (EDF or FP), no such divide exists in industry, where all OSes provide FP. Except in a pure research sense, kernels that provide EDF do so in addition to FP. Resource reservations and EDF are more common in research

OSes, whilst commercial and open-source products are heavily influenced by ARINC 653 and provide static partitioning.

As we have seen, many existing systems conflate criticality and time sensitivity in a single value: priority. A further assumption is that high-criticality, time-sensitive tasks are always trusted, one that falls apart in the mixed-criticality context. The criticality of a component reflects its importance to the overall system mission. Criticality may reflect the impact of failure [ARINC] or the utility of a function. An MCS should degrade gracefully, with components of lower criticality (which we will for simplicity refer to as LOW components) suffering degradation before higher-criticality (HIGH) components. Time sensitivity refers to how important it is to a thread to get access to the processor at a particular time. For best-effort activities, time is fungible, in that only the amount of time allocated is of relevance. In contrast, for a hard real-time component, time is largely unfungible, in that the allocation has no value if it occurs after the deadline; soft real-time components are in between. Finally, trust refers to the degree of reliance on the correct behaviour of a component. Untrusted components may fail completely without affecting the core system mission, while a component which must be assumed to operate correctly for achieving the overall mission is trusted. A component is trustworthy if it has undergone a process that establishes that it can be trusted, the degree of trustworthiness being a reflection of the rigour of this process (testing, certification, formal verification) [Veríssimo et al., 2003].

In practice, criticality and trust are closely aligned, as the most critical parts should be the most trustworthy. However, criticality must be decoupled from time sensitivity in an MCS. Referring back to the example in the introduction, interrupts from networks or buses have high time sensitivity, but low criticality (i.e. deadline misses are tolerable), while the opposite is true for the flight control component. Similarly, threads (other than the most critical ones, which should have undergone extensive assurance) cannot be trusted to honour their declared WCET. Our claim is that how these attributes are conflated is policy that is specific to a system. We need a mechanism that allows enforcing time limits, and thus isolating the timeliness of critical threads from that of untrusted, less critical ones.

Reservation-based kernels often allow for a form of over-committing where best-effort threads are run in the slack time left by unused reservations or unreserved CPU. However, this also aligns criticality and time-sensitivity, and enforces a two-level scheduling model. If trustworthiness and real-time sensitivity are not conflated, many assumptions about real-time scheduling fail. Much real-time analysis relies on threads having a known WCET, which implies that those threads are predictable, which implies trust. If a real-time thread is not expected to behave correctly, one cannot assume it will surrender access to the processor voluntarily. Consequently, the PIP/bandwidth inheritance (BWI) based scheduling-context donation mechanisms seen in this chapter are insufficient, and only form part of a solution,

as they force a protocol which causes extensive preemption overhead and requires shared servers to have known, bounded execution time on all requests.


5 seL4 Basics

So far we have provided a general background on real-time scheduling and resource sharing. As the final piece of background we now present an overview of the concepts relevant to the temporal behaviour of our implementation platform, seL4. seL4 is a microkernel that is particularly suited to safety-critical, real-time systems, with one major caveat: time is not treated as a first-class resource, and as a result support for temporal isolation is deficient. Three main features support seL4's suitability: it has been formally verified for functional correctness [Klein et al., 2009, 2014], integrity [Sewell et al., 2011] and confidentiality [Murray et al., 2013]; all memory management, including kernel memory, is at user-level [Elkaduwe et al., 2006]; finally, it is the only general-purpose, protected-mode OS to date with a full WCET analysis [Sewell et al., 2016].

In this section we will present the current state of relevant seL4 features in order to highlight deficiencies and motivate our changes. We present the capability system, resource management, and communication, including IPC and the scheduler, followed by an analysis of how the current mechanisms can be used in real-time systems. First we introduce the powerful seL4 capability system, used to access all resources managed by seL4, with the exception of time. The scheduler in seL4 has been left intentionally underspecified [Petters et al., 2012] for later work. The current implementation is a placeholder, and follows the traditional L4 scheduling model [Ruocco, 2006]: a fixed-priority, round-robin scheduler with 256 priorities.

5.1 Capabilities

As a capability-based OS, access to any resource in seL4 is via capabilities (recall Section 4.3.5). Capabilities to all system resources are available to the initial task, the first user-level thread started in the system, which can then allocate resources as appropriate. Capabilities exist in a capability space that can be configured per thread or shared between threads.

Capability spaces (cspaces) are analogous to address spaces for virtual memory: where address spaces map virtual addresses to physical addresses, capability spaces map object identifiers to access rights. cspaces are formed of capability nodes (cnodes), which contain

capabilities, analogous to page tables in virtual memory, and can contain capabilities to further cnodes, which allows for a multi-level cspace structure. A cspace address refers to an individual entry in some cnode in the capability space, and may be empty or contain a capability to a specific kernel resource. For brevity, a cspace address is referred to as a slot.

Each capability has three potential access rights: read, write and grant. How those rights affect the resource the capability provides access to depends on the type of resource, and is explained in the next section. Various operations can be performed on capabilities, summarised in Table 5.1. When a capability is copied or minted, it is said to be derived from the original capability. All capabilities derived from a capability can be deleted by using seL4_CNode_Revoke. There are restrictions on which capabilities can be derived and under what conditions, depending on what the capability provides access to. Badging is a special type of derivation which allows specific capability types to be copied with an unforgeable identifier. We discuss derivation restrictions and the use of badges further in this chapter. Any individual capability can be deleted or revoked: the former simply removes a specific capability from a capability space, while the latter removes all child capabilities.

5.2 System calls and invocations

seL4 has a two-level system call structure, based on capabilities. The first level of system calls, listed in Table 5.2, is distinguished by system call number. The majority of system calls are for communication: seL4_Send, seL4_NBSend, seL4_Call and seL4_Reply are sending system calls, for sending messages; seL4_Recv and seL4_NBRecv are receiving system calls, for receiving messages; finally, seL4_Yield is a scheduling system call for interacting with the scheduler. An NB prefix indicates that the system call will not block.

Operation                Description
seL4_CNode_Copy          Create a new capability in a specified cnode slot, which is an exact copy of the original capability and refers to the same resource.
seL4_CNode_Mint          Like copy, except the new capability may have diminished rights and/or be badged.
seL4_CNode_Move          Move a capability from one slot to another slot, leaving the previous slot empty.
seL4_CNode_Mutate        Like move, except the new capability may have diminished rights and/or be badged.
seL4_CNode_Rotate        Atomically move two capabilities between three specified slots.
seL4_CNode_Delete        Remove a capability from a slot.
seL4_CNode_Revoke        Delete any capabilities derived from this capability.
seL4_CNode_SaveCaller    Save the kernel-generated resume capability into the designated slot.

Table 5.1: Summary of operations on capabilities provided by baseline seL4 [Trustworthy Systems Team, 2017].

System call      Class                Description
seL4_Send        sending              Invoke a capability.
seL4_NBSend      sending              As above, but the invocation cannot result in the caller blocking.
seL4_Recv        receiving            Block on a capability.
seL4_NBRecv      receiving            Poll a capability.
seL4_Reply       sending              Invoke the capability in the current thread's reply capability slot.
seL4_Call        sending, receiving   Invoke a capability and wait for a response; can generate a reply capability.
seL4_ReplyRecv   sending, receiving   seL4_Reply, followed by seL4_Recv.
seL4_Yield       scheduling           Trigger the round-robin scheduler.

Table 5.2: seL4 system call summary. All system calls except seL4_Yield are based on sending and/or receiving messages.

Second-level system calls are called invocations, and are modelled as sending a message to the kernel. All invocations are conducted by a sending system call: the kernel is modelled as if it is waiting for a message, receives one every time a system call is made, and sends a message as a reply. The rest of the arguments to an invocation are encoded as a message to the kernel to determine the operation. Each capability type has a different set of invocations available, and on kernel entry the invoked capability is decoded to determine the action the kernel should take. All the operations on capabilities listed in Table 5.1 are invocations on cnode capability addresses. For example, to copy a capability, one uses seL4_Call on a cnode and provides the invocation code for seL4_CNode_Copy, as well as the arguments: the slot of the capability being copied, in addition to the destination cnode and slot.
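Using the generated libsel4 stubs, the same operation looks as follows; slot numbers and depths are illustrative, and exact argument types differ slightly between seL4 versions:

    #include <sel4/sel4.h>

    /* Sketch only: the stub marshals its arguments into message registers and
     * performs a seL4_Call on the cnode capability, as described above. */
    seL4_Error copy_cap(seL4_CNode cspace_root, seL4_Word dest_slot,
                        seL4_Word src_slot, seL4_Uint8 depth)
    {
        return seL4_CNode_Copy(cspace_root, dest_slot, depth,   /* destination */
                               cspace_root, src_slot, depth,    /* source      */
                               seL4_AllRights);
    }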

5.3 Physical memory management

All kernel memory in seL4 is managed at user-level and accessed via capabilities, which is key to seL4’s isolation and security properties, but also essential for understanding the complexity and limitations of integrating new models into the kernel. Additionally, this allows for the ultimate in policy freedom: all resource allocation is done from user-level by those holding the appropriate capabilities. Capabilities to kernel memory contain a physical address and a type which indicates what type of memory is at that physical address. Options for different types are shown in Table 5.3.

Object               Description
Untyped              Memory that can be retyped into other types of memory, including untyped.
CNode                A fixed-size table of capabilities.
TCB                  A thread of execution in seL4.
Endpoint             Ports which facilitate IPC.
Notification object  Arrays of binary semaphores.
Page Directory       Top-level paging structure.
Page Table           Intermediate paging structures.
Frame                Mappable physical memory.

Table 5.3: Major memory object types in seL4, excluding platform-specific objects. For further detail consult the seL4 manual [Trustworthy Systems Team, 2017].

In the initial system state, capabilities to all resources are given to the first task started by the system, the root task. Then according to system policy the root task can divide up and delegate system resources. This includes capabilities to all memory, apart from the small section of static state (e.g. a pointer to the current thread) used by the kernel. The kernel itself has a large, static kernel window initialised at boot time, which consists of memory mapped such that it is directly writeable by the kernel. The kernel window size is platform specific, but is 500MiB on all 32-bit platforms.

5.3.1 Untyped

All memory starts as untyped memory, and capabilities to all available untyped memory are placed in the cspace of the root task on boot. Each untyped consists of a start address, a size, and a flag indicating whether the untyped is writeable by the kernel or not. Memory reserved for devices and memory outside the kernel window is not readable or writeable by the kernel; the rest is untyped memory, free for use by the system. Untyped objects have only one invocation: retype, which allows large untyped objects to be split into smaller objects of a different size and type, including frames, page tables, cnodes, etc. While the majority of objects in seL4 have a platform-dependent size fixed at compile time, some are sized dynamically at runtime, e.g. untypeds and cnodes, which can be any power-of-two size. Any capability to memory (untyped or not) is a capability to a specific object in memory, containing a pointer to that object. When retype is used to create sub-objects from an untyped object, those subsets of memory will not become available for retyping again until every capability to those objects has been deleted, somewhat like reference-counted pointers.
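A retype invocation, sketched below using the libsel4 stub, splits part of an untyped object into a new endpoint object and places a capability to it in a slot of the invoker's cnode; slot numbers are illustrative:

    #include <sel4/sel4.h>

    /* Sketch only: create one endpoint object from an untyped capability. */
    seL4_Error make_endpoint(seL4_Untyped untyped, seL4_CNode root_cnode,
                             seL4_Word dest_slot)
    {
        return seL4_Untyped_Retype(untyped,
                                   seL4_EndpointObject, /* new object type            */
                                   0,                   /* size_bits (fixed-size)     */
                                   root_cnode,          /* root of destination cspace */
                                   0, 0,                /* node index/depth: use root */
                                   dest_slot,           /* destination slot offset    */
                                   1);                  /* number of objects          */
    }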

Start physical address   End physical address   Kernel virtual address
0x100000                 0x2c00000
0x10000000               0x10010000             0xe0000000
0x105c3000               0x2f000000             0xe05c3000
0x2f106400               0x2fdfc200             0xff106400

Table 5.4: Initial memory ranges at boot time on the SABRE platform.

Table 5.4 shows an example initial memory layout for the SABRE platform¹, which has a 500MiB kernel window. Physical memory on this platform starts at 0x10000000, which is mapped into the kernel address space at 0xe0000000. Physical addresses outside of this range are devices and are not writeable by the kernel. Capabilities to all available memory and devices are set up as untypeds in the initial root task's cnode.

5.3.2 Virtual Memory

Page tables, intermediate paging structures and physical frames are all created by retyping untyped objects. Page tables and frames have a set of architectural invocations including mapping, unmapping, and cache flushing operations.

5.3.3 Thread control blocks

TCBs represent an execution context and manage processor time in seL4, and consist of a base TCB structure and a cnode. The base TCB structure contains accounting information for scheduling and IPC, in addition to the register set and floating-point context. The cnode contains capabilities that should not be deleted while a thread is running, leveraging the fact that an object cannot be truly deleted until all capabilities to it are removed. These capabilities include the top-level cnode, the top-level page directory, and three capabilities for IPC. Table 5.5 shows the main invocations possible on TCB capabilities.
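As an example of the invocations in Table 5.5, the sketch below starts a thread by writing its initial registers and resuming it; register names are architecture specific (shown here for 32-bit ARM) and wrapper signatures vary slightly between seL4 versions:

    #include <sel4/sel4.h>

    /* Sketch only: set a thread's entry point and stack, then place it in
     * the scheduler. */
    seL4_Error start_thread(seL4_TCB tcb, seL4_Word entry, seL4_Word stack_top)
    {
        seL4_UserContext regs = {0};
        regs.pc = entry;       /* initial program counter */
        regs.sp = stack_top;   /* initial stack pointer   */

        seL4_Error err = seL4_TCB_WriteRegisters(tcb, 0 /* do not resume yet */, 0,
                                                 sizeof(regs) / sizeof(seL4_Word),
                                                 &regs);
        if (err != seL4_NoError) {
            return err;
        }
        return seL4_TCB_Resume(tcb);
    }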

5.3.4 Endpoints

Endpoints are the general communication port used by seL4 for IPC. Any thread with a capability to an endpoint can send and receive messages on that endpoint, subject to the access rights. Endpoints are small, and consist of the endpoint badge, some state information and a queue of threads blocked on the endpoint. We cover how endpoints interact with IPC in the upcoming Section 5.5.1.

¹ SABRE is a 32-bit ARM system-on-chip, and the verification platform for seL4. Further details are provided in the evaluation.

5.3.5 Notifications

Notification objects are arrays of binary semaphores used to facilitate asynchronous communication in seL4, either from other threads via seL4_Send or from interrupts. Notification objects consist of a queue of blocked threads and a word of information, which contains the semaphore state. Further information on notifications is presented in Section 5.5.4.
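A minimal sketch of the two sides of a notification, using the system calls named above; the badge of the invoked capability is what ends up OR-ed into the notification word:

    #include <sel4/sel4.h>

    /* Sketch only: signal a notification object. The message payload is
     * ignored; the send merely sets the semaphore bits derived from the
     * capability badge. */
    void signal(seL4_CPtr ntfn_cap)
    {
        seL4_Send(ntfn_cap, seL4_MessageInfo_new(0, 0, 0, 0));
    }

    /* Sketch only: block until the notification word is non-zero, then
     * return and clear it. */
    seL4_Word wait_for_signal(seL4_CPtr ntfn_cap)
    {
        seL4_Word bits = 0;
        seL4_Recv(ntfn_cap, &bits);
        return bits;
    }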

5.3.6 Consequences

User-level management of kernel memory provides true policy freedom, as the kernel imposes no policy on memory layout, but this design is not without trade-offs. Ultimately, user-level management of kernel memory is key to seL4's isolation guarantees, as system designers can fully partition systems by using specific memory layouts and avoid shared structures determined by the kernel itself. However, there are two major impacts on kernel design: back-pointers, and dynamic data structures.

The fact that any capability may be deleted at any time means that any pointer between two memory objects must be doubled, with pointers from each object to the other, as any object must be able to be traversed in order to reach any other linked object. This is analogous to a doubly-linked list, where for O(1) deletion from any node in the list, the list must have pointers in both directions. This increases the performance of deletion but also doubles the memory required for each list node. Secondly, because of the policy freedom, the kernel places no limits on resources, meaning no memory resource can be statically sized. Consequently, simple, static data structures, like arrays, cannot be used in the kernel, as no assumptions on memory layout can be made, meaning dynamic data structures are mandated.

Operation                   Description
seL4_TCB_Resume             Place a thread in the scheduler.
seL4_TCB_Suspend            Remove a thread from the scheduler.
seL4_TCB_WriteRegisters     Configure a thread's execution context.
seL4_TCB_ReadRegisters      Read a thread's execution context.
seL4_TCB_SetAffinity        Set the CPU on which this thread should run.
seL4_TCB_SetIPCBuffer       Set the page to use for the IPC buffer.
seL4_TCB_SetPriority        Set the priority of this TCB.
seL4_TCB_BindNotification   Bind this TCB with a notification object (see Section 5.5.4).

Table 5.5: Summary of operations on TCBs. Further operations are available that batch several setters to reduce thread configuration overheads.

Slot(s)            Contents
0                  empty
1                  initial thread's TCB
2                  this cnode
3                  initial thread's page directory
4                  seL4_IRQControl
5                  seL4_ASIDControl
6                  initial thread's ASID pool
7                  seL4_IOPortControl
8                  seL4_IOSpaceControl
9                  initial thread start-up information frame
10                 initial thread's IPC buffer frame
11                 seL4_DomainControl
12–18              initial thread's paging structures
19–1431            initial thread's frames
1431–1591          untyped
1592–(2^16 − 1)    empty

Table 5.6: Slot layout of the initial cnode set up by the kernel for the root task on the SABRE platform.

Because the kernel cannot make assumptions about the location of memory objects, specific optimisations are also not possible: it is up to the system designer to decide on a trade-off between isolation and efficiency in the memory layout of the system.

Both of these consequences place restrictions on kernel design, which can be demonstrated through the list of TCBs maintained by the kernel scheduler. Each priority in the scheduler has a doubly-linked list of TCBs: although the scheduler itself only ever removes the head of the list, which is O(1) on a singly-linked list, a doubly-linked list is required as TCB objects can be deleted at any time. Memory placement of TCBs impacts scheduler performance, as depending on allocation patterns, different list nodes may trigger cache misses or, worse, cache conflicts. As a result, the scheduler will perform far worse than a static, array-based scheduler using a fixed maximum number of TCBs with known IDs for indexing. Even in other operating systems that use linked lists for scheduling, the source of the memory is a fixed pool of kernel memory, rather than all of memory. However, such an approach provides no policy freedom, and is more suitable at a middleware level in the OS implemented on top of the microkernel.

5.4 Control capabilities

Not all capabilities refer to memory-backed resources: interrupts and I/O ports, for example, are not memory objects. In order to obtain capabilities to specific interrupts or ranges of I/O ports, the root task is provided with non-derivable control capabilities, which can be invoked to place specific hardware-resource capabilities in empty slots. Table 5.6 shows the root task's initial cspace layout as set up by the kernel for the SABRE platform. Apart from capabilities to memory objects for the root task and the remaining untypeds, the initial cnode contains five control capabilities for managing interrupts, address space IDs (ASIDs), I/O ports, I/O spaces and domains.

System call     Action
seL4_Send       Send a message, blocking until it is consumed.
seL4_NBSend     Send a message, but only if it is consumed immediately (i.e. a thread is already waiting on this endpoint for a message).
seL4_Recv       Block until a message is available to be consumed from this endpoint.
seL4_NBRecv     Poll for a message: consume a message from this endpoint, but only if it is available immediately.
seL4_Call       Send a message, and block until a reply message is received.
seL4_Reply      Send a reply message to a seL4_Call.
seL4_ReplyRecv  Send a reply message to a seL4_Call and then seL4_Recv.

Table 5.7: System calls and their effects when used on endpoint capabilities.

We take interrupts as an example to explain how control capabilities function. The seL4_IRQControl capability is the control capability for obtaining capabilities to specific interrupt numbers. By invoking the seL4_IRQControl capability, users can obtain IRQHandler capabilities for specific interrupt numbers, thereby granting them the authority to invoke those handlers. Once an IRQHandler capability is obtained, users can invoke it to manage that specific interrupt. On x86, seL4_IRQControl is also used to provide capabilities to model-specific registers and I/O interrupts.
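Putting this together, the sketch below obtains an IRQHandler capability, arranges for interrupts to be delivered as signals on a notification object, and handles one interrupt; slot numbers are illustrative and signatures follow the baseline manual:

    #include <sel4/sel4.h>

    /* Sketch only: handler_cap must refer to the slot (dest_slot) that the
     * seL4_IRQControl_Get invocation fills. */
    seL4_Error handle_one_irq(seL4_IRQControl irq_control, seL4_Word irq,
                              seL4_CNode root_cnode, seL4_Word dest_slot,
                              seL4_Uint8 depth, seL4_IRQHandler handler_cap,
                              seL4_CPtr ntfn_cap)
    {
        /* Invoke the control capability to place an IRQHandler capability for
         * this interrupt number into dest_slot of the root cnode. */
        seL4_Error err = seL4_IRQControl_Get(irq_control, irq,
                                             root_cnode, dest_slot, depth);
        if (err != seL4_NoError) {
            return err;
        }

        /* Deliver this interrupt as a signal on the notification object. */
        err = seL4_IRQHandler_SetNotification(handler_cap, ntfn_cap);
        if (err != seL4_NoError) {
            return err;
        }

        seL4_Word badge = 0;
        seL4_Recv(ntfn_cap, &badge);             /* wait for the interrupt  */
        /* ... service the device ... */
        return seL4_IRQHandler_Ack(handler_cap); /* re-enable the interrupt */
    }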

5.5 Communication

seL4 provides communication through synchronous IPC via endpoints, or asynchronous notifications via notification objects. All the communication system calls introduced in Table 5.2, when used on endpoints and notifications, are used to communicate between TCBs. There are no invocations that can take place on endpoint or notification capabilities.

5.5.1 IPC

IPC in seL4 consists of threads sending and receiving messages in a blocking or non-blocking fashion over endpoints, which act as message ports. Each thread has a buffer (referred to as the IPC buffer), which contains the payload of the message, consisting of data and capabilities. Senders specify a message length and the kernel copies this (bounded) amount between the sender and receiver IPC buffers. Small messages are sent in registers and do not require a copy operation. Along with the message, the kernel additionally delivers the badge of the endpoint capability that the sender invoked to send the message. IPC can be one-way, where a single message is sent between a sender and receiver, or two-way in an RPC fashion, where the sender sends a message and expects a reply. IPC rendezvous refers to the point at which the IPC takes place, specifically when the kernel transfers data and capabilities between two threads.

Figure 5.1: State diagram of a single endpoint, where blocked tracks the number of threads waiting to send or receive. Note that only one list of threads is maintained by the endpoint: senders and receivers cannot be queued at the same time.

Table 5.7 summarises seL4 system calls when used on endpoint capabilities. Essentially, seL4_Send is used to send a message and seL4_Recv used to receive one, both having blocking and non-blocking variants. seL4_Call is of particular interest as it represents the caller side of an RPC operation critical to microkernel performance, distinguished from a basic pair of seL4_Send and seL4_Recv by resume capabilities. Pioneered in KeyKOS [Bomberger et al., 1992] and further appearing in EROS [Shapiro et al., 1999], resume capabilities are single-use capabilities, generated when a message is sent by seL4_Call and consumed when the reply message is received. Resume capabilities grant access to a blocked thread waiting for a reply message, and allow holders to resume that thread. In seL4, the resume capability is stored in the TCB cnode, and seL4_Reply is the operation used to invoke this capability and send the reply message back.

Figure 5.3a demonstrates the rendezvous phase: regardless of the order of operations, once one thread blocks on the endpoint (seL4_Recv) and another thread sends on that endpoint, the message is consumed by the receiver. This occurs for both one-way and two-way IPC. By default the resume capability is installed in the TCB cnode; receivers can instead save it into their cspace so that they can reply later, after receiving further messages that would otherwise overwrite the resume slot. The reply system call directly invokes the resume capability in this slot.
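To make the RPC pattern concrete, the following is a minimal sketch of a client and server exchanging a single-word request and reply over an endpoint, assuming the standard (pre-MCS) libsel4 bindings for the system calls in Table 5.7; capability setup, error handling and the server's actual work are omitted.

#include <sel4/sel4.h>

/* Client: one round of RPC over endpoint `ep`. The message fits in
 * registers, so the fastpath conditions of Section 5.5.2 can apply. */
seL4_Word rpc_request(seL4_CPtr ep, seL4_Word arg)
{
    seL4_MessageInfo_t info = seL4_MessageInfo_new(0, 0, 0, 1);
    seL4_SetMR(0, arg);
    info = seL4_Call(ep, info);      /* blocks until the server replies */
    return seL4_GetMR(0);
}

/* Server: receive a request, then combine each reply with the next
 * receive using seL4_ReplyRecv. */
void rpc_server(seL4_CPtr ep)
{
    seL4_Word badge;
    seL4_MessageInfo_t info = seL4_Recv(ep, &badge);
    for (;;) {
        seL4_Word result = seL4_GetMR(0) + 1;   /* placeholder work */
        seL4_SetMR(0, result);
        info = seL4_MessageInfo_new(0, 0, 0, 1);
        info = seL4_ReplyRecv(ep, info, &badge);
    }
}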

[Figure 5.2 (diagram): legend symbols for a TCB (active), pointer, IPC endpoint, blocking, reply, IPC, notification object and signal.]
Figure 5.2: Legend for diagrams in this section.

[Figure 5.3 (diagram): (a) initial IPC rendezvous between TCBs A and S over endpoint E; (b) reply phase via the resume capability R.]
Figure 5.3: IPC phases: (a) a TCB, A, sends a message to endpoint E using seL4_Call, while another TCB, S, blocks on E using seL4_Recv; at this point the message is transferred from A to S and A is blocked on the resume capability. (b) shows the reply phase, where S uses seL4_Reply to send a reply message to A, waking A. See Figure 5.2 for the legend.

[Figure 5.4 (diagram): (a) TCB A waits directly on notification object N; (b) A waits on endpoint E with N bound to A.]
Figure 5.4: Example of a thread, TCB A, receiving notifications, by blocking on the notification and by notification binding. In Figure 5.4a, A blocks waiting on notification object N, and wakes when any notifications or interrupts are sent to N. In Figure 5.4b, A blocks on endpoint E; however, since N is bound to A, if N receives an interrupt or notification, A is woken and the data word is delivered to it. See Figure 5.2 for the legend.

Multiple senders and receivers can use the same endpoint, which acts as a FIFO queue. In order to distinguish senders, receivers can use endpoint badges, which are unforgeable as they are copied by the kernel directly.
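The following sketch shows how a server might distinguish clients by badge; it assumes each client was handed a copy of the endpoint capability minted with a distinct badge (e.g. via seL4_CNode_Mint, whose exact signature varies between seL4 versions), and that clients use one-way seL4_Send so no reply is owed.

#include <sel4/sel4.h>

/* The kernel delivers the badge of the invoked endpoint capability
 * with every message, so the server can attribute requests without
 * trusting the message contents. */
void badged_server(seL4_CPtr ep)
{
    seL4_Word badge;
    for (;;) {
        seL4_MessageInfo_t info = seL4_Recv(ep, &badge);
        (void) info;
        switch (badge) {
        case 1:  /* request from client 1 */ break;
        case 2:  /* request from client 2 */ break;
        default: /* unknown sender: drop the request */ break;
        }
    }
}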

5.5.2 Fastpath

Recall from Section 2.4.1 that IPC performance is critical to a microkernel; high performance is achieved through optimised fastpaths. A fastpath is an optimised code path for the most performance-critical operations. seL4 contains two IPC fastpaths, which are used when the following common-case conditions are satisfied:

1. the sender is using seL4_Call or the receiver is using seL4_ReplyRecv,
2. there are no threads of higher priority than the receiver in the scheduler,
3. the receiver has a valid address space and has not faulted, and
4. the message fits into registers, and thus does not require access to the IPC buffer.

5.5.3 Fault handling

Fault handling in seL4 is modelled as IPC messages between the kernel and fault handlers. Each TCB can have a specific fault endpoint registered, on which the kernel sends simulated IPC messages containing information about a fault. A fault-handling thread receives such messages on this endpoint as if the faulting thread had sent a message with seL4_Call. Of course, the message is actually constructed by the kernel, and contains information about the fault, generally the faulting thread's registers. The faulting thread is blocked on a resume capability, which is generated just as if the faulting thread had performed a seL4_Call. The fault-handling thread can subsequently reply to this message to resume the thread, although the reply message is also special: it can supply a new set of registers for the faulting thread to be resumed with, and tell the kernel whether it should restart the thread or leave it suspended after the reply message is processed. If no fault endpoint is present, the thread is rendered inactive and no fault message is sent.
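A minimal fault-handler loop might look as follows; this is a sketch assuming the standard libsel4 bindings, and the message layout (fault address, instruction pointer, etc.) depends on the fault type and architecture.

#include <sel4/sel4.h>

/* The kernel synthesises the fault message as if the faulting thread
 * had performed seL4_Call on the fault endpoint, so the handler can
 * reply through the resulting resume capability. */
void fault_handler(seL4_CPtr fault_ep)
{
    seL4_Word badge;
    for (;;) {
        seL4_MessageInfo_t info = seL4_Recv(fault_ep, &badge);
        seL4_Word fault_type = seL4_MessageInfo_get_label(info);
        (void) fault_type;
        /* ... inspect the fault type and message registers, fix the
         * problem (e.g. map a page), then resume the faulting thread ... */
        seL4_Reply(seL4_MessageInfo_new(0, 0, 0, 0));
    }
}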

5.5.4 Notifications

Notification objects provide the mechanism for semaphores in seL4, and consist of a word of data. Sending on a notification either delivers the badge of the invoked notification capability to a thread waiting on that notification object or, if no threads are waiting, stores the badge in the data word of the notification object. Unlike IPC, where threads waiting to send messages are queued, notifications accumulate signals in the data word: the word is set when the first notification arrives, and further signals are bitwise-ORed into it until a thread receives the accumulated word and clears it. This is illustrated in the notification state diagram depicted in Figure 5.5, and in Figure 5.4a, which shows the notification object and TCB interaction.
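The accumulation semantics can be seen in the following sketch of a signaller and waiter sharing a notification object; it assumes the libsel4 convenience wrappers seL4_Signal and seL4_Wait, and that each signaller holds a copy of the notification capability badged with a distinct bit.

#include <sel4/sel4.h>

void signaller(seL4_CPtr badged_ntfn)
{
    seL4_Signal(badged_ntfn);          /* never blocks */
}

void waiter(seL4_CPtr ntfn)
{
    seL4_Word bits;
    for (;;) {
        seL4_Wait(ntfn, &bits);        /* blocks until the data word is non-zero */
        /* bits is the OR of the badges of all signals received since the
         * last wait; the kernel clears the word on delivery. */
        for (unsigned i = 0; i < seL4_WordBits; i++) {
            if (bits & ((seL4_Word) 1 << i)) {
                /* handle event i */
            }
        }
    }
}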

[Figure 5.5 (diagram): notification states Idle, Waiting and Active, with notify/wait transitions, a blocked counter, and the data word accumulating badges.]
Figure 5.5: Notification object state transitions based on invocations. seL4_Send and seL4_NBSend correspond to notify in the diagram, while wait corresponds to a seL4_Recv.

Table 5.8 shows the operations that occur when notification objects are invoked with relevant system calls. Figure 5.5 depicts changes in the notification object state that occur when threads notify and block on notifications.

Interrupts

In addition to providing a mechanism for threads to notify each other, notification objects also allow threads to synchronise with devices via polling and/or blocking for interrupts. IRQHandler capabilities can be associated with a single notification object via the invocation seL4_IRQHandler_SetNotification, which results in the kernel issuing a notification when an interrupt occurs.

System call    Action

seL4_Send      Send a notification, transmitting the badge; do not block.
seL4_NBSend    As above.
seL4_Recv      Wait until a notification is available, then receive the data word.
seL4_NBRecv    Poll for a notification; do not block, receive the data word if available.

Table 5.8: System calls and their effects when used on notification objects.

The badge of the notification capability provided to the invocation is bitwise-ORed with the data word when an interrupt is triggered. Further notifications are not issued by the kernel until the interrupt is acknowledged, using the seL4_IRQHandler_Ack invocation on IRQHandler capabilities.
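A typical driver loop might combine these invocations as follows; this is a sketch assuming the standard libsel4 bindings, and it assumes the IRQHandler capability has already been obtained via seL4_IRQControl and that error handling is elided.

#include <sel4/sel4.h>

void irq_loop(seL4_IRQHandler handler, seL4_CPtr ntfn)
{
    /* Bind the interrupt to the notification object. */
    seL4_Error err = seL4_IRQHandler_SetNotification(handler, ntfn);
    (void) err;   /* assume seL4_NoError */

    seL4_Word badge;
    for (;;) {
        seL4_Wait(ntfn, &badge);         /* wake on interrupt */
        /* ... service the device ... */
        seL4_IRQHandler_Ack(handler);    /* allow further interrupts */
    }
}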

Notification binding

Some systems require threads that can receive both notifications and IPC while blocked, to avoid forcing services that receive both IPC messages and notifications to be multi-threaded. The mechanism for this is notification binding, where a thread can register a specific notification object from which to receive notifications while blocked on an endpoint waiting for IPC. This is done by invoking the TCB with the seL4_TCB_BindNotification invocation, which establishes a link between a TCB and a notification object. Subsequently, if a notification is sent on that notification object while the TCB is receiving on any endpoint, the TCB will receive the notification. Without notification binding, services require one thread to block on a notification and another to block on an endpoint, and both threads must then synchronise carefully on any shared data. Notification binding is illustrated in Figure 5.4b.

5.6 Scheduling

The scheduler in seL4 is used to pick which runnable thread should run next on a specific processing core, and is a priority-based round-robin scheduler with 256 priorities (0–255). At a scheduling decision, the kernel chooses the head of the highest-priority, non-empty list. Implementation-wise, the scheduler is fairly standard and consists of an array of lists: one list of ready threads for each priority level. A two-level bit field is used to track which priority lists contain threads, in order to achieve a scheduling decision complexity of O(1). In fact, the O(1) scheduler is a relatively recent addition to seL4: the bit field was initially omitted for the benefit of simplicity in the initial proofs, making scheduling far more expensive. The bit field data structure works as follows: the top level consists of one word, each bit representing N priorities, where N is 32 or 64 depending on the word size. On a 32-bit system, bit 0 in the top-level bit field represents priorities 0–31, bit 1 priorities 32–63, and so on: if a bit in the top level is set, it indicates that at least one priority in that range is active. The second level distinguishes which priorities in the range indicated by the first level are set, and consists of 256/N words, indexed by the first bit set in the top-level bit field. To construct a priority, we use the hardware instruction count leading zeros (CLZ) to find the highest bit set in the top-level word. We take that index and use CLZ on the corresponding bottom-level word, then construct the priority from these two values, allowing the highest priority to be found by reading only two words.

#include <stdint.h>

typedef uint32_t word_t;
#define CLZ(x) __builtin_clz(x)   /* hardware count-leading-zeros instruction */

/* One top-level word, each bit covering 32 priorities, and
 * 256/32 = 8 bottom-level words with one bit per priority. */
uint32_t top_level_bitmap = 0;
uint32_t bottom_level_bitmap[256 / 32];

void addToBitmap(word_t prio) {
    uint32_t l1index = prio >> 5u;                         /* which bottom-level word */
    top_level_bitmap |= 1u << l1index;
    bottom_level_bitmap[l1index] |= 1u << (prio & 0x1fu);  /* which bit in that word */
}

word_t getHighestPrio(void) {
    uint32_t l1index = 31u - CLZ(top_level_bitmap);
    uint32_t l2index = 31u - CLZ(bottom_level_bitmap[l1index]);
    return (l1index << 5u) | l2index;
}

Listing 5.1: Example algorithms for adding a priority to the scheduler bitmap and extracting the highest active priority, on a 32-bit system. Both operations are O(1) and involve two memory accesses. CLZ is the hardware instruction for count leading zeros.

For example, on a 32-bit system, if bit 1 is the highest bit set in the top-level bit field, we index entry 1 in the bottom level. If bit 5 is the highest bit set in that entry, we obtain 1 × 32 + 5 = 37 as the highest priority. Listing 5.1 shows the logic for the scheduler bit field on 32-bit systems.
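The worked example above can be checked directly against Listing 5.1; the following small test (compiled and linked together with the listing) is a sketch, with the prototypes repeated here for self-containment.

#include <assert.h>
#include <stdint.h>

typedef uint32_t word_t;
void addToBitmap(word_t prio);
word_t getHighestPrio(void);

int main(void)
{
    addToBitmap(12);
    addToBitmap(37);
    assert(getHighestPrio() == 37);   /* 1 * 32 + 5 */
    return 0;
}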

5.6.1 Scheduler optimisations

A scheduling decision needs to be made whenever a thread transitions from or to a runnable state. The majority of thread state transitions occur on IPC, which is critical for performance, as can be seen in the simplified thread state diagram shown in Figure 5.6. The BlockedOnObject state in the diagram corresponds to the BlockedOn* states in the following thread-state definitions:

• Running: This thread is eligible to be picked by the scheduler, and should be in the scheduling queues or be the currently running thread.

• Inactive: This thread is not runnable, and is not in the scheduler. It has been suspended, or possibly never resumed.

• BlockedOnRecv: This thread is waiting to receive an IPC (or bound notification).

• BlockedOnSend: This thread is waiting to send an IPC.

• BlockedOnReply: This thread is blocked on a resume capability, waiting for a reply from a seL4_Call or fault.

• BlockedOnNotification: This thread is waiting to receive a notification.

Several optimisations to the scheduler exist in the kernel for performance, including lazy and Benno scheduling. Lazy scheduling is used whenever a thread blocks and wakes another thread at the same time, for example in the seL4_Call system call, where the sender is blocked on the resume capability and the receiver is woken. Lazy scheduling observes that the current thread must be the highest-priority runnable thread, and checks whether the woken thread has a higher priority than any other runnable thread by querying the scheduler bit field. If so, the kernel switches directly to the receiver and the scheduler is avoided altogether. Otherwise, the receiver is moved from the endpoint queue into the scheduling queue and the scheduler is called. Benno scheduling, named for its inventor Ben Leslie, removes the current thread from the scheduler, changing the invariant that all runnable threads are in the scheduler to: all runnable threads except the current thread are in the scheduler [Blackham et al., 2012a]. The current thread is only added back to the scheduler when it is preempted. On IPC, the current thread does not need to be added to the scheduler, as it is immediately blocked on the endpoint. Consequently, we avoid manipulating the scheduling queues when doing a direct switch, reducing the cache footprint and improving performance. The result of these optimisations is that seL4, like L4 kernels before it, has very fast IPC compared to non-microkernels, e.g. at least a factor of four compared to CertiKOS [Gu et al., 2016].

[Figure 5.6 (diagram): thread states Running, BlockedOnObject and Inactive, with send()/recv(), Resume and Suspend transitions.]

Figure 5.6: Thread state transitions in seL4.

Round-robin

Kernel time is accounted for in fixed-time quanta referred to as ticks, and each TCB has a timeslice field which represents the number of ticks that TCB is eligible to execute before being preempted. The kernel timer driver is configured to fire a periodic interrupt which marks each tick, and when the timeslice is exhausted the thread is appended to the relevant scheduler queue with a replenished timeslice. Threads can surrender their remaining timeslice using the seL4_Yield system call, which simulates timeslice exhaustion.

This served as a simple scheduler for the initial verification of seL4. seL4 implements timeslice donation, as discussed in Section 4.3.4: when threads communicate with a service over IPC, the TCB of the thread providing the service is charged for any ticks incurred while that service is active. As a result, isolation in shared servers is not possible: clients are not charged for their execution time in the server.
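The behaviour just described can be summarised in the following sketch; the names are hypothetical and the code is illustrative, not kernel source.

/* A periodic timer interrupt charges the current TCB one tick; on
 * exhaustion the thread is appended to its priority's ready queue
 * with a fresh timeslice. */
#define TIMESLICE_TICKS 5u

struct tcb {
    unsigned timeslice;   /* remaining ticks in the current timeslice */
    unsigned priority;
};

extern struct tcb *current;
extern void append_to_ready_queue(struct tcb *t, unsigned prio);
extern void reschedule(void);

void timer_tick(void)
{
    if (current->timeslice > 0) {
        current->timeslice--;
    }
    if (current->timeslice == 0) {
        current->timeslice = TIMESLICE_TICKS;             /* replenish */
        append_to_ready_queue(current, current->priority);
        reschedule();                                     /* pick the next thread */
    }
}

/* seL4_Yield simulates timeslice exhaustion for the calling thread. */
void handle_yield(void)
{
    current->timeslice = 0;
    timer_tick();
}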

Priorities

Like any priority-based kernel without temporal isolation mechanisms, time is only guaranteed to the highest-priority threads. Priorities in seL4 act as informal capabilities: threads cannot create threads at priorities higher than their own, but can create threads at the same or lower priorities. If threads at higher priority levels never block, lower-priority threads in the system will not run. As a result, a single thread running at the highest priority has access to 100% of processing time.

5.6.2 Domain scheduler

In order to provide confidentiality [Murray et al., 2013], seL4 provides a top-level hierarchical scheduler which provides static, cyclical scheduling of scheduling partitions known as domains. This is a classical approach from avionics [Lee et al., 1998] which was later analysed for schedulability under both EDF and FP [Mok et al., 2001]. Domains are statically configured at compile time with a cyclic schedule, and are non-preemptible, resulting in completely deterministic scheduling of domains. Each domain has its own priority scheduler, with lists per priority and a bit field structure, which is switched deterministically when a domain's ticks expire. Threads are assigned to domains, and threads are only scheduled when their domain is active. Cross-domain IPC is delayed until a domain switch, and seL4_Yield between domains is not possible. When there are no threads to run while a domain is scheduled, a domain-specific idle thread runs until a switch occurs. The domain scheduler is consistent with that specified by the ARINC standard, and can be leveraged to achieve temporal isolation. However, since domains cannot be preempted, it is only useful for cyclic, non-preemptive scheduling with scheduling order and parameters computed offline. In such a scenario, each real-time task could be mapped to its own domain, and each task would run for its specified time before the domain scheduler switched to the next. Any unused time in a domain is wasted, spent in the idle thread. Such a scheduler is only suitable for closed systems, and only efficient in terms of utilisation for systems with periodic tasks. Dynamic real-time applications with shared resources, sporadic tasks and high utilisation requirements are not compatible with the domain scheduler, as they require preemption.

                       Temporal    Sporadic      Low kernel   Dynamic
                       isolation   utilisation   overheads

Domain scheduler       yes         no            yes          no
Priority scheduler     no          yes           yes          yes
Scheduling component   yes         yes           no           yes

Table 5.9: Comparison of existing seL4 scheduling options.

5.6.3 Real-time support

We have introduced basic seL4 concepts and terminology, and investigated the mechanisms that affect timing behaviour in the kernel: the scheduler, domains, priorities, seL4_Yield and IPC. In this section we look at how real-time scheduling could be implemented with those mechanisms. There are several options for implementing a real-time scheduler in the current version of seL4: leveraging the domain scheduler, using the priority scheduler, or implementing a scheduling component that runs at user level. We examine each by explaining how a fixed-priority scheduler could be implemented with that technique, and explore the limitations. A basic cyclic executive, as introduced in Section 2.2.1, can be implemented using the domain scheduler by mapping domains directly to task frames. This approach only works for closed systems with offline scheduling; a high-utilisation scheduler for sporadic tasks cannot be implemented with domains, as they are not preemptible. The priority scheduler can be leveraged to implement FPRM by mapping seL4 priorities to rate-monotonic priorities. However, this requires complete trust in every thread in the system, as there is no mechanism for temporal isolation: if one thread executes for too long, other threads will miss their deadlines. Essentially the only thread with a guaranteed CPU allocation is the highest-priority thread, which under rate-monotonic priority assignment is not the most critical thread in the system, but the thread with the highest rate. Additionally, periodic threads driven by timer interrupts rather than events would need to share a user-level timer, as the kernel does not provide a source of time. To build a dynamic system with temporal isolation and high CPU utilisation, one could provide a high-priority scheduling component which implements FPRM at user level. This component would manage a time service for timeouts and monitor task execution, preempting tasks before they exceed their declared WCET. On task completion, tasks would RPC the scheduler in order to dispatch a new task. However, since the timer component would need to maintain accounting information and track the currently running thread, it would need to be invoked for every single scheduling operation. This is prohibitively expensive, as it doubles context-switching time and increases the number of system calls required for thread management.

Table 5.9 shows a comparison of all the available scheduling options in the current version of seL4: no option provides all the qualities we require.

5.7 Summary

In this chapter we have provided an overview of seL4 concepts, including capabilities, system calls, resource management, communication and scheduling. We conclude that current real-time scheduling support is deficient, and temporal isolation is not possible over shared-server IPC. In the next chapter we outline our model for a more principled approach to managing time by extending the baseline seL4 model presented in this chapter. We appeal to resource kernel principles, where time is treated as a first-class resource, with the aim of supporting diverse task sets, including those of mixed-criticality systems.

6 Design & Model

We now present our design and model for mixed-criticality scheduling support in a high-assurance system such as seL4. Our goal is to provide support in the kernel for mixed-criticality workloads. This involves supporting tasks of different time sensitivities (HRT, SRT, best-effort), different criticalities, and different levels of trust. Such tasks should not be forced into total isolation, but be permitted to share resources without violating their temporal correctness properties, through mechanisms provided by the kernel. To achieve this we require temporal isolation: a feature of resource kernels, the mechanisms of which we apply to general-purpose, high-assurance microkernels, and specifically our research platform, seL4, notable for its strong and provable spatial isolation. However, temporal isolation alone is insufficient: mixed-criticality systems require asymmetric protection in addition to temporal isolation. As a result we leverage traditional resource kernel reservations but decouple them from priority, allowing the processor to be overcommitted while providing guarantees for the highest-priority tasks. In this chapter we first address how we integrate resource kernel concepts with the seL4 model, in order to provide principled access control to CPU processing time. We then describe our mechanisms for temporal isolation and asymmetric protection of resources shared between clients of different levels of criticality, time sensitivity, and trust. Finally, we show how our mechanisms can be used to build several existing user-level policies. Our design goals are as follows:

Capability-controlled enforcement of time limits: In general, capabilities help to reason about access rights. Furthermore, they allow a seamless integration with the existing capability-based spatial access control of security-oriented systems such as seL4.

Policy freedom: In line with microkernel philosophy [Heiser and Elphinstone, 2016], the model should not force systems into a particular resource-management policy unnecessarily. In particular, it should support a wide range of scheduling policies and resource-sharing models, such as locking protocols.

Efficient: The model should add minimal overhead to the best existing implementations. In particular, it should be compatible with fast IPC implementations in high-performance microkernels.

Temporal isolation: The model must allow system designers to create systems in which a failure in one component cannot cause a temporal failure in another part of the system.

Overcommitment: The model must allow systems to be specified that are not necessarily schedulable, as the degree and nature of overcommitment is a core policy of a particular system, as established in Section 4.3.3. Overcommitment is also key to providing asymmetric protection, where high criticality tasks can disrupt low criticality tasks, and policy freedom, as schedulability is system-specific policy.

Safe resource sharing: Temporal isolation should persist even in the case of shared re- sources, to provide mechanisms for sharing between applications with different levels of time-sensitivity, criticality and trust.

The model provides temporal isolation mechanisms from the kernel, while allowing for more complex scheduling policies to be implemented at user level.

6.1 Scheduling

In Section 2.4.5 we outlined the four core resource kernel mechanisms (admission, scheduling, enforcement and accounting) that are essential to resource kernels for implementing temporal isolation. However, previous resource kernels are all monolithic: all policy, drivers and mechanisms are provided by the kernel itself. Microkernels like seL4 follow a different design philosophy, based on the principle of minimality [Liedtke, 1995], where mechanisms are only included in the kernel if they would otherwise prevent the implementation of required functionality. The goals of resource kernels therefore do not directly align with those of microkernels: microkernels do not directly manage all resources in the system, but provide mechanisms for the system designer to implement custom resource-management policies. In seL4, mechanisms for physical memory, device memory, interrupt and I/O port management are exposed to the user via the capability system, as outlined in Chapter 5. As a result, the only resource for which the kernel needs to provide reservations is processing time. We now present our mechanisms, and discuss how each of the four resource kernel policies can be implemented within our model.

6.1.1 Scheduling

There are two design choices relevant to scheduling:

• Should a scheduler be provided in the kernel at all?
• Should the scheduling algorithm use fixed (FP) or dynamic (EDF) priorities?

Kernel scheduling

We retain the scheduler in the kernel, as per previous iterations of L4 microkernels and unlike COMPOSITE, for two reasons: to maintain a small trusted computing base, and for performance. Any system with multiple threads of execution, a requirement of mixed-criticality systems, must have a scheduler, which for the purposes of temporal isolation is part of the trusted computing base. Mixed-criticality systems require non-cooperative scheduling, where the kernel preempts and interrupts components which may be of different assurance levels. The scheduler itself is trusted, and must be at the highest assurance level, in a separate protection domain from components of lower criticality. As a result, a user-level scheduling decision requires at least two context switches for any component that is not at the highest criticality level: from the preempted thread to the scheduler, and from the scheduler to the dispatched thread. Although COMPOSITE has demonstrated this overhead to be small, by providing very fast context switches and avoiding the scheduler on the device-interrupt handling path, it is an overhead that is not mandatory when using our in-kernel scheduler. Given that a mixed-criticality system requires a trusted scheduler, providing a basic scheduler in the kernel obviates this overhead and forms part of a complete trusted computing base. Additionally, as the scheduler is a core component of the system, it must be verified: by keeping the scheduler in the kernel we maintain the current verification. Verification of a user-level scheduler and its interaction with the kernel is a far more complex task, especially as verification of concurrent systems remains very much an open research challenge.

Fixed priorities

Our model uses FP scheduling as a core part of the kernel, with the addition of mechanisms that allow for the efficient implementation of user-level schedulers. We choose FP over EDF for three reasons: fixed-priority scheduling is dominant in industry, as shown in Section 4.2.3; dynamic scheduling policies such as EDF can be implemented at user level; and FP has desirable, easy-to-reason-about behaviour on overcommitted systems. EDF scheduling can be implemented by using a single priority for the dynamic priorities of EDF, as we will demonstrate in Section 8.4.2. However, the opposite is not true: mapping the dynamic priorities of EDF to fixed priorities is non-trivial and would come with high overheads. Our approach is consistent with existing designs in Linux and Ada [Burns and Wellings, 2007], which support both scheduling algorithms, usually with EDF at a specific priority.1 Schedulability analysis of EDF-within-priorities is well understood [Harbour and Palencia, 2003].

1In Linux, EDF is a scheduling class placed above the SCHED_FIFO class in priority.

We do not consider Pfair scheduling (recall Section 3.1.1) an option within the kernel, as its high interrupt overheads are not suitable for hard real-time systems. It should, however, be possible to implement a Pfair scheduler at user level. The final reason to base the approach on fixed priorities is the ability to reason about the behaviour of an overcommitted system. Overcommitting is important for achieving high actual system utilisation, given the large time buffers required by critical hard real-time threads. It is also essential to keeping the kernel policy-free: the degree and nature of overcommitment is a core policy of a particular system. For example, the policy might require that the total utilisation of all HIGH threads is below the FPRM schedulability limit of 69%, while LOW threads can overcommit, and the degree of overcommitment may depend on the mix of hard RT, soft RT and best-effort threads. Such policy should be defined and implemented at user level rather than in the kernel. As discussed in Section 2.2.4, the result of overload in an EDF-based system is hard to predict, and such a system is hard to analyse. In contrast, overload under fixed-priority scheduling is easy to understand, as low-priority jobs miss deadlines in the case of overload.

6.1.2 Scheduling contexts

At the core of the model is the scheduling context (SC), the fundamental abstraction for time allocation and the basis of the enforcement and accounting mechanisms in our model. An SC is a representation of a reservation in the object-capability system, which means that SCs are first-class objects, like threads, address spaces, or IPC endpoints. In order to run, a thread needs an SC, which represents the maximum CPU bandwidth the thread can consume. Threads obtain SCs through binding, and only one thread can be bound to an SC at a time. In a multicore system, an SC represents the right to access a particular core. Core migration, e.g. for load balancing, is policy that should not be imposed by the kernel but implemented at user level. A thread is migrated by replacing its SC with one tied to a different core. This renders the kernel scheduler a partitioned scheduler, as opposed to a global scheduler, which aligns with our efficiency goal, since partitioned schedulers outperform global schedulers [Brandenburg, 2011]. The non-fungible nature of time in real-time systems requires that the bandwidth limit be enforced within a certain time window. We achieve this by representing an SC by sporadic task parameters: a period, T, and a budget, C, where C ≤ T is the maximum amount of time the SC allows to be consumed in the period. U = C/T represents the maximum CPU utilisation the SC allows. The SC can be viewed as a generalisation of the concept of a time slice, discussed in Section 4.3.2. In order to support mixed-criticality systems, we do not change the meaning of priority, but what it means for a thread to be runnable: we associate each thread with an SC, and make it non-runnable if it has exhausted its budget. SCs can be gracefully integrated into the existing model used in seL4 by replacing the time slice attribute with the scheduling context.
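The following sketch illustrates the SC parameters and the utilisation they imply; the type and field names are hypothetical and chosen only for illustration.

/* Scheduling-context parameters as described above. */
typedef unsigned long long time_us;

struct sched_context_params {
    time_us budget;   /* C: maximum execution time per period */
    time_us period;   /* T: length of the replenishment window */
};

/* U = C / T, the maximum CPU utilisation the SC permits.
 * A valid SC requires C <= T; C == T is a "full" SC. */
double sc_utilisation(struct sched_context_params sc)
{
    return (double) sc.budget / (double) sc.period;
}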

We retain thread priorities as attributes of the TCB, rather than of the scheduling context. Keeping the two orthogonal allows us to avoid mandating a specific prioritisation protocol for resource sharing, which we expand on in Section 6.2. Unlike prior approaches, where a reservation forms part of a two-level scheduler in the kernel, our model only allows one thread to use a scheduling context at a time. Prior implementations of reservation-based kernels allow for multiple threads per reservation, such that the kernel first picks a reservation to schedule, then picks a thread from that reservation. We depart from this model for simplicity and performance reasons, and to abide by the microkernel minimality principle: user level can build cooperative or preemptive schedulers using a single scheduling context, via IPC, and so this feature is not required in the kernel. Additionally, for use cases where multiple real-time threads share a scheduling context, user-level threads can be used to multiplex multiple execution contexts onto one scheduling context.

6.1.3 Accounting

Accounting must be a mechanism implemented by the kernel, as the kernel is the only entity in the system that can monitor all threads, regardless of preemption, since the kernel facilitates all preemption. The accounting mechanism is provided by a new system invariant: the currently running thread must have a scheduling context with available budget, and all time consumed is billed to the current scheduling context. That includes time spent executing in the kernel, including preemptions. The rule for accounting kernel time is that all time, from kernel entry, is billed to the scheduling context active on kernel exit. This means that if a thread is preempted and that preemption does not trigger a switch to a different scheduling context, all time is accounted to the current scheduling context; otherwise, time from kernel entry is accounted to the next scheduling context. In order to facilitate user-level schedulers, the kernel tracks the amount of time consumed by a scheduling context, which can be retrieved by an invocation on that scheduling context object. We specifically do not cater for dynamic frequency scaling and ensure that it is turned off during testing; this is out of scope and a topic for future work.

6.1.4 Admission control

Unlike any previous kernel that supports reservations, our model delegates admission control to user level, as it is deeply tied to policy, which a microkernel should not limit. A consequence of this choice is that the design naturally supports over-commitment, as this is part of the admission test. The mechanisms for admission control consist of two parts: scheduling contexts are created without any budget at all, and a new control capability must be invoked to populate scheduling context parameters. seL4_SchedControl is the new control capability, one of which is provided per processing core. It grants authority to 100% of time on that core, thus

providing the admission-control privilege. This is analogous to how seL4 controls the right to receive interrupts, which is governed by the seL4_IRQControl capability introduced in Section 5.4. Like time, IRQ sources are non-fungible. Unlike the reservations in resource kernels, the scheduling context does not act as a lower bound on the CPU bandwidth that a thread can consume. This, combined with user-level admission control, is also key to allowing system designers to over-commit the system. By designating admission tests as user-level policy, we allow system designers complete freedom in determining which admission test to use, if any, and when that test should be done. Consequently, admission tests can be run dynamically at run time, or offline, as per user-level policy. The kernel places no restriction on the creation of reservations apart from minor integrity checks (i.e. C ≤ T). For example, some high-assurance systems may sacrifice utilisation for safety with a very basic but easily verifiable, online admission test. Other implementations may conduct complex admission tests offline in order to obtain the highest possible utilisation, using algorithms that are not feasible at run time. Some systems may require dynamic admission tests that sacrifice utilisation or carry increased risk. Basic systems may require only a simple break-up of processing time into rate-limited reservations. By taking the admission test out of the kernel, all of these extremes (and hybrids thereof) become optional policy for the user. A consequence of this design is that more reservations can be made than there is processing time available. This is a desirable feature: it allows system designers to overcommit the system, while the scheduling mechanisms provided by the kernel guarantee that the most important tasks get their allocations, provided the priorities of the system are set correctly. However, allowing any thread in the system to create reservations could result in overload behaviour and violation of temporal isolation. To prevent this, admission control is restricted to the task(s) that hold the scheduling control capability for each processing core.
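As one example of such user-level policy, the following is a sketch of an online admission test using the Liu and Layland rate-monotonic utilisation bound mentioned earlier (roughly 69% for large task counts); other systems might substitute an exact response-time analysis or skip the test entirely. All names here are hypothetical.

#include <stddef.h>
#include <stdbool.h>
#include <math.h>

struct reservation {
    double budget;   /* C */
    double period;   /* T */
};

bool admit_rm(const struct reservation *rs, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (rs[i].budget > rs[i].period) {
            return false;                 /* the kernel would reject C > T anyway */
        }
        total += rs[i].budget / rs[i].period;
    }
    /* Liu & Layland bound: n * (2^(1/n) - 1). */
    double bound = (double) n * (pow(2.0, 1.0 / (double) n) - 1.0);
    return total <= bound;
}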

6.1.5 Replenishment

Scheduling contexts can be full or partial. A thread with a full SC, where C = T, may monopolise whatever CPU bandwidth is left over from higher-priority threads, while high-priority threads with full SCs may monopolise the entire system. Partial SCs, where C < T, are not runnable once they have used up their budget until it is replenished, and form our mechanism for temporal isolation. Full SCs are key to maintaining policy freedom and performance: while systems must be able to use our mechanisms to enforce upper bounds on execution, the usage of those mechanisms is policy. If a task is truly trusted, no enforcement is required, as in standard HRT systems. Full SCs can also be used for best-effort threads which run in slack time. The C of a full SC represents the timeslice, which, once expired, results in round-robin scheduling

at that priority. Additionally, full SCs provide legacy compatibility: setting C = T equal to the previous timeslice value results in the same behaviour as the master kernel. From a performance perspective, full SCs avoid the mandatory preemption overheads that derive from forcing all threads in a system to have a reservation: threads with a full budget incur no inherent overhead other than the preemption rate 1/T. Threads with partial SCs have a limited budget, which forms an upper bound on their execution. For replenishment, we use the sporadic-server model, as described in Section 3.1.3, with an implementation based on the algorithms presented by Stanovic et al. [2010]. The combination of full and partial SCs and the ability to overcommit distinguishes our model from that of resource kernels.

Sporadic servers

We use sporadic servers, as they provide a mechanism for isolation without requiring the kernel to have access to all threads in the system. This is unlike other approaches discussed in Section 3.1.3, such as priority-exchange servers and slack stealing. Deferrable servers do not require global state but are ruled out due to the back-to-back problem, which prevents the kernel from imposing a maximum execution bandwidth, as discussed in Section 3.1.3. Avoiding global shared state is required for confidentiality, because access to shared state has cache effects, which leak information via cache-based timing channels. In order to maintain these isolation properties, timing channels must be mitigated through separate partitions on the same processor: it must be possible to partition a system completely, such that operations in one partition do not alter the cache state of other partitions. Recall that sporadic servers work by preserving the sliding-window constraint, meaning that during any time interval not exceeding the period, no more than C can be consumed. This stops a thread from saving budget until near the end of its period and then running uninterrupted for more than C. It is achieved by tracking the budget consumed when a thread is resumed at time t_a and preempted at time t_p, and scheduling a replenishment of that amount for time t_a + T. In practice, we cannot track an infinite number of replenishments, so in a real implementation, once the number of queued replenishments exceeds a threshold, any excess budget is discarded. If the threshold is one, the behaviour degrades to that of polling servers [Sprunt et al., 1989], where any unused budget is lost and the thread cannot run until the start of the next period. Replenishment fragmentation resulting from preemptions has an obvious cost, and an arbitrarily high threshold makes little sense. Additionally, polling servers are more efficient in the case of frequent preemption [Li et al., 2014]. The optimal value depends on implementation details of the system, as well as the characteristics of the underlying hardware. We therefore make the threshold an attribute of the SC. SCs are variably sized, such that system designers can set this bound per SC. This is a generalisation

of the approach used in Quest-V [Danish et al., 2011], where I/O reservations are polling servers and other reservations are sporadic servers; this policy can easily be implemented at user level with variably-sized SCs. The choice of sporadic servers is a concession to our goal of policy freedom: ultimately, temporal isolation is a more important requirement for mixed-criticality systems and overrules the policy-freedom requirement. While our model does not prevent other policies from being implemented at user level, these do come with a performance overhead, as they require full user-level scheduling, with context switches to and from the user-level scheduler.
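The sliding-window rule can be summarised in the following simplified sketch; the actual seL4 refill algorithm differs in detail, the names are hypothetical, and times are abstract ticks.

#include <stddef.h>

#define MAX_REFILLS 8u   /* per-SC threshold; 1 degrades to a polling server */

struct refill { unsigned long when; unsigned long amount; };

struct sporadic_sc {
    unsigned long period;                 /* T */
    struct refill  refills[MAX_REFILLS];  /* ordered by 'when' */
    size_t         count;
};

/* The thread ran from t_start to t_end on the head refill: charge the
 * usage now and schedule its return one period after t_start, so no
 * window of length T ever sees more than C consumed. */
void charge(struct sporadic_sc *sc, unsigned long t_start, unsigned long t_end)
{
    unsigned long used = t_end - t_start;
    sc->refills[0].amount -= used;        /* assume the head refill covers it */
    if (sc->count < MAX_REFILLS) {
        sc->refills[sc->count++] = (struct refill) {
            .when = t_start + sc->period, .amount = used
        };
    } else {
        /* threshold exceeded: the excess budget is simply discarded */
    }
}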

6.1.6 Enforcement

Threads may exhaust their budgets for different reasons. A budget may be used to rate-limit a best-effort thread, in which case budget overrun is no different from the normal time-slice preemption of best-effort systems. A budget can be used to force an untrusted thread to adhere to its declared WCET; such an overrun is a contract violation, which may be reason to suspend the thread or restart its subsystem. Finally, an overrun by a critical thread can indicate an emergency situation; for example, critical threads may be scheduled with an optimistic budget to provide better service to less critical threads, and overrun may require provision of an emergency budget or a specific exception handler. Clearly, the handling of overrun is a system-specific policy, and the kernel should only provide appropriate mechanisms for implementing the desired policy. Unlike the replenishment policy, which must be part of the kernel in order to enforce a maximum bandwidth on execution, the policy for handling overrun has no requirement to be in the kernel. Our core mechanism is the timeout exception, which is raised when a thread is preempted due to budget expiry. To allow the system to handle these exceptions, each thread is optionally associated with a timeout-exception handler, which is the temporal equivalent of a handler for a (spatial) protection exception. When a thread is preempted in this way, the kernel notifies its handler via an IPC message. The exception is ignored when the thread has no timeout-exception handler, and the thread can continue to run once its budget is replenished. The timeout-fault endpoint is separate from the existing thread fault endpoint, as the semantics are different: non-timeout faults are not recoverable without the action of another thread. While threads are not required to have a fault endpoint, when a fault occurs a thread is always blocked, as it is no longer in a runnable state. Timeout faults, on the other hand, are recoverable: the thread must simply wait until its budget is replenished. Timeout-fault handling threads should have their own SC to run on, with enough budget to handle the fault, otherwise they will not be eligible for selection by the kernel scheduler. Similar to a page-fault handler, timeout-fault handlers can be used to adjust thread parameters dynamically, as may be required for an SRT system, or to raise an error. The handler has a choice of a range of overrun policies, including:

• providing a one-off (emergency) budget to the thread and letting it continue,
• permanently increasing the budget in the thread's SC, and
• killing/suspending the thread.

Obviously, these are all subject to the handler having sufficient authority (e.g. seL4_SchedControl). If a replenishment is ready at the time the budget expires, the thread is immediately runnable. It is inserted at the end of the ready queue for its priority, meaning that within a priority, scheduling of runnable threads is round-robin. The timeout-fault handler is similar to the original KeyKOS concept of meters [Bomberger et al., 1992], which was carried through to earlier L4 microkernels as a preempter. The preempter was invoked every time the timeslice/meter expired to make a scheduling decision; this was abandoned for performance reasons. Our approach differs in that timeout-fault handlers are optional, so their use can be restricted to exceptional scenarios. However, timeout-fault handlers can also be used to facilitate user-level scheduling, which is feasible now that modern hardware has much faster context-switch times.
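To illustrate one of the policies listed above, the following is a sketch of a timeout-fault handler loop that grants a one-off emergency budget to high-criticality clients and suspends everything else. The wrapper functions are hypothetical stand-ins for the model's mechanisms, not a concrete kernel API.

#include <stdbool.h>

struct timeout_fault { unsigned long thread_id; bool high_criticality; };

/* Hypothetical wrappers around the mechanisms described in the text. */
extern struct timeout_fault wait_for_timeout_fault(void);    /* blocks on the fault EP */
extern void grant_emergency_budget(unsigned long thread_id); /* requires SchedControl */
extern void suspend_thread(unsigned long thread_id);
extern void reply_and_resume(unsigned long thread_id);

void timeout_handler_loop(void)
{
    for (;;) {
        struct timeout_fault f = wait_for_timeout_fault();
        if (f.high_criticality) {
            grant_emergency_budget(f.thread_id);   /* one-off top-up */
            reply_and_resume(f.thread_id);
        } else {
            suspend_thread(f.thread_id);           /* contract violation */
        }
    }
}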

6.1.7 Priority assignment

Assignment of priorities to threads is user-level policy. One approach is to simply use rate-monotonic scheduling, where priorities are assigned to threads based on their period, and threads use scheduling contexts that match their sporadic task parameters. Each thread in the system is then temporally isolated, as the kernel will not permit it to exceed the processing-time reservation that the scheduling context represents. However, the system we have designed offers far more options than simple rate-monotonic fixed-priority scheduling. Policy freedom is retained, since reservations simply grant a potential right to processing time at a particular priority; what reservations actually represent is an upper bound on processing time for a particular thread. Low-priority threads are not guaranteed to run if reservations at higher priorities use all available CPU. However, threads with reservations at low priorities will run in the system slack time, which occurs when threads do not use their entire reservation. The implication is that a system could use a high range of priorities for rate-monotonic threads, while best-effort and rate-limited threads run at lower priorities. Another alternative is to map user-level schedulers to specific priorities, for instance an EDF scheduler.

6.1.8 Asymmetric Protection

Recall that in a mixed-criticality system, asymmetric protection means that tasks with higher criticality can cause deadline misses in lower-criticality tasks. Two approaches to mixed-criticality scheduling that support asymmetric protection are slack scheduling and Vestal-style mixed-criticality scheduling [Burns and Davis, 2017; Vestal, 2007].

Under slack scheduling, low-criticality tasks run in the slack time of high-criticality tasks. Our model supports this easily: high-criticality tasks are given reservations to all processing time at high priorities, and low-criticality tasks are given reservations in a lower priority band. Note that without intervention from a user-level scheduler, more sophisticated slack-scheduling schemes, where slack time is used before running a high-criticality task, cannot be conducted by the kernel. Vestal's model relies on suspending or de-prioritising low-criticality tasks if a high-criticality task runs for longer than expected. Simple schemes involve two system modes, a HI mode and a LO mode, although they have been generalised to more [Fleming and Burns, 2013]. In LO mode, high-criticality tasks run with smaller reservations, and the remaining CPU time is used for low-criticality tasks. If a high-criticality task does not complete before its LO-mode reservation is exhausted, the system switches to HI mode: all low-criticality tasks are suspended. Such a criticality switch is also supported by our model, at user level: a high-priority scheduling thread can be set up to receive timeout faults when a task does not complete before its budget expires. On a timeout fault, the scheduling thread can modify the scheduling parameters of tasks appropriately, for example by boosting the high-criticality task's priority and budget.
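The following sketch shows how such a LO-to-HI mode switch might look inside a user-level scheduling thread; all names and the placeholder budget value are hypothetical, and the point is only that the switch is policy layered on the kernel's timeout-exception mechanism.

#include <stddef.h>
#include <stdbool.h>

#define HI_BUDGET_US 10000u   /* placeholder pessimistic HI-mode budget */

enum mode { MODE_LO, MODE_HI };

struct task { unsigned long id; bool high_criticality; };

extern void set_budget(unsigned long id, unsigned long budget_us);
extern void suspend_thread(unsigned long id);

static enum mode current_mode = MODE_LO;

void on_timeout_fault(struct task *tasks, size_t n, unsigned long faulting_id)
{
    /* Only a HI task exhausting its optimistic LO-mode budget triggers
     * the mode switch; LO-task overruns are handled elsewhere. */
    bool faulting_is_hi = false;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].id == faulting_id && tasks[i].high_criticality) {
            faulting_is_hi = true;
        }
    }
    if (current_mode == MODE_LO && faulting_is_hi) {
        current_mode = MODE_HI;
        for (size_t i = 0; i < n; i++) {
            if (tasks[i].high_criticality) {
                set_budget(tasks[i].id, HI_BUDGET_US);
            } else {
                suspend_thread(tasks[i].id);
            }
        }
    }
}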

6.1.9 Summary

The scheduling, accounting and enforcement mechanisms presented are sufficient to support temporally isolated, fixed-priority or EDF-scheduled real-time threads. By keeping admission control out of the kernel we preserve policy freedom, except in the case of the replenishment policy, which must be in the kernel to allow the enforcement of an upper bound on execution. The mechanisms provided allow a full resource kernel to be built, as well as other types of system which require the ability to over-commit. Additionally, we have maintained support for best-effort threads, and have added the ability to provide asymmetric protection for mixed-criticality workloads. In the next section, we show how our model provides mechanisms for resource sharing.

6.2 Resource sharing

Reservations through the mechanism of scheduling contexts are not sufficient alone to provide resource sharing between tasks of different time-sensitivity, criticality, and trust. We now present how the scheduling-context mechanism facilitates resource sharing policies in terms of the three core mechanisms—prioritisation, charging and enforcement—presented in Section 2.4.5. In this section we consider only resources shared by resource servers in an RPC style, in which servers are scheduled directly as a result of the RPC. Other forms of resource sharing, such as the asynchronous multicast provided by ARINC 653 (Section 4.1.2), are

possible but not relevant, as these exist within partitions without altering timing behaviour, and we have already shown how temporal partitioning is possible with reservations. RPC-style shared servers are fundamentally trusted entities: the client does not resume until the server replies to a request, and the server may not reply. Additionally, the client requires the server to carry out the operation requested. Consequently, the server takes on the ceiling of the requirements of its clients. If a server is shared between trusted and untrusted clients, the server must be trusted as much as the most trusted client. If a server is shared between HRT and best-effort clients, the server must have the same time sensitivity as the HRT clients: a defined WCET and bounded blocking behaviour. Finally, the server must be at the highest criticality level of all its clients; however, the opposite is not true. If a server has sufficient isolation properties, then the clients of that server are not also promoted to the highest criticality level, which as we know would greatly increase the certification burden. This asymmetry also holds for trust: a trusted server need not trust its clients, and the clients need not trust each other. Trusted scenarios do not require this level of encapsulation: user-level threads packages can implement locking protocols with libraries such as pthreads. Therefore, our focus is on temporal isolation in shared servers beyond strict partitioning, where clients do not trust each other, and the server does not trust the clients. The proposed model supports shared servers, including scheduler bypass, through scheduling-context donation: a client invoking a server can pass its SC along, so the server executes on the client's SC until it replies to the request. This ensures that the time consumed by the server is billed to the client on whose behalf the server's work is done. If the scheduling context expires, the enforcement mechanism of timeout exceptions can be used to recover the server according to server-specific policy.

6.2.1 Prioritisation

We indicated in Section 6.1.2 that scheduling contexts are separate from thread control blocks in order to support resource sharing. Additionally, unlike previous microkernel designs for SCs, priority remains an attribute of the execution context. Decoupling priority and scheduling context avoids prior patterns where priority inheritance is enforced by the kernel. The IPCP (introduced in Section 2.3.3) maps neatly to this model: if server threads are assigned the ceiling priority of all of their clients, then when the IPC rendezvous occurs and we switch from client to server, the server's priority is used, a consequence of decoupling priority from the scheduling context. The IPCP is also practical: it is mandated by the AUTOSAR standard (recall Section 4.1.3) and is therefore well understood by parts of the industry. The main drawback of the IPCP, namely the requirement that all lockers' priorities are known a priori, is easy to enforce in a capability-based system: the server can only be accessed through an appropriate invocation capability, and it is up to the system designer to

ensure that such a capability can only go to a thread whose priority is known or appropriately controlled. The choice of the IPCP over the other fixed-priority protocols, the PIP and the OPCP, is intentional, and we now present the case against both protocols. In order to properly support real-time systems, our model orders endpoint queues by priority, rather than the traditional FIFO order of L4 kernels; threads of the same priority remain FIFO.

PIP

The major factor in ruling out the PIP (recall Section 2.3.2) as a mechanism provided by the microkernel is performance. Introducing the PIP would complicate scheduler operations on the critical IPC fastpath, as priorities would need to be restored on the reply path. One could take after Real-time Mach and provide PIP as an optional flag on a message port, thereby retaining policy freedom, but at a performance cost due to the increased branching required on IPC and the tracking required to restore inherited priorities. Mandating the PIP, which incurs a high preemption overhead compared to other protocols, would violate both our performance and policy-freedom requirements. There is an additional, practical concern, in that the PIP does not map well to the existing capability system. Recall from Section 5.5.1 that endpoints have minimal state,2 and once an IPC rendezvous has occurred, the caller is blocked on the resume capability. Consequently, there is no way to reach the caller after an IPC message has been sent. Significant changes would be required to the seL4 IPC model to ensure that threads blocked on resume capabilities could be tracked from the endpoint that facilitated the IPC rendezvous. Prior kernels which implement efficient PIP over IPC, including Fiasco and Nova (recall Section 4.3.4), did not use endpoints but sent IPC directly between threads, which could then be used to track inheritance chains. A key design feature of seL4 endpoints is that they are stateless, which enables strong isolation properties between communicating threads, but prevents their use for tracking priority inheritance.

6.2.2 OPCP

Although the OPCP (Section 2.3.4) offers greater processor utilisation than the IPCP, it would require the kernel to track a system ceiling and also boost priorities when threads access resources, because in the OPCP a task's priority is only raised when another task attempts to lock the same resource. Additionally, tasks can only lock resources if their dynamic priority is higher than the ceilings of all other resources, requiring the kernel to identify and track resources and their states. Both of these mechanisms are required to implement the OPCP, and both would violate confidentiality by altering state across partitioned resources.

2Which again contributes to high performance.

[Figure 6.1 (diagram): legend symbols for active and passive TCBs, pointer, blocking, IPC endpoint, scheduling context, IPC, signal, reply and notification object.]
Figure 6.1: Legend for diagrams in this chapter, an expanded version of Figure 5.2

[Figure 6.2 (diagram): (a) client A calls passive server S over endpoint E, donating SCA; (b) S replies via resume capability R, returning SCA.]
Figure 6.2: IPC phases between an active client and passive server: (a) shows the initial IPC rendezvous, (b) shows the reply phase. The client's SCA is donated between client and server. See Figure 6.1 for the legend.

This is the same reason we exclude priority-exchange and slack-stealing servers, as discussed in Section 6.1.5. Although not provided by the kernel, the PIP and OPCP can be implemented at user level if required, by proxying requests to shared resources via an endpoint and thread. The proxy then manipulates the priorities appropriately before forwarding the request. For PIP, only one extra thread per resource would be required, and it could exist in the same address space as the resource. Since the OPCP requires global state, an implementation would require a shared service for all threads sharing a set of resources. We describe such an implementation in Section 6.4, after introducing further mechanisms in Section 6.3.

6.2.3 Charging

Charging for time executed in a server is simple in our model: the scheduling context that the resource server is running on is charged. The question is, which scheduling context should be used by the resource server? Unlike previous implementations of donation-like semantics, our model does not mandate scheduling context donation. Resource servers are free to run on their own scheduling context, according to the policy of the system.

[Figure 6.3 (diagram): (a) client A calls active server S over endpoint E, each thread with its own SC; (b) S replies via resume capability R.]
Figure 6.3: IPC phases between an active client and active server: (a) shows the initial IPC rendezvous, (b) shows the reply phase. Both client and server have their own SC. See Figure 6.1 for the legend.

Whether donation occurs or not is inferred at the time of the IPC rendezvous: we test whether the server is passive or active. Passive servers do not have a scheduling context and receive one over IPC, whereas active servers have their own scheduling context. Passive servers, as illustrated in Figure 6.2, provide a model similar to the migrating-thread model [Ford and Lepreau, 1994; Gabber et al., 1999], although revocation can decouple clients and servers mid-IPC. Figure 6.3 shows active servers, which allow system designers to build systems without temporal isolation, something that is not suitable for all servers.
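From the server's point of view, a passive server is an ordinary receive/reply loop; the donation of the client's SC is handled entirely by the kernel. The following is a sketch using the pre-MCS style bindings introduced in Chapter 5 purely for illustration; the implementation chapter describes the actual interface.

#include <sel4/sel4.h>

/* Passive server: the thread has no SC of its own and runs on the SC
 * that arrives with each request, returning it on reply. */
void passive_server(seL4_CPtr ep)
{
    seL4_Word badge;
    seL4_MessageInfo_t info = seL4_Recv(ep, &badge);
    for (;;) {
        /* ... perform the request on the client's donated SC ... */
        info = seL4_MessageInfo_new(0, 0, 0, 0);
        info = seL4_ReplyRecv(ep, info, &badge);
    }
}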

6.2.4 Enforcement

If a scheduling context is exhausted while in use by a passive server, the server and any waiting clients are blocked until replenishment. On its own, this means that a client not only has to trust its server, but all of the server's other clients. This would rule out sharing a server between clients of different criticality. In a mutually trusting HRT system, we can assume that a client's reservations are sufficient to complete requests to servers. However, in systems with untrusted HRT, best-effort and soft real-time tasks, no such assumption can be made, and client budgets may expire during a server request. This leaves the server in a state where it cannot take new requests, as it is stuck without an active reservation to complete the previous request. Without a mechanism to handle this event, the server, and any potential clients, would be blocked until the client's budget is replenished. Timeout exceptions can be used to remove this need for trust and allow a server to be shared across criticalities, as depicted in Figure 6.4. Timeout-fault endpoints are specific to the execution context of a thread, not the scheduling context, and threads servicing timeout faults must have a scheduling context of their own (in order to service the request). Consequently, servers may have timeout-fault handlers while clients may not. The server's timeout handler can, for example, provide an emergency budget to continue the operation (useful for HI clients), or abort the operation and reset or roll back the server.


option is attractive for minimising the amount of budget that must be reserved for such cases. A server running out of budget constitutes a protocol violation by the client, and it makes sense to penalise that client by aborting. Helping schemes that complete the client's request, as per PIP, make the waiting client pay for another client's contract violation. This not only weakens temporal isolation, it also implies that the size of the required reserve budget must be a server's full WCET. This places a restriction on the server that no client request exceeds a blocking time that can be tolerated by all clients, or that all clients must budget for the full server WCET in addition to the time the server needs for serving their own request. Our model provides more flexibility: a server can use a timeout exception to finish a request early (e.g. by aborting), meaning that clients can only be blocked by the server's WCET (plus the short clean-up time).

Figure 6.4: Example of a timeout exception handler. The passive server S performs a request on client A's scheduling context, SCA. A timeout fault is generated when SCA is exhausted and sent to the server's timeout fault endpoint E2, and the timeout handler receives it. See Figure 6.1 for the legend.

6.2.5 Multiprocessors

The mechanism of scheduling-context donation can also be used across processing cores, and allows users to easily specify whether servers should migrate to the core of the client, or process the request on a remote core. The mechanism is simple: if a server is passive, it migrates to the core for which the donated scheduling context provides time. This allows both remote and local bandwidth inheritance, as defined by Hohmuth and Peter [2001] for in-kernel synchronisation primitives in the SMP Fiasco kernel, to be used for IPC at user-level.

6.3 Mechanisms

Before exploring how various policies can be implemented at user-level using our model, we describe four new mechanisms introduced for a high-performance implementation of those policies.

6.3.1 Directed yield

The first is simple: in order to implement user-level schedulers, the kernel's scheduling queues must be able to be manipulated. For this, we add an operation on scheduling contexts: a directed yield to a specified scheduling context. By using the seL4_SchedContext_YieldTo invocation on a specific scheduling context, the thread running on that scheduling context is placed at the head of the scheduler queue for its priority. This may change the currently running thread if it is at the same priority, and only affects the scheduler if there is a runnable thread bound to the invoked scheduling context. The invocation has no effect if the thread bound to the scheduling context is blocked, whether due to budget expiry or because it is waiting for an event. Thus threads cannot yield to threads of higher priority than their own maximum controlled priority.

6.3.2 IPC Forwarding

We introduce another concept called forwarding, which allows callers to invoke one capability and block on another capability in the same system call, via a new system call, seL4_NBSendRecv. This is required to facilitate forwarding IPC, which allows a message, capability and scheduling context to be sent to another thread without requiring it to be returned via the same pathway (similar to a forwarded email, where the receiver of the forwarded email can reply directly to the original sender). IPC forwarding is illustrated in Figure 6.5.

6.3.3 Flexible Resume Capabilities

The third mechanism we introduce is the ability to transfer resume capabilities between servers. This means that the reply channel of a blocked thread can be forwarded to other threads by moving the resume capability. Note that this is explicitly not delegation: resume capabilities cannot be copied.

6.3.4 Notification Binding

Recall from Section 5.5.4 that notification objects can be bound to threads such that single-threaded servers can be constructed which receive IPC messages and signals. We extend this mechanism with passive-server support, by allowing scheduling contexts to be bound to notifications, such that the passive server executes using the notification's scheduling context when processing interrupts or signals from other threads.

6.4 Policies

We now describe how different policies can be implemented by user-level systems using the existing mechanisms in seL4, combined with the new mechanisms of scheduling

(a) A uses seL4_Call() to call S1 over E1.

(b) S1 uses seL4_NBSendRecv() to forward the request, and A's resume capability, to E2, and waits for further messages on E1.

(c) S2 processes the request.

(d) S2 uses the seL4_ReplyRecv() system call to reply to A and block on E2 again.

Figure 6.5: IPC forwarding: client A sends a request to passive server S2 via a passive proxy S1, and the scheduling context passes from A → S1 → S2 → A. See Figure 6.1 for the legend.

contexts, scheduling context donation, directed yield, IPC forwarding, and flexible resume capabilities. First, we explain how different best-effort, rate-based and real-time tasks are compatible with our model and describe their implementation. We then explain how various resource-sharing policies and servers, with and without temporal isolation, can be constructed.

6.4.1 Scheduling policies

Best-effort threads

Best-effort threads, scheduled round-robin, are compatible with our model, which maintains compatibility with the previous, baseline seL4 scheduler. Full scheduling contexts are the mechanism to support best-effort threads: by setting the budget equal to the period, system designers can set a timeslice, which determines how long a specific scheduling context can execute before preemption. More complicated, time-sharing schedulers can be built by a user-level scheduler, using the directed-yield and consumed mechanisms, which allow user level to reorder the kernel's priority queues and to query the amount of time consumed by a specific scheduling context.
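As a minimal sketch of how such a user-level scheduler might drive these mechanisms, the following fragment yields to a chosen scheduling context and accumulates the time it has consumed. The array scs, the consumed bookkeeping, and the assumption that the libsel4 stub returns the consumed time in a result structure are illustrative, not part of the kernel API description above.

#include <sel4/sel4.h>
#include <stdint.h>

#define NUM_TASKS 4

static seL4_CPtr scs[NUM_TASKS];       /* scheduling-context caps, filled in at start-up (assumed) */
static uint64_t  consumed[NUM_TASKS];  /* time observed per task */

/* One scheduling decision: push the chosen thread to the head of its
 * priority's queue and record how much time it used since the last query. */
void schedule_task(int next)
{
    seL4_SchedContext_YieldTo_t res = seL4_SchedContext_YieldTo(scs[next]);
    if (res.error == seL4_NoError) {
        consumed[next] += res.consumed;
    }
}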

Rate-limited threads

Rate-limited threads simply have their scheduling contexts configured with parameters that express the desired upper bound on rate. No other work is required by the user: if there are no higher-priority threads and the rate-limited thread does not block, it will run at the rate expressed in the scheduling context. Otherwise, the thread is still capped at the specified rate, but cannot be guaranteed to receive the entire rate allocation if there are higher-priority threads in the system, or if the rate-limited thread's priority level is overloaded.

Periodic, polling threads

Threads that need to wake, poll for an event, and sleep again can be implemented using the sporadic server mechanism, by setting threads' scheduling parameters appropriately. As long as the budget is not full (C ≠ T), the seL4_Yield system call can be used to block until the next replenishment is available, as shown in Listing 6.1. However, this approach only works if the number of thread preemptions is known and the number of extra sporadic replenishments is set correctly, as seL4_Yield sleeps until the next available replenishment. If fragmentation occurs due to unpredicted preemptions or suspensions and an incorrect number of sporadic replenishments, the polling thread will not wake at the start of every period. This could be ameliorated by checking the time on wake-up, or by using a notification object configured to receive periodic signals from a timer service. The latter approach can also be used to set up time-triggered threads, in order to wake at exactly the right time.

for (;;) {
    seL4_Word badge;
    seL4_Poll(notification_object, &badge);
    doJob();
    // sleep until next job is ready
    seL4_Yield();
}

Listing 6.1: Example of a polling task on seL4.

for (;;) {
    doJob();
    // sleep until next job is ready
    seL4_Word badge;
    seL4_Wait(notification_object, &badge);
}

Listing 6.2: Example of a basic sporadic task on seL4.


Figure 6.6: Example of proxying requests via a server that manipulates priorities before resource access; see Figure 6.1 for the legend.

Sporadic threads

Recall that sporadic threads wake due to events such as interrupts, then process the event and go back to sleep. To constrain sporadic threads' execution times, they are given a minimum inter-arrival time. Sporadic threads can be built using notifications combined with scheduling contexts, as shown in Listing 6.2. In this case, if the number of sporadic replenishments is not sufficient, the processing bandwidth may be artificially limited by excessive preemptions; otherwise it will be C/T.

6.4.2 Resource sharing policies

PIP/BWI

Figure 6.6 shows an example of a resource server (S2) to which IPC messages are proxied via another server (S1) that manipulates priorities according to some protocol before and after requests. The policy works as follows: clients are given a badged capability to an endpoint,

E1, on which to make requests on the resource server. S1 receives requests, identifies the client via the badge, and manipulates S2's priority to match the caller. S1 has multiple responsibilities: it caches pending messages for S2, and it manipulates priorities. While S2 is processing requests, S1 continues to receive messages over E1, stores them in a buffer,3 and bumps S2's priority each time a pending request arrives. Once S2 is finished with a request, it uses IPC forwarding to send a message on E1 that it is ready for more work, and returns to wait on E2. S1 then invokes the resume capability, which returns the scheduling context to the client, and sends the reply. Finally, S1 can forward another pending request to S2, or itself wait for further requests on E1. Note that for this method to work, S1 must run at the highest priority, and acts as a critical section for manipulating priorities in the system. Our description so far can be used to implement PIP or OPCP, depending on the logic used by the proxy server S1; this also determines whether S1 and S2 are active or not. In order to extend this protocol to bandwidth inheritance, S1 could additionally act as a timeout-fault handler and bind the pending client's scheduling context to the server to finish the request. In order to handle nested requests, which are the most complex part of any PIP implementation, the same server thread S1 would need to be used for all locking operations, essentially becoming a locking server.
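To make the shape of S1 concrete, the following sketch shows the core of its loop, using the system-call forms introduced in Chapter 7. The helper prio_of() (mapping a client badge to that client's priority), the buffering of pending requests, and the transfer of the client's resume capability are elided and purely illustrative.

#include <sel4/sel4.h>

extern seL4_Word prio_of(seL4_Word badge);   /* illustrative: badge -> client priority */

void proxy_s1(seL4_CPtr e1, seL4_CPtr e2, seL4_CPtr s2_tcb,
              seL4_CPtr own_tcb, seL4_CPtr reply)
{
    seL4_Word badge;
    seL4_MessageInfo_t info = seL4_Recv(e1, &badge, reply);
    for (;;) {
        /* raise S2 to the requesting client's priority, using our own TCB
         * as the authority for the MCP check */
        seL4_TCB_SetPriority(s2_tcb, own_tcb, prio_of(badge));
        /* forward the request (and, in a full implementation, the client's
         * resume capability) to S2 over E2, then wait for more work on E1 */
        info = seL4_NBSendRecv(e2, info, e1, &badge, reply);
    }
}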

Best-effort Shared Server

Best-effort systems have no timing requirements, so each thread in the system has a full SC with a budget and period the length of the timeslice. Threads are scheduled round-robin. To implement such a system with our mechanisms, all server threads would be configured as active, and set to the ceiling priority of their clients. This policy does not provide temporal isolation, as the server executes on its own budget. If one client launches a denial-of-service attack on the server, depleting the server's budget, then other clients are starved. While our resource-sharing mechanisms do not rule out this policy, it is only suitable for systems where all clients are trusted and the number of requests each client makes is known a priori, or for systems that have low temporal sensitivity.

3We assume a uniprocessor in this example, so pending requests are always the highest priority.

Migrating threads

COMPOSITE [Parmer, 2010] addresses resource sharing by using migrating threads (also termed stack-migrating IPC). On every IPC, the client's execution context (and CPU reservation) transfers to a new stack running in the server's protection domain. In this section we explain how a migrating-thread policy could be implemented with the model introduced in this chapter. To implement migrating threads, COMPOSITE requires that every server have a mechanism for allocating stacks; if no memory is available to allocate a stack, the request is blocked. This solution forces servers to be multi-threaded, which is problematic for verification of server code due to the state-space explosion created by interleaving. A stack-spawning policy can be implemented using the mechanisms we have provided, similar to the PIP implementation described above (Section 6.4.2), except the proxy server spawns new worker threads instead of manipulating priorities. The stack-spawning proxy would also run at a lower priority than the worker threads, such that it is only activated when all worker threads are blocked. An alternative policy is to build systems with a fixed-size pool of worker threads, all waiting on the same endpoint. Like the best-effort policy, our model supports this approach but does not require it.

Blocking Servers

Some servers need to block in order to function: for example, servers providing I/O may need to wait until an operation completes before replying to the client. Consider a disk server, where clients make requests to the server to write data to the disk. The server makes the request to the disk, but does not reply to the client until the device has signalled that the I/O is complete. Rather than blocking until the driver responds, which would prevent the server from taking on further requests, the server can block on its endpoint and wait for either the interrupt or further RPC requests. If an interrupt comes in, the server runs on the bound notification object's scheduling context to complete the request. Another option is to use multiple threads, with a passive thread handling requests and an active thread servicing interrupts; however, this requires synchronisation between the threads within the server.
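A sketch of such a single-threaded server follows, using the badge to distinguish client requests from the completion signal delivered via the bound notification, and the system-call forms introduced in Chapter 7. IRQ_BADGE, next_reply(), pending_reply(), save_request(), start_io() and finish_io() are illustrative helpers, and the management of a small pool of resume objects for in-flight clients is elided.

#include <sel4/sel4.h>

#define IRQ_BADGE 0x1   /* assumed badge carried by the bound notification */

extern seL4_CPtr          next_reply(void);      /* illustrative resume-object pool */
extern seL4_CPtr          pending_reply(void);   /* resume object of the in-flight client */
extern void               save_request(seL4_Word badge, seL4_CPtr reply, seL4_MessageInfo_t info);
extern void               start_io(void);
extern seL4_MessageInfo_t finish_io(void);

void disk_server(seL4_CPtr ep)
{
    seL4_Word badge;
    seL4_CPtr reply = next_reply();                 /* pick an idle resume object */
    seL4_MessageInfo_t info = seL4_Recv(ep, &badge, reply);
    for (;;) {
        if (badge == IRQ_BADGE) {
            /* completion signal: finish the I/O and reply to the client
             * blocked on the resume object saved with the in-flight request */
            seL4_Send(pending_reply(), finish_io());
        } else {
            /* new request: remember the resume object the client is blocked
             * on, start the I/O, but do not reply yet */
            save_request(badge, reply, info);
            start_io();
            reply = next_reply();
        }
        info = seL4_Recv(ep, &badge, reply);
    }
}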

Server isolation

We now describe how timeout handlers can be used to implement server isolation and other policies, as previously introduced in Section 6.1.6. For our example consider a server shared between clients of different criticality, time sensitivity, and trust, which provides an encryption service via AES. Such a server processes data in blocks, and the WCET of a request is a function of the amount of data to be encrypted. Such a server can be constructed transactionally in order to render it preemptable: each

block processed is considered a transaction. The server atomically switches between two states each time it completes a block, effectively finishing one transaction and starting another. Each state records the block in the memory buffer that the server is processing. The state the server is working on is always dirty and the previous one clean, where the dirty state indicates the incomplete block, and the clean state indicates the last processed block. By constructing the server in this way, a timeout handler can reset the server to the last clean state if the passive server raises a timeout exception due to the client's SC expiring. When the server is initialised, the timeout handler saves the server at a known checkpoint, and on exception, restores the server back to this checkpoint to receive further requests. The handler can then reply to the client on the server's behalf, and return how far through the request the server got by reading it out of the last clean state. Invoking the reply returns the scheduling context to the client, and the timeout handler blocks waiting for the next fault. Other options are also available to the timeout handler, which is important as not all server operations can be reset midway. The timeout handler can reply with an error to the client, or not reply to the client at all, thereby preventing the client from making further requests. Alternatively, the timeout handler could bind a temporary scheduling context to the server to complete the request.
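The following sketch shows the shape of such a timeout handler, using the API of Chapter 7. The checkpoint, the error reply, and client_reply_for() (which maps the badge delivered with the fault to the resume object the offending client is blocked on) are illustrative assumptions, as is the choice to restore the server by rewriting its registers.

#include <sel4/sel4.h>

extern seL4_UserContext   server_checkpoint;            /* registers saved at initialisation (assumed) */
extern seL4_CPtr          client_reply_for(seL4_Word badge);
extern seL4_MessageInfo_t error_reply(void);

void timeout_handler(seL4_CPtr fault_ep, seL4_CPtr server_tcb, seL4_CPtr reply)
{
    seL4_Word badge;
    for (;;) {
        /* wait for a timeout fault, raised when a client's budget expires */
        seL4_Recv(fault_ep, &badge, reply);
        /* roll the server back to its last clean state so it can accept
         * further requests; the second argument resumes the (passive) server */
        seL4_TCB_WriteRegisters(server_tcb, 1, 0,
                                sizeof(seL4_UserContext) / sizeof(seL4_Word),
                                &server_checkpoint);
        /* reply to the offending client on the server's behalf, returning its
         * scheduling context along with an error */
        seL4_Send(client_reply_for(badge), error_reply());
    }
}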

6.4.3 Example MCS

Figure 6.7 shows a simplified AAV architecture, which we use as an example mixed- criticality system using both passive and active components of different criticality levels. Each rectangle represents a component executing in a separate address space. Solid frames indicate active components, while dashed frames indicate passive components. Solid arrows

indicate RPC-style server invocation and dashed arrows are completion signals. For clarity we do not show most invocations of the logging service, which would add a solid arrow from each component to the "Logging" box. High-criticality components are shown in red, medium-criticality in orange, and the least critical in white. Most communication is by RPC-style message passing, with the exception of device drivers, which signal I/O completion via notifications (dashed arrows) and shared memory buffers.

Figure 6.7: Simplified AAV design.

6.5 Summary

In this chapter we have outlined our model for providing temporal isolation via scheduling and resource-sharing mechanisms. We introduce support for user-level admission tests via a new control capability, and add processor reservations to the kernel in the form of scheduling contexts. Scheduling contexts differ from prior work in that they are decoupled from priority, thereby avoiding priority inheritance on scheduling-context donation. In addition, by providing the active versus passive server distinction, we also maintain policy freedom. Finally, we outlined existing models for integrating real-time resource policies over IPC and showed how scheduling-context donation combined with passive servers can be used to create trusted servers with untrusted clients. Our model supports best-effort, migrating-thread, and temporally isolated policies for resource sharing. Timeout exceptions are provided which allow for user-defined enforcement policies, while the default mechanism maintains isolation by preventing threads from executing for more than the share of processing time represented by their scheduling contexts. In the next chapter we outline the implementation of our model in seL4.


7 Implementation in seL4

In the previous chapter we presented our mechanisms for temporal isolation and safe resource sharing in a high-assurance microkernel. Now we delve into the implementation details of scheduling contexts, scheduling context donation, passive servers, timeout fault handlers, and IPC forwarding in seL4. First we explore new kernel objects, followed by new and altered API system calls and invocations, and finally look at structural changes.

7.1 Objects

We add two new objects to the kernel: scheduling context objects (SCOs) and resume objects. Additionally, we modify the TCB object, although we do not increase its total size. Finally, we modify the notification object to allow single-threaded, passive servers to receive signals in addition to IPC messages.

7.1.1 Resume objects

Resume objects, modelled after KeyKOS [Bomberger et al., 1992], are a new object type that generalises the “reply capabilities” of baseline seL4 introduced in Section 5.5.1, in order to track the donation of scheduling contexts over IPC. Recall that in baseline seL4, the receiver of the message (i.e. the server) receives the reply capability in a magic “reply slot” in its capability space. The server replies by invoking that capability. Resume objects remove the magic by explicitly representing the reply channel, and servers with multiple clients can dedicate a resume object per client. Instead of generating a one-shot reply capability in a reply slot on a seL4_Call, the operation populates a resume object, while seL4_Recv de-populates the resume object. Additionally, resume objects also provide more efficient support for stateful servers that handle concurrent client sessions, which we expand on further when we introduce the changed system-call application programming interface (API) in Section 7.2. A resume object can be in the following three states:

• idle, meaning it is not currently being used;

• waiting, meaning it is being used by a thread blocked and waiting for a message;

• active, meaning the resume object currently has a thread blocked on it, waiting for a reply.

Figure 7.1: State diagram of resume objects.

Valid state transitions are shown in Figure 7.1. Resume objects are now required as arguments to receiving system calls, along with endpoint capabilities, which transitions them from idle to waiting. If the endpoint is invoked with seL4_Call, the caller is blocked on the resume object, and it transitions to active. If a resume object is directly invoked using a sending system call (recall from Section 5.2 that sending system calls are seL4_Send, seL4_NBSend, seL4_Call and seL4_Reply), then a reply message is delivered to the thread blocked on the object. Finally, if a resume object in the active state is provided to seL4_Recv, the object is first invoked, and removed from the call stack, which we now examine in detail.

The call stack

Active resume objects track the call stack that is built as nested seL4_Call operations take place. seL4_Call triggers a push operation, adding to the top of the stack, while a reply message triggers a pop, removing the top of the stack. The call stack allows us to track the path of a donated scheduling context, from caller to callee, so that it can be returned to the previous caller, regardless of which thread sends the reply message. This is a direct consequence of flexible resume capabilities: a resume capability can be moved between threads, and any thread can execute the reply: usually the server, but occasionally a timeout fault handler, or a nested server which received the resume capability via IPC forwarding. It is therefore impossible to derive the original caller from the thread which sent the reply message. When a reply message is sent via an active resume object at the head of the call

stack, that resume object is popped, and the SCO is linked to the thread that the resume object pointed to, overriding the previous SCO-to-TCB link. The call stack is structured as a doubly-linked list, with one minor difference: the head of the call stack is the scheduling context that was donated along the stack, which itself contains a pointer to the thread currently executing on it. Each resume object then forms a node in the stack, going back to the original caller at the base. The SCO remains the head of the stack until it returns to the initial caller and the stack is fully dismantled. When a reply message is sent, the scheduling context travels back along the call stack and the head resume object is popped. Resume objects also point to the thread which donated the SCO along the stack, allowing the SCO to be returned to that thread when a reply message is sent. This process is illustrated in Figure 7.2, where A has called S1, which has called S2.

Figure 7.2: The state of the reply stack when a nested server S2 is performing a request on behalf of an initial server S1 for client A.

In a capability system, one of the biggest challenges is that capabilities can be revoked, and the objects they grant access to deleted, at any time, regardless of the state of the system. Therefore we provide the following deletion semantics for resume objects:

• If the SCO is deleted, the head of the call stack becomes the first resume object. To avoid a long-running operation, the call stack of resume objects remains linked, and is dismantled as each resume object is deleted.

• If the head resume object (the object that points to the SCO) is deleted, the SCO is returned along the stack to the caller.

• If a resume object in the middle of the stack is deleted, we use the standard operation for removing a node from a doubly-linked list: the previous node is connected to the next node, and the deleted object is no longer pointed to by any member of the stack.

• If the resume object is at the start of the stack, i.e. the node corresponding to the initial caller, it is simply removed. The SCO cannot return to the initial caller by means of a reply message.

Resume objects are small (16 bytes on 32-bit platforms, and 32 bytes on 64-bit platforms), and contain the fields shown in Table 7.1.

Field   Type        Description
tcb     uintptr_t   The calling or waiting thread that is blocked on this reply object.
prev    uintptr_t   0 if this is the start of the call stack, otherwise points to the previous reply object in the call stack.
next    uintptr_t   Either a pointer to the scheduling context that was last donated using this reply object, if this reply object is the head of a call stack (the last caller before the server), or a pointer to the next reply object in the stack; 0 if no scheduling context was passed along the stack.

Table 7.1: Fields in a resume object.

7.1.2 Scheduling context objects

We introduce a new kernel object type, the SCO, against which all processing time is accounted, along with a new scheduler invariant: any thread in the scheduler queues must have an SCO. SCOs are variable-sized objects that represent access to a certain amount of time and consist of a core set of fields plus a circular buffer that tracks the consumption of time. Scheduling contexts encapsulate processor-time reservations, derived from sporadic task parameters: a minimum inter-arrival time (T) and a set of replenishments populated from an original execution budget (C), representing the reserved rate (U = C/T). Fields in an SCO are shown in Table 7.2.

Field         Type        Description
Period        uint64_t    The replenishment period, T.
Consumed      uint64_t    The amount of cycles consumed since the last reset.
Core          word_t      The ID of the processor this scheduling context grants time on.
TCB           uintptr_t   A pointer to the thread (if any) that this scheduling context is currently providing time to.
Reply         uintptr_t   A pointer to the head resume object (if any) in a call stack.
Notification  uintptr_t   A pointer to the notification object (if any) to which this SCO is bound.
Badge         word_t      An unforgeable identifier for this scheduling context, which is delivered as part of a timeout fault message to identify the faulting client.
YieldFrom     uintptr_t   Pointer to a TCB that has performed a directed yield to this SCO.
Head, Tail    word_t      Indexes into the circular buffer of sporadic replenishments.
Max           word_t      Size of the circular buffer of replenishments for this scheduling context.

Table 7.2: Fields of a scheduling context object.

Offset   Contents
0x00     period, consumed
0x10     core, TCB, reply, ntfn
0x20     badge, yieldFrom, head, tail
0x30     max
0x40     replenishment 0
0x50     replenishment 1
...      ...
0xF0     replenishment n

Table 7.3: Layout of a scheduling context object on a 32-bit system.

Replenishments

In addition to the core fields, SCOs contain a variable number of replenishments, each consisting of a timestamp and an amount. These are used for both round-robin and sporadic threads. For round-robin threads we simply use the head replenishment to track how much time is left in that SCO's timeslice. Sporadic threads are more complex; the replenishments form a circular buffer used to track how much time a thread can execute for (the amount) and when that time can be used (the timestamp). The size of the circular buffer is limited by both a variable provided on configuration of the SCO and the size of the SCO (where the former must be ≤ the latter). This allows system designers to control the preemption and fragmentation of sporadic replenishments, as discussed in Section 6.1.5. Each SCO is a minimum of 2^8 bytes in size, which can hold eight or ten replenishments for 32- and 64-bit processors respectively. This is sufficient for most uses, but more replenishments can be supported by larger sizes, allowing system designers to trade off latency for fragmentation overhead in the sporadic server implementation. SCOs can be created as larger objects of 2^n bytes, with the rest of the object filled with replenishments, as shown in Table 7.3. System designers can then use the max replenishment field to specify the exact number of replenishments to use, up to the object size.

Admission

Like any seL4 object, scheduling contexts are created from zeroed, untyped memory. Consequently, new scheduling contexts do not grant authority to any processing time at all, as the parameters are all set to zero. To configure a scheduling context, the new seL4_SchedControl capability must be invoked, which allows the period, initial sporadic replenishment, maximum number of refills, and badge fields to be set. The processing core is derived from the seL4_SchedControl capability that is invoked: only one exists per core. The seL4_SchedControl_Configure operation behaves differently depending on the state of the target scheduling context, and can be used not only to configure SCOs but also to migrate threads across cores and to change the available bandwidth of a currently runnable thread.

sc->period = period;
sc->head = 0;
sc->tail = 0;
sc->max = max_refills;
HEAD(sc).amount = budget;
HEAD(sc).time = current_time;

Listing 7.1: refill_new routine to initialise a scheduling context that is not active or is migrating cores. HEAD is short-hand for accessing the head of the circular buffer of replenishments.
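The simplified listings in this section use shorthand for the replenishment (refill) circular buffer. One plausible realisation of that shorthand is sketched below; the structure mirrors Tables 7.2 and 7.3, but the exact definitions (and constants such as MIN_BUDGET) in the kernel may differ.

#include <sel4/sel4.h>
#include <stdint.h>

typedef uint64_t ticks_t;

typedef struct refill {
    ticks_t amount;   /* budget this replenishment grants */
    ticks_t time;     /* earliest time at which it may be used */
} refill_t;

typedef struct sched_context {
    ticks_t   period;
    ticks_t   consumed;
    seL4_Word core;
    /* ... TCB, reply, notification, badge and yieldFrom pointers ... */
    seL4_Word head, tail;   /* indexes into the circular refill buffer */
    seL4_Word max;          /* capacity of the buffer for this object */
    refill_t  refills[];    /* fills the remainder of the 2^n-byte object */
} sched_context_t;

#define INDEX(sc, i)  ((sc)->refills[(i)])
#define HEAD(sc)      INDEX(sc, (sc)->head)
#define TAIL(sc)      INDEX(sc, (sc)->tail)
#define NEXT(sc, i)   (((i) + 1) == (sc)->max ? 0 : (i) + 1)
#define SIZE(sc)      ((sc)->tail >= (sc)->head \
                       ? (sc)->tail - (sc)->head + 1 \
                       : (sc)->max - (sc)->head + (sc)->tail + 1)

/* remove and return the head replenishment */
static inline refill_t POP(sched_context_t *sc)
{
    refill_t old = HEAD(sc);
    sc->head = NEXT(sc, sc->head);
    return old;
}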

/* truncate to size 1 */
INDEX(sc, 0) = HEAD(sc);
sc->head = 0;
sc->tail = sc->head;
sc->max = new_max_refills;
sc->period = new_period;

if (HEAD(sc).time <= (current_time + kernel_wcet))
    HEAD(sc).time = current_time;

if (HEAD(sc).amount >= new_budget) {
    HEAD(sc).amount = new_budget;
} else {
    /* schedule excess amount */
    sc->tail = NEXT(sc, sc->tail);
    TAIL(sc).amount = (new_budget - HEAD(sc).amount);
    TAIL(sc).time = HEAD(sc).time + new_period;
}

Listing 7.2: refill_update routine to change the parameters of an active scheduling context. INDEX, TAIL, and NEXT are operations on the circular buffer.

Semantics are as follows:

• If the scheduling context is not bound to a currently runnable thread, or is empty—in that the parameters are set to 0—the operation is a basic configure: the fields are simply set and the first replenishment is configured with the budget provided and the timestamp of the kernel entry (see Listing 7.1).

• If the scheduling context is bound to a currently runnable thread, but that scheduling context is for a different processing core than the seL4_SchedControl capability, the fields are set and the thread is migrated to the new core.

• Finally, if the scheduling context is bound to a currently running thread and the seL4_SchedControl capability is for the same core, we update the replenishment list without allowing the sliding-window constraint to be violated. Listing 7.2 shows the algorithm used.
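At user level, configuring a reservation then looks roughly like the following sketch; the stub name follows the invocation described above, but the exact parameter order and the time units (assumed microseconds here) are illustrative, and sched_control, sc and CLIENT_BADGE are assumed to exist in the invoking task.

#include <sel4/sel4.h>

#define CLIENT_BADGE 42   /* reported in timeout-fault messages for this SC */

/* Reserve 2 ms of budget every 10 ms (a 20% reservation) on the core that
 * sched_control grants time for, with up to 8 extra sporadic replenishments. */
seL4_Error configure_reservation(seL4_CPtr sched_control, seL4_CPtr sc)
{
    return seL4_SchedControl_Configure(sched_control,  /* per-core control capability */
                                       sc,             /* scheduling context to configure */
                                       2000,           /* budget (assumed microseconds) */
                                       10000,          /* period (assumed microseconds) */
                                       8,              /* extra replenishments */
                                       CLIENT_BADGE);  /* badge */
}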

Sporadic Servers

Recall that polling servers (Section 3.1.3) provide a specified bandwidth to a thread, but once a thread wakes, if it blocks or is preempted, the budget is abandoned until the next period. Sporadic servers (Section 3.1.3) also limit threads to a specified execution bandwidth, but do so by tracking a buffer of replenishments, which is appended to each time a thread is preempted or blocks. Sporadic servers with only one replenishment act like polling servers. As a result, scheduling contexts configured with zero extra replenishments behave like polling servers; otherwise they behave as sporadic servers, allowing application developers to tune the behaviour of threads depending on their preemption levels and execution durations. The algorithms for managing replenishments are taken from Danish et al. [2011], with adjustments to support periods of 0 (for round-robin threads) and to implement a minimum budget. Whenever the current scheduling context is changed, check_budget (Listing 7.4) is called to bill the amount of time consumed since the last scheduling-context change. If the budget is not completely consumed by check_budget, split_check (Listing 7.5) is called to schedule the subsequent refill for the chunk of time just consumed; both use schedule_used (Listing 7.3) to append to the replenishment buffer. If the replenishment buffer is full, or the amount consumed is less than the minimum budget, the amount used is merged into the next replenishment. When a new scheduling context is switched to, unblock_check (Listing 7.6) is used, which merges any replenishments that are already available, avoiding unnecessary preemptions. The full code for sporadic servers, with less brevity than the simplified samples included here, is available in Appendix B.1.

if (new.amount < MIN_BUDGET && sc->head != sc->tail) {
    /* used amount is too small - merge with last and delay */
    TAIL(sc).amount += new.amount;
    TAIL(sc).time = MAX(new.time, TAIL(sc).time);
} else if (new.time <= TAIL(sc).time) {
    TAIL(sc).amount += new.amount;
} else {
    sc->tail = NEXT(sc, sc->tail);
    TAIL(sc) = new;
}

Listing 7.3: schedule_used routine, used below.

if (capacity == 0) {
    while (HEAD(sc).amount <= usage) {
        /* exhaust and schedule replenishment */
        usage -= HEAD(sc).amount;
        if (sc->head == sc->tail) {
            /* update in place */
            HEAD(sc).time += sc->period;
        } else {
            refill_t old_head = POP(sc);
            old_head.time = old_head.time + sc->period;
            schedule_used(sc, old_head);
        }
    }

    /* budget overrun */
    if (usage > 0) {
        /* budget reduced when calculating capacity */
        /* due to overrun delay next replenishment */
        HEAD(sc).time += usage;
        /* merge front two replenishments if times overlap */
        if (sc->head != sc->tail &&
            HEAD(sc).time + HEAD(sc).amount >=
            INDEX(sc, NEXT(sc, sc->head)).time) {
            refill_t refill = POP(sc);
            HEAD(sc).amount += refill.amount;
            HEAD(sc).time = refill.time;
        }
    }
}
capacity = MAX(HEAD(sc).amount - usage, 0);
if (capacity > 0 && refill_ready(sc))
    split_check(sc, usage);
/* ensure the refill head is sufficient to run */
while (HEAD(sc).amount < MIN_BUDGET && sc->head != sc->tail) {
    refill_t old = POP(sc);
    HEAD(sc).amount += old.amount;
}

Listing 7.4: check_budget routine used to implement sporadic servers.

/* first deal with the remaining budget of the current replenishment */
ticks_t remnant = HEAD(sc).amount - usage;
refill_t new = (refill_t){
    .amount = usage, .time = HEAD(sc).time + sc->period
};
if (SIZE(sc) == sc->max || remnant < MIN_BUDGET) {
    /* merge remnant with next replenishment - either it's too small
     * or we're out of space */
    if (sc->head == sc->tail) {
        /* update in place */
        new.amount += remnant;
        HEAD(sc) = new;
    } else {
        POP(sc);
        HEAD(sc).amount += remnant;
        schedule_used(sc, new);
    }
} else {
    /* split the head refill */
    HEAD(sc).amount = remnant;
    schedule_used(sc, new);
}

Listing 7.5: split_check routine used to implement sporadic servers.

if (HEAD(sc).time < current_time + kernel_wcet) {
    /* advance earliest activation time to now */
    HEAD(sc).time = NODE_STATE(ksCurTime);

    /* merge available replenishments */
    while (sc->head != sc->tail) {
        ticks_t amount = HEAD(sc).amount;
        if (INDEX(sc, NEXT(sc, sc->head)).time <= current_time + amount) {
            POP(sc);
            HEAD(sc).amount += amount;
            HEAD(sc).time = NODE_STATE(ksCurTime);
        } else {
            break;
        }
    }
}

Listing 7.6: unblock_check routine used to implement sporadic servers.

7.1.3 Thread control blocks

Recall from Section 5.3.3 that TCBs are the abstraction of an execution context in seL4, formed from a TCB data structure and a special cnode, accessible only to the kernel, that contains capabilities specific to that thread. We make several alterations to this structure, but do not impact the TCB size, as there was enough space available. The altered fields are shown in Table 7.4.

Field               Type        Description
timeslice           word_t      Used to track the timeslice; replaced by the scheduling context.
scheduling context  uintptr_t   The scheduling context this TCB consumes time from; 0 if this is a passive TCB.
MCP                 word_t      The MCP of this TCB.
reply               uintptr_t   A pointer to the resume object this TCB is blocked on, if the TCB is BlockedOnReply or BlockedOnRecv.
yieldTo             uintptr_t   Pointer to the SCO this TCB has performed a directed yield to, if any.
faultEndpoint       word_t      Moved to the TCB cnode.

Table 7.4: Added and removed fields in a TCB object.

Fault endpoints

We also change the contents of the TCB cnode, removing two slots previously required by the reply capability implementation, as resume objects are now provided to receiving system calls, and no slot in the TCB cnode is required. In addition, we add two new slots by installing fault handler capabilities, for faults and timeouts, into the TCB cnode. This is an optimisation for fault handling capabilities: in baseline seL4, the fault-handling capability is looked up on every single fault, increasing the overhead of fault handling. To minimise this overhead, we look up the fault endpoint when it is configured, and copy the capability into the TCB cnode. Note that capabilities in the TCB cnode cannot be changed by user-level:

the TCB itself must be deleted for the endpoint capability to be fully revoked and the object deleted, which avoids a time-of-check-to-time-of-use vulnerability.

Maximum controlled priorities

The MCP, or maximum controlled priority, resurrects a concept from early L4 kernels [Liedtke, 1996]. It supports lightweight, limited manipulation of thread priorities, useful e.g. for implementing user-level thread packages. When setting the priority or MCP of a TCB, A, the caller must provide a capability to a TCB, B (which could be the caller's own TCB). The caller is allowed to set the priority or MCP of A up to the value of B's MCP.1 In a typical system, most threads will run with an MCP of zero and have no access to TCB capabilities with a higher MCP, meaning they cannot raise any thread's priority. The MCP is taken from an explicitly provided TCB, rather than the caller's, to avoid the confused deputy problem [Hardy, 1988].
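For example, a supervisor holding a TCB capability with a suitable MCP might configure a worker thread as in the following sketch; the capability names and the chosen priority values are illustrative.

#include <sel4/sel4.h>

/* authority_tcb's MCP bounds how far priorities can be raised (here: up to 100). */
seL4_Error set_worker_prio(seL4_CPtr worker_tcb, seL4_CPtr authority_tcb)
{
    seL4_Error err = seL4_TCB_SetMCPriority(worker_tcb, authority_tcb, 100);
    if (err != seL4_NoError) {
        return err;
    }
    /* run the worker at priority 50 for now; it may later be raised up to 100 */
    return seL4_TCB_SetPriority(worker_tcb, authority_tcb, 50);
}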

7.1.4 Notification objects

A scheduling context object can be associated with a notification object, which allows a passive server with a bound notification to receive signals and run on the notification's scheduling context to process those signals. We add a pointer to the scheduling context from the notification object, and vice versa, to facilitate this mechanism, which increases the size of the notification object from 2^4 to 2^5 bytes on 32-bit platforms, and from 2^5 to 2^6 bytes on 64-bit platforms.

7.2 MCS API

We now present the new mixed criticality system (MCS) API for seL4, which alters the core system call API, as well as adding new invocations and modifying existing ones. In this section we also present the semantics of scheduling context donation, which are directly linked to the new API.

7.2.1 Waiting system calls

In Section 5.2 we divided the seL4 system call API into two classes: sending system calls (seL4_Send, seL4_NBSend, seL4_Call, seL4_Reply) and receiving system calls (seL4_Recv, seL4_NBRecv), plus combinations of both for fast RPC (seL4_Call, seL4_ReplyRecv). Our implementation alters the meaning of a receiving system call, and adds a new class of system call: waiting system calls. The difference is simple: receiving system calls are expected to provide a capability to a resume object, which can be used if a seL4_Call is received over the endpoint being blocked on. Waiting system calls do not, and cannot be paired with a seL4_Call successfully. Only receiving system calls can be used with scheduling context donation, as the reply object is used to track the call stack. Such a distinction existed in the original L4 model, where the two forms were referred to as open and closed receives. Additionally, because the reply capability slot is dropped from the TCB cnode, we remove the seL4_Reply system call, as its only purpose was to invoke the capability in the reply slot, which no longer exists. seL4_ReplyRecv remains, and invokes the resume capability, sending the reply, before using the resume object in the seL4_Recv phase.

By making the reply channel explicit, we significantly increase the practicality of the IPC fastpath, at the cost of an extra capability look-up on the fastpath. In baseline seL4, any server that served multiple clients and did not reply immediately—because it needed to block on I/O, for example—would save the reply capability, moving it from the special TCB slot to a standard slot in its cspace. Later, when the server replied to the client, it would invoke the saved reply capability with seL4_Send, then block for new messages with seL4_Recv, avoiding the IPC fastpath for seL4_ReplyRecv. No system call to move the reply capability back to the TCB cnode was provided. Resume objects avoid this save-to-cspace slowpath, and increase the probability of using the fastpath, should the conditions detailed in Section 5.5.2 be satisfied.

1 Obviously, this operation also requires a capability to A's TCB.

7.2.2 IPC Forwarding

For sending system calls, only those that combine a send phase and a receive phase can donate a scheduling context, as a thread must be blocked in order to receive a scheduling context. Both seL4_Call and seL4_ReplyRecv combine a seL4_Send and a seL4_Recv phase, although with slightly different semantics with respect to resume objects: seL4_Call conducts both phases on the same capability, while the send phase of seL4_ReplyRecv can only act on a resume object. We introduce a new combined system call, seL4_NBSendRecv, which is the mechanism for IPC forwarding (Section 6.3.2). Both seL4_NBSendRecv and seL4_ReplyRecv combine a non-blocking send with a normal (blocking) receive. The non-blocking send simply discards the message and progresses to the receive phase atomically if the send cannot deliver its message without blocking. IPC forwarding allows a seL4_NBSend followed by a seL4_Recv on two distinct capabilities, without the restriction that the send phase must act on a resume capability. Additionally, we provide a variant which combines a send phase and a waiting phase, seL4_NBSendWait. Both variants allow for IPC forwarding, and can donate a scheduling context along with the IPC, bypassing the call chain. The first phase of seL4_NBSendRecv and seL4_NBSendWait must use an seL4_NBSend rather than a seL4_Send to avoid introducing a long-running operation in the kernel, in the form of a chain reaction triggered by a single thread blocking. We demonstrate by example why the send phase must be non-blocking.

To illustrate the problem, we refer to Figure 7.3, which shows five threads, Tz and

T1,...,T4, and four endpoints, E1,...,E4. Black arrows represent operations that have already occurred, blocking each thread on an endpoint; grey arrows are each pending once. T1 has used seL4_SendRecv on E1 and E2, T2 has used seL4_SendRecv on E2 and E3, and T3 has used seL4_SendRecv on E3 and E4.

Figure 7.3: The problem with seL4_SendRecv.

T4 is blocked sending on E4, but the chain could continue without bound. Because the send phase is blocking, each thread Tn is blocked waiting for its send to proceed on En. Tz then uses seL4_Recv on E1 (the red arrow), which triggers an unbounded chain of message sending: T1 finishes its send phase and begins the receive phase on E2, which allows T2 to

finish its send phase and begin the receive phase on E3, which allows T3 to finish its send phase and start the receive phase on E4. This chain reaction is an unbounded, long-running operation. By making the send-phase non-blocking such a chain reaction is not possible: the send-phase is aborted as the endpoint has no threads waiting for a message on it, and the receive phase then proceeds.

Yield

In baseline seL4, seL4_Yield triggers a reschedule and appends the calling thread to the end of the appropriate scheduling queue. We retain these semantics for full scheduling contexts; however, for partial, sporadic scheduling contexts, seL4_Yield depletes the head sporadic replenishment immediately. The caller is then blocked until the next replenishment becomes available, and is then placed at the end of the appropriate scheduling queue.

We also provide a new invocation on scheduling context capabilities, seL4_SchedContext_YieldTo, which allows a directed yield to a specific SCO. When called on an SCO with a TCB running at the same priority as the caller, seL4_SchedContext_YieldTo results in that thread being switched to by the kernel scheduler. Otherwise, the TCB is moved to the head of the scheduling queue for its priority.

Table 7.5 summarises the new MCS system call API, while more detailed descriptions of each system call can be found in Appendix A.

System Call        Class                Donation?  New?  Definition
seL4_Call          sending, receiving   Yes        No    Appendix A.1.1
seL4_Send          sending              No         No    Appendix A.1.2
seL4_NBSend        sending              No         No    Appendix A.1.3
seL4_Recv          receiving            Yes        No    Appendix A.1.4
seL4_NBRecv        receiving            Yes        No    Appendix A.1.5
seL4_Wait          waiting              No         Yes   Appendix A.1.6
seL4_NBWait        waiting              No         Yes   Appendix A.1.7
seL4_ReplyRecv     sending, receiving   Yes        No    Appendix A.1.8
seL4_NBSendRecv    sending, receiving   Yes        Yes   Appendix A.1.9
seL4_NBSendWait    sending, waiting     No         Yes   Appendix A.1.10
seL4_Yield         scheduling           No         No    Appendix A.1.11
seL4_Reply         sending              —          —     Removed

Table 7.5: New seL4 system call summary, indicating which system calls can trigger scheduling context donation and which are new.

7.2.3 Invocations

Scheduling contexts have five invocations, listed in Table 7.6. Three of those invocations are for binding SCOs to TCBs and notification objects, while seL4_SchedContext_YieldTo and seL4_SchedContext_Consumed facilitate user-level scheduling, allowing the user to manipulate the kernel scheduling queues and to query the amount of time consumed by a specific SCO. Both functions return the amount of CPU time the scheduling context has consumed since it was last cleared: reading the consumed value by either system call clears it, as does a timeout exception. The new control capability, seL4_SchedControl, has only one invocation, seL4_SchedControl_Configure, for setting the parameters of a scheduling context (the function definition can be found in Appendix A.2.8). As baseline seL4 only supports sending three capabilities in a single message (including to the kernel), we add a second method for configuring multiple fields of a TCB in one system call (seL4_TCB_SetSchedParams), in addition to the existing method (seL4_TCB_Configure). We avoid altering the existing limit to avoid extensive re-verification of a non-critical path. Full configuration of a TCB requires up to six capabilities to be set, not including timeout handlers, which are not included in a combined configuration call as this is a less common operation. In total, we alter two invocations on TCB objects and add three new invocations, shown in Table 7.7. Finally, we remove the seL4_CNode_SaveCaller invocation, which was previously used to move a reply capability from the TCB cnode to a slot in the thread's cspace, and is no longer required as the resume object capability is now provided to the receiving system call.

Invocation                        Description                                                                                                    Definition
seL4_SchedContext_Bind            Bind an object (TCB or notification) to an SCO.                                                                Appendix A.2.2
seL4_SchedContext_Unbind          Unbind all objects from an SCO.                                                                                Appendix A.2.3
seL4_SchedContext_UnbindObject    Unbind a specific object from an SCO.                                                                          Appendix A.2.4
seL4_SchedContext_Consumed        Return the amount of time since the last timeout fault, seL4_SchedContext_Consumed or seL4_SchedContext_YieldTo was called.   Appendix A.2.5
seL4_SchedContext_YieldTo         Place the thread bound to this SCO at the front of its priority queue and return any time consumed.            Appendix A.2.6

Table 7.6: Scheduling context capability invocations.

Invocation                Description                                      New?  Definition
seL4_TCB_Configure        Set the cnode, vspace, and IPC buffer.           No    Appendix A.2.10
seL4_TCB_SetMCPriority    Set the MCP.                                     Yes   Appendix A.2.11
seL4_TCB_SetPriority      Set the priority.                                No    Appendix A.2.12
seL4_TCB_SetSchedParams   Set SCO, MCP, priority and fault endpoint.       Yes   Appendix A.2.13
seL4_TCB_SetTimeoutEP     Set the timeout endpoint.                        Yes   Appendix A.2.14
seL4_TCB_SetAffinity      Removed; core is now derived from the SCO.       —     N/A

Table 7.7: New and altered TCB capability invocations.

7.2.4 Scheduling context donation

Conceptually there are three types of scheduling context donation: lending, returning, and forwarding a scheduling context. The type of donation that occurs, and whether it can occur, depends on the system call used. Lending a scheduling context occurs between caller and callee on a seL4_Call, and the semantics are simple: if a thread blocked on an endpoint does not have a scheduling context, the scheduling context is transferred from the sender (caller) to the receiver (callee), and the resume object is pushed onto the call stack, such that the SCO can be returned. If the callee already has a scheduling context, scheduling context donation does not occur. Note that lending also works for handling faults, so thread fault handlers can be passive. Unlike previous versions of timeslice donation in L4 kernels, there is no explicit flag for permitting donation: if the callee does not have a scheduling context, the caller would block indefinitely, as the callee has no other way to access processing time. As discussed in Section 6.2, the caller in an RPC-style operation must trust the callee: regardless of the scheduling context being executed upon, the callee can always choose not to reply, blocking the caller indefinitely. Lent scheduling contexts are returned when the resume object is invoked, or when the head resume object is deleted. Resume objects can be invoked directly by the send phase of a system call, or indirectly, by using the resume object in the receive phase of a system call, thereby clearing it for another client to block on, or by deleting the resume object. Regardless of the state of the callee, the SCO is returned; if this is done incorrectly (via a seL4_Send), it can leave the callee not blocked on an endpoint, and consequently unable to receive further scheduling contexts via donation or lending. Correct usage by a passive thread is via seL4_ReplyRecv or seL4_NBSendRecv, which block the passive thread such that it is ready to receive another scheduling context. Like lending, scheduling contexts can also be returned by a fault handler replying to a fault message. Forwarding a scheduling context does not establish a call-chain relationship, and allows scheduling contexts to be transferred between threads in non-RPC patterns. The semantics are as follows: if IPC forwarding via seL4_NBSendRecv (Section 7.2.2) is used, and the receiver does not have a scheduling context, the scheduling context is forwarded. Scheduling contexts can only be donated and received on specific system calls, as indicated in Table 7.5.

Passive servers

Passive servers do not have scheduling contexts, yet they must be waiting on an endpoint in order to receive SCOs over IPC, and possibly initialise some server-specific state first. The protocol for initialising passive servers is to start them as active, then wait for the server to signal to the initialising thread that it is ready to be converted to passive. IPC forwarding can be used to do this, so the passive server can send a message on one endpoint or notification object, then wait on another to receive SCOs over IPC. Example user-level code for this process is provided in Listing 7.7.

Notification Binding

Passive servers can receive scheduling contexts from their bound notification object, allowing the construction of single-threaded servers which receive both notifications and IPC messages. The semantics are as follows: if a TCB receives a signal from its bound notification object, and that TCB does not have a scheduling context, the TCB receives the notification's scheduling context. If the TCB then blocks on an endpoint while running on its bound notification object's scheduling context, the TCB is rendered passive again.

void initialiser(seL4_CPtr server_tcb, seL4_CPtr init_sc, seL4_CPtr init_endpoint) {
    /* start server */
    seL4_TCB_Resume(server_tcb);
    /* wait for the server to tell us it is initialised */
    seL4_Wait(init_endpoint, NULL);
    /* convert the server to passive */
    seL4_SchedContext_Unbind(init_sc);
}

void passive_server(seL4_CPtr init_endpoint, seL4_CPtr endpoint, seL4_CPtr reply) {
    seL4_MessageInfo_t info = init_server_state();
    seL4_Word badge;

    /* signal to the initialiser that this server is ready to be converted to passive,
     * and block on the endpoint with resume object reply */
    info = seL4_NBSendRecv(init_endpoint, info, endpoint, &badge, reply);
    while (true) {
        /* when the server wakes, it is running on a client scheduling context */
        info = process_request(badge);
        /* reply to the client and block on endpoint, with resume object reply */
        seL4_ReplyRecv(endpoint, info, &badge, reply);
    }
}

void client(seL4_CPtr endpoint, seL4_MessageInfo_t message) {
    /* send a message to the passive server */
    seL4_Call(endpoint, message);
}

Listing 7.7: Example initialiser, passive server, and client.

7.3 Data Structures and Algorithms

We now discuss the data structures and algorithms added to or changed in the kernel to provide accounting, which charges CPU time to the correct SCO, and enforcement, which requires a new scheduling data structure and a new fault type.

7.3.1 Accounting

All processing time is accounted to the scheduling context of the currently running TCB. In this section we first establish how that time is accounted, and then when it is accounted. There are two ways to account for time in an OS kernel:

• in fixed time quanta, referred to as ticks, • in actual time passed between events, referred to as tickless.

Using ticks, timer interrupts are set for a periodic tick and are handled even if no kernel operation is required, incurring a preemption overhead. This approach has the advantage of simplicity: the timer implementation in the kernel is stateless and the timer driver can be set once, periodically, never requiring reprogramming. On older-generation hardware, especially x86, reprogramming the timer device incurred a significant cost, enough to justify the preemption overhead of a periodic tick. However, this design is not without limitations: the precision of the

scheduler is reduced to the length of the tick. Precision must be traded against preemption overhead: reducing the tick length increases precision, but also increases preemption overhead. Preemption overhead is particularly problematic for real-time systems, as the WCET of each real-time task is inflated by the WCET of the kernel for every preemption. Tickless kernels remove this trade-off by setting timer interrupts for the exact time of the next event. As we will show in Section 8.2.1, the cost of reprogramming the timer device on modern hardware is smaller, rendering the tickless design feasible. Consequently, we convert seL4 to a tickless kernel, a non-trivial change due to the fact that the kernel is non-preemptible2, which means timer interrupts for the scheduler can only be serviced when the kernel is not running. Allowing a thread to complete an operation in the kernel when it does not have sufficient time to do so would violate temporal isolation, as the bandwidth allocated to the scheduling context could then be exceeded. As a result, on kernel entry, the calling thread must have sufficient budget to complete the operation. Without further decoding a pending system call, the kernel cannot tell which operation a thread is attempting, so sufficient budget becomes the WCET of the kernel. Threads with insufficient budget are not permitted in the scheduler queues: if the scheduling algorithm had to iterate until it found a thread with sufficient budget, its complexity would become O(n) in the number of threads. The same condition holds for IPC endpoint queues. We take the time on kernel entry in order to charge a preempting thread, and any time from kernel entry is accounted to the scheduling context active at kernel exit. Because a donated scheduling context must be chargeable not only for the entry path into the kernel but also for the exit path, the sufficient budget is actually twice the WCET of the kernel. We check budget sufficiency on kernel entry, by reading the current timestamp and storing the amount of time consumed since the last kernel entry, allowing us to detect budget expiry at a single point and carry on other kernel operations with this assumption intact. This means that timeout faults can only be raised at one point in the kernel, greatly simplifying the implementation by avoiding excessive checks in other paths.

Fastpath

We alter the IPC fastpath conditions introduced in Section 5.5.2, adding one more condition in order to keep the overhead on fastpath performance very low:

• The receiver must be passive. For seL4_Call, this means the callee must be passive, and for seL4_ReplyRecv, the TCB blocked on the resume object must be passive.

This is because although timer device access is cheaper on modern hardware, it is not free. It also allows us to avoid checking budgets and modifying sporadic replenishments on the fastpath, operations which have many conditional branches and would also increase overheads. Note that we expect most servers in real systems to be passive servers for RPC, even in non-real-time systems, to assure fairness of resource access between round-robin threads.

2Save for explicit preemption points in a few long-running operations.

Figure 7.4: New tickless kernel structure. [Flowchart: on kernel entry, fastpath invocations are handled directly; otherwise the kernel time is updated and the budget checked. If the budget is insufficient, a tick is simulated ("pretend tick"); otherwise the kernel operation is performed. If the scheduling context changes, the time is committed and charged to the SC, otherwise the kernel time is rolled back, before exiting the kernel.]

Even on the slowpath, we avoid reprogramming the timer unless it is required. Like reading the time, reprogramming the timer is cheaper on modern hardware, but not free. Consequently, although we update the current time on every entry, we only charge threads at specific points:

• when the timer interrupt has fired, so clearly the timer needs reprogramming,

• when the current scheduling context changes, so a different interrupt is required,

• when the kernel timestamp has been acted upon: for instance, used to schedule a replenishment.

If the recorded timestamp is not acted upon, the time is rolled back to the previously recorded value, thus avoiding reprogramming the timer unnecessarily.
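As a sketch of this commit-or-rollback scheme (again with hypothetical names; the real kernel code differs), the recorded timestamp is only committed when it has been acted upon:

/* Hypothetical sketch: charge time only when the timestamp was acted upon. */
if (timer_irq_fired || next_sc != current_sc || timestamp_acted_upon) {
    commit_time();            /* charge ksConsumed to the current SC        */
    set_next_interrupt();     /* reprogram the timer for the next event     */
} else {
    ksCurTime  = ksPrevTime;  /* roll back: no charge, no timer reprogram   */
    ksConsumed = 0;
}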

Figure 7.4 illustrates the new control flow of the tickless kernel, and shows how and when processing time is charged to scheduling contexts. On kernel entry, if the available budget is insufficient, the kernel pretends the timer has already fired, resets the budget and adds the thread to the release queue. If the entry was due to a system call, the thread retries that call once it wakes with a fresh budget.

Domain scheduler

We retain the domain scheduler which, as discussed in Section 5.6.2, is required for the proof of confidentiality [Murray et al., 2013]. However, we convert the implementation to tickless, such that domains are configured with CPU cycles to execute, rather than fixed ticks, which no longer exist.

7.3.2 Enforcement

The enforcement mechanisms require the most significant changes to the kernel, which we now detail. The main change required to the existing scheduler is the addition of a release queue per processing core which contains all runnable threads waiting for a replenishment. If a preempted thread does not have any available replenishments, the kernel removes the thread from the ready queue.
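The check on a preempted thread can be sketched as follows; the names are hypothetical and the real kernel code differs in detail.

/* Sketch: on budget expiry, a thread without an available replenishment is
 * moved from the ready queue to the per-core release queue. */
void on_budget_expiry(tcb_t *tcb, ticks_t consumed)
{
    sched_context_t *sc = tcb->sc;

    charge_budget(sc, consumed);                 /* sporadic-server accounting */
    if (!replenishment_available(sc, current_time())) {
        ready_queue_remove(tcb);
        release_queue_insert(tcb, next_replenishment_time(sc));
    }
}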

Priority queues

In addition to the release queue, we also change endpoint and notification-object queues to priority queues, ordered by TCB priority. All queues are implemented as ordered, doubly-linked lists, where all operations complete in O(1) except for insertion, which is O(n), where n is the number of threads in the queue. We chose the list data structure over a heap for increased performance and reduced verification burden: a list-based priority queue out-performs a heap-based priority queue for small n (in our implementation, up to around n = 100). This n is larger than one would expect in a traditional OS, where heap implementations are array-based in contiguous memory with layouts optimised for cache usage. However, because seL4 kernel memory is managed at user level (as discussed in Section 5.3), such data structures must be dynamic, resulting in much higher code complexity and poorer performance than a static heap.

For endpoint queues, it is reasonable to expect a low number of threads, as in well-designed systems the number of kernel threads should be minimal to maintain a low memory profile (for systems requiring many threads, user-level or green threads can easily be deployed as a light-weight solution). For the release queue, the restriction becomes a function of the user-level admission test: only scheduling contexts with partial budgets are placed into the release queue, which limits the size of the queue. Consequently we do not expect any of the priority queues to contain large numbers of threads; however, if scalability becomes a

problem in practice, this implementation could be reconsidered. Brandenburg [2011] used binomial heaps for schedulers in LITMUSRT in order to reduce the cost of releasing many tasks at the same time; however, Linux is not subject to the verification guarantees or implementation constraints of seL4.

The list-based queues are non-preemptible, such that the WCET of the kernel is a function of the number of threads that can wait on an endpoint. This is an unavoidable increase in the WCET of the kernel in order to allow for the prioritised IPC required by real-time prioritisation. One security concern is that sub-systems with access to a single scheduling context and a large amount of memory can increase the kernel's run time by queuing up a large number of passive threads on an endpoint. To prevent this, system designers must not hand out large untyped capabilities to untrusted sub-systems.
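For illustration, an ordered insert into such a doubly-linked priority queue might look as follows. This is a simplified sketch with our own field and function names, not the kernel's.

/* Simplified sketch of an O(n) priority-ordered insert into a doubly-linked
 * list; dequeuing the head and unlinking a known node remain O(1). */
typedef struct node {
    struct node *prev, *next;
    unsigned prio;
} node_t;

typedef struct { node_t *head; node_t *tail; } queue_t;

static void queue_insert(queue_t *q, node_t *n)
{
    node_t *cur = q->head;
    /* walk until we find a node with lower priority than the new one,
     * so threads of equal priority are served FIFO */
    while (cur != NULL && cur->prio >= n->prio) {
        cur = cur->next;
    }
    if (cur == NULL) {                 /* append at the tail */
        n->prev = q->tail;
        n->next = NULL;
        if (q->tail) q->tail->next = n; else q->head = n;
        q->tail = n;
    } else {                           /* insert before cur */
        n->next = cur;
        n->prev = cur->prev;
        if (cur->prev) cur->prev->next = n; else q->head = n;
        cur->prev = n;
    }
}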

Timeout Exception Handlers

Timeout exceptions are only triggered for threads with non-empty timeout handling slots in the TCB cnode, configured with seL4_TCB_SetTimeoutEP. The implementation is the same as for fault handlers, with one exception: while fault IPC messages can trigger scheduling context donation, allowing fault-handling threads to be passive, timeout messages cannot. The timeout message contains the badge of the scheduling context which triggered the fault, and the amount of time consumed by that scheduling context since the last reset.
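A sketch of how a timeout handler might be wired up and serviced follows. seL4_TCB_SetTimeoutEP is the invocation named above, but the exact argument list and message-register layout shown here are assumptions, not the definitive API.

/* Sketch: configure and service a timeout handler (argument layout assumed). */
seL4_TCB_SetTimeoutEP(server_tcb, timeout_ep_cap);  /* non-empty slot enables timeout faults */

for (;;) {
    seL4_Word badge;
    seL4_MessageInfo_t info = seL4_Recv(timeout_ep, &badge, reply_cap);

    /* assumed layout: the message carries the SC badge and consumed time */
    seL4_Word sc_badge = seL4_GetMR(0);
    seL4_Word consumed = seL4_GetMR(1);
    handle_overrun(sc_badge, consumed);

    /* unlike ordinary fault IPC, no scheduling context is donated,
     * so this handler must be active (have its own SC) */
}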

7.4 Summary

In this chapter we have presented the new kernel objects, system call API, invocations and structural changes made to provide mechanisms for temporal isolation. The implementation adds roughly 2,000 lines of code to seL4 (14% increase), as measured by source lines of code (SLOC) [Wheeler, 2001] on the pre-processed code for the SABRE, which is the verified platform for seL4. In the next chapter, we evaluate our implementation with a series of microbenchmarks, system benchmarks and case studies.

8 Evaluation

We now evaluate our model through a series of microbenchmarks, system benchmarks and case studies. First, we present the ARM and x86 hardware platforms used, in addition to the cost of timer operations on those platforms. Then we demonstrate the low performance overhead of our mechanisms using a set of microbenchmarks that measure the overheads of the MCS kernel versus a baseline. The baseline kernel is seL4 at development version 7.0.0, with some added experimental fastpaths for interrupt and signal handling. Both kernels are branches of seL4 with the same merge-base, i.e. both branches diverge from the master branch at the same point. The exact revisions used are available online at:

Baseline: https://github.com/pingerino/seL4/releases/tag/phd-baseline,

MCS: https://github.com/pingerino/seL4/releases/tag/phd-MCS.

We then run several system benchmarks, first comparing a Redis [RedisLabs, 2009] implementation to baseline seL4, Linux and NetBSD to show that our kernel is competitive. We then demonstrate temporal isolation between threads sharing the same CPU in two scenarios, using Redis again, followed by ipbench [Wienand and Macpherson, 2004]. We then show isolation in a shared encryption server, and measure the cost of various timeout fault-handling policies. In addition, we evaluate two user-level scheduler implementations: one which implements a static criticality switch, and a user-level EDF implementation which we show is competitive with the in-kernel EDF scheduler from LITMUSRT. Lastly, we use an IPC throughput benchmark to demonstrate low multicore-scalability overhead, and we show how resource server threads can migrate between processing cores.

8.1 Hardware

We run microbenchmarks on a variety of hardware to show the overheads of the model compared to baseline seL4. Table 8.1 summarises the hardware platforms. SABRE is the verified platform for seL4 and, at the time of writing, verification of the x64 platform is in progress. Currently, the only platforms with support for more than one core in seL4 are SABRE and X64.

Platform         CPU           Clock  L1 (KiB)       L2 (KiB)     L3 (MiB)  TLB (entries)  Arch   SoC/Family
KZM (32-bit)     ARM1136JF-S   1.0    16+16 (4x)     128 (8x)     -         32 (2x)        ARMv6  NXP i.MX31
SABRE (32-bit)   Cortex-A9     1.0    32+32 (4x)     1024 (16x)   -         64 (2x)        ARMv7  NXP i.MX6
HIKEY (64-bit)   Cortex-A53    1.2    32+32 (4x+2x)  512 (16x)    -         128 (4x)       ARMv8  Kirin 620
TX1 (64-bit)     Cortex-A57    1.9    32+48 (2x+3x)  2,048 (16x)  -         256 (4x)       ARMv8  Jetson TX1
X64 (64-bit)     i7-4770       3.1    32+32 (8x)     256 (8x)     8 (16x)   128, 8 (8x)    x86    Haswell

Table 8.1: Hardware platform details; "x" is associativity, and + indicates I-cache+D-cache.

Our ARM development boards for each system on chip (SoC) are as follows:

• KZM: NXP i.MX31 Lite development kit.

• SABRE: NXP Sabre Lite development board.

• HIKEY: LeMaker HiKey.

• TX1: NVIDIA Jetson TX1 development kit.

Two of our platforms, X64 and HIKEY, support both 32- and 64-bit execution modes. We use both, referring to them as IA32 and HIKEY32 when in 32-bit mode, and X64 and HIKEY64 otherwise. All platforms have out-of-order execution, except HIKEY, which is in-order, and KZM, which has in-order execution with out-of-order completion for some operations (e.g. stores). TX1 currently only has 64-bit support on seL4. Additionally, we use several load generators running Linux on an isolated network for benchmarks that require a network.

8.2 Overheads

We first present a suite of microbenchmarks to evaluate any performance overheads against baseline seL4. Because the kernel uses lazy switching for the floating point unit (FPU), meaning that the FPU context is only switched if it is actively being used, we ensure that FPU context switching is off. This is achieved by performing the required number of system calls without activating the FPU. We present overheads on IPC operations, signalling and interrupts, and finally scheduling.

Platform     Timer                     Read time   Set timeout   Sum
KZM          General purpose timer     83 (0)      203 (0)       286
SABRE        ARM MPCore global timer   23 (0)      36 (0)        59
HIKEY32/64   ARM generic timers        6 (0)       6 (0)         12
TX1          ARM generic timers        8 (0)       1 (0)         9
IA32         TSC deadline mode         12 (2.2)    220 (1.0)     232
X64          TSC deadline mode         11 (2.3)    217 (2.0)     228

Table 8.2: Latency of timer operations per platform in cycles. Standard deviations shown in parentheses.

For each of the benchmarks in this section, we measure the cost of measurement, which is the cost of reading the cycle counter on ARM platforms, and the timestamp counter (TSC) on x86, and subtract the value obtained from the final result.

8.2.1 Timer

Two of the main sources of overhead introduced by our model are the need to read and reprogram the timer on non-fastpath kernel entries, and when performing a scheduling-context switch. We show the results of microbenchmarks of both of these operations in Table 8.2, and note the timer hardware used on each platform. For both microbenchmarks, we read the timestamp before and after the operation, and do this 102 times, discarding the first two results to prime the cache. We take the difference of the cycle counts, and subtract the cost of measuring the cycle counter itself. The results show the cost of both operations separately, and then their sum, which is the total measured overhead introduced by timers on a scheduling-context switch.

All platforms excluding KZM have a 64-bit timer available, making KZM the only platform requiring timer overflow interrupts. These are not measured, as KZM is a deprecated platform provided for comparison with modern ARM versions. KZM and SABRE both use memory-mapped timers, the 32-bit general purpose timer for the former and the 64-bit ARM global timer for the latter. SABRE has four cores and, while the timer device is shared, the timer registers are per-core, making access fast. Timer access on SABRE is significantly faster than on the KZM. For all other ARM platforms, the ARM generic timers are available, which are accessed via the coprocessor; the majority of new ARM platforms support the ARM generic timers. On X64 we use the TSC with TSC-deadline mode [Intel 64 & IA-32 ASDM], an architectural model-specific register (MSR) available since Intel Sandy Bridge: a local-APIC timer interrupt is triggered when the TSC matches the value written to the MSR. In practice, especially for X64, timer operations are subject to pipeline parallelism and out-of-order execution, which reduces the overhead.
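The measurement loop is, in essence, the following sketch; read_cycles, read_time and set_timeout stand in for the platform-specific operations, and measurement_overhead is the pre-measured cost of reading the cycle counter itself.

/* Sketch of the timer microbenchmark: 102 iterations, first two discarded. */
#define RUNS    102
#define DISCARD 2

uint64_t results[RUNS];
for (int i = 0; i < RUNS; i++) {
    uint64_t start = read_cycles();
    read_time();                     /* or set_timeout(deadline) for the other column */
    uint64_t end = read_cycles();
    results[i] = (end - start) - measurement_overhead;
}
/* mean and standard deviation computed over results[DISCARD .. RUNS-1] */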

Results on both architectures show that the overhead of a tickless kernel, which requires the timers to be frequently read and reprogrammed, is tolerable on modern hardware. On ARM, timer costs have reduced by an order of magnitude from ARMv6 through to ARMv8.

Platform   Baseline   MCS        Overhead
KZM        263 (0)    290 (0)    27    10%
SABRE      288 (0)    371 (58)   83    28%
HIKEY32    235 (2)    251 (3)    16    6%
HIKEY64    251 (3)    277 (4)    26    10%
TX1        398 (7)    424 (1)    26    6%
IA32       440 (2)    422 (2)    -18   -4%
X64        448 (1)    456 (2)    8     1%

Table 8.3: Time in cycles for the seL4_Call fastpath. Standard deviations shown in parentheses.

8.2.2 IPC performance

IPC performance is a critical measure of the practicality and efficiency of a microker- nel [Liedtke, 1995]. We benchmark our IPC operations against baseline seL4, which has an established efficient IPC fastpath [Elphinstone and Heiser, 2013].

Fastpath

To evaluate IPC fastpath performance, we set up a client (the caller) and server (the callee) in different address spaces. We take timestamps on either side of the IPC operation being benchmarked and record the difference. For each recorded value, the operation is first performed 16 times to prime the cache before the next value is recorded; the results presented are for a total of 16 such recorded values. Additionally, we measure the overhead of the system-call stubs in the same way and subtract it from the measurement, to obtain only the kernel cost of the operation1. The message sent is zero length, so neither the caller's nor the callee's IPC buffer is accessed.

As discussed in Section 7.3.1, timer operations are avoided on the fastpath. However, we do add significantly to the seL4_Call fastpath, by accessing two further objects (the scheduling context and resume object) and adding two validation checks. In detail, the seL4_Call fastpath is altered as follows:

• We remove the reply capability lookup, so the sending TCB's cnode is no longer accessed.

• We read the resume object pointer from the thread state; however, the thread state is already accessed by the seL4_Call fastpath.

• We check that the resume object is valid.

• We read from and write to the resume object to push it onto the call stack.

• We check that the destination TCB is passive.

• We write to the scheduling context that is lent over the seL4_Call to link it to the receiver, and update its back pointer to the resume object.

1The IPC benchmarks already existed for seL4, but we modify them to support the MCS kernel as part of this thesis.

Platform   Baseline   MCS        Overhead
KZM        304 (0)    350 (1)    46    15%
SABRE      312 (1)    334 (2)    22    7%
HIKEY32    252 (4)    275 (0)    23    9%
HIKEY64    266 (5)    303 (5)    37    13%
TX1        392 (2)    424 (6)    32    8%
IA32       412 (1)    447 (2)    35    8%
X64        432 (1)    449 (2)    17    3%

Table 8.4: Time in cycles for the seL4_ReplyRecv fastpath. Standard deviations shown in parentheses.


Table 8.3 shows the results. Overheads on the seL4_Call fastpath are low, with the worst affected platform being KZM, with a 10% overhead; it is the oldest platform, with the smallest caches. On call, KZM shows an L1 data cache miss and one memory access, which is likely an unlucky cache conflict. SABRE suffers a 28% overhead with high variance, due to a translation lookaside buffer (TLB) miss, likely a result of the relatively low associativity of Cortex-A9 TLBs. In general, as hardware becomes newer the overhead is lower, with the exception of the HIKEY, which is the only platform we use that has in-order execution. Additionally, HIKEY has a smaller L2 cache than the older SABRE, but a larger TLB with higher associativity. We posit that further hand optimisation of the fastpath would result in better performance on the HiKey. The newest ARM hardware is the TX1, which only shows a 5% overhead. x86, with its large caches and highly optimised pipeline, only incurs a 1% overhead. For further detail on the performance counter values cited here, see Appendix C. On IA32 we see a minor reduction in the time for seL4_Call, although the number of instructions executed increases: this is most likely due to the long pipeline, with the additional instructions reducing data dependencies. Overall, fastpath overheads on x86 are low.

The fastpath for seL4_ReplyRecv has a higher impact, for two reasons. First, we add a new capability lookup, for the resume object provided to receive, which must be looked up by the seL4_ReplyRecv fastpath. Although we remove the reply capability from the TCB cnode, that cnode is still accessed to validate the caller's root cnode capability and top-level page directory capability, so the cache footprint is not reduced. Second, because IPC is now priority ordered, the fastpath must check whether the callee's TCB can simply be appended to the endpoint queue; otherwise the fastpath is aborted and the slowpath performs an ordered insert. As a result, the overheads shown in Table 8.4 for seL4_ReplyRecv are generally higher than for seL4_Call, except for SABRE, which already suffered some TLB misses on the baseline kernel, again due to low TLB associativity.

                   KZM     SABRE   HIKEY32  HIKEY64  TX1    IA32   X64
Baseline           2908    1808    1938     1645     1554   910    769
Overhead (%)       81%     41%     61%      43%      17%    31%    31%
Cycle count        +2346   +745    +1186    +705     +257   +283   +240
L1 I-miss          -10     +0      +15      +11      +0     +0     +0
L1 D-miss          +10     +0      +0       -3       -4     +0     +0
TLB I-miss         +0      +1      +0       +0       +0     +0     +0
TLB D-miss         +0      +11     +0       +0       +0     +0     +0
Instructions       +1193   +688    +1122    +719     +726   +809   +721
Branch mispredict  +43     +5      -1       +1       +0     +0     +0
Memory access      +14     +0      +462     +307     +312   +0     +0

Table 8.5: Passive slowpath round-trip IPC overhead.

Slowpath

Now we evaluate the IPC slowpath, for both active and passive threads in the same address space. Unlike the fastpath, the slowpath generally does not fit into the cache. Consequently, precise measurements like those done for the fastpath show erratic results with high standard deviations on our hardware. Instead, we run averaging benchmarks for the slowpath: for 110 runs, we measure the time taken for 10,000 IPC pairs (seL4_Call, seL4_ReplyRecv) and divide by 10,000 to obtain a result, abandoning the first 10 results. In order to hit the slowpath, we make the message length 10 words, which violates the fastpath condition that the message must fit into registers. We run the benchmark for both active and passive threads, which shows the overhead of changing a scheduling context versus not.

Table 8.5 shows the results for the passive slowpath and Table 8.6 for the active slowpath. The top row in each table is the baseline number and the second row is the overhead, comparing the number of cycles taken for the benchmark; every other row shows the difference in value of the listed performance counter between baseline and MCS. We do not show standard deviations; however, they are measured and are very small (either 0 or a few percent of the mean).

Passive slowpath overhead, as shown in Table 8.5, is not small; this is because the seL4 scheduler has far more functionality than previously, increasing the code size, and the cache footprint is higher, given that the resume object and scheduling context must be accessed in addition to the TCBs of the caller and callee. Looking at the ARM platforms, the overhead reduces drastically as hardware becomes more modern: for the most part, this means that the caches are bigger and timer access costs are smaller. The exception is the HIKEY platform, which has an in-order execution pipeline and a smaller L2 cache than the SABRE; we suspect that the overheads could be reduced on HiKey with more careful hand optimisation. 64-bit HiKey has nearly half the excess instructions when compared to 32-bit,

                   KZM     SABRE   HIKEY32  HIKEY64  TX1    IA32   X64
Baseline           2908    1808    1938     1644     1553   910    769
Overhead (%)       92%     80%     53%      36%      8%     75%    83%
Cycle count        +2689   +1455   +1030    +597     +122   +678   +639
L1 I-miss          +0      +0      +9       +10      +0     +0     +0
L1 D-miss          +11     +0      +0       -3       -4     +0     +0
TLB I-miss         +0      +0      +0       +0       +1     +0     +0
TLB D-miss         +0      +9      +0       +0       +0     +0     +0
Instructions       +1112   +599    +1022    +586     +593   +670   +593
Branch mispredict  +48     +7      -2       +0       -1     +0     +0
Memory access      +26     +0      +400     +268     +270   +0     +0

Table 8.6: Active slowpath round-trip IPC overhead.

and 50% fewer memory accesses, resulting in far less overhead on HIKEY64 than HIKEY32. SABRE suffers from the low associativity of its TLB, with misses in both the instruction and data TLBs. On the x86 platforms, both IA32 and X64 show the same relative overhead, in both cases due to the increase in instructions.

Another factor that complicates the slowpath is the compiler. We use GCC version 5.5.0 with optimisation level O2 for all benchmarks, with the mtune parameter set as appropriate. All kernel code is pre-processed and concatenated into a single file with the whole-program attribute set, which allows for greater compiler optimisation. However, the level of optimisation for a specific platform, especially ARM platforms, is not known, and contributes to the variance between platforms and between ARM versions. Additionally, it should be noted that the slowpath is avoided in most cases, as long IPC is discouraged in favour of establishing shared-memory protocols to transfer large amounts of data. The majority of fastpath checks make sure the thread being switched to is valid and runnable; only two cases hit the slowpath frequently: if a medium-priority thread has woken and is higher priority than the IPC target, or if the target is active, not passive. Many improvements could be made to the IPC slowpath to improve performance, but it has not been examined extensively.

Table 8.6 shows results for the active slowpath, which has higher overheads again. In this case we change scheduling contexts, so not only does another object (the second SCO) need to be accessed, but the timer needs to be reprogrammed and the sporadic replenishment logic applied. TX1 and HIKEY64 have the lowest overhead, although the overhead appears to be ameliorated by lucky data placement, with a reduction in L1 D-cache misses on both platforms. The vast majority of the overhead comes from timer access and reprogramming, far more instructions executed and, on platforms with small caches or low associativity, the increase in cache footprint.

                   SABRE   HIKEY32  HIKEY64  TX1    IA32   X64
Baseline           1545    1603     1323     1338   1221   1226
Overhead (%)       52%     67%      44%      21%    25%    24%
Cycle count        +798    +1078    +575     +288   +303   +290
L1 I-miss          +0      +15      +2       +0     +0     +0
L1 D-miss          +0      +0       -1       -4     +0     +0
TLB I-miss         +0      +0       +0       +0     +0     +0
TLB D-miss         +13     +0       +0       +0     +0     +0
Instructions       +696    +1045    +651     +652   +704   +643
Branch mispredict  +8      -3       -1       +0     +0     +0
Memory access      +0      +394     +238     +252   +0     +0

Table 8.7: Passive fault round-trip IPC overhead.

8.2.3 Faults

Recall that fault handling in seL4 occurs via an IPC, simulated by the kernel, to a fault endpoint, which a fault-handling thread can wait upon for messages (Section 5.5.3). To measure the fault-handling cost, we run two threads in the same address space: a fault handler and a faulting thread, with the same priority. We trigger a fault by executing an undefined instruction in a loop on the faulting thread's side. The fault handler then increments the instruction pointer past the undefined instruction, and the benchmark continues. As this is also a slowpath, we use the same method as above, measuring the time taken for 10,000 faults and then dividing by 10,000. We do this for 110 runs, abandoning the first 10, to calculate the standard deviation and average.
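The fault-handler side of this benchmark is conceptually the loop below. This is a sketch: the assumption that the faulting instruction pointer arrives in the first message register, and the advance_pc helper for updating the faulter's registers, are ours rather than the benchmark's actual code.

/* Sketch of the fault-handling loop (API details assumed, not definitive). */
seL4_Word badge;
seL4_MessageInfo_t info = seL4_Recv(fault_ep, &badge, reply_cap);
for (;;) {
    /* assumed: the fault message carries the faulting instruction pointer */
    seL4_Word fault_ip = seL4_GetMR(0);

    /* advance past the undefined instruction (instruction size assumed),
     * then resume the faulter and wait for the next fault */
    advance_pc(faulter_tcb, fault_ip + 4);
    info = seL4_ReplyRecv(fault_ep, info, &badge, reply_cap);
}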

                   SABRE   HIKEY32  HIKEY64  TX1    IA32   X64
Baseline           1544    1604     1322     1339   1221   1226
Overhead (%)       70%     59%      34%      11%    57%    55%
Cycle count        +1085   +941     +451     +150   +693   +674
L1 I-miss          +0      +7       +2       +0     +0     +0
L1 D-miss          +0      +0       -1       -4     +0     +0
TLB I-miss         +0      +0       +0       -1     +0     +0
TLB D-miss         +11     +0       +0       +0     +0     +0
Instructions       +603    +942     +513     +516   +561   +513
Branch mispredict  +8      -1       -2       -1     +0     +0
Memory access      +0      +331     +196     +203   +0     +0

Table 8.8: Active fault round-trip IPC overhead.

We measure both active and passive fault handling. The results, shown in Tables 8.7 and 8.8, are similar to slowpath IPC, being slower for the active case, and improving as caches grow and timer operation costs shrink. Note that although there is currently no fastpath for fault handling, it is merely a matter of engineering effort to add one should this become a performance issue. We do not show results for the KZM platform: as it is ARMv6, the performance monitoring unit, including the cycle counter, cannot be read from user mode. We instead use an undefined-instruction trap to read the cycle counter efficiently on that platform, so the fault benchmark does not work there.

Platform (fastpath?)   Baseline      MCS           Overhead
KZM (yes)              485 (13)      1251 (24)     766   157%
KZM (no)               753 (15)      1573 (23)     820   108%
SABRE (yes)            1616 (99)     1794 (82)     178   11%
SABRE (no)             1912 (104)    2073 (109)    161   8%
HIKEY32 (yes)          528 (6)       763 (2)       235   44%
HIKEY32 (no)           701 (4)       956 (1)       255   36%
HIKEY64 (yes)          509 (30)      657 (21)      148   29%
HIKEY64 (no)           653 (8)       803 (21)      150   22%
TX1 (yes)              767 (8)       815 (12)      48    6%
TX1 (no)               862 (19)      864 (14)      2     0%
IA32 (yes)             1062 (53)     1147 (55)     85    8%
IA32 (no)              1190 (54)     1483 (53)     293   24%
X64 (yes)              1108 (55)     1239 (53)     131   11%
X64 (no)               1328 (54)     1533 (53)     205   15%

Table 8.9: Fastpath and non-fastpath IRQ overhead in cycles, baseline seL4 versus MCS. Standard deviations shown in parentheses.

8.2.4 Experimental fastpaths

As noted in the previous section, mainly engineering effort is required to add new fastpaths. For this reason, we add two new (non-verified) experimental fastpaths to both kernels: one for interrupt delivery, and the other for signalling a low-priority thread. We measure baseline versus MCS results for both operations, with and without the fastpaths.

Interrupt fastpath

We measure interrupt latency using two threads: one spinning in a loop updating a volatile timestamp from the cycle counter, the other a higher-priority thread waiting for an interrupt. On delivery, the handler thread determines the interrupt latency by subtracting the spinner's timestamp from the current time. We repeat this 110 times, discarding the first 10 results.

For the interrupt path, the experimental fastpath covers the cases where the interrupt either does not result in a context switch, or results in a switch to a higher-priority runnable thread. Note that this would be faster if we picked just one of these cases; however, without workload specifics it is not clear whether one should be prioritised over the other. Additionally, as interrupts are by nature preemptive, we must switch scheduling contexts, which incurs the time to reprogram the timer and charge the previous thread. In both cases, the scheduler itself is avoided.

Table 8.9 shows the results. On most platforms, the overhead remains the same regardless of whether the fastpath is used or not, except on the x86 platforms. In the case of the ARM platforms,

129 the IRQ fastpath yields a performance improvement of a few hundred cycles, which can ameliorate somewhat the overheads of the MCS model, but not completely. In this case we see the majority of the MCS overhead is due to the operations required to switch scheduling contexts. KZM shows the highest overhead, where once again we exceed the small cache. The HIKEY pays for its in-order execution pipeline, which could be improved with further profiling of the fastpath. The TX1 shows the least impact, with only a 6% increase on the interrupt fastpath.

Signal

In this benchmark, a high-priority thread signals a low-priority thread, a common operation for interrupt service routines. The experimental signal fastpath optimises this exact case: where no thread switch is triggered by the signal, and execution returns to the high-priority thread. Table 8.10 shows the results. The first thing to note is that this fastpath offers a performance improvement of several hundred cycles over the slowpath for this specific case, due to the reduction in branching and the cache-friendly, local code provided by the fastpath. The non-fastpath cases show a high overhead when comparing MCS to baseline, although again the impact decreases on newer hardware. However, because the fastpath barely differs between baseline and MCS, we see a massive reduction in overhead when comparing fastpaths to slowpaths. If the signal operation were to become a bottleneck, merging this fastpath into the master kernel is an option.

Platform (fastpath?)   Baseline     MCS           Overhead
KZM (yes)              150 (0)      149 (0)       -1    0%
KZM (no)               664 (104)    1397 (91)     733   110%
SABRE (yes)            134 (10)     140 (13)      6     4%
SABRE (no)             427 (63)     800 (67)      373   87%
HIKEY32 (yes)          118 (0)      118 (0)       0     0%
HIKEY32 (no)           452 (66)     869 (75)      417   92%
HIKEY64 (yes)          198 (14)     197 (14)      -1    0%
HIKEY64 (no)           464 (57)     638 (103)     174   37%
TX1 (yes)              267 (20)     269 (14)      2     0%
TX1 (no)               456 (38)     534 (59)      78    17%
IA32 (yes)             176 (1)      175 (6)       -1    0%
IA32 (no)              317 (43)     452 (74)      135   42%
X64 (yes)              129 (2)      132 (2)       3     2%
X64 (no)               265 (3)      334 (63)      69    26%

Table 8.10: Fastpath signal overhead in cycles, baseline seL4 versus MCS. Standard deviations shown in parentheses.

                   KZM     SABRE   HIKEY32  HIKEY64  TX1    IA32   X64
Baseline           628     473     435      421      481    254    182
Overhead (%)       187%    95%     89%      30%      9%     113%   145%
Cycle count        +1172   +451    +385     +128     +44    +286   +266
L1 I-miss          +1      +0      +0       +1       +0     +0     +0
L1 D-miss          +0      +0      +0       +0       +0     +0     +0
TLB I-miss         +0      +0      +0       +0       +0     +0     +0
TLB D-miss         +0      +5      +0       +0       +0     +0     +0
Instructions       +580    +215    +426     +129     +132   +207   +134
Branch mispredict  +25     +5      +1       -2       -1     +0     +0
Memory access      +7      +0      +121     +61      +61    +0     +0

Table 8.11: Yield to self costs. Baseline seL4 versus MCS. Standard deviations shown in parentheses.

8.2.5 Scheduling

In our final microbenchmark we look at the cost of the scheduler and the seL4_Yield system call, both of which are slowpaths. Table 8.11 shows the results of measuring the average cost for the current thread to seL4_Yield to itself, which in the current kernel code always triggers a reschedule. The overhead is large, as seL4_Yield is now nearly a completely different system call: previously seL4_Yield was incomplete, and would simply dequeue and enqueue the current thread from the scheduling queues before returning to user level. In the new model, seL4_Yield charges the remaining budget in the head replenishment to the thread, and triggers a timer reprogram. Although semantically similar for round-robin threads, seL4_Yield does a lot more, hence the drastic overheads. ARM platforms show a large increase in instructions executed, and older ARM platforms experience some cache misses. x86 platforms show a smaller increase in instructions executed, but recall from Table 8.2 that, when run in a tight loop, the overhead of reading and reprogramming the timer is over 200 cycles.

The schedule benchmark measures the cost of two threads in different address spaces switching to each other in a loop by changing each other's priority. A high-priority thread increases the priority of a low-priority thread; the low-priority thread then sets its own priority back down, resulting in a reschedule to the high-priority thread. Both threads do this 50,000 times, and we run the benchmark for 110 runs, abandoning the first 10, so the caches are primed. This results in 10,000 invocations of the scheduler per benchmark run, as both threads do the switching. Of course, setting the priority of a thread higher does not need to invoke the scheduler, but this is the current state of the source code2, and allows us to run this benchmark.

2We have a patch to fix this which is waiting in the queue for verification.
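The ping-pong structure of the schedule benchmark is roughly as follows. This is a sketch with our own names; the seL4_TCB_SetPriority argument list differs between seL4 API versions (some take an authority TCB), so treat the exact call as an assumption.

/* Sketch: two threads force scheduler invocations by flipping priorities. */
#define LOW         10
#define HIGH        20
#define ITERATIONS  50000

void high_prio_thread(void)      /* runs at priority HIGH */
{
    for (int i = 0; i < ITERATIONS; i++) {
        /* raising the other thread's priority causes a switch to it */
        seL4_TCB_SetPriority(low_tcb_cap, authority_tcb, HIGH);
    }
}

void low_prio_thread(void)       /* starts at priority LOW */
{
    for (int i = 0; i < ITERATIONS; i++) {
        /* dropping ourselves back down switches back to the other thread */
        seL4_TCB_SetPriority(own_tcb_cap, authority_tcb, LOW);
    }
}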

                   KZM     SABRE   HIKEY32  HIKEY64  TX1    IA32   X64
Baseline           1056    703     707      653      637    406    349
Overhead (%)       91%     66%     58%      46%      27%    127%   155%
Cycle count        +957    +466    +409     +302     +169   +518   +543
L1 I-miss          +5      +0      +0       +6       +0     +0     +0
L1 D-miss          +1      +0      +0       +0       +0     +0     +0
TLB I-miss         +0      +2      +0       +0       +0     +0     +0
TLB D-miss         +0      +10     +0       +0       +0     +0     +0
Instructions       +340    +208    +317     +179     +184   +234   +168
Branch mispredict  +24     -3      +4       +2       +1     +0     -1
Memory access      +8      +0      +132     +88      +105   +0     +0

Table 8.12: Scheduler costs in cycles, baseline seL4 versus MCS. Standard deviations shown in parentheses.

Table 8.12 shows the results. On x86, about half of the overhead can be attributed to the increase in instructions and the rest to reading and reprogramming the timer. The overhead looks large on both IA32 and X64; however, note the very small baseline value. On ARM, we see a larger overhead that decreases on newer platforms. The scheduler is by definition a slowpath activity, as it is completely avoided on the fastpath, and on much of the slowpath, by the lazy scheduling mechanism (Section 5.6.1). Table 8.12 shows that scheduling cost increases noticeably; however, seL4 IPC, particularly scheduling-context donation (and its predecessor, the undisciplined timeslice donation), is designed to minimise the need to invoke the scheduler, so this increase is unlikely to have a noticeable effect in practice.

8.2.6 Full system benchmark

To demonstrate the impact of the overheads in a real system scenario, we measure the performance of the Redis key-value store [RedisLabs, 2009] using the Yahoo! Cloud Serving Benchmark (YCSB) [Cooper et al., 2010] on baseline and MCS seL4, and compare this against Linux, the Rump unikernel [Kantee and Cormack, 2014] and NetBSD [The NetBSD Foundation, 2018], all on the X64 machine. Note that the Rump unikernel currently only supports x86 platforms, so experiments requiring Rump are only carried out on the X64 platform. For seL4, we use a single-core Rump library OS [Kantee and Cormack, 2014] to provide NetBSD network drivers at user level, leveraging an existing port of this infrastructure to seL4 [McLeod, 2016]. The system consists of Redis/Rump running on three active seL4 threads: two for servicing interrupts (network, timer) and one for Rump, as shown in Figure 8.1. Interrupt threads run at the highest priority, followed by Redis and a low-priority

idle thread (not shown) for measuring CPU utilisation; this setup forces frequent invocations of the scheduler and interrupt path.


Figure 8.1: System architecture of the Redis / YCSB benchmark on seL4, Linux, NetBSD and Rump unikernel.

Table 8.13 shows the throughput achieved by Redis+Rump running on the bare metal kernel (BMK), and by Redis on the seL4 baseline as well as the MCS branch, plus Linux and NetBSD (7.0.2) for comparison. Figure 8.1 shows the seL4 setup, compared with the architecture of Redis on Linux, NetBSD and BMK. Table 8.13 indicates the interrupt-handling method used, as there is no single method supported across all four scenarios: BMK only supports the legacy programmable interrupt controller (PIC), while NetBSD only supports message-signalled interrupts (MSIs); Linux and seL4 both support the advanced PIC (APIC). The utilisation figures show that the system is fully loaded, except in the Linux case, where there is a small amount of idle time. The cost per operation (utilisation over throughput) is best on Linux, a result of its highly optimised drivers and network stack. Our bare-metal and seL4-based setups use Rump's NetBSD drivers, and performance is within a few percent of native NetBSD. This indicates that the MCS model comes with low overhead.

8.3 Temporal Isolation

We have demonstrated that our model has little overhead and is competitive with existing monolithic kernels. We now evaluate its temporal isolation properties, both between processes and in a shared-server scenario. In addition, we evaluate and demonstrate different techniques for restoring server state after a timeout exception.

System      IRQ    Throughput   Utilisation   Cost per op.   Latency
                   (k ops/s)    (%)           (ms)           (ms)
seL4-base   APIC   138.7 (0.4)  100           0.72           1.4
seL4-MCS    APIC   138.5 (0.3)  100           0.72           1.4
seL4-MCS    MSI    127.3 (0.6)  100           0.79           1.6
NetBSD      MSI    134.0 (0.2)  99            0.74           1.5
Linux       MSI    179.4 (0.4)  95            0.52           1.1
Linux       APIC   111.9 (0.4)  100           0.89           1.8
BMK         PIC    144.1 (0.2)  100           0.69           1.4

Table 8.13: Throughput (k ops/s) achieved by Redis using YCSB workload A with 2 clients. Latency is the average of Read and Update; standard deviations in parentheses, omitted where less than the least significant digit shown.

8.3.1 Process isolation

We evaluate process isolation, where processes do not share resources, indirectly via network throughput and network latency in two separate benchmarks.

Network throughput

First, we demonstrate our isolation properties with the Redis architecture described in Section 8.2.6, with an additional high-priority, active CPU-hog thread competing for CPU time. All scheduling contexts in the system are configured with a 5 ms period. We use the budget of the CPU-hog to control the amount of time left over for the server configuration. Figure 8.2 shows the throughput achieved by the YCSB-A workload as a function of the available CPU bandwidth (i.e. the complement of the bandwidth granted to the CPU-hog thread). All data points are the average of three benchmark runs.


Figure 8.2: Throughput of Redis YCSB workload A and idle time versus available bandwidth.

The graph shows that the server is CPU-limited (as indicated by very low idle time) and consequently throughput scales linearly with available CPU bandwidth.

Network latency

Second, we evaluate process isolation via network latency in the system shown in Figure 8.3. The system consists of a single-core Linux virtual machine (VM), which runs at a high priority with a constrained budget, and a User Datagram Protocol (UDP) echo server running at a lower priority, representing a lower-rate HIGH thread. We measure the average and maximum UDP latency reported by the ipbench [Wienand and Macpherson, 2004] latency test.


Figure 8.3: System architecture of ipbench benchmark.

Specifically, the Linux VM interacts with timer (PIT) and serial device drivers implemented as passive servers outside the VM; all three components are at high priority. In the Linux server we run a program (yes > /dev/null) which consumes all available CPU bandwidth. The UDP echo server, completely isolated from the Linux instance during the benchmark but sharing the serial driver, runs at a low priority with its own HPET timer driver. Two client machines run ipbench daemons to send packets to the UDP echo server on the target machine (X64). The control machine, one of the load generators, runs ipbench with a UDP socket at 10 Mb/s over a 1 Gb/s Ethernet connection with 100-byte packets.

The Linux VM has a 10 ms period and we vary the budget between 1 ms and 9 ms; we represent the zero-budget case by an unconstrained Linux that is not running any user code. Any time not consumed by Linux is available to the UDP echo server for processing 10,000 packets per second, or 100 packets in the time left over from each of Linux's 10 ms periods. Figure 8.4 shows the average and maximum UDP latencies for ten runs at each budget setting. We can see that the maximum latencies follow exactly the budget of the Linux server (black line) up to 9 ms. Only when Linux has a full budget (10 ms), and is thus able to monopolise the processor, does the UDP server miss its deadlines, resulting in a latency spike. This result shows that our sporadic server implementation is effective in bounding the interference of a high-priority process.


Figure 8.4: Average and maximum latency of UDP packets with a CPU-hog VM running a high priority with a 10 ms budget.


Figure 8.5: Architecture of the AES case study. Clients A and B make IPC requests over endpoint E1 to the passive AES server S, which has an active timeout-fault-handling thread T waiting for fault IPC messages on endpoint E2.

8.3.2 Server isolation

To demonstrate temporal isolation in a shared server, we use a case study of an encryption service using the advanced encryption standard (AES) to encrypt client data. We measure both the overhead of different recovery techniques, and the throughput achieved when two clients constantly run out of budget in the server. We port an existing, open-source AES implementation to a shared server running on seL4, and run benchmarks on both X64 and SABRE. Figure 8.5 shows the architecture of the case study. Clients A and B are single-threaded and exist in separate address spaces from the server. The server has two threads: a passive thread which serves requests on the client's scheduling context, and an active thread which handles timeout exceptions for the server. The server and timeout exception handler share the same virtual memory and capability spaces.

The server itself runs the AES-256 algorithm with a block size of 16 bytes. The server alternates between two buffers, swapped atomically, one of which always contains consistent state while the other is dirty during processing. When a timeout fault occurs, only the dirty buffer is lost due to inconsistency. Both clients request 4 MiB of data to be encrypted, provided through a buffer shared between client and server, and have a budget insufficient to complete the request. When the server is running on the client's scheduling context and the budget expires, a timeout exception, which is a fault IPC message from the kernel, is delivered to the timeout handler.
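The two-buffer checkpointing scheme can be sketched as follows. This is a simplification of the case-study code, with our own names, intended only to show why at most one buffer's worth of work is lost on a timeout.

/* Sketch: double-buffered AES state so a timeout only loses the dirty buffer. */
typedef struct {
    uint8_t state[STATE_SIZE];   /* encryption progress for the current request */
    size_t  bytes_done;
} checkpoint_t;

static checkpoint_t buffers[2];
static checkpoint_t *clean = &buffers[0];   /* always consistent */
static checkpoint_t *dirty = &buffers[1];   /* being updated     */

void encrypt_request(const uint8_t *in, uint8_t *out, size_t len)
{
    for (size_t off = 0; off < len; off += BATCH_SIZE) {
        /* work on the dirty buffer, starting from the last clean checkpoint */
        *dirty = *clean;
        aes_encrypt_batch(dirty, in + off, out + off, BATCH_SIZE);

        /* publish the finished batch: the dirty buffer becomes the new clean
         * checkpoint (in the real server this swap is a single atomic update);
         * a timeout before this point loses only the partially processed batch */
        checkpoint_t *tmp = clean;
        clean = dirty;
        dirty = tmp;
    }
}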

8.3.3 Timeout handling techniques

If a client's scheduling context is depleted while the server is servicing its request, a timeout fault is raised and sent to the timeout handler. The appropriate protocol for handling such faults depends ultimately on the requirements of the system. Consequently, we implement and evaluate four different timeout-fault handling techniques: rollback, kill, emergency and extend. While each technique is evaluated separately, combinations of these techniques could be used for different clients, depending on their requirements; for example, trusted clients may get special treatment. Note that in all cases of blocking IPC, clients must trust the server, as discussed in Section 6.2. Additionally, although our experiment places the timeout fault handler in the server's address space, this is not necessary: for approaches that require access to scheduling contexts and scheduling-control capabilities, the timeout handler may be placed in a scheduling server's address space, separate from the server itself.

Rollback

Rollback restores the server to the last recorded consistent state. For non-thread-safe servers, this may require rolling back an entire request; however, algorithms like AES, which can easily be batched, can make progress. The process for rollback is as follows:

1. A timeout fault is received by timeout handler, T over E2.

2. T constructs a reply message to the client from the last clean state, and sends this message to the client by invoking the resume capability that the server has used for that client. Because the resume object tracks the donation, the client’s scheduling context is returned.

3. T then restores the server, S, to a known state (restoring registers, stack frames and any global state). S is restored to the point before it made its last system call, usually an seL4_NBSendRecv, which it used to indicate to the initialiser that S should be made passive, as part of the passive server initialisation protocol presented in Section 7.2.4.

4. Now the server must return to blocking on the IPC endpoint, E1. T binds a scheduling context to S and waits for a message.

5. Now S runs from its checkpoint, repeating the seL4_NBSendRecv, signalling to T

that it can now be converted to passive once more and blocking on E1.

6. T wakes and converts the server back to passive.

7. Finally, T blocks on E2, ready for further timeout faults.

The rollback technique requires the server and the timeout handler to both have access to the resume object that the server is using, and to the server's TCB, meaning the timeout handler must be trusted by the server. In our example the server and timeout handler run at the same priority; in all cases both must run at higher priorities than the clients, to allow the timeout handler to reinitialise the server in the correct order, and to implement the IPCP. Once the budget of the faulting client is replenished, it can continue the request based on the content of the reply message sent by the timeout handler. Clients are guaranteed progress as long as their budget is sufficient to complete a single batch of data. If rollback is not suitable, the server can be similarly reset to its initial state and an error returned to the client; however, this does not guarantee progress for clients with insufficient budgets.
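A heavily simplified sketch of the rollback handler loop follows. The helper functions, the message layout and the exact capability invocations are assumptions made for illustration; the actual protocol is the seven-step sequence above.

/* Sketch of the rollback timeout handler (names and layout assumed). */
for (;;) {
    seL4_Word badge;
    seL4_MessageInfo_t info = seL4_Recv(timeout_ep, &badge, handler_reply_cap);   /* E2 */

    /* reply to the client from the last clean checkpoint; invoking the
     * server's resume capability also returns the client's SC */
    seL4_MessageInfo_t reply = build_reply_from_checkpoint(clean);
    seL4_Send(server_resume_cap, reply);

    /* restore the server's registers, stack and globals to the point just
     * before its last seL4_NBSendRecv */
    restore_server_state(server_tcb, checkpointed_state);

    /* lend the handler's SC to the server so it can re-run the passive-server
     * initialisation protocol, then convert it back to passive */
    bind_sc(server_tcb, handler_sc);
    wait_for_init_signal();
    unbind_sc(server_tcb);
}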

Kill

In the case of non-preemptible servers, potentially due to a lack of thread safety, one option is to kill client threads. This stops untrusted, misbehaving clients from constantly monopolising server time. We implement an example where the timeout handler has access to client TCB capabilities and simply calls suspend; alternatively, the server could switch to a new resume object and leave the client blocked forever, without requiring access to any of the client's capabilities. The process for suspending the client is the same as that for rollback, except for two aspects: the server state does not need to be altered by the timeout handler, as the server always restores to the same place, and instead of replying to the client, the client is suspended.

Emergency

Another technique gives the server a one-off emergency budget to finish the client request, after which the exception handler resets the server to passive. This could be used in low-criticality SRT scenarios where isolation is desired but transient overruns are expected. An example emergency protocol follows:

1. A timeout fault is received by timeout handler, T, over E2.

2. T unbinds the client's scheduling context from S.

3. T binds a new scheduling context to S, which acts as an emergency budget.

138 4. T then replies to the timeout fault, resuming S.

5. Now T enqueues itself on E1, such that when the server finishes the blocked request, T, being a higher priority, is served next.

6. S completes the request and replies to the client.

7. S receives the request from T, and replies immediately, as it is an empty request.

8. T wakes, and converts the server back to passive.

9. Finally T blocks on E2, ready for further timeout faults.

This case requires the timeout handler to have access to clients’ scheduling contexts, in order to unbind them from the server.

Extend

The final technique is to simply increase the client's budget on each timeout, which requires the timeout fault handler to have access to the client's scheduling context. This could be deployed in SRT systems, or for specific threads with unknown budgets, up to a limit.

1. A timeout fault is received by timeout handler, T over E2.

2. T extends the client's budget by reconfiguring the client's scheduling context.

3. T replies to the fault message, which resumes S.
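A sketch of the extend handler follows. seL4_SchedControl_Configure is the MCS invocation for setting budget and period, but the argument list shown is an assumption, and the budget limit and increment are ours.

/* Sketch: extend the faulting client's budget on each timeout (API assumed). */
seL4_Word badge;
seL4_MessageInfo_t info = seL4_Recv(timeout_ep, &badge, reply_cap);   /* E2 */
for (;;) {
    client_t *c = client_for_badge(seL4_GetMR(0));   /* SC badge from the fault message */
    if (c->budget < c->budget_limit) {
        c->budget += BUDGET_INCREMENT;
        /* reconfigure the client's scheduling context with the larger budget;
         * the exact parameters (extra refills, badge) are assumptions */
        seL4_SchedControl_Configure(sched_control_cap, c->sc_cap,
                                    c->budget, c->period, 0, c->badge);
    }
    /* reply to the timeout fault, which resumes the server, and wait again */
    info = seL4_ReplyRecv(timeout_ep, info, &badge, reply_cap);
}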

8.3.4 Results

We measure the pure handling overhead in each case, from when the timeout handler wakes up to when it blocks again. Given the small amount of rollback state, this measures the baseline overhead; for schedulability analysis, the actual cost of the rollback would have to be added, in addition to the duration of the timeout fault IPC. We run each benchmark with hot caches (primed by some warm-up iterations) as well as cold (flushed) caches, and measure the latency of timeout handling, from the time the handler wakes up until it replies to the server. Table 8.14 shows the results. The maximum cold-cache cost, which is relevant for schedulability analysis, differs by a factor of 3–4 between the different recovery scenarios, indicating that all are about equally feasible. Approaches that restart the server and send IPC messages on its behalf (rollback, kill) are the most expensive, as they must restore the server state from a checkpoint and follow the passive server initialisation protocol (recall Section 7.2.4).

Platform   Operation   Cache   Min     Max     Mean    σ
X64        Rollback    hot     4.16    8.23    5.54    1.15
                       cold    11.90   14.64   13.26   0.52
           Emergency   hot     2.69    4.97    3.39    0.41
                       cold    6.62    8.36    7.06    0.31
           Extend      hot     0.46    2.26    0.81    0.38
                       cold    2.59    4.02    2.98    0.24
           Kill        hot     3.86    4.12    3.94    0.03
                       cold    11.59   13.42   12.25   0.49

SABRE      Rollback    hot     11.2    13.3    12.1    0.5
                       cold    19.1    25.2    23.8    1.4
           Emergency   hot     4.4     5.6     4.9     0.2
                       cold    13.1    14.5    13.9    0.3
           Extend      hot     0.7     2.9     1.9     0.4
                       cold    6.9     8.0     7.3     0.3
           Kill        hot     10.0    12.5    11.1    0.5
                       cold    21.9    23.2    22.6    0.3

HIKEY64    Rollback    hot     5.15    6.91    5.77    0.39
                       cold    31.75   38.20   36.69   0.71
           Emergency   hot     4.45    7.53    5.55    0.56
                       cold    19.31   21.02   20.22   0.35
           Extend      hot     0.65    1.63    0.90    0.21
                       cold    10.69   11.90   11.28   0.28
           Kill        hot     5.17    7.07    6.07    0.44
                       cold    32.24   34.43   33.14   0.43

TX1        Rollback    hot     2.24    2.66    2.38    0.10
                       cold    7.04    8.36    8.07    0.14
           Emergency   hot     1.09    1.36    1.22    0.07
                       cold    4.58    4.92    4.75    0.07
           Extend      hot     0.28    0.52    0.31    0.04
                       cold    2.50    2.72    2.64    0.05
           Kill        hot     2.20    2.45    2.22    0.03
                       cold    7.31    7.64    7.50    0.07

Table 8.14: Cost of timeout handler operations in µs, as measured by the timeout exception handler. σ is the standard deviation.

8.3.5 Rollback isolation

We next demonstrate temporal isolation in the server by using the rollback technique and measuring the time taken to encrypt 10 requests of 4 MiB of data.

Figure 8.6: Throughput for clients A and B of a passive AES server processing 10 requests of 4 MiB of data with limited budgets, on the X64 (top row) and SABRE (bottom row) platforms. The two clients' budgets add up to the period, which is varied between graphs (10, 100, 1000 ms). Clients sleep when they have processed each 4 MiB, until the next period, except when their budgets are full. Each data point is the average of 10 runs; error bars show the standard deviation.

Figure 8.6 shows the result with both clients having the same period, which we vary between 10 ms and 1000 ms. In each graph we vary the clients' budgets between 0 and the period. The extreme ends are special, as one of the clients has a full budget and keeps invoking the server without ever getting rolled back, thus monopolising the processor. In all other cases, each client processes at most 4 MiB of data per period, and either succeeds (if the budget is sufficient) or is rolled back after processing less than 4 MiB. The results show that in the CPU-limited cases (left graphs) we have the expected near-perfect proportionality between throughput and budget (with slight wiggles due to the rollbacks), showing isolation between clients. In the cases where there is headroom (centre of the right graphs), both clients achieve their desired throughput.

8.4 Practicality

Fundamental to the microkernel philosophy is keeping policy out of the kernel as much as possible, and instead providing general mechanisms that allow the implementation of arbitrary policies [Heiser and Elphinstone, 2016]. On the face of it, our fixed-priority-based model seems to violate this principle, but we demonstrate that the model is general enough to support the efficient implementation of alternative policies at user level. Specifically, we implement two user-level schedulers: first, a static mixed-criticality scheduler [Baruah et al., 2011b], which we also compare to an in-kernel implementation, and second, an EDF scheduler, which we compare to LITMUSRT.

Figure 8.7: Cost of switching the priority of n threads in kernel and at user level, with hot and cold caches, on X64 (top-left), TX1 (top-right), HIKEY64 (bottom-left) and SABRE (bottom-right). All data points are the average of 100 runs, with very small standard deviations.

8.4.1 Criticality

Static, mixed-criticality, fixed-priority schedulers are based on mode switches, which effectively means altering the priority of threads in bulk: critical threads that may be of low rate are bumped above their low-criticality counterparts, to ensure deadlines are met in exceptional circumstances. We implement a kernel mechanism for changing thread priorities in bulk, and compare it with a user-level approach which simply changes the priority of threads one at a time. In our prototype implementation, the kernel tracks all threads of specific criticalities and boosts their priority on a criticality switch. However, given that threads are kept in per-priority queues, each thread must be removed and reinserted into a new queue, so either way the complexity of the mode switch is O(n).

Platform   Kernel cold (µs)   User-level cold (µs)   Diff (µs)
X64        ≤ 2                ≤ 4                    ~2
TX1        ≤ 4                ≤ 8                    ~4
HIKEY64    ≤ 18               ≤ 30                   ~12
SABRE      ≤ 12               ≤ 22                   ~10

Table 8.15: Comparison of cold-cache, in-kernel priority switching to cold-cache, user-level priority switching.

Figure 8.7 shows the results, measured with a primed cache (hot) and flushed cache (cold). As the graph shows, switching is linear in the number of threads being boosted, in both the kernel and user-level implementations. Table 8.15 compares the results of cold, in-kernel switching to user level in roughly absolute terms. HIKEY64 is the slowest at 30 µs with eight threads to switch, due to its smaller L2 cache and in-order execution; however, all platforms show roughly the same factor-of-two increase when comparing kernel and user-level cold-cache results. Given that most systems will not have more than a few high-criticality threads, and that deadlines for critical control loops in cyber-physical systems tend to be in the tens of milliseconds, we conclude that criticality can be implemented at user level, in line with standard microkernel philosophy. The higher cost of the user-level operation results from multiple switches between kernel and user mode, and the repeated thread-capability look-ups. It could be significantly reduced if seL4 had a way to batch system calls, but to date we have seen no compelling use cases for this.

As a second criticality-switch benchmark, we ported three processor-intensive benchmarks from MiBench [Guthaus et al., 2001] to act as workloads. We use SUSAN, which performs image recognition, JPEG, which does image encoding/decoding, and MAD, which plays an MP3 file. Each benchmark runs in its own Rump process with an in-memory file system, and shares a timer and serial server. We chose these specific benchmarks as they were the easiest to adapt, as described below, and use them as workloads rather than for comparing systems, so there is no issue of bias from sub-setting. We alter the benchmarks to run periodically in multiple stages. To obtain execution times long enough, some benchmarks iterate a fixed number of times per stage. Each benchmark process executes its workload and then waits for the next period to start. Deadlines are implicit: if a periodic job finishes before the start of the next period, it is considered successful, otherwise the deadline is missed. We run the benchmarks on both the user-level and in-kernel implementations of the static mixed-criticality switch, with 10 runs of each.

Results are shown in Table 8.16. For both schedulers (kernel versus user-level), the results are nearly exactly the same. The differences, highlighted in Table 8.16, affect only the lowest-criticality thread in the criticality switch, with an additional missed deadline due to perturbations in run time caused by the slightly slower user-level scheduler.

Application            T     L   LS   C     U      j                 m
susan                  190   2   0    25    0.13   111 (0.0)         0 (0.0)
(image recognition)          2   1    51    0.27   111 (0.0)         0 (0.0)
                             2   2    127   0.67   111 (0.0)         0 (0.0)
jpeg                   100   1   0    15    0.15   200 (0.0)         0 (0.0)
(JPEG encode/decode)         1   1    41    0.41   200 (0.0)         0 (0.0)
                             1   2    41    0.41   155 (0.0) [0.3]   89 (0.9) [-0.2]
madplay                112   0   0    28    0.25   179 (0.0)         0 (0.0)
(MP3 player)                 0   1    28    0.25   178 (0.0)         48 (0.0)
                             0   2    28    0.25   51 (0.3) [0.2]    51 (0.3) [0.2]

Table 8.16: Results of the criticality-switch benchmark for each stage, where the criticality LS is raised each stage. T = period, C = worst observed execution time (ms), U = allowed utilisation (budget/period), m = deadline misses, j = jobs completed. We record 52 (0.1), 86 (15.2) and 100 (0.0)% CPU utilisation for each stage respectively. Standard deviations are shown in parentheses; values in square brackets show the difference between the user-level scheduler and the kernel.

For stage one, the entire workload is schedulable and there are no deadline misses. For stage two, the workload is not schedulable, and the criticality switch boosts the priorities of SUSAN and JPEG such that they meet their deadlines, but MAD does not. In the final stage, only the most critical task meets all deadlines. This shows that it is sufficient to implement criticality at user level, and our mechanisms operate as intended.

8.4.2 User-level EDF

We implement the EDF scheduler as an active server whose clients are active threads running at an seL4 priority below the user-level scheduler. The scheduler waits on an endpoint on which it receives messages from its clients and from the timer, as shown in Listing 8.1. Each client has a period, representing its relative deadline, and a full reservation (budget equal to the period). Clients either notify the scheduler of completion with an IPC message, or raise a timeout exception on preemption, which is also received by the scheduler; either is an indication that the next thread should be scheduled. We use the randfixedsum algorithm [Emberson et al., 2010] to generate task sets, with periods ranging from 10–1000 ms (log-uniform distribution) and implicit deadlines. A set of threads runs until 100 scheduling decisions have been recorded. We repeat this 10 times, resulting in 1,000 recorded scheduling decisions for each data point.

while (true) {
    uint64_t time = get_time();

    /* release any threads whose release time has passed */
    while (!empty(release_queue) && head(release_queue)->weight <= time) {
        released = pop(release_queue);
        /* implicit deadlines: the new absolute deadline is release time + period */
        released->weight = time + released->period;
        push(deadline_queue, released);
    }

    if (!empty(release_queue)) {
        /* set a preemption timeout for the next pending release */
        set_timeout(head(release_queue)->weight);
    }

    /* pick the thread with the earliest deadline */
    current = head(deadline_queue);
    if (current != NULL) {
        if (!current->resume_cap_set) {
            /* current was preempted - put it at the head of its scheduling queue */
            seL4_SchedContext_YieldTo(current->sc);
            info = seL4_Recv(endpoint, &badge, current->resume_cap);
        } else {
            /* current is waiting for us to reply - it either timeout faulted, or called us to
             * cooperatively schedule */
            current->resume_cap_set = false;
            info = seL4_ReplyRecv(endpoint, info, &badge, current->resume_cap);
        }
    } else {
        /* no one to schedule */
        info = seL4_Wait(endpoint, &badge);
    }

    /* here we wake from an IPC or interrupt */
    if (badge >= top_thread_id) {
        /* it's a preemption interrupt from the timer */
        handle_interrupt();
    } else {
        /* it's an IPC - must be from current: the job has completed, so move it back to
         * the release queue (with implicit deadlines its deadline is also its next release) */
        pop(deadline_queue);
        push(release_queue, current);
        prev_thread = current;
        prev_thread->resume_cap_set = true;
    }
}

Listing 8.1: Pseudocode for the EDF scheduler.
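For completeness, a sketch of the client side of this protocol follows. It assumes each client holds an endpoint capability to the scheduler (here sched_ep) and runs on a full reservation, and do_job() stands in for the periodic workload. A job that completes early signals the scheduler with seL4_Call; a job that overruns never reaches the call, exhausts its budget, and instead raises a timeout fault that the kernel delivers to the scheduler.

#include <sel4/sel4.h>

static seL4_CPtr sched_ep;   /* hypothetical endpoint capability to the EDF scheduler */

extern void do_job(void);    /* placeholder for the periodic workload */

static void edf_client(void)
{
    for (;;) {
        do_job();

        /* Cooperatively hand the processor back; the scheduler replies through the
         * resume object when this thread's next release arrives. */
        seL4_Call(sched_ep, seL4_MessageInfo_new(0, 0, 0, 0));
    }
}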

[Figure 8.8 plots the EDF scheduler execution time in µs against the number of threads (1–10), with series Sabre-coop, Sabre-pre, x64-coop, x64-pre and x64-LITMUS.]

Figure 8.8: Execution time of seL4 user-mode EDF scheduler compared to kernel scheduler in X64 LITMUSRT.

We measure the scheduler latency by recording a timestamp when each client thread, and an idle thread, detects a context switch, and processing the differences between timestamp pairs offline. We run two configurations: preempt, where threads never yield and must incur a timeout exception, and coop, where threads use IPC to yield to the scheduler. The latter invokes the user-level timer driver more often, as the release queue is nearly always full, which involves more kernel invocations to acknowledge the IRQ in addition to reprogramming the timer. We compare our latencies to those of LITMUSRT [Calandrino et al., 2006], a widely used framework for developing real-time schedulers and locking protocols. As it is embedded in Linux, LITMUSRT is not aimed at high-assurance systems. We use Feather-Trace [Brandenburg and Anderson, 2007] to gather data, and the C-EDF scheduler, which is a partitioned (per-core) EDF scheduler, restricted to a single core. We use the same parameters and thread sets, running each set for 10 s. The measured overhead covers the in-kernel scheduler, the context switch, and the user-level code up to the return to the user. Figure 8.8 shows that our preemptive user-level EDF scheduler implementation is comparable with the in-kernel EDF scheduler of LITMUSRT, and that the cost of implementing scheduling policy for time-triggered systems at user level is of the same order as that of the in-kernel default scheduler. In other words, implementing different policies on top of the base scheduler is quite feasible. The benchmark by no means represents worst-case overhead: for LITMUSRT, tasks exercise a single page of the cache, while on seL4 only the instruction cache is affected. This favours seL4, as the scheduler runs with fewer cache evictions.

[Figure 8.9 plots IPC throughput in thousands of operations per second against the number of cores (1–4), with one panel each for x64 and Sabre and series base-500 to base-4000 and mcs-500 to mcs-4000.]

Figure 8.9: Results of the multicore IPC throughput benchmark, baseline seL4 vs MCS. Each series is named name-N, where name is base or mcs for the baseline and MCS kernel respectively, and N is the upper bound on the number of cycles between each IPC for that series.

However, due to seL4's much smaller cache footprint compared to Linux, seL4 would likely remain the more competitive if tasks had a significant cache footprint. Also note that, as the C-EDF scheduler contains locking primitives for managing multiple cores, the LITMUSRT overheads are higher than they would be if a uniprocessor scheduler (PSN-EDF) were used for the benchmark.
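One way to realise the context-switch detection used in the latency measurement above is sketched below. It is illustrative only: read_cycle_counter(), SWITCH_THRESHOLD and log_switch() are hypothetical stand-ins for the per-platform timestamp source, the detection threshold and the shared log used in the benchmark.

#include <stdint.h>

extern uint64_t read_cycle_counter(void);                       /* hypothetical timestamp source */
extern void log_switch(unsigned id, uint64_t out, uint64_t in); /* hypothetical shared log */

#define SWITCH_THRESHOLD 5000  /* assumed, in cycles; larger gaps imply preemption */

/* Each measurement thread spins on the cycle counter; a large gap between successive
 * reads means the thread was switched out at roughly 'prev' and back in at 'now'.
 * Offline, pairing one thread's switch-out time with the next thread's switch-in time
 * yields the scheduler plus context-switch latency. */
static void detect_switches(unsigned my_id)
{
    uint64_t prev = read_cycle_counter();
    for (;;) {
        uint64_t now = read_cycle_counter();
        if (now - prev > SWITCH_THRESHOLD) {
            log_switch(my_id, prev, now);
        }
        prev = now;
    }
}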

8.5 Multiprocessor benchmarks

We run two multicore benchmarks, the first evaluating multicore throughput of the MCS kernel versus the baseline kernel, the second based on our shared server AES case study to demonstrate the multicore model. We run multiprocessor benchmarks on two of our platforms, SABRE and X64. Both are symmetric multiprocessors with four cores.

8.5.1 Throughput

We run a multicore throughput benchmark to show that our MCS model avoids introducing scalability problems on multiple cores compared to the baseline kernel. We modify the existing multicore IPC throughput benchmark for seL4 to run on the MCS kernel. At time

of writing, only X64 and SABRE have seL4 multiprocessor support; consequently, these are the platforms used for the benchmark. The existing multicore benchmark measures IPC throughput of a client and server, both pinned to the same processor, sending fastpath, 0-length IPC messages via seL4_Call and seL4_ReplyRecv. One pair of client and server is set up per core. Both threads have the same priority and the messages are 0-length. Each thread spins for a random amount of time, with an upper bound N, between subsequent IPCs. As N increases so does IPC throughput, as fewer calls are made. We modify the benchmark such that each server thread is passive on the MCS kernel. Results are displayed in Figure 8.9 and show a minor impact on IPC throughput for low values of N. Scalability is not impacted on SABRE, but it is on X64, with the curve flattening slightly more aggressively on the MCS kernel due to the fastpath overhead. The low overhead is expected, as the MCS model only introduces extra per-core state, with no extra shared state between cores. The slight scalability impact is due to the higher cache footprint of the IPC fastpath.
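A sketch of one client/server pair in this benchmark is shown below, assuming the per-core endpoint and resume-object capabilities (ep, reply) have already been created, and using hypothetical spin() and rand_below() helpers; on the MCS kernel the server runs passive, executing on the scheduling context donated by the caller.

#include <sel4/sel4.h>

static seL4_CPtr ep, reply;   /* per-core endpoint and resume object (assumed set up) */

extern void spin(unsigned long cycles);           /* hypothetical busy-wait helper */
extern unsigned long rand_below(unsigned long n); /* hypothetical PRNG helper */

/* Server half: 0-length reply/receive loop on the fastpath. */
static void server(void)
{
    seL4_Word badge;
    seL4_MessageInfo_t info = seL4_Recv(ep, &badge, reply);
    for (;;) {
        info = seL4_ReplyRecv(ep, info, &badge, reply);
    }
}

/* Client half: spin for a random time bounded by N, then make a 0-length call;
 * throughput is the number of completed calls per second, summed over cores. */
static void client(unsigned long N)
{
    for (;;) {
        spin(rand_below(N));
        seL4_Call(ep, seL4_MessageInfo_new(0, 0, 0, 0));
    }
}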

[Figure 8.10 plots AES throughput in MiB/s against the number of cores (1–4), with one panel each for x64 and Sabre and one series each for the Single and Multiple configurations.]

Figure 8.10: Results of the AES shared server multicore case study. Single shows results for a passive server thread migrating between cores to service clients, while Multiple has one passive server thread per core. For both series, the number of clients is equal to the number of cores and each client requests 1MiB of data encrypted.

8.5.2 Shared server

We adapt our AES case study (Section 8.3.2) to demonstrate how our MCS model applies to multiprocessors. The AES server is configured without a timeout fault handler, and we run two variants.

Single: the AES server has a single passive thread, which waits on a single endpoint and migrates to the core of the active client over IPC, effectively serialising access to the server. Consequently, Figure 8.10 shows there is no gain in throughput when further cores are added.

Multiple: the AES server has one passive thread per core, and an endpoint is set up for each core, demonstrating a parallel server. Due to the absence of bottlenecks in the stateless AES server, this results in near perfect scalability.

8.6 Summary

All in all, we have demonstrated via micro- and macro-benchmarks that our overheads are reasonable given the speed of the baseline kernel and the extent of the provided functionality. Through two system benchmarks and one shared-server benchmark, we have shown that our approach guarantees processor isolation and that threads cannot exceed their budget allocation via their scheduling context. Additionally, we have shown that isolation can be

achieved in a shared server via a timeout-fault handler, and we have implemented several alternatives for handling such faults, demonstrating the feasibility of the model. Policy freedom is maintained despite providing fixed-priority scheduling in the kernel, as the mechanisms are sufficient to implement low-overhead, user-level schedulers, as demonstrated through the static mixed-criticality and EDF scheduler implementations. Finally, we have demonstrated that the model works for multiprocessors, incurring no great scalability penalty over baseline seL4, and have shown how passive servers migrate across cores on IPC.

9 Conclusion

Emerging cyber-physical systems represent an opportunity for greater safety, security and automation, as they can replace systems where human errors prevail. While humans are excellent at innovation and creativity, they cannot compete with machines that do not get tired, drunk or distracted when completing monotonous, repetitive tasks. Car accidents are overwhelmingly caused by human error, measured at 94% in the US [Singh, 2015], something that self-driving cars can overcome. Autonomous aircraft and other fleet vehicles, smart cities and smart factories hold similar promise.

As established in Section 1.1, in order to practically develop such systems, they must be mixed-criticality, as certification of all parts of such a system to high-assurance levels is not feasible. Consequently, mixed-criticality systems require strong temporal isolation and asymmetric protection between sub-systems of different criticality levels, which must also be able to safely share resources.

This thesis has drawn on real-time theory and systems practice to develop core mechanisms required in a high-assurance, trusted computing base for mixed-criticality systems. Importantly, the mechanisms we have developed do not violate other requirements, like policy freedom, integrity, confidentiality, spatial isolation, and security.

In Chapter 6, we introduced our model for treating time as a first-class resource, and defined a minimal set of mechanisms that can be leveraged to implement temporal isolation between threads, even within shared servers. Our primitives also allow for more complex, application-specific user-level schedulers to be constructed efficiently, as we demonstrated in the evaluation. Significantly, we provide a new model for L4 kernels to manage time as a first-class resource, without restricting policy options, including overbooking, or requiring specific correlations between the priority of a task and the scheduling model or the importance of that task. We integrate a capability system with the non-fungible resource that is time, without violating our existing isolation and confidentiality properties, or requiring a heavy performance overhead. Our mechanisms provide a basic scheduler, which is a critical part of the trusted computing base, and allow for more system-specific schedulers to be constructed at user level. Additionally, we do not force specific resource-sharing prioritisation policies, but provide mechanisms to allow for their implementation at user level.

9.1 Contributions

Specifically, we make the following contributions:

• Mechanisms for principled management of time through capabilities to scheduling contexts, which are compatible with the fast IPC implementations traditionally used in high-performance microkernels, and also compatible with established real-time resource-sharing policies;

• an implementation of those mechanisms in the non-preemptible seL4 microkernel, and an exploration of how the implementation interacts with the existing kernel model;

• an evaluation of the overheads and isolation properties of the implementation, including a demonstration of isolation in a shared server through timeout-fault handling;

• a design and implementation of many different user-level timeout-fault handling policies;

• an implementation of a user-level EDF scheduler which is as fast as the LITMUSRT in-kernel EDF scheduler, which shows that, despite the fixed-priority scheduler present in the kernel, other schedulers remain practical;

• and a design and implementation of a criticality switch at user level, which shows that criticality is not required to be a kernel-provided property.

The implementation is complete, and the verification of this implementation is currently in progress. Specifications have already been developed by the verification engineering team at Data61, CSIRO, who are now completing the first-level refinement of the functional-correctness proof. The maximum control priority feature has already been verified and merged into the master kernel. Verification is beyond the scope of this PhD, although we continue to work closely with the team to assist in verification. This work is part of the Side-Channel Causal Analysis for Design of Cyber-Physical Security project for the U.S. Department of Homeland Security.

Additionally, during the development of this thesis we have made extensive contributions to the seL4 benchmarking suite, testing suite and user-level libraries, all of which provide better support for running experiments and building systems on seL4.

9.2 Future work

Modern CPUs have dynamic frequency scaling and low-power modes in order to reduce energy usage, which is of high concern in many embedded systems. The implementation as it stands assumes a constant CPU frequency, and all kernel operations are calculated in CPU cycles. Frequency scaling can have undesirable effects on real-time processes: if the period is specified in microseconds, then converted to cycles and the CPU frequency

changes, does the period remain correct? This is a limitation of our model and a promising area for future work.

Although we provide a mechanism for cross-core IPC, where passive threads migrate between processing cores and active threads remain fixed, there is much more to consider in terms of multicore scheduling and resource sharing. Our model provides a partitioned, fixed-priority scheduler, and load balancing can be managed at user level. Further experiments to evaluate the mechanisms for higher-level multicore scheduling and resource-sharing protocols are required.

Finally, this work does not consider counter-measures for timing-related covert and side channels, security risks which arise from shared hardware [Ge et al., 2018]. Covert channels are unintended, explicit communication channels which may be used by a malicious sender in a trusted component to leak sensitive data to a receiver in an untrusted component. Providing a time source can open a covert channel, where the sender manipulates timing behaviour through shared architectural structures like caches, and the receiver uses the time source to observe that behaviour. Shared hardware also presents the risk of side channels, where an attacker in an untrusted component can learn secret information about a trusted component by observing timing behaviour. Both attack vectors are risks in mixed-criticality systems; we do not consider them in this thesis, and they merit extensive future work.

9.3 Concluding Remarks

Original automobiles did not have seat belts, as safety was not yet a concern. Perhaps one day we will reflect on human-piloted, high-speed vehicles in the same way. For this future to eventuate, cyber-physical systems which combine components of different criticality on shared hardware are essential. Our work has focused on principled mechanisms for managing time in such systems, making one significant step towards a future of trustworthy, safe and secure autonomous systems.

This material is based on research sponsored by the Department of Homeland Security (DHS) Science and Technology Directorate, Cyber Security Division (DHS S&T/CSD) BAA HSHQDC-14-R-B00016, and the Government of the United Kingdom of Great Britain and the Government of Canada via contract number D15PC00223.


Bibliography

Tarek F. Abdelzaher, Vivek Sharma, and Chenyang Lu. A utilization bound for aperiodic tasks and priority driven scheduling. IEEE Transactions on Computers, 53(3):334–350, March 2004.

Luca Abeni and Giorgio Buttazzo. Resource reservation in dynamic real-time systems. Real-Time Systems, 27(2):123–167, 2004.

James H. Anderson and Anand Srinivasan. Mixed Pfair/ERFair scheduling of asynchronous periodic tasks. Journal of Computer and System Sciences, 68:157–204, February 2004.

James H. Anderson, Srikanth Ramamurthy, and Kevin Jeffay. Real-time computing with lock-free shared objects. ACM Transactions on Computer Systems, 15(2):134–165, May 1997.

ARINC. Avionics Application Software Standard Interface. ARINC, November 2012. ARINC Standard 653.

Neil C. Audsley, Alan Burns, M. F. Richardson, Ken W. Tindell, and Andy J. Wellings. Applying new scheduling theory to static priority preemptive scheduling. Software Engineering Journal, 8(5):284–292, September 1993.

T. P. Baker. Stack-based scheduling for realtime processes. Real-Time Systems, 3(1):67–99, 1991.

Sanjoy Baruah, Louis E. Rosier, and Rodney R. Howell. Algorithms and complexity concerning the preemptive scheduling of periodic, real-time tasks on one processor. Real-Time Systems, 2(4):301–324, November 1990.

James Barhorst, Todd Belote, Pam Binns, Jon Hoffman, James Paunicka, Prakash Sarathy, John Scoredos, Peter Stanfill, Douglas Stuart, and Russell Urzi. A research agenda for mixed-criticality systems, April 2009. URL http://www.cse.wustl.edu/~cdgill/CPSWEEK09_MCAR/.

S. K. Baruah, N. K Cohen, C. G. Plaxton, and D. A. Varvel. Proportionate progress: a notion of fairness in resource allocation. Algorithmica, 15:600–625, 1996.

Sanjoy K. Baruah, Vincenzo Bonifaci, Gianlorenzo D’Angelo, Alberto Marchetti- Spaccamela, Suzanne Van Der Ster, and Leen Stougie. Mixed-criticality scheduling of sporadic task systems. In Proceedings of the 19th European Conference on Algorithms, pages 555–566, Berlin, Heidelberg, 2011a.

Sanjoy K. Baruah, Alan Burns, and Rob I. Davis. Response-time analysis for mixed criticality systems. In IEEE Real-Time Systems Symposium, pages 34–43, Vienna, Austria, December 2011b. IEEE Computer Society.

Bernard Blackham, Yao Shi, and Gernot Heiser. Improving interrupt response time in a verifiable protected microkernel. In EuroSys Conference, pages 323–336, Bern, Switzerland, April 2012a. USENIX.

Bernard Blackham, Vernon Tang, and Gernot Heiser. To preempt or not to preempt, that is the question. In Asia-Pacific Workshop on Systems (APSys), page 7, Seoul, Korea, July 2012b. ACM.

Alan C. Bomberger, A. Peri Frantz, William S. Frantz, Ann C. Hardy, Norman Hardy, Charles R. Landau, and Jonathan S. Shapiro. The KeyKOS nanokernel architecture. In Proceedings of the USENIX Workshop on Microkernels and other Kernel Architectures, pages 95–112, Seattle, WA, US, April 1992. USENIX Association.

Björn B. Brandenburg. Scheduling and Locking in Multiprocessor Real-Time Operating Systems. PhD thesis, The University of North Carolina at Chapel Hill, 2011. http://cs.unc.edu/~bbb/diss/brandenburg-diss.pdf.

Björn B. Brandenburg. A synchronous IPC protocol for predictable access to shared resources in mixed-criticality systems. In IEEE Real-Time Systems Symposium, pages 196–206, Rome, IT, December 2014. IEEE Computer Society.

Björn B. Brandenburg and James H. Anderson. Feather-trace: A light weight event tracing toolkit. In Proceedings of the 3rd Workshop on Operating System Platforms for Embedded Real-Time Applications (OSPERT), pages 61–70, Pisa, IT, July 2007. IEEE Computer Society.

Björn B. Brandenburg, John M. Calandrino, Aaron Block, Hennadiy Leontyev, and James H. Anderson. Real-time synchronization on multiprocessors: to block or not to block, to suspend or spin? In IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), pages 342–353. IEEE Computer Society, April 2008.

Scott A. Brandt, Scott Banachowski, Caixue Lin, and Timothy Bisson. Dynamic integrated scheduling of hard real-time, soft real-time and non-real-time processes. In IEEE Real- Time Systems Symposium, pages 396–407, Cancun, MX, December 2003. IEEE Computer Society.

Manfred Broy, Ingolf H. Krüger, Alexander Pretschner, and Christian Salzman. Engineering automotive software. Proceedings of the IEEE, 95:356–373, 2007.

Alan Burns and Robert I. Davis. A survey of research into mixed criticality systems. ACM Computing Surveys, 50(6):1–37, 2017.

Alan Burns and Andy Wellings. Concurrent and Real-Time Programming in Ada. Cam- bridge University Press, 2007.

Giorgio C. Buttazzo. Rate monotonic vs. EDF: judgment day. Real-Time Systems, 29(1): 5–26, 2005.

Giorgio C. Buttazzo and John A. Stankovic. RED: Robust earliest deadline scheduling. In Proceedings of the 3rd International Workshop on Responsive Computing Systems, pages 100–111, 1993.

Marco Caccamo, Giorgio Buttazzo, and Lui Sha. Capacity sharing for overrun control. In Proceedings of the 21st IEEE Real-Time Systems Symposium, pages 295–304, Orlando, FL, USA, November 2000. IEEE.

John M. Calandrino, Hennadiy Leontyev, Aaron Block, UmaMaheswari C. Devi, and James H. Anderson. LITMUSRT: A testbed for empirically comparing real-time multi- processor schedulers. In IEEE Real-Time Systems Symposium, pages 111–123, Rio de Janiero, Brazil, December 2006. IEEE Computer Society.

Stephen Checkoway, Damon McCoy, Brian Kantor, Danny Anderson, Hovav Shacham, Stefan Savage, Karl Koscher, Alexei Czeskis, Franziska Roesner, and Tadayoshi Kohno. Comprehensive experimental analyses of automotive attack surfaces. In Proceedings of the 20th USENIX Security Symposium, 2011.

Austin T. Clements, M. Frans Kaashoek, and Nickolai Zeldovich. RadixVM: Scalable address spaces for multithreaded applications. In EuroSys Conference, Prague, CZ, April 2013.

Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with YCSB. In ACM Symposium on Cloud Com- puting, pages 143–154, Indianapolis, IN, US, June 2010. ACM.

Jonathan Corbet. Deadline scheduling for Linux, October 2009. URL https://lwn.net/Articles/356576/.

Jonathan Corbet. SCHED_FIFO and realtime throttling, September 2008. URL https://lwn.net/Articles/296419/.

Silviu S. Craciunas, Christoph M. Kirsch, Hannes Payer, Harald Röck, and Ana Sokolova. Programmable temporal isolation in real-time and embedded execution environments. In 2nd Workshop on Isolation and Integration in Embedded Systems, pages 19–24, Nuremburg, DE, March 2009. ACM.

Silviu S. Craciunas, Christoph M. Kirsch, Hannes Payer, Harald Röck, and Ana Soko- lova. Temporal isolation in real-time systems: the VBS approach. Software Tools for Technology Transfer, pages 1–21, 2012.

Matthew Danish, Ye Li, and Richard West. Virtual-CPU scheduling in the quest operating system. In IEEE Real-Time Systems Symposium, pages 169–179, Chicago, IL, 2011.

Robert I. Davis and Alan Burns. A survey of hard real-time scheduling for multiprocessor systems. ACM Computing Surveys, 43(4):35:1–35:44, October 2011.

Dionisio de Niz, Luca Abeni, Saowanee Saewong, and Ragunathan Rajkumar. Resource sharing in reservation-based systems. In IEEE Real-Time Systems Symposium, pages 171–180, London, UK, 2001.

Dionisio de Niz, Karthik Lakshmanan, and Ragunathan Rajkumar. On the scheduling of mixed-criticality real-time task sets. In IEEE Real-Time Systems Symposium, pages 291–300, Washington DC, US, 2009. IEEE Computer Society.

Dionisio de Niz, Lutz Wrage, Nathaniel Storer, Anthony Rowe, and Ragunathan Rajkumar. On resource overbooking in an unmanned aerial vehicle. In International Conference on Cyber-Physical Systems, pages 97–106, Washington DC, US, 2012. IEEE Computer Society.

Jack B. Dennis and Earl C. Van Horn. Programming semantics for multiprogrammed computations. Communications of the ACM, 9:143–155, 1966.

Deos. Deos, a time & space partitioned, multi-core enabled, DO-178C DAL A certifiable RTOS, 2018. URL https://www.ddci.com/products_deos_do_178c_arinc_653/.

UmaMaheswari C. Devi. Soft Real-Time Scheduling on Multiprocessors. PhD thesis, University of North Carolina, Chapel Hill, 2006.

Dhammika Elkaduwe, Philip Derrin, and Kevin Elphinstone. Kernel data – first class citizens of the system. In Proceedings of the 2nd International Workshop on Object Sys- tems and Software Architectures, pages 39–43, Victor Harbor, South Australia, Australia, January 2006.

Kevin Elphinstone and Gernot Heiser. From L3 to seL4 – what have we learnt in 20 years of L4 microkernels? In ACM Symposium on Operating Systems Principles, pages 133–150, Farmington, PA, USA, November 2013.

Paul Emberson, Roger Stafford, and Robert Davis. Techniques for the synthesis of multi- processor tasksets. In Workshop on Analysis Tools and Methodologies for Embedded and Real-time Systems, pages 6–11, Brussels, Belgium, July 2010. Euromicro.

Rolf Ernst and Marco Di Natale. Mixed criticality systems—a history of misconceptions? IEEE Design and Test, 33(5):65–74, October 2016.

Dario Faggioli. POSIX_SCHED_SPORADIC implementation for tasks and groups, August 2008. URL https://lwn.net/Articles/293547/.

Tom Fleming and Alan Burns. Extending mixed criticality scheduling. In Workshop on Mixed Criticality Systems, pages 7–12, Vancouver, Canada, December 2013. IEEE Computer Society.

Bryan Ford and Jay Lepreau. Evolving Mach 3.0 to a migrating thread model. In Proceed- ings of the 1994 Winter USENIX Technical Conference, pages 97–114, Berkeley, CA, USA, 1994. USENIX Association.

FreeRTOS. FreeRTOS, 2012. URL https://www.freertos.org.

Eran Gabber, Christopher Small, John Bruno, José Brustoloni, and Avi Silberschatz. The Pebble component-based operating system. In Proceedings of the 1999 USENIX An- nual Technical Conference, pages 267–282, Monterey, CA, USA, June 1999. USENIX Association.

Phani Kishore Gadepalli, Robert Gifford, Lucas Baier, Michael Kelly, and Gabriel Parmer. Temporal capabilities: Access control for time. In Proceedings of the 38th IEEE Real- Time Systems Symposium, pages 56–67, Paris, France, December 2017. IEEE Computer Society.

Paolo Gai, Marco Di Natale, Giuseppe Lipari, Alberto Ferrari, and Claudio Gabellini. A comparison of MPCP and MSRP when sharing resources in the Janus multiple-processor on a chip platform. In Proceedings of the 9th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), pages 189–198, Washington, DC, USA, May 2003.

Qian Ge, Yuval Yarom, David Cock, and Gernot Heiser. A survey of microarchitectural timing attacks and countermeasures on contemporary hardware. Journal of Cryptographic Engineering, 8:1–27, April 2018.

Ronghui Gu, Zhong Shao, Hao Chen, Xiongnan (Newman) Wu, Jieung Kim, Vilhelm Sjöberg, and David Costanzo. CertiKOS: An extensible architecture for building certified concurrent OS kernels. In USENIX Symposium on Operating Systems Design and Implementation, pages 653–669, Savannah, GA, US, November 2016. USENIX Association.

Matthew R. Guthaus, Jeffrey S. Ringenberg, Dan Ernst, Todd M. Austin, Trevor Mudge, and Richard B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In Proceedings of the 4th IEEE Annual Workshop on Workload Characterization, pages 3–14, Washington, DC, USA, December 2001. IEEE Computer Society.

Michael Gonzalez Harbour. Real-time POSIX: An overview. In VVConex International Conference, 1993.

Michael Gonzalez Harbour and José Carlos Palencia. Response time analysis for tasks scheduled under EDF within fixed priorities. In Proceedings of the 24th IEEE Real-Time Systems Symposium, Cancun, Mexico, December 2003.

Norm Hardy. The confused deputy (or why capabilities might have been invented). ACM Operating Systems Review, 22(4):36–38, 1988.

Hermann Härtig, Michael Hohmuth, Jochen Liedtke, Sebastian Schönberg, and Jean Wolter. The performance of µ-kernel-based systems. In ACM Symposium on Operating Systems Principles, pages 66–77, St. Malo, FR, October 1997.

Hermann Härtig, Robert Baumgartl, Martin Borriss, Claude-Joachim Hamann, Michael Hohmuth, Frank Mehnert, Lars Reuther, Sebastian Schönberg, and Jean Wolter. DROPS—OS support for distributed multimedia applications. In Proceedings of the 8th SIGOPS European Workshop, Sintra, Portugal, September 1998.

Gernot Heiser and Kevin Elphinstone. L4 microkernels: The lessons from 20 years of research and deployment. ACM Transactions on Computer Systems, 34(1):1:1–1:29, April 2016.

Jorrit N. Herder, Herbert Bos, Ben Gras, Philip Homburg, and Andrew S. Tanenbaum. MINIX 3: A highly reliable, self-repairing operating system. ACM Operating Systems Review, 40(3):80–89, July 2006.

André Hergenhan and Gernot Heiser. Operating systems technology for converged ECUs. In 6th Embedded Security in Cars Conference (escar), 3 pages, Hamburg, Germany, November 2008.

Jonathan L. Herman, Christopher J. Kenna, Malcolm S. Mollison, James H. Anderson, and Daniel M. Johnson. RTOS support for multicore mixed-criticality systems. In IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2012.

Michael Hohmuth. The Fiasco kernel: System architecture. Technical Report TUD-FI02-06-Juli-2002, TU Dresden, DE, July 2002.

Michael Hohmuth and Michael Peter. Helping in a multiprocessor environment. In Workshop on Common Microkernel System Platforms, 2001.

Intel 64 & IA-32 ASDM. Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 3(3A, 3B, 3C & 3D): System Programming Guide. Intel Corporation, September 2016.

Michael B. Jones. What really happened on Mars?, 1997. URL http://www.cs.cornell.edu/courses/cs614/1999sp/papers/pathfinder.html.

Antti Kantee and Justin Cormack. Rump kernels: No OS? No problem! USENIX ;login:, 29(5):11–17, October 2014.

Shinpei Kato, Yutaka Ishikawa, and Ragunathan (Raj) Rajkumar. CPU scheduling and memory management for interactive real-time applications. Real-Time Systems, 47(5): 454–488, September 2011.

Larry Kinnan and Joseph Wlad. Porting applications to an ARINC653 compliant IMA platform using VxWorks as an example. In IEEE/AIAA Digital Avionics Systems Confer- ence (DASC), pages 10.B.1–1–10.B.1–8, Salt Lake City, UT, US, October 2004. IEEE Aerospace and Electronic System Society.

Takuro Kitayama, Tatsuo Nakajima, and Hideyuki Tokuda. RT-IPC: An IPC extension for real-time Mach. In Proceedings of the 2nd USENIX Workshop on Microkernels and other Kernel Architectures, pages 91–103, San Diego, CA, US, September 1993.

Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. seL4: Formal verification of an OS kernel. In ACM Symposium on Operating Systems Principles, pages 207–220, Big Sky, MT, USA, October 2009. ACM.

Gerwin Klein, June Andronick, Kevin Elphinstone, Toby Murray, Thomas Sewell, Rafal Kolanski, and Gernot Heiser. Comprehensive formal verification of an OS microkernel. ACM Transactions on Computer Systems, 32(1):2:1–2:70, February 2014.

Gilad Koren and Dennis Shasha. Skip-over: algorithms and complexity for overloaded systems that allow skips. In IEEE Real-Time Systems Symposium, pages 110–117, 1995.

Adam Lackorzyński, Alexander Warg, Marcus Völp, and Hermann Härtig. Flattening hierarchical scheduling. In International Conference on Embedded Software, pages 93–102, Tampere, SF, October 2012. USENIX Association.

Gerardo Lamastra, Giuseppe Lipari, and Luca Abeni. A bandwidth inheritance algorithm for real-time task synchronisation in open systems. In IEEE Real-Time Systems Symposium, pages 151–160, London, UK, December 2001. IEEE Computer Society Press.

Yann-Hang Lee, Daeyoung Kim, Mohamed Younis, and Jeff Zhou. Partition scheduling in APEX runtime environment for embedded avionics software. In IEEE Conference on Embedded and Real-Time Computing and Applications, pages 103–109, Hiroshima, Japan, October 1998. IEEE.

John P. Lehoczky, Lui Sha, and Jay K. Strosnider. Enhanced aperiodic responsiveness in hard-real-time environments. In IEEE Real-Time Systems Symposium, pages 261–270, San Jose, 1987. IEEE Computer Society.

Ian Leslie, Derek McAuley, Richard Black, Timothy Roscoe, Paul Barham, David Evers, Robin Fairbairns, and Eoin Hyden. The design and implementation of an operating system to support distributed multimedia applications. IEEE Journal on Selected Areas in Communications, 14:1280–1297, 1996.

J. Y. T. Leung and J. Whitehead. On the complexity of fixed-priority scheduling of periodic, real-time tasks. In IEEE Real-Time Systems Symposium, pages 166–171, 1982.

Ye Li, Richard West, Zhuoqun Cheng, and Eric Missimer. Predictable communication and migration in the Quest-V separation kernel. In IEEE Real-Time Systems Symposium, pages 272–283, Rome, Italy, December 2014. IEEE Computer Society.

Jochen Liedtke. On µ-kernel construction. In ACM Symposium on Operating Systems Principles, pages 237–250, Copper Mountain, CO, USA, December 1995.

Jochen Liedtke. L4 Reference Manual. GMD/IBM, September 1996.

C. Liu and J. Layland. Scheduling algorithms for multiprogramming in a hard real-time environment. Journal of the ACM, 20:46–61, 1973.

Mingsong Lv, Nan Guan, Yi Zhang, Qingxu Deng, Ge Yu, and Jianming Zhang. A survey of WCET analysis of real-time operating systems. In Proceedings of the 9th IEEE International Conference on Embedded Systems and Software, pages 65–72, Hangzhou, CN, May 2009.

Anna Lyons and Gernot Heiser. Mixed-criticality support in a high-assurance, general- purpose microkernel. In Workshop on Mixed Criticality Systems, pages 9–14, Rome, Italy, December 2014.

Anna Lyons, Kent Mcleod, Hesham Almatary, and Gernot Heiser. Scheduling-context capabilities: A principled, light-weight OS mechanism for managing time. In EuroSys Conference, Porto, Portugal, April 2018. ACM.

Antonio Mancina, Dario Faggioli, Guiseppe Lipari, Jorrit N. Herder, Ben Gras, and An- drew S. Tanenbaum. Enhancing a dependable multiserver operating system with temporal protection via resource reservations. Real-Time Systems, 48(2):177–210, August 2009.

Miguel Masmano, Ismael Ripoll, Alfons Crespo, and Metge Jean-Jacques. XtratuM: a hypervisor for safety critical embedded systems. pages 263–272, Dresden, September 2009.

Kent McLeod. Usermode OS components on seL4 with rump kernels. BE thesis, School of Electrical Engineering and Telecommunication, Sydney, Australia, October 2016.

Cliff Mercer, Stefan Savage, and Hideyuki Tokuda. Processor capacity reserves: An abstraction for managing processor usage. In Workshop on Workstation Operating Systems, pages 129–134, Napa, CA, US, 1993. IEEE Computer Society.

Cliff Mercer, Ragunathan Rajkumar, and Jim Zelenka. Temporal protection in real-time operating systems. In Proceedings of the 11th IEEE Workshop on Real-Time Operating Systems and Software, pages 79–83, San Juan, Puerto Rico, May 1994. IEEE Computer Society.

Aloysius K. Mok. Fundamental Design Problems of Distributed Systems for the Hard Real-Time Environment. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, US, 1983.

Aloysius K. Mok, Xiang Feng, and Deji Chen. Resource partition for real-time systems. In IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), pages 75–84, Taipei, Taiwan, 02 2001. IEEE.

Toby Murray, Daniel Matichuk, Matthew Brassil, Peter Gammie, Timothy Bourke, Sean Seefried, Corey Lewis, Xin Gao, and Gerwin Klein. seL4: from general purpose to a proof of information flow enforcement. In IEEE Symposium on Security and Privacy, pages 415–429, San Francisco, CA, May 2013.

Shuichi Oikawa and Ragunathan Rajkumar. Linux/RK: A portable resource kernel in Linux. In IEEE Real-Time Systems Symposium, pages 111–120, Madrid, Spain, 1998. IEEE Computer Society.

Gabriel Parmer. Composite: A component-based operating system for predictable and dependable computing. PhD thesis, Boston University, Boston, MA, US, August 2009.

Gabriel Parmer. The case for thread migration: Predictable IPC in a customizable and reliable OS. In Workshop on Operating System Platforms for Embedded Real-Time Applications (OSPERT), pages 91–100, Brussels, BE, July 2010. ACM.

Gabriel Parmer and Richard West. HiRes: A system for predictable hierarchical resource management. In IEEE Real-Time Systems Symposium, pages 180–190, Washington, DC, US, April 2011. IEEE Computer Society.

Risat Mahmud Pathan. Three Aspects of Real-Time Multiprocessor Scheduling: Timeliness, Fault Tolerance, Mixed Criticality. PhD thesis, Department of Computer Science and Engineering, Chalmers University of Technology, Göteborg, SE, 2012.

Simon Peter, Adrian Schüpbach, Paul Barham, Andrew Baumann, Rebecca Isaacs, Tim Harris, and Timothy Roscoe. Design principles for end-to-end multicore schedulers. In Workshop on Hot Topics in Parallelism, pages 10–10, Berkeley, CA, USA, 2010. USENIX Association.

Stefan M Petters, Kevin Elphinstone, and Gernot Heiser. Trustworthy Real-Time Systems, pages 191–206. Signals & Communication. Springer, January 2012.

QNX Neutrino System Architecture [6.5.0]. QNX Software Systems Co., 6.5.0 edition, 2010.

Ragunathan Rajkumar. Real-time synchronization protocols for shared memory multi- processors. In Proceedings of the 10th IEEE International Conference on Distributed Computing Systems, pages 116–123, June 1990.

Raj Rajkumar, Kanaka Juvva, Anastasio Molano, and Shuichi Oikawa. Resource kernels: a resource-centric approach to real-time and multimedia systems. In Proc. SPIE 3310, Multimedia Computing and Networking, 1998.

Sandra Ramos-Thuel and John P. Lehoczky. On-line scheduling of hard deadline aperiodic tasks in fixed-priority systems. In Proceedings of the 14th IEEE Real-Time Systems Symposium, pages 160–171, 1993.

RedisLabs. Redis, 2009. URL https://redis.io.

RTEMS. RTEMS, 2012. URL https://www.rtems.org.

Sergio Ruocco. Real-Time Programming and L4 Microkernels. In Proceedings of the 2nd Workshop on Operating System Platforms for Embedded Real-Time Applications (OSPERT), Dresden, Germany, July 2006.

Thomas Sewell, Simon Winwood, Peter Gammie, Toby Murray, June Andronick, and Gerwin Klein. seL4 enforces integrity. In International Conference on Interactive Theorem Proving, pages 325–340, Nijmegen, The Netherlands, August 2011. Springer.

Thomas Sewell, Felix Kam, and Gernot Heiser. Complete, high-assurance determination of loop bounds and infeasible paths for WCET analysis. In IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), Vienna, Austria, April 2016.

Lui Sha, Ragunathan Rajkumar, and John P. Lehoczky. Priority inheritance protocols: an approach to real-time synchronization. IEEE Transactions on Computers, 39(9): 1175–1185, September 1990.

Lui Sha, Tarek Abdelzaher, Karl-Erik Årzén, Anton Cervin, Theodore Baker, Alan Burns, Giorgio Buttazzo, Marco Caccamo, John Lehoczky, and Aloysius K. Mok. Real-time scheduling theory: A historical perspective. Real-Time Systems, 28(2–3):101–155, November 2004.

Jonathan S. Shapiro, Jonathan M. Smith, and David J. Farber. EROS: A fast capability sys- tem. In ACM Symposium on Operating Systems Principles, pages 170–185, Charleston, SC, USA, December 1999. ACM.

Santokh Singh. Critical reasons for crashes investigated in the National Motor Vehicle Crash Causation Survey, February 2015. URL https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/812115.

Brinkley Sprunt, Lui Sha, and John K. Lehoczky. Aperiodic task scheduling for hard-real- time systems. Real-Time Systems, 1(1):27–60, June 1989.

Marco Spuri and Giorgio Buttazzo. Efficient aperiodic service under the earliest deadline scheduling. In IEEE Real-Time Systems Symposium, December 1994.

Anand Srinivasan and James H. Anderson. Optimal rate-based scheduling on multipro- cessors. Journal of Computer and System Sciences, 72(6):1094–1117, 2006.

John A. Stankovic. Misconceptions about real-time computing: a serious problem for next generation systems. IEEE Computer, October 1988.

Mark J. Stanovich, Theodore P. Baker, An-I Wang, and Michael González Harbour. Defects of the POSIX sporadic server and how to correct them. In IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), pages 35–45, Stockholm, 2010. IEEE Computer Society.

Mark J. Stanovich, Theodore P. Baker, and An-I Wang. Experience with sporadic server scheduling in Linux: Theory vs. practice. In Real-Time Linux Workshop, pages 219–230, October 2011.

Udo Steinberg and Bernhard Kauer. NOVA: A microhypervisor-based secure virtualization architecture. In Proceedings of the 5th EuroSys Conference, pages 209–222, Paris, FR, April 2010. ACM.

Udo Steinberg, Jean Wolter, and Hermann Härtig. Fast component interaction for real- time systems. In Euromicro Conference on Real-Time Systems, pages 89–97, Palma de Mallorca, ES, July 2005.

Udo Steinberg, Alexander Böttcher, and Bernhard Kauer. Timeslice donation in component- based systems. In Workshop on Operating System Platforms for Embedded Real-Time Applications (OSPERT), pages 16–22, Brussels, BE, July 2010. IEEE Computer Society.

Ion Stoica, Hussein Abdel-Wahab, Kevin Jeffay, Sanjoy K. Baruah, Johannes E. Gehrke, and C. Greg Plaxton. A proportional share resource allocation algorithm for real-time, time-shared systems. In IEEE Real-Time Systems Symposium, 1996.

Jay K. Strosnider, John P. Lehoczky, and Lui Sha. The deferrable server algorithm for enhanced aperiodic responsiveness in hard real-time environments. IEEE Transactions on Computers, 44(1):73–91, January 1995.

SysGo. PikeOS hypervisor, 2012. URL http://www.sysgo.com/products/pikeos-hypervisor.

The NetBSD Foundation. NetBSD, 2018. URL https://www.netbsd.org.

Data61 Trustworthy Systems Team. seL4 Reference Manual, Version 7.0.0, September 2017.

Steven H. VanderLeest. ARINC 653 hypervisor. In Proceedings of the 29th IEEE/AIAA Digital Avionics Systems Conference (DASC), pages 5.E.2–1–5.E.2–20, Salt Lake City, UT, US, 2010. IEEE Aerospace and Electronic System Society.

Manohar Vanga, Andrea Bastoni, Henrik Theiling, and Björn B. Brandenburg. Supporting low-latency, low-criticality tasks in a certified mixed-criticality OS. In Proceedings of the 25th International Conference on Real-Time Networks and Systems (RTNS), pages 227–236, Grenoble, France, October 2017. ACM.

Paulo Esteves Veríssimo, Nuno Ferreira Neves, and Miguel Pupo Correia. Intrusion-tolerant architectures: Concepts and design. In Architecting Dependable Systems, volume 2677 of Lecture Notes in Computer Science, pages 3–36. Springer, Berlin, Heidelberg, 2003.

Steve Vestal. Preemptive scheduling of multi-criticality systems with varying degrees of execution time assurance. In IEEE Real-Time Systems Symposium, pages 239–243, 2007.

Marcus Völp, Adam Lackorzyński, and Hermann Härtig. On the expressiveness of fixed-priority scheduling contexts for mixed-criticality scheduling. In Proceedings of the 1st Workshop on Mixed Criticality Systems, pages 13–18, Vancouver, Canada, 2013. IEEE Computer Society.

David A. Wheeler. SLOCCount. http://www.dwheeler.com/sloccount/, 2001.

Ian Wienand and Luke Macpherson. ipbench: A framework for distributed network bench- marking. In AUUG Winter Conference, pages 163–170, Melbourne, Australia, September 2004.

Reinhard Wilhelm, Jakob Engblom, Andreas Ermedahl, Niklas Holsti, Stephan Thesing, David Whalley, Guillem Bernat, Christian Ferdinand, Reinhold Heckmann, Tulika Mitra, Frank Mueller, Isabelle Puaut, Peter Puschner, Jan Staschulat, and Per Stenström. The worst-case execution-time problem—overview of methods and survey of tools. ACM Transactions on Embedded Computing Systems, 7(3):1–53, 2008.

VxWorks Kernel Programmers Guide. Wind River, 6.7 edition, 2008.

Victor Yodaiken and Michael Barabanov. A real-time Linux. In Proceedings of the Linux Applications Development and Deployment Conference, Anaheim, CA, January 1997. Satellite Workshop of the 1997 USENIX.

Alexander Zuepke, Marc Bommert, and Daniel Lohmann. AUTOBEST: a united AUTOSAR-OS and ARINC 653 kernel. In IEEE Real-Time Systems Symposium, pages 133–144, San Antonio, Texas, April 2015. IEEE Computer Society.


Appendices


A MCS API

A.1 System calls

For convenience, in this section we present all of the system calls in the MCS API.

A.1.1 Call

seL4_MessageInfo_t seL4_Call(seL4_CPtr cap, seL4_MessageInfo_t info)

Invoke the capability provided and block waiting on a reply. Used to communicate with the kernel to invoke objects, or for IPC when used on an endpoint capability. When used for IPC, the scheduling context of the caller may be lent to the receiver. The caller is blocked on a resume object, and wakes once that resume object has been invoked, thus sending a reply and returning a lent scheduling context.

Type                Parameter  Description
seL4_CPtr           dest       Capability to invoke.
seL4_MessageInfo_t  info       A seL4_MessageInfo_t structure encoding details about the message.

Return value: an seL4_MessageInfo_t structure encoding details about the reply message.
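As an illustration only (not part of the API definition), a client issuing a request with one data word over a hypothetical server_ep endpoint capability might look as follows:

seL4_SetMR(0, 42);  /* place the request in the first message register */
seL4_MessageInfo_t reply = seL4_Call(server_ep, seL4_MessageInfo_new(0, 0, 0, 1));
seL4_Word result = seL4_GetMR(0);  /* the server's answer, if it sent one */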

A.1.2 Send

void seL4_Send(seL4_CPtr cap, seL4_MessageInfo_t info)

Invoke the capability provided. If used on an endpoint capability, block until the message has been sent.

Type                Parameter  Description
seL4_CPtr           dest       Capability to invoke.
seL4_MessageInfo_t  info       A seL4_MessageInfo_t structure encoding details about the message.

Return value: none.

A.1.3 NBSend

void seL4_NBSend(seL4_CPtr cap, seL4_MessageInfo_t info)

Invoke the capability provided. If used on an endpoint capability and no receiver is present, do not send the message and return immediately.

Type                Parameter  Description
seL4_CPtr           dest       Capability to invoke.
seL4_MessageInfo_t  info       A seL4_MessageInfo_t structure encoding details about the message.

Return value: none

A.1.4 Recv

seL4_MessageInfo_t seL4_Recv(seL4_CPtr cap, seL4_Word *badge, seL4_CPtr reply)

Wait for a message on the provided capability. If the provided capability is not an endpoint or notification, raise a capability fault. Passive threads may receive scheduling contexts from this system call.

Type        Parameter  Description
seL4_CPtr   src        Capability to wait for a message on.
seL4_Word*  badge      Address for the kernel to write the badge value of the sender to.
seL4_CPtr   reply      Capability to the resume object to block callers on.

Return value: An seL4_MessageInfo_t structure encoding details about the message received.

A.1.5 NBRecv

seL4_MessageInfo_t seL4_NBRecv(seL4_CPtr cap, seL4_Word *badge, seL4_CPtr reply)

Poll for a message on the provided capability. If the provided capability is not an endpoint or notification, raise a capability fault. Passive threads may receive scheduling contexts from this system call.

Type        Parameter  Description
seL4_CPtr   src        Capability to wait for a message on.
seL4_Word*  badge      Address for the kernel to write the badge value of the sender to.
seL4_CPtr   reply      Capability to the resume object to block callers on.

Return value: An seL4_MessageInfo_t structure encoding details about the message received.

A.1.6 Wait

seL4_MessageInfo_t seL4_Wait(seL4_CPtr cap, seL4_Word *badge)

Wait for a message on the provided capability. If the provided capability is not an endpoint or notification, raise a capability fault.

Type        Parameter  Description
seL4_CPtr   src        Capability to wait for a message on.
seL4_Word*  badge      Address for the kernel to write the badge value of the sender to.

Return value: An seL4_MessageInfo_t structure encoding details about the message received.

A.1.7 NBWait

seL4_MessageInfo_t seL4_NBWait(seL4_CPtr cap, seL4_Word *badge)

Poll for a message on the provided capability. If the provided capability is not an endpoint or notification, raise a capability fault.

Type        Parameter  Description
seL4_CPtr   src        Capability to wait for a message on.
seL4_Word*  badge      Address for the kernel to write the badge value of the sender to.

Return value: An seL4_MessageInfo_t structure encoding details about the message received.

A.1.8 ReplyRecv

seL4_MessageInfo_t seL4_ReplyRecv(seL4_CPtr cap, seL4_MessageInfo_t info, seL4_Word *badge, seL4_CPtr reply)

Invoke a resume object, sending a reply message, then wait for a message on the provided capability with the now-cleared resume object. Passive threads may receive scheduling contexts from this system call.

Type                Parameter  Description
seL4_CPtr           src        Capability to wait for a message on.
seL4_MessageInfo_t  info       A seL4_MessageInfo_t structure encoding details about the message.
seL4_Word*          badge      Address for the kernel to write the badge value of the sender to.
seL4_CPtr           reply      Capability to the resume object to block callers on.

Return value: An seL4_MessageInfo_t structure encoding details about the message received.
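For illustration, a typical passive-server loop combines seL4_Recv with seL4_ReplyRecv as sketched below, where ep and reply_obj are hypothetical capabilities to the server's endpoint and resume object:

seL4_Word badge;
/* First request: a passive server gains the caller's scheduling context here. */
seL4_MessageInfo_t info = seL4_Recv(ep, &badge, reply_obj);
for (;;) {
    /* ... handle the request identified by badge and build the reply in info ... */
    info = seL4_ReplyRecv(ep, info, &badge, reply_obj);
}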

A.1.9 NBSendRecv

seL4_MessageInfo_t seL4_NBSendRecv(seL4_CPtr dest, seL4_MessageInfo_t info, seL4_CPtr src, seL4_Word *badge, seL4_CPtr reply)

Perform a non-blocking send on one capability, then wait for a message on the source capability with the resume object. Passive threads may receive scheduling contexts from this system call.

Type                Parameter  Description
seL4_CPtr           dest       Capability to invoke.
seL4_MessageInfo_t  info       A seL4_MessageInfo_t structure encoding details about the message.
seL4_CPtr           src        Capability to wait for a message on.
seL4_Word*          badge      Address for the kernel to write the badge value of the sender to.
seL4_CPtr           reply      Capability to the resume object to block callers on.

Return value: An seL4_MessageInfo_t structure encoding details about the message received.

A.1.10 NBSendWait

seL4_MessageInfo_t seL4_NBSendWait(seL4_CPtr dest, seL4_MessageInfo_t info, seL4_CPtr src, seL4_Word *badge)

Perform a non-blocking send on one capability, then wait for a message on the source capability.

Type                Parameter  Description
seL4_CPtr           dest       Capability to invoke.
seL4_MessageInfo_t  info       A seL4_MessageInfo_t structure encoding details about the message.
seL4_CPtr           src        Capability to wait for a message on.
seL4_Word*          badge      Address for the kernel to write the badge value of the sender to.

Return value: An seL4_MessageInfo_t structure encoding details about the message received.

A.1.11 Yield

void seL4_Yield(void)

Exhaust the head replenishment of the current thread and place the thread at the end of the scheduling queue, or into the release queue if the next replenishment is not yet available.

Type  Parameter  Description
void

Return value: none

A.2 Invocations

In this section, we only present invocations added or changed by the MCS scheduling API; the rest can be found in the seL4 manual [Trustworthy Systems Team, 2017].

A.2.1 Scheduling context invocations

A.2.2 SchedContext - Bind

seL4_Error seL4_SchedContext_Bind(seL4_CPtr sc, seL4_CPtr cap)

Bind a scheduling context to a provided TCB or Notification object. None of the objects (SC, TCB or Notification) can be already bound to other objects. If the TCB is in a runnable state and the scheduling context has available budget, this operation will place the TCB in the scheduler at the TCB's priority.

Type       Parameter  Description
seL4_CPtr  sc         Capability to the SC to bind an object to.
seL4_CPtr  cap        Capability to a TCB or Notification object to bind to this SC.

Return value: 0 on success, seL4_Error code on error.

A.2.3 SchedContext - Unbind

seL4_Error seL4_SchedContext_Unbind(seL4_CPtr sc)

Remove any objects bound to a specific scheduling context.

Type       Parameter  Description
seL4_CPtr  sc         Capability to the SC to unbind objects from.

Return value: 0 on success, seL4_Error code on error.

A.2.4 SchedContext - UnbindObject

seL4_Error seL4_SchedContext_UnbindObject(seL4_CPtr sc, seL4_CPtr cap)

Remove a specific object bound to a specific scheduling context.

Type       Parameter  Description
seL4_CPtr  sc         Capability to the SC to unbind an object from.
seL4_CPtr  cap        Capability to a TCB or Notification object to unbind from this SC.

Return value: 0 on success, seL4_Error code on error.

A.2.5 SchedContext - Consumed

seL4_Error seL4_SchedContext_Consumed(seL4_CPtr sc)

Return the amount of time used by this scheduling context since this function was last called or a timeout fault triggered.

Type       Parameter  Description
seL4_CPtr  sc         Capability to the SC to act on.

Return value: An error code and a uint64_t consumed value.

A.2.6 SchedContext - YieldTo

seL4_Error seL4_SchedContext_YieldTo(seL4_CPtr sc)

If a thread is currently runnable and running on this scheduling context and the scheduling context has available budget, place it at the head of the scheduling queue. If the caller is at an equal priority to the thread, this will result in the thread being scheduled. If the caller is at a higher priority, the thread will not run until the thread's priority is the highest priority in the system. The caller must have a maximum control priority greater than or equal to the thread's priority.

Type       Parameter  Description
seL4_CPtr  sc         Capability to the SC to act on.

Return value: An error code and a uint64_t consumed value.

A.2.7 Sched_control Invocations

A.2.8 SchedControl - Configure

seL4_Error seL4_SchedControl_Configure(seL4_CPtr sched_control, seL4_CPtr sc, uint64_t budget, uint64_t period, seL4_Word extra_refills, seL4_Word badge)

Configure a scheduling context by invoking a sched_control capability. The scheduling context will inherit the affinity of the provided sched_control.

Type       Parameter      Description
seL4_CPtr  sched_control  sched_control capability to invoke to configure the SC.
seL4_CPtr  sc             Capability to the SC to configure.
uint64_t   budget         Timeslice in microseconds.
uint64_t   period         Period in microseconds. If equal to the budget, this thread will be
                          treated as a round-robin thread; otherwise, sporadic servers will be
                          used to ensure the scheduling context does not exceed the budget over
                          the specified period.
seL4_Word  extra_refills  Number of extra sporadic replenishments this scheduling context should
                          use. Ignored for round-robin threads. The maximum value is determined
                          by the size of the SC being configured.
seL4_Word  badge          Badge value that is delivered to timeout-fault handlers.

Return value: 0 on success, seL4_Error code on error.
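As an illustrative example (not prescribed by the API), configuring a scheduling context with a 10 ms budget in a 100 ms period and binding it to a thread could be done as follows, where sched_control, sc and tcb are hypothetical capabilities obtained during system construction:

/* A sporadic reservation of 10% utilisation: 10,000 us budget every 100,000 us. */
seL4_Error err = seL4_SchedControl_Configure(sched_control, sc,
                                             10 * 1000,   /* budget (microseconds) */
                                             100 * 1000,  /* period (microseconds) */
                                             0,           /* extra refills */
                                             0);          /* badge for timeout faults */
if (err == seL4_NoError) {
    err = seL4_SchedContext_Bind(sc, tcb);
}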

A.2.9 TCB Invocations

A.2.10 TCB - Configure

seL4_Error seL4_TCB_Configure(seL4_CPtr tcb, seL4_CPtr cnode, seL4_Word guard, seL4_CPtr vspace, seL4_Word vspace_data, seL4_Word buffer, seL4_CPtr buffer_cap)

Set the parameters of a TCB.

Type       Parameter    Description
seL4_CPtr  tcb          Capability to the TCB to configure.
seL4_CPtr  cnode        Root cnode for this TCB.
seL4_Word  guard        Optionally set the guard and guard size of the new root CNode. If set to
                        zero, this parameter has no effect.
seL4_CPtr  vspace       Top-level paging structure for this TCB.
seL4_Word  vspace_data  Has no effect on x86 or ARM processors.
seL4_Word  buffer       Location of the thread's IPC buffer.
seL4_CPtr  buffer_cap   Capability to the frame containing the thread's IPC buffer.

Return value: 0 on success, seL4_Error code on error.

A.2.11 TCB - SetMCPriority

seL4_Error seL4_TCB_SetMCPriority(seL4_CPtr tcb, seL4_CPtr auth, seL4_Word mcp)

Set the maximum control priority of a TCB.

Type       Parameter  Description
seL4_CPtr  tcb        Capability to the TCB to configure.
seL4_CPtr  auth       Capability to the TCB to use the MCP from when setting the new MCP.
seL4_Word  mcp        Value for the new MCP.

Return value: 0 on success, seL4_Error code on error.
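For example (illustrative only), a user-level scheduler that manipulates its clients' priorities, as in Section 8.4, needs an MCP at least as high as those priorities; here sched_tcb, root_tcb and SCHED_MCP are hypothetical:

/* Allow the user-level scheduler to set priorities and yield to threads up to SCHED_MCP. */
seL4_Error err = seL4_TCB_SetMCPriority(sched_tcb, root_tcb, SCHED_MCP);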

A.2.12 TCB - SetPriority

seL4_Error seL4_TCB_SetPriority(seL4_CPtr tcb, seL4_CPtr auth, seL4_Word prio)

Set the priority of a TCB.

Type       Parameter  Description
seL4_CPtr  tcb        Capability to the TCB to configure.
seL4_CPtr  auth       Capability to the TCB to use the MCP from when setting the new priority.
seL4_Word  prio       Value for the new priority.

Return value: 0 on success, seL4_Error code on error.

A.2.13 TCB - SetSchedParams

seL4_Error seL4_TCB_SetSchedParams(seL4_CPtr tcb, seL4_CPtr auth, seL4_Word mcp, seL4_Word prio, seL4_CPtr sc, seL4_CPtr fault_ep)

Set the scheduling parameters of a TCB.

Type       Parameter  Description
seL4_CPtr  tcb        Capability to the TCB to configure.
seL4_CPtr  auth       Capability to the TCB to use the MCP from when setting the new MCP and priority.
seL4_Word  mcp        Value for the new MCP.
seL4_Word  prio       Value for the new priority.
seL4_CPtr  sc         Capability to the SC to bind to this TCB.
seL4_CPtr  fault_ep   Capability to the endpoint to set as the fault endpoint for this TCB.

Return value: 0 on success, seL4_Error code on error.

A.2.14 TCB - SetTimeoutEndpoint

seL4_Error seL4_TCB_SetTimeoutEndpoint(seL4_CPtr tcb, seL4_CPtr ep)

Set the timeout endpoint of a TCB.

Type Parameter Description

seL4_CPtr   tcb   Capability to the TCB to configure.
seL4_CPtr   ep    Capability to the endpoint to set as the timeout endpoint for this TCB.

Return value: 0 on success, seL4_Error code on error.
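For example, a sketch (with placeholder capabilities) of registering a timeout fault handler so that timeout faults raised on behalf of a thread, such as when its scheduling context's budget expires, are delivered to a handler listening on the given endpoint.

#include <sel4/sel4.h>

/* Illustrative sketch: timeout faults for worker_tcb are sent to timeout_ep,
 * where a handler thread can wait for them with seL4_Recv. */
static seL4_Error install_timeout_handler(seL4_CPtr worker_tcb, seL4_CPtr timeout_ep)
{
    return seL4_TCB_SetTimeoutEndpoint(worker_tcb, timeout_ep);
}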


B Code

B.1 Sporadic servers

This is the sporadic server implementation, scrubbed of assertions and debugging code for (some) brevity.

/* This header presents the interface for sporadic servers,
 * implemented according to Stanovich et al. in
 * "Defects of the POSIX Sporadic Server and How to Correct Them",
 * although without the priority management and enforcing a minimum budget.
 */

/* functions to manage the circular buffer of
 * sporadic budget replenishments (refills for short).
 *
 * The circular buffer always has at least one item in it.
 *
 * Items are appended at the tail (the back) and
 * removed from the head (the front). Below is
 * an example of a queue with 4 items (h = head, t = tail, x = item, [] = slot)
 * and max size 8.
 *
 * [][h][x][x][t][][][]
 *
 * and another example of a queue with 5 items
 *
 * [x][t][][][][h][x][x]
 *
 * The queue has a minimum size of 1, so it is possible that h == t.
 *
 * The queue is implemented as head + tail rather than head + size as
 * we cannot use the mod operator on all architectures without accessing
 * the fpu or implementing divide.
 */

/* To do an operation in the kernel, the thread must have
 * at least this much budget - see comment on refill_sufficient */
#define MIN_BUDGET_US (2u * getKernelWcetUs())
#define MIN_BUDGET    (2u * getKernelWcetTicks())

/* Short hand for accessing refill queue items */
#define REFILL_INDEX(sc, index) (((refill_t *) ((sched_context_t *)(sc) + sizeof(sched_context_t)))[index])
#define REFILL_HEAD(sc) REFILL_INDEX((sc), (sc)->scRefillHead)
#define REFILL_TAIL(sc) REFILL_INDEX((sc), (sc)->scRefillTail)

/* return the amount of refills we can fit in this scheduling context */
static inline word_t
refill_absolute_max(cap_t sc_cap)
{
    return (BIT(cap_sched_context_cap_get_capSCSizeBits(sc_cap)) - sizeof(sched_context_t)) / sizeof(refill_t);
}

/* Return the amount of items currently in the refill queue */
static inline word_t
refill_size(sched_context_t *sc)
{
    if (sc->scRefillHead <= sc->scRefillTail) {
        return (sc->scRefillTail - sc->scRefillHead + 1u);
    }
    return sc->scRefillTail + 1u + (sc->scRefillMax - sc->scRefillHead);
}

static inline bool_t
refill_full(sched_context_t *sc)
{
    return refill_size(sc) == sc->scRefillMax;
}

static inline bool_t
refill_single(sched_context_t *sc)
{
    return sc->scRefillHead == sc->scRefillTail;
}

/* Return the amount of budget this scheduling context
 * has available if usage is charged to it. */
static inline ticks_t
refill_capacity(sched_context_t *sc, ticks_t usage)
{
    if (unlikely(usage > REFILL_HEAD(sc).rAmount)) {
        return 0;
    }

    return REFILL_HEAD(sc).rAmount - usage;
}

/*
 * Return true if the head refill has sufficient capacity
 * to enter and exit the kernel after usage is charged to it.
 */
static inline bool_t
refill_sufficient(sched_context_t *sc, ticks_t usage)
{
    return refill_capacity(sc, usage) >= MIN_BUDGET;
}

/*
 * Return true if the refill is eligible to be used.
 * This indicates if the thread bound to the sc can be placed
 * into the scheduler, otherwise it needs to go into the release queue
 * to wait.
 */
static inline bool_t
refill_ready(sched_context_t *sc)
{
    return REFILL_HEAD(sc).rTime <= (NODE_STATE(ksCurTime) + getKernelWcetTicks());
}

/* return the index of the next item in the refill queue */
static inline word_t
refill_next(sched_context_t *sc, word_t index)
{
    return (index == sc->scRefillMax - 1u) ? (0) : index + 1u;
}

/* pop head of refill queue */
static inline refill_t
refill_pop_head(sched_context_t *sc)
{
    UNUSED word_t prev_size = refill_size(sc);
    refill_t refill = REFILL_HEAD(sc);
    sc->scRefillHead = refill_next(sc, sc->scRefillHead);
    return refill;
}

/* add item to tail of refill queue */
static inline void
refill_add_tail(sched_context_t *sc, refill_t refill)
{
    word_t new_tail = refill_next(sc, sc->scRefillTail);
    sc->scRefillTail = new_tail;
    REFILL_TAIL(sc) = refill;
}

static inline void
maybe_add_empty_tail(sched_context_t *sc)
{
    if (isRoundRobin(sc)) {
        /* add an empty refill - we track the used up time here */
        refill_t empty_tail = { .rTime = NODE_STATE(ksCurTime) };
        refill_add_tail(sc, empty_tail);
    }
}

/* Create a new refill in a non-active sc */
void
refill_new(sched_context_t *sc, word_t max_refills, ticks_t budget, ticks_t period)
{
    sc->scPeriod = period;
    sc->scRefillHead = 0;
    sc->scRefillTail = 0;
    sc->scRefillMax = max_refills;
    /* full budget available */
    REFILL_HEAD(sc).rAmount = budget;
    /* budget can be used from now */
    REFILL_HEAD(sc).rTime = NODE_STATE(ksCurTime);
    maybe_add_empty_tail(sc);
}

/* Update refills in an active sc without violating bandwidth constraints */
void
refill_update(sched_context_t *sc, ticks_t new_period, ticks_t new_budget, word_t new_max_refills)
{
    /* this is called on an active thread. We want to preserve the sliding window constraint -
     * so over new_period, new_budget should not be exceeded even temporarily */

    /* move the head refill to the start of the list - it's ok as we're going to truncate the
     * list to size 1 - and this way we can't be in an invalid list position once new_max_refills
     * is updated */
    REFILL_INDEX(sc, 0) = REFILL_HEAD(sc);
    sc->scRefillHead = 0;
    /* truncate refill list to size 1 */
    sc->scRefillTail = sc->scRefillHead;
    /* update max refills */
    sc->scRefillMax = new_max_refills;
    /* update period */
    sc->scPeriod = new_period;

    if (refill_ready(sc)) {
        REFILL_HEAD(sc).rTime = NODE_STATE(ksCurTime);
    }

    if (REFILL_HEAD(sc).rAmount >= new_budget) {
        /* if the head's budget exceeds the new budget just trim it */
        REFILL_HEAD(sc).rAmount = new_budget;
        maybe_add_empty_tail(sc);
    } else {
        /* otherwise schedule the rest for the next period */
        refill_t new = { .rAmount = (new_budget - REFILL_HEAD(sc).rAmount),
                         .rTime = REFILL_HEAD(sc).rTime + new_period
        };
        refill_add_tail(sc, new);
    }
}

static inline void
schedule_used(sched_context_t *sc, refill_t new)
{
    /* schedule the used amount */
    if (new.rAmount < MIN_BUDGET && !refill_single(sc)) {
        /* used amount is too small - merge with last and delay */
        REFILL_TAIL(sc).rAmount += new.rAmount;
        REFILL_TAIL(sc).rTime = MAX(new.rTime, REFILL_TAIL(sc).rTime);
    } else if (new.rTime <= REFILL_TAIL(sc).rTime) {
        REFILL_TAIL(sc).rAmount += new.rAmount;
    } else {
        refill_add_tail(sc, new);
    }
}

/* Charge the head refill its entire amount.
 *
 * `used` amount from its current replenishment without
 * depleting the budget, i.e. refill_expired returns false.
 */
void
refill_budget_check(sched_context_t *sc, ticks_t usage, ticks_t capacity)
{
    if (capacity == 0) {
        while (REFILL_HEAD(sc).rAmount <= usage) {
            /* exhaust and schedule replenishment */
            usage -= REFILL_HEAD(sc).rAmount;
            if (refill_single(sc)) {
                /* update in place */
                REFILL_HEAD(sc).rTime += sc->scPeriod;
            } else {
                refill_t old_head = refill_pop_head(sc);
                old_head.rTime = old_head.rTime + sc->scPeriod;
                schedule_used(sc, old_head);
            }
        }

        /* budget overrun */
        if (usage > 0) {
            /* budget reduced when calculating capacity */
            /* due to overrun delay next replenishment */
            REFILL_HEAD(sc).rTime += usage;
            /* merge front two replenishments if times overlap */
            if (!refill_single(sc) &&
                REFILL_HEAD(sc).rTime + REFILL_HEAD(sc).rAmount >=
                REFILL_INDEX(sc, refill_next(sc, sc->scRefillHead)).rTime) {

                refill_t refill = refill_pop_head(sc);
                REFILL_HEAD(sc).rAmount += refill.rAmount;
                REFILL_HEAD(sc).rTime = refill.rTime;
            }
        }
    }

    capacity = refill_capacity(sc, usage);
    if (capacity > 0 && refill_ready(sc)) {
        refill_split_check(sc, usage);
    }

    /* ensure the refill head is sufficient, such that when we wake in awaken,
     * there is enough budget to run */
    while (REFILL_HEAD(sc).rAmount < MIN_BUDGET || refill_full(sc)) {
        refill_t refill = refill_pop_head(sc);
        REFILL_HEAD(sc).rAmount += refill.rAmount;
        /* this loop is guaranteed to terminate as the sum of
         * rAmount in a refill must be >= MIN_BUDGET */
    }
}

/*
 * Charge a scheduling context `used` amount from its
 * current refill. This will split the refill, leaving whatever is
 * left over at the head of the refill.
 */
void
refill_split_check(sched_context_t *sc, ticks_t usage)
{
    /* first deal with the remaining budget of the current replenishment */
    ticks_t remnant = REFILL_HEAD(sc).rAmount - usage;

    /* set up a new replenishment structure */
    refill_t new = (refill_t) {
        .rAmount = usage, .rTime = REFILL_HEAD(sc).rTime + sc->scPeriod
    };

    if (refill_size(sc) == sc->scRefillMax || remnant < MIN_BUDGET) {
        /* merge remnant with next replenishment - either it's too small
         * or we're out of space */
        if (refill_single(sc)) {
            /* update in place */
            new.rAmount += remnant;
            REFILL_HEAD(sc) = new;
        } else {
            refill_pop_head(sc);
            REFILL_HEAD(sc).rAmount += remnant;
            schedule_used(sc, new);
        }
    } else {
        /* split the head refill */
        REFILL_HEAD(sc).rAmount = remnant;
        schedule_used(sc, new);
    }
}

/*
 * This is called when a thread is eligible to start running: it
 * iterates through the refills queue and merges any
 * refills that overlap.
 */
void
refill_unblock_check(sched_context_t *sc)
{
    if (isRoundRobin(sc)) {
        /* nothing to do */
        return;
    }

    /* advance earliest activation time to now */
    if (refill_ready(sc)) {
        REFILL_HEAD(sc).rTime = NODE_STATE(ksCurTime);
        NODE_STATE(ksReprogram) = true;

        /* merge available replenishments */
        while (!refill_single(sc)) {
            ticks_t amount = REFILL_HEAD(sc).rAmount;
            if (REFILL_INDEX(sc, refill_next(sc, sc->scRefillHead)).rTime <= NODE_STATE(ksCurTime) + amount) {
                refill_pop_head(sc);
                REFILL_HEAD(sc).rAmount += amount;
                REFILL_HEAD(sc).rTime = NODE_STATE(ksCurTime);
            } else {
                break;
            }
        }
    }
}
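As an aside, the head/tail-inclusive circular buffer convention used by the refill queue above can be illustrated with a small standalone sketch. This is plain C for illustration only, not kernel code; the names QUEUE_MAX, next_index and queue_size are chosen for the example and mirror refill_next() and refill_size() in the listing.

#include <stdio.h>

#define QUEUE_MAX 8u

/* Wrap-around index, as in refill_next(): the last slot wraps to slot 0. */
static unsigned next_index(unsigned index)
{
    return (index == QUEUE_MAX - 1u) ? 0u : index + 1u;
}

/* Size calculation, as in refill_size(): the queue always holds at least one
 * item, so head == tail means size 1, not empty. */
static unsigned queue_size(unsigned head, unsigned tail)
{
    if (head <= tail) {
        return tail - head + 1u;
    }
    return tail + 1u + (QUEUE_MAX - head);
}

int main(void)
{
    /* the second example from the listing's comment: 5 items, wrapped around */
    unsigned head = 5u, tail = 1u;
    printf("size = %u\n", queue_size(head, tail)); /* prints 5 */
    printf("next after head = %u\n", next_index(head)); /* prints 6 */
    return 0;
}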

C Fastpath Performance

C.1 ARM

C.1.1 KZM

Counter Baseline MCS Diff Overhead

cycles              263 (0)       290 (0)       27     10.3 %
Cache L1I miss      3.0 (0.6)     5.0 (1.1)     2.0    66.7 %
Cache L1D miss      0 (0)         2.0 (1.3)     2.0    ∞
TLB L1I miss        0 (0)         0 (0)         0      0 %
TLB L1D miss        0 (0)         0 (0)         0      0 %
Instruction exec.   132 (0)       149 (0)       17     12.9 %
Branch mispredict   5 (0)         5 (0)         0      0.0 %
memory access       0 (0)         1.0 (1.0)     1.0    ∞

Table C.1: KZM seL4_Call fastpath.

Counter Baseline MCS Diff Overhead

cycles              304.0 (0.5)   350.0 (1.1)                    46.0           15.1 %
Cache L1I miss      3.0 (0.7)     4.0 (0.6)                      1.0            33.3 %
Cache L1D miss      0 (0)         2952790016.0 (2056059057.1)    2952790016.0   ∞
TLB L1I miss        0 (0)         0 (0)                          0              0 %
TLB L1D miss        0 (0)         0 (0)                          0              0 %
Instruction exec.   154 (0)       191 (0)                        37             24.0 %
Branch mispredict   6 (0)         6 (0)                          0              0.0 %
memory access       0 (0)         0.0 (0.9)                      0.0            0 %

Table C.2: KZM seL4_ReplyRecv fastpath.

C.1.2 Sabre

Counter Baseline MCS Diff Overhead

cycles              288 (0)       372.0 (58.3)   84.0   29.2 %
Cache L1I miss      0 (0)         0 (0)          0      0 %
Cache L1D miss      0 (0)         0 (0)          0      0 %
TLB L1I miss        4 (0)         4 (0)          0      0.0 %
TLB L1D miss        3 (0)         4 (0)          1      33.3 %
Instruction exec.   143 (0)       160 (0)        17     11.9 %
Branch mispredict   0 (0)         0 (0)          0      0 %
memory access       0 (0)         0 (0)          0      0 %

Table C.3: SABRE seL4_Call fastpath.

Counter Baseline MCS Diff Overhead

cycles              313.0 (1.0)   334.0 (2.1)    21.0   6.7 %
Cache L1I miss      0 (0)         0 (0)          0      0 %
Cache L1D miss      0 (0)         0 (0)          0      0 %
TLB L1I miss        4 (0)         5 (0)          1      25.0 %
TLB L1D miss        4 (0)         3 (0)          -1     -25.0 %
Instruction exec.   165 (0)       201 (0)        36     21.8 %
Branch mispredict   0 (0)         0 (0)          0      0 %
memory access       0 (0)         0 (0)          0      0 %

Table C.4: SABRE seL4_ReplyRecv fastpath.

C.1.3 Hikey32

Counter Baseline MCS Diff Overhead

cycles              236.0 (2.8)   251.0 (3.4)    15.0   6.4 %
Cache L1I miss      0 (0)         0 (0)          0      0 %
Cache L1D miss      0 (0)         0 (0)          0      0 %
TLB L1I miss        0 (0)         0 (0)          0      0 %
TLB L1D miss        0 (0)         0 (0)          0      0 %
Instruction exec.   144 (0)       161 (0)        17     11.8 %
Branch mispredict   4 (0)         4.0 (0.4)      0.0    0.0 %
memory access       50 (0)        60 (0)         10     20.0 %

Table C.5: HIKEY32 seL4_Call fastpath.

Counter Baseline MCS Diff Overhead

cycles              252.0 (4.1)   275 (0)        23.0   9.1 %
Cache L1I miss      0 (0)         0 (0)          0      0 %
Cache L1D miss      0 (0)         0 (0)          0      0 %
TLB L1I miss        0 (0)         0 (0)          0      0 %
TLB L1D miss        0 (0)         0 (0)          0      0 %
Instruction exec.   166 (0)       202 (0)        36     21.7 %
Branch mispredict   4.0 (0.4)     4.0 (0.3)      0.0    0.0 %
memory access       59 (0)        67 (0)         8      13.6 %

Table C.6: HIKEY32 seL4_ReplyRecv fastpath.

C.1.4 Hikey64

Counter Baseline MCS Diff Overhead

cycles              251.0 (3.1)   278.0 (4.1)    27.0   10.8 %
Cache L1I miss      0 (0)         1.0 (1)        1.0    ∞
Cache L1D miss      0 (0)         0 (0)          0      0 %
TLB L1I miss        0 (0)         0 (0)          0      0 %
TLB L1D miss        0 (0)         0 (0)          0      0 %
Instruction exec.   183 (0)       209 (0)        26     14.2 %
Branch mispredict   2.0 (0.4)     3.0 (0.9)      1.0    50.0 %
memory access       81 (0)        93 (0)         12     14.8 %

Table C.7: HIKEY64 seL4_Call fastpath.

Counter Baseline MCS Diff Overhead

cycles              267.0 (5.2)   303 (5.4)      36.0   13.5 %
Cache L1I miss      0 (0)         2.0 (0.8)      2.0    ∞
Cache L1D miss      0 (0)         0 (0)          0      0 %
TLB L1I miss        0 (0)         0 (0)          0      0 %
TLB L1D miss        0 (0)         0 (0)          0      0 %
Instruction exec.   201 (0)       254 (0)        53     26.4 %
Branch mispredict   2.0 (0.6)     2.0 (0.4)      0.0    0.0 %
memory access       87 (0)        97 (0)         10     11.5 %

Table C.8: HIKEY64 seL4_ReplyRecv fastpath.

C.1.5 TX1

Counter Baseline MCS Diff Overhead

cycles              398.0 (7.6)   424.0 (1.6)    26.0   6.5 %
Cache L1I miss      0 (0)         0 (0)          0      0 %
Cache L1D miss      0 (0)         0 (0)          0      0 %
TLB L1I miss        0 (0)         0 (0)          0      0 %
TLB L1D miss        0 (0)         0 (0)          0      0 %
Instruction exec.   183 (0)       208 (0)        25     13.7 %
Branch mispredict   0 (0)         0 (0)          0      0 %
memory access       94 (0)        106 (0)        12     12.8 %

Table C.9: TX1 seL4_Call fastpath.

Counter Baseline MCS Diff Overhead

cycles              393.0 (2.8)   424.0 (6.4)    31.0   7.9 %
Cache L1I miss      0 (0)         0 (0)          0      0 %
Cache L1D miss      0 (0)         0 (0)          0      0 %
TLB L1I miss        0.0 (0.3)     0 (0)          0.0    0 %
TLB L1D miss        0 (0)         0 (0)          0      0 %
Instruction exec.   201 (0)       253 (0)        52     25.9 %
Branch mispredict   0 (0)         0 (0)          0      0 %
memory access       102 (0)       112 (0)        10     9.8 %

Table C.10: TX1 seL4_ReplyRecv fastpath.

C.2 x86

C.2.1 ia32

Counter Baseline MCS Diff Overhead

cycles              440 (2.9)     422 (2.1)      -18    -4.1 %
Cache L1I miss      0 (0)         0 (0)          0      0 %
Cache L1D miss      0 (0)         0 (0)          0      0 %
TLB L1I miss        0 (0)         0 (0)          0      0 %
TLB L1D miss        0 (0)         0 (0)          0      0 %
Instruction exec.   202 (0)       213 (0)        11     5.4 %
Branch mispredict   0 (0)         0 (0)          0      0 %
memory access       0 (0)         0 (0)          0      0 %

Table C.11: IA32 seL4_Call fastpath.

Counter Baseline MCS Diff Overhead

cycles              412.0 (1.5)   448.0 (2.1)    36.0   8.7 %
Cache L1I miss      0 (0)         0 (0)          0      0 %
Cache L1D miss      0 (0)         0 (0)          0      0 %
TLB L1I miss        0 (0)         0 (0)          0      0 %
TLB L1D miss        0 (0)         0 (0)          0      0 %
Instruction exec.   208 (0)       264 (0)        56     26.9 %
Branch mispredict   0 (0)         0 (0)          0      0 %
memory access       0 (0)         0 (0)          0      0 %

Table C.12: IA32 seL4_ReplyRecv fastpath.

C.2.2 x64

Counter Baseline MCS Diff Overhead

cycles              449.0 (1.8)   456.0 (2.3)    7.0    1.6 %
Cache L1I miss      0 (0)         0 (0)          0      0 %
Cache L1D miss      0 (0)         0 (0)          0      0 %
TLB L1I miss        0 (0)         0 (0)          0      0 %
TLB L1D miss        0 (0)         0 (0)          0      0 %
Instruction exec.   187 (0)       220 (0)        33     17.6 %
Branch mispredict   0 (0)         0 (0)          0      0 %
memory access       0 (0)         0 (0)          0      0 %

Table C.13: X64 seL4_Call fastpath.

Counter Baseline MCS Diff Overhead

cycles              432 (1.5)     450.0 (2.1)    18.0   4.2 %
Cache L1I miss      0 (0)         0 (0)          0      0 %
Cache L1D miss      0.0 (0.5)     0 (0)          0.0    0 %
TLB L1I miss        0 (0)         0 (0)          0      0 %
TLB L1D miss        0 (0)         0 (0)          0      0 %
Instruction exec.   203 (0)       279 (0)        76     37.4 %
Branch mispredict   0 (0)         0 (0)          0      0 %
memory access       0 (0)         0 (0)          0      0 %

Table C.14: X64 seL4_ReplyRecv fastpath.
