Towards Predictable Real-Time Performance on Multi-Core Platforms

Submitted in partial fulfillment of the requirements for

the degree of

Doctor of Philosophy

in

Electrical and Computer Engineering

Hyoseung Kim

B.S., Computer Science, Yonsei University, Seoul, Korea
M.S., Computer Science, Yonsei University, Seoul, Korea

Carnegie Mellon University
Pittsburgh, PA, USA

June 2016

Copyright © 2016 Hyoseung Kim

Keywords: Cyber-physical systems, Real-time embedded systems, Safety-critical systems, Multi-core platforms, Operating systems, Virtualization, Predictable performance.

Abstract

Cyber-physical systems (CPS) integrate sensing, computing, communication and actuation capabilities to monitor and control operations in the physical environment. A key requirement of such systems is the need to provide predictable real-time performance: the timing correctness of the system should be analyzable at design time with a quantitative metric and guaranteed at runtime with high assurance. This requirement of predictability is particularly important for safety-critical domains such as automobiles, aerospace, defense, manufacturing and medical devices.

The work in this dissertation focuses on the challenges arising from the use of modern multi-core platforms in CPS. Even as of today, multi-core platforms are rarely used in safety-critical applications, primarily due to the temporal interference caused by contention on various resources shared among processor cores, such as caches, memory buses, and I/O devices. Such interference is hard to predict and can significantly increase task execution time, e.g., up to 12× on commodity quad-core platforms. To address the problem of ensuring timing predictability on multi-core platforms, we develop novel analytical and systems techniques in this dissertation. Our proposed techniques theoretically bound the temporal interference that tasks may suffer when accessing shared resources. Our techniques also involve software primitives and algorithms for real-time operating systems and hypervisors, which significantly reduce the degree of temporal interference. Specifically, we tackle the issues of cache and memory contention, locking and synchronization, interrupt handling, and access control for computational accelerators such as general-purpose graphics processing units (GPGPUs), all of which are crucial to achieving predictable real-time performance on a modern multi-core platform. Our solutions are readily applicable to commodity multi-core platforms, and can be used not only for developing new systems but also for migrating existing applications from single-core to multi-core platforms.

Acknowledgments

This dissertation would have been impossible without the help and support of many people. First and foremost, I would like to thank my advisor, Prof. Raj Rajkumar. I was lucky to work with Raj. His guidance and expertise have made me a better thinker, writer, and researcher. Raj gave me opportunities to participate in exciting projects, demonstrate my research results, and mentor other students, all of which led me to become an independent researcher and to pursue an academic career.

I am grateful to the members of my thesis committee, Prof. Onur Mutlu, Prof. Anthony

Rowe, and Dr. Shige Wang for their time, effort and inputs in completing this dissertation.

Thanks to Onur for his insights on various aspects of my work. I learned a lot from Onur on computer architectures, which was a great asset for my research. Thanks to Anthony for his feedback and advice, ever since my very early days at CMU. I enjoyed lively conversations with Anthony and liked to hear his view on cyber-physical systems. Thanks to Shige for giving me many inputs and motivating me with various practical examples. Working with

Shige was a great pleasure to me.

I would like to thank my research colleagues at the Software Engineering Institute

(SEI): Dio de Niz, Björn Andersson, and Mark Klein. Our weekly meeting was an excellent opportunity to share lots of interesting discussions and do some good collaborative work. I also would like to thank Prof. John Lehoczky for his keen insight and wisdom during our meetings at SEI.

I wish to thank the members of the CMU’s autonomous driving team: Prof. John

Dolan, Jongho Lee, Tianyu Gu, Chiyu Dong, Adam Werries, Zhiding Yu, and all other former members. Their passion and efforts made me proud of being part of the team and contributing to our autonomous car.

A special thanks to General Motors (GM), the National Science Foundation (NSF), and the Fulbright association for funding my research.

Most of my time during my doctoral studies was spent at the Real-Time and Multimedia Systems Lab (RTML). Thanks to all the members of RTML who shared their time with me: Gaurav Bhatia, Karthik Lakshmanan, Arvind Kandhalu, Junsung Kim, Reza Azimi,

Alexei Colin, Young-Woo Seo, Anand Bhat, Sandeep D’souza, and Shunsuke Aoki. Also, I would like to thank Toni M. Fox for her kind support on administrative work.

Besides the RTML members, I am grateful to my friends at CIC: Max Buevich, Niranjini

Rajagopal, Oliver Shih, Adwait Dongare, Donghyuk Lee, Sang Kil Cha, Gihyuk Ko, and

Soo-Jin Moon. I am grateful to my Korean friends who I met in Pittsburgh: Sungwon

Yang, Jaesok Yu, Yongjune Kim, Min Suk Kang, Minhee Jun, and Kiryong Ha. Without these people, I could not have fully enjoyed my time at CMU. I would like to thank my old buddies who are currently geographically far from me but always on my side: JongMan Koo,

Jaehun Ha, Kwangkyu Park, Hwan Lee, Jungho Kim, San Yoon, Junoh Jeon, Jungmyung

Kim, and Woongjung Do. I am also very grateful to Wonwoo Jung, Shinyoung Yi, Jongho

Rim, and Youngbin You for always being supportive of me.

My family has given me their endless love and support. Thanks to my parents for being my parents. My immeasurable gratitude is due to them. Thanks to my parents-in-law for their understanding during the long years of my studies. Thanks to my brother-in-law,

Taegon Lee, for his encouragement. Lastly, my thanks go to my wife, Whayoung Lee. She has been the greatest source of warmth, love and support since I met her. I would never have completed this dissertation without her.

Contents

1 Introduction 1

1.1 Scope of This Work ...... 3

1.1.1 Multi-Core Platform and Shared Resources ...... 3

1.1.2 Tasks and Task Execution Environments ...... 4

1.2 Challenges with Shared Resources ...... 6

1.2.1 Concurrent Resources ...... 6

1.2.2 Mutually-Exclusive Resources ...... 7

1.2.3 Computational Accelerators ...... 8

1.3 Contributions ...... 9

1.3.1 Analytical and Systems Support for Concurrent Resources ...... 9

1.3.2 Analytical and Systems Support for Mutually-Exclusive Resources . 11

1.3.3 Analytical and Systems Support for Computational Accelerators . . 12

1.4 Organization ...... 12

2 Background and Related Work 13

2.1 Cache Interference ...... 13

2.1.1 Page Coloring ...... 14

2.1.2 Problems with Page Coloring ...... 15

2.1.3 Related Work ...... 16

2.2 Memory Interference ...... 18

2.2.1 DRAM Organization ...... 18

2.2.2 Memory Controller ...... 19

2.2.3 Bank Address Mapping and Bank Partitioning ...... 21

2.2.4 Related Work ...... 22

2.3 Synchronization ...... 24

2.3.1 Timing Penalties from Mutually-Exclusive Resources ...... 25

2.3.2 Related Work ...... 26

2.4 Interrupt Handling ...... 28

2.4.1 Problems with Virtual ...... 28

2.4.2 Related Work ...... 29

2.5 GPGPU Management ...... 31

2.5.1 GPU Execution Pattern ...... 31

2.5.2 Related Work ...... 32

3 System Model 35

3.1 Platform Model ...... 36

3.2 Task Model ...... 37

3.3 Virtual Machine Model ...... 40

3.4 Other Assumptions ...... 41

4 Coordinated Approach for Predictable Cache Management 43

4.1 Coordinated Cache Management ...... 45

4.1.1 Cache Reservation ...... 46

4.1.2 Cache Sharing: Bounding Intra-core Penalties ...... 46

4.1.3 Cache Sharing: How to Share Cache Partitions ...... 49

4.1.4 Cache-Aware Task Allocation ...... 52

4.1.5 Tasks with Shared Memory Regions ...... 53

4.2 Evaluation ...... 54

4.2.1 Implementation ...... 54

4.2.2 Taskset ...... 56

4.2.3 Cache Reservation ...... 57

4.2.4 Cache Sharing ...... 59

4.2.5 Cache-Aware Task Allocation ...... 62

4.3 Summary ...... 64

5 Bounding and Reducing Memory Interference 65

5.1 Bounding Memory Interference Delay ...... 67

5.1.1 Request-Driven Bounding Approach ...... 68

5.1.2 Job-Driven Bounding Approach ...... 76

5.1.3 Response-Time Based Schedulability Analysis ...... 78

5.1.4 Memory Controllers with Write Batching ...... 79

5.1.5 Combining with Cache Interference Analysis ...... 80

5.2 Reducing Memory Interference via Task Allocation ...... 82

5.3 Evaluation ...... 88

5.3.1 Memory Interference in a Real Platform ...... 88

5.3.2 Memory Interference-Aware Task Allocation ...... 97

5.4 Summary ...... 101

6 Predictable Cache Management for Virtualization 103

6.1 Cache Control in Virtualization ...... 105

6.1.1 Address Translation in Virtualization ...... 105

6.1.2 vLLC for Coloring-aware Guest OSs ...... 106

6.1.3 vColoring for Coloring-unaware Guest OSs ...... 108

6.2 Cache Management Scheme ...... 110

6.2.1 Schedulability Analysis ...... 111

6.2.2 Allocating Cache Partitions to Tasks ...... 112

6.2.3 Designing a Cache-Aware VM ...... 113

6.2.4 Allocating Host Cache Partitions to VMs ...... 116

6.3 Evaluation ...... 118

6.3.1 vLLC and vColoring ...... 119

6.3.2 Cache Management Scheme ...... 123

6.4 Summary ...... 126

7 Synchronization for Multi-Core Virtual Machines 129

7.1 vMPCP Framework ...... 131

7.1.1 Protocol Description ...... 132

7.1.2 VCPU Budget Overrun ...... 133

7.1.3 vMPCP Para-virtualization Interface ...... 134

7.2 vMPCP Schedulability Analysis ...... 135

7.2.1 VCPU Schedulability ...... 135

7.2.2 Task Schedulability ...... 138

7.3 Evaluation ...... 142

7.3.1 Comparison of Different Configurations ...... 142

7.3.2 Case Study: vMPCP on KVM Hypervisor ...... 146

7.4 Summary ...... 150

8 Responsive and Enforced Interrupt Handling 151

8.1 Interrupt Handling Time in Virtualization ...... 152

8.2 vINT Scheme ...... 155

8.2.1 Pseudo-VCPU Abstraction ...... 156

8.2.2 Pseudo-VCPU Realization ...... 157

8.2.3 Selective Use of vINT ...... 160

8.3 vINT Timing Analysis ...... 160

8.3.1 VCPU and Task Schedulability ...... 161

8.3.2 Interrupt Handling Time ...... 162

8.4 Evaluation ...... 164

8.4.1 Experimental Setup ...... 164

8.4.2 Results ...... 165

8.4.3 Case Study: vINT on KVM Hypervisor ...... 169

8.5 Summary ...... 173

9 Predictable GPGPU Access Control 175

9.1 Synchronization-based GPU Access Control ...... 176

9.1.1 Limitations of Synchronization-based Approach ...... 177

9.1.2 Schedulability Analysis ...... 179

9.2 Server-based GPU Access Control ...... 181

9.2.1 Schedulability Analysis ...... 183

9.3 Evaluation ...... 184

9.3.1 Experimental Setup ...... 185

9.3.2 Results ...... 185

9.4 Summary ...... 189

10 Guidelines for Future Computer Architecture Designs 191

10.1 Architecture Support for Concurrent Resources ...... 191

10.2 Architecture Support for Mutually-Exclusive Resources ...... 193

10.3 Architecture Support for Computational Accelerators ...... 194

10.4 Summary ...... 195

11 Conclusions 197

11.1 Contributions of Our Work ...... 197

11.2 Future Research Directions ...... 201

Bibliography 203

List of Figures

1.1 Multi-core platform and shared resources ...... 4

1.2 Task execution environments and scheduling structures ...... 5

2.1 Memory address to cache mapping and page coloring ...... 14

2.2 DRAM device organization ...... 18

2.3 Logical structure of a DRAM controller ...... 19

2.4 Memory address to cache and DRAM mapping ...... 21

2.5 Execution pattern of a task accessing a GPU ...... 31

3.1 Multi-core platform considered in this work ...... 35

4.1 Overview of the proposed OS-level cache management ...... 45

4.2 Tasks sharing cache partitions with cache interference penalties ...... 49

4.3 Page allocations for different cache allocation scenarios ...... 50

4.4 Page coloring on the Intel i7-2600 L3 cache ...... 55

4.5 Execution time and L3 misses of each task as the number of cache partitions

increases when running alone in the system ...... 58

4.6 Observed execution time and L3 misses of tasks when each task runs simul-

taneously on different cores ...... 59

4.7 Response time of tasks with cache sharing on a single core ...... 60

4.8 L3 misses of task τ1 with cache sharing ...... 61

4.9 Total CPU utilization with and without cache sharing ...... 62

4.10 Minimum amount of cache required to schedule given tasksets ...... 63

4.11 Memory space efficiency at minimum cache usage ...... 64

4.12 Total CPU utilization required to schedule given tasksets ...... 64

5.1 Inter-bank row-activate timing constraints ...... 69

5.2 Data bus turn-around delay ...... 69

5.3 Rank-to-rank switch delay ...... 70

5.4 Timing diagram of consecutive row-hit requests ...... 74

5.5 Response times of a synthetic task that generates one outstanding read

memory request at a time ...... 90

5.6 Response times of a synthetic task that generates multiple outstanding read

and write memory requests at a time ...... 91

5.7 Response times of benchmarks with a private bank partition when memory-

intensive tasks run in parallel ...... 94

5.8 Response times of benchmarks with a shared bank partition when memory-

intensive tasks run in parallel ...... 95

5.9 Response times of benchmarks with three memory-non-intensive tasks . . . 96

5.10 Taskset schedulability as the ratio of memory-intensive tasks increases . . . 99

5.11 Taskset schedulability as the number of tasks increases ...... 99

5.12 Taskset schedulability as the utilization of each task increases ...... 100

5.13 Taskset schedulability when tasks have medium memory intensity ...... 100

5.14 Taskset schedulability as the number of cores increases ...... 101

6.1 Address translation layers in virtualization ...... 104

6.2 vLLC example ...... 107

6.3 Steps for re-mapping GPPs to new HPPs ...... 109

6.4 Execution times of the latency task ...... 120

6.5 Execution times of the PARSEC benchmarks when synthetic tasks run on

different VCPUs in parallel ...... 121

6.6 Response times of the PARSEC benchmarks when synthetic tasks are sched-

uled on the same VCPU ...... 121

6.7 Some of WCETs generated for our experiments ...... 124

6.8 VM utilization w.r.t the number of cache partitions ...... 125

6.9 VM utilization (∆ = 10 msec) ...... 125

7.1 Two-level priority queue of a global mutex ...... 131

7.2 Periodic server with overrun ...... 139

7.3 Deferrable server with overrun ...... 140

7.4 Periodic/deferrable server without overrun ...... 142

7.5 Taskset schedulability as the size of a gcs increases ...... 144

7.6 Taskset schedulability as the number of lockers per mutex increases . . . . . 144

7.7 Taskset schedulability as the number of gcs’s per task increases ...... 145

7.8 Taskset schedulability as the VCPU period increases ...... 145

7.9 Taskset schedulability as the task utilization per VCPU increases ...... 146

7.10 Task execution timelines under MPCP, vMPCP+DSnO and vMPCP+DSwO149

8.1 Interrupt handling in virtualization ...... 153

8.2 vINT nested interrupt handling ...... 159

8.3 Results with short interrupt inter-arrival time ...... 166

8.4 Results with long interrupt inter-arrival time ...... 166

8.5 Results with the change of VCPU period ...... 167

8.6 Results with the change of pseudo-VCPU period ...... 167

8.7 Results with the change of physical ISR length ...... 168

8.8 Results with the change of virtual ISR and DSR length ...... 169

8.9 Cumulative distribution of Netperf UDP round-trip latency ...... 170

8.10 Netperf TCP throughput ...... 171

8.11 MPlayer fps under virtual interrupt storms ...... 171

9.1 Task execution pattern under the synchronization-based approach ...... 177

9.2 Example of the synchronization-based approach ...... 178

9.3 Overview of the server-based approach ...... 180

9.4 Example of the server-based approach ...... 182

9.5 Percentage of schedulable tasksets as the length of GPU access segment

increases ...... 186

9.6 Percentage of schedulable tasksets as the ratio of GPU-using tasks increases 187

9.7 Percentage of schedulable tasksets as the ratio of GPU-using tasks increases 188

9.8 Percentage of schedulable tasksets as the overhead of the GPU server ()

increases ...... 189

List of Tables

3.1 Summary of system model parameters ...... 36

4.1 Task parameters for cache management experiments ...... 57

4.2 Cache-to-task allocation for cache management experiments ...... 60

5.1 DRAM timing parameters ...... 68

5.2 Base parameters for memory interference experiments ...... 97

6.1 Implementation cost of vLLC and vColoring ...... 119

6.2 Base parameters for cache management experiements in virtualization . . . 124

7.1 Base parameters for vMPCP synchronization experiments ...... 143

7.2 Implementation cost of vMPCP on the KVM hypervisor ...... 147

8.1 Comparison with prior work on interrupt handling ...... 152

8.2 Base parameters for vINT interrupt handling experiments ...... 164

8.3 Implementation cost of vINT on the KVM hypervisor ...... 170

9.1 Base parameters for GPGPU access control experiments ...... 185

List of Algorithms

1 MinCacheAlloc(Γj, Ncache): finds a feasible cache allocation with the minimum CPU utilization ...... 51

2 FindBestFit(τi, NP , AΓ, Acache): finds the best-fit core for a given task in

the presence of cache interference ...... 51

3 CacheAwareTaskAlloc(Γ, NP , Ncache): allocates tasks and cache partitions

to cores ...... 53

4 MIAA(Γ,NP ,Nbank): a memory interference-aware task allocation algorithm 83

5 MemoryInterferenceGraph(Γ): creates a memory interference graph . . . . . 84

6 LeastInterferingBank(Nbank, Π, G, ϕ): finds a bank partition that likely leads to the least amount of memory interference to unallocated tasks ...... 84

7 BestFit(ϕ, Π): finds the best-fit core for a given task in the presence of

memory interference ...... 85

8 RemoveExcess(pi, G): removes a task from a given core for schedulability . . 86

9 ExtractMinCut(ϕ, max_util, G): breaks a task bundle into two sub-bundles in the presence of memory interference ...... 86

10 CacheToTaskAlloc(Γ,Ncache): finds a cache-to-task allocation ...... 113

11 CacheAwareVM(Γ,Nvcpu,Ncache,Tv): a cache-aware virtual machine design-

ing algorithm ...... 114

12 BestFitWithCache(ϕ, V,Nrem): finds the best-fit virtual CPU core for a

given task bundle ...... 115

13 BreakBundle(ϕ, size, Ncache): breaks a task bundle into two sub-bundles in

the presence of cache interference ...... 116

14 CacheToVMAlloc(V,Ncache): determines the number of cache partitions for

the virtual machines to be consolidated into the same hardware platform . . 118

Chapter 1

Introduction

Cyber-physical systems (CPS) are increasingly used in safety-critical application domains including automotive, aerospace, defense, manufacturing and medical devices. Since many

CPS directly impact human safety and the environment, they must sense, process and react to external events with stringent timing requirements. Any transient violation of the timing requirements may lead to system failures, resulting in catastrophic consequences.

Hence, a cyber-physical system for safety-critical applications should provide predictable real-time performance. The timing correctness of the system should be analyzable at design time with a quantitative metric and be guaranteed at runtime with high assurance.

The conventional approach to developing a system for safety-critical applications is to use single-core processor platforms. This is because, although ensuring real-time predictability is a challenging issue, existing theoretical foundations and real-time operating systems

(OSs) make it achievable only on single-core platforms. Unfortunately, these approaches have limitations in meeting the ever-increasing computational demands for additional functionalities in safety-critical applications. For example, in the automotive domain, some recent cars such as the Lexus LS430 already have more than a hundred processors each [1], and adding advanced automotive technologies like adaptive cruise control, pedestrian detection and collision avoidance is becoming harder due to space and cost requirements.

Modern multi-core processors are therefore receiving much attention as promising

candidates for the development of next-generation CPS for safety-critical applications.

The use of multi-core platforms gives an opportunity to consolidate multiple applications onto a single hardware platform. Such consolidation leads to a significant reduction in space requirements while also reducing installation, management and production costs by reducing the number of processor chips and wiring harnesses among them. However, providing real-time predictability on a multi-core platform is substantially different from doing so on a single-core platform. Tasks executing in parallel on different cores may contend with each other to access shared resources, e.g., a last-level cache, main memory, and I/O devices. This contention causes temporal interference among tasks and may result in significant delay, e.g., up to 12× increase in task execution time [2], which can easily jeopardize the timing predictability of the entire system. For this reason, government regulations and certification standards for safety-critical systems, e.g., DO-178C by the U.S.

Federal Aviation Administration (FAA), still do not advise the use of multi-core platforms.

In this dissertation, we focus on the challenges arising from the use of modern multi-core platforms in CPS. Specifically, we develop novel analytical and systems techniques that address the predictability issues associated with shared resources in multi-core platforms.

Our techniques theoretically bound temporal interference among tasks in the presence of contention on the shared resources. Also, our techniques reduce the interference by complementary software techniques and algorithms for real-time OSs and virtualization.

With these techniques, we provide predictable real-time performance on accessing each type of shared resource, e.g., caches, main memory, sensors, I/O devices and GPUs, and ensure the predictability of the entire system in an efficient way. The main thesis supported by this dissertation is as follows:

Thesis Statement: Novel primitives in systems software combined with analyt-

ical techniques yield timing predictability on a multi-core platform by bounding

and significantly reducing temporal interference from shared platform resources.

The remainder of this chapter provides context for this dissertation. First, we describe

the scope of this work. Secondly, we discuss the challenges associated with each type of shared resource. Thirdly, we present the contributions of this dissertation, along with forward references to later chapters. Lastly, we describe the organization of this dissertation.

1.1 Scope of This Work

We give a brief description of shared resources and task execution environments considered in this dissertation. The detailed system model used in this work will be described in

Chapter 3.

1.1.1 Multi-Core Platform and Shared Resources

The work in this dissertation considers a computing platform equipped with a single-chip, homogeneous multi-core processor. All the cores of the processor are identical to each other, in terms of clock speed, instruction execution performance, and access time to platform resources. This corresponds to many of today’s multi-core processors, such as Intel Core i7, AMD FX, ARM Cortex-A15, and NXP QorIQ processors. Although there also exist other types of multi-core processors, such as IBM Cell and ARM big.LITTLE architectures, we focus on the former type of processors in this work.

The computing platform has various resources shared among all processor cores, as illustrated in Figure 1.1. We categorize those shared resources into the following three types:

• Concurrent Resources: Shared resources that allow concurrent access from mul-

tiple tasks are referred to as concurrent resources. The resources in the multi-core

memory hierarchy, such as a last-level cache, a memory controller, and DRAM, belong

to this type.

• Mutually-Exclusive Resources: Shared resources that require no more than one

task to access them at a time are referred to as mutually-exclusive resources. Any

access to mutually-exclusive resources should obey the requirement of mutual exclusion


Figure 1.1: Multi-core platform and shared resources

to prevent data corruption and/or unexpected behavior. I/O devices, such as sensors,

actuators and network interfaces, typically belong to this category. Also, shared data

regions are considered as mutually-exclusive resources.

• Computational Accelerators: Shared resources that supplement the computa-

tional capacity are referred to as computational accelerators. GPGPUs (general-

purpose graphics processing units), DSPs (digital signal processors), and FPGAs

(field-programmable gate arrays) fall into this category, but in this work, we specifically

focus on GPGPUs. In the rest of this dissertation, we will use the terms “GPGPU”

and “GPU” interchangeably.

1.1.2 Tasks and Task Execution Environments

CPS applications are typically composed of a set of recurrent tasks with timing constraints.

Hence, we consider the sporadic task model [3] to represent CPS applications in an analyzable way. Under the sporadic task model, each task repeatedly releases a workload, called a job, with a minimum time interval. The response time of a task is the time duration from the release of a job of the task to the completion of the job execution. Each task has a timing constraint, called a relative deadline, and the timing constraint of a task is deemed to be satisfied if the worst-case response time of the task is smaller than or equal to its relative deadline. A task is called schedulable if it satisfies its timing constraint. A set of tasks (taskset) is schedulable if all tasks in the set satisfy their timing constraints.
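As an illustration of these definitions, the following minimal sketch (not part of the original dissertation) encodes a sporadic task and the schedulability condition that the worst-case response time must not exceed the relative deadline; the names and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SporadicTask:
    wcet: float              # worst-case execution time of one job
    min_interarrival: float  # minimum time between job releases
    deadline: float          # relative deadline

def is_schedulable(wcrt: float, task: SporadicTask) -> bool:
    # A task satisfies its timing constraint if its worst-case response
    # time does not exceed its relative deadline.
    return wcrt <= task.deadline

def taskset_schedulable(wcrts, tasks) -> bool:
    # A taskset is schedulable if every task satisfies its constraint.
    return all(is_schedulable(r, t) for r, t in zip(wcrts, tasks))

# Example: two sporadic tasks with hypothetical worst-case response times.
tasks = [SporadicTask(2.0, 10.0, 10.0), SporadicTask(5.0, 20.0, 15.0)]
print(taskset_schedulable([4.0, 14.0], tasks))   # True
```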

(a) Non-hierarchical scheduling in a native environment. (b) Hierarchical scheduling in a virtualized environment.

Figure 1.2: Task execution environments and scheduling structures

Task execution environments considered in this work fall into two categories: native and virtualized environments. In a native environment, the system runs an OS that schedules tasks on physical CPU cores (PCPUs)1, as shown in Figure 1.2(a). This is referred to as a non-hierarchical scheduling structure. In a virtualized environment, the system runs a hypervisor providing a two-level hierarchical scheduling structure as shown in Figure 1.2(b).

The hypervisor hosts multiple guest virtual machines (VMs), each of which has one or more virtual CPUs (VCPUs). The tasks of a VM are scheduled on the VCPUs of that

VM by the guest OS. Each VCPU is a scheduling entity to the hypervisor, meaning that the hypervisor schedules VCPUs on PCPUs. Each VCPU has an execution budget and a budget replenishment period, and the tasks of a VCPU can only execute when the VCPU has spare budget. The VCPU budget replenishment policy used in the system, such as a periodic server [4], a deferrable server [5] and a sporadic server [6], determines when and how to refill the budget of VCPUs. The characteristics of these policies will be described in

Section 3.
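For illustration, the sketch below models the VCPU parameters just described, an execution budget and a replenishment period, using a periodic-server-style refill. The class and its fields are hypothetical and are not the interface of any particular hypervisor; deferrable and sporadic servers would change only the replenishment rule.

```python
from dataclasses import dataclass

@dataclass
class VCPU:
    budget: float     # execution budget per replenishment period
    period: float     # budget replenishment period
    remaining: float = 0.0

    def replenish(self):
        # Periodic-server-style refill: the full budget is restored at each
        # period boundary; deferrable and sporadic servers differ in when
        # and how much budget is restored.
        self.remaining = self.budget

    def execute(self, demand: float) -> float:
        # Tasks of this VCPU may run only while the VCPU has spare budget.
        done = min(demand, self.remaining)
        self.remaining -= done
        return done

vcpu = VCPU(budget=3.0, period=10.0)
vcpu.replenish()
print(vcpu.execute(5.0))   # 3.0 executed; the rest waits for the next period
```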

There are two approaches to schedule tasks on multiple processing cores: partitioned and global. Partitioned scheduling statically assigns each task to a core and always executes the task on that core. Under partitioned scheduling, finding an optimal task allocation can be modeled as a bin-packing problem. Global scheduling, on the other hand, allows

1We will use the terms “cores” and “PCPUs” interchangeably in the rest of this dissertation.

tasks to migrate from one core to another at runtime. In this dissertation, we focus on partitioned fixed-priority preemptive scheduling for both OSs and hypervisors due to the following reasons:

(i) It is widely supported in many commercial real-time embedded OSs and hypervisors

such as OKL4 [7] and PikeOS [8].

(ii) It does not introduce task migration costs.

(iii) It can benefit from the well-established uniprocessor theoretical framework.
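Since partitioned scheduling turns task allocation into a bin-packing problem, as noted above, a simple first-fit heuristic over task utilizations is shown below. This is only a generic sketch under a plain utilization bound, not the cache- or memory-aware allocation algorithms developed later in this dissertation.

```python
def first_fit_partition(task_utils, num_cores, util_bound=1.0):
    """Assign each task (given by its utilization) to the first core whose
    accumulated utilization stays within util_bound; returns None on failure."""
    core_utils = [0.0] * num_cores
    assignment = {}
    for tid, u in enumerate(task_utils):
        for core in range(num_cores):
            if core_utils[core] + u <= util_bound:
                core_utils[core] += u
                assignment[tid] = core
                break
        else:
            return None   # no core fits this task
    return assignment

# Example: four tasks allocated onto a dual-core platform.
print(first_fit_partition([0.4, 0.3, 0.5, 0.2], num_cores=2))
```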

1.2 Challenges with Shared Resources

Shared resources on a multi-core platform cause different challenges depending on their types. We briefly discuss the specific challenges associated with each type of shared resource.

1.2.1 Concurrent Resources

Cache Interference: Many of today’s multi-core processors incorporate a large last-level shared cache to improve the performance and efficiency of the system. The shared cache can efficiently bridge the performance gap between memory access latency and processor clock speed by backing up small private caches. Each of the cores can access the entire shared cache so that a better cache hit ratio can be statistically achieved. However, the uncontrolled use of the shared cache introduces significant worst-case timing penalties in task execution, due to cache interference among tasks. It has been shown in [9] that cache interference on a quad-core processor increases task response time by up to 40%, compared to when the task runs alone in the system with no cache interference from other tasks. As the number of cores increases, the negative impact of cache interference becomes more significant. In this dissertation, we develop OS-level and hypervisor-level techniques to provide predictable cache performance to tasks executing in native and virtualized environments, respectively.

Memory Interference: Main memory is another major shared resource among processor

cores. A task running on one core can be delayed by other tasks running simultaneously on different cores due to interference in the shared main memory system, which is referred to as memory interference. Memory interference delay can be large and highly variable, thereby posing a significant challenge for the design of predictable systems. Specifically, in modern systems, commercial-off-the-shelf (COTS) DRAM systems have been widely used as main memory to cope with high performance and capacity demands. The DRAM system contains multiple hardware components such as a memory controller, DRAM banks, and buses. Each of these components has different timing characteristics, making it difficult to analyze memory interference. In addition, as memory-intensive applications are becoming more prevalent in CPS, the reduction of this interference is critical to making effective use of multi-core platforms. The work in this dissertation presents techniques to bound and reduce memory interference.

1.2.2 Mutually-Exclusive Resources

Synchronization: Consolidating multiple tasks onto a single hardware platform inevitably introduces sharing of mutually-exclusive resources, e.g. shared data regions for inter-task communication, network interfaces, and I/O devices. Those resources are typically protected by mutually-exclusive locks to avoid race conditions. When a task requests access to such a resource, the resource can be granted to the task only if it is not held by another task.

Otherwise, the task is blocked until the requested resource is released. Hence, for the timing predictability of tasks, we need a synchronization mechanism that provides bounded blocking times. The sharing of mutually-exclusive resources and task synchronization issues have been intensively studied in the context of native environments. However, prior approaches can lead to unbounded blocking times in a virtualized environment due to the hierarchical scheduling structure. In this dissertation, we present a novel synchronization scheme to address such timing penalties in a virtualized environment.

Interrupt Handling: I/O devices like sensors and actuators use interrupts to notify events

in the physical environment to the computing system. Hence, in addition to synchronization, interrupt handling and resulting execution flows should be carefully designed for predictable access to I/O devices. We have identified two strong requirements for the interrupt handling scheme of CPS: (i) providing responsive and bounded interrupt handling time while ensuring the schedulability of tasks, and (ii) enforcing interrupts to protect task execution from interrupt storms2. The issues of responsive and enforced interrupt handling have been mainly studied for a native execution environment. However, these requirements are not satisfied by prior work in a virtualized environment. In this dissertation, we develop an analyzable interrupt handling scheme to address the aforementioned requirements in a virtualized environment.

1.2.3 Computational Accelerators

GPGPU Management: The high computational demands of complex algorithmic tasks used in recent CPS pose substantial challenges in guaranteeing their timeliness. For example, CMU’s autonomous vehicle [11] executes perception and motion planning algorithms along with running tasks for data fusion from tens of sensors equipped within the vehicle.

Since each of these tasks is computation intensive, it becomes harder to satisfy their timing requirements when they execute on the same hardware platform. Fortunately, many of today’s embedded multi-core processors, such as NXP i.MX6 and NVIDIA Tegra K1, have an on-chip GPGPU, the use of which can greatly help in addressing the timing challenges of computation-intensive tasks by accelerating their execution. However, today’s COTS

GPU hardware and device drivers are not designed with predictability as a primary concern.

First of all, execution on a GPU is non-preemptive. While a lower-priority task is using a GPU, GPU execution requests from higher-priority tasks are delayed until the current

GPU execution finishes. In addition, GPU device drivers do not consider the scheduling policy used in the system. Hence, GPU requests from lower-priority tasks may be handled

2An interrupt storm is a condition where a system receives interrupts at an unexpectedly high rate and the processing of those interrupts takes the majority of the CPU time. It is also known as the receive livelock problem [10].

earlier than those from higher-priority tasks, which negatively impacts the schedulability of tasks. In this dissertation, we present techniques to control GPU access in a timely and efficient manner.

1.3 Contributions

The overarching contribution of this work is the development of novel analytical and systems techniques to yield timing predictability on a multi-core platform. Our techniques address cache and memory interference, synchronization, interrupt handling, and GPGPU management issues, all of which are crucial to achieving predictable real-time performance on modern multi-core platforms.

1.3.1 Analytical and Systems Support for Concurrent Resources

The following is a brief description of our contributions to concurrent resources. Details on these contributions are described in Chapters 4, 5, and 6.

• Coordinated Approach for Predictable Cache Management: We develop a

coordinated OS-level cache management scheme to address cache interference. Our

scheme provides predictable cache performance through tight coordination of cache

reservation, reserved cache sharing, and cache-aware task allocation. This approach

also mitigates the two major problems of the conventional software cache partitioning

technique: (i) the memory co-partitioning problem, which results in page swapping

or waste of memory, and (ii) the availability of a limited number of cache partitions,

which causes degraded performance.3 We have implemented and evaluated our scheme

in Linux/RK running on a quad-core platform. Experimental results indicate that,

compared to the traditional approaches, our scheme is up to 39% more memory space

efficient and consumes up to 25% fewer cache partitions while preserving timing

predictability. Our scheme also yields a significant utilization benefit that increases

3For additional details, please see Section 2.1.

with the number of tasks.

• Bounding and Reducing Memory Interference: We present techniques to

reduce memory interference and find an upper bound on the worst-case memory

interference on a multi-core platform with DRAM-based main memory. We explicitly

model major resources in the DRAM system, including banks, buses, and the memory

controller. By considering their timing characteristics, we analyze the worst-case

memory interference delay imposed on a task by other tasks running in parallel.

Experimental results show that our approach provides an upper bound very close

to our measured worst-case interference. From our analysis, we find that memory

interference can be significantly reduced by (i) partitioning DRAM banks, and (ii)

co-locating memory-access-intensive tasks on the same processing core. Based on

these observations, we develop a memory interference-aware task allocation algorithm

for reducing memory interference. Our memory interference-aware task allocation

algorithm provides a significant improvement in task schedulability over previous

work, with as much as 96% more tasksets being schedulable.

• Predictable Cache Management for Virtualization: In addition to OS-level

techniques, we develop a predictable cache management framework for a virtualized

environment. Our framework introduces two hypervisor-level techniques, vLLC and

vColoring, that enable the allocation of cache partitions to individual tasks running

in a virtual machine (VM), which is not achievable by prior work. Our framework

also provides a cache management scheme that determines cache allocation to tasks,

designs VMs in a cache-aware manner, and minimizes the aggregated utilization

of VMs to be consolidated. As a proof of concept, we implemented vLLC and

vColoring in the KVM hypervisor running on x86 and ARM multi-core platforms.

Experimental results with three different guest OSs, namely Linux/RK, vanilla Linux

and MS Windows Embedded, show that our techniques can effectively control the

allocation of cache partitions to tasks in VMs. Experimental results also show that

our cache management scheme yields a significant utilization benefit compared to

other approaches.

1.3.2 Analytical and Systems Support for Mutually-Exclusive Resources

The following is a brief description of our contributions to mutually-exclusive resources.

Details on these contributions are described in Chapters 7 and 8.

• Synchronization for Multi-Core Virtual Machines: We develop vMPCP, a

synchronization framework for tasks executing in a virtualized environment. vMPCP

exposes the execution of critical sections of tasks in a guest virtual machine to

the hypervisor. Using this approach, vMPCP reduces and bounds blocking time

on accessing mutually-exclusive resources shared within and across virtual CPUs

(VCPUs) assigned to different physical CPU cores. vMPCP supports various VCPU

budget replenishment policies, with an optional budget overrun to reduce blocking

times. We provide the VCPU and task schedulability analyses under vMPCP, with

different VCPU budget replenishment policies, with and without budget overrun. The

case study using our hypervisor implementation shows that vMPCP yields significant

benefits compared to a virtualization-unaware multi-core synchronization protocol,

with 29% shorter response time on average.

• Responsive and Enforced Interrupt Handling: We develop a novel interrupt

handling scheme for a multi-core virtualization environment, called vINT. vINT pro-

vides a pseudo-VCPU abstraction dedicated for interrupt handling, which overcomes

the limits imposed by the timing parameters of virtual CPUs in an analyzable way.

vINT also accounts for and enforces interrupt handling and resulting execution flows

within a guest virtual machine. We analyze interrupt handling time as well as VCPU

and task schedulability, with and without vINT. Our experimental results indicate

that vINT achieves timely interrupt handling while providing as good task schedula-

bility as when it is not used. Our case study based on a prototype implementation on

the KVM hypervisor shows that vINT yields significant benefits in reducing interrupt

handling time and in protecting tasks against interrupt storms permeating into the

virtual machine.

1.3.3 Analytical and Systems Support for Computational Accelerators

The following is a brief description of our contributions to computational accelerators, specifically general-purpose GPUs. Details on these contributions are described in Chapter 9.

• Predictable GPGPU Access Control: We develop a server-based GPU access

control approach to manage a GPU in a predictable manner. Our proposed approach

introduces a dedicated server task that handles GPU requests from other tasks with

respect to their priority order. Although we focus on a GPU in this work, our

approach can be used for other types of computational accelerators, such as DSPs.

Our server-based approach also addresses the main limitations of an existing real-

time synchronization-based GPU access control approach, which will be discussed in

Section 9.1. Experimental results indicate that our server-based approach yields sig-

nificant improvements in task schedulability over the synchronization-based approach.

For example, a quad-core system with our approach schedules 66% more randomly-

generated tasksets than the same quad-core system that uses the synchronization-based

approach with the multiprocessor priority ceiling protocol [12, 13].

1.4 Organization

The rest of this dissertation is organized as follows. Chapter 2 reviews the background and related work, and Chapter 3 describes the system model used in this work. Chapters 4, 5, and 6 present our work for concurrent resources. Chapters 7 and 8 present our work for mutually-exclusive resources. Chapter 9 presents our work for computational accelerators.

Chapter 10 discusses guidelines for future computer architecture designs. Chapter 11 concludes this dissertation.

Chapter 2

Background and Related Work

This chapter presents the background and related work on the following five issues: cache interference, memory interference, synchronization, interrupt handling, and GPGPU management. Each section of this chapter reviews relevant systems software techniques and/or hardware components, and discusses related prior work.

2.1 Cache Interference

Many researchers have recognized and studied the problem of cache interference in order to use a shared cache in a predictable manner. Among a variety of approaches, software cache partitioning, called page coloring, has been considered as an appealing approach to address this issue. Page coloring prevents cache disruptions from other tasks by assigning exclusive cache partitions to each task. It does not require any hardware support beyond what is available on most of today’s multi-core processors. In this section, we describe the page coloring technique and discuss its problems. We then review related work on cache interference.

Figure 2.1: Memory address to cache mapping and page coloring

2.1.1 Page Coloring

Page coloring is a software technique used to control a physically-indexed set-associative cache, which is the case for most shared caches on modern processors. On a physically-indexed cache, page coloring uses the mapping between physical addresses and cache set indices. As shown in Figure 2.1, there are overlapping intersection bits between the physical page number and the cache set index. Page coloring uses those intersection bits as a cache-partition index which partitions the cache into n cache partitions. Simultaneously, the cache-partition index co-partitions the entire physical memory into n memory partitions. In other words, physical memory pages with the same cache-partition index are grouped into a memory partition, and each physical memory partition corresponds to a cache partition with the same cache-partition index. Since the OS has direct control over the mapping between physical pages and the virtual pages of an application task, it can allocate specific cache partitions to a task by providing the task with physical pages in the corresponding memory partitions.
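The mechanism described above can be summarized in a short sketch that derives the cache-partition (color) index of the page containing a physical address; the cache parameters are example values, not a description of a specific processor.

```python
def color_of_page(phys_addr, cache_size, ways, line_size=64, page_size=4096):
    """Return the cache-partition (color) index of the page containing
    phys_addr, using the overlap between the physical page number and
    the cache set index."""
    num_sets = cache_size // (ways * line_size)
    num_colors = cache_size // (ways * page_size)   # n = S / (W * P)
    set_index = (phys_addr // line_size) % num_sets
    # The color is given by the set-index bits that lie above the page offset.
    return set_index // (page_size // line_size) % num_colors

# Example: a 256 KB, 16-way cache with 64 B lines and 4 KB pages has 4 colors.
print(color_of_page(0x1234_5000, cache_size=256 * 1024, ways=16))
```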

The number of cache partitions available in the system is calculated as follows: n =

S/(W × P ), where n is the number of cache partitions, S is the cache size, W is the number of ways of the cache, and P is the size of a page frame and is typically 4KB. Hence, if

S = 256KB, W = 16 and P = 4KB, the number of cache partitions n is 4. One implicit assumption in page coloring is that the number of cache sets is a power of two. In some architectures like Intel Sandy Bridge and Haswell, the last-level cache consists of cache slices, the number of which is equal to that of physical cores [14, 15]. As shown in [16, 17],

although the mapping between physical addresses and cache slices is not publicly known, page coloring on such architectures can be implemented on a per cache-slice basis. This results in the number of cache partitions equal to n = S/(W × P × NP), where NP is the number of physical cores.

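The two partition-count formulas can be evaluated directly; the sketch below uses the 256 KB, 16-way example from the text and an assumed four-slice configuration for the sliced-cache case.

```python
def num_cache_partitions(cache_size, ways, page_size=4096, num_slices=1):
    """n = S / (W * P) for a conventional LLC; with per-core cache slices,
    coloring is applied per slice, giving n = S / (W * P * N_P)."""
    return cache_size // (ways * page_size * num_slices)

# 256 KB, 16-way cache, 4 KB pages -> 4 partitions (example from the text).
print(num_cache_partitions(256 * 1024, ways=16))
# The same aggregate size split into 4 per-core slices -> 1 partition.
print(num_cache_partitions(256 * 1024, ways=16, num_slices=4))
```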
2.1.2 Problems with Page Coloring

There are several challenging problems to be solved before page coloring can be used widely in multi-core systems. The first problem is the memory co-partitioning problem [18, 19].

Page coloring simultaneously partitions the entire physical memory into the same number of memory partitions as there are cache partitions. If a certain number of cache partitions is assigned to a task, the same number of memory partitions is also assigned to that task. However, a task’s memory usage is not necessarily related to its cache usage. If a task requires more memory partitions than cache partitions, the required memory partitions should be assigned to the task despite its small cache usage; otherwise, the task would suffer from page swapping. Conversely, if a task requires more cache partitions than memory partitions, some of the assigned memory would be wasted.
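The co-partitioning effect can be illustrated with a small sketch that, for hypothetical per-task cache and memory requirements, reports whether a task ends up short of memory (risking page swapping) or with wasted memory; all numbers are made up for illustration.

```python
def copartitioning_effect(cache_parts_needed, mem_needed_mb,
                          total_mem_mb=4096, num_partitions=32):
    """Under page coloring, receiving k cache partitions also means
    receiving exactly k memory partitions (each total_mem / n large)."""
    mem_per_partition = total_mem_mb / num_partitions
    mem_received = cache_parts_needed * mem_per_partition
    if mem_received < mem_needed_mb:
        return f"risk of page swapping: {mem_needed_mb - mem_received:.0f} MB short"
    return f"wasted memory: {mem_received - mem_needed_mb:.0f} MB unused"

# A cache-light but memory-heavy task vs. a cache-heavy but memory-light task.
print(copartitioning_effect(cache_parts_needed=2, mem_needed_mb=1024))
print(copartitioning_effect(cache_parts_needed=8, mem_needed_mb=64))
```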

The second problem is the availability of a limited number of cache partitions. As the number of tasks increases, the amount of cache that can be used for an individual task becomes smaller and smaller, resulting in degraded performance. Moreover, the number of cache partitions may not be enough for each task to have its own cache partition. This second problem also unfortunately applies to hardware-based cache partitioning schemes.

Page coloring was originally developed for a native environment. In a virtualized environment, there is one more problem: page coloring implemented in a guest OS running in a VM can no longer map a task’s virtual page to a specific cache partition. This is because there is an additional address translation layer at the hypervisor, which is to spatially isolate VMs from each other. One simple approach to consider is to implement page coloring in the hypervisor and assign cache partitions to VMs, as proposed in [20, 21].

However, this approach cannot allocate cache partitions to individual tasks running in a

VM. In other words, all tasks within the same VM share the cache partitions assigned to that VM and will suffer from cache interference. In this dissertation, we address these problems.

2.1.3 Related Work

With cache partitioning, the system performance is largely dependent on how cache partitions are allocated to tasks. Yoon et al. [22] formulated cache allocation as an MILP problem to minimize the total CPU utilization of Paolieri’s new multi-core architecture [23].

Fu et al. [24] proposed a sophisticated low-power scheme that uses both cache partitioning and DVFS. Paolieri et al. [25] proposed a task and cache allocation algorithm for a system using non-preemptive partitioned scheduling. These approaches, however, assume hardware cache partitioning support, which is not yet widely available in current commodity processors [26, 27, 28].

Software cache partitioning or page coloring is an alternative to hardware cache par- titioning support. Wolfe [29] and Liedtke et al. [18] used page coloring to prevent cache interference in a single-core system. Bui et al. [30] focused on improving the schedulability of a single-core system with page coloring. Page coloring also has been studied for multi-core systems in [31, 32, 33]. Guan et al. [34] proposed a non-preemptive scheduling algorithm for a multi-core real-time system using page coloring. Lin et al. [19] conducted a comparative study on various multi-core cache partitioning schemes by implementing them with page coloring. Mancuso et al. [35] proposed the Colored Lockdown technique that combines page coloring and cache lockdown to better keep the frequently accessed pages of tasks in a cache. Ye et al. [17] developed COLORIS that supports both static and dynamic cache partitioning based on page coloring. Ward et al. [36] focused on cache management issues in multi-core mixed-criticality systems and proposed cache locking and scheduling techniques that use page coloring. Zhang et al. [33] proposed a hot-page coloring approach that assigns cache partitions only to a small set of frequently accessed pages. However, since they use on-line page access monitoring and page migration, it may not be suitable

for time-critical systems.

Cache interference also happens in single-core systems due to task preemption, which causes the eviction of the cache contents of a preempted task. Such cache interference penalties are bounded by accounting them as cache-related preemption delays while per- forming schedulability analysis. Altmeyer et al. [37], Lunniss et al. [38] and Lee et al. [39] focused on reducing the cache penalties by using static cache analyses. However, they do not consider cache partitioning that can prevent the cache penalties by assigning exclusive cache partitions to tasks. Busquets-Mataix et al. [40] proposed a hybrid technique of cache partitioning and schedulability analysis for a single core system, but it cannot be directly applied to a shared cache of a multi-core processor. Xu et al. [41] extended multi-core compositional analysis to incorporate cache interference delay caused by private caches, assuming that there is no shared cache. Lunniss et al. [42] extended CRPD analysis to a single-core hierarchical scheduling environment. However, none of these approaches focuses on a shared cache in a multi-core platform.

There also exist some research efforts to address cache interference in a virtualized environment. Previous work on software-based cache management in a virtualization environment [20, 21] proposed to implement page coloring in the hypervisor and to allocate cache partitions to virtual machines (VMs). This approach, however, cannot be used to address cache interference among tasks running within a VM due to an additional address translation layer at the hypervisor. Kim et al. [43] proposed a hardware-based solution to enable page coloring implemented in a guest OS to work. However, hardware modification required by this approach does not allow the use of commodity multi-core processors. In addition, if a guest OS does not have page coloring support, tasks running on that guest

OS cannot get any benefit.

Figure 2.2: DRAM device organization

2.2 Memory Interference

Memory interference in a DRAM system is largely affected by two major components: (i) the DRAM chips where the actual data are stored, and (ii) the memory controller that schedules memory read/write requests to the DRAM chips. In this section, we provide a brief description of these two components. Our description is based on DDR3 SDRAM systems, but it generally applies to other types of COTS DRAM systems. For more information, interested readers may refer to [44, 45, 46, 47].

2.2.1 DRAM Organization

A DRAM system as shown in Figure 2.2 is organized as a set of ranks, each of which consists of multiple DRAM chips. Each DRAM chip has a narrow data interface (e.g. 8 bits), so the DRAM chips in the same rank are combined to widen the width of the data interface

(e.g. 8 bits/chip × 8 chips = 64 bits data bus). A DRAM chip consists of multiple DRAM banks and memory requests to different banks can be serviced in parallel. Each DRAM bank has a two-dimensional array of rows and columns of memory locations. To access a column in the array, the entire row containing the column first needs to be transferred to a row-buffer. This action is known as opening a row. Each bank has one row-buffer that contains at most one row at a time. The size of the row-buffer is therefore equal to the size of one row, which is 1024 or 2048 columns in a DDR3 SDRAM chip [48].

Figure 2.3: Logical structure of a DRAM controller

The DRAM access latency varies depending on which row is currently stored in the row-buffer of a requested bank. If a memory request accesses a row already in the row-buffer, the request is directly serviced from the row-buffer, resulting in a short latency. This case is called a row hit. If the request is to a row that is different from the one in the row-buffer, the currently open row should be closed by a precharge command and the requested row should be delivered to the row-buffer by an activate command. Then the request can be serviced from the row-buffer. This case is called a row conflict and results in a much longer latency. In both cases, transferring data through the data bus incurs additional latency.

The data is transferred in a burst mode and a burst length (BL) determines the number of columns transferred per read/write access.
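A toy model of the row-buffer behavior described above is sketched below; the latency values are placeholders rather than DDR3 timing parameters, and an access to an empty row-buffer is treated as a conflict for simplicity.

```python
class Bank:
    """Toy model of one DRAM bank with a single row-buffer."""
    ROW_HIT_LATENCY = 1       # placeholder units, not real DDR3 timings
    ROW_CONFLICT_LATENCY = 3  # precharge + activate + column access

    def __init__(self):
        self.open_row = None  # row currently held in the row-buffer

    def access(self, row):
        if row == self.open_row:
            return ("row hit", self.ROW_HIT_LATENCY)
        # Row conflict: close (precharge) the open row, then activate the new one.
        self.open_row = row
        return ("row conflict", self.ROW_CONFLICT_LATENCY)

bank = Bank()
print(bank.access(7))  # first access: buffer empty, treated as a conflict here
print(bank.access(7))  # same row: hit
print(bank.access(9))  # different row: conflict
```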

2.2.2 Memory Controller

Figure 2.3 shows the structure of a memory controller in a modern DRAM system. The memory controller is a mediator between the last-level cache of a processor and the DRAM

chips. It translates read/write memory requests into corresponding DRAM commands and schedules the commands while satisfying the timing constraints of DRAM banks and buses.

To do so, a memory controller consists of a request buffer, read/write buffers, and a memory scheduler. The request buffer holds the state information of each memory request, such as an address, a read/write type, a timestamp and its readiness status. The read/write buffers hold the data read from or to be written to the DRAM. The memory scheduler determines the service order of the pending memory requests.

The memory scheduler typically has a two-level hierarchical structure.1 As shown in Figure 2.3, the first level consists of per-bank priority queues and bank schedulers.

When a memory request is generated, the request is enqueued into the priority queue that corresponds to the request’s bank index. The bank scheduler determines priorities of pending requests and generates a sequence of DRAM commands to service each request.

The bank scheduler also tracks the state of the bank. If the highest-priority command does not violate any timing constraints of the bank, the command is said to be ready for the bank and is sent to the next level. The second level consists of a channel scheduler.

It keeps track of DRAM commands from all bank schedulers, and monitors the timing constraints of ranks and address/command/data buses. Among the commands that are ready with respect to such channel timing constraints, the channel scheduler issues the highest-priority command. Once the command is issued, the channel scheduler signals

ACK to the corresponding bank scheduler, and then the bank scheduler selects the next command to be sent.

Memory Scheduling Policy: Scheduling algorithms for COTS memory controllers have been developed to maximize the data throughput and minimize the average-case latency of

DRAM systems. Specifically, modern memory controllers employ First-Ready First-Come

First-Serve (FR-FCFS) [44, 45] as their base scheduling policy. FR-FCFS first prioritizes ready DRAM commands over others, just as the two-level scheduling structure does. At

1The physical structure of priority queues, bank schedulers, and the channel scheduler depends on the implementation. They can be implemented as a single hardware structure [45] or as multiple decoupled structures [47, 49, 50].

Figure 2.4: Task address to cache and DRAM mapping (Intel i7-2600)

the bank scheduler level, FR-FCFS re-orders memory requests as follows:

1. Row-hit memory requests have higher priorities than row-conflict requests.

2. In case of a tie, older requests have higher priorities.

Note that, in order to prevent starvation, many DRAM controllers impose a limit on the number of consecutive row-hit requests that can be serviced before a row-conflict request [47, 51]. We will discuss such a limit in Section 5.1. At the channel scheduler level,

FR-FCFS issues DRAM commands in the order of their arrival time. Therefore, under

FR-FCFS, the oldest row-hit request has the highest priority and the newest row-conflict request has the lowest priority.
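The two FR-FCFS ordering rules at the bank-scheduler level can be captured by a simple comparator. The sketch below assumes a simplified request representation (a row-hit flag and an arrival timestamp) and omits the cap on consecutive row hits mentioned above.

#include <stdbool.h>
#include <stdint.h>

struct bank_request {
    bool     row_hit;     /* request targets the currently open row of its bank */
    uint64_t timestamp;   /* arrival time at the memory controller              */
};

/* Returns a negative value if 'a' has higher FR-FCFS priority than 'b',
 * a positive value if lower, and 0 if the two requests are indistinguishable. */
static int frfcfs_compare(const struct bank_request *a,
                          const struct bank_request *b)
{
    /* Rule 1: row-hit requests are prioritized over row-conflict requests. */
    if (a->row_hit != b->row_hit)
        return a->row_hit ? -1 : 1;

    /* Rule 2: among equals, older requests (smaller timestamps) come first. */
    if (a->timestamp < b->timestamp) return -1;
    if (a->timestamp > b->timestamp) return  1;
    return 0;
}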

2.2.3 Bank Address Mapping and Bank Partitioning

In modern DRAM systems, physical addresses are interleaved among multiple banks (and ranks) to exploit bank-level parallelism for average-case performance improvement. The granularity of address interleaving is typically equal to the size of one row, because mapping adjacent addresses to the same row may provide better row-buffer locality. This strategy is called a row-interleaved address mapping policy and it is widely used in many COTS systems. As an example, Figure 2.4 shows the address mapping of the system equipped with the Intel i7-2600 processor which follows the row-interleaved policy.2 In this system, bits 13 to 16 of the physical address are used for the rank and bank indices.

The row-interleaved policy, however, can significantly increase the memory access latency in a multi-core system [52, 53, 54]. For instance, multiple tasks running simultaneously on

2The DRAM mapping of Figure 2.4 is for the single-channel configuration in this system.

different cores may be mapped to the same DRAM banks. This mapping can unexpectedly decrease the row-buffer hit ratio of each task and introduce re-ordering of the memory requests, causing significant delays in memory access.

Software bank partitioning [2, 53, 55, 56, 57, 58] is a technique used to avoid the delays due to shared banks. By dedicating a set of specific DRAM banks to each core, bank partitioning prevents both (i) the unexpected closing of a currently-open row and (ii) the negative effect of request re-ordering. Therefore, with bank partitioning, bank-level interference among tasks simultaneously executing on different cores can be effectively eliminated.

Similar to software cache partitioning discussed in Section 2.1.1, bank partitioning can be implemented by exploiting the mapping between physical addresses and rank-bank indices.

If a task is assigned only physical pages with a specific rank-bank index b, all the memory accesses of that task are performed on the rank-bank b. By controlling the physical page allocation in the OS, the physical memory space can be divided into bank partitions and a specific bank partition can be assigned to a core by allocating corresponding physical pages to the tasks of the core. Each bank partition may comprise one or more DRAM banks. If the memory requirement of the tasks of a core is larger than the size of one DRAM bank, each bank partition can be configured to have multiple DRAM banks to sufficiently satisfy the memory requirement with a single bank partition. However, due to the resulting smaller number of bank partitions, it may not be feasible to assign a dedicated bank partition to each core. In our work, we therefore consider not only dedicated DRAM banks to reduce memory interference delay but also shared banks to cope with the limited number of banks.
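The following is a minimal sketch of how an OS-level page allocator could realize bank partitioning, assuming the rank-bank index occupies physical address bits 13 to 16 as in the Figure 2.4 example; the bit positions and helper names are platform-specific assumptions rather than a general interface.

#include <stdint.h>

/* Rank+bank index assumed to occupy physical address bits 13-16, as in the
 * Intel i7-2600 single-channel mapping of Figure 2.4 (platform-specific). */
#define BANK_SHIFT 13u
#define BANK_MASK  0xFu

static inline unsigned bank_partition_of(uint64_t phys_addr)
{
    return (unsigned)((phys_addr >> BANK_SHIFT) & BANK_MASK);
}

/* Hypothetical allocator hook: hand out only free pages whose bank bits match
 * the bank partition reserved for the requesting core. Returns the physical
 * address of a matching free page, or 0 if none is available. */
static uint64_t alloc_page_in_bank(const uint64_t *free_page_addrs,
                                   unsigned nr_free, unsigned allowed_bank)
{
    for (unsigned i = 0; i < nr_free; i++)
        if (bank_partition_of(free_page_addrs[i]) == allowed_bank)
            return free_page_addrs[i];
    return 0;
}

Because the page offset occupies only the low-order address bits, the OS fully controls the bank bits of a virtual page simply by choosing which physical page backs it.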

2.2.4 Related Work

Several prior studies have developed special non-COTS memory components to achieve predictable memory access time. The Predator memory controller [59] uses credit-based arbitration and closes an open row after each access. The AMC memory controller [60] spreads the data of a single cache block across all DRAM banks so as to reduce the impact of interference by serializing all memory requests. The PRET DRAM controller [61] hardware

partitions banks among cores for predictability. A memory controller that allows different burst sizes for different memory requests has been proposed [62]. Another proposed memory controller partitions the set of banks such that a single memory access fetches data from multiple banks (bank interleaving) within a partition, and it uses the open-row policy [63]. Researchers have also proposed techniques that modify a program and carefully set up time-triggered schedules so that there is no instant where two processor cores have outstanding memory operations [64].

We have heard, however, a strong interest from practitioners in techniques that can use

COTS-based multi-core platforms and existing applications without requiring modifications.

Therefore, this has been the focus of our work. In this context, some previous work considers the entire memory system as a single resource such that a processor core holds the memory system exclusively until the requested data are delivered to the core [65, 66, 67, 68, 69].

These studies commonly assume that each memory request takes a constant service time and that memory requests from multiple cores are serviced in the order of their arrival. However, these assumptions may lead to overly pessimistic or optimistic estimates in modern COTS

DRAM systems, where the service time of each memory request varies and the memory controller re-orders the memory requests [46].

Instead of considering the memory system as a single resource, recent work [70] makes a more realistic assumption about the memory system, where the memory controller has one request queue per DRAM bank and one system-wide queue connected to the per-bank queues. That analysis, however, only considers the case where each processor core is assigned a private DRAM bank. Unfortunately, the number of DRAM banks is growing more slowly than the number of cores, and the memory space requirement of a workload in a core may exceed the size of a single bank. Due to this limited availability of DRAM banks, it is necessary to consider sharing of DRAM banks among multiple cores. With bank sharing, memory requests can be re-ordered in the per-bank queues, thereby increasing memory request service times. The work in [70] unfortunately does not model this request re-ordering effect. In addition, the work assumes a special non-COTS memory controller.

In this dissertation, we address these limitations.

In the field of task allocation, the problem of finding an optimal allocation of tasks to cores is known to be NP-complete [71]. Hence, many near-optimal algorithms based on the bin-packing heuristics have been proposed as practical solutions to the task allocation problem [72, 73, 74]. The IA3 algorithm [25] is the first approach to take memory interference into account when allocating tasks. IA3 pessimistically assumes that the amount of memory interference for a task is only affected by the number of cores used, and does not consider the actual number of interfering memory requests generated by other tasks that run in parallel. Motivated by this, we develop a memory interference-aware allocation algorithm that reduces memory interference by considering the memory access intensity of each task.

Chapter 5 will present details on our algorithm and its experimental results.

Finally, there has been recent work in the computer architecture community on the design of memory controllers and memory systems that can dynamically estimate application slowdowns [46, 75, 76, 77]. These designs, however, do not aim to provide worst-case bounds and may under-estimate memory interference. There also exist research efforts on designing memory controllers for heterogeneous systems (e.g., [78, 79]) and nonvolatile memory

(e.g., [80]). Future memory controllers might incorporate ideas like batching and thread prioritization (e.g., [49, 51, 81, 82, 83]), which would raise interesting research questions regarding predictability.

2.3 Synchronization

From a scheduling perspective, shared mutually-exclusive resources, such as I/O devices and shared data regions, are categorized into two types: global and local resources. Global resources are the resources shared among tasks executing on different physical CPU cores

(PCPUs) in a native environment, or on different virtual CPUs (VCPUs) in a virtualized environment. The critical sections corresponding to the global resources are referred to as global critical sections. Conversely, local resources are shared among tasks executing on

the same PCPU or VCPU. The corresponding critical sections are local critical sections.

In this section, we characterize timing penalties that arise from the two types of shared mutually-exclusive resources in native and virtualized environments. Then, we review related prior work.

2.3.1 Timing Penalties from Mutually-Exclusive Resources

Timing penalties caused by accessing mutually-exclusive resources in a multi-core platform can be categorized into local blocking and remote blocking. Local blocking time is the duration for which a task needs to wait for the execution of lower-priority tasks assigned on the same core. Uniprocessor real-time synchronization protocols like PCP [84] can bound the local blocking time to at most the duration of one local critical section. Remote blocking time is the duration that a task has to wait for the executions of tasks of any priority assigned on different cores. If a task τi tries to access a resource held by another task on a different core, τi suspends itself until the resource-holding task finishes its corresponding critical section. Multiprocessor real-time synchronization protocols such as MPCP [12] have been proposed to bound and minimize the duration of remote blocking.

Unlike local blocking, remote blocking causes additional timing penalties even though a multiprocessor synchronization protocol like MPCP is used [85]:

• Back-to-back execution: If a task suspends by itself due to remote blocking,

its self-suspending behavior can cause a back-to-back execution phenomenon [86],

resulting in additional interference to lower-priority tasks.

• Multiple priority inversions: Whenever a medium-priority task suspends due

to remote blocking, lower-priority tasks get a chance to execute and issue requests

for local or global resources. In case of local resources under PCP, every normal

execution segment of a medium-priority task can be blocked at most once by one of

the lower-priority tasks executing their local critical sections with inherited higher

priorities. In case of global resources under MPCP, every normal execution segment

of a task can be preempted at most once by each of the lower-priority tasks executing

global critical sections. Consequently, multiple priority inversions caused by remote

blocking increase the local blocking time.

In a virtualized environment, the remote blocking time may become significantly longer due to:

• Preemptions by higher-priority VCPUs: Consider a task τi in a VCPU vj

waiting on a global resource held by another task in a VCPU vk assigned on a

different physical core. If the VCPU vk is preempted by higher-priority VCPUs on

its core, the remote blocking time for the task τi is increased by the execution times

of those higher-priority VCPUs.

• VCPU budget depletion: Tasks in a VCPU are scheduled by using their VCPU’s

budget. When the VCPU budget of a resource-holding task is depleted, a task

waiting remotely on that resource needs to wait at least until the start of the next

replenishment period of the resource-holding task’s VCPU.

2.3.2 Related Work

Synchronization issues in multi-core and multiprocessor systems have been intensively studied in the non-hierarchical scheduling context. MPCP (Multiprocessor Priority Ceiling

Protocol) [12, 13] provides bounded remote blocking time on accessing global shared resources under partitioned fixed-priority scheduling. MPCP uses the uniprocessor PCP [84] for accessing local resources. Recently, a new schedulability analysis for MPCP was proposed in [85]. MSRP (Multiprocessor Stack-based Resource Policy) [87] is an extension of the uniprocessor SRP [88] for resource sharing under partitioned EDF scheduling. A comparison of MPCP and MSRP is also provided in [87]. FMLP (Flexible Multiprocessor

Locking Protocol) [89] is the first protocol that supports both partitioned and global EDF scheduling. MSOS (Multiprocessors Synchronization for real-time Open Systems) [90] is designed for resource sharing among independently-developed systems where each processor

may use a different scheduling algorithm. All these protocols, however, are designed for non-hierarchical scheduling, so they may cause indefinite remote blocking time under the hierarchical scheduling of virtualized environments.

In the hierarchical scheduling context, much research has been conducted on the schedulability analysis of independent tasks on uniprocessors [91, 92, 93, 94] and multiprocessors [95, 96]. For tasks with shared resources, HSRP (Hierarchical Stack Resource

Policy) [97] is the first synchronization protocol proposed in the context of uniprocessor hierarchical scheduling. HSRP uses budget overrun and payback mechanisms to limit priority inversion. SIRAP (Subsystem Integration and Resource Allocation Policy) [98] uses the idea of self-blocking to bound delays on accessing shared resources without knowing the timing parameters of other subsystems. RRP (Rollback Resource Policy) [99] uses a rollback mechanism to prevent a task from being blocked while it is holding a lock. However, none of these protocols has been extended to the multi-core hierarchical scheduling context.

In [100], the authors propose to group tasks sharing a resource into a single component and to use the hierarchical scheduling model to schedule the tasks and the component. The purpose of this approach is to avoid global resource sharing in a multi-core system, but it limits the sum of the utilization of tasks sharing a resource to be less than one.

The virtualization of real-time and cyber-physical systems has recently received much attention. RT-Xen [101, 102] is the first hierarchical real-time scheduling framework for the Xen hypervisor. RT-Xen implements a suite of fixed-priority servers for the VCPU budget replenishment policy. The work in [103] investigates the real-time performance of the L4/Fiasco microkernel-based hypervisor [104]. However, these approaches have not considered the synchronization issues.

In this dissertation, our goal on mutually-exclusive resources is to minimize the remote blocking in a multi-core virtualized environment. Another goal is to bound the remote blocking time of a task as a function of the duration of global critical sections of other tasks

(and the parameters of VCPUs having those tasks when overrun is not used), and not as a function of the duration of normal execution segments or local critical sections.

2.4 Interrupt Handling

Interrupt handling and the resulting execution flows are indispensable for many systems that must interact with the physical environment at a lower latency than polling can provide. As discussed in Section 1.2.2, techniques for predictable interrupt handling have been intensively studied in a native environment, but not in a virtualized environment. Therefore, we focus on predictable interrupt handling in a virtualized environment.

In a virtualized environment, a physical interrupt generated by a sensor or network interface is first handled by the interrupt service routine (ISR) of the hypervisor, and then delivered to the corresponding VCPU in the form of a virtual interrupt. Once that VCPU is scheduled, the virtual interrupt is handled by the ISR of the guest OS while consuming the VCPU’s budget. Finally, the interrupt triggers the execution of any task responsible for reacting to that interrupt. In this section, we describe problems with interrupt handling in virtualization, and then review related prior work.

2.4.1 Problems with Virtual Interrupts

The main difference between interrupt handling in virtualized and non-virtualized environments is the presence of virtual interrupts. Here, we detail two major problems associated with virtual interrupts.

• Timing penalties to virtual interrupt handling: Once a virtual interrupt is

injected into a VCPU, it is handled by using the priority and budget of the VCPU.

Virtual interrupt handling time is thus affected by the following two factors. First,

when a virtual interrupt is delivered to a VCPU, the budget of the VCPU vi might

have been completely consumed by other tasks within vi. Hence, the handling of the

virtual interrupt may be delayed until the start of the next replenishment period of

the VCPU vi. Second, even if a VCPU vi has enough budget to handle a virtual

interrupt, the handling of that virtual interrupt may be delayed by the execution of

any task on higher-priority VCPUs that can preempt the VCPU vi.

• Virtual interrupt storms: Previous work that addresses interrupt storms

in a native environment [105, 106, 107] uses a dedicated aperiodic server, e.g., a

deferrable server or a sporadic server, for interrupt handling. When the server budget

is depleted, the associated interrupt is not handled until the start of the next replenishment

period of the server. By doing so, the impact of an interrupt storm on CPU time is

limited to the amount of the budget assigned to the associated server.

While previous work can be applied to the hypervisor to address physical interrupt

storms, it may not be used for virtual interrupt storms in a full-virtualization scenario,

where an unmodified guest OS is used and it is unaware of being virtualized. In

general, OSs measure the passage of time by reading and comparing two clock values,

e.g., t1 − t0 = elapsed time from t0 to t1. Under full virtualization, an unmodified

guest OS can check the passage of physical time in this manner. However, the guest

OS cannot use the same manner to check the passage of virtual time, which is the

actual CPU time used by the guest VCPU. This is because the guest OS is unaware

of when and how much VCPU-level preemptions are caused. In other words, when

previous work is used for virtual interrupts under full virtualization, it may result in

significant errors in the accounting of virtual interrupt handling.

In this dissertation, our goals on interrupt handling are twofold: (i) minimize and bound interrupt handling time in a virtualized environment, and (ii) account for virtual interrupt handling and protect tasks from virtual interrupt storms without any modifications to the guest OS.

2.4.2 Related Work

Previous work on interrupt handling in a native environment commonly uses a split interrupt handling model to execute deferrable work within a task context [106, 108, 109].

Specifically, Zhang and West [110] proposed the Process-Aware Interrupt (PAI) mechanism that schedules and accounts for Linux bottom halves with the highest priority of the tasks

waiting on the corresponding interrupt. Palmer and West [107] proposed to use deferrable servers to handle interrupts in order to minimize the receive livelock problem [10]. Danish et al. [105] proposed a Priority Inheritance Bandwidth-Preserving (PIBP) policy to handle interrupts and I/O requests with the budget and priority of the associated task. All these schemes, however, are designed for a native system and cannot address the problems of virtual interrupt handling in a virtualized system, discussed in the previous subsection.

There also exist many research efforts attempting to address other aspects of interrupts in a native environment. Leyva-del-Foyo et al. [111] proposed an integrated task and interrupt management model. By using a very short ISR that only activates a task corresponding to the interrupt, the proposed model could reduce the interference from interrupts associated with lower-priority tasks. Elliott and Anderson [112] focused on the priority inversion problem caused by the interrupts of GPU asynchronous I/O in a multi-core system using global scheduling. Brandenburg et al. [113] investigated various interrupt accounting mechanisms for multi-core systems using global EDF scheduling.

To overcome the limitations of hierarchical scheduling in virtualization, approaches based on paravirtualized scheduling [114, 115, 116] have been studied. All of these approaches require modifications to the scheduler of a guest OS to let the hypervisor know the currently-executing task within the VM. Using this information, the hypervisor increases the priority of the corresponding VCPU so that the VCPU is not preempted by other VMs executing lower-priority tasks. However, none of these approaches bounds the worst-case interrupt handling time. They also do not enforce virtual interrupt handling. Specifically, the work in

[115] proposes to assign a separate budget and priority to a subset of tasks and interrupts of a VCPU, but does not consider virtual interrupt storms and does not show how the separate budget and priority values can be determined.

Beckert et al. [117] proposed an interrupt handling scheme for virtualization. However, their approach has several limitations: (i) the hypervisor is assumed to use TDMA to schedule VCPUs, which does not conform to the latest research efforts on real-time system virtualization, (ii) virtual interrupts may be handled while consuming the budgets of

other, unrelated VCPUs, meaning that each VCPU is not guaranteed to use its assigned budget for its own purpose, and (iii) task schedulability in the presence of virtual interrupts is not considered. In this dissertation, we address these limitations.

2.5 GPGPU Management

A GPGPU or simply GPU is a computational accelerator. Tasks running on CPU cores can offload some of their workloads to the GPU to reduce their response times and to save CPU utilization. The use of a GPU, of course, causes a different execution pattern compared to when it is not used. The characteristics of the GPU hardware and device driver also affects the timing behavior of the task execution. In this section, we first describe the execution pattern of a task using a GPU, and then review prior work on predictable GPU management.

Figure 2.5: Execution pattern of a task accessing a GPU

2.5.1 GPU Execution Pattern

The execution time of a task using a GPU can be decomposed into normal execution segments and GPU access segments. Normal execution segments run entirely on CPU cores and GPU access segments involve GPU executions. Figure 2.5 depicts an example of a task having one GPU access segment. In the GPU access segment, the task first copies data needed for GPU execution, from CPU memory to GPU memory. Then, the task triggers

the GPU execution and waits until the GPU execution finishes. During this time, the task may suspend or busy-wait, depending on the implementation of the GPU device driver and the configuration used. The task is notified when the GPU execution finishes, and it copies the results back from the GPU to the CPU. Finally, the task continues its normal execution segment.

There are several issues we need to consider for the use of a GPU in a predictable manner.

First, today’s COTS GPUs do not support a preemption mechanism, and GPU execution requests from multiple tasks are handled in a sequential, non-preemptive manner. This is primarily due to the high overhead expected on GPU context switching [118]. Second,

COTS GPU device drivers do not respect task priorities and the scheduling policy used in the system. Hence, in the worst case, the GPU access request of the highest-priority task may be delayed by the requests of all lower-priority tasks in the system, which causes possibly unbounded priority inversion. These issues have motivated the development of a predictable GPU management scheme to ensure task timing constraints while achieving performance improvement.

2.5.2 Related Work

Many software techniques have been developed to utilize a GPU as a predictable, shared computing resource. TimeGraph [119] is a real-time GPU scheduler that schedules GPU access requests from tasks with respect to task priorities. This is done by modifying an open-source GPU device driver and monitoring GPU commands issued by tasks at the driver level. TimeGraph also provides a resource reservation mechanism that accounts for and enforces the GPU usage of each task, with posterior and apriori enforcement techniques.

RGEM [120] is another real-time GPU scheduler implemented as a user-level library. Hence,

RGEM can be used with proprietary, closed-source GPU device drivers. RGEM provides similar features to TimeGraph, such as scheduling of GPU requests in task priority order. In addition, RGEM allows splitting a long data-copy operation into smaller chunks, reducing blocking time on data-copy operations. Gdev [121] is similar to TimeGraph and RGEM in

the GPU scheduling perspective, but provides common APIs to both user-level tasks and the OS kernel to use a GPU as a computing resource. GPES [122] is a software technique to break a long GPU execution segment into smaller sub-segments, allowing preemptions at the boundaries of sub-segments. While all these techniques can mitigate the limitations of today’s GPU hardware and device drivers, they have not considered the schedulability of tasks. In other words, GPU requests from tasks are handled in a predictable manner under those techniques, but the timing behavior of tasks on the CPU side, especially on a multi-core CPU, has not been studied as a primary concern.

Kim et al. [123] focused on the schedulability analysis of tasks using hardware accelerators like GPUs. They found that conventional real-time scheduling analysis requires tasks not to suspend while accessing GPUs, which may significantly waste CPU utilization. As a solution to this problem, they proposed a new scheduling policy, called segment-fixed priority scheduling, which assigns different priorities and phase offsets to each segment of tasks. Since the determination of the optimal priorities and offsets for individual segments is NP-hard in the strong sense, they developed several heuristics for priority and offset assignment. However, their approach is limited to single-core systems.

Elliott et al. [124, 125] modeled GPUs as mutually-exclusive resources and developed

GPUSync, a software framework based on real-time synchronization protocols to access

GPUs. This synchronization-based approach has many benefits. First, it can schedule

GPU requests from tasks in a predictable manner, without making changes to GPU device drivers. Second, it allows the task schedulability analysis originally developed for multi-core synchronization protocols to be directly used for analyzing the tasks accessing GPUs in a multi-core environment. However, this approach requires the GPU access segments of tasks to be treated as critical sections, meaning that tasks cannot suspend during GPU executions. Also, the use of real-time synchronization protocols for GPUs may unnecessarily delay the executions of high-priority tasks due to the priority-boosting mechanism employed in such protocols. In this dissertation, we develop a new approach to address the limitations of the synchronization-based approach for GPU management. Section 9 will discuss more

details on the use of the synchronization-based approach under partitioned fixed-priority scheduling, present our proposed approach, and compare the performance characteristics of these two approaches.

Chapter 3

System Model

This chapter describes the system model used throughout this dissertation. As briefly described in Section 1.1, we consider a multi-core platform equipped with three types of shared resources: concurrent resources, mutually-exclusive resources, and computational accelerators. Figure 3.1 illustrates the computing platform, denoted as Π, considered in this work and the parameters associated with the shared platform resources. Tasks are executed with access to the shared platform resources of Π in either native or virtualized environments. The entire system parameters we use, including the platform, tasks, and virtual machines, are summarized in Table 3.1. Detailed explanations are given in the following sections.

Figure 3.1: Multi-core platform considered in this work

Table 3.1: Summary of system model parameters

Type                  Params      Descriptions
Platform (Π)          NP          Number of physical CPU cores
                      Mtotal      Total memory size in Mbytes
                      Ncache      Number of cache partitions
                      Nbank       Number of bank partitions
Task (τi)             Ci(k)       WCET of task τi, when k cache partitions are assigned to it
                      Ci          Simplified form of Ci(k)
                      Ti          Minimum inter-arrival time of each job of τi
                      Di          Relative deadline of τi
                      Mi          Required physical memory size in Mbytes
                      Hi(k)       Max. DRAM requests of τi, when k cache partitions are assigned to it
                      Hi          Simplified form of Hi(k)
                      Gi          Maximum accumulated GPU access time of τi
Critical section      Ci,j        WCET of j-th normal execution segment of τi
                      Ei,j        WCET of j-th critical section segment of τi
                      σi          Number of critical section segments of τi
GPU access            Gi,j        Maximum length of j-th GPU access segment of task τi
                      ηi          Number of GPU access segments of τi
                      Xi,j^e      Worst-case GPU execution time in j-th GPU access segment of τi
                      Xi,j^m      WCET of miscellaneous operations in j-th GPU access segment of τi
Virtual machine (VM)  Nvcpu       Number of VCPUs in a VM
                      Ci^v(k)     Execution budget of a VCPU vi, when k cache partitions are assigned to it
                      Ci^v        Simplified form of Ci^v(k)
                      Ti^v        Budget replenishment period of a VCPU vi

3.1 Platform Model

We consider a computing platform Π equipped with a single-chip multi-core processor and Mtotal Mbytes of DRAM as main memory. The processor has NP identical cores running at a fixed clock speed. In this work, we assume that each core has a fully timing-compositional architecture as described in [126]. This means that each core is in-order with one outstanding memory access request, and any delay from shared resources is additive to the task response time.

Shared Cache: The multi-core processor has a unified last-level cache (LLC) shared among all cores. We use page coloring to manage the shared cache in software. Page coloring is implemented in the OS in a native environment, and in the hypervisor in a virtualized environment. With page coloring, the LLC is divided into Ncache cache partitions. Each cache partition is represented as a unique integer in the range from 1 to Ncache.
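As a concrete illustration of page coloring, the cache partition (color) of a physical page is determined by the LLC set-index bits that lie above the page offset, so the OS selects a partition simply by selecting a physical page. The page size and partition count below are illustrative values, not those of a specific processor.

#include <stdint.h>

#define PAGE_SHIFT 12u   /* 4 KB pages (illustrative)                      */
#define NCACHE     32u   /* number of cache partitions / colors (example)  */

/* Color of the page containing 'phys_addr': the set-index bits above the
 * page offset, taken modulo the number of partitions (a power of two here,
 * so the modulo is equivalent to masking those bits). */
static inline unsigned page_color(uint64_t phys_addr)
{
    return (unsigned)((phys_addr >> PAGE_SHIFT) % NCACHE);
}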

DRAM System: We assume the DDR SDRAM system described in Section 2.2. The

memory controller uses the FR-FCFS policy, and the arrival times of memory requests are assumed to be recorded when they arrive at the memory controller. DRAM consists of one or more ranks. The memory controller uses an open-row policy which keeps the row-buffer open. We assume that the DRAM is not put into a low-power state at any time.

The LLC and the DRAM system are connected by a single memory channel. We assume that all data fetched from the DRAM system are stored in the LLC. A single memory request can fetch one entire cache line from the DRAM because of the burst-mode data transfer. The addresses of memory requests are aligned to the size of BL (burst length).

We limit our focus to memory requests from CPU cores and leave DMA (Direct Memory

Access) as our future work.

Bank partitioning is considered to divide DRAM banks into Nbank partitions. Each bank partition comprises one or more DRAM banks that are not shared with other bank partitions, and is represented as a unique integer in the range from 1 to Nbank. It is assumed that the number of DRAM banks in each bank partition and the number of bank partitions assigned to a task do not affect the task’s worst-case execution time.

GPGPU: We assume that the platform Π is equipped with a single, general-purpose GPU device. The GPU has the characteristics described in Section 2.5. Hence, although the GPU can be shared among multiple tasks, GPU requests from tasks are handled in a sequential, non-preemptive manner. The GPU has its own memory space, which is assumed to be sufficient for the GPU memory usage of tasks. We also assume that the data copy request of a GPU-using task from the main memory to the GPU memory, and vice versa, is handled by the memory controller, just like normal memory requests. Analyzing the effects of using DMA for GPU data copy remains as our future work.

3.2 Task Model

We consider sporadic tasks with constrained deadlines. Tasks are scheduled by partitioned fixed-priority preemptive scheduling. Thus, each task is statically assigned to a

single physical core in a native environment, and to a single virtual CPU (VCPU) in a virtualized environment. Any fixed-priority assignment can be used for tasks, such as

Rate-Monotonic [127]. Task τi is represented with the following parameters:

τi := (Ci(k),Ti,Di,Mi,Hi(k),Gi)

• Ci(k): the worst-case execution time (WCET) of task τi, when it runs alone in a

system with k cache partitions assigned to it

• Ti: the minimum inter-arrival time of each job of τi

• Di: the relative deadline of each job of τi (Di ≤ Ti)

• Mi: the size of required physical memory in Mbytes, which should be assigned to τi

to prevent swapping

• Hi(k): an upper bound on the number of DRAM requests generated by any job of τi,

when k cache partitions are assigned to it

• Gi: the maximum accumulated GPU access time of τi

We assume that Ci(k) is monotonically decreasing with k. This is a common assumption in the literature: the actual WCET function may not be monotonic, but this assumption can be easily satisfied by monotonic over-approximations of WCETs with insignificant pessimism [128]. Each task τi has a unique priority πi. An arbitrary tie-breaking rule can be used to achieve this under fixed-priority scheduling. Note that no assumptions are made on the memory access pattern of a task (e.g., access rate).

Parameters Ci(k) and Hi(k) can be obtained by either measurement-based or static-analysis tools. When a measurement-based approach is used, Ci(k) and Hi(k) need to be conservatively estimated. Especially in a system with a write-back cache, Hi(k) should take into account dirty lines remaining in the cache. We assume that Ci(k) and Hi(k)

1Capturing the overhead of virtualization in those parameters is beyond the scope of our work. However, we believe this does not limit the practicality of our work because it is relatively small (e.g., more than 99% of native performance can be achieved in full-virtualization mode with recent hardware virtualization support [129]).

parameters remain the same in both native and virtualized environments.1

In the rest of this dissertation, Ci and Hi may be used instead of Ci(k) and Hi(k), respectively, when each task is assumed to have been already assigned its cache partitions.

Tasks with Critical Sections: Mutually-exclusive resources considered in this work are protected by suspension-based mutex locks. Tasks access shared resources in a non-nested manner, meaning that each task can hold only one resource at a time. If a task τi accesses such resources, the WCET Ci can be decomposed into an alternating sequence of normal execution segments and critical section segments as follows:

Ci := (Ci,1,Ei,1,Ci,2,Ei,2, ..., Ei,σi ,Ci,σi+1)

• Ci,j: the WCET of the j-th normal execution segment of task τi

• Ei,j: the WCET of the j-th critical section segment of τi

• σi: the number of critical section segments of τi

We use Ei to denote the sum of the WCETs of the critical section segments of τi. Hence,

E_i = \sum_{j=1}^{\sigma_i} E_{i,j}, \qquad \text{and} \qquad C_i = \sum_{j=1}^{\sigma_i + 1} C_{i,j} + \sum_{j=1}^{\sigma_i} E_{i,j}

Tasks with GPGPU Accesses: As presented in Section 2.5.1, a task using a GPU has one or more GPU access segments. We use ηi to denote the number of GPU access segments of task τi, and Gi,j to denote the maximum length of the j-th GPU access segment of τi.

Hence,

G_i = \sum_{j=1}^{\eta_i} G_{i,j}

The j-th GPU access segment of τi can be decomposed as follows:

Gi,j := (Xi,j^e, Xi,j^m)

• Xi,j^e: the worst-case GPU execution time in the j-th GPU access segment of τi

• Xi,j^m: the WCET of miscellaneous operations, including data copies and notifications,

in the j-th GPU access segment of τi

3.3 Virtual Machine Model

When virtualization is used, the system runs a hypervisor hosting multiple guest virtual machines (VMs). To distinguish it from VMs, the system is also referred to as a host machine in a virtualized environment. Each VM has one or more VCPUs. The VCPUs are scheduled on the physical CPU cores (PCPUs) of the host machine by partitioned

fixed-priority preemptive scheduling. Hence, each VCPU is statically assigned to a single

PCPU, and any fixed-priority assignment can be used for VCPUs. Each VM is represented as follows:

VM := (v1, v2, ..., vNvcpu)

where vi is a VCPU and Nvcpu is the number of VCPUs in the VM. We represent a VCPU vi as follows:

vi := (Ci^v(k), Ti^v)

• Ci^v(k): the execution budget of a VCPU vi, represented as a function of the total

number of cache partitions (k) assigned to vi

• Ti^v: the budget replenishment period of a VCPU vi

Since task execution time is affected by the number of assigned cache partitions, it is obvious that the required budget of a VCPU is also affected by the number of cache partitions to be used by its tasks. With this model, the computational demand of each VM can be presented to the hypervisor and other VMs, without revealing its task attributes. We will show in Section 6.2 how to find the budget of each VCPU with respect to the number

of cache partitions. For brevity, Ci^v may be used instead of Ci^v(k), when each VCPU is assumed to have been assigned its cache partitions.

For the VCPU budget supply and replenishment policies, we consider periodic server [4], sporadic server [6], and deferrable server [5] variants, because they have been widely used in real-time virtualization [20, 102, 130, 131]. Under the periodic server policy, each VCPU becomes active periodically and executes its tasks that are ready to be executed until the

VCPU’s budget is exhausted. When a VCPU has no task ready to execute, the VCPU cannot preserve its budget; the budget is idled away. Under the deferrable server policy, a

VCPU can preserve its budget until the end of its current period. Hence, the tasks of the

VCPU can execute any time while the VCPU’s budget remains. The budget-preserving feature of the deferrable server policy causes a jitter equal to T^v − C^v [132]. Under the sporadic server policy, a VCPU can preserve its budget, but only the amount of budget used is replenished Ti^v units after the start of the use of that amount, yielding a zero release jitter.

3.4 Other Assumptions

We further make the following assumptions in this dissertation:

• We assume that the multi-core processor considered in this work uses neither si-

multaneous multithreading (SMT) [133] nor dynamic voltage and frequency scaling

(DVFS) [134]. This assumption is made to minimize timing uncertainties possibly

caused by such techniques. Although there has been work on utilizing SMT (e.g., [135])

and DVFS (e.g., [136, 137]) in predictable systems, we leave them as our future work.

• We assume that tasks do not use dynamic memory allocation, since it is typically

prohibitively expensive when real-time predictability is important [35]. Also, we

assume that tasks do not experience page swapping and they have been allocated

their required physical memory size (Mi).

• We assume that each VM has been allocated a sufficient number of host physical pages

and that page swapping does not happen at run-time. This is a reasonable assumption

in CPS virtualization scenarios because, unlike in server virtualization, memory

underprovisioning is considered to be harmful to timing predictability [114]. Also, this assumption can be easily achieved by VM admission control at the hypervisor.

Chapter 4

Coordinated Approach for Predictable Cache Management

Cache interference in multi-core systems can be categorized into two types: inter-core and intra-core. Inter-core cache interference happens when tasks running on different cores access the last-level shared cache (LLC) simultaneously. Since the execution of a task may be potentially affected by memory accesses of all tasks running on other cores, the accurate analysis of inter-core cache interference is extremely difficult [34]. Intra-core cache interference, in contrast, occurs within a core. When a task preempts another task, the preempting task may evict the cache contents of the preempted task. Moreover, while a task is inactive, other tasks can corrupt its cache.

In this chapter, we introduce a novel OS-level cache management scheme to address inter-core and intra-core cache interference in a multi-core platform. Our scheme provides predictable cache performance and addresses the problems of page coloring discussed in

Section 2.1.2, through tight coordination of cache reservation, cache sharing, and cache-aware task allocation. Cache reservation ensures the exclusive use of a certain amount of cache for individual cores to prevent inter-core cache interference. Within each core, cache sharing allows tasks to share the reserved cache, while providing a safe upper bound

on intra-core cache interference. Cache sharing also significantly mitigates the memory co-partitioning problem and the limitations on the number of cache partitions. By using cache reservation and cache sharing, cache-aware task allocation determines efficient task and cache allocation to schedule a given taskset. Our scheme does not require special hardware cache partitioning support or modifications to application software. Hence, it is readily applicable to commodity processors such as the Intel Core i7. Our scheme can be used not only for developing a new system but also for migrating existing applications from single-core to multi-core platforms.

The detailed contributions of our scheme are as follows. First, we introduce the concept of sharing cache partitions under page coloring to counter the memory co-partitioning problem and the limited number of cache partitions. We show how pages are allocated when cache partitions are shared, and provide a condition that checks the feasibility of sharing while guaranteeing the allocation of the required memory to tasks. Second, we provide a response time test for checking task schedulability when cache partitions are shared among tasks. Our approach is independent of the specific cache analysis used and allows estimating the worst-case execution time (WCET) of a task in isolation from other tasks.

Third, our cache-aware task allocation algorithm reduces the number of cache partitions required to schedule a given taskset, while meeting both the task memory requirements and the task timing constraints. We also show that the remaining cache partitions after the allocation can be used to save the total CPU utilization. Fourth, we have implemented and evaluated our scheme by extending the Linux/RK platform [138, 139] running on the

Intel Core i7 quad-core processor. The experimental results on a real machine demonstrate the effectiveness of our scheme.

For simplicity, our analysis provided in this chapter considers cache interference only.

Interference delays from other shared resources will be analyzed in later chapters. However, it is worth noting that our analysis provided in this chapter can be easily combined with those in other chapters, since any delay from shared resources is additive to task response time in a fully timing-compositional architecture [126] we assume. An example of combining

cache and memory interference analyses will be provided in Section 5.1.5.

Figure 4.1: Overview of the proposed OS-level cache management

The background and related prior work on cache interference have been discussed in

Section 2.1. The system model including assumptions and notation for a shared cache and tasks can be found in Chapter 3.

The rest of this chapter is organized as follows. Section 4.1 presents our coordinated cache management scheme. A detailed evaluation of our scheme is provided in Section 4.2.

Section 4.3 summarizes this chapter.

4.1 Coordinated Cache Management

In this section, we describe our proposed cache management scheme. Figure 4.1 shows the overview of our scheme that consists of three components: cache reservation, cache sharing, and cache-aware task allocation. Cache reservation ensures the exclusive use of a portion of the shared cache for each core. Cache sharing enables sharing of cache partitions among tasks within each core. Cache-aware task allocation uses these two components to find efficient cache and task allocation while maintaining feasibility.

4.1.1 Cache Reservation

Due to the inherent difficulties of precisely analyzing inter-core cache interference on a multi-core processor, we reserve a portion of cache partitions for each core to prevent inter-core cache interference. The reserved cache partitions are exclusively used by their owner core, thereby preventing cache contention from other cores. Per-core cache reservation differentiates our scheme from other cache partitioning techniques that allocate exclusive cache partitions to each task. Within each core, cache partitions reserved for the core can be shared by tasks running on the core. This approach allows the core to execute more tasks than the number of cache partitions allocated to that core. The execution time of a task can potentially be further reduced by providing more cache partitions to the task.

Moreover, since the sharing of a cache partition means the sharing of an associated memory partition, it can significantly reduce the waste of cache and memory space caused by the memory co-partitioning problem due to page coloring.

Cache partitions are reserved for a core by allocating associated memory partitions to the core. Each core manages the state of pages in its memory partitions. When a new task is assigned to a core, the task’s memory requests are handled by allocating free pages from the core’s memory partitions. The appropriate number of cache partitions for each core depends on the tasks running on the core. This cache allocation will be determined by our cache-aware task allocation, discussed in Section 4.1.4.

4.1.2 Cache Sharing: Bounding Intra-core Penalties

Suppose that a certain number of cache partitions is allocated to a core by cache reservation.

Our scheme allows tasks running on the core to share the given partitions, but sharing causes intra-core cache interference. Intra-core cache interference can be further subdivided into two types:

1. Cache warm-up delay: occurs at the beginning of each period of a task and arises

due to the execution of other tasks while the task is inactive.

46 2. Cache-related preemption delay: occurs when a task is preempted by a higher-priority

task and is imposed on the preempted task.

Previous work on bounding cache interference on single-core platforms [37, 38, 39] assumes that the cache warm-up delay can be taken into account in the WCET of a task by static cache analyses. However, such static cache analysis tools may not be readily available for modern multi-core processors. We therefore consider the cache warm-up delay as an extrinsic factor to a task’s WCET. This approach enables measurement-based WCET analysis to estimate the task WCET in isolation from other tasks.1 For instance, once a task is launched, the task’s cache is initially warmed up during the startup phase or the very first execution of the task. If the task runs alone in the system or uses its cache all by itself, subsequent task instances do not experience any cache warm-up delay at run-time

[18]. By considering the cache warm-up delay as an extrinsic factor, the WCET obtained in such an isolated environment can be safely used even when the task’s cache is shared.

We formally define cache warm-up delay and cache-related preemption delay. ωj,i is

τj’s cache warm-up delay, which is caused by the tasks that (i) have priorities higher than or equal to τi and (ii) share cache partitions with τj. γj,i is the cache-related preemption delay caused by τj and imposed on the tasks that (i) have priorities lower than τj and higher than or equal to τi and (ii) share cache partitions with τj. Hence, ωj,i and γj,i are represented as follows:

\omega_{j,i} = \Big| \bigcup_{\tau_k \in P(\tau_i) \,\wedge\, \tau_k \neq \tau_j \,\wedge\, \pi_k \geq \pi_i} (S_j \cap S_k) \Big| \cdot \Delta

\gamma_{j,i} = \Big| \bigcup_{\tau_k \in P(\tau_i) \,\wedge\, \pi_k < \pi_j \,\wedge\, \pi_k \geq \pi_i} (S_j \cap S_k) \Big| \cdot \Delta

where Sj is the set of cache partitions assigned to τj, ∆ is the maximum time to refill one cache partition, P(τi) is the core of τi, and πk is the priority of τk. Note that, in case of a

1Appropriate “error margins” that are proportional to system criticality can be applied to these measurements, as is done in practice.

47 write-back cache, ∆ should take into account the effect of a dirty cache line that requires two memory accesses to fetch a new cache line [140].

The utilization of a taskset Γj, which is allocated to a core j, with intra-core cache interference penalties, ω and γ, can be calculated by extending Liu and Layland’s schedulability condition [127] as follows:

util(\Gamma_j) = \sum_{\tau_i \in \Gamma_j} \left( \frac{C_i}{T_i} + \frac{\omega_{i,n}}{T_i} + \frac{\gamma_{i,n}}{T_i} \right) \qquad (4.1)

where n is the index of the lowest-priority task in Γj. It is based on Basumallick and

Nilsen’s technique [141], but we explicitly consider cache warm-up delay ω.

The iterative response time test [142] can be extended as follows to incorporate the two types of intra-core cache interference:

W_i^{k+1} = C_i + \omega_{i,n} + \sum_{\tau_h \in P(\tau_i) \wedge \pi_h > \pi_i} \left\lceil \frac{W_i^k}{T_h} \right\rceil C_h + \sum_{\tau_h \in P(\tau_i) \wedge \pi_h > \pi_i} \left( \omega_{h,n} + \left( \left\lceil \frac{W_i^k}{T_h} \right\rceil - 1 \right) \omega_{h,i} \right) + \sum_{\tau_h \in P(\tau_i) \wedge \pi_h > \pi_i} \left\lceil \frac{W_i^k}{T_h} \right\rceil \gamma_{h,i} \qquad (4.2)

where Wi^k is the worst-case response time of τi at the k-th iteration, and n is the index of the lowest-priority task on τi’s core. The test terminates when Wi^{k+1} = Wi^k. Task τi is schedulable if its response time does not exceed its deadline: Wi^k ≤ Di. We represent the amount of ω and γ delays caused by the execution of a higher-priority task τh within the worst-case response time Wi^k in the second and the third summing terms of (4.2). Note that the first execution of a higher-priority task τh within Wi^k causes a cache warm-up delay of ωh,n, but the subsequent executions of τh cause only ωh,i because tasks with lower priorities than τi are not scheduled while τi is executing.

Figure 4.2 shows an example taskset {τh, τm, τl} sharing a set of cache partitions {1, 2}.

In this taskset, τh is a high-priority task, τm is a medium-priority task, and τl is a low- priority task. Assume that the cache partitions are assigned to the tasks as follows: Sh is

{1, 2}, Sm is {1}, and Sl is {2}. All tasks have the same execution time Ci = 2 and the

same periods and deadlines Ti = Di = 12. The cache partition refill time ∆ is 1 in this example. When τh starts its execution, it needs to refill its two cache partitions. τm has one cache warm-up delay and one cache-related preemption delay due to τh. τl also has one cache warm-up delay and one cache-related preemption delay.

Figure 4.2: Three tasks sharing cache partitions with intra-core cache interference penalties

It is worth noting that Eqs. (4.1) and (4.2) are independent of the specific cache analysis used. If a precise cache analysis tool is available for a target multi-core processor’s shared cache, the cache partition refilling time ∆ can be more tightly estimated.
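For illustration, the iteration of Eq. (4.2) can be sketched in C as follows, assuming the ω and γ values of each higher-priority task with respect to τi and to the lowest-priority task n have already been computed from their definitions; the data layout and names are illustrative and not part of the implementation described later in this dissertation.

#include <math.h>
#include <stdbool.h>

/* Higher-priority tasks on the same core as task i. omega_n, omega_i, and
 * gamma_i hold the omega_{h,n}, omega_{h,i}, and gamma_{h,i} terms of Eq. (4.2). */
struct hp_task { double C, T, omega_n, omega_i, gamma_i; };

/* Iterative response-time test of Eq. (4.2) for a task with WCET Ci,
 * warm-up delay omega_in (= omega_{i,n}), and deadline Di.
 * Returns true and stores the response time in *W if the task is schedulable. */
static bool response_time_eq42(double Ci, double omega_in, double Di,
                               const struct hp_task *hp, int nr_hp, double *W)
{
    double w = Ci + omega_in;
    for (;;) {
        double next = Ci + omega_in;
        for (int h = 0; h < nr_hp; h++) {
            double njobs = ceil(w / hp[h].T);        /* releases of task h within w  */
            next += njobs * hp[h].C;                 /* first sum: preemption cost   */
            next += hp[h].omega_n                    /* second sum: one omega_{h,n}, */
                  + (njobs - 1.0) * hp[h].omega_i;   /* then omega_{h,i} afterwards  */
            next += njobs * hp[h].gamma_i;           /* third sum: CRPD per release  */
        }
        if (next > Di)  return false;                /* deadline exceeded            */
        if (next == w)  { *W = w; return true; }     /* fixed point reached          */
        w = next;
    }
}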

4.1.3 Cache Sharing: How to Share Cache Partitions

We now describe how cache partitions are allocated to tasks within a core such that schedulability is preserved and memory requirements are guaranteed despite sharing the partitions. There are two conditions for a cache allocation to be feasible. The first condition is the response time test given by Eq. (4.2). The factors affecting a task’s response time are as follows: (i) cache-related task execution time Ci(k), (ii) cache partition refill time ∆,

(iii) the number of other tasks sharing the task’s cache partitions, and (iv) the periods of the tasks sharing the cache partitions. Factors (i) and (ii) are explicitly used to calculate the response time. If factor (iii) increases or factor (iv) is relatively short, the response time may be lengthened due to cache penalties caused by frequent preemptions.

The second condition is related to the task memory requirements. Before defining this condition, we show in Figure 4.3 an example of page allocations for different cache allocation cases. In each case, there are four memory partitions and one task τi. Each memory partition is depicted as a square and the shaded area represents the memory space

allocated to τi. The task τi’s memory requirement Mi is equal to the size of one memory partition. If we assign only one cache partition to τi, all pages for τi are allocated from one memory partition (Case 1 in Figure 4.3). If we assign more than one cache partition to τi, our scheme allocates pages to τi from the corresponding memory partitions in round-robin order.2 Thus, the same amount of pages from each of the corresponding memory partitions is allocated to τi at its maximum memory usage (Cases 2, 3, and 4 in Figure 4.3). The reason behind this approach is to render the page allocation deterministic, which is required for each task’s cache access behavior to be consistent. For instance, if pages are allocated randomly, a task may have different cache performance when it re-launches.

Figure 4.3: Page allocations for different cache allocation scenarios

A cache partition can be shared among tasks by sharing a memory partition. We present a necessary and sufficient condition for cache sharing to meet the task memory requirements under our page allocation approach. For each cache partition ρ, the following condition must be satisfied:

\sum_{\forall \tau_i : \, S_i \ni \rho} \frac{M_i}{|S_i|} \leq M_{total} / N_{cache} \qquad (4.3)

where Mi is the size of the memory requirement of τi, |Si| is the number of cache partitions assigned to τi, Mtotal is the entire memory size, and Ncache is the number of cache partitions.

Mi Hence. Mtotal/Ncache represents the size of a single memory partition. represents |Si|

τi’s per-memory-partition memory usage. This condition means that the sum of the per-

2 If a page is deallocated from τi, the deallocated page is used ahead of never-allocated free pages to service τi’s next page request. This enables multiple memory partitions to be allocated at the same rate without explicit enforcement such as in memory reservation [143].

Algorithm 1 MinCacheAlloc(Γj, N′cache)
Input: Γj: a taskset assigned to the core j; N′cache: the number of available cache partitions
Output: ϕmin: a cache allocation with the minimum CPU utilization (ϕmin = ∅ if no allocation is feasible); minUtil: the CPU utilization of Γj with ϕmin
 1: ϕmin ← ∅
 2: minUtil ← 1
 3: Φ ← a set of candidate allocations of N′cache to Γj
 4: for each allocation ϕi in Φ do
 5:   Apply ϕi to Γj
 6:   if Γj satisfies both Eq. (4.2) and Eq. (4.3) then
 7:     currentUtil ← CPU utilization from Eq. (4.1)
 8:     if minUtil ≥ currentUtil then
 9:       ϕmin ← ϕi
10:       minUtil ← currentUtil
11: return {ϕmin, minUtil}

Algorithm 2 FindBestFit(τi, NP, AΓ, Acache)
Input: τi: a task to be allocated; NP: the number of cores; AΓ: an array of the taskset allocated to each core; Acache: an array of the number of cache partitions assigned to each core
Output: cid: the best-fit core's index (cid = 0 if no core can schedule τi)
 1: space ← 1
 2: cid ← 0
 3: for j ← 1 to NP do
 4:   {ϕ, util} ← MinCacheAlloc(τi ∪ AΓ[j], Acache[j])
 5:   if ϕ ≠ ∅ and space ≥ 1 − util then
 6:     space ← 1 − util
 7:     cid ← j
 8: return cid

Algorithm 1 shows a procedure for finding a feasible cache allocation with the minimum

CPU utilization. It first creates a set of candidate cache allocations to be examined, which

are combinations of given cache partitions for a given taskset. Then, it checks the feasibility

of each candidate allocation by using Eqs. (4.2) and (4.3). Many methods can be used to

generate the candidate cache allocations, such as exhaustive search and heuristics. We

use an exhaustive search in the evaluation of this chapter. A heuristic approach will be

introduced in Section 6.2.2.
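For reference, a compact sketch of Algorithm 1's structure is given below (hedged: the helpers are placeholders for Eqs. (4.1)–(4.3), and how the candidate allocations are generated, exhaustively or heuristically, is left to the caller).

```python
# Sketch of Algorithm 1 (MinCacheAlloc). The three helpers below stand in for
# the tests of Eq. (4.2) and Eq. (4.3) and the utilization of Eq. (4.1).

def response_time_ok(taskset, alloc):    # placeholder for the Eq. (4.2) test
    return True

def memory_ok(taskset, alloc):           # placeholder for the Eq. (4.3) test
    return True

def cpu_util(taskset, alloc):            # placeholder for Eq. (4.1)
    return 0.0

def min_cache_alloc(taskset, candidates):
    """Return (best allocation, its utilization); (None, 1.0) if infeasible."""
    best_alloc, min_util = None, 1.0
    for alloc in candidates:             # candidate cache allocations to examine
        if response_time_ok(taskset, alloc) and memory_ok(taskset, alloc):
            util = cpu_util(taskset, alloc)
            if util <= min_util:
                best_alloc, min_util = alloc, util
    return best_alloc, min_util
```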

4.1.4 Cache-Aware Task Allocation

Cache-aware task allocation is an algorithm to allocate tasks and cache partitions to cores while exploiting the benefits of cache reservation and cache sharing. The objective of our algorithm is to reduce the number of cache partitions required to schedule a given taskset on a given number of cores, because remaining cache partitions can be used for many purposes, such as for non-real-time tasks or for saving the CPU utilization.

Under our scheme, tasks may share cache partitions when they are assigned to the same core. This means that, to take advantage of cache sharing, it is desired to pack tasks into the same core as much as possible. Hence, our algorithm is based on the best-fit decreasing bin-packing algorithm that results in load concentration. For cache allocation, our algorithm gradually assigns cache partitions to cores while allocating tasks to cores by using cache reservation and cache sharing.

We first explain Algorithm 2 that finds the best-fit core in our task allocation algorithm.

Once the task to be allocated is given, Algorithm 2 checks whether the task is schedulable on each core and estimates the total utilization of each core with the task. Then, it selects the core where the task fits best.

Our cache-aware task allocation algorithm is given in Algorithm 3. Before allocating

tasks, it sorts tasks in decreasing order of their average utilization, i.e., $(\sum_{k=1}^{N_{cache}} C_i(k)/T_i)/N_{cache}$. The number of cache partitions for each core is set to zero. Then, the algorithm initiates task allocation. If a task to be allocated is not schedulable on any core and the number of remaining cache partitions is not zero, the algorithm increases the number of each core's cache partitions by 1 and finds the best-fit core again, until the cache partition increment per core exceeds Ncache. When the algorithm finds the best-fit core, only the best-fit core maintains its increased number of cache partitions and the other cores return to their previous number of cache partitions.

The algorithm returns the number of remaining cache partitions along with the task allocation and cache assignment. Here, we employ a simple solution to save the CPU

Algorithm 3 CacheAwareTaskAlloc(Γ, NP, Ncache)
Input: Γ: a taskset to be allocated; NP: the number of cores; Ncache: the number of available cache partitions
Output: True/False: the schedulability of Γ; AΓ: an array of the taskset allocated to each core; Acache: an array of the number of cache partitions assigned to each core; N′cache: the number of remaining cache partitions
 1: Sort tasks in Γ in decreasing order of their average utilization
 2: Initialize elements of AΓ to ∅ and Acache to 0
 3: for each task τi in Γ do
 4:   cid ← FindBestFit(τi, NP, AΓ, Acache)
 5:   if cid > 0 then
 6:     /* Found the core for τi */
 7:     Insert τi to AΓ[cid]
 8:     Mark τi schedulable
 9:     continue
10:   /* Try with k more partitions */
11:   for k ← 1 to Ncache do
12:     for j ← 1 to NP do
13:       Atmp[j] ← Acache[j] + k
14:     cid ← FindBestFit(τi, NP, AΓ, Atmp)
15:     if cid > 0 then
16:       Insert τi to AΓ[cid]
17:       Mark τi schedulable
18:       Ncache ← Ncache − k  /* Assign k to the core */
19:       Acache[cid] ← Acache[cid] + k
20:       break
21: if all tasks schedulable then
22:   return {True, AΓ, Acache, Ncache}
23: else
24:   return {False, AΓ, Acache, Ncache}

utilization with the remaining cache partitions: assigning each remaining cache partition to a core which will obtain the greatest saving in utilization when an additional cache partition is given to it. We use this approach in our experiments when we measure the

CPU utilization with a specified number of cache partitions.
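The control flow of Algorithms 2 and 3 can be sketched as follows (an illustrative Python sketch only: min_cache_alloc stands in for Algorithm 1, and the task fields, including the U_avg value used for sorting, are assumed names).

```python
# Sketch of Algorithms 2 and 3. min_cache_alloc() is a stand-in for Algorithm 1
# and returns (allocation_or_None, utilization).

def min_cache_alloc(taskset, n_cache):               # placeholder for Algorithm 1
    return {}, sum(t.get("U_avg", 0.0) for t in taskset)

def find_best_fit(task, n_cores, core_tasks, core_cache):
    """Algorithm 2: index of the core where the task fits best, or -1."""
    best, space = -1, 1.0
    for j in range(n_cores):
        alloc, util = min_cache_alloc(core_tasks[j] + [task], core_cache[j])
        if alloc is not None and 1.0 - util <= space:
            best, space = j, 1.0 - util
    return best

def cache_aware_task_alloc(tasks, n_cores, n_cache):
    """Algorithm 3: returns (schedulable, per-core tasksets, per-core cache
    counts, remaining cache partitions)."""
    tasks = sorted(tasks, key=lambda t: t["U_avg"], reverse=True)
    core_tasks = [[] for _ in range(n_cores)]
    core_cache = [0] * n_cores
    for task in tasks:
        cid = find_best_fit(task, n_cores, core_tasks, core_cache)
        if cid < 0:
            # Not schedulable anywhere: try giving every core k more partitions.
            for k in range(1, n_cache + 1):
                trial = [c + k for c in core_cache]
                cid = find_best_fit(task, n_cores, core_tasks, trial)
                if cid >= 0:
                    core_cache[cid] += k      # only the best-fit core keeps +k
                    n_cache -= k
                    break
        if cid < 0:
            return False, core_tasks, core_cache, n_cache
        core_tasks[cid].append(task)
    return True, core_tasks, core_cache, n_cache
```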

4.1.5 Tasks with Shared Memory Regions

Like previous work on cache-aware response time analysis [37, 39] and software cache partitioning schemes [18, 19, 33], we have so far assumed that tasks do not share memory regions. However, recent operating systems widely use shared pages, not only for inter-process communication and shared libraries, but also for the kernel's copy-on-write technique and file caches [144]. Suppose that two tasks share a memory region and they are allocated to different cores. Then, the tasks may experience inter-core cache interference because the shared memory region causes the sharing of cache partitions among those tasks.

We suggest one simple but effective strategy for this problem. Tasks that share their

memory regions are bundled together, and each task bundle is allocated to a core as a single unit. Hence, the tasks in the same bundle will not cause inter-core cache interference either to each other or to the tasks on other cores. This strategy can be integrated into our cache-aware task allocation and be performed before allocating tasks with no shared memory regions.

If a task bundle cannot be allocated to a single core and the shared memory regions are read/writable data regions, we can assign exclusive cache partitions to each of the shared data regions. Since the shared data regions are typically protected by mutually-exclusive locks (mutexes) to avoid race conditions, only one task in the bundle accesses each data region, which prevents inter-core cache interference among tasks in the bundle. Of course, other tasks will not experience any cache interference from the use of the shared data regions because those regions are assigned exclusive cache partitions. If the shared memory regions are read-only regions and not protected by mutexes, such as shared libraries and file caches, the system may duplicate the pages of the memory regions for each core and assign exclusive cache partitions to the duplicated regions to avoid inter-core cache interference.
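As an illustration of the bundling step (a sketch under the assumption that each task lists the identifiers of the memory regions it shares; the field names are not the dissertation's), tasks that transitively share a region can be grouped with a simple union-find and then handed to the allocator as single units:

```python
# Illustrative sketch: group tasks that (transitively) share memory regions
# into bundles. Each task dict carries an "id" and a set "regions" of shared
# memory-region identifiers; both field names are assumptions of this example.
from collections import defaultdict

def build_bundles(tasks):
    parent = {t["id"]: t["id"] for t in tasks}

    def find(x):                              # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    by_region = defaultdict(list)
    for t in tasks:
        for region in t["regions"]:
            by_region[region].append(t["id"])
    for ids in by_region.values():            # union all tasks sharing a region
        for other in ids[1:]:
            parent[find(other)] = find(ids[0])

    bundles = defaultdict(list)
    for t in tasks:
        bundles[find(t["id"])].append(t)
    return list(bundles.values())             # each bundle is allocated as one unit
```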

4.2 Evaluation

In this section, we evaluate our proposed cache management scheme. We first describe the implementation of our scheme and then show the experimental results of cache reservation, cache sharing, and cache-aware task allocation.

4.2.1 Implementation

We have implemented our scheme in Linux/RK, based on the Linux 2.6.38.8 kernel. To easily implement page coloring, we have used the memory reservation mechanism [143, 144] of Linux/RK. Memory reservation maintains a global page pool to manage unallocated physical pages. In this page pool, we categorize pages into memory partitions with their color indices. When a real-time taskset is given, our scheme assigns a core index and color

indices to each task. Then, a memory reservation is created for each task from the page pool, using the task's memory demand and assigned color indices, and each task only uses pages within its memory reservation during execution.

Figure 4.4: Page coloring on the Intel i7-2600 L3 cache (four L3 slices of 2048 sets each, addresses hashed across the slices, divided into 32 cache partitions)

The target system is equipped with the Intel Core i7-2600 3.4GHz quad-core processor.

The system is configured for 4KB page frames and a 1GB memory reservation page pool.

The processor has a unified 8MB L3 shared cache that consists of four cache slices. Each cache slice has 2MB and is 16-way set associative with a line size of 64B, thereby having

2048 sets. For the entire L3 cache to be shared among all cores, the processor distributes all physical addresses across the four cache slices by using an on-chip hash function [15, 27].3

Figure 4.4 shows the implementation of page coloring on this cache configuration. Regardless of the hash function, the cache set index for a given physical address is independent of the cache slice index. Hence, with page coloring, we can use 32 cache partitions and each cache partition spans the four cache slices. Page coloring divides the L3 cache into 32 cache partitions of 256KB and the page pool into 32 memory partitions of 32MB. The cache partition refill time ∆ in the target system is 45.3 µsec,⁴ which was obtained empirically from a cache calibration tool, as given in [145].

³ Intel refers to this technique, which is unrelated to cache partitioning, as a Smart Cache. The details of the hash function are proprietary in nature.
⁴ The cache partition refill time is the time to fetch from memory to the L3 cache. Thus, it is hardly affected by the fact that the Intel i7's core-to-L3 access time varies from 26 to 31 cycles. Our WCET estimation covers such L3 access variations.
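As a concrete illustration of this configuration (a small sketch using the parameters stated above, not the Linux/RK code), a page's color follows directly from its physical frame number:

```python
# Sketch of the page-color computation for the configuration described above:
# 4 KB pages, 64 B lines, 2048 sets per slice, 32 colors. The slice hash does
# not affect the color, since the set index is independent of the slice index.
PAGE_SHIFT = 12        # 4 KB page frames
NUM_COLORS = 32        # overlap of page-frame-number bits and set-index bits

def page_color(phys_addr: int) -> int:
    return (phys_addr >> PAGE_SHIFT) % NUM_COLORS

# Frames whose numbers differ by a multiple of 32 share a color, i.e., they map
# to the same 256 KB cache partition spanning the four L3 slices.
print(page_color(0x12345000), page_color(0x12365000))   # both map to color 5
```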

4.2.2 Taskset

Table 4.1 shows four periodic tasks that we have created for the evaluation. The task functions are selected from the PARSEC benchmark suite [146] to create a taskset consisting of cache-sensitive and cache-insensitive tasks. We utilize them as representative components of complex real-time embedded applications such as sensor fusion and computer vision in an autonomous vehicle [147]. Each task has a relative deadline Di equal to its period

Ti and a memory requirement Mi. Due to the memory co-partitioning of page coloring,

Mi determines the minimum required number of cache partitions k for a task τi, given by $\lceil M_i/(M_{total}/N_{cache}) \rceil \le k$. Task priorities are assigned by the deadline-monotonic scheduling policy.

For the WCET analysis, we used the measurement-based approach. To reduce inaccuracies in measurement, we disabled the processor's simultaneous multithreading and dynamic clock frequency scaling. All unrelated system services such as GUI and networking were also disabled during the experiments. We used the processor's hardware performance counters to measure the task execution time and the L3 misses while each task was running alone in the system. Then, we chose the maximum observed execution time and the maximum observed L3 misses as the WCET estimate and the worst-case L3 misses, respectively. Figure 4.5 shows each task's per-period execution time as the number of assigned cache partitions increases. In each sub-figure, the WCET and the average-case execution time (ACET) are plotted as a solid line and a dotted line, respectively. The worst-case L3 misses per period are presented as a bar graph with the scale on the right y-axis.

The taskset used in our evaluation is a mixture of cache-sensitive and cache-insensitive tasks. We can confirm this from Figure 4.5. τ1 and τ3 are cache-sensitive tasks. τ1's

WCET C1(k) drastically decreases as the number of cache partitions k increases, until k exceeds 12. The number of τ1’s L3 misses also decreases as k increases. τ3’s WCET

C3(k) continuously decreases as k increases. In terms of utilization, the difference between

Table 4.1: Taskset information

Task τi | Ti = Di (msec) | Mi (MB) | Min. k | Cache Sensitive | Name and description
τ1      | 40             | 18      | 1      | Yes             | p_streamcluster: computes clustering of data points
τ2      | 120            | 66      | 3      | No              | p_ferret: image-based similarity search engine
τ3      | 180            | 52      | 2      | Yes             | p_canneal: graph restructuring for low routing cost
τ4      | 600            | 50      | 2      | No              | p_fluidanimate: simulates fluid motion for animations

the maximum and the minimum utilization of τ1 is (C1(32) − C1(1))/T1 = 10.82%. The utilization difference of τ3 is 11.83%. On the other hand, τ2 and τ4 are cache-insensitive.

The utilization differences of τ2 and τ4 are merely 0.56% and 0.54%, respectively.

4.2.3 Cache Reservation

The purpose of this experiment is to verify how effective cache reservation is in avoiding inter-core cache interference. We ran each task on different cores simultaneously, i.e. τi on Core i, under two cases: with and without cache reservation. Memory reservation was used in both cases. Without cache reservation, all tasks competitively used the entire cache space. With cache reservation, the number of cache partitions for each core was as follows:

12 partitions for Core 1, 3 for Core 2, 14 for Core 3, and 3 for Core 4. These numbers were chosen to reduce the total CPU utilization, based on the observations from Figure 4.5. The cache partitions assigned to each core were solely used by the task on that core.

Figure 4.6 presents the observed execution time and the L3 misses of the four tasks with and without cache reservation, when they ran simultaneously on different cores. In each sub-figure, the upper graph shows the execution time of each task instance and the lower graph shows the number of L3 misses for each instance. The x-axis on each graph indicates the instance numbers of a task. Tasks are released at the same time using hrtimers in

Linux.

The execution times of all tasks without cache reservation vary significantly compared to the execution times with cache reservation. Without cache reservation, tasks compete for the L3 cache and higher worst-case L3 misses are encountered. The correlation between execution time and L3 misses is clearly shown in Figure 4.6a and Figure 4.6c. The average

Figure 4.5: Execution time and L3 misses of each task as the number of cache partitions increases when running alone in the system: (a) τ1: p_streamcluster, (b) τ2: p_ferret, (c) τ3: p_canneal, (d) τ4: p_fluidanimate

execution time of tasks without cache reservation may not be much higher. However, the absence of cache reservation contributes to poor timing predictability. The longest execution time of τ1 without cache reservation is close to its WCET with 8 dedicated cache partitions (C1(8)), that of τ2 is close to C2(3), that of τ3 is close to C3(10), and that of

τ4 is close to C4(4). Note that, without cache reservation, the longest execution times cannot be obtained before profiling the whole taskset. Hence, the profiling may need to be re-conducted whenever a single parameter of the taskset changes. In addition, without cache reservation, the cache is not effectively utilized. The total number of cache partitions for the above longest execution times is 8 + 3 + 10 + 4 = 25. This means that 7 partitions are wasted in terms of WCET.

Figure 4.6: Observed execution time and L3 misses of tasks when each task runs simultaneously on a different core: (a) τ1: p_streamcluster, (b) τ2: p_ferret, (c) τ3: p_canneal, (d) τ4: p_fluidanimate

With cache reservation, the execution times of τ1, τ2, and τ4 do not exceed their WCETs that are estimated in isolation from other tasks. τ3 also does not exceed its WCET except at the beginning of each hyper-period of 1800 msec. τ3 exceeds its WCET by less than 2% once in a hyper-period. However, this is not caused by inter-core cache interference. As shown in

Figure 4.6c, the L3 misses of τ3 instances are always lower than its worst-case L3 misses even at the beginning of each hyper-period, meaning that cache reservation successfully avoids inter-core cache interference. Since all task instances start their execution concurrently at the beginning of each hyper-period, we strongly suspect that the observed execution time slightly greater than the WCET is caused by other shared memory resources in the system, such as a memory controller and memory buses. We will address this issue in Chapter 5.

4.2.4 Cache Sharing

We first evaluate the effectiveness of our proposed equations in predicting the worst-case response time (WCRT) of a task with cache sharing. In this experiment, all tasks run on a

single core with 8 cache partitions. Table 4.2 shows the cache partition allocations to the tasks by the cache-sharing technique and the predicted WCRT of the tasks. The WCRT is calculated with two methods: “NoCInt” means intra-core cache interference is not taken into account, and “CInt” means the WCRT is calculated by Eq. (4.2).

Table 4.2: Cache allocation to tasks with cache sharing

τi | Allocated cache partitions Si | WCET (msec)   | WCRT NoCInt (msec) | WCRT CInt (msec)
τ1 | {1, 2, 3, 4, 5, 6, 7, 8}      | C1(8) = 11.94 | 11.94              | 12.30
τ2 | {1, 2, 3}                     | C2(3) = 13.15 | 25.09              | 25.72
τ3 | {1, 2, 3, 4, 5, 6, 7, 8}      | C3(8) = 49.58 | 98.55              | 101.36
τ4 | {4, 5, 6, 7, 8}               | C4(5) = 44.30 | 179.88             | 273.78

Figure 4.7: Response time of tasks with cache sharing on a single core: (a) τ1: p_streamcluster, (b) τ2: p_ferret, (c) τ3: p_canneal, (d) τ4: p_fluidanimate

Figure 4.7 illustrates the observed response time of each task. The WCRT values with

NoCInt and CInt are depicted as straight lines in each graph. In all tasks, the observed response time exceeds the WCRT with NoCInt, but does not exceed the WCRT with CInt.

For τ1, the observed response time greater than the WCRT with NoCInt is solely caused by the cache warm-up delay, because τ1 is the highest-priority task and does not experience any cache-related preemption delay. Figure 4.8 supports this observation. It shows the observed L3 misses of τ1's instances. Since τ1 shares its cache partitions with other tasks, the observed L3 misses are higher than the worst-case L3 miss value that is estimated when

τ1 does not share cache partitions. The correlation between τ1’s observed response time and

Figure 4.8: L3 misses of task τ1 with cache sharing

observed L3 misses is also clearly shown. Hence, we can identify that τ1’s observed response time greater than the WCRT with NoCInt is caused by increased L3 cache misses, rather than jitter. This result shows the effect of our response time test that explicitly considers the cache warm-up delay. Task τ4 shows a significant 93.9 msec difference between NoCInt and CInt. Since the WCRT with NoCInt is close to the period of τ3, timing penalties from intra-core cache interference make the response time exceed the period of τ3. Then, the next instance of τ3 preempts τ4, thereby increasing the response time of τ4 significantly.

Secondly, we identify the utilization benefit of the cache-sharing technique by comparing the total CPU utilization with and without cache sharing. Without cache sharing, cache allocations are as follows: τ1 is assigned 1 partition, τ2 is assigned 3 partitions, τ3 is assigned

2 partitions, and τ4 is assigned 2 partitions. Note that this is the only possible cache allocation without cache sharing because the number of available cache partitions is eight, which is equal to the sum of each task’s minimum cache requirement. With cache sharing, the same cache allocations as in the Table 4.2 are used. Figure 4.9 depicts the total CPU utilization with and without cache sharing. The left three bars are the predicted and the observed values without cache sharing and the right four bars are the values with cache sharing. The utilization values with cache sharing are almost 10% lower than the values without cache sharing. This result shows that cache sharing is very beneficial for saving the

CPU utilization. Furthermore, with cache sharing, both the worst-case and the average-case observed utilization are higher than the predicted utilization with NoCInt but lower than the predicted value with CInt. This implies that Eq. (4.1) provides a safe upper bound on the total utilization with intra-core cache interference.

Figure 4.9: Total CPU utilization with and without cache sharing (without cache sharing: predicted 88.88%, observed worst 87.97%, observed average 87.92%; with cache sharing: predicted with NoCInt 75.73%, predicted with CInt 78.13%, observed worst 76.40%, observed average 76.37%)

4.2.5 Cache-Aware Task Allocation

We now evaluate the effectiveness of our cache-aware task allocation (CATA) algorithm that exploits cache reservation and cache sharing. Note that it is not appropriate to compare

CATA against previous approaches such as in [25, 148], since (i) they do not consider the task memory requirements, which are essential to preventing page swapping when page coloring is used, and (ii) they require non-preemptive EDF scheduling due to the lack of intra-core cache interference analysis. Hence, for comparison, we consider the best-fit decreasing

(BFD) and the worst-fit decreasing (WFD) bin-packing algorithms. BFD and WFD are each combined with a conventional software cache partitioning approach. Before allocating tasks,

BFD and WFD evenly distribute given cache partitions to all NP cores and sort tasks in decreasing order of task utilization with the number of per-core cache partitions. During task allocation, they do not use cache sharing.

The system parameters used in this experiment are as follows: the number of tasks n = {8, 12, 16}, the number of cores NP = 4, the number of total cache partitions

Ncache = 32, and the size of total system memory Mtotal = {1024, 2048} MB. To generate more than the four tasks in Table 4.1, we have duplicated the taskset such that the number of tasks is a multiple of four.

Figure 4.10: Minimum amount of cache required to schedule given tasksets (smaller is better)

We first compare in Figure 4.10 the minimum number of cache partitions required to schedule a given taskset under BFD, WFD, and CATA. The y-axis represents the cache partition usage as a percentage of Ncache, for ease of comparison. CATA schedules given tasksets by using 16% to 25% and 12% to 19% fewer cache partitions than BFD and WFD, respectively. All algorithms consume more cache partitions when Mtotal = 1024,

compared to when Mtotal = 2048, due to the task memory requirements. BFD fails to schedule a taskset with 16 tasks when Mtotal = 1024 but schedules the taskset when

Mtotal = 2048. We next compare the memory space efficiency of the algorithms at their minimum cache partition usage. The memory space efficiency in our context is the ratio of the total memory usage of tasks to the size of the allocated memory partitions, computed as $(\sum_i M_i)/\{(M_{total}/N_{cache}) \times (\text{number of allocated memory partitions})\}$. Figure 4.11 shows the memory space efficiency. CATA is 25% to 39% and 14% to 35% more memory space efficient than BFD and WFD, respectively. Since BFD and WFD suffer from the memory co-partitioning problem, they exhibit poor memory space efficiency. On the other hand,

CATA shows 97% of memory space efficiency when n = 8 and Mtotal = 1024, meaning that only 3% of slack space exists in the allocated memory partitions. Lastly, we compare in

Figure 4.12 the total accumulated CPU utilization required to schedule given tasksets under

BFD, WFD, and CATA when all cache partitions are used. CATA requires 29% to 44% and 14% to 49% less CPU utilization than BFD and WFD, respectively. The utilization benefit of CATA becomes larger as the number of tasks increases. This is because CATA utilizes cache sharing, whereas BFD and WFD suffer from the limited number of available cache partitions. Based on these results, we therefore conclude that our scheme efficiently allocates cache partitions to tasks and significantly mitigates both the memory co-partitioning problem and the problem of a limited number of cache partitions.

Figure 4.11: Memory space efficiency at minimum cache usage

Figure 4.12: Total CPU utilization required to schedule given tasksets

4.3 Summary

In this chapter, we introduced a coordinated OS-level cache management scheme for a multi-core platform. While providing predictable performance on architectures with shared caches across cores, our scheme addresses the two major challenges of page coloring: the memory co-partitioning problem and the availability of only a limited number of cache partitions. Experimental results indicate that our scheme significantly mitigates the negative impact of the memory co-partitioning problem, yielding as much as 39% higher memory efficiency than the conventional approaches. Also, experimental results show that our scheme is effective in overcoming the limited number of cache partitions, by consuming up to 25% fewer cache partitions while satisfying timing constraints compared to the conventional approaches. Our scheme can be used not only for developing new systems but also for migrating existing applications from single-core to multi-core platforms.

Chapter 5

Bounding and Reducing Memory Interference

Prior work on addressing memory interference [65, 66, 67, 68, 149] has modeled main memory as a black-box system, where each memory request takes a constant service time and memory requests from different cores are serviced in either Round-Robin (RR) or

First-Come First-Serve (FCFS) order. However, such an over-simplified memory model used by prior work may produce pessimistic or optimistic estimates on the memory interference delay in a COTS multi-core system.

In this chapter, we propose a white-box approach for bounding and reducing memory interference. By explicitly considering the timing characteristics of major resources in the

DRAM system, including the re-ordering effect of FR-FCFS and the rank/bank/bus timing constraints, we obtain an upper bound on the worst-case memory interference delay for a task when it executes in parallel with other tasks. Our technique combines two approaches: a request-driven approach focused on the task’s own memory requests and a job-driven approach focused on interfering memory requests during the task’s execution. Combining them, our analysis yields a tighter upper bound on the worst-case response time of a task in the presence of memory interference. To reduce the negative impact of memory interference,

we use software DRAM bank partitioning [2, 53, 55, 56, 57, 58]. We consider both dedicated and shared bank partitions due to the limited availability of DRAM banks, and our analysis results in an upper bound on the interference delay in both cases. In the evaluation section, we show the effectiveness of our analysis on a well-known COTS multi-core platform.

In addition, we develop a memory interference-aware task allocation algorithm that accommodates memory interference delay during the allocation phase. The key idea of our algorithm is to co-locate memory-intensive tasks on the same core with dedicated DRAM banks. This approach reduces the amount of memory interference among tasks, thereby improving task schedulability. Experimental results indicate that our algorithm yields a significant improvement in task schedulability over existing approaches such as in [25], with as much as 96% more tasksets being schedulable.

To focus on the memory interference problem, we make the following assumptions in this chapter. First, tasks fit in the memory capacity. In a system with bank partitioning, this assumption can be satisfied by configuring each bank partition to have multiple DRAM banks. Second, each task is assigned private cache partitions, so there is no cache interference among tasks. This assumption will be relaxed in Section 5.1.5 by combining our cache and memory interference analyses. Third, each task is assumed to have sufficient cache space of its own to store one row of each DRAM bank assigned to it. This assumption is used to bound the re-ordering effect of the memory controller, which will be described in

Section 5.1.1. In fact, this is a reasonable assumption in a modern multi-core system which typically has a large LLC.1 For brevity, we use the following notation in this chapter:

• bank(p): the set of bank partitions assigned to a core p

• shared(p, q): the intersection of bank(p) and bank(q)

• Γp: the set of tasks allocated to a core p

¹ For instance, Figure 2.4 shows a physical address mapping to the LLC and the DRAM in the Intel Core-i7 system. For the LLC mapping, the last 6 bits of a physical address are used as a cache line offset, and the next 11 bits are used as a cache set index. For the DRAM mapping, the last 13 bits are used as a column index and the next 4 bits are used as a bank index. In order for a task to store one row in its cache, 2^(13−6) = 128 consecutive cache sets need to be allocated to the task. With page coloring, this is equal to 2 out of 32 cache partitions in the example system.

The background and related prior work on memory interference have been presented in

Section 2.2. The system model including assumptions and notation for DRAM-based main memory and tasks can be found in Chapter 3.

The rest of this chapter is organized as follows. Section 5.1 presents our memory inter- ference analysis. Section 5.2 provides our memory interference-aware allocation algorithm.

Section 5.3 provides a detailed evaluation. Section 5.4 summarizes this chapter.

5.1 Bounding Memory Interference Delay

The memory interference delay that a task can suffer from other tasks can be bounded by using either of two factors: (i) the number of memory requests generated by the task itself, and (ii) the number of interfering requests generated by other tasks that run in parallel. For instance, if a task τi does not generate any memory requests during its execution, this task will not suffer from any delays regardless of the number of interfering memory requests from other tasks. In this case, the use of factor (i) will give a tighter bound. Conversely, assume that other tasks simultaneously running on different cores do not generate any memory requests. Task τi will not experience any delays because there is no extra contention on the memory system from τi’s perspective, so the use of factor (ii) will give a tighter bound in this case.

In this section, we present our approach for bounding memory interference based on the aforementioned observation. We first analyze the memory interference delay using two different approaches: request-driven (Sec. 5.1.1) and job-driven (Sec. 5.1.2). Then by combining them, we present a response-time-based schedulability analysis that tightly bounds the worst-case memory interference delay of a task (Sec. 5.1.3). We also discuss the effect of write batching in memory controllers (Sec. 5.1.4), and present how memory interference and cache interference analyses can be combined (Sec. 5.1.5).

DRAM Commands: Four DRAM commands are considered in our analysis: precharge

(PRE), activate (ACT), read (RD) and write (WR).

Table 5.1: DRAM timing parameters

Parameter                    | Symbol | DDR3-1333 | Units
DRAM clock cycle time        | tCK    | 1.5       | nsec
Precharge latency            | tRP    | 9         | cycles
Activate latency             | tRCD   | 9         | cycles
CAS read latency             | CL     | 9         | cycles
CAS write latency            | WL     | 7         | cycles
Burst length                 | BL     | 8         | columns
Write to read delay          | tWTR   | 5         | cycles
Write recovery time          | tWR    | 10        | cycles
Activate to activate delay   | tRRD   | 4         | cycles
Four activate window         | tFAW   | 20        | cycles
Activate to precharge delay  | tRAS   | 24        | cycles
Row cycle time               | tRC    | 33        | cycles
Read to precharge delay      | tRTP   | 5         | cycles
Refresh to activate delay    | tRFC   | 160       | nsec
Average refresh interval     | tREFI  | 7.8       | µsec
Rank-to-rank switch delay    | tRTRS  | 2         | cycles

Depending on the current state of the bank, the memory controller generates a sequence of DRAM commands for a single read/write memory request:

• Row-hit request: RD/WR

• Row-conflict request: PRE, ACT and RD/WR

Each DRAM command is assumed to have the same priority and arrival time as the corresponding memory request. Note that the auto-precharge commands (RDAP/WRAP) are not generated under the open-row policy. We do not consider the refresh (REF) command because the effect of REF in memory interference delay is rather negligible compared to that of other commands.2 The DRAM timing parameters used in this work are summarized in Table 5.1 and are taken from Micron’s datasheet [151].

5.1.1 Request-Driven Bounding Approach

The request-driven approach focuses on the number of memory requests generated by a task τi (Hi) and the amount of additional delay imposed on each request of τi. In other

² The effect of REF (E_R) in memory interference delay can be roughly estimated as $E_R^{k+1} = \lceil \{(\text{total delay from analysis}) + E_R^k\}/t_{REFI} \rceil \cdot t_{RFC}$, where $E_R^0 = 0$. For the DDR3-1333 with 2Gb density below 85°C, $t_{RFC}/t_{REFI}$ is 160ns/7.8µs ≈ 0.02, so the effect of REF results in only about a 2% increase in the total memory interference delay. A more detailed analysis on REF can be found in [150].

Figure 5.1: Inter-bank row-activate timing constraints

Figure 5.2: Data bus turn-around delay: (a) WR followed by RD, different banks in the same rank; (b) RD followed by WR, different banks in the same rank

words, it bounds the total interference delay by Hi×(per-request interference delay), where the per-request delay is bounded by using DRAM and processor parameters, not by using task parameters of other tasks.

The interference delay for a memory request generated by a processor core p can be categorized into two types: inter-bank and intra-bank. If there is one core q that does not share any bank partitions with p, the core q only causes inter-bank memory interference delay to p. If there is another core q′ that shares bank partitions with p, the core q′ causes intra-bank memory interference. We present analyses on the two types of interference delay and calculate the total interference delay based on them.

Figure 5.3: Rank-to-rank switch delay: (a) WR followed by RD, different ranks; (b) RD followed by WR, different ranks; (c) RD followed by RD, different ranks

Inter-bank interference delay: Suppose that a core p is assigned dedicated bank partitions. When a memory request is generated by one task on p, the request is enqueued into the request queue of the corresponding DRAM bank. Then, a sequence of DRAM commands is generated based on the type of the request, i.e., one command (RD/WR) for a row-hit request, and three commands (PRE, ACT, RD/WR) for a row-conflict request. At the bank scheduler, there is no interference delay from other cores because p does not share its banks. In contrast, once a command of the request is sent to the channel scheduler, it can be delayed by the commands from other banks, because the FR-FCFS policy at the

channel scheduler issues ready commands (with respect to the channel timing constraints) in the order of arrival time. The amount of delay imposed on each DRAM command is determined by the following factors:

• Address/command bus scheduling time: Each DRAM command takes one DRAM

clock cycle on the address/command buses. For a PRE command, as it is not affected

by other timing constraints, the delay caused by each of the commands that have

arrived earlier is:

\[
L_{inter}^{PRE} = t_{CK}
\]

• Inter-bank row-activate timing constraints: The JEDEC standard [48] specifies that

there be a minimum separation time of tRRD between two ACTs to different banks,

and no more than four ACTs can be issued during tFAW (Figure 5.1). Thus, in case

of an ACT command, the maximum delay from each of the commands that have

arrived earlier is:

\[
L_{inter}^{ACT} = \max(t_{RRD},\ t_{FAW} - 3 \cdot t_{RRD}) \cdot t_{CK}
\]

• Data bus turn-around and rank-to-rank switch delay: When a RD/WR command is

issued, data is transfered in burst mode on both the rising and falling edges of the

DRAM clock signal, resulting in BL/2 of delay due to data bus contention. In addition,

if a WR/RD command is followed by an RD/WR command to different banks in the

same rank, the data flow direction of the data bus needs to be reversed, resulting in data

bus turn-around delay. Figure 5.2 depicts the data bus contention and bus turn-around

delay in two cases. When a WR command is followed by an RD command to different

banks in the same rank, RD is delayed by WL + BL/2 + tWTR cycles (Figure 5.2(a)).

When RD is followed by WR, WR is delayed by CL + BL/2 + 2 − WL cycles

(Figure 5.2(b)). If two consecutive WR/RD commands are issued to different ranks,

there is rank-to-rank switch delay, tRTRS, between the resulting two data transfers.

Figure 5.3 shows the rank-to-rank switch delay in three cases. When WR is followed by

RD to different ranks, RD is delayed by WL+BL/2+tRTRS −CL cycles (Figure 5.3(a)).

When RD is followed by WR, WR is delayed by CL + BL/2 + tRTRS − WL cycles

(Figure 5.3(b)). Lastly, when the two commands are of the same type, the latter is

delayed by BL/2 + tRTRS cycles (Figure 5.3(c)). Therefore, for a WR/RD command,

the maximum delay from each of the commands that have arrived earlier is given by:

\[
L_{inter}^{RW} = \max(WL + BL/2 + t_{WTR},\ CL + BL/2 + 2 - WL,\ WL + BL/2 + t_{RTRS} - CL,\ CL + BL/2 + t_{RTRS} - WL,\ BL/2 + t_{RTRS}) \cdot t_{CK}
\]

Using these parameters, we derive the inter-bank interference delay imposed on each memory request of a core p. Recall that each memory request may consist of up to three

DRAM commands: PRE, ACT and RD/WR. Each command of a request can be delayed by all commands that have arrived earlier at other banks. The worst-case delay for p’s request occurs when (i) a request of p arrives after the arrival of the requests of all other cores that do not share banks with p, and (ii) each previous request causes PRE, ACT and

RD/WR commands. Therefore, the worst-case per-request inter-bank interference delay for

a core p, $RD_p^{inter}$, is given by:

\[
RD_p^{inter} = \sum_{\forall q:\ q \neq p\ \wedge\ shared(q,p) = \emptyset} \left( L_{inter}^{PRE} + L_{inter}^{ACT} + L_{inter}^{RW} \right) \qquad (5.1)
\]
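To make the magnitudes concrete, the following sketch (an assumed illustration, not part of the dissertation's tooling) evaluates the per-command terms and Eq. (5.1) with the DDR3-1333 parameters of Table 5.1:

```python
# Sketch: per-command inter-bank delays and Eq. (5.1) for DDR3-1333 (Table 5.1).
tCK = 1.5                                          # ns per DRAM clock
CL, WL, BL = 9, 7, 8
tWTR, tRRD, tFAW, tRTRS = 5, 4, 20, 2

L_PRE = tCK                                        # one command-bus slot
L_ACT = max(tRRD, tFAW - 3 * tRRD) * tCK           # row-activate spacing
L_RW  = max(WL + BL//2 + tWTR,                     # WR -> RD, same rank
            CL + BL//2 + 2 - WL,                   # RD -> WR, same rank
            WL + BL//2 + tRTRS - CL,               # WR -> RD, different ranks
            CL + BL//2 + tRTRS - WL,               # RD -> WR, different ranks
            BL//2 + tRTRS) * tCK                   # same command type, different ranks

def rd_inter(n_other_cores_private_banks: int) -> float:
    """Eq. (5.1): worst-case inter-bank delay per request of core p (in ns),
    where the argument is the number of cores not sharing any bank with p."""
    return n_other_cores_private_banks * (L_PRE + L_ACT + L_RW)

print(L_PRE, L_ACT, L_RW)        # 1.5 12.0 24.0 (ns)
print(rd_inter(3))               # 112.5 ns on a quad-core with private banks
```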

Intra-bank interference delay: Memory requests to the same bank are queued into the bank request buffer and their service order is determined by the bank scheduler. A lower-priority request should wait until all higher priority requests are completely serviced by the bank. The delay caused by each higher-priority request includes (i) the inter-bank

interference delay for the higher priority request, and (ii) the service time of the request within the DRAM bank. The inter-bank interference delay can be calculated by Eq. (5.1).

The service time within the DRAM bank depends on the type of the request:

• Row-hit service time: The row-hit request is for a requested column already in the

row-buffer. Hence, it can simply read/write its column. In case of read, an RD

command takes CL + BL/2 for data transfer and may cause 2 cycles of delay to the

next request for data bus turn-around time [48]. Note that the read-to-precharge

delay (tRTP ) does not need to be explicitly considered here because the worst-case

delay of an RD command is larger than tRTP in DDR3 SDRAM [48] (or Table 5.1

for DDR3-1333), i.e., tRTP < CL + BL/2 + 2. In case of write, a WR command takes

WL + BL/2 for data transfer and may cause max(tWTR, tWR) of delay to the next

request for bus turn-around or write recovery time, depending on the type of the next

request. Thus, in the worst case, the service time for one row-hit request is:

Lhit = max{CL + BL/2 + 2, WL + BL/2 + max(tWTR, tWR)}· tCK

• Row-conflict service time: The row-conflict request should open a row before accessing

a column by issuing PRE and ACT commands, which may take up to tRP and tRCD

cycles, respectively. Hence, the worst-case service time for one row-conflict request is

represented as follows:

Lconf = (tRP + tRCD) · tCK + Lhit

If the next request is also row-conflict and issues PRE and ACT commands, constraints

on the active-to-precharge delay (tRAS) and the row-cycle time (tRC , a minimum

separation between two ACTs in the same bank) should be satisfied. The row-conflict

service time Lconf satisfies tRAS because tRCD · tCK + Lhit is larger than tRAS · tCK in

DDR3 SDRAM [48] (or Table 5.1 for DDR3-1333). Lconf also satisfies tRC, because

Figure 5.4: Timing diagram of consecutive row-hit requests

tRC is equal to tRAS + tRP, where tRP is the time for the PRE command of the next

request to be completed.

• Consecutive row-hit requests: If m row-hit requests are present in the memory request

buffer, their service time is much smaller than m·Lhit. Due to the data bus turn-around

time, the worst-case service time happens when the requests alternate between read

and write, as depicted in Figure 5.4. WR followed by RD causes WL + BL/2 + tWTR

of delay to RD, and RD followed by WR causes CL of delay to WR. As WR-to-RD

causes a larger delay than RD-to-WR in DDR3 SDRAM [48, 152], m row-hits take $\lceil m/2 \rceil \cdot (WL + BL/2 + t_{WTR}) + \lfloor m/2 \rfloor \cdot CL$ cycles. In addition, if a PRE command is the

next command to be issued after the m row-hits, it needs to wait an extra tWR −tWTR

cycles due to the write recovery time. Therefore, the worst-case service time for m

consecutive row-hit requests is:

\[
L_{conhit}(m) = \{ \lceil m/2 \rceil \cdot (WL + BL/2 + t_{WTR}) + \lfloor m/2 \rfloor \cdot CL + (t_{WR} - t_{WTR}) \} \cdot t_{CK}
\]

Under the FR-FCFS policy, the bank scheduler serves row-conflict requests in the order of their arrival times. When row-hit requests arrive at the queue, the bank scheduler re-orders memory requests such that row-hits are served earlier than older row-conflicts.

For each open row, the maximum row-hit requests that can be generated in a system is represented as Ncols/BL, where Ncols is the number of columns in one row. This is due to the fact that, as described in the system model, (i) each task is assumed to have enough cache space to store one row of each bank assigned to it, (ii) the memory request addresses

are aligned to the size of BL, and (iii) tasks do not share memory. Once the tasks that have their data in the currently open row fetch all columns in the open row into their caches, all the subsequent memory accesses to the row will be served at the cache level and no

DRAM requests will be generated for those accesses. If one of the tasks accesses a row different from the currently open one, this memory access causes a row-conflict request so that the re-ordering effect no longer occurs. In many systems, as described in [46, 47, 50], the re-ordering effect can also be bounded by a hardware threshold Ncap, which caps the number of re-ordering between requests. Therefore, the maximum number of row-hits that can be prioritized over older row-conflicts is:

Nreorder = min (Ncols/BL,Ncap) (5.2)

The exact value of Ncap is not publicly available on many platforms. Even so, we can still obtain a theoretical bound on Nreorder by Ncols/BL, the parameters of which are easily found in the JEDEC standard [48].

We now analyze the intra-bank interference delay for each memory request generated by a processor core p. Within a bank request buffer, each request of p can be delayed by both the re-ordering effect and the previous memory requests in the queue. Therefore, the

worst-case per-request interference delay for a core p, $RD_p^{intra}$, is calculated as follows:

\[
RD_p^{intra} = reorder(p) + \sum_{\forall q:\ q \neq p\ \wedge\ shared(q,p) \neq \emptyset} \left( L_{conf} + RD_q^{inter} \right) \qquad (5.3)
\]

\[
reorder(p) =
\begin{cases}
0 & \text{if } \nexists q : q \neq p \wedge shared(q,p) \neq \emptyset \\
L_{conhit}(N_{reorder}) + N_{reorder} \cdot \displaystyle\sum_{\forall q:\ q \neq p\ \wedge\ shared(q,p) = \emptyset} L_{inter}^{RW} + (t_{RP} + t_{RCD}) \cdot t_{CK} & \text{otherwise}
\end{cases}
\qquad (5.4)
\]

In Eq. (5.3), the summation part calculates the delay from memory requests that can be queued before the arrival of p’s request. It considers processor cores that share bank

partitions with p. Since row-conflict causes a longer delay than row-hit, the worst-case delay from each of the older requests is the sum of the row-conflict service time (Lconf ) and

the per-request inter-bank interference delay ($RD_q^{inter}$). The function reorder(p) calculates the delay from the re-ordering effect. As shown in Eq. (5.4), it gives zero if there is no core sharing bank partitions with p. Otherwise, it calculates the re-ordering effect as the sum of the consecutive row-hits' service time ($L_{conhit}(N_{reorder})$) and the inter-bank delay for the row-hits ($N_{reorder} \cdot \sum L_{inter}^{RW}$). In addition, since the memory request of p that was originally row-hit could become row-conflict due to interfering requests from cores sharing bank partitions with p, Eq. (5.4) captures delays for additional PRE and ACT commands

((tRP + tRCD) · tCK ).

Total interference delay: A memory request from a core p experiences both inter-bank and intra-bank interference delay. Hence, the worst-case per-request interference delay for p, RDp, is represented as follows:

\[
RD_p = RD_p^{inter} + RD_p^{intra} \qquad (5.5)
\]

Since RDp is the worst-case delay for each request, the total memory interference delay of

τi is upper bounded by Hi · RDp.
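For a sense of scale, the intra-bank service-time terms of Eqs. (5.2)–(5.4) can be evaluated as in the sketch below (illustrative only; Ncols, Ncap and the summed inter-bank RW delay of Eq. (5.4) are treated as inputs rather than recomputed).

```python
# Sketch of the intra-bank service-time terms (Eqs. (5.2)-(5.4)) for DDR3-1333.
# n_cols, n_cap and rw_inter_sum are platform-dependent inputs; rw_inter_sum is
# the summed per-command RW inter-bank delay over cores not sharing banks with p.
tCK = 1.5                                    # ns per DRAM clock
CL, WL, BL = 9, 7, 8
tWTR, tWR, tRP, tRCD = 5, 10, 9, 9

L_hit  = max(CL + BL//2 + 2, WL + BL//2 + max(tWTR, tWR)) * tCK    # 31.5 ns
L_conf = (tRP + tRCD) * tCK + L_hit                                # 58.5 ns

def l_conhit(m: int) -> float:
    """Worst-case service time of m consecutive row-hits (alternating WR/RD),
    including the trailing write-recovery slack before a following PRE."""
    return ((m + 1) // 2 * (WL + BL//2 + tWTR)
            + m // 2 * CL + (tWR - tWTR)) * tCK

def n_reorder(n_cols: int, n_cap: int) -> int:
    """Eq. (5.2): bound on row-hits prioritized over an older row-conflict."""
    return min(n_cols // BL, n_cap)

def reorder_delay(n_re: int, rw_inter_sum: float) -> float:
    """Eq. (5.4), non-zero case (at least one core shares a bank with p)."""
    return l_conhit(n_re) + n_re * rw_inter_sum + (tRP + tRCD) * tCK

print(L_hit, L_conf, l_conhit(4), n_reorder(1024, 12))   # 31.5 58.5 82.5 12
```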

5.1.2 Job-Driven Bounding Approach

The job-driven approach focuses on how many interfering memory requests are generated during a task’s job execution time. In the worst case, every memory request from other cores can delay the execution of a task running on a specific core. Therefore, by capturing the maximum number of requests generated by the other cores during a time interval t, the job-driven approach bounds the memory interference delay that can be imposed on tasks running on a specific core in any time interval t.

We define Ap(t), which is the maximum number of memory requests generated by the

core p during a time interval t as:

\[
A_p(t) = \sum_{\forall \tau_i \in \Gamma_p} \left( \left\lceil \frac{t}{T_i} \right\rceil + 1 \right) \cdot H_i \qquad (5.6)
\]

where Γp is the set of tasks assigned to the core p. The “+1” term captures the carry-in job of each task during a given time interval t. Note that this calculation is quite pessimistic, because we do not make assumptions on memory access patterns (e.g., access rate or distribution). It is possible to add such assumptions, for example the specific memory access pattern of the tasks [66, 67], or to use memory request throttling mechanisms [76, 149, 153]. This helps to calculate a tighter Ap(t), while the other equations in our work can be used independently of such additional assumptions.

Inter-bank interference delay: The worst-case inter-bank interference delay imposed on a core p during a time interval t is represented as follows:

\[
JD_p^{inter}(t) = \sum_{\forall q:\ q \neq p\ \wedge\ shared(q,p) = \emptyset} A_q(t) \cdot \left( L_{inter}^{ACT} + L_{inter}^{RW} + L_{inter}^{PRE} \right) \qquad (5.7)
\]

In this equation, the summation considers processor cores that do not share bank partitions with p. The other cores sharing banks with p will be taken into account in Eq. (5.8). The number of memory requests generated by other cores (Aq(t)) is multiplied by the maximum

inter-bank interference delay from each of these requests ($L_{inter}^{ACT} + L_{inter}^{RW} + L_{inter}^{PRE}$).

Intra-bank interference delay: The worst-case intra-bank interference delay imposed on a core p during t is as follows:

\[
JD_p^{intra}(t) = \sum_{\forall q:\ q \neq p\ \wedge\ shared(q,p) \neq \emptyset} \left( A_q(t) \cdot L_{conf} + JD_q^{inter}(t) \right) \qquad (5.8)
\]

Eq. (5.8) considers other cores that share bank partitions with p. The number of requests generated by each of these cores during t is calculated as Aq(t). Since a row-conflict request causes larger delay than a row-hit one, Aq(t) is multiplied by the row-conflict service time

Lconf. In addition, $JD_q^{inter}(t)$ is added because each interfering core q itself may be delayed by inter-bank interference depending on its bank partitions. Note that the re-ordering effect of the bank scheduler does not need to be considered here because Eq. (5.8) captures the worst case where all the possible memory requests generated by other cores arrived ahead of any request from p.

Total interference delay: The worst-case memory interference delay is the sum of the worst-case inter-bank and intra-bank delays. Therefore, the memory interference delay for a core p during a time interval t, JDp(t), is upper bounded by:

\[
JD_p(t) = JD_p^{inter}(t) + JD_p^{intra}(t) \qquad (5.9)
\]

It is worth noting that the job-driven approach will give a tighter bound than the request- driven approach when the number of interfering memory requests from other cores is relatively small compared to the number of the memory requests of the task under analysis.

Conversely, in the opposite case, the request-driven approach will give a tighter bound than the job-driven approach. We will compare the results of these two approaches in

Section 5.3.
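A compact sketch of the job-driven bound is given below (hedged: the core and task dictionaries are illustrative, and the per-command delays are the DDR3-1333 values derived in the earlier sketches).

```python
# Sketch of Eqs. (5.6)-(5.9). Each core dict has "tasks" (each with period "T"
# and worst-case request count "H") and a set of bank partitions "banks".
import math

L_PRE, L_ACT, L_RW, L_conf = 1.5, 12.0, 24.0, 58.5   # ns, DDR3-1333 values

def A(core, t):
    """Eq. (5.6): max requests generated by a core in an interval of length t."""
    return sum((math.ceil(t / tau["T"]) + 1) * tau["H"] for tau in core["tasks"])

def jd_inter(cores, p, t):
    """Eq. (5.7): delay from cores whose bank partitions are disjoint from p's."""
    return sum(A(q, t) * (L_ACT + L_RW + L_PRE)
               for q in cores if q is not p and not (q["banks"] & p["banks"]))

def jd_intra(cores, p, t):
    """Eq. (5.8): delay from cores that share at least one bank with p."""
    return sum(A(q, t) * L_conf + jd_inter(cores, q, t)
               for q in cores if q is not p and (q["banks"] & p["banks"]))

def jd(cores, p, t):
    """Eq. (5.9): total job-driven memory interference bound for core p."""
    return jd_inter(cores, p, t) + jd_intra(cores, p, t)
```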

5.1.3 Response-Time Based Schedulability Analysis

We have presented the request-driven and the job-driven approaches to analyze the worst- case memory interference delay. Since each of the two approaches bounds the interference delay by itself, a tighter upper bound can be obtained by taking the smaller result from the two approaches. Based on the analyses of the two approaches, the iterative response time test [142] is extended as follows to incorporate the memory interference delay:

\[
W_i^{k+1} = C_i + \sum_{\tau_h \in \Gamma_p \wedge \pi_h > \pi_i} \left\lceil \frac{W_i^k}{T_h} \right\rceil \cdot C_h + \min\!\left( H_i \cdot RD_p + \sum_{\tau_h \in \Gamma_p \wedge \pi_h > \pi_i} \left\lceil \frac{W_i^k}{T_h} \right\rceil \cdot H_h \cdot RD_p,\ \ JD_p(W_i^k) \right) \qquad (5.10)
\]

where $W_i^k$ is the worst-case response time of τi at the k-th iteration, p is the core of τi, Γp is the set of tasks assigned to the core p, and πi is the priority of τi. The test terminates

when $W_i^{k+1} = W_i^k$. The task τi is schedulable if its response time does not exceed its deadline: $W_i^k \le D_i$. The first and the second terms are the same as in the classical response time test. In the third term, the memory interference delay for τi is bounded by using the two approaches. The request-driven approach bounds the delay with $H_i \cdot RD_p + \sum_{\tau_h \in \Gamma_p \wedge \pi_h > \pi_i} \lceil W_i^k/T_h \rceil \cdot H_h \cdot RD_p$, which is the total delay imposed on τi and its higher-priority tasks. The job-driven approach bounds the delay by $JD_p(W_i^k)$, which captures the total delay incurred during τi's response time.
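The iteration of Eq. (5.10) can be sketched as follows (illustrative only; rd_p comes from Eq. (5.5) and jd_p from Eq. (5.9), both assumed to be supplied by the analyses sketched above).

```python
# Sketch of the iterative response-time test of Eq. (5.10). Each task dict has
# WCET "C", period "T", deadline "D" and request count "H"; rd_p is the
# per-request bound of Eq. (5.5), jd_p(t) the job-driven bound of Eq. (5.9).
import math

def response_time(task, higher_prio, rd_p, jd_p):
    W = task["C"]
    while True:
        preemption = sum(math.ceil(W / h["T"]) * h["C"] for h in higher_prio)
        req_driven = (task["H"] * rd_p
                      + sum(math.ceil(W / h["T"]) * h["H"] * rd_p
                            for h in higher_prio))
        W_next = task["C"] + preemption + min(req_driven, jd_p(W))
        if W_next == W:
            return W                  # fixed point: worst-case response time
        if W_next > task["D"]:
            return None               # exceeds deadline: deemed unschedulable
        W = W_next
```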

5.1.4 Memory Controllers with Write Batching

Many recent memory controllers handle write requests in batches when the write buffer is close to full so that the bus turn-around delay can be amortized across many requests [152,

154]. Although the modeling of memory controllers using write batching is not within the scope of our work, we believe that our analysis could still be used to bound memory interference in systems with such memory controllers. If a memory controller uses write batching, the worst-case delay of a single memory operation can be much larger than the

one computed by $L_{inter}^{PRE} + L_{inter}^{ACT} + L_{inter}^{RW}$, due to write-buffer draining.³ However, this does not restrict the applicability of our analysis to such memory controllers. We discuss this in two cases as follows.

First, consider a job of task τi and how it experiences interference from a job of task

τj where τj is assigned to a different core than τi. If the job of τi starts its execution at a time when the write buffer is fully filled with the memory requests of the job of τj, then the job of τi suffers additional interference from at most w memory requests, where w is

³ Note that the write-buffer draining does not completely block read requests until all the write requests are serviced. In a memory controller with write batching, read requests are always exposed to the memory controller, but write requests are exposed to and scheduled by the memory controller only when the write buffer is close to full [152]. Hence, even when the write buffer is being drained, a read request can be scheduled if its commands are ready with respect to DRAM timing constraints (e.g., read and write requests to different banks).

the size of the write buffer. However, this effect can only happen once per job of τi and be bounded by a constant value. Afterwards, the total number of interfering memory requests remains the same during the execution of the job of τi. In addition, since the use of write batching reduces memory bus turn-around delay, it may even shorten the response time of the job of τi.

Second, consider a job of task τ_i and how it experiences interference from a job of task τ_j, where τ_j is assigned to the same core as τ_i. If the job of τ_i starts its execution at a time when the write buffer is full with the memory requests of the job of τ_j, all the memory requests in the write buffer need to be serviced first, which can delay the execution of the job of τ_i. However, this effect can only happen once per context switch and hence it can be accounted for as a preemption cost.

5.1.5 Combining with Cache Interference Analysis

We have so far assumed that there is no cache interference among tasks. Now, we relax this assumption by combining our memory interference analysis with our cache interference analysis presented in Chapter 4. For simplicity, we assume that the cache warm-up delay has been taken into account in the WCET of each task. Then, the analysis for the worst-case response time of a task τ_i in the presence of cache interference is rewritten as follows:

W_i^{k+1} = C_i + \sum_{\tau_h \in \Gamma_p \wedge \pi_h > \pi_i} \left\lceil \frac{W_i^k}{T_h} \right\rceil C_h + \sum_{\tau_h \in \Gamma_p \wedge \pi_h > \pi_i} \left\lceil \frac{W_i^k}{T_h} \right\rceil \gamma_{h,i} \qquad (5.11)

where W_i^k is the worst-case response time of τ_i at the k-th iteration, p is the core where τ_i is allocated, π_h is the priority of τ_h, Γ_p is the set of tasks allocated to the core p, and γ_{h,i} is the cache-related preemption delay caused by τ_h. The term γ_{h,i} is calculated as follows:

\gamma_{h,i} = \Big| \bigcup_{\tau_j \in \Gamma_p \wedge \pi_j < \pi_h \wedge \pi_j \ge \pi_i} (S_h \cap S_j) \Big| \cdot \Delta \qquad (5.12)

where S_h is the set of cache partitions assigned to τ_h, and Δ is the maximum time needed to reload data in one cache partition.

In Eq. (5.11), the last term captures the total amount of cache interference delay caused by higher-priority tasks. To identify the number of memory requests generated by cache interference, we introduce a new term, γ*_{h,i}:

\gamma^*_{h,i} = \Big| \bigcup_{\tau_j \in \Gamma_p \wedge \pi_j < \pi_h \wedge \pi_j \ge \pi_i} (S_h \cap S_j) \Big| \cdot \delta \qquad (5.13)

where δ is the number of memory requests needed to reload one cache partition. Note that δ is determined by the size of a cache partition in the system. In the case of a write-back cache, δ should take into account the effect of a dirty cache line that requires two memory accesses to fetch a new cache line [140].
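As a quick numerical illustration with assumed values (these are not the parameters of the platform evaluated in Section 5.3), suppose a cache partition spans 512 KB and the cache-line size is 64 bytes:

\delta_{\mathrm{write\text{-}through}} = \frac{512\,\mathrm{KB}}{64\,\mathrm{B}} = 8192, \qquad \delta_{\mathrm{write\text{-}back}} = 2 \times 8192 = 16384

since, with a write-back cache, each line to be reloaded may first require the write-back of a dirty line in addition to the fetch of the new line.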

Then, we incorporate γ*_{h,i} into the request-driven and job-driven approaches. For the request-driven approach, the total number of memory requests generated by cache interference during the response time of a task τ_i is given by:

H_i^*(W_i) = \sum_{\tau_h \in \Gamma_p \wedge \pi_h > \pi_i} \left\lceil \frac{W_i}{T_h} \right\rceil \gamma^*_{h,i} \qquad (5.14)

For the job-driven approach, the A^p(t) function given in Eq. (5.6), which captures the maximum number of memory requests generated by the core p during a time interval t, is extended as follows to incorporate cache interference:

A^p(t) = \sum_{\forall \tau_i \in \Gamma_p} \left( \left\lceil \frac{t}{T_i} \right\rceil + 1 \right) \cdot \left( H_i + \gamma^*_{i,n} \right) \qquad (5.15)

where n is the index of the lowest-priority task in Γ_p.

Finally, the response-time based schedulability analysis incorporating both cache and memory interference delay is given as follows:

W_i^{k+1} = C_i + \sum_{\tau_h \in \Gamma_p \wedge \pi_h > \pi_i} \left\lceil \frac{W_i^k}{T_h} \right\rceil C_h + \min\Bigg( H_i \cdot RD^p + \sum_{\tau_h \in \Gamma_p \wedge \pi_h > \pi_i} \left\lceil \frac{W_i^k}{T_h} \right\rceil H_h \cdot RD^p + H_i^*(W_i^k) \cdot RD^p,\ JD^p(W_i^k) \Bigg) \qquad (5.16)

where W_i^k is the worst-case response time of τ_i at the k-th iteration. The test terminates when W_i^{k+1} = W_i^k. The task τ_i is schedulable if its response time does not exceed its deadline: W_i^k ≤ D_i. Using this equation, we can check task schedulability in the presence of both cache and memory interference.
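To make the structure of this test concrete, the following Python sketch iterates Eq. (5.16) until a fixed point is reached or the deadline is exceeded. It is only an illustration of the recurrence: the arguments rd_p (the per-request bound RD^p), jd_p (the job-driven bound JD^p(t)), and h_star (H_i^* from Eq. (5.14)) are placeholders for implementations of the analyses in this chapter and are assumed rather than shown.

```python
import math

def response_time(task, hp_tasks, rd_p, jd_p, h_star, deadline):
    """Iterate Eq. (5.16) for one task until convergence or a deadline miss.

    task     : dict with 'C' (WCET) and 'H' (number of memory requests)
    hp_tasks : higher-priority tasks on the same core, dicts with 'C', 'T', 'H'
    rd_p     : per-request interference bound RD^p of this core (a number)
    jd_p     : function t -> job-driven interference bound JD^p(t)
    h_star   : function t -> H_i^*(t), requests caused by cache interference
    deadline : D_i
    """
    w = task['C']                                  # W_i^0 = C_i
    while True:
        # Classical preemption term by higher-priority tasks on the same core.
        preemption = sum(math.ceil(w / h['T']) * h['C'] for h in hp_tasks)
        # Request-driven bound: delay of tau_i's own requests, of the requests
        # of higher-priority jobs released within W_i^k, and of cache-induced
        # requests.
        request_driven = (task['H'] * rd_p
                          + sum(math.ceil(w / h['T']) * h['H'] * rd_p
                                for h in hp_tasks)
                          + h_star(w) * rd_p)
        # Eq. (5.16) takes the smaller of the two interference bounds.
        w_next = task['C'] + preemption + min(request_driven, jd_p(w))
        if w_next == w:
            return w                               # converged: worst-case response time
        if w_next > deadline:
            return None                            # deadline exceeded: unschedulable
        w = w_next
```

Setting h_star to a function that always returns zero reduces this sketch to the test of Eq. (5.10).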

5.2 Reducing Memory Interference via Task Allocation

In this section, we present our memory interference-aware task allocation algorithm to reduce memory interference during the allocation phase. Our algorithm is motivated by the following observations we have made from our analysis given in Section 5.1: (i) memory interference for a task is caused by other tasks running on other cores in parallel, (ii) tasks running on the same core do not interfere with each other, and (iii) the use of bank partitioning reduces the memory interference delay. These observations lead to an efficient task allocation under partitioned scheduling. By co-locating memory-intensive tasks on the same core with dedicated DRAM banks, the amount of memory interference among tasks can be significantly reduced, thereby providing better schedulability.

Our memory interference-aware allocation algorithm (MIAA) is given in Algorithm 4.

MIAA takes three input parameters: Γ is the set of tasks to be allocated, N_P is the number of available processor cores, and N_bank is the number of available bank partitions. MIAA returns schedulable if every task in Γ can meet its deadline, and unschedulable if any task misses its deadline.

Algorithm 4 MIAA(Γ, N_P, N_bank)
Input: Γ: a taskset to be allocated, N_P: the number of processor cores, N_bank: the number of available bank partitions
Output: Schedulability of Γ
 1: G ← MemoryInterferenceGraph(Γ)
 2: Γ_{p1} ← ∅
 3: bank(p1) ← LeastInterferingBank(N_bank, Π, G, Γ)
 4: Π ← {p1}
 5: Φ ← {Γ}
 6: while Φ ≠ ∅ do
 7:   /* Allocates bundles */
 8:   Φ′ ← Φ; Φ_rest ← ∅
 9:   for all φ_i ∈ Φ′ in descending order of utilization do
10:     Φ ← Φ \ {φ_i}
11:     p_bestfit ← BestFit(φ_i, Π)
12:     if p_bestfit ≠ invalid then
13:       for all p_j ∈ Π : p_j ≠ p_bestfit ∧ ¬schedulable(p_j) do
14:         Φ ← Φ ∪ {RemoveExcess(p_j, G)}
15:     else
16:       Φ_rest ← Φ_rest ∪ {φ_i}
17:   if |Φ_rest| = 0 then
18:     continue
19:   /* Breaks unallocated bundles */
20:   all_singletons ← true
21:   for all φ_i ∈ Φ_rest do
22:     if |φ_i| > 1 then
23:       all_singletons ← false
24:       p_emptiest ← argmin_{p_i ∈ Π} utilization(p_i)
25:       (φ_j, φ_k) ← ExtractMinCut(φ_i, 1 − utilization(p_emptiest), G)
26:       Φ ← Φ ∪ {φ_j, φ_k}
27:     else
28:       Φ ← Φ ∪ {φ_i}
29:   /* Opens a new processor core */
30:   if all_singletons = true then
31:     if |Π| = N_P then
32:       return unschedulable
33:     φ ← ⋃_{φ_i ∈ Φ} φ_i
34:     Γ_{p_new} ← ∅
35:     bank(p_new) ← LeastInterferingBank(N_bank, Π, G, φ)
36:     Π ← Π ∪ {p_new}
37:     Φ ← {φ}
38: return schedulable

To understand the intensity of memory interference among tasks, MIAA first constructs a memory interference graph G (line 1 of Alg. 4). The graph G is a fully-connected, weighted, undirected graph, where each node represents a task and the weight of an edge between two nodes represents the amount of memory interference that the corresponding two tasks can generate. Algorithm 5 gives the pseudo-code for constructing G. For each pair of tasks, τ_i and τ_j, the edge weight between the two tasks is calculated as follows. First, the two tasks are assumed to be assigned to two empty cores that share the same bank partition.

83 Algorithm 5 MemoryInterferenceGraph(Γ)

Input: Γ: a taskset (Γ = {τ1, τ2, ..., τn}) Output: G: a graph with tasks as nodes and memory interference-intensity among nodes as edge weights 1: Construct a fully-connected undirected graph G with tasks in Γ as nodes 2: for i ← 1 to n do 3: for j ← i + 1 to n do 4: Let two processors, p1 and p2, share the same bank partition

5: Γp1 ← {τi} 6: Γp2 ← {τj } 7: Wi ← response time of τi 8: Wj ← response time of τj 9: weight(G, τi, τj ) ← (Wi − Ci)/Ti + (Wj − Cj )/Tj 10: return G

Algorithm 6 LeastInterferingBank(N_bank, Π, G, φ)
Input: N_bank: the number of bank partitions, Π: a set of processor cores, G: a memory interference graph, φ: a set of tasks that have not been allocated to cores yet
Output: b: a bank partition index (1 ≤ b ≤ N_bank)
 1: if |Π| < N_bank then
 2:   return indexof(unused_bank_partition())
 3: p_min ← p1
 4: w_{p_min} ← ∞
 5: for all p_i ∈ Π do
 6:   w_{p_i} ← Σ_{τ_j ∈ Γ_{p_i}} Σ_{τ_k ∈ φ} weight(G, τ_j, τ_k)
 7:   if w_{p_min} > w_{p_i} then
 8:     p_min ← p_i
 9:     w_{p_min} ← w_{p_i}
10: return bank(p_min)

Then, the response times of the two tasks, W_i and W_j, are calculated by using Eq. (5.10), assuming that no other tasks are executing in the system. Since each task is the only task allocated to its core, the task response time is equal to the sum of the task WCET and the memory interference delay imposed on the task. Hence, we use (W_i − C_i)/T_i + (W_j − C_j)/T_j as the edge weight between τ_i and τ_j (weight(G, τ_i, τ_j)), which represents the CPU utilization penalty that may occur due to memory interference between τ_i and τ_j.
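As a minimal illustration of this construction, the Python sketch below builds the edge-weight table of Algorithm 5. The helper response_time_shared_bank(a, b), which instantiates Eq. (5.10) for two tasks placed alone on two cores sharing one bank partition, is assumed to exist and is not shown here.

```python
from itertools import combinations

def memory_interference_graph(tasks, response_time_shared_bank):
    """Sketch of Algorithm 5: pairwise edge weights of the interference graph.

    tasks: list of dicts with 'C' (WCET) and 'T' (period).
    response_time_shared_bank(a, b): assumed helper returning (W_a, W_b), the
    Eq. (5.10) response times of a and b when each runs alone on its own core
    and the two cores share one bank partition.
    """
    weight = {}
    for i, j in combinations(range(len(tasks)), 2):
        wi, wj = response_time_shared_bank(tasks[i], tasks[j])
        # W - C is purely memory interference here, so the edge weight is the
        # CPU utilization penalty caused by the interference between the pair.
        weight[frozenset((i, j))] = ((wi - tasks[i]['C']) / tasks[i]['T']
                                     + (wj - tasks[j]['C']) / tasks[j]['T'])
    return weight
```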

After constructing the graph G, MIAA opens one core, p1, by adding it to the core set Π. It is worth noting that every time a new core is opened (added to Π), a bank partition is assigned to it by the LeastInterferingBank() function given in Algorithm 6.

The purpose of LeastInterferingBank() is to find a bank partition that likely leads to the least amount of memory interference to the tasks that have not been allocated yet (input parameter φ of Alg. 6).

Algorithm 7 BestFit(φ, Π)
Input: φ: a task bundle to be allocated, Π: a set of available processor cores
Output: p_i: the processor core where φ is allocated (p_i = invalid if the allocation fails)
 1: for all p_i ∈ Π in non-increasing order of utilization do
 2:   Γ_{p_i} ← Γ_{p_i} ∪ φ
 3:   if schedulable(p_i) then
 4:     return p_i
 5:   Γ_{p_i} ← Γ_{p_i} \ φ
 6: return invalid

If the number of cores in Π does not exceed the number of bank partitions (N_bank), LeastInterferingBank() returns the index of an unused bank partition (line 2 of Alg. 6). Otherwise, LeastInterferingBank() tries to find the least interfering bank by using G as follows. For each core p_i, it calculates w_{p_i}, the sum of the weights of all edges between the tasks on p_i and the tasks in φ. Then, it returns the bank partition index of the core p_min with the smallest w_{p_min}.

The allocation strategy of MIAA is to group memory-intensive tasks into a single bundle and to allocate as many tasks of each bundle as possible to the same core. To do so, MIAA first groups all tasks in Γ into a single bundle and assigns that bundle as an element of the set of bundles to be allocated (line 5 of Alg. 4). Then, it allocates all bundles in Φ based on the best-fit decreasing (BFD) heuristic (from line 9 to line 16). Here, we define the utilization of a bundle φ_i as Σ_{τ_k∈φ_i} C_k/T_k. Bundles are sorted in descending order of utilization, and MIAA tries to allocate each bundle to a core by using the BestFit() function given in Algorithm 7. This algorithm finds the best-fit core that can schedule a given bundle with the least amount of remaining utilization. The utilization of a core p_i is defined as Σ_{τ_k∈Γ_{p_i}} C_k/T_k, where Γ_{p_i} is the set of tasks allocated to the core p_i. If a bundle is allocated (line 12 of Alg. 4), that bundle may introduce additional memory interference to all other cores. Therefore, we need to check if the other cores can still schedule their tasksets. If any core becomes unschedulable due to the just-allocated bundle, MIAA uses the RemoveExcess() function to remove enough tasks from that core to make it schedulable again, and puts the removed tasks as a new bundle into Φ (line 14). Conversely, if a bundle is not allocated to any core (line 15), it is put into Φ_rest and will be considered later.
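The best-fit step can be sketched as follows; schedulable(core) stands for the response-time test of Eq. (5.10) applied to the core after the tentative placement (including the interference from the other cores) and is assumed rather than implemented here.

```python
def bundle_utilization(tasks):
    # Utilization of a bundle or core: sum of C_k / T_k over its tasks.
    return sum(t['C'] / t['T'] for t in tasks)

def best_fit(bundle, cores, schedulable):
    """Sketch of Algorithm 7: place the bundle on the most-utilized core that
    can still schedule it; return None if no core fits ('invalid' in Alg. 7)."""
    for core in sorted(cores, key=lambda c: bundle_utilization(c['tasks']),
                       reverse=True):
        before = len(core['tasks'])
        core['tasks'].extend(bundle)               # tentative placement
        if schedulable(core):
            return core
        del core['tasks'][before:]                 # roll back and try the next core
    return None
```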

Algorithm 8 RemoveExcess(p_i, G)
Input: p_i: a processor core, G: a memory interference graph
Output: φ: a set of tasks removed from p_i
 1: φ ← ∅
 2: repeat
 3:   w_{τ_min} ← ∞
 4:   for all τ_j ∈ Γ_{p_i} do
 5:     w_{τ_j} ← Σ_{τ_k ∈ Γ_{p_i} ∧ τ_k ≠ τ_j} weight(G, τ_j, τ_k)
 6:     if w_{τ_min} > w_{τ_j} then
 7:       τ_min ← τ_j
 8:       w_{τ_min} ← w_{τ_j}
 9:   Γ_{p_i} ← Γ_{p_i} \ {τ_min}
10:   φ ← φ ∪ {τ_min}
11: until schedulable(p_i)
12: return φ

Algorithm 9 ExtractMinCut(φ, max_util, G)
Input: φ: a task bundle to be broken, max_util: the maximum utilization allowed for the first sub-bundle, G: a memory interference graph
Output: (φ′, φ″): a tuple of sub-bundles
 1: Find a task τ_i ∈ φ with the highest utilization
 2: φ′ ← {τ_i}
 3: φ″ ← φ \ {τ_i}
 4: while |φ″| > 1 do
 5:   w_{τ_max} ← −1
 6:   for all τ_i ∈ φ″ do
 7:     w ← Σ_{τ_j ∈ φ′} weight(G, τ_i, τ_j)
 8:     if w_{τ_max} < w then
 9:       τ_max ← τ_i
10:       w_{τ_max} ← w
11:   if utilization(φ′ ∪ {τ_max}) ≤ max_util then
12:     φ′ ← φ′ ∪ {τ_max}
13:     φ″ ← φ″ \ {τ_max}
14:   else
15:     break
16: return (φ′, φ″)

We shall explain the RemoveExcess() function before moving on to the next phase of MIAA. The pseudo-code of RemoveExcess() is given in Algorithm 8. The goal of this function is to make the core p_i schedulable again while keeping as many memory-intensive tasks as possible. To do so, the function extracts one task at a time from the core in two steps. In step one, it calculates the weight w_{τ_j} for each task τ_j, which is the sum of all edge weights from τ_j to the other tasks on the same core. In step two, it removes the task τ_min with the smallest w_{τ_min} from the core. These two steps are repeated until the core becomes schedulable. Then, the function groups the removed tasks into a single bundle and returns it.
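A compact sketch of this removal loop is shown below; schedulable(core) and the pairwise edge-weight function weight(a, b) produced by Algorithm 5 are again assumed.

```python
def remove_excess(core, weight, schedulable):
    """Sketch of Algorithm 8: repeatedly drop the task with the smallest total
    edge weight to the tasks remaining on the core until the core is
    schedulable again; the dropped tasks form a new bundle."""
    removed = []
    while core['tasks'] and not schedulable(core):
        victim = min(core['tasks'],
                     key=lambda t: sum(weight(t, o)
                                       for o in core['tasks'] if o is not t))
        core['tasks'].remove(victim)
        removed.append(victim)
    return removed
```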

Once the bundle allocation phase is done, MIAA attempts to break the unallocated bundles in Φ_rest (line 21 of Alg. 4). If an unallocated bundle φ_i contains more than one task, it is broken into two sub-bundles by the ExtractMinCut() function such that the utilization of the first sub-bundle does not exceed the remaining utilization of the emptiest core (line 25).

If φ_i has only one task in it, φ_i is put again into Φ. Algorithm 9 gives the pseudo-code of ExtractMinCut(). The primary goal of this function is to break a bundle into two sub-bundles while minimizing memory interference among them. To meet this goal, the function first finds the task with the highest utilization in the input bundle and puts that task into the first sub-bundle φ′. All the other tasks are put into the second sub-bundle φ″. Then, the function selects the task in φ″ with the maximum sum of edge weights to the tasks in φ′ and moves that task to φ′. This operation repeats as long as φ″ has enough tasks and the utilization of φ′ does not exceed the requested sub-bundle utilization (max_util). When ExtractMinCut() returns the two sub-bundles, MIAA puts them into Φ (line 26 of Alg. 4).
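The splitting step can be sketched as follows, with weight(a, b) again standing for the edge weights of Algorithm 5.

```python
def extract_min_cut(bundle, max_util, weight):
    """Sketch of Algorithm 9: split a bundle into two sub-bundles, keeping the
    most mutually-interfering tasks together in the first one."""
    first = [max(bundle, key=lambda t: t['C'] / t['T'])]  # highest-utilization task
    second = [t for t in bundle if t is not first[0]]
    while len(second) > 1:
        # Task of the second sub-bundle with the largest total edge weight
        # towards the tasks already placed in the first sub-bundle.
        candidate = max(second, key=lambda t: sum(weight(t, f) for f in first))
        if sum(x['C'] / x['T'] for x in first) + \
           candidate['C'] / candidate['T'] > max_util:
            break                                          # utilization cap reached
        first.append(candidate)
        second.remove(candidate)
    return first, second
```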

If all unallocated bundles are singletons, meaning that none of them can be broken into sub-bundles, and the number of cores used is less than NP , MIAA adds a new core to Π (line 36). Since the addition of a new core opens up the possibility of allocating all remaining bundles together to the same core, MIAA merges the remaining bundles into a single bundle (line 33) and puts it into Φ. MIAA then repeats the whole process again until Φ becomes empty.

MIAA is based on the BFD heuristic, which has O(n · m) complexity, where n is the number of tasks and m is the number of processor cores used. On the one hand, the complexity of MIAA could be better than that of BFD due to the bundled allocation of tasks. On the other hand, the complexity of MIAA could be worse than that of BFD due to RemoveExcess(), which can undo task allocations. However, MIAA is guaranteed to complete in bounded time. The worst case for RemoveExcess() happens when it removes all the previously-allocated tasks from the cores. MIAA then opens a new core, as long as a core remains available; if no core remains, MIAA terminates and returns a failure result.

It is worth noting that MIAA allocates at most one bank partition to each core, assuming that one bank partition is sufficient to meet the memory requirements of any set of tasks that may be allocated to a single core. This assumption can be satisfied by configuring one bank partition to have multiple DRAM banks, as discussed in Section 2.2.3. However, we believe that explicitly modeling each task's memory requirement can help in providing better schedulability, which remains as our future work.

5.3 Evaluation

In this section, we first compare the memory interference delay observed in a real platform with the one predicted by our analysis. Then, we evaluate our memory interference-aware allocation algorithm.

5.3.1 Memory Interference in a Real Platform

5.3.1.1 Experimental Setup

The target platform is equipped with an Intel Core i7-2600 quad-core processor running at 3.4 GHz. The on-chip memory controller of the processor supports dual memory channels, but since only a single DIMM is installed, only one channel is activated, in accordance with our system model.4 The platform uses a single DDR3-1333 DIMM that consists of 2 ranks and 8 banks per rank. The timing parameters of the DIMM are shown in Table 5.1.

We used the latest version of Linux/RK [138, 139] for software cache and bank parti- tioning [16, 55].5 Cache partitioning divides the shared L3 cache of the processor into 32 partitions, and bank partitioning divides the DRAM banks into 16 partitions (1 DRAM bank per partition). For the measurement tool, we used the Linux/RK profiler [155] that records execution times and memory accesses (last-level cache misses) using hardware performance counters. In addition, we used the memory reservation mechanism of Linux/RK [143, 144] to protect each application against unexpected page swap-outs. To reduce measurement

4This is why the DRAM address mapping in Figure 2.4 does not have a bit for channel selection. 5Linux/RK is available at https://rtml.ece.cmu.edu/redmine/projects/rk.

inaccuracies and improve predictability, we disabled the stride-based and adjacent cache-line prefetchers, simultaneous multithreading, and dynamic clock frequency scaling of the processor. All unrelated system services such as GUI and networking were also disabled.

It is worth noting that some of our assumptions do not hold in the target platform.

First, the processor of the target platform is not fully timing-compositional, in that it can generate multiple outstanding memory requests and hide memory access delay by out-of-order execution. Second, the memory controller of the target platform uses write batching, and there may be other discordances between the memory controller and our system model because detailed information on the memory controller is not open to the public. However, we have chosen the target platform because (i) it is equipped with DDR3 SDRAM, which is our main focus in this work, and (ii) it can run an OS that provides the software cache and DRAM bank partitioning features needed for our experiments. We will explore how the aforementioned differences between the target platform and our system model influence our experimental results.

5.3.1.2 Results with Synthetic Tasks

Our focus here is on analyzing the memory interference of the two types of synthetic tasks.

At first, we use the synthetic latency task [56], which traverses a randomly ordered linked list. Due to the data dependency of pointer-chasing operations in linked-list traversals, the latency task generates only one outstanding memory request at a time, nullifying the effect of multiple outstanding memory requests in the target platform. We configure the working set size of the latency task to be four times of the L3 cache in order to minimize cache hits. In addition, we configure the latency task to generate only read requests in order to avoid the write-batching effect of the memory controller. We execute multiple instances of the latency task to generate interfering memory requests and to measure delay caused by them. Each instance is allocated to a different core, and assigned 4 cache partitions and 1 bank partition. We evaluate two cases where the instances share and do not share bank partitions.

[Figure 5.5: Response times of a synthetic task that generates one outstanding read memory request at a time. (a) Private bank; (b) shared bank. Y-axis: normalized response time (%); x-axis: 2, 3, and 4 cores; bars: Observed, Calculated (RD only), Calculated (RD+WR).]

Figure 5.5 compares the maximum observed response times of one instance of the latency task with the calculated response times from our analysis, while the other instances are running in parallel. Since the latency task generates only read requests, we present results from a variation of our analysis, "Calculated (RD only)", which considers only read requests in L_inter^RW and L_hit, in addition to "Calculated (RD+WR)", which is our original analysis considering both read and write requests. The x-axis of each subgraph denotes the total number of cores used, e.g., "2 cores" means that two instances run on two different cores and the other cores are left idle. The y-axis shows the response time of the instance under analysis, normalized to the case when it runs alone in the system. Since each instance is allocated alone to each core, the response time increase is equal to the amount of memory interference suffered from other cores. The difference between the observed and calculated values represents the pessimism embedded in our analysis. Figure 5.5(a) shows the response times when each instance has a private bank partition. We observed a very small increase in response times even when all four cores were used. This is because (i) each instance of the latency task does not experience intra-bank interference due to its private bank partition, and (ii) each instance generates a relatively small number of memory requests, so each memory request is likely serviced before the arrival of requests from other cores.

However, our analysis pessimistically assumes that each memory request may always be delayed by the memory requests of all other cores. In addition, the executions of DRAM commands at different banks can be overlapped as long as DRAM timing constraints are not violated, but our analysis does not consider such an overlapping effect.

Figure 5.5(b) depicts the response times when all cores share the same bank partition.

We set the re-ordering window size Nreorder to zero in our analysis, because the latency task accesses a randomly ordered linked list and has very low row-buffer locality, thereby hardly generating row-hit requests. As can be seen in this figure, the results from both of our analyses bound the observed response times. The pessimism of our analysis in the shared bank case is not as significant as the one in the private bank case. This is due to the fact that the use of a single shared bank serializes the executions of DRAM commands from multiple cores, making their executions close to the worst-case considered by our analysis.

[Figure 5.6: Response times of a synthetic task that generates multiple outstanding read and write memory requests at a time. (a) Private bank; (b) shared bank. Y-axis: normalized response time (%); x-axis: 2, 3, and 4 cores; bars: Observed, Calculated (Nreorder = 0 for the shared-bank case).]

Next, we use a synthetic memory-intensive task which has the opposite characteristics of the latency task. The memory-intensive task is a modified version of the stream benchmark [156]. The memory-intensive task generates a combination of read and write requests with very high row-buffer locality and little computation. In addition, it can generate multiple outstanding memory requests in the target platform due to the lack of data dependency. Therefore, by using the memory-intensive task, we can identify the effects of the differences between the target platform and our analysis. Similar to the latency task experiments, we execute multiple instances of the memory-intensive task, with each assigned 4 cache partitions and 1 bank partition, and compare private and shared bank cases.

Figure 5.6 compares the response times of one instance of the memory-intensive task, while the other instances are running in parallel. Since the memory-intensive task generates both read and write requests, we do not consider our read-only analysis used in Figure 5.5.

Figure 5.6(a) shows the response times with a private bank partition per core. Since the memory-intensive task generates a larger number of memory requests than the latency task, the observed response times of the memory-intensive task are longer than those of the latency task. Interestingly, although the memory-intensive task might generate multiple outstanding memory requests at a time, our analysis could bound the memory interference delay.

This is because the extra penalty caused by multiple outstanding memory requests can be compensated by various latency-hiding effects in the target platform. First, an increase in the memory access latency can be hidden by the out-of-order execution of the target processor. Second, the memory controller handles the write requests in batches, which can reduce the processor stall time. However, in order to precisely analyze memory interference in a platform like ours, both the extra penalty caused by multiple outstanding memory requests and the latency-hiding effects from out-of-order execution and write batching should be accounted for by analysis, which remains as future work.

Figure 5.6(b) illustrates the response times when all cores share the same bank partition.

Since the memory-intensive task has very high row-buffer locality, we expected that a large re-ordering window size Nreorder would be needed for our analysis to bound the re-ordering effect. However, as shown in this figure, our analysis could bound memory interference even when we set Nreorder to zero. We suspect that the re-ordering effect on the memory-intensive task is canceled out by the memory latency-hiding techniques of the target platform.

5.3.1.3 Results with PARSEC Benchmarks

We now analyze the memory interference delay of the PARSEC benchmarks [146], whose memory access patterns are closer to those of real applications than the synthetic tasks used in Section 5.3.1.2. A total of eleven PARSEC benchmarks are used in this experiment.

Two PARSEC benchmarks, dedup and facesim, are excluded from the experiment due to their frequent disk accesses for data files. In order to compare the impact of different amounts of interfering memory requests, we use the two types of synthetic tasks, memory- intensive and memory-non-intensive. Each PARSEC benchmark is assigned to Core 1 and the synthetic tasks are assigned to the other cores (Core 2, 3, 4) to generate interfering memory requests. To meet the memory size requirement of the benchmarks, each benchmark is assigned 20 private cache partitions.6 The synthetic tasks are each assigned 4 private cache partitions. Each of the benchmarks and the synthetic tasks is assigned 1 bank partition, and we evaluate two cases where tasks share or do not share bank partitions.

The memory-intensive task is the one used in Section 5.3.1.2. When running in isolation, the memory-intensive task generates up to 40K DRAM requests per msec (combination of read and write). Since it has very high row-buffer locality with little computations, “40K requests per msec” is likely close to the maximum possible value that a single core can generate with a single bank partition in the target system. The memory-non-intensive task has a similar structure to the stream benchmark [156], but it has multiple non-memory operations between memory operations, thereby generating much fewer DRAM requests.

When running alone in the system, the memory-non-intensive task generates up to 1K DRAM requests per msec.

We first evaluate the response times of benchmarks with memory-intensive tasks.

Figure 5.7 and Figure 5.8 compare the maximum observed response times with the calculated response times from our analysis, when memory-intensive tasks are running in parallel.

The x-axis of each subgraph denotes the benchmark names, and the y-axis shows the response time of each benchmark normalized to the case when it runs alone in the system.

Figure 5.7 shows the response times with a private bank partition per core. We observed up to 4.1x of response time increase with three memory-intensive tasks in the target system (streamcluster in Figure 5.7(c)).

6Software cache partitioning simultaneously partitions the entire physical memory space into the number of cache partitions. Therefore the spatial memory requirement of a task determines the minimum number of cache partitions for that task [16].

[Figure 5.7: Response times of benchmarks with a private bank partition when memory-intensive tasks run in parallel. (a) One memory-intensive task on Core 2; (b) two memory-intensive tasks on Cores 2 and 3; (c) three memory-intensive tasks on Cores 2, 3 and 4. Y-axis: normalized response time (%); x-axis: the eleven PARSEC benchmarks; bars: Observed, Calculated.]

Our analysis could bound the memory interference delay in all cases. The worst over-estimation is found in fluidanimate. We suspect that this over-estimation comes from the varying memory access pattern of the benchmark, because our analysis considers the worst-case memory access scenario. Recall that our analysis bounds memory interference based on two approaches: request-driven and job-driven.

[Figure 5.8: Response times of benchmarks with a shared bank partition when memory-intensive tasks run in parallel. (a) One memory-intensive task on Core 2; (b) two memory-intensive tasks on Cores 2 and 3; (c) three memory-intensive tasks on Cores 2, 3 and 4. Y-axis: normalized response time (%); x-axis: the eleven PARSEC benchmarks; bars: Observed, Calculated (Nreorder = 0), Calculated (Nreorder = 6), Calculated (Nreorder = 12).]

In this experiment, as the memory-intensive tasks generate an enormous number of memory requests, the response times of all benchmarks are bounded by the request-driven approach. When only the job-driven approach is used, the results are unrealistically pessimistic (>10000x; not shown in the figure for simplicity). Thus, these experimental results show the advantage of the request-driven approach.

[Figure 5.9: Response times of benchmarks with three memory-non-intensive tasks. (a) Private bank partition: Observed vs. Calculated. (b) Shared bank partition: Observed vs. Calculated (Nreorder = 0) and Calculated (Nreorder = 12). Benchmarks whose bounds are obtained by the job-driven approach are marked. Y-axis: normalized response time (%); x-axis: the eleven PARSEC benchmarks.]

Figure 5.8 illustrates the response times when all cores share the same bank partition.

With bank sharing, we observed up to 12x of response time increase in the target platform.

Our analysis requires the re-ordering window size Nreorder to calculate the response time when a bank partition is shared. However, we cannot obtain the precise Nreorder value because the Ncap value of the target platform is not publicly available. Although the Ncap value is crucial to reducing the pessimism in our analysis, Nreorder can still be bounded without knowledge of the Ncap value, as given in Eq. (5.2). The DRAM used in this platform has Ncols of 1024 and BL of 8, so the Nreorder value does not exceed 128. In this figure, for purposes of comparison, we present the results from our analysis when Nreorder is set to 0, 6, and 12. If we disregard the re-ordering effect of FR-FCFS in this platform (Nreorder = 0), the analysis generates overly optimistic values. In the case of streamcluster with three memory-intensive tasks (Figure 5.8(c)), the analysis that does not account for the re-ordering effect results in only about half of the observed value.

Table 5.2: Base parameters for allocation algorithm experiments

Number of processor cores (N_P): 8
Number of bank partitions (N_bank): 8
Number of tasks to be allocated: 20
Task period (T_i): uniform from [100, 200] msec
Task utilization (U_i): uniform from [0.1, 0.3]
Task WCET (C_i): U_i · T_i
Task deadline (D_i): equal to T_i
Ratio of memory-intensive tasks to memory-non-intensive tasks: 5:5
H_i for memory-intensive task: uniform from [10000, 100000]
H_i for memory-non-intensive task: uniform from [100, 1000]

When Nreorder = 12, our analysis can find bounds in all cases. However, this does not necessarily mean that the re-ordering window size of the memory controller is 12. As we have discussed in Section 5.3.1.2, multiple outstanding memory requests and various latency-hiding techniques cancel out each other's effects in the target platform. Hence, the exact re-ordering window size of the memory controller can be either greater or smaller than 12.

We next evaluate the response times with memory-non-intensive tasks. Figure 5.9(a) and Figure 5.9(b) depict the response times of benchmarks with a private and a shared bank partition, respectively, when three memory-non-intensive tasks run in parallel. In contrast to the memory-intensive case, the smallest upper-bounds on the response times are mostly obtained by the job-driven approach due to the low number of interfering memory requests. The experimental results show that our analysis can closely estimate memory interference delay under scenarios with both high and low memory contention.

5.3.2 Memory Interference-Aware Task Allocation

In this subsection, we evaluate the effectiveness of our memory interference-aware allocation (MIAA) algorithm. To do this, we use randomly-generated tasksets and capture the percentage of schedulable tasksets as the metric.

5.3.2.1 Experimental Setup

The base parameters we use for experiments are summarized in Table 5.2. Once a taskset is generated, the priorities of tasks are assigned by the Rate Monotonic Scheduling (RMS) policy [127]. The same DRAM parameters as in Table 5.1 are used, and the re-ordering window size of 12 is used (Nreorder = 12).
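For reference, the taskset generation implied by Table 5.2 can be sketched in Python as follows. The exact random-number draws used in the experiments are not specified beyond the table, so this is only one plausible reading of those parameters.

```python
import random

def generate_taskset(n_tasks=20, mem_intensive_ratio=0.5):
    """Draw one random taskset from the base parameters of Table 5.2."""
    tasks = []
    for i in range(n_tasks):
        period = random.uniform(100.0, 200.0)       # T_i in msec
        util = random.uniform(0.1, 0.3)             # U_i
        if i < n_tasks * mem_intensive_ratio:       # 5:5 ratio by default
            h = random.uniform(10000, 100000)       # H_i, memory-intensive
        else:
            h = random.uniform(100, 1000)           # H_i, memory-non-intensive
        tasks.append({'T': period,                  # period
                      'D': period,                  # D_i = T_i
                      'C': util * period,           # C_i = U_i * T_i
                      'H': h})
    # Rate-monotonic priorities: shorter period means higher priority.
    tasks.sort(key=lambda t: t['T'])
    return tasks
```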

We consider the following six schemes for performance comparison: (i) the best-fit decreasing algorithm (BFDnB), (ii) BFD with bank partitioning (BFDwB), (iii) the first-fit decreasing algorithm (FFDnB), (iv) FFD with bank partitioning (FFDwB), (v) the IA3 algorithm proposed in [25] (IA3nB), and (vi) IA3 with bank partitioning (IA3wB).

The BFD and FFD algorithms are traditional bin-packing heuristics, and IA3 is a recent interference-aware task allocation algorithm based on FFD. As none of these algorithms is originally designed to consider bank partitioning, all cores share all available bank partitions under BFDnB, FFDnB and IA3nB. Conversely, under BFDwB, FFDwB and IA3wB, bank partitions are assigned to cores in round-robin order so that each core can have a dedicated bank partition. In all these algorithms, we use our response-time test given in Eq. (5.10) to check if a task to be allocated can fit into a candidate core.

IA3 requires each task to have a set of WCET values to represent memory interference as part of the task's WCET. Specifically, IA3 assumes that the worst-case memory interference is affected only by the number of cores used and is not affected by the memory access characteristics of other tasks running in parallel. Hence, under IA3nB and IA3wB, we calculate each task's WCET value as C′_i = C_i + RD · H_i, and use C′_i instead of C_i when the FFD module of IA3 sorts tasks in descending order of utilization. We have observed that the use of C′_i with a conventional response-time test [142] to check whether a task to be allocated can fit into a candidate core yields worse performance than the use of C_i with our response-time test. Hence, we use C′_i only when sorting tasks by utilization. In addition, IA3 is designed to allocate cache partitions to cores as well, but we do not consider this feature in our experiments.

[Figure 5.10: Taskset schedulability as the ratio of memory-intensive tasks increases. Y-axis: schedulable tasksets (%); x-axis: ratio from 0:10 to 10:0; curves: BFDnB, BFDwB, FFDnB, FFDwB, IA3nB, IA3wB, MIAA.]

[Figure 5.11: Taskset schedulability as the number of tasks increases. Y-axis: schedulable tasksets (%); x-axis: 10 to 30 tasks; curves: the same seven schemes.]

5.3.2.2 Results

We explore four main factors that affect taskset schedulability in the presence of memory interference: (i) the ratio of memory-intensive tasks, (ii) the number of tasks, (iii) the utilization of each task, and (iv) the number of cores allowed to be used. We generate 10,000 tasksets for each experimental setting, and record the percentage of tasksets in which all the tasks pass the response-time test given in Eq. (5.10).

Figure 5.10 shows the percentage of schedulable tasksets as the ratio of memory-intensive tasks to memory-non-intensive tasks increases. The left-most point on the x-axis represents that all tasks are memory-non-intensive, and the right-most point represents the opposite.

The percentage difference between MIAA and the other schemes becomes larger as the ratio of memory-intensive tasks increases. For instance, when the ratio is 7:3, MIAA schedules 98% of tasksets while the other schemes schedule less than 2% of tasksets. This big difference mainly comes from the fact that MIAA tries to co-allocate memory-intensive

tasks to the same core to avoid memory interference among them.

[Figure 5.12: Taskset schedulability as the utilization of each task increases. Y-axis: schedulable tasksets (%); x-axis: per-task utilization from 0.10 to 0.28; curves: BFDnB, BFDwB, FFDnB, FFDwB, IA3nB, IA3wB, MIAA.]

[Figure 5.13: Taskset schedulability when tasks have medium memory intensity. Y-axis: schedulable tasksets (%); x-axis: 10 to 30 tasks; curves: the same seven schemes.]

The schedulability of MIAA also decreases as the ratio approaches 10:0. This is a natural trend, because the amount of memory interference increases while the number of cores remains unchanged.

We now explore the trend of increasing the number of tasks and the utilization of each task. Figure 5.11 and Figure 5.12 depict the results. In Figure 5.12, each point k on the x-axis represents that the utilization of each task ranges over [k − 0.01, k + 0.01]. As can be seen, MIAA performs the best, and BFDnB and FFDnB show the worst performance. Especially, all the schemes except MIAA schedule less than 70% of tasksets even if there are only 10 tasks in each taskset or the utilization of each task ranges only over [0.09, 0.11]. This is due to the fact that, when a new task is allocated, tasks that have been allocated earlier on other cores may become unschedulable due to the memory interference from the new task.

As MIAA is the only scheme accommodating such a case, it provides significantly higher schedulability than the other schemes.

[Figure 5.14: Taskset schedulability as the number of cores increases. Y-axis: schedulable tasksets (%); x-axis: 8 to 12 cores; curves: BFDnB, BFDwB, FFDnB, FFDwB, IA3nB, IA3wB, MIAA.]

In Figure 5.13, we consider the case where tasks have medium memory intensity. For this purpose, we randomly select the Hi value for each task in the range of [100, 10000].

The results in this figure show that MIAA also outperforms the other schemes when tasks do not have a bimodal distribution of memory intensity.

Lastly, we compare in Figure 5.14 the percentage of schedulable tasksets under different schemes when the number of cores is greater than the number of bank partitions. In this experiment, the number of tasks per taskset is set to 25, and the H_i value and the utilization U_i of each task are randomly selected from [100, 10000] and [0.2, 0.4], respectively. The percentage under MIAA increases with the number of cores. Especially, MIAA can schedule 98% of tasksets with 11 cores. However, the other schemes cannot schedule more than 70% of tasksets, even with 12 cores. These experimental results show that a task allocation algorithm cannot scale well on multi-core platforms without explicit consideration of memory interference, and that MIAA yields a significant improvement in task schedulability compared to previous schemes. We expect that the performance of MIAA can be further improved by elaborating the functions used by MIAA, such as LeastInterferingBank() and ExtractMinCut(). This remains as part of future work.

5.4 Summary

In this chapter, we presented an analysis for bounding memory interference on a multi-core platform with a COTS DRAM system. Our analysis is based on a realistic memory model, which considers the Joint Electron Device Engineering Council (JEDEC) DDR3 SDRAM standard, the FR-FCFS policy of the memory controller, and shared/private DRAM banks.

To provide a tighter upper-bound on the memory interference delay, our analysis uses the combination of the request-driven and job-driven approaches. Experimental results from a real hardware platform show that, although some of our assumptions do not hold in the platform used, our analysis can closely estimate the memory interference delay under workloads with both high and low memory contention.

We also presented a memory interference-aware task allocation algorithm that accommodates memory interference delay during the task allocation phase. Experimental results indicate that our algorithm yields significant benefits in task schedulability, with as much as 96% more tasksets being schedulable than previous schemes.

As memory-access-intensive tasks become prevalent in cyber-physical systems, contention in shared main memory should be seriously considered and mitigated. We believe that our analysis and task allocation algorithm can be effectively used for designing predictable systems with multi-core platforms. Interesting future directions in this area include: (i) analysis on the effect of hardware prefetchers on memory interference delay, (ii) extensions to a non-timing-compositional architecture that allows out-of-order execution and multiple outstanding cache misses, and (iii) examining the effects of upcoming memory schedulers that serve heterogeneous agents.

Chapter 6

Predictable Cache Management for Virtualization

With the growth of processing core counts on recent processors, there is a strong demand for consolidating multiple systems onto a single hardware platform. One of the promising solutions for such consolidation is virtualization. With virtualization, each consolidated system is contained within a virtual machine (VM), which is spatially isolated from other

VMs by an additional address translation layer introduced by a hypervisor. Figure 6.1 illustrates the three address layers in modern virtualization platforms, such as Xen [157] and KVM [158]. Guest virtual pages for application tasks within a VM are mapped to guest physical pages by the guest OS of that VM, and those guest physical pages are mapped to host physical pages by the hypervisor. Using this approach, the hypervisor ensures that any software failure in one VM does not propagate to other VMs.

The additional address layer at the hypervisor, however, prevents page coloring and OS-level cache management schemes based on it from functioning properly in a VM. Although a guest OS selects guest physical pages for page coloring, those pages may be mapped to host physical pages corresponding to cache sets different from the ones intended by the guest OS, resulting in unpredictable cache allocation. Even if page coloring works in a VM,

tasks running on other guest OSs that do not support page coloring will suffer from cache interference. Also, cache allocation algorithms developed for native execution environments cannot provide an efficient solution to design VMs in a cache-aware manner and to allocate the host machine's cache to VMs to be consolidated.

[Figure 6.1: Address translation layers in virtualization. Guest virtual pages of tasks in VM1 and VM2 are mapped to guest physical pages, which are in turn mapped to host physical pages.]

In this chapter, we propose a predictable cache management framework for a multi-core virtualization environment. To address the problem of cache-to-task allocation in a VM, our framework supports two new hypervisor-level techniques, named vLLC and vColoring. vLLC is designed for a VM that runs a guest OS with page coloring support. vLLC provides such a VM with a portion of the host machine’s last-level shared cache (LLC) in the form of a virtual LLC. Then, vLLC enables the guest OS to control the virtual LLC by using its own implementation of page coloring. vColoring, on the other hand, is designed for a VM that runs a guest OS having no page coloring support. vColoring allows the hypervisor to directly assign a portion of the host LLC to a task running in a VM. Hence, with vColoring, we can even control the cache allocation of tasks running on proprietary, closed-source OSs that do not support page coloring. We have implemented prototypes of vLLC and vColoring in the KVM hypervisor running on x86 and ARM multi-core platforms. Experimental results show that vLLC and vColoring are effective in controlling cache allocation to tasks and in addressing cache interference, on both an OS with page coloring (Linux/RK [16, 138]) and OSs without page coloring (vanilla Linux and MS Windows Embedded).

In addition, we propose a new cache management scheme as part of our framework.

Our scheme determines a cache-to-task allocation that reduces taskset utilization while satisfying timing constraints. Our scheme also designs a VM in a way that the VM's computational demand is captured with respect to the number of cache partitions allocated.

Lastly, when VMs are consolidated into the host machine, our scheme finds a cache-to-VM allocation that minimizes the total VM utilization. We use randomly-generated tasksets for the evaluation of our cache management scheme. Experimental results indicate that our scheme yields a significant benefit in VM utilization over other approaches.

The background and related prior work on cache interference were presented in Section 2.1. The system model, including assumptions and notation for a last-level cache, tasks and virtual machines, can be found in Chapter 3.

The rest of this chapter is organized as follows. Section 6.1 presents our vLLC and vColoring techniques. Section 6.2 presents our cache management scheme. Section 6.3 provides detailed evaluation, and Section 6.4 summarizes this chapter.

6.1 Cache Control in Virtualization

In this section, we provide a brief description on address translation in virtualization.

Then, we present our vLLC and vColoring techniques. Both techniques provide a way to allocate cache partitions to individual tasks running in a VM. They do not rely on the page-fault exception of shadow paging or the hardware support of two-dimensional paging.

Our techniques differ in their target guest OSs: vLLC is for guest OSs with page coloring (coloring-aware OSs) and vColoring is for guest OSs without page coloring (coloring-unaware OSs).

6.1.1 Address Translation in Virtualization

There are three types of addresses in a virtualized environment: guest virtual addresses (GVA), guest physical addresses (GPA), and host physical addresses (HPA). Whenever a GVA is accessed, it needs to be translated to the corresponding HPA. Shadow paging and two-dimensional paging are techniques to perform such translation in full virtualization scenarios, where unmodified guest OSs can be used.

Shadow paging: Under shadow paging, the hypervisor generates shadow page tables where GVAs are directly mapped to HPAs. Although a guest OS still maintains its own page tables, the memory management unit (MMU) uses the shadow page tables for address translation so that a GVA can be directly translated to its corresponding HPA without having GVA-to-GPA translation. To maintain the validity of contents of the shadow page tables, the hypervisor has to keep track of any change in the guest page tables. A well-known approach to doing this is to write-protect the guest page tables, which triggers a page-fault exception to the hypervisor whenever any change is made to the guest page tables.

Two-dimensional paging: Two-dimensional paging refers to hardware-assisted address translation techniques introduced in recent processors, e.g., AMD Nested Page Tables (NPT), Intel Extended Page Tables (EPT), and ARM Stage-2 Page Tables. Under two-dimensional paging, the MMU can traverse both guest and host page tables. Hence, when a GVA is accessed, the MMU first translates it to a GPA by using the guest page tables and then translates that GPA to an HPA by using the host page tables. Such two-step address translation requires more memory accesses than the direct GVA-to-HPA translation of shadow paging, but it eliminates the overhead of maintaining valid shadow page tables.

Neither shadow paging nor two-dimensional paging dominates the other in terms of performance [159]. It is also currently unknown which technique is preferable for real-time virtualization. Therefore, our goal is to develop cache control techniques that are independent of a specific address translation technique used.

6.1.2 vLLC for Coloring-aware Guest OSs

As previously discussed, page coloring implemented in a guest OS cannot allocate cache partitions to tasks in a VM due to the additional address layer in the hypervisor. vLLC overcomes this limitation. The keys to vLLC are (i) to provide a VM with "virtual LLC (last-level cache)" information that corresponds to the cache partitions assigned to the VM, and (ii) to map guest physical pages to host physical pages corresponding to the assigned cache partitions. Figure 6.2 illustrates an example of vLLC. The virtual LLC provided to the VM is different from the actual LLC of the host machine in terms of the size of the cache and the number of cache sets, which are the main factors determining the number of cache partitions. In Figure 6.2, since the hypervisor assigns two cache partitions out of four to the guest VM, the size and the number of cache sets of the virtual LLC are each half of those of the host LLC. Using this virtual LLC, the guest OS can identify that the number of available cache partitions is two. The virtual LLC can be implemented by trapping and emulating cache-related operations, e.g., executions of a CPUID instruction on x86 architectures [27] and accesses to CCSIDR and CSSERR registers on an ARM Cortex-A15 architecture [160].

[Figure 6.2: vLLC example. The guest VM sees a virtual LLC of 128KB with 256 sets (16-way), backed by host cache partitions 2 and 4 of a 256KB, 512-set, 16-way host LLC with four cache partitions; the guest's page coloring operates on the virtual LLC.]

In addition to the virtual LLC information, vLLC maps guest physical pages (GPPs) to host physical pages (HPPs) such that guest cache partitions are mapped to their corresponding host cache partitions. This can be easily done by the hypervisor because the hypervisor has both the virtual LLC information and the control of the GPP-to-HPP mapping. When a GPP needs to be mapped to an HPP, vLLC in the hypervisor checks the guest cache-partition index of the GPP, finds out the corresponding host cache partition, and maps the GPP to an HPP with that host cache partition. For instance, in Figure 6.2, cache partitions 2 and 4 of the host machine are represented as cache partitions 1 and 2 in the guest VM, respectively, and GPPs with cache partitions 1 and 2 are mapped to HPPs with cache partitions 2 and 4, respectively. With this approach, a guest OS can allocate cache partitions to tasks. It is worth noting that the GPP-to-HPP mapping happens only once per GPP during the lifetime of a VM. Therefore, once all GPPs used by a task have been populated, vLLC does not cause any runtime overhead to that task.
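The guest-to-host partition translation that vLLC performs on each GPP-to-HPP mapping can be summarized by the following sketch. The color-extraction and page-allocation helpers (guest partition indices passed in as integers, and alloc_host_page_with_color()) are hypothetical stand-ins for the hypervisor's colored page allocator, not actual KVM interfaces.

```python
def make_vllc_mapper(host_partitions_for_vm):
    """host_partitions_for_vm: host cache-partition indices assigned to the VM,
    e.g. [2, 4] in the example of Figure 6.2, so guest partition k (1-based)
    is backed by host partition host_partitions_for_vm[k - 1]."""
    def map_gpp_to_hpp(guest_partition, alloc_host_page_with_color):
        # guest_partition: guest cache-partition index of the GPP, as computed
        # by the guest OS's page coloring on the virtual LLC.
        host_partition = host_partitions_for_vm[guest_partition - 1]
        # The allocator is assumed to return a host physical page that falls
        # into the requested host cache partition.
        return alloc_host_page_with_color(host_partition)
    return map_gpp_to_hpp
```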

There are two constraints in vLLC. First, virtual LLC information should be in accordance with the assumption of page coloring, where the number of cache sets is a power of two. This means that, with vLLC, the number of cache partitions that can be assigned to a VM is restricted to a power of two. Second, it cannot support a guest OS where page coloring is hard-coded (e.g., using fixed cache parameters, instead of checking them when the system boots). If these constraints become a problem, one can disable the page coloring feature of the guest OS and use our vColoring technique.

6.1.3 vColoring for Coloring-unaware Guest OSs

With vColoring, a VM is assigned two sets of cache partitions, default and extra. The default set is used whenever a GPP needs to be mapped to an HPP. The hypervisor maps a GPP to an HPP corresponding to one of the cache partitions in the default set. Hence, by default, all tasks are constrained to use only the default cache partitions. The extra set is used for explicit cache allocation requests. When a task running in a VM makes such a request, the hypervisor re-maps all GPPs used by that task to HPPs corresponding to the requested cache partitions in the extra set.

Re-mapping GPPs to new HPPs: Figure 6.3 shows the detailed steps for re-mapping all the GPPs of a task from the currently-used HPPs to new HPPs for the requested cache partitions. The first step is to obtain the task’s page table base address (PTBA), which we will explain in detail later. Once the PTBA is obtained, the hypervisor can traverse the task’s page tables that are maintained by the guest OS. The second step is to find out present and user-level accessible GPPs in the task’s page tables. This can be done

by checking the information bits of page table entries (PTEs). The third step is to find the HPP mapped to each of the GPPs found in the second step. The fourth step is to migrate each HPP obtained in the third step to a new HPP that corresponds to one of the requested cache partitions. As part of page migration, references to the previous HPP are also updated to the new one. During all these steps, the guest page tables are not changed at all. Therefore, the task can be assigned its requested cache partitions transparently to the guest OS. Note that, since the above steps re-map the GPPs present at that time, it is desirable to make a cache allocation request at the end of the initialization phase, where a real-time task typically initializes and places all the required data into memory.

[Figure 6.3: Steps for re-mapping GPPs to new HPPs. Starting from the task's page table base address, the hypervisor walks the multi-level guest page tables, follows each GPP-to-HPP mapping, and migrates the host physical pages to new ones in the requested cache partitions.]
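Putting the four steps together, the re-mapping can be sketched as follows. The helpers walk_guest_page_tables(), gpa_to_hpa(), alloc_page_in_partitions(), and migrate_page() are hypothetical names for the hypervisor internals described in this section (guest page-table walking, GPA-to-HPA lookup, colored page allocation, and page migration with reference fix-up); the PTBA itself is obtained as described next.

```python
def vcoloring_remap(ptba, requested_partitions, walk_guest_page_tables,
                    gpa_to_hpa, alloc_page_in_partitions, migrate_page):
    """Re-map every present, user-accessible GPP of the requesting task to an
    HPP in one of the requested (extra) cache partitions."""
    # Step 1: ptba was obtained from the hypercall context (see below).
    # Step 2: collect present and user-level accessible guest physical pages.
    gpps = [pte.gpp for pte in walk_guest_page_tables(ptba)
            if pte.present and pte.user_accessible]
    for gpp in gpps:
        # Step 3: the host physical page currently backing this GPP.
        old_hpp = gpa_to_hpa(gpp)
        # Step 4: migrate it to a page in the requested cache partitions and
        # update all references; the guest page tables remain untouched.
        new_hpp = alloc_page_in_partitions(requested_partitions)
        migrate_page(old_hpp, new_hpp)
```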

PTBA identification: On most processors, the currently-executing task’s PTBA is stored in a specific register to facilitate address translation, e.g., a CR3 register in x86 architectures and a Translation Table Base register in ARM architectures. We will refer to such a register as a PTB (Page Table Base) register. Under shadow paging, the hypervisor traps on write accesses to the PTB register and stores the base address of the corresponding shadow page table into the PTB register. The real PTBA value trapped by the hypervisor is stored in the hypervisor’s memory space and used for synchronizing the shadow page table with the guest page table. Under two-dimensional paging, the MMU has two PTB registers, one for a guest PTBA and the other for a host PTBA, and the hypervisor has access to both registers. Therefore, under both address translation techniques, the current task’s PTBA can be obtained by the hypervisor.

109 Cache allocation request: To make a cache allocation request to the hypervisor, on x86 architectures, a task can use a “hypercall” instruction. It can be executed by any user-level task in a VM and results in a world switch to the hypervisor [27]. Then, the hypervisor can easily get the task’s PTBA because that task is the currently executing one, and the hypervisor can allocate requested cache partitions to the task by following the re-mapping steps explained before. On other architectures, a user-level task is not allowed to execute a hypercall. Hence, we propose the inclusion of a simple driver that provides a user-level task with an interface to issue a hypercall. Then, the task can make a cache allocation request through the driver interface. Since many recent real-time OSs such as VxWorks [161] support implementing device drivers as loadable kernel modules, this approach can be easily used for such OSs without rebuilding the entire kernel image.

6.2 Cache Management Scheme

In this section, we present our cache management scheme which (i) allocates cache partitions to tasks within a VM while satisfying timing constraints, (ii) designs a VM in a cache-aware manner so that the VM’s computational demand is specified w.r.t. the number of cache partitions allocated, and (iii) determines the allocation of cache partitions to a set of VMs to be consolidated.

Cache interference among tasks in a multi-core virtualization environment can be categorized into two types: inter-VCPU and intra-VCPU. Inter-VCPU cache interference happens among tasks running on different virtual CPUs (VCPUs). Since those VCPUs can be scheduled on different physical CPU cores (PCPUs) by the hypervisor, tasks on different VCPUs may access the LLC simultaneously. In addition, when a VCPU preempts another VCPU, the cache contents of tasks on the preempted VCPU may be evicted by tasks on the preempting VCPU. Intra-VCPU cache interference happens among tasks running on the same VCPU. Although tasks on the same VCPU cannot access the LLC simultaneously, a task preempting another task may evict the cache contents of the preempted task.

To avoid both inter- and intra-VCPU cache interference, a simple approach would be assigning each task a dedicated set of cache partitions for its own exclusive use. Hence, tasks do not share their assigned cache partitions with others, resulting in no conflicts in the LLC. We will refer to this approach as complete cache partitioning (CCP). However, due to the availability of a limited number of cache partitions, CCP may result in performance degradation. Many prior studies in non-virtualized environments [16, 30, 40, 128] have shown that sharing of cache partitions among tasks on the same core yields better task schedulability than CCP, and the resulting cache interference can be safely upper-bounded by the notion of cache-related preemption delay (CRPD). Therefore, our scheme builds on this idea in that (i) cache partitions are not shared among tasks on different VCPUs to prevent inter-VCPU cache interference, and (ii) cache partitions can be shared among tasks on the same core with the cost of intra-VCPU cache interference.

6.2.1 Schedulability Analysis

Before presenting our scheme, we first review VCPU and task schedulability analyses. The schedulability of a VCPU v_i can be determined by the following recurrence equation [142]:

W_i^{v,n+1} = C_i^v + \sum_{v_h \in P(v_i) \wedge \pi_h^v > \pi_i^v} \left\lceil \frac{W_i^{v,n} + J_h^v}{T_h^v} \right\rceil C_h^v    (6.1)

where W_i^{v,n} is the worst-case response time (WCRT) of a VCPU v_i at the n-th iteration (W_i^{v,0} = C_i^v), \pi_i^v is the priority of a VCPU v_i, P(v_i) is the PCPU of v_i, and J_h^v is a release jitter (J_h^v = T_h^v - C_h^v for the deferrable server policy and J_h^v = 0 for the periodic and sporadic server policies [132]). The iteration terminates when W_i^{v,n+1} = W_i^{v,n}, and the VCPU v_i is schedulable if its WCRT does not exceed its period, i.e., W_i^{v,n} \le T_i^v.
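To make the fixed-point computation of Eq. (6.1) concrete, here is a minimal Python sketch (not the thesis implementation); the hp_vcpus list and the jitter callback are placeholders supplied by the caller.

    import math

    def vcpu_wcrt(C_i, T_i, hp_vcpus, jitter):
        """Iterate Eq. (6.1) until a fixed point is reached.
        hp_vcpus: list of (C_h, T_h) for higher-priority VCPUs on the same PCPU.
        jitter(C_h, T_h): release jitter J_h (T_h - C_h for deferrable servers,
        0 for periodic/sporadic servers). Returns the WCRT, or None if it
        exceeds the VCPU period T_i (the VCPU is then unschedulable)."""
        W = C_i
        while W <= T_i:
            W_next = C_i + sum(math.ceil((W + jitter(C_h, T_h)) / T_h) * C_h
                               for (C_h, T_h) in hp_vcpus)
            if W_next == W:
                return W
            W = W_next
        return None

    # Illustrative numbers only: a 2 ms VCPU competing with one higher-priority
    # periodic-server VCPU of (3 ms, 10 ms) yields a WCRT of 5 ms.
    # vcpu_wcrt(2, 10, [(3, 10)], lambda C, T: 0)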

The schedulability of task τ_j running on a VCPU v_i can be determined by:

W_j^{n+1} = C_j + \sum_{\tau_h \in V(\tau_j) \wedge \pi_h > \pi_j} \left\lceil \frac{W_j^n + J_h}{T_h} \right\rceil (C_h + \gamma_{h,j}) + \left\lceil \frac{W_j^n + C_i^v}{T_i^v} \right\rceil (T_i^v - C_i^v)    (6.2)

where W_j^n is the WCRT of task τ_j at the n-th iteration (W_j^0 = C_j), π_j is the priority of τ_j, V(τ_j) is the VCPU of τ_j, J_h is the release jitter of a task τ_h (J_h = T_i^v - C_i^v), and γ_{h,j} is the cache-related preemption delay (CRPD) caused by τ_h and imposed on τ_j. Task τ_j is schedulable if its WCRT does not exceed its deadline, i.e., W_j^n ≤ D_j. Note that Eq. (6.2) is based on the task schedulability test under hierarchical scheduling given in [92], extended with the CRPD [16, 40] to bound intra-VCPU cache interference. For simplicity, we assume that the cache warm-up delay has been taken into account in the WCET of each task. The CRPD γ_{j,i} is given by:

\gamma_{j,i} = \Big| \bigcup_{\tau_k \in V(\tau_i) \wedge \pi_k < \pi_j \wedge \pi_k \ge \pi_i} S_j \cap S_k \Big| \cdot \Delta    (6.3)

where S_j is the set of cache partitions assigned to τ_j, and ∆ is the maximum time needed to reload the data in one cache partition.^1

In the presence of intra-VCPU cache interference, the utilization of a taskset Γ allocated to the same VCPU is calculated as follows [16, 141]:

util(\Gamma) = \sum_{\tau_i \in \Gamma} \Big( \frac{C_i}{T_i} + \frac{\gamma_{i,n}}{T_i} \Big)    (6.4)

where n is the index of the lowest-priority task in Γ.
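As a minimal illustration of how Eqs. (6.3) and (6.4) can be evaluated, the Python sketch below uses dictionaries keyed by task index (S for partition sets, prio for priorities, with larger values meaning higher priority); it is a bookkeeping aid under these assumptions, not the thesis implementation.

    def crpd(S, prio, Delta, j, i):
        """gamma_{j,i} per Eq. (6.3): partitions of tau_j that may be evicted by
        tasks whose priority lies between tau_i and tau_j on the same VCPU."""
        evictable = set()
        for k in S:
            if prio[i] <= prio[k] < prio[j]:
                evictable |= S[j] & S[k]
        return len(evictable) * Delta

    def taskset_util(C, T, S, prio, Delta):
        """util(Gamma) per Eq. (6.4): each task contributes its own utilization
        plus the CRPD it may impose on the lowest-priority task n."""
        n = min(S, key=lambda k: prio[k])          # lowest-priority task index
        return sum(C[k] / T[k] + crpd(S, prio, Delta, k, n) / T[k] for k in S)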

6.2.2 Allocating Cache Partitions to Tasks

Suppose that we have a set of tasks running on the same VCPU and a set of cache partitions to be allocated to those tasks. Our goal is to find a cache-to-task allocation that minimizes the taskset utilization while satisfying taskset schedulability. When cache sharing is allowed, the problem of cache-to-task allocation is known to be NP-hard [30]. Hence, we present in Alg. 10 a heuristic to solve this problem. It first checks that N_cache is non-zero, because page coloring requires each task to be assigned at least one cache partition [16].

^1 In case of a write-back cache, ∆ should take into account the effect of a dirty cache line, which requires two memory accesses to fetch a new cache line [140].

Algorithm 10 CacheToTaskAlloc(Γ, N_cache)
Input: Γ: taskset, N_cache: the number of cache partitions
Output: Utilization of Γ if schedulable, and ∞ otherwise
 1: if N_cache = 0 then
 2:     return ∞
 3: cache_idx ← 1
 4: for all τ_i ∈ Γ do
 5:     /* Find the number of cache partitions for τ_i */
 6:     |S_i| ← argmin_{1 ≤ k ≤ N_cache} ( C_i(k)/T_i + γ_{i,n}/T_i )
 7:     /* Find cache-partition indices for τ_i */
 8:     S_i ← ∅
 9:     for k ← 1 to |S_i| do
10:         S_i ← S_i ∪ {cache_idx}
11:         cache_idx ← (cache_idx + 1) mod N_cache
12: if schedulable(Γ) then
13:     return util(Γ)
14: else
15:     return ∞

Then, for each task τ_i, the heuristic finds the number of cache partitions, |S_i|, that minimizes the sum of the utilization of and the CRPD caused by τ_i (line 6). Since cache allocation has not been done yet, we approximate γ_{i,n} by assuming that all other tasks have been allocated all N_cache partitions. Once the number of cache partitions for τ_i is found, our heuristic selects the cache-partition indices to be allocated (line 9). It records the index of the next cache partition to be allocated in cache_idx and begins the allocation starting from cache_idx, with an increment of 1 and a modulo of N_cache. This approach ensures that the difference in the number of tasks sharing each partition does not exceed 1.
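A compact Python rendering of this heuristic is sketched below, purely for illustration; the callbacks schedulable and util stand in for the tests of Eqs. (6.2) and (6.4), and crpd_est approximates γ_{i,n} as described above.

    import math

    def cache_to_task_alloc(tasks, N_cache, C, T, crpd_est, schedulable, util):
        """Sketch of Alg. 10: pick a partition count per task, hand out indices
        round-robin, then check schedulability of the whole taskset."""
        if N_cache == 0:
            return math.inf
        alloc = {}
        cache_idx = 0
        for i in tasks:
            # number of partitions minimizing the task's utilization + CRPD share
            best_k = min(range(1, N_cache + 1),
                         key=lambda k: (C[i](k) + crpd_est(i)) / T[i])
            alloc[i] = set()
            for _ in range(best_k):
                alloc[i].add(cache_idx)                 # round-robin assignment
                cache_idx = (cache_idx + 1) % N_cache   # keeps sharing balanced
        return util(alloc) if schedulable(alloc) else math.inf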

6.2.3 Designing a Cache-Aware VM

The computational demand of a VM is the aggregate of the demands of all VCPUs in that VM, and it is affected by the allocation of tasks to VCPUs. In particular, when cache-sensitive tasks are allocated together to the same VCPU, the benefit of cache sharing increases, thereby reducing the computational demand. Hence, we propose a cache-aware VM design algorithm (CAVM) that (i) allocates tasks to VCPUs so as to increase the benefit of cache sharing, and (ii) derives each VCPU's computational demand w.r.t. the number of cache partitions allocated to its taskset. Our algorithm can be used for designing a new VM as well as for calculating the computational demand of an existing VM.

Algorithm 11 CacheAwareVM(Γ, N_vcpu, N_cache, T^v)
Input: Γ: taskset, N_vcpu: the number of VCPUs, N_cache: the number of cache partitions, T^v: VCPU period
Output: Success or Fail
 1: V ← {v_1, v_2, ..., v_{N_vcpu}}
 2: ∀v_i ∈ V: T_i^v ← T^v,  C_i^v(1, ..., N_cache) ← T^v,  S_i^v ← 0
 3: N_rem ← N_cache  /* Remaining cache partitions */
 4: /* Phase 1: Allocate task bundles to VCPUs */
 5: ϕ ← Γ;  Φ ← ∅
 6: while util(ϕ) > 1 do
 7:     (ϕ′, ϕ″) ← BreakBundle(ϕ, 1, N_cache)
 8:     Φ ← Φ ∪ {ϕ′};  ϕ ← ϕ″
 9: Φ ← Φ ∪ {ϕ}
10: while Φ ≠ ∅ do
11:     /* Allocate bundles */
12:     Φ_rest ← ∅
13:     for all ϕ_i ∈ Φ in decreasing order of average utilization do
14:         (v_BF, k) ← BestFitWithCache(ϕ_i, V, N_rem)
15:         if v_BF ≠ invalid then
16:             Γ_BF ← Γ_BF ∪ ϕ_i;  S_BF^v ← S_BF^v + k;  N_rem ← N_rem − k
17:         else
18:             Φ_rest ← Φ_rest ∪ {ϕ_i}
19:     /* Break unallocated bundles */
20:     Φ ← ∅;  singletons ← true
21:     for all ϕ_i ∈ Φ_rest do
22:         if |ϕ_i| > 1 then
23:             singletons ← false;  size ← 1 − min_{v_j ∈ V} util(Γ_j)
24:             (ϕ′, ϕ″) ← BreakBundle(ϕ_i, size, N_cache)
25:             Φ ← Φ ∪ {ϕ′, ϕ″}
26:         else
27:             Φ ← Φ ∪ {ϕ_i}
28:     if singletons = true then
29:         return Fail
30: /* Phase 2: Determine VCPU budget */
31: for all v_i ∈ V do
32:     C_i^v(0) ← invalid
33:     for k ← 1 to N_cache do
34:         if CacheToTaskAlloc(Γ_i, k) ≤ 1 then
35:             S_i^v ← k;  C_i^v(k) ← budget x found by binary search
36:         else
37:             C_i^v(k) ← invalid
38:         if C_i^v(k−1) ≠ invalid ∧ (C_i^v(k−1) < C_i^v(k) ∨ C_i^v(k) = invalid) then
39:             C_i^v(k) ← C_i^v(k−1)
40: return Success

Alg. 11 presents the pseudo-code of CAVM. It takes four input parameters: Γ is a taskset to be allocated, N_vcpu is the number of VCPUs in the VM, N_cache is the number of available cache partitions, and T^v is the VCPU period that will be assigned to all VCPUs in the VM.^2 CAVM initializes the budget of each VCPU v_i to be full, i.e., C_i^v = T^v, and the number of cache partitions for v_i (S_i^v) to zero (line 2).

CAVM consists of two phases. The first phase allocates tasks to VCPUs.

^2 There are many ways to choose T^v. For example, system designers may use a hyperperiod to improve VCPU schedulability, or utilize the findings in [94] to reduce the overhead of hierarchical scheduling.

Algorithm 12 BestFitWithCache(ϕ, V, N_rem)
Input: ϕ: a bundle of tasks to be allocated, V: a set of VCPUs, N_rem: the number of remaining cache partitions
Output: (v_i, k): a tuple of the best-fit VCPU and the number of additional cache partitions needed
 1: for k ← 0 to N_rem do
 2:     for all v_i ∈ V in decreasing order of util(Γ_i) do
 3:         if CacheToTaskAlloc(Γ_i ∪ ϕ, S_i^v + k) ≤ 1 then
 4:             return (v_i, k)
 5: return (invalid, −1)

Our allocation strategy is to group cache-sensitive tasks into a “bundle” and to allocate as many tasks of the bundle as possible onto the same VCPU. To do so, CAVM first groups all tasks in Γ into a single bundle ϕ. Then, it checks the utilization of ϕ, assuming each task in ϕ uses one dedicated cache partition (line 6). If this utilization is greater than 1, ϕ is broken into two sub-bundles by BreakBundle() such that the size of the first sub-bundle does not exceed 1. The pseudo-code of BreakBundle() is given in Alg. 13. To keep as many cache-sensitive tasks as possible in the first sub-bundle, BreakBundle() removes tasks from the first sub-bundle in increasing order of cache sensitivity, calculated as (C_i(1) − C_i(N_cache))/T_i, until the size of the first sub-bundle no longer exceeds the given size constraint. When BreakBundle() returns, CAVM puts the first sub-bundle into Φ, the set of bundles to be allocated (line 8 of Alg. 11), and continues to check whether the second sub-bundle needs to be broken further. As a result, each bundle in Φ has a utilization not exceeding 1 and is ready to be allocated.

CAVM allocates bundles in Φ to VCPUs based on the best-fit decreasing (BFD) heuristic (lines 13 to 18). Here, we define the average utilization of a bundle ϕ_i as

\sum_{\tau_j \in \varphi_i} \sum_{k=1}^{N_{cache}} \frac{C_j(k)/T_j}{N_{cache}}

Bundles are sorted in descending order of average utilization, and CAVM tries to allocate each bundle to a VCPU by using BestFitWithCache(), given in Alg. 12. This function finds the best-fit VCPU that can schedule a given bundle with k additional cache partitions assigned to it, where k ranges from 0 to the number of remaining cache partitions (N_rem). If a best-fit VCPU is found (line 15 of Alg. 11), the bundle is allocated to that VCPU, and the number of cache partitions of that VCPU (S_BF^v) and the number of remaining cache partitions are updated. Otherwise, the bundle is put into Φ_rest (line 18).

Algorithm 13 BreakBundle(ϕ, size, N_cache)
Input: ϕ: a bundle to be broken, size: the size constraint for the first sub-bundle, N_cache: the number of cache partitions
Output: (ϕ′, ϕ″): a tuple of sub-bundles
 1: ϕ′ ← ϕ;  ϕ″ ← ∅
 2: for all τ_i ∈ ϕ in increasing order of cache sensitivity do
 3:     ϕ′ ← ϕ′ \ τ_i;  ϕ″ ← ϕ″ ∪ τ_i
 4:     /* Get util(ϕ′) assuming each task uses one partition */
 5:     if util(ϕ′) ≤ size then
 6:         break
 7: return (ϕ′, ϕ″)
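The following Python sketch mirrors BreakBundle(); cache_sensitivity(t) is assumed to return (C_t(1) − C_t(N_cache))/T_t and util_one_partition() the bundle utilization assuming one partition per task, both computed elsewhere.

    def break_bundle(bundle, size, cache_sensitivity, util_one_partition):
        """Sketch of Alg. 13: move the least cache-sensitive tasks out of the
        first sub-bundle until its utilization fits within 'size'."""
        first, second = list(bundle), []
        for task in sorted(bundle, key=cache_sensitivity):
            first.remove(task)
            second.append(task)
            if util_one_partition(first) <= size:
                break
        return first, second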

Then, CAVM attempts to break all unallocated bundles in Φ_rest. If a bundle in Φ_rest has more than one task (line 22), it is broken into two sub-bundles by BreakBundle() such that the size of the first sub-bundle does not exceed the remaining capacity of a VCPU having the minimum taskset utilization. The resulting two sub-bundles are put into Φ so that they can be allocated in the next iteration. If all unallocated bundles are singletons (line 28 of Alg. 11), CAVM returns fail because none of these bundles can be broken into sub-bundles.

After finishing the first phase of task allocation, each VCPU v_i is allocated its own taskset Γ_i. The second phase of CAVM determines the budget C_i^v(k) of a VCPU v_i for all possible k values (1 ≤ k ≤ N_cache). If Γ_i with k cache partitions is schedulable (line 34), CAVM finds the minimum possible budget x of v_i by using a binary search between 0 and T^v, and sets C_i^v(k) to x. Otherwise, C_i^v(k) is marked as invalid. Here, it may happen that, due to CRPD, C_i^v(k − 1) is smaller than C_i^v(k), or is valid while C_i^v(k) is invalid. In such cases (line 38), CAVM sets C_i^v(k) to C_i^v(k − 1) and lets v_i use only k − 1 cache partitions when k partitions are given. With this, CAVM finds C_i^v(k) values that are monotonically non-increasing in k.
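The minimum-budget search in Phase 2 can be sketched as a standard binary search; schedulable_with_budget() is a placeholder that runs the task-level test of Eq. (6.2) for a candidate budget, and the numeric resolution is an assumption for illustration.

    def min_vcpu_budget(T_v, schedulable_with_budget, resolution=1e-4):
        """Binary search (Phase 2 of CAVM) for the minimum VCPU budget in
        (0, T_v] that keeps the VCPU's taskset schedulable."""
        lo, hi = 0.0, T_v
        if not schedulable_with_budget(hi):
            return None                      # invalid even with a full budget
        while hi - lo > resolution:
            mid = (lo + hi) / 2
            if schedulable_with_budget(mid):
                hi = mid                     # feasible: try a smaller budget
            else:
                lo = mid                     # infeasible: need more budget
        return hi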

6.2.4 Allocating Host Cache Partitions to VMs

We now present our cache-to-VM allocation algorithm that determines the number of cache partitions for each VCPU of the VMs to be consolidated, while minimizing the total utilization of those VMs. Once cache partitions are allocated, conventional bin-packing heuristics such as BFD can be used to allocate the VCPUs of those VMs to PCPUs.

Let ρ_{i,k} denote the number of cache partitions assigned to v_i when a total of k partitions is provided by the host machine, and let V denote the set of VCPUs of all VMs to be consolidated. Then, the total utilization of the VMs with k cache partitions is given by:

\sum_{v_i \in V} \frac{C_i^v(\rho_{i,k})}{T_i^v}    (6.5)

To find U(k), the minimum total utilization of the VMs with k cache partitions, we use a dynamic programming approach. Let x_i denote the smallest number of cache partitions that gives a valid budget for v_i, i.e., C_i^v(x_i) ≠ invalid and C_i^v(x_i − 1) = invalid, and let z denote the minimum number of cache partitions needed to schedule all VCPUs in V. Then, z = \sum_{v_i \in V} x_i, and ρ_{i,z} is equal to x_i because there is only one valid cache allocation to v_i when z cache partitions are provided. For k < z, we set U(k) = ∞ because there is no valid allocation. For k = z, U(k) can be computed by Eq. (6.5) because ρ_{i,k} = x_i. For k = z + 1, U(k) cannot be computed directly by Eq. (6.5) because ρ_{i,k} is unknown. Instead, we can compute U(k) from U(z). Recall that our CAVM algorithm given in Section 6.2.3 ensures that C_i^v(k) is monotonically non-increasing in k. Hence, if any additional cache partition is assigned to v_i, a non-negative utilization gain is obtainable. Based on this observation, we can compute U(z + 1) as U(z) − max_{v_i \in V} (C_i^v(ρ_{i,z}) − C_i^v(ρ_{i,z} + 1))/T_i^v, which subtracts from U(z) the maximum utilization gain made by one additional cache partition. We can also find ρ_{i,z+1} by recording the number of cache partitions of v_i that leads to U(z + 1). For k = z + 2, U(k) is the minimum of U(z) − max_{v_i \in V} (C_i^v(ρ_{i,z}) − C_i^v(ρ_{i,z} + 2))/T_i^v, which subtracts the maximum gain from two additional partitions from U(z), and U(z + 1) − max_{v_i \in V} (C_i^v(ρ_{i,z+1}) − C_i^v(ρ_{i,z+1} + 1))/T_i^v, which subtracts the maximum gain from one additional cache partition from U(z + 1).

Algorithm 14 CacheToVMAlloc(V, N_cache)
Input: V: a set of VCPUs of all VMs to be consolidated, N_cache: the number of available cache partitions
Output: Success or Fail
 1: Find x_i for each VCPU v_i ∈ V
 2: z ← \sum_{v_i ∈ V} x_i
 3: if N_cache < z then
 4:     return Fail
 5: ∀v_i ∈ V: ρ_{i,z} ← x_i
 6: U(z) ← \sum_{v_i ∈ V} C_i^v(ρ_{i,z})/T_i^v   /* U(z): total utilization */
 7: for k ← z + 1 to N_cache do
 8:     U(k) ← min_{z ≤ k′ < k} ( U(k′) − max_{v_i ∈ V} (C_i^v(ρ_{i,k′}) − C_i^v(ρ_{i,k′} + (k − k′)))/T_i^v )
 9:     record the ρ_{i,k} values that lead to U(k)
10: ∀v_i ∈ V: allocate ρ_{i,N_cache} cache partitions to v_i
11: return Success

This process can be repeated for any k > z, and U(k) is given by the following recurrence:

U(k) = \begin{cases} \infty \text{ (unschedulable)} & : k < z \\ \sum_{v_i \in V} \dfrac{C_i^v(\rho_{i,k})}{T_i^v} & : k = z \\ \min_{z \le k' < k} \Big( U(k') - \max_{v_i \in V} \dfrac{C_i^v(\rho_{i,k'}) - C_i^v(\rho_{i,k'} + (k - k'))}{T_i^v} \Big) & : k > z \end{cases}    (6.6)

Alg. 14 shows our cache-to-VM allocation algorithm based on the recurrence in Eq. (6.6). Our algorithm first finds z, and if the given number of cache partitions (N_cache) is smaller than z, it returns fail (line 4). Otherwise, it computes U(k) iteratively (line 8) and saves the ρ_{i,k} that leads to U(k) (line 9). Once the iteration completes, our algorithm sets the number of cache partitions for each VCPU to ρ_{i,N_cache} and returns success. The time complexity of our algorithm is O(N_cache^2 · |V|).
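A compact Python sketch of this dynamic program is given below, assuming the C_i^v(k) table produced by CAVM is passed in as C_v (with math.inf marking invalid budgets); it is an illustration of the recurrence, not the actual implementation.

    import math

    def cache_to_vm_alloc(vcpus, N_cache, C_v, T_v):
        """Sketch of Alg. 14 / Eq. (6.6). C_v[i][k]: budget of VCPU i with k
        partitions (math.inf if invalid); T_v[i]: its period.
        Returns (U(N_cache), rho) or None if infeasible."""
        x = {i: min(k for k in range(1, N_cache + 1) if C_v[i][k] < math.inf)
             for i in vcpus}                       # smallest valid partition count
        z = sum(x.values())
        if N_cache < z:
            return None
        rho = {z: dict(x)}                                     # rho[k][i]
        U = {z: sum(C_v[i][x[i]] / T_v[i] for i in vcpus)}     # Eq. (6.5) at k = z
        for k in range(z + 1, N_cache + 1):
            best = None
            for kp in range(z, k):                 # build U(k) from every U(k')
                # VCPU gaining the most utilization from (k - k') extra partitions
                i_star = max(vcpus, key=lambda i:
                             (C_v[i][rho[kp][i]] - C_v[i][rho[kp][i] + (k - kp)]) / T_v[i])
                gain = (C_v[i_star][rho[kp][i_star]]
                        - C_v[i_star][rho[kp][i_star] + (k - kp)]) / T_v[i_star]
                if best is None or U[kp] - gain < best[0]:
                    best = (U[kp] - gain, kp, i_star)
            U[k] = best[0]
            rho[k] = dict(rho[best[1]])
            rho[k][best[2]] += k - best[1]         # record the allocation for U(k)
        return U[N_cache], rho[N_cache]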

6.3 Evaluation

This section presents our experimental results for vLLC, vColoring, and our cache management scheme.

Table 6.1: Implementation cost of vLLC and vColoring

Name       | Item                                        | Cost on x86 (nsec) | Cost on ARM (nsec)
vLLC       | Virtual LLC emulation                       | 787                | 12212
vLLC       | Cache partition check in GPP-to-HPP mapping | 34                 | 921
vColoring  | Page migration for GPP re-mapping           | 2359               | 31864

6.3.1 vLLC and vColoring

Experimental Setup: We have implemented vLLC and vColoring on the KVM hypervisor included in the Linux 3.10.39 kernel. We chose KVM for its convenience: it supports various architectures and provides both shadow paging and two-dimensional paging. However, it is worth noting that our techniques, vLLC and vColoring, can also be implemented in other hypervisors. In our experiments, we use two-dimensional paging because it is the default address translation technique of KVM and shadow paging is not yet supported by KVM for ARM.

We use x86 and ARM platforms as host machines for our experiments. The x86 platform is equipped with an Intel i7-2600 3.4GHz quad-core processor and 16GB of DDR3 1666MHz memory. The Intel processor has a unified 8MB shared LLC that consists of four 2MB cache slices, providing 32 cache partitions. We disabled hardware prefetcher, simultaneous multithreading, and dynamic clock frequency scaling to reduce measurement inaccuracies.

The ARM platform used is an ODROID-XU4 board. It has 2GB of LPDDR3 933MHz memory and a Samsung Exynos 5422 SoC that combines a cluster of four ARM Cortex-A15 cores with a cluster of four Cortex-A7 cores. However, we only use the cluster of Cortex-A15 cores because the performance of the other cluster seems inadequate for our experiments.

The LLC shared among four Cortex-A15 cores is 2MB, providing 32 cache partitions. We disabled dynamic clock frequency scaling and configured each core to run at its maximum speed, 2GHz.

Since our focus is on cache interference imposed on tasks in a VM, each platform hosts one VM that has four VCPUs (VCPUs 1-4). Each VCPU is allocated to a different PCPU with 100% budget. Hence, there is only one VCPU per PCPU on both the x86 and ARM platforms. The VM is assigned all 32 cache partitions of the host machine. On the host side, VCPU threads are assigned real-time priorities, which prevents unexpected delays from indispensable system services that could not be disabled.

Figure 6.4: Execution times of the latency task on the (a) x86 and (b) ARM platforms (normalized execution time vs. number of cache partitions)

Three different guest OSs are used in our experiments: Linux/RK and the vanilla Linux kernel 3.10.39 for x86 and ARM, and MS Windows Embedded 8.1 Industry for x86. Linux/RK is used as a guest OS to evaluate vLLC because it supports page coloring. The vanilla Linux and MS Windows Embedded OSs are used to evaluate vColoring because neither of them supports page coloring. Specifically, MS Windows Embedded is chosen to verify that vColoring can be used for proprietary, closed-source guest OSs.

Implementation Overhead: Table 6.1 shows the computational overhead of vLLC and vColoring, measured with hardware performance counters on the x86 and ARM platforms. vLLC performs the virtual LLC emulation when a guest OS reads the VM’s LLC information, which is typically done during the system initialization phase. The GPP-to-HPP mapping occurs only once per GPP, as described in Section 6.1.1, and the overhead added by the cache partition check of vLLC in the GPP-to-HPP mapping is less than 5% of the original mapping time on both platforms. Hence, we consider that the overhead of vLLC is acceptably small. vColoring re-maps GPPs when cache partitions are assigned to a task. Since the major overhead of this re-mapping is caused by page migration, we present per-page migration time in Table 6.1.

Results with a Synthetic Task: As the first step of our experiments, we check whether vLLC and vColoring can correctly assign cache partitions to a task running in a VM. We use the latency task [56], which traverses a randomly-ordered linked list. The execution time of the latency task highly depends on the memory access time, due to the data dependency of pointer-chasing operations in linked-list traversals. To make the latency task cache-sensitive, we configured its working set size to be half of the LLC of each platform, i.e., 4MB on x86 and 1MB on ARM. We compiled this task for both the Linux and MS Windows guests on x86.

Figure 6.5: Execution times of the PARSEC benchmarks when synthetic tasks run on different VCPUs in parallel

Figure 6.6: Response times of the PARSEC benchmarks when synthetic tasks are scheduled on the same VCPU

Figure 6.4 compares the maximum observed execution times of the latency task when it runs alone in each VM with different numbers of cache partitions assigned to it. The x-axis of each graph denotes the number of cache partitions assigned to the task. The y-axis shows the execution time normalized to the case where the task runs with one cache partition. On both the x86 and ARM platforms, the execution time of the task begins to plateau after more than 16 cache partitions are assigned to it, because at that point the entire working set of the task fits into the LLC. On each platform, a very similar execution-time pattern is observed although different guest OSs are used. This shows that both vLLC and vColoring work as expected.

Results with PARSEC Benchmarks: We use the PARSEC benchmarks [146], which are closer to the memory access patterns of real applications than the synthetic latency task. A total of eleven PARSEC benchmarks are used. We have excluded two PARSEC benchmarks, dedup and facesim, due to their excessive disk accesses for data files. Since we have shown in the previous subsection that vLLC and vColoring are equivalent in preventing cache interference on the x86 and ARM platforms, we use only vLLC on x86 for simplicity.

We first identify the impact of inter-VCPU cache interference on the PARSEC benchmarks. Each benchmark is assigned to VCPU 1, and three instances of the latency task are assigned to the other VCPUs to generate interfering cache requests. When vLLC is not used, the benchmark and the three instances share all 32 cache partitions. When vLLC is used, our objective is to protect the cache behavior of the benchmark from the three instances of latency. Hence, with vLLC, each benchmark is assigned 31 private cache partitions and the three instances share the remaining 1 partition.

Figure 6.5 compares the execution time of each PARSEC benchmark with and without vLLC. The x-axis denotes the benchmark names, and the y-axis shows the execution time of each benchmark normalized to the case when it runs alone in the VM with 32 cache partitions. When vLLC is not used (Baseline), the execution time increases by up to 30%. When vLLC is used, only streamcluster has an execution time increase, of 2%, and the other benchmarks show no noticeable difference in their execution times. The increase for streamcluster occurs because it is assigned fewer cache partitions when vLLC is used than when vLLC is not used.

Next, we explore the impact of intra-VCPU cache interference on the PARSEC benchmarks. Each benchmark and the three instances of latency are assigned to the same VCPU, and the SCHED_RR policy with a time quantum of 10 msec is used to time-share that VCPU. When vLLC is used, the benchmark is assigned 31 private cache partitions and the three instances share the 1 remaining cache partition, as in the inter-VCPU interference experiment.

Figure 6.6 shows the response time of each benchmark when the three instances of latency are scheduled on the same VCPU. The response time of a benchmark is normalized to the case when it is scheduled on the same VCPU with three instances of a busyloop task. busyloop runs an empty infinite while loop, thereby causing no cache interference. When vLLC is not used, the response time increases by up to 15%. When vLLC is used, all the benchmarks except streamcluster show no noticeable difference in their response times. The increase for streamcluster is again because fewer cache partitions are assigned to the benchmark when vLLC is used. To summarize, the results with the PARSEC benchmarks show that both inter- and intra-VCPU cache interference can significantly degrade task performance, and our techniques are effective in allocating cache partitions to tasks running in a VM.

6.3.2 Cache Management Scheme

In this subsection, we evaluate our real-time cache management scheme for multi-core virtualization. To do this, we use randomly-generated tasksets and capture the total utilization of VMs as the metric.

Experimental Setup: We generated 10,000 tasksets with the parameters in Table 6.2. The cache hit/miss delays and the cache partition reload time (∆) were obtained by measurement on our ARM platform. To generate a WCET function (C_i(k)) for each task τ_i, we use the method described in [30]. This method first calculates a cache miss rate for a given cache size, neighborhood size, locality, and task memory usage, by using the analytical cache behavior model proposed in [162]. It then generates an execution time from the calculated cache miss rate, the timing delay of a cache miss, and the number of memory accesses.

Table 6.2: Parameters for taskset generation

Type    | Parameter                          | Value
System  | Number of PCPUs                    | 4
        | Number of VMs                      | 2
        | Number of VCPUs per VM             | 4
        | VCPU replenishment period          | 10 msec
        | Cache (LLC) size                   | 2048 KB
        | # of cache partitions (N_cache)    | 32
        | Cache hit delay                    | 26 nsec
        | Cache miss delay                   | 202 nsec
        | Cache partition reload time (∆)    | 207 µsec
Taskset | Total number of tasks              | [10, 15]
        | Taskset utilization (U_taskset)    | 3.0
WCET    | Memory accesses per job            | [100000, 1000000]
        | Neighborhood size                  | [16, 64]
        | Locality                           | [1.5, 3.0]
        | Task memory usage                  | [8, 40] MB
        | *Resulting working-set size        | [64 KB, 40 MB]
        | *Resulting WCET                    | [8.47, 202.02] msec

With this method, we were able to generate WCETs with different cache sensitivities, as shown in Figure 6.7. Then, the total taskset utilization (U_taskset) is split into n random-sized pieces, where n is the total number of tasks. The size of each piece represents the utilization of the corresponding task when one cache partition is assigned to it. The period of a task τ_i is calculated by dividing C_i(1) by its utilization. Once a taskset is generated, its tasks are randomly distributed to two VMs, each of which has four VCPUs. Within each VM, the priorities of tasks are assigned by the Rate-Monotonic Scheduling (RMS) policy [127]. The priorities of VCPUs are assigned arbitrarily since they all use the same period. The sporadic server policy is used for VCPU budget replenishment.

Figure 6.7: Some of the WCETs generated for our experiments (normalized WCET vs. number of cache partitions)

Results: For comparison with our scheme, we consider variants of the best-fit decreasing (BFD), worst-fit decreasing (WFD), and first-fit decreasing (FFD) heuristics. Each heuristic is used for task-to-VCPU allocation within a VM and is combined with one of two cache-to-task allocation policies: complete cache partitioning (CCP) and complete cache sharing (CCS). CCP allocates private cache partitions to tasks in proportion to their working-set sizes. CCS, on the other hand, lets tasks on the same VCPU share all of their cache partitions. Hence, we compare our scheme against a total of six approaches: BFD+CCP, WFD+CCP, FFD+CCP, BFD+CCS, WFD+CCS, and FFD+CCS. For each approach, k cache partitions, where 1 ≤ k ≤ N_cache, are evenly distributed to all VCPUs of the two VMs such that the difference in the number of cache partitions of each VCPU does not exceed 1. Tasks are sorted in decreasing order of utilization w.r.t. the number of cache partitions per VCPU. Once task-to-VCPU allocation is done, we determine the budget of each VCPU by the binary-search approach used in Phase 2 of our CAVM algorithm (Alg. 11). Finally, we obtain the total utilization of the VMs by summing up the utilization of all VCPUs.

Figure 6.8: VM utilization w.r.t. the number of cache partitions

Figure 6.9: VM utilization (∆ = 10 msec)

Figure 6.8 shows the total VM utilization as the number of cache partitions increases. Since CCP cannot find a schedulable allocation if the number of partitions is smaller than the number of tasks, we compare only the cases where the number of cache partitions is greater than 15. Our scheme outperforms all other approaches, yielding 1.18× to 1.54× lower utilization. This is because our scheme allocates cache-sensitive tasks together to the same VCPU to increase the benefit of cache sharing and finds the minimum total VM utilization for a given number of cache partitions. The heuristics with CCS perform better than the ones with CCP. This is because the ∆ obtained from our ARM platform is relatively small, so the reduction in task execution time from cache sharing is larger than the resulting CRPD in our experiments.

Figure 6.9 shows the total VM utilization when ∆ is 10 msec. This experiment evaluates our scheme when the CRPD is extremely high. Overall, the benefit of using more cache partitions is smaller compared to the previous case. Our scheme outperforms the other approaches because it can balance the utilization gain against the CRPD from cache sharing. The heuristics with CCS perform worse than the ones with CCP due to the high CRPD. WFD+CCS is affected less by the high CRPD than BFD+CCS and FFD+CCS, because WFD results in fewer tasks per VCPU. Based on these results, we conclude that our scheme allocates cache partitions efficiently in a virtualization environment and yields a significant utilization benefit.

6.4 Summary

In this chapter, we presented our proposed predictable cache management framework for multi-core virtualization. Our framework provides vLLC and vColoring, hypervisor-level techniques that enable the cache allocation of individual tasks running in a VM. They do not require any hardware features beyond those available on today’s processors. We have implemented vLLC and vColoring on the KVM hypervisor running on x86 and ARM platforms. Experimental results with three different guest OSs show that both vLLC and vColoring can effectively control the cache allocation of tasks in a VM. vColoring can also be used for DRAM bank partitioning in a virtualized environment, because software-based bank partitioning uses the same approach as page coloring.

Our framework also supports a cache management scheme that determines cache-to-task allocation, designs a VM in the presence of cache interference, and minimizes the total utilization of the VMs to be consolidated onto the host machine. Experimental results with randomly-generated tasksets show that our scheme requires up to 1.54× lower CPU utilization to satisfy timing constraints compared with conventional approaches.

Future work involves addressing temporal interference from main memory in a virtualization environment.

Chapter 7

Synchronization for Multi-Core Virtual Machines

Real-time hierarchical scheduling theory [41, 91, 92, 94, 96] and its implementations [20, 102, 130] have established a good foundation for ensuring timing predictability in a virtualized environment. However, the current state of the art still lacks properties required for the sharing of mutually-exclusive resources in virtualization. Specifically, multi-core synchronization mechanisms designed for non-hierarchical scheduling, such as MPCP [12, 13] and MSRP [87], can lead to excessive blocking times due to the preemption and budget depletion of VCPUs, as discussed in Section 2.3.1. Available solutions in the uni-core hierarchical scheduling context [97, 98, 99] have not yet been extended to multi-core platforms. More importantly, in current virtualization solutions, the hypervisor is unaware of the execution of critical sections of tasks within VCPUs, and there is no systematic mechanism to expose them.

In this chapter, we develop a virtualization-aware multi-core priority ceiling protocol (vMPCP) and a framework around it to address the synchronization issue in a virtualized environment. vMPCP extends the well-known multiprocessor priority ceiling protocol (MPCP) to the multi-core two-level hierarchical scheduling context. vMPCP enables the sharing of resources in bounded time, within and across VCPUs that may be assigned to different PCPUs. To do so, it uses a para-virtualization approach to expose the execution of critical sections in VCPUs to the hypervisor. Each guest VM can maintain its own priority-numbering scheme, and task priorities do not need to be compared across VMs.

For the VCPU budget supply and replenishment policy, vMPCP supports both the periodic server [4] and deferrable server [5] policies. In addition, vMPCP provides an option for VCPUs to overrun their budgets while their tasks are executing critical sections. The effect of the overrun is analyzed and evaluated in detail.

The detailed contributions of our framework are as follows. First, we propose a new synchronization protocol, vMPCP, for multi-core virtualization. We characterize timing penalties caused by critical sections in a virtualized environment and develop a protocol to address such penalties. Second, we analyze the impact of different VCPU budget supply policies, namely periodic and deferrable servers, on synchronization in a multi-core virtualization environment. We also analyze each of the policies with and without VCPU budget overrun. Third, from our analysis and experimental results, we found that the periodic server policy, which has been considered to dominate the deferrable server policy in the literature, does not dominate the deferrable server policy when overrun is used. We also found that the use of overrun does not always yield better results, especially for tasks with relatively long critical sections. Fourth, we have implemented the prototype of vMPCP on the KVM hypervisor running on a multi-core platform. Using this implementation, we identify the effect of vMPCP on a real system by comparing it against a virtualization- unaware synchronization protocol (MPCP).

Since our focus in this chapter is on mutually-exclusive shared resources, we will call them simply “shared resources”. As described in Section 2.3, there are two types of shared resources, global and local, and the critical sections corresponding to those resources are referred to as global critical sections (gcs’s) and local critical sections (lcs’s), respectively.

Figure 7.1: Two-level priority queue of a global mutex

Each shared resource has a unique index, and the function R(τ_i, j) returns the index of the resource used by the j-th critical section of task τ_i. The function type(τ_i, j) returns gcs or lcs, which is the type of the j-th critical section of τ_i. In addition, we use σ_i^gcs and σ_i^lcs to denote the number of global and local critical section segments of τ_i, respectively. Hence, the total number of critical section segments of τ_i is σ_i = σ_i^gcs + σ_i^lcs. For brevity, we will also use the following notation in this chapter:

• V(τi): the VCPU where a task τi is allocated

• P(vi): the PCPU where a VCPU vi is allocated

The background and related prior work on synchronization were presented in Section 2.3.

The system model including assumptions and notation for tasks, critical sections and virtual machines can be found in Chapter 3.

The rest of this chapter is organized as follows. Section 7.1 presents the vMPCP framework. Section 7.2 provides the analysis on VCPU and task schedulability under vMPCP. A detailed evaluation is provided in Section 7.3. Section 7.4 summarizes this chapter.

7.1 vMPCP Framework

In this section, we present the virtualization-aware multiprocessor priority ceiling protocol (vMPCP). We first define vMPCP and explain the optional VCPU budget overrun mechanism for the periodic server and deferrable server replenishment policies under vMPCP. Then, we provide details on the software design needed to implement vMPCP in the hypervisor.

7.1.1 Protocol Description

vMPCP is specifically designed to reduce and bound remote blocking times for accessing global shared resources in a multi-core virtualization environment. To do so, vMPCP uses hierarchical priority ceilings for global critical sections. This approach suppresses both task-level and VCPU-level preemptions while a global resource is being accessed, thereby reducing the remote blocking times of other tasks waiting on that resource. The global and local resource access rules under vMPCP are defined as follows.

Global shared resources: vMPCP is based on the multiprocessor priority ceiling protocol (MPCP) [12, 13], and extends it to the hierarchical scheduling context.

1. Under vMPCP, each mutex protecting a global resource uses a two-level priority queue for its waiting list. Figure 7.1 shows the logical structure of this two-level priority queue, where the first level is ordered by VCPU priorities and the second level is ordered by task priorities. The key for queue insertion is a pair of VCPU priority and task priority, i.e., (j, i) is the key for a task τ_i in a VCPU v_j. The queue has a dequeue function, which returns the highest-priority task of the highest-priority VCPU and removes it from the queue. (A sketch of this queue is given after these rules.)

2. When a task τ_i requests access to a global resource R_k, the resource R_k can be granted to τ_i if it is not held by another task.

3. While a task τ_i in a VCPU v_j is holding a resource for its global critical section (gcs), the priority of τ_i is raised to π_{B,v_j} + π_i, where π_{B,v_j} is a base task-priority level greater than that of any task in the VCPU v_j, and π_i is the normal priority of τ_i. We refer to π_{B,v_j} + π_i as the task-level priority ceiling of the gcs of τ_i.

4. While a task τ_i executes a gcs, the priority of its VCPU v_j is raised to π_B^v + π_j^v, where π_B^v is a base VCPU-priority level greater than that of any other VCPU in the system, and π_j^v is the normal priority of the VCPU v_j. We refer to π_B^v + π_j^v as the VCPU-level priority ceiling of the gcs of τ_i.

5. When a task τ_i requests access to a resource R_k, the resource R_k cannot be granted to τ_i if it is already held by another task. In this case, the task τ_i is inserted into the waiting list (two-level priority queue) of the mutex for R_k.

6. When a global resource R_k is released and the waiting list of the mutex for R_k is not empty, the task dequeued from the head of the queue is granted the resource R_k.
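As an illustration of the waiting-list structure in rule 1, here is a minimal Python sketch of a two-level priority queue; it assumes larger numbers denote higher priorities and is not taken from the actual implementation.

    import heapq

    class TwoLevelMutexQueue:
        """Sketch of a global mutex's waiting list (Figure 7.1): ordered first by
        VCPU priority, then by task priority within the VCPU."""
        def __init__(self):
            self._heap = []        # entries: (-vcpu_prio, -task_prio, seq, task)
            self._seq = 0          # tie-breaker keeps FIFO order among equals

        def enqueue(self, vcpu_prio, task_prio, task):
            heapq.heappush(self._heap, (-vcpu_prio, -task_prio, self._seq, task))
            self._seq += 1

        def dequeue(self):
            """Return the highest-priority task of the highest-priority VCPU."""
            return heapq.heappop(self._heap)[3] if self._heap else None

    # Example: a task of VCPU 8 is served before any task of VCPU 5.
    # q = TwoLevelMutexQueue(); q.enqueue(5, 9, "t9"); q.enqueue(8, 2, "t2")
    # q.dequeue()  -> "t2"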

Local shared resources: vMPCP follows the uniprocessor priority ceiling protocol (PCP) [84] for accessing local resources.^1 Unlike the global resource case, a VCPU priority is not affected while its task is accessing a local resource.

1. Each mutex associated with a local resource R_k is assigned a task-level priority ceiling, which is equal to the highest priority of any task accessing R_k. Note that this ceiling is valid only within the corresponding VCPU.

2. A task τ_i can access a local resource R_k if the priority of τ_i is higher than the priority ceilings of all other mutexes currently locked by other tasks in that VCPU.

3. If a task τ_i is blocked on a local resource by another task that has a lower priority than τ_i, the lower-priority task inherits the priority of τ_i.

7.1.2 VCPU Budget Overrun

vMPCP provides an option for VCPUs to overrun their budgets when their tasks are in gcs's. This allows tasks to complete their gcs's even though their VCPU has exhausted its budget. Hence, remote blocking time can be significantly reduced. We present the detailed behavior of the VCPU budget overrun under the periodic server and deferrable server policies.

Periodic server with overrun: The VCPU budget overrun mechanism for VCPUs under the periodic server policy works similarly to the one presented in [97]. Suppose that a VCPU's budget is exhausted while one of its tasks is in a gcs. If overrun is enabled, the task can continue to execute and finish the gcs. Recall that vMPCP immediately increases the priority of any task executing a gcs to be higher than that of any other normally executing tasks or tasks accessing local resources. Therefore, the amount of overrun time is affected only by the lengths of the global critical sections in a VCPU.

^1 As an alternative to PCP, the highest locker priority protocol (HLP) can also be used for local resources.

If a VCPU's budget is exhausted while no task of the VCPU is in a gcs, the VCPU suspends until the start of its next replenishment period. Once the VCPU suspends, overrun has no effect. This preserves a desirable property of the periodic server policy: no potential back-to-back interference to lower-priority VCPUs. For instance, consider a task τ_i waiting for a global resource R that is held by another task on a different physical core, while the VCPU of τ_i is suspended due to budget depletion. If the resource R is released while the VCPU of τ_i is suspended, the task τ_i needs to wait until the next replenishment period of its VCPU, even though overrun is enabled.

Deferrable server with overrun: Unlike the periodic server policy, VCPUs under the deferrable server policy can overrun more flexibly. Consider a task τ_i waiting for a global resource R that is held by another task on a different physical core, and suppose the VCPU of τ_i has exhausted its regular budget. If the resource R is released, the VCPU of τ_i is allowed to overrun its budget and the task τ_i can execute its gcs corresponding to R. Once the task τ_i finishes its gcs, the VCPU of τ_i suspends again. This difference between the periodic server and the deferrable server with overrun leads to different remote blocking times. We analyze the details in Section 7.2.2.

7.1.3 vMPCP Para-virtualization Interface

vMPCP increases the priorities of both a task and its VCPU when the task executes a gcs. If a lock corresponding to a global resource is implemented in the hypervisor, e.g., for resource sharing among VCPUs from different guest VMs, the hypervisor can manage the priorities of VCPUs appropriately. However, if a lock for a global resource is implemented within a guest VM image, e.g., for resource sharing within a multi-core guest VM hosted on the hypervisor, there is no way for the hypervisor to know whether any task of a VCPU of that VM is executing a gcs associated with the lock.

^2 Para-virtualization is a technique involving small modifications to guest operating systems or device drivers to achieve high performance and efficiency.

To address this issue, vMPCP provides a para-virtualization^2 interface for a VCPU to let the hypervisor know about the execution of gcs's in that VCPU. The interface consists of the following two functions (a guest-side usage sketch is given after this list):

• vmpcp_start_gcs(): If any task of a VCPU acquires a lock for a global resource, this function is called to let the hypervisor increase the priority of the VCPU by the base VCPU-priority level π_B^v of the system. If overrun is enabled, the hypervisor allows the VCPU to continue to execute until vmpcp_finish_gcs() is called. The hypervisor may implement an enforcement mechanism so that the VCPU does not exceed its pre-determined overrun time, which will be given in Sec. 7.2.1.

• vmpcp_finish_gcs(): When there is no global-resource lock held by any task in a VCPU, this function is called to let the hypervisor reduce the priority of the VCPU to its normal priority. Also, if the VCPU's budget is exhausted, the hypervisor suspends the VCPU.
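To illustrate how a guest might bracket a global critical section with these calls, here is a minimal Python sketch; the two hypercall stubs passed to the constructor are hypothetical placeholders for the actual guest-to-hypervisor interface, and the single-lock simplification is an assumption.

    from contextlib import contextmanager

    class GlobalMutex:
        """Guest-side wrapper that brackets a global critical section with the
        vMPCP para-virtualization calls (simplified: assumes this is the only
        global lock the VCPU's tasks hold)."""
        def __init__(self, lock, hypercall_start, hypercall_finish):
            self._lock = lock                 # the actual global-resource lock
            self._start = hypercall_start     # issues vmpcp_start_gcs()
            self._finish = hypercall_finish   # issues vmpcp_finish_gcs()

        @contextmanager
        def critical_section(self):
            self._lock.acquire()
            self._start()        # raise the VCPU priority for the gcs
            try:
                yield            # body of the global critical section
            finally:
                self._lock.release()
                self._finish()   # no gcs in progress: restore normal priority

    # Usage (with hypothetical stubs):
    # m = GlobalMutex(threading.Lock(), start_stub, finish_stub)
    # with m.critical_section():
    #     ...  # access the global shared resource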

7.2 vMPCP Schedulability Analysis

In this section, we present the schedulability analysis under our proposed vMPCP. Our analysis considers each of the periodic server and deferrable server policies, with and without VCPU budget overrun. We first analyze the VCPU schedulability on a physical core and then the task schedulability on a VCPU.

7.2.1 VCPU Schedulability

vMPCP increases the priority of a VCPU while any task of the VCPU is holding a global resource, which enables a lower-priority VCPU to block a higher-priority VCPU.^3 Also, vMPCP results in increased VCPU execution times when overrun is enabled. We now analyze these worst-case effects on VCPU schedulability and derive the VCPU schedulability test under vMPCP.

^3 vMPCP does not increase the priority of a VCPU when its task is holding a local resource. Hence, local resources do not affect VCPU schedulability.

Blocking from lower-priority VCPUs: We first focus on the case where the periodic server policy is used. Consider a higher-priority VCPU v_h and a lower-priority VCPU v_l, both assigned to the same core. Under the periodic server policy, the higher-priority VCPU v_h never suspends by itself until its budget is exhausted. Hence, the lower-priority VCPU v_l can block v_h only when a global resource that one of v_l's tasks has been waiting on is released from another core. The blocking time is equal to the duration of the corresponding gcs (the global resource holding time). The worst case happens when all the tasks of v_l have been waiting on global resources and these resources are released from other cores while the higher-priority VCPU v_h is executing. The maximum global resource holding time of v_l is:

ght(v_l) = \sum_{\tau_j \in v_l \wedge \sigma_j^{gcs} > 0} \; \max_{1 \le k \le \sigma_j \wedge type(\tau_j,k)=gcs} E_{j,k}    (7.1)

Using Eq. (7.1), the worst-case blocking time imposed on a VCPU v_i during a time interval t under the periodic server policy is given by:

B_i^v(t) = \sum_{v_l \in P(v_i) \wedge \pi_l^v < \pi_i^v} ght(v_l)    (7.2)

where π_i^v is the priority of the VCPU v_i. Note that the parameter t is used only to be consistent with the deferrable server case, which will be shown in Eq. (7.4).

We now consider the case where the deferrable server policy is used. Under this policy, a higher-priority VCPU v_h may suspend itself several times every period. This means that, unlike the periodic server case, the tasks of a lower-priority VCPU v_l may get a chance to request global resources whenever v_h suspends. Hence, each task of the lower-priority VCPU v_l may block the higher-priority VCPU v_h multiple times during v_h's period. The maximum accumulated global resource holding time of the tasks of v_l during a time interval t is given by:

sum\_ght(v_l, t) = \sum_{\tau_j \in v_l} \left\{ \Big( \Big\lceil \frac{t}{T_j} \Big\rceil + 1 \Big) \cdot \sum_{1 \le k \le \sigma_j \wedge type(\tau_j,k)=gcs} E_{j,k} \right\}    (7.3)

Note that the “+1” term captures the carry-in job of each task during a given time interval t. Using Eq. (7.3), the worst-case blocking time imposed on a VCPU v_i during a time interval t under the deferrable server policy is:

B_i^v(t) = \sum_{v_l \in P(v_i) \wedge \pi_l^v < \pi_i^v} sum\_ght(v_l, t)    (7.4)

Budget overrun time: If the VCPU budget overrun option is enabled, a VCPU can overrun its budget only when its tasks are executing gcs's. Hence, the maximum time that a VCPU v_i can overrun is bounded by the maximum global resource holding time of that VCPU, which is given in Eq. (7.1). Therefore, the maximum overrun time O_i^v of a VCPU v_i is equal to ght(v_i) if overrun is enabled, and zero otherwise.

VCPU schedulability: The schedulability of a VCPU v_i can be determined by the following recurrence equation:

W_i^{v,n+1} = C_i^v + O_i^v + B_i^v(W_i^{v,n}) + \sum_{v_h \in P(v_i) \wedge \pi_h^v > \pi_i^v} \left\lceil \frac{W_i^{v,n} + J_h^v}{T_h^v} \right\rceil \cdot (C_h^v + O_h^v)    (7.5)

where W_i^{v,n} is the worst-case response time of v_i at the n-th iteration (W_i^{v,0} = C_i^v + O_i^v) and J_h^v is a VCPU release jitter (J_h^v = 0 under the periodic server policy and J_h^v = T_h^v − C_h^v under the deferrable server policy). Eq. (7.5) is based on the iterative response time test [142]. The iteration terminates when W_i^{v,n+1} = W_i^{v,n}, and the VCPU v_i is schedulable if its response time does not exceed its period: W_i^{v,n} ≤ T_i^v. In this equation, O_i^v and O_h^v represent the budget overrun of v_i and of its higher-priority VCPUs, respectively. The third term represents the blocking time from lower-priority VCPUs during v_i's response time.

137 7.2.2 Task Schedulability

To determine the schedulability of a task τ_i under vMPCP, we need to consider the factors discussed in Section 2.3.1: (i) local blocking time, (ii) remote blocking time, (iii) back-to-back execution due to remote blocking, (iv) multiple priority inversions, (v) preemptions by higher-priority VCPUs, and (vi) VCPU budget depletion. We take into account factor (iv) when analyzing local blocking time, and factors (v) and (vi) when analyzing remote blocking time. By considering factors (i), (ii) and (iii), we use the following recurrence equation that bounds the worst-case response time of a task τ_i in a VCPU v_k under vMPCP:

W_i^{n+1} = C_i + B_i^l + B_i^r + \sum_{\tau_h \in V(\tau_i) \wedge \pi_h > \pi_i} \left\lceil \frac{W_i^n + J_h + (W_h - C_h)}{T_h} \right\rceil C_h + \left\lceil \frac{W_i^n + C_k^v}{T_k^v} \right\rceil (T_k^v - C_k^v)    (7.6)

where B_i^l is the local blocking time for τ_i, B_i^r is the remote blocking time for τ_i, and J_h is the release jitter of each higher-priority task τ_h (J_h = T_k^v − C_k^v). The iteration terminates when W_i^{n+1} = W_i^n, and the task τ_i is schedulable if its response time does not exceed its deadline: W_i^n ≤ D_i. Eq. (7.6) is based on the response-time test for independent tasks under hierarchical scheduling given in [92]. Specifically, the last term of Eq. (7.6) is from [92] and captures the execution gap due to the periodic budget supply of the VCPU. The back-to-back execution due to remote blocking from each higher-priority task τ_h is captured by adding W_h − C_h in the summation term.^4

In the rest of this section, we shall analyze the local and remote blocking times, B_i^l and B_i^r. We use tc_{i,j} to denote the task-level priority ceiling of the j-th critical section segment of task τ_i. Similarly, vc_{i,j} denotes the VCPU-level priority ceiling of the j-th critical section segment of task τ_i.

^4 This is a correction made from our previous work [130]. More details on this correction and a suspension-based blocking term in a response-time test can be found in [163].

Figure 7.2: Periodic server with overrun

Local blocking time: The local and global critical sections of lower-priority tasks can block the normal execution segments of a higher-priority task τ_i. With the local resource access rule of vMPCP, based on PCP [84], only one lower-priority task with a priority ceiling higher than the normal priority of τ_i can block each normal execution segment of τ_i. Hence, the maximum per-segment blocking time from the local critical sections of lower-priority tasks is given by:

B_i^{l\_lcs} = \max_{\tau_l \in V(\tau_i) \wedge \pi_l < \pi_i \wedge \sigma_l^{lcs} > 0} \;\; \max_{1 \le u \le \sigma_l \wedge type(\tau_l,u)=lcs \wedge tc_{l,u} > \pi_i} E_{l,u}    (7.7)

Unlike lcs's, the gcs's of each lower-priority task can block the normal execution segment of τ_i. The maximum per-segment blocking time from the gcs's of lower-priority tasks is given by:

B_i^{l\_gcs} = \sum_{\tau_l \in V(\tau_i) \wedge \pi_l < \pi_i \wedge \sigma_l^{gcs} > 0} \; \max_{1 \le u \le \sigma_l \wedge type(\tau_l,u)=gcs} E_{l,u}    (7.8)

The total local blocking time from both the local and global critical sections of lower-priority tasks is given by:

B_i^l = (B_i^{l\_lcs} + B_i^{l\_gcs}) \cdot (\sigma_i^{gcs} + 1)    (7.9)

Here, the reason for multiplying by σ_i^{gcs} + 1 is that, before a task τ_i starts executing or whenever τ_i self-suspends due to a global resource, lower-priority tasks may issue requests for local or global resources.

Figure 7.3: Deferrable server with overrun

Remote blocking time: The remote blocking time B_i^r of a task τ_i is given by:

B_i^r = \sum_{1 \le j \le \sigma_i \wedge type(\tau_i,j)=gcs} B_{i,j}^r    (7.10)

where B_{i,j}^r is the remote blocking time for τ_i in acquiring the global resource associated with the j-th critical section of τ_i. Note that B_{i,j}^r = 0 if the j-th critical section of τ_i is an lcs.

The term B_{i,j}^r is bounded by the following recurrence equation:

B_{i,j}^{r,n+1} = \max_{V(\tau_l) \in lpvcpus(V(\tau_i)) \wedge R(\tau_l,u)=R(\tau_i,j)} W_{l,u}^{gcs} + \sum_{V(\tau_h) \in hpvcpus(V(\tau_i)) \wedge R(\tau_h,u)=R(\tau_i,j)} \left( \left\lceil \frac{B_{i,j}^{r,n}}{T_h} \right\rceil + 1 \right) \cdot W_{h,u}^{gcs}    (7.11)

where B_{i,j}^{r,0} = \max_{V(\tau_l) \in lpvcpus(V(\tau_i)) \wedge R(\tau_l,u)=R(\tau_i,j)} W_{l,u}^{gcs} (the first term of the equation), lpvcpus(V(τ_i)) is the set of VCPUs in the system with lower priority than the VCPU of τ_i, hpvcpus(V(τ_i)) is the set of VCPUs with higher priority than the VCPU of τ_i, and W_{l,u}^{gcs} is the worst-case response time of the execution E_{l,u} of a gcs after acquiring the corresponding global resource. The first term of Eq. (7.11) captures the time for a task in a lower-priority VCPU to finish its gcs. The second term represents the time for tasks in higher-priority VCPUs to execute their gcs's.

We now analyze W_{l,u}^{gcs}, the amount of which depends on which VCPU policy is used and whether overrun is used. We first define two terms, load_{l,u} and vcpu_prm_{l,u}, as follows:

load_{l,u} = E_{l,u} + \sum_{\tau_x \in V(\tau_l)} \max_{1 \le y \le \sigma_x \wedge tc_{x,y} > tc_{l,u}} E_{x,y}    (7.12)

vcpu\_prm_{l,u} = \sum_{v_z \in P(V(\tau_l)) \wedge v_z \ne V(\tau_l)} \; \sum_{\tau_x \in v_z} \max_{1 \le y \le \sigma_x \wedge vc_{x,y} > vc_{l,u}} E_{x,y}    (7.13)

The term load_{l,u} bounds the maximum VCPU budget required to execute the critical section E_{l,u}. It captures the execution time of E_{l,u} and the execution times of gcs's with higher task-level priority ceilings in the same VCPU. Since every gcs has a higher priority than any normal execution segment, we only need to consider one global critical section per task. The term vcpu_prm_{l,u} bounds the VCPU-level preemptions while E_{l,u} executes. The VCPU of E_{l,u} can only be preempted by other VCPUs that have tasks executing gcs's with higher VCPU-level priority ceilings. Note that vcpu_prm_{l,u} increases the response time of E_{l,u} (W_{l,u}^{gcs}), but does not consume the budget of E_{l,u}'s VCPU.

• Periodic server with overrun: The worst-case response time of the execution E_{l,u} of a gcs happens when the corresponding resource is acquired right after its VCPU is suspended. In this case, the execution is delayed until the start of its VCPU's next replenishment period, and this waiting time is up to T_{V(τ_l)}^v − C_{V(τ_l)}^v, as shown in Figure 7.2. Once the next period of the VCPU starts, the VCPU can execute and finish E_{l,u} within this period due to overrun. Therefore, W_{l,u}^{gcs} under the periodic server policy with overrun is given by:

W_{l,u}^{gcs} = T_{V(\tau_l)}^v - C_{V(\tau_l)}^v + load_{l,u} + vcpu\_prm_{l,u}    (7.14)

• Deferrable server with overrun: In this case, E_{l,u} can be executed without the need to wait until the VCPU's next replenishment period (Figure 7.3). Therefore, W_{l,u}^{gcs} under the deferrable server policy with overrun is given by:

W_{l,u}^{gcs} = load_{l,u} + vcpu\_prm_{l,u}    (7.15)

• Periodic/deferrable server without overrun: When overrun is not used, the execution of load_{l,u} may span over multiple of its VCPU periods (Figure 7.4). The total execution gap is bounded by ⌈load_{l,u}/C^v_{V(τ_l)}⌉ (T^v_{V(τ_l)} − C^v_{V(τ_l)}). Therefore, W^{gcs}_{l,u} under the periodic or deferrable server policy without overrun is given by:

$$W^{gcs}_{l,u} = \left\lceil \frac{load_{l,u}}{C^v_{V(\tau_l)}} \right\rceil \left(T^v_{V(\tau_l)} - C^v_{V(\tau_l)}\right) + load_{l,u} + vcpu\_prm_{l,u} \qquad (7.16)$$

Note that, if the amount of load_{l,u} is smaller than the per-period execution budget of the VCPU (C^v_{V(τ_l)}), Eq. (7.16) becomes equal to Eq. (7.14).

Figure 7.4: Periodic/deferrable server without overrun
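Reading Eqs. (7.14)–(7.16) together, the three cases can be folded into a single function of the server policy and the overrun option. The sketch below is illustrative only; the parameter names are assumptions.

```python
import math

def W_gcs(policy, overrun, load, vcpu_prm, T_v, C_v):
    """Worst-case response time of a gcs execution E_{l,u}, per Eqs. (7.14)-(7.16).
    T_v, C_v: period and budget of the VCPU of tau_l."""
    if overrun and policy == "periodic":
        # Eq. (7.14): wait up to T_v - C_v for the next replenishment,
        # then finish within that period thanks to overrun.
        return (T_v - C_v) + load + vcpu_prm
    if overrun and policy == "deferrable":
        # Eq. (7.15): the deferred budget can be used immediately.
        return load + vcpu_prm
    # Eq. (7.16): without overrun, the execution may span several VCPU periods.
    return math.ceil(load / C_v) * (T_v - C_v) + load + vcpu_prm

# Illustrative numbers (msec): with load <= C_v, Eq. (7.16) equals Eq. (7.14).
print(W_gcs("periodic", True,  load=0.5, vcpu_prm=0.2, T_v=5, C_v=2))
print(W_gcs("periodic", False, load=0.5, vcpu_prm=0.2, T_v=5, C_v=2))
```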

7.3 Evaluation

This section presents our experimental evaluation of vMPCP. To our knowledge, vMPCP is the first virtualization-aware multi-core synchronization protocol, and there is no schedulability test for existing protocols in the multi-core virtualization environment. We first empirically investigate the performance characteristics of vMPCP in terms of task schedulability, and then compare vMPCP against a virtualization-unaware protocol (MPCP) in terms of response times on a real hardware platform.

7.3.1 Comparison of Different Configurations

The purpose of this experiment is to explore the impact of different uses of vMPCP on task schedulability. To do this, we use randomly-generated tasksets and capture the percentage of schedulable tasksets as the metric.

Experimental Setup: The base parameters we use for experiments are summarized in Table 7.1. As the main interest of our work is in the timing penalties caused by global resources, local resources are not considered. For each experimental setting, we first generate the defined numbers of physical CPU cores in the system, VCPUs for each core, and tasks for each VCPU. Task periods are randomly selected within the defined min/max task period range. On each VCPU, the per-VCPU task utilization is split into k random-sized pieces, where k is the number of tasks in the VCPU. The size of each piece represents the utilization of the corresponding task. Then, the WCET of each task is calculated by multiplying its utilization by its period. The priorities of tasks and VCPUs are assigned by the Rate-Monotonic Scheduling (RMS) policy [127] (ties are broken arbitrarily). Once the task information is generated, we determine a VCPU budget value that is used for all VCPUs in the system. Starting from a value equal to the VCPU period, we decrease the VCPU budget by 10 µsec until all VCPUs pass the VCPU schedulability test given in Eq. (7.5).⁵

Table 7.1: Base parameters for experiments

    Parameter                Value       Parameter               Value
    # of physical cores      8           # of VCPUs per core     2
    # of tasks per VCPU      3           Period of a VCPU        5 msec
    Min. task period         100 msec    Max. task period        500 msec
    Per-VCPU task util       15%         # of gcs's per task     1
    # of lockers per mutex   2           Size of a gcs           10 µsec

We generate 10,000 tasksets for each experimental setting, and record the percentage of tasksets where all the tasks pass the task schedulability test given in Eq. (7.6).
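The generation procedure described above can be summarized as follows. This is a sketch under the stated parameters; the callback passes_vcpu_test stands in for the VCPU schedulability test of Eq. (7.5), which is not reproduced here.

```python
import random

def generate_vcpu_taskset(num_tasks, vcpu_util, t_min, t_max):
    """Split the per-VCPU utilization into random-sized pieces (one per task);
    each task's WCET is its utilization times its randomly chosen period."""
    cuts = sorted(random.random() for _ in range(num_tasks - 1))
    shares = [b - a for a, b in zip([0.0] + cuts, cuts + [1.0])]
    tasks = []
    for share in shares:
        period = random.uniform(t_min, t_max)
        tasks.append({"T": period, "C": share * vcpu_util * period})
    return tasks

def find_vcpu_budget(vcpu_period_us, passes_vcpu_test, step_us=10):
    """Start from a budget equal to the VCPU period and decrease it in 10-usec
    steps until every VCPU passes the schedulability test of Eq. (7.5),
    represented here by the passes_vcpu_test callback."""
    budget = vcpu_period_us
    while budget > 0 and not passes_vcpu_test(budget):
        budget -= step_us
    return budget if budget > 0 else None
```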

Results: We consider the following four uses of vMPCP: periodic server with overrun

(PSwO), deferrable server with overrun (DSwO), periodic server with no overrun (PSnO), and deferrable server with no overrun (DSnO). The main factors affecting task schedulability under vMPCP are: (i) the size of a gcs, (ii) the number of lockers per mutex, (iii) the number of gcs’s per task, (iv) the VCPU period, and (v) the utilization of tasks in each

VCPU. By exploring these factors, we identify the characteristics of the four schemes of vMPCP.

Figure 7.5 shows the percentage of schedulable tasksets as the size of a gcs increases.

⁵As the minimum time unit in Table 7.1 is 10 µsec, the step size of 10 µsec is fine-grained enough to find the VCPU budget values in this experiment.


Figure 7.5: Taskset schedulability as the size of a gcs increases

Figure 7.6: Taskset schedulability as the number of lockers per mutex increases

The schemes with no overrun, PSnO and DSnO, are almost unaffected by the size of a gcs. Conversely, the schedulability under the schemes with overrun, PSwO and DSwO, decreases as the size of a gcs increases. This is due to the fact that, without overrun, more VCPU budget can be used for the executions of normal execution segments of tasks.

DSwO performs better than PSwO because DSwO results in a shorter response time of the execution of a gcs, as given in Eq. (7.15).

Figure 7.6 shows the percentage of schedulable tasksets as the number of lockers per mutex increases. Points on the x-axis represent all possible values for the number of lockers per mutex in our experimental setting. The performance degradation happens only when the number of lockers per mutex is very high (> 12). This is because vMPCP uses a two-level priority queue as the waiting list for a mutex. Hence, higher-priority tasks or tasks in higher-priority VCPUs do not need to wait until all the lower-priority tasks or tasks in lower-priority VCPUs finish their gcs's.


Figure 7.7: Taskset schedulability as the number of gcs's per task increases

Figure 7.8: Taskset schedulability as the VCPU period increases

Figure 7.7 shows the percentage of schedulable tasksets as the number of gcs’s per task increases. The performance difference between DSwO and the other three schemes becomes larger as the number of gcs’s per task increases. Even if the number of gcs’s per task reaches 32, DSwO does not show any noticeable performance degradation due to its short gcs response time.

Figure 7.8 shows the percentage of schedulable tasksets as the VCPU period increases.

DSwO performs much better than the other three schemes. Especially, when the VCPU period is 40 msec, the difference in the percentage of schedulable tasksets between DSwO and the other schemes is about 80%. This big difference is due to the fact that PSwO,

PSnO and DSnO are sensitive to the VCPU period when accessing global resources, as given by Eq. (7.14) and Eq. (7.16).

Lastly, Figure 7.9 shows the percentage of schedulable tasksets as the utilization of tasks per VCPU increases. For all schemes, the percentage decreases when the per-VCPU utilization is greater than 17.0%. Interestingly, when the utilization is 19.0%, DSwO performs better than PSnO and DSnO, but when the utilization is 20.0%, the result is the opposite.

Figure 7.9: Taskset schedulability as the task utilization per VCPU increases

In summary, we observe from the results that there is no single scheme that can dominate the others. DSwO generally performs better than PSwO, PSnO and DSnO, due to its short gcs response time. In some cases, PSnO and DSnO outperform DSwO by allowing more VCPU budgets for the normal execution segments of tasks. PSwO gives the worst performance in our experiments. This is because PSwO allows less VCPU budget for normal execution segments than PSnO and DSnO, and gives longer gcs response time than DSwO.

7.3.2 Case Study: vMPCP on KVM Hypervisor

We now present a case study demonstrating the benefit of vMPCP by using our implementation on the KVM hypervisor.

Implementation: We have implemented vMPCP on the KVM (Kernel-based Virtual Machine) hypervisor [158] of the latest version of Linux/RK [138, 139].⁶ The host machine runs on Linux/RK, and uses KVM to execute guest VMs that also run on Linux/RK. Our implementation supports the deferrable server policy and an optional overrun mechanism.

The vMPCP mutex data structures and APIs (e.g., open, lock, unlock) are implemented as part of the Linux/RK kernel module. Specifically, the vMPCP mutexes are classified

⁶Linux/RK is available at https://rtml.ece.cmu.edu/redmine/projects/rk.

into intra-VM and inter-VM mutexes based on the memory spaces their corresponding global resources belong to. The intra-VM mutexes are for resources shared within a guest VM and use the vmpcp_start_gcs() and vmpcp_finish_gcs() hypercalls internally.

Table 7.2: Implementation cost of vMPCP on the KVM hypervisor

    Type       Mutex API                 Avg (µsec)  Max (µsec)
    Intra-VM   open (create new mutex)   4.16        7.14
               open (existing mutex)     1.87        3.64
               destroy                   1.83        3.50
               lock                      3.51        5.69
               trylock                   2.75        5.15
               unlock                    2.26        2.68
               *vmpcp_start_gcs          2.05        2.88
               *vmpcp_finish_gcs         1.40        1.60
    Inter-VM   open (create new mutex)   1.79        3.48
               open (existing mutex)     1.76        3.35
               destroy                   1.49        1.78
               lock                      3.09        5.31
               trylock                   2.80        5.29
               unlock                    1.93        2.57

The inter-VM mutexes are for resources shared among guest VMs and the hypervisor.

They are implemented by using the per-VCPU virtqueue interface of virtio [164] for hypervisor-VM communication.

Table 7.2 lists the implementation costs of vMPCP APIs on the KVM hypervisor. The target system used is equipped with an Intel Core i7-2600 quad-core processor running at

3.4 GHz and 8GBytes of RAM. To reduce measurement inaccuracies, we have disabled the simultaneous multithreading and dynamic clock frequency scaling of the processor. The open and destroy APIs take longer times for intra-VM mutexes than for inter-VM mutexes.

This is mainly due to the performance difference between a VM and the hypervisor in memory allocation and deallocation for mutex data structures. The costs of lock, trylock and unlock APIs are similar for both intra- and inter-VM mutexes. The major factor contributing to the lock/unlock costs is the “world switch” between a VM and the hypervisor.

Since the intra-VM mutexes invoke the vmpcp_start_gcs and vmpcp_finish_gcs hypercalls, the world switch happens for intra-VM mutexes as well.

Case Study: In this case study, we compare the response times of tasks sharing a global

resource under vMPCP and those under a virtualization-unaware multi-core synchronization protocol, MPCP. The target system hosts two guest VMs, each of which has four VCPUs

(VM1: {v1, v3, v5, v7}, VM2: {v2, v4, v6, v8}). All VCPUs have the same budget and period: vi = (3, 10), units in msec. The VCPUs are ordered in increasing order of priorities, i.e.,

i < j ⇒ π^v_i < π^v_j. Hence, v8 is the highest-priority VCPU. The release offset of each VCPU is zero. The target machine has four processing cores, Core 1, 2, 3 and 4. Each core is assigned two VCPUs: Core 1 = {v1, v2}, Core 2 = {v3, v4}, Core 3 = {v5, v6},

Core 4 = {v7, v8}. For a taskset, we use eight synthetic tasks, each of which has one gcs.

There is one global resource shared among all these tasks. Each task is assigned to a

VCPU with the same index number, e.g., τ5 ∈ v5. All tasks except τ2 have the same timing parameters: τi = ((2, 1, 2), 200), where i ≠ 2, units in msec. Task τ2 has a slightly longer gcs: τ2 = ((2, 1.1, 2), 200). Each task τi also has a release offset of i − 1 msec, e.g., τ5 is released at t = 4 msec. We used Linux/RK to set the periods, release offsets, and real-time priorities of VCPUs and tasks. In accordance with our system model, tasks and VCPUs with higher indices are assigned higher priorities.
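For reference, the case-study configuration described above can be written down compactly. The dictionary layout below is just one possible encoding of the stated parameters.

```python
MSEC = 1.0  # all values in milliseconds

# Two guest VMs, eight VCPUs v1..v8; each VCPU has budget 3 and period 10.
vcpus = {f"v{i}": {"budget": 3 * MSEC, "period": 10 * MSEC,
                   "vm": 1 if i % 2 == 1 else 2,      # VM1: odd indices, VM2: even
                   "core": (i + 1) // 2}              # v1,v2 -> Core 1, ..., v7,v8 -> Core 4
         for i in range(1, 9)}

# One task per VCPU; tau_i = ((2, 1, 2), 200) except tau_2 = ((2, 1.1, 2), 200).
tasks = {f"tau{i}": {"segments": (2, 1.1 if i == 2 else 1, 2),  # (normal, gcs, normal)
                     "period": 200 * MSEC,
                     "offset": (i - 1) * MSEC,        # tau_i released at t = i - 1 msec
                     "vcpu": f"v{i}"}
         for i in range(1, 9)}
```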

Figure 7.10 shows the execution timelines of tasks captured under MPCP, vMPCP with deferrable server and no overrun (vMPCP+DSnO), and vMPCP with deferrable server and overrun enabled (vMPCP+DSwO). As can be seen, the response times of tasks are much shorter under vMPCP+DSnO and vMPCP+DSwO, compared to those under

MPCP (a 7.5% response-time decrease on average under vMPCP+DSnO, and 29.1% under vMPCP+DSwO). The shared resource is first held by τ2 at t = 3, but under MPCP and vMPCP+DSnO, τ2 cannot release the resource due to its VCPU's budget depletion. Hence, the resource is held by τ2 until the start of its VCPU's next replenishment period. Conversely, under vMPCP+DSwO, τ2 can finish its gcs and release the resource. This allows other tasks to access the resource within the first VCPU period, thereby significantly reducing the response times of tasks. In case of task τ8, it acquires the resource at t = 10 under both MPCP and vMPCP+DSnO. Here, the difference happens when τ8 finishes its gcs.

Under MPCP, τ8 continues to execute because its VCPU has the highest priority on that

Figure 7.10: Task execution timelines under MPCP, vMPCP+DSnO and vMPCP+DSwO ((a) MPCP; (b) vMPCP+DSnO, deferrable server with no overrun; (c) vMPCP+DSwO, deferrable server with overrun)

core. This delays task τ7, the highest-priority task among the tasks waiting on the resource, from entering its gcs. However, under vMPCP+DSnO, τ7 starts its gcs right after the resource is released by τ8. This slightly lengthens the response time of

τ8, but allows other tasks to access the resource much faster. Under vMPCP+DSwO, the response times of all tasks except τ7 are shorter than those under the other two schemes.

The increase in τ7’s response time is due to the back-to-back execution of the VCPU of τ8, the amount of which is bounded by our analysis. The case study results show that vMPCP

is effective in reducing the response times of tasks accessing shared resources in a multi-core virtualization environment.

7.4 Summary

In this chapter, we developed a novel synchronization framework, vMPCP, to provide bounded blocking time on accessing shared resources in a multi-core virtualization environment. vMPCP reduces the major inefficiencies caused by shared resources, by exposing the executions of global critical sections to the hypervisor. We presented the schedulability analysis under vMPCP, with the periodic and deferrable server policies with and without the budget overrun mechanism. From our analysis and experimental results, we made two important findings: (i) the deferrable server outperforms the periodic server when overrun is used, and (ii) the use of overrun does not always yield better schedulability, especially for tasks with long critical sections. We implemented vMPCP on the KVM hypervisor and demonstrated the effect of vMPCP in reducing task response times by an average of

29% in our case study. Interesting future directions that can build on our work include the extension of our schedulability analysis to the compositional framework [93, 94], and the implementation and evaluation of vMPCP on other hypervisors, such as L4/Fiasco [104].

Chapter 8

Responsive and Enforced Interrupt Handling

This chapter describes our proposed interrupt handling scheme for multi-core virtualization, called vINT. vINT provides a pseudo-VCPU abstraction to explicitly account for and enforce the CPU usage of virtual interrupt handling. With a pseudo-VCPU, vINT enables tasks within a VCPU to meet their deadlines without suffering from virtual interrupt storms. The use of the pseudo-VCPU abstraction also allows assigning a separate budget and priority to just the interrupt handlers and interrupt-triggered tasks of a guest VM. This allows a virtual interrupt to be handled even when the budget of its original VCPU has been depleted. In addition, virtual interrupt handling is no longer dominated by the budget, replenishment period and priority of its original VCPU, thereby significantly reducing interrupt handling time in a virtualized environment. vINT does not require making any change to the guest OS code. Hence, it is easily applicable to full virtualization scenarios hosting unmodified, proprietary guest OSs.

We analyze interrupt handling time in a virtualized environment with and without vINT. We also provide analyses on the schedulability of VCPUs and tasks in the presence of physical and virtual interrupts. Our experimental results indicate that vINT achieves

timely interrupt handling while providing as good task schedulability as when it is not used. We have implemented a prototype of vINT on the KVM hypervisor (chosen for convenience). Our case study using this implementation shows the benefits of vINT in providing responsive interrupt handling times and protecting tasks against virtual interrupt storms. Table 8.1 gives a brief comparison of vINT with closely related prior work.

Table 8.1: Comparison with previous work. vINT is compared with the schemes in [117], [114], [115] and [116] along six properties: priority-based scheduling, VCPU temporal isolation, bounded interrupt handling, enforced interrupt handling, task schedulability analysis, and unmodified guest OS support; vINT is the only scheme that provides all six.

The background and related prior work on interrupt handling were discussed in Section 2.4. The system model including assumptions and notation for tasks and virtual machines can be found in Chapter 3.

The rest of this chapter is organized as follows. Section 8.1 gives a detailed description on interrupt handling in a virtualization environment and defines interrupt handling time.

Section 8.2 presents our proposed vINT scheme. Section 8.3 shows our analyses on interrupt handling time, and VCPU and task schedulability. Section 8.4 provides detailed evaluation, and Section 8.5 summarizes this chapter.

8.1 Interrupt Handling Time in Virtualization

We consider two types of interrupts: physical and virtual. A physical interrupt I^{pi}_i is a signal issued from a hardware device to a PCPU. Each physical interrupt is assumed to be statically pinned to one PCPU, which can be easily done in software with the support of a programmable interrupt controller (PIC). When a PCPU receives a physical interrupt, the currently executing VCPU on that PCPU is halted and the corresponding ISR of the hypervisor is executed (Step ① in Figure 8.1). The CPU time usage of the ISR of a physical interrupt is accounted for as the hypervisor's usage, not as the halted VCPU's usage.

Figure 8.1: Interrupt handling in virtualization ((a) interrupt handling on the same PCPU; (b) interrupt handling across two PCPUs)

Each physical interrupt has a unique priority π^{pi}_i determined by the PIC. The ISR of a lower-priority physical interrupt can be preempted by that of a higher-priority physical interrupt. The ISRs of physical interrupts are not preemptible by VCPUs. Therefore, a VCPU preempted by an ISR can only resume its execution when all ISRs have been completed. A physical interrupt I^{pi}_i is represented as follows:

$$I^{pi}_i := (C^{pi}_i, T^{pi}_i)$$

where,
• C^{pi}_i: the WCET of the ISR of I^{pi}_i
• T^{pi}_i: the minimum inter-arrival time of I^{pi}_i¹

¹Similarly to prior work [111, 113], the minimum inter-arrival time of an interrupt refers to a value expected or identified at design time. An interrupt unexpectedly arriving faster than that value may cause an interrupt storm at runtime.

The response time of a physical interrupt (or physical interrupt handling time) is the time from the arrival of the physical interrupt signal to the completion of the corresponding ISR.

A virtual interrupt I^{vi}_i is a software signal from the hypervisor to a guest VM, issued upon the completion of the ISR of a physical interrupt.² Each virtual interrupt is assumed to be statically pinned to one VCPU of a VM. If the target VCPU is located on a PCPU different from that of the physical ISR, e.g., for a physical interrupt shared among multiple VCPUs, the delivery of a virtual interrupt from a physical ISR to the VCPU causes an inter-processor interrupt (IPI), an additional physical interrupt that notifies a state change to the VCPU running on a different PCPU (Steps ①′ and ② in Figure 8.1(b)). Otherwise, a virtual interrupt is immediately delivered to the corresponding VCPU (Step ② in Figure 8.1(a)). When a VCPU receives a virtual interrupt, the currently executing task in that VCPU is halted and the corresponding ISR of the guest OS is executed. Each virtual interrupt has a unique priority π^{vi}_i given by the emulated PIC. Within each VCPU, the ISRs of lower-priority virtual interrupts can be preempted by those of higher-priority virtual interrupts, and virtual ISRs are not preemptible by tasks. As in most CPU architectures, each ISR executes an End-Of-Interrupt (EOI) instruction at the end to notify the completion of the ISR to the PIC. As the EOI is a privileged instruction called by the guest OS, it is trapped and emulated by the hypervisor while consuming the budget of the corresponding VCPU (Step ③ in Figure 8.1). A virtual interrupt is pending if it has been injected into the corresponding VCPU but its ISR has not yet been completed. A virtual interrupt I^{vi}_i is represented as follows:

$$I^{vi}_i := (C^{vi}_i, T^{vi}_i)$$

where,
• C^{vi}_i: the WCET of the ISR of I^{vi}_i
• T^{vi}_i: the minimum inter-arrival time of I^{vi}_i

²There might be some cases where virtual interrupts are generated as a result of polling at the hypervisor. Considering such a mixed use of interrupts and polling in real-time virtualization remains future work.

We consider a split interrupt handling model for guest OSs due to its wide acceptance in both real-time and non-real-time OSs. Under split interrupt handling, the ISR of a virtual interrupt performs the minimum amount of work and activates zero or more tasks to execute a deferred service routine (DSR) in the task context (Step ④ in Figure 8.1). Hence, the priorities of DSRs can be easily configured, in contrast to ISRs, and the majority of interrupt handling can be done with the desired priorities. We use D(I^{vi}_i) to denote the set of DSR tasks triggered by the ISR of a virtual interrupt I^{vi}_i. The minimum inter-arrival time of any task in D(I^{vi}_i) is therefore equal to or greater than T^{vi}_i. The response time of a virtual interrupt (or virtual interrupt handling time) is the time from the arrival of the virtual interrupt to the completion of the corresponding ISR and DSR. Lastly, we denote the sum of the WCETs of the ISR and DSRs of a virtual interrupt I^{vi}_i as:

$$\overline{C}^{vi}_i = C^{vi}_i + \sum_{\tau_j \in D(I^{vi}_i)} C_j$$

Definition 1. An interrupt-triggered execution flow in a virtualized environment is the sequence of executions from the arrival of a physical interrupt to the completion of the ISR and DSR of the corresponding virtual interrupt.

Definition 2. The total interrupt handling time is the amount of time to complete the corresponding interrupt-triggered execution flow.

Definition 3. An interrupt-triggered execution flow is serviceable, if its total interrupt handling time does not exceed the minimum inter-arrival times of the corresponding physical and virtual interrupts.

8.2 vINT Scheme

The problems with virtual interrupt handling described in Section 2.4.1 are caused by the fact that a virtual interrupt is handled by the same VCPU as the one used by other regular tasks. Motivated by this, we propose vINT that can conceptually split virtual interrupt

handling from the VCPU of regular tasks in an analyzable way, without modifying the guest OS code. vINT can be selectively used for a subset of virtual interrupts that cannot be serviced within their minimum inter-arrival times by default, or have a possibility of causing virtual interrupt storms. For convenience of explanation, we assume that all virtual interrupts are managed by vINT in Sections 8.2.1 and 8.2.2. In Section 8.2.3 we relax this assumption.

8.2.1 Pseudo-VCPU Abstraction

vINT uses a pseudo-VCPU abstraction to represent the resource requirement of the ISR and DSR of a virtual interrupt as a separate VCPU to the hypervisor. The pseudo-VCPU differs from its original VCPU in that it does not have an execution context. In other words, the use of the pseudo-VCPU introduces no additional processing core visible to the guest VM, which is often a strong requirement when hosting legacy guest OSs that may support only uniprocessors.

Each virtual interrupt can be exclusively associated with one pseudo-VCPU that is located on the same PCPU as its original VCPU. A pseudo-VCPU v_p is described by the same types of parameters as a regular VCPU: C^v_p and T^v_p. The replenishment period of a pseudo-VCPU v_p is equal to or greater than the minimum inter-arrival time of the associated virtual interrupt I^{vi}_i, i.e., T^v_p ≥ T^{vi}_i. The budget C^v_p of a pseudo-VCPU v_p associated with a virtual interrupt I^{vi}_i is assigned as follows:

$$C^v_p = \left\lceil \frac{T^v_p}{T^{vi}_i} \right\rceil \overline{C}^{vi}_i \qquad (8.1)$$

It is worth noting that, once a virtual interrupt is assigned its pseudo-VCPU, the budget of its original VCPU can be reduced because the virtual interrupt will be handled by using the budget of the pseudo-VCPU.

Prioritization of pseudo-VCPUs: One of our goals is to provide responsive interrupt handling time, which is challenging due to the VCPU-level preemption while handling a

virtual interrupt. To achieve this goal, vINT prioritizes pseudo-VCPUs over regular VCPUs. The priority of a pseudo-VCPU v_p associated with a virtual interrupt I^{vi}_i is assigned a priority of π^v_B + (π^v_o − 1) · L_o + π_{D(I^{vi}_i)}, where π^v_B is a base VCPU-priority level greater than that of any regular VCPU on the same PCPU, π^v_o is the priority of the original VCPU of I^{vi}_i, L_o is the number of priority levels for all DSR tasks in the original VCPU, and π_{D(I^{vi}_i)} is the priority difference between the highest-priority DSR task of I^{vi}_i and the highest-priority DSR task among all DSR tasks in the original VCPU. With this approach, the pseudo-VCPU v_p is not preempted by any regular VCPU, and the relative priority ordering of DSRs within the same original VCPU is preserved.
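The budget and priority assignment of a pseudo-VCPU can be summarized as below. The function names are hypothetical, and the budget uses the aggregate ISR-plus-DSR WCET as in Eq. (8.1) above.

```python
import math

def pseudo_vcpu_budget(T_p, T_vi, C_bar_vi):
    """Eq. (8.1): budget of the pseudo-VCPU for a virtual interrupt with
    minimum inter-arrival time T_vi and aggregate ISR+DSR WCET C_bar_vi."""
    return math.ceil(T_p / T_vi) * C_bar_vi

def pseudo_vcpu_priority(pi_B, pi_orig, L_orig, pi_D):
    """Priority of the pseudo-VCPU: a base level pi_B above every regular VCPU
    on the PCPU, offset by the original VCPU's priority pi_orig (with L_orig
    DSR priority levels per VCPU) and by the DSR priority difference pi_D."""
    return pi_B + (pi_orig - 1) * L_orig + pi_D
```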

8.2.2 Pseudo-VCPU Realization

As a pseudo-VCPU does not have an execution context, in its realization, the actual execution of the ISR and DSR of a virtual interrupt still happens within the execution context of their original VCPU. We now explain how vINT handles a virtual interrupt as if it was handled in its pseudo-VCPU.

DSR task priority adjustment: Since pseudo-VCPUs are assigned higher priorities than regular VCPUs, the executions of DSRs should not be preempted by regular tasks in the realization. vINT therefore statically adjusts the priority of each DSR task τj to

π_{B,v_o} + π_j, where π_{B,v_o} is a base task-priority level greater than that of any regular task in τ_j's original VCPU v_o, and π_j is the original priority of τ_j. Note that this priority adjustment is not needed if the priorities of DSR tasks are already higher than those of regular tasks in the original VCPU. In addition, since even closed-source, proprietary OSs provide an interface to configure task priorities, the priority adjustment does not violate the requirement of full virtualization.

Virtual interrupt injection: vINT maintains a counter for each pseudo-VCPU to indicate the number of virtual interrupts that can be handled by the pseudo-VCPU at that moment.

The maximum possible value of the counter for a pseudo-VCPU vp associated with a virtual

interrupt I^{vi}_i is given by ⌈T^v_p / T^{vi}_i⌉. When a virtual interrupt is generated, vINT checks the counter value of the corresponding pseudo-VCPU. If the counter is greater than zero, the counter is decremented by one and the virtual interrupt is injected into its original VCPU.

Otherwise, the injection of the virtual interrupt is delayed until the counter becomes greater than zero. The replenishment rule of the counter is similar to that of the VCPU budget.

Under the deferrable server policy, the counter is fully replenished at the start of every replenishment period of the pseudo-VCPU. Under the sporadic server policy, the counter is replenished by one at the time when the budget is replenished.
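A minimal sketch of the injection-counter rule described above, assuming the deferrable server policy for replenishment; the class and field names are illustrative, not the hypervisor's actual data structures.

```python
import math
from collections import deque

class InjectionCounter:
    """Per-pseudo-VCPU counter limiting how many virtual interrupts may be
    injected per replenishment period (deferrable-server-style replenishment)."""
    def __init__(self, T_pseudo, T_vi):
        self.limit = math.ceil(T_pseudo / T_vi)   # maximum counter value
        self.count = self.limit
        self.deferred = deque()                   # interrupts waiting for the counter

    def on_virtual_interrupt(self, irq, inject):
        if self.count > 0:
            self.count -= 1
            inject(irq)                           # inject into the original VCPU
        else:
            self.deferred.append(irq)             # delay until replenishment

    def on_replenishment(self, inject):
        self.count = self.limit                   # full replenishment each period
        while self.count > 0 and self.deferred:
            self.count -= 1
            inject(self.deferred.popleft())
```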

Virtual interrupt handling: We first consider a non-nested interrupt handling scenario.

When a virtual interrupt I^{vi}_i is injected, the original VCPU v_o should handle the ISR and DSR of I^{vi}_i by using the priority and budget of the corresponding pseudo-VCPU v_p. Hence, vINT immediately raises the priority of v_o to that of v_p, and lets v_o use the budget of v_p for the amount of C̄^{vi}_i. As the DSR tasks of I^{vi}_i have higher priorities than regular tasks, they are guaranteed to be executed as soon as the corresponding ISR finishes. When the VCPU v_o has consumed C̄^{vi}_i units of the budget of v_p, vINT restores the priority of v_o and lets v_o use its own budget afterwards. The ISR and DSR of I^{vi}_i may finish earlier than C̄^{vi}_i, and regular tasks may be executed while their VCPU is still using the budget and priority of the pseudo-VCPU of I^{vi}_i. However, this does not change the worst-case interference that can be imposed on other VCPUs.

We next consider a nested interrupt handling scenario. vINT exploits the following two factors to support nested interrupt handling with pseudo-VCPUs: (i) the hypervisor is aware of the set of all pending virtual interrupts in each VCPU, and (ii) the hypervisor traps an EOI instruction called at the end of each virtual ISR. When a new virtual interrupt is injected into a VCPU vo, vINT lets vo use the budget and priority of the pseudo-VCPU that is associated with the highest-priority virtual interrupt among all pending interrupts.

This is because the VCPU executes the ISR of the highest-priority pending interrupt first.

When the hypervisor catches an EOI from vo, vINT checks if there is another pending

interrupt. If so, vINT lets v_o use the budget and priority of the pseudo-VCPU of the highest-priority pending interrupt, and repeats this until there is no pending interrupt. If there is no pending interrupt, vINT now lets v_o use the budget and priority of the highest-priority pseudo-VCPU whose budget has not yet been used for the amount of C̄^{vi}_i by v_o to handle the injected interrupt I^{vi}_i. As the relative priorities of pseudo-VCPUs follow those of DSR tasks, this approach makes the sequence of pseudo-VCPU usage correspond to that of the DSR task executions. Figure 8.2 shows an example of nested interrupt handling with vINT. In this figure, the x-axis represents the passage of virtual time, so the activities of the hypervisor and other VCPUs are omitted. Tasks, VCPUs, and interrupts are ordered in increasing order of priority, e.g., I^{vi}_2 has a higher priority than I^{vi}_1.

Figure 8.2: vINT nested interrupt handling
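The selection rule used during nested interrupt handling can be sketched as a small decision function; the field names pseudo_prio and pseudo_vcpu are assumptions made for illustration.

```python
def pseudo_vcpu_after_eoi(pending, charged):
    """Pick the pseudo-VCPU whose budget and priority the original VCPU should
    use after an EOI is trapped by the hypervisor.

    pending : pending virtual interrupts of the VCPU
    charged : injected interrupts whose pseudo-VCPU budget has not yet been
              consumed for the full ISR+DSR amount
    Both are lists of dicts with illustrative field names."""
    if pending:
        # The VCPU will execute the ISR of the highest-priority pending interrupt.
        return max(pending, key=lambda irq: irq["pseudo_prio"])["pseudo_vcpu"]
    if charged:
        # Otherwise charge the highest-priority pseudo-VCPU that still has
        # unconsumed budget for an already-injected interrupt.
        return max(charged, key=lambda irq: irq["pseudo_prio"])["pseudo_vcpu"]
    return None  # fall back to the original VCPU's own budget and priority
```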

8.2.3 Selective Use of vINT

We now relax our assumption that all virtual interrupts are managed by vINT. If a virtual interrupt is not managed by vINT, it is not associated with a pseudo-VCPU. The priorities of its DSR tasks remain unchanged. However, the presence of such an unmanaged virtual interrupt affects the pseudo-VCPU budgets of virtual interrupts managed by vINT. Consider

a virtual interrupt I^{vi}_i associated with a pseudo-VCPU v_p. If there is any virtual interrupt not managed by vINT in the original VCPU of I^{vi}_i, the budget C^v_p of v_p is assigned by:

$$C^v_p = \left\lceil \frac{T^v_p}{T^{vi}_i} \right\rceil \left( \overline{C}^{vi}_i + \sum_{I^{vi}_j \in V(I^{vi}_i) \,\wedge\, pseudo(I^{vi}_j) = \emptyset} \left\lceil \frac{T^{vi}_i}{T^{vi}_j} \right\rceil C^{vi}_j \right) \qquad (8.2)$$

where V(I^{vi}_i) is the original VCPU of I^{vi}_i, and pseudo(I^{vi}_j) is a function returning the pseudo-VCPU of I^{vi}_j if it exists, and ∅ otherwise. The second term in the parenthesis of Eq. (8.2) is an extra budget for the executions of the ISRs of virtual interrupts not managed by vINT. Since those ISRs may block the handling of I^{vi}_i in the realization, the extra budget allows those ISRs to be executed with the budget and priority of I^{vi}_i's pseudo-VCPU. Therefore, when an instance of I^{vi}_i is injected into its original VCPU v_o, vINT lets v_o use the budget and priority of the pseudo-VCPU v_p for the sum of the terms in the parenthesis of Eq. (8.2), instead of only C̄^{vi}_i.
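A sketch of Eq. (8.2), assuming the same aggregate ISR-plus-DSR WCET as in Eq. (8.1); the function name is hypothetical.

```python
import math

def pseudo_vcpu_budget_ext(T_p, T_i, C_bar_i, unmanaged):
    """Eq. (8.2): pseudo-VCPU budget when the original VCPU also receives
    virtual interrupts not managed by vINT.
    unmanaged: list of (C_vi_j, T_vi_j) pairs for those unmanaged interrupts."""
    extra = sum(math.ceil(T_i / T_j) * C_j for (C_j, T_j) in unmanaged)
    return math.ceil(T_p / T_i) * (C_bar_i + extra)
```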

8.3 vINT Timing Analysis

In this section, we first analyze VCPU and task schedulability in the presence of physical and virtual interrupts. Then, we analyze interrupt handling time with and without vINT.

For convenience, we use the following notation in this section:

• P(v_i) and P(I^{pi}_j): PCPUs for a VCPU v_i and for a physical interrupt I^{pi}_j, respectively
• V(τ_i) and V(I^{vi}_j): Original VCPUs for a task τ_i and for a virtual interrupt I^{vi}_j, respectively
• pseudo(τ_i) and pseudo(I^{vi}_j): Pseudo-VCPUs for a task τ_i and for a virtual interrupt I^{vi}_j, respectively, if they exist; ∅ otherwise

8.3.1 VCPU and Task Schedulability

The schedulability of a VCPU vi can be determined by the following recurrence equation:

$$W^{v,n+1}_i = C^v_i + \sum_{I^{pi}_u \in P(v_i)} \left\lceil \frac{W^{v,n}_i}{T^{pi}_u} \right\rceil C^{pi}_u + \sum_{v_h \in P(v_i) \,\wedge\, \pi^v_h > \pi^v_i} \left\lceil \frac{W^{v,n}_i + J^v_h}{T^v_h} \right\rceil C^v_h \qquad (8.3)$$

where W^{v,n}_i is the worst-case response time (WCRT) of a VCPU v_i at the n-th iteration (W^{v,0}_i = C^v_i), π^v_i is the priority of a VCPU v_i, and J^v_h is a release jitter (J^v_h = T^v_h − C^v_h for the deferrable server policy and J^v_h = 0 for the sporadic server policy). Eq. (8.3) is based on the iterative response time test in [142]. It terminates when W^{v,n+1}_i = W^{v,n}_i, and the VCPU v_i is schedulable if its WCRT does not exceed its period, i.e., W^{v,n}_i ≤ T^v_i. In this equation, the second term represents the interference from the ISRs of physical interrupts during the execution of v_i.
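The iteration of Eq. (8.3) can be sketched as follows; the tuple-based inputs are assumptions, and the iteration stops as soon as the response time exceeds the VCPU period.

```python
import math

def vcpu_wcrt(C_i, T_i, phys_irqs, hp_vcpus, sporadic=False):
    """Iterate Eq. (8.3): WCRT of a VCPU with budget C_i and period T_i.
    phys_irqs: (C_pi, T_pi) of physical interrupts pinned to the same PCPU.
    hp_vcpus : (C_v, T_v) of higher-priority VCPUs on the same PCPU."""
    W = C_i
    while True:
        W_next = (C_i
                  + sum(math.ceil(W / T_pi) * C_pi for (C_pi, T_pi) in phys_irqs)
                  + sum(math.ceil((W + (0 if sporadic else T_v - C_v)) / T_v) * C_v
                        for (C_v, T_v) in hp_vcpus))
        if W_next == W:
            return W            # schedulable iff W <= T_i
        if W_next > T_i:
            return None         # exceeds the period: deemed unschedulable
        W = W_next
```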

For task schedulability, we need to consider virtual interrupts. If a virtual interrupt is managed by vINT, regular tasks do not experience any direct interference from that virtual interrupt because it is handled by using the budget of its pseudo-VCPU. On the other hand, if a virtual interrupt is not managed by vINT, it may be handled by the budget of the same VCPU as regular tasks. Hence, we can extend the task response-time test under hierarchical scheduling given in [92] as follows to check the schedulability of a regular task

τ_i in a VCPU v_k:

$$W^{n+1}_i = C_i + \sum_{\substack{\tau_h \in V(\tau_i) \,\wedge\, \pi_h > \pi_i \\ \wedge\, pseudo(\tau_h) = \emptyset}} \left\lceil \frac{W^n_i + J_h}{T_h} \right\rceil C_h + \left\lceil \frac{W^n_i + C^v_k}{T^v_k} \right\rceil (T^v_k - C^v_k) + \sum_{\substack{I^{vi}_u \in V(\tau_i) \,\wedge \\ pseudo(I^{vi}_u) = \emptyset}} \left\lceil \frac{W^n_i + J^{vi}_u}{T^{vi}_u} \right\rceil C^{vi}_u \qquad (8.4)$$

where W^n_i is the WCRT of task τ_i at the n-th iteration (W^0_i = C_i), π_i is the priority of τ_i, and J_h and J^{vi}_u are the release jitters of a task τ_h and a virtual interrupt I^{vi}_u, respectively (J_h = J^{vi}_u = T^v_k − C^v_k). It terminates when W^{n+1}_i = W^n_i, and the task τ_i is schedulable if its WCRT does not exceed its deadline, i.e., W^n_i ≤ D_i. Note that the schedulability result for a task from Eq. (8.4) is valid only if the task's VCPU passes the VCPU schedulability test given in Eq. (8.3). The last summing term of Eq. (8.4) captures the interference from the ISRs of virtual interrupts that are not managed by vINT. In addition, since Eq. (8.4) conservatively assumes that the budget of the task's VCPU is available at the latest time possible within each period (T^v_k − C^v_k), the interference from physical interrupts does not need to be considered in Eq. (8.4).
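Similarly, Eq. (8.4) can be iterated as below for a regular task in a VCPU v_k; again the inputs are assumed tuple lists, and the sketch returns None when the deadline is exceeded.

```python
import math

def task_wcrt(C_i, D_i, hp_tasks, C_vk, T_vk, unmanaged_virqs):
    """Iterate Eq. (8.4): WCRT of a regular task in VCPU v_k.
    hp_tasks        : (C_h, T_h) of higher-priority tasks in v_k without a pseudo-VCPU
    unmanaged_virqs : (C_vi, T_vi) of v_k's virtual interrupts not managed by vINT
    The release jitter J = T_vk - C_vk models the deferrable-server supply."""
    J = T_vk - C_vk
    W = C_i
    while True:
        W_next = (C_i
                  + sum(math.ceil((W + J) / T_h) * C_h for (C_h, T_h) in hp_tasks)
                  + math.ceil((W + C_vk) / T_vk) * (T_vk - C_vk)
                  + sum(math.ceil((W + J) / T_vi) * C_vi
                        for (C_vi, T_vi) in unmanaged_virqs))
        if W_next == W:
            return W            # schedulable iff W <= D_i
        if W_next > D_i:
            return None
        W = W_next
```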

8.3.2 Interrupt Handling Time

The total interrupt handling time can be bounded by the sum of (i) the WCRT of the ISR of a physical interrupt, (ii) the WCRT of the ISR of a physical IPI if the target VCPU is on a different PCPU, and (iii) the WCRT of the ISR and DSR of the corresponding virtual interrupt. For factors (i) and (ii), the WCRT of the ISR of a physical interrupt I^{pi}_i is bounded by the following recurrence equation:

$$W^{pi,n+1}_i = C^{pi}_i + \sum_{I^{pi}_h \in P(I^{pi}_i) \,\wedge\, \pi^{pi}_h > \pi^{pi}_i} \left\lceil \frac{W^{pi,n}_i}{T^{pi}_h} \right\rceil C^{pi}_h \qquad (8.5)$$

where W^{pi,n}_i is the WCRT of a physical interrupt I^{pi}_i at the n-th iteration (W^{pi,0}_i = C^{pi}_i), and π^{pi}_i is the priority of I^{pi}_i.

We now consider the last factor. When vINT is used, as shown in Figure 8.2, the ISR and DSR of a virtual interrupt may be blocked by the ISRs of virtual interrupts that are associated with lower-priority pseudo-VCPUs and executed in the execution context of the same original VCPU. The virtual interrupt may also be blocked by other virtual interrupts that are not managed by vINT. For a virtual interrupt I^{vi}_j associated with a pseudo-VCPU v_p, the maximum blocking time from such virtual interrupts during a time interval t is given by:

$$B_{p,j}(t) = \sum_{I^{vi}_u \in V(I^{vi}_j) \,\wedge\, (pseudo(I^{vi}_u) = \emptyset \,\vee\, \pi^{p}_{pseudo(I^{vi}_u)} < \pi^{p}_j)} \left\lceil \frac{t}{T^{vi}_u} \right\rceil C^{vi}_u \qquad (8.6)$$

where π^p_{pseudo(I^{vi}_u)} is the priority of I^{vi}_u's pseudo-VCPU. In addition, the worst case happens when all physical interrupts on the same PCPU arrive with their minimum inter-arrival times and all higher-priority VCPUs fully consume their budgets. The WCRT of a virtual interrupt I^{vi}_j associated with a pseudo-VCPU v_p is therefore bounded by:

$$W^{vi,n+1}_j = \overline{C}^{vi}_j + B_{p,j}(W^{vi,n}_j) + \sum_{I^{pi}_u \in P(v_p)} \left\lceil \frac{W^{vi,n}_j}{T^{pi}_u} \right\rceil C^{pi}_u + \sum_{v_h \in P(v_p) \,\wedge\, \pi^v_h > \pi^v_p} \left\lceil \frac{W^{vi,n}_j + J^v_h}{T^v_h} \right\rceil C^v_h \qquad (8.7)$$

where W^{vi,n}_j is the WCRT of a virtual interrupt I^{vi}_j (W^{vi,0}_j = C̄^{vi}_j). Note that Eq. (8.7) is similar to the VCPU schedulability test given in Eq. (8.3), except for the blocking term. This is because the pseudo-VCPU of a virtual interrupt is guaranteed to have enough budget to handle one instance of a virtual interrupt, and there is no other task interfering with the execution of the ISR and DSR of the virtual interrupt in the pseudo-VCPU.

When vINT is not used, the response time of a virtual interrupt should be captured by considering the executions of other tasks within the same VCPU. Therefore, the WCRT of

a virtual interrupt I^{vi}_j in a VCPU v_k is bounded by:

$$W^{vi,n+1}_j = \overline{C}^{vi}_j + \sum_{\substack{\tau_h \in V(I^{vi}_j) \,\wedge\, \pi_h > \check{\pi}_D \\ \wedge\, pseudo(\tau_h) = \emptyset}} \left\lceil \frac{W^{vi,n}_j + J_h}{T_h} \right\rceil C_h + \left\lceil \frac{W^{vi,n}_j + C^v_k}{T^v_k} \right\rceil (T^v_k - C^v_k) + \sum_{\substack{I^{vi}_u \in V(I^{vi}_j) \,\wedge\, u \neq j \\ \wedge\, pseudo(I^{vi}_u) = \emptyset}} \left\lceil \frac{W^{vi,n}_j + J^{vi}_u}{T^{vi}_u} \right\rceil C^{vi}_u \qquad (8.8)$$

where π̌_D is the priority of the lowest-priority task in D(I^{vi}_j). Note that this equation is similar to Eq. (8.4), which captures the WCRT of a task.

Table 8.2: Base parameters for our experiments

    Parameter                                            Value
    Number of PCPUs                                      4
    Number of VCPUs per PCPU                             3
    Number of physical interrupts per PCPU               6
    Number of virtual interrupts per VCPU                2
    VCPU replenishment period                            10 msec
    Minimum inter-arrival time of a physical interrupt   [5, 10] msec
    Minimum inter-arrival time of a regular task         [100, 500] msec
    WCET of ISR of a physical/virtual interrupt          [5, 10] µsec
    WCET of DSR of a virtual interrupt                   [10, 50] µsec
    Number of regular tasks per VCPU                     3
    Number of DSR tasks per VCPU                         2
    Task set utilization per VCPU                        10%

8.4 Evaluation

In this section, we first empirically investigate the performance characteristics and benefits of vINT, and then show its effects on a real hardware platform.

8.4.1 Experimental Setup

We consider the following schemes in our experiments: deferrable server without vINT

(DSbase), sporadic server without vINT (SSbase), deferrable server with vINT (DSvINT), and sporadic server with vINT (SSvINT). We use randomly-generated task sets and interrupt sets to compare these schemes on how many task sets could be schedulable and how many interrupt sets could be serviced on a timely basis.

Since, in practice, vINT can be selectively applied to a subset of virtual interrupts that cannot be serviced within their minimum inter-arrival times by the baseline scheme, our experiments focus only on interrupts with short inter-arrival times. Table 8.2 lists the base parameters we use for our experiments. For each experimental setting, we first generate

PCPUs, VCPUs, physical interrupts, and tasks and virtual interrupts for each VCPU based on the defined parameters. Each virtual interrupt is exclusively associated with one

physical interrupt in a random manner, and the minimum inter-arrival time of each virtual interrupt is set equal to that of its associated physical interrupt. For each VCPU, the task set utilization per VCPU is split into k random-sized pieces, where k is the number of tasks per VCPU. The size of each piece becomes the utilization of the corresponding task, and the WCET of each task is calculated by multiplying its utilization by its minimum inter-arrival time. For DSvINT and SSvINT, we create a pseudo-VCPU for each virtual interrupt with a period equal to the minimum inter-arrival time of the corresponding virtual interrupt and with a budget determined by Eq. (8.2). VCPUs and tasks are assigned unique priorities by using the Rate-Monotonic Scheduling (RMS) policy [127], with an arbitrary tie-breaking rule. The priorities of physical and virtual interrupts are assigned randomly. Once this is done, we finally determine the VCPU budget value for each scheme. Starting from a value equal to the VCPU period, the VCPU budget for each scheme is decreased by 1 µsec until all VCPUs pass the VCPU schedulability test given in Eq. (8.3).³

We generate 10,000 task sets and 10,000 interrupt sets for each experimental setting.

The metrics used are: (i) the percentage of schedulable task sets where all tasks pass the schedulability test given in Eq. (8.4), and (ii) the percentage of serviceable interrupt sets where all interrupt-triggered execution flows are serviceable, checked by Eqs. (8.5), (8.7) and (8.8).

8.4.2 Results

We explore three main factors that affect task schedulability and interrupt serviceability in a virtualized environment: (i) the minimum inter-arrival time of interrupts, (ii) the VCPU period, and (iii) the WCET of interrupt handlers.

³Considering the time-unit granularity used in Table 8.2, the step size of 1 µsec is fine-grained enough to find the maximum-possible VCPU budget for each scheme in our experiments.

Figure 8.3: Results with short interrupt inter-arrival time (percentages of schedulable task sets and serviceable interrupt sets, as the minimum interrupt period decreases from 2.0 to 0.1 msec)

Figure 8.4: Results with long interrupt inter-arrival time (percentage of serviceable interrupt sets, as the minimum interrupt period decreases from 20 to 10 msec)

Minimum inter-arrival time of interrupts: Figure 8.3 shows the percentages of schedulable task sets and serviceable interrupt sets as the minimum inter-arrival time of interrupts decreases. Each point k on the x-axis indicates that the minimum inter-arrival time of each interrupt is in the range [k, k+0.5] msec. In general, the sporadic server policy (SS) performs better than the deferrable server policy (DS). This is because SS has zero release jitter and allows assigning larger budget values to VCPUs than DS. vINT has benefits in both task scheduling and interrupt handling. DSvINT and SSvINT schedule more task sets than DSbase and

SSbase, respectively. Especially, when the range is [0.6, 1.1] msec, DSvINT schedules 67% more task sets than DSbase. The benefit is more significant in interrupt handling. While the schemes without vINT service 0% of interrupt sets in all cases, the schemes with vINT service more than 99% of interrupt sets until the range reaches [0.8, 1.3] msec.

When vINT is not used, only the interrupts with slightly longer inter-arrival times can be serviced. Figure 8.4 depicts the results. In this figure, each point k on the x-axis

166 100

80

60 DSbase 40 SSbase DSvINT 20 SSvINT

Serviceable interrupt sets (%) 0 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 VCPU period (msec)

Figure 8.5: Results with the change of VCPU period

100

80

60 DSbase 40 SSbase DSvINT 20 SSvINT

Schedulable task sets (%) 0 1 2 3 4 5 6 7 8 9 10 11 v vi Pseudo-VCPU period T / Virtual interrupt period T

Figure 8.6: Results with the change of pseudo-VCPU period represents the minimum inter-arrival time of each interrupt in the range of [k, k + 5] msec.

As all the schemes schedule 100% of task sets in all cases, we only display the percentage of serviceable interrupt sets in this figure. When the range reaches [13, 18] msec, DSbase services less than 1% of interrupt sets. SSbase performs better than DSbase, but services less than 2% of interrupt sets when the range becomes [11, 16] msec.

VCPU periods: Since the interrupt handling time is largely affected by the VCPU period when vINT is not used, we compare in Figure 8.5 the percentage of serviceable interrupt sets as the VCPU period increases. All the schemes could schedule 100% of task sets with all VCPU period values depicted in this figure. The schemes with vINT also show 100% of serviceable interrupt sets in all cases. However, without vINT, the percentage drops significantly when the VCPU period is longer than 3.5 msec.

We have also evaluated the impact of the pseudo-VCPU period. Figure 8.6 shows the percentage of schedulable task sets as the pseudo-VCPU period increases. Each number shown on the x-axis of this figure represents the ratio of the pseudo-VCPU period to the minimum inter-arrival time of interrupts. Hence, a larger value on the x-axis means a longer pseudo-VCPU period. As the pseudo-VCPU period increases, task schedulability under DSvINT decreases. This is because DS has a release jitter equal to T^v − C^v. Since vINT assigns higher priorities to pseudo-VCPUs, the larger jitter values of pseudo-VCPUs under DS effectively reduce the amount of budget assigned to regular VCPUs. In contrast, SS shows no performance degradation because it has zero release jitter.

Figure 8.7: Results with the change of physical ISR length (percentages of schedulable task sets and serviceable interrupt sets as the WCET of the physical ISR increases from 0.05 to 0.55 msec)

WCET of interrupt handlers: We now evaluate the impact of the length of interrupt handlers. Figure 8.7 and Figure 8.8 show the results when the WCET of a physical ISR, and the sum of the WCETs of virtual ISR and DSR change, respectively. As the WCET increases, both the percentages of schedulable task sets and serviceable interrupt sets decrease. In case of increasing the WCETs of virtual ISRs and DSRs, the schemes with vINT show lower performance in task schedulability than the schemes without vINT, but provide significantly higher performance in interrupt handling. This is mainly due to the fact that vINT creates pseudo-VCPUs and prioritizes them over regular VCPUs in order to reduce interrupt handling time.

In summary, vINT achieves timely interrupt handling while providing as good task schedulability as when it is not used in most cases. The benefit of vINT multiplies if the

168 100

80

60 DSbase 40 SSbase DSvINT 20 SSvINT Schedulable task sets (%) 0 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60

100

80

60 DSbase 40 SSbase DSvINT 20 SSvINT

Serviceable interrupt sets (%) 0 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 Sum of WCETs of virtual ISR and DSRs (msec)

Figure 8.8: Results with the change of virtual ISR and DSR length inter-arrival time of interrupts is short. Especially, when the minimum inter-arrival time of interrupts is much shorter than the period of VCPUs, the system with vINT outperforms the system without vINT in both task scheduling and interrupt servicing.

8.4.3 Case Study: vINT on KVM Hypervisor

We present a case study demonstrating the effects of vINT by using our implementation on the KVM hypervisor. We chose KVM because it is open-source software and widely used in real-time virtualization studies [114, 116, 165]. Also, it is useful to observe the overall performance impact of vINT, which can be applied to commercial real-time hypervisors.

Implementation: We have implemented a prototype version of vINT on the KVM hypervisor [158] of the latest version of Linux/RK [138, 139].⁴ The KVM of Linux/RK allows the host machine to run multiple guest VMs with the deferrable server policy as the VCPU budget replenishment policy. We use an unmodified Linux kernel v3.10.39 as a guest OS.

We have applied vINT to the pass-through PCI device management of KVM. Note that

⁴Linux/RK is available at https://rtml.ece.cmu.edu/redmine/projects/rk.

PCI pass-through devices do not involve QEMU in interrupt handling. Hence, once a PCI device is assigned to a guest VM in pass-through mode, all physical interrupts generated by the device are handled by the interrupt handler of KVM, and the resulting virtual interrupts are then delivered to the corresponding guest VM, without any intervention from QEMU.

Table 8.3: Implementation cost of vINT on the KVM hypervisor

    Primitive                               Avg (µsec)  Max (µsec)
    Switching btw. pseudo and reg. VCPUs    0.703       1.192
    Pseudo-VCPU budget accounting           0.341       1.265
    Pseudo-VCPU budget replenishment        0.621       3.045

Figure 8.9: Cumulative distribution of Netperf UDP round-trip latency ((a) idle; (b) MPlayer only; (c) both MPlayer and busyloop)

Table 8.3 lists the implementation costs of vINT. The target system used is equipped with an Intel Core i7-2600 3.4 GHz quad-core processor and a TP-Link PCI Gigabit NIC using a RTL8169 controller. To reduce measurement inaccuracies, we have disabled the simultaneous multithreading and dynamic clock frequency scaling features of the processor.

Case Study: The target system hosts one guest VM, which has four VCPUs: {v1, v2, v3, v4}.

Each VCPU is statically assigned to a PCPU with the same index number, i.e. vi on Core i. The Gigabit NIC of the target system is assigned to the guest VM in pass-through mode.

Figure 8.10: Netperf TCP throughput (MBits/sec as the VCPU utilization increases)

Figure 8.11: MPlayer fps under virtual interrupt storms (frames per second as the VCPU utilization increases)

The physical interrupt of the NIC is statically pinned to Core 1 and the corresponding virtual interrupt is pinned to the VCPU v1. The QEMU process is assigned the highest priority to prevent unexpected delays from QEMU device emulation, although it is not involved in the critical path of interrupt handling in pass-through mode. In our case study, we only focus on v1 on Core 1; the other VCPUs on the other cores are kept idle. When vINT is not used, the VCPU v1 is assigned 4 msec of budget and 10 msec of replenishment period (40% VCPU utilization). When vINT is used, both v1 and a pseudo-VCPU created for NIC interrupts are each assigned 2 msec of budget and 10 msec of replenishment period

(total 40% VCPU utilization).

We use three applications in our case study: Netperf [166], MPlayer [167] and busyloop.

Netperf is a network benchmark consisting of sender and receiver tasks. The Netperf sender task runs natively on a remote system, which has no other workload and is connected to the target system with a direct Ethernet connection. The Netperf receiver task runs in the VCPU v1. When v1 receives a virtual interrupt of the NIC, the ISR of the virtual interrupt activates the softirq task of the guest Linux kernel, which in turn activates the

Netperf receiver task. Both the softirq and Netperf receiver tasks are assigned the highest

real-time priority. MPlayer is an open-source movie player. MPlayer runs in v1, with a real-time priority lower than the Netperf receiver task, and decodes an MPEG2 video stream with a 1920x1080 (1080p) frame size at 29.97 fps. Busyloop is a background task that continuously consumes CPU time, and runs in v1 with the lowest priority.

We first compare interrupt handling time with and without vINT. For this purpose, we use the UDP round-trip latency test of Netperf, which is highly affected by the system’s interrupt handling time. Figure 8.9 shows the cumulative distribution of the Netperf UDP round-trip latency. “Baseline” and “Baseline_1ms” show the results without vINT, and

“vINT” shows the results with vINT. Baseline and vINT use the aforementioned VCPU parameters for v1. Baseline_1ms uses 0.4 msec of budget and 1 msec of replenishment period for v1, which results in the same VCPU utilization as Baseline. As shown in the

figure, Baseline and Baseline_1ms are significantly affected by the executions of lower-priority tasks within the same VCPU, but vINT is nearly unaffected. Especially, when both MPlayer and busyloop are running, vINT handles 95% of round-trips in less than 200 µsec, while Baseline and Baseline_1ms handle only 50% and 2% of round-trips in 200 µsec, respectively. Interestingly, Baseline_1ms would be expected to outperform Baseline due to its shorter replenishment period, but the results are the opposite due to the higher overhead incurred.

Next, we identify the impact of vINT overhead on the throughput of NIC. Figure 8.10 shows the results of the TCP throughput test of Netperf with and without vINT as the

VCPU utilization increases. The VCPU period is 10 msec in all cases. Only the budget varies from 2 msec to 6 msec. In case of vINT, each point on the x-axis represents the utilization of the pseudo-VCPU. As can be seen, there is no noticeable difference between

Baseline and vINT in TCP throughput. This implies that the impact of the overhead induced by vINT is either negligible or acceptably small.

Lastly, we demonstrate the effect of vINT in protecting a real-time task against a virtual interrupt storm. Figure 8.11 compares the frame rate of MPlayer with and without vINT, in the presence of a virtual interrupt storm which is generated by the TCP throughput

test of Netperf. In case of vINT, each point on the x-axis represents the total utilization of original and pseudo-VCPUs. Hence, the budget of the original VCPU varies from 4 msec to 8 msec for Baseline, and from 2 msec to 6 msec for vINT (the pseudo-VCPU budget is unchanged). When vINT is not used, the frame rate of MPlayer is severely degraded by a virtual interrupt storm, even when 80% of VCPU utilization is assigned. In contrast, when vINT is used, the frame rate is very close to when there is no interrupt storm. This result shows that vINT can effectively protect the execution of a real-time task against the occurrence of a virtual interrupt storm in a virtualized environment.

8.5 Summary

In this chapter, we presented vINT, an interrupt handling scheme to provide responsive and enforced interrupt handling in a virtualized environment. We introduced our analyses on interrupt handling time, and the schedulability of VCPUs and tasks with and without vINT. Experimental results show that vINT yields significant improvements in interrupt handling performance. For example, a system with vINT services 99% of interrupt sets while a system without vINT cannot service any interrupt set. Our case study on the KVM hypervisor, chosen for convenience, also shows the effects of vINT in reducing interrupt handling time and protecting against interrupt storms. For example, a system with vINT handles 95% of Ethernet round-trips in 200 µsec, and a system without vINT handles only

50% of round-trips during that time. Under interrupt storms, the frame rate of MPlayer with vINT is nearly unaffected, while the frame rate without vINT drops to one-fifth of the original.

Chapter 9

Predictable GPGPU Access Control

In this chapter, we first review the use of a real-time synchronization protocol for tasks accessing a general-purpose GPU (graphics processing unit) on a multi-core platform, and characterize the limitations of this approach. Among a variety of real-time synchronization protocols, we focus on the multiprocessor priority ceiling protocol (MPCP) [12, 13] because it is designed for partitioned fixed-priority scheduling, which we use in our work. Then, we present our new GPU access control technique, called a server-based approach. Our proposed server-based approach provides a dedicated GPU server task that receives GPU access requests from other tasks and handles the requests on their behalf. Unlike the synchronization-based approach, the server-based approach allows tasks to suspend during their GPU executions, yielding significant CPU utilization benefits. The server-based approach can also reduce the response time of a task using a GPU, compared to the synchronization-based approach. Although we have focused on a GPU in this work, our approach can be used for other types of computational accelerators, such as DSPs.

We provide the schedulability analysis of tasks under our server-based approach, which accounts for the overhead of the use of the GPU server task. Experimental results indicate that, when the overhead is reasonable, the server-based approach significantly outperforms the synchronization-based approach, with as much as 66% more tasksets being schedulable.

The background and related earlier work on GPGPU management were presented in

Section 2.5. The system model, including assumptions and notation for tasks and GPU access segments, can be found in Chapter 3.

The rest of this chapter is organized as follows. Section 9.1 reviews the use of the synchronization-based approach for GPU access control. Section 9.2 presents our proposed server-based approach. Section 9.3 provides detailed evaluation. Section 9.4 summarizes this chapter.

9.1 Synchronization-based GPU Access Control

The synchronization-based approach models the GPU as a global mutually-exclusive resource and the GPU access segments of tasks as critical sections. A single mutex is used for protecting such GPU critical sections. Hence, under the synchronization-based approach, a task can only enter its GPU access segment when the mutex for the GPU is not held by any other task. If the mutex is already held by another task, the task is inserted into the waiting list of the mutex and waits until it can acquire the mutex.

Since our focus is on the multiprocessor priority ceiling protocol (MPCP), we shall briefly review the definition of MPCP below. More details on MPCP can be found in

[12, 13, 74].

1. When a task τi requests access to a global resource Rk, the resource Rk can be granted to τi if it is not held by another task.

2. While a task τi is holding a resource for its global critical section (gcs), the priority of τi is raised to πB + πi, where πB is a base task-priority level greater than that of any task in the system, and πi is the normal priority of τi. This priority boosting is referred to as the global priority ceiling of the gcs of τi.

3. When a task τi requests access to a resource Rk, the resource Rk cannot be granted to τi if it is already held by another task. In this case, the task τi is inserted into the waiting list of the mutex for Rk.

Figure 9.1: Task execution pattern under the synchronization-based approach

4. When a global resource Rk is released and the waiting list of the mutex for Rk is not empty, the highest-priority task in the waiting list is dequeued from the list and is granted the resource Rk.
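To make the four MPCP rules above concrete, here is a minimal Python sketch of a single MPCP-style global mutex. The class and field names, the convention that a larger value means a higher priority, and the in-process data structures are illustrative assumptions of ours, not part of MPCP itself or of any implementation described in this dissertation.

import heapq
from dataclasses import dataclass, field

@dataclass
class Task:
    tid: int
    normal_priority: int          # larger value = higher priority (assumption)
    priority: int = field(default=0)

class MPCPMutex:
    """Minimal model of one MPCP global mutex; illustrative only."""

    def __init__(self, base_priority):
        self.base_priority = base_priority   # pi_B, above every normal task priority (Rule 2)
        self.holder = None
        self.waiting = []                    # priority-ordered waiting list (Rule 3)

    def lock(self, task):
        if self.holder is None:              # Rule 1: grant the free resource immediately
            self._grant(task)
            return True
        heapq.heappush(self.waiting, (-task.normal_priority, task.tid, task))
        return False                         # caller blocks (busy-waits) until granted

    def unlock(self, task):
        task.priority = task.normal_priority # leaving the gcs: restore the normal priority
        self.holder = None
        if self.waiting:                     # Rule 4: hand over to the highest-priority waiter
            _, _, nxt = heapq.heappop(self.waiting)
            self._grant(nxt)

    def _grant(self, task):
        self.holder = task
        task.priority = self.base_priority + task.normal_priority  # Rule 2: global priority ceiling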

9.1.1 Limitations of Synchronization-based Approach

As discussed in Section 2.5.1, each GPU access segment contains various operations, including data copies, notifications, and the actual GPU code execution. Specifically, a task may suspend during the GPU code execution to save CPU utilization. However, under the synchronization-based approach, any task in its GPU access segment should busy-wait for any operation conducted on the GPU in order to ensure predictability. This is because each

GPU access segment is modeled as a critical section, and real-time synchronization protocols including MPCP commonly assume that (i) a critical section is executed entirely on the

CPU, and (ii) there is no suspension during the execution of the critical section. Figure 9.1 shows the execution pattern of a GPU-using task under the synchronization-based approach.

The entire GPU access segment is protected by a mutex. Hence, the task should hold the mutex to enter its GPU access segment. The task releases the mutex when it leaves the GPU access segment. While the GPU code execution happens on the GPU, the task consumes CPU time because of the busy-waiting requirement of the synchronization-based approach. As the GPU execution time increases, the CPU utilization loss under the

synchronization-based approach is therefore expected to increase.

Figure 9.2: Example of the synchronization-based approach

There is another issue with the synchronization-based approach: the priority boosting mechanism of the synchronization-based approach introduces priority inversion that is unnecessarily long for tasks accessing a GPU. We describe this issue with the example illustrated in Figure 9.2. In this figure, there are three tasks, τh, τm, and τl, allocated to two CPU cores, Cores 1 and 2. Task τh is a high-priority task, τm is a medium-priority task, and τl is a low-priority task. Each task has one GPU access segment that is executed after a normal execution segment of one time unit. Each of τh and τm has a GPU access segment of three time units, and τl has a GPU access segment of four time units. The GPU access segment of each task is followed by another normal execution segment of one time unit. In this figure, τl is released at time 0 and makes a GPU request at time 1. Since there is no other task using the GPU at that point, τl can hold the mutex for the GPU and enter its GPU access segment. Then, τl executes with the global priority ceiling associated with the mutex. Tasks τm and τh are released at time 2 and 3, respectively. They make

GPU requests at time 3 and 4, but the GPU cannot be granted to any of them because

it is already held by τl. At time 5, τl releases the GPU and τh holds the GPU because it has higher priority than τm. At time 8, τh finishes its GPU access segment and releases the GPU. Then, the task τm holds the GPU and enters its GPU access segment with the global priority ceiling. This causes τm to preempt the normal execution segment of τh.

Hence, although the majority of τm’s GPU access segment merely performs busy-waiting, the execution of the normal segment of τh is delayed until the GPU access segment of τm

finishes. Finally, τh completes its normal execution segment at time 12 and the response time of τh is 9 in this example. In the next section, we will present our new approach to address these issues.

9.1.2 Schedulability Analysis

We review the task schedulability analysis under the synchronization-based approach with

MPCP. The analysis described here was originally developed by Lakshmanan et al. [85] and incorporates corrections given by Bletsas et al. [163] and Huang et al. [168].

The worst-case response time of a task τi under the synchronization-based approach with MPCP is given by the following recurrence equation:

$$W_i^{n+1} = C_i + G_i + B_i^r + \sum_{\tau_h \in P(\tau_i) \wedge \pi_h > \pi_i} \left\lceil \frac{W_i^n + \{W_h - (C_h + G_h)\}}{T_h} \right\rceil (C_h + G_h) + (\eta_i + 1) \sum_{\tau_l \in P(\tau_i) \wedge \pi_l < \pi_i \wedge \eta_l > 0} \max_{1 \le u \le \eta_l} G_{l,u} \quad (9.1)$$

where $B_i^r$ is the remote blocking time for τi, $P(\tau_i)$ is the CPU core where τi is allocated, and πi is the priority of τi. The recurrence terminates when $W_i^{n+1} = W_i^n$, and the task τi is schedulable if its response time does not exceed its deadline: $W_i^n \le D_i$. Since the task τi should busy-wait during its GPU access, the entire GPU access segment, Gi, is captured as the CPU usage of τi, along with its WCET Ci.

Figure 9.3: Overview of the server-based approach

The remote blocking time $B_i^r$ is given by $B_i^r = \sum_{1 \le j \le \eta_i} B_{i,j}^r$, where $B_{i,j}^r$ is the remote blocking time for the j-th GPU access segment of τi to acquire the GPU. The term $B_{i,j}^r$ is bounded by the following recurrence:

$$B_{i,j}^{r,n+1} = \max_{\pi_l < \pi_i \wedge 1 \le u \le \eta_l} W_{l,u}^{gpu} + \sum_{\pi_h > \pi_i \wedge 1 \le u \le \eta_h} \left( \left\lceil \frac{B_{i,j}^{r,n}}{T_h} \right\rceil + 1 \right) W_{h,u}^{gpu} \quad (9.2)$$

where $B_{i,j}^{r,0} = \max_{\pi_l < \pi_i \wedge 1 \le u \le \eta_l} W_{l,u}^{gpu}$ (the first term of the equation), and $W_{l,u}^{gpu}$ represents the worst-case response time of a GPU access segment Gl,u. The first term of Eq. (9.2) captures the time for a lower-priority task to finish its GPU access segment. The second term represents the time for the GPU access segments of higher-priority tasks.

The worst-case response time of a GPU access segment Gl,u, namely $W_{l,u}^{gpu}$, is given by:

$$W_{l,u}^{gpu} = G_{l,u} + \sum_{\tau_x \in P(\tau_l)} \max_{1 \le y \le \eta_x \wedge \pi_x > \pi_l} G_{x,y} \quad (9.3)$$

This equation captures the length of Gl,u and the lengths of GPU access segments of higher-priority tasks on the same core. It considers only one GPU access segment from each task, because every GPU access segment is associated with a global priority ceiling and Gl,u will never be preempted by normal execution segments.
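As a concrete illustration of how Eqs. (9.1)–(9.3) can be evaluated, the following Python sketch iterates the recurrences to a fixed point. It is a minimal sketch under several assumptions of ours: the Task field names, the convention that a larger prio value means a higher priority, implicit deadlines (Di = Ti), and a resp dictionary holding the response times of higher-priority tasks analyzed earlier; none of these names come from the thesis implementation.

from math import ceil
from dataclasses import dataclass, field
from typing import List

@dataclass(eq=False)
class Task:
    prio: int                   # pi_i; larger value = higher priority (assumption)
    core: int                   # P(tau_i): index of the CPU core the task runs on
    C: float                    # WCET of the normal execution segments
    T: float                    # period; implicit deadline D_i = T_i assumed
    G: List[float] = field(default_factory=list)   # lengths of the GPU segments G_{i,u}

    @property
    def eta(self):              # eta_i: number of GPU access segments
        return len(self.G)

    @property
    def Gtot(self):             # G_i: total GPU access time of one job
        return sum(self.G)

def w_gpu(l, u, tasks):
    """Eq. (9.3): worst-case response time of GPU access segment G_{l,u}."""
    return l.G[u] + sum(max(x.G) for x in tasks
                        if x.core == l.core and x.prio > l.prio and x.eta > 0)

def remote_blocking(i, tasks):
    """Eq. (9.2), iterated to a fixed point and summed over tau_i's segments (B_i^r)."""
    if i.eta == 0:
        return 0.0
    lower = [w_gpu(l, u, tasks) for l in tasks
             if l.prio < i.prio for u in range(l.eta)]
    b = max(lower, default=0.0)
    while b <= i.T:
        nxt = max(lower, default=0.0) + sum(
            (ceil(b / h.T) + 1) * w_gpu(h, u, tasks)
            for h in tasks if h.prio > i.prio for u in range(h.eta))
        if nxt == b:
            return i.eta * b    # every segment of tau_i has the same bound B_{i,j}^r
        b = nxt
    return float('inf')

def response_time_sync(i, tasks, resp):
    """Eq. (9.1); `resp` maps already-analyzed higher-priority tasks to their W_h."""
    Br = remote_blocking(i, tasks)
    hp = [h for h in tasks if h.core == i.core and h.prio > i.prio]
    lp_gpu = sum(max(l.G) for l in tasks
                 if l.core == i.core and l.prio < i.prio and l.eta > 0)
    w = i.C + i.Gtot + Br
    while w <= i.T:
        nxt = (i.C + i.Gtot + Br + (i.eta + 1) * lp_gpu
               + sum(ceil((w + resp[h] - (h.C + h.Gtot)) / h.T) * (h.C + h.Gtot)
                     for h in hp))
        if nxt == w:
            return w            # converged; tau_i is schedulable if w <= T_i
        w = nxt
    return float('inf')         # response time exceeds the deadline

Analyzing the tasks of a core in decreasing order of priority makes each W_h available in resp before it is needed, mirroring the usual response-time analysis procedure.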

9.2 Server-based GPU Access Control

In this section, we present our server-based approach for predictable GPU access control.

This approach addresses the two main limitations of the synchronization-based approach: busy-waiting and long priority inversion. To do so, our approach creates a GPU server task that handles GPU access requests from other tasks on their behalf. The GPU server is assigned the highest priority in the system to prevent preemptions by other tasks.

Figure 9.3 shows the sequence of GPU request handling under our approach. First, when a task τi enters its GPU access segment, it makes a GPU access request to the GPU server, not to the GPU device driver. The request is made by sending to the server the memory region information for the GPU access segment, including input/output data, commands, and code for GPU execution. This requires the memory regions to be configured as shared memory regions so that the GPU server can access them with their identifiers, e.g., shmid.

After sending the request to the GPU server, the task τi can suspend, allowing other tasks to execute. Second, the GPU server enqueues the received request into the GPU request queue if the GPU is being used by another request. The GPU request queue is a priority queue whose elements are ordered by task priority. Third, once the GPU becomes free, the GPU server dequeues the request at the head of the queue and executes the GPU access segment corresponding to that request. Finally, when the request finishes, the GPU server notifies the task τi of its completion, and τi resumes its execution.
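The request-handling sequence just described can be sketched as follows. This is a minimal in-process model under our own assumptions: the GPUServer class, its method names, and the use of Python threading primitives are illustrative, whereas the actual design exchanges requests through shared memory regions between separate tasks.

import heapq
import threading

class GPUServer:
    """Sketch of the GPU server task: a priority queue of requests served one at a time."""

    def __init__(self):
        self._queue = []                 # (negated priority, seq, region, done): highest priority first
        self._seq = 0
        self._cv = threading.Condition()

    def submit(self, task_priority, memory_region_info):
        """Called by a task entering its GPU access segment; returns an event to wait on."""
        done = threading.Event()
        with self._cv:
            heapq.heappush(self._queue,
                           (-task_priority, self._seq, memory_region_info, done))
            self._seq += 1
            self._cv.notify()
        return done                      # the task suspends on done.wait(), freeing its CPU

    def run(self, execute_on_gpu):
        """Server loop: dequeue the highest-priority request and run its GPU segment."""
        while True:
            with self._cv:
                while not self._queue:
                    self._cv.wait()
                _, _, region, done = heapq.heappop(self._queue)
            execute_on_gpu(region)       # data copies, kernel launch, etc. on behalf of the task
            done.set()                   # notify completion; the requesting task resumes

The essential properties of the design are visible even in this sketch: requests are served in task-priority order, and a requesting task suspends on an event instead of busy-waiting on the CPU.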

Figure 9.4 shows an example of task scheduling under our server-based approach. This example has the same configuration as the one in Figure 9.2. Hence, τh and τm are allocated to Core 1, and τl is allocated to Core 2. The GPU server, which the server-based approach creates, is allocated to Core 1. At time 1, the task τl makes a GPU access request to the

GPU server. Then, the GPU server receives the request and executes the corresponding

GPU access segment at time 1 + ε, where the term ε is the amount of the overhead that the

GPU server introduces. Since the server-based approach does not require tasks to busy-wait,

τl can suspend until the completion of its GPU request. The GPU request of τm at time 3 is enqueued into the request queue of the GPU server. Since the GPU server executes with the highest priority in the system, it delays the execution of τh, released at time 3, by ε.

Hence, τh starts execution at time 3 + ε and makes a GPU request at time 4 + ε. When the GPU access segment of τl finishes, the GPU server is notified. Then, the GPU server notifies τl of the completion of its GPU request, and executes the GPU access segment of τh at time 5 + 2ε. The task τh suspends until its GPU request finishes. The GPU access segment of τh finishes at time 8 + 2ε and that of τm starts at time 8 + 3ε. Unlike the case under the synchronization-based approach, τh can continue to execute its normal execution segment from time 8 + 3ε, because τm suspends and the priority of τm is not boosted. The task τh finishes its normal execution segment at time 9 + 3ε, and the response time of τh is 6 + 3ε. Recall that the response time of τh is 9 under the synchronization-based approach, as shown in Figure 9.2. Therefore, this example shows that, if ε < 1, the server-based approach can provide a shorter response time than the synchronization-based approach.

Figure 9.4: Example of the server-based approach

9.2.1 Schedulability Analysis

We analyze task schedulability under our server-based approach. Since the GPU server handles the GPU requests of tasks on their behalf, we first identify the GPU request handling time of the GPU server. The maximum handling time of τi’s GPU request is given by:

$$B_i^{gpu} = \begin{cases} B_i^w + (G_i + 2\epsilon) & : \eta_i > 0 \\ 0 & : \eta_i = 0 \end{cases} \quad (9.4)$$

where $B_i^w$ is the maximum time that τi's GPU request has to wait, Gi is the length of τi's GPU request, ε is the overhead of the GPU server, and ηi is the number of GPU access segments of τi. Obviously, in case of ηi = 0, $B_i^{gpu}$ is zero. In case of ηi > 0, the reason for adding 2ε to Gi is that the GPU server intervenes before and after the execution of τi's GPU request.

The maximum waiting time of τi's GPU request, $B_i^w$, is bounded by the following recurrence equation:

$$B_i^{w,n+1} = \max_{\tau_j \in \Gamma \wedge \tau_j \neq \tau_i} (G_j + \epsilon) + \sum_{\tau_h \in \Gamma \wedge \pi_h > \pi_i} \left\lceil \frac{B_i^{w,n}}{T_h} \right\rceil (G_h + \epsilon) \quad (9.5)$$

where $B_i^{w,0} = \max_{\tau_j \in \Gamma \wedge \tau_j \neq \tau_i} (G_j + \epsilon)$ (the first term of the equation), Γ is the set of all tasks in the system, and πh is the priority of a task τh. The first term of this equation captures the longest GPU access segment among all other tasks, because GPU execution happens in a non-preemptive manner. Here, we add only one ε to Gj because other GPU requests may follow and the GPU server needs to be invoked only once between two consecutive GPU requests, as depicted in Figure 9.4. The second term captures the fact that the GPU server prioritizes requests from higher-priority tasks.
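A small sketch of Eqs. (9.4) and (9.5) follows, under the assumption (used throughout our evaluation) that each GPU-using task has a single GPU access segment so that Gi is a scalar. The dictionary keys 'G', 'T', 'prio', and 'eta' are hypothetical names for the task parameters of Chapter 3, and eps stands for the server overhead ε.

from math import ceil

def gpu_waiting_time(i, tasks, eps):
    """Eq. (9.5): iterate B_i^w to a fixed point (capped at tau_i's period in this sketch)."""
    others = [j for j in tasks if j is not i and j['eta'] > 0]
    if not others:
        return 0.0
    longest = max(j['G'] + eps for j in others)   # longest request among all other tasks
    b = longest
    while b <= i['T']:
        nxt = longest + sum(ceil(b / h['T']) * (h['G'] + eps)
                            for h in others if h['prio'] > i['prio'])
        if nxt == b:
            return b
        b = nxt
    return float('inf')

def gpu_handling_time(i, tasks, eps):
    """Eq. (9.4): maximum handling time B_i^gpu of tau_i's GPU request."""
    if i['eta'] == 0:
        return 0.0
    return gpu_waiting_time(i, tasks, eps) + i['G'] + 2 * eps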

The response time of a task τi is affected by the presence of the GPU server on τi’s core.

If τi is allocated to a core different from the GPU server, the worst-case response time of τi

under the server-based approach is given as follows:

$$W_i^{n+1} = C_i + B_i^{gpu} + \sum_{\tau_h \in P(\tau_i) \wedge \pi_h > \pi_i} \left\lceil \frac{W_i^n + (W_h - C_h)}{T_h} \right\rceil C_h \quad (9.6)$$

where $P(\tau_i)$ is the CPU core where τi is allocated. The recurrence terminates when $W_i^{n+1} = W_i^n$, and τi is schedulable if its response time does not exceed its deadline: $W_i^n \le D_i$. Unlike the response-time analysis given in Eq. (9.1), the GPU access segment of each task is not accounted for in this equation. This is because GPU access segments are executed by the

GPU server under the server-based approach.

If τi is allocated to the same core as the GPU server, the worst-case response time of τi under the server-based approach is given as follows:

$$W_i^{n+1} = C_i + B_i^{gpu} + \sum_{\tau_h \in P(\tau_i) \wedge \pi_h > \pi_i} \left\lceil \frac{W_i^n + (W_h - C_h)}{T_h} \right\rceil C_h + \sum_{\tau_j \in \Gamma \wedge \eta_j > 0} \left\lceil \frac{W_i^n + \{D_j - (X_j^m + 2\epsilon)\}}{T_j} \right\rceil (X_j^m + 2\epsilon) \quad (9.7)$$

where $X_j^m$ is the sum of the WCETs of the miscellaneous operations in τj's GPU access segments, i.e., $X_j^m = \sum_{k=1}^{\eta_j} X_{j,k}^m$. The main difference of this equation from Eq. (9.6) is its last term, which captures the execution time of the GPU server task. We capture this by summing up the miscellaneous operations and the server overhead $(X_j^m + 2\epsilon)$ caused by GPU requests from all other tasks. In this way, we can upper-bound the task response time in the presence of the GPU server.
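The same fixed-point iteration applies to Eqs. (9.6) and (9.7); the sketch below assumes per-task dictionaries with hypothetical keys, including 'Bgpu' holding the result of Eq. (9.4) and 'resp' holding the already-computed response times W_h of higher-priority tasks on the same core.

from math import ceil

def response_time_server(i, tasks, eps, server_core):
    """Worst-case response time of tau_i under the server-based approach, Eqs. (9.6)/(9.7).

    Each task is a dict with 'C' (WCET), 'T' (period = deadline), 'prio', 'core',
    'eta', 'X' (X_j^m), 'Bgpu' (Eq. 9.4), and 'resp' (W_h for already-analyzed
    higher-priority tasks). All keys are hypothetical names of ours.
    """
    hp = [h for h in tasks if h['core'] == i['core'] and h['prio'] > i['prio']]
    w = i['C'] + i['Bgpu']
    while w <= i['T']:                        # implicit deadline D_i = T_i assumed
        nxt = i['C'] + i['Bgpu'] + sum(
            ceil((w + h['resp'] - h['C']) / h['T']) * h['C'] for h in hp)
        if i['core'] == server_core:          # Eq. (9.7): add the GPU server's CPU demand
            nxt += sum(
                ceil((w + j['T'] - (j['X'] + 2 * eps)) / j['T']) * (j['X'] + 2 * eps)
                for j in tasks if j['eta'] > 0)
        if nxt == w:
            return w                          # converged; schedulable if w <= D_i
        w = nxt
    return float('inf')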

9.3 Evaluation

This section provides our experimental evaluation of two different approaches for GPU access control: the synchronization-based and server-based approaches. Our focus here is to explore the impact of those approaches on task schedulability. To do this, we use randomly-generated tasksets and capture the percentage of schedulable tasksets as the metric.

Table 9.1: Base parameters for taskset generation

Number of CPU cores (NP): 4, 8
Number of tasks per core: 4
Ratio of GPU-using tasks: 20%
Task period (Ti): [100, 500] msec
Taskset utilization per core: 50%
Number of GPU access segments per task (ηi): 1
Length of GPU access segment (Gi): [5, 10] msec
WCET of misc. operations in GPU access segment ($X_i^m$): [0.5, 2] msec
GPU server overhead (ε): 100 µsec

9.3.1 Experimental Setup

We generate 10,000 tasksets with the parameters given in Table 9.1 for each experimental setting. We consider two different system configurations: systems with four cores (NP = 4) and eight cores (NP = 8). For each taskset, we first generate the defined number of CPU cores in the system and tasks for each core. Task periods are randomly selected within the defined minimum and maximum task period range. Task deadlines are set equal to their periods. On each core, the taskset utilization is split into k random-sized pieces, where k is the number of tasks per core. The size of each piece represents the utilization of the corresponding task. Then, the WCET of each task is calculated by multiplying its utilization by its period. Task priorities are assigned by the Rate-Monotonic policy [127], with arbitrary tie-breaking. A subset of the generated tasks is randomly chosen according to the defined ratio of GPU-using tasks, and each task in that subset is assigned a GPU access segment. Under the server-based approach, the GPU server is randomly allocated to one of the cores in the system.
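The generation procedure above can be sketched as follows; the uniform utilization-splitting helper and the dictionary keys are our own illustrative choices and stand in for whatever generator was actually used in the experiments.

import random

def split_utilization(total, k):
    """Split a per-core utilization into k random-sized pieces."""
    cuts = sorted(random.uniform(0, total) for _ in range(k - 1))
    return [b - a for a, b in zip([0.0] + cuts, cuts + [total])]

def generate_core(core_id, n_tasks=4, util=0.5, gpu_ratio=0.2,
                  period_range=(100, 500), g_range=(5, 10), x_range=(0.5, 2)):
    """Generate the tasks of one core with the base parameters of Table 9.1."""
    tasks = []
    for u in split_utilization(util, n_tasks):
        T = random.uniform(*period_range)             # period in msec; D_i = T_i
        tasks.append({'core': core_id, 'T': T,
                      'C': u * T,                      # WCET = utilization x period
                      'eta': 0, 'G': 0.0, 'X': 0.0})
    for t in random.sample(tasks, round(gpu_ratio * n_tasks)):
        t['eta'] = 1                                   # one GPU access segment per task
        t['G'] = random.uniform(*g_range)              # GPU segment length (msec)
        t['X'] = random.uniform(*x_range)              # misc. operations within the segment
    # Rate-Monotonic priorities: shorter period -> higher priority (larger number)
    for prio, t in enumerate(sorted(tasks, key=lambda t: -t['T']), start=1):
        t['prio'] = prio
    return tasks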

9.3.2 Results

Figure 9.5 shows the percentage of schedulable tasksets as the length of the GPU access segment increases. In general, the percentage of schedulable tasksets is higher when NP = 4 than when NP = 8, because the GPU is contended for by more tasks as the number of cores increases. For both NP = 4 and NP = 8, the server-based approach performs much better than the synchronization-based approach. In particular, when the GPU access segment is 12 msec and NP = 8, the difference in the percentage of schedulable tasksets between the two approaches is about 50%. This large difference is mainly due to the fact that the server-based approach allows a task to suspend while its GPU segment is being executed on the GPU.

Figure 9.5: Percentage of schedulable tasksets as the length of GPU access segment increases ((a) NP = 4, (b) NP = 8)

Figure 9.6 shows the percentage of schedulable tasksets as the ratio of GPU-using tasks increases. The left-most point on the x-axis of each graph corresponds to the case where all tasks are CPU-only tasks, and the right-most point to the case where all tasks access the GPU. Under both approaches, the percentage of schedulable tasksets decreases as the ratio of GPU-using tasks increases. However, there are many cases where the server-based approach significantly outperforms the synchronization-based approach, with as much as 66% more tasksets being schedulable when the ratio is 50% and NP = 4.

Figure 9.6: Percentage of schedulable tasksets as the ratio of GPU-using tasks increases ((a) NP = 4, (b) NP = 8)

The benefit of the server-based approach is adversely affected by the amount of miscellaneous operations in GPU access segments, because such operations require the GPU server to consume CPU time. Figure 9.7 shows the percentage of schedulable tasksets as the WCET of miscellaneous operations in GPU access segments increases. Since the synchronization-based approach forces tasks to busy-wait during their entire GPU access segments, its performance is not affected by the WCET of miscellaneous operations. However, as expected, the performance of the server-based approach decreases as the WCET of miscellaneous GPU operations increases.

Figure 9.7: Percentage of schedulable tasksets as the WCET of miscellaneous operations in GPU access segments increases ((a) NP = 4, (b) NP = 8)

The performance of the server-based approach is also affected by the amount of the overhead ε that the GPU server introduces. Although the value of ε = 100 µsec used in the prior experiments is sufficient to upper-bound the GPU server overhead in practical systems, we further investigate with larger ε values. Figure 9.8 shows the percentage of schedulable tasksets as the GPU server overhead increases. Since such overhead only exists under the server-based approach, the performance of the synchronization-based approach is unaffected by this factor. On the other hand, the performance of the server-based approach decreases as the amount of the overhead increases.

In summary, the server-based approach outperforms the synchronization-based approach in many cases. In particular, the benefit of the server-based approach can be significant when the length of GPU access segments is long or the ratio of GPU-using tasks is high. However, we find that the server-based approach does not dominate the synchronization-based approach. The synchronization-based approach may be a better choice than the server-based approach when miscellaneous operations, e.g., data copies between the GPU and the CPU, take up the majority of the GPU access segment.

Figure 9.8: Percentage of schedulable tasksets as the overhead of the GPU server (ε) increases ((a) NP = 4, (b) NP = 8)

9.4 Summary

In this chapter, we presented our server-based approach to control GPU access requests from tasks in a predictable manner. Our approach is motivated by the limitations of the synchronization-based approach, namely busy-waiting and long priority inversion. By introducing a dedicated server task for GPU request handling, our approach addresses those limitations while ensuring the analyzability and predictability of the system. We also described our analysis of task schedulability under the server-based approach. Experimental results show that the server-based approach yields significant improvements in task schedulability over the synchronization-based approach. For example, the system with the server-based approach schedules 66% more randomly-generated tasksets than the one with the synchronization-based approach. Future work involves extending the GPU access control approaches to a virtualized environment.

Chapter 10

Guidelines for Future Computer Architecture Designs

The analytical and systems techniques proposed in this dissertation provide predictable real-time performance on commodity multi-core platforms. We believe, however, that our techniques can benefit from new computer architecture support. In this chapter, we discuss hardware features that are as yet unavailable on most of today's platforms but can help the development of predictable and efficient systems. Our discussion may serve as a set of guidelines for designing future computer architectures for cyber-physical systems. In the following sections, we describe the architecture support desired for each type of shared resource.

10.1 Architecture Support for Concurrent Resources

In Chapters 4, 5, and 6, we proposed analytical and systems techniques to address cache and memory interference issues. In this section, we describe hardware features that can be used with our techniques to improve the degree of efficiency and predictability.

• Fine-grained Hardware Cache Partitioning: Our work uses a software-based

cache partitioning technique to manage a last-level cache in software. As discussed in

Section 2.1.2, software cache partitioning has two main problems: (i) the memory

co-partitioning problem, which results in page swapping or waste of memory, and

(ii) the availability of a limited number of cache partitions, which causes degraded

performance. Our techniques proposed in Chapter 4 can significantly mitigate these

problems, but cannot eliminate them. We believe that a fine-grained hardware

cache partitioning feature, such as the one proposed in [169], can further reduce the

negative impact of those problems. If cache allocation to a task becomes independent

of physical page allocation to the task, the memory co-partitioning problem will

disappear, and the problem of finding a feasible cache allocation will be reduced to

finding a cache allocation for guaranteeing timing constraints. Also, if a larger number of

cache partitions are provided in the system, the problem of limited cache partitions will

be easily resolved. Our analysis techniques and cache allocation algorithm proposed

in Chapter 4 can be used together with future hardware cache partitioning features,

since our techniques are independent of a specific cache partitioning technique used

in the system.

• Software-controllable Miss Status Holding Registers: Recent work in [170]

reported that the contention on Miss Status Holding Registers (MSHR), which are

employed in many of today’s shared caches to support memory-level parallelism,

may cause significant performance interference among tasks, even when tasks are

assigned private cache partitions. The authors of that work also proposed an MSHR

partitioning technique as a solution to this problem. We are a proponent of such a

technique because it can be used along with cache partitioning techniques to improve

the performance isolation capability of the system.

• Task-aware Memory Scheduling: Our work in Chapter 5 analyzes memory inter-

ference delay on a COTS DRAM system, where a memory controller handles memory

requests without considering the priority of tasks that have generated those requests.

In the computer architecture community, thread prioritization approaches [77, 81]

have been proposed to achieve high memory throughput and fairness. The key idea

of those approaches is to make the memory controller aware of threads so that

the memory controller can prioritize memory requests based on the priorities of their

origin threads. This idea can be used to improve the predictability and schedulability

of the system. For instance, if the memory controller makes use of task priorities

on memory request scheduling, memory requests from higher-priority tasks can be

prioritized, thereby reducing their response times. Also, the idea of thread clustering

in the memory controller [81] can be used to protect the performance of critical tasks

in a safety-critical system.

• Slowdown Estimation Techniques: Recent work in the computer architecture

community on the design of memory controllers and memory systems has proposed

techniques for dynamically estimating application slowdowns [46, 75, 76, 77]. Although

these techniques are not designed to find worst-case bounds, they can provide various

metrics to understand the performance characteristics of application tasks running on

the target hardware. Having such knowledge would help develop measurement-based

WCET analysis tools for modern multi-core platforms.

10.2 Architecture Support for Mutually-Exclusive Resources

In Chapters 7 and 8, we proposed analytical and systems techniques to address the challenges in accessing mutually-exclusive resources. In this section, we describe hardware features that can be utilized with our techniques to improve system performance while preserving predictability.

• Critical Section Acceleration: Our work in Chapter 7 provides a synchronization

mechanism for predictable access to mutually-exclusive resources. Although

our proposed technique bounds and minimizes blocking time on accessing such

resources, the blocking time can be further reduced with architecture support. In the

computer architecture community, there has been recent work on accelerating the

execution of critical sections with high-performance cores in an asymmetric multi-core

processor [171, 172]. These approaches migrate a critical section to a dedicated

high-performance core that executes the critical section faster than the other cores.

Hence, they can reduce the blocking time imposed on other tasks waiting for the

corresponding resource. The effect of the critical section acceleration can be easily

incorporated into our analysis for synchronization.

• Interrupt Throttling and Enforcement: To protect the execution of tasks from

interrupt storms, our work in Chapter 8 uses a software-based interrupt enforcement

mechanism which incurs a small but measurable overhead. The interrupt throttling

and enforcement mechanism can be implemented in hardware, and in fact, some

of today’s I/O devices employ these mechanisms, e.g., Intel Gigabit Ethernet con-

trollers [173]. However, there exist many I/O devices that do not support such a

mechanism. If the interrupt throttling and enforcement mechanism is implemented as

part of the interrupt controller of the processor, instead of in individual I/O devices,

all I/O devices equipped in the system can benefit from it. Also, the system can use

a common approach to control the rates of interrupts coming from different types of

I/O devices, simplifying the design of systems software.

10.3 Architecture Support for Computational Accelerators

In Chapter 9, we proposed analytical and systems techniques to provide predictable access to a general-purpose GPU, which is a computational accelerator recently receiving much attention. In this section, we describe hardware features desirable for future GPU architectures with respect to predictability.

• GPU Partitioning: Although a GPU is viewed as a single accelerator device by

application tasks, it consists of many GPU cores that execute a given parallel workload

in an aggregate manner. Hence, depending on the characteristics of workloads, only

some of the GPU cores may be utilized. To address this GPU underutilization

problem, some recent GPU architectures, e.g., NVIDIA Kepler, introduce a feature

to execute multiple GPU functions concurrently. However, this feature is limited only

to GPU functions from threads sharing the same process context. More importantly,

this feature may cause performance interference among concurrent GPU executions,

which can hamper real-time predictability. Therefore, we believe that the partitioning

of GPU cores is a desirable feature to improve GPU utilization while preserving

predictability. For example, with GPU partitioning, each application task can specify

its required number of GPU cores and the requested number of GPU cores can be

granted to the task by the admission control mechanism of a GPU device driver.

Then, multiple tasks can execute GPU functions concurrently by using their assigned

GPU cores. GPU partitioning would open interesting research directions that could

build upon our work.

• GPU Context Switching: Today’s GPU architectures do not support preemptive

execution due to the high overhead expected on GPU context switching. However,

recent work [118] reports that the overhead of GPU context switching is not as high as

expected and the GPU preemption mechanism improves the throughput and fairness

of the system, even in the presence of the GPU context switching overhead. The use

of the GPU preemption mechanism can eliminate priority inversion issues caused by

the non-preemptivity of today’s GPUs. Therefore, we expect that it can significantly

improve the schedulability of tasks using GPUs.

10.4 Summary

In this chapter, we discussed future architecture support that could be utilized with our proposed techniques in the design of cyber-physical systems. The addition of the discussed hardware features can improve the degree of the predictability and efficiency of cyber-physical systems significantly. These hardware features are also beneficial to a wide range of general-purpose systems, such as smartphones, video game consoles, web servers and cloud services, where fairness, responsiveness, and quality-of-service are the key

performance requirements. Therefore, we strongly encourage hardware manufacturers to adopt these architecture techniques in their future products so that both cyber-physical and general-purpose systems can benefit from them.

Chapter 11

Conclusions

In this dissertation, we have presented novel analytical and systems techniques to address the predictability issues associated with shared resources in multi-core platforms. Our work categorizes the shared resources into three types: (i) concurrent resources (Chapters 4,

5, and 6), which allow concurrent access from multiple tasks executing on different cores, e.g., a last-level cache, a memory controller, and DRAM, (ii) mutually-exclusive resources

(Chapters 7 and 8), which require no more than one task to access them at a time to prevent race conditions, e.g., sensors, actuators, network interfaces, and shared data regions, and

(iii) computational accelerators (Chapter 9), which supplement the computational capacity, e.g., general-purpose graphics processing units (GPGPUs). Our proposed techniques in this work provide predictable real-time performance on accessing these three types of shared resources in modern multi-core platforms and guarantee the predictability of the entire system in an efficient way.

11.1 Contributions of Our Work

The work in this dissertation addresses the challenges faced by cache interference, memory interference, synchronization, interrupt handling, and GPGPU access control issues. While each chapter in this dissertation has dealt with a different issue, our analysis presented in

each chapter can be easily combined with those in other chapters in an additive manner,

• Coordinated Approach for Predictable Cache Management: Chapter 4

presents our coordinated OS-level cache management scheme to address cache inter-

ference. Our scheme provides predictable cache performance through tight coordi-

nation of cache reservation, reserved cache sharing, and cache-aware task allocation.

Our scheme mitigates the two major challenges of page coloring: the memory co-

partitioning problem and the availability of a limited number of cache partitions. We

provide a condition that checks the feasibility of cache sharing while guaranteeing the

allocation of the required memory space to tasks. We also provide a response-time

based schedulability analysis in the presence of cache interference. We have imple-

mented and evaluated our scheme in Linux/RK running on an Intel Core i7 quad-core

processor. Experimental results with our implementation indicate that, compared

to the traditional approaches, our scheme yields a significant utilization benefit that

increases with the number of tasks.

• Bounding and Reducing Memory Interference: Chapter 5 describes our pro-

posed techniques to bound and reduce memory interference on a multi-core platform

with DRAM-based main memory. Our analysis is based on a realistic memory model,

which considers the Joint Electron Device Engineering Council (JEDEC) DDR3

SDRAM standard, the FR-FCFS policy of the memory controller, and shared/private

DRAM banks. To provide a tighter upper-bound on the memory interference delay,

our analysis uses the combination of the request-driven and job-driven approaches. We

find that memory interference can be significantly reduced by (i) partitioning DRAM

banks, and (ii) co-locating memory-access-intensive tasks on the same processing core.

Based on these observations, we develop a memory interference-aware task allocation

algorithm. Experimental results from a real hardware platform show that our analysis

can closely estimate the memory interference delay under workloads with both high

and low memory contention. Also, our memory interference-aware task allocation

algorithm provides a significant improvement in task schedulability over previous

work, with as much as 96% more tasksets being schedulable.

• Predictable Cache Management for Virtualization: Chapter 6 focuses on pre-

dictable cache management in a virtualized environment. We develop two hypervisor-

level techniques, vLLC and vColoring, that enable the cache allocation of individual

tasks running in a virtual machine (VM), which is not achievable by prior work. We

have implemented vLLC and vColoring on the KVM hypervisor running on x86 and

ARM platforms. Experimental results with three different guest OSs show that both

vLLC and vColoring can effectively control the cache allocation of tasks in a VM.

vColoring can also be used for DRAM bank partitioning in a virtualized environment.

In addition, we develop a cache management scheme that determines cache alloca-

tion to tasks, designs VMs in a cache-aware manner, and minimizes the aggregated

utilization of VMs to be consolidated. Experimental results with randomly-generated

tasksets show that our scheme yields a significant utilization benefit compared to

other approaches.

• Synchronization for Multi-Core Virtual Machines: Chapter 7 presents vM-

PCP, a synchronization framework to provide bounded blocking time on accessing

mutually-exclusive resources in a virtualized environment. vMPCP extends the well-

known multiprocessor priority ceiling protocol to the multi-core two-level hierarchical

scheduling context. vMPCP reduces the major timing penalties caused by resources

shared among tasks on virtual CPUs allocated to different physical cores, by exposing

the executions of global critical sections to the hypervisor. We have presented the

schedulability analysis under vMPCP, with the periodic and deferrable server policies

with and without the budget overrun mechanism. Experimental results indicate that,

under vMPCP, deferrable server outperforms periodic server when overrun is used,

with as much as 80% more tasksets being schedulable. We also have implemented

vMPCP on the KVM hypervisor and demonstrated the effect of vMPCP in reducing

task response times by an average of 29% in our case study.

• Responsive and Enforced Interrupt Handling: Chapter 8 presents vINT, an

interrupt handling scheme to provide responsive and enforced interrupt handling in a

virtualized environment. vINT provides a pseudo-VCPU abstraction dedicated for

interrupt handling, which overcomes the limits imposed by the timing parameters of

virtual CPUs in an analyzable way. vINT also accounts for and enforces interrupt

handling and resulting execution flows within a guest VM. We have presented our

analyses on interrupt handling time, and the schedulability of VCPUs and tasks with

and without vINT. Experimental results show that vINT yields significant improve-

ments in interrupt handling performance while providing as good task schedulability

as when it is not used. For example, a system with vINT services 99% of interrupt

sets while a system without vINT cannot service any interrupt set. Our case study

based on a prototype implementation on the KVM hypervisor also shows that vINT

yields significant benefits in reducing interrupt handling time and in protecting tasks

against interrupt storms permeating into the VM.

• Predictable GPGPU Access Control: In Chapter 9, we first review a synchronization-

based GPU access control approach that uses a real-time synchronization proto-

col for tasks accessing a GPU. We characterize the two major limitations of the

synchronization-based approach: busy-waiting and long priority inversion. Then, we

present our proposed server-based approach to control GPU access requests from tasks

in a predictable manner. Our approach introduces a dedicated server task that handles

GPU requests from other tasks with respect to their priority order. Our approach

addresses the limitations of the synchronization-based approach. Although we focus

on a GPU in this work, our approach can be used for other types of computational

accelerators, such as DSPs. Experimental results show that our server-based approach

yields significant improvements in task schedulability over the synchronization-based

approach. For example, a quad-core system with the server-based approach schedules

66% more randomly-generated tasksets than the one with the synchronization-based

approach.

11.2 Future Research Directions

Cyber-physical systems (CPS) are expected to become more pervasive in various safety-critical application domains in the near future. We believe that the work in this dissertation can be effectively used for designing predictable CPS. There still exists plenty of future work in this area. We describe some future research topics in the following.

• Handling Variations in CPS Workloads: To provide predictable real-time per-

formance without sacrificing efficiency, program execution times should be bounded

with acceptable margins. However, it is hard to find such bounds on programs that

vary their execution times, e.g., the size and arrival rate of input determined by

physical environmental factors. A potential direction to address this issue would be

developing an analytical model of program execution times as a function of features.

The features may be extracted from possible inputs and system conditions. Then, one

can develop a systems framework that leverages the analytical model. For example,

the system may control the CPU usage by limiting input arrival rate, while preserving

predictability. The system may also offload some workloads to other computing re-

sources before execution by checking their inputs. We believe that this idea of taking

physical environmental factors into performance analysis and control is essential for

the development of scalable and resilient CPS.

• Large-scale CPS with Distributed, Non-Uniform Multi-Core Platforms:

Large-scale CPS will likely be implemented by a coordination of distributed, non-

uniform multi-core platforms. Each platform may be equipped with a different set

of I/O devices like sensors, which may need to be accessible by other platforms. Of

201 course, there already exist some approaches for sharing I/O devices in a distributed

environment, e.g., IP cameras and network speakers. However, they are specific

to one I/O device type and require modifications to application software so that

existing applications cannot use remote devices directly. For the ease development

and integration of large-scale safety-critical CPS, we believe that OS support for

seamless resource sharing is strongly required. Distributed real-time synchronization

protocols should also be revisited and enhanced for the predictability of the entire

system. In addition, each platform may be equipped with heterogeneous multi-core

processors, such as various types of CPU cores and GPUs. Providing predictable

performance on such processors is a challenging issue.

• Cloudlet Computing for CPS: The use of cloud computing in CPS has recently

gained considerable interest due to its many benefits, e.g., rapid provisioning and

flexibility in software deployment. However, cloud computing may not be appropriate

for mobile CPS like autonomous vehicles. Although some recent cars are equipped

with cellular data communication capabilities, network connectivity, bandwidth and

latency may not be sufficient for timing-sensitive applications. Hence, we believe

that an in-vehicle cloudlet, which operates as a small-scale cloud system for CPS

applications, will play an important role. Unlike regular cloudlets, the in-vehicle

cloudlet should be able to provide quantifiable and predictable performance. A

solution to this issue can be built upon our contributions for multi-core virtualization,

described in Chapters 6, 7, and 8.

Bibliography

[1] Nicolas Navet, Bertrand Delord, Markus Baumeister, et al. Virtualization in au-

tomotive embedded systems: an outlook. In Seminar at RTS Embedded Systems,

2010.

[2] Hyoseung Kim, Dionisio de Niz, Björn Andersson, Mark Klein, Onur Mutlu, and

Ragunathan Raj Rajkumar. Bounding memory interference delay in COTS-based

multi-core systems. In IEEE Real-Time Technology and Applications Symposium

(RTAS), 2014.

[3] Aloysius K Mok. Fundamental design problems of distributed systems for the hard

real-time environment. PhD Thesis, Massachusetts Institute of Technology, 1983.

[4] Lui Sha, John P Lehoczky, and Ragunathan Rajkumar. Solutions for some practical

problems in prioritized preemptive scheduling. In IEEE Real-Time Systems Symposium

(RTSS), 1986.

[5] Jay K. Strosnider, John P. Lehoczky, and Lui Sha. The deferrable server algorithm for

enhanced aperiodic responsiveness in hard real-time environments. IEEE Transactions

on Computers, 44(1):73–91, 1995.

[6] Brinkley Sprunt, Lui Sha, and John Lehoczky. Aperiodic task scheduling for hard-

real-time systems. Real-Time Systems, 1(1):27–60, 1989.

[7] OKL4 Microvisor. http://www.ok-labs.com.

[8] SYSGO PikeOS Embedded Virtualization. http://sysgo.com.

[9] Hyoseung Kim, Arvind Kandhalu, and Ragunathan Rajkumar. Coordinated cache

management for predictable multi-core real-time systems. Technical report, Carnegie

Mellon University, 2013. http://www.contrib.andrew.cmu.edu/~hyoseunk.

[10] Jeffrey C Mogul and KK Ramakrishnan. Eliminating receive livelock in an interrupt-

driven kernel. ACM Transactions on Computer Systems, 15(3):217–252, 1997.

[11] Junqing Wei, Jarrod M Snider, Junsung Kim, John M Dolan, Raj Rajkumar, and

Bakhtiar Litkouhi. Towards a viable autonomous driving research platform. In IEEE

Intelligent Vehicles Symposium (IV), 2013.

[12] Ragunathan Rajkumar, Lui Sha, and John P Lehoczky. Real-time synchronization

protocols for multiprocessors. In IEEE Real-Time Systems Symposium (RTSS), 1988.

[13] Ragunathan Rajkumar. Real-time synchronization protocols for shared memory

multiprocessors. In International Conference on Distributed Computing Systems

(ICDCS), 1990.

[14] Per Hammarlund et al. 4th generation Intel Core processor, codenamed Haswell. In

Hot Chips: A Symposium on High Performance Chips (HC25), 2013.

[15] Oded Lempel. 2nd generation Intel Core processor family: Intel Core i7, i5 and i3.

In Hot Chips: A Symposium on High Performance Chips (HC23), 2011.

[16] Hyoseung Kim, Arvind Kandhalu, and Ragunathan Rajkumar. A coordinated

approach for practical OS-level cache management in multi-core real-time systems.

In Euromicro Conference on Real-Time Systems (ECRTS), 2013.

[17] Ying Ye, Richard West, Zhuoqun Cheng, and Ye Li. COLORIS: a dynamic cache

partitioning system using page coloring. In International Conference on Parallel

Architectures and Compilation Techniques (PACT), 2014.

[18] Jochen Liedtke et al. OS-controlled cache predictability for real-time systems. In

IEEE Real-Time Technology and Applications Symposium (RTAS), 1997.

[19] Jiang Lin, Qingda Lu, Xiaoning Ding, Zhao Zhang, Xiaodong Zhang, and P Sadayap-

pan. Gaining insights into multicore cache partitioning: Bridging the gap between

simulation and real systems. In International Symposium on High-Performance

Computer Architecture (HPCA), 2008.

[20] Ye Li, Richard West, and Eric Missimer. A virtualized separation kernel for mixed

criticality systems. In ACM Conference on Virtual Execution Environments (VEE),

2014.

[21] Ruhui Ma, Wei Ye, Alei Liang, Haibing Guan, and Jian Li. Cache isolation for

virtualization of mixed general-purpose and real-time systems. Journal of Systems

Architecture, 59(10):1405–1413, 2013.

[22] Man-Ki Yoon, Jung-Eun Kim, and Lui Sha. Optimizing tunable WCET with shared

resource allocation and arbitration in hard real-time multicore systems. In IEEE

Real-Time Systems Symposium (RTSS), 2011.

[23] Marco Paolieri, Eduardo Quiñones, Francisco J Cazorla, Guillem Bernat, and Mateo

Valero. Hardware support for WCET analysis of hard real-time multicore systems. In

International Symposium on Computer Architecture (ISCA), 2009.

[24] Xing Fu, Khairul Kabir, and Xiaorui Wang. Cache-aware utilization control for energy

efficiency in multi-core real-time systems. In Euromicro Conference on Real-Time

Systems (ECRTS), 2011.

[25] Marco Paolieri, Eduardo Quiñones, Francisco J Cazorla, Robert I Davis, and Mateo

Valero. IA3: An interference aware allocation algorithm for multicore hard real-time

systems. In IEEE Real-Time Technology and Applications Symposium (RTAS), 2011.

[26] AMD64 architecture programmer’s manual. http://www.amd.com.

[27] Intel 64 and IA-32 architectures software developer’s manual. http://intel.com.

[28] Joel M Tendler, J Steve Dodson, JS Fields, Hung Le, and Balaram Sinharoy. POWER4

system microarchitecture. IBM Journal of Research and Development, 46(1):5–25,

2002.

[29] Andrew Wolfe. Software-based cache partitioning for real-time applications. Journal

of Computer and Software Engineering, 2(3):315–327, 1994.

[30] Bach D. Bui, Marco Caccamo, Lui Sha, and Joseph Martinez. Impact of cache

partitioning on multi-tasking real time embedded systems. In IEEE Conference on

Embedded and Real-Time Computing Systems and Applications (RTCSA), 2008.

[31] Sangyeun Cho and Lei Jin. Managing distributed, shared L2 caches through OS-level

page allocation. In International Symposium on Microarchitecture (MICRO), 2006.

[32] David Tam, Reza Azimi, Livio Soares, and Michael Stumm. Managing shared L2

caches on multicore systems in software. In Workshop on the Interaction between

Operating Systems and Computer Architecture, 2007.

[33] Xiao Zhang, Sandhya Dwarkadas, and Kai Shen. Towards practical page coloring-

based multicore cache management. In ACM European Conference on Computer

Systems (EuroSys), 2009.

[34] Nan Guan, Martin Stigge, Wang Yi, and Ge Yu. Cache-aware scheduling and analysis

for multicores. In International Conference on Embedded Software (EMSOFT), 2009.

[35] Renato Mancuso, Roman Dudko, Emiliano Betti, Marco Cesati, Marco Caccamo,

and Rodolfo Pellizzoni. Real-time cache management framework for multi-core

architectures. In IEEE Real-Time Technology and Applications Symposium (RTAS),

2013.

[36] Bryan C Ward, Jonathan L Herman, Christopher J Kenna, and James H Ander-

son. Making shared caches more predictable on multicore platforms. In Euromicro

Conference on Real-Time Systems (ECRTS), 2013.

[37] Sebastian Altmeyer, Robert I Davis, and Claire Maiza. Cache related pre-emption

delay aware response time analysis for fixed priority pre-emptive systems. In IEEE

Real-Time Systems Symposium (RTSS), 2011.

[38] Will Lunniss, Sebastain Altmeyer, Claire Maiza, and Robert I. Davis. Integrating

cache related pre-emption delay analysis into EDF scheduling. In IEEE Real-Time

Technology and Applications Symposium (RTAS), 2013.

[39] Chang-Gun Lee, Kwangpo Lee, Joosun Hahn, Yang-Min Seo, Sang Lyul Min, Rhan

Ha, Seongsoo Hong, Chang Yun Park, Minsuk Lee, and Chong Sang Kim. Bounding

cache-related preemption delay for real-time systems. IEEE Transactions on Software

Engineering, 27(9):805 –826, 2001.

[40] José V Busquets-Mataix, Juan J Serrano, and Andy Wellings. Hybrid instruction

cache partitioning for preemptive real-time systems. In Euromicro Workshop on

Real-Time Systems (ECRTS), 1997.

[41] Meng Xu, Linh TX Phan, Insup Lee, Oleg Sokolsky, Sisu Xi, Chenyang Lu, and

Christopher Gill. Cache-aware compositional analysis of real-time multicore virtual-

ization platforms. In IEEE Real-Time Systems Symposium (RTSS), 2013.

[42] Will Lunniss, Sebastian Altmeyer, Giuseppe Lipari, and Robert I Davis. Accounting

for cache related pre-emption delays in hierarchical scheduling. In International

Conference on Real-Time and Network Systems (RTNS), 2014.

[43] Daehoon Kim, Hwanju Kim, and Jaehyuk Huh. vCache: Providing a transparent view

of the LLC in virtualized environments. Computer Architecture Letters, 13(2):109–112,

2014.

[44] Scott Rixner, William J Dally, Ujval J Kapasi, Peter Mattson, and John D Owens.

Memory access scheduling. In International Symposium on Computer Architecture

(ISCA), 2000.

[45] Kyle J Nesbit, Nidhi Aggarwal, James Laudon, and James E Smith. Fair queuing

memory systems. In IEEE/ACM International Symposium on Microarchitecture

(MICRO), 2006.

[46] Thomas Moscibroda and Onur Mutlu. Memory performance attacks: Denial of

memory service in multi-core systems. In USENIX Security Symposium, 2007.

[47] Onur Mutlu and Thomas Moscibroda. Stall-time fair memory access scheduling for

chip multiprocessors. In IEEE/ACM International Symposium on Microarchitecture

(MICRO), 2007.

[48] JEDEC. DDR3 SDRAM Standard. http://www.jedec.org.

[49] Onur Mutlu and Thomas Moscibroda. Parallelism-aware batch scheduling: Enhancing

both performance and fairness of shared DRAM systems. In International Symposium

on Computer Architecture (ISCA), 2008.

[50] Rachata Ausavarungnirun, Kevin Kai-Wei Chang, Lavanya Subramanian, Gabriel H

Loh, and Onur Mutlu. Staged memory scheduling: Achieving high performance

and scalability in heterogeneous systems. In International Symposium on Computer

Architecture (ISCA), 2012.

[51] Lavanya Subramanian, Donghyuk Lee, Vivek Seshadri, Harsha Rastogi, and Onur

Mutlu. The blacklisting memory scheduler: Achieving high performance and fairness

at low cost. In IEEE International Conference on Computer Design (ICCD), 2014.

[52] Sai Prashanth Muralidhara, Lavanya Subramanian, Onur Mutlu, Mahmut Kan-

demir, and Thomas Moscibroda. Reducing memory interference in multicore systems

via application-aware memory channel partitioning. In IEEE/ACM International

Symposium on Microarchitecture (MICRO), 2011.

[53] Lei Liu, Zehan Cui, Mingjie Xing, Yungang Bao, Mingyu Chen, and Chengyong

Wu. A software memory partition approach for eliminating bank-level interference

in multicore systems. In International Conference on Parallel Architectures and

Compilation Techniques (PACT), 2012.

[54] Min Kyu Jeong, Doe Hyun Yoon, Dam Sunwoo, Mike Sullivan, Ikhwan Lee, and

Mattan Erez. Balancing DRAM locality and parallelism in shared memory CMP

systems. In International Symposium on High-Performance Computer Architecture

(HPCA), 2012.

[55] Noriaki Suzuki, Hyoseung Kim, Dionisio de Niz, Bjorn Andersson, Lutz Wrage, Mark

Klein, and Ragunathan (Raj) Rajkumar. Coordinated bank and cache coloring

for temporal protection of memory accesses. In IEEE International Conference on

Embedded Software and Systems (ICESS), 2013.

[56] Heechul Yun, Renato Mancuso, Zheng-Pei Wu, and Rodolfo Pellizzoni. PALLOC:

DRAM bank-aware memory allocator for performance isolation on multicore platforms.

In IEEE Real-Time Technology and Applications Symposium (RTAS), 2014.

[57] Hyoseung Kim, Dionisio De Niz, Björn Andersson, Mark Klein, Onur Mutlu, and

Ragunathan Rajkumar. Bounding and reducing memory interference in COTS-based

multi-core systems. Real-Time Systems, 52(3):356–395, 2016.

[58] Mingli Xie, Dong Tong, Kan Huang, and Xu Cheng. Improving system throughput

and fairness simultaneously in CMP systems via dynamic bank partitioning. In IEEE

International Symposium on High-Performance Computer Architecture (HPCA), 2014.

[59] Benny Akesson, Kees Goossens, and Markus Ringhofer. Predator: a predictable

SDRAM memory controller. In International Conference on Hardware/Software

Codesign and System Synthesis (CODES+ISSS), 2007.

[60] Marco Paolieri, Eduardo Quiñones, Francisco J Cazorla, and Mateo Valero. An

analyzable memory controller for hard real-time CMPs. IEEE Embedded Systems

Letters, 1(4):86–90, 2010.

[61] Jan Reineke, Isaac Liu, Hiren D Patel, Sungjun Kim, and Edward A Lee. PRET

DRAM controller: Bank privatization for predictability and temporal isolation. In International Conference on Hardware/Software Codesign and System Synthesis

(CODES+ISSS), 2011.

[62] Y. Li, B. Akesson, and K. Goossens. Dynamic command scheduling for real-time

memory controllers. In Euromicro Conference on Real-Time Systems (ECRTS), 2014.

[63] Y. Krishnapillai, Z. P. Wu, and R. Pellizzoni. A rank-switching, open-row DRAM

controller for mixed-criticality systems. In Euromicro Conference on Real-Time

Systems (ECRTS), 2014.

[64] Jakob Rosén, Alexandru Andrei, Petru Eles, and Zebo Peng. Bus access optimization

for predictable implementation of real-time applications on multiprocessor systems-

on-chip. In IEEE Real-Time Systems Symposium (RTSS), 2007.

[65] Rodolfo Pellizzoni, Andreas Schranzhofer, Jian-Jia Chen, Marco Caccamo, and Lothar

Thiele. Worst case delay analysis for memory interference in multicore systems. In

Design, Automation Test in Europe Conference Exhibition (DATE), 2010.

[66] Bjorn Andersson, Arvind Easwaran, and Jinkyu Lee. Finding an upper bound on

the increase in execution time due to contention on the memory bus in COTS-based

multicore systems. SIGBED Review, 7(1):4, 2010.

[67] Dakshina Dasari, Björn Andersson, Vincent Nelis, Stefan M Petters, Arvind Easwaran,

and Jinkyu Lee. Response time analysis of COTS-based multicores considering the

contention on the shared memory bus. In IEEE International Conference on Trust,

Security and Privacy in Computing and Communications (TrustCom), 2011.

[68] Simon Schliecker, Mircea Negrean, and Rolf Ernst. Bounding the shared resource

load for the performance analysis of multiprocessor systems. In Design, Automation

Test in Europe Conference Exhibition (DATE), 2010.

[69] Mingsong Lv, Wang Yi, Nan Guan, and Ge Yu. Combining abstract interpretation

with model checking for timing analysis of multicore software. In IEEE Real-Time

Systems Symposium (RTSS), 2010.

[70] Zheng Pei Wu, Yogen Krish, and Rodolfo Pellizzoni. Worst case analysis of DRAM

latency in multi-requestor systems. In IEEE Real-Time Systems Symposium (RTSS),

2013.

[71] David S. Johnson, Alan Demers, Jeffrey D. Ullman, Michael R Garey, and Ronald L.

Graham. Worst-case performance bounds for simple one-dimensional packing algo-

rithms. SIAM Journal on Computing, 3(4):299–325, 1974.

[72] Dionisio de Niz and Raj Rajkumar. Partitioning bin-packing algorithms for distributed

real-time systems. International Journal of Embedded Systems, 2(3):196–208, 2006.

[73] Karthik Lakshmanan, Dionisio de Niz, Ragunathan Rajkumar, and Gabriel Moreno.

Resource allocation in distributed mixed-criticality cyber-physical systems. In IEEE

International Conference on Distributed Computing Systems (ICDCS), 2010.

[74] Karthik Lakshmanan, Ragunathan Rajkumar, and John P Lehoczky. Partitioned fixed-

priority preemptive scheduling for multi-core processors. In Euromicro Conference

on Real-Time Systems (ECRTS), 2009.

[75] Lavanya Subramanian, Vivek Seshadri, Yoongu Kim, Ben Jaiyen, and Onur Mutlu.

MISE: Providing performance predictability and improving fairness in shared main

memory systems. In International Symposium on High-Performance Computer

Architecture (HPCA), 2013.

[76] Eiman Ebrahimi, Chang Joo Lee, Onur Mutlu, and Yale N Patt. Fairness via

source throttling: a configurable and high-performance fairness substrate for multi-

core memory systems. In International Conference on Architectural Support for

Programming Languages and Operating Systems (ASPLOS), 2010.

[77] Lavanya Subramanian, Vivek Seshadri, Arnab Ghosh, Samira Khan, and Onur

Mutlu. The application slowdown model: Quantifying and controlling the impact

of inter-application interference at shared caches and main memory. In IEEE/ACM

International Symposium on Microarchitecture (MICRO), 2015.

[78] Hiroyuki Usui, Lavanya Subramanian, Kevin Kai-Wei Chang, and Onur Mutlu. DASH:

Deadline-aware high-performance memory scheduler for heterogeneous systems with

hardware accelerators. ACM Transactions on Architecture and Code Optimization

(TACO), 12(4):65, 2016.

[79] Onur Kayiran, Nachiappan Chidambaram Nachiappan, Adwait Jog, Rachata

Ausavarungnirun, Mahmut T Kandemir, Gabriel H Loh, Onur Mutlu, and Chita R

Das. Managing GPU concurrency in heterogeneous architectures. In IEEE/ACM

International Symposium on Microarchitecture (MICRO), 2014.

[80] Jishen Zhao, Onur Mutlu, and Yuan Xie. FIRM: Fair and high-performance memory

control for persistent memory systems. In IEEE/ACM International Symposium on

Microarchitecture (MICRO), 2014.

[81] Yoongu Kim, Michael Papamichael, Onur Mutlu, and Mor Harchol-Balter. Thread

cluster memory scheduling: Exploiting differences in memory access behavior. In

IEEE/ACM International Symposium on Microarchitecture (MICRO), 2010.

[82] Yoongu Kim, Dongsu Han, Onur Mutlu, and Mor Harchol-Balter. ATLAS: A scalable

and high-performance scheduling algorithm for multiple memory controllers. In IEEE

International Symposium on High-Performance Computer Architecture (HPCA), 2010.

[83] Lavanya Subramanian, Donghyuk Lee, Vivek Seshadri, Harsha Rastogi, and Onur Mutlu. BLISS: Balancing performance, fairness and complexity in memory access scheduling. IEEE Transactions on Parallel and Distributed Systems, 2016.

[84] Lui Sha, Ragunathan Rajkumar, and John P Lehoczky. Priority inheritance proto-

cols: An approach to real-time synchronization. IEEE Transactions on Computers,

39(9):1175–1185, 1990.

[85] Karthik Lakshmanan, Dionisio de Niz, and Ragunathan Rajkumar. Coordinated task

scheduling, allocation and synchronization on multiprocessors. In IEEE Real-Time

Systems Symposium (RTSS), 2009.

[86] Ragunathan Rajkumar. Dealing with suspending periodic tasks. IBM Thomas J.

Watson Research Center, 1991.

[87] Paolo Gai, Marco Di Natale, Giuseppe Lipari, Alberto Ferrari, Claudio Gabellini,

and Paolo Marceca. A comparison of MPCP and MSRP when sharing resources in

the Janus multiple-processor on a chip platform. In IEEE Real-Time Technology and

Applications Symposium (RTAS), 2003.

[88] Theodore P. Baker. Stack-based scheduling of realtime processes. Real-Time Systems,

3(1):67–99, 1991.

[89] Aaron Block, Hennadiy Leontyev, Björn B Brandenburg, and James H Anderson.

A flexible real-time locking protocol for multiprocessors. In IEEE Conference on

Embedded and Real-Time Computing Systems and Applications (RTCSA), 2007.

[90] Farhang Nemati, Moris Behnam, and Thomas Nolte. Independently-developed real-

time systems on multi-cores with shared resources. In Euromicro Conference on

Real-Time Systems (ECRTS), 2011.

[91] Robert I Davis and Alan Burns. Hierarchical fixed priority pre-emptive scheduling.

In IEEE Real-Time Systems Symposium (RTSS), 2005.

[92] Saowanee Saewong, Mark H Klein, Ragunathan Rajkumar, and John P Lehoczky.

Analysis of hierarchical fixed-priority scheduling. In Euromicro Conference on Real-

Time Systems (ECRTS), 2002.

[93] Insik Shin and Insup Lee. Periodic resource model for compositional real-time

guarantees. In IEEE Real-Time Systems Symposium (RTSS), 2003.

[94] Insik Shin and Insup Lee. Compositional real-time scheduling framework with periodic

model. ACM Transactions on Embedded Computing Systems (TECS), 7(3):30, 2008.

[95] Hennadiy Leontyev and James H Anderson. A hierarchical multiprocessor bandwidth

reservation scheme with timing guarantees. Real-Time Systems, 43(1):60–92, 2009.

[96] Insik Shin, Arvind Easwaran, and Insup Lee. Hierarchical scheduling framework for

virtual clustering of multiprocessors. In Euromicro Conference on Real-Time Systems

(ECRTS), 2008.

[97] Robert I Davis and Alan Burns. Resource sharing in hierarchical fixed priority

pre-emptive systems. In IEEE Real-Time Systems Symposium (RTSS), 2006.

[98] Moris Behnam, Insik Shin, Thomas Nolte, and Mikael Nolin. SIRAP: a synchronization

protocol for hierarchical resource sharing in real-time open systems. In International

Conference on Embedded Software (EMSOFT), 2007.

[99] Mikael Asberg, Thomas Nolte, and Moris Behnam. Resource sharing using the

rollback mechanism in hierarchically scheduled real-time open systems. In IEEE

Real-Time Technology and Applications Symposium (RTAS), 2013.

[100] Farhang Nemati, Moris Behnam, and Thomas Nolte. Multiprocessor synchronization

and hierarchical scheduling. In International Conference on Parallel Processing

Workshops (ICPPW), 2009.

[101] Jaewoo Lee, Sisu Xi, Sanjian Chen, Linh TX Phan, Chris Gill, Insup Lee, Chenyang

Lu, and Oleg Sokolsky. Realizing compositional scheduling through virtualization. In

IEEE Real-Time Technology and Applications Symposium (RTAS), 2012.

[102] Sisu Xi, Justin Wilson, Chenyang Lu, and Christopher Gill. RT-Xen: towards real-

time hypervisor scheduling in Xen. In International Conference on Embedded Software

(EMSOFT), 2011.

[103] Felix Bruns, Shadi Traboulsi, David Szczesny, Elizabeth Gonzalez, Yang Xu, and

Attila Bilgic. An evaluation of microkernel-based virtualization for embedded real-time

systems. In Euromicro Conference on Real-Time Systems (ECRTS), 2010.

[104] L4/Fiasco Microkernel. https://os.inf.tu-dresden.de/fiasco.

[105] Matthew Danish, Ye Li, and Richard West. Virtual-CPU scheduling in the Quest operating system. In IEEE Real-Time Technology and Applications Symposium (RTAS), 2011.

[106] Mark Lewandowski, Mark J Stanovich, Theodore P Baker, Kartik Gopalan, and An-I A.

Wang. Modeling device driver effects in real-time schedulability analysis: Study of a

network driver. In IEEE Real-Time Technology and Applications Symposium (RTAS),

2007.

[107] Gabriel Parmer and Richard West. Predictable interrupt management and scheduling

in the Composite component-based system. In IEEE Real-Time Systems Symposium

(RTSS), 2008.

[108] Nicola Manica, Luca Abeni, Luigi Palopoli, Dario Faggioli, and Claudio Scordino.

Schedulable device drivers: Implementation and experimental results. In Operating

Systems Platforms for Embedded Real-Time Applications (OSPERT), 2010.

[109] Udo Steinberg, Jean Wolter, and Hermann Härtig. Fast component interaction for

real-time systems. In Euromicro Conference on Real-Time Systems (ECRTS), 2005.

[110] Yuting Zhang and Richard West. Process-aware interrupt scheduling and accounting.

In IEEE Real-Time Systems Symposium (RTSS), 2006.

[111] Luis E Leyva-del Foyo, Pedro Mejia-Alvarez, and Dionisio de Niz. Integrated task

and interrupt management for real-time systems. ACM Transactions on Embedded

Computing Systems (TECS), 11(2):32, 2012.

[112] Glenn A Elliott and James H Anderson. Robust real-time multiprocessor interrupt

handling motivated by GPUs. In Euromicro Conference on Real-Time Systems

(ECRTS), 2012.

[113] Björn B Brandenburg, Hennadiy Leontyev, and James H Anderson. An overview

of interrupt accounting techniques for multiprocessor real-time systems. Journal of

Systems Architecture, 57(6):638–654, 2011.

[114] Jan Kiszka. Towards Linux as a real-time hypervisor. In Real-Time Linux Workshop

(RTLWS), 2009.

[115] Adam Lackorzyński, Alexander Warg, Marcus Völp, and Hermann Härtig. Flat-

tening hierarchical scheduling. In International Conference on Embedded Software

(EMSOFT), 2012.

[116] Ruhui Ma, Fanfu Zhou, Erzhou Zhu, and Haibing Guan. Performance tun-

ing towards a KVM-based embedded real-time virtualization system. Journal of

Information Science and Engineering, 29(5):1021–1035, 2013.

[117] Matthias Beckert, Moritz Neukirchner, Rolf Ernst, and Stefan M Petters. Sufficient

temporal independence and improved interrupt latencies in a real-time hypervisor. In

Design Automation Conference (DAC), 2014.

[118] Ivan Tanasic, Isaac Gelado, Javier Cabezas, Alex Ramirez, Nacho Navarro, and Mateo

Valero. Enabling preemptive multiprogramming on GPUs. In International Symposium

on Computer Architecture (ISCA), 2014.

[119] Shinpei Kato, Karthik Lakshmanan, Raj Rajkumar, and Yutaka Ishikawa. TimeGraph:

GPU scheduling for real-time multi-tasking environments. In USENIX Annual

Technical Conference (USENIX ATC), 2011.

[120] Shinpei Kato, Karthik Lakshmanan, Aman Kumar, Mihir Kelkar, Yutaka Ishikawa,

and Ragunathan Rajkumar. RGEM: A responsive GPGPU execution model for

runtime engines. In IEEE Real-Time Systems Symposium (RTSS), 2011.

[121] Shinpei Kato, Michael McThrow, Carlos Maltzahn, and Scott A Brandt. Gdev:

First-class GPU resource management in the operating system. In USENIX Annual

Technical Conference (USENIX ATC), 2012.

[122] Husheng Zhou, Guangmo Tong, and Cong Liu. GPES: a preemptive execution system

for GPGPU computing. In IEEE Real-Time Technology and Applications Symposium

(RTAS), 2015.

[123] Junsung Kim, Bjorn Andersson, Dionisio de Niz, and Ragunathan Rajkumar. Segment-

fixed priority scheduling for self-suspending real-time tasks. In IEEE Real-Time

Systems Symposium (RTSS), 2013.

[124] Glenn A Elliott and James H Anderson. Globally scheduled real-time multiprocessor

systems with GPUs. Real-Time Systems, 48(1):34–74, 2012.

[125] Glenn Elliott, Bryan C Ward, James H Anderson, et al. GPUSync: A framework for

real-time GPU management. In IEEE Real-Time Systems Symposium (RTSS), 2013.

[126] Reinhard Wilhelm, Daniel Grund, Jan Reineke, Marc Schlickling, Markus Pister, and

Christian Ferdinand. Memory hierarchies, pipelines, and buses for future architectures

in time-critical embedded systems. IEEE Transactions on Computer-Aided Design of

Integrated Circuits and Systems, 28(7):966–978, 2009.

[127] Chung Laung Liu and James W Layland. Scheduling algorithms for multiprogramming

in a hard-real-time environment. Journal of the ACM (JACM), 20(1):46–61, 1973.

[128] Sebastian Altmeyer, Roeland Douma, Will Lunniss, and Robert I Davis. Evaluation of

cache partitioning for hard real-time systems. In Euromicro Conference on Real-Time

Systems (ECRTS), 2014.

[129] Udo Steinberg and Bernhard Kauer. NOVA: a microhypervisor-based secure virtual-

ization architecture. In ACM European Conference on Computer Systems (EuroSys),

2010.

[130] Hyoseung Kim, Shige Wang, and Ragunathan Rajkumar. vMPCP: A synchronization

framework for multi-core virtual machines. In IEEE Real-Time Systems Symposium

(RTSS), 2014.

[131] Hyoseung Kim, Shige Wang, and Ragunathan Rajkumar. Responsive and enforced

interrupt handling for real-time system virtualization. In IEEE Conference on

Embedded and Real-Time Computing Systems and Applications (RTCSA), 2015.

[132] Guillem Bernat and Alan Burns. New results on fixed priority aperiodic servers. In

IEEE Real-Time Systems Symposium (RTSS), 1999.

[133] Dean M Tullsen, Susan J Eggers, and Henry M Levy. Simultaneous multithread-

ing: Maximizing on-chip parallelism. In International Symposium on Computer

Architecture (ISCA), 1995.

[134] Thomas D Burd, Trevor A Pering, Anthony J Stratakos, and Robert W Brodersen. A

dynamic voltage scaled microprocessor system. IEEE Journal of Solid-State Circuits,

35(11):1571–1580, 2000.

[135] Francisco J Cazorla, Peter MW Knijnenburg, Rizos Sakellariou, Enrique Fernandez,

Alex Ramirez, and Mateo Valero. Predictable performance in SMT processors: Synergy

between the OS and SMTs. IEEE Transactions on Computers, 55(7):785–799, 2006.

[136] Alexei Colin, Arvind Kandhalu, and Ragunathan Rajkumar. Energy-efficient alloca-

tion of real-time applications onto heterogeneous processors. In IEEE Conference on

Embedded and Real-Time Computing Systems and Applications (RTCSA), 2014.

[137] Saowanee Saewong and Ragunathan Rajkumar. Practical voltage-scaling for fixed-

priority RT-systems. In IEEE Real-Time Technology and Applications Symposium

(RTAS), 2003.

[138] Shuichi Oikawa and Ragunathan Rajkumar. Linux/RK: A portable resource kernel

in Linux. In IEEE Real-Time Systems Symposium (RTSS) Work-In-Progress, 1998.

[139] Raj Rajkumar, Kanaka Juvva, Anastasio Molano, and Shuichi Oikawa. Resource ker-

nels: A resource-centric approach to real-time and multimedia systems. In SPIE/ACM

Conference on Multimedia Computing and Networking, 1998.

[140] Filip Sebek. Cache memories and real-time systems. Technical report, Mälardalen

University, 2001.

[141] Swagato Basumallick and Kelvin Nilsen. Cache issues in real-time systems. In ACM

Workshop on Language, Compiler, and Tools for Real-Time Systems, 1994.

[142] Mathai Joseph and Paritosh K. Pandya. Finding response times in a real-time system.

The Computer Journal, 29(5):390–395, 1986.

[143] Anand Eswaran and Ragunathan Rajkumar. Energy-aware memory firewalling for

QoS-sensitive applications. In Euromicro Conference on Real-Time Systems (ECRTS),

2005.

[144] Hyoseung Kim and Ragunathan Rajkumar. Shared-page management for improving

the temporal isolation of memory reservations in resource kernels. In IEEE Conference

on Embedded and Real-Time Computing Systems and Applications (RTCSA), 2012.

[145] Stefan Manegold. The calibrator (v0.9e), a cache-memory and TLB calibration tool.

http://www.cwi.nl/~manegold/Calibrator.

[146] Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh, and Kai Li. The PARSEC

benchmark suite: Characterization and architectural implications. In International

Conference on Parallel Architectures and Compilation Techniques (PACT), 2008.

[147] Junsung Kim, Hyoseung Kim, Karthik Lakshmanan, and Ragunathan Rajkumar.

Parallel scheduling for cyber-physical systems: Analysis and case study on a self-

driving car. In International Conference on Cyber-Physical Systems (ICCPS), 2013.

[148] Brice Berna and Isabelle Puaut. PDPA: Period driven task and cache partitioning

algorithm for multi-core systems. In International Conference on Real-Time and

Network Systems (RTNS), 2012.

[149] Heechul Yun, Gang Yao, Rodolfo Pellizzoni, Marco Caccamo, and Lui Sha. Memory

access control in multiprocessor for real-time systems with mixed criticality. In

Euromicro Conference on Real-Time Systems (ECRTS), 2012.

[150] Balasubramanya Bhat and Frank Mueller. Making DRAM refresh predictable. In

Euromicro Conference on Real-Time Systems (ECRTS), 2010.

[151] Micron 2Gb DDR3 Component: MT41J256M8-15E. http://download.micron.com/

pdf/datasheets/dram/ddr3/2Gb_DDR3_SDRAM.pdf.

[152] Chang Joo Lee, Veynu Narasiman, Eiman Ebrahimi, Onur Mutlu, and Yale N

Patt. DRAM-aware last-level cache writeback: Reducing write-caused interference in

memory systems. Technical Report TR-HPS-2010-002, UT Austin, 2010.

[153] Xiao Zhang, Sandhya Dwarkadas, and Kai Shen. Hardware execution throttling for

multi-core resource management. In USENIX Annual Technical Conference (USENIX

ATC), 2009.

[154] Vivek Seshadri, Anirban Bhowmick, Onur Mutlu, Phillip B Gibbons, Michael Kozuch,

Todd C Mowry, et al. The dirty-block index. In International Symposium on Computer

Architecture (ISCA), 2014.

[155] Hyoseung Kim, Junsung Kim, and Ragunathan (Raj) Rajkumar. A profiling frame-

work in Linux/RK and its application. In Open Demo Session of IEEE Real-Time

Systems Symposium (RTSS@Work), 2012.

[156] John D McCalpin. STREAM: Sustainable memory bandwidth in high performance

computers. http://www.cs.virginia.edu/stream.

[157] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf

Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization.

ACM SIGOPS Operating Systems Review, 37(5):164–177, 2003.

[158] Avi Kivity, Yaniv Kamay, Dor Laor, Uri Lublin, and Anthony Liguori. KVM: the

Linux virtual machine monitor. In Linux Symposium, volume 1, pages 225–230, 2007.

[159] Xiaolin Wang, Jiarui Zang, Zhenlin Wang, Yingwei Luo, and Xiaoming Li. Selective

hardware/software memory virtualization. ACM SIGPLAN Notices, 46(7):217–226,

2011.

[160] ARM Cortex-A15 MPCore processor technical reference manual. http://arm.com.

[161] Wind River VxWorks. http://www.windriver.com.

[162] Dominique Thiebaut, Joel L Wolf, and Harold S Stone. Synthetic traces for trace-

driven simulation of cache memories. IEEE Transactions on Computers, 41(4),

1992.

[163] Konstantinos Bletsas, Neil Audsley, Wen-Hung Huang, Jian-Jia Chen, and Geoffrey

Nelissen. Errata for three papers (2004-05) on fixed-priority scheduling with self-

suspensions. Technical Report CISTER-TR-150713, CISTER, 2015.

[164] Rusty Russell. virtio: towards a de-facto standard for virtual I/O devices. ACM

SIGOPS Operating Systems Review, 42(5):95–103, 2008.

[165] Tommaso Cucinotta, Gaetano Anastasi, and Luca Abeni. Respecting temporal

constraints in virtualised services. In IEEE International Computer Software and

Applications Conference (COMPSAC), 2009.

[166] Netperf network benchmark. http://www.netperf.org.

[167] MPlayer: an open-source movie player. http://www.mplayerhq.hu.

[168] Wen-Hung Huang, Jian-Jia Chen, Husheng Zhou, and Cong Liu. PASS: Priority

assignment of real-time tasks with dynamic suspending behavior under fixed-priority

scheduling. In Design Automation Conference (DAC), 2015.

[169] David B Kirk. SMART (strategic memory allocation for real-time) cache design. In

IEEE Real-Time Systems Symposium (RTSS), 1989.

[170] Prathap Kumar Valsan, Heechul Yun, and Farzad Farshchi. Taming non-blocking

caches to improve isolation in multicore real-time systems. In IEEE Real-Time

Technology and Applications Symposium (RTAS), 2016.

[171] M Aater Suleman, Onur Mutlu, Moinuddin K Qureshi, and Yale N Patt. Accelerating

critical section execution with asymmetric multi-core architectures. In International Conference on Architectural Support for Programming Languages and Operating

Systems (ASPLOS), 2009.

[172] José A Joao, M Aater Suleman, Onur Mutlu, and Yale N Patt. Bottleneck identifica-

tion and scheduling in multithreaded applications. In International Conference on

Architectural Support for Programming Languages and Operating Systems (ASPLOS),

2012.

[173] Interrupt moderation using Intel GbE controllers. http://intel.com.

[174] Hyoseung Kim and Ragunathan (Raj) Rajkumar. Memory reservation and shared

page management for real-time systems. Journal of Systems Architecture (JSA), 2013.

[175] Hyoseung Kim and Ragunathan Raj Rajkumar. Virt/RK: A real-time virtualization

framework for multi-core platforms. In Open Demo Session of IEEE Real-Time

Systems Symposium (RTSS@Work), 2015.
