Scratchpad Memory Management For Multicore Real-Time Embedded Systems

by

Saud Wasly

A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Doctor of Philosophy in Electrical and Computer Engineering

Waterloo, Ontario, Canada, 2018

© Saud Wasly 2018

Examining Committee Membership

The following served on the Examining Committee for this thesis. The decision of the Examining Committee is by majority vote.

External Examiner: Nathan W. Fisher, Associate Professor, Department of Computer Science, Wayne State University

Supervisor: Rodolfo Pellizzoni, Associate Professor, Department of Electrical and Computer Engineering, University of Waterloo

Internal Members: Hiren Patel, Associate Professor, Department of Electrical and Computer Engineering, University of Waterloo

Sebastian Fischmeister, Associate Professor, Department of Electrical and Computer Engineering, University of Waterloo

Internal-External Member: Martin Karsten, Associate Professor, Department of Computer Science, University of Waterloo

Author’s Declaration

This thesis consists of material all of which I authored or co-authored: see Statement of Contributions included in the thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners.

I understand that my thesis may be made electronically available to the public.

Statement of Contributions

A large part of the content in this dissertation has been previously disseminated through 8 papers [170, 171, 18, 159, 161, 173, 160, 172], either peer-reviewed or under submission. The use of the content from the listed publications in this dissertation has been approved by all co-authors. I am the first author, sole student author, and main contributor in [170, 171, 173, 172]. I am a student co-author in [18, 159, 161, 160]; for these collaborative papers, I further highlight the extent of my contribution below.

The first main contribution in this dissertation is an execution and scheduling model for 3-phase real-time tasks to hide access latency to shared resources, such as main memory, through the use of a Direct Memory Access (DMA) engine. Section 3.1 discusses the main concept of the proposed execution model, while Section 3.2 presents a fixed-priority non-preemptive partitioned scheduling scheme based on the proposed model. The content of these sections is reproduced from [171]. The proposed scheduling scheme is extended in Section 3.3 to handle software-based scheduling of the DMA engine, and in Section 3.4 to recover from soft memory errors. Content in these sections is reprinted from co-authored papers [159] and [161], respectively. In both cases, my main contribution was to develop the response time analysis of the proposed execution model (Sections IV-B, V, and VII-E in [159] and Sections IV-B, V, and VII-D in [161], which are reproduced in this dissertation).

Chapter 4 presents the realization of the proposed scheme. Section 4.1 discusses the implementation on an FPGA platform. Some of the content of this section is reprinted from [170], specifically Sections IV and V in [170]. Section 4.2 presents the implementation of the proposed system on a COTS platform, including a complete OS design. The content of this section is reprinted from co-authored papers [159, 161]. I was principally responsible for the schedulability evaluation, presented in Subsection 4.2.3; the rest of the section is reproduced for the sake of completeness.

In Chapter 5, the scheduling scheme is extended to global scheduling. The content of this chapter is reprinted from co-authored paper [18], in which my contributions were extending the scheduling policy (Section 5.2), bounding the schedule holes (Section 5.3.2), and conducting the evaluation (Section 7.3). In Chapter 6, an asynchronous inter-task communication model is presented. The content of this chapter is reprinted from co-authored paper [160], in which I was responsible for adapting the model to the proposed execution scheme, developing the worst-case latency analysis, and conducting the evaluation (Sections 6.1, 6.3, and 6.4).

Finally, Chapter 7 presents a new scheduling model for parallel tasks, bundled scheduling, and Chapter 8 presents a predictable inter-core NoC. The content of these chapters is reprinted from [172] and [173], respectively.

Abstract

Multicore systems will continue to spread in the domain of real-time embedded systems due to the increasing need for high-performance applications. This research discusses some of the challenges associated with employing multicore systems for safety-critical real-time applications. Mainly, this work is concerned with providing: 1) efficient inter-core timing isolation for independent tasks, and 2) predictable task communication for communicating tasks.

Principally, we introduce a new task execution model, based on the 3-phase execution model, that exploits the Direct Memory Access (DMA) controllers available in modern embedded platforms along with ScratchPad Memories (SPMs) to enforce strong timing isolation between tasks. The DMA and the SPMs are explicitly managed to preload tasks from main memory into the local (private) scratchpad memories. Tasks are then executed from the local SPMs without accessing main memory. This model allows CPU execution to be overlapped with DMA loading/unloading operations from and to main memory. We show that by co-scheduling task execution on CPUs and using DMA to access memory and I/O, we can efficiently hide access latency to physical resources. In turn, this leads to significant improvements in system schedulability, compared both to the case of unregulated contention for access to physical resources and to previous cache and SPM management techniques for real-time systems.

The presented SPM-centric scheduling algorithms and analyses cover single-core, partitioned, and global real-time systems. The proposed scheme is also extended to support large tasks that do not fit entirely into the local SPM. Moreover, the schedulability analysis considers the case of recovering from transient soft errors (bit flips caused by a single event upset), in several levels of memory, that cannot be automatically corrected in hardware by the ECC unit. The proposed SPM-centric scheduling is integrated at the OS level; thus it is transparent to applications. The proposed scheme is implemented and evaluated on an FPGA platform and a Commercial-Off-The-Shelf (COTS) platform.

In regards to real-time task communication, two types of communication are considered: 1) asynchronous inter-task communication, between either sequential (single-threaded) or parallel (multi-threaded) tasks; and 2) intra-task communication, where parallel threads of the same application exchange data. A new task scheduling model for parallel tasks (bundled scheduling) is proposed to facilitate intra-task communication and reduce synchronization overheads. We show that the proposed bundled scheduling model can be applied to several parallel programming models, such as fork-join and DAG-based applications, leading to improved system schedulability. Finally, intra-task communication is governed by a predictable inter-core communication platform. Specifically, we propose HopliteRT, a lean and predictable Network-on-Chip that connects the private SPMs.

Acknowledgements

First and foremost, I am grateful to Allah for empowering me to complete this thesis. Without his help, I would not have the ability to reach this stage in my life. I would like to seize this opportunity to thank all the people who made this dissertation possible and my Ph.D. a spectacular experience.

My deepest gratitude goes to my advisor, Prof. Rodolfo Pellizzoni. It is hard to describe the uncountable ways in which Rodolfo impacted my life and career. He has shared with me his expertise and always provided constructive guidance. He involved me in exciting projects and valued my suggestions. I truly admire his technical depth and scientific rigor. All in all, he has been an irreplaceable advisor and continues to be a precious friend.

I am extremely grateful to the members of my thesis committee: Prof. Nathan Fisher, Prof. Hiren Patel, Prof. Sebastian Fischmeister, and Prof. Martin Karsten. Each of them has supported my work in his own way. I thank Prof. Fisher for taking the time to serve as the external examiner; his feedback was highly valuable. My encounter with Prof. Fischmeister was the initial trigger for my interest in the field of real-time systems during my master’s. I enjoyed his course and admired his regard for the practical aspects of the field. The lectures of Prof. Patel in computer architecture were inspiring, and I have learned a lot from his research vision and precise definitions of research problems. I have also learned a lot from the expertise of Prof. Karsten in operating systems; my encounters with him inspired several research ideas that ended up in this dissertation.

I extend my thanks and appreciation to my co-authors, Ahmed Alhammad, Nachiket Kapre, Rohan Tabish, Renato Mancuso, and Marco Caccamo, with whom I have always enjoyed discussing research problems and from whom I received valuable feedback. My thanks also extend to Michael Guo, Muhammad Refaat, Anirudh Kaushik, and Mohamed Hassan, colleagues with whom I sadly did not have the chance to write papers, but whose help and feedback are highly appreciated.

My time at Waterloo was made enjoyable by the many friends who became a part of my life. Special thanks go to my close friend Ali Albishi, with whom I really enjoyed discussing different aspects of life. His enthusiastic personality motivated me to keep going during the tough times.

I am grateful to King Abdulaziz University for their financial support. Without their support, I probably would not have had the opportunity to attend a great school like the University of Waterloo.

Foremost, I thank my father, Mohammad, for his endless love and optimism and for being with me at all times with his prayers. You always encourage me to do the best in my life. I am also thankful to my deceased mother, Zara’ah, for raising me and teaching me to be a good person. My thanks extend to my brothers and sisters, who believed in my cause and always remembered me in their prayers. I am thankful for my son, Mohammad, and my daughter, Jumanah, for brightening our home with their beautiful smiles. My deepest and greatest thanks and gratefulness go to my wife, Hind, for her love, patience, and trust that I was doing the right thing. I could not have finished my Ph.D. without you.

Dedication

Indeed, my prayer, my rites of sacrifice, my living and my dying are for Allah, Lord of the worlds. [Quran 6:162]

To the ones I love:
To my parents, Mohammad and Zara’ah
To my devoted and supportive wife, Hind
To my hero son, Mohammed
To my princess daughter, Jumanah

Table of Contents

List of Tables

List of Figures

1 Introduction
1.1 Challenges with Multi-Core Systems
1.1.1 Shared Resources and Contention
1.1.2 Data Sharing in Parallel Tasks
1.2 Scope and Contributions of This Work
1.2.1 PART(I): Efficient Task Isolation
1.2.2 PART(II): Predictable Task Communication
1.3 Structure of the Dissertation

2 Background and Related Work
2.1 Predictability and Timing Isolation in Multicore Systems
2.1.1 Comparing the Architecture of SPM with Cache
2.1.2 Cache Memory
2.1.3 Scratchpad Memory
2.1.4 Other Task Isolation Techniques
2.2 Real-time Task Communication
2.2.1 Real-Time Scheduling of Parallel Tasks

2.2.2 Memory Model
2.2.3 Predictable Network On-Chip Architectures

I Efficient Task Isolation For Real-time Applications

3 Partitioned Scratchpad-Centric Scheduling of 3-Phase Real-time Tasks
3.1 System Model
3.2 Case (I): Scheduling 3-Phase Tasks with Variable-size DMA Operations
3.2.1 Schedulability Analysis of the 3-Phase Tasks with Dynamic-size DMA Operations
3.2.2 Analysis of Multi-Segment Tasks
3.3 Case (II): Scheduling 3-Phase Tasks with Fixed-size DMA Operations
3.3.1 Schedulability Analysis of the 3-Phase Tasks with Fixed-Size DMA Operations
3.4 Fault-Tolerant Scheduling of 3-Phase Tasks
3.4.1 Extending Schedulability Analysis for Error Recovery
3.4.2 Response Time Calculation
3.4.3 Accounting for Error Recovery
3.5 Summary

4 System Implementation
4.1 Implementation on an FPGA Platform
4.1.1 Hardware Architecture
4.1.2 Address Translation
4.1.3 Software Implementation
4.1.4 Evaluation
4.2 Implementation on a COTS Platform
4.2.1 Platform Description

4.2.2 OS Design
4.2.3 Evaluation
4.3 Summary

5 Global Scratchpad-Centric Scheduling of 3-Phase Real-time Tasks
5.1 Task Model and Notations
5.2 Scheduling Algorithm
5.2.1 Scheduler Design
5.3 Schedulability Analysis
5.3.1 Bounding the Interfering Jobs
5.3.2 Bounding the Individual Workload Ik(Ji)
5.3.3 Bounding the Total Workload Ik(Γ)
5.3.4 Bounding the Interference on a Problem Job
5.3.5 Schedulability Condition
5.4 Evaluation
5.5 Summary

II Task Communication For Hard Real-time Applications

6 Inter-Task Communication with 3-Phase Task Model
6.1 The Proposed Inter-Task Communication Model
6.2 Implementation
6.3 Bounding Communication Latency
6.4 Evaluation
6.5 Summary

7 Bundled Scheduling of Parallel Real-time Tasks
7.1 System Model
7.1.1 The Scheduling Algorithm
7.1.2 Programming Model
7.2 Schedulability Analysis
7.2.1 Bounding the Contribution
7.2.2 Analysis for Multiple Bundles
7.2.3 Tightening the Analysis
7.2.4 Priority Assignment
7.2.5 Discussion
7.3 Evaluation
7.4 Bundled Scheduling for the 3-Phase Task Model
7.5 Summary

8 Predictable Inter-core Communication
8.1 The Case for FPGA Overlay NoCs
8.2 Background: Hoplite NoC Architecture
8.3 Managing In-Flight Deflections
8.4 Regulating Traffic Injection
8.4.1 Token Bucket Regulator
8.4.2 Analysis
8.5 Evaluation
8.6 Integrating Communication Time into Execution Time of Bundled Tasks
8.7 Summary

9 Conclusions
9.1 Limitations
9.2 Future Directions

References

List of Tables

3.1 Task’s Parameters

4.1 Maximum operating frequency comparison
4.2 Hardware resource comparison: in this specific implementation, each block RAM is 36 Kilobits (Kb)
4.3 Software Overheads
4.4 Benchmark Results
4.5 Characteristics of Freescale MPC5777M SoC
4.6 Details of OS Parameters
4.7 Details of EEMBC Benchmarks
4.8 Space Overhead of HW versus SW recovery
4.9 Suitable Commercial Multicore COTS platforms

5.1 Benchmarks

8.1 Routing Function Table to support Real-Time extensions to Hoplite. PE injection has lowest priority and will stall on conflict. PE→E + W→S is not supported to avoid an extra select signal driving the multiplexers and doubling LUT cost by preventing fracturing a 6-LUT into 2×5-LUTs
8.2 FPGA costs for 64b router (4×4 NoC) with Vivado 2016.4 (default settings) + Virtex-7 485T FPGA

List of Figures

1.1 (A): access time to main memory grows more than proportionally with the number of contending cores. (B): the WCET is highly affected by the interference caused by the parallel cores
1.2 (A): Simplified hardware architecture. (B): Example schedule showing how tasks are executed on one core
1.3 Illustration of the proposed system architecture with a predictable interconnect between private SPMs

2.1 Generic multicore architecture; gray blocks might not be available in some systems
2.2 SPM Architecture
2.3 Direct-Mapped Cache Architecture (from [1])
2.4 Fully Associative Cache Architecture (from [1])

3.1 Example schedule showing how tasks are executed to hide memory access latency
3.2 Illustrative schedule for preemptive CPU execution
3.3 Illustrative schedule for preemptive DMA operation
3.4 An illustrative schedule of the worst-case response time of a task under analysis
3.5 Illustrative schedule of multi-interval tasks
3.6 Scheduling CPU, DMA and local memory with fixed-size time slots
3.7 Example schedule, intervals are highlighted
3.8 Scheduling example with the proposed error recovery mechanism
3.9 Task scheduling with illustration of the error recovery mechanism
3.10 Example showing how H is extended by an induced interval due to error recovery prior to Interval F

4.1 System-Level Block Diagram
4.2 RSMU Memory Translation
4.3 The baseline hardware architecture with cache memory
4.4 Execution time comparison between Cache and SPM: SPM1 runs at the same frequency as cache, SPM2 runs at 22% higher frequency
4.5 Application execution speedup: SPM1 runs at the same frequency as cache, SPM2 runs at 22% higher frequency
4.6 Single-core schedulability comparison between Carousel and our approach
4.7 Multi-interval single-core schedulability comparison between Carousel and our approach
4.8 Single-interval 4-core schedulability comparison between the three systems
4.9 Multi-interval 4-core schedulability comparison between the three systems
4.10 Single-interval 8-core schedulability comparison between the three systems
4.11 Multi-interval 8-core schedulability comparison between the three systems
4.12 Each system’s tolerance to the degradation in memory speed at 70% utilization with multi-interval tasks
4.13 MPC5777M Block Diagram
4.14 Block diagram of error handling circuitry
4.15 Interaction between I/O Core and Core 1 for task scheduling
4.16 Experimental execution time for synthetic benchmarks
4.17 Experimental execution time for EEMBC benchmarks
4.18 Schedulability with SPM-based and traditional scheduling models
4.19 Utilization degradation as a function of task periods

5.1 An example of scheduling 6 jobs on 2 cores
5.2 The cores are chosen based on the minimum sk
5.3 Carry-in limit for m = 2
5.4 The interfering jobs of tasks ∈ hep(k) with no carry-in, computation carry-in and memory carry-in
5.5 Scheduling interval examples
5.6 The computation phase of the problem job executes after Ik^max
5.7 Our schedule on 4 cores showing 100% utilization
5.8 Contention-based schedule on 4 cores showing 50% utilization
5.9 2-core schedulability comparison
5.10 4-core schedulability comparison
5.11 8-core schedulability comparison
5.12 The scalability comparison of the WCRT bound

6.1 Worst-case communication latency between two tasks
6.2 End-to-end communication latency

7.1 Illustrations of the negative impact on synchronized parallel threads if not co-scheduled at the same time
7.2 Illustrations of how a fork-join parallel task can be scheduled according to different scheduling strategies
7.3 Fork-join application and resulting bundled task
7.4 DAG application and possible schedules
7.5 Interference caused by higher priority tasks. The up arrow denotes the arrival time of the task under analysis
7.6 The maximum workload of task τi within a window of time t
7.7 Worst-case contributions of interfering workloads. A: contributing to the first bundle. B: contributing to the second bundle
7.8 Examples of lower-priority bundles’ ability to run
7.9 Schedulability test of the compared analyses on 8 cores with respect to task set types
7.10 Schedulability test of the compared analyses on 32 cores with respect to task set types

7.11 Preemptive versus non-preemptive gang scheduling
7.12 Deferred preemption versus synchronous deferred preemption in gang scheduling
7.13 Example schedule of integrating bundles into the 3-phase execution model

8.1 System architecture with the proposed predictable interconnect between private SPMs
8.2 Implementation choices for the Hoplite FPGA NoC Router. A LUT-economical version (left) is able to exploit fracturable Xilinx 6-LUTs to fit both 2:1 muxes into a single 6-LUT. The larger, higher-bandwidth version (right) needs two 6-LUTs instead, as the number of common inputs is lower than required to allow fracturing
8.3 Endless deflection scenario where red packets from (0,0) → (3,3) are perpetually deflected by blue packets from (3,3) → (3,1). The red spaghetti is the flight path of one packet that gets trapped in the top-most ring of the NoC and never gets a chance to exit due to the bossy blue packets
8.4 Proposed routing function arrangement for bounded in-flight latency. Despite splitting the logic into 2×5-LUTs (3:1 muxes), the same select signals (with different interpretation) drive both multiplexers. This allows a compact 6-LUT implementation per bit
8.5 Worst-case path on HopliteRT for a packet traversing from top-left PE (0,0) to bottom-right PE (3,3). The red packets will deflect N→E in each ring due to a conflicting flow (not shown). The blue packets that previously had priority are now deflected in the top-most ring before delivery
8.6 Unlucky client at (1,0) is swamped by the client at (0,0), which has flooded the NoC with packets at full link bandwidth (one packet per cycle). Red packets from (0,0)→(x,y) perpetually block the client exit at (1,0). This results in a waiting time of ∞ for packets at (1,0)
8.7 Conceptual view of the Token Bucket regulator at the injection port of each NoC client. The FPGA implementation cost is two cascaded counters (no actual memory is needed to store any tokens). The client can inject a packet into the NoC only when the NoC is ready and there is at least one token available in the Token Bucket

8.8 Understanding interfering traffic flows at a client for determining the set C of conflicting flows Γf. The dotted N→E is a deflected flow that will wrap around the X-ring and return at the W port. The PE→E (red) flow will interfere with the W→E flow, and also with the W→S flow due to HopliteRT router limits. The PE→S (blue) flow will interfere with the N→S, W→S, and deflected N→E flows
8.9 Effect of traffic patterns on worst-case in-flight latency of the workload at 100% injection rate. Worst-case analytical bounds (red) are easily violated by baseline Hoplite. With HopliteRT we are always within the bound, and deliver superior worst-case latency for the ALLTO1, TORNADO, RANDOM, and LOCAL patterns. For TRANSPOSE, the persistent victimization of N→S packets causes a slightly longer worst-case latency
8.10 Effect of injection rate on worst-case in-flight latency of the RANDOM workload for 256 clients. At low injection rates, the NoC routing latencies are not very different, but as the NoC gets congested, HopliteRT starts to deliver improvements
8.11 The optimized bound versus the basic one on worst-case in-flight latency of the RANDOM workload at 100% injection rate for 256 clients
8.12 Comparing source queueing times for regulated vs. unregulated HopliteRT NoCs as a function of system size for the ALLTO1 traffic pattern. Regulated traffic offers much improved waiting times at the clients
8.13 Comparing source queueing times for regulated vs. unregulated HopliteRT NoCs as a function of system size for the RANDOM traffic pattern. Again, regulated traffic offers better latency behavior, but bounds are much lower than for the ALLTO1 pattern
8.14 Effect of injection rate on worst-case source-queueing latency of the ALLTO1 workload for 16 clients. The E port can accept more packets due to the DOR routing policy, and can furthermore be blocked by our LUT-constrained HopliteRT router. Above a certain injection rate, no bounds can be computed due to infeasible flow rates in the network
8.15 Example of three different cases of bundles that do not suffer communication interference
8.16 HopliteRT node scheduling (A) and network partitioning (B)
8.17 Suggested modification to the router to support dynamic partitioning

Chapter 1

Introduction

Real-time systems require predictable temporal behavior. The system schedules shared physical resources, such as CPU time, to execute tasks of different applications. An application is composed of one or more tasks [98]. Typically, tasks are recurrent: each task produces a potentially infinite sequence of jobs, activated either periodically or sporadically.

Applications in real-time systems are classified into safety-critical applications, known as hard tasks, and less-critical applications, known as soft tasks. Each job of a hard task needs to finish its execution and produce its output before the expiration of a strict time limit (deadline); otherwise, the system might catastrophically fail. Medically implanted pacemakers and airbag systems in modern cars are examples of this class of applications. On the other hand, soft real-time tasks might only suffer service degradation when a job misses its deadline. Examples of such soft tasks include real-time communication in video conferencing. In this dissertation, we focus on hard real-time tasks.

In hard real-time systems, the collection of executed tasks (also known as a task set) requires timing validation. A task set is called feasible if it can be scheduled such that all jobs of all tasks meet their deadlines, under all permissible combinations of job-arrival sequences by the different tasks comprising the system. On a particular computing platform, the system of tasks is said to be A-schedulable with respect to a given scheduling algorithm A if the algorithm A schedules the system such that all jobs of all tasks meet all deadlines. A schedulability analysis for scheduling algorithm A accepts as input the specifications of a real-time system and determines whether the system is A-schedulable or not. An A-schedulability analysis is said to be exact if it correctly identifies all A-schedulable systems, and sufficient if it may fail to identify some A-schedulable systems (it must guarantee, though, that all identified systems are indeed A-schedulable).
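Stated compactly, if testA(·) denotes a schedulability test associated with algorithm A and Γ denotes a task system, the two notions differ only in the direction of the implication (this is merely a restatement of the definitions above):

\[
\text{exact:}\quad \mathrm{test}_A(\Gamma) = \text{true} \;\Longleftrightarrow\; \Gamma \text{ is } A\text{-schedulable},
\qquad
\text{sufficient:}\quad \mathrm{test}_A(\Gamma) = \text{true} \;\Longrightarrow\; \Gamma \text{ is } A\text{-schedulable}.
\]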

An important input to the schedulability analysis is the worst-case execution time (WCET) of each task. It is essential for the WCET estimate of a task to be safe, i.e., to capture an upper bound on the execution time under all circumstances. While underestimating the WCET can lead to unsafe schedules (missed deadlines), overestimating the WCET can reduce system schedulability and leave the hardware underutilized. It is often desirable for the WCET estimate to be tight, i.e., close to the actual worst-case execution time. The tightness of WCET bounds can be highly affected by the underlying platform, including the hardware, the operating system (OS), and the application runtime environment adopted by the programming language. Predictability is an important and related term that indicates how tight the WCET estimate is. Specifically, [24] defines predictability as the ratio between the best-case and the worst-case behavior. That is, a component with constant-time behavior is 100% predictable, and this percentage decreases as the difference between the best-case and the worst-case behavior increases. Consequently, when any component in the system behaves in a less predictable manner, a safe but pessimistic performance assumption is needed to compensate for the less predictable timing estimates. This pessimism can reduce system utilization and prevent real-time engineers from exploiting the full performance of the hardware.

In the last decade, however, the popularity of multi-core chips has seen an uptrend in the domain of embedded computing. Modern emerging embedded systems, such as self-driving cars and Unmanned Aerial Vehicles (UAVs), which heavily utilize parallel multi-threaded applications such as machine learning and computer vision, are driving an increasing performance demand in embedded computing in general and in real-time computing in particular. This demand is answered by high-performance embedded multi-core chips. However, the introduction of multi-core chips has a disruptive impact on the design of real-time systems: most of the existing design practices in the field cannot be directly applied to multi-core systems. Hence, platform selection for the design of modern safety-critical systems often involves outdated technological solutions or highly inefficient exploitation of modern hardware [5], such as disabling all but one core. In response, this is driving researchers’ efforts to introduce new design methodologies and tools to overcome the challenges of multi-core systems, in particular those due to the presence of shared hardware resources that affect the predictability of the system.

1.1 Challenges with Multi-Core Systems

The introduction of multi-core systems in the real-time domain is relatively new compared to single-core systems. Therefore, many challenges related to multi-core systems are

currently being addressed by the community. In this dissertation, we address two main challenges: 1) contention for access to shared physical resources, such as the memory subsystem of modern Multi-Processor Systems-on-Chip (MPSoC); and 2) the need for data sharing in real-time parallel applications.

1.1.1 Shared Resources and Contention

In industry, it is often the case that different components of the system are provided by several vendors and then integrated together. For more than three decades, a fundamental assumption of schedulability analysis has been that the WCET can be calculated for individual tasks in isolation, i.e., without considering the activity of other cores or hardware components. This simplifies the schedulability analysis of the complete system when tasks are running together. Unfortunately, this assumption is violated in multi-core systems. In a single-core processor, tasks do not execute concurrently but rather sequentially, albeit they might be preempted. The sequential execution implies that a task has exclusive access to all hardware resources, such as caches, main memory, and I/Os; no two tasks can access shared resources simultaneously. However, the situation is different in multi-core systems. Two tasks can run simultaneously on different cores and access shared resources at the same time, thus causing interference to each other. Consequently, the extensive sharing of hardware resources has made analyzing the temporal behavior of real-time applications running in parallel on multi-core systems one of the biggest challenges in the real-time domain today.

Figure 1.1 depicts the implications of contended access to shared resources, main memory in this case. The figure is based on results from slides presented during a keynote speech titled "Mixed-Criticality Systems - a Journey Embedded in Time and Space" by M. Paulitsch at the 27th Euromicro Conference on Real-Time Systems (ECRTS15), 2015 (see footnote 1). The figure shows the performance degradation encountered by a set of concurrently running synthetic benchmarks, designed to maximize delay due to inter-core interference, on the Freescale P4080 eight-core MPSoC platform. Specifically, the execution time of the task under analysis is observed first when it runs alone, and then with an increasing number of applications running in parallel on other cores. Figure 1.1(A) focuses on the access time to main memory under four scenarios: when the task under analysis is performing read operations and the interfering tasks are also performing read operations ("read-read"); when the task under analysis is performing read operations and the interfering tasks are performing write operations ("read-write"); and so on for "write-read" and "write-write".

1 See the presented slides at http://control.lth.se/ecrts2015/files/keynote.pdf

[Figure 1.1 plots: (A) access time to main memory in cycles (log scale) versus the number of active cores (1-8) for the read-read, read-write, write-read, and write-write access patterns; (B) WCET relative to the isolated case, growing from 100% with 1 core to 150%, 210%, and 290% with 2, 3, and 4 active cores due to interference-induced overhead.]

Figure 1.1: (A): access time to main memory grows more than proportionally with the number of contending cores. (B): the WCET is highly affected by the interference caused by the parallel cores.

It can be observed that the access time to the shared main memory in this platform grows more than proportionally with the number of active cores. This behavior greatly affects the WCET, as depicted in Figure 1.1(B). As can be seen from the figure, with 4 contending cores the execution time of the task under analysis is inflated by roughly 3X. While the slowdown of memory access time is about 10X in the case of 4 contending cores, it leads to only about a 3X slowdown in total execution time. This is because, during typical task execution, a portion of the time is spent on computation-only operations that can progress in parallel and often hide the extra time required for non-blocking memory operations. This degraded performance behavior is not unique to real-time platforms. It has also been shown in [181] that similar effects are observed when running SPEC CPU2006 applications on an Intel Xeon X3565 quad-core processor: the execution time of one application can increase by 56% compared to the execution time of the same application when run in isolation. Resource contention has also been acknowledged by certification authorities, and it represents a source of concern for the use of multi-core processors in avionics systems [5].
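A back-of-the-envelope model (not taken from the cited slides; the fraction α below is purely illustrative) helps explain why a roughly 10X slowdown in memory access translates into only about a 3X increase in execution time. If a task spends a fraction α of its isolated execution time on memory operations whose latency is stretched by a factor k under contention, while the computation-only portion is unaffected, then

\[
\text{slowdown} \;\approx\; (1-\alpha) + \alpha k \;=\; 1 + \alpha(k-1),
\qquad \alpha \approx 0.2,\; k \approx 10 \;\Rightarrow\; \text{slowdown} \approx 2.8,
\]

which is consistent with the roughly 3X inflation shown in Figure 1.1(B).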

Task isolation can be spatial or temporal. Spatial isolation restricts the ability of software components to access specified hardware components or subsets of memory resources. Spatial isolation improves system safety and security by limiting each application or application class to a set of accessible resources with no interaction between unrelated components; i.e., an isolated faulty application cannot affect other applications. On the other hand, temporal isolation protects the timing behavior of the isolated application from other applications running in parallel on other cores; i.e., the performance of the application is independent of the behavior of the other running applications. While contemporary OSs and COTS platforms are already equipped with advanced spatial isolation techniques, they do not provide extensive support for achieving strong temporal isolation among applications.

The current industry trend is to use Commercial-Off-The-Shelf (COTS) multi-core platforms to build hard real-time systems. Unfortunately, such platforms are not designed to support real-time applications, but rather to optimize average-case performance. We argue that to construct a tight WCET analysis for real-time tasks executing on such platforms, the designer would need: 1) a detailed model of the hardware components, so that their behavior can be accurately captured; and 2) a detailed characterization of all tasks in terms of their accesses to shared hardware resources. Unfortunately, this approach is not practical for at least two main reasons. First, due to manufacturers’ intellectual property, detailed models of COTS hardware components are often unavailable or incomplete. Second, assuming accurate knowledge of the entire system workload also faces practical limitations. In particular, as the number of parallel cores increases, the problem of exhaustively testing all possible platform/workload configurations combinatorially explodes. We argue that if tasks are allowed to access shared resources in an unregulated way, the resulting WCET estimates are typically so large as to be practically unusable. Instead, the platform should be engineered to achieve strong temporal isolation between cores, either through mechanisms at the hardware level, the software level, or both (hardware-software co-design). Temporal isolation allows the designer to determine system schedulability by composing the WCETs of individual tasks. Strong temporal isolation between running tasks would therefore significantly improve the WCET of the tasks and simplify the schedulability analysis of the system.

1.1.2 Data Sharing in Parallel Tasks

With the increased demand for high-performance real-time applications, such as autonomous driving and computer vision [158], real-time designers are looking at parallel applications. Therefore, the real-time community has developed parallel task models, where each task comprises a number of subtasks, i.e., threads. The threads can be activated on different computing nodes simultaneously to run in parallel, and usually share data. In the real-time

literature, applications are broadly categorized into single-threaded applications, known as sequential tasks, and multi-threaded applications, known as parallel tasks. With the increasing demand for more single-application performance, several parallel programming models have been adopted for real-time parallel applications, such as the fork-join and the Directed Acyclic Graph (DAG) models. However, the strong demand for temporal isolation between sequential hard real-time applications has delayed the adoption of multi-core systems. As discussed in the previous section, task isolation makes it possible to safely utilize multi-core systems for real-time applications, improves predictability, and simplifies analysis. Unfortunately, the temporal isolation mechanisms used for sequential tasks cannot be applied to parallel tasks, because we cannot isolate cores that are communicating. As a matter of fact, existing scheduling schemes for parallel real-time tasks assume no synchronization cost or data sharing. We argue that parallel applications need new scheduling models that relax these assumptions and allow for predictable inter-core communication.

1.2 Scope and Contributions of This Work

The goal of this work is to provide a set of solutions, at the hardware, software, and analysis levels, to increase the predictability of both sequential and parallel safety-critical real-time applications on multi-core systems. In contrast to other approaches that target COTS systems only, this work focuses on platform engineering and hardware/software co-design to achieve more predictable performance for safety-critical applications. Our design philosophy is to employ the knowledge of the system workload that exists at the scheduling level, and pass the required information down to the hardware level to optimize task execution. This methodology significantly simplifies the hardware design and the WCET analysis, at the cost of more complex scheduling policies.

In more detail, we provide solutions to 1) avoid access contention to shared resources such as main memory and I/O devices, 2) provide predictable access to local memory, and 3) provide predictable inter-core communication. In particular, at the hardware level, we consider Scratchpad Memory (SPM), instead of conventional caches, to implement the local memory, for predictability reasons (see footnote 2). Furthermore, we propose a lean and predictable Network-on-Chip (NoC) that governs inter-core communication. At the software level, we dynamically manage the local SPMs based on OS support, i.e., SPM management is transparent to the applications. This includes loading tasks into the local SPMs, unloading tasks

2 As we discuss in Chapter 2, caches can have significant predictability issues.

from the SPMs, and moving data between the local SPMs as needed. The contributions of this work are broadly divided into two parts: 1) task isolation and 2) task communication.

1.2.1 PART(I): Efficient Task Isolation

To achieve strong inter-core timing isolation, we propose to adopt a new task execution model based on the existing PRedictable Execution Model (PREM) presented in [128]. The work in [128] is a software solution that relies on compiler techniques to instrument a task’s code with extra instructions that divide the task into memory and execution phases. It does so by using the CPU to prefetch the required data into the cache during the memory phase, while in the execution phase the task executes from the cache and does not access main memory. This execution model avoids contention by scheduling a task’s accesses to main memory while other tasks are in their execution phases. Unlike the original PREM, we propose to use a Direct Memory Access (DMA) controller to load tasks into the local SPM without stalling the CPU. Our model is based on hardware/software co-design and relies on dynamic management of the SPM as local memory.

Figure 1.2 illustrates the main ideas behind the proposed execution model. Part (A) shows a simplified illustration of the architectural requirements of the model. We assume that each core has access to a private SPM and a DMA controller. The SPM is dual-ported to allow simultaneous access from the local CPU and the DMA: the CPU and the DMA can access different portions of the local memory concurrently without causing mutual interference. A task is loaded into the local SPM before execution, and the CPU does not need to access main memory while executing the task. Moreover, the SPM is divided into two partitions to allow the DMA to load a task into one partition while the CPU simultaneously executes another task from the other partition. Part (B) of the figure shows an arbitrary schedule of three tasks to illustrate the execution model.

Note that the described system can also be realized with caches. This is possible by applying cache partitioning techniques [155, 104] in systems that support cache locking and cache stashing [7]. However, it is still unclear whether the proposed scheme would be deterministic when both the CPU and the DMA are active simultaneously. In addition, caches are intended to be self-managed and require more effort to be managed in software. Scratchpad memories, on the other hand, are simpler to manage and have been shown to provide better energy consumption [28] and temporal isolation compared to traditional caches [136, 110].

A task is divided into three phases: load, execution, and unload. As shown in the figure, the DMA is used to unload any previously loaded task in the targeted partition

before loading a new task. After that, the task can be executed from the local memory with no interference. As depicted, due to the dual partitioning of the local memory, the CPU is kept busy most of the time and the access latency to main memory is effectively hidden, since executing a task from one partition overlaps with loading another task into the other partition. This desirable behavior of completely hiding the access latency to main memory holds as long as the execution time of a task is greater than or equal to the load time of the next task scheduled on the other partition of the same core’s SPM. When the execution time is smaller than the load time, the access latency is still partially hidden, by an amount equal to the overlapped execution time.

Figure 1.2: (A) Simplified hardware architecture. (B) Example schedule showing how tasks are executed on one core.

The proposed execution model can be applied to both sequential and parallel tasks. In the case of a parallel task, all code and data of the required threads of the task are loaded into or unloaded from the SPM partitions together. Note that for communicating parallel tasks, we propose an orthogonal solution to handle inter-core communication, as discussed in Section 1.2.2.

The proposed execution model provides four main benefits: 1) it avoids task interference due to contention for access to main memory; 2) it efficiently hides the access latency to main memory; 3) it efficiently utilizes the memory transfer bandwidth, as it moves larger contiguous blocks of data; in particular, sequential transfers in Dynamic RAM (DRAM) are significantly more efficient than random accesses; and 4) it improves the tightness of the WCET analysis due to the improved predictability of the platform, as tasks execute only out of the local SPMs.
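To make the co-scheduling idea concrete, the following sketch simulates one core with two SPM partitions under the assumptions above (a fixed-priority order already decided and worst-case phase lengths known up front). It is only an illustration: the names (task_t, spm_partition_t, dma_swap) and the timing bookkeeping are hypothetical placeholders, not the FPGA or ERIKA implementations described in Chapter 4.

#include <stdio.h>

typedef struct {
    const char *name;
    int load_time;      /* worst-case DMA load time of the task's code/data */
    int exec_time;      /* worst-case execution time from the local SPM     */
    int unload_time;    /* worst-case DMA unload (write-back) time          */
} task_t;

typedef struct {
    const task_t *resident;   /* task currently held in this SPM partition */
} spm_partition_t;

/* Stand-in for programming the DMA: unload whatever is resident in the
 * partition, then load the next task into it. Returns the DMA busy time. */
static int dma_swap(spm_partition_t *p, const task_t *next)
{
    int busy = 0;
    if (p->resident != NULL) {
        printf("  DMA: unload %s (%d)\n", p->resident->name, p->resident->unload_time);
        busy += p->resident->unload_time;
    }
    printf("  DMA: load   %s (%d)\n", next->name, next->load_time);
    p->resident = next;
    return busy + next->load_time;
}

int main(void)
{
    /* Hypothetical ready queue, already ordered by fixed priority. */
    task_t queue[] = { {"t1", 2, 6, 2}, {"t2", 3, 4, 1}, {"t3", 2, 5, 2} };
    const int n = (int)(sizeof queue / sizeof queue[0]);

    spm_partition_t part[2] = { { NULL }, { NULL } };
    int cur = 0;        /* partition the CPU executes from in this interval */
    int elapsed = 0;

    /* Prime the first partition; there is nothing to overlap with yet. */
    elapsed += dma_swap(&part[cur], &queue[0]);

    for (int i = 0; i < n; i++) {
        /* Start the DMA on the *other* partition for the next task (if any),
         * concurrently with the CPU executing the current task. */
        int dma_busy = (i + 1 < n) ? dma_swap(&part[1 - cur], &queue[i + 1]) : 0;
        int cpu_busy = part[cur].resident->exec_time;

        printf("  CPU: run    %s (%d) from partition %d\n",
               part[cur].resident->name, cpu_busy, cur);

        /* CPU and DMA work in parallel, so the interval length is the larger
         * of the two: DMA latency is fully hidden whenever
         * cpu_busy >= dma_busy, and partially hidden otherwise. */
        elapsed += (cpu_busy > dma_busy) ? cpu_busy : dma_busy;

        cur = 1 - cur;  /* the next task runs from the other partition */
    }
    /* The final unload phase is omitted here for brevity. */
    printf("Makespan on this core: %d time units\n", elapsed);
    return 0;
}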

In regards to I/O handling, we exploit the core specialization available in modern embedded platforms, such as the Freescale MPC5777M used in our implementation. In short, a dedicated I/O core receives data from I/O devices and buffers them in its local SPM without affecting the main memory bus. Upon task loading on an application core, we program the DMA to move the needed I/O data from the local SPM of the I/O core to the targeted partition on the application core. Similarly, upon task unloading, we program the DMA to move the produced I/O data from the task’s partition on the application core to the corresponding buffer on the I/O core.

From a scheduling point of view, to avoid access contention to physical resources, we adopt a co-scheduling approach. We contribute several dynamic scheduling algorithms, based on a fixed-priority scheme for a set of sporadic real-time tasks, that efficiently co-schedule processor and DMA execution to hide memory access latency. The proposed algorithms target single-core, partitioned multi-core, and global multi-core scheduling schemes. We demonstrate that we improve processor utilization significantly compared to existing scratchpad and cache management systems, in addition to contention-based systems.

The analysis is also extended to cover tasks with a large memory footprint that cannot entirely fit into the local SPM partition. In essence, a task can be split into multiple segments, where each segment fits into the local memory. The first segment is released like a normal task, and each subsequent segment is released after the previous segment finishes. Jobs of the same task are assumed to produce the same number of segments, and each segment inherits the priority of the producing task. Note that while a segment of a task is executed by the CPU, the DMA cannot load the next segment of the same task, since a later segment might depend on a previous segment’s data.

Task-splitting is quite popular in practice and supported by several commercial tools. For instance, it is adopted in time-triggered architectures to mitigate the allocation problem of tasks to cycles [45]. The recent work in [132] shows how to break a program into non-preemptive segments through compiler analysis. Moreover, there has been a significant amount of work in the literature that proposes techniques to manage the local memory of one task [53, 101, 25] for code and data, including stack and heap. Our focus in this work, therefore, is how to provide a timing guarantee for the scheduled jobs rather than how to divide a task into multiple segments.
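To make the segment-chaining rule concrete, the sketch below encodes it as a small data structure and walks the segments in release order. The type and field names are hypothetical and serve only to illustrate the release rule stated above; the formal model and analysis appear in Section 3.2.2.

#include <stdio.h>

/* Hypothetical descriptor of a task split into segments that each fit in an
 * SPM partition. Segment k+1 is released only when segment k completes, and
 * every segment inherits the priority of the producing task; the DMA never
 * preloads a later segment of the same task while an earlier one executes. */
typedef struct segment {
    unsigned load, exec, unload;   /* worst-case phase lengths of the segment  */
    struct segment *next;          /* becomes ready when this segment finishes */
} segment_t;

typedef struct {
    const char *name;
    unsigned    priority;          /* inherited by all of the task's segments */
    segment_t  *first;             /* released like a normal job arrival      */
} multiseg_task_t;

int main(void)
{
    segment_t s2 = { 2, 5, 2, NULL };
    segment_t s1 = { 3, 7, 1, &s2 };
    multiseg_task_t tau = { "tau1", 4, &s1 };

    /* Walk the chain in release order: each segment is handled like a
     * non-preemptive 3-phase job at the task's priority. */
    unsigned k = 0;
    for (segment_t *s = tau.first; s != NULL; s = s->next, k++)
        printf("%s segment %u: load=%u exec=%u unload=%u (priority %u)\n",
               tau.name, k, s->load, s->exec, s->unload, tau.priority);
    return 0;
}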

In addition, a set of scheduling strategies is proposed to recover from detectable transient memory errors. Specifically, we describe how it is possible to recover from bit errors, in both main memory and scratchpad, that are detected but not corrected by the hardware logic via Error-Correcting Codes (ECC). The strategy we follow largely leverages the redundancy already present in the employed multi-phase task model. A minimum amount of additional redundancy is introduced to protect data that do not exist as multiple copies in the original scheme. The schedulability analysis is extended to take into account the overhead introduced by the proposed recovery mechanisms. The proposed analysis considers different application recovery scenarios and isolates the critical path of the error-handling procedures, which often corresponds to entire task re-execution. The result is a fault-tolerant scheduling framework for multi-phase real-time tasks.

The proposed approach has been realized in both an in-house FPGA-based architecture [170] and a commercial embedded platform (Freescale MPC5777M) [159]. The execution model and the SPM management logic are integrated at the OS level. In particular, through collaboration with the co-authors of [159], the ERIKA RTOS [11] (see footnote 3) has been extended with a novel operating system design to exploit core specialization and low-level resource management policies. To the best of our knowledge, this is the first OS that integrates a scratchpad-based task scheduling mechanism with a schedule-aware I/O subsystem. The following list is a summary of the contributions in this part:

• SPM-centric system architecture with dynamic management at the OS level.

• Restricting task execution to the local memory to enforce timing isolation between cores.

• Extending the PREM execution model to allow for hiding access latency to shared resources.

• A set of SPM-centric scheduling algorithms with sufficient schedulability analysis for partitioned and global task systems.

• Extending the schedulability analysis for tasks with large memory footprint that cannot fit entirely in the local memory.

• Extending the schedulability analysis to account for recovery from memory soft errors that cannot be directly corrected in hardware by the ECC unit.

• FPGA and COTS implementations of the proposed schemes.

1.2.2 PART(II): Predictable Task Communication

In regards to task communication, we consider two types: inter-task and intra-task communication.

3 Erika Enterprise is an open-source RTOS that features a small memory footprint and supports multi-core platforms.

Inter-task communication addresses the communication between tasks, whether they are scheduled on the same core or on different cores. For this type of communication, we assume an asynchronous communication model: data previously produced by a sender task can be overwritten if it has not yet been read by the receiver task. In our proposed model, a sender task’s communication data is written to main memory during its unload phase. Similarly, any communication data required by a receiver task is loaded into the SPM during its load phase. Therefore, this model does not require inter-core communication in our proposed SPM-centric system: the communication is performed via main memory during the load and unload phases of the tasks. This model of communication can be applied to both sequential and parallel tasks, since the communication is scheduled and performed by the DMA used to load/unload the tasks. The OS-level management and scheduling policies are extended to account for inter-task communication. We also provide analytical bounds on the worst-case communication latency for a chain of sender-receiver tasks.

Intra-task communication, on the other hand, is unique to parallel tasks. This type of communication involves data exchange between the simultaneously running threads of the same parallel task. To facilitate this type of communication, we propose to gang schedule the communicating threads: all currently active threads of the same parallel task are executed concurrently. This policy is motivated by two objectives: 1) to facilitate intra-task communication, and 2) to reduce the synchronization overhead, thus obtaining a more predictable execution of parallel real-time tasks. Indeed, gang scheduling of parallel tasks has been proven to have significant performance benefits in many cases [60, 77, 150]. Existing gang scheduling for parallel real-time tasks [81, 55, 64] considers a rigid task model, where the number of threads required by an application is assumed to remain constant over its entire execution time. While the rigid model has the benefit of simplicity, it can incur a significant loss of performance by overestimating the computational demand of an application: many parallel applications change their required number of threads during execution. Hence, we introduce a novel task model, which we call the bundled model, that supports gang scheduling of parallel threads without incurring undue pessimism in modeling the application’s demand. In this model, a real-time task is composed of a sequence of bundles, where each bundle is characterized by a known worst-case execution time (WCET) and a number of required cores; successive bundles may require different numbers of cores. All threads within a bundle are then gang scheduled. We show that the proposed bundled scheduling model can be applied to several existing parallel programming models, such as fork-join and DAG-based applications, and we derive a schedulability analysis for a set of sporadic bundled tasks based on fixed priorities.
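As an illustration of the bundled model (the structure and numbers below are invented for this example; the formal model is defined in Chapter 7), a fork-join task with a sequential prologue, a 4-way parallel region, and a sequential epilogue could be flattened into three bundles. The sketch also contrasts the core demand of the bundled representation with that of a rigid gang reservation that holds the widest width for the whole span:

#include <stdio.h>

/* Illustrative (not normative) encoding of a bundled task: a sequence of
 * bundles, each with a worst-case execution time and a core requirement.
 * All threads of a bundle are gang-scheduled: they start together and occupy
 * exactly `cores` processors for up to `wcet` time units. */
typedef struct {
    unsigned wcet;    /* worst-case execution time of the bundle */
    unsigned cores;   /* number of cores required concurrently   */
} bundle_t;

int main(void)
{
    /* A fork-join task: sequential prologue, a 4-way parallel region,
     * and a sequential epilogue. Successive bundles differ in width. */
    bundle_t task[] = { {2, 1}, {6, 4}, {1, 1} };
    const unsigned n = sizeof task / sizeof task[0];

    unsigned span = 0, bundled_demand = 0, max_width = 0;
    for (unsigned i = 0; i < n; i++) {
        span           += task[i].wcet;
        bundled_demand += task[i].wcet * task[i].cores;
        if (task[i].cores > max_width)
            max_width = task[i].cores;
    }
    /* A rigid gang reservation would hold the widest width for the whole span. */
    unsigned rigid_demand = span * max_width;

    printf("span = %u, bundled core demand = %u, rigid core demand = %u\n",
           span, bundled_demand, rigid_demand);
    return 0;
}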


Figure 1.3: Illustration of the proposed system architecture with a predictable interconnect between private SPMs.

A remaining problem is how to bound the delay due to inter-core communication between threads of the same parallel task. In this dissertation, we propose to employ a dedicated real-time inter-SPM interconnect that can provide tight bounds on the worst-case access delay. In this regard, we have developed HopliteRT, a lean and predictable deflection-based Network-on-Chip (NoC). We then compute the worst-case response time of a parallel task based on the composition of the WCET on the cores and the worst-case communication time on the NoC (a rough illustration of this composition is sketched after the contribution list below). Figure 1.3 shows a simplified illustration of the considered system architecture. Note that we still rely on the main system bus for loading and unloading the tasks, including inter-task communication as mentioned earlier; the inter-SPM NoC is dedicated to intra-task communication only. Note also that any interconnect with provable real-time bounds could be used instead of HopliteRT. However, in line with our design philosophy, we are interested in reducing hardware cost, especially since HopliteRT targets only inter-core communication. The following list is a summary of the contributions in this part:

• Asynchronous communication model for inter-task communication via main memory.

• Bundled scheduling: a communication-aware scheduling model for parallel tasks with intra-task communication.

• Sufficient schedulability analysis for bundled scheduling.

• Extending bundled scheduling to the proposed 3-phase execution model presented in PART(I).

• Inter-core real-time NoC with provable latency bounds.
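As a rough illustration of the composition of per-core WCET and NoC latency mentioned above (the symbols here are introduced only for this sketch; the precise per-bundle analysis is developed in Chapters 7 and 8), the execution bound of a bundle can be thought of as its stand-alone WCET inflated by the worst-case NoC latency of the intra-task messages it exchanges:

\[
C_b^{\mathrm{total}} \;\le\; C_b^{\mathrm{comp}} \;+\; \sum_{m \in M_b} L_{\max}^{\mathrm{NoC}}(m),
\]

where $M_b$ denotes the set of messages issued by bundle $b$ and $L_{\max}^{\mathrm{NoC}}(m)$ is the source-queueing plus in-flight latency bound provided by HopliteRT for message $m$.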

1.3 Structure of the Dissertation

The remainder of this dissertation is organized as follows. First, Chapter 2 reviews the important background concepts and highlights the related work.

Part (I) of the dissertation includes Chapters 3, 4, and 5. Chapter 3 discusses the fundamental concepts of the proposed execution model and scheduling scheme. Specifically, in this chapter we show how to hide access latency to shared resources in partitioned systems, under different ways of handling DMA operations. We also show how to handle tasks with a large memory footprint; in particular, we discuss the case of scheduling multi-segment tasks, where a task cannot entirely fit into the local SPM. Furthermore, we present an error recovery mechanism with respect to the proposed execution model. In Chapter 4, we discuss the realization of the proposed approach on two different platforms: 1) a Xilinx FPGA platform using FreeRTOS [8]; and 2) a COTS platform using the ERIKA RTOS [11]. Finally, in Chapter 5 we extend the proposed scheme to globally scheduled systems.

Part (II) of the dissertation includes Chapters 6, 7, and 8. Chapter 6 addresses the inter-task asynchronous communication model, while Chapter 7 presents our novel bundled scheduling model for real-time parallel tasks; in addition, in that chapter we show how to integrate bundled scheduling with the proposed multi-phase execution model. Chapter 8 discusses inter-core communication and presents the proposed HopliteRT NoC. We also show how the communication latency of bundled parallel tasks can be bounded based on the proposed NoC design. Finally, Chapter 9 concludes the work by discussing the limitations of the presented approaches and providing an overview of possible extensions of this research.

Chapter 2

Background and Related Work

This chapter provides a background overview of important concepts required to better understand the work proposed in the following chapters. Following the theme of the dissertation, we first overview important concepts and relevant recent work regarding predictability and timing isolation of real-time tasks, for both cache-based and scratchpad-based systems. Afterwards, we discuss predictable task communication and related topics, in particular the scheduling of real-time parallel tasks and inter-core communication.

2.1 Predictability and Timing Isolation in Multicore Systems

Modern platforms often incorporate a multicore processor, which is a chip that has more than one processing core. This is to satisfy the increasing demand for higher computing performance required by modern real-time applications [13]. These chips are also known as Chip Multi-Processors (CMPs). The generic architecture of a multicore processor is shown in Figure 2.1. Throughout this work, we use the terms core, CPU, and processor interchangeably. Note that the private local memory can be realized as a cache or a scratchpad memory. In this dissertation, we focus on the scratchpad architecture.

In real-time systems, schedulability analysis is based on the assumption of a fixed, known WCET for each task. There are two main approaches to estimate the WCET: static analysis methods and measurement-based methods. A static analysis method does not require the task to be actually executed on the hardware. The analysis instead takes the task's code as input along with annotations from the user, finds the set of all possible

Figure 2.1: Generic multicore architecture; gray blocks might not be available in some systems.

control-flow paths through the task, combines the control flows with a model of the hardware, and estimates an upper bound on the WCET of the task. However, the complex nature of modern multicore systems makes this type of analysis very difficult or even impossible due to the lack of precise knowledge about the hardware, which makes hardware modeling inaccurate [99, 44]. The results of static analysis are usually pessimistic due to the conservative assumptions required to model the hardware safely.

On the other hand, measurement-based methods can provide more optimistic (but not necessarily safe) WCET estimates without the need for precise hardware modeling. Generally, the task is executed on the hardware for a set of inputs, and the distribution of the observed execution times is then determined. However, measurement-based methods are not guaranteed to cover all possible execution paths. Consequently, the estimated WCET can be unsafe, i.e., less than the actual WCET. Modern tools often combine measurement and static analysis methods to improve the accuracy of the estimated WCET and provide safe bounds. For example, RapiTime [9] is a tool that uses a hybrid technique combining static analysis and measurement: it uses measurements to time individual code blocks, while using analysis to determine the worst-case program execution path even if it was not exercised during a specific run.

In contrast to analysis solutions, researchers have also adopted another approach that tries to engineer the hardware and software platforms so that they become more predictable and easier to analyze. In such a case, the WCET becomes easier to determine and tight bounds

can be achieved. Our research falls into this category of solutions: we focus on the design aspects of embedded platforms rather than on WCET analysis. Among the issues that current multicore platforms impose on real-time systems is contention on shared resources such as main memory, the system bus, and the shared last-level cache. Even in the case of a private cache, the predictability of a task's execution can be affected by inter-task and intra-task conflicts. In what follows, we briefly compare the SPM and cache architectures, and review related work on cache-based and scratchpad-based systems. We also review other platform engineering techniques aimed at achieving predictability by enforcing timing isolation between cores.

2.1.1 Comparing the Architecture of SPM with Cache

Scratchpad memory (SPM) has received considerable attention in the embedded and real-time communities as an alternative to processor caches. Contrary to cache memory, scratchpad memory supports consistent and predictable access times. The SPM is also better than the cache in terms of area and power efficiency [28]. Figures 2.2 and 2.3 compare the SPM and cache architectures. The SPM is a simpler digital circuit, and hence occupies a smaller area and consumes less power. On-chip memory (SPM and cache) is usually built using static RAM circuitry [76]. Static random-access memory (static RAM or SRAM) is a type of semiconductor memory that uses bistable latching circuitry (a flip-flop) to store each bit.


Figure 2.2: SPM Architecture

The SPM, as shown in Figure 2.2, is a simple memory circuit. From the architectural perspective, accessing a memory word only requires a decoder and a multiplexer. Accessing memory and selecting a specific word is called indexing. During the access, the

target SRAM cells are selected and then read or written. On the other hand, the cache architecture is more complicated. It consists of multiple data memories, all of which are indexed at the same time [28]. In addition, the valid-bit and tag arrays are indexed for each access, which consumes more power. Moreover, the selection of the addressed word is implemented in two levels. First, one word from each data memory is selected the same way as in the SPM (N:1 MUX); this selects the whole cache line. Second, the block offset (part of the address) determines which word is selected from the cache line. This multi-level selection mechanism makes the cache slower than the scratchpad memory; in addition, the cache uses extra memory to store tags and valid bits.

Figure 2.3: Direct-Mapped Cache Architecture (From [1])

In the direct-mapped cache, which is the simplest cache architecture, a single comparator compares the requested tag with the stored tag to determine whether the access is a hit or a miss. Associative caches are more complicated, and thus slower, than direct-mapped caches. In fully-associative caches, as shown in Figure 2.4, the entire block address is used as the tag. Therefore, one comparator is required for each cache line. The increased number of comparators in the associative cache negatively impacts the maximum clocking frequency of the system. However, an associative cache has better overall system performance than a direct-mapped cache, as it reduces the conflict-miss rate. The set-associative cache sits between the direct-mapped cache and the fully-associative cache, balancing the trade-off between clocking frequency and conflict-miss rate. A study

by Banakar et al. [28] in the embedded domain compared the energy, area, and performance of both SPM and cache. The study indicated that the SPM performs better than the cache on almost all compared metrics.

Figure 2.4: Fully-Associative Cache Architecture (From [1])
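To make the indexing and tag comparison described above concrete, the following sketch shows how a physical address is split into tag, index, and block offset for a direct-mapped cache. The cache geometry (32 KiB, 64-byte lines) is an assumption chosen for the example, not a property of any platform discussed in this chapter.

/* Illustrative address decomposition for a direct-mapped cache; the geometry
 * below is an assumption for the example only. */
#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE   64u    /* bytes per cache line         */
#define NUM_LINES   512u   /* 32 KiB cache / 64-byte lines */
#define OFFSET_BITS 6u     /* log2(LINE_SIZE)              */
#define INDEX_BITS  9u     /* log2(NUM_LINES)              */

int main(void)
{
    uint32_t addr   = 0x80014A40u;
    uint32_t offset =  addr & (LINE_SIZE - 1u);                 /* word within the line     */
    uint32_t index  = (addr >> OFFSET_BITS) & (NUM_LINES - 1u); /* selects one cache line   */
    uint32_t tag    =  addr >> (OFFSET_BITS + INDEX_BITS);      /* compared with stored tag */

    /* The access is a hit when the line at 'index' is valid and its stored tag equals 'tag'. */
    printf("tag=0x%x index=%u offset=%u\n", (unsigned)tag, (unsigned)index, (unsigned)offset);
    return 0;
}

In a fully-associative cache the index field disappears and the whole block address acts as the tag, which is why one comparator per cache line is needed.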

Faster access, smaller area, lower power, and predictable access times make scratchpad memories attractive for the embedded and real-time domains. However, SPMs are more difficult to manage. In many cases the application programmers need to do low-level programming in order to utilize the SPM. Thus, the programmers need to have some knowledge about the hardware platform they are working on. In multitasking systems, this becomes very difficult because programmers need to manage the SPM and synchronize their work accordingly. Caches, on the other hand, are self-managed at the hardware level and do not impact the software model. In other words, caches are transparent to programmers. Consequently, applications can be ported to new platforms more easily.

In the embedded system domain, there have been proposals for specialized memory structures that fall in between a cache and a scratchpad, for example TickPAD Memory (TPM) [86]. The TPM has a dynamic hardware loading mechanism that is statically controlled. TickPAD stands for Tick Precise Allocation Device. Therefore, it is best suited for synchronous programming languages such as Esterel [37] and Pret-C [20], which sample inputs at discrete instants of time and provide outputs at instants known as tick times. Based on the Timed Concurrent Control Flow Graph (TCCFG) created for the synchronous program, a TickPAD allocation analysis is performed to obtain a configuration file that determines static allocation decisions for the TPM. At runtime, the TPM automatically loads the right context at the right time. The TPM generally aims to be more predictable than a cache and easier to manage than an SPM. However, the TPM relies on a specialized synchronous

programming language and is limited to a single-application scenario.
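The management burden discussed above can be illustrated with a minimal sketch of what explicit SPM usage looks like to an application programmer, in contrast to a transparent cache. All symbols below (the SPM base address and the dma_* driver functions) are hypothetical placeholders rather than a real platform API.

/* Minimal sketch of explicit, programmer-managed SPM usage; all addresses and
 * driver functions are hypothetical. */
#include <stdint.h>

#define SPM_BASE  ((int32_t *)0x20000000u)   /* assumed base address of the private SPM */

extern void dma_copy(void *dst, const void *src, uint32_t len); /* hypothetical DMA driver */
extern void dma_wait(void);                                     /* wait for DMA completion */

/* Input data resident in main memory. */
static int32_t samples_ddr[1024];

void process_samples(void)
{
    int32_t *samples_spm = SPM_BASE;

    /* 1. The programmer explicitly stages the working set into the SPM. */
    dma_copy(samples_spm, samples_ddr, sizeof(samples_ddr));
    dma_wait();

    /* 2. Computation runs entirely out of the SPM, with predictable access time. */
    for (int i = 0; i < 1024; i++)
        samples_spm[i] *= 2;

    /* 3. Results are explicitly written back to main memory. */
    dma_copy(samples_ddr, samples_spm, sizeof(samples_ddr));
    dma_wait();
}

With a cache, the same loop would simply operate on samples_ddr and rely on the hardware to stage the data, which is exactly the transparency that caches provide and SPMs lack.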

2.1.2 Cache Memory

Several well-developed cache analysis techniques have been proposed for single-core processors. These techniques analyze the interference due to intra-task and intra-core cache conflicts. The latter is known as cache-related preemption delay (CRPD). The CRPD analysis focuses on the cache reload overhead due to preemptions, while the intra-task analysis focuses on the cache conflicts within the same task assuming non-preemptive execution.

In existing multicore processors, the last-level cache is typically shared by multiple cores. This design has several merits, such as increasing cache utilization, reducing the complexity of cache coherency, and providing a fast communication medium between cores. However, it is extremely difficult to accurately determine the cache miss rate, because the cache content depends on the size, organization, and replacement strategy of the cache in addition to the order of accesses. Shared caches in multicore processors are similar to caches in single-core processors in that they all suffer inter/intra-task interference. In addition, when multiple cores share a cache, they can evict each other's cache lines, resulting in a problem known as inter-core interference.

Unfortunately, single-core cache timing analysis techniques are not applicable to multicore systems with shared caches. Inter-core interference is caused by tasks that can run in parallel, and analyzing it requires analyzing all tasks in the system. The analysis of non-shared caches is already considered a complex process, and extending it to shared caches is even harder. In fact, researchers in the WCET analysis community [155] seem to agree that "it will be extremely difficult, if not impossible, to develop analysis methods that can accurately capture the contention between multiple cores in a shared cache".

A timing analysis technique for concurrently running software on multicores with shared caches was proposed by Liang et al. [95], which extends the work in [177]. This analysis targets inter-core cache evictions. In this work, the lifetimes of all tasks that run concurrently on multiple cores are determined, and the anticipated conflicts in the LLC are then computed. The analysis accounts for the cache accesses and uses static analysis approaches to estimate the tasks' WCETs. Another work by Hardy et al. [70], based on Hardy's previous work [71], aimed to tighten the WCET estimate. In this work, the authors proposed a compile-time method to reduce shared cache interference for instructions among cores. This work supports WCET estimation over multiple cache levels in the presence of inter-core interference. In addition, this work statically identifies code blocks

that are used only once during execution, enabling these blocks to bypass the cache. Consequently, a tighter WCET can be computed.

Probabilistic approaches to cache analysis have also been introduced in the literature. Quinones et al. [139] explored the effect of using a random cache replacement policy on hard real-time systems. They showed that, by applying Probabilistic Timing Analysis (PTA), they were able to avoid the risk of the aforementioned unpredictable cache access times. PTA has emerged as a solution to reduce the amount of information needed to provide tight WCET estimates. Nevertheless, it imposes new requirements on hardware design. For instance, before [84], only fully-associative random-replacement caches had been proven to fulfill the needs of PTA, but they are expensive in size and energy. As a solution, Kosmidis et al. [84] proposed a hardware design that implements a random-replacement cache which allows set-associative and direct-mapped caches to be analyzed with PTA. There exist, however, other opinions in the real-time community regarding the use of PTA in real-time systems. For example, Reineke [140] showed that the hit probabilities computed for PTA are not independent. Consequently, the convolution of Execution Time Profiles (ETPs) is not valid with randomized caches, and their use in real-time systems is therefore not recommended.

In contrast to timing analysis techniques where caches are used without restrictions, the approach of managed caches has the advantage of avoiding complex analysis methods for estimating the cache behavior.

Cache Locking and Partitioning

Cache locking and partitioning are techniques to gain predictable access to the shared cache. By locking a cache line, no replacement/eviction can occur on the content of that line until it is unlocked. Cache locking requires hardware support and is only available in some COTS platforms. Different platforms support different styles of locking, for example lockdown by cache line or by way. In addition, in multicore systems, some platforms support lockdown by master (core), in which a cache way locked by one master cannot be altered by other masters. Among the platforms that support this feature are the Nvidia Tegra-2 and Tegra-3¹, the Xilinx Zynq-7000², and the Samsung Exynos 4412³. Similarly, cache partitioning provides exclusive access to a portion/partition of the

¹ http://www.nvidia.ca/object/tegra-superchip.html
² https://www.xilinx.com/products/silicon-devices/soc/zynq-7000.html
³ https://www.samsung.com/semiconductor/minisite/exynos/products/mobileprocessor/exynos-4-quad-4412/

cache. Partitioning can be core-based or task-based. Contrary to locking, cache partitioning can be done in hardware or in software. The most common software-based cache partitioning technique is page coloring [97, 164, 67]. Page coloring exploits the virtual-to-physical page address translation provided by virtual memory systems at the OS level. Cache partitioning can also be done by the compiler [117].

Suhendra et al. [155] explored the effect of locking and partitioning of the last-level shared cache on predictability in multicore systems. They used different locking and partitioning schemes; the study combined static/dynamic locking with task-based/core-based partitioning. The results showed clear impacts of the different configurations on predictability versus performance, and the authors concluded that no single configuration is suitable for all types of applications. The best cache configuration for predictability was static locking with task-based partitioning, as each task has its own cache partition that is not affected by preemption. However, a task gets a smaller partition as the number of tasks increases, which impacts performance. Shekhar et al. [147] is another work that utilizes cache locking to improve overall system utilization. This work targets a many-core architecture where each core has a lockable private cache and there is no shared L2 cache in the system. It proposed semi-partitioned scheduling, where as many tasks as possible are statically assigned to cores; tasks that are not statically assigned to a core are allowed to migrate from one core to another. Tasks on each core are locally scheduled according to Earliest Deadline First (EDF). Tasks are allowed to lock some data in the private local cache. For a migrating task, its locked lines are unlocked, migrated, and re-locked on the target core.

Kernel-level work toward improving performance with guaranteed predictable timing was done by Mancuso et al. [104]. This work targeted the shared last-level cache. Real-time tasks are first profiled in order to identify the most accessed memory locations (hot pages) of each task. The profiling information is then used, at runtime, in a cache coloring and locking mechanism that helps tighten the WCET of real-time tasks. In this work, the cache space is partitioned among all tasks. Another work, by Ward et al. [169], proposed the use of a page coloring mechanism along with cache scheduling instead of statically partitioning the cache. However, this work exhibited some overhead due to the dynamic locking approach. None of the studies reviewed above consider parallel tasks and the associated need for intra-task communication.
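As an aside on the page coloring mechanism mentioned above, the following sketch shows how an OS could derive the color of a physical page frame for a physically-indexed last-level cache. The cache geometry and page size are assumptions chosen for illustration; they are not taken from any of the works cited in this subsection.

/* Minimal sketch of page-color computation for OS-level page coloring.
 * Geometry values are assumptions for the example. */
#include <stdint.h>

#define PAGE_SHIFT 12u                 /* 4 KiB pages                    */
#define CACHE_SIZE (1u << 20)          /* assumed 1 MiB last-level cache */
#define LINE_SIZE  64u
#define WAYS       16u
#define NUM_SETS   (CACHE_SIZE / (LINE_SIZE * WAYS))
/* Number of colors = pages needed to cover every cache set once (16 here). */
#define NUM_COLORS ((NUM_SETS * LINE_SIZE) / (1u << PAGE_SHIFT))

static inline uint32_t page_color(uint64_t phys_addr)
{
    /* The color is given by the set-index bits that lie above the page offset.
     * The OS keeps one free list per color and hands each core (or task) only
     * frames of the colors assigned to its cache partition. */
    return (uint32_t)((phys_addr >> PAGE_SHIFT) & (NUM_COLORS - 1u));
}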

2.1.3 Scratchpad Memory

We split the discussion of SPMs into two categories: general-purpose embedded systems and real-time embedded systems.

General-Purpose Oriented Approach

As mentioned before, the most important issues in the domain of general-purpose embedded systems are power consumption and processing performance. Using scratchpad memories in embedded platforms can result in significant gains in power efficiency and processing performance. Different approaches were taken to employ scratchpad memories in embedded systems, but the common objective was to make the SPM more usable and more programmer friendly in order to encourage porting software to the new platform. The works in [63, 57, 127, 116] used a conventional MMU to manage the SPM at runtime. Managing the SPM at runtime allows different applications to utilize the SPM dynamically, resulting in better resource utilization.

Egger et al. [57] used a compile-time technique to manage the MMU and load task code into the SPM dynamically. In this work, the compiled binary has to be processed by the post-optimizer, a tool they developed to optimize the final binary for their memory architecture. The post-optimizer disassembles and profiles the code statically to determine basic block and function boundaries. It then generates a profiling binary image by injecting profiling code; the profiling image in turn runs in a simulator. By running the profiling image several times, a new set of profiling information can be extracted, such as which parts of the code consume more power and which parts are executed more frequently. Finally, the post-optimizer generates the SPM-optimized binary image by grouping the code sections into three main regions. The cached and uncached regions are mapped normally, using the MMU, to physical addresses. The pageable region is mapped using a runtime component called the ScratchPad Memory Manager (SPMM). When a process is first loaded, the SPMM disables all the mapping entries in the page table that correspond to the pageable region. When execution reaches code that is not mapped to a physical address, the MMU raises an exception; the runtime in turn forwards the exception to the SPMM, which loads the code into the SPM and adds the mapping entries to the page table. One advantage of this technique is that neither the size of the SPM nor the size of the application's code needs to be known at compile time, as loading is done at the granularity of a page. In this work, the CPU is used to copy the code from main memory to the SPM in response to the MMU exception, which can degrade performance. This work did

not investigate the case of a multitasking system, which is more realistic than a single-task system.

Francesco et al. [63] adopted a hardware-software co-design approach to dynamically manage the SPM at runtime with minimum overhead on the CPU. This study demonstrated the power efficiency of the SPM in dynamic applications, and even showed an improvement in performance. A DMA component was used to relieve the CPU of copying data to/from the SPM. Loading application code into the SPM was not considered, as the study focused only on data. The MMU helps map the addresses at runtime. This work exposes a high-level API to manage and allocate data in the SPM. However, it is still the programmer's responsibility to use the API in order to allocate space in the SPM and move the data back and forth using the DMA.

In [116], a simple technique at the OS level is used to allocate dynamic data (the application heap) to the SPM. A C++ framework enables the programmer to easily annotate the code. The annotation directs the OS to decide where to put the data: on the system heap (main memory) or on the application heap (SPM). The C++ new and delete operators were overloaded to handle the programmer's allocation preference. The OS handles the actual allocation depending on the available space; hence, there is no guarantee that data is allocated in the desired SPM heap. This work was evaluated on an FPGA platform and showed improvements in both power efficiency and performance. The technique only targets dynamically allocated data; to obtain more significant improvements, it is intended to be coupled with other compile-time techniques that allocate code and static data to the SPM.

In [162], the authors propose a scratchpad memory management technique for preemptive multi-tasking systems, introducing three methods for SPM partitioning: (i) spatial, meaning that each task has its exclusive space in the SPM; (ii) temporal, meaning that a running task can use the entire SPM and the content of the SPM is swapped using native RTOS support and a hardware context-switch module; and (iii) hybrid approaches, where a higher-priority task can temporarily use the space of a lower-priority task. By employing these three methodologies on a real-time operating system, the authors show that they were able to save 73% of the energy compared to the standard approach. The authors also conclude that the hybrid approaches outperform the other two. However, this work has not been applied to multicore processors. Moreover, the focus of [162] is not predictability but energy efficiency.

The work presented in [127] is also an interesting effort that automatically loads the system stack into the SPM at runtime to achieve better system performance and power consumption. This work exploits the locality characteristics generally observed in stack memory

accesses to achieve a performance improvement. The basic idea is to continuously map the stack pointer into the SPM using the MMU. Since the active part of the stack is always near the stack pointer, mapping the stack pointer into the SPM improves the performance of the active part of the stack, which contains the local variables of the current function. If the stack grows out of the SPM, the MMU raises an exception. The exception handler then replaces a segment of the SPM by saving that segment to main memory and loading the new addresses into the SPM; the page table entries are modified accordingly. This mechanism is somewhat similar to the one used in [57], but the work is still interesting as it does not need post-compile processing or specialized hardware beyond the MMU. It is also transparent to application programmers, and no API is needed. As in [57], there is no DMA component to load/unload the SPM, which results in reduced performance.
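To make the exception-driven schemes of [57] and [127] more concrete, the following sketch outlines what such an SPM fault handler could look like. Every function it calls is a hypothetical placeholder; the actual mechanisms in those works differ in detail (e.g., page-table layout and victim selection).

/* Hedged sketch of an MMU-exception-driven SPM loader, in the spirit of the
 * schemes described above. All helper functions are hypothetical. */
#include <stdint.h>

#define PAGE_SIZE 4096u

extern void *spm_pick_victim(void);                            /* choose an SPM frame to reuse */
extern void  mmu_unmap(void *spm_frame);                       /* invalidate the old mapping   */
extern void  writeback_frame(void *spm_frame);                 /* save dirty contents to DRAM  */
extern void  copy_page_to_spm(void *spm_frame, uintptr_t va);  /* fetch the faulting page      */
extern void  mmu_map(uintptr_t va, void *spm_frame);           /* map the page onto the SPM    */

void spm_fault_handler(uintptr_t fault_va)
{
    uintptr_t page_va = fault_va & ~(uintptr_t)(PAGE_SIZE - 1u);

    void *victim = spm_pick_victim();   /* free space in the SPM                    */
    mmu_unmap(victim);
    writeback_frame(victim);            /* preserve evicted data in main memory     */

    copy_page_to_spm(victim, page_va);  /* a CPU copy in [57, 127]; a DMA would
                                           avoid stalling the core during transfer  */
    mmu_map(page_va, victim);           /* subsequent accesses hit the SPM directly */
}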

Real-time Oriented Approach

In the real-time domain, the SPM was first utilized by statically allocating and partitioning it to reduce the WCET. Suhendra et al. [156] constructed an Integer Linear Programming (ILP) formulation to optimally allocate feasible paths to the SPM. Another study [163] also used ILP to statically partition the SPM and allocate tasks to it, with a focus on energy efficiency. The allocation algorithm used heuristics to avoid loading data from infeasible paths, thereby reducing the WCET.

Other works, such as [136], investigated the use of the SPM in embedded systems. [136] quantitatively compared memory-mapped SPM to a locked cache in terms of WCET. They used a compile-time algorithm to dynamically load tasks into on-chip memory as needed. Similar to the work by Pellizzoni et al. [128], they injected code at the boundaries of functions and basic blocks. Despite the inability, in some cases, to lock two basic blocks simultaneously in the cache due to conflicting addresses, the WCET results for the tested benchmarks were close to each other.

Other approaches attempted to manage the SPM dynamically at runtime to achieve predictable execution times. Kim et al. [83] targeted Software Managed Multicore (SMM) architectures, such as the IBM Cell multiprocessor [103]. SMM is particularly promising for real-time systems as it has advantageous characteristics such as scalability, power efficiency, and predictability. In SMM architectures, each core can only access its scratchpad memory (SPM); any access to main memory is done explicitly by DMA. Consequently, dynamic management of the local SPM for both code and data has to be done by software. [83] introduced two WCET-oriented dynamic SPM code management techniques for SMM architectures. One is optimal and based on ILP; the other is a faster heuristic-based

algorithm. This work focused on determining the optimal function-to-region mapping for the SPM. This was done by constructing a control flow graph of the task and then performing an interference analysis of every function whenever the DMA is needed to load new code into the SPM. In case of a mapping conflict, the needed functions must be reloaded by DMA.

Whitham et al. [174] proposed simplified runtime load/store operations using custom instructions coupled with custom hardware. The Scratchpad Memory Management Unit (SMMU) is a custom hardware component that maps/unmaps data objects and variables to the SPM. In addition, the SMMU performs DMA functionality by moving data from main memory to the SPM and vice versa. The SMMU makes memory accesses transparent by allowing address translation from the CPU virtual address to the physical address in main memory or in the SPM. The hardware structure of the SMMU is object-based and allows only up to N data objects to be mapped at the same time. As a result, the comparator array circuit of the SMMU can be a performance bottleneck for the system operating frequency, resulting in a scalability problem.

Whitham et al. [176, 175] is the first work to consider managing the on-chip scratchpad dynamically in a multitasking system. It introduced a new memory model called Carousel. Carousel acts as a stack of fixed-size blocks: the top few (n) blocks are stored in the SPM, and the remaining blocks are stored in main memory. A task is divided into several Carousel blocks depending on its size. Starting a task requires pushing all its blocks onto the top of the Carousel stack, meaning that the task is moved, using DMA, from main memory to the SPM. In order to free space in the SPM, Carousel also swaps out a similar number of blocks to main memory when they are no longer among the top n blocks of the stack. The process is reversed when a task ends: all of its blocks are popped from the top of the stack, moving the task back to main memory, and Carousel brings the previously swapped-out blocks back to the SPM in a stack-wise fashion. The downside of this work is that the CPU remains idle while the DMA is performing the transfer.

In addition, there has been a significant amount of work in the literature that proposes solutions to dynamically manage the SPM for one task [54, 101, 25, 157, 58] by reusing the available space over the task's execution. These works propose solutions for managing task code and data, including stack and heap. Since the SPM is not transparent with respect to address translation, management schemes have to impose constraints on analyzable code; in particular, memory aliases must be statically resolved, since otherwise the management scheme risks loading the same data into two different positions in the SPM. [54] proposed a compiler-level WCET-directed algorithm to dynamically allocate static data and stack data of a program to scratchpad memory. The granularity of placement of

memory transfers is at function and basic-block boundaries.

2.1.4 Other Task Isolation Techniques

In this section we discuss works that are not strictly categorized as cache- or scratchpad-oriented solutions. However, they propose engineered hardware or software solutions that provide better task isolation, thus improving predictability.

Hardware Solutions

Huangfu et al. [75] introduced the Performance Enhancement Guaranteed Cache (PEG-C), a hardware addition to a regular instruction cache in the form of a benefit counter for the hit and miss rates. This hardware design addresses the unpredictability of cache accesses while providing average performance comparable to a regular cache. The benefit counter keeps track of the number of hits and misses at runtime and provides access to the cache only when the value of the benefit counter is positive; otherwise, the access is served from memory.

Allard et al. [19] proposed a hardware component named the hardware context switch (HwCS). The HwCS replaces the standard L1 cache controller of a processor. It divides the cache into two interchangeable layers, similar to our approach. The CPU can execute from one layer, which acts as a regular cache, while the content of the other layer is saved or loaded simultaneously. The HwCS makes the preemption overhead small compared to the task WCET, as the cache content is saved after preempting a task and restored before resuming it. Since both layers can access main memory at the same time, memory bandwidth is divided between the two layers in the worst case.

The PRET machine [56, 96, 61, 135] is another line of solutions that targets real-time systems from a different angle. PRET stands for PREcision Timed machine. The philosophy of PRET is to provide a computing environment that is as timely and precise as the underlying synchronous digital system that implements the computer hardware. PRET demands big changes in the processor design, memory subsystem, Instruction Set Architecture (ISA), programming language, and RTOS. The main reason for these changes is to deliver or propagate the notion of timing from the circuit level to the software application level. Only then can the programmer express the temporal behavior as easily as the functional behavior of the application. As a first step, [96] proposed a multi-threaded single-core processor with an extended version of the SPARC ISA that delivers predictable timing. The processor pipeline is interleaved between six hardware threads. Each thread has

private instruction and data scratchpad memories. Access to the shared main memory is arbitrated by a mechanism called the memory wheel, which uses a round-robin scheduling policy to guarantee exclusive access of each thread to main memory within its time window. In addition, a deadline instruction has been added to the ISA to help synchronize threads precisely. [61] presented a plug-in for LabVIEW Embedded that maps the LabVIEW G graphical programming language and its timing specifications to PRET. [135] proposed a tool that statically allocates instructions from multiple threads to a shared SPM for the PRET architecture. Similarly, Multi-Core Execution of Hard Real-Time Applications Supporting Analysability (MERASA) is a project that develops hardware specifically for hard real-time systems [2, 4, 125]. Similar to PRET, MERASA [167] involves big modifications to the processor design, cache memory, bus, and other interconnects for single- and multi-core systems. The P-SOCRATES project [133] focuses on designing predictable many-core systems. Specifically, the purpose of P-SOCRATES is to develop an entirely new design framework, from the conceptual design of the system functionality to its physical implementation, to facilitate the deployment of standardized parallel architectures in all kinds of systems.

Software Solutions

Pellizzoni et al. [131] highlighted the impact of the multicore architecture on the WCET due to contention for access to main memory. This work shows a linear increase in the WCET with the number of cores. Pellizzoni et al. [128] also introduced the Predictable Execution Model (PREM). PREM provides isolation in a multi-tasking system by scheduling access to main memory. PREM is software-only and does not require hardware arbiters: it is based on compiler and OS techniques to divide a task into a memory phase and an execution phase; a task can run predictably from the cache with no access to main memory while it is in the execution phase. This allows other masters in the system, such as IO, to be scheduled to access main memory while a task is in the execution phase. The work has been extended to multicores in [178], adopting a TDMA arbitration scheme with a partitioned system. [26] studied different scheduling policies for PREM in multicore systems and compared them with the previous TDMA approach, EDF, and contention-based arbitration; the best schedule was based on a least-laxity-first policy. After that, a global schedulability study was introduced in [15], and a parallel task model for PREM was introduced in [16]. [178] and its derivatives partition the cache space among all tasks. Note that our proposed solution differs from PREM in two ways: 1) while PREM uses the CPU to prefetch the task into the local cache, effectively wasting CPU time, our solution pipelines CPU execution with the DMA transfer; and 2) PREM partitions the

cache space among all tasks, while in our solution, we limit the local SPM to two partitions only.

MemGuard [180] is another work which provides memory performance isolation while still maximizing memory bandwidth utilization, based on resource reservation and reclaiming techniques. MemGuard dynamically reserves and regulates per-core memory bandwidth or accesses based on hardware performance counters. If a core exceeds its predefined maximum access usage, an interrupt causes the core to jump back to the OS-level bandwidth regulator. As a result, the DRAM bandwidth is partitioned among cores, guaranteeing a minimum bandwidth for each core. MemGuard dynamically adjusts the resource provision based on its actual usage. For example, when a task is highly demanding on the resource, it can try to reclaim possible spare resources from other tasks; on the other hand, when it consumes less than the reserved amount, it can share the extra resource with others. Another way to achieve predictability is DRAM bank-aware allocation (PALLOC), proposed by H. Yun [179]. However, on some platforms (such as the NVIDIA TX1), controlling DRAM bank allocation is problematic due to address randomization aimed at improving average performance.

At a higher level, Mancuso et al. [106] proposed OS-level techniques for COTS multicore architectures that partition the system into isolated single-core virtual machines. Usually, modular per-core certification cannot be performed in COTS multicore systems due to shared resource interference. However, this work allows per-core schedulability results to be calculated in isolation and to hold when multiple cores run in parallel. Thus, existing software and schedulability analyses developed for single-core systems can be used as is in a multicore environment by utilizing the proposed single-core-equivalent virtual machines.
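The bandwidth regulation mechanism described above for MemGuard can be sketched as follows. The counter and core-control functions are hypothetical placeholders; MemGuard's actual implementation is more involved (e.g., it also includes the reclaiming and sharing of spare budget noted above).

/* Minimal sketch of per-core memory-bandwidth regulation in the spirit of
 * MemGuard [180]; all hardware-access functions are hypothetical. */
#include <stdint.h>

#define NUM_CORES 4u

extern void perfctr_arm_overflow(unsigned core, uint32_t budget); /* IRQ after 'budget' accesses */
extern void core_throttle(unsigned core);                         /* stall the core (e.g., idle) */
extern void core_release(unsigned core);                          /* resume normal execution     */

static const uint32_t budget_per_period[NUM_CORES] = { 1000, 1000, 500, 500 };

/* Overflow interrupt handler: the core exhausted its budget for this period. */
void on_budget_exhausted(unsigned core)
{
    core_throttle(core);                 /* core waits until the next regulation period */
}

/* Periodic timer handler: replenish every core's budget and resume it. */
void on_regulation_period(void)
{
    for (unsigned c = 0; c < NUM_CORES; c++) {
        perfctr_arm_overflow(c, budget_per_period[c]);
        core_release(c);
    }
}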

2.2 Real-Time Task Communication

As mentioned in Chapter 1 and detailed in Chapter 6, we consider two types of task communication: inter-task and intra-task communication. For inter-task communication, we consider an asynchronous communication model between different tasks. In this model, communication is simplified, as there is no precedence constraint between tasks as in synchronous communication. Tasks share data via main memory, and a task reads the last updated data from its senders. Assuming our proposed SPM-centric system architecture, this type of communication does not involve inter-core communication, as communicating tasks do not exchange data while running in parallel. Therefore, most work on synchronous real-time communication is not directly related to this thesis. Note that, in cache-

based systems, this model might invoke inter-core coherence activity; e.g., when tasks run on different cores, there might be invalidation/copying of data modified by the sender. On the other hand, intra-task communication targets parallel tasks. This involves inter-core communication, as parallel threads need to exchange data while running in parallel.

In the following subsections we overview related topics. First, in Section 2.2.1, we discuss parallel tasks in the real-time domain and identify the shortcomings in modeling communicating threads. As discussed earlier in Section 2.1, we assume a private SPM for each core. Accessing shared addresses in private SPMs breaks memory consistency and requires inter-SPM coherence. In Section 2.2.2, we review memory consistency models and identify predictable real-time inter-core coherence protocols. Lastly, Section 2.2.3 reviews real-time NoCs.

2.2.1 Real-Time Scheduling of Parallel Tasks

Real-time scheduling for multicores usually assumes a set of independent tasks. Lately, there has been an increasing trend toward scheduling multi-threaded applications (parallel tasks). This growing interest is motivated by modern real-time applications that require more performance than can be easily achieved without multi-threaded applications in which the threads can run in parallel. Some examples include unmanned aerial vehicles and self-driving cars that require processing data from different sensors, including video cameras. Today, even many hobby toys and models, such as quad-copters, need sophisticated real-time computation.

Researchers have devised schedulability analyses for a variety of different system models. To ease classification, we distinguish between two orthogonal concerns: the scheduling model and the task model. In the thread scheduling model, parallel threads of the same task are scheduled independently, whereas in gang scheduling, the threads must execute concurrently. Parallel tasks can have multiple threads, possibly exceeding the number of processors available in the system. It is common to model parallel tasks using a fork-join structure or a directed acyclic graph (DAG). In the fork-join structure, an application starts with a single thread and then forks multiple threads. After that, the task can keep alternating between a single sequential thread and multiple parallel threads. The application also usually ends with a single thread. In the case of a DAG, threads can be modeled as nodes in the graph, and the dependencies are modeled as directed edges. Edges represent precedence constraints: a predecessor node must be executed before a successor node.
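A minimal representation of the DAG task model just described is sketched below; the field names are assumptions for illustration and do not correspond to any specific work cited in this section.

/* Illustrative DAG task representation: nodes are subtasks/threads, edges are
 * precedence constraints. Names and units are assumptions. */
#include <stddef.h>

typedef struct dag_node {
    unsigned           wcet;        /* worst-case execution time of the subtask          */
    size_t             num_succ;    /* number of successor nodes                         */
    struct dag_node  **successors;  /* nodes that may only start after this one finishes */
} dag_node_t;

typedef struct {
    dag_node_t *nodes;      /* all subtasks of the parallel task */
    size_t      num_nodes;
    unsigned    period;     /* minimum inter-arrival time        */
    unsigned    deadline;   /* relative deadline                 */
} dag_task_t;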

Most related work considers thread scheduling. Works considering the fork-join task model include [88, 92, 141, 48, 120]. The DAG task model is analyzed in [32, 92, 93]. [109, 31] extended the work on the classical DAG model by introducing the conditional DAG model, where subtask execution depends on the conditional path of execution through the program. As one of the first works on parallel real-time task scheduling, Lakshmanan et al. [88] proposed a scheduling algorithm for the OpenMP fork-join structure. Nonetheless, the work in [88] is restrictive, as a task has to fork into the same number of threads each time. Consequently, Saifullah et al. [141] relaxed the previous model in [88].

Some works have proposed scheduling schemes based on assigning intermediate deadlines to individual threads of the same parallel task. Nelissen et al. [120] proposed techniques to determine the intermediate artificial deadlines while minimizing the number of processors needed to schedule the whole task set. In addition, threads are treated as if they were independent sequential sporadic tasks. Unlike the previously reviewed works, Baruah et al. [32] proposed a schedulability analysis for a single parallel task expressed as a DAG, whereas Li et al. [92] considered the same model but for multiple tasks. Nonetheless, all the reviewed works are only concerned with pure CPU scheduling, and none of them considered contention on shared resources, such as the shared cache and main memory. [93] introduced federated scheduling of the classical DAG parallel-task model. It generalizes the partitioned scheduling of sequential tasks by allocating a dedicated set of cores to each task with utilization higher than one. A fundamental assumption under these models is that the parallel threads of a task can be scheduled independently with no synchronization penalty.

On the other hand, Alhammad et al. [16] is the first work on real-time scheduling for parallel tasks that considered the interference between threads caused by contended accesses to main memory. The technique used in this work to provide thread isolation is an improvement over [15] that avoids contention without the need for hardware arbitration. This work differs from [15] as it considers multi-threaded applications instead of independent tasks. In addition, this work used a profiling scheme similar to [104] in order to divide the application into segments. Each thread is segmented into three consecutive phases (prefetch, execution, write-back), in which synchronization happens at the boundaries of the memory phases (prefetch and write-back). Threads share data through main memory, as the write-back phase flushes the cache content to main memory before the next thread starts the prefetch phase. It was also observed that this technique scales better with the number of cores than the contention-based scheme. In addition, Alhammad et al. [17] extended the work on federated scheduling in [93] by considering the cost of accessing memory. Basically, the authors introduce optimization

algorithms to assign computation and memory budgets to each parallel task in the system to improve overall system schedulability. Tessler et al. [165] introduced an application-level scheduling strategy that takes advantage of cache memory to tighten the WCET of parallel tasks. In particular, the proposed method permits threads to execute across conflict-free regions, and blocks those threads that would create an unnecessary cache conflict. The WCET bound is determined for the entire set of m threads, rather than treating each thread as a distinct task. The proposed method relies on the calculation of conflict-free regions, which are found by a static analysis of the task object.

The need to concurrently gang schedule related threads of the same parallel application was first discussed in [123]. The performance benefits of gang scheduling have been studied in [60, 77, 150] and many other works that looked into co-scheduling in general-purpose computing. In the real-time domain, the existing literature that considers gang scheduling distinguishes among three task models. In the rigid model [81, 55, 64], the (constant) number of threads required by a task is fixed off-line. In the moldable case [38], the number of threads assigned to each job is decided at run-time by the scheduler, but kept constant during the execution of the job. In the malleable case [47], the scheduler can change the number of assigned threads during the job's execution.

Our proposed bundled scheduling for parallel tasks, as discussed in Chapter 7, considers systems where the number of parallel threads is dictated by the application's execution, so that global scheduling decisions do not influence the way the application is executed. Therefore, we only compare our approach against the rigid model. [81] introduced a schedulability analysis for rigid tasks under an EDF scheduling policy, whereas [55] provided a utilization-based schedulability test for EDF. In contrast, our work targets fixed priorities. Furthermore, as noted in [55], [81] contains a mistake in the way carry-in interference is bounded, while [55] itself is limited to implicit deadlines, rather than the constrained deadlines considered in our approach. Finally, [64] proposed an optimal, off-line slot-based scheduling algorithm for strictly periodic rigid tasks, but the framework does not naturally extend to sporadic tasks.

2.2.2 Memory Model

Parallel programming in multicore systems is challenging. Even in a shared-memory system, programmers can still get unexpected results due to misunderstanding the underlying memory consistency model adopted by the platform. A memory model is the set of

specifications that describe how the different components of a system should behave with regard to memory operations. The choice of memory model is affected by many components in the system, such as CPUs, caches, buses or interconnects, memory controllers, and compilers. For example, it matters whether the CPU issues memory operations in order or out of order, whether the CPU has a write buffer, whether the cache is write-through or write-back, and whether the compiler optimizes and reorders memory operations. By understanding the adopted memory model, programmers can code their programs accordingly to produce functionally correct applications. The memory model acts as an agreement between the computer platform and the programs or programmers, to which both hardware and software have to adhere, thus providing correct functionality. The memory model is very important and must be considered when porting applications from one platform to another.

Coming from a uniprocessor system, in which all threads or tasks have a consistent memory view, Sequential Consistency (SC) is a natural extension of the uniprocessor notion of correctness and the most commonly assumed notion of correctness for multiprocessors [14, 153]. Under SC, all memory operations are serialized and viewed by the memory system in the original program order. When local caches are present, SC requires that write operations to the same memory address be atomic, e.g., no other write to the same address is permitted until the current write is completed and propagated to all nodes. This model is considered the natural successor to the uniprocessor memory model, as programmers can confidently assume that any read will always return the most up-to-date value. The SC model makes the migration of applications that share data from a uniprocessor system to a multicore system easy [14].

Although SC is good from a programming point of view, it imposes some performance challenges, such as preventing out-of-order memory operations and forcing a core to wait for a previous write operation to complete [14]. Furthermore, some performance optimizations that were valid in a uniprocessor system, such as utilizing the processor's write buffer, can now lead to problems. These restrictions are important because compilers and out-of-order CPUs generally guarantee the order of dependent memory operations only, assuming that independent memory operations will not affect the same thread or task. In multicores, however, this might affect threads or tasks running simultaneously on other cores. In addition, IO devices can be affected by the order of independent memory operations, because the order in which some device registers are set can be important.

32 tions to enforce order between memory operations, such as memory fences. Programmers and compilers can utilize these instructions to achieve the desired functionality. Although relaxed models offer better performance, they introduce more complex programming model as programs need to use low-level instructions for synchronization. Another set of models are the Weak Ordering models (WO) [153]. In a weak ordering model, memory operations are classified as synchronization operations and data opera- tions. The WO models are based on the intuition that reordering memory operations to data regions between synchronization operations does not typically affect the correctness of a program. Therefore, it offers more space for performance optimizations with less complex programming model than the relaxed memory models. To enforce program order between two operations, the programmer is required to identify at least one of the operations as a synchronization operation. In a WO model, all types of re-ordering are allowed between data memory operations. However, all memory operations issued before the synchroniza- tion memory operation have to complete before issuing any following memory operation. Release Consistency model (RC) follows the philosophy of the WO. However, RC pro- vides a clear and higher-level programming model. In RC synchronization operations are divided into acquire and release operations; synchronized memory operations are enveloped between acquire and release. Acquire and release are analogues to lock and unlock respec- tively. Acquire operations tries to lock (has exclusive access) to a flag that is used to sy nchronize accesses between parallel threads to what is called a Critical Section (CS). A CS is a piece of code that accesses memory addresses subjected to synchronization. On the other hand, release operation unlocks (allow access) the locked flag to allow other threads to lock it again and gain access to the critical section. All memory operations between acquire and release are not restricted to be in the program order. However, RC provides correctness similar to SC as it guarantees atomicity and sequential ordering between criti- cal sections. For example, depending on the implementation, a release operation might be delayed until all memory operations enveloped between the acquire and release are com- pleted and delivered to all relevant nodes. The implementation of this model can be done in different ways. For example, compilers and programming languages can implement the semantics of the model utilizing the low-level memory fences, and read-and-modify instruc- tions. In addition, operating systems can provide a synchronization library that implement acquire and release semantics. In this work, RC is chosen to allow more room for performance optimization while keeping the programming complexity minimal. This become relevant when we discuss the intra-task communication of parallel tasks in Chapter8. In particular, our methodology is to keep the hardware design simple, which what the RC model allows.

Cache Coherency

As mentioned earlier, different components in the system, such as the cache controller and the interconnect, affect the consistency model. When private caches are not coherent, violations of memory consistency can happen. In a shared memory system, a coherence problem arises if multiple cores have access to multiple copies of the same data, such as in private caches. Therefore, memory coherence is required in shared memory systems to keep only one version of the data that is equally seen by all cores [115, 148]. A hardware platform implements a coherence protocol to adhere to a specific memory model. Based on the number of shared or distributed memories in the system, such as private/shared caches and main memory, and how they are connected, the implementation complexity of the protocol can vary. Many cache coherence protocols have been introduced in the literature; examples are MSI, MESI, MOSI, MOESI, MERSI, MESIF, write-once, Synapse, Berkeley, Firefly, Dragon, and ARM AMBA 4 ACE [21, 85].

Generally, there are two broad classes of coherence protocols: snooping and directory-based. Snooping protocols tend to be faster, if a common bus with enough bandwidth is available, since every request is broadcast to all nodes in the system. However, snooping protocols are not scalable, as the bus will not be able to provide enough bandwidth as the number of cores increases. Directory-based protocols, on the other hand, are scalable and can be distributed, but are slower than snooping, as the directory transactions have to traverse the interconnect. In addition, a directory-based protocol requires dedicated memory to store the directory.

Independently of snooping versus directory-based protocols, the other major design decision in a coherence protocol is what to do when a core writes to a block. There are two options: invalidate and update. In an invalidate protocol, a core invalidates the copies in all other caches before writing to the block. If another core wishes to read the block after its copy has been invalidated, it has to initiate a new coherence transaction to obtain the block, and it will obtain a copy from the core that wrote it, thus preserving coherence. In an update protocol, on the other hand, when a core wishes to write a block, it updates (pushes) the copies in all other caches to reflect the new value it wrote to the block. Update protocols reduce the latency for a core to read a newly written block. However, update protocols typically consume substantially more bandwidth than invalidate protocols, because update messages are larger than invalidate messages. Most currently implemented protocols are invalidation-based. Although there has been a lot of work on improving cache coherence in general-purpose architectures, real-time systems are still behind in analyzing and introducing predictable cache coherence protocols.
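For reference, the stable-state transitions of the basic MSI invalidation protocol mentioned above can be sketched as follows. This is a deliberately simplified view: real protocols also require transient states and careful arbitration, which is precisely where the predictability issues discussed below (e.g., in [72]) arise.

/* Highly simplified MSI state transitions for a single cache line (stable
 * states only; transient states and bus arbitration are omitted). */
typedef enum { INVALID, SHARED, MODIFIED } msi_state_t;

/* Transition of the local copy on a processor read or write. */
msi_state_t on_cpu_access(msi_state_t s, int is_write)
{
    if (is_write)
        return MODIFIED;                    /* obtain ownership; other copies are invalidated */
    return (s == INVALID) ? SHARED : s;     /* a read miss fetches a shared copy              */
}

/* Transition of the local copy when a remote core's request is observed. */
msi_state_t on_remote_access(msi_state_t s, int remote_is_write)
{
    if (remote_is_write)
        return INVALID;                     /* invalidate: the writer becomes the sole owner  */
    if (s == MODIFIED)
        return SHARED;                      /* supply the dirty data and downgrade            */
    return s;
}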

In the real-time community, Sarkar et al. [143] introduced a push mechanism that pushes part or all of the content of a cache to another cache. This work is presented in the context of task migration, in which a migrated task does not suffer a cache warm-up delay. Similarly, Pyka et al. [138, 137] introduced the on-demand coherent cache (ODC2). Unlike [143], this work aims specifically at a predictable coherence mechanism for real-time systems. The ODC2 has two modes: share mode and private mode. In private mode it works as a regular non-coherent cache, whereas in share mode it provides coherency. The share mode is only activated in critical sections. When exiting a critical section, all the cache lines accessed during the critical section are invalidated and flushed back to main memory. Although the ODC2 provides a predictable coherency mechanism, it still uses main memory for data sharing, which can reduce performance.

Nowotsch et al. [121] indicate that in COTS systems, performance can be degraded just by enabling cache coherence, even with no data sharing. In addition, [166] studied MESI-based coherence protocols for real-time suitability. The study concluded that, to provide statically predictable timing behavior of a MESI-based cache coherence protocol, a few conditions have to be met: a time-predictable TDMA bus interconnect with an analyzable, update-based, dual-ported, direct-mapped cache using bus snarfing. Hassan et al. [72] recently proposed PMSI, a predictable cache coherence protocol for multicore systems. The authors extend the classic MSI protocol and enhance it with transient coherence states to bound the worst-case access latency. This work also reports possible sources of unpredictable behavior in conventional coherence protocols. The results show that, in the worst case, the arbitration latency scales linearly, whereas the coherence latency scales quadratically with the number of cores. Although quadratic coherence latency is observed for worst-case scenarios, for general-purpose computing the cost is more reasonable in average-case scenarios [107].

One-Way Shared Memory [144] proposed direct communication between cores' local memories via a time-predictable NoC without relying on main memory. This avoids unpredictable access times and the contention on main memory, which is a communication bottleneck in cache-based systems. In [144], the distributed local memories are connected in a predictable and coherent way so that they appear as one shared memory. Specifically, each local memory is split into TX and RX communication channels such that each core has (m-1) TX and (m-1) RX channels. The Network Interface (NI) continuously synchronizes the local memory with the rest of the network. This NoC adopts a static TDM schedule to arbitrate between injectors and guarantees conflict-free communication. The all-to-all communication mechanism implemented in the NoC guarantees that all communication data is fully propagated to all other destinations within a bounded

35 time (hyper period).

2.2.3 Predictable Network On-Chip Architectures

With the increase in the number of cores in a single chip, the need for a scalable on-chip interconnect has increased as well. Network on-Chips (NoCs) form the communication sub- system for the future chips. It is different from a system bus due to the fact that it applies networking principles and improves scalability as well as power efficiency. As mentioned earlier, interconnect can affect or violate the memory consistency model. Therefore, when designing an interconnect, the adopted memory model has to considered. In particular, the order and the atomicity of operations have to adhere to the specifications of the adopted memory model. As discussed in Chapter1, we propose a lean NoC for inter-core communication (Ho- pliteRT), which targets FPGA platforms. HopliteRT routes single-flit packets over a switched communication network using Dimension Ordered Routing (DOR). DOR pol- icy makes packets traverse in the X-ring (horizontal) first followed by the Y-ring (vertical). Hoplite uses bufferless deflection routing and a unidirectional torus topology to save on hardware implementation cost. We use traffic regulators at the network interfaces to con- trol injection rates when the flows are not feasible. In relation to this work, we overview the real-time NoCs in the literature. Existing literature on real-time NoCs can be summarized in the following broad direc- tions. Some research has built static routing tables for time-division NoCs such as those proposed in [65, 78]. However, this approach requires full knowledge of all communication flows, and is unsuitable for NoCs that needs to support both real-time flows requiring worst case guarantees and best effort flows where average case delay is important. Specifically, Goossens et al. [65] introduced Aethereal NoC. Aethereal is without con- tention by design. The basic idea is that each switch or router has a static table called slot table, which maps each input to an output in a specific time slot. This synchronous behavior makes the network easy to implement, as switches or routers do not have buffers and there is no need for complex dynamic routing. Nonetheless, computing the static schedule might take some time. Therefore, re-configuring the network at runtime might be an issue. Generally, NoCs have a constant performance to cost ratio, i.e. the costs linearly depends on bandwidth and latency. [112] has different assumptions about the communication pattern, ranging from one- to-one restriction to none. It uses Time-Division Multiplexed (TDM) schedule to derive worst-case latency bound and splits traffic into multiple non-interfering flows. However, the

36 bound limits the maximum rates of the flows depending on the number of non-interfering sets. The TDM schedule may exaggerate the required latency depending on the extent of interference. For HopliteRT, we only need to know the communication pattern and we only limit injection rates when flows are not feasible. In addition, TDMA schedule is more restrictive than HopliteRT as it forces injections at specific time slots. Other work focuses on wormhole NoCs with virtual channels. The seminal work in [149] proposes priority-based networks, where each virtual channel corresponds to a different real-time priority, thus providing reduced latency for high-priority flows at the cost of low-priority ones. Recent work has extended the analysis to NoCs using credit-based flow control [80], as well as to round-robin, rather than priority-based arbitration [124]. How- ever, these designs are expensive on FPGA, and require full knowledge of communication flows to derive tight latency bounds. In contrast, our approach rely on static modification of routing function as well as client regulators to bound latencies. Approaches such as Oldest-First [114] and Golden Flit [59] provide livelock freedom on deflection-routed networks similar to Hoplite, but are optimized for ASICs and use a richer mesh topology. On the other hand, minimally buffered deflection NoC [111] is suitable for FPGA and provides in-order delivery of flits eliminating the need for reassembly buffers. However, these reviewed approaches do not provide exact bounds on worst-case times. The use of regulators to bound the maximum network latency is well-known in the context of network calculus [91]. In [35], the authors show how to use Token Bucket regulators, to control traffic injection on the Kalray MPPA. Unlike HopliteRT, the Kalray NoC is source routed (client computes the complete path taken by the packet), and requires queuing at the client interface and within the NoC.

37 Part I

Efficient Task Isolation For Real-time Applications

38 Chapter 3

Partitioned Scratchpad-Centric Scheduling of 3-Phase Real-time Tasks

In this chapter, we present our first main contribution: a set of scheduling techniques, and related schedulability analyses, for real-time tasks executed according to the 3-phase model. In particular, in this chapter we discuss the case of partitioned scheduling, where a set of tasks is statically allocated to each core off-line; Chapter5 will later cover global scheduling of 3-phase tasks. We begin by discussing the system model in Section 3.1. While we use a consistent set of rules for scheduling tasks on each core, we consider two different ways in which DMA accesses to main memory can be scheduled among contending cores: at either the hardware or software level. For the former case, each DMA transfer can have a different duration, while in the latter, we are considering a TDMA arbitration based on slots of equal duration. Hence, we first detail the schedule rules and associated schedulability analysis for the case of variable-size DMA operations in Section 3.2, and we then extend the analysis to the case of fixed-size DMA operations in Section 3.3. Lastly, Section 3.4 extends the discussion to fault-tolerant scheduling to recover from detectable memory soft errors that cannot be corrected automatically in hardware by the Error-Correcting Code (ECC) unit. As mentioned in Section 1.3, our proposed 3-phase scheduling scheme has been im- plemented on two different embedded platforms, which we detail in Chapter4. Since the evaluation of the proposed schedulability analyses is necessarily dependant on tasks’

39 parameters, which are affected by the execution platform, we defer it to Chapter4.

3.1 System Model

Our main objective is to schedule the shared resources (CPU time, DMA time and lo- cal memory space) between different tasks in a predictable manner while hiding latency caused by accessing the main memory. We developed novel scheduling algorithms, based on fixed-priority scheduling, that are able to dynamically manage the system resources and schedule the tasks to run predictably and achieve better system schedulability. The proposed scheduling approach is based on the following assumptions.

Hardware Assumptions

We consider an identical multi-processor system composed of m cores. As shown in Fig- ure 1.2-A, each core has a private scratchpad memory (SPM), and has access to a DMA engine. We further assume that the SPM is dual-ported, so that accesses to different memory addresses can proceed concurrently. Consequently, we assume that the DMA and the processor core can access different portions of the local memory concurrently without causing mutual interference.

Execution Model

To achieve strong timing isolation between running tasks on different cores, tasks are only executed from the local memories. We consider a 3-phase task model where all code and data needed by a task are first loaded into the local memory. After that, the task executes from the local memory without accessing main memory, thus avoiding any contention. Lastly, after the task finishes, all modified data are written back to main memory. The local memory is further partitioned into two partitions. Unlike other schemes [104, 178], our partitioning approach is independent of the number of tasks; thus it is more realistic. Since the two partitions can be accessed in parallel, the CPU can execute one task out of one partition and the DMA can unload and load the other partition at the same time. This technique allows us to hide the latency of accessing main memory by overlapping the execution of one task from one partition of the local memory while fetching the next task from main memory to the other partition of the local memory. Figure 3.1 depicts a working example of how the shared resources are scheduled among three jobs of three different tasks. This example consists of three sporadic tasks that are all

40 : DMA Unload Partition #1 Partitions : DMA Load Colour Code Partition #2 : CPU Execution

τ1 y t i r o i r P τ s 2 ' k s a T

τ3 0 4 8 12 Time

Figure 3.1: Example schedule showing how tasks are executed to hide memory access latency. released at the same time. Since it is a static fixed-priority schedule, the highest priority task, τ1, get scheduled first. At the beginning, the scheduler chooses which local memory partition to load τ1 into, it is the first partition in this case. After that, the DMA is instructed to offload the data section1 of any previously loaded task from this partition back to the main memory. DMA is then instructed to load code and data of τ1 into this partition. After that, τ1 is able to run. While the CPU is executing τ1 out of partition P1, the DMA operations are scheduled in parallel, and offload partition P2 and reload it with τ2. At the same manner, loading τ3 is overlapped with the execution of τ2. τ3 has to run out of partition P1. Therefore, τ1 is evicted first, by writing its modified data back to main memory; then τ3 is loaded into this partition. The next task that comes after τ3 is scheduled to run out of partition P2 and so on. Note that it is possible for a DMA operation to take longer than a task execution, not shown in this example. For instance, τ2 can not start executing until both τ1 execution and DMA operations on partition P2 are done, in this case DMA operations and the CPU execution take the same time. In a fairly loaded system, this approach can completely hide the memory latency as long as the memory subsystem is fast enough to keep DMA operations shorter than tasks executions.

1We do not need to offload the code of the task since it cannot be modified.

41 In the worst case, a portion of this latency is hidden by the overlapping mechanism, which leads to a better schedulability. We consider two scheduling schemes for DMA operations. In the first scheme, we as- sume the presence of an hardware arbiter, that can arbitrate fairly among DMA operations of different cores (for example, using the memory controller in [96]), by ensuring that each core is guaranteed to receive 1/m of the memory bandwidth. In this case, we consider DMA operations of variable lengths (time), where the amount of time required to load and unload a task is based on its size, after considering that the DMA engine receives 1/m of the memory bandwidth. In the second scheme, we do not make any assumption on the hardware arbiter; hence, to ensure a predictable behavior, we instead schedule the DMA engine in software. In this case, mainly for simplicity of implementation, we consider a Time Division Multiple Access (TDMA) scheme where each core receives one time slot of fixed size σ. We further assume that σ is enough time to load or unload any task in the system, in this case assuming that the DMA engine receives the full memory bandwidth. We discuss the first scheme in Section 3.2, while Sections 3.3 and 3.4 are based on the second scheme. In either case, the memory activity of each core becomes isolated from the activity of each other core, so that our schedulability analysis can consider each core in isolation. Note that since the CPU must execute without accessing main memory, our model implies that the code and data of each task must fit entirely in local memory partition. Although this assumption may appear restrictive, we make the following considerations. First, as will be shown in the evaluation in Chapter4, some modern COTS scratchpad- based micro-controllers provide scratchpad memories that have a size in the same order of magnitude as the main memory. Second, hard real-time control tasks are typically compact in terms of memory size. Third, if a task violates this size constraint, known methodologies exist [128, 145, 94] to split a large application into smaller segments that are individually compliant with the imposed constraint. Each segment executes sequentially, and we assume that the code and data of the segment, rather than the entire task, can be fully allocated in local memory partition. The same scheduling scheme can then be employed; after a segment of a task finishes executing, we can offload the data of the segment and load the code and data of the next segment of that task while the CPU executes a different task. We show how to extend the schedulability analysis for variable-size DMA operations to the multi-segment case in Section 3.2.2.

42 Task Model

We statically partition the set of tasks among the available m identical cores. On each core, we focus on fixed-priority scheduling of sporadic tasks. Specifically, the system comprises a set Γ of N sporadic tasks, {τ1, ...., τN }, each with different priority whereby τ1 has the highest priority and τN has the lowest priority. Each task τi is further characterized by a period Ti and a relative deadline Di, with Di ≤ Ti (constrained deadline). τi generates a (potentially) infinite sequence of jobs, with arrival times of successive jobs separated by at least Ti time units.

We use Ci, loadi, unloadi for the CPU execution time, DMA load time and DMA unload time for a task, respectively. Note that loadi, and unloadi denote the variable-length DMA load time and unload time of a task τi respectively, when considering the hardware-based DMA scheduling scheme where each core receives 1/m of the memory bandwidth for DMA operations. In the case of software-based TDMA scheduling of DMA, any task can be loaded or unloaded within the fixed-size time slot σ dedicated for a DMA operation. In the case of a multi-segment task covered in Section 3.2.2, we use Ki to denote the number 1 2 K of segments in a task τi, where {τi , τi , ..., τi } represents the set of segments in a task τi. Table 3.1 summarizes the notation used for task’s parameters. The goal of the (sufficient) schedulability analysis will be to compute the worst-case response time Ri for each task τi; a task set is deemed scheduable if Ri ≤ Di for all tasks in the task set. Given the 3-phase execution model, it is however important to precisely define when a job is considered completed in order to determine its response time. We consider two possible cases: (1) in the first case, the job is considered completed when it finishes its execution on the core. This model is suitable for systems where any output operation (for example, writing to an output peripheral) that the task needs to carry out is performed during the execution of the task itself. Hence, as long as its core execution completes by the deadline, we consider the job feasible. (2) In the second case, the job is considered completed when its DMA unload operation finishes. This model is suitable for systems where the data written back to main memory in the unload operation contains output data, or in the case of inter-task communication, where some unloaded data needs to be passed to other tasks; in particular, the COTS platform implementation in Section 4.2 follows this approach. In this case, the unload operation of each job must complete by its deadline for the job to be feasible. In summary, considering the two different models for scheduling of DMA operations, and the two response time models, we have four different cases. Since covering each case individually would lead to much redundancy, for simplicity we first cover the case of variable-length DMA operations with deadline based on the job execution in Section 3.2,

43 and we then modify the analysis for the case of fixed-length DMA operations with deadline based on the unload operation in Section 3.3.

Table 3.1: Task’s Parameters

Term Definition τi a task in the system Ti task’s minimum inter-arrival time (MIT) Ci task’s CPU execution time including all overheads loadi task’s DMA load time for its code and data sections unloadi task’s DMA unload time for its data section σ TDMA slot size for the fixed-size DMA operation case Ki The number of segments a task comprises K th τi The K segment in a multi-segment task Ri task’s response time

3.2 Case (I): Scheduling 3-Phase Tasks with Variable- size DMA Operations

We now detail a formal set of rules for our scheduling algorithm. As discussed above, our goal is to avoid contention for access to memory resources. Since the CPU does not access main memory while executing a task, the goal can be met as long as we ensure that CPU and DMA always access different local memory partitions. From a scheduling perspective, we achieve this by dividing the time line into a set of time intervals. During each interval, the CPU executes a task out of one partition, while the DMA first unloads and then loads the other partition; partitions are then swapped between CPU and DMA in the next interval. It remains to discuss how to order the loading and execution of tasks in successive intervals. In general, we would like to follow task priorities. However, as we intuitively show next, allowing preemption of either DMA operations or task execution can lead to increases in CPU stall time and schedule complexity. Therefore, our scheduling algorithm executes non-preemptively. Notice that in the example in Figure 3.1-B there was no preemption needed since the release time of all the jobs was the same. To depict the effect of CPU preemption on the ability of hiding memory latency, Figure 3.2-A shows another example of three jobs of three different tasks arriving at different time. As shown in the figure we allow CPU preemption.

44 h epnetm fteshdl snw1 ieuisisedo 0tm units. time of 10 portion of instead units example. previous by time the 11 in now than is greater schedule now the is schedule of this time of response time The response the result, a As M spemtv.Fo h gr,i scerta hr r M prtosta are that operations DMA are there that clear is when case it the figure, depicts the 3.3-A From Figure is negatively. preemptive. preemption affected is DMA again the DMA is when exactly time difference; is response a units. the schedule make time allowed, can this 10 preemption of is DMA allowing which time operation, 3.1-B, response Figure The in example required eviction. first are the task operations in no DMA schedule is extra the there no matching addition, as In reloading are execution. operations for DMA CPU 3.2-B. with Figure schedule, overlapped better now a in results preemption CPU Prohibiting syucnntc,tepemto causes preemption the notice, can you As ntecs fhvn o fhge roiyts rie ttemdl faDMA a of middle the at arrived task priority higher of job a having of case the In τ 1 nadto,tescn odof load second the addition, In . τ 2 utemr,teei oDAoeaint vra ihteeeuinof execution the with overlap to operation DMA no is there Furthermore, . iue32 lutaieshdl o repieCUexecution CPU preemptive for schedule Illustrative 3.2: Figure

Task's Priority Task's Priority τ τ τ τ τ τ 3 2 1 3 2 1 0 0 B o-repieCUExecution CPU Non-preemptive (B) A repieCUExecution CPU Preemptive (A) 4 4 τ 3 Time Time 45 sntoelpe eywl,ol ihsome with only well, very overlapped not is τ 3 ob oddtiea thsbe evicted been has it as twice loaded be to 8 8 12 12 τ 1 . u loih,weeats ssi ob ciei o ftets a enrlae but released been has task the of job a if active execution: be completed to yet said not is has task it a where non- algorithm, a particular our that this generalize in implement. not However, to does complex schedule. it less preemptive is and a schedule above, greedy than non-preemptive shown the architecture, better as by of motivated is example time is scheduling small response preemption preemptive with a DMA schedule on and a CPU based is the prohibiting choice preemption Finally, On DMA interruption. units. prohibiting any is units. time of without that time executes 10 result task thus the 11 the and case, now loading 3.3-B, this complete is Figure to In in schedule has shown loaded this as be of preemption, to scheduled time DMA prohibiting which response when operations The DMA hand, aborted other time. are wasted there addition, in In results execution. CPU with overlapped not ae ntepoie nuto,w a o omlydfieteshdln ue for rules scheduling the define formally now can we intuition, provided the on Based iue33 lutaieshdl o repieDAoperation DMA preemptive for schedule Illustrative 3.3: Figure

Task's Priority Task's Priority τ τ τ τ τ τ 3 2 1 3 2 1 0 0 wasted B o-repieDAOperation DMA Non-preemptive (B) A repieDAOperation DMA Preemptive (A) no overlapno 4 4 Time Time 46 8 8 12 12 1. The algorithm implicitly divides the schedule into a set of time intervals; scheduling decisions are only made at the beginning of an interval.

2. At the beginning of each interval, the CPU is scheduled to execute the task that has been loaded into one of the two partitions during the previous interval, if any. The DMA is scheduled to load the highest priority task among the set of remaining active tasks (if any exists) into the other partition; this involves an unload operation if needed.

3. An interval ends at the completion of the longest operation: either the CPU execution of the task or the DMA unload/load operation (if an operation is not performed, it is treated as having a length of zero).

4. If there is any ready task in the system, an interval starts at the end of the previous one; otherwise, an interval starts when the system transitions from idle to active, e.g. when a new job arrives while the system is idle.

The next section discusses how a schedulability analysis for the described algorithm can be developed and proves its correctness.

3.2.1 Schedulability Analysis of The 3-Phase Tasks with Dynamic- size DMA Operations

Given that the online scheduling algorithm enforces the aforementioned rules, we can cal- culate a safe bound of worst case execution time based on all tasks’ parameters. Note that we assume that the task’s execution time, Ci, is actually the adjusted execution time in which all the overheads are included, such as the context-switch and the DMA setup routines. Figure 3.4 depicts an illustrative example of the worst case scheduling scenario (critical instant) for an example task set, where the task under analysis τ3 completes execution in the fifth interval, Intrv5, after its arrival right after the beginning of interval Intrv1. At the beginning of the first interval (Intrv1), at time zero of this schedule, τ5 is scheduled to execute out of partition P1 assuming it has been loaded to this partition in the previous interval which is not shown in this figure. Simultaneously, in the same interval, τ4 is scheduled to be loaded into partition P2 after unloading the previous task, τu, that has been loaded into this partition two intervals ago and executed in the previous interval. Slightly after the beginning of the first interval and after all the operations have been

47 scheduled, three jobs of three different tasks τ3, τ2 and τ1 arrive in the system. In this case τ3 will suffer interference from the previously scheduled lower-priority tasks such as τ5 and τ4. In addition, τ3 will suffer interference from all higher-priority tasks that arrive (release) before τ3 is scheduled to load (locked), such as τ1 and τ2. Furthermore, τ3 will suffer from the unload operation of τu. We assume that τu can be any task in the system with the largest data section so it takes the longest time to unload. In addition, because we do know what tasks are previously scheduled, we assume that τ5 and τ4 can be any two different lower-priority tasks virtually constructed with the longest execution, DMA load and DMA unload times.

To formally express the response time for τ3, we divide the interference into two groups. First, the interference caused by τu and the two previously scheduled lower-priority tasks, we call this the blocking time (B). Second, the interference caused by the higher-priority tasks, we call this (H). Now the response time of τ3 can be expressed as Rτ3 = C3 +B +H. As already mentioned, CPU execution and DMA operations can be overlapped during the course of an interval. Therefore, only the longest one of the two is effectively contributing to the response time of the task under analysis τ3; therefore we take the maximum duration between the DMA operations and CPU execution. Furthermore, DMA operations are actually two parts, unloading and loading of two different tasks respectively. Therefore, the DMA operation, which is overlapped with the CPU execution, is actually the sum of an unload and a load operations. For the case in Figure 3.4, a more detailed expression of τ3’s response time can be written as follows:

Rτ3 = C3 + B + H

B = max(C5, unloadu + load4)

H = max(C4, unload5 + load1)

+ max(C1, unload4 + load2)

+ max(C2, unload1 + load3)

Building on this synthetic example, we designed an algorithm that can compute the worst case response time for the task under analysis (τi). hp(i) and lp(i) are the sets of higher-priority tasks and lower-priority tasks than τi, respectively. Jhp is the set of jobs of all higher priority tasks. τl1 and τl2 are the two worst lower priority tasks similar to τ4 and τ5 in the figure, respectively. Algorithm 3.1 shows the steps of how to calculate the worst case response time of any task under analysis τi. The algorithm addresses the problem in three main steps, calculate the blocking time B, calculate the interference caused by

48 vrapn M prto sntcnrbtn otersos ieof time response the to contributing not is operation DMA overlapping of times unload DMA in the example, analysis, and For jobs under job. higher-priority task higher-priority the some for of 3.4, time execution load the DMA with in overlap the example, and For jobs job. higher-priority higher-priority a of addition, operation 3.4, In load DMA list. some E with the times, execution into The inserted times. are are unload tasks DMA higher-priority for UD the and of jobs the is which determine execution to in CPU is order which methodology the algorithm know of the released. regardless not result, bound do a upper we As an Therefore, operation. calculates DMA arrive. with 3.1 tasks overlapped Algorithm higher-priority in time the 2 blocking of Step the Calculating released. calculating in analysis. is involved under analysis tasks task the under the as for forward time straight response final the tasks higher-priority the ihrpirt o hl ti o otiuigt h epnetm ftets under task the of time response the to contributing not is it analysis while job higher-priority iue34 nilsrtv ceueo os aersos ieo akudranalysis under task a of time response case worst a of schedule illustrative An 3.4: Figure h loih osrcstrelss o xcto ie,L o M odtimes load DMA for LD times, execution for E lists, three constructs algorithm The h xcto ftets ne analysis under task the of execution The load C 4 vraswith overlaps τ 3 i o xml,in example, For . vraswith overlaps H sntsrih owr eas ed o suei hc re h jobs the order which in assume not do we because forward straight not is

Task's Priority

load lower priority task under higher priority tasks analysis tasks C τ τ τ τ τ H 5 4 3 2 1 2 1 n pl h eusv epnetm nlsst determine to analysis time response recursive the apply and , ial,teU itcnan h M nodtmsfrall for times unload DMA the contains list UD the Finally, . iial,teL itcnan l M odtmsfrall for times load DMA all contains list LD the Similarly, . Intrv Intrv 1 5 3.4, Figure of Intrv 2 τ B l Time 1 49 sisre noteEls si soverlapped is it as list E the into inserted is r led ceue ytetm h task the time the by scheduled already are Intrv τ i 3 a vra ihteDAula fa of unload DMA the with overlap can C τ 3 l 1 vraswith overlaps C and falrlae ihrpirt jobs higher-priority released all of , Intrv 4 τ l 2 . B Intrv unload nteohrhand, other the On . 5 τ 3 Intrv Intrv 2 hrfr,not Therefore, . oee,this However, . τ i si a to has it as , 4 2 fFigure of fFigure of B is all DMA unloads of the higher-priority jobs are contributing to the response time of the task under analysis τi. However, we do not know which one it is. As a result, the UD list contains the DMA unloads of all higher-priority jobs. By doing this, the number of elements in UD will be one element more than the number of elements in each one of the other lists, E and LD. As shown in the example in Figure 3.4, the number of interfering intervals I is the number of the higher-priority jobs plus one, I = len(Jhp) + 1, where len(Jhp) is the number of element in Jhp. Therefore, the number of elements in each list E and LD is I and the number of elements in the list UD is I + 1. In fact, I + 1 is the number of scheduled intervals between the release time and the execution time of the task under analysis τi. After constructing the three lists, the algorithm picks the longest times from each list to form an upper bound worst-case response time. The algorithm does that in two steps. First, the two DMA lists, LD and UD, are sorted in a descending order from the worst (longest) at the top to the best (shortest) at the bottom. After that, a unified DMA-times list (DMA) is constructed by selecting the worst combinations of the two DMA lists, LD and UD. Basically, this is done by selecting the first I elements from the two sorted lists, LD and UD, and insert them into the unified DMA list as follows: DMAj = LDj +UDj for j = 1....I. By doing this we assure that the algorithm only ignores the least contributing (shortest unload time) element in the UD list, which is what we need. Second, since the CPU execution and the DMA operation can overlap, the algorithm selects one element from the both lists, DMA and E, for each interval based on the most contributing elements (the longest ones). It does that by merging the DMA and E lists into one list M (Maximums). Then, it sorts the M list in a descending order from the longest at the top to the shortest at the bottom. Since we only have I interfering intervals, the algorithm sums the first I elements from the list M and concludes H, which is the interference contributed by the P higher-priority jobs, as follows: H = (Mj) for j = 1....I. The selected most contributing elements can be all from the list E or can be all from the list DMA or can be mix of both. Algorithm 3.1 calculates H in steps 8-16. The final step in calculating the response time is to apply the standard response time analysis where the response time is defined as a function of itself. Algorithm 3.1 does this in steps 3-7 and 17. 
In step 7, Algorithm 3.1 updates Jhp, which is the set of all higher- priority jobs, in each iteration to accommodate all higher-priority jobs that will be released before the execution of the tasks under analysis τi, τj ∈ hp(i). We used the expression Ri − Ci instead of just Ri because, as introduced earlier, once a task is scheduled to load it is locked and it will execute to finish with no preemption.

Lemma 3.1. Let Jhp be the set of higher priority jobs that interfere with the task under analysis τi. Then the value of H computed by Algorithm 3.1 is a safe upper bound to the

50 Algorithm 3.1 Calculating the safe execution bound of the task under analysis τi

1: τu ∈ Γ; τl1, τl2 two virtual tasks ∈ lp(i): τl1 6= τl2; 2: B = max(Cl2, unloadu + loadl1) 3: Ri ← Ci + B 4: set P revRi 6= Ri 5: while (Ri 6= P revRi) do 6: P revRi ← Ri S S 7: Jhp ←   τj τj ∈hp(i) 1... Ri−Ci Tj S 8: E ← Cl1 CJ Shp 9: LD ← loadi loadJ S hp S 10: UD ← unloadl1 unloadl2 unloadJhp 11: dsort(LD) 12: dsort(UD) S 13: DMA ← 1≤j≤len(LD) LDj + UDj 14: M ← E S DMA 15: dsort(M) P 16: H ← 1≤j≤len(E) Mj 17: Ri = Ci + B + H length of all intervals where higher priority jobs delay the task under analysis.

Proof. As discussed above, the maximum number of intervals where higher priority jobs interfere with the task under analysis τi is I = len(Jhp) + 1: one interval where the first higher priority jobs is loaded, plus len(Jhp) intervals to execute each higher priority job. In each interval there is one CPU execution that overlaps with a DMA load/unload operation; therefore, the length of each interval is the maximum of the two. We can then obtain H as the sum of the lengths of I intervals, where each length is either a CPU execution time or a DMA load/unload time. Now, our algorithm simply takes all CPU execution times and all DMA times in those intervals, as shown above, and picks the maximum I elements among all of them; hence, this must result in a upper bound to the actual combined length of the I intervals. It remains to show that the way the DMA-time list (DMA in step 13) is constructed indeed leads to the worst-case. Let DMAj be the j-th element of the DMA list sorted in descending order (similarly for other lists). Assume that in step 16 the algorithm selects the maximum k elements from the E list and the maximum k0 from the DMA list, where

51 0 0 P k + k = I. It is easy to show that for any k , 1≤j≤k0 DMAj is the maximum combined DMA time that can be constructed out of k0 unload times and k0 load times; this is because DMAj = LDj + UDj and both LD and UD are sorted in descending order, 0 0 hence any element LDj, j ≤ k is larger than any element LDj, j > k (similarly for UD). By contradiction, assume that the worst case is instead found by picking k0 + 1 elements from DMA and k − 1 from E. Then the contribution of DMA times would be increased P P by 1≤j≤k0+1 DMAj − 1≤j≤k0 DMAj = DMAk0+1 and the contribution of E would be 0 decreased by Ek. However, since the algorithm selected k and k elements out of E and DMA, respectively, by picking the maximum elements from the combined list E S DMA, it follows that DMAk0+1 cannot be larger than Ek. This causes a contradiction, hence the lemma holds.

Theorem 3.1. Algorithm 3.1 calculates a safe upper bound to the response time of the task under analysis τi.

Proof. Algorithm 3.1 calculates the upper-bound response time for the task under analysis τi by summing the three contributing times, the blocking time caused by lower-priority jobs B, the interference caused by higher-priority jobs H, and the execution time of the task under analysis Ci. Therefore, the response time of task under analysis τi is Ri = B+H+Ci. Ri is a safe upper bound since it is a sum of upper bounds: in particular, B is an upper bound because we are computing it as the maximum of the computation time and DMA load/unload of any task that can interfere with τi in the first interval of the busy period (during which τi arrives). Now note in our system, when a task is scheduled to load, it will be executed in the next interval with no preemption. Therefore, if Ri is the response time of the task under analysis (computed at the previous iteration), then the maximum Ri−Ci number of instances of higher priority task τj that can interfere with τi is d e. Since Tj Jhp is computed based on this formula at step 7 of the algorithm and because of Lemma 3.3, it then follows that H is also a safe bound. Finally, Ci is an upper bound by definition.

It remains to show that the release time of the task under analysis τi and all higher- priority tasks, which are arriving at the same time just ε > 0 time after an interval has started, with ε arbitrarily small, is indeed the worst case. In particular, if τi arrives before the interval starts, then it will be locked and scheduled to load and execute without preemption before the higher-priority tasks, which cannot lead to the worst case. If instead τi arrives δ time after the beginning of the first interval, but still within the interval, then the schedule would not change but the response time of τi would be reduced by δ, which is not the worst case either. Similarly, if a higher-priority task arrives before the interval starts, τi will not suffer any interference from the lower-priority tasks, reducing its response

52 time. Finally, if a higher-priority task arrive within the same interval but δ time after the beginning of the interval, the number of higher-priority jobs that will be released before τi is scheduled is d Ri−Ci−δ e. Since this number is never larger than d Ri−Ci e, it follows that Tj Tj releasing the task ε time after the start of the interval always maximizes the worst case response time of τi. Hence, the theorem follows.

τ1

τ2

r e s i d s n y l u

a y k τ n t 3 s i a r a t o i r P

s '

k τ4 s a T

τ5 0 4 8 12 16 20 24 28 Time

Figure 3.5: Illustrative schedule of multi-interval tasks

3.2.2 Analysis of Multi-Segment Tasks

In this section, we extend the schedulability analysis to cover the case of multi-segment tasks. Scheduling Rules 1-4 still apply, except that when the DMA is scheduled to load a task, only the code and data of the next segment of the task is loaded, and in the next interval, the CPU only executes the code of that segment (rather than the entire task). Note that while a segment of a task is executed by the CPU, the DMA cannot load the next segment of the same task, since a latter segment might depend on a previous segment’s data. As a matter of fact, Rule 2 specifies that the DMA can only load a different task than the one executed by the CPU in the same interval. As a result, if a task τi is composed of Ki segments, even without any interference by other tasks it will take 2Ki intervals to complete the task: one interval to load a segment/unload the previous one and one interval to execute the segment itself.

A worst case scheduling scenario for a task under analysis τ3 is shown in Figure 3.5, where the task under analysis τ3 and all higher-priority tasks, τ1 and τ2, arrive at the same time just after the lower-priority task τ4 is scheduled to load. Our goal is to compute the

53 response time Rτ3 for the last interval of the task under analysis, which is the interval where 3 the last segment τ3 of τ3 executes ([26, 28] in Figure 3.5). Similarly to the single-interval K3 case, we compute Rτ3 as the sum of three components: Rτ3 = C3 + B + F . B is the K3 blocking time caused by the previously scheduled intervals ([0, 2] in the figure), while C3 is the computation time of the last segment of τ3. Finally, F is the interference caused by all higher-priority intervals and lower-priority intervals or the first Ki − 1 intervals of τi ([2, 26] in the figure). Notice that in the worst case, whenever a segment other than the last one of the task under analysis is executed by the CPU ([17, 19] and [22, 24] in the figure), no higher priority job is active; hence, either no DMA operation is overlapped with it ([22, 24]), or the DMA load for a lower priority job can be overlapped ([17, 19]). In the latter case, during the next interval the lower priority job will be executed on the CPU. In summary, for a task under analysis τi, F is composed by len(Ihp) intervals, where Ihp is the set of interfering higher-priority segments, plus Ki − 1 intervals for the segments of the task under analysis (except the last), plus Ki intervals where either a lower priority job is executing, or a segment of the task under analysis is loaded without being overlapped with any execution. Note that in the case of Figure 3.5 where K3 = 3, this results in 4 + (3 − 1) + 3 = 9 intervals.

Algorithm 3.2 calculates the worst-case response time for task under analysis τi. The higher-priority intervals, which belongs to higher-priority jobs, are expressed as hpi(i) and the lower-priority jobs’ intervals, which belongs to lower-priority jobs, are expressed as Ki lpi(i). The response time of τi is expressed as Ri = Ci + B + F , which is the response Ki time of the last interval (τi ) of τi. The blocking time B is calculated as in Algorithm 3.1 except it is expressed in terms of intervals. Ihp is the set of all interfering higher-priority Ki intervals. Since τi can also be interfered by the lower-priority intervals or by the precedent intervals of τi, Algorithm 3.2 considers both the higher-priority intervals and the lower- priority intervals in the steps 9-17. The lists Ehp, LDhp and UDhp contains the execution times, the DMA load times and the DMA unload times of all interfering higher-priority intervals respectively. On the other hand, the list E, LD and UD contains the execution times, the DMA load times and the DMA unload times of the precedent intervals of τi and the possibly Ki − 1 interfering lower-priority intervals respectively. In general, we do Ki not know which lower-priority interval will interfere with the interval τi . Therefore, we B B always assume it to be the longest lower-priority interval τl . τl is a synthetic interval that has the longest execution, DMA load and DMA unload times. Algorithm 3.2 calculates the upper-bound interference F the same way Algorithm 3.1 calculates the upper-bound interference caused by higher-priority jobs H. F is the summation of the upper-bound interference caused by len(Ihp) + 2Ki − 1 intervals. F is calculated in steps 9-23. Note that in step 8, Algorithm 3.2 forms the set Ihp, which is the set of all interfering higher-

54 Algorithm 3.2 Calculating the safe execution bound of the task under analysis τi when tasks are multi-interval tasks

1: τu ∈ Γ; τl1, τl2 two synthetic tasks ∈ lpi(i): τl1 6= τl2 B 2: τl ∈ lpi(i) is a synthetic longest lower-priority interval S S S 3: B = max(Cl2, unloadu + loadl1) Ki 4: Ri ← Ci + B

5: set P revRi 6= Ri

6: while (Ri 6= P revRi) do

7: P revRi ← Ri S S S v 8: I ← & K ' τ hp τj ∈hp(i) R −C i v=1...Kj j 1... i i Tj S S 9: Ehp ← Cl1 CIhp 10: E ← (S Cv) S(S CB) 1≤v

12: LDhp ← loadIhp 13: LD ← (S loadv) S(S loadB ) 1≤v≤Ki i 1≤v

18: dsort(LDglobal)

19: dsort(UDglobal) 20: DMA ← S (LD + UD ) 1≤j≤len(LDglobal) globalj globalj S 21: M ← Eglobal DMA 22: dsort(M)

23: F ← P M 1≤j≤len(Ehp)+2Ki−1 j

Ki 24: Ri = Ci + B + F

v priority intervals, by accommodating all the intervals (τj ) of the released jobs (τj) of all

55 higher-priority tasks (τi), where τj ∈ hp(i).

Lemma 3.2. Let Ihp be the set of segments of higher priority jobs that interfere with the task under analysis τi. Then the value of F computed by Algorithm 3.2 is a safe upper bound to the length of interfering intervals with the exclusion of the interval during which τi arrives.

Proof. As mentioned above, F is composed of the interference caused by higher-priority intervals and the interference caused by the first Ki − 1 intervals of τi or by the possible lower-priority intervals. The way we calculate the maximum interference of the higher- priority intervals is similar to calculating H in Algorithm 3.1, e.g., each segment of a higher-priority interfering job will contribute one additional interval to the length of F ; furthermore, DMA load and unload operations can overlap with other executions in other intervals. K Since we calculate the end-end response time of τi, the last interval τi is proceeded by Ki − 1 CPU intervals and Ki DMA intervals. Therefore, in the worst case there will K be 2Ki − 1 scheduled intervals before the execution of τi (without considering higher- priority tasks). These scheduled intervals might overlap with lower-priority intervals. As a result, we have two cases, in the first case, when there are no active lower-priority intervals, each CPU interval of τi has to be proceeded by the corresponding DMA interval to load it. Therefore, the CPU intervals of τi suffer the delay caused by the corresponding DMA intervals. However, this is not the worst case. The second case is when there are long lower-priority intervals to overlap with the intervals of τi. In this case, the overlapping lower-priority intervals are selected as the contributing intervals if they are longer than the corresponding interval of τi; otherwise, it is similar to the first case. Therefore, it is safe to always consider that there will be a lower-priority interval to overlap with; and that is B τl , which is the longest lower-priority interval. To conclude the proof, it suffices to notice that the same reasoning as in in Lemma 3.3 can be used to show that picking the largest contributing intervals leads to a safe upper bound. Theorem 3.2. Algorithm 3.2 calculates a safe upper bound to the response time of the multi-interval task under analysis τi.

Proof. Algorithm 3.2 calculates the upper-bound response time for the task under analysis τi by summing the three contributing times, the blocking time caused by lower-priority intervals B, the interference F caused by all interfering intervals in the system, and the Ki execution time of the last interval of the task under analysis Ci . Therefore, the response

56 Ki time of task under analysis τi is Ri = B + F + Ci . Similarly to the proof of Theorem 3.1, this is a safe upper bound since it is a sum of upper bounds. In particular, if Ri is the response time of the task under analysis computed at the previous iteration, then the maximum number of instances of higher priority task τj that can interfere with τi is Ki Ri−Ci again d e. Hence, by virtue of the computation of Ihp at step 8 of the algorithm and Tj Ki because of Lemma 3.5, F is a safe upper bound. Ci is an upper bound by definition; finally, the same reasoning as in Theorem 3.1 can be used to show that B is also a safe upper bound, and that the release time of the task under analysis τi and all higher-priority tasks, which are arriving at the same time immediately after an interval has started, is indeed the worst case.

57 3.3 Case (II): Scheduling 3-Phase Tasks with Fixed- size DMA Operations

Unlike the previous case, where each core is assumed to have a guaranteed memory band- width, here we explicitly schedule TDMA slots for each core. Several schemes are known to fairly share a single resource across different consumers. For the scope of our design, we employ a TDMA scheme to serialize task load/unload operations among m cores. The main advantage of the TDMA scheme lies in its simplicity of implementation. In order to perform TDMA-based scheduling of the DMA, time is partitioned into slots of fixed size. In each slot, only a single DMA operation can be performed, either a task load or unload. The slot size is chosen to ensure that the task with the largest footprint in the system can be loaded within the slot time window. Figure 3.6 depicts the sequence of operations in our TDMA scheme for a system with two cores; note that the schedule of execution phases on the core, as well as the sequence of intervals, follows the same Rules 1-4 in Section 3.2 that are used in the variable-size DMA case. In particular, Figure 3.6 depicts three tasks scheduled on one core. Up arrows in blue color represent the arrival times of the considered tasks; we use colors for two different partitions. A task can only run after its load operation has been completed and the previous task on the other partition has completed (see τ2 to τ3 and τ1 to τ2 for example of the two cases). There might be slots where no load/unload is performed. This happens at time 8: τ1 finishes right after the beginning of the slot, so both partitions are full at the beginning of the slot and neither load nor unload can be performed. Effectively, the slot is wasted.

Figure 3.6: Scheduling CPU, DMA and local memory with fixed-size time slots

As depicted in Figure 3.6, the shared system DMA is alternatively assigned to transfer

58 data for a specific core. Within a single slot, either an unload operation for a previously running task or a load operation for the next scheduled task is performed. The specific operation to be performed is decided as follows:

Rule 1: If a load operation can be performed, a load operation is programmed on the application DMA;

Rule 2: If a load cannot be performed and there is a previously running task to be unloaded, an unload operation is programmed on the application DMA.

Note that Rule 1 can be activated by the following conditions: (i) at least one of the two SPM partitions is available (i.e. has been previously unloaded), and (ii) a task has been released and is ready to be loaded. Similarly, Rule 2 can be activated if no load can be performed, at least one partition is not empty and the task loaded on that partition has completed.

3.3.1 Schedulability Analysis of The 3-Phase Tasks with Fixed- Size DMA Operations

Given the scheduling strategy described in Section 3.3, we can calculate a safe bound on the worst case execution time based on all tasks’ parameters in an approach similar to to the previous case in Section 3.2.1. Note that we assume that the task’s execution time, Ci, is actually the adjusted execution time in which all the overheads are included, such as the context-switch and the DMA setup routines. Also note that for simplicity we discuss the case with m = 2 cores, since it is used in the implementation with fixed-size DMA operations described in Section 4.2, but the analysis could be trivially extended to account for any number of cores. Figure 3.7 depicts an illustrative example of the worst case scheduling scenario (critical instant) for an example task set where τ3 is the task under analysis. The schedule depicts a busy period where τ3 suffers interference from two higher-priority tasks, τ1 and τ2. As in Section 3.2.1, we consider the busy period as composed by a sequence of scheduling intervals Interval1, Interval2, Interval3, Interval4 (each bounded by bold vertical lines in the figure), followed by a final interval IntervalF . During each scheduling interval, only one blocking or interfering task runs. During the final interval, the task under analysis runs. Each scheduling interval always starts with a CPU execution and ends either when the CPU finishes executing the task or when the next task finishes being loaded by the

59 M odoeain(ex: operation load DMA and (ex: analysis under task unloaded. of the is execution of analysis the execution under with the task with starts the interval starts interval when next final finishes the The point, task. this loaded at the last; happen whichever DMA, os aeala/nodoeaincnocp h nieso,w pe on h length as the slot bound TDMA upper of the we multiple of slot, operations a entire size DMA as the the the operations occupy denote and DMA can interval We of operation the load/unload task. a in next case running worst the task load the to of required time execution the between roiyts htsat xctn eoetebgnigo h uypro;ti is this period; by busy the analysis, of time under beginning blocking the task task the computes before executing 3.1 (1) executing the Algorithm starts components: that for 3.2.1. three task Section time priority adding in by one response time the response case to equivalent the worst is problem the the that compute showing to 3.2.1 Section esyta ceuigitra is interval scheduling a that say We dtie in detailed 3.1 Algorithm use can we definitions, above-mentioned the on Building Interval Task's Priority

DMA lower priority task under higher priority TDMA tasks analysis tasks Slots τ τ τ τ τ 1 4 3 2 1 5 , 0 τ Interval 5 1 iue37 xml ceue nevl r highlighted. are intervals schedule, Example 3.7: Figure X ntefiue 2 h interference the (2) figure; the in Interval 2 3 B X 1 4 3 and Interval 5 X Interval 6 Interval 7 X 2 8 2 .Telnt fashdln nevli h maximum the is interval scheduling a of length The ). σ 9 X . 4 10 Interval ntefiue,and figure), the in 11 CPU-bound H X 60 12 Time 3 13 X 14 H Interval 15 X opiigtermiigscheduling remaining the comprising 16 hni nswt P execution CPU with ends it when 17 4 X DMA-bound 18 19 X 20 21 X Interval 22 B F 23 hni nswith ends it when asdb lower a by caused X F 24 σ ic nthe in since ; 25 X 26 Interval 27 1 intervals in the busy period, which are Interval2, Interval3 and Interval4 in the figure. The number of such intervals is equal to the number of interfering higher priority jobs plus one, since an extra lower priority job that starts loading before the beginning of the busy period (τ4 in the figure) can execute within the busy period itself; (3) the computation of the task under analysis (Ci). The algorithm builds a list of DMA times and computation times for tasks executed in H, then it derives a provably safe bound on the length of H using standard response time iteration. Compared to the analysis in Section 3.2.1, the presented analysis in this section differs in three aspects. First, in this section we use fixed-size DMA operations, while Section 3.2.1 employs dynamic-size DMA operations. Therefore, we need to discuss how to compute the length of the DMA operations that are inserted in the list of DMA times. Second, we need to recompute the length of the blocking time B since the next task to be loaded is determined at a different time. Finally, unlike the analysis in Section 3.2.1, here we consider the task under analysis finished when the task is unloaded, at the end of IntervalF . Consequently, we can use the same algorithm to compute the worst case response by replacing Ci with the length of IntervalF . We address each point in sequence.

Scheduling Intervals in The Busy Period (H)

When the system is busy with both SPM partitions occupied and at least one pending task, within each interval we need to first unload the previous partition and then load it with the next task. Therefore, for any scheduling interval, it will require four TDMA slots (4 · σ) to load the next task if the interval was preceded by another DMA-bound interval, such as for Interval3 in the figure. On the other hand, if the interval is preceded by a CPU-bound interval, it might require up to five TDMA slots (5 · σ) to finish loading the next task in the worst case, as for Interval2. This is because the CPU-bound interval can induce an unused empty TDMA slot in the next interval (slot [4:5] in the figure).

As a result, the length of any scheduling interval can be computed as either max(Ci, 4·σ) or max(Ci, 5 · σ). We now formally prove that for any CPU-bound interval to cause the worst case scenario, with the exception of the first interval Interval1, the CPU execution has to be strictly longer than four TDMA slots (4 · σ).

Lemma 3.3. For any scheduling interval in H, no extra empty slot will be induced in the next interval unless the length of the CPU execution is strictly greater than four TDMA slots (4 · σ).

61 τx X X X X X

Proof. We show that any scheduling interval in H with CPU execution less than 4 · σ cannot induce an empty slot in the next interval. By considering the figure above, the execution of τx in the middle interval is greater than 4 · σ, and it induces an empty slot in the next interval. Since both partitions must be full during the execution of intervals in H, it follows that the interval to which τx belongs must include both a load and an unload operation; in the worst case showed in the figure, the interval could start with the unload operation. Still, clearly if the execution time of τx is reduced to less than 4 · σ, the interval will finish before the next slot assigned to the core under analysis is reached, and hence no empty slot will be induced. Note that if the execution of τx is reduced further, it could make the interval into an DMA-bound interval, which would end right after finishing the load of the next task; thus, the next interval would not suffer from an empty induced slot either.

Based on Lemma 3.3, we can construct the list of DMA times used by the algorithm as follows: we insert in the list a number of 5 · σ time values equal to the number of tasks executed in H with length greater than 4 · σ, plus one task (to account for the task in Interval1, which can cause an extra empty TDMA slot as in the figure). The remaining DMA times in the list are equal to 4 · σ.

Critical Instant and Blocking Time (B)

At the beginning of the example schedule in Figure 3.7, the system has two free local SPM partitions at time zero. In Interval1, the task under analysis τ3 is released along with all higher-priority tasks after an arbitrarily small time (ε) when all free partitions have been loaded or have started loading lower-priority tasks (τ5 and τ4); this is ε after time 2 in the figure. The task under analysis τ3 cannot run until the pre-loaded lower-priority tasks (τ5 and τ4) plus all higher-priority tasks (τ1 and τ2) finish execution. We now prove that the discussed scenario is indeed the critical instant for our system, leading to the worst case response time for the task under analysis.

Lemma 3.4. The critical instant is produced when the task under analysis τi and all higher priority tasks arrive immediately after a lower priority task has started loading into a partition, and the other partition was loaded with another lower priority task as late as possible (i.e., two slots before).

Proof. We first show that in the worst case, both τi and all higher priority tasks must arrive ε time after the beginning of a slot where a lower priority task is loaded. If either τi or a higher priority task arrived at or before the beginning of the slot, then such a task would be loaded and executed in place of the lower priority task. Hence, the length of the busy period would decrease by one scheduling interval, which cannot produce the worst case response time for τi. If instead τi arrives some δ time later during the busy period, then the finishing time of τi would not change, but the response time of τi would decrease by δ. Finally, if a higher priority task arrives later during the busy period, the number of interfering jobs of that task could only be lower than or equal to the number obtained by releasing it immediately after the beginning of the slot. Hence, the described activation pattern must lead to the critical instant. Regarding the lower priority task pre-loaded in the other partition, it suffices to notice that loading the task as late as possible (i.e., two slots before τi arrives, which is slot [0:1] in Figure 3.7) maximizes the amount of execution of the task within the busy period.

Based on Lemma 3.4, the worst case blocking time B can be obtained as the length of Interval1 minus σ, where the length of Interval1 is bounded by max(Cu, 2 · σ); here, Cu is the execution time of any lower priority task executed in Interval1, while the length 2 · σ accounts for the fact that the next task is loaded in the second slot of the interval (slot [2:3] in the figure). Similar to Section 3.2.1, since we can make no assumption on which lower priority tasks execute in Interval1 and Interval2, the algorithm simply considers the two lower priority tasks with the longest execution times.
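For example, the blocking term alone reduces to a one-line computation (a sketch; the variable name c_low is an assumption, holding the largest lower-priority execution time):

/* Worst-case blocking time B (Lemma 3.4): Interval1 is bounded by
 * max(c_low, 2*sigma), and B excludes the first TDMA slot of the interval. */
static double blocking_time(double c_low, double sigma)
{
    double interval1 = (c_low > 2.0 * sigma) ? c_low : 2.0 * sigma;
    return interval1 - sigma;
}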

Final Interval (F)

The length of the final interval IntervalF can be computed as max(Ci + 5 · σ, 7 · σ), where τi is the task under analysis. In the example depicted in Figure 3.7, the length of IntervalF is C3 + 5 · σ. The other case can happen when C3 is short enough and slot [22:23] is utilized, in the worst case, to load the next task. In this situation, up to seven TDMA slots are required to finish unloading τ3, as formally proven below.

Lemma 3.5. The length of IntervalF is upper bounded by max(Ci + 5 · σ, 7 · σ).

Proof. Similar to scheduling intervals in H, we need to consider two cases: (1) the length of the interval is bounded by Ci plus the time required to unload the task; (2) the length of the interval is bounded by the time required to unload/load the other partition before

unloading τi. For the first case, note that, differently from the scheduling intervals in H, there might be no pending task at the start of IntervalF, since the last task in the busy period (the task under analysis) is running. Therefore, a new job of any task could be released later during IntervalF and start a load operation. In particular, the new job could arrive just before the unload of the task under analysis (τi), as shown in Figure 3.7. In this case, since there is one free partition and load operations have priority over unload operations (Rules 1, 2), the new job has to be loaded first; thus, the unload of τi is delayed by up to 5 · σ in the worst case (one empty slot plus four other slots, as shown in Figure 3.7 for slots [22:27]; no more than 5 slots are possible since after the load at [24:25], both partitions are full and thus an unload must happen next). In this case the length of IntervalF is upper bounded by Ci + 5 · σ.

The following figure shows the other case, where the length of IntervalF is 7 · σ in the worst case. This case happens when the execution of the task under analysis is so small that Ci + 5 · σ is smaller than the time required to actually unload τi. When the CPU execution of τi is sufficiently small, the load of the next task must start within at most 5 · σ, regardless of the release time of the next task; otherwise, τi would be unloaded by the fifth slot. If the next task is indeed loaded before the unload of τi, as shown in the figure, then in the worst case it takes two more slots to unload τi (given that both partitions are full after loading the next task), hence resulting in a bound of 7 · σ. To conclude, by taking the maximum of the two cases we guarantee to capture the worst case.

[Figure: worst case for IntervalF when Ci is very small; the next task is loaded before τi is unloaded, resulting in a length of 7 · σ.]

Theorem 3.3. The worst-case response time of the task under analysis (Ri) is Ri = B + H + F .

Proof. The theorem follows directly from Lemmas 3.3, 3.4, and 3.5.
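To make the overall test concrete, the following sketch combines the three terms in a standard fixed-point iteration. It is a simplification under stated assumptions: every interval in H is charged the pessimistic 5 · σ DMA time instead of the exact 4 · σ/5 · σ list built above, and the task structure and parameter names are illustrative rather than the thesis implementation.

#include <math.h>
#include <stddef.h>

struct task { double C;    /* worst-case execution time                */
              double T; }; /* period (sporadic minimum inter-arrival) */

/* Response-time sketch R_i = B + H + F for the fixed-size DMA scheme.
 * 'hp' lists the higher-priority tasks; c_low1 and c_low2 are the two
 * largest lower-priority execution times (Interval1 and the extra
 * interval in H). */
static double response_time(const struct task *ti,
                            const struct task *hp, size_t n_hp,
                            double c_low1, double c_low2,
                            double sigma, double deadline)
{
    double B = fmax(c_low1, 2.0 * sigma) - sigma;        /* Lemma 3.4 */
    double F = fmax(ti->C + 5.0 * sigma, 7.0 * sigma);   /* Lemma 3.5 */
    double H = fmax(c_low2, 5.0 * sigma);                /* extra interval */

    for (;;) {
        double window = B + H;                           /* equals R - F */
        double newH = fmax(c_low2, 5.0 * sigma);
        for (size_t j = 0; j < n_hp; j++) {
            double jobs = ceil(window / hp[j].T);        /* interfering jobs */
            newH += jobs * fmax(hp[j].C, 5.0 * sigma);   /* one interval each */
        }
        if (newH <= H || B + newH + F > deadline) {      /* converged or failed */
            H = newH;
            break;
        }
        H = newH;
    }
    return B + H + F;
}

A task is then deemed schedulable if the returned value does not exceed its deadline, as usual for response-time analysis.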

3.4 Fault-Tolerant Scheduling of 3-Phase Tasks

The ability to recover from soft errors and effectively extend the Mean Time To Failure (MTTF) is particularly relevant when considering safety-critical systems. This is because real-time embedded devices are often deployed in hostile environments such as production plants, aircraft, and satellites. In such environments, extended exposure to various kinds of radiation, such as alpha particles, high and low energy cosmic rays, as well as strong electromagnetic fields, can increase the probability of temporary "bit flips", i.e. soft errors, in the circuitry. The rate at which soft errors occur is called the Soft Error Rate (SER). The commonly used unit of measure for SER is the Failure In Time (FIT). One FIT is equivalent to one failure in 10^9 device hours.

In the previous sections, the proposed system attains the goal of achieving predictability in a multi-core environment. However, in the event of a memory error, it does not provide any recovery countermeasure. In order to augment the predictable SPM-centric management to recover from a memory error, we leverage the redundancy that is built in by design in the multi-phase task system. In our original system, a running task has two copies, one in the SPM and one in main memory. Only the read-only data is considered as a replica, since the R/W data in the SPM might be altered during the execution of the task. In the proposed fault-tolerant system, two copies of the task are kept in main memory to recover from memory errors.

The proposed recovery mechanism assumes that only one memory error can occur in every two periods of any task in any memory module. Since the period of a real-time task is typically tens or hundreds of milliseconds long, we deem this assumption to be satisfied in the vast majority of embedded systems. In addition, we assume the existence of an Error-Correcting Code (ECC) unit that is capable of detecting memory bit flip errors that cannot be automatically corrected in hardware, such as double-bit errors.

First, we describe how the overall system with redundant task copies works, then we explain how a fault in each case is handled. Note that the recovery mechanism can be applied to both variable-size and fixed-size DMA operations. For simplicity, in what follows we describe the recovery mechanism for fixed-size DMA operations, since this is the case for the fault-tolerant implementation discussed in Section 4.2. Conceptually, there are two identical copies of a task in main memory during regular operation: one is considered the working copy and the other the backup copy. The task is usually loaded into the SPM from the working copy. After the task completes its execution from the SPM, it is first unloaded to update the working copy in main memory. Then, upon successful unload, another unload is initiated to update the backup copy.

Memory errors are detected either while the CPU is executing a task from the local SPM, or while the DMA is loading or unloading a task. In the former case, the execution is aborted, the current SPM partition is marked empty, and the task is considered for rescheduling. If an error is reported while the DMA was loading a task, the corrupted word is recopied from the redundant backup copy. If the error is reported while the DMA was unloading a task to the working copy (first unload), the corresponding SPM partition is marked empty, and the task is considered for rescheduling from the backup copy. If the error is reported while the DMA was unloading the task to the backup copy (second unload), the task is considered successfully finished and there is no need for rescheduling; the faulty backup copy will be corrected in the next task unload. Note that no other error is assumed to occur during the recovery operation, as long as the task meets its constrained deadline and the error only occurs once in every two periods. Based on this, any faulty copy of a task in main memory (working or backup) will be corrected after the recovery operation is done. Anytime a faulty task is considered for rescheduling, it is treated as the highest-priority task in the system in order to bound its recovery time.
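The recovery policy can be summarized as a small decision routine; the enum, the helper hooks and their names below are illustrative placeholders rather than the actual OS interface.

#include <stdio.h>

/* Where the uncorrectable memory error was detected. */
enum err_site { ERR_CPU_EXEC, ERR_DMA_LOAD, ERR_UNLOAD_WORKING, ERR_UNLOAD_BACKUP };

/* Stub hooks; a real kernel would drive the scheduler, the DMA engine
 * and the SPM allocator here. */
static void abort_task(int tid)                  { printf("abort task %d\n", tid); }
static void mark_partition_empty(int part)       { printf("free partition %d\n", part); }
static void reschedule_highest_priority(int tid) { printf("requeue %d at top priority\n", tid); }
static void recopy_word_from_backup(int tid)     { printf("patch word of task %d\n", tid); }

/* Decision routine mirroring the recovery rules described above. */
static void handle_memory_error(int tid, int part, enum err_site site)
{
    switch (site) {
    case ERR_CPU_EXEC:        /* abort, free the partition, rerun from the working copy */
        abort_task(tid);
        mark_partition_empty(part);
        reschedule_highest_priority(tid);
        break;
    case ERR_DMA_LOAD:        /* recopy the corrupted word from the backup copy */
        recopy_word_from_backup(tid);
        break;
    case ERR_UNLOAD_WORKING:  /* working copy corrupted: rerun the task from the backup copy */
        mark_partition_empty(part);
        reschedule_highest_priority(tid);
        break;
    case ERR_UNLOAD_BACKUP:   /* job already completed; the backup is fixed at the next unload */
        break;
    }
}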


Figure 3.8: Scheduling example with the proposed error recovery mechanism

Unlike the normal system operation depicted in Figure 3.6, Figure 3.8 shows an example of the additional operations needed to recover when a memory error occurs during the execution of τ2. Basically, the execution of τ2 is aborted and the task is reloaded into the same partition in the very next TDMA slot of the corresponding core, since it becomes the highest priority task.

3.4.1 Extending Schedulability Analysis for Error Recovery

Based on the above description, in this section we derive a safe bound on the worst case response time for the task under analysis τi. Since we follow a similar execution model, we employ the same analysis framework introduced in Section 3.3.1. However, we need to account for the recovery overhead in the analysis. Throughout the analysis we assume that Ci is the adjusted worst-case execution time of τi, which includes all overheads, such as the context switch and any needed recovery operation performed on the core. For simplicity, we discuss the case in which only one memory error can occur in two consecutive periods of any task in a two-core system. However, the analysis could be easily extended to account for more frequent errors and for more than m = 2 cores.


Figure 3.9: Task scheduling with illustration of error recovery mechanism

Figure 3.9 depicts an illustrative example of the worst case scheduling scenario (critical instant at time t = 2 and following busy interval) for an example task set where τ3 is the task under analysis. The schedule depicts a busy period where τ3 suffers interference from two higher-priority tasks, τ1 and τ2. As in Section 3.3.1, we consider the busy period as composed of a sequence of scheduling intervals Interval2, Interval3, Interval4 (each bounded by bold vertical lines in the figure), followed by a final interval IntervalF. During each scheduling interval, only one blocking or interfering task runs. During the final interval, the task under analysis runs. Similar to Section 3.3.1, the size of the TDMA time slot is σ, which is assumed to be long enough to load or unload any task, including any extra overhead due to managing the recovery mechanism.

3.4.2 Response Time Calculation

Building on the above-mentioned definitions, we can follow the same technique detailed in Section 3.3.1 to compute the worst case response time for a task under analysis τi (τ3 in Figure 3.9). In Section 3.3.1, the response time of τi is computed by adding three components: (1) the blocking time B caused by a lower priority task that starts executing before the beginning of the busy interval; this is Interval1 executing task τ5 in the figure; (2) the interference H comprising the remaining scheduling intervals in the busy period, which are Interval2, Interval3 and Interval4 in the figure. The number of such intervals is equal to the number of interfering higher priority jobs plus one, since an extra lower priority job that starts loading before the beginning of the busy period (τ4 in the figure) can execute within the busy period itself; (3) the worst case length F for the final interval IntervalF during which the task under analysis is executed, up to the finish time for the unload operation of τi. Therefore, the response time of the task under analysis is Ri = B + H + F; since the length H of the interfering intervals depends on Ri, this is computed using a standard iterative method. In particular, notice that the number of interfering higher priority jobs is computed based on Ri − F rather than Ri: once τi starts executing in IntervalF, newly arriving higher priority jobs cannot delay its execution anymore.

As proved in Lemma 3.4, the critical instant is produced when the task under analysis τi and all higher priority tasks arrive immediately after a lower priority task has started loading into a partition, and the other partition was loaded with another lower priority task as late as possible (i.e., two slots before). Based on the critical instant, we then obtain B = max(Cl, 2 · σ) − σ, where τl is the lower priority task with the largest execution time. Finally, based on Lemma 3.5 in Section 3.3.1, the maximum length of the final interval is F = max(Ci + 5 · σ, 7 · σ).

Compared to the analysis in Section 3.3.1, here the analysis accounts for the possible memory error/recovery that might lead to a task reschedule. We show that we can use the same response time iteration as in Section 3.3.1 to calculate the response time of the task under analysis after extending H or F, depending on when the memory error/recovery takes place.

3.4.3 Accounting for Error Recovery

Since we assume that no more than one error can occur for any two consecutive periods of any task, it follows that during the busy interval of the task under analysis τi there can be at most one task that suffers one error. A failed task is then rescheduled within bounded time.

Lemma 3.6. A failed task that is rescheduled with highest priority will be reloaded after at most m TDMA slots.

Proof. Since the failed task is raised to be the highest priority task in the system and load operations have priority over unload operations, the failed task is guaranteed to be reloaded in the same partition during the next TDMA slot of the corresponding core, which comes once every m slots. Therefore, the failed task is guaranteed to be reloaded after at most m TDMA slots, i.e., 2 slots in the case of 2 cores, as shown in Figure 3.9.

At the task level, the error might occur during the load, the execution, or the unload of the task. However, to produce the worst-case workload induced by a task on the schedule, the memory error must happen as late as possible, i.e., during the unload of the task.

Lemma 3.7. A task generates the worst-case workload induced into the busy interval when the memory error occurs as late as possible, i.e. during the unload phase.

Proof. There are three cases in which the error can occur, (1) during the load, (2) during the execution, and (3) during the unload. Case (1) does not incur a rescheduling overhead. This is because the error is recovered during the load of the task. The associated overhead is already included in the TDMA slot size by accounting for the time of copying the corrupted word from the relevant redundant memory. However, in case (2), the execution of the task is aborted (hence partially wasted) and the task has to be rescheduled to load again after m TDMA slots. Finally, in case (3) the task is fully executed and unloaded before being rescheduled and reloaded after m TDMA slots; since this case results in the most (wasted) time added to the busy interval, it is the worst case.

Based on Lemma 3.7, we assume that a memory error always occurs as late as possible during the unload phase of a task to capture the worst case. At the schedule level, we classify the memory error/recovery based on when it occurs with respect to the task under analysis, (1) prior to the final interval in which the task under analysis runs or (2) during the final interval.

Error Recovery Prior to The Final Interval (IntervalF)

Lemma 3.8. For an error that occurs prior to F , adding an extra interfering interval to H executing the task in the system with the largest execution time other than τi leads to the worst case response time for the task under analysis.

Proof. By definition, the error has to occur in H or B to affect the response time of the task under analysis. Based on Lemma 3.7, the error should occur during the unload of a task; hence, the recovery mechanism forces the failed task to be rescheduled, i.e., a new scheduling interval is added to the schedule. The rescheduling of the failed task accounts for the execution and the load/unload operations of the failed task. Furthermore, regardless of the task's priority, the rescheduled (failed) task always runs as the highest priority task in the system; thus, this additional scheduling interval causes interference to the task under analysis. Since we cannot make any assumption on which task might fail, whether a lower-priority or a higher-priority task, it is safe to assume that the longest executing task in the system other than the task under analysis will be scheduled to run in the induced interval. Finally, note that even if the task that fails is the one executed in B, the rescheduled task will execute during H, including its load/unload memory operations. Since Algorithm 3.1 is able to correctly upper bound the interference in H caused by any task without making any assumption regarding their order, we then simply add the induced interval to H to capture the worst-case response time for τi.

Based on Lemma 3.8, let Hrec be the computed length of interfering intervals including one restarted task. To give a concrete example on how an error happening prior to the final interval (IntervalF ) extends H, refer to Figure 3.10. In this example, τ5 fails during unload; it is reloaded at time 8-9 and executes in a new third interval in place of τ1; τ1 then executes in the fourth interval and τ2 in a fifth interval. Since our analysis bounds the length of the intervals in H without making any assumption on the order of the tasks, this is safe. Note that the analysis is based on H always having tasks ready to load by definition, which is not the case for IntervalF , thus requiring a separate analysis in Lemma 3.9.

Error Recovery in The Final Interval (IntervalF )

Lemma 3.9. For an error that occurs within IntervalF, the maximum length of the interval with m = 2 is Frec = 2 · max(Ci + 5 · σ, 7 · σ) + max(Cu, 4 · σ) − 2 · σ, where τu is the task in the system with the largest execution time other than τi.

Figure 3.10: Example showing how recovery prior to IntervalF extends H

Proof. As defined earlier, IntervalF starts with the execution of the task under analysis τi and finishes with the end of the unload phase of τi. Based on Lemma 3.7, in the worst case the error occurs during the unload phase; therefore, the error must occur during the first (failed) unload of τi. Based on Lemma 3.5, in the normal case the unload operation of τi completes after at most max(Ci + 5 · σ, 7 · σ) time units from the beginning of the interval; in Figure 3.9, this occurs at time 27. However, in the case of a memory error during the unload operation and the following task recovery, IntervalF is extended. To simplify the computation of the worst-case length Frec accounting for recovery, we divide the interval in three sub intervals. The first execution and the first (failed) unload of τi are contained in the first sub interval, SubInterval1. Based on Lemma 3.6, τi will be reloaded after at most m TDMA slots (2 in our case). In the worst case, as shown in Figure 3.9, there can be a middle sub interval, SubIntervalu, in which another task τu is loaded into the other partition and executed. The last sub interval, SubInterval2, starts with the second execution of τi after the reload, either when the reload operation is complete (the case shown in the figure) or when the execution of τu finishes, whichever happens last, and ends with the (successful) second unload of τi, time 37 in Figure 3.9. Since in the worst case we must account for m TDMA slots for the failed unload and m TDMA slots for the reload of τi, the length of SubIntervalu can be upper bounded as max(Cu, 4 · σ) when m = 2. The lengths of the sub intervals SubInterval1 and SubInterval2 are calculated the same way as in Lemma 3.5, as discussed above. However, as shown in Figure 3.9, SubInterval1 and SubIntervalu overlap by two TDMA slots (time 25 to 27 in the figure). This is because the length of SubInterval1, as in Lemma 3.5, is calculated up to the end of the unload phase, while SubIntervalu starts m = 2 slots before. To overcome this overlap, we subtract the overlapped time, which is 2 · σ when m = 2.

As a result, the length of IntervalF is computed as:

Frec = SubInterval1 + SubIntervalu + SubInterval2 − 2 · σ
     = max(Ci + 5 · σ, 7 · σ) + max(Cu, 4 · σ) + max(Ci + 5 · σ, 7 · σ) − 2 · σ,

which is the same as the value in the hypothesis.

In general, we do not know which case leads to the worst response time for τi: the error occurring before IntervalF (case 1) or within IntervalF (case 2). As a result, we independently calculate both iterations:

Ri^1 = B + Hrec + F, (3.1)

Ri^2 = B + H + Frec, (3.2)

and take the maximum response time among Ri^1 and Ri^2. In particular, note that it is easy to see that Frec − F is larger than the size of the additional interval added to H for case 1. This is the main reason why we chose to reschedule failed tasks at the highest priority: it minimizes the worst case length of IntervalF in case the task under analysis fails. On the other hand, rescheduling the task at the highest priority means that we have to consider any task, rather than just higher priority tasks, for the extra interval in case 1; but, as discussed, this is generally not the worst case. However, note that we cannot formally avoid computing the iteration in Equation 3.1, because the interfering window for higher priority jobs is based on Ri^1 − F = B + Hrec, which is larger than Ri^2 − Frec = B + H.
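A small sketch of this final step, assuming B, H and Hrec have already been obtained through the response-time iterations over their respective windows (the function and parameter names are illustrative):

#include <math.h>

/* Combine Equations 3.1 and 3.2 and keep the larger response time. */
static double ft_response_time(double B, double H, double Hrec,
                               double Ci, double Cu, double sigma)
{
    double F    = fmax(Ci + 5.0 * sigma, 7.0 * sigma);           /* Lemma 3.5 */
    double Frec = 2.0 * fmax(Ci + 5.0 * sigma, 7.0 * sigma)      /* Lemma 3.9 */
                + fmax(Cu, 4.0 * sigma) - 2.0 * sigma;

    double R1 = B + Hrec + F;     /* error prior to IntervalF, Eq. (3.1) */
    double R2 = B + H + Frec;     /* error within IntervalF,  Eq. (3.2) */
    return fmax(R1, R2);
}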

3.5 Summary

In this chapter, we have discussed our strategy for executing real-time tasks according to the 3-phase model. We assume a partitioned system where tasks are statically pinned to

cores. Each core is augmented with a dual-ported SPM divided in two partitions: this allows us to execute the current task out of one partition, while the DMA engine unloads the previous task and loads the next one in the other partition. As long as the length of the DMA operations is less than the execution time of the task, the memory transfer time is completely hidden. Furthermore, since the task accesses local memory only, its execution time is not affected by contention for shared memory resources.

Isolation among cores is guaranteed through DMA scheduling. In particular, we consider two different scenarios: in the first case, the scheduling happens at the hardware level, so that the DMA operations initiated by each core are guaranteed to receive a fixed portion of the memory bandwidth. In the second case, we explicitly enforce a TDMA arbitration between DMA operations of different cores by programming the DMA engine in software. We formally described a set of rules to co-schedule core execution and DMA transfers in both cases, and developed corresponding sufficient schedulability analyses for sporadic task sets based on fixed priorities. To avoid wasting time loading tasks that are only partially executed, we consider non-preemptive task scheduling. However, we also extend the analysis to a limited-preemption model where each task is divided into a set of sequential intervals, and preemption is possible between intervals. Finally, we demonstrated how our strategy can improve fault-tolerance by restarting faulty tasks from the copy stored in main memory.

In the next chapter, we will show how the discussed techniques can be practically implemented based on the case study of two different embedded platforms. We also provide an evaluation of the presented scheduling schemes against previous work based on measured benchmarks running on the platforms.

Chapter 4

System Implementation

In Chapter 3, we introduced a set of scheduling mechanisms and associated schedulability analyses for 3-phase tasks in a partitioned real-time multi-core system. To prove the applicability of the discussed techniques, in this chapter we discuss the system realization of the proposed execution and scheduling methodologies on two different embedded platforms. In particular, in Section 4.1, we describe the system realization on a Xilinx FPGA platform using soft-core processors; since we have complete control over the hardware, this implementation relies on the variable-size DMA operation scheme detailed in Section 3.2. Then, in Section 4.2, we describe the system realization on a commercial embedded platform, the Freescale MPC5777M micro-controller unit (MCU) [12]. Since this platform supports neither per-core DMA engines nor a predictable memory arbiter, we rely on the fixed-size DMA operation scheme based on a TDMA schedule of the DMA engine, as discussed in Section 3.3. The COTS platform also satisfies the requirements for the implementation of the fault-tolerant scheme described in Section 3.4. We evaluate the proposed systems with a set of both synthetic and embedded automotive benchmarks from the EEMBC suite [134].

4.1 Implementation on an FPGA Platform

Since we consider a partitioned task system where each core gets a specific set of tasks, in this implementation we discuss the details of a single-core system. We assume that the system can be scaled to multiple cores by duplicating the architecture. Note that when more than one core is realized, the memory bandwidth must be controlled and divided between the

cores in a predictable manner [96]. Each core runs a full software stack (including the OS) and is fully isolated from other cores.

4.1.1 Hardware Architecture

To recap our assumptions about the hardware model, we consider a system where each processor is augmented with a dual-ported local memory (SPM) and a DMA component. We further assume that the CPU and the DMA can access different portions of the local memory concurrently without causing mutual interference. We utilized the MicroBlaze [10] soft-core processor in our implementation. We disabled the use of caches and instead used on-chip block RAM to implement the SPMs. Since MicroBlaze has separate instruction and data interfaces, we implemented two local SPMs, I-SPM and D-SPM. The size of the SPM is limited by the available on-chip RAM. In our implementation, each SPM was 64 KB; this matches the maximum cache size that MicroBlaze can be configured with using its configuration tool. Later, in Section 4.1.4, we compare the hardware resource utilization of our proposed SPM hardware architecture against a similar implementation using caches instead of SPMs.

As shown in Figure 4.1, the CPU accesses memory through the Real-time Scratchpad Memory Units (RSMUs). The RSMUs are connected to the CPU using the Local Memory Bus, which provides single-cycle access latency. The RSMUs direct each memory access either to the local SPM or to the system bus based on the virtual address provided by the CPU. In addition, the RSMU provides internal memory-mapped control registers to allow for software management. Specifically, the RSMU provides address translation for the tasks loaded in the SPM. Thanks to our design methodology, which relies on knowledge from the scheduling level, we were able to simplify the address translation mechanism: the RSMU translates addresses in constant time without accessing any lookup tables as in a conventional MMU. For this reason, unlike other SPM management units [174, 175, 176], the RSMU hardware is scalable without any dependence on the number of translated tasks or memory blocks. The access latency to the local SPM through the RSMU is one cycle.

4.1.2 Address Translation

As depicted in Figure 4.2-(a), each RSMU exposes a range of memory addresses. If the CPU generates an address outside the RSMU address range, the RSMU forwards the access to the system bus. This allows the CPU to access main memory and other memory-mapped devices, such as I/O peripherals.

Figure 4.1: System-Level Block Diagram (each of the I-SPM and D-SPM holds two general-purpose partitions plus an OS + shared-library partition)

On the other hand, if the CPU-generated address corresponds to the RSMU address range, it falls into two categories: (1) the RSMU physical space and (2) the RSMU virtual space. The physical space corresponds to the locally connected SPM, as shown in Figure 4.1. In other words, the RSMU's physical address range is directly mapped to the local SPM. For example, the address range from 0x05000000 to 0x05010000 (64 KB) is directly mapped to the SPM address range from 0x00000 to 0x10000 (64 KB). This address range is suitable for permanently resident software that is loaded at boot time, such as important OS functions and shared libraries. In contrast, accessing the RSMU virtual address space requires translation to the physical address space. This address space is suitable for real-time tasks that need to be loaded into the SPM at runtime. Basically, real-time tasks are compiled and then linked to non-overlapping address ranges within the RSMU virtual address space, as shown in Figure 4.2-(b). At boot time, the tasks are loaded into main memory based on their load addresses as defined in the linkage stage (Figure 4.2-(c)). Finally, when a task is loaded into the SPM, at the context switch, the RSMU is configured to perform address translation for the executing task. Based on how the RSMU is configured at runtime, the RSMU performs address translation according to the following simple function:

SPMphysical = (CPUvirtual > TER) ? CPUvirtual − ATOR : CPUvirtual

where TER is the translation-edge register, e.g., 0x05010000 in our case, and ATOR is the active task offset register. For example, if the value stored in the ATOR is 0x06020000 and the value stored in the TER is 0x06010000, then a CPU-generated address such as 0x060200AA will be translated into 0x000AA. The value of the ATOR register is determined by the scheduler based on where in the SPM the task is loaded. Note that, given the fact that only one task is executing out of the SPMs at any given time, the RSMU design is significantly simplified to translate only one task at a time. In this implementation, the RSMU address space covers a 1 MB range, which is sufficient to link all critical tasks in our evaluation to the virtual space of the RSMU. Note that the RSMU only needs 20-bit comparators to perform the translation from its virtual address space to the physical address space, which leads to faster hardware.
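In software terms, the translation performed by the RSMU corresponds to the following function; this is only a C rendition of the hardware logic for illustration, using the example register values from the text.

#include <stdint.h>

/* Software rendition of the RSMU translation. 'ter' and 'ator' mirror the
 * translation-edge and active-task-offset registers. Addresses above TER
 * (RSMU virtual space) are relocated by the offset of the currently loaded
 * task; addresses at or below TER are forwarded unchanged. */
static uint32_t rsmu_translate(uint32_t cpu_virtual, uint32_t ter, uint32_t ator)
{
    return (cpu_virtual > ter) ? (cpu_virtual - ator) : cpu_virtual;
}

/* Example from the text: with ATOR = 0x06020000 and TER = 0x06010000,
 * rsmu_translate(0x060200AA, 0x06010000, 0x06020000) yields 0x000AA. */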

Figure 4.2: RSMU Memory Translation: (a) Address Map, (b) Linkage, (c) System Boot, (d) Run-time execution

4.1.3 Software Implementation

To support the proposed scheduling and execution models, we extended FreeRTOS to manage the SPM space, DMA transfers, and RSMU operations. All software components, including the OS, are compiled together to produce a single system binary image. The SPM is partitioned in software into three partitions. As shown in Figure 4.1, the SPM has two general-purpose partitions, used mainly to load tasks into, while the third partition holds OS-level critical functions, such as the task scheduler, and a small system heap.

To simplify task loading and unloading to and from the SPM, we developed a linker script to combine all of a task's code sections in one location (adjacent memory blocks). Similarly, we combine all of a task's data sections in one location. These two memory blocks make up what we call the task image. The space required for the execution stack of the task is also included in the image, hence it gets loaded and unloaded with the task image. Note that the software system is compiler independent, and only needs some inputs via the linker script to set up task images and their load addresses in main memory. At boot time, the boot loader loads each task to its load address in main memory. Note that, since task virtual addresses are linked to the RSMU virtual space, tasks must be loaded into the SPM before they can run.

Since we mainly consider fixed-priority non-preemptive scheduling, the scheduler is invoked either by the DMA finish interrupt, or when a task yields the execution (finishes). To support task preemption and limited-preemption schemes, the scheduler can also be invoked by other device interrupts, such as I/O peripherals and the system timer. At every context switch, the scheduler can make two scheduling decisions. First, the scheduler programs the DMA to load the top-priority pending task into one SPM partition; if the partition still contains a previously loaded task that finished but has not been unloaded yet, the scheduler programs the DMA to unload the previous task before loading the next one. Second, the scheduler configures the RSMU translation registers for the task already loaded in the other partition, if any. Lastly, the scheduler context switches to the scheduled task, if any. Note that the scheduler does not block waiting for the DMA to finish, as the DMA operations run in parallel and are overlapped with the CPU execution of the newly scheduled task. Therefore, only the DMA setup time is incurred during the context switch routine. The overhead of the OS context switch routine is accounted for in the schedulability analysis.

The proposed system is transparent to applications, just like a cache, as the management of the hardware resources (SPM space, DMA, and RSMU) and the task execution mechanism is all done at the OS level. This ensures that porting applications to our system is straightforward; software does not need code annotations, special machine instructions or special compilers to take advantage of the described management technique. The main trade-off is that our scheme requires the task image to be fully contained in one SPM partition. However, as discussed earlier in Section 3.1, known methodologies exist [128, 145, 94] to split a large application into smaller segments that are individually compliant with the imposed constraint.
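As a rough sketch of the context-switch path described above (the structures, hooks and names are assumptions of this sketch, not the actual FreeRTOS extension):

#include <stdbool.h>
#include <stddef.h>

/* Illustrative descriptors; the real kernel structures differ. */
struct spm_partition { int  resident_task;        /* -1 if none             */
                       bool resident_finished; }; /* finished, not unloaded */
struct tcb           { int  id; unsigned ator; }; /* ATOR value of the task */

/* Hypothetical driver hooks. */
static void dma_unload(int task, int part)  { (void)task; (void)part; }
static void dma_load(int task, int part)    { (void)task; (void)part; }
static void rsmu_set_ator(unsigned ator)    { (void)ator; }
static void dispatch(struct tcb *t)         { (void)t; }

/* Scheduler actions at a context switch: (1) program the DMA for the
 * top-priority pending task, unloading the previous task first if the
 * partition still holds one; (2) configure the RSMU for the task already
 * loaded in the other partition; (3) switch to it. The DMA then runs in
 * parallel with the dispatched task, so only the DMA setup cost is paid
 * here. */
static void on_context_switch(struct spm_partition *target, int target_idx,
                              struct tcb *next_pending, struct tcb *loaded)
{
    if (next_pending != NULL) {
        if (target->resident_finished)
            dma_unload(target->resident_task, target_idx);
        dma_load(next_pending->id, target_idx);
    }
    if (loaded != NULL) {
        rsmu_set_ator(loaded->ator);
        dispatch(loaded);
    }
}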

4.1.4 Evaluation

The evaluation is divided into three parts: hardware architecture, performance, and system schedulability. In addition, in this evaluation we adopt the scheduling scheme discussed in Section 3.2, in which the DMA time is dynamic, based on the task sizes.

Hardware Architecture

To evaluate the hardware, we compare the proposed hardware architecture, shown in Figure 4.1, to the baseline hardware architecture, shown in Figure 4.3. The realization of both systems has the same amount of software-usable on-chip memory, 128 KB: 64 KB for instructions and 64 KB for data. The hardware prototyping is done on the Xilinx ZC706 FPGA evaluation board [6]. The CPU is a MicroBlaze 32-bit soft-core processor [10]. In addition, the platform has an external 1-GB DDR3-SDRAM, which serves as main memory. The FPGA chip on the ZC706 board has much more on-chip memory than 128 KB. However, the configurable caches (I-cache and D-cache) on the MicroBlaze are limited to a maximum of 64 KB each. Therefore, for the sake of a fair comparison, we have limited the SPMs to the same sizes.


Figure 4.3: The baseline hardware architecture with cache memory

Tables 4.1 and 4.2 show that the SPM-only architecture is more resource efficient and can run at a higher operating frequency than the cache-only architecture. The saving in hardware resources (area) can be re-utilized; for example, it can be used to provide more on-chip memory, which can lead to higher system performance, or to implement extra functionality. To depict the effect of design complexity on the operating frequency as purely as possible, we conducted the hardware synthesis on a minimal design that has a MicroBlaze CPU and the on-chip memory (caches or SPMs). This way, we can closely capture the complexity of the implemented direct-mapped, 8-words-per-line cache memory of the MicroBlaze without interference from other components that could reduce the frequency, either due to their complex designs or due to suboptimal place and route.

Table 4.1: Maximum operating frequency comparison

Hardware Resource              Cache      SPM        Gain Ratio
Maximum operating frequency    200.3 MHz  243.7 MHz  1.22 X

Table 4.2: Hardware resource comparison: in this specific implementation, each block RAM is 36 Kilobits (Kb).

Hardware Resource       Cache  SPM   Saving  Ratio
flip-flops (FF)         5478   3071  2407    1.78
lookup tables (LUT)     5729   3436  2293    1.67
memory LUT (MLUT)       410    169   241     2.43
block RAM (BRAM)        37     33    4       1.12

The performance of the DMA is entirely predictable, as the DMA does not interfere with the CPU when accessing main memory. The DMA timing is evaluated by first measuring the amount of time required to transfer a fixed amount of data from main memory to scratchpad and vice-versa. After that, the timing for a transfer of any size is estimated based on linear regression. The estimated timing for a transfer of any size is:

DMA(b) = 215 + 0.303 · b,

where DMA(b) is in cycles and b is in bytes. The above equation does not include the DMA setup overhead required to prepare the DMA transfer. The DMA setup overhead, along with other OS overheads, is shown in Table 4.3.
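Expressed in code (the coefficients are the measured ones above; the function name is an assumption):

/* Estimated DMA transfer time in cycles for a transfer of 'bytes' bytes,
 * from the linear regression above. The DMA setup overhead of Table 4.3
 * must be added separately. */
static double dma_cycles(double bytes)
{
    return 215.0 + 0.303 * bytes;
}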

Table 4.3: Software Overheads

Overhead                 Symbol     Cycles
Context Switch           Cs         450
DMA Set-up Overhead      DMAsetup   53
Timer Overhead           ts         226

Software Performance

About 1050 lines of code were added to the FreeRTOS kernel to support our scratchpad management scheme. Overall, the compiled size of the OS is small (≈ 18.5 KB) and, depending on the features and libraries required by the real-time tasks' scheduling and resource management, only a tiny part of the OS is required to be stored in the SPM, starting from 4.5 KB for code and 324 B for data.

A set of both synthetic and real benchmarks is executed on the platform in order to obtain data for the performance and schedulability analysis. A set of benchmarks from the well-known automotive EEMBC benchmark suite [134] is selected to represent real applications used in the automotive real-time domain. The benchmarks are configured to consume all input data based on the number of iterations; thus, the execution time is related to the size of the input data. Benchmarks larger than the size of the local SPM partition are broken up into smaller chunks, e.g., we have configured each benchmark so that it does not consume more than half of the SPM size (32 KB). In addition, benchmarks are configured not to run for a very long time, to resemble real-life embedded real-time applications; we limited the benchmarks to run for at most 1 ms. Table 4.4 reports, for each benchmark, the size of code and data, the execution time out of the local memory (cold cache, hot cache and SPM), the execution time ratio between the SPM case and the cold cache case, the time taken to load code and data into the SPM using the DMA, and the time taken to unload only data from the SPM. All times are reported in cycles (cyc) and the sizes are reported in bytes (B). Note that in this evaluation we considered variable DMA load and unload times; therefore, we report these times for each application in the table.

We evaluate the performance of the reported applications (Table 4.4) on both systems. The evaluation is based on simulations and uses measurements extracted from the running hardware, as reported in Table 4.4. In the case of the cache system, building an accurate analytical model of the complex cache is difficult. Therefore, a very abstract and optimistic cache model is considered. Although the hardware reported in Table 4.1 and Table 4.2 is for a simple direct-mapped cache, a fully-associative cache is modeled with an LRU replacement policy.

Table 4.4: Benchmarks Results

Benchmark   Code size(B)  Data size(B)  Cold(cyc)  Hot(cyc)  SPM(cyc)  Difference(cyc)  Ratio(%)  load(cyc)  unload(cyc)
a2time      3640          5340          100811     97672     97276     3535             3.51      3152       1834
aifftr      5040          4956          108885     107003    106405    2480             2.28      3460       1717
aifirf      3204          3024          110192     109434    109180    1012             0.92      2318       1132
aiifft      4592          4940          96596      94102     94065     2531             2.62      3319       1712
basefp      2636          2320          102822     101275    101099    1723             1.68      1932       918
bitmnp      19716         2968          95917      92219     91888     4029             4.20      7408       1219
cacheb      4236          6524          99100      97719     97644     1456             1.47      3691       2192
canrdr      3080          9892          113842     111692    104833    9009             7.91      4362       3213
idctrn      10908         3402          123548     120881    106959    16589            13.43     4767       1246
iirflt      6648          2144          106889     103545    97278     9611             8.99      3095       865
pntrch      3256          6356          87147      86067     80781     6366             7.30      3343       2141
puwmod      3896          1936          122314     120160    103177    19137            15.65     2198       802
rspeed      1956          5732          99406      98678     94326     5080             5.11      2760       1952
tblook      4228          1672          102847     97984     97645     5202             5.06      2251       754
ttsprk      10764         5704          101760     99780     97272     4488             4.41      5421       1944
synthetic   128           63496         73290      7421      7409      65881            89.87     19709      19455

The degraded search performance usually associated with a fully-associative cache is not considered here, and the same access latency as the SPM is assumed, which is one clock cycle. Despite the access latency of the cache, the complicated circuitry of the fully-associative cache can negatively affect the operating frequency of the system, as it is part of the critical data path; this is not considered in the model either. This very optimistic cache model is meant to favor the cache system, so that the comparison results are more credible toward the SPM system. It is highly expected that in real hardware the cache system would do worse than in this evaluation.

The total execution time of an arbitrary fixed schedule on both systems is captured. This depicts the difference in software execution performance on both systems, regardless of the fact that the SPM system provides predictable execution. The experiment was to execute the applications in Table 4.4 sequentially (one after another) and report the total execution time for each application. The experiment is then repeated a few million times, with a different application order each time; the order is picked randomly with no repetition. Finally, to favor the cache system, we report the average execution time of each application instead of the worst case execution time. In the case of the SPM system, the execution time of an application is always fixed (predictable). Figure 4.4 compares the two systems in terms of application execution times. Note that our execution model exploits the overlap between CPU execution and DMA loading to hide the access latency to main memory.

From Table 4.1, we notice that the SPM system can operate at a higher frequency. Therefore, Figure 4.4 compares three systems: the cache system, the SPM system running at the same frequency as the cache system, and the SPM2 system running at 22% higher frequency.

This figure depicts that the SPM system, with all the associated overheads (Table 4.3), can be a strong candidate to replace the traditional hardware-managed cache. Figure 4.5 shows the application speedup when running on the SPM system instead of the cache system. Among the reported benchmarks, idctrn and puwmod run about 1.5 times faster on the SPM system than the average case on the cache system, which reflects their memory-intensive behavior.

Figure 4.4: Execution time comparison between Cache and SPM: SPM1 runs at the same frequency as cache, SPM2 runs at 22% higher frequency

Figure 4.5: Application execution speedup: SPM1 runs at the same frequency as cache, SPM2 runs at 22% higher frequency

Schedulability

The introduced fixed-priority scheduling scheme of Section 3.2 is evaluated against other scheduling schemes, either based on caches, such as PREM [178], or based on SPMs, such as Carousel [176]. As discussed in Section 2.1, PREM uses the CPU to prefetch the task into the local cache, while Carousel stalls the CPU until the DMA loading is finished. While both approaches effectively waste CPU time while loading a task, we pipeline the CPU execution and the DMA transfer to hide the memory access latency.

Applications in Table 4.4 are used to generate sets of random tasks. Given a system utilization, each application is randomly selected and assigned a random period in the range between 5 ms and 100 ms. The task's utilization is then computed based on the application's execution time out of the SPM and the selected period. At every iteration a new task is randomly generated. The generation stops when the sum of the individual tasks' utilizations reaches the required system utilization. After that, the overheads are added, such as context switch and DMA setup.

In the case of multi-interval tasks, similarly to the single-interval case, each application is randomly selected but assigned a random period in the range between 15 ms and 300 ms. Then, each generated task is randomly assigned a number of intervals between one and five. A task's intervals are equally sized; the interval size is the size reported in Table 4.4. We increased the random periods proportionally to the average increase in the size of a task, which is 3X, to maintain the same average number of tasks and per-task utilization across all cases. Carousel and PREM schedulability is verified by applying response-time

analysis as described in [176] and [178], respectively; the analysis incorporates the context-switch overhead as well as the blocking time due to non-preemptive DMA operations.
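A sketch of the single-interval task-set generation described above (benchmark data and period range as stated; the helper names, the use of rand() and the cycles-to-milliseconds conversion are assumptions of this sketch):

#include <stdlib.h>

/* Benchmark profile, execution time in cycles as in Table 4.4. */
struct bench    { double exec_cycles; };
struct gen_task { const struct bench *b; double period_ms, util; };

static double rand_uniform(double lo, double hi)
{
    return lo + (hi - lo) * ((double)rand() / RAND_MAX);
}

/* Keep adding randomly chosen benchmarks with random periods in [5,100] ms
 * until the target utilization is reached; overheads (context switch, DMA
 * setup) are added afterwards, as described in the text. Returns the number
 * of generated tasks. */
static int generate_taskset(const struct bench *pool, int pool_len,
                            double target_util, double cpu_mhz,
                            struct gen_task *out, int max_tasks)
{
    double util = 0.0;
    int n = 0;
    while (util < target_util && n < max_tasks) {
        const struct bench *b = &pool[rand() % pool_len];
        double period_ms = rand_uniform(5.0, 100.0);
        double exec_ms   = b->exec_cycles / (cpu_mhz * 1000.0); /* cycles -> ms */
        out[n].b = b;
        out[n].period_ms = period_ms;
        out[n].util = exec_ms / period_ms;
        util += out[n].util;
        n++;
    }
    return n;
}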


Figure 4.6: Single-core schedulability comparison between carousel and our approach

Figure 4.6 shows the single-core case. The comparison, in this case, is only valid between Carousel and our approach (RSMU), as PREM targets multi-core systems. The figure shows the results in terms of the proportion of schedulable task sets for Carousel and our proposed approach. Each point in the graph, as in all other graphs presented in the following figures, represents 200 task sets. As shown in the figure, our approach is able to schedule a significantly higher number of task sets compared to Carousel. We believe that such a result shows that hiding the latency of memory accesses can have a very significant impact on schedulability, even for systems such as the one in Table 4.4, where the time required to load data from main memory can be relatively small compared to the execution time. The price we pay for such improvement is additional scratchpad memory: under Carousel, only the currently executing task must be loaded in the SPM, while in our approach an additional partition must be reserved in the SPM to pre-load the next scheduled task.

Figure 4.7 compares Carousel and our system for the multi-interval case. The graph shows similar results to the single-interval case, where our approach is still able to schedule more task sets compared to Carousel, as expected.

For the partitioned multi-core systems, we simulate the number of cores by slowing the memory down by a factor of 4X for a four-core system and by 8X for an eight-core system. This is applied to both Carousel and our approach. The following four figures compare the three systems: Carousel, PREM and our system (RSMU). Figures 4.8 and 4.9 compare the three systems, in the case of four cores, for both single-interval and multi-interval systems, respectively.


Figure 4.7: Multi-interval single-core schedulability comparison between Carousel and our approach

Similarly, Figures 4.10 and 4.11 compare the three systems, in the case of eight cores, for both single-interval and multi-interval systems, respectively. The obvious observation is that, as the number of cores increases, the schedulability of the three systems decreases due to the longer time needed to load tasks into the local memory. However, RSMU is negligibly affected by the increase in the number of cores, because RSMU successfully hides the latency of the loading process of a task by overlapping it with the execution of another task. In the single-core case, the tasks' execution times dominate the DMA times, since the selected applications reported in Table 4.4 have small load and unload times compared to their execution times; therefore, schedulability is not really affected by the memory performance. However, in the multi-core cases the memory performance is degraded and the load and unload times take longer to finish.

In terms of the single-interval versus multi-interval comparison, Carousel and PREM show similar results in the multi-interval case to the single-interval case. On the other hand, RSMU schedulability is slightly affected in the multi-interval case compared to the single-interval case. This is mainly because of the increased number of intervals where a long lower priority task can interfere with the task under analysis. However, RSMU is still able to schedule a significantly higher number of task sets compared to Carousel and PREM.

Finally, in Figure 4.12 we show the effect of memory performance on the schedulability of each system. The figure shows that RSMU is more resistant against memory performance degradation than the other two systems. The graph is plotted at 70% utilization in order to show the three systems in the same graph from 100% to 0% schedulability. Based on this graph, the experimented task sets with 70% utilization can be scheduled in our system with up to 20 cores sharing the bandwidth of the main memory, seven cores in the PREM case,

and only two cores in the Carousel case.


Figure 4.8: Single-interval 4-cores schedulability comparison between the three systems


Figure 4.9: Multi-interval 4-cores schedulability comparison between the three systems

Figure 4.10: Single-interval 8-cores schedulability comparison between the three systems


Figure 4.11: Multi-interval 8-cores schedulability comparison between the three systems

Figure 4.12: Each system's tolerance to the degradation in memory speed @ 70% utilization with multi-interval tasks

Figure 4.13: MPC5777M Block Diagram.

4.2 Implementation on a COTS Platform

In this section, we present a full system implementation of the proposed predictable execution and scheduling methodologies on a COTS embedded multi-core platform, the Freescale MPC5777M MCU. OS support was implemented using Evidence Erika Enterprise (http://erika.tuxfamily.org/drupal/). Erika Enterprise is an open-source RTOS that is compliant with the AUTOSAR (Automotive Open System Architecture, http://www.autosar.org/) standard. AUTOSAR is an open standard for automotive architectures providing a basic infrastructure for vehicular software. Erika Enterprise features a small memory footprint, supports multi-core platforms and implements common scheduling policies for periodic tasks. Our implementation enhances predictability and simplifies the OS design by exploiting hardware features that vendors are now including in some modern families of multi-core platforms designed for the embedded market, such as: separate I/O and memory buses, the presence of dual-port memories with DMA support, and core specialization. We discuss the platform features in Section 4.2.1, followed by the OS design in Section 4.2.2 and the evaluation in Section 4.2.3.

4.2.1 Platform Description

A brief summary of the architectural features of the MPC5777M MCU is provided in Table 4.5, while a block diagram is reported in Figure 4.13.


Table 4.5: Characteristics of Freescale MPC5777M SoC

Chip Name             MPC5777M (Matterhorn)
Manufacturer          Freescale
Architecture          Power-PC, 32-bit
CPU Unit              2x E200-Z710 + 1x E200-Z709 + 1x E200-Z425 (I/O)
CPU Frequency         Application Cores (300 MHz), I/O Core (200 MHz)
Processing Unit       CPUs, DMA, Interrupt Controller, NIC
Operational Modes     Parallel + Lockstep (on one applicative core)
ECC Protection        Cache, RAM, Flash Storage
Cache Hierarchy       L1 (Private Instructions + Data) + Local Memory (SPMs)
Local Memory (SPMs)   Instructions (16 KB) + Data (64 KB)
L1 Cache Size         Instructions (16 KB) + Data (4 KB)
SPM Size              80 KB
SRAM Size             404 KB
Flash Size            8 MB
Main Peripherals      Ethernet, FlexRay, CAN, I2C, SIUL
MEMU                  For SRAM, Peripheral RAM and Flash

The SoC platform includes two application cores, which we use to execute 3-phase real-time tasks, and an I/O core, which we use to run the OS and all device drivers. Each core has a dedicated scratchpad memory, whose size (80 KB) is smaller than, but comparable to, the SRAM main memory (404 KB). The platform includes a separate I/O bus, which allows the designer to route I/O traffic without directly interfering with CPU-originated memory requests. The idea of co-scheduling CPU activity and I/O traffic is not new, and specific solutions have been proposed in [41, 129]. However, traffic transmitted over the dedicated I/O bus needs to be handled, pre-processed and scheduled before reaching the application cores; the I/O core performs such operations. Just like the application cores, the I/O core features a scratchpad memory that is used to buffer I/O data before they are delivered to applications. Typically, devices that support high-bandwidth operations are DMA-capable. Instead, slower devices expose memory-mapped input/output buffers that can be read/written using generic platform DMA engines. Without loss of generality, we assume I/O data transfers from/to the I/O core are performed by DMA engines and that data from I/O devices can be transferred directly into the I/O core's scratchpad memory. In other words, I/O devices are not allowed to initiate asynchronous transfers directly towards main memory.

Figure 4.14: Block diagram of error handling circuitry.

This design choice allows us to perform co-scheduling of CPU and I/O activities to achieve higher system predictability. Note that no MMU is available on this platform. Hence, there is no support for virtual memory. Instead, we use compiler techniques to generate position-independent code that can be loaded in either partition of an application scratchpad, as we discuss in Section 4.2.2. In order to test and verify the safety features of the SoC, the chip also implements error injection mechanisms that are helpful to verify the reaction to various faults. We use these fault injection mechanisms to evaluate our system. In the considered MPC5777M platform, there is a memory error management unit (MEMU) that is responsible for detecting and collecting the faults in different memory subsystems such as SRAM, SPM and Flash. In particular, the platform implements Hsiao codes that provide single bit error correction and double bit error detection (SEC-DED). The Hsiao code [73] for correction and detection is popular in modern embedded platforms. The MEMU implements separate tables for reporting correctable and uncorrectable errors. There are separate tables for each kind of memory. These tables contain the address of the fault that caused the error; moreover, a register inside the MEMU indicates whether the fault that occurred is correctable or uncorrectable. On the MPC5777M, there is no way for the MEMU to send an interrupt to the CPU in case of a fault. There is a separate FCCU module present on the chip that collects all the errors that are forwarded to it from the MEMU. The FCCU module can be preprogrammed to take certain actions based on a particular error. Moreover, it is also responsible for generating an interrupt to the processor to notify it of any errors reported to it by the

MEMU. Figure 4.14 shows how the different modules are connected to each other. In order to detect faults in the SPM, we registered an FCCU interrupt with the application cores. This interrupt is generated and sent to all cores when an error is reported by the MEMU to the FCCU in one of the memory subsystems. As discussed in Section 3.4, the proposed recovery mechanisms are effective as long as no more than one bit error occurs in any of the memory subsystems (SRAM, SPM, Flash) every two periods of any task. It is estimated that the FIT of SRAM memories is about 0.001, i.e., on average one bit upset is observed every 10^11 hours of operation [151]. Flash memory, on the other hand, shows low SER susceptibility [146]. Since the period of a real-time task is typically tens or hundreds of milliseconds long, we deem this assumption to be satisfied in the vast majority of embedded systems. Error correction techniques to recover from faults that affect OS memory require special attention and are not within the scope of this work. A promising approach in this context is system check-pointing, i.e., periodically saving the system state of the OS and copying it into a more reliable piece of memory. Further research is required to seamlessly integrate check-points into our SPM management scheme.

4.2.2 OS Design

We ported Erika Enterprise to the MPC5777M MCU, adding support for the UART communication interface, the interrupt controller, caches, the memory protection unit (MPU), DMA engines, and the Ethernet controller. The following subsections discuss the design of individual OS components, including task configuration, task and DMA scheduling, I/O management, and error recovery.

Task Configuration

In order to implement our SPM-centric OS, we have augmented Erika Enterprise to support position-independent (relocatable) tasks. We rely on the compiler3 support for far-data and far-code addressing modes. In this way, tasks are compiled to perform program-counter-relative jumps and indirect data addressing with respect to an OS-managed base register. We have extended the default task loader to exploit DMAs for transferring task images from SRAM to local memories and vice-versa.

3 Applications and OS are compiled using the WindRiver Diab Compiler version 5.9.4 - http://www.windriver.com/products/development-tools/

In Erika Enterprise, tasks are compiled and linked directly inside the image of the OS. For each task in the system, Erika-specific meta-data need to be defined. Additionally, meta-data that extend the task descriptors for SPM-centric operations are required. Manually configuring these parameters is tedious and error-prone; hence, we developed an OS configurator. The tool uses high-level task definitions and generates the final configuration for our SPM-centric OS. Specifically, each core is associated with a set of configuration files that describe: the number of tasks, their priorities, task entry points, initial status and so on. When a task is added, these files need to be configured accordingly. First, the body of all the tasks is placed in an ad-hoc file. Similarly, task-specific data that need to be preserved across activations are defined in different files and surrounded with the appropriate compiler-specific pragmas. This is fundamental to ensure that: (A) a specific linker section is used to store task code and data images; and (B) position-independent data and instructions are generated. A separate file also defines the relocatable task table, which stores the status of each relocatable task. This structure includes: (A) the position in SRAM of the task code and data images; (B) the position of the task's I/O data buffers; (C) the current status of the task (e.g. loaded, completed, unloaded); (D) the SPM partition of the last relocation.
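To make this metadata concrete, the following is a minimal C sketch of what an entry of the relocatable task table could look like; all type and field names are illustrative assumptions and are not taken from the actual Erika Enterprise sources.

```c
#include <stdint.h>

/* Illustrative sketch only: field and type names are hypothetical,
 * not the actual Erika Enterprise data structures. */
typedef enum { TASK_UNLOADED, TASK_LOADED, TASK_COMPLETED } task_status_t;

typedef struct {
    /* (A) position in SRAM of the task code and data images */
    void          *sram_code_image;
    void          *sram_data_image;
    /* (B) position of the task's I/O data buffers */
    void          *io_input_buffer;
    void          *io_output_buffer;
    /* (C) current status of the task */
    task_status_t  status;
    /* (D) SPM partition used for the last relocation (0 or 1) */
    uint8_t        last_spm_partition;
} relocatable_task_entry_t;
```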

Task and DMA scheduling

The central idea of the proposed OS design is resource specialization. As previously mentioned, a specialized I/O core and I/O bus are used to handle peripheral traffic. Similarly, a specific role is assigned to different memory resources in the system. Specifically, three types of memory resources exist in our system, as depicted in Figure 4.13. First, flash memories are used to persistently store application/OS code, read-only data, as well as initialization values of read-write portions of main memory. Next, the SRAM (main) memory contains writable application and system data that represent the time-variant state of the system. Finally, scratchpad memories temporarily store a copy of code and data images for those tasks that are currently being scheduled for execution. In our solution, applications are never executed directly from main memory, thus we adopt the following strategy: (1) task images are permanently stored in flash and loaded into main memory at system boot; (2) a dedicated DMA engine is used to move task images to/from SPM upon task activation; (3) a secondary DMA engine is used to perform I/O data transfers between devices and I/O core; (4) tasks always execute from SPM; (5) only task-relevant I/O data are transferred upon task load from the I/O subsystem. In the implemented system, a DMA engine is used to position the image of a relocatable task

inside an SPM for execution. We refer to this DMA engine as the application DMA. Similarly, we refer to the platform DMA used for I/O transfers as the peripheral DMA.

Figure 4.15: Interaction between I/O Core and Core 1 for task scheduling.

The work-flow followed by an applicative core and the I/O core at the boundary of each TDMA slot is depicted in Figure 4.15; this work-flow implements the scheduling rules for a fixed-size DMA operation system described in Section 3.3. Specifically, at each time slot, the I/O core checks the status of the queue of active tasks belonging to the considered core. If a task that is active for execution but not ready (i.e. not relocated in scratchpad) is found, the I/O core checks which SPM partition (P1 or P2) is empty on the application core. If any partition is found to be empty (Slot #1), the I/O core programs the application DMA to load the topmost active task to the empty partition. Once the load is complete,

the I/O core updates the active and ready queues of the considered application core. The latter operation allows the application core to begin the execution of the task (Slot #2). Note that since only one task can be in the running state on the CPU, there is always an SPM partition available for load/unload operations.
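As an illustration of the slot-boundary decision just described, the sketch below shows how the I/O core could pick the next task to load, assuming a two-partition SPM per core; the data structures and names are hypothetical and not the actual OS code.

```c
#include <stddef.h>

/* Hypothetical sketch of the I/O core's TDMA slot-boundary decision;
 * data structures and names are illustrative, not the real OS API. */
#define NUM_PARTITIONS 2

typedef struct task task_t;   /* opaque task descriptor (sketch) */

typedef struct {
    task_t *partition[NUM_PARTITIONS];   /* NULL means the SPM partition is empty   */
    task_t *active_queue_head;           /* highest-priority active, not-ready task */
} core_state_t;

/* Returns the task to load and the target partition, or NULL if no load
 * can be issued in this slot (no active task, or both partitions in use). */
static task_t *select_load(core_state_t *c, int *out_partition)
{
    if (c->active_queue_head == NULL)
        return NULL;
    for (int p = 0; p < NUM_PARTITIONS; p++) {
        if (c->partition[p] == NULL) {
            *out_partition = p;
            return c->active_queue_head;
        }
    }
    return NULL;   /* both partitions occupied: skip this slot */
}
```

On DMA completion, the I/O core would then move the loaded task into the ready queue of the application core and notify it, for example through an inter-processor interrupt.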

I/O Subsystem Design

Together with memory resources, applications typically need to communicate with peripherals and thus require I/O data to operate. We propose an I/O subsystem design that enforces a complete separation between task execution and the asynchronous activity of I/O peripherals: this goal is achieved by offering application tasks a synchronous view of I/O data, distinguishing between the production of data and their dispatch to/from tasks. In fact, we allow I/O data to flow between the I/O subsystem and tasks only at the boundary of load/unload operations. As previously mentioned, a dedicated bus connects the SPM of the I/O core with peripherals. Hence, asynchronous peripheral traffic can reach the I/O subsystem without interfering with task execution. For each device used in the proposed system, the OS defines a statically positioned device buffer on the I/O core scratchpad. A device buffer is further divided into an input device buffer and an output device buffer. The input and output device buffers represent the positions in memory where data produced by devices and tasks (respectively) are accumulated before being dispatched to tasks or devices. In our design, peripheral drivers can operate with an interrupt-driven or polling mechanism. For DMA-capable peripherals supporting interrupt-driven interaction, the driver only needs to specify the address in SPM of the device buffer from/to where data are transferred. The driver is also responsible for updating device-specific buffer pointers to prevent a subsequent data event from overwriting unprocessed data. For interrupt-driven interaction with non-DMA-capable devices, the driver uses the platform peripheral DMA to perform data movement. Similarly, for polling-based interaction with devices, the device driver is periodically activated and the peripheral DMA is used to perform the data transfer. In general, device-originated interrupts as well as timer interrupts for device driver activations are prioritized according to how critical the interaction with the considered device is. Nonetheless, all the device-related events are served with priority levels that are lower than task-scheduling events, such as: (i) TDMA slot timer events and (ii) completion of application DMA loads/unloads. In order to interface with a peripheral, application tasks define subscriptions to I/O flows. A subscription represents an association between a task and a stream of data at

the I/O device. For instance, a given task could subscribe to all the packets arriving at a network interface with a specific source address prefix. Task subscriptions are metadata that are stored within the task descriptor. For each task in the system, a pair of buffers (for input and output respectively) is defined on the SPM of the I/O core to temporarily store data belonging to subscribed streams. Since the content of these buffers will be copied to/from the application cores upon task load/unload, we refer to them as task mirror buffers. Consider the arrival of I/O data from a device. As soon as the interaction with the driver is completed, the arrived data are present in the corresponding device buffer. According to task subscriptions, the OS is responsible for copying the input data to all the mirror buffers of those tasks subscribed to the flow. The advantage of defining mirror buffers lies in the fact that when a task needs to be loaded, all the peripheral data that need to be provided are clustered in a single memory range. Consequently, during the loading phase of a task, the application DMA is programmed to copy the content of the mirror input buffer together with the task code and data images to the application core. The reverse path is followed by task-produced output data during the task unload phase. Since I/O data are delivered to applicative tasks at the boundary of load/unload operations, the approach presented in Section 3.3 for the calculation of tasks' response time can be reused to reason about the end-to-end delay of I/O-related events.
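The following is a minimal sketch of how the I/O core could dispatch newly arrived device data into the mirror buffers of subscribed tasks; the buffer sizes, the bitmask-based subscription encoding and all names are illustrative assumptions rather than the actual implementation.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sketch of dispatching device data to task mirror buffers;
 * structures, sizes and names are hypothetical, not the actual OS layout. */
#define MAX_TASKS 8

typedef struct {
    uint32_t subscribed_flows;     /* bitmask of subscribed I/O flow IDs */
    uint8_t  mirror_in[1024];      /* per-task input mirror buffer       */
    size_t   mirror_in_len;
} task_io_t;

/* Called by the I/O core after a driver has deposited 'len' bytes for
 * flow 'flow_id' into the device input buffer 'dev_buf'.               */
void dispatch_to_mirrors(task_io_t tasks[MAX_TASKS],
                         uint32_t flow_id, const uint8_t *dev_buf, size_t len)
{
    for (int i = 0; i < MAX_TASKS; i++) {
        if (tasks[i].subscribed_flows & (1u << flow_id)) {
            /* Overwrite policy: keep only the most recent data. */
            size_t n = len < sizeof(tasks[i].mirror_in)
                         ? len : sizeof(tasks[i].mirror_in);
            memcpy(tasks[i].mirror_in, dev_buf, n);
            tasks[i].mirror_in_len = n;
        }
    }
}
```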

Error Recovery

The design proposed so far attains the goal of achieving predictability in a multi-core environment. However, in the event of a memory error, it does not provide any recovery countermeasure. In order to augment the OS to recover from a memory error, we leverage the redundancy that is built in by design in our multi-phase task system. In fact, in our system any read-only data of a running task (in SPM) is duplicated in SRAM. On top of that, two copies of the R/W portion of the task are kept inside the SRAM. Although one could also keep two copies of the read-only data of the task inside the SRAM, this is not required because an additional copy of the read-only data is always available in flash. The redundant copies of a task inside the SRAM are used to recover the system from a faulty state and allow the system to correctly handle one error every two periods of the same task in any of the memory modules. First, we describe how the overall system with redundant

task copies works; then we explain how a fault in each of the memory modules (i.e. SRAM, SPM and flash) is handled. During normal operation of the OS, both copies of the R/W data and the read-only copy of the task in the system are working copies. The OS data structures are initialized with proper information about both copies as working copies, and one of them is marked as currently being used. When a task becomes active on an applicative core and the TDMA slot for this core arrives on the I/O core, the I/O core first checks if there is an empty partition available in one of the partitions of the SPM. If an empty partition is available, the I/O core programs the DMA to move the task from the SRAM copy marked as currently being utilized into the SPM. Upon the DMA completion interrupt, the I/O core sends an interrupt to the applicative core for which the load is being performed. If both partitions are found to be full and none of the tasks in these partitions is marked as completed, then the I/O core does nothing during this slot. However, if any of the SPM partitions has a task that is marked as completed, the I/O core programs the DMA to unload the task from the SPM into the SRAM. Once this operation is successful, the I/O core programs another unload DMA operation from the SPM into the SRAM to update the second copy of the task in the SRAM. An overview of the scheduling approach in the case of normal operation is depicted in Figure 4.15. As described in Section 3.4, in case of memory errors, additional operations need to be performed, as shown in Figure 3.8. In this case, the on-chip ECC modules detect memory errors only when a read is performed on a block affected by the bit flip. Upon detection of an error, the ECC modules are programmed to (i) generate an interrupt to the application cores and to (ii) report the memory address where the error occurred. In our system, we consider the occurrence of memory errors in three different kinds of memories: SRAM, SPM and flash. We now describe the proposed error recovery strategies. Fault happens inside the SPM: Based on when a fault can be detected, the fault inside the SPM can be categorized into two types: the first case corresponds to a fault that happens in the read-only or read/write memory of the task during its execution; in the second case, a fault in read/write memory is detected during unload. Whenever a fault is detected, all the applicative cores receive an interrupt from the error detection logic. Upon receiving the interrupt, all the applicative cores execute ISR 4.1 and check if the memory location that caused the error lies within their SPM memory range. Only the applicative core whose SPM was affected by the fault continues executing ISR 4.1. The affected core then checks if the error happened during the execution of the task or during the unload phase. This can be determined based on the SPM partition affected by the fault, since the OS keeps track of the state and location of each task.

ISR 4.1 ISR on Application Cores From Error Detecting Logic
1  MEMU ISR on Applicative Cores()
2    if ErrorAddress within local core SPM Range then
3      if Error during execution then
4        Mark the SPM partition as empty
5        Reschedule task with highest priority
6      else if Error during unload then
7        if Error at first unload then
8          Update descriptor to use second copy of R/W data for next task load
9          Mark SPM partition as empty
10         Reschedule task with highest priority
11       else if Error at second unload then
12         Mark second copy of R/W data as faulty
13         Handle task as successfully completed
14       end
15     end
16   end

In case the error is detected during the execution of the task, the applicative core marks the SPM partition from which the task was executing as empty and reschedules the task with the highest priority. This guarantees that the task is reloaded at the next TDMA slot for this particular core, thus improving the worst-case response time of the task as discussed in Section 3.4.1. This case is captured in ISR 4.1 at lines 3-5. In the second case, the error is detected during the unload phase of the task. As previously mentioned, during an unload operation, only the read/write data are copied from SPM to SRAM twice. The error can occur during the first or the second redundant copy. In the first sub-case (A), the task had correctly terminated its execution phase, and the error is detected before the first copy of task R/W data from SPM to SRAM is completed. In this case, the second copy in SRAM is not updated, so that valid data from the previous task execution are not overwritten with faulty data. Conversely, the first (faulty) copy in SRAM is marked as faulty, so that the backup copy will be used at the next reload. Next, we mark the SPM partition from which the task was unloaded as empty and reschedule the task with the highest priority, as in the previous case. This scenario is handled in ISR 4.1 at lines 7-10. In the second sub-case (B), the error is detected inside the SPM after the first copy was successfully unloaded, and while the redundant copy was being updated. In this case, we mark the second copy in SRAM as faulty and update the OS data structures to use the first copy in the SRAM at the next load. Since the task has successfully completed, no task restart is required. This scenario is captured in the ISR 4.1 algorithm at lines 11-14. Fault happens inside the SRAM: As described earlier, there are two copies of the

R/W data inside the SRAM and one copy of the read-only data inside the SRAM, whereas a second copy of the read-only data resides in flash. An error in SRAM can be detected only when a task is transferred from SRAM to SPM (load). Potentially, the fault could be directly reported to the I/O core by registering the corresponding interrupt. For faults in SRAM, however, we do not register an interrupt with the memory error management unit. Instead, we follow a synchronous approach: at every DMA completion interrupt, the OS checks if any error was reported by the ECC circuitry. In case of a positive outcome, the faulty address is derived. If the faulty address lies within the memory range being loaded from SRAM, two cases are possible. In case (A), the error affected the copy of R/W data used for the transfer; the task is not marked as “ready” and its descriptor is updated to repeat another load at the next TDMA slot using the backup R/W data copy. In the second scenario (B), the fault affects the read-only data of the task in SRAM. In this case, the I/O core directly copies the word that was corrupted by the error from flash to both the SRAM and the SPM. No task re-load needs to be performed. The complete procedure handling cases A and B is described in ISR 4.2.

ISR 4.2 DMA completion Interrupt
1  DMA Completion Interrupt()
2    if DMA completion interrupt for load then
3      if Unrecoverable error bit is set then
4        if Error in SRAM R/W data region then
5          Directly copy the word that caused error from backup SRAM copy to faulty SRAM copy as well as to the SPM
6        else if Error in SRAM read-only data region then
7          Directly copy the word that caused error from flash to SRAM as well as to the SPM
8        end
9      Resume regular operations for DMA completion – such as IPI to application cores after moving task into ready queue.
10   end

Fault happens inside the Flash: The flash in our system is used during the bootup process. We keep two copies of the OS image in flash. At bootup, we bring the task read-only data from flash into SRAM. Moreover, we also allocate the R/W data of each task in SRAM. At any time during bootup, if an error is detected in the first working copy of the flash, we switch to the second copy of the OS image in flash. Next, we repeat the bootup procedure from the new location in flash.

4.2.3 Evaluation

To validate the proposed design and implementation, we performed a series of experiments, whose results are summarized in this section. First, we investigate the overhead of SPM management. Next, we consider the performance and predictability benefits of our approach with synthetic as well as real benchmarks. The achievable I/O bandwidth supported by our design is also measured. The overhead of software-based versus hardware-based error recovery is also evaluated. Finally, we investigate the schedulability results of the proposed strategies.

SPM-Centric OS Overhead Evaluation

The most important parameter of our proposed system is the size of the TDMA slot. The slot needs to be long enough to account for the load/unload of any task in the system. In order to calculate the upper bound, we restrict the slot size to be greater than or equal to the footprint of the largest task among the benchmarks. In addition, in the case of the recovery mechanism, the TDMA slot size must be long enough to allow two unloads of the completed task. Therefore, the TDMA slot size is equal to max(worst task load, 2 · worst task unload) + I/O core ISR overhead. Table 4.6 shows the system parameters, including DMA times and the TDMA slot size. We have measured the overhead of configuring the DMA modules. In addition, since the interrupt from the memory error management unit is delivered to all the application cores, we have measured the overhead of ISR 4.1. This ISR has a minimum and a maximum overhead. The minimum overhead occurs when the ISR only checks whether the error address is relevant and concludes that it is not. On the other hand, the maximum overhead is experienced in the case of a relevant memory fault. We also measured the maximum overhead of the I/O core ISR 4.2 for the DMA completion interrupt. All of these results are reported in Table 4.6.
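Expressed directly in code, the slot-sizing rule above reads as follows; this is only a sketch using the parameter names of Table 4.6, and the exact set of overheads folded into the configured slot is our assumption.

```c
/* Sketch of the TDMA slot sizing rule stated above; all times in microseconds.
 * Parameter names mirror Table 4.6 but the function itself is illustrative. */
static inline double tdma_slot_size(double worst_task_load,
                                    double worst_task_unload,
                                    double io_core_isr_overhead)
{
    double mem = worst_task_load > 2.0 * worst_task_unload
                   ? worst_task_load : 2.0 * worst_task_unload;
    return mem + io_core_isr_overhead;
}

/* With the Table 4.6 values (load 209, unload 61.5, I/O-core ISR 2), this
 * yields max(209, 123) + 2 = 211 us; the 215 us slot configured in Table 4.6
 * presumably adds margin for DMA setup (3.16 us) and rounding.             */
```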

Results of Achievable I/O Bandwidth

The performance of the proposed I/O subsystem (see Section 4.2.2) depends on the frequency of load/unload operations. In order to measure the achievable I/O bandwidth of the proposed design, we have implemented support for the onboard Fast Ethernet Controller (FEC). The FEC is capable of transmitting data at the highest bandwidth among all the devices of the considered MCU. Hence, it represents the best I/O component to stress-test our design.

Table 4.6: Details of OS Parameters

Parameter / Time (µs)
DMA Load time (Largest Code, R and R/W Data size): 209
DMA Unload time (Largest R/W Data size): 61.5
DMA setup: 3.16
Context switch: 0.46
Minimum ISR overhead on Applicative cores: 0.70
Maximum ISR overhead on Applicative cores: 8
Maximum ISR overhead on I/O core for DMA completion: 2
TDMA slot size: 215

We have connected the FEC to an external node which generates constant-rate traffic. Specifically, the traffic source generates a 1 KB packet every 100 µs (10 kHz, about 82 Mb/s). The payload of each packet contains a flow-ID chosen from 4 different values in round-robin. On the MCU used, each applicative core runs two tasks that have subscribed to I/O data flows based on the packets' flow-IDs. Device buffers and task (mirror) I/O buffers have been dimensioned to accommodate a single packet per task, with an overwrite policy. With this setup, we have derived the raw achievable bandwidth considering two different values of the TDMA slot size. Specifically, we measured the data rate of packets that are processed and looped back on the network interface using the Wireshark packet analyzer4. Our experiments revealed an achievable bandwidth for the outgoing traffic of 4 Mb/s with a TDMA slot of 800 µs, and 8 Mb/s with a TDMA slot of 400 µs. Although this represents a fraction of the physically available bandwidth (100 Mb/s), being able to sustain a bandwidth higher than 1 Mb/s constitutes a promising result given that the platform operates at a clock frequency of a few hundred MHz.

4 https://www.wireshark.org/

Results of Synthetic Benchmarks

Figure 4.16: Experimental execution time for synthetic benchmarks.

We investigate the performance of SPM-based execution as opposed to a traditional execution model. For this purpose, we have developed a set of synthetic benchmarks that exhibit different memory access patterns. Figure 4.16 depicts the runtime for such benchmarks on one of the two applicative cores. The first cluster of bars refers to the runtime of the benchmark that exhibits good data locality. Hence, when it is executed from SRAM, caches are effective at hiding SRAM access latency and significantly reduce task execution time. The next two clusters of bars show that misses suffered for only instruction fetches or only data fetches already induce a significant execution slowdown (around 2x). The need for accessing SRAM data also introduces runtime fluctuation (about 25%) as a result of inter-core interference. Such effect becomes even more severe with applicative code that experiences misses while accessing both instructions and data. If the cost of accessing SRAM memory together with the slowdown due to inter-core interference are considered, an overall 3.5x slowdown is experienced when compared to what has been observed in the ideal case (100% cache hits). Finally, notice that if a task is able to entirely execute from scratchpad, its execution time is comparable to the ideal case and inter-core interference is

prevented. These results are a strong motivation to make the best use of the available scratchpads in order to improve performance and avoid inter-core interference.

Results of EEMBC Benchmarks

Figure 4.17: Experimental execution time for EEMBC benchmarks.

Next, we investigate the behavior of EEMBC benchmarks on the selected platform. For this purpose, we have ported and measured the execution time of the full suite of automotive EEMBC benchmarks under two scenarios: traditional contention-based execution from SRAM and the proposed SPM-based execution. The results of normalized execution times are shown in Figure 4.17. From the results, we note that computation-intensive benchmarks do not benefit from SPM-based execution. Conversely, for memory-intensive benchmarks, SPM-based execution yields substantial speed-ups (up to 2.1x). Table 4.7 shows the execution time of the EEMBC automotive benchmarks. Furthermore, Table 4.7 also provides the footprint size of the considered benchmarks. It can be noted that all the considered benchmarks fit into a single scratchpad partition. These results validate the applicability of the proposed design in real scenarios.

Table 4.7: Details of EEMBC Benchmarks.

Benchmark / SPM Time (µs) / SRAM Time (µs) / Relocatable Code Size (bytes) / Read-only Data Size (bytes) / Read/Write Data Size (bytes)
tblook: 1013 / 1015 / 1892 / 10916 / 60
matrix: 1053 / 1054 / 4774 / 12188 / 124
a2time: 1002 / 1029 / 2538 / 1704 / 148
pntrch: 1036 / 1145 / 1398 / 4800 / 128
ttsprk: 383 / 425 / 4772 / 2592 / 4848
iirflt: 1040 / 1189 / 3512 / 888 / 248
canrdr: 1009 / 1359 / 1562 / 12276 / 56
bitmnp: 990 / 1389 / 3282 / 72 / 1494
rspeed: 1012 / 1457 / 1208 / 13200 / 40
puwm: 1036 / 1540 / 2500 / 2400 / 180
aifirf: 1005 / 1564 / 2286 / 1120 / 84
aifftr: 916 / 1642 / 4458 / 2304 / 1912
aiifft: 1170 / 2092 / 3540 / 3072 / 1656
idct: 1045 / 2126 / 4690 / 244 / 1788
Total size: - / - / 42412 / 67776 / 12766

Note that we split the sizes of the read-only data and read-write data of the applications, as this distinction is relevant to our recovery mechanism.

Overhead of Software versus Hardware Recovery

The proposed software-based recovery technique is able to correct errors that can only be detected, but not corrected, by the hardware. If the hardware provides double-bit error correction (DEC) capabilities, then our approach is redundant and should not be used. However, if the hardware, as is often the case, supports only SEC-DED, then we introduce a timing penalty to recover 2-bit errors that are detected but not corrected by the hardware. Under SEC-DED, 1-bit errors are still corrected only by the hardware. Introducing a timing overhead effectively means trading schedulability for reliability. We quantify this trade-off in Figures 4.18 and 4.19. Clearly, when software-based recovery is used, the time overhead can be significant. Nonetheless, we believe that it is reasonable to trade a portion of the CPU utilization to make sure that safety-critical tasks produce

correct outputs. Let us focus on the typical ECC support provided by commercial hardware, i.e. SEC-DED. We hereby compare the overhead in space introduced by our software-based recovery mechanism with an equivalent hardware-based implementation, i.e. DEC support in hardware. Introducing DEC support in hardware is possible, but its implementation is complex and costly. In fact, it has been shown [119] that the ASIC area can grow up to 13x. The software approach, conversely, can be deployed on less expensive hardware and only requires keeping redundant copies of the R/W portion of application memory. SEC-DED requires 8 check bits to implement the detection and correction mechanism for a 64-bit data word, while DEC requires 14 check bits to correct a double bit error [119] in a 64-bit data word. Using this information, we compute the number of extra bits required for hardware-based (HW) SEC-DED, hardware-based DEC, and our proposed software-based (SW) recovery mechanism for DEC in Table 4.8. In the table, we assume that our taskset is comprised of all the benchmarks in Table 4.7. First, note that our technique relies on SEC-DED and only duplicates R/W data and ECC bits. For read-only (R-only) memory, only the extra bits required to implement SEC-DED are considered, because a second copy of read-only data is always available in less expensive flash memory. Moreover, for a fair comparison, we assume that HW DEC support is only provided for application memory, instead of the whole main memory. In this setting, the SW overhead is about 10%. We argue however that (i) not requiring additional ECC check logic in the SW approach partially offsets the cost for the additional memory bits; and that (ii) if extra DEC bits were considered for the whole main memory, the SW approach is to be preferred in terms of overhead: in fact, with 512 KB of main memory, DEC would require about 917,504 extra bits. Next, we consider the overhead in time for hardware-based recovery and the proposed software-based technique. According to [119], hardware implementations of SEC-DED introduce a latency of 1.3 ns, whereas DEC implementations introduce a latency of 2.2 ns on every memory transaction. Our SW approach, on the other hand, still requires SEC-DED, but only introduces a time overhead if a double-bit error occurs. Depending upon which part of the task is affected by the two-bit error, the recovery overhead differs. For instance, an error in the read-only data only requires one word to be copied from a redundant copy, whereas an error in the read/write data of the task may, in the worst case, require the entire task to be re-loaded and re-executed. It is hard to compare the time overhead of software-based and hardware-based approaches, because it depends on the number of memory accesses performed by the tasks under analysis. Nonetheless, we expect that a hardware-based DEC implementation can be more efficient in time. As vendors typically only provide SEC-DED support, however, we believe that offering the choice at design time to recover double-bit

Table 4.8: Space Overhead of HW versus SW recovery

HW SEC-DED (bits) / HW DEC (bits) / SW DEC (bits)
Number of extra bits required for R/W data (12,766 bytes): 12,766 / 22,344 / 127,660
Number of extra bits required for R-only data (42,412 + 67,776 bytes): 110,192 / 192,836 / 110,192
Total number of extra bits: 80,542 / 215,180 / 237,852

errors in software still represents a valuable contribution in the context of safety-critical systems.
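As a back-of-the-envelope check of the space figures, the accounting behind Table 4.8 can be sketched as below (8 check bits per 64-bit word for SEC-DED, 14 for DEC, and a duplicated R/W image plus SEC-DED bits on both copies for the software scheme); small deviations from the table arise from how partial words are rounded, and the helper names are ours.

```c
#include <stdint.h>

/* Back-of-the-envelope sketch of the space accounting behind Table 4.8.
 * Figures may differ from the table by a few bits due to word rounding. */
static uint64_t words64(uint64_t bytes)      { return (bytes * 8 + 63) / 64; }
static uint64_t sec_ded_bits(uint64_t bytes) { return words64(bytes) * 8;  }
static uint64_t dec_bits(uint64_t bytes)     { return words64(bytes) * 14; }

/* Software DEC: keep a second copy of the R/W image (data bits) plus
 * SEC-DED check bits for both copies; read-only data needs SEC-DED only. */
static uint64_t sw_dec_rw_bits(uint64_t rw_bytes)
{
    return rw_bytes * 8 + 2 * sec_ded_bits(rw_bytes);
}

/* Example with the benchmark set of Table 4.7:
 *   R/W data    = 12,766 bytes          -> HW DEC ~ 22,344 bits, SW ~ 127,660 bits
 *   R-only data = 42,412 + 67,776 bytes -> HW DEC ~ 192,836 bits                  */
```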

Evaluation of Schedulability Analysis

For the schedulability evaluation, we consider the scheduling approach in Section 3.3 with fixed-size DMA time slots, and its associated analysis. For the error recovery case, we consider the analysis in Section 3.4. We first present the schedulability curve for the normal case when there are no errors and compare it with the case when errors occur. We also show the contention-based approach, where we have no error recovery. The case referred to as “contention” corresponds to the case where no scratchpad management is implemented and in which tasks execute directly from SRAM utilizing caches, but contending for bus access. We computed the response time of the same workload for the three cases: the traditional SPM-centric scheduling mechanism with no errors as proposed in Section 3.3; the augmented SPM-centric OS with error recovery mechanisms using the analysis in Section 3.4 and with artificial error injection; and the contention-based execution using standard response time analysis. For the evaluation, we have considered the EEMBC benchmarks in Table 4.7 and the overheads in Table 4.6. For a given system utilization, each application is randomly selected and assigned a random period in the range between 10 ms and 100 ms. The task utilization is then computed based on the measured execution time of the application and the selected period.

[Plot omitted: proportion of schedulable task sets (y-axis) versus system utilization (x-axis) for the contention-based approach, our approach with no error recovery, and our approach with error recovery.]

Figure 4.18: Schedulability with SPM-based and traditional scheduling models.

Tasks are randomly generated until the sum of the individual task utilizations reaches the required system utilization. Figure 4.18 shows the result of the schedulability analysis of our proposed SPM-centric schemes with and without error recovery and compares them with the contention-based system. The figure shows the results in terms of the proportion of schedulable task sets for each approach. Each point in the graph represents 1000 task sets. The results show that our proposed schemes significantly improve the system schedulability compared to the contention-based approach. Hence, the described SPM-centric scheduling not only improves the predictability of task execution, but it also improves task set schedulability by hiding the main memory access latency, especially for memory-intensive applications. Based on the figure, we can conclude that there is limited degradation in schedulability for supporting the recovery mechanism. This degradation can be justified by the fact that the system is both predictable and fault tolerant. Moreover, from Figure 4.18 we can also see that our SPM-centric approach with error recovery still performs significantly better than the contention-based case where no error recovery is performed. Figure 4.19 shows the system utilization when 50% of the task sets are schedulable. The X-axis represents the window of periods used to generate the task sets. As expected in any non-preemptive scheduler, with tight periods all three mechanisms degrade due to the blocking time.

[Plot omitted: utilization at 50% schedulability (y-axis) versus period range in ms (x-axis, from 5-15 up to 95-105) for the contention-based approach, our approach with no error recovery, and our approach with error recovery.]

Figure 4.19: Utilization degradation as a function of task periods.

However, with error recovery the degradation is more severe in the case of very small periods due to the extra overhead paid for recovery. In most cases, with error recovery, the proposed strategy achieves better utilization compared to the contention-based approach. Additionally, the loss in utilization arising from the additional error recovery overhead remains within an acceptable range, and marginally decreases for larger task periods.

4.3 Summary

In this chapter, we have demonstrated how our proposed 3-phase task execution model can be realized on actual embedded platforms. The FPGA-based platform detailed in Section 4.1 allows us full control over the hardware implementation; we took advantage of it by implementing a simple and efficient address translation scheme which simplifies the task of relocating programs between scratchpad partitions. As such, only small modifications are required to support the proposed scheduling scheme in the FreeRTOS kernel. On the other hand, the platform does not integrate I/O with the proposed 3-phase model, and does not support fault-tolerant task execution.

Table 4.9: Suitable Commercial Multicore COTS platforms

Features / MPC5777M / MPC5746M / TMS320C6678
Scratchpad: ✓ / ✓ / ✓
DMA engines: ✓ / ✓ / ✓
Dedicated I/O bus: ✓ / ✓ / ✓

The Freescale MPC5777M COTS platform described in Section 4.2 represents a more complete implementation, including I/O management and ECC memory support. It also provides higher performance thanks to the use of hard, rather than soft cores. However, limitations on the availability of DMA engines and arbitration for main memory forced the implementation of a more complex DMA scheduling mechanism, which results in a slightly less efficient, fixed-size DMA operation scheme. In this case, program relocation is achieved through compiler, rather than hardware support. While the described implementation is specific to the analyzed MPC5777M SoC, we would like to stress that the required hardware features for a COTS implementation are becoming increasingly common in micro-controllers used for safety-critical applications. Table 4.9 provides a list of some of the available COTS platforms that provide the relevant features. Finally, in order to validate the proposed execution model and the implemented platforms, we have combined experimental results from synthetic and automotive EEMBC benchmarks. In addition to the strong temporal predictability achieved by enhancing inter-core isolation, we are able to exploit the performance benefits of scratchpad memories. This results in improved schedulability compared to both the traditional contention case, where cores contend for access to main memory, as well as compared to previous approaches based on the 3-phase model.

Chapter 5

Global Scratchpad-Centric Scheduling of 3-Phase Real-time Tasks

In Chapter 3, we showed how to execute 3-phase tasks based on a fixed-priority, partitioned scheduling scheme. Although the proposed partitioned approaches showed good results in terms of hiding access latency to main memory, a partitioned system is not always preferable as it requires design-time decisions to statically assign tasks to cores. Furthermore, for systems with dynamic task admission, the whole system needs to be re-partitioned, making the system inflexible. On the other hand, global scheduling is often preferable for systems comprising both hard and soft tasks, and parallel tasks with utilization greater than one. Therefore, in this chapter we consider a global scheduling scheme to overlap task execution with memory access. Compared to the approaches in Chapter 3, our global scheme directly handles contention among cores for access to main memory without relying on either a fair hardware arbiter, or software-based TDMA arbitration. The main contributions of this chapter are (1) a global scheduling algorithm for a set of sporadic, sequential real-time tasks that efficiently co-schedules the cores and the DMA to hide the memory access latency and provide a predictable system behavior. In particular, since main memory is the unique shared resource in the system, we show that global scheduling decisions must be driven by DMA operations. We also derive (2) a novel schedulability analysis for our proposed algorithm. The nature of co-scheduling DMA and cores requires us to largely re-define the existing concepts of workload and interference in global real-time scheduling. We show that a new concept of scheduling interval is required

to account for the workload generated by both computation and memory activities. (3) We demonstrate the performance of our system through extensive simulations based on measured benchmark characteristics. The rest of the chapter is organized as follows. In Section 5.1, we introduce our task model. Note that the model is similar to the one employed in Chapter 3, and furthermore we make the same assumptions regarding the hardware platform (that is, the system provides a DMA engine in conjunction with per-core, dual-ported SPM). Section 5.2 describes our scheduling algorithm, and Section 5.3 details its associated schedulability analysis. We evaluate our approach in Section 7.3, and summarize our results in Section 7.5.

5.1 Task Model and Notations

We consider the global fixed-priority scheduling of a system consisting of n sporadic tasks Γ = {τ1, . . . , τn}, sorted by decreasing priorities such that τi has greater priority than τj if and only if i < j. We use Ji to denote any system job. The time of computation phases is denoted by xi. Similarly, the time of load phases is denoted by loadi and that of unload phases by unloadi. The relative deadlines Di are assumed to be less than or equal to the periods pi (i.e., constrained deadlines). The absolute deadline for a job Ji is di = ri + Di, where ri is the release time of Ji. We use hep(k) to indicate the set of all tasks with priority higher than or equal to τk, and we use lp(k) to indicate the set of all tasks with priority lower than τk. Finally, we define the total CPU utilization as $U^{cpu} = \sum_{i=1}^{n} x_i / p_i$.

We use Ja → Jb to denote that Jb is the next job to run immediately after Ja on the same core. We use ts(Ja) to denote the start time of the computation phase of Ja, and proc(Ja) to denote the core where Ja is executed. In addition, we use top(Y, x) to return the sum of the x largest items of the set Y. All time values are assumed to be non-negative integers and expressed as cycles of the most precise clock in the system.
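For reference, the task parameters introduced above can be grouped as in the following minimal sketch; names mirror the notation of this section, and the struct itself is purely illustrative.

```c
#include <stdint.h>

/* Minimal sketch of the task parameters used in this chapter; all times
 * are in clock cycles, matching the notation of Section 5.1.            */
typedef struct {
    uint64_t load;     /* load_i: length of the load phase         */
    uint64_t x;        /* x_i: length of the computation phase     */
    uint64_t unload;   /* unload_i: length of the unload phase     */
    uint64_t p;        /* p_i: minimum inter-arrival time (period) */
    uint64_t D;        /* D_i: relative deadline, D_i <= p_i       */
} task_params_t;
```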

5.2 Scheduling Algorithm

In this section, we start with a description of our scheduling algorithm followed by a working example; we then move in Section 5.2.1 to discuss the design of our scheduler. We note that in our execution model, each task must execute its load phase on the DMA before its computation phase on a core. After the computation phase, modified data have to be unloaded to main memory. However, having to schedule two DMA operations

[Timeline omitted: core c1 executes J1, J3, J5 and core c2 executes J2, J4, J6, while the DMA alternates unload/load (memory) phases over the four SPM partitions $1,1, $2,1, $1,2, $2,2; the legend distinguishes load, unload and schedule holes.]

Figure 5.1: An example of scheduling 6 jobs on 2 cores.

(load and unload) complicates the schedulability analysis for dynamically scheduled tasks. Thus, we propose to combine the unload of one job with the load of the next job executed out of the same partition. Suppose that Ja → Jb → Jc. We note at this point that consecutive executions of jobs on the same core alternate between its local memory partitions. Hence, Jc is the next job to execute after Ja out of the same partition. Then, when Jc is scheduled to be executed on the DMA, the DMA executes the unload phase of Ja non-preemptively together with the load phase of Jc. For simplicity, we refer to the combined unload and load phases as the memory phase of the loaded job Jc. In addition, both memory phases and computation phases are executed non-preemptively. We note that the memory phase and the computation phase of one task are not necessarily executed contiguously because, after loading a task, the core might be busy executing another task non-preemptively out of the other partition. In other words, after loading a job into a local memory partition, its content is locked until its computation phase finishes. Example 1. Figure 5.1 depicts a working example of scheduling 6 jobs, generated by 6 different tasks, on 2 cores, assuming all jobs are released at the same time. Since this is a fixed-priority schedule, the highest priority job J1 is chosen first. The scheduler chooses c1 to execute J1. The DMA is instructed to unload the previous task from $1,1 back to main memory. Then, the DMA is instructed to load J1 into $1,1. After that, J1 is able to run on c1 with no memory stalls. While c1 is executing J1 out of $1,1, the scheduler at time 3 chooses J2 to be executed on c2. Here, the DMA runs in parallel with J1 by unloading $1,2 and loading it with J2. At time 6, the scheduler chooses the free partition of c1 to execute J3. Similarly, J4 is chosen at time 10 to execute on the free partition of c2.

At time 13, all four partitions are loaded. Hence, the memory phase of J5 has to wait until time 15, the finish time of the computation phase of J1, which indicates that c1 again has a free partition. Thus, the scheduler at time 15 chooses J5 to be executed on c1.

[Timeline omitted: cores c1 (J1, J3) and c2 (J2, J4) with start/finish pointers s1, f1 and s2, f2, and the DMA executing the memory phases of J1, J2, J3, J4; at time t both cores have a free partition.]

Figure 5.2: The cores are chosen based on the minimum sk.

Finally, J6 is scheduled at time 18 to execute on c2. We note that even though c2 has finished execution at time 20, J6 has to wait until time 22 because its memory phase is delayed. This delay induced a schedule hole between J4 and J6. We define a schedule hole as the time at which the core is idle waiting for a task to be loaded.

As can be seen, the memory phases in the example schedule are largely overlapped with computation phases, with only a few induced schedule holes. This hiding of memory phases gives our system better schedulability than other state-of-the-art global scheduling techniques, as shown in Section 7.3. In what follows, we discuss how our scheduler chooses cores to schedule tasks. In Figure 5.2, we show a schedule of 4 jobs on 2 cores. The time pointers s1 and s2 indicate the start time of the last scheduled jobs on c1 and c2, respectively. Similarly, the time pointers f1 and f2 indicate the finish time of the last scheduled job on each core. Consider the time t at which each core has a free partition. Our scheduler chooses c1 to schedule J3 rather than c2 because c1 has the earlier start time of its last scheduled job, i.e., s1 < s2. We design our scheduler to choose cores based on the start time of their last scheduled jobs, rather than the finish time, to avoid pessimism in the analysis. In particular, it gives us the guarantee needed to bound the amount of holes between computation phases, as we discuss in Section 5.3.2.

5.2.1 Scheduler Design

Our scheduler maintains a global queue Qr in which ready tasks are ordered according to fixed priorities. Whenever a task is released, it is inserted in this global queue. The dispatcher extracts the highest priority task from the top of the queue and executes it on the DMA, given that the DMA is idle and there is at least one available partition; otherwise,

the job remains inside the global queue. Furthermore, the scheduler is usually implemented as an interrupt service routine (ISR) triggered by certain events. In our system, we have the following three events: (1) task release, (2) memory phase completion and (3) computation phase completion. The DMA-Dispatcher procedure below is triggered at time t, corresponding to one of these three events, to schedule a new task on the DMA. In addition, after events (2) and (3), if a new task has already been loaded, the core will do a context switch and execute this task. For example, consider Figure 5.1 again. At time 3, the completion time of the memory phase of J1, c1 will do a context switch to execute J1, and at the same time J2 will be scheduled to execute on the DMA. At time 15, the completion time of the computation phase of J1, c1 will do a context switch to execute J3, since it has already been loaded in $2,1, and at the same time J5 will be scheduled to execute on the DMA because the completion of J1's computation phase indicates a free partition.

1: procedure DMA-Dispatcher:
2:   i = Select-Task(Qr)
3:   (l, j) = Select-Core({sk})
4:   tdma = t + unloadj + loadi
5:   sl = max(tdma, fl)
6:   fl = sl + xi

For simplicity, we assume at time t when the DMA-Dispatcher procedure is invoked that (1) the DMA is idle, (2) at least one partition is available and (3) there is at least one ready task. Otherwise, the procedure exits, as we assume non-preemptive execution, and it will be triggered again by a later event. Basically, we have: {sk} and {fk}, two sets of m time pointers indicating the start and finish time, respectively, of the last scheduled job on each core, and tdma to indicate the end time of a DMA operation. The Select-Task procedure in Line 2 returns the index (i) of the highest priority task in Qr. Similarly, the Select-Core procedure in Line 3 returns the index (l) of the core that has the minimum sk, with ties broken arbitrarily, among all cores with a free partition, and the index (j) of the last job scheduled on the selected partition. We need this j to add the unload time of the previous job, as in Line 4. Note that sl is updated in Line 5 to be the maximum of tdma and fl because Ji can start its computation phase only once its memory phase is completed and core cl is idle.
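A C rendering of the dispatcher could look as follows; the data structures, the simplified per-core bookkeeping (which omits the explicit partition alternation) and all names are illustrative assumptions rather than the thesis implementation.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative C sketch of the DMA-Dispatcher procedure; structures and
 * helper conventions are hypothetical, not an actual implementation.    */
#define M 2   /* number of cores (example value) */

typedef struct {
    uint64_t load, x, unload;       /* phase lengths of the job's task     */
} job_t;

typedef struct {
    uint64_t s;                     /* start time of last scheduled job    */
    uint64_t f;                     /* finish time of last scheduled job   */
    job_t   *last;                  /* job whose partition will be reused  */
    bool     has_free_partition;
} core_t;

/* Invoked at time t when the DMA is idle, at least one partition is free,
 * Qr is non-empty, and ji is the highest-priority ready job.              */
void dma_dispatcher(uint64_t t, job_t *ji, core_t core[M])
{
    /* Select the core with minimum s among cores with a free partition.   */
    int l = -1;
    for (int k = 0; k < M; k++)
        if (core[k].has_free_partition && (l < 0 || core[k].s < core[l].s))
            l = k;

    job_t   *jj    = core[l].last;                 /* job to unload first   */
    uint64_t t_dma = t + jj->unload + ji->load;    /* end of memory phase   */

    core[l].s    = (t_dma > core[l].f) ? t_dma : core[l].f;
    core[l].f    = core[l].s + ji->x;
    core[l].last = ji;
}
```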

5.3 Schedulability Analysis

Our schedulability analysis relies on the proof technique introduced in [27]. This technique is based on the argument that if the execution time of a job Jk plus its interference from other tasks in [rk, dk) is less than or equal to Dk, then Jk will meet its deadline because the scheduler is work-conserving. Jk is called a problem job and the time window [rk, dk) is called a problem window. We note that the interference is an important concept; it can be defined as the amount of time during which the problem job is unable to execute inside its problem window, despite being released, due to other tasks. Since each job in our system executes on both the DMA and a CPU core, one has to consider both in computing the interference for a given job. However, our analysis in this section works constructively by considering only the computation phases on CPU cores. We use Ik(Ji) to denote the workload of an individual job Ji, including both the computation time and the induced hole, as shown in Figure 5.1. We use Ik(Γ) to denote the workload of all tasks inside the problem window of Jk. Furthermore, the interference is denoted by δk and is computed based on Ik(Γ). Since we only consider the computation phases, we define the time window [rk, tl], where tl = dk − xk, as our new problem window, and we use Lk = tl − rk + 1 to denote its length; if the problem job starts its computation phase by time tl, then it completes before its deadline, as we assume non-preemptive computation phases. In multiprocessor scheduling, finding the maximum interference of a task system over an interval of time is an NP-hard problem [102, 33]. Thus, a widely used technique in the real-time literature is to assume that each task cannot interfere with more than its worst case activation of jobs [40]. The interfering jobs of a task τi in a problem window of a task τk are composed of three parts. (1) Carry-in: the contribution of at most one job with release time ri before rk and ri + pi after rk. (2) Body: the contribution of jobs with both ri and ri + pi inside the problem window. (3) Carry-out: the contribution of at most one job with release time ri inside the window and ri + pi outside the window. To account for the worst-case activation, we should assume that jobs are packed as much as possible within the problem window. That is, carry-in jobs start as late as possible and carry-out jobs start as early as possible. Compared to classic global scheduling theory, there are two main differences in our system for bounding the interference: (1) long memory phases can induce a “hole” between successive computation phases on the same core; hence, the interference must include such holes. (2) In a classic fixed-priority global scheduling system, a problem job Jk would start on the earliest processor that becomes idle. Hence, the worst case interference for Jk can

be bounded by Ik(Γ)/m. Since in our system we have to bind a job to a core partition once we start its memory phase, such a bound does not hold anymore. The rest of this section proceeds as follows. In Section 5.3.1, we show how to bound the interfering jobs inside the problem window of Jk, for a given task system Γ. The bound on the workload of individual jobs Ik(Ji) is discussed in Section 5.3.2. We then show in Section 5.3.3 how to derive a global bound on Ik(Γ), the workload of all tasks. In Section 5.3.4, we compute the bound on the interference based on Ik(Γ). Finally, we present the schedulability condition for our scheduling algorithm in Section 5.3.5.

5.3.1 Bounding the Interfering Jobs

In this section, we first bound the interfering jobs inside the problem window, including carry-in, body and carry-out jobs. Then, out of these jobs we populate three sets: X, LD and UD, the sets of computation, load and unload phases, respectively. We will use these sets in Section 5.3.3 to derive a global bound on Ik(Γ).

A job Ji can interfere inside the problem window of Jk in two different ways: (1) interference caused by non-preemptive scheduling, and (2) interference caused by priority order. Tasks in lp(k) can only have carry-in interference because they cannot start executing after the beginning of the problem window. In contrast, tasks in hep(k) can be activated inside the problem window. Thus, they can have carry-in, body and carry-out jobs. However, assuming all tasks in hep(k) can have carry-in jobs is quite pessimistic. Hence, we start this section by redefining the problem window in order to bound the number of carry-in jobs from hep(k). Then, we show how to bound the interfering jobs that can execute inside the problem window.

Carry-in Limit

We propose to extend the problem window such that it has an earlier starting point to and the same end point tl. Now, we define a pending job and to as follows. Definition 1. We say that a job is pending if it has been released but it has not started on the DMA.

Definition 2. We let to denote the last time instant before rk such that for any t ∈ [to, rk) there is at least one pending task from hep(k). If such a time does not exist, we let to = rk.

The following lemma bounds the number of tasks that can have carry-in inside the problem window that starts at to.

Lemma 1. There are at most 2m tasks from hep(k) ∪ lp(k) that can have carry-in jobs.

Proof. We use to − 1 to denote the time instant before to. The complement of Definition 2 states that at to − 1 there is no pending task from hep(k); otherwise, we could have extended the window. Based on this and the fact that Jk is released at rk, to always corresponds to a release of a task in hep(k). Since we have 2m partitions in total (two per core), only 2m tasks from hep(k) could have been released at or before to without being pending or having completed. It follows that only these tasks can have carry-in jobs. Similarly, no task in lp(k) can start on the DMA at or after to because there is always a pending job from hep(k) in [to, tl]. Thus, only 2m tasks executing at least one time unit before to can have carry-in jobs. Since it is not possible to load more than 2m partitions at one time, the lemma follows.

Since we assume tasks have constrained deadlines, i.e., no two releases of one task can be active at the same time, these 2m jobs must be from different tasks. In what follows, we distinguish two types of carry-in: memory and computation, where a computation carry-in denotes a job that contributes only a computation phase inside the window, and a memory carry-in denotes a job that contributes both memory and computation phases inside the window. Lemma 2. Only one task from hep(k) ∪ lp(k) can have memory carry-in.

Proof. Tasks in lp(k) cannot start on the DMA at or after to; hence, they can have at most one memory carry-in. On the other hand, tasks in hep(k) that have not started their DMA at or before to cannot have memory carry-in, because they would be pending and the window would be extended. Since we have a single DMA, only one memory phase can be active at any time, either from hep(k) or lp(k).

We illustrate in Figure 5.3 the worst-case carry-in situation for our scheduling algorithm, in which to aligns with the beginning of a memory phase and 2m − 1 partitions have been loaded beforehand. We note that a memory phase of a task in lp(k) has to start at least one time unit before to in order to produce a memory carry-in. Since a memory carry-in always contributes more than a computation carry-in, we know based on Lemma 1 and Lemma 2 that the worst-case situation is to have 2m − 1 tasks with computation carry-in and one task with memory carry-in.

Bounding Jobs for hep(k)

In Figure 5.4, we show the worst-case activations for a task in hep(k) under three different scenarios.

Figure 5.3: Carry-in limit for m = 2: at most 2m − 1 tasks have computation carry-in and only one task has memory carry-in.

We note that since unload phases are determined at run-time by the scheduler, we only consider load and computation phases when bounding the interfering jobs of each task, and we use ei = loadi + xi to denote the length of each job. We can then safely assume that each computation phase introduces an unload phase; we construct memory phases out of these load and unload phases in Section 5.3.3. In addition, the unload phases of the first 2m jobs executed inside the problem window are carried in from outside the window. Thus, we include the largest 2m unload phases of tasks in hep(k) ∪ lp(k) to account for them.

For the case where there is no carry-in, the worst activation is to release the task at to. On the other hand, for the case where there is a carry-in job, the worst activation is to execute the carry-in job as late as possible, such that its computation phase aligns with to for a task with computation carry-in, and its memory phase aligns with to for a task with memory carry-in. We observe that the amounts of memory and computation are always greater for a task with memory carry-in than for the same task with no carry-in. However, the amounts of memory and computation can be smaller for a task with computation carry-in than for the same task with no carry-in. To understand how this can happen, see Figure 5.4: the computation carry-in (the middle scenario) is obtained by introducing the computation phase of a job executed just before its deadline; this added computation pushes a memory phase out of the other end of the window, so even though the computation contribution increases, the memory contribution decreases. Unfortunately, this observation complicates the analysis: it is very hard to determine which case leads to the worst interference, because tasks interact differently, as we show in Section 5.3.3. As a result, we consider the task with no carry-in plus an extra computation phase to account for computation carry-in, since a task with carry-in can have at most one extra job compared to the no-carry-in case. We note that the execution of jobs after the end of the problem window has no effect on the analysis; hence, we use min below to only account for jobs within the window.

Figure 5.4: The interfering jobs of tasks in hep(k) with no carry-in, computation carry-in and memory carry-in.

Now, we bound the interfering jobs of a task with no carry-in inside the problem window of Jk as:

• ⌊Lk/pi⌋ body jobs of size ei.

• One carry-out job of size min(ei, Lk mod pi).

Similarly, we bound the interfering jobs of a task with memory carry-in as:

• ⌊(Lk + (Di − ei))/pi⌋ body jobs of size ei.

• One carry-out job of size min(ei, (Lk + (Di − ei)) mod pi).

The body jobs are then split into load, computation and unload phases. In contrast, a carry-out job may include only a load phase; see Figure 5.4 (the top scenario). Similarly, a computation carry-in job includes only computation and unload phases. In summary, we populate X, LD and UD with computation, load and unload phases, respectively, from:

• One task in hep(k) ∪ lp(k) with memory carry-in.

• |hep(k)| − 1 tasks in hep(k) with no carry-in.

• The 2m − 1 largest extra computation phases, computed as min(Lk, xi) for τi ∈ hep(k) ∪ lp(k), to account for computation carry-in.

Figure 5.5: Scheduling interval examples: (A) a scheduling interval with holes; (B) a scheduling interval with no holes.

Since it is difficult to determine which task with memory carry-in leads to the worst case, as tasks have different ratios of computation and memory, we propose to recompute the total interference n times, once for each candidate task with memory carry-in, and then take the maximum interference. Finally, we need to include the load phase of the problem job, as we only consider its computation phase in the analysis.
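To make the job counting concrete, the following Python sketch computes the number of body jobs and the carry-out size for both cases; the function and parameter names (Lk, p_i, D_i, e_i) are illustrative, and the split of the carry-out job into phases is not modelled.

def interfering_jobs_no_carry_in(Lk, p_i, e_i):
    # Task with no carry-in, released at t_o: floor(Lk / p_i) body jobs of
    # size e_i plus one carry-out job of size min(e_i, Lk mod p_i).
    return Lk // p_i, min(e_i, Lk % p_i)

def interfering_jobs_memory_carry_in(Lk, p_i, D_i, e_i):
    # Task with memory carry-in: the carry-in job is pushed as late as
    # possible, which effectively extends the window by (D_i - e_i).
    ext = Lk + (D_i - e_i)
    return ext // p_i, min(e_i, ext % p_i)

# Illustrative numbers: Lk = 100, p_i = 30, D_i = 25, e_i = 12 (load_i + x_i)
print(interfering_jobs_no_carry_in(100, 30, 12))          # (3, 10)
print(interfering_jobs_memory_carry_in(100, 30, 25, 12))  # (3, 12)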

5.3.2 Bounding the Individual Workload Ik(Ji)

We capture the workload of a job on any core, including both computation and holes, by introducing the concept of a scheduling interval.

Definition 3 (Scheduling Interval). Assume Ja is running inside the problem window of Jk and Ja → Jb. We call [ts(Ja), ts(Jb)) the scheduling interval of Ja, and we let Ik(Ja) denote its length.

Figure 5.5 shows two examples of scheduling intervals. In (A), the scheduling interval of Ja contains a hole because the memory phase of Jb is delayed by memory phases of other tasks and therefore ts(Jb) is also delayed. In contrast, the scheduling interval of Ja in (B) is followed immediately by the scheduling interval of Jb with no holes, because the memory phase of Jb finishes (completely overlaps) within the scheduling interval of Ja.

As we can see, the hole size of each scheduling interval is variable and depends on the execution order of other tasks. A key idea behind our proof scheme is that we ignore the relative ordering of memory and computation phases within the problem window. Instead, we bound the length of each scheduling interval by determining the maximum number of memory phases that can execute within it, a concept we call a memory sequence. We will show how to create an ordering of memory and computation phases that maximizes the total length of scheduling intervals in Section 5.3.3.

Definition 4 (Memory Sequence). A memory sequence is the consecutive execution of any m memory phases on the DMA. The length ρ of the memory sequence is the sum of the lengths of the m memory phases.

The following two lemmas will help us in proving Lemma 5 and Theorem 1.

Lemma 3. At ts(Ja), there is always a free partition in proc(Ja).

Proof. Let tf be the finish time of the memory phase of Ja, and assume Ja is loaded into the first partition of core k = proc(Ja). Since memory phases are non-preemptive, we have the following two cases at time tf. (1) The second partition of core k is free. In this case, ts(Ja) = tf because no job is executing out of the second partition. (2) The second partition was already loaded before the memory phase of Ja, i.e., both partitions are full at tf. Since computation phases are also non-preemptive, the computation phase executing out of the second partition must end at ts(Ja). The end of a computation phase frees a partition, which concludes the proof.

Lemma 4. Assume Ja → Jb → Jc. Let tf be the end time of the memory phase of Jb and ts be the start time of the memory phase of Jc. Then, the computation phase of Jb must start in the interval [tf , ts].

Proof. We consider two cases. (1) If the computation phase of Ja finishes by tf, then Jb starts computing immediately at tf because its memory phase has already completed. (2) If Ja is still running at tf, then both partitions must be full at tf, as Jb is also loaded. Since the memory phase of Jc starts at ts, at least one partition must be freed by ts. Hence, the computation phase of Ja must finish by ts, and Jb starts immediately because its memory phase already completed at tf. In either case, Jb starts computing in [tf, ts], proving the lemma.

The following lemma explains why we design our scheduler to select the core with the earliest start time rather than the earliest finish time; it provides the guarantee on which our analysis relies.

Lemma 5. Our scheduler will never schedule more than m memory phases inside any scheduling interval with holes.

Proof. It suffices to prove that no two memory phases targeting the same core can execute inside any scheduling interval with holes; this implies that the maximum number of memory phases inside any scheduling interval with holes is m.

Let Ja → Jb, with both running on core cl. Consider another core cp and assume by contradiction that two memory phases targeting cp execute inside the scheduling interval [ts(Ja), ts(Jb)), in which the memory phase of Jb executes last since we assume a scheduling interval with a hole. Let tf and ts be the finish time of the first and the start time of the second of these two memory phases, respectively. By Lemma 4, a computation phase must have started on cp in the interval [tf, ts]. Core cl has a free partition at ts(Ja) as per Lemma 3, and this partition remains free at ts because the memory phase of the next job Jb executes last. Since ts(Ja) < tf ≤ ts < ts(Jb), our scheduler at time ts would target cl rather than cp. This creates a contradiction; hence, the lemma holds.

The following theorem bounds the length of each scheduling interval.

Theorem 1. The length of Ik(Ja), the scheduling interval of Ja in the problem window of Jk, is upper bounded by the maximum between xa and the length of any memory sequence.

Proof. We first note that within the problem window of Jk there is always at least one pending job: this follows from the definition of to for the interval [to, rk) and from the fact that Jk is pending after rk. Now, consider two jobs such that Ja → Jb, both running on cl. Since computation phases run non-preemptively, the next job Jb on the same core cannot start computing before Ja finishes; hence, if there is no schedule hole between Ja and Jb, the length of the scheduling interval of Ja is at most xa. Now, assume instead that ts(Jb) is strictly greater than the finishing time of Ja, i.e., there is a schedule hole between Ja and Jb. Since ts(Jb) is by definition the earliest time that the next job on cl can start computing, and cl is idle immediately before ts(Jb), the memory phase of Jb must finish exactly at ts(Jb). From Lemma 3, we know that cl has a free partition starting at ts(Ja); hence, the memory phase of Jb would have started at ts(Ja) unless the DMA is continuously busy executing other memory phases. In this case, memory phases execute continuously on the DMA in the interval [ts(Ja), ts(Jb)), and we know from Lemma 5 that at most m memory phases, i.e., a memory sequence, can execute inside any scheduling interval with holes, which concludes the proof.

As an example, consider again Figure 5.5: Ik(Ja) in (B) equals the computation phase xa, while Ik(Ja) in (A) equals the length of the memory sequence.

5.3.3 Bounding the Total Workload Ik(Γ)

In the previous section, we showed how to bound Ik(Ji), the workload of an individual job. However, Theorem 1 only characterizes this workload as the maximum between a computation phase and a memory sequence: Ik(Ji) depends both on the computation phase of Ji and on a possible hole induced by a memory sequence from other jobs. Therefore, the individual workloads must be considered globally to determine a safe bound on Ik(Γ). For a given set of computation phases and memory sequences, we can derive a bound on Ik(Γ) as in the following lemma.

Lemma 6. After sorting computation phases X = {θ1, . . . , θ|X|} such that θi ≤ θi+1 and memory sequences such that ρi ≥ ρi+1, the following is a valid bound on Ik(Γ):

    Σ_{i=1}^{|X|} max(θi, ρi).

Proof. Based on Theorem 1, the length of each scheduling interval is upper bounded by the length of the corresponding computation phase or of a memory sequence; hence, we take max(θi, ρi) for each pair. By contradiction, assume that there exists a hypothetical pairing, different from the one in the hypothesis, that leads to a strictly higher bound. Since computation phases and memory sequences are ordered in opposite directions in the hypothesis, the hypothetical pairing must contain two pairs (θs, ρs) and (θl, ρl) such that θs < θl and ρs < ρl (i.e., ordered in the same direction). We now show that swapping θs with θl leads to a new pairing whose Ik(Γ) bound is no smaller than the previous one. Since the pairing in the hypothesis can always be obtained with a finite number of such swaps (for example, using bubble sort), this creates a contradiction.

We let a = max(θs, ρs) and b = max(θl, ρl), so that the two pairs contribute a + b to the bound. After swapping θs with θl, and given that θs < θl, there are four cases for the two terms a and b: (1) both remain the same; (2) a increases and b remains the same; (3) a remains the same and b decreases; (4) a increases and b decreases. Clearly, (1) and (2) do not decrease the bound; thus, we only need to consider (3) and (4).

Case (3): to satisfy this case, we must have ρs > θl (a remains the same) and ρl < θl (b decreases); but then ρs > θl > ρl, which contradicts ρs < ρl.

Case (4): this case is more elaborate than the previous one. To satisfy it, we must have θl > ρs (a increases) and θl > ρl (b decreases). We consider two sub-cases. (i) θs ≥ ρs: the contribution before the swap is θs + θl, and after the swap it is θl + max(θs, ρl), which is larger or equal. (ii) θs < ρs: the contribution before the swap is ρs + θl, and after the swap it is θl + max(θs, ρl) = θl + ρl, which is larger. Having examined all possible cases, we conclude that the swap never decreases the bound, which completes the proof.

Lemma 6 assumes a given set of memory sequences. However, from the interfering jobs we only have the set LD of load phases and the set UD of unload phases. As per Definition 4, a memory sequence contains m memory phases. Thus, we first discuss how to construct memory phases out of LD and UD, and then show how to construct memory sequences out of these memory phases. Since the scheduler dynamically combines an unload and a load phase into one non-preemptive memory phase, the merged unload phase is generally unknown. We could safely assume that the load phase of each job is merged with the longest unload phase in the system, but this is highly pessimistic. Therefore, we instead propose to construct memory phases as in the following lemma.

Lemma 7. The set of memory phases Ψ = {µ1, . . . , µ|Ψ|} can be constructed by combining the longest load phases from LD with the longest unload phases from UD, so as to enlarge the holes as much as possible.

Proof. Based on the ordering used in Lemma 6, there must exist a worst-case configuration for some value 1 ≤ j ≤ |X| where, in the first j pairs, the length of the memory sequence is larger or equal, and in the remaining |X| − j pairs the length of the computation phase is larger or equal. Hence, the bound on Ik(Γ) can be obtained by summing the largest j memory sequences and the largest |X| − j computation phases. By adding the largest unload phases to the largest load phases, we maximize the first j memory sequences; as a consequence, Ik(Γ) is also maximized.

The longest memory sequence is clearly ρmax = top(Ψ, m), the sum of the m largest memory phases in Ψ, as each memory sequence contains m memory phases. We could again safely assume that all computation phases overlap with the longest memory sequence; however, the following lemma adds a constraint that improves the bound on Ik(Γ).

Lemma 8. Let Ja → Jb, with both running on cl, and assume Ik(Ja) and Ik(Jb) are two scheduling intervals with holes. Then, any memory phase executing in the problem window can contribute to either Ik(Ja) or Ik(Jb), but not to both.

Proof. Since the scheduling interval [ts(Ja), ts(Jb)) of Ja contains holes, i.e., its length equals that of a memory sequence, a memory phase targeting core cl must finish exactly at ts(Jb). Hence, this memory phase and all preceding memory phases can contribute only to Ik(Ja), while all following memory phases can contribute only to Ik(Jb).

Based on Lemma 8, each memory phase can contribute to at most one memory sequence on each core, i.e., to m memory sequences in total. Therefore, we propose to construct the memory sequences out of the memory phases as ρi = m × µi. This guarantees that each memory phase appears at most m times in Ik(Γ). Furthermore, following the proof of Lemma 7, by building the sequences out of the largest memory phases we maximize the sum of the first j memory sequences as computed in Lemma 6; hence, the bound on Ik(Γ) is also maximized.
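The following Python sketch (names are illustrative) shows one way to realize this construction and the bound of Lemma 6: memory phases are built by pairing sorted load and unload phases (Lemma 7), each memory sequence is m copies of one memory phase, and the sequences, sorted in descending order, are paired with the computation phases sorted in ascending order.

def memory_phases(LD, UD):
    # Lemma 7: pair the longest load phases with the longest unload phases;
    # an unpaired load (or unload) stands alone as its own memory phase.
    loads = sorted(LD, reverse=True)
    unloads = sorted(UD, reverse=True)
    n = max(len(loads), len(unloads))
    loads += [0] * (n - len(loads))
    unloads += [0] * (n - len(unloads))
    return [l + u for l, u in zip(loads, unloads)]

def total_workload_bound(X, LD, UD, m):
    # Bound on Ik(Gamma) per Lemmas 6-8: each memory sequence is m copies of
    # one memory phase (rho_i = m * mu_i); sequences sorted descending are
    # paired with computation phases sorted ascending.
    rho = sorted((m * mu for mu in memory_phases(LD, UD)), reverse=True)
    theta = sorted(X)
    rho += [0] * max(0, len(theta) - len(rho))   # pad when there are fewer sequences
    return sum(max(t, r) for t, r in zip(theta, rho))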

5.3.4 Bounding the Interference on a Problem Job

In traditional global scheduling, the interference on the problem job Jk can be bounded by Ik(Γ)/m, assuming the scheduler is work-conserving. In our system, however, Jk can be scheduled on a core with a later start time even though an earlier start time is available on another core; see how J3 is scheduled in Figure 5.2. The following lemma further characterizes the interference on Jk.

Lemma 9. Let Ja → Jk, where Jk is the problem job. In addition, let Jb be the last scheduled job on any core such that proc(Jb) ≠ proc(Jk). Then, it must hold that ts(Ja) ≤ ts(Jb).

Proof. Let t be the time at which the DMA-Dispatcher is invoked to schedule Jk. In order for Jk to be scheduled on proc(Ja), the start time sproc(Ja) of that core has to be, at time t, the minimum (or at least tied, since we assume that ties are broken arbitrarily) among all cores that have a free partition, according to our scheduler rules. In other words, all other cores must have scheduled jobs with start times greater than or equal to sproc(Ja) before Jk can be scheduled on proc(Ja).

We let Ik^max (Ik^min) be an upper bound on the maximum (respectively, a lower bound on the minimum) length of any scheduling interval in the problem window of Jk, and Ik^diff = Ik^max − Ik^min. In Figure 5.6, we show the worst-case pattern, in which each core runs a sequence of scheduling intervals and Jk is scheduled after the maximum scheduling interval. Based on Lemma 9, the maximum scheduling interval can finish at most Ik^diff time units after the earliest finishing time of the last scheduling interval on any other core.

Figure 5.6: The computation phase of the problem job executes after Ik^max.

The sum of all scheduling intervals on all cores is upper bounded by Ik(Γ), and a bound on the earliest finishing time of other cores can be derived as:

    (Ik(Γ) − Ik^diff) / m.    (5.1)

By adding Ik^diff as in Figure 5.6 and rearranging the terms, we obtain the upper bound on δk, the interference on Jk, as:

    Ik(Γ)/m + ((m − 1)/m) · Ik^diff.    (5.2)

We compute Ik^max as max(ρmax, θα), where θα is the largest computation phase, and Ik^min as θ1. We note that tasks execute non-preemptively on each core; hence, the length of each scheduling interval must be at least equal to the length of a computation phase, and taking the shortest computation phase within the problem window constitutes a safe value for Ik^min. We further note that decreasing Ik^min could render a task unschedulable. Hence, for sustainability, we can assume that tasks idle until their worst-case execution time; otherwise, we must consider a lower bound of 0 on Ik^min. In the evaluation, we assumed the former case.
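Putting the pieces together, the interference bound of Equation 5.2 can be sketched as follows, reusing memory_phases and total_workload_bound from the previous sketch and assuming the former sustainability case, i.e., Ik^min is the shortest computation phase (names are illustrative).

def interference_bound(X, LD, UD, m):
    # delta_k per Equation 5.2: Ik(Gamma)/m + ((m - 1)/m) * Ik^diff.
    theta = sorted(X)
    psi = sorted(memory_phases(LD, UD), reverse=True)
    rho_max = sum(psi[:m])                  # longest memory sequence, top(Psi, m)
    I_max = max([rho_max] + theta[-1:])     # max(rho_max, largest computation phase)
    I_min = theta[0] if theta else 0        # shortest computation phase
    I_total = total_workload_bound(X, LD, UD, m)
    return I_total / m + (m - 1) / m * (I_max - I_min)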

5.3.5 Schedulability Condition

Since the bound on Ik(Γ) requires knowledge of all interfering jobs, the bound on the interference of the problem job derived in Equation 5.2 depends on the size of the problem window. We note that even though the extended problem window that starts at to has the advantage of limiting the amount of carry-in, finding to has pseudo-polynomial time complexity, given that the total utilization is strictly less than m [30]. However, the authors of [68] observed that choosing a window of length Lk and starting at to is sufficient for the schedulability analysis. Since to ≤ rk, the length of [to, tl] is at least that of [rk, tl]; intuitively, computing bounds on the amount of work inside a small time interval is tighter than inside a large interval, as the amount of work gets amortized over larger intervals [30]. Based on this intuition, we prove our schedulability condition as follows.

Lemma 10. If Jk misses its deadline, then δk ≥ Lk.

Proof. Assume Jk misses its deadline and δk < Lk. Since we assume a work-conserving scheduler, the time interval [rk, tl] has to be busy in order for Jk to miss its deadline; otherwise, Jk could have executed and finished before the deadline. In addition, [to, rk) has to be busy as per Definition 2. As a result, the time interval [to, tl] has to be busy for Jk to miss its deadline. Equation 5.2 gives a bound on δk, the amount of time during which all cores are busy, including both computation and holes. Since δk < Lk, there must exist a time in [to, tl] which is not busy. This creates a contradiction; hence, the lemma follows.

Theorem 2. If ∀τk ∈ Γ: δk < Lk then the system is schedulable.

Proof. It follows from the contrapositive of Lemma 10 that Jk meets its deadline if δk < Lk. If this holds for all tasks in Γ, the system is schedulable.
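As a sketch, the resulting test simply checks δk < Lk for every task, maximizing over the candidate memory carry-in tasks as discussed in Section 5.3.1. The helper interference_sets, which would extract X, LD and UD for a given problem job and candidate carry-in task, is hypothetical and not shown; the window length is assumed here to be the relative deadline.

def system_schedulable(tasks, m):
    # Theorem 2: schedulable if delta_k < L_k for every task tau_k.
    for tau_k in tasks:
        Lk = tau_k['D']        # problem-window length (assumed: relative deadline)
        delta_k = max(
            interference_bound(*interference_sets(tau_k, carry_in, tasks, m, Lk), m)
            for carry_in in tasks)   # one candidate memory carry-in per task
        if delta_k >= Lk:
            return False
    return True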

5.4 Evaluation

Due to the complexity of estimating the cache preemption and migration delay (CPMD) [43, 34], we compare our scheduling algorithm with contention-based global non-preemptive fixed-priority scheduling [69]. In the contention-based system, we assume that cores access main memory in round-robin order, with no overhead for arbitrating access to main memory. Therefore, if all cores simultaneously contend for access to main memory, each core receives 1/m of the total bandwidth of the memory subsystem. In our system, we load tasks before execution; therefore, tasks have no memory stalls during execution. For the sake of a fair comparison, we favor the contention-based system by assuming ideal caches that are managed to have no conflict or capacity misses. Furthermore, since we utilize a DMA component in our system, which is an extra hardware resource, we compare against contention-based systems with both the same number of cores and one extra core. In the latter case, we favor the contention-based system, as a DMA peripheral is much smaller than an extra core. We first intuitively show how our system is able to utilize the CPU cores better than the contention-based system; then we compare the schedulability of both systems based on simulations.

Figure 5.7: Our schedule on 4 cores showing 100% utilization.

Figure 5.8: Contention-based schedule on 4 cores showing 50% utilization.

Assume a system loaded with a large number of tasks, where each task takes four time units to compute on a core and requires one time unit for the DMA to unload/load a local partition. Our approach is able to utilize the cores at 100% by overlapping the tasks' execution on the cores, as long as the DMA operations are small compared to the core execution times; this is shown in Figure 5.7. In the contention-based system, on the other hand, core utilization is affected by the portion of time that a core needs to load a task. In the worst case, all cores access main memory at the same time, so each core takes 1 × m time units to load the required data instead of one time unit; in particular, with m = 4, each core takes 4 time units to load a task. As shown in Figure 5.8, the core utilization is then computation / (computation + memory) = (4 × m) / ((4 × m) + (4 × m)) = 50%. Based on this intuition, our approach should perform significantly better. However, due to the requirement to schedule both memory and computation, our schedule exhibits more priority inversion than the contention-based system, which can worsen the Worst-Case Response Time (WCRT) of high-priority tasks.
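As a tiny worked example of this intuition (illustrative numbers from the text above):

def contention_core_utilization(compute, mem, m):
    # Core utilization in the contention-based system: with round-robin
    # memory arbitration, each load takes mem * m time units in the worst case.
    return compute / (compute + mem * m)

print(contention_core_utilization(4, 1, 4))   # 0.5 -> 50% utilization with m = 4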

The evaluation of our system is performed on an FPGA platform similar to the one discussed in Section 4.1; here, we report the important differences. The hardware platform is based on Altera's Cyclone II FPGA instead of Xilinx. The platform uses the Nios-II/f soft-core processor [3], running at 100 MHz, with instruction and data SPMs of 16 KB each; it provides 64 MB of off-chip SDRAM as main memory, also running at 100 MHz, and a standard DMA core. We evaluate our system on a set of real benchmarks that are executed and measured on the platform in order to obtain data for our schedulability analysis. The selected benchmarks are taken from existing embedded benchmark suites and are meant to represent several applications used in the embedded real-time domain; memory-intensive applications are chosen to better stress the platform's memory subsystem. Three benchmarks are selected from the well-known automotive EEMBC benchmark suite [134]: a2time (angle to time conversion), canrdr (response to remote CAN request), and rspeed (road speed calculation). Two benchmarks are selected from the DIS (Data Intensive System) benchmark suite [118]: transitive and corner-turn. We limited each benchmark to either run for up to 1 ms, or to a size that does not consume more than half of the SPM (the size of a partition). Table 5.1 reports, for each benchmark, the size of code and data in the local memory, the time taken to run out of the local memory (scratchpad), and the DMA load and unload times. All times are reported in cycles (cyc) and all sizes in bytes (B).

Table 5.1: Benchmarks

Benchmark      Code size (B)   Data size (B)   SPM (cyc)   load (cyc)   unload (cyc)
a2time         3108            5420            100497      4560         2578
rspeed         1956            6864            55688       4635         2950
canrdr         2724            8030            47280       5134         3251
corner-turn    2032            8192            16728       4996         3292
transitive     2080            3024            102898      3677         1960

The applications in Table 5.1 are used to generate sets of random tasks. The task sets are generated similarly to Section 4.1.4 and Section 4.2.3. Note that the utilization in % reported in the figures below is the total CPU utilization normalized by the number of cores m. Finally, each schedulability graph in this section is made of 4,000 experiments. Figures 5.9-5.11 show results in terms of the ratio of schedulable task sets. As shown in Figure 5.9, in the case of a dual-core system, our bound is better than the contention-based bound with an equivalent number of cores (m). However, the contention-based system with one extra core (m + 1) was able to schedule all generated task sets. The reason is that task sets were generated targeting an m-core system and then scheduled on an m + 1-core system; in addition, when targeting a small number of cores, the generated task sets are relatively small, so task executions take advantage of the extra core and finish sooner.

Figure 5.9: 2-cores schedulability comparison (ratio of schedulable task sets vs. utilization in %, for our system, m, and m + 1 contention-based cores).

Figure 5.10: 4-cores schedulability comparison.

Figure 5.11: 8-cores schedulability comparison.

Figure 5.12: The scalability comparison of the WCRT bound (total worst-case calculated bound in ms vs. number of cores, for our system and the contention-based system with m cores).

In Figure 5.10, we compare the four-core case. As shown in the figure, we still perform worse than the m + 1 system; this is mainly due to the pessimism of our bound, as discussed in Section 7.2. However, the schedulability of the contention-based systems drops significantly: for example, the m + 1 system dropped from 100% to around 89%.

As we increase the number of cores, as in Figure 5.11 for an eight-core system, the memory phases in the contention-based system become longer and affect schedulability. In addition, the generated task sets are larger than before; therefore, the advantage of one extra core becomes less effective as access to main memory becomes a bottleneck. As can be seen, our approach is much less affected by the increase in the number of cores, thanks to the overlapping of DMA operations with execution on the cores.

Figure 5.12 compares the WCRT obtained through the analysis of both systems for the lowest-priority task as the number of cores increases. It is clear that the contention-based bound rises much more quickly with the number of cores than our bound; this is mainly because, as the number of cores increases, contention on main memory becomes a bottleneck in the contention-based system. The figure also explains why we do even better than m + 1 cores for higher numbers of cores.

5.5 Summary

In this chapter, we extended the 3-phase task scheduling model introduced in Chapter 3 to handle global, rather than partitioned, scheduling. We again consider a set of sporadic real-time tasks scheduled non-preemptively according to fixed priorities, and we make similar hardware assumptions: each core is provided with a dual-ported SPM, divided into two partitions, so that we can execute a task out of one partition while reloading the other one.

Compared to the partitioned case, a key difference in the global case is that we assume knowledge of all tasks in the system. Hence, we do not need to enforce a fair DMA arbitration among cores either in software or hardware. Instead, we can directly schedule tasks' accesses to main memory through a global queue of DMA operations; effectively, this implies that core execution is driven by DMA scheduling, rather than vice-versa. We then derive a schedulability analysis based on the well-known technique of the problem window. The key intuitions behind the analysis are: 1) we can still construct the interference on the task under analysis by considering the workload of tasks executing on the m cores in the system; however, to account for the overhead of DMA operations, we need to add extra workload in the form of scheduling holes, where the cores are stalled waiting for the completion of DMA operations; 2) in the worst case where holes are present, the DMA schedule is equivalent to a round-robin among the m cores; hence, we can bound the length of holes by considering the maximum length of m consecutive task unloads/loads.

Finally, we compared our technique against standard global scheduling, where tasks are preemptive but contend for main memory, using a modified version of the platform introduced in Section 4.1. In particular, we show that as the number of cores increases, our solution provides higher schedulability even compared to a contention-based system with m + 1 cores (to account for the hardware overhead of the DMA engine).

Part II

Task Communication For Hard Real-time Applications

Chapter 6

Inter-Task Communication with 3-Phase Task Model

In this chapter, we discuss how the proposed 3-phase execution model can be extended to support asynchronous communication between tasks. In our model, the communication data is written to main memory during the DMA unload phase of a sender task, and then read back from main memory during the DMA load phase of a receiver task. Section 6.1 discusses the communication model in more detail, while Section 6.2 presents the implementation requirements. Section 6.3 shows how to bound the communication latency for the partitioned scheduling scheme with fixed-size DMA operations presented in Section 3.3, and finally Section 6.4 presents evaluation results.

6.1 The Proposed Inter-Task Communication Model

We assume an asynchronous communication model that regulates data exchange between tasks, whether they run on the same core or on different cores. In this asynchronous model, data previously produced by a sender task can be overwritten if the receiver task has not read it yet. In detail, a sender task's communication data is written to main memory during its unload phase; similarly, any communication data required by a receiver task is loaded into the SPM during its load phase. Therefore, this model does not require inter-core communication in our proposed SPM-centric system: the communication is performed via main memory during the load and unload phases of the tasks. This model of communication can be applied to both sequential and parallel tasks, since the

communication is scheduled and performed by the DMA during the load/unload of the tasks.

This asynchronous communication model is best suited to state-based communication. Unlike event-based communication, in which the receiver task might receive a sequence of messages, in state-based communication the receiver task is only interested in the latest up-to-date data or state. Packets arriving on a network card are an example of event-based data, while the state ports of a temperature or proximity sensor are examples of state-based data.

We consider a DAG of communicating tasks, from which we extract every possible chain of communicating tasks. Each chain λk is a set of n communicating tasks; we denote the chain as λk = {τ1, τ2, ..., τn} and call it a flow. A task τi in the flow λk receives data from τi−1 and sends data to τi+1; in other words, the communication data passes through all the tasks in the flow in sequence. We assume that task-local data, code, and communication buffers can be accommodated within one partition of the scratchpad memory.

6.2 Implementation

To explain how the communication model works, consider for example two tasks, τ1 and τ2, that need to send data to a task τ3. Based on the statically defined communication graph, we need to assign communication buffers to each communicating task: one send buffer for each task it sends to, and one receive buffer for each task it receives from. In this example, τ1 and τ2 each need one send buffer dedicated to the communication with τ3, while τ3 needs two receive buffers, one to store data from τ1 and the other to store data from τ2.

At task definition time, τ1 and τ2 need to know the size of each message sent to τ3. This size information is used to create a temporary send buffer inside the scratchpad memory of the core where τ1 or τ2 will be loaded and executed. The receive buffers of τ3, on the other hand, are allocated in main memory; two in this example. During the execution of a sender task from the SPM, the data to be sent to the receiver tasks is written into the temporary buffers. These buffers are then unloaded by the DMA from the scratchpad to main memory, specifically into the corresponding receive buffers of the receiver tasks. Finally, when τ3 is activated, all of its local code, data, and communication

data received by the time it is scheduled to load are loaded from main memory into the corresponding core's SPM.
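The following Python sketch (names and sizes are illustrative; this is not the actual OS implementation) shows how the per-task buffers can be derived from the static communication graph:

from collections import defaultdict

def allocate_buffers(edges, msg_size):
    # Derive per-task communication buffers from a static communication graph.
    # edges: list of (sender, receiver); msg_size[(s, r)]: bytes sent from s to r.
    # Send buffers live in the sender's SPM partition; receive buffers are
    # allocated in main memory and loaded together with the receiver.
    send_buffers = defaultdict(dict)      # sender   -> {receiver: size in SPM}
    receive_buffers = defaultdict(dict)   # receiver -> {sender: size in DRAM}
    for s, r in edges:
        size = msg_size[(s, r)]
        send_buffers[s][r] = size
        receive_buffers[r][s] = size
    return send_buffers, receive_buffers

# Example from the text: tau1 and tau2 each send one message to tau3.
edges = [("tau1", "tau3"), ("tau2", "tau3")]
sizes = {("tau1", "tau3"): 256, ("tau2", "tau3"): 128}   # illustrative sizes
snd, rcv = allocate_buffers(edges, sizes)
# snd["tau1"] == {"tau3": 256}; rcv["tau3"] has one entry per sender.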

6.3 Bounding Communication Latency

As discussed in the previous sections, tasks communicate asynchronously without any precedence constraints between them. In other words, when a job of a higher-priority task τ1 is ready, it is scheduled and starts loading as soon as there is a free partition, regardless of any other running task that sends data to it. Since access to main memory is serialized by the DMA, the integrity of the communication data is assured, as there can be no data race (lock-less data sharing). Suppose τ1 is a receiver task for data sent by τ2; then τ1 will access the previous (old) communication data from τ2 if τ1 is loaded while τ2 is still running or has not yet been unloaded from its SPM partition to main memory. From a schedulability point of view, this communication model does not affect task scheduling. However, we still need to bound the worst-case end-to-end communication latency. As an example, we show how to determine the communication latency under the same DMA scheduling and response time model used in Section 3.3; as a reminder, this means that we consider fixed-size DMA operations, and the computed response time of a task is the time that elapses from when the task becomes ready (released) to the time it finishes and is fully unloaded. Therefore, communication data sent by a sender task τi will be available to a receiver task after Ri, the worst-case response time of τi, which accounts for interference and overheads. As mentioned earlier, we consider sets of communicating tasks; each set, which we call a flow λk, is a chain of n tasks λk = {τ1, τ2, ..., τn}, such that a task τi in the flow receives data from τi−1 and sends data to τi+1. In other words, the communication data passes through all the tasks in the flow in sequence; consequently, the end-to-end communication latency L^λk is the time from when the data is consumed (loaded) by the first task in the flow, τ1, until the last task in the flow, τn, writes the data to main memory (unload). The end-to-end latency of a flow λk is computed as in Equation 6.1.

    L^λk = R1 + Σ_{i=2}^{n} (Ti − 2·σ + Ri)                 (different cores)
    L^λk = R1 + Σ_{i=2}^{n} (Ti − (2 + m − 1)·σ + Ri)       (same core)        (6.1)

Theorem 6.1. The worst-case total end-to-end latency of a communication flow λk is L^λk = R1 + Σ_{i=2}^{n} (Ti − 2·σ + Ri).

Proof. As shown in Figure 6.1, the worst-case alignment (critical instant) between any two communicating jobs occurs when the receiving job (of τ2) starts loading right before the unload phase of the producer job (of τ1). This prevents τ2 from loading the fresh communication data until its next invocation. In addition, in the worst case, τ2 starts loading right after its release, which maximizes the latency by minimizing the overlapping region (see Figure 6.1) between τ2 and τ1. Note that, in our system, this can only happen if the two tasks are on different cores: if they are on the same core, that core has to wait for (m − 1) TDMA slots to perform the unload of τ1, which increases the overlapping region between the two tasks and thus reduces the latency.

Therefore, to compute the communication latency between τ1 and τ2, we add one period and the worst-case response time of τ2 (as detailed in the previous section), and subtract the overlapping region between τ1 and τ2. In the worst case, the minimum overlapping region always consists of the loading slot of τ2 and the unloading slot of τ1, i.e., 2 · σ. To compute the total end-to-end latency of a chain of tasks in a flow, we simply apply the same reasoning between every two successive tasks in the flow; for the first task in the flow (τ1), we only consider its response time.
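A direct transcription of Equation 6.1 in Python (R and T hold the per-task response times and periods, σ is the length of one fixed-size DMA slot; all names are illustrative):

def end_to_end_latency(flow, R, T, sigma, m, same_core=False):
    # Worst-case end-to-end latency of a flow (Equation 6.1): the overlap
    # subtracted per hop is 2*sigma for tasks on different cores and
    # (2 + m - 1)*sigma when the communicating tasks share a core.
    overlap = (2 + m - 1) * sigma if same_core else 2 * sigma
    latency = R[flow[0]]
    for i in flow[1:]:
        latency += T[i] - overlap + R[i]
    return latency

# Example flow of three tasks with illustrative parameters (time units):
R = {1: 10, 2: 12, 3: 9}
T = {1: 40, 2: 50, 3: 60}
print(end_to_end_latency([1, 2, 3], R, T, sigma=2, m=4))  # 10 + (50-4+12) + (60-4+9) = 133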

Figure 6.1: Worst-case communication latency between two tasks.

6.4 Evaluation

We evaluate the communication latency by generating random sets of tasks as discussed in Section 4.2.3. From a generated task set, we compute the response time of each task using the analysis in Section 3.3. After that, we generate four random communication flows with 5, 10, 15, and 20 tasks, and compute the communication latency of each flow. Each synthetic evaluation is repeated 1,000 times and the average worst-case communication latency is reported. Figure 6.2 shows the estimated worst-case communication latencies for the generated flows, which comprise a mix of the applications provided in Table 4.7. As one can observe, there is a slight improvement when the communicating tasks are scheduled on the same core. In this specific evaluation, the system comprises two application cores, as this is the setting of the implemented COTS platform in Section 4.2. In addition, the yellow line in the figure represents the average communication bandwidth, that is, the total amount of data transferred divided by the end-to-end communication latency. For the sake of this evaluation, we assumed that each task in a flow sends data equal to the size of its data section as reported in Table 4.7; hence, the amount of data transferred between any two successive tasks in a flow might be different, which resembles real-life scenarios. Each point in the line is generated by randomly picking 5, 10, 15, or 20 tasks based on the number of tasks in the flow. Then, we compute the total amount of data transferred within the end-to-end latency window. We repeat the test 1,000 times and take the average to capture all the application benchmarks in Table 4.7.

Figure 6.2: End-to-end communication latency.

6.5 Summary

We showed how to incorporate inter-task communication into the proposed 3-phase task model. We consider an asynchronous communication model, where pairs of sender and receiver tasks exchange data: communication data is written by the sender task to main memory during its unload phase, and read from main memory by the receiver task during its load phase. Since there are no precedence constraints among communicating tasks, the schedulability analyses presented in the previous chapters do not need to be changed. We computed bounds on the end-to-end latency of a chain of tasks based on the analysis for partitioned systems in Section 3.3, and evaluated the obtained latency on the platform introduced in Section 4.2.

Chapter 7

Bundled Scheduling of Parallel Real-time Tasks

After discussing inter-task communication in Chapter 6, in this chapter and the next one we focus on intra-task communication for parallel real-time tasks. In particular, in this chapter we focus on how to schedule parallel tasks so as to simplify synchronization among the parallel threads of the same task, while in Chapter 8 we introduce a predictable interconnection design and derive latency bounds for messages exchanged between communicating threads. With the increased demand for high-performance applications such as autonomous driving and computer vision [158], parallel processing is becoming relevant to the real-time community. However, most related works on scheduling of real-time parallel tasks [142, 89, 92, 109] assume that application threads are scheduled independently. In practice, there is large evidence [123, 60, 77, 150] that parallel threads often need to be executed concurrently. This is especially true for threads that are tightly synchronized through the use of either shared resources or message-passing primitives. Some synchronization primitives, such as intra-task locks, cannot be modelled by most related work [39, 109]. Many primitives can cause an unnecessary number of context switches and increase thread execution time due to blocking when threads are not executed together. Furthermore, the need to account for such synchronization mechanisms greatly increases the complexity of the task model. For instance, consider the synchronizing Thread#1 and Thread#2 in Figure 7.1. If the scheduling policy does not guarantee that they are scheduled in parallel at the same time, their WCETs are prone to inflation: under a preemptive policy they might suffer an excessive number of preemptions, while under a non-preemptive policy a thread might spin-wait for the other thread.

Figure 7.1: Illustrations of the negative impact on synchronized parallel threads if not co-scheduled at the same time (preemptive, non-preemptive, and co-scheduled cases).

Even in the best case, depending on the synchronization primitives and the scheduling policy, the WCET might not be inflated, but accounting for the communication time between the threads is still complicated. Therefore, we argue that parallel threads need to be co-scheduled, both to reduce the synchronization overheads and to simplify their communication analysis. As will be detailed in Section 8.6, we can then compose the total WCET as the WCET on the CPUs plus the worst-case communication time between the parallel threads: since all parallel threads of a task are scheduled concurrently, their communication time can be simply accounted for, as long as the inter-core interconnect provides provable real-time bounds. To address this issue, gang scheduling [123] has been extensively studied in the HPC and general-purpose domains [60, 77, 150]. Under gang scheduling, an application is scheduled only if there are sufficient cores to execute all of the application's threads in parallel. Gang scheduling of real-time tasks has been investigated in [81, 55, 64, 38, 47]. In particular, [81, 55, 64] consider a rigid task model, where in the worst case the number of threads required by an application is assumed to remain constant over its entire execution time. While the rigid model has the benefit of simplicity, it can incur a significant loss of performance by overestimating the computational demand of an application: many parallel applications change their required number of threads during execution. For example, in the common fork-join model¹, the application progresses through a set of phases, where each phase can require a different number of threads. In the similarly common Directed Acyclic Graph (DAG) model, the application comprises a set of precedence-constrained subtasks, and the number of threads used by the application at any one time depends on the subtask scheduling. Hence, in this chapter we introduce a novel task model, which we call the bundled model, which supports gang scheduling of parallel threads without incurring undue pessimism in

¹ Note that we use the term fork-join to refer to the model that is also known as a multi-threaded task in the literature.

modelling the application's demand. In this model, a real-time task is composed of a sequence of bundles, where each bundle is characterized by a known worst-case execution time (WCET) and number of required cores. All threads within a bundle are then gang scheduled. Our model thus represents a generalization of the traditional real-time gang model, in the sense that a traditional gang task is equivalent to a bundled task with a single bundle.

Figure 7.2: Illustrations of how a fork-join parallel task can be scheduled according to different scheduling strategies (Global, Gang, Federated, and Bundled).

Figure 7.2 shows an illustrative example of a parallel fork-join task and how its parallel threads might be scheduled under different scheduling strategies. The important point to note here is that our objective is to guarantee that parallel threads are scheduled concurrently, 1) to reduce the synchronization overheads and 2) to simplify accounting for their communication. On the left side of the figure, global thread scheduling [109] and federated scheduling [93] are shown. As can be seen, neither scheduling scheme guarantees that the parallel threads in the same phase are scheduled concurrently; consequently, these scheduling schemes do not meet our objectives. As illustrated in the figure, federated

scheduling allocates a dedicated cluster of cores to the parallel task based on a metric related to the parallel task's utilization. However, federated scheduling might suffer from core overprovisioning compared to the greedy global thread scheduling. On the right side of the figure, gang scheduling of the rigid model [81] and the proposed bundled scheduling are shown. Indeed, gang scheduling meets our objectives. However, as mentioned earlier, gang scheduling of the rigid model suffers a significant loss of performance due to core overprovisioning: as illustrated in the gang case of the figure, four cores are reserved for the entire execution time of the task, even when there are times where only one or two cores are actually needed. In the proposed bundled scheduling, we segment the shown fork-join task into three different bundles based on the number of required cores in each phase. All threads in each bundle are guaranteed to be scheduled in parallel, similarly to the gang case. However, since bundles can be independently scheduled, we only reserve the required number of cores in each bundle, which leads to improved CPU utilization. Note that we improve over gang scheduling of the rigid model at the cost of a more complex schedulability analysis, due to the need to handle the precedence constraints between bundles.

In more detail, we provide the following contributions: (A) we introduce the bundled task model, and discuss how bundles are scheduled on an identical multiprocessor according to a preemptive, fixed-priority gang scheme; (B) we show how applications coded according to different programming models can be executed within bundles; (C) we derive a sufficient schedulability analysis for sporadic bundled task sets; (D) we evaluate the performance of the derived analysis and compare it against the rigid gang model, as well as a state-of-the-art analysis for global (non-gang) scheduled parallel tasks [109]; (E) finally, we discuss how to integrate bundled scheduling with the 3-phase task model to ensure predictable access to memory resources and reduce the overhead of preemptions.

The rest of the chapter is organized as follows. We first introduce the proposed system model in Section 7.1. In Section 7.2, we detail the schedulability analysis for the proposed system. The evaluation results are reported in Section 7.3. In Section 7.4, we discuss how bundled scheduling could be integrated with the 3-phase execution model. Finally, we conclude with some remarks in Section 7.5.

7.1 System Model

We consider an identical multiprocessor platform consisting of m processors; we use the terms processor and core interchangeably. We focus on scheduling n sporadic bundled

144 parallel tasks denoted by τ = {τ1, τ2, ..., τn}. Each task τi comprises a sequence of bi 2 bundles , represented as τi = {τi,1, τi,2, ..., τi,bi }. Each bundle τi,j ∈ τi is characterized by hhi,j, li,ji, where hi,j ≤ m is the number of cores required to execute the bundle, and li,j is the WCET of the bundle. We call hi,j the height of the bundle and li,j the length of the bundle; we also call vi = hi,j × li,j the volume of the bundle. We denote the Pbi Pbi total WCET of τi as Li = j=1 li,j and the total volume of τi as Vi = j=1 vi,j. Each task τi is further characterized by a period Ti and a relative deadline Di, with Di ≤ Ti (constrained deadline). τi generates a (potentially) infinite sequence of jobs, with arrival times of successive jobs separated by at least Ti time units; when referring to a bundle τi,j of τi, we also use the term job to denote the instance of τi,j executing as part of a job of τi. Tasks are scheduled according to a global, preemptive fixed-priority gang scheduling scheme, as detailed in Section 7.1.1. We say that task τi is active at time t if a job of τi has arrived before or at t but has not yet finished executing. Since bundles are always processed as a sequence, at any time t when the task is active, exactly one bundle of the task is active; bundle τi,1 becomes active when a job of τi arrives, while each other bundle τi,j+1 becomes active once the previous bundle τi,j finishes. The system scheduler selects which tasks to execute based on the currently active bundles. We say that a task/bundle is ready if it is active but not executing. We consider two priority schemes. In the fixed per-task scheme, each task τi is assigned a priority Pi, where higher number implies higher priority. In the fixed per-bundle case, each bundle of τi is assigned an individual priority Pi,j. Note that the former can be seen as a restriction of the latter, where Pi,j = Pi for all bundles of τi. A task set if schedulable with respect to a priority assignment if all jobs are guaranteed to complete by their deadline. We assume that all time values and task parameters are natural numbers. Similarly to related work on gang scheduling [81], throughout Sections 7.1-7.3 we do not directly model preemption overheads, but assume that they can be accounted for by extending the length of the task. In the case where concurrent execution of multiple tasks can cause interference on shared resources such as cache or main memory, we assume a resource sharing mechanism with provable delay bounds [106, 126]. Clearly, as discussed throughout this dissertation, such preemption overheads and delay bounds can be signif- icant for memory-intensive tasks. Therefore, in Section 7.4 we remove such assumptions, and instead show how to integrate the proposed bundled scheduling with the 3-phase exe- cution model, assuming the availability of a DMA engine and private per-core SPM.

2also referred to as segments or phases in related task models. We use the different term bundles to stress that they are gang scheduled.

145 7.1.1 The Scheduling Algorithm

The gang system scheduler is conceptually represented in Algorithm 7.1³. The scheduler iterates over each active bundle τi,j in decreasing priority order, and executes the bundle if the number of remaining available cores is greater than or equal to its height hi,j. Note that the scheduler is work-conserving, in the sense that it always schedules a bundle if there are enough available cores. The scheduler also always reserves hi,j cores for bundle τi,j while executing it. Since, furthermore, the priority of each bundle is constant, it follows that context switches can only happen when a new bundle becomes active, and the system scheduler is invoked when a job arrives or finishes, and when a bundle finishes and the next one is activated.

We assume that an application scheduler uses the reserved hi,j cores to execute the functional components of the parallel application associated with task τi; an example of such an application scheduler is the OpenMP runtime [49]. If all computation assigned to a bundle finishes early, the application scheduler can yield to the system scheduler, causing the current bundle to finish; otherwise, the system scheduler executes an active bundle for its assigned duration li,j. In general, the behaviour of the application scheduler depends on the programming model of the parallel application; hence, we briefly discuss how to handle different parallel programming models in the next section.

Algorithm 7.1 Bundled Fixed-Priority Scheduling Policy

Let Qactive be the set of active bundles.
Let Qexec be the set of executing bundles.
Let hrem be the number of remaining cores.

1: Qexec ← ∅;
2: hrem ← m;
3: for each τi,j ∈ Qactive in decreasing Pi,j order do
4:     if hi,j ≤ hrem then
5:         Qexec ← Qexec ∪ {τi,j};
6:         hrem ← hrem − hi,j;
7: execute the bundles in Qexec;
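For concreteness, a direct Python rendering of Algorithm 7.1 is sketched below; the tuple encoding of bundles is an illustrative assumption.

def select_bundles(active, m):
    # active: list of (priority, height, bundle_id) tuples for the currently
    # active bundles; returns the bundles to execute on m cores.
    executing, h_rem = [], m
    for prio, height, bundle in sorted(active, key=lambda b: -b[0]):
        if height <= h_rem:
            executing.append(bundle)
            h_rem -= height
    return executing

# Example: on m = 4 cores, a high-priority bundle of height 3 blocks a
# height-2 bundle, but a height-1 bundle can still run alongside it.
print(select_bundles([(3, 3, "A"), (2, 2, "B"), (1, 1, "C")], 4))  # ['A', 'C']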

Figure 7.3: Fork-join application and resulting bundled task.

7.1.2 Programming Model

We next show how the proposed bundled model can be used to schedule parallel applications coded according to different programming models. In particular, bundled scheduling is a natural fit for the fork-join model, in which the program is made of a sequence of synchronous phases, as shown in Figure 7.3. Each phase is composed of one or more parallel threads, which are also known as subtasks. To schedule a fork-join application within a bundled task, we allocate one or more bundles per phase. If all subtasks within a phase have the same WCET, then we allocate a single bundle with a height equal to the number of subtasks in the phase and a length equal to the WCET of the subtasks. If the subtasks have different WCETs, then we allocate multiple bundles, one for each WCET value, as shown for Phase 3 in Figure 7.3: the first bundle has a length equal to the shortest WCET and includes all subtasks, while successive bundles include only the subtasks with higher WCETs. In either case, at run-time the application scheduler can yield once all subtasks in the phase have finished, possibly before their WCETs. The key advantage of bundled scheduling, compared to independent thread scheduling, is that all subtasks are gang scheduled to minimize the synchronization overhead, thus potentially reducing their WCET estimates. The bundled model can also be applied to the more general case where the internal structure of the program is unknown.

3Note that the algorithm could be optimized by using multiple priority queues for active bundles based on their heights.

Figure 7.4: DAG application and possible schedules.

The bundled model can also be applied to the more general case where the internal structure of the program is unknown. Assume that the logic of the application scheduler does not depend on when bundles are executed, which is decided by the system scheduler. Then we can guarantee that the application safely completes when assigned to a (schedulable) bundled task by following this procedure: by either measurement or analysis, we determine the execution shape of the application when running in isolation on m cores, that is, the number of cores used by the application at every point in time. If the execution shape of the application changes between different runs, for example based on its inputs, then for any point in time we consider the largest number of cores used by the application in any possible run (assuming complete coverage). Bundles can then be allocated based on the obtained shape; this guarantees that when the task is scheduled by the system scheduler, the application scheduler is always provided with a number of cores equal to or higher than the number of cores it can use at that time. While this procedure can lead to overprovisioning CPU resources, it cannot result in a larger reservation compared to the rigid model. If more information is known about the internal structure of the application, then we can avoid core overprovisioning. For example, consider a DAG application, where the WCET of each subtask is known, as shown in Figure 7.4(A). Figure 7.4(B) shows the execution shape generated by simulating the application on a system with at least 3 cores, based on any work-conserving application scheduler and where each subtask executes for its WCET. As shown in the figure, based on the obtained shape we can generate a task with

4 bundles. However, if subtasks do not execute for their WCET, then the resulting shape might not be safe, in the sense that the application might require more cores at some point in time. For example, in Figure 7.4(C) we show the case where subtask 2 executes for 1 unit of time, and subtask 3 executes for 2 units of time; this results in the application using 4 cores at time 2. We can solve the issue by employing a time-triggered, non-work-conserving application scheduler that activates each subtask based on its start time with respect to a bundle in the original shape; the corresponding schedule is shown in Figure 7.4(D). Note that in this example, the scheduler yields at time 3 once all subtasks in Bundle-2 have finished; subtasks 3, 5 and 6 are activated once Bundle-3 starts. Finally, note that the scheduler implementation in [168] already relies on a static table of subtasks, so it could be easily extended to implement the described time-triggered scheme. As an added advantage, the described scheduler can guarantee that specific subtasks are executed concurrently, again minimizing synchronization overheads.
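The shape-based allocation described above can be sketched in the same style: given the execution shape of the application (the number of cores used at every time unit, obtained by measurement or simulation), each maximal run of constant height becomes one bundle. The representation of the shape as a per-time-unit list is an assumption made only for illustration.

def shape_to_bundles(shape):
    """Group a per-time-unit core-usage shape, e.g. [3, 3, 2, 2, 2, 1],
    into (height, length) bundles, one per maximal run of constant height."""
    bundles = []
    for h in shape:
        if bundles and bundles[-1][0] == h:
            bundles[-1] = (h, bundles[-1][1] + 1)    # extend the current bundle
        else:
            bundles.append((h, 1))                   # start a new bundle
    return bundles

# Example: shape_to_bundles([3, 3, 2, 2, 2, 1]) -> [(3, 2), (2, 3), (1, 1)]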

7.2 Schedulability Analysis

In this section, we derive a sufficient schedulability analysis for a bundled task set. We use a response-time approach similar to the one in [39, 109], but we specialize it for gang scheduling based on fixed priorities. We derive a response time upper bound Rk for each task under analysis τk, and we deem the task set schedulable if Rk ≤ Dk holds for every task. For clarity of exposition, we begin by detailing the analysis for the special case of the rigid model, that is, each task τi has only one bundle. Hence, without loss of generality we use hi = hi,1 to denote the height of the single task-bundle; also note that li,1 = Li. Then, in Section 7.2.2, we extend the analysis to tasks with multiple bundles. As in all related work on global scheduling, the analysis is based on the concept of interference, which is defined as the amount of time during which τk is active, but cannot execute because all cores are busy executing other tasks (i.e., the task remains in the ready queue).

Definition 7.1. The interference Ik on task τk is the cumulative time during which some job of τk is ready (active but not executing).

While the definition of interference remains unchanged, note an important difference compared to the case of sequential tasks: in the context of parallel tasks scheduled as a gang, an active task can remain ready even when some cores are idle, due to the number hk of cores required to run the gang.

Theorem 7.1. Let Ik(t) be a monotonic⁴ upper bound on the interference on τk in any window of time of length t ≤ Dk starting with the arrival of a job of τk. Then for any work-conserving scheduling algorithm, an upper bound Rk to the response time of τk can be found with the following fixed-point iteration, starting with Rk = Lk:

Rk ← Lk + Ik(Rk). (7.1)

Proof. Since Ik(t) is monotonic and due to the natural number time convention, it follows that the iteration either converges to a fixed point Rk = Lk + Ik(Rk) ≤ Dk in a finite number of steps, or it becomes Rk > Dk in a finite number of steps, in which case we fail to prove the task schedulable. It remains to show that if Rk ≤ Dk is a fixed point, then it is an upper bound to the response time of τk.

Consider a window of time of length Rk starting with the arrival of any job of τk. Since we assume constrained deadlines, it follows that only one job of τk can be active in the window, and that job remains active from the beginning of the window until it either finishes executing for Lk time or the window is over. Given that we know that the amount of time that τk is ready is no more than Ik(Rk), it follows that the amount of time that τk can execute in the window is at least Rk − Ik(Rk) = Lk. Hence, since the schedule is work-conserving, the job must finish executing in the time window, meaning that Rk is indeed an upper bound to the response time of any job of τk.

Theorem 7.1 is the same as Theorem 6 in [39] and Theorem 5.1 in [109]. The monotonicity condition is required to ensure that the iteration terminates in a finite number of steps by either finding a response time upper bound or obtaining Rk > Dk, in which case we fail to prove the task schedulable.

The challenge is then to derive the bound on Ik for a window of length t based on the volume of interfering higher priority bundles. The height of the task under analysis determines the sensitivity to interference: since the task requires hk available cores to run, it follows that for the task to suffer interference, at least m−hk +1 cores must be occupied by higher priority tasks. Similarly to [81], we can thus depict the interference as a rectangle in time × processor space, with a height of m−hk +1, which we call the interference rectangle. Figure 7.5 shows the worst case interference for three cases on a system of 4 cores. Note that, for simplicity of illustration all interfering tasks are lumped and drawn before the task

4For convenience and since we are interested in upper bounding the interference and response time, we use the term monotonic to denote a non-decreasing function.

under analysis. However, in preemptive systems, the interference can occur at any time while the task is active. Case (A) is similar to sequential tasks: the interference rectangle is of height m − 1 + 1 = 4, and the volume of interfering tasks is lumped together and divided over the number of cores. As the height of the task under analysis increases, the height of the interference rectangle decreases. This can extend its horizontal length, as the volume of interfering tasks is divided over fewer cores. However, note that interfering tasks with heights greater than the height of the interference rectangle cannot contribute to the interference by more than the height of the rectangle itself. To formalize the derivation of the interference rectangle length, we next introduce some definitions.

Figure 7.5: Interference caused by higher priority tasks. The up arrow denotes the arrival time of the task under analysis.

Definition 7.2. The contribution $C^i_k$ of a task τi to the interference Ik, with i ≠ k, is the cumulative amount of time that τi is executed while τk is active.

Definition 7.3. For a set of rigid tasks, the interference factor $I^i_k$ of task τi on τk is defined as:

$I^i_k = \begin{cases} \dfrac{\min(h_i,\, m - h_k + 1)}{m - h_k + 1}, & \text{if } P_i \ge P_k; \\ 0, & \text{otherwise.} \end{cases}$ (7.2)

Lemma 7.1. For gang scheduling of rigid tasks with fixed priorities, the following is a bound on the interference on task τk:

$I_k \le \Big\lfloor \sum_{\forall i \neq k} \big( I^i_k \times C^i_k \big) \Big\rfloor.$ (7.3)

Proof. We seek to bound the cumulative amount of time Ik that τk is ready. Since we are using gang scheduling with fixed priorities, it follows that tasks with higher priority than τk must occupy at least m − hk + 1 processors for Ik time. (A) As illustrated in Figure 7.5,

this implies that the cumulative volume of interfering higher priority tasks, bounded by a height of m − hk + 1, must be equal to at least Ik × (m − hk + 1). (B) Given contribution $C^i_k$, by definition the cumulative time that τi executes while τk is ready is no more than $C^i_k$, and thus the interfering volume of τi is at most $C^i_k \times \min(h_i,\, m - h_k + 1)$. Based on (A) and (B), we obtain:

$\sum_{\forall i: i \neq k \wedge P_i \ge P_k} \big( C^i_k \times \min(h_i,\, m - h_k + 1) \big) \ge I_k \times (m - h_k + 1),$ (7.4)

which is equivalent to:

$I_k \le \sum_{\forall i: i \neq k \wedge P_i \ge P_k} \Big( C^i_k \times \dfrac{\min(h_i,\, m - h_k + 1)}{m - h_k + 1} \Big) = \sum_{\forall i \neq k} \big( I^i_k \times C^i_k \big).$ (7.5)

Now, due to the natural number time convention, $I_k \le \sum_{\forall i \neq k} (I^i_k \times C^i_k)$ implies $I_k \le \big\lfloor \sum_{\forall i \neq k} (I^i_k \times C^i_k) \big\rfloor$, completing the proof.

7.2.1 Bounding the Contribution

It remains to compute an upper bound to the contribution $C^i_k$ of τi on τk in a window of length t; we can then set t = Rk and use the iteration in Equation 7.1 to compute the response time. We rely on the common concept of workload.

Definition 7.4. The workload of task τi is the maximum cumulative time that τi is executed in a window of length t.

Observation 7.1. By definition, the contribution $C^i_k$ of a bundle τi is upper bounded by its workload.

As in the case of sequential tasks [39], the workload is maximized when the first job of τi starts executing as late as possible (with a starting time aligned with the beginning of the window) and later jobs are executed as soon as possible. The corresponding scenario is represented at the bottom of Figure 7.6, where the first job of τi starts executing Ti − ∆i time units after its arrival, with ∆i = Ti − Ri + Li. The corresponding workload bound $W^i(t)$ is represented at the top of Figure 7.6, and can be formally derived as:

$W^i(t) = \begin{cases} \min(t, L_i), & \text{if } t \le \Delta_i; \\ L_i + L_i \times \big\lfloor \frac{t - \Delta_i}{T_i} \big\rfloor + \min\big(L_i,\, (t - \Delta_i) \bmod T_i\big), & \text{otherwise.} \end{cases}$ (7.6)

Figure 7.6: The maximum workload of task τi within a window of time t

Based on Observation 7.1 and Equation 7.6 we thus have:

$C^i_k \le W^i(t).$ (7.7)

We are now ready to derive the response time Rk. Note that Equation 7.3 is monotonic in $C^i_k$, and Equation 7.6 is monotonic in t. Hence, based on Theorem 7.1, Lemma 7.1 and Equation 7.7, we can obtain the response time for gang scheduling with fixed priorities based on the following iteration:

$R_k \leftarrow L_k + \Big\lfloor \sum_{\forall i \neq k} I^i_k \times W^i(R_k) \Big\rfloor.$ (7.8)
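For concreteness, the following Python sketch evaluates Equations 7.6 and 7.8 for the rigid case. The tuple layout of the task parameters, and the assumption that the response-time bounds R[i] of the higher-priority tasks have already been computed (tasks processed in decreasing priority order), are ours; time values are assumed to be natural numbers as in the rest of the chapter.

import math

def workload(L_i, T_i, R_i, t):
    """Workload upper bound W^i(t) of Equation 7.6."""
    delta = T_i - R_i + L_i
    if t <= delta:
        return min(t, L_i)
    return L_i + L_i * ((t - delta) // T_i) + min(L_i, (t - delta) % T_i)

def interference_factor(h_i, h_k, P_i, P_k, m):
    """Interference factor I^i_k of Equation 7.2 (rigid tasks)."""
    if P_i < P_k:
        return 0.0
    return min(h_i, m - h_k + 1) / (m - h_k + 1)

def response_time_rigid(k, tasks, R, m):
    """Fixed-point iteration of Equation 7.8 for rigid task k; tasks[i] = (L, h, T, D, P).
    Returns the bound R_k, or None if it exceeds D_k."""
    L_k, h_k, _, D_k, P_k = tasks[k]
    R_k = L_k
    while True:
        I = math.floor(sum(
            interference_factor(tasks[i][1], h_k, tasks[i][4], P_k, m)
            * workload(tasks[i][0], tasks[i][2], R[i], R_k)
            for i in range(len(tasks)) if i != k and tasks[i][4] >= P_k))
        if L_k + I > D_k:
            return None
        if L_k + I == R_k:
            return R_k
        R_k = L_k + I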

7.2.2 Analysis for Multiple Bundles

We now consider the general case where tasks comprise multiple bundles. We employ the same analysis scheme: in particular, note that Definition 7.1 and Theorem 7.1 remain true, although the number of required available cores changes based on the height hk,p of the current active bundle. Similarly, the definitions of contribution $C^i_k$ and workload $W^i(t)$ remain valid, since they do not depend on the height of individual bundles. However, Definition 7.3 and Lemma 7.1, which allow us to bound the interference, cannot be used as they assume constant height. To bound the interference for the bundled case, we thus extend our definitions to account for the contribution and interference of individual bundles.

Definition 7.5. The interference Ik,p on bundle τk,p is the cumulative time during which τk,p is ready.

Observation 7.2. Since at most one bundle of τk can be ready at any time, by definition it holds: $I_k = \sum_{p=1}^{b_k} I_{k,p}$.

Definition 7.6. The contribution $C^{i,j}_k$ of a bundle τi,j to the interference Ik, with i ≠ k, is the cumulative amount of time that τi,j is executed while τk is active. Similarly, the contribution $C^{i,j}_{k,p}$ of τi,j to the interference Ik,p, with i ≠ k, is the cumulative amount of time that τi,j is executed while τk,p is active.

Definition 7.7. The interference factor $I^{i,j}_{k,p}$ of bundle τi,j on τk,p is defined as:

$I^{i,j}_{k,p} = \begin{cases} \dfrac{\min(h_{i,j},\, m - h_{k,p} + 1)}{m - h_{k,p} + 1}, & \text{if } P_{i,j} \ge P_{k,p}; \\ 0, & \text{otherwise.} \end{cases}$ (7.9)

Lemma 7.2. For gang scheduling with fixed priorities, the following is a bound on the interference on bundle τk,p:

$I_{k,p} \le \Big\lfloor \sum_{\forall i \neq k} \sum_{j=1}^{b_i} \big( I^{i,j}_{k,p} \times C^{i,j}_{k,p} \big) \Big\rfloor.$ (7.10)

Proof. We seek to bound the cumulative amount of time Ik,p that τk,p is ready. Since we are using gang scheduling with fixed priorities, it follows that bundles with higher priority than τk,p must occupy at least m − hk,p + 1 processors for Ik,p time. (A) As illustrated in Figure 7.5, this implies that the cumulative volume of interfering higher priority bundles, bounded by a height of m − hk,p + 1, must be equal to at least Ik,p × (m − hk,p + 1). (B) Given contribution $C^{i,j}_{k,p}$, by definition the cumulative time that τi,j executes while τk,p is ready is no more than $C^{i,j}_{k,p}$, and thus the interfering volume of τi,j is at most $C^{i,j}_{k,p} \times \min(h_{i,j},\, m - h_{k,p} + 1)$. Based on (A) and (B), we obtain:

$\sum_{\forall i: i \neq k \wedge P_{i,j} \ge P_{k,p}} \sum_{j=1}^{b_i} \big( C^{i,j}_{k,p} \times \min(h_{i,j},\, m - h_{k,p} + 1) \big) \ge I_{k,p} \times (m - h_{k,p} + 1),$ (7.11)

which is equivalent to:

$I_{k,p} \le \sum_{\forall i: i \neq k \wedge P_{i,j} \ge P_{k,p}} \sum_{j=1}^{b_i} \Big( C^{i,j}_{k,p} \times \dfrac{\min(h_{i,j},\, m - h_{k,p} + 1)}{m - h_{k,p} + 1} \Big) = \sum_{\forall i \neq k} \sum_{j=1}^{b_i} \big( I^{i,j}_{k,p} \times C^{i,j}_{k,p} \big).$ (7.12)

Now, due to the natural number time convention, $I_{k,p} \le \sum_{\forall i \neq k} \sum_{j=1}^{b_i} \big( I^{i,j}_{k,p} \times C^{i,j}_{k,p} \big)$ implies $I_{k,p} \le \big\lfloor \sum_{\forall i \neq k} \sum_{j=1}^{b_i} \big( I^{i,j}_{k,p} \times C^{i,j}_{k,p} \big) \big\rfloor$, completing the proof.

Next, we extend the definition of workload.

Definition 7.8. The workload of bundle τi,j is the maximum cumulative time that τi,j is executed in a window of length t.

Observation 7.3. By definition, both the contributions $C^{i,j}_k$ and $C^{i,j}_{k,p}$ of a bundle τi,j are upper bounded by its workload.

We can determine an upper bound $W^{i,j}(t)$ to the workload of τi,j using the same scenario as in Section 7.2.1: the first job of τi,j starts executing as late as possible and later jobs are executed as soon as possible. Since we make no assumption on the minimum execution time of bundles of τi executed before τi,j, in the worst case τi,j can start executing immediately when the corresponding job of τi arrives. On the other hand, the following lemma bounds the latest time $T_i - \Delta^j_i$ that τi,j can start executing after the arrival of τi.

Lemma 7.3. Let Ri ≤ Di be an upper bound to the response time of task τi. Then each bundle τi,j must start executing no later than $R_i - \sum_{q=j}^{b_i} l_{i,q}$ time units after the arrival of the corresponding job of τi.

Proof. Consider any window of time Ri starting with the arrival time of a job of τi; since Ri ≤ Di and deadlines are constrained, only one job of τi can execute in the window. Since furthermore in the worst case the job must execute for Li time units, the interference Ii in the time window cannot be greater than Ri − Li. Now consider bundle τi,j. Since the scheduler is work conserving and bundles are always executed in order, it follows that bundle τi,j must start executing no later than $I_i + \sum_{q=1}^{j-1} l_{i,q}$ from the beginning of the

window; note that $\sum_{q=1}^{j-1} l_{i,q}$ represents the worst case execution time of the bundles before τi,j. This yields:

$I_i + \sum_{q=1}^{j-1} l_{i,q} \le R_i - L_i + \sum_{q=1}^{j-1} l_{i,q} = R_i - \sum_{q=j}^{b_i} l_{i,q},$ (7.13)

concluding the proof.

Based on Lemma 7.3, we thus have $\Delta^j_i = T_i - R_i + \sum_{q=j}^{b_i} l_{i,q}$. Again referring to Figure 7.6, we obtain:

$W^{i,j}(t) = \begin{cases} \min(t, l_{i,j}), & \text{if } t \le \Delta^j_i; \\ l_{i,j} + l_{i,j} \times \big\lfloor \frac{t - \Delta^j_i}{T_i} \big\rfloor + \min\big(l_{i,j},\, (t - \Delta^j_i) \bmod T_i\big), & \text{otherwise.} \end{cases}$ (7.14)

Finally, later in the section it will become useful to compute a linear approximation $\overline{W}^{i,j}(t)$ to the workload. Based on Figure 7.6, the resulting line has a slope of $l_{i,j}/T_i$ and has a value $\overline{W}^{i,j}(\Delta^j_i + l_{i,j}) = W^{i,j}(\Delta^j_i + l_{i,j}) = 2 l_{i,j}$, which yields:

$\overline{W}^{i,j}(t) = \dfrac{l_{i,j}}{T_i} \times (T_i + R_i - 2 l_{i,j} + t).$ (7.15)

Note that similarly to $W^{i,j}(t)$, $\overline{W}^{i,j}(t)$ is also monotonic in t. Furthermore, to tighten the linear bound, we also notice that by definition it must hold $C^{i,j}_{k,p} \le t$ for any window of length t. In summary, all the following bounds (which will later be used as part of the proofs) hold for a window of length t:

$C^{i,j}_k \le W^{i,j}(t),$ (7.16)
$C^{i,j}_{k,p} \le \overline{W}^{i,j}(t),$ (7.17)
$C^{i,j}_{k,p} \le t.$ (7.18)

Finally, using the same strategy as in Equation 7.8 and based on Observation 7.2, we could now determine Ik(Rk) by summing the interference bounds in Equation 7.10, thus obtaining:

$R_k \leftarrow L_k + \sum_{p=1}^{b_k} \Big\lfloor \sum_{\forall i \neq k} \sum_{j=1}^{b_i} I^{i,j}_{k,p} \times W^{i,j}(R_k) \Big\rfloor.$ (7.19)

7.2.3 Tightening the Analysis

Unfortunately, the bound resulting from Equation 7.19 is extremely pessimistic: when considering a higher priority bundle τi,j, a single window of time of length Rk is used to determine its contribution on each bundle τk,p of the task under analysis. Hence, we are effectively counting the same job of τi,j multiple times, by having it interfere with all the bundles of τk at the same time. However, in reality this is not possible, as the following observation notices:

Observation 7.4. Since at most one bundle of τk can be active at any time, by definition it holds: $C^{i,j}_k = \sum_{p=1}^{b_k} C^{i,j}_{k,p}$.

Figure 7.7: Worst-case contributions of interfering workloads. A: contributing to the first bundle. B: contributing to the second bundle.

To obtain a tighter analysis, we thus seek to determine the contributions $C^{i,j}_{k,p}$ on each bundle τk,p that maximize the total interference Ik. To give a concrete example, Figure 7.7 shows two cases of the same workloads. In case (A), the higher-priority bundles are assumed to contribute to the interference of the first bundle (τk,1), whereas in case (B) they contribute to the second bundle (τk,2). It is clear from the figure that the contributions $C^{i,j}_{k,1}$ and $C^{i,j}_{k,2}$ are important to determine the total interference suffered by τk. In particular, in this specific example, the worst-case response time of τk is obtained by setting $\forall i, j: C^{i,j}_{k,1} = 0$ and $\forall i, j: C^{i,j}_{k,2} = l_{i,j}$, which conforms to case (B).

Listing 7.1 LP Problem formulation for Interference Ik(t)

Input: window length t, task parameters
Output: $I_k(t) = \big\lfloor \sum_{p=1}^{b_k} \overline{I}_{k,p} \big\rfloor$
Free Variables: $\forall i \neq k, j, p: \; C^{i,j}_{k,p} \in \mathbb{R}_{\ge 0}$
Objective: maximize $\sum_{p=1}^{b_k} \overline{I}_{k,p}$
Subject to:
1a. $\forall i \neq k, j: \; \sum_{p=1}^{b_k} C^{i,j}_{k,p} \le W^{i,j}(t)$
1b. $\forall i \neq k: \; \sum_{j=1}^{b_i} \sum_{p=1}^{b_k} C^{i,j}_{k,p} \le W^{i}(t)$
    ↑ workload constraints on τk
2a. $\forall i \neq k, j, p: \; C^{i,j}_{k,p} \le \overline{W}^{i,j}(l_{k,p} + \overline{I}_{k,p})$
2b. $\forall i \neq k, j, p: \; C^{i,j}_{k,p} \le l_{k,p} + \overline{I}_{k,p}$
    ↑ workload constraints on τk,p
3. $\forall i \neq k, j, p: \; C^{i,j}_{k,p} \le l_{i,j}$ if $\exists q \neq j: SF^{i,q}_{k,p} = \text{true}$
    ↑ bundle-priority constraints
Where:
$\overline{I}_{k,p} = \sum_{\forall i \neq k} \sum_{j=1}^{b_i} \big( I^{i,j}_{k,p} \times C^{i,j}_{k,p} \big)$
$SF^{i,q}_{k,p} = (P_{i,q} < P_{k,p}) \wedge (h_{i,q} + h_{k,p} > m) \wedge \big( (h_{i,q} + m - h_{k,p} + 1 > m) \vee (\forall \tau_{y,z},\, y \neq k \wedge P_{y,z} \ge P_{k,p}: h_{i,q} + h_{y,z} > m) \big)$

We next introduce a linear programming formulation that computes the required upper bound Ik(t) by finding contributions $C^{i,j}_{k,p}$ that maximize the interference. The interference bound is obtained by summing individual interference upper bounds $\overline{I}_{k,p}$ for each bundle τk,p. The contributions are subject to five constraints. Constraint 1a bounds the overall contribution $C^{i,j}_k = \sum_{p=1}^{b_k} C^{i,j}_{k,p}$ on task τk using Equation 7.16, thus satisfying Observation 7.4. However, as noticed in Section 7.2.2, $W^{i,j}(t)$ is derived assuming that τi,j executed for its worst case time, while bundles of τi before τi,j executed for zero time. Clearly, this situation cannot happen for every bundle of τi; hence, when writing Constraint 1a over all j, we can effectively over-estimate the workload of task τi. We make one final observation:

Observation 7.5. Since at most one bundle of τi can be active at any time, by definition it holds: $C^i_k = \sum_{j=1}^{b_i} C^{i,j}_k$.

Therefore, based on Observations 7.4, 7.5 and Equation 7.7, in Constraint 1b we also bound the contribution of τi on τk such that:

$C^i_k = \sum_{j=1}^{b_i} C^{i,j}_k = \sum_{j=1}^{b_i} \sum_{p=1}^{b_k} C^{i,j}_{k,p} \le W^i(t).$ (7.20)

Moving on, Constraints 2a, 2b bound the contributions on each bundle under analysis τk,p, rather than on the entire task τk. Note that here, $l_{k,p} + \overline{I}_{k,p}$ represents a bound on the time that τk,p can be active, that is, the response time of the bundle. We use the linearized upper bounds in Equations 7.17, 7.18 rather than the workload bound $W^{i,j}(t)$ to ensure that the formulation is a linear programming problem; note that for the same reason, we remove the floor from Equation 7.10 in the expression for the interference bound $\overline{I}_{k,p}$, and we relax the variables $C^{i,j}_{k,p}$ to be real rather than natural numbers. Finally, Constraint 3 adds a restriction on the values of the contributions that allows us to tighten the analysis. The condition $SF^{i,q}_{k,p}$ evaluates to true if a bundle τi,q of τi cannot execute while τk,p is active. Consider another bundle τi,j of τi with higher priority than τk,p. Since bundles are always executed in order, between the execution of any two jobs of τi,j, τi,q must be executed once. But if $SF^{i,q}_{k,p}$ = true this is not possible while τk,p is active, meaning that no more than one job of τi,j can interfere with τk,p; hence, we obtain $C^{i,j}_{k,p} \le l_{i,j}$. To understand how the condition $SF^{i,q}_{k,p}$ is derived, refer to Figure 7.8, and consider a bundle τi,q with hi,q = 2 and lower priority than τk,p (Pi,q < Pk,p). Note that in both Figure 7.8(A) and Figure 7.8(B), τi,q cannot execute while τk,p is also executing because there are not enough cores for both bundles (hi,q + hk,p > m). In Figure 7.8(A), τi,q also cannot execute

in parallel with the interference rectangle, since its height (m − hk,p + 1 = 2) leaves only one core free, while τi,q requires two (that is, hi,q + m − hk,p + 1 > m). In Figure 7.8(B), the height of the interference rectangle is 1, but all bundles that have a priority higher than τk,p (τy,z with y ≠ k and Py,z ≥ Pk,p), and can thus interfere with it, have a height of at least 2, which again does not leave enough cores for τi,q (hi,q + hy,z > m). In either case, τi,q cannot execute in parallel with the interference rectangle; and since it also cannot execute in parallel with τk,p, it follows that it cannot run while τk,p is active. The next two lemmas formally prove the correctness of the constraint.

Figure 7.8: Examples of lower-priority bundles' ability to run

Lemma 7.4. If $SF^{i,q}_{k,p}$ is true, then τi,q cannot execute while τk,p is active.

Proof. Since $SF^{i,q}_{k,p}$ holds, we have hi,q + hk,p > m; hence, τi,q and τk,p cannot execute simultaneously on m cores.

Next, assume that τk,p is ready. Since it also holds Pi,q < Pk,p, other bundles with higher priority than both τk,p and τi,q must occupy at least m − hk,p + 1 cores. Given $SF^{i,q}_{k,p}$ = true, we finally have hi,q + m − hk,p + 1 > m or ∀τy,z, y ≠ k ∧ Py,z ≥ Pk,p : hi,q + hy,z > m. In either case, there are not enough cores available for τi,q to execute in parallel with the higher priority bundles. In summary, we have shown that if $SF^{i,q}_{k,p}$ holds, then τi,q cannot execute while τk,p is either ready or executing. Hence, τi,q cannot execute while τk,p is active.

Lemma 7.5. In any window of time where at most one job of τk,p is active, if $SF^{i,q}_{k,p}$ is true for a bundle τi,q with i ≠ k, then we have $C^{i,j}_{k,p} \le l_{i,j}$ for all other bundles τi,j of τi.

Proof. Let ta be the earliest time in the window in which the only job of τk,p is active, and let tb be the last such time (if there is no such job, then trivially $C^{i,j}_{k,p} = 0$ and the lemma holds). After being activated, a bundle remains active until it finishes; hence, [ta, tb] is a continuous interval of time. By contradiction, assume that $C^{i,j}_{k,p} > l_{i,j}$; then, at least two jobs of τi,j must execute within [ta, tb]. However, since bundles are always executed in order, at least one job of τi,q must also execute between the finishing time of the first job and the time the second job of τi,j becomes active. This implies that τi,q executes in [ta, tb], which is impossible by Lemma 7.4.

We can now show that Listing 7.1 computes a valid upper bound Ik(t) in Theorem 7.2. We start with a helper lemma.

Lemma 7.6. Consider a window of time of length t ≤ Dk starting with the arrival of a job of τk. The contributions $C^{i,j}_{k,p}$ for such a window must satisfy Constraints 1-3 in Listing 7.1.

Proof. We show that the values of the contributions $C^{i,j}_{k,p}$ must satisfy all five constraints in Listing 7.1.

Constraint 1a: by Observation 7.4 and Equation 7.16 we have $\sum_{p=1}^{b_k} C^{i,j}_{k,p} = C^{i,j}_k \le W^{i,j}(t)$. Hence, the constraint must be satisfied.

Constraint 1b: similarly to the case of Constraint 1a, by Observations 7.4, 7.5 and Equation 7.7 we obtain $\sum_{j=1}^{b_i} \sum_{p=1}^{b_k} C^{i,j}_{k,p} \le W^i(t)$.

Constraint 2a: note that the constraint is equivalent to $C^{i,j}_{k,p} \le \overline{W}^{i,j}\big(l_{k,p} + \sum_{\forall i \neq k} \sum_{j=1}^{b_i} I^{i,j}_{k,p} \times C^{i,j}_{k,p}\big)$. By contradiction, assume the constraint is violated. By monotonicity of $\overline{W}^{i,j}(t)$ and Lemma 7.2, this yields:

$C^{i,j}_{k,p} > \overline{W}^{i,j}\Big(l_{k,p} + \sum_{\forall i \neq k} \sum_{j=1}^{b_i} \big( I^{i,j}_{k,p} \times C^{i,j}_{k,p} \big)\Big) \ge \overline{W}^{i,j}\Big(l_{k,p} + \Big\lfloor \sum_{\forall i \neq k} \sum_{j=1}^{b_i} \big( I^{i,j}_{k,p} \times C^{i,j}_{k,p} \big) \Big\rfloor\Big) \ge \overline{W}^{i,j}(l_{k,p} + I_{k,p}).$ (7.21)

But since the schedule is work conserving, τk,p cannot be active for more than lk,p + Ik,p time units; hence, based on Equation 7.17 applied to a window of length t = lk,p + Ik,p, we must have $C^{i,j}_{k,p} \le \overline{W}^{i,j}(l_{k,p} + I_{k,p})$, a contradiction.

Constraint 2b: similarly to the case of Constraint 2a, if the constraint is violated we obtain $C^{i,j}_{k,p} > l_{k,p} + I_{k,p}$, which contradicts Equation 7.18.

Constraint 3: since we are assuming a window of length t ≤ Dk starting with the arrival of a job of τk, it follows that at most one job of τk,p can be active in the window; hence, Lemma 7.5 must hold and the constraint must be satisfied.

Theorem 7.2. The LP formulation in Listing 7.1 computes an upper bound to the interference Ik for a window of time of length t ≤ Dk starting with the arrival of a job of τk. Furthermore, the bound is monotonic in t.

Proof. We first show that Listing 7.1 is indeed a Linear Programming problem. The free variables $C^{i,j}_{k,p}$ are (non-negative) real variables. Note that $SF^{i,q}_{k,p}$, $W^{i,j}(t)$ and $W^i(t)$ are constants for a given input t. $\overline{I}_{k,p}$ is a linear expression of the variables $C^{i,j}_{k,p}$, while the objective function and Constraints 2a, 2b are linear in $\overline{I}_{k,p}$; but since the composition of linear functions is linear, all expressions are indeed linear in the free variables. We next show that the computed value of the objective function is monotonic in t. Note that in Listing 7.1, the value of t is only used to compute the functions $W^{i,j}(t)$ and $W^i(t)$ in Constraints 1a and 1b. But since both functions are monotonic in t, it follows that increasing t leads to a relaxation of the constraints; hence, the objective function cannot decrease.

Finally, we show that Listing 7.1 computes a valid upper bound for Ik. First note that any assignment to the variables $C^{i,j}_{k,p}$ that maximizes the objective function $\sum_{p=1}^{b_k} \overline{I}_{k,p}$ also maximizes the value of $I_k(t) = \big\lfloor \sum_{p=1}^{b_k} \overline{I}_{k,p} \big\rfloor$. Furthermore, based on Observation 7.2 and Lemma 7.2 it holds:

$I_k(t) = \Big\lfloor \sum_{p=1}^{b_k} \overline{I}_{k,p} \Big\rfloor = \Big\lfloor \sum_{p=1}^{b_k} \sum_{\forall i \neq k} \sum_{j=1}^{b_i} \big( I^{i,j}_{k,p} \times C^{i,j}_{k,p} \big) \Big\rfloor \ge \sum_{p=1}^{b_k} \Big\lfloor \sum_{\forall i \neq k} \sum_{j=1}^{b_i} \big( I^{i,j}_{k,p} \times C^{i,j}_{k,p} \big) \Big\rfloor \ge \sum_{p=1}^{b_k} I_{k,p} = I_k;$ (7.22)

that is, for a given set of contributions $C^{i,j}_{k,p}$, $I_k(t)$ is a valid bound to the interference Ik. Now, by Lemma 7.6, we know that any valid set of contributions in a window of time

of length t ≤ Dk starting with the arrival of a job of τk must respect all constraints of the optimization problem; and since furthermore the optimization problem relaxes the value of the variables (from natural numbers to real), it follows that no valid values of the contributions in the studied window can produce a higher value of $I_k(t)$. This shows that $I_k(t)$ is indeed an upper bound to the interference for any possible valid values of the contributions, concluding the proof.

Based on Theorem 7.2, the response time Rk of the task under analysis is then computed using Equation 7.1, where Ik(Rk) is obtained using Listing 7.1.
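As an illustration of how Listing 7.1 can be implemented, the following sketch builds the LP with the PuLP library (an assumption; any LP solver would do) using hypothetical per-task arrays for the bundle parameters. For brevity, Constraint 3 is omitted, which can only make the computed bound larger, so the result remains a valid (if looser) upper bound.

import math
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, value

def stair(length, delta, period, t):
    """Staircase workload bound of Equations 7.6 / 7.14."""
    if t <= delta:
        return min(t, length)
    return length + length * ((t - delta) // period) + min(length, (t - delta) % period)

def interference_lp(k, t, l, h, P, T, R, m):
    """Upper bound I_k(t) following Listing 7.1 (Constraints 1a-2b only).
    l[i][j], h[i][j], P[i][j]: length, height, priority of bundle j of task i;
    T[i], R[i]: period and response-time bound of task i; m: number of cores."""
    bk = len(l[k])
    others = [i for i in range(len(l)) if i != k]
    prob = LpProblem("interference", LpMaximize)
    C = {(i, j, p): LpVariable(f"C_{i}_{j}_{p}", lowBound=0)
         for i in others for j in range(len(l[i])) for p in range(bk)}

    def IF(i, j, p):                      # interference factor of Equation 7.9
        if P[i][j] < P[k][p]:
            return 0.0
        return min(h[i][j], m - h[k][p] + 1) / (m - h[k][p] + 1)

    def W_lin(i, j, x):                   # linearized workload of Equation 7.15
        return (l[i][j] / T[i]) * (T[i] + R[i] - 2 * l[i][j] + x)

    I_bar = {p: lpSum(IF(i, j, p) * C[i, j, p]
                      for i in others for j in range(len(l[i]))) for p in range(bk)}
    prob += lpSum(I_bar.values())         # objective: sum of per-bundle interference terms

    for i in others:
        L_i = sum(l[i])
        for j in range(len(l[i])):        # Constraint 1a, using W^{i,j}(t)
            d_ij = T[i] - R[i] + sum(l[i][j:])
            prob += lpSum(C[i, j, p] for p in range(bk)) <= stair(l[i][j], d_ij, T[i], t)
        # Constraint 1b, using W^i(t)
        prob += lpSum(C[i, j, p] for j in range(len(l[i]))
                      for p in range(bk)) <= stair(L_i, T[i] - R[i] + L_i, T[i], t)

    for p in range(bk):
        for i in others:
            for j in range(len(l[i])):
                prob += C[i, j, p] <= W_lin(i, j, l[k][p] + I_bar[p])   # Constraint 2a
                prob += C[i, j, p] <= l[k][p] + I_bar[p]                # Constraint 2b

    prob.solve()
    return math.floor(value(prob.objective))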

7.2.4 Priority Assignment

Based on the presented analysis, we now discuss how to assign priorities to tasks or bundles. For the case of per-task priorities, it is well-known that even for sequential tasks, neither Rate-Monotonic (RM) nor Deadline-Monotonic (DM) assignments are optimal in global scheduling [50]. Audsley's Optimal Priority Assignment (OPA) algorithm is optimal for some schedulability analyses [23], but the proof of optimality requires a key condition: the schedulability of a task under analysis should not depend on the relative ordering of higher priority tasks. Unfortunately, this is not true for either our approach or related work [109], since the workload upper bounds for a higher priority task τi in Equations 7.6, 7.14 are functions of the response time Ri of τi, which itself depends on tasks with priority higher than τi. Hence, in the rest of this chapter, we consider the DM policy for per-task priority assignment. However, for the case of per-bundle priorities, we propose a heuristic to further optimize the schedulability of bundled task sets. As noted when discussing Figure 7.5, the height of the interference rectangle for “tall” bundles is smaller, which in turn can lead to higher interference for a given amount of contribution. For this reason, the LP problem will attempt to assign as much contribution as possible to the tallest bundle of the task under analysis. Consequently, we propose to raise the priorities of tall bundles; this reduces the contribution that the LP can assign to such bundles, at the cost of generating some amount of priority inversion. In particular, we adopt the following simple heuristic, which modifies the DM priorities 1/Di based on the ratio hi,j/m between the height of a bundle and the number of cores in the system:

$P_{i,j} = \dfrac{1}{D_i} \times \Big( 1 + \dfrac{h_{i,j}}{m} \Big).$ (7.23)

Note that we scale the impact of hi,j based on the deadline to avoid causing too much priority inversion. We evaluate the quality of the heuristic in Section 7.3 by comparing it against the per-task DM assignment.
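A one-line sketch of the heuristic, with hypothetical input arrays D (task deadlines) and heights (per-task lists of bundle heights):

def bundle_priorities(D, heights, m):
    """Per-bundle priorities of Equation 7.23 (larger value = higher priority)."""
    return [[(1.0 / D[i]) * (1.0 + h / m) for h in heights[i]] for i in range(len(D))]

# Example on m = 8 cores: a task with D = 100 gets priority 0.02 for a height-8
# bundle and 0.01125 for a height-1 bundle, so its tall bundle is promoted slightly.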

7.2.5 Discussion

Sustainability: as discussed in Section 7.1.1, an executing bundle always reserves the same number of cores hi,j. However, the bundle could execute for less than its worst case length li,j. The analysis is sustainable with respect to changes in bundle length: in particular, note that Lemma 7.3 shows that the worst case scenario used to derive the workload $W^{i,j}(t)$ remains correct even if some bundles of τi execute for less than their worst case length.

Holistic Analysis: as noted in Section 7.2.4, Equations 7.6, 7.14 assume that a bound Ri on the response time of each interfering task τi is known. When priorities are assigned per-task and are distinct, it is sufficient to compute the response time of each task according to Equation 7.1, from the highest priority task to the lowest priority one. However, in the case of per-bundle priorities, this poses a problem: some bundles of τi could have higher priorities than some bundles of τk and vice-versa, creating a circular relation between Ri and Rk. We can solve the issue by employing a standard holistic analysis framework, as shown in Algorithm 7.2. The algorithm starts by assigning a response time of Li to each task τi. Then, at each iteration, the algorithm computes a new set of response times R′k as the fixed points of Equation 7.1, applied to each task under analysis τk using the response times Ri of the previous iteration. The algorithm continues until the response times converge with R′i = Ri for all tasks, or we obtain R′k > Dk for at least one task, in which case we fail to prove the task set schedulable.

It is immediate to see that if the algorithm converges, then we have Ri ≤ Di for all tasks and the obtained Ri must indeed be valid upper bounds. Furthermore, note that $W^i(t)$, $W^{i,j}(t)$ and $\overline{W}^{i,j}(t)$ are monotonic in Ri. Hence, if the response times Ri of interfering tasks increase, then the response time R′k computed based on Equation 7.1 and Listing 7.1 cannot decrease. Due to the natural number time convention, this means that at each iteration of the algorithm either all response times are the same as in the previous iteration, in which case the algorithm terminates, or at least one response time must increase by a discrete amount. Hence, the algorithm terminates in a finite number of iterations.

Complexity and Precision: the complexity of each iteration of the fixed-point recursion in Equation 7.1 is dominated by the LP problem in Listing 7.1. Given $b_{max} = \max_{\forall i}\{b_i\}$, both the number of variables and the number of equations in the problem

Algorithm 7.2 Holistic Analysis Iteration

1: ∀i : R′i = Li
2: repeat
3:   ∀i : Ri = R′i
4:   for each τk do
5:     Compute R′k based on Equation 7.1 using
6:     the {Ri} values
7:     if R′k > Dk return unknown
8: until ∀i : R′i = Ri
9: return schedulable
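The iteration of Algorithm 7.2 can be sketched as follows, assuming a response_time(k, R) helper that runs the fixed-point iteration of Equation 7.1 (for example on top of the LP sketch above) using the interfering response times R, and that returns a value above D[k] if the iteration exceeds the deadline; L and D are the task WCETs and deadlines.

def holistic_analysis(L, D, response_time):
    """Holistic iteration of Algorithm 7.2; returns the response-time bounds,
    or None if the task set cannot be proven schedulable."""
    R_new = list(L)                        # line 1: R'_i = L_i
    while True:
        R = list(R_new)                    # line 3: R_i = R'_i
        for k in range(len(L)):            # lines 4-6
            R_new[k] = response_time(k, R)
            if R_new[k] > D[k]:            # line 7: give up
                return None
        if R_new == R:                     # line 8: converged
            return R                       # line 9: schedulable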

are $O(n \times b_{max}^2)$. Since an LP problem can be solved in polynomial time in the number of variables and constraints, the resulting schedulability analysis is of pseudo-polynomial complexity. We make two observations on the precision of the analysis. First, as noticed in Section 7.2, some constraint relaxations were required to obtain an LP problem, chief among them the substitution of $W^{i,j}(t)$, which is neither linear nor convex/concave, with $\overline{W}^{i,j}(t)$. These relaxations have been employed to avoid the complexity of solving a nonlinear problem. Second, while we presented our analysis based on the response time framework introduced in [39], the same analysis strategy could also be applied to the problem window framework [30, 81] by invoking Listing 7.1 based on the length of the window. The trade-off is a worse bound on $W^{i,j}(t)$ for tasks that include carry-in (an interfering task has carry-in if its first job arrives before the start of the time window) for the ability to limit the number of tasks that have carry-in. However, limiting carry-in is challenging in gang-scheduling: as pointed out in Section 2.2.1, the solution adopted in [81] is incorrect. We thus reserve such investigation as future work.

7.3 Evaluation

We evaluate the effectiveness of the proposed approach based on synthetic task set generation. A bundled task τi is generated by first uniformly selecting a total WCET Li in the interval [10, 150] and a number of bundles bi in [2, 5]. The period Ti is then uniformly generated in the interval [Li, 10 × Li], resulting in a task utilization Ui between Vi/(10 × Li) and Vi/Li, and a minimum possible period of 10 and maximum possible period of 1500. The deadline is assigned equal to the period (implicit deadline). To obtain a task set τ with a

given total utilization value, we continue generating tasks and adding them to the task set until we reach the desired utilization. We consider three types of task sets, meant to represent different types of applications, based on which we alter the generation of the height hi,j and length li,j of each bundle. For lightly-parallel task sets, each bundle is randomly categorized as either a short or a tall bundle. Bundles in the short category are assigned a height in the interval $[1, \lceil 0.3 \times m \rceil]$, while bundles in the tall category are assigned a height in $[m - \lceil 0.3 \times m \rceil + 1, m]$. As for the length, 80% of the WCET Li is randomly distributed among the short bundles, while the remaining 20% is distributed among the tall bundles. The generation process for heavily-parallel task sets is similar, except that 80% of the WCET is assigned to tall bundles and 20% to short ones. Finally, for mixed task sets, we do not divide the bundles into categories; instead, each bundle is assigned a height in [1, m], and Li is randomly distributed among all bundles.
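For illustration, the following sketch generates one lightly-parallel bundled task following the procedure above, using Python's random module. Where the text says the WCET shares are randomly distributed within a category, an even split is used here for simplicity, and the handling of the corner case where all bundles fall into one category is our own assumption.

import math
import random

def gen_lightly_parallel_task(m):
    """Generate (heights, lengths, period) for one lightly-parallel bundled task."""
    L = random.uniform(10, 150)                  # total WCET L_i
    b = random.randint(2, 5)                     # number of bundles b_i
    T = random.uniform(L, 10 * L)                # period (deadline = period)
    cats = [random.choice(("short", "tall")) for _ in range(b)]
    heights = [random.randint(1, math.ceil(0.3 * m)) if c == "short"
               else random.randint(m - math.ceil(0.3 * m) + 1, m) for c in cats]
    short = [q for q in range(b) if cats[q] == "short"]
    tall = [q for q in range(b) if cats[q] == "tall"]
    lengths = [0.0] * b
    if short and tall:
        shares = [(0.8 * L, short), (0.2 * L, tall)]   # 80% short / 20% tall
    else:
        shares = [(L, short or tall)]                  # all bundles in one category
    for budget, idx in shares:
        for q in idx:
            lengths[q] = budget / len(idx)             # even split of the budget
    return heights, lengths, T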


Figure 7.9: Schedulability test of the compared analyses on 8 cores with respect to task set types

Figure 7.9 shows results in terms of percentage of schedulable task sets based on total task set utilization, for each of the three task set types on a system with m = 8 cores. In Figure 7.10, we also show the case of a system with m = 32 cores; results are similar. We show five curves in each graph. bundles represents our analysis in Section 7.2 using per-task priorities, while bundles-priority represents the same analysis using the per-bundle priority assignment introduced in Section 7.2.4. In the case of rigid, we again apply our analysis, except that we consider a rigid model where each task comprises a single bundle of length Li and a height $h^{max}_i = \max_{j=1}^{b_i}\{h_{i,j}\}$. The thread curve represents the state-of-the-art response time analysis for fixed-priority thread scheduling of DAG tasks in [109]⁵. Since the analysis in [109] relies on knowledge of the critical path length and

volume of each DAG task, for comparison with the bundled model we set the critical path length equal to Li and the volume equal to Vi; indeed, note that when running in isolation on a system with m cores, as discussed in Section 7.1.2, the makespan of a DAG with maximum degree of parallelism m is equal to its critical path length. In practice, for real applications we expect that the volume under thread scheduling would be inflated due to the synchronization overheads of independently scheduled threads. Finally, federated represents the schedulability test based on the capacity augmentation bound provided in [93]. In federated, the task set is considered not schedulable if it requires more cores than the total number of cores provided by the system.

Based on the figure, it can be observed that rigid performs poorly in the case of lightly-parallel tasks. This is expected due to the pessimism in the rigid model, as the rigid box of a task ($L_i \times h^{max}_i$) overestimates its actual volume Vi. As tasks become more “packed”, such that the difference between the rigid box and the actual volume is smaller, the performance of the rigid model becomes closer to the one of the bundle model. Clearly, in the extreme case of a task set where each task always requires $h^{max}_i$ cores, there would be no difference between the models. In addition, the suggested per-bundle priority heuristic is mostly useful for the case of lightly-parallel tasks. By raising the priority of tall bundles, the suggested heuristic causes some amount of priority inversion to higher priority tasks. In the case of lightly-parallel tasks, tall bundles have shorter length; hence, the amount of interference caused by priority inversion is smaller compared to the other cases.

Note that the global thread scheduling approach performs comparatively poorly for highly-parallel task sets. The reason is that the analysis in [109] suffers from an extra self-interference term (Vi − Li)/m, which does not exist in gang scheduling. For more packed task sets, this term grows with respect to the WCET Li. On the other hand, federated scheduling can suffer from core overprovisioning in a different way than the rigid gang model. In federated scheduling, a parallel task is allocated the minimum number of cores that makes it barely schedulable. In the worst case, one complete core might be wasted for each parallel task. Although the schedulability test in federated scheduling [93] does not suffer from self interference as in global thread scheduling, depending on how much core overprovisioning is incurred, federated scheduling can do comparatively worse or better than global thread scheduling. Note that as the number of cores in the system increases, federated scheduling improves over global thread scheduling, as shown in Figure 7.10, due to more available cores that can be dedicated to parallel tasks.

⁵While [109] discusses conditional DAG tasks, it can also be applied to traditional non-conditional DAG tasks, which are a special case of the former.

Figure 7.10: Schedulability test of the compared analyses on 32 cores with respect to task set types

7.4 Bundled Scheduling for the 3-Phase Task Model

The bundled scheduling model presented so far does not enforce isolation among parallel tasks, as it does not manage accesses to main memory. Hence, we next discuss how to integrate the 3-phase task model with bundled scheduling. As discussed in Section 3.2, the 3-phase task model mainly suffers under a preemptive policy, due to the excessive delay in reloading the preempted task, which wastes time in memory transfers; hence, the main question we need to address to extend the bundled model is: can we adopt a non-preemptive scheduling policy?

Unfortunately, it is easy to see that under a work-conserving, non-preemptive gang scheduling policy, a task can suffer unbounded blocking due to lower-priority bundles. Consider the example in Figure 7.11, where a higher-priority bundle that requires 4 cores is released just after three lower-priority bundles are scheduled on the cores. In the preemptive case, when the higher-priority bundle is released at time zero, the scheduler preempts the lower-priority bundles and schedules it on all four cores. In the non-preemptive case, the higher-priority bundle instead waits for the first lower-priority bundle to finish. However, when that bundle finishes and frees one core, the pending higher-priority bundle still cannot be scheduled, as it requires all four cores; since the scheduler is work-conserving, it instead schedules a newly arrived lower-priority bundle that requires one core. As this scenario can repeat indefinitely, the higher-priority bundle can go on pending indefinitely, never obtaining all four cores at the same time.

To overcome the aforementioned issue, we propose to adopt a deferred preemption approach [51]. Specifically, a lower-priority bundle is allowed to run for σ time units before it can be preempted by a higher-priority bundle.

Figure 7.11: Preemptive versus non-preemptive gang scheduling

Unfortunately, a naive application of deferred preemption does not prevent unbounded blocking; the problem stems from the unsynchronized preemption points on different cores, causing scheduling decisions for each core to happen at different times. This timing behavior prevents the work-conserving scheduler from making system-wide scheduling decisions for all cores and limits its decisions to just a few free cores. Figure 7.12 shows an example of the problem (deferred preemption) and the proposed solution (synchronous deferred preemption).

Figure 7.12: Deferred preemption versus synchronous deferred preemption in gang scheduling

In the deferred preemption case, a bundle is scheduled on the assigned set of cores as soon as it arrives if the cores are available. After that the bundle can run for σ time

169 before it can be preempted (red-dashed lines). Since different bundles can be activated and scheduled on different cores at different times, the respective preemption points of each bundle can also be at different times. As shown in the figure, it is possible that another bundle arrives just before the preemption point and prevents the higher priority bundle from being scheduled. Just like the non-preemptive case, this scenario can be repeated indefinitely. In a non-gang scheduling algorithm, this problem does not exist since scheduling decisions are local to cores, thus there will be no pending task as long as there is at least one idle core. The proposed solution, synchronous deferred preemption, relies on synchronizing the context-switches on all cores according to a system-wide time-triggered event that fires at a fixed rate (every σ). As shown on the right side of Figure 7.12, scheduling decisions happen at the synchronous preemption points only. For example, unlike in the traditional deferred preemption case, although τ2 arrives before the preemption point of τ1, it cannot be scheduled right when it arrives. Instead, it has to wait until the next system-wide preemption point, which is strictly at time one in the example. Since τ1 already executed for one time slot of σ and τh is pending, the scheduler will preempt τ1, schedule τh, and keep the lower-priority τ2 pending. The downside of this solution is the wasted time due to synchronization. Specifically, if a bundle finishes execution before the preemption point, the relevant cores stay idle until the next preemption point. This is depicted in the figure at time 4 and time 6. The wasted time is maximized when the bundle finishes ε time after the preemption point, thus wasting almost a complete slot. To account for the wasted CPU time due to the synchronous preemption policy, we thus incorporate the wasted CPU time into the WCETs of the bundles, by considering an extended bundle length lk,p that is a multiple of σ:

$\overline{l}_{k,p} = \Big\lceil \dfrac{l_{k,p}}{\sigma} \Big\rceil \times \sigma.$ (7.24)

According to our proposed execution model, a bundle needs to be loaded into the local memories of the targeted cores before it can run. Similar to Section 3.3, we assume a fixed-size time slot for DMA operations. In addition, we assign the slot size for DMA operations to be equal to the slot size between successive preemption points, effectively making the scheduling decisions for both cores and DMA at the preemption points. We further assume that the size of the allocated time slot σ is sufficient to load one partition in each of the m cores, including any required unload operation to free up the targeted partitions. The DMA operations are non-preemptive; thus, once a bundle starts loading, it is guaranteed to run on the CPU for at least σ time during the next time slot. Unlike the eager DMA

loading policy for the non-preemptive cases in Sections 3.2, 3.3 and in Chapter 5, we apply a lazy DMA loading mechanism, meaning that a bundle is only loaded one slot before it is executed, even if free partitions are available beforehand. This policy avoids unnecessary blocking in the case where a lower-priority bundle would be loaded, but not executed in the next time slot; otherwise, the lower-priority bundle would have to execute at some point for at least one slot once it has been loaded, hence introducing an extra slot of blocking.

Figure 7.13: Example schedule of integrating bundles into the 3-phase execution model

Figure 7.13 shows an example of bundle scheduling integrated with the 3-phase execution model. In the example, there are six tasks with ordered priorities: τ6 is the lowest priority and τ1 is the highest priority. τ3 and τ4 consist of two bundles each, whereas the other tasks consist of one bundle each. The scheduling decisions happen at the beginning of the time slots, times 0, 1, 2, ..., 12 in the figure. Each DMA operation in the figure is annotated with the name of the bundle and the number of partitions on different cores that it loads, which is equal to the height of the bundle. Since we consider a sporadic task system, the first bundle in a task can arrive at any time. Each subsequent bundle in the same task is activated after the previous bundle finishes, synchronously with the time slots. Due to the asynchronous nature of the arrival of the first bundle, it is possible that it arrives in the middle of a slot, in which case it cannot start loading until the next slot, even if there are free partitions. In the worst case, a task might arrive ε time after the

beginning of a slot, effectively wasting it. Therefore, we account for an extra σ of blocking time for each task due to the asynchronous arrival of its first bundle. For example, consider the arrival of bundle τ4,1; this bundle arrives before time 0 when all cores are idle, but it has to wait until time 0 to start loading. We also account for an extra delay of σ time units for each bundle to load before it can be executed, hence σ · bk in total for a task τk with bk bundles. This is necessary because bundles of the same task cannot overlap (load a bundle while the previous bundle of the same task is executing), as there might be data interdependence between them, similarly to the multi-segment tasks discussed in Section 3.2.2. This case is depicted in the figure with bundle τ3,2. Specifically, the release of τ3,2 is aligned with the finish time of the previous bundle τ3,1, which is in turn aligned with the time slot at time 3. Therefore, the scheduler selects the next bundle τ3,2 for loading immediately at time 3 to be able to execute it one slot later at time 4. Note that since no bundle of τ3 can run in the interval [3, 4], the scheduler can instead execute a lower priority bundle, τ4,1 in this case.

Note that, although τ5,1 arrives in the same time slot as τ4,1 and enough partitions are available for all of them, at time 0 the scheduler only loads the highest-priority bundle that can run in the next slot, τ4,1 in this case. Otherwise, τ5,1 could have blocked τ3,1 for one extra slot. Therefore, in this scheduling scheme, higher-priority bundles cannot suffer blocking from lower-priority bundles.

Obviously, a bundle under analysis τk,p still suffers interference due to preemption from higher-priority bundles. Note that after being preempted, the bundle under analysis might need to be reloaded. This happens when the bundle under analysis is evicted from the local memory partitions to schedule other higher-priority bundles, or lower-priority bundles that can run in parallel with the higher-priority bundles; this is the case of bundle τ3,2 in the figure, which is unloaded once preempted at time 5, and then resumes execution at time 9. However, the system scheduler, as discussed in Section 7.1.1, is assumed to know the WCETs of all bundles. Consequently, we can hide the delay of reloading preempted tasks by reloading the preempted bundle at the beginning of the last time slot where interfering bundles run. This is possible since there must be at least m memory partitions that are either free or occupied by lower-priority bundles during that time slot, as there will be no more pending higher-priority bundles that can interfere with τk,p. If the higher-priority interfering bundles execute for less than their WCETs, it is still guaranteed that the preempted bundle can be reloaded in the expected time slot or before; hence, in either case, the reload time is completely hidden. Note that the lower-priority bundles cannot block the preempted bundle. The reason is that the system is occupied by higher-priority bundles and only lower-priority bundles of height less than hk,p can run during the preemption window. Once hk,p partitions become free from higher-priority bundles, τk,p is scheduled to reload.

For the example in Figure 7.13, the decision to reload τ3,2 is taken at time 8, since the scheduler knows that the interfering higher-priority bundle τ1,1 will finish by time 9. Other cases of preemption that do not incur reloading are the preemptions of τ4,1 and τ5,1, as they are not evicted from their partitions. Once a bundle finishes execution, it is guaranteed to unload in the next time slot. This is because we assume that the size of the time slot is enough to both unload and load m partitions. By employing eager DMA unload, meaning that any finished task is unloaded right away, there will be no more than m partitions to unload at any given time slot. Hence, any task suffers an additional σ delay to write back its data to main memory after finishing execution. We can now show how to extend the response time analysis in Section 7.2 to account for the discussed synchronous deferred preemption model. First, we substitute the original bundle lengths lk,p with the extended lengths $\overline{l}_{k,p}$ defined in Equation 7.24. Then, Equation 7.25 determines the worst-case blocking time Bk a task under analysis τk can suffer due to its asynchronous sporadic arrival and the DMA load times for each bundle:

Bk = σ + bk · σ. (7.25)

Now we can construct an upper bound on the response time of the task under analysis τk (based on when it finishes executing) as in Equation 7.26, where Ik(Rk − σ) is the interference bound computed by Listing 7.1, and $\overline{L}_k = \sum_{p=1}^{b_k} \overline{l}_{k,p}$:

$R_k \leftarrow \overline{L}_k + I_k(R_k - \sigma) + B_k.$ (7.26)

Note that we subtract σ from the length of the interfering window Rk − σ used by Listing 7.1, since the task under analysis cannot be preempted during its last execution slot; this is similar to the approach in Chapters 3 and 5, where we subtract the execution time of the task under analysis from the interference window as it runs non-preemptively. Finally, to obtain the response time of τk including the time to write back its data, we simply add an extra term σ to the fixed point Rk of Equation 7.26.
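Putting Equations 7.24-7.26 together, a small sketch of the 3-phase response-time computation is given below; interference(k, t) stands for an Ik(t) bound such as the LP sketch above, evaluated on the extended bundle lengths, and sigma is the slot size. The names and signatures are illustrative assumptions, not part of the model.

import math

def extended_lengths(lengths, sigma):
    """Extended bundle lengths of Equation 7.24."""
    return [math.ceil(l / sigma) * sigma for l in lengths]

def response_time_3phase(k, lengths_k, D_k, sigma, interference):
    """Fixed point of Equation 7.26, plus one slot for the final write-back."""
    L_bar = sum(extended_lengths(lengths_k, sigma))
    B = sigma + len(lengths_k) * sigma              # blocking B_k of Equation 7.25
    R = L_bar + B
    while True:
        new_R = L_bar + interference(k, R - sigma) + B
        if new_R > D_k:
            return None
        if new_R == R:
            return R + sigma                        # extra sigma to write back data
        R = new_R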

7.5 Summary

In this chapter, we introduced a novel task model for parallel real-time applications. Traditional real-time parallel models schedule threads of the same task independently, but this can greatly increase the synchronization and intra-task communication overhead. In contrast, we argue that parallel real-time tasks should be gang-scheduled, so that all threads of the same task are required to be executed concurrently.

While there exists previous work on gang scheduling of real-time tasks, the rigid gang model assumes that the number of threads remains constant for the entire task execution, which is not the case for many applications. Instead, our new bundled model allows changing the number of parallel threads of an application during its execution, which are gang scheduled on an identical multiprocessor. We derived a sufficient schedulability analysis for the proposed model, and demonstrated its applicability after assigning distinct priorities to each bundle based on its characteristics. The proposed schedulability analysis assumes preemptive scheduling, similar to previous work. However, in practice the overheads of preemption and access to main memory can be significant. For this reason, we also showed how the model can be adapted to 3-phase tasks, based on a simple system model where the size of a DMA slot is sufficient to reload all cores. We identify four main directions for future work. First, the model assumes a fixed sequence of bundles; it could be further generalized to conditional tasks [109] where multiple different sequences of bundles are possible at run-time. Second, we plan to study how to further optimize the priority assignment. Third, as discussed in Section 7.2.5, minimizing the workload of carry-in tasks is left for future investigation. Fourth, the proposed solution for 3-phase tasks does not scale to a large number of cores, given the requirement of reloading all cores within the fixed time slot. Hence, we plan to study a more efficient scheme that schedules DMA operations of multiple cores, along the lines of the solution introduced in Chapter 5.

Chapter 8

Predictable Inter-core Communication

In this chapter, we present a predictable interconnect between cores to facilitate intra-task communication for parallel tasks. In line with our design philosophy, we are interested in reducing hardware complexity. In particular, we propose HopliteRT, a simple FPGA-based NoC architecture that interconnects the private SPMs. Figure 8.1 shows a simplified illustration of the proposed system architecture. Note that we still use a DMA engine to load tasks from main memory over the memory bus, as discussed in the previous chapters; the proposed SPM interconnect NoC is used for inter-core communication only. To simplify the hardware implementation and improve timing predictability, we propose to adopt a push-based mechanism for intra-task communication. To adapt the push-based mechanism to shared-memory semantics, we program the SPM controllers to push any write operation to all relevant cores based on the execution context known from the scheduling level. Specifically, we assume a release memory model [113] to simplify the hardware design. Further, we assume that the hardware is weakly ordered, in the sense that it appears sequentially consistent only to software that obeys the synchronization model. Therefore, a programmer has to use mutual exclusion (critical sections) to ensure correct functionality, since a total order of memory operations is not guaranteed by the memory model adopted by the hardware. In particular, upon scheduling a parallel task on specific cores, the local SPM controllers are programmed to push (forward) write operations to all sharers. Note that only write operations inside a critical section are pushed; exiting the critical section waits for the last write operation to reach its destinations. Compared to a traditional pull-based mechanism using a snooping cache-coherent interconnect, the proposed approach results in a simpler hardware implementation and derivation of latency

bounds for communication among cores. However, as discussed in Section 2.2.2, it comes at the cost of a more restricted programming model.

Figure 8.1: System architecture with the proposed predictable interconnect between private SPMs.

It is worth mentioning that, for the sake of bounding the communication time between parallel threads of a parallel task, it suffices to plug in any real-time interconnect with provable latency bounds, such as a real-time NoC, in place of HopliteRT. This would equivalently meet our objective of composing the worst-case computation time and the worst-case communication time to derive the total WCET of the parallel task. However, in line with our design philosophy, we are interested in reducing hardware cost, especially since HopliteRT only targets inter-core communication, while communication with main memory is actually done through the system bus, as shown in Figure 8.1. Therefore, we introduce HopliteRT, a lean and efficient NoC targeting FPGA platforms.

Note that, compared to the One-Way Shared Memory [144] reviewed in Chapter 2, HopliteRT uses a simpler hardware architecture. Specifically, [144] implements a bidirectional torus to improve total bandwidth at the cost of a more complex switch architecture. While the analysis in [144] is not novel, its main contribution is at the Network Interface (NI) level, which continuously synchronizes local memories with the network at the cost of lower energy efficiency for data that are not actually shared; this also has a negative impact on memory space utilization. In addition, the addressing mechanism adopted by the

NI is not scalable with the local memory size. Specifically, the read and write addresses are implicitly embedded in the TDM slot number, which leads to a very long hyperperiod for larger memory sizes, impacting the latency bounds. In this work, we present a comprehensive solution that includes task and execution models plus predictable inter-core communication for parallel tasks. Note that our inter-core communication model is expected to be more energy efficient than [144], as communication is only activated on demand. Moreover, we support an arbitrary local memory size at the cost of a larger network payload (packet size), which is more scalable than [144]. Unlike [144], in HopliteRT we bound the communication latency based on aggregated traffic curves.

Similar to Section 4.1, we consider FPGAs a viable prototyping platform to quickly assess and alter our design parameters. In addition, FPGAs are actually a suitable platform for deploying real-time systems. In detail, FPGAs provide several interesting properties. First, the ability to be reprogrammed with new configurations makes them a viable hardware development platform that reduces the development-evaluation cycle of hardware designs. Second, the reconfigurability of FPGAs makes them well suited for modern heterogeneous embedded systems. For example, the FPGA can be used as a hardware accelerator to speed up specific functions. In addition, different hardware functions can be configured on demand, leading to better hardware utilization. In the real-time domain, the dynamic management of FPGA functions, at both the application and OS levels, has received considerable attention [130, 42, 154]. Moreover, for system deployments, FPGAs are considered a low-cost alternative to Application-Specific Integrated Circuits (ASICs), especially for low deployment volumes. Indeed, NASA utilizes FPGAs to deploy certain real-time applications in its space programs [82].

The rest of the chapter is organized as follows. We first discuss HopliteRT generally, assuming a known set of communication flows between source and destination clients (cores). In particular, Sections 8.1 and 8.2 provide background on FPGA overlay NoCs and the existing Hoplite switching architecture; Sections 8.3 and 8.4 discuss the new HopliteRT design for real-time systems and provide bounds on maximum in-flight and injection latency; and Section 8.5 evaluates the derived bounds under various workloads. Then, in Section 8.6 we discuss how to employ HopliteRT for intra-task communication of bundled tasks, according to the model described in Chapter 7. Finally, Section 8.7 provides concluding remarks.

8.1 The Case for FPGA Overlay NoCs

FPGA overlay NoCs (networks-on-chip) such as Hoplite [79] provide a low-cost, high-throughput implementation of communication networks on the FPGA chip. NoCs allow designers to compose large-scale multi-processor or multi-IP digital systems while providing a standard communication interface for interaction. This trend is true in the embedded, real-time computing domain with multi-core chips supported by message-passing NoCs [122, 35]. Modern real-time applications [87, 74] require many communicating processing elements to cooperate on executing the task at hand. In contrast with shared-memory systems that rely on cache coherence, explicit message passing on a suitably-designed NoC allows real-time applications to have (1) deterministic bounds on memory access, and (2) energy-efficient transport of data within the chip. FPGAs should be a particularly attractive target for the real-time computing market due to the small shipment volumes of real-time products, and the ability to deliver precise timing guarantees that are desirable for certification and correct, safe operation of real-time systems.

Until recently, FPGA-based NoCs were inefficient and bloated due to the implementation cost of switching multiplexers and FIFO queues. Hoplite [79] uses deflection routing to (1) reduce the LUT mapping cost of the switching crossbar, and (2) remove queues from the router, which are particularly expensive on FPGAs. When compared to other FPGA NoCs such as CMU Connect and the Penn Split-Merge routers, Hoplite is lean, fast, and scalable to 1000s of routers on the same FPGA chip. Deflection routing resolves conflicts in the network by intentionally mis-routing one of the contending packets. In baseline Hoplite, these deflections can go on forever, resulting in livelocks. This makes it unsuitable to use this NoC in a real-time environment where bounds on routing time must be computable to meet strict timing deadlines imposed by the application. An age-based routing scheme for resolving livelocks in deflection-routed networks does exist, but it requires extra wiring cost for transporting age bits and still does not deliver the strict upper bounds that we are able to show in this work.

In this chapter, we answer the following question: Can we modify Hoplite for real-time applications to deliver strong, deterministic upper bounds on worst-case routing latency, including waiting time at the client? We introduce HopliteRT (Hoplite with Real-Time extensions), which requires no extra LUT resources in the router and only two cheap counters in the client/processing element. These modifications implement the new routing and packet regulation policy in the NoC system. The router modification has zero cost overhead over baseline Hoplite due to a suitable encoding of the multiplexer select signals that drive the switching crossbars. With the counters at the client injection port, we can enforce a token bucket injection policy that controls the burstiness and

throughput of the various packet flows in the system. With these hardware modifications, we compute bounds on two components of the packet latency: (1) in-flight packet routing time on the NoC, and (2) waiting time at the client or PE ports. In the next sections, we first recap the architecture of Hoplite, and then discuss our modifications and implementation on a Xilinx Virtex-7 FPGA.

8.2 Background: Hoplite NoC Architecture

In this section, we introduce the Hoplite router switching architecture and identify why the worst-case deflection and source queueing latencies can be unbounded.

Hoplite routes single-flit packets over a switched communication network using Dimension Ordered Routing (DOR). The DOR policy makes packets traverse the X-ring (horizontal) first, followed by the Y-ring (vertical). Hoplite uses a unidirectional torus topology to save on FPGA implementation cost. While DOR is not strictly required for bufferless deflection-routed switches, Hoplite includes this feature to reduce switching cost by eliminating certain turns in the router. The internal structure of the router is shown in Figure 8.2, with three inputs N (North), W (West) and PE (processing element or client, used interchangeably in the text) and two outputs S (South + processor exit) and E (East). When packets contend for the same exit port, one of them is intentionally mis-routed along an undesirable direction to avoid the need for buffering. The fractured implementation (Figure 8.2-a) serializes the multiplexing decisions to enable a compact single Xilinx 6-LUT realization of the switching crossbar per bit. However, this sacrifices routing freedom to achieve this low cost. A larger design (Figure 8.2-b) that needs 2 6-LUTs per bit (2× more cost) permits a greater bandwidth through the switches. The larger cost is due to the inability to share enough inputs that would have allowed fracturing the Xilinx 6-LUT into dual 5-LUTs. This larger design will become important later in Section 8.3 when considering the microarchitecture of HopliteRT.

Figure 8.2: Implementation choices for the Hoplite FPGA NoC Router. A LUT-economical version (left) is able to exploit the fracturable Xilinx 6-LUTs to fit both 2:1 muxes into a single 6-LUT. The larger, higher-bandwidth version (right) needs 2 6-LUTs instead, as the number of common inputs is lower than required to allow fracturing.

When considering the routing function capabilities of the Hoplite router, we make the following observations:

• The PE port has lower priority than the network ports N and W, resulting in waiting time for packet injections. A client can get dominated by traffic from other clients, potentially blocking it forever. This is the source of unbounded waiting time at the client injection port, called source queueing latency.

• The N packet only travels S, and no path to E is permitted under DOR routing rules. Thus N packets can never be mis-routed (or deflected) and are guaranteed unimpeded delivery to their destination.

• As a consequence, conflicts and deflections may only happen on a W → S packet which is attempting to turn. A deflected packet must route around the entire ring in the network before attempting a turn again.

• Due to the static priority for N → S packets, an unlucky W → S packet may deflect endlessly in the ring and livelock, resulting in unbounded NoC in-flight routing time. This scenario is shown in Figure 8.3.
• Deflections also steal bandwidth away from PEs in the ring, and add more waiting time penalty for packets wishing to use the PE exit.

To summarize, the communication latency between two PEs at (X1, Y1) and (X2, Y2) on the zero-load network is T^s + (∆X + 1) + (∆Y + 1) due to the DOR policy. Here, T^s is the waiting time at the PE, and (∆X + 1) and (∆Y + 1) are the number of steps in the X-ring and the Y-ring, respectively. The worst-case in-flight routing and source queueing latencies in Hoplite are both ∞. This is problematic for real-time applications that need guaranteed bounds on application execution. If such an application wishes to use the baseline Hoplite NoC for interacting with other components, one of the packets may get victimized and deflect endlessly, or a client port may get blocked forever. As a result, the real-time application will miss its deadline and violate the system design requirements.

Figure 8.3: Endless deflection scenario where red packets from (0,0) → (3,3) are perpetually deflected by blue packets from (3,3) → (3,1). The red spaghetti is the flight path of one packet that gets trapped in the top-most ring of the NoC and never gets a chance to exit due to the bossy blue packets.

For safety-critical applications (hard real-time applications), the deadlines are hard constraints that cannot be violated. For NoCs in general, it is possible to provide statistical guarantees on packet delivery times, but these are still not strong enough for hard real-time problems. The next two sections propose minimal modifications to Hoplite and the client to provide exact upper bounds on the worst-case latency of packet routing and the worst-case waiting time at the source. These bounds can then be used by the real-time application developer to satisfy the hard deadline constraints imposed by the system.

8.3 Managing In-Flight Deflections

In this section, we describe the modifications to the Hoplite router to support bounded deflections in the network. We describe the required routing table modifications and explain the operation of the NoC with an example. In Figure 8.4, we show the proposed microarchitecture of HopliteRT. The key insight here is the need to strategically introduce deflection freedom by making the switching

crossbar more capable without sacrificing LUTs in return. The fractured LUT implementation in Figure 8.2-a is too restrictive to permit any adaptations to the routing policy, and hence we instead consider a more capable routing microarchitecture that may need the more expensive dual 6-LUT implementations. This 3-input, 2-output crossbar arrangement permits any input to be routed to either output as desired. Thus, unlike the original Hoplite designs in Figure 8.2, we are able to use the N → E turn to support our goals.

Figure 8.4: Proposed routing function logic arrangement for bounded in-flight latency. Despite splitting the logic into 2 × 5-LUTs (3:1 muxes), the same multiplexer select signals (with different interpretation) drive both multiplexers. This allows a compact 6-LUT based implementation per bit.

With this rich switching crossbar, we must now choose our routing policy. An age-based routing prioritization (Oldest-First [114]) can be implemented that prefers older packets over newer packets when in conflict. This is a good use of the crossbar, which is exercised if the W → S packet is older in age than the N → S packet; in this conflict situation the N → S packet can be deflected onto the N → E path. With such a policy, the ∞ bound is reduced to a different bound that depends on the network size m × m and the congestion or load in the system. However, this policy is unfair, as it is biased towards closer traffic: traffic travelling from distant nodes is always victimized by traffic from nearby nodes. A variation of the policy that increments age only on deflections may be slightly fairer, but the resulting bound is still dependent on network congestion. Furthermore, we would need to transmit extra bits in each packet to record its age, which is wasteful of precious interconnect resources.

It turns out that we can limit the number of deflections without carrying any extra information in the packet. The key idea is to invert the priorities of the router to always prefer W traffic over N traffic. This is the exact opposite of the original Hoplite DOR policy that always prioritizes N traffic (due to the absence of the N → E link). This modified policy allows traffic on the X-ring to be conflict free, even when making the turn to the Y-ring. On the other hand, it is the Y-ring traffic (North) that can suffer deflections. Once the Y-ring traffic has been deflected onto the X-ring, it has a higher priority over any other Y-ring traffic it will encounter next. Thus, unlike the original design, where packets deflect multiple times on the same row without making progress, the deflected packet is now guaranteed to make progress toward its destination: a packet might deflect only once on each row (X-ring). The routing table for this new HopliteRT router is shown in Table 8.1.

The added benefit of this routing policy is the ability to use the exact same select bits for both 3:1 multiplexers. So not only do we bound deflections in the NoC, but we also ensure we can retain fracturability of the 6-LUT by supplying the same five inputs to both 3:1 muxes. Each mux interprets the same two select bits differently to implement the proper routing decision. In this arrangement, PE→E and W→S turns happening in the same cycle are not supported even though the mux bandwidth is rich enough to support this condition. This is because we want to avoid creating a third mux select signal that would prevent the fractured 6-LUT mapping.
If the developer can afford to double the cost of their NoC, then this extra function can be supported without affecting the in-flight NoC worst-case routing time bounds computed in this chapter. It is important to note that the bandwidth capability of the HopliteRT switch is in between Figure 8.2-a and Figure 8.2-b. When compared to Figure 8.2-a, HopliteRT supports an extra routing condition, PE→S + W→E, which is otherwise blocked due to cascading of the muxes. However, unlike Figure 8.2-b, HopliteRT does not support the PE→E + W→S condition. Thus, HopliteRT is strictly the same size as the less-capable switch in Figure 8.2-a while offering extra bandwidth and latency guarantees.

In Figure 8.5, we show the path taken by the packet from (0,0)→(3,3) for the same example shown earlier in Figure 8.3. In this scenario, we assume there are W → S flows in X-rings 0, 1, and 2 that will interfere with the red packets in each ring. These flows have priority over the N → S red packet and will deflect the red packets in each ring. The blue packet from (3,3)→(3,1) had the right of way in the original Hoplite NoC, as shown in Figure 8.3. In HopliteRT, it will be deflected once in the topmost ring, and then descend downwards to exit at its destination.

Analytic In-Flight Latency Bound: Under the proposed HopliteRT policy, the in-flight latency between nodes (X1, Y1) and (X2, Y2) on an m × m NoC is as follows:

Figure 8.5: Worst-case path on HopliteRT for a packet traversing from the top-left PE (0,0) to the bottom-right PE (3,3). The red packets will deflect N→E in each ring due to a conflicting flow (not shown). The blue packets, which previously had priority, are now deflected in the top-most ring before delivery.

Table 8.1: Routing Function Table to support Real-Time extensions to Hoplite. PE injection has lowest priority and will stall on conflict. PE→E + W→S is not supported to avoid an extra select signal driving the multiplexers and doubling LUT cost by preventing fracturing a 6-LUT into 2×5-LUTs.

Mux select     Routes          Explanation
sel1  sel0
 0     0       W→E + N→S       Non-interfering
 0     1       W→S + N→E       Conflict over S
 1     0       PE→E + N→S      No W packet
 1     1       PE→S + W→E      No N packet
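As an illustration of Table 8.1, the following behavioral sketch mimics the HopliteRT routing decision for one cycle. The function and flag names are ours, not taken from the thesis RTL; ties are resolved exactly as in the table (W over N, PE last, PE→E + W→S unsupported).

```python
def hoplitert_route(n_valid, w_valid, w_wants_south, pe_valid, pe_wants_south):
    """Return ((sel1, sel0), routes) following Table 8.1; unrouted PE traffic stalls."""
    if w_valid and w_wants_south:
        # sel=01: W turns South; an N packet (if any) is deflected N->E; PE stalls.
        return (0, 1), {"S": "W", "E": "N" if n_valid else None}
    if w_valid:                                  # W continues East
        if n_valid:
            # sel=00: non-interfering W->E + N->S; PE stalls.
            return (0, 0), {"E": "W", "S": "N"}
        # sel=11: no N packet, so PE may inject South alongside W->E.
        return (1, 1), {"E": "W", "S": "PE" if pe_valid and pe_wants_south else None}
    if n_valid:
        # sel=10: no W packet; N goes South and PE may inject East only.
        return (1, 0), {"S": "N", "E": "PE" if pe_valid and not pe_wants_south else None}
    # Router otherwise idle: PE may inject on either port.
    if pe_valid and pe_wants_south:
        return (1, 1), {"S": "PE", "E": None}
    return (1, 0), {"S": None, "E": "PE" if pe_valid else None}
```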

zero load:    T^f = ∆X + ∆Y + 2                    (8.1)
worst case:   T^f = ∆X + ∆Y + (∆Y × m) + 2         (8.2)

Here, ∆X = (X2 − X1 + m) % m and ∆Y = (Y2 − Y1 + m) % m are based on the traversal distances of the packet on the torus, irrespective of the relative order of the two nodes along the directional topology. The zero-load latency on the NoC is the same as in the original Hoplite.

Theorem 8.1. The in-flight latency of HopliteRT is upper bounded by Equation 8.2.

Proof. As described in Section 8.3, HopliteRT is engineered with a policy that prioritizes the traffic turning from the X-ring to the Y-ring. This policy guarantees that a packet will be able to progress down the Y-ring and cannot deflect more than once in every row. In the worst case, a packet can be deflected at every router during its journey on the Y-ring to the destination, which is captured by Equation 8.2.
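As a quick sanity check on Equations 8.1 and 8.2, the small helper below evaluates both bounds for a given pair of nodes; the function name and tuple convention are illustrative.

```python
def inflight_bounds(src, dst, m):
    """Return (zero_load, worst_case) in-flight latency in cycles on an m x m NoC."""
    (x1, y1), (x2, y2) = src, dst
    dx = (x2 - x1 + m) % m              # hops on the unidirectional X-ring
    dy = (y2 - y1 + m) % m              # hops on the unidirectional Y-ring
    zero_load = dx + dy + 2             # Equation 8.1
    worst_case = dx + dy + dy * m + 2   # Equation 8.2: at most one deflection per row
    return zero_load, worst_case

# Example: on the 4x4 NoC of Figure 8.5, from (0,0) to (3,3) -> (8, 20).
print(inflight_bounds((0, 0), (3, 3), 4))
```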

Finally, in Table 8.2, we show the effect of compiling Hoplite and HopliteRT to the Xilinx Virtex-7 485T FPGA and observe a minor 4% reduction in LUT costs as (1) the design is able to exploit the fracturable dual 5-LUTs per bit of the switching crossbar, and (2) the DOR routing function is marginally simpler.

Table 8.2: FPGA costs for a 64b router (4×4 NoC) with Vivado 2016.4 (default settings) and a Virtex-7 485T FPGA

Router      LUTs   FFs   Period (ns)
Hoplite      89    149      1.29
HopliteRT    86    146      1.22


Figure 8.6: Unlucky client at (1,0) is swamped by client at (0,0) that has flooded the NoC with packets at full link bandwidth (one packet per cycle). Red packets from (0,0)→(x,y) are perpetually blocking the client exit at (1,0). This results in a waiting time of ∞ for packets at (1,0).

8.4 Regulating Traffic Injection

While the HopliteRT modification to the original Hoplite router is able to bound in-flight NoC latency, source queueing delays can still be unbounded. Source queueing delays are attributed to the lowest priority being assigned to the client injection port by the Hoplite router. This is unavoidable in a bufferless setup, where we do not have any place to store a packet that may get displaced if the client port were prioritized. Furthermore, when using deflection routing, there is no mechanism to distribute congestion information to upstream clients to throttle their injection. Hence, a client may wait arbitrarily long if it is swamped by another upstream client that has decided to flood the NoC with packets; a simple scenario is depicted in Figure 8.6. Here, we have two flows: the blue flow from (0,0)→(3,0), and a red flow from (1,0)→(x,0). If the rate of the blue flow is 1 (one packet per cycle), the red flow will never get an opportunity to get onto the NoC. This is the cause of the ∞ waiting time at a client port. In this section, we discuss a discipline for regulating traffic injection that ensures bounded source queueing times in a bufferless, deflection-routed NoC such as Hoplite. The underlying idea of regulation has been explored before in the Kalray MPPA [52], but that approach relies on source routing, statically determined NoC packet paths, and queueing in the NoC and client to deliver its guarantees.


Figure 8.7: Conceptual view of the Token Bucket regulator at the injection port of each NoC client. FPGA implementation cost is two cascaded counters (no actual memory is needed to store any tokens). The client can inject a packet into the NoC only when the NoC is ready and there is at least one token available in the Token Bucket.

8.4.1 Token Bucket Regulator

In Figure 8.7, we show a high-level conceptual view of a Token Bucket regulator [36] inserted on the client→NoC interface. The regulator shapes the traffic entering the network and bounds the maximum amount of time the client (user logic) will have to wait to send a packet over the network. The regulator is defined by the parameters ρ (rate of packet injection, <1 packet/cycle) and b (maximum burst of consecutive packets, ≥1). While conventional token bucket implementations in large-scale network switches (i.e., the Internet) need to queue or drop packets, we assume that we are able to directly stall or throttle the source (client). The regulator is implemented using two counters that are cheap to realize on an FPGA. A free-running rate counter is programmed to overflow after the regulation period 1/ρ. On each overflow, a token is added to an abstract bucket. The bucket fills until it reaches its capacity b, where it saturates. This is tracked by the second counter, the token counter. Whenever a client wants to inject traffic into the NoC, the client is stalled until the bucket is not empty and the NoC is ready (that is, no other NoC packet is blocking the client). The client then withdraws one token from the bucket and sends the packet. As an example, if a client has a rate ρ = 1/10, it can inject one packet every 10 cycles. The burstiness parameter b determines the maximum number of packets that the client can send consecutively, assuming that the NoC is ready. If the client has a burst length b = 5, and it did not send any packets for 50 cycles, it will accumulate five tokens. As a result, it can send 5 packets in a burst before the token bucket is emptied. It is worth mentioning that, in the worst case, if the traffic in the network is not

regulated (unknown traffic), the entire network has to be considered as one ring. Therefore, all the clients have to share the total bandwidth utilization Σρ = 1. In this case, the aggregated injection rate of each client needs to be 1/m² to guarantee injection for everybody; otherwise, a client may be starved by the upstream traffic indefinitely, and the bound on the waiting time will be unknown. Based on the described regulation mechanism, in Section 8.4.2 we compute the maximum time that the client requires to send a sequence of K packets, with K ≤ b. We first bound the amount of network traffic that interferes with the injection of packets on either the East or South port. Then, we express a tight bound on the maximum amount of interference to ensure that the client is not starved. Assuming that the bound is met, we finally determine the maximum network delay for the injection of the first packet, and of any successive packet in the sequence.
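The two-counter regulator described above can be captured by the following cycle-level behavioral sketch; the class and method names are illustrative, and the hardware version is simply the two cascaded counters of Figure 8.7.

```python
class TokenBucketRegulator:
    """Behavioral model of a token bucket with rate rho = 1/period and burst b."""
    def __init__(self, period, b):
        self.period = period        # 1/rho cycles between token refills
        self.b = b                  # bucket capacity (maximum burst)
        self.rate_count = 0         # free-running rate counter
        self.tokens = 0             # token counter, saturating at b

    def tick(self, wants_to_inject, noc_ready):
        """Advance one cycle; return True iff a packet is injected this cycle."""
        self.rate_count += 1
        if self.rate_count == self.period:        # rate counter overflow
            self.rate_count = 0
            self.tokens = min(self.tokens + 1, self.b)
        if wants_to_inject and noc_ready and self.tokens > 0:
            self.tokens -= 1                      # withdraw one token and send
            return True
        return False
```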

8.4.2 Analysis

We consider an m × m matrix of clients (x, y). We generalize the earlier discussion in Section 8.4.1 by assuming that (x, y) can send packets to multiple other clients in the network, using a different token bucket regulator for each such client. Hence, for any client (x, y), we can define a set of flows:

F_{x,y} = {(x1, y1, ρ1, b1), (x2, y2, ρ2, b2), ..., (xn, yn, ρn, bn)},     (8.3)

where for flow fi = (xi, yi, ρi, bi), the destination client is (xi, yi) and the token bucket parameters are ρi, bi. For a flow f ∈ F_{x,y}, we use the notation f.x, f.y, f.ρ, f.b to denote the horizontal and vertical coordinates and the regulator parameters of the flow, respectively. We further use the notation λ(t) to denote a traffic curve, that is, the maximum number of packets transmitted on a port in any window of time of length t cycles. By definition of the token bucket regulator, the maximum number of packets injected by (x, y) with destination (xi, yi) is then:

λ^{bi,ρi}_{x,y}(t) = min( t, bi + ⌊ρi · (t − 1)⌋ ).     (8.4)
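For reference, Equation 8.4 can be evaluated directly; this is a sketch with illustrative names, representing a regulated flow by its (ρ, b) pair.

```python
import math

def traffic_curve(t, rho, b):
    """lambda^{b,rho}(t): max packets a (rho, b)-regulated flow emits in t cycles."""
    if t <= 0:
        return 0
    return min(t, b + math.floor(rho * (t - 1)))
```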

To determine the total traffic that could pass through a client and possibly affect its injection rate, we need to know the aggregated traffic of the different sources that can reach the client. Consider two traffic curves λ^{b1,ρ1} and λ^{b2,ρ2} bounding the traffic on two input ports of a node and directed to the same output port. Lemma 8.1 defines an operator ⊕ that combines the two traffic curves to compute a tight bound on the resulting aggregated traffic.

Lemma 8.1. Let λ^{b1,ρ1} and λ^{b2,ρ2} bound the traffic on input ports (West, North or PE) directed to the same output port (East or South). Then the traffic on the output port is bounded by the following curve:

(λ^{b1,ρ1} ⊕ λ^{b2,ρ2})(t) = min( t, b1 + b2 + ⌊ρ1 · (t − 1)⌋ + ⌊ρ2 · (t − 1)⌋ ).     (8.5)

Proof. In any time window of length t, the number of packets transmitted on an output port cannot be greater than the traffic produced by the input ports; hence it holds:

(λ^{b1,ρ1} ⊕ λ^{b2,ρ2})(t) ≤ b1 + b2 + ⌊ρ1 · (t − 1)⌋ + ⌊ρ2 · (t − 1)⌋.

Furthermore, since the number of packets cannot be larger than t, it also holds that (λ^{b1,ρ1} ⊕ λ^{b2,ρ2})(t) ≤ t. Equation 8.5 then immediately follows.

Note that the definition of the traffic curve for a token bucket regulator, as well as the curve composition in Lemma 8.1, directly follows from the theory of network calculus [36], and Lemma 8.1 is presented here for completeness. However, the bounds on injection latency derived later in this section, and in particular Theorem 8.2, cannot be obtained using network calculus because our regulator does not buffer packets. Now, based on Equation 8.5, the operator ⊕ is both commutative and associative, and we are essentially combining traffic flows. Hence, for any set A of traffic curves λ^{b,ρ}, we write ⊕A to denote the aggregation of all curves in A based on the operator. We also write b(A) and ρ(A) to denote the sum of the burstiness and rate parameters, respectively, for all traffic curves in A. Note that based on Equation 8.5, this implies:

⊕A(t) = min( t, b(A) + Σ_{∀λ^{b,ρ}∈A} ⌊ρ · (t − 1)⌋ ).
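The aggregation ⊕A then reduces to summing burstiness and rate terms, as in the following sketch (curves again represented as (ρ, b) pairs; names are illustrative).

```python
import math

def aggregate_curve(A, t):
    """(+)A(t) = min(t, b(A) + sum of floor(rho * (t - 1)) over all curves in A)."""
    if t <= 0:
        return 0
    b_total = sum(b for (_rho, b) in A)
    rate_terms = sum(math.floor(rho * (t - 1)) for (rho, _b) in A)
    return min(t, b_total + rate_terms)
```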

Deriving Conflicting Flows Γ^C_f: Having defined how to aggregate traffic curves for multiple flows, the next step is to define the set of conflicting flows, that is, those flows that block the injection of packets at the analyzed client. In particular, we consider a given flow f ∈ F_{x,y} for the client at (x, y), and define a set Γ^C_f of traffic curves of other flows that conflict with f. Due to the complexity of the formal derivation of Γ^C_f, we first provide intuition on how to derive such a conflicting set. We show the set of interfering flows used to compute Γ^C_f in Figure 8.8. After that, we formally derive Γ^C_f.

• First, we do not make any assumption on the arbitration used by client (x, y) to decide which flow to serve. Hence, f can suffer self conflicts from any other flow in F_{x,y} that is injected on the same port of (x, y). For example, it may conflict on the South port if f.x = x, that is, the destination of f is on the same Y-ring as client (x, y), or on the East port otherwise.

• Second, we need to add to Γ^C_f all flows generated by other clients that conflict with f. According to Table 8.1, if f injects packets to the East port, then it suffers conflicts from any flow (W → E or W → S) arriving on the West port of (x, y). If, instead, f injects to the South port, it suffers conflicts from flows turning W → S at (x, y), as well as N → S flows arriving on the North port of (x, y) directed South.

• The set of flows going N → S is easy to determine, because flows are never deflected onto a different Y-ring (a flow may deflect N → E but will never switch Y-rings). However, determining the set of flows on the West port is more involved, because traffic arriving on the North port of any node in the y-th X-ring can be deflected N→E on the ring itself if there is any traffic simultaneously turning W→S at the same node. Finally, if the optimization discussed in Section 8.3 is employed, where the router is modified to support PE→E + W→S routing at the cost of doubling the LUTs, the conflicting set Γ^C_f can be modified to exclude W→S traffic in the case of East port injection.


Figure 8.8: Understanding interfering traffic flows at a client for determining the set of conflicting flows Γ^C_f. The dotted N → E is a deflected flow that will wrap around the X-ring and return at the W port. The PE → E (red) flow will interfere with the W → E flow, and also with the W → S flow due to HopliteRT router limits. The PE → S (blue) flow will interfere with the N → S, W → S, and deflected N → E flows.

Now we focus on formally deriving the set of conflicting traffic curves Γ^C_f for a flow f of client (x, y). We first consider the set of conflicting flows of other clients, and then

determine the set of conflicting flows of the same client (x, y). According to Table 8.1, in order for a client to inject on the East port, there must be no traffic on the West port. Similarly, to inject on the South port, there must be no North traffic and no West traffic turning South. Note that all North traffic goes South and does not turn East unless forcefully deflected due to a conflict on the South port. To know if a client is able to inject a packet, we need to know the total incoming traffic on each port. Equations (8.6-8.8) define the sets Γ^{N2S}_{x,y}, Γ^{W2E}_{x,y}, and Γ^{W2S}_{x,y} of all traffic curves of source clients that have traffic on the North port of (x, y) aiming South, on the West port aiming East, and on the West port aiming South, respectively. The conditions ∆Y(f.y, j) ≥ ∆Y(y, j) and ∆X(f.x, i) > ∆X(x, i) mean that the traffic passes through the router (x, y) on the Y-ring or on the X-ring, respectively.

Γ^{N2S}_{x,y} = {λ^{f.b, f.ρ}_{i,j} | ∃ f ∈ F_{i,j}, ∀ i, j ∈ [0, m): (j ≠ y) ∧ (f.x = x) ∧ ∆Y(f.y, j) ≥ ∆Y(y, j)}     (8.6)

Γ^{W2E}_{x,y} = {λ^{f.b, f.ρ}_{i,y} | ∃ f ∈ F_{i,y}, ∀ i ∈ [0, m): (i ≠ x) ∧ ∆X(f.x, i) > ∆X(x, i)}     (8.7)

Γ^{W2S}_{x,y} = {λ^{f.b, f.ρ}_{i,y} | ∃ f ∈ F_{i,y}, ∀ i ∈ [0, m): (i ≠ x) ∧ (f.x = x)}     (8.8)

Note that the presented sets do not consider the deflection effect. In the case of a conflict on the South port, the deflected traffic on the East port will cause extra pressure on all the clients in the X-ring. Since deflection does not increase the amount of traffic going South, the traffic on the North port of (x, y) is simply λ^{N2S}_{x,y} = ⊕Γ^{N2S}_{x,y}. However, the same is not true for the traffic λ^{W2E}_{x,y} and λ^{W2S}_{x,y} on the West port of (x, y). Consider a client (i, y) on the same X-ring as (x, y): when λ^{W2S}_{i,y} and λ^{N2S}_{i,y} conflict on the South port of (i, y), some amount of traffic in λ^{N2S}_{i,y} is deflected East and affects all the m clients on the y-th X-ring. If the client under analysis (x, y) wants to inject packets to the East, the deflected traffic at (i, y) thus needs to be added to the set Γ^{W2E}_{x,y} of conflicting injected traffic flows. It remains to determine the amount of traffic of λ^{N2S}_{i,y} that can be deflected in the general case. Clearly, the amount of deflected packets in a window of length t cannot be greater than λ^{N2S}_{i,y}(t). Also, if there is no traffic turning West to South at (i, y), that is,

if Γ^{W2S}_{i,y} = ∅, then no traffic can be deflected. In all other situations, we conservatively assume that all traffic produced by Γ^{N2S}_{i,y} is deflected; even if the amount of traffic produced by Γ^{W2S}_{i,y} is smaller than λ^{N2S}_{i,y}, deflected packets of λ^{N2S}_{i,y} can travel around the X-ring and deflect further packets of λ^{N2S}_{i,y} when turning W→S. Hence, in the worst case, all N→S traffic will be deflected. Since the deflected traffic affects every client on the X-ring, we can thus define the set of aggregated deflected traffic Γ^{dtot}_y that goes around in the X-ring as the union of the sets Γ^{N2S}_{i,y} for any client (i, y) such that there is traffic going W→S at that client:

Γ^{dtot}_y = ∪_{∀i∈[0,m)} { Γ^{N2S}_{i,y} | Γ^{W2S}_{i,y} ≠ ∅ }.     (8.9)

We can now compute the set of traffic curves Γ^{CE}_{x,y} of other clients that conflict with injections on the East port at (x, y), and the set of traffic curves Γ^{CS}_{x,y} of other clients that conflict with injections on the South port at (x, y). For Γ^{CE}_{x,y}, we combine the original traffic generated from other clients in the same X-ring, Γ^{W2E}_{x,y} that passes to the East port and Γ^{W2S}_{x,y} that turns South, plus the total deflected traffic on the X-ring:

Γ^{CE}_{x,y} = Γ^{W2E}_{x,y} ∪ Γ^{W2S}_{x,y} ∪ Γ^{dtot}_y.     (8.10)

For Γ^{CS}_{x,y}, we combine the original traffic generated from other clients in the same X-ring, Γ^{W2S}_{x,y}, that turns South, plus the traffic that arrives from the North port headed South, Γ^{N2S}_{x,y}:

Γ^{CS}_{x,y} = Γ^{W2S}_{x,y} ∪ Γ^{N2S}_{x,y}.     (8.11)

Note that we do not consider the deflected traffic Γ^{dtot}_y as part of Γ^{CS}_{x,y}, since among the deflected traffic, only the flows in Γ^{N2S}_{x,y} turn South at (x, y). If the optimization discussed in Section 8.3 is employed, where the router is modified to support PE→E + W→S routing at the cost of doubling the LUTs, then the bounds can be improved. Since the W→S traffic does not affect injection on the East port anymore, the set of conflicting traffic on the East port is now equal to:

Γ^{CE}_{x,y} = Γ^{W2E}_{x,y} ∪ (Γ^{dtot}_y \ Γ^{N2S}_{x,y}).     (8.12)

Finally, we determine the set of conflicting flows of the same client (x, y). We make no assumption on the arbitration used by the client to decide which flow to serve. Hence, we consider a worst case where f is the lowest priority flow: that is, if there is any other flow at (x, y) ready to be injected to the same port (East or South) as f, then f is blocked. Based on this assumption, we simply consider all other flows in Fx,y as conflicting flows,

similarly to the traffic produced by other clients in the network. More in detail, if f.x = x, then flow f injects to the South port at (x, y) and the set of conflicting traffic curves is:

Γ^C_f = Γ^{CS}_{x,y} ∪ {λ^{p.b, p.ρ}_{x,y} | ∃ p ∈ F_{x,y}, p ≠ f : p.x = x};     (8.13)

if instead f.x ≠ x, then flow f injects to the East port and the set of conflicting traffic curves is:

Γ^C_f = Γ^{CE}_{x,y} ∪ {λ^{p.b, p.ρ}_{x,y} | ∃ p ∈ F_{x,y}, p ≠ f : p.x ≠ x}.     (8.14)
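The construction of Γ^C_f can be summarized by the following sketch for the basic router (without the PE→E + W→S optimization). It assumes flows are given per source client as {(i, j): [(dst_x, dst_y, ρ, b), ...]} on an m × m NoC and represents each curve by its (ρ, b) pair; all names are illustrative.

```python
def _dist(a, b, m):
    # Hop count from b to a on a unidirectional ring of size m.
    return (a - b + m) % m

def gamma_n2s(flows, m, x, y):
    # Eq. 8.6: curves arriving on the North port of (x, y) heading South.
    return [(rho, b) for (i, j), fl in flows.items() if j != y
            for (fx, fy, rho, b) in fl
            if fx == x and _dist(fy, j, m) >= _dist(y, j, m)]

def gamma_w2e(flows, m, x, y):
    # Eq. 8.7: curves arriving on the West port of (x, y) continuing East.
    return [(rho, b) for (i, j), fl in flows.items() if j == y and i != x
            for (fx, fy, rho, b) in fl if _dist(fx, i, m) > _dist(x, i, m)]

def gamma_w2s(flows, m, x, y):
    # Eq. 8.8: curves arriving on the West port of (x, y) turning South.
    return [(rho, b) for (i, j), fl in flows.items() if j == y and i != x
            for (fx, fy, rho, b) in fl if fx == x]

def gamma_dtot(flows, m, y):
    # Eq. 8.9: N->S curves of every node (i, y) that also sees W->S traffic.
    out = []
    for i in range(m):
        if gamma_w2s(flows, m, i, y):
            out += gamma_n2s(flows, m, i, y)
    return out

def conflict_set(flows, m, x, y, f):
    # Eq. 8.13 / 8.14 for flow f = (dst_x, dst_y, rho, b) of client (x, y).
    fx, _fy, _rho, _b = f
    others = [p for p in flows.get((x, y), []) if p != f]
    if fx == x:   # f injects on the South port: Eq. 8.11 plus self conflicts
        base = gamma_w2s(flows, m, x, y) + gamma_n2s(flows, m, x, y)
        self_conf = [(rho, b) for (px, py, rho, b) in others if px == x]
    else:         # f injects on the East port: Eq. 8.10 plus self conflicts
        base = (gamma_w2e(flows, m, x, y) + gamma_w2s(flows, m, x, y)
                + gamma_dtot(flows, m, y))
        self_conf = [(rho, b) for (px, py, rho, b) in others if px != x]
    return base + self_conf
```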

Computing Bounds based on Γ^C_f: Once the set of conflicting flows Γ^C_f has been derived, we can now study the maximum delay suffered by client (x, y) to inject a sequence of K packets of flow f, under the assumption that K ≤ f.b. First, in the worst case, the sequence of packets can arrive at the client when the bucket has just been emptied; hence, in the worst case it will take a delay of ⌈1/f.ρ⌉ − 1 before the bucket has one token. After such initial time, the packets can be further delayed by conflicting traffic in the network, which is bounded by ⊕Γ^C_f. Based on the properties of traffic curves, we first prove that the flow cannot be starved as long as the condition ρ(Γ^C_f) < 1 is satisfied. Intuitively, this means that the cumulative rate of conflicting flows is less than 1 packet/cycle; hence, eventually there will be available clock cycles when the flow can be injected on the NoC. The available rate of injection is thus 1 − ρ(Γ^C_f). Assuming that the condition ρ(Γ^C_f) < 1 is satisfied, we then show that:

• The first packet in the sequence can suffer a delay of at most ⌈1/f.ρ⌉ − 1 + T^s cycles, where:

T^s = ⌈ b(Γ^C_f) / (1 − ρ(Γ^C_f)) ⌉.     (8.15)

Here, T^s represents the delay caused by the burstiness of conflicting flows; it is proportional to the cumulative burst length of conflicting flows, and inversely proportional to the available injection rate 1 − ρ(Γ^C_f).

• For each successive packet in the sequence, the client suffers an additional delay of either 1/f.ρ or 1/(1 − ρ(Γ^C_f)) cycles, whichever is higher. Here, the 1/f.ρ term represents the case where further packets are delayed by regulation, hence they are sent at the regulator rate f.ρ. The term 1/(1 − ρ(Γ^C_f)) represents the case where packets are delayed by conflicting flows, hence they are sent at the available injection rate of 1 − ρ(Γ^C_f). In essence, we can prove that further packets in the sequence are delayed by either regulation or conflicting traffic, but not both.

We now focus on formally deriving delay bounds for a sequence of K ≤ f.b packets of flow f injected by client (x, y), as intuitively described above. In any window of time of length t, by definition there must be at least t − ⊕Γ^C_f(t) free clock cycles, that is, clock cycles where the flow is not delayed by conflicting flows. Therefore, if the flow has sufficient tokens, it can inject up to t − ⊕Γ^C_f(t) packets. The rest of the analysis proceeds as follows. First, in Lemma 8.2 we derive a condition under which the flow is not starved, that is, it will eventually receive free cycles. Assuming that such condition holds, in Lemma 8.3 we then show that:

t − ⊕Γ^C_f(t) ≥ max( 0, ⌊(t − (T^s + 1)) · (1 − ρ(Γ^C_f))⌋ + 1 ),     (8.16)

where T^s is defined as in Equation 8.15. This implies that the flow might receive no free cycles for T^s clock cycles (it receives one for a window of length T^s + 1), but is then guaranteed to receive slots at a rate of 1 − ρ(Γ^C_f). Finally, note that the flow also cannot inject packets at a rate higher than the one of its regulator, f.ρ. In summary, as proven in Theorem 8.2, the first packet in the sequence waits for at most ⌈1/f.ρ⌉ − 1 + T^s cycles; successive packets are sent either every 1/f.ρ or every 1/(1 − ρ(Γ^C_f)) cycles, whichever is higher.

Lemma 8.2. Flow f cannot suffer starvation if ρ(Γ^C_f) < 1.

Proof. By expanding the expression for the guaranteed number of free injection cycles t − ⊕Γ^C_f(t), we obtain:

t − ⊕Γ^C_f(t) = t − min( t, b(Γ^C_f) + Σ_{∀λ^{b,ρ}∈Γ^C_f} ⌊ρ · (t − 1)⌋ )
             = max( 0, t − b(Γ^C_f) − Σ_{∀λ^{b,ρ}∈Γ^C_f} ⌊ρ · (t − 1)⌋ )
             ≥ max( 0, t − b(Γ^C_f) − ρ(Γ^C_f) · (t − 1) )
             = max( 0, t · (1 − ρ(Γ^C_f)) − (b(Γ^C_f) − ρ(Γ^C_f)) ).     (8.17)

Now note that ρ(Γ^C_f) < 1 implies 1 − ρ(Γ^C_f) > 0; hence, the number of guaranteed free slots increases with t, meaning that the flow cannot be starved.

Lemma 8.3. If ρ(Γ^C_f) < 1, then Equation 8.16 holds.

Proof. The lemma follows directly by algebraic manipulation, where the last inequality is based on Equation 8.17.

max( 0, ⌊(t − (T^s + 1)) · (1 − ρ(Γ^C_f))⌋ + 1 )
  ≤ max( 0, (t − (T^s + 1)) · (1 − ρ(Γ^C_f)) + 1 )
  = max( 0, t · (1 − ρ(Γ^C_f)) − ((T^s + 1) · (1 − ρ(Γ^C_f)) − 1) )
  = max( 0, t · (1 − ρ(Γ^C_f)) − ((⌈b(Γ^C_f)/(1 − ρ(Γ^C_f))⌉ + 1) · (1 − ρ(Γ^C_f)) − 1) )     (8.18)
  ≤ max( 0, t · (1 − ρ(Γ^C_f)) − ((b(Γ^C_f)/(1 − ρ(Γ^C_f)) + 1) · (1 − ρ(Γ^C_f)) − 1) )     (8.19)
  = max( 0, t · (1 − ρ(Γ^C_f)) − (b(Γ^C_f) − ρ(Γ^C_f)) )
  ≤ t − ⊕Γ^C_f(t).

Theorem 8.2. Assume ρ(Γ^C_f) < 1 and the client wishes to inject a sequence of K ≤ f.b packets for flow f. Then the delay to inject all packets in the sequence is upper bounded by:

⌈1/f.ρ⌉ − 1 + T^s + ⌈ (K − 1) · max( 1/f.ρ, 1/(1 − ρ(Γ^C_f)) ) ⌉.     (8.20)

Proof. In the worst case, the token bucket for f can be initially empty for at most ⌈1/f.ρ⌉ − 1 clock cycles. Afterwards, a new token is added to the bucket every 1/f.ρ cycles, at which point the next packet in the sequence becomes ready to be injected once the NoC port is free. Note that since K ≤ f.b, the times at which the first K tokens are added, and thus the packets in the sequence become ready at the regulator, do not depend on the time at which the packets themselves are sent; this is because the bucket does not become full until the K-th token is added. Now consider the effect of conflicting NoC traffic. Let λ^{free}(t) = max( 0, ⌊(t − (T^s + 1)) · (1 − ρ(Γ^C_f))⌋ + 1 ), and consider any subsequence of i packets out of the sequence of K packets under analysis which are being delayed by NoC traffic. Since the time at which the packets become ready is fixed, the delay suffered by the last packet in the subsequence cannot be larger than both ⌈(i − 1) · (1/f.ρ)⌉ and t̄, where t̄ is the minimum window length for which λ^{free}(t̄) = i (that is, the time that it takes for the NoC to have i free cycles based on Lemma 8.3). Based on the expression for λ^{free}, it is then trivial to see that if 1/f.ρ ≥ 1/(1 − ρ(Γ^C_f)), the worst-case delay for the sequence is found when the first (K − 1) packets are sent as soon as they become ready at the regulator, while the last packet suffers a NoC delay of T^s; while if 1/f.ρ ≤ 1/(1 − ρ(Γ^C_f)), the worst case is found where all K packets are delayed by NoC traffic rather than regulation. Combining the two cases yields Equation 8.20.

This result, which is formally expressed by Theorem 8.2, only holds for sequences of packets of length at most equal to the burst length f.b of the flow. The key intuition is that the burst length must be sufficient to allow the token bucket for f to “fill up” while the flow is being blocked by conflicting NoC traffic. In the extreme case in which f.b = 1, every packet in the sequence could suffer ⌈1/f.ρ⌉ − 1 + T^s delay, that is, it suffers delay due to both regulation and conflicting traffic. This reveals a fundamental trade-off for the burst length b assigned to each flow: if b is too small, then consecutive packets can be unduly delayed. However, a larger b increases the delay T^s suffered by other clients. Finally, if ρ(Γ^C_f) ≥ 1, then no delay bound can be produced, as the flow might be starved indefinitely by conflicting traffic. If burst lengths can be chosen freely, an assignment of burst lengths could be computed with a distributed optimization problem to minimize the worst-case latencies. Formulating and solving this optimization approach is left as future work.
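Putting the pieces together, the bound of Equation 8.20 only needs the cumulative burstiness b(Γ^C_f) and rate ρ(Γ^C_f) of the conflicting set; a minimal sketch follows (names illustrative).

```python
import math

def injection_delay_bound(rho_f, b_f, K, b_conf, rho_conf):
    """Worst-case cycles to inject a sequence of K <= b_f packets of flow f."""
    assert K <= b_f, "Equation 8.20 only covers sequences no longer than the burst"
    if rho_conf >= 1:
        return math.inf                           # flow may be starved indefinitely
    t_s = math.ceil(b_conf / (1 - rho_conf))      # Equation 8.15
    per_packet = max(1 / rho_f, 1 / (1 - rho_conf))
    return math.ceil(1 / rho_f) - 1 + t_s + math.ceil((K - 1) * per_packet)
```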

Revisiting the in-flight latency bound: Equation 8.2 captures the analytical worst-case bound of the in-flight latency with no assumption about the traffic, e.g., the communication patterns and rates are unknown. The bound can be improved further by leveraging knowledge about the traffic. In particular, we can reduce the number of deflection points on the Y-ring if we know that on specific X-rings there will be no conflicting traffic. To do this, we include only the X-rings that can cause a conflict. Since the set of conflicting flows is now formally defined, we can update Equation 8.2 to be as in Equation 8.21. We mainly optimize the term (∆Y × m) to be (V × m), where V ≤ ∆Y is the number of conflicting rows (X-rings), defined in Equation 8.22.

T^f = ∆X + ∆Y + (V × m) + 2     (8.21)

V^{x1,y1}_{x2,y2} = Σ_{j=(y1+1)%m}^{(y1+1+∆Y)%m} 1 | Γ^{W2S}_{x2,j} ≠ ∅     (8.22)
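Reusing gamma_w2s from the earlier conflicting-set sketch, the optimized bound of Equations 8.21 and 8.22 can be evaluated as follows; this is a sketch under the assumption that a deflection is only possible in the ∆Y rows actually traversed on the Y-ring.

```python
def optimized_inflight_bound(flows, m, src, dst):
    """Equation 8.21: only rows with W->S traffic at the destination column deflect."""
    (x1, y1), (x2, y2) = src, dst
    dx = (x2 - x1 + m) % m
    dy = (y2 - y1 + m) % m
    # Equation 8.22: number of traversed rows whose node (x2, j) sees W->S traffic.
    v = sum(1 for k in range(dy)
            if gamma_w2s(flows, m, x2, (y1 + 1 + k) % m))
    return dx + dy + v * m + 2
```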

8.5 Evaluation

In this section, we show experimental validation of our bounds under various workloads and packet flow configurations. We vary injection rates (in %), system sizes, traffic patterns, and real-time feature support, and measure in-flight NoC latency and source queueing delays in the clients across our NoC. We compare the results to a baseline Hoplite NoC without real-time support. For in-flight latency tests, we evaluate synthetic traffic with 2K packets/client that exercises worst-case paths. We consider LOCAL, RANDOM, TORNADO, TRANSPOSE, and ALLTO1 (everyone sends a packet to the (0,0) client) traffic generators. For source queueing tests, we provide a tool to evaluate a user-supplied set of flows for feasibility and provide a bound where one can be proved.


Figure 8.9: Effect of Traffic Patterns on Worst-Case In-Flight Latency of the Workload at 100% injection rate. Worst-case analytical bounds (red) are easily violated by baseline Hoplite. With HopliteRT we are always within the bound, and deliver superior worst-case latency for ALLTO1, TORNADO, RANDOM, and LOCAL patterns. For TRANSPOSE, the persistent victimization of N → S packets causes a slightly longer worst-case latency.

In-Flight NoC Latency Bounds: In Figure 8.9, we show the effect of using HopliteRT over the baseline design when counting the worst-case latency suffered by a packet in the NoC. It is clear that for the ALLTO1, TORNADO, RANDOM, and to some extent the LOCAL patterns, the worst-case latency with our extensions is significantly improved. This is particularly true in the adversarial case where each client sends data to a single destination. This pattern is representative of situations where a limited resource like a DRAM interface must be shared across all clients in the system. Without HopliteRT, some client requests to a DRAM interface may never route to the DRAM interface unless


other clients complete their requests¹. For other traffic patterns, the benefit is less pronounced, and it gets slightly worse for TRANSPOSE. This is because the W→S static priority victimizes N → S packets, which occur abundantly in the other workloads. It is possible to improve fairness by using an extra priority bit and history information, but that would increase the cost of the NoC router. Finally, we show that our router never violates the predicted bounds calculated based on our analysis, and these are better than the worst-case latencies observed in baseline Hoplite in most cases. Apart from the proofs, this experimental validation supports a real-time developer in safely using these bounds during system design.

Figure 8.10: Effect of Injection Rate on Worst-Case In-Flight Latency of the RANDOM Workload for 256 clients. At low injection rates, the NoC routing latencies are not very different, but as the NoC gets congested, HopliteRT starts to deliver improvements.

In Figure 8.10, we take a closer look at a 256-client simulation and vary the injection rate from 0.1% to 100% for the RANDOM workload. As expected, we observe that at low injection rates (<10%), both networks deliver packets better than the bound. However, as the network gets congested, the baseline Hoplite design delivers packets with increasing worst-case latency that exceeds the bounds. The HopliteRT design is always better than the bound at all injection rates. Our observed latencies are within 20% of the computed bound. The computed bound is considered tight, as it can be reached in some cases, as in ALLTO1 and LOCAL.

¹This behavior was demonstrated at the FCCM 2017 Demo Night, where Jan Gray's GRVI-Phalanx [66] engine with 100s of RISC-V processors was interconnected with Hoplite. Clients closer to the DRAM interface were effectively starved and never got service in a DRAM interface test.

The improved bound on the in-flight latency is depicted in Figure 8.11. The figure shows the bound T^f based on Equation 8.2, which computes the latency statically without considering the conflicting traffic, whereas T^f_opt represents the improved bound based on Equation 8.21, which takes the conflicting traffic into account. The results show that the improved bound is especially beneficial in large networks. As shown in Figure 8.11, the optimized bound improves over the basic bound by about 25% in the case of a 16×16 NoC.

Figure 8.11: The optimized bound versus the basic one on Worst-Case In-Flight Latency of the RANDOM workload at 100% injection rate with 256 clients.

Source-Queueing Bounds: We now show the benefit of client injection regulation on source queueing delay. In these experiments, we use the HopliteRT router in all comparisons and selectively enable regulation. This shows the need for regulation, in addition to the modification of the Hoplite router, for delivering predictable, bounded routing latencies in the network. In this experiment, we set the offered rate ρ = 1/m² (m × m = size of NoC) and a burst b = 1. The traffic rate is scaled to 1/m² to ensure feasible flows to the single destination client. In Figure 8.12, we show the effect of using our regulator on source queueing delay for traffic with the ALLTO1 pattern. From this experiment, it is clear that simply using the HopliteRT router is insufficient and the source queueing times are large. When we add the regulator hardware to the client, the waiting times drop dramatically, by over four orders of magnitude. Our analytical bound tracks our observed behavior, but there is a gap, as it must assume pessimistic behavior from interfering clients in the calculations. We also consider the RANDOM traffic pattern in Figure 8.13. Again, we observe better behavior with regulated traffic injection, but the latencies are lower than in the ALLTO1 case. While RANDOM traffic is less adversarial than the ALLTO1 pattern, our bound still holds, and regulation


is still up to two orders of magnitude lower in latency. You may also note that the bound calculation is now significantly tighter than in the ALLTO1 scenario, as the traffic flows now interfere in a less adversarial manner.

Figure 8.12: Comparing source queueing times for regulated vs. unregulated HopliteRT NoCs as a function of system size for the ALLTO1 traffic pattern. Regulated traffic offers much improved waiting times at the clients.

Finally, we show a breakdown of the bounds for the PE South and East ports in Figure 8.14. The difference in bounds arises because the E port can accept more packets due to the DOR routing policy, and furthermore due to the limits of the routing configurations shown in Table 8.1. Recall that we want to avoid doubling the LUT cost of the HopliteRT router, and intentionally disallow PE → E and W → S packets from routing simultaneously. This has the effect of creating a difference in bounds for traffic injection along the S and E ports, as shown in the figure. The E port suffers a higher waiting time as the injection rate is increased, since we disallow E traffic in our LUT-constrained router.

8.6 Integrating Communication Time into Execution Time of Bundled Tasks

In this section, we discuss how to bound the intra-task communication latency using HopliteRT for a parallel task scheduled according to the bundled model discussed in Chapter 7. As mentioned earlier in this chapter, to bound the injection latency we rely on knowing the set of interfering flows. Once a packet is injected into the network, its routing (in-flight) latency is bounded statically regardless of interference. Although an optimized bound on


routing (in-flight) latency is also provided based on knowledge of the interfering flows (Equation 8.21), we limit our discussion in this section to the basic static bound (Equation 8.2).

Figure 8.13: Comparing source queueing times for regulated vs. unregulated HopliteRT NoCs as a function of system size for the RANDOM traffic pattern. Again, regulated traffic offers better latency behavior, but the bounds are much lower than for the ALLTO1 pattern.

In our bundled scheduling scheme, we make no assumption on which cores are assigned to a bundle, i.e., there is no core pinning; this applies to the bundle under analysis and to the interfering bundles. This property eliminates any knowledge about the interfering flows, because each time a bundle is scheduled it can run on a different set of cores. Therefore, we cannot directly apply the injection latency analysis based on conflicting flows as detailed in Section 8.4.2. Note that other scheduling schemes, such as federated scheduling [93], that promote task pinning can utilize the optimized latency analysis based on knowledge of the conflicting flows.

To compute the injection latency according to Equation 8.20, we maximize the interfering flows based on the bandwidth assigned to each application. Specifically, we assume that all nodes that are not part of the bundle under analysis have an interfering flow to some node assigned to the bundle under analysis. In other words, all other nodes are considered upstream. To ensure that the no-starvation condition ρ(Γ^C_f) < 1 holds, we can use the following bandwidth assignment: we assign a bandwidth utilization equal to h_{k,p}/m to any currently executing bundle τ_{k,p}. It is then up to the application to distribute the allocated bandwidth internally among flows originating from nodes assigned to the bundle. It remains to account for the burstiness of interfering flows, which impacts the value of the initial latency T^s in Equation 8.15.

We assume that the inter-core (inter-thread) communications within the application


Figure 8.14: Effect of injection rate on the worst-case source-queueing latency of the ALLTO1 workload for 16 clients. The E port can accept more packets due to the DOR routing policy, and can furthermore be blocked by our LUT-constrained HopliteRT router. Above a certain injection rate, no bounds can be computed due to infeasible flow rates in the network.

We assume that the inter-core (inter-thread) communications within the application are known. From a scheduling perspective, while τ_{k,p} is running on h_{k,p} cores for l_{k,p} time units, other interfering workload can run in parallel on the remaining m − h_{k,p} cores. Since we do not know which other bundles run in parallel at any given point in time, we determine the set of bundles with the highest burstiness according to a simple greedy algorithm. Specifically, all other interfering bundles are sorted according to their flows’ total burstiness. Then, we pick one interfering bundle at a time, starting from the one with the highest burstiness, and assign it to the m − h_{k,p} available cores, until there are no more cores remaining. Note that in this greedy algorithm, the last assigned bundle might require more cores than the number of remaining cores; hence, the resulting assignment can be pessimistic, but it is easy to see that it upper bounds the total burstiness. Alternatively, an exact algorithm could be used, but it would have a complexity equivalent to bin-packing. Note that if an application tries to send a chunk of packets larger than its assigned burstiness b, the regulator will break it up into smaller chunks. By knowing the number of chunks an application sends during its execution, we can bound the total latency by summing the latency of each chunk of packets. Finally, we add the total communication latency to the worst-case execution time (length) of the bundle under analysis.
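To illustrate the greedy selection and the chunk-based accounting above, consider the following minimal sketch; the Bundle fields, helper names, and bandwidth_share function are illustrative assumptions of ours and are not part of the proposed implementation.

from dataclasses import dataclass
from math import ceil

@dataclass
class Bundle:
    height: int            # number of cores h_{k,p} occupied by the bundle
    flow_burstiness: int   # total burstiness of all flows it generates

def bandwidth_share(height, m):
    # Bandwidth utilization assigned to an executing bundle (h_{k,p}/m),
    # which the application then splits among its own flows.
    return height / m

def interfering_burstiness(other_bundles, m, h_under_analysis):
    # Greedy upper bound: take bundles in decreasing order of total
    # burstiness until the m - h_{k,p} remaining cores are exhausted.
    # The last pick may overshoot the core budget, which only makes
    # the bound pessimistic (never optimistic).
    free_cores = m - h_under_analysis
    total = 0
    for b in sorted(other_bundles, key=lambda x: x.flow_burstiness, reverse=True):
        if free_cores <= 0:
            break
        total += b.flow_burstiness
        free_cores -= b.height
    return total

def total_transfer_latency(num_packets, b, chunk_latency_bound):
    # The regulator splits a burst into chunks of at most b packets;
    # the total latency is bounded by summing a per-chunk latency bound.
    return ceil(num_packets / b) * chunk_latency_bound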

Note that, even without knowing the set of conflicting flows, there are a few special cases where the bundle under analysis will not have any external conflicting flows and hence will not suffer interference from other running bundles. Also, note that bundles of height one do not generate communication traffic and thus do not contribute to interference. Consider Figure 8.15: the first case, A, is when the bundle under analysis is of height m. Since no other bundle can run in parallel, there is no external interference. The second case, B, is when the bundle under analysis is of height m − 1. Since only single-core bundles can run in parallel, and these do not generate communication traffic, there is no external interference, similar to the first case. In the third case, C, a single-core bundle also does not receive communication interference, as it does not use the network. In Case A, the worst-case communication delay can be bounded in isolation by fixing threads to specific cores and embedding the pinning information into the application’s scheduler. We can apply the same trick to Case C if we support task migration. Given the above observations, it is worth mentioning that for small network sizes, e.g., ≤ 4, communication interference can be avoided entirely. However, the NoC is really intended for larger network sizes, e.g., ≥ 16.
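A simple check capturing these special cases could look as follows (an illustrative sketch only; the function name is ours):

def no_external_interference(height, m):
    # Case A: height == m, no other bundle can run in parallel.
    # Case B: height == m - 1, only silent single-core bundles can co-run.
    # Case C: height == 1, the bundle does not use the NoC at all.
    return height >= m - 1 or height == 1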


Figure 8.15: Example of three different cases of bundles that do not suffer communication interference

As discussed above, the communication delay bound of the proposed HopliteRT NoC is highly affected by decisions at the scheduling level. This aligns well with our design philosophy, where we simplify the hardware and the analysis at the cost of more complex scheduling policies. Developing a scheduling policy with the objective of improving the communication delay bound is of interest to us and will be considered in future work. For example, to improve the communication delay we can develop a scheduling policy that assigns rows (X-rings) to applications, as shown in Figure 8.16-A. This improves the delay bound, as all the traffic stays within the row and does not affect other applications on other rows, as shown in the upper row of the figure. In the case of assigning multiple rows to a bundle, as shown in the bottom two rows of the figure, all traffic in the same row or

from the upper rows to the lower rows stays within the boundaries of the rows and does not affect other rows. However, traffic from lower rows to upper rows wraps around the Y-ring and can affect other applications on other rows.
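This containment rule can be captured by a simple check such as the following sketch (illustrative only; it assumes X-then-Y dimension-ordered routing and row indices that increase downward, as in Figure 8.16):

def flow_contained_in_rows(src_row, dst_row, assigned_rows):
    # Traffic within a row, or flowing from an upper row (smaller index)
    # to a lower row of the same assignment, stays inside the assigned
    # rows. Traffic from a lower row back to an upper row wraps around
    # the Y-ring and may cross rows owned by other applications.
    lo, hi = min(assigned_rows), max(assigned_rows)
    if not (lo <= src_row <= hi and lo <= dst_row <= hi):
        return False  # endpoints outside the assignment
    return dst_row >= src_row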


Figure 8.16: HopliteRT node scheduling (A) and network partitioning (B)

To mitigate this traffic leaking effect between unrelated applications, we suggest adding a new feature to the HopliteRT router to support dynamic network partitioning. A similar approach for a deflection torus NoC is proposed in [46]. The intuition is to partition the network into isolated sub-networks, as shown in Figure 8.16-B. We propose to group related nodes into an isolated partition whose traffic does not travel outside the partition. Then, the same analysis mentioned above can be applied, but at the granularity of the partition. This optimization improves both the worst-case routing latency, as the network width becomes smaller, and the injection rate (bandwidth), as the number of active nodes on any path is smaller. Figure 8.17 shows the extra multiplexers required to implement the dynamic partitioning. In this case, the 4-to-1 mux is implemented in one 6-LUT. This configuration supports partitions as small as 1 × 2 and as large as 4 × 4. To support larger partitions, we would need larger muxes that require more LUTs to implement.

8.7 Summary

In this chapter, we presented HopliteRT, an FPGA-based NoC that provides predictable latency bounds for inter-core communication. A 64b HopliteRT router implementation delivers approximately identical LUT-FF cost (2% less) compared to the original Hoplite router.


Figure 8.17: Suggested modification to the router to support dynamic partitioning

We also add two counters to the client interface to provide a Token Bucket regulator for controlling packet injection in a manner that bounds source queueing delay. We show the in-flight NoC routing bound to be ∆X + ∆Y + (∆Y × m) + 2, and the source-queueing bound to be (⌈b(Γ_f^C)/ρ_i⌉ − 1) + T^s, where T^s = ⌈1/(1 − ρ(Γ_f^C))⌉. We test HopliteRT across various statistical traffic workloads and show that (1) our analytical bounds are relatively tight for RANDOM workloads, and (2) our solution provides significantly better latency behavior for the ALLTO1 workload that models shared DRAM interface access. We then discuss how to use HopliteRT to determine latency bounds for intra-task communication based on a push mechanism. We first derive bounds based on the bundled scheduling model for parallel tasks detailed in Chapter 7. We then discuss a set of additional scheduling restrictions that would allow the derivation of better bounds. We leave the study of how to integrate such restrictions into the bundled scheduling model, in particular in conjunction with the 3-phase execution scheme in Section 7.4, to future work.
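As an illustration, the two bounds can be evaluated as in the following sketch (parameter names are illustrative and are not taken from the released analysis tool; the source-queueing expression assumes the no-starvation condition holds):

from math import ceil

def inflight_bound(dx, dy, m):
    # Worst-case in-flight routing latency for a packet that traverses
    # dx hops on its X-ring and dy hops on its Y-ring in an m x m NoC.
    return dx + dy + dy * m + 2

def source_queueing_bound(b_conflict, rho_conflict, rho_i):
    # Worst-case source-queueing latency under token-bucket regulation:
    # b_conflict and rho_conflict are the total burstiness and rate of the
    # conflicting flow set, and rho_i is the injection rate of the flow
    # under analysis. Valid only when rho_conflict < 1 (no starvation).
    t_s = ceil(1 / (1 - rho_conflict))
    return (ceil(b_conflict / rho_i) - 1) + t_s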

Finally, note that the analysis tools for this NoC are freely available for download at https://git.uwaterloo.ca/watcag-public/hoplitert-bounds.

Chapter 9

Conclusions

Scratchpad memory has long been used in embedded real-time platforms for its energy efficiency and predictable performance. Many works have utilized SPM for performance enhancement using static and dynamic SPM allocation at the application level. In this thesis, we improved the usability of scratchpad memory for hard real-time systems. We introduced a system architecture and an execution model that exploit the predictability of the SPM and hide access latency to shared resources, such as main memory. An important property of the proposed solution is the dynamic management of the different components in the system at the OS level. This property allows real-time applications to be ported easily, as the complexity of the system is completely transparent to the applications. In essence, the proposed solution incorporates the SPM in a multi-tasking system independently of the number of tasks; this property stems from limiting the SPM partitions to only two tasks at a time. In addition, the 3-phase execution mechanism made it easy to support asynchronous inter-task communication.

We augmented the proposed execution model with scheduling schemes that cover single-core, partitioned, and global multicore systems. In addition, we considered scheduling DMA access among cores in software when the hardware does not provide predictable arbitration between cores. As the results of our evaluation show, the proposed solution significantly improves system schedulability compared to other cache- and SPM-based approaches. As an outcome of this research, we argue that the use of SPM is a viable alternative to caches for real-time systems, providing predictable performance and simpler WCET analysis, with reasonable ease of use.

In this research, we also addressed the gap between the need for predictable data sharing among parallel tasks and the existing real-time scheduling models that assume independent

threads. We introduced the bundled scheduling model, which extends classical gang scheduling to improve system schedulability. We showed how bundled scheduling can be applied to several programming models, such as fork-join and DAG-based tasks. We incorporated parallel tasks into the proposed execution model to obtain a comprehensive solution that covers both sequential and parallel tasks. To make that happen, we augmented our scheduling model of parallel tasks with a predictable inter-core NoC to facilitate data sharing between parallel threads.

9.1 Limitations

In this section, we highlight a few limitations and the scope of applicability of the presented solutions. At the task execution level, the 3-phase execution model requires a task to be loaded into the local memory before it can run from it. After the task finishes its execution phase, it can be unloaded back to main memory. In comparison to the conventional hardware-managed, cache-based execution model, the 3-phase execution model needs explicit software-based memory management to govern the execution of tasks. This introduces two main challenges. The first is the software complexity of managing local memory allocation and task execution. In this work, we showed how to hide this complexity by managing the entire process at the OS level and making it transparent to applications. The second is that applications need to be factored in a way compatible with the 3-phase model. Although this can be done manually for some small and simple programs, it becomes tedious in most cases. In addition, the proposed solution imposes a size restriction on a task's memory footprint. Specifically, a task needs to fit entirely in the local SPM partition. Although many hard real-time tasks might fit in the more spacious local memory of modern chips, there are still applications that need to be split into smaller segments, as discussed in Chapter 3. For these reasons, researchers have proposed tools [105] that use profiling techniques for automatic refactoring of programs into a 3-phase-compatible format. However, there are still a few challenges. For instance, some dynamic memory accesses might be missed during the profiling process, leading to a need to access main memory during the execution phase. This behavior introduces memory access contention between cores. To account for this contention at the analysis level, we can incorporate extra overheads in the WCETs by inflating them by a certain percentage, where the inflation can be based on worst-case expectations. Similarly, depending on the code and its dynamic data structures, profiling might lead to loading more data than is actually needed by the current job execution.
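As a minimal illustration of this accounting (the inflation percentage is an engineering assumption chosen by the system designer, not a value derived in this work):

def inflated_wcet(profiled_wcet, inflation_pct):
    # Pad the execution-phase WCET to cover dynamic accesses that escaped
    # profiling and therefore contend for main memory at run time.
    return profiled_wcet * (1.0 + inflation_pct / 100.0)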

On the other hand, Matějka et al. [108] proposed a compiler-based technique, based on LLVM [90], to automatically transform legacy code into 3-phase-compatible code. Although the static program analysis employed in [108] produces more efficient memory allocations than [105], it restricts the use of dynamic data that cannot be resolved by the static program analysis. In general, such restrictions are in line with the requirements of typical coding standards adopted in the automotive real-time domain, such as the MISRA guidelines [22]. In addition, HePREM [62] extends the techniques in [108] to split GPU tasks into 3-phase-compatible segments. Note that, although known methodologies exist [105, 108, 128, 145, 94, 152, 135, 100, 29] to split a large application into smaller segments that are individually compliant with the imposed constraint, not all types of applications can be efficiently segmented. Such applications might suffer excessive blocking due to data dependencies between segments, which reduces the benefit of the proposed mechanism to hide memory access latency. For example, dynamic-data-intensive applications, such as database applications, would be challenging to segment efficiently. However, this dissertation is not focused on program analysis; instead, we mainly focus on scheduling 3-phase tasks to provide timing isolation while hiding memory access latency. Through our work on this research, we note that porting applications to this platform can range from something as simple as adding a few lines to the linker script to cluster tasks into consecutive blocks, as discussed in Section 4.1, to a more complex process that requires compiler support.

At the schedulability level, it is worth mentioning that the proposed scheduling scheme works best with tasks of similar sizes (execution times). This stems from the proposed non-preemptive policy: in the worst case, one very large task can induce undue blocking time on other tasks, which reduces the schedulability of the system significantly. This effect is another reason why a task might need to be split into segments. Specifically, a long-running task might be split into smaller execution segments to reduce the impact of blocking. However, in our experiments with real-time benchmarks, most applications were configured to have relatively comparable execution times to resemble actual automotive applications with execution times of less than 1 ms. For some types of applications, such as streaming applications, it is common to have similar execution times.

The standard schedulability evaluation graphs provided in this dissertation are based on simulation experiments. Although we do not claim full coverage of all possible task sets, we are confident that the presented results align with the projections of the mathematical modeling and the analytical intuitions. Note that the evaluation of the 3-phase task execution of bundled scheduling with inter-core communication is not presented in this dissertation. Specifically, the evaluation would

require implementation of the extended platform, at both the hardware and OS levels. Therefore, due to time limitations, it is left for future work. Furthermore, with regard to parallel tasks, the assumption that the largest bundle in the system can be loaded and unloaded in one time slot needs to be relaxed; otherwise, there will be wasted time in the slot that manifests as unnecessary blocking time for smaller bundles. Finally, the current heuristic-based priority assignment in bundled scheduling needs further study.

9.2 Future Directions

We expect to continue investigating this line of research in the future and to extend it in several directions. A key observation is that modern embedded platforms incorporate more heterogeneous components. Heterogeneous multi-core platforms embed a number of different types of processing elements, each specialized to optimize a set of functions or providing a different trade-off between performance and energy efficiency. As such, each core may feature a different ISA, clock speed, and capabilities in terms of interaction with the rest of the system. Extending our co-scheduling and memory management scheme to such different components will impose additional challenges at both the scheduling and analysis levels.

The increased need for embedded computer vision has led to the inclusion of GPUs in modern embedded platforms. Many relevant applications are memory intensive, and we anticipate that the proposed execution model can help hide memory latency for such applications. In the same direction, it is interesting to note that some many-core architectures, such as Kalray1 and Parallella2, provide cores that are already equipped with private SPMs. In Kalray, cores are clustered in groups. The proposed mechanism, as the results show, is able to utilize the memory bandwidth efficiently; hence, it can scale to a larger number of cores on a single memory channel better than the other compared approaches. However, to apply our approach to many-core platforms (> 64 cores), we need to investigate how to distribute or partition tasks over different memory channels to avoid saturating memory bandwidth. Otherwise, according to our scheduling scheme, all scheduling intervals will be dominated by DMA-bound intervals, preventing the ability to hide memory access latency.

In practice, without handling of I/O data, any proposed approach will be of limited utility. While we showed how I/O data can be handled in our proposed solution, incorporating a full I/O analysis for particular embedded I/O devices with detailed latency analysis

1http://www.kalrayinc.com/IMG/pdf/FLYER_MPPA_MANYCORE.pdf 2https://www.parallella.org/

is left as future work. We are also interested in incorporating more advanced compiler techniques to automatically convert parallel programs into bundles and generate application-level schedules accordingly. In particular, we plan to look at existing open programming standards for parallel applications, such as OpenMP, which has recently received significant attention in the real-time community. With regard to HopliteRT for inter-core communication, the extension suggested in Section 8.6 would be beneficial, as it simplifies clustering nodes that belong to the same bundle together, hence improving the worst-case communication latency. Finally, we believe that this thesis has shown that there is ample space to improve the execution of real-time applications, given enough control over the behavior of the hardware platform. As such, we think it is crucial for embedded platform vendors to incorporate more fine-grained control over the hardware and to allow software management of hardware resources. This will encourage researchers to investigate ways to improve worst-case performance, enhancing execution predictability beyond what stock hardware control can achieve.

References

[1] Cache Architecture, ECE 3055: Computer Architecture and Operating Systems. http://users.ece.gatech.edu/~dblough/3055/.

[2] MERASA Project. http://ginkgo.informatik.uni-augsburg.de/merasa-web/.

[3] NiosII Embedded Processor. http://www.altera.com/devices/processor/nios2/ ni2-index.html.

[4] parMERASA. http://www.parmerasa.eu/.

[5] FAA Position Paper on Multi-core Processors. http://www.faa.gov/aircraft/air_cert/design_approvals/air_software/cast/cast_papers/media/cast-32.pdf.

[6] Zynq-7000 SoC ZC706 Evaluation Kit. http://www.xilinx.com/products/ boards-and-kits/ek-z7-zc706-g.htm.

[7] P4080 QorIQ Integrated Multicore Communication Processor Family Reference Manual. www.freescale.com, 2012.

[8] FreeRTOS. http://www.freertos.org/, 2013.

[9] RapiTime. https://www.rapitasystems.com/products/rapitime, 2013.

[10] MicroBlaze Processor Reference Guide. http://www.xilinx.com/support/ documentation/sw_manuals/xilinx2014_4/ug984-vivado--ref. pdf, 2014.

[11] ERIKA Enterprise. http://erika.tuxfamily.org/, 2016.

[12] Freescale MPC5777M. http://cache.freescale.com/files/32bit/doc/fact_sheet/mpc5777mfs.pdf, 2016.

[13] Ben Abdallah. Multicore Systems On-Chip: Practical Software/Hardware Design. Springer, 2013.

[14] S.V. Adve and K. Gharachorloo. Shared memory consistency models: a tutorial. Computer (Long. Beach. Calif)., 1996.

[15] Ahmed Alhammad and Rodolfo Pellizzoni. Schedulability analysis of global memory-predictable scheduling. In Proc. 14th Int. Conf. Embed. Softw. - EMSOFT ’14. ACM Press, 2014.

[16] Ahmed Alhammad and Rodolfo Pellizzoni. Time-predictable execution of multi-threaded applications on multicore systems. In Des. Autom. Test Eur. Conf. Exhib. (DATE), 2014. IEEE Conference Publications, 2014.

[17] Ahmed Alhammad and Rodolfo Pellizzoni. Trading Cores for Memory Bandwidth in Real-Time Systems. In 2016 IEEE Real-Time Embed. Technol. Appl. Symp. IEEE, 2016.

[18] Ahmed Alhammad, Saud Wasly, and Rodolfo Pellizzoni. Memory efficient global scheduling of real-time tasks. In 21st IEEE Real-Time Embed. Technol. Appl. Symp. IEEE, 2015.

[19] Yannick Allard, Geoffrey Nelissen, Joel Goossens, and Dragomir Milojevic. A context aware cache controller to bridge the gap between theory and practice in real-time systems. In 2014 IEEE 20th Int. Conf. Embed. Real-Time Comput. Syst. Appl. IEEE, 2014.

[20] Sidharta Andalam, Partha Roop, Alain Girault, and Claus Traulsen. PRET-C: A new language for programming precision timed architectures. Technical report, 2009.

[21] James Archibald and Jean-Loup Baer. Cache coherence protocols: evaluation using a multiprocessor simulation model. ACM Trans. Comput. Syst., 1986.

[22] Motor Industry Software Reliability Association et al. MISRA C 2012: Guidelines for the Use of the C Language in Critical Systems: March 2013. Motor Industry Research Association, 2013.

[23] Neil C Audsley. Optimal priority assignment and feasibility of static priority tasks with arbitrary start times. Citeseer, 1991.

[24] Philip Axer, Rolf Ernst, Heiko Falk, Alain Girault, Daniel Grund, Nan Guan, Bengt Jonsson, Peter Marwedel, Jan Reineke, Christine Rochange, and others. Building timing predictable embedded systems. ACM Trans. Embed. Comput. Syst., 2014.

[25] Ke Bai, Jing Lu, Aviral Shrivastava, and Bryce Holton. CMSM: an efficient and effective code management for software managed multicores. In Proc. Ninth IEEE/ACM/IFIP Int. Conf. Hardware/Software Codesign Syst. Synth., 2013.

[26] Stanley Bak, Gang Yao, Rodolfo Pellizzoni, and Marco Caccamo. Memory-Aware Scheduling of Multicore Task Sets for Real-Time Systems. 2012 IEEE Int. Conf. Embed. Real-Time Comput. Syst. Appl., 2012.

[27] Theodore P Baker. Multiprocessor EDF and deadline monotonic schedulability anal- ysis. In RTSS, 2003.

[28] Rajeshwari Banakar, Stefan Steinke, Bo-Sik Lee, M. Balakrishnan, and Peter Marwedel. Scratchpad memory: design alternative for cache on-chip memory in embedded systems. In Proc. . . . . ACM Press, 2002.

[29] S Bandyopadhyay, F Huining, H Patel, and E Lee. A scratchpad memory allocation scheme for dataflow models. Technical report, 2008.

[30] Sanjoy Baruah. Techniques for multiprocessor global schedulability analysis. In RTSS, 2007.

[31] Sanjoy Baruah, Vincenzo Bonifaci, and Alberto Marchetti-Spaccamela. The global EDF scheduling of systems of conditional sporadic DAG tasks. In Real-Time Syst. (ECRTS), 2015 27th Euromicro Conf., 2015.

[32] Sanjoy Baruah, Vincenzo Bonifaci, Alberto Marchetti-Spaccamela, Leen Stougie, and Andreas Wiese. A Generalized Parallel Task Model for Recurrent Real-time Processes. In 2012 IEEE 33rd Real-Time Syst. Symp. IEEE, 2012.

[33] Sanjoy K Baruah. The non-preemptive scheduling of periodic tasks upon multipro- cessors. Real-Time Syst., 2006.

[34] Andrea Bastoni, Björn Brandenburg, and James Anderson. Cache-related preemption and migration delays: Empirical approximation and impact on schedulability. Proc. OSPERT, 2010.

[35] M. Becker, B. Nikolic, D. Dasari, B. Akesson, V. Nelis, M. Behnam, and T. Nolte. Partitioning and Analysis of the Network-on-Chip on a COTS Many-Core Platform. In proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2017.

[36] Van Bemten. Network Calculus: A Comprehensive Guide. Technical report, 2016.

[37] Gérard Berry and Georges Gonthier. The Esterel synchronous programming language: design, semantics, implementation. Sci. Comput. Program., 1992.

[38] Vandy Berten, Pierre Courbin, and Joël Goossens. Gang fixed priority scheduling of periodic moldable real-time tasks. In 5th Jr. Res. Work. Real-Time Comput., 2011.

[39] Marko Bertogna and Michele Cirinei. Response-time analysis for globally scheduled symmetric multiprocessor platforms. In Real-Time Syst. Symp. 2007. RTSS 2007. 28th IEEE Int., 2007.

[40] Marko Bertogna, Michele Cirinei, and Giuseppe Lipari. Improved schedulability analysis of EDF on multiprocessor platforms. In ECRTS, 2005.

[41] E. Betti, S. Bak, R. Pellizzoni, M. Caccamo, and L. Sha. Real-Time I/O Management System with COTS Peripherals. IEEE Transactions on Computers, 2013.

[42] Alessandro Biondi, Alessio Balsini, Marco Pagani, Enrico Rossi, Mauro Marinoni, and Giorgio Buttazzo. A Framework for Supporting Real-Time Applications on Dynamic Reconfigurable FPGAs. In 2016 IEEE Real-Time Syst. Symp., pages 1–12. IEEE, nov 2016.

[43] Björn B Brandenburg, John M Calandrino, and James H Anderson. On the Scalability of Real-Time Scheduling Algorithms on Multicore Platforms: A Case Study. In RTSS, 2008.

[44] Giorgio C Buttazzo. Hard real-time computing systems: predictable scheduling algo- rithms and applications. Springer Science & Business Media, 2011.

[45] Paul Caspi and Oded Maler. From control loops to real-time programs. In Handb. networked Embed. Control Syst. Springer, 2005.

[46] Kumar H B Chethan, Shubham Agarwal, and Nachiket Kapre. Deflection routing for multi-level FPGA overlay NoCs. In 2016 Int. Conf. Field-Programmable Technol. IEEE, 2016.

[47] Sébastien Collette, Liliana Cucu, and Joël Goossens. Integrating job parallelism in real-time scheduling theory. Inf. Process. Lett., 2008.

[48] Pierre Courbin, Irina Lupu, and Joël Goossens. Scheduling of hard real-time multi-phase multi-thread (MPMT) periodic tasks. Real-time Syst., 2013.

[49] Leonardo Dagum and Ramesh Menon. OpenMP: an industry standard API for shared-memory programming. Comput. Sci. Eng. IEEE, 1998.

[50] Robert I Davis and Alan Burns. Improved priority assignment for global fixed priority pre-emptive scheduling in multiprocessor real-time systems. Real-Time Syst., 2011.

[51] Robert I Davis, Alan Burns, José Marinho, Vincent Nelis, Stefan M Petters, and Marko Bertogna. Global fixed priority scheduling with deferred pre-emption. In RTCSA, 2013.

[52] Benoît Dupont de Dinechin and Amaury Graillat. Network-on-chip Service Guarantees on the Kalray MPPA-256 Bostan Processor. In Proceedings of the 2nd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems. ACM, 2017.

[53] J.-F. Deverge and I Puaut. WCET-Directed Dynamic Scratchpad Memory Allocation of Data. In Real-Time Syst. 2007. ECRTS ’07. 19th Euromicro Conf., 2007.

[54] J-F Deverge and Isabelle Puaut. WCET-directed dynamic scratchpad memory allo- cation of data. In Real-Time Systems, 2007. ECRTS’07. 19th Euromicro Conference on, 2007.

[55] Zheng Dong and Cong Liu. Analysis Techniques for Supporting Hard Real-Time Sporadic Gang Task Systems. In 2017 IEEE Real-Time Syst. Symp. IEEE, 2017.

[56] Stephen A Edwards and Edward A Lee. The case for the precision timed (PRET) machine. In Proc. 44th Annu. Conf. Des. Autom. - DAC ’07. ACM Press, 2007.

[57] Bernhard Egger, Jaejin Lee, and Heonshik Shin. Scratchpad memory management for portable systems with a memory management unit. In Proc. 6th ACM IEEE Int. Conf. Embed. Softw. ACM, 2006.

[58] H. Falk and J. C. Kleinsorge. Optimal static WCET-aware scratchpad allocation of program code. In Proceedings of the 46th Annual Design Automation Conference, 2009.

[59] Chris Fallin, Chris Craik, and Onur Mutlu. CHIPPER: A Low-complexity Bufferless Deflection Router. In Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture. IEEE Computer Society, 2011.

[60] Dror G. Feitelson and Larry Rudolph. Gang scheduling performance benefits for fine-grain synchronization. J. Parallel Distrib. Comput., 1992.

[61] Shanna-Shaye Forbes, Hiren D. Patel, Edward A. Lee, and Hugo A. Andrade. An Automated Mapping of Timed Functional Specification to a Precision Timed Archi- tecture. In 2008 12th IEEE/ACM Int. Symp. Distrib. Simul. Real-Time Appl. IEEE, 2008.

[62] Bjorn Forsberg, Luca Benini, and Andrea Marongiu. HePREM: Enabling predictable GPU execution on heterogeneous SoC. In 2018 Des. Autom. Test Eur. Conf. Exhib., pages 539–544. IEEE, 2018.

[63] Poletti Francesco, Paul Marchal, David Atienza, Luca Benini, Francky Catthoor, and Jose M Mendias. An integrated hardware/software approach for run-time scratchpad management. In Proc. 41st Annu. Des. Autom. Conf. ACM, 2004.

[64] Jo¨elGoossens and Pascal Richard. Optimal scheduling of periodic gang tasks. Leibniz Trans. Embed. Syst., 2016.

[65] K. Goossens, J. Dielissen, and A. Radulescu. Æthereal Network on Chip: Concepts, Architectures, and Implementations. IEEE Des. Test Comput., 2005.

[66] Jan Gray. GRVI-Phalanx: A Massively Parallel RISC-V FPGA Accelerator Accel- erator. In Proc. 24th IEEE Symposium on Field-Programmable Custom Computing Machines, 2016.

[67] Nan Guan, Martin Stigge, Wang Yi, and Ge Yu. Cache-aware scheduling and analysis for multicores. In Proc. seventh ACM Int. Conf. Embed. Softw. - EMSOFT ’09. ACM Press, 2009.

[68] Nan Guan, Wang Yi, Qingxu Deng, Zonghua Gu, and Ge Yu. Schedulability analysis for non-preemptive fixed-priority multiprocessor scheduling. J. Syst. Archit., 2011.

[69] Nan Guan, Wang Yi, Zonghua Gu, Qingxu Deng, and Ge Yu. New Schedulability Test Conditions for Non-preemptive Scheduling on Multiprocessor Platforms. 2008 Real-Time Syst. Symp., 2008.

[70] Damien Hardy, Thomas Piquet, and Isabelle Puaut. Using Bypass to Tighten WCET Estimates for Multi-Core Processors with Shared Instruction Caches. In 2009 30th IEEE Real-Time Syst. Symp. IEEE, 2009.

[71] Damien Hardy and Isabelle Puaut. WCET analysis of multi-level set-associative instruction caches. J. Syst. Archit., 2008.

[72] Mohamed Hassan, Anirudh M Kaushik, and Hiren Patel. Predictable Cache Coher- ence for Multi-Core Real-Time Systems. In RTAS, 2017.

[73] M. Y. Hsiao. A class of optimal minimum odd-weight-column SEC-DED codes. IBM Journal of R and D, 1970.

[74] Huang-Ming Huang, Terry Tidwell, Christopher Gill, Chenyang Lu, Xiuyu Gao, and Shirley Dyke. Cyber-physical systems for real-time hybrid structural testing: a case study. In Proc. 1st ACM/IEEE Int. Conf. cyber-physical Syst., 2010.

[75] Yijie Huangfu and Wei Zhang. PEG-C: Performance Enhancement Guaranteed Cache for Hard Real-Time Systems. IEEE Embed. Syst. Lett., 2014.

[76] Anantha Chandrakasan Jan M. Rabaey. Digital Integrated Circuits. Prentice-Hall, 2002.

[77] Morris A Jette. Performance characteristics of gang scheduling in multiprogrammed environments. In Supercomput. ACM/IEEE 1997 Conf., 1997.

[78] N. Kapre. Marathon: Statically-Scheduled Conflict-Free Routing on FPGA Overlay NoCs. In 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2016.

[79] N. Kapre and J. Gray. Hoplite: Building austere overlay NoCs for FPGAs. In Field Programmable Logic and Applications, 2015.

[80] Hany Kashif and Hiren Patel. Buffer Space Allocation for Real-Time Priority-Aware Networks. In proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS). IEEE, 2016.

[81] Shinpei Kato and Yutaka Ishikawa. Gang EDF Scheduling of Parallel Task Systems. In 2009 30th IEEE Real-Time Syst. Symp. IEEE, 2009.

[82] Kenneth A. LaBel and Rich Katz. NASA FPGA Needs and Activities. https://radhome.gsfc.nasa.gov/radhome/papers/label_fpga04.pdf, 2004.

[83] Yooseong Kim, David Broman, Jian Cai, and Aviral Shrivastava. WCET-aware dynamic code management on scratchpads for Software-Managed Multicores. In 2014 IEEE 19th Real-Time Embed. Technol. Appl. Symp. IEEE, 2014.

[84] Leonidas Kosmidis, Jaume Abella, Eduardo Quiñones, and Francisco J Cazorla. A Cache Design for Probabilistically Analysable Real-time Systems. In Proc. Conf. Des. Autom. Test Eur. EDA Consortium, 2013.

[85] Abderahman Kriouile and Wendelin Serwe. Formal Analysis of the ACE Specification for Cache Coherent Systems-on-Chip. In Form. Methods Ind. Crit. Syst. Springer, 2013.

[86] Matthew Kuo, Partha Roop, Sidharta Andalam, and Nitish Patel. Precision Timed Embedded Systems Using TickPAD Memory. In 2013 13th Int. Conf. Appl. Concurr. to Syst. Des. IEEE, 2013.

[87] A Kurdila, M Nechyba, R Prazenica, W Dahmen, P Binev, R DeVore, and R Sharp- ley. Vision-based control of micro-air-vehicles: Progress and problems in estimation. In Decis. Control. 2004. CDC. 43rd IEEE Conf., 2004.

[88] K Lakshmanan, S Kato, and R Rajkumar. Scheduling Parallel Real-Time Tasks on Multi-core Processors. In 2010 31st IEEE Real-Time Syst. Symp. IEEE, 2010.

[89] Karthik Lakshmanan, Shinpei Kato, and Ragunathan Rajkumar. Scheduling parallel real-time tasks on multi-core processors. In Real-Time Syst. Symp. (RTSS), 2010 IEEE 31st, 2010.

[90] Chris Lattner and Vikram Adve. LLVM: A compilation framework for lifelong pro- gram analysis & transformation. In Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, page 75. IEEE Computer Society, IEEE Computer Society, 2004.

[91] Jean-Yves Le Boudec and Patrick Thiran. Network Calculus: A Theory of Deter- ministic Queuing Systems for the Internet. Springer-Verlag, 2001.

[92] Jing Li, Kunal Agrawal, Chenyang Lu, and Christopher Gill. Analysis of Global EDF for Parallel Tasks. In 2013 25th Euromicro Conf. Real-Time Syst. IEEE, 2013.

[93] Jing Li, Jian Jia Chen, Kunal Agrawal, Chenyang Lu, Chris Gill, and Abusayeed Saifullah. Analysis of Federated and Global Scheduling for Parallel Real-Time Tasks. In Proc. - Euromicro Conf. Real-Time Syst., 2014.

[94] L. Li, L. Gao, and J. Xue. Memory coloring: A compiler approach for scratchpad memory management. In Parallel Architectures and Compilation Techniques, 2005. PACT 2005. 14th International Conference on, 2005.

[95] Yun Liang, Huping Ding, Tulika Mitra, Abhik Roychoudhury, Yan Li, and Vivy Suhendra. Timing analysis of concurrent programs running on shared cache multi- cores. Real-Time Syst., 2012.

[96] Ben Lickly, Isaac Liu, Sungjun Kim, Hiren D Patel, Stephen A Edwards, and Ed- ward A Lee. Predictable programming on a precision timed architecture. In Proc. 2008 Int. Conf. Compil. Archit. Synth. Embed. Syst. ACM, 2008.

[97] J. Liedtke, H. Hartig, and M. Hohmuth. OS-controlled cache predictability for real- time systems. In Proc. Third IEEE Real-Time Technol. Appl. Symp. IEEE Comput. Soc, 1997.

[98] Chung Laung Liu and James W Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM, 1973.

[99] Jane W S W Liu. Real-Time Systems. Prentice Hall PTR, 2000.

[100] Yu Liu and Wei Zhang. Scratchpad Memory Architectures and Allocation Algorithms for Hard Real-Time Multicore Processors. J. Comput. Sci. Eng., 2015.

[101] Jing Lu, Ke Bai, and Aviral Shrivastava. SSDM: smart stack data management for software managed multicores (SMMs). In Proc. 50th Annu. Des. Autom. Conf., 2013.

[102] Lars Lundberg. Multiprocessor scheduling of age constraint processes. In RTCSA, 1998.

[103] T R Maeurer and D Shippy. Introduction to the Cell multiprocessor. IBM J. Res. Dev., 2005.

[104] Renato Mancuso, Roman Dudko, Emiliano Betti, Marco Cesati, Marco Caccamo, and Rodolfo Pellizzoni. Real-time cache management framework for multi-core ar- chitectures. In 2013 IEEE 19th Real-Time Embed. Technol. Appl. Symp. IEEE, 2013.

[105] Renato Mancuso, Roman Dudko, and Marco Caccamo. Light-PREM: Automated software refactoring for predictable execution on COTS embedded systems. In 2014 IEEE 20th Int. Conf. Embed. Real-Time Comput. Syst. Appl., pages 1–10. IEEE, aug 2014.

[106] Renato Mancuso, Rodolfo Pellizzoni, Marco Caccamo, Lui Sha, and Heechul Yun. WCET(m) Estimation in Multi-core Systems Using Single Core Equivalence. In Real-Time Syst. (ECRTS), 2015 27th Euromicro Conf., 2015.

[107] Milo M. K. Martin, Mark D. Hill, and Daniel J. Sorin. Why on-chip cache coherence is here to stay. Commun. ACM, 2012.

[108] Joel Matějka, Björn Forsberg, Michal Sojka, Zdeněk Hanzálek, Luca Benini, and Andrea Marongiu. Combining PREM compilation and ILP scheduling for high-performance and predictable MPSoC execution. In Proc. 9th Int. Work. Program. Model. Appl. Multicores Manycores - PMAM’18. ACM Press, 2018.

[109] Alessandra Melani, Marko Bertogna, Vincenzo Bonifaci, Alberto Marchetti- Spaccamela, and Giorgio Buttazzo. Schedulability Analysis of Conditional Parallel Task Graphs in Multicore Systems. IEEE Trans. Comput., 2016.

[110] S Metzlaff, I Guliashvili, S Uhrig, and T Ungerer. A dynamic instruction scratchpad memory for embedded processors managed by hardware. In Proc. 24th Int. Conf. Archit. Comput. Syst. (ARCS), 2011.

[111] Jörg Mische, Christian Mellwig, Alexander Stegmeier, Martin Frieb, and Theo Ungerer. Minimally buffered deflection routing with in-order delivery in a torus. Proc. Elev. IEEE/ACM Int. Symp. Networks-on-Chip - NOCS ’17, 2017.

[112] Jörg Mische and Theo Ungerer. Guaranteed Service Independent of the Task Placement in NoCs with Torus Topology. In Proc. 22nd Int. Conf. Real-Time Networks Syst. ACM, 2014.

[113] David Mosberger. Memory consistency models. ACM SIGOPS Oper. Syst. Rev., 1993.

[114] Thomas Moscibroda and Onur Mutlu. A Case for Bufferless Routing in On-chip Networks. In Proceedings of the 36th Annual International Symposium on Computer Architecture. ACM, 2009.

[115] Bryon Moyer. Real World Multicore Embedded Systems. Newnes, 2013.

[116] Tiago Rogerio Muck and Antonio Augusto Frohlich. Run-time scratch-pad memory management for embedded systems. In IECON 2011 - 37th Annu. Conf. IEEE Ind. Electron. Soc. IEEE, 2011.

[117] Frank Mueller. Compiler support for software-based cache partitioning. ACM SIGPLAN Not., 1995.

[118] J Musmanno. Data Intensive Systems (DIS) Benchmark Performance Summary. Technical report, 2003.

[119] R. Naseer and J. Draper. Parallel double error correcting code design to mitigate multi-bit upsets in SRAMs. In Solid-State Circuits Conference, 2008. ESSCIRC 2008. 34th European, 2008.

[120] Geoffrey Nelissen, Vandy Berten, Joel Goossens, and Dragomir Milojevic. Techniques Optimizing the Number of Processors to Schedule Multi-threaded Tasks. In 2012 24th Euromicro Conf. Real-Time Syst. IEEE, 2012.

[121] Jan Nowotsch and Michael Paulitsch. Leveraging multi-core computing architectures in avionics. In Dependable Comput. Conf. (EDCC), 2012 Ninth Eur., 2012.

[122] Andreas Olofsson, Tomas Nordström, and Zain ul Abdin. Kickstarting high-performance energy-efficient manycore architectures with Epiphany. CoRR, 2014.

[123] John K Ousterhout. Scheduling Techniques for Concurrent Systems. In ICDCS, 1982.

[124] M. Panic, C. Hernandez, E. Quinones, J. Abella, and F. J. Cazorla. Modeling High- Performance Wormhole NoCs for Critical Real-Time Embedded Systems. In proceed- ings of the IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2016.

[125] Marco Paolieri, Jörg Mische, Stefan Metzlaff, Mike Gerdes, Eduardo Quiñones, Sascha Uhrig, Theo Ungerer, and Francisco J. Cazorla. A hard real-time capable multi-core SMT processor. ACM Trans. Embed. Comput. Syst., 2013.

[126] Marco Paolieri, Eduardo Quiñones, Francisco J Cazorla, Guillem Bernat, and Mateo Valero. Hardware support for WCET analysis of hard real-time multicore systems. ACM SIGARCH Comput. Archit. News, 2009.

[127] Soyoung Park, Hae-woo Park, and Soonhoi Ha. A Novel Technique to Use Scratch- pad Memory for Stack Management. In Des. Autom. Test Eur. Conf. Exhib. 2007. DATE ’07, 2007.

[128] R Pellizzoni, E Betti, S Bak, G Yao, J Criswell, M Caccamo, and R Kegley. A Predictable Execution Model for COTS-based Embedded Systems. In 2011 17th IEEE Real-Time Embed. Technol. Appl. Symp., 2011.

[129] R. Pellizzoni, E. Betti, S. Bak, G. Yao, J. Criswell, M. Caccamo, and R. Kegley. A Predictable Execution Model for COTS-Based Embedded Systems. In Proceed- ings of the 2011 17th IEEE Real-Time and Embedded Technology and Applications Symposium. IEEE Computer Society, 2011.

[130] Rodolfo Pellizzoni and Marco Caccamo. Real-Time Management of Hardware and Software Tasks for FPGA-based Embedded Systems. IEEE Trans. Comput., 56(12):1666–1680, dec 2007.

[131] Rodolfo Pellizzoni, Andreas Schranzhofer, Jian-Jia Chen, Marco Caccamo, and Lothar Thiele. Worst case delay analysis for memory interference in multicore sys- tems. In 2010 Des. Autom. Test Eur. Conf. Exhib. (DATE 2010). IEEE, 2010.

[132] Bo Peng, Nathan Fisher, and Marko Bertogna. Explicit preemption placement for real-time conditional code. In Real-Time Syst. (ECRTS), 2014 26th Euromicro Conf., 2014.

[133] Luís Miguel Pinho, Vincent Nélis, Patrick Meumeu Yomsi, Eduardo Quiñones, Marko Bertogna, Paolo Burgio, Andrea Marongiu, Claudio Scordino, Paolo Gai, Michele Ramponi, and Michal Mardiak. P-SOCRATES: A parallel software framework for time-critical many-core systems. Microprocess. Microsyst., 2015.

[134] J A Poovey, T M Conte, M Levy, and S Gal-On. A Benchmark Characterization of the EEMBC Benchmark Suite. IEEE Micro, 2009.

[135] Aayush Prakash and HD D Patel. An instruction scratchpad memory allocation for the precision timed architecture. In Des. Autom. Test Eur. . . . , 2012.

[136] I Puaut and C Pais. Scratchpad memories vs locked caches in hard real-time systems: a quantitative comparison. In Des. Autom. Test Eur. Conf. Exhib. 2007. DATE ’07, 2007.

[137] Arthur Pyka, Mathias Rohde, and Sascha Uhrig. A real-time capable coherent data cache for multicores. Concurr. Comput. Pract. Exp., 2014.

[138] Arthur Pyka, Mathias Rohde, and Sascha Uhrig. Extended performance analysis of the time predictable on-demand coherent data cache for multi- and many-core systems. In 2014 Int. Conf. Embed. Comput. Syst. Archit. Model. Simul. (SAMOS XIV). IEEE, 2014.

[139] Eduardo Quiñones, Emery D. Berger, Guillem Bernat, and Francisco J. Cazorla. Using Randomized Caches in Probabilistic Real-Time Systems. In 2009 21st Euromicro Conf. Real-Time Syst. IEEE, 2009.

[140] Jan Reineke. Randomized caches considered harmful in hard real-time systems. Leibniz Trans. Embed. Syst., 2014.

[141] Abusayeed Saifullah, Jing Li, Kunal Agrawal, Chenyang Lu, and Christopher Gill. Multi-core real-time scheduling for generalized parallel task models. Real-Time Syst., 2013.

[142] Abusayeed Saifullah, Jing Li, Kunal Agrawal, Chenyang Lu, and Christopher Gill. Multi-core real-time scheduling for generalized parallel task models. Real-Time Syst., 2013.

[143] Abhik Sarkar, Frank Mueller, Harini Ramaprasad, and Sibin Mohan. Push-assisted migration of real-time tasks in multi-core processors. In Proc. 2009 ACM SIG- PLAN/SIGBED Conf. Lang. Compil. tools Embed. Syst. - LCTES ’09. ACM Press, 2009.

[144] Martin Schoeberl. One-way shared memory. In 2018 Des. Autom. Test Eur. Conf. Exhib. IEEE, 2018.

[145] Paul Sebexen and Thomas Sohmers. Software Techniques for Scratchpad Memory Management. In Proceedings of the 2015 International Symposium on Memory Sys- tems, pages 98–102. ACM, 2015.

[146] M. She. Semiconductor Flash Memory Scaling. University of California, Berkeley, 2003.

[147] Mayank Shekhar, Abhik Sarkar, Harini Ramaprasad, and Frank Mueller. Semi- Partitioned Hard-Real-Time Scheduling under Locked Cache Migration in Multicore Systems. In 2012 24th Euromicro Conf. Real-Time Syst. IEEE, 2012.

[148] John Paul Shen and Mikko H Lipasti. Modern processor design: fundamentals of superscalar processors. Waveland Press, 2013.

[149] Zheng Shi and Alan Burns. Real-Time Communication Analysis for On-Chip Networks with Wormhole Switching. In Second ACM/IEEE Int. Symp. Networks-on-Chip (NOCS 2008). IEEE, 2008.

[150] Sudhir Singh. Performance optimization in gang scheduling in cloud computing. Int. Organ. Sci. Res. Comput. Eng., 2012.

[151] C. Slayman. Whitepaper on Soft Errors in Modern Memory Technology. Technical report, 2010.

[152] Muhammad R Soliman and Rodolfo Pellizzoni. WCET-Driven Dynamic Data Scratchpad Management with Compiler-Directed Prefetching.

[153] Daniel J. Sorin, Mark D. Hill, and David A. Wood. A Primer on Memory Consistency and Cache Coherence. Synth. Lect. Comput. Archit., 2011.

[154] C. Steiger, H. Walder, and M. Platzner. Operating systems for reconfigurable em- bedded platforms: online scheduling of real-time tasks. IEEE Trans. Comput., 53, 2004.

[155] V Suhendra and T Mitra. Exploring locking & partitioning for predictable shared caches on multi-cores. In Proc. 45th Annu. Des. Autom. Conf., 2008.

[156] V. Suhendra, T. Mitra, A. Roychoudhury, and Ting Chen. WCET Centric Data Allocation to Scratchpad Memory. In 26th IEEE Int. Real-Time Syst. Symp. IEEE, 2005.

[157] Vivy Suhendra, Abhik Roychoudhury, and Tulika Mitra. Scratchpad allocation for concurrent embedded software. ACM Transactions on Programming Languages and Systems (TOPLAS), 2010.

[158] Richard Szeliski. Computer vision: algorithms and applications. Springer Science & Business Media, 2010.

[159] R. Tabish, R. Mancuso, S. Wasly, A. Alhammad, S.S. Phatak, R. Pellizzoni, and M. Caccamo. A Real-Time Scratchpad-Centric OS for Multi-Core Embedded Sys- tems. In 2016 IEEE Real-Time Embed. Technol. Appl. Symp. RTAS 2016 - Proc., 2016.

[160] R Tabish, R Mancuso, S Wasly, S S Phatak, R Pellizzoni, and M Caccamo. (under review) A Real-Time Scratchpad-centric OS with Predictable Inter/Intra-Core Communication for Multi-core Embedded Systems. In Real-Time Syst. RTS. Springer, 2018.

[161] R. Tabish, R. Mancuso, S. Wasly, S.S. Phatak, R. Pellizzoni, and M. Caccamo. A reliable and predictable scratchpad-centric OS for multi-core embedded systems. In Proc. IEEE Real-Time Embed. Technol. Appl. Symp. RTAS, 2017.

[162] Hideki Takase, Hiroyuki Tomiyama, and Hiroaki Takada. Partitioning and allocation of scratch-pad memory for priority-based preemptive multi-task systems. In 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010), 2010.

[163] Hideki Takase, Hiroyuki Tomiyama, and Hiroaki Takada. Partitioning and allocation of scratch-pad memory for priority-based preemptive multi-task systems. In 2010 Des. Autom. Test Eur. Conf. Exhib. (DATE 2010). IEEE, 2010.

[164] David Tam, Reza Azimi, Livio Soares, and Michael Stumm. Managing shared L2 caches on multicore systems in software. In Work. Interact. between Oper. Syst. Comput. Archit., 2007.

[165] Corey Tessler and Nathan Fisher. BUNDLE: real-time multi-threaded scheduling to reduce cache contention. In Real-Time Syst. Symp. (RTSS), 2016 IEEE, 2016.

[166] Sascha Uhrig, Lillian Tadros, and Arthur Pyka. MESI-Based Cache Coherence for Hard Real-Time Multicore Systems. In Archit. Comput. Syst. 2015. Springer, 2015.

[167] Theo Ungerer, Francisco Cazorla, Pascal Sainrat, Guillem Bernat, Zlatko Petrov, Christine Rochange, Eduardo Quinones, Mike Gerdes, Marco Paolieri, Julian Wolf, Hugues Casse, Sascha Uhrig, Irakli Guliashvili, Michael Houston, Floria Kluge, Ste- fan Metzlaff, and Jorg Mische. Merasa: Multicore Execution of Hard Real-Time Applications Supporting Analyzability. IEEE Micro, 2010.

[168] R. Vargas, S. Royuela, M. Serrano, X. Martorell, and E. Quinones. A lightweight OpenMP4 run-time for embedded systems. In Proc. ASP-DAC, 2016.

[169] Bryan C. Ward, Jonathan L. Herman, Christopher J. Kenna, and James H. Ander- son. Making Shared Caches More Predictable on Multicore Platforms. In 2013 25th Euromicro Conf. Real-Time Syst. IEEE, 2013.

[170] Saud Wasly and Rodolfo Pellizzoni. A Dynamic Scratchpad Memory Unit for Predictable Real-Time Embedded Systems. In 2013 25th Euromicro Conf. Real-Time Syst. IEEE, 2013.

[171] Saud Wasly and Rodolfo Pellizzoni. Hiding memory latency using fixed priority scheduling. In 2014 IEEE 19th Real-Time Embed. Technol. Appl. Symp. IEEE, 2014.

[172] Saud Wasly and Rodolfo Pellizzoni. (under review) Bundled Scheduling of Parallel Real-time Tasks. In Real-Time Syst. Symp. RTSS. IEEE, 2018.

[173] Saud Wasly, Rodolfo Pellizzoni, and Nachiket Kapre. HopliteRT: An efficient FPGA NoC for real-time applications. In F. Program. Technol. (ICFPT), 2017 Int. Conf., 2017.

[174] J Whitham and N Audsley. Implementing time-predictable load and store operations. In Proc. EMSOFT, 2009.

[175] J Whitham, RI I Davis, N Audsley, S Altmeyer, and C Maiza. Investigation of Scratchpad Memory for Preemptive Multitasking. jwhitham.org, 2012.

[176] Jack Whitham and Neil C. Audsley. Explicit Reservation of Local Memory in a Predictable, Preemptive Multitasking Real-Time System. In 2012 IEEE 18th Real Time Embed. Technol. Appl. Symp. IEEE, 2012.

[177] Jun Yan and Wei Zhang. WCET Analysis for Multi-Core Processors with Shared L2 Instruction Caches. In 2008 IEEE Real-Time Embed. Technol. Appl. Symp. IEEE, 2008.

[178] Gang Yao, Rodolfo Pellizzoni, Stanley Bak, Emiliano Betti, and Marco Caccamo. Memory-centric scheduling for multicore hard real-time systems. Real-Time Syst., 2012.

[179] Heechul Yun, Renato Mancuso, Zheng-Pei Wu, and Rodolfo Pellizzoni. PALLOC: DRAM bank-aware memory allocator for performance isolation on multicore plat- forms. In 2014 IEEE 19th Real-Time Embed. Technol. Appl. Symp. IEEE, 2014.

[180] Heechul Yun, Gang Yao, Rodolfo Pellizzoni, Marco Caccamo, and Lui Sha. Mem- Guard: Memory bandwidth reservation system for efficient performance isolation in multi-core platforms. In 2013 IEEE 19th Real-Time Embed. Technol. Appl. Symp. IEEE, 2013.

[181] Sergey Zhuravlev, Sergey Blagodurov, and Alexandra Fedorova. Addressing shared resource contention in multicore processors via scheduling. In ACM SIGARCH Comput. Archit. News, 2010.
