ABSTRACT

SARKAR, ABHIK. Predictable Task Migration Support and Static Task Partitioning for Scalable Multicore Real-Time Systems. (Under the direction of Frank Mueller.)

Multicores are becoming ubiquitous, not only in general-purpose but also embedded computing. This trend is a reflection of contemporary embedded applications posing steadily increasing demands in processing power. On such platforms, prediction of timing behavior to ensure that deadlines of real-time tasks can be met is becoming increasingly difficult. While real-time global/semi-partitioned multicore scheduling approaches help to assure deadlines based on firm theoretical properties, their reliance on task migration poses a significant challenge to timing predictability in practice. Task migration actually (a) reduces timing predictability for contemporary multicores due to cache warm-up overheads, (b) renders locking of cache lines infeasible for multicore real-time systems and (c) increases traffic on the network-on-chip (NoC) interconnect. Additionally, prior work in static task partitioning on multicore architectures focuses on shared cache organization with a fixed number of cores. Such schemes are not suitable for static partitioning on scalable multicore architectures that feature private cache organization. We attempt to address these limitations in this dissertation. The following are the key contributions of this work:

1. First, a task migration between two cores imposes cache warm-up overheads on the migration target, which can lead to missed deadlines for tight real-time schedules. We propose a novel push-assisted cache migration model to proactively migrate cache lines through novel software and micro-architectural support. Our mechanism imposes cache migration delays of only a fraction of the task's execution time. This delay can be steered to fill idle slots in the schedule, i.e., it does not contribute to the execution time of the migrated task. We also propose micro-architectural modifications that further reduce the delay and bus traffic.

2. Second, locked cache lines, which are predominantly used in real-time systems, are immobile during task migration. We address this issue by extending the push-assisted migration model with several cache migration techniques to efficiently retain locked cache lines on a bus-based chip multi-processor architecture. We also provide deterministic migration delay bounds that help schedulers decide which migration technique(s) to utilize while migrating a single task or multiple tasks. This information also allows the scheduler to determine the feasibility of task migrations, which is critical for the safety of any hard real-time system. Such proactive migration of locked cache lines in multicores is unprecedented to our knowledge.

3. Third, we further the use of locked caches on scalable multicore architectures by analyzing their impact on static task partitioning algorithms. In shared cache architectures, a single resource is shared among all the tasks. However, in scalable cache architectures with private caches, conflicts exist only among the tasks scheduled on one core. This calls for a cache-aware allocation of tasks onto cores. Here, we propose a novel variant of the cache-unaware First Fit Decreasing (FFD) algorithm called the Naive locked First Fit Decreasing (NFFD) policy. We propose two cache-aware static scheduling schemes: (1) Greedy First Fit Decreasing (GFFD) and (2) Colored First Fit Decreasing (CoFFD) for task sets where tasks do not have intra-task conflicts among locked regions (Scenario A). NFFD is capable of scheduling high-utilization task sets that FFD cannot schedule. CoFFD consistently outperforms GFFD, requiring fewer cores and lower system utilization. For a more generic case where tasks have intra-task conflicts, we split the task partitioning into two phases: Task Selection and Task Allocation (Scenario B). Instead of resolving conflicts at a global level, these algorithms resolve conflicts among regions while allocating a task onto a core and perform unlocking at the region level instead of the task level. We show that a combination of our novel Dynamic Ordering (Task Selection) with Chaitin's Coloring (Task Allocation) scheme reduces the number of cores required considerably over a basic scheme (a combination of Monotone Ordering and Regional FFD).

Overall, this dissertation suggests that deployment of locked caches and hardware support for cache migration on scalable multi-processors can enable more predictable and efficient multi-processor scheduling.

© Copyright 2012 by Abhik Sarkar

All Rights Reserved

Predictable Task Migration Support and Static Task Partitioning for Scalable Multicore Real-Time Systems

by Abhik Sarkar

A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Computer Science

Raleigh, North Carolina

2012

APPROVED BY:

James Anderson Xuxian Jiang

Helen Gu James Tuck

Frank Mueller
Chair of Advisory Committee

DEDICATION

To Ma and Baba.

BIOGRAPHY

Abhik Sarkar was born and raised in New Delhi, India. He attended St. Xavier's high school, Delhi. Thereafter, to pursue a degree in Electronics and Communication Engineering, he attended Delhi University. After completing his undergraduate degree, he worked for two years as an Associate Systems Lab Engineer in the Advanced Systems Technologies group at STMicroelectronics, India. While working in a research group led by Dr. Kaushik Saha, he developed a keen interest in parallel processing and parallel computer architectures. During his tenure at ST, he worked on parallel implementations of audio/video codecs. In fall 2005, he moved to North Carolina as a graduate student in the Department of Electrical and Computer Engineering at North Carolina State University (NCSU). There, he worked under the guidance of Dr. Yan Solihin and Dr. Joe Defenderfer (Tekelec Corp.). As part of his Master's degree in Computer Engineering, he developed libraries to assist the development of lock-free data structures. Thereafter, he joined the Department of Computer Science at NCSU in 2008 to continue his graduate studies as a doctoral student in the systems research group led by Dr. Frank Mueller. Since then he has been pursuing his research interests in hardware/software co-design for predictable multicore real-time systems. During his doctoral studies he also interned in research labs at Intel and VMware. After finishing his doctoral studies at NCSU, he will be joining Intel Corporation.

ACKNOWLEDGEMENTS

It had drama, excitement, periods of boredom, moments of brilliance, doses of good calls and poor judgments, tests of patience, lack of concentration, desperate dives, constant pressure, words of confidence, disappointing runs, lots of prayers, back and wrist problems, distance from family, but at the end a tryst with tearful celebrations. Every cricket enthusiast would call that the description of a single Test innings played by the god of cricket, Sachin Ramesh Tendulkar. For me, that is the description of my PhD. For non-cricket-playing souls, please enlighten yourself with a copy of the May '12 edition of TIME magazine. During this period, one of the lessons I have learnt is to be grateful, which makes the acknowledgments an integral part of this dissertation. Firstly, I would like to thank my parents for teaching me the importance of perseverance, for inspiring me with their selflessness, and for all the heartfelt wishes that have made it possible. In their lives, my father wanted to have an engineering degree and my mother wanted to be a doctor. This dissertation strikes two birds with one stone. How? This is left as an exercise for the reader. Ma/Baba, I dedicate this dissertation to you. Secondly, I would like to thank my advisor Dr. Frank Mueller. His commitment towards me and my research has always motivated me to do better. Throughout my research, he has given me invaluable advice and guidance, both in research and life. His constant flow of ideas has amazed me but at the same time, challenged me to think beyond the conventional. His method of operating with doodles on discussion boards has enabled me to appreciate the concept of controlled chaos. It has been an invaluable experience to work with him and it is impossible for me to express my gratitude towards him in words. I am grateful to Paula D'Onofrio for being a motherly figure to me when I needed it the most. Her unconditional support for my dreams and aspirations has been an integral part of maintaining my focus in this long-drawn process. Her furry feline kids Bella, Willoughby, and Oliver have been great stress busters. Their research on the importance of slumber as a mechanism to relieve stress has motivated me to follow suit. I am grateful to Dr. Robert Jenks, whose words of encouragement have instilled self-belief. He made me realize that wooden block puzzles can have a humbling effect. I thank Corrine Shuster for making me realize the importance of gratitude and savoring life in every situation. Special thanks to Dr. James Tuck for helping me with simulation environments and for the discussions that have helped me understand computer architecture in depth. I sincerely thank Dr. James Anderson, Dr. Helen Gu and Dr. Xuxian Jiang for their advice, suggestions and questions that have contributed substantially to making this thesis comprehensible. I would like to thank Nikhil Deshpande, with whom I have shared a large part of my graduate

student life. We started by being disgruntled roommates, but over a period of seven years I have developed immense respect for this class act. Then there were two, Suresh Thummalapenta and Gaurav Bawa, who are the embodiment of hard work and Indian community networking. Special thanks to John Whitlow for accommodating me under his roof, which I have been calling home during my final year of PhD. You can only wish to have a landlord who ships Garra Rufa fish to get you manicured and pedicured while you write your dissertation. I would like to thank the small family of spiritual beings at Ganesh Place in Durham, who have made me smile through the tough times either with profound love or with gifted goofiness. I am grateful to Dorothy for including me in her annual "feastivities" with the "Mazzitellis", a musical family with a big heart. Thanks Joe for making this world a little more musical. Ferrrrnando Rubio, I thank you for painting my graduate student life with shades of fun-loving salsa attitude. And then comes the turn of the Beagle Boys at the Systems Research Lab in NC State. Christopher Zimmer is the "Ma Beagle", who runs the show and keeps an eye on the door whenever anyone walks in. David Fiala is the "Bigtime Beagle", who leads the pack to everything mischievous. Arash Rezaei is the "Bebop Beagle" wearing jazzy outfits. Dr. James Elliotte is the personification of "Babyface Beagle" for the obvious reasons. Yangpeng Zhang and Xing Wu represent the "Bankjob Beagle" with exceptional soccer and publishing skills. And I am the "Burger Beagle" for obvious reasons again. As a Beagle Boy I have loved the passionate discussions on research and everything besides. Without them I would not have learnt the importance of laying low and keeping the banter-guns loaded. Thanks to all of you for being a part of my clique. Over the years many graduate students have left a brush stroke or two on the canvas of my life: Amit Awekar, Ravi Ramaseshan, Prasad Wagle, Sandeep Navada, Niket Chaudhary, Devesh Tiwari, Xiaowei Jiang, Siddharth Chhabra, Brian Rogers, Nicky Mahilani, Bala Subramanya Bhat, Karthik Babu, and many more. I am thankful to all of you for being a part of my journey. Since this all started when I was a student at Delhi University and an employee at STMicroelectronics in India, I would like to thank Dhananjay Gadre, Dr. Raj Senani, Saibal Dutt, Dr. Mona Mathur, Dr. Kaushik Saha, Srijib Narayan Maiti, Surinder Pal Singh, Yogesh Kumar Soni, Sumit Johar and Sandeep Dabas for motivating me with their brilliance. Special thanks to Rahul Gandhi for being there when I needed him the most. Last but not the least, I would like to thank the staff of the Office of International Students, the Department of Computer Science, and the Graduate School Office at NC State, and everyone else behind the scenes, for all their help. Special thanks to Dr. Douglas Reeves and Dr. David Thuente for helping me through the process of graduate studies at NC State. To all of you who have been mentioned, I wish you the very best in life and I hope to keep running into you. To those whom I have inadvertently omitted, I sincerely apologize and

request you to let me know how I can make up for it.

TABLE OF CONTENTS

List of Tables ...... ix

List of Figures ...... x

Chapter 1 Introduction ...... 1
1.1 Real-Time Systems ...... 1
1.2 Multicore Architectures ...... 9
1.3 Multicore Real-time Systems ...... 14
1.3.1 Scheduling Real-time Tasks on Multicores ...... 14
1.3.2 Challenges with Multicore Schedulability ...... 15
1.4 Problem Statements and Contributions ...... 16
1.5 Hypothesis ...... 17
1.6 Organization ...... 18

Chapter 2 Push-Assisted Migration of Real-Time Tasks in Multicore Processors ...... 19
2.1 Timing Predictability and Micro-architectural Challenges ...... 19
2.2 Introduction ...... 20
2.3 Problem Analysis ...... 21
2.4 Proposed Solution ...... 26
2.5 Migration Models ...... 28
2.5.1 The Conventional Pull Model ...... 28
2.5.2 The Pre-fetch Task Model (PTM) ...... 28
2.5.3 The Push-assisted Cache Migration Model ...... 30
2.6 Simulation Platform ...... 33
2.7 Evaluation ...... 34
2.7.1 Prefetch Task Model ...... 34
2.7.2 Push-assisted Cache Migration Model ...... 36
2.8 Related Work ...... 41
2.9 Conclusions ...... 42

Chapter 3 Deterministic Task Migration for Hard Real-Time Tasks in Multicore Processors ...... 44
3.1 Introduction ...... 44
3.2 Problem Analysis ...... 46
3.3 Proposed Solution and Assumptions ...... 49
3.4 Migration Models ...... 52
3.4.1 Push Model ...... 52
3.4.2 Regional Cache Migration (RCM) ...... 52
3.4.3 Parallel Cache Migrations ...... 55
3.4.4 Set-Scan Cache Migration (SSCM) ...... 62
3.5 Simulation Platform ...... 65

3.6 Evaluation ...... 66
3.6.1 RCM vs. SSCM ...... 66
3.6.2 Pipelined RCM techniques ...... 67
3.6.3 Slotted SSCM techniques ...... 67
3.6.4 Parallel vs. Pipelined Cache Migration ...... 69

Chapter 4 Static Task Partitioning for Locked Caches in Multicore Real-Time Systems ...... 71
4.1 Introduction ...... 71
4.2 Related Work ...... 76
4.3 System Design ...... 76
4.4 Task Partition Algorithms: Scenario A ...... 78
4.4.1 Cache-Unaware Schemes ...... 78
4.4.2 Cache-Aware Task Partitioning ...... 80
4.5 Task Partition Algorithms: Scenario B ...... 88
4.5.1 Task Allocation ...... 88
4.5.2 Task Selection ...... 89
4.6 Algorithmic Complexity ...... 92
4.7 Task Set Generation ...... 93
4.8 Evaluation ...... 95
4.8.1 Scenario A ...... 96
4.8.2 Scenario B ...... 98

Chapter 5 Conclusion and Future Work ...... 102
5.1 Conclusions ...... 102
5.2 Future Work ...... 105
5.2.1 Holistic Task Partitioning on Scalable Multicore Real-Time Systems ...... 105
5.2.2 Thermal Analysis of Multicore Real-Time Systems ...... 108

References ...... 109

LIST OF TABLES

Table 2.1 WCET Benchmarks ...... 22
Table 2.2 Task migration & dilation in WCET ...... 23
Table 2.3 Matrix multiplication WCET vs. matrix size ...... 23
Table 2.4 Bubble Sort WCET vs. Number of Elements ...... 24
Table 2.5 Simulation Parameters ...... 33
Table 2.6 Prefetch Task Model & Potential to Bound WCET ...... 35
Table 2.7 WCM Performance & Overhead ...... 36
Table 2.8 Migration Paradigms and Percent Additional Execution over WCET ...... 38
Table 2.9 Cache Migration Overhead cycles over WCET [percent] ...... 38
Table 2.10 Cache Migration and Bandwidth Overhead [number of requests] ...... 40

Table 3.1 Experimental Benchmarks ...... 47
Table 3.2 Impact of Cache Locking (when contended with cnt) ...... 48
Table 3.3 Simulation Parameters ...... 66
Table 3.4 Migration Delays: RCM vs SSCM ...... 67
Table 3.5 Pipelined Cache Migration ...... 68
Table 3.6 SSCM Variants ...... 68
Table 3.7 Parallel vs Pipeline ...... 70

Table 4.1 Locking and Conflict Analysis for 32 Tasks ...... 74
Table 4.2 System Parameters ...... 95
Table 4.3 Allocated Cores for Cache-aware & Cache-unaware Schemes ...... 96
Table 4.4 Allocated Cores for CoFFD & GFFD: All Tasks Locked ...... 97
Table 4.5 CoFFD vs GFFD: Selected Tasks Unlocked ...... 97
Table 4.6 Mono+RFFD vs. Mono+CC for High Conflict: Number of cores allocated ...... 99
Table 4.7 Mono+RFFD vs. Mono+CC for Low Conflict: Number of cores allocated ...... 99
Table 4.8 Average allocations performed by Scenario B algorithms: Number of cores allocated ...... 100
Table 4.9 Best and Average improvement by Dyn+CC over Mono+RFFD ...... 100
Table 4.10 CoFFD vs Dyn+CC: task vs region unlocking ...... 101

Table 5.1 Average core allocation: Sensitivity to Memory Latency ...... 107

LIST OF FIGURES

Figure 1.1 Generic view of Cache Architecture ...... 5
Figure 1.2 MESI-Locally Initiated Accesses ...... 11
Figure 1.3 MESI-Remotely Initiated Accesses ...... 12

Figure 2.1 Task Execution amid Task Migration ...... 21
Figure 2.2 Task migration and scheduling anomalies ...... 25
Figure 2.3 Task Migration Coupled with Cache Migration ...... 27
Figure 2.4 Task Migration with Pre-fetch Task support ...... 29
Figure 2.5 Task Migration with Push-assisted Cache Migration support ...... 30

Figure 3.1 Symmetric Multi-processors ...... 49
Figure 3.2 Scheduler Initiated Cache Migration Overlaps Slack Time ...... 50
Figure 3.3 Cache Migration follows Pfair scheduling ...... 51
Figure 3.4 Regional Cache Migration Operational Sequence ...... 53
Figure 3.5 Controlled Cache Migration Pipe-lining Operational Sequence ...... 54
Figure 3.6 Streamed Cache Migration Pipe-lining Operational Sequence ...... 55
Figure 3.7 Parallel Multiple RCMs ...... 56
Figure 3.8 Partial Ordering of Core Pairs ...... 60
Figure 3.9 Set-Scan Cache Migration Operational Sequence ...... 62
Figure 3.10 Conflicts between RCM and SSCM Push Requests ...... 63
Figure 3.11 Slotted-SSCM Operational Sequence ...... 63
Figure 3.12 Slotted-SSCM Pipe-lining Operational Sequence ...... 65
Figure 3.13 Offline Computation of Migration Delays for Slotted-SSCM ...... 69

Figure 4.1 Tile-based Architecture ...... 72
Figure 4.2 A Lock-based Architecture ...... 77
Figure 4.3 Greedy First Fit Decreasing in Operation ...... 81
Figure 4.4 Chaitin's Coloring in Operation with 2 Colors ...... 83
Figure 4.5 Task Coloring in Operation ...... 86

Figure 5.1 Memory Traffic Routing of a Quartile ...... 106

Chapter 1

Introduction

1.1 Real-Time Systems

Real-time systems are computing environments where temporal correctness is as important as functional correctness. A real-time system is temporally correct if all real-time tasks meet their timing constraints. Control systems use real-time software to operate in a temporally correct fashion in automobiles, aerial vehicles, audio-video devices, medical radiation devices, etc. A single real-time system can be composed of periodically executing real-time tasks that collect information from their environment and trigger a response. A delayed response could result in artifacts that impact the quality of service or even lead to catastrophic consequences, depending upon the purpose of the system. In a mobile device supporting audio/video applications, the frame rate leading to a "flicker-free" visual experience governs the amount of time in which a video frame should be decoded. A delayed response will result in dropped video frames, which in turn impacts the visual experience of the customers. However, such failures do not render a system nonfunctional. Such tasks are called soft real-time tasks and have soft deadlines. In contrast, when tasks in an Anti-lock Brake System (ABS) fail, the wheels may lock up, which could lead to catastrophic consequences such as loss of life. Such tasks are called hard real-time tasks and have hard deadlines. In a system, real-time tasks can be executed periodically, aperiodically or sporadically. Periodic tasks have regular arrival times. Aperiodic and sporadic tasks may have irregular arrival times. Aperiodic tasks have soft deadlines while sporadic tasks have hard deadlines. This work assumes tasks to be periodic. Each task $i$ is periodically released after a time interval $P_i$. Every instance of task $i$ is called a job and denoted as $J_i^n$, where $n$ identifies the $n$th instance of task $i$. Every real-time task is given a relative deadline $D_i$. Every real-time job has to respond before a pre-calculated instant of time called its deadline $d_i^n$, which is computed by adding the task's relative deadline, $D_i$, to the time instant $t$ at which the $n$th job of task $i$ was

released. If $J_i^n$ does not finish execution before $d_i^n$, then the job is deemed to have missed its deadline. The relative task deadline can be less than, equal to, or greater than a task's period. In our work, we assume that $D_i \le P_i$, which means that $J_i^n$ has to finish prior to the release of $J_i^{n+1}$. The consequence of missing a deadline depends on the criticality of a task. As stated earlier, a task that renders an encoded stream of video frames has a soft deadline. A video rendering task failing to meet a deadline leads to a poorer visual experience. However, if a task that releases brakes fails to meet its deadline in an ABS system, it can cause catastrophic consequences. Such critical tasks have hard deadlines. Tasks with hard and soft deadlines are called hard and soft real-time tasks, respectively. A real-time system is composed of at least one soft/hard real-time task, while it can run multiple real-time or non-real-time tasks. In order to reduce operational costs and to leverage the computational capacity of modern processors, real-time systems are usually multi-programmed. Task scheduler(s) running on these systems are responsible for invoking jobs of periodic tasks. For soft real-time systems, task schedulers may also incorporate a responsive action when a deadline miss is detected. A deadline miss can be detected in several ways. For example, a timer-based interrupt can be scheduled by the task scheduler at $d_i^n$ for the invocation of $J_i^n$. The interrupt will initiate an interrupt routine that checks whether $J_i^n$ has finished its execution. In case a job misses its deadline, the system has to respond to prevent/reduce subsequent jobs from missing their deadlines. These actions may vary depending upon the implementation of the system. For example, the task scheduler may choose to terminate $J_i^n$ or skip job $J_i^{n+1}$. In case certain subsequent jobs are dependent upon $J_i^n$, then those jobs need to be skipped as well. In contrast, for hard real-time systems it becomes imperative that every task meets its deadline for the system to be functional. For such systems, the developer needs to analyze the tasks and make sure that all the tasks meet their deadlines prior to launching the system. This is called schedulability analysis. Schedulability analysis requires the following information:

1. The Worst-Case Execution Time (WCET) of each task, represented as $E_i$. This can be obtained by

(a) Static analysis of the source code. This has been a research topic for the real-time systems community for more than three decades. With the advent of multicores, multi-level caches and network-on-chip (NoC) interconnects, it is still an open research area. A static analysis tool requires certain parameters of a processor architecture, e.g., the execution latency of instructions in an Instruction Set Architecture (ISA), the access latency at different levels of the memory hierarchy, the number of cycles lost due to a branch instruction on a system with branch prediction turned off, etc. The tool then parses the instructions and, using an intermediate representation (IR), computes the worst-case number of cycles needed for each basic block to execute. Parsing the IR, like an

abstract syntax tree, allows it to converge to a worst-case execution path (WCEP) that delivers the WCET. Since program execution usually revolves around tight loops, the static analysis tool needs information on the maximum number of iterations of each loop. Several static analysis tools are available, though they are prone to deliver high upper bounds. However, if static analysis of a task delivers a WCET lower than the period, the bound is considered safe since the real-time task cannot exceed this bound.
(b) Dynamically observing the execution time of the task with an exhaustive input set. This scheme is widely used, as certain parameter values assumed by static analysis tools, like an upper bound on load-dependent branches, might be much higher than the values observed in practice.

This allows us to compute the task density as $E_i / D_i$. This is the computational demand of a task on a computing resource. $E_i / P_i$ gives us the task utilization, which is the same as the task density when $P_i = D_i$.

2. A schedulability test that determines whether a task set is schedulable on a given computing resource. If a task set passes the schedulability test, then it is guaranteed that all the tasks will meet their deadlines. However, a schedulability test is dependent upon the scheduling algorithm. Over the past four decades, various algorithms for scheduling tasks on a computing resource have been proposed. They are broadly characterized as

(a) Clock driven: These are also known as off-line schedulers, as they fix the schedule before the system starts. Table-driven and cyclic scheduling are two commonly known algorithms under this category [47].
(b) Event driven: These schedulers dynamically schedule tasks at run-time, where the scheduling points are governed by job invocation and job completion. These schedulers can be preemptive or non-preemptive. A preemptive scheduler may suspend the execution of an already executing task in order to execute a higher-priority task. A non-preemptive scheduler will only grant the computing resource on termination of an already executing task. Rate Monotonic (RMA) and Earliest Deadline First (EDF) scheduling are examples of event-driven scheduling algorithms [54, 29].

A feasible schedule on a single computing resource will not miss any deadline even when all the tasks run at their WCET. In other words, a feasible uniprocessor schedule is one that satisfies the following condition:

$U = \sum_{i=1}^{n} \frac{E_i}{D_i} \le 1$, where $n$ is the number of tasks [54].

Here, $U$ is the aggregate of the task densities of all the tasks scheduled on a computing resource. If we assume $P_i = D_i$, then $U$ can be called the aggregate utilization. The preemptive Earliest Deadline First (EDF) algorithm achieves this bound, i.e., it is optimal for uniprocessor real-time systems. If $U > 1$, then the task set cannot be scheduled on the given computing resource. When $U < 1$, the system is under-utilized and multiple tasks can share the computing resource. This requires the real-time tasks to be scheduled by a real-time scheduler that enforces a scheduling policy to share the computing resource.
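As a concrete illustration of this test, the following is a minimal sketch (my own, not taken from the dissertation) that sums task densities and checks the bound above; the task parameters are hypothetical.

```c
#include <stdio.h>

/* One periodic task: WCET E_i, period P_i, relative deadline D_i (D_i <= P_i). */
struct task { double wcet, period, deadline; };

/* Density-based uniprocessor test: the sum of E_i / D_i must not exceed 1. */
static int schedulable(const struct task *ts, int n) {
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += ts[i].wcet / ts[i].deadline;
    return u <= 1.0;
}

int main(void) {
    /* Hypothetical task set; all times in milliseconds. */
    struct task ts[] = { {2.0, 10.0, 10.0}, {3.0, 15.0, 15.0}, {5.0, 20.0, 20.0} };
    printf("schedulable: %s\n", schedulable(ts, 3) ? "yes" : "no");
    return 0;
}
```

For these hypothetical parameters, $U = 0.2 + 0.2 + 0.25 = 0.65 \le 1$, so the test passes.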

Most contemporary processors incorporate complex micro-architectural enhancements. Out-of-order execution [58, 83], branch prediction [99, 100, 82] and speculative execution with value prediction [53, 78] are some of the key features that enable these uniprocessors to deliver high throughput. However, all these features hamper the predictability of the WCET for real-time applications, which is the first requirement for performing a robust schedulability analysis. Thus, for real-time systems, the focus has been to keep the micro-architecture simple by using in-order processor cores without branch prediction or speculative execution. Another aspect that throttles throughput is the memory access delay. With higher-throughput processors, the memory access delay gets worse. Even at lower processor frequencies, memory access delays are expensive; e.g., with processor cores running at 700 MHz, it takes 70 cycles to access memory for Tilera processors [36]. To reduce memory accesses, on-chip caches are used to store recently accessed memory locations. For general-purpose computing, cache performance is measured by observing the rate at which memory references are not found in the cache, called the cache miss rate. The lower the cache miss rate, the more effective a cache is. The cache miss rate is relevant for assessing average performance; this information is not sufficient for predictability in real-time systems. Abstract Cache Analysis (ACA) is a mechanism by which the worst-case cache behavior is modeled [71, 79, 68]. Abstract Cache Analysis is incorporated into the aforementioned static analysis tools. Instead of assuming that all memory references cause off-chip memory accesses, an abstract cache is assumed. It is called abstract because the state of a memory address is abstract in this cache. The absolute state of the cache is known only at run-time as it is dependent upon the run-time control flow. However, abstract cache analysis statically analyzes the control-flow graph and determines the abstract state of the cache at any point in the program, which allows one to predict all cache hits that "must" exist regardless of the path taken by the program [89]. Static analysis then follows and calculates the execution time of basic blocks while assuming cache hits for "must" hit references predicted by ACA. Eventually, it converges by determining the WCEP and the WCET. In order to perform ACA, static analysis needs to know some key aspects of the cache organization. The following are the features of cache organization that will be referred to in this dissertation:

1. Cache line: The smallest granularity at which a cache is accessed. Theoretically, cache line

Figure 1.1: Generic view of Cache Architecture

[Figure: address decomposition into tag, index and offset bits for a 32KB cache with 64-byte cache lines and a 32-bit address; the left panel shows a direct-mapped cache, the right panel a 2-way set-associative cache with a tag array and data array per way and tag comparison producing the cache hit/miss signal.]

size can be anywhere from the smallest granularity at which the memory is accessed up to the whole cache size. So, a 32-bit RISC architecture with an 8KB cache may have a cache line size anywhere from 32 bits to 8KB. However, in practice neither extreme is considered suitable. With the former cache line size, spatial locality is sacrificed [43]. Spatial locality suggests that there is a high probability that subsequent memory accesses go to neighboring memory locations; e.g., this behavior is inevitable, especially for instruction accesses within a basic block. Having a single cache line is not a good idea either. If only a single cache line is available, then at any time only one block of 8KB can be present in the cache. When a new block is referenced, the new block replaces the older block. If successive memory references refer to different 8KB blocks, then every access will cause a cache miss. Such misses are called conflict cache misses. Thus, cache lines in contemporary micro-processors are 32/64 bytes in size [94].

2. Associativity, Cache sets and Cache ways: In order to explain these characteristics, we describe the mechanism by which caches are referenced using a physical memory address. Figure 1.1 shows two types of organizations: a direct-mapped cache and a 2-way set-associative cache. We assume a 32KB cache being accessed by a 32-bit address. We also

assume 64 bytes as the cache line size. In both cases, the address is broken into three parts:

(a) Offset: This is used to address the individual bytes within a cache line. Since a cache line holds 64 bytes, 6 bits are required to address a 64-byte structure. In both cache organizations, the offset uses the low-order 6 bits.
(b) Index: These bits select the location(s) at which the data corresponding to an address may reside in the cache. For a direct-mapped cache, there is room for only one cache line per index. With a 2-way set-associative cache, there are two entries per index. The associativity of a direct-mapped cache is 1; it is 2 for a 2-way set-associative cache. The set of locations specified by an index in a set-associative cache is called a cache set. $\log_2\left(\frac{\text{CacheSize}}{\text{Associativity} \times \text{CacheLineSize}}\right)$ bits are used as the index.
(c) Cache tag: The remaining higher-order bits, excluding the aforementioned bits, are used as the tag. When a cache line is read into the cache, its tag is stored in the cache-tag array.

When a cache is accessed, the index bits are used to select a cache set in two arrays: the tag array and the data array. Comparisons between the cached tags and the reference tag produce a cache hit/miss signal. On a tag match, it is a cache hit; otherwise, it is a cache miss. These signals are then used as select lines for the multiplexer that forwards the data corresponding to the matching tag. Figure 1.1 shows that our 2-way set-associative cache has two memory arrays for tags and two memory arrays for data. Each group of memory arrays, i.e., one tag array and its corresponding data array, is called a cache way. A set-associative cache is called fully associative when there is a single cache set and any cache line can map within that set. So a 32KB fully associative cache with 64-byte lines has a single cache set holding 512 entries. Usually, fully associative caches are used for smaller cache structures. Also, an 8/16-way set-associative cache approximates a fully associative cache [43]. There are no index bits for fully associative caches because there is only one set, and the tag bits are all higher-order bits excluding the offset bits. A code sketch of this address decomposition appears after this list.

3. Cache addressing: The memory address used for addressing a cache can be a physical or a virtual address, depending upon whether the memory management unit (MMU) for virtualizing memory is used or not. Nearly all implementations of virtual memory divide a virtual address space into virtual pages that map to physical memory regions called frames. This requires virtual-to-physical address translation before the physical memory is referenced. A cache can be addressed using either of these addresses. If the virtual address is used, then on every context switch all the contents of the cache need to be flushed. If

a physical address is used with the MMU turned on, the translation from virtual to physical address needs to be done prior to accessing the cache. This makes cache access a multi-cycle process. It is worse for real-time systems, as dynamic allocation of physical frames to pages makes it impossible to perform ACA since a virtual memory reference can be mapped anywhere in the cache. In high-performance systems, caches are often virtually indexed and physically tagged. Virtual indexing allows the virtual-to-physical translation to take place in parallel with the retrieval of cache tags and cache lines from the indexed cache set. However, this suffers from aliasing, which happens when some of the bits in the virtual index are changed during translation. In contemporary systems, pages are usually 4KB. A 4KB page is addressed by the 12 low-order bits of a given virtual address, called the page offset bits. The page offset bits remain the same during translation, but any of the remaining 20 address bits can be altered. This poses a problem: in Figure 1.1, the index bits extend well past the 12th bit. There are several ways to tackle this at run-time, but for real-time systems it causes indeterminacy. Since contemporary memories are becoming denser, real-time systems can run with the MMU turned off so that a flat physical memory address is used directly [94]. A physically indexed and physically tagged cache is a viable and predictable option for real-time systems. The code sketch following this list also illustrates this aliasing check.

4. Types of Cache Misses: When a program starts execution, it does not have any of its address space cached. This is the state in which the cache is said to be "cold". As the program executes, it populates the cache, i.e., the cache becomes "warm". When a cache miss occurs due to accessing a cache line for the first time, it is called a cold or compulsory cache miss. A capacity cache miss occurs if the program's address space does not fit within a given size of the cache. Another form of cache miss is the conflict cache miss. Conflict cache misses are those that occur in a cache with a given associativity and that could have been prevented by a fully associative cache of the same size.

5. Replacement Policy: On a cache miss, a new cache line is brought in from memory. In a set-associative cache, a cache miss occurs if all the cache lines in the cache set are marked invalid or none of the valid cached tags match the address' tag. Invalidations of cache lines occur due to events like cache flushes and invalidations to maintain coherence between the processor and I/O devices or between processors on a multicore system (described later). If invalid cache lines exist in the indexed set, the incoming cache line is placed there and the cache entry is marked valid. In case all the cache entries are valid, one of them needs to be replaced. The replacement policy decides which cache line to replace. Numerous replacement policies have been studied and proposed. The most commonly used policy is Least Recently Used (LRU). LRU chooses the cache entry

that has been residing in the cache set without being referenced for the longest time. The most efficient implementation of LRU requires a set of counters per cache entry. All the counters need to be updated on every cache hit/miss, which increases the design complexity of caches. Hence, most micro-processors use a pseudo-LRU policy [28]. Locked caches provide some user control over replacement: locking a cache line prevents it from being replaced. In the context of real-time systems, locked caches have been used to achieve predictability. This is the key reason why cache locking is prevalent among embedded processors, such as the IBM PowerPC 460S, Motorola MPC7400, Intel 960, ARM940T, etc. This will be discussed further in later chapters. A sketch of lock-aware LRU victim selection also appears after this list.

6. Multi-level Caches: One would want to design a large cache to reduce cache misses considerably, but larger caches have higher access times. A processor pipelines instructions to improve throughput, and in order to issue a new instruction into the pipeline every cycle, it should be able to read an instruction every cycle. Therefore, the cache access latency impacts the critical path of a processor. However, having smaller caches leads to a higher cache-miss rate. A middle ground is reached with multi-level caches. The cache closest to the processor core is called the Level-1 cache or the L1 cache. The farther a cache is from the processor, the larger its size and the higher its level. This allows the L1 cache to be small and fast enough to match the throughput of the processor. An L2 cache can be larger but is still 1-2 orders of magnitude faster than the memory access latency. Normally, a cache miss at any level leads to an access to the next higher level of cache; by higher level, we mean farther away from the processor. The traversal of a cache line from one level to another is another important aspect of multi-level caches. In an inclusive cache structure, the higher-level cache retains the cache line while forwarding the content to the lower level. In contrast, an exclusive cache structure invalidates the cache line at the higher cache level. Inclusive caches are useful as they reduce the complexity of cache design; later, in our discussion of multicore cache hierarchies, we will see that they help in maintaining coherence at a higher cache level. Exclusive caches can use the aggregate cache capacity as they do not maintain additional copies.

7. Split and Unified Caches: L1 caches are small structures that are commonly accessed by two different pipeline stages (with the exception of self-modifying code). Also, access patterns of instructions differ from those of data. A unified cache holds both instruction and data cache lines. A small unified cache would cause data cache lines and instruction cache lines to evict each other, a phenomenon known as thrashing. To minimize thrashing and to be able to design caches customized separately to instruction and data access behavior, vendors use split L1 caches (an L1 instruction and an L1 data cache). Higher-level caches are designed to be unified as they have large capacity.

8. Write-through and Write-back Caches: A write-through cache sends updates to its higher levels whenever a cache line is updated. A write-back cache does not send an update, but instead sets a dirty bit associated with the cache line. Cache lines at higher cache levels get updated when the modified lines are evicted from lower levels. If the cache connected to the memory is a write-back cache, it considerably reduces the off-chip traffic. We will discuss more pros and cons of these schemes when we discuss multicores in the following section.
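To make the address decomposition of item 2 and the aliasing concern of item 3 concrete, here is a small sketch (my own illustration, not code from the dissertation) for the 32KB, 64-byte-line, 2-way configuration assumed in the text, with 4KB pages; the address value is arbitrary.

```c
#include <stdint.h>
#include <stdio.h>

/* Number of bits needed to index n positions (n is assumed to be a power of two). */
static unsigned log2u(unsigned n) {
    unsigned b = 0;
    while (n > 1) { n >>= 1; b++; }
    return b;
}

int main(void) {
    const unsigned cache_size = 32 * 1024;  /* 32KB cache, as assumed in the text */
    const unsigned line_size  = 64;         /* 64-byte cache lines                */
    const unsigned assoc      = 2;          /* 2-way set-associative              */
    const unsigned page_size  = 4096;       /* 4KB pages                          */

    /* Field widths: offset covers one line, index covers the number of sets. */
    unsigned offset_bits = log2u(line_size);                          /* 6 */
    unsigned index_bits  = log2u(cache_size / (assoc * line_size));   /* 8 */

    /* Decompose a hypothetical 32-bit physical address. */
    uint32_t addr   = 0x1234ABCDu;
    uint32_t offset = addr & ((1u << offset_bits) - 1);
    uint32_t index  = (addr >> offset_bits) & ((1u << index_bits) - 1);
    uint32_t tag    = addr >> (offset_bits + index_bits);
    printf("offset=%u index=%u tag=0x%x\n", offset, index, tag);

    /* Aliasing check for a virtually indexed cache: the index+offset bits must
     * fit within the page offset, i.e. cache_size/assoc must not exceed page_size. */
    if (offset_bits + index_bits > log2u(page_size))
        printf("virtual indexing would suffer from aliasing\n");
    else
        printf("alias-free virtual indexing\n");
    return 0;
}
```

For this configuration, 6 offset bits plus 8 index bits exceed the 12 page-offset bits, matching the text's observation that the index reaches past the 12th bit.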
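Similarly, the replacement decision of item 5 can be sketched as per-entry LRU ages within one set plus a lock bit that exempts a line from eviction. This is only an illustration of the idea under simplified assumptions (true LRU with one age counter per way), not the locked-cache hardware assumed in later chapters.

```c
#include <stdint.h>

#define WAYS 4

struct cache_entry {
    uint32_t tag;
    uint8_t  valid;
    uint8_t  locked;  /* locked lines are never chosen as victims */
    uint8_t  age;     /* 0 = most recently used, WAYS-1 = least recently used */
};

/* Pick a victim way in one set: prefer an invalid entry, otherwise the oldest
 * unlocked entry. Returns -1 if every valid entry is locked (no victim exists). */
static int pick_victim(struct cache_entry set[WAYS]) {
    int victim = -1;
    uint8_t oldest = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid)
            return w;                       /* free slot: use it directly */
        if (!set[w].locked && set[w].age >= oldest) {
            oldest = set[w].age;
            victim = w;
        }
    }
    return victim;
}

/* On a hit or fill of way 'used', age every other valid entry and reset 'used'. */
static void touch(struct cache_entry set[WAYS], int used) {
    for (int w = 0; w < WAYS; w++)
        if (set[w].valid && w != used && set[w].age < WAYS - 1)
            set[w].age++;
    set[used].age = 0;
}
```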

Even though ACA determines lower WCET bounds, with increasing processing demand on real-time systems, task sets might not be schedulable on uniprocessors. One intuitive mechanism to improve the throughput of real-time applications would be to increase the processor frequency. Moore's Law has been holding for a long time in processor design, yet single-processor designs have reached a clock frequency wall due to fabrication process and power/leakage constraints [5]. This has led designers to investigate the option of chip multiprocessors (CMPs), a.k.a. multicores, to ensure that performance increases at past rates.

1.2 Multicore Architectures

Application-specific cores or accelerators have been used in many embedded systems. In the past decade, the advancement of homogeneous multicore platforms has been the central focus of many micro-processor vendors. Symmetric Multi-Processors (SMPs) were developed with the idea of having multiple processor chips connected via an external shared memory bus, which is used to interface the processors and the memory controller. There are numerous examples of such systems, ranging from the high-end SGI Power Challenge, AMD Opteron, Sun Ultraserver and DEC Alpha Server to the low-end NVidia Tegra and Intel Xeon processors. Requests issued by all the cores are serialized on the SMP bus by a bus arbiter. If the cores use a write-through cache, each write sends a request to the memory. This incurs a large amount of traffic that could throttle the performance of all the programs significantly due to the lack of memory bandwidth for simple read requests. The other option is to use a write-back cache. This prevents updates from going to memory every time there is an update. It reduces the traffic, but it introduces the problem of a non-coherent view of data. With a write-through cache, the memory is always coherent with regard to the contents of the caches, and every time an update occurs it becomes visible on the bus. A snoop control unit, which is part of the cache controller, is responsible for snooping the visible update requests and, in case the updated cache line is present in the local cache, updating it. This requires a cache to have one read/write port for local processor requests and another one for the snoop requests. Such a design impedes performance as a large portion of memory bandwidth is consumed by the update traffic. Modern processors with write-back caches absorb this update traffic but

leave memory with stale contents. This also prevents the updates from being visible to remote snoop controllers, leaving the caches incoherent. In order to ensure coherency, various cache-coherence protocols for SMPs were developed. One of the most common protocols is the MESI protocol. It has four different states that cache lines can be in. The following are the coherency states:

1. Modified (M): This means that the cache line is present only in local cache, the contents of the cache line are different from the contents of the memory, and the local processor is permitted to modify the contents.

2. Exclusive (E): This state dictates that the cache line is resident only in the local cache, but its contents are coherent with those in memory. The cache line must transition to the Modified state before the local processor modifies its contents.

3. Shared (S): This suggests that there are copies held by remote caches besides the local cache. The contents of the cache line are coherent with memory.

4. Invalid (I): The content of the cache line is not valid.

A cache line's state can be modified by local/remote accesses. The memory accesses generated by the local processor are either reads or writes. The key objective of the coherence protocol is to serialize the writes to a cache line (write serialization) and to make each write visible to all the processors (write propagation). This requires a signal to be issued by the cache controller when a local write request finds a cache line in a state other than Modified. We call this signal Read With Intent to Modify (RWITM). When a remote snoop controller snoops an RWITM, it accesses its cache to check whether it holds a copy of the cache line. On finding a match, the state of the cache line is transitioned to the Invalid state. If the line in the remote cache is found in the Modified state, the contents of the cache line are "flushed". Flushing a cache line means that the memory is updated with the contents of the cache line. On receiving acknowledgments from all the processors, the snoop controller on the requesting processor updates the state of the cache line to Modified and the contents of the cache line are modified. There are several state transitions that take place due to local and remote requests. Figure 1.2 shows the state transitions that take place due to locally initiated accesses. The following is a description of the state transitions:

1. The cache line is in the Modified State:

(a) Read Access : Read the contents and stay in the Modified State; (b) Write Access: Update the contents and stay in the Modified State.

2. The cache line is in the Exclusive State:

(a) Read Access : Read the contents and stay in the Exclusive State; (b) Write Access: Transition to the Modified State and update the contents.

Figure 1.2: MESI-Locally Initiated Accesses


3. The cache line is in the Shared State:

(a) Read Access : Read the contents and stay in the Shared State; (b) Write Access: Issue an RWITM bus transaction; on receiving an acknowledgment from the rest of the processors, transition to the Modified State and update the contents.

4. The cache line is in the Invalid State:

(a) Read Access : This means that the cache line is not present in the local cache. Hence, a Mem Read bus transaction is issued. When none of the remote caches hold a copy, the state of the cache line is changed to the Exclusive State. If one or more remote caches hold a copy of the cache line, then the cache line has to be allocated in the Shared State. (b) Write Access: In response to a cache miss, an RWITM bus transaction is issued. The state of the cache line is changed to the Modified State.
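The four locally initiated cases above collapse into a small state-transition function. The sketch below is my own condensation of the transitions just listed (cf. Figure 1.2); the bus-transaction helpers are stand-ins rather than a real bus interface.

```c
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;
typedef enum { READ_ACCESS, WRITE_ACCESS } access_t;

/* Stand-ins for bus transactions: bus_mem_read() reports whether any remote
 * cache answered the Mem Read with a copy; bus_rwitm() invalidates remote copies. */
static int  bus_mem_read(void) { return 1; }
static void bus_rwitm(void)    { }

/* Next state of a cache line after a locally initiated access (MESI). */
static mesi_t local_access(mesi_t state, access_t acc) {
    switch (state) {
    case MODIFIED:                  /* read or write hit: stay in Modified */
        return MODIFIED;
    case EXCLUSIVE:                 /* write hit upgrades without bus traffic */
        return (acc == READ_ACCESS) ? EXCLUSIVE : MODIFIED;
    case SHARED:
        if (acc == READ_ACCESS) return SHARED;
        bus_rwitm();                /* write hit: invalidate remote copies first */
        return MODIFIED;
    case INVALID:                   /* miss: fetch the line */
        if (acc == WRITE_ACCESS) { bus_rwitm(); return MODIFIED; }
        return bus_mem_read() ? SHARED : EXCLUSIVE;
    }
    return INVALID;                 /* unreachable */
}
```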

Figure 1.3 exhibits the state transitions caused by remotely initiated accesses. The remotely generated bus transactions are RWITM and Mem Read, issued by the cache controllers in response to

Figure 1.3: MESI-Remotely Initiated Accesses


locally generated accesses. As we explained before, the snoop controller snoops the bus at every bus cycle for remote bus transactions. The following is a description of the state transitions caused by the snoop controller:

1. The cache line is in the Modified State:

(a) Mem Read : The state of the cache line switches to the Shared State. But this cache line also has the latest updated value, so the contents of the cache line are flushed to update the memory. (b) RWITM : The cache line is invalidated and the contents are flushed.

2. The cache line is in the Exclusive State:

(a) Mem Read : The state of the cache line switches to Shared. Note that there is no cache-to-cache exchange of the cache line content. This is because cache-to-cache and memory-to-cache latencies are comparable for SMPs. Thus, the contents can just as well be supplied by the memory. However, Chip Multi-Processors (CMPs) have on-chip cores connected via an on-chip interconnect. In such architectures, it is efficient to have cache-to-cache transfers. (b) RWITM : Invalidate the cache line and the contents are read from memory.

3. The cache line is in the Shared State:

(a) Mem Read : The cache line stays in the Shared State; (b) RWITM : Invalidate the cache line and the contents are read from memory.

4. The cache line is in the Invalid State: This means that the snooping cache does not hold a copy of the accessed cache line, so no action is taken.
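The snoop-side half of the protocol can be sketched the same way; again, this is only an illustration of the transitions listed above (cf. Figure 1.3), with a stub standing in for the cache-line flush.

```c
typedef enum { L_INVALID, L_SHARED, L_EXCLUSIVE, L_MODIFIED } line_state;
typedef enum { REQ_MEM_READ, REQ_RWITM } bus_request;

/* Stand-in for flushing a modified line, i.e. updating memory with its contents. */
static void flush_line(void) { }

/* Next state of a line in the snooping cache after a remote bus request (MESI).
 * An Invalid line is unaffected; a Modified line is flushed before it changes state. */
static line_state snoop(line_state state, bus_request req) {
    switch (state) {
    case L_MODIFIED:
        flush_line();                          /* memory holds stale data       */
        return (req == REQ_MEM_READ) ? L_SHARED : L_INVALID;
    case L_EXCLUSIVE:
    case L_SHARED:
        return (req == REQ_MEM_READ) ? L_SHARED : L_INVALID;
    case L_INVALID:
        return L_INVALID;                      /* no copy held: nothing to do   */
    }
    return L_INVALID;                          /* unreachable */
}
```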

Advancement in fabrication technology has enabled vendors to pack a larger number of processor cores with private and shared caches within a single chip. This has motivated researchers to investigate different multi-processor cache structures.

1. Shared higher-level caches (like an L2 cache) with coherent lower-level caches (like an L1) connected via a coherent bus have been proposed. They have become prevalent in contemporary embedded system designs; e.g., the NVidia Tegra 3 has four ARM processor cores with 32KB L1 instruction/data caches and a 1MB shared L2 cache [4]. The primary concern going forward with shared L2 caches is that they do not scale to large numbers of cores. This is due to the coherency traffic, as a single RWITM causes all the cores to check their caches and send an acknowledgment. Also, large L2 caches impact the access latency, thereby hurting the common case. Large L2 caches can account for 50% of power dissipation due to leakage current, which contributes significantly to overall power consumption [91].

2. Multi-level private caches: Private caches in multi-programmed workloads exhibit fewer conflicts and reduced access latency for L2 caches [16]. Processor cores with private L1+L2 caches may be connected via a shared bus or point-to-point interconnects. The Intel Pentium D is an example of such a cache design [2]. For large-scale CMPs, processors with mesh interconnects have been in production, e.g., the Tile64Pro from Tilera Corporation. Its cores are called tiles. Each tile has private L1+L2 caches and an interconnect router. In a mesh interconnect, a tile is connected to its neighbors using a high-bandwidth network. The frequency of tile-to-tile transfers typically matches the processor frequency. The routing can also be software controlled, and message passing between cores is supported.

3. Directory-based caches: A hybrid of the two schemes has also been proposed. The tile-based architecture supports a directory-based coherence protocol. A directory entry contains the tag of a cache line and its coherency status. The directory entry also holds an array of bits, with each bit representing a tile. A bit value of 1 indicates that the corresponding tile contains a copy of the cache line. There is only one directory entry for each cache line, and thus it has to be at a fixed location. Early designs suggested locating these entries at the memory controller, assuming an on-chip memory controller. But this

would generate a lot of traffic over the interconnect. Like a single shared cache, a single directory becomes an unscalable structure. This motivated the design of a distributed directory: the directory is distributed among tiles. Each cache line has a unique home tile. The home tile of a cache line is determined by certain bits in the address, similar to the mapping of a cache line to a cache set. Each tile now contains an additional tag array and a corresponding directory entry array. Note that the home tile can be a remote tile. On a cache miss, a requester tile sends a message to the home tile. Based on the coherency status, additional messages are generated by the home tile to other tiles that hold a copy of the cache line. Those tiles then respond accordingly to the home tile and the requester tile. The Tilera architecture can be configured to support private L1+L2 caches with message passing or private L1+L2 caches with a distributed shared L3 cache.
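The home-tile mapping described above ("certain bits in the address, similar to the mapping of a cache line to a cache set") might look like the following; the exact bit selection is an assumption for illustration only and is implementation-specific.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical home-tile mapping for a distributed directory: drop the cache-line
 * offset bits, then interleave cache-line addresses across tiles. The real mapping
 * is implementation-specific; this only illustrates the idea.                      */
static unsigned home_tile(uint64_t paddr, unsigned line_size, unsigned num_tiles) {
    return (unsigned)((paddr / line_size) % num_tiles);
}

int main(void) {
    /* 64 tiles, 64-byte lines: consecutive cache lines land on consecutive tiles. */
    for (uint64_t a = 0; a < 4 * 64; a += 64)
        printf("addr 0x%llx -> home tile %u\n",
               (unsigned long long)a, home_tile(a, 64, 64));
    return 0;
}
```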

There has been a substantial amount of research that has improved the average-case execution of high-performance applications. This has helped multicores become ubiquitous in devices that are used on a daily basis, from laptops and cellular phones to supercomputers. However, the feasibility of multicores in real-time systems is still under investigation for different cache hierarchies, interconnects and memory interfaces. The following sections discuss these challenges in detail.

1.3 Multicore Real-time Systems

As the number of tasks increases, the processing demand in terms of overall processor utilization keeps increasing. Thus, a complex system like an automotive subsystem that adds tasks with every new release puts ever more pressure on computing resources. As discussed before, due to high power leakage at high frequencies, multicores are used to support the execution of multi-programmed workloads. This trend already extends to embedded designs with heterogeneous multicores (e.g., for cell phones) and also homogeneous multicores at a larger scale [6]. Thus, it is evident that multicores will become ubiquitous over the next few years.

1.3.1 Scheduling Real-time Tasks on Multicores

As discussed before, schedulability of a real-time task set is dependent upon the predictability of the WCET on a given architecture under a given scheduling policy. In the past decade, several multi-processor scheduling policies have been proposed. These scheduling policies can broadly be characterized as:

1. Static Partitioned Scheduling: Under this type of scheduling, tasks are assigned to a core and are not allowed to migrate [30, 18]. Each core runs an independent scheduler. Though optimal assignment of tasks to partitions has been proven to be an NP-complete problem,

simple heuristics like First Fit Decreasing (FFD) have been found to be quite efficient for task partitioning [44]; a small sketch of FFD follows this list. In Chapter 4, we will discuss this algorithm in detail.

2. Global Scheduling: The major drawback of static partitioning is that it is unable to distribute tasks that have high density/utilization, i.e., tasks with a utilization greater than 0.5. To address this limitation, global scheduling techniques have been and are being proposed. The fundamental premise of these techniques is that tasks may migrate between cores. Some global scheduling techniques suffer from reduced system utilization [30, 50]. A whole class of fair scheduling algorithms has recently been developed for multicore scheduling that gives better utilization bounds [15, 59, 10, 11, 84, 14]. However, under global/fair scheduling it is hard to achieve a tight bound on the number of task migrations. Therefore, these algorithms were proposed as non-work-conserving schedules. A non-work-conserving schedule is one where the computing resource may remain idle even though there are tasks ready to be scheduled.

3. Semi-Partitioned Scheduling: Another way of limiting the number of task migrations is to use semi-partitioned scheduling. Here, some of the tasks are statically scheduled while others are allowed to migrate. In such a system, there is a hierarchy of schedulers. There may be a single global scheduler that is invoked at certain events to allocate migratable tasks to different cores, while each core runs an independent scheduler that is responsible only for scheduling the tasks allocated to that core. The original work on semi-partitioning was directed at soft real-time systems [12]. Subsequently, several algorithms have been proposed for hard real-time systems [32, 37].
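As referenced in the first item above, here is a minimal sketch of cache-unaware First Fit Decreasing partitioning: tasks sorted by decreasing utilization are placed on the first core whose aggregate utilization stays within 1. This is my own baseline illustration with hypothetical utilizations, not the cache-aware algorithms of Chapter 4.

```c
#include <stdio.h>

#define MAX_CORES 16

/* First Fit Decreasing: sort utilizations in descending order, then place each
 * task on the first core whose aggregate utilization stays <= 1. Returns the
 * number of cores used, or -1 if the task set does not fit on max_cores cores. */
static int ffd(double *util, int n, int max_cores) {
    double load[MAX_CORES] = {0.0};
    /* simple insertion sort, descending by utilization */
    for (int i = 1; i < n; i++) {
        double u = util[i];
        int j = i - 1;
        while (j >= 0 && util[j] < u) { util[j + 1] = util[j]; j--; }
        util[j + 1] = u;
    }
    int used = 0;
    for (int i = 0; i < n; i++) {
        int placed = 0;
        for (int c = 0; c < max_cores && c < MAX_CORES && !placed; c++) {
            if (load[c] + util[i] <= 1.0) {
                load[c] += util[i];
                if (c + 1 > used) used = c + 1;
                placed = 1;
            }
        }
        if (!placed) return -1;  /* task does not fit on any core */
    }
    return used;
}

int main(void) {
    double util[] = {0.6, 0.5, 0.4, 0.3, 0.2};  /* hypothetical task utilizations */
    printf("cores used: %d\n", ffd(util, 5, MAX_CORES));
    return 0;
}
```

For these hypothetical utilizations (aggregate 2.0), FFD packs the five tasks onto two cores.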

1.3.2 Challenges with Multicore Schedulability

As mentioned in Section 1.3.1, the basic assumption of global/semi-partitioned scheduling is that tasks are migratable. However, the schedulability analysis for these scheduling policies considers the overhead of task migration to be either zero or some constant value. In order to assume zero migration cost, these analyses assume processor architectures without any caches, i.e., a memory reference causes a memory access regardless of the migration. Such an assumption might render tasks unschedulable due to a prohibitively high WCET. Research on using caches in multicore real-time systems has resorted to shared caches. This simplifies the cache analysis as all the tasks are assumed to share a single cache resource. This allows the tasks to have lower utilizations, and pessimistic migration costs remain within the allowable range such that the task sets are still schedulable. However, as we have explained, shared caches are unscalable. With increasing processing requirements in every computing domain, including real-time systems, scalable multicore architectures are bound to become ubiquitous. These scalable architectures use private caches. Not much research has been conducted on the usage of private

15 caches in multicores real-time systems. These assumptions for multicore scheduling algorithms may lead to an inaccurate analysis. Also, static partitioning on scalable multi-processors with private caches have not been studied.

1.4 Problem Statements and Contributions

This dissertation assesses if scalable multicore architectures with private caches are suitable for real-time systems. The following are the issues presented and solutions proposed in this document:

1. In Chapter 2, we present a push-assisted cache migration scheme [77]. We illustrate that task migration actually (a) reduces timing predictability for scalable multicores due to cache warm-up overheads while (b) increasing traffic on the network-on-chip (NoC) interconnect. This chapter puts forth a fundamentally new approach to increase the timing predictability of multicore architectures in the presence of task migration. A task migration between two cores imposes cache warm-up overheads on the migration target, which can lead to missed deadlines for tight real-time schedules. We propose novel micro-architectural support to migrate cache lines. We present two micro-architectural schemes: (a) Whole Cache Migration (WCM) and (b) Regional Cache Migration (RCM). We illustrate that our mechanisms are capable of providing predictable cache migration compared to a conventional Prefetch Task Model (PTM). Our experimental results show that RCM reduces overheads considerably. This overhead can be steered to fill idle slots in the schedule, i.e., it does not contribute to the execution time of the migrated task. The migration overhead can also be used by schedulers to select migratable tasks during semi-partitioning of task sets [80].

2. In Chapter 3, we illustrate the idea of migrating locked cache lines. Locking cache lines in hard real-time systems is a common means of achieving predictability of cache access behavior and of tightening as well as reducing the worst-case execution time, especially in a multi-tasking environment. Tasks with locked cache lines need to proactively migrate these lines before the next invocation of the task; otherwise, cache locking on multicore architectures becomes useless as predictability is compromised. This chapter proposes enhancements to hardware-based push-assisted cache migration as a means to retain locks on cache lines across migrations [74]. We improve our RCM and WCM techniques with pipelined cache migration schemes that significantly reduce the cache migration delay of an individual task migration. We also present a parallel migration mechanism that maximizes bus utilization in the midst of multiple cache migrations. We further provide deterministic migration delay bounds that help the scheduler decide which migration technique(s) to utilize to relocate a single task or multiple tasks.

3. In Chapter 4, we further the concept of locked caches on scalable multicore architectures by analyzing its impact on static task partitioning algorithms. In shared cache architectures, a single resource is shared among all the tasks. The objective of task partitioning on a shared cache architecture is to reduce the response time of tasks. With scalable architectures, there are abundant computing and cache resources, and the focus shifts toward efficiently utilizing computing resources. Also, in scalable cache architectures with private caches, conflicts exist only among the tasks scheduled on one core. This calls for a cache-aware allocation of tasks onto cores. Here, we propose a novel variant of the cache-unaware First Fit Decreasing (FFD) algorithm called the Naive locked First Fit Decreasing (NFFD) policy. We propose two cache-aware static scheduling schemes: (1) Greedy First Fit Decreasing (GFFD) and (2) Colored First Fit Decreasing (CoFFD) for task sets where tasks do not have intra-task conflicts among locked regions (Scenario A) [75]. NFFD is capable of scheduling high utilization task sets that FFD cannot schedule. Experiments also show that CoFFD consistently outperforms GFFD, resulting in a lower number of cores and lower system utilization. For a more generic case where tasks have intra-task conflicts, we split task partitioning into two phases: task selection and task allocation (Scenario B) [76]. Instead of resolving conflicts at a global level, these algorithms resolve conflicts among regions while allocating a task onto a core and perform unlocking at the region level instead of the task level. We show that a combination of our novel Dynamic Ordering (task selection) with Chaitin's Coloring (task allocation) scheme reduces the number of cores required considerably over a basic scheme (a combination of Monotone Ordering and Regional FFD). Regional unlocking allows this scheme to outperform CoFFD for medium utilization (0.40 > locked task utilization ≥ 0.25) task sets from Scenario A. However, CoFFD performs better than any other scheme for high utilization (0.55 > locked task utilization ≥ 0.40) task sets from Scenario A.

1.5 Hypothesis

Advancements in processor technology provide an opportunity to harness large numbers of computational resources in the form of scalable multiprocessors. With the ever-increasing computational demand of real-time systems, we foresee the deployment of such systems on scalable multiprocessors. The primary limitation of multiprocessor real-time scheduling algorithms is that they are agnostic of private cache organization. Such a cache organization results in:

1. Unpredictability of task migration costs in global/semi-partitioned scheduling; and

2. Ineffective use of computational resources during static partitioning.

We attempt to address these limitations in this dissertation. Hence, the hypothesis of this dissertation is the following: Multiprocessor real-time scheduling algorithms can more effectively utilize scalable multicore platforms when global/semi-partitioned scheduling techniques predict cache migration costs accurately and static partitioning considers the impact of private caches. Deployment of locked caches and hardware support for cache migration on scalable multi-processors can enable more predictable and efficient multiprocessor real-time scheduling.

1.6 Organization

Chapter 2 analyzes the impact of task migration and presents a hardware mechanism for migrating cache lines from one core to another during the slack available within the schedule. Chapter 3 defines the problem of task migration for hard real-time systems using locked cache lines and presents mechanisms to handle individual/multiple cache migrations efficiently. Chapter 4 presents static partitioning mechanisms for private locked caches on scalable multiprocessor architectures. Chapter 5 discusses open problems for future work and presents the conclusions drawn from this work.

Chapter 2

Push-Assisted Migration of Real-Time Tasks in Multicore Processors

2.1 Timing Predictability and Micro-architectural Challenges

Multicores have been a research topic in micro-architecture, which has led to rapid industry adoption (Intel Core, AMD Barcelona, Sun Niagara, IBM Power) and even advanced scalable multicore designs [5, 6] due to the frequency wall. Past research has focused on improving parallelization strategies [23, 95, 73, 90, 39] to increase average performance and to provide scalability on the memory path [22, 57, 46, 33, 87, 56], including capacity-oriented schemes that scavenge unused neighboring cache lines [34, 24, 26, 101]. Yet, timing predictability actually deteriorates in multicores. Thus, the results and inferences drawn from contemporary high-performance computer architecture research are not directly applicable to real-time systems. High-performance applications do not have deadlines, so deviations from average performance do not render a system dysfunctional. Real-time systems largely consist of multiple tasks, often periodically scheduled, such that each task is able to meet its deadline. Jobs of a periodic task are released at regular intervals to obtain sensor readings, perform processing actions, often of a control-theoretic nature, and then engage in actuator actions. Jobs of real-time tasks have to complete by their deadlines. These jobs require accurate timing predictability to ensure deadlines can be met, which could easily be affected by minor deviations on complex multicore architectures that dilate execution. The dilation of interest in this work is due to the migration of a task or an application between processor cores. Task migrations in contemporary architectures are followed by cache warm-ups on the migration target. Cache warm-ups are usually ignored or have insignificant impact in high-performance computing.

For example, architectural simulations make use of so-called simpoints, i.e., simulation points that delineate program phases. The idea is to reach a stable architectural state at which the average case can be studied [40]. In contrast, cache warm-ups have considerable impact on real-time systems because any dilation in the execution time of real-time tasks could lead to system failure. Multicore research is actively working towards providing scalable multicores with large L2 caches. Distributed shared L2 caches, private L2 caches, and hybrids involving both designs are being actively compared by the high-performance research community. The size of such L2 caches often provides real-time systems with enough resources to fit all tasks within the L2. Consequently, tighter WCETs help tasks to become schedulable by allowing them to meet their deadlines. However, task migrations may cause deadlines to be missed due to dilations in execution time, e.g., due to cache warm-up. Thus, task migration on multicore architectures significantly degrades the predictability of the WCET of real-time tasks. In order to evaluate the impact of cache warm-ups without any extraneous effects, we simulated a multicore architecture. Our experimental results on the simulated multicore architecture over a set of WCET benchmarks show that task migration can dilate the execution time anywhere from less than one percent up to 56.6%.

2.2 Introduction

We propose a novel software mechanism to highlight the impact of pro-active migration of cache lines in order to increase the predictability of real-time tasks. We follow it up with a novel hardware/software mechanism to dramatically increase predictability in the presence of migration by providing micro-architectural support for task migration that is further suitable for static timing analysis, an area not covered by previous work on WCET analysis for multi-level caches in multicores and shared-memory multiprocessors [60, 98, 42]. In this work, we assume a multicore architecture with private L1 and L2 caches as explained in the previous chapter. Our solution is based upon the transfer of L2 cache lines of a migrated task from a source core to a target core over the coherence interconnect. This migration is initiated between a task's job completion and the next job's invocation, preferably in idle slots of the real-time schedule. Research has been conducted on keeping relative deadlines much shorter than periods for tasks in control systems to reduce variation in response time [13, 49]. This leads to the composition of task sets with high density but low utilization, as shown in Figure 2.2, which also leaves idle slots due to slack available in the system. We propose a software scheme where the scheduler launches a task at the target core that prefetches cache lines. This scheme is called the Pre-fetch Task Model (PTM). It reduces the time dilation to which the task is subjected on the target core as a result of its migration. However, it is not able to prefetch instruction cache lines and imposes a high overhead that cannot be reduced. Hence, we investigate a pure hardware scheme, called Whole Cache Migration (WCM), that restores the whole cache context of the task onto the target core.

However, this mechanism involves high overhead, leaving not enough time to migrate all lines before the next job invocation on the target core. Hence, we propose a software-assisted hardware scheme, called Regional Cache Migration (RCM), that allows the developer to convey knowledge about the memory space to be migrated to the cache controller, which then initiates a push operation (instead of a pull) of the lines to the target. Thus, the cache controller can identify a subset of cache lines subject to migration, which reduces the migration overhead. RCM reduces the overhead for all tested benchmarks, for some by an order of magnitude, while constraining the dilation caused by task migration close to the dilation experienced by a task under WCM. The objective of this work is to provide hardware support that enables predictability and a reduction in the cache-related migration delay (CRMD).

Figure 2.1: Task Execution amid Task Migration. The figure shows two cores (C0, C1) with private I$/D$ and L2 caches and a timeline in which Job 0 initializes and executes on Core 0, warming the task's cache footprint, while a later job incurs cold cache misses on Core 1 after migration.

2.3 Problem Analysis

This section presents the performance impact of task migration on the predictability of task WCETs. Experimental Architecture: Our experimental model assumes a CMP architecture with private L1+L2 caches. The processor cores are connected via a bus-based interconnect. This choice was made to exhibit properties similar to contemporary tile-based architectures under the assumption of a QoS-based interconnect [48]. It thus excludes the complexity introduced by the interconnect and exposes the predictability challenge caused by cache misses alone. The simulated environment is composed of a two-core CMP. Each core is composed of an in-order processor with a private 32KB L1 cache and a private 2MB L2 cache. The standard MESI coherence protocol is used to maintain coherence across the L2 caches.

SESC [72], an event-driven simulator, is used to model this architecture. Experimental Model: Figure 2.1 shows the experimental methodology of this work. We assume the real-time applications to be composed of periodically invoked real-time tasks. The timeline shows the instances of execution of a task (jobs). Job 0 initializes the task at core 0. The initialization warms the local cache of core 0, which is shown as the task cache footprint. In our experimental model, we treat the first job (job 0) of a task as part of an initialization phase during which the cache is warmed up. The next release of the task, job 1, is invoked on the same core while the cache is warm. As the task footprint is within the local cache, job 1 does not incur any L2 cache misses. In a uniprocessor scenario, if the footprints of all tasks fit within the local cache, then static timing analysis for each load can use the L2 cache hit latency instead of the memory latency as the upper bound. This approach gives a much tighter bound on the WCET of a task. However, in a multicore environment, this WCET does not hold anymore, as subsequent jobs of the task might be scheduled on a different core, as shown in Figure 2.1. Job 2 executes on core 1, whose local L2 does not contain the task footprint. Hence, job 2 spends a substantial amount of time bringing the cache lines from core 0 to core 1. Moreover, the extent of this impact can be unbounded. This is the subject of the analysis in our experimental study.

Table 2.1: WCET Benchmarks

    Benchmark   Functionality                                              Algorithmic Complexity
    bs          Binary search on a given array of records                  O(log n)
    crc         Cyclic redundancy check computation on 40 bytes of data    O(n)
    cnt         Counts non-negative numbers in a matrix                    O(n)
    stats       Statistics program using floating-point operations         O(n)
    bsort       Bubble sort program                                        O(n^2)
    matmult     Matrix multiplication of two matrices                      O(n^3)

Impact of Migration on WCET: Experiments were performed over a subset of the WCET benchmarks from Malardalen [7]. Table 2.1 lists those benchmarks along with their functionality and their algorithmic complexity. Each of these benchmarks was executed as described in the experimental model. The results of the experiments are listed in Table 2.2. The first column lists the name of the benchmark and the second column shows the size of the dataset in kilobytes. The third and fourth columns provide the execution time in cycles for tasks before migration (with warm caches) and after migration (with cold caches), respectively, and the fifth column expresses the increase in execution time due to migration as a percentage.

Table 2.2: Task migration & dilation in WCET

    Benchmark   Dataset Size   Before Migration   After Migration   WCET Dilation
    Name        (kbytes)       (cycles)           (cycles)          (%)
    bs          1024           1595               2497              56.55
    crc         1              6972               8292              18.93
    cnt         308            2014528            2309082           14.62
    stats       781            11676261           12428101          6.43
    srt         2              4002752            4006507           0.09
    matmult     2.6            954106             963203            0.95
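The dilation percentages reported in Tables 2.2 through 2.4 are simple arithmetic over the measured cycle counts; a minimal helper (the function name is ours) and one worked value from Table 2.2 follow.

    /* WCET dilation due to migration, as reported in Tables 2.2-2.4. */
    double wcet_dilation_percent(unsigned long before_cycles, unsigned long after_cycles) {
        return 100.0 * (double)(after_cycles - before_cycles) / (double)before_cycles;
    }
    /* Example (bs, Table 2.2): 100 * (2497 - 1595) / 1595 = 56.55 */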

The results show that the increase in execution time over the WCET due to migration varies from 56.6 percent for binary search to less than one percent for matrix multiplication. However, it is important to understand the characteristics of the tasks that are more susceptible to dramatic changes in execution time. One such characteristic is the algorithmic complexity. Our experiments show that accurate knowledge of the algorithmic complexity of a real-time task can help to identify tasks that are more affected by migration than others. Benchmarks whose complexity is O(n) tend to be affected by migration the most. This is because the critical operations (compare, multiply, etc.) and the number of load operations grow proportionally with the number of data elements. Since the L2 miss latency is much higher than any CPU operation, a substantial number of cold misses can have a significant impact on the execution time of a task. This is true for the crc and cnt benchmarks shown in Table 2.2. However, the stats benchmark, which contains algorithms with complexity O(n), shows a lesser impact than cnt and crc. There are two reasons for this behavior: (a) stats uses floating-point arithmetic that takes a significantly larger number of cycles than integer arithmetic, so the ratio between the L2 miss latency and the critical floating-point operations is significantly reduced, and (b) stats is composed of multiple algorithms of linear complexity that reuse the same dataset. The benchmarks matrix multiplication and sort are of complexity O(n^3) and O(n^2), respectively. The increase in their execution cycles due to task migration is less than one percent due to heavy data reuse.

Table 2.3: Matrix multiplication WCET v/s matrix size

    Matrix      Before Migration   After Migration   WCET Dilation
    Dimension   (cycles)           (cycles)          (%)
    15x15       189116             192231            1.64
    30x30       1456796            1468135           0.78
    60x60       11448656           11484769          0.31

Table 2.4: Bubble Sort WCET v/s Number of Elements

    Array      Before Migration   After Migration   WCET Dilation
    Elements   (cycles)           (cycles)          (%)
    50         161160             162400            0.8
    100        641155             643040            0.29
    250        4002752            4006507           0.09

We conducted experiments to obtain the execution cycles for different data set sizes for matrix multiplication and sort. Tables 2.3 and 2.4 show the results for matrix multiplication and sort, respectively. The first column gives the size of the data set, the second and third columns list the execution time that the task takes to complete before migration (with warm caches) and after migration (with cold caches), respectively, and the fourth column expresses the increase in execution time due to migration as a percentage. The results show that even these benchmarks experience a greater impact when the data set is small. Task migration and impact on schedulability: Since the impact of task migration can be quite large, it can adversely affect the schedulability of tasks. Using a simple task set, we demonstrate that a single task migration can force the migrated task or other tasks within the task set to miss their deadlines under both static and dynamic scheduling. We chose matmult and cnt, termed MM and Cnt, respectively, to construct a task set as shown in Figure 2.2. We selected matmult for its high complexity among the benchmarks studied (which implies high data reuse). We chose cnt as it was among the benchmarks whose execution time increased by 14% due to migration. We then examined the impact of task migration using both (a) static and (b) dynamic scheduling schemes. Figure 2.2(a) shows a static schedule across two cores. On core 0, a task set composed of matmult and cnt is deployed. On core 1, a task set composed of only matmult is running. The scheduler grants matmult higher static priority. This system is non-preemptive. The system progresses as expected until the scheduler experiences a request from a sporadic task to be scheduled on core 0. There may be several reasons for a task to be scheduled on a specific core. For example, the core may be connected to a sensor or an actuator, and interrupts triggered by a device can invoke a sporadic task to execute on core 0. As a consequence, at some point on the timeline the scheduler decides to migrate task cnt to core 1. This decision is marked by an arrow from the core 0 timeline to core 1. This allows cnt to be released on core 1 one time unit before the release of matmult on core 1. Since the system in Figure 2.2 is non-preemptive, cnt executes to completion. However, due to the cold cache encountered by cnt on core 1, there is a 14% dilation in the execution time of cnt on core 1. This delays the release of matmult, which misses its deadline as shown in Figure 2.2(a).

Figure 2.2: Task migration and scheduling anomalies. Tasks: MM (e=2, p=5, d=3), Cnt (e=2, p=6, d=4), SP: sporadic task. Panels: (a) static scheduling without pre-emption, where the dilated WCET of the migrated Cnt pushes MM past its deadline on Core 1; (b) static scheduling with pre-emption, where Cnt is pre-empted by MM and then misses its own deadline; (c) Earliest Deadline First, where the dilated Cnt misses its deadline at the end of the hyper-period.

One might argue that if the system were preemptive, then matmult would have preempted the migrated task and met its deadline. However, Figure 2.2(b) shows that even if preemptive scheduling allows matmult to meet its deadline on core 1, a deadline miss results when cnt resumes execution after matmult's completion. This is again attributed to the extra latency added to the execution of cnt while encountering cold cache misses on core 1. Next, we investigate how a dynamic scheduling algorithm such as Earliest Deadline First (EDF) behaves in the wake of task migration. First, we confirmed whether the task set is schedulable under EDF. In order to accomplish that, we computed the utilization and the density of the task set running cnt and matmult on core 0. The utilization of the task set is less than one but the density is greater than one. Thus, we assessed the schedulability of the task set on a per-job level within the hyperperiod. The task set is schedulable as cnt and matmult meet all their deadlines within their hyperperiod, as shown in Figure 2.2(c). However, towards the end of the hyperperiod, a sporadic task forces the scheduler to migrate cnt from core 0 to core 1 for the next hyperperiod.

As per EDF, matmult's job with higher priority gets to execute before cnt. However, as in static scheduling, cnt fails to meet its deadline under EDF. Hence, we obtain the following:

1. Cold cache misses incurred by a migrated task substantially dilate the WCET of a task. Such dilation may prevent the migrated task and other tasks from meeting their deadlines.

2. Theoretical and simulated schedules show that the impact of task migration on schedulability is not restricted to any particular type of scheduling algorithm (static or dynamic) or system aspect (preemptive or non-preemptive).

3. Aspects like algorithmic complexity of the program, size of the data set, data set reuse and latency of critical operations performed by the task are important factors that can help in gauging the impact of task migration on WCET.
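To make the utilization/density argument above concrete, the following minimal check uses the task parameters from Figure 2.2; the struct layout and the small program itself are illustrative sketches, not part of the simulation infrastructure.

    #include <stdio.h>

    /* Illustrative periodic task parameters (from Figure 2.2):
       e = execution time, p = period, d = relative deadline. */
    typedef struct { double e, p, d; } rt_task;

    int main(void) {
        rt_task set[] = { {2, 5, 3} /* MM */, {2, 6, 4} /* Cnt */ };
        double util = 0.0, density = 0.0;
        for (int i = 0; i < 2; i++) {
            util    += set[i].e / set[i].p;
            density += set[i].e / (set[i].d < set[i].p ? set[i].d : set[i].p);
        }
        /* Prints util = 0.73 (< 1) and density = 1.17 (> 1): the utilization test
           passes but the density test is inconclusive, so schedulability is
           checked job-by-job over the hyperperiod, as done in the text. */
        printf("utilization = %.2f, density = %.2f\n", util, density);
        return 0;
    }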

2.4 Proposed Solution

In this section, we present the proposed solution at a high level and discuss the assumptions made for the experimental model. As shown in the previous section, task migration can significantly dilate the WCET of a task. The architecture of contemporary multicore designs is completely ignorant of task migration. A simple experimental scenario was constructed (see Figure 2.2); it illustrates how this inability inherent to task migration leads to deadline failures of scheduled tasks. However, since the scheduler is aware of task migrations, it can also initiate cache migrations from the source core to the target core. The same task set is modeled with our proposed solution in Figure 2.3. The key difference in these scenarios is that when the scheduler decides to migrate a task, it actively forces the migration of cache lines from the source (core 0) to the target (core 1). Such active movement of cache content comes at the cost of an up-front migration overhead, in contrast to the traditional delayed overhead at the next job activation. In the presently discussed scheme, this overhead does not contribute to the execution time of the task's next job because cache migration takes place prior to this next job's invocation at the target (core 1). In the depicted case, cache migration also completes before the next job of Cnt is invoked at the target (core 1). Such a scheme allows a scheduler to decide dynamically whether a task is migratable or not. Our proposed migration techniques can also be used to facilitate a predictable Cache-Related Migration Delay (CRMD) that can further be used for determining schedulability statically [80]. Here, we investigate two schemes: the Pre-fetch Task Model and the Push-assisted Cache Migration Model. These schemes are discussed in the following sections. In this chapter, we use an experimental model with the following underlying assumptions. We assume separate communication channels for processor-to-memory and processor-to-processor traffic on a network-on-chip interconnect, like the ones supported in the TilePro64 (Tile Dynamic Network (TDN) and Memory Dynamic Network (MDN)) [6].

Figure 2.3: Task Migration Coupled with Cache Migration. Tasks: MM (e=2, p=5, d=3), Cnt (e=2, p=6, d=4), SP: sporadic task. Panels: (a) static scheduling without pre-emption, (b) static scheduling with pre-emption, (c) Earliest Deadline First. In each case, the cache migration is initialized at the source and concluded before Cnt's next release, so both Cnt and MM meet their deadlines.

We also assume the processor-to-processor communication to support some form of QoS that bounds the processor-to-processor delay. This allows us to gauge the potential of cache migration in terms of the cache footprint of tasks and their execution complexity. We also assume that the caches are physically indexed and physically tagged. To understand the effect of migration on individual tasks, we evaluate single task migrations. Our experimental model requires that cache migration take place during idle slots for both the source and target cores (or any other bus activity). Such isolation has been proposed for systems that alternate between computational and communication phases [64]. In our recent publication, we have proposed a semi-partitioned scheduling algorithm that also shows the availability of such slack [80]. If migration were to take place during a task's execution on any of the cores, the network-on-chip mentioned earlier isolates memory requests from processor-to-processor communication.

Furthermore, this chapter does not focus on inter-task cache conflicts, as such conflicts are orthogonal to this study. In our recent publication, we have proposed a semi-partitioned scheduling algorithm that resolves cache conflicts and exploits our cache migration mechanism to improve the response time of real-time tasks [80]. We will discuss this work briefly in Section 2.8.

2.5 Migration Models

In order to explain the migration model, we use “source” and “target” to refer to the core/cache where the task is currently running and where the scheduler migrates the task to before the next invocation, respectively.

2.5.1 The Conventional Pull Model

Task migrations are scheduler events. Contemporary processors do not have any support to notify the cores of such events. The source core stops executing the task while the target core starts executing the same. Hence, when a task is migrated by the operating system from a source core to a target core, it suffers from L1 (and L2) cold misses as it warms up its working set. These misses in the L1+L2 caches of the target core are resolved one at a time by a shared L3 cache or the L1/L2 caches of the source core for MESI and MOESI coherence protocols, respectively. Hence, each cache miss results in multiple coherence messages being sent over the on-chip interconnect (bus or point-to-point) and competes with memory references from other cores. Initial read references on the target core force the use of coherence messages resulting in transitions into a shared cache state on both ends while subsequent writes inflict additional coherence traffic for invalidations. This is termed a pull-based scheme as it reacts to misses one at a time on demand before pulling data over the NoC interconnect.

2.5.2 The Pre-fetch Task Model (PTM)

The pre-fetch task model is a software scheme. Each task registers a set of regions that define its anticipated cache footprint. Each region is defined by a starting address and the size of the region in bytes. When the scheduler decides to migrate a task, it consults the region specifications and spawns a pre-fetch task on the target as shown in Figure 2.4. Since this is an execution-driven mechanism that occupies the processor, the pre-fetch task has the lowest priority on the target core. Thus, it is active only when there is an idle slot available at the target. A pre-fetch task issues a sequence of read requests at the granularity of a cache line, which on completion warm both the L1 data and L2 unified caches. On completion of this task, all the cache lines within the critical regions of the migrated task have been pre-fetched.
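A minimal sketch of such a pre-fetch task is given below, assuming a 32-byte cache line (Table 2.5) and an illustrative region_t descriptor; reading one word per cache line is enough to let the normal demand-miss path warm the target's caches.

    #include <stdint.h>
    #include <stddef.h>

    #define CACHE_LINE_SIZE 32   /* matches the simulated architecture (Table 2.5) */

    /* Illustrative region descriptor registered by a task: start address and size in bytes. */
    typedef struct { uintptr_t start; size_t size; } region_t;

    /* Body of the lowest-priority pre-fetch task spawned on the target core.
       Touching one byte per cache line causes a demand miss that warms the
       target's L1 data and unified L2 caches. */
    void prefetch_task(const region_t *regions, int nregions) {
        volatile uint8_t sink;
        for (int r = 0; r < nregions; r++) {
            for (size_t off = 0; off < regions[r].size; off += CACHE_LINE_SIZE) {
                sink = *(volatile uint8_t *)(regions[r].start + off);
            }
        }
        (void)sink;
    }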

Figure 2.4: Task Migration with Pre-fetch Task support. Tasks: MM (e=2, p=5, d=3), Cnt (e=2, p=6, d=4), SP: sporadic task, PT: pre-fetch task. The scheduler spawns PT on the target core (Core 1), Cnt pre-empts PT, and MM meets its deadline on Core 1.

The prefetch task is terminated at the target core once the migrated task resumes execution there, regardless of the state of the pre-fetch task. Since it has the lowest priority, it is descheduled by tasks that run prior to the invocation of the migrated task, as shown in Figure 2.4, where Cnt pre-empts the pre-fetch task PT. A software scheme like the Pre-fetch Task Model seems simple to implement, but de-scheduling of the pre-fetch task prevents any prefetching from taking place while tasks other than the migrated task execute. While a task executes on a core, it consumes the processor's L1 cache resources extensively. However, the L2 cache may have idle cycles available. Thus, a hardware solution may use the idle cycles at the L2 cache controller to prefetch lines. However, a pre-fetch model is ultimately a pull model with requests issued actively, rather than passively, by the target. Therefore, such a scheme, irrespective of a software or hardware solution, has the disadvantages that apply to any pull-model scheme. A pull model is not aware of the cache lines that are available at the source. If a line is not in the source cache, then the read request is futile. Such unnecessary read requests dilate the time required for pre-fetching to complete and cause additional traffic on the NoC. Additionally, a pull-based scheme will always issue requests for one cache line at a time, because it is not aware of the locality of the cache lines present at the source. Hence, there is no scope for improvement in reducing the overhead of cache migration. Thus, we describe a hardware scheme in the following section, which exhibits the following improvements over PTM:

1. It delivers a tighter WCET for the migrated tasks,

2. It eliminates futile transactions for cache lines that do not exist at the source,

3. It has the capability to use idle L2 cache cycles at the source and target while the processors are busy serving other tasks,

4. It parallelizes the transactions to reduce migration overhead, and

5. It provides the opportunity to reduce coherence-related transactions after migration.

2.5.3 The Push-assisted Cache Migration Model

Figure 2.5: Task Migration with Push-assisted Cache Migration support. Tasks: MM (e=2, p=5, d=3), Cnt (e=2, p=6, d=4), SP: sporadic task. The scheduler activates the push block at the source, which performs the cache migration by issuing push requests from source to target, and MM meets its deadline on Core 1.

In a push model, memory requests are initiated by the source core in order to warm up the target cache, instead of demand-driven requests issued by the target. We propose such a novel push-based hardware mechanism, initiated by the operating system on the source core, that transfers warm cache lines to the target core. This movement is proactive in the sense that it is initiated prior to task reactivation on the target core and continues in parallel once this task resumes execution on the target. The present implementation exploits the slack time that a task has prior to its next invocation on the target. The sequence of events in the process of cache migration is then as follows (a sketch of the OS-side initiation appears after this list).

1. The task notifies the OS of its completion.

2. The OS invokes the scheduler routine to determine the target core for the task.

3. The OS initiates cache migration by setting the "target register" within the special-purpose hardware, called the push block.

4. The push block, in conjunction with the source cache controller, migrates the cache contents by placing memory requests, called push requests, for each migrated cache line. This is shown in Figure 2.5.
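The OS-side initiation (steps 3 and 4) might look as follows; the memory-mapped register addresses, names, and layout are purely illustrative assumptions about the push block's interface.

    #include <stdint.h>

    /* Hypothetical memory-mapped registers of the push block at the source core;
       the base address and register offsets are illustrative assumptions. */
    #define PUSH_BLOCK_BASE   0xFFFF0000u
    #define PUSH_TARGET_REG   (*(volatile uint32_t *)(PUSH_BLOCK_BASE + 0x0))
    #define PUSH_CONTROL_REG  (*(volatile uint32_t *)(PUSH_BLOCK_BASE + 0x4))
    #define PUSH_START        0x1u

    /* OS-side initiation of a push-assisted cache migration: the scheduler has
       already chosen target_core; writing the target register and the start bit
       hands control to the push block, which then issues one push request per
       migratable cache line. */
    void os_start_cache_migration(uint32_t target_core) {
        PUSH_TARGET_REG  = target_core;
        PUSH_CONTROL_REG = PUSH_START;
    }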

The Push Block

The push block has the responsibility of identifying migratable lines. Only a subset of the valid cache lines is migratable: a migratable line has to be associated with the task being migrated. If all valid lines were migrated instead, this would have the following drawbacks:

1. The push block will take a longer time to migrate the task since it will require a larger number of cache line migrations.

2. The target will experience cache pollution because of the displacement of the valid lines of some other task running on it.

3. On re-invocation of remaining tasks at the source, the system bus experiences a large number of coherence messages.

Thus to identify a migratable line, the push block may need support from the cache or the OS. We will discuss the implementation requirements of the push block in the discussion of the variants of the push model.

Whole Cache Migration (WCM)

WCM replicates the cache context of the migrated task at the target such that, after re-invocation, the migrated task's execution behaves as though the task had not migrated. It is a complete hardware mechanism in which the push block scans each and every cache block in order to find the migratable lines. However, the push block requires the cache lines to hold a task identifier (task ID) in order to prevent it from migrating the valid lines of other dormant tasks, as discussed in the section above. This mechanism has the advantage of replicating all the cache contents of the task at the target. However, if the cache footprint of the task is very small compared to the size of the L2 cache, many cycles are wasted while accessing non-migratable cache lines.
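A minimal sketch of the WCM scan is shown below; the cache geometry follows Table 2.5 (2MB, 8-way, 32-byte lines), while the line metadata layout and the helper functions are illustrative assumptions.

    #include <stdint.h>

    #define L2_WAYS   8
    #define L2_SETS   (2 * 1024 * 1024 / (32 * L2_WAYS))   /* 2MB / (32B lines x 8 ways) */

    typedef struct {
        int      valid;
        uint16_t task_id;     /* per-line task identifier required by WCM */
        /* ... tag, coherence state, data ... */
    } l2_line_t;

    extern l2_line_t l2_cache[L2_SETS][L2_WAYS];
    extern void issue_push_for_line(int target_core, int set, int way);

    /* WCM: scan every set and way; only valid lines tagged with the migrating
       task's ID are pushed, so dormant tasks' lines stay at the source. */
    void wcm_scan_and_push(int target_core, uint16_t migrating_task) {
        for (int set = 0; set < L2_SETS; set++)
            for (int way = 0; way < L2_WAYS; way++)
                if (l2_cache[set][way].valid &&
                    l2_cache[set][way].task_id == migrating_task)
                    issue_push_for_line(target_core, set, way);
    }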

Regional Cache Migration (RCM)

RCM borrows from PTM the concept of allowing developers to specify the expected cache footprint of the task, in terms of regions, and uses that information to push cache lines. The footprint can be derived by abstract cache analysis, or the user may define a set of locked cache lines as presented in Chapter 3. This reduces the number of cache lines visited by the push model and also eliminates the requirement for the cache to store task identifiers in order to determine migratable cache lines. In our experimental model, this is accomplished by adding an instruction that allows the programmer to fill a limited number of registers, called "region registers", within the push model. Each region register contains a start address and the size of a frequently reused region expressed in bytes. A programmer can identify contiguous locations of memory that form the majority of the memory footprint by updating the region registers. In the set-up phase, tasks save the region register values as part of the task's context. When a task is activated on a core, the region registers of the push block at that core are updated as part of restoring the task context. When a migration is triggered, the push block identifies the migratable lines as the valid lines belonging to the regions specified by the region registers. RCM has the advantage of scanning fewer cache lines than WCM. However, since one can specify only a limited number of regions, this could prevent the programmer from identifying the whole cache footprint. Thus, some cold cache misses may be retained.
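A minimal sketch of how a task's set-up phase might populate the region registers is shown below; push_set_region() stands in for the added instruction, and the register count and buffer names are illustrative assumptions.

    #include <stdint.h>
    #include <stddef.h>

    #define NUM_REGION_REGS 4   /* assumed number of region registers in the push block */

    /* Stand-in for the added instruction that loads one region register with a
       start address and a size in bytes. */
    extern void push_set_region(int reg, uintptr_t start, size_t size_bytes);

    /* Illustrative task buffers that dominate the task's cache footprint. */
    static int input_matrix[60][60];
    static int result_matrix[60][60];

    /* Called in the task's set-up phase; the values are saved and restored with
       the task context so the push block knows which lines are migratable. */
    void register_task_regions(void) {
        push_set_region(0, (uintptr_t)input_matrix,  sizeof(input_matrix));
        push_set_region(1, (uintptr_t)result_matrix, sizeof(result_matrix));
        /* A code region could be registered as well so that instruction lines
           are also migrated (RCM Data + Instr in Section 2.7). */
    }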

Push Request

We next discuss the integration of the memory request generated by the push model (the push request) with the MESI protocol on a CMP architecture. Once the push block identifies a cache line to be migrated, it issues a push request. Unlike a read/write request, the push request is only referenced by the target cache but never by the memory controller. The push request is issued by the push block; it does not constitute a demand request but rather a write request made by the source on behalf of the target. However, the implementation of a push request may have effects on the cache performance and the bus bandwidth requirement. The push block might find a migratable line in the shared, exclusive or modified state. Shared cache lines are read at the source, allocated at the target, and initialized in the shared state. The task replacing the migrated task may or may not use those lines: if it uses the lines, it may save on cache misses; if it does not, a line in the shared or invalid state does not change the task's cache performance. However, the choice of implementation may make a difference for migratable lines that are in the exclusive or modified state. If a line is in the modified state, the re-activation of the task at the target may write to that line again. Hence, if lines in the modified or exclusive state are changed to the shared state, then there is a possibility that, when the task resumes execution at the target, the system experiences a number of invalidation requests posted by the target. If that does not happen, however, those lines could be used by the tasks that get scheduled on the source core following the migrated task. Therefore, it is worthwhile to analyze the two scenarios for the migration of lines in the exclusive or modified state; a sketch of the second option follows the list below.

1. Modified/Exclusive state to Shared state: change the state to Shared at the source cache, issue a write-back to memory if the line is modified, issue a push request to the target, and allocate the cache line in the Shared state at the target cache.

2. Modified/Exclusive state to Exclusive state: change the state to Invalid at the source cache, issue a write-back to memory if the line is modified, issue a push request to the target, and allocate the cache line in the Exclusive state at the target cache.
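The following sketch captures the second option (Modified/Exclusive migrated as Exclusive) at the level of a coherence-state handler; the state names follow MESI, while the function and message names are illustrative assumptions rather than the actual simulator implementation.

    /* MESI states of an L2 line. */
    typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state_t;

    typedef struct {
        mesi_state_t state;
        unsigned long addr;
        /* ... tag, data, task ID ... */
    } cache_line_t;

    /* Assumed hooks into the simulated memory system. */
    extern void issue_writeback(unsigned long addr);                  /* line -> memory */
    extern void issue_push_request(int target_core, const cache_line_t *line,
                                   mesi_state_t target_state);        /* source -> target */

    /* Option 2: migrate Modified/Exclusive lines as Exclusive at the target so
       that subsequent writes by the migrated task need no invalidations. */
    void migrate_line_as_exclusive(int target_core, cache_line_t *line) {
        switch (line->state) {
        case MODIFIED:
            issue_writeback(line->addr);              /* memory must hold the latest copy */
            /* fall through */
        case EXCLUSIVE:
            issue_push_request(target_core, line, EXCLUSIVE);
            line->state = INVALID;                    /* source gives up ownership */
            break;
        case SHARED:
            issue_push_request(target_core, line, SHARED);
            break;                                    /* source may keep its shared copy */
        case INVALID:
            break;                                    /* nothing to migrate */
        }
    }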

Bulk Cache Migration (BCM)

The push models discussed for WCM and RCM allow only a single push request to be generated by a source at a time. The push block waits for a push request to complete (i.e., waits for the acknowledgment from the target) before issuing a subsequent push request. This behavior is similar to the reads performed in a pull model. However, as the push requests are generated at the source core, there is an opportunity for overlapping multiple transactions: subsequent push requests may be initiated without waiting for the acknowledgment from the target to arrive. Thus, as the push block places a request on the bus, it can initiate another cache read. This pipelines the processing of multiple push requests and overlaps the overheads caused by multiple transactions. Thus, in this work we couple RCM with BCM to exploit the potential of BCM to decrease the overhead significantly.
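A minimal sketch of this pipelined (bulk) issue is shown below; the in-flight window depth and the helper functions are illustrative assumptions.

    #define MAX_IN_FLIGHT 8   /* assumed depth of the push block's request pipeline */

    extern int  next_migratable_line(unsigned long *addr);   /* 0 when no lines remain */
    extern void issue_push(unsigned long addr);              /* place push request on the bus */
    extern int  acks_received(void);                         /* acknowledgments seen so far */

    /* Bulk Cache Migration: keep up to MAX_IN_FLIGHT push requests outstanding,
       overlapping bus transactions instead of waiting for each acknowledgment. */
    void bcm_migrate(void) {
        int issued = 0;
        unsigned long addr;
        while (next_migratable_line(&addr)) {
            while (issued - acks_received() >= MAX_IN_FLIGHT)
                ;                                            /* stall only when the window is full */
            issue_push(addr);
            issued++;
        }
        while (acks_received() < issued)
            ;                                                /* drain remaining acknowledgments */
    }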

2.6 Simulation Platform

Table 2.5: Simulation Parameters

    Component                          Parameter
    Processor Model                    in-order
    Cache Line Size                    32B
    L1 I-Cache Size/Associativity      32KB / 2-way
    L1 D-Cache Size/Associativity      32KB / 4-way
    L1 Hit Latency                     1 cycle
    L1 Replacement Policy              LRU
    L2 Cache Size/Associativity        2MB / 8-way
    L2 Hit Latency                     20 cycles
    L2 Replacement Policy              LRU
    Coherence Protocol                 MESI
    Push Request Latency               50 cycles
    Network Configuration              bus-based
    Processor-to-Processor Delay       2 cycles
    External Memory Latency            480 cycles

Our proposed solution requires micro-architectural modifications to a stable bus-based multicore CMP architecture. We chose SESC [72], a light-weight event-driven simulator that implements a stable bus-based CMP supporting the MIPS instruction set. Our base experimental model consists of a multicore CMP architecture where each core has private L1 and L2 caches. This design is very close to the tile-based architecture [101] except that we assume a private L2 cache for each core, while they consider each L2 cache to have two sections: one acts as a local L2 while the other is part of a shared L2 distributed across the cores. In contrast to their work, the focus of our work is the timing predictability of the system, for use in real-time systems. The simulator has been modified to provide scheduling capabilities. However, the scheduling and migration overhead is minimal as the scheduler is implemented using emulated system calls; our work does not concentrate on scheduling overheads. Contemporary static timing analyzers predict the performance of a task assuming no contention for system resources. There has been research on providing QoS-like cache partitioning [45] that dynamically changes the performance of the running applications over the period of their execution. The focus of that research is to increase throughput or to provide fairness to admitted tasks. However, none of these objectives address predictability, nor are they targeted at real-time systems. Static cache partitioning may be a means to reduce inter-task cache contention. However, it has not been very popular because it potentially wastes critical cache space. Hence, we chose to use a system that keeps cache contention to a minimum while also providing enhanced predictability. The system architecture specifications are presented in Table 2.5. A unique parameter in this system is the "push request" latency. A push request incurs the latencies of L2 cache accesses at the source and at the destination, the round trip of push and acknowledgment messages on the bus, and the delays involved in setting up those messages on the bus. These costs contribute to the aggregate latency of one push request, with an assumed cost of 50 cycles.

2.7 Evaluation

2.7.1 Prefetch Task Model

Prefetch Task Migration and Potential to Bound WCET: The evaluation results for the Prefetch Task Model are depicted in Table 2.6. The first column lists the names of the benchmarks. The second, third and fourth columns provide the execution time in cycles for tasks before migration (with warm caches), after migration without PTM, and after migration with PTM, respectively, and the fifth column expresses the increase in execution time due to migration (with PTM) as a percentage. These results exhibit the potential of pro-actively warming up the target cache before the migrated task's subsequent invocation.

Table 2.6: Prefetch Task Model & Potential to Bound WCET

    Benchmark   WCET (Warmed      Exec. Time         Exec. Time          Dilation in   PTM Overhead   PTM Overhead
    Name        Cache) [cycles]   w/o PTM [cycles]   with PTM [cycles]   WCET (%)      [cycles]       vs. WCET (%)
    bs          1595              2497               2109                32.22         2001363        125477
    crc         6972              8292               7925                13.67         8183           117.36
    cnt         2014528           2309082            2015194             0.033         599811         29.77
    stats       11676261          12428101           11678209            0.017         1528712        13.09
    srt         4002752           4006507            4003610             0.02          6582           0.16
    matmult     954106            963203             954496              0.041         13914          1.45

We see that PTM has a significant impact on reducing the dilation in execution time due to migration. However, it is less effective for bs and crc. This is because PTM is capable of prefetching only data, so instruction cache misses remain. The benchmarks bs and crc have different characteristics. Benchmark bs has a large memory footprint, but the number of data accesses is O(log N), i.e., only a subset of the data is accessed. Thus, the number of data accesses approaches the same order as the number of instruction accesses. Hence, L2 misses on instruction accesses contribute significantly to the dilation in execution time of the migrated task. On the contrary, crc has a very small data footprint that is smaller than its instruction footprint. In this case as well, L2 misses on instruction accesses contribute significantly. Since the execution time of all the other benchmarks is driven by data accesses, PTM delivers the desired outcome of approaching the execution time with a warmed cache. Thus, we can deduce that migrating the instructions becomes equally important when executing benchmarks that have few data accesses. PTM and Prefetch Task Overhead: The cache migration scheme hides the latency incurred by making cache lines available on the target core before they are referenced. Hence, it is important to accurately predict the number of cycles required to migrate the cache footprint of the task so that the process of migration completes before the task is invoked at the target. In Table 2.6, the sixth column shows the overhead incurred by prefetching, which is the execution time of the prefetch task, and the seventh column represents this overhead with respect to the WCET. The migration overhead of the prefetch task is composed of not only the cache-to-cache migration but also the instructions executed to generate the prefetch requests. Since the prefetch task references the data region of the migrated task, the overhead is proportional to the data footprint. The overhead is less than 30% of the target WCET for most of the benchmarks. However, for bs and crc, it is very high. The benchmark bs has a large data footprint. Its overhead is high because the cache-to-cache transfer grows with O(N) while the execution cycles scale with O(log N).

However, in the case of crc, the cache misses incurred while accessing the footprint of the prefetch task itself dominate. Hence, the cycles required to execute the prefetch task inflate the overhead and, in comparison to the small WCET of crc, the overhead appears large. Unfortunately, the overhead with prefetching may get worse if the user-specified regions are inaccurate, as described in Section 2.5. Also, on an in-order core, which is prevalent in real-time systems, the requests will be serialized.

2.7.2 Push-assisted Cache Migration Model

Table 2.7: WCM Performance & Overhead

    Benchmark   Execution Time      Deviation from    WCM Overhead   WCM Overhead
    Name        with WCM [cycles]   tight WCET (%)    [cycles]       vs. WCET (%)
    bs          1842                15.48             1737657        108944
    crc         7490                7.43              68617          984
    cnt         2014631             0.005             566240         25.28
    stats       11677428            0.099             1317796        11.28
    srt         4004448             0.042             37035          0.92
    matmult     958121              0.42              79543          8.34

Hardware Cache Migration & Potential to Bound WCET: We performed an evaluation on the simulation platform discussed in the previous section. Table 2.7 shows the potential of the push-assisted cache migration scheme in achieving performance close to the tighter WCET bounds for the migrated task. The first column shows the names of the benchmarks. The second and third columns express the execution time of migrated tasks with WCM in cycles and as a percentage of deviation from the tight WCET, respectively. The whole cache migration scheme maximizes the ability of cache migration to restrict the WCET. However, our current implementation of the whole cache migration scheme pushes the lines from one L2 cache to another but does not push them up to the L1 cache of the target core. Due to the L1 misses suffered by the tasks, the results show an extra cost in cycles to complete compared to warmed-up L1+L2 caches. Therefore, some deviation is experienced by all the benchmarks. However, bs and crc deviate more from the tighter WCET bound provided by the warmed-up cache, which is similar to PTM. We can deduce that L1 cache misses become significant for tasks that have an algorithmic complexity lower than O(n) or tasks that have a small data set with close-to-linear algorithmic complexity. Data set size alone cannot be considered a sufficient criterion because the execution cycles consumed by computation can be the dominating factor, as was the case for matmult and bubble sort. We will compare WCM and PTM along with the discussion of the RCM results.

One might consider warming up the L1 cache, too. However, one of the key features of our hardware scheme is to enable tasks to execute concurrently with cache migration with minimal conflict. Since L1 cache accesses are very frequent, contention caused by push requests would be highly undesirable. Hence, we restrict our hardware push model to transferring cache lines among L2 caches only. WCM and Cache Migration Overhead: The fourth and fifth columns in Table 2.7 show the amount of overhead that WCM incurs in absolute number of cycles and as a percentage of the target WCET, respectively. We find behavior similar to PTM, where bs and crc have a considerably large overhead compared to the target WCET. The remaining benchmarks experience migration overheads of less than 30% of the target WCET. One of the primary drawbacks of WCM is the scanning of the whole cache to identify migratable lines. If a line is non-migratable, the tags or cache lines are not read, and no requests are placed on the bus. Additionally, subsequent task ID checks only need index and column lines to be activated in an incremental fashion for WCM. Hence, we assume that a non-migratable line adds one cycle to the overhead instead of a cache miss latency. However, contemporary L2 cache sizes are on the order of MBs, which can exceed the task's cache footprint by orders of magnitude. Thus, tasks that have a very small cache footprint still pay a high cache migration overhead, as the overhead is then proportional to the cache size instead of the task's cache footprint. Such undesirable behavior is alleviated by RCM. RCM vs. WCM: Due to the aforementioned practical issues associated with WCM that were confirmed by simulated performance results, we propose a software-assisted micro-architectural support technique. Real-time applications primarily perform computation on globally visible static buffers since WCET estimates for dynamic memory allocations are difficult to bound and allocations of memory blocks may cause unnecessary conflict misses. The utilization of large local buffers is discouraged because embedded environments have limited stack sizes, which requires applications to conserve memory. Thus, most embedded applications perform computations over buffers of static length used over the lifetime of the application. Hence, developers tend to have sufficient knowledge about the size of the data set so that each task can be associated with a vector containing the critical regions of the data set. This is explained in detail in Section 2.5.3. Next, we present the RCM results while comparing PTM, WCM and RCM. Table 2.8 compares the execution time of the benchmarks under the different cache migration schemes. Each value in the table is the percentage of additional cycles over the WCET required by a migrated job to finish under the respective cache migration scheme. It shows that all the schemes discussed are effective in reducing the dilation in execution of the migrated tasks considerably. The results show that WCM approaches the tighter WCET bound most closely for most of the benchmarks. However, PTM performs better for srt and matmult.

This is because, in these benchmarks, the data fits within the L1 cache and has a smaller instruction footprint. Thus, the number of cycles saved by L1 data cache hits outweighs the number of cycles spent on L2 cache misses for instructions. However, if a task other than the migrated task executes before the re-invocation of the migrated task, then the probability of lines being evicted from the L1 cache becomes high. Hence, PTM may lose its advantage over the push-based hardware mechanisms. The fifth and sixth columns exhibit the results for the variants of RCM. RCM (Data Only) performs comparably to WCM for most benchmarks except bs and crc, because for those the cache misses caused by instructions become dominant. This underlines the significance of the results in the sixth column, which show that, if the developer is able to specify both data and instruction regions clearly, RCM can match the effect of WCM. The push-assisted cache migration schemes show a significant reduction in execution time for bs too, but the dilation in execution time still exceeds 10%. Whether or not a task can be migrated is based upon the overhead of migration. This overhead does not contribute to the execution of the application but rather constitutes the overhead for migrating the task's cache footprint such that the performance of the application is as stated in Table 2.8.

Table 2.8: Migration Paradigms and Percent Additional Execution over WCET

    Benchmark   No Cache        PTM     WCM     RCM (Data     RCM (Data +
    Name        Migration (%)   (%)     (%)     Only) (%)     Instr.) (%)
    bs          56.55           32.23   15.49   27.52         17.3
    crc         18.93           13.67   7.4     14.26         8.34
    cnt         14.62           0.033   0.005   0.03          0.02
    stats       6.43            0.017   0.009   0.01          0.01
    srt         0.09            0.021   0.042   0.058         0.044
    matmult     0.95            0.04    0.42    0.44          0.43

Table 2.9: Cache Migration Overhead over WCET [percent]

    Benchmark   PTM      WCM      RCM (Data    RCM (Data +    Bulk RCM (Data
    Name        (%)      (%)      Only) (%)    Instr) (%)     + Instr) (%)
    bs          125477   108944   106833       106868         47238
    crc         117      984      21.3         40             18.38
    cnt         29.77    25.28    25.29        25.32          11.20
    stats       13.09    11.28    10.92        10.93          4.93
    srt         0.16     0.92     0.081        0.099          0.045
    matmult     1.458    8.34     1.39         1.43           0.64

As discussed earlier, the WCM scheme has a high overhead because it requires references to non-migratable cache lines. We have shown that RCM can make tasks approach the ideal performance obtained by WCM. Hence, if RCM is able to reduce the overhead of cache migration significantly compared to WCM, it may be considered a viable micro-architectural scheme. This potential of RCM is evaluated against all other schemes in Table 2.9. The first column holds the benchmark names and the other columns show the ratio of the overhead to the WCET bound, in percent, for the respective cache migration schemes. Before investigating the RCM results, it is important to notice that WCM behaves poorly in terms of overhead compared to PTM for crc, srt and matmult. PTM has an overhead proportional to the cache footprint of the task, while WCM's overhead depends on the cache size, which is significantly larger than the cache footprint for the aforementioned benchmarks. However, WCM has a lower overhead than PTM for bs, cnt and stats, whose cache footprints are large. This is because PTM has an extra overhead caused by the execution of the prefetch task. RCM blends the advantages of PTM and WCM: it eliminates the execution overhead of PTM and the excessive number of look-ups of WCM. Hence, the overhead of RCM is the lowest among all the schemes for all the benchmarks. The comparison between RCM (Data Only) and RCM (Data + Instruction) shows that developers have to choose the variant of RCM carefully. For example, the migration overhead for crc reduces from 984% of the WCET for WCM to 21.3% for RCM with data regions migrated and 40% for data+instruction region migration. However, the performance of RCM with data+instruction regions migrated closely approaches that of WCM. Thus, for benchmarks like cnt and stats, one might choose only data region migrations, while for crc, one should give preference to RCM with data+instruction region migrations. Bulk Cache Migration: One of the key aspects of a push-based cache migration scheme is the ability to issue push requests in parallel, as described in Section 2.5.3. The last column of Table 2.9 clearly shows that bulk transfer of push requests can reduce the cache migration overhead by more than 50%. Even though a detailed study of the NoC is beyond the scope of this work, such bulk transfers promise to maintain a low migration overhead even with increased network latency and multi-hop cache-to-cache transfers, because of the pipelining of the transactions. This aspect will be studied in detail in Chapter 3. Cache Migration and Coherence States: Another aspect of overhead is the extra bandwidth requirement that cache migration imposes. This overhead may affect the performance of concurrently running tasks. Table 2.10 compares the requests issued onto the bus under the cache migration schemes against a conventional scheme where no cache lines are migrated.

The second, third and fourth column groups report the push, read and write requests issued onto the bus without cache migration, with cache migration of both data and instruction regions, and with cache migration of both data and instruction regions along with their coherence state (shared if shared at the source, exclusive if exclusive or modified at the source), respectively. It can be deduced that tasks accessing their complete data set show little or no difference in the total number of accesses issued with or without cache migration. This is evident for benchmarks like crc, cnt, stats and srt. However, bs, which accesses only log(n) elements of its data set (array), suffers a significant bandwidth overhead: the forced migration of the whole data set issues more push requests than the number of read/write requests that the task issues in the absence of cache migration. Matmult has a result array that, if not migrated, causes only write misses. Since the developer specifies this result array to be migrated, a cache migration scheme without state migration support migrates those modified cache lines as shared to the target. Thus, they not only incur push traffic but also lead to write misses on subsequent writes. A similar behavior can be seen for srt. Such write misses can be avoided by our state migration extension to the cache migration schemes: lines in modified or exclusive state at the source are migrated as exclusive lines to the target. Thus, subsequent writes do not require the invalidations that are otherwise required for matmult and srt (third column group of Table 2.10). It can be concluded that the push-assisted cache migration scheme imposes only minor pressure on bandwidth unless the data actually accessed is small compared to the migrated data set.

Cache Migration and Task Migration Decision: Task migrations are performed by a Real-Time Operating System (RTOS) in response to certain events. Currently, an RTOS does not consider the impact of cold cache misses on the execution time of migrated tasks. Our push-based cache migration techniques overlap the cache transfers with slack time available for the task. However, it is not safe to assume that the RTOS can ignore the impact of task migration once cache migration mechanisms are available. We argue that an RTOS should utilize knowledge about the task's cache migration overhead and the available slack time in deciding when and if a task should be migrated. Hard real-time systems would require cache migration to complete within the available slack time. On the other hand, soft real-time systems may allow cache migration to continue even after the next instance of the task has started execution at the target.

Table 2.10: Cache Migration and Bandwidth Overhead [number of requests]

            No Cache Migration       Cache Migration (Data + Instr.)   Cache Migration (Ex + Data + Instr.)
Benchmark   Push    Read    Write    Push    Read    Write             Push    Read    Write
bs          0       27      1        32780   1       1                 32780   1       1
crc         0       39      4        54      0       4                 54      0       3
cnt         0       9808    2        9813    1       2                 9813    1       2
stats       0       25027   7        25002   27      7                 25002   27      6
srt         0       74      65       77      1       65                77      1       2
matmult     0       182     86       263     5       86                263     5       1

Furthermore, the RTOS may make an optimal decision about which task to migrate among a set of possible candidates based upon cache migration overhead and available slack time. The development of such policies is, however, beyond the scope of the current work.
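The decision logic argued for here can be illustrated with a small sketch. It is only a hypothetical illustration of the policy described above, assuming the scheduler tracks a per-task bound on the push-assisted cache migration delay and the slack before the next release; none of the names correspond to an actual RTOS API.

/*
 * Hypothetical sketch of an RTOS migration admission test. The fields and the
 * helper are illustrative assumptions, not part of any real RTOS interface.
 */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t cache_migration_delay;  /* worst-case push-assisted migration delay (cycles) */
    uint32_t slack_cycles;           /* idle cycles before the next job release */
    bool     hard_real_time;         /* hard vs. soft real-time task */
} task_t;

/* Returns true if the task may be migrated now without endangering deadlines. */
static bool may_migrate(const task_t *t)
{
    if (t->hard_real_time)
        /* Hard RT: the whole cache migration must fit into the available slack. */
        return t->cache_migration_delay <= t->slack_cycles;
    /* Soft RT: migration may spill past the next release; always allowed here. */
    return true;
}

Under this sketch, the scheduler would simply prefer, among several candidates, the task for which may_migrate() holds and whose remaining slack after migration is largest.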

2.8 Related Work

Yan and Zhang have recently proposed techniques to calculate the WCET of tasks in real-time multicore systems [97, 96, 98], and other approaches develop multi-level WCET cache analysis as well [60, 42]. These approaches are limited to shared L2 instruction caches in their bounding of WCET, and none of them consider task migration. Choffnes et al. propose migration policies for multicore fair-share scheduling in the context of soft real-time systems [27]. Their technique strives to minimize migration costs while ensuring fairness among the tasks by maintaining balanced scheduling queues as new tasks are activated. Li et al. discuss migration policies that facilitate efficient operating system scheduling in asymmetric multicore architectures [52, 51]. Their work focuses on fault-and-migrate techniques to handle resource-related faults in heterogeneous cores and does not operate in the context of real-time systems. In contrast, our work focuses on homogeneous cores and strives to improve system utilization by allowing migrations while providing timeliness guarantees for real-time systems. Calandrino et al. propose scheduling techniques that account for co-schedulability of tasks with respect to cache behavior [9, 19]. Their approach is based on organizing tasks with the same period into groups of cooperating tasks. While their method improves cache performance in soft real-time systems, they do not specifically address issues related to task migration. Other similar cache-aware scheduling techniques have been developed [35], but they do not target real-time systems and do not address task migration issues.

Eisler et al. [34] develop a cache capacity increasing scheme for multicores that scavenges unused neighboring cache lines. They consider "migration" of cache lines amounting to distribution of data in caches, while we focus on task migration combined with data migration mechanisms that keep data local to the target core. Acquaviva et al. [17, 8] assess the cost of task migration for soft real-time systems. They assume private memory and different operating system instances per core on a low-end processor. In contrast, we assume private caches with a single operating system instance, which more accurately reflects contemporary embedded multicores [6]. Their focus is on task replication and re-creation across different memory spaces while our work focuses on task migration within partly shared, partly private memory spaces.

In a recent publication [80], we use push-assisted cache migration to bound the Cache Related Migration Delay (CRMD). In that work, we present a predictable semi-partitioned strategy for scheduling a set of independent hard real-time tasks on homogeneous multicore platforms using cache locking and locked cache migration.

Semi-partitioned scheduling strategies form a middle ground between the two extreme approaches, namely global and partitioned scheduling. By making most tasks non-migrating (partitioned), run-time migration overhead is minimized. On the other hand, by allowing some tasks to migrate among cores, schedulability of task sets is improved. Locked cache migration is used, which makes the migration overhead between job executions more predictable. We also assume that a migratable task occupies one cache way: if caches have k ways, then k-1 ways are used by purely partitioned tasks and one way is used by the migrating task, which migrates between two cores. In this work, the predicted CRMD is added to the WCET while the slack time on the two cores is used to fit the migrating task. Simulation results demonstrate the effectiveness of the approach in improving task set schedulability over purely partitioned approaches while maintaining real-time predictability of migrating tasks. The results show an average increase in utilization of 37.31% and an average increase in density of 81.36% compared to purely partitioned task allocation.

2.9 Conclusions

In this chapter, we identified task migration as a key contributor to unpredictability in determining WCET bounds of real-time tasks on multicore architectures. With larger L2 caches and increasing numbers of processing units, WCET bounds have the potential to become tighter in the future. Static timing analyzers can capitalize on large L2 caches in that, after the initial warm-up of the cache, execution of real-time tasks becomes predictable. This work has shown that, in the wake of task migrations, the dilation in execution time due to cache warm-up can become significant enough to occasionally prevent real-time tasks from meeting deadlines. It is thus imperative to develop real-time systems capable of tightly bounding, if not eliminating, the impact of task migration. Simulation results on a subset of the WCET benchmarks show a dilation in execution time ranging from 6% to 56.6% for tasks whose algorithmic complexity does not exceed O(n). Tasks with higher complexity show a significant dilation for small data set sizes.

We consolidate the idea of proactive cache migration as a means to diminish the dilation introduced by the target warm-up overhead using a software technique called PTM. PTM launches low-priority prefetch tasks at the target core to prefetch the cache lines that belong to the address ranges specified by the programmer. The software approach shows that it can prevent the dilation in execution time but may result in high and potentially unbounded migration delay requirements. Hence, we propose two schemes of push-assisted cache-to-cache migration in multicores. (1) WCM, a hardware scheme replicating the cache context of the task onto the target L2 cache, reduces dilation in execution time to less than a percent for the majority of simulated tasks, except for those with the smallest data set sizes or an algorithmic complexity lower than O(n).

WCM eliminates the execution overheads of PTM but still incurs a high overhead due to a complete scan of the cache. (2) RCM combines the address range specification capabilities of PTM with the hardware-based pushing of cache lines of WCM. This allows RCM to achieve performance benefits similar to WCM while making the cache migration overhead proportional to the cache footprint of the task. This demonstrates that cache migration is a feasible solution for preserving execution times after task migration close to those tight WCET bounds otherwise only valid in the absence of migration. We further enhance the MESI coherence protocol to significantly reduce or even eliminate the number of write misses due to task migration. This eliminates extra bandwidth requirements due to cache migration except for residuals of tasks with large data sets. In our recent publication, we use this migration mechanism to predict the Cache Related Migration Delay (CRMD) and improve task set schedulability by using it in combination with a semi-partitioned scheme. Locked cache migration has been used there to improve the predictability of cache migration. We discuss support for migrating locked cache lines in detail in the next chapter.

Chapter 3

Deterministic Task Migration for Hard Real-Time Tasks in Multicore Processors

3.1 Introduction

Locking cache contents in uniprocessor hard real-time systems has been a popular option for hard real-time system designers. Locked cache contents are immune to cache replacement, which improves the predictability of the cache access behavior of a hard real-time task. Additionally, large caches with high associativities reduce intra-task conflicts [43], which can enable shorter worst-case execution times (WCETs) for hard real-time tasks when locks are used. Conventionally, the programmer incurs a cost of loading the locked cache lines prior to the execution of the task [66]. Subsequently, execution of the task assumes cache hits for locked lines. However, the impact of locked cache lines has not been studied on multicore platforms, where their immobility may render global multicore scheduling policies unusable. Also, under such policies, the migration delay is assumed to be constant and added to the WCET of the task. Such assumptions may lead to highly conservative migration delays. This problem worsens with the simultaneous task migrations that are prevalent under such scheduling mechanisms.

Pinning hard real-time tasks onto cores and using partitioning across multicores is a sub-optimal solution, as we discussed in Chapter 1. Another approach to the problem is static analysis of the migration delay by modeling cold caches after migration. However, static analysis may result in loose WCET bounds that potentially render a hard real-time system unschedulable that otherwise would have been schedulable had cold misses been avoided due to cache locking. Thus, we identify locked cache line mobility as a hindrance to task migration in real-time systems. We propose task migration support that can render task migrations efficient and predictable for multicore scheduling techniques.

Any solution to preserve predictability for locked cache lines has to be proactive in nature to guarantee that these lines are also locked at the target core after migration. Previously, we proposed a cache migration scheme that pushes cache lines of a migrating task from a source core to the target core during idle slots in the schedule. This work makes the following technical contributions:

1. This work is the first to consider the mobility requirements of locked cache lines for task migration. Theoretically optimal multicore scheduling techniques that would otherwise be impractical thus become realistic in the context of hard real-time systems. Prior work is not directed towards locked cache lines.

2. In Chapter 2, we provided a hardware/software mechanism called Regional Cache Migration (RCM) to identify and move large contiguous memory regions specified by a limited number of Region Registers. We extend the identification of migratable cache lines to locked cache lines. We then expose the potential of reducing individual cache migration delays by pipe-lining the cache-to-cache transfers. We propose two such schemes that work with RCM. These schemes, called Controlled Cache Migration Pipelining (CCMP) and Streamed Cache Migration Pipelining (SCMP), reduce migration delays by 48% and 56%, respectively, in our experiments.

3. In Chapter 2, we proposed a hardware mechanism called Whole Cache Migration (WCM) for cache footprints that are sparse with respect to the memory address space. However, it imposes an overhead that is proportional to the cache size instead of the number of migratable cache lines. In this work, we present another hardware-based mechanism, called Set-Scan Cache Migration, that provides an efficient and practical solution for sparse locked cache footprints.

4. In Chapter 2, we considered single task migration. Such an assumption forces multiple task migrations to take place sequentially, thereby under-utilizing the bus bandwidth. In this work, we propose a novel mechanism that can be used while generating a work-conserving static schedule that may incur multiple cache migrations. We introduce a mechanism that supports multiple cache migrations in parallel without any conflicts and provides an alternative to the pipelined cache migration schemes.

5. While SSCM works efficiently for single task migration, we show that it cannot be synchronized with RCMs for parallel migration. We fill this gap by proposing Slotted-SSCM, which allows multiple cache migrations based on RCM and Slotted-SSCM to progress in parallel without any bus conflicts.

6. We also propose a Slotted-SSCM Pipelining model that reduces the migration delay for single task migrations by 46.7% over SSCM on average for a subset of the Malardalen WCET benchmarks [7].

7. In Chapter 2, migration costs are assessed by assuming migration of the total cache footprint of a task. Thus, the overheads stated in Chapter 2 for RCM are too loose. Here, we allow pre-determined locked cache lines to migrate. We also establish deterministic migration delay bounds for each scheme irrespective of whether cache locks are used to lock contiguous (RCM-like) or sparse (SSCM-like) memory locations.

3.2 Problem Analysis

In this section, we show why task migration in the presence of cache locking on multicore platforms is a problem for real-time systems.

Cache Locking in Hard Real-Time Systems: Hard real-time tasks have stringent deadlines that have to be met or the system may fail. Recent research has considered systems that co-schedule non-real-time along with real-time tasks while sharing common resources [81, 62]. In such systems, one time-critical shared resource is the on-chip cache. Soft real-time or non-real-time tasks may have large memory footprints that lead to intra-task cache contention. In contrast, hard real-time tasks are constructed to have smaller memory footprints. Especially with the current trend of large L2 and L3 caches, we believe that hard real-time tasks can fit within the cache with low or no intra-task contention. Nonetheless, inter-task cache contention with other non-real-time tasks hampers the cache behavior predictability of hard real-time tasks. Thus, cache locks are useful to improve the predictability of cache behavior. As a side effect of keeping memory regions at lower levels of the memory hierarchy, the execution time of hard real-time tasks may also be reduced significantly, since intra-task cache contention is low if not eliminated.

Cache locks have long been studied for uniprocessor systems. Locks can be applied statically or dynamically. In static cache locking, the system locks the entries pertaining to a task into the cache during the start-up phase; these locked entries remain resident in the cache during the lifetime of the task. Dynamic locking, on the other hand, requires reload points to be identified; at these reload points, cache lines pertaining to a certain region are locked. The analysis of dynamic versus static cache locking is not the focus of our work; in this work, we consider static cache locking.

Impact of Cache Locks on Multi-Tasking Systems: We conducted experiments on a uniprocessor model with and without cache locks for hard real-time tasks to substantiate the impact of cache locking for hard real-time systems. Table 3.1 shows a subset of the Malardalen WCET benchmarks [7] used in our experiments as hard real-time benchmarks. Each of these benchmarks was run individually along with a non-real-time cnt benchmark.

Table 3.1: Experimental Benchmarks

Benchmark   Functionality / Hard Real-Time Routines
fft1        1024-point Fast Fourier Transform (Cooley-Tukey algorithm)
jfdctint    Discrete-cosine transformation (8x8 pixel block)
bs          Binary search (array of records)
crc         Cyclic redundancy check (40 bytes of data)

Assumptions of the Study: We model a processor with an in-order core and multi-level inclusive caches in SESC. WCET benchmarks have small memory footprints. Therefore, to model inter-task contention, we used configurations with small caches, as is usually the case in the real-time systems literature [55, 65, 69]. Our experimental cache hierarchy has 2KB 4-way associative L1 data and instruction caches and an 8KB 8-way associative unified L2 cache, with a cache line size of 32 bytes at both cache levels. We use inclusive caches as they are common in commercial processors: they prevent write-backs of non-dirty lines during replacement from L1 to L2 that are otherwise required in exclusive caches. Also, multi-processors tend to constrain the coherence protocol to a single level, i.e., to L2, to simplify the design. The simulated caches were modified to support locks at both the L1 and L2 levels. For inclusive caches, a line locked in L1 is also locked in L2. One may argue that such a design leads to a waste of L2 cache space. However, such a provision allows an application developer to choose the level of cache that benefits the application. To support this, we added a lock instruction to the Instruction Set Architecture used by the SESC processor model. The lock instruction takes two arguments: the memory address and the level at which the line is to be locked (a minimal usage sketch of such a lock primitive follows the discussion of Table 3.2). Static locking has been used to lock cache lines. Unfortunately, no static analysis tool is available for unified multi-level cache locking that selects cache lines for locking such that optimal WCET bounds are derived. Thus, we lock the instructions and global data pertaining to all the paths within the hard real-time tasks that fit within the large L2 caches. An optimal selection of cache lines to lock reduces the inter- and intra-task conflicts induced by locking. The work presented in this chapter focuses on providing support for migrating locked cache lines on multicore architectures. Shekhar et al. use our multicore locked cache line migration support to semi-partition the tasks while focusing on resolving inter- and intra-task conflicts [80]. The L1, L2 and memory access latencies have been set to 1, 10 and 100 cycles, respectively. As for the benchmarks, bs and crc are executed as hard real-time tasks, and the inner loops of fft1 and jfdctint have been re-factored as stand-alone hard real-time tasks.

Table 3.2 shows the impact of cache locking when each of the benchmarks is run in contention with the cnt benchmark that uses streaming input data.

Table 3.2: Impact of Cache Locking (when contended with cnt)

Benchmark   WCET (No lock) [cycles]   WCET (lock) [cycles]   Number of Lines Locked (Level)   Reduction in WCET
fft         13922                     8302                   47 (2)                           59.6 %
jfdctint    6125                      2143                   36 (2)                           34.9 %
bs          1842                      590                    10 (1)                           32.03 %
crc         12936                     9423                   41 (1)                           72.8 %

The first column contains the benchmark names; the second and third columns show the WCET of the benchmarks when run without locks and with static locks, respectively. The fourth column shows the number of cache lines statically locked in the L2 cache and the level at which they were originally locked. The fifth column quantifies the reduction in WCET obtained by using static locks. The WCETs in this work are experimentally observed maximum execution times obtained by feeding different inputs, due to a lack of static WCET analysis tools. All benchmarks show a considerable drop in WCET with locked lines versus without them. It is evident that the streaming nature of the input data in cnt leads to inter-task conflicts, which cause eviction of the lines of these benchmarks; this is prevented when the lines are locked. However, it should also be noted that locks show high improvements in observed WCET because these benchmarks do not have much intra-task cache contention, as the L2 cache is assumed to have enough associativity to accommodate the footprint of hard real-time tasks. We also take this opportunity to show that multi-level cache locking can be beneficial when L1 cache space is scarce. fft and jfdctint have large instruction memory footprints; therefore, they have been locked at the L2 level. bs and crc have small instruction footprints, so their instructions have been locked in L1. The total number of lines locked for crc is comparable to fft and jfdctint because crc uses a buffer that is equivalent in size to the instruction footprint, and this buffer gets locked in the L1 data cache.

These results highlight the primary advantages of cache locking, namely immunity from inter-task contention and improvement in execution time for short hard real-time tasks. These are the key reasons cache locking is prevalent among embedded processors such as the IBM PowerPC 460S, Motorola MPC7400, Intel 960 and ARM 940T. Studies on intra-task cache conflicts and work on the identification of locked cache lines are orthogonal to ours and studied elsewhere [85, 86, 70, 66, 65, 92]. These results also emphasize that, without locks, dilation in the execution time of hard real-time tasks can have adverse effects as documented in Chapter 2.
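To make the locking interface concrete, the following is a minimal usage sketch of the two-operand lock instruction described before Table 3.2, wrapped in C. The lock_line() intrinsic, the wrapper and the crc_buffer example are hypothetical illustrations, not the actual SESC interface; they only show how instructions or data of a hard real-time task might be pinned at L1 or L2.

/*
 * Hypothetical wrapper around a lock instruction taking (address, cache level).
 */
#include <stdint.h>

#define CACHE_LINE_SIZE 32u   /* line size of the simulated platform */

/* Assumed intrinsic: lock the cache line holding 'addr' at the given level. */
extern void lock_line(const void *addr, int level);

/* Lock every cache line of [start, start + size) at the requested cache level. */
static void lock_region(const void *start, uint32_t size, int level)
{
    const uint8_t *p   = (const uint8_t *)((uintptr_t)start & ~(uintptr_t)(CACHE_LINE_SIZE - 1));
    const uint8_t *end = (const uint8_t *)start + size;

    for (; p < end; p += CACHE_LINE_SIZE)
        lock_line(p, level);
}

/* Example: pin crc's 40-byte input buffer in the L1 data cache (level 1). */
extern uint8_t crc_buffer[40];
static void lock_crc_data(void) { lock_region(crc_buffer, sizeof(crc_buffer), 1); }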

Multicore Cache Architecture Assumption: The cache organization is similar to that of Chapter 2. Figure 3.1 exhibits the two SMP designs that use private caches. Both architectures have three levels of caches, except that in Figure 3.1(a) the L3 cache is shared among processors while in Figure 3.1(b) sharing begins at the L2 cache. Tile processors from Intel and Tilera with on-chip message passing capabilities have recently been advocated. Therefore, we choose the design shown in Figure 3.1(a) based on current industrial trends.

Figure 3.1: Symmetric Multi-processors

(a) Private per-core L1 and unified L2 caches with a shared L3 cache. (b) Private per-core L1 caches with a shared L2 cache backed by an L3 cache.

3.3 Proposed Solution and Assumptions

In this section, we present the problem scenario and justify our proposed solution at a high level. Chapter 2 presented push-assisted cache migration with micro-architectural enhancements that proactively migrate the cache lines pertaining to a task. Such active movement of cache content comes at the cost of an up-front migration delay, in contrast to the traditional delayed overhead at the next job activation [77]. We assumed tasks are statically scheduled on every core, with each core running its own scheduler. We presented a single task migration scenario that triggers a task migration following a dynamic task admittance. A task that has ample slack time before its next invocation is chosen to migrate. The slack time of the task is overlapped with the proactive migration of cache lines from the source core to the target core. Thus, the overhead does not contribute to the execution of the task when it resumes on the target core, as shown in Figure 3.2. The proposed solution is not applicable when multiple tasks may be migrated. Also, if the decision to migrate a task happens just prior to the migration, there is no slack that can be used. In this case, one needs to add the migration cost to the schedule.

One such scenario arises when using schedulers like PFair [59], which schedules tasks on a quantum basis. The scheduler is invoked periodically. The scheduling is globally synchronous across all cores, i.e., execution of tasks is stalled and the scheduler selects the tasks to be scheduled on the given computing resources for the next time quantum. For example, Figure 3.3 shows a set of tasks (A, B, C, D, E, F, G) executed on a quad-core. At every scheduling point, the scheduler is invoked. In the first two time quanta, the task selection does not cause any task migrations. In the third time quantum, the scheduler decides to schedule B, D, E and G on the 4 cores. At least 2 tasks have to be migrated, as tasks B and D were previously executing on the same computing resources as E and G, respectively. The scheduler decides to schedule B and D on different computing resources. This incurs task migrations of B and D, and the migration cost gets added to the schedule as shown in the figure. However, multiple task migrations can be highly unpredictable, as will be illustrated later. Also, the addition of the migration cost to the schedule motivates us to reduce the migration cost. As the PFair algorithm has been proven to be theoretically optimal for multicore task scheduling, we assume that such algorithms can be used to generate off-line static schedules while retaining predictable migration costs with our migration support.
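The derivation of the migrations implied by such a scheduling point can be illustrated with a small sketch. This is only a hypothetical illustration, assuming prev[] and next[] maps from core index to task ID; it is not the PFair implementation itself.

/*
 * Sketch: given the previous and next quantum's core-to-task assignments,
 * report which tasks must migrate and between which cores.
 */
#include <stdio.h>

#define NCORES  4
#define NO_TASK (-1)

/* Return the core on which 'task' ran in the previous quantum, or NO_TASK. */
static int task_core(const int prev[], int task)
{
    for (int c = 0; c < NCORES; c++)
        if (prev[c] == task)
            return c;
    return NO_TASK;
}

static void emit_migrations(const int prev[], const int next[])
{
    for (int c = 0; c < NCORES; c++) {
        int src = task_core(prev, next[c]);
        if (next[c] != NO_TASK && src != NO_TASK && src != c)
            printf("task %d migrates: core %d -> core %d\n", next[c], src, c);
    }
}

In the example of Figure 3.3, such a routine would report exactly the two migrations of B and D whose cost is added to the third quantum.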

Figure 3.2: Scheduler Initiated Cache Migration Overlaps Slack Time

(The scheduler initializes the cache migration on Core 0; the migration delay overlaps with the slack time, and the cache migration concludes before the task resumes on Core 1.)

The simplest way of performing proactive cache migration is to unlock the lines at the source core; thereafter, the target core can load and lock these lines. Both the unlock and lock sequences can be encapsulated into threads spawned by the scheduler at the source and the target cores, in that order (a minimal sketch of this thread-based approach follows the list below). Note that this contributes to the migration delay as shown in Figure 3.2. Such a scheme has the following drawbacks:

1. In real-time systems, threads execute on a simple in-order core. This means that only one demand request can be issued at a time. This serializes all the demands.

2. Each demand request in a contemporary bus-based SMP will be snooped by all caches.

This will induce multiple response actions on all cache controllers of the system. This not only affects the bus bandwidth but also causes useless cache accesses. This, in turn, makes parallel transactions highly unpredictable and inefficient.

3. As demand requests originate from an executable, these executables have their own execution overheads to calculate simple addresses.

4. Unlocking lines at the source before locking them at the target does not prevent off-chip accesses when re-loads are issued.
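As referenced above, the following is a minimal sketch of the thread-based pull approach whose drawbacks are listed. The unlock_line()/lock_line() primitives and the region descriptor are hypothetical; the sketch only makes the serialized, demand-driven behavior explicit.

/*
 * Hypothetical thread-based pull: unlock at the source, then re-load and
 * re-lock at the target. Every lock_line() at the target is a demand miss
 * that is snooped by all caches and may go off-chip.
 */
#include <stdint.h>

extern void unlock_line(const void *addr, int level);
extern void lock_line(const void *addr, int level);

typedef struct { const void *start; uint32_t bytes; int level; } locked_region_t;

/* Runs on the source core. */
static void unlock_task_regions(const locked_region_t *r, int n)
{
    for (int i = 0; i < n; i++)
        for (uint32_t off = 0; off < r[i].bytes; off += 32)   /* 32-byte lines */
            unlock_line((const uint8_t *)r[i].start + off, r[i].level);
}

/* Runs on the target core, one serialized demand request per line. */
static void reload_task_regions(const locked_region_t *r, int n)
{
    for (int i = 0; i < n; i++)
        for (uint32_t off = 0; off < r[i].bytes; off += 32)
            lock_line((const uint8_t *)r[i].start + off, r[i].level);
}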

Figure 3.3: Cache Migration Follows PFair Scheduling

(Four processors execute tasks A through G over successive time quanta; at the third scheduling point, tasks B and D migrate and their migration delay is added to the schedule. The legend distinguishes task execution, scheduler execution, scheduling points and cache migration.)

These inadequacies led us to exploit the push-based micro-architectural cache migration presented in Chapter 2. First, we enhance the current state of the art of push-assisted cache migration to improve the performance of individual cache migrations. Then, we propose a novel scheme that allows the scheduler to orchestrate multiple cache migrations so that they synchronize and proceed in parallel seamlessly. We also show that they complete within a pre-calculated bound. We assume the scheduler can identify whether the target core has enough cache lines for all lock requests of the migrated task. Such issues are orthogonal to our work because scheduler interaction is required irrespective of the type of migration (thread-based pull or hardware-based push). In a recent publication, we proposed a mechanism that resolves locked region conflicts before using our migration model to deduce the migration cost of statically determined task migrations in order to semi-partition the tasks on a multicore architecture [80]. Here, we intend to show that if a hard real-time task has been chosen for migration, then we can deliver an efficient and predictable cache migration delay.

3.4 Migration Models

In order to explain the migration model, we use “source” and “target” to refer to the core/cache where the task is currently running and where the scheduler migrates the task to before it resumes execution, respectively.

3.4.1 Push Model

A Push Model is one where memory requests are initiated by the source core in order to warm up the target cache, instead of demand-driven requests issued by the target. We introduced two such migration schemes: Whole Cache Migration (WCM) and Regional Cache Migration (RCM). WCM is a hardware mechanism where every cache controller has a push logic block and each cache line has a PID associated with it. When the task migrates, the target is initialized to start the pushing of cache lines with a push request. The push request is a point-to-point request. This means that a push request will be issued by the source core and referenced by the target core only. This prevents any useless acknowledgments or cache accesses by cores besides the source and target. Therefore, a push request carries information about the target core along with the data and the tag of the cache line being migrated. However, WCM scans through the whole cache in order to identify each cache line that needs to be migrated. Thus, the migration overhead is directly proportional to the size of the cache. RCM is a hardware/software approach that uses the support of dedicated registers called Region Registers (RRs). RRs hold regions of consecutive memory locations as pairs of start and end addresses. Each Task Control Block (TCB) holds region information for the RRs as identified by the programmer. The scheduler fills this limited number of registers before migration. Thereafter, a push block unit computes addresses sequentially that fall within these regions, searches for them in the cache and pushes them to the target. Since this scheme uses addresses to search the cache, cache lines do not need a PID. Refer to Chapter 2 for further details on the hardware complexity of the push block and the integration of push requests with coherence.
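The software-visible state of this push model can be summarized in a small sketch. The struct layout below is illustrative only and not the simulator's definition; it merely follows the description above: four region registers holding start/end address pairs, with a per-task copy kept in the TCB.

/*
 * Illustrative layout of the region registers and their per-task copy.
 */
#include <stdint.h>

#define NUM_REGION_REGS 4

typedef struct {
    uint32_t start;   /* first byte of a contiguous migratable region */
    uint32_t end;     /* last byte (inclusive) of the region */
} region_reg_t;

/* Region information stored in the Task Control Block and copied into the
 * push block's region registers by the scheduler just before migration.
 * 4 pairs x 8 bytes = 32 bytes, i.e., exactly one cache line, which is what
 * allows the initial target-to-source transfer described in Section 3.4.3. */
typedef struct {
    region_reg_t regions[NUM_REGION_REGS];
} tcb_region_info_t;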

3.4.2 Regional Cache Migration (RCM)

Figure 3.4 depicts a diagrammatic representation of two cache line transfers performed consecutively using RCM. Each cache line transfer starts with a cache read at the source, followed by a bus transaction to the target, succeeded by a write at the target, and finally concludes with an acknowledgment request that is also monitored by the memory controller (in case it carries a write-back message from the target). We use the RCM scheme as our base scheme, yet with a minor modification: we add a lock-bit check with each cache read, as we migrate only locked cache lines. In RCM, a cache read is initiated by the push logic only after the acknowledgment has been received.

This serializes each transaction. Next, we deduce the migration delay Tm for RCM-based cache migration. Let us assume that the worst-case cache access time is D cycles, an uncontended cache-to-cache transfer takes B cycles and the number of locked cache lines is Cn. Each cache line requires two cache accesses (incurring 2 × D cycles of delay), one push request put onto the bus by the source tile/core (incurring B cycles of delay) and another transaction placed on the bus by the target tile/core to acknowledge the push request, which can be piggybacked with a write-back of the evicted cache line (incurring another B cycles of delay). Thus, each cache-to-cache transaction takes 2 × (B + D) cycles. The serialized nature of RCM gives us the migration delay Tm incurred by a single task migration as

Tm = Cn × 2 × (B + D).

Figure 3.4: Regional Cache Migration Operational Sequence

(Each transfer consists of a cache read (D), a bus transaction (B), a cache write (D) and an acknowledgment (B); the next cache read only starts after the acknowledgment has arrived.)

RCM in its current form is inefficient. Each cache read waits for the acknowledgment to arrive. When a cache is being read or written to, the bus is idle. When the bus is being accessed, the cache controllers are idle. To mitigate this under-utilization of cache and bus resources, we present our first novel scheme called Controlled Cache Migration Pipe-lining.

Controlled Cache Migration Pipe-lining (CCMP)

Subsequent cache reads can be serviced while a request is on the bus, as caches can issue multiple outstanding requests. Thus, a pending request buffer is created that holds two pending requests at a time. This means that, at any time, there can be only two pending push requests, and no cache read can be performed until one of the acknowledgments reaches the source core. This is shown in Figure 3.5. One unique observation about such pipelining is that the bus transactions of push requests and acknowledgments do not interfere with each other, as seen in Figure 3.5. Individually, each push request requires 2 × (B + D) cycles to complete, as explained previously. In Figure 3.5, for the purpose of evaluating the migration delay, we assume that the first and second push requests form the first pair, and the third and fourth push requests form the second pair.

Figure 3.5: Controlled Cache Migration Pipe-lining Operational Sequence

(The second cache read is issued as soon as the first line finishes reading; the third read is issued only after the acknowledgment for the first push request has arrived.)

In the figure, we see that the first push request of the second pair begins when the first request of the first pair finishes. Thus, one can say that each new pair of push requests is at an offset of 2 × (B + D) cycles. So, assuming that Cn, the number of cache lines, is even, the last pair of push requests begins at (Cn − 2)/2 × 2 × (B + D) cycles. The last pair finishes its migration when the second push request of the last pair finishes. Since the second push request lags behind the first one by D cycles, the last pair takes 2 × (B + D) + D cycles.

So, for an even Cn, the migration delay is Cn/2 × 2 × (B + D) + D cycles. If Cn is an odd number, then the last push request is not paired but rather constitutes a single push request. The cache migration is considered finished as soon as the last transaction finishes. This last request takes 2 × (B + D) cycles to finish. Since Cn is odd, the number of pairs, counting the last transaction as a pair, is ⌊Cn/2⌋. So, for an odd Cn, the migration delay is ⌊Cn/2⌋ × 2 × (B + D) cycles. Hence, Tm for all Cn is

Tm = ⌊Cn/2⌋ × 2 × (B + D) + V,

where V = 0 if Cn is odd and V = D if Cn is even. Figure 3.5 further illustrates that if the third cache read is allowed just after the completion of the second cache read, then the push transaction issued by the third cache read can only conflict if the bus delay is greater than half of the cache read access time. Thus, the CCMP model is valid for processors where the bus delay is less than or equal to the cache access time. With the advent of HyperTransport [1] and the Intel QuickPath Interconnect [3], we believe cache-to-cache transport delays will remain smaller than cache access times.

The observation mentioned above leads us to develop our next scheme, called Streamed Cache Migration Pipe-lining.

Streamed Cache Migration Pipe-lining (SCMP)

This migration model allows every subsequent transaction to resume with cache reads without waiting for any acknowledgments to arrive. This scheme is extremely efficient when the bus delay is shorter than half of the cache read access time. Figure 3.6 depicts the migration of three cache lines from a source to a destination under SCMP. As shown in Figure 3.6, none of the push requests conflict, effectively resulting in a streaming behavior of the cache migration.

Figure 3.6: Streamed Cache Migration Pipe-lining Operational Sequence

(Each cache read starts D cycles after the previous one; push requests and acknowledgments interleave on the bus without conflicts.)

Since each subsequent request lags behind the previous request by D cycles, the last transaction starts (Cn − 1) × D cycles after the first push request is issued. The last transaction takes 2 × B + D cycles. Thus, the migration delay incurred by SCMP is

Tm = Cn × D + (2 × B + D).

The above schemes improve the performance of individual cache migrations. But in situations where multiple cache migrations are required, there is room to engage in seamlessly parallel cache migrations.
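Before turning to parallel migrations, the three single-migration bounds derived above can be collected into a small sketch calculator. This is only a transcription of the formulas for sanity checking, not simulator code; Cn, B and D are as defined in the text.

/*
 * Sketch calculator for the single-migration delay bounds of Section 3.4.
 */
#include <stdint.h>

static uint64_t rcm_delay(uint64_t Cn, uint64_t B, uint64_t D)
{
    return Cn * 2 * (B + D);                 /* fully serialized RCM */
}

static uint64_t ccmp_delay(uint64_t Cn, uint64_t B, uint64_t D)
{
    uint64_t V = (Cn % 2 == 0) ? D : 0;      /* V = D iff Cn is even */
    return (Cn / 2) * 2 * (B + D) + V;       /* integer division = floor(Cn/2) */
}

static uint64_t scmp_delay(uint64_t Cn, uint64_t B, uint64_t D)
{
    return Cn * D + (2 * B + D);             /* streamed pipeline */
}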

3.4.3 Parallel Cache Migrations

When multiple hard real-time tasks are migrated simultaneously, support for multiple cache migrations is required. Suppose all cache migrations use RCM. By maintaining a multiple of B as the time difference between cache reads of any two RCM chains, we can support multiple cache migrations without contention (see Figure 3.7).

Figure 3.7: Parallel Multiple RCMs

(Three RCMs for the core pairs (1,2), (3,1) and (5,6) proceed in parallel; their reads, bus transfers, writes and acknowledgments interleave without bus conflicts.)

We denote a cache migration by a core pair, where a core pair contains a source core ID and a destination core ID. For example, the first RCM in Figure 3.7 is denoted by the core pair (1,2), where 1 and 2 are the core IDs of the source and destination cores, respectively. Multiple cache migrations can also allow a core to be a source as well as a target at the same time: in RCM, the source cache is idle while waiting for an acknowledgment once a cache line has been pushed. This idle time can be used for a write in case the core is the target of another cache migration, by placing the two transactions next to each other as shown in Figure 3.7. In fact, the only scenario in which this synchronization does not hold is for inverted (source, target) pairs. For example, in PFair the tasks to be scheduled in the next time quantum are first identified. In order to find the tasks to be migrated, the allocated tasks may be visited in some order for allocation onto cores. If a task was executing on a core that has not been allotted to any other task yet, then that core gets reserved for the task. In case it has already been taken by another task, the task is migrated. Such a scheme will not lead to migrations of core pairs (1,2) and (2,1): to obtain pairs like (1,2) and (2,1), one of the allotments has to precede the other. If (1,2) occurred first, then core 1 has been allotted to another task. If that allotment was due to a task migrating from 2 to 1, then 2 has already been allotted. This contradicts our assumption that (1,2) occurred before (2,1). The same logic applies to the non-existence of circular transactional paths, like (1,2), (2,3), (3,1). Parallel transactions have the ability to maximize the utilization of the bus bandwidth. However, this depends upon how many bus transactions can be performed between a push request and its acknowledgment, which is the limit on the number of transactions that can be supported in parallel; numerically, it is equal to ⌊D/B⌋.

This scheme additionally requires synchronization between parallel cache migrations across cores such that cache migrations like those shown in Figure 3.7 do not conflict with each other. The key to this scheme is the initialization and synchronization of these transactions by the scheduler. The scheduler and cache controllers interact as follows:

1. The scheduler determines the tasks subject to migration and creates core pairs. Note that core pairs will have distinct targets but can have identical sources.

2. First, core pairs with the same source core need to be placed into separate parallel transaction groups, which we call buckets. Thus, each bucket holds core pairs with distinct source cores, which allows all migrations within a bucket to run in parallel. We use count sort to split the core pairs between buckets (see the sketch following this list). Count sort makes two passes over the core pairs. During the first pass, it counts the number of times each source core ID appears and stores the count value per source core. During the second pass, the count value corresponding to a core pair's source core is used as the bucket ID and then decremented. This step would put the cache migrations (1,2), (3,1) and (5,6) into a single bucket, as these migrations can be performed in parallel as shown in Figure 3.7. However, it would put (3,7) and (5,8) into a separate bucket, as they share sources 3 and 5 with the cache migrations of the earlier bucket. Thus, cache migrations within a bucket can run in parallel, but the buckets are sequential with respect to one another.

3. As can be seen in Figure 3.7, parallelism is obtained by utilizing each bus cycle for transactions. However, if there are two core pairs (1,2) and (3,1) in a bucket, where core 1 is a destination in the former and a source in the latter, it is important that these two cache migrations consume bus cycles right after one another. For example, if the cache migrations (5,6) and (3,1) switched places in Figure 3.7, then there would be a conflict between the second cache line read for (1,2) and the first cache line write for (3,1). To avoid that, we use partial ordering. Algorithm 1 shows the partial ordering algorithm used. The algorithm requires as input the number of cores, a list of cache migration core pairs CP List and an array of tuples CMD. Each entry of CMD has two members: the num value and a pointer to a core pair, ptr. A core pair contains two core IDs in a member called pair. A core pair can point to another one using a next pointer. Line 1 initializes each entry of CMD with 0 (num) and NULL (ptr) values. The loop in lines 3-17 iterates over all core pairs in CP List. The nested loop iterates over both core IDs of the pair member of a selected core pair. Line 4 checks whether any other core pair has already been allocated to the core denoted by id. If there is no such entry, then the num value is incremented and the ptr of the corresponding CMD entry is set to point to c, the selected core pair. Otherwise, the loop at line 8 iterates over each of the core IDs of the already seen core pair.

One of these core IDs is going to be the same as id. The other is going to be either higher or lower than id. In case it is higher, c precedes the already seen entry; otherwise, the already seen core pair precedes c. This is done by modifying the next pointer of the preceding core pair, which gives us a partial order. The num value corresponding to id is incremented and the respective ptr entry points to the preceding core pair. This partial ordering is accomplished by lines 9-17. Line 18 initializes the partOrder list with no valid entries. The loop in lines 19-20 iterates over each CMD entry. If an entry's num value is found to be non-zero, then that entry points to a core pair that is at the head of a chain of core pairs. This chain may hold only a single core pair or a list of core pairs. Line 21 appends this chain to partOrder. Line 22 decrements the num values of the CMD entries corresponding to the core IDs that appear in the chain. The num values are decremented as many times as they appear in the chain, which is at most 2: once as a source and once as a destination. When a num value becomes zero, the corresponding ptr is changed to NULL. Finally, line 23 returns partOrder, which holds an ordered list of cache migration core pairs with partial ordering between core pairs that use a common core. Figure 3.8 shows the steps taken by our algorithm to order a given set of core pairs with an example. The following is a description of these steps:

(a) shows an unordered list of core pairs. In order to sort them, we use an array of tuples, CMD, consisting of a numerical value and a pointer to a core pair. Here we assume 8 cores. CMD is indexed by core ID. The numerical value at an array index denotes the number of core pairs involving that particular core. Each cache migration is bound to have a unique target core, and a source core can occur only once within a bucket; hence, the maximum number of core pairs a core might be involved in within a bucket is 2: once as a source and once as a target. The numerical values are initialized to zero and all pointers are initialized to NULL.

(b) selects core pair (1,3): this updates the first and the third items in the array. Their numerical values are incremented from 0 to 1. Both array items point to the core pair (1,3).

(c) selects core pair (4,2): this updates the second and the fourth items in the array. Their numerical values are incremented from 0 to 1. Both now point to the core pair (4,2).

(d) selects core pair (6,5): updates are made to the sixth and the fifth items, similarly to the prior two steps.

(e) selects the last core pair, (5,4): as we can see from the previous steps, the fifth and the fourth items already have an entry each. However, both entries point to a different core pair.

ALGORITHM 1: Partial Ordering of Parallel Cache Migrations
Input: CP List: list of cache migration core pairs; numOfCores: total number of cores; CMD[numOfCores]: Cache Migration Descriptor array
Output: partOrder: ordered list of cache migrations
1  initialize CMD;
2  foreach CP List c do
3      foreach c.pair id do
4          if CMD[id].num = 0 then
5              CMD[id].num := 1;
6              CMD[id].ptr := c;
7              continue;
           end
8          foreach CMD[id].ptr.pair cid do
9              if id < cid then
10                 increment CMD[id].num;
11                 c.next := CMD[id].ptr;
12                 CMD[id].ptr := c;
13                 break;
               end
14             if id > cid then
15                 increment CMD[id].num;
16                 CMD[id].ptr.next := c;
17                 break;
               end
           end
       end
   end
18 initialize partOrder;
19 foreach numOfCores coreId do
20     if CMD[coreId].num ≠ 0 then
21         append partOrder with the chain pointed to by CMD[coreId].ptr;
22         decrement the num values for each core ID that appears in the chain pointed to by CMD[coreId].ptr;
       end
   end
23 return partOrder;

Figure 3.8: Partial Ordering of Core Pairs

(Snapshots of the CMD array of tuples and the resulting partOrder list after each of steps (a) through (g).)

So, we first try to update the entry for core 5 (the source) and then for core 4 (the target).

i. The numerical value of the fifth item is incremented from 1 to 2. The corresponding pointer, which points to core pair (6,5), is dereferenced. To construct an order, we let core pairs point to the core pair next in order. In this example, when we encounter two core pairs using the same core ID, we compare their paired cores, as these are bound to be dissimilar. The core pair whose paired core ID is smaller precedes the other core pair. Therefore, at the end of this sub-step, the fifth item points to core pair (5,4), which precedes (6,5) since its paired core ID 4 is smaller than ID 6.

ii. The numerical value of the fourth item is incremented from 1 to 2. The pointer pointing to core pair (4,2) is dereferenced. Its paired core ID 2 is smaller than ID 5, that of the current core pair. Therefore, (4,2) precedes (5,4) and the fourth item points to (4,2). At the end of this sub-step, all core pairs have updated the array of tuples, and all partial orders have been formed.

(f) begins forming the absolute order by traversing the array of tuples in ascending order of core IDs. The item at index 1 points to core pair (1,3). The numerical values of the first and third items are decremented as this core pair is removed and placed at the head of the ordered list.

(g) reads the second item in the array. This points to core pair (4,2), which is at the head of a chain of core pairs. All core pairs in this chain are removed in that order and appended at the end of partOrder, the ordered list of core pairs. While removing each of the core pairs, the corresponding items in the array are decremented. At the end of this step, all core pairs have been removed. The process terminates by reading 0 values in the remaining entries, indicating that no other core pairs need to be ordered.

This ordering allows one to establish synchronization among RCMs, like the one shown in Figure 3.7. Once ordered, we know that the migration from 3 to 1 has to be started at a time instant such that the behavior shown in the figure can be replicated. This can be achieved by giving an offset value to each RCM. An offset indicates the number of cycles that a cache controller should wait before starting an RCM, relative to the start of the first RCM. For a static schedule, these offsets for each scheduling point can be computed offline and loaded into the Task Control Blocks (TCBs) of the migrated tasks.

4. The core with an offset of zero packs its region registers into a data block. As we use only 4 pairs of region registers, they fit within 32 bytes (the size of a cache block). This methodology is used because the sources of two transactions in different buckets can be the same. If we allowed the scheduler to update the region registers, then the scheduler would need to be activated after every bucket finishes. This can be avoided by an initial transfer from target to source. This requires an additional 4 pairs of region registers within the push block, which have to be sent to the source. But it does not change the parallelization of transactions because the transactions are already two-way communication. This adds a small overhead of 2 × B + D cycles.

5. Each snoop controller detects the very first message on the bus and records the current cycle time. This value is then added to the offsets. This allows all targets to determine when to issue the request for the initialization of push requests.

Note that all the offsets can be predetermined because the transaction time for each cache migration is bounded. Even with parallel transactions, the time at which a bucket of transactions finishes can be predicted accurately to compute the offsets for the next bucket of transactions. Within a bucket, the offset values are such that the transactions are B cycles apart. If there is a single transaction in a bucket, it can use pipe-lined models like SCMP or CCMP. This synchronization only requires an additional set of RRs, offset value registers and minor logic to extend access to these registers.
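As referenced in step 2 of the list above, the count-sort split of core pairs into buckets can be sketched as follows. Array bounds and types are illustrative assumptions; the two passes follow the description given in the list.

/*
 * Sketch of the count-sort bucket split: core pairs that share a source core
 * must land in different buckets, so that all migrations within one bucket
 * can run in parallel.
 */
#include <stdint.h>

#define MAX_CORES 8

typedef struct { int src; int dst; } core_pair_t;

/* bucket[i] receives the bucket ID (>= 1) assigned to pairs[i];
 * assumes every src is in [0, MAX_CORES). */
static void split_into_buckets(const core_pair_t *pairs, int n, int bucket[])
{
    int count[MAX_CORES] = { 0 };

    /* Pass 1: count how often each source core appears. */
    for (int i = 0; i < n; i++)
        count[pairs[i].src]++;

    /* Pass 2: the remaining count for a pair's source is its bucket ID,
     * decremented afterwards so a repeated source lands in the next bucket. */
    for (int i = 0; i < n; i++)
        bucket[i] = count[pairs[i].src]--;
}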

Until now, we have considered RCM-based approaches. RCM is useful when long sequential code paths, global data arrays, or closely located groups of global variables are locked. However, the locking of sparse memory locations needs cache migration support as well. This motivates a hardware solution called the Set-Scan Cache Migration model.

3.4.4 Set-Scan Cache Migration (SSCM)

Figure 3.9: Set-Scan Cache Migration Operational Sequence

(The first set read finds no matching line; the second set read finds two lines, and the second matching line is pushed as soon as the acknowledgment for the first one arrives.)

The push block in SSCM identifies the locked lines pertaining to the migrated task through hardware enhancements. Since the push block is unaware of the locked regions, it requires Process Identifier (PID) information associated with each cache line to determine whether a locked cache line belongs to the migrated task. Thus, each cache line holds a PID tag to associate it with a process. In a conventional cache access, the content of the whole set is read; if the searched tag matches any entry, then that entry is forwarded. However, SSCM can have multiple matches, because multiple locked entries within a set may belong to the same process. Hence, SSCM reads the whole cache set, identifies the first matching line and forwards it to be placed on the SMP bus for migration. Instead of discarding the other matching entries of the set, they can be buffered. Thus, multiple matching entries can be identified in a very short time while the first matched entry is in transit. On receiving an acknowledgment, the next matching entry can be transmitted immediately without any read delays. This is shown in Figure 3.9, where the push request marked 2 is placed on the bus as soon as the ACK for the push request marked 1 has arrived. This prevents the hardware mechanism from reading a set multiple times. When there are no matching entries in a set at all (as is the case with the first read in Figure 3.9), extra reads may be introduced, each adding an extra read latency to the total migration cost.
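The set-scan behavior described above can be sketched in a few lines. The cache_line_t layout, the PID width and push_to_target() are illustrative assumptions, not the simulator's actual interfaces.

/*
 * Sketch of one SSCM set scan: buffer every locked line of the migrating
 * task's PID, push the first match immediately and drain the rest as
 * acknowledgments arrive.
 */
#include <stdbool.h>
#include <stdint.h>

#define ASSOC 8   /* L2 associativity of the simulated platform */

typedef struct {
    bool     valid;
    bool     locked;
    uint16_t pid;     /* owning task, as described in the text */
    uint32_t tag;
} cache_line_t;

extern void push_to_target(const cache_line_t *line, int target_core);

/* Scan one set; return how many locked lines of 'pid' were queued for pushing. */
static int scan_set(const cache_line_t set[ASSOC], uint16_t pid, int target_core,
                    const cache_line_t *pending[ASSOC])
{
    int matches = 0;
    for (int way = 0; way < ASSOC; way++)
        if (set[way].valid && set[way].locked && set[way].pid == pid)
            pending[matches++] = &set[way];

    if (matches > 0)
        push_to_target(pending[0], target_core);  /* remaining buffered entries
                                                     go out on each acknowledgment */
    return matches;
}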

The mathematical analysis of this scheme introduces another parameter, namely the number of sets within the cache, computed as the size of the cache (Sc) divided by the associativity (A). In SSCM, each cache set is read once, which incurs an overhead of Sc/A × D cycles. In addition, each cache line migration results in two bus transactions and one cache access at the target, giving a migration delay Tm of

Tm = Sc/A × D + Cn × (2 × B + D).

Figure 3.10: Conflicts between RCM and SSCM Push Requests

(An SSCM set read on core pair (0,1) that finds two lines issues back-to-back pushes, which conflict on the bus with the regularly spaced pushes of an RCM on core pair (2,4).)

So far, we have discussed the parallelization of cache migration in terms of RCM. SSCM is a sequential cache migration scheme that could benefit from parallelization as well. In particular, a set of RCM migrations may issue while another set of migrations uses SSCM. A constraint of SSCM is that its push requests cannot be overlapped with the pushes of RCM. Figure 3.10 illustrates the cause: a set read may find zero or multiple lines to migrate. Thus, we propose an aligned version of SSCM called Slotted-SSCM.

Figure 3.11: Slotted-SSCM Operational Sequence

(Delays are added so that every set read and every pushed line occupies exactly one slot, regardless of whether a set read finds zero, one or multiple lines.)

Slotted-SSCM

This migration model regulates the progress of cache set reads and the issuance of push requests. The cache migration is now divided into slots, each with a duration of 2 × (B + D) cycles. When a set read yields exactly one push request, it takes one slot of time. If no match is found, the read still takes one slot, and if multiple matches are found, then each line migrated accounts for one slot of time by adding delays. This is shown in Figure 3.11. Thus, such a migration can now run in parallel with other chains of RCM cache migrations. However, such a scheme complicates the estimation of the migration delay. This is due to the delays added to migrations of cache lines that have been identified by a prior set read operation. This causes every cache line migration to take 2 × (B + D) cycles. In addition, each set read that does not result in a cache migration also costs 2 × (B + D) cycles. This can result in varying migration delays for a given number of locked cache lines to be migrated. For example, let us assume a 2-way associative cache that has only two cache sets and a user who locks two cache lines. If the two cache lines map onto a single set, then this leads to two cache set reads and two cache line migrations. Since one of the cache set reads results in two cache migrations, that set read's delay is absorbed by one cache migration slot, while the futile set read occupies a slot of its own. Thus, a total migration delay of three slots, i.e., 3 × 2 × (B + D) cycles, is incurred. On the other hand, if the two locked lines map onto separate cache sets, then each cache set read results in a cache migration, so each set read delay is absorbed by a cache migration slot. This leads to a migration cost of two slots, i.e., 2 × 2 × (B + D) cycles. This shows that the migration delay depends upon the mapping of the cache lines. The simple example shows that this delay can be deduced by adding the cache line migration delays and the futile cache set read delays. The former is simply Cn × 2 × (B + D). The latter is the number of futile set reads × 2 × (B + D). The number of futile set reads is maximal when the locked cache lines map onto the fewest number of sets that they can map onto, which is ⌈Cn/A⌉.

to, which is ⌈Cn/A⌉. In that case, the number of futile set reads is Sc/A − ⌈Cn/A⌉. Thus, the worst-case cache migration delay experienced by Slotted-SSCM is

Tmwc = (Sc/A − ⌈Cn/A⌉ + Cn) × 2 × (B + D).

Slotted-SSCM migration creates deterministic behavior in the issuance of push requests. In analogy to the parallel execution of RCM chains under SCMP, we can develop a pipe-lined cache migration model for Slotted-SSCM.
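A minimal sketch of this worst-case bound follows (illustrative only; B and D default to the values of Table 3.3). For instance, the 47 locked lines of fft in the 8-way, 32-set L2 cache evaluate to 1752 cycles, consistent with the worst-case column of Table 3.6.

from math import ceil

def slotted_sscm_worst_case(num_sets: int, locked_lines: int, assoc: int,
                            D: int = 10, B: int = 2) -> int:
    # Worst case: the locked lines are packed into ceil(Cn/A) sets, so the
    # remaining set reads are futile; every futile read and every migrated
    # line occupies one slot of 2*(B+D) cycles.
    futile_reads = num_sets - ceil(locked_lines / assoc)
    return (futile_reads + locked_lines) * 2 * (B + D)

print(slotted_sscm_worst_case(num_sets=32, locked_lines=47, assoc=8))  # 1752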

Slotted-SSCM Pipe-lining

This migration model returns to pipe-lined migration, but for Slotted-SSCM instead of RCM. It is similar to SCMP in that the push block proceeds with the next set read without waiting for any acknowledgment of the previous transfer. A cache access delay of D cycles is associated with every migrated line. As for SCMP, the 2 × B + D cycles of the last push request do not overlap with any other push traffic.

Figure 3.12: Slotted-SSCM Pipe-lining Operational Sequence


However, as for Slotted-SSCM, when a cache set holds multiple locked cache lines for a migrated task, the first migrated line experiences a delay due to an actual set read while subsequent transactions add a fake cache access delay, as shown in Figure 3.12. Also, as for Slotted-SSCM, a delay of D cycles is incurred for every futile set read; for example, the first cache set read in the figure results in no cache migration. Therefore, besides the delay for every migrated cache line, the total also includes the delay for the futile set reads.

As derived previously, the worst-case number of futile set reads is Sc/A − ⌈Cn/A⌉. Hence, the worst-case cache migration delay for Slotted-SSCM Pipe-lining is

Tmwc = D × (Sc/A − ⌈Cn/A⌉ + Cn) + 2 × B + D.
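The pipelined bound can be sketched in the same style (again only an illustration with the Table 3.3 parameters); for fft it evaluates to 10 × 73 + 14 = 744 cycles, consistent with Table 3.6.

from math import ceil

def slotted_sscm_pipelined_worst_case(num_sets: int, locked_lines: int, assoc: int,
                                      D: int = 10, B: int = 2) -> int:
    # D cycles per futile set read or migrated line, plus the trailing
    # 2*B + D cycles of the last push request that overlap with nothing.
    futile_reads = num_sets - ceil(locked_lines / assoc)
    return D * (futile_reads + locked_lines) + 2 * B + D

print(slotted_sscm_pipelined_worst_case(num_sets=32, locked_lines=47, assoc=8))  # 744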

3.5 Simulation Platform

Our proposed solution requires micro-architectural modifications to a stable bus-based multicore CMP architecture; the system architecture specifications are presented in Table 3.3. The base simulator environment has been enhanced to allow us to trigger task migrations across cores. The purpose of this work is not to provide a scheduling mechanism but rather to evaluate the micro-architectural mechanisms based on the migration costs of single and multiple cache migrations. These mechanisms can be used with different schedulers. The recently proposed Predictable Execution Model (PREM) suggests that predictability can be provided by isolating different phases within a task [64]. We believe that, for multicores, a cache migration phase can be added to the already existing memory and execution phases. Thereby, one can produce static schedules that allow the bus to be used for a single cache migration or that co-locate multiple cache migrations with predictable migration costs obtained from our cache migration models. PFair also decides on all task migrations at the same time, which again allows cache migrations to occur simultaneously [15]. The cache model has been extended to support locks at different levels of the cache hierarchy. The migration models have been designed and implemented as part of the cache controller model. This allows the detection of locked cache lines, the issuance of push requests from the "source" core to the "target" core, and the regulation of the rate of issued requests according to the migration models.

Table 3.3: Simulation Parameters

Component                        Parameter
Processor Model                  in-order
Cache Line Size                  32 B
L1 I-Cache Size/Associativity    2 KB / 4-way
L1 D-Cache Size/Associativity    2 KB / 4-way
L1 Access Latency                1 cycle
L1 Replacement Policy            LRU
L2 Cache Size/Associativity      8 KB / 8-way
L2 Access Latency                10 cycles
L2 Replacement Policy            LRU
Coherence Protocol               MESI
Network Configuration            Bus-based
Processor-to-Processor Delay     2 cycles
External Memory Latency          100 cycles

3.6 Evaluation

3.6.1 RCM vs. SSCM

First, we compare the migration delays incurred by the programmer-assisted implementation of RCM and the complete hardware solution for cache migration in SSCM, as shown in Table 3.4. The first column lists the benchmarks used, the second column shows the number of cache lines that were locked, and the third and fourth columns show the migration delays in cycles experienced by RCM and SSCM, respectively. The locked lines are statically locked prior to execution of the benchmark and comprise the instruction address space as well as the global variables and tables being used. As can be seen, SSCM performs better than RCM for fft, jfdctint and crc. This is because SSCM identifies multiple cache entries in a set

through one cache access while RCM needs to perform as many cache reads as there are locked cache lines. This indicates that these three benchmarks have their locked lines distributed over the entire cache. In the case of bs, the migration delay experienced by SSCM is significantly larger than for RCM because the number of cache lines locked for bs is much lower than the number of cache sets, and each futile cache set read adds to the migration delay. Looking

back at the Tm derived for SSCM, (Sc/A) × D is much larger than Cn × (2 × B + D).

Table 3.4: Migration Delays: RCM vs SSCM

Program     Number of locked cache lines    RCM [cycles]    SSCM [cycles]
fft         47                              1128            978
jfdctint    36                              864             824
bs          10                              240             460
crc         41                              912             894

3.6.2 Pipelined RCM techniques

In Section 3.4, we presented two similar pipelining schemes, CCMP and SCMP. Table 3.5 shows the potential of these schemes over the serialized RCM scheme. The first column lists the benchmarks used. The second, third and fifth columns show the migration delays incurred by RCM, CCMP and SCMP in cycles, respectively. The fourth and sixth columns show the reduction in migration delay achieved by CCMP and SCMP over RCM in percent. These results correspond to the equations derived in Section 3.4. Consider the differences between the schemes: SCMP performs well if the bus delay is at most half the cache access delay. Once the bus delay exceeds this threshold, CCMP becomes more efficient, and it remains so as long as the bus delay does not exceed the cache access delay. Beyond that point, pipelining becomes infeasible. This can be mitigated by adding delays to balance the cache access and bus delays, which enables CCMP when the bus delay is marginally greater than the cache access delay. For even higher bus delays, one may have to introduce multiple hops.
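The thresholds discussed above can be condensed into a small decision helper. This is only an illustration of the stated rules, not a component of the evaluated system.

def pick_pipelined_scheme(B: int, D: int) -> str:
    # B: bus delay, D: cache access delay (both in cycles).
    if 2 * B <= D:
        return "SCMP"                      # bus delay at most half the cache access delay
    if B <= D:
        return "CCMP"                      # bus delay between D/2 and D
    return "CCMP with added delays"        # B marginally above D; larger B needs multiple hops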

3.6.3 Slotted SSCM techniques

In Section 3.4, we presented the Slotted-SSCM and Slotted-SSCM Pipelining migration models. Table 3.6 shows the experimental and worst-case migration delays for the two schemes.

Table 3.5: Pipelined Cache Migration

Program     RCM [cycles]    CCMP [cycles]    CCMP Savings    SCMP [cycles]    SCMP Savings
fft         1128            576              48.9%           484              57.1%
jfdctint    864             442              48.8%           374              56.7%
bs          240             130              45.8%           114              52.5%
crc         912             446              48.9%           394              56.8%

The experimental results are obtained by triggering a migration of the individual benchmarks from one core to another. The worst-case migration delays are computed from the Tmwc derived in Section 3.4 for Slotted-SSCM and Slotted-SSCM Pipe-lining. The first column lists the benchmarks used. The second, third and fifth columns show the experimental migration delay in cycles for SSCM, Slotted-SSCM and Slotted-SSCM Pipelining, respectively. The fourth and sixth columns show the analytical worst-case migration delays for Slotted-SSCM and Slotted-SSCM Pipelining, respectively. It can be seen that the migration delays are significantly reduced by pipelining. However, the worst-case migration delays for both Slotted-SSCM and Slotted-SSCM Pipelining are considerably higher than their corresponding experimental results. This is because, in practice, locked cache lines are spread across the sets, while the worst case occurs when the locked lines are located within the smallest number of sets that can hold them. Thus, we recommend that the actual migration delays be computed off-line if the distribution of locked cache lines is known; this distribution can be inferred from the cache design parameters and the lock addresses. In order to compute the migration delay, one has to determine the sets that the locked cache lines map to.

Table 3.6: SSCM Variants

Program     SSCM [cycles]    Slotted-SSCM [cycles]    Slotted-SSCM (WC) [cycles]    Slotted-SSCM Pipelining [cycles]    Slotted-SSCM Pipelining (WC) [cycles]
fft         978              1128                     1752                          484                                 744
jfdctint    824              864                      1512                          374                                 644
bs          460              768                      888                           320                                 414
crc         894              1008                     1608                          434                                 684

For example, assume a task that has 4 locked cache lines in a 4-way associative cache with 4 cache sets. As explained in Section 3.4, the worst-case scenario happens when the cache lines map into ⌈Cn/A⌉ cache sets, which for our scenario is ⌈4/4⌉ = 1. This case is represented in Figure 3.13(b). In practice, however, the locked cache lines are mapped as shown in Figure 3.13(a). In the figure, we show on the right side of each cache set the number of cycles taken by Slotted-SSCM for that set. As explained in Section 3.4, each set without a locked cache line causes a delay of 2 × (B + D). Once a line is found, this delay covers the first migrated line, and every additional line migrated from a set incurs a further delay of 2 × (B + D). In Figure 3.13(a), set 0 incurs 2 × (B + D) cycles due to a futile cache set read. Set 1 has two locked cache lines; its set read covers the first migrated line, but it incurs an additional delay for the migration of the second locked cache line, so set 1 takes 4 × (B + D) cycles. Sets 2 and 3 hold one locked cache line each and thus incur only a cache set read delay of 2 × (B + D). In this practical case, the migration delay is 10 × (B + D). For the worst-case scenario shown in Figure 3.13(b), however, set 0 incurs one cache set read delay and three cache migration delays, accounting for 8 × (B + D) cycles, while each of the remaining cache sets incurs a cache set read delay of 2 × (B + D). This results in a worst-case migration delay of 14 × (B + D). Since the migration delays for these schemes are deducible off-line, we compare the experimental delays and observe that Slotted-SSCM Pipelining reduces the migration delay over SSCM by 46.7% on average.

Figure 3.13: Offline Computation of Migration Delays for Slotted-SSCM

(a) Experimental case, 10 × (B + D) total: Set 0: 2 × (B + D), Set 1: 4 × (B + D), Set 2: 2 × (B + D), Set 3: 2 × (B + D).
(b) Worst case, 14 × (B + D) total: Set 0: 8 × (B + D), Set 1: 2 × (B + D), Set 2: 2 × (B + D), Set 3: 2 × (B + D).
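The off-line computation illustrated in Figure 3.13 can be sketched as follows (a minimal illustration; it only assumes that the number of locked lines per cache set is known from the lock addresses and the cache design parameters).

def slotted_sscm_delay_from_mapping(lines_per_set, B: int = 2, D: int = 10) -> int:
    # Every set costs one slot for its read (which also covers the first
    # migrated line) plus one slot per additional locked line; empty sets
    # still cost one futile-read slot.  A slot is 2*(B+D) cycles.
    slots = sum(max(1, n) for n in lines_per_set)
    return slots * 2 * (B + D)

# Figure 3.13(a): lines spread as [0, 2, 1, 1] -> 5 slots = 10 x (B+D)
# Figure 3.13(b): all four lines in set 0, [4, 0, 0, 0] -> 7 slots = 14 x (B+D)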

3.6.4 Parallel vs. Pipelined Cache Migration

In this chapter, we introduced pipelined schemes, namely CCMP, SCMP and Slotted-SSCM Pipelining. When multiple cache migrations have to be handled, pipelined cache migrations

for each task can be performed one after the other. However, system-level parallelism of cache migration can also be obtained by issuing multiple instances of RCM and Slotted-SSCM cache migrations, as shown in Figure 3.7. Pipelined cache migration is useful for reducing the migration delay of individual migrations, while parallel migrations can utilize the maximum bandwidth. For example, our experimental model with a cache access delay of 10 cycles and a bus delay of 2 cycles supports 5 parallel cache migrations for an aggregate bandwidth utilization of 100%.

Table 3.7: Parallel vs Pipeline

List of migrating tasks    Parallel Migration Cost [cycles]    Pipelined Migration Cost [cycles]    Scheduler's Choice of Migration
1,2,3,4                    1142                                1366                                 Parallel
1,2,4                      1142                                1252                                 Parallel
2,4                        926                                 768                                  Pipeline
1,3,4                      1142                                992                                  Pipeline

When multiple task migrations occur, the scheduler needs to compare the cost of running a sequence of pipelined migrations against that of parallel migration. To illustrate, we choose a scenario where fft (1), jfdctint (2), bs (3) and crc (4) are running on a multicore system. We assume that they all use RCM as the base cache migration scheme and that all of them can migrate in parallel when selected. Table 3.7 depicts four combinations that exhibit the behavior of parallel and pipelined migrations. The first column shows the set of migrating tasks. The second and third columns show the migration cost in cycles for parallel migration and for serialized SCMP (pipelined) migrations, respectively. The last column shows the choice that the scheduler makes. It can be deduced from Table 3.7 that when the number of cache migrations is large and all the RCM migration costs are comparable, parallel migration exhibits a shorter migration cost (rows 1 and 2). Pipelined migration performs better when the number of migrations is small (row 3). The parallel migration cost is determined by the highest individual migration cost; hence, it performs worse than pipelined migration when the variance of the individual migration costs is high (row 4).
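The scheduler's choice can be sketched as below. The sketch assumes, consistent with Table 3.7, that the parallel cost is dominated by the largest individual RCM migration while serialized pipelined (SCMP) migrations add up; the small constant for the trailing bus transfer of the parallel case is ignored here, so the absolute numbers differ slightly from the table, but the decisions match.

def choose_migration(rcm_costs, scmp_costs):
    parallel = max(rcm_costs)      # all migration chains proceed concurrently
    pipelined = sum(scmp_costs)    # one pipelined migration after the other
    return ("Parallel" if parallel <= pipelined else "Pipeline"), parallel, pipelined

# e.g., tasks {jfdctint, crc}: RCM costs (864, 912), SCMP costs (374, 394)
print(choose_migration((864, 912), (374, 394)))   # -> ('Pipeline', 912, 768)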

Chapter 4

Static Task Partitioning for Locked Caches in MultiCore Real-Time Systems

4.1 Introduction

The past two chapters have focused upon improving the predictability of the cache migration cost that can be leveraged by global/semi-partitioned multi-processor scheduling algorithms. In the previous chapter, we discussed the potential of using locked caches in multicore real-time systems and the ability of such a hardware facility to make migrations predictable. In this chapter, we discuss the potential of cache locking for static partitioning algorithms. In general, cache locking techniques provide predictability to a task's cache access behavior. Cache locking can be realized at various granularities. Studies on uniprocessor cache locking have assumed the entire L1 cache to be locked [66, 67]. Another study on cache locking for shared caches has assumed locking of individual cache lines [88]. Locked caches on uniprocessors identify sets within a single cache way for a given task set to improve predictability and, indirectly, the utilization/response time of tasks while ensuring schedulability on a single core. Most contemporary research aims at optimizing the analysis for such systems [25, 38]. These algorithms assume a shared cache space and use Integer Linear Programming in order to select cache lines that lower the response time of the tasks. Liu et al. have proposed locked caches for private L1 caches and partitioning for shared L2 caches of an SMP architecture [55]. They also focus upon finding the optimal response time of tasks on a fixed number of cores. These algorithms may find reduced response times, but they do not focus upon efficient distribution of the tasks on scalable CMPs like the 64-core TilePro64 [6]. E.g., let us assume that we have to partition 40 tasks on a 64-core CMP. The best response time can be achieved by allocating one

Figure 4.1: Tile-based Architecture


task per core. However, such a task distribution underutilizes the system in that a smaller set of cores could have sufficed. In contrast, this work extends to scalable multicore architectures where tasks are statically partitioned. Our work focuses on distributing tasks over disjoint cores while considering their locked state. A real-time system developer may choose to lock a set of cache lines to tighten the WCET bound, and this work uses these tightened WCET bounds to statically allocate tasks on a disjoint set of cores. As stated earlier, research on cache contention has primarily considered shared caches. Due to the unscalability of such an architecture, the focus has been on packing as many tasks as possible onto a limited/fixed number of cores. Such schemes become inapplicable to scalable multicores, such as the one shown in Figure 4.1. These architectures use private L1+L2 caches. Any task allocation algorithm on such architectures requires prior knowledge of each task's Worst Case Execution Time (WCET). However, the WCET of a task obtained by static cache analysis depends on the cache analysis of all other tasks on a particular core. In this work, it is assumed that private L2 caches are large enough and of sufficiently high associativity (16-32 ways) to hold the data space and instructions of the hard real-time tasks. This simplifies the analysis of L2 caches, as any access to the L2 cache is a hit after a compulsory miss on warm-up. Thus, a tighter upper bound on the WCET can be established by modeling references resolved at the L2 level as hits after the warm-up phase of the first job execution in a periodic task system. Still, the access latency of L2 caches is an order of magnitude higher than that of L1 caches, so the bounds on WCET are not as tight as they could be. To further tighten WCET bounds, cache locking of selected lines in L1 can be employed on scalable multicore platforms. As stated before, prior literature on uniprocessor locking techniques focuses on filling a single

cache way while reducing the overall utilization of a core. On scalable architectures, a reduction of the system utilization can be achieved by placing all tasks with conflicting locked cache regions on different cores. However, such a scheme would consume a large number of cores and result in under-utilization of computing resources. Also, multiple cache ways per L1 cache can be dedicated to locking. Hence, the allocation of tasks on scalable multicores has to balance the following objectives:

1. Reduction of the number of cores; and

2. Reduction of the overall system utilization.

Static task partitioning has been considered a viable scheduling option for real-time tasks on multiple cores. Such scheduling schemes aim at minimizing the number of cores for a set of tasks with given worst-case execution times (WCETs). However, partitioning tasks with locked cache regions involves resolving the conflicts between locked regions of different tasks. In this work, we split the problem into two scenarios.

1. Scenario A presents the problem with task sets whose locked regions fit within a cache way. These task sets do not have any intra-task cache conflicts by design. It is further shown why naive solutions are inadequate. One of the most commonly used partitioning algorithms is the First Fit Decreasing (FFD) algorithm. First, we extend this algorithm with an approach called Naive Locked FFD (NFFD). Prior to allocation, NFFD decides to use cache locking for tasks that have prohibitively high utilization without locking. It avoids conflict analysis among locked regions by placing each locked task on a different core before allocating the unlocked tasks using FFD. We call this algorithm cache-unaware as it avoids any form of analysis on locked cache regions. Then, we develop and evaluate two cache-aware partitioning algorithms: (1) Greedy First Fit Decreasing (GFFD) and (2) Colored First Fit Decreasing (CoFFD). GFFD tries to allocate tasks onto a minimum number of cores [18]. This scheme lacks prior information on the number of cores of a concrete processor but rather reasons abstractly about the minimum number of cores of a hypothetical processor design. CoFFD, a more sophisticated scheme, exhibits a novel approach based on graph coloring that delivers the task partitioning. In contrast to GFFD, CoFFD initially assumes a given number of cores for an architecture. The algorithm then tries to allocate a given task set onto this fixed number of cores; in case of failure, the number of cores is incremented and the attempt to allocate tasks to cores is repeated. If the objective is to achieve minimum utilization, tasks should be allocated with all candidate regions locked, as this lowers their WCET.

2. Scenario B looks into a more generic case: tasks can have locked regions that cause intra-task conflicts and thus require multiple cache ways to avoid such conflicts. Also, static analysis tools are capable of providing regional access frequencies. This allows us to lock and unlock parts of a task's address space. These changes render the Scenario A solutions inadequate for task sets from Scenario B, as explained in Section 4.5. We tackle this problem by splitting task partitioning into two phases: task selection and task allocation. Task selection algorithms pick a task in some order, and task allocation algorithms try to resolve regional conflicts at individual cores while allocating tasks onto them. We present two task allocation mechanisms, namely Regional First Fit Decreasing (RFFD) and Chaitin's Coloring (CC). We further present two task selection mechanisms, namely Monotonic (Mono) and Dynamic (Dyn). The task allocation algorithms are extrapolated from the Scenario A solutions. Monotonic task selection is a commonly used FFD ordering, which delivers a monotonic order between tasks. Our novel Dynamic scheme, however, utilizes regional conflict information to order tasks while dynamically changing their order as tasks get allocated.

Table 4.1: Locking and Conflict Analysis for 32 Tasks

                Number of Cores Required
Task set        FFD       NFFD      GFFD      CoFFD
High util.      Failed    32        22        20
Med. util.      31        31        21        20
Low util.       23        22        14        12

Table 4.1 depicts a comparison of the number of allocated cores for different Scenario A task sets of 32 tasks using FFD, NFFD, GFFD and CoFFD on an architecture with the system parameters shown in Table 4.2. We consider two utilizations for each task: one with locking for all the regions specified by the developer (ulocked) and another without locking any of those regions (uunlocked). A task is termed high, medium or low utilization when (0.55 > ulocked ≥ 0.40), (0.40 > ulocked ≥ 0.25) or (0.25 > ulocked ≥ 0.10), respectively. The first column identifies the task set, and the remaining columns show the number of cores consumed by the task set under FFD, NFFD, GFFD and CoFFD, respectively. We observe that FFD fails to allocate the high utilization task sets, as uunlocked exceeds the utilization bound of 1 for such tasks; FFD forces regions to be unlocked while the other policies allow locking. NFFD also performs better than FFD for low utilization tasks. The table further shows that the number of cores allocated by the cache-aware schemes is significantly lower than the allocations performed by the cache-unaware schemes. As the objective is to minimize the number of cores, the two cache-aware algorithms are adapted to consider both ulocked and uunlocked during

allocation. The algorithms select one of these versions to avoid lock conflicts while ensuring that utilization constraints are met. We observe that CoFFD consistently allocates fewer cores than GFFD. Task sets composed of high utilization tasks allocate fewer cores under CoFFD with at most 3% higher system utilization than GFFD. For low utilization task sets, CoFFD allocates fewer cores and lowers the system utilization by up to 40% over GFFD. For task sets from Scenario B, the combination of Mono+CC outperforms Mono+RFFD for highly conflicted task sets. This shows the effectiveness of using the coloring mechanism at individual cores. In contrast, Mono+CC does not perform as consistently for low contention task sets. This motivates the use of a global ordering scheme like Dyn, which complements region-level coloring at individual cores. Our results show that Dyn+CC consistently performs better than Mono+RFFD. For high contention task sets, Dyn+CC achieves up to a 22% reduction in the number of cores allocated, i.e., it allocates 21 cores as opposed to 27 cores (Mono+RFFD). Even for low contention task sets, it achieves a reduction of up to 17%. Since Dyn+CC deals with a more generic problem set, it is also applicable to task sets from Scenario A. When comparing CoFFD with Dyn+CC, we observe that CoFFD performs better than Dyn+CC for high utilization tasks. That is because, for such tasks, unlocking a region or an entire task does not enable multiple tasks to be scheduled together, so the global ordering achieved by CoFFD is the deciding factor. For medium utilization tasks, however, CoFFD still cannot unlock entire tasks, while unlocking at regional granularity remains feasible, and thus Dyn+CC performs better. These observations suggest that CoFFD and Dyn+CC perform far better than naïve implementations of FFD-based task partitioning on distributed-core CMPs. Summary of contributions: This research makes the following contributions in the context of hard real-time systems with cache locking:

1. This work is the first to employ locked caches on massive multicore architectures for hard real-time systems.

2. We implement two task partitioning algorithms for Scenario A type task sets: GFFD and CoFFD. These algorithms resolve the conflicts at task level by selectively locking or unlocking tasks.

3. Our novel CoFFD algorithm (i) derives task allocations for a given number of cores resulting in a feasible schedule, (ii) enhances a coloring algorithm to deliver balanced allocation and (iii) reduces the number of cores relative to Greedy First Fit Decreasing (GFFD).

4. For Scenario B, we propose a novel mechanism to resolve conflicts at the granularity of regions. We propose a Dynamic (Dyn) ordering mechanism that adapts to the changes in the regional conflict graph induced by the allocation of tasks to cores. Dynamic ordering

consistently allows the allocation of tasks on fewer cores with both Task Allocation algorithms. Dyn+CC proves to be the best among all the combinations of Task Selection and Task Allocation schemes.

4.2 Related Work

In the past decade, there has been considerable research promoting locked caches in the context of multitasking real-time systems. Static and dynamic cache locking algorithms for instruction caches have been proposed to improve system utilization in [66, 65]. Several methods have been developed to lock program data that is hard to analyze statically [93]. Further techniques have been developed for cache locking that provide performance comparable to that obtained with scratchpad allocation [67]. Recently, cache locking has also been proposed for multicore systems that use shared L2 caches [88]. Liu et al. propose cache locking for private L1 caches while using cache partitioning for L2 caches [55]. Their focus has been upon reducing the task utilization while partitioning a task set on all the processor cores in the system. This is applicable to unscalable shared cache architectures. In contrast, our work focuses upon optimizing allocation of computational resources. These related efforts show that cache locking is a viable solution in future real-time system designs for multicores. Guan et al. propose L2 cache partitioning using cache coloring for soft real-time systems [38]. Paolieri et al. have proposed hardware cache partitioning mechanisms for multicore real-time systems with shared L2 caches [63]. However, the focus of both of these cache partitioning schemes is to pack as many real-time tasks as possible on an unscalable multicore architecture.

4.3 System Design

In this section, we describe our system architecture and the assumptions for the WCET analysis used in this study. The objective of this work is to best utilize a private cache architecture, which corresponds to the current trend toward mesh- or tile-based multicore designs. Tile-based architectures consist of a large number of tile processors (cores). Each tile consists of an in-order processor, a private L1, a private L2 cache and a router (see Figure 4.1), and acts as a node on a mesh interconnect. Recent work has added Quality-of-Service (QoS) policies to mesh interconnects [61]. We have identified these trends as the driving force for the simplification of our system. We assume an architecture that has private caches and a QoS-based interconnect. We assume that the first level of cache allows a certain number of ways of the associative cache to be locked, as shown in Figure 4.2. We also assume that the L2 caches are large enough and of sufficiently high associativity that the address space of the hard real-time tasks allocated to a core fits within the L2 cache. Thus, we assume that off-chip references occur only while accessing sensory

data, which accounts for a very small fraction of the total references. Also, these systems can have inclusive or non-inclusive L2 caches; with inclusive caches, the regions locked in L1 need to be locked in L2 as well. Our algorithms are applicable to a system considering both data and instruction caches. However, for simplicity of analysis, we assume that instruction references of hard real-time tasks are all hits at the first level of cache. We also assume that loads to lines that have not been locked in the L1 cache bypass the L1 cache (as in prior work [41]). This allows cores with lower core utilization to co-schedule non-real-time tasks along with hard real-time tasks without affecting the deterministic behavior of the latter. Such hybrid execution of application tasks has been considered in recent research [62]. Here, we analyze two scenarios:

1. A hard real-time task can only lock one cache line per set, i.e., all the locked regions of a task fit within a single cache way. So, for an 8 KB L1 cache with an associativity of two, a hard real-time task can lock up to 4 KB of cache content. We call this Scenario A.

2. A hard real-time task can lock multiple cache lines per set, i.e., conflicting locked regions are able to occupy multiple cache ways. So, for an 8 KB L1 cache with an associativity of two, a hard real-time task can lock all of the 8 KB cache space or any subset of it at cache line granularity. We call this Scenario B.

Figure 4.2: A Lock-based Architecture

We assume that all hard real-time tasks are periodic and that each task's deadline equals its period, i.e., an invocation of a task's job has to finish before its next invocation. We further assume that the system runs a scheduler per core and that each of these schedulers independently schedules the tasks allocated to its core using Earliest Deadline First (EDF) scheduling. EDF optimally schedules tasks on a uniprocessor, i.e., the utilization bound for each core is given by Σ(i=1..n) Ci/Pi ≤ 1, where Ci and Pi are the WCET and the period of the i-th task, respectively.

For the algorithms, each task needs to provide the following information:

<listlocked-sets, WCETlocked, WCETunlocked>. Here, listlocked-sets is the list of sets where the programmer intends to lock a cache line for the task, while WCETlocked and WCETunlocked are the WCETs of the task when all the lines of listlocked-sets are locked and unlocked, respectively. WCETlocked does not include the overhead of loading the locked contents of a task because it is a one-time cost incurred at system start-up. We also assume that the real-time tasks are pairwise independent; hence, these tasks do not cause any coherence traffic on the interconnect.
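A minimal sketch of this per-task descriptor is shown below; the field names are illustrative and not taken from an actual implementation.

from dataclasses import dataclass

@dataclass
class RTTask:
    locked_sets: frozenset   # list_locked-sets: cache sets the developer wants locked
    wcet_locked: float       # WCET with all listed lines locked
    wcet_unlocked: float     # WCET with none of them locked
    period: float            # deadline equals the period

    @property
    def u_locked(self) -> float:
        return self.wcet_locked / self.period

    @property
    def u_unlocked(self) -> float:
        return self.wcet_unlocked / self.period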

4.4 Task Partition Algorithms: Scenario A

4.4.1 Cache-Unaware Schemes

Static task partitioning algorithms for multicore architectures have been widely studied. Most of these approaches aim at minimizing the number of cores utilized [18] and use bin-packing schemes that consider a single utilization value per task. These algorithms for distributed systems are cache-unaware. In the following, we present two such cache-unaware schemes, namely FFD and NFFD.

First Fit Decreasing (FFD)

FFD is a commonly used algorithm for allocating tasks on distributed cores. This implementation assumes that the tasks are unlocked, i.e., we consider all tasks with a utilization of uunlocked based on WCETunlocked. The allocation routine takes a task i, the already allocated set of cores Nprocs, and a flag that decides whether the task is to be allocated in a locked or an unlocked state if a new core has to be added to Nprocs. The FFD algorithm picks tasks in decreasing order of their uunlocked and allocates them using Algorithm 2. Line 1 sorts the cores in Nprocs in decreasing order of core utilization. Lines 3-8 iterate over the cores until the task is allocated or all cores have been considered without success. A task is allocated to a core if the resulting core utilization does not exceed 1 (the utilization bound for EDF). If a task could not be allocated to any core in Nprocs, lines 9-13 add a new core to Nprocs and the task is allocated to it in an unlocked state.

ALGORITHM 2: FFD Task Allocation (baseFFD)

Input: i : task, Nprocs : processors, isLock : boolean
Output: Nprocs : number of processors
 1  Nprocs.sort(decreasing utilization);
 2  foreach j in Nprocs do
 3      if Success = false then
 4          if i.uunlocked ≤ 1 − j.u then
 5              allocate task i to core j;
 6              j.u := j.u + i.uunlocked;
 7              Success := true;
 8              break;
 9  if Success = false then
10      allocate Newproc;
11      Nprocs := Nprocs ∪ Newproc;
12      allocate task i to Newproc;
13      if isLock = true then Newproc.u := i.ulocked else Newproc.u := i.uunlocked;

Naive Locked FFD (NFFD)

We extend FFD with a simple approach to using locked caches. Tasks are defined to be locked or unlocked prior to their allocation; thus, every task has a single WCET before allocation, which is WCETlocked for a locked task and WCETunlocked otherwise. Bin packing has difficulty co-locating multiple tasks with high utilization, so any task whose utilization exceeds a certain threshold is deemed to be locked. Each of these locked tasks is allocated to a separate core, as the algorithm is cache-unaware. The algorithm then allocates the set of unlocked tasks via FFD, with Nprocs initialized to the number of cores assigned to locked tasks.
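A minimal sketch of NFFD, building on the RTTask fields above, is given below. The utilization threshold that decides which tasks are locked is an assumption for illustration; the text only states that tasks with prohibitively high unlocked utilization are locked.

def nffd(tasks, threshold: float = 0.5) -> int:
    # Lock tasks that are too heavy without locking; give each its own core.
    locked = [t for t in tasks if t.u_unlocked > threshold]
    unlocked = [t for t in tasks if t.u_unlocked <= threshold]
    cores = [t.u_locked for t in locked]               # per-core utilization
    # Allocate the remaining tasks with plain FFD over u_unlocked.
    for t in sorted(unlocked, key=lambda t: t.u_unlocked, reverse=True):
        for j, u in sorted(enumerate(cores), key=lambda e: e[1], reverse=True):
            if u + t.u_unlocked <= 1.0:                # EDF bound per core
                cores[j] += t.u_unlocked
                break
        else:
            cores.append(t.u_unlocked)                 # open a new core
    return len(cores)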

79 4.4.2 Cache-Aware Task Partitioning

We next present two cache-aware mechanisms. Our algorithms consider two values per task, WCETlocked and WCETunlocked. The listlocked-sets item is used to deduce a conflict matrix Mconf for locked tasks: a conflict among the locked sets indicates the existence of one or more common locked cache sets. An empty entry Mconf(i, j) signifies the absence of conflicts between tasks i and j, while a filled entry signifies the existence of a conflict.
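A sketch of how this conflict information can be derived and queried is given below; the helper names are illustrative. Two tasks conflict if their locked set lists intersect, and the IsAllocatable check of GFFD merely looks for a lockable way on a core whose already-placed tasks do not conflict with the candidate.

def conflicts(t1, t2) -> bool:
    # Mconf(i, j) is filled iff tasks i and j lock at least one common set.
    return bool(t1.locked_sets & t2.locked_sets)

def is_allocatable(core_ways, task) -> int:
    # core_ways: one list per lockable way, holding the tasks whose locked
    # lines were placed in that way.  Returns a way index or -1.
    for k, placed in enumerate(core_ways):
        if not any(conflicts(task, other) for other in placed):
            return k
    return -1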

ALGORITHM 3: Greedy First Fit Decreasing Heuristic (GFFD)

Input: M : set of tasks, Assoc : number of locked ways per cache, Mconf : conflict matrix
Output: Nprocs : number of processors
 1  Nprocs := 1;
 2  M.sort(decreasing ulocked);
 3  while M is not empty do
 4      Success := false;
 5      Nprocs.sort(decreasing utilization);
 6      i := M.front;
 7      foreach j in Nprocs do
 8          if k := IsAllocatable(j, i, Assoc, Mconf) ≠ −1 then
 9              if i.ulocked ≤ 1 − j.u then
10                  allocate task i to core j in the k-th way;
11                  j.u := j.u + i.ulocked;
12                  Success := true;
13                  break;
14      if Success = false then
15          Nprocs := baseFFD(i, Nprocs, true);

Greedy First Fit Decreasing (GFFD)

We first illustrate GFFD by example using the undirected conflict graph of four nodes/vertices depicted in Figure 4.3. A conflict graph in the context of task partitioning is a graph G = (V, E), where every vertex/node v ∈ V corresponds uniquely to a task and an edge (i, j) ∈ E indicates that tasks i and j are in conflict and cannot be allocated onto the same core.

80 Figure 4.3: Greedy First Fit Decreasing in Operation

The objective is to map nodes into buckets while keeping the number of buckets low. The FFD algorithm arranges nodes in a traversal order via heuristics before allocating them. In this example, the algorithm establishes an allocation order of nodes 2, 1, 0 and 3. At each step, the node in question checks whether it can be placed within any of the existing buckets; a node can be allocated to a bucket if the bucket does not contain any node that conflicts with it. In the example, node 0 gets allocated to a bucket that contains node 2, which does not conflict with 0. In case all buckets conflict, a new bucket is created, e.g., during the allocation of nodes 1 and 3. We developed a modified version of the FFD algorithm, which we call Greedy First Fit Decreasing (GFFD). Algorithm 3 presents the details. The algorithm takes a task set and the number of locked ways per cache as input. The idea is to incrementally add cores to the schedule, starting with an initial number of cores, Nprocs, of 1. Lines 3-13 proceed to allocate tasks in FFD fashion using ulocked. Line 8 uses a procedure IsAllocatable() that returns a cache way of core j holding no locked lines of tasks that conflict with the locked lines of task i (or −1 if no such way exists). In case a valid cache way is found and the allocation of the task with its locked regions passes the schedulability test, the task is allocated to the core. If, however, all the lockable cache ways of the core's L1 are in conflict or the schedulability test fails, the algorithm tries to allocate the task to another core until it runs out of cores in Nprocs. If the task remains unallocated, line 15 uses Algorithm 2 (baseFFD) to allocate the task.

Passing true as the third parameter to baseFFD forces the task to be allocated in a locked state when a new core is added to Nprocs. GFFD identifies task conflicts only after a task has been committed for allocation, even though a conflict matrix is already present. The algorithm does not have a prior notion of the number of cores available within the system. Furthermore, the order in which tasks are assigned to cores is still based purely on task utilization. We can do better: when tasks contend for cache regions, analysis of the cache conflict graph yields superior, conflict-guided allocations. Such an analysis considers tasks in a conflict-conscious order that ensures they can co-exist with each other for a given number of cores. To this end, we adapted a graph coloring approach by Chaitin [21, 20] that is widely used in register allocation and is based on the following theorem:

Chaitin’s Theorem 1 Let G be a graph and v ∈ V(G) such that deg(v) < k, where deg(v) denotes the number of edges of vertex v. A graph G is k-colorable if and only if G - v is k-colorable.

This theorem provides the basis for graph decomposition by repeatedly deleting vertices with degree less than k until either the graph is empty or only vertices with degree greater than or equal to k are left. In the latter case, the graph cannot be colored by this decomposition. However, by removing a task from the conflict graph using some heuristic, a new coloring attempt can be made for the remainder of the graph. Figure 4.4 shows how Chaitin's theorem can be used in practice. In this example, the conflict graph is the same as in the FFD example of Figure 4.3, and it shows how Chaitin's approach allocates the set of nodes to two buckets/colors. At first, the algorithm fills up a stack, removing one node at a time. A node is a viable candidate for being pushed onto the stack if and only if its degree is less than 2. When a node is removed, it reduces the degree of its neighbors in the remainder of the graph. Since all nodes can be pushed onto the stack, the graph is two-colorable (cf. Chaitin's theorem). During the following steps, nodes are popped off the stack and associated with a color/bucket. In our example, Chaitin's algorithm successfully allocates the nodes to two buckets, whereas three buckets were required by the FFD algorithm. Algorithm 4 shows the task coloring mechanism, which is responsible for finding non-conflicting tasks that can be grouped together in a given number of colors. The number of colors equals the number of locked cache ways that can be filled within a given number of cores. Lines 4-12 fill up two data structures, colorStack and spilledList. Every iteration of this loop finds a task that can be placed on either of these stacks. Line 5 searches through the list of unallocated tasks and finds the task with the lowest degree. A task with minimum degree is pushed onto colorStack if and only if its degree is less than NumOfColors. Otherwise, the algorithm finds a task using a heuristic that focuses on minimizing a metric.

Figure 4.4: Chaitin's Coloring in Operation with 2 Colors

(Nodes are pushed in the order 0, 2, 1, 3 as their degrees drop below 2, then popped in reverse order: 3 and 2 do not conflict and share one bucket, while 1 conflicts with 3 and is placed with 0 in the second bucket.)

In Algorithm 4, for example, the heuristic at line 10 minimizes the metric uunlocked/degree. The objective of this heuristic is to decrease the conflict degrees of as many tasks as possible and, at the same time, to pick a task that causes the minimum increase in the system utilization while remaining unlocked

(uunlocked). This task is then added to the spilledList. Whenever a task is removed from M, the conflict degrees of its neighbors are decremented.

Colored First Fit Decreasing (CoFFD)

Once all tasks have been distributed among the two stacks, lines 14-27 put the tasks in colorStack into different colorLists. Assigning a task from colorStack to a colorList is equivalent to allocating the task to a core, as each color corresponds to a lockable cache way. The colorLists are associated with cores in a round-robin manner, i.e., if the number of lockable cache ways per cache is equal to two and the number of cores is three, then there are a total of six colorLists. The first, second and third colorLists are associated with the first cache way on cores one, two and three, respectively. The fourth, fifth and sixth colorLists are associated with the second cache way on cores one, two and three. Lines 15-16 pop a task from the colorStack and re-populate the conflict edges in the graph with the tasks that have already been colored. The algorithm then loops through all the colors until it finds a color that has not been assigned to any of the task's neighbors in the graph. Line 20 picks the core associated with that color. For a task to be assigned a color, the task has to pass the EDF schedulability test. Furthermore, the current utilization of the core has to be less than aveCoreUtil, which has been computed at line 13. These conditions prevent the colorLists from becoming unbalanced. Chaitin's algorithm in its purest form is
• unaware of the tasks in the spilledList and

• unable to deliver a balanced colorList.

ALGORITHM 4: Task Coloring Algorithm
Input: M : set of tasks, NumOfColors : number of cores × number of locked ways per cache, Mconf : conflict matrix
Output: colorList, spilledList, rejectedTaskList
 1  colorStack := empty;
 2  spilledList := empty;
 3  colorList := empty;
 4  while M is not empty do
 5      t := lowest-degree task, found by linear search of M;
 6      if t.degree < NumOfColors then
 7          push t onto colorStack;
 8          remove t from M and Mconf;
 9      else
10          t := task with minimum (uunlocked/degree);
11          push t onto spilledList;
12          remove t from M and Mconf;
13  aveCoreUtil := colorStack.u / NumOfColors;
14  while colorStack is not empty do
15      t := pop colorStack;
16      repopulate Mconf;
17      curColor := 0;
18      for curColor = 0 → NumOfColors − 1 do
19          if none of the neighbors has this color then
20              curCore := curColor mod numberOfCores;
21              if curCore.u < aveCoreUtil and curCore.u + t.u ≤ 1 then
22                  t.color := curColor;
23                  colorList[curColor] := t;
24                  add t.u to curCore.u;
25                  break;
26      if t.color is not a valid color then
27          push t onto rejectedTaskList;

For example, if none of the tasks conflict, then all tasks could be given the same color. The conditions at line 21 therefore allow the tasks to be evenly distributed across cores. If either of the conditions fails, the algorithm moves on to the next color until all colors have been tried. If a task cannot be assigned a valid color, it is moved to the rejectedTaskList.

ALGORITHM 5: Colored First Fit Decreasing (CoFFD)— Uncolored Lists

Input: rejectedTaskList, Assoc : number of locked ways per cache, Mconf : conflict matrix, Nprocs : number of cores
 1  rejectedTaskList.sort(decreasing ulocked);
 2  foreach i in rejectedTaskList do
 3      Nprocs.sort(decreasing u);
 4      Success := false;
 5      foreach j in Nprocs do
 6          foreach way k in Assoc do
 7              if IsAllocatable(j, i, Assoc, Mconf) ≠ −1 then
 8                  allocate task i to core j in the k-th way;
 9                  j.u := j.u + i.ulocked;
10                  Success := true;
11                  goto 2 (next task);
12      if Success = false then
13          put task i on spilledList;
14  spilledList.sort(decreasing uunlocked);
15  foreach i in spilledList do
16      if Nprocs ≠ baseFFD(i, Nprocs, false) then
17          return Failed Allocation;
18  return Successful Allocation;

The task coloring stage outputs partially filled cores and lists of tasks in rejectedTaskList and spilledList. These are subsequently used by the second part of the allocation, shown in Algorithm 5. Algorithm 5 first tries to allocate the tasks from the rejectedTaskList, sorting them in decreasing order of their ulocked. Each iteration of the loop starting at line 2 then picks a task in order and tries to allocate it in FFD fashion on Nprocs. If a

task cannot be allocated to any core, it is moved to the spilledList. Once the rejectedTaskList is empty, all the tasks in the spilledList are allocated using baseFFD. If all the tasks in the spilledList can be allocated, the task set is deemed schedulable on the given number of Nprocs cores.

Otherwise, Nprocs is incremented by the caller of CoFFD. This process repeats until a schedule has been found.

Figure 4.5: Task Coloring in Operation

(Panels: (a) the conflict graph, (b) the colorLists, (c) the allocation onto Core 0 and Core 1 with the rejected task, (d)-(f) the handling of the rejected task, including its unlocking.)

Figure 4.5 depicts a step-by-step working example:

(a) Tasks are grouped in a conflict graph. Our example has five tasks with ulocked utilizations of 0.5, 0.3, 0.4, 0.2 and 0.2. Each task conflicts with its neighboring task. Therefore, tasks form a chain of conflicts in the graph.

(b) Our graph coloring algorithm is applied to split the tasks into colorLists. The task set is split into two colors, with adjacent tasks of the chain alternating between the two colorLists.

(c) We assume a multicore system with single-way locking in the L1 cache. Since the aggregate utilization is 1.6, Nprocs is initialized with the ceiling of the system utilization, which is 2. The tasks in each colorList are sorted in decreasing order of ulocked, and the cores are filled in a round-robin fashion. The green colorList fits within core zero. Tasks in the red colorList are allocated to core one: the tasks with higher utilization (0.5 and 0.4) are allocated to core one, while the task with utilization 0.2 is moved to the rejectedTaskList as adding it would exceed the utilization bound of 1.

(d) The algorithm now tries to allocate the task from the rejectedTaskList to core zero. It fails due to a conflict with an already allocated task and the availability of only one cache way for locking.

(e) At this stage, the task is moved to the spilledList and its utilization is increased to uunlocked, which changes it from 0.2 to 0.4.

(f) The task is allocated on core 0 with this inflated utilization because such allocation does not violate the utilization bound on core 0.

Applicability to task sets from Scenario B: The algorithms discussed so far assume that the locked cache regions of a task fit within a single cache way. Task sets from Scenario B have intra-task conflicts among locked regions. The following are two techniques for applying the NFFD, GFFD and CoFFD policies to such task sets:

1. Lock only one region out of all the conflicting regions: This could significantly increase WCETlocked, which may render a task unschedulable.

2. Retain all the locked regions of a task that fit within the whole cache while treating all the conflicting regions as one region that spans from the lowest indexed set to the highest indexed set: Each core's cache can then be treated as a direct-mapped cache, where a region is assumed to be spread across all the sets. This effectively partitions the cache horizontally. This solution has no notion of cache associativity and leads to inefficient task allocations.

As stated above, both techniques are highly inefficient. This motivates us to develop algorithms specifically for Scenario B. We present them next.

87 4.5 Task Partition Algorithms: Scenario B

The algorithms presented in this section assume that tasks can have intra-task conflicts and that a task can use multiple cache ways, as described in Section 4.3. So far, we have assumed that conflicting tasks can only share a resource either by locking all specified regions or by keeping all of them unlocked. This is useful when locked regions should remain transparent to the programmer. We can improve on our results if programmers can accurately estimate the upper bound on the number of references to each locked cache line (e.g., based on upper loop bounds). This can be achieved through static analysis of tasks, i.e., by specifying the number of references (Nrefs) for each locked cache region in listlocked-sets. We can then compute the reference frequency Rf of a locked region for task t as Rf = Nrefs / Periodt. This becomes the basis for resolving conflicts at a finer granularity. However, the prior solutions do not provide enough flexibility, as they resolve conflicts at the task level. The task partitioning algorithm for Scenario B is therefore split into two phases: task allocation and task selection.
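The per-region reference frequency can be sketched as below (field names are illustrative); it is simply the statically determined reference bound divided by the period of the owning task.

from dataclasses import dataclass

@dataclass
class LockedRegion:
    sets: frozenset   # cache sets occupied by the region
    n_refs: int       # upper bound on references to the region's lines (Nrefs)

def reference_frequency(region: LockedRegion, period: float) -> float:
    # Rf = Nrefs / Period of the task that owns the region.
    return region.n_refs / period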

4.5.1 Task Allocation

The task allocation mechanism resolves conflicts between the regions of a task t and those of the tasks that have already been allocated onto the core. We use two heuristics for resolving regional conflicts, namely:

1. Regional First Fit Decreasing (RFFD): RFFD allocates a list of regions to a given cache associativity. The regions are sorted in decreasing order of their Rf and picked for allocation in that order. The algorithm then attempts to place a region into each cache way in turn until it finds one where the region does not conflict with any already allocated region. If the regional allocation fails, the region is considered unlocked. Once all regions have been considered for allocation, a dilated WCET is computed based upon the unallocated regions. A task with dilated WCET is deemed allocatable if the utilization bound of 1 is not exceeded; otherwise, the task allocation fails on that core. Note that whenever a new task is considered for allocation, all regions of the previously allocated tasks are reallocated along with the regions of the new task (a minimal sketch of this region-to-way placement follows this list).

2. Chaitin's Coloring algorithm (CC): Chaitin's algorithm is used in its original form for region allocation, using a conflict graph over regions as depicted in Figure 4.4. The number of colors is the cache associativity. All regions that have not been colored remain unallocated (unlocked). A task is deemed allocated if the utilization bound of 1 is not exceeded; otherwise, task allocation fails on that core. As with RFFD, the allocation of regions is not permanent: regions are reallocated whenever another task is considered for allocation on the core.
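The sketch referenced in item 1 above is given here. It assumes each region object carries its cache sets and its reference frequency Rf; the dilation model (unlocked references cost the lower-level cache latency) follows the probabilistic utilization definition used later in this section, and the code is an illustration rather than the evaluated implementation.

def rffd_allocate(regions, assoc: int, lower_latency: float) -> float:
    # Place regions first-fit into the lockable ways in decreasing order of Rf;
    # regions that cannot be placed remain unlocked and dilate the utilization.
    ways = [set() for _ in range(assoc)]        # cache sets locked per way
    unlocked = []
    for r in sorted(regions, key=lambda r: r.rf, reverse=True):
        for w in ways:
            if not (w & r.sets):                # no conflict in this way
                w |= r.sets
                break
        else:
            unlocked.append(r)
    # Utilization added on top of u_locked; the task is allocatable on the
    # core only if the resulting core utilization stays at or below 1.
    return sum(r.rf for r in unlocked) * lower_latency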

4.5.2 Task Selection

For task selection, we use the following two heuristics:

1. Monotonic Order (Mono): This uses a task selection procedure similar to that of GFFD, where tasks are picked in decreasing order of WCETlocked. Each task is then considered for allocation on every core until a core is found where it can be allocated alongside the tasks already placed there. Otherwise, a new core is added to the list of cores, which is then considered during the allocation of the current and subsequent tasks.

2. Dynamic Order (Dyn): Monotonic ordering using Utillocked and Utilunlocked is oblivious of conflicts. Here, we present a scheme that orders tasks according to their Utilprobabilistic and dynamically adjusts to changes in the regional conflict graph caused by the allocation of tasks to cores. In order to obtain Utilprobabilistic, we first generate a directed graph of conflicting regions. A conflicting region X is considered to be replaced by another region Y if Y's Rf is greater than that of X. The corresponding edge in the conflict graph runs from Y to X, i.e., it adds an incoming edge to region X and an outgoing edge to region Y. If a majority of its conflicting regions replace a region, then that region is considered unlocked for the purpose of calculating Utilprobabilistic at the task selection stage. Utilprobabilistic is obtained by the following equation:

Utilprobabilistic = Utillocked + (sum of Rf over all unlocked regions) × latencylower cache.

The task with the highest Utilprobabilistic is picked for allocation. Inspired by Chaitin's coloring algorithm, every task allocation can be used to dynamically change the state of the graph. We observe that, with our greedy allocation process, when a task allocation fails the remaining tasks get allocated to the newly created core more often than to previously existing ones. Hence, the directed graph should contain only the conflicts pertaining to the unallocated regions and the regions allocated to the newly created core. Algorithm 6 shows the dynamic process. The Utilprobabilistic of the tasks is initialized based upon the initial graph containing all the regions. As stated earlier, the task selector picks the task t that has the largest Utilprobabilistic through a linear search over the list of tasks. Our allocation algorithm is then called for t, which changes the state of the graph as explained below. The algorithm begins by declaring UremTaskFromGraph, which holds those tasks that have been allocated but have not yet been removed from Mconf. The loop starting at line 2 iterates until all tasks have been allocated to a core in Nprocs; on each iteration, one task is allocated. Line 3 picks the task t with the highest Utilprobabilistic using a linear search over the list of tasks M. Lines 4 through 8 try to allocate task t to any one of the Nprocs cores. If TaskAllocation(t) succeeds, the task is allocated to core i and added to UremTaskFromGraph, the list of allocated tasks that have not been removed from the directed graph Mconf; the algorithm then goes back to line 2 to allocate another task. In case TaskAllocation(t) fails on all of the Nprocs cores, lines 9 and 10 add a new core to Nprocs and allocate task t to the new core. At this point, UremTaskFromGraph contains all tasks allocated since the last addition of a new core. The loop starting at line 11 iterates over each of those tasks using an iterator rt while removing them from UremTaskFromGraph. The nested loop beginning at line 12 populates rr on every iteration from the list of regions owned by rt. In line 13, one region at a time, denoted cr, is selected among the regions that conflict with rr. For clarification, rr and cr are regions removed from and resident within the directed graph, respectively. Lines 15 through 16 remove these edges by decrementing the NConfs values of the corresponding regions, and if the removed edge is an incoming edge for cr, line 18 also decrements its incoming edge count. The ratio of the number of incoming edges to the number of edges gives an approximate indication of whether a locked region will remain locked when allocated. Lines 14 and 19 compute this ratio for cr before and after the removal of an edge as priorProbability and currentProbability,

respectively. If this ratio is > 0.5, the region is assumed to be unlocked for the purpose of calculating Utilprobabilistic. Thus, if the ratio changes from ≤ 0.5 to > 0.5, the increase in utilization due to unlocking is added to the Utilprobabilistic of the task that owns cr; if the ratio changes from > 0.5 to ≤ 0.5, Utilprobabilistic is reduced accordingly. Lines 20 through 23 perform these updates. Line 24 adds the newly allocated task t to UremTaskFromGraph, which at this point holds only that one task. Control then returns to line 2, where the next task is selected based upon these changes in Utilprobabilistic. Once all the tasks have been allocated, Nprocs is returned.

Solutions for Scenario A can be used in several ways. If tasks can meet their deadlines only under locking with WCETlocked, then these algorithms will allocate them with WCETlocked.

If WCETlocked and WCETunlocked are provided, then both the fully locked and fully unlocked scenarios can be assessed by the algorithms. Dealing with execution times at this coarser level is attractive to developers, as it allows them to select lockable lines with only rough estimates of the access patterns. Solutions for Scenario B tackle the more generic case where a task may lock multiple regions that require multiple cache ways to accommodate its locked address space. These algorithms also utilize memory access frequencies that static analysis tools, such as [70], are capable of deducing. This allows us to partition tasks while resolving conflicts among regions within a core. These solutions are also applicable to Scenario A, and we present our experimental observations in the following sections.

ALGORITHM 6: Task Selection: Probabilistic Utilization
Input: M: set of tasks; Assoc: number of lockable ways per cache; t: allocated task; M_conf: directed conflict graph of regions; InEdges: incoming edges per region; NConfs: number of conflicts per region
Output: N_procs: number of processors

 1  declare UremTasksFromGraph: unremoved (allocated) tasks from M_conf
 2  while M is not empty do
 3      t := linear search to pick the task with the highest Uprob
 4      foreach core i in N_procs do
 5          if TaskAllocation(t) is successful then
 6              allocate t → i
 7              UremTasksFromGraph := UremTasksFromGraph ∪ {t}
 8              goto line 2
            end
        end
 9      N_procs := N_procs ∪ {new core}
10      allocate t → new core
11      foreach task rt in UremTasksFromGraph do
12          foreach region rr in rt.lockedRegions do
13              foreach region cr in M_conf that conflicts with rr do
14                  priorProbability := InEdges[cr] / NConfs[cr]
15                  NConfs[cr] := NConfs[cr] - 1
16                  NConfs[rr] := NConfs[rr] - 1
17                  if M_conf[cr][rr] == -1 then
18                      InEdges[cr] := InEdges[cr] - 1
                    end
19                  currentProbability := InEdges[cr] / NConfs[cr]
20                  if priorProbability ≤ 0.5 ∧ currentProbability > 0.5 then
21                      cr.task.Uprob := cr.task.Uprob + cr.Uunlocked - cr.Ulocked
                    end
22                  if priorProbability > 0.5 ∧ currentProbability ≤ 0.5 then
23                      cr.task.Uprob := cr.task.Uprob - cr.Uunlocked + cr.Ulocked
                    end
                end
            end
        end
24      UremTasksFromGraph := UremTasksFromGraph ∪ {t}
    end
25  return N_procs
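To make the probability bookkeeping concrete, the following Python sketch mirrors lines 14 through 23 of Algorithm 6 for a single removed region rr. It is only an illustration of the update rule under stated assumptions: the dictionaries in_edges and n_confs, the owner map, and the conflicts/incoming helpers are hypothetical stand-ins for the M_conf data structures, not the dissertation's implementation.

# Sketch of the Uprob update performed when region rr of an allocated task is
# removed from the conflict graph (cf. Algorithm 6, lines 14-23).
# in_edges[r]     : incoming conflict edges of region r   (InEdges)
# n_confs[r]      : total conflict edges of region r      (NConfs)
# u_prob[t]       : probabilistic utilization of task t   (Uprob)
# owner[r]        : task that owns region r
# conflicts(rr)   : regions still in the graph that conflict with rr (assumed helper)
# incoming(cr, rr): True if the edge between cr and rr is incoming at cr (assumed helper)
def remove_region(rr, conflicts, incoming, in_edges, n_confs,
                  u_prob, owner, u_unlocked, u_locked):
    for cr in conflicts(rr):
        prior = in_edges[cr] / n_confs[cr]                            # line 14
        n_confs[cr] -= 1                                              # line 15
        n_confs[rr] -= 1                                              # line 16
        if incoming(cr, rr):                                          # line 17
            in_edges[cr] -= 1                                         # line 18
        current = in_edges[cr] / n_confs[cr] if n_confs[cr] else 0.0  # line 19 (zero guard added)
        task = owner[cr]
        if prior <= 0.5 < current:      # cr now likely to remain unlocked
            u_prob[task] += u_unlocked[cr] - u_locked[cr]             # lines 20-21
        elif prior > 0.5 >= current:    # cr now likely to remain locked
            u_prob[task] -= u_unlocked[cr] - u_locked[cr]             # lines 22-23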


4.6 Algorithmic Complexity

Bin packing is known to be NP-hard, and any known optimal solution is exponential in complexity. Besides experimental evaluations, it is therefore important to assess the complexity of the sub-optimal, heuristic approaches to judge their scalability in terms of the number of tasks and cores. In the following, the algorithmic complexities of GFFD and CoFFD are assessed. For the purpose of complexity analysis, let the number of tasks be X and the number of cores be Y. Let t be the task to be allocated next.

GFFD: The outer loop in Algorithm 3 iterates over all tasks. The inner loop over lines 8-13 iterates over all cores. The function IsAllocatable iterates over the task-to-task conflict set,

M_conf, which is bounded by the number of tasks, to detect if t conflicts with any of the tasks already allocated to a core, i.e., IsAllocatable has an algorithmic complexity of O(X). Thus, the combined algorithmic complexity of GFFD is O(YX²).

CoFFD: CoFFD consists of Algorithms 14 and 18. The former colors the tasks while allocating them to cores and has two loops. The first loop, between lines 4 and 12, iterates over all tasks. The nested computations of the linear search at line 5, the reduction of the number of conflicts for tasks conflicting with t at lines 8 and 12, and the linear search at line 10 are each bounded by the number of tasks, i.e., they have an algorithmic complexity of O(X), for a combined complexity of O(X²) for the first loop. The second loop, between lines 14 and 27, iterates over all tasks while pushing them onto a stack. The nested loop within iterates over the set of colors, which is bounded by the number of cores, Y. The nested conditional at line 19 iterates over the set of neighboring nodes in the repopulated graph, whose cardinality is bounded by the number of tasks, X. This implies an algorithmic complexity of O(YX²) for the second loop, which dominates the complexity of the first loop, i.e., it is the overall algorithmic complexity of Algorithm 14. The algorithmic complexity of Algorithm 18, sequentially invoked next, is O(YX²) following the same argument as for GFFD, since their algorithmic structures are equivalent in terms of loop iterators, i.e., the rejected task list is bounded by the number of cores. The loop iterating over the spilled list is bounded by the number of tasks, but its complexity is dominated by the previous loop. Thus, the algorithmic complexity of CoFFD is O(YX²). If the algorithm fails to allocate tasks to a given number of cores, it is repeated with an incremented number of cores, which increases the complexity of CoFFD to O(Y²X²).

Mono+RFFD: Monotonic task selection requires that tasks be sorted. Heap-sorting X tasks has a complexity of O(X log X). Then each task is picked for allocation on each available core, leading to a complexity of O(YX) for monotonic task selection. Now, if the

number of regions is R and the cache associativity is A, then RFFD's algorithmic complexity is O(AR²). That is because the complexity of RFFD can be computed in the same way as that of GFFD: instead of allocating tasks onto a given number of cores, regions are allocated over a given cache associativity. As the associativity is a constant, RFFD's complexity can be stated as O(R²). Since the RFFD allocation is nested within the Mono selection, the overall complexity of Mono+RFFD is O(YXR²). With the number of cores being much smaller than the number of tasks and regions, one can approximate this complexity as O(XR²).

Dyn+CC: In order to determine the algorithmic complexity of this algorithm, we refer to Algorithm 25. The outermost loop at line 2 iterates over all tasks, introducing a complexity of O(X). Combined with the linear search at line 3, this gives O(X²). The nested loop at line 4 iterates over each core with Chaitin's-coloring-based region allocation. The complexity of Chaitin-based region allocation can be determined the same way it was determined for Chaitin-based task allocation: instead of tasks, we are here allocating regions to a given number of cache ways. The complexity of Chaitin-based task allocation is O(X²); similarly, it is O(R²) for region allocation. Combining the complexities of the two outer loops leads to an algorithmic complexity of O(YXR²), which can be approximated as O(XR²) since the number of cores is much smaller than the number of regions or tasks. Now we determine the complexity of the removal of regions from the conflict graph. Line 11 iterates over the set of tasks in UremTasksFromGraph. One might deduce that this delivers a complexity of O(X²) when combined with the loop at line 2. However, this loop only executes X times in total, because each task is removed from the graph only once: if a task is allocated in the loop at line 4, the loop at line 11 is not executed. Thus, the combined complexity of the iterations at lines 2 and 11 is O(X). Combined with the complexity introduced by the nested loops at lines 12 and 13, the overall complexity of region removal becomes O(XR²), matching the algorithmic complexity of Chaitin-based region allocation. Hence, the overall algorithmic complexity of Dyn+CC is O(XR²).

The algorithmic complexities of Dyn+CC and Mono+RFFD are the same and are dominated by the number of regions. CoFFD has a higher complexity than GFFD; the additional term is the number of cores, which is not the dominant component.
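The O(YX²) bound for GFFD can be read off its loop structure. The sketch below is an illustrative Python rendering of that structure, not the dissertation's code; the Task tuple, the conflicts predicate, and the unit-capacity check are assumptions made for the example.

from collections import namedtuple

# Illustrative rendering of GFFD's loop structure (not the dissertation's code).
# X tasks * (up to Y cores) * O(X) feasibility check per core => O(Y * X^2).
Task = namedtuple("Task", "name util")

def gffd(tasks, conflicts):
    """tasks: iterable of Task; conflicts(a, b): True if a and b have a locked-region conflict."""
    cores = []                                        # each entry: list of tasks on that core
    for task in sorted(tasks, key=lambda t: t.util, reverse=True):      # X iterations, decreasing util
        for core_tasks in cores:                                        # up to Y iterations
            fits = sum(t.util for t in core_tasks) + task.util <= 1.0   # O(X) utilization check
            no_conflict = all(not conflicts(task, t) for t in core_tasks)  # O(X) IsAllocatable scan
            if fits and no_conflict:
                core_tasks.append(task)
                break
        else:                                         # no existing core accepted the task
            cores.append([task])
    return len(cores)

Replacing the per-core task scan with region-level checks over cache ways gives the analogous O(AR²), i.e., O(R²), bound quoted above for RFFD.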

4.7 Task set Generation

Due to the unavailability of a full-blown real-time application for massive multicore architectures, we decided to utilize synthetic task sets in our experiments. This allows us to vary task set parameters such as task utilization, size of the locked regions, number of tasks, and number of regions per task, which in turn exercises corner cases of our algorithms. For Scenario B, as it allows larger numbers of locked regions, we also focus on the generation of denser conflict graphs.

We assume that static analysis tools, such as [70], deliver WCET_locked, WCET_unlocked and

R_f; deriving these values is beyond the scope of this work. Here, we explain the procedure for generating benchmarks for Scenario B, as it is less restrictive in its constraints. The following steps are used to automate the pseudo-random benchmark generation.

1. We use Task Graphs For Free (TGFF) [31] to generate a conflict graph with a given number of regions (200) and with a randomly chosen number of conflicts per region, expressed as a randomly generated proportion of the total number of regions. (For high conflict ratios, the upper bound is 0.9 and the lower bound varies from 0.1 to 0.5; for low conflict ratios, the lower bound is 0.1 and the upper bound varies from 0.2 to 0.5.)

2. Randomly generate a number of accesses for each region within a given range of accesses (50-200).

3. Generate tasks and randomly pick a number of regions (2-8) for each until all the regions have been allocated to a task. The last task generated may have a lower number of regions than the lower bound of two regions. The lockable associativity of the L1 cache is assumed to be four.

4. The total number of references to locked regions is derived by aggregating the number of references incurred within the locked regions of the task. Since the programmer will lock the regions in L1 (highest utilization benefit), we assume that these locked lines account for 80% of the total data loads. Out of the remaining 20%, we assume 18% are hits in the L2 cache and 2% are references to sensory data that go off chip. We randomly choose 6-9 instructions per load, which lets us infer the number of instruction fetches that incur L1 cache hits (see Section 4.3). These assumptions allow us to derive a WCET_locked for a task. The processor cores are assumed to feature an in-order 3-stage pipeline, with each instruction taking one cycle to execute, except branch instructions, which incur a penalty of three cycles. The L1, L2 and memory access latencies are assumed to be 1, 10 and 100 cycles, respectively. (A sketch of this WCET_locked derivation follows the list below.)

5. To derive WCET_unlocked, we assume unlocked regions to hit in the L2 cache. If two locked regions are accessed on two different paths, then the increase in WCET is due to just one region (the one that dominates the references), not both. Thus, we randomly select tasks to exhibit such behavior. This also results in varied increases in execution time between WCET_locked and WCET_unlocked across tasks.

6. Next, each task i is assigned a period to group the tasks into the following utilization categories: medium-high utilization (0.55 > U_locked_i > 0.30) and medium-low utilization (0.30 > U_locked_i > 0.1).

7. We assume that the tasks do not have any inter-task dependencies.

8. We assume task utilizations to be equal to a task’s density. In other words, a task’s deadline is equal to its period.
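As a concrete illustration of step 4 (referenced there), the Python sketch below composes a WCET_locked estimate from the stated latencies, hit fractions, and instruction-per-load range. The function and constant names are illustrative choices, branch penalties are ignored for brevity, and this is not the analysis tool of [70].

import random

# Assumptions from step 4: L1/L2/memory latencies of 1/10/100 cycles, locked L1
# lines absorbing 80% of data loads, 18% L2 hits, 2% off-chip references, and
# 6-9 instructions per load (branch penalties ignored in this sketch).
L1_LAT, L2_LAT, MEM_LAT = 1, 10, 100
LOCKED_FRAC, L2_FRAC, OFFCHIP_FRAC = 0.80, 0.18, 0.02

def wcet_locked_estimate(locked_refs, rng=random):
    """Rough WCET_locked (cycles) from the number of references that fall into
    a task's locked regions. Illustrative only."""
    total_loads = locked_refs / LOCKED_FRAC
    instructions = total_loads * rng.randint(6, 9)           # instruction fetches hit in L1
    stall_cycles = (total_loads * L2_FRAC * L2_LAT           # unlocked loads hitting L2
                    + total_loads * OFFCHIP_FRAC * MEM_LAT)  # sensory data going off chip
    return instructions * L1_LAT + stall_cycles              # one cycle per in-order instruction

# A period can then be picked to hit a target utilization band (step 6),
# e.g. period = wcet_locked_estimate(10_000) / 0.4 for a task near U_locked = 0.4.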

The generation of benchmarks for Scenario A differs from Scenario B in the following aspects:

1. First, a given number of tasks are generated. Then, each of them is given a number of randomly generated locked cache regions instead of generating a conflict graph. These locked cache regions are generated such that there is no intra-task conflict. In order to generate memory regions, we assume the cache architecture shown in Table 4.2. The table also displays the characteristics of the tasks and locked regions generated. The lockable associativity of the cache in Scenario A is lower than that of Scenario B.

2. In Scenario A we unlock an entire task, while in Scenario B we unlock a region. In order to observe the performance of task sets that have only high utilizations, we partition the utilization range into: high (0.55 > U_locked > 0.40), medium (0.40 > U_locked > 0.25), and low (0.25 > U_locked > 0.1).

Table 4.2: System Parameters

Parameter                        Value
Cache line size                  32 B
L1 cache size / associativity    8 KB / 2-way
Lockable associativity           1/2
Locked regions per task          1-4
Sets locked by a task            8-114 out of 128
Size of locked regions           8-57 sets
Max. size of task sets           42
Total tasks generated            126
Min. locked regions by a task    1

4.8 Evaluation

This section first presents the improvement of cache-aware schemes over cache-unaware schemes for task sets from Scenario A. Then it compares the performance of GFFD and CoFFD for task

sets of Scenario A. This is followed by an evaluation of the different combinations of task selection and task allocation (Mono+RFFD, Mono+CC, Dyn+RFFD, Dyn+CC) on task sets of Scenario B. Then we compare the performance of the best solutions from both scenarios when applied to task sets of Scenario A. This assesses the applicability of these algorithms.

4.8.1 Scenario A

We present our experimental results for a system that supports single locked cache ways. Such a scheme is also applicable when considering horizontal cache partitioning, where all the lockable ways in each set are dedicated to a task.

Table 4.3: Allocated Cores for Cache-aware & Cache-unaware Schemes

Number      High Util.          Med. Util.          Low Util.
of Tasks    Unaware   Aware     Unaware   Aware     Unaware   Aware
4           4         3         4         2         3         2
8           8         5         8         4         5         3
12          12        8         12        5         8         4
16          16        10        16        8         12        6
20          20        13        20        11        16        8
24          24        15        23        15        19        10
28          28        19        27        19        21        11
32          32        20        31        21        22        12
36          36        21        35        22        23        15
42          42        25        41        24        24        17

Cache-unaware vs. Cache-aware: First, we compare the cache-unaware schemes (FFD, NFFD) against the cache-aware ones (GFFD, CoFFD). Table 4.3 shows the best allocations produced by schemes within the two categories, i.e., NFFD (cache-unaware) and CoFFD (cache-aware). On average, the number of cores used by the cache-aware schemes is 40% lower than that of contemporary allocation schemes applicable to distributed-core mechanisms. We also observe that the contemporary FFD fails to allocate high utilization task sets, and it performs worse than NFFD for low utilization task sets, as shown earlier in Table 4.1.

Allocations while retaining locked state: Table 4.4 depicts the results of our algorithms when tasks are allocated in the locked state, i.e., with an execution time of WCET_locked. The first column shows the number of tasks in the task set. The second and third columns show the number of cores allocated by GFFD and CoFFD, respectively, when a task set is composed of high utilization tasks only. The fourth and fifth columns represent the same for medium utilization tasks, and the sixth and seventh columns for lower utilization tasks. Lower core allocations are depicted in bold font.

Table 4.4: Allocated Cores for CoFFD & GFFD: All Tasks Locked

Number      High Util.          Med. Util.          Low Util.
of Tasks    GFFD    CoFFD       GFFD    CoFFD       GFFD    CoFFD
4           3       3           3       2           3       2
8           6       5           5       4           4       4
12          9       8           6       5           5       5
16          11      10          9       8           8       8
20          13      13          12      11          12      11
24          16      15          16      15          16      15
28          20      19          20      19          20      19
32          22      20          22      21          22      21
36          24      21          24      22          23      22
42          27      25          25      24          24      23

In all cases, CoFFD results in fewer cores allocated than GFFD, especially as the number of tasks increases. As more tasks are added to the system, the conflict graph becomes denser. CoFFD avoids conflicts strategically due to its coloring scheme, while the greedy scheme results in a less conflict-conscious allocation.

Table 4.5: CoFFD vs GFFD: Selected Tasks Unlocked

Number      GFFD      CoFFD     GFFD      CoFFD     Util. Decreased
of Tasks    Cores     Cores     Util.     Util.     by CoFFD
4           2         2         1.48      0.88      40.54%
8           3         3         2.05      2.027     0.88%
12          5         4         3.77      3.06      18.83%
16          7         6         5.07      4.13      18.54%
20          9         8         7.33      5.86      19.64%
24          11        10        8.6       7.04      18.13%
28          12        11        10.2      8.65      15.19%
32          14        12        11.57     9.7       16.16%
36          15        15        12.67     10.27     18.94%
42          17        17        14.04     11.87     20.37%

Allocations with all or none: This experiment allows allocation of a task either with all of its regions locked or with all of them unlocked. After a locked allocation with WCET_locked is attempted, the algorithms can fall back to an unlocked allocation with WCET_unlocked for a given task in case conflicts have prevented the allocation on a given core. Table 4.5 depicts the results, with the best results in bold face. The first column shows the number of tasks in the task set. The second and third columns show the number of cores allocated by GFFD and CoFFD, respectively. Sets with high or medium utilization tasks result in similar allocations; this is because it is difficult for the higher utilization tasks to be allocated under the inflated execution budget of WCET_unlocked. However, tasks with lower utilizations can be allocated with WCET_unlocked. The fourth and fifth columns depict the system utilization delivered under the allocations of the two algorithms. The last column shows the decrease in system utilization achieved by CoFFD over GFFD. The results indicate that CoFFD beats GFFD not only in terms of allocating fewer cores but also in improving system utilization by over 18% for task sets with large numbers of tasks. This is because GFFD inflates the execution budget of tasks that cannot be allocated to cores under locking. In addition, conflict analysis prior to allocation allows the algorithm to apply heuristics to reduce the number of tasks that remain unlocked. The results of CoFFD are due to two combined heuristics for selecting spilled tasks.

Heuristic 1 selects the task with the least WCET_unlocked / degreeOfConflicts² value, which emphasizes the task's degree of conflicts. This prevents the number of cores from being increased when conflict-free placements are still feasible. Algorithmically, CoFFD avoids spills of tasks onto the stack (see Algorithm 18).

Heuristic 2 selects the task with the least WCET_unlocked value. Of the two heuristics, CoFFD selects the one that results in the allocation of fewer cores. For example, most task sets in Table 4.5 resulted in the allocation of fewer cores under Heuristic 1, but the last task set would have resulted in the allocation of 18 cores, whereas Heuristic 2 reduced this allocation to 17.
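A minimal sketch of the two spill-selection rankings, assuming each task exposes wcet_unlocked and degree_of_conflicts attributes (illustrative names, not the dissertation's data structures):

# Two candidate rankings for choosing which task to unlock/spill next.
def heuristic1_key(task):
    # Least WCET_unlocked / degree^2: prefers spilling highly conflicted tasks
    # whose unlocked execution budget remains cheap (emphasizes the degree).
    return task.wcet_unlocked / (max(task.degree_of_conflicts, 1) ** 2)

def heuristic2_key(task):
    # Least WCET_unlocked: prefers spilling the cheapest task to unlock.
    return task.wcet_unlocked

def pick_spill_candidate(tasks, key):
    return min(tasks, key=key)

Under this reading, CoFFD would run its allocation once under each ranking and keep whichever result uses fewer cores, which is how the 18-core versus 17-core outcome for the last task set in Table 4.5 arises.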

4.8.2 Scenario B

Next we present results of our algorithms that statically partition tasks for task sets of Scenario B. For Scenario B, we generated a large set of benchmarks that vary with regard to utilization (medium-high, medium-low and mixed) and density of conflicts (high and low). A total of 1200 experiments were conducted. For each conflict ratio and utilization range, 10 benchmarks were created with different randomization seed values.

Mono+RFFD vs. Mono+CC: Table 4.6 shows the impact of CC over RFFD on high conflict task sets. The first column shows the conflict ratio range of the high conflict benchmarks, and the second column indicates the performance category: due to the large set of results, we only present the best and worst cases for CC over RFFD along with the average number of cores used per benchmark. The third, fourth and fifth columns show the results for benchmarks of high-medium, low-medium and mixed utilization ranges, respectively. Each of these columns has two sub-columns that depict the results for Mono+RFFD and Mono+CC for each utilization range and case. The results show that Mono+CC allocates the task sets to fewer or a similar number of cores compared to Mono+RFFD for all the high-conflict benchmarks. This is primarily because the conflicts are high enough that resolving them locally at the level of a core proves to be useful.

Table 4.6: Mono+RFFD vs. Mono+CC for High Conflict: Number of cores allocated

Conflict      Performance   high-medium Util. Range     low-medium Util. Range      mixed Util. Range
Ratio Range   Category      Mono+RFFD   Mono+CC         Mono+RFFD   Mono+CC         Mono+RFFD   Mono+CC
0.1-0.9       Best          24          22              14          12              19          17
              Average       22.4        21.3            12.9        12              18.4        17.1
              Worst         22          22              12          12              18          18
0.2-0.9       Best          25          22              14          12              17          15
              Average       22.6        21.2            13          12.1            18.1        17
              Worst         21          21              13          13              19          18
0.3-0.9       Best          27          23              14          12              18          15
              Average       23.7        22              13.6        12.5            19          17.6
              Worst         21          20              13          12              19          19
0.4-0.9       Best          23          21              15          13              20          17
              Average       24.1        22.8            13.5        12.6            19          17.5
              Worst         21          21              12          12              17          17
0.5-0.9       Best          27          23              15          13              20          17
              Average       25          23.1            14          13.2            19.3        18.1
              Worst         23          23              13          13              19          19

Table 4.7: Mono+RFFD vs. Mono+CC for low Conflict: Number of cores allocated

Conflict      Performance   high-medium Util. Range     low-medium Util. Range      mixed Util. Range
Ratio Range   Category      Mono+RFFD   Mono+CC         Mono+RFFD   Mono+CC         Mono+RFFD   Mono+CC
0.1-0.2       Best          21          20              10          9               15          14
              Average       20.2        20.1            9.8         9.7             14.1        13.9
              Worst         same        same            same        same            13          13
0.1-0.3       Best          20          21              12          11              16          15
              Average       20.3        20.4            10.8        10.6            15.3        15.1
              Worst         same        same            10          11              16          17
0.1-0.4       Best          22          21              12          10              15          14
              Average       20.1        20.1            11.2        10.3            15.2        14.9
              Worst         18          19              11          11              14          15

Table 4.7 compares the performance of CC over RFFD on low conflict task sets. The layout of the table is the same as that of Table 4.6. However, we observe here that the worst cases force Mono+CC to map the tasks onto more cores than required by Mono+RFFD. This highlights the limits of a locally efficient coloring mechanism, as it does not necessarily perform better at a global scale: even though the coloring scheme does a better job at packing a given set of tasks within a core, a failure to allocate some tasks within a core can sometimes pave the way for a better fit of subsequent tasks. This is the basis for our Dynamic algorithm, which is sensitive to conflicts and provides a better global ordering during the task selection phase.

Table 4.8: Average allocations performed by Scenario B algorithms: Number of cores allocated

Conflict       high-medium Util.                low-medium Util.                 mixed Util.
Ratio Range    Mono+   Mono+   Dyn+    Dyn+     Mono+   Mono+   Dyn+    Dyn+     Mono+   Mono+   Dyn+    Dyn+
               RFFD    CC      RFFD    CC       RFFD    CC      RFFD    CC       RFFD    CC      RFFD    CC
0.1-0.9        22.4    21.3    21.1    20.3     12.9    12      12.4    11.6     18.4    17.1    16.6    16.1
0.2-0.9        22.6    21.2    21      20.4     13      12.1    12.3    11.7     18.1    17      16.4    15.9
0.3-0.9        23.7    22      22.2    21       13.6    12.5    12.9    12.2     19      17.6    17.5    16.7
0.4-0.9        24.1    22.8    22.5    21.3     13.5    12.6    12.8    12.3     19      17.5    17.3    16.8
0.5-0.9        25      23.1    24.1    22.1     14      13.2    13.2    12.4     19.3    18.1    17.9    17.1
0.1-0.2        20.2    20.1    19.8    19.7     9.8     9.7     9.7     9.4      14.1    13.9    13.4    13.4
0.1-0.3        20.3    20.4    20.1    20       10.8    10.6    10.5    10.3     15.3    15.1    14.3    14.1
0.1-0.4        20.1    20.1    19.9    19.7     11.2    10.3    10.6    10.2     15.2    14.9    14.4    14.2

Table 4.9: Best and Average improvement by Dyn+CC over Mono+RFFD

Conflict      Improvement   high-medium     low-medium      mixed
Ratio Type    Case          Util            Util            Util
High          Best          22%             17%             19%
                            (21 vs 27)      (10 vs 12)      (17 vs 21)
              Average       10.49%          10.03%          11.81%
Low           Best          5.26%           17%             13%
                            (18 vs 19)      (10 vs 12)      (13 vs 15)
              Average       1.78%           5.72%           6.21%

Scenario B algorithm performance: Table 4.8 shows the average performance of the various combinations of task selection and task allocation algorithms for all conflict ratio ranges (low and high) on all utilization ranges (high-medium, low-medium and mixed). The first column shows the conflict ratio range. The second, third and fourth columns show the results on benchmarks with high-medium, low-medium and mixed utilization ranges, respectively. Each of those columns has four sub-columns covering all task selection and task allocation combinations (in the following order: Mono+RFFD, Mono+CC, Dyn+RFFD, Dyn+CC). Due to the large number of results, we again resort to presenting the average number of cores allocated per benchmark for the given conflict and utilization ranges. The highlighted numbers are again the best allocations, and they clearly show that Dyn+CC produces the best allocations compared to any other combination. Also, Dyn+RFFD consistently outperforms Mono+RFFD, and it performs better than Mono+CC for the cases where the latter performed worse than Mono+RFFD. This shows the effectiveness of the Dynamic task selection mechanism. However, one should note that Dyn+RFFD does not always outperform Mono+CC, even though Dyn+CC is the best performing algorithm overall. Table 4.9 shows the improvement achieved by Dyn+CC over the base case of Mono+RFFD. The first column

shows the two conflict ratio range types. The second column identifies the best and average case improvements in each case. The third, fourth and fifth columns depict the improvement in terms of percentages, while for the best cases the allocated core counts are also shown (Dyn+CC vs. Mono+RFFD). Dyn+CC shows distinctly high improvement percentages for high conflict task sets (up to 6 fewer cores). Among all the utilization ranges, the mixed utilization range produces the highest average improvement. This shows that the Dyn+CC algorithm not only works well in corner cases but also performs best on systems that have a variety of utilizations and conflict densities.

Table 4.10: CoFFD vs Dyn+CC: task vs region unlocking

Number      High Util.           Med. Util.           Low Util.
of Tasks    CoFFD   Dyn+CC       CoFFD   Dyn+CC       CoFFD   Dyn+CC
16          10      10           8       8            6       6
20          13      14           11      10           8       8
24          15      17           15      12           10      10
28          19      20           19      14           11      11
32          20      22           21      15           12      13
36          21      25           22      18           15      15
42          25      27           24      20           17      17

CoFFD vs Dyn+CC: Dyn+CC has been the most effective combination for task sets of Scenario B. Thus, it becomes imperative to gauge its effectiveness on benchmarks of Scenario A and compare its performance against CoFFD. Table 4.10 depicts such results for higher conflict density task sets. CoFFD performs better than Dyn+CC for high utilization tasks. That is because unlocking a region or a task does not allow multiple tasks to be scheduled together; thus, the global ordering achieved by CoFFD is better. However, CoFFD is unable to unlock tasks of medium utilization, whereas unlocking at regional granularity remains feasible; thus, Dyn+CC performs better there. With low utilization task sets, CoFFD is able to unlock tasks and still allocate other tasks along with them, so both CoFFD and Dyn+CC perform well. Nonetheless, CoFFD benefits from a superior global ordering, which allows it to find a better allocation for our 32-task benchmark, as shown in the table.

Chapter 5

Conclusion and Future Work

In this chapter, we present the conclusion drawn from the current state of our research work and the scope of the future work that may be pursued.

5.1 Conclusions

The work presented in this document first identifies task migration as a key contributor to unpredictability in determining WCET bounds of real-time tasks on multicore architectures. With larger L2 caches and increasing numbers of processing units, WCET bounds have the potential to become tighter in the future. Static timing analyzers can capitalize on large L2 caches in that, after the initial warm-up of the cache, execution of real-time tasks becomes predictable. This work has shown that, in the wake of task migrations, dilation in execution time due to cache warm-up will become significant enough to occasionally prevent real-time tasks from meeting deadlines. It is thus imperative to develop real-time systems capable of tightly bounding, if not eliminating, the impact of task migration. Simulations on a subset of the WCET benchmarks experience a dilation in execution time ranging from 6% to 56.6% for tasks whose algorithmic complexity does not exceed O(n); tasks with higher complexity show a significant dilation for small data set sizes.

We consolidate the idea of proactive cache migration as a means to diminish the dilation introduced by the target warm-up overhead using a software technique called PTM. PTM launches a low priority prefetch task at the target core to prefetch the cache lines that belong to the address ranges specified by the programmer. The software approach shows that it can prevent the dilation in execution time but may result in high and potentially unbounded migration delay requirements. Hence, we propose several schemes of push-assisted cache-to-cache migration in multicores. (1) WCM, a hardware scheme replicating the cache context of the task onto the target L2 cache, reduces dilation in execution time to less than one percent for the

majority of simulated tasks, except for those with the smallest data set sizes or an algorithmic complexity lower than O(n). WCM eliminates the execution overheads that PTM incurs but still maintains a high overhead due to a complete scan of the cache. (2) RCM combines the address range specification capabilities of PTM with the hardware-based pushing of cache lines of WCM. This allows RCM to achieve performance benefits similar to WCM while making the cache migration overhead proportional to the cache footprint of the task. (3) This overhead is further reduced by BCM, which allows multiple push requests to be processed simultaneously. Through these contributions, the migration overhead for most of the tasks is reduced to less than 20% of the task's execution time. This demonstrates that cache migration is a feasible solution for preserving execution times after task migration close to those tight WCET bounds otherwise only valid in the absence of migration. We further enhance the MESI coherence protocol to significantly reduce or even eliminate the number of write misses due to task migration. This eliminates extra bandwidth requirements due to cache migration except for residuals of tasks with large data sets.

The work further promotes multicores in hard real-time systems under cache locking. In hard real-time systems, cache locking increases the predictability of worst-case execution times, potentially resulting in higher utilization. On multicore platforms, optimal scheduling assumes task migration as a fundamental premise. This work discusses the support required under cache locking for proactive lock and cache content migration. We develop a wide range of cache migration models that provide deterministic migration delays.

We exploit pipelining for Regional Cache Migration (RCM) through Controlled Cache Migration Pipelining (CCMP) and Streamed Cache Migration Pipelining (SCMP), which reduce the migration cost over RCM by 48% and 56%, respectively. We expose system-wide parallelism for cache migration through a novel hardware synchronization mechanism. This allows multiple cache migrations to overlap and maximize system bus utilization. We also present a hardware mechanism called Set-Scan Cache Migration (SSCM) to migrate sparse cache locks that cannot be specified by large memory regions within Region Registers. Slotted-SSCM, an extension to SSCM, allows cache migrations to progress in parallel with RCM-based cache migrations. Slotted-SSCM also lends itself to pipelining, which leads to Slotted-SSCM Pipelining and delivers a reduction in migration cost over SSCM of 46.7%. Individually, Slotted-SSCM may seem to have a high overhead with large caches due to extra set reads. This cost can be mitigated if Region Registers (otherwise recommended for RCM) are used to specify a group of contiguous sets that contains locked lines; this is based on the observation that locks that seem sparse in a large memory space may fit within a small set of cache sets. This hybrid design of RCM and Slotted-SSCM has the potential to significantly reduce the overhead of extra cache set reads.

The cache migration schemes presented in this work provide the scheduler with opportunities

to deliver deterministic and efficient cache migrations. Single cache migrations should make use of the pipelined mechanisms; SCMP and Slotted-SSCM Pipelining deliver the best results. In the case of multiple cache migrations, the scheduler can choose between parallel migration and pipelined migration based on its knowledge of the individual migration costs.

The work presented in this document finally studies static partitioning of real-time tasks with locked caches on distributed cache systems. Contemporary static scheduling schemes may not use locked caches. However, this renders certain high utilization tasks unschedulable, as their unlocked WCET is prohibitively high. A simplistic solution is to allow locking for such tasks and to place locked tasks onto different cores. We call this Naive locked FFD (NFFD), as it locks certain tasks with high utilizations but is unaware of caches.

This work proposes two cache-aware algorithms for Scenario A type task sets. These algorithms allocate tasks in a multicore environment where tasks are allowed to lock cache lines in a specified subset of cache ways in each core's private L1 cache. The first algorithm, GFFD, is an enhanced version of the First Fit Decreasing (FFD) algorithm. The second, CoFFD, is based on a graph coloring method. CoFFD reduces the core requirements by 25% to 60% compared to NFFD, with an average reduction of 40%. CoFFD consistently performs better than GFFD as it lowers both the number of cores and the overall system utilization.

Further, we present algorithms for task sets of Scenario B. Tasks within these task sets can have intra-task cache conflicts and allocate the conflicting locked regions into different cache ways. These algorithms resolve conflicts among locked regions by locking and unlocking regions instead of locking or unlocking entire tasks, as was the case in Scenario A. Here, task partitioning is split into task selection and task allocation phases. We use two task allocation mechanisms, namely (a) Regional FFD and (b) Chaitin's coloring, and we propose two task selection algorithms, namely (a) Monotone and (b) Dynamic. The combination Mono+CC outperforms Mono+RFFD for highly conflicted task sets, which shows the effectiveness of using the coloring mechanism at individual cores. In contrast, Mono+CC does not consistently perform best for low contention task sets. This necessitates a global ordering scheme like Dyn, which complements core-level coloring. Our results show that Dyn+CC consistently performs better than Mono+RFFD. For high contention task sets, Dyn+CC achieves up to a 22% reduction in the number of cores allocated, i.e., it allocates 21 cores as opposed to 27 for Mono+RFFD. Even for low contention task sets, Dyn+CC achieves a reduction of up to 17%. Since Dyn+CC deals with a more generic problem set, it is also applicable to task sets of Scenario A. Comparing CoFFD against Dyn+CC, we observe that CoFFD performs better than Dyn+CC for high utilization tasks. That is because unlocking a region or a task does not allow multiple tasks to be scheduled together; thus, the global ordering achieved by CoFFD is superior. However, CoFFD is unable to unlock medium utilization tasks, whereas unlocking at regional granularity is feasible, and thus Dyn+CC performs better. These observations suggest

that CoFFD and Dyn+CC perform far better than contemporary FFD-based task partitioning on distributed-core CMPs.

Overall, this work is unique in considering the challenges of future multicore architectures for real-time systems. It also provides key insights into task partitioning with locked caches for architectures with private caches and presents ways in which this line of research may be extended. In sum, this dissertation promotes the deployment of locked caches and hardware support for cache migration on scalable multiprocessors, which can enable more predictable and efficient multiprocessor real-time scheduling.

5.2 Future Work

There are several additional directions in which this work can be extended.

5.2.1 Holistic Task Partitioning on Scalable Multicore Real-Time Systems

The algorithms of Chapter 4 are applicable to any multicore system that has two levels in the memory hierarchy. We resolved conflicts among locked regions in the private L1 cache while assuming that the private L2 cache has sufficient associativity and size to hold all the locked regions of all the tasks mapped to its corresponding core. As our solution only supports two levels, we consider partitioning the tasks onto cores based on conflict resolution at the L2 cache level, with references to unlocked regions issued to memory.

Figure 5.1 shows a 32-core tile-based architecture. There are four on-chip memory controllers that interface with four separate memory chips. Consider the mesh interconnect in the highlighted part of Figure 5.1. This highlighted portion depicts a quartile. Memory traffic from/to each quartile is statically routed to/from a designated memory controller along the mesh and arbitrated using a time-division-multiplexed (TDM) approach, as described below. We assume dimension-order Y-X routing, i.e., a request from a core to a memory controller first traverses the interconnect vertically until it cannot go any further, and then travels horizontally. For example, the routing paths of requests from tiles 3 and 4 are A → B → C → H and D → E → F → G → H, respectively. The traffic from the memory controller to the cores uses a separate channel with X-Y routing. The bandwidth allocated to memory traffic from a given core along a given link is proportional to the number of hops from the core to the target of that link. For example, the bandwidth along link C is divided among tiles 3, 2 and 1 in the ratio 3:2:1, since traffic from tile 3 crosses three hops to reach the target of link C, traffic from tile 2 crosses two hops, and traffic from tile 1 crosses one hop. If we assume that each hop takes one cycle, the NoC latencies for tiles 1, 2 and 3 across link C are 2, 3 and 6 cycles, respectively. Similarly, the NoC latencies for tiles 2 and 3 across link B are 2 and 3 cycles, and that for tile 3 across link A is 1 cycle. Hence, the total NoC latency for traffic from tiles 1, 2 and 3 to reach tile 0 is 3, 5 and 6 cycles, respectively.

Figure 5.1: Memory Traffic Routing of a Quartile. (The figure shows a 32-tile mesh with four on-chip memory controllers; each tile contains a processor, an instruction scratchpad (I-SP), a D-L1 cache, an L2 cache, and a router. The highlighted quartile, Quartile 0 comprising tiles 0-7, is supported by Memory Controller 0; links A through H mark the routes from tiles 3 and 4 to that controller.)

Similarly, traffic from tiles 4, 5, 6 and 7 takes 10, 9, 7 and 4 cycles, respectively, to reach tile 0. Since the router at tile 0 receives traffic from all 8 cores in the quartile, each request may have to wait for a maximum of 8 cycles before being received by the memory controller. For safety, we assume that the NoC latency of every memory request is the maximum of all these latencies, namely 18 cycles; this is the number of cycles needed in each direction by the traffic between tile 4 and the memory controller. In order to study the impact of memory latency on our algorithms, we assume two cases (a worked latency calculation follows this list):

1. Fully-pipelined memory: This is the ideal scenario, where all memory references take the minimum number of cycles needed to access memory. As per the TilePro64 specification, each memory access takes 70 cycles [36]. This delivers the following latency for an access to an unlocked cache line: 70 (memory) + 2 × 18 (tile to/from memory controller) + 2 (processor to/from router, bypassed) = 108 cycles.

2. TDM-based memory: This is the worst-case scenario, which does not parallelize or pipeline memory requests. For the architecture depicted in Figure 5.1, it adds another 490 cycles to the access latency derived for the fully-pipelined memory, because a memory access from one tile has to multiplex with requests from seven other tiles (7 × 70). This delivers an access latency of 598 cycles to an unlocked cache line.
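The latency figures above can be reproduced with a short back-of-the-envelope calculation. The Python sketch below simply encodes the stated assumptions (one cycle per hop, a worst-case 10-hop path for tile 4, 8-cycle router arbitration, a 70-cycle memory access, and a 2-cycle processor-router bypass); it is illustrative rather than a model of the actual NoC.

# Worst-case access latencies to an unlocked cache line under the assumptions above.
HOP_CYCLES         = 1    # one cycle per mesh hop
WORST_CASE_HOPS    = 10   # tile 4 to the quartile's memory controller
ROUTER_WAIT        = 8    # up to 8 requests multiplexed at tile 0's router
MEM_ACCESS         = 70   # TilePro64 memory access latency [36]
PROC_ROUTER_BYPASS = 2    # processor <-> router, one cycle each way
TILES_PER_QUARTILE = 8

noc_one_way     = WORST_CASE_HOPS * HOP_CYCLES + ROUTER_WAIT               # 18 cycles
fully_pipelined = MEM_ACCESS + 2 * noc_one_way + PROC_ROUTER_BYPASS        # 108 cycles
tdm_based       = fully_pipelined + (TILES_PER_QUARTILE - 1) * MEM_ACCESS  # 108 + 490 = 598 cycles

print(noc_one_way, fully_pipelined, tdm_based)   # 18 108 598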

To evaluate our mechanism on an L2 cache, we use the worst (Mono+RFFD) and best (Dyn+CC) performing algorithms described in Chapter 4 and conducted experiments on a few synthetic benchmarks of Scenario B. We use the TGFF-based procedure to generate benchmarks, as described in the previous chapter. We generate 25 benchmarks (medium-low utilization and high conflict ratios) for both the fully-pipelined and the TDM-based latency cases. As explained in Chapter 4, medium-low utilization corresponds to 0.30 > U_locked_i > 0.1. Also, recall that the conflict ratio is the ratio of the number of conflicts a region has to the total number of regions. Since the conflicts are randomly generated using TGFF, high conflict ratios have an upper bound of 0.9 and a lower bound that varies from 0.1 to 0.5.¹ We assume that the instructions of all tasks fit within the L1 cache. For each benchmark, TGFF generates a conflict graph of 200 regions that are distributed among tasks. The number of regions per task is between 4 and 8.

Table 5.1: Average core allocation: Sensitivity to Memory Latency

Conflict       Fully-Pipelined             TDM-based
Ratio Range    Mono+RFFD    Dyn+CC         Mono+RFFD    Dyn+CC
0.1-0.9        17.2         14.6           30.2         27.8
0.2-0.9        17.6         14.8           30.8         28
0.3-0.9        17.8         13.8           31.2         28
0.4-0.9        18.2         15.2           31.2         27.6
0.5-0.9        18.4         14.8           31.8         27.6

Table 5.1 shows the preliminary results obtained on these synthetic benchmarks with a lockable associativity of 8 for the L2 cache. The first column shows the conflict ratio range. The second and third column groups correspond to the fully-pipelined and TDM-based latencies, respectively, and each has two sub-columns showing the average allocations obtained by the Mono+RFFD and Dyn+CC algorithms. These averages are obtained across 5 benchmarks per combination of conflict ratio range and latency model. The results show that Dyn+CC delivers lower core requirements than Mono+RFFD. However, due to the high latency of TDM-based memory accesses, it is almost impossible to unlock any of the regions.

¹To read more on utilization ranges and conflict ratios, please refer to page 93.

Thus, the improvements obtained by Dyn+CC are solely due to the coloring localized within each core.

The aforementioned system assumptions and the results obtained point us to research opportunities that will complement the task partitioning studied in this dissertation:

1. As shown in Figure 5.1, the memory controllers are dedicated to serve a particular quartile. This would require memory partitioning that allows all the tasks mapped to a particular quartile to access only one memory controller. Strategies to partition memory need to be accommodated along with cache-based task partitioning to devise a holistic approach.

2. The results obtained also show that by reducing the upper bound of memory latency, a significant improvement can be obtained in allocations. This is another area that would require certain hardware schemes to either parallelize the memory accesses or pipeline them. These also need to be studied in combination with the memory partitioning strategies.

3. A more immediate extension to our work would be to support multi-level caches. This would require integrating conflict graphs for multiple levels of a memory hierarchy.

5.2.2 Thermal Analysis of Multicore Real-Time Systems

This topic is motivated by the use of task migration for improving the reliability of multicore real-time systems. Heavy and frequent accesses to architectural resources by certain real-time tasks may cause the temperature of those resources to rise to high levels. Partitioned scheduling, which is heavily used in real-time systems, may be susceptible to these effects because of its lack of task migration. We intend to analyze how task migration coupled with cache migration can help bring down these temperatures while preventing dilation in the execution time of the task.

REFERENCES

[1] Hypertransport technology. http://www.amd.com/us/products/technologies/hypertransport-technology.

[2] Intel pentium d architecture. http://en.wikipedia.org/wiki/Pentium-D.

[3] Intel quickpath interconnect. http://www.intel.com/intelpress/sum qpi.htm.

[4] Tegra 3 architecture. http://www.anandtech.com/show/5072/nvidias-tegra-3-launched-architecture-revealed.

[5] Tera-scale research prototype: Connecting 80 simple cores on a single test chip. ftp://download.intel.com/research/platform/terascale/tera-scaleresearchprototypebackgrounder.pdf.

[6] Tilera processor family. http://www.tilera.com/products/processors.php.

[7] WCET project benchmarks, 2007. http://www.mrtc.mdh.se/projects/wcet/benchmarks.html.

[8] Andrea Acquaviva, Andrea Alimonda, Salvatore Carta, and Michele Pittau. Assessing task migration impact on embedded soft real-time streaming multimedia applications. EURASIP J. Embedded Syst., 2008(2):1–15, 2008.

[9] J. Anderson, J. Calandrino, and U. Devi. Real-time scheduling on multicore platforms. In IEEE Real-Time Embedded Technology and Applications Symposium, pages 179–190, April 2006.

[10] J. Anderson and A. Srinivasan. Early-release fair scheduling. In Euromicro Conference on Real-Time Systems, pages 35–43, June 2000.

[11] J. Anderson and A. Srinivasan. Mixed pfair/erfair scheduling of asynchronous periodic tasks. In Euromicro Conference on Real-Time Systems, pages 76–85, June 2001.

[12] James H. Anderson, Vasile Bud, and UmaMaheswari C. Devi. An edf-based scheduling algorithm for multiprocessor soft real-time systems. In Proceedings of the 17th Euromicro Conference on Real-Time Systems, ECRTS ’05, pages 199–208, Washington, DC, USA, 2005. IEEE Computer Society.

[13] Patricia Balbastre, Ismael Ripoll, and Alfons Crespo. Minimum deadline calculation for periodic real-time tasks in dynamic priority systems. IEEE Trans. Comput., 57(1):96–109, January 2008.

[14] S. Baruah. Techniques for multiprocessor global schedulability analysis. In IEEE Real- Time Systems Symposium, pages 119–128, 2007.

[15] S. Baruah, N. Cohen, C.G. Plaxton, and D. Varvel. Proportionate progress: A notion of fairness in resource allocation. Algorithmica, 15:600–625, 1996.

[16] Bradford M. Beckmann, Michael R. Marty, and David A. Wood. Asr: Adaptive selective replication for cmp caches. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 39, pages 443–454, Washington, DC, USA, 2006. IEEE Computer Society.

[17] Stefano Bertozzi, Andrea Acquaviva, Davide Bertozzi, and Antonio Poggiali. Supporting task migration in multi-processor systems-on-chip: a feasibility study. In Proceedings of the conference on Design, automation and test in Europe, pages 15–20, 2006.

[18] A. Burchard, J. Liebeherr, Y. Oh, and S.H. Son. New strategies for assigning real-time tasks to multiprocessor systems. IEEE Trans. on Computers, 44(12):1429–1442, 1995.

[19] J. Calandrino and J. Anderson. Cache-aware real-time scheduling on multicore platforms: Heuristics and a case study. In Euromicro Conference on Real-Time Systems, pages 209–308, July 2008.

[20] Gregory J. Chaitin. Register allocation & spilling via graph coloring. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 98–105, 1982.

[21] Gregory J. Chaitin, Marc A. Auslander, Ashok K. Chandra, John Cocke, Martin E. Hopkins, and Peter W. Markstein. Register allocation via coloring. Computer Languages, 6:47–57, 1981.

[22] Dhruba Chandra, Fei Guo, Seongbeom Kim, and Yan Solihin. Predicting inter-thread cache contention on a chip multi-processor architecture. In International Symposium on High Performance Computer Architecture, pages 340–351, 2005.

[23] Rohit Chandra, Ramesh Menon, Leo Dagum, David Kohr, Dror Maydan, and Jeff McDonald. Parallel Programming in OpenMP. Morgan Kaufmann Publishers, Los Altos, CA 94022, USA, 2000.

[24] Jichuan Chang and Gurindar S. Sohi. Cooperative caching for chip multiprocessors. In International Symposium on Computer Architecture, pages 264–276, 2006.

[25] Sudipta Chattopadhyay, Abhik Roychoudhury, and Tulika Mitra. Modeling shared cache and bus in multi-cores for timing analysis. In Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems, SCOPES ’10, pages 6:1–6:10, New York, NY, USA, 2010. ACM.

[26] Zeshan Chishti, Michael D. Powell, and T. N. Vijaykumar. Optimizing replication, communication, and capacity allocation in cmps. In International Symposium on Computer Architecture, pages 357–368, 2005.

[27] D. Choffnes, M. Astley, and M. J. Ward. Migration policies for multi-core fair-share scheduling. ACM SIGOPS Operating Systems Review, 42:92–93, 2008.

[28] Gille Damien. Study of different cache line replacement algorithms in embedded systems. Master’s thesis, KTH Royal Institute of Technology, March 2007.

[29] Michael L. Dertouzos. Control robotics: The procedural control of physical processes. In IFIP Congress, pages 807–813, 1974.

[30] S.K. Dhall and C.L. Liu. On a real-time scheduling problem. Operations Research, 26(1):127–140, 1978.

[31] Robert P. Dick, David L. Rhodes, and Wayne Wolf. Tgff: task graphs for free. In Proceedings of the 6th international workshop on Hardware/software codesign, CODES/CASHE ’98, pages 97–101, Washington, DC, USA, 1998. IEEE Computer Society.

[32] F. Dorin, P. Meumeu Yomsi, J. Goossens, and P. Richard. Semi-partitioned hard real-time scheduling with restricted migrations upon identical multiprocessor platforms. In Real-Time and Network Systems, pages 207–216, November 2010.

[33] Noel Eisley, Li-Shiuan Peh, and Li Shang. In-network cache coherence. In International Symposium on Microarchitecture, pages 321–332, 2006.

[34] Noel Eisley, Li-Shiuan Peh, and Li Shang. Leveraging on-chip networks for data cache migration in chip multiprocessors. In International conference on Parallel architectures and compilation techniques, pages 197–207, 2008.

[35] A. Fedorova, M. Seltzer, and M.D. Smith. Cache-fair thread scheduling for multicore processors. Technical Report TR-17-06, Harvard University, October 2006.

[36] Mondello Filippo. Architectures for multimedia systems. http://home.dei.polimi.it/silvano/FilePDF/ARC-MULTIMEDIA/Presentation Tilera - Tile64.pdf.

[37] Shelby Funk and Sanjoy Baruah. Restricted edf migration on uniform multiprocessors. Technique Et Science Informatiques 24 (8), pages 917–938, 2005.

[38] Nan Guan, Martin Stigge, Wang Yi, and Ge Yu. Cache-aware scheduling and analysis for multicores. In Proceedings of the seventh ACM international conference on Embedded software, EMSOFT ’09, pages 245–254, New York, NY, USA, 2009. ACM.

[39] Jayanth Gummaraju and Mendel Rosenblum. Stream programming on general-purpose processors. In International Symposium on Microarchitecture, pages 343–354, 2005.

[40] Greg Hamerly, Erez Perelman, Jeremy Lau, and Brad Calder. Simpoint 3.0: Faster and more flexible program analysis. Journal of Instruction Level Parallel, Sep 2005.

[41] Damien Hardy, Thomas Piquet, and Isabelle Puaut. Using bypass to tighten wcet estimates for multi-core processors with shared instruction caches. In Proceedings of the 2009 30th IEEE Real-Time Systems Symposium, RTSS ’09, pages 68–77, Washington, DC, USA, 2009. IEEE Computer Society.

[42] Damien Hardy and Isabelle Puaut. Wcet analysis of multi-level non-inclusive set-associative instruction caches. In Proceedings of Real-Time Systems Symposium, pages 456–466, 2008.

[43] John L. Hennessy and David A. Patterson. Computer architecture - a quantitative approach, 3rd Edition. Morgan Kaufmann, 2003.

[44] Dong ik Oh and T.P. Baker. Utilization bounds for n-processor rate monotone scheduling with static processor assignment. Real-Time Systems, 15:183–192, 1998.

[45] Ravi Iyer. Cqos: a framework for enabling qos in shared caches of cmp platforms. In Proceedings of international conference on Supercomputing, pages 257–266, 2004.

[46] N. Jerger, M. Lipasti, and L. Peh. Virtual tree coherence: Leveraging regions and in-network multicast trees for scalable cache coherence. In International Symposium on Microarchitecture, pages 35–46, November 2008.

[47] D. I. Katcher, H. Arakawa, and J. K. Strosnider. Engineering and analysis of fixed priority schedulers. IEEE Trans. Softw. Eng., 19(9):920–934, September 1993.

[48] Seon Wook Kim, Michael Voss, Bob Kuhn, Hans-Christian Hoppe, and Wolfgang Nagel. Vgv: Supporting performance analysis of object-oriented parallel applications. In Proc. of IPDPS’2002 (HIPS’2002): Workshop on High-Level Parallel Programming Models and Supportive Environments, pages 108–115, April 2002.

[49] T. Kim, H. Shin, and N. Chang. Deadline assignment to reduce output jitter of real-time tasks. In Proc. 16th IFAC Workshop Distributed Computer Control Systems, 2000.

[50] S. Lauzac, R. Melhem, and D. Mosse. Comparison of global and partitioning schemes for scheduling rate monotonic tasks on a multiprocessor. In Euromicro Workshop on Real-Time Systems, pages 188–195, 1998.

[51] T. Li, P. Brett, B. Hohlt, R. Knauerhase, S.D. McElderry, and S. Hahn. Operating system support for shared-isa asymmetric multi-core architectures. In Workshop on the Interaction between Operating Systems and Computer Architecture, pages 19–26, June 2008.

[52] Tong Li, Dan Baumberger, David A. Koufaty, and Scott Hahn. Efficient operating system scheduling for performance-asymmetric multi-core architectures. In ACM/IEEE conference on Supercomputing, pages 1–11, November 2007.

[53] Mikko H. Lipasti and John Paul Shen. Exceeding the dataflow limit via value prediction. In MICRO, pages 226–237, 1996.

[54] C. L. Liu and James W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM, 20(1):46–61, January 1973.

[55] Tiantian Liu, Yingchao Zhao, Minming Li, and Chun Jason Xue. Task assignment with cache partitioning and locking for wcet minimization on mpsoc. In Proceedings of the 2010 39th International Conference on Parallel Processing, ICPP ’10, pages 573–582, Washington, DC, USA, 2010. IEEE Computer Society.

[56] Michael R. Marty, Jesse D. Bingham, Mark D. Hill, Alan J. Hu, Milo M. K. Martin, and David A. Wood. Improving multiple-cmp systems using token coherence. In International Symposium on High Performance Computer Architecture, pages 328–339, 2005.

[57] Michael R. Marty and Mark D. Hill. Virtual hierarchies to support server consolidation. In International Symposium on Computer Architecture, pages 46–56, 2007.

[58] Wen mei W. Hwu and Yale N. Patt. Checkpoint repair for high-performance out-of-order execution machines. IEEE Trans. Computers, 36(12):1496–1514, 1987.

[59] M. Moir and S. Ramamurthy. Pfair scheduling of fixed and migrating periodic tasks on multiple resources. In IEEE Real-Time Systems Symposium, pages 294–303, December 1999.

[60] F. Mueller. Timing predictions for multi-level caches. In ACM SIGPLAN Workshop on Language, Compiler, and Tool Support for Real-Time Systems, pages 29–36, June 1997.

[61] Jin Ouyang and Yuan Xie. Loft: A high performance network-on-chip providing quality-of-service support. Microarchitecture, IEEE/ACM International Symposium on, 0:409–420, 2010.

[62] Marco Paolieri, Eduardo Quiñones, Francisco J. Cazorla, Guillem Bernat, and Mateo Valero. Hardware support for wcet analysis of hard real-time multicore systems. In Proceedings of the 36th annual international symposium on Computer architecture, ISCA ’09, pages 57–68, New York, NY, USA, 2009. ACM.

[63] Marco Paolieri, Eduardo Quiñones, Francisco J. Cazorla, Robert I. Davis, and Mateo Valero. Ia3: An interference aware allocation algorithm for multicore hard real-time systems. In IEEE Real-Time and Embedded Technology and Applications Symposium, pages 280–290, 2011.

[64] Rodolfo Pellizzoni. Predictable and monitored execution for cots-based real-time embedded systems. PhD thesis, Champaign, IL, USA, 2010. AAI3452102.

[65] Isabelle Puaut. Wcet-centric software-controlled instruction caches for hard real-time systems. In ECRTS ’06: Proceedings of the 18th Euromicro Conference on Real-Time Systems, pages 217–226, Washington, DC, USA, 2006. IEEE Computer Society.

[66] Isabelle Puaut and David Decotigny. Low-complexity algorithms for static cache locking in multitasking hard real-time systems. In In IEEE Real-Time Systems Symposium, pages 114–123, 2002.

[67] Isabelle Puaut and Christophe Pais. Scratchpad memories vs locked caches in hard real-time systems: a quantitative comparison. In Proceedings of the conference on Design, automation and test in Europe, pages 1484–1489, San Jose, CA, USA, 2007. EDA Consortium.

[68] H. Ramaprasad. Analytical Bounding Data Cache Behavior for Real-Time Systems. PhD thesis, Dept. of CS, North Carolina State University, July 2008.

[69] H. Ramaprasad and F. Mueller. Bounding worst-case response time for tasks with non-preemptive regions. In IEEE Real-Time Embedded Technology and Applications Symposium, pages 58–67, April 2008.

[70] H. Ramaprasad and F. Mueller. Tightening the bounds on feasible preemptions. Transactions on Embedded Computing Systems, March 2008 (accepted).

[71] J. Rawat. Static analysis of cache analysis for real-time programming. Master’s thesis, Iowa State University, 1995.

[72] J. Renau, B. Fraguela, J. Tuck, W. Liu, L. Ceze, S. Sarangi, P. Sack, K. Strauss, and P. Montesinos. SESC simulator. http://sesc.sourceforge.net, Jan. 2005.

[73] Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi, Sam S. Stone, David B. Kirk, and Wen mei W. Hwu. Optimization principles and application performance evaluation of a multithreaded gpu using cuda. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 73–82, 2008.

[74] Abhik Sarkar, Frank Mueller, and Harini Ramaprasad. Predictable task migration for locked caches in multi-core systems. In LCTES, pages 131–140, 2011.

[75] Abhik Sarkar, Frank Mueller, and Harini Ramaprasad. Static task partitioning for locked caches in multi-core real-time systems. In submitted to CASES, 2012.

[76] Abhik Sarkar, Frank Mueller, and Harini Ramaprasad. Static task partitioning for locked caches in multi-core real-time systems. to be submitted to TECS, 2012.

[77] Abhik Sarkar, Frank Mueller, Harini Ramaprasad, and Sibin Mohan. Push-assisted migration of real-time tasks in multi-core processors. In LCTES ’09: Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems, pages 80–89, New York, NY, USA, 2009. ACM.

[78] Y. Sazeides and J.E. Smith. The predictability of data values. In Microarchitecture, 1997. Proceedings., Thirtieth Annual IEEE/ACM International Symposium on, pages 248 –258, dec 1997.

[79] Rathijit Sen and Y. N. Srikant. WCET estimation for executables in the presence of data caches. In Proceedings of the 7th ACM & IEEE International Conference on Embedded Software, EMSOFT '07, pages 203–212, New York, NY, USA, 2007. ACM.

[80] M. Shekhar, A. Sarkar, H. Ramaprasad, and F. Mueller. Semi-partitioned hard-real-time scheduling under locked cache migration in multicore systems. In Euromicro Conference on Real-Time Systems, 2012.

[81] Abhishek Singh. Co-scheduling Real-time Tasks and Non Real-time Tasks Using Empirical Probability Distribution of Execution Time Requirements. PhD thesis, Department of Computer Science, University of North Carolina at Chapel Hill, 2008.

[82] J. E. Smith. A study of branch prediction strategies. In Proceedings of the 8th Annual Symposium on Computer Architecture, pages 135–148, 1981.

[83] James E. Smith and Andrew R. Pleszkun. Implementation of precise interrupts in pipelined processors. In ISCA, pages 36–44, 1985.

[84] A. Srinivasan and J. Anderson. Optimal rate-based scheduling on multiprocessors. In ACM Symposium on Theory of Computing, pages 189–198, May 2002.

[85] Jan Staschulat and Rolf Ernst. Multiple process execution in cache related preemption delay analysis. In International Conference on Embedded Software, pages 278–286, 2004.

[86] Jan Staschulat, Simon Schliecker, and Rolf Ernst. Scheduling analysis of real-time systems with precise modeling of cache related preemption delay. In Euromicro Conference on Real-Time Systems, pages 41–48, 2005.

[87] Karin Strauss, Xiaowei Shen, and Josep Torrellas. Uncorq: Unconstrained snoop request delivery in embedded-ring multiprocessors. In International Symposium on Microarchitecture, pages 327–342, 2007.

[88] Vivy Suhendra and Tulika Mitra. Exploring locking & partitioning for predictable shared caches on multi-cores. In Proceedings of the 45th Annual Design Automation Conference, pages 300–303, New York, NY, USA, 2008. ACM.

[89] Henrik Theiling, Christian Ferdinand, and Reinhard Wilhelm. Fast and precise WCET prediction by separated cache and path analyses. Real-Time Systems, 18:157–179, 1999.

[90] William Thies, Michal Karczmarek, and Saman P. Amarasinghe. StreamIt: A language for streaming applications. In Compiler Construction, pages 179–196, 2002.

[91] Rafael Ubal, Julio Sahuquillo, Salvador Petit, H. Hassan, and Pedro Lopez. Leakage current reduction in data caches on embedded systems. In The 2007 International Conference on Intelligent Pervasive Computing (IPC), pages 45–50, October 2007.

[92] X. Vera, B. Lisper, and J. Xue. Data caches in multitasking hard real-time systems. In IEEE Real-Time Systems Symposium, December 2003.

[93] Xavier Vera, Björn Lisper, and Jingling Xue. Data caches in multitasking hard real-time systems. In IEEE Real-Time Systems Symposium, pages 154–165, 2003.

[94] www.arm.com. ARM Processor Specifications.

[95] www.openmp.org. Official OpenMP Specification, May 2005.

[96] J. Yan and W. Zhang. Time-predictable L2 caches for real-time multi-core processors. In Work in Progress Session of the IEEE Real-Time Systems Symposium, December 2007.

[97] J. Yan and W. Zhang. WCET analysis of multi-core processors. In Work in Progress Session of the IEEE Real-Time Systems Symposium, December 2007.

[98] Jun Yan and Wei Zhang. WCET analysis for multi-core processors with shared L2 instruction caches. In IEEE Real-Time Embedded Technology and Applications Symposium, pages 80–89, April 2008.

[99] Tse-Yu Yeh and Yale N. Patt. Two-level adaptive training branch prediction. In Proceedings of the 24th Annual International Symposium on Microarchitecture, MICRO 24, pages 51–61, New York, NY, USA, 1991. ACM.

[100] Tse-Yu Yeh and Yale N. Patt. A comparison of dynamic branch predictors that use two levels of branch history. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 257–266. ACM, 1993.

[101] Michael Zhang and Krste Asanovic. Victim migration: Dynamically adapting between private and shared CMP caches. TR 2005-064, MIT CSAIL, 2005.
