ADAPTIVE CPU-BUDGET ALLOCATION FOR SOFT-REAL-TIME APPLICATIONS

A Thesis Presented to The Academic Faculty

by

Safayet N. Ahmed

In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the School of Electrical and Computer Engineering

Georgia Institute of Technology August 2014

Copyright © 2014 by Safayet N. Ahmed

Approved by:

Professor Bonnie H. Ferri, Committee Chair
School of ECE
Georgia Institute of Technology

Professor Magnus Egerstedt
School of ECE
Georgia Institute of Technology

Professor Yorai Wardi
School of ECE
Georgia Institute of Technology

Professor Sudhakar Yalamanchili
School of ECE
Georgia Institute of Technology

Professor Karsten Schwan
School of CS
Georgia Institute of Technology

Date Approved: 28 April 2014

To my family, my parents Nizam Ahmed and Mariam Sultana

and my brother Rubayet Ahmed

ACKNOWLEDGEMENTS

I would like to start by thanking Professor Bonnie Ferri, who consented to be my PhD advisor and provided much-needed help and guidance throughout my PhD career. More importantly, Professor Ferri was patient with me at times when I may have been difficult and gave me time to mature and grow as a researcher.

Additionally, I would like to thank the members of my reading committee, Professor Sudhakar Yalamanchili and Professor Yorai Wardi, for their help and encouragement.

I would like to thank Dr. Muhammad Muqarrab Bashir, Dr. Arif Selcuk Uluaguc, Dr. Aleem Mushtaque, Dr. Jeffrey Young, and Dr. Shahnewaz Siddique as well for their help, guidance, and encouragement. I would like to thank Michael Giardino for his help implementing significant aspects of LAMbS. Also, I would like to thank my friends and peers, Musheer Ahmed, Talha Khan, Robert Palmer, and Dr. Daniel Lertpratchiya, for making the PhD process a bit more bearable. It always helps to know people who are in the same boat. In addition, I would like to thank Muhammad Rizwan, Elhadji Alpha A Bah, Usman Ali, Aftab Ahmed Khattak, and Hamza Abbasi for their support.

I would like to thank my parents, Nizam Ahmed and Mariam Sultana, for their prayers, patience, and sacrifice throughout my years as a PhD student. I could not have come this far without their help and support. Lastly, I thank Allah, the Most Merciful and Wise, for all His blessings and for keeping me in the company of good people.

TABLE OF CONTENTS

DEDICATION

ACKNOWLEDGEMENTS

LIST OF TABLES

LIST OF FIGURES

I INTRODUCTION
  1.1 LAMbS: Linear-Adaptive-Models-based System
  1.2 Main Contributions
  1.3 Dissertation Outline

II POWER MANAGEMENT OF REAL-TIME SYSTEMS
  2.1 Power Management Mechanisms
    2.1.1 Power Management Mechanisms in Commercial Processors
    2.1.2 Power Management of Additional Components
  2.2 Real-time Scheduling
  2.3 Dynamic Voltage and Frequency Scaling
    2.3.1 Dynamic Slack Reclamation
    2.3.2 Accelerating Voltage Schedules
    2.3.3 Intra-Task DVS
  2.4 Exploiting Soft Timing Constraints
    2.4.1 CPU Reservations
    2.4.2 Adaptive CPU-Budget Allocation
    2.4.3 Power-Management of Soft-Real-Time Systems
  2.5 Real-World Platforms
    2.5.1 On-chip Work and Off-chip Work
    2.5.2 Leakage Power and Critical Frequency
    2.5.3 Sleep States and Break-even Time
    2.5.4 Discrete DVS Settings
    2.5.5 Dynamic Power and Performance Parameters
  2.6 Comparison of LAMbS to Previous Work

III ASYNCHRONOUS BUDGET ADAPTATIONS
  3.1 SRT Tasks and Reservation-based Scheduling
    3.1.1 Reservation-based Scheduling
    3.1.2 Virtual Finishing Time
  3.2 Prediction-based Budget Allocation
    3.2.1 The PBS Algorithm
    3.2.2 How PBS Works
  3.3 Queue Stability
    3.3.1 Notation
    3.3.2 Load, Budget, and Estimated Mean
    3.3.3 Expected Queue Length
    3.3.4 Conditional Probability of Minimum Queue Length
    3.3.5 Queue Stability
    3.3.6 Further Discussion
  3.4 Simulation-Based Comparison
  3.5 Software Architecture for an Asynchronous Budget Mechanism
    3.5.1 SRT Tasks
    3.5.2 The Kernel Module
    3.5.3 The Allocator Daemon
  3.6 Experiments and Results
    3.6.1 Performance Under Overload
    3.6.2 MPEG4 Video Decoding Workload

IV TWO-STAGE PREDICTION
  4.1 Revisiting the Task Model and CPU-Reservation Model
  4.2 First-Stage Prediction
    4.2.1 Array of Moving Averages
    4.2.2 The Normalized Variable-Step LMS Algorithm
    4.2.3 Array of Adaptive Filters
    4.2.4 The LMS-MA Hybrid Predictor
  4.3 The Second Stage
    4.3.1 Allocated Budget & Computational Load
    4.3.2 Bounding the Probability of a Missed Deadline
    4.3.3 Using the Output of the First-Stage Predictor
    4.3.4 Handling Severe Underpredictions
    4.3.5 Approximating the Conditional Mean
    4.3.6 Multi-step Prediction
  4.4 Implementing the TSP Algorithm
    4.4.1 Measurement and Prediction of Job-Execution Times
    4.4.2 Soft-Real-Time Tasks
    4.4.3 Computing Budget Allocations
  4.5 Experiments and Results
    4.5.1 Workloads and Experimental Platform
    4.5.2 Execution-time Characteristics
    4.5.3 Predictor Configurations
    4.5.4 Predictor Accuracy and Overhead
    4.5.5 Performance of the TSP Algorithm
  4.6 Conclusion

V VIC-BASED BUDGET ALLOCATION
  5.1 The Need for an Alternative Measure of Computation
    5.1.1 A Synthetic Workload to Control Memory Boundedness
    5.1.2 Invariance Under Change in CPU Frequency
    5.1.3 Variability in Instruction-Retirement Rate
  5.2 Virtual Instruction Count
    5.2.1 Computing VIC
    5.2.2 VIC Source
    5.2.3 VIC-based Budget
    5.2.4 Estimating Average Instruction-Retirement Rates
    5.2.5 Handling Overload Conditions
  5.3 Implementation of a VIC-Based Budget Mechanism
    5.3.1 SRT Tasks
    5.3.2 The Allocator
    5.3.3 The MOA Module
    5.3.4 The PBS Module
    5.3.5 Handling an Idle CPU
  5.4 Experiments and Results
    5.4.1 Performance Metrics
    5.4.2 Invariance Under Change in The Mode of Operation
    5.4.3 Response Time to Changes in The Mode of Operation
    5.4.4 Bandwidth Allocation vs Deadline-Miss Rate
  5.5 Conclusion

VI FUTURE WORK
  6.1 Power Management Using Linear Models and Constraints
    6.1.1 The Power Model
    6.1.2 The Optimum Operation-Mode Schedule
  6.2 Power Management with Idle States
    6.2.1 Addressing Transition Overheads
    6.2.2 Practical Considerations
  6.3 Workload Mixes with Nonuniform Retirement Rates

VII SUMMARY

REFERENCES

LIST OF TABLES

1 Test Platform Configuration
2 Test Platform Configuration
3 Workload Execution-Time Statistics
4 Comparison of Predictor Overheads (in ns)
5 Comparison of Normalized RMS Prediction Error
6 Test Platform Configuration
7 The size of the linked list in the different configurations of the membound workload
8 State updates for a VIC source over a reservation period for Example 1
9 Workload Mixes with Uniform and Nonuniform Instruction-Retirement Rates

LIST OF FIGURES

1 The overall structure of the Linear Adaptive Models-based System
2 The periodic-real-time task model
3 Maximum, mean, and minimum decoding times of 10 high-definition MPEG4 video files
4 Example schedule of two SRT tasks, $\tau_1$ and $\tau_2$, with a constant budget allocation
5 Job execution times from a synthetic workload
6 VFT error for the FTU policy (top) and PBS policy (bottom) with the synthetic workload shown in Figure 5
7 Overall structure of the asynchronous-budget mechanism
8 Basic template of an SRT task
9 Computation times of two tasks in the synthetic workload
10 VFT errors for the two tasks during the overload time period for the synthetic workload shown in Figure 9
11 Histogram of job-execution times of $\tau_1$ (top) and $\tau_2$ (bottom), respectively
12 ACF of job-execution times of $\tau_1$ (top) and $\tau_2$ (bottom), respectively
13 Trade-off between miss-rate and VFT error with a constant budget allocation
14 Trade-off between miss-rate and VFT error with PBS-based budget allocation
15 Overall structure of the Two-Stage Prediction algorithm
16 Summary of the normalized VS LMS algorithm
17 Summary of terms used for the TSP algorithm
18 Summary of the second stage of the TSP algorithm
19 Budget allocation in the implementation of PBS. Job-execution times consist of only workload-specific computation
20 Budget allocation in the implementation of TSP. Execution-time prediction is done in the context of the corresponding SRT task. Job-execution time consists of both workload-specific computation and prediction-related computation
21 Template of an SRT task
22 List of values required from a predictor for budget allocation. The variables $\hat{c}_0[j]$, $e_0[j]$, $\hat{c}_l[j]$, and $e_l[j]$ are the single-step and multi-step predictions and prediction errors as defined in Figure 17
23 Algorithm for computing budget allocations for SRT tasks
24 Performance of the TSP algorithm with a speech-decoding workload
25 Performance of the TSP algorithm with a speech-encoding workload
26 Performance of the TSP algorithm with an audio-decoding workload
27 Performance of the TSP algorithm with a video-decoding workload
28 Performance of the TSP algorithm with an audio-encoding workload
29 Performance of the TSP algorithm with a video-encoding workload
30 Average per-job CPU usage for the membound workload in the "cache line" configuration at different clock frequencies
31 Average per-job CPU usage for the membound workload in the "thrash" configuration at different clock frequencies
32 Average per-job CPU usage for the H.264 video-decoding workload at different clock frequencies
33 The average per-job URIC (top) and job-execution time (bottom) for different configurations of the membound workload described in Table 7
34 Average CPU frequency (top) and average user-level instruction-retirement rate (bottom) over time for a video-decoding workload
35 Plot of VIC(t) against time over a single reservation period for Example 1. Mode transitions are marked with dashed lines
36 Plot of VIC(t) against time over a single reservation period for Example 2. Scheduling events are marked with circles on the VIC(t) line
37 Overall structure of an implementation of the VIC-based budget mechanism in Linux
38 Plots of average per-job CPU usage (measured in VIC) against CPU-clock frequency for three different workloads
39 Per-job CPU usage for the membound "thrash" workload. The CPU frequency was switched between 1.2GHz and 2.3GHz every 0.5s. CPU usage is shown in terms of job-execution times (top) and VIC (bottom)
40 VFT error for the workload shown in Figure 39 with time-based budget allocation (top) and VIC-based budget allocation (bottom)
41 Scatter plot of average bandwidth allocation vs deadline-miss rate for a video-decoding workload with time-based budget allocation (O) and VIC-based budget allocation (X). CPU frequency was kept constant at 1.2GHz (red), 1.8GHz (blue), and 2.3GHz (black)
42 Scatter plot of average bandwidth allocation vs deadline-miss rate for a video-decoding workload with time-based budget allocation (O) and VIC-based budget allocation (X). CPU frequency was switched between 1.2GHz and 2.3GHz every half a second
43 Average power-consumption rate at different CPU clock frequencies for a video-decoding workload
44 Per-job CPU usage vs job-release time for SRT Task 1 from the two workload mixes described in Table 9
45 VFT error vs job-release time for SRT Task 1 from the two workload mixes described in Table 9

CHAPTER I

INTRODUCTION

From smart phones and multimedia players to mobile gaming platforms, mobile embedded devices are ubiquitous. Given the demand for such devices to have higher performance, longer battery life, and a lean form factor, power management is an essential component of such devices [36, 54]. Power management has also become a key issue for servers and data center operations [16]. The total cost of energy for data centers includes not only server electricity bills, but also the cost of energy for the cooling infrastructure and provisioning costs. Power management is essential to manage the cost of operating such large-scale systems.

In both mobile embedded devices and large-scale servers, the performance demand for the computing system is not uniform over time. The peak demand for computing performance is often significantly higher than the average-case demand. "Mobile use-case studies show that most mobile devices are typically in active standby state for eighty percent of the time, and process intensive mobile applications twenty percent of the time" [3]. Similarly, most of the time, "servers operate between 10 and 15 percent of their maximum utilization levels" [16]. Such variability in load motivates the use of platforms with modes of operation, where the different modes of operation offer different levels of trade-off between low power and high performance.

Typically, the modes of operation supported by servers or mobile embedded platforms are based on dynamic voltage scaling (DVS), clock gating, and power gating. Recent mobile embedded platforms support more exotic power-management mechanisms based on heterogeneous multiprocessors. All of these power-management mechanisms require a software component to monitor the workload, to determine the appropriate level of trade-off between low power and high performance, and to choose the mode or modes of operation that will result in the desired level of trade-off.

The appropriate level of trade-off depends on the goals of the system. In many large-scale data centers, there may be limits on the maximum power consumption supported by the infrastructure. In such cases, the goal may be to maximize performance given maximum power constraints. Also, there are cases where servers host time-sensitive applications, and the goal is to minimize power consumption given timing constraints or performance constraints. In this dissertation, the latter case is addressed. Specifically, the focus of this dissertation and the presented system is power management under soft timing constraints.

There is a significant body of previous work addressing the problem of power management for systems hosting real-time applications, and this work is primarily based on dynamic voltage scaling. However, much of this previous work is impractical: it relies on assumptions that are increasingly unrealistic for modern platforms and applications, and on a priori knowledge that cannot be obtained without extensive off-line tuning. To address some of these issues, a novel approach, called the Linear-Adaptive-Models-based System (LAMbS), is presented in this dissertation.

LAMbS is based on a generic model that describes the hardware platform in terms of discrete modes of operation. No assumption is made about the relationship between power and performance associated with the modes of operation. Therefore, LAMbS is applicable to platforms where typical assumptions about DVS do not apply, or even to platforms where power management is not based on DVS. Furthermore, power-management decisions in LAMbS are based on real-time measurements of the workload and real-time measurements of the power and performance associated with each mode of operation. As a result, LAMbS does not require static off-line tuning or access to any a priori knowledge. An overview of LAMbS is presented in the following section.

1.1 LAMbS: Linear-Adaptive-Models-based System

Figure 1: The overall structure of the Linear Adaptive Models-based System.

The overall structure of LAMbS is shown in Figure 1. LAMbS consists of several components: a CPU-budget allocator, power and performance model adapters, an MILP solver, and other kernel-level components. The CPU-budget allocator, model adapters, and solver make up the controller. The kernel-level components encapsulate the plant, that is, the computing system to be controlled.

In LAMbS, time is divided into periodic intervals called reservation periods. CPU-budget allocations and power-management decisions are made at reservation-period boundaries. As described earlier in this chapter, it is assumed that the hardware platform is configurable to one of a number of discrete modes of operation. Each mode of operation offers a different level of trade-off between high performance and low power. Power-management decisions are in the form of vectors called operation-mode schedules, $\vec{\tau}$, that denote the amount of time to be spent in each mode of operation during the following reservation period.

The inputs to the controller include the CPU budget consumed by each task, $\vec{c}$, the total energy consumed by the system, $E$, and the total number of instructions retired across all tasks, $I$. The inputs are obtained from measurements in the plant at the end of the previous reservation period. The outputs from the controller consist of the budget allocated to each task, $\vec{Q}$, and the operation-mode schedule, $\vec{\tau}$. The controller outputs affect the plant during the following reservation period.

At the beginning of each reservation period, the scheduler computes the CPU budget to allocate to each task over the next reservation period. Typically, in previous work, CPU budget is allocated in terms of CPU time or clock cycles. However, the number of nanoseconds or cycles required to complete a job depends on the mode of operation in which the job is run. On the other hand, power-management decisions depend on the total CPU budget committed to tasks. To simplify the problem of budget allocation and power management in LAMbS, computation is measured and allocated in terms of virtual instructions.

Virtual instruction count (VIC) is an abstract measure of computation that approximates retired-instruction count. Virtual instructions can be allocated to tasks and consumed by tasks in the same way that CPU time or clock cycles are allocated and consumed. The motivation for using VIC-based CPU budgets is described in greater detail in Chapter 5. The vector of VIC values allocated to each task is denoted $\vec{Q}$ in Figure 1. The total requested VIC, denoted $Q_{total}$ in Figure 1, is the sum of the number of virtual instructions allocated to all tasks.
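As an illustration of how such a budget can be metered, the following is a minimal sketch of VIC accounting, assuming (as Chapter 5 describes in detail) that VIC accrues at the estimated average instruction-retirement rate of whichever mode of operation is currently active. All names, rates, and the fixed mode count here are illustrative, not the thesis implementation.

    #include <stdint.h>

    #define NUM_MODES 4

    /* Estimated instruction-retirement rate of each mode of operation
     * (instructions per nanosecond), maintained by the performance-model
     * adapter; the values below are placeholders. */
    static double mu_hat[NUM_MODES] = { 2.1, 1.6, 1.2, 0.9 };

    static double   vic;          /* accumulated virtual instructions   */
    static int      cur_mode;     /* currently active mode of operation */
    static uint64_t last_update;  /* time of the last VIC update (ns)   */

    /* Advance the VIC counter to time `now` (in ns): VIC grows at the
     * estimated retirement rate of the active mode. */
    static void vic_update(uint64_t now)
    {
        vic += mu_hat[cur_mode] * (double)(now - last_update);
        last_update = now;
    }

    /* On a mode switch, charge the time spent in the old mode first. */
    static void vic_mode_switch(uint64_t now, int new_mode)
    {
        vic_update(now);
        cur_mode = new_mode;
    }

A task's VIC budget is then depleted by the growth of this counter while the task runs, just as a time-based budget is depleted by elapsed CPU time.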

Virtual instruction count is based on approximating retired-instruction count and requires an estimate of the average instruction-retirement rate in each mode of operation. Also, computing the optimal operation-mode schedule requires an estimate of the average power-consumption rate in each mode of operation. As described earlier, the total number of instructions retired, $I$, and the total energy consumed over the previous reservation period, $E$, are measured at the end of each reservation period and fed back into the controller. The vector of times spent in each mode of operation, $\vec{\tau}$, is fed back into the controller as well. These measurements are used by the model adapters to estimate the average power-consumption rate, $\hat{\vec{p}}$, and instruction-retirement rate, $\hat{\vec{\mu}}$, associated with each mode of operation.

The variables $Q_{total}$, $\hat{\vec{\mu}}$, and $\hat{\vec{p}}$ are used to compute the optimum operation-mode schedule, $\vec{\tau}$. The optimum operation-mode schedule is one that minimizes total power consumption while ensuring that the increase in VIC over the following reservation period is greater than or equal to $Q_{total}$. To ensure that a solution exists for this optimization problem, $Q_{total}$ is saturated based on the length of the reservation period and the maximum estimated instruction-retirement rate of the available modes of operation.
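Ignoring, for the moment, the integer aspects introduced by idle states and transition overheads (Chapter 6), the core of this optimization can be sketched as a small linear program over the operation-mode schedule; this formulation is a sketch consistent with the description above, not the exact program solved in LAMbS. For a platform with $M$ modes and a reservation period of length $T$:

$$\min_{\vec{\tau}} \; \hat{\vec{p}}^{\,T} \vec{\tau} \quad \text{subject to} \quad \hat{\vec{\mu}}^{\,T} \vec{\tau} \ge Q_{total}, \qquad \sum_{m=1}^{M} \tau_m = T, \qquad \tau_m \ge 0,$$

where $\hat{p}_m$ and $\hat{\mu}_m$ are the estimated power-consumption and instruction-retirement rates of mode $m$. Saturating $Q_{total}$ at $T \max_m \hat{\mu}_m$ guarantees that the constraint set is nonempty.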

The kernel-level components execute the decisions of the controller. CPU-budget allocations are enforced using a reservation-based scheduler. The operation-mode schedule is executed by the operation-mode scheduler. Lastly, instruction-count and energy measurements, the timing of the reservation periods, and invocation of controller components at reservation-period boundaries are all handled by kernel-level components.

In the implementation-related details presented in this dissertation, the modes of operation are limited to CPU frequency scaling. Also, the scope of this dissertation is limited to general budget-allocation algorithms and the VIC-based budget-allocation algorithm. While possible implementation approaches are described in Chapter 6, implementation details and experimental results on the operation-mode scheduler and linear optimizer are beyond the scope of this dissertation.

1.2 Main Contributions

Overall, LAMbS is a novel approach to power management that adapts not only to the changing computational load of hosted applications but also to the changing power and performance characteristics of the managed hardware. The research presented in this dissertation makes a number of contributions.

1. First, asynchronous budget allocation is presented. In previous papers on adaptive budget allocation, CPU budget is adapted on job completion. In the event of a severe under-allocation, there is significant delay before the job completes and the system is able to correct for the under-allocation. In this dissertation, an algorithm is presented where budget allocations are adapted at reservation-period boundaries. Simulation results are presented to demonstrate the advantages of this faster rate of adaptation. Furthermore, a proof of queue stability is presented. Experimental results are presented for a Linux-based implementation that demonstrates the feasibility of this algorithm and its stability under overload conditions.

2. The second contribution of this dissertation is the two-stage prediction (TSP) algorithm. The TSP algorithm is developed as an improvement on asynchronous budget allocation. In this algorithm, the first stage predicts the execution time of the next job each time a job completes. The first stage also provides estimates of the prediction-error statistics. The second stage uses the output of the first stage and additional feedback from the OS kernel to compute the appropriate budget allocation at reservation-period boundaries. Because budget allocations are updated at reservation-period boundaries, it is possible to respond quickly to under-allocations.

Also, this architecture allows each application to exploit application-specific domain knowledge in the first-stage predictor. Furthermore, the second stage of the TSP algorithm is designed to limit the probability of jobs missing deadlines. This probability bound is proved using a variant of Chebyshev's inequality; a sketch of such a bound is given after this list.

3. As a part of the research on TSP, a novel algorithm is presented for predicting execution times, called the LMS-MA hybrid predictor. This algorithm consists of an array of LMS filters and moving-average filters and is designed to exploit correlation in job-execution times. Use of LMS filters for predicting execution times has not been seen in the reviewed literature. For certain workloads, LMS filters provide more accurate predictions than simple moving averages.

Furthermore, this algorithm is designed with the goal that it can be used without tuning. To that end, the LMS filter is used to eliminate the need to tune the weights of the filter. To eliminate the need to tune step sizes, variable step sizes are used. An array of filters of different lengths is used to eliminate the need to tune filter lengths. Also, the array includes uniformly-weighted moving averages for workloads where execution times are not correlated. Such an approach has not been seen in the reviewed literature on LMS algorithms.

4. Another contribution of this dissertation is virtual instruction count (VIC). As described in Section 1.1, VIC is a novel abstract measure of computation for budget allocation. In previous work on reservation-based scheduling and power management, CPU budget is specified in terms of clock cycles and CPU time. VIC is an approximation of retired-instruction count based on estimated retirement rates. Experimental results are presented to demonstrate that VIC-based budget allocation is more efficient than time-based budget allocation in the presence of frequency scaling.

5. Another contribution of this dissertation, related to VIC, is the adaptive performance model. The performance of each mode of operation is specified in terms of instruction-retirement rate. Instruction-retirement rate is a more accurate measure of performance than clock frequency. Also, instruction-retirement rate is a more flexible measure of computation that can be applied to power-management mechanisms other than frequency scaling. The average instruction-retirement rate is estimated for each mode of operation using an LMS filter and instruction counts. The estimated retirement rates are used for VIC-based budget allocation as well as for power-management decisions.

6. An adaptive power model is presented as well. The power-consumption rate is dynamically estimated for each mode of operation using run-time energy measurements. These rates are estimated using LMS filters in the same way that instruction-retirement rates are estimated. Accurate estimates of power-consumption rates should allow for more efficient power management.

Efficient run-time energy measurements are possible due to a built-in energy-measurement mechanism in Intel processors that was not available until recently. It is possible to obtain accurate dynamic estimates of power-consumption rates due to such run-time energy measurements. In previous work on power management, energy was measured using oscilloscopes, multimeters, and power meters. Direct run-time measurement of energy usage was not possible. In some papers, energy usage is estimated using linear models based on architectural events. However, such models are architecture-specific and have to be derived off line. In contrast, the approach presented in this dissertation is simpler and does not require any off-line tuning.

7. Finally, the problem of power management is posed differently in this dissertation compared to previous work. In most previous work, the power-management decision involves choosing the mode of operation based on the progress of the total workload. In LAMbS, the amount of CPU capacity to be committed across all tasks is decided at the start of each reservation period. Then, the power-management decision involves choosing how much time to spend in each mode of operation over that reservation period, accommodating the committed CPU capacity. This power-management decision is based on the dynamic power model and performance model and takes into account both active modes of operation and idle states. It is shown that, in this form, power management is a mixed-integer-linear-programming problem.
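To make the probability bound in contribution 2 concrete, the following is a sketch of the kind of bound involved; the thesis proves its own variant in Chapter 4, and the use of the one-sided (Cantelli) form here is an illustrative assumption. Suppose the execution time $c[j]$ of the next job has mean $\hat{c}[j]$ and variance $\sigma^2$, and the allocated budget is $Q = \hat{c}[j] + k\sigma$ for some $k > 0$. Then

$$\Pr\{\, c[j] > Q \,\} = \Pr\{\, c[j] - \hat{c}[j] > k\sigma \,\} \le \frac{1}{1 + k^2},$$

so choosing $k \ge \sqrt{1/\delta - 1}$ limits the probability that the job overruns its budget, and hence can miss its deadline, to at most $\delta$.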

1.3 Dissertation Outline

The remainder of this dissertation is organized as follows. Background material and previous work on power management and scheduling are presented in Chapter 2. Power-management mechanisms are discussed, including those available on commercial processors. Scheduling and voltage-scaling algorithms are discussed for hard-real-time and soft-real-time systems. Techniques are discussed for addressing non-ideal behavior on real-world platforms. The chapter concludes with a discussion on how LAMbS improves on previous approaches to power management.

An algorithm for asynchronous budget allocation is discussed in Chapter 3. In this chapter, the real-time-task model and CPU-reservation model are discussed. The PBS algorithm is presented in detail with a proof of queue stability. A Linux-based implementation of the algorithm is described. Experimental results are presented for synthetic and video-decoding workloads.

In Chapter 4, the two-stage-prediction (TSP) algorithm is presented. A first-stage prediction algorithm, called the LMS-MA hybrid predictor, is presented. The second stage of the TSP algorithm is developed based on Chebyshev's inequality. Experimental results are presented for a Linux-based implementation of the algorithm and a wide range of multimedia workloads.

In Chapter 5, virtual instruction count (VIC) is developed as an alternative measure of computation for budget allocation. Results are presented from frequency-scaling experiments to demonstrate the need for such an alternative. Examples and implementation details are presented for VIC-based budget allocation. Experimental results are presented to demonstrate the advantages of VIC-based budget allocation over time-based budget allocation.

Future work, mainly addressing power management, is presented in Chapter 6. An adaptive power-usage model is presented. The power-usage model is modified to account for transition overheads. Based on this power model, power management is posed as a mixed-integer-linear-programming problem. In addition to power management, issues arising from nonuniform workload mixes are discussed as well. Finally, a summary of this dissertation is presented in Chapter 7.

CHAPTER II

POWER MANAGEMENT OF REAL-TIME SYSTEMS

A real-time system consists of a set of tasks with timing constraints. These tasks consist of periodic bursts of computation that must be completed within corresponding deadlines. A schedule or scheduling algorithm describes how a set of tasks share the CPU over time. A task set is said to be feasible if there exists a schedule that allows computation from all tasks to be completed on time. The feasibility of a task set depends on the rate and magnitude of the requests for CPU time and on the algorithm used to distribute time on the CPU among the tasks. Real-time scheduling entails determining whether a real-time task set is feasible and computing a schedule that allows for a timely allocation of CPU time to the contending tasks.

Real-world tasks often exhibit variability in their use of CPU time. The average-case demand for CPU time may be significantly lower than the peak demand. On the other hand, the computing platform hosting the tasks may have modes of operation that trade off higher performance for lower power consumption. In the common case, when the load on the CPU is small, it may be acceptable to switch the platform to a mode of operation with lower performance, given that this switch may lead to overall energy savings. Power management of real-time systems entails trading off just enough platform performance for lower power consumption such that overall energy consumption is reduced while the timing constraints of all tasks are still satisfied.

The remainder of this chapter contains a literature review of previous scheduling and power-management algorithms for real-time systems. Power dissipation and power-management mechanisms are discussed in Section 2.1. The periodic-real-time task model and two fundamental scheduling algorithms are presented in Section 2.2. Dynamic voltage and frequency scaling (DVFS) algorithms are discussed in Section 2.3. Soft-real-time tasks and the power management of systems with such tasks are discussed in Section 2.4. Non-ideal behavior of real-world platforms and previous approaches to managing such platforms are discussed in Section 2.5. Finally, a comparison is presented in Section 2.6 between LAMbS and approaches to real-time power management presented in previous work.

2.1 Power Management Mechanisms

The most power-consuming component of a computer system is often the processor, and processors are built on CMOS technology. Other components, such as GPUs and SRAM-based caches, are also built on CMOS technology. This section presents a model for power dissipation in CMOS devices and techniques used in such devices to dynamically trade off performance for lower power consumption.

Power dissipation in digital CMOS circuits can be expressed as shown in (1) [22]. This expression consists of three terms: switching power, short-circuit power, and leakage power.

$$P_{total} = P_{switching} + P_{short\text{-}circuit} + P_{leakage} \qquad (1)$$

$$P_{switching} = p_t C_L V_{dd}^2 f_{clk}, \qquad P_{short\text{-}circuit} = I_{SC} V_{dd}, \qquad P_{leakage} = I_{leakage} V_{dd}$$

Dynamic power dissipation is made up of switching power and short-circuit power. Switching power, $P_{switching}$, is associated with charging and discharging the loading capacitance, $C_L$, at the output of gates. The supply voltage is denoted $V_{dd}$. The probability that a power-consuming transition occurs at the output of a gate is denoted $p_t$. The term $p_t C_L$ is sometimes replaced with the effective capacitance, $C_{effective}$. Short-circuit power, $P_{short\text{-}circuit}$, is the additional power dissipated in the short time interval during switching when both the PMOS and NMOS transistors are simultaneously active. This creates a short circuit between $V_{dd}$ and ground, produces a short-circuit current, $I_{SC}$, and causes power loss.

Leakage power is a part of static power. It is associated with the leakage current, $I_{leakage}$, that flows through transistors even when they are "turned off". In the past, switching power was the most significant component of total power dissipation. However, with shrinking process technology and decreasing supply voltage, the leakage component of total power has become significant [45]. Further details on the sources of static power are presented in [45].

Most papers on CPU power management relate to systems based on Dynamic Voltage Scaling (DVS). DVS entails scaling the supply voltage and clock frequency of a device to trade off performance for lower dynamic switching power. Switching power, $P_{switching}$, is proportional to the square of $V_{dd}$. Therefore, any decrease in $V_{dd}$ yields a quadratic decrease in $P_{switching}$. However, decreasing $V_{dd}$ increases circuit delay and limits the maximum clock frequency, $f_{clk}$. Furthermore, $P_{switching}$ is also proportional to clock frequency. As a result, a linear decrease in supply voltage and clock frequency yields a cubic decrease in switching power at the cost of a linear decrease in performance.
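To make the cubic relationship concrete under the first-order model of (1): if both $V_{dd}$ and $f_{clk}$ are scaled by the same factor $\alpha$, then

$$P'_{switching} = p_t C_L (\alpha V_{dd})^2 (\alpha f_{clk}) = \alpha^3 \, p_t C_L V_{dd}^2 f_{clk} = \alpha^3 P_{switching}.$$

For example, at $\alpha = 0.5$, switching power drops to one eighth of its original value while performance is halved; since the same work then takes twice as long, the switching energy consumed is one quarter of the original.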

DVS-based power management involves reducing or eliminating CPU idle time by slowing and lengthening execution times. This is ideal if the CPU consumes power at roughly the same rate when idle as when performing valid computation. An alternative to the DVS-based strategy of shortening CPU idle time is to maintain the idle time and then transition the CPU to one of a number of lower-power sleep states when the CPU is idle. Sleep states can be implemented through clock gating or power gating. Clock gating may involve disabling the clock signal to components, disabling parts of the clock-distribution network, or disabling clock generation altogether [19]. Power gating involves shutting off the power supply to components being put to sleep.

Both DVS and clock gating only reduce dynamic power, whereas power gating also reduces static power. As in the case of DVS, there is a trade-off between power and performance in using sleep states. When transitioning to a sleep state, a system incurs overheads in power and performance. Deeper, lower-power sleep states have higher transition overheads. Transitioning to a sleep state is only possible if the CPU idle time is longer than the transition times to and from that sleep state. Also, transitioning to a sleep state is only worthwhile if the CPU idle time is longer than the break-even time. The break-even time is the minimum time a component must spend in the sleep state to make up for any excess energy dissipated in the sleep-state transition. Break-even time is discussed in greater detail in [19].

2.1.1 Power Management Mechanisms in Commercial Processors

Most consumer desktops and laptops use x64 processors from Intel and AMD, which implement frequency scaling. Frequency scaling is referred to as "SpeedStep" on Intel processors and "PowerNow!" or "Cool'n'Quiet" on AMD processors [41, 10]. Most Intel and AMD processors also implement one or more low-power idle states called C-states [41]. These low-power states may involve different levels of clock gating, voltage reduction, or power gating [68].

The e500mc PowerPC architecture from Freescale also supports a number of idle states based on clock gating [34]. In addition to clock gating at the CPU level, the e500mc also shuts off the clock signal for some components when those components are idle.

Both power gating and DVS have been adopted in mobile embedded processors as well, such as the Snapdragon S4 system that uses the Krait micro-architecture from Qualcomm [2]. The Krait micro-architecture is described as an asynchronous Symmetrical Multi-Processor (aSMP) system, referring to the fact that each core on the Krait can be configured independently to a different voltage and frequency or power gated.

An alternative approach to power management, taken by ARM, is big.LITTLE processing [36]. A big.LITTLE system consists of multiple processors that are identical in the interface exposed to the system and application programmer but different in terms of the micro-architecture. The first implementation of big.LITTLE consists of a big Cortex-A15 and a LITTLE Cortex-A7. The larger Cortex-A15 trades off energy efficiency for high performance, whereas the Cortex-A7 trades off high performance for energy efficiency. A key innovation in this approach is the task-migration use model enabled by the less-than-20µs task-migration time between the processors. An application only operates on the Cortex-A15 or the Cortex-A7, but not both at the same time. From the programmer's perspective, it appears that a single processor is switching its mode of operation between a low-power Cortex-A7 mode and a high-performance Cortex-A15 mode.

A similar approach, called variable Symmetric Multi-Processing (vSMP), is used in NVIDIA's Tegra 3 (Kal-El) [3]. The vSMP system consists of five cores that share the same ARM Cortex-A9 architecture, and each core can be power-gated independently. However, one of the five cores, referred to as a companion core, is built using a special low-power silicon process that allows this core to operate efficiently at lower frequencies. Either the companion core or one or more of the main cores are active at any given time. The switching time to change from the companion core to the main cores is estimated to be on the order of 2ms. While the number of high-performance cores that are active might change with workload, all active high-performance cores operate at the same frequency, unlike the cores in Qualcomm's aSMP approach.

Hybrid Symmetric Multi-Processing (Hybrid-SMP) from Marvell uses a combination of the aSMP and vSMP approaches [54]. The Hybrid-SMP system consists of two high-performance cores, termed HPM, and a low-power core, termed LPM. As in the case of big.LITTLE and vSMP, the processors are 100 percent compatible from the programmer's perspective and offer a trade-off between low power and high performance. However, as in the case of aSMP, all processors may be operated simultaneously and voltage-scaled and power-gated independently. In addition, components associated with unused features of the processor, like SIMD operations, are clock-gated.

2.1.2 Power Management of Additional Components

In the above discussion, only CPU power management is addressed. Other system components, such as memory and storage, may also support multiple modes of operation with trade-offs between power and performance. For example, manufacturers have developed DRAM chips that transition to one of a number of sleep states when idle [33]. As in the case of the CPU idle states discussed above, deeper sleep states have greater transition overheads but also save more power. Also, a number of recent papers explore the application of DVS to memory. DVS in DRAM chips is discussed in detail in [30].

2.2 Real-time Scheduling

For all the commercial solutions mentioned in the previous section, there is some form of software that actively monitors the CPU utilization and workload in the system and applies some heuristics to determine the best mode of operation for the system. While this approach is appropriate for non-real-time systems, a different approach is needed for real-time systems.

Real-time systems consist of one or more tasks that share limited CPU time and have timing constraints. To ensure that task timing constraints are met, the available CPU time must be sufficient to accommodate all competing tasks, and the tasks must be appropriately scheduled. The minimum CPU time required for tasks to meet timing constraints depends not only on the CPU usage of the tasks, but also on the scheduling algorithm. When a power-management mechanism trades off performance for lower power, tasks require greater CPU time to perform the same jobs. The trade-off in performance must not be so excessive as to compromise the schedule and prevent tasks from meeting timing constraints. In the remainder of this section, the periodic-real-time task model is presented and relevant scheduling algorithms are described. Power management of real-time systems is described in greater detail in the following section.

The periodic real-time task model was first presented in the Liu and Layland analysis [50]. A real-time task, $\tau_i$, consists of a sequence of bursts of computation called jobs or activations, $J_{i,1}, J_{i,2}, \ldots, J_{i,j}$. Each job has a release time, $r_{i,j}$, such that the job can be performed only after this release time and can be performed completely without blocking after the release time. Also, each job must complete before a deadline, $d_{i,j}$. The exact execution time for a job is denoted $c_i[j]$, and the worst-case execution time (WCET) for the task is denoted $C_i$. It is generally assumed that the WCET is known.

For periodic real-time tasks, the release times of successive jobs are separated by a constant task period, $T_i$, such that $r_{i,j+1} = r_{i,j} + T_i$. It is generally assumed that the job deadline relative to the release time is equal to the task period, $d_{i,j} = r_{i,j} + T_i$. Equivalently, the job deadline is assumed to be equal to the release time of the following job, $d_{i,j} = r_{i,j+1}$. For the purposes of classical scheduling algorithms, a task is completely defined by the corresponding task period and WCET. Aspects of the periodic-real-time task model are presented in Figure 2 for clarity.

Figure 2: The periodic-real-time task model

The Liu and Layland analysis also presents two key algorithms: Earliest Deadline First (EDF) and Rate Monotonic (RM). These algorithms address the problem of scheduling two or more periodic real-time tasks on a single processor and have been used as the basis for some of the reservation-based schedulers and power-management algorithms discussed later in this chapter. Rate Monotonic is a static-priority scheduling algorithm where tasks are assigned priorities in order of the task periods. The task with the shortest task period is assigned the highest priority. Earliest Deadline First is a dynamic-priority scheduling algorithm where tasks are assigned priorities dynamically based on the time remaining until the next deadline. The task with the earliest deadline is assigned the highest priority.

If a given scheduling algorithm is able to schedule a set of tasks such that all job deadlines are met, the task set is said to be feasible under that scheduling algorithm. A schedulability test is used to determine if a task set is feasible. In the case of EDF and RM, the schedulability test is based on a value called the utilization factor. The utilization factor, $U$, for a set of $N$ tasks is defined as follows:

$$U = \sum_{i=1}^{N} \frac{C_i}{T_i}. \qquad (2)$$

It is shown in [50] that a task set is feasible under EDF or RM if the utilization factor is less than or equal to some least upper bound (LUB). The LUB is 1.0 for EDF and a function of the number of tasks for RM (around 0.7 in the limit). It is further shown that RM is an optimal fixed-priority algorithm and that EDF is an optimal dynamic-priority algorithm. Optimality implies that any task set that is feasible under some algorithm in the same class (static or dynamic priority) is also feasible under RM or EDF, respectively.
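For illustration, a minimal sketch of this utilization-based test; the task set is hypothetical, and the RM bound used is the Liu and Layland bound $N(2^{1/N} - 1)$:

    #include <math.h>
    #include <stdio.h>

    struct task { double wcet, period; };   /* C_i and T_i */

    /* Utilization factor U of equation (2). */
    static double utilization(const struct task *ts, int n)
    {
        double u = 0.0;
        for (int i = 0; i < n; i++)
            u += ts[i].wcet / ts[i].period;
        return u;
    }

    int main(void)
    {
        struct task ts[] = { { 2.0, 10.0 }, { 3.0, 15.0 }, { 5.0, 20.0 } };
        int n = sizeof ts / sizeof ts[0];
        double u      = utilization(ts, n);             /* 0.65 here */
        double lub_rm = n * (pow(2.0, 1.0 / n) - 1.0);  /* ~0.780    */

        printf("U = %.3f\n", u);
        printf("feasible under EDF: %s\n", u <= 1.0    ? "yes" : "no");
        printf("feasible under RM:  %s\n", u <= lub_rm ? "yes" : "no");
        return 0;
    }

For this example task set, $U = 0.65$, which passes both tests; note that the RM test based on the LUB is sufficient but not necessary.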

EDF and RM are intended for process-control applications. While the assumptions about the tasks stated above are appropriate for control applications, they are restrictive for other applications. Papers that followed the Liu and Layland analysis present ways to relax some of the above constraints. For example, EDF and RM can be modified to handle aperiodic tasks by replacing the task periods in the computation of the utilization factor with minimum inter-arrival times. Some concept of minimum inter-arrival time and worst-case execution time is necessary to have an upper bound on the rate of demand for CPU time.

2.3 Dynamic Voltage and Frequency Scaling

A task set is schedulable if the utilization factor is no greater than the least upper bound corresponding to the scheduling algorithm. When the utilization factor is less than the LUB, the difference between the LUB and the utilization factor represents excess CPU capacity called slack. Most power-management algorithms for real-time systems are based on exploiting this slack: trading off just enough performance for low power consumption such that the slack is reduced and the schedulability condition is still satisfied.

Also, previous work on power management of real-time systems has been based primarily on DVS and on certain assumptions about the managed system. It is assumed that CPU power dominates total system power and that dynamic switching power dominates total CPU power. It is further assumed that the CPU-clock speed can be set to arbitrary positive values up to some maximum and that execution times scale inversely with CPU-clock speed. Specifically, if the execution time of a job is $c$ at the fastest CPU-clock speed, then the execution time is $c/\alpha$ when the CPU-clock speed is scaled by some factor $\alpha \in (0, 1]$. Lastly, it is assumed that the overhead of transitioning between CPU frequencies is negligible.

The simplest power-management algorithm entails exploiting static slack and is referred to as Static RT-DVS in [60]. Let $\alpha \in (0, 1]$ denote the scaling factor for the CPU-clock speed. Let $C_i$ denote the WCET of a task $\tau_i$ when $\alpha = 1.0$ and the CPU is configured to run at maximum speed. Given the above assumptions, the scaled WCET of $\tau_i$ is then $C_i/\alpha$. Substituting the WCET values in (2) with the scaled WCET values yields an expression for the scaled utilization factor, $U_{scaled}$:

$$U_{scaled} = \sum_{i=1}^{N} \frac{C_i/\alpha}{T_i}. \qquad (3)$$

A task set is schedulable under EDF or RM if the utilization factor is less than the LUB corresponding to EDF or RM, respectively. A smaller value of $\alpha$ corresponds to a slower clock speed. As $\alpha$ decreases, $U_{scaled}$ increases and approaches the LUB. The minimum value of $\alpha$, denoted $\alpha_{min}$, that satisfies the schedulability condition can be determined by setting the RHS of (3) equal to the LUB and solving for $\alpha$. The CPU speed is set accordingly in Static RT-DVS. An expression for $\alpha_{min}$ is shown below:

$$\alpha_{min} = \frac{1}{LUB} \sum_{i=1}^{N} \frac{C_i}{T_i} = \frac{U}{LUB} \qquad (4)$$
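As a hypothetical numerical example: for two tasks with $C_1/T_1 = 0.2$ and $C_2/T_2 = 0.3$, the utilization factor is $U = 0.5$. Under EDF ($LUB = 1.0$), (4) gives

$$\alpha_{min} = \frac{U}{LUB} = \frac{0.5}{1.0} = 0.5,$$

so the CPU can be run at half of its maximum speed, at which point $U_{scaled} = 0.5/0.5 = 1.0$, exactly at the EDF bound.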

2.3.1 Dynamic Slack Reclamation

Static RT-DVS exploits static slack, which is computed based on WCETs. However, as shown in Section 2.4, job-execution times can vary. The actual execution time of a job may be significantly lower than the WCET for the corresponding task. Jobs that complete using less CPU time than the WCET expose additional unused CPU capacity called dynamic slack. Dynamic slack reclamation entails exploiting dynamic slack by reducing the CPU speed further as jobs complete early.

Cycle-conserving RT-DVS (CC RT-DVS) is a slack-reclaiming DVS algorithm presented in [60]. There are two variations of the algorithm, for the EDF and RM schedulers. In the case of the EDF scheduler, the utilization factor is recomputed on each job completion. The new utilization factor is computed by replacing the WCET for the corresponding task with the actual execution time of the newly completed job. Given the reduced utilization factor, the CPU speed can be reduced further according to (4). When a new job is released, the utilization factor must be computed again with the WCET, since the actual execution time is no longer known. The CPU speed is increased according to the new utilization factor. In the case of the RM scheduler, the speed is still adjusted on job completion and job release, but the adjustment is performed differently, as described in detail in [60].
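A minimal sketch of the cycle-conserving update for the EDF case, with illustrative names; the algorithm in [60] also handles discrete speed levels and per-task bookkeeping details omitted here:

    #define NTASKS 8

    static double wcet[NTASKS];    /* C_i, at maximum speed                */
    static double period[NTASKS];  /* T_i                                  */
    static double usage[NTASKS];   /* current estimate: C_i or actual time */

    static void set_cpu_speed(double alpha) { (void)alpha; /* platform hook (stub) */ }

    /* Recompute the utilization factor with the current estimates; for
     * EDF (LUB = 1.0), by (4) this is also the required scaling factor. */
    static double cc_edf_alpha(void)
    {
        double u = 0.0;
        for (int i = 0; i < NTASKS; i++)
            u += usage[i] / period[i];
        return u;
    }

    /* On job completion, charge the task its actual execution time
     * (expressed at maximum speed) instead of its WCET. */
    static void on_job_complete(int i, double actual)
    {
        usage[i] = actual;
        set_cpu_speed(cc_edf_alpha());
    }

    /* On job release, the actual time is unknown again: revert to WCET. */
    static void on_job_release(int i)
    {
        usage[i] = wcet[i];
        set_cpu_speed(cc_edf_alpha());
    }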

Given the higher LUB associated with the EDF algorithm, power-management algorithms for real-time systems generally perform better with the EDF scheduler. Another slack-reclaiming DVS algorithm based on EDF is the Generic Dynamic Reclaiming Algorithm (GDRA) presented in [14]. The general approach in GDRA is to initially set the CPU-clock speed in the same way as in the Static RT-DVS algorithm. A canonical schedule is defined as the schedule that would result from all jobs running with corresponding WCETs and the CPU running at the Static RT-DVS speed. As jobs complete earlier than anticipated in the canonical schedule, the exposed slack is used to slow the CPU further and save additional power. The canonical schedule is maintained using a data structure called the α-queue. Simulation results show that GDRA performs better than the CC RT-DVS algorithm, as discussed in detail in [14].

2.3.2 Accelerating Voltage Schedules

Both CC RT-DVS and GDRA initially set the CPU to a high clock speed to accommodate the WCET and then reduce the clock speed as jobs complete early. An alternative to this decelerating approach is an accelerating approach that entails deferring work as much as possible. In such an approach, the CPU speed is initially set as low as permitted by the schedulability condition and then accelerated to allow jobs to complete before the deadline. However, if jobs require significantly less CPU time than the WCET, it may not be necessary to run the CPU at the higher speeds. Not having to run at the higher speeds could yield additional power savings.

Look-Ahead RT-DVS (LA RT-DVS) is an EDF-based accelerating DVS algorithm presented in [60]. Tasks are allocated CPU time in reverse-EDF order, starting with the task with the latest deadline. The execution of each task is deferred as far as possible without causing the task to miss the corresponding deadline. This operation exposes additional slack before the deadline of the earliest-deadline task. This slack is used to reduce the CPU speed and reduce power consumption. The CPU speed is recomputed in this way on each job completion and each job release. The details of this algorithm are presented in [60].

Another EDF-based accelerating DVS algorithm is Aggressive-DR [14]. Aggressive-DR is designed to reduce power consumption in the common case. Initially, the CPU speed is set to the Static DVS speed computed based on WCETs. An additional static optimal speed, $S_{optavg}$, is computed assuming that the actual execution time of every job is equal to the average-case execution time and not the WCET. An attempt is made to transfer CPU time from later-deadline jobs to the earliest-deadline job such that the earliest-deadline job can be run at the slower $S_{optavg}$ speed. However, since the CPU time allocated to later jobs is now reduced, those jobs may have to run at higher CPU speeds to complete in the shorter time. The CPU time transferred from later-deadline jobs is never so large as to compromise the schedulability of those jobs at the maximum CPU speed. On the other hand, if the actual execution time of the earliest-deadline job is less than or equal to the average-case execution time, the CPU can continue to run at lower speeds without compromising schedulability. According to simulation-based results, Aggressive-DR performs better than LA RT-DVS. The details of this algorithm are presented in [14].

2.3.3 Intra-Task DVS

CC RT-DVS, LA RT-DVS, GDRA, and Aggressive-DR are all inter-task DVS algorithms that adapt the CPU-clock speed at job boundaries: release time, completion time, and deadline. On the other hand, intra-task DVS algorithms adapt the CPU-clock speed during job execution. All intra-task DVS algorithms are accelerating DVS algorithms.

An example of intra-task DVS is Feedback-DVS [75]. Feedback-DVS splits each job into two halves. The first-half execution time is the predicted job-execution time, while the second-half execution time is the difference between the WCET and the first-half execution time. Job-execution times are predicted using a PID controller. The second half is scheduled to execute at the maximum possible clock speed, whereas the first half is executed at the minimum clock speed that allows timely job completion. Provided that the predicted execution time is greater than or equal to the actual execution time, the second half is never executed and the clock speed is kept low. Simulation results presented in [75] show that Feedback-DVS generally performs better than LA RT-DVS.
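A minimal sketch of the resulting split-speed computation (names assumed for illustration; the PID-based prediction and slack bookkeeping are described in [75]):

    /* Feedback-DVS split: reserve enough time to run the unpredicted
     * remainder (up to the WCET) at full speed, and stretch the predicted
     * first half over whatever time is left.  Times are expressed at
     * maximum speed; `avail` is the time remaining until the deadline. */
    static double first_half_alpha(double predicted, double wcet,
                                   double avail)
    {
        double second_half = wcet - predicted;  /* runs at full speed  */
        double budget = avail - second_half;    /* left for first half */

        if (budget <= 0.0 || predicted >= budget)
            return 1.0;                         /* no slack: full speed */
        return predicted / budget;              /* minimum safe speed   */
    }

If the prediction is accurate, the job finishes within the stretched first half and the reserved full-speed portion is never used.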

Some papers have proposed more fine-grained intra-task DVS to minimize expected CPU energy [37, 23, 70, 24, 61]. The general approach is to assume that every job consists of a number of cycles, $c[j]$, up to some maximum, $C$. While the exact number of cycles required per job is unknown, it is assumed that the probability distribution of the number of cycles is known. Also, it is assumed that the number of cycles required to execute a job is independent of the CPU-clock speed. In theory, these intra-task DVS algorithms can determine the optimal speed at which to execute each job cycle such that the total expected CPU energy is minimized and all job deadlines are met. However, since adapting the CPU speed for every cycle is not feasible, the distribution of required execution cycles is divided into coarser bins, and an optimal speed is determined for each bin.

Since intra-task DVS algorithms are accelerating algorithms, they require static slack. Different intra-task DVS papers differ on how static slack is distributed between tasks, how dynamic slack is reclaimed, and how the voltage schedule is optimized. Also, more recent papers address intra-task scheduling on real-world platforms with discrete CPU speeds [23, 70, 61] and non-negligible static power [24]. Power management with discrete CPU speeds and non-negligible static power is addressed in greater detail in Section 2.5.

2.4 Exploiting Soft Timing Constraints

The scheduling and power-management algorithms discussed in Sections 2.2 and 2.3 are aimed at hard-real-time applications. Hard-real-time applications are not allowed to miss job deadlines, and any missed deadline constitutes an error or failure in the system. To ensure that jobs always complete before corresponding deadlines, EDF, RM, and related power-management algorithms ensure that sufficient CPU capacity is available to accommodate the WCETs of jobs from all tasks. However, execution times can have a lot of variability, and WCETs are often significantly larger than the average-case execution times.

To illustrate this variability, execution times were measured for decoding successive frames of 10 different high-definition (1080p) MPEG4 video files. The maximum, mean, and minimum execution times are plotted in Figure 3. The chart shows that, for all the videos considered, job-execution times have a large range. In the worst case (v10), the frame-decoding times range from 4.3ms to 15.0ms. The average utilization for a WCET-based allocation scheme can be measured as the ratio between the mean execution time and the WCET. For the execution times presented in Figure 3, the highest average utilization is 63% (v1), and the lowest average utilization is 42% (v7). In all cases, more than 35% of the CPU time allocated to these tasks would, on average, go unused. Similar reports of variability in execution time and severe and persistent under-utilization of allocated CPU time for another MPEG-decoding workload are presented in [5].

Figure 3: Maximum, mean, and minimum decoding times of 10 high-definition MPEG4 video files.

In DVS-based power-management algorithms, variability in execution time and the resulting dynamic slack are exploited using techniques such as slack reclamation and accelerating CPU speeds. However, in most CPU power models, there is a convex relationship between clock frequency and power dissipation: executing at double the CPU frequency can more than double the power consumption. Therefore, running jobs at a constant moderate CPU frequency can result in lower power dissipation than running the same jobs at a combination of high and low frequencies, even if the number of cycles executed in the two cases is the same.

In both slack-reclaiming and accelerating DVS algorithms, the CPU must run at higher clock frequencies only to accommodate rare jobs with worst-case execution times. It may be possible to achieve greater power savings by relaxing the hard timing constraint and allowing such jobs to miss corresponding deadlines. Many applications can tolerate occasional violations of timing constraints and missed deadlines at the cost of some degradation in the quality of service. Such applications are described as being soft real time (SRT). This soft-real-time task model can be applied to supervisory control and data acquisition (SCADA) as described in [57]. Also, this task model is applicable to multimedia playback and recording applications that are common in personal mobile devices [5]. In such applications, accommodating the WCET leads to better timeliness than required, with lower CPU utilization and greater power dissipation than desired.

2.4.1 CPU Reservations

To be able to allocate less CPU time to jobs than the corresponding WCETs, appropriate policies and mechanisms are needed to handle the case where a job-execution time is greater than the allocated CPU time. EDF and RM scheduling algorithms require the use of WCETs because a task that executes for longer than anticipated can compromise the schedule of all lower-priority tasks. With reservation-based schedulers, each task or group of tasks is explicitly allocated some amount of time to run on the processor. The allocation is enforced through the use of timers and preemption. Such schedulers offer temporal isolation, where a task that requires more execution time than anticipated is not able to consume or interfere with the time allocated to other tasks.

The concept of CPU capacity reserves was first presented in [53] in the context of scheduling multimedia applications in a general-purpose OS. A reserve for a task τi consists of a budget, Qi, and a reservation period, T, such that τi is guaranteed to receive Qi units of CPU time every T units of time. As a task runs, the budget decreases. When a task depletes the allocated CPU time, the budget reaches zero and the task is throttled: put to sleep until the following reservation period. At the beginning of every reservation period, the budget is replenished and the task is unthrottled. This throttling behavior allows jobs to have longer execution times than anticipated without compromising the schedule of other tasks.
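The accounting rule can be sketched in a few lines of C. The structure and helper names below are illustrative, not from [53]; a real scheduler would drive these updates from timer interrupts and context switches.

    /* Sketch of per-reserve accounting for a throttling CPU reserve. */
    struct reserve {
        long long Q;          /* budget granted per reservation period (ns)  */
        long long budget;     /* budget remaining in the current period (ns) */
        int       throttled;  /* nonzero once the budget is exhausted        */
    };

    /* Charge the running task after it has executed for 'ran' ns. */
    void reserve_charge(struct reserve *r, long long ran)
    {
        r->budget -= ran;
        if (r->budget <= 0) {
            r->budget = 0;
            r->throttled = 1;   /* put to sleep until the next period */
        }
    }

    /* Called for every reserve at each reservation-period boundary. */
    void reserve_replenish(struct reserve *r)
    {
        r->budget = r->Q;       /* unused budget does not roll over */
        r->throttled = 0;
    }

The CBS approach described below differs only in the replenishment rule: the budget is refilled the moment it reaches zero, and the server deadline is postponed by one reservation period instead of throttling the task.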

For a non-periodic task, the budget and reservation period are assigned based on delay requirements. This is described in greater detail in [53]. For a periodic real-time task, the reservation period, T, is set to the task period, Ti, or some sub-multiple of the task period, such that Ti = KT for some positive integer K. Tasks are scheduled according to the EDF or RM algorithm, but execution times are replaced by allocated budgets, and task periods and relative deadlines are set to the reservation period. The schedulability tests discussed in Section 2.2 can be used to determine if a set of reservations is feasible. Mercer et al. discuss using such schedulability tests for admission control [53].

Given that the budget allocated to a task can be less than the corresponding WCET, an important consideration is how to handle overrun conditions. A job is said to be overrunning if the job-execution time is larger than the budget allocated to the corresponding task (ci[j] > Qi). Abeni and Buttazzo propose a reservation model where an overrunning job is allowed to continue execution using the budget from later reservation periods and complete after the deadline [4]. Depending on the length of the overrunning job, one or more successive jobs may be queued. This is referred to as the queuing SRT-task model. Abeni and Buttazzo show that if the allocated budget is greater than the mean execution time, job-queue lengths are stable.

The approach proposed by Abeni and Buttazzo is intended to schedule HRT tasks and SRT tasks on the same processor. HRT tasks are scheduled according to the EDF algorithm. Sufficient CPU time is reserved to accommodate the WCET of jobs from HRT tasks. SRT tasks are associated with Constant Bandwidth Servers (CBS), and the CBS are scheduled with HRT tasks by the same EDF scheduler.

A CBS is similar to a reserve as described above. The relative deadline, di[k], for a CBS is equal to the corresponding reservation period, di[k] = ri[k] + T. When a task associated with a CBS runs, the CBS budget decreases. However, when the CBS budget reaches zero, the budget is immediately replenished and the CBS deadline is delayed by the reservation period. Because a CBS is never throttled as described earlier, this approach is work conserving; the CPU is never idle while jobs are incomplete. The CBS approach naturally reclaims unused budget from tasks with jobs that finish early and assigns that budget to tasks with overrunning jobs.

A number of other reservation-based schedulers have been presented in previous work. GRUB (Greedy Reclamation of Unused Bandwidth) is an algorithm built on CBS [49] that more efficiently redistributes unused budget between running tasks. Lindberg presents a survey of other reservation-based schedulers in [48].

2.4.2 Adaptive CPU-Budget Allocation

In both the CBS and GRUB algorithms, the budget allocated to a task is static. As mentioned above, this budget must be strictly greater than the mean job-execution time to ensure a stable job queue. If the probability distribution of job-execution times is static and known, budget can be allocated such that a job will meet the corresponding deadline with some probability [4]. However, job-execution times and distributions are not known a priori for most applications, and execution times vary as shown at the beginning of this section. There is a need to adapt the budget allocated to SRT tasks based on dynamic workload requirements.

Abeni et al. present the Adaptive Reservations abstraction in [6], where the allocated budget is adapted on completion of every job based on the scheduling error of the completed job. Scheduling error is defined as the difference between the deadline and the latest possible finishing time (LFT) or the virtual finishing time (VFT), which are described in greater detail in Section 3.1.2 of Chapter 3. A positive scheduling error is an indication of insufficient budget allocation and poor timeliness, whereas a negative scheduling error is an indication of excess budget allocation and poor budget utilization. Abeni et al. present a model to describe the dynamics of scheduling error under Adaptive Reservations and propose a PI controller to drive the scheduling error to zero.

Job-execution times have some degree of randomness that is not directly addressed in [6]. Song et al. present a slightly different model for the same VFT-based system and present a controller design to handle time-varying uncertainty in job-execution times [67]. Cucinotta et al. also address the problem of adaptive reservations, but present a stochastic model [27]. The sequence of execution times from a task is modeled as an independently and identically distributed (i.i.d.) random process. Budget-allocation decisions are made based on the VFT of previous jobs as well as predictions of future execution times.

Another approach to adaptive budget allocation is to maintain a moving window of the execution times of recently completed jobs. These execution times may be sorted or binned to obtain a desired percentile, P, of past execution times. In the case where no previous jobs are queued and the binned or sorted execution times accurately approximate the probability distribution, budget allocation based on the Pth percentile should result in a deadline miss rate of (1 − P). Inherent in this approach is the assumption that the distribution does not change with time and that the execution times of successive jobs are i.i.d. However, an advantage of this approach is that there are no parameters that have to be tuned in the way that gains have to be tuned for control systems. This type of approach is adopted in GRACE-OS [72] and the DVS system presented in [26]. These systems are described in greater detail in Section 2.4.3.
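As a concrete illustration, the sketch below computes a percentile-based budget from a window of the W most recent execution times; the window size and the nearest-rank percentile rule are illustrative choices (GRACE-OS, for instance, bins cycle counts rather than sorting times).

    #include <math.h>
    #include <stdlib.h>
    #include <string.h>

    /* Comparison function for qsort over doubles. */
    static int cmp_double(const void *a, const void *b)
    {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    /*
     * Sketch: allocate budget at the P-th percentile (0 < P <= 1) of
     * the W most recent job-execution times, using the nearest-rank
     * rule on a sorted copy of the window.
     */
    double percentile_budget(const double window[], int W, double P)
    {
        double sorted[W];               /* C99 variable-length array */
        int rank;

        memcpy(sorted, window, W * sizeof(double));
        qsort(sorted, W, sizeof(double), cmp_double);

        rank = (int)ceil(P * W);        /* 1-based nearest rank */
        if (rank < 1) rank = 1;
        if (rank > W) rank = W;
        return sorted[rank - 1];
    }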

2.4.3 Power-Management of Soft-Real-Time Systems

The power-management algorithms described in Section 2.3 that are applied to HRT applications may be applied to SRT applications as well, but with key differences. To take advantage of the soft timing constraints, reservation-based schedulers are used, and the CPU budgets allocated to the SRT tasks are less than the WCETs. Furthermore, the WCET values and task periods in Equation 4 are replaced with the budget allocations and reservation period, respectively. Since the budget allocation is smaller than the WCET, the resulting value of αmin is smaller, and the CPU speed can be reduced further. Also, with a budget allocation smaller than the WCET, the average dynamic slack is reduced as well, and the benefits of dynamic slack reclamation or an accelerating voltage schedule may be diminished. In the remainder of this section, three different systems are considered for the power management of platforms hosting soft-real-time applications.

The first is the GRACE-OS system presented in [72]. The system consists of three major components: a profiler, an SRT scheduler, and a speed adapter. The profiler maintains a history of the number of cycles required to execute recent jobs. This history is used to build a histogram to approximate the probability distribution of job-execution cycles. The speed adapter computes a frequency schedule using the histogram built by the profiler. The approach used to compute the frequency schedule is similar to the approach used in the fine-grained intra-task policy described in Section 2.3.3. However, the speed schedule only accommodates up to the ρth-percentile cycles rather than the worst-case execution cycles, where ρ is a QOS parameter denoting the probability that a job will meet the corresponding deadline. To allow for cycle allocations that are less than the worst-case execution cycles, GRACE-OS uses a reservation-based scheduler built on EDF, with budget allocations based on cycles rather than time.

GRACE-OS has been further extended in later work. A more practical design of GRACE-OS is discussed in [73] to address non-ideal properties of real platforms. Real-world platforms are discussed in greater detail in Section 2.5. Also, GRACE-OS has been extended to allow joint application adaptation and CPU-speed adaptation in the systems presented in [71] and [69].

In GRACE-OS, only timing guarantees and power savings are addressed. An alternative system is presented in [26] that also takes into consideration the QOS delivered by the hosted applications. A reservation-based scheduler is used, and the allocated budget is adapted to guarantee a probabilistic upper bound on the scheduling error. In addition to the modes of operation supported by the platform, the managed applications may be configured to different modes as well. The platform mode affects power usage, whereas the application mode affects the delivered QOS. Both the application mode and platform mode affect the budget usage. The budget allocation is adapted at a faster rate in an inner loop, whereas the modes of applications and platform are adapted at a slower rate in an outer loop. The configuration of the modes of the application and platform is posed as a binary-linear-programming problem with a goal function based on the delivered QOS and power usage. This optimization operation requires access to a matrix of minimum budget allocations for each combination of application mode and platform mode, and this matrix is required for each application. Details of this approach are presented in [26].

DVFS-RBED is another DVFS system for soft-real-time applications [47]. In this system, frequency scaling is applied not only to the CPU, but also to memory, bus, and I/O clocks. This system is built on a reservation-based scheduler called RBED that is very similar to the CBS algorithm. Unlike GRACE-OS or the reservation-based scheduler proposed in [26], the budgets allocated to the different tasks are not adapted at runtime. At each scheduling event, the frequency set point for the preempting task is computed using a linear search through the discrete set of available frequencies. The chosen frequency is the one with the lowest power consumption that still allows jobs from the preempting task to complete on time. Whenever a job consumes less CPU time than the allocated budget, the dynamic slack is donated to the next runnable task. This slack is used to slow the CPU further and save additional power. The power consumption and performance degradation associated with each frequency are predicted using adaptive models. These models are discussed in greater detail in Section 2.5.5.

2.5 Real-World Platforms

Most DVS algorithms are designed with assumptions that are not valid on real-world platforms. Except in CPU-bound applications, job-execution times are not inversely proportional to CPU-clock frequency. Dynamic CPU power does not always dominate total CPU power. The hardware might not allow the CPU-clock frequency to be set to arbitrary values; the power-management algorithm may have to choose from a discrete set of DVS settings. Parameters related to the power and performance of the CPU at different clock frequencies are not always static. Lastly, the overhead of transitioning to sleep states is not always negligible. The following subsections relate to previous work on how to address some of the non-ideal behaviors of real-world platforms.

2.5.1 On-chip Work and Off-chip Work

Firstly, job-execution times do not always scale inversely with CPU-clock frequency [12, 65]. Execution times have on-chip and off-chip components. The off-chip component of execution time consists of I/O operations and memory operations that go off chip due to cache misses. While the on-chip component of execution time scales with clock frequency, the off-chip component does not. This modified relationship between CPU frequency and execution time is expressed in (5).

\[
c(\alpha) = \frac{c_{\text{on-chip}}}{\alpha} + c_{\text{off-chip}}. \tag{5}
\]

Execution time as a function of the scaling factor is denoted c(α). The scaling factor, α, is the ratio of the target clock frequency to the maximum clock frequency, such that α = f/fmax. The variable con-chip denotes the on-chip component of execution time at the maximum clock frequency. The variable coff-chip denotes the off-chip component of execution time. Based on (5), the number of CPU cycles required to complete a job is not constant across CPU frequencies. Furthermore, given a fixed amount of slack, the modified performance model allows the CPU-clock frequency to be reduced further than what is thought possible with the simplified linear model.

Accurate prediction of performance degradation requires knowledge of con-chip and coff-chip. Some applications are CPU bound and others are memory bound. Therefore, con-chip and coff-chip are application specific. Amur et al. show how coff-chip is related to events in the memory hierarchy, such as miss rates in the last-level on-chip cache [12]. Such events can be measured with hardware performance-monitoring counters (PMCs). Dynamic estimation of coff-chip is discussed in greater detail in Section 2.5.5.

2.5.2 Leakage Power and Critical Frequency

The second invalid assumption is that dynamic switching power dominates total CPU power. Static power is becoming increasingly significant due to shrinking process technology and decreasing supply voltage. As a result, a system that just minimizes dynamic CPU power can consume more total power than a system that runs constantly at maximum speed. A system running at maximum speed can complete the required computation faster and enter a sleep state to reduce static power. On the other hand, a traditional DVS algorithm eliminates all idle time and eliminates the possibility of entering a sleep state.

Jejurikar et al. present the concept of critical frequency in [43]. Critical frequency is the clock frequency at which the total CPU energy dissipated per cycle is minimized. Jejurikar et al. suggest that it is more energy efficient to run the CPU at the critical frequency and transition to a sleep state than to run the CPU at a speed below the critical frequency. Below the critical frequency, static power dominates total power. The existence of a critical frequency is confirmed by the experimental results from a real embedded platform [64]. Snowdon et al. show that for a set of fixed workloads, measured CPU energy is lower than what is predicted by a simple model based on dynamic CPU energy alone. For sufficiently low frequencies, reducing the CPU frequency further results in an increase in total CPU energy.

Also, if a task set is feasible when executed below the critical frequency, it is feasible at the critical frequency. Therefore, any DVS algorithm can be trivially modified such that only speeds at or above the critical frequency are used. The concept of critical frequency is applied to a slack-reclaiming DVS algorithm in [15]. The critical frequency is computed per task using the modified performance model described by (5) and an energy model that includes static CPU power. This task-specific critical frequency is shown to be a function of the ratio of on-chip to off-chip execution time and other parameters. A static CPU frequency greater than the critical frequency is assigned to each task such that total energy is minimized and all timing constraints are satisfied. Finally, a slack-reclaiming algorithm is applied during runtime. This slack-reclaiming algorithm is built on GDRA, mentioned in Section 2.3.1, but takes the critical frequency and off-chip work into account when reclaiming slack.
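For illustration, the sketch below locates the critical frequency for an assumed total-power model P(f) = k·f³ + P_static by minimizing energy per cycle over the available settings; the cubic dynamic-power term and the parameter names are simplifying assumptions, not the per-task model of [15].

    /*
     * Sketch: critical frequency under the assumed model
     * P(f) = k*f^3 + P_static.  Energy per cycle is P(f)/f;
     * the critical frequency is the setting that minimizes it.
     */
    double critical_frequency(const double freq[], int n,
                              double k, double p_static)
    {
        double best_f = freq[0];
        double best_epc = k * freq[0] * freq[0] + p_static / freq[0];

        for (int i = 1; i < n; i++) {
            double epc = k * freq[i] * freq[i] + p_static / freq[i];
            if (epc < best_epc) {
                best_epc = epc;   /* lower energy per cycle found */
                best_f = freq[i];
            }
        }
        return best_f;  /* running below this wastes static energy */
    }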

The concept of critical frequency is a simple approach toward the power management of platforms with non-negligible static power. However, the critical-frequency approach does not take into account the overhead of transitioning to a sleep state.

2.5.3 Sleep States and Break-even Time

In classic DVS work, it is assumed that the overhead of transitions between operation modes is negligible. The validity of this assumption depends on the nature of the state transition and how frequently the transitions take place. Transition time between processor DVS settings may be negligible. On the other hand, transitions to and from deep sleep states can take much longer. For example, on Intel processors, transitions to the C3 or C6 sleep state result in loss of cache content. Therefore, as described in Section 2.1, transition to a sleep state is only possible and worthwhile if the CPU-idle time is longer than the minimum transition time and break-even time.

Furthermore, other components such as DRAM may also have sleep states. The minimum transition time and break-even time for these additional components may be different from those of the processor. Given the different break-even times of different components, the critical frequency cannot be computed in the same way as when considering only CPU static power. Devadas and Aydin address this problem of minimizing system-wide energy using both DPM and DVS, taking into account the break-even times of different components [31]. However, managing sleep states of non-CPU components is beyond the scope of this dissertation.

2.5.4 Discrete DVS Settings

Another invalid assumption is that the CPU-clock frequency can be set to arbitrary positive values up to some maximum. In practical frequency-scaling mechanisms, such as Intel SpeedStep and AMD PowerNow!, the CPU-clock frequency must be chosen from a discrete set of values. The simplest way to address power management with discrete DVS settings is to select the lowest frequency greater than the desired frequency. However, this can lead to high energy consumption [21].

A more common approach is to periodically alternate between the two available frequencies that are closest to the desired frequency. The time spent in each of these two frequencies must be chosen such that the time-weighted average of the two frequencies is equal to the desired frequency. With simple convex power models that disregard transition overheads, this approach is optimal.
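The required time split follows directly from the weighted average; a minimal sketch:

    /*
     * Sketch: fraction of time to spend at the lower of two neighboring
     * frequencies so that the time-weighted average equals f_des.
     * Requires f_lo <= f_des <= f_hi and f_lo < f_hi.  For example,
     * f_lo = 800 MHz, f_hi = 1000 MHz, f_des = 950 MHz gives 0.25.
     */
    double low_freq_fraction(double f_lo, double f_hi, double f_des)
    {
        return (f_hi - f_des) / (f_hi - f_lo);
    }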

While the overhead of changing the CPU-clock frequency may be negligible, the relationship between CPU energy and CPU-clock frequency may not be convex. To address this problem, Dabiri et al. present a processor model with generic discrete modes of operation [29]. These operation modes offer a trade-off between power and speed, but no assumption is made about the relationship between power and speed. Given this model, the processor can still be run at arbitrary speeds on average by periodically alternating between different operation modes. However, the pair of operation modes that minimizes CPU energy for a given speed is not as easily determined.

In the algorithm presented in [29], the first step is to extract the subset of operation modes whose convex combinations of power-speed points make up a lower convex curve. The time complexity of extracting this subset of operation modes is O(n log n), where n is the number of operation modes. It is shown that the most energy-efficient way to run the processor at a desired average speed is to run the processor at no more than two consecutive operation modes from the extracted subset of operation modes. The complexity of the complete algorithm, which also addresses transition overheads, is shown to be O(n² log n).
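The extraction step is essentially a lower-convex-hull computation over the (speed, power) points, as in the sketch below; the mode representation is illustrative, and the modes are assumed pre-sorted by speed (sorting accounts for the O(n log n) bound).

    struct mode { double speed, power; };

    /* Cross product of (a - o) and (b - o); positive for a left turn. */
    static double cross(struct mode o, struct mode a, struct mode b)
    {
        return (a.speed - o.speed) * (b.power - o.power) -
               (a.power - o.power) * (b.speed - o.speed);
    }

    /*
     * Sketch: monotone-chain scan keeping only the modes on the lower
     * convex curve of (speed, power) points.  Input must be sorted by
     * speed; hull modes are written to 'hull' and counted in the
     * return value.  The scan itself is O(n).
     */
    int lower_hull(const struct mode m[], int n, struct mode hull[])
    {
        int h = 0;

        for (int i = 0; i < n; i++) {
            /* Pop modes that would make the lower chain turn upward. */
            while (h >= 2 && cross(hull[h - 2], hull[h - 1], m[i]) <= 0)
                h--;
            hull[h++] = m[i];
        }
        return h;
    }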

2.5.5 Dynamic Power and Performance Parameters

Power management of applications that are not CPU bound requires knowledge of the component of execution time that is due to off-chip work. Addressing leakage power requires knowledge of the critical frequency. Both off-chip work and critical frequency are application-specific parameters. Snowdon et al. argue that applications and corresponding workloads are dynamic, and that power and performance characteristics cannot be profiled off line [66]. Instead, these parameters must be learned dynamically.

Snowdon et al. present a performance model where execution time is a function of bus frequency, memory frequency, and CPU frequency [65], as shown in (6). The proposed system is primarily intended for a specific embedded platform where these frequencies are configurable.

\[
c = \frac{C_{\text{CPU}}}{f_{\text{CPU}}} + \frac{C_{\text{bus}}}{f_{\text{bus}}} + \frac{C_{\text{mem}}}{f_{\text{mem}}} + \frac{C_{\text{io}}}{f_{\text{io}}} + \ldots \tag{6}
\]

Total job-execution time is denoted c. CPU clock cycles and bus clock cycles are denoted CCPU and Cbus, respectively. The numbers of clock cycles for other components are similarly denoted. As in the case of (5), the numbers of cycles CCPU and Cbus are application specific.

Snowdon et al. show that the parameters CCPU and Cbus are correlated with architectural events that can be counted using PMCs and that this correlation is platform specific. Furthermore, it is assumed that the rates of occurrence of these architectural events vary slowly or do not vary for a given application. This assumption is based on the principle of temporal locality. A system is proposed where the coefficients relating CCPU and Cbus to the PMC-counted events are determined off line. These coefficients are then used on line with PMC-based measurements to estimate CCPU and Cbus and predict the performance degradation of applications due to frequency scaling. Experimental results are presented that show that the prediction algorithm works well.

A similar approach is applied to the prediction of power consumption in [66]. An energy model is presented that relates consumed power to architectural events and CPU, memory, and bus frequencies. As in the case of the system for performance prediction, architectural events are counted using PMCs, and the model coefficients that relate power consumption to component frequencies and architectural events are computed off line. The model is then used on line with PMC readings to predict power consumption at different component frequencies. These two approaches for predicting power consumption and performance degradation are applied to a real-time system called DVFS-RBED, as discussed in Section 2.4.3.

In [65] and [66], the coefficients used to predict performance degradation and power consumption from architectural events are assumed to be static and determined off line. It is possible to learn these relations dynamically during runtime using recursive-least-squares (RLS) filters and Kalman filters, as discussed in [74]. However, the disadvantage of using RLS filters and Kalman filters is the higher overhead.
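For intuition, the sketch below shows the scalar form of RLS with exponential forgetting, refining a single coefficient a that relates a PMC event count x to an observed quantity y; real systems use the vector form over several event types, and the names and initial values here are illustrative.

    /*
     * Sketch: one-parameter recursive least squares with exponential
     * forgetting, fitting y ~ a * x from a stream of samples.
     * lambda in (0, 1] is the forgetting factor; P is the scalar
     * inverse correlation, initialized large to express a weak prior.
     */
    struct rls1 { double a, P, lambda; };

    void rls1_init(struct rls1 *r, double lambda)
    {
        r->a = 0.0;
        r->P = 1e6;
        r->lambda = lambda;
    }

    void rls1_update(struct rls1 *r, double x, double y)
    {
        double g = r->P * x / (r->lambda + x * r->P * x);  /* gain       */
        r->a += g * (y - r->a * x);                        /* correction */
        r->P = (1.0 - g * x) * r->P / r->lambda;           /* covariance */
    }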

2.6 Comparison of LAMbS to Previous Work

The Linear-Adaptive-Models-based System has a number of advantages over the power-management systems presented in previous work.

First, in LAMbS, the platform is modeled as having a discrete set of modes of operation with arbitrary performance and power-consumption rates, in the same way as in [26] and [29]. This platform model eliminates a number of the invalid assumptions: assumptions about linear degradation in performance with CPU frequency, about a convex relationship between CPU frequency and power dissipation, and about being able to set the CPU clock to arbitrary frequencies are all unnecessary. Furthermore, modeling the hardware as having discrete modes of operation allows for a more general framework for power management in which more exotic power-management mechanisms may be used, such as the big.LITTLE model discussed in Section 2.1.1.

Second, LAMbS dynamically monitors and adapts to changes in the rate of power dissipation and the performance associated with each mode of operation. This dynamic adaptation eliminates the need for off-line tuning of power and performance parameters, such as the off-chip component of execution time. DVFS-RBED, described in [47], dynamically monitors architectural events and uses such events to estimate the different components of execution time in the power and performance models. However, DVFS-RBED still requires calibration of the coefficients that relate the architectural events to estimates of execution-time components. In contrast, the adaptive approach proposed in LAMbS does not require any tuning. Eliminating the need to tune parameters allows the system to be more practical.

Third, LAMbS allows CPU-budget allocations to be adapted dynamically to the changing load of hosted applications. While budget adaptations have been widely used in previous work, the PBS algorithm allows budget allocations to be adapted at a faster rate. Furthermore, power-management decisions are closely tied to the CPU budget committed to different tasks. Therefore, faster adaptation of budget allocations allows power-management decisions to be made at a faster rate.

Furthermore, the two-stage predictor built on the PBS algorithm allows for the use of arbitrary first-stage predictors to improve the efficiency of budget allocation. The predictors are per task and can be based on application-specific domain knowledge or simply on correlation in the execution times of successive jobs. An accurate predictor would reduce uncertainty in the budget allocation required for each job and improve budget utilization without an increase in the deadline-miss rate. Better budget utilization can help to reduce wasted CPU capacity and reduce power consumption.

Finally, a novel measure of computation called VIC is used for budget allocation that decouples and simplifies the problem of reservation-based scheduling and power management. For example, it is not necessary for the CPU budget to be specified separately for every mode of operation as done in [26]. Also, it is not necessary to use a static model to relate the degradation in performance to the choice of mode of operation as typically done in power-management systems based on frequency scaling such as [72].

Overall, LAMbS is a robust, autonomic, and practical system for power management that is applicable to real-world platforms.

CHAPTER III

ASYNCHRONOUS BUDGET ADAPTATIONS

An essential component of LAMbS is the CPU budget-allocation mechanism. The performance of this mechanism has a significant impact on overall energy savings. If the budget-allocation mechanism persistently under-allocates budget to a task, the amount of CPU capacity reserved is lower overall. Lower budget allocations can lead to greater energy savings, but at the cost of more jobs missing corresponding deadlines. On the other hand, the budget-allocation mechanism could provision a task with sufficient budget to allow a larger percentage of jobs to meet corresponding deadlines. However, depending on the level of uncertainty in job-execution times, provisioning tasks to meet deadlines more often may require greater levels of budget over-allocation, which can lead to greater energy consumption. Therefore, the performance of a budget-allocation algorithm is measured in terms of both the average budget allocation and the resulting timeliness of jobs. It is desirable to have an algorithm that can deliver the same deadline-miss rate with a lower average budget allocation.

The obstacle to accurate budget allocation is uncertainty in job-execution times; the execution time of a job is not known until that job completes. The execution time of a job may be affected by various factors, such as the number of interrupts that occur during execution or the level of cache pollution resulting from other tasks sharing the processor. It may be difficult to determine an appropriate budget allocation off line. By adapting the budget allocation dynamically based on real-time measurements of CPU usage, it is possible to have a system that is more autonomic and robust against uncertainty.

There has been significant previous work on adaptive CPU-budget allocation. Khalilzad et al. propose an approach based on controlling CPU utilization and deadline-miss rate [44]. One of the problems with miss-rate-based and utilization-based control is that the response can be slow [11]. On the other hand, as described in Section 2.4.2, finishing time (FT) and virtual finishing time (VFT) can be measured on the completion of every job, making VFT-based and FT-based adaptation more responsive than miss-rate-based and utilization-based approaches. Along the same lines, the PBS algorithm developed in this chapter improves on the responsiveness of previous approaches by performing the budget adaptation at a faster rate. The PBS algorithm has been presented in [8], with additional details of queue stability and performance with real workloads presented in [7].

In PBS, budget allocations are adapted at reservation-period boundaries based on the estimated computational load and the time remaining until the next earliest deadline. The computational load at the beginning of a given reservation period is the remaining dedicated processing time needed by the corresponding task to complete all jobs released before the start of that reservation period. Assuming the queuing SRT-task model described in Section 2.4.1, the computational load is estimated based on the number of jobs queued, the execution times of recently completed jobs, and the CPU time already consumed by the currently running job.

To appreciate the advantage of the PBS approach, consider the case of a task that undergoes a significant increase in average job-execution times. Initially, in the case of both the PBS-based approach and the VFT-based approach, the budget allocation is based on the CPU usage of previous jobs, and the newer jobs with higher execution times are under-allocated. One or more of these newer, heavier jobs may miss the corresponding deadlines. However, in the case of the PBS algorithm, the response to the under-allocation occurs as soon as the first deadline is missed and a new job is released and queued. In contrast, a VFT-based approach cannot detect or respond to the under-allocation until the first under-allocated job completes. The faster response of the PBS algorithm is confirmed by simulation results presented in Section 3.4. Furthermore, experimental results presented in Section 3.6 show that the PBS algorithm maintains a fair and graceful degradation in the timeliness of managed tasks under overload conditions and that PBS is able to recover rapidly from an overload condition.

The remainder of this chapter is organized as follows: the task model and reservation-based scheduling model are elaborated in Section 3.1; a detailed description of PBS is presented in Section 3.2; a proof of queue stability is presented in Section 3.3; simulation results are presented in Section 3.4; a Linux-based implementation of PBS is described in Section 3.5; finally, experimental results are presented in Section 3.6.

3.1 SRT Tasks and Reservation-based Scheduling

The PBS algorithm and the derivative TSP algorithm presented in Chapter 4 are directed at periodic real-time tasks with soft timing constraints.

Each task, τi, consists of a sequential stream of jobs, and each job consists of some amount of computation. The computation associated with the jth job, Ji,j, can be performed completely without blocking after the release time, ri,j. Also, a deadline, di,j, is associated with each job. If the computation associated with a job does not complete by the corresponding deadline, the deadline is said to be missed. The release times of successive jobs of a periodic task τi differ by some constant Ti; that is, ri,(j+1) = ri,j + Ti. The time interval between the release of two successive jobs is referred to as a task period. Furthermore, the release time of each job is the deadline of the preceding job, ri,j = di,(j−1). Therefore, the deadline of a job is one task period from the release time: di,j = ri,j + Ti. The execution time, ci[j], is the amount of dedicated processing time needed to complete job Ji,j. This execution time is unknown until the job completes.

As mentioned in Section 2.4, jobs from SRT tasks may occasionally miss corresponding deadlines. When a job does not complete by the corresponding deadline, the job is still allowed to continue execution until completion, which may be after the release of one or more successive jobs. Because jobs are executed in order, jobs released before the completion of preceding jobs are queued and executed after the preceding jobs complete.

One possible class of applications that fits the above task model is multimedia processing, such as MPEG4 video decoding. In MPEG4 decoding, video frames have to be decoded and displayed periodically, and decoding a frame requires the previous frame to be decoded. Depending on the content of the video, the execution time needed to decode a frame can vary from frame to frame. Furthermore, this execution time cannot be known precisely until the decoding operation completes.

The non-blocking and soft-real-time requirements can be met by having a pipelined software architecture with separate software threads implementing the stages of the pipeline. I/O operations, like reading from disk and writing to a frame buffer, can be handled by hard-real-time threads with fixed budget allocations. The brunt of the computation can be performed by a soft-real-time decoding thread with adaptive budget allocation. I/O threads are by definition I/O bound and should have minimal and deterministic computational requirements, which makes hard-real-time scheduling of such threads feasible and acceptable. Also, frame deadlines may be several task periods after the decoding deadline, and the input and output from the decoding thread may be queued in input and output buffers. These buffers should allow the decoding thread some laxity in job-completion times and allow the decoding thread to be scheduled as an SRT task.

3.1.1 Reservation-based Scheduling

Both the PBS algorithm and the TSP algorithm apply to reservation-based schedulers. As described in Section 2.4.1, reservation-based schedulers periodically commit some amount of time to each real-time task. Budget allocations are enforced through the use of timers and preemption. Such schedulers offer temporal isolation; a task that requires more execution time than anticipated is not able to consume or interfere with the time allocated to other tasks.

Time is divided into periodic intervals called reservation periods that are integer sub-multiples of the task period. The length of a reservation period is denoted TR. A budget allocation, Qi[k], for a task, τi, consists of some amount of time reserved over the kth reservation period. As the task runs on the processor, the budget is depleted. When the budget reaches zero, the task is preempted until the (k + 1)th reservation period. Budget allocations, as addressed in this dissertation, are hard [5], and so the scheduler is non-work-conserving; excess CPU budget from one task is not transferred to an under-allocated task as in [4] and [49]. However, the results presented in the rest of this chapter show that the PBS algorithm performs well in the more constrained case of hard allocations. Therefore, it is expected that PBS will perform better without the hard-allocation constraint.

An additional variable of interest is the total computational load mentioned at the beginning of this chapter. The computational load, Li[k], for a task τi in reservation period k denotes the remaining dedicated processing time needed by τi to complete all jobs released before the start of reservation period k. The computational load spikes at every task-period boundary when a new job arrives, and it decreases monotonically between task-period boundaries as the task runs. When all released jobs are completed, Li reaches zero. Figure 4 shows an example schedule of two tasks with a constant budget allocation.

Figure 4: Example schedule of two SRT tasks, τ1 and τ2, with a constant budget allocation.

For simplicity, all values are normalized by the reservation-period length, TR. Tasks τ1 and τ2 have periods of 2 and 3, respectively, and are assigned constant budgets of 0.25 and 0.5, respectively, per reservation period. The reservation-period boundaries are marked by dotted lines, whereas the task-period boundaries are marked by thick solid lines. Only one task can run at a time, and for a given task, the computational load, Li, and the budget remaining, Qi, only decrease when that task runs.

Consider the schedule of τ1 in Figure 4. The execution time of the first job released at time zero is c1[0] = 0.65. With a budget allocation of 0.25, the job is not able to finish by the corresponding deadline, d1,0 = 2. In each reservation period from 0 to 2, Q1 is depleted while L1 is always greater than zero.

The second job of τ1, however, has an execution time of c1[1] = 0.3, and so the budget is more than sufficient for the job to complete before the deadline d1,1 = 4. By the end of the fourth reservation period, there is no more computation remaining while there is still budget remaining. While any remaining computation at the end of a reservation period rolls over to the next reservation period, any remaining budget does not roll over.

The third job from task τ1 has an execution time that is just large enough to deplete the budget allocation and still small enough for the job to complete. Therefore, both Q1 and L1 are zero by the end of the sixth reservation period.

In classical priority-based scheduling, the utilization factor for a task set is defined in terms of task periods and WCETs, as shown in (2). A set of tasks is guaranteed to meet corresponding deadlines if the utilization factor is less than some least upper bound (LUB). The utilization factor can be similarly defined for a set of budget allocations by replacing the task period with the reservation period and replacing the WCET with the allocated budget. The fraction of time committed to a task τi during reservation period k is referred to as the bandwidth, βi[k] ∈ [0, 1). The bandwidth is therefore the ratio of the budget to the length of the reservation period: βi[k] = Qi[k]/TR. The utilization factor for a set of budget allocations can be defined as the sum of the corresponding bandwidths. Provided that the utilization factor is less than the LUB, it can be guaranteed that all tasks will receive the respective allocated budgets.

\[
U = \sum_i \beta_i[k] \leq \text{LUB}. \tag{7}
\]

The value of the LUB depends on how tasks are scheduled. The LUB is 1.0 when tasks are scheduled according to EDF and may be less when tasks are scheduled according to RM. Since the budget allocations for tasks are computed independently, a central component must determine if the set of budget allocations is feasible. If (7) is not satisfied, the situation is referred to as an overload condition, and appropriate action must be taken to ensure that (7) is satisfied. In PBS, overload conditions are addressed with the compression approach discussed in [5], where the allocated budgets are scaled back proportionally. The scaled budgets, Qsat,i[k], are computed as follows:

\[
Q_{\text{sat},i}[k] =
\begin{cases}
Q_i[k] \cdot \dfrac{\text{LUB}}{U} & \text{if } U > \text{LUB} \\[1ex]
Q_i[k] & \text{otherwise,}
\end{cases} \tag{8}
\]

where U denotes the utilization factor defined in (7) and LUB denotes the least upper bound for the scheduling algorithm. This compression approach allows for a fair degradation in the QOS under overload conditions, as shown in Section 3.6.1.
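The compression rule in (8) translates directly into code; a sketch with an illustrative array-based interface:

    /*
     * Sketch of the compression rule in (8): if the requested
     * bandwidths exceed the LUB, scale every budget back
     * proportionally.  Q[] holds the per-task budgets, T_R is the
     * reservation-period length, and lub is 1.0 under EDF with the
     * restrictions described below.
     */
    void compress_budgets(double Q[], int ntasks, double T_R, double lub)
    {
        double U = 0.0;

        for (int i = 0; i < ntasks; i++)
            U += Q[i] / T_R;             /* utilization factor, eq. (7) */

        if (U > lub)
            for (int i = 0; i < ntasks; i++)
                Q[i] *= lub / U;         /* Q_sat,i = Q_i * LUB / U */
    }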

Also, in the implementation of PBS discussed in Section 3.5, tasks are restricted to have task periods that are integer multiples of a common reservation period, and task-period boundaries are required to be aligned with the reservation-period boundaries. These restrictions allow the LUB to be 1.0.

3.1.2 Virtual Finishing Time

A budget allocation only guarantees that a certain amount of CPU time will be committed to a task in a reservation period and does not guarantee the time period within the reservation period when the task is allowed to run. Therefore, the latest possible finishing time (LFT) of a job is the end time of the reservation period in which the job completes.

The virtual finishing time (VFT) of a job is a measure of when that job would finish if it ran on a dedicated virtual processor whose speed is a fraction of that of the physical processor. The fraction is defined by the bandwidth, βi, described in Section 3.1.1. LFT is a quantized form of VFT, rounding VFT up to the nearest multiple of TR. An expression relating VFT to LFT is as follows:

\[
\text{VFT}_i[j] = \text{LFT}_i[j] - \frac{Q_{\text{left},i}[j] \cdot T_R}{Q_i[k]}, \tag{9}
\]

where Qleft,i[j] denotes the unused budget remaining when the job completes and Qi[k] denotes the budget allocated in that reservation period.

The VFT of a job is measured relative to the corresponding release time. A VFT greater than the deadline indicates a missed deadline and insufficient budget allocation. A VFT less than the deadline indicates an early finish and excess budget allocation. Ideally, the VFT of a job should be equal to the job deadline. Therefore, the scheduling error for a job is defined as the difference between the VFT and the corresponding deadline. In the task model considered in this dissertation, the relative job deadline is the task period. An expression for the VFT error of a job is as follows:

\[
\epsilon[j] = \text{LFT}[j] - \frac{Q_{\text{left},i}[k] \cdot T_R}{Q_i[k]} - T. \tag{10}
\]

Positive values of ε imply better utilization at the cost of tardiness, whereas negative values of ε imply timeliness at the cost of poor utilization. The VFT error is often normalized by the task period to allow for comparison between tasks of different periods. VFT and related dynamics are discussed in greater detail in [6].
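Equations (9) and (10) translate into a few lines; the parameter names below are illustrative.

    /*
     * Sketch of (9) and (10).  lft is the latest possible finishing
     * time measured from the job's release; q_left is the unused
     * budget at completion, q the budget allocated in that
     * reservation period, T_R the reservation-period length, and
     * T the task period.
     */
    double vft(double lft, double q_left, double q, double T_R)
    {
        return lft - q_left * T_R / q;          /* eq. (9) */
    }

    double vft_error(double lft, double q_left, double q,
                     double T_R, double T)
    {
        return vft(lft, q_left, q, T_R) - T;    /* eq. (10): > 0 is tardy */
    }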

3.2 Prediction-based Budget Allocation

The goal of an adaptive budget allocator is to assign appropriate levels of CPU budget, Qi[k], to a task such that as many jobs as possible are able to meet corresponding deadlines while the utilization of the allocated budget is kept as high as possible. Before proceeding with the description of PBS, an ideal allocation algorithm is described. In the following discussion, values indexed by k are sampled at the beginning of the reservation period after any jobs are released. Decisions regarding budget allocation are local to each task. Therefore, to simplify the notation in this section, the subscript i representing the task index is dropped from all variables. All variables correspond to the same task except when stated otherwise.

Over a given task period, the ideal budget allocation should allow the last released job to complete before the corresponding deadline at the end of that task period. However, the ideal level of CPU budget should be completely depleted to allow for a budget utilization of 100%. Lastly, the ideal sequence of budget allocations should be evenly distributed across the reservation periods of the task period.

Let K denote the number of reservation periods in a task period. In the kth reservation period, the number of reservation periods remaining until the end of the current task period can be expressed as K − (k mod K). Therefore, the ideal budget allocation can be expressed according to (11), where L[k] denotes the computational load.

\[
Q_{\text{ideal}}[k] = \frac{L[k]}{K - (k \bmod K)} \tag{11}
\]

However, the computational load, L[k], is made up of the execution times of jobs that have not yet finished and is therefore unknown. In PBS, job-execution times are modeled as a random process, and the computational load is predicted accordingly. Then, the budget is allocated based on the predicted value of L[k].

3.2.1 The PBS Algorithm

In the design of PBS, it is assumed that the budget allocator has access to a number of variables. The first is the number of jobs that have been released but not completed, referred to as the queue length, l[k]. Also, it is assumed that the budget allocator has access to the elapsed execution time, cmin[k], of the current job that has started running but not yet finished. The elapsed execution time is by default the minimum execution time of the currently running job. In addition, the budget allocator is assumed to have access to the execution times of recently completed jobs. Lastly, the budget allocator is able to determine the number of reservation periods remaining until the deadline of the last released job. The last released job is the one released at the beginning of the current task period and has a deadline at the end of the current task period. Based on these variables, the value of the allocated budget, Q[k], is assigned as follows:

\[
Q[k] =
\begin{cases}
\dfrac{\tilde{c}_{\text{cond}}[k] - c_{\text{min}}[k] + (l[k] - 1)\,\tilde{c}[k]}{K - (k \bmod K)} & \text{if } l[k] > 0 \\[1ex]
0 & \text{otherwise.}
\end{cases} \tag{12}
\]

The denominator is the same as in the ideal budget allocation shown in (11), but the numerator is an estimate of L[k]. The variables cmin[k] and l[k] are as described above. The variable c̃[k] denotes an approximation of the mean execution time. The variable c̃cond[k] is an approximation of the conditional mean of the execution time given that the execution time is known to be larger than cmin[k]:

\[
\tilde{c}_{\text{cond}}[k] \approx E(c \mid c > c_{\text{min}}[k]). \tag{13}
\]

The mean execution time, c̃[k], is approximated using the sample mean of the execution times of the N most recently completed jobs. The history length, N, is a configurable parameter satisfying N ≥ 2. For the purposes of computing conditional means, job-execution times are assumed to have a translated exponential distribution. As shown in Section 3.6.2, this assumption is appropriate for sufficiently long jobs. Accordingly, the conditional mean, c̃cond[k], is approximated using (14), where σ̂[k] is the sample standard deviation of the execution times of the N most recently completed jobs. The α parameter is a user-defined parameter satisfying α ≥ 1.0. The effects of varying α on budget utilization and miss rate are discussed in Section 3.6.2.

\[
\tilde{c}_{\text{cond}}[k] =
\begin{cases}
\tilde{c}[k] + (\alpha - 1)\,\hat{\sigma}[k] & \text{for } (\tilde{c}[k] - \hat{\sigma}[k]) \geq c_{\text{min}}[k] \\
c_{\text{min}}[k] + \alpha\,\hat{\sigma}[k] & \text{otherwise.}
\end{cases} \tag{14}
\]

3.2.2 How PBS Works

To understand how budget is allocated using (12), four different cases are considered.

The first case is a reservation period that is also the beginning of a task period, where there are no overruns from previous task periods. In this case, l[k] = 1 and cmin[k] = 0, and the budget allocation is as follows:

\[
Q[k] = \frac{\tilde{c}_{\text{cond}}[k]}{K}. \tag{15}
\]

The second case is a reservation period that is not the beginning of a task period, where there are no overruns from previous task periods. In this case, l[k] = 1 and cmin[k] > 0, and the budget is allocated as follows:

\[
Q[k] = \frac{\tilde{c}_{\text{cond}}[k] - c_{\text{min}}[k]}{K - (k \bmod K)}. \tag{16}
\]

The third case is a reservation period that may or may not be the beginning of a task period, where there have been overruns by previous jobs. In this case, l[k] > 1 and cmin[k] ≥ 0, and the budget allocation is calculated in the full form as

\[
Q[k] = \frac{\tilde{c}_{\text{cond}}[k] - c_{\text{min}}[k] + (l[k] - 1)\,\tilde{c}[k]}{K - (k \bmod K)}. \tag{17}
\]

The last case is when a job completes prior to the last reservation period before the corresponding deadline. In that case, l[k] = 0, and no budget is allocated for those last reservation periods.
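Combining (12) and (14), the per-reservation-period allocation reduces to a short function. The sketch below recomputes the window statistics on every call for clarity; an implementation would maintain running sums, and all names are illustrative.

    #include <math.h>

    /*
     * Sketch of the PBS budget rule, eqs. (12) and (14).  hist[]
     * holds the execution times of the N most recently completed
     * jobs (N >= 2), l is the queue length, c_min the elapsed
     * execution time of the running job, k the reservation-period
     * index, K the number of reservation periods per task period,
     * and alpha >= 1.0 the tuning parameter.
     */
    double pbs_budget(const double hist[], int N, int l, double c_min,
                      int k, int K, double alpha)
    {
        double mean = 0.0, var = 0.0, sd, c_cond;

        if (l <= 0)
            return 0.0;                    /* no jobs queued: case four */

        for (int i = 0; i < N; i++)
            mean += hist[i];
        mean /= N;
        for (int i = 0; i < N; i++)
            var += (hist[i] - mean) * (hist[i] - mean);
        sd = sqrt(var / (N - 1));          /* sample standard deviation */

        /* Conditional mean given c > c_min, eq. (14). */
        c_cond = (mean - sd >= c_min) ? mean + (alpha - 1.0) * sd
                                      : c_min + alpha * sd;

        /* Eq. (12): estimated remaining load over remaining periods. */
        return (c_cond - c_min + (l - 1) * mean) / (K - k % K);
    }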

3.3 Queue Stability

A queue is stable if the number of elements in the queue remains bounded. In this section, it is proved that budget allocation according to the PBS algorithm allows the job queue of a task to remain stable.

Jobs can only arrive at task-period boundaries, and exactly one job arrives at each task-period boundary. It can be argued that the queue length is stable if, for initial queue lengths sufficiently large, the queue length decreases on average, as expressed in (18). The variable lT[j] is the queue length at the beginning of the jth task period. Since there are K reservation periods in a task period, lT[j] is related to l[k] by lT[j] = l[K · j].

\[
E(l_T[j+1] \mid l_T[j] = l_1) < l_1 \quad \forall\, l_1 \geq N \tag{18}
\]

Theorem 1, which will be proved in Section 3.3.5, establishes that when the job-execution times are independently and identically distributed (i.i.d.) and tasks are allocated budget according to (12), sufficient budget is allocated by the end of the last reservation period of a task period such that (18) is satisfied. The condition of i.i.d. execution times is a sufficient, not a necessary, condition.

The remainder of this section is organized as follows. In Section 3.3.2, expressions are derived to relate the total budget allocated over a task period (Qtotal[j]), the computational load (Ll[j]), and the estimated mean execution time (c̃[k]) to the execution times of jobs that are completed and queued. In Section 3.3.3, the expected value of the queue length, lT[j + 1], in the (j + 1)th task period, conditioned on the value of previous queue lengths, is reduced to a summation of conditional-probability terms. In Section 3.3.4, the conditional-probability terms are expressed in terms of job-execution times using the expressions from Section 3.3.2. Also, a number of inequalities are defined on these probability terms. Finally, in Section 3.3.5, these inequalities are used to prove queue stability as described above. In Section 3.3.6, a discussion is presented on the implications of queue stability.

3.3.1 Notation

Before proceeding, the following notation is introduced for convenience. Let kfirst = (j · K) denote the index of the first reservation period, and let klast = (j · K + K − 1) denote the index of the last reservation period of the jth task period.

Let the summation of the execution times of a specific sequence of jobs be denoted as

\[
C_{j_1}^{j_2} \triangleq
\begin{cases}
\displaystyle\sum_{j=j_1}^{j_2} c[j] & \text{if } j_2 \geq j_1 \\
0 & \text{otherwise.}
\end{cases} \tag{19}
\]

Since job-execution times are i.i.d., the specific indices of jobs are only useful in establishing independence. Once independence is established, it is possible to represent the sums of execution times in a more compact form as a function of only the number of jobs. Let the summation of the execution times of an arbitrary and independent set of jobs be denoted as

\[
C_A \triangleq \sum_{j=1}^{A} c[j], \tag{20}
\]

where A in (20) represents the number of jobs in the summation.

3.3.2 Load, Budget, and Estimated Mean

Let the computational load, Ll[j], of the first l jobs in the job queue be defined as the dedicated CPU time needed by the task at the beginning of the jth task period to complete those jobs. In the jth task period, the last job in the queue (also the most recent job) is Jj. Given that the queue length is lT[j] = l[kfirst] and that jobs complete in order, the first queued job is Jj−l[kfirst]+1. Similarly, the lth job in the queue is Jj−l[kfirst]+l. Furthermore, the first queued job has already run in one or more previous reservation periods for some amount of time cmin[kfirst]. Therefore, cmin[kfirst] is no longer a part of the current load, and Ll[j] can be expressed as

\[
L_l[j] = C_{\,j-l[k_{\text{first}}]+1}^{\,j-l[k_{\text{first}}]+l} - c_{\text{min}}[k_{\text{first}}]. \tag{21}
\]

Next, consider Qtotal[j], the total budget allocated during the jth task period. This term can be separated into two parts: the budget allocated until the last reservation period and the budget allocated in the last reservation period.

\[
Q_{\text{total}}[j] = \sum_{k=k_{\text{first}}}^{k_{\text{last}}-1} Q[k] + Q[k_{\text{last}}]. \tag{22}
\]

The budget allocated until the last reservation period consists of the execution times of jobs completed by the start of the last reservation period and any time, cmin[klast], spent executing an additional job that did not finish. Furthermore, the first queued job will have already run for some amount of time, cmin[kfirst], in previous reservation periods. This time is not a part of the consumed budget in the jth task period. Therefore, the budget allocated in the first (K − 1) reservation periods can be expressed as

\[
\sum_{k=k_{\text{first}}}^{k_{\text{last}}-1} Q[k] = C_{\,j-l[k_{\text{first}}]+1}^{\,j-l[k_{\text{last}}]} + c_{\text{min}}[k_{\text{last}}] - c_{\text{min}}[k_{\text{first}}]. \tag{23}
\]

The budget allocated in a reservation period is defined by (12). In the last reservation period of a task period, the denominator in (12) becomes one, and the allocated budget is computed as

\[
Q[k_{\text{last}}] = \tilde{c}_{\text{cond}}[k_{\text{last}}] - c_{\text{min}}[k_{\text{last}}] + (l[k_{\text{last}}] - 1)\,\tilde{c}[k_{\text{last}}]. \tag{24}
\]

Since no jobs arrive between task-period boundaries, l[kfirst] ≥ l[klast]. Substituting (23) and (24) into (22), the total budget can be expressed as

\[
Q_{\text{total}}[j] = C_{\,j-l[k_{\text{first}}]+1}^{\,j-l[k_{\text{last}}]} - c_{\text{min}}[k_{\text{first}}] + \tilde{c}_{\text{cond}}[k_{\text{last}}] + (l[k_{\text{last}}] - 1)\,\tilde{c}[k_{\text{last}}]. \tag{25}
\]

As a lower bound on (25), let Q̂total[j] be defined according to (26).

\[
\hat{Q}_{\text{total}}[j] = C_{\,j-l[k_{\text{first}}]+1}^{\,j-l[k_{\text{last}}]} - c_{\text{min}}[k_{\text{first}}] + l[k_{\text{last}}]\,\tilde{c}[k_{\text{last}}]. \tag{26}
\]

Since by definition c̃cond[klast] ≥ c̃[klast], Q̂total[j] is a lower bound on the total allocated budget, Qtotal[j].

Lastly, consider c̃[k], the estimated mean execution time. This estimate is computed as the sample mean of the execution times of the N most recently completed jobs, where N is the history length defined in Section 3.2.1. The first queued job is Jj−l[k]+1. Therefore, the last completed job is Jj−l[k], and the estimate of the mean execution time can be expressed as

\[
\tilde{c}[k] = \frac{1}{N}\, C_{\,j-l[k]-N+1}^{\,j-l[k]}. \tag{27}
\]

3.3.3 Expected Queue Length

To prove that (18) holds, an expression is needed for the expected value of queue

th length, lT [j + 1], at the start of the (j + 1) task period conditioned on the value of

th the queue length, lT [j], at the start of the j task period. As a stepping stone, an

54 expression is derived for the expected queue length conditioned further on the queue

th length, l[klast], in the last reservation-period of the j task period. For convenience, let Φ(l1, llast) denote the conditioning event that (lT [j] = l1) and (l[klast] = llast). By definition of expected value,

l +1 X1 E(lT [j + 1] | Φ(l1, llast)) = l · Pr(lT [j + 1] = l | Φ(l1, llast)) l=1 l +1 X1 = Pr(lT [j + 1] ≥ i | Φ(l1, llast)). (28) i=1 The terms in the summation in (28) are equivalent to the probability that at least i jobs remain in the queue at the beginning of the (j + 1)th task period. Since exactly one job arrives at the beginning of each task period, the probability is one that lT [j + 1] ≥ 1. Similarly, the probability is zero that lT [j + 1] > (llast + 1). Eliminating zero-valued terms, the expression can be reduced as shown in (29).

$$E(l_T[j+1] \mid \Phi(l_1, l_{last})) = \sum_{i=1}^{l_1+1} \Pr(l_T[j+1] \geq i \mid \Phi(l_1, l_{last})) = 1 + \sum_{i=2}^{l_{last}+1} p(l_1, l_{last}, i). \tag{29}$$

The probability term $p(l_1, l_{last}, i)$ is defined according to (30) over the domain $\{l_1, l_{last}, i \in \mathbb{Z}^+ : l_1 \geq l_{last} \geq (i-1) \geq 1\}$.

$$p(l_1, l_{last}, i) = \Pr(l_T[j+1] \geq i \mid \Phi(l_1, l_{last})). \tag{30}$$

Given the corresponding domain, $p(l_1, l_{last}, i)$ is equivalent to the probability that at least $(i-1)$ jobs remain queued at the end of the $j$th task period. Therefore, $p(l_1, l_{last}, i)$ is equivalent to the probability that no more than $(l_1-i+1)$ jobs complete in that task period. This is the probability that the total budget, $Q_{total}[j]$, allocated during the $j$th task period is less than the computational load, $L_{(l_1-i+2)}[j]$, of the first $(l_1-i+2)$ queued jobs. Note that the event $(Q_{total}[j] < L_{(l_1-i+2)}[j])$ allows the $(l_1-i+2)$th queued job to start but not complete. The modified expression for $p(l_1, l_{last}, i)$ is as follows:

$$p(l_1, l_{last}, i) = \Pr(Q_{total}[j] < L_{(l_1-i+2)}[j] \mid \Phi(l_1, l_{last})). \tag{31}$$

Given that $\hat{Q}_{total}[j]$ is a lower bound on $Q_{total}[j]$, let $\hat{p}(l_1, l_{last}, i)$ denote an upper bound on $p(l_1, l_{last}, i)$ defined as follows:

$$\hat{p}(l_1, l_{last}, i) = \Pr(\hat{Q}_{total}[j] < L_{(l_1-i+2)}[j] \mid \Phi(l_1, l_{last})). \tag{32}$$

3.3.4 Conditional Probability of Minimum Queue Length

Replacing $L_{(l_1-i+2)}[j]$ and $\hat{Q}_{total}[j]$ in (32) with the corresponding expressions in (21) and (26), respectively, $\hat{p}(l_1, l_{last}, i)$ can be defined in terms of job-execution times. Removing common terms from the two sides in (33), the expression can be simplified further. Finally, replacing $\tilde{c}[k_{last}]$ in (34) with the corresponding expression in (27) yields (35).

$$\hat{p}(l_1, l_{last}, i) = \Pr\left( C_{j-l_1+1}^{\,j-l_{last}} + l_{last}\,\tilde{c}[k_{last}] < C_{j-l_1+1}^{\,j-i+2} \right) \tag{33}$$
$$= \Pr\left( \tilde{c}[k_{last}] < \frac{1}{l_{last}}\, C_{j-l_{last}+1}^{\,j-i+2} \right) \tag{34}$$
$$= \Pr\left( \frac{1}{N}\, C_{j-l_{last}-N+1}^{\,j-l_{last}} < \frac{1}{l_{last}}\, C_{j-l_{last}+1}^{\,j-i+2} \right) \tag{35}$$

The two sides of the inequality in (35) are made up of execution times of different jobs and therefore, the two sides are independent. Disregarding the job indices and considering only the number of jobs allows $\hat{p}(l_1, l_{last}, i)$ to be expressed as shown in (36), where $C^n$ denotes a sum of $n$ job-execution times.
$$\hat{p}(l_1, l_{last}, i) = \Pr\left( \frac{1}{N}\, C^{N} < \frac{1}{l_{last}}\, C^{(l_{last}-i+2)} \right) \tag{36}$$

Based on Equation (36), a number of inequalities are defined on $\hat{p}(l_1, l_{last}, i)$ for use in the proof of Theorem 1 below.

$$\hat{p}(l_1, l_{last}, i+1) < \hat{p}(l_1, l_{last}, i) \tag{37a}$$
$$\hat{p}(l_1, l_{last}+1, i+1) < \hat{p}(l_1, l_{last}, i) \tag{37b}$$
$$\hat{p}(l_1, l_{last}, i) < 1.0 \tag{37c}$$
$$\hat{p}(l_1, N, 2) = 0.5 \tag{37d}$$
$$\hat{p}(l_1, l_{last}, i) \leq 0.5 \qquad \forall\, l_{last} \geq N \text{ and } i \geq (l_{last} - N + 2) \tag{37e}$$

Plugging the appropriate values of $i$ into (36), it can be seen that (37a) is true:
$$\Pr\left( \frac{1}{N}\, C^{N} < \frac{1}{l_{last}}\, C^{(l_{last}-i+1)} \right) < \Pr\left( \frac{1}{N}\, C^{N} < \frac{1}{l_{last}}\, C^{(l_{last}-i+2)} \right)$$

Similarly, Inequality (37b) can be confirmed by setting the second and third parameters of (36) to $(l_{last}+1)$ and $(i+1)$, respectively:
$$\Pr\left( \frac{1}{N}\, C^{N} < \frac{1}{l_{last}+1}\, C^{(l_{last}-i+2)} \right) < \Pr\left( \frac{1}{N}\, C^{N} < \frac{1}{l_{last}}\, C^{(l_{last}-i+2)} \right)$$

As mentioned at the beginning of Section 3.3, job-execution times are i.i.d. Therefore, when the third parameter of (36) is set to two, the left and right-hand sides of the inequality have the same mean, and so $\hat{p}(l_1, l_{last}, 2)$ is strictly less than 1.0:
$$\hat{p}(l_1, l_{last}, 2) = \Pr\left( \frac{1}{N}\, C^{N} < \frac{1}{l_{last}}\, C^{l_{last}} \right) < 1.0$$

Furthermore, based on the domain of $\hat{p}(l_1, l_{last}, i)$ shown in (30), $i \geq 2$. Taking into account Inequality (37a), it can be seen that (37c) is true:
$$\hat{p}(l_1, l_{last}, i) \leq \hat{p}(l_1, l_{last}, 2) < 1.0$$

For $l_{last} = N$ and $i = 2$, the two sides of the inequality in (36) are i.i.d., which confirms (37d):
$$\hat{p}(l_1, N, 2) = \Pr\left( \frac{1}{N}\, C^{N} < \frac{1}{N}\, C^{N} \right) = 0.5$$

Finally, the derivation of (37e) is shown in the steps below. Applying Inequalities (37b) and (37a), respectively, to $\hat{p}(l_1, l_{last}, i)$ yields (38b). Finally, applying (37d) to (38b) shows that (37e) is true.
$$\forall\, l_{last} \geq N \text{ and } i \geq (l_{last} - N + 2):$$
$$\hat{p}(l_1, l_{last}, i) \leq \hat{p}(l_1, N,\, i - (l_{last} - N)) \tag{38a}$$
$$\leq \hat{p}(l_1, N, 2) \tag{38b}$$
$$= 0.5$$

3.3.5 Queue Stability

Based on Equation (29) and Inequalities (37a) through (37e), it can now be shown that budget allocated according to (12) results in stable job queues, as expressed in Theorem 1 below.

Theorem 1. If job-execution times are i.i.d. and CPU budget is allocated according to (12), then queue lengths satisfy (39).
$$E(l_T[j+1] \mid l_T[j] = l_1,\; l[k_{last}] = l_{last}) < l_1 \qquad \forall\, l_1 \geq N,\; l_{last} \leq l_1 \tag{39}$$

Proof: The proof of Theorem 1 can be divided into four cases that cover all valid values of $l_{last} \leq l_1$ and $l_1 \geq N$. The proof is trivial for the case when $l_{last} < (l_1-1)$, because no more than one job can arrive at a task-period boundary, and so $l_T[j+1]$ is less than or equal to $(l[k_{last}]+1)$. This leaves the case when $l_{last} = (l_1-1)$ and the case when $l_{last} = l_1$; the latter is separated further into two cases for convenience. The proof of Theorem 1 for each of these three remaining cases is presented below. For each case, the proof begins with the expression for $E(l_T[j+1] \mid \Phi(l_1, l_{last}))$ presented in (29).

Case 1: $l_{last} = l_1 - 1$
$$E(l_T[j+1] \mid \Phi(l_1, l_1-1)) = 1 + \sum_{i=2}^{l_1} p(l_1, l_1-1, i) \leq 1 + \sum_{i=2}^{l_1} \hat{p}(l_1, l_1-1, i) \tag{40a}$$
$$< 1 + (l_1 - 1) \tag{40b}$$
Applying (37c) to (40a) yields (40b), proving Theorem 1 for Case 1.

Case 2: $l_{last} = l_1 = N$
$$E(l_T[j+1] \mid \Phi(N,N)) = 1 + \sum_{i=2}^{N+1} p(N,N,i) \leq 1 + \sum_{i=2}^{N+1} \hat{p}(N,N,i) \tag{41a}$$
$$< 1 + \sum_{i=2}^{N+1} \hat{p}(N,N,2) \tag{41b}$$
$$= 1 + 0.5N \tag{41c}$$
Applying (37a) and (37d) to (41a) yields (41b) and then (41c), respectively. Since $N \geq 2$, $1 + 0.5N \leq N$, and this step proves Theorem 1 for Case 2.

Case 3: $l_{last} = l_1 > N$
$$E(l_T[j+1] \mid \Phi(l_1, l_1)) = 1 + \sum_{i=2}^{l_1+1} p(l_1, l_1, i) \leq 1 + \sum_{i=2}^{l_1+1} \hat{p}(l_1, l_1, i)$$
$$= 1 + \sum_{i=2}^{l_1-N+1} \hat{p}(l_1, l_1, i) + \sum_{i=l_1-N+2}^{l_1+1} \hat{p}(l_1, l_1, i) \tag{42}$$
$$< 1 + (l_1 - N) + 0.5N \tag{43}$$
Applying (37c) and (37e) to the left and right summations in (42) yields (43). Since $N \geq 2$, $1 + (l_1 - N) + 0.5N \leq l_1$, and this last step proves Theorem 1 for Case 3. □

In Theorem 1, $l[k_{last}] = l_{last}$ is required to be less than or equal to $l_T[j] = l_1$. Since no jobs can arrive in the middle of a task period, this constraint encompasses all valid values of $l_{last}$, and therefore (39) holds irrespective of the value of $l_{last}$. Applying the law of total probability to the left-hand side of Inequality (39) over all valid values of $l_{last}$, it can be shown that (18) is satisfied.

3.3.6 Further Discussion

Given the periodic and sequential nature of jobs, bounded queue lengths also impose bounds on job-finishing times and scheduling errors as described in Section 3.1.2.

Furthermore, since budget allocations are based on queue length and estimates of the mean execution time, bounded queue lengths and execution times also impose bounds on the allocated budget.

While it is assumed in the proof of Theorem 1 that all job-execution times are i.i.d., the proof only requires the execution times of queued jobs and recently completed jobs to be i.i.d. Lastly, i.i.d. execution times are a sufficient, not a necessary, condition for queue stability. It is shown in Section 3.6 that while execution times measured from actual workloads are not i.i.d., the PBS algorithm is still stable and performs well.

3.4 Simulation-Based Comparison

To demonstrate the advantage of asynchronous budget adaptation at reservation-period boundaries, the performance of PBS is compared to that of a synchronous policy called Finishing-Time Update (FTU). In the FTU policy, the allocated budget is computed on job completion as follows:

$$Q[k] = \tilde{c}_{cond}[k] + l[k]\,\tilde{c}[k]. \tag{44}$$

For simplicity, the task period is set equal to the reservation period ($K = 1$). Taking into account that $c_{min}[k] = 0$ on job completion, the FTU policy is identical to PBS except that the budget adaptation is performed on job completion. Also, the FTU policy is similar to the Stochastic Dead Beat approach described in [5] when deadlines are met and $l[k] = 0$.
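As a minimal illustration, (44) translates directly into code. The argument names below are placeholders for the conditional-mean and moving-average estimates maintained by the scheduler; this is a sketch, not the simulator's actual implementation.

    /* FTU update (44), evaluated only when a job completes: c_cond
     * approximates the conditional mean of the running job's execution
     * time and c_mean is the moving-average estimate (names assumed) */
    double ftu_budget(double c_cond, double c_mean, int queue_len)
    {
        return c_cond + (double)queue_len * c_mean;
    }

In contrast, PBS re-evaluates its corresponding expression at every reservation-period boundary, which is what allows it to react in the middle of an unusually long job.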

The performance of PBS and FTU is evaluated using the MATLAB-based simu- lator described in [8]. The simulations involve a synthetic workload where execution times are generated as a noisy pulse. The job-execution times are plotted against job release times in Figure 5. This load is meant to test the responsiveness of PBS and

FTU to a sudden and severe change in the average job-execution time.

Figure 5: Job execution times from a synthetic workload.

The normalized VFT errors for the two policies are shown in Figure 6. It is clear from the figure that the PBS algorithm incurs a smaller scheduling error for the first long job and recovers faster from the step change in execution time than the FTU algorithm. Similar results are anticipated for any budget-adaptation policy where the adaptation is performed on job completion, because such a policy cannot adapt to the step change until the first long job completes.

3.5 Software Architecture for an Asynchronous Budget Mechanism

Figure 6: VFT error for the FTU policy (top) and PBS policy (bottom) with the synthetic workload shown in Figure 5.

Implementation of adaptive reservation-based schedulers has been explored on a number of different operating systems. Most implementations presented in the reviewed literature are based on the Linux kernel. Although Linux is not considered a real-time kernel, the response time of Linux can be improved through a number of different mechanisms, as shown in [35]. Along the same lines, PREEMPT_RT is a set of patches meant to make the Linux kernel preemptible and more responsive [52]. A key component of adaptive budget allocation is execution-time monitoring. A mechanism for accounting execution times in Linux is presented in [13]. The approach used in the prototype presented in this section is similar but does not involve patching the kernel.

There are a number of previous and current projects on adaptive reservation-based scheduling in Linux. OCERA [28] is one of the earlier projects and implements the policy described in [5]. AQuoSA [59] is a port of OCERA to the 2.6 version of the Linux kernel. A current project aims to integrate the CBS scheduling algorithm into the mainstream Linux kernel as a new scheduling class called SCHED_EDF [32]. These implementations [28][59][5] are appropriate for approaches where budget adaptations are performed on job completion. However, in the PBS algorithm, budget adaptations are performed at reservation-period boundaries, often in the middle of jobs. Since budget adaptations are asynchronous, they must be performed in an execution context separate from the context in which the jobs run. To this end, a daemon process, called the "Allocator" task, is proposed. The overall structure of the system is presented in Figure 7. The complete system consists of three parts: 1) the PBS kernel module to implement the necessary functionality in kernel space, 2) a daemon process called the Allocator to perform the asynchronous budget allocation, and 3) a small library used by SRT tasks to interact with the rest of the system.

Figure 7: Overall structure of the asynchronous-budget mechanism.

3.5.1 SRT Tasks

An SRT task is assumed to have a basic structure as shown in Figure 8. After any application-specific initialization operations, an SRT task is registered with the PBS kernel module and parameters such as task period, history length and initial budget are configured. Then, the task enters the main loop where each iteration of the loop corresponds to a job. Each loop iteration begins by notifying the PBS module that the previous job has ended and that the next job is about to begin.

    ...                 /* application-specific initialization */
    /* register the task with the PBS module as an SRT task */
    handle = open("/proc/...
    /* configure the task period, history length, budget allocation */
    ioctl(handle, ...
    while there are jobs remaining do
        /* notify PBS module of job boundary */
        read(handle, ...
        /* job-specific computation */
        ...
    end
    /* notify PBS module of end of task */
    close(handle);
    ...                 /* application-specific cleanup */

Figure 8: Basic template of an SRT task.

Timing for SRT tasks is handled by the kernel module. The module is notified of the task period during the initialization stage. Based on the task period and the notifications on job completion, the module keeps track of the number of queued jobs.

When there are no queued jobs and the PBS module is notified that the current job is complete, the corresponding task is blocked until the next task period. If the job that just completed missed the corresponding deadline, the task is allowed to continue to the next queued job without blocking.

In the prototyped system, any throttling related to budget enforcement is transparent to the SRT task. The task is simply preempted when the allocated budget is depleted and rescheduled in the following reservation period when the budget is replenished. When the last job ends, the SRT task exits the main loop and the PBS module is notified that there are no remaining jobs.

The SRT tasks interact with the PBS module and the Allocator task through a special file in the Linux /proc file system. Initialization operations consist of calls to open and ioctl. The "end of job" notification involves a call to read. The "end of task" notification involves a call to close.
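To make the template in Figure 8 concrete, the following user-space sketch fills in the calls just described. The /proc path, the ioctl request codes, and the job functions are hypothetical placeholders, not the actual interface exported by the prototype.

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    /* hypothetical request codes; the real module defines its own */
    #define PBS_SET_PERIOD_NS  0x7001
    #define PBS_SET_HISTORY    0x7002
    #define PBS_SET_BUDGET_NS  0x7003

    static int  jobs_remaining(void) { static int n = 1000; return n-- > 0; }
    static void do_job(void)         { /* e.g., decode one video frame */ }

    int main(void)
    {
        char dummy;
        /* register with the PBS module; the path is a placeholder */
        int handle = open("/proc/pbs_srt", O_RDWR);

        ioctl(handle, PBS_SET_PERIOD_NS, 40000000UL);  /* 40ms task period */
        ioctl(handle, PBS_SET_HISTORY,   40UL);        /* history length N */
        ioctl(handle, PBS_SET_BUDGET_NS, 10000000UL);  /* initial budget   */

        while (jobs_remaining()) {
            read(handle, &dummy, 1);  /* job boundary: may block until the
                                         next task period or while throttled */
            do_job();                 /* job-specific computation */
        }

        close(handle);                /* end-of-task notification */
        return 0;
    }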

3.5.2 The Kernel Module

Any scheduling-related functionality that needs to be in kernel space is implemented in the PBS kernel module. This functionality includes timing, budget enforcement, and tracking various parameters such as job-queue lengths, process run-times, and current and past job-execution times.

All SRT tasks are registered with the kernel module during their initialization phase. During registration, the necessary data structures are allocated in the kernel. Also, the module associates a preempt_notifier with newly registered tasks. A preempt_notifier is a mechanism in the Linux kernel that allows callback functions to be called whenever tasks are preempted or scheduled to run. These notifiers, in combination with the sched_clock function, allow for accurate measurement of job-execution times. On x86 machines, sched_clock is based on the TSC. For timing and budget enforcement, high-resolution timers (HR-timers) are used.
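As a rough illustration of how these pieces fit together, the kernel-side fragment below sketches per-task run-time accounting with a preempt_notifier and sched_clock. The srt_acct structure and the function names are hypothetical; the actual module tracks considerably more state.

    #include <linux/kernel.h>
    #include <linux/preempt.h>
    #include <linux/sched.h>

    /* hypothetical per-task accounting state */
    struct srt_acct {
        struct preempt_notifier pn;
        u64 in_stamp;    /* sched_clock() value at the last sched-in */
        u64 runtime_ns;  /* CPU time consumed by the current job so far */
    };

    static void srt_sched_in(struct preempt_notifier *pn, int cpu)
    {
        struct srt_acct *a = container_of(pn, struct srt_acct, pn);
        a->in_stamp = sched_clock();
        /* the budget-enforcement HR-timer would be armed here */
    }

    static void srt_sched_out(struct preempt_notifier *pn,
                              struct task_struct *next)
    {
        struct srt_acct *a = container_of(pn, struct srt_acct, pn);
        a->runtime_ns += sched_clock() - a->in_stamp;
        /* the budget-enforcement HR-timer would be cancelled here */
    }

    static struct preempt_ops srt_preempt_ops = {
        .sched_in  = srt_sched_in,
        .sched_out = srt_sched_out,
    };

    /* called from the registration path, in the context of the SRT task */
    static void srt_register_notifier(struct srt_acct *a)
    {
        preempt_notifier_init(&a->pn, &srt_preempt_ops);
        preempt_notifier_register(&a->pn);
    }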

A central HR-timer called sp_timer is fired at the beginning of every reservation period. When this timer fires, a sorted queue of timing events is used to determine if blocked SRT tasks should be woken and if corresponding job-queue lengths should be incremented. This timing queue consists of the start times of the next task-period boundary of all running SRT tasks and is updated at each task-period boundary. Also, when the sp_timer fires, all budget allocations are replenished and any throttled task is unblocked.

An additional HR-timer is associated with every SRT task for budget enforcement. Whenever a task is scheduled to run and the preempt_notifier callback function is called, this budget-enforcement timer is set up to fire when the allocated budget is depleted. If this timer fires, the task is put to sleep and a special THROTTLED flag is set for the task. On the other hand, if the task goes to sleep and the corresponding preempt_notifier callback function is called, the budget-enforcement timer is disabled and any necessary accounting is performed. This dynamic-timer-based approach to budget enforcement is more accurate than periodic-tick-based accounting and enforcement.

3.5.3 The Allocator Daemon

The Allocator daemon, referred to as the Allocator, is a user-level process where SRT budget allocations are computed. The Allocator daemon is activated periodically at the beginning of every reservation period. In order to ensure that budget updates are done on time, the Allocator is scheduled as an HRT task with a fixed budget and at a higher priority than SRT tasks.

The structure of the Allocator is similar to that of SRT tasks. Initialization involves opening a specific file in the /proc file system and configuring the reservation-period length through ioctl calls to the file. Once the initialization is complete, the Allocator enters a main loop. An iteration of this loop is executed every reservation period. In each iteration, the Allocator is blocked until the start of the next reservation period by calling read on the file described earlier. On returning from the call, the Allocator goes through the data structure associated with each SRT task in the system and recomputes the budget allocated to each task. When the Allocator goes to the next iteration and calls read, control returns to kernel space. In the kernel, the budget allocations are checked and scaled back to ensure schedulability, as described in Section 3.1.1. Then, the Allocator is put to sleep until the next reservation-period boundary.

Given the nature of the system, a relatively large amount of information has to be shared between the Allocator and the kernel module. Copying data between kernel space and user space can result in a large overhead. Furthermore, this overhead would be incurred on each activation of the Allocator at each reservation-period boundary. In order to eliminate this overhead, memory mapping is used.

The /proc file used for the Allocator daemon exports an mmap function that allows two specific regions of kernel memory to be mapped into the Allocator address space. The first region is a read-only region that contains the data used to compute the budget allocations. The second region is a writable array that is used to assign the SRT tasks their respective budgets. Having separate regions with controlled access and performing the schedulability test in kernel space allows the overall system to be more tolerant to any bugs in the Allocator code.

Given that budget adaptations have to be performed asynchronously with respect to the SRT-task executions, there are two other possible implementation approaches. The first is to compute budget allocations in kernel space. The other alternative is to use UNIX signals and signal handlers to perform budget adaptations in the context of the SRT tasks.

The Allocator-daemon approach, however, has a number of advantages. Development at user level is simpler and more flexible than kernel-level development. Another advantage is that user-level applications can use floating-point and SIMD hardware, which is not simple or efficient to do in the kernel. Furthermore, memory mapping can only be done at the granularity of memory pages; the memory-mapping technique may not be feasible if two additional regions have to be mapped for every SRT task. Lastly, signaling every SRT task in every reservation period can introduce a large amount of overhead. This overhead is avoided by the batched computation of budget allocations in the Allocator daemon.

3.6 Experiments and Results

The PBS algorithm and the asynchronous-budget-allocation mechanism were tested on the x86 Linux platform described in Table 1. The kernel was configured to disable cgroup-based scheduling and to be non-SMP because the system has been designed only for a single core. For experimental repeatability and consistency, the Linux scaling governor was set to userspace and the scaling frequency was set to a constant of 2.67GHz for all the experiments.

Table 1: Test Platform Configuration.

    Processor         Intel Core 2 Extreme
    Clock Speed       2.67 GHz
    L1 D-cache        32 kB
    L1 I-cache        32 kB
    L2 Cache          6144 kB
    Operating System  RHEL 6 Server
    Kernel            Linux 3.1.1

While this specific fixed configuration was used for the purpose of formal experiments, similar experiments were also performed on an AMD Turion-based laptop with an Ubuntu operating system. The standard Ubuntu kernel was used in those experiments, and the results obtained were similar to what is presented in this section.

For comparison with the PBS algorithm, similar experiments were performed with a static allocation policy. In the static policy, the budget allocated to a task in each reservation period is kept constant over the duration of the task's execution. The system described in Section 3.5 was still used for the static-budget-allocation policy, except that the Allocator was modified to keep the allocated budget constant.

3.6.1 Performance Under Overload

The first set of experiments is based on a synthetic workload. The results from this experiment demonstrate the fair and graceful degradation in the QoS of PBS-managed tasks under overload conditions. The results further show the way a set of PBS-managed tasks recovers from an overload condition.

The test application follows the template described in Section 3.5.1. The application consists of an outer loop to iterate through jobs and an inner loop to represent the computation in each job. The number of iterations of the inner loop is used to control the execution times of the jobs. The job-execution times are generated as a square wave with added noise. The noise is generated using a linear congruential generator and approximates a uniform distribution. Parameters such as the period, offset, duty cycle, amplitude, and noise magnitude are all configured using command-line arguments.

The total workload in the square-wave experiments consists of two running instances of the test application just described. The two tasks have task periods of 40ms and 60ms, respectively. The normalized execution times for the two tasks are shown in Figure 9. The low parts of the square waves are at 10% of the task period, whereas the high parts of the square waves are at 60% and 40% of the task periods, respectively. The added noise is 20% of the nominal value. The period, duty cycle, and phase offset of the square waves are configured so that the system is overloaded for a short interval before shifting the load from one task to the other.

With the workload described above, the α parameter and the history length N were set to 2.0 and 40, respectively. This experiment was repeated 30 times. The observed behavior was found to be consistent across the repetitions.

The normalized-VFT error signals from one of the repetitions are shown in Figure 10. The plots show that the scheduling errors for both tasks recover rapidly once the overload condition ends. Furthermore, despite the differences between the two tasks in terms of task period and average normalized execution time, the normalized VFT errors for the two tasks increase almost evenly during the overload condition and remain roughly the same during the recovery. Lastly, the miss rates were found to be 24% for both tasks; such even degradation across tasks is inherently not possible with a static budget allocation.

Figure 9: Computation times of two tasks in the synthetic workload.

Figure 10: VFT errors for the two tasks during the overload time period for the synthetic workload shown in Figure 9.

3.6.2 MPEG4 Video Decoding Workload

The next set of experiments is based on MPEG4 video decoding. The test application is similar to the one described in Section 3.6.1 except that the jobs involve decoding MPEG4 video frames. The application uses the ffmpeg library for demuxing mp4 files and decoding the frames.

During the initialization phase, all the video frames are read from the input file into memory. Any memory allocated to the application during this phase is locked to prevent paging. Each iteration of the job loop decodes a single video frame. For the purpose of these experiments, the decoded video frames are overwritten and not sent to a frame buffer.

The scheduler and test applications were tested with two separate MPEG4 videos, referred to as v1 and v2, respectively. Video v1 consists of over 6000 frames, and the decoding task τ1 is assigned a task period of 40ms. Video v2 consists of over 2000 frames, and the decoding task τ2 is assigned a task period of 30ms. By nature of the content, the frames in v2 are more compute-intensive to decode than the ones in v1.

In Figure 11, two histograms are shown for the job-execution times of τ1 and τ2, respectively. The vertical dark lines indicate the positions of the respective means, while the vertical dashed lines are placed one standard deviation to the right of the respective means. The first key feature to note is the wide variation in execution times and the long tails in the distributions. The long tail suggests that an exponential distribution is appropriate to model the conditional distribution of the execution times of long jobs.

In Figure 12, the autocorrelation functions (ACF) of the job-execution times of τ1 and τ2 are shown. As mentioned in Section 3.3.6, job-execution times are not i.i.d. as assumed for the proof of queue stability. However, the i.i.d. requirement in the proof is conservative: that the execution times of successive jobs be i.i.d. is a sufficient condition, not a necessary one. Based on the experimental results, it is shown that a stable queue length is still maintained with the PBS algorithm.

Figure 11: Histogram of job-execution times of τ1 (top) and τ2 (bottom), respectively.

In the next set of experiments, the α parameter of the PBS algorithm was varied and the scheduler was tested with the two decoding tasks described above. The resulting performance was measured in terms of average deadline miss-rate and root-mean-square (RMS) scheduling error. Similar experiments were performed with a static budget allocation. Instead of varying α in these experiments, the statically-allocated budget was varied. For each allocation policy, parameter value, and workload, the experiment was repeated 30 times. The performance was found to be consistent across repetitions. The presented results are averages taken across the repetitions.

As expected, different values of the α parameter in the PBS algorithm result in different miss rates. Higher values of α result in higher budget allocations on average and lower miss rates. A similar trend is seen in the results from the experiments involving a static budget allocation: the higher the allocated budget, the lower the miss rate.

Just as higher budget allocations can result in lower miss rates, they can also result in higher RMS VFT errors and lower budget utilization. Picking the appropriate value of α, or the budget in the static case, depends on the appropriate level of trade-off between miss rate and budget utilization. This trade-off between miss rate and budget utilization is shown in Figure 13 and Figure 14 for the static budget allocation and the PBS-based budget allocation, respectively. For proper comparison, the range of miss rates in the experiments involving the static policy had to be similar to the range of miss rates in the experiments involving PBS. Therefore, the values of the budget allocated in the static scheme had to be tuned. This tuning was done off-line using the actual distribution of job-execution times and trial and error.

Figure 12: ACF of job-execution times of τ1 (top) and τ2 (bottom), respectively.

While the ideal performance of zero miss-rate and 100% budget utilization is not possible without a priori knowledge of execution times, adaptive budget allocation based on PBS still performs closer to the ideal case than a tuned static budget allo- cation. For a given task and miss-rate, budget allocation based on PBS results in a lower RMS VFT-error than that resulting from a static allocation.

Furthermore, in the case of the PBS algorithm, it is possible to continue to lower the RMS VFT error by allowing for a higher miss rate. In the case of static allocation, however, the RMS VFT error reaches some minimum at some miss rate and then begins to increase with the miss rate. This increase in RMS VFT error may be due to the fact that the PBS algorithm aggressively compensates with higher budgets in the event of missed deadlines, whereas there are no similar adaptations in the case of a static allocation. Furthermore, computation from missed jobs accumulates, exacerbating VFT errors. Overall, adaptive budget allocation based on PBS performs better than a tuned static budget allocation, delivering the same miss rate with lower VFT errors and so better budget utilization.

Figure 13: Trade-off between miss-rate and VFT error with a constant budget allocation.

Figure 14: Trade-off between miss-rate and VFT error with PBS-based budget allocation.

CHAPTER IV

TWO-STAGE PREDICTION

In this chapter, the Two-Stage Prediction (TSP) algorithm is presented. The TSP algorithm is a generalization of the PBS algorithm described in Chapter 3 that offers stronger timeliness guarantees with weaker restrictions on job-execution times and can potentially lead to more efficient CPU-budget allocation.

The PBS algorithm described in Chapter 3 improves on a number of previous adaptive budget-allocation algorithms by adapting at a faster rate. Specifically, the

PBS algorithm performs asynchronous adaptations at reservation-period boundaries, which allows budget allocations to be adapted on time even when a job deadline is missed. Furthermore, it is proved that asynchronous budget allocation based on the

PBS algorithm results in a stable job queue when the execution times of successive jobs are i.i.d.

However, execution times of successive jobs measured from multimedia workloads were found to be correlated and not i.i.d. Experimental results presented in Chapter 3 show that PBS is still stable and performs well, even with workloads where the i.i.d. condition is not satisfied. On the other hand, it should be possible to improve on the performance of the moving-average filters used in the PBS algorithm by exploiting the correlation between job-execution times. Also, it is possible in some applications to exploit domain knowledge for more accurate predictions. For example, metadata such as pixel count, byte count, macroblock count, and other indicators can be used to predict frame-decoding times in video-decoding workloads [63][17]. The PBS algorithm and software architecture described in Chapter 3 do not have the flexibility to allow such prediction techniques to be used.

Furthermore, it is desirable to have a stronger guarantee from a budget-allocation algorithm than the bounded-queue-length guarantee offered by PBS. For example, it is possible for the job-queue length of a task to be bounded and remain constant at two. However, for the queue length to remain constant at two implies that the deadline of every job is likely missed.

In some of the previous approaches discussed in Section 2.4.2 [72][26], CPU budget is allocated with the goal of tracking a target miss rate or probability of a deadline miss. In the latter case, these approaches are based on computing sample percentiles of the execution times of recently completed jobs to approximate the inverse cumulative-distribution function (CDF) of the execution times of future jobs. However, it is inherently assumed in these approaches that the sample distribution of the execution times of previous jobs can be used to estimate the ensemble distribution of execution times of future jobs. The execution times of successive jobs are not necessarily independent or identically distributed, and therefore this assumption is not always valid. Even if percentile-based budget-allocation algorithms are stable, such algorithms can benefit from an initial prediction step to reduce the uncertainty in the execution times being predicted. The TSP algorithm is designed to provide a similar guarantee: an upper bound on the probability of a deadline miss. However, this guarantee is provided without the strict i.i.d. requirement on job-execution times.

The overall structure of the TSP algorithm is shown in Figure 15. The first stage provides predictions of job-execution times, while the second stage uses the predictions and additional feedback from the system to compute the budget allocation, Q[k]. The budget allocation is computed such that an upper bound can be imposed on the probability that the most recently released job will miss the corresponding deadline. As jobs complete, the execution times of the completed jobs, c[j], are fed back into the first stage. These execution times are used by the first stage to update the predictor state and to update the estimates of the prediction-error statistics.

Figure 15: Overall structure of the Two-Stage Prediction algorithm.

In the software architecture presented in this chapter, the first stage is local to each application. This architecture allows the application designer to choose the prediction method and make the desired trade-off between prediction overhead and accuracy. While this approach allows application-specific domain knowledge to be exploited for better prediction accuracy, the scope of the work presented here is limited to a generic prediction algorithm. Specifically, the prediction algorithm presented in this chapter is based on exploiting autocorrelation. For certain workloads, this prediction algorithm offers greater accuracy than a simple moving average. More accurate predictions of job-execution times allow for a more efficient allocation of

CPU budget such that the excess CPU capacity allocated to each task can be reduced without an increase in the probability of a deadline miss.

Some initial basic results for the TSP algorithm were first presented in [9]. The remainder of this chapter is organized as follows. First, the task model and CPU-reservation model are presented in Section 4.1. A generic first-stage prediction algorithm, called the LMS-MA Hybrid predictor, is presented in Section 4.2. The second stage of the TSP algorithm is presented in Section 4.3. Implementation-related details of TSP are presented in Section 4.4. Experimental results are presented in Section 4.5.

4.1 Revisiting the Task Model and CPU-Reservation Model

The task model and scheduling model used for the TSP algorithm are identical to the model considered in Section 3.1. Namely, the system hosts some number of soft-real-time tasks, and each task consists of a series of jobs. These jobs have known periodic release times and deadlines and unknown execution times, $c[j]$. The fixed interval of time between successive release times is referred to as the task period. Each job is assigned an index equal to the index of the task period in which the job is released. The job released in the $j$th task period is denoted $J_j$.

Each task is periodically allocated time to run on the CPU over periodic intervals. The allocated time is referred to as CPU budget, and the periodic interval is referred to as a reservation period. Reservation periods are integer submultiples of the task period. If the total budget, $Q_{total}[j]$, allocated over the $j$th task period is greater than the job-execution time, $c[j]$, and there are no earlier unfinished jobs, the deadline is met. If the allocated budget is less than the job-execution time, the deadline is missed.

The deadline of a job, $J_j$, is set to the release time of the following job, $J_{j+1}$. Therefore, when a job deadline is missed, following jobs that are released are queued until all previously released jobs are completed. In order for a job deadline to be met, sufficient budget must be allocated to accommodate the execution time of the corresponding job and the remaining execution time of all earlier-released jobs that are not yet completed. The total CPU budget needed to complete all queued jobs and the most recently released job, $J_j$, is referred to as the total computational load, $L[j]$.

Whether or not a job, $J_j$, will meet the corresponding deadline depends on the total allocated budget, $Q_{total}[j]$, compared to the total computational load, $L[j]$. These terms are discussed in greater detail in Section 4.3.1.

4.2 First-Stage Prediction

The purpose of the first stage of the TSP algorithm is to provide predictions of future job-execution times. The accuracy of the first-stage predictor affects the overall performance of the TSP algorithm. A low variance in the first-stage prediction error can reduce the budget allocation needed to impose a specific bound on the probability of a deadline miss.

In addition to producing accurate predictions, the prediction algorithm should have low computational requirements. The prediction operation is performed in the execution context of jobs and as a part of jobs. As a result, the first-stage predictor contributes to job-execution times and consumes a portion of the allocated CPU budget. The overhead of a prediction algorithm should not be so great as to outweigh any reduction in budget allocation enabled by the accuracy of that algorithm.

In the remainder of this section, a generic first-stage prediction algorithm is presented: the LMS-MA Hybrid predictor. This algorithm consists of an array of moving averages and normalized variable-step LMS filters. LMS filters are more accurate at predicting job-execution times when the execution times of successive jobs are correlated. Moving averages are more accurate when there is little or no correlation between the execution times of successive jobs. Job-execution times from multimedia workloads can be either correlated or uncorrelated depending on the type, format, or even the content of the data being processed. The LMS-MA Hybrid predictor is designed with the goal of performing well in both cases. In addition, the LMS-MA Hybrid predictor does not require any a priori knowledge or parameter tuning, making it robust and easy to use. In the discussion presented in the following sections, job-execution times are denoted $c[j]$ and the predicted execution times are denoted $\hat{c}[j]$. Prediction errors are denoted $e[j]$ and are computed such that:
$$c[j] = \hat{c}[j] + e[j]. \tag{45}$$

4.2.1 Array of Moving Averages

In the case of a moving average, the predicted value, $\hat{c}[j]$, is computed as the sample mean of the execution times of the $N$ most-recently-completed jobs. The length, $N$, is a tunable parameter. Longer moving averages allow for better noise suppression but are slower to respond to changes in the filtered signal. The optimum length that minimizes the prediction error depends on the job-execution times of the specific workload. However, it is not desirable to require the user to tune the length of the moving average for each workload.

To eliminate the need for tuning, an array of moving averages can be used. With an array of moving averages, the execution time of the next job is predicted with multiple filters, each of a different length. To reduce the overhead of computing the output of longer filters, the outputs of shorter filters are used as a part of the computation. Let $B$ denote the number of filters in the array, let $\hat{c}_b[j]$ denote the output of the $b$th moving average, and let $N_b$ denote the corresponding length. The filter outputs are defined as follows:
$$\hat{c}_b[j] = \begin{cases} 0 & \text{for } b = 0 \\ \dfrac{1}{N_b} \displaystyle\sum_{i=1}^{N_b} c[j-i] & \text{for } b = 1, 2, \ldots, B \end{cases} \tag{46}$$
When a job is completed and the corresponding execution time is known, the prediction error, $e_b[j]$, is computed for each moving average. Then, the approximate mean and variance of the prediction error are computed for each filter using an exponentially-weighted moving average (EWMA):

$$e_b[j] = c[j] - \hat{c}_b[j], \tag{47}$$
$$\hat{E}(e_b[j]) = \gamma\,\hat{E}(e_b[j-1]) + (1-\gamma)\,e_b[j], \tag{48}$$
$$\widehat{Var}(e_b[j]) = \gamma\,\widehat{Var}(e_b[j-1]) + (1-\gamma)\left(e_b[j] - \hat{E}(e_b[j])\right)^2. \tag{49}$$

An EWMA is used to compute the error mean and error variance because it was found to perform better overall in the TSP algorithm than a uniformly-weighted moving average. Also, the computational and memory overhead of an EWMA is lower than that of a uniformly-weighted moving average.

Once the error mean and variance are computed for all the filters, the next output of the predictor is set to the output of the filter with the least error variance. Let $b^*$ denote the index of the filter with the minimum error variance. The error mean and variance of the predictor are set to the error mean and variance of the chosen filter:
$$\hat{c}[j] = \hat{c}_{b^*}[j], \tag{50}$$
$$\hat{E}(e[j]) = \hat{E}(e_{b^*}[j]), \tag{51}$$
$$\widehat{Std}(e[j]) = \sqrt{\widehat{Var}(e_{b^*}[j])}. \tag{52}$$
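For illustration, a minimal C sketch of the moving-average array of (46) through (52) follows. The array size, the filter lengths, and the EWMA weight γ are assumed values; for clarity, each filter output is recomputed directly rather than reusing the shorter sums as described above, and the circular buffer is assumed to be full before the outputs are trusted.

    #include <math.h>

    #define B      4                 /* number of moving averages (assumed) */
    #define N_MAX  64                /* length of the longest filter */
    #define GAMMA  0.95              /* EWMA weight, an assumed value */

    static const int N_b[B + 1] = {0, 8, 16, 32, 64};  /* filter lengths */
    static double hist[N_MAX];       /* circular buffer of past c[j] */
    static int    head;              /* index one past the newest sample */
    static double e_mean[B + 1], e_var[B + 1];

    /* output of the b-th filter over the samples currently held, per (46) */
    static double ma_output(int b)
    {
        double sum = 0.0;
        int i;
        for (i = 1; i <= N_b[b]; i++)
            sum += hist[(head - i + N_MAX) % N_MAX];   /* c[j-i] */
        return b ? sum / N_b[b] : 0.0;
    }

    /* feed in a completed job's execution time; returns the prediction for
     * the next job and its error standard deviation, per (47)-(52) */
    double ma_array_update(double c, double *std_out)
    {
        int b, best = 0;
        for (b = 0; b <= B; b++) {
            double e = c - ma_output(b);                          /* (47) */
            double d;
            e_mean[b] = GAMMA * e_mean[b] + (1.0 - GAMMA) * e;    /* (48) */
            d = e - e_mean[b];
            e_var[b]  = GAMMA * e_var[b] + (1.0 - GAMMA) * d * d; /* (49) */
            if (e_var[b] < e_var[best])
                best = b;                                         /* b*   */
        }
        hist[head] = c;               /* admit the new sample */
        head = (head + 1) % N_MAX;
        *std_out = sqrt(e_var[best]);                             /* (52) */
        return ma_output(best);                                   /* (50) */
    }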

4.2.2 The Normalized Variable-Step LMS Algorithm

In certain workloads, there is a strong correlation between the execution times of successive jobs. This correlation is not fully exploited by a moving average. In contrast, the Wiener filter is specifically designed to exploit autocorrelation.

A Wiener filter is the optimum finite-impulse-response (FIR) filter "that provides the minimum mean-square-error estimate of a given process by filtering a set of observations of a statistically related process" [39]. For the purposes of execution-time prediction, a Wiener filter is a linear predictor that provides the least mean-square-error (MSE) estimate of the execution time of a future job by filtering the execution times of previous jobs. Let $w_i$ denote the coefficients of the Wiener filter. The predicted value is computed as follows:
$$\hat{c}[j] = \sum_{i=1}^{N} w_i\, c[j-i]. \tag{53}$$

The Wiener filter minimizes the MSE, $E\left((c[j]-\hat{c}[j])^2\right)$. However, computing the coefficients of the Wiener filter, $\vec{w}$, requires knowledge of the autocorrelation matrix of job-execution times. It is not practical to assume that the autocorrelation matrix is known or that the matrix can be dynamically computed for each workload.

An alternative to directly applying the Wiener filter is to use an adaptive filter based on the LMS algorithm [39]. The LMS algorithm is a steepest-descent algorithm for searching for an optimum set of coefficients with the least MSE. The MSE is a quadratic function of the filter coefficients. The search for the optimum set of coefficients involves iteratively correcting the coefficients of an adaptive filter in the approximate direction of largest decrease in the MSE. The direction and magnitude of steepest descent are not precisely known, and the corrections are approximate. However, on average, the coefficients of the adaptive filter converge towards the coefficients of the Wiener filter [39].

The predicted value and the prediction error are computed in the same way as for the Wiener filter:
$$\hat{c}[j] = \sum_{i=1}^{N} w_i[j]\, c[j-i], \qquad e[j] = c[j] - \hat{c}[j]. \tag{54}$$
However, as new values of the prediction error become known, the filter coefficients are adjusted as follows:
$$w_i[j+1] = w_i[j] + \beta \left( \frac{c[j-i]\, e[j]}{\displaystyle\sum_{k=1}^{N} c^2[j-k]} \right), \tag{55}$$
where $\beta$ is the step size and $N$ is the length of the filter. This equation for adjusting

the filter coefficients is based on the Normalized LMS algorithm, and the sum of

squares in the denominator is the normalizing factor. The normalized LMS algorithm

converges in the mean square if 0 < β < 2 [39].

The step size, β, affects the rate of convergence of the filter coefficients as well as

the steady-state MSE. When the filter coefficients converge, the coefficients fluctuate

about the optimum because the corrections to the coefficients are approximate. As

a result, the steady-state MSE is larger than the ideal MSE by an amount referred

to as the excess mean-square error [39]. Both the convergence rate and the excess

mean-square error increase with the step size.

Since it is not desirable to require that the step size be tuned, the Variable Step

(VS) adaptive filter algorithm is used [38]. In this algorithm, there is a separate step

size, βi[j], for each filter coefficient and these step sizes are adapted dynamically. When a filter coefficient is further from the optimal value, the corresponding step

size is increased so that the coefficient converges faster. When the filter coefficient is

closer to the optimal value, the step size is decreased so that any misadjustment in

the coefficient is smaller and the resulting excess mean-square error is smaller.

As mentioned earlier, the MSE is a quadratic function of the filter coefficients.

At the minimum, the gradient of the MSE is zero. Whether or not a coefficient is

close to the optimal value can be determined by the frequency of zero crossings in

the corresponding component of the MSE gradient. Equivalently, “closeness” of a

coefficient to the optimum value can be determined by the frequency of sign changes

in the corresponding component of the MSE gradient. The component of the MSE

gradient corresponding to the coefficient wi is approximated as follows:

$$\nabla_i\, E\left(e^2[j]\right) = \frac{\partial E\left(e^2[j]\right)}{\partial w_i[j]} \approx -2\, e[j]\, c[j-i]. \tag{56}$$

If the sign of a component of the MSE gradient changes repeatedly on S0 successive steps, it is assumed that the corresponding filter coefficient is close to the optimal value and the step size is halved. If the sign stays constant on S1 successive steps, it is assumed that the filter coefficient is still far from the optimal value and the step size is doubled. Based on the discussion in [38], S1 is set to 3 and S0 is set to (S1 −1) = 2.

For reasons of stability, the step sizes are bounded above by $\beta_{max} = 1.5$. A lower bound, $\beta_{min} = 0.0125$, is also applied. It is assumed that the filter coefficients are initially far from the optimal value, and therefore the step sizes are initialized to the maximum value, $\beta_{max}$. The normalized VS-LMS algorithm is summarized in Figure 16.

$$\hat{c}[j] = \sum_{i=1}^{N} w_i[j]\, c[j-i], \qquad e[j] = c[j] - \hat{c}[j]$$
$$w_i[j+1] = w_i[j] + \beta_i[j] \left( \frac{c[j-i]\, e[j]}{\displaystyle\sum_{k=1}^{N} c^2[j-k]} \right)$$
$$\beta_i[j+1] = \begin{cases} \frac{1}{2}\,\beta_i[j] & \text{if } \beta_i[j] > \beta_{min} \text{ and } \operatorname{sgn}\!\left(\nabla_i(E(e^2[j]))\right) \text{ changes } S_0 \text{ times} \\ 2\,\beta_i[j] & \text{if } \beta_i[j] < \beta_{max} \text{ and } \operatorname{sgn}\!\left(\nabla_i(E(e^2[j]))\right) \text{ stays constant } S_1 \text{ times} \\ \beta_i[j] & \text{otherwise} \end{cases}$$
$$\nabla_i\, E\left(e^2[j]\right) \approx -2\, e[j]\, c[j-i]$$

Figure 16: Summary of the normalized VS LMS algorithm.
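A compact C sketch of one update step of the normalized VS-LMS filter in Figure 16 follows. The filter length is an assumed value for one element of the array, initialization of the step sizes to BETA_MAX is left to the caller, and the sign-run bookkeeping is one plausible reading of the S0/S1 rule.

    #include <string.h>

    #define N_LMS    16       /* filter length (one array element; assumed) */
    #define S0       2        /* alternations before halving the step */
    #define S1       3        /* constant signs before doubling the step */
    #define BETA_MAX 1.5
    #define BETA_MIN 0.0125

    static double w[N_LMS];      /* coefficients w_i[j] */
    static double beta[N_LMS];   /* per-coefficient steps; init to BETA_MAX */
    static double x[N_LMS];      /* x[i] holds c[j-1-i] */
    static int    sgn[N_LMS], alt[N_LMS], run[N_LMS];

    /* consume the newly measured c[j]; returns the prediction error e[j] */
    double vs_lms_update(double c)
    {
        double chat = 0.0, norm = 0.0, e;
        int i;

        for (i = 0; i < N_LMS; i++) {        /* (54): chat and ||x||^2 */
            chat += w[i] * x[i];
            norm += x[i] * x[i];
        }
        e = c - chat;

        for (i = 0; i < N_LMS; i++) {
            /* sign of the gradient estimate, grad_i ~ -2 e c[j-i]  (56) */
            int s = (e * x[i] >= 0.0) ? -1 : 1;
            if (s != sgn[i]) { alt[i]++; run[i] = 1; }
            else             { run[i]++; alt[i] = 0; }
            sgn[i] = s;
            if (alt[i] >= S0 && beta[i] > BETA_MIN) { beta[i] *= 0.5; alt[i] = 0; }
            if (run[i] >= S1 && beta[i] < BETA_MAX) { beta[i] *= 2.0; run[i] = 0; }

            if (norm > 0.0)                  /* normalized correction (55) */
                w[i] += beta[i] * e * x[i] / norm;
        }
        memmove(&x[1], &x[0], (N_LMS - 1) * sizeof(double));
        x[0] = c;                            /* newest sample becomes c[j-1] */
        return e;
    }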

4.2.3 Array of Adaptive Filters

The accuracy of the normalized VS-LMS algorithm is sensitive to the filter length as well. The filter length used by the predictor may be less than the length of the FIR system being modeled, even though it is assumed in the design of the filter that the two lengths are the same. Furthermore, the optimum filter length may change with time [62]. However, in the same way as for the step size, it is not desirable to require that the filter length be tuned for each workload. Instead, the filter length should be adapted.

Variable-length LMS algorithms have been explored in previous research. In one approach presented in [62], an M-tap adaptive filter is represented as a cascade of K P-tap adaptive filters, where M = KP. At each time step, the MSE is estimated at the output of each stage of the cascade. The output of the overall system is set to the output of the stage with the least MSE. A similar cascading approach is presented in [55] to address the problem of slow convergence in long filters. In another approach presented in [20], the system simultaneously maintains three adaptive filters, $\hat{h}_1$, $\hat{h}_2$, and $\hat{h}_3$, of lengths $N_1$, $N_2 = (N_1+1)$, and $N_3 = (N_1+2)$, respectively. At each time step, the MSE is dynamically estimated for all three filters to determine if the filter length, $N_2$, is a minimum for the MSE. If $N_2$ is not a minimum, the lengths of all three filters are adapted in the direction of greatest decrease in the MSE.

A cascade of LMS filters as proposed in [62] does not have the same flexibility as a single large LMS filter and can have limited performance. On the other hand, changing the filter length dynamically as proposed in [55] can result in transients.

Transients can negatively affect performance and limit how often the filter length can be adapted.

As an alternative to these two approaches, an array of LMS filters is proposed.

This filter array is similar to the array of moving-average filters discussed in Section

4.2.1. The array consists of multiple normalized VS-LMS filters similar to the ones described in Section 4.2.2. Each filter in the array has a different length, but the lengths of the individual filters are kept constant. The output of the array is set to the output of the filter with the least MSE.

Computationally, this filter-array approach can be significantly more expensive than the cascade approach or the gradient-descent approach. Another disadvantage is that the lengths of the filters in the array are not flexible. However, with this approach, there are no transients due to changes in the filter length. Switching is done at the output of the adaptive filters and not in the feedback path, and therefore can be performed independently of the adaptations in the filter coefficients. Also, the additional computational overhead associated with this approach may be acceptable if the decrease in budget allocation enabled by the accuracy of this approach can outweigh the larger overhead.

86 4.2.4 The LMS-MA Hybrid Predictor

When the execution times of successive jobs are correlated, the LMS filter can offer better prediction accuracy and lower variance in the prediction error. When there is little or no correlation, a moving average can perform better. In both the array of moving averages described in Section 4.2.1 and in the array of LMS filters described in Section 4.2.3, the output of the array is set to the output of the filter with the least estimated MSE. Similarly, these two filter arrays can be merged into a larger array, and the output of this larger array can be set to the output of the filter with the least estimated MSE. The goal of using this combined array is to have the advantage of both types of filters and to handle both types of workloads with correlated or uncorrelated job-execution times. This combined array is called the LMS-MA Hybrid predictor. The performance of this algorithm in terms of overhead and prediction accuracy is discussed in Section 4.5.4.

4.3 The Second Stage

The purpose of the second stage of the TSP algorithm is to use the output of the

first-stage predictor and compute a valid budget allocation such that an upper bound can be imposed on the probability of a deadline miss.

In this section, the second stage of the TSP algorithm is developed. An equation is derived for budget allocation, and Cantelli’s inequality is used to prove the upper bound on the probability of a deadline miss. Then, this equation for budget allocation is modified to exploit the output of the first stage predictor. As mentioned earlier, the purpose of the first-stage predictor is to reduce the uncertainty in job-execution times in order to reduce the average budget allocation necessary to impose a specific upper bound. The equation for budget allocation is further modified to produce valid budget allocations in cases of severe underpredictions. Then, multi-step prediction is discussed, and the equation for budget allocation is further modified to make it

87 practically feasible in cases of missed deadlines and job queueing. The final form

of the budget-allocation equation is used as the second stage of the TSP algorithm.

Before proceeding with the development of the second stage, expressions are derived

for total allocated budget and total computational load.

4.3.1 Allocated Budget & Computational Load

As mentioned in Section 4.1, the total computational load, L[j], is the total CPU bud- get needed to complete all queued jobs and the most recently released job. Whether or not a job, Jj, will meet the corresponding deadline depends on the total allocated budget, Qtotal[j], compared to total computational load, L[j].

Before presenting expressions for $L[j]$ and $Q_{total}[j]$, some of the notation from Section 3.3 is repeated here for convenience. Let $K$ denote the number of reservation periods in a task period. The index of the reservation period, $k$, and the task period, $j$, are related such that $j = \lfloor k/K \rfloor$. The indices $k_{first}$ and $k_{last}$ denote the first and last reservation periods of the task period:
$$k_{first} = jK, \qquad k_{last} = jK + K - 1. \tag{57}$$

The variable $l[k]$ denotes the number of jobs queued at the start of the $k$th reservation period. The variable $c_{min}[k]$ denotes the CPU budget already consumed by the currently running job before the start of the $k$th reservation period. The variable $Q[k]$ denotes the CPU budget allocated in the $k$th reservation period. Finally, the following notation is used to denote the summation of job-execution times:
$$C_{j_1}^{\,j_2} = \begin{cases} \displaystyle\sum_{j=j_1}^{j_2} c[j] & \text{if } j_2 \geq j_1 \\ 0 & \text{otherwise} \end{cases} \tag{19 revisited}$$

As described in Section 3.3.2, the computational load, $L[j]$, at the start of the $j$th task period consists of the execution times of all queued jobs minus any CPU time already consumed:
$$L[j] = C_{j-l[k_{first}]+1}^{\,j} - c_{min}[k_{first}]. \tag{58}$$
Equation (58) is similar to (21), except that $L[j]$ corresponds to the computational load from all queued jobs and not just the first $l$ jobs.

Also, in Section 3.3.2, the total budget allocated in the $j$th task period, $Q_{total}[j]$, is expressed as the sum of the budget allocated in the first $(K-1)$ reservation periods of the task period and the budget allocated in the final reservation period, $Q[k_{last}]$:
$$Q_{total}[j] = \sum_{k=k_{first}}^{k_{last}-1} Q[k] + Q[k_{last}]. \tag{22 revisited}$$

Expanding the summation based on (23), $Q_{total}[j]$ is expressed as follows:
$$Q_{total}[j] = C_{j-l[k_{first}]+1}^{\,j-l[k_{last}]} + c_{min}[k_{last}] - c_{min}[k_{first}] + Q[k_{last}]. \tag{59}$$

4.3.2 Bounding the Probability of a Missed Deadline

Theorem 2. Let $Std(X)$ denote the standard deviation of a random variable $X$, let $\rho_{miss,j}$ denote the probability of the event that the deadline of job $J_j$ is missed, and let $\alpha$ denote a user-defined parameter. If the budget allocation in reservation period $k$ is computed as follows:
$$Q[k] = \frac{E\left( C_{j-l[k]+1}^{\,j} \right) + \alpha\, Std\left( C_{j-l[k]+1}^{\,j} \right) - c_{min}[k]}{K - (k \bmod K)}, \tag{60}$$
then the probability, $\rho_{miss,j}$, is bounded above by $\dfrac{1}{1+\alpha^2}$.

The $\alpha$ parameter in Equation 60 is similar to the $\alpha$ parameter in the PBS algorithm. As $\alpha$ increases, the budget allocation increases and the bound on the probability of a deadline miss decreases. Therefore, the $\alpha$ parameter can be used to trade off budget utilization for timeliness.

Proof. The probability that job $J_j$ misses the corresponding deadline, $\rho_{miss,j}$, is equal to the probability that the total budget allocated in the $j$th task period, $Q_{total}[j]$, is less than the total computational load, $L[j]$:
$$\rho_{miss,j} = \Pr\left( L[j] > Q_{total}[j] \right). \tag{61}$$

Substituting for L[j] and Qtotal[j] with the corresponding expressions, ρmiss,j can be expressed as follows:

$$\rho_{miss,j} = \Pr\left( C_{j-l[k_{last}]+1}^{\,j} > c_{min}[k_{last}] + Q[k_{last}] \right). \tag{62}$$

Evaluating (60) at $k = k_{last}$, the denominator becomes one:
$$Q[k_{last}] = E\left( C_{j-l[k_{last}]+1}^{\,j} \right) + \alpha\, Std\left( C_{j-l[k_{last}]+1}^{\,j} \right) - c_{min}[k_{last}]. \tag{63}$$

Substituting for $Q[k_{last}]$ in (62) and simplifying, $\rho_{miss,j}$ can be expressed as follows:
$$\rho_{miss,j} = \Pr\left( C_{j-l[k_{last}]+1}^{\,j} > E\left( C_{j-l[k_{last}]+1}^{\,j} \right) + \alpha\, Std\left( C_{j-l[k_{last}]+1}^{\,j} \right) \right) = \Pr\left( \frac{ C_{j-l[k_{last}]+1}^{\,j} - E\left( C_{j-l[k_{last}]+1}^{\,j} \right) }{ Std\left( C_{j-l[k_{last}]+1}^{\,j} \right) } > \alpha \right). \tag{64}$$
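For reference, the bound invoked in the next step is Cantelli's one-sided inequality: for a random variable $X$ with mean $\mu$, standard deviation $\sigma$, and any $\alpha > 0$,
$$\Pr\left( X - \mu \geq \alpha\sigma \right) \leq \frac{\sigma^2}{\sigma^2 + \alpha^2\sigma^2} = \frac{1}{1 + \alpha^2},$$
which, applied to the standardized sum in (64), yields the bound below.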

Since job-execution times are strictly positive, the summation, $C_{j-l[k_{last}]+1}^{\,j}$, is also strictly positive and the corresponding probability distribution is one-sided. Applying Cantelli's inequality [56], the probability of a job missing the corresponding deadline is bounded above by $\dfrac{1}{1+\alpha^2}$:
$$\rho_{miss,j} = \Pr\left( \frac{ C_{j-l[k_{last}]+1}^{\,j} - E\left( C_{j-l[k_{last}]+1}^{\,j} \right) }{ Std\left( C_{j-l[k_{last}]+1}^{\,j} \right) } > \alpha \right) \leq \frac{1}{1+\alpha^2}. \tag{65}$$

Let $\rho_{desired}$ denote the desired value of the probability of a job missing the deadline. Then, the value of $\alpha$ that will bound the probability accordingly can be computed as follows:
$$\alpha = \sqrt{\frac{1}{\rho_{desired}} - 1}. \tag{66}$$
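As a worked example, a target miss probability of $\rho_{desired} = 0.05$ gives $\alpha = \sqrt{1/0.05 - 1} = \sqrt{19} \approx 4.36$. A small C sketch of (66) and the allocation rule (60) follows; the mean and standard deviation of the summed load are assumed to be estimated elsewhere, and the clamp to zero is an addition anticipating the degenerate cases discussed in Section 4.3.4.

    #include <math.h>

    /* alpha that bounds the miss probability at rho_desired, per (66) */
    double alpha_for(double rho_desired)
    {
        return sqrt(1.0 / rho_desired - 1.0);
    }

    /* budget for reservation period k, per (60); load_mean and load_std are
     * estimates of the mean and standard deviation of the summed execution
     * times of the l[k] queued jobs, obtained elsewhere */
    double budget_60(double load_mean, double load_std, double alpha,
                     double c_min, int K, int k)
    {
        double q = (load_mean + alpha * load_std - c_min)
                   / (double)(K - k % K);
        return q > 0.0 ? q : 0.0;   /* never allocate a negative budget */
    }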

4.3.3 Using the Output of the First-Stage Predictor

Cantelli's inequality, a one-sided variant of the Chebyshev inequality, gives a conservative upper bound on the probability of a deadline miss (65). Therefore, directly applying Cantelli's inequality as done in (60) may result in higher budget allocations than necessary. If the variance in the first-stage prediction error is lower than the variance in job-execution times, the output of the first-stage predictor can be used to reduce the budget allocation.

As mentioned in Section 4.2, $\hat{c}[j]$ denotes a prediction of the execution time, $c[j]$, and $e[j]$ denotes the corresponding prediction error such that
$$c[j] = \hat{c}[j] + e[j]. \tag{45 revisited}$$

Similar to the notation for the summation of job-execution times, let the summations of $\hat{c}[j]$ and $e[j]$ over a set of jobs be denoted as follows:
$$\hat{C}_{j_1}^{\,j_2} \triangleq \begin{cases} \displaystyle\sum_{j=j_1}^{j_2} \hat{c}[j] & \text{for } j_2 \geq j_1 \\ 0 & \text{otherwise} \end{cases}, \qquad e_{j_1}^{\,j_2} \triangleq \begin{cases} \displaystyle\sum_{j=j_1}^{j_2} e[j] & \text{for } j_2 \geq j_1 \\ 0 & \text{otherwise} \end{cases}. \tag{67}$$
Substituting for the execution time with the predicted value and prediction error,

Equation 60 for budget allocation can be restated as follows:
$$Q[k] = \frac{ \hat{C}_{j-l[k]+1}^{\,j} + E\left( e_{j-l[k]+1}^{\,j} \right) + \alpha\, Std\left( e_{j-l[k]+1}^{\,j} \right) - c_{min}[k] }{ K - (k \bmod K) }. \tag{68}$$
The predicted value is known, whereas the prediction error is unknown. Because $\hat{c}[j]$ is known, the corresponding mean is $\hat{c}[j]$ itself and the standard deviation of $\hat{c}[j]$ is zero.

Corollary 1 (Corollary to Theorem 2). If the budget allocation in reservation period $k$ is computed according to Equation 68 and the predicted values, $\hat{c}[j]$, are bounded above, then the probability, $\rho_{miss,j}$, is bounded above by $\dfrac{1}{1+\alpha^2}$.

Proof. With similar substitutions of $c[j]$ with $\hat{c}[j]$ and $e[j]$ in (62), the probability of job $J_j$ missing the deadline can be expressed as follows:

$$\rho_{miss,j} = \Pr\!\left(\hat{C}_{j-l[k_{last}]+1}^{j} + e_{j-l[k_{last}]+1}^{j} > c_{min}[k_{last}] + Q[k_{last}]\right). \tag{69}$$

Evaluating (68) at k = klast,

$$Q[k_{last}] = \hat{C}_{j-l[k_{last}]+1}^{j} + \mathrm{E}\!\left(e_{j-l[k_{last}]+1}^{j}\right) + \alpha\,\mathrm{Std}\!\left(e_{j-l[k_{last}]+1}^{j}\right) - c_{min}[k_{last}].$$

Substituting for $Q[k_{last}]$ in (69), the probability expression is reduced as follows:

$$\rho_{miss,j} = \Pr\!\left(e_{j-l[k_{last}]+1}^{j} > \mathrm{E}\!\left(e_{j-l[k_{last}]+1}^{j}\right) + \alpha\,\mathrm{Std}\!\left(e_{j-l[k_{last}]+1}^{j}\right)\right) \tag{70}$$

$$= \Pr\!\left(\frac{e_{j-l[k_{last}]+1}^{j} - \mathrm{E}\!\left(e_{j-l[k_{last}]+1}^{j}\right)}{\mathrm{Std}\!\left(e_{j-l[k_{last}]+1}^{j}\right)} > \alpha\right). \tag{71}$$

Since the predicted values are bounded above and execution times are strictly positive, the prediction errors, $e[j]$, have a one-sided probability distribution. Therefore, applying Cantelli's inequality, the probability of $J_j$ missing the deadline can still be upper-bounded at $1/(1+\alpha^2)$ in the same way as in (65):

$$\rho_{miss,j} = \Pr\!\left(\frac{e_{j-l[k_{last}]+1}^{j} - \mathrm{E}\!\left(e_{j-l[k_{last}]+1}^{j}\right)}{\mathrm{Std}\!\left(e_{j-l[k_{last}]+1}^{j}\right)} > \alpha\right) \leq \frac{1}{1+\alpha^2}. \tag{72}$$

Equation 68 is a generalization of (60) where the predicted execution time, $\hat{c}[j]$, is identically set to zero and $e[j] = c[j]$. However, with an effective first-stage prediction algorithm, the variance in the prediction error may be significantly smaller than the variance in the execution time. A smaller variance in the prediction error allows for a smaller budget allocation while providing the same upper bound on the probability of missing the deadline.

Corollary 1 requires an upper bound on the predicted values of job-execution

times. This condition must be satisfied by the first-stage-prediction algorithm. In

the case of the moving-average filter or LMS filter discussed in Section 4.2, predicted

values are computed as weighted averages of previous job-execution times. For both

moving averages and for stable LMS filters, the filter coefficients are bounded. Since

job-execution times are bounded by WCETs, the predicted values are also bounded,

and the requirements of Corollary 1 are satisfied.

4.3.4 Handling Severe Underpredictions

In most cases, Equation 68 will produce appropriate values for budget allocation.

However, if the predicted value of job-execution time, $\hat{c}[j]$, is significantly smaller than the actual execution time and the value of $c_{min}[k]$ is large, then Equation 68 may produce zero or negative values for budget allocation while there are jobs pending.

This special case can be addressed in the same way as in the PBS algorithm, by using a conditional mean in the estimate of the total computational load. Specifically, the contribution of the currently running job to the computational load should not be approximated with just the mean execution time. Instead, the conditional mean should be used, given that this execution time is known to be larger than $c_{min}[k]$. In addition, no budget should be allocated when there are no jobs queued.

Let $j_{run}$ denote the index of the currently running job at the start of the $k$-th reservation period. Based on the discussion in Section 3.3.2, $j_{run}$ is related to $k$ as follows:

$$j_{run} = j - l[k] + 1, \tag{73}$$

where $l[k]$ denotes the length of the job queue. Accordingly, $c[j_{run}]$ is the execution time of the job running at the start of the $k$-th reservation period. Let $\tilde{c}_{cond}[k]$ denote an approximation of the conditional mean of $c[j_{run}]$ given that $c[j_{run}]$ is greater than $c_{min}[k]$:

$$\tilde{c}_{cond}[k] \approx \mathrm{E}\!\left(c[j_{run}] \,\middle|\, c[j_{run}] > c_{min}[k]\right). \tag{74}$$

Equation (68) for budget allocation is modified as follows:

$$Q[k] = \begin{cases} \dfrac{\tilde{c}_{cond}[k] + \hat{C}_{j-l[k]+2}^{j} + \mathrm{E}\!\left(e_{j-l[k]+2}^{j}\right) + \alpha\,\mathrm{Std}\!\left(e_{j-l[k]+1}^{j}\right) - c_{min}[k]}{K - (k \bmod K)} & l[k] > 0 \\ 0 & \text{otherwise.} \end{cases} \tag{75}$$

By definition, $\tilde{c}_{cond}[k]$ is greater than $c_{min}[k]$, and $(\tilde{c}_{cond}[k] - c_{min}[k])$ is always positive. Therefore, the budget allocation, $Q[k]$, as computed by Equation 75, is strictly positive for job-queue lengths greater than zero, addressing the underprediction problem.

Furthermore, the previous upper bound on the probability of a deadline miss is still valid. By definition, $\tilde{c}_{cond}[k]$ is greater than or equal to $\mathrm{E}(c[j_{run}])$. For the case when the queue length is greater than zero, the budget allocation defined in (75) is greater than the budget allocation defined in (68). With larger budget allocations, the probability of $J_j$ missing the deadline is lower. For the case when the queue length is equal to zero, the most recently released job, $J_j$, has already completed before the corresponding deadline and the probability of missing the deadline is zero. For both cases, the probability of a deadline miss is still bounded above by $1/(1+\alpha^2)$.

4.3.5 Approximating the Conditional Mean

To compute an estimate of the conditional mean defined by (74), the probability distribution is needed. As in the case of the PBS algorithm, the distribution of job-execution times is approximated with a translated exponential distribution. The

translated exponential distribution is an appropriate approximation for long jobs because job-execution times of most workloads have probability distributions with long tails. Also, an advantage of using an exponential distribution is that the conditional mean can be computed with little overhead due to the memorylessness property.

A translated exponential distribution is defined by two parameters: the translation amount, denoted $c_{trans}$, and the rate parameter of the exponential distribution, $\lambda$. These parameters can be assigned values such that the mean and standard deviation of the resulting distribution are equal to the mean and standard deviation of job-execution times. Both the mean and standard deviation of an exponentially distributed random variable are equal to $\lambda^{-1}$. Since the standard deviation is unaffected by translation, $\lambda^{-1}[j]$ is assigned a value equal to the standard deviation, $\mathrm{Std}(c[j])$. Accordingly, the translation amount is computed as follows:

$$c_{trans}[j] = \mathrm{E}(c[j]) - \mathrm{Std}(c[j]). \tag{76}$$

Because the exponential distribution is memoryless, $\lambda[j]$ does not change with conditioning. The only effect of conditioning on the mean is a change in the translation amount. The conditional mean of job-execution times is approximated as follows:

$$\mathrm{E}\!\left(c[j] \mid c[j] > c_{min}\right) \approx \begin{cases} c_{trans}[j] + \mathrm{Std}(c[j]) & \text{if } c_{trans}[j] \geq c_{min} \\ c_{min} + \mathrm{Std}(c[j]) & \text{otherwise.} \end{cases} \tag{77}$$

Substituting $c[j]$ and $c_{trans}[j]$ with the corresponding expansions, the above expressions are reduced as follows:

$$c_{trans}[j] = \hat{c}[j] + \mathrm{E}(e[j]) - \mathrm{Std}(e[j]). \tag{78}$$

$$\mathrm{E}\!\left(c[j] \mid c[j] > c_{min}\right) \approx \begin{cases} \hat{c}[j] + \mathrm{E}(e[j]) & \text{if } c_{trans}[j] \geq c_{min} \\ c_{min} + \mathrm{Std}(e[j]) & \text{otherwise.} \end{cases} \tag{79}$$

Finally, based on (74), $\tilde{c}_{cond}[k]$ is expressed as follows:

$$\tilde{c}_{cond}[k] = \begin{cases} \hat{c}[j_{run}] + \mathrm{E}(e[j_{run}]) & \text{if } c_{trans}[j_{run}] \geq c_{min}[k] \\ c_{min}[k] + \mathrm{Std}(e[j_{run}]) & \text{otherwise.} \end{cases} \tag{80}$$
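A minimal C sketch of this computation is given below. The function and parameter names, and the use of double-precision values, are illustrative assumptions; in the actual implementation (Section 4.4.3), the corresponding quantities are passed through the kernel module as integer fields.

    /* Sketch of Equations 78 and 80: approximate conditional mean of the
     * running job's execution time given that it exceeds c_min[k].
     * c_hat: predicted execution time; e_mean, e_std: mean and standard
     * deviation of the prediction error; c_min: budget already consumed. */
    static double cond_mean(double c_hat, double e_mean, double e_std,
                            double c_min)
    {
        double c_trans = c_hat + e_mean - e_std;   /* Equation 78 */

        if (c_trans >= c_min)
            return c_hat + e_mean;  /* conditioning has no effect: the
                                       support already starts above c_min */
        return c_min + e_std;       /* memoryless tail restarted at c_min:
                                       add the exponential mean, Std(e) */
    }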

4.3.6 Multi-step Prediction

When a job deadline is missed, one or more jobs are queued behind the running job and the queue length is greater than one. In this case, evaluating Equation 75 for budget allocation requires multi-step prediction; the execution times of more recently-released jobs must be predicted while the execution time of the currently-running job is still unknown.

Furthermore, (75) requires an estimate of the expected value and standard deviation of the sum of the errors across these prediction steps.

For the LMS-based prediction algorithm described in Section 4.2.2, job-execution times are modeled as an Auto-Regressive Moving-Average (ARMA) process. Given previous values of an ARMA process, the LMS filter can be used to predict the next value of that process. When predicting the value of the process more than a single step after the most recent observation, a separate filter is required for each additional step. Also, the further out the prediction (the larger the number of prediction steps), the lower the prediction accuracy. Maintaining a separate, less accurate filter for every possible number of prediction steps is therefore not practical. On the other hand, for prediction algorithms based on moving averages, job-execution times are modeled as a mean-ergodic and variance-ergodic process. Regardless of the number of prediction steps, the predicted value and error variance for the moving average remain the same.

A common design principle in computing systems is to optimize for the common case. In most cases, jobs meet their corresponding deadlines, the queue length is one, and only a single prediction step is necessary. Therefore, to predict the execution time of the currently-running job, the full LMS-MA hybrid algorithm can be used. On the other hand, for predictions of execution times of jobs released later, the array of moving averages can be used. Since queue lengths greater than one should occur less frequently, the accuracy of multi-step prediction should have a limited impact on performance overall.

Let $\hat{c}_0[j]$ denote the single-step prediction of $c[j]$ and let $e_0[j]$ denote the corresponding prediction error. Similarly, let $\hat{c}_l[j]$ denote the multi-step prediction of $c[j]$ regardless of the number of prediction steps, and let $e_l[j]$ denote the corresponding prediction error. In the implementation of the LMS-MA Hybrid algorithm, $\hat{c}_0[j]$ can be set to the output of the least-MSE filter of the complete array, and $\hat{c}_l[j]$ can be set to the output of the least-MSE moving average.

The second issue concerns the expected value and standard deviation of the sum of errors across the steps of the multi-step prediction. Prediction errors are random variables. Computing the expected value of a sum of random variables is straightforward. However, the standard deviation is the square root of the variance, and the variance of a sum of random variables, $X_1, X_2, \ldots, X_N$, is the sum of the covariances of each pair of variables in that sum [51]:

$$\mathrm{Var}\!\left(\sum_{i=1}^{N} X_i\right) = \sum_{i=1}^{N} \sum_{j=1}^{N} \mathrm{Cov}(X_i, X_j). \tag{81}$$

Therefore, to compute the variance of the sum of errors, the autocovariance of the prediction-error process is required.

However, as mentioned above, it should be sufficient in most cases to compute the error variance for a single prediction step. When it is necessary to address multiple prediction steps, it can be assumed that successive values of the prediction-error process are uncorrelated or weakly correlated. Then, the variance of the sum can be approximated with a sum of variances. Based on the results presented in Section 4.5.4, the weak-autocorrelation assumption is not always valid for the prediction-error process. However, since queue lengths greater than one should occur less frequently, this approximation should have a limited impact even when the weak-correlation assumption does not hold.

Based on the above discussion, the expected value of the execution time of jobs

queued after the currently running job is approximated with a moving average in the

same way as in the PBS algorithm. The approximate expected value of the sum of

the execution times of queued jobs is computed as follows:

$$\hat{\mathrm{E}}\!\left(C_{j-l[k]+2}^{j}\right) = (l[k]-1)\left(\hat{c}_l[j_{run}] + \hat{\mathrm{E}}(e_l[j_{run}])\right). \tag{82}$$

The approximate standard deviation of the sum of the execution times of all queued

jobs is computed as follows:

$$\hat{\mathrm{Std}}\!\left(C_{j-l[k]+1}^{j}\right) = \sqrt{\hat{\mathrm{Var}}(e_0[j_{run}]) + (l[k]-1)\,\hat{\mathrm{Var}}(e_l[j_{run}])}. \tag{83}$$

Finally, given the expressions for $\hat{\mathrm{E}}\!\left(C_{j-l[k]+2}^{j}\right)$ and $\hat{\mathrm{Std}}\!\left(C_{j-l[k]+1}^{j}\right)$, budget allocation is computed as follows:

$$Q[k] = \begin{cases} \dfrac{\tilde{c}_{cond}[k] - c_{min}[k] + \hat{\mathrm{E}}\!\left(C_{j-l[k]+2}^{j}\right) + \alpha\,\hat{\mathrm{Std}}\!\left(C_{j-l[k]+1}^{j}\right)}{K - (k \bmod K)} & l[k] > 0 \\ 0 & \text{otherwise.} \end{cases} \tag{84}$$

Equation 84 represents the second stage of the TSP algorithm. A summary of the terms used in the TSP algorithm is presented in Figure 17, and a summary of the second stage of the TSP algorithm is presented in Figure 18.

$k$ : index of the current reservation period (RP).
$K$ : number of reservation periods in a task period.
$j$ : index of the current task period, $j = \lfloor k/K \rfloor$.
$l[k]$ : number of jobs queued and running in the $k$-th RP.
$j_{run}$ : index of the currently running job, $j_{run} = j - l[k] + 1$.
$J_j$ : job released in the $j$-th task period.
$c[j]$ : execution time of $J_j$.
$\hat{c}_0[j]$ : single-step prediction of $c[j]$.
$e_0[j]$ : prediction error in $\hat{c}_0[j]$, such that $c[j] = \hat{c}_0[j] + e_0[j]$.
$\hat{c}_l[j]$ : multi-step prediction of $c[j]$.
$e_l[j]$ : prediction error in $\hat{c}_l[j]$, such that $c[j] = \hat{c}_l[j] + e_l[j]$.
$c_{min}[k]$ : CPU budget consumed by $J_{j_{run}}$ by the start of the $k$-th RP.
$\tilde{c}_{cond}[k]$ : approximate conditional mean of $c[j_{run}]$ given that $c[j_{run}] > c_{min}[k]$.
$\hat{\mathrm{E}}(X)$ : approximate expected value of a random variable $X$.
$\hat{\mathrm{Var}}(X)$ : approximate variance of a random variable $X$.

Figure 17: Summary of terms used for the TSP algorithm.

4.4 Implementing the TSP Algorithm

The implementation of TSP builds on the implementation of PBS described in Section

3.5. At each reservation-period boundary, the budget allocation for each SRT task is computed by a daemon process called the Allocator. Execution-time measurements, budget enforcement, and other privileged operations are performed in a kernel module similar to the PBS module described in Section 3.5.2. SRT tasks and the Allocator interact with the kernel module through special files in the /proc file system. Also, the Allocator and the kernel module share large amounts of data efficiently using memory mapping as described in Section 3.5.3.

There are two main differences between the implementations of TSP and PBS.

The first is the budget-allocation algorithm as described in the preceding sections.

The second is the execution context in which the prediction operation is performed.

$$c_{trans}[j] = \hat{c}_0[j] + \mathrm{E}(e_0[j]) - \mathrm{Std}(e_0[j])$$

$$\tilde{c}_{cond}[k] = \begin{cases} \hat{c}_0[j_{run}] + \mathrm{E}(e_0[j_{run}]) & \text{if } c_{trans}[j_{run}] \geq c_{min}[k] \\ c_{min}[k] + \mathrm{Std}(e_0[j_{run}]) & \text{otherwise} \end{cases}$$

$$\hat{\mathrm{E}}\!\left(C_{j-l[k]+2}^{j}\right) = (l[k]-1)\left(\hat{c}_l[j_{run}] + \hat{\mathrm{E}}(e_l[j_{run}])\right)$$

$$\hat{\mathrm{Std}}\!\left(C_{j-l[k]+1}^{j}\right) = \sqrt{\hat{\mathrm{Var}}(e_0[j_{run}]) + (l[k]-1)\,\hat{\mathrm{Var}}(e_l[j_{run}])}$$

$$Q[k] = \begin{cases} \dfrac{\tilde{c}_{cond}[k] - c_{min}[k] + \hat{\mathrm{E}}\!\left(C_{j-l[k]+2}^{j}\right) + \alpha\,\hat{\mathrm{Std}}\!\left(C_{j-l[k]+1}^{j}\right)}{K - (k \bmod K)} & l[k] > 0 \\ 0 & \text{otherwise} \end{cases}$$

Figure 18: Summary of the second stage of the TSP algorithm.

In the implementation of PBS, job-execution times are predicted with simple moving averages in the execution context of the Allocator. In the implementation of TSP, the prediction operation for each SRT task is performed in the execution context of the task itself.

Among the advantages of performing the prediction operation in the SRT task is that different algorithms can be applied to different applications and workloads, and the application designer has flexibility in the choice of the algorithm. The prediction overhead contributes to the execution time of SRT jobs, and the choice of the prediction algorithm allows the application designer to make the appropriate trade-off between accuracy and overhead.

4.4.1 Measurement and Prediction of Job-Execution Times

Among the consequences of performing the prediction operation in the execution context of SRT tasks is a change in the way that job-execution times are measured for the purposes of prediction.

In the implementation of PBS, a job only consists of workload-specific computation. When a job completes and the following job is still not released, the corresponding task is put to sleep until the following task-period boundary. The job-execution time is measured and recorded by the kernel module and stored in a memory location that is accessible to the Allocator. When the Allocator is activated, the execution times of future jobs are predicted based on a running average of the execution times of previous jobs. Budget is allocated for the following reservation period based on these predictions. These steps are illustrated in Figure 19.

Figure 19: Budget allocation in the implementation of PBS. Job-execution times consist of only workload-specific computation.

In the implementation of TSP, SRT jobs consist of both workload-specific computation as well as prediction-related computation. In each SRT task, after the workload-specific computation completes, the execution time of the completed job is obtained from the kernel and used to predict the execution time of the following job. The predicted execution time is passed from the SRT task to the kernel module before the task is put to sleep. The kernel module stores the predicted values in shared memory pages that are accessible to the Allocator. When the Allocator is activated, the budget allocation for each SRT task is computed based on the corresponding predicted values. These steps are shown in the example in Figure 20.

Figure 20: Budget allocation in the implementation of TSP. Execution-time predic- tion is done in the context of the corresponding SRT task. Job-execution time consists of both workload-specific computation and prediction-related computation.

Predicting the execution time of the next job, c[j], requires knowledge of the

execution time of the current job, c[j − 1]. Also, prediction-related computation

requires CPU time as well, and this CPU time is not known until the prediction

operation completes. Therefore, for the purposes of prediction, the execution time of

a job, $J_j$, is measured from the start of the prediction operation in the preceding job, $J_{j-1}$, until the end of the workload-specific computation in the current job, $J_j$. In this way, the execution time of $J_j$ is available at the start of the prediction operation for $J_{j+1}$.

Consider the example in Figure 20. The execution time of $J_2$ consists of the prediction operation at the end of $J_1$ and the workload-specific computation in $J_2$, and excludes the prediction operation at the end of $J_2$. In this way, the actual execution time of $J_2$ is available before predicting the execution time of $J_3$.

The above definition of job boundary and execution time is appropriate for the

purposes of prediction. However, for budget allocation, cmin[k] is still measured such

that a job starts with the workload-specific computation and ends with the prediction operation. This definition of job boundary is more consistent with the periodic real-time task model, where each job is released at the start of a task period and all parts of the job must complete within the task period. This difference in the definition of job boundary and the way that execution time is measured is not important because the execution time of the prediction-related computation remains mostly constant across jobs. Based on results presented in Section 4.5.4, prediction overhead has very little variability even across workloads. The differences in the way execution time is measured should not affect the analysis presented in Sections 4.2 and 4.3.

4.4.2 Soft-Real-Time Tasks

A basic template of an SRT task is presented in Figure 21. The interaction between

SRT tasks and the kernel module is abstracted through a library called libSRT. After application-specific initialization, the predictor is set up and the SRT_setup function is called. In this function, the real-time scheduling policy and priority are configured, and initial registration and setup is performed with the kernel module.

Before entering the main job loop, the SRT_sleepTillFirstJob function is called. In this function, the initial estimated job-execution time is passed to the kernel module and the SRT task is put to sleep until the next earliest reservation period. This initial sleep is done to align all task periods with system-wide reservation periods. Before the

SRT task is scheduled, the Allocator is activated and the initial budget is allocated to the SRT task. When the SRT task is next scheduled, budget enforcement begins.

After the initial sleep, an SRT task enters the job loop where each loop iteration corresponds to a job. After any application-specific computation, the SRT_sleepTillNextJob function is called at the end of each job. For the purposes of prediction, this function call demarcates job boundaries. In this function, the execution time of the completed job is read from the kernel module and passed to the predictor. The predictor state

103 is updated and various prediction-related parameters are returned and passed to the kernel module. Depending on whether or not the deadline is missed, the SRT task is either allowed to continue to the next job or blocked until the following task period.

On exiting the job loop, the SRT_close function is called, which issues a command to the kernel module to stop the periodic activations, budget allocations, and budget enforcement.

    /* Perform application-specific initialization */
    ...
    /* Set up the predictor */
    ...
    /* Initialize the CPU-budget mechanism */
    SRT_setup( ... );

    /* Notify the library of the start of the first job */
    SRT_sleepTillFirstJob( ... );

    while there are jobs remaining do
        /* Perform application-specific computation */
        ...
        /* Notify the library that the current job is complete */
        SRT_sleepTillNextJob( ... );
    end

    /* Notify the library that all jobs are complete */
    SRT_close( ... );

    /* Perform application-specific cleanup */
    ...

Figure 21: Template of an SRT task.

4.4.3 Computing Budget Allocations

A separate predictor is maintained in each SRT task. The predictor is defined by a generic data structure that consists of a pointer to the predictor state and pointers to functions that define the operations of the predictor. When a job completes in an

SRT task, the execution time of the completed job is read from the kernel module and passed to the update function of the predictor. In this function, the predictor

state is updated and four values are returned: mu_c0, var_c0, mu_cl, and var_cl, as defined in Figure 22.

    mu_c0 = $\hat{c}_0[j] + \hat{\mathrm{E}}(e_0[j])$
    var_c0 = $\hat{\mathrm{Var}}(e_0[j])$
    mu_cl = $\hat{c}_l[j] + \hat{\mathrm{E}}(e_l[j])$
    var_cl = $\hat{\mathrm{Var}}(e_l[j])$

Figure 22: List of values required from a predictor for budget allocation. The variables $\hat{c}_0[j]$, $e_0[j]$, $\hat{c}_l[j]$, and $e_l[j]$ are the single-step and multi-step predictions and prediction errors as defined in Figure 17.
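The generic predictor object described above can be pictured as a small C structure holding opaque state and function pointers; the sketch below is an assumption about its shape, since the exact declaration is not reproduced here.

    /* Sketch of a generic predictor object: opaque state plus function
     * pointers for its operations. All names and types are illustrative. */
    struct srt_predictor {
        void *state;   /* predictor-specific state (filters, MSE estimates) */

        /* Consume the latest job-execution time (in ns) and produce the
         * four values of Figure 22. */
        void (*update)(void *state, long long exec_time_ns,
                       long long *mu_c0, long long *var_c0,
                       long long *mu_cl, long long *var_cl);

        void (*destroy)(void *state);   /* release predictor resources */
    };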

On return from the update function, these four values and the alpha parameter are passed to the kernel module. The alpha parameter is maintained as a 32-bit fixed-point number with 16 fractional bits. The kernel module stores these values in a data structure called SRT_loaddata. A separate SRT_loaddata structure is maintained for each SRT task. In addition to the four prediction values and the alpha parameter,

the SRT_loaddata structure contains the current job-queue length, the current run

time, cmin[k], and the number of reservation periods remaining until the next deadline. These additional fields are updated and maintained by the kernel module.
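Putting these pieces together, the per-task SRT_loaddata structure might look roughly as follows; the field names and 32-bit integer types are assumptions for illustration, since the exact layout is not listed in the text.

    #include <stdint.h>

    /* Sketch of the per-task SRT_loaddata structure shared between the
     * kernel module and the Allocator. Field names/types are assumptions. */
    struct SRT_loaddata {
        int32_t mu_c0;            /* single-step predicted load               */
        int32_t var_c0;           /* single-step prediction-error variance    */
        int32_t mu_cl;            /* multi-step predicted load                */
        int32_t var_cl;           /* multi-step prediction-error variance     */
        int32_t alpha;            /* fixed point: 32 bits, 16 fractional bits */
        int32_t queue_length;     /* current job-queue length, l[k]           */
        int32_t current_runtime;  /* budget consumed by running job, c_min[k] */
        int32_t rps_remaining;    /* reservation periods until the deadline   */
    };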

At each reservation-period boundary, the Allocator is activated and the data in

each SRT_loaddata structure is used to compute the budget allocation for the corresponding SRT task. To reduce communication overhead, the array of SRT_loaddata structures is maintained in a set of memory pages that are shared between the kernel and the Allocator. Based on the equations presented in Figure 18, the budget

allocation for each SRT task is computed as shown in Figure 23.

Budget allocations computed by the Allocator are written to memory pages that

are shared with the kernel, similar to the pages that contain the array of SRT_loaddata structures. Once the budget allocation is computed for all SRT tasks, control returns

to the kernel and the Allocator is put to sleep until the following reservation period. When control returns to the kernel, the kernel module checks the feasibility of

the budget allocations. In the event of an overload condition, budget allocations are scaled back as described in Section 3.1.1.

    Result: budget
    Data: mu_c0, var_c0, mu_cl, var_cl, current_runtime, queue_length,
          RPs_remaining

    /* Check if the computational load is zero */
    if queue_length = 0 then
        estimated_load = 0;
    else
        /* Compute the conditional mean minus the runtime */
        c0_std = sqrt(var_c0);
        c0_load = mu_c0 - current_runtime;
        if c0_load < c0_std then
            c0_load = c0_std;
        end

        /* Compute the estimated mean and std */
        total_mean = c0_load + (queue_length - 1)*mu_cl;
        total_var = var_c0 + (queue_length - 1)*var_cl;
        total_std = sqrt(total_var);

        /* Compute the estimated load */
        estimated_load = total_mean + (alpha * total_std);
    end
    budget = estimated_load / RPs_remaining;

Figure 23: Algorithm for computing budget allocations for SRT tasks.
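As a concrete illustration of Figure 23, a minimal C sketch of the Allocator-side computation is given below, including the fixed-point multiplication by alpha (16 fractional bits) mentioned above. The function name and integer widths are assumptions; the actual Allocator operates on the shared SRT_loaddata fields.

    #include <math.h>
    #include <stdint.h>

    /* Sketch of the budget computation of Figure 23. alpha is a 32-bit
     * fixed-point value with 16 fractional bits, so the product
     * alpha*std is rescaled with a 16-bit right shift. */
    static int64_t compute_budget(int64_t mu_c0, int64_t var_c0,
                                  int64_t mu_cl, int64_t var_cl,
                                  int64_t current_runtime,
                                  int32_t queue_length,
                                  int32_t rps_remaining,
                                  int32_t alpha)
    {
        int64_t estimated_load;

        if (queue_length == 0) {
            estimated_load = 0;  /* no pending jobs: allocate nothing */
        } else {
            /* conditional mean of the running job minus consumed budget */
            int64_t c0_std  = (int64_t)sqrt((double)var_c0);
            int64_t c0_load = mu_c0 - current_runtime;
            if (c0_load < c0_std)
                c0_load = c0_std;

            /* mean and std of the total remaining load */
            int64_t total_mean = c0_load + (int64_t)(queue_length - 1) * mu_cl;
            int64_t total_var  = var_c0 + (int64_t)(queue_length - 1) * var_cl;
            int64_t total_std  = (int64_t)sqrt((double)total_var);

            /* estimated load with the alpha safety margin (fixed point) */
            estimated_load = total_mean + (((int64_t)alpha * total_std) >> 16);
        }
        return estimated_load / rps_remaining;
    }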

4.5 Experiments and Results

4.5.1 Workloads and Experimental Platform

The LMS-MA Hybrid predictor and TSP budget-allocation algorithm have been tested with a range of multimedia workloads. Before presenting the results, the workloads and platform are described in greater detail.

All experiments described in this section were performed on the platform described in Table 2. The custom-compiled kernel was configured to be non-SMP and with real-time group scheduling disabled. The CPU-frequency governor was set to “userspace”

106 and the CPU frequency was set to a constant value.

Table 2: Test Platform Configuration.

    Processor          Intel Core 2 Quad
    Clock Speed        2.403 GHz
    L1 D-cache         32 kB
    L1 I-cache         32 kB
    L2 Cache           4096 kB
    Operating System   CentOS 6.4
    Kernel             Linux 3.1.1

The application implementing the SRT multimedia workloads is based on the

ffmpeg library [18]. Each job in this application consists of encoding or decoding a single frame of audio or video data. A number of steps are taken to ensure that each job consists of only computation and memory operations and, as a result, never blocks due to an I/O operation. For example, all the input data is read from multimedia

files during initialization. The data is locked in memory using the mlock function to prevent paging. Also, the decoded or encoded frames are not sent to an I/O buffer or written to a file. It is important to prevent jobs from blocking on I/O operations to stay consistent with the model of periodic real-time tasks described in Section 2.2.
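A minimal sketch of this initialization pattern is shown below: the input file is read fully into a heap buffer and pinned with mlock so that later frame accesses cannot fault to disk. The function name and error handling are illustrative, not the test application's actual code.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    /* Read the whole input file into memory and pin it so that job
     * execution never blocks on paging. */
    static void *load_and_pin(const char *path, size_t *size_out)
    {
        FILE *f = fopen(path, "rb");
        if (!f) return NULL;

        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        rewind(f);

        void *buf = malloc((size_t)size);
        if (buf && fread(buf, 1, (size_t)size, f) == (size_t)size &&
            mlock(buf, (size_t)size) == 0) {   /* pin the pages in RAM */
            *size_out = (size_t)size;
            fclose(f);
            return buf;
        }
        free(buf);
        fclose(f);
        return NULL;
    }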

The focus of the work presented here is on CPU-budget allocation and execution time prediction. For the purposes of this test application, the absence of I/O operations is acceptable. In more complete systems, I/O operations can be performed by separate hard-real-time threads using a pipelined software architecture as discussed in Section

3.1.

The multimedia workloads consist of encoding and decoding audio, speech, and video data. The speech encoding workload consists of encoding speech data into the

AMR format. The speech decoding workload consists of decoding speech data from the speex format. The audio workloads consist of encoding and decoding audio data to and from the mp3 format. Lastly, the video workloads consist of encoding and

107 decoding raw video frames to and from the H.264 format.

4.5.2 Execution-time Characteristics

For each multimedia workload, the application was run on the platform described in

Table 2, and job-execution times were measured. The mean, variance, and maximum of the job-execution times are presented in Table 3. Also, the table contains the magnitude and lag of the autocorrelation coefficient at its maximum. The autocorrelation coefficient is the autocorrelation function (ACF) normalized by the mean and variance. The heights of peaks in the ACF reflect the strength of any periodicity in the signal, and the locations of these peaks reflect the period. The ACF has a significant effect on the prediction accuracy of LMS filters, as shown in Section 4.5.4.

Table 3: Workload Execution-Time Statistics

    Workload        Mean (ns)   Std (ns)   Max (ns)   Max ACF   Max ACF Lag
    encode speech   3.35e+05    4.97e+04   5.19e+05   0.95      80
    decode speech   7.16e+04    4.10e+03   3.53e+05   0.20      2
    encode audio    1.01e+06    3.63e+05   2.36e+06   0.96      9
    decode audio    9.65e+04    2.03e+03   1.83e+05   0.27      73
    encode video    1.83e+07    5.00e+06   4.81e+07   0.52      2
    decode video    1.15e+07    3.42e+06   2.97e+07   0.85      4

Execution-time statistics were collected from other multimedia workloads as well, with different encoding formats and media files. In all workloads considered, the maximum execution time is significantly higher than the mean and, in some cases, more than double. Also, in all cases, the standard deviation of execution times is significantly smaller than the mean, often by more than an order of magnitude. The strength of the correlation between the execution times of successive jobs is affected by the encoding format and, in some cases, the content of the data as well.

108 4.5.3 Predictor Configurations

In addition to the LMS-MA Hybrid predictor, two additional predictors were im-

plemented for comparison of accuracy and overhead: a fixed-length moving average

similar to the one used in the PBS algorithm and an array of moving averages as

described in Section 4.2.1.

Every predictor implementation consists of a set of functions and state as defined

in Section 4.4.3. A key function that defines the predictor is the update function. This function accepts the most recent job-execution time as input and computes predicted

values and prediction-error statistics as outputs as defined in Figure 22.

Like the moving average used to implement the PBS algorithm, the fixed-length

moving average consists of 20 taps. Both the predictor outputs mu_c0 and mu_cl are

set to the output of the moving average. The predictor outputs, var_c0 and var_cl are set to the sample variance of the values in the moving window. For the purposes

of numerical stability, variance is computed using the algorithm described in [46].
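The text cites [46] for the variance computation; a widely used numerically stable scheme of this kind is Welford's online algorithm, and the sketch below assumes that flavor. The running mean and M2 accumulator avoid the catastrophic cancellation of the naive sum-of-squares formula.

    /* Welford-style online mean/variance update (a common numerically
     * stable method; assumed here as the flavor of the cited algorithm). */
    struct running_stats {
        long long n;   /* number of samples seen so far           */
        double mean;   /* running mean                            */
        double m2;     /* sum of squared deviations from the mean */
    };

    static void stats_update(struct running_stats *s, double x)
    {
        double delta, delta2;

        s->n += 1;
        delta = x - s->mean;
        s->mean += delta / (double)s->n;
        delta2 = x - s->mean;      /* uses the updated mean */
        s->m2 += delta * delta2;
    }

    static double stats_variance(const struct running_stats *s)
    {
        return (s->n > 1) ? s->m2 / (double)(s->n - 1) : 0.0;
    }

For a fixed-length window, the same quantities can be maintained by additionally subtracting the contribution of the sample that leaves the window.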

The array of moving averages consists of 7 filters with lengths that range from

2 to 128 in powers of two. The outputs mu_c0 and mu_cl are set to the output of the least-MSE filter plus the estimated mean of the corresponding prediction error.

The predictor outputs var_c0 and var_cl are set to the estimated variance of the prediction error of the least-MSE filter.

Lastly, the LMS-MA Hybrid predictor consists of an array of 7 moving averages similar to the ones described above and five additional adaptive filters. The five adaptive filters are based on the normalized variable-step LMS algorithm described in Section 4.2.2, and have lengths that range from 2 to 32 in powers of two. While longer adaptive filters can exploit correlation at larger lags, such filters are slower to converge and have higher overheads. Therefore, the lengths of the adaptive filters are limited to 32. As in the case of the array of moving averages, the predictor outputs mu_c0 and mu_cl are set to the output of the least-MSE filter plus the estimated mean

109 of the corresponding prediction error. The predictor outputs var_c0 and var_cl are set to the estimated variance of the prediction error of the least-MSE filter.
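As a rough illustration of the adaptive-filter component, a single normalized LMS update step is sketched below. The fixed step-size handling is a simplification of the variable-step scheme of Section 4.2.2, and all names are illustrative.

    /* One normalized LMS update: predict the next execution time from the
     * last `len` observations, then adapt the weights by the normalized
     * prediction error. A simplified stand-in for the variable-step
     * algorithm of Section 4.2.2. */
    static double nlms_step(double *w, const double *x, int len,
                            double actual, double mu)
    {
        double yhat = 0.0, energy = 1e-9;  /* small bias avoids divide-by-zero */
        double err;
        int i;

        for (i = 0; i < len; i++) {
            yhat += w[i] * x[i];           /* prediction c_hat[j]  */
            energy += x[i] * x[i];         /* input-signal energy  */
        }

        err = actual - yhat;               /* prediction error e[j] */
        for (i = 0; i < len; i++)
            w[i] += (mu * err / energy) * x[i];  /* normalized update */

        return yhat;
    }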

4.5.4 Predictor Accuracy and Overhead

The accuracy and overhead of the first-stage prediction algorithm affect the overall performance of the TSP algorithm. The execution time and normalized RMS prediction error of the predictors described above were measured for the platform and multimedia workloads described in Section 4.5.3.

In Table 4, the average overhead associated with each of the three algorithms is shown. The overhead is consistent across the different workloads. At hundreds of nanoseconds, these overheads are orders of magnitude smaller than the mean job-execution times shown in Table 3. As expected, LMS-MA Hybrid has the highest overhead of the three algorithms. On the other hand, the array of moving averages has a lower overhead than a simple fixed-length moving average. The higher overhead for the fixed-length moving average may be associated with the algorithm used to compute variance. In contrast, the EWMA used in the array of moving averages incurs much less overhead.

Table 4: Comparison of Predictor Overheads (in ns)

    Workload        MA       MA Array   LMS-MA Hybrid
    encode speech   302.32   222.68     635.35
    decode speech   302.98   230.04     650.41
    encode audio    301.67   222.91     629.31
    decode audio    303.14   226.18     646.05
    encode video    304.83   229.01     649.32
    decode video    300.12   225.00     635.85

In Table 5, the RMS error is shown for each of the prediction algorithms and

workloads. The RMS error values are normalized by the standard deviation of job-execution times of the corresponding workloads. The standard deviation of job-execution times is in theory the lowest RMS error possible using a static predicted value. Based on results shown in the table and results from additional workloads, the

Table 5: Comparison of Normalized RMS Prediction Error

    Workload        MA     MA Array   LMS-MA Hybrid
    encode speech   1.04   1.03       0.74
    decode speech   1.03   1.15       1.11
    encode audio    1.01   1.01       0.64
    decode audio    0.86   1.27       0.83
    encode video    0.96   0.97       1.62
    decode video    0.66   0.62       0.45

LMS-MA Hybrid algorithm has the least RMS prediction error for most multimedia workloads where there is a strong correlation in the execution times of successive jobs. The exception is webm video encoding and mpeg4 video encoding including the video-encoding workload referenced in Table 5. The reason for the high RMS errors with these workloads is not clear. The LMS-MA Hybrid algorithm also works well with certain workloads with weak correlation in execution times. However, for most workloads with weak correlation, the moving-average-based algorithms perform better.

4.5.5 Performance of the TSP Algorithm

The implementation of the TSP algorithm was tested with the multimedia workloads and the three first-stage predictors described above. For each workload and predictor, experiments were performed with different values of alpha. The values of alpha were chosen such that the deadline-miss rate ranged between 2% and 20%. Equation 66 for computing alpha is based on Cantelli's inequality, which is conservative. Therefore, the alpha parameter had to be tuned such that the resulting deadline-miss rate was

111 in the desired range.

For each workload, prediction algorithm, and value of alpha, the experiment was

repeated five times. For each repetition of an experiment, the performance was mea-

sured over the duration of the workload in terms of deadline-miss rate, normalized

budget allocation, and normalized RMS VFT error.

Normalized budget allocation is the average per-job budget allocation divided by the mean job-execution time. The mean job-execution time is equivalent to the mean per-job CPU-budget consumption. The normalizing factor, the mean job-execution time, does not change with the prediction algorithm beyond differences in the prediction overhead, and the prediction overhead is negligible compared to the mean job-execution time. Normalized budget allocation represents the factor of budget over-allocation. The ideal value of normalized budget allocation is one, and for reasons of queue stability, the normalized budget allocation is usually more than one.

Generally, the average budget allocation can be reduced at the cost of an increase in the deadline-miss rate by reducing the value of alpha. It is desirable to have both a low average budget allocation and a low deadline-miss rate.

The RMS VFT error is the root-mean-square difference between virtual finishing times and job deadlines as defined in Section 3.1.2. The VFT error is large and positive when a deadline is missed by a large lag, and large and negative when the

CPU budget is over-allocated by a large amount. The RMS VFT error is large when either of these cases occurs frequently. Therefore, it is desirable to have a low RMS

VFT error. The RMS VFT error is normalized by the task period.

In the remainder of this section, results are presented for each workload in two scatter diagrams of the deadline-miss rate (x-axis) against normalized budget allocation and normalized RMS VFT error, respectively. Since it is desirable to reduce all three of these metrics, points that are closer to the axes represent better performance.

As expected, the overall performance of the TSP algorithm generally depends on the

112 accuracy of the first-stage prediction algorithm. For a given deadline-miss rate, lower

RMS prediction error usually results in lower budget allocation and lower RMS VFT error.

Results for the speech decoding workload are shown in Figure 24. From Table 5, the normalized RMS prediction error associated with the MA, MA Array, and LMS-

MA Hybrid predictors are 1.03, 1.15, and 1.11, respectively. Accordingly, when MA is used as the first-stage predictor, the RMS VFT error and the normalized budget allocation is the lowest for any given miss rate. When LMS-MA Hybrid is used, the performance is worse than when MA is used, but better than when MA Array is used.

Figure 24: Performance of the TSP algorithm with a speech-decoding workload.

Results corresponding to the speech encoding workload are shown in Figure 25.

For this workload, the normalized RMS prediction error of the MA, MA Array, and

LMS-MA Hybrid predictors are 1.04, 1.03, and 0.74, respectively. The RMS VFT error values are consistent with the RMS values of the prediction error. Specifically, the LMS-MA Hybrid algorithm has the least RMS prediction error and also has the least RMS VFT error for any given deadline-miss rate. However, for the MA and MA

Array algorithms, the normalized budget allocation decreases after peaking at about

113 1.33 even as the value of alpha increases and the deadline-miss rate decreases.

Figure 25: Performance of the TSP algorithm with a speech-encoding workload.

This behavior pertains to the way TSP works in the case of budget over-allocation. When the normalized budget allocation is equal to 4/3, the average budget allocation is 4/3 times the average job-execution time. In this case, the length of the task period is four times the length of the reservation period and therefore, jobs complete by the end of the third reservation period. Furthermore, budget allocation is adapted at each reservation-period boundary. When it is detected at the start of the fourth reservation period that the job completed early, no budget is allocated. As a result, the average budget allocation over the four reservation periods roughly equals the average job-execution time. As a metric of performance, there is no penalty in average budget allocation for early job completions in the way that there is in RMS

VFT error.

Results corresponding to the audio-decoding workload and video-decoding workload are shown in Figure 26 and Figure 27, respectively. As in the case of the speech workloads, the performance of the TSP algorithm is closely related to the RMS prediction error of the first-stage predictor.

Figure 26: Performance of the TSP algorithm with an audio-decoding workload.

Figure 27: Performance of the TSP algorithm with a video-decoding workload.

Results corresponding to the audio-encoding workload and video-encoding workload are shown in Figure 28 and Figure 29, respectively. Unlike the case of the previous workloads, RMS VFT error and normalized budget allocation are not consistent with the RMS error of the first-stage predictor. Specifically, the LMS-MA Hybrid algorithm has the lowest RMS prediction error for the video-encoding workload but incurs the largest RMS VFT error for a given miss rate. On the other hand, the LMS-MA Hybrid algorithm has the highest RMS prediction error for the audio-encoding workload, but performs similarly to the MA and MA Array algorithms. The reason for the inconsistency with these two workloads is not clear.

Figure 28: Performance of the TSP algorithm with an audio-encoding workload.

Figure 29: Performance of the TSP algorithm with a video-encoding workload.

4.6 Conclusion

Overall, the TSP algorithm improves asynchronous budget allocation in a number of ways. The second stage of the TSP algorithm guarantees a bound on the probability of a job missing the deadline. This is a stronger guarantee than the bounded-queue-length guarantee provided by the algorithm in Chapter 3. Furthermore, this guarantee is provided without the i.i.d. assumption made in Chapter 3.

There is a strong correlation in job-execution times for some multimedia workloads. Instead of assuming that job-execution times are i.i.d., the TSP algorithm allows correlation in job-execution times to be exploited to make better predictions. In cases where execution times are correlated, the LMS-MA hybrid algorithm is a more accurate predictor of execution times than a simple moving average. With more accurate predictions of execution times from the first-stage predictor, the overall performance of the TSP algorithm is better in most cases.

The TSP algorithm is flexible in the choice of the first-stage predictor. The software architecture presented earlier in the chapter allows for the use of an application-specific first-stage predictor. If such a predictor offers more accurate predictions of job-execution times than the moving average or LMS-MA hybrid algorithm, then the performance of the TSP algorithm may be improved even further.

CHAPTER V

VIC-BASED BUDGET ALLOCATION

Managing the power consumption of a platform involves configuring the platform to the appropriate mode of operation. Different modes of operation, such as CPU frequency settings, offer different levels of trade-off between performance and power dissipation.

Therefore, power-management decisions in real-time systems should be based on

CPU-budget allocations. Specifically, the platform should not be configured to such a low speed that CPU-budget commitments to real-time tasks can no longer be satisfied.

At the same time, the platform should not be configured to a speed that is greater than what is necessary to satisfy budget commitments if the platform requires additional energy to run at that higher speed.

On the other hand, changes in the platform performance result in changes in execution times. So, when the mode of operation is changed for the purposes of power management, CPU-time allocations must be modified to handle the resulting changes in job-execution times. For example, if the CPU speed is reduced,

CPU time allocations must be increased to accommodate longer job-execution times.

Power-management decisions and CPU-time allocations are interdependent. These decisions can be made simpler if allocation decisions can be made independent of power-management decisions.

While CPU-budget mechanisms presented in most previous articles are based on the allocation of CPU time, the CPU-budget concept can be generalized to other measures of computation as well. For example, CPU-budget allocations in the GRACE-OS system are specified in terms of CPU cycles [72]. If the measure of computation used for budget allocation is invariant under change in the mode of operation, then the mode of operation is no longer relevant for the purposes of budget allocation and budget-allocation decisions become independent of power-management decisions.

A measure of computation is said to be invariant under change in the mode of operation if, based on this measure, the amount of computation needed to perform a job is the same regardless of the mode of operation. For example, for CPU-bound workloads, the number of CPU cycles required to perform a job is the same regardless of CPU frequency. For both CPU-bound and memory-bound workloads, the number of user-level instructions that must be retired to perform a job is the same regardless of the CPU frequency. Along the same lines, a novel measure of computation is presented in this chapter, called Virtual Instruction Count (VIC). VIC is an approximation of retired-instruction count based on dynamic estimates of the average instruction-retirement rate. However, unlike retired-instruction count, VIC increases at a known rate over time, a property that is needed for budget allocation.

A VIC-based budget mechanism has been implemented in Linux by extending the TSP implementation described in Section 4.4. This implementation has been tested with synthetic and video-decoding workloads. Similar tests were performed on the time-based budget mechanism described in Chapter 4. Results from these experiments show that VIC-based budget allocation allows for a faster response to changes in the CPU frequency than time-based-budget allocation.

The remainder of this chapter is organized as follows. In Section 5.1, experimental results are presented to motivate the need for a novel measure of computation for budget allocation. Specifically, experimental results are presented to demonstrate that

CPU time, CPU cycles, and retired-instruction count are not appropriate measures of computation for CPU-budget allocation in real-time systems. Then, in Section 5.2, a more precise definition of VIC is presented, VIC-based budget allocation is described in greater detail, and the adaptive performance model is presented. Examples are presented to illustrate VIC and VIC-based budget allocation. In Section 5.3, implementation

details are presented for a VIC-based budget mechanism in Linux. Finally, in Section

5.4, experimental results are presented to demonstrate the invariance property of VIC,

and to demonstrate the advantages of VIC-based budget allocation over time-based

budget allocation.

5.1 The Need for an Alternative Measure of Computation

On real-time systems, the actual resource that is allocated to tasks is time. Therefore,

when some amount of CPU capacity is committed to the hosted task set over a

reservation period, it should be possible to guarantee that the committed capacity is

delivered within the reservation period. If this guarantee cannot be provided, either

the actual allocated capacity may be less than the committed capacity, or timing

constraints may be violated.

For example, consider the case where the instruction-retirement rate is mispredicted, $10^6$ instructions are committed to a task set over a 10 ms reservation period, and the actual average instruction-retirement rate is $9 \times 10^7$ instructions per second. The actual number of instructions retired over the reservation period is $9 \times 10^5$. In this case, either the task set will be allocated only $9 \times 10^5$ instructions of the $10^6$ committed instructions, or the task set must overrun into the following reservation period by 1.1 ms to allow the full $10^6$ instructions to retire. On real-time systems,

neither of these possibilities is acceptable. Therefore, the measure of computation

used for budget allocation on a real-time system should increase over the reservation

period at a rate that is known at the start of the reservation period.

In addition, as mentioned at the beginning of this chapter, it is desirable to use a

measure of computation for CPU-budget allocation that is invariant under change in

the mode of operation. Specifically, it is desirable to use a measure of computation for which the CPU budget required to complete a job remains the same regardless of the mode of operation. This property of invariance can help to reduce or eliminate

transient behavior in the adapted budget allocation that results from changes in the

mode of operation. While this invariance property is desirable, it is not a strict

requirement for soft-real-time systems.

In the remainder of this section, experimental results are presented to demon-

strate that certain measures of computation are not appropriate for the purposes of

CPU-budget allocation and to motivate the need for a novel measure of computation.

Specifically, results are presented to demonstrate that the number of CPU-clock cycles

required to execute a job is not always invariant under change in the CPU frequency.

Also, results are presented to show that job-execution times are not necessarily in-

versely proportional to the CPU frequency.

Further results are presented to show that the time required to retire a specific

number of instructions can vary widely across workloads, and that the instruction-

retirement rate for the same workload can vary with time. Because of this uncertainty

in instruction-retirement rate, retired-instruction count is not an appropriate measure

of computation for budget allocation.

All experiments described in this chapter were performed on the platform de-

scribed in Table 6. A custom-compiled kernel was used that was configured to be

non-SMP, and real-time group scheduling was disabled. The CPU frequency was con-

trolled through the cpufreq subsystem [40]. The CPU governor was set to userspace, and the CPU frequency was set as needed.
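For reference, the userspace governor can be selected and a fixed frequency requested through the standard cpufreq sysfs interface; a hedged sketch is shown below. The frequency value and function name are illustrative.

    #include <stdio.h>

    /* Pin CPU0 to a fixed frequency via the cpufreq sysfs interface.
     * Requires root. The frequency string (in kHz) is illustrative. */
    static int set_fixed_frequency(const char *khz)
    {
        FILE *f;

        f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor", "w");
        if (!f) return -1;
        fprintf(f, "userspace\n");
        fclose(f);

        f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed", "w");
        if (!f) return -1;
        fprintf(f, "%s\n", khz);   /* e.g., "1600000" for 1.6 GHz */
        fclose(f);
        return 0;
    }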

5.1.1 A Synthetic Workload to Control Memory Boundedness

The number of cycles required to execute a job can change due to off-chip components of execution time. Examples of off-chip components of execution time include the time spent waiting for the completion of I/O operations or more commonly, for the completion of memory operations. Because this time is unaffected by the CPU-clock

speed, more cycles are wasted waiting for the completion of the same operations when the CPU frequency is higher.

Table 6: Test Platform Configuration.

    Processor          Intel Core i7-3610QM
    L1 D-cache         32 kB
    L1 I-cache         32 kB
    L2 Cache           256 kB
    L3 Cache           6144 kB
    Operating System   Ubuntu 12.04
    Kernel             Linux 3.2.0

To demonstrate the effects of memory boundedness, a synthetic workload has been created called membound. This workload is based on the traversal of a circular linked list. Each job in this workload consists of a fixed number of steps through the linked list and the number of steps is used to control the job-execution time.

By nature of linked lists, the memory address of the next node is not known until the previous node is accessed. When the list is traversed, this dependence between adjacent memory accesses prevents these memory operations from being parallelized by the hardware. Furthermore, adjacent nodes in the list are arranged randomly over a memory region of fixed size. This random arrangement is meant to eliminate any stride patterns in the memory accesses that could be exploited by a hardware prefetcher. These steps are taken to ensure that the workload spends a greater portion of the execution time waiting for memory operations to complete.

Each node in the linked list is padded and aligned to cache-line size. The number of nodes in the list is a configurable parameter used to control the working-set size of the workload. The working-set size is used to control how frequently the different cache levels are missed. When the working-set size is significantly larger than the last-level cache, there are frequent cache misses and the workload is memory bound.

When the working-set size is sufficiently small, there are no cache misses after the

cache is warmed and the workload is chip-bound.
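A minimal sketch of the membound kernel, assuming the structure described above, is given below; the names, the Fisher-Yates shuffle, and the parameter handling are illustrative rather than the thesis's exact code.

    #include <stdlib.h>

    #define CACHE_LINE 64

    /* One list node, padded and aligned to a cache line. */
    struct node {
        struct node *next;
        char pad[CACHE_LINE - sizeof(struct node *)];
    } __attribute__((aligned(CACHE_LINE)));

    /* Link n nodes into a circular list in a random order, so adjacent
     * hops touch randomly placed cache lines (defeating the prefetcher). */
    static struct node *build_ring(struct node *nodes, size_t n)
    {
        size_t i, j, tmp;
        struct node *head;
        size_t *perm = malloc(n * sizeof(*perm));

        for (i = 0; i < n; i++) perm[i] = i;
        for (i = n - 1; i > 0; i--) {          /* Fisher-Yates shuffle */
            j = (size_t)rand() % (i + 1);
            tmp = perm[i]; perm[i] = perm[j]; perm[j] = tmp;
        }
        for (i = 0; i < n; i++)
            nodes[perm[i]].next = &nodes[perm[(i + 1) % n]];

        head = &nodes[perm[0]];
        free(perm);
        return head;
    }

    /* One job: take `steps` dependent hops through the ring. The data
     * dependence between hops defeats memory-level parallelism. */
    static struct node *run_job(struct node *cur, long steps)
    {
        while (steps-- > 0)
            cur = cur->next;
        return cur;   /* returned so the loop is not optimized away */
    }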

5.1.2 Invariance Under Change in CPU Frequency

To test for invariance under change in CPU frequency, experiments were performed

on a number of different workloads, measuring the average per-job CPU usage based on different measures of computation at different CPU frequencies. The work-

loads used in this experiment include two different configurations of the membound

workload described in Section 5.1.1 and an H.264 video-decoding workload.

In the first configuration of membound, called cache line, the linked list consists

of a single node linked to itself, and the workload consists of repeatedly accessing

the same cache line. In the second configuration of membound, called thrash, the

linked list consists of enough nodes to span a memory region that is eight times the

size of the last-level cache. In this latter configuration, the workload misses the last-

level cache on every step through the linked list. In both of these configurations of

membound, the number of steps through the list is kept the same for all jobs and the

instructions executed are the same; only the data is different. The first configuration

is designed to make the workload chip-bound, whereas the second configuration is

designed to make the workload memory-bound. For the video-decoding workload,

each job consists of decoding a single video frame from an H.264 encoded video file.

For each workload, the experiment was performed at different CPU frequen-

cies. For each workload and frequency, CPU usage for each job was measured sepa-

rately with different measures of computation: time, clock cycles, retired-instruction

count (RIC), and user-level retired-instruction count (URIC). Other than time, mea-

surements for all other measures were taken using hardware performance counters

(PMCs). The PMCs were accessed through the perf interface in the Linux kernel. URIC and RIC differ in that URIC excludes instructions executed in kernel space

handling interrupts and exceptions, whereas RIC includes all retired instructions.
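As a sketch of how such a counter can be read, the standard perf_event_open(2) interface allows a process to count its own user-level retired instructions; the helper below is illustrative and omits most error handling.

    #include <linux/perf_event.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Open a counter for user-level retired instructions (URIC) on the
     * calling thread. Returns a file descriptor, or -1 on error. */
    static int open_uric_counter(void)
    {
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_INSTRUCTIONS;
        attr.exclude_kernel = 1;   /* count user-level instructions only */
        attr.exclude_hv = 1;

        /* pid = 0 (self), cpu = -1 (any), group_fd = -1, flags = 0 */
        return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    }

Reading the file descriptor with read() before and after a job yields the job's URIC as the difference; dropping exclude_kernel gives RIC, and PERF_COUNT_HW_CPU_CYCLES gives the cycle count instead.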

123 For each workload, clock frequency, and measure of computation, the experiment was repeated five times. For each repetition, the measured CPU usage was averaged across all jobs. Results corresponding to the membound “cache line” workload, the membound “thrash” workload, and the H.264 video-decoding workload are shown in

Figures 30, 31, and 32, respectively. In each figure, a separate plot is shown for each of the measures of computation. The horizontal axis corresponds to CPU frequency, the vertical axis corresponds to the average per-job CPU usage, and each marker corresponds to a single run. The thick black lines are best-fit curves, whereas the thin gray lines are curves expected from a chip-bound workload.

5.1.2.1 The membound “cache line” workload

The “cache line” configuration of the membound workload corresponds to an ideal chip-bound workload. As shown in Figure 30a, the average number of user-level instructions required to complete a job remains roughly the same across all CPU frequencies. This behavior is expected since a job is defined by the user-level instructions needed to complete the job. Also, as shown in Figure 30b, the average job-execution time decreases with increasing CPU frequency. As expected, the execution time is inversely proportional to CPU frequency.

However, the average per-job RIC decreases with increasing CPU frequency as shown in Figure 30c. The difference between RIC and URIC is the kernel-level RIC.

When the CPU frequency is faster, the average execution time is smaller. As a result, fewer interrupts are handled during the execution of a job, and the RIC is smaller.

With fewer instructions to retire, the average number of CPU cycles required to complete the jobs is smaller as well. As shown in Figure 30d, the average number of

CPU cycles required to complete a job decreases with CPU frequency. This decrease in the number of instructions and cycles is small compared to the total number of instructions and cycles.

Figure 30: Average per-job CPU usage for the membound workload in the “cache line” configuration at different clock frequencies. Panels: (a) URIC, (b) time (ns), (c) RIC, (d) clock cycles.

5.1.2.2 The membound “thrash” workload

Results for the “thrash” configuration of the membound workload are shown in Figure 31. The average number of user-level instructions required to complete a job is roughly the same across all CPU frequencies. The average job-execution time decreases with increasing CPU frequency. The retired-instruction count decreases with

CPU frequency.

However, as shown in Figure 31d, the average number of CPU cycles required to complete a job increases significantly with CPU frequency. Since this workload is memory bound, a significant component of the execution time is spent waiting for off-chip memory operations to complete. This component of execution time does not decrease with an increase in CPU frequency. As a result, when the frequency is higher, more clock cycles are wasted waiting for the same operation to complete.

Figure 31: Average per-job CPU usage for the membound workload in the “thrash” configuration at different clock frequencies. Panels: (a) URIC, (b) time (ns), (c) RIC, (d) clock cycles.

An effect of the off-chip component of execution time can be seen in Figure 31b as well. The thin gray line represents the expected job-execution time at each CPU-clock frequency assuming that execution times are inversely proportional to frequency.

However, the thick black line, representing the best-fit line, shows that actual execution times are higher than expected at higher frequencies. The execution times are higher because the off-chip component of execution time does not decrease with CPU frequency.

5.1.2.3 The Video-Decoding Workload

For the video-decoding workload, the relationship between average CPU usage and

CPU frequency is the same as for the “thrash” configuration of the membound workload. The average number of user-level instructions required to complete a job still remains more or less the same across all CPU frequencies, as shown in Figure 32a.

The average job-execution time decreases with increasing CPU frequency as shown in Figure 32b. The retired-instruction count decreases with CPU frequency as shown

in Figure 32c. Lastly, the average number of CPU cycles required to complete a job increases with CPU frequency as shown in Figure 32d. However, actual job-execution times, as shown in Figure 32b, are not visibly larger than ideal job-execution times as in the membound “thrash” workload. These results suggest that the video-decoding workload is memory bound but not as severely as the membound “thrash” workload.

Figure 32: Average per-job CPU usage for the H.264 video-decoding workload at different clock frequencies. Panels: (a) URIC, (b) job-execution time (ns), (c) RIC, (d) clock cycles.

5.1.3 Variability in Instruction-Retirement Rate

The measure of computation used for CPU-budget allocation should ideally be invariant under changes in the mode of operation. Based on the results in Section 5.1.2, user-level retired-instruction count exhibits this property better than any of the other measures of computation. However, the measure of computation used for budget allocation on a real-time system must also be deterministic. Specifically, the measure of computation should increase over a reservation period at a rate that is known at the start of the reservation period. In the case of user-level retired-instruction count, the rate of increase is the retirement rate. In this section, experimental results are presented to show that the retirement rate can vary widely across workloads and even within the same workload. Therefore, the retirement rate is difficult to predict accurately ahead of time.

5.1.3.1 Variability Across Workloads

To produce a range of instruction-retirement rates, the membound workload was run in three different configurations in addition to the ones described in Section 5.1.2. Specifically, the number of nodes in the linked list was configured such that the size of the list equals the size of the L1-data cache, the L2 cache, and the L3 cache, respectively. The sizes of the linked list in the different configurations are shown in Table 7. By controlling the working-set size of the workload, it is possible to control how frequently the different cache levels are missed, and thereby to vary the instruction-retirement rate.

Table 7: The size of the linked list in the different configurations of the membound workload.

Configuration Name    Number of Nodes    Size in Memory
cache line            1                  64B
L1-cache              512                32kB
L2-cache              4k                 256kB
L3-cache              96k                6MB
thrash                768k               48MB
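For concreteness, the core of such a workload can be sketched as a pointer-chasing loop in C. This is a hypothetical sketch rather than the actual membound source: each node is padded to one 64-byte cache line, so the node counts in Table 7 directly set the working-set size.

    /* Each node occupies exactly one 64-byte cache line. */
    struct node {
        struct node *next;
        char pad[64 - sizeof(struct node *)];
    };

    /* One job: walk the list for a fixed number of hops.  The list size
     * relative to each cache level determines the miss rate and therefore
     * the instruction-retirement rate. */
    void run_job(struct node *head, long steps)
    {
        struct node *p = head;
        while (steps--)
            p = p->next;
        __asm__ volatile("" : : "r"(p));  /* keep the walk from being optimized away */
    }

Building the list from a randomly shuffled array of such nodes would additionally defeat hardware prefetching, which is presumably part of why the “thrash” configuration is an order of magnitude slower per job.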

For each configuration, the user-level instruction count and execution time were measured per job and averaged across all jobs. The average instruction counts and execution times for each workload are shown in Figures 33a and 33b, respectively. For each job in each configuration, the number of steps taken through the linked list was kept the same, and so the average instruction count is the same across all configurations. However, even though the instruction count is the same, the average job-execution times for the “L1-cache” configuration and the “thrash” configuration differ by an order of magnitude. The amount of time required to retire a specific number of instructions can differ by an order of magnitude across workloads.

Figure 33: The average per-job URIC (top) and job-execution time (bottom) for different configurations of the membound workload described in Table 7.

5.1.3.2 Variability Within a Workload

Further experiments were performed to study how the instruction-retirement rate can vary over time within the same workload. The video-decoding workload described in Section 5.1.2 was run with the CPU clock set to a constant frequency. The perf interface was used to configure hardware performance counters to count user-level instructions and clock cycles. At the start of the workload, a high-resolution timer was set up to fire every 2.5ms. When the timer fired, the relevant performance counter was sampled and logged with a time stamp. Once the workload was complete, the time stamps and counter values were used to compute the average rate of increase in clock cycles and user-level instructions over each 2.5ms interval. The plots of average clock frequency against time and retirement rate against time are shown in Figures 34a and 34b, respectively. The clock frequency remains mostly constant with time as expected. The small variability in clock frequency can be attributed to noise in the time stamps.

Figure 34: Average CPU frequency (top) and average user-level instruction-retirement rate (bottom) over time for a video-decoding workload.

In comparison, there is significant variability in the retirement rate. This variability is likely due to phase behavior in applications; most applications go through different phases of memory boundedness and CPU boundedness. Phase identification and prediction for the purposes of power management have been explored in previous work [42]. However, phase-based power management is beyond the scope of this dissertation.

5.2 Virtual Instruction Count

Retired-instruction count, while invariant under change in the mode of operation, is not an appropriate measure of computation for CPU-budget allocation because the instruction-retirement rate is not predictable. An alternative to directly allocating CPU budget in terms of retired-instruction count is to allocate CPU budget in terms of Virtual Instruction Count (VIC). VIC is an approximation of RIC based on dynamic estimates of the retirement rate. However, the rate of increase in VIC is known ahead of time over finite intervals of time, which makes reliable budget allocation in terms of VIC possible. Also, because VIC approximates RIC, and RIC is invariant under change in the mode of operation, VIC is also invariant to a degree that depends on the accuracy of the approximation.

5.2.1 Computing VIC

Let the platform modes of operation be indexed 1, 2, ..., M in order of increasing performance, such that M denotes the index of the highest-performance mode. The index of the active mode of operation at time t is a continuous-time discrete signal, denoted m(t). A performance-model coefficient, µ̂_m, is associated with each mode of operation and represents an estimate of the average instruction-retirement rate in that mode. These estimates are updated at each reservation-period boundary and remain constant over the reservation period.

The performance-model coefficient for the m-th mode of operation in the k-th reservation period is denoted µ̂_m[k]. The expression ⌊t/T_R⌋ corresponds to the index of the reservation period at time t, where T_R denotes the length of a reservation period. Since the index of the active mode of operation at time t is m(t), the estimated average instruction-retirement rate at time t is µ̂_{m(t)}[⌊t/T_R⌋]. Based on this estimate, the VIC is defined as follows:

    VIC(t) = ∫₀ᵗ µ̂_{m(τ)}[⌊τ/T_R⌋] dτ.    (85)

The estimated retirement rate, µ̂_{m(t)}[⌊t/T_R⌋], can only change at two types of events: at reservation-period boundaries and at transitions in the mode of operation. Between these events, the estimate remains constant. Therefore, VIC(t) is a piecewise-linear function of time. If a time t₀ is chosen such that there are no mode transitions or reservation-period boundaries between t₀ and t, VIC(t) can be expressed recursively as follows:

    VIC(t) = VIC(t₀) + (t − t₀) · µ̂_{m(t₀)}[⌊t₀/T_R⌋].    (86)

For the purposes of implementation, it is more convenient to work with this recursive definition of VIC.

5.2.2 VIC Source

The implementation of a VIC-based budget mechanism requires a monotonically increasing VIC source similar to a clock source. An implementation of a VIC source based on (86) must maintain some state. This state consists of the last time stamp, t₀, the value of VIC(t₀), and the estimated retirement rate at that time, µ̂_{m(t₀)}[⌊t₀/T_R⌋]. During initialization, t₀ should be set to the current time, VIC(t₀) should be set to zero, and µ̂_{m(t₀)}[⌊t₀/T_R⌋] can be set based on an initial estimate of the retirement rate for the active mode of operation. Then, this state must be updated immediately after every mode transition and reservation-period boundary. The new value of VIC(t₀) can be computed using the definition in (86). The time stamp can be set to the current time, and retirement-rate estimates can be set to current values.
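The state and its two operations are small enough to sketch directly. The following is a minimal user-space sketch of a VIC source based on (86); all names are hypothetical, and the in-kernel implementation described in Section 5.3.4 additionally uses fixed-point arithmetic and appropriate locking.

    #include <stdint.h>

    struct vic_source {
        uint64_t t0_ns;   /* time stamp t0 of the last state update      */
        double   vic_t0;  /* VIC(t0), the value at the last state update */
        double   rate;    /* current retirement-rate estimate (inst/ns)  */
    };

    /* Read VIC(t) between updates: VIC(t) = VIC(t0) + (t - t0)*rate, per (86). */
    static double vic_read(const struct vic_source *s, uint64_t now_ns)
    {
        return s->vic_t0 + (double)(now_ns - s->t0_ns) * s->rate;
    }

    /* Fold the elapsed interval into the state and switch to a new rate.
     * Called immediately after every mode transition and at every
     * reservation-period boundary. */
    static void vic_update(struct vic_source *s, uint64_t now_ns, double new_rate)
    {
        s->vic_t0 = vic_read(s, now_ns);
        s->t0_ns  = now_ns;
        s->rate   = new_rate;
    }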

Example 1. Consider a system with four different modes of operation. Let the corresponding performance-model coefficients be 1 × 10⁶, 2 × 10⁶, 4 × 10⁶, and 8 × 10⁶, respectively. Let the time spent in each mode be 0.3, 0.2, 0.2, and 0.3, respectively. These times and the performance-model coefficients are scaled by the length of the reservation period, T_R. Let t equal zero at the start of the reservation period. As mentioned above, the implementation of a VIC source based on (86) requires maintaining some state. The successive updates to that state over the reservation period are shown in Table 8.

Table 8: State updates for a VIC source over a reservation period for Example 1.

t     t₀    Next t₀   µ̂_{m(t₀)}[k]   Next µ̂_{m(t₀)}[k]   VIC(t₀)     Next VIC(t₀)
0     -     0         -              1 × 10⁶             -           0
0.3   0     0.3       1 × 10⁶        2 × 10⁶             0           0 + 0.3 × (1 × 10⁶) = 3 × 10⁵
0.5   0.3   0.5       2 × 10⁶        4 × 10⁶             3 × 10⁵     3 × 10⁵ + 0.2 × (2 × 10⁶) = 7 × 10⁵
0.7   0.5   0.7       4 × 10⁶        8 × 10⁶             7 × 10⁵     7 × 10⁵ + 0.2 × (4 × 10⁶) = 1.5 × 10⁶
1.0   0.7   1.0       8 × 10⁶        1 × 10⁶             1.5 × 10⁶   1.5 × 10⁶ + 0.3 × (8 × 10⁶) = 3.9 × 10⁶

The piecewise-linear increase in VIC(t) over time is shown in Figure 35. The points where mode transitions occur are marked with dashed lines in the figure.

Figure 35: Plot of VIC(t) against time over a single reservation period for Example 1. Mode transitions are marked with dashed lines.

5.2.3 VIC-based Budget

A VIC-based budget mechanism can be implemented in the same way as a time-based budget mechanism. A budget mechanism consists of two main components: a mechanism for measuring how much budget has been consumed by a task and a mechanism to prevent a task from using more than the allocated budget. The amount of budget consumed by a task can be measured as the amount by which the VIC source increases while the task runs. Accordingly, when some budget, Q[k], is allocated to a task, the task should be put to sleep when the VIC source increases by the amount Q[k] during the execution of the task.

Example 2. Consider the system described in Example 1. Consider two tasks scheduled on such a system: Task 1 and Task 2. Let the budget allocated to Task 1 and Task 2 be 1.1 × 10⁶ and 2 × 10⁶, respectively. Assume that the budget needed for Task 2 to complete all pending computation is 1.2 × 10⁶. Also, assume that the budget allocated to Task 1 is insufficient for Task 1 to complete. Figure 36 shows a possible schedule for the two tasks over the reservation period.

Figure 36: Plot of VIC(t) against time over a single reservation period for Example 2. Scheduling events are marked with circles on the VIC(t) line.

With VIC-based budget, the throttling time of tasks is based on VIC(t). At time t = 0, Task 1 is scheduled to run, and VIC(0) = 0. At time t = 0.6T, VIC(t) is 1.1 × 10⁶, and Task 1 is still running. Since the budget for Task 1 is depleted, Task 1 is put to sleep and Task 2 is scheduled to run. At time t = 0.8T, VIC(t) is 2.3 × 10⁶. At this point, the budget consumed by Task 2 is (VIC(0.8T) − VIC(0.6T)) = 1.2 × 10⁶. Since enough budget has been consumed that all pending computation can be completed, Task 2 is put to sleep until the following task period when a new job will be released. Had Task 2 continued to run, Task 2 would have been put to sleep at t = 0.9T, where VIC(t) = 3.1 × 10⁶.

5.2.4 Estimating Average Instruction-Retirement Rates

The next point of interest is how to update the estimates of the average instruction-retirement rate for each mode of operation. Let τ₁[k], ..., τ_M[k] denote the amount of time the platform is configured to modes 1, 2, ..., M, respectively, in the k-th reservation period. Let µ_m[k] denote the average instruction-retirement rate of the platform when configured to the m-th mode. The number of instructions retired in the k-th reservation period, I[k], can be expressed as follows:

    I[k] = Σ_{m=1}^{M} µ_m[k] τ_m[k].    (87)

Equation 87 represents the performance model.

The performance-model coefficient, µ̂_m[k], is an estimate of µ_m[k]. As seen in Section 5.1.3, the instruction-retirement rate for the same workload in the same mode of operation can change with time. To adapt to the changing instruction-retirement rates, the normalized variable-step LMS algorithm is used as described in Section 4.2.2.

Specifically, I[k] is measured using hardware performance counters at the end of each reservation period. The time spent in each mode of operation during the previous reservation period is measured as well. The time spent in each mode of operation and the performance-model coefficients are used to compute an estimated value of RIC, Î[k]:

    Î[k] = Σ_{m=1}^{M} µ̂_m[k] τ_m[k].    (88)

This estimate is equal to the change in VIC(t) over the duration of the reservation period; Î[k] = VIC(T_R k) − VIC(T_R (k − 1)). The error in the estimate, e[k] = I[k] − Î[k], is used to adjust the performance-model coefficients as follows:

    µ̂_m[k + 1] = µ̂_m[k] + β_m[k] · ( −τ_m[k] e[k] / Σ_{i=1}^{M} τ_i²[k] ).    (89)

Finally, the step sizes, β_m[k], are adjusted in the same way as described in Section 4.2.2, except that ∇_m(E(e²[k])) is approximated as follows:

    ∇_m(E(e²[k])) ≈ −2 e[k] τ_m[k].    (90)
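As a minimal sketch, assuming hypothetical names and omitting the variable-step-size adaptation of Section 4.2.2, the per-period update can be written as a plain normalized gradient-descent step: compute Î[k] per (88), then move each coefficient opposite the gradient approximation in (90).

    #include <stdint.h>

    void update_perf_model(int M, double mu_hat[], const double beta[],
                           const uint64_t tau[], double I_measured)
    {
        double I_est = 0.0, norm = 0.0;

        for (int m = 0; m < M; m++) {
            I_est += mu_hat[m] * (double)tau[m];          /* (88) */
            norm  += (double)tau[m] * (double)tau[m];     /* normalization term */
        }

        double e = I_measured - I_est;                    /* e[k] = I[k] - Î[k] */

        for (int m = 0; m < M; m++) {
            double grad = -2.0 * e * (double)tau[m];      /* (90) */
            mu_hat[m] -= beta[m] * grad / (2.0 * norm);   /* step opposite the gradient */
        }
    }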

An alternative approach for updating µ̂_m[k] is to sample the performance counter at each mode transition and compute an exponentially-weighted moving average of the instruction-retirement rate for each mode of operation over time. However, depending on how often mode transitions take place, this approach can introduce a significant level of measurement overhead. In contrast, the LMS-filter approach described above incurs a fixed amount of overhead in every reservation period.

5.2.5 Handling Overload Conditions

To ensure the feasibility of a set of budget allocations, the sum of the budget allocations across all hosted tasks must be less than an upper limit related to the utilization bound. This limit is called the budget limit in the remainder of this section. If the budget limit is violated, it is referred to as an overload condition as described at the end of Section 3.1.1. Overload conditions can be handled using the compression approach, where all budget allocations are proportionally scaled back such that the total budget allocation is equal to the budget limit.

Budget allocations are adapted at the beginning of every reservation period. Therefore, overload conditions must be detected and handled, and the budget limit computed, at the beginning of every reservation period. Obtaining the budget limit is trivial in the case of time-based budgets, where the budget limit is equal to the length of the reservation period, T_R. However, in the case of VIC-based budgets, the budget limit for the k-th reservation period is equal to Î[k], the total increase in VIC(t) over that reservation period. For example, for the reservation period from Example 1, the budget limit is 3.9 × 10⁶.

The increase in VIC over a reservation period, Î[k], is a function of the time spent in each mode of operation and the estimated instruction-retirement rate in each mode. This relationship is shown in (88). The retirement-rate estimates are computed using the LMS algorithm as described in Section 5.2.4. The amount of time to be spent in each mode of operation, τ₁[k], ..., τ_M[k], constitutes the power-management decision.

If power-management decisions are made prior to budget-allocation decisions, then the values of τ_m[k] are fixed, and (88) can be used to compute the budget limit. On the other hand, if power-management decisions are made based on budget-allocation decisions, the τ_m[k] values are flexible. In this case, the budget limit should be set to Î_max[k] = T_R µ̂_M[k], that is, the maximum possible value of Î[k], which would result from spending the entire reservation period in the highest-performance mode.
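The resulting decision logic is only a few lines; the sketch below uses hypothetical names and the conventions above, with mu_hat[M-1] holding µ̂_M[k].

    #include <stdint.h>

    /* Budget limit for reservation period k.  If the operation-mode
     * schedule tau[] is already fixed, evaluate (88); otherwise use
     * Î_max[k] = T_R * µ̂_M[k]. */
    double budget_limit(int M, const double mu_hat[], const uint64_t tau[],
                        double T_R, int schedule_fixed)
    {
        if (!schedule_fixed)
            return T_R * mu_hat[M - 1];   /* entire period in the fastest mode */

        double limit = 0.0;
        for (int m = 0; m < M; m++)
            limit += mu_hat[m] * (double)tau[m];
        return limit;
    }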

5.3 Implementation of a VIC-Based Budget Mechanism

A VIC-based budget mechanism has been implemented for Linux by extending the TSP implementation described in Section 4.4. The overall structure is shown in Figure 37.

Figure 37: Overall structure of an implementation of the VIC-based budget mechanism in Linux.

5.3.1 SRT Tasks

VIC-based budget allocation requires very little change to SRT tasks. As described in Section 4.4, the first stage of the TSP algorithm is performed in the execution context of SRT tasks. In the first stage, the CPU usage of previous jobs is used to predict the CPU usage of future jobs and to estimate the statistics of the prediction error. With VIC-based budget allocation, this first stage remains the same except that CPU usage is measured and predicted in terms of VIC rather than CPU time.

5.3.2 The Allocator

The Allocator is a daemon process used to perform numerical operations in user space since floating-point operations are difficult to perform in the kernel. In the implementation of the TSP algorithm, the Allocator is used to perform the second stage, computing the budget allocation for each task using current run times, job-queue lengths, and predicted execution times. For VIC-based budget allocation, the Allocator is used to perform additional computation.

Performance-model coefficients are updated in the Allocator at each reservation-period boundary. These coefficients represent the rate of increase in VIC(t) over time and are used by the PBS module to implement the VIC source. At each reservation-period boundary, the PBS module provides the Allocator with the retired-instruction count over the previous reservation period and the amount of time spent in each mode of operation. Then, the LMS algorithm is used to update the performance-model coefficients as described in Section 5.2.4.

When performance-model coefficients are updated, the corresponding reciprocals are computed as well. The reciprocal of a performance-model coefficient is the amount of time needed for a unit increase in VIC(t). These reciprocals are used in the PBS module to implement the VIC timer mechanism. The VIC timer is described in Section 5.3.4.

Once the budget allocation is computed for each task and the performance-model coefficients are updated, the Allocator must check for overload conditions as described in Section 5.2.5. If an overload condition is detected, the budget allocations are scaled back.

While computations in the Allocator are performed using floating-point operations, the output to kernel space cannot remain in floating-point format. All budget allocations are converted to 64-bit integers. The performance-model coefficients and the corresponding reciprocals are converted to 64-bit fixed-point numbers with 48 fractional bits.
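The conversion itself is straightforward; the following hypothetical helpers illustrate the format (64-bit values with 48 fractional bits) and how a fixed-point coefficient or reciprocal can be multiplied by an integer count without floating point.

    #include <stdint.h>

    #define FRAC_BITS 48

    static inline int64_t to_fixed(double x)
    {
        return (int64_t)(x * (double)(1LL << FRAC_BITS));
    }

    static inline double from_fixed(int64_t x)
    {
        return (double)x / (double)(1LL << FRAC_BITS);
    }

    /* Multiply a fixed-point coefficient by an integer count, keeping the
     * integer part of the product.  Uses the 128-bit integer extension
     * available in GCC and Clang. */
    static inline int64_t fixed_mul(int64_t coeff_fixed, int64_t n)
    {
        return (int64_t)(((__int128)coeff_fixed * n) >> FRAC_BITS);
    }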

For efficient communication between the Allocator and the PBS module, all inputs and outputs of the Allocator are stored in shared memory pages that are mapped into both the Allocator address space and kernel-level address space.

5.3.3 The MOA Module

The modes-of-operation-abstraction (MOA) module abstracts the underlying power-management mechanism as discrete modes of operation. This generic abstraction layer provided by the MOA module allows for easier portability to other architectures and power-management mechanisms.

For the platform described in this section, the power-management mechanism is CPU-frequency scaling. In Linux, the CPU-clock frequency is managed through the cpufreq subsystem [40]. In the MOA module, the cpufreq subsystem is used to determine the number of CPU-frequency settings available on the processor, to determine the active CPU frequency during initialization of the module, and for notifications when there are changes in the CPU frequency.

The MOA module assigns each frequency an integer index. The module keeps track of the index of the active frequency setting and the time spent in each frequency setting since initialization. This functionality is exported to the PBS module where it is used to implement the VIC-based budget mechanism. Furthermore, the MOA module exports a callback mechanism for mode-transition events. Whenever there is a change in the CPU frequency, the cpufreq subsystem notifies the MOA module. The MOA module calls any registered callback functions, providing the functions with the index of the preceding mode of operation and the following mode.
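As a hedged sketch, a module of this kind might subscribe to cpufreq transition notifications as follows; moa_on_transition() stands in for the module's internal residency accounting and callback fan-out.

    #include <linux/cpufreq.h>
    #include <linux/module.h>
    #include <linux/notifier.h>

    static void moa_on_transition(unsigned int old_khz, unsigned int new_khz)
    {
        /* Map kHz values to mode indices, update per-mode residency
         * accounting, and invoke registered mode-transition callbacks. */
    }

    static int moa_cpufreq_notify(struct notifier_block *nb,
                                  unsigned long event, void *data)
    {
        struct cpufreq_freqs *freqs = data;

        if (event == CPUFREQ_POSTCHANGE)
            moa_on_transition(freqs->old, freqs->new);

        return NOTIFY_OK;
    }

    static struct notifier_block moa_nb = {
        .notifier_call = moa_cpufreq_notify,
    };

    static int __init moa_init(void)
    {
        return cpufreq_register_notifier(&moa_nb, CPUFREQ_TRANSITION_NOTIFIER);
    }
    module_init(moa_init);
    MODULE_LICENSE("GPL");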

5.3.4 The PBS Module

The PBS module coordinates the timing of the Allocator daemon and all SRT tasks, managing the release of new jobs and tracking job-queue lengths. Also, enforcement of budget allocations and measurement of budget usage are managed through the PBS module. In this regard, the PBS module is the same as in the implementations described in Sections 3.5.2 and 4.4. However, to enable VIC-based budget allocation, the module is modified in a number of ways.

140 Allocator Feedback

The PBS module provides the Allocator with the data necessary to update the performance-model coefficients. This data includes the number of instructions retired over the previous reservation period and the amount of time spent in each mode of operation. The retired-instruction count is measured using the perf subsystem. The time spent in each mode of operation is obtained from the MOA module.

The VIC Source

The PBS module implements a VIC source, a monotonically increasing counter similar to a clock source. To implement the VIC source, the PBS module maintains some state as described in Section 5.2.2. This state is updated after two types of events: 1) when there is a change in the mode of operation and 2) at reservation-period boundaries after the performance-model coefficients are updated.

As mentioned in Section 5.3.3, the MOA module implements a callback mechanism for mode-transition events. This callback mechanism is used to update the state of the VIC source when there is a change in the CPU frequency. Also, the state of the VIC source is updated when the Allocator returns control to the PBS module after updating the performance-model coefficients.

VIC Timers

The PBS module implements a VIC callback mechanism called VIC timers. Similar to regular timers, a VIC timer can be armed to fire when the VIC source reaches a target value. This mechanism is built as a wrapper around the high-resolution-timer mechanism in Linux, hrtimer. When a VIC timer is armed, an hrtimer is armed based on the approximate time to target. The approximate time to target is computed as the product of the VIC to target and the reciprocal of the performance-model coefficient described in Section 5.3.2. The VIC to target is the difference between the target of the VIC timer and the current value of the VIC source.
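A minimal sketch of the arming step, assuming hypothetical names and the 48-fractional-bit reciprocal described in Section 5.3.2, might look as follows:

    #include <linux/hrtimer.h>
    #include <linux/ktime.h>

    struct vic_timer {
        struct hrtimer hrt;
        u64 vic_target;               /* absolute VIC value to fire at */
    };

    /* recip_fixed: fixed-point (48 fractional bits) reciprocal of the
     * active performance-model coefficient, i.e. nanoseconds per unit
     * increase in VIC.  Uses the 128-bit integer extension of GCC/Clang. */
    static void vic_timer_arm(struct vic_timer *vt, u64 vic_now, s64 recip_fixed)
    {
        u64 vic_to_target = vt->vic_target - vic_now;
        s64 ns = (s64)(((__int128)recip_fixed * vic_to_target) >> 48);

        hrtimer_start(&vt->hrt, ns_to_ktime(ns), HRTIMER_MODE_REL);
    }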

When there is a change in the mode of operation, the performance-model coefficient changes and, with it, the actual time to target. The MOA module notifies the PBS module when there is a change in the mode of operation. Then, the PBS module iterates through the list of active VIC timers and recomputes the time to target for each VIC timer. The hrtimer associated with each of these VIC timers is reprogrammed and armed accordingly.

Measuring Budget Consumption & Enforcing Budget Allocations

All SRT tasks register with the PBS module during initialization. During registration, a preempt_notifier is attached to each task. A preempt_notifier is a callback mechanism for scheduling events; callback functions are called when the task associated with the preempt_notifier is scheduled to run or when that task is preempted. The preempt_notifiers allow the PBS module to sample the VIC source at the start and end of each task activation. The difference in the sampled values is the amount of CPU budget consumed by the task during that activation.
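The sampling hook can be sketched as follows; vic_source_read() is a hypothetical VIC-source accessor, and srt_task stands in for the module's per-task state.

    #include <linux/kernel.h>
    #include <linux/preempt.h>
    #include <linux/sched.h>

    extern u64 vic_source_read(void);   /* hypothetical VIC-source accessor */

    struct srt_task {
        struct preempt_notifier pn;
        u64 vic_at_sched_in;   /* VIC sample at the start of the activation */
        u64 vic_consumed;      /* budget consumed so far in this period     */
    };

    static void srt_sched_in(struct preempt_notifier *pn, int cpu)
    {
        struct srt_task *t = container_of(pn, struct srt_task, pn);
        t->vic_at_sched_in = vic_source_read();
    }

    static void srt_sched_out(struct preempt_notifier *pn, struct task_struct *next)
    {
        struct srt_task *t = container_of(pn, struct srt_task, pn);
        t->vic_consumed += vic_source_read() - t->vic_at_sched_in;
    }

    static struct preempt_ops srt_preempt_ops = {
        .sched_in  = srt_sched_in,
        .sched_out = srt_sched_out,
    };

    /* During registration:
     *   preempt_notifier_init(&t->pn, &srt_preempt_ops);
     *   preempt_notifier_register(&t->pn);  */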

Budget allocations must be enforced. An SRT task must be put to sleep when the consumed CPU budget reaches or exceeds the allocated budget. To enforce budget allocations, VIC timers are used. The PBS module arms a VIC timer when an SRT task is scheduled to run. The target of the VIC timer is set to the current value of the VIC source plus the remaining budget. If the task completes all pending computation before the VIC timer fires, the VIC timer is disarmed. However, if the VIC timer fires and the budget remaining is found to be less than a threshold value, the task is put to sleep. Then, the task is awakened when the budget is refreshed in the following reservation period.

5.3.5 Handling an Idle CPU

The implementation of LAMbS described in this section does not handle sleep states. Specifically, if the processor enters a sleep state, the time spent in the sleep state is attributed to the mode of operation that was active before the system entered the sleep state. The instruction-retirement rate in a sleep state is zero. As a result, the instruction-retirement rate of the affected mode of operation can be underestimated. In the extreme case, the performance-model coefficient of the affected mode is computed to be zero. This extreme case is especially problematic because the corresponding reciprocal of the coefficient is undefined, and the VIC timer and VIC budget mechanisms no longer function correctly.

The Linux kernel can be configured to prevent the system from entering a sleep state. Specifically, the kernel can be configured to use polling idle states using the boot-time parameter “idle=poll”. In this configuration, whenever the system is idle, the kernel enters a spin loop waiting for the arrival of an interrupt. The kernel never enters a hardware sleep state, and the instruction-retirement rate is never zero.

However, the instruction count reported by the PBS module to the Allocator is the user-level instruction count. As a result, instructions retired in the kernel-level spin loop are not counted towards the reported instruction count. Two possible approaches to address this problem are to handle sleep states as additional modes of operation and to maintain a separate set of performance-model coefficients for each SRT application. These two approaches are discussed in greater detail in Chapter 6. For the experiments described in Section 5.4, a non-real-time task was run in the background to serve as an idle process. Because the task was scheduled as a non-real-time task, it did not interfere with the schedule of the Allocator or SRT tasks.

5.4 Experiments and Results

Results from a number of experiments demonstrate the advantages of VIC-based budget allocation over time-based budget allocation. Namely, the average per-job CPU usage is not affected by changes in the mode of operation when CPU usage is measured in terms of VIC. As a result, when there are mode transitions, it is easier to predict CPU usage in terms of VIC rather than CPU time.

With an adaptive budget-allocation algorithm like TSP, the system eventually adapts to changes in job-execution times that result from changes in the mode of operation. However, with budget allocation based on VIC, the response to such changes is faster. When there are large and frequent changes in the mode of operation, VIC-based budget allocation is more efficient than time-based budget allocation.

All results presented in this section are based on experiments performed on the platform described in Section 5.1. Also, for reasons described in Section 5.3.5, all experiments were performed with an idle process running in the background.

5.4.1 Performance Metrics

Time-based budget and VIC-based budget cannot be compared directly. To compare the performance of these two approaches to budget allocation, three different metrics can be used: VFT error, deadline-miss rate, and average CPU-bandwidth allocation.

VFT error, also referred to as the scheduling error, is the difference between the virtual finishing time (VFT) of a job and the job deadline. The virtual finishing time of a job, VFT[j], is the time a job would finish if it ran on a dedicated virtual processor whose speed is a fraction of that of the physical processor. The fraction is equal to the CPU bandwidth. An expression for VFT[j], first presented in Section 3.1.2, is repeated below for reference:

    VFT[j] = LFT[j] − (Q_left[j] · T_R) / Q[k].    (9 revisited)

The latest possible finishing time of a job, LFT[j], is the end time of the reservation period in which the job completes. The variable Q[k] is the budget allocated to the task in the reservation period in which the job completes. The variable Q_left[j] is the amount of unused budget remaining when the job completes. Finally, T_R denotes the length of a reservation period. VFT error is often normalized by the task period. VFT error is described in greater detail in Section 3.1.2.

The definition of VFT is based on the ratio of the allocated budget and the remaining budget. Therefore, VFT and VFT error are unaffected by the measure of computation used for budget allocation. VFT error is a performance metric that allows for comparison between time-based budget allocation and VIC-based budget allocation. The same is true for deadline-miss rate as a performance metric.

CPU bandwidth is the ratio between the allocated budget and the total CPU capacity available for allocation over the reservation period. For time-based budget allocation, the total available CPU capacity is the length of the reservation period, T_R. Then, the bandwidth is defined as Q[k]/T_R. For VIC-based budget allocation, the total available CPU capacity is the increase in VIC(t) from the beginning of the reservation period until the end of that reservation period, (VIC((k + 1)T_R) − VIC(kT_R)). Provided that the mode of operation remains constant within a reservation period, the CPU bandwidth is unaffected by the measure of computation used for budget allocation, CPU time or VIC. Like VFT error and deadline-miss rate, the average CPU-bandwidth allocation is a performance metric that allows for comparison between time-based budget allocation and VIC-based budget allocation.

5.4.2 Invariance Under Change in The Mode of Operation

Per-job CPU usage, measured in terms of VIC, shows little variation with change in the mode of operation. The experiment described in Section 5.1.2 was repeated with VIC as the measure of computation. The average per-job CPU usage was measured for the three workloads. For each workload, the experiment was repeated at different CPU frequencies.

CPU usage was measured using the VIC source described in Section 5.3.4. The VIC source was sampled at the start and end of each job, and the difference in the two samples was used as the measure of CPU usage for that job. The average CPU usage was computed across all jobs in the workload.

For each workload and CPU frequency, the experiment was repeated five times. The results are shown in the three plots in Figure 38. Each marker in these plots corresponds to the average CPU usage for a single run. The thick black lines are lines of best fit. The thin gray lines correspond to ideal lines assuming perfectly invariant CPU usage across all CPU frequencies.

Figure 38: Plots of average per-job CPU usage (measured in VIC) against CPU-clock frequency for three different workloads: (a) membound “cache line”, (b) membound “thrash”, and (c) H.264 video decoding.

For both membound workloads, there is negligible variability in the average per-job CPU usage. For the video-decoding workload, there is a decrease in CPU usage with increasing CPU frequency. As in the case of CPU cycles and retired-instruction count seen in Figures 30, 31, and 32, this decrease in the VIC can be attributed to a decreasing number of interrupts. VIC is computed based on CPU time, which includes kernel-level CPU time. As a result, VIC includes computation performed at the kernel level. However, this decrease in CPU usage with CPU frequency is negligible compared to overall CPU usage.

5.4.3 Response Time to Changes in The Mode of Operation

As shown in Section 5.1.2, per-job CPU usage changes with CPU frequency when CPU usage is measured in terms of execution time. Adaptive budget-allocation algorithms such as TSP can adapt to such changes in execution times. However, when CPU usage is measured in terms of VIC, there are no significant changes in per-job CPU usage with changes in CPU frequency. As a result, when budget allocations are based on VIC rather than execution time, the scheduling error due to such changes is significantly smaller.

Experiments were performed using the membound “thrash” workload described in Section 5.1.2. This synthetic workload consists of 400 jobs, each consisting of the same number of instructions. During the experiment, the CPU frequency was switched between 1.2GHz and 2.3GHz every half a second. The per-job CPU usage was measured in terms of both CPU time and VIC.

CPU usage is plotted against job-release times in Figure 39. As expected, there are step changes in the job-execution times due to changes in the CPU frequency. For CPU usage measured in terms of VIC, there are two initial spikes at t = 0.0s and t = 0.5s. These two times represent the first time the workload runs at each of the two frequencies. The spikes correspond to an initial “bad guess” of the instruction-retirement rates at the two frequencies. Once the system “learns” the retirement rates, there is very little change in the CPU usage across the remaining jobs.

Figure 39: Per-job CPU usage for the membound “thrash” workload. The CPU frequency was switched between 1.2GHz and 2.3GHz every 0.5s. CPU usage is shown in terms of job-execution times (top) and VIC (bottom).

The membound “thrash” workload was scheduled with budget allocations based on the TSP algorithm. The length of the reservation period and the task period was set to 10ms and 20ms, respectively. The alpha parameter was set to 1.25. The workload was tested with both time-based budget allocation and VIC-based budget allocation.

The per-job scheduling error is plotted in Figure 40 against job-release times. With time-based budget allocation, there are positive and negative spikes in the scheduling error corresponding to positive and negative steps in job-execution times. After each spike, the scheduling error converges towards zero as budget allocations are adapted to changes in job-execution times. With VIC-based budget allocation, there are spikes in the VFT error at t = 0.0s and t = 0.5s. These spikes correspond to the spikes in the per-job VIC shown in Figure 39b. After t = 0.5s, the VFT error remains close to zero, unaffected by the changes in CPU frequency and execution time.

5.4.4 Bandwidth Allocation vs Deadline-Miss Rate

Figure 40: VFT error for the workload shown in Figure 39 with time-based budget allocation (top) and VIC-based budget allocation (bottom).

In the TSP algorithm, as the variance in job-execution times increases, budget allocations increase proportionally by an amount that depends on the alpha parameter. Changes in job-execution times due to changes in CPU frequency increase the variance in job-execution times and, as a result, increase the average budget allocation. Furthermore, this increase in average budget allocation may not yield a commensurate decrease in the deadline-miss rate. In contrast, with VIC-based budget allocation, there is little or no increase in the average budget allocation due to changes in the CPU frequency.

5.4.4.1 Experimental Setup

To explore the effects of frequency scaling on budget allocation, experiments were performed on the video-decoding workload described in Section 5.1.2. Experiments were performed with both time-based budget allocation and VIC-based budget allocation. The performance of the budget-allocation mechanism was measured in terms of average deadline-miss rate and average CPU-bandwidth allocation.

Budget allocation was based on the TSP algorithm. The frame rate of the decoded video was 24fps. Therefore, the length of the reservation period and the task period was set to 10.416667ms and 41.666668ms, respectively. The alpha parameter of the TSP algorithm was varied from 0.1 to 3.0. For higher values of the alpha parameter, the budget allocation is higher and the average deadline-miss rate is lower. For a given deadline-miss rate, it is desirable to have lower budget allocations.

For each value of alpha, the experiment was performed under four different frequency-scaling configurations. In three of these configurations, the CPU frequency was set to a constant value of 1.2GHz, 1.8GHz, and 2.3GHz, respectively. In the fourth configuration, the frequency was switched between 1.2GHz and 2.3GHz every half a second. The average CPU frequency in this fourth configuration was 1.75GHz.

5.4.4.2 Results

The effect of frequency scaling on CPU-bandwidth allocation and deadline-miss rate is shown in the scatter plots in Figures 41 and 42, respectively. Each marker in these plots corresponds to a single experiment defined by the alpha parameter, the type of budget allocation, and the frequency-scaling configuration. Different frequency-scaling configurations are shown in different colors. The ‘X’ markers correspond to experiments with VIC-based budget allocation. The ‘O’ markers correspond to experiments with time-based budget allocation. For each type of budget allocation and frequency-scaling configuration, larger values of alpha correspond to lower deadline-miss rates and higher average bandwidth allocations.

Results for the constant-frequency experiments are shown in Figure 41. At higher CPU frequencies, the average CPU-bandwidth allocation is lower for the same deadline-miss rate regardless of the type of budget allocation. For time-based budget allocation, the average bandwidth allocation is lower because there is a decrease in execution time while the length of the reservation period stays constant. For VIC-based budget allocation, the per-job VIC remains the same regardless of CPU frequency. However, since the performance-model coefficient is larger at higher frequencies, the total available CPU capacity is larger at higher frequencies. As a result, the average CPU-bandwidth allocation is lower. Also, for any CPU frequency, if the frequency remains constant, the bandwidth-vs-miss-rate curve is roughly the same for both time-based budget allocation and VIC-based budget allocation.

Figure 41: Scatter plot of average bandwidth allocation vs deadline-miss rate for a video-decoding workload with time-based budget allocation (O) and VIC-based budget allocation (X). CPU frequency was kept constant at 1.2GHz (red), 1.8GHz (blue), and 2.3GHz (black).

Results for the fourth frequency-scaling configuration are shown in Figure 42. Although the performance of time-based budget allocation and VIC-based budget allocation is similar when the CPU frequency is constant, VIC-based budget allocation performs better when there are changes in the CPU frequency. Specifically, for the same deadline-miss rate, the average bandwidth allocation is lower for VIC-based budget allocation than for time-based budget allocation.

Figure 42: Scatter plot of average bandwidth allocation vs deadline-miss rate for a video-decoding workload with time-based budget allocation (O) and VIC-based budget allocation (X). CPU frequency was switched between 1.2GHz and 2.3GHz every half a second.

5.5 Conclusion

For budget allocation in power-managed systems, virtual instruction count is a better measure of computation than CPU time, CPU cycles, or retired-instruction count. CPU usage measured in terms of CPU cycles or CPU time changes with the mode of operation. As a result, with budget allocation based on CPU time or CPU cycles, there are spikes in the VFT error with changes in the mode of operation.

With VIC-based budget allocation, such transients are smaller or no longer present. CPU usage measured in terms of VIC is unaffected by changes in the mode of operation. With no significant changes in CPU usage, the VFT error remains low. As a result, VIC-based budget allocation is more efficient, allowing for lower bandwidth allocations for the same deadline-miss rate.

CHAPTER VI

FUTURE WORK

In Chapters 3, 4, and 5, techniques are presented for dynamic allocation of CPU capacity to soft-real-time tasks, and mechanisms are described for enforcement of such allocations. In this chapter, power management is discussed in greater detail, and additional improvements are discussed for future work.

An adaptive power model, similar to the performance model described in Section 5.2.4, is presented in Section 6.1. Based on this power model, the power-management problem is posed as a linear-programming problem.

The use of idle states for power management is discussed in Section 6.2. Experimental results are presented to demonstrate the potential benefits of power management with idle states. The power model from Section 6.1 is extended to include transition overheads. Based on the extended model, the power-management problem is reframed as a mixed-integer linear-programming problem. Implementation issues are also discussed.

The effects of nonuniform workload mixes are discussed in Section 6.3. When tasks hosted on the system have very different instruction-retirement rates, VIC is no longer a good approximation of instruction count. The invariance property of VIC discussed in Chapter 5 is compromised. As a result, VIC-based budget allocation does not perform well with workload mixes with a wide range of retirement rates. Experimental results are presented to demonstrate this effect, and potential solutions to the problem are discussed.

6.1 Power Management Using Linear Models and Constraints

In LAMbS, power-management decisions are in the form of operation-mode schedules. An operation-mode schedule is a vector, τ⃗[k] = (τ₁[k], τ₂[k], ..., τ_M[k])ᵀ, that specifies how much time should be spent in each mode of operation during a reservation period. The variable τ_m[k] denotes the amount of time to be spent in the m-th mode of operation in the k-th reservation period.

Each SRT task is allocated some amount of CPU capacity to run over the duration of a reservation period. Taking into account the total CPU capacity committed across all tasks, any power-management decision must ensure that those commitments can be kept. An optimum operation-mode schedule is one that allows all budget commitments to be kept while minimizing the estimated energy usage for the corresponding reservation period.

6.1.1 The Power Model

To estimate the energy usage over a reservation period, a linear model is used similar to the performance model. The average power-consumption rate for each mode of operation is dynamically estimated. These estimates are referred to as power-model coefficients. The power-model coefficient for the m-th mode of operation during the k-th reservation period is denoted p̂_m[k]. Energy usage over a reservation period is estimated as follows:

    Ê[k] = Σ_{m=1}^{M} p̂_m[k] τ_m[k].    (91)

Let p_m[k] denote the actual average power-consumption rate for the m-th mode of operation in the k-th reservation period. Then, the actual energy usage over the k-th reservation period is expressed as follows:

    E[k] = Σ_{m=1}^{M} p_m[k] τ_m[k].    (92)

At the end of each reservation period, the energy usage is measured. Based on this measurement, the error in the estimated energy usage is computed, e[k] = E[k] − Ê[k]. The power-model coefficients are adjusted based on this error value using the LMS algorithm as shown below:

    p̂_m[k + 1] = p̂_m[k] + β_m[k] · ( −τ_m[k] e[k] / Σ_{i=1}^{M} τ_i²[k] ).    (93)

Finally, the step sizes, β_m[k], are adjusted in the same way as described in Section 5.2.4.

Implementation

The LMS filter is already implemented in the Allocator for updating the performance-model coefficients. The same filter implementation can be used to update power-model coefficients using a separate filter state. However, to update the power-model coefficients, energy usage must be measured over each reservation period.

Instruction counts are measured using hardware performance counters. Such performance counters are a part of the processor and can be sampled relatively frequently with little overhead or delay. On the other hand, power and energy measurements are often performed further from the processor. In previous work, power or energy measurements have been performed using external hardware such as multimeters, oscilloscopes, and power supplies [74][73][69]. Such approaches involve large delays and require large sampling periods.

On Intel processors based on the Sandy Bridge architecture or newer architectures, energy-usage measurements are made internally and can be accessed every millisecond through the Running Average Power Limit (RAPL) interface [41]. These measurements are read through a model-specific register (MSR). The PBS module can obtain the energy measurements by reading the relevant MSRs and can pass these measurements to the Allocator through the shared memory pages described in Section 5.3.2.
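As a user-space illustration of the interface (not the in-kernel path used by the PBS module), the package-energy counter can be read through the Linux msr driver; the addresses below are the documented MSR_RAPL_POWER_UNIT (0x606) and MSR_PKG_ENERGY_STATUS (0x611).

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    static uint64_t rdmsr(int fd, off_t msr)
    {
        uint64_t v = 0;
        pread(fd, &v, sizeof(v), msr);  /* msr driver: file offset = MSR address */
        return v;
    }

    int main(void)
    {
        int fd = open("/dev/cpu/0/msr", O_RDONLY);  /* requires root and the msr module */
        if (fd < 0) { perror("open"); return 1; }

        /* Energy unit is (1/2)^ESU joules; ESU is in bits 12:8 of 0x606. */
        double unit = 1.0 / (double)(1u << ((rdmsr(fd, 0x606) >> 8) & 0x1f));

        /* 32-bit wrapping counter of consumed energy, in 'unit' joules. */
        uint32_t e0 = (uint32_t)rdmsr(fd, 0x611);
        sleep(1);
        uint32_t e1 = (uint32_t)rdmsr(fd, 0x611);

        printf("package power: %.3f W\n", (double)(e1 - e0) * unit);
        close(fd);
        return 0;
    }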

With a fast enough sampling rate, it may be possible to estimate the power-model coefficients using exponentially-weighted moving averages. However, the LMS-filter approach described above allows for the coefficients to be computed with a single measurement and a fixed amount of computation every reservation period. As a result, the LMS-filter approach has a lower overhead.

6.1.2 The Optimum Operation-Mode Schedule

Given the power model, the performance model, and the total budget allocation across all SRT tasks, the optimum operation-mode schedule for the k-th reservation period is one that minimizes the estimated energy usage over that reservation period. The cost function in this case is defined by Equation 91.

The length of the reservation period is fixed at T_R. As a result, the following constraint is imposed on the operation-mode schedule:

    Σ_{m=1}^{M} τ_m[k] = T_R.    (94)

Also, it must be possible for all SRT tasks to consume the allocated budget within the reservation period. Therefore, the increase in VIC(t) from the beginning of the reservation period until the end of the reservation period must be greater than the total VIC budget allocated across all SRT tasks. Let Q_total[k] denote the sum of the budget allocation across all SRT tasks after any overload conditions have been handled. Based on Equation 88, this minimum-performance constraint can be expressed as follows:

    Σ_{m=1}^{M} µ̂_m[k] τ_m[k] ≥ Q_total[k].    (95)

Given the linear cost function and linear constraints, this optimization problem can be solved using a standard linear-programming algorithm. Provided that overload conditions are handled appropriately, Q_total[k] is always less than or equal to T_R µ̂_M[k]. As a result, there is always at least one solution to the constraints, (94) and (95).

Implementation

The operation-mode schedule must be computed at the start of every reservation period. Furthermore, the optimization problem described above requires access to the power-model coefficients, the performance-model coefficients, the total budget allocation across all tasks, and floating-point operations. Therefore, the optimization should be performed in the execution context of the Allocator daemon. One possible library that can be used to solve the linear-programming problem is the GNU Linear Programming Kit (GLPK) [1].
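The LP is small enough that the GLPK setup fits in one function. The following is a sketch under the conventions of (91), (94), and (95); the argument names are hypothetical inputs taken from the Allocator's state.

    #include <glpk.h>

    /* Minimize   Σ p_hat[m]·tau[m]                      (91)
     * subject to Σ tau[m] = T_R                          (94)
     *            Σ mu_hat[m]·tau[m] >= Q_total           (95)
     *            tau[m] >= 0.                                   */
    void solve_mode_schedule(int M, const double p_hat[], const double mu_hat[],
                             double T_R, double Q_total, double tau_out[])
    {
        glp_prob *lp = glp_create_prob();
        glp_set_obj_dir(lp, GLP_MIN);

        glp_add_rows(lp, 2);
        glp_set_row_bnds(lp, 1, GLP_FX, T_R, T_R);      /* (94) */
        glp_set_row_bnds(lp, 2, GLP_LO, Q_total, 0.0);  /* (95) */

        glp_add_cols(lp, M);
        int ia[1 + 2 * M], ja[1 + 2 * M];
        double ar[1 + 2 * M];
        for (int m = 1; m <= M; m++) {
            glp_set_col_bnds(lp, m, GLP_LO, 0.0, 0.0);  /* tau_m >= 0 */
            glp_set_obj_coef(lp, m, p_hat[m - 1]);
            ia[m] = 1;     ja[m] = m;     ar[m] = 1.0;               /* row (94) */
            ia[M + m] = 2; ja[M + m] = m; ar[M + m] = mu_hat[m - 1]; /* row (95) */
        }
        glp_load_matrix(lp, 2 * M, ia, ja, ar);

        glp_simplex(lp, NULL);
        for (int m = 1; m <= M; m++)
            tau_out[m - 1] = glp_get_col_prim(lp, m);
        glp_delete_prob(lp);
    }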

Since the operation-mode schedule must be computed at the start of every reservation period, the computational overhead is relevant. On most processors, the number of modes of operation is relatively small. For the platform described in Table 6, the processor is configurable to 13 different frequency settings. Also, the optimization problem described in Section 6.1.2 consists of just two constraints in addition to the cost function. Given the small size of the problem, the computational overhead should be acceptable.

Once computed, the operation-mode schedule can be passed to the kernel through the shared memory pages described in Section 5.3.2. Then, a kernel-level component, referred to as the operation-mode scheduler, can configure the processor to the different modes of operation for appropriate durations of time based on the operation-mode schedule. In Linux, the CPU-clock frequency is managed through the cpufreq subsystem [40]. The frequency is set by components called cpufreq governors. The operation-mode scheduler can be implemented as such a governor.

6.2 Power Management with Idle States

The power-management problem discussed in Section 6.1 is limited to active modes of operation such as CPU-frequency settings. However, restricting power-management decisions to active modes of operation alone can limit energy savings.

To measure the relationship between CPU frequency and CPU power consumption, experiments were performed with a video-decoding workload and the platform described in Table 6. The workload was set up to decode one frame after another without blocking and ran repeatedly over a duration of one minute. The CPU frequency was kept constant over that interval.

Energy measurements were taken using the RAPL interface at the start and end of the one-minute interval. The sample times were recorded as well to measure the exact length of the intervals. Measurements of energy consumption and interval lengths were used to compute the average power-consumption rate. For each CPU frequency, the experiment was repeated five times and the average power-consumption rate was measured for each repetition. The measurements are shown in Figure 43. A line of best fit is drawn through the markers and extrapolated to the y-axis.

Figure 43: Average power-consumption rate at different CPU clock frequencies for a video-decoding workload.

Based on the figure, the relationship between frequency and power-consumption rate is linear and not quadratic. This suggests that the underlying power-management mechanism in the experimental platform is based on frequency scaling alone and not on voltage scaling. Furthermore, based on extrapolation of the data, the use of idle states can yield significant energy savings beyond what is possible with frequency scaling alone.

However, idle states have transition overheads. Exiting from an idle state often requires additional time and energy. The model presented in Section 6.1.1 does not take such overheads into account. In this section, the power-management problem is modified to take such overheads into account. Also, additional implementation-related issues are discussed for power management with idle states.

6.2.1 Addressing Transition Overheads

The overhead associated with idle-state transitions is in the form of transition latencies and fixed energy costs. The amount of time spent in an idle state must be zero or greater than the transition time. When the transition time is strictly positive, the search space of valid operation-mode schedules is no longer connected. Furthermore, the fixed energy cost of an idle-state transition is only incurred when the idle state is used. Therefore, energy is no longer a continuous function of the operation-mode schedule.

In previous sections, active modes of operation are indexed 1 through M, where M corresponds to the highest-performance state. Maintaining this notation, idle modes of operation are indexed (−S + 1) through 0, where S denotes the number of idle states and (−S + 1) is the index of the deepest idle state. For all idle states, the instruction-retirement rate is zero. Deeper idle states have higher transition overheads.

The variable τ_m[k] denotes the time spent in the m-th mode of operation. In addition, let o_m[k] denote an indicator variable for the m-th mode of operation. This variable is equal to 1 if the platform is configured to the m-th mode of operation in the k-th reservation period and 0 if this mode of operation is not used:

    o_m[k] = 1 if τ_m[k] > 0, and 0 otherwise.    (96)

Indicator variables are only needed for modes of operation with significant transition overheads. Let Ê_overhead,m[k] denote the estimated energy cost of a transition to the m-th mode of operation. Equation 91, corresponding to estimated energy usage, is modified as follows to take the energy cost of a transition into account:

    Ê[k] = Σ_{m=−S+1}^{M} p̂_m[k] τ_m[k] + Σ_{m=−S+1}^{0} o_m[k] Ê_overhead,m[k].    (97)

Due to the indicator variable o_m[k], the fixed energy overhead of the m-th mode of operation is only added to the estimated energy total if the mode of operation is used, o_m[k] = 1.

Let t_latency,m denote the transition latency of mode m. When a mode of operation is used, the time spent in that mode must be greater than or equal to t_latency,m. Furthermore, when o_m[k] = 0, it must be ensured that τ_m[k] = 0. These constraints are expressed as follows:

    o_m[k] · T_R ≥ τ_m[k] ≥ o_m[k] · t_latency,m,    (98)

where T_R denotes the reservation-period length.
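Under the same GLPK-based approach as in Section 6.1.2, the extension is mechanical: each idle state contributes a binary column o_s with cost Ê_overhead,s in the objective (97) and the two linking rows of (98), and the problem is solved with GLPK's branch-and-cut solver (glp_intopt) instead of the simplex routine. The sketch below shows only these deltas; the column indices and parameters are hypothetical inputs.

    #include <glpk.h>

    void add_idle_state(glp_prob *lp, int tau_col, int o_col,
                        double T_R, double t_latency, double E_overhead)
    {
        /* o_s is binary; its fixed energy cost enters the objective (97). */
        glp_set_col_kind(lp, o_col, GLP_BV);
        glp_set_obj_coef(lp, o_col, E_overhead);

        /* (98) as two rows: tau - o*T_R <= 0  and  tau - o*t_latency >= 0. */
        int r = glp_add_rows(lp, 2);
        glp_set_row_bnds(lp, r,     GLP_UP, 0.0, 0.0);
        glp_set_row_bnds(lp, r + 1, GLP_LO, 0.0, 0.0);

        int    ind[3]  = {0, tau_col, o_col};
        double v_up[3] = {0, 1.0, -T_R};
        double v_lo[3] = {0, 1.0, -t_latency};
        glp_set_mat_row(lp, r,     2, ind, v_up);
        glp_set_mat_row(lp, r + 1, 2, ind, v_lo);
    }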

The length of the reservation period, T_R, is known. The performance-model coefficients, µ_m[k], for the sleep states are zero, and the coefficients for the active modes of operation are adapted using LMS filters as described in Section 6.1.1. However, the three remaining sets of parameters, p̂_m[k], Ê_overhead,m[k], and t_latency,m, may be more difficult to determine.

The power-model coefficients, p̂_m[k], and the estimated energy overhead of sleep-state transitions, Ê_overhead,m[k], may be adapted using the normalized LMS algorithm. However, values of τ_m[k] are specified in nanoseconds and tend to be large in comparison to o_m[k], which can only be zero or one. As a result, the normalized LMS algorithm may be slow to converge to accurate estimates of Ê_overhead,m[k] due to the normalization. In Linux, an alternative approach to obtain parameters such as p̂_m[k] and t_latency,m is from cpuidle drivers. The cpuidle subsystem is described in greater detail in the following section.

6.2.2 Practical Considerations

For LAMbS to perform power management using idle states, three main components are necessary. First, it must be possible to control the amount of time spent in a specific idle state within a reservation period. Second, it must be possible to perform the optimization described in Section 6.2.1 in every reservation period without significantly impacting the overall performance of the system. Lastly, it must be possible to obtain accurate estimates of the parameters needed to perform the optimization.

In Linux, idle states are managed through the cpuidle subsystem [58][25]. This subsystem consists of two main parts: cpuidle governors and cpuidle drivers. Governors implement the policy side of cpuidle [25]. Whenever there are no tasks to run, the system enters an idle period. The system will remain in the idle period until the arrival of an interrupt. A cpuidle governor is responsible for picking the idle state the system should enter at the start of an idle period. This decision can be made based on a number of heuristics, such as the time remaining until the next timer tick or the average length of previous idle periods. While the cpuidle governor controls which idle state is used, the governor has no control over when an idle period occurs or for how long the idle period lasts. A cpuidle driver is responsible for abstracting the idle states of the underlying hardware. Specifically, a cpuidle driver determines the number of idle states available in the hardware, and for each idle state, the driver maintains a data structure called cpuidle_state with relevant information on that state.

As described in Section 6.1.2, power-management decisions in LAMbS are implemented using a kernel-level component called the operation-mode scheduler. To integrate idle states into LAMbS, it must be possible for the operation-mode scheduler to configure the system to an idle state when desired and to control how much time is spent in that state. The cpuidle_state data structure contains a pointer to a function that will switch the system into the corresponding idle state. This function should allow the system to enter an idle state regardless of the number of pending tasks. However, it is not clear how the system can be kept in the idle state for the desired amount of time.

Secondly, the optimization problem described in Section 6.2.1 is more compute-intensive than the original problem described in Section 6.1.2. For each sleep state, two additional variables are introduced into the problem: τ_m[k] and o_m[k]. Furthermore, o_m[k] is an integer variable. Also, in addition to the two constraints listed in Section 6.1.2, two more constraints are added to the problem for each sleep state. These additional constraints are defined by (98).

However, most processors have a small number of idle states. For the platform described in Table 6, the processor is configurable to four different C-states. Given the small number of idle states, it may still be computationally feasible to solve the MILP problem in every reservation period. The GLPK package mentioned in Section 6.1.2 is still applicable to the problem.
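As a rough illustration that the MILP machinery fits in a few lines of C, the sketch below builds and solves a toy problem with GLPK's MIP solver (glp_intopt). The single continuous variable, binary variable, objective coefficients, and linking constraint are placeholders standing in for $\tau_m[k]$, $o_m[k]$, and the constraints of (98), not the actual LAMbS model:

    /* Toy MILP in GLPK: one continuous variable tau (time asleep,
     * 0 <= tau <= T) and one binary variable o (whether the sleep
     * state is entered at all), linked by tau - T*o <= 0. */
    #include <stdio.h>
    #include <glpk.h>

    #define T 1.0e6 /* reservation period, placeholder units */

    int main(void)
    {
        glp_prob *mip = glp_create_prob();
        glp_set_obj_dir(mip, GLP_MIN);

        glp_add_cols(mip, 2);
        glp_set_col_bnds(mip, 1, GLP_DB, 0.0, T); /* tau */
        glp_set_col_kind(mip, 2, GLP_BV);         /* o in {0, 1} */
        glp_set_obj_coef(mip, 1, -0.5);  /* net power saved per unit asleep */
        glp_set_obj_coef(mip, 2, 10.0);  /* transition energy overhead */

        /* tau - T*o <= 0: any nonzero tau forces o = 1. */
        glp_add_rows(mip, 1);
        glp_set_row_bnds(mip, 1, GLP_UP, 0.0, 0.0);
        int ind[] = {0, 1, 2};       /* GLPK arrays are 1-based */
        double val[] = {0.0, 1.0, -T};
        glp_set_mat_row(mip, 1, 2, ind, val);

        glp_iocp parm;
        glp_init_iocp(&parm);
        parm.presolve = GLP_ON; /* solve the LP relaxation internally */
        glp_intopt(mip, &parm);

        printf("tau = %g, o = %g\n",
               glp_mip_col_val(mip, 1), glp_mip_col_val(mip, 2));
        glp_delete_prob(mip);
        return 0;
    }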

Another obstacle to solving the MILP problem is the need for parameters such as transition latency. One possible way to obtain these parameters is through the cpuidle_state data structure mentioned above. This data structure includes information such as the transition latency, power usage, and target residency of the associated idle state. Target residency refers to the break-even time for the corresponding state. These numbers are hard-coded into architecture-specific parts of the kernel source, such as the driver file intel_idle.c. However, this information is not always valid. When these parameters were inspected on the test platform using the sysfs interface in Linux [58], the reported power usage was incorrect; the sketch below reproduces this inspection.
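A minimal user-space sketch of the inspection follows, assuming the standard cpuidle sysfs attributes (name, latency, power) described in [58]; attribute availability varies by kernel and platform:

    /* Sketch: dump the advertised parameters of idle state 0 on CPU 0
     * via sysfs. Paths follow the cpuidle sysfs layout. */
    #include <stdio.h>

    static void show(const char *attr)
    {
        char path[128], buf[64];

        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cpuidle/state0/%s", attr);
        FILE *f = fopen(path, "r");
        if (f && fgets(buf, sizeof buf, f))
            printf("%-8s %s", attr, buf);
        if (f)
            fclose(f);
    }

    int main(void)
    {
        show("name");
        show("latency"); /* exit latency, in microseconds */
        show("power");   /* advertised power, in mW; incorrect on the test platform */
        return 0;
    }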

To include idle states in power-management decisions in LAMbS, the issues discussed above must be resolved.

6.3 Workload Mixes with Nonuniform Retirement Rates

The improved performance of VIC-based budget allocation is due to invariance under change in the mode of operation. Specifically, VIC approximates RIC, and per-job CPU usage measured in terms of RIC is invariant under change in the mode of operation.

The approximation of RIC is based on integrating the estimated average instruction-retirement rate over time. Inherent in this approximation is the assumption that, for a given mode of operation, the instruction-retirement rate remains uniform throughout the duration of a reservation period.
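In generic notation (assumed here rather than restated from earlier chapters), with $\hat{r}_m$ the estimated average instruction-retirement rate in mode $m$ and $m(t)$ the mode active at time $t$, the approximation reads

    \mathrm{VIC}(t_1, t_2) \;=\; \int_{t_1}^{t_2} \hat{r}_{m(t)}\, dt \;\approx\; \mathrm{RIC}(t_1, t_2),

which matches RIC exactly only if the actual retirement rate stays at $\hat{r}_m$ whenever mode $m$ is active.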

However, as seen in Figure 33 in Section 5.1.3, instruction-retirement rates can differ greatly across workloads. When a system is shared between mixed workloads with a wide range of instruction-retirement rates, assumptions about uniformity in time are no longer valid. While the average instruction-retirement rate may be estimated correctly, the actual instruction-retirement rate may deviate significantly from the average. With an inaccurate estimate of the instruction-retirement rate, VIC does not track RIC well. In such cases, CPU usage measured in terms of VIC is no longer invariant under change in the mode of operation.

To demonstrate this effect of mixed-retirement-rate workloads, experiments were performed using the system described in Section 5.3. The experiment was performed for two different workload mixes: one with a uniform instruction-retirement rate and one with a nonuniform retirement rate.

Each of these workload mixes consists of two SRT tasks. In both cases, the first SRT task consists of the “cacheline” configuration of the membound workload. In the uniform case, the second task also consists of the “cacheline” configuration of the membound workload. In the nonuniform case, the second task consists of the “thrash” configuration of the membound workload. As described in Section 5.1.3, the “cacheline” configuration of the workload is chip-bound and has a significantly higher retirement rate than the “thrash” configuration, which is memory-bound. The two workload mixes are summarized in Table 9.

Table 9: Workload Mixes with Uniform and Nonuniform Instruction-Retirement Rates.

    Workload Mix    SRT Task 1              SRT Task 2
    Uniform         membound “cacheline”    membound “cacheline”
    Nonuniform      membound “cacheline”    membound “thrash”

During the experiment, the CPU frequency was switched between 1.2 GHz and 2.3 GHz every half second. The CPU usage of each job was measured in terms of VIC. In Figure 44, the CPU usage is plotted against release times for SRT Task 1 from both of the workload mixes. With a uniform instruction-retirement rate across the two tasks, there is little variability in VIC-based CPU usage despite changes in the CPU frequency. In contrast, with nonuniform instruction-retirement rates, there is significant change in CPU usage.

Unanticipated changes in CPU usage result in spikes in the VFT error. The VFT error for SRT Task 1 from each of the two workload mixes is shown in Figure 45. The VFT error from the nonuniform workload mix shows significant spikes corresponding to changes in the CPU frequency. In contrast, the VFT error from the uniform workload mix remains close to zero most of the time.

One way to address workload mixes with a wide range of instruction-retirement rates is to maintain a separate set of performance-model coefficients for each SRT task. However, this approach increases the number of variables in the optimization problem by a factor equal to the number of tasks in the system. Another approach is to cluster together tasks with similar retirement rates and maintain a separate set of performance-model coefficients for each cluster. However, this approach would involve significant overheads as well.

Figure 44: Per-job CPU usage vs. job-release time for SRT Task 1 from the two workload mixes described in Table 9. (a) Uniform. (b) Nonuniform.

Figure 45: VFT error vs. job-release time for SRT Task 1 from the two workload mixes described in Table 9. (a) Uniform. (b) Nonuniform.

CHAPTER VII

SUMMARY

In this dissertation, a novel approach to power management is presented for soft-real-time applications: the Linear-Adaptive-Models-based System (LAMbS). This approach is based on dynamically adapting to the computational requirements of hosted applications and to the power and performance characteristics of the underlying hardware.

LAMbS is designed for periodic real-time applications that consist of a timed sequence of jobs. The release times of jobs are known, but job-execution times are not known. Power management of such applications entails predicting the computational load of future jobs and configuring the underlying hardware such that power consumption is reduced and the predicted load is accommodated. The focus of this dissertation is on predicting computational load.

Asynchronous Budget Allocation

In LAMbS, SRT tasks are scheduled with reservation-based schedulers. For such schedulers, time is divided into fixed intervals called reservation periods. Each task is allocated a budget to run within the reservation period. The allocated budget is dynamically adapted based on predicted computational requirements of current and future jobs.

One of the contributions of this dissertation is asynchronous adaptation of CPU budgets. Specifically, an algorithm is presented for adapting budget allocations at reservation-period boundaries. Previous algorithms for adaptive budget allocation adapt either on individual job completions or on multiple job completions. The advantage of adapting at reservation-period boundaries is that it is possible to respond much faster to significant increases in CPU usage. Simulation results are presented to demonstrate this advantage.

A proof is presented that asynchronous budget allocation maintains a stable job queue. A software architecture is presented for a Linux-based implementation of asynchronous budget allocation. Finally, experimental results are presented to demonstrate that this algorithm allows for rapid recovery from overload conditions. Also, results are presented to show that this algorithm performs well with video-decoding workloads and performs better than static budget allocation.

Two-Stage Prediction

Another contribution of this dissertation is the two-stage prediction (TSP) algorithm. In this algorithm, the problem of workload prediction is separated into two stages. An initial application-specific stage predicts the CPU usage of future jobs and estimates the mean and variance of the corresponding prediction error. The second stage uses the output of the first stage and additional feedback from kernel-level components to compute the budget allocation. The first stage is synchronous; the output is updated on job completion. The second stage is asynchronous and updates the budget allocation at reservation-period boundaries. Furthermore, the second-stage prediction algorithm is based on the Chebyshev inequality and is designed to bound the probability of a deadline miss.

A generic first-stage prediction algorithm, the LMS-MA Hybrid predictor, is presented. The predictor consists of an array of normalized variable-step-size LMS filters of different lengths and an array of moving averages of different lengths. Measurements of execution times from a wide range of multimedia workloads show that job-execution times are correlated for some workloads and uncorrelated for others. LMS filters are good predictors when execution times are correlated. Moving averages perform better than LMS filters when execution times are uncorrelated. The LMS-MA Hybrid predictor was designed to perform well in both cases.

Details are presented for an implementation of the two-stage predictor in a Linux-based system. Results are presented from experiments with multimedia workloads. Generally, the performance of the two-stage predictor depends on the accuracy of the first-stage predictor. Also, the LMS-MA Hybrid algorithm generally performs better with workloads where execution times are correlated.

Virtual Instruction Count

The third major contribution of this dissertation is virtual instruction count (VIC). VIC is an abstract measure of computation based on approximating retired-instruction count (RIC). The number of retired instructions required to complete a job is invariant under change in the CPU frequency. However, the amount of time required to retire a specific number of instructions is hard to predict, which makes RIC unreliable as a measure of computation for budget allocation. In contrast, VIC increases over time at a known rate. Since VIC approximates RIC, VIC is invariant under change in CPU frequency to a degree that depends on the accuracy of the approximation. Therefore, VIC is a good measure of computation for budget allocation.

Implementation details are presented for adaptive budget allocation based on VIC. Experimental results are presented with synthetic and video-decoding workloads. The performance of VIC-based budget allocation is compared to that of time-based budget allocation in terms of average bandwidth allocation, RMS VFT error, and deadline-miss rate. VIC-based budget allocation performs similarly to time-based budget allocation when there are no changes in CPU frequency. VIC-based budget allocation performs better when there are frequent changes in CPU frequency.

The performance of VIC-based budget allocation in the presence of change in CPU frequency depends on the invariance property. The invariance property of VIC depends on how well VIC approximates RIC. VIC approximates RIC using the average instruction-retirement rate. When the hosted set of tasks has a large range of retirement rates, the actual retirement rate may deviate significantly from the average. In such cases, VIC does not approximate RIC well and the invariance property is compromised. Experimental results are presented to demonstrate this effect. Possible solutions are proposed to address this problem in future work.

Power Management

Lastly, the power-management aspect of LAMbS is discussed for future work. In previous work, power-management decisions involve choosing the mode of operation to which the CPU should be configured at various stages of progress in the workload. In LAMbS, the power-management decision involves computing the amount of time to spend in each mode of operation over periodic intervals. This decision is in the form of a vector called the operation-mode schedule.

A model is presented for power dissipation as a function of the operation-mode schedule. An algorithm is presented for correcting the coefficients of the power model based on run-time measurements of the system's power usage. It is shown that computing the optimum operation-mode schedule that minimizes the predicted power dissipation is a linear-programming problem.
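A schematic form of this linear program, in notation assumed here rather than restated from Chapter 6 ($\hat{p}_m$ the estimated power in mode $m$, $\hat{r}_m$ the estimated instruction-retirement rate, $T$ the reservation period, and $W$ the predicted work in virtual instructions), is

    \min_{\tau_1,\ldots,\tau_M} \; \sum_{m=1}^{M} \hat{p}_m \tau_m
    \quad \text{s.t.} \quad
    \sum_{m=1}^{M} \tau_m = T, \qquad
    \sum_{m=1}^{M} \hat{r}_m \tau_m \ge W, \qquad
    \tau_m \ge 0.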

The problem is modified to allow for power-management decisions with idle states. The energy model is modified to account for transition overheads that are associated with idle states. Additional constraints are imposed on the operation-mode schedule to account for minimum transition latency. The power-management problem is reframed as an MILP problem. A Linux-based implementation of this power-management algorithm is discussed for future work.

Conclusion

LAMbS is a novel approach to power management. Due to its generic hardware model based on discrete modes of operation, LAMbS is applicable to a range of power-management mechanisms that go beyond frequency scaling. The adaptive components of LAMbS, such as budget allocation, the power model, and the performance model, allow LAMbS to be used with no a priori knowledge and little or no tuning. Although further work is needed for a complete implementation, LAMbS holds promise as a practical and portable system for real-time power management.

REFERENCES

[1] “GLPK (GNU Linear Programming Kit).” http://www.gnu.org/software/glpk/.

[2] Snapdragon S4 Processors: System on Chip Solutions for a New Mobile Age White Paper. Qualcomm, October 2011.

[3] Variable SMP - A Multi-Core CPU Architecture for Low Power and High Performance (White Paper). NVIDIA, 2011.

[4] Abeni, L. and Buttazzo, G., “Integrating multimedia applications in hard real-time systems,” in Proceedings of the IEEE Real-Time Systems Symposium, RTSS ’98, (Washington, DC, USA), pp. 4–, IEEE Computer Society, 1998.

[5] Abeni, L., Cucinotta, T., Lipari, G., Marzario, L., and Palopoli, L., “Qos management through adaptive reservations,” Real-Time Systems, vol. 29, pp. 131–155, 2005.

[6] Abeni, L., Palopoli, L., Lipari, G., and Walpole, J., “Analysis of a reservation-based feedback scheduler,” in Real-Time Systems Symposium, 2002. RTSS 2002. 23rd IEEE, pp. 71–80, 2002.

[7] Ahmed, S. and Ferri, B., “Prediction-based asynchronous cpu-budget allocation for soft-real-time applications,” IEEE Transactions on Computers, vol. 99, no. PrePrints, p. 1, 2013.

[8] Ahmed, S. and Ferri, B., “Prediction based bandwidth reservation,” in Decision and Control (CDC), 2010 49th IEEE Conference on, pp. 5302–5307, Dec. 2010.

[9] Ahmed, S. and Ferri, B., “Two-stage prediction for cpu time allocation in soft real-time systems,” in Feedback Computing, 2012 7th International Workshop on, Sept. 2012.

[10] AMD, AMD64 Architecture Programmers Manual Volume 2: System Programming, revision 3.22 ed., Sept. 2012.

[11] Amirijoo, M., Hansson, J., Gunnarsson, S., and Son, S., “Enhancing feedback control scheduling performance by on-line quantification and suppression of measurement disturbance,” in Real Time and Embedded Technology and Applications Symposium, 2005. RTAS 2005. 11th IEEE, pp. 2–11, Mar. 2005.

[12] Amur, H., Schwan, K., and Prvulovic, M., “Towards optimal power management: Estimation of performance degradation due to dvfs on modern processors,” CERCS technical report, Georgia Institute of Technology, 2010.

[13] Asberg, M., Nolte, T., Otero Perez, C., and Kato, S., “Execution time monitoring in linux,” in Emerging Technologies Factory Automation, 2009. ETFA 2009. IEEE Conference on, pp. 1–4, Sept. 2009.

[14] Aydin, H., Melhem, R., Mosse, D., and Mejia-Alvarez, P., “Power-aware scheduling for periodic real-time tasks,” Computers, IEEE Transactions on, vol. 53, pp. 584–600, May 2004.

[15] Aydin, H., Devadas, V., and Zhu, D., “System-level energy management for periodic real-time tasks,” in Real-Time Systems Symposium, 2006. RTSS ’06. 27th IEEE International, pp. 313–322, Dec. 2006.

[16] Barroso, L. A. and Hölzle, U., “The case for energy-proportional computing,” Computer, vol. 40, pp. 33–37, Dec. 2007.

[17] Bavier, A. C., Montz, A. B., and Peterson, L. L., “Predicting mpeg execution times,” SIGMETRICS Perform. Eval. Rev., vol. 26, pp. 131–140, June 1998.

[18] Bellard, F., Niedermayer, M., and others, “FFmpeg,” Internet: http://www.ffmpeg.org [Dec. 27, 2012], 2007.

[19] Benini, L., Bogliolo, A., and De Micheli, G., “A survey of design techniques for system-level dynamic power management,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 8, pp. 299–316, June 2000.

[20] Bilcu, R. C., Kuosmanen, P., and Egiazarian, K., “A new variable length lms algorithm: theoretical analysis and implementations,” in Electronics, Circuits and Systems, 2002. 9th International Conference on, vol. 3, pp. 1031–1034, 2002.

[21] Bini, E., Buttazzo, G., and Lipari, G., “Minimizing cpu energy in real-time systems with discrete speed management,” ACM Trans. Embed. Comput. Syst., vol. 8, pp. 31:1–31:23, July 2009.

[22] Chandrakasan, A., Sheng, S., and Brodersen, R., “Low-power cmos digital design,” Solid-State Circuits, IEEE Journal of, vol. 27, pp. 473–484, Apr. 1992.

[23] Chen, J.-J., “Expected energy consumption minimization in dvs systems with discrete frequencies,” in Proceedings of the 2008 ACM symposium on Applied computing, SAC ’08, (New York, NY, USA), pp. 1720–1725, ACM, 2008.

[24] Chen, J.-J. and Thiele, L., “Expected system energy consumption minimization in leakage-aware dvs systems,” in Low Power Electronics and Design (ISLPED), 2008 ACM/IEEE International Symposium on, pp. 315–320, Aug. 2008.

[25] Corbet, J., “The cpuidle subsystem.” LWN.net article, April 2010.

[26] Cucinotta, T., Palopoli, L., Abeni, L., Faggioli, D., and Lipari, G., “On the integration of application level and resource level qos control for real-time applications,” Industrial Informatics, IEEE Transactions on, vol. 6, pp. 479–491, Nov. 2010.

[27] Cucinotta, T., Palopoli, L., and Marzario, L., “Stochastic feedback-based control of qos in soft real-time systems,” in Decision and Control, 2004. CDC. 43rd IEEE Conference on, vol. 4, pp. 3533–3538, Dec. 2004.

[28] Cucinotta, T., Palopoli, L., Marzario, L., Lipari, G., and Abeni, L., “Adaptive reservations in a linux environment,” in Real-Time and Embedded Technology and Applications Symposium, 2004. Proceedings. RTAS 2004. 10th IEEE, pp. 238–245, May 2004.

[29] Dabiri, F., Vahdatpour, A., Potkonjak, M., and Sarrafzadeh, M., “Energy minimization for real-time systems with non-convex and discrete operation modes,” in Proceedings of the Conference on Design, Automation and Test in Europe, DATE ’09, (3001 Leuven, Belgium, Belgium), pp. 1416–1421, European Design and Automation Association, 2009.

[30] David, H., Fallin, C., Gorbatov, E., Hanebutte, U. R., and Mutlu, O., “Memory power management via dynamic voltage/frequency scaling,” in Proceedings of the 8th ACM international conference on Autonomic computing, ICAC ’11, (New York, NY, USA), pp. 31–40, ACM, 2011.

[31] Devadas, V. and Aydin, H., “On the interplay of voltage/frequency scaling and device power management for frame-based real-time embedded applications,” Computers, IEEE Transactions on, vol. 61, pp. 31–44, Jan. 2012.

[32] Faggioli, D. and Checconi, F., “An edf scheduling class for the linux kernel,” in Proceedings of the Real-Time Linux Workshop, September 2009.

[33] Fan, X., Ellis, C., and Lebeck, A., “Memory controller policies for dram power management,” in Proceedings of the 2001 international symposium on Low power electronics and design, ISLPED ’01, (New York, NY, USA), pp. 129–134, ACM, 2001.

[34] Freescale, e500mc Core Reference Manual, rev. 1 ed., Mar. 2012.

[35] Goel, A., Abeni, L., Krasic, C., Snow, J., and Walpole, J., “Supporting time-sensitive applications on a commodity os,” SIGOPS Oper. Syst. Rev., vol. 36, pp. 165–180, Dec. 2002.

[36] Greenhalgh, P., Big.LITTLE Processing with ARM Cortex-A15 & Cortex-A7 (White Paper). ARM, October 2011.

[37] Gruian, F., “Hard real-time scheduling for low-energy using stochastic data and dvs processors,” in Proceedings of the 2001 international symposium on Low power electronics and design, ISLPED ’01, (New York, NY, USA), pp. 46–51, ACM, 2001.

[38] Harris, R., Chabries, D., and Bishop, F., “A variable step (vs) adaptive filter algorithm,” Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 34, no. 2, pp. 309–316, 1986.

[39] Hayes, M. H., Statistical Digital Signal Processing and Modeling. John Wiley & Sons, Ltd., 2004.

[40] Hopper, J., “Reduce linux power consumption, part 1: The cpufreq subsystem,” tech. rep., IBM, 2009. [Online]. Available: http://www.ibm.com/developerworks/linux/library/l-cpufreq-1/index.html.

[41] Intel, Intel 64 and IA-32 Architectures Software Developers Manual Volume 3: System Programming Guide, May 2012.

[42] Isci, C., Contreras, G., and Martonosi, M., “Live, runtime phase monitoring and prediction on real systems with application to dynamic power management,” in Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 39, (Washington, DC, USA), pp. 359–370, IEEE Computer Society, 2006.

[43] Jejurikar, R., Pereira, C., and Gupta, R. K., “Leakage aware dynamic voltage scaling for real time embedded systems,” tech. rep., University of California at Irvine, Irvine, CA, USA, Nov. 2003.

[44] Khalilzad, N. M., Behnam, M., Nolte, T., and Åsberg, M., “On adaptive hierarchical scheduling of real-time systems using a feedback controller,” in Proceedings of the 3rd Workshop on Adaptive and Reconfigurable Embedded Systems, APRES’11, Apr. 2011.

[45] Kim, N., Austin, T., Baauw, D., Mudge, T., Flautner, K., Hu, J., Irwin, M., Kandemir, M., and Narayanan, V., “Leakage current: Moore’s law meets static power,” Computer, vol. 36, pp. 68–75, Dec. 2003.

[46] Knuth, D. E., The Art of Computer Programming, Seminumerical Algorithms, vol. 2. Pearson Education, 1998.

[47] Lawitzky, M., Snowdon, D., and Petters, S., “Integrating real time and power management in a real system,” in Fourth International Workshop on Operating Systems Platforms for Embedded Real-Time Applications, pp. 35–44, 2008.

[48] Lindberg, M., “A survey of reservation-based scheduling,” Tech. Rep. ISRN LUTFD2/TFRT--7618--SE, Department of Automatic Control, Lund University, Sweden, Oct. 2007.

[49] Lipari, G. and Baruah, S., “Greedy reclamation of unused bandwidth in constant-bandwidth servers,” in Proceedings of the 12th Euromicro conference on Real-time systems, Euromicro-RTS’00, (Washington, DC, USA), pp. 193–200, IEEE Computer Society, 2000.

[50] Liu, C. L. and Layland, J. W., “Scheduling algorithms for multiprogramming in a hard-real-time environment,” J. ACM, vol. 20, pp. 46–61, Jan. 1973.

[51] McCord, J. R. and Moroney, R. M., Introduction to Probability Theory. The Macmillan Company, New York, 1965.

[52] McKenney, P., “A realtime preemption overview,” August 2005.

[53] Mercer, C. W., Savage, S., and Tokuda, H., “Processor capacity reserves for multimedia operating systems,” in Proceedings of the IEEE International Conference on Multimedia Computing and Systems, 1993.

[54] Moore, A., Hybrid-SMP (White Paper). MARVELL, June 2012.

[55] Nascimento, V., “Improving the initial convergence of adaptive filters: variable-length lms algorithms,” in Digital Signal Processing, 2002. DSP 2002. 2002 14th International Conference on, vol. 2, pp. 667–670 vol.2, 2002.

[56] Ngo, H. Q., “Tail and concentration inequalities.” Lecture Notes on Probabilistic Analysis and Randomized Algorithms, SUNY at Buffalo, February 2011.

[57] Nguyen, H., Rivas, R., and Nahrstedt, K., “idsrt: Integrated dynamic soft real-time architecture for critical infrastructure data delivery over wlan,” Mobile Networks and Applications, vol. 16, pp. 96–108, 2011.

[58] Pallipadi, V., Li, S., and Belay, A., “cpuidle: Do nothing, efficiently,” in Proceedings of the Linux Symposium, vol. 2, 2007.

[59] Palopoli, L., Cucinotta, T., Marzario, L., and Lipari, G., “Aquosa adaptive quality of service architecture,” Software: Practice and Experience, vol. 39, no. 1, pp. 1–31, 2009.

[60] Pillai, P. and Shin, K. G., “Real-time dynamic voltage scaling for low-power embedded operating systems,” SIGOPS Oper. Syst. Rev., vol. 35, pp. 89–102, Oct. 2001.

[61] Qian, D., Zhang, Z., and Hu, C., “Energy-aware task scheduling for real-time systems with discrete frequencies,” IEICE Transactions on Information and Systems, vol. E94, Apr. 2011.

[62] Riera-Palou, F., Noras, J., and Cruickshank, D. G. M., “Linear equalisers, with dynamic and automatic length selection,” Electronics Letters, vol. 37, no. 25, pp. 1553–1554, 2001.

[63] Roitzsch, M. and Pohlack, M., “Principles for the prediction of video decoding times applied to mpeg-1/2 and mpeg-4 part 2 video,” in Real-Time Systems Symposium, 2006. RTSS ’06. 27th IEEE International, pp. 271–280, 2006.

[64] Snowdon, D., Ruocco, S., and Heiser, G., “Power management and dynamic voltage scaling: Myths and facts,” in 2005 Workshop on Power Aware Real-time Computing, 2005.

[65] Snowdon, D. C., Linden, G. V. D., Petters, S. M., and Heiser, G., “Accurate run-time prediction of performance degradation under frequency scaling,” in Proceedings of the 2007 Workshop on Operating System Platforms for Embedded Real-Time Applications, 2007.

[66] Snowdon, D. C., Petters, S. M., and Heiser, G., “Accurate on-line prediction of processor and memory energy usage under voltage scaling,” in Proceedings of the 7th ACM & IEEE international conference on Embedded software, EMSOFT ’07, (New York, NY, USA), pp. 84–93, ACM, 2007.

[67] Song, I., Kim, S., and Karray, F., “A real-time scheduler design for a class of embedded systems,” Mechatronics, IEEE/ASME Transactions on, vol. 13, pp. 36–45, Feb. 2008.

[68] Torres, G., “Everything you need to know about the cpu c-states power saving modes.” Online Article, September 2008.

[69] Vardhan, V., Yuan, W., Harris, A., Adve, S., Kravets, R., Nahrstedt, K., Sachs, D., and Jones, D., “Grace-2: integrating fine-grained application adaptation with global adaptation for saving energy,” International Journal of Embedded Systems, vol. 4, no. 2, pp. 152–169, 2009.

[70] Xu, R., Mossé, D., and Melhem, R., “Minimizing expected energy consumption in real-time systems through dynamic voltage scaling,” ACM Trans. Comput. Syst., vol. 25, Dec. 2007.

[71] Yuan, W., Nahrstedt, K., Adve, S., Jones, D., and Kravets, R., “Grace-1: cross-layer adaptation for multimedia quality and battery energy,” Mobile Computing, IEEE Transactions on, vol. 5, pp. 799–815, July 2006.

[72] Yuan, W. and Nahrstedt, K., “Energy-efficient soft real-time cpu scheduling for mobile multimedia systems,” SIGOPS Oper. Syst. Rev., vol. 37, pp. 149–163, Oct. 2003.

[73] Yuan, W. and Nahrstedt, K., “Practical voltage scaling for mobile multimedia devices,” in Proceedings of the 12th annual ACM international conference on Multimedia, MULTIMEDIA ’04, (New York, NY, USA), pp. 924–931, ACM, 2004.

[74] Zamani, R. and Afsahi, A., “Adaptive estimation and prediction of power and performance in high performance computing,” Computer Science - Research and Development, vol. 25, pp. 177–186, 2010.

[75] Zhu, Y. and Mueller, F., “Feedback edf scheduling of real-time tasks exploiting dynamic voltage scaling,” Real-Time Systems, vol. 31, pp. 33–63, 2005, doi: 10.1007/s11241-005-2744-3.