
Mälardalen University Doctoral Dissertation 315

Preemption-Delay Aware Schedulability Analysis of Real-Time Systems

Filip Marković

Filip Marković
PREEMPTION-DELAY AWARE SCHEDULABILITY ANALYSIS OF REAL-TIME SYSTEMS
2020

ISBN 978-91-7485-467-1
ISSN 1651-4238

Address: P.O. Box 883, SE-721 23 Västerås, Sweden
Address: P.O. Box 325, SE-631 05 Eskilstuna, Sweden
E-mail: [email protected]
Web: www.mdh.se

                   

                           


Abstract

Schedulability analysis of real-time systems under preemptive scheduling may often lead to false-negative results, deeming a schedulable taskset unschedulable. This is the case due to the inherent over-approximation of many time-related parameters such as task execution time, system delays, etc., but also, in the context of preemptive scheduling, a significant over-approximation arises from accounting for task preemptions and the corresponding preemption-related delays. To reduce false-negative schedulability results, it is highly important to approximate preemption-related delays as accurately as possible. It is also important to obtain safe approximations, which means that no corresponding delay higher than the approximated one can occur at runtime, since such a case may lead to false-positive schedulability results that can critically impact the analysed system. Therefore, the overall goal of this thesis is:

To improve the accuracy of schedulability analyses to identify schedulable tasksets in real-time systems under fixed-priority preemptive scheduling, by accounting for tight and safe approximations of preemption-related delays.

We contribute to the domain of timing analysis for single-core real-time systems under preemptive scheduling by proposing two novel cache-aware schedulability analyses: one for fully-preemptive tasks, and one for tasks with fixed preemption points. Also, we propose a novel method for deriving safe and tight upper bounds on the cache-related preemption delay of tasks with fixed preemption points. Finally, we contribute to the domain of multi-core partitioned hard real-time systems by proposing a novel partitioning criterion for worst-fit decreasing partitioning, and by investigating the effectiveness of different partitioning strategies to provide a task allocation which does not jeopardize the schedulability of a taskset in the context of preemptive scheduling.


Written with the immense support and understanding from my supervisors, my parents, my wife, and my daughter.

Acknowledgments

To acknowledge all the entities that were relevant to the creation of this thesis is an almost impossible quest. Nonetheless, with the causality of the universe in mind, I am humbly aware and grateful for this opportunity to

"play gracefully with ideas". "play gracefully with ideas".

In the remainder, I will name a few persons and institutions, whose involvement directly affected the mind flow which traversed over the last five years and settled in this very thesis.

I owe my utmost debt of gratitude to my professors Jan Carlson and Radu Dobrin. Jan Carlson is my main PhD supervisor and the most important force in the transformation of my PhD progress into a growth function. His ability to intertwine quick-witted humor with discussions on mathematical formalism made this research even more interesting and playful. He is also the most patient person I have met in my life and an amazingly modest, kind, and supportive mentor. Professor Radu is the person who offered me to become a PhD student, after one football game, because that is when you offer them. Without his help, I would probably still be lost in the administrative maze, trying to find the necessary document for even starting the PhD.

In addition, there were multiple instances of very important decisions when both of them stepped up and provided amazing understanding and support. Among many such occasions, I must single out their support when my daughter was born, and all of their efforts to allow me to enjoy that period without any problems and concerns. For that, I will always be grateful, and I cannot thank you enough!



I also owe a huge debt of gratitude to professor Björn Lisper, my third PhD supervisor, who opened for me the most important doors of my research, Cache-Related Preemption Delay analysis, one day when he stopped by my office with a few papers and a book from his collection.

My greatest gratitude goes to Mälardalen University, which enabled me to step into the grounds of scientific research, and also to all people involved in the Euroweb+ project, which funded my Master and PhD studies for the first three years. A few persons from the administration with whom I communicated directly, and therefore am highly aware of their help, are Carola Ryttersson, Susanne Fronnå, and Jenny Hägglund. Thank you all for giving me the thread for escaping the administrative maze.

I also want to thank professor Sebastian Altmeyer, who was the faculty reviewer for my Licentiate thesis, and someone with whom I discussed research ideas at many conferences. I am very happy that our discussions were manifested in a joint publication. His comments on my Licentiate thesis greatly improved this very thesis. My work was also scrutinized by many other reviewers over the years, and I want to thank them all, especially professor Javier Gutiérrez and professor Christian Berger, who were in the grading committee for my Licentiate thesis. Also, I owe a great debt of gratitude to professor Mikael Sjödin, who reviewed the PhD proposal and the initial draft of this thesis, and whose valuable comments improved its contents and understanding.

I am very grateful to professor Javier Gutiérrez, professor Giuseppe Lipari, professor Paul Pop, and professor Alessandro Papadopoulos for accepting the invitation to be in the grading committee for this thesis. Lastly, I owe a huge debt of gratitude to professor Enrico Bini, who accepted the invitation to be the main faculty reviewer for this thesis.

Over the last five years, there were many occasions when my colleagues and professors were the ones who provided valuable insights, help and support. For this reason, I owe my greatest debt of gratitude to Irfan Šljivo, Matthias Becker, Branko Miloradović, Mirgita Frasheri, Saad Mubeen, Omar Jaradat, Sebastian Hahn, Robbert Jongeling, Davor Čirkinagić, Jean Malm, Leo Hatvani, Nandinbaatar Tsog, Jakob Danielsson, Ignacio Sañudo, Juan Maria Rivas, Antonio Paolillo, Adnan and Aida Čaušević, Mohammad Ashjaei, Sarah Afshar, Afshin Ameri, Meng Liu, Elena Lisova, Svetlana Girs, Zdravko Krpić, Vaclav Struhar, Nitin Desai, Gabriel Campeanu, Julieth Castellanos, Per Hellström, Patrick Denzler, Damir Bilić, Nikola Petrović, Marjan Sirjani, Lorenzo Addazi, Husni Khanfar, Sasikumar Punnekkat, Jan Gustafsson, Peter Puschner, and many more, humbly asking for your forgiveness for not naming you all.

Contents

List of Figures xi

List of Tables xv

List of Algorithms xvii

List of Source Codes xix

Abbreviations xxi

List of Core Symbols xxiii

1 Introduction 1

2 Background 5
2.1 Real-time systems ...... 5
2.1.1 Real-time tasks ...... 6
2.1.2 Classification of real-time systems and scheduling ...... 9
2.2 Classification according to interruption policy ...... 9
2.2.1 Preemption-related delays ...... 11
2.3 Preemptive vs Non-preemptive Scheduling ...... 12
2.4 Limited-Preemptive Scheduling ...... 13
2.4.1 Tasks with fixed preemption points (LPS-FPP) ...... 14
2.4.2 Implementation example of LPS-FPP ...... 15
2.5 Embedded systems – Architecture Overview ...... 18
2.5.1 Classification according to CPU type ...... 18
2.5.2 Cache Memory ...... 18
2.6 Cache-related preemption delay (CRPD) ...... 26


2.7 Analysis of temporal correctness of real-time systems ...... 27
2.7.1 Feasibility and Schedulability Analysis ...... 27
2.7.2 Timing and cache analysis ...... 28
2.7.3 Cache-related preemption delay analysis ...... 28
2.8 Summary ...... 29

3 Research Description 31
3.1 Research process ...... 31
3.2 Research Goals ...... 33
3.2.1 Research Goal 1 ...... 33
3.2.2 Research Goal 2 ...... 33
3.2.3 Research Goal 3 ...... 34
3.3 Thesis Contributions ...... 34
3.3.1 Research contribution C1 ...... 35
3.3.2 Research contribution C2 ...... 36
3.3.3 Research contribution C3 ...... 37
3.3.4 Research contribution C4 ...... 38
3.3.5 Research contribution C5 ...... 39
3.4 Publications forming the thesis ...... 40
3.5 Other publications ...... 44
3.6 Summary ...... 44

4 Related work 45
4.1 Preemption-aware schedulability and timing analysis of single-core systems ...... 45
4.2 Schedulability and timing analysis of tasks with fixed preemption points ...... 46
4.3 Analysis of partitioned multi-core systems ...... 47
4.4 Relevant work in timing and cache analysis ...... 49

5 System model and notation 51
5.1 High-level task model ...... 51
5.2 Low-level task model ...... 52
5.2.1 Cache-block classification ...... 53
5.3 Task model after preemption point selection ...... 55
5.4 Fully-preemptive task ...... 57
5.5 Sequential task model (runnables) ...... 57
5.6 Relevant sets of tasks ...... 58
5.7 CRPD notation ...... 58


5.8 System assumptions and limitations ...... 59
5.9 Summary ...... 60

6 Preemption-delay analysis for tasks with fixed preemption points 61
6.1 Problem statement ...... 63
6.1.1 Infeasible preemptions ...... 63
6.1.2 Infeasible useful cache block reloads ...... 64
6.2 Computation of tight CRPD bounds ...... 66
6.2.1 Variables ...... 67
6.2.2 Constraints ...... 67
6.2.3 Goal function ...... 69
6.3 Evaluation ...... 72
6.4 Summary and Conclusions ...... 74

7 Preemption-delay aware schedulability analysis for tasks with fixed preemption points 77
7.1 Self-Pushing phenomenon ...... 79
7.2 Feasibility analysis for LPS-FPP ...... 80
7.3 Problem statement ...... 81
7.3.1 Motivating Examples ...... 81
7.4 Preemption-delay aware RTA for LPS-FPP ...... 83
7.4.1 Maximum lower priority blocking ...... 85
7.4.2 Computation of CRPD bounds ...... 86
7.4.3 Maximum time interval that affects preemption delays ...... 95
7.4.4 The latest start time of a job ...... 97
7.4.5 The latest finish time of a job ...... 99
7.4.6 Level-i active period ...... 99
7.4.7 The worst-case response time of a task ...... 100
7.4.8 Schedulability analysis ...... 101
7.5 Evaluation ...... 101
7.5.1 Evaluation setup ...... 101
7.5.2 Experiments ...... 104
7.6 Summary and Conclusions ...... 107

8 Preemption-delay aware schedulability analysis for fully-preemptive tasks 109
8.1 ECB- and UCB-based CRPD approaches ...... 111
8.2 Problem statement ...... 113
8.3 Improved response-time analysis ...... 115


8.3.1 Upper bounds on the number of preemptions ...... 118
8.3.2 Preemption partitioning ...... 118
8.3.3 CRPD bound on preemptions from a single partition ...... 119
8.3.4 CRPD bound on all preemptions within a time interval ...... 122
8.3.5 Worst-case response time ...... 122
8.3.6 Time complexity ...... 123
8.3.7 CRPD computation using preemption scenarios ...... 125
8.4 Evaluation ...... 134
8.5 Summary and conclusions ...... 138

9 Preemption-delay aware partitioning in multi-core systems 139
9.1 Problem statement and motivation ...... 141
9.2 System model and preemption point selection ...... 141
9.2.1 Preemption point selection ...... 142
9.2.2 Task model after preemption point selection ...... 144
9.3 Simplified feasibility analysis for LPS-FPP ...... 145
9.4 Partitioned scheduling under LPS-FPP ...... 146
9.4.1 Partitioning test ...... 146
9.4.2 Task partitioning ...... 149
9.5 Evaluation ...... 154
9.5.1 A comparison of LPS-FPP partitioning strategies ...... 154
9.5.2 A comparison between LP-FPP, NP, and FP scheduling ...... 161
9.5.3 Effect of preemption delay on partitioning strategies ...... 164
9.6 Summary and Conclusions ...... 166

10 Conclusions 167

11 Future work 169
11.1 Static code analysis for reload bounds ...... 169
11.2 Improved analysis and preemption point selection for LRU caches ...... 170
11.3 Task & cache & preemption partitioning ...... 171
11.4 Analysis of complex multi-core systems ...... 173

Bibliography 175

List of Figures

1.1 Approximation of delays in real-time systems ...... 2

2.1 Real-time computation example ...... 6
2.2 Task parameters ...... 8
2.3 Example of a task τi being preempted by a task τh with higher priority than τi ...... 10
2.4 Example of preemption-related delays ...... 11
2.5 Example of the fully-preemptive scheduling drawback ...... 12
2.6 Example of the non-preemptive scheduling drawback ...... 12
2.7 Example of the limited-preemptive scheduling benefit ...... 13
2.8 a) Preemption points before the declaration of non-preemptive regions, and b) after the declaration of non-preemptive regions ...... 15
2.9 Scheduling trace of the implemented job obtained with Feather-Trace ...... 17
2.10 Memory units and their simplified structures ...... 19
2.11 Memory requests (left) and initial state of the direct-mapped cache (right) ...... 24
2.12 The final classification of cache accesses (left) and the final cache state (right) ...... 24
2.13 Example of a cache-related preemption delay ...... 26

3.1 Research process steps ...... 32
3.2 Multicore partitioning of the tasks, using fixed preemption points ...... 38

5.1 Example of CFGi of a preemptive task τi ...... 52
5.2 Example of CFGi of a task τi after the declaration of non-preemptive regions ...... 56


5.3 Example of a simplified sequential task model with non-preemptive regions ...... 57

6.1 A preempted task τ2 with three preemption points (PP2,1, PP2,2 and PP2,3), and four non-preemptive regions with worst case CRPD at each point. Top of the figure: preempting task τ1 with C1 = 20 and T1 = 65 ...... 63

6.2 A preempted task τ2 with three preemption points with defined UCB sets (UCB2,1, UCB2,2, and UCB2,3), and four non-preemptive regions. The cache block accesses throughout the task execution are shown as circled integer values for each cache block ...... 64

6.3 The preempted task τi with a cache block m accessed before preemption point PPi,k and re-accessed at δi,l. Top: The preempting task τh that evicts m and preempts τi at all preemption points between PPi,k and δi,l ...... 65
6.4 CRPD estimation per taskset for different levels of cache utilization, calculated as the average over the 2000 generated tasksets ...... 73
6.5 CRPD estimation per taskset, for different taskset sizes, calculated as the average over the 2000 generated tasksets ...... 74

7.1 Example of the self-pushing phenomenon ...... 79
7.2 Example of different time intervals used in the analysis ...... 82
7.3 Running taskset example ...... 83

7.4 Top row: CRPD approximation for τ3, considering two preempting jobs from τ1, and one preempting job from τ2. Bottom row: CRPD approximation for τ2, considering one preempting job from τ1 ...... 88
7.5 Schedulability ratio at different taskset utilisation ...... 104
7.6 Weighted measure at different taskset size ...... 105
7.7 Weighted measure at different upper bound on number of non-preemptive regions ...... 105
7.8 Weighted measure at different upper bound on number of UCBs per preemption point ...... 106
7.9 Weighted measure at different reload-ability conditions ...... 106


8.1 Example of the pessimistic CRPD estimation in both UCB- and ECB-union based approaches. Notice that the worst-case execution time is in reality significantly larger than CRPD (black rectangles), but the focus of the figure is rather on preemptions and CRPD depiction ...... 113
8.2 Worst-case preemptions for τ3 during the time duration t = 46 ...... 116
8.3 Top: Algorithm walkthrough with an example from Figure 8.2. Bottom: Example for extending the combination Πc of four tasks ...... 132
8.4 Left: Schedulability ratio at different taskset utilisation. Right: Weighted measure at different taskset size ...... 136
8.5 Leftmost: The worst-case measured analysis time per taskset, at different taskset size. Center and rightmost: Venn diagrams [1] representing schedulability result relations between different methods, over 120000 analysed tasksets per each – TACLe Benchmark and Mälardalen Benchmark ...... 137

9.1 Correlation between the maximum blocking tolerance and maximum length of the non-preemptive region ...... 142
9.2 Example when all the potential preemption points are selected ...... 143
9.3 Example of the LP-FPPS benefit ...... 143
9.4 Top: τk before the preemption point selection. Bottom: τk after the preemption point selection where only the second preemption point is selected ...... 144
9.5 Comparison of the influence of different task ordering strategies on the preemption related delay introduced upon a task assignment ...... 153
9.6 Schedulability success ratio as the function of system utilisation (Umin = 0.1 and Umax = 0.3) ...... 157
9.7 Schedulability success ratio as the function of system utilisation (Umin = 0.1 and Umax = 0.5) ...... 157
9.8 Schedulability success ratio as the function of system utilisation (Umin = 0.1 and Umax = 1) ...... 157
9.9 Schedulability success ratio as the function of system utilisation (Umin = 0.25 and Umax = 0.75) ...... 157
9.10 Relative contribution to the combined approach (m = 4, Umin = 0.1, Umax = 0.3, Usys = 0.86) ...... 160
9.11 Relative contribution to the combined approach (m = 8, Umin = 0.1, Umax = 0.5, Usys = 0.9) ...... 160


9.12 Schedulability success ratio as the function of system utilisation (Umin = 0.1 and Umax = 0.3) ...... 162
9.13 Schedulability success ratio as the function of system utilisation (Umin = 0.1 and Umax = 0.5) ...... 162
9.14 Schedulability success ratio as the function of system utilisation (Umin = 0.1 and Umax = 1) ...... 162
9.15 Schedulability success ratio as the function of system utilisation (Umin = 0.25 and Umax = 0.75) ...... 162
9.16 Schedulability success ratio as a function of the maximum CRPD for the Fixed Preemption Point Scheduling partitioning strategies ...... 164
9.17 Schedulability success ratio as a function of the maximum CRPD for the non-preemptive, fully-preemptive and fixed preemption point partitioned scheduling ...... 165

11.1 Example of over-approximation of cache-block reloads in the conditional execution flow ...... 170
11.2 Cache-block access example for LRU caches ...... 171
11.3 Overview of the envisioned approach for CRPD minimisation and improving schedulability of a real-time system ...... 172

List of Tables

3.1 Mapping between the research goals and the contributions ...... 35
3.2 Mapping between the papers and the contributions ...... 40

6.1 List of important symbols used in this chapter (CRPD terms are upper bounds) ...... 62

7.1 List of important symbols used in this chapter (CRPD terms are upper bounds) ...... 78
7.2 Cache configurations obtained with the LLVMTA [2] analysis tool used on Mälardalen benchmark programs [3] ...... 102

8.1 List of important symbols used in this chapter (CRPD terms are upper bounds) ...... 110
8.2 Task characteristics obtained with the LLVMTA [2] analysis tool used on Mälardalen [3] and TACLe [4] benchmark tasks ...... 135

9.1 List of important symbols used in this chapter (CRPD terms are upper bounds) ...... 140


List of Algorithms

1 Algorithm for tightening the upper bounds on CRPD of tasks with fixed preemption points ...... 66

2 Algorithm for computing the cumulative CRPD during a time interval of length t ...... 124
3 Algorithm that generates a set Πi,Λ of preemption combinations ...... 131
4 Derivation of the Maximum Blocking Tolerance of a taskset ...... 147
5 FFD partitioning ...... 150
6 WFD partitioning ...... 151


List of Source Codes

2.1 Subroutine SR() whose WCET is approximately 20 ms long ...... 16
2.2 Fully-preemptive job whose execution consists of two subroutines ...... 16
2.3 Liblitmus function for declaring the start of the non-preemptive region ...... 16
2.4 Liblitmus function for declaring the end of the non-preemptive region ...... 17
2.5 Job with a single preemption point whose execution consists of two subroutines ...... 17


Abbreviations

BB Basic Block
BRT Block Reload Time
CFG Control Flow Graph
CPU Central Processing Unit
CRPD Cache-Related Preemption Delay
ECB Evicting Cache Block
FFD First Fit Decreasing
FIFO First In First Out
FPPS Fixed-Priority Preemptive Scheduling
FPS Fully-Preemptive Scheduling
LPS-DP Limited-Preemptive Scheduling of tasks with Deferred Preemptions
LPS-FPP Limited-Preemptive Scheduling of tasks with Fixed Preemption Points
LPS-PT Limited-Preemptive Scheduling of tasks with Preemption Thresholds
LPS Limited-Preemptive Scheduling
LRU Least Recently Used
MBT Maximum Blocking Tolerance
MILP Mixed Integer Linear Program
NPR Non-Preemptive Region
NPS Non-Preemptive Scheduling
PLRU Pseudo Least Recently Used
PP Preemption Point
RCB Reloadable Cache Block
RTA Response-Time Analysis


RTS Real-Time System
SOTA State Of The Art
SPP Selected Preemption Point
UCB Useful Cache Block
WCET Worst-Case Execution Time
WFD Worst Fit Decreasing

List of Core Symbols

Pi The i-th processor of a system

Γ Taskset
UΓ Taskset utilisation
τi Task with index i
Ti The minimum inter-arrival time of τi
Di Relative deadline of τi
Ci The worst-case execution time of τi without preemption delays
Ci^γ The worst-case execution time of τi with preemption delays
Pi Priority of τi
Ri The worst-case response time of τi
Ui Utilisation of τi (Ui = Ci/Ti)
Si Density of τi (Si = Ci/Di)
hp(i) Set of tasks with higher priority than Pi
lp(i) Set of tasks with lower priority than Pi
hpe(i) hp(i) ∪ {τi}
aff(i, h) Set of tasks that can affect Ri, and can be preempted by τh; aff(i, h) = hpe(i) ∩ lp(h)

τi,j The j-th job (task instance) of a task
ai,j Arrival time of the j-th job of τi
si,j Start time of the j-th job of τi
fi,j Finishing time of the j-th job of τi
ri,j Response time of the j-th job of τi


CFGi Control flow graph of τi
BBi,k Basic block of τi
PPi,k Preemption point of τi
δi,k Non-preemptive region of τi
qi,k The worst-case execution time of δi,k
SJi,k Subjob of τi
ci,k The worst-case execution time of SJi,k

UCBi,k Set of useful cache blocks at PPi,k
UCBi Set of useful cache blocks of τi
ucbi^max The maximum number of UCBs at a single preemption point of τi
ξi,k The worst-case preemption delay resulting from the complete preemption scenario at PPi,k
ECBi,k^BB Set of accessed (evicting) cache blocks during the execution of basic block BBi,k
ECBi,k Set of accessed (evicting) cache blocks during the execution of non-preemptive region δi,k
ECBi The set of evicting cache blocks of τi
m Cache block, i.e. cache line
L Memory block size, cache line size
AL Memory address length
CS Cache size
S Number of cache sets
K Number of cache lines in a cache set
BRT The maximum duration needed to reload a block
γ General symbol for upper bound on preemption (or cache-related preemption) delay

The above list represents the core symbols, while the other symbols defined in this thesis are listed at the beginning of each chapter where they are defined or used.
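As a brief sketch of how several of the core symbols above combine in practice, consider the classical fixed-priority response-time recurrence known from the literature (shown here only as background; the preemption-delay aware analyses in later chapters extend this kind of recurrence with CRPD terms):

\[
R_i^{(0)} = C_i, \qquad
R_i^{(k+1)} = C_i + \sum_{\tau_j \in hp(i)} \left\lceil \frac{R_i^{(k)}}{T_j} \right\rceil C_j ,
\]

iterated until R_i^{(k+1)} = R_i^{(k)} (or until the value exceeds D_i). The task τi is deemed schedulable if the resulting fixed point satisfies Ri ≤ Di; a preemption-delay aware variant additionally charges an upper bound γ for each preemption counted by the ceiling terms.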

Chapter 1

Introduction

In many computing systems, it is not only important that the computations provide correct results, i.e. that they comply with the intended goal of the computation, but it is also important that the computation is finished within a specified time interval. Such systems are called real-time systems and they play an integral part in many industrial domains, e.g. automotive [5, 6], aerospace [6, 7], space industry [6, 8, 9], etc., where the timing requirements are of critical importance.

Real-time systems control and arrange different tasks (processes or workloads), which conduct many of the system's functionalities. Tasks are often executed on complex software and hardware architectures which on average increase the execution performance but at the same time contribute to the increased complexity of tracing and analysing the system execution. Such complexity can be a significant problem when it needs to be proven that a real-time system satisfies its specified timing requirements, which is even more amplified when we consider that tasks may interrupt (preempt) each other. This is the case because each interruption causes a series of additional activities spread across many different system entities, e.g. central processing unit, cache memory, kernel, etc., and those activities may introduce additional delays in the task execution times. Such delays are called preemption delays and they must be thoroughly analysed and accounted for, in order to derive a safe conclusion on the achievability of the system's timing requirements. A safe conclusion means that any system for which some timing analysis claims compliance with the specified timing requirements does indeed comply with those requirements.



Another important point when the timing of real-time systems is analysed is to avoid the situation where the conclusion of the timing analysis is that the system does not comply with its requirements, while in reality it does. This can quite often be the case due to timing over-approximations that are necessary for the reasons of safe analysis, explained in the previous paragraph. Therefore, it is also important to approximate, as accurately as possible, the execution times and preemption delays that may occur within a real-time system, since those approximations may affect the final analysis outcome.

In the context of preemption delay analysis, the safety goal is to always approximate the preemption delay with a value that is larger than or equal to the one that may occur at run-time under any possible scenario, while the accuracy goal is to approximate as closely as possible the worst-case run-time delay. This is illustrated in Figure 1.1.

Figure 1.1. Approximation of delays in real-time systems. (The figure shows a time axis with the possible delays at runtime, the highest possible delay at runtime, and the safe delay approximations beyond it; the accuracy goal is to bring the safe approximations as close as possible to the highest possible runtime delay.)
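To make the two goals concrete, consider a purely illustrative example (the numbers are invented and not taken from any analysis in this thesis). Let γ* denote the highest preemption delay that can actually occur at runtime, and γ̂ an analysis-time approximation of it:

\[
\gamma^{*} = 100~\mu s: \qquad
\hat{\gamma} = 120~\mu s \;\ge\; \gamma^{*} \ \text{(safe, but pessimistic by } 20~\mu s\text{)}, \qquad
\hat{\gamma} = 90~\mu s \;<\; \gamma^{*} \ \text{(unsafe)}.
\]

An analysis built on the safe but pessimistic bound may reject a system that would in fact meet its timing requirements (a false negative), whereas the unsafe bound may accept a system that violates them at runtime (a false positive).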

The high-level goal of this thesis is to propose novel timing-related analyses that safely and as accurately as possible account for preemption delays, in order to correctly estimate whether a real-time system complies with its specified timing requirements.

Thesis outline

The remainder of the thesis is organised as follows.

 In Chapter 2, we describe the background knowledge necessary for understanding the thesis contributions; the reader who is already familiar with real-time and embedded systems may start directly from Chapter 3.

 In Chapter 3, we describe the applied research process, the thesis goals and its contributions.


 In Chapter 4, we describe the related work of the thesis.

 In Chapter 5, we describe the general system and task model used in the thesis, along with the system assumptions and limitations.

 In Chapters 6 and 7, we present preemption-delay aware analyses for tasks with fixed preemption points, meaning that the points where a task can be preempted are selected before runtime.

 In Chapter 8, we present a preemption-delay aware analysis for tasks which can be preempted at any point during their execution (fully-preemptive tasks).

 In Chapter 9, we present the preemption-delay aware partitioning for multi-core systems, where the goal is to allocate tasks to different processing units in order to have a system that complies with the given timing requirements.

 In Chapter 10, we present the thesis conclusions.

 In Chapter 11, we describe the future research plans.

Chapter 2

Background

2.1 Real-time systems

Computing systems play an important role in modern society. The ability of computing systems to increase the speed of manufacturing, decision-making, and development, or even to reduce the cost of those processes, has made them widely used in many industrial domains. Some of those domains require precision, predictability, and conformance to certain safety requirements, i.e. it is important that the computation is valid, computed on time, and that it cannot cause harm or an error beyond the threshold defined by the safety authorities. Some of the industrial domains that follow those principles are avionic systems, spacecraft systems, telecommunication systems, robotics, automotive systems, chemical and nuclear plant controls, etc.

To provide precision, predictability, and functionality in such industrial domains, computing systems are often designed as entities with a dedicated function within a mechanical or electrical system, called embedded systems. Such systems are often designed considering real-time constraints which must not be violated. This means that it is not always enough to produce the correct result of the computation; it is also important to deliver the result on time. Such a system is called a real-time system, and according to a definition by Stankovic et al. [10]: "A real-time system is a system that reacts upon outside events and performs a function based on these and gives a response within a certain time. Correctness of the function does not only depend on correctness of the result, but also on the time when these are produced."



Metaphors of real-time systems can be found in many sports, e.g., basketball, Formula One racing, etc. In basketball, it is not just important that a player scores while the ball is in the possession of their team, but also that the points are scored within the 24-second interval from the start of the possession, which is a time constraint given by the basketball rules.

Many industrial domains have similar timing constraints. For example, in a nuclear power plant, the system consists of many sensors that constantly measure information about the radiation levels, electricity generation, steam line pressure, etc. However, if some of the measured values, e.g. the amount of radiation (see Figure 2.1), exceed the predefined safety threshold, the system needs to alarm the personnel of the plant within a specified time interval so that adequate measures can be taken. If the system does not compute and inform about the leakage in the predefined time interval, then the consequences can be catastrophic, regardless of the precision of a computation which is delivered late.

Figure 2.1. Real-time computation example.

2.1.1 Real-time tasks

A real-time task is the main building unit of a real-time system. It represents a program that performs certain computations. The term task is often considered a synonym to thread. However, with task we describe a program at the design and the analysis level of real-time systems, while with thread we describe the implementation of a task in an operating system.


Also, the term task refers to the general design description (system model description) of a program and is denoted with τi, while a specific instance of a task (called a task instance in the remainder of the thesis) is assumed to be executed in a system at a certain time point and is denoted with τi,j.

Task types

The most common task types used in real-time systems are:

• Periodic task: A task type whose consecutive task instances are activated with exact periods, i.e. the activations of any two consecutive instances are separated by the same time interval.

• Sporadic task: A task which handles events that arrive at arbitrary time points, but with a predefined maximum frequency, i.e. the activations of two consecutive instances are separated by at least the predefined minimum time interval.

• Aperiodic task: A task for which we know nothing about the time between the activations of its instances.
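As a small illustration (ours, not from the thesis; names and types are illustrative), the difference between the first two task types can be stated as a check over a sequence of activation times a[0..n-1] with parameter T:

#include <stddef.h>

typedef unsigned long time_unit;

/* Periodic: consecutive activations are separated by exactly T.
 * Assumes a[] is nondecreasing. */
int is_periodic(const time_unit *a, size_t n, time_unit T) {
    for (size_t j = 1; j < n; j++)
        if (a[j] - a[j - 1] != T) return 0;
    return 1;
}

/* Sporadic: consecutive activations are separated by at least T. */
int is_sporadic(const time_unit *a, size_t n, time_unit T) {
    for (size_t j = 1; j < n; j++)
        if (a[j] - a[j - 1] < T) return 0;
    return 1;
}

For an aperiodic task, no such check is possible, since nothing is assumed about the separation of its activations.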

Task parameters

Each task τi is described with the specified parameters, which can vary between different types of real-time systems. Depending on whether the parameters change during the run-time of the system, they can be static or dynamic. The general task parameters which we use in this thesis to describe a task are:

• Ti: minimum inter-arrival time or period – The minimum time interval between consecutive invocations of a sporadic task, or the exact time interval (period) between two consecutive invocations of a periodic task.

• Di: deadline – A timing constraint representing the latest time point, relative to the arrival time of the task, at which the task must complete its execution and deliver a result.

• Ci: worst-case execution time (WCET) – The longest possible time interval needed for the completion of the task execution from its start time, without any interruptions from the other tasks.

• Pi: priority – The priority of a task. A task with higher priority has the advantage of being executed prior to a task with a lower priority if both are ready for execution.


• Ri: worst-case response time – The upper bound on the time interval between the arrival time and the finishing time among all possible task instances.

Task instance parameters

Each task instance τi,j has the same parameters as the task, but is also described with the following additional parameters:

• ai,j: arrival time – The time when the task instance is ready to execute.

• si,j: start time – The time when the task instance enters the executing state, i.e. when the instance starts to run.

• fi,j: finishing time – The time when the task instance has completed its execution.

• ri,j: response time – The time interval between the arrival time and the finishing time of a task instance.
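To summarise these definitions, the following C sketch (ours, not part of the thesis model; all type and field names are illustrative) groups the task parameters and the task-instance parameters, and computes the response time ri,j = fi,j − ai,j of an instance:

typedef unsigned long time_unit; /* times in an abstract integer time unit */

typedef struct {
    time_unit T; /* minimum inter-arrival time or period */
    time_unit D; /* relative deadline */
    time_unit C; /* worst-case execution time (WCET) */
    int       P; /* fixed priority */
    time_unit R; /* worst-case response time (analysis result) */
} task_t;

typedef struct {
    const task_t *task; /* the task tau_i this instance tau_{i,j} belongs to */
    time_unit a;        /* arrival time */
    time_unit s;        /* start time */
    time_unit f;        /* finishing time */
} task_instance_t;

/* Response time: the interval between arrival and finishing. */
static inline time_unit response_time(const task_instance_t *inst) {
    return inst->f - inst->a;
}

A taskset is then schedulable if, for every task, the worst-case response time over all instances does not exceed its deadline, i.e. Ri ≤ Di.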

In Figure 2.2, we show an instance of a sporadic task, which means that the time interval between consecutive task instances is at least equal to Ti. The task execution is depicted with the grey rectangle on a timeline, whose length represents the WCET of τi. The arrival time and the period start are the same time instant. In the remainder of the thesis, we denote the arrival time of a task instance with an arrow pointed upwards and the absolute deadline with an arrow pointed downwards; if they fall on the same time instant, they are depicted with a double-pointed arrow.

Figure 2.2. Task parameters.


2.1.2 Classification of real-time systems and scheduling

Real-time systems can be classified according to many criteria. Depending on the factors outside the computer system, more precisely on the potential consequences of a deadline miss, we distinguish between:

• Hard real-time systems: Systems where a deadline miss may lead to catastrophic consequences, and therefore all the deadlines must be met.

• Soft real-time systems: Systems where an occasional deadline miss is acceptable.

Depending on the design and the implementation of the system, we differentiate:

• Event-triggered systems: Task activations depend on the occurrences of relevant events which are external to the system, e.g., sensor readings, etc.

• Time-triggered systems: Task activations are handled at predefined time points.

To fulfill the timing requirements, real-time systems are designed with a specified scheduling in mind, which is the method of controlling and arranging the tasks in the computation process. Depending on when the scheduling decision is made, real-time scheduling algorithms are classified as:

• Online Scheduling – scheduling decisions are made at runtime, using specified criteria (e.g., priorities).

• Offline Scheduling – scheduling decisions are made offline and the schedule is stored.

In this thesis, we consider online scheduling, more specifically fixed-priority scheduling, which means that the task priorities are assigned before the run-time. The alternative is to have priorities that change dynamically, e.g., based on the remaining time to the deadline.
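As a minimal illustration of a fixed-priority online scheduling decision (a sketch of ours; the names are illustrative and not taken from the thesis), the dispatcher below simply selects the highest-priority ready task:

#include <stddef.h>

typedef struct {
    int priority; /* fixed priority, assigned before run-time; larger = higher */
    int ready;    /* nonzero if the task currently has an active instance */
} sched_task_t;

/* Return the index of the highest-priority ready task, or -1 if none. */
int pick_next(const sched_task_t *tasks, size_t n) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].ready &&
            (best < 0 || tasks[i].priority > tasks[best].priority))
            best = (int)i;
    }
    return best;
}

Under a dynamic-priority policy, the priority field would instead be recomputed at runtime, e.g. from the remaining time to the deadline.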

2.2 Classification according to interruption policy

Scheduling algorithms can further be classified according to the interruption policy being used, which determines whether the executing task can be suspended or not. Before describing this classification, we first introduce a term that is very important for understanding it – preemption.


Preemption

Preemption is the act of temporarily interrupting a task execution with the intention of resuming it at some later time point. This interruption may be performed for various reasons, but in this thesis we consider only the interruptions due to the arrival of a higher priority task which takes over the processing unit of the system.

In Figure 2.3 we show two tasks: τi, and the higher priority task τh. In this example, τi starts to execute immediately upon its arrival. However, during its execution, τh arrives as well, and since it has a higher priority, it preempts τi, which resumes its execution after the complete execution of τh.

Figure 2.3. Example of a task τi being preempted by a task τh with higher priority than τi.

In this context, the most used and researched scheduling paradigms are:

• Non-preemptive Scheduling – tasks execute without preemption once the execution has started.

• Fully-preemptive Scheduling – tasks may be preempted by other tasks during their execution, and the preemption is performed almost instantly upon the arrival of a task with a higher priority.

• Limited-preemptive Scheduling – tasks may be preempted by other tasks, but preemptions may be postponed or even cancelled.

In this section, we further describe the preemptive scheduling (fully and limited variants) and the non-preemptive scheduling. We also explain some of the important differences between them, starting by describing what preemption-related delay is.


2.2.1 Preemption-related delays

When a preemption occurs in a real-time system, it can introduce a significant runtime overhead which is called a preemption-related delay. This is the case because, during a preemption, many processes and hardware components need to perform adequate procedures to achieve a valid act of preemption, and this takes time. Therefore, when we account for preemption in real-time systems, we account for the following delays, as described by Buttazzo [11]:

• cache-related delay – the time needed to reload all the memory cache blocks which are evicted after the preemption, when they are reused in the remaining execution time of a task.

• pipeline-related delay – the time needed to flush the pipeline of the processor when the task is interrupted, and the time needed to refill the pipeline upon its resumption.

• scheduling-related delay – the time needed by the scheduling algorithm to suspend the running task, insert it into the ready queue, switch the context, and dispatch the incoming task.

• bus-related delay – the time which accounts for the extra bus interference when the cache memory is accessed due to the additional cache misses caused by the preemption.

Figure 2.4. Example of preemption-related delays.

In Figure 2.4, we present the same tasks as in Figure 2.3, but now we account for the preemption delay (represented by black rectangles) due to the preemption of τi by τh. Preemption delays may lead to a deadline miss, as shown in the figure. Also, some of the preemption delays can occur before resuming the previously preempted task (e.g. the scheduling delay), while others occur throughout the remaining task execution (e.g. cache-related delays), which we explain in Section 2.6.


2.3 Preemptive vs Non-preemptive Scheduling

Preemptive and non-preemptive scheduling are widely used approaches in real-time systems. However, where one approach has drawbacks, the other provides advantages, and vice versa. We illustrate this statement with the following two examples.

In the first example, shown in Figure 2.5, we illustrate two tasks: τi and τh, where τh has a higher priority than τi. During one period of τi, τh is released two times, and since we use a preemptive scheduler, τh preempts τi twice and causes two preemption-related delays. These preemption-related delays are long enough to cause a deadline miss of τi.

In real-time systems where preemptions can lead to high or even frequent preemption-related delays, there is a greater probability that the schedulability of some task may be jeopardised by the introduced delays. In those cases it might be better to use non-preemptive scheduling, since the drawback of fully-preemptive scheduling is emphasised.

Figure 2.5. Example of the fully-preemptive scheduling drawback.

Figure 2.6. Example of the non-preemptive scheduling drawback.

In the second example, shown in Figure 2.6, we illustrate the same tasks as in the previous example, but now we use a non-preemptive scheduler. Here we show the drawback of non-preemptive scheduling, which is blocking from the lower priority tasks. In this example, τi arrives before the higher priority task τh. Therefore, τh waits for τi before it can start to execute, and this event is called blocking. Since the blocking from the lower priority task τi is long, τh misses its deadline.

To overcome the drawbacks of fully-preemptive scheduling (high preemption-related delays) and non-preemptive scheduling (high lower priority blocking), a new scheduling approach emerged, called Limited Preemptive Scheduling. This paradigm resolves the drawbacks of the two above-mentioned scheduling algorithms, and it is described in the following section.

2.4 Limited-Preemptive Scheduling

Instead of always enabling preemption (fully-preemptive scheduling) or never enabling preemption (non-preemptive scheduling), in some cases we may improve a taskset schedulability if we combine both approaches. Limited-Preemptive Scheduling (LPS) is based on the observation that, in order to improve the schedulability of a taskset, we can choose when to enable or disable a preemption.

E.g., given the tasks from Figures 2.5 and 2.6, LPS can guarantee that all the tasks meet their deadlines, as shown in Figure 2.7. In this example, the lower priority task τi starts to execute first, and during its execution, a higher priority task τh arrives in the ready queue. At this point, preemption is enabled, which introduces a preemption-related delay, and τi then continues its execution.

Figure 2.7. Example of the limited-preemptive scheduling benefit.


At the second arrival of τh, a preemption-related delay would jeopardise the schedulability of τi, but the remaining execution of τi would not produce blocking which would jeopardise the schedulability of τh. Therefore, the preemption is disabled at this point and both of the tasks are able to meet their deadlines.

Buttazzo et al. [12] have shown that LPS can significantly improve taskset schedulability compared to fully-preemptive and non-preemptive scheduling. Also, LPS can be seen as a superset of those approaches, since any taskset that is schedulable with fully-preemptive or non-preemptive scheduling will also be schedulable with LPS. However, some tasksets may be schedulable only with LPS.

Several approaches have been introduced in order to enable LPS, such as:

• Preemption Thresholds (LPS-PT) – Approach proposed by Wang and Saksena [13], where for each task τi, a priority threshold is assigned such that τi can be preempted only by those tasks which have a priority higher than the predefined threshold (illustrated in the sketch after this list).

• Deferred Preemptions (LPS-DP) – Approach proposed by Baruah [14], where for each task τi, the maximum non-preemptive interval is specified. When a higher priority task arrives during the execution of τi, it is able to preempt it only after the end of this time interval.

• Fixed Preemption Points (LPS-FPP) – Approach proposed by Burns [15], where each task is divided into non-preemptive regions, obtained by selecting predefined locations inside the task code where preemptions are enabled.

• Varying Preemption Thresholds (LPS-VPT) – Approach proposed by Bril et al. [16], which is a combination of LPS-DP and LPS-FPP.

It has been shown by Buttazzo et al. [12] that LPS-FPP provides better schedulability results compared to LPS-PT and LPS-DP. Hence, in this thesis, we select LPS-FPP as the approach of interest, and in the following section we describe it in more detail.
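To make the LPS-PT rule concrete, here is a minimal C sketch (ours, not from [13]; names are illustrative) of the preemption test: an arriving task may preempt the running task only if its priority exceeds the running task's preemption threshold.

typedef struct {
    int priority;  /* nominal priority P_i, larger = higher */
    int threshold; /* preemption threshold, threshold >= priority */
} pt_task_t;

/* Nonzero if 'arriving' may preempt 'running' under LPS-PT. */
int may_preempt(const pt_task_t *arriving, const pt_task_t *running)
{
    return arriving->priority > running->threshold;
}

Setting each threshold equal to the task's own priority degenerates to fully-preemptive scheduling, while a threshold above every priority makes the task non-preemptive, which is one way to see why limited-preemptive scheduling generalises both paradigms.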

2.4.1 Tasks with fixed preemption points (LPS-FPP)

Fixed Preemption Points is a limited-preemptive scheduling approach where the preemption points of a task are selected and known prior to runtime. We now explain one potential way in which this can be achieved.


Initially, a task is specified by a program code consisting of many instructions, between which preemptions may occur. However, succeeding instructions may be joined in a non-preemptive region (NPR), and during the execution of this region, preemptions are disabled until the region is exited, i.e. until the next instruction to be executed no longer belongs to the declared region. This is achieved by preventing the scheduler from interrupting the currently running task whenever the instructions that belong to an NPR are executed. In Figure 2.8.a), we show a task state when no NPR is declared. Between the three illustrated instructions (I1, I2, and I3) there are two potential preemption points (PP 1 and PP 2). In Figure 2.8.b), we show the task state when we declare the NPR after I1, which lasts until the end of I3. Now, the only preemption that is allowed is at PP 1, between I1 and I2. The process of declaring non-preemptive regions is also called preemption point selection.

Figure 2.8. a) Preemption points before the declaration of non-preemptive regions, and b) after the declaration of non-preemptive regions.

2.4.2 Implementation example of LPS-FPP

Non-preemptive regions can be implemented in many different ways, due to the many different kernels, programming languages, etc. In the remainder of this section, we show a brief example of how this can be achieved on LitmusRT [17, 18], which is a real-time extension of the Linux kernel, using tasks coded in the C programming language.

In the following example, we implement a job and split its execution into two parts. During the first part of the job's execution, for pedagogical purposes, we bound the execution duration to approximately 20 ms. During this interval, the goal is to disable preemptions.

Then, after the first part, we allow for preemption, and at the end, we execute non-preemptively for another 20 ms. For this purpose, we first create a subroutine SR() whose execution is bounded to approximately 20 ms in the worst-case scenario.

Listing 2.1. Subroutine SR() whose WCET is approximately 20 ms long

#include <time.h> /* clock(), clock_t, CLOCKS_PER_SEC */

void SR(void)
{
    clock_t begin_t   = clock();  /* start time */
    clock_t current_t = begin_t;  /* time-tracking variable */
    double  duration  = 0.0;      /* execution-tracking variable */

    while (duration < 0.02) {     /* busy-wait for approximately 20 ms */
        current_t = clock();
        duration  = (double)(current_t - begin_t) / CLOCKS_PER_SEC;
    }
}

Then, the job would have the following code structure, consisting of two subsequent executions of the above-defined subroutine.

Listing 2.2. Fully-preemptive job whose execution consists of two subroutines

int job(void)
{
    SR();
    SR();
    return 0;
}

However, under preemptive scheduling, in its current form, the job can be preempted many times during its execution. To allow both subroutines to execute non-preemptively, we create a single fixed preemption point in the execution of the job. To achieve this, we use the LitmusRT userspace library (called liblitmus), and its functions enter_np() and exit_np().

The function enter_np() raises a flag of the thread in the thread control page, which is monitored by the kernel, thus disallowing preemptions from the point of the flag raise. This is achieved with the following code:

Listing 2.3. Liblitmus function for declaring the start of the non-preemptive region

void enter_np(void)
{
    if (likely(ctrl_page != NULL) || init_kernel_iface() == 0)
        ctrl_page->sched.np.flag++;
    else
        fprintf(stderr, "enter_np: control page not mapped!\n");
}


The function exit_np() invokes the scheduler with the sched_yield() function, which checks whether there is a waiting job with a higher priority than the currently executing one. If there is, the CPU starts to execute the highest-priority ready job; if there is not, the CPU continues to execute the current job. This is achieved with the following code:

Listing 2.4. Liblitmus function for declaring the end of the non-preemptive region

void exit_np(void)
{
    if (likely(ctrl_page != NULL) &&
        ctrl_page->sched.np.flag &&
        !(--ctrl_page->sched.np.flag)) {
        /* became preemptive, check delayed preemptions */
        __sync_synchronize();
        if (ctrl_page->sched.np.preempt)
            sched_yield();
    }
}

By applying these two functions, we obtain the following job, which may be preempted only between the two subroutines, at a single preemption point.

Listing 2.5. Job with a single preemption point whose execution consists of two subroutines

int job(void)
{
    enter_np(); SR(); exit_np();
    /* ===== PREEMPTION POINT ===== */
    enter_np(); SR(); exit_np();
    return 0;
}

Figure 2.9. Scheduling trace of the implemented job obtained with Feather-Trace.

In Figure 2.9, we illustrate the scheduling trace where the above job is released 200 ms after the start of the trace, and the higher priority job arrives 210 ms after the trace start. As expected, the preemption occurs only after the execution of the first subroutine, 220 ms after the trace start, instead of at 210 ms.

Since the interplay between tasks and system components plays an integral role in whether a task can miss its deadline, in the following section we describe the embedded system features and architecture properties that are of importance in the remainder of the thesis.


2.5 Embedded systems – Architecture Overview

An embedded system is a computer system which has a dedicated function within a larger electrical or mechanical system. It is often a combination of a Central Processing Unit (CPU), memory, and input/output peripheral devices. In this section, we briefly explain the aspects of embedded-system architecture that are most relevant for this thesis.

2.5.1 Classification according to CPU type

Embedded systems can be classified according to the type of central processing unit, more precisely by the number of cores available on a chip. This number often determines how many threads can be processed at any time. Therefore, we primarily distinguish among the following types of CPUs:

• Single-core CPU – A single-core processor is a microprocessor with a single core on a chip, running a single thread at a time.

• Multi-core CPU – A multi-core processor has from two to eight cores on a chip.

• Many-core CPU – A many-core processor has more than eight cores on a chip, and such processors are often designed for a high degree of parallel processing.

2.5.2 Cache Memory

The prerequisite to executing an instruction on the CPU is to fetch the instruction and its necessary data from the memory unit where those are stored. An architecture with a single memory unit directly connected to the CPU often brings many technical and economical limitations in the general case, which led to the invention of cache memory. At the expense of size, compared to the main memory, cache memory improves the efficiency and speed of data retrieval, which is many times slower when the main memory communicates directly with the CPU.

Instructions and data are stored in memory blocks, and a memory block is the main atomic entity being transferred between different memory units (main and cache memory in this case). Simply explained, when some instruction from the main memory is needed for execution on the CPU, the memory block containing the instruction is loaded from the main memory into the appropriate location of the cache memory. And then, that memory block is directly loaded from the cache memory to the CPU.

However, if a memory block is already in the cache, it is directly loaded from there, without the need for a costly reload from the main memory. In this way, a cache-based architecture often achieves improved efficiency and performance due to the principles of locality that are integrated into the storing mechanisms of the cache memory. Those are:

• Temporal locality – which means that recently accessed memory blocks are likely to be re-accessed soon. This principle is evident in the execution of loops, since looping increases the likelihood of reusing recently accessed memory blocks.

• Spatial locality – which means that adjacent or surrounding memory blocks are likely to be accessed close together in time. This principle is evident because of sequential code alignment and data clustering. Both principles are illustrated in the sketch below.
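The following small C program (ours, not from the thesis; the array size is arbitrary) makes the contrast concrete: the row-major loop benefits from spatial locality because consecutive accesses fall into the same cache line, while the column-major loop strides across N integers per access and loses that benefit; the repeated accumulation into sum is a simple instance of temporal locality.

#include <stdio.h>

#define N 512

static int a[N][N];

int main(void)
{
    long sum = 0;

    /* Cache-friendly: consecutive accesses share the same cache line. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    /* Cache-unfriendly: each access is N ints away from the previous one. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];

    printf("%ld\n", sum);
    return 0;
}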

In the remainder of this section, we describe cache organisation, classification, and other concepts relevant for this work, referring to [19] and [20].

Single-level cache

In this section, we describe the organisation and the important concepts within the domain of single-level caches. Single-level caches are typically located very close to the microprocessor, sometimes even on the processor circuit, in which case they are called processor caches. We now explain the most important concepts of memory transfer between the CPU, the cache, and the main memory (see Figure 2.10).

Figure 2.10. Memory units and their simplified structures.


A memory access is a request issued by the CPU whose goal is to read from or write to a location in the main memory. Considering a read request, the cache controller first checks for a corresponding location in the cache. If the content of the requested location from the main memory exists in a corresponding cache memory location, then a cache hit has occurred, and the data is directly read from the cache location. Otherwise, if the CPU does not find a memory location whose content corresponds to the location from the main memory, then a cache miss has occurred. This further requires loading the requested memory block from the main memory to a location within the cache memory, and only then can the CPU request be fulfilled by reading the content from the location within the cache. A write access is a bit different from a read access, but to describe them both properly, we first need to describe cache organisation in more detail.

Cache organisation

Caches store and load memory blocks of line size L. A cache line is usually considerably larger than one of its elements that may be accessed, i.e. requested by the CPU. Caches are divided into S cache sets, and each set contains K cache lines. A memory block is first mapped onto a set and then placed into one of the cache lines of the mapped set. The number of cache lines within a single cache set determines the cache associativity. A set-associative cache can be represented as an S × K matrix, where each cache set contains K cache lines.

Many existing processor caches are either direct-mapped, two-way set-associative, or four-way set-associative, as identified by Yan Solihin [21]. We further explain two special cases of set-associative caches:

• Direct-mapped cache is a cache structure where K = 1, i.e. each cache set consists of a single line, which further means that a memory block can reside in exactly one cache line.

• Fully-associative cache is a cache structure where S = 1, i.e. the cache consists of a single cache set, and that cache set consists of K cache lines, which further means that a memory block can reside in any of the K cache lines.

Caches can also be categorised according to the type of information they store. For that reason, there is a difference between instruction caches and data caches. The main difference is that instructions can only be read from memory, i.e. copied from the main memory to the cache. This means that the CPU issues only read accesses to the instruction cache. Regarding the data cache, memory blocks can be copied in several directions: from the main memory to the data cache, from the data cache to the main memory, and also from registers to the data cache. This means that in data caches both read and write accesses are possible.

2.5 Embedded systems – Architecture Overview 21 2.5 Embedded systems – Architecture Overview 21 can be copied in several directions, from the main memory to the data cache, can be copied in several directions, from the main memory to the data cache, from the data cache to the main memory, and also from registers to the data from the data cache to the main memory, and also from registers to the data cache. This means that in data caches both read and write accesses are possible. cache. This means that in data caches both read and write accesses are possible. In a single cache row entry, there are several important fields, and those are In a single cache row entry, there are several important fields, and those are tag, cache line, and valid bit. tag, cache line, and valid bit.  Cache line, or cache block, is simply a container where a memory block  Cache line, or cache block, is simply a container where a memory block can be stored. can be stored.

• Tag is an identifier from which the original address in the main memory can be reconstructed, in order to check if its content is already in the cache.

• Valid bit is an indicator of whether the data in a cache line is valid (1) or not (0). In the case of a data cache, an additional flag bit, called a dirty bit, is also required. The dirty bit indicates changed memory content in the associated cache line which has not yet propagated to the main memory.

Now, we explain how a memory request is handled, given that each cache row entry has the above structure.

Locating the requested content in the cache memory

When a CPU issues a request for content from some memory address, the cache controller performs several steps to check whether the requested memory content is already in the cache memory, and returns its content if it is. Since the requested memory address contains all the information that is needed for locating the corresponding data in the cache, the address is first separated into three parts: tag, set index, and offset.

Example: We will use the following 12-bit memory address as an example: MA = 000001100000, which is 0x060 in the hexadecimal numeral system.

00000        11        00000
  ↑           ↑           ↑
 Tag      Set index    Offset

Offset represents the least significant bits of the requested address; given that the block size is equal to L, the number of offset bits is equal to log2(L). E.g. for a memory block size of 32B, the number of offset bits is log2(32) = 5, thus the last 5 bits of the above address do not need to be considered.


Set index represents the next highest group of bits, which are used to determine which cache set should be checked for the memory block from the requested address. Given that the number of cache sets is S, the number of set index bits is log2(S). E.g. assuming that the cache size is 128B and L = 32B, then S = 128B/32B = 4. Since the cache contains 4 sets, the number of set index bits is log2(4) = 2.

Tag represents the remaining group of bits other than set index and offset, and its length is computed as AL − log2(L) − log2(S), where AL is the length of the requested address. E.g. the number of tag bits is 12 − 5 − 2 = 5, thus the first 5 bits of the address are tag bits.

Next, to check whether the requested memory content is already in the cache memory, the cache controller performs the following procedure:

1. Extracts the set index from the requested address, as shown above. This is exactly the set where the content of the requested address should reside.

2. Checks whether any cache row entry in the above-derived set has a tag which is the same as the tag of the requested address. If there is such an entry, the controller proceeds to the next step; otherwise, the requested content is not in the cache and should be loaded from the main memory.

3. For the cache row entry where the data was found, the controller checks the valid bit. If it is 1, the data is in the respective cache line; otherwise, it is not in the cache at all.
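To make the three steps concrete, the following C fragment sketches the address decomposition and the lookup for a direct-mapped cache (K = 1), using the parameters of the running example (12-bit addresses, L = 32B, S = 4). All names are illustrative assumptions, not part of any particular cache controller.

#include <stdint.h>
#include <stdbool.h>

/* Running-example parameters: L = 32 B lines, S = 4 sets, 12-bit addresses. */
#define OFFSET_BITS 5                  /* log2(32) */
#define INDEX_BITS  2                  /* log2(4)  */
#define NUM_SETS    4

typedef struct {
    bool     valid;                    /* valid bit */
    uint32_t tag;                      /* tag bits  */
} cache_entry;                         /* one row of a direct-mapped cache (K = 1) */

static uint32_t offset_of(uint32_t a) { return a & ((1u << OFFSET_BITS) - 1); }  /* element within the line */
static uint32_t index_of(uint32_t a)  { return (a >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); }
static uint32_t tag_of(uint32_t a)    { return a >> (OFFSET_BITS + INDEX_BITS); }

/* Returns true on a cache hit; on a miss, loads the block (steps 1-3 above). */
static bool access_cache(cache_entry cache[NUM_SETS], uint32_t addr)
{
    cache_entry *e = &cache[index_of(addr)];   /* step 1: select the set        */
    if (e->valid && e->tag == tag_of(addr))    /* steps 2-3: compare the tag,   */
        return true;                           /* check the valid bit: hit      */
    e->valid = true;                           /* cache miss: load the block    */
    e->tag   = tag_of(addr);
    return false;
}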

Loading the requested content into the cache

In the case when the content from the requested address is not present in the cache memory, it needs to be loaded from the main memory and placed in one of the cache blocks. When loading the cache, the contents of the neighbouring addresses are also brought in, to take advantage of spatial locality. Commonly, the range of concerned addresses is determined such that the starting address of the range is obtained when the offset part of the requested address is "zeroed out", and the ending address of the range is obtained by replacing all the bits in the offset part with 1's.
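As a small sketch of this rule (with OFFSET_BITS = log2(L), as in the fragment above), the boundaries of the block containing a requested address can be computed by bit masking:

#include <stdint.h>

#define OFFSET_BITS 5   /* log2(L) for L = 32 B */

/* Start of the block: offset bits "zeroed out"; end: offset bits all 1's. */
static uint32_t block_start(uint32_t addr) { return addr & ~((1u << OFFSET_BITS) - 1u); }
static uint32_t block_end(uint32_t addr)   { return addr |  ((1u << OFFSET_BITS) - 1u); }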


Write requests (data cache)

Write requests can be handled only in data caches, since data can be written to them; but then it can happen that a cache line does not have the same data as the corresponding block in the main memory. In this case, there are two widely used policies that can be applied.

• Write-Through is a policy which writes data directly and synchronously both to the main memory and to the cache memory.

• Write-Back is a policy which writes data to the cache memory and sets the dirty bit of the respective cache block to 1. The change propagates to the main memory only when the cache block is loaded with data from some other address (when the tag is changed).

When a cache miss occurs on a write request, there are several ways to decide how the content is loaded into the cache and the main memory. Based on this, there are two most commonly used approaches:

• Write allocate – data from the requested location is loaded into the cache, and this is followed by a write-hit operation.

• No-write allocate – data from the requested location is not loaded into the cache but written directly to the main memory. Because of this rule, data is loaded into the cache only on read misses.

The above policies are typically combined into the following two setups: write-through with no-write allocate, and write-back with write allocate. As stated by Altmeyer [20], for timing-critical embedded systems, write-through caches are considered beneficial as the point in time of each main-memory write depends statically on the instruction.
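The difference between the two write policies on a write hit can be sketched as follows; the types and the main_memory_store stub are illustrative assumptions rather than an actual controller interface.

#include <stdint.h>
#include <stdbool.h>

typedef enum { WRITE_THROUGH, WRITE_BACK } write_policy;

typedef struct {
    bool     valid, dirty;             /* the dirty bit exists only in data caches */
    uint32_t tag;
    uint8_t  data[32];                 /* one L = 32 B cache line */
} dcache_entry;

/* Stub standing in for a (slow) store to the main memory. */
static void main_memory_store(uint32_t addr, uint8_t byte) { (void)addr; (void)byte; }

/* Illustrative handling of a write hit under the two policies. */
static void on_write_hit(dcache_entry *e, uint32_t addr, uint32_t offset,
                         uint8_t byte, write_policy p)
{
    e->data[offset] = byte;                /* update the cache line           */
    if (p == WRITE_THROUGH)
        main_memory_store(addr, byte);     /* synchronous main-memory update  */
    else
        e->dirty = true;                   /* write-back: propagate only when */
}                                          /* the line is later evicted       */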

Accesses in direct-mapped cache

In this section, we provide an example of a direct-mapped cache and a list of memory requests resulting in cache hits and misses. Let us assume that the given cache parameters are as follows: cache size CS = 128B, address length AL = 12, memory block size L = 32B. Now, we can compute the number of cache sets: S = CS/L = 128B/32B = 4. Also, we can compute the number of offset bits for any requested memory address: #offset = log2(L) = log2(32) = 5. The number of set index bits is #index = log2(S) = log2(4) = 2, and the number of tag bits is #tag = 12 − 5 − 2 = 5.


Address requests                       Cache (simplified)

Address    Tag      Index   Offset     Set   V bit   Tag
0x080      00001    00      00000      00    1       01000
0x08C      00001    00      01100      01    1       10010
0x178      00010    11      11000      10    1       00000
                                       11    0       00010

Figure 2.11. Memory requests (left) and initial state of the direct-mapped cache (right).

With the above-computed values, we can check for any CPU memory request whether it results in a cache hit, i.e. the requested content is in the cache, or a cache miss, i.e. the requested content is not in the cache.

Address requests                                 Cache (simplified)

Address    Tag      Index   Offset   Result      Set   V bit   Tag
0x080      00001    00      00000    Miss        00    1       00001
0x08C      00001    00      01100    Hit         01    1       10010
0x178      00010    11      11000    Miss        10    1       00000
                                                 11    1       00010

Figure 2.12. The final classification of cache accesses (left) and the final cache state (right).

Let us assume that the current state of the cache is as depicted in Figure 2.11 and that we have the three shown address requests, in the order given from top to bottom. For the requested address 0x080, we first check where it should reside in the cache by reading the index bits 00. Now, we compare the tag (00001) derived from the requested memory address with the tag that is currently written for cache set 00, which is 01000. Since those two tags do not match, a cache miss occurs, and the content from the requested address needs to be loaded into the cache. Next, we handle the request for the address 0x08C. Its content should be in cache set 00, and since now the tags match (see the final state in Figure 2.12) and the valid bit is set to 1, the access results in a cache hit. Finally, the last address results in a cache miss although the tags match. This is the case because the valid bit is set to 0.
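Replaying the three requests with the access_cache() sketch from earlier reproduces this classification (the initial tags are the 5-bit patterns from Figure 2.11, written as hexadecimal constants):

cache_entry cache[NUM_SETS] = {
    { true,  0x08 },    /* set 00: tag 01000 */
    { true,  0x12 },    /* set 01: tag 10010 */
    { true,  0x00 },    /* set 10: tag 00000 */
    { false, 0x02 },    /* set 11: tag 00010, valid bit 0 */
};
access_cache(cache, 0x080);    /* miss: tags 00001 and 01000 differ        */
access_cache(cache, 0x08C);    /* hit:  tag 00001 now matches, valid bit 1 */
access_cache(cache, 0x178);    /* miss: tags match but the valid bit is 0  */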

Accesses in 2-way associative cache

Accesses in 2-way associative caches are slightly different than in the case of a direct-mapped cache. First, let us assume the same initial cache parameters as for the direct-mapped cache before: cache size CS = 128B, address length AL = 12, memory block size L = 32B. Now, the number of cache sets is computed a bit differently because each set has 2 cache lines: S = CS/(L × 2) = 128B/(32B × 2) = 2. The number of offset bits remains the same, #offset = log2(L) = log2(32) = 5, but the number of set index bits is #index = log2(S) = log2(2) = 1, and the number of tag bits is #tag = 12 − 5 − 1 = 6.

Since each cache set consists of two cache lines, let us assume that there is an address request with index bit 0, and that both lines within cache set 0 are filled. Now, on a cache miss, the requested content has to be stored in the set, while one line has to be evicted, i.e. replaced. There are many replacement policies, such as Least-Recently Used (LRU), First-In-First-Out (FIFO), Pseudo Least-Recently Used (PLRU), etc. In this thesis, we are interested only in the LRU replacement policy, which evicts the least-recently-used cache line.
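A minimal sketch of LRU handling for a single 2-way set is given below, assuming a per-way age counter (0 = most recently used); this is just one possible realisation of the policy, not the mechanism of any particular processor.

#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool     valid;
    uint32_t tag;
    unsigned age;                     /* 0 = most recently used way */
} way_entry;

/* Returns true on a hit; on a miss, fills an invalid way or evicts the LRU way. */
static bool access_2way_lru(way_entry set[2], uint32_t tag)
{
    for (int w = 0; w < 2; w++)
        if (set[w].valid && set[w].tag == tag) {
            set[w].age = 0;                       /* hit: mark as most recent */
            set[1 - w].age = 1;
            return true;
        }
    int victim;                                   /* miss: choose a victim    */
    if (!set[0].valid)      victim = 0;
    else if (!set[1].valid) victim = 1;
    else victim = (set[0].age >= set[1].age) ? 0 : 1;   /* evict the LRU way  */
    set[victim] = (way_entry){ .valid = true, .tag = tag, .age = 0 };
    set[1 - victim].age = 1;
    return false;
}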

Multi-level cache

In general, one of the main problems with cache memory is the trade-off between cache latency and hit rate. While a small cache may benefit from shorter latency, it can suffer from frequent cache misses, i.e. low hit rates. Conversely, large caches have higher hit rates but longer latency. This problem led to the invention of multi-level caches. Caches are now often organised such that the cache closest to the CPU is also the one with the shortest latency, but also the smallest size. As we go up the hierarchy, the latency increases together with the cache size. Therefore, today we have level 1 (L1), level 2 (L2), and further cache levels. With the increase in the number of cache levels, predicting their behaviour becomes a harder problem.


2.6 Cache-related preemption delay (CRPD)

Each task is executed on a central processing unit as a series of instructions, but as we showed in the previous pages, those instructions are delivered to the CPU via the cache memory. Because of that, the cache memory can severely impact a task's execution and its worst-case execution time. In this chapter, we introduce one of the most important concepts in this thesis – the cache-related preemption delay. This type of delay is the largest and the most important part of preemption delays [22, 11]. Bui et al. [23] showed that on a PowerPC MPC7410 with a 2 MB two-way associative L2 cache, the WCET inflation due to this type of delay can be as large as 33% of the worst-case execution time measured in non-preemptive mode. For the above reasons, an accurate CRPD approximation is a relevant problem in many contexts, including the aerospace industry [8, 9].

CRPD is defined as the time needed to reload all the requested memory blocks which are evicted from their respective cache blocks during preemptions.

Figure 2.13. Example of a cache-related preemption delay.

For example, in Figure 2.13 we show the preemption of τi by τh, considering only CRPD. The circle denotes that a certain cache block is being accessed by the task. We notice that during its execution, τi accesses the content of the cache block m. Upon preemption, τh evicts the previous content of m, which afterwards no longer belongs to τi. Since the cache block m will be accessed again in the remainder of the execution of τi, we need to account for an extra delay associated with the preemption, as illustrated by the black rectangle in Figure 2.13. This delay occurs just after the cache miss, i.e. after the memory content is requested. Upon reloading it from the higher memory unit, the requested content from the cache block is successfully accessed, resulting in a cache hit.


In the general case, each task accesses and evicts some cache blocks throughout its execution, and if a task is preempted, the preempting task may evict the content of the cache blocks previously used by the preempted task. If the preempted task reuses some of those evicted cache blocks in its remaining execution, they need to be reloaded from the higher memory units in order to continue the correct execution of the task.

Given all the previously introduced terms in this chapter, one may notice that real-time systems depend on many parameters such as cache memory type, replacement policy, task types, schedulers, CPUs, etc. In order to verify real-time-system correctness, researchers have developed and are still developing many types of formal analyses. We describe some of them in the following section.

2.7 Analysis of temporal correctness of real-time systems

Since the real-time tasks and the temporal properties of a system are affected by many different entities, we use many different types of analysis in order to guarantee that all tasks in a system satisfy the given time constraints. In this chapter, we name a few.

2.7.1 Feasibility and Schedulability Analysis

Feasibility and schedulability analyses test real-time system guarantees in the following ways:

• Feasibility analysis – An analysis which determines whether there exists a schedule such that all the tasks of a taskset satisfy their temporal and any other constraints.

• Schedulability analysis – An analysis which determines whether all the tasks of a taskset satisfy their temporal and any other constraints when scheduled with a given scheduling algorithm.

Response-Time Analysis

To guarantee that all of the tasks can meet their deadlines, in this thesis we mostly use response-time analysis. For each task of a taskset, this analysis calculates the maximum response time. If the maximum response time of each task in a taskset is less than or equal to its relative deadline (Ri ≤ Di), the taskset is schedulable.
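For reference, the classical recurrence behind this analysis (without any preemption-delay terms) is Ri = Ci + Σ over all higher-priority tasks j of ⌈Ri/Tj⌉ · Cj, iterated to a fixed point. The C sketch below assumes tasks indexed by decreasing priority; the analyses proposed in this thesis extend such a recurrence with CRPD terms.

#include <stdbool.h>

typedef struct { long C, T, D; } task;    /* WCET, period, relative deadline */

/* Classical response-time iteration for task i under fixed-priority
   preemptive scheduling, with tasks 0..i-1 having higher priority.
   Returns true (and the response time) iff R_i <= D_i. */
static bool response_time(const task ts[], int i, long *R_out)
{
    long R = ts[i].C, prev = 0;
    while (R != prev) {                          /* iterate to a fixed point     */
        prev = R;
        R = ts[i].C;
        for (int j = 0; j < i; j++)              /* higher-priority interference */
            R += ((prev + ts[j].T - 1) / ts[j].T) * ts[j].C;  /* ceil(prev/Tj)*Cj */
        if (R > ts[i].D)
            return false;                        /* deadline exceeded            */
    }
    *R_out = R;
    return true;
}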

2.7.2 Timing and cache analysis

In the domain of real-time systems, a timing analysis is an analysis whose goal is to derive safe bounds on the execution time of a task. In the case of hard real-time systems, the goal is often to derive a safe worst-case execution time of a task, and the analysis used for computing these values is called worst-case execution time analysis. Once the analysis derives a WCET value for each task, those values are used in schedulability or some other higher-level analysis which considers joint task interaction to prove the timing-behaviour correctness of a system. An important type of timing analysis is static timing analysis, which forms and uses abstract models of the system to safely bound the execution time of a task. Similarly, cache analysis forms an abstract cache model to classify a memory access as a cache hit or a cache miss. As we mentioned before, cache memory can significantly impact the performance of a real-time system, and this type of analysis is very important for computing tight timing bounds, as shown by [20]. For more information on timing and cache analysis, refer to [24, 25, 26, 27, 28, 29].

2.7.3 Cache-related preemption delay analysis

While the goal of cache analysis is to classify hits and misses and discover how cache accesses affect the WCET of a task, the goal of cache-related preemption delay analysis is to bound the CRPD. Questions relevant to CRPD analysis are:

• Which cache blocks are accessed by the preempted task before the preemption?

• Which cache blocks are evicted by the preempting task?

• Which cache blocks are used in the remaining execution of the preempted task, after the preemption?
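These three questions correspond to the well-known notions of useful cache blocks (UCBs) of the preempted task and evicted cache blocks (ECBs) of the preempting task. A common way to bound the CRPD of a single preemption is to charge one block reload time (BRT) for every cache set that is both useful and evicted; the bitmask encoding below is an illustrative assumption for a direct-mapped cache with at most 64 sets.

#include <stdint.h>

/* Upper bound on the CRPD of one preemption: BRT times the number of
   cache sets in the intersection of the preempted task's UCBs and the
   preempting task's ECBs, each encoded as a bitmask over cache sets. */
static long crpd_bound(uint64_t ucb_preempted, uint64_t ecb_preempting, long brt)
{
    uint64_t reloaded = ucb_preempted & ecb_preempting;   /* UCB intersect ECB */
    long n = 0;
    while (reloaded) {                /* count the set bits */
        reloaded &= reloaded - 1;
        n++;
    }
    return n * brt;
}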

The difference between CRPD analysis and cache analysis is that cache analysis can be used even for non-preemptive tasks, since it examines how the WCET of a task can be prolonged due to the task's own cache usage, while CRPD analysis examines how preemptions affect the cache and the WCET; thus it is only useful for systems employing preemptive scheduling.


The main question of CRPD analysis is: what is the upper bound on the number of necessary cache block reloads caused by preemptions? Thus the main goal of CRPD analysis is to derive the upper bound on the number of cache blocks which can be evicted by the preempting tasks and might be used after preemption, thus resulting in cache block reloads.

2.8 Summary

In this chapter, we described the building blocks of real-time and embedded systems and showed how many different entities, such as tasks, the scheduler, cache memory, main memory, the CPU, etc., interact with each other. We also showed how important it is to analyse those interactions in order to provide safe temporal guarantees for real-time systems. At the end of this chapter, we showed the significance of cache-related preemption delays in real-time systems, and why it is very important to compute tight CRPD estimations whenever possible. In the remainder of this thesis, we propose novel CRPD analyses and novel CRPD-aware response-time analyses. In the following chapter, we give a brief overview of the research process and thesis contributions.

Chapter 3

Research Description

In this chapter, we first describe the research process in Section 3.1, then we describe the research goals in Section 3.2, followed by the thesis contributions in Section 3.3. The chapter is concluded with Section 3.4, where we describe the publications which form the thesis.

3.1 Research process

In this thesis, we define and evaluate novel solutions (methods, techniques, and theoretical foundations) in order to reduce the problem of false-negative schedulability results in the context of real-time systems under preemptive scheduling. Research that involves the construction of an artefact based on existing knowledge, which is then used to produce new knowledge about how a problem can be solved, is defined as constructive research, as described by Dodig-Crnkovic [30]. Constructive research is widely used in computer science, and its research process consists of the following four steps:

1. Problem formulation, which defines:
   • the problem that we want to solve.
   • the problem relevance from the theoretical and practical perspective.
   • the current state of the art and state of the practice with respect to the problem.



2. Solution proposal, which:
   • describes the existing knowledge that can be used in order to solve the problem.
   • fills conceptual and other knowledge gaps with purposefully tailored building blocks that support the solution construct.
   • defines the theoretical solution construct.

3. Implementation of the solution, which:
   • constructs a practical solution from the theoretical construct.

4. Evaluation, which:
   • asserts whether the proposed solution solves the problem.
   • asserts whether the proposed solution is better than the other solutions (if they exist).

Figure 3.1. Research process steps.

In addition to the described four steps, the research process for each research goal (see Figure 3.1) starts with a literature survey where we explore the current state of the art, i.e. the existing knowledge. Next, we identify the problems, such as sources of over-approximation in the schedulability and timing analyses in the context of preemptive scheduling. For each of the defined problems we construct and propose a solution, which is later implemented and evaluated. For the experiments we used methods for empirical evaluation which are well accepted in real-time research, e.g., the UUnifast algorithm [31] for the generation of tasks and cache-parameter utilisations. To perform experiments on a broad range of different cache-usage and task parameters, we used either artificial task generation within the selected parameter limitations, or benchmarks well accepted in real-time research: Mälardalen [3] and TACLe [4].


3.2 Research Goals

In order to reduce the false-negative schedulability approximations in the context of preemptive real-time systems, it is highly important to account for preemption delays as accurately as possible. Thus, the overall goal of the thesis is:

"To improve the accuracy of schedulability analyses to identify schedulable "To improve the accuracy of schedulability analyses to identify schedulable tasksets in real-time systems under fixed-priority preemptive scheduling, by tasksets in real-time systems under fixed-priority preemptive scheduling, by accounting for tight and safe approximations of preemption delays". accounting for tight and safe approximations of preemption delays".

Next to the accuracy of identifying schedulable tasksets, we also focused on practicality in terms of the complexity of the proposed solutions; this is explained in more detail for each contribution individually, in the chapters where they are described. The main goal is further decomposed into the following three directions, for three types of real-time systems:

3.2.1 Research Goal 1

RG1: To improve the accuracy of the schedulability analysis for tasks with fixed preemption points in single-core real-time systems.

The existing schedulability analyses for tasks with fixed preemption points are affected by many sources of cache-related preemption delay over-approximation. Therefore, such analyses may deem schedulable tasksets unschedulable. For this reason, we identify several sources of over-approximation in the current state of the art and propose solutions for improving the accuracy of the per-task preemption-delay upper-bound approximation, and of the schedulability analysis.

3.2.2 Research Goal 2

RG2: To improve the accuracy of the schedulability analysis for fully-preemptive tasks in single-core real-time systems.

The existing schedulability analyses for fully-preemptive tasks suffer from cache-related preemption delay over-approximation, which can be reduced by a more accurate analysis. For this reason, we identify sources of over-approximation in the state-of-the-art approaches and propose a novel analysis which improves the accuracy of identifying schedulable tasksets.


3.2.3 Research Goal 3

RG3: To improve the accuracy of the schedulability analysis for tasks with fixed preemption points in partitioned multi-core systems.

Partitioned scheduling can be seen as a set of several single-core cases, and it is widely used in the industrial domain. Since Buttazzo et al. [12] have shown that, in the single-core case, fixed preemption point scheduling dominates the other limited-preemptive approaches, as well as non-preemptive and fully-preemptive scheduling, our goal is to investigate possible improvements in schedulability analysis when using fixed preemption points and partitioning in multi-core real-time systems.

3.3 Thesis Contributions

After defining the research goals, we define the thesis contributions, mapping them to the research goals they contribute to. We first list the contributions and then describe them in more detail in the following subsections. To achieve research goal RG1, we present:

• Contribution C1: A method for tightening the bounds on cache-related preemption delay in fixed preemption point scheduling, taking into account infeasible preemption combinations and infeasible useful cache block reloads.

• Contribution C2: A novel preemption-delay aware response-time analysis for tasks with fixed preemption points, taking into account the maximum number of times each cache memory block can be reloaded upon all possible preemptions, and also accounting for the individual preemptions on the non-preemptive regions.

To achieve research goal RG2, we define:

• Contribution C3: A novel preemption-delay aware response-time analysis for fully-preemptive tasks, taking into account that all preemptions occurring within some time interval can be divided into groups that can be analysed separately, which can eventually improve the schedulability analysis.

To achieve research goal RG3, we define:


• Contribution C4: A partitioning criterion for Worst-Fit Decreasing-based partitioning, taking into account that the maximum blocking tolerance together with preemption-delay analysis can be a useful partitioning determinant, since it quantifies the peak processing demand which may jeopardise the schedulability of preemptive tasks on a processor.

• Contribution C5: A comparison of different partitioning strategies in the context of fixed preemption point, fully-preemptive, and non-preemptive scheduling.

In Table 3.1, we present the mapping between the research goals and the contributions defined in the thesis.

Table 3.1. Mapping between the research goals and the contributions

        C1   C2   C3   C4   C5
RG1     x    x
RG2               x
RG3                    x    x

3.3.1 Research contribution C1

C1: A method for tightening the bounds on cache-related preemption delay in fixed preemption point scheduling, taking into account infeasible preemption combinations and infeasible useful cache block reloads.

In the remainder, we describe the two main approaches used in C1.

Infeasible preemption combinations

The existing CRPD-based analyses account for CRPD upper bounds which may include infeasible preemption combinations. By accounting only for the feasible preemption combinations, the upper bound on CRPD can be significantly reduced. The problem consists of two main subproblems:

• How to identify infeasible preemption combinations?

• How to determine which feasible combination results in the worst-case CRPD?


To address the first subproblem, we defined a condition for identifying infeasible preemption combinations, and for the second subproblem we defined a constraint satisfaction problem which derives the maximum CRPD based on the preemption combinations not deemed infeasible.

Infeasible useful cache block reloads

The existing approaches for CRPD estimation in the context of LP-FPP scheduling do not consider the fact that once we account for the eviction of a useful cache block at some preemption point, the reload of the same memory block should be accounted for at most once until the succeeding non-preemptive region where the memory block is re-accessed by the same task. Therefore, we formulate the following subproblems:

• How to identify infeasible useful cache block reloads?

• How to determine which feasible preemption combinations and useful cache block reloads result in the worst-case CRPD?

To address the first subproblem, we defined a condition for identifying infeasible useful cache block reloads, and for the second subproblem we defined a constraint satisfaction model which derives the maximum CRPD based on the feasible useful cache block reloads together with the feasible preemption combinations. The feasible useful cache block reloads depend on the feasible preemption combinations, which is why we use the constraint satisfaction problem to compute an upper bound on the CRPD.

3.3.2 Research contribution C2

C2: A novel preemption-delay aware response-time analysis for tasks with fixed preemption points, taking into account the maximum number of times each cache memory block can be reloaded upon all possible preemptions, and also accounting for the individual preemptions on the non-preemptive regions.

In the current state of the art for preemptive scheduling of tasks with fixed preemption points, the existing schedulability analysis relies on an overly pessimistic estimation of cache-related preemption delay, which eventually leads to overly pessimistic schedulability results. We propose a novel response-time analysis for real-time tasks with fixed preemption points, accounting for a more accurate estimation of cache-related preemption delays. This is achieved by accounting for:


• the upper-bounded number of times a memory block can be reloaded. This number is important because a cache memory block cannot be reloaded more times than it can be accessed throughout the execution of a task, but also it cannot be reloaded more times than it can be evicted by the preempting tasks. The absence of this joint view led to over-approximation in the existing analyses.

• the individual preemptions on the non-preemptive regions. In this approach, the goal is to find the worst-case mapping between possible preempting jobs and the points they may preempt. The worst-case mapping is the one that yields the highest CRPD.

Either of the above-mentioned approaches may result in a tighter CRPD than the other, and they jointly improve the schedulability analysis by providing more accurate CRPD approximations. The evaluation shows that the proposed analysis significantly dominates the existing approach by always being able to identify more schedulable tasksets.

3.3.3 Research contribution C3

C3: A novel preemption-delay aware response-time analysis for fully-preemptive tasks, taking into account that all occurring preemptions within some time interval can be decomposed and then analysed in different preemption groups, which can eventually improve the schedulability analysis.

The existing CRPD-aware response-time analyses over-approximate the cumulative cache-related preemption delay that is caused by multiple possible preemptions within a specified time interval. The main reason for the over-approximation is that the existing approaches treat all preemptions with one method at a time.

We first identified that all possible preemptions within a specified time interval can be partitioned into different groups of preemptions, which may be analysed individually with different methods that may produce tighter CRPD results. Then, we proposed a novel algorithm for computing the cumulative CRPD resulting from the partitioned set of preemption groups. Finally, we proposed a response-time analysis that integrates the proposed CRPD analysis. The evaluation shows that the proposed schedulability analysis more accurately identifies schedulable tasksets compared to state-of-the-art approaches.
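A minimal sketch of this decomposition idea follows, assuming each preemption group can be bounded by any of several safe CRPD methods; the two placeholder methods below are invented for the example and do not correspond to the concrete analyses of Paper D.

```python
def cumulative_crpd(preemption_groups, methods):
    """Bound each group of preemptions with the tightest of the available
    safe methods, then sum the per-group bounds. Since every method is a
    safe upper bound for its group, taking the per-group minimum keeps
    the total bound safe while tightening it compared to applying one
    method to all preemptions at once."""
    return sum(min(m(group) for m in methods) for group in preemption_groups)

# Hypothetical: groups of (useful-block count, evicted-block count)
# pairs, a block reload time of 10, and two placeholder bounding methods.
method_a = lambda group: 10 * sum(u for u, _ in group)
method_b = lambda group: 10 * sum(min(u, e) for u, e in group)
groups = [[(3, 1), (2, 4)], [(1, 1)]]
print(cumulative_crpd(groups, [method_a, method_b]))  # -> 40
```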


3.3.4 Research contribution C4

C4: A partitioning criterion for the Worst-Fit Decreasing-based partitioning, taking into account that the maximum blocking tolerance together with preemption-delay analysis can be a useful partitioning determinant since it quantifies the peak processing demand which may jeopardise the schedulability of preemptive tasks in a processor.

The existing approaches for fixed preemption points multi-core scheduling do not consider partitioned multi-core systems. Partitioned systems are well accepted in the industry and they can be seen as a set of single-core scheduling cases since a subset of a taskset is allocated to a specified processor. The exact allocation of tasks to the processors affects the preemption delays and eventually affects the schedulability of the real-time system. The optimal allocation problem is proven to be NP-hard, as it is a variation of the bin-packing problem.


Figure 3.2. Multicore partitioning of the tasks, using fixed preemption points.

In Figure 3.2 we show the overall process which first orders a taskset according to a specified task parameter; then the tasks are allocated one by one, considering the fixed preemption points approach and the specified partitioning criterion. Partitioning criteria are based on the known bin-packing heuristics, for example, First Fit Decreasing (FFD) or Worst Fit Decreasing (WFD).

We contribute to RG3 by integrating the fixed-preemption points approach with partitioned scheduling, and we propose a novel partitioning criterion which is used during the taskset allocation considering the WFD bin-packing heuristics. Unlike FFD-based partitioning, which needs only a binary schedulability result in order to allocate a task to a processor, WFD-based partitioning needs a quantitative value which should provide a schedulability metric for different processors. Based on this metric, it allocates a task to the processor where this quantitative value is the lowest, accounting also for the preemption delays that may be imposed on the tasks within a processor. The proposed criterion is based on the maximum blocking tolerance of the tasks in a processor, because the higher the maximum blocking tolerance is, the greater the possibility is that the next task assignment to a processor still results in a schedulable taskset. This is the case because the maximum blocking tolerance quantifies the peak processing demand which would be able to jeopardise the schedulability of the taskset in a processor.
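A minimal sketch of a WFD-style allocation loop using this criterion, under one plausible reading: assign each task to the processor whose resulting taskset keeps the highest blocking tolerance. The `blocking_tolerance` function here is a hypothetical stand-in for the actual preemption-delay aware analysis of Paper E.

```python
def wfd_partition(tasks, num_procs, blocking_tolerance):
    """WFD-style allocation sketch: `blocking_tolerance(taskset)` is
    assumed to return the maximum blocking tolerance of the taskset on
    one processor, or None if the taskset is unschedulable there."""
    procs = [[] for _ in range(num_procs)]
    for task in tasks:  # tasks assumed pre-sorted in decreasing order
        best, best_tol = None, None
        for p in procs:
            tol = blocking_tolerance(p + [task])
            if tol is not None and (best_tol is None or tol > best_tol):
                best, best_tol = p, tol
        if best is None:
            return None  # no processor can schedulably host this task
        best.append(task)
    return procs

# Toy stand-in: tasks are utilisation values, and the tolerance of a
# core shrinks as its total utilisation grows (invented for the demo).
toy_tolerance = lambda ts: 1.0 - sum(ts) if sum(ts) <= 1.0 else None
print(wfd_partition([0.6, 0.5, 0.4, 0.3], 2, toy_tolerance))
```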

3.3.5 Research contribution C5

C5: A comparison of different partitioning strategies in the context of fixed-preemption points, fully-preemptive, and non-preemptive scheduling.

We also contribute to RG3 by comparing different combinations of bin-packing heuristics and taskset orderings to investigate which one improves schedulability the most under different system parameters. The evaluation, performed on randomly generated tasksets, shows that in the general case, no single partitioning strategy fully dominates the others. However, the evaluation results reveal that certain partitioning strategies perform significantly better with respect to the overall schedulability for specific taskset characteristics. While density-based partitioning increases the schedulability of the tasksets consisting of high-density tasks, priority-based partitioning increases the schedulability of the tasksets with a higher number of tasks and lower average utilisations. On average, FFD-based heuristics dominate the WFD-based heuristics. However, WFD-based heuristics increase the schedulability of the tasksets with lower average utilisations, since the tasks may be evenly distributed among the cores. The results also reveal that the proposed LPS-FPP partitioning strategies dominate fully-preemptive and non-preemptive partitioned scheduling. Moreover, the proposed approach is affected significantly less by an increase of the CRPD compared to fully-preemptive partitioned scheduling.


3.4 Publications forming the thesis

In this section, we present basic information about the papers which form the thesis and state the author's contribution to each of them. We first show the mapping between the papers included in the thesis and the thesis contributions, in Table 3.2.

Table 3.2. Mapping between the papers and the contributions

        Paper A   Paper B   Paper C   Paper D   Paper E
  C1       x         x
  C2                            x
  C3                                      x
  C4                                                x
  C5                                                x

Paper A
Tightening the Bounds on Cache-Related Preemption Delay in Fixed Preemption Point Scheduling
Authors: Filip Markovic, Jan Carlson, Radu Dobrin
Venue: The 17th International Workshop on Worst-Case Execution Time Analysis (WCET 2017). (More info at [32])

Abstract Limited Preemptive Fixed Preemption Point scheduling (LP-FPP) has the ability to decrease and control the preemption-related overheads in the real-time task systems, compared to other limited or fully preemptive scheduling approaches. However, existing methods for computing the preemption overheads in LP-FPP systems rely on over-approximation of the evicting cache blocks (ECB) calculations, potentially leading to pessimistic schedulability analysis. In this paper, we propose a novel method for preemption cost calculation that exploits the benefits of the LP-FPP task model both at the scheduling and cache analysis level. The method identifies certain infeasible preemption combinations, based on analysis on the scheduling level, and combines it with cache analysis information into a constraint problem from which less pessimistic upper bounds on cache-related preemption delays (CRPD) can be derived. The evaluation results indicate that our proposed method has the potential to significantly reduce the upper bound on CRPD, by up to 50% in our experiments, compared to the existing over-approximating calculations of the eviction scenarios.

My contribution: I was the main contributor to the work under the supervision of the coauthors. My contributions include the problem formulation of the paper, solution definition (jointly with professor Jan Carlson), implementation, and evaluation.

Paper B
Improved Cache-Related Preemption Delay Estimation for Fixed Preemption Point Scheduling
Authors: Filip Markovic, Jan Carlson, Radu Dobrin
Venue: The 23rd International Conference on Reliable Software Technologies, Ada-Europe 2018. (More info at [33])

Abstract Cache-Related Preemption Delays (CRPD) can significantly increase tasks' execution time in preemptive real-time scheduling, potentially jeopardising the system schedulability. In order to reduce the cumulative CRPD, Limited Preemptive Scheduling (LPS) has emerged as a scheduling approach which limits the maximum number of preemptions encountered by real-time tasks, thus decreasing CRPD compared to fully preemptive scheduling. Furthermore, an instance of LPS, called Fixed Preemption Point Scheduling (LP-FPP), defines the exact points where the preemptions are permitted within a task, which enables a more precise CRPD estimation. The majority of the research, in the domain of LP-FPP, estimates CRPD with pessimistic upper bounds, without considering the possible sources of over-approximation: 1) accounting for the infeasible preemption combinations, and 2) accounting for the infeasible cache block reloads. In this paper, we improve the analysis by accounting for those two cases towards a more precise estimation of the CRPD upper bounds. The evaluation of the approach on synthetic tasksets reveals a significant reduction of the pessimism in the calculation of the CRPD upper bounds, compared to the existing approaches.

My contribution: I was the main contributor to the work under the supervision of the coauthors. My contributions include the problem formulation of the paper, i.e. the identification of the pessimism in the previous work (Paper A), solution definition, implementation, and evaluation.

Paper C
Cache-aware response time analysis for real-time tasks with fixed preemption points
Authors: Filip Markovic, Jan Carlson, Radu Dobrin
Venue: The 26th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2020). (More info at [34])

Abstract In real-time systems which employ preemptive scheduling and cache architecture, it is essential to account as precisely as possible for cache related preemption delays in the schedulability analysis, as an imprecise estimation may falsely deem the system unschedulable. In the current state of the art for preemptive scheduling of tasks with fixed preemption points, the existing schedulability analysis considers overly pessimistic estimation of cache-related preemption delay, which eventually leads to overly pessimistic schedulability results. In this paper, we propose a novel response time analysis for real-time tasks with fixed preemption points, accounting for a more precise estimation of cache-related preemption delays. The evaluation shows that the proposed analysis significantly dominates the existing approach by being able to always identify more schedulable tasksets.

My contribution: I was the main contributor to the work under the supervision of the coauthors. My contributions include the problem formulation of the paper, solution definition, implementation, and evaluation.

Paper D
Improving the accuracy of cache-aware response time analysis using preemption partitioning
Authors: Filip Marković, Jan Carlson, Sebastian Altmeyer, Radu Dobrin
Venue: The 32nd Euromicro Conference on Real-Time Systems (ECRTS 2020). (More info at [35])

Abstract Schedulability analyses for preemptive real-time systems need to take into account cache-related preemption delays (CRPD) caused by preemptions between the tasks. The estimation of the CRPD values must be sound, i.e. it must not be lower than the worst-case CRPD that may occur at runtime, but also should minimise the pessimism of estimation. The existing methods over-approximate the computed CRPD upper bounds by accounting for multiple preemption combinations which cannot occur simultaneously during runtime. This over-approximation may further lead to the over-approximation of the worst-case response times of the tasks, and therefore a false-negative estimation of the system's schedulability. In this paper, we propose a more precise cache-aware response time analysis for sporadic real-time systems under fully-preemptive fixed priority scheduling. The evaluation shows a significant improvement over the existing state of the art approaches.


My contribution: I was the main contributor to the work under the supervision of the coauthors. My contributions include the problem formulation of the paper, solution definition, implementation, and evaluation.

Paper E
A Comparison of Partitioning Scheduling Strategies for Fixed Points-based Limited Preemptive Scheduling
Authors: Filip Markovic, Jan Carlson, Radu Dobrin
Venue: IEEE Transactions on Industrial Informatics, Volume 15, Issue 2, Feb. 2019. (More info at [36])

Abstract The increasing industrial demand for handling complex functionalities has influenced the design of hardware architectures for time critical embedded systems during the past decade. Multi-core systems with hierarchical cache architecture facilitate the inclusion of many complex functionalities, while, however, introducing cache related overheads, as well as partitioning complexity, to the overall system schedulability. One of the efficient paradigms for controlling and reducing the cache related costs in real-time systems is Limited Preemptive Scheduling (LPS), with its particular instance Fixed Preemption Points Scheduling (FPPS), which has been shown to outperform other alternatives and has been supported by and investigated in the automotive domain. With respect to the partitioning constraints, Partitioned Scheduling has been widely used to pre-runtime allocate tasks to specific cores, resulting in predictable cache-related preemption delay approximations. In this paper we propose to integrate FPPS and Partitioned Scheduling on multicore real-time systems in order to increase the overall system schedulability. We define a new joint approach for task partitioning and preemption point selection, together with an investigation of partitioning strategies based on different heuristics, such as First Fit Decreasing and Worst Fit Decreasing, and priority and density taskset orderings. The evaluation performed on randomly generated tasksets shows that in the general case, no single partitioning strategy fully dominates the others. However, the evaluation results reveal that certain partitioning strategies perform significantly better with respect to the overall schedulability for specific taskset characteristics.
My contribution: I was the main contributor to the work under the supervision of the coauthors. My contributions include the problem formulation of the paper (discussed also with Abhilash Thekkilakattil), solution definition, implementation, and evaluation.


Some passages of this thesis have been quoted verbatim from the above-listed papers, and the author's Licentiate thesis:

• "Improving the Schedulability of Real Time Systems under Fixed Preemption Point Scheduling". Filip Marković. Licentiate thesis. E-Print AB, 2018. (More info at [37])

3.5 Other publications

Other publications, not included in the thesis, are:

• "Probabilistic Response Time Analysis for Fixed Preemption Point Selection". Markovic, Filip and Carlson, Jan and Dobrin, Radu and Lisper, Björn and Thekkilakattil, Abhilash. 2018 IEEE 13th International Symposium on Industrial Embedded Systems (SIES). IEEE, 2018. (More info at [38])

• "Preemption Point Selection in Limited Preemptive Scheduling using Probabilistic Preemption Costs". Filip Markovic, Jan Carlson, Radu Dobrin. 28th Euromicro Conference on Real-Time Systems (ECRTS 2016) (WiP). (More info at [39])

3.6 Summary

In this chapter, we described the research process used in this thesis, goals, contributions, and the publications which form the thesis. Before describing the contributions in more detail, in the following chapter we first describe the related work.

Chapter 4

Related work

In this chapter, we describe the related work in areas of preemption-aware scheduling and timing analysis of real-time systems.

4.1 Preemption-aware schedulability and timing analysis of single-core systems

In the context of CRPD analysis for fixed-priority fully-preemptive scheduling, many different approaches have been proposed in the last few decades, of which we describe a selection of the most important and recent ones.

Tomiyama et al. [40], and Busquets-Mataix et al. [41] proposed analyses which are based on the over-approximation that a single preemption causes the CRPD equal to the time needed for reloading all the evicting cache blocks from a preempting task. These analyses neglected the fact that not every eviction results in a cache block reload. Contrary to this, Lee et al. [42] proposed the analysis that bounds the CRPD by accounting for the cache blocks which may be reused at some later point in a task, called the useful cache blocks. However, their analysis did not account for the fact that although useful, some cache blocks cannot be evicted, and thus cannot result in a cache block reload. These two opposite approaches defined the two main branches in CRPD analysis: ECB-based CRPD and UCB-based CRPD.

Later, Tan and Mooney [43] proposed the UCB-union approach, which accounted for the limitations of the above-described approaches. In their approach, CRPD is computed using the information about all possibly affected useful cache blocks along with the evicting cache blocks from the tasks which may evict them. Opposite to that, Altmeyer et al. [44] proposed the ECB-union approach, where all possibly evicting cache blocks are used along with the useful cache blocks from the tasks that may be preempted.

The latest and overall most precise CRPD approaches are proposed by Altmeyer et al. [45], called the ECB-union multiset and UCB-union multiset approaches. Those approaches are improvements over the UCB-union and ECB-union because they account for a more precise estimation of the nested preemptions. The multiset approach was also used by Staschulat et al. [46] for periodic task systems, where they accounted for the fact that each additional preemption of a single preempting task may result in a smaller CRPD value compared to the previous preemptions. In the context of periodic systems, Ramaprasad and Mueller [47, 48] investigated the possibility of tightening the CRPD bounds using preemption patterns. However, in this thesis, we consider a sporadic task model which constrains the applicability of such analysis type.

Furthermore, in recent years, several cache-aware analyses were proposed in the following contexts (a schematic sketch of the basic ECB- and UCB-based bounds is given after the list below):

• cache partitioning, by Altmeyer et al. [49, 50],

• cache-persistence, by Rashid et al. [51, 52] and Stock et al. [29],

• write-back caches, by Davis et al. [53] and Blaß et al. [54],

• mixed-criticality systems, by Davis et al. [55],

• tasks with preemption thresholds, by Bril et al. [56].
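For orientation, here is a schematic Python contrast of the two branches, simplified to a direct-mapped cache with block reload time `brt`; set-associativity, nested preemptions, and the union and multiset refinements discussed above are deliberately ignored, so this is an illustration rather than any of the cited analyses.

```python
def crpd_ecb_based(ecbs_preempting, brt):
    """ECB-based branch: charge a reload for every cache block the
    preempting task may evict."""
    return brt * len(ecbs_preempting)

def crpd_ucb_based(ucbs_preempted, brt):
    """UCB-based branch: charge a reload only for cache blocks the
    preempted task may reuse later (its useful cache blocks)."""
    return brt * len(ucbs_preempted)

def crpd_combined(ucbs_preempted, ecbs_preempting, brt):
    """Combining both views, in the spirit of the union approaches:
    only useful blocks that are actually evicted can cause a reload."""
    return brt * len(ucbs_preempted & ecbs_preempting)

# Hypothetical cache-set indices: the combined bound is never worse.
ucbs, ecbs = {1, 2, 5}, {2, 3, 4, 5}
print(crpd_ecb_based(ecbs, 10),       # 40
      crpd_ucb_based(ucbs, 10),       # 30
      crpd_combined(ucbs, ecbs, 10))  # 20
```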

4.2 Schedulability and timing analysis of tasks with fixed preemption points

Considering the feasibility analysis of tasks with fixed preemptions, the seminal and the most relevant work was proposed by Yao et al. [57, 58]. Before their work, a very important contribution was made by Bril et al. [59], who for the first time identified the scheduling anomaly known as the self-pushing phenomenon. The benefits of using a fixed preemption model are described by Buttazzo et al. [12], among which are the increased timing predictability and the facilitation of the mutual exclusion problems. Buttazzo et al. [12] and Yao et al. [60] showed that LP-FPPS can in the majority of the cases outperform many of the existing scheduling algorithms. Becker et al. [61] showed the benefits of limited-preemptiveness in automotive systems.

Other works in the area of fixed preemptions based task models included the preemption point selection algorithms, proposed by Bertogna et al. [62], Peng et al. [63], and Cavicchio et al. [64]. Cavicchio et al. [64] identified for the first time that a detailed view on CRPD is important for the tasks with fixed preemption points, especially the possibility of a cache block being reloaded. A more elaborate and very important work on preemption point selection for cache overhead minimisation is proposed and presented in John Cavicchio's PhD dissertation [65].

4.3 Analysis of partitioned multi-core systems

A majority of work in the context of multi-core real-time systems under the LPS-FPP considered the global scheduling (where tasks may be allocated to any of the cores during runtime). Thekkilakattil et al. [66] investigated the schedulability implications of using eager and lazy preemption approaches in the context of fixed-priority global limited preemptive scheduling. Also, the feasibility analysis for such a system was defined by Thekkilakattil et al. [67] considering the earliest deadline first scheduling. However, Bastoni et al. [68] have shown that global scheduling may introduce high preemption- and migration-related delays, since in such systems a task may migrate to any of the available cores, potentially increasing the number of preemptions in the system.

In the context of partitioned scheduling (where each task is allocated to a single core before runtime) migration is not possible, which facilitates system analysis and predictability. On the other hand, allocating tasks to the processors is shown to be an NP-hard problem, via a transformation from the bin-packing problem, shown by Johnson [69]. Since partitioned scheduling is a widely accepted industrial approach, there are many proposed heuristics and partitioning concepts which tried to further improve the schedulability results for such multicore real-time systems.

An algorithm which allocates the tasks in decreasing relative-deadline order was proposed by Fisher et al. [70], considering the fixed-priority partitioned scheduling. Further on, Kato et al. [71] proposed a task partitioning algorithm which sorts a given taskset by decreasing priority order and then performs first-fit decreasing heuristics, considering the rate monotonic task model. Lakshmanan et al. [72] proposed a semi-partitioning algorithm which sorts a taskset by decreasing task density order and then performs first-fit heuristics, considering the deadline monotonic task model. Kato et al. [73] proposed an algorithm which sorts the tasks in a decreasing relative deadline order and splits them if necessary during the allocation, which is a characteristic of the semi-partitioning algorithms. Nemati et al. [74] proposed a partitioning approach which can increase the overall system schedulability in the context of multicore scheduling with resource sharing by using a blocking-aware allocation of tasks. Andersson et al. [75] proposed a sufficient schedulability test for partitioned fixed-priority preemptive scheduling.

In the context of Deferred-Preemption Scheduling, Davis et al. [76] proposed the global scheduling algorithm and they also evaluated the partitioned scheduling for such a model, by comparing different allocation strategies defined with the FFD heuristics and different taskset orderings. In their work, it is assumed that the final non-preemptive region is defined for each task, without assuming the preemption costs for the preemptive parts of the task.

The majority of the above-described work does not include the preemption related delays in the proposed task models and analyses. In some cases, they are accounted for in the worst-case execution time of a task, which may lead to significant over-provisioning. Starke et al. [77] proposed an approach which performs the partitioning but also accounts for the preemption-related delays. However, their work does not account for the possible reduction of the preemption delay impact, which is possible in the LPS approaches, by limiting the number of preemptions or even selecting the task intervals where they are possible. To reduce the preemption delay in partitioned systems, Ward et al. [78] proposed the use of page colouring for cache management, which however requires modifications of the operating system. To overcome this problem, Thekkilakattil et al. [79] proposed the use of Limited Preemptive Scheduling in the context of Partitioned Scheduling in order to minimise shared cache effects among the tasks by using temporal separation of the tasks which share cache blocks. In the context of semi-partitioned scheduling, Souto et al. [80] proposed an overhead-aware schedulability analysis. Altmeyer et al. [81] introduced a multi-core response-time analysis framework which accounts for explicit interference analysis, and their work was extended by Davis et al. [82] assuming partitioned fixed-priority preemptive scheduling. Huang et al. [83] proposed a schedulability analysis for partitioned fixed-priority preemptive scheduling, assuming contention for a shared resource. For more work in this domain, refer to a survey created by Maiza et al. [84].


4.4 Relevant work in timing and cache analysis

The most relevant work on cache analysis in the context of preemption delay and preemption-delay aware schedulability analysis is proposed by Lee et al. [42] and Altmeyer et al. [85, 44]. Lee et al. [42] proposed the static analysis computation of useful cache blocks, while Altmeyer superseded this computation, defining the notion of a definitely useful cache block. Zhang et al. [86] defined the useful cache block analysis for data caches. There are many other relevant works on timing and cache analysis, such as the work proposed by Hardy et al. [87], and Lesage et al. [88], where novel cache-access classifications are defined. For a more detailed overview of timing and cache analyses, refer to the surveys created by Wilhelm et al. [24], [89].

In recent years, after the above surveys were made, several works in the context of timing and cache analysis deserve attention for preemption-aware schedulability analyses of single-core and multi-core systems. Primarily, the PhD dissertation by Sebastian Hahn [28], titled "On static timing analysis", which considers the complex problem of compositional analysis on modern architectures. The basic principle of compositional timing analysis is to separate concerns of counter-intuitive timing behaviour by analysing different aspects of the system's behaviour individually. However, the goal of timing analysis is to derive tight results, which can be harder to achieve by using a compositional approach. In his work, Sebastian Hahn proposed a compositional base bound analysis that enables a sound and tight timing analysis even for complex processors. Additional contributions of his thesis include hardware designs and modifications that enable efficient compositional analysis and facilitate sound timing predictability. In the same line of research, Sebastian Hahn and Jan Reineke [90] proposed a provably timing-predictable pipelined processor core which is only about 6–7% slower than a conventional non-strict in-order pipelined processor, but the gain in timing predictability enables orders-of-magnitude faster WCET and multi-core timing analysis. Another result of his dissertation is the timing analysis tool LLVMTA [2], which enables timing and cache analysis for many different system architectures.

Static code analysis of write-back caches was proposed by Blaß et al. [54], with additional work described in the Master's thesis by Tobias Blaß [91]. This research opens the possibilities for a more precise preemption-aware analysis for the write-back cache policy.

In the context of joint static timing, cache, and interference analysis, Jan C. Kleinsorge proposed in his dissertation [92] a tight integration of cache, path and task-interference modeling. In that thesis, he also contributed with a more accurate CRPD analysis for real-time systems employing periodic tasks.

Resilience analysis, which examines the amount of disturbance a useful cache block may suffer from preempting tasks without causing an additional cache reload due to preemption, is of high importance for reducing the CRPD over-approximation in set-associative LRU caches. There are two relevant works in this domain, by Altmeyer et al. [93], and Rashid et al. [94]. Furthermore, the resilience analysis, the multiset-union CRPD approaches, the static analysis of definitely useful cache blocks, and a few more contributions described in this chapter are summarised in Altmeyer's dissertation [20].

In the context of multi-threaded real-time tasks and their interplay with cache memory, Tessler et al. made several important contributions [95, 96, 97], which are summarised in Tessler's dissertation [98].

To capture the dynamic behaviour of data memory references, Zhang et al. contributed with the concept of temporal scope and scope-aware analysis, thus improving abstract cache state analysis [86], [99], [100].

As the last research area listed in this section, we refer to the works by Koo et al. [8], and Cecere et al. [9], who showed the importance and need for tight cache-related preemption delay analysis and precise delay measuring for real-time systems running on spacecraft on-board computers.

Chapter 5

System model and notation

In this chapter, we present the task model and notation which are used in the following chapters of this thesis.

5.1 High-level task model

We consider a sporadic task model, with preassigned fixed and disjunct task priorities, under preemptive scheduling. A taskset Γ consists of n tasks sorted in a decreasing priority order, where each task τi generates an infinite number of jobs. The j-th job of τi is denoted with τi,j. Each task τi (1 ≤ i ≤ n) is characterised with the following tuple of parameters: ⟨Pi, Ci, Ti, Di⟩, where Pi denotes the task priority, while the worst-case execution time without accounted preemption delays is denoted with Ci. Ti denotes the minimum inter-arrival time between two consecutive jobs of τi, and the relative deadline is denoted with Di. We assume constrained deadlines, i.e. Di ≤ Ti. Also, the utilisation of a task τi is denoted with Ui, and it is defined as the ratio between the worst-case execution time and the minimum inter-arrival time of a task, i.e.

$$U_i = \frac{C_i}{T_i}$$

Taskset utilisation is denoted with $U_\Gamma$ and it is the sum of the utilisation values of all tasks in the taskset, i.e.

$$U_\Gamma = \sum_{i=1}^{n} U_i$$
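To make the notation concrete, the following minimal Python sketch computes per-task and taskset utilisation; the task parameters are hypothetical example values, not taken from the thesis.

```python
# Minimal sketch: per-task and taskset utilisation.
# The (C_i, T_i) parameters below are hypothetical example values.
tasks = [
    {"C": 2, "T": 10},   # tau_1
    {"C": 5, "T": 25},   # tau_2
    {"C": 12, "T": 60},  # tau_3
]

def utilisation(task):
    """U_i = C_i / T_i."""
    return task["C"] / task["T"]

U_total = sum(utilisation(t) for t in tasks)  # U_Gamma
print(U_total)  # ~0.6
```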



5.2 Low-level task model

In addition, for the low-level task modelling, we use a Control Flow Graph (CFG). Each task τi is represented by a CFG, denoted with CFGi, and in each CFGi, nodes are represented by basic blocks, where zi is the total number of basic blocks within CFGi. A basic block BB i,k (1 ≤ k ≤ zi) represents a linear sequence of instructions with exactly one entry and one exit point, and if one instruction of a basic block is executed, then all of the other instructions within the block are executed as well. It is often assumed in the literature, e.g. [101, 62], that a basic block is the minimum possible non-preemptive segment within the execution of a task. Formally, CFGi is represented with the tuple ⟨BS i, ES i, sni, eni⟩, where BS i represents the set of basic blocks of a program, and ES i ⊆ BS i × BS i represents the set of edges between the blocks. Each edge also represents a potential preemption point, and with vi we denote the total number of disjunct preemption points within τi, i.e. the total number of edges within CFGi. With sni, the starting node of CFGi is denoted, and eni represents the ending node. In case the program of a task has several entry basic blocks, an artificial starting node sni is placed and connected to each of the blocks. The same holds for the ending points. Another term that is used in the thesis is a path, which represents a sequence of nodes including the specified entry and exit nodes.


Figure 5.1. Example of CFGi of a preemptive task τi.

In Figure 5.1 we show an example of CFGi of a preemptive task τi. The illustrated task consists of six basic blocks (BB i,1, BB i,2, BB i,3, BB i,4, BB i,5, and BB i,6), and six preemption points (PP i,1, PP i,2, PP i,3, PP i,4, PP i,5, and PP i,6). Thus, the total number of basic blocks is zi = 6, and the total number of disjunct preemption points is vi = 6.
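As an illustration of this notation, here is a minimal Python sketch of a task CFG with six basic blocks and six edges; the edge layout is assumed for the example and is not meant to reproduce Figure 5.1 exactly.

```python
# Minimal sketch of a task CFG; the edge layout is hypothetical.
cfg_i = {
    "basic_blocks": ["BB1", "BB2", "BB3", "BB4", "BB5", "BB6"],  # BS_i
    # ES_i: each edge is also a potential preemption point
    "edges": [("BB1", "BB2"), ("BB1", "BB3"), ("BB2", "BB4"),
              ("BB3", "BB4"), ("BB4", "BB5"), ("BB5", "BB6")],
    "start": "BB1",  # sn_i
    "end": "BB6",    # en_i
}

z_i = len(cfg_i["basic_blocks"])  # total number of basic blocks: 6
v_i = len(cfg_i["edges"])         # total number of disjunct preemption points: 6
print(z_i, v_i)
```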


Sets and multisets

Before introducing the detailed cache model and cache-block classifications, we first introduce the relevant mathematical notation and terminology. We use standard integer sets to denote cache block sets, but we also use multisets. A multiset is a set modification that can consist of multiple instances of the same element, unlike a set where only one instance of an element can be present. For this reason, besides the standard set operations such as set union (∪) and set intersection (∩), we also use the multiset union (⊎). The result of the multiset union over two multisets is a multiset that consists of the summed number of instances of each element that is present in either of the unified multisets. With |A| we denote the size of the set A. The same notation holds if A is a multiset.
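Python's collections.Counter provides a convenient model of such multisets; a minimal sketch of the multiset union:

```python
# Minimal sketch: multisets of cache sets modelled as Counters.
from collections import Counter

A = Counter({0: 2, 3: 1})   # two instances of cache set 0, one of cache set 3
B = Counter({0: 1, 5: 1})

union = A + B               # multiset union: instance counts are summed
print(union)                # Counter({0: 3, 3: 1, 5: 1})

size = sum(union.values())  # |A| for a multiset counts every instance
print(size)                 # 5
```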

5.2.1 Cache-block classification

In this thesis, we primarily assume a single-level direct-mapped cache, but we also provide method adjustments for single-level set-associative LRU caches. The only exception is Chapter 9, where the assumptions about the cache architecture are lifted and we also account for multi-level caches. For each assumed cache architecture in the thesis, both instruction and data caches are considered, while for data caches we assume write-through and no-write-allocate policies.

Throughout its execution, each task τi accesses cache blocks in cache memory. More precisely, each instruction within a basic block of CFGi accesses a certain cache block m which resides in some cache set s within the cache memory. We denote cache sets as integer values ranging from 0 to S − 1, where S is the total number of cache sets. A cache access can result in a cache hit or a cache miss; for a reminder of this process and general cache-memory terminology, refer to Section 2.5.2.

Considering each basic block BB i,k of τi, where BB i,k ∈ BS i : BS i ∈ CFGi and (1 ≤ k ≤ zi), we define:

• $ECB^{BB}_{i,k}$ – a set of evicting cache blocks such that a cache-set s is in $ECB^{BB}_{i,k}$ if some cache block in s may be accessed throughout the execution of BB i,k.

For each preemption point PP i,k of τi, where PP i,k ∈ ES i : ES i ∈ CFGi and (1 ≤ k ≤ vi), we define the notion of a useful cache block (UCB), building on the formulation originally proposed by Lee et al. [42], and later superseded by Altmeyer et al. [102], and Zhang et al. [86] for data caches.

A cache block m is useful at PP i,k if and only if:


a) m must be cached at preemption point PP i,k, and
b) m may be reused on at least one control flow path starting from PP i,k without self-eviction of m before the reuse.

Based on the above classification of memory blocks at preemption points, we define the following terms.

• UCB i,k (Direct-mapped cache) – a set of useful cache blocks at PP i,k such that a cache-set s is in UCB i,k if τi has a useful cache block in cache-set s at PP i,k.

• UCB i,k (N-way Set-associative LRU cache) – a multiset of useful cache blocks at PP i,k such that N instances of a cache-set s are present in UCB i,k if τi has a useful cache block in cache-set s at PP i,k.

The UCB i,k definition for set-associative LRU cache also holds for the direct-mapped cache, since the direct-mapped cache is just a special case of set-associative cache, i.e. one-way set-associative cache. However, in case of direct-mapped cache, we choose to separately define UCB i,k as a set (instead of a multiset) in order to reduce the complexity of the examples given in the thesis, and improve its readability.

On a task level, for each τi ∈ Γ, we define the following terms:

• ECB i – a set of evicting cache blocks of τi, such that cache-set s is in ECB i if and only if a memory block m from s may be accessed during the execution of τi. This set is also defined with the following equation:

$$\mathit{ECB}_i = \bigcup_{k=1}^{z_i} \mathit{ECB}^{BB}_{i,k}$$

• UCB i (Direct-mapped cache) – a set of useful cache blocks throughout the execution of τi, which is defined as the union of useful cache blocks at any preemption point of τi. This is expressed with the following equation:

$$\mathit{UCB}_i = \bigcup_{k=1}^{v_i} \mathit{UCB}_{i,k} \qquad (5.1)$$

• UCB i (N-way set-associative LRU cache) – a multiset of all potentially useful cache blocks throughout the execution of τi. It is defined as a multiset union of all cache sets where a useful cache block may reside. Each cache set is accounted N times, accounting each of its cache blocks.

$$\mathit{UCB}_i = \biguplus_{k=1}^{v_i} \bigl\{\, s \mid s \in \mathit{UCB}_{i,k} \,\bigr\}_{N} \qquad (5.2)$$

The above equation uses the multiset-union operation, which in the above case inserts each cache set s exactly N times in the resulting multiset UCB i, given that there is a useful memory block in s throughout the execution of τi.
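A minimal Python sketch of these task-level definitions follows, using plain sets for the direct-mapped case and a Counter for the N-way LRU multiset; all cache-set contents are hypothetical example values.

```python
# Minimal sketch: task-level ECB/UCB sets. Contents are hypothetical.
from collections import Counter

ecb_bb = [{0, 1}, {1, 2}, {3}]   # ECB^BB_{i,k}, one set per basic block
ucb_pp = [{0, 1}, {1, 3}]        # UCB_{i,k}, one set per preemption point
N = 2                            # associativity of the N-way LRU cache

# ECB_i: union over all basic blocks of the task.
ECB_i = set().union(*ecb_bb)     # {0, 1, 2, 3}

# UCB_i, direct-mapped (Eq. 5.1): union over all preemption points.
UCB_i_dm = set().union(*ucb_pp)  # {0, 1, 3}

# UCB_i, N-way LRU (Eq. 5.2): every cache set holding a useful block
# anywhere in tau_i contributes exactly N instances, one per cache block.
UCB_i_lru = Counter({s: N for s in set().union(*ucb_pp)})
print(UCB_i_lru)                 # Counter({0: 2, 1: 2, 3: 2})
```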

5.3 Task model after preemption point selection

In the previous section, we defined the general model of a preemptive task. However, as we described in Section 2.4.1, given a task τi, it is possible to declare non-preemptive regions throughout the task's code, which may change its potential execution as well as its control-flow graph CFGi. In this section, we present the task notation upon preemption point selection.

With δi,k (1 ≤ k ≤ li) we denote all resulting non-preemptive regions after the preemption point selection, where li is the total number of resulting NPRs. An NPR δi,k is characterised with the tuple ⟨BB i,s, BB i,e⟩, where BB i,s is the entry or starting node of δi,k, and BB i,e is its ending node. We say that a basic block BB i,b ∈ δi,k iff BB i,b can be executed on some path starting from BB i,s and finishing at BB i,e of δi,k. Since all basic blocks also represent the minimum possible non-preemptive regions, if after the preemption point selection a basic block BB i,b is not in any of the declared NPRs, we assume an artificially placed NPR δi,a such that BB i,s = BB i,e = BB i,b, meaning that there is a direct mapping between the basic block BB i,b and the non-preemptive region δi,a.

As described above, preemption point selection significantly affects the way a task can be preempted and executed. Thus, with CFG′i we denote the updated control flow graph of τi, represented with the tuple ⟨BS′i, ES′i, sn′i, en′i⟩, where BS′i represents the set of non-preemptive regions of a program, and ES′i ⊆ BS′i × BS′i represents the set of edges between the non-preemptive regions. ES′i contains the subset of edges from the original set ES i, such that all the edges from ES i that are now contained within the declared NPRs are removed. Also, each δi,k inherits the entry edges of its entry node BB i,s, and all the exit edges of its exit node BB i,e. Furthermore, each edge also represents a selected preemption point, since many preemption points are no longer possible at runtime upon the declaration of non-preemptive regions. We denote the set of selected preemption points of τi with SPP i.


Figure 5.2. Example of CFG′i of a task τi after the declaration of non-preemptive regions.

The worst-case execution time of δi,k is denoted with qi,k, assuming no preemptions during the execution of any NPR from which δi,k can be reached.

In Figure 5.2 we show an example of CFG′i of a preemptive task τi after the declaration of non-preemptive regions. The illustrated task consists of three NPRs (δi,1, δi,2, and δi,3), and two selected preemption points (PP i,1 and PP i,2), meaning that the total number of non-preemptive regions is li = 3.

Regarding the cache-block classification, we define ECB i,k as a set of evicting cache blocks of δi,k such that a cache-set s is in ECB i,k if some cache block in s may be accessed throughout the execution of any basic block in δi,k. It is formally defined in the following equation:

$$\mathit{ECB}_{i,k} = \bigcup_{BB_{i,a} \in \delta_{i,k}} \mathit{ECB}^{BB}_{i,a}$$

The set ECB i of evicting cache blocks of τi remains the same as defined in the previous section.

The set UCB i,k of useful cache blocks at PP i,k remains the same as defined in the previous section, with the only difference that now it holds only for the selected preemption points, i.e. for each PP i,k ∈ ES′i : ES′i ∈ CFG′i. The same holds for the set UCB i of useful cache blocks of τi, with the additional difference that Equations 5.1 and 5.2 need to account only for the UCB i,k terms of selected preemption points.
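A minimal sketch of this NPR-level eviction set, aggregating the per-basic-block ECBs of an NPR (block contents are hypothetical):

```python
# Minimal sketch: ECB_{i,k} of an NPR as the union of the ECB^BB sets of
# its basic blocks. Block memberships and cache sets are hypothetical.
ecb_bb = {"BB1": {0, 1}, "BB2": {1, 2}, "BB3": {3}}

def ecb_of_npr(npr_blocks, ecb_bb):
    """Union of ECB^BB_{i,a} over all basic blocks BB_{i,a} in the NPR."""
    return set().union(*(ecb_bb[b] for b in npr_blocks))

delta_i1 = ["BB1", "BB2"]            # basic blocks contained in delta_{i,1}
print(ecb_of_npr(delta_i1, ecb_bb))  # {0, 1, 2}
```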


In the remainder of the thesis, we also consider two special cases of the above-described general task model: the fully-preemptive task and the sequential task.

5.4 Fully-preemptive task

In the existing literature, it is assumed that during the execution of a fully-preemptive task, upon the arrival of a task with a higher priority, preemption occurs immediately, without any blocking. In this case, we assume that basic blocks are negligibly small.

5.5 Sequential task model (runnables)

In the sequential task model, it is assumed that the CFG of a task consists of a linear sequence of non-preemptive regions, also called subjobs when NPRs are predefined atomic execution segments (similar to basic blocks). Such a task model is, for example, often used in the automotive industry [61], [103], [104], [105], where the term runnable is used as the naming convention for a subjob. This model represents a special case of the task model from the previous section, and we further describe its simplified notation.


Figure 5.3. Example of a simplified sequential task model with non-preemptive regions.

A task consists of a sequence of li non-preemptive regions, separated by li − 1 preemption points (see Figure 5.3). The k-th NPR of τi is denoted with δi,k (1 ≤ k ≤ li), and its worst-case execution time, assuming no preemptions during the execution of δi,1, ..., δi,k−1, is denoted with qi,k. Thus, Ci can be expressed as

$$C_i = \sum_{k=1}^{l_i} q_{i,k}$$

and the worst-case execution time of the first li − 1 NPRs is denoted with Ei and is equal to

$$E_i = \sum_{k=1}^{l_i - 1} q_{i,k}$$

The k-th preemption point of τi is denoted with PP i,k.
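A minimal sketch of these quantities for a hypothetical sequence of NPR worst-case execution times:

```python
# Minimal sketch: C_i and E_i of a sequential task, from hypothetical
# worst-case execution times q_{i,k} of its non-preemptive regions.
q = [10, 7, 12, 5]      # q_{i,1} .. q_{i,l_i}, here l_i = 4

C_i = sum(q)            # WCET without preemption delays: 34
E_i = sum(q[:-1])       # WCET of the first l_i - 1 NPRs: 29
num_pp = len(q) - 1     # l_i - 1 = 3 preemption points
print(C_i, E_i, num_pp)
```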


As explained in the previous section, the set of evicting cache blocks throughout the execution of δi,k is denoted with ECB i,k, and the set of useful cache blocks at PP i,k is denoted with UCB i,k.

5.6 Relevant sets of tasks

Throughout the thesis, we also use the following notation to refer to different sets of tasks.

• hp(i) : The set of tasks with priorities higher than Pi.

• hpe(i) : hp(i) ∪ {τi}.

• lp(i) : The set of tasks with priorities lower than Pi.

• aff (i, h) : The set of tasks that can affect the worst-case response time Ri and can be preempted by τh, i.e. aff (i, h) = hpe(i) ∩ lp(h).
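A minimal Python sketch of these task sets, for tasks indexed 1..n in decreasing priority order (the indexing convention follows Section 5.1; the concrete numbers are only an example):

```python
# Minimal sketch: relevant task sets for tasks indexed 1..n in
# decreasing priority order (index 1 has the highest priority).
def hp(i):
    """Tasks with priority higher than P_i."""
    return set(range(1, i))

def hpe(i):
    """hp(i) together with tau_i itself."""
    return hp(i) | {i}

def lp(i, n):
    """Tasks with priority lower than P_i."""
    return set(range(i + 1, n + 1))

def aff(i, h, n):
    """Tasks that can affect R_i and can be preempted by tau_h."""
    return hpe(i) & lp(h, n)

print(aff(4, 2, 5))  # {3, 4}: hpe(4) = {1,2,3,4}, lp(2) = {3,4,5}
```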

5.7 CRPD notation

On a lower level, we consider that a cache-related preemption delay γ is computed as the upper bound on the number of cache block reloads, multiplied by a constant BRT, which is the longest time needed for a single memory block to be reloaded into cache memory, i.e. the memory block reload time. The general formula is:

$$\gamma = \#\mathit{reloads} \times \mathit{BRT} \qquad (5.3)$$

Based on this concept, we define an upper bound ξi,k on the CRPD resulting from a preemption at PP i,k, using the following equations. For a direct-mapped cache we compute:

$$\xi_{i,k} = \Bigl|\, \mathit{UCB}_{i,k} \cap \bigcup_{\tau_h \in hp(i)} \mathit{ECB}_h \,\Bigr| \times \mathit{BRT} \qquad (5.4)$$

The above equation accounts for the fact that the upper bound on the number of reloads is equal to the maximum number of cache blocks that may be cached at PP i,k, may be reused in the remaining part of the execution of τi, and can be evicted by one of the tasks with higher priority than τi.


In case of a set-associative LRU cache we compute:

$$\xi_{i,k} = \Bigl|\, \mathit{UCB}_{i,k} \cap_{m} \bigcup_{\tau_h \in hp(i)} \mathit{ECB}_h \,\Bigr| \times \mathit{BRT} \qquad (5.5)$$

where the result of the multiset intersection ∩m is a multiset that contains each instance of an element from UCB i,k if the element is also in ECB h. This operation accounts for the fact that each cache block of a cache set s is evicted if s is in ECB h. In the remainder of the thesis, we always use the basic intersection operation ∩, since all the examples and equations are given for the direct-mapped cache. However, in the case of LRU set-associative caches, it is only necessary to use ∩m instead, wherever there is an intersection between a UCB multiset and an ECB set.
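A minimal sketch of both bounds, modelling the LRU multiset with a Counter; the cache-set contents and the BRT value are hypothetical example values.

```python
# Minimal sketch: CRPD bound xi_{i,k} for direct-mapped and N-way LRU
# caches. Cache-set contents and BRT are hypothetical example values.
from collections import Counter

BRT = 8                                # block reload time (time units)
ucb_dm = {0, 1, 3}                     # UCB_{i,k}, direct-mapped (a set)
ucb_lru = Counter({0: 2, 1: 2, 3: 2})  # UCB_{i,k}, 2-way LRU (a multiset)
ecb_hp = [{0, 2}, {1, 5}]              # ECB_h of the higher-priority tasks

evicting = set().union(*ecb_hp)        # union of ECB_h over hp(i)

# Direct-mapped (Eq. 5.4): plain set intersection.
xi_dm = len(ucb_dm & evicting) * BRT   # |{0, 1}| * 8 = 16

# N-way LRU (Eq. 5.5): the multiset intersection keeps every instance of
# a useful cache set that some higher-priority task may evict.
xi_lru = sum(c for s, c in ucb_lru.items() if s in evicting) * BRT
print(xi_dm, xi_lru)                   # 16 32
```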

5.8 System assumptions and limitations

Throughout the thesis, we primarily assume single-core systems, except in Chapter 9 where we assume multi-core systems. However, there is an important system assumption which holds for both system types, and it deals with timing anomalies. A timing anomaly [106] refers to a behavior of a processor architecture where a local best case causes a global worst case. In some systems it is possible that upon a cache hit, a roll-back of an instruction is necessary, which may lead to a longer execution time compared to the situation of a cache miss [20]. This situation is referred to as a domino effect, and if it occurs, the delay it causes cannot be bounded by a constant, but is instead proportional to the total execution time of a task.

Wilhelm et al. [107] defined the following classification of system architectures, depending on whether a system may exhibit a timing anomaly:

• A fully timing-compositional architecture – a system architecture which does not exhibit timing anomalies.

• A compositional architecture with constant-bounded effects – a system architecture which exhibits timing anomalies but no domino effects.

• A non-compositional architecture – a system architecture which exhibits timing anomalies and domino effects.

As shown by Lundqvist et al. [108] and Altmeyer et al. [85], a separate computation of the timing and CRPD bounds is proven to be safe only for the analysis of target systems without timing anomalies. Thus, in this thesis we assume system architectures without timing anomalies.


Another limitation is that in case of FIFO and PLRU cache-replacement policies, the concepts of useful and evicting cache blocks cannot be applied, as shown by Burguiere et al. [109]. Therefore, the proposed methodologies that use the above cache block concepts cannot be applied for FIFO and PLRU replacement policies.

5.9 Summary

In this chapter, we described the system and task model, along with the mathematical terminology used in the thesis. Also, we described the system and model limitations for the contributions that follow.

Chapter 6

Preemption-delay analysis for tasks with fixed preemption points

In the Fixed Preemption Points approach, the goal is often to decrease the number of possible preemption points, minimise preemption delays, and increase the predictability of a system, since there are fewer task interactions that timing-related analyses need to account for, compared to fully-preemptive tasks. However, the existing approaches over-approximate the computation of CRPD bounds, without seizing the increased predictability to derive tighter approximations.

In the remainder of the chapter, we present a novel method for computing safe and tight CRPD bounds. We consider sequential tasks (see Section 5.5), with fixed preemption points, running on single-core systems with a single-level direct-mapped or set-associative LRU cache. The chapter is divided into the following sections:

• Problem statement – a description of the identified problems in SOTA,

• Computation of tight CRPD bounds – the proposed solution,

• Evaluation – a description of experiments and their results,

• Summary and Conclusions of the chapter.



Table 6.1. List of important symbols used in this chapter (CRPD terms are upper bounds).

Symbol : Brief explanation
Γ : Taskset of size n
τi : Task with index i
Ti : The minimum inter-arrival time of τi
Di : Relative deadline of τi
Ci : The worst-case execution time of τi without CRPD
$C_i^{\gamma}$ : The worst-case execution time of τi with CRPD
Pi : Priority of τi
Var i : Set of MILP variables that represent all preemptions on τi
Coni : Set of MILP constraints that represent infeasible preemption combinations on τi
Goal i : Goal function of the MILP for computing the CRPD upper bound on τi
γi : Upper bound on the CRPD of τi
hp(i) : Set of tasks with higher priority than Pi
δi,k : The k-th non-preemptive region of τi
qi,k : The worst-case execution time of the k-th NPR of τi
ECB i,k : Set of accessed cache blocks during δi,k
PP i,k : The k-th preemption point of τi
UCB i,k : Set of useful cache blocks at PP i,k
Xh,k : Boolean variable representing the case where an instance of τh can affect the CRPD of τi by preempting at PP i,k (Xh,k ∈ Var i)
ξi,k : CRPD resulting from a complete preemption scenario at PP i,k
$I_i^{k,l}$ : Upper bound on the time duration from the start time of δi,k until the start time of δi,l+1
m : Cache block
BRT : Block reload time
CS : Cache size
A(i, k, m) : A set of subsequent preemption points, starting from PP i,k, which upon preemption can lead to at most one reload of m


6.1 Problem statement

Many papers which consider scheduling of tasks with fixed preemption points, e.g., [62, 63, 12], assume a CRPD approximation which accounts for the worst-case scenario that all possible preemption points are preempted such that each point exhibits the maximum eviction scenario of its useful cache blocks. Thus, for a preempted task τi, the worst-case execution time with accounted CRPD is computed as:

$$C_i^{\gamma} = C_i + \sum_{k=1}^{l_i - 1} \xi_{i,k}$$

This computation can lead to a significant CRPD over-approximation. We describe the following two main problems which may be the source of over-approximation: 1) accounting for infeasible preemption combinations, and 2) accounting for infeasible useful cache block reloads.

6.1.1 Infeasible preemptions

In Figure 6.1 we show the task τ2 with three preemption points and four non-preemptive regions with their worst-case execution times. We also show the preempting task τ1 with C1 = 20 and T1 = 65.

We show a scenario where it is obvious that if τ1 preempts τ2 at PP 2,1, then it cannot also preempt at PP 2,2. This is the case because the next instance of τ1 cannot be released before the latest start time of the third non-preemptive region δ2,3. Let us assume that an instance of τ2 started to execute. If an instance of τ1 is released during the execution of δ2,1, thus definitely preempting at PP 2,1, can the next instance of τ1 preempt τ2 at PP 2,2?


Figure 6.1. A preempted task τ2 with three preemption points (PP 2,1, PP 2,2 and PP 2,3), and four non-preemptive regions with worst-case CRPD at each point. Top of the figure: preempting task τ1 with C1 = 20, and T1 = 65.


We first compute the maximum time interval between the start time of δ2,1 and the start time of δ2,3, and obtain:

$$q_{2,1} + C_1 + \xi_{2,1} + q_{2,2} = 59$$

However, the next instance of τ1 is released, at the earliest, 65 time units after the previous one, i.e. after the latest start time of δ2,3. Therefore, preempting at both PP 2,1 and PP 2,2 is infeasible. By analysing the other infeasible preemptions from the example shown in Figure 6.1, we would see that τ1 can preempt τ2 only in two scenarios: 1) τ1 preempts τ2 at PP 2,1 and PP 2,3, or 2) τ1 preempts τ2 only at PP 2,2. Assuming that τ2 is preempted by τ1 at all three points is safe, but it might be a significant over-approximation. Finding which of the feasible preemption combinations results in the maximum CRPD is not a trivial problem, especially with a higher number of preempting tasks, and it is solved in Section 6.2.

6.1.2 Infeasible useful cache block reloads

In Figure 6.2, we show the same tasks: τ2 with its useful cache block sets, and the preempting task τ1 with the set of its evicting cache blocks. Inside the non-preemptive regions, we show the accessed cache blocks, e.g., during the first non-preemptive region of τ2 those are: 1, 2, 3, and 4. Since they are all used in the remaining part of the task execution, they belong to the UCB 2,1 set. Notice that for the second non-preemptive region, the only accessed cache block is 1, but UCB 2,2 is equal to {2, 3, 4}, since the cache blocks 2, 3, and 4 are cached and may be reused in the remaining part of the task execution.


Figure 6.2. The preempted task τ2 with three preemption points with defined UCB sets (UCB 2,1, UCB 2,2, and UCB 2,3), and four non-preemptive regions. The cache block accesses throughout the task execution are shown as circled integer values for each cache block.


Let us ignore for now the analysis from the previous example and assume that τ1 may preempt τ2 at all three preemption points. Considering the cache block 2, it may be evicted when τ1 preempts τ2 at PP 2,1, or at PP 2,2. The existing CRPD analysis would account for two reloads, because cache block 2 is in the useful cache block sets of both points, and it is in ECB 1 of τ1. However, this is an over-approximation since the analysis would over-approximate the number of reloads, which in reality can be at most one. The same holds for cache blocks 3 and 4, since τ1 may evict them at any of the three points, but at most one reload should be accounted for. Accounting for the precise number of cache block reloads is another problem that we address in this chapter, and for that, we use the following observation:

Between two consecutive accesses of a cache block m during the execution of τi, at most one reload of m should be accounted for in the CRPD analysis for τi.

Let UCB i,k be the useful cache block set at PP i,k of a preempted task τi, and let m be a cache block such that m ∈ UCB i,k. Then at most one reload of m should be accounted for in the CRPD of τi, until the first succeeding non-preemptive region δi,l (k < l) in which m may be re-accessed.


Figure 6.3. The preempted task τi with a cache block m accessed before preemption point PP i,k and re-accessed at δi,l. Top: the preempting task τh that evicts m and preempts τi at all preemption points between PP i,k and δi,l.

Existing papers in the domain of CRPD estimation account for cache block reloads based on useful cache blocks at preemption points; thus, they would account for l − k − 1 reloads of m from PP i,k to δi,l (see Figure 6.3). Although m may be a useful block at all preemption points between two consecutive accesses within τi, such a computation would lead to a significant over-approximation, since a cache block can be reloaded due to preemption only within the region where it is requested, which in this example is the region δi,l.


6.2 Computation of tight CRPD bounds

In this section, we describe the proposed Mixed Integer Linear Program (MILP), which jointly addresses the two stated sources of over-approximation described in the previous section. The overview of the method is given in Algorithm 1. The algorithm considers a taskset Γ with tasks ordered in decreasing priority order. For each task τi it computes the CRPD bound γi by solving the defined MILP, and updates the WCET of τi, which after the update accounts for CRPD and is then denoted with $C_i^{\gamma}$. The proposed MILP is defined in three steps:

1. First, we generate the variables that represent the possible preemptions from the preempting tasks at different preemption points of the preempted task.

2. Second, we generate the constraints which define infeasible preemption combinations.

3. Third, we generate a goal function which accounts for the infeasible useful cache block reloads described in Section 6.1, jointly accounting for the infeasible preemption combinations.

In the following subsections, we use the example defined in Section 6.1, shown in Figures 6.1 and 6.2.

Data: Task set Γ
Result: Set Y of CRPD values for each task from Γ
1:  Y ← ∅
2:  for i ← 2 to n do
3:      Var i ← Generate variables that represent all preemptions on τi
4:      Coni ← Generate constraints which define infeasible preemption combinations
5:      Goal i ← Generate a goal function
6:      γi ← Compute CRPD bound by solving the MILP ⟨Var i, Coni, Goal i⟩
7:      Y ← Y ∪ {γi}
8:      $C_i^{\gamma}$ ← Ci + γi
9:  end
10: return Y

Algorithm 1: Algorithm for tightening the upper bounds on CRPD of tasks with fixed preemption points.
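A minimal Python skeleton of Algorithm 1 is sketched below; the generate_* helpers and the MILP solver are hypothetical stand-ins for the three steps described above, not the thesis implementation.

```python
# Minimal skeleton of Algorithm 1. The generate_* helpers and solve_milp
# are hypothetical stand-ins for the steps described in the text.
def tighten_crpd_bounds(tasks, generate_vars, generate_cons,
                        generate_goal, solve_milp):
    """For each task, in decreasing priority order, bound its CRPD and
    fold the bound into its worst-case execution time."""
    Y = set()
    for i in range(2, len(tasks) + 1):    # tau_1 cannot be preempted
        var_i = generate_vars(tasks, i)   # step 1: preemption variables
        con_i = generate_cons(tasks, i)   # step 2: infeasibility constraints
        goal_i = generate_goal(tasks, i)  # step 3: goal function
        gamma_i = solve_milp(var_i, con_i, goal_i)
        Y.add(gamma_i)                    # collect gamma_i
        tasks[i - 1]["C_gamma"] = tasks[i - 1]["C"] + gamma_i
    return Y
```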


6.2.1 Variables

For each preempted task τi, we generate (li − 1) × (i − 1) boolean variables, where li − 1 is the number of preemption points of τi, and i − 1 is the number of higher-priority (preempting) tasks. Each boolean variable Xh,k represents the case where an instance of the preempting task τh can affect the preemption cost of the preemption point PP i,k. The set Var i of boolean variables is formally defined as:

$$\mathit{Var}_i = \bigl\{\, X_{h,k} \in \{0,1\} \mid 1 \le h < i,\ 1 \le k \le l_i - 1 \,\bigr\}$$

Considering the running example from Figures 6.1 and 6.2, the set Var 2 is:

$$\mathit{Var}_2 = \bigl\{\, X_{1,1} \in \{0,1\},\ X_{1,2} \in \{0,1\},\ X_{1,3} \in \{0,1\} \,\bigr\}$$

6.2.2 Constraints

Considering the constraints, we are interested in identifying infeasible preemption combinations, focusing on preemption scenarios where two instances of a preempting task τh can directly affect the cost of only one of the two preemption points PP i,k and PP i,l (k < l). For this purpose, we use $I_i^{k,l}$, an upper bound on the time duration from the start time of δi,k until the start time of δi,l+1 (see Table 6.1).


$I_i^{k,l}$ is computed as $I_i^{k,l}(z)$ via the recurrence in Equation 6.1, where z is the lowest value for which $I_i^{k,l}(z) = I_i^{k,l}(z+1)$:

$$I_i^{k,l}(0) = \sum_{w=k}^{l} \left( q_{i,w} + \xi_{i,w} \right) + \sum_{\tau_h \in hp(\tau_i)} C_h^{\gamma}$$

$$I_i^{k,l}(z) = \sum_{w=k}^{l} \left( q_{i,w} + \xi_{i,w} \right) + \sum_{\tau_h \in hp(\tau_i)} \left( \left\lfloor \frac{I_i^{k,l}(z-1)}{T_h} \right\rfloor + 1 \right) \times C_h^{\gamma} \qquad (6.1)$$

Next, we show how the time interval $I_i^{k,l}$ can be used to identify infeasible preemption combinations.

Lemma 6.1. If task τh affects the preemption cost at PP i,k, then an instance of τh was released between the start of δi,k and the start of δi,k+1.

Proof. An instance of τ_h released before the start of δ_{i,k} will either not preempt τ_i at all, or execute during a preemption before the start of δ_{i,k}. An instance released after the start of δ_{i,k+1} clearly did not affect a preemption at PP_{i,k}.

Note that an instance released after the end of δ_{i,k} cannot cause a preemption at PP_{i,k}, but it can still affect the preemption delay if it arrives during the execution of some other task that caused the preemption.

Corollary 6.1. If task τ_h affects the preemptions at both preemption points PP_{i,k} and PP_{i,l} (where k < l), then two instances of τ_h were released between the start of δ_{i,k} and the start of δ_{i,l+1}.

Proof. A single instance of τ_h cannot affect both preemptions, and the interval in which they were released is given by Lemma 6.1.

Proposition 6.1. If I_i^{k,l} ≤ T_h, then task τ_h cannot affect one instance of τ_i at both preemption points PP_{i,k} and PP_{i,l}.

Proof. Proof by contradiction: Assume that I_i^{k,l} ≤ T_h and that τ_h affects one instance of τ_i at both preemption points PP_{i,k} and PP_{i,l}. Then, by Corollary 6.1, two instances of τ_h were released between the start of δ_{i,k} and the start of δ_{i,l+1}, and thus within an interval of length I_i^{k,l}. This contradicts the minimum inter-arrival time T_h.

Therefore, we generate a constraint whenever the inequality I_i^{k,l} ≤ T_h holds. Since the constraint expresses the case when τ_h cannot affect τ_i at the two points PP_{i,k} and PP_{i,l}, it has the following form:

X_{h,k} + X_{h,l} ≤ 1


The constraint specifies that at most one of the two preemptions, X_{h,k} or X_{h,l}, is possible. The whole set Con_i of constraints is formally defined as:

Con_i = { X_{h,k} + X_{h,l} ≤ 1 | 1 ≤ h < i ∧ 1 ≤ k < l < l_i ∧ I_i^{k,l} ≤ T_h }

Considering the running example from Figures 6.1 and 6.2, we compute that: I_2^{1,2} = 62, and 62 ≤ T_1; I_2^{2,3} = 55, and 55 ≤ T_1; and I_2^{1,3} = 99, and 99 > T_1. Hence, the set of constraints of τ_2 is:

Con_2 = { X_{1,1} + X_{1,2} ≤ 1; X_{1,2} + X_{1,3} ≤ 1 }

where the first constraint follows from I_2^{1,2} ≤ T_1 and the second from I_2^{2,3} ≤ T_1.

As discussed previously, the scenario where instances of τ_1 preempt τ_2 at both PP_{2,1} and PP_{2,2} is not possible. Also, the scenario where τ_1 preempts at both PP_{2,2} and PP_{2,3} is not possible. Preempting at PP_{2,1} and PP_{2,3} is however possible, as are the scenarios of preempting only at one or none of the three preemption points.
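The following Python sketch shows how the fixed point of Equation 6.1 and the resulting constraint set can be computed. All parameter names (q, xi, C_gamma, T, hp) are our own encodings of the quantities q_{i,w}, ξ_{i,w}, C_h^γ, T_h, and hp(τ_i); the iteration assumes the recurrence converges.

import math

def interval_bound(k, l, q, xi, C_gamma, T, hp):
    """Least fixed point I_i^{k,l} of the recurrence in Equation 6.1.

    q[w], xi[w]: WCET and preemption cost of tau_i's w-th NPR
    C_gamma[h], T[h]: CRPD-inflated WCET and period of tau_h
    hp: indices of the tasks with higher priority than tau_i
    """
    base = sum(q[w] + xi[w] for w in range(k, l + 1))
    I = base + sum(C_gamma[h] for h in hp)                      # I(0)
    while True:
        nxt = base + sum((math.floor(I / T[h]) + 1) * C_gamma[h] for h in hp)
        if nxt == I:                                            # fixed point
            return I
        I = nxt

def build_constraints(n_points, q, xi, C_gamma, T, hp):
    """Con_i: one pair per constraint X_{h,k} + X_{h,l} <= 1, generated
    whenever I_i^{k,l} <= T_h (Proposition 6.1)."""
    cons = []
    for k in range(1, n_points + 1):
        for l in range(k + 1, n_points + 1):
            I_kl = interval_bound(k, l, q, xi, C_gamma, T, hp)
            cons += [((h, k), (h, l)) for h in hp if I_kl <= T[h]]
    return cons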

6.2.3 Goal function

Finally, we define the goal function of the constraint satisfaction model. The goal function computes the CRPD upper bound considering the infeasible preemption combinations defined by the constraints, but it also accounts for the precise number of cache block reloads which should be considered for the CRPD approximation.

To account for the precise maximum number of cache block reloads, we generate a goal function based on the observation from Section 6.1.2. For each useful cache block m at PP_{i,k} (m ∈ UCB_{i,k}), we first define the sequence of non-preemptive regions during which m can be reloaded at most once by τ_i. Therefore, we define the set A(i, k, m), which for a task τ_i denotes the sequence of preemption points after PP_{i,k} during which we need to account for at most one reload of m. Formally:

A(i, k, m) = { l | k < l < x }, where δ_{i,x} is the first non-preemptive region after δ_{i,k} in which τ_i re-accesses m.

Considering the example in Figure 6.2, A(2, 1, 3) = {2, 3}, since the preemption points PP_{2,2} and PP_{2,3} succeed PP_{2,1} and precede the non-preemptive region (δ_{2,4}) where cache block 3 is re-accessed by τ_2. The CRPD estimation should account for at most one reload of cache block 3, even if there are preempting tasks that may evict it at both points (PP_{2,2} and PP_{2,3}).
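A possible implementation of A(i, k, m), given only the indices of the NPRs of τ_i that access m, is sketched below; the helper name reload_window and the input encoding are ours.

def reload_window(k, accesses_of_m):
    """A(i, k, m): the preemption points after PP_{i,k} that precede the
    first NPR in which tau_i re-accesses cache block m.

    accesses_of_m: sorted indices of the NPRs of tau_i that access m.
    Returns the empty set if m is never re-accessed after delta_{i,k}."""
    later = [x for x in accesses_of_m if x > k]
    if not later:
        return set()
    x = later[0]                       # first re-accessing NPR delta_{i,x}
    return set(range(k + 1, x))        # points PP_{i,k+1} .. PP_{i,x-1}

# Running example: block 3 is accessed in delta_{2,1} and delta_{2,4}, so
print(reload_window(1, [1, 4]))  # -> {2, 3}, i.e. A(2, 1, 3)
print(reload_window(3, [1, 4]))  # -> set(), i.e. A(2, 3, 3) is empty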

We propose the following goal function in order to account for both the infeasible preemptions and the precise number of useful cache block reloads:

$$\mathit{Goal}_i = \text{Maximize:} \sum_{PP_{i,k} \in \tau_i} \sum_{m \in UCB_{i,k}} \min\Big(1, \sum\{X_{h,k} \mid \tau_h \in hp(\tau_i) \wedge m \in ECB_h\}\Big) \times \Big(1 - \min\Big(1, \sum_{r \in A(i,k,m)} \sum\{X_{h,r} \mid \tau_h \in hp(\tau_i) \wedge m \in ECB_h\}\Big)\Big) \times BRT$$

The goal function consists of two expressions which compute the maximum number of necessary cache block reloads for τ_i, which, multiplied by the block reload time BRT, results in the upper bound γ_i on the CRPD of τ_i.

The first expression (line 1) of the goal function iterates over the useful cache blocks of the preemption points, one by one, accounting for the evictions from the higher priority tasks at those specified points. The minimum function accounts for at most one eviction of a cache block, even if it is in the evicting cache block set of more than one higher priority task. For a single cache block m, where m ∈ UCB_{i,k} at PP_{i,k}, the minimum function from the first line will result in:

• 1 : if there is at least one preemption from a task with m in its evicting cache block set.

• 0 : if there are no preemptions from the preempting tasks with m in their evicting cache block sets.

The second expression (line 2) of the goal function accounts for the precise number of useful cache block reloads. It computes whether there is an eviction of cache block m from PP_{i,k} until the first following non-preemptive region where m is re-accessed, given by A(i, k, m). If m is evicted during this interval, the expression results in 1, otherwise it results in 0.

Let us consider a cache block m at the specified preemption point PP_{i,k} such that m ∈ UCB_{i,k}, and let us assume that m is re-accessed by τ_i at δ_{i,l+1} (k ≤ l). Then, the product of the two expressions results in:


• 0 : if there are no feasible preemptions from the preempting tasks with m in their evicting cache block sets. This is the case because the first expression results in 0 and it is multiplied with the second expression.

• 0 : if m is in the evicting cache block set of at least one task that preempts at PP_{i,k}, but also in the evicting cache block set of at least one task that preempts from PP_{i,k+1} until PP_{i,l}. This is the case because both of the expressions result in 1, which finally results in: 1 × (1 − 1) = 0.

• 1 : if m is in the evicting cache block set of at least one task that preempts at PP_{i,k}, and there are no further preemption points until δ_{i,l+1} where m is evicted. This is the case because the first expression results in 1 and the second expression results in 0, which finally results in: 1 × (1 − 0) = 1.

For any cache block m, the formulation will account for at most one reload from PP_{i,k} until δ_{i,l+1}, accounting for it at the last preemption point where m is evicted during this interval. Reloads at the earlier preemption points where m is evicted in this interval will not be accounted for, since their terms result in 0.

Let us take for example cache block 3 from Figures 6.1 and 6.2. It is first accessed by τ_2 in the non-preemptive region δ_{2,1}, and then re-accessed by τ_2 at δ_{2,4}. It can be evicted either at PP_{2,1} or PP_{2,3}, as shown in Section 6.2.2, but the analysis should account for at most one reload. The terms of the goal function for this cache block at all preemption points are shown below (the three terms shown correspond to cache block m = 3):

Goal_2 = Maximize: BRT × (
    ... + X_{1,1} × (1 − min(1, X_{1,2} + X_{1,3})) + ...   ← k = 1
    ... + X_{1,2} × (1 − min(1, X_{1,3})) + ...             ← k = 2
    ... + X_{1,3} × (1 − 0) + ...                           ← k = 3
)

We omitted the remaining parts of the nested sum for the other m and k values of the goal function. Considering cache block 3 and its possible eviction at PP_{2,1}, we get the term X_{1,1} × (1 − min(1, X_{1,2} + X_{1,3})). This means that a preemption at PP_{2,1} contributes to the CRPD only if there is no preemption at PP_{2,2} or PP_{2,3}. A similar formulation holds for the same block at PP_{2,2} and PP_{2,3}, except that in the case of PP_{2,3} the set A(2, 3, 3) is empty, since there are no preemption points after PP_{2,3} where cache block 3 can be useful.
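Although in the MILP the min-terms are linearised by the solver model, the goal function itself can be evaluated directly for a fixed assignment of the X variables, as the following sketch shows. The demonstration data are illustrative stand-ins, not the exact parameters of Figure 6.1.

def goal_value(points, ucb, ecb, A, X, BRT):
    """Evaluate the goal function of Section 6.2.3 for a fixed 0/1
    assignment X of the preemption variables X_{h,k}.

    points : preemption-point indices of tau_i
    ucb[k] : useful cache blocks at PP_{i,k}
    ecb[h] : evicting cache blocks of higher-priority task tau_h
    A(k,m) : the set A(i, k, m) of later points to check for block m
    """
    reloads = 0
    for k in points:
        for m in ucb[k]:
            # First expression: is m evicted at PP_{i,k} at all?
            here = min(1, sum(X[(h, k)] for h in ecb if m in ecb[h]))
            # Second expression: is m evicted again before re-access?
            later = min(1, sum(X[(h, r)] for r in A(k, m)
                               for h in ecb if m in ecb[h]))
            reloads += here * (1 - later)   # at most one reload of m
    return reloads * BRT

# Cache block 3 of the running example, with illustrative data: useful at
# all three points, evictable only by tau_1, re-accessed in delta_{2,4}.
ucb = {1: {3}, 2: {3}, 3: {3}}
ecb = {1: {3}}                         # illustrative ECB_1, not Figure 6.1's
A = {(1, 3): {2, 3}, (2, 3): {3}, (3, 3): set()}
X = {(1, 1): 1, (1, 2): 0, (1, 3): 1}  # satisfies Con_2
print(goal_value([1, 2, 3], ucb, ecb, lambda k, m: A[(k, m)], X, BRT=8))
# -> 8: exactly one reload of block 3, charged at PP_{2,3}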


6.3 Evaluation

In the following experiments we evaluate several CRPD approximation methods and report the computed CRPD bounds. We show the estimations of the CRPD upper bounds computed by: the method proposed in this chapter, denoted IPR (resolving Infeasible Preemptions and Reloads); the method proposed in [32], denoted IP (resolving only infeasible preemptions); and the over-approximation OA, which accounts for the maximum eviction at all preemption points and is used in the state of the art, e.g. [62, 63, 12].

We used the open-source constraint programming solver Choco [110] on a device with a 2.9 GHz Intel Core i5 processor and 8 GB of 1867 MHz DDR3 RAM.

Experiment setup

In all of the experiments, tasksets are generated with a fixed utilisation of 0.8, using the UUniFast algorithm [31]. We randomly generated the minimum inter-arrival times from the uniform distribution [5 ms, 5 s], as this reasonably corresponds to real systems. The worst-case execution times are calculated as C_i = T_i × U_i, and the priorities are assigned in rate-monotonic order. We also randomly generated the number of non-preemptive regions from the uniform distribution [1, 10].

Regarding the cache setup, we used the one described by Altmeyer et al. [49], where memory blocks are represented with integer values from 0 to 256, and the maximum cache size CS is 256. The block reload time BRT is set to 8 μs. The evicting cache block set ECB_i of a task is generated using the UUniFast algorithm, proposed by Enrico Bini and Giorgio Buttazzo [31], such that CU = Σ_{i=1}^{n} |ECB_i| / CS, where |ECB_i| is the number of cache blocks in ECB_i. In cases when the algorithm returned values above 1, we assigned all cache blocks to ECB_i. The useful cache block set UCB_i of a task is randomly generated from ECB_i using the reload factor RF, representing the assumed reuse factor, which in real systems varies from very low (0%) to high (30%). Therefore, |UCB_i| = RF × |ECB_i|, where RF is uniformly generated from the range [0, 0.3]. Next, we randomly generated the useful cache block sets UCB_{i,k} for each non-preemptive region from the uniform distribution [0, 100 × |UCB_i|], and accordingly added evicting cache blocks such that the sequence of the consecutive useful cache block sets is bounded by two memory block accesses.
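For reference, the following is a sketch of the UUniFast utilisation generation and of the taskset construction described above; the helper names are ours, and the period range is expressed in microseconds.

import random

def uunifast(n, total_u):
    """UUniFast (Bini and Buttazzo [31]): n utilisations summing to total_u."""
    utils, remaining = [], total_u
    for i in range(1, n):
        nxt = remaining * random.random() ** (1.0 / (n - i))
        utils.append(remaining - nxt)
        remaining = nxt
    utils.append(remaining)
    return utils

def generate_taskset(n=10, total_u=0.8):
    """Sketch of the setup above: T_i uniform in [5 ms, 5 s] (microseconds),
    C_i = T_i * U_i, rate-monotonic priority order (shortest period first)."""
    tasks = [{"U": u} for u in uunifast(n, total_u)]
    for t in tasks:
        t["T"] = random.uniform(5_000, 5_000_000)
        t["C"] = t["T"] * t["U"]
    tasks.sort(key=lambda t: t["T"])
    return tasks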


First experiment

In the first experiment, we evaluated the average CRPD estimation over the generated tasksets, varying the total cache utilisation from 20% to 90% (see Figure 6.4); the number of tasks in each taskset was fixed to 10. As expected, IPR estimates considerably less pessimistic upper bounds compared to IP and OA. Moreover, it is evident that the benefit of using IPR over IP grows with increasing cache utilisation. For example, when CU = 20%, the average taskset CRPD estimation using IPR is 181 μs, which is significantly reduced compared to the estimation of 324 μs when using IP (by 44%), and the estimation of 601 μs when using OA (by 70%). When CU = 90%, the average taskset CRPD estimation using IPR is 1040 μs, which is a reduction of 58% compared with the bound of 2432 μs when using IP, and 70% compared to the estimation of 3280 μs when using OA. This is the case since, by increasing cache utilisation, more preempting tasks share the evicting cache blocks, which in turn increases the number of infeasible reloads that should be identified by the CRPD analysis.

Second experiment

In the second experiment, we evaluated the average CRPD estimation over the generated tasksets, varying the number of tasks from 3 to 10 (see Figure 6.5). In this experiment, we fixed the total cache utilisation to 40%. The results reveal that IPR dominates the other two approaches (IP and OA) even as the number of tasks in a taskset increases. This is the case since, by increasing the number of tasks, we also increase the number of preempting tasks and thus the number of possible preemptions.

[Figure 6.4 (plot omitted): average CRPD in microseconds versus cache utilisation (20% to 90%) for IPR, IP, and OA.]

Figure 6.4. CRPD estimation per taskset for different levels of cache utilization, calculated as the average over the 2000 generated tasksets.


[Figure 6.5 (plot omitted): average CRPD in microseconds versus taskset size (3 to 10 tasks) for IPR, IP, and OA.]

Figure 6.5. CRPD estimation per taskset, for different taskset sizes, calculated as the average over the 2000 generated tasksets.

For example, when n = 7, the average taskset CRPD estimation using IPR is 470 μs, which is a reduction of 48% compared to the bound of 896 μs when using IP, and even 69% compared to the estimation of 1560 μs when using OA. However, the average analysis time increases with the number of tasks in a taskset, from 48 ms when n = 3 to 2772 ms when n = 10. Since the analysis time varies considerably among different cases, we used a time limit of 40 seconds per taskset analysis; if the solver failed to provide a value within this time bound, we instead reported the CRPD value computed with OA. However, the proposed solution is an offline method and the analysis time can be significantly improved by using different solvers.¹

6.4 Summary and Conclusions

In this chapter, we proposed an improved Cache-Related Preemption Delay (CRPD) analysis for sporadic real-time systems under Fixed Preemption Point Scheduling, which reduces the pessimism in CRPD estimation compared to previous approaches. We first identified two potential sources of CRPD over-approximation: 1) infeasible preemption combinations, and 2) infeasible cache block reloads. To address those problems and compute more precise CRPD upper bounds, we proposed an MILP-based solution. The evaluation results show that the proposed approach significantly reduces the upper bounds on the CRPD estimation compared to previous methods, although with the increased complexity inherent in the application of MILP.

¹ We extracted a few of the most time-consuming (more than 40 s) cases from the evaluation, and with IBM CPLEX [111] they were solved in less than 200 ms.

In the next chapter, we propose a preemption-delay aware schedulability analysis for tasks with fixed preemption points.

Chapter 7

Preemption-delay aware schedulability analysis for tasks with fixed preemption points

In this chapter, we propose a novel preemption-delay aware schedulability analysis for sporadic sequential tasks (see Section 5.5) with fixed preemption points that are running on single-core systems with a single-level direct-mapped or set-associative LRU cache. The chapter is divided into the following sections:

• Self-pushing phenomenon – description of the self-pushing phenomenon.

• Feasibility analysis of LPS-FPP – a description of the seminal schedulability analysis for tasks with non-preemptive regions.

• Problem statement – a description of the identified problems in SOTA,

• Preemption-delay aware RTA for LPS-FPP – the proposed solution,

• Evaluation – a description of experiments and their results,

• Summary and Conclusions of the chapter.



Table 7.1. List of important symbols used in this chapter (CRPD terms are upper bounds).

Symbol          Brief explanation
Γ               Taskset
τ_i             Task with index i
T_i             The minimum inter-arrival time of τ_i
D_i             Relative deadline of τ_i
C_i             The worst-case execution time of τ_i without CRPD
P_i             Priority of τ_i
R_i             The worst-case response time of τ_i
L_i             Level-i active period
b_i             The maximum lower priority blocking on τ_i
γ_i             Upper bound on CRPD of τ_i
-               The largest CRPD caused from a single preempted point in τ_i
I_i             Upper bound on the time duration from the start time of the first NPR of τ_i until the start time of the last NPR of τ_i
l_i             Number of non-preemptive regions in τ_i
q_i^max         The maximum execution time of an NPR from τ_i
ECB_i           Set of accessed (evicting) cache blocks during τ_i
hp(i), lp(i)    Sets of tasks with higher and lower priority than P_i, respectively
hpe(i)          Set of tasks with higher priority than P_i, including τ_i
τ_{i,j}         The j-th job of τ_i
F_{i,j}         The latest finish time of τ_{i,j}
S_{i,j}         The latest start time of the last NPR of τ_{i,j}
δ_{i,k}         The k-th non-preemptive region of τ_i
q_{i,k}         The worst-case execution time of the k-th NPR of τ_i
ECB_{i,k}       Set of accessed (evicting) cache blocks during δ_{i,k}
PP_{i,k}        The k-th preemption point of τ_i
UCB_{i,k}       Set of useful cache blocks at PP_{i,k}
UCB_{i,k}^x     Set of cache blocks that must be cached at PP_{i,k}, and may be reused until the end of the x-th NPR
m               Cache block
t               Time duration
γ^∪             RCB-based upper bound on CRPD
γ^+             Preemption-based upper bound on CRPD
ub_{i,x}^m      Upper bound on reloads of m during the first x NPRs of τ_i
RCB_{i,x}       Multiset of reloadable cache blocks in the first x NPRs of τ_i
CRPD_{i,x}^h    A multiset of CRPDs that can be caused by a job of τ_h on each of the first x NPRs of a job of τ_i
γ_{i,x}^h(t)    CRPD caused by preemptions from jobs of τ_h on the first x NPRs of a single job of τ_i, within t
γ_{i,x}(t)      CRPD in the first x NPRs of a job of τ_i within duration t
γ_i(t)          CRPD in all the jobs of τ_i that can be released within t


7.1 Self-Pushing phenomenon

The self-pushing phenomenon is a scheduling anomaly which may occur in the scheduling of tasks with non-preemptive regions. It was identified and proved by Bril et al. [59]. It represents the case where the worst-case response time of a task does not necessarily occur in the first job after the critical instant, but may occur in one of the succeeding jobs during the Level-i active period. Following Bril et al. [112], we state the definition of this term, but first we define the Level-i pending workload.

Definition 7.1. The Level-i pending workload at time point t is the amount of unfinished execution of jobs with a priority higher than or equal to P_i, released strictly before t.

Definition 7.2. A Level-i active period L_i is an interval [a, b] such that the Level-i pending workload is positive for all t ∈ (a, b) and zero in a and b.
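The length of the Level-i active period is commonly obtained as the least fixed point of L_i = b_i + Σ_{τ_h ∈ hpe(i)} ⌈L_i / T_h⌉ × C_h. The sketch below follows that standard formulation, which is not spelled out in this chapter; the preemption-delay aware analysis later in the chapter refines it.

import math

def level_i_active_period(b_i, hpe):
    """Least fixed point of L = b_i + sum_{tau_h in hpe(i)} ceil(L/T_h)*C_h.

    hpe: (C_h, T_h) pairs of the tasks with priority >= P_i, including
    tau_i itself. Standard formulation; convergence assumes the workload
    is feasible (utilisation below one)."""
    L = b_i + sum(C for C, _ in hpe)
    while True:
        nxt = b_i + sum(math.ceil(L / T) * C for C, T in hpe)
        if nxt == L:
            return L
        L = nxt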

Based on the above-defined active period, Bril et al. proved the following theorem on the worst-case response time of a task.

Theorem 7.1. The maximum response time of task τ_i under sporadic LP-FPPS is assumed in a Level-i active period that is started when τ_i has a simultaneous release with all higher priority tasks, and a subjob that produces the maximum lower priority blocking starts an infinitesimal time before the simultaneous release. (proposed and proved by Bril et al. [112])

[Figure 7.1 (diagram omitted): schedule of τ_1, τ_2, and τ_3 in which τ_3 suffers the maximum lower priority blocking of b_3 time units and its second job misses its deadline.]

Figure 7.1. Example of the self-pushing phenomenon.


In Figure 7.1, we provide an example of this phenomenon. We illustrate three non-preemptive tasks τ_1, τ_2, and τ_3 and their respective instances. In the beginning, we account for the maximum lower priority blocking on τ_3, which is b_3 time units long. Notice that although the first job τ_{3,1} finishes before its deadline, the second job τ_{3,2} misses its relative deadline.

The general computation of the worst-case response time for a task with non-preemptive regions is given by the following simplified equation:

$$R_i = \max_{j \in \left[1, \left\lceil \frac{L_i}{T_i} \right\rceil\right]} \{ F_{i,j} - (j - 1) \times T_i \} \qquad (7.1)$$

where F_{i,j} represents the latest finish time, from the simultaneous release, of the j-th job τ_{i,j} that is released within L_i. Thus, the worst-case response time R_i of τ_i is equal to the maximum relative latest finish time of a job τ_{i,j} which is released within L_i. In the next section, we describe the seminal feasibility analysis for limited-preemptive scheduling of tasks with fixed preemption points.
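Equation 7.1 transcribes directly into Python, assuming the latest finish times F_{i,j} have already been computed; the parameter encoding is ours.

import math

def worst_case_response_time(F, L_i, T_i):
    """Equation 7.1: R_i as the maximum relative latest finish time over
    all jobs released within the Level-i active period.

    F[j]: latest finish time F_{i,j} of the j-th job (1-based)."""
    n_jobs = math.ceil(L_i / T_i)
    return max(F[j] - (j - 1) * T_i for j in range(1, n_jobs + 1))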

7.2 Feasibility analysis for LPS-FPP

The seminal feasibility analysis for tasks with fixed preemption points was proposed by Yao et al. [57, 58], and here we briefly describe their work using Buttazzo's formulations [11]. For a single job τ_{i,j} of τ_i, the feasibility analysis is divided into two parts. First, it is necessary to compute the latest start time S_{i,j} of the last non-preemptive region. Then, the latest finish time F_{i,j} of the job is computed. This is performed in two steps because a task can experience interference from the higher priority tasks prior to its last NPR but, once the last NPR starts to execute, they cannot interfere any further. The latest start time S_{i,j} of the last NPR of τ_{i,j} is equal to the least fixed point of the following equation:

$$S_{i,j}^{(0)} = b_i + C_i - q_{i,l_i} + \sum_{\tau_h \in hp(i)} C_h$$

$$S_{i,j}^{(r)} = b_i + j \times C_i - q_{i,l_i} + \sum_{\tau_h \in hp(i)} \left( \left\lfloor \frac{S_{i,j}^{(r-1)}}{T_h} \right\rfloor + 1 \right) \times C_h \qquad (7.2)$$

where b_i denotes the maximum lower priority blocking that can be imposed on τ_i. The latest finish time F_{i,j} of the j-th job is computed as F_{i,j} = S_{i,j} + q_{i,l_i}, i.e. by adding the worst-case execution time q_{i,l_i} of the NPR δ_{i,l_i} to the latest start time of the j-th job's last NPR.
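Equation 7.2 and the finish-time computation transcribe directly into a fixed-point iteration; the encoding of hp(i) as a list of (C_h, T_h) pairs is ours, and convergence is assumed.

import math

def latest_start_time(j, b_i, C_i, q_last, hp):
    """Least fixed point of Equation 7.2: latest start time S_{i,j} of the
    last NPR of the j-th job; hp holds (C_h, T_h) pairs of hp(i)."""
    S = b_i + C_i - q_last + sum(C for C, _ in hp)            # S^(0)
    while True:
        nxt = (b_i + j * C_i - q_last
               + sum((math.floor(S / T) + 1) * C for C, T in hp))
        if nxt == S:
            return S
        S = nxt

def latest_finish_time(j, b_i, C_i, q_last, hp):
    """F_{i,j} = S_{i,j} + q_{i,l_i}."""
    return latest_start_time(j, b_i, C_i, q_last, hp) + q_last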


The worst-case response time R_i is finally defined as the maximum latest finish time of all jobs released during the Level-i active period (Equation 7.1).

Furthermore, Yao et al. [58] proved that the analysis can be simplified to the first job of each task if for each task τ_i it holds that D_i ≤ T_i, and if the taskset is feasible under fully-preemptive scheduling.

7.3 Problem statement

Considering the preemption related delay, Yao et al. [58] proposed inflating each C_i to C_i^γ by adding, for each of the l_i − 1 preemption points, the largest possible preemption cost that τ_i can experience at any one of its preemption points; here, C_i is the worst-case execution time without considering preemption delays. This can be a significant over-approximation because:

• the analysis omits the fact that the CRPD of a task τ_i can be caused only from the execution start of the first NPR δ_{i,1} until the latest start time of the last NPR δ_{i,l_i}.

• the analysis does not account for the case where a preempting task τ_h can preempt only a limited number of preemption points of a preempted task τ_i. Instead, it implicitly assumes that each preempting task can preempt τ_i at all of its preemption points.

• the analysis does not account for the fact that, between two accesses of a memory block, the block can be reloaded at most once due to preemption.

While it is possible to update the worst-case execution time from Equation 7.2 with the CRPD bound defined in the previous chapter, i.e. C_i^γ = C_i + γ_i, where γ_i is computed with Algorithm 1, the proposed algorithm does not resolve the first of the above-mentioned sources of over-approximation. Also, since MILP is known to be an NP-complete problem, its usability degrades with the increase in the number of preemption points, preempting tasks, or cache blocks which it needs to account for. Therefore, in the following section, we provide motivating examples for the novel method, and then we define a response-time analysis with a more precise preemption delay estimation.

7.3.1 Motivating Examples

In this section, we present two motivating examples for the novel response-time analysis. In Figure 7.2, we show a high-level scheduling perspective of a job τ_{i,j} with three preemption points (PP_{i,1}, PP_{i,2}, and PP_{i,3}), together with three preempting jobs of τ_h such that P_h > P_i.



[Figure 7.2 (diagram omitted): timeline of τ_{i,j} showing the blocking b_i, the interval I_i, the latest start time S_{i,j}, the latest finish time F_{i,j}, and the delay due to a cache block reload.]

Figure 7.2. Example of different time intervals used in the analysis.

Figure 7.2 shows that τ_i and τ_h are released immediately after the NPR causing the maximum blocking b_i on τ_{i,j} starts to execute. Further, we distinguish among three important time intervals. With S_{i,j} we denote the latest start time of the last NPR of τ_{i,j}, and with F_{i,j} we denote its latest finish time. However, notice that we also denote I_i, which is the maximum duration from the start time of the first NPR of τ_{i,j} until the start time of the last NPR of τ_{i,j}. Therefore, this interval also represents the maximum time interval during which higher priority jobs can affect the CRPD of τ_{i,j} due to preemptions. Note that the latest start time S_{i,j} of the last NPR includes the lower priority blocking (b_i time units long), and possibly some higher priority interference before the start time of τ_{i,j}. During those two events, τ_{i,j} cannot be preempted nor can its CRPD be affected. Therefore, I_i can be less than S_{i,j} and F_{i,j}, and its precise estimation is important, since an over-approximation of I_i may account for the CRPD impact of multiple preemptions which in reality cannot occur.

In Figure 7.3, we show the low-level perspective of the tasks, with the associated task and cache usage parameters. The accessed cache-sets are depicted as circled integers in the NPRs where they are accessed. For example, task τ_1 accesses cache blocks in cache-sets 1, 2, 3, and 4. Considering task τ_3, in its first NPR it accesses cache blocks in cache-sets 1 and 2, and so on. Therefore, the set of useful cache blocks at PP_{3,1} is UCB_{3,1} = {1, 2}, while at PP_{3,2} it is UCB_{3,2} = {1, 2, 3}, and at PP_{3,3} it is UCB_{3,3} = {1, 2, 3, 4}.


[Figure 7.3 (figure omitted): task parameters and per-NPR cache-set accesses of the running example tasks τ_1, τ_2, and τ_3.]

Figure 7.3. Running taskset example.

To motivate the necessity for the analysis at this level, assume that τ_1 preempts τ_3 at all of its three preemption points. Let us now focus on cache-set 1. Although it is considered that there might be a UCB residing in cache-set 1 at each of the three preemption points, this cache block can be reloaded at most once due to preemption. This reload can only occur within the last NPR of τ_3, since only in the last NPR is it re-accessed. Considering the worst-case eviction at each of those three points in isolation, as in the method proposed by Yao et al. [58], can introduce over-approximation, accounting for 3 block reloads in cache-set 1, as opposed to at most one possible reload. Next to this, we also consider several more properties of task interactions on the cache level, which are defined and formalised in the following analysis.
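The per-point useful cache block sets of the running example can be derived mechanically from the per-NPR access sets. The sketch below uses a simplified, NPR-granularity notion of usefulness (accessed in some NPR before the point and re-accessed after it), which matches how the example is described; the access sets assumed for δ_{3,2} and δ_{3,3} are our reconstruction, consistent with the stated UCBs.

def useful_cache_blocks(accesses):
    """UCB_{i,k} at each preemption point PP_{i,k}: blocks accessed in some
    NPR up to delta_{i,k} and re-accessed in some NPR after PP_{i,k}.

    accesses: per-NPR accessed block sets, accesses[0] being delta_{i,1}."""
    ucbs = {}
    for k in range(1, len(accesses)):          # PP_{i,k} follows delta_{i,k}
        before = set().union(*accesses[:k])
        after = set().union(*accesses[k:])
        ucbs[k] = before & after
    return ucbs

# tau_3 of the running example, assuming delta_{3,2} accesses {3} and
# delta_{3,3} accesses {4}:
print(useful_cache_blocks([{1, 2}, {3}, {4}, {1, 2, 3, 4}]))
# -> {1: {1, 2}, 2: {1, 2, 3}, 3: {1, 2, 3, 4}}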

7.4 Preemption-delay aware RTA for LPS-FPP

Considering the computation of the response time R_i of τ_i, the proposed analysis accounts for the following main computations (see Figure 7.2):


1. b_i – Computation of the maximum lower-priority blocking on τ_i, accounting for the CRPD in the blocking.

2. I_i – Computation of the maximum time duration from the start time of δ_{i,1} until the start time of δ_{i,l_i}. I_i represents the upper bound on the interval during which higher priority jobs can affect the CRPD of τ_{i,j} due to preemptions (see Figure 7.2).

3. S_{i,j} – Computation of the latest start time of the last NPR of τ_{i,j}, accounting for the CRPD on all jobs with higher priority than τ_i which can be released within S_{i,j}, and accounting for the CRPD on j − 1 jobs of τ_i. The CRPD of τ_{i,j} should only include the CRPD in the first l_i − 1 NPRs, since in the last NPR cache block reloads occur after S_{i,j}, during the execution of δ_{i,l_i} (see Figure 7.2).

4. F_{i,j} – Computation of the latest finish time of the j-th job of τ_i, where the worst-case execution time of the last NPR δ_{i,l_i} of τ_i is included, together with the CRPD which can be caused during the execution of δ_{i,l_i}.

In the remainder of this section, we formally define the terms for computing the response time of a task and the final computation of the taskset schedulability, divided into eight subsections:

1) Maximum lower priority blocking,

2) Computation of CRPD bounds,
3) Maximum time interval that affects preemption delays,

4) The latest start time of a job,
5) The latest finish time of a job,
6) The Level-i active period,
7) The worst-case response time of a task,

8) Schedulability analysis.

To ease the understanding of the following equations, the motivating example from Figure 7.3 will also serve as the running example.


7.4.1 Maximum lower priority blocking

We start by defining the longest worst-case execution time of an NPR in τ_i, including the time the NPR may be prolonged due to the CRPD.

Definition 7.3. The longest worst-case execution time q_i^max, accounting for CRPD, of a non-preemptive region in τ_i is defined as follows:

$$q_i^{max} = \max_{k \in [1, l_i]} \left( q_{i,k} + \left| UCB_{i,k-1} \cap ECB_{i,k} \cap \bigcup_{\tau_h \in hp(i)} ECB_h \right| \times BRT \right), \text{ where } UCB_{i,0} = \emptyset \qquad (7.3)$$

Proposition 7.1. q_i^max is an upper bound on the longest worst-case execution time, accounting for CRPD, of a non-preemptive region in τ_i.

Proof. The CRPD of a non-preemptive region δ_{i,k} cannot be higher than BRT multiplied by the number of cache blocks such that: 1) the block can be evicted by some task with higher priority than τ_i at PP_{i,k−1}, and 2) due to preemption, the block can be reloaded in δ_{i,k}, i.e. it can be accessed in δ_{i,k} and is useful at PP_{i,k−1}. This is accounted for in Equation 7.3 by the set intersections between UCB_{i,k−1}, ECB_{i,k}, and the union of all evicting cache blocks from the tasks with higher priority than τ_i. Since the equation computes the maximum sum of the WCET and the CRPD upper bound over all NPRs in τ_i, the proposition holds.

Example: Given the task τ_3 from Figure 7.3, q_3^max = 8, since the NPR δ_{3,4} has a non-preemptive WCET equal to 4, and all of its 4 useful cache blocks might be reloaded during its execution due to preemption at PP_{3,3}.

Given the q_i^max term, we now define the maximum lower priority blocking that may delay the start time of an instance of τ_i. Informally, it is the longest worst-case execution time of an NPR belonging to one of the tasks with a lower priority than τ_i.

Definition 7.4. The maximum lower priority blocking b_i which may delay the start time of an instance of τi is equal to:

$$b_i = \max_{\tau_l \in lp(i)} \left( q_l^{max} \right) \qquad (7.4)$$

Example: Given the task τ1, its maximum lower priority blocking is equal to 8, since the NPR δ_{3,4} has the largest WCET value accounting for possible CRPD among all the NPRs of τ2 and τ3.
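To make these two terms concrete, the following minimal Python sketch (not part of the thesis) computes Equations 7.3 and 7.4. The per-region data are assumptions merely consistent with the running example; in particular, BRT = 1 is implied by the example value q_3^{max} = 4 + 4 × BRT = 8.

    BRT = 1

    def q_max(q, ucb, ecb, ecb_hp):
        # Eq. 7.3: longest NPR WCET of a task, inflated by the worst-case CRPD.
        # q[k], ecb[k]: WCET and accessed (evicting) blocks of NPR delta_{i,k+1};
        # ucb[k]: useful blocks at PP_{i,k+1}; UCB_{i,0} is the empty set.
        best = 0
        for k in range(len(q)):
            ucb_prev = ucb[k - 1] if k > 0 else set()
            best = max(best, q[k] + len(ucb_prev & ecb[k] & ecb_hp) * BRT)
        return best

    # Assumed data for tau_3: only delta_{3,4} (WCET 4, UCB_{3,3} = {1,2,3,4})
    # is spelled out in the text; the remaining entries are placeholders.
    q3 = [2, 1, 1, 4]
    ucb3 = [{1, 2}, {1, 2, 3}, {1, 2, 3, 4}]           # UCB at PP_{3,1..3}
    ecb3 = [{1, 2, 3, 4}, set(), set(), {1, 2, 3, 4}]  # ECB per NPR
    ecb_hp3 = {1, 2, 3, 4}                             # union of ECB_1 and ECB_2
    print(q_max(q3, ucb3, ecb3, ecb_hp3))              # -> 8, as in the example

Equation 7.4 is then simply the maximum of q_max over the tasks with lower priority than τi.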


7.4.2 Computation of CRPD bounds

This section is divided into three main parts:

1. Reloadable cache blocks – where a modification of the notion of useful cache blocks for tasks with NPRs is defined.

2. Computation of the CRPD bound on a single job of τi within the time interval t – where the computations for the upper bound on CRPD of a single job during the time interval t are defined.

3. Computation of the CRPD bound on all jobs of τi released within the time interval t – where the computations for the upper bound on CRPD for all jobs of a task that can be released during the time interval t are defined.

Reloadable cache blocks

In this part, we first define an upper bound on the number of times a cache block m can be reloaded during the execution of a task. We exploit the fact that although a single cache block m may be useful and evicted at several consecutive preemption points, it does not necessarily mean that it can be reloaded as many times as it is useful. We prove that between two consecutive accesses of m in τi, there is at most one cache block reload caused by preemptions between the accesses.

Lemma 7.1. If a cache block m is not accessed in a NPR δ_{i,k}, then it cannot be reloaded in δ_{i,k}.

Proof. A cache block reload of m in δ_{i,k} implies that it must be accessed during the execution of δ_{i,k} without being present in the cache memory, which concludes the proof.

Lemma 7.2. A cache block m can be reloaded at most once in δ_{i,k} due to preemption.

Proof. (By contradiction) Let us assume that due to preemption m can be reloaded more than once in δ_{i,k}. During the execution of δ_{i,k}, the first access request for the cache block m can result in a reload due to preemption and eviction of m prior to the execution of δ_{i,k}. However, the next reload due to preemption within the same region implies that δ_{i,k} was preempted. This is a contradiction since δ_{i,k} is non-preemptive.


Note that m can be self-evicted during the execution of δ_{i,l} and afterwards reloaded again in the same NPR, which is accounted for by the worst-case execution time of the region. However, the source of such a reload is not a preemption.

Proposition 7.2. If a cache block m is accessed in a NPR δ_{i,k}, then due to preemptions it can be reloaded at most once from PP_{i,k} until the end of the first following NPR δ_{i,l} where it is accessed.

Proof. (By contradiction) Let us assume that m is accessed in δ_{i,k}, and due to preemption m can be reloaded more than once from PP_{i,k} until the end of δ_{i,l}. This implies two possible cases: 1) m is reloaded in a NPR between δ_{i,k} and δ_{i,l} where it is not accessed, which contradicts Lemma 7.1, or 2) due to preemptions, m is reloaded more than once in δ_{i,l}, which contradicts Lemma 7.2.

Given Proposition 7.2, in order to compute the upper bound on reloads of m during the execution of the first x NPRs of τi, we can count the first x NPRs for which m is in the ECB set of the NPR and in the UCB set of the directly preceding preemption point.

Definition 7.5. The upper bound ub^m_{i,x} on reloads of a cache block m during the execution of the first x NPRs of τi is equal to:

$$ub^m_{i,x} = \Big| \big\{ k \mid (1 \le k \le x) \wedge m \in ECB_{i,k} \wedge m \in UCB_{i,k-1} \big\} \Big|, \quad \text{where } UCB_{i,0} = \emptyset \qquad (7.5)$$

Proposition 7.3. ub^m_{i,x} is an upper bound on the number of preemption-induced reloads of a cache block m during the execution of the first x NPRs of τi.

Proof. By Lemma 7.1, a preemption-induced reload of m can occur only in a NPR δ_{i,k} where m is accessed, i.e. m ∈ ECB_{i,k}, and only if m could have been evicted while useful at the directly preceding preemption point, i.e. m ∈ UCB_{i,k−1}. By Lemma 7.2 and Proposition 7.2, at most one such reload can occur per qualifying NPR. Since Equation 7.5 counts all qualifying NPRs among the first x, the proposition holds.

Example: Given the task τ3 from Figure 7.3, the upper bound on the number of reloads of cache block 1 during the execution of the entire task is ub^1_{3,4} = 1.


Notice that cache block 1 is useful at all three preemption points and can be evicted upon preemption at any of them; however, it can be reloaded at most once. Next, we define:

Definition 7.6. A multiset RCB_{i,x}, in which each cache block is present the maximum number of times it can be reloaded during the execution of the first x NPRs of a single job of τi:

$$RCB_{i,x} = \biguplus_{m \in ECB_i} \; \biguplus_{ub^m_{i,x}} \{m\} \qquad (7.6)$$

i.e. the multiset union over all accessed blocks m, where each block contributes ub^m_{i,x} copies.

Proposition 7.4. RCB_{i,x} is a multiset of all possible reloadable cache blocks during the execution of the first x NPRs of a single job of τi.

Proof. Since each accessed block m (given by m ∈ ECB_i) is included ub^m_{i,x} times in the multiset, then by Proposition 7.3 all the possible reloads for each cache block during the execution of the first x NPRs of a job of τi are accounted for.

Figure 7.4. (Illustration omitted; only the caption and panel annotations are recoverable.) Left column: RCB-based method; right column: preemption-based method. Top row: CRPD approximation for τ3, considering two preempting jobs from τ1 and one preempting job from τ2 (4 reloads with the RCB-based method versus 10 reloads with the preemption-based method). Bottom row: CRPD approximation for τ2, considering one preempting job from τ1 (3 reloads with the RCB-based method versus 2 reloads with the preemption-based method).


Example: Given the task τ3 from Figure 7.4, the RCB_{3,4} multiset is equal to {1, 2, 3, 4}, and RCB_{3,3} = ∅. Notice that cache blocks 1, 2, 3 and 4 are not present in RCB_{3,3} because their reloads cannot occur during the execution of the first three regions of τ3. For τ2, RCB_{2,4} = {1, 3, 4, 4}.
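Continuing the Python sketch from Section 7.4.1 (same assumed data for τ3), the following fragment implements the reconstruction of Equations 7.5 and 7.6 given above, with collections.Counter standing in for multisets:

    from collections import Counter

    def ub(m, x, ucb, ecb):
        # Eq. 7.5: count the first x NPRs delta_{i,k} with m in ECB_{i,k} and
        # m in UCB_{i,k-1}; since UCB_{i,0} is empty, k = 1 never qualifies.
        return sum(1 for k in range(x) if m in ecb[k] and k > 0 and m in ucb[k - 1])

    def rcb(x, ucb, ecb):
        # Eq. 7.6: each accessed block m appears ub^m_{i,x} times in the multiset.
        blocks = set().union(*ecb)                     # ECB_i of the whole task
        counts = {m: ub(m, x, ucb, ecb) for m in blocks}
        return Counter({m: n for m, n in counts.items() if n > 0})

    print(rcb(4, ucb3, ecb3))  # Counter({1: 1, 2: 1, 3: 1, 4: 1}), i.e. {1, 2, 3, 4}
    print(rcb(3, ucb3, ecb3))  # Counter(), i.e. the empty multiset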

Computation of the CRPD bound on a single job of τi within the time interval t

For the upper bound on CRPD caused on a job of τi during the time interval t, we define two complementary types of CRPD computations. In the analysis, we compute CRPD with both approaches, and later use the lower value as the final upper bound. The proposed concepts are:

• RCB-based upper bound (γ∪) – which considers that all higher priority jobs, releasable during t, can preempt a job τi,j at all preemption points. In this case, we consider that each reloadable cache block is reloaded during the execution of a job if it can be evicted by some higher priority job released during t.

• Preemption-based upper bound (γ+) – which considers that a preemption from a single higher priority job τh,j can evict and cause a reload only for the useful cache blocks of the point at τi,j where the preemption occurred. In this case, for each preempting task τh, whose jobs can be released j times during the interval t, we account for the j highest CRPD values resulting from individual preemptions of those jobs on preemption points of τi,j.

To give an informal example of the two CRPD computations, let us assume that, given the taskset from Figure 7.3, we compute the upper bound on CRPD caused on a single job of τ3 during a 24 units long time interval. During the interval of 24 units, at most two jobs of τ1 can be released, and at most one job of τ2 can be released.

For the RCB-based method it does not matter how we allocate the preemptions of these jobs on preemption points; we care only about the possibly evicting cache blocks in two jobs from τ1 and one job from τ2, and those are {1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4}. Also, we consider what cache blocks can be reloaded upon those evictions in a single job of τ3, and those are RCB_{3,4} = {1, 2, 3, 4}. For each cache block present in both the RCB and the evicting cache block multisets, we account that it is evicted and reloaded, meaning that the CRPD at the end considers four cache block reloads.


For the preemption-based method the allocation of preemptions on preemption points of a task is important. First, we consider two preempting jobs of τ1 and their two possible preemptions on a job of τ3. In this case, the maximum CRPD from two preemptions is caused at PP_{3,3} and PP_{3,2}, causing a number of reloads equal to 4 (due to evictions and reloads of blocks in UCB_{3,3}) and 3 (due to evictions and reloads of blocks in UCB_{3,2}). For the preemption from a job of τ2, the maximum CRPD occurs when PP_{3,3} is preempted, since the useful cache blocks 1, 3, and 4 can be evicted, and the CRPD accounts for 3 cache block reloads. In total, 7 + 3 = 10 cache block reloads are accounted for, which is a less precise upper bound than the one derived with the RCB-based method.

However, in some cases, e.g., computing CRPD for a single job of τ2, the preemption-based method provides a better result. In this case, the RCB-based method would account for three cache block reloads resulting from a single preemption by τ1. However, the preemption-based method computes a tighter upper bound on CRPD, accounting for one preemption from τ1 at PP_{2,1}. This is the case that produces the maximum CRPD, equal to two cache block reloads of cache blocks 1 and 3. For a single preemption at any other point, the CRPD is equal to one.

It is important to mention that in this solution, for a single job of any task, we are interested in the CRPD that can be caused:

• during the execution of the first li − 1 NPRs of a job, and
• during the execution of all NPRs of a job.

This is important because, when considering the time intervals (see Figure 7.2), for the job τi,j itself we should only consider the CRPD caused during the first li − 1 of its NPRs. This is because in the Ii and Si,j intervals we do not consider the CRPD of the last NPR of the job. But, when considering how this job affects other jobs, we must consider the CRPD in all of its NPRs.

For these reasons, many equations in this section consider the CRPD caused during the execution of the first x NPRs of a job, where x is the provided parameter. We distinguish between:

• γ∪_{i,x}: for RCB-based CRPD computations, and
• γ+_{i,x}: for preemption-based CRPD computations.

Based on whether or not the last NPR should be included, as discussed above, x can be assigned the value li − 1 or li, respectively.

We first define the upper bound on CRPD for a single job within a time interval t using the RCB-based computation:


Definition 7.7. The upper bound γ∪_{i,x}(t) on the CRPD caused on the first x NPRs of a single job of τi by all preempting jobs which can be released within a time interval t is defined by the following equation:

$$\gamma^{\cup}_{i,x}(t) = \Big| RCB_{i,x} \cap \biguplus_{\tau_h \in hp(i)} \; \biguplus_{\lceil t/T_h \rceil} ECB_h \Big| \times BRT \qquad (7.7)$$

The equation accounts for the maximum number ⌈t/T_h⌉ of releases of jobs of each higher priority task τh during t, and includes all of their evicting cache blocks as many times as the jobs can be released.

Proposition 7.5. γ∪_{i,x}(t) is an upper bound on the CRPD caused on the first x NPRs of a single job of τi during the interval t.

Proof. By Proposition 7.4, all possibly reloadable cache blocks in a single job of τi during the execution of the first x NPRs are given by RCB_{i,x}. By the multiset union, Equation 7.7 accounts for the maximum number of possibly evicting cache blocks of all jobs with higher priority than τi that can be released during t. Since, by the multiset intersection, Equation 7.7 accounts that every reloadable cache block from RCB_{i,x} indeed results in a reload if there is at least one evicting block from all possibly preempting jobs released during t, γ∪_{i,x}(t) is an upper bound on the CRPD caused on the first x NPRs of a single job of τi during t.

Example: Given all the NPRs of the task τ3 (see Figure 7.3), the upper bound on their CRPD during the 24 units long time interval is equal to:

$$\gamma^{\cup}_{3,4}(24) = \Big| \{1,2,3,4\} \cap \Big( \biguplus_{\lceil 24/22 \rceil} \{1,2,3,4\} \;\uplus\; \biguplus_{\lceil 24/50 \rceil} \{1,3,4\} \Big) \Big| = \Big| \{1,2,3,4\} \cap \{1,1,1,2,2,3,3,3,4,4,4\} \Big| = 4$$
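The RCB-based bound is then a multiset intersection; a sketch continuing the fragments above (the ECB sets and periods of τ1 and τ2 are the assumed running-example values):

    import math

    def gamma_union(rcb_ix, hp_tasks, t):
        # Eq. 7.7: intersect the reloadable blocks with ceil(t/T_h) copies of
        # each higher-priority ECB_h; Counter's & is the multiset intersection.
        evictions = Counter()
        for T_h, ecb_h in hp_tasks:
            for _ in range(math.ceil(t / T_h)):
                evictions.update(ecb_h)
        return sum((rcb_ix & evictions).values()) * BRT

    hp_of_tau3 = [(22, {1, 2, 3, 4}), (50, {1, 3, 4})]      # assumed (T_h, ECB_h)
    print(gamma_union(rcb(4, ucb3, ecb3), hp_of_tau3, 24))  # -> 4, as above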

Before defining the preemption-based CRPD upper bound, we first define a multiset which consists of all possible CRPD values that jobs of τh can cause on the first x NPRs of τi.

Definition 7.8. A multiset CRPD^h_{i,x} of all upper-bounded CRPDs that can be caused by a job of τh on each of the first x NPRs of a single job of τi is equal to:

$$CRPD^h_{i,x} = \biguplus_{k=1}^{x} \Big\{ \big| ECB_h \cap UCB_{i,k-1} \big| \times BRT \Big\}, \quad \text{where } UCB_{i,0} = \emptyset \qquad (7.8)$$


Proposition 7.6. CRPD^h_{i,x} is a multiset of upper bounds of the CRPD that can be caused by a job of τh on the first x NPRs of a single job of τi.

Proof. A CRPD upper bound on a single NPR δ_{i,k} is computed by accounting that all useful cache blocks at the preemption point directly preceding δ_{i,k} can be evicted by the evicting cache blocks of τh. Thus, the CRPD on δ_{i,k} is computed as the intersection between the ECB_h and UCB_{i,k−1} sets. Since Equation 7.8 unites the upper bounds from the first x NPRs in a multiset, the proposition holds.

Example: Given the taskset from Figure 7.3, a multiset CRPD^1_{3,4} of CRPDs from a job of τ1 caused on the first 4 NPRs of τ3 is:

$$CRPD^1_{3,4} = \big\{ \, |\emptyset|, \; |\{1,2\}|, \; |\{1,2,3\}|, \; |\{1,2,3,4\}| \, \big\} = \{0, 2, 3, 4\} \qquad (7.9)$$

Considering the cumulative CRPD caused by preemptions from jobs of τh during a time interval t, we define the following term.

Definition 7.9. The upper bound γ^h_{i,x}(t) on the CRPD caused by preemptions from jobs of τh on the first x NPRs of a single job of τi, within the time interval t, is defined in the following equation:

$$\gamma^h_{i,x}(t) = \begin{cases} \sum CRPD^h_{i,x}, & x \le \lceil t/T_h \rceil \\ \max \Big\{ \sum c \;\Big|\; c \subseteq CRPD^h_{i,x} \wedge |c| = \lceil t/T_h \rceil \Big\}, & \text{otherwise} \end{cases} \qquad (7.10)$$

Equation 7.10 computes the maximum cumulative CRPD from a maximum number of preemptions of jobs of τh on the first x NPRs of an instance of τi. Equation 7.10 considers two cases:

Case 1 (x ≤ ⌈t/T_h⌉): the number of preempting jobs of τh which can cause CRPD on the first x regions of a single job of τi is greater than or equal to the number of preemption points that can be preempted.

Lemma 7.3. γ^h_{i,x}(t) is an upper bound on the CRPD caused by preemptions from jobs of τh on the first x NPRs of a single job of τi, within the time interval t, in case x ≤ ⌈t/T_h⌉.

Proof. When x ≤ ⌈t/T_h⌉, Equation 7.10 sums all the CRPD values in the multiset CRPD^h_{i,x}, and since by Proposition 7.6 CRPD^h_{i,x} is a multiset of upper bounds of the CRPD that can be caused by a job of τh on the first x NPRs of a single job of τi, the lemma holds.

Example for x ≤ ⌈t/T_h⌉: Given the taskset from Figure 7.3, the upper bound on the CRPD caused by preemptions from jobs of τ1 on all NPRs of a single job of τ3 during 100 time units is equal to γ^1_{3,4}(100) = 9, although jobs of τ1 can be released 5 times during 100 time units, given by ⌈100/T_1⌉ = ⌈100/22⌉. This is the case because only three preemption points can be preempted, and each of those preemptions individually causes a maximum CRPD of 2, 3, and 4 cache block reloads, respectively.

Case 2 (x > ⌈t/T_h⌉): the number of preempting jobs of τh which can cause CRPD on the first x regions of a single job of τi is less than the number of preemption points that can be preempted.

Lemma 7.4. γ^h_{i,x}(t) is an upper bound on the CRPD caused by preemptions from jobs of τh on the first x NPRs of a single job of τi, within the time interval t, in case x > ⌈t/T_h⌉.

Proof. By Proposition 7.6, CRPD^h_{i,x} is a multiset of upper bounds of the CRPD that can be caused by a job of τh on the first x NPRs of a single job of τi. Since ⌈t/T_h⌉ is the maximum number of preemptions from the jobs of τh during t, only ⌈t/T_h⌉ NPRs can experience CRPD caused by jobs of τh. Since Equation 7.10 sums the ⌈t/T_h⌉ highest values from the CRPD^h_{i,x} multiset, the lemma holds.

Proposition 7.7. γ^h_{i,x}(t) is an upper bound on the CRPD caused by all jobs of τh, which can be released within the time interval t, on the first x NPRs of a single job of τi.

Proof. Follows directly from Lemmas 7.3 and 7.4.
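A sketch of Equations 7.8 and 7.10, continuing the fragments above; note that summing the ⌈t/T_h⌉ largest values also covers Case 1, where the slice simply takes the whole multiset:

    def crpd_multiset(x, ucb_i, ecb_h):
        # Eq. 7.8: |ECB_h intersect UCB_{i,k-1}| * BRT for k = 1..x.
        ucbs = [set()] + list(ucb_i)                   # prepend UCB_{i,0} = empty
        return [len(ecb_h & ucbs[k]) * BRT for k in range(x)]

    def gamma_h(x, ucb_i, ecb_h, T_h, t):
        # Eq. 7.10: charge at most ceil(t/T_h) preemptions, largest values first.
        values = sorted(crpd_multiset(x, ucb_i, ecb_h), reverse=True)
        return sum(values[:math.ceil(t / T_h)])

    print(crpd_multiset(4, ucb3, {1, 2, 3, 4}))     # -> [0, 2, 3, 4], cf. Eq. 7.9
    print(gamma_h(4, ucb3, {1, 2, 3, 4}, 22, 100))  # -> 9, the Case 1 example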

Now, we consider that all higher priority jobs may contribute to the CRPD of a single job of τi during the time interval t.

Definition 7.10. The upper bound γ+_{i,x}(t) on the CRPD caused by preemptions of all higher priority jobs during the time interval t on the first x NPRs of a single job of τi is defined in the following equation:


$$\gamma^{+}_{i,x}(t) = \sum_{\tau_h \in hp(i)} \gamma^h_{i,x}(t) \qquad (7.11)$$

Proposition 7.8. γ+_{i,x}(t) is an upper bound on the CRPD caused by all higher priority jobs which can be released within the time interval t on the first x NPRs of a single job of τi.

Proof. By Proposition 7.7, for a single higher priority task τh, γ^h_{i,x}(t) is an upper bound on the CRPD caused by all jobs of τh on the first x NPRs of a single job of τi within t. Since Equation 7.11 sums γ^h_{i,x}(t) for each task with higher priority than τi, the proposition holds.

Example: Given the taskset from Figure 7.3: γ+_{3,4}(24) = γ^1_{3,4}(24) + γ^2_{3,4}(24) = 7 + 3 = 10. This is the case because Equation 7.11 accounts for two preemptions from τ1, resulting in adding the two highest CRPD values (occurring at points PP_{3,3}: four reloads, and PP_{3,2}: three reloads), and one preemption from τ2, resulting in a single highest CRPD value (occurring at PP_{3,3}: three reloads).

Finally, we define the CRPD upper bound using both the RCB-based and the preemption-based computations.

Definition 7.11. Combining the two proposed methods, a safe upper bound γ_{i,x}(t) on the CRPD caused on the first x NPRs of a job of τi during the time interval t is derived with the following equation:

$$\gamma_{i,x}(t) = \min \big( \gamma^{+}_{i,x}(t), \; \gamma^{\cup}_{i,x}(t) \big) \qquad (7.12)$$

Proposition 7.9. γ_{i,x}(t) is an upper bound on the CRPD caused on the first x NPRs of a job of τi during the time interval t.

Proof. γ_{i,x}(t) is an upper bound because it takes the lesser of two upper bounds, given by Propositions 7.5 and 7.8.

Example: Given the taskset from Figure 7.3, the final safe upper bound is γ_{3,4}(24) = min(γ∪_{3,4}(24), γ+_{3,4}(24)) = min(4, 10) = 4, meaning that the tighter upper bound is given by the RCB-based method. However, in the case of τ2, the CRPD on a single job within 10 time units is computed as follows: γ_{2,4}(10) = min(γ∪_{2,4}(10), γ+_{2,4}(10)) = min(3, 2) = 2, meaning that the tighter upper bound is given by the preemption-based method.
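Combining everything (Equations 7.11 and 7.12), still on the assumed running-example data:

    def gamma_combined(x, rcb_ix, ucb_i, hp_tasks, t):
        # Eq. 7.11: sum the per-task preemption-based bounds;
        # Eq. 7.12: the final bound is the smaller of the two methods.
        g_plus = sum(gamma_h(x, ucb_i, e, T, t) for T, e in hp_tasks)
        return min(g_plus, gamma_union(rcb_ix, hp_tasks, t))

    print(gamma_combined(4, rcb(4, ucb3, ecb3), ucb3, hp_of_tau3, 24))  # -> min(10, 4) = 4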


Computation of the CRPD bound on all jobs of τi released within the time interval t

Considering that Ii is the maximum time interval during which jobs with higher priority than τi may affect the CRPD of a single job of τi (see Figure 7.2), the CRPD upper bound of all jobs of τi which can be released during the time interval t is defined as follows:

Definition 7.12. The upper bound γi(t) on the CRPD on all the jobs of τi that can be released during the time interval t is defined with the following equation:

$$\gamma_i(t) = \left\lceil \frac{t}{T_i} \right\rceil \times \gamma_{i,l_i}(I_i) \qquad (7.13)$$

Proposition 7.10. Assuming that Ii is an upper bound on the time duration from the start of the first NPR of τi until the start of the last NPR of τi, γi(t) is a CRPD upper bound on all the jobs of τi which can be released during t.

Proof. By Proposition 7.9 and the assumption in Proposition 7.10, Equation 7.13 derives the CRPD upper bound γ_{i,l_i}(I_i) caused during the Ii interval on all NPRs of a single job of τi. Since this per-job CRPD upper bound γ_{i,l_i}(I_i) is multiplied by the maximum number ⌈t/Ti⌉ of releases of τi within t, γi(t) is a CRPD upper bound on all the jobs of τi which can be released during t.

Computation of Ii is defined in Equation 7.14, in the following subsection.

7.4.3 Maximum time interval that affects preemption delays

Next, we define the computation of the Ii interval. In Equation 7.14, we consider the critical instant, i.e. that just after the start of execution of δ_{i,1} all the higher priority jobs may be released. It also considers the execution of a job of τi until the start of its last NPR, accounting for the possible CRPD during the execution of the first li − 1 NPRs. In each new iteration, the equation accounts that the computed interval can be enlarged by the additional CRPD of the newly released higher priority jobs:

Definition 7.13. The upper bound Ii on the time duration from the start time of δ_{i,1} until the start time of δ_{i,l_i} is defined as the least fixed point of the following recursive equation:

$$I_i^{(0)} = E_i$$
$$I_i^{(r)} = E_i + \gamma_{i,l_i-1}\big(I_i^{(r-1)}\big) + \sum_{\tau_h \in hp(i)} \left( \left( \left\lfloor \frac{I_i^{(r-1)}}{T_h} \right\rfloor + 1 \right) \times C_h + \gamma_h\big(I_i^{(r-1)}\big) \right) \qquad (7.14)$$

Proposition 7.11. Ii is an upper bound on the time duration from the start time of the first NPR of τi until the start time of the last NPR of τi, for any job of τi.

Proof. By induction over the tasks in Γ, in decreasing priority order.

Base case: I1 = E1, because hp(τ1) = ∅ and γ_{1,l_1−1}(t) = 0 for any t, since τ1 cannot experience CRPD. E1 is the upper bound on the time duration from the start time of the first NPR until the start time of the last NPR of τ1, since it is the worst-case execution time between those two points.

Inductive hypothesis: Assume that for all τh in hp(i), Ih is an upper bound on the time duration from the start of the first NPR until the start of the last non-preemptive region of τh.

Inductive step: We show that Equation 7.14 computes the safe upper bound Ii. Consider the least fixed point of Equation 7.14, for which I_i^{(r)} = I_i^{(r−1)} = I_i. At this point, the equation accounts for the following upper bounds and worst-case execution times:

• Ei, which is the worst-case execution time, assumed by the system model, of the first li − 1 non-preemptive regions.

• γ_{i,l_i−1}(Ii), which is proved by Proposition 7.9 to be the CRPD upper bound of a single job of τi in the first li − 1 non-preemptive regions during the time interval Ii.

• Σ_{τh∈hp(i)} (⌊Ii/Th⌋ + 1) × Ch – worst-case interference caused by the execution of jobs of higher priority tasks. The interference is accounted as the sum of individual WCET values of each job with higher priority than τi that can be released during Ii. The maximum number of jobs of a single higher priority task τh which can be released during Ii is accounted by the term ⌊Ii/Th⌋ + 1.

• Σ_{τh∈hp(i)} γh(Ii) – CRPD upper bound on all jobs of the higher priority tasks that can be released during Ii. By Proposition 7.10 and the inductive hypothesis, γh(Ii) is an upper bound on the CRPD of all jobs of τh which


can be released within Ii. Then, the sum of CRPD bounds γh(Ii) for each task with higher priority than τi is a safe upper bound on the cumulative CRPD in all jobs with higher priority than τi, released during Ii.

Since we proved that all the factors which can prolong the time duration from the start time of the first NPR until the start time of the last NPR of τi are accounted for with their respective upper bounds in Equation 7.14, their sum results in an upper bound, which concludes the proof.

Note that Equation 7.14 uses Equation 7.13, and vice versa. However, Equation 7.14 is always computable because, in order to compute Ii, Equation 7.14 computes the γh(t) term only for tasks with higher priority than τi, meaning that it computes γh(t) using only the values I1 to Ii−1. Since γ1(t) always evaluates to zero for any value of t, because τ1 cannot experience CRPD, I2 can be computed using γ1(t) = 0, which implies that I3 can be computed using the previously computed I1 and I2 values, and so on until Ii.
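The fixed-point iteration of Equation 7.14 can be sketched as follows (a sketch only: integer timing parameters are assumed, and the CRPD bounds are passed in as callables so that the mutual recursion between Equations 7.13 and 7.14 stays explicit):

    def interval_I(E_i, hp, gamma_i_first, horizon=10**6):
        # Eq. 7.14. hp: list of (T_h, C_h, gamma_h) triples, where gamma_h(t) is
        # the Eq. 7.13 bound of a higher-priority task (computable first, as
        # argued above); gamma_i_first(t) stands for gamma_{i, l_i - 1}(t).
        I = E_i                                        # I^(0) = E_i
        while True:
            nxt = E_i + gamma_i_first(I) + sum(
                (I // T_h + 1) * C_h + g_h(I) for T_h, C_h, g_h in hp
            )
            if nxt == I or nxt > horizon:              # least fixed point, or give up
                return nxt
            I = nxt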

7.4.4 The latest start time of a job

In this subsection, we define the equations for the latest start time Si,j of the last NPR of a job τi,j, the latest finish time Fi,j of τi,j, and the worst-case response time Ri of a task τi. All of those equations include the computation of the CRPD values for each accounted job. We first define the computation of the latest start time Si,j of the last NPR of the j-th job of τi after the critical instant.

Definition 7.14. The latest start time Si,j of the last NPR of the j-th instance of τi is equal to the least fixed point of the following recursion:

$$S_{i,j}^{(0)} = b_i + E_i$$
$$S_{i,j}^{(r)} = b_i + (j-1) \times C_i + \gamma_i\big((j-1) \times T_i\big) + E_i + \gamma_{i,l_i-1}(I_i) + \sum_{\tau_h \in hp(i)} \left( \left( \left\lfloor \frac{S_{i,j}^{(r-1)}}{T_h} \right\rfloor + 1 \right) \times C_h + \gamma_h\big(S_{i,j}^{(r-1)}\big) \right) \qquad (7.15)$$

In Equation 7.15, we consider the critical instant from Theorem 7.1, proved by Bril et al. [112].


Proposition 7.12. Assuming that the Level-i pending workload is greater than zero at each time point from the critical instant until j × Ti, then starting from the critical instant, Si,j is an upper bound on the latest start time of the last NPR of τi,j.

Proof. Based on the proposition's assumption on the Level-i pending workload, the start time of τi,j is affected by all preceding j − 1 jobs. Consider the least fixed point of Equation 7.15, for which S_{i,j}^{(r)} = S_{i,j}^{(r−1)} = S_{i,j}. At this point, the equation accounts for the following upper bounds and worst-case execution times:

• bi – By Proposition 7.1 and Definition 7.4, bi is an upper bound on lower priority blocking on τi.

• (j − 1) × Ci – WCETs of the first j − 1 jobs of τi, which execute prior to τi,j.

• γi((j − 1) × Ti) – By Proposition 7.10, it is an upper bound on the CRPD of the first j − 1 jobs of τi.

• Ei, which is the worst-case execution time, assumed by the system model, of the first li − 1 NPRs.

• γ_{i,l_i−1}(Ii) – By Proposition 7.9, it is an upper bound on the CRPD of a single job of τi (in this case τi,j) in the first li − 1 NPRs during the time interval Ii.

• Σ_{τh∈hp(i)} (⌊Si,j/Th⌋ + 1) × Ch – Worst-case interference caused by the execution of jobs of higher priority tasks. The interference is accounted as the sum of individual WCET values of each job with higher priority than τi that can be released during Si,j. The maximum number of jobs of a single higher priority task τh which can be released during Si,j is accounted by the term ⌊Si,j/Th⌋ + 1.

• Σ_{τh∈hp(i)} γh(Si,j) – CRPD upper bound on all jobs of the higher priority tasks that can be released during Si,j. By Proposition 7.10, γh(Si,j) is an upper bound on the CRPD of all jobs of τh which can be released within Si,j. Then, the sum of CRPD bounds γh(Si,j) for each task with higher priority than τi is a safe upper bound on the cumulative CRPD in all jobs with higher priority than τi, released during Si,j.

Since Equation 7.15 sums and considers upper bounds on all factors that can delay the start time of the last NPR of τi,j, the proposition holds.


7.4.5 The latest finish time of a job

For the computation of Fi,j (see Figure 7.2) we need to consider the worst-case execution time of the last NPR of τi, including the upper bound on the CRPD:

Definition 7.15. The worst-case execution time q_i^{last} of the last NPR of τi, including the upper bound on the CRPD during its execution, is defined in the following equation:

$$q_i^{last} = q_{i,l_i} + \Big| UCB_{i,l_i-1} \cap \bigcup_{\tau_h \in hp(i)} ECB_h \Big| \times BRT \qquad (7.16)$$

Proposition 7.13. q_i^{last} is an upper bound on the longest execution time of the last NPR of τi, including the upper bound on the CRPD during its execution.

Proof. The CRPD of the non-preemptive region δ_{i,l_i} cannot be higher than BRT multiplied by the number of cache blocks such that:

• the block is useful at the point PP_{i,l_i−1} preceding δ_{i,l_i}, and

• the block can be evicted by some task with higher priority than τi at PP_{i,l_i−1}.

This is accounted for in Equation 7.16 by the set intersection between UCB_{i,l_i−1} and the union of the ECB_h sets. Since the equation sums the WCET of δ_{i,l_i} and its CRPD upper bound, the proposition holds.

Definition 7.16. The latest finish time Fi,j of the j-th instance of τi is equal to:

$$F_{i,j} = S_{i,j} + q_i^{last} \qquad (7.17)$$

Proposition 7.14. Assuming that the Level-i pending workload is greater than zero at each time point from the critical instant until j × Ti, then starting from the critical instant, Fi,j is an upper bound on the latest finish time of the last NPR of τi,j.

Proof. Follows from Propositions 7.12 and 7.13.

7.4.6 Level-i active period

Now, we define the upper bound on the Level-i active period, accounting for the CRPD.


Definition 7.17. The Level-i active period Li, accounting for the CRPD on all jobs within the period, is defined as follows:

$$L_i^{(0)} = b_i + C_i$$
$$L_i^{(r)} = b_i + \sum_{\tau_k \in hpe(i)} \left( \left( \left\lfloor \frac{L_i^{(r-1)}}{T_k} \right\rfloor + 1 \right) \times C_k + \gamma_k\big(L_i^{(r-1)}\big) \right) \qquad (7.18)$$

Proposition 7.15. Li is an upper bound on the Level-i active period.

Proof. Similar to the proof of Proposition 7.12. Consider the least fixed point of Equation 7.18, for which L_i^{(r)} = L_i^{(r−1)} = L_i. At this point, the equation accounts for the following upper bounds and worst-case execution times:

• bi – By Proposition 7.1 and Definition 7.4, bi is an upper bound on lower priority blocking on τi.

• Σ_{τk∈hpe(i)} (⌊Li/Tk⌋ + 1) × Ck – The worst-case execution time of all jobs of τi and of jobs with higher priority than τi which can be released within Li. The maximum number of jobs of a single task τk ∈ hpe(i) which can be released during Li is accounted by the term ⌊Li/Tk⌋ + 1.

• Σ_{τk∈hpe(i)} γk(Li) – CRPD upper bound on all jobs of τi and of jobs with higher priority than τi which can be released within Li. Follows from Proposition 7.10.

Since Equation 7.18 sums and considers upper bounds on all factors that can extend the duration of the Level-i active period, the proposition holds.

7.4.7 The worst-case response time of a task

Finally, we can define the worst-case response time:

Definition 7.18. The worst-case response time Ri of τi is equal to the maximum relative latest finish time of a job τi,j which is released within Li:

$$R_i = \max_{j \in [1, \lceil L_i/T_i \rceil]} \big\{ F_{i,j} - (j-1) \times T_i \big\} \qquad (7.19)$$

Theorem 7.2. R_i is a safe upper bound on the worst-case response time of τ_i.

Proof. Since each respective F_{i,j} value is computed within L_i, the proposition assumption from Proposition 7.14 holds. Then, by Theorem 7.1, R_i is a safe upper bound on the worst-case response time of τ_i, which concludes the proof.


7.4.8 Schedulability analysis

Given the above-proposed response-time analysis, the taskset schedulability is determined by computing whether, for each task, the derived response time is less than or equal to the task deadline. That is, if for any task τ_i in Γ the inequality R_i > D_i holds, the taskset is deemed unschedulable by the proposed analysis. As described before, in Section 7.2, the schedulability test can be simplified to the first job of each task, as proposed by Yao et al. [58]. Thus, the proposed approach can be adopted in the following simplified schedulability test (see Theorem 3 from [58]), which is based on discontinuous points of cumulative execution-request functions, proposed by Bini and Buttazzo [113]. In this case, the cumulative execution request W_i(t) for τ_i and all higher-priority tasks, within duration t, is computed using the following equation:

W_i(t) = E_i + γ_{i,l_i−1}(t) + Σ_{τ_h ∈ hp(i)} ( (⌊t/T_h⌋ + 1) × C_h + γ_h(t) )    (7.20)

With this test, taskset Γ is schedulable if for each task τ_i in Γ there exists t ∈ (0, D_i − q_{i,l_i}] such that:

W_i(t) + b_i ≤ t    (7.21)

The above equation simply asserts whether the final non-preemptive region of τ_i can start at least q_{i,l_i} time units before the deadline.
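As an illustration, the test of Equations 7.20–7.21 can be sketched as follows, assuming hypothetical callables gamma_self(t) for γ_{i,l_i−1}(t) and gamma_h(h, t) for γ_h(t). Following Bini and Buttazzo [113], W_i(t) only changes at its discontinuity points, i.e. at multiples of the higher-priority periods, so it suffices to check those points and the end of the interval.

import math

def first_job_schedulable(E_i, b_i, D_i, q_last, hp, gamma_self, gamma_h):
    # Checks Eq. 7.21: exists t in (0, D_i - q_last] with W_i(t) + b_i <= t.
    # hp: list of (T_h, C_h, h) for tasks in hp(i)
    # gamma_self: callable for gamma_{i,l_i-1}(t)
    # gamma_h: callable gamma_h(h, t) for a higher-priority task tau_h
    horizon = D_i - q_last
    if horizon <= 0:
        return False
    points = {horizon}
    for (T_h, _, _) in hp:
        points.update(n * T_h for n in range(1, math.floor(horizon / T_h) + 1))
    for t in sorted(points):
        W = E_i + gamma_self(t) + sum(
            (math.floor(t / T_h) + 1) * C_h + gamma_h(h, t)
            for (T_h, C_h, h) in hp)
        if W + b_i <= t:
            return True
    return False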

7.5 Evaluation

In this section, we present and discuss the evaluation results on the effectiveness of different approaches in identifying schedulable tasksets. We primarily compare the existing feasibility analysis (Feasibility Analysis), proposed by Yao et al. [57, 58], and the proposed RTA analysis (denoted as CRPD-Fixed-PP). Additionally, we compare those two approaches with the state-of-the-art CRPD analysis (Combined-Multiset) for fully-preemptive scheduling, proposed by Altmeyer et al. [45], and with an approach (NO-CRPD) which does not account for any preemption cost at preemption points, thus resulting in an unsafe approximation.

7.5.1 Evaluation setup

For the evaluation, we use basic cache-set configurations obtained with a low-level analysis tool named LLVMTA [2]. LLVMTA uses the LLVM machine intermediate representation after all backend compiler passes have run. The resulting machine program corresponds to a CFG reconstruction of a binary file. Cache-set configurations are obtained on real-time programs from the Mälardalen benchmark [3], given in Table 7.2. The obtained parameters per each program are the sets of evicting (ECB) and definitely useful cache blocks (UCB) (in the table we show the size of each set). Also, the maximum number (Max) of definitely useful cache blocks per any preemption point of each program is obtained. The assumed direct-mapped cache memory consists of 256 sets with a line size of 8 bytes. For more details about the low-level analysis refer to [114].

In all of the experiments, we generate 2000 tasksets for each investigated parameter value. To include realistic cache-set usage, each task in a taskset is randomly assigned one of the distinct cache-set configurations presented in Table 7.2. Since the task binaries were analysed individually, they all start at the same address (mapping to cache set 0). In a multi-task scheduling situation, this can hardly be the case because the ECB and UCB placement is determined by their respective locations in memory. We took this into account by randomly shifting the cache-set indices, e.g. the ECB in cache set i is shifted to the cache set equal to (i + random(256)) modulo 256.

Program      ECB  UCB  Max     Program      ECB  UCB  Max
adpcm        256  230  103     lcdnum        51   11    9
bs            43   23   20     lms          242  134   38
bsort100      57   40   30     ludcmp       210  168   44
cnt          123   58   44     matmult       85   51   31
compress     247  150   63     minver       256  178   47
cover        256   38   15     ndes         253  176   38
crc          121   62   30     ns            55   37   34
edn          256  222  123     nsichneu     256  183    2
expint       117   47   29     prime         75   47   33
fdct         126  113   62     qsort-ex.    142   83   39
fft1         222  154   63     qurt         130   40   26
fibcall       28   16   16     select       159   73   55
fir           94   42   21     sqrt          53   21   12
insertsort    29   16   15     st           192   95   52
janne_co.     39   28   27     statemate    256  105    1
jfdctint     132  122   54     ud           194  151   39

Table 7.2. Cache configurations obtained with the LLVMTA [2] analysis tool used on Mälardalen benchmark programs [3].

Further task parameters were generated using the evaluation setup proposed by Altmeyer et al. [44, 45], which is used in many CRPD-related works to exhaustively evaluate CRPD-based analyses. The utilisation of each task was generated using the standardised algorithm in real-time research – UUnifast, proposed by Bini and Buttazzo [31]. The task periods were generated according to a uniform distribution from 5 to 500 milliseconds, representing the general spread of periods in automotive and aerospace hard real-time applications [44]. The worst-case execution time for each task was computed as C_i = U_i × T_i, where U_i represents the utilisation of τ_i. Task deadlines were implicit, meaning that D_i = T_i. Task priorities were assigned according to the deadline-monotonic order. Furthermore, to allow for an exhaustive evaluation, we generated other parameters for the LP-FPP task model, and in different experiments we varied different parameters.
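The generation procedure described above can be sketched as follows. This is a simplified, hypothetical reconstruction of the setup, not the exact evaluation code: UUnifast draws the utilisations, periods are drawn uniformly from [5, 500] ms, C_i = U_i × T_i with implicit deadlines, and each task receives a randomly shifted cache-set configuration.

import random

def uunifast(n, u_total):
    # UUnifast (Bini and Buttazzo): n task utilisations summing to u_total.
    utils, remaining = [], u_total
    for i in range(1, n):
        nxt = remaining * random.random() ** (1.0 / (n - i))
        utils.append(remaining - nxt)
        remaining = nxt
    utils.append(remaining)
    return utils

def generate_taskset(n, u_total, cache_configs, n_sets=256):
    tasks = []
    for u in uunifast(n, u_total):
        T = random.uniform(5, 500)                 # period in milliseconds
        ecbs, ucbs = random.choice(cache_configs)  # (ECB set, UCB set) of a program
        shift = random.randrange(n_sets)           # random placement in memory
        tasks.append({
            "T": T, "C": u * T, "D": T,            # C_i = U_i * T_i, implicit deadline
            "ECB": {(b + shift) % n_sets for b in ecbs},
            "UCB": {(b + shift) % n_sets for b in ucbs},
        })
    # deadline-monotonic priorities: shortest deadline = highest priority
    return sorted(tasks, key=lambda task: task["D"])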

Default evaluation setup

The number of subjobs for each task was randomly generated according to a uniform distribution from the range [3, 100], as this is a representative range of non-preemptive regions used in automotive real-time applications (where NPRs are also called runnables). The worst-case execution time of each subjob was set to C_i/l_i. The default taskset size was set to 6. For each preemption point PP_{i,k} of a task τ_i, its useful cache blocks UCB_{i,k} were generated from the UCB set assigned to τ_i from Table 7.2. Also, we pessimistically assumed that each point of τ_i exhibits the maximum number of UCBs, given in the table. Accordingly, each evicting cache block was included in the NPRs prior to and after the preemption point where it is useful. The assumed block reload time was 8 microseconds, as given in [44].

To increase the exhaustiveness of the performed evaluation, we used the weighted schedulability measure, which also enables reducing a 3-dimensional plot to a 2-dimensional one, as proposed by Bastoni et al. [68]. In the shown figures, we show the weighted schedulability measure W_y(ρ) for schedulability test y as a function of parameter ρ. For each value of ρ, this measure combines data for all of the tasksets generated for each utilisation level from 0.7 to 1, with a step of 0.2. For each value ρ of the selected parameter, the schedulability measure is W_y(ρ) = Σ_{∀Γ} (U_Γ × B_y(Γ, ρ)) / Σ_{∀Γ} U_Γ, where B_y(Γ, ρ) is the result (1 if schedulable, 0 otherwise) of a schedulability test y for a taskset Γ and parameter value ρ (U_Γ is the taskset utilisation). A method producing the highest weighted measure is the most prone to identify schedulable tasksets.
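The weighted measure itself reduces to a few lines. The sketch below assumes tasksets is a list of (taskset, U_Γ) pairs generated for one value of ρ, and test is the schedulability test B_y (both hypothetical names).

def weighted_schedulability(tasksets, test):
    # W_y(rho) = sum(U_G * B_y(G, rho)) / sum(U_G) over all tasksets
    # generated for one value of rho; `test` returns True iff schedulable.
    num = sum(U * (1 if test(ts) else 0) for (ts, U) in tasksets)
    den = sum(U for (_, U) in tasksets)
    return num / den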


[Plot: Schedulable tasksets (%) vs. taskset utilisation (70–100%), comparing NO-CRPD, CRPD-Fixed-PP, Combined-Multiset, and Feasibility Analysis.]

Figure 7.5. Schedulability ratio at different taskset utilisations.

7.5.2 Experiments

In the first experiment (Fig. 7.5), we evaluated the impact of the taskset utilisation (x-axis) on the effectiveness of the analyses in determining the taskset schedulability (y-axis). We notice that CRPD-Fixed-PP succeeds in identifying significantly more schedulable tasksets compared to Feasibility Analysis and Combined-Multiset, e.g. for U_Γ = 0.88, CRPD-Fixed-PP identifies 48%, Combined-Multiset 34%, and Feasibility Analysis only 0.6% of all generated tasksets as schedulable.

In the second experiment (Fig. 7.6), we evaluated the impact of the taskset size (x-axis) on the effectiveness of the analyses in determining the taskset schedulability (y-axis), with the weighted measure. The evaluated taskset size ranges from 3 to 10. We notice that with the increase in taskset size, all methods deteriorate in finding schedulable tasksets. However, CRPD-Fixed-PP deteriorates at the lowest rate, still being able to find the largest number of schedulable tasksets compared to Feasibility Analysis and Combined-Multiset.

In the third experiment (Fig. 7.7), we evaluated the impact of the number of NPRs per task (x-axis) on the effectiveness of the analyses in determining the taskset schedulability (y-axis), with the weighted measure. In this experiment, we randomly generated the number of NPRs from the uniform distribution in the range [3, x], where x is a value on the x-axis of the figure. We notice that the greater the number of NPRs, the higher the weighted schedulability of CRPD-Fixed-PP, meaning it is more prone to identify schedulable tasksets.


[Plot: Weighted measure vs. taskset size (3–10) for the four analyses.]

Figure 7.6. Weighted measure at different taskset sizes.

[Plot: Weighted measure vs. maximum number of NPRs (20–100), comparing Combined-Multiset, CRPD-Fixed-PP, NO-CRPD, and Feasibility Analysis.]

Figure 7.7. Weighted measure at different upper bounds on the number of non-preemptive regions.

This is the case because the greater the number of NPRs, the lower the blocking from the lower-priority tasks. In contrast, with Feasibility Analysis it is the opposite, because the greater the number of NPRs, the greater the estimation of the preemption cost in this case.

In the fourth experiment (Fig. 7.8), we evaluated the impact of the number of UCBs per preemption point (x-axis) on the effectiveness of the analyses in determining the taskset schedulability (y-axis), with the weighted measure.


[Plot: Weighted measure vs. maximum number of UCBs per preemption point (0–256), comparing Combined-Multiset, CRPD-Fixed-PP, NO-CRPD, and Feasibility Analysis.]

Figure 7.8. Weighted measure at different upper bounds on the number of UCBs per preemption point.

In this experiment, we removed the pessimistic assumption from the default evaluation setup that all preemption points exhibit the maximum number (Max from Table 7.2) of UCBs. Therefore, for each preemption point PP_{i,k}, we randomly generated the number of UCBs from the uniform distribution in the range [x, Max_i], where x is a value on the x-axis of the figure, and Max_i is the maximum number of UCBs of τ_i, derived from the configuration assigned from Table 7.2. In cases when the x value exceeded Max_i, Max_i was assigned. We notice that only CRPD-Fixed-PP is affected in this experiment. In the case when the number of UCBs per point is derived from the range [0, Max_i], the proposed method is more prone to identify schedulable tasksets, as it accounts for the variability of the CRPD throughout the task's execution, unlike the other presented methods.

[Bar chart: Weighted measure (0.4–0.55) for C.M., C.F.P, and C.F.P'.]

Figure 7.9. Weighted measure at different reload-ability conditions.


In the fifth experiment (Fig. 7.9), we evaluated the impact of changing the reload-ability condition on the CRPD-aware methods. We considered two cases. Bars (C.M. – Combined-Multiset) and (C.F.P – CRPD-Fixed-PP) represent the weighted measures obtained with the default evaluation setup, where each UCB within the task's execution is definitely reloadable (since ECBs are assigned in NPRs directly preceding and succeeding the point where the block is useful). Bar (C.F.P' – CRPD-Fixed-PP) considers the case where a UCB is reloadable only once throughout the entire sequence of preemption points where it is useful, thus resulting in more identified schedulable tasksets. This is even more important for tasks with variable initialisation at the beginning, and dispatching at the task's end.

7.6 Summary and Conclusions

In this chapter, we first described the importance of the self-pushing phenomenon for the schedulability analysis of tasks with non-preemptive regions. Then, we presented the existing feasibility analysis for LPS-FPP, proposed by Yao et al. [57, 58], and we showed the problems in the way their analysis accounts for preemption delays. Based on the identified problems, we proposed a preemption-delay aware response-time analysis for tasks with non-preemptive regions under fixed-priority preemptive scheduling. The analysis accounts for a tighter estimation of cache-related preemption delays, thus dominating the existing feasibility analysis. The proposed analysis is based on the observations that:

• the number of cache blocks which can be reloaded during the execution of a task can be smaller than the number of cache blocks which can be useful in a task,

• a single job with non-preemptive regions can be affected by preemption delay only from the preemptions occurring during the maximum time duration from the start of its first NPR until the start of its last NPR.

The evaluation shows that the proposed analysis significantly improves the state of the art of preemption-delay aware schedulability analyses, since it accounts for the estimation of tighter bounds on cache-related preemption delays.

Chapter 8

Preemption-delay aware schedulability analysis for fully-preemptive tasks

In this chapter, we propose a novel preemption-delay aware schedulability analysis for sporadic fully-preemptive tasks (see Section 5.4). We assume the general task model, represented with a control-flow graph consisting of basic blocks and edges, and consider single-core systems with a single-level direct-mapped or set-associative LRU cache. The chapter is divided into the following sections:

• ECB- and UCB-based CRPD approaches – a description of SOTA schedulability analyses for fully-preemptive tasks,

• Problem statement – a description of the identified problems in SOTA,

• Improved response-time analysis – the proposed solution,

• Evaluation – a description of the experiments and their results,

• Summary and Conclusions of the chapter.



Table 8.1. List of important symbols used in this chapter (CRPD terms are upper bounds).

Symbol          Brief explanation
Γ               Taskset
τ_i             Task with index i
T_i             The minimum inter-arrival time of τ_i
D_i             Relative deadline of τ_i
C_i             The worst-case execution time of τ_i
P_i             Priority of τ_i
R_i             The worst-case response time of τ_i
γ_i             Upper bound on CRPD of τ_i
ECB_i           Set of accessed (evicting) cache blocks during τ_i
UCB_i           Set of useful cache blocks during τ_i
ucb_i^max       The maximum number of UCBs at a single preemption point of τ_i
hp(i), lp(i)    Sets of tasks with higher and lower priority than P_i, respectively
hpe(i)          Set of tasks with higher priority than P_i, including τ_i
aff(i, h)       Set of tasks that can affect R_i and can be preempted by τ_h
m               Cache block
t               Time duration
γ^ecbu, γ^ucbu, γ^ecbm, γ^ucbm, γ^ecbp, γ^ucbp
                CRPD upper bound computed with the ECB-Union, UCB-Union,
                ECB-Union multiset, UCB-Union multiset, ECB-Union partitioning,
                and UCB-Union partitioning method, respectively
γ_{i,h}         CRPD due to a single job of a higher-priority task τ_h executing
                within the worst-case response time of task τ_i
(τ_h, τ_j)      Ordered pair representing the preemption from the preempting task
                τ_h (first element) on the preempted task τ_j (second element)
λ_r             Set of preemption pairs, representing a single partition of all
                possible preemptions (λ_r ∈ Λ_{i,t})
Λ_{i,t}         Multiset of preemption partitions, consisting of all possible
                preemptions within duration t between jobs of the first i tasks of Γ
γ_i(λ_r)        CRPD from the possible preemptions given in partition λ_r
γ(i, t)         Function that computes a CRPD bound, accounting for all preemptions
                that can occur within duration t between the first i tasks of Γ
E_j^h(t)        Upper bound on the number of times a task τ_h can preempt τ_j
                within duration t
A_{i,t}         Matrix of E_j^h(t) values for each pair of the first i tasks in Γ
(τ_i, PT)       Preemption scenario, representing a single interruption due to
                preemption of a task τ_i by preempting tasks from the set PT
γ(τ_i, PT)      CRPD of the preemption scenario (τ_i, PT)
Π_c             Preemption combination, a set of non-empty preemption scenarios
γ(Π_c)          CRPD of the preemption combination Π_c
Π_{i,λ}         Set of preemption combinations between the first i tasks of Γ such
                that each combination is consistent with partition λ


8.1 ECB- and UCB-based CRPD approaches

In this section, we briefly describe the most relevant CRPD-aware analyses for understanding the contributions of this chapter. We describe the UCB-Union approach [43], the ECB-Union approach [44], and their multiset variants [45].

The UCB-Union and ECB-Union approaches are computed as the least fixed points of the following equation for the worst-case response time:

R_i^{(l+1)} = C_i + Σ_{τ_h ∈ hp(i)} ⌈R_i^{(l)}/T_h⌉ × (C_h + γ_{i,h})    (8.1)

In Equation 8.1, γ_{i,h} represents the CRPD due to a single job of a higher-priority task τ_h executing within the worst-case response time of task τ_i. This term is computed differently for each of the CRPD approaches.

The UCB-Union approach computes γ_{i,h}^{ucbu} with the following equation, accounting that a job of τ_h causes a reload of each cache block which it may access and which is useful during the execution of at least one of the tasks from the range [τ_{h+1}, τ_{h+2}, ..., τ_i]:

γ_{i,h}^{ucbu} = | ( ⋃_{τ_k ∈ aff(i,h)} UCB_k ) ∩ ECB_h | × BRT    (8.2)

The ECB-Union approach computes γ_{i,h}^{ecbu} with the following equation, accounting that a job of τ_h is preempted by all of the tasks with higher priority than τ_h, after the job directly preempted one of the tasks from the range [τ_{h+1}, τ_{h+2}, ..., τ_i]. In this case, the preemption resulting in the highest number of evicted useful cache blocks is considered:

γ_{i,h}^{ecbu} = max_{τ_k ∈ aff(i,h)} | ( ⋃_{τ_h' ∈ hpe(h)} ECB_{h'} ) ∩ UCB_k | × BRT    (8.3)
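For concreteness, Equations 8.2 and 8.3 can be sketched as set operations in Python, assuming tasks are indexed by priority (index 0 is the highest) so that hpe(h) corresponds to indices 0..h and aff(i, h) to indices h+1..i; the dictionary fields are hypothetical names.

def gamma_ucb_union(i, h, tasks, BRT):
    # Eq. 8.2: blocks evicted by tau_h that are useful in at least one
    # task of aff(i, h), i.e. tasks with indices h+1..i.
    ucbs = set().union(*(tasks[k]["UCB"] for k in range(h + 1, i + 1)))
    return len(ucbs & tasks[h]["ECB"]) * BRT

def gamma_ecb_union(i, h, tasks, BRT):
    # Eq. 8.3: tau_h together with all of hpe(h) (indices 0..h) preempts
    # the task of aff(i, h) with the most evicted useful blocks.
    # Assumes h < i, so that aff(i, h) is non-empty.
    ecbs = set().union(*(tasks[hp]["ECB"] for hp in range(h + 1)))
    return max(len(ecbs & tasks[k]["UCB"])
               for k in range(h + 1, i + 1)) * BRT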

Improving upon the above two approaches, Altmeyer et al. [45] introduced the UCB-Union multiset and ECB-Union multiset approaches, which are computed as the least fixed points of the following equation:

R_i^{(l+1)} = C_i + Σ_{τ_h ∈ hp(i)} ( ⌈R_i^{(l)}/T_h⌉ × C_h + γ_{i,h} )    (8.4)

where γ_{i,h} represents the CRPD due to each job of a higher-priority task τ_h executing within the worst-case response time of task τ_i.


The ECB-Multiset approach computes γ_{i,h}^{ecbm} accounting that τ_h can preempt each task τ_k, h < k ≤ i, at most ⌈R_k/T_h⌉ × ⌈R_i/T_k⌉ times within R_i. It first computes the multiset M_{i,h} which, for each τ_k ∈ aff(i,h), contains that many copies of the cost of a preemption of τ_k by τ_h and all tasks in hpe(h):

M_{i,h} = ⋃_{τ_k ∈ aff(i,h)} ⋃^{⌈R_k/T_h⌉ × ⌈R_i/T_k⌉} { | ( ⋃_{τ_h' ∈ hpe(h)} ECB_{h'} ) ∩ UCB_k | }    (8.5)

Based on the above multiset, γ_{i,h}^{ecbm} is computed as the sum of the ⌈R_i/T_h⌉ maximum values from M_{i,h}, accounting that only ⌈R_i/T_h⌉ jobs of τ_h can directly preempt and cause CRPD on the preemptable jobs accounted in M_{i,h}.

The UCB-Multiset approach computes γ_{i,h}^{ucbm} by first computing the multiset M_{i,h}^{ucb}, which consists of all possibly useful cache blocks from jobs which can be released within R_i and have priority higher than or equal to τ_i and lower than τ_h:

M_{i,h}^{ucb} = ⋃_{τ_k ∈ aff(i,h)} ⋃^{⌈R_k/T_h⌉ × ⌈R_i/T_k⌉} UCB_k    (8.6)

Next, this approach computes the multiset M_{i,h}^{ecb}, which consists of all possibly evicting cache blocks within jobs of τ_h that can be released within R_i. The following equation includes an instance of each evicting cache block of τ_h for each job of τ_h that can be released within R_i:

M_{i,h}^{ecb} = ⋃^{⌈R_i/T_h⌉} ECB_h    (8.7)

The upper bound on the CRPD from jobs of τ_h preempting all jobs from τ_{h+1} to τ_i is equal to the size of the intersection of those multisets, with the block reload time accounted for:

γ_{i,h}^{ucbm} = | M_{i,h}^{ucb} ∩ M_{i,h}^{ecb} | × BRT    (8.8)
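The multiset computation of Equations 8.6–8.8 maps naturally onto Python's collections.Counter, whose & operator keeps the minimum multiplicity per element, i.e. exactly the multiset intersection. The sketch below assumes the same hypothetical task representation as before and a mapping R from task index to its worst-case response time.

from collections import Counter
from math import ceil

def gamma_ucb_multiset(i, h, tasks, R, BRT):
    # Eq. 8.6: possibly useful blocks of every tau_k in aff(i, h), one copy
    # per possible preemption of a tau_k job by tau_h within R_i.
    M_ucb = Counter()
    for k in range(h + 1, i + 1):
        copies = ceil(R[k] / tasks[h]["T"]) * ceil(R[i] / tasks[k]["T"])
        for block in tasks[k]["UCB"]:
            M_ucb[block] += copies
    # Eq. 8.7: one copy of ECB_h per job of tau_h released within R_i.
    M_ecb = Counter({b: ceil(R[i] / tasks[h]["T"]) for b in tasks[h]["ECB"]})
    # Eq. 8.8: Counter '&' takes the per-element minimum of multiplicities,
    # i.e. the multiset intersection.
    return sum((M_ucb & M_ecb).values()) * BRT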

The Combined-Multiset approach first computes the worst-case response time R_i^{ecbm} using Equation 8.4 and γ_{i,h}^{ecbm}, and similarly computes R_i^{ucbm} using Equation 8.4 and γ_{i,h}^{ucbm}. Then, the final result is computed as min(R_i^{ecbm}, R_i^{ucbm}).


8.2 Problem statement

In this section, we present the identified problems considering CRPD over-approximation when using the UCB- and ECB-based approaches, including the multiset variants.

Problem 1: The combined approach over-approximates the CRPD bounds because all preemptions that may occur within a response time R_i are treated the same, with at most one method at a time. However, within all preemptions that may occur within R_i, different preemption sub-groups may be analysed with different analyses; thus, the CRPD may be further reduced by computing the bounds for different preemption sub-groups individually, instead of computing it with one method at a time for a single group of all preemptions.

Problem 2: The combined approach accounts for CRPD from many different preemption combinations which cannot occur together. This is presented with the following example. In Figure 8.1, we present three tasks τ_1, τ_2, and τ_3 with their respective sets of evicting and useful cache blocks. In the example, it is assumed that tasks τ_1 and τ_2 can be released at most once during the execution of τ_3, and that the block reload time is equal to 1. Based on this, only two preemption combinations which result in the worst-case CRPD bound are possible: 1) a job of τ_2 directly preempts a job of τ_3, and a job of τ_1 directly preempts a job of τ_2 (nested preemptions); 2) a job of τ_2 directly preempts a job of τ_3, and a job of τ_1 directly preempts a job of τ_3 (direct preemptions).


Figure 8.1. Example of the pessimistic CRPD estimation in both UCB- and ECB-Union based approaches. Notice that the worst-case execution time is in reality significantly larger than the CRPD (black rectangles), but the focus of the figure is rather on the depiction of preemptions and CRPD.


For each task, the black rectangles in the figure represent the worst-case execution time, the grey rectangles represent the CRPD, whereas the sets of integer values above the grey rectangles represent the cache sets whose reloads must be accounted for.

Considering the given cache-block sets, the actual worst-case CRPD, based on the separately analysed preemption combinations, is:

• 8 (nested preemptions): This is the case because τ_2 evicts cache blocks 3, 4, 7, and 8, which are then reloaded during the post-preemption execution of τ_3. After that, τ_1 evicts blocks 1 and 2 when preempting τ_2, which are reloaded during the post-preemption execution of τ_2, and τ_1 also evicts cache blocks 5 and 6, which are reloaded during the post-preemption execution of τ_3. Notice that although τ_1 also potentially evicts blocks 3 and 4 from τ_3, they are accounted as reloads only once within τ_3, because τ_3 is interrupted once and thus only one reload of each useful cache block within the remaining execution of τ_3 is possible.

• 8 (direct preemptions): This is the case because τ_2 evicts cache blocks 3, 4, 7, and 8 from τ_3, and τ_1 evicts cache blocks 3, 4, 5, and 6 from τ_3.

Since any other preemption combination can be derived only by removing one of the preemptions accounted in the two above, the worst-case CRPD is equal to 8. However, the UCB-Union based approaches (including the multiset variant) compute the following CRPD:

γ_{i,h}^{ucbu} = | ( ⋃_{τ_k ∈ aff(i,h)} UCB_k ) ∩ ECB_h |
γ_{3,1}^{ucbu} = | ( UCB_3 ∪ UCB_2 ) ∩ ECB_1 | = 6
γ_{3,2}^{ucbu} = | UCB_3 ∩ ECB_2 | = 4
γ_{3,1}^{ucbu} + γ_{3,2}^{ucbu} = 4 + 6 = 10
accounted reloads for blocks: 1, 2, 3, 3, 4, 4, 5, 6, 7, 8

The UCB-Union based approaches compute a CRPD upper bound of 10 reloads, thus approximating two block reloads over the safe upper bound (8 reloads) illustrated in Figure 8.1. Compared to the leftmost case from Figure 8.1, the accounted infeasible reloads are for blocks 3 and 4. Compared to the rightmost case, the accounted infeasible reloads are for blocks 1 and 2.


The ECB-Union based approaches (including the multiset variant) compute the CRPD upper bound as follows:

γ_{i,h}^{ecbu} = max_{τ_k ∈ aff(i,h)} | ( ⋃_{τ_h' ∈ hpe(h)} ECB_{h'} ) ∩ UCB_k |
γ_{3,1}^{ecbu} = max_{τ_k ∈ {2,3}} | ECB_1 ∩ UCB_k | = max( |ECB_1 ∩ UCB_2|, |ECB_1 ∩ UCB_3| ) = 4
γ_{3,2}^{ecbu} = | ( ECB_1 ∪ ECB_2 ) ∩ UCB_3 | = 6
γ_{3,1}^{ecbu} + γ_{3,2}^{ecbu} = 4 + 6 = 10

Similarly to the UCB-Union based approaches, the ECB-Union based approaches compute a CRPD upper bound of 10 reloads, thus approximating two cache-block reloads over the safe bound.

Even when the lower of the two bounds is selected, the CRPD bound is over-approximated by accounting for two cache-block reloads which cannot occur in a single combination of preemptions during runtime. The CRPD over-approximation is further increased when multiple jobs of each task are introduced. In this chapter, we propose a novel method for computing the CRPD and the worst-case response time, accounting for the above-described problems.
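Using the union-based helpers sketched in Section 8.1, the example can be replayed numerically. The concrete block sets below are inferred from the reloads discussed around Figure 8.1 and are therefore an assumption, with BRT = 1 as stated in the example.

# Cache-block sets inferred from the reloads discussed above (an assumption;
# the figure itself is not reproduced here). Tasks are ordered by priority.
tasks = [
    {"ECB": {1, 2, 3, 4, 5, 6}, "UCB": set()},               # tau_1 (highest)
    {"ECB": {3, 4, 7, 8},       "UCB": {1, 2}},              # tau_2
    {"ECB": set(),              "UCB": {3, 4, 5, 6, 7, 8}},  # tau_3
]
ucbu = gamma_ucb_union(2, 0, tasks, 1) + gamma_ucb_union(2, 1, tasks, 1)
ecbu = gamma_ecb_union(2, 0, tasks, 1) + gamma_ecb_union(2, 1, tasks, 1)
print(ucbu, ecbu)   # both evaluate to 10, versus the actual worst case of 8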

8.3 Improved response-time analysis

In the remainder of the chapter, when we refer to the term preemption, we consider both indirect (nested) and direct preemptions. We start by defining a cache-aware worst-case response time equation, slightly different from the existing ones. Formally, the response-time analysis is defined as the least fixed point of the following equation:

R_i^{(l+1)} = C_i + γ(i, R_i^{(l)}) + Σ_{τ_h ∈ hp(i)} ⌈R_i^{(l)}/T_h⌉ × C_h    (8.9)

Notice that unlike the existing approaches, Equation 8.9 computes the CRPD upper bound γ(i, t), which is a function that implicitly accounts for all preemptions that can occur within duration t between the first i tasks of Γ.

A CRPD upper bound on all preemptions that can occur within duration t can be computed more accurately by applying the following four steps, which we describe in more detail in the remainder of this section:



Figure 8.2. Worst-case preemptions for τ_3 during the time duration t = 46.

1. Derive all possible preemptions which can occur within duration t between the jobs of the first i tasks of Γ (described in Subsection 8.3.1).

2. Divide the possible preemptions into partitions such that each partition accounts for single-job preemptions between the tasks (described in Subsection 8.3.2).

3. Compute the CRPD bounds for each partition individually (described in Subsection 8.3.3).

4. Sum the CRPD bounds of all partitions to obtain the cumulative CRPD bound on all possible preemptions within duration t (described in Subsection 8.3.4).

To show an overview of how the proposed analysis works, we provide the running example in Figure 8.2, and we compute the upper bound on CRPD within the time duration t = 46. The analysis computes the bound as follows:

1. Derive all possible preemptions which can occur within duration t between the jobs of the first i tasks of Γ.
Example: Given the tasks from Figure 8.2, during the 46 time units, task τ_1 can preempt τ_3 at most two times, and it can preempt τ_2 at most once. Also, τ_2 can preempt τ_3 at most once. More formally, a single preemption from a job of τ_h on a job of τ_j is represented with an ordered pair (τ_h, τ_j). Thus, all possible preemptions within 46 time units can be represented by the following multiset of ordered preemption pairs:

{ (τ_1, τ_3), (τ_1, τ_3), (τ_1, τ_2), (τ_2, τ_3) }    (8.10)

2. Divide the possible preemptions into partitions such that each partition accounts for single-job preemptions between the tasks.
Example: To represent the partitions, we generate the multiset Λ which

consists of all possible preemptions, divided into partitions that account for single-job preemptions between the tasks. Given the possible preemptions from the multiset derived in the previous step, the multiset Λ of all partitions is:

Λ = { {(τ_1, τ_2), (τ_1, τ_3), (τ_2, τ_3)}, {(τ_1, τ_3)} }

where the first element is Partition 1 and the second element is Partition 2.

The multiset Λ consists of two partitions (each represented as a set of preemptions), such that the first partition consists of the preemptions {(τ_1, τ_2), (τ_1, τ_3), (τ_2, τ_3)}, meaning that it is possible that τ_1 preempts τ_2, that τ_1 preempts τ_3, and that τ_2 preempts τ_3. Jointly, the partitions account for all possible preemptions among the three tasks within a duration of 46 time units (a code sketch of this partitioning step is given after the list).

3. Compute the CRPD bounds for each partition individually.
Example: As we showed in the previous section, when a single job of each task may preempt the other jobs with lower priority, the upper bound on CRPD is 8 time units. This is the upper bound on all preemptions accounted in Partition 1. Considering Partition 2, it consists of a single preemption, from a job of τ_1 on τ_3, and in this case the upper bound is 4 time units, since the preemption may lead to the reloads of cache blocks 3, 4, 5 and 6.

4. Sum the CRPD bounds of all partitions to obtain the cumulative CRPD bound on all possible preemptions within duration t.
Example: The sum of the upper bounds for Partition 1 and Partition 2 is 8 + 4 = 12, which is the upper bound on all preemptions within 46 time units of the three shown tasks.
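One way to implement the partitioning of step 2, consistent with the running example (though not necessarily the thesis's exact construction), is to place the r-th occurrence of every preemption pair into the r-th partition. A sketch, assuming counts maps each pair (h, j) to its bound E_j^h(t):

def build_partitions(counts):
    # counts maps a preemption pair (h, j) to E_j^h(t); the r-th occurrence
    # of every pair goes into the r-th partition, so each partition holds
    # each pair at most once and the partitions jointly cover all preemptions.
    if not counts:
        return []
    partitions = []
    for r in range(1, max(counts.values()) + 1):
        part = {pair for pair, n in counts.items() if n >= r}
        if part:
            partitions.append(part)
    return partitions

# Running example (t = 46): tau_1 can preempt tau_3 twice, the rest once.
print(build_partitions({(1, 3): 2, (1, 2): 1, (2, 3): 1}))
# -> [{(1, 2), (1, 3), (2, 3)}, {(1, 3)}]  (element order within a set may vary)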

In the remainder of this section, we formally define the introduced terms and prove that the proposed analysis results in a safe CRPD upper bound. The running example remains in use, serving for a better understanding of how the above values are computed and what they formally represent. This section is divided into the following subsections: 8.3.1 – describes the computation of upper bounds on the number of preemptions, 8.3.2 – describes the preemption partitioning, 8.3.3 – computation of the CRPD bound for a single partition, 8.3.4 – computation of the CRPD bound for all preemptions, 8.3.5 – correctness proof for the computation of the worst-case response time, 8.3.6 – time complexity, and


8.3.7 – the additional computation of the CRPD bound for a single partition, based on finding the worst-case preemption combination.

8.3.1 Upper bounds on the number of preemptions

Definition 8.1. An upper bound E^h_j(t) on the number of times a task τh can preempt τj (h < j) within duration t is defined with the following equation:

    E^h_j(t) = ⌈t/T_h⌉                  if ⌈t/T_h⌉ ≤ ⌈t/T_j⌉
             = ⌈t/T_j⌉ × ⌈R_j/T_h⌉      if ⌈t/T_h⌉ > ⌈t/T_j⌉        (8.11)

Proposition 8.1. E^h_j(t) is an upper bound on the number of possible preemptions from τh on τj within duration t.

Proof. Let us consider the following cases:

⌈t/T_h⌉ ≤ ⌈t/T_j⌉: Each job of τh can preempt τj at most once, therefore the number of τh jobs which can be released within duration t is a safe bound on the number of preemptions from τh on τj within t.

⌈t/T_h⌉ > ⌈t/T_j⌉: An upper bound on preemptions from jobs of τh on a single job of τj is equal to ⌈R_j/T_h⌉ since it is also an upper bound on the number of times that jobs of τh can be released within the worst-case response time R_j of a single job. Since Equation 8.11 applies the bound ⌈R_j/T_h⌉ to each job of τj which can be released within t, the proposition holds.
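To make the piecewise definition concrete, the following is a minimal Python sketch of Equation 8.11. The function name and the illustrative parameters are our own, not the actual task parameters of Figure 8.2.

```python
from math import ceil

def preemption_bound(t, T_h, T_j, R_j):
    """E_j^h(t): bound on preemptions of tau_j by tau_h within duration t
    (Equation 8.11). T_h, T_j are the tasks' periods; R_j is the
    worst-case response time of tau_j."""
    jobs_h = ceil(t / T_h)   # tau_h jobs releasable within t
    jobs_j = ceil(t / T_j)   # tau_j jobs releasable within t
    if jobs_h <= jobs_j:
        return jobs_h        # each tau_h job preempts tau_j at most once
    # otherwise: at most ceil(R_j / T_h) preemptions per tau_j job
    return jobs_j * ceil(R_j / T_h)

# Hypothetical parameters: with t = 46, T_h = 20, T_j = 50 and
# R_j = 35, the bound is 1 * ceil(35/20) = 2.
print(preemption_bound(46, 20, 50, 35))   # 2
```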

8.3.2 Preemption partitioning

Once all the possible preemptions which can occur within duration t are identified, we divide them into partitions, such that no partition accounts for the same preemption pair of the first i tasks in Γ more than once, and such that all partitions jointly account for all possible preemptions.

Definition 8.2. A multiset Λi,t of partitions consisting of all possible preemptions that can occur within duration t, between jobs of the first i tasks of Γ:

    Λi,t = {λ1, λ2, ..., λz} such that λr = {(τh,τj) | r ≤ E^h_j(t)}        (8.12)

In Equation 8.12, Λi,t is defined as a multiset of sets (partitions). Each set λr consists of possible preemptions and each preemption is represented as


an ordered pair (τh,τj) where the first element represents the preempting, and the second element represents the preempted task. The multiset Λ is formed of exactly z partitions, where z = max{E^h_j(t) | 1 ≤ h < j ≤ i}. Example: Given the taskset from Figure 8.2, the multiset of partitions within 46 time units is:

    Λ3,46 = {λ1, λ2} where λ1 = {(τ1,τ2), (τ1,τ3), (τ2,τ3)} and λ2 = {(τ1,τ3)}

It is important to notice that the multiset union (⊎) of all partitions in Λ3,46 results in the multiset of all possible preemptions, e.g.,

    λ1 ⊎ λ2 = {(τ1,τ2), (τ1,τ3), (τ2,τ3)} ⊎ {(τ1,τ3)}
            = {(τ1,τ2), (τ1,τ3), (τ2,τ3), (τ1,τ3)}

Proposition 8.2. Multiset Λi,t consists of all possible preemptions that may occur within duration t, between the jobs of the first i tasks of Γ.

Proof. Directly follows from Proposition 8.1 and Equation 8.12, since Equation 8.12 includes each possible preemption that can occur within duration t between the first i tasks of Γ in one of the partitions of Λi,t.
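As a sketch of Definition 8.2, the partitions of Λi,t can be generated directly from the bounds E^h_j(t): partition λr contains the pair (τh, τj) exactly when r ≤ E^h_j(t). The helper name and the dictionary encoding of task pairs below are our own.

```python
from collections import Counter

def build_partitions(E):
    """Multiset Lambda_{i,t} of Equation 8.12. E maps an ordered pair
    (h, j) to the bound E_j^h(t) on preemptions of tau_j by tau_h."""
    z = max(E.values(), default=0)            # number of partitions
    return [{pair for pair, bound in E.items() if r <= bound}
            for r in range(1, z + 1)]

# Running example: E_3^1(46) = 2, E_2^1(46) = 1, E_3^2(46) = 1.
E = {(1, 3): 2, (1, 2): 1, (2, 3): 1}
lam1, lam2 = build_partitions(E)
# lam1 == {(1, 2), (1, 3), (2, 3)}, lam2 == {(1, 3)}; their multiset
# union recovers all possible preemptions, as in the example above.
assert Counter(p for lam in (lam1, lam2) for p in lam) == Counter(E)
```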

8.3.3 CRPD bound on preemptions from a single partition

As suggested in Section 8.2, considering Problem 1 of CRPD over-approximation, computing a bound for different preemption partitions individually, instead of computing it for all preemptions at once, may result in more precise CRPD estimations.

To achieve this, once the multiset Λi,t of preemption partitions is computed, an upper bound on CRPD resulting from preemptions of a single partition λr ∈ Λi,t can be computed by selecting the minimum CRPD bound among the results from the UCB-Union and ECB-Union approaches. Here, we describe the improvements and adjustments of those approaches to compute the CRPD bound from preemptions contained within a partition.

In the following equations, aff(i, h, λr) represents the set of tasks with priorities higher than or equal to τi and lower than τh which can be preempted by τh according to the preemptions represented in λr. Formally: aff(i, h, λr) = {τk | (τh,τk) ∈ λr ∧ τk ∈ hpe(i)}. Also, with hp(i, λr) we denote the set of tasks with priorities higher than τi such that for each τh ∈ hp(i, λr) there is (τh,τi) ∈ λr.


First, we improve and adjust the ECB-Union approach, proposed by Altmeyer et al. [45]. In that approach, for a job of τh, it is accounted that it directly preempts one of the tasks from the aff(i, h, λr) set such that the maximum possible number of UCBs are evicted. In order for this approach to be correct, it is also accounted that tasks that can preempt τh also contribute to the evictions of useful cache blocks of the preempted task. This scenario is represented by a CRPD bound γ^ecbp_i,h. We further improve this formulation by accounting that a preemption from a single job of τh on any job of τk from aff(i, h, λr) cannot cause more cache-block reloads than the maximum number of UCBs that can be evicted at a single preemption point of τk. The maximum number of UCBs at a single preemption point of τk is represented by ucb^max_k. The above translates to Equation 8.13.

    γ^ecbp_i,h(λr) = max_{τk ∈ aff(i,h,λr)} min( | ( ⋃_{τh′ ∈ hp(h,λr) ∪ {τh}} ECB_h′ ) ∩ UCB_k | , ucb^max_k )        (8.13)

We build the correctness of the proposed computation on the correctness of the standard ECB-Union method [45].

Proposition 8.3. γ^ecbp_i,h(λr) is an upper bound on the number of reloads that may be imposed by a direct preemption from τh on one of the tasks within aff(i, h, λr).

Proof. A direct preemption from τh on one of the tasks within the aff(i, h, λr) set cannot cause more reloads than the maximum number of UCBs of a preemptable task which can be evicted by τh and all the tasks that may preempt τh. Also, such a bound cannot be greater than the maximum number of useful cache blocks ucb^max_k that may be present at a single preemption point within a preemptable task, which concludes the proof.

The ECB-Union based upper bound on all preemptions from λr is computed by summing the γ^ecbp_i,h terms for each possibly preempting task, from τ1 to τi−1:

    γ^ecbp_i(λr) = Σ_{h=1}^{i−1} γ^ecbp_i,h(λr)        (8.14)

Proposition 8.4. γ^ecbp_i(λr) is an upper bound on the number of reloads that can be caused by the preemptions from the partition λr.


Proof. For each direct preemption from the preempting jobs of tasks from τ1 to τi−1 in λr, Equation 8.14 accounts that the upper-bounded number of cache blocks is reloaded in one of the preemptable jobs, as follows from Proposition 8.3. Therefore, the proposition holds.
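The following Python sketch mirrors Equations 8.13 and 8.14 under our own encoding: tasks are indices 1..n in decreasing priority order, a partition is a set of (preempting, preempted) index pairs, and ECB, UCB and ucb_max are dictionaries. None of these names come from the thesis.

```python
def aff(i, h, lam):
    """Tasks tau_k (h < k <= i) that tau_h may preempt according to
    the partition lam of (preempting, preempted) pairs."""
    return {k for (hh, k) in lam if hh == h and h < k <= i}

def hp_lam(h, lam):
    """Tasks that may preempt tau_h according to lam."""
    return {hh for (hh, j) in lam if j == h}

def crpd_ecbp_partition(i, lam, ECB, UCB, ucb_max):
    """gamma_i^ecbp(lam), Equations 8.13 and 8.14: for every possibly
    preempting task, the worst affected task pays for the ECBs of the
    preempting task and of everything that may preempt it, capped by
    the largest single-point UCB count of the affected task."""
    total = 0
    for h in range(1, i):
        affected = aff(i, h, lam)
        if not affected:
            continue
        # ECBs of tau_h and of every task that may preempt tau_h
        ecbs = set().union(*(ECB[x] for x in hp_lam(h, lam) | {h}))
        total += max(min(len(ecbs & set(UCB[k])), ucb_max[k])
                     for k in affected)
    return total
```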

Next, we adjust the UCB-Union approach, proposed by Tan and Mooney [43]. In this approach, for a job of τh, it is assumed that it can evict useful cache blocks from each task τk from the aff(i, h, λr) set. However, since a single job of τh can directly or indirectly preempt each τk at only one of its preemption points, this cost can at most be equal to the sum of the maximum numbers of useful cache blocks at a single preemption point of each task τk from aff(i, h, λr). The above is formally represented with γ^ucbp_i,h in the following equation:

    γ^ucbp_i,h(λr) = min( | ( ⋃_{τk ∈ aff(i,h,λr)} UCB_k ) ∩ ECB_h | , Σ_{τk ∈ aff(i,h,λr)} ucb^max_k )        (8.15)

We build the correctness of the proposed computation on the correctness of the standard UCB-Union method [43].

Proposition 8.5. γ^ucbp_i,h(λr) is an upper bound on the number of reloads within all tasks from aff(i, h, λr) that may be imposed because of the cache-block accesses from a single job of τh.

Proof. A job of τh cannot impose more than one cache-block reload per cache-memory block m, such that m ∈ ECB_h and m ∈ UCB_k for any τk such that τk ∈ aff(i, h, λr), as follows from UCB-Union [43]. Also, since each task τk from aff(i, h, λr) can be preempted by τh at only one of its preemption points, the maximum number of reloads from τh cannot be greater than the sum of the maximum numbers of useful cache blocks that may be present at a preemption point within each such task. This concludes the proof.

The UCB-Union based upper bound on all preemptions from λr is also computed by summing the γ^ucbp_i,h terms for each possibly preempting task from τ1 to τi−1:

    γ^ucbp_i(λr) = Σ_{h=1}^{i−1} γ^ucbp_i,h(λr)        (8.16)

Proposition 8.6. γ^ucbp_i(λr) is an upper bound on the number of reloads that can be caused by the preemptions from the partition λr.


Proof. For each possibly preempting job from τ1 to τi−1 in λr, Equation 8.16 accounts that the job leads to an upper-bounded number of cache-block reloads in its possibly preemptable jobs, as follows from Proposition 8.5. Therefore, the proposition holds.

The final upper bound γi(λr) on the CRPD from possible preemptions given in λr, between single jobs of the first i tasks from Γ, is defined as the least bound of the two:

    γi(λr) = min( γ^ecbp_i(λr) , γ^ucbp_i(λr) ) × BRT        (8.17)

Proposition 8.7. γi(λr) is an upper bound on the CRPD from possible preemptions given in the partition λr.

Proof. Follows from Propositions 8.4 and 8.6 since Equation 8.17 results in the least bound.
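A matching sketch of Equations 8.15 to 8.17 follows, reusing aff and crpd_ecbp_partition from the previous listing; again the encoding is ours, not the thesis's.

```python
def crpd_ucbp_partition(i, lam, ECB, UCB, ucb_max):
    """gamma_i^ucbp(lam), Equations 8.15 and 8.16: the UCBs of all
    affected tasks that tau_h may evict, capped by the sum of their
    single-point UCB maxima."""
    total = 0
    for h in range(1, i):
        affected = aff(i, h, lam)          # from the sketch above
        if not affected:
            continue
        ucbs = set().union(*(UCB[k] for k in affected))
        total += min(len(ucbs & set(ECB[h])),
                     sum(ucb_max[k] for k in affected))
    return total

def crpd_partition(i, lam, ECB, UCB, ucb_max, BRT):
    """gamma_i(lam), Equation 8.17: the smaller of the two reload
    bounds, scaled by the block reload time."""
    return min(crpd_ecbp_partition(i, lam, ECB, UCB, ucb_max),
               crpd_ucbp_partition(i, lam, ECB, UCB, ucb_max)) * BRT
```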

8.3.4 CRPD bound on all preemptions within a time interval

Now, we define a computation for the CRPD bound on all preemptions which can occur within duration t, between the first i tasks of Γ.

Definition 8.3. An upper bound γ(i, t) on the CRPD of all preemptions, which can occur within duration t between the first i tasks of Γ, is defined with the following equation:

    γ(i, t) = Σ_{r=1}^{|Λi,t|} γi(λr)        (8.18)

Proposition 8.8. γ(i, t) is an upper bound on the CRPD of all preemptions which can occur within duration t between the first i tasks of Γ.

Proof. Directly follows from Propositions 8.2 and 8.7, since Equation 8.18 is a sum of the CRPD upper bounds of preemption partitions that jointly consist of all preemptions within t.

8.3.5 Worst-case response time

In this subsection, we prove that the computed worst-case response time is an upper bound.

Theorem 8.1. Ri is an upper bound on the worst-case response time of τi.


Proof. By induction over the tasks in Γ, in decreasing priority order.

Base case: R1 = C1, because hp(1) = ∅. Since C1 is the worst-case execution time of τ1, the proposition holds.

Inductive hypothesis: Assume that for all τh ∈ hp(i), Rh is an upper bound on the worst-case response time of τh.

Inductive step: We show that Equation 8.9 computes the worst-case response time of τi. Consider the least fixed point of Equation 8.9, for which Ri = Ri^(l) = Ri^(l+1). At this point, the equation accounts for the following upper bounds and worst-case execution times:

• Ci, which is the worst-case execution time of τi, assumed by the system model.

• γ(i, Ri), which is proved by Proposition 8.8 to be an upper bound on the CRPD of all jobs which can be released within duration Ri and have higher than or equal priority to τi.

• Σ_{∀τh∈hp(i)} ⌈Ri/Th⌉ × Ch, which is the worst-case interference caused by the execution of all jobs of tasks with higher priority than τi, without CRPD.

Since we proved for all the factors which can prolong the response time of τi that they are accounted as the respective execution and CRPD upper bounds in Equation 8.9, their sum results in an upper bound.
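The recurrence itself (Equation 8.9 is defined earlier in the chapter; the proof above lists its three terms) can be iterated to its least fixed point roughly as follows. The iteration cap and the None return are our own safeguards, not part of the analysis.

```python
from math import ceil

def response_time(i, C, T, gamma, max_iter=100_000):
    """Least fixed point of R_i = C_i + gamma(i, R_i)
    + sum_{h < i} ceil(R_i / T_h) * C_h, with tasks indexed 1..n in
    decreasing priority order and gamma(i, t) as in Equation 8.18."""
    R = C[i]
    for _ in range(max_iter):
        R_next = C[i] + gamma(i, R) + sum(ceil(R / T[h]) * C[h]
                                          for h in range(1, i))
        if R_next == R:
            return R              # fixed point reached
        R = R_next
    return None                   # did not converge within max_iter
```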

8.3.6 Time complexity

The time complexity of the proposed analysis can be improved, since in its current form Equation 8.12 explicitly creates all partitions, which can lead to the re-computation of CRPD bounds for many identical partitions. For this reason, we first define the matrix Ai,t, from which it is possible to identify how many repeated partitions there are, and compute the CRPD bound for each distinct partition only once, as introduced in Algorithm 2.

Definition 8.4. A matrix Ai,t of upper bounds on the number of preemptions between each pair of tasks with higher than or equal priority to Pi, which can occur within duration t, is defined with the following equation:

    Ai,t = (a_j,h) ∈ N^(i×i)  |  a_j,h = 0          if j ≤ h
                                 a_j,h = E^h_j(t)   if j > h        (8.19)

Proposition 8.9. Ai,t stores an upper-bounded number of preemptions, which can occur within t, between each pair of tasks from hpe(i).

Proof. Proposition 8.9 follows directly from Proposition 8.1 and the fact that τj cannot preempt any task τh of higher priority, or τj itself (j ≤ h).


Equation 8.19 defines a square matrix Ai,t such that the number of rows and columns is equal to i, and each entry of the matrix represents the maximum number of preemptions from a task τh on τj within duration t.

Example: Given the taskset from Figure 8.2, the matrix of preemptions during 46 time units looks as follows:

              τ1  τ2  τ3
            ⎡  0   0   0 ⎤  τ1
    A3,46 = ⎢  1   0   0 ⎥  τ2
            ⎣  2   1   0 ⎦  τ3

The element a(2,1) = 1 represents the maximum number of preemptions from τ1 on τ2 during 46 time units (note R2 = 14).

Algorithm explanation: In a matrix Ai,t, there are at most n∗(n−1)/2 values representing different numbers of possible preemptions among the tasks. Therefore, there are at most n∗(n−1)/2 distinct partitions to be generated. In Algorithm 2, we define the procedure that first generates Ai,t (line 3), and then generates distinct partitions one by one (lines 4 – 10). For each distinct partition, we compute the number sp of times a partition is repeated, then compute the partition

Data: Time duration t, task index i, taskset Γ
Result: CRPD upper bound ξ on all jobs with priority higher than or equal to Pi, which can be released within duration t.
 1  Function γ(i, t)
 2      ξ ←− 0
 3      Ai,t ←− generate the matrix of maximum preemption counts between the tasks (Equation 8.19)
 4      while Ai,t ≠ 0i×i do
 5          sp ←− minimum value from Ai,t, greater than zero
 6          λ ←− {(τh,τj) | sp ≤ a_j,h}
 7          γi(λ) ←− compute the CRPD upper bound from the preemptions in λ
 8          ξ ←− ξ + sp × γi(λ)
 9          Ai,t ←− decrease all values, greater than zero, by sp
10      end
11      return ξ

Algorithm 2: Algorithm for computing the cumulative CRPD during a time interval of length t.


(line 6), and account for its CRPD bound sp times in the cumulative CRPD bound ξ that is updated for each distinct partition (lines 7 and 8). After this, the partitioned preemptions are removed (line 9), and the next distinct partition is computed until no more preemptions are left to be partitioned. Formally, the termination criterion is satisfied when Ai,t equals the zero matrix 0i×i. At the end, the algorithm results in the same CRPD bound as Equation 8.18. Using this algorithm, the time complexity is O(n³ ∗ x), where the complexity of the computation at line 6 is O(x).
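A Python sketch of Algorithm 2 follows, using a sparse dictionary for the non-zero entries of Ai,t instead of a full matrix; crpd_of_partition stands in for the γi(λ) computation of Equation 8.17 and, in this toy run, simply looks up the per-partition bounds of the running example.

```python
def cumulative_crpd(E, crpd_of_partition):
    """Algorithm 2: cumulative CRPD bound xi. E maps (h, j) to the
    matrix entry E_j^h(t) of A_{i,t} (Equation 8.19)."""
    A = {pair: n for pair, n in E.items() if n > 0}   # sparse A_{i,t}
    xi = 0
    while A:                               # until A is the zero matrix
        sp = min(A.values())               # smallest non-zero entry (line 5)
        lam = frozenset(A)                 # pairs with count >= sp (line 6)
        xi += sp * crpd_of_partition(lam)  # partition repeats sp times (lines 7-8)
        A = {p: n - sp for p, n in A.items() if n > sp}   # line 9
    return xi

# Running example: gamma_3(lam1) = 8 and gamma_3(lam2) = 4 give 12.
E = {(1, 3): 2, (1, 2): 1, (2, 3): 1}
bounds = {frozenset({(1, 2), (1, 3), (2, 3)}): 8,
          frozenset({(1, 3)}): 4}
print(cumulative_crpd(E, lambda lam: bounds[lam]))   # 1*8 + 1*4 = 12
```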

8.3.7 CRPD computation using preemption scenarios

In this section, we propose an alternative computation for the upper bound γi(λ) on the CRPD of preemptions in the partition λ. The goal is to compute the CRPD bound from a single worst-case preemption combination among the preemptions from λ, addressing Problem 2 from Section 8.2. To achieve this, we first formally define the following terms:

• Preemption scenario (τi, PT), and its CRPD upper bound γ(τi, PT),
• Preemption combination Π^c_λ, and its CRPD upper bound γ(Π^c_λ).

Informally, we define a preemption combination as a set of feasible preemptions where only one job of each task is involved, such that all accounted preemptions are present in a partition λ. Before being able to formally define a preemption combination, we first formally define a preemption scenario.

Definition 8.5 (Preemption scenario (τi, PT)). A preemption scenario represents a single interruption due to preemption of a task and it is defined as an ordered pair (τi, PT), where τi represents the preempted (interrupted) task, and PT is a set of preempting tasks which execute after the interruption at τi and before the immediate resumption of τi. Formally, for each preemption scenario (τi, PT) it holds that PT ⊆ hp(i).

Example: Given the example from Fig. 8.2, the first preemption scenario in τ3 is (τ3, {τ1,τ2}), and the second preemption scenario is (τ3, {τ1}). Also, in the same figure, τ2 is preempted once and this preemption scenario is (τ2, {τ1}).

In order to compute the upper bound on the CRPD on τi, resulting from one interruption scenario, the ordering of the preempting tasks is not important. All of them are equally capable of evicting cache blocks of τi between its preemption and resumption, regardless of their ordering.


Definition 8.6 (CRPD of a preemption scenario γ(τi, PT)). An upper bound γ(τi, PT) on the CRPD of a preempted task τi resulting from a preemption scenario (τi, PT) is:

    γ(τi, PT) = | UCB_i ∩ ( ⋃_{τh ∈ PT} ECB_h ) | × BRT        (8.20)

Proposition 8.10. γ(τi, PT) is an upper bound on the CRPD of a preempted task τi resulting from a preemption scenario (τi, PT).

Proof. Since Equation 8.20 accounts that each UCB from τi is definitely reloaded with the worst-case block reload time if there is a corresponding evicting cache block from any of the preempting tasks within a scenario, the proposition holds.

Example: Given the preemption scenario (τ3, {τ1,τ2}), the upper bound on the CRPD of τ3 is:

    γ(τ3, {τ1,τ2}) = | UCB3 ∩ (ECB1 ∪ ECB2) | = 6
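A small sketch of Equation 8.20 follows. The concrete cache-block sets below are hypothetical: they are chosen so that, with BRT = 1, they reproduce the example values (4 for a preemption by τ1 alone, 6 for τ1 and τ2 together), but they are not the actual sets of Figures 8.1 and 8.2.

```python
def scenario_crpd(UCB_preempted, ECBs_of_preempting, BRT):
    """gamma(tau_i, PT), Equation 8.20: every useful block of the
    preempted task that some preempting task may evict is reloaded."""
    evicting = set().union(*ECBs_of_preempting) if ECBs_of_preempting else set()
    return len(set(UCB_preempted) & evicting) * BRT

UCB3 = {1, 2, 3, 4, 5, 6}                  # hypothetical
ECB1, ECB2 = {3, 4, 5, 6}, {1, 2, 5, 6}    # hypothetical
print(scenario_crpd(UCB3, [ECB1], 1))        # 4
print(scenario_crpd(UCB3, [ECB1, ECB2], 1))  # 6
```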

A safe upper bound on the CRPD of τi resulting from a preemption scenario (τi, PT) can be tightened even more if the low-level task analysis can provide more detailed information on UCBs and ECBs at different program points. In such a case, the bound is computed as the maximum intersection of the UCBs from a single point within τi and the evicting cache blocks of the tasks in PT. This is the case because Equation 8.20 considers a single preempted point and the tasks which may evict cache blocks before the preempted task resumes its execution. On the other hand, many existing approaches consider multiple preemption scenarios at once, and therefore this improvement is not applicable in their case, as shown by Shah et al. [114].

Given the formal definition of preemption scenarios, we can now define a preemption combination. With this definition, we need to ensure that a preemption combination consists only of preemption scenarios which account for interactions between a single job of each task. Therefore, we need to ensure that a preemption combination does not include two preemption scenarios which are mutually exclusive, given the constraint of using only single jobs.

Definition 8.7 (Preemption combination Π^c). A preemption combination Π^c is defined as a set of disjoint non-empty preemption scenarios between single jobs of tasks in Γ such that:


j l c j l c 1) If there are two preemption scenarios (τj, PT ) and (τl, PT ) in Π 1) If there are two preemption scenarios (τj, PT ) and (τl, PT ) in Π j l l j l l such that τh ∈ PT ∩ PT and Pl

The first constraint refers to the following case: if a single job of task τh preempts single jobs of tasks τj and τl, where Pj > Pl, then that job of τl is definitely preempted by the τj job.

The second constraint accounts that an additional preemption scenario with τh implies that Π^c accounted for two jobs of τh preempting a job of τj, while the definition accounts for at most one job of each task.

Example: Given the definition of a preemption combination, one possible combination is {(τ3, {τ1}), (τ3, {τ2})}, which describes the preemption scenario where τ1 directly preempts τ3 at one preemption point, while τ2 directly preempts τ3 at another preemption point. However, the set {(τ3, {τ1}), (τ2, {τ1})} is not a preemption combination, because it describes the case where a job of τ3 is preempted by a job of τ1, and a job of τ2 is preempted by a job of τ1, while the job of τ2 does not preempt the job of τ3. Since this is the case, more than one job of τ1 is accounted, which violates Definition 8.7.

Definition 8.8 (Preemption combination consistent with λ). We say that Π^c is a preemption combination consistent with the preemption partition λ iff for any preemption scenario (τj, PT) ∈ Π^c the preemptions captured by the scenario are possible, i.e. present in λ. Formally,

    ∀(τj, PT) ∈ Π^c : ∀τh ∈ PT : (τh, τj) ∈ λ

Example: Given the preemption partition λ = {(τ1,τ2), (τ1,τ3), (τ2,τ3)}, a preemption combination consistent with λ is {(τ3, {τ1}), (τ3, {τ2})}, since it describes preemption scenarios made of preemptions that are possible, i.e. present in λ.
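Definition 8.8 amounts to a simple membership check; below is a one-function sketch under our own pair encoding ((preempting, preempted) tuples in lam, scenarios as (preempted, frozenset of preempting) pairs).

```python
def consistent_with(combination, lam):
    """Definition 8.8: every preemption captured by a scenario of the
    combination must be present in the partition lam."""
    return all((h, j) in lam for j, PT in combination for h in PT)

lam = {(1, 2), (1, 3), (2, 3)}
print(consistent_with({(3, frozenset({1})), (3, frozenset({2}))}, lam))  # True
```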

Definition 8.9 (CRPD γ(Π^c) of a preemption combination). An upper bound γ(Π^c) on the CRPD of a single preemption combination Π^c is defined as the sum of the upper bounds of all preemption scenarios in Π^c. Formally,

    γ(Π^c) = Σ_{(τk, PT) ∈ Π^c} γ(τk, PT)        (8.21)


Proposition 8.11. γ(Π^c) is an upper bound on the CRPD from preemptions in Π^c.

Proof. A combination consists of a number of preemption scenarios representing the preemptions from preempting tasks at different preemption points. Following from Proposition 8.10, a sum of the CRPD upper bounds of each task interruption in a combination is an upper bound on the CRPD of all preemptions accounted in the combination, which concludes the proof.

Example: Given the preemption combination of two direct preemptions from Figure 8.1, the CRPD upper bound is:

    γ({(τ3, {τ1}), (τ3, {τ2})}) = γ(τ3, {τ1}) + γ(τ3, {τ2}) = 4 + 4 = 8

Given the preemption combination of a nested preemption from the figure,

    γ({(τ3, {τ1,τ2}), (τ2, {τ1})}) = γ(τ3, {τ1,τ2}) + γ(τ2, {τ1}) = 6 + 2 = 8
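Equation 8.21 then just sums the scenario bounds. The sketch below reuses scenario_crpd and the hypothetical block sets from the earlier listing, with a hypothetical UCB2 = {3, 4} chosen so that γ(τ2, {τ1}) = 2 as in the example.

```python
def combination_crpd(combination, UCB, ECB, BRT):
    """gamma(Pi^c), Equation 8.21: sum of the scenario bounds of all
    preemption scenarios (k, PT) in the combination."""
    return sum(scenario_crpd(UCB[k], [ECB[h] for h in PT], BRT)
               for k, PT in combination)

UCB = {2: {3, 4}, 3: {1, 2, 3, 4, 5, 6}}   # hypothetical
ECB = {1: {3, 4, 5, 6}, 2: {1, 2, 5, 6}}   # hypothetical
direct = {(3, frozenset({1})), (3, frozenset({2}))}
nested = {(3, frozenset({1, 2})), (2, frozenset({1}))}
print(combination_crpd(direct, UCB, ECB, 1))  # 4 + 4 = 8
print(combination_crpd(nested, UCB, ECB, 1))  # 6 + 2 = 8
```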

Now, we can imagine a set which consists of all possible preemption combinations which are consistent with the preemptions listed in λ. Then, among all the generated preemption combinations, we find the one which results in the worst-case CRPD and declare that a safe upper bound, since it is the maximum obtainable CRPD value among all the possible preemption combinations.

However, generating such a complete set of all possible preemption combinations is computationally inefficient. A potential solution comes from the fact that it is enough to compute a subset of the complete set of combinations, as long as we are sure that no greater CRPD value can be obtained in the remaining, unaccounted combinations. We show this with the following example: Given a set of possible preemptions λ = {(τ1,τ2), (τ1,τ3), (τ2,τ3)} from Figure 8.1, let us consider the following set of two combinations:

    Π3,λ = { {(τ3, {τ1}), (τ3, {τ2})} , {(τ3, {τ1,τ2}), (τ2, {τ1})} }

Π3,λ consists of: 1) a combination of direct preemption scenarios, and 2) a combination of nested preemption. Any other possible preemption combination, from preemptions in λ, can only be derived by omitting at least one preemption from a preemption scenario from one of the two combinations given in Π3,λ. E.g., the preemption combination {(τ3, {τ1,τ2})} is equal to and results in the same CRPD as {(τ3, {τ1,τ2}), (τ2, ∅)}. Also, all of the preemption scenarios from {(τ3, {τ1,τ2})} are already included in the second combination. Therefore, all the other possible combinations cannot result in a greater CRPD value than those in Π3,λ, meaning that this subset is sufficient to compute a safe CRPD.


A preemption combination which is constructed by adding a preemption scenario to any of the combinations in Π3,λ cannot be obtained. This is the case because in the first combination, it is accounted that τ3 is preempted by both tasks, at two different points, accounting for all preemptions in λ where τ3 can be preempted, i.e. (τ1,τ3) and (τ2,τ3). In this case, τ2 cannot be preempted considering the preemption (τ1,τ2) ∈ λ, as shown in Definition 8.7. In the second combination, τ3 is interrupted once while both tasks preempt it, and τ2 is preempted by τ1, meaning that all preemptions from λ are accounted.

In order to define a safe set of combinations Πi,λ, such that at least one of those combinations may result in the worst-case CRPD, we first introduce the term of set partitioning¹ in order to represent the different ways one task may be preempted by the others.

Example: Given the task τ3, the set partitions of the set {τ1,τ2} of its potentially preempting tasks are

    {{τ1, τ2}}
    {{τ1}, {τ2}}

Given a preemptable task τk, and a set of its possibly preempting tasks PT, all set partitions of PT represent all the ways τk may be preempted such that each task from PT preempts τk. This is the case because set partitions represent all the ways a set can be grouped into non-empty subsets, such that each set element is included in exactly one subset. Analogously, in this chapter, each set partition is transformed into a preemption combination defining one way a task (e.g. τ3 above) can be preempted. Each set partition consists of subsets, and each subset represents a preemption scenario on the preempted task. This transformation is formally defined in the function generateCombs(PT, τk) in Algorithm 3.

Example: Considering the different ways τ3 can be preempted, the set partition {{τ1,τ2}} consists of a single subset, and forms the preemption combination {(τ3, {τ1,τ2})}, while the set partition {{τ1}, {τ2}} consists of two subsets and forms the preemption combination {(τ3, {τ1}), (τ3, {τ2})} with two preemption scenarios: (τ3, {τ1}) and (τ3, {τ2}).
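Set partitions are easy to enumerate recursively. The following sketch also shows a generateCombs-style translation into preemption combinations; the filtering against Λ done on line 9 of Algorithm 3 is omitted here, and all names are our own.

```python
def set_partitions(elems):
    """All partitions of a set: every grouping into non-empty,
    disjoint subsets (Bell partitions)."""
    elems = list(elems)
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        for idx in range(len(part)):      # put `first` into an existing subset
            yield part[:idx] + [part[idx] | {first}] + part[idx + 1:]
        yield part + [{first}]            # ... or into a subset of its own

def generate_combs(PT, k):
    """Each set partition of the preempting tasks PT becomes one
    preemption combination for the preempted task tau_k."""
    return [{(k, frozenset(subset)) for subset in part}
            for part in set_partitions(PT)]

# tau_3 preempted by {tau_1, tau_2} yields the combinations
# {(3, {1, 2})} and {(3, {1}), (3, {2})}, as in the example above.
for comb in generate_combs({1, 2}, 3):
    print(comb)
```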

Proposition 8.12. Given a preemptable task τk, and a set of possibly preempting tasks PT, any combination of preemptions on τk will result in a CRPD less than or equal to that of some combination generated from generateCombs(PT, τk).

¹Set partitioning is a mathematical concept sometimes also known as Bell partitioning [115, 116], named after Eric Temple Bell. There are many fast algorithms for generating set partitions, e.g. [117, 118].


Proof. By contradiction: Let us assume that there is a preemption combination Π^c_k representing the ways in which τk can be preempted by tasks from PT, and that Π^c_k can result in a greater CRPD than any combination derived from generateCombs(PT, τk). The generated combinations represent all the ways τk may be preempted such that each task from PT preempts τk, since set partitions represent all the ways a set can be grouped into non-empty subsets, such that each element is included in exactly one subset. Thus, Π^c_k must omit at least one preemption, compared to at least one preemption combination derived from generateCombs(PT, τk). The initial assumption therefore contradicts Proposition 8.11, because Π^c_k cannot impose a larger CRPD than the corresponding preemption combination from generateCombs(PT, τk), which accounts for the same preemptions as in the preemption scenarios from Π^c_k and at least one additional preemption compared to Π^c_k.

Using the concept of set partitioning to represent the ways a single task may be preempted, we generate a set Πi,λ of preemption combinations describing how all tasks from τi to τ1 can interact among each other:

Definition 8.10. By Πi,λ we denote the result of Algorithm 3, i.e. the set of preemption combinations between the first i tasks of Γ such that each combination is consistent with λ.

We now describe Algorithm 3 in more detail, and we use the running example from Figure 8.2 to show the algorithm walk-through in Figure 8.3. As stated before, for each τk, from τi to τ1, the algorithm first generates the possible combinations of how τk can be preempted, using set partitioning (line 3). This process is defined in the function generateCombs (line 7) and it translates the set partitions of the possibly preempting tasks of τk into the different ways τk can be preempted, which is represented with a set of preemption combinations (line 16). Then, for each of those combinations, the algorithm performs extendCombs (line 4), which is a function that updates the existing preemption combinations with further preemption scenarios that are possible on the preempting tasks of τk.

Take for example the preemption combination Π^c given in Figure 8.3. The combination represents the case where τ4 is preempted at one preemption point, by all of its three possibly preempting tasks (τ1, τ2, τ3). In the figure, this is represented by one arrow (standing for one preempted point of τ4) and the tasks preempting that point (above the arrow). The function extendCombs() further computes the possible ways of preempting τ3, since τ3 is the lowest-priority preempting task in the preemption scenario (τ4, {τ1,τ2,τ3}). Those ways are represented with a set Π3 of preemption combinations.


Data: Set of possible preemption pairs Λ, task index i
Result: A set Πi,Λ of preemption combinations consistent with Λ
 1  Πi,Λ ←− ∅
 2  for k ← i to 2 by −1 do
 3      Πk,Λ ←− generateCombs(hp(k), τk)
 4      Πi,Λ ←− Πi,Λ ∪ extendCombs(Πk,Λ)
 5  end
 6  return Πi,Λ

7  Function generateCombs(PT, τk)
8      Πk,Λ ←− ∅
9      PTk,Λ ←− remove those tasks from PT that cannot preempt τk according to Λ
10     partitions(PT) ←− generate all possible partitions of the set PTk,Λ, representing ways a job of τk can be preempted.
11     for each partition ∈ partitions(PT)
12         Πck ←− ∅
13         for each subset ∈ partition
14             Πck ←− Πck ∪ {(τk, subset)}
15         Πk,Λ ←− Πk,Λ ∪ Πck
16     return Πk,Λ

17 Function extendCombs(Πq,Λ)
18     while ∃Πc ∈ Πq,Λ : ∃(τr, PT) ∈ Πc | extended?((τr, PT), Πc) = ⊥ do
19         τl ←− lowest-priority task in PT
20         Πl ←− generateCombs(PT \ τl, τl)
21         for each Πc ∈ Πq,Λ | (τr, PT) ∈ Πc
22             Πq,Λ ←− Πq,Λ ∪ (Πc × Πl)
23             Πq,Λ ←− Πq,Λ \ Πc
24     end
25     return Πq,Λ
26 Function extended?((τr, PT), Πc)
27     τl ←− lowest-priority task in PT
28     if ∃(τx, PT′) ∈ Πc | τx = τl then
29         return ⊤
30     else return ⊥
Algorithm 3: Algorithm that generates a set Πi,Λ of preemption combinations.

After this, the function updates the preemption combinations with a Cartesian product of the two. Therefore, on the right side of the figure, you may notice that we now have two new combinations, updating Πc with the different ways τ3 can be preempted. The topmost combination can be updated further, since the preemption scenario (τ3, {τ1,τ2}) can be extended with an additional scenario on how τ2 can be preempted by τ1. This is eventually computed within the algorithm, because the condition in line 18 ensures that all combinations are updated until no new preemption scenario can be added to any of the existing preemption combinations.


Data: Λ = {(τ1,τ2), (τ1,τ3), (τ2,τ3)}, i = 3
Algorithm run:
Π3,Λ ←− ∅
for k = 3
    Π3,Λ ←− generateCombs(hp(3), τ3)
         ←− {{(τ3, {τ1}), (τ3, {τ2})}, {(τ3, {τ1,τ2})}}
    Π3,Λ ←− ∅ ∪ extendCombs(Π3,Λ)
         ←− {{(τ3, {τ1}), (τ3, {τ2})}, {(τ3, {τ1,τ2}), (τ2, {τ1})}}
for k = 2
    Π2,Λ ←− generateCombs(hp(2), τ2)
         ←− {{(τ2, {τ1})}}
    Π3,Λ ←− {{(τ3, {τ1}), (τ3, {τ2})}, {(τ3, {τ1,τ2}), (τ2, {τ1})}} ∪ {{(τ2, {τ1})}}
return Π3,Λ


Figure 8.3. Top: Algorithm walk-through with the example from Figure 8.2. Bottom: Example of extending the combination Πc of four tasks.

More formally, this criterion is satisfied when, for each preemption scenario (τx, PT) within any preemption combination from Πi,λ, the function extended?((τx, PT)) yields true (⊤), meaning that all preemption scenarios are extended.
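For illustration, the core of Algorithm 3 (enumerating the set partitions of the possibly preempting tasks and turning each partition into a preemption combination) can be prototyped in a few lines of Python. The following is a minimal sketch, not the implementation used in the thesis; the task identifiers, the pair orientation in Λ, and the helper names are our own:

def set_partitions(items):
    # Enumerate all partitions of `items` (Bell-number many).
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing subset in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new subset of its own.
        yield [[first]] + partition

def generate_combs(pt, task, lam):
    # Lines 7-16 of Algorithm 3: each partition of the tasks that may
    # preempt `task` (according to the pairs in `lam`) yields one
    # preemption combination, i.e. a set of scenarios (task, subset).
    pt = [t for t in pt if (t, task) in lam]          # line 9
    return [[(task, frozenset(s)) for s in p] for p in set_partitions(pt)]

# Running example: tasks 1, 2 and 3 may all preempt task 4.
lam = {(1, 4), (2, 4), (3, 4)}
for comb in generate_combs([1, 2, 3], 4, lam):
    print(comb)   # 5 combinations (Bell(3) partitions of {1, 2, 3})

The extension step (extendCombs) and the maximisation of Equation 8.22 would then repeat this construction for the lowest-priority preempting task of every scenario, as in the walk-through above.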

Proposition 8.13. Πi,λ is a safe set of preemption combinations between single jobs of the first i tasks in Γ, i.e. there is no preemption combination consistent with λ with higher CRPD than the maximum CRPD of the combinations in Πi,λ.


Proof. By contradiction: Let us assume that there is a preemption combination Πco between the single jobs of the first i tasks, consistent with λ, which can result in a higher CRPD than any of the combinations in Πi,λ. For each task τk such that (1

Finally, we can compute an upper bound on the CRPD resulting from the worst-case preemption combination consisting of the preemptions in λ, with the following equation:

γi(λ) = max{ γ(Πc) : Πc ∈ Πi,λ }    (8.22)

Proposition 8.14. γi(λ) is an upper bound on the CRPD from the preemptions given in the partition λ, between the single jobs of the first i tasks from Γ.

Proof. Equation 8.22 computes the maximum upper bound over all preemption combinations accounted for by Πi,λ. Then, following from Propositions 8.11 and 8.13, the proposition holds.

Example: Given the set of possible preemptions λ = {(τ1,τ2), (τ1,τ3), (τ2,τ3)} from Figure 8.1, and continuing from the example after Proposition 8.11, the upper bound on the CRPD resulting from the preemptions in λ is computed as γ(λ) = max({8, 8}) = 8.


8.4 Evaluation

In this section, we show the evaluation results. The goal of the evaluation was to investigate to what extent the proposed method is able to identify schedulable tasksets upon the analysis of cache-related preemption delays. We compared the state-of-the-art CRPD analyses (ECB-Union Multiset, UCB-Union Multiset, and Combined Multiset) with two versions of the proposed method: Partitioning-ver1, which computes the CRPD according to Section 8.3.3, and Partitioning-ver2, which computes the CRPD from the worst-case preemption combination, as presented in Section 8.3.7.

As shown by Shah et al. [114], an evaluation of CRPD-aware methods should consider task parameters derived using existing low-level analysis tools. Therefore, in this chapter we use the suggested task parameters, derived with the LLVMTA analysis tool [2] applied to the Mälardalen [3] and TACLe [4] benchmark tasks. The derived task characteristics are shown in Table 8.2, and they are: the worst-case execution time, expressed in terms of wall-clock time; the set of evicting cache blocks; the set of definitely useful cache blocks (shown in the table as the sizes of the respective sets, ECB and DC-UCB); and the maximum number (Max DC-UCB) of definitely useful cache blocks at any program point of a task. The characteristics were derived assuming a direct-mapped instruction cache and a data scratchpad. The assumed cache memory consists of 256 sets with a line size of 8 bytes, while the block reload time is 22 cycles. For more details about the low-level analysis, refer to [114].

Tasksets are generated by randomly selecting a subset of tasks from one of the two benchmarks, Mälardalen or TACLe, as specified in each figure. We generated 1000 tasksets for each pair of selected utilisation and taskset size. Since the task binaries were analysed individually, they all start at the same address (mapping to cache set 0). In a multi-task scheduling situation this can hardly be the case, because the ECB and UCB placement is determined by the tasks' respective locations in memory. We took this into account by randomly shifting the cache set indices, e.g. the ECB in cache set i is shifted to the cache set equal to (i + random(256)) modulo 256. Task utilisations were generated using the UUniFast algorithm, as proposed by Bini et al. [31]. Minimum inter-arrival times were then computed using the equation Ti = Ci/Ui, while the deadlines are assumed to be implicit, i.e. Di = Ti. Task priorities were assigned in deadline-monotonic order.
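The generation procedure just described can be condensed into a short script. A minimal sketch, assuming each benchmark task is a dictionary holding a WCET "C" and the ECB/UCB cache-set indices from Table 8.2 (field and helper names are ours, not from the evaluation framework):

import random

def uunifast(n, u_total):
    # UUniFast (Bini et al. [31]): n task utilisations summing to u_total.
    utils, remaining = [], u_total
    for i in range(1, n):
        nxt = remaining * random.random() ** (1.0 / (n - i))
        utils.append(remaining - nxt)
        remaining = nxt
    utils.append(remaining)
    return utils

def make_taskset(bench, n, u_total, n_sets=256):
    # Pick n benchmark tasks, shift their cache placement, derive periods.
    tasks = [dict(t) for t in random.sample(bench, n)]
    for task, u in zip(tasks, uunifast(n, u_total)):
        shift = random.randrange(n_sets)              # random memory placement
        for key in ("ecbs", "ucbs"):
            task[key] = {(s + shift) % n_sets for s in task[key]}
        task["T"] = task["D"] = task["C"] / u         # Ti = Ci/Ui, Di = Ti
    return sorted(tasks, key=lambda t: t["D"])        # deadline-monotonic order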


Task (TACLe Bench.)         WCET           ECB  UCB  Max
app/lift                    13592762       250  125  23
app/powerwindow             55842069       256  120  25
kernel/binarysearch         2860           43   19   18
kernel/bsort                3332496        42   30   29
kernel/complex_update       8190           36   28   27
kernel/countnegative        260303         78   45   45
kernel/fft                  493123975      103  87   52
kernel/filterbank           38302875       164  151  66
kernel/fir2dim              86737          212  197  116
kernel/iir                  3307           41   32   31
kernel/insertsort           16148          50   35   28
kernel/jfdctint             9043           115  107  54
kernel/lms                  1758977        82   56   23
kernel/ludcmp               97908          173  137  44
kernel/matrix1              248058         48   43   42
kernel/md5                  367421931      256  149  72
kernel/minver               67700          254  173  46
kernel/pm                   141189221      256  247  45
kernel/prime                386343         80   54   41
kernel/sha                  28380272       253  185  31
kernel/st                   1763900        161  80   43
sequential/adpcm_dec        52530          233  145  59
sequential/adpcm_enc        58861          236  158  75
sequential/audiobeam        6434692        256  212  46
sequential/cjpeg_transupp   535718162      256  256  103
sequential/cjpeg_wrbmp      1610145        138  80   38
sequential/dijkstra         39781181581    151  80   46
sequential/epic             7423276281     256  256  107
sequential/g723_enc         22919200       256  154  81
sequential/gsm_dec          3744323        256  236  69
sequential/gsm_encode       2115350        256  256  118
sequential/h264_dec         24979237       256  166  29
sequential/huff_dec         9360435        254  144  44
sequential/mpeg2            130756234186   256  256  154
sequential/ndes             996427         253  167  39
sequential/petrinet         39951          256  92   2
sequential/ri._dec          1811372648     256  173  44
sequential/ri._enc          39467989       256  181  44
sequential/statemate        1949343        256  91   1
sequential/susan            2051176771     256  255  79

Task (Mälardalen Bench.)    WCET           ECB  UCB  Max
adpcm                       82492494       256  230  103
bs                          3052           43   23   20
bsort100                    3146185        57   40   30
cnt                         127558         123  58   44
compress                    1090099        247  150  63
cover                       74509          256  38   15
crc                         1376054        121  62   30
edn                         739866         256  222  123
expint                      2161270        117  47   29
fdct                        10258          126  113  62
fft1                        271733         222  154  63
fibcall                     8406           28   16   16
fir                         12413071       94   42   21
insertsort                  11291          29   16   15
janne_complex               33778          39   28   27
jfdctint                    21742          132  122  54
lcdnum                      6100           51   11   9
lms                         10178805       242  134  38
ludcmp                      116312         210  168  44
matmult                     1447379        85   51   31
minver                      67157          256  178  47
ndes                        1050163        253  176  38
ns                          126865         55   37   34
nsichneu                    201969         256  183  2
prime                       7782800        75   47   33
qsort.                      163089         142  83   39
qurt                        71655          130  40   26
select                      6306           159  73   55
sqrt                        22436          53   21   12
st                          3701746        192  95   52
statemate                   41579          256  105  1
ud                          355318         194  151  39

Table 8.2. Task characteristics obtained with the LLVMTA [2] analysis tool used on the Mälardalen [3] and TACLe [4] benchmark tasks.


[Figure 8.4 plots: for each benchmark (Mälardalen, top; TACLe, bottom), the left panel shows the percentage of schedulable tasksets against taskset utilisation, and the right panel shows the weighted measure against taskset size, for UCB-Union Multiset, ECB-Union Multiset, Combined Multiset, Partitioning-ver1, and Partitioning-ver2.]

Figure 8.4. Left: Schedulability ratio at different taskset utilisations. Right: Weighted measure at different taskset sizes.

In Figure 8.4, on the leftmost plots, we show the schedulability results of an experiment where we generated tasksets of size 9 from the Mälardalen tasks (top left) and the TACLe tasks (bottom left). For each generated taskset, its utilisation was varied from 0.5 to 1, in steps of 0.01. The results show that Partitioning-ver2 and Partitioning-ver1 identify the highest number of schedulable tasksets, up to 23% more for Partitioning-ver2 and 20% more for Partitioning-ver1, compared to Combined Multiset.

To increase the exhaustiveness of the evaluation and the respective results, for the rightmost plots in Figure 8.4 we used the weighted schedulability measure, as proposed by Bastoni et al. [68], in order to show as a 2-dimensional plot what would otherwise be a 3-dimensional plot. In those figures, we show the weighted schedulability measure Wy(|Γ|) for schedulability test y as a function of the taskset size |Γ|. For each taskset size (in the range from 3 to 10), this measure combines the data for all of the tasksets generated for each utilisation level from 0.85 to 1, with step of 0.1, since for utilisation levels from 0 to 0.85 all of the compared methods deem almost all tasksets to be schedulable.


[Figure 8.5 plots: the leftmost panel shows the worst-case measured analysis time (in seconds) against taskset size for Combined-Multiset, Partitioning-ver1, and Partitioning-ver2; the centre and rightmost panels show Venn diagrams of the tasksets deemed schedulable, for the TACLe benchmark (83312 by all three methods, 5568 additionally by both partitioning-based methods, 348 additionally by Partitioning-ver2 only) and for the Mälardalen benchmark (69324, 9087, and 1059, respectively).]

Figure 8.5. Leftmost: The worst-case measured analysis time per taskset, at different taskset sizes. Centre and rightmost: Venn diagrams [1] representing the schedulability result relations between the different methods, over 120000 analysed tasksets each, for the TACLe and Mälardalen benchmarks.

For each taskset size |Γ|, the schedulability measure Wy(|Γ|) is equal to

Wy(|Γ|) = Σ∀Γ (UΓ × By(Γ, |Γ|)) / Σ∀Γ UΓ,

where By(Γ, |Γ|) is the binary result (1 if schedulable, 0 otherwise) of schedulability test y for a taskset Γ of size |Γ|. Weighting the schedulability results by taskset utilisation means that a method which produces a higher weighted measure than the others is more likely to identify tasksets with higher utilisation as schedulable.

The results show that Partitioning-ver2 identifies more schedulable tasksets than the other methods for any given taskset size, immediately followed by Partitioning-ver1. Also, as the taskset size increases, the multiset-based methods deteriorate more in identifying schedulable tasksets compared to the proposed methods. This means that the partitioning-based methods are able to identify more tasksets as schedulable as the taskset size and utilisation increase.

Next, we report the worst-case computation time results, since Partitioning-ver2 uses set partitioning, which is a computation of quadratic to exponential complexity depending on the algorithm type. The results, reported in the leftmost plot in Figure 8.5, were computed on a MacBook Pro (Retina, 13-inch, Early 2015) with a 2.9 GHz Intel Core i5 processor and 8 GB of 1867 MHz DDR3 RAM. We used a sequential set partitioning algorithm, and as shown in the graph, in this case the exponential complexity is evident for Partitioning-ver2. However, the proposed method is intended to be used offline, and its performance can be improved using the algorithm from [118], and even more with parallel computing, e.g. the set partitioning algorithms proposed by Djokic et al. [117].

In contrast, Partitioning-ver1 has a low worst-case time, measured for each experiment at different taskset sizes, similar to the Combined-Multiset approach.

Finally, we show the relations between the results in Figure 8.5 (centre and rightmost figures). In the centre figure, it is evident that all tasksets from the TACLe benchmark that are identified as schedulable by Combined-Multiset are also identified as schedulable by Partitioning-ver1 and Partitioning-ver2. However, the partitioning-based approaches identify 5568 (and 9087) additional schedulable tasksets depending on the benchmark, while Partitioning-ver2 identifies an additional 348 (and 1059) schedulable tasksets compared to Partitioning-ver1. In conclusion of the evaluation, we notice that the proposed partitioning-based algorithms outperform the existing state-of-the-art Combined-Multiset approach. Also, Partitioning-ver2 outperforms Partitioning-ver1, however this comes at the expense of time complexity. The complexity of Partitioning-ver2 can be further decreased by narrowing down the task interactions for which the preemption combinations should be generated. This remains as future work, as does a formal proof of the dominance relations between the methods. Finally, the proposed approaches allow for a hybrid, joint use of the two proposed algorithms, while Partitioning-ver1 significantly outperforms the existing multiset approaches without the expense of time complexity.
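To make the weighted measure Wy(|Γ|) defined above concrete, here is a minimal sketch of its computation for one taskset size; `results` is a hypothetical list of (taskset utilisation, schedulable?) pairs collected over all generated tasksets of that size:

def weighted_measure(results):
    # W_y(|Γ|) = Σ_Γ (U_Γ · B_y(Γ, |Γ|)) / Σ_Γ U_Γ, with B_y ∈ {0, 1}.
    return (sum(u for u, schedulable in results if schedulable)
            / sum(u for u, _ in results))

# Three tasksets at utilisations 0.85, 0.90, 0.95; only the first two schedulable:
print(weighted_measure([(0.85, True), (0.90, True), (0.95, False)]))  # ≈ 0.648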

8.5 Summary and conclusions

In this chapter, we proposed a partitioning-based preemption-delay aware schedulability analysis for precise and safe estimation of cache-related preemption delays in the context of fully-preemptive scheduling of real-time systems with sporadic fixed-priority tasks. The proposed analyses account for: 1) different preemption subgroups, and 2) different preemption combinations that may occur within a system, and therefore they are able to compute more accurate cache-related preemption delay estimations compared to the state-of-the-art approaches. The evaluation was performed using realistic task parameters from well-established benchmarks, obtained with a low-level analysis tool, and it showed that the proposed approaches manage to identify significantly more schedulable tasksets compared to the other preemption-delay aware approaches.

Chapter 9

Preemption-delay aware partitioning in multi-core systems

In this chapter, we consider multi-core partitioned systems, and we propose a new joint approach for task–to–core allocation and preemption point selection, which is based on computation of the maximum blocking tolerance. This approach enables quantification of taskset schedulability per processor, and also reduction of the preemption delays that may be imposed upon task allocation. The chapter is divided into the following sections:
• Problem statement and motivation – a description of the problems in multi-core partitioned systems, and the motivation for a novel approach.
• System model and preemption point selection – extensions to the system model from Chapter 5, accounting for multiple cores and preemption point selection of sequential tasks.
• Simplified feasibility analysis for LPS-FPP – a description of the simplified feasibility analysis for preemption point selection.
• Partitioned scheduling under LPS-FPP – the proposed solution.
• Evaluation – a description of the experiments and their results.
• Summary and conclusions of the chapter.



Table 9.1. List of important symbols used in this chapter (CRPD terms are upper bounds).

Symbol      Brief explanation
Pi          Processor with index i
Γi          Set of tasks assigned to Pi
w           Number of processors in a system
Usys        System utilisation
Umin        Minimum utilisation of a single task
Umax        Maximum utilisation of a single task
MBT(Γx)     The maximum blocking tolerance of Γx
Γ           Set of all tasks before partitioning
τk          A task with index k (decreasing priority order, i.e. Pk > Pk+1)
Tk          The minimum inter-arrival time of τk
Dk          Relative deadline of τk
Ck          The worst-case execution time of τk without preemption delays
Pk          Priority of τk
Sk          Density of τk (Sk = Ck/Dk)
hp(τk)      Set of tasks with higher priority than τk that are allocated to the processor considered for allocation of τk
lp(τk)      Set of tasks with lower priority than τk that are allocated to the processor considered for allocation of τk
zk          Number of subjobs in τk before preemption point selection
lk          Number of NPRs in τk after preemption point selection
βk          Maximum blocking tolerance of the tasks with higher than or equal priority to τk
Qk          The maximum blocking tolerance of the tasks with higher priority than τk
SPPk        Set of preemption points in τk, enabled for runtime preemptions
SJk,y       The y-th subjob of τk
ck,y        The worst-case execution time of the y-th subjob of τk
δk,j        The j-th non-preemptive region of τk
qk,j        The worst-case execution time of the j-th NPR of τk
qkmax       WCET of the largest non-preemptive region of τk
PPk,y       The y-th preemption point of τk
ξi,k        Upper bound on the preemption delay resulting from preemptions at PPi,k
Π           Set of ordered pairs (Pi, Γi) representing the assignment of the subset of tasks Γi (Γi ⊆ Γ) to processor Pi


9.1 Problem statement and motivation

From the real-time perspective, multi-core architectures introduce an additional dimension to scheduling, as discussed by e.g. Schliecker et al. [119], since it is no longer only a question of when to execute a task, but also of on which core it should be executed. The latter problem is addressed by two main classes of approaches: global scheduling (where a task may be assigned to and executed by any of the available computational cores), and partitioned scheduling (where the tasks are a priori assigned to a single specified core on which they may be executed). Compared to global scheduling, partitioned scheduling offers better predictability of the guarantees on the temporal system properties. The schedulability of a taskset in the partitioned multi-core domain depends on the exact allocation of the tasks to the specified cores.

Although it has been proved that the LPS-FPP approach increases the schedulability of a taskset in the single-core case, the schedulability of a taskset in the multi-core domain also depends on the task–to–core allocation. Therefore, to improve taskset schedulability in fixed-priority multi-core systems, there is a need to consider the joint schedulability impact coming from 1) task partitioning, 2) determination of the non-preemptive regions within the tasks, and 3) the consequential preemption-related delays due to the previous two. In this chapter, we integrate task partitioning and the LPS-FPP approach on multi-core systems to gain the benefits of both concepts concerning taskset schedulability.

9.2 System model and preemption point selection

In this chapter, we consider a real-time system consisting of w identical processors P1, P2, ..., Pw and a set Γ of n sporadic independent tasks τ1, τ2, ..., τn with constrained deadlines, under the fixed-priority LPS-FPP paradigm. We assume that the tasks are ordered in decreasing priority order. Each task τk is defined by a tuple {Ck, Dk, Tk, Sk}, where Ck denotes the task's worst-case execution time (WCET) without preemption-related delays. The relative deadline of τk is denoted by Dk, and the minimum inter-arrival time between two consecutive instances of τk is denoted by Tk. The density of each τk is calculated as Sk = Ck/Dk and represents the peak processing demand of a single job of the task.

We consider a sequential task model under the LPS-FPP paradigm, in which each task τk initially consists of zk subjobs.


The y-th subjob of τk is denoted by SJk,y, with WCET ck,y. Subjobs are separated by zk − 1 potential preemption points, where the y-th preemption point is denoted by PPk,y, with a worst-case preemption-related delay ξk,y that can be incurred when τk is preempted at PPk,y, assuming the worst possible cache interference from the preemption.
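As a compact summary of this task model, the per-task data can be captured as follows; a minimal sketch with our own field names (by construction, len(xi) == len(c) − 1):

from dataclasses import dataclass, field

@dataclass
class Task:
    C: float                                  # WCET C_k without preemption-related delays
    D: float                                  # relative deadline D_k
    T: float                                  # minimum inter-arrival time T_k
    c: list = field(default_factory=list)     # subjob WCETs c_k,1 ... c_k,zk
    xi: list = field(default_factory=list)    # CRPD bounds ξ_k,1 ... ξ_k,zk−1

    @property
    def S(self):                              # density S_k = C_k / D_k
        return self.C / self.D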

9.2.1 Preemption point selection

Once we know the potential preemption points for each task, we may select a subset of those to improve the schedulability of the system. Several preemption point selection algorithms have been proposed in the state of the art, e.g. [62, 63, 64]. For each task τi of a taskset, these algorithms first compute the maximum blocking tolerance of the tasks with higher than or equal priority to τi, and try to select preemption points such that the maximum length of any created non-preemptive region is not greater than the calculated tolerance. The maximum blocking tolerance represents the longest time interval for which a task can be blocked before it may miss a deadline.

For example, in Figure 9.1 we show the maximum blocking tolerance of τh, which is also the maximum possible length of a non-preemptive region for any task with lower priority than τh. We notice that the maximum blocking tolerance of τh is smaller than the WCET of τi; therefore, if we want to guarantee that τh meets its deadline, we need to split τi into at least two actual non-preemptive regions. We also need to consider the worst-case preemption delay that the preemption may cause.


Figure 9.1. Correlation between the maximum blocking tolerance and the maximum length of the non-preemptive region.

Assuming that τi has two potential preemption points (as shown in Figure 9.2), we can select none, only one of them, or both.


Selecting none of the points would result in a deadline miss of τh, because the largest non-preemptive region of τi is larger than the maximum blocking tolerance of τh, as shown in the examples in Figure 2.6 and Figure 9.1. Selecting both points would result in a deadline miss of τi, since the introduced preemption delay is large enough to jeopardise its schedulability. We show this case in Figure 9.2.


Figure 9.2. Example when all the potential preemption points are selected.

However, by selecting only one point, e.g. PPi,1, we reduce the maximum length of the non-preemptive regions of τi, which is then smaller than the maximum blocking tolerance of τh, as shown in Figure 9.3. This means that τh meets its deadline, because τi may only be preempted at PPi,1. Considering the introduced preemption delay, the schedulability of τi is not jeopardised, since we account for the worst-case preemption delay only for PPi,1.


Figure 9.3. Example of the LPS-FPP benefit.


9.2.2 Task model after preemption point selection

As a result of preemption point selection, each task τk consists of lk non-preemptive regions and therefore lk − 1 selected preemption points. The j-th non-preemptive region is denoted by δk,j, with WCET qk,j, and it includes the preemption-related delay that can be caused by preempting the selected preemption point which precedes the non-preemptive region (see Figure 9.4). The exception is the first non-preemptive region, for which we assume a preemption-related cost of zero.


Figure 9.4. Top: τk before the preemption point selection. Bottom: τk after the preemption point selection, where only the second preemption point is selected.

In Figure 9.4, we show the difference between the estimated WCET of a task before and after the preemption point selection. The top of the figure shows the task with two potential preemption points (PPk,1 and PPk,2) and their estimated CRPDs (ξk,1 and ξk,2, coloured black in the figure). The two potential preemption points define three task subjobs with WCET values ck,1, ck,2 and ck,3. At the bottom of the figure, we show the task τk after the preemption point selection, assuming that only PPk,2 is selected. Thus, we account only for the CRPD from PPk,2 and consequently reduce the WCET of τk by defining two non-preemptive regions with WCETs qk,1 = ck,1 + ck,2 and qk,2 = ξk,2 + ck,3. In the following subsection, we summarise the feasibility analysis for the above-defined task model.
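The WCET bookkeeping illustrated in Figure 9.4 amounts to summing the subjob WCETs between consecutive selected points and charging each selected point its CRPD. A minimal sketch, reusing the Task fields assumed earlier (`selected` holds the 1-based indices of the enabled points):

def npr_wcets(task, selected):
    # WCETs q_k,j of the NPRs obtained by enabling only the points in
    # `selected`; each enabled point PP_k,y charges its CRPD ξ_k,y to
    # the non-preemptive region that follows it.
    regions, q = [], 0.0
    for y, c in enumerate(task.c, start=1):
        q += c
        if y in selected:          # PP_k,y closes the current NPR
            regions.append(q)
            q = task.xi[y - 1]     # the next NPR starts with ξ_k,y
    regions.append(q)
    return regions

# Figure 9.4, with only PP_k,2 selected:
# npr_wcets(task, {2}) == [c_k,1 + c_k,2, ξ_k,2 + c_k,3]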


9.3 Simplified feasibility analysis for LPS-FPP

In this section, we describe the feasibility analysis for the preemption point selection algorithms, proposed by Bertogna et al. [120]. This analysis examines whether the non-preemptive regions may jeopardise the schedulability of the higher-priority tasks, whose start times may be delayed due to the non-preemptive execution of the lower-priority tasks. Therefore, for each task τk (τk ∈ Γ ∧ |Γ| = n), the analysis considers the relation between the following parameters:

• qkmax – the WCET of the largest non-preemptive region of τk. It also represents the maximum possible blocking that τk may cause on the tasks with higher priority than τk. Formally:

qkmax = max{qk,j : j = 1, ..., lk}  if k ≤ n,  and  qkmax = 0  if k = n + 1    (9.1)

• βk – the maximum blocking tolerance of the tasks with higher than or equal priority to τk. It is also the maximum allowed length of the non-preemptive regions in tasks with lower priority than τk, whose blocking may jeopardise the schedulability of the higher-priority tasks:

βk = max{ a − Σh≤k ⌈a/Th⌉ · Ch : a ∈ A, a ≤ Dk }    (9.2)

where A = {vTh | v ∈ ℕ, 1 ≤ h ≤ k}.

Thus, based on these two factors, Bertogna et al. proved the following theorem:

Theorem 9.1 (from [120]). A taskset Γ is schedulable with limited preemptive scheduling if:

∀k, 1 < k ≤ n + 1 :  qkmax ≤ βk−1

The theorem states that the taskset is schedulable under LPS if the WCET of the largest non-preemptive region of each task τk is not greater than the maximum blocking tolerance of the tasks with higher priority than τk. Notice that for the last task τn in a taskset, we need to guarantee that βn > 0 for τn to be schedulable. This condition is taken into account by the index n + 1.
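Equation 9.2 ranges over a finite set of candidate points, so βk can be evaluated directly. A minimal sketch, under our reading of the definition of A (the multiples vTh for 1 ≤ h ≤ k; including a = Dk as a candidate is an assumption on our part):

import math

def beta(tasks, k):
    # Blocking tolerance β_k (Equation 9.2) for the 1-indexed task k;
    # `tasks` is priority-ordered, with fields C, D, T.
    Dk = tasks[k - 1].D
    A = {v * tasks[h].T
         for h in range(k)
         for v in range(1, int(Dk // tasks[h].T) + 1)}
    A.add(Dk)                     # assumed candidate point (see above)
    return max(a - sum(math.ceil(a / tasks[h].T) * tasks[h].C
                       for h in range(k))
               for a in A if a <= Dk)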


9.4 Partitioned scheduling under LPS-FPP

Task partitioning was proved to be NP-hard in the strong sense by a transformation from the bin-packing problem, as shown by David Johnson [69]. In this chapter, we use two widely proposed near-optimal bin-packing heuristics to partition the tasks. Following the methodology proposed for the bin-packing heuristics, we assume that the original set of tasks is already sorted by some criterion before the partitioning. Afterwards, we assign the tasks to the processors in the given order, one by one, and upon each assignment we evaluate the partitioning test defined by the partitioning algorithm. A task is assigned to a specified processor only if the partitioning test is satisfied. In this section, we first describe Algorithm 4, which represents the schedulability and partitioning test for a set of tasks assigned to a single processor. The partitioning test is based on the maximum blocking tolerance of the taskset on the specified processor, since the higher the maximum blocking tolerance is, the greater the possibility that the next task assignment to this processor still results in a schedulable taskset. Afterwards, we describe the partitioning algorithms, which are based on the First Fit Decreasing (FFD) and Worst Fit Decreasing (WFD) bin-packing heuristics; a sketch of the shared assignment loop is given below. We select these two heuristics since Brandenburg et al. [121] have shown that both are very effective for the partitioning problem in real-time systems.
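The two heuristics differ only in the order in which the candidate processors are tried. A minimal sketch of the assignment loop, assuming a partitioning test fits() such as the one provided by Algorithm 4 below (all names are ours):

def partition(tasks, w, fits, worst_fit=False):
    # First-Fit (worst_fit=False) or Worst-Fit (worst_fit=True) assignment;
    # `tasks` is assumed pre-sorted, e.g. by decreasing density ("Decreasing").
    cores = [[] for _ in range(w)]
    for task in tasks:
        candidates = (sorted(cores, key=lambda c: sum(t.C / t.T for t in c))
                      if worst_fit else cores)   # WFD tries the least-loaded core first
        for core in candidates:
            if fits(core + [task]):              # e.g. MBT(core + [task]) >= 0
                core.append(task)
                break
        else:
            return None                          # no feasible assignment found
    return cores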

9.4.1 Partitioning test

Before the assignment of a task τk to the specified processor Pi, with a set Γi of the tasks already assigned to Pi, we need to consider the following problem based on the desired assignment strategy:

Assuming that we assign τk to the processor, is it possible to select the preemption points for all tasks in Γi ∪ {τk} in a way that does not jeopardise the taskset schedulability?

Therefore, we propose an algorithm which uses the maximum blocking tolerance of the tasks in a taskset as the main criterion for partitioning. The main reason is that the higher the maximum blocking tolerance is, the greater the possibility that the next task assignment to this processor still results in a schedulable taskset. This is because the computation of the maximum blocking tolerance of a taskset identifies the task whose schedulability is closest to being jeopardised, and quantifies the peak processing demand which would be able to


Data: Taskset Γx of size n, with potential preemption points, sorted in a decreasing priority order
Result: Maximum blocking tolerance Qn+1 of a taskset, or −1 if the taskset is not schedulable

1   Function MBT(Γx)
2       for (k ← 2 to n) do
3           Qk ← min_{1 ≤ i < k} βi
4           SPPk ← select preemption points in τk using
5                   a preemption point selection algorithm
6           if (qk^max > Qk) then
7               return −1
8           end
9           Ck ← Σ_{l=1..lk} qk,l + Σ_{PPk,z ∈ SPPk} ξk,z
10      end
11      Qn+1 ← min_{1 ≤ i ≤ n} βi
12      return Qn+1

Algorithm 4: Derivation of the Maximum Blocking Tolerance of a taskset.

jeopardise this task. This is the main difference between the proposed algorithm and the feasibility analysis proposed by Bertogna et al. [120]: the feasibility analysis is a binary schedulability test, unlike the proposed algorithm, whose maximum blocking tolerance result may be used for the further heuristics proposed in the remainder of the chapter. Furthermore, the algorithm integrates the preemption point selection and also validates the taskset schedulability (see Algorithm 4).

Brief explanation: The algorithm aims to compute the preemption point selection for a given set of tasks. If for any of the tasks the resulting selection fails to satisfy the condition from Theorem 9.1, the algorithm returns −1, meaning that the taskset is unschedulable. Any other value returned by the algorithm indicates that the taskset is schedulable, and it represents its maximum blocking tolerance.


The algorithm (see Algorithm 4) starts from the second highest priority task in the provided taskset Γx (line 2), since the highest priority task cannot be preempted. The algorithm first computes the maximum allowed length of the non-preemptive region, i.e., Qk, considering the blocking tolerance of the higher priority tasks (line 3). Then, based on the maximum allowed NPR length, the algorithm selects the preemption points with a predefined selection algorithm (line 4). If the preemption point selection cannot guarantee that every NPR defined upon selection is less than or equal to the maximum allowed NPR length, i.e., ∃ qk,j | qk,j > Qk, the algorithm returns −1 (line 7), meaning that the taskset is unschedulable. Otherwise, the algorithm updates the worst-case execution time of the k-th task, accounting for the CRPD at the selected preemption points (line 9). These steps are repeated until the algorithm finds an infeasible selection or until it selects the preemption points for all tasks. One more step to be done upon the preemption point selection of all tasks is to check whether the maximum allowed NPR length of the last task is the one that is minimal for the entire taskset (line 11). This is important because the accounted preemption costs of the higher priority tasks may jeopardise the schedulability of the lowest priority task, which is identified by a negative Qn+1 value returned by the algorithm. Finally, the algorithm returns the maximum allowed NPR length defined for the taskset Γx (line 12). As in the previous case, a negative Qn+1 value means that the taskset is unschedulable. Otherwise, the algorithm quantifies the maximum blocking tolerance left for the tasks in the taskset.
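To make the control flow concrete, the following is a minimal Python sketch of Algorithm 4. The task record fields and helper names are hypothetical stand-ins for the notation above; in particular, the blocking tolerances βi are assumed precomputed and are not refreshed after each WCET update, which the full analysis would require.

# A minimal sketch of Algorithm 4 (MBT), assuming hypothetical task records
# with fields "beta" (blocking tolerance), "blocks" (basic block lengths
# q_{k,l}), "point_costs" (CRPD xi_{k,z} of each potential preemption point),
# and "wcet".

def nprs_after_selection(blocks, selected):
    """Lengths of the non-preemptive regions that remain when only the
    preemption points whose indices are in `selected` are kept."""
    nprs, current = [], 0.0
    for i, q in enumerate(blocks):
        current += q
        if i in selected:          # a kept point closes the current NPR
            nprs.append(current)
            current = 0.0
    if current:
        nprs.append(current)       # trailing NPR after the last kept point
    return nprs

def mbt(tasks, select_preemption_points):
    """Maximum blocking tolerance Q_{n+1} of `tasks` (sorted by decreasing
    priority); a negative result means the taskset is unschedulable."""
    for k in range(1, len(tasks)):                            # line 2
        task = tasks[k]
        q_allowed = min(t["beta"] for t in tasks[:k])         # line 3: Q_k
        selected = select_preemption_points(task, q_allowed)  # lines 4-5
        if max(nprs_after_selection(task["blocks"], selected)) > q_allowed:  # line 6
            return -1                                         # line 7
        # line 9: the WCET now includes the CRPD of the selected points
        task["wcet"] = (sum(task["blocks"])
                        + sum(task["point_costs"][z] for z in selected))
    return min(t["beta"] for t in tasks)                      # lines 11-12: Q_{n+1}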

Computational complexity

The complexity of Algorithm 4 depends on the provided preemption point selection algorithm. The preemption point selection algorithm proposed by Bertogna et al. [62] is optimal and linear with respect to the number of preemption points. Assuming that we use Bertogna's selection and that the maximum number of potential preemption points in any task is L, the complexity of the proposed Algorithm 4 is O(n × L), since the selection algorithm is O(L) and Equation 9.2 can be evaluated in constant time. Moreover, Qk can be computed in constant time in line 3 using the value of Qk−1 from the previous iteration.


9.4.2 Task partitioning

In this section, we describe the task partitioning algorithms considering LPS-FPP scheduling. In the remainder of the section we use the following notation:

• Next task to be assigned: τk

• The processor under consideration: Pi

• The set of tasks already assigned to the i-th processor: Γi

• Maximum blocking tolerance of a set of tasks assigned to the i-th processor: MBT(Γi)

First Fit Based Algorithm (FF)

The first proposed algorithm is defined with respect to the FF bin-packing heuristics. Using the FF heuristics, we assign τk to the lowest indexed processor that satisfies the following condition:

MBT(Γi ∪ {τk}) > 0    (9.4)

This condition represents the case when τk can be assigned to Pi without jeopardising the schedulability of any task upon the assignment. In more detail, the proposed algorithm (see Algorithm 5) assumes that the initial taskset Γ is provided and sorted in the specified order (e.g., priority decreasing, density decreasing, etc.). As the result, the algorithm returns the set Π of ordered pairs (Pi, Γi), which represents the pairing between the specified processor Pi and the set of tasks Γi which are finally assigned to Pi.

Worst Fit Based Algorithm (WF)

Instead of assigning τk to the lowest indexed processor that satisfies the condition, as performed with the FF based algorithm, with the WF heuristics we assign τk to the processor such that the maximum blocking tolerance upon the assignment is maximised:

max_{1 ≤ i ≤ w} { MBT(Γi ∪ {τk}) }    (9.5)

In more detail, the proposed algorithm (see Algorithm 6) assumes that the taskset is already sorted according to some criterion, and returns Π as the result.


The algorithm iterates over the tasks in a given order and first initialises the processor pointer a to zero, prior to each assignment attempt of τk (line 4). Next, it attempts to assign τk by computing the maximum blocking tolerance upon the assignment of τk to each processor (lines 5–11). If there exists a processor for which the blocking tolerance upon the assignment is greater than zero, the algorithm sets the pointer a to the index of the processor that maximises it (lines 8–9) and assigns τk to Pa (line 15). However, if no such processor exists, the value of the processor pointer remains at its initial value (zero), which means that the taskset is deemed unschedulable by the algorithm (lines 12–14). If all tasks are successfully assigned to the appropriate processors, the algorithm returns the set of ordered pairs Π.

Data: Taskset {τ1, τ2, ..., τn} sorted in a specified order, set of processors {P1, P2, ..., Pw}
Result: Set Π of ordered pairs (Pi, Γi) representing the assignment of the subset of tasks Γi to a processor Pi

1   Function FFD_partitioning({τ1, ..., τn}, {P1, ..., Pw})
2       ∀i ∈ [1, w] : Γi ← ∅
3       for (k ← 1 to n) do
4           a ← 0
5           for (i ← 1 to w) do
6               if (MBT(Γi ∪ {τk}) > 0 ∧ a = 0) then
7                   Γi ← Γi ∪ {τk}
8                   a ← i
9               end
10          end
11          if (a = 0) then
12              return unschedulable
13          end
14      end
15      Π ← {(Pi, Γi) | 1 ≤ i ≤ w}
16      return Π

Algorithm 5: FFD partitioning
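The following is a minimal Python sketch of Algorithm 5, assuming `mbt` has been specialised to a one-argument closure over the chosen preemption point selection algorithm (e.g., via functools.partial on the earlier sketch); `tasks` is assumed pre-sorted and `w` is the number of processors.

# A minimal sketch of the FFD partitioning (Algorithm 5).

def ffd_partitioning(tasks, w, mbt):
    partitions = [[] for _ in range(w)]             # line 2: each Gamma_i empty
    for task in tasks:                              # line 3
        assigned = False
        for gamma in partitions:                    # line 5: lowest index first
            if mbt(gamma + [task]) > 0:             # line 6: partitioning test
                gamma.append(task)                  # line 7: first fit found
                assigned = True
                break                               # plays the role of the a = 0 guard
        if not assigned:                            # line 11
            return None                             # line 12: unschedulable
    return list(enumerate(partitions, start=1))     # lines 15-16: (P_i, Gamma_i) pairs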


Data: Taskset {τ1, τ2, ..., τn} sorted in a specified order, set of processors {P1, P2, ..., Pw}
Result: Set Π of ordered pairs (Pi, Γi) representing the assignment of the subset of tasks Γi to a processor Pi

1   Function WFD_partitioning({τ1, ..., τn}, {P1, ..., Pw})
2       ∀i ∈ [1, w] : Γi ← ∅
3       for (k ← 1 to n) do
4           a ← 0
5           b ← 0
6           for (i ← 1 to w) do
7               if (b < MBT(Γi ∪ {τk})) then
8                   b ← MBT(Γi ∪ {τk})
9                   a ← i
10              end
11          end
12          if (a = 0) then
13              return unschedulable
14          end
15          Γa ← Γa ∪ {τk}
16      end
17      Π ← {(Pi, Γi) | 1 ≤ i ≤ w}
18      return Π

Algorithm 6: WFD partitioning
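A minimal Python sketch of Algorithm 6, with the same one-argument `mbt` closure assumed as before: the task goes to the processor that maximises the blocking tolerance left after the assignment.

# A minimal sketch of the WFD partitioning (Algorithm 6).

def wfd_partitioning(tasks, w, mbt):
    partitions = [[] for _ in range(w)]                  # line 2
    for task in tasks:                                   # line 3
        best_i, best_mbt = None, 0                       # lines 4-5: a and b
        for i, gamma in enumerate(partitions):           # line 6
            tolerance = mbt(gamma + [task])
            if tolerance > best_mbt:                     # line 7: strictly better fit
                best_i, best_mbt = i, tolerance          # lines 8-9
        if best_i is None:                               # line 12: no positive tolerance
            return None                                  # line 13: unschedulable
        partitions[best_i].append(task)                  # line 15: assign to P_a
    return list(enumerate(partitions, start=1))          # lines 17-18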

Taskset Ordering

There are two main task ordering strategies which are widely used in Fixed Priority partitioning: priority decreasing and density decreasing order. The main reasoning comes from the fact that high priority and high-density tasks are the ones which mostly affect the schedulability of a taskset in general, so it is important to assign them in the earlier partitioning stages. Therefore, we define and analyse four possible combinations of the bin-packing heuristics and the taskset orderings:


1. FFDp – FFD based algorithm with the input taskset sorted in a priority decreasing ordering.

2. FFDd – FFD based algorithm with the input taskset sorted in a density decreasing ordering.

3. WFDp – WFD based algorithm with the input taskset sorted in a priority decreasing ordering.

4. WFDd – WFD based algorithm with the input taskset sorted in a density decreasing ordering.

Both ordering strategies introduce benefits and drawbacks when applied with the proposed algorithms; these are briefly discussed in the remainder of the section, and the two orderings are sketched below.
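Hypothetical ordering helpers for the four variants above, assuming task records with "priority" (smaller value meaning higher priority), "wcet", "deadline", and "period" fields; density is taken here as Ci/min(Di, Ti), which equals the utilisation for the implicit-deadline model evaluated later.

# Illustrative pre-sorting keys; the field names are assumptions.

def priority_decreasing(tasks):
    return sorted(tasks, key=lambda t: t["priority"])

def density_decreasing(tasks):
    return sorted(tasks,
                  key=lambda t: t["wcet"] / min(t["deadline"], t["period"]),
                  reverse=True)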

Example: The preemption related delay may increase the peak processing demand of the tasks that are already assigned to the specified processor. In Figure 9.5, we show the utilisations of the tasks assigned to the processor considering different assignment cases. On the left side of the figure, we show the utilisations prior to the assignment of the task τk, and on the right side we show the utilisations computed after the assignment, considering two different cases:

a) τk is a task with a higher priority than τa and τb, denoted with lp(τk) ⊇ {τa, τb}, and

b) τk is a task with a lower priority than τa and τb, denoted with hp(τk) ⊇ {τa, τb}.

Considering the first scenario (case a from Figure 9.5), τk may increase the preemption related delay of the tasks that are already assigned to the specified processor, since we need to account for the possible preemption related delays, POa and POb, caused by preemptions of τk on τa and τb. Therefore, in this case the preemption point selection needs to be computed for τa and τb, i.e., the preemption point selection needs to be recomputed upon each new task assignment.



Figure 9.5. Comparison of the influence of different task ordering strategies on the preemption related delay introduced upon a task assignment.

Priority decreasing ordering reduces the complexity of computing the preemption related delay which needs to be accounted for upon each new task assignment. Assuming that we use the priority decreasing assignment order, the tasks that are already assigned to the specified processor Pi cannot be preempted by τk upon its assignment. Therefore, their worst-case execution times remain the same as before the assignment of τk (case b from Figure 9.5). Thus, we account only for the preemption related overhead POk of τk, considering the possible preemptions from the higher priority tasks τa and τb. Also, the preemption point selection needs to be computed only for the task to be assigned.

Density decreasing ordering improves the starting point for the blocking tolerance conditions that are defined in the proposed algorithms. This is because density represents the peak processing demand of a single job of a task, which is also the blocking tolerance of a task without consideration of the possible higher priority interference. By assigning those tasks first, the partitioning algorithm may achieve schedulability more often in cases when the density and priority orderings are proportional. However, since the density decreasing taskset ordering may be opposite to the priority decreasing ordering, in the worst case the preemption point selection needs to be computed for all tasks upon each new assignment. This may increase the complexity of the density-based partitioning strategy, which is discussed in the following subsection.


Computational Complexity

Each partitioning starts with sorting the taskset in a given order, whose complexity is Θ(n log n). Considering the partitioning algorithms, assuming that we use Bertogna's selection [62] in Algorithm 4, the complexity of all of them (FFDp, WFDp, FFDd, and WFDd) is O(n² × w × L), where w is the number of available processors. However, for the priority decreasing order, the complexity may be reduced, since all Qk and qk values computed in Algorithm 4 remain unchanged when the algorithm is called again, so only the k-th iteration of the loop is needed to compute the values of the newly added task τk. In this case, the complexity of FFDp and WFDp is O(n × w × L).

9.5 Evaluation

In this section, we show and discuss the evaluation results for the proposed task partitioning algorithms and task sorting combinations. In the first experiment (shown in Section 9.5.1), we compare the partitioning strategies under the LPS-FPP approach. Next, we compare the proposed LPS-FPP partitioning approach with the fully-preemptive and non-preemptive partitioning approaches (shown in Section 9.5.2). Finally, in the third experiment, we investigate how the maximum CRPD affects the partitioning effectiveness of the presented partitioning approaches (shown in Section 9.5.3).

9.5.1 A comparison of LPS-FPP partitioning strategies

The goal of this experiment was to investigate and compare the partitioning effectiveness of the FFDp, FFDd, WFDp, and WFDd versions of the proposed partitioning. Since task partitioning is an NP-hard problem and the proposed algorithms differ in partitioning strategies, we also investigated the complementarity of the proposed algorithms by introducing another evaluation category, Combined, for which we report a partitioning success whenever any of the proposed strategies succeeds to partition a taskset. To report the partitioning effectiveness we use the schedulability success ratio, defined as the proportion:

(the number of successfully partitioned tasksets) / (the total number of tasksets)


Evaluation setup

We generate a number of tasksets considering different system utilisations. In the experiment we assume the implicit-deadline task model (Di = Ti) and we use the experiment setup proposed by Kato et al. [73], such that each simulation is characterised with the following parameters:

• Usys – System utilisation, Usys = Utot/m.

• Umin – Minimum utilisation of a single task.

• Umax – Maximum utilisation of a single task.

This setup aims to investigate the effectiveness of the partitioning approaches considering tasksets with different combinations of light and heavy density tasks, since it has been shown by Kato et al. [73] that those distributions significantly affect the partitioning effectiveness. We used the deadline-monotonic priority assignment, with an arbitrary priority assignment if the deadlines are equal. For each system utilisation, set every 2% within the range [0.5, 1], we generate 3000 tasksets. Furthermore, we define three different evaluation sets based on the number of processors available for partitioning: m = 4, m = 8, and m = 16. Therefore, for each taskset we generated a number of task utilisations such that a single task utilisation Ui is randomly generated within the range [Umin, Umax], with uniform distribution, until the total utilisation Utot is close to m × Usys. The utilisation of the last generated task is adjusted to get the correct system utilisation. We used four different sets for the single task utilisation ranges: [0.1, 0.3], [0.1, 0.5], [0.1, 1], and [0.25, 0.75]. The execution parameters of the tasks are generated considering the LPS-FPP task setup proposed by Bertogna et al. [62], which is based on realistic tasks. Each task consists of a number of subjobs randomly generated within the range [20, 200], with a uniform distribution. The WCET of each subjob is generated from the normal Gaussian distribution with mean μ = 4000 and standard deviation δ = 3000. The worst-case preemption related delay of each preemption point is generated with the distribution equation proposed by Bertogna et al. [62]. The task period and the relative deadline of each task are calculated by Ti = Di = Ci/Ui.
Throughout the evaluation, in the partitioning algorithms, we used the preemption point selection algorithm proposed by Bertogna et al. [62]. Finally, in the experiment we report only the results for system utilisation in the range [0.6, 1], since the majority of the algorithms successfully partition tasksets with Usys < 0.6.
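A hedged Python sketch of this taskset generation follows; the truncation of non-positive Gaussian draws and the record layout are assumptions for illustration, not part of the cited setups.

# Sketch of the taskset generation described above.
import random

def generate_taskset(m, u_sys, u_min, u_max):
    """Draw task utilisations uniformly from [u_min, u_max] until the total
    reaches m * u_sys, adjust the last one, then instantiate each task."""
    target, utils = m * u_sys, []
    while sum(utils) < target:
        utils.append(random.uniform(u_min, u_max))
    utils[-1] -= sum(utils) - target            # adjust the last utilisation
    tasks = []
    for u in utils:
        n_subjobs = random.randint(20, 200)     # number of subjobs per task
        # subjob WCETs ~ N(4000, 3000), truncated to stay positive
        blocks = [max(1.0, random.gauss(4000, 3000)) for _ in range(n_subjobs)]
        wcet = sum(blocks)
        period = wcet / u                       # T_i = D_i = C_i / U_i
        tasks.append({"blocks": blocks, "wcet": wcet,
                      "period": period, "deadline": period, "util": u})
    return tasks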


Evaluation results

In the first evaluation, shown in Figure 9.6, we investigated the schedulability success of the proposed algorithms considering tasksets consisting of light tasks, i.e., a single task utilisation cannot exceed 0.3. Therefore, the number of generated tasks is higher compared to the other experiments. In Figure 9.6, considering 4 processors, we notice that no partitioning strategy dominates the others. However, the combination of all four strategies achieves considerable success ratios even for the higher utilisations, e.g., for m = 4 and Usys = 0.86, FFDd succeeds in partitioning 76% of the tasksets, whereas FFDp partitions 70%, WFDd 63%, and WFDp 45% of the attempted tasksets. The percentage of successfully partitioned tasksets when combining the four strategies, shown as Combined in the figures, is 95%. The trend is evident even for the increased number of processors and different system utilisations. This is the case because the WFD based algorithms complement the FFD based algorithms by evenly distributing the light tasks among the processors. Also, as shown in Figure 9.6 (m = 16), the priority-based partitioning FFDp often dominates in cases when the taskset consists of a high number of tasks, which may introduce a significant CRPD upon each new task allocation if the priority decreasing ordering is not used.

In the second evaluation, shown in Figure 9.7, no task utilisation exceeds 0.5. We notice that FFDd dominates the other approaches since it achieves considerable success ratios even for the higher utilisations, e.g., for m = 8 and Usys = 0.9, FFDd succeeds in partitioning 51% of the tasksets, whereas FFDp partitions 34%, WFDd 23%, and WFDp 5% of the attempted tasksets. The percentage when combining the four strategies, shown as Combined in the figures, is 95%. Unlike in the previous evaluation, although the priority-based algorithms contribute to the combined results, FFDd achieves a greater schedulability success ratio. This is because FFDd allocates the high-density tasks on the first allocatable processors, thus the tasks that may hugely impact the schedulability are allocated on the same processors. This leaves the other processors to be allocated more low-density tasks, since they do not impact schedulability as much as the high-density tasks do.


Figure 9.6. Schedulability success ratio as a function of system utilisation (Umin = 0.1 and Umax = 0.3); panels for m = 4, 8, and 16; curves: FFDp, WFDp, FFDd, WFDd, Combined.

Figure 9.7. Schedulability success ratio as a function of system utilisation (Umin = 0.1 and Umax = 0.5); panels for m = 4, 8, and 16; curves: FFDp, WFDp, FFDd, WFDd, Combined.

Figure 9.8. Schedulability success ratio as a function of system utilisation (Umin = 0.1 and Umax = 1); panels for m = 4, 8, and 16; curves: FFDp, WFDp, FFDd, WFDd, Combined.

Figure 9.9. Schedulability success ratio as a function of system utilisation (Umin = 0.25 and Umax = 0.75); panels for m = 4, 8, and 16; curves: FFDp, WFDp, FFDd, WFDd, Combined.


In the third evaluation (see Figure 9.8), we increased the range of possible task utilisations, from light tasks (Umin = 0.1) to heavy tasks (Umax = 1). All of the proposed algorithms successfully partition tasksets when the system utilisation is less than 0.5. With the further increase of Usys, the schedulability success ratio of the WFDp algorithm drops significantly compared to the other proposed algorithms, which is especially evident with the increased number of processors. This is the case because WFDp may first partition the tasks with low utilisations, since the task priority and the density order may be reversed. Therefore, the tasks with high densities (equal to the task utilisation in the case of the implicit deadline systems that we evaluate) may be left for assignment at the later stages, thus increasing the possibility of jeopardising the schedulability of the tasks that are already assigned to the processors. Consequently, the density decreasing based algorithms (FFDd and WFDd) outperform the priority decreasing based algorithms, e.g., for m = 16 and Usys = 0.88, the schedulability success ratio for FFDd is 64% and for WFDd is 50%, whereas for FFDp it is 43% and only 8% for WFDp. The combination of the algorithms, in this case, has a success ratio of 64%, since the majority of the successful partitionings come from FFDd.

In the fourth evaluation (see Figure 9.9), we decreased the utilisation range defining the light and heavy tasks (Umin = 0.25 and Umax = 0.75). The results show a drop in the average success ratio compared to the previous experiment. This is the case because of the increased minimal task utilisations, which deteriorates the task fitting in the processors, i.e., the larger the minimal utilisations of the tasks are, the fewer fitting combinations are possible. As in the previous case, we notice that the density-based algorithms dominate the algorithms based on the priority decreasing ordering.

Conclusions

Finally, we may conclude that the combination of all of the proposed partitioning strategies may significantly increase the partitioning success ratios, especially in cases when tasksets do not primarily consist of high-density tasks, i.e., utilisations above 0.5. In this case, WFD increases the combined success ratio since the tasks may be evenly distributed among the cores. Also, the FFDp algorithm, which is the least complex among the proposed ones, achieves considerable success ratios even at the higher system utilisations and with a higher number of tasks. Therefore, the schedulability success ratios may be further increased by integrating FFDp with more complex preemption point selection algorithms,

e.g., one considering the evicting cache blocks of the preempting tasks. For a taskset consisting of high-density tasks, FFDd outperforms the other partitioning strategies on average. However, a taskset consisting of a higher number of tasks, whose partitioning consequently introduces a larger number of possible preemptions and CRPD, may benefit from the priority decreasing ordering, especially the FFDp version, as seen in Figures 9.6 and 9.7. Since the computational complexity of the proposed approaches does not exceed O(n² × w × L) (see the computational complexity discussion in Section 9.4.2), and since the different orderings and bin-packing allocations have benefits in specific cases, they should be used in combination, instead of restricting to a single partitioning strategy. Finally, we notice that WFDp performs worst regardless of the evaluation setup used in the experiment. This is the case because the worst fit decreasing heuristics combined with the priority decreasing ordering tends to put the tasks with the lowest periods (i.e., highest frequencies) on different processors. By doing so, the lower priority tasks coming for allocation are most likely to be divided into many non-preemptive regions in order not to jeopardise the schedulability of such tasks, thus producing preemption costs which further increase the utilisation upon allocation. On the other side, the first fit heuristics tends to group high-frequency tasks on the same processor, thus localising the negative impact from such tasks, which causes the selection of more preemption points on the lower priority ones.

Relative contribution to the combined approach

We also investigated the relative contribution of the various partitioning strategies to the combined approach. We selected the utilisation of 0.86 from Figure 9.6 (m = 4, Umin = 0.1, Umax = 0.3) and the utilisation of 0.9 from Figure 9.7 (m = 8, Umin = 0.1, Umax = 0.5), since at these two points the contribution to the Combined approach is not clearly dominated by any of the evaluated strategies.

For the first setup (see Figure 9.10), we notice that all of the partitioning strategies contribute individually to the Combined approach: FFDp with 6.4%, WFDp with 1.2%, FFDd with 5.8%, and WFDd with 2.5%. The individual contribution of the priority-based strategies is 9.1%, while for the density-based strategies it is 47.5%. The individual contribution of the FFD based strategies is 21.7%, and 5% for the WFD strategies.

For the second setup (see Figure 9.11), we notice that all of the partitioning strategies contribute individually to the Combined approach: FFDp with 12.1%, WFDp with 1.1%, FFDd with 30.1%, and WFDd with 8%. The individual contribution of the priority-based strategies is 13.7%, while for the density-

based strategies it is 14.9%. The individual contribution of the FFD based strategies is 61.7%, while for the WFD strategies it is 9.4%.


Figure 9.10. Relative contribution to the combined approach (m = 4, Umin = 0.1, Umax = 0.3, Usys = 0.86).


Figure 9.11. Relative contribution to the combined approach (m = 8, Umin = 0.1, Umax = 0.5, Usys = 0.9).

This investigation shows that the combined use of the partitioning strategies

can be beneficial for tasksets consisting of low-utilisation tasks, since any of the proposed partitioning strategies may contribute to a schedulable allocation.

9.5.2 A comparison between LP-FPP, NP, and FP scheduling

In this experiment, we compare the partitioning strategies under the Fixed Preemption Points approach (LP-FPP) against Fully-Preemptive (FP) and Non-Preemptive (NP) scheduling.

For the fully-preemptive schedulability test, we used the following fixed-point equation for computing the response time Ri of a task τi, which is similar to the schedulability test used by Starke et al. [77]:

R_i^{k+1} = C_i + Σ_{h ∈ hp(i)} ⌈R_i^k / T_h⌉ × (C_h + ξ_i^max)

where ξ_i^max is the maximum preemption cost in τi, using R_i^0 = C_i as the start value.
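A minimal Python sketch of this fixed-point iteration follows, using the same illustrative record layout as the earlier sketches; stopping once the response time exceeds the deadline is an assumption used here as a divergence cut-off.

# Sketch of the fully-preemptive response-time test above; `hp` is the list
# of higher-priority tasks already assigned to the same processor.
import math

def response_time(task, hp, xi_max):
    """Iterate R^{k+1} = C_i + sum_{h in hp(i)} ceil(R^k / T_h) * (C_h + xi_max),
    starting from R^0 = C_i; returns None if the deadline is exceeded."""
    r = task["wcet"]                                     # R^0 = C_i
    while True:
        nxt = task["wcet"] + sum(
            math.ceil(r / h["period"]) * (h["wcet"] + xi_max) for h in hp)
        if nxt == r:                                     # fixed point reached
            return r
        if nxt > task["deadline"]:                       # deemed unschedulable
            return None
        r = nxt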

We investigate the partitioning effectiveness, reporting a partitioning success whenever any of the proposed strategies (FFDp, FFDd, WFDp, or WFDd) succeeds to partition a taskset under the given scheduling approach. Thus, LP-FPP corresponds to Combined in the previous experiment.

Evaluation setup

We use the same evaluation setup as the one described in Section 9.5.1.


Figure 9.12. Schedulability success ratio as a function of system utilisation (Umin = 0.1 and Umax = 0.3); panels for m = 4, 8, and 16; curves for NP, FP, and LP-FPP.

Figure 9.13. Schedulability success ratio as a function of system utilisation (Umin = 0.1 and Umax = 0.5); panels for m = 4, 8, and 16; curves for NP, FP, and LP-FPP.

Figure 9.14. Schedulability success ratio as a function of system utilisation (Umin = 0.1 and Umax = 1); panels for m = 4, 8, and 16; curves for NP, FP, and LP-FPP.

Figure 9.15. Schedulability success ratio as a function of system utilisation (Umin = 0.25 and Umax = 0.75); panels for m = 4, 8, and 16; curves for NP, FP, and LP-FPP.


Evaluation results

In the first evaluation, shown in Figure 9.12, we compared the schedulability success ratios of the three partitioning approaches considering light tasks (Umax = 0.3). Since the number of generated tasks is high in this case, the number of possible preemptions causes the FPS-based partitioning to result in the lowest schedulability success ratios. On the other hand, NPS-based partitioning successfully partitions a higher number of tasksets, since it does not incur CRPD. NPS-based partitioning dominates even more as the number of processors increases, since the number of tasks is then higher, which leads to a larger number of possible preemptions, i.e., to a higher cumulative CRPD in the case of FPS-based partitioning. However, LP-FPP-based partitioning always succeeds in partitioning the highest number of tasksets.

The second evaluation (Figure 9.13) shows that with a further increase in the maximum utilisation (Umax = 0.5), which decreases the number of tasks per processor and the possible preemptions compared to the previous evaluation, the difference in partitioning effectiveness between FPS-based and NPS-based partitioning decreases. While FPS successfully partitions a larger number of tasksets when m = 4 processors are available, the dominance shifts to NPS at higher utilisations when m = 8, and NPS-based partitioning completely dominates when m = 16. However, LP-FPP-based partitioning provides the highest schedulability success ratios in all of the considered cases.
The third evaluation (Figure 9.14) shows that LP-FPP-based partitioning provides better results compared to the other two partitioning approaches even when the possible utilisation distribution (Umin = 0.1 and Umax = 1) among the tasks includes a wider range of low- and heavy-utilisation tasks.

In the fourth evaluation, shown in Figure 9.15, we narrowed the distribution range [Umin = 0.25, Umax = 0.75] between the low-utilisation and heavy-utilisation tasks. The results show the lowest schedulability success ratios compared to the above-presented evaluations, since the minimum task utilisation (i.e. the minimum object size in the bin packing problem) is considerably high with respect to the partitioning problem in general. However, LP-FPP-based partitioning still dominates the other two partitioning approaches.

Conclusions

In this experiment, we showed that the proposed partitioning strategies based on Fixed Preemption Point Scheduling outperform Fully Preemptive and Non-Preemptive scheduling in terms of successful partitioning. This is the case since, even in uniprocessor terms, Fixed Preemption Point Scheduling may be seen as a superset of Fully-Preemptive and Non-Preemptive scheduling, and is therefore able to improve the schedulability results and utilisation bounds, as shown by Buttazzo et al. [12]. This effect is also evident when multi-core partitioning is applied to those scheduling algorithms, as shown in the above evaluation.

9.5.3 Effect of preemption delay on partitioning strategies

In this experiment, we evaluated the impact of the maximum CRPD on the effectiveness of the previously evaluated partitioning strategies.

Evaluation setup

For each maximum CRPD value, set every 10 Kcycles within the range [20 Kcycles, 70 Kcycles], we generate 2000 tasksets. The setup for generating the individual task parameters is the same as in the previous two experiments, whereas the system setup is characterised by m = 8, Umin = 0.1, Umax = 1, corresponding to the middle diagram in Fig. 9.8 and Fig. 9.14, and Usys = 0.84, because it represents a point where none of the approaches achieves a 100% partitioning success ratio.
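For reference, task utilisations in such experiments are commonly drawn with the UUniFast algorithm of Bini and Buttazzo [31]; the sketch below, with a simple discard scheme for the [Umin, Umax] bounds, is our illustration and not necessarily the exact generator used in Section 9.5.1:

    import random

    def uunifast(n, u_sys):
        # UUniFast: draw n task utilisations summing to u_sys,
        # uniformly distributed over the valid simplex
        utils, remaining = [], u_sys
        for i in range(1, n):
            nxt = remaining * random.random() ** (1.0 / (n - i))
            utils.append(remaining - nxt)
            remaining = nxt
        utils.append(remaining)
        return utils

    def generate_utilisations(n, u_sys, u_min, u_max):
        # redraw until every utilisation respects the [u_min, u_max]
        # bounds (may be slow for tight bounds; a sketch only)
        while True:
            utils = uunifast(n, u_sys)
            if all(u_min <= u <= u_max for u in utils):
                return utils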


Figure 9.16. Schedulability success ratio as a function of the maximum CRPD (Kcycles) for the Fixed Preemption Point Scheduling partitioning strategies (FFDp, WFDp, FFDd, WFDd, and Combined).


Evaluation results

In the first evaluation, shown in Figure 9.16, we compared the partitioning strategies (FFDp, FFDd, WFDp, WFDd, and Combined) under Fixed Preemption Point scheduling. The results show that the effectiveness of all partitioning strategies drops approximately linearly, at almost the same rate. However, we notice that the effectiveness of WFDd drops faster than that of FFDp, since partitioning in priority-decreasing order does not add CRPD to all of the tasks already allocated to the processor, unlike the density-decreasing order.


Figure 9.17. Schedulability success ratio as a function of the maximum CRPD (Kcycles) for the non-preemptive, fully-preemptive, and fixed preemption point partitioned scheduling.

In the second evaluation, shown in Figure 9.17, we compared the partitioning strategies under Fully-Preemptive (FPS), Non-Preemptive (NPS), and Fixed Preemption Points (LP-FPP) scheduling. As in the second experiment, we report a partitioning success whenever any of the proposed strategies (FFDp, FFDd, WFDp, or WFDd) succeeds in partitioning a taskset under the given scheduling approach. The results show that the effectiveness of the FPS-based partitioning drops linearly with the increase of the maximum CRPD, whereas the effectiveness of the LP-FPP-based partitioning drops at approximately one-third of the FPS rate. As expected, NPS-based partitioning does not change since there are no preemptions to be accounted for, and its partitioning success ratio is 8% throughout the whole evaluation.


Conclusions

In this experiment, we showed that the maximum CRPD size affects the effectiveness of all partitioning strategies that account for preemptions (Fully Preemptive and Fixed Preemption Point Scheduling). However, their effectiveness drops at different rates, since a large CRPD has a significantly bigger impact on Fully Preemptive than on Fixed Preemption Point Scheduling, whose effectiveness drops relatively slowly.

9.6 Summary and Conclusions

In this chapter, we proposed a new joint approach for task partitioning and preemption point selection for Fixed Preemption Point Scheduling in multi-core systems. We also investigated the approach's performance in the context of different bin packing heuristics, such as First Fit Decreasing (FFD) and Worst Fit Decreasing (WFD), together with density- and priority-based taskset orderings. The evaluation performed on randomly generated tasksets showed that in the general case, no single partitioning strategy fully dominates the others. However, the evaluation results reveal that certain partitioning strategies perform significantly better concerning the overall schedulability for specific taskset characteristics. While density-based partitioning increases the schedulability of tasksets consisting of high-density tasks, priority-based partitioning increases the schedulability of tasksets with a higher number of tasks and lower average utilisations. On average, the FFD-based heuristics dominate the WFD heuristics; however, WFD increases the schedulability of tasksets with lower average utilisations, since the tasks may be evenly distributed among the cores. The results also reveal that the proposed partitioning strategies dominate Fully Preemptive and Non-Preemptive partitioned scheduling. Moreover, the proposed approach is affected significantly less by an increase of the CRPD compared to Fully-Preemptive partitioned scheduling.
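To make the difference between the two heuristics concrete, the following is a minimal sketch of FFD and WFD over task utilisations. It is an illustration under simplifying assumptions: in the proposed approach the fit test is the CRPD-aware schedulability analysis and the ordering may be by density or priority, not the plain utilisation-capacity check used here:

    def ffd(utils, m):
        # First Fit Decreasing: place each task, in decreasing order,
        # on the first core where it fits, tending to fill cores up
        cores = [[] for _ in range(m)]
        for u in sorted(utils, reverse=True):
            for core in cores:
                if sum(core) + u <= 1.0:
                    core.append(u)
                    break
            else:
                return None   # no core can accommodate the task
        return cores

    def wfd(utils, m):
        # Worst Fit Decreasing: place each task on the least-loaded
        # core, spreading the load evenly across the cores
        cores = [[] for _ in range(m)]
        for u in sorted(utils, reverse=True):
            core = min(cores, key=sum)
            if sum(core) + u > 1.0:
                return None
            core.append(u)
        return cores

The even spreading in WFD is what leaves headroom on every core for tasksets with lower average utilisations, matching the behaviour observed in the evaluation.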

Chapter 10

Conclusions

In this thesis, we identified several problems in the SOTA of timing and scheduling analysis of real-time systems under preemptive scheduling, and proposed solutions that contribute to the overall thesis goal of improving the accuracy of schedulability analyses to identify schedulable tasksets in such systems.

For single-core real-time systems employing tasks with fixed preemption points, we discovered that the existing analyses over-approximate preemption delay bounds by including infeasible preemptions and cache block reloads in the final delay approximations. Based on those observations, we proposed a preemption delay bound analysis and a response-time analysis which resolve the identified problems. The evaluation shows that the proposed analyses provide more accurate approximations of preemption delays than the SOTA methods, which propagates to more accurate response-time approximations and final schedulability verdicts.

For single-core real-time systems employing fully-preemptive tasks, we discovered that the existing analyses over-approximate preemption delay bounds by treating all preemptions which may occur within some specified duration with one method at a time. We proposed a solution that partitions the preemptions into different preemption groups, allowing the CRPD approximations of different groups to benefit from different approximation methods. Evaluation results show that this is indeed the case compared to the SOTA approaches.

For multi-core real-time systems employing sequential tasks, we proposed a joint approach of preemption point selection and task-to-core allocation, and we proposed a partitioning criterion for the Worst-Fit Decreasing partitioning heuristics, based on the computation of the maximum blocking tolerance accounting for preemption delays. The motivation for this comes from the fact that in the context of single-core systems, it has been shown that preemption point selection may improve schedulability compared to non-preemptive and fully-preemptive scheduling, but in the context of a partitioned multi-core system, task-to-core allocation may also affect the schedulability. We also contributed with a comparison of the different partitioning strategies and showed the schedulability improvements of the proposed approach under different taskset characteristics.

Chapter 11

Future work

For future work, we envision several potential research directions, and in this section, we describe the most important ones.

11.1 Static code analysis for reload bounds

The first direction addresses possible improvements in CRPD approximation using static code analysis. In the SOTA of timing analysis of fully-preemptive tasks, useful cache blocks are still the dominant concept used for the approximation of CRPD. As we showed in the context of tasks with fixed preemption points, this often leads to a significant delay over-approximation that also affects schedulability verdicts. This problem remains in the fully-preemptive context, and we show it in the following example.

In Figure 11.1, we illustrate the control flow graph of a task consisting of five basic blocks. Throughout the execution, instructions contained within the basic blocks access cache blocks a, b, c and d. Also, for each of the five basic blocks, we assume that there is an upper bound on the number of times it can be executed. Thus, basic block BB 1 accesses cache block a and can be executed at most once, denoted by MA(BB 1) = 1. Basic block BB 2 accesses cache blocks b and c and can be executed at most four times, i.e. MA(BB 2) = 4, etc.

If we assume that a single job of the illustrated task can be preempted k times by jobs with evicting cache blocks a, b, c, and d, the existing analyses would first determine that the useful cache blocks of the illustrated task are a and c, and would then account for k cache block reloads for each of the two cache blocks.

Figure 11.1. Example of over-approximation of cache-block reloads in the conditional execution flow.

In reality, cache block a can be reloaded at most once, and cache block c can be reloaded at most three times, regardless of the number of preemptions. This is the case because cache block a can be accessed at most once, and cache block c can be accessed at most three times such that the access leads to a reload.

Our goal is to formulate a static code analysis that computes the upper bounds on cache block reloads for each task, regardless of the number of preemptions. Also, an interesting extension is to compute bounds per different evicting tasks, and also to compute the MA values, which can be achieved with Integer Linear Programming. As the final step of this approach, we will define a response-time analysis that accounts for the reload upper bounds instead of useful cache blocks.
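As a first approximation of the envisioned bound, the sketch below combines the per-basic-block execution bounds MA with the number of preemptions k. It is our illustration only: the task representation is hypothetical, and summing the MA values is coarser than the example above (it yields 4 rather than 3 for cache block c), since the full analysis would count only those accesses that can actually cause a reload:

    def reload_bounds(basic_blocks, k):
        # basic_blocks: list of (ma, accessed_blocks) pairs, where ma is
        # the maximum number of executions of the basic block and
        # accessed_blocks is the set of cache blocks it accesses
        access_bounds = {}
        for ma, blocks in basic_blocks:
            for b in blocks:
                access_bounds[b] = access_bounds.get(b, 0) + ma
        # a block cannot be reloaded more often than it is accessed,
        # and every reload also requires a preemption, hence min(k, .)
        return {b: min(k, n) for b, n in access_bounds.items()}

    # Figure 11.1 example (remaining basic blocks omitted):
    # BB 1 accesses {a} with MA = 1, BB 2 accesses {b, c} with MA = 4
    print(reload_bounds([(1, {'a'}), (4, {'b', 'c'})], k=10))
    # -> {'a': 1, 'b': 4, 'c': 4}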

11.2 Improved analysis and preemption point selection for LRU caches

For the second direction, we envision an improved cache analysis for tasks with fixed preemption points in systems that employ set-associative LRU caches. While in this thesis we propose safe CRPD upper bounds, they can be made more accurate if we do not assume that all cache lines from a cache set must be evicted upon a preemption with an adequate evicting cache block. As shown by Altmeyer et al. [93], resilience analysis can determine how many times a cache set can be accessed without evicting the cache block that was reloaded in that cache set prior to the preemption.


Figure 11.2. Cache-block access example for LRU caches.

In Figure 11.2, we show the execution CFG of a preempted task, illustrated with basic blocks (white rectangles). In each basic block, we show the individual cache set accesses (a, b, or d). Between the second and the third basic block of the preempted task, we assume that there is a preemption, and with a grey rectangle we represent any possible execution sequence of preempting tasks, assuming also that there are possible nested preemptions among them. Given a cache associativity of two, each cache set consists of two cache blocks. Thus, a single access to cache set a during the execution of the preempting tasks will not lead to a reload of a at the fourth basic block of the preempted task. Given the set associativity CA, and the maximum number MA(a) of accesses to cache set a during the execution of the preempting tasks, a will not be reloaded at the fourth basic block as long as MA(a) < CA. Using the static code analysis from the previous section to compute MA(a), the approximation of cache block reloads can be improved compared to the SOTA resilience analysis.

The next part of the problem is to select preemption points such that the CRPD is minimised, while considering the knowledge about cache block resilience and the above problem statement. Here, it is possible to use Integer Linear Programming, as well as some less complex algorithms. This may be an interesting contribution to the SOTA, since the existing preemption point selection algorithms primarily account for direct-mapped caches.
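A minimal sketch of the resilience condition discussed above, MA(a) < CA (the interface is hypothetical; CA denotes the cache associativity and MA(a) the bound on accesses to set a within the preemption window):

    def survives_preemption(ma_accesses: int, associativity: int) -> bool:
        # Under LRU, a block reloaded before the preemption window is
        # still cached afterwards if fewer than 'associativity' accesses
        # hit its cache set during the window, i.e. MA(a) < CA
        return ma_accesses < associativity

    # Figure 11.2 example: associativity CA = 2 and a single access to
    # cache set a by the preempting tasks, so a need not be reloaded.
    assert survives_preemption(ma_accesses=1, associativity=2)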

11.3 Task & cache & preemption partitioning

For the third direction, we envision a joint approach for task, cache, and preemption partitioning. It has been shown by Altmeyer et al. [49, 50] that cache partitioning, i.e. the assignment of cache partitions to individual tasks, can improve schedulability due to increased predictability and reduced cache usage.


Figure 11.3. Overview of the envisioned approach for CRPD minimisation and improving schedulability of a real-time system. (The figure connects reloadability static code analysis, preemption point selection, and resilience analysis at the code level with cache partitioning, preemption partitioning, and task partitioning at the system and scheduling level.)

However, by addressing the problems introduced in the previous two sections together with the solutions from this thesis, existing cache partitioning methods can be further improved. In Chapter 9, we showed that extensive task partitioning can improve schedulability; thus, the joint use of task and cache partitioning in multi-core systems deserves attention. Finally, in Chapter 8, we showed how preemption partitioning can be used to improve the accuracy of CRPD and schedulability approximation. Also, all three should be analysed considering the information from the static code analysis and the LRU cache improvements stated in the previous two sections, as well as preemption point selection where possible. In Figure 11.3, we show a simple overview of the envisioned approach.


11.4 Analysis of multi-core systems with complex assumptions and constraints

For the other directions, we envision:

• A framework for preemption point selection that takes into account multi-core systems and requirements such as resource sharing, precedence constraints, preemption delays, etc. In this context, it can be important to select non-preemptive regions and assign execution slots such that no two non-preemptive regions use the same resource or cache address space, etc.

• Joint use of preemption thresholds and preemption point selection. This direction would represent a continuation of the seminal work started by Bril et al. [16] in the context of preemption points with varying priority thresholds, and by Cavicchio et al. [64, 65] and Bertogna et al. [62] in the context of preemption point selection. Such an approach can minimise CRPD and at the same time account for mutual execution exclusivity and similar problems.

• Extension of the proposed and envisioned approaches to parallel task execution and other memory hierarchies, such as shared caches, multi-level caches, DRAM, etc. The existing approaches for parallel task execution and DRAM, proposed by Casini et al. [122, 123], can be integrated with several concepts proposed in this thesis, and with preemption point selection.

Bibliography

[1] Henry Heberle, Gabriela Vaz Meirelles, Felipe R da Silva, Guilherme P Telles, and Rosane Minghim. InteractiVenn: a web-based tool for the analysis of sets through Venn diagrams. BMC Bioinformatics, 16(1):169, 2015.

[2] Sebastian Hahn, Michael Jacobs, and Jan Reineke. Enabling compositionality for multicore timing analysis. In Proceedings of the 24th International Conference on Real-Time Networks and Systems, pages 299–308. ACM, 2016.

[3] Jan Gustafsson, Adam Betts, Andreas Ermedahl, and Björn Lisper. The Mälardalen WCET benchmarks: Past, present and future. In 10th International Workshop on Worst-Case Execution Time Analysis (WCET 2010). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2010.

[4] Heiko Falk, Sebastian Altmeyer, Peter Hellinckx, Björn Lisper, Wolfgang Puffitsch, Christine Rochange, Martin Schoeberl, Rasmus Bo Sørensen, Peter Wägemann, and Simon Wegener. TACLeBench: A benchmark collection to support worst-case execution time research. In Proceedings of the 16th International Workshop on Worst-Case Execution Time Analysis (WCET'16), volume 55, pages 1–10, 2016.

[5] Simon Schliecker, Jonas Rox, Mircea Negrean, Kai Richter, Marek Jersak, and Rolf Ernst. System level performance analysis for real-time automotive multicore and network architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 28(7):979–992, 2009.

[6] Robert I Davis, Iain Bate, Guillem Bernat, Ian Broster, Alan Burns, Antoine Colin, Stuart Hutchesson, and Nigel Tracey. Transferring real-time systems research into industrial practice: Four impact case studies. In 30th Euromicro Conference on Real-Time Systems (ECRTS 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.

[7] Philippe Baufreton, Vincent Bregeon, Keryan Didier, Guillaume Iooss, Dumitru Potop-Butucaru, and Jean Souyris. Efficient fine-grain parallelism in shared memory for real-time avionics. In ERTS 2020 – 10th European Congress Embedded Real Time Systems, 2020.

[8] Cheol Hea Koo and Hyungshin Kim. Measurement of cache-related preemption delay for spacecraft computers. In 2018 IEEE 24th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pages 234–235. IEEE, 2018.

[9] Nunzio Cecere, Massimo Tipaldi, Ralf Wenker, and Umberto Villano. Measurement and analysis of schedulability of spacecraft on-board software. In 2016 IEEE Metrology for Aerospace (MetroAeroSpace), pages 545–550. IEEE, 2016.

[10] John A Stankovic and Krithi Ramamritham. Hard real-time systems: tutorial. [s.n.], 1988.

[11] Giorgio Buttazzo. Hard real-time computing systems: predictable scheduling algorithms and applications, volume 24. Springer Science & Business Media, 2011.

[12] Giorgio C Buttazzo, Marko Bertogna, and Gang Yao. Limited preemptive scheduling for real-time systems: A survey. IEEE Transactions on Industrial Informatics, 9(1):3–15, 2013.

[13] Yun Wang and Manas Saksena. Scheduling fixed-priority tasks with preemption threshold. In Real-Time Computing Systems and Applications, 1999. RTCSA'99. Sixth International Conference on, pages 328–335. IEEE, 1999.

[14] Sanjoy Baruah. The limited-preemption uniprocessor scheduling of sporadic task systems. In 17th Euromicro Conference on Real-Time Systems (ECRTS'05), pages 137–144. IEEE, 2005.

[15] Alan Burns. Preemptive priority based scheduling: An appropriate engineering approach. In S. Son, editor, Advances in Real-Time Systems, pages 225–248, 1994.

[16] Reinder J Bril, Martijn MHP van den Heuvel, and Johan J Lukkien. Improved feasibility of fixed-priority scheduling with deferred preemption using preemption thresholds for preemption points. In Proceedings of the 21st International Conference on Real-Time Networks and Systems, pages 255–264, 2013.

[17] Björn B Brandenburg. Scheduling and locking in multiprocessor real-time operating systems. 2011.

[18] John M Calandrino, Hennadiy Leontyev, Aaron Block, UmaMaheswari C Devi, and James H Anderson. LITMUS^RT: A testbed for empirically comparing real-time multiprocessor schedulers. In 2006 27th IEEE International Real-Time Systems Symposium (RTSS'06), pages 111–126. IEEE, 2006.

[19] John L Hennessy and David A Patterson. Computer architecture: a quantitative approach. Elsevier, 2011.

[20] Sebastian Altmeyer. Analysis of preemptively scheduled hard real-time systems. epubli GmbH, 2013.

[21] Yan Solihin. Fundamentals of Parallel Multicore Architecture. CRC Press, 2015.

[22] Rodolfo Pellizzoni, Bach D Bui, Marco Caccamo, and Lui Sha. Coscheduling of CPU and I/O transactions in COTS-based embedded systems. In Real-Time Systems Symposium, 2008, pages 221–231. IEEE, 2008.

[23] Bach D Bui, Marco Caccamo, Lui Sha, and Joseph Martinez. Impact of cache partitioning on multi-tasking real time embedded systems. In 2008 14th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pages 101–110. IEEE, 2008.

[24] Reinhard Wilhelm, Jakob Engblom, Andreas Ermedahl, Niklas Holsti, Stephan Thesing, David Whalley, Guillem Bernat, Christian Ferdinand, Reinhold Heckmann, Tulika Mitra, et al. The worst-case execution-time problem—overview of methods and survey of tools. ACM Transactions on Embedded Computing Systems (TECS), 7(3):1–53, 2008.

[25] Richard Zurawski. Embedded Systems Handbook: Embedded systems design and verification, volume 6. CRC Press, 2018.

[26] Isabelle Puaut. Cache analysis vs static cache locking for schedulability analysis in multitasking real-time systems. Proc. WCET, Vienna, Austria, 2002.

[27] Frank Mueller. Timing analysis for instruction caches. Real-Time Systems, 18(2-3):217–247, 2000.

[28] Sebastian Hahn. On static execution-time analysis. Saarländische Universitäts- und Landesbibliothek, 2018.

[29] Gregory Stock, Sebastian Hahn, and Jan Reineke. Cache persistence analysis: Finally exact. arXiv preprint arXiv:1909.04374, 2019.

[30] Gordana Dodig Crnkovic. Constructive research and info-computational knowledge generation. In Model-Based Reasoning in Science and Technology, pages 359–380. Springer, 2010.

[31] Enrico Bini and Giorgio C Buttazzo. Measuring the performance of schedulability tests. Real-Time Systems, 30(1-2):129–154, 2005.

[32] Filip Marković, Jan Carlson, and Radu Dobrin. Tightening the bounds on cache-related preemption delay in fixed preemption point scheduling. In 17th International Workshop on Worst-Case Execution Time Analysis (WCET 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.

[33] F. Marković, J. Carlson, and R. Dobrin. A comparison of partitioning strategies for fixed points based limited preemptive scheduling. IEEE Transactions on Industrial Informatics, 15(2):1070–1081, 2019.

[34] Filip Marković, Jan Carlson, and Radu Dobrin. Cache-aware response time analysis for real-time tasks with fixed preemption points. In The 26th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2020), April 2020.

[35] Filip Marković, Jan Carlson, Sebastian Altmeyer, and Radu Dobrin. Improving the accuracy of cache-aware response time analysis using preemption partitioning. In The 32nd Euromicro Conference on Real-Time Systems, July 2020.

[36] F. Marković, J. Carlson, and R. Dobrin. A comparison of partitioning strategies for fixed points based limited preemptive scheduling. IEEE Transactions on Industrial Informatics, 15(2):1070–1081, 2019.

Bibliography 179 Bibliography 179

[37] Filip Markovic.´ Improving the Schedulability of Real Time Systems under [37] Filip Markovic.´ Improving the Schedulability of Real Time Systems under Fixed Preemption Point Scheduling. E-Print AB, 2018. Fixed Preemption Point Scheduling. E-Print AB, 2018.

[38] Filip Markovic, Jan Carlson, Abhilash Thekkilakattil, Radu Dobrin, and [38] Filip Markovic, Jan Carlson, Abhilash Thekkilakattil, Radu Dobrin, and Björn Lisper. Probabilistic response time analysis for fixed preemption Björn Lisper. Probabilistic response time analysis for fixed preemption point selection. In 13th International Symposium on Industrial Embedded point selection. In 13th International Symposium on Industrial Embedded Systems, June 2018. Systems, June 2018.

[39] Filip Markovic, Jan Carlson, and Radu Dobrin. Preemption point se- [39] Filip Markovic, Jan Carlson, and Radu Dobrin. Preemption point se- lection in limited preemptive scheduling using probabilistic preemption lection in limited preemptive scheduling using probabilistic preemption costs. In 28th Euromicro Conference on Real-Time Systems, WiP, July costs. In 28th Euromicro Conference on Real-Time Systems, WiP, July 2016. 2016.

[40] Hiroyuki Tomiyama and Nikil D Dutt. Program path analysis to bound [40] Hiroyuki Tomiyama and Nikil D Dutt. Program path analysis to bound cache-related preemption delay in preemptive real-time systems. In cache-related preemption delay in preemptive real-time systems. In Proceedings of the eighth international workshop on Hardware/software Proceedings of the eighth international workshop on Hardware/software codesign, pages 67–71. ACM, 2000. codesign, pages 67–71. ACM, 2000.

[41] José V Busquets-Mataix, Juan José Serrano, Rafael Ors, Pedro Gil, [41] José V Busquets-Mataix, Juan José Serrano, Rafael Ors, Pedro Gil, and Andy Wellings. Adding instruction cache effect to schedulability and Andy Wellings. Adding instruction cache effect to schedulability analysis of preemptive real-time systems. In Real-Time Technology and analysis of preemptive real-time systems. In Real-Time Technology and Applications Symposium, 1996. Proceedings., 1996 IEEE, pages 204–212. Applications Symposium, 1996. Proceedings., 1996 IEEE, pages 204–212. IEEE, 1996. IEEE, 1996.

[42] Chang-Gun Lee, Joosun Han, Yang-Min Seo, Sang Luyl Min, Rhan Ha, [42] Chang-Gun Lee, Joosun Han, Yang-Min Seo, Sang Luyl Min, Rhan Ha, Seongsoo Hong, Chang Yun Park, Minsuk Lee, and Chong Sam Kim. Seongsoo Hong, Chang Yun Park, Minsuk Lee, and Chong Sam Kim. Analysis of cache-related preemption delay in fixed-priority preemptive Analysis of cache-related preemption delay in fixed-priority preemptive scheduling. IEEE Transactions on Computers, 47(6):700–713, 1998. scheduling. IEEE Transactions on Computers, 47(6):700–713, 1998.

[43] Yudong Tan and Vincent Mooney. Timing analysis for preemptive multi-tasking real-time systems with caches. ACM Transactions on Embedded Computing Systems (TECS), 6(1):7, 2007.

[44] Sebastian Altmeyer, Robert I Davis, and Claire Maiza. Cache related pre-emption delay aware response time analysis for fixed priority pre-emptive systems. In 2011 IEEE 32nd Real-Time Systems Symposium, pages 261–271. IEEE, 2011.

[45] Sebastian Altmeyer, Robert I Davis, and Claire Maiza. Improved cache related pre-emption delay aware response time analysis for fixed priority pre-emptive systems. Real-Time Systems, 48(5):499–526, 2012.

[46] Jan Staschulat, Simon Schliecker, and Rolf Ernst. Scheduling analysis of real-time systems with precise modeling of cache related preemption delay. In 17th Euromicro Conference on Real-Time Systems (ECRTS 2005), pages 41–48. IEEE, 2005.

[47] Harini Ramaprasad and Frank Mueller. Tightening the bounds on feasible preemptions. ACM Transactions on Embedded Computing Systems (TECS), 10(2):27, 2010.

[48] Harini Ramaprasad and Frank Mueller. Bounding worst-case response time for tasks with non-preemptive regions. In 2008 IEEE Real-Time and Embedded Technology and Applications Symposium, pages 58–67. IEEE, 2008.

[49] Sebastian Altmeyer, Roeland Douma, Will Lunniss, and Robert I Davis. Evaluation of cache partitioning for hard real-time systems. In Proceedings of the Euromicro Conference on Real-Time Systems (ECRTS), pages 15–26, 2014.

[50] Sebastian Altmeyer, Roeland Douma, Will Lunniss, and Robert I Davis. On the effectiveness of cache partitioning in hard real-time systems. Real-Time Systems, 52(5):598–643, 2016.

[51] Syed Aftab Rashid, Geoffrey Nelissen, Damien Hardy, Benny Akesson, Isabelle Puaut, and Eduardo Tovar. Cache-persistence-aware response-time analysis for fixed-priority preemptive systems. In 2016 28th Euromicro Conference on Real-Time Systems (ECRTS), pages 262–272. IEEE, 2016.

[52] Syed Aftab Rashid, Geoffrey Nelissen, Sebastian Altmeyer, Robert I Davis, and Eduardo Tovar. Integrated analysis of cache related pre-emption delays and cache persistence reload overheads. In 2017 IEEE Real-Time Systems Symposium (RTSS), pages 188–198. IEEE, 2017.

[53] Robert I Davis, Sebastian Altmeyer, and Jan Reineke. Response-time analysis for fixed-priority systems with a write-back cache. Real-Time Systems, 54(4):912–963, 2018.

[54] Tobias Blaß, Sebastian Hahn, and Jan Reineke. Write-back caches in WCET analysis. In 29th Euromicro Conference on Real-Time Systems (ECRTS 2017). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2017.

[55] Robert I Davis, Sebastian Altmeyer, and Alan Burns. Mixed criticality systems with varying context switch costs. In 2018 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), pages 140–151. IEEE, 2018.

[56] Reinder J Bril, Sebastian Altmeyer, Martijn MHP van den Heuvel, Robert I Davis, and Moris Behnam. Fixed priority scheduling with pre-emption thresholds and cache-related pre-emption delays: integrated analysis and evaluation. Real-Time Systems, 53(4):403–466, 2017.

[57] Gang Yao, Giorgio Buttazzo, and Marko Bertogna. Feasibility analysis under fixed priority scheduling with fixed preemption points. In 2010 IEEE 16th International Conference on Embedded and Real-Time Computing Systems and Applications, pages 71–80. IEEE, 2010.

[58] Gang Yao, Giorgio Buttazzo, and Marko Bertogna. Feasibility analysis under fixed priority scheduling with limited preemptions. Real-Time Systems, 47(3):198–223, 2011.

[59] Reinder J Bril, Johan J Lukkien, and Wim FJ Verhaegh. Worst-case response time analysis of real-time tasks under fixed-priority scheduling with deferred preemption. Real-Time Systems, 42(1-3):63–119, 2009.

[60] Gang Yao, Giorgio Buttazzo, and Marko Bertogna. Comparative evaluation of limited preemptive methods. In 2010 IEEE 15th Conference on Emerging Technologies & Factory Automation (ETFA 2010), pages 1–8. IEEE, 2010.

[61] Matthias Becker, Nima Khalilzad, Reinder J Bril, and Thomas Nolte. Extended support for limited preemption fixed priority scheduling for OSEK/AUTOSAR-compliant operating systems. In 10th IEEE International Symposium on Industrial Embedded Systems (SIES), pages 1–11. IEEE, 2015.

[62] Marko Bertogna, Orges Xhani, Mauro Marinoni, Francesco Esposito, and Giorgio Buttazzo. Optimal selection of preemption points to minimize preemption overhead. In 23rd Euromicro Conference on Real-Time Systems (ECRTS 2011), pages 217–227. IEEE, 2011.

[63] Bo Peng, Nathan Fisher, and Marko Bertogna. Explicit preemption placement for real-time conditional code. In 26th Euromicro Conference on Real-Time Systems (ECRTS 2014), pages 177–188. IEEE, 2014.

[64] John Cavicchio, Corey Tessler, and Nathan Fisher. Minimizing cache overhead via loaded cache blocks and preemption placement. In 27th Euromicro Conference on Real-Time Systems (ECRTS 2015), pages 163–173. IEEE, 2015.

[65] John C Cavicchio. Effective and Efficient Preemption Placement for Cache Overhead Minimization in Hard Real-Time Systems. PhD thesis, Wayne State University, 2019.

[66] Abhilash Thekkilakattil, Kaiqian Zhu, Yonggao Nie, Radu Dobrin, and Sasikumar Punnekkat. An empirical investigation of eager and lazy preemption approaches in global limited preemptive scheduling. In Ada-Europe International Conference on Reliable Software Technologies, pages 163–178. Springer, 2016.

[67] Abhilash Thekkilakattil, Sanjoy Baruah, Radu Dobrin, and Sasikumar Punnekkat. The global limited preemptive earliest deadline first feasibility of sporadic real-time tasks. In 26th Euromicro Conference on Real-Time Systems (ECRTS 2014), pages 301–310. IEEE, 2014.

[68] Andrea Bastoni, Björn Brandenburg, and James Anderson. Cache-related preemption and migration delays: Empirical approximation and impact on schedulability. Proceedings of OSPERT, pages 33–44, 2010.

[69] David S Johnson. Near-optimal bin packing algorithms. PhD thesis, Massachusetts Institute of Technology, 1973.

[70] Nathan Fisher, Sanjoy Baruah, and Theodore P Baker. The partitioned scheduling of sporadic tasks according to static-priorities. In 18th Euromicro Conference on Real-Time Systems (ECRTS 2006), 10 pp. IEEE, 2006.

[71] Shinpei Kato and Nobuyuki Yamasaki. Portioned static-priority scheduling on multiprocessors. In IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2008), pages 1–12. IEEE, 2008.

[72] Karthik Lakshmanan, Ragunathan Rajkumar, and John Lehoczky. Partitioned fixed-priority preemptive scheduling for multi-core processors. In 21st Euromicro Conference on Real-Time Systems (ECRTS 2009), pages 239–248. IEEE, 2009.

[73] Shinpei Kato and Nobuyuki Yamasaki. Semi-partitioned fixed-priority scheduling on multiprocessors. In 15th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2009), pages 23–32. IEEE, 2009.

[74] Farhang Nemati, Thomas Nolte, and Moris Behnam. Partitioning real-time systems on multiprocessors with shared resources. In International Conference On Principles Of Distributed Systems, pages 253–269. Springer, 2010.

[75] Björn Andersson, Hyoseung Kim, Dionisio De Niz, Mark Klein, Ragunathan Rajkumar, and John Lehoczky. Schedulability analysis of tasks with corunner-dependent execution times. ACM Transactions on Embedded Computing Systems (TECS), 17(3):1–29, 2018.

[76] Robert I Davis, Alan Burns, Jose Marinho, Vincent Nelis, Stefan M Petters, and Marko Bertogna. Global and partitioned multiprocessor fixed priority scheduling with deferred preemption. ACM Transactions on Embedded Computing Systems (TECS), 14(3):47, 2015.

[77] Renan Augusto Starke and Romulo Silva de Oliveira. Cache-aware task partitioning for multicore real-time systems. In III Brazilian Symposium on Computing Systems Engineering (SBESC 2013), pages 89–94. IEEE, 2013.

[78] Bryan C Ward, Jonathan L Herman, Christopher J Kenna, and James H Anderson. Outstanding paper award: Making shared caches more predictable on multicore platforms. In 25th Euromicro Conference on Real-Time Systems (ECRTS 2013), pages 157–167. IEEE, 2013.

[79] Abhilash Thekkilakattil, Radu Dobrin, and Sasikumar Punnekkat. Towards exploiting limited preemptive scheduling for partitioned multicore systems. JRWRTC 2014, page 45, 2014.

[80] Pedro Souto, Paulo Baltarejo Sousa, Robert I Davis, Konstantinos Bletsas, and Eduardo Tovar. Overhead-aware schedulability evaluation of semi-partitioned real-time schedulers. In 2015 IEEE 21st International Conference on Embedded and Real-Time Computing Systems and Applications, pages 110–121. IEEE, 2015.

[81] Sebastian Altmeyer, Robert I Davis, Leandro Indrusiak, Claire Maiza, Vincent Nelis, and Jan Reineke. A generic and compositional framework for multicore response time analysis. In Proceedings of the 23rd International Conference on Real-Time Networks and Systems, pages 129–138, 2015.

[82] Robert I Davis, Sebastian Altmeyer, Leandro S Indrusiak, Claire Maiza, Vincent Nelis, and Jan Reineke. An extensible framework for multicore response time analysis. Real-Time Systems, 54(3):607–661, 2018.

[83] Wen-Hung Huang, Jian-Jia Chen, and Jan Reineke. MIRROR: symmetric timing analysis for real-time tasks on multicore platforms with shared resources. In Proceedings of the 53rd Annual Design Automation Conference, pages 1–6, 2016.

[84] Claire Maiza, Hamza Rihani, Juan M Rivas, Joël Goossens, Sebastian Altmeyer, and Robert I Davis. A survey of timing verification techniques for multi-core real-time systems. ACM Computing Surveys (CSUR), 52(3):1–38, 2019.

[85] Sebastian Altmeyer and Claire Burguiere. A new notion of useful cache block to improve the bounds of cache-related preemption delay. In 2009 21st Euromicro Conference on Real-Time Systems, pages 109–118. IEEE, 2009.

[86] Wei Zhang, Fan Gong, Lei Ju, Nan Guan, and Zhiping Jia. Scope-aware useful cache block analysis for data cache related preemption delay. In 2017 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), pages 63–74. IEEE, 2017.

[87] Damien Hardy and Isabelle Puaut. WCET analysis of multi-level non-inclusive set-associative instruction caches. In 2008 Real-Time Systems Symposium, pages 456–466. IEEE, 2008.

[88] Benjamin Lesage, Damien Hardy, and Isabelle Puaut. WCET analysis of multi-level set-associative data caches. In 9th International Workshop on Worst-Case Execution Time Analysis (WCET'09). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2009.
[89] Reinhard Wilhelm, Sebastian Altmeyer, Claire Burguière, Daniel Grund, Jörg Herter, Jan Reineke, Björn Wachter, and Stephan Wilhelm. Static timing analysis for hard real-time systems. In International Workshop on Verification, Model Checking, and Abstract Interpretation, pages 3–22. Springer, 2010.

[90] Sebastian Hahn and Jan Reineke. Design and analysis of SIC: a provably timing-predictable pipelined processor core. Real-Time Systems, pages 1–39, 2019.

[91] Tobias Blaß. Array-aware cache analysis for write-through and write-back caches. Master's thesis, December 2016.

[92] Jan C Kleinsorge. Tight integration of cache, path and task-interference modeling for the analysis of hard real-time systems. PhD thesis, 2015.

[93] Sebastian Altmeyer, Claire Maiza, and Jan Reineke. Resilience analysis: tightening the CRPD bound for set-associative caches. In ACM Sigplan Notices, volume 45, pages 153–162. ACM, 2010.

[94] Syed Aftab Rashid. ResilienceP analysis: Bounding cache persistence reload overhead for set-associative caches. In 3rd Doctoral Congress in Engineering, 2019.

[95] Corey Tessler and Nathan Fisher. BUNDLE: Real-time multi-threaded scheduling to reduce cache contention. In 2016 IEEE Real-Time Systems Symposium (RTSS), pages 279–290. IEEE, 2016.

[96] Corey Tessler and Nathan Fisher. NPM-BUNDLE: Non-preemptive multitask scheduling for jobs with bundle-based thread-level scheduling. In 31st Euromicro Conference on Real-Time Systems (ECRTS 2019). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2019.

[97] Corey Tessler and Nathan Fisher. BUNDLEP: Prioritizing conflict free regions in multi-threaded programs to improve cache reuse. In 2018 IEEE Real-Time Systems Symposium (RTSS), pages 325–337. IEEE, 2018.

[98] Corey Tessler. BUNDLE: Taming the Cache and Improving Schedulability of Multi-Threaded Hard Real-Time Systems. PhD thesis, Wayne State University, 2019.

[99] Wei Zhang, Nan Guan, Lei Ju, and Weichen Liu. Analyzing data cache related preemption delay with multiple preemptions. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 37(11):2255–2265, 2018.

[100] Wei Zhang, Nan Guan, Lei Ju, Yue Tang, Weichen Liu, and Zhiping Jia. Scope-aware useful cache block calculation for cache-related preemption delay analysis with set-associative data caches. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2019.

[101] Sanjoy Baruah and Nathan Fisher. Choosing preemption points to minimize typical running times. In Proceedings of the 27th International Conference on Real-Time Networks and Systems, pages 198–208, 2019.

[102] Sebastian Altmeyer and Claire Burguiere. A new notion of useful cache block to improve the bounds of cache-related preemption delay. In 21st Euromicro Conference on Real-Time Systems (ECRTS 2009), pages 109–118. IEEE, 2009.

[103] Leo Hatvani, Reinder J Bril, and Sebastian Altmeyer. Schedulability using native non-preemptive groups on an AUTOSAR/OSEK platform with caches. In Design, Automation & Test in Europe Conference & Exhibition (DATE 2017), pages 244–249. IEEE, 2017.

[104] Arne Hamann, Dakshina Dasari, Simon Kramer, Michael Pressler, and Falk Wurst. Communication centric design in complex automotive embedded systems. In 29th Euromicro Conference on Real-Time Systems (ECRTS 2017). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2017.

[105] Ignacio Sanudo, Paolo Burgio, and Marko Bertogna. Schedulability and timing analysis of mixed preemptive-cooperative tasks on a partitioned multi-core system. In Proceedings of the 7th International Workshop on Analysis Tools and Methodologies for Embedded and Real-Time Systems (WATERS'16), Toulouse, France, 2016.

[106] Jan Reineke, Björn Wachter, Stefan Thesing, Reinhard Wilhelm, Ilia Polian, Jochen Eisinger, and Bernd Becker. A definition and classification of timing anomalies. In 6th International Workshop on Worst-Case Execution Time Analysis (WCET'06). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2006.

[107] Reinhard Wilhelm, Daniel Grund, Jan Reineke, Marc Schlickling, Markus Pister, and Christian Ferdinand. Memory hierarchies, pipelines, and buses for future architectures in time-critical embedded systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 28(7):966–978, 2009.

[108] Thomas Lundqvist and Per Stenstrom. Timing anomalies in dynamically scheduled microprocessors. In Proceedings of the 20th IEEE Real-Time Systems Symposium, pages 12–21. IEEE, 1999.

[109] Claire Burguière, Jan Reineke, and Sebastian Altmeyer. Cache-related preemption delay computation for set-associative caches – pitfalls and solutions. In 9th International Workshop on Worst-Case Execution Time Analysis (WCET'09). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2009.

[110] CHOCO: Open Source Java Library for Constraint Programming. http://www.choco-solver.org/. Accessed: 2017-04-13.

[111] IBM ILOG CPLEX. https://www.ibm.com/analytics/data-science/prescriptive-analytics/cplex-optimizer. Accessed: 2018-02-05.

[112] Reinder J Bril, Wim FJ Verhaegh, and Johan J Lukkien. Exact worst-case response times of real-time tasks under fixed-priority scheduling with deferred preemption. In Proceedings of the Work-in-Progress (WiP) Session of the 16th Euromicro Conference on Real-Time Systems (ECRTS), Technical Report TR-UNL-CSE-2004-0010, University of Nebraska-Lincoln, Department of Computer Science and Engineering, pages 57–60, 2004.

[113] Enrico Bini and Giorgio C Buttazzo. Schedulability analysis of periodic fixed priority systems. IEEE Transactions on Computers, 53(11):1462–1473, 2004.

[114] Darshit Shah, Sebastian Hahn, and Jan Reineke. Experimental evaluation of cache-related preemption delay aware timing analysis. In 18th International Workshop on Worst-Case Execution Time Analysis (WCET 2018). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2018.

[115] Martin Aigner. A characterization of the Bell numbers. Discrete Mathematics, 205(1-3):207–210, 1999.

[116] Paul R Halmos. Naive Set Theory. Courier Dover Publications, 2017.
[117] Borivoje Djokić, Masahiro Miyakawa, Satoshi Sekiguchi, Ichiro Semba, and Ivan Stojmenović. Parallel algorithms for generating subsets and set partitions. In International Symposium on Algorithms, pages 76–85. Springer, 1990.

[118] MC Er. A fast algorithm for generating set partitions. The Computer Journal, 31(3):283–284, 1988.

[119] Simon Schliecker, Mircea Negrean, and Rolf Ernst. Response time analysis on multicore ECUs with shared resources. IEEE Transactions on Industrial Informatics, 5(4):402–413, 2009.

[120] Marko Bertogna, Giorgio Buttazzo, Mauro Marinoni, Gang Yao, Francesco Esposito, and Marco Caccamo. Preemption points placement for sporadic task sets. In 22nd Euromicro Conference on Real-Time Systems (ECRTS 2010), pages 251–260. IEEE, 2010.

[121] Björn B Brandenburg and Mahircan Gül. Global scheduling not required: Simple, near-optimal multiprocessor real-time scheduling with semi-partitioned reservations. In 2016 IEEE Real-Time Systems Symposium (RTSS), pages 99–110. IEEE, 2016.

[122] Daniel Casini, Alessandro Biondi, Geoffrey Nelissen, and Giorgio Buttazzo. Memory feasibility analysis of parallel tasks running on scratchpad-based architectures. In 2018 IEEE Real-Time Systems Symposium (RTSS), pages 312–324. IEEE, 2018.

[123] Daniel Casini, Alessandro Biondi, Geoffrey Nelissen, and Giorgio Buttazzo. A holistic memory contention analysis for parallel real-time tasks under partitioned scheduling. In 26th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2020), April 2020.
