<<

2020 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)

Cache-aware response time analysis for real-time tasks with fixed preemption points

Filip Markovic´ Jan Carlson Radu Dobrin Division of Computer Science and Division of Computer Science and Division of Computer Science and Software Engineering (CSE) Software Engineering (CSE) Software Engineering (CSE) Malardalen¨ University Malardalen¨ University Malardalen¨ University Vaster¨ as,˚ Sweden Vaster¨ as,˚ Sweden Vaster¨ as,˚ Sweden fi[email protected] [email protected] [email protected]

Abstract—In real-time systems that employ preemptive preemption related delay. Those are a few reasons why it can be and cache architecture, it is essential to account as important to limit the number of preemptions, which is possible precisely as possible for cache-related preemption delays in the by using LP-FPP scheduling, and why such systems may benefit schedulability analysis, as an imprecise estimation may falsely deem the system unschedulable. In the current state of the art from a precise CRPD estimation since the preemption still may for preemptive scheduling of tasks with fixed preemption points, occur and a pessimistic estimation may deem a schedulable the existing schedulability analysis considers overly pessimistic taskset being unschedulable. estimation of cache-related preemption delay, which eventually The existing work on cache-related preemption estimation, leads to overly pessimistic schedulability results. In this paper, we integrated with the schedulability analysis, has, so far, consid- propose a novel response time analysis for real-time tasks with fixed preemption points, accounting for a more precise estimation ered only fully-preemptive real-time systems, and has been of cache-related preemption delays. The evaluation shows that the shown by Altmeyer et al. [8], [9] to be highly beneficial in proposed analysis significantly dominates the existing approach determining the schedulability of real-time systems. However, by being able to always identify more schedulable tasksets. concerning the LP-FPP scheduling, the existing feasibility Index Terms —Real-time Systems, Cache-related preemption analysis proposed by Yao et al. [10], [11] accounts for overly delay (CRPD), Fixed-Priority scheduling, Preemptive Scheduling, Cache Memory pessimistic estimation of preemption related delay, without seizing many of the properties of the LP-FPP model which facilitate the CRPD estimation of a task. I. INTRODUCTION In this paper, we propose a novel cache-aware response time For many years it has been debated whether the non- analysis for sporadic fixed-priority real-time tasks with fixed preemptive real-time scheduling or the fully-preemptive real- preemption points and sequential non-preemptive regions. We time scheduling offers better schedulability bound and, if so, first identify reloadable cache blocks, by defining the maximum under which circumstances. Eventually, it was shown by Yao number of times each memory cache block can be reloaded et al. [1], and Butazzo et al. [2] that Limited-Preemptive upon all possible preemptions, which is an improvement Scheduling (LPS) can provide even better schedulability results compared to the existing methods. Secondly, we define the than the two since it can combine and mimic both of the CRPD bounds accounting for the individual preemptions on the scheduling paradigms. non-preemptive regions, finally combining the two approaches Under the LPS paradigm, several approaches emerged, such in order to define a safely upper-bounded CRPD estimation as Deferred Preemption Scheduling (LP- DPS) proposed by which is integrated into the schedulability analysis. Also, we Sanjoy Baruah [3], Preemption Thresholds Scheduling (LP- identify and propose a computation of the maximum time PTS) proposed by Wang and Saksena [4], and Fixed Preemption interval during which any of the released preempting jobs Points (LP-FPP) Scheduling, proposed by Alan Burns [5]. may affect the CRPD of a task. Evaluation results show Among all of them, it has been shown by Yao et al. [1], and that the proposed analysis outperforms the existing feasibility Butazzo et al. [2] that LP-FPP allows for the most precise analysis for the LP-FPP task model in all of the evaluated estimation of the taskset schedulability since the possibly cases, regardless of the variation in different task and cache preempted points of a task are limited and known in advance, parameters. prior the system’s runtime. In the remainder of the paper, we first define the system It is shown by Pellizoni et al. [6] that preemptions can model in Section II. In Section III we discuss the related introduce a significant preemption related delay in preemptive work and describe the existing feasibility analysis for LP-FPP real-time systems, even up to 33% of the task’s worst-case scheduling. The proposed response time analysis is motivated execution time. Also, it is shown by Bastoni et al. [7] that in in Section IV and defined in Section V, which is followed by real-time systems which employ a cache-based architecture, the evaluation results, shown in Section VI. The conclusions cache-related preemption delay (CRPD) is the major part of the are discussed in Section VII.

2642-7346/20/$31.00 ©2020 IEEE 27 DOI 10.1109/RTAS48715.2020.00008 II. SYSTEM MODEL AND NOTATION For each task τi we define a set ECB i of evicting cache ECB = li ECB In this paper, we consider a sporadic task model, under a blocks such that i k=1 i,k. preemptive scheduler. In more detail, a taskset Γ consists of n We consider that a cache-related preemption delay is tasks with preassigned fixed and disjunct priorities, where each computed as the upper bound on number of cache block reloads, multiplied by a constant BRT , which is the longest time needed task τi generates an infinite number of jobs, characterised by for a single memory block to be reloaded into cache memory, the following tuple of parameters: Pi,Ci,Ti,Di. For a task i.e. memory block reload time. τi (1 ≤ i ≤ n), its priority is denoted with Pi, its worst-case execution time (WCET) without considering for preemptions In the paper we also use the following notations:  hp(i) P is denoted with Ci. The minimum inter-arrival time between : the set of tasks with priority higher than i.  hpe(i) hp(i) ∪ τ the consecutive task jobs is Ti, and the relative deadline is : i.  lp(i) P denoted with Di. The assumed deadlines are constrained, i.e. : the set of tasks with priority lower than i. Di ≤ Ti. A j-th job of τi is denoted with τi,j. Regarding the mathematical notations, in some cases, we We also consider that a task consists of a sequence of li non- use standard integer sets to denote cache block sets, but we preemptive regions (NPR), separated by li−1 preemption points also use multisets. A multiset is a set modification that can (see Fig. 1). Non-preemptive regions are also called runnables consist of multiple instances of the same element, unlike a set in some real-time system domains, e.g. the automotive industry where only one instance of an element can be present. For this [12], [13], [14]. The k-th NPR of τi is denoted with δi,k, reason, besides the standard set operations such as set union ∪ ∩ (1 ≤ k ≤ li), and its worst-case execution time, assuming no ( ), and set intersection ( ), we also use the multiset union δ , ..., δ  preemptions during the execution of i,1 i,k−1, is denoted ( ). The result of the multiset union over two multisets is a li with qi,k. Thus Ci can be expressed as k=1 qi,k, and the multiset that consists of the summed number of instances of worst-case execution time of the first li − 1 NPRS is denoted each element that is present in either of the unified multisets. E E = li−1 q with i and is equal to i k=1 i,k. The k-th preemption III. RELATED WORK point of τi is denoted with PP i,k. The majority of the cache-aware schedulability analyses are proposed in the context of fully-preemptive scheduling.     Among those, the current state of the art schedulability analyses    are: ECB-Union Multiset, and UCB-Union Multiset, proposed

      by Altmeyer et al. [8]. Multiset approaches account for a more precise estimation of the nested preemptions than the  prior proposed analyses. Prior to the multiset approaches, the Fig. 1: Task model two most precise approaches were: UCB-Union, proposed by Tan and Mooney [17], and ECB-Union, proposed by In this paper, we consider single-core systems with single- Altmeyer et al. [18]. The union approaches considered for level direct-mapped cache, while in Section V-D we enlist the first time that CRPD can be computed by intersecting the the adjustments for using the proposed method in case of evicting cache blocks and useful cache blocks when considering LRU set-associative caches. For each NPR, we assume that task interactions, using however different principles. The very the information about all the cache-sets whose blocks may be first ECB-based analyses were proposed by Tomiyama et al. accessed throughout the execution of the NPR is derived. In [19], and Busquets-Mataix et al. [20]. In these works, it was this paper, we formally represent cache-sets as integers. considered that the upper bound on CRPD can be derived by Formally, for each non-preemptive region δi,k we define: the maximum number of ECBs. Contrary to this,  a set of evicting cache blocks ECB i,k such that a cache-set the very first UCB-based analysis was proposed by Lee et al. s is in ECB i,k if τi may access cache block in cache-set [15], and it considered that the upper bound can be derived by s throughout the execution of δi,k. computing the maximum number of UCBs. Ramaprasad and Mueller [21] accounted for the CRPD bounds by investigating For each preemption point PP i,k we define a notion of a the feasible preemption points and preemption patterns in fully- useful memory block, building on the formulation originaly preemptive periodic tasks, and also in periodic tasks with a proposed by Lee et al. [15], and later superseded by Altmeyer single NPR [22]. In contrast, in this work, we account for et al. [16]. sporadic tasks, self-pushing phenomenon [23], and multiple  m PP A memory block is useful at i,k if and only if: possible NPRs, for which the above analyses are not applicable. a) m must be cached at program point PP i,k, and Considering the feasibility analysis of tasks with fixed b) m may be reused on at least one control flow path preemptions, the seminal and the most relevant work for this starting from PP i,k without self-eviction of m before paper was proposed by Yao et al. [10], [11] and this feasibility the reuse. analysis is formally described at the end of this section. Prior Then, we define a set UCB i,k of useful cache blocks at to this work, a very important contribution was made by Bril et PP i,k such that a cache-set s is in UCB i,k if τi has a useful al. [23] who for the first time identified the scheduling anomaly cache block in cache-set s at PP i,k. known as self-pushing phenomenon (also described later).

28 The benefits of using a fixed preemption model is described further interfered. The latest start time Si,j of the last NPR of by Butazzo et al. [2], among which are the increased timing τi,j is equal to the least fixed point of the following equation: predictability and the facilitation of the ⎧ ⎪S(0) = b + C − q + C ⎪ i i i,li h problems. Butazzo et al. [2] and Yao et al. [1] showed that ⎪ i,j ⎨ ∀τh∈hp(i) LP-FPPS can in the majority of the cases outperform many of ⎪ S(r−1) the existing scheduling algorithms. Becker et al. [13] showed ⎪S(r) = b + j × C − q + i,j +1 × C ⎪ i,j i i i,li h the benefits of limited-preemptiveness in automotive systems. ⎩ Th ∀τ ∈ (i) Other works in the area of fixed preemptions based task h hp models included the preemption point selection algorithms, where bi denotes the maximum lower priority blocking that proposed by Bertogna et al. [24], Peng et al. [25], and Cavicchio can be imposed on τi. The latest finish time Fi,j of the j-th F = S + q et al. [26]. Cavicchio et al. [26] identified for the first time that job is computed with the following equation i,j i,j i,li , a detailed view on CRPD is important for the tasks with fixed i.e. by adding the latest start time of the j-th job’s last NPR q δ preemption points, especially a possibility of cache block to and the worst-case execution time i,li of the NPR i,li . be reloaded. Also, in our previous work [27], [28] considering The worst-case response time Ri is finally defined as the the tasks with fixed preemptions, we proposed the constraint maximum latest finish time of all jobs released during the satisfaction model for analysing CRPD at a task level. However, Level-i active period. the existing feasibility analysis proposed by Yao et al. [10], [11] Furthermore, Yao et al. [11] proved that the analysis can be accounts for the overly pessimistic preemption delay estimation, simplified to the first job of each task, if for each task Di ≤ Ti, without seizing many of the properties of the LP-FPP task and if the taskset is feasible under fully-preemptive scheduling. model which may facilitate the CRPD estimation of a task. B. Limitations of the existing approach In this paper, we propose a novel cache-aware schedulability analysis for fixed-priority preemptive scheduling for sporadic Considering the preemption related overhead, Yao et al. tasks with fixed preemption points. In the following subsection, [11] proposed that all Ci values should be updated with the C = Cnp +(l −1)× Cnp we first describe the existing feasibility methods. following equation: i i i i, where i is the worst-case execution time without considering preemptions, and A. Overview of the feasibility analysis for tasks with non- i is the largest possible preemption cost that τi can experience preemptive regions at one of its preemption points. This can be a significant over- The seminal feasibility analysis for tasks with fixed preemp- approximation because: tion points was proposed by Yao et al. [10], [11], and here we  the analysis omits the fact that CRPD of a task τi can be briefly describe their work using Butazzo’s formulations [29]. caused only upon the execution start of the first NPR δi,1, δ 1) Self-Pushing phenomenon: A self-pushing phenomenon until the latest start time of the last NPR i,li . is a scheduling anomaly which may occur in scheduling tasks  the analysis does not account for the case where a with non-preemptive regions, identified and proved by Bril et preempting task τh can preempt only a limited number al. [23]. It states that the worst-case response time of a task of preemption points of a preempted task τi. Instead, it does not necessarily occur in the first job after the critical implicitly assumes that each preempting task can preempt instant, but it may occur in one of the succeeding jobs, during τi at all of its preemption points. the Level-i active period. As proposed by Bril et al. [30]:  the analysis does not account for the case that between Definition 1: A Level-i pending workload at time point t the two accesses of a memory block, it can be reloaded is the amount of unfinished execution of jobs with a priority at most once due to preemption. higher than or equal to Pi, released strictly before t. Therefore, in the following section, we provide motivating Definition 2: A Level-i active period is an interval [a, b] such examples for the novel method and then we define a response that Level-i pending workload is positive for all t ∈ (a, b) and time analysis with a more precise CRPD estimation. zero in a and b. Theorem 1: A maximum response time of task τi under IV. MOTIVATING EXAMPLES sporadic LP-FPPS is assumed in a level-i active period In this section, we present two motivating examples. In Fig. that is started when τi has a simultaneous release with all 2, we show a high-level scheduling perspective of a job τi,j higher priority tasks, and a subjob that produces maximum with three preemption points (PP i,1, PP i,2, and PP i,3). Also, lower priority blocking starts an infinitesimal time before the we show three preempting jobs of τh such that Ph >Pi. simultaneous release. (proved and proposed by Bril et al. [30]) In Fig. 2, it is shown that immediately after the NPR, with 2) Feasibility analysis: For a single job τi,j of τi, feasibility the maximum blocking bi on τi,j, starts to execute, τi and τh analysis is divided into two parts. First, it is necessary to are released. Further, we distinguish among three important compute the latest start time Si,j of the last non-preemptive time intervals. With Si,j we denote the latest start time of the region. Then, the latest finish time Fi,j of a job is computed. last NPR of τi,j, and with Fi,j we denote its latest finish time. This is performed in two steps because a task can experience However, notice that we also denote Ii, which is the maximum the interference from the higher priority tasks prior to the last duration from the start time of the first NPR of τi,j until the NPR. But, when the last NPR starts to execute, it cannot be start time of the last NPR of τi,j. Therefore, this interval also

29 within the last NPR of τ3 since only in the last NPR it is        re-accessed. To consider the worst-case eviction at each of those three points in isolation, as in method proposed by Yao   et al.[11], can introduce over-approximation, accounting for 3  block reloads in cache set 1, opposed to at most one possible

 reload. Next to this, we also consider several more properties  of task interactions on the cache level which are defined and  formalised in the following analysis.         V. C ACHE-AWARE RESPONSE TIME ANALYSIS Fig. 2: Example of different time intervals used in the analysis. Considering the computation of the response time Ri of τi, the proposed analysis accounts for the following main computations (see Fig. 2): represents the maximum time interval during which higher 1) bi – Computation of the maximum lower-priority blocking priority jobs can affect CRPD of τi,j due to preemptions. Note on τi, accounting for the CRPD in the blocking. that the latest start time Si,j of the last NPR includes lower 2) Ii – Computation of the maximum time duration from the priority blocking (bi time units long), and possibly some higher start time of δi,1 until the start time of δi,l . Ii represents priority interference prior to the start time of τi,j. During those i the upper bound on the interval during which higher two events, τi,j cannot be preempted nor can its CRPD be priority jobs can affect CRPD of τi,j due to preemptions affected. Therefore, Ii can be less than Si,j and Fi,j and its precise estimation is important since by an over-approximation (see Fig. 2). 3) Si,j – Computation of the latest start time of the last NPR of Ii it is possible to account for the CRPD impact from τ multiple preemptions which in reality cannot occur. of i,j accounting for the CRPD on all jobs with higher priority than τi which can be released within Si,j, and j − 1 τ    accounting for the CRPD on jobs of i. The CRPD τi,j li − 1  $   $ $ %& of should only include the CRPD in first NPRs,      $ since in the last NPR cache block reloads occur after Si,j, $ δ during the execution of i,li . (see Fig. 2) $  $ $ %& 4) Fi,j – Computation of the latest finish time of the j-th $ $       !  job of τi, where the worst-case execution time of the last $  $  $  δi,l τi         ! NPR i of is included, with the CRPD which can be " $  caused during the execution of δi,li .   $  # $  $ " "   $  In the remainder of this section, we formally define the  $  ! terms for computing the response time of a task, divided

! $ ! $  ! ! !! into three subsections: 1) Maximum lower priority blocking, ! $ ! $  ! $ $  2) CRPD computations, and 3) Latest start time, finish time, !         !! !" $  and response time. In order to ease the understanding of the  ! $  ! #!" $ !" $  ! $  following equations, the motivating example from Fig. 3 will  !! $  also serve as the running example. Fig. 3: Running taskset example. A. Maximum lower priority blocking In Fig. 3, we show a low-level perspective of the tasks, We start by defining the longest worst-case execution time with associated task and cache usage parameters. The accessed of a NPR in τi including the time the NPR may be prolonged cache-sets are depicted as circled integers in the NPRs where due to the CRPD. qmax they are accessed. For example, task τ1 accesses cache blocks Definition 3: The longest worst-case execution time i , τ in cache-sets 1,2,3, and 4. Considering task τ3, in its first accounting for CRPD, of a non-preemptive region in i is defined as follows: NPR it accesses cache blocks in cache sets 1 and 2, and so on. Therefore, a set of useful cache blocks at PP 3,1 is max qi =max qi,k + UCB i,k−1 ∩ ECBi,k ∩ UCB 3,1 = {1, 2}, while at PP 3,2 is UCB 3,2 = {1, 2, 3}, and k∈[1,li] at PP 3,3 it is UCB 3,3 = {1, 2, 3, 4}. ∩ ECB h × BRT , where UCBi,0 = ∅ To motivate the necessity for the analysis on this level, ∀τh∈hp(i) assume that τ1 preempts τ3 at all of its three preemption points. (1) Let us now focus on cache-set 1. Although it is considered max that there might be a UCB residing in cache-set 1 at each of Proposition 1: qi is an upper bound on the longest worst- the three preempting points, this cache block can be reloaded case execution time, accounting for CRPD, of a non-preemptive at most once due to preemption. This reload can only occur region in τi.

30 Proof: The CRPD of a non-preemptive region δi,k cannot Lemma 1: If a memory block m is not accessed in a NPR be higher than BRT multiplied by the number of cache blocks δi,k, then it cannot be reloaded in δi,k. such that: 1) the block can be evicted by some task with higher Proof: A cache block reload of m in δi,k implies that it priority than τi at PP i,k−1, and 2) due to preemption, the must be accessed during the execution of δi,k, without being block can be reloaded in δi,k, i.e. can be accessed in δi,k, and present in the cache memory, which concludes the proof. is useful at PP i,k−1. This is accounted in Equation 1 by the Lemma 2: A memory block m can be reloaded at most once set intersections between UCB i,k−1, ECB i,k, and the union in δi,k due to preemption. of all evicting cache blocks from the tasks with higher priority Proof: (By contradiction) Let us assume that due to than τi. Since the equation computes the maximum sum of the preemption m can be reloaded more than once in δi,k. During WCET and the CRPD upper bound, from all NPRs in τi, then the execution of δi,k, the first access request for the memory the proposition holds. block m can result in a reload due to preemption and eviction max Example: Given the task τ3 from Fig. 3, q3 =8, since the of m prior the execution of δi,k. However, the next reload due NPR δ3,4 has the non-preemptive WCET equal to 4, and all of to preemption within the same region implies that δi,k was its 4 useful cache blocks might be reloaded during its execution preempted. This is a contradiction since δi,k is non-preemptive. due to preemption at PP 3,3. max Given the qi term, we now define the maximum lower Note that m can be self-evicted during the execution of priority blocking that may block the start time of an instance δi,l and afterwards again reloaded in the same NPR which is of τi. Informally, it is the longest worst-case execution time of accounted for by the worst-case execution time of the region. a NPR belonging to the one of the tasks with a lower priority However, the source of such reload is not a preemption. than τi. Proposition 2: If a memory block m is accessed in a NPR Definition 4: Maximum lower priority blocking bi which δi,k, then due to preemptions it can be reloaded at most once may delay the start time of an instance of τi is equal to: from PP i,k until the end of the first following NPR δi,l where it is accessed. b =max(qmax) i l (2) m τl∈lp(i) Proof: (By contradiction) Let us assume that is accessed in δi,k, and due to preemption m can be reloaded more than Example: Given the task τ1, its maximum lower priority once from PP i,k until the end of δi,l. This implies two possible blocking is equal to 8, since the NPR δ3,4 has the largest cases: 1) m is reloaded in a NPR between δi,k and δi,l where WCET value accounting for possible CRPD, among all the it is not accessed, which contradicts Lemma 1, or 2) due other NPRs of τ2 and τ3. to preemptions, m is reloaded more than once in δi,l which contradicts Lemma 2. B. CRPD related computations Given the Proposition 2, in order to compute the upper This subsection is divided into four main parts: bound on reloads of m during the execution of first x NPRs 1) Reloadable cache blocks – where a modification on the of τi, we can compute the number of first x NPRs for which term of useful cache blocks in tasks with NPRs is defined. m is in the ECB set of a NPR and in the UCB set of the first 2) CRPD of a single job of τi during the time interval t – succeeding preemption point. m Where the computations for the upper bound on CRPD Definition 5: The upper bound ubi,x on reloads of a memory of a single job during the time interval t are defined. block m during execution of the first x NPRs of τi is equal to: 3) CRPD of all jobs of τi released during the time interval t m x – Where the computations for the upper bound on CRPD ubi,x = {k | (1 ≤ k

31 Example: Given the task τ3 from Fig. 3, the upper bound on in two jobs from τ1 and one job from τ2, and those are 1 the number of reloads ub3,4 of a memory block 1 during the {1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4}. Also, we consider what cache 1 execution of the entire task is ub3,4 =1. Notice that memory blocks can be reloaded upon those evictions in single job block 1 is useful at all three preemption points and can be of τ3, and those are RCB 3,4 = {1, 2, 3, 4}. For each cache evicted upon preemption on any of them, however it can be block, present in both – the RCB- and the evicting cache blocks reloaded at most once. Next, we define: multisets, we account that it is evicted and reloaded, meaning Definition 6: A multiset RCB i,x, where each memory block that CRPD at the end considers four cache block reloads. is present the maximum number of times it can be reloaded For the preemption-based method the allocation of pre- during the execution of the first x NPRs of a single job of τi: emptions on preemption points of a task is important. First, we consider two preempting jobs of τ1 and their two possible RCB = {m} i,x (4) preemptions on a job of τ3. In this case, the maximum CRPD ∀m∈ECB ubm i i,x from two preemptions is caused on PP 3 and PP 2, causing Proposition 4: RCB i,x is a multiset of all possible reloadable a number of reloads equal to 4 (due to evictions and reloads UCB cache blocks during the execution of the first x NPRs of a of blocks in 3,3), and 3 (due to evictions and reloads of UCB 3,2 τ2 single job of τi. blocks in ). For the preemption from a job of , the PP 3 Proof: Since each accessed block m (given by m ∈ ECB i) maximum CRPD occurs when is preempted, since useful m is included ub times in the multiset, then by Proposition cache blocks 1,2, and 3 can be evicted, and CRPD accounts for i,x 7+3 = 10 3 all the possible reloads for each memory block during the 3 cache block reloads. In total, cache block reloads execution of the first x NPRs of a job of τi are accounted. are accounted for, which is the less precise upper bound than the one derived with the RCB-based method.

Example: Given the task τ3 from Fig. 3, the RCB 3,4 multiset However, in some cases, e.g., computing CRPD for a single τ2 is equal to {1, 2, 3, 4}, and RCB 3,3 = ∅. Notice that memory job of , the preemption-based method provides a better result. blocks 1, 2, 3 and 4 are not present in RCB 3,3 because their In this case, the RCB-based method would account for 3 τ reloads cannot occur during the execution of the first 3 regions cache block reloads from a single job of 1. However, the of τ3.Forτ2, RCB 2,4 = {1, 3, 4, 4}. preemption-based method computes a tighter upper bound on CRPD accounting for one preemption from τ1 on PP 2,1. This 2) CRPD of a single job of τi within the time interval t: is the case that produces the maximum CRPD, equal to two cache block reloads of memory blocks 1, and 3. For a single τ For the upper bound on CRPD caused on a job of i during preemption at any other point, the CRPD is equal to one. t the time interval , we define two complementary types of It is important to mention that in this paper, for a single job CRPD computations. In the analysis, we compute CRPD with of any task, we are interested in the CRPD that can be caused: both approaches, and later use the lower value as the final  during the execution of the first li − 1 NPRs of a job. upper bound. The proposed concepts are:  during the execution of all NPRs of a job.  RCB-based upper bound (γ∪) – which considers that This is important because when considering the time intervals all higher priority jobs, releasable during t, can preempt a (see Fig.2) for the job τi,j itself, we should only consider CRPD job τi,j at all preemption points. In this case, we consider caused during the first li − 1 of its NPRs. This is because in that each reloadable cache block is reloaded during the Ii and Si,j intervals, we do not consider the CRPD of the last execution of a job if it can be evicted by some higher NPR of the job. But, when considering how this job affects priority job released during t. other jobs, we must consider CRPD in all of its NPRs.  Preemption-based upper bound (γ+) – which considers For these reasons, many equations in this paper consider the that a preemption, from a single higher priority job τh,j CRPD caused during the execution of the first x NPRs of a can evict and cause a reload only for the useful cache x τ job, where is the provided parameter. We distinguish among: blocks of the point at i,j where the preemption occurred.  γ∪ τ x : for RCB-based CRPD computations, and In this case, for each preempting task h, whose jobs  γ+ can be released j times during t interval, we account x : for preemption-based CRPD computations. for j highest CRPD values resulting from individual Based on whether or not the last NPR should be included, x l − 1 l preemptions of those jobs on preemption points of τi,j. as discussed above, can be assigned the value i or i, respectively. To give an informal example of the two CRPD computations, We first define the upper bound on CRPD for a single job let us assume that given the taskset from Fig. 3 we compute within a time interval t using RCB-based computation: the upper bound on CRPD caused on a single job of τ3 during Definition 7: The upper bound γ∪ (t) on the CRPD caused a 24 units long time interval. During the interval of 24 units, i,x on the first x NPRs of a single job of τi by all preempting at most two jobs of τ1 can be released, and at most one job jobs which can be released within time a interval t is defined of τ2 can be released. by the following equation: For the RCB-based method it does not matter how we allocate preemptions of these jobs on preemption points, we γ∪ (t)=RCB ∩ ECB × BRT i,x i,x h (5) care only about what are possibly evicting cache blocks ∀τh∈hp(i) t/Th

32 The equation accounts for the maximum number t/Th of job of τi, within the time interval t, is defined in the following releases of jobs of each higher priority task τh during t, and equation: ⎧ includes all of their evicting cache blocks as many times as h ⎪ CRPD ,x≤ t/Th they can be released. ⎨ i,x ∪ γh (t)= h Proposition 5: γ (t) is an upper bound on the CRPD caused i,x max c c ⊆ CRPD i,x ∧ (7) i,x ⎩⎪ , otherwise on the first x NPRs of a single job of τi during the interval t. |c| = t/Th Proof: By Proposition 4, all possibly reloadable cache blocks in a single job of τi during the execution of the first x Equation 7 computes the maximum cumulative CRPD from τ NPRs are given by RCB i,x. By the multiset union, Equation 5 a maximum number of preemptions of jobs of h on the first x NPRs of an instance of τi. Equation 7 considers two cases: accounts for the maximum number of possibly evicting cache t x ≤ τh blocks of all jobs with higher priority than τi that can be Case 1: ( Th ), the number of preempting jobs of released during t. Since by the multiset intersection Equation which can cause CRPD on first x regions of a single job of τi 5 accounts that every reloadable cache block from RCB i,x is greater than or equal to the number of preemption points indeed results in a reload if there is at least one evicting block that can be preempted. ∪ γh (t) from all possibly preempting jobs released during t, then γi,x(t) Lemma 3: i,x is an upper bound on CRPD caused by is an upper bound on CRPD caused on the first x NPRs of a preemptions from jobs of τh on the first x NPRs of a single t τi t x ≤ single job of τi during t. job of , within the time interval , in case Th . τ x ≤ t Example: Given all the NPRs of a task 3 (see Fig. 3) the Proof: When Th , Equation 7 sums all the CRPD h upper bound on their CRPD during the 24 units long time values in the multiset CRPD i,x, and since by Proposition 6, h interval is equal to: CRPD i,x is a multiset of upper bounds of the CRPD that can be caused by a job of τh on the first x NPRs of a single job γ∪ (24) = {1, 2, 3, 4}∩ {1, 2, 3, 4} {1, 2, 3} 3,4 of τi, then the lemma holds. 24 24 t 22 50 Example for x ≤ : Given the taskset from Fig. 3, the Th = {1, 2, 3, 4}∩ {1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4} =4 upper bound on CRPD caused by preemptions from jobs of τ1 on all NPRs of a single job of τ3 during 100 time units is 1 equal to γ (100) = 9 although jobs of τ1 can be released 5 Before defining the preemption-based CRPD upper bound, we 3,4 times during 100 time units, given by 100/T1 = 100/22 . first define a multiset which consists of all possible CRPD This is the case because only three preemption points can be values that jobs of τh can cause on the first x NPRs of τi. h preempted, and each of those preemptions individually cause Definition 8: A multiset CRPD of all upper bounded i,x maximum CRPD of 2,3, and 4 cache block reloads respectively. CRPDs that can be caused by a job of τh on each of the first Case 2: ( t job of , within the time interval , in case Th . h h Proof: By Proposition 6, CRPD i,x is a multiset of upper Proposition 6: CRPD i,x is a multiset of upper bounds of bounds of the CRPD that can be caused by a job of τh on the the CRPD that can be caused by a job of τh on the first x t first x NPRs of a single job of τi. Since T is the maximum NPRs of a single job of τi. h number of preemptions from the jobs of τh during t, then only Proof: A CRPD upper bound on a single NPR δi,k is t T number of NPRs can experience CRPD caused by jobs computed by accounting that all useful cache blocks at the h t τh of . Since Equation 7 sums the Th highest values from preemption point directly preceding δi,k can be evicted by the h CRPD i,x multiset, then the lemma holds. evicting cache blocks of τh. Thus, CRPD on δi,k is computed Example for t

33 Now, we consider that all higher priority jobs may contribute Proposition 10: Assuming that Ii is an upper bound on the to the CRPD of a single job of τi during the time interval t. time duration from the start of the first NPR of τi until the + Definition 10: The upper bound γi,x(t) on CRPD caused by start of the last NPR of τi, then γi(t) is a CRPD upper bound preemptions of all higher priority jobs during the time interval on all the jobs of τi which can be released during t. t on the first x NPRs of a single job of τi is defined in the Proof: By Proposition 9 and the assumption in Proposition following equation: 10, Equation 10 derives the CRPD upper bound γi,l (Ii) caused i + h during the Ii interval on all NPRs of a single job of τi. Since γi,x(t)= γi,x(t) (8) for each job of τi, that can be released within t, its CRPD ∀τh∈hp(τi) upper bound γi(t) is multiplied by the maximum number of + γ (t) Proposition 8: γi,x(t) is an upper bound on CRPD caused releases, then i is a CRPD upper bound on all the jobs of by all higher priority jobs which can be released within time τi which can be released during t. interval t on the first x NPRs of a single job of τi. Computation of Ii is defined in Equation 11. Proof: By Proposition 7, for a single higher priority task I h 4) Maximum time interval i: τh, γi,x(t) is an upper bound on CRPD caused by all jobs of Next, we define the computation of the Ii interval. In τh on the first x NPRs of a single job of τi within t. Since h Equation 11, we consider the critical instant, i.e. that just Equation 8 sums γi,x(t) for each task with higher priority than after the start of execution of δi,1 all the higher priority jobs τi, the proposition holds. + 1 may be released. It also considers the execution of a job of Example: Given the taskset from Fig. 3: γ3,4(24) = γ3,4(24) + 2 τi until the start of its last NPR, accounting for the possible γ3,4(24)=7+3=10. This is the case because Equation CRPD during the execution of the first li −1 NPRs. In each new 8 accounts for two preemptions from τ1 resulting in adding iteration, the Equation accounts that the computed interval can the two highest CRPD values (occurring at points PP 3,3: four be enlarged with the additional CRPD by the newly released reloads, and PP 3,2: three reloads), and one preemption from τ2 higher priority jobs with their respective CRPDs: resulting in a single highest CRPD value (occurring at PP 3,3: Definition 13: The upper bound Ii of the time duration from three reloads). the start time of δi,1 until the start time of δi,l is defined as Finally, we define the CRPD upper-bound using both – the i the least fixed point of the following recursive equation: RCB-based and the preemption-based computations. ⎧ ⎪I(0) = E Definition 11: Combining the two proposed methods, a safe ⎪ i i γ (t) x ⎪ upper bound i,x on CRPD caused on the first NPRs ⎨I(r) = E + γ (I(r−1))+ i i i,li−1 i of a job of τi during the time interval t is derived with the ⎪ (r−1) following equation: ⎪ Ii (r−1) ⎪ +1 × Ch + γh(Ii ) ⎩ Th + ∪ ∀τ ∈ (i) γi,x(t)=min γi,x(t) ,γi,x(t) (9) h hp (11) Proposition 9: γi,x(t) is an upper bound on CRPD caused Proposition 11: Ii is the upper bound of the time duration on the first x NPRs of a job of τi during the time interval t. from the start time of the first NPR of τi until the start time Proof: γi,x(t) is an upper bound because it takes the least of the last NPR of τi for any job of τi. of two upper bounds, given by Propositions 5 and 8. Proof: By induction over the tasks in Γ, in a decreasing Example: Given the taskset from Fig. 3, the final safe upper ∪ + priority order. bound is γ3,4(24) = min(γ3,4(24),γ3,4(24)) = min(4, 10) = Base case: I1 = E1, because hp(τ1)=∅ and γi,l −1(t)=0 4, meaning that the tighter upper bound is given by the RCB- i for any t since τ1 cannot experience CRPD. E1 is the upper based method. However, in case of τ2, the CRPD on a single bound on the time duration from the start time of the first job within 100 time units is computed as follows: γ2,4(10) = ∪ + NPR until the start time of the last NPR of τ1 since it is the min(γ (10),γ (10)) = min(3, 2) = 2, meaning that the 2,4 2,4 worst-case execution time between those two points. tighter upper bound is given by the preemption-based method. Inductive hypothesis: Assume that for all τh in hp(i), Ih is an 3) CRPD on all jobs of τi released within time interval t: upper bound on the time duration from the start of the first NPR, until the start of the last NPR of τh. Considering that Ii is the maximum time interval during Inductive step: We show that Equation 11 computes the safe τ which jobs with higher priority than i may affect the CRPD upper bound Ii. Consider the least fixed point of Equation 11, τi r r−1 of a single job of (see Fig. 2), then the CRPD upper bound for which Ii = Ii = Ii . At this point, the equation accounts of all jobs of τi which can be released during the time interval for the following upper bounds and worst-case execution times: t , is defined as follows:  Ei, which is the worst-case execution time, assumed by γ (t) Definition 12: The upper bound i on the CRPD on all the system model, of the first li − 1 NPRs. the jobs of τi that can be released during the time interval t is  γ (I ) i,li−1 i , which is proved by Proposition 9 to be the defined with the following equation: τ l − 1 CRPD upper bound of a single job of i in the first i t NPRs during the time interval Ii. γi(t)= × γi,l (Ii) Ii i (10)  (  +1)× Ch Ti ∀τh∈hp(i) Th – Worst-case interference

34 caused by the execution of jobs of higher priority tasks. The all preceding j − 1 jobs. Consider the least fixed point of r r−1 interference is accounted as the sum of individual WCET values Equation 12, for which Si,j = Si,j = Si,j . At this point, the of each job with higher priority than τi, that can be released equation accounts for the following upper bounds and worst- during Ii. The maximum number of jobs, of a single higher case execution times: priority task τh, which can be released during Ii is accounted  bi – By Proposition 1 and Definition 4, bi is an upper Ii   +1 τi by term Th . bound on lower priority blocking on .  γh(Ii)  (j − 1) × Ci j − 1 τi ∀τh∈hp(i) – CRPD upper bound on all jobs of – WCETs of the first jobs of , which the higher priority tasks that can be released during t.By execute prior to τi,j. Proposition 10 and the inductive hypothesis, γh(Ii) is an upper  γi((j − 1) × Ti) – By Proposition 10, it is an upper bound bound on the CRPD of all jobs of τh which can be released on CRPD of the first j − 1 jobs of τi. within Ii. Then, the sum of CRPD bounds γh(Ii) for each  Ei, which is the worst-case execution time, assumed by higher priority task than τi is a safe upper bound on the the system model, of the first li − 1 NPRs.  γ (I ) cumulative CRPD in all jobs with higher priority than τi, i,li−1 i – By Propostion 9, it is an upper bound on released during Ii. CRPD of a single job of τi (in this case τi,j) in the first li − 1 Since we proved for all the factors, which can prolong the NPRs during the time interval Ii. Si,j  (  +1)× Ch time duration from the start time of the first NPR until the ∀τh∈hp(i) Th – Worst-case interference finish time of the last NPR of τi, that they are accounted as caused by the execution of jobs of higher priority tasks. The the respective upper bounds in Equation 11, then their sum interference is accounted as the sum of individual WCET values results in an upper bound, which concludes the proof. of each job with higher priority than τi, that can be released Note that Equation 11 uses Equation 10, and vice versa. during Si,j. The maximum number of jobs, of a single higher However, Equation 11 is always computable, because in order priority task τh, which can be released during Si,j is accounted Si,j to compute Ii, Equation 11 computes γh(t) term only for   +1 by term Th . τi γh(t)  γh(Si,j) higher priority tasks than , meaning that it computes ∀τh∈hp(i) – CRPD upper bound on all jobs of using only the values I1 to Ii−1. Since γ1(t) always evaluates the higher priority tasks that can be released during Si,j.By to zero for any value of t, because τ1 cannot experience CRPD, Proposition 10, γh(Si,j) is an upper bound on the CRPD of this implies that I2 can be computed using γ1(t)=0, which all jobs of τh which can be released within Si,j. Then, the implies that I3 can be computed using the previously computed sum of CRPD bounds γh(Si,j) for each higher priority task I1 =0, and I2 values, and so on until Ii. than τi is a safe upper bound on the cumulative CRPD in all jobs with higher priority than τi, released during Si,j. C. Latest start time, finish time, and response time Since Equation 12 sums and considers upper bounds on all In this subsection, we define the equations for the latest start factors that can delay the start time of the last NPR of τi,j, time Si,j of the last NPR a job τi,j, latest finish time Fi,j the proposition holds. of τi,j, and the response time Ri of a task τi. All of those For the computation of Fi,j (see Fig. 2) we need to consider equations include the computation of the CRPD values for the worst-case execution time of the last NPR of τi including each job. the upper bound on CRPD: last We first define the latest start time Si,j computation, for the Definition 15: The worst-case execution time qi of the last NPR of the j-th job of τi after the critical instant. last NPR of τi, including the upper bound on CRPD during Definition 14: The latest start time Si,j of the last NPR of its execution, is defined in the following equation: the j-th instance of τi is equal to the least fixed point of the last = q + UCB ∩ ECB × BRT qi i,li i,li−1 h (13) following recursion: ⎧ ∀h∈hp(i) (0) last ⎪S = b + E Proposition 13: qi is an upper bound on the longest ⎪ i,j i i ⎪ execution time of the last NPR of τi, including the upper ⎪S(r) = b +(j − 1) × C + γ ((j − 1) × T ) ⎨⎪ i,j i i i i bound on CRPD during its execution. + Ei + γi,l −1(Ii)+ δ ⎪ i Proof: The CRPD of a non-preemptive region i,li can ⎪ (r−1) ⎪ S not be higher than BRT multiplied by the number of cache ⎪ + i,j +1 × C + γ (S(r−1)) ⎪ h h i,j blocks such that: 1) the block is useful at the point preceding ⎩ Th ∀τh∈hp(i) δi,l , and 2) the block can be evicted by some task with higher (12) i priority than τi at PP i,l −1. This is accounted in Equation In Equation 12, we consider the critical instant from Theorem i 13 by the set intersection between UCB i,l −1 and union of 1, proved by Bril et al. [30]. i ECB h sets. Since the equation sums the WCET of δi,li and Proposition 12: Assuming that the Level-i pending workload its CRPD upper bound, then the proposition holds. is greater than zero at each time point from the critical instant Definition 16: The latest finish time Fi,j of the j-th instance until j × Ti, then starting from the critical instant, Si,j is the of τi is equal to: upper bound on the latest start time of the last NPR of τi,j. last Fi,j = Si,j + qi (14) Proof: Based on the proposition assumption on the Level- i pending workload, the start time of τi,j is affected by

35 Proposition 14: Assuming that the Level-i pending workload execution request for τi and all higher priority tasks can be is greater than zero at each time point from the critical instant computed using the second line of Equation 11 by changing (r−1) last until j × Ti, then starting from the critical instant, Fi,j is the the term Ii with t, where t ∈ (0,Di − qi ], while the upper bound on the latest finish time of the last NPR of τi,j. maximum blocking on τi is computed using Equation 2. Proof: Follows from Propositions 12 and 13. Now, we define the upper bound on the Level-i active period D. Set-associative LRU caches accounting for the CRPD. As shown by Altmeyer et al. [8], [9], in case of set- Definition 17: The level-i active period Li, accounting for associative LRU caches, a cache-set may contain several useful the CRPD on all jobs within the period, is defined as follows: cache blocks, e.g., UCB1,2 = {1, 2, 2, 2}, meaning that at ⎧ (0) PP1,2, task τ1 contains three UCBs in cache-set 2, and one ⎨⎪L = bi + Ci i UCB in cache set 1. When a preemption point is preempted, L(r) = b + ((L(r−1)/T  +1)× C + γ (L(r−1))) ⎩⎪ i i i k k k i one ECB of a preempting task is enough to evict all UCBs of ∀τk∈hpe(i) the same cache-set. Therefore, multiple accesses to the same set (15) by the preempting task do not need to appear in the ECB set, Proposition 15: Li is an upper bound on Level-i period. which means that the ECB set can remain the same as in direct- Proof: Similar to the proof for Proposition 12. Consider the r r−1 mapped caches. A bound on CRPD due to preemption from least fixed point of Equation 15, for which Li = Li = Li .  τh on PPi,k can be defined with: UCBi,k ∩ ECBh where At this point, the equation accounts for the following upper the result is a multiset that contains each element from UCBh bounds and worst-case execution times: if it is also in ECBi. Lastly, the RCB definition needs to be  bi – By Proposition 1 and Definition 4, bi is an upper modified to account for each reloadable block as many times bound on lower priority blocking on τi. Li as it is present in the UCB set at each preemption point where  ∀τ ∈hpe(i)(  +1)× Ch – The worst-case execution k Tk the reload-ability condition from Equation 3 is satisfied. This is time of all jobs of τi and higher priority jobs than τi which x  achieved by updating Equation 3 with m ∈ UCBi,k ∩ ECBi. can be released within Li. The maximum number of jobs, of τ ∈ hpe(i) L a single task k , which can be released during i is VI. EVALUATION accounted by the following term  Li  +1. Tk In this section, we present and discuss the evaluation results  γk(Lj) – CRPD upper bound on all jobs of ∀τk∈hpe(i) on the effectiveness of different approaches to identify schedu- τi and higher priority jobs than τi which can be released within lable tasksets. We primarily compare the existing feasibility Li. Follows from Proposition 10. analysis (Feasibility Analysis), proposed by Yao et al. [10], [11], Since Equation 15 sums and considers upper bounds on all and the proposed RTA analysis (denoted as CRPD-Fixed-PP). factors that can extend the duration of the Level-i active period, Additionally, we compare those two approaches with state of the the proposition holds. Finally, we can define the worst-case response time: art CRPD analysis (Combined-Multiset) for fully-preemptive scheduling, proposed by Altmeyer et al. [8], and unsafe (NO- Definition 18: The worst case response time Ri of τi is CRPD) which does not account for any preemption cost on equal to the maximum relative latest finish time of a job τi,j preemption points, thus resulting in an unsafe approximation. which is released within Li. For the evaluation, we use basic cache-set configurations Ri =max{Fi,j − (j − 1) × Ti} obtained with a low-level analysis tool, named LLVMTA [32]. j∈[1,Li/Ti] (16) LLVMTA uses LLVM machine intermediate representation after Theorem 2: Ri is the safe upper bound on the worst-case response time of τi. Program ECB UCB Max Program ECB UCB Max Proof: Since each respective Fi,j value is computed within adpcm 256 230 103 lcdnum 51 11 9 Li, then the proposition assumption from Proposition 14 holds. R bs 43 23 20 lms 242 134 38 Then, by Theorem 1 i is the safe upper bound on the worst- bsort100 57 40 30 ludcmp 210 168 44 case response time of τi, which concludes the proof. cnt 123 58 44 matmult 85 51 31 1) Schedulability analysis: Given the above-proposed re- compress 247 150 63 minver 256 178 47 cover 256 38 15 ndes 253 176 38 sponse time analysis, the taskset schedulability is determined crc 121 62 30 ns 55 37 34 by computing whether for each task the derived response time edn 256 222 123 nsichneu 256 183 2 is less than or equal to the task deadline. I.e. if for any task expint 117 47 29 prime 75 47 33 τ Γ R >D fdct 126 113 62 qsort-ex. 142 83 39 i in the inequation i i holds, the taskset is deemed fft1 222 154 63 qurt 130 40 26 unschedulable by the proposed analysis. As described before, fibcall 28 16 16 select 159 73 55 in Section III-A, the schedulability test can be simplified to fir 94 42 21 sqrt 53 21 12 insertsort 29 16 15 st 192 95 52 the first job of each task, as proposed by Yao et al. [11]. Also, janne co. 39 28 27 statemate 256 105 1 the proposed approach can be adopted in another simplified jfdctint 132 122 54 ud 194 151 39 schedulability test (see Theorem 3 from [11]), which is based on discontinuous points of cumulative execution-request functions, TABLE I: Cache configurations obtained with LLVMTA [32] proposed by Bini and Buttazzo [31]. In this case, the cumulative analysis tool used on Malardalen¨ benchmark programs [33].

36 all backend compiler passes have run. The resulting machine one, as proposed by Bastoni et al. [7]. In the shown figures, program corresponds to a CFG reconstruction of a binary file. we show the weighted schedulability measure Wy(ρ), for Cache-set configurations are obtained on real-time programs schedulability test y as a function of parameter ρ. For each value from the Malardalen¨ benchmark [33], given in Table I. The of ρ, this measure combines data for all of the tasksets generated obtained parameters per each program are set of evicting (ECB) for each utilisation level from 0.7 to 1, with a step of 0.2. and definitely useful cache blocks (UCB) (in the table we show For each value ρ of the selected parameter, the schedulability the size of each set). Also, the maximum number (Max) of measure is Wy(ρ)= ∀Γ(UΓ × By(Γ,ρ))/ ∀Γ UΓ, where definitely useful cache blocks per any preemption point of By(Γ,Ui) is a result (1 if schedulable, 0 otherwise) of a each program is obtained. The assumed direct-mapped cache schedulability test y for a taskset Γ and parameter value ρ memory consists of 256 sets with a line size of 8 bytes. For (UΓ is a taskset utilisation). A method producing the highest more details about the low-level analysis refer to [34]. weighted measure is the most prone to identify schedulable In all of the experiments, we generate 2000 tasksets for tasksets. each investigated parameter value. In order to include realistic 100 cache-set usage, each task in a taskset is randomly assigned one of the distinct cache-set configurations presented in Table 80 I. Since the task binaries were analysed individually, they all start at the same address (mapping to cache set 0). In 60 NO-CRPD CRPD-Fixed-PP a multi-task scheduling situation, this can hardly be the Combined-Multiset case because the ECB and UCB placement is determined 40 Feasibility Analysis by their respective locations in memory. We took this into account by randomly shifting the cache set indices, e.g. the 20 Schedulable tasksets (%) ECB in cache set i is shifted to the cache line equal to 0 (i + random(256)) modulo 256. Further task parameters were 70 75 80 85 90 95 100 generated using the evaluation setup proposed by Altmeyer et al. Utilisation [18], [8] which is used in many CRPD-related papers in order Fig. 4: Schedulability ratio at different taskset utilisation. to exhaustively evaluate CRPD-based analyses. Utilisation of each task was generated using the standardised algorithm in real-time research – UUnifast, proposed by Bini and Butazzo [35]. The task periods were generated according to a uniform 0.6 distribution from 5 to 500 milliseconds, representing the general 0.5 spread of periods in automotive and aerospace hard real-time 0.4 applications [18]. The worst-case execution time for each task 0.3 was computed with the following equation Ci = Ui × Ti, where Ui represents the utilisation of τi. Task deadlines were 0.2 D = T Weighted measure implicit, meaning that i i. Task priorities were assigned 0.1 according to the deadline-monotonic order. Furthermore, in 0 order to allow for an exhaustive evaluation, we generated other 345678910 parameters for LP-FPP task model and in different experiments Taskset size we varied different parameters. We finish this paragraph by enlisting the default evaluation setup. The number of subjobs Fig. 5: Weighted measure at different taskset size. for each task was randomly generated according to a uniform distribution from the range [3, 100] as this is a representative In the first experiment (Fig.4), we evaluated the impact range of non-preemptive regions used in automotive real-time of the taskset utilisation (x-axis) on the effectiveness of the applications (where NPRs are also called runnables). The worst- analyses to determine the taskset schedulability (y-axis). We case execution time of each subjob was set to Ci/li. The default notice that CRPD-Fixed-PP succeeds to identify significantly taskset size was set to 6. For each preemption point PP i,k of a more schedulable tasksets compared to (Feasibility Analysis) task τi, its useful cache blocks UCB i,k were generated from the and (Combined-Multiset), e.g. for UΓ =0.88, CRPD-Fixed- UCB set, assigned to τk from Table I. Also, we pessimistically PP identifies 48%, Combined-Multiset 34%, and Feasibility- assumed that each point of τk exhibits a maximum number Analysis only 0.6% of all generated tasksets as schedulable. of UCBs, given in the table. Accordingly, each evicting cache In the second experiment (Fig.5), we evaluated the impact of block was included at the NPRs prior and after the preemption the taskset size (x-axis) on the effectiveness of the analyses to point where it is useful. The assumed block reload time was 8 determine the taskset schedulability (y-axis), with the weighted microseconds, as given in [18]. measure. The evaluated taskset size ranges from 3 to 10. In order to increase the exhaustiveness of the performed eval- We notice that with the increase in taskset size, all methods uation, we used the weighted schedulability measure which also deteriorate in finding schedulable tasksets. However, CRPD- enables emulation of a 3-dimensional plot to a 2-dimensional Fixed-PP deteriorates with the lowest rate, still being able to

37 find the largest number of schedulable tasksets compared to UCBs of τi, derived from the configuration assigned from Feasibility-Analysis and Combined-Multiset. Table I. In cases when x value exceeded Max i, Max i was In the third experiment (Fig.6), we evaluated the impact of assigned. We notice that only CRPD-Fixed-PP is affected in the numbers of NPRs per task (x-axis) on the effectiveness of this experiment. In case when the number of UCBs per point the analyses to determine the taskset schedulability (y-axis), is derived from the range [0, Max i] the proposed method is with the weighted measure. In this experiment, we randomly more prone to identify schedulable taskset, as it accounts for generated the number of NPRs from the uniform distribution in the variability of CRPD throughout the task’s execution, unlike the range [3,x] where x is a value on the x-axis from the figure. the other presented methods. We notice that the greater the number of NPRs, the higher the weighted schedulability of CRPD-Fixed-PP, meaning it is more prone to identify schedulable tasksets. This is the case because 0.55 the greater the number of NPRs, the lower the blocking from the lower priority tasks. In contrast, with Feasibility-Analysis 0.5 it is the opposite because the greater number of NPRs, the greater estimation of preemption cost in this case. 0.45 Weighted measure 0.6 0.4 0.5 C.M. C.F.P C.F.P' 0.4 Combined-Multiset Fig. 8: Weighted measure at different reload-ability conditions. CRPD-Fixed-PP 0.3 NO-CRPD Feasibility Analysis 0.2 In the fifth experiment (Fig.8), we evaluated the impact of

Weighted measure changing the reload-ability condition on CRPD-aware methods. 0.1 We considered two cases. Bars (C.M. – Combined-Multiset) and 0 (C.F.P. – CRPD-Fixed-PP) represent the weighted measures 20 40 60 80 100 Maximum number of NPRs obtained with default evaluation setup, where each UCB within the task’s execution is definitely reloadable (since ECBs Fig. 6: Weighted measure at different upper bound on number are assigned in NPRs directly preceding and succeeding the of non-preemptive regions. point where the block is useful). Bar (C.F.P’. – CRPD-Fixed- PP) considers the case where UCB is reloadable only once throughout the entire sequence of preemption points where it 0.6 is useful, thus resulting in more identified schedulable tasksets. 0.5 This is even more important for tasks with variable initialization 0.4 in the beginning, and dispatching at the task’s end. Combined-Multiset 0.3 CRPD-Fixed-PP NO-CRPD VII. CONCLUSIONS 0.2 Feasibility Analysis

Weighted measure In this paper, we propose a cache-aware response time 0.1 analysis for tasks with non-preemptive regions (NPR) under 0 fixed-priority preemptive real-time scheduling. The analysis 0 64 128 192 256 accounts for a tighter estimation of cache-related preemption Maximum number of UCBs per PP delays thus dominating the existing feasibility analysis for the Fig. 7: Weighted measure at different upper bound on number fixed-point–based tasks. It is based on the observations that: 1) of UCBs per preemption point. the number of cache blocks which can be reloaded during the execution of a task can be smaller than the number of cache In the fourth experiment (Fig.7), we evaluated the impact blocks which can be useful in a task, 2) a single job with NPRs of the number of UCBs per preemption point (x-axis) on can be affected by cache-related preemption delay from the the effectiveness of the analyses to determine the taskset preemptions occurring only during the maximum time duration schedulability (y-axis), with the weighted measure. In this from the start of its first NPR, until the start of its last NPR. experiment, we removed the pessimistic consideration from We evaluated the proposed analysis by comparing its the default evaluation setup that all preemption points exhibit effectiveness to identify schedulable tasksets with state of the the maximum number (Max from Table I) of UCBs at each art feasibility analysis for tasks with non-preemptive regions preemption point. Therefore, for each preemption point PP i,k, and fully-preemptive tasks. The evaluation showed that the we randomly generated the number of UCBs from the uniform proposed analysis significantly improves the state of the art of distribution in the range [x, Max i] where x is a value on the such schedulability analyses, since it accounts for the estimation x-axis from the figure, and Max i the maximum number of of a tighter bound on cache-related preemption delay.

38 ACKNOWLEDGEMENT [17] Y. Tan and V. Mooney, “Timing analysis for preemptive multitasking real- time systems with caches,” ACM Transactions on Embedded Computing Foremost, we are thankful to our colleagues Sebastin Hahn, Systems (TECS), vol. 6, no. 1, p. 7, 2007. Jan Reineke, and Darshit Shah who provided us with evaluation [18] S. Altmeyer, R. I. Davis, and C. Maiza, “Cache related pre-emption data, derived from the code-level analysis of benchmark delay aware response time analysis for fixed priority pre-emptive systems,” in 2011 IEEE 32nd Real-Time Systems Symposium. IEEE, 2011, pp. programs. Especially, we are thankful to Sebastian Hahn who 261–271. provided additional insights and computed the cache and task [19] H. Tomiyama and N. D. Dutt, “Program path analysis to bound cache- parameters using the low-level analysis tool from Saarland related preemption delay in preemptive real-time systems,” in Proceedings of the eighth international workshop on Hardware/software codesign. University, which significantly improved the quality of this ACM, 2000, pp. 67–71. paper. Also, we are very grateful to Davor Cirkinagiˇ c´ who [20] J. V. Busquets-Mataix, J. J. Serrano, R. Ors, P. Gil, and A. Wellings, provided us with a powerful computing system for performing “Adding instruction cache effect to schedulability analysis of preemptive real-time systems,” in Real-Time Technology and Applications Symposium, the schedulability evaluation. Lastly, we are very grateful to 1996. Proceedings., 1996 IEEE. IEEE, 1996, pp. 204–212. all reviewers, whose comments were insightful and important [21] H. Ramaprasad and F. Mueller, “Tightening the bounds on feasible for correcting this paper. preemption points,” in 2006 27th IEEE International Real-Time Systems Symposium (RTSS’06). IEEE, 2006, pp. 212–224. [22] ——, “Bounding worst-case response time for tasks with non-preemptive REFERENCES regions,” in 2008 IEEE Real-Time and Embedded Technology and [1] G. Yao, G. Buttazzo, and M. Bertogna, “Comparative evaluation of Applications Symposium. IEEE, 2008, pp. 58–67. limited preemptive methods,” in 2010 IEEE 15th Conference on Emerging [23] R. J. Bril, J. J. Lukkien, and W. F. Verhaegh, “Worst-case response time Technologies & Factory Automation (ETFA 2010). IEEE, 2010, pp. 1–8. analysis of real-time tasks under fixed-priority scheduling with deferred [2] G. C. Buttazzo, M. Bertogna, and G. Yao, “Limited preemptive preemption,” Real-Time Systems, vol. 42, no. 1-3, pp. 63–119, 2009. scheduling for real-time systems. A survey,” industrial Informatics, IEEE [24] M. Bertogna, O. Xhani, M. Marinoni, F. Esposito, and G. Buttazzo, Transactions on, vol. 9, no. 1, pp. 3–15, 2013. “Optimal selection of preemption points to minimize preemption overhead,” [3] S. Baruah, “The limited-preemption uniprocessor scheduling of sporadic in Real-Time Systems (ECRTS), 2011 23rd Euromicro Conference on. task systems,” in 17th Euromicro Conference on Real-Time Systems IEEE, 2011, pp. 217–227. (ECRTS’05). IEEE, 2005, pp. 137–144. [25] B. Peng, N. Fisher, and M. Bertogna, “Explicit preemption placement [4] Y. Wang and M. Saksena, “Scheduling fixed-priority tasks with pre- for real-time conditional code,” in Real-Time Systems (ECRTS), 2014 emption threshold,” in Real-Time Computing Systems and Applications, 26th Euromicro Conference on. IEEE, 2014, pp. 177–188. 1999. RTCSA’99. Sixth International Conference on. IEEE, 1999, pp. [26] J. Cavicchio, C. Tessler, and N. Fisher, “Minimizing cache overhead via 328–335. loaded cache blocks and preemption placement,” in Real-Time Systems [5] A. Burns and E. S. Son, “Preemptive priority based scheduling: An (ECRTS), 2015 27th Euromicro Conference on. IEEE, 2015, pp. 163– appropriate engineering approach,” Advances in Real-Time Systems, pp. 173. 225–248, 1994. [27] F. Markovic, J. Carlson, and R. Dobrin, “Tightening the bounds on cache- [6] R. Pellizzoni, B. D. Bui, M. Caccamo, and L. Sha, “Coscheduling of CPU related preemption delay in fixed preemption point scheduling,” in 17th and I/O transactions in COTS-based embedded systems,” in Real-Time International Workshop on Worst-Case Execution Time Analysis (WCET Systems Symposium, 2008. IEEE, 2008, pp. 221–231. 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017. [7] A. Bastoni, B. Brandenburg, and J. Anderson, “Cache-related preemption [28] F. Markovic,´ J. Carlson, and R. Dobrin, “Improved cache-related and migration delays: Empirical approximation and impact on schedula- preemption delay estimation for fixed preemption point scheduling,” in bility,” Proceedings of OSPERT, pp. 33–44, 2010. Ada-Europe International Conference on Reliable Software Technologies. [8] S. Altmeyer, R. I. Davis, and C. Maiza, “Improved cache related pre- Springer, 2018, pp. 87–101. emption delay aware response time analysis for fixed priority pre-emptive [29] G. Buttazzo, Hard real-time computing systems: predictable scheduling systems,” Real-Time Systems, vol. 48, no. 5, pp. 499–526, 2012. algorithms and applications. Springer Science & Business Media, 2011, [9] S. Altmeyer, C. Maiza, and J. Reineke, “Resilience analysis: tightening vol. 24. the CRPD bound for set-associative caches,” in ACM Sigplan Notices, [30] R. J. Bril, W. F. Verhaegh, and J. J. Lukkien, “Exact worst-case vol. 45, no. 4. ACM, 2010, pp. 153–162. response times of real-time tasks under fixed-priority scheduling with [10] G. Yao, G. Buttazzo, and M. Bertogna, “Feasibility analysis under deferred preemption,” in Proc. Work-in-Progress (WiP) session of the fixed priority scheduling with fixed preemption points,” in 2010 IEEE 16th Euromicro Conference on Real-Time Systems (ECRTS), Technical 16th International Conference on Embedded and Real-Time Computing Report from the University of Nebraska-Lincoln, Department of Computer Systems and Applications. IEEE, 2010, pp. 71–80. Science and Engineering (TR-UNL-CSE-2004-0010), 2004, pp. 57–60. [11] ——, “Feasibility analysis under fixed priority scheduling with limited [31] E. Bini and G. C. Buttazzo, “Schedulability analysis of periodic fixed preemptions,” Real-Time Systems, vol. 47, no. 3, pp. 198–223, 2011. priority systems,” IEEE Transactions on Computers, vol. 53, no. 11, pp. [12] A. Hamann, D. Dasari, S. Kramer, M. Pressler, and F. Wurst, “Communi- 1462–1473, 2004. cation centric design in complex automotive embedded systems,” in 29th [32] S. Hahn, M. Jacobs, and J. Reineke, “Enabling compositionality for Euromicro Conference on Real-Time Systems (ECRTS 2017). Schloss multicore timing analysis,” in Proceedings of the 24th international Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017. conference on real-time networks and systems. ACM, 2016, pp. 299– [13] M. Becker, N. Khalilzad, R. J. Bril, and T. Nolte, “Extended support for 308. limited preemption fixed priority scheduling for osek/autosar-compliant [33] J. Gustafsson, A. Betts, A. Ermedahl, and B. Lisper, “The malardalen¨ wcet operating systems,” in 10th IEEE International Symposium on Industrial benchmarks: Past, present and future,” in 10th International Workshop on Embedded Systems (SIES). IEEE, 2015, pp. 1–11. Worst-Case Execution Time Analysis (WCET 2010). Schloss Dagstuhl- [14] L. Hatvani, R. J. Bril, and S. Altmeyer, “Schedulability using native Leibniz-Zentrum fuer Informatik, 2010. non-preemptive groups on an autosar/osek platform with caches,” in [34] D. Shah, S. Hahn, and J. Reineke, “Experimental evaluation of cache- Design, Automation & Test in Europe Conference & Exhibition (DATE), related preemption delay aware timing analysis,” in 18th International 2017. IEEE, 2017, pp. 244–249. Workshop on Worst-Case Execution Time Analysis (WCET 2018). Schloss [15] C.-G. Lee, J. Han, Y.-M. Seo, S. L. Min, R. Ha, S. Hong, C. Y. Park, Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. M. Lee, and C. S. Kim, “Analysis of cache-related preemption delay in [35] E. Bini and G. C. Buttazzo, “Measuring the performance of schedulability fixed-priority preemptive scheduling,” IEEE Transactions on Computers, tests,” Real-Time Systems, vol. 30, no. 1-2, pp. 129–154, 2005. vol. 47, no. 6, pp. 700–713, 1998. [16] S. Altmeyer and C. Burguiere, “A new notion of useful cache block to improve the bounds of cache-related preemption delay,” in Real-Time Systems, 2009. ECRTS’09. 21st Euromicro Conference on. IEEE, 2009, pp. 109–118.

39