Predictable Execution Model: Concept and Implementation

Rodolfo Pellizzoni†, Emiliano Betti†, Stanley Bak†, Gang Yao‡, John Criswell† and Marco Caccamo†
† University of Illinois at Urbana-Champaign, IL, USA, {rpelliz2, ebetti, sbak2, criswell, mcaccamo}@illinois.edu
‡ Scuola Superiore Sant'Anna, Italy, [email protected]

Abstract

Building safety-critical real-time systems out of inexpensive, non-real-time, COTS components is challenging. Although COTS components generally offer high performance, they can occasionally incur significant timing delays. To prevent this, we propose controlling the operating point of each COTS shared resource (like the cache, memory, and interconnection buses) to maintain it below its saturation limit. This is necessary because the low-level arbiters of these shared resources are not typically designed to provide real-time guarantees. In this work, we introduce a novel system execution model, the PRedictable Execution Model (PREM), which, in contrast to the standard COTS execution model, coschedules at a high level all active COTS components in the system, such as CPU cores and I/O peripherals. In order to permit predictable, system-wide execution, we argue that real-time embedded applications need to be compiled according to a new set of rules dictated by PREM. To experimentally validate our theory, we developed a COTS-based PREM testbed and modified the LLVM Compiler Infrastructure to produce PREM-compatible executables.

1. Introduction

Real-time embedded systems are increasingly being built using commercial-off-the-shelf (COTS) components such as mass-produced CPUs, peripherals and buses. Overall performance of mass-produced components is often significantly higher than custom-made systems. For example, a PCI Express bus [14] can transfer data three orders of magnitude faster than the real-time SAFEbus [8]. However, the main drawback of using COTS components within a real-time system is the presence of unpredictable timing anomalies, since the individual components are typically designed paying little or no attention to worst-case timing behavior. Additionally, modern COTS-based embedded systems include multiple active components (such as CPU cores and I/O peripherals) that can independently initiate access to shared resources, which, in the worst case, cause contention leading to timing degradation.

Computing precise bounds on timing delays due to contention is difficult. Even though some existing approaches can produce safe upper bounds, they need to be very pessimistic due to the unpredictable behavior of arbiters of physically shared COTS resources (like caches, memories, and buses). As a motivating example, we have previously shown that the computation time of a task can increase linearly with the number of suffered cache misses due to contention for access to main memory [16]. In a system with three active components, a task's worst case computation time can nearly triple. To exploit the high average performance of COTS components without experiencing the long delays occasionally suffered by real-time tasks, we need to control the operating point of each COTS shared resource and maintain it below saturation limits. This is necessary because the low-level arbiters of the shared resources are not typically designed to provide real-time guarantees. This work aims at showing that this is indeed possible by carefully rethinking the execution model of real-time tasks and by enforcing a high-level coscheduling mechanism among all active COTS components in the system. Briefly, the key idea is to coschedule active components so that contention for accessing COTS shared resources is implicitly resolved by the high-level coscheduler without relying on low-level, non-real-time arbiters. Several challenges had to be overcome to realize the PRedictable Execution Model (PREM):

• Task execution times suffer high variance due to internal CPU architecture features (caches, pipelines, etc.) and unknown cache miss patterns. This source of temporal unpredictability forces the designer to make very pessimistic assumptions when performing schedulability analysis. To address this problem, PREM uses a novel program execution model with three main features: (1) jobs are divided into a sequence of non-preemptive scheduling intervals; (2) some of these scheduling intervals (named predictable intervals) are executed predictably and without cache misses by prefetching all required data at the beginning of the interval itself; (3) the execution time of predictable intervals is kept constant by monitoring CPU time counters at run-time.
• I/O peripherals with DMA master capabilities contend for physically shared resources, including memory and buses, in an unpredictable manner. To address this problem, we expand upon our previous work [1] and introduce hardware to put the COTS I/O subsystem under the discipline of real-time scheduling.

• Low-level COTS arbiters are usually designed to achieve fairness instead of real-time performance. To address this problem, we enforce a coscheduling mechanism that serializes arbitration requests of active components (CPU cores and I/O peripherals). During the execution of a task's predictable interval, a scheduled peripheral can access the bus and memory without experiencing delays due to cache misses caused by the task's execution.

Our PRedictable Execution Model (PREM) can be used with a high level programming language like C by setting some programming guidelines and by using a modified compiler to generate predictable executables. The programmer provides some information, like the beginning and end of each predictable execution interval, and the compiler generates programs which perform cache prefetching and enforce a constant execution time in each predictable interval. In light of the above discussion, we argue that real-time embedded applications should be compiled according to a new set of rules dictated by PREM. At the price of minor additional work by the programmer, the generated executable becomes far more predictable than state-of-the-art compiled code, and when run with the rest of the PREM system, shows significantly reduced worst-case execution time.

The rest of the paper is organized as follows. Section 2 discusses related work. In Section 3 we describe our main contribution: a co-scheduling mechanism that schedules I/O interrupt handlers, task memory accesses and I/O peripheral data transfers in such a way that access to shared COTS resources is serialized, achieving zero or negligible contention during memory accesses. Then, in Sections 4 and 5 we discuss the challenges in terms of hardware architecture and code organization that must be met to predictably compile real-time tasks. Section 6 presents our schedulability analysis. Finally, in Section 7 we detail our prototype testbed, including our compiler implementation based on the LLVM Compiler Infrastructure [9], and provide an experimental evaluation. We conclude with future work in Section 8.

2. Related Work

Several solutions have been proposed in prior real-time research to address different sources of unpredictability in COTS components, including real-time handling of peripheral drivers, real-time compilation, and analysis of contention for memory and buses. For peripheral drivers, Facchinetti et al. [4] proposed using a non-preemptive interrupt server to better support the reuse of legacy drivers. Additionally, analysis can be done to model worst-case temporal interference caused by device drivers [10]. For real-time compilation, a tight coupling between compiler and worst-case execution time (WCET) analyzer can optimize a program's WCET [5]. Alternatively, a compiler-based approach can provide predictable paging [17]. For analysis of contention for memory and buses, existing techniques can analyze the maximum delay caused by contention for a shared memory or bus under various access models [15, 20]. All these works attempt to analyze or control a single resource, and obtain safe bounds that are often highly pessimistic. Instead, PREM is based on a global coschedule of all relevant system resources.

Instead of using COTS components, other researchers have discussed new architectural solutions that can greatly increase system predictability by removing significant sources of interference. Instead of a standard cache-based architecture, a real-time scratchpad architecture can be used to provide predictable access time to main memory [22].
The Precision Timed (PRET) machine [3] promises to simultaneously deliver high computational performance together with cycle-accurate estimation of program execution time. While our PREM execution model borrows some ideas from these works, it exhibits one key difference: our model can be applied to existing COTS-based systems, without requiring significant architectural redesign. This approach allows PREM to leverage the advantage of the economy of scale of COTS systems, and support the progressive migration of legacy systems.

3. System Model

We consider a typical COTS-based real-time embedded system comprising a CPU, main memory and multiple DMA peripherals. While in this paper we restrict our discussion to single-core systems with no hardware multithreading, we believe that our predictable execution model is also applicable to multicore systems. We will present a predictable execution model for multicore systems as part of our planned future work. The CPU can implement one or more cache levels. We focus on the last cache level, which typically employs a write-back policy.


Whenever a task suffers a cache miss in the last level, the cache controller must access main memory to fetch the newly referenced cache line and possibly write back a replaced cache line. Peripherals are connected to the system through COTS interconnect such as PCI or PCIe [14]. DMA peripherals can autonomously initiate data transfers on the interconnect. We assume that all data transfers target main memory, that is, data is always transferred between the peripheral's internal buffers and main memory. Therefore, we can treat main memory as a single resource shared by all peripherals and by the cache controller.

The CPU executes a set of N real-time periodic tasks Γ = {τ_1, ..., τ_N}. Each task can use one or more peripherals to transfer input or output data to or from main memory. We model all peripheral activities as a set of M periodic I/O flows Γ^{I/O} = {τ_1^{I/O}, ..., τ_M^{I/O}} with assigned timing reservations, and we want to schedule them in such a way that only one flow is transferred at a time. Unfortunately, COTS peripherals do not typically conform to the described model. As an example, consider a task receiving input data from a Network Interface Card (NIC). Delays in the network could easily cause a burst of packets to arrive at the NIC. Since a high-performance COTS NIC is designed to autonomously transfer incoming packets to main memory as soon as possible, the NIC could potentially require memory access for significantly longer than its expected periodic reservation. In [1], we first introduced a solution to this problem consisting of a real-time I/O management scheme. A diagram of our proposed architecture is depicted in Figure 1.

Figure 1: Real-Time I/O Management System.

A minimally intrusive hardware device, called real-time bridge, is interposed between each peripheral and the rest of the system. The real-time bridge buffers all incoming traffic from the peripheral and delivers it predictably to main memory according to a global I/O schedule. Outgoing traffic is also retrieved from main memory in a predictable fashion. To maximize responsiveness and avoid CPU overhead, the I/O schedule is computed by a separate peripheral scheduler, a hardware device based on the previously-developed [1] reservation controller, which controls all real-time bridges. For simplicity, in the rest of our discussion we assume that each peripheral services a single task. However, this assumption can be lifted by supporting peripheral virtualization in real-time bridges. More details about our hardware/software implementation are provided in Section 7.

Notice that our previously-developed I/O management system [1] does not solve the problem of memory interference between peripherals and CPU tasks. When a typical real-time task is executed on a COTS CPU, cache misses are unpredictable, making it difficult to avoid low-level contention for access to main memory. To overcome this issue, we propose a set of compiler and OS techniques that enable us to predictably schedule all cache misses during a given portion of a task execution. The code for each task τ_i is divided into a set of N_i scheduling intervals {s_{i,1}, ..., s_{i,N_i}}, which are executed sequentially at run-time. The timing requirements of τ_i can be expressed by a tuple {e_{i,1}, ..., e_{i,N_i}, p_i, D_i}, where p_i, D_i are the period and relative deadline of the task, with D_i ≤ p_i, and e_{i,j} is the maximum execution time of s_{i,j}, assuming that the interval runs in isolation with no memory interference. A job can only be preempted by a higher priority job at the end of a scheduling interval. This ensures that the cache content can not be altered by the preempting job during the execution of an interval. We classify the scheduling intervals into compatible intervals and predictable intervals.
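As a concrete illustration, the timing parameters of this task model can be encoded as plain data. The following is a minimal sketch in C; all type and field names are ours, not part of the PREM prototype.

    #define MAX_INTERVALS 16

    enum interval_kind { COMPATIBLE, PREDICTABLE };

    struct sched_interval {
        enum interval_kind kind;  /* compatible or predictable            */
        long e;                   /* max execution time e_ij in isolation */
    };

    struct prem_task {
        int n_intervals;                        /* N_i                   */
        struct sched_interval s[MAX_INTERVALS]; /* executed sequentially */
        long p;                                 /* period p_i            */
        long d;                                 /* relative deadline D_i */
    };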
Compatible intervals are compiled and executed without any special provisions (they are backwards compatible). Cache misses can happen at any time during these intervals. The task code is allowed to perform OS system calls, but blocking calls must have bounded blocking time. Furthermore, the task can be preempted by interrupt handlers of associated peripherals. We assume that the maximum execution time e_{i,j} for a compatible interval can be computed based on traditional static analysis techniques. However, to reduce the pessimism in the analysis, we prohibit peripheral traffic from being transmitted during a compatible interval. Ideally, there should be a small number of compatible intervals which are kept as short as possible.

Predictable intervals are specially compiled to execute according to the PREM model shown in Figure 2, and exhibit three main properties. First, each predictable interval is divided into two different phases. During the initial memory phase, the CPU accesses main memory to perform a set of cache line fetches and replacements. At the end of the memory phase, all cache lines required during the predictable interval are available in last level cache. Second, the second phase is known as the execution phase. During this phase, the task performs useful computation without suffering any last level cache misses. Predictable intervals do not contain any system calls and can not be preempted by interrupt handlers. Hence, the CPU does not perform any external main memory access during the execution phase. Due to this property, peripheral traffic can be scheduled during the execution phase of a predictable interval without causing any contention for access to main memory. Third, at run-time, we force the execution time of a predictable interval to be always equal to e_{i,j}. Let e^{mem}_{i,j} be the maximum time required to complete the memory phase and e^{exec}_{i,j} the maximum time required to complete the execution phase. Then offline we set e_{i,j} = e^{mem}_{i,j} + e^{exec}_{i,j}, and at run-time, even if the memory phase lasts for less than e^{mem}_{i,j} time units, the overall interval still completes in exactly e_{i,j}. This property greatly increases task predictability without affecting CPU worst-case guarantees. In particular, as we show in Section 6, it ensures that hard real-time guarantees can be extended to I/O flows.

Figure 2: Predictable Interval with constant execution time.


Figure 3: Example System-Level Predictable Schedule

Figure 3 shows a concrete example of a system-level predictable schedule for a task set comprising two tasks τ_1, τ_2 together with two I/O flows τ_1^{I/O}, τ_2^{I/O} which service τ_1 and τ_2 respectively. Both tasks and I/O flows are scheduled according to fixed priority, with τ_1 having higher priority than τ_2 and τ_1^{I/O} higher priority than τ_2^{I/O}. We set D_i = p_i and assign to each I/O flow the same period and deadline as its serviced task and a transmission time equal to 4 time units. As shown in Figure 3 for task τ_1, this means that the input data for a given job is transmitted in the period before the job is executed, and the output data is transmitted in the period after. Task τ_1 has a single predictable interval of length e_{1,2} = 4, while τ_2 has two predictable intervals of lengths e_{2,2} = 4 and e_{2,3} = 3. The first and last interval of both τ_1 and τ_2 are special compatible intervals. These intervals are needed to execute the associated peripheral driver (including interrupt handlers) and set up the reception and transmission buffers in main memory (i.e. read and write system calls). More details are provided in Section 7. I/O flows can be scheduled both during execution phases and while the CPU is idle. As we will show in Section 6, the described scheme can be modeled as a hierarchical scheduling system [21], where the CPU schedule of predictable intervals supplies available transmission time to I/O flows. Therefore, existing tests can be reused to check the schedulability of I/O flows. However, due to the characteristics of predictable intervals, a more complex analysis is required to derive the supply function.

4. Architectural Constraints and Solutions

Predictable intervals are executed in a radically different way compared to the speculative execution model that COTS components are typically designed to support. In this section, we detail the challenges and solutions to implement the PREM execution model on top of a COTS architecture.

4.1. Caching and Prefetch

Our general strategy to implement the memory phase consists of two steps: (1) we determine the complete set of memory regions that are accessed during the interval. Each region is a continuous area in virtual memory. In general, its start address can only be determined at run-time, but its size A is known at compile time.
(2) During the memory phase, we prefetch all cache lines that contain instructions and data for the required regions; most COTS instruction sets include a prefetch instruction that can be used to load specific cache lines in last level cache. Step (1) will be detailed in Section 5. Step (2) can be successful only if there is no cache self-eviction, that is, prefetching a cache line never evicts another line that has already been accessed (prefetched) during the same memory phase. In the remainder of this section, we describe self-eviction prevention.
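The prefetch step can be pictured with a short sketch. This is not the prototype's implementation (which uses the macros of Section 7.3); it is a minimal illustration assuming a GCC-style compiler, whose __builtin_prefetch typically lowers to an i386 prefetch instruction such as prefetcht2.

    #include <stddef.h>
    #include <stdint.h>

    #define CACHE_LINE 64   /* line size L of the testbed in Section 7 */

    /* Touch every cache line of one memory region so it is loaded into
     * last level cache during the memory phase. */
    static inline void prefetch_region(const void *start, size_t size)
    {
        const char *p   = (const char *)((uintptr_t)start
                                         & ~(uintptr_t)(CACHE_LINE - 1));
        const char *end = (const char *)start + size;
        for (; p < end; p += CACHE_LINE)
            __builtin_prefetch(p, 0, 1);  /* read, low temporal locality */
    }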

Most COTS CPUs implement the last-level cache as an N-way set associative cache. Let B be the total size of the cache and L be the size of each cache line in bytes. Then the byte size of each of the N cache ways is W = B/N. An associative set is the set of all cache lines, one for each way, which have the same index in cache; there are W/L associative sets. Last level cache is typically physically tagged and physically indexed, meaning that cache lines are accessed based on physical memory addresses only. We also assume that last level cache is not exclusive, that is, when a cache line is copied to a higher cache level it is not removed from the last level. Figure 4 shows an example where L = 4, W = 16 (parameters are chosen to simplify the discussion and are not representative of typical systems). The main idea behind our conflict analysis is as follows: we compute the maximum amount of entries in each associative set that are required to hold the cache lines prefetched for all memory regions in a scheduling interval. Based on the cache replacement policy, we then derive a safe lower bound on the amount of entries that can be prefetched in an associative set without causing any self-eviction.

Figure 4: Cache organization with one memory region

Consider a memory region of size A. The region will occupy at most K = ⌈(A−1)/L⌉ + 1 cache lines. As shown in Figure 4 for a region with A = 15, K = 5, the worst case is produced when the region uses a single byte in its first cache line. Assume now that virtual memory addresses coincide with physical addresses. Then since the region is contiguous and there are W/L cache lines in each way, the maximum number of entries used in any associative set by the region is ⌈K/(W/L)⌉. For example, the region in Figure 4 requires two entries in the set with index 0. We then derive the maximum number of entries for the entire interval by summing the entries required for each memory region. Unfortunately, this is not generally true if the system employs paged virtual memory. If the page size P is smaller than the way size W, contiguous virtual pages can be mapped to arbitrary physical pages, so the cache index of a prefetched line depends on the page allocation; in the example of Figure 4, the number of entries required in an associative set for the memory region is increased from 2 to 3. We consider two solutions: 1) if the system supports it, we can select a page size multiple of W just for our specific process. This solution, which we employed in our implementation, solves the problem because the index in cache for virtual and physical addresses is the same no matter the page allocation. 2) We use a modified page allocation algorithm in the OS. Cache-aware allocation algorithms have been proposed in the literature, for example in [11] for cache partitioning. Note that a suitable allocation algorithm could decrease the required number of associative entries by controlling the allocation in physical memory of multiple regions. We plan to pursue this solution in our future work.
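The per-region bound just derived is easy to compute mechanically. The following sketch (our own helper names, assuming the virtual-equals-physical case) mirrors the two formulas.

    /* K = ceil((A-1)/L) + 1 cache lines for a region of A bytes. */
    static unsigned region_lines(unsigned A, unsigned L)
    {
        return (A + L - 2) / L + 1;
    }

    /* ceil(K / (W/L)) entries in any associative set, W = way size. */
    static unsigned region_set_entries(unsigned A, unsigned L, unsigned W)
    {
        unsigned K = region_lines(A, L);
        unsigned sets = W / L;        /* cache lines per way */
        return (K + sets - 1) / sets;
    }

With the example of Figure 4 (A = 15, L = 4, W = 16), region_lines gives K = 5 and region_set_entries gives 2, matching the two entries in the set with index 0.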
We now consider the cache replacement policy. We consider four different replacement policies: random, FIFO, LRU and pseudo-LRU, which cover most implemented cache architectures. Let Q be the maximum number of entries in any associative set required by the predictable interval. Furthermore, let Q0 be the number of such entries relative to cache lines that are accessed multiple times during the memory phase; in our implementation, this only includes the cache lines that contain the small amount of instructions of the memory phase itself, so Q0 = 1. A detailed analysis of the four replacement policies, based on the work in [7, 19], is provided in Appendix A; in particular, based on the replacement policy and possibly Q0, we show how to compute a lower bound on Q such that no self-eviction can happen if Q is less than or equal to the derived bound. It is important to notice that the bound for some policies depends on the state of the cache before the beginning of the predictable interval. Therefore, we consider two different cache invalidation models. In the full-invalidation model, none of the cache lines used by a predictable interval are already available in cache at the beginning of the memory phase. Furthermore, the replacement policy is not affected by the presence of invalidated cache lines. As an example, this model can be enforced by fully invalidating the cache either at the end or at the beginning of each predictable interval, which in most architectures has the side effect of resetting all replacement queues/trees. While this model is more predictable, several COTS CPUs do not support it, including our testbed described in Section 7. In the partial-invalidation model, selected lines in last level cache can be invalidated, for example to handle data coherency between the CPU and peripherals in compatible intervals. Furthermore, the replacement policy always selects invalidated cache lines before valid lines, independently of the state of the queue/tree. Results in Appendix A are summarized by the following theorem.

Theorem 1. If cache lines are prefetched in a consistent order, then at the end of the memory phase all Q cache lines in an associative set will be available in cache, requiring at most Q fetches to be loaded, if Q is at most equal to:

• 1 for random;
• N for FIFO and LRU;
• log_2 N + 1 for pseudo-LRU with partial-invalidation semantic;
• N/2^{Q0} + Q0 if Q0 < log_2 N, and log_2 N + 1 otherwise, for pseudo-LRU with full-invalidation semantic.
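For illustration, the case analysis of Theorem 1 can be written out directly; this sketch uses our own names and the reconstruction of the last bound given above.

    enum policy { RANDOM, FIFO, LRU, PLRU_PARTIAL, PLRU_FULL };

    static unsigned ilog2(unsigned x)        /* floor(log2(x)), x >= 1 */
    {
        unsigned r = 0;
        while (x >>= 1)
            r++;
        return r;
    }

    /* Safe bound on prefetched entries per associative set (Theorem 1).
     * N = number of ways, q0 = entries accessed multiple times. */
    static unsigned q_bound(enum policy pol, unsigned N, unsigned q0)
    {
        switch (pol) {
        case RANDOM:       return 1;
        case FIFO:
        case LRU:          return N;
        case PLRU_PARTIAL: return ilog2(N) + 1;
        case PLRU_FULL:
            return q0 < ilog2(N) ? N / (1u << q0) + q0 : ilog2(N) + 1;
        }
        return 1;  /* conservative default */
    }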
4.2. Computing Phase Length

To provide predictable timing guarantees, the maximum time required for the memory phase, e^{mem}_{i,j}, and for the execution phase, e^{exec}_{i,j}, must be computed. We assume that an upper bound to e^{exec}_{i,j} can be derived based on static analysis of the execution phase. Note that our model only ensures that all required instructions and data are available in last level cache; cache misses can still occur in higher cache levels. Analyses that derive patterns of cache misses for split level 1 data and instruction caches have been proposed in [12, 18]. These analyses can be safely executed because, according to our model, level 1 misses during the execution phase will not require the CPU to perform any external access. e^{mem}_{i,j} depends on the time required to prefetch all accessed memory regions. Note that since the task can be preempted at the boundary of scheduling intervals, the cache state is unknown at the beginning of the memory phase. Hence, each prefetch could require both a cache line fetch and a replacement in last cache level. An analysis to compute upper bounds for read/write operations using a COTS DRAM memory controller is detailed in [13]. Note that since the number and relative addresses of prefetched cache lines is known, the analysis could potentially exploit features such as burst read and parallel bank access that are common in modern memory controllers.

Finally, for systems implementing paged virtual memory, we employ the following three assumptions: 1) the CPU supports hardware Translation Lookaside Buffer (TLB) management; 2) all pages used by predictable intervals are locked in main memory; 3) the TLB is large enough to contain all page entries for a predictable interval without suffering any conflict. Under such assumptions, each page used in a predictable interval can cause at most one TLB miss during the memory phase, which requires a number of fetches in main memory equal to at most the level of the page table.

4.3. Interval Length Enforcement

As described in Section 3, each predictable interval is required to execute for exactly e_{i,j} time units. If e_{i,j} is set to be at least e^{mem}_{i,j} + e^{exec}_{i,j}, the computation in the execution phase is guaranteed to terminate at or before the end of the interval. The interval can then enter active wait until e_{i,j} time units have elapsed since the beginning of its memory phase. In our implementation, elapsed time is computed using a CPU performance counter that is directly accessible by the task; therefore, neither OS nor memory interaction are required to enforce interval length.
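On our x86 testbed, the natural candidate for such a counter is the time stamp counter. The sketch below is our own illustration of the active wait, assuming TSC-based timing.

    #include <stdint.h>

    /* Read the x86 time stamp counter from user space. */
    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    /* Spin until e_ij cycles have elapsed since the start of the memory
     * phase; the loop touches no main memory, so it cannot interfere
     * with scheduled peripheral traffic. */
    static inline void enforce_interval_length(uint64_t start, uint64_t e_ij)
    {
        while (rdtsc() - start < e_ij)
            ;
    }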
4.4. Scheduling synchronization

In our model, peripherals are only allowed to transmit during a predictable interval's execution phase or while the CPU is idle. To compute the peripheral schedule, the peripheral scheduler must thus know the status of the CPU schedule. Synchronization can be achieved by connecting the peripheral scheduler to a peripheral interconnection as shown in Figure 1. Scheduling messages can then be sent by either a task or the OS to the peripheral scheduler. In particular, at the end of each memory phase the task sends to the peripheral scheduler the remaining amount of time until the end of the current predictable interval. Note that propagating a message through the interconnection takes non-zero time.¹ Since this time is typically negligible compared to the size of a scheduling interval, we will ignore it in the rest of our discussion. However, the schedulability analysis of Section 6 could be easily modified to take such overhead into account.

¹ In our implementation we measured an upper bound to the message propagation time of 1us, while we envision scheduling intervals with a length of 100-1000us.

Finally, to avoid executing interrupt handlers during predictable intervals, a peripheral should only raise interrupts to the CPU during compatible intervals of its serviced task. As we describe in Section 7, in our I/O management scheme peripherals raise interrupts through their assigned real-time bridge. Since the peripheral scheduler communicates with each real-time bridge, it is used to block interrupt propagation outside the desired compatible intervals. Note that blocking real-time bridge interrupts to the CPU will not cause any loss of input data, because the real-time bridge is capable of independently acknowledging the peripheral and storing all incoming data in the bridge local buffer.

5. Programming Model

Our system supports COTS applications written in standard high-level languages such as C. Unmodified code can be executed within one or more compatible intervals. To create predictable intervals, programmers add annotations as C preprocessor macros. The PREM real-time compiler creates code for predictable intervals so that it does not incur cache misses during the execution phase and the interval itself has a constant execution time.

Due to current limitations of our static code analysis, we assume that the programmer manually partitions the task into intervals. In particular, compatible intervals can be handled in the conventional way, while each predictable interval is encapsulated into a single function. All code within the function, and any functions transitively called by this function, is executed as a single predictable interval. To correctly prefetch all code and data used by a predictable interval, we impose several constraints upon its code:

1. Only scalar and array-based memory accesses should occur within a predictable interval; there should be no use of pointer-based data structures.

2. The code can use data structures, in particular arrays, that are not local to functions in the predictable intervals, e.g. they are allocated either in global memory or in the heap². However, the programmer should specify the first and last address that is accessed within the predictable interval for each non-local data structure. In general, it is difficult for the compiler to determine the first and last access to an array within a piece of code. The compiler needs this information to be able to load the relevant portion of each array into the last-level cache.

3. The functions within a predictable interval should not be called recursively. As described below, the compiler will inline callees into callers to make all the code within an interval contiguous in virtual memory. Furthermore, no system calls should be made by code within a predictable interval.

4. Code within a predictable interval may only have direct calls. This alleviates the need for pointer analysis to determine the targets of indirect function calls; such analysis is usually imprecise and would bloat the code within the interval.

5. No stack allocations should occur within loops. Since all variables must be loaded into the cache at function entry, it must be possible for the compiler to safely hoist the allocations to the beginning of the function which initiates the predictable interval.

² Note that data structures in the heap must have been previously allocated during a compatible interval.
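To make the constraints concrete, here is a usage sketch of a single predictable interval. It relies on the PREFETCH_DATA and START_EXECUTION programmer macros described in Section 7.3, and EXEC_WCET stands for an assumed, externally derived bound; the function itself is our own example.

    #define DATA_LEN 4096
    extern double samples[DATA_LEN];    /* non-local array, constraint 2 */

    void predictable_filter(void)       /* one interval = one function */
    {
        /* Memory phase: prefetch the data region (code and stack frame
         * are prefetched by compiler-inserted code). */
        PREFETCH_DATA(samples, sizeof(samples));
        START_EXECUTION(EXEC_WCET);

        /* Execution phase: array accesses only, direct calls only,
         * no system calls, no stack allocation inside the loop. */
        for (int k = 1; k < DATA_LEN; k++)
            samples[k] = 0.5 * (samples[k] + samples[k - 1]);
    }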
While these constraints may seem restrictive, some of these features are rarely used in real-time C code (e.g., indirect function calls), and the others are met by many types of functions. We believe that the benefit of faster, more predictable behavior for program hot-spots outweighs the restrictions imposed by our programming model. Furthermore, existing code that is too complex to be compiled into predictable intervals can still be executed inside compatible intervals. Therefore, our model permits a smooth transition for legacy systems.

Notice that the compiler can be used to verify that all of the aforementioned restrictions are met. Simple static analysis can determine whether there is any irregular data structure usage, indirect function calls, or system calls. During compilation, the compiler employs several transforms to ensure that code marked as being within a predictable interval does not cause a cache miss. First, the compiler inlines all functions called (either directly or transitively) during the interval into the top-level function defining the interval. This ensures that all program analysis is intra-procedural and that all the code for the interval is contiguous within virtual memory. Second, the compiler can transform the program so that all cache misses occur during the memory phase, which is located at the beginning of the predictable scheduling interval. To be specific, it inserts code after the function prologue to prefetch the code and data needed to execute the interval. Based on the described constraints, this includes three types of contiguous memory regions: (1) the code for the function; (2) the actual parameters passed to the function and the stack frame (which contains local variables and register spill slots); and (3) the data structures marked by the programmer as being accessed by the interval. Third, the compiler inserts code to send scheduling messages to the peripheral scheduler, as will be described in Section 7. Fourth, the compiler emits code at the end of the predictable interval to enforce its constant length. In particular, the compiler identifies all return instructions within the function and adds the required code before them. Finally, based on the information on prefetched memory regions, we assume that an external tool, such as the static timing analyzer used to compute maximum phase length in Section 4.2, can check the absence of cache self-evictions according to the analysis provided in Section 4.1.

6. Schedulability Analysis

PREM allows us to enforce strict timing guarantees for both CPU tasks and their associated I/O flows. By setting timing parameters as shown in Figure 3, the task schedule becomes independent of I/O scheduling. Therefore, task schedulability can be checked using available schedulability tests. As an example, assume that tasks are scheduled according to fixed priority scheduling as in Figure 3. For a task τ_i, let e_i = Σ_{j=1}^{N_i} e_{i,j} be the sum of the execution times of its scheduling intervals, or equivalently, the execution time of the whole task. Furthermore, let hp_i ⊂ Γ be the set of higher priority tasks than τ_i, and lp_i the set of lower priority tasks. Since scheduling intervals are executed non-preemptively, τ_i can suffer a blocking time due to lower priority tasks of at most B_i = max_{τ_l ∈ lp_i} max_{j=1...N_l} e_{l,j}. The worst-case response time of τ_i can then be found [2] as the fixed point r_i of the iteration:

r_i^{k+1} = e_i + B_i + Σ_{l ∈ hp_i} ⌈ r_i^k / p_l ⌉ e_l,    (1)

starting from r_i^0 = e_i + B_i. Task set Γ is schedulable if ∀τ_i : r_i ≤ D_i.
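Equation 1 is a standard response time recurrence and can be iterated directly. A minimal sketch (our own helper; tasks are indexed by priority, with 0 the highest):

    /* Worst-case response time of task i (Equation 1). e[] and p[] hold
     * execution times and periods in the same time unit; tasks 0..i-1
     * have higher priority. B is the blocking term B_i. Returns the
     * fixed point, which exceeds the deadline if unschedulable. */
    static long response_time(int i, const long e[], const long p[],
                              long B, long deadline)
    {
        long r = e[i] + B;                       /* r^0 */
        for (;;) {
            long next = e[i] + B;
            for (int l = 0; l < i; l++)
                next += ((r + p[l] - 1) / p[l]) * e[l];  /* ceil(r/p_l)e_l */
            if (next == r || next > deadline)
                return next;
            r = next;
        }
    }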
We now turn our attention to peripheral scheduling. Assume that each I/O flow τ_i^{I/O} is characterized by a maximum transmission time e_i^{I/O} (with no interference in both main memory and the interconnect), period p_i^{I/O} and relative deadline D_i^{I/O}, where D_i^{I/O} ≤ p_i^{I/O}. The schedulability analysis for I/O flows is more complex because the scheduling of data transfers depends on the task schedule. To solve this issue, we extend the hierarchical scheduling framework proposed by Shin and Lee in [21]. In this framework, tasks/flows in a child scheduling model execute using a timing resource provided by a parent scheduling model. Schedulability for the child model can be tested based on the supply bound function sbf(t), which represents the minimum resource supply provided to the child model in any interval of time t. In our model, the I/O flow schedule is the child model, and sbf(t) represents the minimum amount of time in any interval of time t during which the execution phase of a predictable interval is scheduled or the CPU is idle. Define the service time bound function tbf(t) as the pseudo-inverse of sbf(t), that is, tbf(t) = min{x | sbf(x) ≥ t}. Then if I/O flows are scheduled according to fixed priority, in [21] it is shown that the response time r_i^{I/O} of flow τ_i^{I/O} can be computed according to the iteration:

r_i^{I/O,k+1} = tbf( e_i^{I/O} + Σ_{l ∈ hp_i^{I/O}} ⌈ r_i^{I/O,k} / p_l^{I/O} ⌉ e_l^{I/O} ),    (2)

where hp_i^{I/O} has the same meaning as hp_i. In the remainder of this section, we detail how to compute sbf(t).

For the sake of simplicity, let us initially assume that tasks are strictly periodic and that the initial activation time of each task is known. Furthermore, notice that using the solution described in Section 4, we could enforce interval lengths not just for predictable intervals, but also for all compatible intervals. Finally, let h be the hyperperiod of task set Γ, defined as the least common multiple of all tasks' periods. Under these assumptions, it is easy to see that if Γ is feasible, the CPU schedule can be computed offline and repeats itself identically with period h after an initial interval of 2h time units (h time units if all tasks are activated simultaneously). Therefore, a tight sbf(t) can be computed as the minimum amount of supply (time during which the CPU is idle or in the execution phase of a predictable interval) during any interval of time t in the periodic task schedule, starting from the initial two hyperperiods. More formally, let {t^1, ..., t^K} be the set of start times for all scheduling intervals activated in the first two hyperperiods; the following theorem shows how to compute sbf(t).

Theorem 2. Let sf(t', t'') be the amount of supply provided in the periodic task schedule during interval [t', t'']. Then:

sbf(t) = min_{k=1...K} sf(t^k, t^k + t).    (3)

Proof. Let t', t'' be two consecutive interval start times in the periodic schedule. We show that ∀t, t' ≤ t ≤ t'': min( sf(t', t' + t), sf(t'', t'' + t) ) ≤ sf(t, t + t). This implies that to minimize sbf(t), it suffices to check the start times of all scheduling intervals.

Assume first that t falls inside a compatible interval or a memory phase. Then the schedule provides no supply in [t', t]. Therefore: sf(t', t' + t) = sf(t, t' + t) ≤ sf(t, t + t). Now assume that t falls in an execution phase or in an idle interval before the start time t'' of the next scheduling interval. Then sf(t, t'') = t'' − t. Therefore: sf(t'', t'' + t) ≤ sf(t'', t'' + t − (t'' − t)) + (t'' − t) = sf(t'', t + t) + sf(t, t'') = sf(t, t + t).

To conclude the proof, it suffices to note that since the schedule is periodic, if t' ≥ 2h, then sf(t', t' + t) = sf(t^k, t^k + t) where t^k = (t' mod h) + h.

Unfortunately, the proposed sbf(t) derivation can only be applied to strictly periodic tasks. If Γ includes any sporadic task τ_i with minimum interarrival time p_i, the schedule can not be computed offline. Therefore, we now propose an alternative analysis that is independent of the CPU scheduling algorithm and computes a lower bound sbf_L(t) to sbf(t). Note that since sbf(t) is the minimum supply in any interval t, using Equation 2 with sbf_L(t) instead of sbf(t) will still result in a sufficient schedulability test.

Let \overline{sbf}(t) be the maximum amount of time in which the CPU is executing either a compatible interval or the memory phase of a predictable interval in any time window of length t. Then by definition, sbf(t) = t − \overline{sbf}(t). Our analysis derives an upper bound \overline{sbf}_U(t) to \overline{sbf}(t). Therefore, sbf_L(t) = t − \overline{sbf}_U(t) is a valid lower bound for sbf(t). For a given interval size t, we compute \overline{sbf}(t) using the following key idea. For each task τ_i, we determine the minimum and maximum amount of time E_i^{min}(t), E_i^{max}(t) that the task can execute in a time window of length t while meeting its deadline constraint. Let t_i, E_i^{min}(t) ≤ t_i ≤ E_i^{max}(t), be the amount of time that τ_i actually executes in the time window. Since the CPU is a single shared resource, it must hold Σ_{i=1}^N t_i ≤ t independently of the task scheduling algorithm. Finally, for each task τ_i, we define the memory bound function mbf_i(t_i) to be the maximum amount of time that the task can spend in compatible intervals and memory phases, assuming that it executes for t_i time units in the time window. We can then obtain \overline{sbf}_U(t) as the maximum over all feasible {t_1, ..., t_N} tuples of Σ_{i=1}^N mbf_i(t_i). In other words, we can compute \overline{sbf}_U(t) by solving the following optimization problem:

\overline{sbf}_U(t) = max Σ_{i=1}^N mbf_i(t_i),    (4)

subject to:

Σ_{i=1}^N t_i ≤ t,    (5)

∀i, 1 ≤ i ≤ N : E_i^{min}(t) ≤ t_i ≤ E_i^{max}(t).    (6)

E_i^{max}(t) and E_i^{min}(t) can be computed according to the scenarios shown in Figures 5 and 6, where the notation (x)^+ is used for max(x, 0).
Finally, for each task τi, we define the memory P tenously). Therefore, a tight sbf(t) can be computed as the bound function mbfi(ti) to be the maximum amount of time min For a sporadic task, Ei (t) = 0, while for a periodic task: $" max Ei (t) t (p + D 2e ) + #" min i i i Ei (t) = − − ei+ (8) pi !" + %" #" &" ''" '#" '(" %!" j k min(t (pi + Di 2ei) mod pi, ei) . + − − t e (p D ) + min(t ei (pi Di)modpi,ei) min(t, ei) i i i − − − − − − ei  pi ￿ ￿ Proof. As shown in Figure 5, the relative distance between ￿￿ ￿￿ successive executions of a task τi is minimized when the task finishes at its deadline in the first period within the time window of length t, and as soon as possible (e.g., ei time ei pi Di − t = 20 units after its activation) for each successive period. Equa- tion 7 directly follows from this activation pattern, assum- ing that the time window coincides with the start time of the max Figure 5: Derivation of Ei (t), periodic or sporadic task. first job of τi. In particular: term min(t, e ) represents the execution time for the • i $" first job, which is activated at the beginning of the time min Ei (t) window; #" + − − − !" term t ei (pi Di) e represents the execution %%" %#" %&" '!" pi i + • t (pi + Di 2ei) + − − ei min(t (pi + Di 2ei)modpi,ei) time ofj jobs in periodsk that are fully contained within pi − − ￿￿ ￿￿ ￿ ￿ the time window; note that the first period starts ei + (pi Di) time units after the beginning of the time window;− + pi ei Di ei finally, term min(t ei (pi Di) mod pi, ei) − −t = 20 • represents the execution− time− for− the last job in the time window. 

min Figure 6: Derivation of Ei (t), periodic task. Similarly, as shown in Figure 6, the relative distance be- tween successive executions of a periodic job τ is max- that the task can spend in compatible intervals and mem- i imized when the task finishes as soon as possible in its ory phases, assuming that it executes for time units in the ti first period and as late as possible in all successive peri- time window. We can then obtain sbfU (t) as the maximum N ods. Equation 8 follows assuming that the time window co- over all feasible t1, . . . , tN tuples of mbfi(ti). In { } i=1 incides with the finishing time of the first job of τi, noticing other words, we can compute sbfU (t) by solving the fol- that periodic task activations start (pi ei) + (Di ei) = lowing optimization problem: P − − pi +Di 2ei time units after the beginning of the time win- − min N dow. Finally, Ei (t) is zero for a sporadic task since by definition the task has no maximum interarrival time. sbfU (t) = max mbfi(ti), (4) i=1 X Lemma 4. The optimization problem of Equations 4-6 ad- N mits solution if task set Γ is feasible. t t, (5) i ≤ i=1 Proof. X The optimization problem admits solution if the set min max of ti values that satisfy Equations 5 and 6 is not empty. Note i, 1 i N : Ei (t) ti Ei (t). (6) max min ∀ ≤ ≤ ≤ ≤ that by definition Ei (t) Ei (t), therefore there is at max min ≥ N min Ei (t) and Ei (t) can be computed according to the least one admissible solution iff i=1 Ei (t) t. De- + + ≤ scenarios shown in Figures 5, 6 where the notation (x) is min ei fine EUi (t) = t (Di ei) . Then it is easy to used for max(x, 0). − − P pi see that t, Emin(t) Emin(t): in particular, both func- ∀ i ≤ Ui  Theorem 3. For a periodic or sporadic task: tions are equal to ei for t = pi + Di ei and increase by ei −min ei every pi afterwards. Now note that EUi (t) t p , there- t e (p D ) + ≤ i max i i i fore it also holds: N Emin(t) t N ei . The proof Ei (t) = min(t, ei) + − − − ei+ i=1 i i=1 pi pi ≤ + follows by noticing that since Γ is feasible, it must hold j k N P P min(t ei (pi Di) mod pi, ei) . (7) ei 1. − − − i=1 pi ≤  P E#'*4F:);-&# αi =1/2 4-6 in a finite number of points, and using linear interpola- ##9"3&/7:-# @A# tion to derive the remaining values. Due to the complexity E#4&4*/1# ?# ##FG:2&# mbfUi(ti) of the resulting algorithm, the full derivation is pro- E#&%&'()*"# ># mbfi(ti) vided in Appendix B. ##FG:2&# =#
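For checking, Equations 7 and 8 translate directly into integer arithmetic. The helpers below are our own and assume all quantities are expressed in the same integer time unit.

    static long pos(long x)          { return x > 0 ? x : 0; }  /* (x)^+ */
    static long lmin(long a, long b) { return a < b ? a : b; }

    /* Equation 7: maximum execution of task (e, p, D) in a window t. */
    static long e_max(long t, long e, long p, long D)
    {
        long s = pos(t - e - (p - D));
        return lmin(t, e) + (s / p) * e + lmin(s % p, e);
    }

    /* Equation 8: minimum execution, periodic tasks only (0 if sporadic). */
    static long e_min(long t, long e, long p, long D)
    {
        long s = pos(t - (p + D - 2 * e));
        return (s / p) * e + lmin(s % p, e);
    }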

δi =5/2 <# B# C# ?# @A# @D# ti 7. Evaluation

In order to verify the validity and practicality of PREM, t1 t2 t3 t4 we implemented the key components of the system. In this ei,1 ei,2 ei,3 ei,4 section, we describe our evaluation, first introducing the new hardware components followed by the corresponding Figure 7: Derivation of mbfi(ti) and mbfUi(ti). software driver and OS calibration effort. We then discuss our compiler implementation and analyse its effectiveness mbfi(ti) can be computed using a strategy similar to the !"#$%&'()*"#+*,&-#.*/#0123&456&7&-#8/&,9'3:;9-931# on a DES benchmark. Finally, using synthetic tasks we mea- one in Theorem 2. Since ti represents the amount of time sure the effectiveness of the PREM system as a function of that τi executes, we consider a schedule in which τi is ex- cache stall time, and show traces of PREM when running ecuted continuously being activated every ei time units, as shown in Figure 7. The start time of the first scheduling in- on COTS hardware. 1 terval in the first job of τi is t = 0, the start time of the sec- 2 ond scheduling interval is t = ei,1, and so on and so forth 7.1. PREM Hardware Components until the first interval of the second job which has start time equal to e . Then mbf (t ) can be computed as the maxi- i i i When introducing PREM in Section 3, two additional mum amount of time that the task spends in a compatible hardware components were added to the COTS system (re- interval or memory phase in any time window t . i call Figure 1). Here, we briefly review our previous work 0 00 Theorem 5. Let mfi(t , t ) be the amount of time that task which describes these hardware components in more de- τi spends in a compatible interval or memory phase in the tail [1], and describe the additions to these components 0 00 interval [t , t ], in the schedule in which τi is executed con- which we made to provide the mechanism for PREM ex- tinuously. Then: ecution. First we describe the real-time bridge component, then we describe the peripheral scheduler component. k k mbfi(ti) = max mfi(t , t + ti). (9) Our real-time bridge prototype is implemented on an k=1...Ni ML507 FPGA Evaluation Platform, which contains a COTS Proof. Note that the schedule in which τi executes contin- TEMAC Ethernet hardware block (the I/O peripheral un- uously is periodic with period pi. The same strategy as in der control), and a PCIe edge connector (the interface to the Theorem 2 can be used to show that if t0, t00 are consecutive main system). Interposed between these hardware compo- 0 00 0 0 interval start times, then t, t t t : max mfi(t , t + nents, our real-time bridge uses a System-on-Chip design 00 00 ∀ ≤ ≤ ti), mfi(t , t + ti) mfi(t, t + ti). running Linux. The real-time bridge is also wired to pe- ≥ ripheral scheduler, which allows direct communication be-  As an example, in Figure 7 mbfi(ti) is maximized in the tween these components. Three wires are used to transmit time window that starts at t4 = 10. information between each real-time bridge and the pe- Since mbfi(ti) is a nonlinear function, comput- ripheral scheduler, data ready, data block, and ing sbfU (t) according to Equations 4-6 requires solving interrupt block. The data ready wire is an out- a nonlinear optimization problem. To simplify the prob- put signal sent to the peripheral scheduler which is as- lem, we consider a linear upper bound approximation serted whenever the COTS peripheral has buffered data mbfUi(ti) = αiti + δi, as shown in Figure 7, where in the real-time bridge. 
The peripheral scheduler sends two signals, data_block and interrupt_block, to the real-time bridge. The data_block signal is asserted to block the real-time bridge from transferring data in DMA mode over the PCIe bus from its local buffer to main memory or vice versa. The interrupt_block signal is a new signal in the real-time bridge implementation, added to support PREM, and instructs the real-time bridge to not further raise interrupts. In previous work [1], we provide a more detailed description and buffer analysis of the real-time bridge design, and demonstrate the component's effectiveness and efficiency.

The peripheral scheduler is a unique component in the system, connected directly to each real-time bridge. Unlike our previous work where the peripherals were scheduled asynchronously with the CPU [1], PREM requires the CPU and peripherals to coordinate their access to main memory. The peripheral scheduler provides a hardware implementation of the child scheduling model used to schedule peripherals, with each peripheral given a sporadic server to schedule its traffic. Meanwhile, the OS real-time scheduler runs the parent scheduling model for CPU tasks to complete the hierarchical PREM scheduling model described in Section 6. The peripheral scheduler is connected to the PCIe bus, and exposes a set of registers accessible from the main CPU. In the configuration register, constant parameters such as the maximum cache write-back time are stored. Writing a value to the yield register indicates that the CPU will not access main memory for the given amount of time, and I/O peripherals should be allowed to read to and write from RAM. The value written to the yield register contains a 14 bit unsigned integer indicating the number of microseconds to permit peripheral traffic with main memory, as described in Section 4. The CPU can also use the yield register to allow interrupts to be raised by peripherals. Another 14 bit unsigned integer indicates the number of microseconds to allow peripheral interrupts, and a 3 bit interrupt mask selects which peripherals should be allowed to raise the interrupts. In this way, different CPU tasks can predictably service interrupts from different I/O peripherals.
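The yield register protocol can be pictured as follows. This sketch assumes a memory-mapped register interface and invents the offset and bit layout, which are not specified at this level of detail in the text.

    #include <stdint.h>

    static volatile uint32_t *psched;  /* registers mapped via the driver */

    #define YIELD_REG 1                /* hypothetical word offset */

    /* Permit peripheral traffic for mem_us microseconds (14 bits) and
     * peripheral interrupts for irq_us microseconds (14 bits) from the
     * bridges selected by the 3 bit irq_mask. */
    static inline void psched_yield(uint32_t mem_us, uint32_t irq_us,
                                    uint32_t irq_mask)
    {
        psched[YIELD_REG] = (mem_us & 0x3FFF) |
                            ((irq_us & 0x3FFF) << 14) |
                            ((irq_mask & 0x7) << 28);
    }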
7.2. Software Evaluation

The software effort required to implement PREM execution involves two aspects: (1) creating the drivers which control the custom PREM hardware discussed in the previous section, and (2) calibrating the OS to eliminate undesired execution interference. We now discuss these, in order.

The two custom hardware components, the real-time bridge and the peripheral scheduler, each require a software driver to be controlled from the main CPU. Additionally, each peripheral requires a driver running on the real-time bridge's CPU to control the COTS peripheral. The driver for the peripheral scheduler is straightforward, mapping the bus addresses corresponding to the exposed registers to user space where a PREM-compiled process can access them. The driver for each real-time bridge is more difficult, since each unique COTS peripheral requires a unique driver. However, since in our implementation both the main CPU and the real-time bridge's CPU are running Linux (version 2.6.31), we can reuse existing, thoroughly tested Linux drivers to drastically reduce the driver creation effort [1]. The presence of a real-time bridge is not apparent in user space, and software programs using the COTS peripherals require no modification.

For our experiments, we use an Intel Q6700 CPU with a 975X system controller; we set the CPU frequency to 1Ghz, obtaining a measured memory bandwidth of 1.8Gbytes/s, to configure the system in line with typical values for embedded systems. We also disable the speculative CPU HW prefetcher, since it negatively impacts the predictability of any real-time task. The Q6700 has four CPU cores, and each pair of cores shares a common level 2 (last level) cache. Each cache is 16-associative with a total size of B = 4 Mbytes and a line size of L = 64 bytes. Since we use a PC platform running a COTS Linux operating system, there are many potential sources of timing noise, such as interrupts, kernel threads, and other processes, which must be removed for our measurements to be meaningful. For this reason, in order to emulate as closely as possible a typical uni-processor embedded real-time platform, we divided the 4 cores into two partitions. The system partition, running on the first pair of cores, receives all interrupts for non-critical devices (e.g., the keyboard) and runs all the system activities and non-real-time processes (e.g., the shell we use to run the experiments). The real-time partition runs on the second pair of cores. One core in the real-time partition runs our real-time tasks together with the drivers for real-time bridges and the peripheral scheduler; the other core is not used. Note that the cores of the system partition can still produce a small amount of unscheduled bus and main memory accesses, or raise rare inter-processor interrupts (IPI) that can not be easily prevented. However, in our experiments we found these sources of noise to be negligible. Finally, to solve the paging issue detailed in Section 4, we used a large, 4MB page size just for the real-time tasks, using the HugeTLB feature of the Linux kernel for large page support.

7.3. Compiler Evaluation

We built the PREM real-time compiler prototype using the LLVM Compiler Infrastructure [9], targeting the compilation of C code. LLVM was extended by writing self-contained analysis and transformation passes, which were then loaded into the compiler.

In the current PREM real-time compiler prototype, we rely on the programmer to partition the task into predictable and compatible intervals. The partitioning is done by putting each predictable interval into its own function. The beginning and end of the scheduling interval correspond to the entry and exit of the function, and the start of the execution phase (the end of the memory phase) is manually marked by the programmer. We assume that non-local data accessed during a predictable interval exists in continuous memory spaces, which can be prefetched by a set of PREFETCH_DATA(start_address, size) macros that must be placed by the programmer during the memory phase. The implementation of this macro does the actual prefetching of the data into level 2 cache by prefetching every cache line in the given range with the i386 prefetcht2 instruction.
After the memory phase, the programmer adds a START_EXECUTION(wcet) macro to indicate the beginning of the execution phase. This macro measures the amount of time remaining in the predictable interval using the CPU performance counter, and writes the time remaining to the yield register in the peripheral scheduler.

All remaining operations needed to transform the interval are performed by a new LLVM function pass. The pass iterates over all the functions in a compilation unit. When a function representing a predictable interval is found, the pass performs code transformation. First, our transform inlines called functions using preexisting LLVM inlining functions. This ensures that only a single stack frame and a single segment of code need to be prefetched into the cache. Second, our transform inserts code to read the CPU performance counter at the beginning of the interval and save the current time. Third, it inserts code to prefetch the stack frame and function arguments. Bringing the stack frame into the cache is done by inserting instructions into the program to fetch the stack pointer and frame pointer. Code is then inserted to prefetch the memory between the stack pointer and slightly beyond the frame pointer (to include function arguments) using the prefetcht2 instruction. Fourth, the transform prefetches the code of the function. This is done by transforming the program so that the function is placed within a unique ELF section. We then use a linker script to define variables pointing to the beginning and end of this unique ELF section. The compiler then adds code that prefetches the memory inside the ELF section. Finally, the pass identifies all return instructions inside the predictable interval function and adds a special function epilog before them. The epilog performs interval length enforcement by looping until the performance counter reaches the worst-case cycle count based on the time value saved at the beginning of the interval. It may also enable peripheral interrupts by writing the worst-case interrupt processing time to the peripheral scheduler's yield register.
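The code-prefetch transform can be visualized with a short sketch; the section and symbol names here are illustrative, not the ones used by the prototype's linker script.

    /* Symbols assumed to be exported by the linker script around the
     * interval's unique ELF section. */
    extern char __prem_text_start[];
    extern char __prem_text_end[];

    /* Compiler-inserted prefetch of the interval's instructions. */
    static inline void prefetch_interval_code(void)
    {
        for (char *p = __prem_text_start; p < __prem_text_end; p += 64)
            __builtin_prefetch(p, 0, 1);   /* cf. prefetcht2 on i386 */
    }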
To verify the correctness of the PREM real-time compiler prototype and to test its applicability, we used LLVM to compile a DES cypher benchmark. The DES benchmark was selected because it represents a typical real-time data flow application. The benchmark comprises one scheduling interval which encrypts a variable amount of data. We compiled it as both a predictable and a compatible interval (i.e., with and without prefetching), and measured the number of cache misses with a performance counter. Adapting the interval required no modification to any cypher functions and a total of 11 PREFETCH_DATA macros.

    Data size     4K    8K    32K   128K   512K   1M
    Compatible    138   254   954   3780   15k    31k
    Predictable   2     2     4     2      1      81

Table 1: DES benchmark (cache misses; data size in bytes).

Results are shown in Table 1 in terms of the number of cache misses suffered in the execution phase of the predictable interval (after prefetching), and in the entire compatible interval. Data size is in bytes. The compatible interval suffers an excessive number of cache misses, which increases roughly proportionally with the amount of processed data. Conversely, the execution phase of the predictable interval has almost zero cache misses, only suffering a small increase when large amounts of data are processed. The reason the number of cache misses is not zero is that the Q6700 CPU core used in our experiments uses a random cache replacement policy, meaning that with more than one contiguous memory region the probability of self-eviction is non-zero. In all the following experiments, we observed that the number of self-evictions is typically so small that it can be considered negligible.

7.4. WCET Experiments with Synthetic Tasks

In this section, we evaluate the effects of PREM on the execution time of a task. To quickly explore different execution parameters, we developed two synthetic applications, sketched below. In our linear access application, each scheduling interval operates on a 256-kilobyte global data structure. Data is accessed sequentially, and we vary the amount of computation performed between memory references. The random access application is similar, except that references inside the data structure are nonsequential.
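A minimal sketch of the two access patterns follows. The 256 KB size is from the text; the per-line stride, the index-scrambling scheme, and the work parameter that controls computation between references are our assumptions:

    /* Sketch of the two synthetic applications (illustrative only). */
    #include <stddef.h>

    #define DATA_SIZE (256 * 1024)
    static volatile char data[DATA_SIZE];

    long linear_access(int work) {
        long sum = 0;
        for (size_t i = 0; i < DATA_SIZE; i += 64) {      /* sequential references */
            sum += data[i];
            for (volatile int w = 0; w < work; w++) ;     /* computation between refs */
        }
        return sum;
    }

    long random_access(int work) {
        long sum = 0;
        size_t idx = 1;
        for (size_t n = 0; n < DATA_SIZE / 64; n++) {     /* nonsequential references */
            idx = (idx * 1103515245u + 12345u) % DATA_SIZE;
            sum += data[idx];
            for (volatile int w = 0; w < work; w++) ;
        }
        return sum;
    }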

For each application, we measured the execution time after compiling the program in two ways: into predictable intervals which prefetch the accessed memory, and into standard, compatible intervals. For each type of compilation, we ran the experiment in two ways, with and without I/O traffic transmitted by an 8-lane PCIe peripheral with a measured throughput of 1.2 Gbytes/s. In the case of compatible intervals, we transmitted traffic during the entire interval to mirror the worst case according to the traditional execution model.

Figures 8 and 9 show the observed worst-case execution time for any scheduling interval as a function of the cache stall time of the application, averaged over 10 runs. The cache stall time represents the percentage of time required to fetch cache lines out of an entire compatible interval, assuming a fixed (best-case) fetch time based on the maximum measured main-memory throughput. Only a single line is shown for predictable intervals because experiments confirmed that injecting traffic during the execution phase does not increase execution time. In all cases, the computation time decreases with an increase in stall time; this is because stall time is controlled by varying the amount of computation between memory references. Furthermore, execution times should not be compared between the two figures because the two applications execute different code.

[Figure 8: random access. Execution time (ms) versus cache stall time (%), for compatible intervals with traffic, compatible intervals, and predictable intervals.]

[Figure 9: linear access. Execution time (ms) versus cache stall time (%), same three curves.]

In the random access case, predictable intervals outperform compatible intervals (without peripheral traffic) by up to 28%, depending on the cache stall time. We believe this effect is primarily due to the behavior of DRAM main memory: accesses to adjacent addresses can be served more quickly in burst mode than accesses to random addresses. Thus, we can decrease the execution time by loading all the accessed memory into cache, in order, at the beginning of each predictable interval. Furthermore, note that transmitting peripheral traffic during a compatible interval can increase execution time by more than 60% in the worst case. In Figure 9, predictable intervals perform worse than compatible intervals (without peripheral traffic). We believe this is mainly due to out-of-order execution in the Q6700 core: in compatible intervals, while the core performs a cache fetch, instructions in the pipeline that do not depend on the fetched data can continue to execute. When performing linear accesses, fetches require less time and this effect is magnified. Furthermore, the gain in execution time for the case with peripheral traffic is decreased: this occurs because bursting data on the memory bus reduces the amount of blocking time suffered by a task due to peripheral interference (this effect has been previously analyzed in detail [15]). In practice, we expect the effect of PREM on an application's execution time to lie between the two figures, based on the specific memory access pattern.

7.5. System-wide Coscheduling Traces

We now present execution traces of the implemented system which demonstrate the advantage of the PREM coscheduling approach. The traces are obtained by using the peripheral scheduler as a logic analyzer for the various signals sent to or from the real-time bridges: data block, data ready, and interrupt block. Additionally, the peripheral scheduler has a trace register which allows timestamped trace information to be recorded with one-microsecond resolution when instructed by the main CPU, such as at the start and end of an execution interval. An execution trace is shown for a task running the traditional COTS execution model in Figure 10, and for the same task running within the PREM model in Figure 11³.

In the first trace (Figure 10), although the execution is divided into unpreemptable intervals, there is no memory phase prefetch and no constant execution time guarantee. When the scheduling intervals of task T1 finish executing, an I/O peripheral begins to access main memory, as may happen if T1 had written output data into RAM. Task T2 then executes, suffering cache misses that compete for main memory bandwidth with the I/O peripheral. Due to the cold cache and peripheral interference, the execution time of T2 grows from 0.5 ms (the execution time with warm cache and no peripheral I/O) to 2.9 ms as shown in the figure, an increase of about 600%.

In the second trace (Figure 11), the system executes according to the PREM execution model, where peripherals only access the bus when permitted by the peripheral scheduler. The predictable interval is divided into a memory phase and an execution phase. Instead of competing for main memory access, task T2 gets contentionless access to main memory during the memory phase. After all to-be-accessed data is loaded into the cache, the execution phase begins, which incurs no cache misses, and the peripheral is allowed to access the data in main memory. The constant execution time for the predictable interval in the PREM execution model is 1.6 ms, which is significantly lower than the worst case observed for the unscheduled trace (and is about the same as the execution time of the scheduling interval of T2 in the unscheduled trace with a cold cache and no peripheral traffic).

³ The (compatible) intervals at the end of T1 and at the start of T2 were measured as 0.107 ms and 0.009 ms, respectively, and have been exaggerated in the figures to be visible.

[Figure 10: An unscheduled bus-access trace (without PREM).]

[Figure 11: A scheduled trace using PREM.]

8. Conclusions

We have discussed the concept and implementation of a novel task execution model, the PRedictable Execution Model (PREM). Our evaluation shows that by enforcing a high-level co-schedule among CPU tasks and peripherals, PREM can greatly reduce or outright eliminate low-level contention for shared resource access. We plan to further develop our solution in two main directions. First, we will study extensions to our compiler infrastructure to lift some of the more restrictive code assumptions and to compile and test a larger set of benchmarks. Second, it is important to recall that contention for shared resources becomes more severe as the number of active components increases; in particular, worst-case execution time can greatly degrade in multicore systems [16]. Since PREM can make the system contentionless, we predict that the benefits of our approach will become even more significant when applied to multiple processor systems.

References

[1] S. Bak, E. Betti, R. Pellizzoni, M. Caccamo, and L. Sha. Real-time control of I/O COTS peripherals for embedded systems. In Proc. of the 30th IEEE Real-Time Systems Symposium, Washington, DC, Dec 2009.
[2] G. Buttazzo. Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and Applications. Kluwer Academic Publishers, Boston, 1997.
[3] S. A. Edwards and E. A. Lee. The case for the precision timed (PRET) machine. In DAC '07: Proc. of the 44th Annual Design Automation Conference, 2007.
[4] T. Facchinetti, G. Buttazzo, M. Marinoni, and G. Guidi. Non-preemptive interrupt scheduling for safe reuse of legacy drivers in real-time systems. In ECRTS '05: Proc. of the 17th Euromicro Conf. on Real-Time Systems, pages 98–105, 2005.
[5] H. Falk, P. Lokuciejewski, and H. Theiling. Design of a WCET-aware C compiler. In ESTMED '06: Proc. of the 2006 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia, pages 121–126, 2006.
[6] C. Ferdinand, F. Martin, and R. Wilhelm. Applying compiler techniques to cache behavior prediction. In Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Real-Time Systems, 1997.
[7] D. Grund and J. Reineke. Precise and efficient FIFO-replacement analysis based on static phase detection. In Proceedings of the 22nd Euromicro Conference on Real-Time Systems (ECRTS), Brussels, Belgium, July 2010.
[8] K. Hoyme and K. Driscoll. SAFEbus(tm). IEEE Aerospace and Electronic Systems Magazine, pages 34–39, Mar 1993.
[9] C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis and transformation. In Proc. of the International Symposium on Code Generation and Optimization, San Jose, CA, USA, Mar 2004.
[10] M. Lewandowski, M. Stanovich, T. Baker, K. Gopalan, and A. Wang. Modeling device driver effects in real-time schedulability: Study of a network driver. In Proc. of the 13th IEEE Real-Time Application Symposium, Apr 2007.
[11] J. Liedtke, H. Hartig, and M. Hohmuth. OS-controlled cache predictability for real-time systems. In Proceedings of the 3rd IEEE Real-Time Technology and Applications Symposium (RTAS), 1997.
[12] F. Mueller. Timing analysis for instruction caches. Real-Time Systems Journal, 18(2/3):272–282, May 2000.
[13] M. Paolieri, E. Quinones, F. J. Cazorla, and M. Valero. An analyzable memory controller for hard real-time CMPs. IEEE Embedded Systems Letters, 1(4), Dec 2009.
[14] PCI SIG. Conventional PCI 3.0, PCI-X 2.0 and PCI-E 2.0 Specifications. http://www.pcisig.com.
[15] R. Pellizzoni and M. Caccamo. Impact of peripheral-processor interference on WCET analysis of real-time embedded systems. IEEE Trans. on Computers, 59(3):400–415, Mar 2010.
[16] R. Pellizzoni, A. Schranzhofer, J.-J. Chen, M. Caccamo, and L. Thiele. Worst case delay analysis for memory interference in multicore systems. In Proceedings of Design, Automation and Test in Europe (DATE), Dresden, Germany, Mar 2010.

[17] I. Puaut and D. Hardy. Predictable paging in real-time systems: A compiler approach. In ECRTS '07: Proc. of the 19th Euromicro Conf. on Real-Time Systems, pages 169–178, 2007.
[18] H. Ramaprasad and F. Mueller. Bounding preemption delay within data cache reference patterns for real-time tasks. In Proc. of the IEEE Real-Time and Embedded Technology and Applications Symposium, Apr 2006.
[19] J. Reineke, D. Grund, C. Berg, and R. Wilhelm. Timing predictability of cache replacement policies. Real-Time Systems, 37(2), 2007.
[20] S. Schliecker, M. Negrean, G. Nicolescu, P. Paulin, and R. Ernst. Reliable performance analysis of a multicore multithreaded system-on-chip. In CODES/ISSS, 2008.
[21] I. Shin and I. Lee. Periodic resource model for compositional real-time guarantees. In Proceedings of the 23rd IEEE Real-Time Systems Symposium, Cancun, Mexico, Dec 2003.
[22] J. Whitham and N. Audsley. Implementing time-predictable load and store operations. In Proc. of the Intl. Conf. on Embedded Software (EMSOFT), Grenoble, France, Oct 2009.

A. Analysis of Cache Replacement Policy

We first provide a brief overview of each analyzed policy. Under the random policy, whenever a new cache line is loaded into an associative set, the line to be evicted is chosen at random among all N lines in the set. While this policy is implemented in some modern COTS CPUs with very large cache associativity, such as the Intel Core architecture, it is clearly ill-suited to real-time systems. In the FIFO policy, a FIFO queue is associated with each associative set, as shown in Figure 12(a). Each newly fetched cache line is inserted at the head of the queue (position q1), while the cache line which was previously at the back of the queue (position qN) is evicted. In the figure, l1, ..., l4 represent cache lines used in the predictable interval, while dashes represent other cache lines available in cache but unrelated to the predictable interval. Finally, grayed boxes represent invalidated cache lines.

[Figure 12: FIFO policy, example replacement queue. Panels (a)-(f) show successive states of the queue q1, ..., qN.]
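For illustration, the FIFO behavior just described can be modeled with a few lines of C (a toy single-set simulation, not part of the evaluated system):

    /* Toy model of one associative set under FIFO replacement: a new
       line enters at the head (q1) and the line at the back (qN) is
       evicted. Illustrative only. */
    #define N_WAYS 4
    static int fifo_set[N_WAYS] = { -1, -1, -1, -1 };  /* [0] is q1; -1 = unrelated */

    /* Returns the evicted line (or -1 on a hit). Note: under FIFO,
       accessing a line already in the set leaves the queue unchanged,
       the property exploited in the partial-invalidation analysis. */
    int fifo_access(int line) {
        for (int i = 0; i < N_WAYS; i++)
            if (fifo_set[i] == line) return -1;   /* hit: no queue update */
        int evicted = fifo_set[N_WAYS - 1];
        for (int i = N_WAYS - 1; i > 0; i--)      /* shift the queue back */
            fifo_set[i] = fifo_set[i - 1];
        fifo_set[0] = line;                       /* insert at q1 */
        return evicted;
    }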
The Least Recently Used (LRU) policy also uses a replacement queue, but a cache line is moved to the head of the queue whenever it is accessed. Due to the complexity of implementing LRU, an approximated version of the algorithm known as pseudo-LRU can be employed when N is a power of 2. A binary replacement tree with N - 1 internal nodes and N leaves q1, ..., qN is constructed, as shown in Figure 13 for N = 4. Each internal node encodes a "right" or "left" direction using a single bit of data, while each leaf encodes a fetched cache line. Whenever a replacement decision must be taken, the encoded directions are followed starting from the root to the leaf representing the cache line to be replaced. Furthermore, whenever a cache line is accessed, all directions on the path from the root to the leaf of the accessed cache line are set to the opposite of the followed path. Figure 13 shows an example where four cache lines are accessed in the order l1, l2, l3, l2, l4, causing a self-eviction.

[Figure 13: Pseudo-LRU policy, full-invalidation, Q′ = 1. Panels (a)-(f) show the accesses l1, l2, l3, l2, l4 on the replacement tree.]
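The tree mechanics can likewise be modeled in a few lines of C for N = 4. This toy simulation, with our own bit conventions, reproduces the self-eviction of l1 in the access sequence l1, l2, l3, l2, l4 from Figure 13:

    /* Toy 4-way pseudo-LRU set: bits[0] is the root, bits[1] covers
       leaves q1,q2, bits[2] covers q3,q4. A bit value of 0 means "the
       victim is on the left". On an access, every bit on the path is
       set to point away from the accessed leaf. Illustrative only. */
    static int bits[3];
    static int leaf[4] = { -1, -1, -1, -1 };   /* cached line per leaf; -1 = unrelated */

    static int victim(void) {                  /* follow the encoded directions */
        int r = bits[0];                       /* 0: left subtree, 1: right */
        return 2 * r + bits[r ? 2 : 1];        /* 0-based leaf index */
    }

    static void touch(int q) {                 /* point all bits away from leaf q */
        bits[0] = (q < 2) ? 1 : 0;
        if (q < 2) bits[1] = (q % 2) ? 0 : 1;
        else       bits[2] = (q % 2) ? 0 : 1;
    }

    int plru_access(int line) {                /* returns evicted line, -1 on hit */
        for (int q = 0; q < 4; q++)
            if (leaf[q] == line) { touch(q); return -1; }
        int q = victim(), evicted = leaf[q];
        leaf[q] = line;
        touch(q);
        return evicted;
    }

    /* plru_access(1); plru_access(2); plru_access(3); plru_access(2);
       plru_access(4);  => the last call evicts line 1: a self-eviction,
       even though an unrelated line is still present in the set. */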

Without loss of generality, in all proofs in this section we focus on the analysis of a single associative set with Q cache lines accessed during the predictable interval. Furthermore, let l1, ..., lQ be the set of Q cache lines in the order in which they are first accessed during the memory phase (note that this order can differ from the order in which the lines are prefetched, due to the Q′ lines that are accessed multiple times). Results for LRU and random are simple and well known, see [6] for example; we detail the bound derivation in the following theorem to provide a better understanding of the replacement policies.

Theorem 6. A memory phase will not suffer any cache self-eviction in an associative set requiring Q entries, if Q is at most equal to:
• 1 for the random replacement policy;
• N for the LRU replacement policy.

Proof. Clearly no self-eviction is possible if Q = 1, since after the unique cache line is fetched no other cache line can be accessed in the associative set.

Now consider the LRU policy. When a cache line li is first accessed, it is fetched (if it is not already in cache) and moved to the beginning of the replacement queue. Since no cache line outside of l1, ..., lQ is accessed in the associative set, when li is first accessed, cache lines l1, ..., l(i-1) must be at the head of the queue in some order. But Q ≤ N implies i - 1 ≤ N - 1, hence the algorithm always picks an unrelated line at the back of the queue to be replaced.

Note that for random the bound of 1 is tight, since in the worst case fetching a second cache line in the associative set can cause the first cache line to be evicted. Furthermore, note that if the cache is not invalidated between successive activations of the same predictable interval, then some of the l1, ..., lQ cache lines can still be available in cache. Assume that at the beginning of the predictable interval l2 is available but not l1. Then even with the LRU policy, prefetching l1 can potentially cause l2 to be evicted. According to our definition, this is not a self-eviction: l2 is evicted before it is accessed during the memory phase, and is then reloaded when it is first accessed, thus guaranteeing that all required lines are available in cache at the end of the memory phase. In fact, Theorem 6 proves that the bounds for random and LRU are independent of the state of the cache before the beginning of the predictable interval.

Unfortunately, the same is not true of FIFO and pseudo-LRU. Consider the FIFO policy. If a cache line li is already available in cache, then when li is accessed during the memory phase no modification is performed on the replacement queue. In some situations this can cause li to be evicted by another line fetched after li in the memory phase, causing a self-eviction. Due to this added complexity, when analyzing FIFO and pseudo-LRU we need to consider the invalidation model. A detailed analysis of FIFO replacement is provided in [7]; here we summarize the results and intuitions relevant to our system.

Theorem 7 (Directly follows from Lemma 1 in [7]). Assume FIFO replacement with full-invalidation semantics. Then no self-eviction is possible for an associative set with Q ≤ N.

The analysis for partial-invalidation FIFO is more complex. Figure 12 (analogous to Figure 2 in [7]) depicts a clarifying example, where the "mem. phase", "fetch" and "inval." labels show the state of the replacement queue after the memory phase and after some cache lines have been either fetched or invalidated outside the predictable interval, respectively. Note that after Step (e), lines l1, l2 remain in cache but not l3, l4. Hence, when l3 and l4 are fetched during the next memory phase in Step (f), they evict l1 and l2. The solution is to prefetch l1, ..., lQ multiple times in the memory phase, each time in the same order.

Theorem 8 (Directly follows from Theorem 3 in [7]). Assume FIFO replacement with partial-invalidation semantics. If the Q cache lines in an associative set, with Q ≤ N, are prefetched at least Q times in the same access order l1, ..., lQ, then all Q cache lines are available in cache at the end of the memory phase. Furthermore, no more than Q fetches are required.

Note that while executing the prefetching code Q times might seem onerous, in practice the majority of the time in the memory phase is spent fetching and writing back cache lines, and Theorem 8 ensures that no more than Q fetches and write-backs are required for each associative set.
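In code, the mitigation of Theorem 8 amounts to repeating the ordered prefetch pass, as sketched below (reusing the PREFETCH_DATA sketch from Section 7.3; the function name is ours):

    /* Memory phase under partial-invalidation FIFO: repeat the same
       ordered prefetch pass Q times. Passes over lines already in
       cache are cheap, since Theorem 8 bounds the total number of
       fetches by Q. */
    #include <stddef.h>

    void memory_phase_fifo(const void *region, size_t size, int Q) {
        for (int pass = 0; pass < Q; pass++)
            PREFETCH_DATA(region, size);   /* same access order every pass */
    }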

While LRU and FIFO allow prefetching up to N cache lines with N fetches, the same is not true of pseudo-LRU, even in the full-invalidation model. Consider again Figure 13, where no relevant lines are available in cache at the beginning of the memory phase but Q′ = 1, meaning that line l2 can be accessed multiple times. Then it is easy to see that no more than 3 cache lines can be prefetched without causing a self-eviction. The key intuition is that up to Step (d), and with respect to l1, l2, l3, the replacement tree encodes the LRU order exactly. However, after l2 is accessed again at Step (e), the LRU order is lost, and the next replacement causes l1 to be evicted instead of the remaining unrelated cache line. Furthermore, note that the strategy employed in Theorem 8 does not work in this case, because l1 has already been fetched during the memory interval before being evicted. A similar situation can happen even if Q′ = 0 in the partial-invalidation model, as shown in Figure 14. Assume that due to partial replacements and cache accesses, at the beginning of the memory phase the cache state is as shown in Figure 14(a), with l1, l2 and l3 available in cache but not l4. Then after the first three cache lines are prefetched in order, l1 will be replaced when l4 is next prefetched.

[Figure 14: Pseudo-LRU policy, partial-invalidation.]

A lower bound that is independent of Q′ and of the invalidation model was first proven in [19].

Theorem 9 (Theorem 10 in [19]). Under pseudo-LRU replacement, no self-eviction is possible for an associative set with Q ≤ log2 N + 1.

We now show in Theorem 12 that under the full-invalidation model, a better bound can be obtained based on the value of Q′. In the theorem, a subtree of a pseudo-LRU replacement tree is a tree rooted at any internal node of the replacement tree. For example, the replacement tree in Figures 13 and 14 has three subtrees: the whole tree, which has height 2 and leaves q1, ..., q4, and its left and right subtrees with height 1 and leaves q1, q2 and q3, q4, respectively. For each subtree of height k, we compute a lower bound Lb_k^p on the number of cache lines in l1, ..., lQ that can be allocated in the subtree (either because they are already available in cache at the beginning of the predictable interval or because they are fetched during the memory phase) without causing any self-eviction, assuming that none of the lines were available in cache at the beginning of the predictable interval and that up to p lines can be accessed multiple times during the memory phase. Note that by definition, Lb_k^i ≤ Lb_k^j if i ≥ j. Furthermore, for all p, Lb_1^p = 2 by Theorem 9. The following Lemmas 10 and 11 prove some important facts about Lb_k^p.

Lemma 10. Lb_k^0 = 2 Lb_{k-1}^0.

Proof. Consider once again the left and right subtrees with height k - 1. By contradiction and without loss of generality, assume that a self-eviction happens in the left subtree; then at least Lb_{k-1}^0 + 1 cache lines must be allocated in it. Since no relevant cache line is originally available in cache and furthermore no cache line is accessed more than once, it follows that every access results in a fetch and replacement. Furthermore, whenever a cache line is fetched in the left subtree, the root of the height-k subtree is changed to point to the right subtree, and vice versa. Hence, to fetch Lb_{k-1}^0 + 1 lines in the left subtree, at least Lb_{k-1}^0 lines must be fetched in the right subtree. Therefore, at least 2 Lb_{k-1}^0 + 1 total cache lines must be allocated in the whole height-k subtree to cause a self-eviction, implying that Lb_k^0 = 2 Lb_{k-1}^0 is a valid lower bound.

Note that from Lemma 10 and Lb_1^0 = 2 it immediately follows that Lb_k^0 = 2^k.
Lemma 11. For any subtree of height k > 1 with p > 0, Lb_k^p = min(Lb_{k-1}^{p-1} + 1, 2 Lb_{k-1}^p).

Proof. Consider the left and right subtrees with height k - 1, and assume that out of the p cache lines that can be accessed multiple times, i are allocated in the left subtree and j are allocated in the right subtree, with i + j = p. By contradiction and without loss of generality, assume that a self-eviction happens in the left subtree; then at least Lb_{k-1}^i + 1 cache lines must be allocated in it. We distinguish two cases.

Case (1): j > 0. Then following the same reasoning as in Theorem 9, it is sufficient to allocate a single cache line in the right subtree, resulting in a total of Lb_{k-1}^i + 2 lines required to cause a self-eviction; note that this value is minimized by maximizing i, i.e. when j = 1, i = p - 1, resulting in Lb_{k-1}^{p-1} + 2 required cache lines.

Case (2): j = 0. Then following the same reasoning as in Lemma 10, at least Lb_{k-1}^i cache lines must be allocated in the right subtree, for a total of 2 Lb_{k-1}^i + 1 = 2 Lb_{k-1}^p + 1 cache lines.

Combining cases (1) and (2), it follows that no eviction is possible if the number of allocated lines is at most equal to min(Lb_{k-1}^{p-1} + 1, 2 Lb_{k-1}^p), which concludes the proof.

We can finally prove Theorem 12.

Theorem 12. Under pseudo-LRU replacement with full-invalidation semantics, no self-eviction is possible if:

\[
Q \le \begin{cases} Q' + \dfrac{N}{2^{Q'}} & \text{if } Q' < \log_2 N;\\[4pt] \log_2 N + 1 & \text{otherwise.} \end{cases}
\]
Proof. The proof proceeds by induction on the pair (p, k), ordered by p first, i.e. in the sequence (0, 1), ..., (0, Q), (1, 1), ..., (1, Q), ..., (Q', 1), ..., (Q', Q). In particular, we show that the following property holds for all p, k:

\[
Lb_k^p = \begin{cases} 2^{k-p} + p & \text{if } p < k;\\ k + 1 & \text{otherwise.} \end{cases}
\]

Since a pseudo-LRU replacement tree for an N-associative set has height log2 N, the theorem then follows.

By Lemma 10, the property is verified for p = 0 since Lb_k^0 = 2^k. By Theorem 9, the property is verified for k = 1 since Lb_1^p = 2. Therefore, it remains to complete the induction step by showing that the property holds at step (p, k) with p > 0, k > 1. We do so by applying Lemma 11, assuming that the property also holds at the previous steps (p - 1, k - 1) and (p, k - 1). We consider three cases based on the relative values of p and k.

Case (1): p < k - 1. We have to show that Lb_k^p = 2^{k-p} + p. Then:

\[
Lb_k^p = \min\!\left(2^{(k-1)-(p-1)} + (p-1) + 1,\; 2\left(2^{(k-1)-p} + p\right)\right) = \min\!\left(2^{k-p} + p,\; 2^{k-p} + 2p\right) = 2^{k-p} + p.
\]

Case (2): p = k - 1. We have to show that Lb_k^p = 2^{k-p} + p = k + 1. Then:

\[
Lb_k^p = \min\!\left(2^{(k-1)-(p-1)} + (p-1) + 1,\; 2((k-1) + 1)\right) = \min\!\left(2^{k-p} + p,\; 2k\right) = k + 1.
\]

Case (3): p ≥ k. We have to show that Lb_k^p = k + 1. Then:

\[
Lb_k^p = \min\!\left((k-1) + 1 + 1,\; 2((k-1) + 1)\right) = \min(k + 1, 2k) = k + 1.
\]
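As a concrete instance of Theorem 12 (our own worked example), consider a 16-way associative set (N = 16). The bound evaluates to

\[
Q \le \begin{cases}
0 + 16/2^0 = 16 & Q' = 0,\\
1 + 16/2^1 = 9 & Q' = 1,\\
2 + 16/2^2 = 6 & Q' = 2,\\
3 + 16/2^3 = 5 & Q' = 3,\\
\log_2 16 + 1 = 5 & Q' \ge 4,
\end{cases}
\]

i.e. every additional multiply-accessed line roughly halves the prefetchable portion of the set, until the general pseudo-LRU bound of Theorem 9 takes over.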

B. Solving The Optimization Problem

Using mbf_{U_i}(t_i) = α_i t_i + δ_i, the optimization problem of Equations 4-6 can be rewritten as follows:

\[
sbf_U(t) = \sum_{i=1}^{N} \left(\delta_i + \alpha_i E_i^{min}(t)\right) + \max \sum_{i=1}^{N} \alpha_i x_i, \tag{11}
\]
\[
\sum_{i=1}^{N} x_i \le t - \sum_{i=1}^{N} E_i^{min}(t), \tag{12}
\]
\[
\forall i, 1 \le i \le N : \; 0 \le x_i \le E_i^{max}(t) - E_i^{min}(t), \tag{13}
\]

where we substituted x_i = t_i - E_i^min(t). Without loss of generality, assume that the N tasks are ordered by non-increasing values of α_i. Then Algorithm 1 computes the solution val to Equations 4-6. The algorithm first computes the time bound t - Σ_{i=1}^N E_i^min(t) on the sum of the variables x_i. Then, starting from x_1, it assigns to each variable the minimum between the remaining time t_rem and the upper constraint of Equation 13. Since the tasks are ordered by non-increasing values of α_i, it is trivial to see that Algorithm 1 computes the maximum of Equation 11. Furthermore, the while loop at Lines 5-10 is executed at most N times, and each iteration requires constant time. Hence, the algorithm has complexity O(N).

Algorithm 1 Compute sbf_U(t) for a given t.
1: procedure COMPUTEINSTANT(t, τ_1, ..., τ_N ordered by non-increasing α_i)
2:   t_rem := t - Σ_{i=1}^N E_i^min(t)
3:   val := Σ_{i=1}^N (δ_i + α_i E_i^min(t))
4:   i := 1
5:   while t_rem > 0 and i ≤ N do
6:     x_i := min(t_rem, E_i^max(t) - E_i^min(t))
7:     t_rem := t_rem - x_i
8:     val := val + α_i x_i
9:     i := i + 1
10:  end while
11:  return (val, i, {x_1, ..., x_N})
12: end procedure

Algorithm 1 computes sbf_U(t) for a specific value of t. To test the schedulability condition of Equation 2 using the upper bound tbf_U(t) = min{x | sbf_L(x) ≥ t}, sbf_U(t) must be computed for all t in the interval [0, max_i D_i^{I/O}]. The reason is as follows. Note that if for any i:

\[
tbf_U\!\left(e_i^{I/O,k} + \sum_{l \in hp_i^{I/O}} \left\lceil \frac{r_i^{I/O}}{p_l^{I/O}} \right\rceil e_l^{I/O}\right) > \max_i D_i^{I/O}, \tag{14}
\]

then the system is not schedulable. By definition of tbf_U(t), if e_i^{I/O,k} + Σ_{l∈hp_i^{I/O}} ⌈r_i^{I/O}/p_l^{I/O}⌉ e_l^{I/O} > sbf_L(max_i D_i^{I/O}), then Equation 14 is verified. Hence, it suffices to compute sbf_L(t) and sbf_U(t) in [0, max_i D_i^{I/O}].
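For concreteness, a direct C rendering of Algorithm 1 could look as follows. This is a sketch: the task model is reduced to precomputed arrays alpha[], delta[], Emin[], Emax[] for the given t (an assumption of this illustration), with tasks already sorted by non-increasing alpha:

    /* Greedy computation of sbf_U(t) for one time instant, mirroring
       Lines 2-10 of Algorithm 1; runs in O(N). */
    double compute_instant(double t, int N, const double alpha[],
                           const double delta[], const double Emin[],
                           const double Emax[]) {
        double t_rem = t, val = 0.0;
        for (int i = 0; i < N; i++) {           /* Lines 2-3 */
            t_rem -= Emin[i];
            val += delta[i] + alpha[i] * Emin[i];
        }
        /* Give each x_i its maximum share while time remains; optimal
           because alpha is non-increasing. */
        for (int i = 0; i < N && t_rem > 0.0; i++) {
            double x = Emax[i] - Emin[i];
            if (x > t_rem) x = t_rem;
            t_rem -= x;
            val += alpha[i] * x;
        }
        return val;
    }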

Luckily, we do not need to run Algorithm 1 for all values of t in [0, max_i D_i^{I/O}]. Since the functions E_i^max(t), E_i^min(t) are piecewise linear, we can obtain sbf_U(t) by interpolation of a finite number of values, as shown in Algorithm 2. The algorithm returns sbf_U(t) as a set of points p_set = {..., (t', val'), (t'', val''), ...}, where for t' ≤ t ≤ t'':

\[
sbf_U(t) = val' + (t - t')\,\frac{val'' - val'}{t'' - t'},
\]

i.e. sbf_U(t) is the linear interpolation between (t', val') and (t'', val''). The algorithm first computes in Line 4 the set t^1, ..., t^{M-1} of angular points of E_i^max(t), E_i^min(t) in the interval [0, max_i D_i^{I/O}), with t^M being the maximum value of interest max_i D_i^{I/O}; note that this implies that in every open interval (t^j, t^{j+1}), each function E_i^max(t), E_i^min(t) has a constant slope equal to either 0 or 1. In particular, function EMaxAdd(l, t^j) returns the slope of E_l^max(t) in the interval (t^j, t^{j+1}), while EMinAdd(l, t^j) returns the slope of E_l^min(t) in the interval (t^j, t^{j+1}). For each point t^j, Algorithm 1 is applied to compute the value val = sbf_U(t^j), as well as the index i of the last task for which variable x_i has been assigned a non-zero value. (t^j, val) is then inserted in p_set at Line 11.

Algorithm 2 Compute sbf_U(t) in [0, max_i D_i^{I/O}].
1: procedure COMPUTEINTERVAL(max_i D_i^{I/O}, τ_1, ..., τ_N)
2:   ∀i, 1 ≤ i ≤ N: compute α_i, δ_i.
3:   Order τ_1, ..., τ_N by non-increasing values of α_i.
4:   Compute t^1, ..., t^{M-1} as the set of angular points of all E_i^max(t), E_i^min(t) in [0, max_i D_i^{I/O}).
5:   t^M := max_i D_i^{I/O}
6:   p_set := ∅
7:   for j = 1 ... M - 1 do
8:     (val, i, {x_1, ..., x_N}) := ComputeInstant(t^j, τ_1, ..., τ_N)
9:     t_cur := t^j
10:    while t_cur < t^{j+1} do
11:      add (t_cur, val) to p_set
12:      div := Σ_{l=1}^{i-1} EMaxAdd(l, t^j) + Σ_{l=i}^{N} EMinAdd(l, t^j)
13:      if div > 1 then
14:        Δ := x_i / (div - 1)
15:        t_cur := t_cur + Δ
16:        val := val + Δ (Σ_{l=1}^{i-1} α_l EMaxAdd(l, t^j) + Σ_{l=i}^{N} α_l EMinAdd(l, t^j)) - α_i x_i
17:        i := i - 1
18:      else if (div = 0 & EMaxAdd(i, t^j) = 0) | (div = 1 & EMaxAdd(i, t^j) = 0 & EMinAdd(i, t^j) = 1) then
19:        Δ := E_i^max(t_cur) - E_i^min(t_cur) - x_i
20:        t_cur := t_cur + Δ
21:        val := val + α_i Δ
22:        i := i + 1
23:      else
24:        break
25:      end if
26:    end while
27:  end for
28:  return p_set
29: end procedure

Finally, the algorithm iterates in Lines 10-26, increasing the current time value t_cur starting at t_cur = t^j, until t_cur exceeds the next angular point t^{j+1}. Note that when Algorithm 1 is run to compute the solution to Equations 11-13 for t_cur = t^j, the variables x_l are assigned as follows:

\[
x_l = E_l^{max}(t_{cur}) - E_l^{min}(t_{cur}) \quad \forall l, 1 \le l \le i-1, \tag{15}
\]
\[
x_l = 0 \quad \forall l, i < l \le N, \tag{16}
\]
\[
x_i = t^j - \sum_{l=1}^{N} E_l^{min}(t_{cur}) - \sum_{l \ne i} x_l = t^j - \sum_{l=1}^{i-1} E_l^{max}(t_{cur}) - \sum_{l=i}^{N} E_l^{min}(t_{cur}). \tag{17}
\]

Now consider the solution val̄, with variable assignment x̄_1, ..., x̄_N, computed by Algorithm 1 for the time instant t̄ = t_cur + Δ, where Δ is a small enough value. Note that as long as t̄ ≤ t^{j+1}, E_l^max(t̄) = E_l^max(t_cur) + Δ EMaxAdd(l, t_cur) and similarly E_l^min(t̄) = E_l^min(t_cur) + Δ EMinAdd(l, t_cur). Furthermore, define div ≡ Σ_{l=1}^{i-1} EMaxAdd(l, t^j) + Σ_{l=i}^{N} EMinAdd(l, t^j), as computed in Line 12. Then based on Equations 15-17, we obtain:

\[
\bar{x}_l = E_l^{max}(\bar{t}) - E_l^{min}(\bar{t}) \quad \forall l, 1 \le l \le i-1, \tag{18}
\]
\[
\bar{x}_l = 0 \quad \forall l, i < l \le N, \tag{19}
\]
\[
\bar{x}_i = \bar{t} - \sum_{l=1}^{i-1} E_l^{max}(\bar{t}) - \sum_{l=i}^{N} E_l^{min}(\bar{t}) = t^j + \Delta(1 - div) - \sum_{l=1}^{i-1} E_l^{max}(t_{cur}) - \sum_{l=i}^{N} E_l^{min}(t_{cur}) = x_i + \Delta(1 - div), \tag{20}
\]
\[
\overline{val} = \sum_{l=1}^{N} \left(\delta_l + \alpha_l E_l^{min}(\bar{t})\right) + \sum_{l=1}^{N} \alpha_l \bar{x}_l = val + \Delta \sum_{l=1}^{i-1} \alpha_l\, EMaxAdd(l, t^j) + \Delta \sum_{l=i}^{N} \alpha_l\, EMinAdd(l, t^j) + \alpha_i \Delta(1 - div), \tag{21}
\]

and this solution is valid as long as Δ ≤ t^{j+1} - t_cur and furthermore Δ is small enough that 0 ≤ x̄_i ≤ E_i^max(t̄) - E_i^min(t̄); note that based on Equation 20, the latter condition can be rewritten as:

\[
x_i + \Delta(1 - div) \ge 0, \tag{22}
\]
\[
x_i + \Delta(1 - div) \le E_i^{max}(t_{cur}) - E_i^{min}(t_{cur}) + \left(EMaxAdd(i, t^j) - EMinAdd(i, t^j)\right)\Delta. \tag{23}
\]

Since x̄_i can be increasing or decreasing with Δ based on the value of div, we have to consider several different cases. In all cases, note that Equation 21 is linear in Δ; hence, as long as the equation holds in an interval (t', t''), the solution to Equations 11-13 can be obtained by linear interpolation of (t', val'), (t'', val'').

Case (1): div > 1 (Lines 14-17). The constraint in Equation 23 is verified for all Δ. Solving Equation 22 yields Δ ≤ x_i/(div - 1). We have two subcases (a) and (b). (a) Assume that x_i/(div - 1) < t^{j+1} - t_cur. Then Equations 20 and 21 are valid in the interval [t_cur, t_cur + x_i/(div - 1)]. Note that for Δ = x_i/(div - 1), x̄_i = 0. Furthermore, we have α_i Δ(1 - div) = -α_i x_i. Hence, val̄ can be computed as in Line 16, the new values of t_cur, val are assigned as t_cur := t_cur + x_i/(div - 1), val := val̄, and the new point (t_cur, val) is inserted in Line 11 at the next iteration. Finally, note that at the next iteration, all variables x_l with l < i - 1 are still assigned their maximum value E_l^max(t_cur) - E_l^min(t_cur) and all x_l, l ≥ i, are set to 0. Hence, we can apply the same reasoning as in Equations 18-21 after setting i := i - 1 as in Line 17. In particular, note that i can never be assigned the invalid value 0. By contradiction, assume that in the current iteration i = 1. Then all x_l = 0, meaning that t̄ = Σ_{l=1}^N E_l^min(t̄) according to Equation 12; this in turn implies Σ_{l=1}^N EMinAdd(l, t^j) = 1, which contradicts div > 1. (b) Assume that x_i/(div - 1) ≥ t^{j+1} - t_cur. Then Equations 20 and 21 are valid in the interval [t_cur, t^{j+1}], the condition of the while loop in Line 10 becomes false, and the point (t^j, val) is added to p_set in the next iteration of the for loop at Line 7.
Case (2): div = 0 and EMaxAdd(i, t^j) = 0 (Lines 19-22). Note that div = 0 implies EMinAdd(i, t^j) = 0. The constraint in Equation 22 is verified for all Δ. Solving Equation 23 yields Δ ≤ E_i^max(t_cur) - E_i^min(t_cur) - x_i. Once again we have two subcases. (a) Assume that E_i^max(t_cur) - E_i^min(t_cur) - x_i < t^{j+1} - t_cur. Then Equations 20 and 21 are valid in the interval [t_cur, t_cur + E_i^max(t_cur) - E_i^min(t_cur) - x_i]. Note that for Δ = E_i^max(t_cur) - E_i^min(t_cur) - x_i, x̄_i is set to the maximum value E_i^max(t̄) - E_i^min(t̄). Also note that in Equation 21, α_i Δ(1 - div) = α_i Δ and Σ_{l=1}^{i-1} α_l Δ EMaxAdd(l, t^j) + Σ_{l=i}^{N} α_l Δ EMinAdd(l, t^j) = 0, hence val̄ can be computed as in Line 21. The same considerations as in Case (1a) then apply, except that we set i := i + 1, since in the next iteration all variables x_l with l ≤ i are assigned their maximum value E_l^max(t_cur) - E_l^min(t_cur) and all x_l, l ≥ i + 1, are still set to 0. Note that it is possible to set i to the invalid value N + 1; this is covered in Case (6). (b) Assume that E_i^max(t^j) - E_i^min(t^j) - x_i ≥ t^{j+1} - t_cur. Then the same considerations as in Case (1b) apply.

Case (3): div = 1 and EMaxAdd(i, t^j) = 0, EMinAdd(i, t^j) = 1 (Lines 19-22). As in Case (2), the constraint in Equation 22 is verified for all Δ, while solving Equation 23 yields Δ ≤ E_i^max(t_cur) - E_i^min(t_cur) - x_i. Furthermore, note that for Δ = E_i^max(t_cur) - E_i^min(t_cur) - x_i, x̄_i is again set to the maximum value E_i^max(t̄) - E_i^min(t̄), and from Equation 21 we obtain val̄ = val + α_i Δ. Hence, this case is equivalent to Cases (2a)-(2b) and can be handled by the same Lines 19-22.

Case (4): div = 0 and EMaxAdd(i, t^j) = 1 (Line 24). As in Case (2), note that div = 0 implies EMinAdd(i, t^j) = 0. Then both Equations 22 and 23 are verified for all Δ. Hence, as in Case (1b), Equations 20 and 21 are valid in the interval [t_cur, t^{j+1}] and the point (t^j, val) is added to p_set in the next iteration of the for loop.

Case (5): div = 1 and either EMaxAdd(i, t^j) = 1 or EMinAdd(i, t^j) = 0 (Line 24). Then again both Equations 22 and 23 are verified for all Δ, hence this case is equivalent to Case (4).

Case (6): i > N (Line 24). This can only happen if in the previous iteration either Case (2a) or (3a) was executed with i = N. Note that in both cases, in the current iteration div = Σ_{l=1}^N EMaxAdd(l, t^j) = 0, hence Line 24 is executed. We show that the assignment where each variable is maximal, i.e. x_l = E_l^max(t̄) - E_l^min(t̄), is optimal for all t̄ in the interval [t_cur, t^{j+1}]. Equation 12 is equivalent to Σ_{l=1}^N (E_l^max(t̄) - E_l^min(t̄)) ≤ t̄ - Σ_{l=1}^N E_l^min(t̄), which is in turn equivalent to Σ_{l=1}^N E_l^max(t̄) ≤ t̄. But since EMaxAdd(l, t^j) = 0 for all l, and furthermore at t_cur it must hold that Σ_{l=1}^N E_l^max(t_cur) ≤ t_cur, Equation 12 is verified for all t̄ in [t_cur, t^{j+1}]. Hence, the assignment x_l = E_l^max(t̄) - E_l^min(t̄) must result in the optimum. Computing val̄ according to Equation 21 results in val̄ = val, hence sbf_U(t) is constant between the point (t_cur, val) and the point (t^j, val) added to p_set in the next iteration of the for loop.

We can now prove our main theorem.

Theorem 13. A correct upper bound sbf_U(t) to sbf(t) in the interval [0, max_i D_i^{I/O}] can be computed by linear interpolation of the point set returned by Algorithm 2.

Proof. Based on the discussion above, whenever Algorithm 2 inserts two consecutive points (t', val'), (t'', val'') in p_set, sbf_U(t) can be computed by linear approximation of (t', val'), (t'', val'') for all t in the interval [t', t'']. Since furthermore points are inserted for the beginning and end of the interval [0, max_i D_i^{I/O}], to conclude the proof it remains to show that the algorithm terminates; this is not obvious, since in the while loop in Lines 10-26, Δ can be set to 0 and furthermore i can be either incremented or decremented. We show that if i is incremented in the first iteration of the while loop, then it cannot be decremented in further iterations; similarly, if i is decremented in the first iteration, it cannot be incremented afterwards. Since i ≥ 1 by Case (1a), and furthermore the while loop is terminated whenever i > N, it follows that the loop is executed at most N + 1 times.

By contradiction, assume that Lines 14-17 are executed in one iteration, and Lines 19-22 in the following one. Note that if i is incremented or decremented by 1, the value of div can only change by 1. Hence, it must hold that div = 2 in the first iteration and div = 1 in the second. However, this implies EMaxAdd(i, t^j) = 1 in the second iteration, hence Lines 19-22 cannot be executed. Similarly, assume that Lines 19-22 are executed in one iteration, and Lines 14-17 in the following one; then it must hold that div = 1 in the first iteration and div = 2 in the second. This implies EMinAdd(i, t^j) = 0 in the first iteration, hence Lines 19-22 cannot be executed.

It remains to discuss the computational complexity of Algorithm 2. Note that in a time window of length t, the number of angular points for E_i^max(t) or E_i^min(t) is not more than 2 + 2⌈t/p_i⌉. Hence, an upper bound on the number of iterations of the for loop at Line 7 can be computed as

\[
N\left(4 + 4\left\lceil \frac{\max_i D_i^{I/O}}{\min_i p_i} \right\rceil\right) + 1,
\]

where the final 1 accounts for t^M = max_i D_i^{I/O}. Lines 8-9 can be executed in O(N). Based on the proof of Theorem 13, the while loop at Line 10 is repeated at most N + 1 times. Finally, note that Lines 11-25 can be optimized to run in constant time rather than linear time: since the value of i changes by 1 between successive iterations, the summations in Lines 12 and 16 can be computed from their values at the previous step in constant time. In conclusion, Algorithm 2 has a pseudo-polynomial complexity of

\[
O\!\left(N^2 \,\frac{\max_i D_i^{I/O}}{\min_i p_i}\right).
\]