Multicore Operating-System Support for Mixed Criticality∗

Multicore Operating-System Support for Mixed Criticality∗ James H. Anderson, Sanjoy K. Baruah, and Bjorn¨ B. Brandenburg The University of North Carolina at Chapel Hill Abstract Different criticality levels provide different levels of assurance against failure. In current avionics designs, Ongoing research is discussed on the development of highly-critical software is kept physically separate from operating-system support for enabling mixed-criticality less-critical software. Moreover, no currently deployed air- workloads to be supported on multicore platforms. This craft uses multicore processors to host highly-critical tasks work is motivated by avionics systems in which such work- (more precisely, if multicore processors are used, and if loads occur. In the mixed-criticality workload model that is such a processor hosts highly-critical applications, then all considered, task execution costs may be determined using but one of its cores are turned off). These design decisions more-stringent methods at high criticality levels, and less- are largely driven by certification issues. For example, cer- stringent methods at low criticality levels. The main focus tifying highly-critical components becomes easier if poten- of this research effort is devising mechanisms for providing tially adverse interactions among components executing on “temporal isolation” across criticality levels: lower levels different cores through shared hardware such as caches are should not adversely “interfere” with higher levels. simply “defined” not to occur. Unfortunately, hosting an overall workload as described here clearly wastes processing resources. 1 Introduction In this paper, we propose operating-system infrastruc- ture that allows applications of different criticalities to be The evolution of computational frameworks for avionics co-hosted on a multicore platform. Our specific focus is systems is being impacted by two trends. First, such frame- ensuring real-time correctness. We assume that criticalities works are becoming increasingly complex. While this is are defined as in the RTCA DO-178B software standard, generally true, a particularly good case in point is next- which specifies five criticality levels, as listed in Table 1 (a generation unmanned systems, where complex decision- more thorough discussion of criticalities is given later). We making software must be used to perform pilot functions. further assume that the workload to be supported is speci- Second, there is increasing pressure to host such complex fied as a collection of periodic tasks with harmonic periods. systems using hardware platforms that require less weight The OS development platform used in our work is and volume. a UNC-produced open-source system called LITMUSRT These trends are occurring at a time when multicore (LInux Testbed for MUltiprocessor Scheduling in Real- chip designs have replaced single-core designs in the prod- Time systems) [7, 9, 12, 14, 15, 26]. LITMUSRTis an uct lines of most chip manufacturers. This development extension of Linux (currently, version 2.6.24) that allows is profound, as it means that multiprocessors are now a different (multiprocessor) real-time scheduling algorithms “common-case” platform that will be used across a wide to be linked at runtime as plug-in components. Relevant range of application domains. In the avionics domain, details regarding LITMUSRT, as well as our rationale in multicore platforms offer the potential of enabling greater choosing it as a development platform, are given later. The computational capabilities in less space. Unfortunately, main focus of this paper is to report on the status of our im- currently-used avionics operating systems do not provide plementation efforts in extending LITMUSRT to support sufficient functionality to allow applications of varying crit- multi-criticality workloads on multicore platforms. We icalities to be co-hosted on a multicore platform. In this also more broadly discuss some of the challenges inherent paper, we discuss ongoing research that is being conducted in supporting such workloads. to address this issue. The rest of this paper is organized as follows. In Sec. 2, ∗ we provide relevant background material and discuss re- Work supported by IBM, Intel, and Sun Corps.; NSF grants CNS RT 0834270, CNS 0834132, and CNS 0615197; and ARO grant W911NF- lated work. In Sec. 3, we describe the LITMUS -based 06-1-0425. implementation efforts mentioned above. In Sec. 4, we Level Failure Condition Interpretation A Catastrophic Failure may cause a crash. B Hazardous Failure has a large negative impact on safety or performance, or reduces the abil- ity of the crew to operate the plane due to physical distress or a higher workload, or causes serious or fatal injuries among the passengers. C Major Failure is significant, but has a lesser impact than a Hazardous failure (for example, leads to passenger discomfort rather than injuries). D Minor Failure is noticeable, but has a lesser impact than a Major failure (for example, causing passenger inconvenience or a routine flight plan change). E No Effect Failure has no impact on safety, aircraft operation, or crew workload. Table 1: DO-178B is a software development process standard, Software Considerations in Airborne Systems and Equipment Certifica- tion, published by RTCA, Inc. The United States Federal Aviation Authority (FAA) accepts the use of DO-178B as a means of certifying software in avionics applications. RTCA DO-178B assigns criticality levels to tasks categorized by effects on commercial aircraft. discuss a number of future extensions to the system being deadlines can never be missed, while in a soft real-time built. We conclude in Sec. 5. (SRT) system, some misses are tolerable. SRT schedulability can be defined in different ways; we assume here that a SRT system is schedulable if deadline tardiness is 2 Background bounded (by a reasonably small constant). In determining schedulability (hard or soft), the processor share required In the following subsections, we present relevant back- by each task T is of importance. The share required by T ground information on multiprocessor real-time schedul- is given by the quantity T:u = T:e=T:p, which is called its ing, multi-criticality scheduling, and scheduling ap- utilization. proaches for ensuring temporal isolation. We also present a brief overview of the LITMUSRTsystem being used in our research. With regard to scheduling, two kinds of algorithms ac- tually are of interest: algorithms used within an OS to 2.1 Multiprocessor Real-Time Scheduling make scheduling decisions (of course) and algorithms used to test schedulability (i.e., timing correctness). The lat- When designing a real-time application, the goal is to pro- ter algorithm is necessarily dependent on the former. In duce a schedulable system, i.e., one whose timing con- the multiprocessor case, a scheduling algorithm may fol- straints can be guaranteed. Timing constraints are usu- low a partitioning or global scheduling approach (or some ally expressed using deadlines. In this paper, we con- combination of the two): under partitioning, tasks are stat- sider a well-studied task model in which such notions arise, ically assigned to processors, while under global schedul- namely the periodic task model [20]. In this model, the ing, they are scheduled from a single run queue and may scheduled system is comprised of a collection of recurring migrate among processors. An example of a partition- sequential tasks. Each such task T releases a succession of ing algorithm is partitioned EDF (P-EDF), which uses jobs and is defined by specifying a period T:p and an ex- the uniprocessor earliest-deadline-first (EDF) algorithm as ecution cost T:e. Successive jobs of T are released every the per-processor scheduler. An example of a global algo- T:p time units, starting at time 0, and one that is released rithm is global EDF (G-EDF), under which tasks are EDF- at time t has a deadline at time t + T:p. Also, each such scheduled using a single priority queue. In this paper, P- job executes for at most T:e time units and its required pro- EDF and EDF are considered extensively. In addition, we cessing time should be allocated between its release and consider the cyclic executive approach, wherein scheduling deadline. In this paper, we make the simplifying assump- decisions are statically pre-determined offline and specified tion that task periods are harmonic. This means that, for in a dispatching table [2]. If task periods are harmonic, then any two tasks T and U, if T:p < U:p, then U:p is an integer the required dispatching table requires pseudo-polynomial multiple of T:p. In avionics systems, task periods are often space. In avionics and other safety-critical domains, the harmonic. cyclic executive approach is often preferred, because dis- Real-time schedulability. In the periodic model, the def- patching tables (once constructed) are relatively easy to inition of the term “schedulable” depends on whether dead- certify as correct. In the case of cyclic executives, we as- lines are hard or soft. In a hard real-time (HRT) system, sume partitioned scheduling (one dispatching table per processor). Global vs. partitioned scheduling. Generally speaking, “reclaimed” at system design time itself, and can be used partitioning algorithms are preferable for HRT workloads, for guaranteeing deadline compliance of lower-criticality and global algorithms are preferable for SRT workloads. If tasks. The multi-criticality task model is remarkably pow- all deadlines are hard, or if partitioning or static-priority erful and expressive. For instance, in order to obtain a scheduling is used, then restrictive caps on overall uti- WCET bound for a task at a higher level of assurance (i.e., lization (of up to half the system’s processing capacity) at a degree of confidence corresponding to a higher critical- must be enforced to ensure that real-time constraints are ity), it is typically necessary to limit the kinds of program- met [16].1 Because such caps are unavoidable in the ming constructs that may be used in that task (loops, for HRT case, partitioning approaches tend to be better be- example, may be required to have static bounds).

Multicore Operating-System Support for Mixed Criticality∗

Security Assurance Requirements for Linux Application Container Deployments

Container and Kernel-Based Virtual Machine (KVM) Virtualization for Network Function Virtualization (NFV)

Containerization Introduction to Containers, Docker and Kubernetes

Control Groups (Cgroups)

A Container-Based Approach to OS Specialization for Exascale Computing

Cgroups Py: Using Linux Control Groups and Systemd to Manage CPU Time and Memory

Kvm – Kernel Based Virtual Machine

Virtualization-Based Approaches for Mitigation of Malware Threats

Cgroups) (Plus Some Systemd Evangelizing) Who Am I?

Cgroups? Why Use Cgroups? How Is Cgroups Implemented? Subsystems Cgroup ﬁlesystem Cgroup Hierarchy

Cgroups, Containers and Htcondor, Oh My

Red Hat Enterprise Linux 8 Managing, Monitoring and Updating the Kernel