<<
Home , QNX

DOCTORAL T H E SIS

Department of Science, Electrical and Space Engineering EISLAB Johan Eriksson Enabling Johan Reactive Eriksson Design of Robust Real-Time Embedded Systems

ISSN 1402-1544 Enabling Reactive Design of ISBN 978-91-7583-835-9 (print) ISBN 978-91-7583-836-6 (pdf) Robust Real-Time Embedded Systems Luleå University of Technology 2017

Johan Eriksson

Embedded Systems

Enabling Reactive Design of Robust Real-Time Embedded Systems

Johan Eriksson

Dept. of Computer Science, Electrical and Space Engineering Lule˚a University of Technology Lule˚a, Sweden

Supervisor: Per Lindgren Printed by Luleå University of Technology, Graphic Production 2017

ISSN 1402-1544 ISBN 978-91-7583-835-9 (print) ISBN 978-91-7583-836-6 (pdf) Luleå 2017 www.ltu.se iii iv Abstract

This thesis aims at advancing the state-of-the-art related to efficient, safe and robust design and execution of Embedded Real-Time Systems by bridging previous published work on code analysis with a simple reactive Model of Computation (MoC). The work builds on the previous licentiate thesis by the author ”Embedded real-time software using TinyTimber: reactive objects in C”, where the C language is used to design reactive em- bedded systems by adopting the Timber concurrency model complemented by a simple reactive scheduler ”tinyTimber”. The proposed method in the previous work is shown to be useful for the design of embedded systems. However to enable safe, robust and efficient design/execution of these systems more support from the software development tools are needed than is possible to realize from writing the software in C. In this thesis, a development of the system model defined in the licentiate thesis is outlined that en- ables, directly from the system model, efficient code generation and benchmarking, that together with the proposed analysis methods enables design time verification of timing and memory constraints. In general, a plethora of analysis methods have been developed for checking timing and memory constraints, however there is a lack of tools applicable to mainstream embedded programming. Moreover, in many cases the connection between theoretic work and the practical design and implementation of embedded systems is lack- ing. In this thesis, the work is focusing on bridging this gap by demonstrating how this system model and its MoC can be used to generate the required input to the analysis steps needed to guarantee the safety and timing properties of the generated executable. A method is also developed to perform precise stack memory analysis by utilizing timing and critical section information to reduce over-approximation. The work described in this thesis is also utilizable for execution and code-generation from other computational models, mappings from tinyOS and IEC61499 are shown in the accompanied papers. The simple and comprehensive MoC makes code-generation from such systems straight- forward and thus gives the benefits of the proposed method to these systems, and in the case of IEC61499 enables execution on a new set of devices (embedded bare-metal targets). In conclusion, the work described in this thesis enables the implementation of a tool-chain that is based on the proposed MoC, this tool-chain enables analysis with respect to timing requirements, memory constraints and common programming errors.

vi Contents

Part I 1 Chapter 1–Thesis Introduction 3 Chapter 2–Related Work 7 2.1 Related system models applicable to ...... 7 2.2 Industry Standards ...... 10 2.3 Tool Suites (IDE/Response time analysis/WCET) ...... 12 Chapter 3–Programming and Execution Model 15 3.1 Programming and Execution Model ...... 15 Chapter 4–Summary of publications 19 4.1 Paper A: Implementation of SRP-DM for Embedded Real- TimeSoftware...... 21 4.2 Paper B: End-to-End Response Time of IEC 61499 Distributed Applica- tionsOverSwitchedEthernet...... 21 4.3PaperC:TRust-ATaskModelfortheRustLanguage...... 22 4.4 Paper D: Scheduling of CRO systems under SRP-DM ...... 22 4.5 Paper E: Refined SRP Stack Memory Analysis by Exploiting Critical Sec- tionsforSharedResources...... 23 4.6 Paper F: Real-Time For the Masses, Step 1: Programming API and Static Priority SRP Kernel Primitives ...... 23 4.7 Paper G: Uniform scheduling of internal and external events under SRP-EDF 24 4.8 Paper H: Real-Time Execution of Function Blocks for Internet of Things usingtheRTFM-kernel...... 24 4.9 Paper I: Leveraging TinyOS for Integration in Process Automation and ControlSystems...... 25 4.10 Paper J An IDE for Component-Based Design of Embedded Real-Time Software...... 25 4.11 Paper K: RTFM-lang Static Semantics for Systems with Mixed Criticality 26 Chapter 5–Conclutions, Ongoing and Future Work 27 5.1Conclusions,OngoingandFutureWork...... 27 References 29 Chapter 6–RTFM-4-SURE preprint 33

vii Part II 57 Paper A 59 Paper B 69 Paper C 83 Paper D 87 Paper E 99 Paper F 109 Paper G 115 Paper H 123 Paper I 131 Paper J 141 Paper K 147

viii Acknowledgments

I wish to thank my friend and supervisor Per Lindgren for the excellent support, feedback and extreme patience given during my PhD studies. A special thanks also goes to my friend and former colleague Simon Aittamaa for excellent feedback and support. Also to the rest of my former colleagues; thank you for helping me form this work and making my time at Lule˚a University of Technology a great experience. Finally, I wish to express my sincere gratitude to my wonderful family; Johanna, Tilde and Adrian you have motivated and inspired me throughout the process of finalizing this thesis and finally concluding my studies.

Johan Eriksson Lule˚a, Mars 2017

ix x Part I

1 2 Chapter 1

Thesis Introduction

The work summarized in this thesis is the main result of my research studies at Lule˚a University of Technology. In this section I will give a background to the work, and discuss about my overall contributions. Chapter 4 gives a detailed description of how to the appended papers relates to the aim of this work, and list my personal contributions in each individual paper. This work is based upon my work performed for the licentiate degree [1]. The foundation of this work came during my Master Thesis work, where I developed a distributed control system for the LTU Formula SAE Race-Car, which consisted of sev- eral embedded MCU’s (Atmel 8Bit AVR and 32bit ARM). This system implemented all the functionality of the car, including: Engine Management, Gearbox control, Traction Control, Power Management and Wireless Communication, where the only third party device where an Ethernet-WIFI adapter. The performance requirements for this system is relatively high - especially for scheduling of the engine management events (an angular resolution of about 1deg at maximum RPM (15000) translates to a timing requirement of about 10us). To implement this system, I used several AVR micro-controllers pro- grammed reactively using handlers, hardware time-stamping of and real-time queuing (no OS/third party code), and an ARM7 micro-controller running a threaded embedded kernel (freeRTOS) and a custom TCP/IP stack using a Wiznet TCP/IP coprocessor. The project as a whole received several awards during the Formula Student competitions in Bruntingthorpe(2005-2006)/Silverstone(2007) , including ”IEEE Most innovative use of Electronics”. And parts of this system (engine control) was later re-used for the LTU Eco Marathon car, also with excellent results (The LTU ”Baldos” car won the Shell ECO Marathon competition in 2008). The main challenges in the master thesis work on device level was identified as coding the high-performance code required to perform the event scheduling, without creating race conditions or excessive concurrency limitations due to blocking to avoid race conditions. The ”standard” implementation technique of using a threaded kernel (such as freeRTOS [2]]) typically introduces context switches for event handling and the timing overhead per event is about an order of mag- nitude larger than the required 10us timing requirement (Paper G in [1]). The threaded

3 4 Thesis Introduction

OS on the ARM micro-controller was utilized to handle Ethernet communication and calculations with longer dead-lines (such as mass/flow/fuel/Ignition ESP calculations), The calculations were in principle implemented using a TTA (time triggered architec- ture, even though it was not identified as such at the time) and the communication stack is event driven (CAN bus for internal communication and Ethernet for external). The challenges identified for this subsystem where setting priorities (task and interrupt) and analyzing/testing the system to meet the deadlines for reacting to events, typically from receiving new sensor data from the CAN bus and responding with new control outputs, while considering potential interference from other threads and context switching over- head. On a system level, reasoning on end-to-end response times was crucial, and to ensure that all end-nodes responded with new sensor data in time for each TTA epoch. During this work is was identified that a reactive design methodology for design of such systems would be beneficial, that enables reasoning on round-trip timing (also for distributed systems) and that doesn’t impose a significant overhead compared to a efficient raw C-code implementation. All the design choices taken in this thesis are taken with the outset of being useful to the practical implementation and verification of lightweight embedded systems, and is based on the experience gained initially in this master thesis work. In the last couple of years I have worked with designing embedded system in a industrial setting and this experience has also been helpful and has influenced the later parts of the work. During my initial studies (for the licentiate degree) I explored the use of tinyTimber to implement these types of systems. It gives some benefits, mainly enables reasoning on timing and offers an intuitive programming interface. However, since it is a real-time kernel with a C-macro API, it is relatively easy to create hard-to find programming errors. However if the programmer follows the proposed design pattern the system will be free from data-races. Also static (off-line) reasoning of program flow, end-to-end timing and memory usage is hard since the program flow is not easily parsed from the code. However, the main drawback is the implementation method used, where several run-time stacks is used to enable concurrency, this approach implies using context switches. For a real- time embedded system context switches results in significant overhead, and thus forces the programmer to code the timing critical parts of the system without support from the real time system. Also, the multiple threads approach requires space for multiple execution stacks (one per ) - the memory requirement for this is not insignificant (Paper F&G in [1]). Another track lead by Johan Norlander at the embedded system group at LTU was the creation of a functional language, Timber, [3] that I was partially involved in. For ex- ample, I implemented the ARM7 port/RTS for Timber and designed a few demonstrator systems. Timber shares its execution model with tinyTimber, and the implementation of this used context switches and thus gives the same drawbacks. Moreover, the code generation resulted in heavy use of dynamic memory allocation, which typically implies expensive (w.r.t. memory/CPU) garbage collection and a lot of indirection/boxing[4] that rendered the generated code infeasible to use for practical systems. Many of these problems could be mitigated by more efficient code-generation (the compiler didn’t im- 5 plement much in the way of optimization). The implication on the high abstraction level offered by the Timber language is also a potential problem, since seemingly trivial changes can create big changes in the generated code and thus the execution time and memory requirements. An advantage of the approach used in Timber is that the behaviour of the whole program is (potentially) known at compile time and it is thus possible to reason about behaviour and potentially perform static analysis with respect to memory and execution time, however this was never attempted. The analysis approaches outlined in this thesis could potentially apply to some systems expressed in Timber (i.e., systems where the compiler has been able to eliminate all dynamic memory allocation). Another significant problem with a functional approach to embedded design is the unfamiliarity of the required coding style, this will present a major inconvenience for the majority of embedded developers used to C programming. At this point it became clear that it would greatly ease the implementation and test- ing/debugging effort if a system/tool is offered to the embedded developer that provides:

• A simple intuitive execution model with easily predictable run-time behaviour.

• No, or very limited runtime overhead compared to a bare-metal C implementation.

• A programming model with familiar notation for a typical embedded programmer, that is used to embedded programming in C (as myself).

With these parts in place a embedded developer can develop embedded systems with similar performance as a hand optimized bare-metal C implementation, even for systems with microsecond timing requirements. The methods outlined in this thesis solves the first two requirements, however the programming model is not finalized but a few prototype languages are presented. Even though the resource requirements (CPU/memory) are low with the proposed system design methodology it is still up to the programmer to ensure that the available memory resources are not exhausted (the system is static, however a (dynamic) run-time stack is still required), and that all timing requirements are met. This is not an easy task in general for systems with a lot of potentially overlapping events. During my work, I have investigated methods to automatically ensure these properties, these methods are in general based on standard theories, with some extensions to fit to the proposed MoC. These methods are summarized below:

• Stack Memory Analysis.

• Automatic Response Time Analysis.

• Automatic Control Flow Analysis

The first two methods are given in the appended published work. However these methods are dependent on the program control flow, I have co-supervised a master-thesis on this topic [5] and this is the topic of promising ongoing work utilising symbolic execution to automatically derive the control flow. A preprint of this work is included in this 6 Thesis Introduction thesis (Chapter 5.1.1). These methods can be integrated in the proposed tool-chain, by using the methodology published in this thesis - as demonstrated by the REKO tool-chain. Moreover typical programming errors such as array out of bounds, pointer errors and non-termination are a common problem in low-level programming, using the methodology outlined in (Chapter 5.1.1) these type of errors can be eliminated for free. As described initially in this chapter, it would be beneficial to extend these analyses to support distributed systems, the feasibility of this approach is shown in this work, where a distributed system that is implemented using the proposed MoC is analyzed while taking network overhead (switched ethernet) into account. I have investigated utilizing the IEC 61499 functional blocks model[6] for bare metal embedded programming. Typically, 61499 systems are implemented by utilizing an em- bedded real-time (typically rt-/VXworks/eCos or similar) that interprets a system descriptor file that is uploaded to the target at deployment time. This approach is usually fine for the implementation of control systems for controlling mechanical processes, where ms timing resolution is sufficient and where the target plat- form is implemented using an application processor suitable for a real-time operating system, typically running at about 500Mhz, and with >100Mbytes of memory (Repre- senting 10x the clock speed and 100x the memory). However, for implementing embedded controllers where tighter timing requirements are required and where the available re- sources are significantly lower (KB vs. MB of memory) this approach is not feasible. A similar mapping is also presented for the tinyOS model to the proposed MoC. These mappings (tinyOS and 61499) shows the versatility of the proposed MoC for develop- ment of embedded systems. Since despite its apparent simplicity, it is shown that the runtime-behaviour of the much more complex tinyOS and 61499 execution models can be implemented in the much simpler RTFM MoC (chapter 3). (The exception is dynamic systems that is not supported.) Some related system models, industrial Tool-Suites and standards are discussed in the thesis in section 2. In my opinion neither of these offers a suitable abstraction for the development of robust embedded real-time software for light weight targets. In general the approaches taken primarily aims at offering the developer a feature rich program- ming environment, without considering the implications for later program analysis. Thus forcing any program analysis to make gross estimations and requiring extensive manual annotations. And also the run-time system required to execute the system becomes com- plex and imposes significant overhead (Assuming an embedded target with relatively few operations per task). This approach then typically forces the programmer to mitigate the overhead whenever tight timing requirements are necessary, by deviating from the built-in programming model and implement the timing critical parts of the system using direct interrupt programming and manual insertion of critical sections, this mitigation then further complicates any analysis and is in general error-prone. In section 5 the overall results of the methods presented are concluded and set direc- tions for the ongoing and planned future work are presented. Chapter 2 Related Work

In the realm of component based design of embedded systems a plethora of systems / models / model of computations (MoC) have been developed / proposed. In this section an overview of the field is given.

2.1 Related system models applicable to Embedded System

2.1.1 The Rubus Component Model

The Rubus Component model (RCM) targets development of embedded systems. The RCM shares many features with ProCom and is covered in detail in [7]. [8] contains an in-depth comparison of RCM to CRO and the related CRC (Concurrent reactive compo- nents) model. One important feature of the RCM is the support of multiple Execution Models [EM], where typically hard real-time parts of the system is executed on a time triggered EM (Refereed as Red tasks), the non time critical parts are typically executed using an event driven EM (Blue tasks) and a hardware interrupt execution model (Green tasks) that are executed pre-emptively on the currently active ”Red”/”Blue” tasks stack. Thus the response-time/stack memory analysis of the Blue/Red tasks must take the in- terrupt execution in account by providing sufficient slack[9]. The implications of the locking primitives required for sharing data with interrupt handlers is not entirely clear by reading the public documentation/published papers, thus a direct comparison with the CRO model is hard since this is a key feature of the CRO model - where all the system tasks, including interrupt handlers is treated in a homogeneous fashion. Also, the RCM offers a more complex Execution model (a mix of time-triggering/events and interrupts) compared to the CRO model that consists of only synchronous or asynchronous tasks (with optional time delayed execution) that executes on a shared stack using SRP and a unified scheduler.

7 8 Related Work

2.1.2 TinyOS In the area of Wireless Sensor Networks (WSNs), TinyOS[10] (TOS) has gained wide spread use, mainly because it offers a simple programming model and it ships with a large set of software libraries (e.g., protocol stacks needed to implement SOA enabled devices) and is available for a large number of lightweight target platforms. However, TOS has yet to make its way into industrial applications where real-time operation is required (which is typical to monitoring and control systems). As being designed primarily with simplicity in mind, the TOS execution model for tasks is non-preemptive, limiting system responsiveness and schedulability. Paper I gives a in-depth comparison of the TOS model.

2.1.3 freeRTOS Is a popular RTOS for embedded programming, holding a 17% market share in 2014 ac- cording to an article in EETimes [11]. It offers a efficient simpistic thread based scheduler, and is supported on wide range of microcontroller platforms[2]. Also comes pre-packaged with network support for many platforms, and in some cases comes delivered with the manufacturer supplied IDE (Such as NXP LPCOpen libraries)[12] thus simplifying in- tegration significantly. The free version ships with a modified GPL, that restrict the user from creating a new RTOS based on freeRTOS, doesn’t force the user to disclose the source of their end products (Except for modification to freeRTOS) and also, as of 2008 restricts the user by forbidding comparison to other RTOSes (!). In paper F a old version is compared to the CRO/RTFM approach, the comparison is not targeted at freeRTOS specifically, instead freeRTOS is selected to demonstrate a typical threaded RTOS behaviour.

2.1.4 Is a lightweight OS developed by Adam Dunkels[13]. Originally designed to enable TCP/IP networking to lightweight sensor nodes. It supports lightweight multitasking by introducing the concept of protothreads. However hard real-time systems are not a target since traditional preemption or priority-scheduling is not supported. Contiki offers a rich set of supported platforms and network stacks, specifically related to IoT / sensor networks.

2.1.5 RIOT Offers a partially POSIX compliant lightweight threaded OS, targeted at typical IoT devices[14, 15]. RIOT offers compatibility for a rich set micro-controller platforms, com- munication standards, transceivers and sensors. And can thus significantly lower the required effort to implement IoT platforms using RIOT. Compared to other POSIX / ”POSIX like” systems RIOT offers significantly lower overhead, similar to a freeRTOS (section 2.1.3) system. RIOT and freeRTOS shares many of the same benefits/problems. 2.1. Related system models applicable to Embedded System 9

2.1.6 ProCom Defines a component model tailored for the development of Real-Time Embedded sys- tems [16] [17]. It defines two layers, ProSys and ProSave. ProSys defines the top layer, where a system is collection of concurrently executing communicating subsystems that communicates by asynchronous messages that is send/received at output/input typed message ports. ProSave is the lower layer, consisting of passive units with separated data and control ports. Data Ports and grouped with a single control port, and Data I/O are accessed atomically at the triggering of its corresponding control port. The ProSave model is similar to the IEC61499 model (section 2.2.2), One important dif- ference is that each data port has only one corresponding control port. For IEC61499 that supports sharing of data between control ports this opens possibilities for ambigu- ous/nondeterministic execution (covered in Paper H and [18]). The ProSave model also includes a clear formalism of a FSM for describing the internal operation of each passive unit, this is also very like the ECC’s defined in IEC61499. With the exception that the IEC61499 ECC execution semantics is rather non-formal and includes a lot of (seemingly intentional) implementation dependent ambiguities. Both ProSave and ProSys supports hierarchical components.

2.1.7 REKO Is a prototype tool/integrated environment that features the Concurrent Reactive Ob- jects (CRO) execution semantics described in this thesis, and that features an abstraction of these CRO models into Concurrent Reactive Components (CRC) [8]. Paper J and [19] gives an in-depth overview of this tool. The tool aims at offering the user a development environment that supports specification of functionality, timing constraints, code gener- ation, code deployment, scheduling analysis (by utilizing the methods described in this thesis and benchmarking of the produced binary on the target platform). The developer expresses the functionality of each method in C and specifies the connections between objects using a graphical notation. No checking is done to verify that the method C code adheres to the execution semantics of CRO, however the tools defines a local name scope for each method that helps to ensure state encapsulation. Moreover, the generated code uses a CRO scheduler together with SRP-DM scheduling (papers A and D) and calculates the required pre-emption levels by tracing the call tree for each event, these can be external event (interrupt) or internal (Asynchronous event, possibly with delayed execution).

2.1.8 RUST Is an emerging language currently sponsored by Mozilla. Its goal is ”To design and im- plement a safe, concurrent, practical systems language.” [20]. It focuses on safety and performance. According to the documentation rust guarantees memory safety, features zero cost-abstractions, enables safe usage of threads and has virtually no run-time. These claims are valid when generating code to be run on a PC operating system, where the 10 Related Work

RUST threading API typically wraps the POSIX threading API in a safe manner, and where the rust memory model ensures state integrity. However, the applicability for programming of embedded ”bare-Metal” embedded systems using rust is limited - since RUST offers a traditional threaded approach to concurrency, and offers no abstraction of interrupt handlers. Thus, forcing the programmer to manually ensure state integrity when sharing memory using interrupt handles, or resort to a traditional threaded ap- proach where the interrupt handler unlocks a waiting user thread for each incoming event. As shown in paper F and discussed in 3.1 this type of abstraction results in an overhead of about an order of magnitude compared to CRO or a raw C implementation. (The paper compares RTFM / CRO) with freeRTOS, however the rust approach will result in a similar execution pattern, i.e. at return from the interrupt handler a context switch needs to performed after some queue operations.

2.2 Industry Standards

In this section an overview of related industry standards related to embedded system programming is given. These industry standard in general does not target resource constrained embedded systems, with the exception of MISRA and OSEK/VDX.

2.2.1 IEC 61131 Is a standard[21] that has faced wide industry adoption for the programming of industrial PLC’s. The standard defines 5 different languages, including:

• Ladder logic ”LAD/LD”, where the functionality is expressed schematically - sim- ilar to wiring the control logic using relays.

• Functional block diagram ”FUP/FBD” where the functionality is expressed graph- ically, similar to an electrical circuit. These functional blocks can be hierarchical. Similar to electrical circuits FBD connections represents a distinct value, it is thus not possible to represent events using FBD.

• Structured Control Language SCL ”ST” is a textual programming language, similar in syntax with pascal.

A control system is typically implemented using several of these languages, for example the top-level design might be graphical using FBD and the ST language is used to define sub component behavior. In general, the execution of systems developed using IEC 61131 is scan based, thus all code is executed on each scan cycle (each scan cycle is typically one to several ms) IEC 61131 is implemented by the major PLC vendor’s development tools e.g. Siemens STEP 7[22] , Rockwell[23] and Beckhoff TwinCAT[24]. There also exist some hardware independent tools for 61131 based development. One example is CODESYS ”Codesys Development System” [25]. 2.2. Industry Standards 11

2.2.2 IEC 61499

Is an international standard defined in [6], it builds on the IEC61131 standard. The standard specifies a generic model that features a holistic view of distributed control systems, where both overall functionality and distribution can be expressed in the model. Compared to IEC61131 it also aims at improving portability, interoperability, increased re-usability and distribution. The 61499 standard defines a method of development using functional blocks syntactically similar to 61131 FBD diagrams, however in 61499 event and data connections is separated. These concepts are further elaborated in [26, 27]. The 61499 standard aims at enabling code-reuse across vendors, however the execution semantics is largely undefined in the standard - thus the run-time behavior of a system can differ significantly between vendors (Covered in Paper H and [18]) The standard has not seen any wide industry adoption [28], and IEC 61499 is not supported by the Development environments offered by the major PLC vendors. For programming using IEC61499 a few tools are available, mainly: Industrial tools such as ISaGRAF and NxtOne and Open Source Tools such as 4DIAC and FBench. All these tools are hardware independent and thus supports a range of different hardware platforms.

2.2.3 OSEK/VDX

Is a standard that defines a API for a Multi-threaded operating system tailored for Automotive applications [29]. It defines a static system, where tasks, mutexes etc. are defined at compile time. It defines two types of tasks, basic tasks (BCC) that are run- to-completion and extended tasks that can halt execution (sleep). In principle BCC’s are similar to CRO tasks, and ECC’s are threads. Deadlocks and priority inversion is prevented by utilizing the PCP protocol [30]. A interesting project is SLOTH [31] that implements a BCC compliant OSEK OS using a similar approach as CRO/RTFM, where SRP and hardware interrupt handlers is utilized to implement a SRP-DM scheduler (Like in Paper A/F).

2.2.4 AUTOSAR

Is a standard [32] that aims at offering reuse of software components, by defining a software framework for reusable software components. The kernel is a superset of the OSEK/VDX kernel (covered in chapter 2.2.3). AUTOSAR provides API’s to support composition of software modules from different vendors, however the execution semantics are not clearly defined and the software modules are often provided as black boxes, thus making verification of the complete system (including black-box software components from different vendors) hard [33]. Since AUTOSAR Release 4.0 timing behaviour can be described as ”Timing Extensions” for software components, compositions, basic software and for the entire system. This is approach for verifying the timing behaviour is evalu- ated in [34] where tool support for this extension is evaluated (Without presenting clear results). 12 Related Work

2.2.5 C MISRA

Is a set of software development guidelines defined by MISRA (Motor Industry Software Reliability Association) [35]. It has evolved into a widely-accepted standard for a best effort programming practice for the development of safety critical embedded systems. However, this approach to realizing safety has been shown to not be particularly effective, in a case study performed at TU Delft [36] an ”Industial embedded software project” is analysed using MISRA rules, with the outset of investigating if compiling to MISRA rules actually makes the embedded software safer. The main issue they identified is that: ”Not only can compliance with a set of rules having little impact on the number of faults be considered wasted effort, but it can actually result in an increase in faults, as any modification has a non-zero probability of introducing a fault or triggering a previously concealed one.”. After evaluating the data, they concluded that: ”Taken together with Adams observation that all modifications have a non-zero probability of introducing a fault, this makes it possible that adherence to the MISRA standard as a whole would have made the software less reliable.” Thus, approaching safety in the design of embedded systems by blindly complying to a set of predefined rules is not very effective and can even have a negative impact under certain conditions.

2.3 Tool Suites (IDE/Response time analysis/WCET)

No tool exists today that is available for mainstream embedded software development that supports response-time analysis of the embedded system. The typical approach utilized by the embedded developer is to validate the system by measurement, a process that is both time-consuming and error prone. However, in the aerospace and automotive indus- tries tools for timing analysis exists, and there is some tools that supports WCET analysis of code snippets of the embedded system, both by measurement and by static analysis. In this section a overview of a few of these tools are given. In general the tools presented in this section support finding of the WCET for individual code chunks (typically one task at a time), and in some cases the inter-arrival time (max/min/distribution) of these snippets can be found through benchmarking. This WCET information is required for assessing the behaviour of the system, however to access the combined effect of all code running on the system a schedulability analysis must be performed. In some cases the Real-Time Operating System offers tools to handle this [37], however in most cases (due to the unavailability of integrated tools) the developer must perform the schedulability analysis on paper to guarantee that the timing constraints are met. In general a generic WCET tool must handle threaded tasks, this is in general much more complex than analysing run-to-end tasks, mainly since the state is dependent on recursive iterations, and each task might have several entry-exit points (Covered in depth in chapter 5.1.1). One exception is Rubus ICE that support some analysis in the integrated environment [38, 39]. 2.3. Tool Suites (IDE/Response time analysis/WCET) 13

2.3.1 aiT from AbsInt.[37] aiT is a WCET analysis package that performs static analysis of code chunks by Abstract Interpretation on the binary code for a wide range of target platforms. In general aiT requires user annotations for generating meaningful output. aiT computes the upper bound including caches and pipelines interactions, thus producing a safe result.

2.3.2 SWEET.[40] SWEET is a open source tool developed at MRTC (M¨alardalen Real-Time Research Centre), it includes a C compiler that outputs a intermediate format (ALF), used for the abstract execution. SWEET lacks a timing model of the target platform, instead third party tools must be utilize for this (Such as aiT). This approach is quite similar to the approach taken in the ongoing work presented in chapter 5.1.1.

2.3.3 RapiTime from Rapita Systems Ltd.[41] Based on research from the Real-Time Systems Group at the University of York, measures execution time of short sub-paths through the code and reconstructs the WCET based on structural analysis of the program. Thus this is a mix of static analysis and benchmarking. Claims to not require heavy annotation of the program or a exact model of the target architecture to generate safe and tight results.

2.3.4 Rubus ICE Rubus ICE is the integrated development tool for the Rubus Component Model (RCM) and the Rubus Kernel. Supports response time analysis and stack memory analysis through a set of experimental plugins [38, 39], little information exists on the commercial availability of these tools. The approach in Rubus ICE features similarities to the work presented in this thesis since the Rubus component model contains all the information re- quired for integrated scheduling and stack memory analysis of the full system[42]. Rubus ICE also supports analysis of distributed systems [43] The RCM is further discussed in chapter 2.1.1 14 Related Work Chapter 3 Programming and Execution Model

3.1 Programming and Execution Model

The implementation methodology and analysis techniques for embedded system design described in this thesis is based on a reactive execution / programming model. The work has progressed over time, and this section will describe these concepts. The model was initially defined by the reactive lightweight Run-Time System (RTS) implementation tinyTimber [1], then CRO specifies a more generic notion of Concurrent Reactive Objects (Initially defined in the Licentiate Thesis [44] and included in paper D), later work focuses on the execution model offered by RTFM (papers K, H, & F). The CRO model is used in the prototype graphical IDE REKO (paper J and [8, 19]). In short the tinyTimber model is based on the execution model utilized in the Timber language [3] and features a simplistic C-macro API to the tinyTimber RTS. The kernel implements Earliest Deadline First (EDF) [45] scheduling with priority inheritance (PI) [46]. To implement EDF + PI a multi-threaded approach is taken. A tinyTimber program consists of:

• a set of resources(or objects), where

• each resource implements a set of methods (stateful functions) and

• each resource has one or several instances.

TinyTimber offers synchronous and asynchronous communication/execution.

• Synchronous calls supports sending/reception of data, and is implemented using a mutex lock on the receiving resource.

• Asynchronous calls features sending of data, and an (optional) time delayed execu- tion (i.e. baseline extension). Each asynchronous call is the start of a new reactive execution chain and has a corresponding execution deadline.

15 16 Programming and Execution Model

An embedded program typically interacts with the environment through a set of ports(I/O) and event inputs (interrupt handlers and reset). To map this to the tinyTimber model the programmer typically wraps each I/O device in a resource. Hovewer the interrupts have no explicit connection to the resource, and can thus not access its corresponding I/O registers directly, to solve this the programmer has to insert a asynchronous call to a method on its corresponding resource. From an embedded system point of view this approach is identified to have following problems: The total Worst Case Execution Time(WCET) for execution of a typical event is heavily influenced by the time to perform (several) context switches. (As shown in paper F) EDF can be shown to be optimal for certain systems, however a pure EDF scheduler is not implementable in software mainly because the exact arrival time of external events cannot be determined. And an EDF scheduler gives significant overhead compared to an DM scheduler (due to online priority resorting). Thus, for the practical implementation of many embedded systems a DM schedule is more suitable. An example of the online priority resorting required to perform EDF in practice is the following (assuming SRP): Scenario I

• Task A is executing, with relative deadline 60 and absolute deadline 100.

• Current time is 50.

• Hardware interrupt Ib occurs, with a relative deadline 40.

• Interrupt preemption occurs, kernel timestamps Ib to time 52. • The RTS schedules Task B (since 52 + 40 < 100) that is assigned to hardware interrupt Ib. Scenario II

• Task A is executing, with relative deadline 60 and absolute deadline 100.

• Current time is 70.

• Hardware interrupt Ib occurs, with a relative deadline 40.

• Interrupt preemption occurs, kernel timestamps Ib to time 72. • Kernel enqueues Task B (since 72+40 > 100) that is assigned to hardware interrupt Ib. • Interrupt is returned.

• Task A is run to completion.

• The RTS examines the pending queue and executes Task B, since it is on top of the queue. 3.1. Programming and Execution Model 17

Scenario II leads to significant overhead, mainly caused by the queuing operations. The approach taken in paper F gives a method for SRP and DM scheduling that utilizes the interrupt hardware to effectively eliminate the queuing and priority inversion in Scenario II. To motivate the use of a more complex EDF scheduler, the overhead must be signif- icantly less than the work performed in the tasks. In principle this can be achieved by reducing the overhead, possibly by a hardware accelerator, however this is not available in standard MCU’s. Alternatively, for certain types of systems where the work in each tasks is sufficiently large the imposed scheduling overhead can be acceptable and thus motivate a EDF approach. Hybrid schemes could also be envisioned (as in [47]) however this adds significant complexity and is thus out of scope of the proposed approach. However, some static analysis is required to implement SRP, which is hard for a sys- tem implemented using a C-macro API. Thus, tinyTimber is not a suitable implemen- tation technique, but the programming and execution model using Concurrent Reactive Objects (CRO) is shown to be useful. To implement systems using CRO an implemen- tation technique is required that enables static analysis, this is captured in the REKO IDE/framework, where a component model[8] is used with the CRO model to enable graphical design of embedded systems. Other ongoing work that addresses the static analysis problem is the RTFM Compiler, this is covered in papers (F & K), and a pro- totype is available at [48]. The motivation for utilizing SRP instead of a multi-threaded approach is covered in the following section:

Threaded execution behaviour Typically for implementation that utilizes only the interrupt handler to achieve concur- rency the execution behaviour is the following:

• Push context at interrupt entry (Limited to the currently used resources)

• Acknowledge interrupt

• Execute interrupt triggered code (Actual work)

• Pop context

The problem in general with this approach is the handling of shared resources. The standard threaded approach (using the PI protocol, that addresses resource shar- ing) to handle this results in the following (Best case behaviour):

• Push context at interrupt entry (Full context)

• Acknowledge interrupt

• RTS Queue operation

• Change return thread 18 Programming and Execution Model

• Pop context (will now return to the thread that waits for interrupts)

• RTS Dequeue operation

• Execute code (Actual work)

• RTS Queue operation

• push context for interrupt thread.

• pop context of interrupted thread.

Coding for the first more efficient behaviour is a typical task for a embedded Real- Time programmer - where the timing critical parts of the system is implemented using this method, however this requires extensive use of volatile variables and manual control of critical sections (typically with global interrupt disable) and without tool-support it is easy to introduce hard to find errors. Using the second approach (using a typical threaded programming pattern) the abstraction is not zero-cost, and for typical operations (setting some I/O, updating internal memory) where the actual work is in the order of a few hundred operations the overhead is significant (Overhead is typically > 1000 cycles for a generic threaded implementation as shown in paper F). Also for an embedded bare- metal MCU the RTS must implement a threading API, compared to the RTFM RTS this represents at least an order of magnitude in overhead (paper F). Thus using a threaded approach for expressing concurrency imposes both severe timing and memory overhead. To conserve memory the programmer typically must set tight bounds on the stack memory reserved to each thread, this increases the chance of stack overrun since the typical tools offers no stack memory assessment. Chapter 4 Summary of publications

Figure 4.1: Appended paper dependency graph Lic Thesis

Paper G Paper D CRO Paper A Uniform SRP-DM RTA SRP-DM Scheduling

Paper J Reko IDE

Paper F RTFM

Paper K Paper E Stack RTFM-Lang Memory

Enabling Robust Reactive System Design

Paper H Paper I TinyOS Realtime Paper C Trust FB (61499)

Paper B Distributed System RTA

19 20 Summary of publications

This chapter provides an introduction to the papers included in this thesis as well as a short summary of each paper. Listed in order of significance, figure 4.1 gives a dependency visualization. The Introduction (chapter 1) introduces the reader to the overall ideas and motivation of the presented work. The Licentiate thesis [1] introduces the reader to the background of the work, where a useful reactive programming model for development of embedded systems is presented. However to develop robust embedded systems a developer also has to address the following issues:

• Performance: The tinyTimber execution model/RTS presented in the Licentiate thesis is usable for development of embedded systems, however a multi-threaded approach to solving the concurrency problem imposes execution time and memory overhead. This is evaluated in paper A, that introduces a alternative model, in paper F this alternative model is compared to the threaded approach. It is shown that utilizing SRP and the proposed execution model results in minimal overhead (A efficiently coded raw C-implementation would give similar overhead for most applications).

• Safety: To create a robust embedded system the memory and timing requirements must be evaluated. A Method for calculating the stack memory requirements of the system is given in paper E, and a mapping of standard response time analysis is given in paper D. The system model presented has no dynamic memory man- agement, thus a stack memory analysis is sufficient to guarantee memory safety. Possible exceptions to this rule, typically for implementing a timing queue or com- munication buffer is not handled in this thesis, but typically these issues are handled by assigning a fixed size buffer and implementing a application specific handler. The timing requirement calculations given assumes that the WCET’s of each tasks is given. In the technical report [19] I show a method for finding these WCET’s, alternatively (as discussed in chapter 2.3 ) there is several industrial tools avail- able for calculating WCET’s. In section some 5.1.1 promising ongoing work is presented that solves this problem using symbolic execution. Another important aspect of safety is the need to capture the behaviour of the full system in the sys- tem model and analysis. For embedded systems the interrupt handlers are often handled non-uniformly from the rest of the code, and also often omitted from the analysis. Paper G gives a method for the inclusion of the interrupt handlers in the system model, and thus in the presented analysis.

• Distribution: To develop robust software in a distributed setting the performance and safety of the combined system, including network overhead must be handled. This is shown in paper B, where a distributed system is analysed, including network overhead.

A prototype IDE is developed in [19] and paper J. This IDE features a , and a prototype compiler from the graphical notation. The downside with this approach is the lack of a textual notation to describe the system. This is approached in Paper K where a OO model for the experimental RTFM-Lang language is given. 4.1. Paper A: Implementation of SRP-DM Scheduling for Embedded Real-Time Software 21

Moreover, the versatility of the proposed system is shown in papers H and I, where the proposed system is mapped to the tinyOS and 61499 programming models. The ongoing work of mapping the proposed system to the rust language in presented in paper C.

4.1 Paper A: Implementation of SRP-DM Schedul- ing for Embedded Real-Time Software

Authors: Johan Eriksson, Simon Aittamaa, Jimmie Wiklander, Pawel Pietrzak and Per Lindgren Publication: Proceedings of the First International Workshop on Dependable and Secure Industrial and Embedded Systems (WORDS), V¨aster˚as, Sweden, 2011. Summary: This paper focus on code synthesis of CRO models and how specified timing requirements are preserved for scheduling and utilized for schedulability analysis. In particular, the paper presents methods to translate resource ceilings and priority levels for Stack Resource Policy (SRP) from the CRO model. Additionally, to support schedulability analysis, we detail algorithms that for the CRO model derives periods (minimum inter-arrival times) and offsets of tasks/jobs. Moreover, to demonstrate how an embedded systems functionality and timing properties can be abstracted into the CRO model an example of a simple embedded system (a process controller) is given. Furthermore, the design of a micro-kernel supporting cooperative hardware- and software- scheduling of CRO based systems under Deadline Monotonic SRP is presented. Contribution: The development of the CRO model and the writing of the paper is a result of Joint research by the authors. The work into partitioning of the system into CRC(Concurrent Reactive Components) should be accounted to Jimmie Wiklander. The development of the SRP-DM kernel is a result of joint work between Johan Eriksson, Simon Aittamaa and Per Lindgren. Extraction of resource ceilings and priority levels for Stack Resource Policy, work on defining the internal XML format, overall code synthesis framework, and extraction of information for scheduling analysis from the model should be accounted to Johan Eriksson.

4.2 Paper B: End-to-End Response Time of IEC 61499 Distributed Applications Over Switched Ether- net

Authors: Per Lindgren, Johan Eriksson, Marcus Lindner, Andreas Lindner, David Pereira, and Luis Miguel Pinho Publication: IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2017 and a short version (Response Time for IEC 61499 over Ethernet) is published in IEEE 13th International Conference on Industrial Informatics (INDIN 2015) 22 Summary of publications

Summary: In these papers end-to-end response times utilizing COTS Ethernet hard- ware, IEC 61499 and the RTFM MoC is characterized. Contribution: The 61499 example system is developed by Johan Eriksson, the Ethernet Modelling and measurements on the experimental setup is mostly done by Johan Eriksson and Per Lindgren. Writing of the paper is joint work between the authors.

4.3 Paper C: TRust - A Task Model for the Rust Language

Authors: Per Lindgren, Johan Eriksson, Marcus Lindner, David Pereira and Luis Miguel Pinho To be submitted. Summary: This short paper details a method for implementing SRP scheduling with shared resources in the rust language. This enables concurrent programming using rust for embedded bare-metal target, with near zero overhead compared to a raw C implementation. The computation of the SRP pre-emption levels is not covered in this paper and must thus be handled by an external tool, the method given in paper A is applicable here. Contribution: The mapping of the CRO/RTFM MoC to RUST is done by Johan Eriksson and Per Lindgren. Johan Eriksson has also contributed to the writing of the paper.

4.4 Paper D: Scheduling of CRO systems under SRP- DM

Authors: Per Lindgren, Johan Eriksson, Simon Aittamaa, Pawel Pietrzak, Jimmie Wik- lander Presented at Real-Time in Sweden (RTiS) 2011 (Workshop) Summary: This paper gives a method for SRP DM schedulability analysis applicable to embedded systems generated from the CRO model. To ensure a safe estimation kernel overhead and blocking is taken into account. Moreover a method for scheduling internal events by utilizing the interrupt hardware is shown, this approach reduces overhead and gives constant time on the kernel operations. For scheduling internal time-delayed events (events with a baseline) an internal timer is utilized, in this paper it is shown that this can be treated as a CRO/SRP resource and thus the overhead from this operation can included in the schedulability analysis. The interrupt hardware in the ARM Cortex M series microcontrollers is also shown to match the execution semantics required for correct and efficient execution of SRP-DM. The method from paper ”A” for extracting the inter-arrival times of internal jobs is further discussed in this paper. Contributions: The method for SRP DM schedulability analysis of the CRO model while taking into account kernel overhead/operations should be accounted to Johan 4.5. Paper E: Refined SRP Stack Memory Analysis by Exploiting Critical Sections for Shared Resources 23

Eriksson, Simon Aittamaa and Per Lindgren. The inter-arrival time implementation, the methods for utilizing the interrupt hardware for scheduling internal events and the mapping of the Cortex interrupt hardware to SRP-DM should be accounted to Johan Eriksson.

4.5 Paper E: Refined SRP Stack Memory Analysis by Exploiting Critical Sections for Shared Re- sources

Authors: Johan Eriksson, Simon Aittamaa, Per Lindgren Publication: ERTS2 2014 Embedded Real Time Software and Systems Summary: In this paper a method for stack memory analysis of SRP based systems are given, specifically a method for exploiting sub-task stack-memory information and time offset information to reduce the over-approximation of the worst-case stack usage presented in previous work. The proposed method is implemented and showcased on a set of synthetic benchmarks. The analyser makes no approximations and is thus subjected to exponential growth of the execution time, however the feasibility of the proposed method is shown for task sets of practical size for embedded systems (50 tasks and 250 subtasks takes minutes to analyse for the exact worst-case situation). The results in this paper is specifically applicable to the proposed CRO/RTFM based systems, while still being generally applicable for SRP based systems utilizing shared resources. For systems that depends heavily on critical sections and where the timing information resulting in non- preemption in between tasks can be derived the over-approximation of the stack-memory can be eliminated. Even without information of time offset and subtask stack memory usage the method is applicable and gives an solution on the worst case stack memory usage. Contribution: The generic system model and the writing of the paper is a joint work by the authors. The specific method for exploiting sub-task stack-memory informa- tion in the analysis, the implementation of the tool and the generation of the synthetic benchmarks should be accounted to Johan Eriksson.

4.6 Paper F: Real-Time For the Masses, Step 1: Pro- gramming API and Static Priority SRP Kernel Primitives

Authors: Johan Eriksson, Fredrik H¨aggstr¨om, Simon Aittamaa, Andrey Kruglyak, Per Lindgren Publication:8th IEEE International Symposium on Industrial Embedded Systems (SIES 2013) 24 Summary of publications

Summary: This papers presents a Roadmap for RTFM, an API for the RTFM scheduler and gives some benchmarks to compare the proposed solution to freeRTOS. The benchmarks show efficiency (both wrt. memory and execution time) with an order of magnitude improvement compared to freeRTOS. The work around the scheduler and the KCC (Kernel configuration compiler) presented in this paper is a development of the work done for the REKO code-generation backend. The comparison is made on freeR- TOS, however the generic result is applicable to most light weight multithreaded RTOS (such as RIOT, multithreaded RUST etc). Also the freeRTOS comparison is performed on a legacy (pre 2008) version of freeRTOS since later version restricts comparative measurements. Contribution: The roadmap for RTFM and the writing of the paper is a joint work between Per Lindgren, Simon Aittamaa, Johan Eriksson The benchmark for freeRTOS is joint work between Simon Aittamaa and Johan Eriksson. The API and KCC is joint work by the authors, and is based on the work done by Johan Eriksson for the REKO code generation backend.

4.7 Paper G: Uniform scheduling of internal and ex- ternal events under SRP-EDF

Authors: Simon Aittamaa, Johan Eriksson, Per Lindgren Publication:Published in International Conference on Real-Time & Embedded Sys- tems (RTES 2010) Summary: In this paper a method and kernel implementation of a simple scheduler for executing SRP with EDF scheduling of reactive systems on bare metal MCU’s is given. The work presented details a method for treating external and internal events in a uniform manner, thus simplifying both software design of the embedded system and analysis of the resulting code. Contribution: The work described in this paper is the result of joint work between the authors. The design and implementation of the proposed SRP EDF kernel is mostly done by Johan Eriksson and Simon Aittamaa

4.8 Paper H: Real-Time Execution of Function Blocks for Internet of Things using the RTFM-kernel

Authors: Per Lindgren, Marcus Lindner, Andreas Lindner, Johan Eriksson, Valeriy Vyatkin Publication:IEEE Emerging Technology and Factory Automation (ETFA 2014) A short version titled ”RTFM-4-FUN work in progress” was published in Proceedings of the 9th IEEE International Symposium on Industrial Embedded Systems (SIES 2014) Summary In these papers the applicability of the RTFM kernel/MOC is shown by mapping the IEC 61499 standard to RTFM primitives. This work includes implementa- 4.9. Paper I: Leveraging TinyOS for Integration in Process Automation and Control Systems 25 tion of a prototype tool-chain to generate embedded C-code utilizing the RTFM MOC directly from 61499 models designed in 4DIAC. The benefit of this approach is that there is no dependency of any external libraries or run-time systems. Moreover the 61499 stan- dard does not define how the SIFB (service interface function blocks, basically drivers) should be implemented/executed, utilizing the approach defined in this paper it is pos- sible to implement the SIFB’s using RTFM which opens up the possibility of applying the analysis techniques outlined in this thesis on the complete system including device drivers. Contribution: The mapping of 61499 to RTFM is joint work between the authors. The RTFM-4-FUN toolchain is developed by Johan Eriksson and Andreas Lindner, where Andreas Lindner has focused on code generation for individual 61499 function blocks and Johan Eriksson has focused on code generation of 61499 system models to RTFM/SRP.

4.9 Paper I: Leveraging TinyOS for Integration in Process Automation and Control Systems

Authors: Per Lindgren, Henrik M¨akitaavola, Johan Eriksson, Jens Eliasson Publication: 38th Annual Conference on IEEE Industrial Electronics Society (IECON 2012) Summary: In this paper an execution model (TOS-PRO) based on CRO (and pos- sibly implemented in the REKO framework) for TinyOS is presented that enables pre- emptive scheduling of TinyOS programs. By mapping the TinyOS execution model to CRO the TinyOS programming model can be upheld, and programs written for TinyOS can be executed under TOS-PRO mostly without changes. The benefits of using this ap- proach (aside from the benefits of using a preemptive system model) is that the analysis techniques discussed in this thesis can be applied to TinyOS programs. Contribution: Johan Eriksson has contributed in developing the mapping between the TinyOS execution model and CRO, and participated in the writing of the paper.

4.10 Paper J An IDE for Component-Based Design of Embedded Real-Time Software

Authors: Jimmie Wiklander, Johan Eriksson, Per Lindgren Publication: 6th IEEE International Symposium on Industrial and Embedded Sys- tems (SIES 2011) A extended version is available as a technical report [19], titled: ”Utilizing an IDE for Component-Based Design of Embedded Real-Time Software for an Autonomous Car” Summary: The early work on a graphical IDE (REKO) for graphical software de- velopment of CRO systems is presented. The tool supports specification of timing con- straints - utilized for run-time scheduling and schedulability analysis, code generation, code deployment, code benchmarking and schedulability analysis. The tool uses a XML 26 Summary of publications

file format, thus making a pure textual based programming cumbersome. The over- all system architecture/call tree/data connections is programmed graphically while the method/function bodies is programmed textually using C. The tool helps ensuring that this C code adheres to the CRO model by enforcing state encapsulation. The approach taken in this work enables the timing properties are captured at the design stage and preserved throughout the design process, and can the be utilized by the kernel for run- time scheduling and the timing requirements can be checked after code benchmarking. Moreover, in the technical report it is shown how to calculate the schedulability of the tar- get system in the IDE using standard scheduling theorems and machine assisted WCET measurements (Using manual control flow annotations). Contributions: The writing of the papers is joint work between the authors. The graphical user interface and example system should be accounted to Jimmie Wiklander. The backend code-generation, xml file specification/handling, code deployment and code- benchmarking should be accounted to Johan Eriksson. The machine assisted WCET measurement and design and implementation of the scheduling analysis system is done by Johan Eriksson

4.11 Paper K: RTFM-lang Static Semantics for Sys- tems with Mixed Criticality

Authors: Per Lindgren, Johan Eriksson, Marcus Lindner, David Pereira and Luis Miguel Pinho Publication Ada User Journal, Volume 35, Number 2, June 2014 Summary: In this paper a approach for ensuring system properties based on static semantic analysis directly on the system specification (expressed in a object oriented model in the experimental language RTFM-lang) Contributions: The RTFM-lang language design is done by the Authors, and is influenced by work done by Johan Eriksson for the REKO tool, and previous unpublished work on a CRO based tool/language developed by Johan Eriksson. Writing of the paper is joint work by the authors. Chapter 5 Conclutions, Ongoing and Future Work

5.1 Conclusions, Ongoing and Future Work

The proposed approach to design of embedded systems given in this thesis offers a simple programming / execution model tailored for embedded systems. The CRO execution model chosen is designed with the outset of offering robustness and analysability, this is realized mainly through a carefully selected set of limitations:

• Tasks must be run to end, no blocking / waiting for events.

• No dynamic allocation of data.

• All running code on the system must fit within the execution model.

• System is fully static and all communication patterns is known at compile time.

This programming model can thus not be applied to programming in general. But for the programming of resource constrained embedded devices the limitations are shown to be acceptable (e.g. by the example system in [19]). If a more feature rich programming environment is desirable it is shown that a translation to the CRO execution model might be possible (example of transformation of TOS(Paper I) and 61499 (Paper H)). The typical problems a embedded developer must tackle and the proposed solution to the problem are:

Coding errors: Such as pointer errors, array out of bounds etc. Can be addressed by adopting to a more strict language (such as in Paper K), or by using symbolic execution (as in the ongoing work in section 5.1.1).

27 28 Conclutions, Ongoing and Future Work

Race conditions: This is addressed in the programming model by wrapping shared resources in named critical sections, thus eliminating the need for solv- ing atomicity by global interrupt disable. The resource sharing protocol (SRP) together with the efficient implementation (Paper F) offers zero overhead (compa- rable to global interrupt disable) resource locking.

Limited resources: The proposed solution comes at near 0 cost compared to a efficient bare C-code implementation (Paper F). The composite system (user code + RTS) is fully static, with the exception of the run-time stack. To guarantee the maximum stack utilization a method is presented that gives a exact solution to the Stack memory requirements, by utilizing shared resource information, and time offsets between tasks.

Timing requirements: The timing information required for performing response time analysis is present in the programming model. (Paper A) This timing infor- mation can be utilized to perform a Response Time Analysis (RTA) (Paper D). However this assumes that the execution time of each task, including timing in- formation of resource utilization is known. In this work it is shown how this can be solved by using symbolic execution and benchmarking. The simple execution model chosen makes this a seemingly straightforward task, covered in 5.1.1. An- other alternative is to solve this through abstract interpretation and a MCU timing model, as covered in section 2.3.

Deadlocks: SRP offers deadlock free pre-emptive multitasking [49]

Possible language designs around the execution model has been evaluated. A similar Object Oriented (OO) design as in paper K might be preferable, this could be achieved with a language based on Rust, as discussed in paper C. Finalizing the language design and working on a formalization of the execution se- mantics are set topics for future work.

5.1.1 Ongoing work The problem of finding Coding errors and Automatic Program Flow analysis for the proposed MoC is not addressed in the appended papers, A preprint of promising ongoing work that solves both these problem using KLEE [50], a symbolic execution is included in chapter 6. References

[1] J. Eriksson, “Embedded Real-Time Software using TinyTimber : Reactive Objects in C,” Licentiate thesis / Lule˚a University of Technology.

[2] freeRTOS. Quality rtos & embedded software. [Online]. Available: http://www.freertos.org/

[3] Timber timber-lang.org. [Online]. Available: http://www.timber-lang.org

[4] T. Owen and D. Watson, “Reducing the cost of object boxing,” in International Conference on Compiler Construction. Springer, 2004, pp. 202–216.

[5] M. Vicente Romero, “Symbolic simulation for debugging and analysis of REKO models using KLEE,” Master’s thesis, 2013.

[6] IEC 61499 edition 2.0 2012/2013.

[7] R. Inam, J. M¨aki-Turja, M. Sj¨odin, and J. Kun`ear, “Run-time component inte- gration and reuse in cyber-physical systems,” Tech. Rep. ISSN 1404-3041 ISRN MDH-MRTC-256/2011-1-SE, December 2011.

[8] J. Wiklander, “A reactive approach to component-based design of resource- constrained embedded systems,” Ph.D. dissertation, Lule˚a University of Technology, Embedded Internet Systems Lab, 2011.

[9] K. H¨anninen, J. Lundb¨ack, K.-L. Lundb¨ack, J. M¨aki-Turja, and M. Sj¨odin, “Efficient event-triggered tasks in an rtos,” in International Conference on Embedded Systems and Applications (ESA), June 2005.

[10] P. Levis, S. Madden, J. Polastre, R. Szewczyk, K. Whitehouse, A. Woo, D. Gay, J. Hill, M. Welsh, E. Brewer et al., “Tinyos: An operating system for sensor net- works,” in Ambient intelligence. Springer, 2005, pp. 115–148.

[11] EETimes. (2014) 10 embedded design trends.

[12] NXP. Lpcopen libraries and examples. [Online]. Available: http://www.nxp.com/

29 30 References

[13] A. Dunkels, B. Gronvall, and T. Voigt, “Contiki - a lightweight and flexible operating system for tiny networked sensors,” in Proceedings of the 29th Annual IEEE International Conference on Local Computer Networks, ser. LCN ’04. Washington, DC, USA: IEEE Computer Society, 2004, pp. 455–462. [Online]. Available: http://dx.doi.org/10.1109/LCN.2004.38

[14] E. Baccelli, O. Hahm, M. Gunes, M. Wahlisch, and T. C. Schmidt, “Riot os: To- wards an os for the internet of things,” in Computer Communications Workshops (INFOCOM WKSHPS), 2013 IEEE Conference on. IEEE, 2013, pp. 79–80.

[15] RIOT. The friendly operating system for the internet of things. [Online]. Available: https://www.riot-os.org/

[16] A. Vulgarakis, J. Suryadevara, J. Carlson, C. Seceleanu, and P. Pettersson, “Formal semantics of the procom real-time component model,” in 2009 35th Euromicro Con- ference on Software Engineering and Advanced Applications, Aug 2009, pp. 478–485.

[17] T. Bures, J. Carlson, I. Crnkovic, S. Sentilles, and A. V. Feljan, “Procom - the progress component model reference manual, version 1.0,” Tech. Rep.

[18] P. Lindgren, M. Lindner, A. Lindner, J. Eriksson, and V. Vyatkin, “RTFM-4-FUN work in progress,” in Proceedings of the 9th IEEE International Symposium on In- dustrial Embedded Systems (SIES 2014), June 2014.

[19] J. Wiklander, J. Eriksson, and P. Lindgren, “Utilizing an IDE for component-based design of embedded real-time software for an autonomous car,” Lule˚a University of Technology, Tech. Rep., 2011.

[20] CODESYS. The rust project, faq. [Online]. Available: https://www.rust- lang.org/en-US/faq.html

[21] K. H. John and M. Tiegelkamp, IEC 61131-3: Programming Industrial Automation Systems Concepts and Programming Languages, Requirements for Programming Sys- tems, Decision-Making Aids, 2nd ed. Springer Publishing Company, Incorporated, 2010.

[22] Simatic step 7 standards compliance according to iec 61131 (3rd edition). [Online]. Available: http://www.siemens.com/

[23] Logix5000 controllers iec 61131-3 compliance. [Online]. Available: http://www.rockwellautomation.com/

[24] Beckhoff web. [Online]. Available: https://www.beckhoff.com

[25] CODESYS. industrial iec 61131-3 plc programming. [Online]. Available: https://www.codesys.com/ References 31

[26] A. Zoitl, Modelling Control Systems Using IEC 61499, ser. Control, Robotics & Sensors. Institution of Engineering and Technology, 2014. [Online]. Available: http://digital-library.theiet.org/content/books/ce/pbce095e

[27] V. Vyatkin, IEC 61499 Function Blocks for Embedded and Distributed Control Sys- tems Design. ISA, 2007.

[28] K. Thramboulidis, “Facts and Fallacies in the IEC61499 Function Block Model,” 2008.

[29] OSEK/VDX, “Operating system specification.”

[30] B. Sprunt, L. Sha, and J. Lehoczky, “Aperiodic task scheduling for hard-real-time systems,” Real-Time Systems, vol. 1, no. 1, pp. 27–60, 1989.

[31] W. Hofer, D. Lohmann, F. Scheler, and W. Schr¨oder-Preikschat, “Sloth: Threads as Interrupts,” in Proceedings of the 30th IEEE Real-Time Systems Symposium (RTSS 2009), T. P. Baker, Ed., Los Alamitos, CA, USA, 2009, pp. 204–213.

[32] AUTOSAR Development Partnership, “Specification of timing extensions, version 1.0.0, release 4.0.1.”

[33] J. Nordlander and P. Jansson, “A semantics of core autosar,” Department of Com- puter Science and Engineering, Chalmers University of Technology.

[34] O. Scheickl, C. Ainhauser, and P. Gliwa, “Tool support for seamless system devel- opment based on autosar timing extensions,” in Proceedings of Embedded Real-Time Software Congress (ERTS), 2012.

[35] V. Vyatkin, MISRA-C:2004 Guidelines for the use of the C language in critical systems. MIRA Limited, 2004.

[36] C. Boogerd and L. Moonen, “Assessing the value of coding standards: An empirical study,” in Software Maintenance, 2008. ICSM 2008. IEEE International Conference on. IEEE, 2008, pp. 277–286.

[37] R. Heckmann and C. Ferdinand, “Worst-case execution time prediction by static program analysis.” [Online]. Available: https://www.absint.com/aiT WCET.pdf

[38] S. Mubeen, J. M¨aki-Turja, and M. Sj¨odin, “Implementation of holistic response- time analysis in rubus-ice: Preliminary findings, issues and experiences,” in The 32nd IEEE Real-Time Systems Symposium (RTSS), WIP Session, December 2011.

[39] K. H¨anninen, J. M¨aki-Turja, S. Sandberg, J. Lundb¨ack, M. Lindberg, M. Sj¨odin, and K.-L. Lundb¨ack, “Framework for real-time analysis in rubus-ice,” in 13th IEEE In- ternational Conference on Emerging Technologies and Factory Automation,Septem- ber 2008. 32 References

[40] B. Lisper, “Sweet–a tool for wcet flow analysis,” in International Symposium On Leveraging Applications of Formal Methods, Verification and Validation. Springer, 2014, pp. 482–485.

[41] R. I. Davis, “Impact case study: How long does your real-time software take to run?”

[42] S. Mubeen, J. M¨aki-Turja, and M. Sj¨odin, “Extracting end-to-end timing models from component-based distributed embedded systems.” Springer New York, March 2014, pp. 155–169.

[43] S. Mubeen, J. M¨aki-Turja, M. Sj¨odin, and J. Carlson, “Analyzable modeling of legacy communication in component-based distributed embedded systems,” in Pro- ceedings of the 37th EUROMICRO Conference on Software Engineering and Ad- vanced Applications, SEAA 2011; Oulu; 30 August 2011 through 2 September 2011 :, 2011.

[44] S. Aittamaa, “Programming embedded real-time systems : implementation tech- niques for concurrent reactive objects,” 2011, godk¨and; 2011; 20101123 (simait); LICENTIATSEMINARIUM Amnesomr˚¨ ade: Inbyggda system/Embedded Systems Examinator: Professor Per Lindgren, Institutionen f¨or system- och rymdteknik, Lule˚a tekniska universitet Diskutant: Professor Mikael Sj¨odin, M¨alardalens h¨ogskola, V¨aster˚as/Eskilstuna Tid: M˚andag den 23 maj 2011 kl 13.00 Plats: A1514 - Demostu- dion, Lule˚a tekniska universitet.

[45] C. L. Liu and J. W. Layland, “Scheduling algorithms for multiprogramming in a hard-real-time environment,” Journal of the ACM (JACM), vol. 20, no. 1, pp. 46– 61, 1973.

[46] L. Sha, R. Rajkumar, and J. P. Lehoczky, “Priority inheritance protocols: An ap- proach to real-time synchronization,” IEEE Transactions on , vol. 39, no. 9, pp. 1175–1185, 1990.

[47] D.-Z. He, F.-Y. Wang, W. Li, and X.-W. Zhang, “Hybrid earliest deadline first/preemption threshold scheduling for real-time systems,” in Machine Learn- ing and Cybernetics, 2004. Proceedings of 2004 International Conference on,vol.1. IEEE, 2004, pp. 433–438.

[48] RTFM rtfm-lang.org. [Online]. Available: http://www.rtfm-lang.org

[49] T. P. Baker, “A stack-based resource allocation policy for realtime processes,” in Real-Time Systems Symposium, 1990. Proceedings., 11th. IEEE, 1990, pp. 191– 200.

[50] KLEE. (webpage) https://klee.github.io/. [Online]. Available: https://klee.github.io/ Chapter 6 RTFM-4-SURE preprint

33 34 RTFM-4-SURE preprint

RTFM-4-SURE An Automated Framework for Semiformal Analysis of Embedded Real-Time Software

Lulea˚ University of Technology Per Lindgren, and Henrik Tjader¨ Email: {per.lindgren, henrik.tjader}@ltu.se

Grepit AB Johan Eriksson Email: [email protected]

Abstract The mainstream of embedded development as of today is dominated by C programming. To aid the development, hardware abstractions, libraries, kernels and light-weight operating systems are commonplace. However, these typically offer little or no help to the verification process, thus test based approaches remain the de facto. For this paper we take the outset from the Real-Time For the Masses (RTFM) set of languages and tools being developed to facilitate embedded software development and provide highly efficient implementations suitable to the mainstream of embedded system design. The programming model is reactive, based on the familiar notions of concurrent tasks with shared resources (amounting to named critical sections). The supporting RTFM-kernel guarantees deadlock-free preemptive execution on bare metal single core targets by efficiently exploiting the underlying interrupt hardware for static priority scheduling and resource management under the Stack Resource Policy. This approach allows a plethora of well-known methods to static verification (response time analysis, stack memory analysis, etc.) to be readily applied. We report on the development of RTFM-4-SURE, a framework that integrates program verifi- cation and automated test vector generation with an automated testbed performing cycle accurate measurements of the generated tests. Besides WCET estimates, the semiformal analysis reveals common programming errors and checks arbitrary user/model specified invariants. The approach is straightforward (in many cases requiring no manual intervention by the developer), and thus complements or even replaces manual testing without requiring the expert knowledge, effort and cost associated with formal modelling and verification. The feasibility of the proposed approach is demonstrated in an experimental setup applied to an ARM Cortex-M4 target, showing that cycle accurate WCET measurements are obtained. 35

I. INTRODUCTION Embedded software is playing an increasingly important role to the design and imple- mentation of embedded devices of the CPS/IoT era. In additional to functional requirements, embedded software is often subject to requirements on robustness, safety, real-time properties and resource efficiency. To aid the development, ecosystems like hardware abstraction layers, libraries, kernels and light-weight real-time operating systems are commonly adopted. While improving design efficiency (code re-use) they typically leave verification at the hands of the developer. To this end, test based methods remain the de facto as formal methods are still yet to reach industrial state-of-practice. In comparison to formal (model based) approaches, target testing has the clear advantage that the results are derived from observations on the actual hardware, and is thus free from assumptions and approximations associated to models and imprecise program analysis. On the downside however, it is a tedious and error prone process to manually derive test vectors, instrument the code, perform tests, and collect/analyze the results. Moreover, test based verification of concurrent systems is known to be an inherently hard problem, and errors are hard to pin-down even when using advanced tracing tools. Furthermore, test based verification can cover only a limited subset of task interleavings. Additionally, test based approaches may also be fragile, small changes in the system, might imply large consequences to the real-time operation, which in effect calls for a complete re-verification of the system. In this paper we propose RTFM-4-SURE a framework for automated verification of both functional and non-functional requirements (e.g. overall schedulabilty, response times etc.). We take the outset from Real-Time For the Masses (RTFM), a set of languages and tools being developed to facilitate embedded software development and provide highly efficient implementations geared to static verification. The programming model is reactive, based on the familiar notions of concurrent tasks and critical sections (treated as single-unit resources). The execution model provides race-free concurrency and serialization of side effects under full control of the programmer. The supporting RTFM-kernel guarantees deadlock-free preemptive execution on bare metal single core targets by efficiently exploiting the underlying interrupt hardware for static priority scheduling and resource management under the Stack Resource Policy (SRP)[1]. Besides the advantages associated to SRP based scheduling (see Section VII) a plethora of well-known methods to static verification (response time analysis, stack memory analysis, etc.) have been established over the last decades (see e.g. the seminal work [1], and more recent publications [18],[8] and [13]). Methods for response time and schedulability analysis typically rely on a priori WCET information. In the case of SRP based analysis, the WCET and resource holding times for the taskset. Our approach takes the outset of test/measurement based methods well established in industry, leveraged by known methods to semiformal program analysis[2]. The RTFM-4-SURE verification process comprises: • automated semiformal program analysis, deriving concrete test vectors triggering each feasible path of execution and failed assertions (errors) for the tasks of the system; 36 RTFM-4-SURE preprint

• automated measurement based WCET and resource holding time estimation for the task set; • system wide analysis, e.g., schedulability, response time, stack memory etc. In this paper we focus on the program verification, test vector generation, and automated execution time testbed, leaving system wide analysis to future work. In Section II we review the RTFM task and resource model and the KLEE framework[2]. In Section III we present our approach to program analysis and test vector generation. In Section V we outline the automated testbed for cycle accurate measurements of WCETs and resource holding times. Section VI, outlines the RTFM-4-SURE framework and demonstrate its feasibility by a prototype implementation applied to a Cortex M4 target MCU. Related work on WCET estimation and system analysis are covered in Section VII. The contributions of the paper are summarized in Section VIII including an outlook to future work.

II. BACKGROUND A. RTFM Model of Computation For the purpose of this presentation we present a simplified model of computation, see e.g. [15], for further information. A system is defined as as set of tasks/jobs T = {t1, ..., tn}, and a set of single unit, non-preemptable resources R = {r1, ..., rk}. Each task ti is associated a static priority p(ti), a minimum inter arrival time ia(ti), and deadline dl(ti). Tasks are event triggered and executes a finite sequence of operations under run-to-end semantics. A task ti may claim exclusive access to a resource rj for a (named) critical section cs(ti,rj). Critical sections may only be nested in a LIFO manner. In effect the RTFM MoC adheres to the requirements for Stack Resource Policy (SRP) based analysis and scheduling. To this end, ri denotes the resource ceiling for the resource ri.

Example 1 Listing 1, depicts a system with three tasks j1, j2, and j3 (at priorities 1, 2, and 3 respectively1), and two resources r1, and r2. Resource ceilings are derived according to SRP as the maximum priority of any task that potentially claim the resource, see Figure 1.

Listing 1. main.c, example system. 1 ... 2 // task/job priorities 3 #define j1_Prio 1 4 #define j2_Prio 2 5 #define j3_Prio 3 6 7 // resource ceilings 8 #define r1_Ceiling 2 9 #define r2_Ceiling 3 10 11 // System state 12 static int state;

1Higher value indicates higher urgency. 37

j3 (prio 3)

j2 (prio 2) r2 (ceiling 2) j1 r1 (prio 1) (ceiling 3)

idle (prio 0)

Fig. 1. r1 is accessed by j1 and j3, r1 = max(p(j1),p(j3)) = 3, while r2 is accessed by j1 and j2, r2 = max(p(j1),p(j2)) = 2.

13 14 TASK(j1) { 15 LOCK(r2); 16 if (state > 0) { 17 work(10); 18 } else { 19 work(5); 20 } 21 LOCK(r1); 22 work(1); 23 UNLOCK(r1); 24 UNLOCK(r2); 25 } 26 27 TASK(j2) { 28 TRACE("j2_enter\n"); 29 LOCK(r1); 30 work(5); 31 UNLOCK(r1); 32 33 if (state == 100) { 34 work(2); 35 } else { 36 work(1); 37 } 38 } 39 40 TASK(j3) { 41 LOCK(r2); 42 work(1); 43 UNLOCK(r2); 44 TRACE("j3_exit\n"); 45 } 46 ... 38 RTFM-4-SURE preprint

B. RTFM-kernel The RTFM-kernel provides a minimalistic API (C code macros) that efficiently imple- ments SRP based scheduling and resource management for the ARM Cortex M families of MCUs. Listing 2, depicts the resource management on Cortex M3/4/7. For further details and performance evaluation, we refer the reader to [5] and [4].

Listing 2. SRP.h 1 ... 2 #define LOCK(R) { \ 3 int old_en = __get_BASEPRI(); \ 4 __set_BASEPRI_MAX((H(Ceiling(R)) << \ 5 (8U - __NVIC_PRIO_BITS))&(uint32_t)0xFFUL); \ 6 BARRIER_LOCK; 7 8 #define UNLOCK(R) \ 9 BARRIER_UNLOCK; \ 10 __set_BASEPRI(old_en); \ 11 } 12 ...

C. KLEE LLVM Execution Engine KLEE[2] is a tool for symbolic execution, operating on a LLVM IR (intermediate represen- tation) of the input program. Symbolic execution is the process of executing the code through an interpreter (virtual machine) while recording all possible path conditions and modifications to variables. Further traversal is terminated either due to completeness or on encountering an assertion violation (error). The path conditions are discharged by means of modern solvers [19] resulting in concrete satisfying assignments to the symbolic variables (i.e., test vectors from which the exact paths can be reconstructed). According to the authors, KLEE has two goals: 1) hit every line of executable code in the program, and 2) detect at each dangerous operation (e.g., dereference, assertion) if any input value exists that could cause an error. KLEE has been demonstrated to efficiently deal with real sized program suites like BusyBox and core-utils [2]. KLEE provides an API allowing to guide analysis including: • klee_make_symbolic(&v, sizeof(v), "v"), instructs KLEE to treat the vari- able v as symbolic (and initially unknown); • klee_assert(c), infers a condition c. KLEE path search progress through the as- sertion or will return with a counterexample triggering the error; • klee_assume(c), infers a (pruning) condition c that KLEE assumes for subsequent operations along the path; In combination, the API provides precise control over the path exploration as discussed in Sections III and VI. 39

III. SEMIFORMAL PROGRAM ANALYSIS This Section focus the adoption of KLEE to the purpose of program verification and test vector generation. For general KLEE usage we refer the reader to the KLEE documentation[12].

A. Tasks, Resources and RTFM-kernel calls a) Task: Task instances are either defined manually or derived from a RTFM-core model by the rtfm-core compiler[14]. From an implementation perspective a task amounts to a C function (taking optional arguments in the case of RTFM-core derived model). Each task instance is bound to the interrupt vector, and scheduled directly by the hardware. Hence for the purpose of analysis it is sufficient to analyze the code reachable from the interrupt vector (seen as the root of the system). b) Resources: A task t claiming a resource R for a section of code C, amounts to a sequence of operations S =[LOCK(R), C, UNLOCK()], where LOCK/UNLOCK are C code scheduling primitives implemented as macros allowing for efficient inlining. From a scheduling perspective we can see C as a named critical section cs(t, R), where R is held for the duration of C.

B. KLEE based vector generation We derive the per-task critical sections by symbolic execution. For the analysis (task path search) the scheduling primitives (macros) are replaced by empty definitions. Notice, at this point we are not concerned with the run-time behavior of the scheduling primitives, but rather to find concrete test vectors for each possible path (and thus in consequence cover all reachable critical sections).

Example 2 Figure 2, depicts the test vectors (terminal nodes) and the derivation tree. Starting from the top node (start), job and state are marked as symbolic. This corresponds to lines 6,7 of Listing 3. The assumption 0 < job < 4 restricts analysis to the three tasks at hand, corresponding to the node (assume) and line 9 of Listing 3. The terminal nodes (boxed) depicts concrete assignments (job and state) satisfying each path condition as derived by the KLEE tool.

Listing 3. main.c, KLEE harness. 1 ... 2 #ifdef KLEE 3 /* The case when KLEE makes the path analysis */ 4 static int job; 5 int main() { 6 klee_make_symbolic(&job, sizeof(job), "job"); 7 klee_make_symbolic(&state, sizeof(state), "state"); 8 9 klee_assume(job > 0 && job < 4); 10 #ifdef KLEE_STATE /* enable assumption on state */ 11 klee_assume((-10 < state) && (state < 10)); 12 #endif 40 RTFM-4-SURE preprint

start

job = ? state = ?

assume

0 < job < 4 state = ?

switch (job)

1 2 3

job = 3 if (state > 0) if (state == 100) state = 0

false true false true

job = 1 job = 1 job = 2 job = 2 state = 0 state = MAXINT state = 0 state = 100

Fig. 2. Derivation tree for the semiformal analysis, from symbolic values (start) to concrete assignments satisfying each path condition (terminals).

13 switch (job) { 14 default: 15 case 1: // check task 1 16 j1(); break; 17 case 2: // check task 2 18 j2(); break; 19 case 3: // check task 3 20 j3(); break; 21 } 22 } 23 ...

Example 3 Figure 3, depicts a the derivation tree under the additional assumption −10 < state < 10, corresponding to enabling KLEE_STATE line 11 of Listing 3. In comparison to Example 2, we have one path less as the path condition state == 100 is unfeasible. (Dead code can be identified by a coverage report, out of scope for this presentation.)

Example 4 Figure 4, depicts a the derivation tree for the modified task j3 Listing 4 under the assumption job == 3 and −10 < state < 10. KLEE successfully tracks the data dependencies and comes up with 10 distinct paths. A sneak peek 41

start

job = ? state = ?

assume

0 < job < 4 state = ?

0 < job < 4 -10 < state < 10

switch (job)

1 2 3

job = 2 job = 3 if (state > 0) state = 0 state = 0

false true

job = 1 job = 1 state = 0 state = MAXINT

Fig. 3. Addressing the unknown.

to the measured execution times reveals that state == 4 gives the worst case, which may not be obvious to the naked eye!

Listing 4. main.c, indirect data dependencies. 1 ... 2 TASK(j3) { 3 TRACE("j3_enter\n"); 4 LOCK(r2); 5 TRACE("j3_r2_locked\n"); 6 work(1); 7 UNLOCK(r2); 8 TRACE("j3_exit\n"); 9 10 int b=2* state; 11 if (b < 10) 12 work(10); 13 14 for (int i = 0; i < state; i++) 15 work(1); 16 } 17 ... 42 RTFM-4-SURE preprint

j = ? s = ?

j = 3 s = ?

j = 3 -10 < s < 10

b < 10

false true

for (int i = 0; i < state; i++) for (int i = 0; i < state; i++)

j = 3 j = 3 j = 3 j = 3 j = 3 j = 3 j = 3 j = 3 j = 3 j = 3 s = 5 s = 6 s = 7 s = 8 s = 9 s = 0 s = 1 s = 2 s = 3 s = 4 6337 7374 8411 9448 10485 11174 12211 13248 14285 15322

Fig. 4. Numbers 6637 etc., give a sneak peak to measured execution times, showing that non-trivial data dependencies are successfully handled by our approach. For brevity state and job are shortened to s and j respectively.

C. Functional Verification The tests generated by KLEE will cover all the (feasible) paths of the given program (in our case the task set), while KLEE will not cover all the possible assignments of variables, arguments to functions/procedures, reads/writes to memory locations or arguments for prim- itive operations (instructions). However, specific assignments (or ranges thereof) may be of importance to trace, and/or check for feasibility in order to uphold invariants for correctness. In the following we discuss and demonstrate the use of annotations to static analyses and optionally run time verification (the latter by replacing the stated invariants by run-time checks). 1) Contracts: Register access is fundamental to low-level embedded software, to which end contracts in terms of invariants and assumptions can be added as annotations to the software model. Listing 5, depicts a fictive example for static contract verification2. The assertion klee_assert(’0’<= v && v <= ’9’) forces KLEE to prove the range of v (if failed a counter example will be provided). As a bonus, KLEE will generate an implicit

2The example assumes that the USART and ITM refers to known memory locations for the KLEE analysis. 43 assertion to prove that the index p is in bounds of the array ITM->PORT. The assumption klee_assume(’0’<= dr && dr <= ’9’) gives partial information to the returned value. In effect both (A) and (B) will fail the static KLEE verification. For (A) the assertion (line 15, Listing 5) on the range of v will fail, while (B) fails due to the implicit assertion on the array index p (given that we have 32 ITM ports, ranged [0:31]). Notice, the type checker of clang 3.8 is missing this seemingly trivial error of (B), a better compiler/linter should be able to spot this fault.

Listing 5. Register read assumptions and write assertions 1 #include "klee.h" 2 uint32_t usart_read() { 3 dr = USART2->DR; 4 klee_make_symbolic(&dr, sizeof(dr), "dr"); 5 return dr; 6 } 7 8 uint32_t usart_read_digit() { 9 uint32_t d = usart_read(); 10 klee_assume(’0’ <= d && d <= ’9’); 11 return d; 12 } 13 14 void itm_write_digit (uint8_t p, uint8_t v) { 15 klee_assert(’0’ <= v && v <= ’9’); 16 ITM->PORT[p].u8 = v; 17 } 18 19 TASK(j3) { 20 itm_write_digit (0, usart_read()); // (A) 21 itm_write_digit (32, usart_read_digit()); // (B) 22 } 2) Resource type invariants: In our MoC, resources are used to protect shared instances, e.g., peripherals and user defined structures (statically allocated memory regions). A general- ization of the aforementioned register contracts may be applied to encode type invariants. This can be done by associating a resource R with an assumption on the state of R (on entrance of the critical section) and a corresponding assertion on the mutated stated (on exit of the critical section). Such (resource) type invariants brings advantages to both code robustness/safety (as KLEE will prove the contract under the given assumptions), as well as complexity (as the assumptions will prune the search space).

D. Non-Functional Verification The generated test cases can be used to further establish non-functional properties. In this paper we focus on execution time, but other entities like energy or memory access patterns can be envisioned. 1) Time Variant Instructions: The timing of instructions are not side-effects that KLEE is aware of, thus LLVM instructions with varying execution time will not be covered from a WCET perspective. To address such problems in general the programmer must make sure that the worst case WCET is triggered during benchmarking (i.e. that KLEE generates test cases that cover the timing variants of the program). 44 RTFM-4-SURE preprint

a) Hardware Division: Hardware division is a typical example of time variant instruc- tions. For the Cortex case, the execution time can vary in between 2..12 cycles3. The potential underestimation can be calculated by making a KLEE trace for each path (by utilizing klee --replay-ktest-file) and combining this trace with a model of the resulting execution time vs the worst case execution time. For example, consider the following KLEE output:

Listing 6. KLEE replay output. 1 DATA_DEP_INSTR_DIV__main1.c:24 2 _NUM:0 3 _DENUM:1

The model of the hardware UDIV m/n instruction can from this information (UDIV 0/1) deduce that this instruction will take 2 cycles, thus giving a worst case underestimation of 10 cycles. Hence we can compensate for the potential underestimation of such time variant instructions for each path derived, by instrumentation code as shown below.

Listing 7. Instrumentation of hardware division, UDIV m/n. 1 klee_print_expr("DATA_DEP_INSTR_DIV__" 2 __FILE__ , __LINE__); 3 klee_print_expr("_NUM", m); 4 klee_print_expr("_DENUM", n); 5 m/n;

Another approach is to replace the division with a software implementation of the data dependent instruction. In general this approach works, but it will in the worst case generate exponential number of paths w.r.t argument bit size. b) Conditional instructions: A typical example is the following C code: V3 = (COND?V1:V2). LLVM will generate a conditional assignment without any path condition. Depending on the machine instruction set this is potentially unsafe w.r.t. to WCET analysis even if the target machine support conditional instructions. Also depending on the complexity, the LLVM → target compiler might generate branch instruction(s) (missed by the KLEE path condition analysis). In the case of conditional instructions, a safe transformation is achieved by explicitly branching counterparts. c) Emulated Instructions: The aforementioned problem also applies to Software Em- ulation of LLVM instructions in general (when corresponding machine instruction(s) is/are missing, e.g. software emulation of floating point instructions). Such emulation code is largely written in assembly (due to performance) and thus out of reach for the KLEE IR path analysis. To this end, with knowledge of the externally linked code, a potential solution is to enforce KLEE path enumeration through inferring explicit intervals and conditionals (out of scope for this presentation). 2) Limitations and opportunities: The current implementation requires the programmer to manually annotate the occurrence of symbolic variables. For a variable declared symbolic at entrance of a task, consecutive dependencies will regard all assignments, assumptions and

3ARM Cortex -M4 Processor Revision: r0p1 Technical Reference Manual. 45 assertions within the sequential context. This is in general exactly what we want, and implies a minimum of annotations to the source. However, in case of access to a shared resource R (declared symbolic at task entrance) consecutive claims of the resource R may lead to underapproximation of the path enumeration (as possible preemptions by higher priority tasks mutating R are neglected). From a safety perspective, this problem can be address by limiting each resource to be claimed once within each task, or by introducing a fresh instance of the resource R, forcing KLEE to acknowledge the state of R as unknown. The latter will however be un-ergonomic put at the hands of the programmer (as the references to R will be to the fresh instance in the case of analysis, while references to R will be to the original (real) instance for the production code, thus forcing the programmer to maintain two versions of the code). We foresee this problem to be solved by further extending the RTFM-4-SURE framework by taking the outset of (resource) type invariants in combination with symbolic streams. This will provide further safety while relieving the programmer from manual annotations.

IV. COMPLEXITY -TAMING THE BEAST Path explosion is the biggest challenge with this approach.In general for a sequence of statements, the complexity of the total number of paths are (N1 ∗ N2 ∗ ..Ni) where Ni is the number of individual paths for the statement i. In the following we will showcase potential problems and mitigating actions on a set of fictive yet representative examples. 1) Sequential statements: For the following example the possible paths are 100 ∗ 4 = 400:

Listing 8. Consecutive branches/loops 1 klee_make_symbolic(&m, sizeof m, "m_sym"); 2 klee_make_symbolic(&n, sizeof n, "n_sym"); 3 klee_assume(m < 100); 4 int v=0; 5 6 for (int i = 0; (i <= m); i++) 7 v++; 8 9 switch(n){ 10 case 1: 11 v+=10; 12 case 2: 13 v += 100; 14 case 3: 15 v += 1000; 16 default: 17 ; 18 }

Our current unoptimized prototype implementation can perform about 100 test/minute. Thus, feasible path sets are at maximum a few thousand to still generate a answer in reasonable time. An option that reduces the complexity (number of paths generated for benchmarking) during development while still giving a reasonable approximation is to pass the -only-output-states-cove option to KLEE. KLEE will then only output paths that cover new code (however the 46 RTFM-4-SURE preprint complexity of the KLEE path analysis is not affected). This renders linear complexity ((1 + N − N − .. N − N ( 1 1) + ( 2 1) + ( i 1)), where i is the number of paths required to get full code coverage of the statement i. In the last example the number of paths required are reduced to 5, ( 1+(2− 1)for +(4− 1)swtich). Notice, while giving full code coverage (and thus safe to the end of code verification), the approach may lead to underestimated the WCET. For the example at hand the WCET may be greatly underestimated since there is no guarantee that KLEE spots the maximum allowed value on m. 2) Uncorrelated branches: Consider the following example, a straightforward implemen- tation that counts the number of bits in m:

Listing 9. Data dependent loop with uncorrelated branches 1 klee_make_symbolic(&m, sizeof m, "m_sym"); 2 for (int i = 0; (i < 8); i++) 3 if ((m >> i) & 1) v++;

The complexity on this code, w.r.t. possible number of paths = 28. (KLEE generates 256 possible paths.)

This can be mitigated by replacing branches with conditional assignments:

Listing 10. Data dependent loop, using conditional assignments 1 klee_make_symbolic(&m, sizeof m, "m_sym"); 2 for (int i = 0; (i < 8); i++) 3 v=v+(((m >> i) & 1)?1:0);

Since KLEE doesn’t consider the conditional assignment a path the complexity of this is 1 (and KLEE generates just 1 path). For some targets4 we trade performance (efficiency of analysis) to accuracy. 3) Iterated conditionals - correlated branches: Path explosion maybe a problem with iterated conditionals. However, KLEE tracks data dependencies and generates only 2 paths for the below example. This since the branching condition is only dependent on n, which remains constant across all iterations.

Listing 11. Data dependent loop with correlated branches 1 klee_make_symbolic(&m, sizeof m, "m_sym"); 2 klee_make_symbolic(&n, sizeof n, "n_sym"); 3 int v=0; 4 for (int i = 0; (i < 8); i++) 5 if (n)v|=m&(1<

4) Iterated conditionals - un-correlated branches: KLEE also correctly handles more intricate cases, consider this code (Bitcopy) that iteratively computes v = m&sel (bitwise and). KLEE will in general generate 2n paths, for this example 28 = 256 paths:

Listing 12. Bitcopy 1 klee_make_symbolic(&m, sizeof m, "m_sym"); 2 klee_make_symbolic(&sel, sizeof sel, "sel_sym");

4Targets with timing dependent conditional assignments. 47

3 int v=0; 4 for (int i = 0; (i < 8); i++) 5 if ((1<

We may reduce the complexity by local constraints, enforcing branch correlation. By restricting the selector (sel) to the edge case values {0, 0xff} KLEE only generate 2 paths, while the loop iteration code remains unchained. For this particular case the edge cases covers the worst case behavior. However it is up to the programmer to select representative edge case values to trigger worst case. Notice, this mitigating action implicitly restricts v. The local constraint further propagates in the sequential context, causing potential underestimation.

Listing 13. Bitcopy, constrained input 1 klee_make_symbolic(&m, sizeof m, "m_sym"); 2 klee_make_symbolic(&n, sizeof n, "n_sym"); 3 int v=0; 4 int sel = ((n & 1) ? 0xff : 0); 5 for (int i = 0; (i < 8); i++) 6 if ((1<

5) Contracts: Another option to tame path explosion was already discussed in Section III-C1, by the means of contracts. To the end of type invariants, the pruning will be safe (assuming invariants successfully proven by KLEE), while assumptions on the environment may require run-time verification.

V. T ESTBED FOR DEBUG AND TEST In the following we describe a general testbed for debug and test of embedded targets, and discuss, as a running example, how it can be deployed to the ARM Cortex range of MCUs.

A. Overview General Testbed 1) GNU Debugger: The GNU GDB tool provides both command line and scripting control to the debugging process. Additionally, it provides front-end support allowing integration to interactive environments (such as ). For the purpose of debugging embedded software it allows to connect to a remote target over various media. In Figure 5, we depict a general GDB-based testbed5.

        ! "%%%%  ! "#$#$             

Fig. 5. Debug testbed; GDB, openocd, connected over USB to remote target running a JTAG-server.

5For the experimental setup we have used the tools, arm-none-eabi-gdb (7.12.1), openocd (0.10.0), and on the remote target (STM32-401RE Nucleo board), running st-link JTAG-server connected to the MCU. 48 RTFM-4-SURE preprint

2) OpenOCD: OpenOCD is a free and open tool for on-chip debugging, in-system pro- gramming and boundary-scan testing. It provides GDB server functionality (acting a remote server for the target device). It supports numerous protocols for the JTAG layer (among those the Cortex DAP discussed below), and provides functionality for programming (flashing) the non-volatile memory of target devices. 3) Cortex Debug Access Port (DAP): The Cortex Debug Access Port (DAP) command set provides a driver-less USB HID connection between the host and the target device. The protocol allows controlling the debug features of the target, and is used and can be used as an interface driver to the OpenOCD GDB server, as well as an API for raw access. 4) JTAG: The JTAG protocol defines a standard test access port and boundary-scan archi- tecture. The JTAG standard has a huge uptake throughout the electronics industry, applied to testing of circuit boards, programming and testing of memory, as well as to debugging of embedded processors. In the latter case, JTAG is deployed as the transport mechanism to access on-chip debug modules inside the target CPU. The Cortex DAP (and its JTAG adapter implementation) is one prominent example.

B. Overview KLEE Testbed extensions A Makefile automates the multiple different stages: compilation, analysis, and measure- ments. 1) LLVM compilation, LLVM linking into one IR file 2) KLEE program analysis and test vector generation 3) Compilation for ARM target 4) Benchmarking on target hardware via GDB scripting 5) Parsing of the analysis data The Makefile uses pre-processor macro definitions to steer compilation for the different stages. 1) LLVM Compilation: KLEE analysis works on LLVM intermediate representation (IR) files. 2) KLEE Analysis: KLEE will create a file of the type .ktest per test-case (vector) detailing the value for every tracked (symbolic) variable.

Listing 14. main.c GDB benchmarking. 1 #elif defined KLEE_WCET 2 /* The case when GDB runs WCET benchmarking */ 3 static int job; 4 int main() 5 { 6 dwt_enable(); 7 DWT->CYCCNT = 0; 8 9 switch (job) 10 { 11 case 1: 12 TRACE_EVENT(S, j1); 13 j1(); 49

14 TRACE_EVENT(E, j1); 15 break; 16 case 2: 17 TRACE_EVENT(S, j2); 18 j2(); 19 TRACE_EVENT(E, j2); 20 break; 21 case 3: 22 TRACE_EVENT(S, j3); 23 j3(); 24 TRACE_EVENT(E, j3); 25 break; 26 } 27 28 finish_execution(); // GDB breakpoint 29 }

Listing 15. SRP_wcet.h 1 typedef enum { 2 r1, r2, /* Resources */ 3 j1, j2, j3 /* Tasks */ 4 } elem; 5 6 typedef enum { 7 NOACTION, /* Unassigned action */ 8 L, /* Lock */ 9 U, /* Unlock */ 10 S, /* Start */ 11 E, /* End */ 12 } action; 13 14 typedef struct { 15 uint32_t time; /* The DWT counter value */ 16 elem elem; /* Timestamped element */ 17 action action; /* Timestamped action */ 18 } event; 19 20 int event_count = 0; 21 /* Create array containing the trace events */ 22 #define MAX_NUM_LOCKS 1000 23 event eventlist[MAX_NUM_LOCKS]; 24 25 void __attribute__ ((noinline)) 26 trace_event(action a, event e) { 27 eventlist[event_count].time = DWT->CYCCNT; 28 eventlist[event_count].action = a; 29 eventlist[event_count].elem = e; 30 event_count+=1; 31 } 32 33 #define LOCK(X) trace_event(L, X); 34 #define UNLOCK(X) trace_event(R, X);

3) LLVM Compilation for ARM target: The final linking for the Cortex hardware is done via GNU GCC LD. 4) Measurement: A Python script interfacing with arm-none-eabi-gdb parses each of the symbolic variables contained in the klee-last/ folder for each of the different test-cases. 50 RTFM-4-SURE preprint

The script will break the target at main() (Listing 14) and then update each symbolic variable according to the contents of the ktest-file of the current test-case. It then proceeds by running the code until finish_execution() is reached. During the measurement phase a macro TRACE_EVENT(s, r) populates an array with objects containing the action, s, and the resource r as well as the current value of the CYCCNT cycle counter (Listing 15). In the end of execution, the script parses and dumps the trace data array into a sqlite database for later review. After the data is stored the MCU is reset to main() and the whole process is repeated for the next ktest-file. 5) WCET Analysis: Provided is another Python script for parsing the database and calcu- lating the number of cycles taken for each task and resource and displaying them in human readable format.

VI. RTFM-4-SURE The RTFM MoC defines a static task and resource model, core to the analysis performed by the RTFM-4-SURE framework. In this paper we focus on automated program verification and the capability to derive cycle accurate WCETs though a measurement based approach. In this section we demonstrate its feasibility, through a prototype implementation, and evaluate its performance.

A. KLEE Analysis and WCET acquisition The RTFM-4-SURE testbed has been applied to the running example (Listing 1). The resulting test vectors, WCETs and resource holding times are shown in Table I. As expected we see 5 paths (corresponding to Figure 2). The measurements are consistent (cycle accurate) between runs, giving confidence of repeatability. 1) Probing effects: For clarity and simplicity of the presentation, the testbed implementa- tion of the kernel primitives (Listing 15) constitutes only instrumentation code. This results in an instrumentation effect to the derived execution times, omitting the actual kernel overhead (amounting to a few machine instructions per LOCK/UNLOCK). In general, such probing effects can be solved by merging the instrumentation and production code implementations (providing a 1-1 correspondence between the system under test and the production code).

B. Response time analyses The derived WCETs and resource holding times can be utilized to various methods of scheduling analysis. In general additional information on task inter arrival times is assumed. Table II, show the response times under the assumption that the busy period equals the deadline, as derived by a simple Excel sheet6.

6More precise analysis is possible by deriving the busy period from a recurrence equation as implemented by our KCC tool [6]. 51

TABLE I MEASURED NUMBER OF CPU CLOCK CYCLES.

Task/Job state WCET cs(t, R1) cs(t, R2) j3 0 1135 - 1046 j2 0 6165 5048 - j2 100 7165 5048 - j1 0 6216 1048 6127 j1 MAXINT 11216 1048 11127

1) Accuracy: Besides the probing effects and over estimations mentioned, the following considerations apply: • Flash memory access: The Cortex M can run code from both RAM and flash. Flash memory latency varies with target implementation and clock frequency. Given that WCET measurements and production code executes on the same target and frequency as the production code, estimates are accurate7; • Interrupt latency: Interrupt OH (stacking/de-stacking) imply a worst case constant for the Cortex M family; • Flash cache and pre-fetch buffer: The Cortex M family has a simple memory hierarchy, with bound worst case behaviour. During measurements, tasks are executed from main, with no prior caching or pre-fetching, thus safe to that end. However, flash cache/pre- fetch may be polluted due to preemption. To this end, the analysis at hand gives an upper bound to pollution effects, e.g., task j3 may be preempted 5 times, while task j1 suffers no pollution; • DMA transfers: If program and DMA transfers access the same memory region, bus arbitration may impose additional latency to the executing code. In general such effects are complex to model and predict. In our setting a worst case behavior can be obtained by performing the measurements under ongoing worst case DMA load.

C. Performance Evaluation The performance evaluation was executed on an Arch Linux VM, under a MacBook Pro host (i7 at 3.4Ghz) running Virtual Box with 2 cores and 8G RAM allocated to the guest. The openocd server was running at the host, with IP forwarding to the guest such to avoid USB emulation. 1) KLEE analysis and test vector generation: In the first set of experiments (Table III), we study the KLEE analysis. The example is chosen to expose highly non-deterministic behavior to unknown yet bound inputs. Experiments A, B and C scales the number of paths to 10, 100 and 1000 respectively for the task j3, Listing 4.

7Code execution may be exposed to DMA effects as discussed separately. 52 RTFM-4-SURE preprint

TABLE II RESPONSE TIME PER TASK.

Task Inter-Arrival WCET Blocking Interference Response /Job Time j3 20000 1135 11127 0 12262 cs(j3,R2) j2 30000 7165 11127 2070 2 ∗ C(j3) 20562 cs(j3,R2) j1 40000 11216 0 17735 28951 3 ∗ C(j3) + 2 ∗ C(j2)

We can see that the number of instructions interpreted by KLEE scales approximately linear to the number of resulting paths, while complexity (User) of path exploration increases exponentially. As expected methods to tame explosion might be required to constrain larger problems. 2) Measurement based WCET acquisition: In the second set of experiments (Table IV), we study the complete benchmarking process, automated as a single click solution. Overall, for the set of experiments the process is dominated by the python/GDB interaction as seen roughly as the Real2 − (User2 + Sys2 + Real1), The cost of measurements scales linearly to the number of test vectors, as the actual execution time of the MCU instructions are negligible in this case. 3) Evaluation and discussion: The overall process is non-optimized and should be seen as a proof of concept. E.g., pipelining of test vector generation and measurements is possible. Moreover the python/GDB interaction could possibly be replaced by a faster direct commu- nication to openocd, or even omitted by direct communication to the DAP/JTAG server. As the path generation may be exponential we expect the KLEE analysis part to outgrow the measurements for systems with larger non-determinism. Nevertheless, for typical embedded applications with limited non-determinism (and hence limited set of unique paths to consider), the proposed straightforward approach appears applicable.

TABLE III EXPERIMENT 1: EFFICIENCY OF RTFM-4-SURE, KLEE.

Exp # Paths # Instr User1 Sys1 Real1 A 10 14781 0.107s 0.00s 0.164s B 100 99021 0.900s 0.057s 1.240s C 1000 941421 41.810s 3.543s 50.668s 53

TABLE IV EXPERIMENT 2: EFFICIENCY OF RTFM-4-SURE, BENCH.

Exp # Paths # Instr User2 Sys2 Real2 A 10 14781 0.920s 0.327s 6.501s B 100 99021 1.887s 1.053s 41.585s C 1000 941421 45.487s 10.653s 7m7.459s

VII. RELATED WORK A. WCET estimation Execution time estimation has traditionally been conducted through manual test and mea- surement based approaches, either on the actual targets or using simulated environments. However, such processes are tedious and error prone. Moreover, with growing source code complexity, its inherently difficult to ensure that the worst case behaviour has been exposed. To this end, over the last decades, much effort has been spent both in academia and industry to provide methods and tools that improves on manual test and measurement approaches. Ermedahl and Engblom [7] give an excellent overview to the area and layout different ap- proaches to model based and hybrid WCET estimation. Industrial experiences of commercial WCET tools are also covered, identifying a number of outstanding observations. Among these we find8: 1) lack of ”one-click” analysis; 2) lack of automatic loop bound calculation; 3) structure of code has significant impact to analyzability; 4) manual annotations can sometimes be burdensome to generate; 5) WCETs under certain operation conditions are desirable; 6) auto generated code should take analyzability into consideration; 7) operating-system services and interrupt handlers should be taken into account; 8) the large diversity in processors used in embedded systems hinders the deployment of static WCET analysis; 9) measurements and static timing analysis complement each other, together giving a better understanding of the system timing and increase the trust in the resulting timing estimates RTFM-4-SURE seek to address the above observations and challenges: 1) given the harness (Listing 3) test vector generation and execution times are derived in a fully automated manner; 2) data dependencies are tracked throughout the complete call graph, hence loop bounds are derived based on all partial information given, e.g., a top level assumption gives a concrete loop bound as shown in Example 4;

8For the full list of items, see [7]. 54 RTFM-4-SURE preprint

3) the RTFM MoC gives a well structured model amenable to static analysis; 4) the tracing of data dependencies 2) reduces the need for annotations; 5) modes of operation can be encoded in the harness and focused by assumptions, e.g., Example 4 focus analysis to a mode of operation with just one task (j3); 6) auto generated code from RTFM-core models, preserves the MoC 3) and thus amenable to further static analysis; 7) the RTFM MoC unifies application code, interrupts, and operating system services, thus becomes part of the system wide analysis; 8) the program analysis are performed on a LLVM IR representation of the code compiled for the target architecture, hence any target supported by LLVM should work; only the testbed need to be adopted to support new targets; As supporting the ARM Cortex M architecture, a wide variety of targets is already supported. 9) RTFM-4-SURE combines the advantages of measurement and static analysis; Recent related work includes [20], concluding that creating architectural models based on manufacturer’s data sheets is exhaustive, and while safe may lead to gross over estimations using the OTAWA framework. A representative example of hybrid approaches (adopting static analysis complemented by measurements) is found in the commercial tool RapiTime (by Rapita Systems)[11]. A prediction of the worst-case path and WCET is carried out by combining the obtained timing information and the structural model of the code. Over estimations due to imprecise flow facts and approximations during analysis have been recently studied in [3]. In comparison, our approach derives test for all feasible paths or fails due to time- or other user defined - limitations being reached. Hence, test vectors and corresponding measurements derived are ensured to accurately expose worst case behaviour, taking all traceable9 data dependencies into consideration. Thus, our approach may improve accuracy to both architectural model based methods as well as hybrid techniques. Other representative of state-of-the-art tools: 1) aiT from AbsInt [9]: aiT is a commercial WCET analysis package the performs static analysis code snippets by abstract interpretation on the binary code for a wide range of target platforms. In general aiT requires user annotations for generating meaningful output. aiT computes the upper bound including caches and pipeline interactions, based on timing accurate architectural models. In general aiT is not able to calculate the overall schedulability of the system. Instead aiT lets the user characterize selected sub-parts of the system, and perform a schedulability analysis using some other tool/methodology based on the derived WCETs. 2) SWEET [16]: SWEET is a open source tool developed at MRTC (Malardalen¨ Real- Time Research Centre), it includes a C compiler that outputs a intermediate format (ALF), used for the abstract execution. This approach is quite similar to our the KLEE + LLVM approach, where KLEE operates on the LLVM IR (intermediate representation).

9Up to the limitations of the KLEE tool. 55

SWEET assumes a third party timing model, analyzing the emitted flow facts for the target architecture. Similarly to aiT, overall schedulability analysis is out of reach to the SWEET tool.

B. Modelling and analysis Schedulability analysis has been given much attention (see, e.g., the seminal work [17]). The theoretical work typically takes the outset of modelling the system in terms of (concurrent) tasks, potentially supporting shared resources. We base our work on the Stack Resource Policy (SRP) protocol [1]. SRP based scheduling brings a number of benefits, most prominently: • deadlock-free execution; • single (non-transitive) blocking; • shared stack execution; • single context switch per task; In parallel to the development of the RTFM-kernel, a similar approach (SLOTH) was proposed to interrupt driven execution of OSEK models[10]. While SLOTH focus on the scheduling and resource management of OSEK models, RTFM-lang and accompanying MoC aims at providing an ecosystem for modelling, analysis, and implementation of general purpose embedded real-time software. As we have seen in Section V-B5, the proposed RTFM-4- SURE framework can derive WCET estimations from which response time analysis can be readily applied.

VIII. CONCLUSIONS In this paper we have introduced RTFM-4-SURE, a framework for semiformal verification of embedded real-time software. By taking the outset of the familiar task and resource model of the RTFM-MoC, a tight integration to scheduling analysis is possible. We have shown that cycle accurate task WCETs and resource holding times can be derived by a single-click testbed tool, integrating test vector generation and execution time measurements on target MCUs in a fully automated fashion. The test vectors are derived by path enumeration through symbolic execution of the tasks using the KLEE tool. Besides test vector generation, KLEE verifies assertions ensuring invariants for correctness, e.g., automatically inferred assertions to array indexing and user specified invariants. To the latter, we have shown how contracts and type invariants can be encoded. Furthermore, we have discussed complexity issues regarding both program analysis and automated measurements, along with mitigating actions. As a proof of concept, a set of experiments has been performed on a prototype imple- mentation of the framework applied to a Cortex M4 target. The evaluation concludes that cycle accurate measurements are obtained with perfect repeatability. Accuracy as well as performance aspects of the multi-step verification process have been assessed. Moreover, similarities and differences to existing approaches have been discussed. Overall, our experiences indicate that the RTFM-4-SURE approach has the potential to offer a single-click solution to the verification of both functional and non-functional properties. We 56 RTFM-4-SURE preprint foresee a number of potential improvements, and directions to future work. These include: 1) automated code annotations based on contracts and type invariants, with the advantages to improved robustness and reliability to the embedded software, as well as to reduced complexity of the analysis; 2) optimization of the testbed though parallelization of analysis and pipelining of back-end measurements; 3) opportunities to system wide analysis.

REFERENCES

[1] T. P. Baker. Stack-based scheduling for realtime processes. Real-Time Syst., 3(1):67–99, Apr. 1991. [2] C. Cadar, D. Dunbar, and D. Engler. Klee: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI’08, pages 209–224, Berkeley, CA, USA, 2008. USENIX Association. [3] H. Casse,´ H. Ozaktas, and C. Rochange. A Framework to Quantify the Overestimations of Static WCET Analysis. In F. J. Cazorla, editor, 15th International Workshop on Worst-Case Execution Time Analysis (WCET 2015), volume 47 of OpenAccess Series in Informatics (OASIcs), pages 1–10, Dagstuhl, Germany, 2015. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. [4] J. Eriksson, S. Aittamaa, J. Wiklander, P. Pietrzak, and P. Lindgren. SRP-DM Scheduling of Component-Based Embedded Real-Time Software. First International Workshop on Dependable and Secure Industrial and Embedded Systems, 2011. [5] J. Eriksson, F. Haggstrom, S. Aittamaa, A. Kruglyak, and P. Lindgren. Real-time for the masses, step 1: Programming API and static priority SRP kernel primitives. In SIES, pages 110–113. IEEE, 2013. [6] J. Eriksson, F. Haggstr¨ om,¨ S. Aittamaa, A. Kruglyak, and P. Lindgren. Real-time for the masses, step 1: Programming and static priority srp kernel primitives. In Industrial Embedded Systems (SIES), 2013 8th IEEE International Symposium on, pages 110–113. IEEE, 2013. [7] A. Ermedahl and J. Engblom. Execution time analysis for embedded real-time systems. Handbook of Real-Time Embedded Systems, s. 35.1, 2007. [8] K. Hanninen,¨ J. Maki-Turja,¨ M. Bohlin, J. Carlson, and M. Sjodin.¨ Analysing stack usage in preemptive shared stack systems. Technical Report ISSN 1404-3041 ISRN MDH-MRTC-202/2006-1-SE, July 2006. [9] R. Heckmann and C. Ferdinand. Worst-case execution time prediction by static program analysis. [10] W. Hofer, D. Lohmann, and W. Schroder-Preikschat.¨ Sleepy Sloth: Threads as Interrupts as Threads. In L. Almeida and S. Brandt, editors, Proceedings of the 32nd IEEE Real-Time Systems Symposium (RTSS 2011), pages 67–77, Los Alamitos, CA, USA, 2011. [11] R. Kirner, P. Puschner, and I. Wenzel. Measurement-based worst-case execution time analysis using automatic test-data generation. In IN PROC. IEEE WORKSHOP ON SOFTWARE TECH. FOR FUTURE EMBEDDED AND UBIQUITOUS SYSTS. (SEUS05, pages 7–10, 2004. [12] KLEE. https://klee.github.io/. last accessed 2017-03-17. [13] P. Lindgren, J. Eriksson, S. Aittamaa, P. Pietrzak, and J. Wiklander. Scheduling of CRO systems under SRP-DM. Technical report, Lule University of Technology, 2011. [14] P. Lindgren, M. Lindner, E. Fresk, D. Pereira, and L. Pinho. RTFM-core: Language and Implementation, 2014. Paper presented at Embedded Systems Week, New Delhi, India. [15] P. Lindgren, D. Pereira, J. Eriksson, M. Lindner, and L. M. Pinho. RTFM-lang Static Semantics for Systems with Mixed Criticality. In Ada-Europe 2014: 19th International Conference on Reliable Software Technologies, 2014. [16] B. Lisper. Sweet–a tool for wcet flow analysis. In International Symposium On Leveraging Applications of Formal Methods, Verification and Validation, pages 482–485. Springer, 2014. [17] C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM, 20(1):46–61, Jan. 1973. [18] J. Maki-Turja¨ and M. Nolin. Efficient implementation of tight response-times for tasks with offsets. Real-Time Systems, 40(1):77–116, 2008. [19] H. Palikareva and C. Cadar. Multi-solver support in symbolic execution. In International Conference on Computer Aided Verification (CAV 2013), pages 53–68, 7 2013. [20] M. H. Thomas Jerabek. Static worst-case execution time analysis tool development. DEPEND 2016 The Ninth International Conference on Dependability, 2016. Part II

57 58 Paper A Implementation of SRP-DM Scheduling for Embedded Real-Time Software

Authors: Johan Eriksson, Simon Aittamaa, Jimmie Wiklander, Pawel Pietrzak and Per Lindgren

Originally published in: Proceedings of the First International Workshop on Dependable and Secure Industrial and Embedded Systems (WORDS), V¨aster˚as, Sweden, 2011.

c LTU

59 60 Implementation of SRP-DM Scheduling for Embedded Real-Time Software

Johan Eriksson∗, Simon Aittamaa∗, Jimmie Wiklander∗, Pawel Pietrzak∗ and Per Lindgren∗ ∗EISLAB, Lulea˚ University of Technology, 971 87 Lulea˚ Email: {Johan.Eriksson/Simon.Aittamaa/Jimmie.Wiklander/Pawel.Pietrzak/Per.Lindgren}@ltu.se

Abstract—Model and component based design is an established the system can observed as its output to incoming events. means for the development of large software systems, and is In general, system output also depends on previous events. starting to get momentum in the realm of embedded software Thus, embedded systems of scale are stateful. To deal with development. In case of safety critical (dependable systems) it is crucial that the underlying model and its realization captures complexity and allow for CBD of such systems, we partition the requirements on the timely behavior of the system, and that state and functionality into a hierarchy of Concurrent Reactive these requirements can be preserved and validated throughout Components (CRC). Components are specified in terms of the design process (from specification to actual code execution). Concurrent Reactive Objects (CRO) instances and component To this end, we base the presented work on the notion of instances [4], CRO allows the intended behavior of the system Concurrent Reactive Objects (CRO) and their abstraction into Reactive Components. to be expressed in terms of Time-Constrained Reactions (TCR) In many cases, the execution platform puts firm resource [5]. A CRO instance is either implemented in software (e.g, limitations on available memory and speed of computations that synthesized from the CRO model) or implemented by the must be taken into consideration for the validation of the system. system’s environment. This allows incorporating hardware In this paper, we focus on code synthesis from the model, and interactions and legacy code (typically external software li- we show how specified timing requirements are preserved and translated into scheduling information. In particular, we present braries) in the model, as long as their interface is compliant how ceiling levels for Stack Resources Policy (SRP) scheduling with the CRO model. and analysis can be extracted from the model. Additionally, to In this paper, we focus on code synthesis of CRO models support schedulability analysis, we detail algorithms that for a and extraction of information for scheduling (during run-time) CRO model derives periods (minimum inter-arrival times) and as well as offline schedulability analysis. To this end, we show offsets of tasks/jobs. Moreover, the design of a micro-kernel supporting cooperative hardware- and software-scheduling of how specified timing requirements (in the CRO model) are CRO based systems under Deadline Monotonic SRP is presented. preserved and translated into scheduling information, specifi- cally resource ceilings and priority levels for Stack Resource I. INTRODUCTION Policy (SRP) scheduling. Moreover, to support schedulability Model and Component Based Design (CBD) has over the analysis, we detail algorithms that for a CRO model derives years proven to be an effective means for the development of periods (minimum inter-arrival times) and offsets of tasks/jobs. large software systems. Key drivers behind are to increase the Additionally we present the design of a micro-kernel support- efficiency of the design process (mainly through the re-use of ing SRP Deadline Monotonic (SRP-DM) scheduling of CRO components) and to improve the product quality (mainly by based systems, exploiting the interrupt hardware for efficient facilitating the validation process through allowing for separate scheduling. verification of components). Section II gives the necessary background. We detail the With the ever increasing complexity of embedded systems, underlying CRO model (and its abstraction to the CRC model) CBD of embedded software is starting to gain momentum. and briefly recapture the notions and key features of SRP. In In case of embedded, safety critical (dependable) systems, section III, we give an informal mapping from the undertaken it is crucial that the underlying model and its realization CRO model to the notions of SRP. In Section VI, we present an can capture the requirements on the timely behavior of the example system (a process controller), and show how system system, in terms of both external and internal interactions, state, functionality and temporal properties are captured and and that these requirements can be preserved and validated abstracted in terms of the CRC model. In Section V, we throughout the design process (from specification to actual propose a method for code synthesis of CRO models, and code execution). In many cases, the execution platform puts show how the timing requirements from the specification is firm resource limitations on available memory and speed of preserved. In Section IV, we propose an algorithm that from a computations, that must be taken into consideration for the CRO model extracts resource ceilings and priorities for SRP- validation of the system. Thus, a straight forward transition of DM based scheduling. The algorithm is exemplified on the traditional CBD models [1], [2] and tools does not suffice. process controller, showing how timing requirements from We undertake the component model and accompanying the model specification is translated into resource ceilings design methodology presented in our earlier work [3]. With and priorities for the scheduler. Furthermore, in Section VII, the outset from a reactive system view, the behavior of we present the design of an efficient SRP-DM kernel, and

61 demonstrate how the derived resource ceilings and priorities      are utilized at run-time. The paper is concluded in Section  VIII, where we summarize the presented proposals and results,         and give directions for future research in the field.   II. BACKGROUND  

A. Underlying Model Fig. 1. Permissible execution window for a message. 1) Concurrent Reactive Object Model: The concurrent reactive object (CRO) model is the execution and concur- rency model of the Timber programming language, a general- it is not allowed to wait until data becomes available, but it purpose object-oriented language that primarily targets real- must instead return a result indicating that the queue is empty. time systems [6]–[8]. A subset of C, TinyTimber [9], im- plements the core features of Timber and uses CRO as its 4) and specification of timing behavior: execution model. In the CRO model objects communicate by passing mes- In this section we briefly describe the main features of sages. Each message specifies a recipient object (O) and the CRO model and its abstraction to Concurrent Reactive a method (m) of this object that will be invoked. A mes- Components, and discuss its implementation. We present an sage is either synchronous (SYNC(O, m)) or asynchronous informal mapping from the CRO model to the notions of SRP. (ASYNC(O, m)). The sender of a synchronous message blocks We focus on reactivity; object-orientation with complete state waiting for the invoked method to complete (with a possible encapsulation; object-level concurrency with message passing result), while the sender of an asynchronous message can between objects, the ability to specify timing behavior of a continue execution concurrently with the invoked method. system, and the abstraction to components. Thus, asynchronous messages introduce concurrency into the 2) Reactivity: Reactivity is the defining property of the system. An asynchronous message can also be delayed by a CRO model, which makes it particularly suitable for embedded specific amount of time. system, since functionality of most, if not all, embedded Timing behavior of a system can be specified by defining systems can be expressed in terms of reactions to external a baseline and a deadline for an asynchronous message (a stimuli and timer events. A reactive system can be described synchronous message always inherits the timing specification as follows: initially the system is idle, an external stimulus of the sender). The baseline specifies the earliest point in (originating in the system’s environment) or a timer event time when a message becomes eligible for execution, which triggers a burst of activity, and eventually the system returns for an external stimulus corresponds to its “arrival time” and to the idle state. A reactive object is either actively executing for a message sent from one object to another is defined a method in response to an external stimulus or a message directly in the code. If the defined baseline is in the future, from another object, or passively maintaining its state. Since this corresponds to delaying the delivery of the message. The initially the system is idle, some external stimulus is needed deadline specifies the latest point in time when a message must to trigger activity in the system. complete execution, which is always defined relative to the 3) Objects and state encapsulation: The CRO model speci- baseline. Together, baseline and deadline form a permissible fies that all system state is encapsulated in objects O1,...,On. window of execution for a message (see Figure 1). Whenever Each object has a number of methods, and the encapsulated we talk about timing behavior of asynchronous messages, we state is only accessible from the object’s methods. This is shall extend the notation to ASYNC(O, m, B, D), where B also known as a complete state encapsulation. A name of and D are respectively a baseline and a relative deadline of the a method m can be fully expanded as Oi : m, where Oi message. As mentioned above, synchronous messages always is the object of the method. Methods of two objects can inherit their timing specification from the initial asynchronous be executed concurrently, but each method is granted an message. exclusive access to its object’s state, so only one method Concurrent reactive objects can be used to model the system of an object can be active at any given time. Coupled with itself and its interaction with its environment (e.g., via sensors, a complete state encapsulation, this provides a mechanism buttons, keyboards, displays). Events in the physical world for guaranteeing state consistency under concurrent execution. (such as pushing a button) results in an asynchronous message The source of concurrency in a system can either be two (or being sent to a handler object, and system output (e.g., flashing more) external stimuli that are handled by different objects, an LED) is represented as messages sent from an object to the or an asynchronous message sent from one object to another environment. (more about message passing below). 5) Implementation of CRO: The implementation of an To ensure that execution is reactive in its nature, each object instance can be either in software (e.g, synthesized method must follow run-to-end semantics [10], i.e., it is not from the class definition in the Timber language, or from a allowed to block execution awaiting external stimulus or a TinyTimber definition in C) or provided by the environment. message. An example of this would be an object representing This allows incorporating hardware interactions and legacy a queue: if a dequeue method is invoked on an empty queue, code (typically external software libraries) in the model, as

62 long as their interface is compliant with the concurrent reactive numbers. For our case, with the DM scheduling, we can object model. assign π(M)=p(M) for every M (since this assignment 6) Concurrent Reactive Components: To deal with com- satisfies the condition for preemption levels, see (P1) in plexity, system state and functionality is partitioned into a hi- [12]). erarchy of concurrent reactive components (CRCs) encapsulat- This translation into notions of SRP is possible, since ing component instances and CRO instances (see [11]). Thus, methods in the CRO model are run-to-end (blocking for future interaction in between components will always be grounded events to the system is prohibited), thus the execution of a by interaction in between objects (belonging to different message method can be seen as corresponding to the execution components). This allows us to apply a consistent modeling of a job instance. for component interaction, component implementation and interaction with the environment (since a CRC is simply a A. resource ceilings collection of one or more CROs). Assume a message M1 = SYNC(O1,m1,B1,D1) (or M1 = ASYNC(O1,m1,B1,D1)). If the execution of m1 B. Stack Resource Policy can give rise to sending a synchronous message M2 = ∗ Stack resource policy (SRP) is a policy for scheduling SYNC(O2,m2,B2,D2) then we write M1 → M2. Let → real-time tasks with shared resources that permits tasks with be a transitive closure of →. The initial message in a path ∗ different priorities to share a single run-time stack [12]–[14]. defined by → may be synchronous or asynchronous, the SRP applies directly to scheduling policies with dynamic and subsequent messages must be synchronous. The set of re- static priority, including e.g., Earliest Deadline First (EDF) sources (objects) potentially requested by a message M0 = which is used by current Timber and TinyTimber kernels, ASYNC(O0,m0,B0,Do) is defined as and Deadline Monotonic (DM) for which interrupt hardware Objs(M )={O }∪{O | M →∗ SYNC(O, m, B, D)} of commonplace platforms can be utilized efficiently. SRP 0 0 0 scheduling offers a number of advantages, mainly deadlock We also say that M can lock objects Objs(M).Acurrent free execution and memory savings due to the shared runtime resource ceiling O is defined as stack, but it also bounds the number of preemptions to at most two for each job instance. The bounded number of O = max{π(M)|M ∈M,O ∈ Objs(M)} preemptions together with early blocking (a job is not allowed where M stands for all messages in the system. Note that to start execution until all resources are available) allows for O can be computed statically (if M is statically known). simple and sufficient schedulability test for EDF [12], and DM An algorithm for computing O is discussed in Section IV. [15]. The traditional version of SRP only addresses single-core The current system ceiling Π is systems, however, SRP has also been extended to multi-core and multi-processor systems (see, for example, [16] and [17]). Π=max({0}∪{O|O is locked}

III. TRANSLATION OF THE CRO MODEL The SRP states (cf. [12]) that a message M = ASYNC(O, m, B, D) In order to allow a CRO system to be represented in notions is blocked from starting execution until π(M) > Π of SRP, we must first translate the CRO model into jobs and . In addition to that, in order to be scheduled for M resources. A straightforward translation is possible: execution must have highest priority of all jobs, which in the case of DM follows directly from π(M) > Π. • Each object Oi is treated as a single-unit resource. The derived resource ceilings, along with the base and • Each asynchronous message M = ASYNC(O, m, B, D) deadlines of the messages provide sufficient information for is treated as a job request, where the resulting job instance the run-time scheduling under SRP, and will ensure that initially performs a resource request for O, invokes m, the system passing the analysis will be deadlock-free during and releases O. execution. • Each synchronous message M = SYNC(O, m, B, D) as a resource request for O, invocation of m and release of IV. RESOURCE CEILING AND PRIORITY EXTRACTION FOR O. CRO • B ASYNC(O, m, B, D) The baseline of a message , cor- In this section we give algorithms for calculating resource responds to the arrival time of a job request and the ceilings, interarrival time (period) and offsets of jobs (asyn- D deadline to the relative deadline of the job instance. chronous messages). • M Priority of an asynchronous messages is denoted by We define the message passing graph of the program as p(M). Consider M1 = ASYNC(O1,m1,B1,D1) and M1 = ASYNC(O2,m2,B2,D2). In the DM schedul- G =(S, A, N ) ing priorities are defined so that p(M1) >p(M2) iff where D1 D2. Values of preemption levels are natural A = {,...,  }⊆N×N (ASYNC(..) messages.)

63 m represents a method, each m has one unique receiving A. XML Format O(m) object, represented as (i.e. one method cannot have two A model is stored in as an XML file that captures both different receiving objects). system structure and implementation. For a description of the S A κ(m ) and is not initially known. Defining i as the set internal format see [11] synchronous messages that may be sent from a method mi and α(mi) as the set of asynchronous messages (that may be B. Requirements on the methods sent), then S and A can be calculated as nmeth All methods are written in the C language. However, the S = i=1 κ(mi) nmeth C language allows us to specify behavior not allowed by the A = α(mi) i=1 CRO model (such as entering an infinite loop, waiting for For a CRO program to be valid, S must be acyclic (otherwise input, etc.). Thus, in order for a method written in C to be a the program may contain deadlocks). j j valid CRO method it must comply with the following rules, it Let β() denotes the baseline and δ() the deadline j must  m of an asynchronous message . If invoking i may result 1) be run-to-end (complete execution within a finite amount m α(m ) in more than one asynchronous message to j, then i of time), m m will return a message ( i, j) with the shortest baseline and 2) not access any global memory (state) outside of its deadline (baseline and deadline may come from two different object, and messages). The resource ceiling of an object can then be 3) not invoke a method of another object directly (without calculated as  using the ASYNC or SYNC primitive). Oi = min(C V) Compliance with two and three are partly enforced by name V = {δ()| =(m ,m ) ∈A,O(m )=O } scoping, i.e. the framework will generate local defines for j l l i names in the current scope (method). However, the system ∗ designer can still force an incorrect behavior by directly C = {δ()| =(m ,m ) ∈A, (m ,m ) ∈S,O(m )=O } j k k l l i calling methods (of other objects) or accessing global memory ∗ (e.g., using pointers). Strictly enforcing two and three would where S denotes the transitive closure of S. V denotes the set require implementing a parser for a subset of the C language of asynchronous messages to any method ml where O(ml)= (disallowing pointer arithmetic, extern keyword, etc.). The first Oi, and C the set of asynchronous messages to any method that rules is currently not enforced in any way and compliance must may (transitively) send a synchronous message to ml where be ensured by the system designer. One approach to enforcing O(ml)=Oi. the first rule is to perform worst-case execution time analysis Additionally, for a program to be analysed for periods and [18] on the methods, but this is outside the scope of this paper. offsets, the following must hold: Below is an example of emitted C code, showing local • Any Strongly connected component (SCC) of the graph defines for names in the current scope. G G may not be reachable from another SCC of // local name bindings • Any SCC of G may only contain one cycle #define get_feedback ..... #define control_out ..... • Multiple syncs/asyncs from mi to mj is not allowed #define process ..... Then we can define Γ as a function that calculates the period // method implementation i i  j  Γ() β() j int controller_process(OBJ* self, int arg){ of any given . = j for all part of a SCC int fb = SYNC(get_feedback, 0); i that can reach . // Controller state update etc, eg: i self->state.out = 10; For calculating the offset for a  relative to method mj SYNC(control_out,self->state.out); i ASYNC(process, MS(10), MS(1), 0 ); we define a function Δ(mj, ) that will return a set of all } possible offsets. To calculate this function we must first find #undef get_feedback i #undef control_out all paths from mj to (the called method of) , for each path  k #undef process we calculate: k β() for all k part of the path. The set of results of this calculation is the result of the function Δ. C. Code Synthesis

V. C ODE SYNTHESIS OF FRAMEWORK The CRC model contains definitions, instances, methods, The code synthesis framework is depicted in Figure 2. The and states. From this C functions and object definition struc- Reko IDE [11] enables design of CRC models. It provides a tures (C typedefs) can be emitted. Additional information GUI where the user can create, browse, and edit the CRC required to compile (using a C-compiler) the system into an models which are represented graphically in the IDE. The executable binary is: illustrations of the example system presented in Section VI • A static object structure (see below). are screenshots from the Reko IDE. • Defines for preemtion levels and resource ceilings. (cov- Code generation is an integrated part of the Reko IDE, and ered in: IV and VI-A) is covered in section . • A Kernel (VII)

64 !  $% %%!   !   !  $% $  !#'#$

)"$ %%%  #  (&% & %! $ %#&%&# # "%!  '$ 

Fig. 2. The code synthesis framework used for translating the CRC model into an executable.

and a required interface with three ports (reset, int1, int2). The app component consists of a single instance of adapter, controller, and logger definiton respec- tively, see Figure 5. The env interrupts (reset, int1, int2) are passed to the app via its provided interface (start, inc, dec). Internally, these are connected to the provided interface of the adapter. The role of the adapter is to forward the interrupts from the environment to the controller and logger in an application specific format (e.g. both inc and dec is forwarded to the controller’s Fig. 4. The internals of the system component definiton (object instances in port setpoint but with different arguments that will result yellow and component instances in blue) with arrows indicating the message paths. in an increase/decrease of the setpoint). The controller functions as a simple feedback controller attempting to min- imize the error (i.e. the difference between the measured 1) Static object structure: Is generated by transforming process variable and the desired setpoint), by adjusting the the CRC model to a model consisting only of object in- control signal (control_out). The controller acquires stances(CRO instances). From this model it is possible to the process variable through the get_feedback port of its generate one C-struct containing all instances in the system. required interface. It is connected to get_ad of the env The timing specification of the model is preserved during via the required interface port get_val of the app.Ina code synthesis and later used by the run-time kernel, (For similar manner the controller’s control_out interface port reference see the generated C-function above): is connected to set_pwm of the env.

VI. EXAMPLE SYSTEM The system component consists of a single instance of app and env respectively, see Figure 4. The system com- The component model rely on message passing between ponent definition is selected as the root for instantiation of the components/objects and messages can only be sent between CRC model. the provided/required interface of an component/object. Every component/object can define a provided and/or a required Each method (of a object definition) is implemented in interface. The provided interface defines ports that can receive C-code and message passing is done either asynchronously messages from other ports. For an object, a port corresponds using the ASYNC primitive, or synchronously using the SYNC to a method within the object and, for a component, a port primitive (see Section II). This is used to specify the timing corresponds to a method of an object defined at some level behavior of the system, e.g. the controller defines the method inside the component (components can be hierarchical). The process: required interface defines ports used for sending messages to the port of an other component/object. However, this require int fb = SYNC(get_feedback, NO_ARG); //Controller state update etc that the sending port is connected to (has a reference to) a SYNC(control_out, outval); receiving port. Let us now present a small controller system ASYNC(process, MS(10), MS(1), NO_ARG); and show how it is abstracted in terms of the CRC model. At top level in the model we have two component definitions The process periodically acquire feedback values and (app and system) and four object definitions (env, adapter, writes control values as specified by the ASYNC primitive controller, and logger), see Figure 3. The env definition (with a period of 10 milliseconds and a deadline of 1 millisec- encapsulates specific hardware functionality and functions as ond). Since baseline and deadline of synchronous messages are a gateway between the app and the environment. It has always inherited, there is no need to supply them when using a provided interface with two ports (get_ad, set_pwm) the SYNC primitive in the code.

65 Fig. 3. The complete set of definitions of the controller system (components in blue and objects in yellow). Note that only the system definition is complete in the sense that it does not require any other componets/objects in order to be instantiated (it has no provided/required interface).

Fig. 5. The internals of the app component definiton (provided/required interfaces in white and object instances in yellow) with arrows indicating the message paths.

A. SRP Levels for the Example #define __[controllerinst]_rc 0 #define __[adapterinst]_rc 0 Auto-generated output from the analysis for the given #define __[loginst]_rc 1 example. Shows all jobs with corresponding calltrees and deadlines (preemtion levels is statically assigned according This means that [loginst] will be allowed to be pre- to deadline). Also all acquired resources and corresponding empted by any of the other job’s, but no other preemption resource ceilings (rc). is possible (preemption is not allowed between job’s having the same resource ceiling). In this example, all interrupts are #Starting points: JOBREQUEST, dl: 10000 messages to the same object ([envinst]), thus all other -entry1 rc: 15 entry1 [envinst] objects in the transitive closure of the highest priority message --start rc: 15 adapter_start [adapterinst] (to [envinst]) will have a resource ceiling equal to the JOBREQUEST, dl: 50 preemption level of the highest priority message (i.e., 0). How- -entry2 rc: 15 entry2 [envinst] ever, interrupt priorities are equal to corresponding message --inc rc: 15 adapter_inc [adapterinst] priority. Calculation of DM priorites is trivial and emitted in ---set_bor rc: 15 controller_set_bor [controllerinst] a similar fashion as resource ceilings (i.e, as defines). Note JOBREQUEST, dl: 50 that names such as [loginst] are auto-generated by the -entry3 rc: 15 entry3 [envinst] compiler, but renamed in the paper for clarity. --dec rc: 15 adapter_dec [adapterinst] ---set_bor rc: 15 controller_set_bor [controllerinst] VII. KERNEL DESIGN FOR EFFICIENT SRP DM #Detected job requests (internal ASYNCs): SCHEDULING OF CRO JOBREQUEST, dl: 10000 -init rc: 20 logger_init [loginst] The goal of the kernel design is to provide an efficient and predictable implementation of the CRO semantics (see JOBREQUEST, dl: 10000 Section II-A). In order to achieve this we have decided to use -init rc: 15 controller_init [controllerinst] the deadline monotonic (DM) scheduling policy with stack JOBREQUEST, dl: 20 resource policy (SRP). The benefits of SRP are well known -log rc: 20 logger_log [loginst] (see Section II-B), and DM was primarily chosen to allow for --get_ad rc: 15 env_get_ad [envinst] an efficient usage of the priority based interrupt hardware of JOBREQUEST, dl: 15 the target architecture (i.e. Cortex-M3 [19]), more on this in -process rc: 15 controller_process [controllerinst] Section VII-C. The scheduler requires that the priorities and --get_ad rc: 15 env_get_ad [envinst] --set_pwm rc: 15 env_set_pwm [envinst] preemption levels of messages, and resource ceilings of objects are known. In our case, these are automatically generated from Here the resource ceiling is given by its deadline, this is timing specifications in the model during code synthesis (see not suitable for scheduling since interrupt hardware typically Section V). expects priorities given as integers [0,1,2,...] where 0 is most urgent. So all resource ceilings must be sorted and numbered, A. Message Passing for this example we generate the following defines: The kernel must support two types of messages, asyn- #define __[envinst]_rc 0 chronous and synchronous. An asynchronous messages must

66 be queued until the time it becomes eligible for execution (see to implement and when the number of items in the list permissible window of execution, Figure 1). Queued messages is small the performance is comparable to more advanced are stored in the timer-queue (sorted by ascending baseline), structures [20]. The running-stack is simply a last-in-first-out once the baseline of a message is passed (i.e., the baseline stack, and the system ceiling is implemented as an unsigned expires) the message is transferred to the active-queue (sorted integer. To allow for efficient usage of common interrupt by descending priority), and considered when scheduling deci- hardware (priority-based), some values of the system ceiling sions are made (see Section VII-B). The transfer of messages are implemented in hardware. from the timer-queue to the active-queue is initiated by a timer interrupt, i.e. a hardware timer is configured to generate an Example scheduling using interrupt hardware interrupt when the earliest baseline in the timer-queue expires. A subset of the values of the system ceiling are implemented When an asynchronous message is scheduled for execution in hardware, i.e. the interrupt mask register of the processor. the recipient object-resource (see Section III) is requested, the To demonstrate the benefits, we consider a simple example: method is invoked and (when it returns) the object-resource let is released. A synchronous message is executed similarly M =[M = ASYNC(O ,m ,B , 10ms), (i.e., object-resource requested, method invoked, and object- 1 serial read inherited resource released). M2 = ASYNC(Oserial,mwrite,Binherited, 50ms)] B. Scheduling From the definitions in Section III follows: p(M1) > The scheduler only considers messages that are in the p(M2)=⇒ π(M1) >π(M2), and Oserial = π(M1). active queue, and it is invoked when either a baseline ex- M1 is sent from the data-ready interrupt of the serial port pires (timer interrupt) or the system ceiling is lowered (can (with interrupt priority p(M1)), and M2 is sent from the only happen when a method of a message returns). Since ready-to-send interrupt (with interrupt priority p(M2)). Since messages are transferred from the timer-queue to the active- Binherited in the interrupt context corresponds to the time queue, a message with a higher priority than the currently when the interrupt handler is invoked, the messages can be executing message may be eligible for execution. If the head placed directly into the active-queue. of the active-queue, Mi, has the highest priority, it is only π(M ) > Π Let all possible values of the system ceiling that can be allowed to execute if i (see Section III). Assuming represented in hardware be defined as π(Mi) > Π, Mi is transferred from the active-queue to the running-stack (containing all messages that have been H, H⊂N0 allowed to start execution) and executed. If we instead assume π(Mi) ≤ Π then Mi can only become eligible for execution and the hardware system ceiling as when the system ceiling is lowered, i.e. when a message Π, Π ∈H completes execution and releases the object-resource, thus a new scheduling decision must be made when execution of Then, assuming {π(M1),π(M2)}⊂H, the messages can be a message completes. Similarly, if Mi does not have the placed directly into the running-stack. This follows from the highest priority it can only become eligible for execution when one-to-one mapping between priority and preemption level, an asynchronous message completes execution. Synchronous i.e. p(M)=π(M), and for any messages Mj and Mk where messages that are generated by an executing message are {π(Mj ),π(Mk)}⊂H.IfMj is currently executing and an always scheduled immediately, since SRP guarantees that interrupt handler (with priority p(Mk)) that generates Mk is all resources (objects) are available and the priority of a invoked, then Π <π(Mk) (otherwise the interrupt would be synchronous message is always inherited from the sending masked by the hardware system ceiling) and p(Mk) must be message. the highest priority since if Mj is executing then Π ≥ π(Mj), and hence p(Mk) >p(Mj). Thus, an interrupt handler is only C. Implementation of data structures invoked (and corresponding message generated) if it has the The data structures required by the kernel: highest priority and all resources are available. timer-queue: messages with a baseline in the future, sorted by ascending baseline Discussion and limitations of implementation active-queue: messages with an expired baseline, sorted In the previous example, we demonstrate how interrupt by descending priority hardware is exploited to perform scheduling of messages. This running-stack: messages that have been allowed to start hardware scheduling is limited by the number of interrupt execution priorities of the hardware. Thus, if no one-to-one mapping system ceiling: the maximum of object ceilings of locked exists between message priorities and interrupt priorities, then objects co-operative software and hardware scheduling is required, The active- and timer-queues are currently implemented i.e. two or more messages with different priorities must share as sorted lists. While there are more suitable data structures a single interrupt priority. Whenever an interrupt priority is (e.g., heaps or balanced search trees), sorted lists are easier shared by different message priorities, the software scheduler

67 is invoked to determine if a generated message should be REFERENCES executed. [1] I. Crnkovic, “Component-based software engineering for embedded In the current implementation, transfer of time-delayed systems,” in International Conference on Software engineering, messages from timer-queue to active-queue is initiated by a ICSE’05. ACM, May 2005. [Online]. Available: http://www.mrtc.mdh. se/index.php?choice=publications&id=0830 timer interrupt. The priority of this interrupt must be set to [2] M. Nolin et al., “Component based software engineering for embedded that of the highest priority message in the timer-queue. Let pl systems - a literature survey,” Malardalen¨ University, Technical Report ISSN 1404-3041 ISRN MDH-MRTC-102/2003-1-SE, June be the lowest priority of messages in the timer-queue, and ph 2003. [Online]. Available: http://www.mrtc.mdh.se/index.php?choice= be the highest, then all messages (in the system) with priority publications&id=0578 pm, pl

68 Paper B End-to-End Response Time of IEC 61499 Distributed Applications Over Switched Ethernet

Authors: Per Lindgren, Johan Eriksson, Marcus Lindner, Andreas Lindner, David Pereira, and Luis Miguel Pinho

Originally published in: IEEE Transactions on Industrial Informatics

c 2017, IEEE

69 70 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 13, NO. 1, FEBRUARY 2017 287 End-to-End Response Time of IEC 61499 Distributed Applications Over Switched Ethernet Per Lindgren, Johan Eriksson, Marcus Lindner, Andreas Lindner, David Pereira, and Lus Miguel Pinho

Abstract—The IEC 61499 standard provides means to Index Terms—Bound transfer time, distributed control, specify distributed control systems in terms of function hard real-time, IEC 61499, industrial automation, response- blocks. For the deployment, each device may hold one time analysis, switched Ethernet. or many logical resources, each consisting of a function block network with service interface blocks at the edges. The execution model is event driven (asynchronous), where I. INTRODUCTION triggering events may be associated with data (and seen RADITIONAL Scada control systems typically imply as messages). In this paper, we propose a low-complexity the use of costly programmable logic controllers (PLCs) implementation technique allowing to assess end-to-end T response times of event chains spanning over a set of communicating over field buses such as legacy Modbus networked devices. Based on a translation of IEC 61499 [1]/PROFIBUS [2] and the more recent Ethernet-based POW- to RTFM1-tasks and resources, the response time for each ERLINK [3], EtherCAT [4], and PROFINET [5] buses. In order task in the system at the device-level can be derived using to establish end-to-end timing, much attention has to be paid in established scheduling techniques. In this paper, we de- general to the control system design, partitioning, and deploy- velop a holistic method to provide safe end-to-end response times taking both intra and interdevice delivery delays into ment. In this paper, we take an approach based on the recent account. The novelty of our approach is the accuracy of IEC 61499 standard [6]. the system scheduling overhead characterization. While the In contrast to the traditional scan based (time triggered) PLC device-level (RTFM) scheduling overhead was discussed in behavior (as implied by the IEC 61131 standard [7]), the IEC previous works, the network-level scheduling overhead for 61499 undertakes an event triggered (asynchronous) execution switched Ethernets is discussed in this paper. The approach is generally applicable to a wide range of commercial off- model. In this paper, we exploit the benefits of the asynchronous the-shelf Ethernet switches without a need for expensive model both at the device and network layer. In particular, the custom solutions to provide hard real-time performance. A asynchronous model allows at device level preemptive execu- behavior characterization of the utilized switch determines tion of the function block network and at network layer asyn- the guaranteed response times. As a use case, we study chronous communication over standard Ethernet [with the op- the implementation onto (single-core) Advanced RISC Ma- chine (ARM)-cortex-based devices communicating over a tional 802.1Q quality of service (QoS)] frames as implemented switched Ethernet network. For the analysis, we define a in commercial off-the-shelf (COTS) as well as rugged (indus- generic switch model and an experimental setup allowing trial) Ethernet switches. In order to establish safe end-to-end us to study the impact of network topology as well as 802.1Q timing properties of distributed IEC 61499 models, this pa- quality of service in a mixed critical setting. Our results in- per covers the execution model, requirements for per-device dicate that safe sub millisecond end-to-end response times can be obtained using the proposed approach. scheduling, a generic Ethernet switch model, and takes into ac- count the impact of general purpose traffic. This allows us to reason on timing properties under different network topologies, link speeds, and QoS configurations. Manuscript received November 13, 2015; revised February 17, 2016, June 1, 2016, and July 10, 2016; accepted August 4, 2016. Date of Our results indicate that the proposed approach is capable of publication November 8, 2016; date of current version February 6, 2017. safe submillisecond end-to-end response times even using cost This work was supported in part by the National Funds through FCT efficient COTS components. In particular, we show that, given (Portuguese Foundation for Science and Technology), in part by the EU ARTEMIS JU funding under Project ARTEMIS/0001/2013, in part by the assumption that switches implement predictable 802.1Q the JU under Grant 621429 (EMC2), VINNOVA (Swedish Governmental QoS, the impact of noncritical traffic can be bound and safe Agency for Innovation Systems), and Svenska Kraftnat¨ (Swedish national response times can be derived. This allows for flexible deploy- grid). Paper no. TII-15-1676. (Corresponding author: Marcus Lindner.) P. Lindgren, J. Eriksson, M. Lindner, and A Lindner are with ment, where the same physical network is shared in between the the Lulea˚ University of Technology, Lulea˚ 971 87, Sweden (e-mail: timing critical control applications and general purpose usage. [email protected]; [email protected]; [email protected]; Our results are safe to the worst case, while further improve- [email protected]). D. Pereira and L. Miguel Pinho are with the Research Centre in ments can be obtained if the maximum transmission unit maxi- Real-Time and Embedded Computing Systems/INESC TEC, Instituto mum transmission unit (MTU) of general purpose (best-effort) Superior de Engenharia do Porto, Porto 4200-072, Portugal (e-mail: traffic is controlled. [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available We can conclude that the presented approach reduces com- online at http://ieeexplore.ieee.org. plexity of distribution (in comparison to the traditional su- Digital Object Identifier 10.1109/TII.2016.2626463 pervisory control and data acquisition (SCADA) approach), 1Real-time for the masses. 1551-3203 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications standards/publications/rights/index.html for more information.

71 288 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 13, NO. 1, FEBRUARY 2017 reduces device and network complexity (does neither re- quire complex time triggered architectures, such as scan-based PLCs, nor any dedicated network components/protocols), and is highly flexible (allowing network sharing between real-time and best-effort devices). Moreover, the presented approach opens up for the possibility to automate the generation of interfaces interconnecting function block networks for which correct-by-construction data delivery can be ensured. Moreover, automation would allow the designer to focus on the properties of the application at the model level, independent of networking and deployment specifics. The paper is outlined as follows. Section II gives the back- ground for this paper, Section II-A reviews the task and resource model undertaken. In Section II-B, we review previous work on the mapping of IEC 61499 function block networks (at the de- vice/resource level) to the aforementioned task and resource Fig. 1. Example IEC 61499 system model. model. Device level (single-core) deployment and end-to-end response time are discussed in Section II-C, while Section II-D SRP task model with the ability to emit events (hence task chains gives a brief review of Ethernet technology. Our contributions can be expressed). For sporadic and periodic tasks, where the are detailed in Section III, where we present the model of dis- interarrival time of triggering events is larger than the associated tribution (see Section III-A) together with a general approach to task’s response time, each task is associated with an interrupt end-to-end analysis (see Section III-B). We develop a generic handler. The underlying interrupt hardware is exploited as a network driver in terms of RTFM (real-time for the masses) (single unit) event buffer,2 critical sections are enforced through tasks and resources, and as a use case, the driver design for the interrupt masking, and preemptive static priority scheduling is LPC1769 microcontroller is detailed and characterized. This done directly by the interrupt hardware. The RTFM-kernel im- allows us to derive a holistic analysis at the device level (see plements the SRP scheduling scheme, which is well studied and Section III-C). Implications of the Ethernet link layer are dis- accepted in the research community. Its working principle and cussed in Section III-D, enabling end-to-end response time anal- further details can be found in [8]. yses for point-to-point networks (see Section III-E). A generic The RTFM-core programming language [10], [11] builds on switch model is proposed in Section III-F and applied to star the RTFM-kernel task model extending it with timing con- (see Section III-G) as well as tree networks (see Section III-H). straints. The RTFM-core compiler analyzes the input program, Our experimental setups and experiments are described in derives static priorities and resource ceiling values according Section IV, while evaluation and future work are discussed in to SRP, and generates plain C code. A bare-metal executable Section V. The paper is finally concluded in Section VI. is obtained by passing the generated C code together with the RTFM-kernel scheduling primitives to a back-end C tool chain. II. BACKGROUND A. Task and Resource Model B. IEC 61499 to RTFM-Core Task and Resource Model The task and resource model undertaken for this paper follows In previous work, mapping from the device/resource level the model of the stack resource policy (SRP) [8]. For this paper, IEC 61499 models have been proposed [12], [13]. Under IEC we use task instance and task interchangeably. In short, the SRP 61499, function block networks are mapped to (scheduling) task and resource model is captured in the following. resources. Each device may host one or several resources. At 1) ti is a task with run to completion semantics. the edges of the IEC 61499 resources, service interface blocks 2) p(ti ) is the static priority for ti . (SIFBs) are required for communication. The standard does not 3) rt(ti ) defines the response time for task ti . cover the implementation of SIFBs. For this work, we propose 4) ei is an event triggering the execution of the correspond- RTFM-core as the means of implementation. This brings the ing task ti . advantage that the complete system (at the device level) can 5) rj defines a resource, a task may claim resources in a last be implemented in a uniform manner, allowing the SIFBs to in first out nested manner. be part of the device level scheduling without introducing any SRP provides preemptive, deadlock free, single-core schedul- additional/external run-time system. ing for systems with shared resources. Given worst case exe- 1) System Model Example: cution times (WCETs) for tasks and their inner claims (critical a) IEC 61499 system model: Fig. 1 depicts an IEC sections), response time analysis is readily available [8]. 61499 model developed in the 4DIAC IDE [14]. The SIFB in- The RTFM-kernel has been developed from the outset of stances Ea1 and Ec1 capture the external events and trigger the the SRP task and resource model [9]. The kernel application execution of actions associated with a_1.i_1 and c_1.i_1, programming interface (API) amounts to C code macros with predictable and low overhead (typically a few machine instruc- 2The 1-buffer assumption can be relaxed by user defined buffers, for which tions on the ARM-Cortex family of processors). It extends the error handling semantics can be tailored to the application. 72 LINDGREN et al.: END-TO-END RESPONSE TIME OF IEC 61499 DISTRIBUTED APPLICATIONS OVER SWITCHED ETHERNET 289

Fig. 2. ECC of FB b1_b3. Fig. 4. Point-to-point topologies, CSMA/CD collision detect and resend (left), collision free full duplex (right). Each link is marked with the asso- ciated messages m.

Fig. 5. Full duplex switched star network topology. Each link is marked Fig. 3. Example RTFM system model. We have the (external) events with the associated messages m. ea 1 and ec 1 triggering the task chains headed by a1 and c1 , respec- tively. Tasks b1 and b3 share a resource under the protection of a critical section r. respectively. Fig. 2 depicts the ECC3 of b1_b3, showing the associated actions taskb1 for input event i1 and taskb3 for input event i3, respectively. The execution model of IEC 61499 stipulates that events are handled sequentially by the ECC. Tran- sition conditions (guards) are evaluated in the order determined by the programmer, in this case 1 (2)fori1 (i3), respectively. However, at the network level, the standard does not stipulate the execution order of events. b) RTFM system model: Fig. 3 depicts an example Fig. 6. Tree switched network with mixed real-time (between nodes A, RTFM system model expressing the same behavior as the ex- B, and C) and best-effort traffic (between nodes D and E). Real-time ample model from Fig. 1. messages are marked m, while best-effort traffic is marked n. As seen, links S3 → S1 and S2 → S3 carry both real-time and best-effort traffic. In RTFM-core, the system is expressed in terms of event trig- e gered tasks with named critical sections. The external event a1 triggers the execution of a1 , which in turn triggers the execution compiled together with the (target specific) RTFM-kernel prim- of b1 , which in turn triggers b2 . For a part of the execution, itives render an executable for the deployment. b1 and b3 enter a named critical section r used to ensure the 1) Monolithic Response Times: The end-to-end response nonpreemptive (sequential) behavior corresponding to the ECC times for a task chain can be safely estimated by accumulating in Fig. 2. While each link in RTFM-core is treated as a mes- the response times along the chain. For the example system, the e → a → b → b sage (i.e., event with associated data), there is neither a need response time of the task chain a1 1 1 2 is given for explicit data wiring nor for multiplexing (F_MUX_2_1). as rt(a1 )+rt(b1 )+rt(b2 ). (More precise methods exploiting Each task in the model is associated with a static priority allow- precedence/offset information have been extensively studied, ing fine grained preemptive scheduling at the device level, yet see, e.g., [15], but is out of scope for this paper.) preserving nonpreemptive ECC execution (as stipulated in IEC 61499). D. Ethernet Technology Ethernet-based buses are emerging in industrial applications C. Monolithic Single-Core Deployment as a cost and performance efficient alternative to proprietary For deployment to single-core platforms, the rtfm- field buses. compiler derives the task set from the input program and In the following, we briefly review a set of common topolo- performs SRP analysis (i.e., derives the resource ceilings on gies. (Messages in the figures relate to the distribution example basis of the per-task resource utilization). The generated code later introduced in Section III.) Fig. 4 depicts point-to-point topologies. In legacy systems, a single coaxial cable was used 3An Execution Control Chart defines the state machine of a basic Function (see Fig. 4, left) using a collision detection and resend mecha- Block in IEC 61499. nism for which bound transmission times are hard/impossible to 73 290 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 13, NO. 1, FEBRUARY 2017

Fig. 7. Embedding of 802.1Q (VLAN) into an Ethernet frame (source: Wikipedia; author Bill Stafford). guarantee. Alternatively, collision-free communication is possi- ble by using multiple interfaces and dedicated (point-to-point) cabling. However, such solutions are costly and bulky (see Fig. 4, right). A switched (full duplex) star network mapping of the system is given in Fig. 5. Fig. 6 depicts the scenario of a network with local switches S1 and S2 and a backbone switch S3 . In this case, the network is shared between real-time traffic and best-effort (BE) traffic. 1) Ethernet Framing: An Ethernet frame (see Fig. 7, top) has a payload of 46–1500 bytes (octets). A 802.1Q virtual local area network (VLAN) frame (see Fig. 7, bottom) has a pay- load of 42–1500. In both cases, this yields a minimum packet size of 64 bytes (including the header and cyclic redundancy check (CRC)/frame check sequence (FCS). The large minimal Fig. 8. Distributed system model. size packages are a legacy from nonswitched (shared media) networks, for which the collision detection (CSMA/CD) mech- (hub) configurations. However, time-triggered protocols go out- anism is dependent on the propagation delay of the signal. side the Ethernet standard and require all devices on the network Including Preamble (8 bytes) and Inter Frame Gap (12 bytes), to follow the time-triggered protocol. The use of standard Ether- the total size of a frame is 84 bytes. Hence, the transmission net switches in POWERLINK networks is discouraged in order time for a complete minimal frame over a link is 84 ∗ 8/ls to uphold timing properties. (where ls is the link speed) amounting to 67.2/6.72/0.672 μs EtherCAT [4] is another filed bus system based on the over 10/100 Mb/1 Gb/s links, respectively. This gives the min- Ethernet Phy technology (allowing integration with standard imum period for transmissions over a link (and hence the max- Ethernet framing and overlaying UDP/IP protocols). The imum blocking time inferred by transmission of a minimal EtherCAT protocol deploys a communication scheme provid- sized packet). Considering only packet delivery, we can ex- ing high-precision synchronization (using IEEE 1588 Precision clude the Inter Frame Gap (12 bytes), 57.6/5.76/0.576 μs over Time Protocol [19]). EtherCAT devices may be designed to act 10/100 Mb/1 Gb/s links, respectively.4 as switches, allowing for a wide range and mix of network topologies to be implemented. E. Ethernet Based Field Buses In common, these field buses deploy a master/slave configu- The term industrial Ethernet (IE) is commonly used referring ration. to rugged components relying on Ethernet technology. A gamut of Ethernet-based field buses (such as POWERLINK III. DISTRIBUTED SYSTEMS IN RTFM-CORE [3] and EtherCAT [4]) have been introduced, for an overview In the following, we discuss distribution aspects of RTFM- see, e.g., [18]. In common, they have a focus on predictable core models from a generic perspective. timing; hence, they are collectively referred to as real-time Eth- ernet. As prominent examples thereof, we briefly review POW- A. Distributed Single-Core Deployment ERLINK and EtherCAT. In a distributed setting, the task set and (associated) resources POWERLINK [3] is a time triggered protocol that provides are partitioned in a set of devices. For this paper, we make collision free and predicable communication over shared link the assumption that resources cannot be shared among tasks executing on different devices. 4As a comparison, the minimal packet size is 8 bytes for controller area Fig. 8 depicts a deployment scenario associating the tasks network (CAN) (a packet without payload, using 11 bit address), giving a {a1 ,a2 } to device A, {b1 ,b2 ,b3 } to B, and {c1 ,c2 } to C.The minimum transmission time of 8 ∗ 8/ls, amounting to 5.12/2.56/1.28 ms for CAN-bus speeds of 125/250/500 kb/s, respectively. While having consider- messages (events with optional payload) crossing device bor- ably larger transmission times, the CAN bus was designed with the outset of ders are marked mi , e.g., m1 = a1 → b1 indicates an event predictable prioritized scheduling for which worst case delivery times can be (message) from task a1 (on device A) triggering b1 (on device bound [16]. While originally developed for automotive applications, the CAN- B bus has gained a wider spread (including industrial control applications) due to ). At this level, the model is network agnostic (i.e., the type of its robustness, relative low cost, and predictable behavior [17]. media, protocol, and topology are not explicit).

74 LINDGREN et al.: END-TO-END RESPONSE TIME OF IEC 61499 DISTRIBUTED APPLICATIONS OVER SWITCHED ETHERNET 291

A A A S Fig. 9. Device with RX and TX. is the network, typically a switch.

A m Fig. 11. TX driver operation. In this case, two messages ( 1 and m2 ) have been buffered (by the software driver) for transmission, where m1 started transmission. When a package has been sent, the EMAC will advance the TxConsumerIndex.

buffer has been received. The LPC1769 bus matrix allows the EMAC DMA to read and write package data using a dedicated static random access memory (SRAM) allowing interference to be isolated to reading out/setting up package payload. This amounts for task aRX and function faTX to a few shared mem- ory accesses, for which worst case latency can be deduced from the data sheets. Under the assumption that messages will be B B Fig. 10. Device with RX driver. consumed before reoccurring, a single (nonprotected) buffer for each unique message is sufficient, hence accesses to rRX and rTX B. End-to-End Response Time, a Holistic Approach do not require critical sections (hence marked white/transparent in figure). However, because the function fa can be executed A safe end-to-end response time estimate of a task chain TX (preemptively) on behalf of both tasks a1 and a2 , its operations spanning multiple devices can be computed by the accumulated on the EMAC must be protected in a critical section (marked sum of intradevice response time and the interdevice delivery blue/filled in figure). times. In this paper, we propose a holistic approach taking into The EMAC transmit and receive functionality operates on account the impact of scheduling at device level as well as for circular description tables using producer and consumer indexes each link involved. as depicted in Figs. 11 and 12. As a use case, we study switched Ethernet, the impact of net- a) Transmit: The operation of fa amounts to writ- work topologies, 802.1Q QoS configurations, and the sharing TX ing the message payload into the dedicated send buffer (in the of physical links between real-time and BE domains. For the in- shared SRAM), allocating a descriptor (round robin), update tradevice response time, we rely on the previous discussion (see its packet pointer and size fields, and increasing the TxPro- Section II-C). However, the approach is generic and applica- ducerIndex modulo TxBufferNumber. Fig. 11 depicts ble to other device level programming models, network media, the case where both m1 and m2 have been enqueued (but not protocols, and topologies as far as worst case response/delivery yet transmitted). Under the assumption that messages are deliv- times can be assessed. ered (by the hardware) before reoccurring, a single buffer for each message is sufficient. C. Holistic Device Level Response Times b) Receive: The operation of task bRX (triggered by the In our setting, we consider device drivers as a part of the ap- EMAC when RxConsumerIndex != RxProducerIndex) plication at device level, and this allows us to assess intradevice amounts to reading the destination port number and triggering response times in a holistic manner. either task b1 (messages m1 and m2 )ortaskb3 (message m3 ) Fig. 9 depicts the device level task set for node A including with the corresponding payload as argument. Fig. 12 depicts the the Ethernet drivers ARX and ATX with associated resources case where both m1 , m3 and m2 have been received (but not yet (buffers) rRX and rTX. Fig. 10 depicts the task set for device B handled). Under the assumption that interarrival time for each having only a receiving Ethernet driver BRX with the associated message is larger than its time to be conveyed (i.e., leaving the receiving resource (buffer) rRX. input buffer), no overrun will occur. The ARM-Cortex M3-based LPC1769 microcontroller pro- 1) Device Level Analysis: For the end-to-end response vides an on chip Ethernet MAC (EMAC) with direct memory times, we consider the task chains from each triggering event e access (DMA) functionality triggering the event aRX when a to task chain completion. Assuming that the size (in bytes)

75 292 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 13, NO. 1, FEBRUARY 2017

allows us to isolate the link-layer (see Fig. 4, right). For the discussion, we consider only nodes A and B (hence for each device a single network interface suffices).5 1) Output Queueing: The actual scheduling of the output link is performed by the function faTX. For our implementation, we deploy a simple first in first out (FIFO) mechanism, where packets are queued in the order of their arrival to the task faTX. That is, in the worst case, a message will be blocked/interfered by all other outgoing messages. Assume that we have n outgoing messages and tr(mj ) is the transmission time for a message mj , then the blocking time is given as n b(mi )= tr(mj ),i= j. (6) j=1

The total time for delivery includes the transmission time of mi giving us the total delivery time (or network delay) over a link n B m m ,  Fig. 12. RX driver operation. In this case, three messages ( 1 , 3 l(m )= b(m )+tr(m )= tr(m ). and m2 ) have been received by the EMAC, but not yet received by the i i i j (7) driver. The driver will update RxConsumerIndex when a message has j=1 been internally dispatched to the corresponding RTFM task. MS must be the largest frame size for the received messages as we do not know In case all messages fit the minimum Ethernet payload (46/42 in which buffer each message will be received. bytes for Ethernet II/802.1Q, respectively), we have a common d(m)=tr(m) ∗ n, which (as seen in Section II-D) amounts to of payloads is given as |m1 | =4, |m2 | =4, |m3 | =12, and n ∗ 67.2/6.72/0.672 μs over a 10/100 Mb/1 Gb/s link, respec- |m4 | =4in the following, we have the device level response tively. times for device A: a) Refinement: One could think of improvements to the above queuing mechanism either by allowing nonstandard D (e )= (a ) rt a1 rt 1 (1) framing (reducing the transmission time) or by priority queuing. m 4 D (e m 4 )= (a )+ (a ). 2) Input Queueing: On the receiver side, packets arrive in rt a rt RX rt 2 (2) RX the order defined by the output queueing discussed above. The a a Notice that tasks 1 and 2 include the execution for the queuing propagation delay over the link l can be computed as pd(l)= fa B function TX. For device , we have the following: |l|/Pcopper, where |l| is the length of the link and Pcopper is m m D (e m )= (b 1 )+ (b 1 )+ (b ) the propagation speed. For copper, this is (roughly) two-third of rt b 1 rt RX rt 1 rt 2 (3) RX the speed of light (300 ∗ 106 m/s) amounting to 0.005 ns/m, and m 2 m 2 Drt(ebm 2 )= rt(b )+rt(b1 )+rt(b2 ) (4) hence can be neglected for most practical cases. RX RX m D (e m )= (b 3 )+ (b ). rt b 3 rt RX rt 3 (5) RX E. Response Time in a Point-to-Point Network bm 1 For the scheduling analysis, we have the task instances 1 We can now derive a safe response time for task chains span- bm 2 bm 1 bm 2 bm 3 and 1 and RX , RX , and RX . In this way, we can take into ning multiple devices over a point-to-point Ethernet network. account payload size WCET dependencies for the receiver task Let T be a global set of tasks, |T | the size of the set, M be a b (bm 1 )= (bm 2 ) RX. In this case, WCET RX WCET RX because the mes- global set of messages, and |M| its size. Let TC(ei ) be a set of (bm 3 ) sages are of same length, while WCET RX has a different tasks triggered by the event ei (i.e., tasks along the task chain (larger) WCET due to the copying of the larger payload. No- headed by the event ei ). Let MC(ei ) be a set of messages along tice that analysis is based on the set of task instances, e.g., the task chain headed by the event ei . b bm 1 bm 2 bm 3 RX implies the instances RX , RX , and RX . As discussed in For a triggering event ei and a task chain involving the devices Section II-A, our task and resource model is SRP compliant D and the links L,wehave allowing a plethora of available methods for analysis to be read- |T | ily applied and thus are not further discussed in the scope of  D (e )= (t ),t ∈ TC(e ) this paper. However, worth to notice is that the proposed design rt i rt i i i (8) i=1 isolates the impact of EMAC DMA memory accesses to the copying of payload from/to the receive/send buffers (which re- |M | side in the shared memory). The isolation facilitates safe WCET Lrt(ei )= l(mi ),mi ∈ MC(ei ) (9) analysis. i=1

D. Link Layer Model 5An actual collision free point-to-point implementation for the complete system would require additional (external MAC/Phy) hardware and thus is In order to assess the impact of networking, we first study impractical. However, the two-node scenario is sufficient to highlight the impact a simple (directly connected) point-to-point topology, which of networking for each link.

76 LINDGREN et al.: END-TO-END RESPONSE TIME OF IEC 61499 DISTRIBUTED APPLICATIONS OVER SWITCHED ETHERNET 293

Fig. 13. Switching operation without QoS. We assume FIFO ordering Fig. 14. Switching operation with QoS. In this example, m3 was en- m m of output queues. In this example 3 was enqueued before 2 , hence, queued before m2 . However, assuming per QoS priority ordering of out- m m 2 will be blocked by 3 . put queues and m2 having a higher QoS priority than m3 ,theblocking by m3 is prevented. (QoS reordering marked in red.)

Ert(ei )= Drt(ei )+Lrt(ei ) (10) with Drt(ei ) being the accumulated response times over all de- vices, Lrt(ei ) being the accumulated link delays, and Ert(ei ) being the total end-to-end response time (delay). Taken the running example, we have the chain e → a → m → e → bm 1 → b a1 1 1 TX 1 2 (11) with D (e )= (a )+ (bm 1 )+ (b ) rt a1 rt 1 rt 1 rt 2 (12) L (e )= l(m ) rt a1 1 (13) E (e )= (a )+ (bm 1 )+ (b )+l(m ). rt a1 rt 1 rt 1 rt 2 1 (14) a) Refinement: Given the assumption that the interar- 1 2 3 4 e E (e ) Fig. 15. ec chain (a) implies 4 transmissions, m , m , m , and m . rival time of a1 is larger than rt a1 , we can exclude the 1 4 4 2 2 blocking and interference by tasks tj along the same chain t ∈ TC(e ) rt(t ) ( j a1 ) for the computation of i (at the device level). The same reasoning goes for the link delays, where mes- the input queue through the switching fabric to the output port. sages belonging to the same chain cannot block each other. (Actual implementations may vary, e.g., storing input frames in However, for the given example, the result would be the same, a common memory and/or having separate output buffers is still but for cases, where the task chain traverses the same nodes covered by the presented model.) (and links), a multiple times a tighter, yet safe, estimate can be The output queuing mechanism may also vary: in Fig. 13, obtained. we assume FIFO queuing, i.e., merging streams in the order of arrival; while in Fig. 14, we assume queueing w.r.t. 802.1Q F. Switch Model QoS, i.e., merging streams according to QoS/priority code point In order to derive the response time over a switched network, (PCP) priority. (Actual implementations may feature extended we first have to define a model of the switch at hand. We take the QoS, based on VLAN priority groups, and/or stream manage- outset that switches are non-blocking at link speed and able to ment using double tagging inside service provider networks.) store and forward all packages as long as the output bandwidth In practice, switch implementations have limited buffer space (for each port) is sufficient.6 and computational resources, which brings implications to both Fig. 13 gives an architecture outline; incoming frames are functional and timing behavior. Configuration and characteriza- buffered and inserted (by reference) into the corresponding out- tion are further discussed in the experimental Section IV. put queue (or queues in case of multicast). The frame (if any) referenced at the head of each output queue is transmitted from G. Response Time in a Switched Star Network

6These assumptions are in line with the claims made for many modern Taking the outset of our example from Fig. 8 and the topol- switches such as the Cisco Catalyst 2960 [20]. ogy shown in Fig. 15, we take a closer look at the end-to-end

77 294 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 13, NO. 1, FEBRUARY 2017

e (a) m1,2,3,4 m5,6 Fig. 16. c 1 implies a transmission chain 4 and 2 .

Fig. 17. Real-time messages m will in the worst case be exposed to e blocking by best-effort traffic n on each trunk (backbone) link. response time for the task chains headed by c1 : (a) e → c → m → a → a → m → b → b → b c1 1 4 RX 2 2 RX 1 2 (b) e → c → m → b → b c1 1 3 RX 3 transmission of a BE frame has just started when the real-time (c) e → c → c c1 1 2 frame is enqueued for transmission. The tree topology at hand Focusing on the communication for (a) depicted in Fig. 15. implies two trunk links (between switches) in each direction. For chain (a), we find that the messages m2 and m4 passes two links (indexes above show the order of delivery). Each link IV. EXPERIMENTS delay can be derived analogously to Section III-D taking into 1) Devices: Experiments have been conducted on the em- account the queueing characteristics and scheduling overhead of bedded artist (EA) LPCXpresso1769 development board run- the switch as discussed in the previous section. The end-to-end ning at 96 MHz. The board features a LAN8720 PHY connected response time can then be derived analogously to Section III-E. to a 10/100 Base-Transformer (RJ45 jack) featured on the EA base board. (100 Mb per default for the experiments.) H. Switched Tree Model 2) Switches: The experiments were conducted using a set Fig. 16 depicts a tree network topology with edge switches of modern Cisco Catalyst 2960 S switches (running OS version S1 and S2 and a backbone switch S3 . The ports connecting 12.2) [20]. Cisco claims the Catalyst 2960 to be nonblocking to the backbone switch are the so-called trunks merging traffic at link speed. For the experiments, all automatic management from different streams. In this case, the links to/from S3 convey and spanning tree options were disabled; hence, MAC address mixed traffic from the real-time and the BE domains exposing tables and trunk ports were statically assigned to the switches. 2 3 the messages m4 an m4 to (potentially unbound) blocking and The Catalyst 2960 S provides ample configuration options for queueing overhead. implementing QoS. The architecture follows the generic switch b) Refinement: The effect of blocking can be mitigated model presented. The Catalyst 2960 S features output queueing by faster trunk links (e.g., 1 Gb Ethernet), but the potentially based on class of service (CoS, 802.1Qp) of incoming pack- unbounded overhead still remains. However, under the assump- ages for trusted ports. The switch provides four output (egress) tion that QoS mechanisms are predictable and real-time traffic queues, where queue 1 has priority (expedited unless empty). is classified to be of strictly higher priority than BE traffic, an The other queues are expedited to share or shape the remaining upper bound on blocking can be obtained. bandwidth under configurable weighted tail drop parameters. QoS configuration is solely port based (the Catalyst 2960 S I. Response Time in a Switched Tree Network does not feature VLAN-based QoS). This does not pose any severe limitations to the deployment of QoS for our purposes Taking the outset of our running example, we take a closer because we assume a static port assignment. To the end of BE look at the end-to-end response time for the task chains headed devices, the associated ports were set nontrusted. This implies e (a) by c1 . The communication for is depicted in Fig. 16, indexes a BE egress queuing even for packages tagged with (a higher) above show the order of delivery: CoS. In this way, we can ensure differentiation between traffic 1,2,3,4 5,6 from the real-time and BE domains. ec → c1 → m → a → a2 → m → b → b1 → b2 1 4 RX 2 RX 3) Basic Setup: Fig. 18 depicts the generic experimen- The response time can be derived analogously to Section III-G. tal setup allowing us to observe device/link layer and switch- However, we will need to take into account blocking and queuing ing overhead as well as implications of network topology overhead implied by BE traffic for the backbone (trunk) links and interference by BE traffic. An initial message (tagged (see Fig. 17). with an index of 1) is produced by a1 . Upon receiving a c) Refinement: Given a proper QoS configuration, the correct package (w.r.t destination and index), device A gen- maximum interference (blocking and queueing) from BE traf- erates a new message to device B with incremented index, fic (at each trunk link) can be bound to the transmission time while device B simply forwards any incoming package (as of a single MTU. This holds because in the worst case the received) to device A.Taska2 toggles a digital output on

78 LINDGREN et al.: END-TO-END RESPONSE TIME OF IEC 61499 DISTRIBUTED APPLICATIONS OVER SWITCHED ETHERNET 295

buffering and the hardware CRC check result in an additional delay. In the experimental setup (not depicted), the reduced me- dia independent interface’s receive data signal was correlated to the EMAC RxProduceIndex indicating a 7 μs delay for the hardware processing.8 The 33 μs round-trip time can thus be derived as: 6.5 μs for transmit/receive 7 μs for MCU EMAC (hardware) processing and 3 μs software processing. Totaling 2*6.5 + 2*7 + 2*3 = 33 μs.

B. Switched Star Network For this experiment, devices A and B were connected over a (single) switch. This experiment focuses on the real-time do- Fig. 18. Generic experimental setup to assess real-world network char- main (and isolates the influence of the switch on real time acteristics. traffic). No additional devices were connected. Again, for the round-trip time we measured with the logic analyzer 55 μs (with a jitter of < 1 μs), these are additional 22 μs compared to Ex1. In each invocation (allowing external monitoring of the round- theory (without cut-through), passing two times over a 100 Mb trip time). The station addresses were selected in the local link implies a transmission time of approximately 14 μs. This administrated space, 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 and leaves us with a total queuing overhead of 8 μs, i.e., a 4 μs 0x02, 0x00,...,0x00, 0x02 for A and B, respectively. The queuing (and CRC check) delay per frame for the Catalyst 2960 minimal Ethernet frame size (60+CRC/FCS) was deployed out switch. (This is a 50% improvement compared to the LPC1769.) of which only 16 bytes (including MAC, EtherType/size, and index) were actually used. C. Switched Single-Layer Network With Best 4) 802.1Q Setup: For the experiments involving 802.1Q, Effort Traffic frames from the real-time domain were tagged with the Ether- For this experiment, real-time devices (A and B) and BE Type 0x8100, PCP/CoS 7 (network), DEI 0 (nondrop), and devices (D and E) were connected over a (single) switch. VLAN 2 giving a frame size of 64 (according to Cisco’s inter- Real-time and BE traffic were separated by port (i.e., all real- pretation of the minimal frame size).7 time traffic is transmitted between devices A and B, while all The BE traffic was generated (and received) on ordinary BE traffic is transmitted between devices D and E). For the PCs (devices D and E) using the tools Ostinato 0.6 (for pack- experiment (on average) 5000 (MTU 1518 bytes) packages/s age generation) and Wireshark (for package analysis). The sta- were sent in between devices D and E (emulating a generic and tion addresses were assigned 0x02, 0x00,...,0x00, 0x03 and heavy IP traffic load). To stress buffering effects, packages were 0x02, 0x00,...,0x00, 0x04 for D and E, respectively. BE sent in bursts of 10 at the rate of 500 bursts/s. The measured frames were tagged with CoS 7 in order to verify that QoS round-trip time was still 55 μs (with a measured jitter of <1 μs), is neglected for untrusted ports (according to the Cisco specifi- which verifies the assumption that port separation allows us to cation). share the same physical switch without affecting the real-time performance. To this end, the Catalyst 2960 meets the claims A. Experiment 1: Point-to-Point Network (the switching fabric was able to forward packages at link speed For this experiment, devices A and B were directly connected for the presented load9). to each other allowing us to focus on the real-time domain with- out the influence of switches in the network. This shows the over- D. Switched Tree Network head of the link layer transmission time and EMAC hardware For these experiments, we tunneled all traffic over a common and overhead. For the round-trip time between A backbone (trunk) with link speed set to 100 Mb (Ex4, ..., Ex6) (receiving a frame, extracting payload, verifying index, sending and 1 Gb for Ex7. Two edge switches were used connecting (A, a frame) and B (receiving a frame, extracting payload, sending D) and (B, E), respectively. One backbone switch connected a frame with same payload), we measured 33 μs (with a jitter of the edge switches. In the first experiment (Ex4), only real-time < 1 μs). This experiment confirms the low link layer soft- traffic was emitted. This allowed us to assess the forwarding ware and hardware overhead indicating that end-to-end response times in the range of 15 μs are indeed possible using simple Ethernet-based devices and standard framing. A closer inspec- 8Notice that this information is LPC 1769 specific. Other MAC-layer imple- tion of the LPC1769 EMAC indicates that a receive buffer is mentations may express different characteristics. 9We seek to characterize the switch behavior on the link layer, thus the deployed in order to implement MAC address filtering. This presented BE traffic is set within the bandwidth limit of the link (5000*(1518 + 8 + 12)*8 = 61.52 Mb/s). Assessing the behavior when saturating the link (i.e., flow control) would require another type of experimental setup where the 7Finer grained QoS can be achieved for switch architectures implementing frame payload is actually interpreted and acted upon in real-time by the devices traditional priority queues. The Catalyst 2960 S implements hybrid queuing D and E. While being straightforward, such experiments are left out of scope where egress queue 1 has priority over the other 3 egress queues. for this paper.

79 296 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 13, NO. 1, FEBRUARY 2017

TABLE I Ethernet, e.g., [25]. Our models extend [23]–[25] and our ex- BE INDICATES THE PRESENCE (AND FRAME SIZE) OF BEST EFFORT periments indicate their validity. The problem of shared output TRAFFIC OVER THE SHARED LINKS queuing with QoS has been identified and discussed in [26]. In our case, we need only two priority classes in order to pro- Ex1 Ex2 Ex3 Ex4 Ex5 Ex6 Ex7 vide real-time guarantees (given the assumption that switches P2P Star Star + Tree Tree + Tree + GB Tree + are nonblocking). Our experiments validate the feasibility to BE 1518 BE 1518 BE 64 BE 1518 COTS switches (such as the Cisco 2960) and show that the im- 33 μs55μs55μs 103 μs 103:340 μs 103:103 μs 77:78 μs pact of noncritical traffic can be bound and safe response times obtained. (All frame sizes include FCS/CRC.) GB indicates a Gigbit backbone. Currently, there are attempts to add response-time predictabil- overhead across the trunks. A round-trip delay of 103 μs with ity to switched Ethernet by defining standards for deterministic a jitter of <1 μs was observed. These are additional 48 μs Ethernet [27]. Our approach is not becoming obsolete by having compared to Ex2. The tree topology adds 2 transmit/receive networks following those standards. In fact, analysis for end-to- delays and 2 buffering delays in each direction. Thus, 2*7 + 2*4 end response times in an IEC 61499 distributed application = 44 μs roughly correspond to the observed additional delay. becomes then vendor independent. Our approach is currently The second experiment (Ex5) contains both real-time and BE limited to Ethernet switches with a suitable and known priority streams. The observed best case delay was 103 μs and the worst queueing implementation. case delay was 340 μs, which roughly corresponds to the theoret- Our results are safe to the worst case assessing end-to-end ical bound of 103 + 2 ∗ tr(MTU) = 103 + 2 ∗ 122 = 347 μs. response in a holistic manner including the overhead of re- In the third experiment (Ex6), we reduce the BE MTU to ceive/transmit hardware and software. Further improvements 64 bytes. The additional blocking was masked by the internal can be obtained if the MTU of BE traffic is controlled. This, queuing delay of the switch showing that the overhead of BE however, requires the switches to be configurable as well as traffic can be efficiently mitigated by controlling the MTU. being capable of segmenting and reassembling packages at the In the final experiment (Ex7), we observe the effect of a Giga- edges of the network. Alternatively, each BE device needs to be bit backbone. Both best and worst case behavior were improved, configured to segment the traffic at the link layer, which might and the effect of additional blocking was effectively masked by be not practical in the general case. First, the link layer drivers the queuing mechanism of the switch. In comparison to Ex5 may not provide to set the MTU in a user configurable way. And (where the worst case additional delay compared to Ex2 was second, the usage of such devices outside the system will suffer 295 μs), a 10× speed improvement would yield a delay of from decreased performance. Moreover, relying on configura- 30 μs and thus 55 + 30 = 85 μs. This roughly coincides with tion is fragile and cannot be trusted. However, our experiments the measured times. This indicates that even without controlling show that the adversative effects of BE traffic can be effectively the MTU of BE traffic, a faster (Gigabit) backbone mitigates the mitigated by a faster backbone. blocking overhead and improves the overall performance. And In this way, our results challenge the common belief that that the switch performs as expected even at gigabit speed. specialized real-time Ethernet protocols and components are All measurements are within ±10% of the theoretical results, required in order to offer predictable timing over Ethernet. In thus validating the analytic model. To mitigate the effect of cases where absolute performance requirements cannot be met BE traffic, our experiments confirm that lowering the MTU of by the presented approach, specialized solutions and technolo- BE traffic lowers the blocking time of the shared link. How- gies (e.g., EtherCAT) may be combined with the presented work ever, this approach has the potential to create problems with the allowing to further reduce end-to-end response times with (po- packet scheduler of the switch. Because the switches are typi- tentially) tighter bounds to the analysis. cally specified to manage packages at full speed with maximum Set target of future work is to further automate the design MTU, switching minimum size packages will generate 20 times process for distribution allowing networking components to be the number of packages and thus putting an additional 20 times generated directly from 61499/RTFM models and to integrate load on the internal packet scheduler. Hence, special care must the proposed methods for analysis in the RTFM-4-FUN tool be taken to validate the behavior of the switch(es) if utilizing suit. In order to ensure correctness of the presented analysis this approach. and future automation features, rigorous formalization and ex- traction of certified code are possible through, e.g., Coq [28]. V. R ELATED AND FUTURE WORK Another area of interest is deriving a generic test bed allowing Compositional analysis of real-time systems have been ex- fast and accurate characterization of switch configurations facil- tensively studied in [21] and [22]. Similar to the recent work itating the deployment in real-life environments. In particular, [22], platform overhead is included in our analysis. In our case, we would like to study the Netgear low-cost solutions (e.g., the platform level scheduling is nonhierarchic and end-to-end re- GS10T-200) claiming to implement strict priority queues. sponse time in a distributed setting is obtained by including communication overhead. Guaranteed real-time services over VI. CONCLUSION standard Ethernet have been studied from a modeling and sim- ulation perspective (e.g., in [23]). Response time of IEC 61499 In this paper, we have discussed end-to-end response times has been studied at the device level, e.g., [24] and over switched for distributed IEC 61499 applications communicating over

80 LINDGREN et al.: END-TO-END RESPONSE TIME OF IEC 61499 DISTRIBUTED APPLICATIONS OVER SWITCHED ETHERNET 297

switched Ethernet networks. The end-to-end analysis takes de- [10] RTFM: Real-Time For the Masses. [Online]. Available: http://www.rtfm- vice level response time, network link layer transmission times, lang.org. Accessed on: Feb. 17, 2015 [11] P. Lindgren, M. Lindner, A. Lindner, D. Pereira, and L. M. Pinho, “RTFM- network topology, as well as switch specifics (up to layer 2) into core: Language and implementation,” in Proc. 10th IEEE Conf. Ind. Elec- consideration. For the analysis and implementation, the system tron. Appl., 2015, pp. 990–995. is translated into concurrent tasks and resources (critical sec- [12] P. Lindgren, M. Lindner, A. Lindner, J. Eriksson, and V.Vyatkin, “RTFM- 4-FUN,” in Proc. IEEE Int. Symp. Ind. Embedded Syst., 2014, pp. 1–4. tions) in RTFM-core language. Such models can be expressed [13] P. Lindgren, M. Lindner, A. Lindner, J. Eriksson, and V. Vyatkin, “Real- directly in the RTFM-core language derived from IEC 61499 time execution of function blocks for internet of things using the RTFM- system specifications or any mix thereof. The static nature of kernel,” in Proc. 2014 IEEE 19th Int. Conf. Emerg. Technol. Factory Autom., 2014, pp. 1–6. the RTFM-core model allows device level response times to [14] T. Strasser et al., “Framework for distributed industrial automation and be assessed using available schedulability analysis. For each control (4DIAC),” in Proc. 6th IEEE Int. Conf. Ind. Informat., 2008, link, the number of (outstanding) messages in the system is pre- pp. 283–288. [15] J. Maki-Turja and M. Nolin, “Fast and tight response-times for tasks cisely bound, thus allowing the blocking and queuing time to with offsets,” in Proc. 17th Euromicro Conf. Real-Time Syst., 2005, be safely estimated. To assess the impact of switches in the net- pp. 127–136. [Online]. Available: http://dx.doi.org/10.1109/ECRTS. work, a generic model is presented, and an experimental setup 2005.15 [16] K. Tindell, A. Burns, and A. Wellings, “Calculating controller area net- is developed allowing characterization of the switch(es) at hand. work (CAN) message response times,” Control Eng. Practice,vol.3, Our experiments conducted on commercially available micro- pp. 1163–1169, 1995. controllers and Ethernet switches indicate that safe end-to-end [17] M. D. Natale, “Scheduling the CAN bus with earliest deadline techniques,” in Proc. 21st IEEE Real-Time Syst. Symp., 2000, pp. 259–268. response times well below 1 ms are feasible even over tree net- [18] M. Felser, “Real-time ethernet-industry prospective,” Proc. IEEE, vol. 93, work (backbone) topologies and in the presence of unmanaged no. 6, pp. 1118–1129, Jun. 2005. BE traffic. [19] K. Lee and J. Eidson, “IEEE-1588 standard for a precision clock synchro- nization protocol for networked measurement and control systems,” in Proc. 34th Annu. Precise Time Time Interval Meeting, 2002, pp. 98–105. ACKNOWLEDGMENT [20] R. Froom, M. Flannagan, and K. Turek, Cisco Catalyst QoS: Quality of Service in Campus Networks. Indianapolis, IN, USA: Cisco Press, 2003. The authors would like to thank Dr. A. I. Awad, LuleaUniver-˚ [21] I. Shin and I. Lee, “Periodic resource model for compositional real- sity of Technology (LTU), for his support and allowing us to use time guarantees,” in Proc. 24th IEEE Real-Time Syst. Symp., Dec. 2003, pp. 2–13. the hardware facilities in the InfoSec Lab, LTU for conducting [22] L. Phan, M. Xu, J. Lee, I. Lee, and O. Sokolsky, “Overhead-aware compo- the experimental works in this article. sitional analysis of real-time systems,” in Proc. 2013 IEEE 19th Real-Time Embedded Technol. Appl. Symp., Apr. 2013, pp. 237–246. [23] X. Fan and M. Jonsson, “Guaranteed real-time services over standard EFERENCES R switched ethernet,” in Proc. 30th Annu. IEEE Conf. Local Comput. Netw., [1] Z. Xiaoxiang, “Modbus protocol and programing,” Electron. Eng.,vol.7, Sydney, Australia, 2005, pp. 490–492. 2005, Art. no. 016. [24] L. Lednicki, J. Carlson, and K. Sandstrom, “Device utilization analysis for [2] K. Bender, Profibus: The Fieldbus for Industrial Automation. Englewood IEC 61499 systems in early stages of development,” in Proc. 2013 IEEE Cliffs, NJ, USA: Prentice-Hall, 1993. 18th Conf. Emerg. Technol. Factory Autom., Sep. 2013, pp. 1–8. [3] G. Cena, L. Seno, A. Valenzano, and S. Vitturi, “Performance analysis [25] A. Schimmel and A. Zoitl, “Real-time communication for IEC 61499 of ethernet powerlink networks for distributed control and automation in switched Ethernet networks,” in Proc. 2010 Int. Congr. Ultra Modern systems,” Comput. Standards Interfaces, vol. 31, no. 3, pp. 566–572, Telecommun. Control Syst. Workshops., Oct. 2010, pp. 406–411. 2009. [26] R. Janowski, P. Krawiec, and W. Burakowski, “On assuring QoS in Eth- [4] D. Jansen and H. Buttner, “Real-time Ethernet: The EtherCAT solution,” ernet access network,” in Proc. Int. Conf. Netw. Serv., 2007, pp. 17–22. Comput. Control Eng., vol. 15, no. 1, pp. 16–21, 2004. [27] W. Steiner, F. Bonomi, and H. Kopetz, “Towards synchronous determin- [5] P. Ferrari, A. Flammini, and S. Vitturi, “Performance analysis of profinet istic channels for the internet of things,” in Proc. 2014 IEEE World Forum networks,” Comput. Standards Interfaces, vol. 28, no. 4, pp. 369–385, Internet Things, pp. 433–436, Mar. 2014. 2006. [28] Y. Bertot and P. Casteran,´ Interactive Theorem Proving and Program [6] Function Blocks—Part 1, Architecture, IEC Standard 61499, 2012. Development: Coq’Art: The Calculus of Inductive Constructions. Berlin, [7] Programmable Controller—Part 3, Programming Languages, IEC Stan- Germany; Springer-Verlag, 2013. dard 61131-3, p. 230, 1993. [8] T. Baker, “A stack-based resource allocation policy for realtime pro- cesses,” in Proc. 11th Real-Time Syst. Symp., Dec. 1990, pp. 191–200. [9] J. Eriksson, F. Haggstrom,¨ S. Aittamaa, A. Kruglyak, and P. Lindgren, “Real-Time for the masses, Step 1: Programming API and static priority SRP kernel primitives,” in Proc. IEEE Int. Symp. Ind. Embedded Syst., 2013, pp. 110–113. Authors’ photographs and biographies not available at the time of publication.

81 82 Paper C TRust - A Task Model for the Rust Language

Authors: Per Lindgren, Johan Eriksson, Marcus Lindner, David Pereira and Luis Miguel Pinho

To be submitted.

83 84 TRust-ATaskModel for the Rust Language

Per Lindgren, Johan Eriksson and Marcus Lindner David Pereira and Lu´ıs Miguel Pinho Lulea˚ University of Technology CISTER / INESC TEC, ISEP Email:{per.lindgren, johan.eriksson, marcus.lindner}@ltu.se Email: {dmrpe, lmp}@isep.ipp.pt

I. ABSTRACT Top ::= #> CCode <# | Task IntId{ Stmt}|TopTop Rust is a modern programming language, with the outstand- ing feature of compile time memory analysis. In this paper Stmt ::= #> CCode <# | claim Id{Stmt}|Stmt Stmt we discuss the outsets for expressing a task and resource model, that utilizes the Rust type system to give compile Fig. 1: Simplified grammar of the RTFM-core langauge. time guarantees for race and deadlock free execution onto single core MCUs of the IoT era. The approach promises // Resource A protects a, P protects p efficient executables (on par with hand written C code), with struct { int x; int y;}p={.x=0,.y = 1}; predictable response times, achieved by exploiting the under- int a[3] = {1,2,3}; lying hardware for Stack Resource Policy based scheduling. Task 2 task1 { In contrast to thread based approaches, concurrency in TRust claim A { #> a[0]=a[2]; <# } is implicit, directly reflecting the concurrent nature of the } system’s environment. In this way the programmer is relieved Task 1 task2 { from the tedious and error prone management of threading claim A{ primitives, and can focus the intended reactive behavior. claim P{ #> p.y=a[0]; a[1]=p.y; <# } CORE II. RTFM- #> a[2]=0; <# A. RTFM-core Language and Model of Computation } } The RTFM-core language [3] allows reactive real-time sys- Listing 1: RTFM-core: Simple example system. tems to be specified in terms of time constrained, parametrised tasks with (potentially nested) critical sections protected by named resources. Whereas the minimalistic RTFM-core lan- III. RUST guage focus on concurrency, inlined C code is used to specify Rust combines modern language features as closures, para- the functionality. A simplified grammar is presented in Figure metric polymorphism, and traits, with a novel ownership 1. Listing 1 depicts a system with two tasks task1 and model allowing compile time safety analysis of systems with task2 with priorities 2 and 1 respectively. The resource shared state. Through pre-processing (Rust macros) syntactic A ensures race free access to a between tasks task1 and extensions can be readily developed. Targeting systems pro- task2, while P ensures constancy of the struct p. Claims gramming, bindings to traditional threads based libraries are can be nested, allowing access e.g., to both p and a. available. Rust claims zero-overhead abstractions, motivated 1) RTFM-kernel: Targeting lightweight micro-controllers, by the fact that abstractions are efficiently instantiated. Ef- the RTFM-kernel implements the scheduling primitives in ficiency of the generated code can potentially surpass hand terms of C-macros exploiting the underlying bare-metal in- written C/C++ through allowing aggressive, yet safe optimiza- terrupt hardware for single core, static priory, preemptive tions. Rust relies on LLVM for analysis and back-end code scheduling under the Stack Resource Policy (SRP)[2]. SRP- generation. based scheduling [1] brings benefits such as deadlock free execution sharing a single execution stack, single blocking and A. Zinc established methods to response time analysis. 2 2) RTFM-core Compiler: The RTFM-core compiler[3] For bare-metal execution, Zinc provides a generic, mem- translates the RTFM-core model into plain C code, with ory safe ecosystem for the ARM-cortex architecture. The inlined references to the scheduling primitives. Static priorities generated executable (.elf), maps directly to the underlying are derived for the time constrained tasks, and static resource hardware. Targeting small embedded devices (of the IoT era), ceilings are derived for the critical sections1. a minimal executable is free of external dependencies (beyond the required Rust libcore). 1A subset of RTFM-core has been formally specified, and a verified compiler has been developed[4]. 2https://github.com/hackndev/zinc

85 IV. TRUST according to SRP. All references to shared data is safe through We envision TRust, a reliable task and resource model implicit de-referencing of (locked) guards. The vanilla Rust 4 for bare-metal applications. Building on Zinc, hardware ab- compiler will reject any unsafe usage. stractions are readily available, e.g., allowing the binding V. I N RUST WE TRUST! of functions to interrupt vectors (i.e., we can declare tasks Current work includes performing the SRP analysis on bases as functions). Moreover, the Rust ownership model prevents of the call tree. Required information is available in the MIR5 unsafe sharing between such tasks (functions). Resources, can representation of the program (spanning all tasks). Indeed, be modelled in a similar way to Mutex in Rust. Listing 2, resource ceilings are constants and should be derived as such, TRust 3 depicts a sketch of a possible implementation . The omitting the need for the res_ceiling field. Furthermore, Sync/Drop trait allows the memory analysis to acknowledge we intend to provide concrete syntax and compiler support for the mutual exclusion, the Deref/DerefMut traits provide claim task definitions, asynchronous calls and timing semantics of safe de-reference of shared data. The function returns [3], to provide a complete framework for real-time embedded a guard (from which de-referencing is possible). Notice here, software development that you can TRust! under SRP a claim will always be granted, hence execution is use TRust; lock-free by construction and thus extremely efficient. struct Ps { pub struct Res { x : i32, data: UnsafeCell, y : i32, res_ceiling: i32, } old_ceiling: atomic::AtomicUsize, } // A, resource ceiling is max {p(task1), p(task2)} = 2 static A: TRust::Res<[i32;3]> = TRust::Res::new([1,2,3], 2); #[must_use] // P, resource ceiling is max {p(task2)} = 1 pub struct ResGuard<’a, T: ’a> { static P: TRust::Res = TRust::Res::new(Ps{x:0, y:1}, 1); __lock: &’a Res, } // task1 at prio 2 pub fn task1() { unsafe impl Sync for Res {} let mut a = A.claim(); a[0]=a[2]; impl Res { } pub const fn new(value: T, ceiling: i32) -> Res { Res { // task2 at prio 1 data: UnsafeCell::new(value), pub fn task2() { res_ceiling: ceiling, let mut a = A.claim(); { old_ceiling: atomic::AtomicUsize::new(0), let mut p = P.claim(); } p.y = a[0]; a[1] = p.y; } } pub fn claim<’a>(&’a self) -> ResGuard<’a, T> { a[2] = 0; self.old_ceiling.store(NVIC_GET_BASEPRI(), ...); } NVIC_SET_BASPRIMAX(self.res_ceiling); ResGuard { __lock: self } } Listing 3: Example system }

impl<’res, T> ops::Deref for ResGuard<’res, T> { REFERENCES type Target = T; [1] T.P. Baker. A stack-based resource allocation policy for fn deref(&self) -> &T { unsafe { &*self.__lock.data.get() } realtime processes. In Real-Time Systems Symposium, } 1990. Proceedings., 11th, pages 191 –200, December } 1990. impl<’res, T> ops::DerefMut for ResGuard<’res, T> { [2] Johan Eriksson, Fredrik Haggstrom, Simon Aittamaa, An- fn deref_mut(&mut self) -> &mut T { unsafe { &mut *self.__lock.data.get() } drey Kruglyak, and Per Lindgren. Real-time for the } masses, step 1: Programming API and static priority SRP } kernel primitives. In SIES, pages 110–113. IEEE, 2013. impl<’res, T> Drop for ResGuard<’res, T> { [3] Per Lindgren, Marcus Lindner, Andreas Lindner, David fn drop(&mut self) { NVIC_SET_BASEPRI(self.__lock.old_ceiling.load(...)); Pereira, and Lu´ıs Miguel Pinho. RTFM-core: Language } and Implementation. In 10th IEEE Conference on Indus- } trial Electronics and Applications (ICIEA 2015), 2015. Listing 2: TRust [4] Per Lindgren, Marcus Lindner, David Pereira, and Lu´ıs Miguel Pinho. Towards certified compilation of rtfm- A. Example core applications. In 2016 IEEE 21st International Confer- ence on Emerging Technologies and Factory Automation The example from Listing 1, can be straightforwardly im- (ETFA) : Berlin, 6-9 Sept. 2016, 2016. plemented using the TRust abstraction. All resources are stat- ically allocated. The resource ceilings are manually inferred 4The authors would like to acknowledge Nathan Zadoks ([email protected]) for the fruitful discussions on IRC 3Details omitted for the NVIC and memory ordering of atomic operations. (irc.mozilla.org#libfringe). Notice, the implementation is highly experimental. 5https://blog.rust-lang.org/2016/04/19/MIR.html

86 Paper D Scheduling of CRO systems under SRP-DM

Authors: Per Lindgren, Johan Eriksson, Simon Aittamaa, Pawel Pietrzak, Jimmie Wiklander

Originally published in: Real-Time in Sweden (RTiS) 2011 (Workshop)

c LTU

87 88 Scheduling of CRO systems under SRP-DM

Per Lindgren Johan Eriksson Simon Aittamaa Pawel Pietrzak Jimmie Wiklander EISLAB, Lulea˚ University of Technology, 97187 Lulea˚ Email: {Per.Lindgren, Johan.Eriksson, Simon.Aittamaa, Pawel.Pietrzak, Jimmie.Wiklander}@ltu.se

Abstract—Concurrent Reactive Objects (CRO) offers a means In this paper we present a safe method for SRP DM for modeling real-time embedded software. Our previous work schedulability analysis applicable to CRO models. The method has shown how CRO models can be synthesized into executable is based on response time analysis, and takes kernel overhead code, and how preemption levels for a Stack Resource Policy (SRP) based Deadline Monotonic (DM) scheduler can be ex- and blocking into account for ensuring safe estimations. tracted. We have proposed a general design of an efficient SRP Traditional response time analysis requires a-priori informa- DM kernel, which minimizes kernel overhead and blocking by tion on the inter-arrival times of job requests. In our setting job exploiting commonplace interrupt hardware. requests stem either from the system environment (i.e., cap- In this paper we present a safe method for SRP DM schedula- tured as interrupts) or from internal asynchronous messages. blility analysis applicable to CRO models. The method is based on For the latter case we develop a method that analyses the CRO established response time analysis, and takes kernel overhead and blocking into account for ensuring safe estimations. In particular, model in order to derive inter-arrival times for the internal job we show how the complete kernel can be captured in terms of requests. CRO, and we give a set of kernel primitives enabling interrupt Response time analysis also requires the Worst Case Exe- hardware of commonplace platforms to utilized for implementing cution Times (WCETs) of job-instances. To this end we rely SRP-DM scheduling. By the translation from CRO to notions of on an empirical approach, that undertake WCET estimation by SRP, we can undertake established methods for schedulability analysis. To this end, we develop a method to, from a CRO performing clock-cycle accurate measurements on the actual model, derive the inter-arrival times of internal job requests, and executable code (synthesized from the CRO model). This we describe the units of execution for WCET analysis. Given an empirical approach is also applied to characterize WCET for assumption on inter-arrival times of external jobs, the developed scheduling operations. This is a semi-automatic process, that analysis is able to give safe schedulability estimates for CRO in case of data dependent WCET utilizes user input for the based systems. estimation. Due to space limitations this work is not included in the paper. I. INTRODUCTION We show that the complete system, including the SRP- DM scheduler, can be expressed in terms of CRO, and by Concurrent Reactive Objects (CRO) offers a means for the translation to the standard notions of SRP we can rely modeling real-time embedded software. The intended behavior on established methods for analysis. In particular, we detail is captured in terms of Time-Constrained Reactions (TCR) how interrupt hardware is managed through a minimalistic set [3] to events in the system. In case such all reactions have of primitives, and how timer queues (for managing internal completed the system is idle, awaiting future events. As such postponed job requests) are expressed in terms objects. Fur- the CRO model is well suited to systems with both real-time thermore, we discuss the trade-off in between schedulability and power constraints. and hardware resource utilization. In our previous work we have shown how CRO models The paper is structured as follows. Section II gives a can be synthesized into executable code. For the scheduling background to the CRO model and briefly review the code we have defined a mapping from the CRO model (objects, synthesis process from CRO to executable code. Section III methods, and synchronuous/asynchrnuous messages) to the describes the mapping from CRO model to the notions of SRP. notions of SRP (resources, resource requests/releases, job The SRP-DM schedulability analysis is presented in Section arrivals and jobs). Based on the mapping we have shown how V, taking into account the overhead of scheduling primitives resource ceilings and preemption levels for a Stack Resource and the management of timer queuues. The paper is concluded Policy (SRP) based Deadline Monotonic (DM) scheduler in Section VI, where we summarize our results and suggest can be extracted. The choice of SRP-DM is motivated by directions for future research in the field. our target of resource constrained real-time systems. SRP is II. BACKGROUND known to offer benefits of single shared stack for execution (which limits stack space RAM requirements in comparison to A. Underlying Model e.g., multi-threaded approaches), and ensures bound blocking 1) Concurrent Reactive Object Model: The concurrent times for the scheduling. In addition, under SRP-DM we can reactive object (CRO) model is the execution and concur- exploit on mechanisms for interrupt management provided by rency model of the Timber programming language, a general- commonplace hardware platforms (e.g., ARM Cortex M3) to purpose object-oriented language that primarily targets real- mitigate scheduling overhead. time systems [1]–[3]. A subset of C implementing the core

89      4) Message passing and specification of timing behavior: In the CRO model objects communicate by passing mes-    sages. Each message specifies a recipient object (O) and       a method (m) of this object that will be invoked. A mes- SYNC(O, m)   sage is either synchronous ( ) or asynchronous (ASYNC(O, m)). The sender of a synchronous message blocks Fig. 1. Permissible execution window for a message. waiting for the invoked method to complete, while the sender of an asynchronous message can continue execution concur- rently with the invoked method. Thus asynchronous messages features of Timber and also using the CRO model as its introduce concurrency into the system. An asynchronous mes- execution model is called TinyTimber [4]. sage can also be delayed by a specific amount of time. In this section we briefly describe the main features of Timing behavior of a system can be specified by defining the CRO model and its abstraction to Concurrent Reactive a baseline and a deadline for an asynchronous message (a Components, and discuss its implementation. We present an synchronous message always inherits the timing specification informal mapping from the CRO model to the notions of SRP. of the sender). The baseline specifies the earliest point in We focus on reactivity; object-orientation with complete state time when a message becomes eligible for execution, which encapsulation; object-level concurrency with message passing for an external stimulus corresponds to its “arrival time” and between objects, the ability to specify timing behavior of a for a message sent from one object to another is defined system, and the abstraction to components. directly in the code. If the defined baseline is in the future, 2) Reactivity: Reactivity is the defining property of the this corresponds to delaying the delivery of the message. The CRO model, which makes it particularly suitable for embedded deadline specifies the latest point in time when a message system, since functionality of most, if not all, embedded must complete execution, which is always defined relative systems can be expressed in terms of reactions to external to the baseline, see Figure 1 for the permissible execution stimuli and timer events. A reactive system can be described window of a message. Whenever we talk about timing behav- as follows: initially the system is idle, an external stimulus ior of asynchronous messages, we shall extend the notation (originating in the systems environment) or a timer event to ASYNC(O, m, B, D), where B and D are respectively a triggers a burst of activity, and eventually the system returns baseline and a relative deadline of the message. As mentioned to the idle state. A reactive object is either actively executing above, synchronous messages always inherit their timing spec- a method in response to an external stimulus or a message ification from the initial asynchronous message. from another object, or passively maintaining its state. Since Concurrent reactive objects can be used to model the system initially the system is idle, some external stimulus is needed itself and its interaction with its environment (e.g., via sensors, to trigger activity in the system. buttons, keyboards, displays). Events in the physical world 3) Objects and state encapsulation: The CRO model speci- (such as pushing a button) result in an asynchronous message fies that all system state is encapsulated in objects O1,...,On. being sent to a handler object, and system output (e.g., flashing Each object has a number of methods, and the encapsulated an LED) is represented as messages sent from an object to the state is only accessible from the object’s methods. This is also environment. known as a complete state encapsulation. A name of a method 5) Implementation of CRO: The implementation of an m can be fully expanded as Oi : m, where Oi is an object object instance can be either in software (e.g, synthesized from owning the method. Methods of two objects can be executed a CRO definition in C) or provided by the environment. This concurrently, but each method is granted an exclusive access allows incorporating hardware interactions and legacy code to its object’s state, so only one method of an object can (typically external software libraries) in the model, as long as be active at any given time. Coupled with a complete state their interface is compliant with the concurrent reactive object encapsulation, this provides a mechanism for guaranteeing model. state consistency under concurrent execution. The source of 6) Concurrent Reactive Components: To deal with com- concurrency in a system can either be two external stimuli that plexity, system state and functionality is partitioned into a are handled by different objects, or an asynchronous message hierarchy of concurrent reactive components encapsulating sent from one object to another (more about message passing component instances and at the lowest level concurrent re- below). active objects. Thus, interaction in between components will To ensure that execution is reactive in its nature, each always be grounded by interaction in between objects (be- method must follow run-to-end semantics [5], i.e., it is not longing to different components). This allows us to apply allowed to block execution awaiting external stimulus or a a consistent modeling for component interaction, component message. An example of this would be an object representing implementation and interaction with the environment. a queue: if a dequeue method is invoked on an empty queue, it is not allowed to wait until data becomes available, but it B. Stack Resource Policy must instead return a result indicating that the queue is empty. Stack resource policy (SRP) is a policy for scheduling real-time tasks with shared resources that permits tasks with

90 different priorities to share a single run-time stack [6]–[8]. subsequent messages must be synchronous. The set of re- SRP applies directly to scheduling policies with dynamic and sources (objects) potentially requested by a message M0 = static priority, including e.g., Earliest Deadline First (EDF) ASYNC(O0,m0,B0,Do) is defined as which is used by current Timber and TinyTimber kernels, ∗ Objs(M0)={O0}∪{O | M0 → SYNC(O, m, B, D)} and Deadline Monotonic (DM) for which interrupt hardware of commonplace platforms can be utilized efficiently. SRP We also say that M can lock objects Objs(M).Acurrent scheduling has many known advantages to scheduling real- resource ceiling O is just defined as time tasks onto resource constrained platforms and a plethora O = max{π(M)|M ∈M,O ∈ Objs(M)} of methods for analysis has been devised. The traditional version of SRP only addresses single-core systems, however, where M stands for all messages in the system. Note that O SRP has also been extended to multi-core and multi-processor can be computed statically. The current system ceiling Π is systems (see, for example, [9] and [10]). Π=max({0}∪{O|O is locked}

III. TRANSLATION OF THE CRO MODEL The SRP states (cf. [6]) that a message M = ASYNC(O, m, B, D) is blocked from starting execution SRP offers bounded priority inversion, deadlock free ex- until π(M) > Π. In addition to that, in order to be scheduled ecution, sharing of run-time stack, and at most two context for execution M must have highest priority of all jobs, which switches per job instance (since a job can never block waiting in the case of DM follows directly from π(M) > Π. for a resource once started). In order to allow a CRO system to be represented in notions IV. CODE SYNTHESIS OF CRC/CRO of SRP, we must first translate the CRO model into jobs and In this section we give a brief description of the code resources. A straightforward translation is possible: synthesis process. • O Each object i is treated as a single-unit resource. A. XML Format • Each asynchronous message M = ASYNC(O, m, B, D) is treated as a job request, where the resulting job instance A model is stored as an XML file that captures both system initially performs a resource request for O, invokes m, structure and implementation. At the top level, it contains and releases O. one or more modules. A module may contain one or more class definitions. Class definitions are used to declare both • Each synchronous message M = SYNC(O, m, B, D) as a resource request for O, invocation of m and release of components and objects. It also defines the required interface O. (the list of named arguments) and its provided interface (a list of named results). In general a class may contain: • The baseline B of a message ASYNC(O, m, B, D), cor- 1) One state structure: responds to the arrival time of a job request and the deadline D to the relative deadline of the job instance. . Consider 1 1 1 1 1 and M1 = ASYNC(O2,m2,B2,D2). In the DM schedul- 2) One or more methods: ing priorities are defined so that p(M1) >p(M2) iff level π(M), which should satisfy π(M1) <π(M2) D2. Values of preemption levels are natural ... ]]> numbers. For our case, with the DM scheduling, we can assign π(M)=p(M) for every M (since this assignment 3) Zero or more class definitions (as described above) only satisfies the condition for preemption levels, see (P1) in visible in the scope of the component and any sub- [6]). component(s) (referred to as local class definitions). This translation into notions of SRP is possible, since 4) One or more class instances: methods in the CRO model are run-to-end (blocking for future Assume a message M1 = SYNC(O1,m1,B1,D1) (or Where name is the instance name, class selects the M1 = ASYNC(O1,m1,B1,D1)). If the execution of m1 class definition to instantiate, and classarg argu- can give rise to sending a synchronous message M2 = ment(s) for instantiation. ∗ SYNC(O2,m2,B2,D2) then we write M1 → M2. Let → A definition of a component contains (optional) local class be a transitive closure of →. The initial message in a path definitions (3) and class instances (4), while a definition of an ∗ defined by → may be synchronous or asynchronous, the object contain method(s) (2) and (optional) state (1).

91 B. Code Synthesis S =(C, T, D, B, I, R, hp) From the internal CRC model C functions and object (1) definition structures (C typedefs) can be emitted. The next step In this section we will detail how to derive S for CRO based in code synthesis is to instantiate the system, which results systems, and in particular show how detailed knowledge of the in a CRO model [11] of the system. From this model we kernel design (and the underlying scheduling hardware) can be can emit the static object structure, and perform SRP ceiling incorporated into S for accurate of schedulabilty analysis. level extraction.Given emitted C-functions, object definitions and SRP ceiling levels, and the set of scheduling primitives A. Mapping of externally triggered CRO systems the system can be compiled (using a C-compiler) into an This boils down to the standard notion of a reactive system executable binary. with shared resources, having no internally triggered job The timing specification of the model is preserved during requests. code synthesis and later used by the run-time kernel, e.g. the For a CRO system without internal asynchronous commu- process method will render the following C-code: nication (no internal ASYNCs), the tasks of the system S all // local name bindings stem from external job requests, corresponding to messages #define get_feedback ..... Mi = ASYNC(Oi,mi,Bi,Di), where Oi defines the destina- #define control_out ..... m B #define process ..... tion object, i the method to execute, i the baseline relative // method implementation to the arrival time of Mi, and Di the relative deadline to Bi. int controller_process(int arg) { For external job requests Mi, Bi =0(if we want to postpone int fb = SYNC(get_feedback, NO_ARG); m // Controller state update etc execution we will let i emit an asynchronous job request SYNC(control_out, outval); ASYNC. ASYNC(process, MS(10), MS(1), NO_ARG ); We define: } #undef get_feedback Oi Is a resource (object) with a priority ceiling Oi, ac- #undef control_out cording to Section III. #undef process πi the pre-emption level of task i, which in our case is V. SRP-DM SCHDEDULING OF CRO SYSTEMS equal the base priority of task i. Pre-emption levels are π(M ) <π(M ) D >D In the analysis we assume the system to be deadline assigned such that 1 2 iff 1 2. constrained, i.e., inter-arrival times for each job is greater than Notice, for utilizing commonplace interrupt hardware for H the job deadline. the scheduling under SRP-DM, we define a mapping such that H(πi) >H(πj ) iff πi <πj . τi Is a the task i in the system. lp(i) The set of tasks of lower base priority than the base Ci The worst-case computation time required by task i on i i each release. At run-time we assume that any computa- priority of (these tasks could block ). Li The set of resource requests for task i. tion time from 0 to Ci could be required for a single i ∗ invocation of . Li = {(O, m) | mi → SYNC(O, m, ...)} Ti The lower bound on the time between successive arrivals of i.Ifi is a periodic task then this lower bound will also Assuming that the number of hardware interrupt sources and be the upper bound (i.e. the period is fixed and equal to priority levels are sufficient for scheduling the system using Ti ) the interrupt hardware, we have the following mapping: Di The deadline requirement of task i, measured relative to τi τi is corresponding to externally triggered message Mi a given release of i. Note that we require Di ≤ Ti. from the CRO. Bi The worst-case blocking time task i can experience Ci Is the WCET of an execution entity i, (e.g., task i). due to the operation of the priority ceiling protocol (or Ti The inter-arrival times of external job requests is assumed equivalent concurrency control protocol). Bi is normally as input for the analysis. equal to the longest critical section of lower priority tasks Di The relative deadline is explicitly expressed for accessing semaphores with ceilings higher than (or equal each task i. e.g. as in the job request Mi = to) the priority of i. ASYNC(Oi,mi,Bi, Di, ...). Ii The worst-case interference a task i can experience. Bi Bi is under SRP-DM equal to the longest critical section Interference on i is defined as the time higher priority of lower priority tasks accessing semaphores with ceilings tasks can pre-empt and execute, and hence prevent i from higher than (or equal to) the priority of i. For CRO this executing. implicates the worst case blocking time of any task j ri The worst-case response time for a task i measured from with a base priority lower than job i, that has a resource the time the task is released. For a schedulable task ri ≤ request to any object with a ceiling of Oj higher or equal Di (if there was no deadline requirement for i we would to the priority of i. require that ri ≤ Ti). hp(i) The set of tasks of higher base priority than the base Bi =max{0,Cm|∀(O, m) ∈ Lj , O≥πi} (2) priority of i (these tasks could pre-empt i). ∀j∈lp(j)

92 I i o  i The worst-case interference a task can experience. t =60, j =ASYNC(o , m , 0, 40) 1 Interference on i is defined as the time higher priority 1 1 1 11 m11 tasks can pre-empt and execute, and hence prevent i from m executing. 12    ri o  Ii = Cj (3) 2 Tj t2 =60, j2=ASYNC(o2, m21,0, 60) m21 m12 ∀j∈hp(j) m ={… SYNC(o , m ); ...} Ri The worst-case response time for a task i measured from 21 1 12 the time the task is released. For a schedulable task Ri ≤ Fig. 2. Example with external job requests. Di (if there was no deadline requirement for i we would require that ri ≤ Ti). An over approximation of Ri can be done calculating the the interference over the whole Bi will be subceptible to both interference and blocking, time interval of Di as follows: according to the definitions given.    For SRP-DM scheduling of CRO based systems, no  Di Ri = Ci + Bi + Cj (4) postlude is required, commonplace interrupt hardware suffices. Tj ∀j∈hp(j) 2) SYNC pre- and postludes (spr/spl): A synchronous M = SYNC(O ,m ) M R message j j j send from i amounts to The exact i can be obtained by the recurrence equation, O i a resource request j and, when admitted, the execution of computing the busy period of task as: m O   j , followed by a resource release j . Under SRP, such syn-  m m+1 wi chronous messages are ensured to be immediately admitted. wi = Ci + Bi + Cj (5) Tj Thus, for hardware scheduling of CRO SRP-DM, the prelude ∀j∈hp(j) amounts to admitting the request, by setting the hardware 0 Oj Iteration starts with an initial value wi , typically w0 = system ceiling to , and the postlude to re-setting the m+1 m Oi Ci+Bi, and ends when either wi = wi in which case hardware system ceiling to . m the worst-case response time Ri,isgivenbywi , or when m+1 C. Schedulability test of externally triggered CRO systems wi >Di, in which case the task is unschedulable. hp(i) In order to test the system for schedulability, we require The set of tasks of higher base priority than the base T priority of i (these tasks could pre-empt i). assumptions (user input) on inter-arrival times i for the external job requests Mi. Additionally, we need the (estimated) B. Scheduling primitives WCET Ci and blocking time for each resource requests Li, C C C C In the following we will define the schedeuling primi- together with the WCETs ipr, ipl, spr, and spl. Notice, C C C tives necessary for implementing SRP-DM onto commonplace that i is the total execution time including ipr, ipl, and C C hardware platfroms. potential spr’s and spl’s, and that we for safe analysis SYNC(O, m) 1) Interrupt pre- and postludes (ipr/ipl): An external job assume the blocking time of to be computed as Cspr + Cm + Cspl Mi will be admitted by the interrupt hardware iff πi > Π.On . For this presentation we assume WCETs admission, (interrupt entry), the hardware will push the current given. system ceiling level Π on an internal stack of Πs, and set the Based on the presented mapping from CRO to SRP-DM system ceiling Π=πi. On interrupt exit, the hardware will pop together with the presented scheduling primitives, we can the internal Π stack such to restore the previous system ceiling. apply standard SRP-DM analysis. Thus, commonplace interrupt hardware natively implements 1) Example: With the outset from the Example Figure 2, SRP-DM! we obtain the following setting: For externally triggered CRO systems, under SRP-DM, H(O1)=1,H(O2 =2) relative deadlines are sufficient for both scheduling (during D1 =40,D2 =60 run-time) and the schedulability analysis presented. However, H(π1)=1,H(π1)=2 to pave way for internal job request, we derive the arrival time C1 =2+10+2=24,C2 =2+20+2=24 (baseline Bi) for the message Mi, (i.e., the absolute point in (assuming WCET for interrupt pre- and postludes = 2) time at when the event from the environment arrived). On T1 =60,T2 =60, and commonplace platforms this can be implemented by setting L1 = {{},L2 = {(o1, 1 + 10 + 1 = 12)} up a free-running timer, and capture its current value when (assuming WCET for SYNC pre- and posludes = 1) the event is detected. Some platforms (e.g., ARM Cortex M3) The Figure 3, depicts the mapping of S onto a hardware offers hardware support for timer capture, hence provides platform. The environment (in this case the bare metal hard- exact information (to the level of the timer resolution) on ware) is represented by the object env. Interrupt handlers for event arrivals. If unavailable we have to refrain to a software ext int1 and ext int2 will be executed on behalf of occurring solution, capturing the current value of the free-running timer external events when admitted with respect to the system in the interrupt prelude. However, in this case, the baseline ceiling (implemented in hardware H(Π), marked (a) in Figure.

93 H(Π)(a) env R J H(π)  t1 =60, j1=ASYNC(o1, m11, 0, 40) H(π) source o1 o1 j1 1 (b) 1 ext_int1 m11

m12 o2 j2 2 (a)

o2 0 10 20 30 40 50 60 2 ext_int2 m211 m12 Fig. 5. Response time analysis. R1 = I1 + B1 + C1 = 0 + 12(b)+24= 36 ≤ 40, R2 = 24(a)+0+24=48≤ 60. t2 =60, j2=ASYNC(o2, m21, 0, 60) m ={… SYNC(o , m ); ...} 21 1 12 o  t =60, j =ASYNC(o , m , 0, 40) 1 Fig. 3. Externally triggered CRO system, the system ceiling is implemented 1 1 1 11 m11 in hardware Π, and how external interrupts (ext int1,ext int2) at levels (H(O1 =1,H(O2 =2), are connected to (o1.m11,o2.m21). m ={… ASYNC(o , m ); ...} 11 2 21 m12

R J H(π) o2 it2 =60, ij2=ASYNC(o2, m21, 0, 40) m21 m12 o1 j1 1 m21={… SYNC(o1, m12); ...} o j 2 Fig. 6. Example with internal job requests. it2 is derived from the CRO 2 2 model, the externally triggered j1 performs an ASYNC(o2,m22). Inter- arrival time, base- and deadline for it2 are inherited from j1. 0 10 20 30 40 50 60

Fig. 4. Utilization 24/60 + 24/60 = 0.8 ≤ 1.

inv2j (lj )=∀(rk,bk,lk)inlj i (9) max(0,bk ∈ lj, rk≤ri,inv2ji (lk)) The pre- and postludes of SYNC(o1,m12) (marked green in figure) constitute resource request and release by setting the For the example, this equates to blocking by priority in- inv =max(0, 1 + 10 + 1) j hardware system ceiling accordingly. version (Equation (8)) j1 ( 1 is blocked by the lower prioritized j2, holding the resource o1 D. Schedulability of externally triggered systems inv =0 for (1+10+1) time units, Figure 5 (b), while j2 (since it A system is schedulable if each job meets its deadline at is the lowest prioritized job, it will not be subceptible to (any) all times. Under SRP-DM a sufficient schedulability test is priority inversion). Notice, in Equation (9), the ∀(rk,bk,) ∈ lj to calculate the response times for each job, and if all jobs will look into the transitive chains of locks (SYNCs) and return meet their deadlines the system is scheduleable. A necessary the maximum priority inversion blocking inferred by the job condition is that the total utilization u ≤ 1, Equation (6). jj, (i.e., having rk≤ri). Finally we derive the response times , respj =(invj = |J| 1 1 ci 1+10+1)+((cj = 2 + 20 + 2)) = 36 ≤ 40, respj = ≤ 1 (6) 1 2 t (blockj =2+20+2)+(cj = 2 + 20 + 2) = 48 ≤ 60,as i=1 i 2 2 shown in Figure 5. Since the deadlines were met we conclude (24 + 1) Utilization: For he given setting this equates to the system to be scheduleable (Equation (7)). 24)/60 = 0.8 ≤ 1, thus the necessary condition is fulfilled, see Figure 4. In general the utilization is cheap to compute, E. Systems with internal job requests hence if failing the system is decisively unschedulable, and As a prerequisite to schedulability analysis we need to de- we can avoid further computations for analysis. rive the worst case inter-arrival times of internal job requests. 2) Response Time: We now turn to the sufficient condition, 1) Period extraction: To this end, we develop an extraction system is schedulable if Equation (7) holds; mechanism, such to derive internal job request periods (min- imum inter-arrival times) from the model at compile time. Ri ≤ Di (7) By traversing the job request call trees, and detect cycles ASYNC For the given setting this equates to a blocking times containing at least one , periodic processes can be derived. For example, Job A spawns job B with offset 50ms (Equation (5)) I1 =0(no blocking from higher prioritized 60 (ASYNC(B, 50ms, dl) ) job B spawns job A after 20ms jobs), and B1 =24∗ ( 60  =24). Figure 5 (a). (ASYNC(A, 20ms, dl)). Finding the cycle in this case is trivial  and the period is 50ms + 20ms for both jobs A and B. More inv = ∀j ∈ J, πj >πj, max(inv2 (l )) ji j j i ji j (8) complex cases such that, for the example job B spawn job A

94 a b c d e f (b )  H(Π) env 1 enqueuee ot1 H(π) source J H(π) (b )  dequeued timer_intt t 2 2 j1 1

(a)  o1 1 ext_int1 m 11 tj1 2 (b( 3)  3 int_int1 m12 ij 3 o2 2 m211 m 12 0 10 20 30 40 50 60 70

(a) t1 =60, j1=ASYNC(o1, m11, 0, 40) t1 =60, j1=ASYNC(o1, m11, 0, 40) m11={… M1=ASYNC(o2, m21); ...} m ={… M =ASYNC(o , m , 40, 60); ...} 11 1 2 21

(b) it2 =60, ij2=ASYNC(o2, m21,0, 40) Fig. 9. System with specific timing information added to m11. m21={… SYNC(o1, m12); ...}

Fig. 7. Timer queue object and hardware interaction. The postludes of The mapping from CRO to S for systems having internal engueue/dequeue is depicted in purple (dark grey). job requests (ASYNCs) extents the S by a set of Rt resources, one for each timer queue, together with a set of timer jobs Jt a b c d e f and the set of internal job requests IJ and their resources IR.

R J H(π) Rt = {rt1, .., rtn} (10) to1 o1 j1 1 The number of available hardware resources (number of physical timers), and the number of priority levels we are willing to spend for scheduling might vary in between plat- tj1 2 forms, and also for a given platform, with the application at hand (as timers may well be assigned to other tasks, such as controlling PWM outputs etc). Hence, the presented approach is fully parametrized (1 < |Jt| < |πint|), where πint is the set o2 ij2 3 of unique priority levels for internal job requests. 0 10 20 30 40 50 60 Each timer queue resource rtj , is implemented as CRO object oti, with methods to enqueue and dequeue internal Fig. 8. System ceiling Π for example execution. (Π is depicted as black job requests. Figure 7 depicts a system with a single timer lines.) Changes made under hardware control is marked in yellow (grey). Dotted lines indicate interrupt-chaining. Changes to Π by software (pre- and resource, showing how the timer interrupt source (interrupt postlude for resource requests) are showing in green (dark grey). handler) is referred to ot1.dequeue. For each queue object we can (if so desired) deploy specialized implementations (optimized for the derived queue length). Since it is treated after 20ms or 50ms is will not be automatically found and will as an ordinary resource by the synthesis and analysis, we can need to be manually entered as input to the tool. This is also rely on WCET characterization given derived queue lengths. true for all other ”non trivial” cases, for example an external In the following we outline the primitives needed for job that spawns a periodic job. Notice, this might in general hardware scheduling of internal job requests. To this end, bli be a source for an unscheduleable (erronuous) system, and and dli denotes the absolute points in time for the permissible the tool will thus report it as a potential hazard, that might be execution window of job ji (for the sake of the presentation overridden by user input (in case for example of the externally we do not go into details with limited time resolution and triggered reset job). range). # denotes the current time. All jobs that are spawned from a job that are found to 2) enqueue postlude: The pre-defined postlude for the be periodic inherits the caller’s period. (This also applies for enqueue method, will for a job request ji raise the interrupt external jobs with a manually entered minimum inter-arrival at level πi immediately if bli < #. In other case, the physical time (period).) timer appointed will be setup to release the job request (raising ij π bl For the given example 6] the internal job 2 inherits the the interrupt at level tji ) at time m, if that is earlier than minimum inter-arrival time (60 ms) from the external job j1. the currently set release for the timer job bltij .

95 3) dequeue postlude: The pre-defined postlude for the R J H(π) dequeue method, will raise the interrupts at level πm for i to o e each mi,bli = blcurrent, i.e. all arrived messages, and move 1 1 j1 1 each message to the end of the corresponding readyqueue. Notice, for the implementation we actually let messages that (a) (b) arrives during the dequeue also to be released. This does not tj1 2 d improve on worst case behavior for schedeulability analysis, but reduces the actual load on the system, in order to reduce (a) (b) power consumption, and improve on average response times. The postlude finally sets up the timer hardware to release the o2 ij2 3 next job request in queue (if any). 4) Interrupt handlers for internal jobs: Each distinct pre- 0 10 20 30 40 50 60 emption level has a corresponding interrupt handler, and a Fig. 10. Utilization ((2 + 20 + 2) + (2 + 4 + 2) + (2 + 20 + 2))/60 = corresponding readyqueue for released but not yet executed 0.9333... ≤ 1 messages. While the queue is non-empty we pop the first element in the queue and execute the corresponding message. Notice that, the accesses to the readyqueue must be executed the timerint is raised. The enqueue method returns, and the at the priority of the corresponding timer job, hence the system ceiling is reset (in this case no change is made since interrupt handler will request the timer job, perform dequeuing we were already operating at level 1. The method m11, runs and release the timer job resource. to completion and exits the interrupt handler for ext int1, Note: A interrupt handler may be shared between internal and π = max(π). In general the system is now idle, awaiting and external events if necessary. (In this case a simple check stimulus, and on the arrival of message M1, the timer int is performed at entry to see if the source is an external or will be raised. However in our case, the timer interrupt has internal event. already been raised. When the timerint is admitted, job tj1 5) Impact of shared timer queues: Since we treat timer is executed (π =2), Figure 7 (b2) and Figure 8 (c) showing queues as ordinary resources in the system, the bounded the chained interrupt. tj1 performs a request to1.dequeue, priority inversion under SRP holds. The implication is that the system ceiling is stored by the resource request prelude j pi ≤ pi ≤ tj o π =1 ot .dequeue jobs i with priorities roj ji j will be susceptible (green), set to that of the t1 ( ), and 1 to priority inversion by all jobs that access roj , i.e., performs is executed. The postlude of ot1.dequeue raises int int3 ASYNCs deferred to roj , and that the timer job tjj may block 8 (d) and exits. The resource request postlude will restore any job ji ≤ tjj . Thus, increased sharing of timer resources π =2(internal operations on the system ceiling is shown in increases the priority inversion in the system. Moreover, in- green (dark grey), Figure 8. The pending int int3 will when creased sharing also leads to larger queues, which in turns lead admitted be served, Figure 7(b3), and Figure 8 (e) (showing to larger WCETs for enqueue/queue operations. the chaining), and M1 is executed (ij2.m12). The resource Following the example (Figure 6), we find the externally request to o1.m12 is shown in Figure 8. Upon exit the system triggered job request j1, from which we can derive the internal ceiling is restored π = max(π), Figure 8 (f), and the system job request ij2, with the inherited inter-arrival time ij2 =6 is now idle. and dij =60. To this end, we will 1) create a timer resource Consider the system depicted in Figure 9. In contrast to rt1, with resource ceiling rt1 = o1, and 2) create timer job above, m11 now gives specific timing information for ij2, tj bl +40 1, with the a higher priority (lower number) than the highest setting the arrival time to j1 , and a relative deadline πt =2 de = bl + 40 + 60 = 100 priority of any job request of the timer, in this case j1 . of 60 ( ij2 j1 . Here the system will The timer job will request the timer resource rt1 for dequeuing become idle in between the execution of j1 and ij2. l = the timer queue, hence will have a corresponding lock tj1 7) Utilization: The system at hand can now be analyzed {(rt ,bt ), (rt ,bt ) 1 1enqueue 1 1dequeue . Blocking times for queue for schedulability using the same method as described for operations are obtained as discussed above, for the example externally triggered systems. For the utilization, inter-arrival we assume bt1=4. Figure 10 depicts the ASYNC operation of time of tjn is derived from the GCD of the inter-arrival times j1 (a) and the timer job tj1 (b). of its job requests (in this case 60). Figure 10, shows the 6) Examples: Consider the system depicted in Figures 6, 7 inter-arrival times for each job, and we compute the utilization and 8. Assume π = max(π) (system is idle), an event occurs according to Equation (6), 0.9333... ≤ 1. Notice that this causing a ext int1 to be raised, the interrupt is served (π =1, calculation takes all overhead of schdeduling into account. and m11 executes), Figure 8 (a). The ASYNC is executed as 8) Response Time: Also for the response time analysis a SYNC(ot1, enqueue, ...) (requesting the timer queue). The we can undertake the same method as described for ex- enqueue method enqueues the job request (message M1), and ternally triggered systems. This amounts to calculating the sets up the hardware timer accordingly (if earliest arrival time), response time for j1, it will be subceptible to priority inversion t Figure 8 (b). In this case, there will be just one message, so the form the worst case blocking time of the timer job j1 M max(bt ,bt queue is trivial, and since the arrival time of 1 has passed, ( 1enqueue 1dequeue ), as shown in Figure 11 (a,b). The

96 R J H(π) [7] S. K. Baruah, “Resource sharing in edf-scheduled systems: A closer look,” in Real-Time Systems Symposium, 2006. RTSS ’06. 27th IEEE to1 o1 j1 1 International, Dec. 2006, pp. 379–387. [8] P. G. Jansen, S. J. Mullender, P. J. Havinga, and H. Scholten, “Lightweight edf scheduling with deadline inheritance,” 2003. [Online]. (a) (b) Available: http://doc.utwente.nl/41399/ [9] P. Gai, G. Lipari, and M. Di Natale, “Minimizing memory utilization of tj1 2 real-time task sets in single and multi-processor systems-on-a-chip,” in Real-Time Systems Symposium, 2001. (RTSS 2001). Proceedings. 22nd IEEE, Dec. 2001, pp. 73 – 83. (a) (b) [10] P. Gai, M. Di Natale, G. Lipari, A. Ferrari, C. Gabellini, and P. Marceca, “A comparison of mpcp and msrp when sharing resources in the janus o2 ij2 3 multiple-processor on a chip platform,” in Real-Time and Embedded Technology and Applications Symposium, 2003. Proceedings. The 9th 0 10 20 30 40 50 60 IEEE, May 2003, pp. 189 – 198. [11] S. Aittamaa, “Programming Embedded Real-Time Systems: Implemen- = 4+(1+10+1)+(2+20+2) = tation Techniques for Concurrent Reactive Objects,” Licentiate Thesis, Fig. 11. Response time analysis, respj1 40 ≤ 40 = (1+10+1)+(2+20+2)+(2+4+4+2) = 48 ≤ 60 Lulea˚ University of Technology, 2011. , resptj1 , =(2+20+2)+(2+4+4+2)+(2+20+2)=60≤ 60 respij2 .For (a) timing window of ij2 is inherited from j1 (yielding an unscheduleable system), while for (b) the timing window for ij2 is explicitly set (with a baseline offset of 40 and the same deadline of 60), yielding a schedeulable system. timer jobs perform enqueue and dequeue operations, under the (artificial) deadline of their inter-arrival time.

VI. CONCLUSIONS AND FUTURE WORK In this paper we have presented a method for SRP-DM schedulability analysis applicable to CRO models, and we have shown how it exploits detailed knowledge of the kernel design in order to ensure safe estimations. The implications of kernel design and hardware support for scheduling have been discussed in detail. In addition, we have developed a method to derive the inter-arrival times of internal job requests, and we have described a method to derive the units of execution for WCET characterization. Given WCETs, and assumptions on inter-arrival times of external jobs, the developed analysis is able to derive safe schedulability estimates. Future work includes demonstrating the developed analysis on real-world applications, and refining analysis to account for offsets of internal job arrivals. The latter would open up for more precise offset based schedulability analysis, thus provide tighter results.

REFERENCES [1] M. Carlsson, J. Nordlander, and D. Kieburtz, “The semantic layers of Timber,” in First Asian Symp. on Programming Languages and Systems (APLAS), ser. Lecture Notes in Computer Science, vol. 2895. Berlin, Germany: Springer-Verlag, 2003, pp. 339–356. [2] A. P. Black, M. Carlsson, M. P. Jones, R. Kieburtz, and J. Nordlander, “Timber: A programming language for real-time embedded systems,” Tech. Rep., 2002. [3] The Timber Language. (webpage) Last accessed 2011-04-15. [Online]. Available: http://www.timber-lang.org [4] J. Eriksson, “Embedded real-time software using TinyTimber : reactive objects in C,” Licentiate Thesis, Lulea˚ University of Technology, 2007. [Online]. Available: http://epubl.ltu.se/1402-1757/ 2007/72/LTU-LIC-0772-SE.pdf [5] P. Lindgren, J. Nordlander, L. Svensson, and J. Eriksson, “Time for Timber,” Lulea˚ University of Technology, Tech. Rep., 2005. [Online]. Available: http://pure.ltu.se/ws/fbspretrieve/299960 [6] T. Baker, “A stack-based resource allocation policy for realtime pro- cesses,” in Real-Time Systems Symposium, 1990. Proceedings., 11th, Dec. 1990, pp. 191 –200.

97 98 Paper E Refined SRP Stack Memory Analysis by Exploiting Critical Sections for Shared Resources

Authors: Johan Eriksson, Simon Aittamaa, Per Lindgren

Originally published in: ERTS2 2014 Embedded Real Time Software and Systems

c LTU

99 100 101 102 103 104 105 106 107 108 Paper F Real-Time For the Masses, Step 1: Programming API and Static Priority SRP Kernel Primitives

Authors: Johan Eriksson, Fredrik H¨aggstr¨om, Simon Aittamaa, Andrey Kruglyak, Per Lindgren

Originally published in: 8th International Symposium on Industrial Embedded Systems (SIES)

c 2013,IEEE

109 110 Real-Time For the Masses, Step 1: Programming API and Static Priority SRP Kernel Primitives

Johan Eriksson Fredrik Haggstr¨ om¨ Simon Aittamaa Andrey Kruglyak Per Lindgren EISLAB, Lulea˚ University of Technology, 97187 Lulea˚ Email: {Johan.Eriksson, Fredrik.Haggstrom, Simon.Aittamaa, Andrey.Kruglyak, Per.Lindgren}@ltu.se

Abstract—Lightweight Real-Time Operating Systems have Establishing real-time and resource guarantees typically re- gained widespread use in implementing embedded software on quires expert knowledge in the field, as no turn-key solutions lightweight nodes. However, bare metal solutions are chosen, e.g., are available to the masses. when the reactive (interrupt-driven) paradigm better matches the programmer’s intent, when the OS features are not needed, or In this paper we set out to bridge the gap between bare when the OS overhead is deemed too large. Moreover, other metal solutions and traditional Real-Time OS paradigms. Our approaches are used when real-time guarantees are required. goal is to meet the intuition of the programmer and to provide Establishing real-time and resource guarantees typically requires a resource-efficient (w.r.t. CPU and memory) implementation expert knowledge in the field, as no turn-key solutions are with established properties, such as bounded memory usage available to the masses. In this paper we set out to bridge the gap between bare and guaranteed response times. We outline a roadmap for Real- metal solutions and traditional Real-Time OS paradigms. Our Time For the Masses (RTFM) and report on the first step: an goal is to meet the intuition of the programmer and at the intuitive programming API backed by an efficient scheduler same time provide a resource-efficient (w.r.t. CPU and memory) suitable for resource and timing analysis. implementation with established properties, such as bounded Our approach is based on the reactive programming memory usage and guaranteed response times. We outline a roadmap for Real-Time For the Masses (RTFM) and report on paradigm, where run-to-completion jobs are triggered either the first step: an intuitive, platform-independent programming from the system’s environment or programmatically inside the API backed by an efficient Stack Resource Policy-based scheduler system. Since such systems are inherently concurrent, critical and a tool for kernel configuration and basic resource and timing sections are typically introduced to avoid race conditions on analysis. shared variables or other system resources. In the context of real-time applications, reactions to events are typically I. INTRODUCTION associated with priorities or timing constraints expressed in Resource constrained platforms such as the ARM-7 and terms of deadlines. ARM Cortex Mx/Rx architectures have gained widespread use By associating events to interrupts and implementing the for cost, size and power efficient implementations of embed- corresponding reaction (job) directly in the respective interrupt ded real-time applications. The Artemis Research Agenda pre- handler (ISR), reactive (event/interrupt-driven) systems can dicts the number of embedded processors to be over 40 billion be straightforwardly implemented and efficiently scheduled by 2020 [1]; their functionality is to a large extent dependent directly by the hardware. However, in order to support shared on embedded software. Unlike programs for general-purpose resources between jobs we need to adopt a resource manage- processors, embedded software typically expresses interaction ment protocol. We choose the Stack Resource Policy (SRP) with internal and external peripherals and typically operates [5], for which we can construct an efficient implementation under resource and timing constraints. that exploit commonplace interrupt hardware. Moreover, SRP Lightweight Real-Time Operating Systems have gained brings additional benefits of deadlock-free execution on a widespread use in implementing embedded software on such single stack, bounded priority inversion, etc. SRP has been platforms. A prominent example thereof is FreeRTOS [2], extensively studied over the last decades and a rich set of whose success is (arguably) due to its familiar API, documen- methods for system analysis have been developed. tation, wide platform support, mature/stable code base, open In this paper, the programming model is restricted to SRP source licence etc. However, with single-unit resources, static priorities and sporadic tasks (deadline < inter-arrival time). These restrictions allow us to 1) bare metal solutions are chosen, e.g., develop a set of kernel primitives that utilise the underlying • when the reactive (interrupt-driven) paradigm better interrupt hardware for static priority preemptive scheduling matches the programmer’s intent, under SRP. In particular, we show that the job request and • when the OS features are not needed, or admission mechanisms for SRP can be handled with zero • whenever the OS overhead is deemed too large; overhead on common micro-controllers that support nested, 2) other approaches are used when real-time guarantees are priority-based interrupt handling. required, e.g., [3] and [4]. Furthermore, we report on the development of KCC (Ker- nel Configuration Compiler), a tool to automatically derive Co-funded by EU-ERDF, 978-1-4799-0658-1/13/$31.00 2013 IEEE resource ceilings and to synthesise a target- and application-

111 specific kernel configuration from an XML system model primitives with a simple C code API and basic tools. The sec- (derived from a C program). Additionally, the tool performs ond step further facilitates program development and increases basic stack depth analysis and, in order to determine schedula- the applicability of RTFM. Support for virtual timers and job bility, response time analysis given inter-arrival and worst-case requests with arguments forms the necessary infrastructure for execution times for jobs and critical sections. mapping more elaborate programming models to the RTFM We characterise the kernel primitives for the market- kernel, offering further abstractions and implementing OS-like dominating ARM Cortex M0/M3 architectures and find that services. job and resource requests introduce only a few bytes of Other lightweight approaches include EDFI [9]. The EDFI memory overhead, and the CPU overhead is in the submi- approach is similar to ours in that it adopts the notions of SRP. crosecond range even for instep M0-MCUs like NXP/LPC11. However, it relies on (dynamic) EDF-based scheduling, which In comparison, overhead is less than a tenth of the eventing precludes exploiting static hardware priorities for efficient mechanism in FreeRTOS; in fact, the proposed scheduler is scheduling. Another approach, combining non-preempting hard to beat even by means of manual coding. background tasks with interrupts, is used in TinyOS [8], which allows for a bounded stack depth. However, TinyOS provides II. REAL-TIME FOR THE MASSES,ROADMAP no native real-time support, and hence the performance of a system relies entirely on the programmer’s ability to split With the ambition to facilitate real-time programming for tasks into sufficiently short sections. In [10], a method to the masses outside the relatively small and domain-specific preemptively execute native TinyOS tasks is suggested. This communities, we propose the following steps: method could be deployed using the proposed RTFM SRP Step 1 Basic Kernel Primitives kernel and is a subject of future work. Yet another mechanism • Choosing a suitable task model for RTFM. We choose for single-stack execution is proto-threads, used in e.g. Contiki the task model of SRP with jobs and resources. [11]. However, there is no notion of timing or priorities for • Specifying an intuitive programming API to RTFM ker- proto-threads, hence no native real-time support. nels allowing both direct application development and the use of RTFM as a micro-kernel for other OS/RTOS. To this end, we propose a platform-independent C code API, III. TASK MODEL compatible with gcc-based tool chains. We adopt the task model used by Baker [5]: • An efficient implementation of kernel primitives for com- monplace hardware. Here we present kernel primitives for J A job J is a finite sequence of instructions to be executed static priority SRP scheduling for ARM Cortex M0/M3. on a single processor. • A tool for target- and application-specific kernel configu- JJdenotes both a job execution request and its execution. ration with support for a basic resource and schedulability p(J ) Defines the (base) priority of J . p(J ) >p(J ) indicates analysis. Here we present the KCC tool, supporting that expediting J  is sufficiently important that comple- response time and basic stack depth analysis, producing tion of J is permitted to be delayed. kernel configurations for ARM Cortex M0/M3. π(J) Defines the preemption level of a job, defined so that a Step 2 Infrastructure job J  may preempt J only if π(J ) >π(J). • Automatic generation of XML system models from C R A nonpreemptable resource R can be claimed by a job programs. for the execution of a critical section. • Implementing virtual interrupt sources necessary to sup- We restrict the SRP model from [5] to single-unit resources. port platforms with insufficient hardware capabilities. Following the abstract resource ceiling definition [5], we • Implementing support for job requests with arguments define: (messages) and virtual timers (for postponed messages). R R • Extending basic system analysis to exploit additional The (static) current ceiling of resource , information, such as task offsets. Step 3 Other Programming Models and True Parallelism R = max({0}∪{π(J)|J ∈ L(R)}) • Mapping of more elaborate programming models to RTFM. These include TinyTimber [6], the REKO frame- where L(R) is the set of jobs that (may) request R. work [7], and TinyOS [8]. Π The (dynamic) current system ceiling, • Investigating other scheduling approaches for RTFM, including dynamic priority SRP, hierarchical SRP, and Π=max({0}∪{π(J)}∪{R|R ∈ Rclaimed}) M-SRP for multicore systems. • Extending basic program analysis and kernel configura- where J is the currently executing job (if any), and tion to support the additional programming models and Rclaimed the set of currently claimed (outstanding) re- scheduling approaches. sources. The presented approach is influenced and motivated by pre- Under SRP a job execution request for J is blocked until vious work on Concurrent Reactive Objects and their imple- p(J ) has the highest priority of all outstanding jobs execution mentation in TinyTimber [6]. The first step covers scheduling requests and Π <π(J).

112 IV. PROGRAMMER’S API V. S TATIC PRIORITY SRP SCHEDULING We provide a basic C code API for RTFM, an example is We assign static preemption levels π(J)=p(J), hence shown in Listing 2. The task model is captured by: we may use π and p interchangeably in the following. We J A job J corresponds to a C code function (without assume that the hardware supports prioritised interrupt nesting, arguments and return values), e.g., void j1 Code() { ... }. which (as described below) allow us to view the interrupt For analysis we require functions to have bound execu- hardware as a preemptive static priority scheduler for our tion time and stack requirement. (Notice, this does not system. For the discussion, we exemplify the kernel primitives preclude the use of recursive functions as long as the (in C/assembler) by notions of the ARM Cortex Mx architec- stack depth requirement can be determined.) tures, (the kernel primitives can be devised similarly for other J Execution requests originate either from the environ- architectures). ment (see below) or posted programmatically within We define a mapping H from priorities p to interrupt the system, e.g., JOB REQUEST(j1). The priority for all priorities H(p), where a higher p gives a higher priority on job execution request to J is given by a define, e.g., the underlying hardware (typically 0 is the highest hardware #define j1 Prio 1, Listing 1. priority). We make the assumption that the number of priorities R Critical sections for the (nested) single-unit resources and ISR vectors (interrupt sources) are sufficient. can achieved by lock/unlock, as shown in j1 Code,or { } Listing 3. Kernel Primitives for ARM Cortex Mx alternatively by CLAIM(r1, ... ), as shown in j2 Code. #define IRQn(J) (J## IRQn )

The latter enforces the LIFO nesting of resource requests #include ”config .h”

as required by SRP. #define Ceiling (R) R## Ceiling #define Code ( J ) J## Code Additionally, to enable hardware based scheduling, each task #define BARRIER LOCK ( ) { asm volatile (”dsb\n” ”isb\n” : : : ”memory” ) ; } #define BARRIER UNLOCK ( ) { asm volatile ( ”” : : : ”memory” ) ; } (job), is bound to a corresponding ISR. The API provides: #define JOB REQUEST ( J ) { NVIC−>ISPR [0] = (1 << IRQn ( J ) ) ; }

J The mapping from an interrupt source to a job request #ifdef SOURCE MASKING int LockMask [5] = { is given by a define, e.g., #define j1 IRQn EINT1 IRQn. ˜(JP1 | JP2 | JP3 | JP4 ) , /∗ prio 0, idle ∗/ ˜(JP2 | JP3 | JP4 ) , /∗ prio 1 ∗/ The macro BIND SOURCE TO JOB REQUEST(J, ISR), ˜(JP3 | JP4 ) , /∗ prio 2 ∗/ ˜(JP4) , /∗ prio 3 ∗/ installs the function (job) as an ISR according to the given ˜(0) /∗ prio 4, highest ∗/ } mapping. ; #define LOCK(R) {\ Reset Program code run at startup/reset, enabling interrupts, ini- int old e n = NVIC−>ICER [ 0 ] ; \ NVIC−>ICER[0] = LockMask [ Ceiling (R)]; \ tializing interrupt priorities etc. For purpose of analysis, BARRIER LOCK ( ) ; Reset can be seen as a (separate) one-shot job. #define UNLOCK(R) \ BARRIER UNLOCK ( ) ; \ And finally for SRP based scheduling, the API provides: NVIC−>ISER [0] = old en ; \ } R The (current) resource ceiling is given by a define, #endif #define r1 Ceiling 2. #ifdef GLOBAL PRIORITY #define LOCK(R) {\ Listing 2 gives an example for the LPC11x (ARM Cortex int sc old = get BASEPRI ( ) ; \ set BASEPRI MAX(H(Ceiling (R))<<8− NVIC PRIO BITS ) ; \ M0), defining two jobs (j1 and j2) and their priorities (1, 2), 2 BARRIER LOCK ( ) ; #define UNLOCK(R) \ being higher. Since both jobs request r1, the resource ceiling BARRIER UNLOCK ( ) ; \ set BASEPRI ( sc old ); \ is 2. } #endif

Listing 1. Example configuration on LPC11x (ARM Cortex M0): config .h #define CLAIM ( R , CODE) { LOCK ( R ) ; {CODE}; UNLOCK ( R ) ; } #define SOURCE MASKING #define H(x) (3−x) #define BIND SOURCE TO JOB REQUEST ( J , ISR ) void ISR ( void ) { J## Code ( ) ; } #define j1 IRQn EINT1 IRQn #define ENABLE ( J ) { NVIC−>ISER [0] = (1 << (IRQn(J ))); } #define j2 IRQn EINT2 IRQn static INLINE void SetPriority (IRQn Type n, uint32 tp){ #define j1 Prio 1 NVIC−>IPR [ IP IDX ( n ) ] = ( NVIC−>IPR [ IP IDX ( n ) ] & ˜(0 xFF << BIT SHIFT (n ) ) ) | #define j2 Prio 2 (((p<< (8 − NVIC PRIO BITS ) ) & 0xFF ) << BIT SHIFT (n ) ) ; } #define r1 Ceiling 2 #define SETPRIO ( J ) { SetPriority (IRQn(J), H(J## Prio )); } #define JP1 (1 << IRQn ( j1 ) ) /∗ set of jobs at priority 1 ∗/ #include ”program . c” #define JP2 (1 << IRQn ( j2 ) ) /∗ set of jobs at priority 2 ∗/ #define JP3 (0) /∗ set of jobs at priority 3 ∗/ int main ( void ) { #define JP4 (0) /∗ set of jobs at priority 4 ∗/ #ifdef GLOBAL PRIORITY set BASEPRI (H(0)<<(8− NVIC PRIO BITS ) ) ; /∗ set to idle priority ∗/ #endif Listing 2. User program: program.c user reset (); /∗ user setup ∗/ { while (1) ; /∗ busy wait , can be modified for sleep mode ∗/ void j1 Code ( void ) } LOCK ( r 1 ) ; /∗ Critical section j1 , r1 ∗/ UNLOCK ( r 1 ) ; } A. Job request void j2 Code ( void ) { JOB REQUEST ( j 1 ) ; /∗ job execution request for j1 ∗/ A job execution request J amounts to a pending interrupt CLAIM ( r1 , { /∗ Critical section j2 , r1 ∗/ on source IRQn(J) associated with job J through the interrupt }); } vector table. If p(J) > Π (Π being the system ceiling) and J BIND SOURCE TO JOB REQUEST ( j1 , EINT1 IRQHandler ); has the highest priority of all pending sources, the IRQn(J) BIND SOURCE TO JOB REQUEST ( j2 , EINT2 IRQHandler ); interrupt is taken, and the interrupt hardware performs the void user reset (void ) { /∗ optional user startup code here , run before system started ∗/ sequence Πold =Π; Π=p(J); J executes; and when SETPRIO ( j1 ) ; SETPRIO ( j2 ) ; /∗ Set HW priorities∗/ ENABLE ( j 1 ) ; ENABLE ( j 2 ) ; /∗ Enable HW sources ∗/ completed Π=Πold, i.e., job admittance according to static } priority SRP is natively supported by the interrupt hardware.

113 Interference (due to preemption) for a job Jj is defined as VII. EVALUATION AND FUTURE WORK sum({E(J )|p(J ) >p(J )}) E(J) i i j [12], where is the execu- The kernel primitives have been implemented for the NXP J tion time for , under the assumption that we select the oldest LPC11c24 (M0) / LPC1769 (M3) platforms and tested with highest priority job first. However, typical interrupt controllers both gcc v4.6 and v4.7. The implementation is expected to choose the ISR on basis of the vector position, rather than work for all other M0/M3 MCU’s, hence a large portion of the time of arrival (for reasons of hardware implementation the embedded market is already covered. Table I shows a complexity). This has the implication that the interference is preliminary evaluation of CPU usage (in cycles) and memory sum({E(J )|p(J ) ≥ p(J ),J = J }) in the worst case i i j i j . overhead (in bytes). Devices operate at maximum speed of 48 B. Resource request/release and 120 MHz, respectively. The test programs were compiled using gcc with the option -Os. Job Latency/Job OH denotes Under SRP, when claiming a resource R, the system ceiling the cost of invoking a higher priority job from a lower, Lock Π should be set max(R, Π) for the duration of the critical OH/Unlock OH reports the overhead for resource manage- section. We present two efficient approaches to manipulate the ment, while Critical Section ID reports the number of cycles system ceiling. with disabled interrupts used for resource management. The 1) Source Masking, All Cortex Mx: — An interrupt IRQn(J) footprint excludes the CMSIS SystemInit (368 bytes). RTFM may be taken only if the corresponding source is enabled. overhead is constant and hard to beat even with manual coding Hence, we can emulate the effect of setting Π=x by disabling and amounts only to the necessary and sufficient protection sources corresponding to jobs J with p(J) ≤ x. Listing 3: mechanisms presented. SOURCE MASKING gives an example implementation for the M0/M1 architecture (having 4 priority levels). REFERENCES To ensure that interrupt masking has taken effect before [1] “ARTEMIS Strategic Research Agenda 2011,” ISBN/EAN 978-90- entering the critical section, reordering directives [13] and 817213-1-8f, April 2011. memory and instruction barriers may be required [14]. A safe [2] R. Barry, FreeRTOS reference manual: API functions and configuration options. Real Time Engineers Limited, 2009. approach is to enforce barriers after (potentially) raising, and [3] K. Hanninen,¨ J. Maki-Turja,¨ M. Bohlin, J. Carlson, and M. Nolin, before restoring, the priority. “Analysing stack usage in preemptive shared stack systems,” Malardalen¨ This approach is highly portable to architectures supporting University, Technical Report ISSN 1404-3041 ISRN MDH-MRTC- 202/2006-1-SE, July 2006. [Online]. Available: http://www.mrtc.mdh. centralized interrupt mask, (the LockMask needs to be adapted se/index.php?choice=publications&id=1136 to the number of levels of the target architecture). [4] J. Maki-Turja and M. Nolin, “Fast and tight response-times for 2) Global Priority, Cortex M3 and above: For architectures tasks with offsets,” in Proceedings of the 17th Euromicro Conference on Real-Time Systems, ser. ECRTS ’05. Washington, DC, USA: that directly support changing the base priority (global pri- IEEE Computer Society, 2005, pp. 127–136. [Online]. Available: ority level) efficient implementation is straightforward. Only http://dx.doi.org/10.1109/ECRTS.2005.15 sources with higher priority than the current base priority can [5] T. Baker, “A stack-based resource allocation policy for realtime pro- cesses,” in Real-Time Systems Symposium, 1990. Proceedings., 11th, be admitted. In effect, the system ceiling for SRP scheduling Dec. 1990, pp. 191 –200. is implemented directly by the hardware). [6] J. Eriksson, “Embedded real-time software using TinyTimber : reactive objects in C,” Licentiate Thesis, Lulea˚ University of VI. RTFM UTILITIES Technology, 2007. [Online]. Available: http://epubl.ltu.se/1402-1757/ 2007/72/LTU-LIC-0772-SE.pdf The presented RTFM API together with the static priority [7] J. Wiklander, “A reactive approach to component-based design of SRP kernel primitives is sufficient to implement reactive resource-constrained embedded systems,” PhD Thesis, Lulea˚ University software onto lightweight platforms. However, using the API of Technology, 2011. [8] P. Levis, S. Madden, J. Polastre, R. Szewczyk, A. Woo, D. Gay, J. Hill, requires the programmer to manually derive resource ceilings M. Welsh, E. Brewer, and D. Culler, “TinyOS: An operating system for and to establish resource usage and real-time properties. To sensor networks,” in in Ambient Intelligence. Springer Verlag, 2004. this end, we have developed a Kernel Configuration Compiler [9] P. G. Jansen, S. J. Mullender, P. J. Havinga, and H. Scholten, “Lightweight edf scheduling with deadline inheritance,” 2003. [Online]. (KCC) that, given an XML defining the target architecture Available: http://doc.utwente.nl/41399/ and chosen scheduler, the jobs, their resource requests, and [10] P. Lindgren, H. Makitaavola,¨ J. Eriksson, and J. Eliasson, “Leveraging environment bindings, produces a target-/application-specific TinyOS for integration in Process Automation and Control Systems,” in IECON 2012 : 38th Annual Conference of the IEEE Industrial kernel configuration. Furthermore, given WCET for jobs and Electronics Society, 2012. critical sections, deadlines and inter-arrival times, the tool [11] A. Dunkels, B. Grnvall, and T. Voigt, “Contiki - a lightweight and performs a schedulabilty test through basic SRP response time flexible operating system for tiny networked sensors,” in Proceedings of the First IEEE Workshop on Embedded Networked Sensors analysis and a basic, yet safe, stack memory analysis. The (Emnets-I), Tampa, Florida, USA, Nov. 2004. [Online]. Available: KCC will be described in a forthcoming paper. http://dunkels.com/adam/dunkels04contiki.pdf [12] N. Audsley, A. Burns, M. Richardson, K. Tindell, and A. J. Wellings, TABLE I “Applying new scheduling theory to static priority pre-emptive schedul- EVALUATION,SM(SOURCE MASKING), GP (GLOBAL PRIORITY) ing,” Software Engineering Journal, vol. 8, pp. 284–292, 1993. [13] GCC,Volatiles. (webpage). [Online]. Available: http://gcc.gnu.org/ FreeRTOS M3 SM M0 GP ≥ M3 onlinedocs/gcc/Volatiles.html Job Latency/Job OH 650/1522 (best case) 20/40 (incl. barriers) 10/20 (incl. barriers) Lock OH/Unlock OH 260/170 (best case) 20/20 (incl. barriers) 10/10 (incl. barriesr) [14] “ARM CortexTM-M Programming Guide to Memory Barrier Instruc- Critical Section ID 40 (constant) - - tions,” ARM, infocenter DAI0321, 2012. Memory Job 468 (per task) 4 per nested resource request 4 per nested resource request Static Mem 76 (per queue) 4+4 per prio 0 Footprint Program 8184 732 720 Native Analysis None Basic Basic

114 Paper G Uniform scheduling of internal and external events under SRP-EDF

Authors: Simon Aittamaa, Johan Eriksson, Per Lindgren

Originally published in: International Conference on Real-Time & Embedded Systems (RTES 2010)

c LTU

115 116 Uniform scheduling of internal and external events under SRP-EDF

Simon Aittamaa Johan Eriksson Per Lindgren Lulea University of Technology Lulea University of Technology Lulea University of Technology 97187 Lulea Sweden 97187 Lulea Sweden 97187 Lulea Sweden [email protected] [email protected] [email protected]

ABSTRACT obvious effect is that scheduling is non-uniform between in- With the growing complexity of modern embedded real-time ternal and external events, complicating both system design systems, scheduling and managing of resources has become and analysis. However, more importantly is that the re- a daunting task. While scheduling and resource manage- source management provided by the RTOS is in effect set ment for internal events can be simplified by adopting a out of play. All resources that might be accessed by exter- commonplace real-time operating system (RTOS), schedul- nal event reactions (interrupt handlers) must be explicitly ing and resource management for external events are left protected. This forces the programmer to manually manage in the hands of the programmer, not to mention manag- critical sections (by interrupt masking) whenever resources ing resources across the boundaries of external and internal claimed might be accessed by external events. This requires events. In this paper we propose a unified system view in- the programmer to diverge from the uniform system view corporating earliest deadline first (EDF) for scheduling and and severely complicates the programming of real-time sys- stack resource policy (SRP) for resource management. From tems. an embedded real-time system view, EDF+SRP is attrac- tive not only because stack usage can be minimized, but In this paper we present a method for scheduling and re- also because the cost of a pre-emption becomes almost as source management that allows external and internal events cheap as a regular function call, and the number of pre- to be treated uniformly from a programmers perspective. emptions is kept to a minimum. SRP+EDF also lifts the Our proposed solution deploys earliest deadline first (EDF) burden of manual resource management from the program- scheduling, and manages shared resources under the stack mer and incorporates it into the scheduler. Furthermore, resource policy (SRP). Earliest Deadline First (EDF) schedul- we show the efficiency of the SRP+EDF scheme, the intu- ing has been shown to be optimal, given that there exists itiveness of the programming model (in terms of reactive no shared resources [8]. In section 2 we introduce a collab- programming), and the simplicity of the implementation. orative hardware/software scheme that performs pure EDF (pure in the sense that both external and internal events 1. INTRODUCTION are scheduled uniformly) scheduling onto platforms featur- Embedded real-time systems are naturally defined as time- ing static priority based interrupt hardware. bound reactions to external and internal events. The cor- rectness of hard real-time systems relies on executing all In addition to meeting the reaction deadlines, system re- reactions in accordance to their time-bounds. In addition to sources need to be adequately managed. The stack resource meeting the reaction deadlines, system resources need to be policy is a priority ceiling protocol with the following prop- adequately managed. Embedded software plays an increas- erties [2]: ingly important role for the realization of such systems. To aid system development, resource management and schedul- • ing can be simplified by the use of a real-time operating sys- schedulability test for systems with shared resources, tem (RTOS). However, commonplace RTOSs treats exter- • deadlock free execution, nal and internal events in a non-uniform manner, both with respect to scheduling and resource management[4, 10, 9]. • resource management is incorporated into scheduling, The scheduling of internal events (managed by the RTOS) is generally overruled by the underlying hardware interrupt • task scheduling onto a single execution stack, mechanism (scheduling reactions to external events). The • pre-emption becomes as cheap as procedure calls,

• eliminates yielding and context switch overhead, and

• SRP gives a tight bound for priority inversion.

In conclusion SRP brings a set of sought after features for resource constrained real-time systems. The deadlock free execution increases system robustness, while the single stack

117 execution eliminate the need (and memory overhead) of mul- – rq - ready queue, ordered by absolute deadline in tiple execution stacks and multiple execution contexts. This ascending order in turn eliminates yielding and context switch overhead, – tq - timer queue, ordered by absolute baseline in since a task is never started unless all resources are available ascending order for the task to complete. Moreover, pre-emption becomes as cheap as a procedure call, since there are no context switches – dl - the currently shortest absolute deadline in a SRP scheduled system. Finally, the bounded priority inversion ensures system responsiveness. • kernel operations – dispatch(t) 2. EDF SCHEDULING if t.deadline < dl then { In the following we assume that each task is triggered by predl = dl - push current deadline an event. Each task has an absolute baseline (time of re- lease), and an absolute deadline, the time in between gives a dl = t.deadline permissible execution window for the task. The intuition be- t.code() - execute task hind earliest deadline first scheduling is simple, tasks should dl = predl - pop pre-empted deadline be executed according to their deadline. This corresponds to if rq != {} then dispatch(rq.dequeue()) scheduling the top element of a priority queue, holding all re- } leased tasks, sorted by their absolute deadline. Whenever a else rq.enqueue(t) new event occurs, the corresponding task is released, its ab- – postpone(t) solute deadline should be computed, and the task should be tq.enqueue(t) inserted in the priority queue accordingly. Typically a task timer.schedule(t.baseline) release can stem either from external or internal events. In- – interrupt(i) ternal events may be postponed to occur at a later point in time, facilitating e.g., periodic events. Common-place micro t = interruptTaskVector[i] controllers support efficient hardware scheduling of external t.baseline = timer.getTime() and timer events through interrupt handlers. However, the t.deadline = t.baseline + t.relativeDeadline scheduling policies of interrupt handlers are pre-dominantly dispatch(t) adopting static priority schemes, which place a major hurdle to efficient implementation of EDF scheduling. – timer.interrupt while tq.top().baseline <= timer.getTime() { We address this problem by introducing a collaborative hard- r = tq.dequeue() ware/software kernel scheme (pure EDF) that deploys EDF rq.enqueue(t) scheduling on both internal and external events. } if tq != {} then timer.schedule(tq.top().baseline) 2.1 Design criteria dispatch(rq.dequeue()) The following key design criteria should be met: • rq.enqueue(t)

• pure EDF scheduling of both external and internal • tq.enqueue(t) events • rq.dequeue(t) • high timing accuracy (high timer granularity and bounded • tq.dequeue(t) timer jitter) • low overhead In the following we will elaborate on the general considera- tions needed for meeting the stated criteria by a re-entrant 2.2 Kernel anatomy for pure EDF kernel: In the following we outline the mechanisms of a platform in- dependent EDF kernel in accordance to the discussed design External events criteria. For a pure EDF scheduler all scheduling decisions should be made based on basis of absolute deadlines. Hence, we need a The task structure has the following selectors: mechanism to capture the absolute arrival times of external events, giving the baseline for the event and the correspond- .baseline - absolute point in time ing task. (The baseline for internal events can be derived .deadline - absolute point in time from the originating external event as discussed later.) In some cases we can rely on the underlying hardware to per- .relativeDeadline form time-stamping of external events, however, in general .code hardware support is as best limited to a subset of the event sources. A generic solution is to perform time-stamping in the interrupt handler (interrupt(i)). The accuracy is in this Key components are: case highly dependent on the blocking time of the interrupt handler, hence all interrupts are handled in a pre-emptive • state variables fashion.

118 Internal events • timerTask.relativeDeadline = 2 Internal events, may emanate either directly from the ex- • timerTask.code = timer.interrupt ecutionofatask,orbeingpostponedtobereleasedata • t1.relativeDeadline = 7 future point in time (given a postponed baseline, relative to • t1.code = ..., postpone(t2), ..., dispatch(t3) that of the originating task). Hence we need mechanisms • t2.baselineOffset = 4 to directly dispatch (dispatch(t)) and postpone events (post- • t2.relativeDeadline = 2 pone(t) ). For realizing the latter case, we use the micro- • t3.baselineOffset = Inherit controller hardware timer set to trigger an interrupt, that • t3.relativeDeadline = Inherit in turn will schedule the timer.interrupt task to release the postponed event. Initially the systems runs a non-terminating idle task with infinite deadline (hence dl is ∞). At this stage the system Timer management is possibly in low power mode awaiting stimuli. A task t1 The timing accuracy is directly dependent on the operating with relative deadline 7 is bound to interrupt source s1.Task frequency of the timer. In tick based kernels, timing events execution, pre-emption, and permissible execution window occur periodically, causing pre-emption overhead to the cur- is shown in figure 1. Stack allocation is shown in figure 2. rently executing task - hence system load will increase with increased timing accuracy [4, 7]. Fortunately, most modern At time 2, an external event s1 occurs, the corresponding micro-controllers offer remedy by means of output-compare interrupt handler is invoked by hardware, pre-empting the functionality. For such platforms we may use a free run- idle task. The baseline is set to 2, and the absolute deadline ning timer, and set the absolute time for an interrupt to be is set to 9. Task t1 is dispatched and since it has the earliest scheduled (timer.schedule(t)). deadline, dl is set to 9, stack resources are allocated, and t1 is executed. Assume that t1 first creates the postponed task Absolute time representation t2 with a relative deadline of 2, and a baseline offset of 4. As the timer has a finite representation (number of bits), the This task is queued in the timer queue (tq) and the next free running timer will eventually overflow (wrap around), timer interrupt is scheduled to occur at time 6. Then t1 such giving a truncated representation of the absolute point dispatches task t3, under the inherited base- and dead-line in time. Under the condition that the number of timer bits of t1, enqueuing it into the ready queue (rq ). is sufficient to encode the largest baseline offset we may un- dertake this truncated view of time. If we need a larger time At time 3, t1 terminates, dl is restored to ∞, t3 is dequeued span for baseline offsets, a virtual timer can be deployed that from rq and dispatched. Since dl is ∞, dl issetto9andt3 extends the hardware timer (least significants bits) with ar- is executed. bitrary number of most significant bits managed in software, implemented e.g., by simply advancing the most significants At time 6, the timer interrupt pre-empts t3, sets baseline to bits on a timer overflow. In the case that the hardware 6, deadline to 8, and dispatches the timerTask.Sincethedl timer bits suffice, the only effect of increasing the timing ac- is 9, dl issetto8andtimer.interrupt is executed. Task t2 curacy is that the range of baseline offsets is decreased (this, is dequeued from tq and enqueued into rq.Sincetq is now without the performance penalty of a fine granularity tick empty no further timer interrupt is scheduled at this time. basedsystem).Incaseweneedtocopewithlargerbaseline timer.interrupt returns to dispatch, the previous deadline offsets, the overhead is limited to that of the virtual timer (9) is restored, and the first task in rq is dispatched. Since implementation and the additional cost of accounting for the dl is 9, dl is set to 8, stack resources are allocated and t2 is range of the virtual timer on related operations. executed.

It should be noted that his scheme does not guarantee any At time 7, t2 terminates, stack resources are deallocated bound on the time-stamping error, this must be ensured and the previous deadline is restored (9). Since rq is empty by the interrupt source or in software, e.g. by disabling the dispatch exists, and the pre-empted t3 is resumed. At interrupt sources until they are allowed to trigger again. time 8, t3 terminates, stack resources are deallocated and the previous deadline is restored (∞). Since rq is empty, Priority queue management dispatch exists and the interrupt handler for s1 terminates, resuming the pre-empted idle task. Most kernel primitives must perform some queue manage- ment, thus the performance of the kernel is directly related to the efficiency of the queue management. While this can 3. SRP SCHEDULING UNDER EDF be done in several ways, we have chosen to use a simple lin- The EDF scheduler discussed in section 2 can easily be ex- ear queue for clarity. A heap-based data-structure would be tended to support shared resources with the Stack Resource more suitable for larger task-sets. Protocol. There are only a few extensions that need to be made compared to the EDF Scheduler (section 2). Here we Example outline the changes to the previous implementation: Assume the following system configuration: The task structure is extended with the following selector:

• v=[timerTask, t1] .preemptionLevel • rq={},tq={} The added/modified key components are: • dl = ∞

119 s1 timer 4.2 Example Application The example application given in listing 3 is an implemen- t3 tation of the previously described example in section 2.2 for SRP-EDF. t2 Listing 1: kernel-source.h // Implementation dependant MACROS t1 #d e f i n e NTASKS // Tasks ( application dependant) #d e f i n e DISABLE ( ) // Disable interupts globally #d e f i n e ENABLE ( ) // Enable interupts globally #d e f i n e STATUS ( ) // Get i n t e r r u p t sta tus #d e f i n e TIMERGET ( x ) // Get cur time 0 1 2 3 4 5 6 7 8 9 #d e f i n e TIMERSET ( x ) // Set next compare event #d e f i n e RETURN TO ( x ) // Push a new function on the thread stack for the ISR // to return from. Used since we cannot call user code Figure 1: Example task execution, pre-emption, and // from interrupt handlers on most platforms permissable execution windows. #d e f i n e SLEEP ( ) // Enter sleep mode #d e f i n e SEC ( x ) // Seconds to time ticks #d e f i n e MSEC ( x ) // Millisec to time ticks #d e f i n e USEC ( x ) // Microsec to time ticks #d e f i n e initObject (x) // Initialize object with // pre−emption level t2 typedef struct { int pl ; } Object ; t1/t3

idle Listing 2: kernel-source.c struct task block { Task ∗ next ; // for use in linked lists Time baseline ; // event time reference point Time deadline ; // absolute deadline (=priority ) 0 1 2 3 4 5 6 7 8 9 Object ∗ to ; // receiving object int (∗ code )( int ); // code to run int arg ; // argument to the above Figure 2: Example stack usage. } ; struct task block tasks [NTASKS]; Task taskPool = tasks ; • added state variables Task taskQ = NULL; Task timerQ = NULL; Time timestamp = 0; – sc - the system ceiling Task cur t a s k = NULL ; int system ceiling = MAXINT ; • modified/added kernel operations static void enqueueByDeadline (Task p, Task ∗ queue ) { Task prev = NULL, q = ∗ queue ; – dispatch(t) while (q && (q−>deadline − p−>deadline <=0)) { < < prev = q; if t.deadline dl AND t.preemptionLevel sc q=q−>next ; then { } predl = dl - push current task p−>next = q; if ( p r e v == NULL ) dl = t.deadline ∗ queue = p ; else sync(t) - execute task and claim resource prev−>next = p; } dl = predl - pop pre-empted task static void enqueueByBaseline (Task p, Task ∗ queue ) { if rq != {} then dispatch(rq.dequeue()) Task prev = NULL, q = ∗ queue ; while (q && (q−>baseline − p−>baseline <=0)) { } else rq.enqueue(t) prev = q; – sync(t) q=q−>next ; saved ceiling = sc } p−>next = q; sc = t.preemptionLevel if ( p r e v == NULL ) ∗ queue = p ; t.code() else prev−>next = p; sc = saved ceiling } static Task dequeue ( Task ∗ queue ) { The fundamental idea with SRP is to allow resource-sharing Task m = ∗ queue ; ∗ queue = m−>next ; in a well defined manner. Since we have resources we must return m; in some way ensure mutual-exclusion, this is reflected by the } addition of the sync() primitive. The sync() primitive en- static void i n s e r t ( Task m, Task ∗ queue ) { m−>next = ∗ queue ; sures that the correct system ceiling is maintained. It should ∗ queue = m; be noted that (as seen in the modified dispatch(t) pseudo- } void dispatch (void ) { code) a lower pre-emption level means higher priority. DISABLE(); if (taskQ&& ((curt a s k == NULL ) || ((curtask−>deadline < taskQ−>deadline ) 4. IMPLEMENTATION && ( s y s t e m ceiling > taskQ−>to−>pl ) ) 4.1 Kernel ) The kernel implementation shown in listing 1 and 2 is a ) { Task saved task = cur task ; minimalistic implementation of SRP-EDF and a derivative cur task = taskQ ; taskQ=taskQ−>next ; of TinyTimber as described in [7]. One important note is // In the paper sync is inlined that in section 3 resources are equivalent to tasks. How- sync ( cur task−>to , cur task−>code , cur task−>arg ); ever, in the kernel implementation, tasks and resources are insert ( cur task , &taskPool ); // recycle task cur task = saved task ; separate data-structures. }

120 ENABLE ( ) ; } myobj eintobj = { initObject ( EINT0 PL ) } ;

void TIMER INTERRUPT HANDLER ( void ) { int t1 (myobj ∗ , int ); Time now ; int t2 (myobj ∗ , int ); TIMERGET ( now ) ; int t3 (myobj ∗ , int );

while (timerQ && (timerQ−>baseline − now < 0)) int t1 ( eintobj ∗ self , int arg){ enqueueByDeadline ( dequeue(&timerQ ) , &taskQ ); POSTPONE ( MSEC ( 4 ) , MSEC ( 2 ) , &o b j 2 , t 2 ) ; if (timerQ) POSTPONE( −1, −1, &obj3 , t3 ) ; // Inh erit bl and dl . TIMERSET ( t i m e r Q−>baseline ); }

RETURN TO(taskswitcher ); int t2 (myobj ∗ self , int arg){ } // Perform 1 ms of work . } Task intsched (Time dl , Object ∗ to , int (∗ code )( int ), int arg ) { int t3 (myobj ∗ self , int arg){ Task m; // Perform 4 ms of work . Time now ; }

TIMERGET ( now ) ; void EINT0 IRQHandler ( void){ // Level triggered interrupt handlers needs to be disabled m = dequeue(&taskPool ); // in the hardware interrupt handler , otherwise we create m−>to = to ; // a infinite loop . The task in responsible for enabling it m−>code = code ; // again after servicing interrupt ( if appropriate ). m−>arg = arg ; NVIC DisableIRQ (18); m−>baseline = now; // Schedule task , with a deadline of 7 ms. m−>deadline = dl + m−>baseline ; intsched(MSEC(7),&eintobj , t1, 0); m−>rel deadline = dl ; }

enqueueByBaseline (m, &timerQ ); void main ( void){ initialize (); TIMERSET ( t i m e r Q−>baseline ); NVIC EnableIRQ ( 1 8 ) ; RETURN TO(taskswitcher ); idle (); return m; } }

Task postpone (Time bl , Time dl , Object ∗ to , int (∗ code )( int ), int arg ) { Task m; Time now ; 5. EXPERIMENTAL RESULTS DISABLE(); 5.1 Platform, Compiler and Setup m = dequeue(&taskPool ); m−>to = to ; The platform used when measuring was an NXP LPC1769 m−>code = code ; m−>arg = arg ; [1] featuring an Cortex-M3 MCU, 512k Flash, and 64k SRAM.

// Negative values => INHERIT The GNU Compiler Collection (GCC) version 4.4.3 was used m−>baseline = bl < 0?curtask−>baseline to compile the test-code. All code was compiler with the :curtask−>baseline ; m−>deadline = dl < 0?curtask−>deadline compiler flags -mcpu=cortex-m3 -mthumb -O2,the-mthumb :m−>baseline + dl ; m−>rel deadline = dl ; // Used for preemtion levels switch is required since the Cortex-M3 core only support the enqueueByBaseline (m, &timerQ ); TIMERSET ( t i m e r Q−>baseline ); Thumb-2 instruction set. All code was run from the flash and the flash accelerator module was disabled. ENABLE ( ) ; return m; }

// Extention , used for syncronous requests 5.2 Memory Requirements // to other objects , not discussed in paper The memory requirement of the kernel, is as follows: // but used for sharing resources . int sync ( Object ∗ to , int (∗ code )( int ), int arg ) { int result ; int saved ceiling stacked ; • Code-size 644 bytes (Flash) int status = STATUS(); DISABLE(); • Data-size 20 bytes (SRAM) saved ceiling stacked = system ceiling ; system ceiling = to−>pl ; • Task-size 24 ∗ n bytes (SRAM) ENABLE ( ) ; result = code(to , arg ); DISABLE(); • Object-overhead 4 ∗ m bytes (SRAM) system ceiling = saved ceiling stacked ; // Try to dispatch any preemted events Where n is the total number of tasks and m is the total taskswitcher (); if ( s t a t u s ) ENABLE ( ) ; number of objects. All memory requirements are derived return result ; } from the compiled code.

void initialize (void ) { // Set up the timer , taskpool etc , // Implementation dependent 5.3 Timing Behaviour } The timing behaviour of the kernel was measured in clock- void idle (void ) { ENABLE ( ) ; cycles by sampling the RIT timer of the LPC1769. while (1) SLEEP(); } In the first series of test seen in table 1 all assume the best- case. That is, both the external and internal events have the earliest deadline, the resources are available, and the only a Listing 3: application-source.c single event occurs. However, the synchronous call is always #include ”LPC17xx . h ” #include ”uTimber . h” a constant-time operation. The number of cycles measured

typedef struct{ is defined as follows: Object obj ; // Internal state variables here . } myobj ; Internal Event The number of cycles between the release- myobj objt2 = { initObject ( OBJT2 PL ) } ; time of the task and the execution of the first instruc- { } myobj objt3 = initObject ( OBJT3 PL ) ; tion of the task.

121 7. CONCLUSIONS AND FUTURE WORK Table 1: Runtime of kernel primitives. With the outset that embedded real-time systems are nat- Kernel Primitive Clock Cycles urally defined as time-bound reactions to external and in- Internal Event 138 ternal events, EDF scheduling is a natural choice. We have External Event 123 developed a scheme that through efficient software schedul- Synchronous Call 31 ing of interrupts accomplish pure EDF even on common- place platforms that deploy static priority scheduling of in- Table 2: Runtime of postpone(t) primitive, with the terrupt handlers. Furthermore, we have show how the pure given taskQ length. EDF scheme can be extended to perform SRP under EDF. Queue Length Clock Cycles This gives us efficient shedulability test, deadlock free sin- Empty 127 gle stack execution, efficient pre-emption management and Length 1 136 tightly bound priority inversion. We have shown the effi- Length 2 147 ciency of the proposed schema quantified by experiments on Length 3 160 our prototype implementations. Length 4 172 Future work include investigating the possibility of hard- ware support for EDF+SRP scheduling of internal and ex- External Event The number of cycles between triggering ternal events, as well as time-stamping of external events of an interrupt and the execution of the first instruc- (interrupts). We are also working on a complete system tion of the interrupt-task. analysis (incorporating interrupt overhead, queue-handling overhead, blocking time, worst-case execution time, etc.) for Synchronous Call The number of cycles between invoca- SRP+EDF scheduled system. As a part of the complete sys- tion of the sync method and execution of the first in- tem analysis we are investigating the impact of the underlay- struction of the code argument. ing data-structures relating to the queue-handling overhead To give an example of how the queue length impacts the and blocking time. timing behaviour we study a series of worst-case consecu- 8. REFERENCES tive invocations of the postpone(t) primitive. The results [1] 32Bit ARM Cortex-M3, NXP LP1769. from the test can be seen in table 2. The same worst-case http://www.nxp.com/pip/LPC1769 68 67 66 65 64 4.html. behaviour can be expected for all kernel primitives that in- [2] T. P. Baker. A stack-based resource allocation policy corporate queue-handling. The accuracy of time-stamping for realtime processes. In IEEE Real-Time Systems can be made free of artifacts due to blocking (dominated Symposium, pages 191–200, 1990. by the queue-handling) if the interrupt hardware supports timer capture for external events (interrupts). In our sim- [3]L.E.L.delFoyo,P.Mejia-Alvarez,andD.deNiz. plistic prototype implementation we can clearly see the lin- Predictable interrupt scheduling with low overhead for ear effect of using a linked list in the queue-handling. Other real-time kernels. Real-Time Computing Systems and data structures, such as heaps, buckets, etc. provide a signif- Applications, International Workshop on, 0:385–394, icant improvement for larger queue lengths, which leads to 2006. reduced queue-handling overhead, as well as improved accu- [4] FreeRTOS-A Free RTOS for ARM7, ARM9, racy of time-stamping in software (due to reduced blocking Cortex-M3, MSP430, MicroBlaze. time). http://www.freeRTOS.org. [5] T. Hofmeijer, S. Dulman, P. Jansen, and P. Havinga. Ambientrt - real time system software support for 6. RELATED WORK data centric sensor networks. In Proceedings of the Scheduling policies have been extensively studied from the- 2004 Intelligent Sensors, Sensor Networks and oretical perspectives, and are at least for single core/single Information Processing Conference, pages 61–66. CPU systems to be considered as being well understood [11]. IEEE Computer Society Press, 2004. However, in practice results apply only if the model used [6] K. Jeffay and D. L. Stone. Accounting for interrupt for analysis corresponds to the system at hand. Work on handling costs in dynamic priority task systems. pages scheduling theory often undertakes an ideal system model, 212–221, 1993. neglecting scheduling overhead and interrupt handling. In [7] P. Lindgren, J. Eriksson, S. Aittamaa, and [6] the cost of additional interrupt handling is included in the J. Nordlander. Tinytimber, reactive objects in c for feasibility and schedulability test. However, in their model real-time embedded systems. In DATE,pages interrupt handlers are treated separately from application 1382–1385. IEEE, 2008. tasks (interrupts being scheduled at a higher priority). Our [8] C. L. Liu and J. W. Layland. Scheduling algorithms model differs in that interrupt handlers are indeed treated for multiprogramming in a hard-real-time as being part of the application, where the occurrence of an environment. J. ACM, 20(1):46–61, January 1973. interrupt corresponds to the release of a task. This inte- [9] O. V. Portal. http://www.osek-vdx.org. grated task and interrupt management model can also be found in [3]. However, in our work we focus on resource [10] D. C. Sastry and M. Demirci. The qnx operating constrained systems under EDF and extend the results to system. Computer, 28(11):75–77, 1995. EDF-SRP scheduling for systems with shared resources. In [11] J. A. Stankovic, M. Spuri, M. D. Natale, S. S. S. the context of stack based EDF schedulers, we also find Am- Anna, and G. C. Buttazzo. Implications of classical bientRT [5]. However, under AmbientRT external events are scheduling results for real-time systems. IEEE treated separately from the application task set. Computer, 28:16–25, 1995.

122 Paper H Real-Time Execution of Function Blocks for Internet of Things using the RTFM-kernel

Authors: Per Lindgren, Marcus Lindner, Andreas Lindner, Johan Eriksson, Valeriy Vyatkin

Originally published in: IEEE Emerging Technology and Factory Automation (ETFA 2014)

c 2014, IEEE

123 124 Real-Time Execution of Function Blocks for Internet of Things using the RTFM-kernel

Per Lindgren, Marcus Lindner, Andreas Lindner, Johan Eriksson, Valeriy Vyatkin Lulea˚ University of Technology {per.lindgren, marcus.lindner, andreas.lindner, johan.eriksson, valeriy.vyatkin}@ltu.se

Abstract—Function Blocks provides a means to model and asynchronous events. However, the rendered model is still program industrial control systems. The recently acclaimed IEC compliant to the IEC 61499 standard, as the synchronous 61499 standard allows such system models to be partitioned and events can be seen as just immediately served asynchronous executed in a distributed fashion. At device level, such models are traditionally implemented onto programmable logic controllers events. that underneath have an operating system and a software For the resulting model, we propose a mapping onto the run-time environment which implies high resource demands. minimalistic RTFM-kernel [6] which, as based on the Stack However, there is a current trend to involve small embedded Resource Policy [7], brings both real-time scheduling and systems (so called Internet of Things devices) integrated into such opportunities for off-line (design time) analysis with respect distributed control systems. To this end, we seek to address the outsets for real-time execution of Function Block based designs to response time, overall schedulability and memory require- onto light-weight controllers (MCUs) with limited resources ments. (memory and CPU). Furthermore, we propose a mapping of the Function Block execution semantics onto the RTFM-kernel, II. BACKGROUND and discuss opportunities for off-line (design time) analysis with respect to response time, overall schedulability and memory This section gives brief overview of the IEC 61131/61499 requirements. standards, for detailed information we refer to recent publica- tions [8] and the IEC standard documents [9]. I. INTRODUCTION Industrial control systems are traditionally implemented A. IEC 61131 using Programmable Logic Controllers (PLCs). In order to The IEC 61131 consist of 8 parts [9]. The IEC 61131-3 improve portability of control applications standards such as standard (third edition as adopted in 2013) establishes a com- IEC 61131-3 [1] have been established. As a part thereof mon basis for data exchange (types for common elements) and Function Blocks have been introduced in order to provide a a number of alternative technologies to model and implement component based means to modularize designs and improve control applications, namely: design efficiency through reusability. Recently, the standard IEC 61499 [2] has been approved, • Ladder diagram (LD) which introduces event based communication and adds the • Function block diagram (FBD) possibility of distributing applications onto a network of • Structured text (ST) devices. This opens up for the opportunity to integrate small • Instruction list (IL) (but intelligent) embedded sensors and controllers into larger • Sequential function chart (SFC) control systems, and thus can draw on the current trend In brief, these technologies can be summarized as follows: towards Internet of Things (IoT). Traditional implementation With a heritage to non-computerized control systems, Ladder techniques for industrial PLCs rely on real-time operating Diagrams (LDs) offer means to graphically express the logic systems and software run-time environments that imply high relation in between inputs and outputs of a PLC. Function resource demands, hence prohibit the deployment to light- Blocks (FBs) offer capabilities to graphically structure the weight IoT devices. While tailored to resource limited devices, control system (application) into interconnected components, typical IoT operating systems like TinyOS [3] and Contiki [4] which improves the modularization and reuse. Structured text on their hand lacks native real-time support. (ST) is a textual representation of Function Blocks into a To this end, this paper attempts to address the outsets for Pascal-like language. Instruction Lists (IL), is another textual real-time execution of Function Block based designs onto representation that offers a low level (assembly like) program- light-weight controllers (MCUs) with limited resources (mem- ming model. To complete the picture, Sequential Function ory and CPU). Chart (SFCs) provides means to organize programs for sequen- Additionally, the IEC 61499 model of event-passing has tial and parallel control processing, similar to state diagrams. been criticized in [5] for not supporting both blocking and non- Through the IEC 61131 standard definitions of data types blocking interactions. In this paper we introduce an extension (common elements), applications may comprise parts using of the model that allows to distinguish synchronous and any of the aforementioned technologies.

125 B. IEC 61499 can serve as inspiration for finding more efficient execution Under IEC 61499 the overall interaction in between system solutions. components is event-based, where each event is associated 2) 4DIAC: The open source 4DIAC framework [11] com- with a (possibly zero) number of data inputs or outputs. prises an Eclipse based IDE and the FORTE run-time sys- Function Blocks and their representation in terms of Struc- tem. For execution system models are compiled into C++ tured Text together with Sequential Function Charts were the classes, which are compiled together with the FORTE run- outset for establishing the IEC 61499 industrial standard. The time core. System configuration is done dynamically, which main objectives of IEC 61499 were to provide means for allows making changes to the model during run-time (e.g., distributed control, and to improve on the IEC 61131 in terms creating/deleting FB instances and communication paths). of flexibility and reusability. Main contributions involve: Real-time properties of applications have been achieved as discussed in [12]. • Event based interaction/triggering of execution, 3) nxtStudio: The nxtStudio (nxtControl) framework [13] • Composite Function Blocks (CFBs), provides a commercial alternative. The IDE is .NET based • Execution Control Charts (ECCs), (hence limited to the Windows operating system) and supports • Service Interface Function Blocks (SIFBs), and the development of advanced GUI interfaces. The run-time • means for management and distribution. (nxtRT61499F) is based on an early adoption of FORTE. It In short, composite function blocks, as the name suggests, currently supports a large variety of platforms from industrial allows FBs to be organized and instantiated in a hierarchical vendors like Beckhoff, Wago, Siemens and other IPC/PAC manner, improving the modularization of code. ECCs allow vendors. The run-time operates under RTOS32, Windows the behavior of a function block to be defined in a state Embedded/Windows CE, Linux/uLinux and eCos. (In common machine like manner, where state transitions are associated these operating systems all provide thread support). with specific actions (algorithms for computations, setting 4) ISaGRAF: The ISaGRAF Workbench [14], is another output data and emitting output events). Additionally, to facil- commercial platform supporting the IEC 61131 and IEC itate distribution, the standard introduces means for dynamic 61499 standards. It is based on the Microsoft Visual Studio configuration and the notions of server/client and publish/sub- Shell. With a heritage to industrial PLCs, the run-time system scribe relationships. reflects the scan-based execution model. That is, repeatedly At large, a (distributed) system may consist of several with a fixed time interval, input signals are sampled, the applications, where each application can be mapped to dif- program is executed, and output signals are generated. This ferent resources. Each device (controller) hosts one or several can guarantee worst-case execution times for the entire scan. resources. When spanning applications over different devices, As such the scan-based model does not natively lend itself service interface function blocks for network communication well to exploiting interrupt centric architectures such as light- are used to convey events and their associated data. weight IoT platforms. C. Tool Support 5) BlokIDE and synchronous compiler: BlokIDE [15] is a Microsoft Visual Studio based integrated development en- There are several tools (toolchains) that are based on the vironment that aims to facilitate the design of safety-critical IEC 61499 standard. They differ in covering as well as ex- embedded systems. The main components of BlokIDE are vi- tending features of the specification, and systems rendered may sual model editor and Function Block Compiler [16] that gen- express different behavior depending on vendor/tool specific erates efficient C-code following the synchronous execution implementation choices. Also they (typically) rely on different model [17]. While this model and compiler have shown high run-time system paradigms and implementations techniques efficiency, they do not conform to the IEC 61499 standard. making them suitable to different application scenarios. For a recent (2012) overview of tools we refer the reader to [8]. III. IEC 61499 EXECUTION SEMANTICS In the following we discuss some alternative implementa- tions with a focus on their run-time support for light-weight The IEC 61499 standard covers distributed control from embedded platforms, in Section III we will discuss execution a holistic perspective. For this work, we focus on device semantics in more depth. For this presentation we focus on a level issues, and in particular the implications to scheduling number of representative and currently supported toolchains, and resource management from a general real-time systems the open source FBDK and 4DIAC frameworks, BlokIDE that perspective. has been developed in academia and the commercial nxtStudio Execution semantics are informally specified in the standard and ISaGRAF Workbench. which leaves some room for interpretation by implementers. 1) FBDK: Initially FBDK [10] was designed/developed (There is as of 2013 no reference implementation of IEC (by Holobloc) as a prototype implementation using Java both 61499, 4DIAC is currently considered to be the closest to the for implementing function blocks as well as for the run-time approved standard.) system (FBRT). As such FBDK is not particularly suitable to The IEC 61499 standard stipulates an (implicitly asyn- light-weight embedded systems, due to large memory footprint chronous) event-based communication model, where execution and run-time overhead. However its run-time execution model is triggered by the occurrence of a corresponding event.

126 A. Function Block Types A Function Block (FB) is described in terms of a set of input events with associated input data variables, and a set of output events with associated output data variables. 1) Basic Function Blocks: For basic FBs the functionality is described in terms of an Execution Control Chart (ECC). 2) Service Interface Function Blocks: Internal operation of a SIFB is implementation/vendor dependent and is not explicitly covered by the IEC 61499 standard. However, their interaction behavior can be expressed/specified in terms of a Service Sequence. SIFBs are used to introduce timers/de- lay elements, to implement interaction with the underlying hardware, interfacing external code (such as communication Fig. 1. ECC operation state machine. libraries, etc.) and to convey events and data (e.g., buffering). 3) Composite Function Blocks: A composite FB can be used to encapsulate a FB-network into a component for reuse (S2). Each action identifies an (optional) algorithm to be purposes. The encapsulated FB-network can consist of basic executed and (optional) output event. Algorithms can mutate FBs and SIFBs as well as another composite FBs. From a (read/write) input, internal, and output variables (hence as a programmer’s perspective it will be seen as a FB without any side effect change the values under which transition conditions associated ECC. are evaluated). When an output event is handled, the associated B. IEC 61499 Resources, Devices and Scheduling data output variables should be atomically written to the environment (before the corresponding output event is visible). Each FB (including Composite FBs) instantiated in an After the sequence of actions have been performed (t4), further application needs to be mapped to a resource residing on possible transitions are inspected and taken (t3) until no further a target device. All incoming and outgoing events from a transition is possible (t2). An input event is considered to be resource need to go through a SIFB. consumed when taking either branch t3 or t2. 1) Resource Scheduling: Each resource is associated with an event scheduler. Events should be provided from the C. Implication to scheduling and input/output behavior scheduler in sequential manner (each delivery contains a single event). When an event is delivered to an FB the associated The execution semantics are not formally defined (and data variables are read (sampled) from the incoming wire, event handling ambiguities at resource level have already and correspondingly, when an event is emitted from an FB the been discussed). This has led to different interpretations and associated data variables are written to the outgoing wire. To implementation approaches, rendering diverging and incon- ensure consistency, events and their associated data operations sistent input-output behavior for the same model (depending should be treated atomically. Besides that, the standard leaves on run-time and operating system specifics). This goes far eventing semantics (at resource level) undefined. In effect, this beyond what would be considered as sound non-determinism, leaves event delivery order and buffering mechanism (size, making it hard or even impossible to reason on correctness, event/data association, ageing, overflow handling, etc.) up to replaceability, interoperability etc. This is non-satisfactory, as the implementation. it prohibits re-use of models cross vendors/frameworks. The 2) ECC Execution Semantics: The ECC defines the state lack of formal semantics also limits the analysis of FB model transition conditions and corresponding actions. These condi- behavior to the specific semantics of the run-time system at tions are of three forms: hand. • Form 1: a (single) input event, • Form 2: logical expression on input variables, internal D. Implementation specifics variables and output variables, and The FBDK implementation undertakes a strictly syn- • Form 3: Form 1 in conjunction with a Form 2 expression. chronous approach to the event handling mechanism. In be- The ECC execution mechanism is captured by the state tween FBs associated to the same resource, output event(s) machine in Figure 1, starting in S0. associated to an action are scheduled synchronously (executed When an input event is handled, the associated input as a function call). In case such events transitively form a cycle variables are read (sampled) from the environment (wire) (inside a resource), the design should be considered faulty. (t1 in Figure 1), and possible ECC transitions are inspected However, if (along the synchronous event path) inserting a in S1. (If numerous possible transitions are possible, the SIFB (e.g., zero-delay element), the synchronous cycle can standard stipulates the natural order from the underlying XML be avoided resulting in a valid FB model. (On the other hand specification.) implementations such as the FORTE run-time, silently handles When a transition is taken in the ECC (t3), a sequence of such problems through internal buffering, but the execution associated actions (defined in the target state) are performed semantics have not been formally defined.)

127 Under FBDK, in compliance to the current standard, events and a LED SIFB. Both are manually written to interface the ac- are considered as consumed (input event variable set to false) tual hardware of the used test board (LPCXpresso LPC1769). when entering S1 in Figure 1, while the FORTE based The timer SIFB provides a ”one-shot” timer functionality nxt61499F run-time currently requires the programmer to from the internal timer of the microcontroller. The LED SIFB explicitly assign the input event variable to false,atany provides an interface to control a LED connected to one place where considered suitable. The latter clearly does not of the output ports of the microcontroller. The SIFBs are follow the stipulated ECC behavior in the approved IEC 61499 implemented in C under the simple and intuitive task model standard. of RTFM provided by the RTFM-kernel API. The PWM signal is generated with two chained timer Func- IV. MAPPING OF IEC 61499 TO RTFM-KERNEL tion Blocks, delayOn and delayOff. Function Block delayOff SCHEDULING is responsible for the switched-on time of the signal, i.e., it Eriksson et al. introduce in [6] a light-weight scheduling gives the delay until the signal is switched off. Respectively, system (RTFM-kernel) with the notions of tasks and resources the delayOn Function Block sets the switched-off time of the based on the Stack Resource Policy (SRP) [7]. A mapping signal. Both timer FBs run alternately (fire each other) and of IEC 61499 FBs to the RTFM-kernel is proposed in [18]. toggle the LED after the respective delay value is expired. It gives an unambiguous execution semantics, and allows The select FB sets the LED value regarding the corresponding to reason on properties of the system in terms of memory timer event. requirements (static footprint, heap and stack) as well as The duty cycle of this PWM generator is the ratio between response time(s) and overall schedulability. the DELAY value of delayOff and the sum of both DELAY In order to map an IEC 61499 model to the RTFM- values. By setting the DELAY values of the timer FBs the kernel, a necessary extension to the IEC 61499 standard is period and duty cycle of the FB modeled software PWM also proposed in [18]. This extension distinguishes between generator are configured. In the shown example the duty cycle synchronous and asynchronous events in an FB network, is fixed to 10% (ratio of 1:(1+9)). In a more sophisticated which results in a simple function call (synchronous events) or FB network the DELAY values could change dynamically by pending of a corresponding task (asynchronous event) in the setting it from other FBs via output data connections. RTFM model (see presented example in section V). In conse- The units of the DELAY values are determined by the quence a chain of synchronous events, with an asynchronous source code of the SIFBs and can be changed according event at the beginning, is mapped to a task with specifiable to requirements. The PWM period and thus the maximum priority. frequency is dependent on the given duty cycle and the units for the DELAY values. In the illustrated solution the resolution V. E XAMPLE:SOFTWARE EMULATED PWM SIGNAL of the duty cycle is freely selectable, but affects the frequency. GENERATOR To show the advantage of the proposed mapping we give in this section an example of a common use case in embedded systems. It exemplifies that the proposed mapping to the light-weight RTFM-kernel provides efficient implementations with establishing real-time properties for resource limited controllers. The use case is designed to evaluate the feasibility and the performance of our approach for the particular problem at hand (PWM signal generation), hardware support may of course be available. However, it leads us to conclude that the implementation is extremely efficient, allowing for both implementing systems with hard real-time requirements, as well as low power applications, since the system can be put to sleep in between the occurrence of asynchronous events (such Fig. 2. FB network for software emulated PWM signal generator. as the timer events shown). Figure 2 shows a Function Block network for a simple PWM application which dims a LED. For a better overview A. IEC 61499 Events, Tasks & Priorities the initialization chain (wires between INIT and INITO event The select Function Block in the given example emits ports at the Function Blocks) is omitted. After the initialization synchronously confirmation (CNF) events. Listing 1 simplifies (INIT events have been processed) the control FB emits a the corresponding function in pseudo code. A transformation START event (in this case triggering FB delayOff). This leads to C code and RTFM-kernel primitives is simple and straight- to an (infinite) periodic triggering of the FB network. forward. Before the output data is copied to the wire, which is For interaction with the environment (i.e., the peripherals of a global resource, in general, if shared (i.e., a wire having mul- the microcontroller) there are two types of SIFBs, a timer SIFB tiple sources) the associated resource should be locked in order

128 to prevent a potential race condition. After writing all output a maximum rate of 500 kHz PWM frequency on the 100MHz data values associated with the corresponding emitting event, microcontroller. the event handler of the receiving FB is called synchronously. This means the sum of the DELAY values for the delayOn and delayOff blocks (i.e. the PWM period) must not be smaller Listing 1. synchronous event emission than the 200 processor cycles needed for the execution of the 1 function select_CNF_handler() whole FB network. Otherwise the timer interrupts for changing 2 LOCK(wireA) the signal interleave each other. 3 wireA = select.DO Yoong et al. pursue in [17] a similar approach. They propose 4 UNLOCK(wireA) a translation from FB networks to the synchronous language 5 call led_REQ_handler() Esterel. The resulting program does not need a run-time environment anymore for executing the FB network, but still An asynchronous event is handled as shown in listing 2. runs on top of a full-fledged operating system. Evaluating When the underlying timer hardware generates an interrupt, the performance (measuring the execution time) is due to the corresponding ISR delayOff Event handler() is pended the unpredictable scheduling behavior of the thread-based by the hardware (and eventually dispatched). In turn the operating system only statistically possible. For this reason ISR calls the receiving event handler (i.e., select REQ2 - the authors executed each network one million times to get an handler()). In effect, the chain of synchronous events up to average value. the FB LED are treated by the same ISR. Our approach also omits an operating system and we run Priorities for asynchronous events are given as part of the the FB network directly on the hardware without any involved event definition and their correct usage is ensured by running caching mechanisms on the used platform. For this reason the interrupt handler with the specified priority. we have no variances in the execution time and can infer the performance based on the measurement of one PWM Listing 2. asynchronous event handler period. Furthermore, the compiled program has a size of only 1 \\ executed by interrupt handler a few hundred bytes, compared to the typical program storage 2 task{prio:1} delayOff_Event_handler() usage of several megabytes by the available FB run-time 3 call select_REQ2_handler() environments. The results have formidably shown the performance of the B. Resources proposed mapping. In this way it is possible to run FB- networks on small embedded systems (IoT devices) without On arrival of an event the corresponding event handler of the the need of a resource intensive run-time environment. FB has to lock all needed resources. The REQ event handler of the FB led locks the FB itself as well as the data wire VI. TOOLCHAIN wireA in order to read the data and treat the incoming event We implemented the proposed theoretical mapping by de- (see listing 3). veloping a framework (RTFM-4-FUN) to perform automatic code generation from device level FB models to RTFM- Listing 3. incoming event handling 1 function led_REQ_handler() kernel based executables. IEC 61499 models (FBs and sys- 2 LOCK(led) tem descriptions) are created using the open source 4DIAC 3 LOCK(wireA) framework, where FB descriptions are annotated by input 4 led.VALUE = wireA event to output event relations. Output events can be annotated 5 UNLOCK(wireA) as being either asynchronous or explicitly synchronous. For 6 led.ID = 1 asynchronous events, priority and inter-arrival time can be 7 led.executeEventREQ() optionally set. The event annotations are used by the RTFM- 8 UNLOCK(led) 4-FUN tool to generate C code, which using the RTFM- kernel primitives can be executed on bare metal. The prototype implementation has been successfully tested on the ARM C. Evaluation Cortex M3 (LPC1769) and results validate its feasibility and We compiled the given FB network example for a LPCX- indicate its potential performance benefits. presso LPC1769 ARM Cortex M3 based development board. The microcontroller on the board runs at 100MHz and the VII. RELATED WORK hardware timer used in the SIFB TIMER32 Function Block As already mentioned, the IEC 61499 standard has already is a free running timer connected to the system clock of given rise to a number of tool chains, but as of now there exist the MCU. We counted the actual processor cycles for the no complete reference implementation [8]. The standardization execution of one PWM period (number of system clock steps), committee is currently working to further refine the specifi- which is exactly one iteration of the FB network. Setting the cation in order to clarify the execution semantics (especially highest C compiler optimization level leads to less than 200 at resource level). An interesting approach is found in [17], processor cycles for executing one PWM cycle. This results in taking the outset from synchronous languages. They propose a

129 tations of the execution semantics. As a main contribution of the paper, we discuss in detail the outset for light-weight scheduling and propose a theoretical mapping of device level FB models onto the RTFM-kernel for real-time execution. As based on SRP scheduling, the mapping provides an unam- biguous execution semantics to device level FBs as well as a plethora of available methods for offline analysis.

REFERENCES [1] “International Standard IEC61131-3: Programmable Controller - Part Fig. 3. Example system with a one shot timer (Timer), triggering a restart 3, Programming Languages,” Geneva, Switzerland: Int. Electrotech. of the Timer (START) and the reading of a hardware register (In) and that in Commission, p. 230, 1993. turn triggers the writing of another hardware register (Out). The Trig event is [2] “International Standard IEC 61499: Function Blocks - Part 1, Architec- asynchronous, with a priority of 1. The output event DR is given the priority ture,” Geneva, Switzerland: Int. Electrotech. Commission, 2005. 0, hence in relation, the sampling of data should be conducted at a higher [3] P. Levis, S. Madden, J. Polastre, R. Szewczyk, A. Woo, D. Gay, J. Hill, priority (less jitter) than the writing of data. M. Welsh, E. Brewer, and D. Culler, “TinyOS: An operating system for sensor networks,” in in Ambient Intelligence. Springer Verlag, 2004. [4] A. Dunkels, B. Grnvall, and T. Voigt, “Contiki - a lightweight and flexible operating system for tiny networked sensors,” in Proceedings Globally Asynchronous Local Synchronous (GALS) execution of the First IEEE Workshop on Embedded Networked Sensors model, where (instead of delivering events in a sequential man- (Emnets-I), Tampa, Florida, USA, Nov. 2004. [Online]. Available: ner) all events are made available to all function blocks. Within http://dunkels.com/adam/dunkels04contiki.pdf [5] K. Thramboulidis, “IEC 61499: Back to the well proven practice of IEC a FB network execution is concurrent and locally synchronized 61131,” in Emerging Technologies Factory Automation (ETFA), 2012 in terms of ticks. The execution model is formalized which IEEE 17th Conference on, Sept 2012, pp. 1–8. opens up for reasoning and analysis. The approach differs [6] J. Eriksson, F. Haggstrom, S. Aittamaa, A. Kruglyak, and P. Lindgren, “Real-time for the masses, step 1: Programming API and static priority from ours in the sense that they deliberately stepped aside SRP kernel primitives.” in SIES. IEEE, 2013, pp. 110–113. [Online]. from the IEC 61499 execution semantics. Yoong [19] also Available: http://dblp.uni-trier.de/db/conf/sies/sies2013.html propose an instantaneous execution model for FBs similar to [7] T. Baker, “A stack-based resource allocation policy for realtime pro- cesses,” in Real-Time Systems Symposium, 1990. Proceedings., 11th, our proposal. However, we approach the problem from a real- Dec. 1990, pp. 191 –200. time scheduling perspective, identifying tasks/priorities and [8] J. Christensen, T. Strasser, A. Valentini, V. Vyatkin, A. Zoitl, resources. J. Chouinard, H. Mayer, and A. Kopitar, “The IEC 61499 Function Block Standard: Software Tools and Runtime Platforms,” in ISA Automation Week, 2012. [Online]. Available: VIII. ONGOING AND FUTURE WORK http://www.holobloc.com/papers/iec61499/61499 SWT RTP.pdf Future work includes detailing the implementation tech- [9] IEC standards. (2014, Mar.) International Electrotechnical Commission. [Online]. Available: http://www.iec.ch/ niques (that involves specialization of FB functions) and [10] Holobloc Inc. (2014, Mar.) Holobloc. [Online]. Available: assess performance on real use cases, e.g. the exploitation http://www.holobloc.com/ of RTFM-4-FUN into timing critical, wireless, distributed [11] PROFACTOR GmbH. (2014, Mar.) 4DIAC. [Online]. Available: http://www.fordiac.org/ monitoring and control applications. The underlying execu- [12] A. Zoitl, Real-Time Execution for IEC 61499. Durham, North Carolina: tion model opens up for offline analysis, e.g., response time International Society of Automation, 2008. and overall schedulability (useful e.g., to validating real-time [13] nxtControl GmbH. (2014, Mar.) nxtStudio. [Online]. Available: http://www.nxtcontrol.com/ performance). Under the RTFM-kernel all tasks are running on [14] (2014, Mar.) ISaGRAF. [Online]. Available: http://www.isagraf.com/ a shared (execution) stack. To derive safe and tight maximum [15] L. Yoong, Z. Bhatti, and P. Roop, “Combining iec 61499 model-based stack requirement is another topic set for future investigation design with component-based architecture for robotics,” in Simulation, Modeling, and Programming for Autonomous Robots, ser. Lecture Notes based on previous work [20]. in Computer Science, I. Noda, N. Ando, D. Brugali, and J. Kuffner, Synchronous events introduced in this text, does not allow Eds. Springer Berlin Heidelberg, 2012, vol. 7628, pp. 349–360. data to be returned synchronously. As seen in the example in [Online]. Available: http://dx.doi.org/10.1007/978-3-642-34327-8 32 [16] L. H. Yoong, P. Roop, and Z. Salcic, “Efficient implementation of iec Figure 3, an asynchronous communication is needed to read 61499 function blocks,” in Industrial Technology, 2009. ICIT 2009. IEEE the hardware register. We will further investigate the impli- International Conference on, Feb 2009, pp. 1–6. cations of extending the IEC 61499 standard by synchronous [17] L. H. Yoong, P. S. Roop, V. Vyatkin, and Z. Salcic, “A Synchronous Approach for IEC 61499 Function Block Implementation,” IEEE Trans- data retrieval. Moreover, in the example, the re-start of Timer actions on Computers, vol. 58, no. 12, pp. 1599–1614, 2009. would suffer from accumulated drift (due to scheduling jitter), [18] P. Lindgren, M. Lindner, A. Lindner, J. Eriksson, and V. Vyatkin, this could be eliminated by associating events with time “RTFM-4-FUN,” in SIES. IEEE, 2014. [19] L. Yoong, Modelling and Synthesis of Safety-critical Software stamps, another topic set for subsequent investigation. with IEC 61499. Electrical and Electronic Engineering - University of Auckland, 2010. [Online]. Available: IX. CONCLUSIONS http://books.google.se/books?id=yHvYZwEACAAJ [20] J. Eriksson, S. Aittamaa, and P. Lindgren, “Refined SRP Stack Memory In this paper we have discussed the IEC 61499 standard Analysis by Exploiting Critical Sections for Shared Resources,” in with a focus on scheduling FB based models onto resource Proceedings of the Embedded Real-Time Software and Systems - ERTS2, constrained platforms. We highlight a number of known am- 2014. biguities in the specification, and discuss alternative interpre-

130 Paper I Leveraging TinyOS for Integration in Process Automation and Control Systems

Authors: Per Lindgren, Henrik M¨akitaavola, Johan Eriksson, Jens Eliasson

Originally published in: 38th Annual Conference on IEEE Industrial Electronics Society (IECON 2012)

c 2012, IEEE

131 132                   

$ $ $   !$ $ $ $ $   $ $  $ $  $ $ $ $  $ ! #$ $ #$     $ "$

)+(+3 #-"'3 #3 $"%!0 ,13 $3 #,/$' 3 *#*$'*3 kmju\PQP}mjNsop}dQpvjm_\dZ}OLkLN`Q}jR}LPPmQoo\dZ}L}`LmZQ}dsb{ #3 ,-,$'*3 #3 #-*,' !3 "$# ,$' #3 #3 $#,'$!3 *1*,"*3 *3 NQm} jR} \dPsopm\L`} bjd\pjm\dZ} LdP} Ojdpmj`} Lkk`\OLp\jdo } B[QoQ} '% !13 #'* #3  *3 !!*3 $'3 0 !3 1,3  #,3 ",$*3 pykQo} jR} pQO[dj`jZ\Qo} [LuQ} NQQd} soQP} \d} p[Q} \dPsopmy} \d} ojbQ} /',3 , "3 #3 "$#13 $'3 * # #3 %!$1 #3 #3 " #, # #3 OLoQo}Rjm}bjmQ}p[Ld}p[\mpy}yQLmo}LdP}[LuQ}\d}Ojbbjd}p[Lp}p[Qy} *-3*1*,"*3 $3, *3 #3 '. 3 ' #,3 ' ,,-'*3  *3 #3 / '!**3 ,#$!$ *3 '3 $'*#3 ,$3 %!13 "%$',#,3 '$!*3 LmQ} PQo\ZdQP} pj}pmLdoRQm}obL``}Lbjsdp} jR}oQdojmy} \dRjmbLp\jd} #3 ,3 '3 $3  '!**3 #*$'3 ,/$' *3  *3  #13 jSqQd} v\p[} p\bQ OjdopmL\dpo} juQm} L} o\bk`Q} OLN`Q} \dopL``Lp\jd } 3 *3  #3 / 3 *%'3 -*3 " #!13 -*3 ,3 $'*3 3 /jvQuQm} p[QoQ} pQO[dj`jZ\Qo} [LuQ} djp} NQQd} PQo\ZdQP} Rjm} p[Q} * "%!3 %'$'"" #3 "$!3 $'$.'3 3 $"*3 / ,3 3 '13 pykQ} jR}ZQdQmL` ksmkjoQ}B(=30=} Ojisd\OLp\jd} kmQRQmmQP} Rjm} "3 $3 *3 3 %'$,$$!3 *, *3 #3 ,$3 "%!"#,3  3 A;$} v[\O[} bQLdo} p[Lp} b\ZmLp\jd} \o} djp} opmL\Z[p} RjmvLmP} #!3 . *3 #3 *3 . !!3 $'3 3 !'3 #-"'3 $3 ! ,2 / ,3 ,',3 %!,$'"*3 $/.'3 3 *3 1,3 ,$3 " 3 ,*3 F\mQ`Qoo} oj`sp\jdo} osO[} Lo} F\mQ`Qoo} oQdojm} dQpvjm_o} LdP} /13 #,$3 #-*,' !3 %%! , $#*3 /'3 '!, "3 $%', $#3 *3 F\-\} [LuQ} djv} mLk\P`y} ZL\dQP} \dpQmQop} LdP} LOOQkpLdOQ} Xjb} '&- '3 / 3 *3 ,1% !3 ,$3 "$# ,$' #3 #3 $#,'$!3 *1*,"*3 p[Q}\dPsopmy } F\mQ`Qoo}*p[QmdQp} F\-\} LdP} F\mQ`Qoo/$?B} LmQ} *3  #3 * #3 %' "' !13 / ,3 * "%!  ,13 #3 " #3 ,3 3 pvj} pQO[dj`jZ\Qo} p[Lp} [LuQ}NQQd} v\PQ`y}LPjkpQP} Rjm}\dPsopm\L`} 0-, $#3 "$!3 $'3 ,* *3 *3 #$#%'"%, .3 ! " , #3 *1*,"3 soQ } F\-\} v\p[}NLdPv\Pp[o}sk}pj}oQuQmL`}[sdPmQP} 8N\ro} OLd} '*%$#* .#**3 #3 *-! ! ,13 $3 $.'$"3 , *3 %'$!"3 %'"%, .3 '*3 *3 #3 #,'$-3 $/.'3 , *3 oskkjmp} u\mpsL``y} Ldy} oQmu\OQ } /jvQuQm} F\-\} \o} djp} os\pLN`Q} #,'$-*3 ,3  , $#!3 $"%!0 ,13 $3 ,' , $#!3 "-!, ,'3 Rjm} \dPsopm\L`} bjd\pjm\dZ} LdP} Ojdpmj`} \d} \po} OsmmQdp} PQo\Zd } %'$'"" #3 ,-*3 ,3 " #3 #,3 $3 3 *3 !$*,3 #3 , *3 F\mQ`Qoo/$?B} v\p[} Ns\`p \d} oskkjmp} Rjm} bQo[} dQpvjm_o} Qw{ %%'3 /3 %'*#,3 #3 !,'#, .3 0-, $#3 "$!3 $'3 3 ,,3 pQdPo}p[Q}/$?B}kmjpjOj`}\dpj}p[Q}v\mQ`Qoo}PjbL\d }$``}RQLpsmQo} !!$/*3 %'"%, .3 0-, $#3 / !3 %'*'. #3 ,3 * "%!  ,13 $3 Xjb}p[Q}/$?B} okQO\VOLp\jd}OLd}NQ}mQ soQP}\d}v\mQ`Qoo/$?B} . #,3 3 3 0"%! 13 ,3 "%,3 $3 *-! #3 ,$3 3 ,1% !3 *#*$',-,$'3 #$3 *#' $3 -'3 '*-!,*3 # ,3 ,,3 ,3 v[\O[} QdLN`Qo} L} oQLb`Qoo} pmLdo\p\jd} Xjb} v\mQP} pj} v\mQ`Qoo } %'$%$*3 %'"%, .3 0-, $#3 "$!3 *3%!3 $3 '- #3 $,3 ;p[Qm} pQO[dj`jZ\Qo} LmQ} I\Z%QQ} LdP} 7jF=$9} v[\O[} L`oj} !13 #3 '$%3 ',3 $'3 ,3  .#3 *#' $3 RQLpsmQo} P\mQOp} oskkjmp} Rjm} bQo[} dQpvjm_o} OLd} NQ} Ojdo\PQmQP} Lo} OLdP\PLpQo} Rjm} A;$} QdLN`QP} PQu\OQo } %y} so\dZ} opLdPLmP{ 0 } +3964$:"9*43i \zQP} 0dpQmdQp} kmjpjOj`o} osO[} Lo} B(=30=} \dpQmjkQmLN\`\py} v\p[} B[Q} dsbNQm} LdP} Ojbk`Qw\py} jR} dQpvjm_QP} oQdojmo} LdP} LO{ Qw\op\dZ} XLbQvjm_o} v\``} NQ} kjoo\N`Q } psLpjmo}\d}\dPsopm\L`}bjd\pjm\dZ}LdP}Ojdpmj`}oyopQbo}\o} mLk\P`y} 0d} p[Q} OLoQ} jR} pmLP\p\jdL`} dQpvjm_QP} PQu\OQo} v[QmQ} StdO{ \dOmQLo\dZ } B[\o} OL``o} Rjm} WQw\N`Q} yQp} QRVO\Qdp} bQp[jPo} v m p } p\jdL`\py}Nj\`o}Pjvd}pj}0 ;}LdP}VwQP}mLpQ}PLpL}pmLdoRQm}o\bk`Q} p\bQ} LdP} bjdQy} Rjm} PQo\Zd\dZ} PQk`jy\dZ} LdP} bL\dpL\d\dZ} LkkmjLO[Qo} pj}ojSqvLmQ} PQo\Zd} \bk`QbQdpLp\jd}LdP} uL`\PLp\jd} osO[} oyopQbo } Bj} p[\o} QdP} AQmu\OQ} ;m\QdpQP} $mO[\pQOpsmQo} [Lo}NQQd}osOOQooRs``y}Lkk`\QP }/jvQuQm}RsdOp\jdL`\py}Rjm}A;$} A;$o} LdP} v\mQ`Qoo} pQO[dj`jZ\Qo} LmQ} RjmQoQQd} pj} k`Ly} \b{ QdLN`QP} kjoo\N`y} v\mQ`Qoo} PQu\OQo} \o} [sZQ`y} bjmQ} Ojbk`Qw} kjmpLdp} mj`Qo } Q Z } oQmu\OQ} kmjOQoo\dZ} LdP} oQOsm\py} oO[QbQo } F[QmQLo} p[Q} BmLP\p\jdL``y} RsdOp\jdL`\py} [Lo} NQQd} OQdpmL`\zQP} v[QmQ} RsdOp\jdL`\py} jR} pmLP\p\jdL`} dQpvjm_QP} PQu\OQo} \o} pyk\OL``y} djPQo} LOp} Lo} P\opm\NspQP} 0 ;} v\p[} mQ`Lp\uQ`y} o\bk`Q} jkQmLp\jd } kQm\jP\OL`}Q Z }oLbk`\dZ}vL\p\dZ}Rjm}o`jp}oQdP\dZ}oLbk`\dZ} /jvQuQm} p[Q} \dpmjPsOp\jd} jR} oQmu\OQ} jm\QdpQP} LmO[\pQOpsmQo} } }}A;$} oQmu\OQo} LdP} PQu\OQo} LmQ} pyk\OL``y} QuQdp} Pm\uQd} Ny} A;$o} op\ks`LpQo} Ojbk`Qw} Lspjdjbjso} NQ[Lu\jm} jR} p[Q} PQ{ dLpsmQ} mQokjdP\dZ} pj} L} mQlsQop QuQdp} Ny} oLbk`\dZ} L} uL`sQ} u\OQo }(jbk`Qw\py}\o}Rsmp[Qm}\dOmQLoQP}\d}OLoQo}v[QmQ}kmjpjOj`o} LdP} oO[QPs`Q} p[Q} PLpL} Rjm} pmLdob\oo\jd} jm} QuQd} pL_\dZ} L} `jOL`} Rjm}oQOsmQ}Ojbbsd\OLp\jd}LmQ}mQls\mQP} Q Z } v[Qd}\dpmjPsO\dZ} PQO\o\jd} pj} Qb\p} Ld} QuQdp} NLoQP} jd} oQdojm} \dkspo} jp[Qm} v\mQ`Qoo} `\d_o } HQp} p[Q} oQdojm LOpsLpjm} jkQmLp\jd} o[js`P} bQQp} QuQdpo} LdP jm} OL`Os`LpQP} uL`sQo} J K } mQls\mQbQdpo} jR} mjNsopdQoo} kQmRjmbLdOQ} LdP} mQojsmOQ} `\b\pL{ 0d}p[Q}OLoQ}jR}ZQdQmL`}ksmkjoQ}OjbkspQmo}osO[}RsdOp\jdL`\py} p\jdo } 0d} JK} (j`jbNj} Qp } L` } `Lyjsp} L} A;$} Rjm} \dPsopm\L`} soQ} LmQ}pmLP\p\jdL``y}OLkpsmQP}Ny}bQLdo}jR}OjjkQmLp\jd}\d}NQpvQQd} LdP} OsmmQdp} opLpQ} jR} p[Q} Lmp} pQO[dj`jZ\Qo} LmQ} Rsmp[Qm} P\oOsooQP} p[Q} jkQmLp\dZ} oyopQb} \bk`QbQdp\dZ} oO[QPs`\dZ lsQsQo} QpO } \d} JK } p[Q} PQu\OQ} Pm\uQmo} `jv `QuQ`} 0 ;} p[Q} kmjpjOj`} opLO_o} bLd{ ;`PQm} pQO[dj`jZ\Qo} osO[} Lo"} /$?B} JK} $A0} JK} =mjVNso} LZ\dZ} kLO_Qp\dZ \dpQmkmQpLp\jd} jR} PLpL} LdP} p[Q} oQmu\OQoo } 0d} JK}8jPNso} JK}7jdvjm_o} JK}LdP}jp[Qmo}[LuQ}Rjm}L}`jdZ}p\bQ} osO[} oQpp\dZo} OjdOsmmQdOy} \o} sosL``y} Qwk`\O\p`y} OjPQP} \d} p[Q}

133 ojYvLmQ} so\dZ} p[mQLP\dZ} LdP} oydO[mjd\zLp\jd} km\b\p\uQo } \dP\OLpQ} p[Lp}jsm} bjPQ`} \o} Lo} ZjjP} jm}NQppQm}v m p } kLO_Qp} `joo} /jvQuQm} p[QmQ} LmQ} p[mQQ} bL^jm} PmLvNLO_o} Vmop`y} OjP\dZ} PsQ} pj} PmjkkQP} pLo_o} p[Qd} Ldy} kjoo\N`Q} kLmp\p\jd\dZ} so\dZ} Qwk`\O\p} OjdOsmmQdOy} \o} _djvd} pj} NQ} Qmmjm} kmjdQ} LdP} p\bQ} p[Q} pmLP\p\jdL`} B;A} bjPQ` } FQ} OjdO`sPQ} p[Lp} jsm} LkkmjLO[} Ojdosb\dZ} } oQOjdP`y} mjNsopdQoo} LdP} kQmRjmbLdOQ} ZsLmLdpQQo} [Lo} p[Q} kjpQdp\L`} pj} kmju\PQ} mQL` p\bQ} kQmRjmbLdOQ} jR} bs`p\} bLy}NQ}[LmP}pj}QopLN`\o[} LdP}p[\mP`y}mQojsmOQ}`\b\pLp\jdo}bLy} p[mQLPQP} LkkmjLO[Qo} o\b\`Lm} pj} JK} JK} v[\`Q} juQmOjb\dZ} NQ} u\j`LpQP } p[Q\m}LoojO\LpQP}mQojsmOQ}LdP}Ojbk`Qw\py}\oosQo}v[Qd}PQk`jyQP} B[Q}LmQL} jR}F\mQ`Qoo}AQdojm}9Qpvjm_o}FA9}o[LmQo} ojbQ} jdpj} `\Z[p vQ\Z[p} k`LpRjmbo } jR} p[Q} mQls\mQbQdpo} P\oOsooQP} QuQdp} Pm\uQd} dLpsmQ} Ojbk`Qw} 00 } B1:G;A} #43":66%3"@i 3$i &?%":9*43i14$%0i Ojbbsd\OLp\jd} kmjpjOj`o} LdP} `\Z[pvQ\Z[p} oyopQb} \bk`QbQd{ pLp\jd } Bj} p[\o} QdP} L} dsbNQm} jR} kmjZmLbb\dZ} bjPQ`o} LdP} 0p}\o}vQ``}_djvd}p[Lp}OjdOsmmQdOy}bLy}NQ}sp\`\zQP}pj}\bkmjuQ} jkQmLp\dZ} oyopQbo} [LuQ} NQQd} PQuQ`jkQP} Q Z } B\dy;AB;A} oyopQb} oO[QPs`LN\\py} LdP} mQokjdo\uQdQoo} [jvQuQm} QlsL``y} 'jdp\_\} 8Ldp\o} 9Ldj ?4} 6\pQ;A} oQQ} J!K} Rjm} Ld} juQmu\Qv } vQ``} _djvd} \o} p[Lp} OjdOsmmQdp} oyopQbo} bLy} NQ} osN^QOp} pj} $bjdZ} p[joQ} 9Ldj ?5}LdP} 7\pQ;A} kmju\PQo} mQL` p\bQ} jkQm{ mLOQ} OjdP\p\jdo} v[\O[} \d}psmd} OLsoQo}sdQwkQOpQP} LdP} jm bL`{ Lp\jd}Ny} bQLdo} jR}p[mQLP\dZ}km\b\p\uQo} v[QmQLo}p[Q}kmjZmLb{ RsdOp\jdL`} NQ[Lu\jm } 0d} p[Q} OLoQ} jR} `\Z[pvQ\Z[p} k`LpRjmbo} \p} b\dZ} bjPQ`o} jR}B\dy;A}LdP}'jd}p\_\}oLOm\VOQo}ojbQ} kjpQdp\L`} \o} L`oj} jR} \bkjmpLdOQ} p[Lp} p[Q} QwQOsp\jd} bjPQ`} O[joQd} OLd} OjdOsmmQdOy} Q Z } djd kmQQbkp\uQ} pLo_} QwQOsp\jd} \d} RLujm} jR} NQ} QRVO\Qdp`y} \bk`QbQdpQP } B[QoQ} LokQOpo} vQmQ} pL_Qd} \dpj} o\bk`\O\py } B;A}pjPLy}[j`Po}L}sd\lsQ}d\O[Q}kjo\p\jd}Lo} jdQ}jR} Ojdo\PQmLp\jd} v[Qd} PQo\Zd\dZ} p[Q} B\dy;A} QwQOsp\jd} bjPQ`} p[Q} `QLP\dZ} bjPQ`o} Rjm} FA9} kmjZmLbb\dZ } p[Q} dQo'} kmjZmLbb\dZ} `LdZsLZQ} LdP} OjbkjdQdp} bjPQ`} LdP} 0d} JK} )sTy} Qp } L` } mQkjmpo} jd} kmLOp\OL`} QuL`sLp\jdo} jR} OsmmQdp} B\dy;A} msd p\bQ} oyopQb} \dopL``bQdpo } bs`p\ p[mQLPQP} uo } QuQdp} Pm\uQd} oQdojm} djPQ} ;kQmLp\dZ} Ayo{ B\dy;A}  w} sdPQmpL_Qo} L} o\bk`\op\O} QwQOsp\jd} bjPQ`} \d{ pQbo } $o} L} vjm_\dZ} QwLbk`Q} B\dy;A} \o} OjbkLmQP} pj} 8Ldp\o } [Qm\pQP} Xjb} \po} kmQPQOQoojm} Vmop} uQmo\jd} B\dy;A}  w} Qd{ *uL`sLp\jdo} o[jv} p[Lp} 8Ldp\o} PsQ} pj} WQw\N`Q} oO[QPs`\dZ} OjbkLoo\dZ} djd kmQQbkp\uQ} msd pj Ojbk`Qp\jd} pLo_o} LdP} kmQ{ kmju\PQo} oskQm\jm} mQL` p\bQ} kQmRjmbLdOQ} v[\`Q} B\dy;A} \o} Qbkp\uQ}\dpQmmskp} [LdP`Qmo} p[Lp}LmQ}P\okLpO[QP}LoydO[mjdjso`y} bjmQ} kjvQm} QRVO\Qdp } 0d} JK} ?QZQ[m} Qp } L` } \duQop\ZLpQo} [\Qm{ Ny} p[Q} sdPQm`y\dZ} [LmPvLmQ } B[\o} L``jvo} p[Q} oO[QPs`\dZ} jR} LmO[\OL`} oO[QPs`\dZ} Lkk`\QP} pj} B\dy;A} mQos`p\dZ} \d} \bkmjuQP} QwpQmdL`} QuQdpo} pj} NQ} QUO\Qdp`y} \bk`QbQdpQP} P\mQOp`y} Ny} p[Q} mQL` p\bQ} kQmRjmbLdOQ } /jvQuQm} p[Q\m} LkkmjLO[} \o} mQls\mQo} [LmPvLmQ }

LPP\p\jdL`} mQojsmOQo} Lo} QLO[} B;A} pLo_} \o} msd} Lo} L} oQkLmLpQ} 6 "'46 /"16 ('0**'46 kmQ Qbkp\uQ} p[mQLP } A\b\`Lm`y} B;AB[mQLPo} JKB*=} [Lo} BLo_o} LmQ} Ld} Qwk`\O\p} Qdp\py} \d} p[Q} `LdZsLZQ#} L} kmjZmLb} NQQd} osZZQopQP} Lo} L} p[mQLP\dZ} oskkjmp} QwpQdo\jd } /jvQuQm} osNb\po} L} pLo_} pj} p[Q} oO[QPs`Qm} Rjm} QwQOsp\jd} v\p[} p[Q} )(./6 B;AB[mQLPo} LPPo} LPP\p\jdL`} Ojbk`Qw\py} pj} p[Q} o\bk`Q} LdP} jkQmLpjm } B[Q} oO[QPs`Qm} OLd} QwQOspQ} pLo_o} \d} Ldy} jmPQm} Nsp} opmL\Z[p} RjmvLmP} kmjZmLbb\dZ} bjPQ`} jR} B\dy;A } bsop} jNQy} p[Q} msd pj Ojbk`Qp\jd} ms`Q } BLmZQp} pj} jsm} vjm_} \o} pj} \bkmjuQ} oO[QPs`\dZ} WQw\N\`\py} jR} %QOLsoQ} pLo_o} LmQ} djd} kmQQbkp\uQ} LdP} msd pj Ojbk`Qp\jd} B\dy;A} v[\`Q} kmQoQmu\dZ} p[Q} o\bk`\O\py} jR} kmjZmLbb\dZ} Lo} p[Qy}LmQ}Lpjb\O} v\p[}mQokQOp}pj}QLO[}jp[Qm } /jvQuQm}pLo_o}LmQ} vQ``} Lo} mQpL\d\dZ} mQojsmOQ} LdP} kjvQm} QUO\Qdp} QwQOsp\jd } djp} Lpjb\O} v\p[} mQokQOp} pj} \dpQmmskp} [LdP`Qmo} jm} pj} OjbbLdPo} 0d}p[\o}kLkQm}vQ}LdL`yzQ}p[Q}OjdOsmmQdOy}bjPQ`}LdP}OsmmQdp} LdP}QuQdpo} \dP\mQOp`y} \duj_QP} Xjb} \dpQmmskpo } Bj}RLO\`\pLpQ}p[Q} \bk`QbQdpLp\jd} jR} B;A} LdP} \PQdp\Ry} pvj} bL\d} Njpp`QdQO_o} PQpQOp\jd} jR} mLOQ} OjdP\p\jdo} p[Q} B\dy;A} bjPQ`} P\op\dZs\o[} Rjm}OjdOsmmQdOy}dLbQ`y}djd kmQQbkp\uQ}pLo_}QwQOsp\jd}bjPQ`} oydO[mjdjso} LdP} LoydO[mjdjso} OjPQ} JK} JK"} LdP} Z`jNL`} Lpjb\O} oQOp\jd} \bk`QbQdpLp\jd } $o}L}OjdoQlsQdOQ} 4'!,'(0.6 (6  6 OjPQ} RsdOp\jdo} Ojb{ \d} jmPQm} pj} LO[\QuQ} mQokjdo\uQdQoo} LdP} mQL` p\bQ} kQmRjmbLdOQ} bLdPo} QuQdpo} pLo_o} p[Lp} \o} jd`y} mQLO[LN`Q} Rmjb} so\dZ} B;A} pLo_o} bsop} NQ} kLmp\p\jdQP} pj} Luj\P} N`jO_\dZ } FQ} pLo_o } kmjkjoQ} Ld} L`pQmdLp\uQ} QwQOsp\jd} bjPQ`} B;A =?;} L``jv\dZ} .4'!,'(0.6 (6  6 OjPQ} p[Lp} \o} mQLO[LN`Q} kmQ Qbkp\uQ}pLo_o}LdP}kmQoQdp}p[Q}OjdP\p\jdo}Rjm}mLOQ}XQQ}QwQ{ Xjb} Lp} `QLop} jdQ} \dpQmmskp} [LdP`Qm } Osp\jd }0d}OjdpmLop}pj}p[Q}QwpQdo\jd}B;AB[mQLPo}JKB*=} v[\O[} LPPo} p[Q} Ojbk`Qw\py} jR} p[mQLPQP} kmjZmLbb\dZ} jsm} 0d}ZQdQmL`}Ldy}skPLpQ}pj}o[LmQP}opLpQ}p[Lp}\o}mQLO[LN`Q} Xjb} kmjkjoQP} bjPQ`} _QQko} p[Q} o\bk`\O\py} LdP} Q`QZLdOQ} jR} p[Q} $'} \o} L} kjpQdp\L`} PLpL} mLOQ } Bj} mQ\dopLpQ} Lpjb\O\py} \d} osO[} jm\Z\dL`} B;A} bjPQ` } ;sm} bjPQ`} L``jvo} Lpjb\O} oQOp\jdo} pj} NQ} OLoQo} p[Q} kmjZmLiQm} [Lo} pvj} jkp\jdo"} OjduQmp} L``} jR} p[Q} \bk`QbQdpQP} so\dZ} LmN\pmLmy} ZmLds`Lm\py} L``jv\dZ} pj} Rsmp[Qm} OjdW\Op\dZ} OjPQ} pj} pLo_o} A'} jd`y} jm} soQ} Lpjb\O} oQOp\jdo} \bkmjuQ}mQL` p\bQ}kQmRjmbLdOQ } FQ}P\oOsoo}[jv}p[Q}QwQOsp\jd} pj} LOOQoo} p[Q} o[LmQP} opLpQ } B[so} dQo'} uQmo\jd}  } JK} \o} bjPQ`} OLd} NQ} QUO\Qdp`y} \bk`QbQdpQP} jdpj} [LmPvLmQ} k`Lp{ PQo\ZdQP} pj} QdRjmOQ"} Rjmbo}sp\`\z\dZ}VwQP}km\jm\py}oO[QPs`\dZ}[LmPvLmQ}\R}kmQoQdp }  *6 '1*"'/6 F 0R} L} uLm\LN`Q}  \o} vm\ppQd} \d} 8jmQjuQm} vQ} P\oOsoo} [jv} p[Q} \bk`QbQdpLp\jd} OLd} NQ} ksp} $'} p[Qd} L``} LOOQooQo} pj}  bsop} jOOsm} \d} Lpjb\O} pj} opLp\O} LdL`yo\o} Rjm} PQpQmb\d\dZ} '=D} LdP} bQbjmy} soLZQ } oQOp\jdo } Bj} \dP\OLpQ} p[Q} \bkLOp} jR} p[Q} kmjkjoQP} QwQOsp\jd} bjPQ`} vQ}  *6 '1*"'/6 F 0R}L}uLm\LN`Q}  \o}mQLP}\d}$'} o_QpO[} L} pyk\OL`} oOQdLm\j} jR} OjdOsmmQdp} [\Z[} Q Z } kLO_Qp} p[Qd} L``} vm\pQo} pj}  bsop} jOOsm} \d} Lpjb\O} oQOp\jdo } RjmvLmP\dZ}LdP}`jv}km\jm\py}pLo_o}NLO_ZmjsdP}kmjOQoo\dZ}jd} -smp[QmbjmQ} p[Q} dQo'} Ojbk\`Qm} \o} PQo\ZdQP} pj} QdosmQ} p[Lp} L}kj\oojd P\opm\NspQP}QuQdp}opmQLb}\dOjb\dZ}kLO_Qpo } ?Qos`po} Lpjb\O} oQOp\jdo} LmQ} opm\Op`y} [\QmLmO[\OL` }

134 F &)%&'//"('6 (6"'46 B[Q} ZjL`} jR} B;A =?;} \o} pj} \bkmjuQ} OjdOsmmQdOy} v[\`Q} bL\dpL\d\dZ} mLOQ} XQQ} jkQmLp\jd} jR} B;A} kmjZmLbo} \ Q } Rs`{ 0d} jmPQm} pj} PQL`} v\p[} mLOQ} OjdP\p\jdo} p[Lp} RmQlsQdp`y} jO{ V``\dZ}p[Q}mLOQ} XQQ} \duLm\Ldpo} Lo} Rsmp[Qm} P\oOsooQP} \d} AQOp\jdo} OsmmQP} \d} u w} B\dy;A} u w} OjbQo} v\p[} L} OjbkjdQdp} bjPQ`} 000 $}LdP}000 % } 0d}AQOp\jd}000 (} vQ}P\oOsoo}L}OLdP\PLpQ}pQO[{ LdP} Ojbk\`Qm} dQo(} L\P\dZ} p[Q} kmjZmLbbQm} pj} PQuQ`jk} mLOQ} d\lsQ} Rjm} \bk`QbQdp\dZ} B;A =?;} jdpj} `\Z[pvQ\Z[p} k`LpRjmbo} RmQQ} OjPQ} Ny} QdRjmO\dZ} oOjk\dZ} ms`Qo} LdP} \oos\dZ} vLmd\dZo} so\dZ} p[Q} ?*5;} RmLbQvjm_ } jd} kjpQdp\L`} PLpL} mLOQ} [LzLmPo } B[Q} OsmmQdp} dQo(} JK} Ojb{ B[Q} \dOmQLoQP} OjdOsmmQdOy} [Lo} p[Q} kjpQdp\L`} pj} \bkmjuQ} k\`Qm} P\opm\Nsp\jd} PjQo} L} RL\m} ^jN} Lp} opLp\OL``y} PQpQOp\dZ} mLOQo } oyopQb}mQokjdo\uQdQoo}pj}QwpQmdL`}QuQdpo}`\b\p\dZ}p[Q}dQZLp\uQ} /jvQuQm} \d} OLoQ} jR} Ojbk`\OLpQP} kLppQmdo} okLdd\dZ} bs`p\k`Q} \bkLOp}jR}Lpjb\O}oQOp\jdo}LdP}pj}\bkmjuQ}mQL` p\bQ}QwQOsp\jd} Lpjb\O} oQOp\jdo} mLOQ} PQpQOp\jd} b\Z[p} RL\` } 8jmQjuQm} p[Q} jR} pLo_o } Bj} p[\o} QdP} vQ} dQQP} LPP\p\jdL`} \dRjmbLp\jd} pj} OsmmQdp}P\opm\Nsp\jd}bLy}\oosQ}RL`oQ}kjo\p\uQo}v[Qd}\p}OjbQo}pj} km\jm\p\zQ} QwQOsp\jd } Byk\OL``y} QwpQmdL`} QuQdpo} LmQ} OLkpsmQP} \PQdp\Ry\dZ} kjpQdp\L`} mLOQo } -jm} p[\o} kmQoQdpLp\jd} vQ} LoosbQ} p[mjsZ[} p[Q} \dpQmmskp} [LmPvLmQ} sosL``y} v\p[} LoojO\LpQP} km\{ B\dy;A} bjPQ`o} LOOjmP\dZ} pj} p[Q} mLOQ} XQQ} \duLm\Ldpo } jm\p\Qo } AsO[} \dpQmmskp} km\jm\p\Qo} LmQ} djp} kLmp} jR} p[Q} B;A} B[Q} opLdPLmP} B;A} oO[QPs`Qm} Rj``jvo} L} -0-;} kj`\Oy} pj} \b{ kmjZmLbb\dZ} bjPQ`} kQm} oQ} Nsp} \o} soQP} \d} kmLOp\OQ} pj} Ojdpmj`} k`QbQdp} p[Q} B\dy;A} u w} pLo_} oQbLdp\Oo } $pjb\O}oQOp\jdo} LmQ} kmQOQPQdOQ} jR} \dpQmmskp} [LdP`\dZ } F\p[} mQokQOp} pj} pLo_o} p[Q} Ny}OsmmQdp}msd p\bQ}oyopQb} \bk`QbQdpLp\jdo} QdRjmOQP} p[mjsZ[} NLo\O PQRLs`p} -0-;} oO[QPs`Qm} PjQo} djp} \dOjmkjmLpQ} km\jm\p\Qo } Z`jNL``y} P\oLN`\dZ} \dpQmmskpo } /jvQuQm} Lo} P\oOsooQP} QLm`\Qm} p[Q} B;A} kmjZmLbb\dZ} bjPQ`}

F 3/'."('.6 L``jvo} Osopjb\zQP} oO[QPs`Qmo}pj}NQ}PQk`jyQP} Q Z } *)-}B*=} } } LdP} bjdjpjd\O} oO[QPs`Qmo} v[\O[} ksp} pj} soQ} \d} B;A} B[Q} NLo\O} B\dy;A} QwQOsp\jd} bjPQ`} djd kmQQbkp\uQ} -0-;} kmjZmLbo} OLd} Z\uQ} so} \dRjmbLp\jd} dQQPQP} Rjm} km\jm\p\z\dZ} oO[QPs`\dZ} jR} pLo_o} \bkjoQo} L} o\Zd\VOLdp} `\b\pLp\jd} pj} QwQOsp\jd } oO[QPs`\dZ}jR}mQL` p\bQ}Lkk`\OLp\jdo}v[QmQ}pLo_o}LmQ}osN^QOp}pj} P\TQmQdp} p\b\dZ} OjdopmL\dpo } %y} bL_\dZ} p[Q} pLo_} oO[QPs`Qm} Lo} 6 6 .$6 (%6 L} oQkLmLpQ} Qdp\py} OjbkjdQdp} jp[Qm} oO[QPs`\dZ} kj`\O\Qo} [LuQ} B[Q} B\dy;A}  w} kmjZmLbb\dZ} bjPQ`} op\ks`LpQo} pLo_o} pj} NQQd}\dpmjPsOQP} Q Z } *)-} B\dy;A} u w}B*=} } /jvQuQm} msd} djd kmQQbkp\uQ`y} \ Q } L} pLo_} bLy} dQuQm} kmQQbkp} Ldjp[Qm} \d} jmPQm} pj} Ojbk`y} pj} p[Q} QwQOsp\jd} bjPQ`} jR} B\dy;A} pLo_o} jdZj\dZ}OjbkspLp\jd } $o}bQdp\jdQP}p[\o}[Lo}p[Q}LPuLdpLZQ}jR} OLddjp} NQ} QwQOspQP} kmQQbkp\uQ`y} o\dOQ} p[\o} vjs`P} NmQL_} p[Q} mLOQ} XQQ} QwQOsp\jd} \d} NQpvQQd} pLo_o} LdP} `\b\po} opLO_} PQkp[} mLOQ} XQQ} \duLm\Ldp } Psm\dZ}msd p\bQ } B[Q}PmLvNLO_}\o}p[Lp}OjdOsmmQdOy}\o}oQuQmQ`y} B[so} \dPQkQdPQdp} jR} p[Q} oO[QPs`\dZ} kj`\Oy} Lkk`\QP} p[Q} `\b\pQP } RsdPLbQdpL`} kmjN`Qb} jR} djd kmQQbkp\uQ} QwQOsp\jd} mQbL\do } ;d} O`joQm} \dokQOp\jd} mLOQ} RmQQ} QwQOsp\jd} OLd} NQ} LO[\QuQP} 0p} \o} opmQooQP} \d} p[Q} OLoQ} v[QmQ} L} `jv} km\jm\py} pLo_} OLd} N`jO_} sdPQm} p[Q} OjdP\p\jd} p[Lp} pLo_o} v\p[} juQm`Lkk\dZ} opLpQ} LOOQoo} L} [\Z[Qm} km\jm\py} pLo_} Rmjb} QwQOsp\jd } 0d} jmPQm} pj} `\b\p} p[Q} LmQ} QwQOspQP} djd kmQQbkp\uQ`y } F[\`Q} mQls\m\dZ} Ojbk`Qw} \b{ N`jO_\dZ}p\bQo} p[Q}kmjZmQm}[Lo}pj}pL_Q}Rs``} mQokjdo\N\`\py} k`QbQdpLp\jd}L}Rs``}LdL`yo\o}jR}pLo_}opLpQ}LOOQooQo}vjs`P}mQdPQm} jR}kLmp\p\jd\dZ}`jdZQm}msdd\dZ}pLo_o}\dpj} oQuQmL`} o[jmp} msdd\dZ} p[Q} bLw\bsb} kjpQdp\L`} OjdOsmmQdOy } pLo_o } B[Q}mLOQ} XQQ}\duLm\Ldp}L``jvo}LmN\pmLmy}LOOQoo}pj}o[LmQP} 0dopQLP} Rjm} p[Q} B;A =?;} QwQOsp\jd} bjPQ`} LdP} \bk`Q{ opLpQ}uLm\LN`Qo}Rjm}A(}pLo_}OjPQ}p[so}LmN\pmLmy}ok`\pp\dZ}Ojs`P} bQdpLp\jd} vQ} kmjkjoQ} Ld} Lkkmjw\bLp\jd} NLoQP} jd} p[Q} B;A} OLsoQ} mLOQ} OjdP\p\jdo } (jdoQlsQdp`y} ok`\pp\dZ} bsop} NQ} PjdQ} bjPs`Q OjbkjdQdp} oOjk\dZ} ms`Qo } $o} opLpQ} uLm\LN`Qo} LmQ} jd`y} OLmQRs``y} \d} osO[} L} vLy} p[Lp} mLOQ} OjdP\p\jdo} Pj} djp} jOOsm } LOOQoo\N`Q} v\p[\d} p[Q} bjPs`Q} v[QmQ} PQVdQP} vQ} OLd} Loo\Zd}

$d}L`pQmdLp\uQ}oj`sp\jd} kmjkjoQP} \o}pj}\dpmjPsOQ}p[Q}OjdOQkp} L} bjd\pjm}  pj} QLO[} bjPs`Q} LdP} `Qp} p[Q} pLo_o} PQVdQP} \d} jR}pmLP\p\jdL`}p[mQLPo }B[Q}B;AB[mQLPo}B[mQLP}7\NmLmy}PQVdQo} p[Q} bjPs`Q} \dpQmdL`} LdP} OjPQ} OL``QP o\ZdL``QP} Ny} QwpQmdL`}

L}kmQQbkp\uQ}p[mQLP}QwpQdo\jd}pj}p[Q}B\dy;A}QwQOsp\jd}bjPQ` } pLo_o} jkQmLpQ} sdPQm} p[Q} bjPs`Qo} bjd\pjm}   B[\o} L``jvo} B;AB[mQLPo} msd} \d} p[Q} \P`Q} kQm\jPo} v[Qd} dj} QuQdpo} jm} OjdOsmmQdOy} \d} NQpvQQd} pLo_o} "6jkQmLp\dZ} jd} O`QLm`y} P\o^sdOp} pLo_o} LmQ} NQ\dZ} QwQOspQP } $o} osO[} B;AB[mQLPo} PjQo} djp} opLpQo } \bkmjuQ} p[Q} mQL` p\bQ} kQmRjmbLdOQ} jR} L`mQLPy} bLPQ} B\dy;A} OjPQ} LdP} mQls\mQo} OjPQ} pj} NQ} mQvm\ppQd} km\bLm\`y} Ny} bju\dZ} F 6 /(&"6 /"('.6 RsdOp\jdL`\py} jR} pLo_o} pj} p[mQLPo } B[Q} B\dy;A}  w} kmjZmLbb\dZ} bjPQ`} opLpQo"} /(&"6 ZsLmLdpQQ} p[Lp} p[Q} opLpQbQdp} \o} QwQOspQP} Lo{ 000 } B;A =?;}&?%":9*43i14$%0i 3$i \R} dj} jp[Qm} OjbkspLp\jd} jOOsmmQP} o\bs`pLdQjso`y} +250%2%39 9*43i LdP} Rsmp[QmbjmQ} Ldy} uL`sQo} opjmQP} \do\PQ} Ld} Lpjb\O} $o} P\oOsooQP} \d} p[Q} kmQu\jso} oQOp\jd} p[Q} B;A} QwQOsp\jd} opLpQbQdp} LmQ} u\o\N`Q} \do\PQ} L``} osNoQlsQdp} Lpjb\O} bjPQ`} LdP} OsmmQdp} \bk`QbQdpLp\jdo} kspo} `\b\po} pj}Qwk`j\p\dZ} opLpQbQdpo} OjdOsmmQdOy}[QdOQ} bLy} OLsoQ} kjjm} mQL` p\bQ} kQmRjmbLdOQ } B[\o} PQVd\p\jd} jTQmo} QwpQdo\uQ} OjdOsmmQdOy} \d} p[Qjmy} Lo} 0d} p[\o} AQOp\jd} vQ} kmjkjoQ} L} dQv} QwQOsp\jd} bjPQ`} 5 Lpjb\O} oQOp\jdo} v\p[} djd} juQm`Lkk\dZ} opLpQ} uLm\LN`Q} LOOQooQo} >@<} v[QmQ}>@}opLdPo}Rjm}>mQ+bkp\uQ}LdP} F Rjm}

135 p[Q} kmQoQdOQ} jR} Lpjb\O} oQOp\jdo } B[Q}dQo(} Ojbk\`Qm} PjQo} djp} *LO[} bjPs`Q} PQVdQo} L} dLbQ} okLOQ} L``} uLm\LN`Qo} PQVdQP} \d} oskk`y}Ldy}\dRjmbLp\jd}jd}v[Lp}okQO\VO}oQOp\jd opLpQ}uLm\LN`Qo} p[Q} NjPy} LmQ} LOOQoo\N`Q} \d} oOjkQ} pj} pLo_o} OjbbLdPo} LdP} pj} kmjpQOp} p[so} OsmmQdp} msd p\bQ} oyopQb} \bk`QbQdpLp\jdo} LmQ} QuQdpo} PQVdQP}\d}p[Q}bjPs`Q } B[QmQ}\o} dj}o[Lm\dZ}jR}uLm\LN`Qo} kmj[\N\pQP} pj} \bk`QbQdp} VdQ} ZmL\d} Lpjb\O\py} LdP} bsop} p[QmQ{ \d} NQpvQQd} bjPs`Qo} sd`Qoo} Qwk`\O\p`y} kLooQP} p[mjsZ[} L} OL`{ RjmQ} mQo\PQ} pj} PQk`jy\dZ} Lpjb\O} oQOp\jdo} Lp} p[Q} Z`jNL`} `QuQ`} ` o\ZdL` } (jbbLdPo} p[Lp}bLy} QwQOspQ} kmQQbkp\uQ`y} \o}bLm_QP} pyk\OL``y} Ny} bQLdo} jR} P\oLN`\dZ} L``} \dpQmmskpo } v\p[}p[Q}.4'6_QyvjmP}LdP}O[QO_QP}Ny}p[Q}dQo(}Ojbk\`Qm}Rjm} $djp[Qm} Ojbk`\OLp\dZ} bLppQm} \o} p[Lp} OjmmQOpdQoo} jR} Qw\op\dZ} Ojdo\opQdOy}v m p }pj}\dpQmRLOQ}pykQ}LdP}v\m\dZ}Lo}okQO\VQP}Ny} B;A} bjPQ`o} bLy} mQ`y} jd} p[Q} Loosbkp\jd} p[Q} Lpjb\O} oQOp\jdo} p[Q} OjdVZsmLp\jd \dopLdp\Lp\jd } /jvQuQm} dj} QoOLkQ} LdL`yo\o} LmQ} \bk`QbQdpQP} jd} Z`jNL`} `QuQ` } 0 Q } v[Qd} so\dZ} Lpjb\O} \o} kQmRjmbQP} p[so} v[Qd} kLoo\dZ} kj\dpQmo}mQRQmQdOQo}pj} PLpL} oQOp\jdo} pj} QdRjmOQ} p[Lp} p\b\dZ} Om\p\OL`} kLooLZQo} LmQ} pj} NQ} p[Q} mQokjdo\N\`\py} jR} Luj\P\dZ} mLOQ} OjdP\p\jdo} \o} `QRp} v\p[} p[Q} QwQOspQP} v\p[jsp} Ldy} kmQQbkp\jd} \dopQLP} jR} jm} LPP\p\jdL``y} kmjZmLbbQm } B[Q}LoosbQP}soQ}\o}p[Lp}p[Q}mQOQ\uQm}\o}PQ`QZLpQP} pj} kmjpQOp} Rmjb} PLpL} mLOQo } F[Qp[Qm} osO[} soLZQ} Ojbk`\Qo} pj} LOOQoo} pj} p[Q} PLpL } p[Q}B;A}PQVd\p\jd}OLd}NQ}PQNLpQP}Nsp}dQQPo}dQuQmp[Q`Qoo}pj}NQ} 0d}p[Q}(?;}bjPQ`}L}QLO[}jN^QOp}OLd}NQ}oQQd}Lo}L}OjbkjdQdp} Ojdo\PQmQP} v[Qd} PQuQ`jk\dZ} Ld} L`pQmdLp\uQ} QwQOsp\jd} bjPQ` } (?(}v\p[}mQls\mQP}LdP}kmju\PQP}\dpQmRLOQ}kjmpo } =jmpo}jR}p[Q} Bj} p[\o} QdP} Rjm} p[Q} B;A =?;} QwQOsp\jd} bjPQ`} LdP} \b{ OjbkjdQdp} \o} OjddQOpQP}Q\p[Qm}pj}osN} OjbkjdQdpo}(?(o} jm} k`QbQdpLp\jd} vQ} kmjkjoQ} p[Q} soQ} jR} bjd\pjmo} pj} \bk`QbQdp} Lp} p[Q} Njppjb} `QuQ`} pj} bQp[jPo } *LO[} jN^QOp} PQVdQo} L} dLbQ} Lpjb\O}oQOp\jdo }%LoQP}jd}p[Q}B;A}bjPs`Q OjbkjdQdp}oOjk\dZ} okLOQ} LdP} bQp[jPo} LdP} opLpQ} \o} LOOQoo\N`Q} jd`y} v\p[\d} p[Q} ms`Qo} vQ}Loo\Zd}L}bjd\pjm}  pj}QLO[}bjPs`Q} LdP}`Qp}Lpjb\O} jN^QOp } oQOp\jdo} \d} p[Q} bjPs`Q} jkQmLpQ} sdPQm}  $O[\Qu\dZ} PQo\mQP} $oosb\dZ}dj}mQRQmQdOQo}LmQ}kLooQP}\d}NQpvQQd}B\dy;A}Ojb{ p\b\dZ} NQ[Lu\jm} LdP} QuQd} opm\Op} djd kmQQbkp\jd} \o} PjdQ} kjdQdpo} \p}\o}oLRQ}pj}bLk}B\dy;A}bjPs`Qo}jdpj}(?;}jN^QOpo } p[mjsZ[}Qwk`j\p\dZ}\dRjmbLp\jd}Z\uQd}\d}p[Q}bjPQ` } $}PQpL\`QP} 0d}QRRQOp} opLpQ} uLm\LN`Qo}PQVdQP}\d}L}B\dy;A}bjPs`Q}NQOjbQo} P\oOsoo\jd} Rj``jvo} \d} p[Q} dQwp} oQOp\jdo } opLpQ} uLm\LN`Qo} \d} p[Q} OjmmQokjdP\dZ} (?; } (jbbLdPo} QuQdpo} LdP} pLo_o} bLk} pj} bQp[jPo} jR} p[Q} (?; } =mju\PQP} LdP} soQP} F 6 &)%&'//"('6 \dpQmRLOQo}jR}L}B\dy;A}bjPs`Q}bLk}P\mQOp`y}jdpj}kmju\PQP}LdP} B[Q}?*5;}0)*}\o}L}XLbQvjm_}Rjm}PQo\Zd\dZ}LdL`yz\dZ}LdP} mQls\mQP} \dpQmRLOQo} jR} p[Q} OjmmQokjdP\dZ} (?;} [QdOQ} \dpQmdL`} oydp[Qo\z\dZ} mQL` p\bQ} Lkk`\OLp\jdo} Rjm} mQojsmOQ} OjdopmL\dQP} pLo_o} B\dy;A} LmQ} djp} u\o\N`Q} \d} p[Q} (?;} \dpQmRLOQ} Nsp} oyopQbo} JK} JK } 0p} Ns\`Po} jd} p[Q} djp\jd} jR} (jdOsmmQdp} LOOQoo\N`Q} \dpQmdL``y} pj} p[Q} jN^QOp } Bj} Ojbk`QpQ} p[Q} k\OpsmQ} ?QLOp\uQ} ;N^QOpo} (?;o} JaK} v[QmQ} QLO[} (?;} \dopLdOQ} OL``}LdP}o\ZdL`}OjdopmsOpo}bLk}pj}oydO[mjdjso}bQooLZQo} v[\`Q} \bk`QbQdpo}L}bjd\pjm}QdRjmO\dZ}bspsL`}QwO`so\jd}\d}NQpvQQd} p[Q} kjop} pLo_} OjdopmsOp} bLko} pj} oQdP\dZ} Ld} LoydO[mjdjso} bQp[jPo} jkQmLp\dZ} jd} p[Q} jN^QOp} opLpQ } $} NLo\O} bLkk\dZ} \o} bQooLZQ }0d}Njp[}OLoQo}p[Q}bQooLZQ}PQVdQo}L}mQOQ\u\dZ}bQp[jP}

LO[\QuQP} Ny} pL_\dZ} QLO[} B;A} 8jPs`Q} \dpj} pvj} bjd\pjmo}  OLdP o\ZdL`} jm} pLo_} mQokQOp\uQ`y} LdP} L} pLmZQp} jN^QOp}   LdP}   mQkmQoQdpQP} Ny} (?;o}  LdP}  soQP} pj} oQm\L`\zQ} B[Q} `jO_\dZ} bQO[Ld\ob} jR} (?;} LspjbLp\OL``y} kmjpQOpo} LOOQoo} pj} Lpjb\O} oQOp\jdo} LdP} pLo_o} mQokQOp\uQ`y } p[Q} o[LmQP} opLpQ} Lo} bQp[jPo} jd} Ld} jN^QOp} msd} sdPQm} bspsL`} '/**0)/.6 '6 '/-6 ("'/.6 %jp[}B\dy;A}LdP}(?;}bjPQ`o} QwO`so\jd } LmQ}mQLOp\uQ} Pm\uQd}Ny}QuQdpo}jOOsmm\dZ}\d}p[Q}Qdu\mjdbQdp } 0d} p[Q} OLoQ} jR}B\dy;A} Ld}LoydO[mjdjso}QuQdp} [LdP`Qm} \o} Looj{ F 6 ('0**'46 1%0/"('6 O\LpQP} v\p[} L} [LmPvLmQ} \dpQmmskp } $PP\p\jdL``y} Rjm} p[Q} \bk`Q{ 0d} L} B\dy;A} bjPs`Q} OjbkjdQdp} Ldy} OjbbLdP} bLm_QP} bQdpLp\jd}Lppm\NspQ}OLd}NQ}oQp}pj}\dP\OLpQ} vQp[Qm} p[\o} \dpQmmskp} .4'6 $(} OjPQ} bLy} kmQQbkp} LdP} QwQOspQ} OjdOsmmQdp`y} pj} o[js`P} NQ} pmQLpQP} Lo} Ld} Lpjb\O} oQOp\jd}  /(&"!21'/6 jp[Qm}pLo_o}LdP jm} OjbbLdPo QuQdpo} QwQOsp\dZ}\d}oOjkQ}jR}p[Q} P\oLN`\dZ} \dpQmmskpo} Z`jNL``y} jm} L``jv\dZ} kmQQbkp\jd}  !25 OjbkjdQdp }0d}OLoQ}jR}p[Q}(?;}bLkk\dZ}dj}osO[}OjdOsmmQdOy} 1'/6 v\p[\d}Ld}jN^QOp}Qw\opo}PsQ}pj}p[Q}bspsL`}QwO`so\jd}\d}NQpvQQd} 0d}p[Q}(?;} bjPQ`} sdPQmpL_Qd} \d} p[Q} ?*5;} 0)*} L}okQO\L`} QwQOsp\dZ} bQp[jPo } $p} L} Vmop} Z`LdOQ} \p} bLy} LkkQLm} p[Lp} p[Q} '1",'&'/6 (&)(''/6 OjdopmsOp}\o} soQP}pj}PQVdQ}\dpQmLOp\jd} (?;} bjPQ`} vjs`P} \dWQOp} oQuQmQ} `\b\pLp\jdo} pj} OjdOsmmQdOy } v\p[} p[Q} oyopQb} Qdu\mjdbQdp } /jvQuQm} Lp} L} O`joQm} \dokQOp\jd} \d} jmPQm} pj} Luj\P} mLOQ} OjdP\{ 0R}Qdpmy}kj\dpo}LmQ}pj}NQ}pmQLpQP}Lpjb\OL``y}v m p}pj}QLO[}jp[Qm} p\jdo} vQ} oQQ} p[Lp} \d} OLoQo} v[QmQ} opLpQ} \o} o[LmQP} \d} NQpvQQd} p[\o}OLd}NQ}QdOjPQP}P\mQOp`y}\d}p[Q}(?;}bjPQ`}Ny} OjddQOp\dZ} $(} LdP} jp[Qm} OjPQ} o[LmQP} opLpQo} LOOQoo} bsop} NQ} kmjpQOpQP} p[Qb} pj} Qdpmy} kj\dpo} pj} bQp[jPo} jR} L} o[LmQP} pLmZQp} jN^QOp } 0d} so\dZ} Qwk`\O\p} Lpjb\O} oQOp\jdo } AsO[} Lpjb\O\py} \o} \bk`\O\p} \d} p[\o} vLy} p[Q}(?;} bjPQ`} kmju\PQo} bQLdo} pj} QwkmQoo}p[Q}QwLOp} p[Q} (?;} bjPQ` } ;d`y} \d} OLoQo} v[QmQ} p[Q} Lpjb\O} oQOp\jdo} LmQ} Lbjsdp} jR}PQo\mQP} OjdOsmmQdOy} \d} NQpvQQd} \dpQmmskp} [LdP`Qmo } o\Zd\VOLdp`y} o[jmpQm} p[Ld} p[Q} OjmmQokjdP\dZ} OjbbLdP QuQdp} /QdOQ}QuQdp}[LdP`Qmo}p[Lp}o[js`P}msd}djd kmQQbkp\uQ`y}o[js`P} p[Q} (?;} bjPQ`} `jjoQo} jsp} jd} OjdOsmmQdOy} \d} p[Q} bjPQ` } NQ}PQO`LmQP}\d}p[Q}oLbQ}B;A}bjPs`Q} v[\O[}bLko}pj}L}o[LmQP} /jvQuQm} \R} NmQL_\dZ} jsp} kLmpo} jR} OjbbLdPo QuQdpo} p[Lp} PjQo}

 LdP} o[LmQP}  OjmmQokjdP\dZ`y } djp} jkQmLpQ} jd} p[Q} opLpQ} Lo} NQ\dZ} jspo\PQ} p[Q} bjd\pjm}   p[\o} (' 0+/"('.6 '6 (0%.6 1.6 (&)(''/.6 '6 #/.6 `\b\pLp\jd} OLd} NQ} Rs``y} b\p\ZLpQP } B\dy;A} OjbkjdQdpo} LmQ} okQO\VQP} \d} pQmbo} jR} OjdVZsmLp\jdo} /jvQuQm}v[Qd}\p}OjbQo}pj}\bk`QbQdpLp\jd}OsmmQdp}B\dy;A} v[\O[} Lp} p[Q} `jvQop} `QuQ`} \o} \bk`QbQdpQP} Ny} bjPs`Qo } B[Q} Ojbk\`Qm} pjj` os\po} LdP} msd p\bQ} oyopQbo} QdRjmOQ} Lpjb\O\py} OjdVZsmLp\jd}\dpQmRLOQo}LmQ}v\mQP}pj}osN}OjbkjdQdp}\dopLdOQo } Ny} P\oLN`\dZ} \dpQmmskpo} Z`jNL``y } B[\o} L`jdQ} bL_Qo} B\dy;A}

136 kmjZmLbb\dZ} Ld} QwpmQbQ`y} PQ`\OLpQ} bLppQm} v[Qd} opeu2dZ} pj} kmjZmLbb\dZ} p[Q} \dpQmmskp} [LmPvLmQ} Ny} oQ`QOp\dZ} \dpQmmskp} jNpL\d} oyopQbo} v\p[} o[jmp} vjmop} OLoQ} mQokjdoQ} p\bQ} Lo} p[Q} km\jm\p\Qo } B[\o} \dRjmbLp\jd} OLd} NQ} P\mQOp`y} Lkk`\QP} \d} p[Q} \dpmjPsOp\jd} jR} Lpjb\O} oQOp\jdo} _\``o} L``} OjdOsmmQdOy} \d} p[Q} ?*5;} pjj`} \dpj} oQpp\dZ} p[Q} km\jm\py} \d} pQmbo} jR} mQ`Lp\uQ} oyopQb } B[Q}kmjN`Qb}ZjQo}NQyjdP}\bk`QbQdpLp\jd}jR}p[Q}msd{ PQLP`\dQ}Rjm}p[Q}OjmmQokjdP\dZ}Qdpmy}kj\dp }9jp\OQ}\d}OLoQ}jR} p\bQ} oyopQb} Lo} p[Q} jspksp} Rmjb} p[Q} dQo(} Ojbk\`Qm} PjQo} djp} \bk`QbQdp\dZ} [LmP} mQL`} p\bQ} oyopQbo} \dpQmmskp} km\jm\p\Qo} LmQ} \dP\OLpQ} \d} v[Lp} OjdpQwp} Lpjb\O\py} o[js`P} NQ} QdRjmOQP } -jm} \d\p\L``y} PQm\uQP} Rmjb} p\b\dZ} OjdopmL\dpo } B\dy;A} bjPQ`o} bLkkQP} pj} (?;} p[\o} kmjN`Qb} v\``} P\oLkkQLm} AQOjdP`y} v[Qd} \p} OjbQo} pj} pLo_o} p[Q} B;A} )(./6 /6 Ojd{ L`pjZQp[Qm} Lo} bspsL`} QwO`so\jd} \o} QdRjmOQP} jd`y} \d} NQpvQQd} opmsOpo} OLd} NQ} bLkkQP} \dpj} Ld} LoydO[mjdjso} (?;} bQooLZQo} bQp[jPo}QwQOsp\dZ}jd}p[Q}oLbQ}jN^QOp}v[\`Q}L``jv\dZ}bQp[jPo} pj} p[Q} bQp[jP}    0d} OLoQo} v[Qd} p[Q} B;A} bjPQ`} [j`Po} p[Q} jkQmLpQ} jd} P\TQmQdp} jN^QOpo} pj} QwQOspQ} OjdOsmmQdp`y } p\b\dZ} \dRjmbLp\jd} soQP} Rjm} L`pQmdLp\uQ} B;A} oO[QPs`Qmo} /(&"6 ./"('.6 '6 /.$.6 0d} jsm} kmjkjoQP} QwQOsp\jd} p[\o} \dRjmbLp\jd} OLd} NQ} P\mQOp`y} soQP} Ny} p[Q} (?;} oO[QPs`Qm } bjPQ`} QLO[} bjPs`Q} \o} LoojO\LpQP} v\p[} L} bjd\pjm} Rjm} Lpjb\O} $PP\p\jdL``y} p[Q} '?;} bjPQ`} L`oj} L``jvo} PQRQmmQP} jkQmLp\jdo} oQOp\jdo}  \d} AQOp\jd} 000 %} LdP} L} bjd\pjm} Rjm} pLo_o}  \d} pj} NQ} PQ`LyQP} v[\O[} bL_Qo} p[Q} soQ} jR} Qwk`\O\p} B;A} B\bQmo} AQOp\jd} 000 $ } /QdOQ} Rjm} QLO[} B;A} bjPs`Q} vQ} OmQLpQ} pvj} oskQmWsjso }

(?;o}  LdP}   v[QmQ} opLpQ} uLm\LN`Qo}jOOsmm\dZ}\d} Lpjb\O} &)%&'//"('6 oQOp\jdo} LmQ} PQVdQP} \d}  LdP} mQbL\d\dZ} opLpQ} uLm\LN`Qo} LmQ}

PQVdQP} \d}   B[Q} OjdOQkp} jR} PQLP`\dQo} \d} p[Q} kmjZmLhg\dZ} bjPQ`} Rjm} BLo_o} jR} p[Q} B;A} bjPQ`} pmLdo`LpQo} pj} bQp[jPo} jR} (?;} jkQdo} sk} Rjm} *)-} NLoQP} oO[QPs`\dZ } /jvQuQm} Rjm}

 (./6 /6 OjmmQokjdPo} pj} Ld} LoydO[mjdjso} bQooLZQ} `\Z[pvQ\Z[p} k`LpRjfgo} [LmPvLmQ} Loo\opQP} oO[QPs`\dZ} so\dZ}

         ? F F QwQOsp\dZ} ?F sdPQm} p[Q} jN^QOp bjd\pjm} opLp\O} \dpQmmskp} km\jm\p\Qo} jRRQmo} Ld} LppmLOp\uQ} L`pQmdLp\uQ} v\p[}   (jbbLdPo} LdP} QuQdpo} LmQ} RsdOp\jdo} QwQOspQP} P\mQOp`y} \po} QUO\Qdp} LdP} `jv} Ojbk`Qw\py} \bk`QbQdpLp\jd } Bj}p[\o} QdP} Ny} p[Q} OL``Qm o\ZdL``Qm} p[mjsZ[} L} oydO[mjdjso} bQooLZQ } *LO[} opLp\O} km\jm\p\Qo} OLd} NQ} P\mQOp`y} PQm\uQP} Xjb} PQLP`\dQo} \d} p[Q}

Lpjb\O} oQOp\jd} OjmmQokjdPo} pj} L} bQp[jP}  \d}   *dpQm\dZ} (?;}bjPQ`}LdP}sp\`\zQP}Ny}p[Q}msd p\bQ}oyopQb}\d}OjjkQmLp\jd}

 Rmjb} L} OjbbLdP QuQdp} jm} pLo_} OjmmQokjdPo} pj} L} oyd{ v\p[}p[Q}\dpQmmskp}[LmPvLmQ}pj}kQmRjmb}oO[QPs`\dZ}sdPQm}A?={

O[mjdjso} bQooLZQ}            QwQOsp\dZ}  sdPQm} )8}J K }A?=}ApLO_}?QojsmOQ}=j`\Oy} J!K}\o}L}km\jm\py}OQ\`\dZ} p[Q} bjd\pjm}   0d} p[Q} OLoQ} v[QmQ} QdpQm\dZ} Xjb} Ld} \dpQmdL`} kmjpjOj`} p[Lp}L``jvo} p[Q} QwQOsp\jd}opLO_} pj} NQ} o[LmQP} LbjdZop} pLo_} opLpQ} uLm\LN`Qo} \d}  bsop} NQ} LOOQoo\N`Q} pj}   B[\o} L``} pLo_o }

 OLd} OjdOQkpsL``y} NQ} LO[\QuQP} p[mjsZ[} kLoo\dZ}  Lo} Ld} B[Q} (?( (?;} bjPQ`} OLd} NQ} oQQd} Lo} L} okQO\VOLp\jd} Rjm}    LmZsbQdp} LdP} skPLpQ}  Lo} L} mQos`p} jR} QwQOsp\dZ} \ Q } jkQmLp\jd} sdPQm} [LmP} mQL` p\bQ} OjdopmL\dpo } .\uQd} vjmop} OLoQ}

                  } =Loo\dZ} LdP} skPLp\dZ} `LmZQ} QwQOsp\jd} p\bQo} LdP} LPP\p\jdL`} \dRjmbLp\jd} jd} b\d\bsb} \d{ opLpQ} OjdopmsOpo} vjs`P} \bkjoQ} kQmRjmbLdOQ} kQdL`py} [jvQuQm} pQmLmm\uL`} p\bQo} Rjm} QwpQmdL`} QuQdpo} LdP} \bkjmpLdp} O`Loo} jR} p[Q} [\QmLmO[\OL`} `jO_\dZ} L``jvo} P\mQOp} LOOQoo} pj}  \do\PQ} p[Q} oyopQbo} OLd} NQ} LdL`yzQP} Rjm} oO[QPs`LN\`\py} J K } /jvQuQm} \p} bjd\pjm}  QuQd} \R}  \o} jspo\PQ} p[Q} oydpLOp\O} oOjkQ} jR}   \o} vjmp[} pj} bQdp\jd} p[Lp} L`oj} oyopQbo} v\p[} ojSq} mQL` p\bQ} mQ{ B[so"} ls\mQbQdpo} vjs`P}NQdQVp}Rmjb}p[Q}kmjkjoQP}QwQOsp\jd}bjPQ` } 0d}osO[}OLoQ} vQ}bLy}O[jjoQ}pj}km\jm\p\zQ}QwQOsp\jd}Xjb}Q Z } @)=3FF CF +:,.F F .B.+A@.=F A8,.;F 6:81@:;F :(@F mQ`Lp\uQ} \bkjmpLdOQ} LdP} pL\`jm} p[Q} oO[QPs`Qm} LOOjmP\dZ`y } 0d}

:(@(=F  %'"FF FF :() F 6 F :(@(= FFF Ldy}OLoQ}p[Q} kmjkjoQP} QwQOsp\jd} bjPQ`}jTQmo}LPuLdpLZQo} jR} F 6F .B.+A@.=F A8,.;F 6:81@:;=F :(@F )8,F :()DF mQPsOQP} N`jO_\dZ}p\bQo}p[Q}Lpjb\O} oQOp\jdo} LmQ}QdRjmOQP} kQm} pmLdo`LpQo} \dpj} bjPs`Q} djp} Z`jNL``y} LdP} \dOmQLoQP} WQw\N\`\py} Rjm} oO[QPs`\dZ} PQO\o\jdo}kmQQbkp\jd}\o}L``jvQP}\d}NQpvQQd}pLo_o}jR}P\TQmQdp} @)=3F CF bjPs`Qo } +:,.F F .B.+A@.=F A8,.;F 6:81@:;F :(@F !# F :() F /jvQuQm}\dOmQLo\dZ}p[Q}OjdOsmmQdOy}jR}p[Q}oyopQb}L`oj}`QLP} 6F F 6F .B.+A@.=F A8,.;F 6:81@:;=F :(@F )8,F  )F pj} \dOmQLo\dZ} p[Q} PQkp[} jR} kmQQbkp\jdo } 0d} p[Q} B;A} bjPQ`} &"!# F :() F p[Q} bLw\bsb} PQkp[} jR} kmQQbkp\jdo} LmQ} p[Q} dsbNQm} jR} kmQ{ Qbkp\uQ}\dpQmmskp}[LdP`Qmo}k`so}jdQ}pLo_}sdPQm}p[Q}Loosbkp\jd} 9jp\OQ} p[\o} mQvm\ppQd} bjPQ`} o[js`P} djp} NQ} QwkjoQP} pj} p[Q} p[Lp} \dpQmmskpo} LmQ} djd mQQdpmLdp } B[Q} dsbNQm} jR} kjpQdp\L`} kmjZmLbbQm} Lo} Qwk`\O\p}  6 LOOQoo} \o} djp} L``jvQP} \d} p[Q} kmQQbkp\jdo} \d} p[Q} kmjkjoQP} QwQOsp\jd} bjPQ`} \o} jd`y} `\b\pQP} (?;} bjPQ`} Nsp} mLp[Qm} Ld} \bk`QbQdpLp\jd} pQO[d\lsQ} Lkk`\QP} Ny} p[Q} dsbNQm} jR} bjPs`Qo} \d} p[Q} oyopQb} o\dOQ} vQ} L``jv} Psm\dZ} OjPQ} oydp[Qo\o } QLO[} bjPs`Q} pj} QwQOspQ} jdQ} pLo_ } -jm} mQojsmOQ} OjdopmL\dQP} k`LpRjmbo}opLO_}bQbjmy}bLy}NQ}L}`\b\p\dZ}RLOpjm }DdPQm}A?={ "&"' 6 "'(*&/"('6 )8} vQ} OLd} Q`QZLdp`y} pmLPQ jT} OjdOsmmQdOy} pj} opLO_} PQkp[ } 0d} jmPQm} pj} Rs``y} Qwk`j\p} p[Q} kmjkjoQP} QwQOsp\jd}bjPQ`} \d{ 0 Q } QdRjmO\dZ} pvj} pLo_o} pj} o[LmQ} L} kmQQbkp\jd} `QuQ`} bL_Qo} RjmbLp\jd} pj} km\jm\p\zQ} QwQOsp\jd} bL_Q} oO[QPs`\dZ} PQO\o\jdo} p[Qb}QwQOspQ}sdPQm}bspsL`}QwO`so\jd}LdP}OLddjp}jOOsky}opLO_} OLd} NQ} LPPQP} pj} p[Q} (?;} bjPQ` } okLOQ} Lp} p[Q} oLbQ} p\bQ } B[so} bQmZ\dZ} km\jm\py} `QuQ`o} mQPsOQo} -\mop`y} v[Qd} \p} OjbQo} pj} QwpQmdL`} QuQdpo} \dpQmmskpo} p[Q} opLO_} bQbjmy} mQls\mQbQdp} oQQ} Q Z } JK } *wk`j\p\dZ} p[\o} B;A}kmjZmLbbQm}L`mQLPy} pjPLy}bL_Qo}Ld}LOp\uQ}PQO\o\jd}\dpj} pmLPQjT} \o} L} oQp} pjk\O} Rjm} RspsmQ} mQoQLmO[ }

137 6 "&"//"('.6 F 6 '(')*&)/"16 .!0%"' 6

0d} p[\o} kLkQm} vQ} [LuQ} RjOsoQP} jd} p[Q} kmjZmLn\dZ} LdP} 0d} p[Q} Vmop} QwkQm\bQdp} vQ} o[jv} p[Q} Pmjk} mLpQ} jR} pLo_o} QwQOsp\jd} bjPQ`o} jR} B;A} `QLu\dZ} jsp} PQpL\`o} jd} LPP\p\jdL`} C},} v[Qd} uLmy\dZ} p[Q} `jLP}C}&} sdPQm} B;A} jm\Z\dL`} djd{ OjdopmsOpo} `\_Q}kLmLbQpm\zQP}\dpQmRLOQo} jspksp} mQkQLpQmo} QpO } kmQQbkp\uQ} oO[QPs`\dZ } 0d} p[\o} oQpp\dZ} vQ} OLd} oQQ}CC}&} Lo} LdP} oydpLOp\O} \oosQo} L``} dQQPQP} pj} NQ} LPPmQooQP} \d} jmPQm} pj} p[Q}pjpL`}QwQOsp\jd}p\bQ} jR}L``}kjopQP} pLo_o}o\dOQ}p[Qy} v\``}L``} PQpL\`} L} Ojbk`QpQ} bLkk\dZ } [LuQ}pj}msd pj} QdP} NQRjmQ} pQdP\dZ}C}, }0d}OLoQ} \dpQmLmm\uL`} jR} F\p[} mQokQOp} pj} pLo_o} p[Q} kmjkjoQP} QwQOsp\jd} bjPQ`o} ,} \o} o[jmpQm} p[Ld}C,}}CC}&}}CC},}p[Q} vjmop} OLoQ} jkQdo} sk} Rjm} OjdOsmmQdOy } 0d} OLoQo} v[QmQ} pLo_o} mQ`y} jd} djd{ p\bQ}pj}kmjOQoo} p[Q}kmQu\jso}QwpQmdL`} QuQdp} p[Q}pLo_}C},}v\``} kmQQbkp\uQ} QwQOsp\jd} Rjm} QwLbk`Q} v[Qd} LOOQoo\dZ} L} o[LmQP} NQ} PmjkkQP} Rj``jv\dZ} Rmjb} p[Q} oyopQb} bjPQ` } QwpQmdL`} mQojsmOQ} p[Q} kmjZmLb} v\``} QwkmQoo} L} P\RRQmQdp} NQ{ 0d} -\ZsmQo} } } vQ} k`jp} so\dZ} mQP} OmjooQo} p[Q} vjmop{ [Lu\jm } B[\o} OLd} NQ} oj`uQP} Q Z } Ny} `Qpp\dZ} pLo_o} p[Lp} o[LmQ} OLoQ}kQmOQdpLZQ}jR}PmjkkQP}pLo_o}C},}Rjm}QwpQmdL`}QuQdpo} v\p[} LOOQoo} pj} Ld} QwpQmdL`} mQojsmOQ} NQ`jdZ} pj} p[Q} oLbQ} bjPs`Q} kj\oojd} P\opm\Nsp\jdo} v\p[}   }   } v[Qd} uLmy\dZ} \ Q } QwQOspQ}sdPQm}p[Q}oLbQ}bjd\pjm } 0d}ojbQ}OLoQo}p[\o}v\``} p[Q} N`jO_\dZ} p\bQ} jR} NLO_ZmjsdP} pLo_o}C& } 0d} p[Q} oQpp\dZ} mQls\mQ} mQRLOpjm\dZ} jR} p[Q} jm\Z\dL`} B;A} kmjZmLb} Nsp} jd} p[Q} jR} L} kLO_Qp} RjmvLmP\dZ} oOQdLm\j} Pmjkk\dZ} LC,} mQ`LpQo} pj} L} jp[Qm} [LdP} mQuQL`} PQkQdPQdO\Qo} jd} QwpQmdL`} mQojsmOQo } kLO_Qp} NQ\dZ} `jop } $o} QwkQOpQP} vQ} OLd} O`QLm`y} oQQ} p[Lp} p[Q} B;A}djd kmQQbkp\uQ}oO[QPs`\dZ}OLsoQo}PmLbLp\OL``y}\dOmQLoQP} kLO_Qp}`joo}PsQ}pj}N`jO_\dZ}CC}& }B[Q}N`sQ}Pjpo}PQk\Opo}p[Q} 0E } &?5%6*2%39i kmjNLN\`\py} bLoo} RsdOp\jd} jR} p[Q} okjmLP\O} QuQdpo} kmjNLN\`\py} Byk\OL`}Rjm}p[Q}dQwp}ZQdQmLp\jd}jR}A;$}QdLN`QP}oQdojmo}LdP} jR} \dpQmLmm\uL`} p\bQo} Rjm} p[Q} \dOjb\dZ} kLO_LZQo } LOpsLpjmo} LmQ} p[Q\m} QuQdp} Pm\uQd} dLpsmQ} pQdP\dZ} Rjm} QwLbk`Q} 0d} p[Q} OLoQ} jR} p[Q} jm\Z\dL`} B;A} -0-;} oO[QPs`Qm}C&}  Njp[} Ojbbsd\OLp\jd} oQdP\dZ mQOQ\u\dZ} PLpL} OjdOsmmQdp`y} pj} |CC}&]} \d} vjmop} OLoQ} vQ} [LuQ} pj} vL\p} Rjm} L``} lsQsQP} kQmRjmb\dZ} OjbkspLp\jdo} Q Z } OmykpjZmLk[y} Rjm} oQOsmQ} Ojb{ pLo_o } 0R} vQ} PQk`jy} L} km\jm\py} NLoQP} yQp} djd kmQQbkp\uQ} bsd\OLp\jd} kmjOQoo\dZ} jR} PLpL} oQdo\dZ} LdP} LOpsLp\dZ} QpO } oO[QPs`Qm} vQ} jNpL\d}C&}  cMxCC&]} \ Q } vQ} osRRQm} AjbQ} jR} p[QoQ} LOp\u\p\Qo} pyk\OL``y} mQls\mQ} osNopLdp\L`} (=D} N`jO_\dZ} jd`y} Xjb} p[Q} vjmop} OLoQ} NLO_ZmjsdP} pLo_ } DdPQm} mQojsmOQo} Q Z } OmykpjZmLk[y} LdP} PLpL} kmjOQoo\dZ } p[Q} Loosbkp\jd} p[Lp}C},}\o} Z\uQd} p[Q} [\Z[Qop} km\jm\py }

DdPQm}B;A}p[\o}\o} LO[\QuQP}so\dZ}L}b\wpsmQ}jR}QuQdpo Ojb{ F 6 2"/!6)*&)/"16 .!0%"' 6 bLdPo}LdP}pLo_o } %Qop}kmLOp\OQ}op\ks`LpQo}ok`\p}k[LoQ}jkQmLp\jd} 0d} p[Q} oQOjdP} QwkQm\bQdp} vQ} o[jv} p[Q} Pmjk} mLpQ} jR} pLo_o} _QQk\dZ} QwQOsp\jd} p\bQ} Rjm} QuQdpo OjbbLdPo} Lo} o[jmp} Lo} kjo{ C},}v[Qd}uLmy\dZ}p[Q}`jLP}C}&}sdPQm}p[Q}kmQ Qbkp\uQ}oO[QPs`{ o\N`Q} pj} kmj[\N\p} N`jO_\dZ } $o} L} OjdoQlsQdOQ} Ojbbsd\OLp\jd} \dZ} osO[} Lo} p[Q} kmjkjoQP} QwQOsp\jd} bjPQ`} LdP} B;A =?;} opLO_o}o[js`P}soQ}QuQdpo OjbbLdPo}jd`y}Rjm}p[Q}0 ;}jkQmLp\jdo} oO[QPs`Qm } ;d`y} \d} OLoQ} \dpQmLmm\uL`} jR},} \o} o[jmpQm} p[Ld} P\mQOp`y} v[\`Q} PQRQmm\dZ} kLO_Qp} jkQmLp\jdo} pj} pLo_o } C,CC,}p[Q}pLo_}C,}v\``}NQ}PmjkkQP}o\dOQ}vQ}[LuQ}dj} 0d}p[Q}Rj``jv\dZ}oQp}jR}QwkQm\bQdpo}vQ}v\``}o[jv}p[Q}\bkLOp} N`jO_\dZ} Xjb} jp[Qm} pLo_o} LZL\d} Loosb\dZ} p[Lp}C},}\o} Z\uQd} jR} oO[QPs`\dZ} kj`\O\Qo} pj} oyopQb} kQmRjmbLdOQ} \d} L} ZQdQm\O} p[Q}[\Z[Qop}km\jm\py } B[\o}y\Q`Po}L}OjdopLdp}Pmjk}mLpQ}o[jvd}\d} \bk`QbQdpLp\jd} LZdjop\O} oQpp\dZ } /jvQuQm} p[Q} kmQoQdpQP} -\ZsmQo}}}so\dZ}ZmQQd}opLmo } F\p[}  } vQ}oQQ}p[Lp}p[Q} mQos`po} OLd} Q Z } NQ} \dpQmkmQpQP} Lo} p[Q} osOOQoo} mLpQ} jR} kLO_Qp} N`jO_\dZ}PsQ}pj}OsmmQdp`y}jspopLdP\dZ}C,}}CC},}OLsoQ}L} RjmvLmP\dZ}sdPQm}P\RRQmQdp}\dpQmLmm\uL`}p\bQo}LdP}oyopQb}`jLPo } Pmjk} kmjNLN\`\py} jR} Lkkmjw\bLpQ`y}  } v[\`Q}   } jR} `Qoo} p[Ld}   } 6 4./&6 &(%6 ;Nu\jso`y}mQPsO\dZ}p[Q} N`jO_\dZ} p\bQ}CC}&} Rjm} p[Q} djd{ -jm}p[Q}QwkQm\bQdpo}vQ}bjPQ`}p[Q}oyopQb}Lo}okjmLP\O}QwpQm{ kmQQbkm\uQ} oO[QPs`\dZ} pj} } Z\uQo} p[Q} oLbQ} mQos`p} Lo} p[Q} kmQ{ dL`} QuQdpo} \dOjb\dZ} kLO_Qpo} pj} NQ} RjmvLmPQP} LdP} p[Q} NLO_{ Qbkp\uQ}oO[QPs`\dZ}v[\O[}OjmmQokjdPo}pj}mQVd\dZ}ZmLds`Lm\py} ZmjsdP}`jLP}Lo}pLo_o} Q Z } kmjOQoo\dZ}oQdojm}PLpL} kQmRjmb\dZ} jR} NLO_ZmjsdP} pLo_o} pj} \dVd\pQ`y} obL``} N`jO_o} jR} OjPQ} LdP} Omykpj} OL`Os`Lp\jdo} QpO } B[Q} QwpQmdL`} QuQdpo},} LmQ} Lppm\NspQP} so\dZ} L} km\jm\py} oO[QPs`Qm } B[\o} \dP\OLpQo} p[Lp} p[Q} kmjkjoQP} QuQdp} [LdP`\dZ} mQOQ\u\dZ} p[Q} kLO_Qp} v\p[} QwQOsp\jd} p\bQ} QwQOsp\jd}bjPQ`}LdP}oO[QPs`\dZ}bQO[Ld\ob}\o}jsp kQmRjmb\dZ} C,} v[\O[}o\ZdL`o} p[Q}pLo_}C,} v\p[}QwQOsp\jd}p\bQ}CC,} p[Q} djd kmQQbkp\uQ} oO[QPs`Qm} QuQd} \R} QwpQdo\uQ} Lbjsdp} jR} pLo_}C},}kQmRjmbo}p[Q}LOpsL`} vjm_}jR}RjmvLmP\dZ } FQ} bjPQ`} vjm_} \o} ksp} \dpj} mQvm\p\dZ} OjPQ} Rjm} ok`\p k[LoQ} jkQmLp\jd } p[Q} NLO_ZmjsdP} vjm_} Lo} pLo_o}C}&]} v\p[} p[Q} QwQOsp\jd} p\bQ} B[Q} mQos`po} LmQ} u\LN`Q} Rjm} jp[Qm} kmQ Qbkp\uQ} oO[QPs`Qmo} Q Z } CC}&] } A\dOQ} vQ} LoosbQ} pj} [j`P} dj} LPP\p\jdL`} \dRjmbLp\jd} B;AB[mQLPo } /jvQuQm} p[Q} P\op\dZs\o[\dZ} LPuLdpLZQ} jR} p[Q} jd} p[Q} okjmLP\O} QuQdpo} vQ} bjPQ`} p[Q} \dpQmLmm\uL`} p\bQo} jR} p[Q} kmjkjoQP}QwQOsp\jd}bjPQ`}\o}p[Lp}p[Q}o\bk`\O\py}jR}p[Q}jm\Z\dL`} QwpQmdL`} QuQdpo} p[mjsZ[} L} kj\oojd} kmjOQoo } B;A} bjPQ`} \o} sk[Q`P} v[\`Q} B;AB[mQLPo} m\oQo} L``} \oosQo} mQ`LpQP} pj} p[Q} p[mQLP\dZ} kLmLP\Zb } $OOjmP\dZ} pj} B;A} bjPQ`} vQ} LoosbQ} p[Lp} L} pLo_} bsop} NQ} QwQOspQP} pj} QdP} NQRjmQ} mQkjopQP} Q`oQ} p[Q} E } #43"0:8*438i pLo_} v\``} NQ} PmjkkQP } 0dPsopm\L`} bjd\pjm\dZ} LdP} Ojdpmj`} oyopQbo} LmQ} RjmQoQQd} pj}

-jm}p[Q}oQp}jR}QwkQm\bQdpo}vQ}LoosbQ}C,}   }CC,}  LPjkp} Ld} QuQm} \dOmQLo\dZ} dsbNQm} jR} dQpvjm_} oQdojmo} LdP} F LdP} L} uLmy\dZ}CC}&] } LOpsLpjmo } Bj} p[\o} QdP} A;$o} LdP} v\mQ`Qoo} pQO[dj`jZ\Qo} LmQ}

138 OLkLN`Q} jR} mQPsO\dZ} Njp[} PQ`Ly} LdP} Pmjk} mLpQ} LdP} \o} Lo} ZjjP}    jm}NQppQm}p[Ld}Ldy}jkp\b\zQP}kLmp\p\jd\dZ}jR}B;A}Lkk`\OLp\jdo } #  "/34= 0%$(%2%39i #  )$# !") ) # ) #)  #) &() " &) ) )"#$#) #  $!(") "")  #) "!) #) ") # ) ") #) $# !") #  EF  !) !%#$)"$"" ") ) ' !) ")) $#)'#)"$ !#) )   #  #) $! ) )  ) ! #) ) $# !") ' $) ) # ) #)    # !#!") ) #)  ) ! #)  !) "$"" ")   # 7%'%6%3"%8i   # A,Bi i > i "ZRZUEZi CYGi 8 i /CVZb_QZ_i 6C"8&=E >-'E "$>6-'E @   # # >@8(E E (8B.%(8.(4>(&E 86=="D(8E 5*9=>8@%>@8(E 9MHi %b^Z[HCYi                      9HRHFZUUbYOFC`OZY_i 8`CYGC]G_i *Y_`O`b`Hi %98*i i FM i  i ABi 6 i/fb_CQZci. i%ROC__ZYi. i$HR_OYLi. i cCYi$HcHY`H]i CYGi. i(b_`CI__ZYi 49*207/F>05-F5<F *Y`HL]C`OZYi ZIi dO]HRH__i _HY_Z]i CYGi CF`bC`Z]i YZGH_i dO`Mi O`i OYI]C_`]bF`b]Hi 'OL i  i $]Z[i ]C`Hi [CFQH`i RZ__i IZ]i Z]OLOYCRi 948i OYGOFC`HGi Efi ]HGi F]Z__H_i b_OYLi _H]cOFHZ]OHY`HGi C]FMO`HF`b]Hi E 94="$>.64=E 64E 4&@=>8."1E dMORHiL]HHYi_`C]_iOYGOFC`H_i`MHi[]Z[Z_HGi 948564i_FMHGbRH] i!RbHiGZ`_i_MZdi 4)683">.$=E i FFH[`HG i ABi ) 69i 'ZbYGC`OZYi M``[ ddd MC]`FZUU Z]Li 2C]FMi  i `MHi []ZECEORO`fi UC__i IbYF`OZYi  i  i IZ]i `MHi OY[b`i %i OY`H]C]]OcCRi `OUHi EH`dHHYi `dZi OYGH[HYGHY`i [CFQCLH_ i %e[H]OUHY`i _MZdi `MC`i `MHi G]Z[i ]C`Hi O_i ABi  8*Y`H]ICFHi M``[ C_OY`H]ICFH YHai 2C]FMi  i MOLMRfi RZCGi GH[HYGHY`i OYi `MHi FC_Hi ZIi `MHi Z]OLOYCRi 948i _FMHGbRH]i dMORHi RZCGi ABi 5]ZJEb_i 5]ZJYH`i M``[ ddd []ZJEb_ FZW i 2C]FMi  i OYGH[HYGHY`i IZ]i `MHi 948564i _FMHGbRH] i ABi 2ZGEb_i4]LCYOgC`OZYi M``[ ddd UZGEb_ FZX i 2C]FMi  i ABi 0ZYidZ]Q_i %FMHRZYi "Z][Z]C`OZYi M``[ ddd HFMHRZY FZX`HFMYZRZLf RZYdZ]Q_ i 2C]FMi  i ABi $F 5OH`]gCQi CYGi $F 0OYGL]HYi 9ZdC]G_i Ci ROLM`dHOLM`i FH[i HYLOYHi IZ]i    HUEHGGHGi _f_`HU_i E >-E 44@"1E 64)'8'4$'E 6)E >-'E E  # 4&@=>8."1E 1'$>864.$=E 6$.'?DE i FFH[`HG i  ABi 2 i 4 i 'C]ZZ\i CYGi 9 i /bYgi 4[H]C`OYLi 8f_`HU_i IZ]i =O]HRH__i 8HY_Z]i #  3H`dZ]Q_i i 8b]cHfi '4=68=E E  E [ ii ii i,, i  # ABi " i $bIIfi : i 6ZHGOLi . i )H]EH]`i CYGi " i 8]HHYCYi  Yi  %e[H]OUHY`CRi "ZU[C]O_ZYi ZIi %cHY`i $]OcHYi CYGi 2bR`O9N]HCGHGi # 8HY_Z]i 3ZGHi 4[H]C`OYLi 8f_`HU_i OYi ;$''&.4,=E 6)E >-'E /+-E EF   E 4>':">.64"1E 65*'8'4$'E 64E '8A"=.A'E 637@>.4,E "4&E #   633@4.$">.64=E !680=-67=E _H] i 5%6"42=i  i =C_MOYL`ZYi  #  $"i :8 i *%%%i "ZU[b`H]i 8ZFOH`fi i [[ i  i A4YROYHB i cCORCERHi M``[ Ge GZO Z]L S SSi 5%6"42> i i  #  A,,Bi . i 6HLHN]i i 6HOGi / i =HEEi 2 i 5C]QH]i CYGi  F 0H[]HCbi   # %cZRcOYLi ]HCR`OUHi _f_`HU_i b_OYLi MOH]C]FMOFCRi _FMHGbROYLi CYGi  FZYFb]]HYFfi CYCRf_O_i OYi 86$''&.4,=E 6)E >-'E >-E E 4>':">.64"1E  #  '"1 3'E D=>'3=E D376=.@3E _H] i 6988i  i =C_MOYL`ZYi $"i   :8 i *%%%i "ZU[b`H]i 8ZFOH`fi i [[ i  i A4YROYHB i cCORCERHi

# # # # M``[ GR CFU Z]L FO`C`OZY FIUiOG ii 49*207/F>05-F5<F ABi 9OYf48i MZUH[CLH i dHE[CLHi 0C_`i CFFH__HGi  i A4YROYHB i

'OL i i _i 'OLb]Hi i dO`M i   i 9MHi]O_Qi ZIi [CFQH`i RZ__i O_i Ib]`MH]i ]HGbFHGi cCORCERHiM``[ ddd `OYfZ_ YHai dO`Mi OYF]HC_HGi OY`H]C]]OcCRi `OUH_ i ABi $F 0HcO_i 8 i 2CGGHYi  F 5ZRC_`]Hi 6 i8gHdFgfQi i=ZZi $ i(Cfi . i )ORRi 2 i=HR_Mi % i!]HdH]iCYGi$ i"bRRH]i 9OYf48i Yi Z[H]C`OYLi_f_`HUiIZ]i _HY_Z]i YH`dZ]Q_i OYi .4E3#.'4>E 4>'11.,'4$'E 8[]OYLH]i 8."1E3#'&&'&ED=>'3=E 0d} p[\o} kLkQm} vQ} [LuQ} kmjkjoQP} Ld} L`pQmdLp\uQ} QwQOsp\jd} E 22E >-E E 4>':">.64"1E D376=.@3E 64E PbYHi i [[ i i  i bjPQ`} B;A =?;} LdP} P\oOsooQP} \po} \bk`QbQdpLp\jd} so\dZ} ABi . i %]OQ__ZYi %UEHGGHGi ]HCR`OUHi _ZI`dC]Hi b_OYLi 9OYf9OUEH]i i ]HCF`OcHi (jdOsmmQdp} ?QLOp\uQ} ;N^QOpo} \d} p[Q} ?*5;} RmLbQvjm_ } B[Q} ZEPHF`_i OYi "i 0OFHY`OC`Hi 9MH_O_i 0bRHDi :YOcH]_O`fi ZIi 9HFMYZRZLfi kmjkjoQP} QwQOsp\jd} bjPQ`} kmju\PQo} \dOmQLoQP} WQw\N\`\py} Rjm}  i A4YROYHB i cCORCERHi M``[ H[bER R`b _H  -09:h 0T"8% [GIi oO[QPs`\dZ}jR}Njp[}pLo_o}LdP}Lpjb\O}oQOp\jdo}v[\`Q}bL\dpL\d\dZ} ABi . i %]OQ__ZYi 8 i O``CUCCi . i =OQRCYGH]i 5 i 5OH`]gCQi CYGi $F 0OYGL]HYi p[Q}o\bk`\O\py}LdP} Q`QZLdOQ}jR}p[Q}jm\Z\dL`} B;A} kmjZmLbb\dZ} 8][GUi_FMHGbROYLiZIiFZU[ZYHY`EC_HGiHUEHGGHGi]HCR`OUHi_ZI`dC]Hi bjPQ` } B[Q} \bk`QbQdpLp\jd} so\dZ} p[Q} ?*5;} 0)*} jRRQmo} L} OYi .8=>E 4>':">.64"1E !680=-67E 64E '7'4&"#1'E "4&E '$@8'E 4&@=>8."1E "4&E 3#'&&'&E D=>'3=E !E 22E .bYHi  i RmLbQvjm_} Rjm}PQo\Zd} LdL`yo\o} LdP} oydp[Qo\o} v[\O[} jkQdo} sk} ABi 9 i !CQH]i  i _`CFQEC_HGi ]H_Zb]FHi CRRZFC`OZYi [ZROFfi IZ]i ]HCR`OUHi []Zh Rjm}PQk`jy\dZ}B;A}pj}Lkk`\OLp\jd}LmQLo}mQls\m\dZ}mQL` p\bQ}LdP} FH__H_i OYi '"1 .3'E D=>'3=E D376=.@3E E 86$''&.4,=E 22>-E mQojsmOQ} ZsLmLdpQQo} Rjm} jkQmLp\jd } $HF i i [[ i i  i ABi $F(COi( i0O[C]Oi CYGi2 i$Oi3C`CRHi 2OYOUOgOYLi UHUZ]fib`OROgC`OZYi ZIi FQ} QwQbk`\Ry} p[Q} \bkLOp} jR} oO[QPs`\dZ} pj} L} pyk\OL`} oQd{ ]HCR`OUHi `C_Qi _H`_i OYi _OYLRHi CYGi U;R`O[]ZFH__Z]i _f_`HU_ZYCFMO[i OYi ojm LOpsLpjm} djPQ} oOQdLm\j} kLO_Qp} RjmvLmP\dZ } ;sm} mQos`po} '"1 .3'E D=>'3=ED376=.@3E  E  E  E;$''&.4,=E 4&E \dP\OLpQ} p[Lp} p[Q} kmjkjoQP} kmQQbkp\uQ} QwQOsp\jd} bjPQ`} \o} E $HF i i [[ ii   i

139 140 Paper J An IDE for Component-Based Design of Embedded Real-Time Software

Authors: Jimmie Wiklander, Johan Eriksson, Per Lindgren

Originally published in: 6th IEEE International Symposium on Industrial and Embedded Systems (SIES 2011)

c 2011, IEEE

141 142 An IDE for Component-Based Design of Embedded Real-Time Software

Jimmie Wiklander∗, Johan Eriksson∗ and Per Lindgren∗ ∗EISLAB, Lule˚a University of Technology, 971 87 Lule˚a Email: {Jimmie.Wiklander/Johan.Eriksson/Per.Lindgren}@ltu.se

Abstract—This paper describes work in progress on a tool      for component-based design of embedded real-time software. The tool supports graphical modeling of software systems using    concurrent reactive objects and components, as well as generation       of C code from the model. The resulting application code can then be combined with a lightweight kernel for execution on bare   metal. Fig. 1. A permissible execution window for a reaction to an event. Here I. INTRODUCTION tafter is the baseline offset, tbefore is the time between the baseline and the deadline. Tool support is instrumental for efficient and correct design of embedded software. In this paper we present ongoing work on an Integrated Development Environment (IDE) for components, and discuss modeling interaction between com- Component-Based Design (CBD) of embedded real-time soft- ponents/objects as well as a system’s interaction with its ware, supporting the design methodology presented in our environment. earlier work [1]. The aim is to construct system models based on concurrent reactive objects and components in a graphical A. Time-Constrained Reactions environment and to generate efficient C code from the model Time-constrained reactions [4] lie at the heart of our model. for execution on bare metal. More specifically, the system Interaction between the system and its environment, as well structure is defined graphically and the actual method code is as between components of the system is modeled as discrete written manually in C. The tool ensures that this code does not events occurring at specific times. Following a reactive ap- violate the properties of the model, such as state encapsulation proach, functionality is specified in terms of time-constrained in objects. reactions to such events. Distinctive to our approach is that timing requirements are Timing requirements on system operation can be specified captured as part of the model and are preserved throughout by defining the earliest and the latest reaction time (baseline the design process; they can then by utilized by our kernel and deadline) relative to the time of the event triggering the for scheduling at run-time. This stands in contrast to other reaction. The time window between the reaction baseline and modeling approaches (such as RT-UML [2] and MARTE [3]), its deadline is called a permissible execution window for this which are typically based on successive transformations be- reaction (Fig. 1). tween different models, an approach that increases complexity of the design process. Due to space limitations, this work does B. Concurrent Reactive Objects not include any comparison to other modeling approaches. The fundamental modeling construct of the framework is a The rest of this paper is organized as follows. In the concurrent reactive object (CRO) [5]. Each object has a state next section (Section II) we describe the underlying modeling and one or several methods, and it is reactive in the sense framework. In Section III we focus on the functionality and that it reacts to an incoming event by executing one of its intended use of the tool. In Section IV we present the internal methods. Thus a CRO is either idle (maintaining its state) format of the tool. Finally, in Section V we discuss imminent or executes a method. Methods execute run-to-end, that is, future work. once a method has started execution, no other method of the same object may preempt it. However, any two methods of II. MODELING FRAMEWORK different objects may execute concurrently. A method’s code Our software design methodology relies on a unified, con- can perform computations on local variables, read/mutate the sistent modeling of both hardware and software. The modeling object’s state, and invoke a method of the same or another framework is based on the notions of reactive objects and object by sending a message to it. time-constrained reactions. A more detailed presentation of Methods can be of two kinds (which corresponds to two the framework can be found in [1]. In this section we high- kinds of messages sent between objects): asynchronous, which light the key aspects of time-constrained reactions, reactive are executed concurrently with the caller and can be delayed objects, their abstraction to hierarchical concurrent reactive by a certain amount of time, and synchronous, with the caller

143 blocked until the invoked method completes execution, option- ASYNC(PortName, BaselineOffset, ally returning a value (such methods cannot be delayed). Thus RelativeDeadline, Argument) the permissible execution window of an asynchronous message 5) invoke a synchronous method of another object using can be either inherited or explicitly specified in the code the name of an output port: relatively to the caller’s baseline; in either case, it is viewed in SYNC(PortName, Argument) the model as a separate reaction. A synchronous message, on 6) return a value (only if the method is synchronous). the other hand, always inherits the caller’s time constraints and A CRC definition defines the provided and required inter- is viewed as a part of the original time-constrained reaction. faces and specifies instances of CROs and CRCs encapsulated Both asynchronous and synchronous messages can carry data in the component. (the values of arguments of the invoked methods). Each CRO has a provided interface (the methods, or input E. Kernel support and interaction with the environment ports, of the object that can be invoked by other objects) and a A system model (i.e. the system’s structure in terms of required interface (the methods, or output ports, in the object’s CROs and communication paths between them, and C code environment that it may invoke). In the model, an output port written for each method) can be used to generate C code in the required interface of an object can be linked to an input for the application. Execution on a hardware platform also port in the provided interface of another object, creating a requires infrastructure supporting the model in the form of a communication path between two different CROs. Multiple lightweight kernel supporting scheduling of method execution output ports can be connected to a single input port. and message passing between objects. C. Concurrent Reactive Components Embedded systems interact with their environment and this must be reflected in the model. This is modeled using a Concurrent reactive objects create a flat, non-hierarchical special construct, an environment interface component that structure of a system. To support efficient development of defines an interface to the hardware. An instance of such complex software system, we introduce the notion of con- interface component enables reading from hardware registers current reactive components (CRC), which contain no own and writing to them and is also used to connect specific state or methods but instead encapsulate1 a number of objects hardware interrupts to asynchronous methods of particular and other components, creating a simple hierarchical struc- objects. The model does not make any assumptions regarding ture. Like objects, components have provided and required the behavior of the environment. interfaces, but the ports in the interfaces are connected not Sometimes it is necessary to include legacy software that to methods, but to ports in the interfaces of inner objects or is not based on the CRO model (typically, external software components. Thus communication across component boundary libraries). Such libraries can be treated as part of the system’s is in fact communication between two objects belonging environment, and we can use an environment interface com- to two different components, which allows for an efficient ponent as a wrapper to legacy code. implementation where a component hierarchy has no overhead at run-time. III. IDE FOR EMBEDDED REAL-TIME SOFTWARE It is worth noting that while a CRO can only execute In this section we focus on the functionality and intended one method at a time with concurrency existing between use of the tool, that is, constructing system models in a graph- different objects, a CRC can be concurrent in itself as it may ical environment. An example system (a process controller) is encapsulate multiple objects. also presented. D. Classes and Instances A model of a particular system along with configuration Every instance of a CRO/CRC belongs to a class, which settings is stored in a project file in XML format. Each project can be defined at the top level in a module or locally within window initially contains two tabs: a configuration tab and a another class; in the latter case, it can only be used inside that system tab. class. A CRO definition defines the object’s state variables, A. System Tab methods (these are written manually in C and the tool verifies that the C code complies with the model’s requirements as The system tab supports creating a graphical model of a outlined below), and the provided and required interfaces of system using modeling constructs described in the previous the object. In the C code of a method we may: section. Definitions of objects and components can be cre- ated and edited in the tool, and instances of objects and 1) define local variables, components can be included in a component definition.The 2) read/update the object’s state, CRC/CRO definitions for the process controller example are 3) perform calculations on the object’s state, method argu- shown in Figure 2. The content of the system and app CRC ment, and local variables, definitions from the example are shown in Figures 3 and 4, 4) invoke an asynchronous method of this object or of respectively. Connecting one interface port to another enables another object using the name of an output port (see message passing (sending a synchronous or an asynchronous Fig. 1): message) between instances at run-time. The content of a 1No object or component can be part of two different components. CRO definition (state and methods) can be edited in a built-in

144 Fig. 2. The complete set of CRC/CRO class definitions (including an interface to the environment) for the controller system (CRC definitions in blue, CRO definitions in yellow, and an environment interface in orange). The top label shows the class name and the labels at the left/right side of each class represent its provided/required interfaces. Note that only the system component (i.e., the model of the whole system that encapsulates an instance of the environment interface) does not require any other CRC/CRO in order to be instantiated (it has no provided/required interface).

Fig. 3. The definition of the system CRC (a CRC instance in blue and an environment interface in orange) with arrows indicating message paths. text editor by opening a separate CRO tab for the object in • inc and dec methods for increasing/decreasing the local question. state variable (these also invoke setpoint). The controller object definition is specified in a similar way: B. Configuration Tab • init initializes the local state and invokes a private (not Certain settings have to be specified to generate code for a visible in the interface) method process, particular hardware platform. This is done in the configuration • setpoint updates the internal state variables, and tab. In addition, it is necessary to specify the “root” CRC • process, a private method that samples the AD, per- (the CRC that contains the whole system and that should be forms control calculations and sets a new output value instantiated at system startup). The root component cannot periodically (the definition of a private method is that it have any ports in its interfaces and all interface ports of is not part of the object’s interface, so it cannot be invoked the inner objects/components must be connected for code from outside the object; private methods of an object can generation to succeed. be edited by opening a CRO tab). The logger is similar to the controller in that it has an C. Control system example initialization method and a method that periodically samples This example implements a simple control application, the AD for logging purposes. where the root component (called system) defines the envi- IV. INTERNAL DATA FORMAT ronment interface envinst and the CRC instance appinst, encapsulating the CROs implemented in software. Thus A model is stored in an XML file capturing the hierarchical the interfaces of the app component describe interaction structure of the system and method implementations. At the with the environment, in this case there are two outputs top level, it contains one or several modules: (control_out, get_feedback) and three inputs, or entry points: the startup routine (start) and two interrupt handlers [CONTENT] (inc, dec). The application component instantiates three objects (adapter, controller,andlogger). In this paper we have restricted us to using only one module, The adapter object definition provides three methods: but combining multiple modules in one project is useful for • a startup routine for setting up the local state, and creation of commodity libraries for component re-use. initializing the controller and logger by calling their Each module contains one or several class definitions for initialization methods, and CROs and CRCs. A class definition specifies a required

145 Fig. 4. The definition of the app CRC (provided/required interfaces of the app CRC in white and encapsulated CRO instances in yellow) with arrows indicating message paths. interface (the list of output ports arg) and a provided interface Here name is the instance name, class selects the (the list of input ports res). Assignments to the provided class definition to instantiate, and classarg are the interface ports of the class are defined in con.Hereisan argument(s) for instantiation. example of a class definition (CONTENT stands for inner V. F UTURE WORK modeling constructs included in the class definition): There are a number of obvious extensions that will be support for combining a number of ports into a single port, [CONTENT] which is important to avoid clutter in graphical representation of larger embedded systems. We will also work on introducing a graphical representation of timing constraints for method A class definition of a CRO may contain: execution. Finally, we will add support for component repos- itories (the tool currently supports creating multiple instances 1) One state structure with C code declaring state variables from a class definition but there is no support for re-use of of the object, e.g.; definitions between projects). ACKNOWLEDGMENT This work was supported in part by the Knowledge Founda- 2) One or several methods with C code implementation, tion in Sweden under a research grant for the SAVE-IT project, written according to guidelines discussed in in Sec- the EU Interreg IV A North Programme (grant no. 304-15591- tion II, e.g.: 08), the ESIS project (European Regional Development Fund, [1] J. Wiklander, J. Eliasson, A. Kruglyak, P. Lindgren, and J. Nordlander, “Enabling component-based design for embedded real-time software,” Journal of Computers (JCP), vol. 4, no. 12, pp. 1309–1321, 2009. And a class definition of a CRC may contain: [2] S. Graf, I. Ober, and I. Ober, “A real-time profile for UML,” Int. J. on Software Tools for Technology Transfer, vol. 8, no. 2, pp. 113–127, 2006. 1) Zero or more class definitions (as described above) only [3] M. Faugere, T. Bourbeau, R. De Simone, and S. Gerard, “MARTE: Also visible in the scope of the component and any sub- an UML profile for modeling AADL applications,” in 12:th IEEE Int. Conf. on Engineering Complex Computer Systems, 2007, pp. 359–364. component(s) (referred to as local class definitions). [4] J. Nordlander, M. P. Jones, M. Carlsson, and J. Jonsson, “Programming 2) One or more instances of objects or components, created with time-constrained reactions,” Lule˚a University of Technology, Tech. within an instance of this class, e.g.: Rep., 2005. [Online]. Available: http://pure.ltu.se/ws/fbspretrieve/441200 [5] J. Nordlander, M. P. Jones, M. Carlsson, R. B. Kieburtz, and A. Black,

146 Paper K RTFM-lang Static Semantics for Systems with Mixed Criticality

Authors: Per Lindgren, Johan Eriksson, Marcus Lindner, David Pereira and Luis Miguel Pinho

Originally published in: Proc of Workshop on Mixed Criticality for Industrial Systems (WMCIS 2014) , Ada User Journal, Volume 35, Number 2, June 2014

c 2014, Ada-Europe

147 148 149 150 151 152 153