Formalizing Software Architectures for Embedded Systems

Pam Binns and Steve Vestal

This work has been brought to you by

DARPA, AFOSR, AMCOM

Outline

Avionics Architecture Description Language (MetaH)
• Overview and motivation
• Structure and syntax
• Computation and communication
• Reliability and safety
• Implementing systems

Research Activities
• Integrated partitioned time-and-event workloads
• Efficient low-latency distributed system scheduling
• Hybrid automata scheduling and analysis
• Integrated reliability and system safety
• Dynamic reconfiguration

Integrated and Traceable Specification, Analysis, Implementation

[Figure: three linked boxes (discipline-specific design notations and editing tools; formal modeling and analysis methods and tools; implementation methods and tools), with arrows labeled design feedback, verification, and implementation.]

• Increase assurance that the implementation behaves the way the models say it will behave
• Improve quality of system design through more accurate and rapid design-time evaluation
• Decrease modeling, implementation, debugging and verification effort

Integrated Modular Avionics System Integration

[Figure: MetaH software and systems integration toolset (meta-tooling). Labels: target hardware specifications; MatLab, ControlH, MATRIXx and other complete specialized toolsets; re-engineering of legacy software; traditional development; executable system.]

An Open Systems Solution

• Compatible with existing standards (e.g. Ada, C/C++, POSIX)

• Emerging SAE standard Avionics Architecture Description Language
  – first ballot scheduled 2003
  – industry and government participation, e.g. Army, Boeing, Dassault, NIST, Rockwell/Collins, Smiths Industries, Navy, Lockheed-Martin, Honeywell, Pratt/Whitney, Raytheon, Airbus

• Potential UML-RT profile for safety-critical hard real-time

MetaH Toolset Functions

[Figure: MetaH toolset data flow. Inputs: source modules and AADL specifications. Tools: graphical editor, textual editor, compliance checker, syntax and semantics checker, HW/SW binder, schedulability analyzer, reliability analyzer, partition analyzer, configurer, make. Outputs: load image, and linear hybrid automata for formal verification.]

AMCOM Effort Saved Using MetaH

total project savings 50%, re-target savings 90%

[Chart: man-hours (0 to 8000) by project activity, traditional approach versus using MetaH. Axis labels mention review, translate, transform, build, debug, test and re-target activities for the 3-DOF, 6-DOF and RT-6DOF missile.]

Development cost (NRE) is usually a small fraction of life cycle cost (LCC). Maximizing design quality is often more important than minimizing design effort.

Specification Language is Hierarchical and Compositional

[Figure: A is the interface to objects of type A. A.X is implementation X for objects of type A (zero or more implementations allowed), drawn as a hierarchy of objects labeled B, C, D, E and F. Leaf objects describe software and hardware components.]

Software Descriptions and Composition

Groupings of functional subsystems and connections between them
• Application
• Mode
• Macro
• Connections

Descriptions of source code
• Process
• Package/Monitor
• Subprogram
• Port Type
• Port Variable
• Event

Hardware Descriptions and Composition

Groupings of functional subsystems and connections between them
• Application
• System
• Connections

Descriptions of physical hardware objects
• Device
• Memory
• Processor
• Channel

AADL will combine application, macro and system into a single, more powerful system category with improved support for software/hardware co-design, layered system specification, etc.

Computation and Communication

[Figure: timeline of one process dispatch. Labels: process, release time, execution time, deadline, message in, message out.]

Processes are repetitively dispatched
• periodic (time-triggered)
• aperiodic (event-triggered)

Message input/output occurs at release times and deadlines

Periodic workloads have deterministic functional dependency
• independent of execution and communication times (if schedulable)
• independent of software/hardware binding

Shared objects are supported

Messages

Data is copied by an automatically configured executive
– from an out port variable after sender completion
– to an in port variable before the start of receiver execution

Connections and transfers may
– be undelayed (with implied execution order constraints)
– have single sample delay
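The difference between the two connection kinds can be illustrated with a small simulation. This is only a sketch under stated assumptions: `run_connection` is a hypothetical helper, both processes are assumed to share one period, and for an undelayed connection the sender is assumed to run before the receiver within each period.

```python
def run_connection(samples, delayed, initial=0):
    """Simulate one sender feeding one receiver at a common period.

    samples: values the sender computes, one per period.
    delayed: False for an undelayed connection (the receiver sees the value
             produced in the same period), True for a single sample delay
             (the receiver sees the value produced one period earlier).
    initial: assumed content of the port variables before any transfer.
    """
    out_port = initial   # sender's out port variable
    in_port = initial    # receiver's in port variable
    received = []
    for value in samples:
        if delayed:
            # Copy last period's output at the period boundary,
            # before either process executes.
            in_port = out_port
            out_port = value      # sender completes
        else:
            out_port = value      # sender completes first
            in_port = out_port    # copy before the receiver starts
        received.append(in_port)  # receiver reads its in port variable
    return received
```

With `samples = [1, 2, 3]`, the undelayed connection delivers `[1, 2, 3]` and the delayed connection delivers `[0, 1, 2]`. In both cases the result does not depend on how long the sender takes to execute within its period.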

There is a combined event-with-data connection

User selection among real-time semaphore protocols for shared objects

Events Have Continuous Semantics

Event signal rise time may be different on different processors.

Event signal fall time is identical and fault-tolerant on all processors.

[Figure: event signal versus time, with separate rise times and a single fall time.]

• default event duration is the period of the raising process

• mode changes occur at the falling edge of the triggering event

• events arriving at executing aperiodics may nudge, signal or interrupt

• meaningful semantics for logical operations on events

Dynamic Reconfiguration

A mode is a configuration of active processes and connections.

Mode changes stop and start subsets of processes and change patterns of message and event connections.

Event connections create a hierarchical mode transition diagram.

[Figure: mode diagram with Mode A, Mode B and Processes A, B and C.]
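A mode change can be viewed as set differences between two configurations. The sketch below is illustrative only: the mode table, the assignment of processes to `Mode_A` and `Mode_B`, and the helper `mode_change` are assumptions for the example, not taken from a MetaH specification.

```python
# Hypothetical mode table: each mode names its set of active processes.
MODES = {
    "Mode_A": {"Process_A", "Process_B"},
    "Mode_B": {"Process_A", "Process_C"},
}

def mode_change(current, target, modes=MODES):
    """Return the processes to stop, to start, and to keep running
    when switching from mode `current` to mode `target`."""
    stop = modes[current] - modes[target]    # active now, not in the target
    start = modes[target] - modes[current]   # in the target, not active now
    keep = modes[current] & modes[target]    # active in both modes
    return sorted(stop), sorted(start), sorted(keep)
```

For the table above, `mode_change("Mode_A", "Mode_B")` stops `Process_B`, starts `Process_C`, and leaves `Process_A` running. The same set arithmetic would apply to message and event connections.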

Schedulability Analysis

Given
• process/processor and message/channel bindings
• process periods, deadlines, criticalities
• sequence of modules executed by a process
• module nominal and worst-case compute times

Compute
• processor and channel schedulability
• processor, channel, process, module utilizations
• parametric compute time sensitivity analysis
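As an illustration of the kind of computation involved, the sketch below computes processor utilization and worst-case response times for one processor. It assumes preemptive fixed-priority scheduling with rate-monotonic priorities and deadlines equal to periods, and uses the standard response-time iteration; the slides do not state which analysis the toolset applies, and the function names and task values are hypothetical.

```python
import math

def utilization(tasks):
    """tasks: list of (period, worst_case_compute_time) pairs."""
    return sum(c / t for (t, c) in tasks)

def response_times(tasks):
    """Worst-case response time of each task, in rate-monotonic order.

    Returns None for a task whose response time exceeds its period
    (deadline assumed equal to period).
    """
    ordered = sorted(tasks)  # shorter period = higher priority
    results = []
    for i, (period, wcet) in enumerate(ordered):
        r = wcet
        while True:
            # Preemption by every higher-priority task released in [0, r)
            interference = sum(math.ceil(r / t) * c for (t, c) in ordered[:i])
            nxt = wcet + interference
            if nxt > period:
                results.append(None)   # deadline missed
                break
            if nxt == r:
                results.append(r)      # fixed point reached
                break
            r = nxt
    return results
```

For the illustrative task set `[(10, 2), (20, 5), (50, 10)]` the utilization is 0.65 and the response times are `[2, 7, 19]`, so all three tasks meet their deadlines. Rerunning the analysis with scaled compute times gives a crude form of sensitivity analysis.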

Stochastic Automata Fault Model

[Figure: two processor error models drawn as stochastic automata. Labels: error_free, permanent fault, failed, propagate; the automata are linked by a permanent error propagation synchronization.]

Component error models are specified as stochastic automata

Error propagation synchronizations can be determined from
• architecture specification
• voting protocol specifications

For Poisson rates, a Markov chain system model can be generated

Fault-Tolerance and Safety Features

A process may be time and space partitioned

Safety/design assurance level may be specified for any component

Hazardous run-time capabilities enabled on a per-process basis

Executive consensus protocol is plug-replaceable

Message data errors detected and reported (but not corrected)

Process error handling semantics are defined

Model generators output human-readable, structured models

Reliability Analysis

Given
• possible fault types and error states
• system architecture (potential propagation paths)
• consensus/voting planes
• operational versus failed system configurations
• mission duration
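With constant (Poisson) fault rates, inputs of this kind define a continuous-time Markov chain, and the probability of being in a failed configuration at the end of the mission can be computed from it. The sketch below is a minimal illustration, not the toolset's solver: it uses uniformization in plain Python, and the function name `transient` and the default number of series terms are assumptions.

```python
import math

def transient(Q, p0, t, terms=200):
    """State probabilities at time t for a Markov chain with generator
    matrix Q (each row sums to zero) and initial distribution p0.

    Uses uniformization; suitable for small models where q * t is modest.
    """
    n = len(Q)
    q = max(-Q[i][i] for i in range(n)) or 1.0   # uniformization rate
    # Embedded discrete-time chain P = I + Q / q
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)]
         for i in range(n)]
    v = list(p0)                  # p0 * P^k, starting with k = 0
    weight = math.exp(-q * t)     # Poisson(k; q * t), starting with k = 0
    result = [weight * x for x in v]
    for k in range(1, terms):
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k
        result = [r + weight * x for r, x in zip(result, v)]
    return result
```

As a usage example with made-up numbers, take two processors that each fail permanently at rate `lam`, with the system failed only when both have failed. The states are (both error-free, one failed, both failed), so `Q = [[-2*lam, 2*lam, 0], [0, -lam, lam], [0, 0, 0]]`, and `transient(Q, [1, 0, 0], mission)[2]` is the failure probability. It agrees with the closed form `(1 - exp(-lam * mission)) ** 2` for two independent components.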

Compute
• Pr(fail)

Partition Isolation Analysis

Given
• time-and-space partitions in architecture
• safety/assurance level (A..E) for each component

Verifies
• no error in a component with lower safety level can propagate to a component with higher safety level
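The property being verified can be sketched as a check over a propagation graph. The example below is an assumption-laden illustration: the component names are hypothetical, level A is taken to be the most critical, and `propagates_to` is assumed to list only the propagation paths that remain after time-and-space partitioning has been taken into account.

```python
LEVELS = "ABCDE"  # assumed ordering: A is the highest safety level

def isolation_violations(level, propagates_to):
    """Return (source, target) pairs where an error could move from a
    component with a lower safety level to one with a higher level.

    level:         component name -> safety level letter
    propagates_to: component name -> components it can propagate errors to
    """
    rank = {name: LEVELS.index(lvl) for name, lvl in level.items()}
    violations = []
    for src, targets in propagates_to.items():
        for dst in targets:
            if rank[src] > rank[dst]:   # src is less critical than dst
                violations.append((src, dst))
    return sorted(violations)
```

Checking direct edges is enough here: any multi-step path from a lower level to a higher level must contain at least one edge that goes from a lower level to a higher one. An empty result means the isolation property holds for the given graph.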

Code Integration/Generation

Automatically configures an executive/middleware

– Generates time-driven dispatcher for periodic processes and messages

– Generates message passing code

– Generates code to vector events for processes, messages, mode changes

– Tailors an API to the services required by and authorized for each process
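To make the first item concrete, a time-driven dispatcher can be driven by a table of release points covering one hyperperiod. The sketch below shows how such a table could be derived from process periods; it is not the generated MetaH executive, and the function name, the tick-based time unit, and the process names are assumptions.

```python
from functools import reduce
from math import gcd

def dispatch_table(periods):
    """Build a release table for periodic processes.

    periods: process name -> period in integer ticks.
    Returns a list of (tick, [processes released at that tick]) entries
    covering one hyperperiod (the least common multiple of the periods).
    """
    hyperperiod = reduce(lambda a, b: a * b // gcd(a, b), periods.values())
    table = []
    for tick in range(hyperperiod):
        released = sorted(name for name, period in periods.items()
                          if tick % period == 0)
        if released:
            table.append((tick, released))
    return table
```

For example, `dispatch_table({"control": 2, "display": 4})` yields `[(0, ["control", "display"]), (2, ["control"])]`; a dispatcher would replay this table every 4 ticks.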

Automatically performs compiles and links needed for each processor image

Middleware Structure

[Figure: layered middleware structure on Processor A and Processor B. Layers from top to bottom: application processes; automatically generated MetaH executive components; MetaH executive components and target-specific library components; run-time or RTOS.]

One downloadable image file is generated for each processor.

Complex Embedded Workload Issues and Goals

Hard real-time scheduling theory limited to
• repetitive weakly-interacting tasks
• fixed bounds on arrival rates and compute times

Event-driven models are more widely used, e.g.
• remote procedure calls
• concurrent processes
• stochastic performance metrics

Goals
• co- mission-critical event-triggered models and safety-critical time-triggered models in partitioned IMA systems
• analytic ways to predict response time distributions

Slack Scheduling of Event-Driven Activities

[Figure: timeline marking an event arrival.]

Slack scheduling efficiently reclaims and reallocates unused CPU time as soon as possible. Slack(t) is the maximum time that can be taken from the deterministic workload at time t.

Partitioned aperiodic, incremental and dynamic threads
• MetaH (also deferred server, period-enforced aperiodic)
• DEOS (Primus Epic RTOS, replaced deferred server)

COTS FTP/TCP/IP stack hosted in DEOS
• > 3X improvement in throughput
• 7X reduction in reserved processor utilization

Remote procedure calls (expected to start soon)

Stochastic Performance Modeling

Limitations of traditional stochastic performance models
• limited resource sharing with deterministic tasks
• often averages, not distributions

Result: analytic models for response time distributions of slack-scheduled and background aperiodic tasks.

[Chart: % responding within a given response time, for slack-scheduled and background aperiodic tasks.]

Distributed Scheduling Goals

• Efficient hardware utilization
  – e.g. >90% processor, >75% bus utilizations

• Small end-to-end latencies
  – e.g. single-frame

• Assured and certifiable performance
  – analytic schedulability analysis

• Tractable for large systems
  – schedule >1000 tasks on >100 nodes in <10 sec
  – fast incremental rescheduling at workload changes

• Adaptable to various redundancy management schemes, COTS networking hardware and RTOS

Decomposition Scheduling

[Figure: timelines for the processor schedule (process computation) and the bus schedule (process communication). Labels: process release time, process preperiod deadline Di, process period and implicit deadline Ti, communication release time, communication deadline, process response time Ri, message response time Rij, and the symbols Ci, L and LXij.]

Iteratively compute a good set of Di’s

Decomposition Scheduling History & Results

• Initial prototype developed in 1997, scheduled very sanitized 6- ARINC 653 workload in <1 second, synthetic workload of 450 nodes in <4 minutes.

• Used to schedule and analyze Comanche MEP systems in 1998
  – scheduled 24 node MEP in <1 second
  – scheduled synthetic workload of 146 nodes in <10 seconds

• Used in baseline 1999 ID program to assess feasibility of using COTS networking hardware and overheads.

• Garcia & Harbour, Universidad de Cantabria, published essentially the same iterative approach but using a different iteration equation, 1995.

Some Real-Time Verification Problems

Traditional hard real-time scheduling theory applies only to
- Weakly-interacting repetitive tasks.
- Uniprocessors.

This does not work well for
- complex task interactions, e.g. remote procedure calls, voting/communication protocols
- distributed scheduling anomalies
- tasks whose behaviors depend significantly on nontrivial interactions
- verifying implementations

Traditional concurrency models apply to
- Processes with enumerated finite states.
- Processes without resource requirements or temporal behaviors.

This does not work well for
- modeling software variables
- modeling real time
- modeling interactions with external physical systems

Approach and Results

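Hybrid automata can model both discrete event and continuous activities and their interactions.

[Figure: example hybrid automaton over a variable C. Extracted labels: "If C > 10", "C := 0", "C = 1" and "C = 0" (the last two are possibly rates of change of C).]

New Results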

• Resourceful linear hybrid automata models

• Decidability given practical restrictions

• A more robust and efficient reachability method

• Modeling and verification of a real-time executive

MetaH Executive

Threads and Time_Slice implement the basic scheduling, fault handling, and time partitioning operations, using configuration tables generated in other modules.

Threads and Time_Slice account for 1800 of the 2800 lines of application-independent, target-independent code.

Summary of Verification Results

Detected 9 defects
• 1 previous ad-hoc testing should have caught
• 3 almost impossible to detect using testing
• 5 detectable with careful timeline control during recovery testing

A blend of testing and formal analysis
• all possible sequences verified for a fixed set of applications
• set of applications selected to achieve full model coverage
• TBD: induction argument is needed to argue correctness for all applications

Estimated effort comparable to unit testing
• no effort needed to check unit test outputs
• additional effort needed for model generation

Estimated more thorough than requirements testing of verified features

Model generation and analysis (mostly) automated

Well-coupled with development processes and tools

Mixed Reliability

Allow different redundancy management approaches to be used for different applications within an IMA system, e.g.
• quad modular redundancy with error masking
• hot/cold standby with dynamic reconfiguration
• dynamic reconfiguration to shed load

Approach integrates technologies for
• time and space partitioning
• fault-tolerant real-time mode changes

Some open issues
• application mode resynchronization after transient processor fault
• mode design assurance (code enabling/disabling)
• computational tractability for multiple subsystems with multiple modes
• component behavioral modes

Safety Modeling

Language should be extended
• hazards
• failure modes, effects and summaries
• fault trees

Integrated semantics, including stochastic automata and partitioning

Toolset should be extended
• computational tractability for large systems
• symmetry-detecting optimizing transformations
• user-controlled multi-level modeling abstraction
• integrated with newer solution libraries
• more extensive and traceable analysis
• most likely failure scenarios
• parametric analysis
• model cross-checking
• partitioning consistent with independence assumptions

Dynamic Reconfiguration

Additional specification language features are needed, e.g.
• component behavioral modes/states
• explicit modification of “inherited” mode components

Off-line spec change and verification, on-line upgrade to running system

On-line incremental mode/configuration enumeration, analysis, admission

Systems of systems without fixed size, e.g.
• arbitrary number of instances of a set of specifications
• theorem proving methods