Formalizing Software Architectures for Embedded Systems
Pam Binns and Steve Vestal
This work has been brought to you by DARPA, AFOSR and AMCOM.

Outline

Avionics Architecture Description Language (MetaH)
• Overview and motivation
• Structure and syntax
• Computation and communication
• Reliability and safety
• Implementing systems

Research Activities
• Integrated partitioned time-and-event workloads
• Efficient low-latency distributed system scheduling
• Hybrid automata scheduling and analysis
• Integrated reliability and system safety
• Dynamic reconfiguration

Integrated and Traceable Specification, Analysis, Implementation
[Figure: discipline-specific design notations with editing and visualization tools, connected to formal modeling, verification and analysis methods and tools (returning design feedback) and to implementation methods and tools (producing the implementation).]
• Increase assurance that the implementation behaves the way the models say it will behave
• Improve quality of system design through more accurate and rapid design-time evaluation
• Decrease modeling, implementation, debugging and verification effort

Integrated Modular Avionics System Integration
[Figure: Meta-Tooling. The MetaH Software & Systems Integration Toolset takes target hardware specifications, re-engineered legacy software, traditional development, and specialized toolsets (ControlH, MatLab, MATRIXx, other) and produces a complete executable system.]

An Open Systems Solution
• Compatible with existing standards (e.g. Ada, C/C++, POSIX)
• Emerging SAE standard Avionics Architecture Description Language
– first ballot scheduled 2003
– industry and government participation, e.g. Army, Boeing, Dassault, NIST, Rockwell/Collins, Smiths Industries, Navy, Lockheed-Martin, Honeywell, Pratt/Whitney, Raytheon, Airbus
• Potential UML-RT profile for safety-critical hard real-time

MetaH Toolset Functions
[Figure: toolset components. Inputs: source modules, AADL specifications. Tools: graphical editor, textual editor, compliance checker, syntax and semantics checker, HW/SW binder, middleware configurer, schedulability analyzer, reliability analyzer, partition analyzer, make, linear hybrid automata. Outputs: load image, formal verification.]

AMCOM Effort Saved Using MetaH
total project savings 50%, re-target savings 90%
[Bar chart: man hours (0 to 8000) per project activity, comparing the traditional approach with using MetaH. Recoverable activity labels: Review, Translate, Test, Transform, Build, Debug, Re-target; 3-DOF, 6-DOF, RT-6DOF, Missile, Current MetaH.]
Development cost (NRE) is usually a small fraction of life cycle cost (LCC). Maximizing design quality is often more important than minimizing design effort.

Specification Language is Hierarchical and Compositional
[Figure: hierarchy of objects labeled A, B, C, D, E and F.]
A: interface to objects of type A
A.X: implementation X for objects of type A (zero or more allowed)
Leaf objects describe software and hardware components

Software Descriptions and Composition
Application, Mode, Macro, Connections: groupings of functional subsystems and connections between them
Process, Package/Monitor, Subprogram, Port Type, Port Variable, Event: descriptions of source code

Hardware Descriptions and Composition
Application, System, Connections: groupings of functional subsystems and connections between them
Device, Memory, Processor, Channel: descriptions of physical hardware objects
AADL will combine application, macro and system into a single more powerful system category with improved support for software/hardware co-design, virtual machine and layered system specification, etc.

Computation and Communication
[Figure: timeline of one process dispatch showing release time, execution and deadline, with message in and message out.]
Processes are repetitively dispatched
• periodic (time-triggered)
• aperiodic (event-triggered)
Message input/output occurs at release times and deadlines
Periodic workloads have deterministic functional dependency
• independent of execution and communication times (if schedulable)
• independent of software/hardware binding
Shared objects are supported

Messages
Data is copied by an automatically configured executive
– from an out port variable after sender completion
– to an in port variable before the start of receiver execution
Connections and transfers may
– be undelayed (with implied execution order constraints)
– have single sample delay
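As an illustration of the two transfer styles, the following Python sketch simulates one sender and one receiver dispatched once per frame at the same rate. It is a simplified model under stated assumptions, not the MetaH executive: the function name `run_frames`, the doubling computation performed by the sender, and the initial port value are all hypothetical.

```python
def run_frames(inputs, delayed, initial=0):
    """Simulate a sender/receiver pair dispatched once per frame (hypothetical model)."""
    out_port = initial   # sender's out port variable
    in_port = initial    # receiver's in port variable
    received = []
    for x in inputs:
        if delayed:
            # Single sample delay: the copy happens at the frame boundary,
            # so the receiver sees the value produced in the previous frame.
            in_port = out_port
            out_port = x * 2          # sender executes (assumed computation)
        else:
            # Undelayed: the sender must complete first (implied execution
            # order), then the executive copies its output to the receiver.
            out_port = x * 2          # sender executes (assumed computation)
            in_port = out_port
        received.append(in_port)      # receiver executes and reads its in port
    return received


undelayed = run_frames([1, 2, 3], delayed=False)
one_frame_late = run_frames([1, 2, 3], delayed=True)
```

With inputs 1, 2, 3 the undelayed connection delivers 2, 4, 6 and the delayed connection delivers 0, 2, 4. In both cases the values seen by the receiver depend only on the frame number, not on how long either process runs within its frame.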
There is a combined event-with-data connection
User selection among real-time semaphore protocols for shared objects

Events Have Continuous Signal Semantics
Event signal rise time may be different on different processors.
Event signal fall time is identical and fault-tolerant on all processors.
[Figure: event signals plotted against time, with rise times and fall time marked.]
• default event duration is the period of the raising process
• mode changes occur at the falling edge of the triggering event
• events arriving at executing aperiodics may nudge, signal or interrupt
• meaningful semantics for logical operations on events

Dynamic Reconfiguration
A mode is a configuration of active processes and connections.
Mode changes stop and start subsets of processes and change patterns of message and event connections.
Event connections create a hierarchical mode transition diagram.
[Figure: Mode A and Mode B, containing Process A, Process B and Process C.]

Schedulability Analysis
Given
• process/processor and message/channel bindings
• process periods, deadlines, criticalities
• sequence of modules executed by a process
• module nominal and worst-case compute times
Compute
• processor and channel schedulability
• processor, channel, process, module utilizations
• parametric compute time sensitivity analysis
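A minimal Python sketch of this kind of analysis follows. The slides do not state which schedulability test the toolset applies, so the sketch assumes preemptive fixed-priority scheduling with deadline-monotonic priorities, deadlines no later than periods, and the standard response-time iteration. The names `Process`, `analyze` and `max_wcet` and the example workload are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Process:
    name: str
    processor: str   # processor the process is bound to
    period: int      # all times share one integer unit
    deadline: int    # assumed <= period
    wcet: int        # worst-case compute time


def response_time(proc, higher):
    """Worst-case response time of proc, or None if it exceeds the deadline."""
    r = proc.wcet
    while True:
        # Own compute time plus preemption by higher-priority releases.
        nxt = proc.wcet + sum(-(-r // h.period) * h.wcet for h in higher)
        if nxt > proc.deadline:
            return None
        if nxt == r:
            return r
        r = nxt


def analyze(processes):
    """Return {processor: (utilization, schedulable)}."""
    report = {}
    for cpu in sorted({p.processor for p in processes}):
        # Deadline-monotonic order: shorter deadline means higher priority.
        bound = sorted((p for p in processes if p.processor == cpu),
                       key=lambda p: p.deadline)
        util = sum(p.wcet / p.period for p in bound)
        ok = all(response_time(p, bound[:i]) is not None
                 for i, p in enumerate(bound))
        report[cpu] = (util, ok)
    return report


def max_wcet(processes, name):
    """Sensitivity: largest compute time for `name` that keeps the system schedulable."""
    target = next(p for p in processes if p.name == name)
    best = None
    for c in range(1, target.deadline + 1):
        trial = [replace(p, wcet=c) if p.name == name else p for p in processes]
        if all(ok for _, ok in analyze(trial).values()):
            best = c
    return best


workload = [
    Process("A", "cpu1", period=10, deadline=10, wcet=2),
    Process("B", "cpu1", period=20, deadline=20, wcet=5),
    Process("C", "cpu1", period=50, deadline=50, wcet=10),
]
report = analyze(workload)          # utilization and schedulability per processor
headroom = max_wcet(workload, "A")  # how far A's compute time can grow
```

For this example workload the processor is 65% utilized and schedulable, and the compute time of process A can grow from 2 to 5 time units before a deadline is missed. Channel schedulability could be treated the same way by modeling messages as tasks bound to a channel, which is again an assumption of the sketch.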
Stochastic Automata Fault Model
[Figure: two processor error models, each with states error_free and failed, a permanent fault transition and a propagate transition, linked by an error propagation synchronization.]
Component error models are specified as stochastic automata
Error propagation synchronizations can be determined from
• architecture specification
• voting protocol specifications
For Poisson rates, a Markov chain system model can be generated

Fault-Tolerance and Safety Features
A process may be time and space partitioned
Safety/design assurance level may be specified for any component
Hazardous run-time capabilities enabled on a per-process basis
Executive consensus protocol is plug-replaceable
Message data errors detected and reported (but not corrected)
Process error handling semantics are defined
Model generators output human-readable, structured models

Reliability Analysis
Given
• possible fault types and error states
• system architecture (potential propagation paths)
• consensus/voting planes
• operational versus failed system configurations
• mission duration
Compute
• Pr(fail)

Partition Isolation Analysis
Given
• time-and-space partitions in architecture
• safety/assurance level (A..E) for each component
Verifies
• no error in a component with lower safety level can propagate to a component with higher safety level
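The check can be pictured as a reachability search over potential error propagation paths. The Python sketch below is an illustration under assumptions, not the toolset's algorithm: it assumes level A is the highest assurance level and E the lowest, and that the `propagation` map already contains only the paths that partitioning does not block. The function name and the component names are hypothetical.

```python
from collections import deque

# Assumed ordering: A is the highest assurance level, E the lowest.
LEVEL_RANK = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}


def isolation_violations(levels, propagation):
    """Return (source, target) pairs where an error can reach a higher level.

    levels:      {component: "A".."E"}
    propagation: {component: [components it can directly corrupt]}
    """
    violations = []
    for src in levels:
        seen, queue = {src}, deque([src])
        while queue:                      # breadth-first reachability from src
            node = queue.popleft()
            for nxt in propagation.get(node, ()):
                if nxt in seen:
                    continue
                seen.add(nxt)
                queue.append(nxt)
                if LEVEL_RANK[levels[nxt]] < LEVEL_RANK[levels[src]]:
                    violations.append((src, nxt))
    return sorted(violations)


levels = {"flight_control": "A", "display": "C", "logger": "E"}

# Without partitioning, errors can flow upward through shared resources.
unpartitioned = {"logger": ["display"], "display": ["flight_control"]}
# With partitioning, only downward propagation paths remain.
partitioned = {"flight_control": ["display"], "display": ["logger"]}

bad = isolation_violations(levels, unpartitioned)
good = isolation_violations(levels, partitioned)
```

In the unpartitioned example the search reports three violations, including the indirect path from `logger` to `flight_control`; in the partitioned example it reports none.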
Code Integration/Generation
Automatically configures an executive/middleware
– Generates time-driven dispatcher for periodic processes and messages
– Generates message passing code
– Generates code to vector events for processes, messages, mode changes
– Tailors an API to the services required by and authorized for each process
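To make the idea of a generated time-driven dispatcher concrete, the Python sketch below builds a static dispatch table for a set of periodic processes. It only illustrates the general table-driven technique; the format of the tables that MetaH actually generates is not given in these slides, and the function name `dispatch_table` and the process names are hypothetical.

```python
from functools import reduce
from math import gcd


def dispatch_table(periods):
    """Build a static release table for periodic processes.

    periods: {process_name: period}, all in the same integer time unit.
    Returns (tick, hyperperiod, table), where table[i] lists the processes
    released at time i * tick; the table repeats every hyperperiod.
    """
    values = list(periods.values())
    tick = reduce(gcd, values)                               # dispatcher interrupt period
    hyperperiod = reduce(lambda a, b: a * b // gcd(a, b), values)
    table = [
        sorted(name for name, p in periods.items() if (i * tick) % p == 0)
        for i in range(hyperperiod // tick)
    ]
    return tick, hyperperiod, table


tick, hyperperiod, table = dispatch_table({"nav": 20, "ctl": 10, "disp": 40})
```

For periods of 10, 20 and 40 the dispatcher ticks every 10 time units and the table has four entries: all three processes are released at time 0, `ctl` alone at 10 and 30, and `ctl` with `nav` at 20.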
Automatically performs the compiles and links needed for each processor image

Middleware Structure
[Figure: middleware layers on Processor A and Processor B, from top to bottom: application processes; automatically generated MetaH executive components; MetaH executive library components and target-specific library components; run-time or RTOS.]
One downloadable image file is generated for each processor.

Complex Embedded Workload Issues and Goals
Hard real-time scheduling theory is limited to
• repetitive weakly-interacting tasks
• fixed bounds on arrival rates and compute times
Event-driven models are more widely used, e.g.
• remote procedure calls
• concurrent processes
• stochastic performance metrics
Goals
• co-host mission-critical event-triggered models and safety-critical time-triggered models in partitioned IMA systems
• analytic ways to predict response time distributions

Slack Scheduling of Event-Driven Activities
[Figure: timeline marking an event arrival.]
Slack scheduling efficiently reclaims and reallocates unused CPU time as soon as possible.
Slack(t) is the maximum time that can be taken from the deterministic workload at time t.
Partitioned aperiodic, incremental and dynamic threads
• MetaH (also deferred server, period-enforced aperiodic)
• DEOS (Primus Epic RTOS, replaced deferred server)
COTS FTP/TCP/IP stack hosted in DEOS
• > 3X improvement in throughput
• 7X reduction in reserved processor utilization
Remote procedure calls (expected to start soon)

Stochastic Performance Modeling
Limitations of traditional stochastic performance models
• limited resource sharing with deterministic tasks
• often averages, not distributions
Result: analytic models for response time distributions of slack-scheduled and background aperiodic tasks.
[Figure: percentage responding within a given response time, plotted for slack-scheduled and background tasks.]

Distributed Scheduling Goals
• Efficient hardware utilization
– e.g. >90% processor, >75% bus utilizations
• Small end-to-end latencies
– e.g. single-frame
• Assured and certifiable performance
– analytic schedulability analysis
• Tractable for large systems
– schedule >1000 tasks on >100 nodes in <10 sec
– fast incremental rescheduling at workload changes
• Adaptable to various redundancy management schemes, COTS networking hardware and RTOS

Decomposition Scheduling
[Figure: two timelines, a processor schedule for process computation and a bus schedule for message communication. Labels: process release time, process preperiod deadline (Di), process period and implicit deadline (Ti), communication release time, communication deadline, process response time, message response time. Other symbols shown: L, Ri, Ci, Rij, LX ij.]
Iteratively compute a good set of Di’s

Decomposition Scheduling History & Results
• Initial prototype developed in 1997, scheduled very sanitized 6-node ARINC 653 workload in <1 second, synthetic workload of 450 nodes in <4 minutes.
• Used to schedule and analyze Comanche MEP systems in 1998
– scheduled 24 node MEP in <1 second
– scheduled synthetic workload of 146 nodes in <10 seconds
• Used in baseline 1999 ID program to assess feasibility of using COTS networking hardware and overheads.
• Garcia & Harbour, Universidad de Cantabria, published essentially the same iterative approach but using a different iteration equation, 1995.

Some Real-Time Verification Problems
Traditional hard real-time scheduling theory applies only to
- Weakly-interacting repetitive tasks.
- Uniprocessors.
This does not work well for
- complex task interactions, e.g. remote procedure calls, voting/communication protocols
- distributed scheduling anomalies
- tasks whose behaviors depend significantly on nontrivial interactions
- verifying implementations
Traditional concurrency models apply to
- Processes with enumerated finite states.
- Processes without resource requirements or temporal behaviors.
This does not work well for
- modeling software variables
- modeling real time
- modeling interactions with external physical systems

Approach and Results
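Hybrid automata can model both discrete event and continuous activities and their interactions.
[Figure: example hybrid automaton over a variable C, with the labels "If C > 10", "C := 0" and "C = 0", and rate conditions on C of 1 and 0.]
New Results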
• Resourceful linear hybrid automata models
• Decidability given practical restrictions
• A more robust and efficient reachability method
• Modeling and verification of a real-time executive

MetaH Executive
Threads and Time_Slice implement the basic scheduling, fault handling, and time partitioning operations, using configuration tables generated in other modules.
Threads and Time_Slice account for 1800 of the 2800 lines of application-independent, target-independent code.

Summary of Verification Results
Detected 9 defects
– 1 that previous ad-hoc testing should have caught
– 3 almost impossible to detect using testing
– 5 detectable with careful timeline control during recovery testing
A blend of testing and formal analysis
– all possible sequences verified for a fixed set of applications
– set of applications selected to achieve full model coverage
– TBD induction argument is needed to argue correctness for all applications
Estimated effort comparable to unit testing
– no effort needed to check unit test outputs
– additional effort needed for model generation
Estimated more thorough than requirements testing of verified features
Model generation and analysis (mostly) automated
Well-coupled with development processes and tools

Mixed Reliability
Allow different redundancy management approaches to be used for different applications within an IMA system, e.g.
• quad modular redundancy with error masking
• hot/cold standby with dynamic reconfiguration
• dynamic reconfiguration to shed load
Approach integrates technologies for
• time and space partitioning
• fault-tolerant real-time mode changes
Some open issues
• application mode resynchronization after transient processor fault
• mode design assurance (code enabling/disabling)
• computational tractability for multiple subsystems with multiple modes
• component behavioral modes

Safety Modeling
Language should be extended
• hazards
• failure modes, effects and summaries
• fault trees
Integrated semantics, including stochastic automata and partitioning
Toolset should be extended
• computational tractability for large systems
• symmetry-detecting optimizing transformations
• user-controlled multi-level modeling abstraction
• integrated with newer solution libraries
• more extensive and traceable analysis
• most likely failure scenarios
• parametric analysis
• model cross-checking
• partitioning consistent with independence assumptions

Dynamic Reconfiguration
Additional specification language features are needed, e.g.
• component behavioral modes/states
• explicit modification of “inherited” mode components
Off-line spec change and verification, on-line upgrade to running system
On-line incremental mode/configuration enumeration, analysis, admission
Systems of systems without fixed size, e.g.
• arbitrary number of instances of a set of specifications
• theorem proving methods