arXiv:cs/0504109v1 [cs.SE] 29 Apr 2005 htapistaiinlcnrlzdmtgtv strate- mitigative system’ centralized ‘expert traditional an design in- applies num- to scenarios that the impossible fault it so intractable make DSPs, and volved 2500 components of trigger of consist BTeV ber planned will the alone of system miti- 1 failure Level strategies. and gation resource flow, hardware data of application implementation specifications, and facil- design to developed diag-itate was discover, environment modeling to generic A processing ability real-time in discontinuities the to provide react and with nose, objects layer These self-healing a environments. dynam- to changing designed automatically are ically adapt to reliablility self-configuring for be reconfig- objects to fail- The Adaptive, mobile approach. and competence. cascading distributed urable, of a detecting mitiga- using self-protecting, layers ures and specified are monitoring on agents reactive based responsible are and tion prototype the proactive em- of for 1 agents Level self-optimizing within bedded devel- Lightweight software experiment. physics energy embedded high BTeV proposed the adaptive for oped fault large-scale hspprdsrbsacmrhniepooyeof prototype comprehensive a describes paper This rttp fFutAatv meddSfwr o Large-Sc for Software Embedded Adaptive Fault of Prototype heaset@adriteu [email protected] [email protected], Abstract eateto lcrclEgneigadCmue Science Computer and Engineering Electrical of Department chessreu sesesreu [email protected] [email protected], [email protected], nvriyo liosa Urbana-Champaign at Illinois of University nttt o otaeItgae Systems Integrated Software for Institute ee ese iaJn,adJeC Oh C. Jae and Jung, Mina Messie, Derek heaSet n tvnNordstrom Steven and Shetty Shweta elTm Systems Real-Time adritUniversity Vanderbilt ihEeg Physics Energy High ahil,T 37235 TN Nashville, yaue Y13244 NY Syracuse, yaueUniversity Syracuse [email protected] raa L61801 IL Urbana, ihe Haney Michael . 
tdb olsoso hsc atce nextremely in particles gener- physics data of process intelli- collisions to as by recovery ensure well ated and as to diagnosis lower-level gent fault-tolerance, software develop integrity, intelligent to system embedded is collabora- formed responsibility real-time (RTES) we whose Systems hardware, tion, Embedded BTeV Real-Time soft- upcoming build a the to order for In sensors. ware compo- and (FPGAs) detectors Arrays hardware pixel Gate and other Programmable Pur- Field from General like of nents apart number and large Processors (DSPs) very Labora- pose Processors a National Signal of Fermi consists Digital system the (HEP) This at Physics tory. proposed system par- Energy the High experiment a for accelerator-based trigger- system system, ticle the acquisition (http://www.btev.fnal.gov/) for data BTeV components and analysis ing and quisition Introduction 1. by developed group. methodologies Systems Embedded system Real-Time and the tools possible imple- the is every using approach reactive mented capturing distributed a rules Instead, state. on based gies edsrb ndti rttp o h aaac- data the for prototype a detail in describe We ale high data-rate environments (approximately 1.5 Ter- per describes the design and development of the pro- abytes per second). Given the complexity of the totype in detail. system, the goal is to develop tools and method- The rest of this paper is organized as follows. Section ologies that are self-* (self-configuring, self-healing, 2 provides some background on the BTeV experiment self-optimizing, self-protecting) as possible. and the RTES collaboration. The RTES system devel- The BTeV trigger system has three levels, namely opment environment is then presented in Section 3, in- Level 1 (L1), Level 2 (L2), and Level 3 (L3). L1 con- cluding an overview of VLAs and ARMOR. The vari- sists of approximately 2,500 DSPs that process data ous system modeling tools developed within the collab- collected from sensors. 
L2 and L3 are approximately oration, along with an explanation of how each is used 2,500 Linux machines for processing the data passed for design and implementation is also detailed. through L1 processors. In all three levels, processing Section 4 describes the prototype that was presented the data collected from sensors is the most important at SuperComputing 2003 (SC2003). Design motivation work, which is carried out by High Energy Physics is discussed, followed by software and hardware spec- (HEP) applications1. Due to the high-speed data rate ifications. The embedded VLA design and implemen- and enormous amount of data, the system has to be dy- tation for the prototype is detailed, along with an ex- namically fault-adaptive and self-correcting. planation of the Experimental Physics and Industrial Very Lightweight Agents (VLAs) [14] are embed- Control System (EPICS) used to inject system faults ded within L1 as simple software entities which can and monitor VLA mitigation and overall system be- be implemented in a few dozen lines of assembly lan- havior. Lessons learned are also provided. guage, and take advantage of the exception-signaling Finally, future efforts planned for the next phase and interrupt-handling mechanisms present in most of prototype development are described, followed by DSP kernels to expose errors in the kernel behavior. a conclusion. VLAs consist of a proactive part and a reactive part to provide fault tolerance in the form of intelligent er- 2. Background and Motivation ror detection, diagnosis, and recovery. The proactive 2.1. RTES/BTeV part of VLAs can further be divided into a mandatory part and an optional part. BTeV is a proposed particle accelerator-based High When the VLA detects (e.g., by monitoring DSP ex- Energy Physics (HEP) experiment currently under de- ception signals) an error condition, it may take fault velopment at Fermi National Accelerator Laboratory. 
mitigative action directly, or notify appropriate higher The goal is to study charge-parity violation, mixing, level components, which may take appropriate actions and rare decays of particles known as beauty and such as disabling the execution thread or discarding charm hadrons, in order to learn more about - the current data item. A similar mechanism will be ex- asymmetries that exist in the to- plored for the monitoring and reporting of deadlines, day [11]. When approved, the BTeV experiment will be traffic, processor loads, etc. sponsored by the Department of Energy. The fault tolerance and performance-oriented ser- vices offered at L2/L3 will be encapsulated in intelli- gent active entities (agents) called ARMORs (Adap- tive, Reconfigurable, and Mobile Objects for Reliabil- ity) [9]. ARMORs are, by design, highly flexible pro- cesses, which can be customized to meet the runtime needs of the system. A prototype for the BTeV L1 trigger system has been built on DSP boards consisting of 16 Texas In- strument DSPs. The prototype includes L1 VLAs, AR- MORs, and the Experimental Physics and Industrial Control System (EPICS). It exhibits several fault adap- tiveness and tolerance behaviors. The prototype pro- vided us a great opportunity to realize the ideas and concepts to a real-working hardware platform. This pa- Figure 1: BTeV pixel detector layout. 1 HEP applications are also called physics applications (PAs) The BTeV experiment will operate in conjunction pression is performed, it is expected that the resulting with a particle accelerator where the collision of pro- data rate will be approximately 200 Megabytes per sec- tons with anti- can be recorded and examined ond. The events that are actually accepted within this for detached secondary vertices from charm and beauty system occur very infrequently, and the cost of oper- hadron decays [7]. The layout for the BTeV detector is ating this environment is high. The extremely large shown in Figure 1. 
streams of data resulting from the BTeV environment The experiment uses approximately 30 planar sili- must be processed real-time with highly resilient adap- con pixel detectors to record interactions between col- tive fault tolerant systems. For these reasons, a Real- liding protons and in the presence of a large Time Embedded Systems Collaboration (RTES) was magnetic field. The pixel detectors, along with read- formed with the purpose of designing real-time embed- out sensors are embedded in the accelerator, which are ded intelligent software to ensure data integrity and connected to specialized field-programmable gate ar- fault-tolerance within this data acquisition system. The rays (FPGAs). The FPGAs are connected to approxi- collaboration includes team members from Fermi Lab, mately 2,500 digital signal processors (DSPs). Syracuse University, Vanderbilt University, University The measurements of the interactions resulting from of Illinois at Urbana-Champaign, and the University of the collision of protons and antiprotons are carried via Pittsburgh. custom circuitry hardware to localized processors that reconstruct the 3-dimensional crossing data from the 3. RTES System Development Environ- silicon detector in order to examine the trajectories for ment detached secondary vertices [13]. These detached ver- An overview of the BTeV system design and run- tices are indicators of the likely presence of beauty or time framework is shown in Figure 2. There are four charm decays. primary components, including very lightweight em- BTeV will operate at a luminosity of 2x1032cm−2s−1 bedded fault mitigation agents (VLAs), adaptive, re- corresponding to about 6 interactions per 2.53 MHz configurable, mobile objects for reliability (ARMOR), beam crossing rate [11]. Average event sizes will be a generic modeling environment (GME), and a system around 200 Kilobytes after zero-suppression of data is operator interface (EPICS). 
performed on-the-fly by front-end detector electronics. EPICS (http://www.aps.anl.gov/epics) provides an Every beam crossing will be processed, which translates interface for injecting faults into the system, which al- into the extremely high data rate of approximately 1.5 lows for evaluation of the effect of individual fault sce- Terabytes of data every second, from a total of 20x106 narios on the BTeV environment. It provides a way for data channels. operators to monitor and control overall system behav- A three tier hierarchical trigger architecture will be ior. Details and screenshots of the EPICS interface are used to handle this high rate. Data from the pixel de- presented in later sections of this paper that describe tector and muon detector will be sent to the Level 1 the system prototype. trigger processors, where an accept or reject decision will be made for every crossing. The Level 1 vertex trig- 3.1. Very Lightweight Agents ger processor will perform pattern recognition, track, 3.1.1. Overview Multiple levels of very lightweight and vertex reconstruction on the pixel data for every agents (VLAs) are one of the primary components re- interaction [11]. It has been estimated that 99% of all sponsible for fault mitigation across Level 1 of the real- minimum-bias events will be rejected by the Level 1 time embedded RTES/BTeV data acquisition system vertex trigger, while 60-70% of the events containing [14]. As described earlier, Level 1 alone is made up of beauty or charm decay will still be accepted for fur- 2,500 DSPs, with each DSP consisting of three com- ther evaluation. ponents, namely a physics application (PA), a very Levels 2 and 3 will be implemented on a cluster, and lightweight agent (VLA), and the DSP kernel itself. data that makes it past the Level 1 filter is assigned to The PA is responsible for running Level 1 data filter- one of these Level 2/3 processors for further analysis. 
ing algorithms, while the VLA provides each PA (and Data that survives Level 2 will be passed to Level 3 the DSP kernel), with a lightweight, adaptive layer algorithms to determine whether or not it should be of fault mitigation. Also, several DSPs are grouped recorded on archival media [5]. onto a single farmlet are each assigned a farmlet VLA It is estimated that Level 2 will decrease the data (FVLA) that provides a layer of fault mitigation across rate by a factor of 10, and Level 3 will further reduce all DSPs within a given farmlet. Likewise, a regional the incoming rate by a factor of 2. Once data is fil- VLA (RVLA) provides a higher layer of fault mitiga- tered through all three levels, and additional data com- tion across a group of farmlets. One of the latest phases Figure 2: BTeV System Design and Runtime Framework - System Models use domain-specific, multi-view represen- tation formalisms to define system behavior, function, performance, fault interactions, and target hardware. Analysis tools evaluate predicted performance to guide designers prior to system implementation. Synthesis tools generate system configurations directly from the models. A fault-detecting failure-mitigating runtime environment executes these con- figurations in a real-time, high performance, distributed, heterogeneous target platform, with built-in, model-configured fault mitigation. Local, regional, and global perspectives are indicated. On-line cooperation between runtime and mod- eling/synthesis environment permits global system reconfiguration in extreme-failure conditions. of work at Syracuse University has involved implement- triggered from a central location acting on rules cap- ing individual proactive and reactive rules across mul- turing every possible system state. Rather, the Syra- tiple layers of VLAs for specific Level 1 system failure cuse team has implemented specific layers of fault mit- scenarios. 
igative behavior similar to Rodney Brooks’ multi-layer, decentralized subsumption approach for mobile robot One of the major challenges is to find out how the design. behavior of the various layers of VLAs will scale when implemented across the 2,500 DSPs projected for Level 3.1.2. VLA Subsumption Model The phrase sub- 1 of the BTeV environment. In particular, how will sumption architecture was first used by Brooks to de- rules within each VLA interact as they are activated scribe a bottom-up approach for mobile robot design in parallel at multiple layers of the system, and how that relies on multiple layers of distributed sensors for will this affect other components and the overall behav- determining actions [3]. Until that time, designs relied ior of a large-scale, real-time embedded system such as heavily on a centralized location where most, if not all, BTeV. Given the number of components and countless of the decision making process took place. In fact, only fault scenarios involved, it would be impossible to de- initial sensor perception and motor control were left to sign an ‘expert system’ that applies mitigative actions distributed components. As a result, the success and adaptability of these systems was almost entirely de- 3.2. Adaptive, Reconfigurable, Mobile Ob- pendent on the accuracy of the model and actions rep- jects for Reliability (ARMOR) resented within the central location. While embedded VLAs provide a lightweight, adap- In contrast, Brooks proposed that there should be tive layer of fault mitigation within Level 1, the Univer- essentially no central control. Rather, there should be sity of Illinois is developing software components that independent layers each made up of a large number of run as multithreaded processes responsible for moni- sensors, with each layer responsible for distinct behav- toring and fault mitigation at the process and applica- ior. Communication and representation is developed in tion layer. 
the form of action and inaction at each of the individ- Adaptive, Reconfigurable, and Mobile Objects for ual layers, with certain layers subsuming other layers Reliability (ARMOR) [9] are multithreaded processes when necessary. In this way, layer after layer is added composed of replaceable building blocks called Ele- to achieve what Brooks refers to as increasing levels ments that use a messaging system to communicate. of competence. This breaks the problem down into de- The components within the flexible architecture are sired external manifestations, as opposed to slicing the designed such that distinct modules responsible for a problem on the basis of internal workings of the solu- unique set of tasks can be plugged into the system. tion as was typically done in the past [4]. There are separate Elements that are responsible for recovery action, error analysis, and problem detection, Multiple layers of individual proactive and reactive which can each be developed and configured indepen- VLAs have been embedded within the RTES/BTeV en- dently. ARMORs are configured in a hierarchy across vironment. Lower level worker VLAs are responsible multiple nodes of the entire system. A sample ARMOR for mitigative actions performed at local worker DSPs, is shown in Figure 3. In this example, a primary AR- while VLAs at higher levels (FVLAs, RVLAs) perform MOR daemon is watching over the node and reporting fault mitigation related to components at the farm- to higher-level ARMORs out on the network [10]. Ele- let and region level. In addition, farmlet VLAs moni- ments within node-level ARMORs communicate to en- tor and communicate with lower level groups of work- sure that all nodes are operating properly. ers, and may subsume the actions of individual worker VLAs if a pattern of behavior is observed across other Execution ARMOR is responsible for monitoring and workers within the same farmlet. 
Similarly, regional ensuring the integrity of a single application, without VLAs are responsible for fault mitigation at the re- requiring any modifications to the program itself. It gional level, and monitor and communicate with lower watches the program to ensure that it is continues to level farmlet VLAs. At each layer, individual VLAs are run, and has the ability to restart the application when responsible for monitoring and communicating with a necessary. As it is monitoring, it may generate mes- specific group of lower level VLAs, and may subsume sages for other Elements to analyze and act on based certain actions if a particular pattern of behavior across on what it finds. The Execution ARMOR is also capa- the group exists. ble of triggering specific recovery actions based on the pattern of return codes that it receives from the ap- plication. Another distinct ARMOR known as Recov- ery ARMOR consists of Elements that have the abil- ity to automatically migrate processes from one ma- chine to another when the work load across machines is not balanced. Within the trigger, ARMORs provide error detec- tion and recovery services to the trigger system, along with any other processes running on Levels 2 and 3. Hardware failures may also be detected. ARMOR com- ponents are designed to run under an operating system such as Linux and Windows, and not within low level embedded systems that require real-time memory and processing time constraints. There is also an ARMOR API that allows trigger Figure 3: Sample ARMOR consisting of multiple Ele- applications to proactively send specific error informa- ments tion directly to an Element. Data processing and qual- ity rates can also be sent directly to the ARMOR where they may be distributed to corresponding Elements for System states are represented with distinct nodes in analysis [10]. the state diagram, each corresponding to a particular phase of system operation. Lines are used to represent 3.3. 
System Modeling Tools transitions between states, capturing the logical pro- gression of system modes. Transitions occur when spe- The Generic Modeling Environment (GME) tool cific events or sets of events are triggered (hardware [12][1] developed by the Institute for Software Inte- faults, OS faults, user-defined errors, fault-mitigation grated Systems (ISIS) at Vanderbilt University pro- commands from higher level VLAs, etc.). Actions are vides a graphical language that is used to specify and defined for each trigger (moving tasks, rerouting com- design the RTES/BTeV environment. Various aspects munications, resetting and validating hardware, chang- of the system can be modeled, including application ing application algorithms, etc.). data flow, hardware resource specification, and failure UML is used to capture the various associations and mitigation strategies. interactions between components in the meta-model for The GME tool was used to model several aspects of the state machine [2]. Behavior state machines per- the prototype described in detail in Section 4. The Data form actions based on triggering conditions, where a Flow Specification Language was used to specify data trigger is defined as a connection which contains at- flow within the prototype, while the Resource Specifi- tributes that define triggering conditions and actions cation Language defined the physical hardware layout. performed. Statecharts can be used to describe the be- Portions of the Fault Mitigation Language were used as havior of individual fault managers. Ports can be des- well. ignated as Input or Output. 3.3.1. Data Flow Specification Language The Actions are written in C, and typically involve application data flow model allows a system devel- forwarding messages upstream or downstream to no- oper to define the key software components and the tify the appropriate layer of VLAs or other necessary flow of data between them [16]. Standard hierarchi- system components. 
There are three primary types cal dataflow notation is used, where nodes capture the of messages, all of which are passed asynchronously. software components, and connectors show the flow of Fault/Error messages report errors in hardware or the data between nodes. These models can represent syn- application, while control messages are decision re- chronous or asynchronous behavior, and a variety of quests or commands that force parameters to change scheduling policies. For the BTeV trigger, these are in the running system. The model also allows for defin- primarily asynchronous operations, with data-triggered ing periodic statistical messages. scheduling. The overall objective for modeling fault mitigation The primitive software components in the dataflow strategies is to realize minimal functionality loss for any model are associated with a script that provides the set of possible component failures, recover from fail- implementation of the software component [16]. Fault- ures as quickly and completely as possible, and to min- manager processes are associated with specific fault- imize the cost associated with excessive hardware re- mitigation strategies. dundancy. 3.3.2. Resource Specification Language A re- 3.3.4. System Generation The overall system is source specification language defines the physical struc- generated automatically once the model has been suf- ture of the target architecture. Block diagrams capture ficiently defined by the user. Several low-level arti- the processing nodes (CPUs, DSPs, FPGAs), and con- facts need to be generated from the models in order nections capture the networks and busses over which to derive an implementation. System dataflow synthe- data can flow. One of the assumptions made here is sis involves mapping a specific dataflow model into a that the hardware component is modeled exactly the set of software processes and inter-process communi- same way as it is laid out physically. cation paths [16]. 
This mapping also needs to derive the execution order or schedule of the processes ex- 3.3.3. Fault Mitigation Language The modeling ecuting on the processors. The communication paths environment also provides a language for specifying between software processes must be setup such that fault mitigation strategies to address hardware resource the software process itself is unaware of the location and data flow failure scenarios. Statechart-like nota- of other software processes that it is communicating tion [8] is used for defining various failure states. Con- with. However, the mapping process alone cannot en- ditions to enter or leave those states, along with ac- force location-transparent communication. It relies on tions to be performed when state transitions occur are some capabilities in the runtime execution infrastruc- also defined [16]. ture in order to facilitate this [15]. in the middle of the panel. The Set Parameters button causes the rate and size sliders to take effect. (The Au- thority buttons will be explained later). The Efficiency graph indicates the ratio of processed (not lost) to gen- erated data, and the Missing Events displays (number and graph) indicate in absolute terms the events that have been lost. The primary BTeV operator controls (Stop, Go) are at the bottom of the panel. The right panel titled System Monitor shows the op- erational state of the DSPs in each of 3 farmlets. For each DSP, utilization is subdivided into P (physics ap- plication), V (VLA), and I (idle) time bar graphs. The System Monitor panel also shows the Buffer Manager queue occupancies and overall system utilization. Un- der normal operation, only 2 farmlets are active; the third is a hot spare available to take on work if one of Figure 4: Prototype Architecture Overview the other farmlets fails. The center panel titled Fault Injection allows the 4. 
Prototype prototype user (not to be confused with the ”BTeV operator”) to hang or restart the physics application The RTES group has developed various method- on any of the individual DSPs within a given farmlet, ologies and tools for designing and implementing as well as severing the data and control links. An Er- very large-scale real-time embedded computer sys- ror Rate slider is provided for automatically generating tems. However, prior to this prototype, there was no corrupt data, and a Run Well, Run Poor button-pair se- single integrated system that had been developed us- lects whether the physics application reacts gracefully ing all of these tools together to meet a common objec- (ignore) or ungracefully (hangs) in response to corrupt tive. The overall goal of the prototype was to demon- events. strate an implementation of these various tools and There are two ‘exceptions’ with respect to the orga- methodologies within a single system capable of ef- nization of the controls, reflecting the abstract distinc- ficient fault mitigation for a set of error conditions. tion between the prototype user (someone who would The prototype was demonstrated at SuperComput- use this prototype) and the BTeV operator (some who ing 2003 (SC2003). would use BTeV). It is unlikely that the BTeV opera- 4.1. Component Details tor would (or would be able to) hang the physics ap- plication, but it may well be the case that the opera- An overview of the complete prototype architecture tor would have a control to restart the application. The presented at SC2003 is found in Figure 4. Level 1 of the Hang and Restart buttons are both shown on the mid- BTeV event filter is the primary setting for the demon- dle panel as user controls, even though Restart may be stration. The prototype hardware consists of a 7-slot an operator control. VME crate with 4 fully populated motherboards and The other exception is the Authority control near 16 DSPs. 
The DSPs were Texas Instruments C6711 the top of the left panel. This collection of buttons with 64MB of RAM each, running at 166 MHz. determines which mitigation strategy(ies) are enabled. While several controls are provided, the most impor- 4.1.1. EPICS As mentioned earlier, the Experimen- tant are: tal Physics and Industrial Control System (EPICS) was used to provide an interface for controlling the oper- • Worker Reset (WR) - authorizes the VLA on a ation of the prototype system, as well as for inject- worker to restart the physics application if it fails ing faults into the system. A screenshot of the EPICS- to meet a timeout deadline. controlled prototype presented at SC2003 is shown in • Farmlet Prescale (FP) - authorizes a farmlet to Figure 5. determine a farmlet-wide rate for dropping events The left panel titled Experiment Information pro- without analysis, in an effort to prevent queue vides a number of controls and display relevant to the overflow. BTeV experiment. The Interaction Rate and Interac- tion Size sliders affect the generation of physics data; • Global Prescale (GP) - similar to farm- their influence is shown in the Rate and Size histograms let prescale, but the drop rate is uniform across all farmlets. tailed in Section 3.1.2 describing the VLA subsump- tion model, trends in the type and frequency of mes- • Global Failover (GF ) - authorizes an upper- sages sent to higher level VLAs can lead to subsuming layer ARMOR to declare a farmlet to be unfit, the actions of lower level VLAs. and to redirect future work to a hot spare farm- let. 4.2. Lessons Learned 4.1.2. VLA Prototype Multiple layers of proactive Following the SC2003 conference, a formal review and reactive VLAs were implemented within the SC03 [6] of the RTES/BTeV prototype was conducted by a prototype. Since the physics application (PA) at the team consisting of members both internal and exter- worker level is responsible for the critical overall objec- nal to the project. 
Everyone was in agreement as to the substantial value of successfully producing a single integrated system using many of the component designs and tools developed across the collaboration. There were of course also some valuable lessons learned. A few of the primary areas of concern cited in the review follow.

Firstly, GME is an integral piece of the software development cycle. Many different groups within the RTES collaboration will be developing, testing, and releasing various BTeV modules in parallel. The review therefore stressed the need to break the current GME model down into sub-models, so that work on distinct subsections of the model can occur simultaneously. Sub-models accessed through a standard change control tool will ease the future coordination and tracking of overall model changes.

A related issue raised in the review is that of overall software release versioning. Currently, various components and tools for the system are being developed in parallel by different teams within the collaboration. Since the primary goal is to provide a total integrated package of these components and tools, a versioning system must be developed that facilitates a single production version of the BTeV software. This will make it easier for different developers to work with different versions of distinct components or tools without colliding with each other or with the production release. BTeV will need to have several production versions in use at one time, and also an arbitrarily large number of development versions. Establishing a formal versioning system now that spans development efforts will ensure a deliverable of a single integrated BTeV package, where consistent versions can be used across multiple development and production environments.

Next, Elements of the ARMOR are written in Chameleon, a framework built to be a research vehicle for exploring the world of conceptual programming. However, since BTeV authors will be expected to invent and implement new Elements quickly, the use of a more standard development language such as Python would help reduce the learning curve and the effort required for adding and testing new Elements.

tive of Level 1 data filtering, it is extremely important that DSP usage by the VLA at the worker level is minimal, and only occurs either when the PA is not utilizing the DSP, or when emergency fault mitigative action is required. For this reason, the prototype worker VLA is implemented as an Interrupt Service Routine (ISR) that is triggered only when expected PA processing time thresholds are exceeded. The TI T6711 DSP used within the prototype has 15 hardware interrupts (HWIs). HWI 15 is assigned Timer 1, and HWI 14 is assigned Timer 0. The VLA prototype uses HWI 15 (Timer 1).

One of the fault scenarios modeled within the prototype occurs when the DSP is found to be over the estimated time budget on crossing processing (e1). In this scenario, HWI 15 (Timer 1) is used by the VLA to monitor PA crossing processing times, and to trigger the VLA ISR if the time threshold is exceeded. At the start of processing each crossing, the PA provides the VLA with an estimate of the maximum time that it should take to process the current crossing. The Timer 1 Period Register (T1PR) is assigned this estimated value, and timer counting is enabled. If the PA completes crossing processing as expected before the timer expires, then the timer is stopped, and it is reset when the PA begins processing the next crossing. If, on the other hand, the timer expires before the PA has completed processing, then the VLA ISR is called. The first time that the VLA ISR is triggered, the VLA notifies the PA of the time threshold violation, and resets the timer for a set grace period. The PA then attempts to clean up any remaining processing that it has to complete. If successful, the PA stops the timer and continues on to the next crossing. If the cleanup is unsuccessful, the VLA ISR is called again, and this time it either attempts to reset the PA itself (if it has authority), or sends communication up to the next level of VLA (in this case the Farmlet VLA) for remedial action.

In addition to taking direct fault mitigative actions on various system components, multiple layers of VLAs are also responsible for communicating specific error messages to higher layers within the system. As de-

Figure 5: EPICS Prototype Screenshot
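The timer-driven escalation in the crossing-time scenario (e1) can be sketched as a small state machine. The Python simulation below only illustrates the control flow; it is not the prototype's DSP code. The names `WorkerVLA`, `Action`, and `run_crossing`, the tick units, and the grace-period value are assumptions made for this sketch, and the `period` attribute merely stands in for the Timer 1 Period Register.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Action(Enum):
    NONE = auto()        # PA finished within its estimate; timer stopped
    NOTIFY_PA = auto()   # first expiry: warn the PA and start the grace period
    RESET_PA = auto()    # second expiry, and the VLA has reset authority
    ESCALATE = auto()    # second expiry: report to the next VLA level


@dataclass
class WorkerVLA:
    grace_period: int            # hypothetical grace period, in timer ticks
    has_reset_authority: bool    # whether this VLA may reset the PA itself
    expiries: int = 0            # timer expiries seen for the current crossing
    period: int = 0              # stand-in for the timer period register
    log: list = field(default_factory=list)

    def start_crossing(self, estimate: int) -> None:
        """The PA supplies its time estimate; the timer is loaded and started."""
        self.expiries = 0
        self.period = estimate

    def isr(self) -> Action:
        """Runs only when the timer expires before the PA has finished."""
        self.expiries += 1
        if self.expiries == 1:
            self.period = self.grace_period   # re-arm the timer for the grace period
            action = Action.NOTIFY_PA
        elif self.has_reset_authority:
            action = Action.RESET_PA
        else:
            action = Action.ESCALATE
        self.log.append(action)
        return action


def run_crossing(vla: WorkerVLA, estimate: int, work: int) -> Action:
    """Simulate one crossing in which the PA needs `work` ticks in total."""
    vla.start_crossing(estimate)
    if work <= estimate:
        return Action.NONE                    # no ISR call, no DSP time used by the VLA
    first = vla.isr()                         # first expiry
    if work - estimate <= vla.grace_period:
        return first                          # cleanup finished within the grace period
    return vla.isr()                          # second expiry: reset or escalate
```

In this sketch the VLA does no work at all unless the timer expires, which mirrors the stated goal of keeping worker-level DSP usage by the VLA minimal.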

Finally, the review emphasized that physicists who use this system must be given as much detail as possible for tracking changes that occur within the trigger system. For example, if a prescale value is changed, a log must be kept that allows the user to identify the precise time of the change in order to compare it against the modified behavior experienced within the system. There must be a standard and easy way to log and review any and all system control changes across time, not just a real-time current view of the values from the graphical user interface.

A formal document [17] in response to the issues raised within the review was also completed.

5. Next Steps

As described in the previous section, the design and implementation of the SC2003 prototype was an important step for RTES in showing the integration of many of the component designs and tools that have been developed across the collaboration. Each of the teams within the collaboration has been able to take away some valuable lessons learned that will be incorporated into the development process moving forward. In addition to addressing these lessons, there are many other challenging goals that RTES has set.

Firstly, as detailed earlier, the prototype included 16 DSPs at L1. Since the hardware projected for L1 consists of 2,500 such DSPs, RTES needs to demonstrate how the components and tools developed will scale when implemented on a much higher volume of DSPs. Scalability is one of the primary areas that RTES will be focusing on in the future, and plans are already being made for the next phase of a prototype that will include far more processors and supporting hardware.

Next, VLA research is exploring ways that the lightweight, adaptive nature of the VLA may be further used to coordinate communication and mitigative actions across the large-scale environment. Adaptive agent architectures that facilitate large-scale coordination are being evaluated for idioms that may address specific challenges within the RTES environment.

The next phase of modeling tools is also being developed to further support component design and implementation.

6. Conclusion

This paper has described a large-scale fault adaptive embedded software prototype for the proposed Fermilab BTeV high energy physics experiment. Self-optimizing, self-protecting, proactive and reactive Very Lightweight Agents (VLAs) are embedded within Level 1 to provide an adaptive layer of fault mitigation across the RTES/BTeV environment. Adaptive, Reconfigurable, and Mobile Objects for Reliability (ARMOR) are designed to be self-configuring to adapt automatically to dynamically changing environmental demands. The prototype demonstrates the self-healing qualities of these objects, which are designed with the ability to discover, diagnose, and react to discontinuities in real-time processing.

The prototype was developed by the RTES collaboration, whose responsibility is to develop low-level real-time embedded intelligent software to ensure system integrity and fault-tolerance across extremely high data-rate environments. The objective of the prototype was to produce a single integrated system using many of the component designs and tools developed thus far across the collaboration.

The Generic Modeling Environment (GME) developed by the Institute for Software Integrated Systems (ISIS) at Vanderbilt University was used to design and implement application data flow, hardware resource specifications, and failure mitigation strategies. The Experimental Physics and Industrial Control System (EPICS), which was used within the prototype to inject system faults and monitor VLA mitigation and overall system behavior, was also presented.

Finally, lessons learned from designing, implementing, and presenting the prototype, along with planned future efforts for the RTES collaboration, were also provided.

ACKNOWLEDGEMENTS

The research conducted was sponsored by the National Science Foundation in conjunction with Fermi National Laboratories, under the BTeV Project, and in association with RTES, the Real-time, Embedded Systems Group (NSF grant # ACI-0121658).

References

[1] T. Bapty, ‘Design Environment for Fault-Adaptive Systems’. Workshop on High Performance, Fault Adaptive Large Scale Real-Time Systems, Nashville, TN, (November 2002).
[2] T. Bapty, S. Neema, J. Scott, J. Sztipanovits, and S. Asaad, ‘Model-Integrated Tools for the Design of Dynamically Reconfigurable Systems’, VLSI Design, 10, 3, pages 281–306, (2000).
[3] R.A. Brooks, ‘A Robust Layered Control System for a Mobile Robot’, IEEE Journal of Robotics and Automation, RA-2, (April 1986).
[4] R.A. Brooks, ‘Intelligence Without Representation’, Artificial Intelligence, 47, pages 139–160, (1991).
[5] J.N. Butler et al., ‘Fault Tolerant Issues in the BTeV Trigger’. FERMILAB-Conf-01/427, (December 2002).
[6] M. Fischler and M. Paterno, ‘RTES SC2003 Review’. http://www-btev.fnal.gov/DocDB/0024/002439/001/RTES SC 2003 review.pdf, (January 2004).
[7] E. E. Gottschalk, ‘BTeV Detached Vertex Trigger’. 9th International Workshop on Vertex Detectors (VERTEX2000), National Lakeshore, Michigan, (September 2000).
[8] D. Harel, ‘Statecharts: A Visual Formalism for Complex Systems’, Science of Computer Programming, vol. 8, pages 231–274, (June 1987).
[9] Z. Kalbarczyk, ‘ARMOR-based Hierarchical Fault Management’. Workshop on High Performance, Fault Adaptive Large Scale Real-Time Systems, Nashville, TN, (November 2002).
[10] J. Kowalkowski, ‘Understanding and Coping with Hardware and Software Failures in a Very Large Trigger Farm’. Conference for Computing in High Energy and Nuclear Physics (CHEP), (March 2003).
[11] S. Kwan, ‘The BTeV Pixel Detector and Trigger System’. FERMILAB-Conf-02/313, (December 2002).
[12] A. Ledeczi, M. Maroti, A. Bakay, G. Nordstrom, J. Garrett, C. Thomason, J. Sprinkle, and P. Volgyesi, GME 2000 Users Manual, version 2.0, Vanderbilt University, 2001.
[13] S. Nordstrom, A Runtime Environment to Support Fault Mitigative Large-Scale Real-Time Embedded Systems Research, Master’s thesis, Graduate School of Vanderbilt University, May 2003.
[14] J. Oh, D. Mosse, and S. Tamhankar, ‘Design of Very Lightweight Agents for Reactive Embedded Systems’. IEEE Conference on the Engineering of Computer Based Systems (ECBS), Huntsville, Alabama, (April 2003).
[15] J. Scott, S. Neema, T. Bapty, and B. Abbott, ‘Hardware/Software Runtime Environment for Dynamically Reconfigurable Systems’. ISIS-2000-06, (May 2000).
[16] S. Shetty, S. Neema, and T. Bapty, ‘Model Based Self Adaptive Behavior Language for Large Scale Real time Embedded Systems’. IEEE Conference on the Engineering of Computer Based Systems (ECBS), Brno, Czech Republic, (May 2004).
[17] L. Wang, Q. Liu, and Z. Kalbarczyk, ‘Response to the Review of ARMOR Environment’. http://www-btev.fnal.gov/DocDB/0026/002615/001/review response uiuc.pdf, (March 2004).