
A Hierarchical Distributed Control Model for Coordinating Intelligent Systems

Richard M. Adler
Symbiotics, Inc.
875 Main Street
Cambridge, MA 02139

ABSTRACT

This paper describes a hierarchical distributed control (HDC) model for coordinating cooperative problem-solving among intelligent systems. The model was implemented using SOCIAL, an innovative object-oriented tool for integrating heterogeneous, distributed systems. SOCIAL embeds applications in "wrapper" objects called Agents, which supply predefined capabilities for distributed communication, control, data specification and translation. The HDC model is realized in SOCIAL as a "Manager" Agent that coordinates interactions among application Agents. The HDC-Manager: indexes the capabilities of application Agents; routes request messages to suitable server Agents; and stores results in a commonly accessible "Bulletin-Board". This centralized control model is illustrated in a fault diagnosis application for launch operations support of the Space Shuttle fleet at NASA, Kennedy Space Center.

Keywords: distributed artificial intelligence, systems integration, hierarchical distributed control, intelligent control, cooperative problem-solving

INTRODUCTION

Knowledge-based systems are helping to automate important functions in complex problem domains such as operations and decision support. Successful deployment of intelligent systems requires: (a) integration with existing, conventional software programs and data stores; and (b) coordination with one another to share complementary knowledge and skills, much as people work together cooperatively on related tasks. These requirements are difficult to satisfy given existing AI technologies. Current knowledge-based systems are generally single-user, standalone systems based on heterogeneous data and knowledge models, development languages and tool shells, and processing platforms. Interfaces to users, databases, and other conventional software systems are typically custom-built and difficult to adapt or interconnect. Moreover, intelligent systems developed independently of one another tend to be ignorant of information resources, problem-solving capabilities, and access protocols for peer systems.

SOCIAL is an innovative collection of object-oriented tools designed to alleviate these pervasive integration problems [Ad90b]. SOCIAL provides a family of "wrapper" objects, called Agents, that supply predefined capabilities for distributed communication, control, data specification and translation. Developers embed programs within Agents, using high-level, message-based interfaces to specify interactions between programs, their embedding Agents, and other application Agents. The relevant Agents transparently manage the transport and mapping of specified data across networks of disparate processing platforms, languages, development tools, and applications. SOCIAL's partitioning of generic and application-specific behaviors shields developers from network protocols and other low-level complexities of distributed computing. More important, the interfaces between applications and SOCIAL Agents are modular and non-intrusive, minimizing the number, extent, and cost of modifications necessary to re-engineer existing systems for integration.

Non-intrusiveness is particularly important in mission-critical space and military applications, where alterations for integration entail stringent validation and verification testing.

This paper focuses on a specialized "Manager" Agent that realizes a hierarchical distributed control (HDC) model on top of SOCIAL's basic integration services. The SOCIAL HDC-Manager Agent coordinates the activities of Agents that embed independent knowledge-based and conventional applications relating to a common domain such as decision or operations support. Such centralized control models are important for managing distributed systems that evolve over time through the addition of new applications and functions. Centralized control is also important for organizing complex distributed systems that display not only small-scale, one-to-one relationships, but also large-scale structure, such as clustering of closely related subsets of application elements.

The HDC-Manager Agent's coordination functionality derives from a set of centralized control services including: maintaining an index knowledge base of the capabilities, addresses, and access message formats for application Agents; formatting and routing requests for data or problem-solving processing to suitable server Agents; and posting request responses and other globally useful data to a commonly accessible "Bulletin-Board".

The next section of the paper reviews the overall architecture and functionality of SOCIAL. Subsequent sections describe the structure and behavior of the HDC-Manager Agent and illustrate its application in the domain of launch operations support for the Space Shuttle fleet at NASA, Kennedy Space Center. Specifically, an HDC-Manager Agent coordinates the activities of standalone expert systems that monitor and isolate faults in Shuttle vehicle and Ground Support systems. The cooperative problem-solving enabled by the HDC-Manager produces diagnostic conclusions that the applications are incapable of reaching individually.

OVERVIEW OF SOCIAL

The central problems of integrating heterogeneous distributed systems include:

- communicating across a distributed network of heterogeneous computers and operating systems in the absence of uniform interprocess communication services;

- specifying and translating information (i.e., data, knowledge, commands) across applications, programming languages and development shells with incompatible native data representations;

- coordinating problem-solving across applications and development tools that rely on different internal models for communication and control.

SOCIAL addresses these issues through a unified collection of object-oriented tools for distributed communication, control, and data (and data type) specification and management. Developers access the services provided by each tool through high-level Application Programming Interfaces (APIs). The APIs conceal the low-level complexities of implementing distributed computing systems. This means that distributed systems can be developed by engineers who lack expertise in areas such as interprocess and network communication (e.g., Remote Procedure Calls, TCP/IP, ports and sockets), variations in data architectures across vendor computer platforms, and differences among data and control interfaces for standard development tools such as AI shells. Moreover, SOCIAL's high-level development interfaces to distributed services promote modularity, maintainability, extensibility, and portability.

The overall SOCIAL architecture is summarized in Figure 1. SOCIAL's predefined distributed processing functions are bundled together in objects called Agents: Agents represent the active computational processes within a distributed system. Developers assemble distributed systems by:

(a) selecting and instantiating Agents from SOCIAL's library of predefined Agent classes; and (b) embedding individual application elements such as programs and databases within Agents. Embedding consists of using the APIs for accessing SOCIAL's distributed processing capabilities to establish the desired interactions between applications, their associated wrapper Agents, and other application Agents. New Agent subclasses can be created through a separate development interface by customizing or combining services in novel ways to satisfy unique application requirements. These new Agent types can be incorporated into SOCIAL's Agent library for subsequent reuse or adaptation. The following subsections review the component distributed computing technologies used to construct SOCIAL Agents.

[Figure 1: Architecture of the SOCIAL Toolset, showing layers that include the library of Agent classes (Managers, Gateways); distributed communications (MetaCourier); and the underlying network, processor, and platform substrate.]

Distributed Communication

SOCIAL's distributed computing utilities are organized in layers, enabling complex functions to be built up from simpler ones. The base or substrate layer of SOCIAL is MetaCourier, which provides a high-level, modular distributed communications capability between applications based on disparate languages, platforms, operating systems, and network protocols [Sy90]. The Agent objects that embed applications or resources are defined at SOCIAL's MetaCourier level. Agents use the MetaCourier API to pass messages between applications and their embedding Agents, as well as among application Agents. Messages typically consist of: commands that an Agent passes directly into its embedded application, such as database queries or calls to execute signal processing programs; data arguments to program commands that an Agent might call to invoke its embedded application; and symbolic flags or keywords that signal the Agent to invoke one or another fully preprogrammed interaction with its embedded application. For example, a high-level MetaCourier API call issued from a local LISP-based application Agent such as:

    (Tell :agent 'sensor-monitor :sys 'Symb1
          '(poll measurement-X))

transports the message contents, in this case a command to poll measurement-X, from the calling program to the Agent sensor-monitor resident on platform Symb1. The Tell function initiates a message transaction based on an asynchronous communication model; the application Agent that issues such a message can immediately move on to other processing tasks. The MetaCourier API also provides a synchronous "Tell-and-Block" message function for "wait-and-see" processing models.

Agents contain two procedural methods that control the processing of messages, called in-filters and out-filters. In-filters parse incoming messages according to the argument list structure specified when the Agent is defined. After parsing the message, the in-filter typically either invokes the Agent's embedded resource or application, or passes the message (which it may modify) on to another Agent. The basic semantic model entails a directed computational graph of passed messages. When no further passes are required, the in-filter of the target Agent runs to completion. That Agent's out-filter method is then executed to prepare the reply, which is automatically returned (and possibly modified) through the out-filters of intervening Agents back to the originating Agent.
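SOCIAL itself was built for LISP- and C-based environments, and its actual MetaCourier API is not reproduced here. Purely to illustrate the ideas above (asynchronous Tell, synchronous Tell-and-Block, and the in-filter/out-filter pair), a minimal Python sketch might look as follows; every class, method, and agent name is an invented stand-in:

```python
import queue
from typing import Any, Callable

class Agent:
    """Hypothetical sketch of a MetaCourier-style Agent wrapper."""
    registry: dict = {}                    # name -> Agent; stands in for the kernel

    def __init__(self, name: str, application: Callable[[Any], Any]):
        self.name = name
        self.application = application     # the embedded program
        self.inbox: queue.Queue = queue.Queue()
        Agent.registry[name] = self

    def in_filter(self, message: Any) -> Any:
        # Parse the incoming message and invoke the embedded application.
        # (A real in-filter could instead modify the message and pass it
        # on to another Agent, extending the computational graph.)
        return self.application(message)

    def out_filter(self, result: Any) -> Any:
        # Prepare the reply for return to the originating Agent.
        return {"from": self.name, "reply": result}

    def tell(self, agent: str, message: Any) -> None:
        # Asynchronous send: queue the message and return immediately.
        Agent.registry[agent].inbox.put((self.name, message))

    def tell_and_block(self, agent: str, message: Any) -> Any:
        # Synchronous variant: deliver the message and wait for the reply.
        target = Agent.registry[agent]
        return target.out_filter(target.in_filter(message))

# Rough analogue of (Tell :agent 'sensor-monitor :sys 'Symb1 '(poll measurement-X)):
monitor = Agent("sensor-monitor", lambda msg: f"value of {msg[1]}")
console = Agent("console", lambda msg: msg)
console.tell("sensor-monitor", ("poll", "measurement-X"))        # returns at once
reply = console.tell_and_block("sensor-monitor", ("poll", "measurement-X"))
```

Here tell merely queues the message, mirroring the asynchronous model in which the sender immediately resumes other work; a real kernel would schedule delivery and thread replies back through the out-filters of intervening Agents.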

A MetaCourier run-time kernel resides on each application host. The kernel provides: (a) a uniform message-passing interface across network platforms; and (b) a scheduler for managing messages and Agent processes (i.e., executing filter methods). Each Agent contains two attributes that specify associated Host and Environment objects. These MetaCourier objects define particular hardware and software execution contexts for Agents, including the host processor type, operating system, network type and address, language, linker, and editor. The MetaCourier kernel uses the Host and Environment associations to manage the hardware and software platform-specific dependencies that arise in transporting messages between heterogeneous, distributed Agents (cf. Figure 2).

[Figure 2: Operational Model of MetaCourier, showing a message passing from Agent-A through its Environment (Env-A) and Host (Host-A) to Host-B, Env-B, and Agent-B.]

MetaCourier's high-level message-based API is basically identical across different languages such as LISP or C. MetaCourier's communication model is also symmetrical or "peer-to-peer". In contrast, client-server models based on remote procedure call (RPC) communication formalisms are asymmetric: only clients can initiate communication, and while multiple clients can interact with a particular server, a specific client process can only interact with a particular server. Until recently, RPCs were also restricted to synchronous (i.e., blocking) communication.

Data Specification and Translation

A major difficulty in getting heterogeneous applications and information resources to work with one another is the basic incompatibility of their underlying models for representing data, knowledge, and commands. These problems are compounded when applications are distributed across heterogeneous computing platforms with different data architectures (e.g., opposing byte ordering conventions). SOCIAL's MetaData subsystem addresses these complex compatibility problems. MetaData provides an object-oriented data model for specifying and manipulating data and data types across disparate applications and platforms. Developers access these tools through a dedicated API. MetaData handles three basic tasks: (a) encoding and decoding basic data types (e.g., integer, float) transparently across different machine architectures; (b) describing data of arbitrarily complex abstract types (e.g., database records, frames, b-trees) in a uniform object-oriented model; and (c) mapping between data models native to particular applications and SOCIAL's generic data model. Like MetaCourier, MetaData's object-oriented API is basically uniform across different programming languages. Also, MetaData allows new types to be defined and manipulated dynamically at runtime. Most alternative data management tools, such as XDR, are static and non-object-oriented.

SOCIAL integrates MetaData with MetaCourier to obtain transparent distributed communication of complex data and data types across heterogeneous computer platforms as well as across disparate applications: developers embed MetaData API function calls within the in-filter and out-filter methods of interacting Agents, using MetaCourier messages to transport MetaData objects between applications residing on distributed hosts. MetaData API functions decode and encode message contents, mapping information to and from the native representational models of source and target applications and MetaData objects. SOCIAL thereby separates distributed communication from data specification and translation, and cleanly partitions both kinds of generic functionality from application-specific processing.

Distributed Control and Information Access

SOCIAL's third layer of object-oriented tools establishes custom, high-level API interfaces for Agent classes specialized for particular integration or coordination functionality.
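MetaData's actual object model is not documented here; the Python fragment below is only a sketch of task (a), pinning a single wire format so that basic values survive byte-order differences, plus a minimal version of task (b), describing a record type as named fields over those basic types. The format table and all function names are our assumptions:

```python
import struct

# A fixed big-endian wire format (struct's '>' prefix) makes encoded
# values independent of the sending host's native byte order.
WIRE_FORMATS = {"int": ">i", "float": ">d"}

def encode(type_name: str, value) -> bytes:
    return struct.pack(WIRE_FORMATS[type_name], value)

def decode(type_name: str, payload: bytes):
    (value,) = struct.unpack(WIRE_FORMATS[type_name], payload)
    return value

# A complex abstract type (e.g., a database record) described as an
# ordered list of (field, basic-type) pairs over the basic codecs:
def encode_record(schema, record: dict) -> bytes:
    return b"".join(encode(t, record[f]) for f, t in schema)

def decode_record(schema, payload: bytes) -> dict:
    out, offset = {}, 0
    for field_name, type_name in schema:
        size = struct.calcsize(WIRE_FORMATS[type_name])
        out[field_name] = decode(type_name, payload[offset:offset + size])
        offset += size
    return out
```

A round trip through encode_record/decode_record yields the original record regardless of the byte order of the machines at either end, which is the essential guarantee the paper attributes to MetaData's basic-type layer.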

Lower-level MetaCourier and MetaData capabilities underlie these Agent APIs. The high-level APIs largely conceal the MetaCourier and MetaData interfaces from developers of distributed systems: predefined, specialized Agent classes serve as building blocks for satisfying common architectural requirements, and developers access each such Agent type through its own high-level API interfaces.

For example, applications are often constructed using development tools such as database management systems and AI shells. SOCIAL's Database and Knowledge Gateway Agent classes define application-independent data and control interfaces to such tools. A Gateway Agent maps between the shell-specific API of its embedded tool and SOCIAL's generic MetaCourier and MetaData models, translating the particular data or commands in requests initiated by other Agents into forms that its embedded application can process. Gateways can also be constructed for in-house, proprietary software, such as conventional programs that clean and filter data from instrument events, providing support for integrating conventional programs and databases. Where necessary, developers can create new Gateway, Manager, or fully custom Agent classes and define corresponding API interfaces; the same strategy of modular design applies throughout.

SOCIAL also provides Manager Agent classes for coordinating collections of Gateways and application Agents that must work together cooperatively. The HDC-Manager class, described in the following sections, defines one kind of centralized model for distributed control.

Consider, for example, a distributed scientific system in which monitoring programs process instrument data while extracting and writing noteworthy results to on-line storage systems. Scientists might then study the data through analysis and visualization tools via intermediary user interfaces. The developer of such a system would embed the constituent applications and databases in suitable Agents and specify their direct, one-to-one interactions in terms of the relevant Agent APIs.

Once autonomous knowledge-based systems are incorporated into a distributed system, however, integration tools alone are no longer fully adequate. First, the sequencing or composition of behaviors both within and across autonomous systems is typically determined dynamically, based on the content of incoming data at run-time. A decentralized approach to managing such data-driven behaviors quickly becomes intractable. The problem is particularly acute in distributed systems that evolve over an extended lifecycle, in which application elements are enhanced, added, superseded, or reorganized (i.e., broken apart or consolidated) over time.

Second, complex organizational relationships emerge among clusters of applications in large-scale distributed intelligent systems. For example, programs that automate on-line monitoring of computer networks and other complex systems are naturally coupled more closely to decision or maintenance support tools for the same target domain. At the same time, interactions regularly take place across functional groupings: planning and scheduling tools (decision support) drive system configuration activities, while behavioral anomalies detected by monitoring systems (operations support) trigger troubleshooting, diagnosis, and repair (maintenance support). Ideally, intra-group interactions should be managed at the cluster level, to minimize sensitivity to changes within functional groups. In short, the elements of complex distributed intelligent systems must not only be integrated, but also coordinated.

The HDC-Manager supplies this coordination: it manages interaction pathways and organizational relationships among functional clusters of Agents, much as a middle manager in a large organization coordinates the work of its subordinates. The HDC-Manager's centralized control services include:

- providing a centralized Index knowledge base that specifies: each available information source or problem-solving service; the subordinate Agent that can provide that resource; the Agent's logical address; and a procedure for converting data in a generic resource request into a message format that is suitable for that server Agent;

- analyzing tasks requesting information or problem-solving services based on the Index knowledge base and routing suitable messages to the relevant subordinate Agents to accomplish those tasks;

- mediating all interactions with external (i.e., non-subordinate) Agents; and

- providing a central Bulletin-Board to store replies and other data of common interest for access by subordinate Agents.
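The services just listed can be pictured with a small Python sketch. The entry fields follow the four items of the Index description above, but every identifier and data layout here is our own illustrative invention, not SOCIAL's:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class IndexEntry:
    service: str                            # symbolic service name
    agent: str                              # subordinate Agent providing it
    address: str                            # the Agent's logical address
    format_request: Callable[[dict], dict]  # generic request -> server message

@dataclass
class HDCManager:
    index: list = field(default_factory=list)
    bulletin_board: list = field(default_factory=list)

    def route(self, request: dict) -> dict:
        # Analyze a task and build a message for the suitable server Agent.
        for entry in self.index:
            if entry.service == request["service"]:
                return {"to": entry.agent, "at": entry.address,
                        "body": entry.format_request(request)}
        raise LookupError(f"no server Agent offers {request['service']!r}")

    def post(self, item: dict) -> None:
        # Store a reply or other shared data on the Bulletin-Board.
        self.bulletin_board.append(item)

mgr = HDCManager(index=[IndexEntry(
    service="find-fault-precedents", agent="opera-agent", address="host-a",
    format_request=lambda req: {"cmd": "lookup", "args": req["args"]})])
routed = mgr.route({"service": "find-fault-precedents", "args": ["gpc-7"]})
```

Note that the requester names only the symbolic service; the Index supplies the server Agent, its address, and the message format, which is exactly what lets subordinates remain ignorant of one another.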

This centralized design preserves:

- modularity;

- extensibility and maintainability; and

- support for heterogeneity.

Modularity derives from the HDC-Manager's centralized Bulletin-Board and Index knowledge base. Each Index entry identifies services available from subordinate Agents symbolically (e.g., find-fault-precedents). Each Bulletin-Board posting identifies the item type, such as service request or reply, the posting Agent, and the requesting Agent (when appropriate). Subordinate Agents do not need to know about the functionality, structure, or even the existence of any other application Agents; they only require: (a) the generic high-level API interface to the HDC-Manager Agent; and (b) knowledge of the symbolic names used by the HDC-Manager to index the resource types available within a specific distributed application. The same minimal requirements hold for external application and Manager Agents that need to interact with an HDC-Manager to obtain information or services from its subordinates.

Extensibility and maintainability follow from the HDC-Manager's modular architecture: new subordinate Agents are incorporated simply by: (a) updating the Index knowledge base with appropriate entries to describe their services; and (b) extending other subordinate Agents, as needed, to be able to request the new capabilities and process the results. Moreover, existing subordinate Agents can be reconfigured with minimal disruption. For example, suppose that problem-solving functions in one subordinate Agent are reallocated to some other application Agent, old or new. Neither the Manager's other subordinates nor any external Agents have to be modified; only the HDC-Manager Index knowledge base needs to be updated to reflect the configuration changes.

Heterogeneity follows from the modular nature of SOCIAL Agents in general. The high-level API to the HDC-Manager Agent's coordination capabilities is distinct from, but fully compatible with, the API interfaces used to embed applications with Gateways or other kinds of SOCIAL Agents. Thus, the HDC-Manager Agent operates transparently with respect to the physical distribution and internal architectures (i.e., communication, control, and knowledge structures) of the subordinate and external Agents with which it interacts. Consequently, an HDC-Manager can subordinate standalone application Agents, Gateway Agents, and other Manager Agents, HDC or otherwise, with equal ease. In particular, HDC-Manager Agents can be organized in a nested hierarchy to support large complex distributed systems, as illustrated in Figure 3.

[Figure 3: Hierarchy of HDC-Manager Agents, showing a Director Manager Agent supervising subordinate Manager and Gateway Agents, which in turn front planning and scheduling systems and databases; configuration management, inventory, and error-tracking systems and databases; and fault detection, isolation, troubleshooting, and repair systems and databases.]

HDC-Manager Architecture and Operation

The HDC-Manager Agent was implemented using SOCIAL in a uniform, fully object-oriented manner. The Agent is comprised of a collection of state variables and utility methods (cf. Figure 4). The value for each state variable consists of a list of MetaData objects, such as Agent-Index-Items. Each such object type has associated test predicate, instance creation, and access functions (e.g., Agent-Index-ItemP, Create-Agent-Index-Item, Check-Agent-Index). These low-level functions, written using the MetaData API, are invoked through a higher-level HDC-Manager API and are transparent to developers. Five state variables store the static and dynamic information necessary for the HDC-Manager to function:

- a Task-Agenda for queuing and dispatching service requests (Tasks);

- a Bulletin-Board for posting replies and other data items to be shared among Agents;

- an Index knowledge base describing subordinate Agents, their capabilities, and functions for assembling data requests;

- a set of Priority-Conditions for ordering the Agenda queue of pending Tasks to be routed by the Manager; and

- an Activity Log for tracing and debugging Manager behavior during application development.

Currently, the Index knowledge base and the Prioritization conditions represent static structures that are specified once, when an HDC-Manager Agent is first defined to coordinate a specific set of applications. Typically, changes are made during development, but infrequently thereafter, when subordinate application Agents are added, removed, or restructured. (Self-adapting systems could modify Priority-Conditions or even create new server Agents dynamically; however, we have not yet investigated such possibilities.) The remaining three variables are more dynamic structures: their contents change continually as the HDC-Manager regulates an operational distributed system.

The HDC-Manager Agent also incorporates four sets of supporting procedural methods:

- in-filter and out-filter methods for parsing and responding to incoming MetaCourier messages and processing replies to outgoing messages;

- auxiliary API utility methods that realize the HDC-Manager's global hierarchical control model;

- auxiliary API utility methods for creating and modifying HDC-Manager data items, posting them to and retrieving them from the Agent's state variables; and

- optional methods for handling application Tasks within the HDC-Manager itself. Such methods may be called for when a Manager Agent is configured as a subordinate to other Manager Agents.

[Figure 4: HDC-Manager Agent Structures. State Variables: Task Agenda; Bulletin-Board; Index of subordinate Agents; Task Priorities; Action Log. Procedural Methods: MetaCourier methods (in-filter, out-filter); Utility methods (HDC control model methods, access methods for Manager state variables); Application-specific methods (local Manager task handlers, external interfaces).]

The HDC-Manager's specialized coordination functions and information structures are accessed through a dedicated API (cf. Figure 5). This API is implemented via lower-level MetaData and MetaCourier capabilities and APIs. The high-level API shields SOCIAL developers from the underlying mechanics of packaging, transporting, and deciphering messages containing HDC-Manager commands and data structures among heterogeneous Agents.
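The pairing of per-type creation functions with generic search, as in the Agent-Index-Item functions named above, can be sketched in Python as follows; the constructors, field names, and the dictionary representation are our hypothetical stand-ins for the MetaData-based originals:

```python
# Per-type constructors, analogous to Create-Agent-Index-Item and its peers.
def create_agent_index_item(**fields):
    return {"kind": "agent-index-item", **fields}

def create_bulletin_item(**fields):
    return {"kind": "bulletin-item", **fields}

CREATORS = {
    "agent-index-item": create_agent_index_item,
    "bulletin-item": create_bulletin_item,
}

def create_item(kind: str, **fields):
    # Single API entry point that dispatches to the per-type constructor,
    # in the spirit of the Create-Item method described below.
    return CREATORS[kind](**fields)

def check(items, **conditions):
    # Generic symbolic pattern matcher shared by the Check-* accessors:
    # return the items whose fields satisfy all of the search conditions.
    return [item for item in items
            if all(item.get(k) == v for k, v in conditions.items())]
```

Extending such a design to a new object type means adding one constructor and one table entry, which matches the paper's account of how new state variables and data objects are incorporated.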

The API utility methods for accessing HDC-Manager state variables and their contents include:

- two methods for searching the Index Knowledge Base and Bulletin-Board. Both methods call a generic symbolic pattern-matching function with application-specific search conditions;

- a single Create-Item method, which dispatches to the various functions that create instances of HDC-Manager MetaData object types: Index and Bulletin-Board entry items, Priority-Conditions, and Tasks (Service Requests);

- four parallel methods for modifying HDC-Manager state variables: Task-Agenda, Index Knowledge Base, Bulletin-Board, and Priority-Conditions. Each method supports keyword options for resetting the variables and for posting and deleting specific data items; and

- a Log-Utility method. A menu-driven trace/debug facility can be used to toggle the Activity Log, which tracks all messages processed by the Command-Manager, and other flags that trace all runtime modifications to Manager state variables.

[Figure 5: HDC-Manager Agent Utility Methods. Control Methods: Command-Manager; Control-Cycle; Initialize; Log-Utility; Prioritize-Agenda; Task-Dispatcher. Data Structure Accessors: Check-Agent-Index; Check-Bulletin-Board; Create-Item; Modify-Agenda; Modify-Agent-Index; Modify-Bulletin-Board; Modify-Priority-Conditions.]

The HDC-Manager API is also used to invoke the utility methods that implement the HDC control model. These methods are generally triggered automatically, from the Manager's predefined in-filter and out-filter, but can be activated by other Agents as required. HDC-Manager control methods include:

- a Command-Manager method, which dispatches API command messages, either to internal HDC-Manager API methods or to the Command-Manager of another HDC-Manager Agent. This method also traps illegal commands;

- an Initialize method for resetting the HDC-Manager state variables and performing any application-specific actions;

- a Prioritize-Agenda method for sorting pending Tasks for the Manager to dispatch in accordance with the declarative ordering conditions specified in the Priority-Conditions state variable. Each condition specifies a Task attribute (e.g., service-category, priority) and an optional list of attribute values for ordinal sorting;

- a Task-Dispatcher method, which uses the Index Knowledge Base to generate and send a suitable command message to the relevant subordinate Agent requesting the service specified in a Task; and

- a top-level Control-Cycle method that invokes the Prioritize-Agenda and Task-Dispatcher methods that ground the HDC model.

The Command-Manager enforces regulated hierarchical control channels. A subordinate Agent can communicate with any HDC-Manager Agent within a Manager hierarchy; however, any such message is processed first by the Agent's immediate superior and then by all intervening Managers. This design makes it possible to define and enforce alternative organizational reporting policies. The default policy is very flexible: messages are tracked, but passed along the Manager hierarchy without filtering. Thus, any subordinate Agent can access the Bulletin-Board or request services from other Managers through its immediate Manager. More or less restrictive control architectures may be appropriate under different conditions. For example, messages from subordinates to higher-level Managers in time-critical applications could be screened or prioritized. Parallel, antagonistic or competitive models can also be explored. Finally, the HDC-Manager architecture does not preclude a subordinate reporting to multiple Managers, thus permitting non-hierarchical models (e.g., matrix structures) or elaborate hybrid organizational structures.
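The Prioritize-Agenda, Task-Dispatcher, and Control-Cycle trio described above can be sketched as follows. The condition format mirrors the description (each condition names a Task attribute plus an optional value list for ordinal sorting), but all identifiers and data shapes are hypothetical:

```python
# Each condition: (Task attribute, optional list of values giving the
# ordinal order; None means sort on the attribute's natural order).
PRIORITY_CONDITIONS = [
    ("service_category", ["fault-isolation", "status"]),
    ("priority", None),
]

def prioritize_agenda(agenda):
    # Sort pending Tasks by the declarative ordering conditions.
    def rank(task):
        key = []
        for attribute, ordering in PRIORITY_CONDITIONS:
            value = task[attribute]
            key.append(ordering.index(value) if ordering else value)
        return key
    return sorted(agenda, key=rank)

def task_dispatcher(task, index):
    # Use the Index to address a command message to the subordinate
    # Agent that provides the requested service.
    return {"to": index[task["service"]], "request": task}

def control_cycle(agenda, index):
    # Top-level loop grounding the HDC model: order the pending Tasks,
    # then dispatch each one to its server Agent.
    return [task_dispatcher(task, index) for task in prioritize_agenda(agenda)]
```

Because the ordering is declarative data rather than code, changing the Manager's scheduling policy amounts to editing the condition list, which parallels the paper's point about modifying Priority-Conditions without touching control methods.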

The HDC-Manager was designed in a modular fashion to facilitate maintenance and extension. API accessor utility methods conform to uniform conventions for naming, argument call structures, and parallel behavior. For example, Tasks or other data structures can be modified by adding attributes (e.g., timestamps for First-In-First-Out ordering) or changing attribute names. The developer merely changes the relevant Create-X function (and possibly one of the API Check methods). New HDC-Manager state variables, and data objects to populate them, can be added by implementing appropriate MetaData type-checking predicates, creation and accessor functions, adding a case to the Create-Item API method, and extending the table that drives the Command-Manager and Task-Dispatcher control methods.

Similarly, the HDC-Manager API control methods share parallel argument call structures and can be modified or extended selectively. For example, the Initialize method can be customized to perform any actions required to load and initiate all subordinate Agents and their embedded applications. Moreover, as noted above, alternative organizational policies can be implemented to capture logical relationships specific to particular distributed systems. All such modifications are implemented by creating HDC-Manager Agent subclasses with custom methods that override the standard methods defined by the root Agent class. For instance, the default initialization behavior can be redefined simply by creating a subclass Agent with a new Initialize method that calls a Custom-Initialize function. Specialization preserves the structure of the original HDC-Manager Agent class for use in applications where it is suitable. At the same time, inheritance and functional abstraction (through method dispatching) promote adaptability and compact definitions for customized Manager Agent subclasses.

USING THE HDC-MANAGER AGENT FOR LAUNCH OPERATIONS SUPPORT

Over the past decade, NASA Kennedy Space Center (KSC) has developed knowledge-based systems to increase automation of operations support tasks for the Space Shuttle fleet. Major applications include operations support of the Shuttle Launch Processing System, and monitoring, control, fault isolation and management of on-board Shuttle systems and Ground Support Equipment. Prototypes have been tested successfully (off-line) in support of several Shuttle missions and are currently being extended and refined for formal field testing and validation. Final deployment will require integrating these applications, both with one another and with existing Shuttle operations support systems.

KSC recently initiated the EXODUS project (Expert Systems for Operations Distributed Users) to prepare for this challenging systems integration task. As part of this effort, KSC is funding Symbiotics, Inc. to develop the SOCIAL toolset to help validate, refine, and ultimately implement the proposed EXODUS architecture [Ad90a]. Proof-of-concept prototypes have been constructed to demonstrate central EXODUS design concepts: distributed data transfer; non-intrusive physical distribution of knowledge bases from existing intelligent systems to facilitate resource control and sharing; and integration of expert systems and databases via Gateway Agents for CLIPS, KEE, and Oracle development tools. This section describes a fourth EXODUS prototype, which used a SOCIAL HDC-Manager Agent to coordinate the fault isolation activities of two previously standalone intelligent systems.

Background on KSC Launch Operations

Processing, testing, and launching of Shuttle vehicles takes place at facilities dispersed across the KSC complex. Many activities, such as storing and loading fuels and controlling the environments of Shuttles on Launch Pads, require elaborate electromechanical Ground Support Equipment. The Launch Processing System (LPS) supports all Shuttle preparation and test activities from arrival at KSC through to launch. The LPS provides the sole direct real-time interface between Shuttle engineers, Orbiter vehicles and payloads, and associated Ground Support Equipment [He87].

The locus of control for the LPS is the Firing Room, an integrated network of computers, software, displays, controls, switches, data links and hardware interface devices.

The locus of control for the LPS is the Firing Room, an integrated network of computers, software, displays, controls, switches, data links, and hardware interface devices. Firing Room computers are configured to perform independent LPS functions through loads. Shuttle engineers use computers configured as Consoles to remotely monitor and control specific vehicle and Ground Support systems. Each such application Console communicates with an associated Front-End Processor (FEP) computer that issues commands, polls sensors, and preprocesses sensor measurement data to detect significant changes and exceptional values. These computers are connected to data busses and telemetry channels that interface with Shuttles and Ground Support Equipment.

The LPS Operations team ensures that KSC's four independent Firing Rooms are available continuously, in appropriate error-free configurations, to support test requirements such as Launch Countdown or Orbiter Power-up sequences for the Shuttle fleet. A dedicated Console computer is configured for these Operations support functions in each Firing Room. This computer displays messages triggered by the LPS Operating System that signal anomalous events such as improper register values or expiring process timers. The Operations Console supports other conventional programs for monitoring and retrieving Firing Room status data as well.

OPERA (for Operations Analyst) consists of an integrated collection of expert systems that automates many critical LPS operations support functions [Ad89b]. OPERA taps into the same data stream of error messages that the LPS sends to the Operations Console. OPERA's primary expert systems monitor the data stream for anomalies and assist LPS Operations users in isolating and managing faults by recommending troubleshooting, recovery, and/or workaround procedures. In effect, OPERA retrofits the Operations Console with knowledge-based fault isolation capabilities. The system is implemented in KEE on Texas Instruments Lisp Machines.

GPC-X is a prototype expert system for isolating faults in the Shuttle vehicle's on-board computer systems, or GPCs. GPC-X monitors (simulated) PCM telemetry data to detect and isolate faults in communications between Shuttle GPC computers and their associated GPC-FEP computers in LPS Firing Rooms. The GPC-X prototype is implemented in CLIPS on a Sun Workstation.

Coordinating Fault Diagnosis with HDC-Managers

One type of memory hardware fault in GPC computers manifests itself during switchovers of Launch Data Buses. These buses connect GPCs to GPC-FEPs until just prior to launch, when communications are transferred to telemetry links. Unfortunately, the data stream available to GPC-X does not provide any visibility into the occurrence of Launch Data Bus switchovers. Thus, GPC-X can propose, but not test, certain fault hypotheses about GPC problems. However, switchover events are monitored by the LPS Operating System, which triggers messages that can be detected by OPERA.

Typical of the current generation of knowledge-based systems, OPERA and GPC-X were developed independently of one another, using different representation schemes, reasoning and control models, and software and hardware platforms. More critically, neither system possesses internal capabilities for modeling or communicating with (remote) peer systems. The EXODUS prototype demonstrates the use of SOCIAL Agents to rectify these shortcomings (cf. Figure 6). The distributed application uses two Knowledge Gateway Agents to integrate OPERA and GPC-X. An HDC-Manager Agent mediates interactions between the OPERA and GPC-X Agents, coordinating their independent fault isolation activities to obtain enhanced diagnostic results.

Specifically, GPC-X, at the appropriate point in its rule-based fault isolation activities, issues a request via its Gateway Agent to the HDC-Manager to check for Launch Data Bus switchovers. The request is triggered by adding a simple consequent clause of the form (GW-Return LDB-Switchover-Check) to the CLIPS rule that proposes the memory fault hypothesis. GW-Return is a custom C function defined in the SOCIAL API interface for embedding CLIPS.
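In the actual system GW-Return is a C function registered with the embedded CLIPS engine; the Python sketch below is an illustrative stand-in only. The service name LDB-Switchover-Check and the Modify-Agenda request come from the text; the rule body, fact names, and message format are invented for the example.

```python
outbox = []  # messages the Gateway would forward to the HDC-Manager

def gw_return(service):
    """Stand-in for the GW-Return C function: asks the CLIPS Gateway Agent
    to formulate a Modify-Agenda request to the HDC-Manager for `service`."""
    outbox.append({"request": "Modify-Agenda", "task": service})

def memory_fault_rule(facts):
    """Stand-in for the CLIPS rule proposing the GPC memory-fault hypothesis.
    Its consequent both asserts the hypothesis and fires the GW-Return hook,
    i.e. the effect of adding (GW-Return LDB-Switchover-Check) to the rule."""
    if "anomalous-gpc-telemetry" in facts:
        facts.add("hypothesis-gpc-memory-fault")
        gw_return("LDB-Switchover-Check")
    return facts

facts = memory_fault_rule({"anomalous-gpc-telemetry"})
```

The point of the pattern is that the expert system itself only asserts one extra consequent; everything about remote communication is hidden behind the hook.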

When the rule fires, GW-Return interacts with the GPC-X CLIPS Gateway Agent, causing it to formulate a message to the HDC-Manager containing a Modify-Agenda request to add a MetaData Task object for the LDB-Switchover-Check service. The HDC-Manager's Command-Manager dispatches this request, causing the Task to be posted to the Task-Agenda. Next, the HDC-Manager executes the Control-Cycle method, which results in the Task being processed via the Task Dispatcher. First, the Index knowledge base is searched for a server Agent for LDB-Switchover-Checks. The search identifies the OPERA Gateway Agent as a suitable server. Task data is then reformulated into a command message using the procedure specified by the Index knowledge base. The message is then passed to the OPERA Gateway Agent, whose in-filter method performs a search of the knowledge base used by OPERA to store and interpret LPS Operating System error messages. The objective is to locate error messages (represented as KEE units) indicative of LDB Switchover events. Search results are encoded within a Manager Bulletin-Board MetaData object, which the OPERA Gateway's out-filter method returns as a message containing an API command to post that object to the Manager's Bulletin-Board. In this prototype, the OPERA Gateway contains all of the task processing logic: OPERA itself is a passive participant that continues its monitoring and fault isolation activities without significant interruption.

The GPC-X CLIPS Gateway Agent queries the HDC-Manager to check the Bulletin-Board for a response to its LDB-Switchover-Check request and retrieves the results. The retrieved Bulletin-Board item is decoded, and the answer is converted into a fact that is asserted into the GPC-X fact base. Finally, the Gateway activates the CLIPS rule engine to complete GPC fault diagnosis. Obviously, new rules have to be added to GPC-X to exploit the newly available hypothesis test data. However, all of the basic integration and coordination logic is supplied by the embedding GPC-X Gateway Agent or the HDC-Manager.

[Figure 6 (not reproduced) shows the HDC-Manager mediating between the GPC-X and OPERA Gateway Agents: a task to search for Launch Data Bus info flows in one direction, and a response (to test the LDB fault hypothesis) flows back in the other.]
Figure 6: Hierarchical Coordination in EXODUS

This EXODUS prototype illustrates non-intrusive, system-level coordination of distributed applications that solve problems at the subsystem level of Shuttle Operations: neither OPERA nor GPC-X is capable of confirming or rejecting the GPC memory fault hypothesis individually. GPC-X generates fault candidates, but lacks sufficient resources to complete diagnosis, which requires both generate and test capabilities. OPERA automatically detects LPS error messages that are relevant to GPC-X's fault test requirements. However, it lacks the contextual knowledge about GPC computers - their architecture, behavior, fault modes and symptoms - needed to recognize the significance of such data. OPERA also lacks the capabilities to communicate its interpretations of LPS data back to GPC-X to complete diagnosis.

Together with the Gateway application Agents, the HDC-Manager provides the links required to combine and utilize the otherwise isolated or fragmented knowledge about Shuttle and Firing Room systems and their relationships to one another. The resulting coordination architecture is non-intrusive in that neither application was modified to include direct knowledge of the other system, its interfaces, knowledge model, or delivery platform. The HDC-Manager introduces an isolating layer of abstraction; application Agents need only know how to communicate with the HDC-Manager to request services and retrieve responses for their embedded applications.
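The agenda-posting, dispatch, and Bulletin-Board steps described in this section can be summarized in a minimal, single-process Python sketch. The structure names (Task-Agenda, Index, Bulletin-Board, in-filter) follow the paper; the data representations and method signatures are invented for illustration.

```python
class HDCManagerSketch:
    """Toy model of the HDC-Manager's centralized control cycle."""
    def __init__(self):
        self.task_agenda = []
        self.index = {}            # service name -> server Agent
        self.bulletin_board = {}   # service name -> posted result

    def modify_agenda(self, task):
        # Command-Manager handles the Modify-Agenda request:
        # the Task is posted to the Task-Agenda.
        self.task_agenda.append(task)

    def control_cycle(self):
        # Task Dispatcher: for each pending Task, search the Index
        # knowledge base for a server, forward a command message, and
        # post the server's result to the Bulletin-Board.
        while self.task_agenda:
            task = self.task_agenda.pop(0)
            server = self.index[task]
            result = server.in_filter(task)
            self.bulletin_board[task] = result

class OperaGatewaySketch:
    def in_filter(self, task):
        # Stand-in for searching OPERA's knowledge base for error
        # messages indicative of LDB Switchover events.
        return {"LDB-Switchover-Check": "switchover-detected"}[task]

mgr = HDCManagerSketch()
mgr.index["LDB-Switchover-Check"] = OperaGatewaySketch()
mgr.modify_agenda("LDB-Switchover-Check")
mgr.control_cycle()
# The client Gateway later polls the Bulletin-Board for its answer:
answer = mgr.bulletin_board["LDB-Switchover-Check"]
```

Note how neither pseudo-Gateway refers to the other: all coupling runs through the Manager's index and Bulletin-Board, which is the non-intrusiveness property the prototype demonstrates.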

The proposed design for the complete EXODUS system specifies an extended HDC model to integrate and coordinate all of KSC's operations support applications across areas of functional overlap. A SOCIAL HDC-Manager Agent will support architectural changes as EXODUS evolves through its lifecycle of initial deployment, maintenance, and enhancement. Initial EXODUS applications are loosely coupled and will interact relatively infrequently. Consequently, the HDC-Manager's centralized control strategy does not entail serious performance penalties. As new applications are added and interaction traffic increases, bottlenecks will be addressed by reorganizing the EXODUS control architecture in terms of hierarchies of HDC-Manager Agents, much as growing human organizations evolve.

RELATED WORK

The HDC-Manager generalizes and extends a hierarchical distributed control (HDC) model originally developed for NASA's Operations Analyst (OPERA) system [Ad89a]. The OPERA version of the control model was implemented using distributed blackboard objects [Ad89b]. OPERA requires all applications to be co-resident and implemented using KEE, and only supports a single Manager and subordinate group. The extended HDC model relaxes these restrictions using MetaCourier, MetaData, API tools, and SOCIAL Agents.

Most research in Distributed Artificial Intelligence (DAI) has focused on domains involving a single complex problem, such as data fusion. Control schemes have emphasized purely local coordination methods to achieve cooperation among intelligent systems [Bo88,Hu87]. For example, [Le83] employs a homogeneous collection of blackboards that interpret data from a (simulated) network of spatially distributed sensors to reconstruct vehicle positions and movements. Data and hypotheses are shared across adjacent sensor regions. Solutions emerge consensually, without global management. Decentralized control entails significant performance overhead from duplicated processing. As argued earlier, localized coordination strategies can be cumbersome and difficult to maintain for heterogeneous, evolving "multiple problem" DAI systems.

The MACE DAI tool [Ga86] incorporates manager agents for centralized routing of messages among agents, closely resembling the organizing role played by SOCIAL HDC-Managers. However, it is not clear that MACE managers can be configured in multi-level hierarchies. Moreover, MACE managers do not provide shared-memory Bulletin-Boards. MACE and several other distributed system tools, such as ABE [Ha88] and CRONUS [Sh86], insulate developers from low-level distributed computing functions and support message sending among processes across computer networks. However, unlike SOCIAL, these tools do not implement generic distributed services in uniformly object-oriented, layered modules that are accessible to developers for customizing. In addition, SOCIAL provides more extensive tools for integrating across languages and shells.

FUTURE WORK

Future development will extend SOCIAL's library of Manager Agents. Alternative organizational models will explore alternative types of nonhierarchical cooperative coupling. For example, we are investigating control behaviors based on group-based tasking [Br89] as one approach to providing fault tolerance in distributed systems: groups can be used to define sets of application Agents that duplicate support for given services. A group Agent that could detect loss of an Agent configured to provide a service (e.g., due to dropped network links or host platform failures) could activate another member Agent to resume the service. Redundancy and recoverability are critical prerequisites for distributed systems in mission-critical space and military applications. We also plan to implement C versions of SOCIAL Manager and Gateway Agents that are currently Lisp-based.
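The group-based fail-over idea sketched above might look like the following; the GroupAgent class, the member names, and the failure-notification interface are all hypothetical, since group-based tasking is described here only as future work.

```python
class GroupAgent:
    """Toy group Agent: holds a set of member Agents that duplicate
    support for the same service, and fails over when the active
    member is lost."""
    def __init__(self, members):
        self.members = list(members)
        self.active = self.members[0]

    def handle_failure(self, lost_agent):
        """On detecting loss of a member (e.g. a dropped network link or
        host platform failure), activate another member to resume the
        service, and report which member is now active."""
        self.members.remove(lost_agent)
        if lost_agent == self.active and self.members:
            self.active = self.members[0]
        return self.active

# Two (hypothetical) Agents duplicating one service:
group = GroupAgent(["gateway-A", "gateway-B"])
survivor = group.handle_failure("gateway-A")
```

Redundancy of this kind is cheap to express once services are addressed through a group rather than through an individual Agent, which is the design motivation the paragraph gives.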

CONCLUSIONS

Development tools for distributed intelligent systems must be modular and non-intrusive to: (a) facilitate integration of existing, standalone systems "after the fact;" and (b) minimize lifecycle costs for maintaining, enhancing, and re-verifying systems. Tools for building distributed systems must be able to coordinate as well as integrate autonomous application elements. Coordination is necessary to manage large numbers of dynamic interaction pathways and to capture complex organizational relationships among application elements. SOCIAL provides a unified set of object-oriented tools that address all of these requirements. The HDC-Manager Agent realizes a hierarchical distributed control model that adopts a highly centralized approach to coordination. Developers use a high-level Application Programming Interface to access the HDC-Manager's coordination capabilities. The API conceals lower-level SOCIAL tools for transparent distributed communication, control, and data management.

SOCIAL has broad applicability for distributed intelligent systems that are being developed in space-related domains. Knowledge-based and conventional tools for managing and analyzing data need to be coupled to help space scientists explore and utilize NASA's growing stores of astronomical and environmental information. Linking short- and long-term scheduling and planning tools will improve decision support capabilities for complex space missions. EXODUS-like architectures can increase automation of operations support by coordinating autonomous tools across subsystems and functional areas (e.g., configuration, anomaly detection, diagnosis and correction). Example domains include payload and Shuttle processing, computer and communications networks, and vehicle or Space Station subsystems (e.g., power generation, power distribution, mission payloads, life support). Finally, flight and mission control centers can enhance automation and safety in directing launches, satellites, and space probes by combining decision and operations support tools into fully unified, cooperating systems.

Acknowledgments

SOCIAL was designed and implemented by the author, Bruce Cottman, and Rick Wood. Development of SOCIAL has been sponsored by NASA Kennedy Space Center under contract NAS10-11606. Astrid Heard of the Advanced Projects Office at Kennedy Space Center has provided invaluable support and suggestions for the SOCIAL effort. Ms. Heard also initiated and directs the EXODUS project. MetaCourier was developed by Robert Paslay, Bruce Nilo, and Robert Silva, with funding support from the U.S. Army Signals Warfare Center under Contract DAAB10-87-C-0053.

REFERENCES

[Ad90a] Adler, R.M. and Cottman, B.H. "EXODUS: Integrating Intelligent Systems for Launch Operations Support." Proceedings, Space Operations, Applications, and Research Symposium (SOAR 90). Albuquerque, NM, June 1990.

[Ad90b] Adler, R.M. and Cottman, B.H. "A Development Framework for AI Based Distributed Operations Support Systems." Proceedings, Fifth Conference on AI for Space Applications. Huntsville, AL, May 21-23, 1990.

[Ad89a] Adler, R.M., Heard, A., and Hosken, R.B. "OPERA - An Expert Operations Analyst for a Distributed Computer Network." Proceedings, Annual AI Systems in Government Conference, Computer Society of the IEEE. Washington, D.C., March 27-31, 1989.

[Ad89b] Adler, R.M. "A Distributed Blackboard Architecture for Integrating Loosely-Coupled Knowledge-Based Systems." Intelligent Systems Review, 1, 4, Summer 1989. Association for Intelligent Systems Technology, E. Syracuse, NY.

[Br89] Birman, K., et al. The ISIS System Manual, V1.2. Department of Computer Science, Cornell University, Ithaca, NY, June 1989.

[Bo88] Bond, A.H., and Gasser, L. (eds.) Readings in Distributed Artificial Intelligence. Morgan-Kaufmann, San Mateo, CA, 1988.

[Ga86] Gasser, L., Braganza, C., and Herman, N. MACE: A Flexible Testbed for Distributed AI Research. Distributed Artificial Intelligence Group, Computer Science Dept., USC, 9-Aug-1986.

[Ha88] Hayes-Roth, F., Erman, L.D., Fouse, S., Lark, J.S., and Davidson, J. "ABE: A Cooperative Operating System and Development Environment." In A.H. Bond and L. Gasser (eds.), Readings in Distributed Artificial Intelligence. Morgan-Kaufmann, San Mateo, CA, 1988.

[Hu87] Huhns, M.N. (ed.) Distributed Artificial Intelligence. Morgan-Kaufmann, Los Altos, CA, 1987.

[Le83] Lesser, V.R. and Corkill, D.D. "The Distributed Vehicle Monitoring Testbed: A Tool for Investigating Distributed Problem Solving Networks." AI Magazine, Fall 1983, pp. 15-33.

[Sh86] Schantz, R., Thomas, R., and Bono, G. "The Architecture of the Cronus Distributed Operating System." Proceedings, 6th International Conference on Distributed Computing Systems. May 1986.

[Sy90] Symbiotics, Inc. Object-Oriented Heterogeneous Distributed Computing with MetaCourier. Technical Report, Cambridge, MA, March 1990.
