<<

1 Abstractions, Architecture, Mechanisms, and a Middleware for Networked Control Scott Graham, Girish Baliga, and P. R. Kumar

Abstract— We focus on the mechanism half of the policy- was met by the explosion of work on state-space based mechanism divide for networked control systems, and ad- design by Kalman, Pontryagin, and others. dress the issue of what are the appropriate abstractions Now, about fifty years later, we may be on the thresh- and architecture to facilitate their development and de- ployment. We propose an abstraction of “virtual colloca- old of a third technological revolution in control plat- tion” and its realization by the infrastructure of forms. This is driven by advances in VLSI hardware, middleware. Control applications are to be developed as a wireline and wireless communication networking, and collection of software components that communicate with advanced software such as middleware. There has been each other through the middleware, called Etherware. The tremendous growth in embedded computers, with 98% middleware handles the complexities of network operation, such as addressing, start-up, configuration and interfaces, of microprocessors now sold being embedded [2]. In by encapsulating application components in “Shells” which communication, besides the Internet, we may be on the mediate component interactions with the rest of the system. cusp of a wireless revolution with Wi-Fi (IEEE 802.11x) The middleware also provides mechanisms to alleviate the experiencing double-digit growth since 2000 [3]. effects of uncertain delays and packet losses over wireless Less recognized is that has also channels, component failures, and distributed clocks. This is done through externalization of component state, with evolved significantly in the last two decades. Although primitives to capture and reuse it for component restarts, many basic ideas such as object oriented programming upgrades, and migration, and through services such as clock and distributed were mooted early on, they synchronization. We further propose an accompanying use have been brought to fruition only recently due to better of local temporal autonomy for reliability, and describe the understanding of software management and advances in implementation as well as some experimental results over a traffic control testbed. hardware. In particular, experience in design of large and complex software programs has been well codified and Index Terms— Abstractions, Architecture, Middleware, widely reused through design patterns such as Strategy Mechanisms, Networked Control, Third Generation Control, Networked Embedded Control Systems. and Memento [4], software frameworks such as Model- View-Controller (MVC) [4], and processes such as eXtreme Programming (XP) and Ra- I. I tional Unified Process (RUP) [5]. The development and widespread use of software libraries and infrastructure A. A historical perspective such as middleware has drastically reduced develop- The first generation of platforms for control systems in ment cycle times, and resulted in better software. A the electronic age that began with the vacuum tube was good indicator of advances made is that the level of based on analog computing1 . This created a need for abstractions, particularly in programming languages, is theories to use the technology of operational amplifiers. much higher today. primitives have The challenge was met by Black, Bode, Nyquist and evolved from registers and variables to libraries, compo- others, who developed frequency domain based design nents, messages, and remote procedure calls. A typical methods. A second era of digital control systems com- mid-range automobile has about 45 micro-controllers menced around 1960, based on the technology of digital connected by Controller Area Networks (CAN) and uses computers. It too created a need for new theories for complex software [6]. This has thrust software engineer- exploiting the capabilities of digital computing, which ing into a central role [7] since the development and testing of complex software is expensive and fraught This material is based upon work partially supported by USARO with numerous problems. There is much impetus to Contracts W911NF-08-1-0238 and W-911-NF-0710287, NSF Contracts reuse previously developed and tested software, since ECCS-0701604, CNS-07-21992, CNS-0626584, CNS-05-19535, and CCR- 0325716, DARPA/AFOSR Contract F49620-02-1-0325, DARPA Con- it not only reduces the effort and expenses involved, but tact N00014-0-1-1-0576, and AFOSR Contract F49620-02-1-0217. also enables sharing of valuable experience. About 40% Please address correspondence to [email protected]. of the software for the Boeing 777 airplane was of the AFIT, USA. [email protected]. Google, Mountain View, CA 94043. [email protected]. commercial-off-the-shelf (COTS) variety [8]. CSL and Dept. of ECE, Univ. of Illinois, 1308 West Main St., Urbana, These developments present new opportunities for IL 61801, USA. Email: [email protected]. control, as well as several challenges. Properly har- 1Mindell [1] provides a detailed account of how control, communica- tion and computation were intertwined in the first half of the twentieth nessed, they can potentially revolutionize control system century. platforms and lead to a new era of system building. 2

B. The importance of abstractions and architecture

In order for networked control systems to proliferate, they will need to be made easy to design and easy to deploy. This is the objective in this paper. The goal addressed is how to reduce the design and development time. This also requires alleviation of the burden on the control system designer to be an expert in issues which lie outside the domain of control, by minimizing the attention needing to be paid to extraneous details. An analogy is general purpose computing, where a com- putational platform is easily usable by multiple users with different applications. Another analogy is general purpose networking where computers can be easily in- Fig. 1. Networked control system: Software component architecture terconnected to communicate with each other regardless of the envisaged application. These objectives require the identification of well de- terface between hardware and software is what today signed abstractions, the design of a matching architecture, allows hardware and software designers to develop their and the development of a supporting middleware. It is products separately, resulting in the proliferation of se- also important to provide services that render design rial computation. In contrast, parallel computation lacks easy and quick, as well as design principles that enhance such a “von Neumann bridge” [12], which, arguably, has reliability. This paper can be regarded as focusing on the stifled its proliferation. “mechanism“” half of the mechanism-policy divide. The The development of appropriate abstractions and development of a holistic and systematic theory, that has matching architecture has been fundamental to the pro- been the grand project of control systems research since liferation of other successful technologies too. The reason the 1950s, can be regarded as the other “policy” half of for the success of networked communication is, arguably, the mechanism-policy divide. fundamentally its architecture, and, only secondarily, the An analogy with the developments leading to mod- involved, though of course they too are very ern computation and is appropriate. important. In the Open Systems Interconnection (OSI) The work of Alonzo Church and Alan Turing laid the architecture [13], each layer provides a service to the theoretical foundations of sequential computing. They layer above it, and in turn can be oblivious to the details independently developed formal systems to define and of layers beneath it. These capabilities make networks model the notion of computable functions [9]. However, robust and evolvable, and give longevity to the basic de- it was John von Neumann’s subsequent ideas about sign. For example, over the course of the years, different organizing these concepts into an architecture that is versions of the transport layer protocol, TCP, have been the basis of today’s sequential computers. It led to the proposed and deployed, without necessitating a change practical realization of the theoretical ideas. In particular, in other layers. The overall design allows heterogeneous the stored program concept [10] was a significant break- systems to be composed in a plug and play fashion, through. It has led to the development of simple and uni- making them amenable to massive proliferation, and has form mechanisms to write, test, and maintain complex resulted in the Internet. The massive proliferation has programs today. The von Neumann architecture had a led to reduced cost per unit over the long run, and wide influence on the first generation of digital computer resulted in minimizing deployment costs for new net- engineers in the 1950s. Till the mid 1960s, commercial works, further stimulating proliferation. A similar role computers produced by IBM, Burroughs, UNIVAC, and has been played in digital communication by Shannon’s others, were still custom built machines. However, the source/channel separation, one of the rare architectural IBM 360, first shipped in June 1966, changed all that. It principles resulting from mathematical theory. Source was the first commercially successful general purpose coding is often performed in software by algorithms computer, with the name 360 (degrees) intended to such as JPEG, while channel coding techniques such as market its “all round” general purpose capabilities. The QPSK are performed in hardware by network interface 360/370 series introduced the notion of compatibility in cards [14]. Closer home, in control, it is a standard ab- computers [11]. All was built to straction to view the plant separately from its controller. a common set of abstractions such as instruction sets, This is however obvious only in retrospect, and, even memory models, etc. Consequently, the same software now, is not routine in other fields; for example, it is diffi- could run on all the different machines in this series. cult to routinely simulate new policies in manufacturing This standardization tremendously reduced software de- system simulation software where plant and controller velopment costs. More importantly, it solved problems are intermixed. Another architectural result, also one due to changing customer requirements by supporting with a mathematical foundation, is the separation of seamless upgrades of hardware and software. This in- estimation and control, which considerably simplifies 3 design of control systems. The situation in control systems today is similar to that of computers in the early 1960s, with the critical present requirement being a well designed organizing architecture that will make them amenable to massive proliferation. Complex systems design can involve a basic trade-off between architecture and performance. A system designed with a more extensible architecture has redundancies to improve design flexibility, while a system optimized for performance has tighter cou- pling between sub-systems at the cost of lower design flexibility. While the latter approach is better in the Fig. 2. Networked Control System: Traffic Control Testbed short term, the former is better in the long term. Im- proved design flexibility implies easier system exten- C. Summary of contributions sibility, and modular design. Additional functionality to address changing requirements can be incorporated Rather than implementing a classical control system much more easily into systems which have an architec- as simply a software program involving the control laws ture that allows for evolution. Modular design allows and plant interfaces, it is preferable to use a component different sub-systems to be developed separately and architecture [15], [16], [17], which consists of decomposing then integrated into operational systems. Further, these it into components, which are autonomous software mod- sub-systems can be changed, or even reused in other ules with well-defined functionalities. As an example, StateEstimator Supervisor Controller Sensor systems, with minimal impact on the rest of the system. , , , , and Actuator All this results in systems that are more reliable and , are all possible components in Figure 1. A have a longer useful lifetime. Moreover, more general component architecture has several benefits. It allows architecture allows recovery or even enhancement of individual components to be developed separately and system performance by facilitating sophisticated self- integrated later, which is important for development of optimization capabilities. As an example, we will show large systems. Since components are well-defined, they how one may deploy mechanisms such as automated can be replaced without affecting the rest of the system. software migration to optimize the use of resources and For instance, a zero-order hold filter can be dynamically loop delays at run-time, by designing a sufficiently rich replaced by a Kalman filter without having to change the software infrastructure that supports such mechanisms. remaining software or restarting an operational system. Software reuse is promoted, since a component such There are several key challenges. Distributed opera- as a control tested on one system can be tion introduces problems due to concurrency and non- transplanted into others. Mechanisms that support com- determinism in program execution. The network compli- ponent interface specification, including sequence and cates the system further with configuration, operational, types of component interactions and format and contents and maintenance problems. While present networking of messages exchanged, and explicit specification of technology provides some network abstractions, there design assumptions, can simplify integration problems remain numerous configuration details, such as deter- while reusing components. mining network addresses and protocols, which need Component based design is based on primitives such to be resolved in each application. These details occupy as components and messages, which are at a higher substantial portions of design time and effort, motivating level of abstraction than sockets and processes supported the need for even higher-level abstractions. Another by operating systems. These abstractions can be created issue is that wireless links are prone to unpredictable by software infrastructure such as middleware, which delays and high losses. Hence, support for methodolo- has emerged as a preeminent framework to develop gies to facilitate control system design in the presence large distributed applications. Since systems integration of such characteristics will have to be provided. What and testing constitute the most expensive development is required is an infrastructure that supports reusable activities in the engineering of complex software [5], solutions for common problems. a lot of effort has been directed at the development One caveat is in order. Just as general purpose comput- of software tools and architectures. Many successful ing is not targeted at high performance problems such as applications have been developed using a CORBA based scientific computing, so also “general purpose” control architecture [18]. Currently, IBM [19], Microsoft [20], and may be inappropriate for situations where only a highly SUN [21] promote their middleware based technologies coupled solution, highly customized in hardware and as platforms for software engineering. We anticipate software, can meet stringent requirements. However, that the next generation of control applications will also even then, in an extensible architecture, one could embed follow this path, and present such a proposal. lower level custom code in key subsystems to enhance The middleware we have developed is called Ether- performance. ware. It requires that the control system designer only 4 write the logic of the control law in components, with- testbed2 featuring a networked system of cars, vision out being burdened with how it will be executed. The sensors, wireless and wired networks, and the full range information flow between components is routed through of control hierarchy including regulation, tracking, tra- the middleware, without a component needing to be jectory generation, planning and scheduling, along with aware of where a destination component is executing, a discrete event system for liveness and safety. thus providing location independence of components. The This paper does not address the issue of theories middleware also supports semantic addressing. It further needed for networked control. It only focuses on the supports facilities such as migration, whereby a Kalman mechanism half of the mechanism-policy division of re- Filter Component can migrate to a more computationally search. Continuing theoretical effort, conducted on the powerful node or a node with lesser latency between policy front, is needed to fully realize the potential of it and the sensor, or between it and the actuator. The networked control. The current situation is that technol- middleware also provides features to enhance reliability, ogy has overtaken theory in some respects, in that we a key performance requirement for control systems. It do not know how to optimally utilize the capabilities allows automatic restart upon failure of a component, already supported in the middleware. We conclude by through mechanisms such as checkpointing of the state. providing some examples of theoretical challenges for The middleware employs several Design Patterns [4] future research. that codify good design solutions, capturing best prac-    tices, thus leading to much better reuse of design. We II. D have incorporated the design patterns of Strategy, Me- While describing the middleware, how the networked mento, and Facade in Etherware. The Strategy pattern has control system is to be implemented, and what func- led to replaceable control laws, Memento has resulted in tionalities the middleware can provide, it is helpful to component state check-pointing mechanisms, and Facade refer to specific illustratory examples. Hence we provide has allowed a simple and uniform middleware interface. a brief account of the networked control testbed, see Fig- The major abstraction that we propose and which ure 2, on which all of the above has been implemented. the middleware realizes is a Virtually Collocated System, It consists of a set of remote controlled cars with no on- which allows control engineers to design control laws board computational capabilities. Each car is controlled for networked systems using the same techniques as by radio transmitter over a dedicated radio frequency for centralized systems, i.e., as though the distributed channel connected to the serial port of a laptop through application were executing on a single computer. The a micro-controller, which converts discrete commands control system designer need not concern herself with from the laptop into analog controls specifying the speed extraneous details of a networked system, and needs to and steering angle in discrete steps. Commands can be focus only on the details relevant to the control law. sent at a rate of up to 50 Hz, i.e., one command every It can potentially significantly reduce design time for 20ms. Each car has a chassis top with uniquely coded networked control. color patches that are used to identify its position and orientation. The cars are monitored using a pair of ceiling Architecturally, we propose that this abstraction be mounted cameras. The video feed from each camera is regarded as a Virtually Collocated System Layer (VCL) that processed by an image processing algorithm executing resides above the Transport Layer. It can be regarded as on a dedicated desktop computer. This feedback is avail- a replacement for the Presentation and Session Layers, able at the rate of 10Hz, i.e., one update every 100ms. which are anyway absent in the TCP-IP architecture. All computers in the testbed are connected optionally by Local Tem- We propose an accompanying principle of a wired Ethernet or by an ad hoc wireless network with poral Autonomy for design of reliable networked control IEEE 802.11 [23] PCMCIA cards. systems, where the goal is to design components so The architecture of the control application3 is illustrated that they can function for a limited period of time even in Figure 1 where the control software components in when neighboring components fail, thus allowing them the testbed are shown. The camera feed is processed to migrate, or for a neighbor to restart. This facilitates in VisionSensor components. A data fusion compo- robustness, reliable system evolution, development and nent, called the VisionServer, combines the sensor data debugging. We show how Kalman Filters, Receding from both vision systems and distributes the position Horizon Control, and Actuator Buffers, can all be used and orientation information for each car via the com- to provide Local Temporal Autonomy. munication network. Individual cars are operated by We identify services that are useful across a range of corresponding Controllers that implement Receding control systems and show how they can be supported. Horizon Model Predictive Control. Incoming sensor data One example is distributed time. This is useful since is passed to a StateEstimator module which imple- networked control systems employ time-driven comput- ments a Kalman Filter. Its prediction of future states is ing in addition to the usual event-driven computing. We show how such services are supported through a 2Movies are available at [22]. 3 middleware feature called ”Filters.” It should be specifically noted that this is distinct from the primary focus of this paper, the architecture of the middleware, the infrastructure We have implemented the middleware on a laboratory over which the application is run. 5 fed to the Controller, thus buffering it against delays important for design of distributed operations. It al- and jitter of the communication system. A software lows components to communicate without distinguish- module, called the Actuator, feeds commands to the ing between local and remote components. It also allows radio-control system. The Actuator is designed to buffer integrating components uniformly in different network a small horizon of future commands, so that it can configurations. For example, in the traffic testbed, the tolerate brief control outages, and stopping the car if the ability to easily switch between wired and wireless net- Controller is inoperative for too long. A Supervisor works, which use different addressing schemes, requires resides above the Controllers and oversees the op- such details to be abstracted away from the components. eration of the testbed. The Controllers need to be The abstraction also supports the future use of other given trajectories to operate the respective cars. They networking technologies such as Bluetooth [25], without are generated by the Supervisor, which also ensures having to update component code. global properties such as safety, the avoidance of car Service description. Connecting components in a net- collisions, and liveness, the elimination of traffic grid- work involves determining if appropriate components locks. The Supervisor generates trajectories along a pre- are executing in the system. If several such components specified network of roads, using algorithms described are available, then the most suitable component has to in [?]. Briefly, blocks of the road network are modeled be selected. This requires mechanisms to specify and dis- as bins in a corresponding graph. The planning prob- cover services provided by components. A car controller lem is formulated as finding shortest paths in a graph, that knows the geographic location of its car should be and the scheduling problem is modeled as assigning able to connect directly to a sensor covering that location, cars to bins while avoiding collisions and gridlocks. without having to know about sensor locations in the Finally, the Supervisor also receives feedback from the network. VisionServer, forming a higher-level control loop. Due Interface compatibility. Integrating independently de- to hardware constraints, the Actuator component for veloped software components can lead to interface in- each car, and the VisionSensor component for each compatibilities; e.g., the set of functions defined in the camera, must be executed on respective computers. All interface of a sensor module may not be compatible with other components can execute on any computer in the the functions required by a controller component. This testbed. can be eliminated if standard interfaces are specified and We have tested the system with up to eight cars oper- supported for compatibility. ating simultaneously, and closely following pre-specified Semantics. Incongruent assumptions in the implementa- trajectories. Examples of traffic scenarios implemented tion of components can lead to problems. The interaction are pursuit–evasion where a set of software controlled cars between different software components is usually based follow a manually controlled car, and automatic collision on an implicit finite state machine (see Figure 6), with avoidance; see the videos at [22]. exchanged messages assumed to be in specific formats. However, current interface description languages such III. N    as the CORBA IDL [26] do not provide a mechanism We begin by identifying requirements for common to specify these assumptions properly. For example, Controller functionalities needed for general networked control suppose the in Figure 1 is implemented so VisionServer applications that should be part of a control-specific that it checks for an update at the before VisionServer middleware. They may be grouped into Operational Re- computing controls. This assumes that quirements, Non-functional Requirements, and Management responds immediately, returning an update if available, VisionServer Requirements. and none otherwise. However, if is im- plemented so that it checks with VisionSensors for updates before responding, the additional delay may A. Operational Requirements lead to failure as Controller is also waiting for an Distributed operation. Connecting diverse compo- update. Hence, Controller and VisionServer may have nents executing on different nodes on a network leads consistent interfaces, but the interaction semantics may to numerous problems, including the need to locate still be incompatible. Consequently, there must be addi- and connect related components across a network, and tional provision for specifying interaction semantics in support the exchange of messages between connected interface descriptions. components. Examples of the problems that arise are Distributed time. Components executing on different initializing the components, connecting them, develop- computers do not share common clocks, and need a ing protocols for their interaction, and synchronizing mechanism to correctly translate time. For example, their operation to resolve conflicts. In the traffic control time-stamps on remote sensor updates must be properly testbed, the Supervisor and the various Controllers translated to local time to avoid collisions. execute on different computers. Location independence. This is an abstraction that pro- B. Non-functional Requirements vides an addressing scheme for components that is These are features that are not necessarily required independent of their actual location in the network, for the correct operation of the system, but support 6 for which would significantly improve system perfor- Filter. However, what is of more interest to us here is mance.4 that, architecturally, the interposed State Estimator can Robustness. Robustness being a fundamental require- continue to provide position estimates of the cars, even ment for the viability of networked control, component if sensor measurement updates are lost or delayed, thus failures must be contained and their effect on the overall enhancing the local temporal autonomy of the downstream system minimized. For instance, the failure of a faulty consumer of sensor information, Controller, from the sensor module should not immediately cause a con- producer, Sensor. nected controller to fail as well. This requires dependen- We further enhance reliability by using receding hori- cies between components to be eliminated or replaced zon control where the controller sends a sequence of by use-only if available relationships as much as possible. future controls that can be used to control the car for a Delay-reliability trade-off. Reliable delivery of data over longer period, rather than sending just one control input a network introduces additional delay due to retransmis- [27]. Future controls are stored in an Actuator Buffer at the sion of lost or re-ordered packets. However, some com- actuator, and used in case updates are delayed or lost. A ponents may not require reliable delivery, and should similar buffer at the controller holds future way-points therefore be able to trade-off reliability for lower average from Supervisor. delay. For example, time-stamped sensor updates are By providing support for such strategies in the mid- more useful when delays are smaller, even though a few dleware, the cost of application design complexity can updates may be lost over the network. be reduced. Since critical components can now endure Other requirements. Algorithms should have good scal- some delays, we have been able to implement the entire ability. Security is also a key requirement and the system system using only soft real time control5. must be protected from misbehaving or malignant com- Stability considerations. Robustness requires the ability ponents. to restart application components efficiently to maintain system stability. For example, if Controller in Fig- C. Management Requirements ure 1 fails due to a software error, then restarting it should not require reestablishing communication with These are required for starting, managing, and updat- VisionSensor, reinitializing Actuator, or re-establishing ing components in an operational system. Controller state, as these delays could cause system Startup. Programmable interfaces to startup procedures instabilities and result in car collisions. that allow the specification of dependencies between This motivates a simple and uniform design to ex- components help ensure correct overall system initial- ternalize component state, so that it can be reinitialized ization. For instance, a controller component may need with this state, and restarted at the same node or even to be started up before a related actuator component. migrated to a different location and restarted there. System evolution. It is necessary to be able to migrate or Etherware enforces component state externalization as update components at run-time. Component migration a basic architectural precept. Each component is instan- is required to optimize the configuration of software tiated with an initial state, and is required to support components to balance or reduce communication and a state check-pointing mechanism. On a check-point Controller computational loads. For example, a in the request, it has to return a state object that can be used testbed can be migrated to another computer, where it to reinitialize it upon restart, update, or migration. is closer to the corresponding Actuator or less loaded Maintenance of communication channels across such computationally, thus reducing loop delay. Supporting changes is also supported in Etherware. Identifiers for such run time optimization allows changing controllers communication channels can be saved as part of the to address evolving plant goals, without having to restart check-pointed state, allowing restarted or upgraded the whole system. components to continue using previously established channels. It also provides communication continuity to IV. E other components during such changes. We now present Etherware, a middleware for net- Message based communication. Components in net- worked control. We begin with the design considera- worked control systems have to respond to changes tions, followed by the usage model and architecture. as soon as possible. Upon detecting a safety violation, Application design. Control loops have to be closed over Controller may not be able to wait for an acknowl- communication channels that are unreliable, especially edgment from Supervisor before it takes action. Over in the case of a wireless network. We can mitigate a wireless channel in particular, delays can be large this by employing various application design strategies, due to deep fading and queued packets. Another im- particularly the principle of local temporal autonomy. As portant consideration is the presence of dependencies an example, noisy feedback to the controller is filtered in push-based communication channels. For example, using State Estimator, which implements a Kalman the controller cannot wait for the updates from the

4Strange as the terminology may seem, these may actually be more 5We are presently incorporating real-time support into Etherware important than functional requirements. [28]. 7 sensor before sending controls to Actuator since updates basic versions of services to be deployed initially for may be delayed or lost. Conventional middlewares for resource savings, and subsequently updated based on transaction based systems such as CORBA [26] use syn- changing application requirements, without having to chronous communication where a component sending restart the system. Hence, invariant aspects of the mid- a message is blocked until it receives a reply. This dleware, which cannot be changed dynamically, have to leads to complex design with each component requiring be minimized to maximize flexibility. multiple threads of control, since a separate receiving These considerations have motivated us to adopt a thread is required for blocking on each channel. On micro-kernel [31] based design. This consists of using a the other hand, asynchronous operation eliminates this simple and efficient kernel, which has only the most source of complexity since a component is not blocked basic functionality needed for minimal operation, and when sending a message. which will not need to be changed during system op- Based on these considerations, Etherware has been de- eration. This consists of functionality to deliver mes- veloped as a message-oriented middleware. Message based sages between components, and primitives for creating communication requires a specification of message for- new components. All other middleware functionality is mats. Support for interface and semantic compatibil- implemented as components, just as the application is; ity during changes and component reuse requires this hence these can be changed without having to restart specification to be flexible, extensible, and backward the system. This philosophy of maximizing flexibility compatible. Flexibility is the ability to easily incorporate has also resulted in the development of a bare minimum changes in the interface and semantics of a component, functional interface for components to interact with the and extensibility supports ease of adding specifications middleware. Flexibility is increased since the above de- for new functionality, while still honoring the original sign minimizes the number of aspects in the functional specifications used by older components to communicate interface that need to be changed or upgraded during with it. Based on these requirements, we have used XML system operation. For uniformity, application compo- [29] as the language for messages, with all communica- nents interact with middleware services with messages, tion in Etherware through well-formed XML documents just as though they were other application components. with appropriately defined and extensible formats. For platform independence, and due to availability of sup- A. Etherware usage model port for XML, Etherware has been implemented using the Java [30]. While these choices We next turn to the programming or usage model for incur some additional processing overhead, we believe Etherware, that is, the abstractions provided to control that the use of open standards and advances in computer engineers and software using Etherware. hardware compensate for this. A component in Etherware can participate in control Architecture. An important architecture design decision hierarchies. For example, Controller in Figure 1 takes was to choose an execution model for active components, goals from the higher-level Supervisor and sends com- such as sensors, which need their own threads of con- mands to the lower-level Actuator. The VisionServer trol. We decided to go with thread-per-active-component of Figure 1 takes data inputs from VisionSensors and instead of process-per-active-component model for the provides data to Controllers. following reasons. First, support for Thread management Etherware is an asynchronous message-based middle- is much better than support for Process management in ware. Components communicate by exchanging mes- Java. For instance, support for thread prioritization and sages, which are well formed XML documents [29]. XML thread interruption is available in a standardized and documents can be directly manipulated as large strings. platform-independent interface. Second, since Threads However, Etherware also provides a hierarchy of Java are much lighter-weight entities than Processes, their classes which provide various primitives to manipulate creation and update time is much smaller. This is well the underlying XML document across a Java based illustrated in Section VIII-A where we see that a Process interface. This hierarchy can be extended to support restart is on the order of seconds, while a Thread restart additional messages with user-defined XML formats, in is on the order of tens of milliseconds. Finally, having addition to the predefined messages with specific XML one Etherware process per computer makes it easier formats that are encoded in these classes. to define and use resources such as published network Two basic problems attending message delivery in ports for remote communication. However, individual distributed systems are discovery and identification of active components can create and manage their own destination components. The identification problem is processes if necessary. For instance, due to programming solved in Etherware by associating a globally unique language and platform constraints, the VisionSensor id, called a Binding, to each component. The discovery components for cars in the testbed create and manage problem is solved by associating semantic profiles to processes to collect data from the frame grabbers con- addressable components. Each component that needs nected to the camera outputs. to be addressed registers one or more profiles with Services provided by the middleware need to be eas- Etherware. A profile describes the set of services that ily restartable and upgradeable. Supporting this allows a component provides. For example, a profile for a 8

Sensor Kalman Filter Controller Higher Component

Control in Feedback

MessageStream Filter Component Data in Control Law Data out Input Component State Output Component (Strategy) (Memento) Fig. 3. Filters for MessageStreams Messages Shell (Facade)

Control out Feedback

Lower Component vision sensor could specify the type of camera and the geographic region covered by it, and a car controller Fig. 4. Programming model for a Component in Etherware could use this information to connect to a relevant vision sensor. All messages have the following three XML tags or addressed using appropriate design patterns. constituents: Memento. Support for restarts and upgrades requires Recipient Profile. This identifies the recipient of the mes- the ability to externalize and capture application state. sage. A profile can be a service description as above, or This is solved by the Memento pattern, wherein com- the globally unique Binding of a component. ponent state can be check-pointed and restored on re- Content. This represents the contents of the message initialization. including application specific information. Strategy. System stability is enhanced by the ability Time-stamp. Each message has a time-stamp associated to replace components without disrupting service. In with it which is automatically translated to the local particular, if the functional (syntactic) interface used clock on that computer as a message moves from one to communicate with the component is invariant, then computer to another. components can be replaced dynamically. In this case, By default, messages are delivered reliably and in the Strategy pattern is used in conjunction with the order. However, there may be streams of messages that Memento pattern. need to be delivered using other specifications as well. Facade. Interaction with various services in the middle- To identify and manipulate a stream of messages as ware usually requires a component to send messages a separate entity, Etherware supports the notion of a using function calls. If the component has to directly MessageStream. A component can open a MessageStream call functions on the various sub-systems, the structure to another component and send messages through it. of these subsystems needs to be programmed in it. MessageStreams have settings that can be used to specify This introduces unnecessary dependencies and makes how messages are delivered through them. For exam- the component and the middleware hard to evolve, as ple, Controller can tolerate a few lost sensor updates changing the middleware architecture will require the for lower delay, but has no use for old updates. Ac- software of all the components to be updated as well. cordingly, VisionSensor could open a MessageStream to This is eliminated by using the Facade pattern to provide Controller requesting unreliable in-order delivery, as a uniform functional interface that is independent of the shown in Figure 3. This means that messages along actual middleware architecture. this pipe will not be retransmitted if lost, and messages Components can be active or passive. Passive compo- arriving late will be discarded. In addition, Etherware nents do not have any active threads of control. They provides MulticastStreams that support efficient multi- only respond to incoming messages by processing them source multi-cast of messages. appropriately and generating resulting messages if any. Etherware provides the capability to add, at run time, Active components have one or more active threads of Filters that can intercept all messages sent to, or received control. They can generate messages based on activi- by, a component, since it may be necessary to modify ties in their individual threads of control. For example, VisionSensor messages in a MessageStream in response to changes can be implemented as an active compo- in operating conditions. For example, updates from nent with a separate thread to block on sensor hardware VisionSensor could get noisy due to bad lighting con- updates, and generate appropriate messages when new ditions, and it is desirable to have the capability to filter sensor data is available. out this noise without having to change Sensor or the Controller. Figure 3 shows the effective configuration B. Etherware architecture after a Kalman filter has been added to the message pipe The architecture of Etherware is based on the micro- between VisionSensor and Controller. kernel concept shown in Figure 5. As motivated in The design of a generic Etherware component is Section IV, the Kernel represents the minimum invariant shown in Figure 4. It is based on several well known in Etherware. All other services are implemented as com- “design patterns” [4]. In software development, many ponents. The Kernel manages all components on a given design problems have a common recurring theme, and computer. The function of the Kernel is to create new design patterns represent solutions that exploit the components, and deliver messages between its (local) theme. We now consider the various design problems components. We allow one Etherware process per node for Etherware components, and describe how these are in the current implementation. 9

Shell Shell Shell Start Component Active Service Component Component RegisterProfileEvent, RegisterTickEvent

Messages Messages ControlEvent Receive

OperateCar TickEvent KERNEL Scheduler Operate

Fig. 5. Architecture of Etherware Fig. 6. Finite state automata for Actuator semantics

The Kernel has a Scheduler that is responsible for it is to time-stamp information once we have the notion scheduling all messages and threads. The Scheduler can of Filters). Time translation is based on computing clock be replaced dynamically if more complex scheduling offsets using the Control Time Protocol [32]. algorithms are necessary as the application evolves. In addition, the Scheduler also provides a notification ser- vice, whereby a component can request to be notified C. Etherware capabilities with alarms after a given delay. This allows components We now describe how the requirements listed in Sec- to wait, without having to create separate threads of tion III are addressed in Etherware. control for this purpose. Components can also register Operational Requirements. These requirements are to receive periodic notifications or ticks. This has been addressed primarily by the Etherware programming used to implement all soft real time control in the model. testbed. The car Controller operates at 10 Hz, and has Distributed operation. Components executing on differ- been implemented as a passive component. For periodic ent computers can communicate with each other using activation, it registers with Scheduler to receive periodic the same primitives as used for local interactions. The tick messages every 100ms. following mechanisms collectively implement location Each component is encapsulated in its own Shell as independence in Etherware. Each component is assigned shown in Figure 5. A Shell presents a facade to the a globally unique-id called a Binding. This addressing component, and provides a uniform interface for it to in- scheme is independent of network location and topology. teract with the rest of the system. Shells also encapsulate Each Binding is mapped into a network specific address, component specific information such as configurations of such as an IP address, and a component id on a given MessageStreams. Activities involved in component restart, computer. This mapping can change as components mi- upgrade, and migration have also been implemented in grate. Local routing of messages on a single computer is Shells. performed by the Kernel. Routing of messages between All other functionality in Etherware is provided by ser- computers is done by the NetworkMessenger. vice components. An instance of each service component Service description. This is a consequence of addressable executes on each computer. The following basic services components being able to register service profiles. A are used: component that wishes to access a given service can ProfileRegistry. The ProfileRegistry is used to register and just address messages using the appropriate profile. The look up profiles of components. Profiles of newly created profile is then matched with registered profiles by the components are registered and recipients of messages ProfileRegistry as explained in Section IV-B. If a match addressed using profiles are determined by the registry. is found, then the message is directly forwarded to the Each node has a Local ProfileRegistry for components on appropriate component. If not, an appropriate exception that node. A network also has a Global ProfileRegistry for message is returned. components on all nodes in the network. Interface compatibility. This is primarily achieved by use NetworkMessenger. This sends and receives remote mes- of XML documents for communication between compo- sages, i.e., messages between a local component and a re- nents. Besides, components use a simple and uniform mote component operating on another computer. Hence, functional interface provided by the Etherware Shell, it encapsulates all communication with remote nodes and as suggested by the Facade pattern in Section IV- over the network, including details such as IP addresses, A. ports, and transport layer protocols. By default, all mes- Semantics. This requirement can be properly addressed sages addressed to remote components are forwarded by an interface description language (IDL), which is by the Kernel to this service. The NetworkMessenger is a language to describe the functional and interaction an active component with separate threads to receive interface exposed by a component. The Etherware pro- messages from remote nodes. gramming model mandates message based interactions NetworkTimeService. This service translates time-stamps between components. This requires components to in- of messages as they are transmitted from one node corporate interaction semantics that can be formally to another. To implement this, a NetworkTimeService specified using a finite state machine [33]. It includes component is added as a Filter to the NetworkMessenger, specification of the messages that a component can send so that it intercepts all messages that are sent to and or receive, and conditions under which it can send or re- received from other nodes. (It should be noted how easy ceive specific messages. Figure 6 represents the FSA that 10 specifies Actuator semantics. The labels on the arrows Distributed time. This is provided by the NetworkTimeSer- represent messages and actions, with overbars indicating vice. received messages. Etherware component semantics can Non-functional requirements. These requirements have be formally specified in an executable fashion in the been addressed by various architectural features and rewriting logic based Maude [34], as services in Etherware. below: Robustness. This is facilitated primarily by the use of mod TESTBED is the Memento pattern. It allows the effect of component pr RAT . pr QID . failures to be contained by efficient check-point and inc CONFIGURATION . restart mechanisms in Etherware. The efficiency of these subsort Qid < Oid . techniques is considered in Section VIII. sorts ActuatorState State . subsort ActuatorState < State . Delay-reliability trade-off. MessageStreams support trading sorts Control Controls . off reliability for lower delays. They also provide a subsort Control < Controls < Attribute . simple mechanism to incorporate support for other QoS ops startA receiveA : -> ActuatorState . op state :_ : State -> Attribute . requirements. op car :_ : String -> Attribute . Other requirements. The current algorithms for various op Actuator : -> Cid . op registerMsg(_,_) : Oid Qid -> Msg . services scale well for the testbed requirements. We have op controlMsg(_,_,_) : Oid Int Rat -> Msg . op actionMsg(_,_) : Int Rat -> Msg . tested them by operating up to eight cars at a time, op tickMsg(_) : Oid -> Msg . which represents the scenario with the maximum load in op control(_,_) : Int Rat -> Control . op nil : -> Controls . the system. Further, component Shells provide a uniform op _ _ : Controls Controls -> Controls [assoc id: nil] . location to potentially incorporate security features. op newActuator(_,_) : Qid String -> Object . The Etherware architecture essentially trades-off sys- var A : Oid . var S : String . tem performance (verbose protocols, java vms, soft real- var I : Int . var R : Rat . time operation, etc.) to support complex functionality var AS : AttributeSet . var C : Control . and interactions between components. However, high var CS : Controls . performance critical control loops can still be imple- eq newActuator(A,S) = < A : Actuator | state : startA , car : S, nil > . mented using a locally optimized language and plat- rl [a-init] : < A : Actuator | state : startA , AS > => < A : Actuator | state : receiveA , AS > registerMsg(A,A) . form, and these can then be ”wrapped” with an Ether- rl [a-recv] : < A : Actuator | state : receiveA , AS , CS > controlMsg(A,I,R) ware component to plug it into a larger system. In => < A : Actuator | state : receiveA , AS , CS control(I,R) > . this design, Etherware acts as an integration platform rl [a-send] : < A : Actuator | AS , control(I,R) CS > tickMsg(A) => < A : Actuator | AS , CS > actionMsg(I,R) . for complex networked control systems, with higher- endm level supervisory control implemented for soft real-time Components and systems specified in this fashion can operation, connected to lower-level critical control loops be verified for validity and correctness using theorem implemented on native platforms for hard-real time proving tools in Maude. XML provides the representa- operation. tion for strongly typed messages, where the type and Management Requirements. These are addressed using format of all possible messages can be defined as part of configuration support and design: the interface. Startup. Start-up configurations and dependencies are Note that the IDL provided by Etherware is only a lan- specified to Etherware using a configuration file for each guage for specifying component interface and interaction computer. Etherware ensures that components are initial- semantics. The actual semantics for a given component ized in the correct order. Currently, the absolute order IDL specification are highly dependent on its domain in which components need to be initialized on a given of operation. For instance, the IDL specification of a node is specified, but support for more complicated VisionSensor component could specify the refresh rate dependencies can be easily supported using a richer using an XML tag called ”RefreshRate,” but for other dependency evaluation system such as Apache Ant [35]. components to correctly use its sensory output, they System evolution. System evolution through component must understand that the ”RefreshRate” tag identifies update and migration, is supported by the use of Me- the specification of the refresh rate of the vision sensor, mento pattern. Application state is maintained across i.e., they must have a common shared vocabulary with both operations using Mementos. Connections using Mes- consistent meanings. sageStreams are maintained across component updates. In addition, an Etherware proxy service has also been As noted in Section IV-A, MessageStreams and Filters implemented, which allows formal component specifica- also provide a mechanism for system evolution with- tions in Maude [34] to interact with regular components out having to change any existing connections between in Etherware. This enables rapid prototyping, where components. formal specifications can be checked for both logical correctness and operational stability. For instance, the V. S Actuator module specified above can be plugged into the actual testbed, and could emulate a real car in the In addition to the Etherware infrastructure itself, many system. control applications have many common sub-problems 11

OSI STACK PROPOSED STACK Sensor Controller

Application Layer Control software

Presentation Layer Virtual Collocation (Middleware) Session Layer State SensorEstimator Controller Transport Layer TCP, UDP

Network Layer IP Fig. 8. A State Estimator separating Sensor and Controller can Datalink Layer Network reduce execution and timing dependencies. between them Physical Layer Infrastructure

Fig. 7. Mapping from OSI stack to proposed architecture up computers in an appropriate order. Also, clocks are aligned and all sensor and other information is auto- matically time-stamped. These features correspond to whose solutions could be reused. We facilitate such reuse the Session and Presentation layers shown in Figure 7. into services bundled with the middleware. Some of the Utilizing the abstraction of a virtually collocated system services that we have considered are: provided by middleware, a system designer can focus Distributed time management. We have implemented a on algorithmic code for planning, scheduling, control, clock synchronization algorithm. adaptation, estimation, or identification. Plant delay estimation. A dynamic delay estimation service is of interest for closing loops since delays may VII. D   L T A change with time. Networked control systems will be subject to commu- Component repositories. Software component reposito- nication and node failures as well as component failures. ries, from where new control software can be down- We suggest the local temporal autonomy design principle: loaded and installed dynamically into operational sys- Design components to tolerate, with graceful degrada- tems would be useful in promoting use of new theories tion, failures in connections to, and operation of, other and algorithms. connected components. We provide three illustratory Dynamic system optimization. This is a useful service examples. to facilitate automatic determination of optimal system State Estimator: To match the unpredictable charac- configuration at run-time. For instance, whether to pro- teristics of the communication network with the pre- cess all the pixels from the camera at a certain node, dictability required by control, we insert a buffer, a thus stressing its computational resource, or to transfer StateEstimator, before controllers in the system, see them to another node, thus stressing the communication Figure 8. It removes the execution and timing dependen- network, is a low level decision that depends on the cies of Controller on the inherently unreliable commu- power of the processor at a node as well as the current nication channel , since it can be used by Controller to communication traffic load, and should be automatically interpolate between lost sensor updates. Such a design done. The control designer’s role can thus move to the can take aperiodic sensor data as input, and provide higher plane of specifying high level rules or algorithms continuous or periodic outputs to the Controller, the for such migration. format preferred by digital control design methodology. This improves local temporal autonomy of Controller,     VI. A L N C since it buffers it from failures in the sensor and the The Open Systems Interconnections (OSI) [13] archi- communication channel. tecture is a standard reference model for networked Actuator Buffer: Similarly, at the actuator end, instead of systems. Our proposed architecture is based on this providing just one current control input at each update model, and the mapping between the different layers to an actuator, by using receding horizon control, a is shown in Figure 7. Central to it is the abstraction Controller can provide a window of control inputs up of a virtually collocated system, where information ex- to a time horizon. This again improves the local temporal change between application components is supported autonomy of Actuator as it now has fall-back controls at the level of messages exchanged between seemingly in case subsequent controller updates do not arrive. local components. This abstraction is provided by the By weakening the dependence between components, middleware, which hides the details of a networked these approaches also facilitate dynamic component up- system such as IP addresses, network protocols, data grade and migration, in addition to simplifying design formats, and computational resources. Components can and testing. be identified using application specific content instead Migration for self-optimization and reliability: Com- of network addresses and ports. They can exchange ponents in the networked control system can execute at information on a regular basis, as information push any location. Their optimal physical locations depend whenever updates are available, or as pull on demand upon timing constraints, communication delays, compu- by an information consumer. Hardware or software can tational loads, etc., which may vary during operation. be added or removed while the system is running, and Hence it would be desirable for the execution of a com- the designer need not deal with issues like starting ponent to be able to migrate to a better location within 12 the system. To accomplish migration, the middleware mechanisms described in Section IV contributed to the incorporates several supporting functions such as the quick recoveries. First, the Shell intercepted exceptions components being able to continue to communicate after due to the Controller faults, and restarted it without the move. This involves updating the communication affecting the MessageStream connections to the other information at each of the nodes or components which components. Second, before termination, the Controller communicate with the migrating component, as well as state was check-pointed according to the Memento pat- updating the communication information at the compo- tern, and then used for re-initialization. To illustrate the nent itself. To help an application optimize decisions on performance of these two mechanisms, we also tested when and where to migrate a component, the middle- the alternative mechanism of restarting the Etherware ware must be able to estimate the loads on the physical process managing the Controller at about 100 seconds resources which will exist before and after the migration, after the start. We see that the restart of Etherware and which can be provided as a middleware service. the Controller took about three seconds, during which the car position accumulated a large error of about 0.8 VIII. E meters. This illustrates the necessity for efficient restarts. Furthermore, even though the Controller restarted after This section presents three experiments evaluating the three seconds, additional error was accumulated before performance of Etherware mechanisms. recovery. This was because the Controller had to re- connect to the other components, rebuild the state of the A. Controller Restart car, and bring it back on track. These avoidable delays are eliminated through efficient restart mechanisms in 1200 1500 Deviation (mm) Deviation (mm) Time of controller down Old controller down Etherware. Time of controller up New controller up 1000 Elapsed time to recover (ms) Elapsed time to recover (ms)

800 3219 1000

600 B. Controller Upgrade

400 500 47ms Deviation from trajectory (mm) Deviation from trajectory (mm) In the second experiment, we tested the performance 200 45 83 39 46 42 128 of software upgrade mechanisms in Etherware. The 0 0 0 20 40 60 80 100 120 140 160 180 0 20 40 60 80 100 120 140 160 time (seconds) time (seconds) car is initially controlled by a coarse Controller that (a) Error in car trajectory due (b) Error in car trajectory due operates myopically. Etherware is then commanded, at to controller restarts to controller upgrade about 90 seconds, to upgrade the coarse Controller 500 Deviation (mm) Old controller down 450 Migrated controller up Controller Time to migrate (ms) to a better model predictive . From Figure 400

350 9(b), we see the improvement in the car operation after 300 update. The involved transients are within the system 250 200 error bounds as well. This functionality is due to three 491ms 150 Deviation from trajectory (mm) 100 key Etherware mechanisms from Section IV. First, the 50

0 Controller 0 10 20 30 40 50 60 70 80 90 Strategy pattern allows one to be replaced by time (seconds) another without any changes to the rest of the system. (c) Error in car trajectory due to controller migration Second, the Shell is able to upgrade the Controller without affecting the connections to the other compo- Fig. 9. Performances of controller restart, upgrade and migration nents. Finally, the Memento pattern allows the coarse Controller to check-point its state before termination. In the first experiment, Controller was restarted sev- This is then used to initialize the new Controller. eral times. Faults were injected at random by performing The first mechanism allows for simple upgrades, while an illegal operation (divide by zero) in Controller, the other two mechanisms minimize the impact of the which caused Controller to raise exceptions and be upgrade on other components and the car operation. restarted by Etherware. The deviation of the actual car positions from the desired trajectory, as a function of time, is shown in Figure 9(a). Restarts are indicated by C. Component Migration pointers, and the accompanying numbers indicate, in In the third experiment, the support for migration in milliseconds, the time for each restart. These are time- Etherware was tested. As before, the error in the car stamps at Observer, and include communication and trajectory is shown in Figure 9(c). The large spike at synchronization times between the restarted Controller the beginning of the graph was the error due to the car and Observer. The first restart occurred at about 70 trying to catch up with its trajectory. This is achieved at seconds into the experiment, and was followed by two about 10 seconds into the experiment, after which the other restarts in the next 20 seconds. The last three car follows the trajectory within an error of 50mm. In faults were also handled by the restart mechanisms. particular, the error introduced due to migration is well We see that the error in the car position during these within the operational error of the car. Two Etherware restarts was within the system error bounds during mechanisms enable migration. First, the Memento pattern normal operation prior to restart. Two of the Etherware allows the current state of Controller to be captured 13

Co!located components upon its termination. Second, the primitives in the Ker- Renewal Sensor nel on the remote computer allow a new controller to be Guarantees Controller started there with the check-pointed state of Controller. Zero delay Plant Zero delay Actuator IX. R W This section presents an overview of related earlier Fig. 10. Control Loop Model work in this area. CORBA [26] is probably the most popular middleware and has been used in a variety of different domains. It was primarily intended for transac- present an illustrative example. Consider the car control tions based business and enterprise systems. Hence, the loop model shown in Figure 10. The Controller and the trade-offs incorporated in CORBA are not all compatible Actuator are collocated, see [45], and there is negligible with requirements for control systems. For example, the delay between a Controller issuing a control, and the specification mandates the use of TCP [36] for reliable Actuator effecting it in the car. Suppose the feedback delivery. This can be a serious limitation if lower delay channel from the Sensor to the Controller is scheduled is more important than reliability, as such a trade-off according to the renewal task model, [46] where a task is cannot be supported. Popular versions of middleware a sequence of jobs that execute on a specified resource, for control applications are based on various flavors and where each job has a fixed computation time C and a of CORBA, such as Real time CORBA [37] and Min- relative deadline D, with the renewal referring to the fact imum CORBA [38]. For example, OCP [39] is based that a new job is released immediately after a previous on Real Time CORBA and has been used to control job is completed. A renewal task is characterized by unmanned aerial vehicles. ROFES [40] implements Real its task density ρ = C/D. In our setting, each job is a Time CORBA and Minimum CORBA, and is targeted computation of car controls, speed and orientation, to for real-time applications in embedded systems. Fault- be applied for a specific time interval. tolerant CORBA (FT-CORBA) [26] is the primary OMG We model a car itself as an oriented point in a lane, specification that addresses fault tolerance in distributed with state (d, θ), where d is the absolute distance of the systems. A key problem in using CORBA based middle- car from the center of the lane, and θ is the angle of ware is that the CORBA interface description language the car to the median of the lane. As a convention, the (IDL) is not descriptive enough to specify key assump- direction of traversal of the lane is assumed to be from tions in component design. In particular, issues involved left to right in what follows. We assume that a car has in semantic compatibility, such as description of data va- a single non-zero speed s, and four steering controls: lidity and interaction semantics, cannot be specified. This two in anti-clockwise direction, and two in clockwise leads to numerous problems while integrating indepen- direction, both allowing motion along circles of radii R dently developed, yet functionally compatible compo- and R with R < R. Based on the renewal task model for nents. Other interesting approaches include Giotto [41], sensory feedback, a car can move a bounded distance real-time framework [42] for robotics and automation, before it is observed again. Since the car moves at a and OSACA [43] for automation. A good overview of constant speed s, this corresponds to a deadline of Rα/s research and technology that has been developed for in the renewal task model. Finally, a control is sent implementing reusable, distributed control systems is immediately after the car has been observed. provided in [44]. Theorem 10.1: A set of cars can be driven along pre- specified routes without collisions (safety guarantee) or X. T  gridlocks (liveness guarantee), if the road network has 2γRR straight single-lane roads of length at least L = , Exploiting the capabilities of Etherware poses theo- R R retical challenges. At the tactical level, we need to be and lane-width at least 2(W + R(1 cos γ)) with W− = − 1 able to optimally exploit middleware capabilities already R(1 cos β(2 cos α 1)), α π/3, and β = cos− (2 cos α 1). present, such as migration, restart, and the services Proof−. We provide− a brief≤ outline; see [47] for details.− provided. Two examples of problems that arise are the First, it can be proved that a car can be controlled so that following. Since the middleware provides to the con- it stays within a road of width W = R(1 cos β(2 cos α 1)). troller the actual delay that has already been incurred Then it can be proved that the length− of the road− is by a packet containing a measurement, i.e., random but sufficient to keep the car within its lane when moving known delay, how should one design a controller that is from one lane to another at an intersection. Third, it can robust to plant uncertainties? Another is how to optimize be shown that a set of cars in a road network with single the “migration controller” as empirical delay profiles lane roads can be cleared by a supervisory algorithm de- between various origin-destination node pairs over a scribed in [?]. This algorithm models the traffic system as networked system change. a discrete graph, where a lane is modeled as a sequence At the strategic level, there is a need for a cross- of bins. The algorithm then moves cars between bins in domain theory that is holistic and includes real-time discrete time. This establishes the liveness guarantee. scheduling, plant uncertainties, and hybrid systems. We Next, it can be shown that the cars can be scheduled 14 without collisions by ensuring that at most one car can [12] L. G. Valiant, “A bridging model for parallel computation,” be in a bin at a given time. This is enforced during Communications of the ACM, vol. 33, no. 8, August 1990. [13] W. R. Stevens, TCP/IP Illustrated. bin transitions by keeping the bin in front of each car [14] V. Kawadia and P. R. Kumar, “A cautionary perspective on cross empty. This, in turn, is accomplished by associating an layer design,” IEEE Wireless Communication Magazine, pp. 3–11, additional ”virtual” car always in front of each real car. vol. 12, no. 1, Feb 2005. [15] CORBA Components, Version 3.0, Object Management Group, 2002. This final step establishes the safety guarantee for the [16] Microsoft .NET, http://www.microsoft.com/net/, Microsoft Inc. system.  [17] V. Matena and B. Stearns, Applying Enterprise JavaBeans: As we build more complex systems, a challenge is Component-Based Development for the J2EE Platform, ser. Java Series. Sun Microsystems Press, Jan 2001. to be able to automate such proofs of overall system [18] CORBA Success Stories, OMG Inc, correctness. http://www.corba.org/success.htm. [19] Middleware is Everywhere, IBM, http://www- 306.ibm.com/software/. XI. C  [20] Microsoft .NET, Microsoft Inc, http://www.microsoft.com/net/. [21] Jave 2 Platform, Enterprise Edition (J2EE), Sun Microsystems, This paper has focused on the mechanism half of http://java.sun.com/j2ee/. the policy-mechanism divide, and proposed an abstrac- [22] “IT convergence lab,” http://decision.csl.uiuc.edu/ testbed/. [23] IEEE 802 LAN/MAN Standards Committee, “Wireless∼ LAN tion and a middleware based approach to support the medium access control (MAC) and physical layer (PHY) design, development, and deployment of networked specifications,” IEEE Standard 802.11, 1999 edition, 1999. control systems. The Etherware design supports the de- [24] A. Giridhar and P.R.Kumar, “Scheduling Traffic on a Network of Roads,” IEEE Trans on Veh Tech, pp. 1467–1474, vol. 55, Sep 2006. velopment of applications as collections of components. [25] “The Official Bluetooth Website,” http://www.bluetooth.com/. The support for standardized interfaces and seamless [26] CORBA: Core Specification, Object Management Group, Dec 2002. integration enables the reuse of components in other [27] S. Kowshik, G. Baliga, S. Graham, and L. Sha, “Co-design based approach to improve robustness in networked control systems,” applications, which allows the building a of com- in Proceedings of the International Conference on Dependable Systems ponents embodying various control algorithms and de- and Networks, Yokohama, Japan, June 2005. signs that can be assembled into working applications in [28] K.-D. Kim and P. R. Kumar, “Architecture and mechanism design for real-time and fault-tolerant etherware for networked control,” deployed systems. Application development consists of in Proc 17th IFAC World Congress, Seoul, July 2008, pp. 9421–9426. specifying the interconnection of components and their [29] XML, W3C - Consortium, Oct 2000. individual configurations. Using Etherware primitives, [30] “Java 2 platform (j2se),” http://java.sun.com/j2se/. [31] Applied Concepts, John Wiley, 2000. system optimization can be supported by developing al- [32] S. Graham and P. R. Kumar, “Time in general-purpose control sys- gorithms to automatically determine optimal component tems: The control time protocol and an experimental evaluation,” interconnection configurations, and migrate components in Proc 43rd IEEE CDC, pp. 4004–4009, Dec 2004. [33] A. V. Aho and J. D. Ullman, Foundations of Computer Science. W. accordingly. H. Freemand and Co, 1995. [34] M. Clavel, F. Duran, S. Eker, P. Lincoln, N. Marti-Oliet, J. Meseguer, and C. Talcott, Maude Manual (Version 2.1), A http://maude.cs.uiuc.edu/maude2-manual/, Mar 2004. [35] “Apache ant,” http://ant.apache.org. We are grateful to Lui Sha for enthusiastic discussions [36] T. Socolofsky and C. Kale, RFC 1180 - TCP/IP Tutorial. and much wisdom over several years. [37] Real-Time CORBA Specification Version 2.0, OMG, Inc, Nov 2003. [38] Minimum Corba Specification, OMG, Inc, Aug 2002. [39] L. Wills, S. Sander, S. Kannan, A. Kahn, J. V. R. Prasad, and R D. Schrage, “An open control platform for reconfigurable, dis- tributed, hierarchical control systems,” in Proceedings of the Digital [1] D. A. Mindell, Between Human and Machine: Feedback, Control, and Avionics Systems Conference, Oct 2000. Computing Before Cybernetics. JHU Press, 2004. [40] “Rofes: Real-time corba for embedded systems,” [2] J. Stankovic, “Vest: A toolset for constructing and analyzing http://www.lfbs.rwth-aachen.de/users/stefan/rofes/. component based operating systems for embedded and real-time [41] T. Henzinger, B. Horowitz, and C. Kirsch, “Giotto: A time- systems,” Univ. of Virginia, Tech. Rep. TRCS-2000-19, 2000. triggered language for embedded programming,” in Proc. 1st Int. [3] A. L. K. Carter and N. McNeil, “Unlicensed and Workshop Embedded Software (DMSOFT ’01), Tahoe City, Ca, 2001. unshackled: A joint OSP-OET white paper on [42] A. Traub and R. Schraft, “An object-oriented realtime framework unlicensed devices and their regulatory issues,” for distributed control systems,” in Proc. IEEE Conference on http://hraunfoss.fcc.gov/edocs: public/attachmatch/DOC- Robotics and Automation, Detroit, Mi, May 1999. 234741A1.pdf, May 2003. [43] P. Lutz, W. Sperling, D. Fichtner, and R. Mackay, “Osaca - the [4] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns, vendor neutral control architecture,” in Proc. European Conference ser. Professional Computing Series. Addison-Wesley, 1995. on Integration in Manufacturing, Dresden, Germany, Sept 1997, pp. [5] I. Sommerville, Software Engineering, Addison-Wesley, 2000. 247–256. [6] R. Bannatyne, “Microcontrollers for the au- [44] B. S. Heck, L. M. Wills, and G. J. Vachtsevanos, “Software tech- tomobile,” Micro Control Journal, 2003, nology for implementing reusable, distributed control systems,” http://www.mcjournal.com/articles/arc105/arc105.htm. IEEE Control Systems Magazine, vol. 23, no. 1, February 2003. [7] R. Sanz and K.-E. Arzen, “Trends in software and control,” IEEE [45] C. Robinson and P. R. Kumar, “Optimizing controller location in Control Systems Magazine, June 2003. networked control systems with packet drops,” To appear in IEEE [8] R. J. Pehrson, “Software development for the Boeing 777,” The Journal on Selected Areas in Communications, Special Issue on Control Boeing Company, Tech. Rep., 1996. and Communications, 2008. [9] H. P. Barendregt and E. Barendsen, “Introduction to lambda [46] K.-J. L. C.-C. Han and C.-J. Hou, “Distance-constrained schedul- calculus,” in Aspenæs Workshop on Implementation of Functional ing and its applications to real-time systems,” IEEE Transactions Languages, G¨oteborg, 1988. on Computers, vol. 45, no. 7, pp. 814–826, 1996. [10] J. von Neumann, “First draft of a report on the edvac,” June 1945. [47] G. Baliga, “A Middleware Framework for Networked Con- [11] C. H. Ferguson and C. R. Morris, Computer Wars - The Fall of IBM trol Systems,” Ph.D. dissertation, Univ. of Illinois at Urbana- and the Future of Global Technology. Three Rivers Press, 1994. Champaign, 2005.